Converting php [] to array() syntax in multiple files - php

I'd like to create a (temporary) version of my web app that supports PHP < 5.4, for which I need to convert all my short array syntax to the long one.
There's plenty of "old to new" scripts, but the only "new to old" one seems to be this — https://github.com/slang800/php-array-syntax-converter, however when I tried running it, it gave me the following error:
PHP Catchable fatal error: Argument 1 passed to processTree() must be
an instance of Pharborist\TopNode, instance of Pharborist\RootNode
given
Hopefully, someone has had a similar experience and could point me in the right direction.
Thanks.

If I were you, I'd simply write my own script, so you don't have to wait for a fix to be written for the script you're using now. It's really not that hard to work out what to replace, really: It's all occurrences of this pattern /\[[^]]*\]/ that are part of an expression, other than $var[$idx] (accessing an index of an array, or pushing onto an array: $var[] =).
Off the top of my head, I'd write a script that just does this, basically:
$pattern = '/(=>?|,|\[|\()\s*\[([^]]+)\](?!\s*=)/';
$script = file_get_contents('path/to/file.php');
$rewritten = preg_replace(
$pattern,
'$1 array($2)',
$script
);
//only AFTER you've checked the output, of course!
file_put_contents('path/to/file.php', $rewritten);
Note that this regex isn't finished just yet: it can't cope with nested arrays like:
$x = [ $y[1] => [1, 2, 3]];
But Play around with it here, until you can get it to work. It might be worth while using preg_match_all with a different pattern, matching the entire expression containing nested arrays, and replace what needs replacing separately. That'd make your life a lot easier, I do think

Related

Multiple procedure assignment for self variable in one line

Is there any way a variable can be assigned from multiple procedures in one line?
For example:
$class_splits = explode("\\", $class_name);
$short_class_name = $class_splits[count($class_splits) - 1] ?? null;
Translated to this pseudo code:
$short_class_name = explode("\\", $class_name) => prev_result(count(prev_result) -1);
I don't have big expectations on this as I know it looks too "high-level" but not sure if newer versions of PHP can handle this.
Thanks.
You can use an assignment as an expression, then refer to the variable that was assigned later in the containing expression.
$short_class_name = ($class_splits = explode("\\", $class_name))[count($class_splits) - 1] ?? null;
That said, I don't recommend coding like this. If the goal was to avoid creating another variable, it doesn't succeed at that. It just makes the whole thing more complicated and confusing.
I believe you have an "X/Y Problem": your actual requirement seems to be "how to split a string and return just the last element", but you've got stuck thinking about a particular solution to that.
As such, we can look at the answers to "How to get the last element of an array without deleting it?" To make it a one-line statement, we need something that a) does not require an argument by reference, and b) does not require the array to be mentioned twice.
A good candidate looks like array_slice, which can return a single-element array with just the last element, from which we can then extract the string with [0]:
$short_class_name = array_slice(explode("\\", $class_name), -1)[0];
Since we no longer need to call count(), we can avoid the problem of needing the same intermediate value in two places.
Whether the result is actually more readable than using two lines of code is a matter of taste - remember that a program is as much for human use as for machine use.

PHP regex parsing - splitting tokens in my own language. Is there a better way?

I am creating my own language.
The goal is to "compile" it to PHP or Javascript, and, ultimately, to interpret and run it on the same language, to make it look like a "middle-level" language.
Right now, I'm focusing on the aspect of interpreting it in PHP and run it.
At the moment, I'm using regex to split the string and extract the multiple tokens.
This is the regex I have:
/\:((?:cons#(?:\d+(?:\.\d+)?|(?:"(?:(?:\\\\)+"|[^"]|(?:\r\n|\r|\n))*")))|(?:[a-z]+(?:#[a-z]+)?|\^?[\~\&](?:[a-z]+|\d+|\-1)))/g
This is quite hard to read and maintain, even though it works.
Is there a better way of doing this?
Here is an example of the code for my language:
:define:&0:factorial
:param:~0:static
:case
:lower#equal:cons#1
:case:end
:scope
:return:cons#1
:scope:end
:scope
:define:~0:static
:define:~1:static
:require:static
:call:static#sub:^~0:~1 :store:~0
:call:&-1:~0 :store:~1
:call:static#sum:^~0:~1 :store:~0
:return:~0
:scope:end
:define:end
This defines a recursive function to calculate the factorial (not so well written, that isn't important).
The goal is to get what is after the :, including the #. :static#sub is a whole token, saving it without the :.
Everything is the same, except for the token :cons, which can take a value after. The value is a numerical value (integer or float, called static or dynamic in the language, respectively) or a string, which must start and end with ", supporting escaping like \". Multi-line strings aren't supported.
Variables are the ones with ~0, using ^ before will get the value to the above :scope.
Functions are similar, being used &0 instead and &-1 points to the current function (no need for ^&-1 here).
Said this, Is there a better way to get the tokens?
Here you can see it in action: http://regex101.com/r/nF7oF9/2
[Update] To issue the pattern being complicated and maintainability, you can split it using PCRE_EXTENDED, and comments:
preg_match('/
# read constant (?)
\:((?:cons#(?:\d+(?:\.\d+)?|
# read a string (?)
(?:"(?:(?:\\\\)+"|[^"]|(?:\r\n|\r|\n))*")))|
# read an identifier (?)
(?:[a-z]+(?:#[a-z]+)?|
# read whatever
\^?[\~\&](?:[a-z]+|\d+|\-1)))
/gx
', $input)
Beware that all space are ignored, except under certain conditions (\n is normally "safe").
Now, if you want to pimp you lexer and parser, then read that:
What does (f)lex [GNU equivalent of LEX] is simply let you pass a list of regexp, and eventually a "group". You can also try ANTLR and PHP Target Runtime to get the work done.
As for you request, I've made a lexer in the past, following the principle of FLEX. The idea is to cycle through the regexp like FLEX does:
$regexp = [reg1 => STRING, reg2 => ID, reg3 => WS];
$input = ...;
$tokens = [];
while ($input) {
$best = null;
$k = null;
for ($regexp as $re => $kind) {
if (preg_match($re, $input, $match)) {
$best = $match[0];
$k = $kind;
break;
}
}
if (null === $best) {
throw new Exception("could not analyze input, invalid token");
}
$tokens[] = ['kind' => $kind, 'value' => $best];
$input = substr($input, strlen($best)); // move.
}
Since FLEX and Yacc/Bison integrates, the usual pattern is to read until next token (that is, they don't do a loop that read all input before parsing).
The $regexp array can be anything, I expected it to be a "regexp" => "kind" key/value, but you can also an array like that:
$regexp = [['reg' => '...', 'kind' => STRING], ...]
You can also enable/disable regexp using groups (like FLEX groups works): for example, consider the following code:
class Foobar {
const FOOBAR = "arg";
function x() {...}
}
There is no need to activate the string regexp until you need to read an expression (here, the expression is what come after the "="). And there is no need to activate the class identifier when you are actually in a class.
FLEX's group permits to read comments, using a first regexp, activating some group that would ignore other regexp, until some matches is done (like "*/").
Note that this approach is a naïve approach: a lexer like FLEX will actually generate an automaton, which use different state to represent your need (the regexp is itself an automaton).
This use an algorithm of packed indexes or something alike (I used the naïve "for each" because I did not understand the algorithm enough) which is memory and speed efficient.
As I said, it was something I made in the past - something like 6/7 years ago.
It was on Windows.
It was not particularly quick (well it is O(N²) because of the two loops).
I think also that PHP was compiling the regexp each times. Now that I do Java, I use the Pattern implementation which compile the regexp once, and let you reuse it. I don't know PHP does the same by first looking into a regexp cache if there was already a compiled regexp.
I was using preg_match with an offset, to avoid doing the substr($input, ...) at the end.
You should try to use the ANTLR3 PHP Code Generation Target, since the ANTLR grammar editor is pretty easy to use, and you will have a really more readable/maintainable code :)

Is there an automated way of fixing one line ifs in PHP?

I don't know if it's just me or not, but I am allergic to one line ifs in any c like language, I always like to see curly brackets after an if, so instead of
if($a==1)
$b = 2;
or
if($a==1) $b = 2;
I'd like to see
if($a==1){
$b = 2;
}
I guess I can support my preference by arguing that the first one is more prone to errors, and it has less readability.
My problem right now is that I'm working on a code that is packed with these one line ifs, I was wondering if there is some sort of utility that will help me correct these ifs, some sort of php code beautifier that would do this.
I was also thinking of developing some sort of regex that could be used with linux's sed command, to accomplish this, but I'm not sure if that's even possible given that the regex should match one line ifs, and wrap them with curley brackets, so the logic would be to find an if and the conditional statement, and then look for { following the condition, if not found then wrap the next line, or the next set of characters before a line break and then wrap it with { and }
What do you think?
Your best bet is probably to use the built-in PHP tokenizer, and to parse the resulting token stream.
See this answer for more information about the PHP Tokenizer: https://stackoverflow.com/a/5642653/1005039
You can also take a look at a script I wrote to parse PHP source files to fix another common problem in legacy code, namely to fix unquoted array indexes:
https://github.com/GustavBertram/php-array-index-fixer/blob/master/aif.php
The script uses a state machine instead of a generalized parser, but in your case it might be good enough.

parse search string

I have search strings, similar to the one bellow:
energy food "olympics 2010" Terrorism OR "government" OR cups NOT transport
and I need to parse it with PHP5 to detect if the content belongs to any of the following clusters:
AllWords array
AnyWords array
NotWords array
These are the rules i have set:
If it has OR before or after the word or quoted words if belongs to
AnyWord.
If it has a NOT before word or quoted words it belongs to NotWords
If it has 0 or more more spaces before the word or quoted phrase it
belongs to AllWords.
So the end result should be something similar to:
AllWords: (energy, food, "olympics 2010")
AnyWords: (terrorism, "government", cups)
NotWords: (Transport)
What would be a good way to do this?
If you want to do this with Regex, be aware that your parsing will break on stupid user input (the user, not the input =) ).
I'd try the following Regexes.
NotWords:
(?<=NOT\s)\b((?!NOT|OR)\w+|"[^"]+")\b
AllWords:
(?<!OR\s)\b((?!NOT|OR)\w+|"[^"]+")\b(?!\s+OR)
AnyWords:
Well.. the rest. =) They are not that easy to spot, since I do not know how to put "OR behind it or OR in front of it" into regex. Maybe you could join the results from the three regexes
(?<=OR\s)\b((?!NOT|OR)\w+|"[^"]+")\b(?!\s+OR)
(?<=OR\s)\b((?!NOT|OR)\w+|"[^"]+")\b(?=\s+OR)
(?<!OR\s)\b((?!NOT|OR)\w+|"[^"]+")\b(?=\s+OR)
Problems: These require exactly one space between modifier words and expressions. PHP only supports lookbehinds for fixes length expressions, so I see no way around that, sorry. You could just use \b(\w+|"[^"]+")\b to split the input, and parse the resulting array manually.
This is an excellent example of how an test-first driven approach can help you arrive at a solution. It might not be the very best one, but having tests written allow you to refactor with confidence and instantly see if you break any of the existing tests. Anyway, you could set up a few tests like:
public function setUp () {
$this->searchParser = new App_Search_Parser();
}
public function testSingleWordParsesToAllWords () {
$this->searchParser->parse('Transport');
$this->assertEquals(
$this->searchParser->getAllWords(),
array('Transport')
);
$this->assertEquals($this->searchParser->getNotWords(), array());
$this->assertEquals($this->searchParser->getAnyWords());
}
public function testParseOfCombinedSearchString () {
$query = 'energy food "olympics 2010" Terrorism ' .
'OR "government" OR cups NOT transport';
$this->searchParser->parse($query);
$this->assertEquals(
$this->searchParser->getAllWords(),
array('energy', 'food', 'olympics 2010')
);
$this->assertEquals(
$this->searchParser->getNotWords(),
array('Transport')
);
$this->assertEquals(
$this->searchParser->getAnyWords(),
array( 'terrorism', 'government', 'cups')
);
}
Other good tests would include:
testParseTwoWords
testParseTwoWordsWithOr
testParseSimpleWithNot
testParseInvalid
Here you have to decide what invalid input looks like and how you interpret it, i.e:
'NOT Transport': Search for anything that doesn't contain Transport or inform the user that he has to include at least one search term too?
'OR energy': Is it ok to begin with a combinator?
'food OR NOT energy': Does this mean "search for food or anything that doesn't contain energy", or does it mean "search for food and not energy", or doesn't it mean anything? (i.e. throw exception, return false or whatnot)
testParseEmpty
Then, write the tests one by one, and write a simple solution that passes the test. Then refactor and make it right, and run again to see that you still pass the test.
Once a test passes and the code is refactored, then write the next test and repeat the procedure. Add more tests as you find special cases and refactor the code so that it passes all tests. If you break a test, back-up and re-write the code (not the test!) such that it passes.
As for how you can solve this problem, look into preg_match, strtok or rely simply loop through the string adding up tokens as you go.

Problem with PHP Array

Why is it not possible to do something equivalent to this in PHP:
(Array(0))[0];
This is just for sake of argument, but it seems strange it does not allow access of anonymous objects. I would have to do something like the following:
$array = Array(0);
$array[0];
Any ideas why this is the behavior of PHP?
I read something somewhat detailed about this once and I regret not bookmarking it because it was quite insightful. However, it's something along the lines of
"Because the array does not exist in memory until the current statement (line) executes in full (a semicolon is reached)"
So, basically, you're only defining the array - it's not actually created and readable/accessible until the next line.
I hope this somewhat accurately sums up what I only vaguely remember reading many months ago.
This language feature hasn’t been inplemented yet but will come in PHP 6.
I guess the short answer is: nobody has coded it yet. I've used (and loved) that syntax in both Python and Javascript, but still we wait for PHP.
The main reason is because unlike some languages like Python and JavaScript, Array() (or in fact array()) is not an object, but an language construct which creates an inbuilt data type.
Inbuilt datatypes themselves aren't objects either, and the array() construct doesn't return a reference to the "object" but the actual value itself when can then be assigned to a variable.

Categories