PHP Call-time pass-by-reference replace with regex - php

I am porting an office management system from PHP 5.3 to 5.4. The errors come from the many function calls that use call-time pass-by-reference (the "&" symbol in the arguments), which is a fatal error in 5.4 and stops the application. I have solved it in the following way:
//This is a function with pass by reference:
function myfunc(&$x, $y, &$z) { ... }
//This is the above function being called:
$myClass->myfunc(&$var1, $var2, &$var3);
The PHP documentation tells me that I have to remove the "&" when I call the function, so I have to replace that line with:
$myClass->myfunc($var1, $var2, $var3);
because the function definition already carries the symbol, meaning that the argument is passed by reference.
But I have more than 800 PHP files, and I would have to fix them one by one, line by line.
So I need a regex that helps me locate all these "&$" occurrences (skipping "&&", which means AND, and any "&$" occurrences in a function declaration).
I built this regex: [^&]&\$ and it works, but I need to exclude any match on a line that starts with "function" (that would be a function declaration).

^(\s*function.*)$|(?<!&)&(?=\$[^(]*\))
Try this; it should work for you. Replace with $1.
See demo.
https://regex101.com/r/sH8aR8/35
Your regex [^&]&\$ does not use zero-width assertions, so it matches (and replaces) three characters. It is better to use lookaheads and lookbehinds.
$re = "/^(\\s*function.*)$|(?<!&)&(?=\\$[^(]*\\))/im";
$str = "&\$)\n&&\$\n\$myClass->myfunc(&\$var1, \$var2, &\$var3);\nfunction myfunc(&\$x, \$y, &\$z) { ... }\n\n";
$subst = "$1";
$result = preg_replace($re, $subst, $str);
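To see the difference between the consuming pattern and the lookaround version, here is a minimal sketch. The helper name strip_call_time_refs() and the sample strings are made up for illustration; the pattern is the one from the answer above. The sketch assumes PHP 7 or later (type hints and the ?? operator).

```php
<?php
// Consuming pattern from the question: matches three characters ("(&$").
$call = 'myfunc(&$var1)';
echo preg_replace('/[^&]&\$/', '', $call), "\n";       // myfuncvar1)

// Zero-width assertions: only the "&" itself is matched and removed.
echo preg_replace('/(?<!&)&(?=\$)/', '', $call), "\n"; // myfunc($var1)

// Hypothetical helper wrapping the full pattern from the answer.
function strip_call_time_refs(string $source): string
{
    $pattern = '/^(\s*function.*)$|(?<!&)&(?=\$[^(]*\))/im';
    // Declaration lines are captured whole and put back via $1;
    // for a bare "&" match, group 1 is empty, so the "&" disappears.
    return preg_replace($pattern, '$1', $source) ?? $source;
}

$sample = implode("\n", [
    'function myfunc(&$x, $y, &$z) { }',
    '$myClass->myfunc(&$var1, $var2, &$var3);',
    'if ($a && $b) { }',
]);
echo strip_call_time_refs($sample), "\n";
```

Only the call on the second line loses its ampersands; the declaration and the && stay as they are. It is probably wise to run such a replacement on a copy of the code base and review the diff, because the pattern cannot tell a call from other uses of &$ inside parentheses: foreach ($items as &$item), for example, would be matched as well.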

Related

How to replace below preg_replace with preg_replace_callback?

I am having difficulty converting below preg_replace() function call
preg_replace("/\{(.*?)\}/e", '$\1', $data)
to using preg_replace_callback() (because of the removed e modifier in PHP 7.0).
I have tried this but I have no idea how to fully handle '$\1':
preg_replace_callback('/\{(.*?)\}/', function ($matches) {
return $matches[0];
}, $data);
Any help would be highly appreciated.
I'd like to suggest the following code as an answer to the concrete question:
$vars = get_defined_vars();
$result = preg_replace_callback('/{(.*?)}/', function ($matches) use ($vars) {
return $vars[$matches[1]];
}, $data);
unset($vars);
The remaining part of the answer should provide more information and references for mainly two things:
Show how this can be solved with divide and conquer also leading to a step-by-step guide on how to port such code.
Add more context as depending on how/where that code is to be ported, there can be differences, also for error handling and PHP version compatibility requirements.
This should make the answer more applicable to similar variable variables related preg_replace() with e modifier migration question based on backreferences.
The e (PREG_REPLACE_EVAL) Modifier
This feature was DEPRECATED in PHP 5.5.0, and REMOVED as of PHP 7.0.0.
It was only used by preg_replace() and was ignored by other PCRE functions.
From a previous PHP manual description revision:
If this deprecated modifier is set, preg_replace() does normal substitution of backreferences in the replacement string, evaluates it as PHP code, and uses the result for replacing the search string. Single quotes, double quotes, backslashes (\) and NULL chars will be escaped by backslashes in substituted backreferences.
Rationale and context why it was deprecated/removed can be found in RFC: Remove preg_replace /e modifier, mainly three issue classes:
Security issues
Overescaping of quotes
Use as obfuscation in exploit scripts
The PHP RFC Wiki page has more details, and the information is a good addition to the answer as a port at least crosses 1. and 2. for the removed PHP code evaluation.
The '$\1' Replacement
As per the e modifiers description, '$\1' will be evaluated after the backreference \1 (first matching group) is replaced.
In the question's example that is the contents of the curly braces {...}:
'/\{(.*?)\}/'
    ~~~~~
      1 : first matching group
For example when the subject string is "Hello {name}", the contents of the first matching group is "name". Resolving it leads to the following PHP code that then is evaluated:
$name
That is a variable named "name". The evaluation is done within the scope where preg_replace() is called.
So far the description of the replacement pattern.
How to make compatible with PHP 7.0.0 (and earlier/later)?
A common way to start changing away from the e modifier is to make use of preg_replace_callback() instead of preg_replace(), which is done by replacing it and using an anonymous function (or any other callback method, however anonymous functions are normally the preferable way in most cases).
This is also (thankfully) outlined on the reference question. In the following I'll first leave backslash escaping of the substituted backreferences out to simplify the solution (and address it later).
An example of what has been done so far (with only a slight correction on the $matches index - it needs to be 1 not 0):
preg_replace_callback('/\{(.*?)\}/', function ($matches) {
return $matches[1];
}, $data);
The \1 backreference to the first matching group is available as $matches[1] here. It contains the contents of the curly braces {...}, e.g. "name" from the previous example.
(compare: Changing preg_replace to preg_replace_callback)
For the specific $\1 replacement here, this is obviously still incomplete: it only replaces with the name of the variable and not (yet) its contents.
What is still missing is connecting the name with the original scope, which requires a little more work.
Obtain Variables in preg_replace() Scope
To obtain all variables defined in the same scope as the preg_replace_callback() (previously preg_replace()) call, the get_defined_vars() function is an option:
This function returns a multidimensional array containing a list of all defined variables, be them environment, server or user-defined variables, within the scope that get_defined_vars() is called.
Using that array within the anonymous callback function then allows to obtain the value of a variable by its name as array key:
$vars = get_defined_vars(); # <1>
preg_replace_callback('/\{(.*?)\}/', function ($matches) use ($vars) { # <2>
$name = $matches[1];
return $vars[$name]; # <3>
}, $data);
<1> Obtain variables from preg_replace scope.
<2> Use variables with the anonymous function (the use language construct).
<3> Access variables value by name and return.
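As a self-contained illustration of the scope aspect, the following sketch runs the replacement inside a function, so that only local variables are visible. The function name greet() and its variables are hypothetical and only serve the example.

```php
<?php
// Hypothetical function: the placeholder is resolved against local variables.
function greet(): string
{
    $name = 'World';
    $data = 'Hello {name}!';

    // Snapshot of the local scope: contains "name" and "data" at this point.
    $vars = get_defined_vars();

    return preg_replace_callback('/\{(.*?)\}/', function ($matches) use ($vars) {
        return $vars[$matches[1]];
    }, $data);
}

echo greet(), "\n"; // Hello World!
```

Note that the snapshot is taken when get_defined_vars() is called, so variables defined afterwards are not part of it.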
This was the missing part in the question: turning the backreference, used as a variable name, into the actual value.
As so often, there are similar ways to achieve the same, some of them more dependent on context. get_defined_vars() is a pretty generic way to create a "variable table" that maps names to their values. But there can be circumstances in which an array is already available and there is no need to call that function.
Alternative to get_defined_vars(): Use of $GLOBALS array
This approach has been chosen by Wiktor Stribiżew in his answer:
Given the scope is the global scope (likely not, but if), then the $GLOBALS superglobal can be used instead:
$result = preg_replace_callback('/\{(.*?)\}/', function ($matches) {
$name = $matches[1];
return $GLOBALS[$name];
}, $data);
There is no need to call get_defined_vars(), nor to unset the $vars array after the call (or otherwise care about it). But this binds the code to global variable state (which may or may not be an issue for the application).
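What binding to global state means in practice can be shown with a small sketch. The function name render_with_globals() and the values are hypothetical; the ?? operator assumes PHP 7 or later.

```php
<?php
// Set explicitly through $GLOBALS so the example behaves the same
// even if this snippet is included from inside a function.
$GLOBALS['abc'] = 'global value';

function render_with_globals(string $data): string
{
    $abc = 'local value'; // never consulted by the callback below

    return preg_replace_callback('/\{(.*?)\}/', function ($matches) {
        // Unknown names resolve to an empty string in this sketch.
        return $GLOBALS[$matches[1]] ?? '';
    }, $data);
}

echo render_with_globals('Some {abc} here'), "\n"; // Some global value here
```

If the original preg_replace() call with the e modifier ran inside a function, it would have seen the local $abc instead, which is why $GLOBALS is only a drop-in replacement for code that ran in the global scope.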
Alternative to get_defined_vars(): Re-Use of another array (if available)
If the variables were previously imported from an array into the scope where preg_replace() with the e modifier was running, then the import is redundant and the array itself can be used with the callback function's use clause. An example:
function replace_variables(string $data, array $vars) {
# previously here: extract($vars);
$result = preg_replace_callback('/\{(.*?)\}/', function ($matches) use ($vars) {
$name = $matches[1];
return $vars[$name];
}, $data);
# ...
}
As extract() comes with side effects you normally want to prevent, this kills two birds with one stone: the variables array is already available, so get_defined_vars() does not need to be called. Additionally, an unsafe extract operation can be dropped, as it is no longer necessary to create variables in the scope of the earlier preg_replace().
This should leave enough food for thought to connect the name in the backreference to the value. The PHP manual has more about variable scope in case there is a more specific context. Normally get_defined_vars() should address most issues if an array is not yet available.
Notes for the '/\{(.*?)\}/' Regular Expression Pattern
This pattern comes with some caveats, therefore I'm leaving some notes for additional information and to open up on error handling and changes of it due to porting, which will address more issues.
The backslashes "\" are redundant:
Just a minor thing to get it out of the way:
ok.....: '/\{(.*?)\}/'
correct: '/{(.*?)}/'
This change can always be made: in this pattern the braces cannot be read as a quantifier, so they match literally and the backslashes are redundant.
This improves readability of the pattern.
Change in Regular Expression Pattern PHP Error Behaviour
The second point worth noting about the search pattern is a potential incompatibility:
The pattern allows a zero-length match, that is, the empty pair of curly braces {} matches and yields a zero-length (variable) name. It could be used to present a default value (e.g. null), but you may prefer that it does not match at all, or you may want to add error handling.
w/ empty.: '/{(.*?)}/'
w/ length: '/{(.+?)}/'
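The difference between the two patterns can be checked with a short sketch. The helper captured_names() is hypothetical and only collects what the first group captures.

```php
<?php
// Hypothetical helper: collect the names a pattern captures in a string.
function captured_names(string $pattern, string $data): array
{
    preg_match_all($pattern, $data, $matches);
    return $matches[1];
}

// With .*? the empty pair matches and yields a zero-length name.
print_r(captured_names('/{(.*?)}/', 'Hello {name}{}')); // 'name' and ''

// With .+? the trailing empty pair no longer matches.
print_r(captured_names('/{(.+?)}/', 'Hello {name}{}')); // 'name' only

// Caveat: "." also matches "}", so an empty pair followed by a later
// closing brace is still matched, just with a longer "name".
print_r(captured_names('/{(.+?)}/', '{} and {name}'));  // '} and {name'
```

The last case suggests that switching to .+? alone does not replace validating the captured name.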
A related point: undefined variable/index warnings.
To prevent undefined index warnings these could resolve to null silently (or you may want to add error handling). This has been done in the upfront code porting suggestion at the very beginning of the answer.
Note though that these errors were harsher with the previous preg_replace() call with the e modifier, as the empty name resulted in a parse error when evaluated and then a fatal error. Example:
PHP Parse error: syntax error, unexpected ';', expecting variable (T_VARIABLE) or '$' in ... : regexp code on line 1
PHP Fatal error: preg_replace(): Failed evaluating code:
$
To define such errors out of existence as of a PHP 7.0.0 (and above/below) compatible port:
$vars = get_defined_vars();
$result = preg_replace_callback('/{(.*?)}/', function ($matches) use ($vars) {
$name = $matches[1];
return isset($vars[$name]) ? $vars[$name] : null;
}, $data);
unset($vars);
Alternatively it is possible to mimic the old error behaviour (a bit) by throwing (e.g. on an empty name), as an uncaught exception triggers a fatal error:
$vars = get_defined_vars();
$result = preg_replace_callback('/{(.*?)}/', function ($matches) use ($vars) {
$name = $matches[1];
if ('' === $name) {
throw new \RuntimeException('preg_replace_callback(): callback: Expected variable name, got zero-length string.');
}
return isset($vars[$name]) ? $vars[$name] : null;
}, $data);
unset($vars);
(if backwards compatibility below PHP 7.0.0 is not an issue, throwing an \Error is a more matching alternative for PHP 7.0.0 and above. Alternatively use trigger_error() instead to include versions below PHP 7.0.0 as well)
However, I'd suggest looking more into how the overall process can be made more error-safe. Even though this depends a lot on the context of the original code and requires a closer look, it allows you to benefit from the changes. The following discussion/example shows more.
Changes in Replacement Pattern (previous Backslash Escapes for Backreferences)
Removing the e (PREG_REPLACE_EVAL) modifier not only requires a callback function but also comes with another change: backslash escapes were previously added to the substituted backreferences, but no longer are with the callback function.
This has been kept out so far. To complete the answer, it should get some attention. First as a reminder, from the (now removed) e modifier documentation what this is about:
Single quotes, double quotes, backslashes (\) and NULL chars will be escaped by backslashes in substituted backreferences.
This can lead to code that contains one or more calls to stripslashes() within the replacement pattern. This is not the case for this question so the consequences are that backslash escapes aren't added any longer.
As mario writes in an answer to the reference question:
[...] stripslashes() often becomes redundant in literal expressions.
In this question, it is a little different: As stripslashes() is not within the replacement pattern, there is nothing to be redundant / remove in "$\1".
To demonstrate the changes with a double and single quote within a "variable name" in the absence of the escaping for preg_replace_callback() compared to using the e modifier:
Data          e Modifier       Callback
{abc}         $abc             $abc
{a"bc}        $a\"bc (E)       $a"bc (I)
${${'abc'}}   $${\'abc\' (E)   $${'abc' (I)

(E): PHP Parse error
(I): Invalid variable name (informative only)
This once more highlights that the original replacement pattern has issues with the name stored as backreference to the first matching group. As discussed above for a zero-length variable name, it is lax and allows invalid names (which previously could have led to PHP parse errors when the replacement was evaluated).
The backslash escaping added to that. As the regular expression pattern does a lazy match (.*? - the question mark after the asterisk), it was at least not completely free-form.
The port therefore has fewer such issues, but the difference is a fine one.
Therefore, porting itself does not address this issue much. What was a PHP fatal error earlier now turns into an undefined index PHP warning, with the consequence that the script continues to run where it previously stopped.
This could be seen as an argument for (or against) failing early with the port - it depends.
It could be done by checking for invalid variable names (assuming those would have caused a fatal parse error during evaluation - not an undefined variable warning).
A PCRE regular expression pattern for variable names in PHP is ^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$. One idea that came to my mind was to use it to classify whether a name is a valid PHP variable name or not.
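As a minimal sketch of that classification idea, the pattern can be wrapped in a small predicate. The function name is_valid_variable_name() is an assumption made for this example, not something from the PHP manual.

```php
<?php
// Hypothetical helper built on the variable-name pattern quoted above.
function is_valid_variable_name(string $name): bool
{
    // The D modifier keeps "$" from matching before a trailing newline.
    return 1 === preg_match('/^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$/D', $name);
}

var_dump(is_valid_variable_name('abc'));   // bool(true)
var_dump(is_valid_variable_name('a"bc'));  // bool(false): contains a quote
var_dump(is_valid_variable_name(''));      // bool(false): zero-length name
var_dump(is_valid_variable_name('2abc'));  // bool(false): starts with a digit
var_dump(is_valid_variable_name("abc\n")); // bool(false): trailing newline
```

The predicate only checks the syntax of the name; whether a variable of that name exists is a separate question.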
Additionally, the next example is an opportunity to show how the backslash escapes and the error/warning behaviour can be preserved:
$vars = get_defined_vars();
$result = preg_replace_callback('/{(.*?)}/', function ($matches) use ($vars) {
$name = addslashes($matches[1]); # <1>
if (!preg_match('(^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$)D', $name)) { # <2>
trigger_error("Not a variable name: $name", E_USER_ERROR);
}
if (!array_key_exists($name, $vars)) { # <3>
trigger_error("Undefined variable: $name");
}
return isset($vars[$name]) ? $vars[$name] : null; # <4>
}, $data);
unset($vars);
<1> Backslash escape single quotes, double quotes, backslashes (\) and NULL chars for backreference \1 (as the e modifier did so).
<2> Trigger fatal error on invalid variable names as those with the e modifier would result in a parse error followed by a fatal error evaluating code with syntax error(s).
<3> Trigger warning on undefined variable.
<4> Undefined variables (now not isset() indexes) result in null values.
This more verbose example mimics even more of the original behaviour and therefore could be seen as a more complete port. However, it contradicts many of the reasons why the e modifier was deprecated and removed in the first place. Therefore, do not apply it blindly; it is an additional example to highlight the differences between the e modifier's eval and the callback version.
This is also the reason I've kept this out of the foremost answer.
PHP Version Compatibility
The port as outlined above is done with an anonymous function and therefore it is compatible with PHP 5.3 or later.
If backwards compatibility is not necessary or as an outlook for a future migration, some comments on more recent PHP versions:
Since PHP 7.4 arrow functions can be used. They have the benefit that the scope is automatically inherited ("closed"), so the use-clause becomes redundant. However, variable variables cannot be used with arrow functions, which makes the array as "variable table" still necessary - like before. It can condense the code though, especially if error conditions (see discussion above) from the pattern were removed already (not the case in the following example code, it still uses the original pattern):
$vars = get_defined_vars();
$result = preg_replace_callback('/{(.*?)}/', fn($matches) => $vars[$matches[1]] ?? null, $data);
unset($vars);
Since PHP 8.0 - as throw new \Error is an expression - throwing could be another option. For my taste it is not of much benefit, as control is not fine-grained and readability is degraded. Your mileage may vary though; it is an option since PHP 8.0:
$vars = get_defined_vars();
$result = preg_replace_callback(
'/{(.*?)}/',
fn($matches) =>
$vars[$matches[1]]
?? throw new \Error(sprintf('Expected existing variable name, got "%s" which is undefined', $matches[1])),
$data
);
unset($vars);
You can access global variables using $GLOBALS Superglobal array:
preg_replace_callback('/\{(.*?)\}/', function ($matches) {
return $GLOBALS[$matches[1]];
}, $data);
See the PHP demo:
$data = 'Some {abc} here';
$abc = "Word";
echo preg_replace_callback('/\{(.*?)\}/', function ($matches) {
return $GLOBALS[$matches[1]];
}, $data);
Output:
Some Word here

PHP preg_replace, split or match?

I need to parse a string and replace a specific format for tv show names that don't fit my normal format of my media player's queue.
Some examples
Show.Name.2x01.HDTV.x264 should be Show.Name.S02E01.HDTV.x264
Show.Name.10x05.HDTV.XviD should be Show.Name.S10E05.HDTV.XviD
After the show name, there may be 1 or 2 digits before the x, I want the output to always be an S with two digits so add a leading zero if needed. After the x it should always be an E with two digits.
I looked through the manual pages for the preg_replace, split and match functions but couldn't quite figure out what I should do here. I can match the part of the string I want with /\dx\d{2}/ so I was thinking first check if the string has that pattern, then try and figure out how to split the parts out of the match but I didn't get anywhere.
I work best with examples, so if you can point me in the right direction with one that would be great. My only test area right now is a PHP 4 install, so please no PHP 5 specific directions, once I understand whats happening I can probably update it later for PHP 5 if needed :)
A different approach as a solution, using sprintf() with PHP 4 and below.
$text = preg_replace('/([0-9]{1,2})x([0-9]{2})/ie',
'sprintf("S%02dE%02d", $1, $2)', $text);
Note: The e modifier is deprecated as of PHP 5.5, so use preg_replace_callback() instead:
$text = preg_replace_callback('/([0-9]{1,2})x([0-9]{2})/',
function($m) {
return sprintf("S%02dE%02d", $m[1], $m[2]);
}, $text);
Output
Show.Name.S02E01.HDTV.x264
Show.Name.S10E05.HDTV.XviD
See working demo
preg_replace is the function you are looking for.
You have to write a regex pattern that picks the correct place.
<?php
// Accept one or two season digits before the x
$replaced_data = preg_replace("~([0-9]{1,2})x([0-9]{2})~s", "S$1E$2", $data);
// Add a leading zero to single-digit seasons
$replaced_data = preg_replace("~S([0-9])E~s", "S0$1E", $replaced_data);
?>
Sorry I could not test it but it should work.
Another way, using the preg_replace_callback() function:
$subject = <<<'LOD'
Show.Name.2x01.HDTV.x264 should be Show.Name.S02E01.HDTV.x264
Show.Name.10x05.HDTV.XviD should be Show.Name.S10E05.HDTV.XviD
LOD;
$pattern = '~([0-9]++)x([0-9]++)~i';
$callback = function ($match) {
return sprintf("S%02sE%02s", $match[1], $match[2]);
};
$result = preg_replace_callback($pattern, $callback, $subject);
print_r($result);

Error trying to pass regex match to function

I'm getting Syntax error, unexpected T_LNUMBER, expecting T_VARIABLE or '$'
This is the code i'm using
function wpse44503_filter_content( $content ) {
$regex = '#src=("|\')'.
'(/images/(19|20)(0-9){2}/(0|1)(0-9)/[^.]+\.(jpg|png|gif|bmp|jpeg))'.
'("|\')#';
$replace = 'src="'.get_site_url( $2 ).'"';
$output = preg_replace( $regex, $replace, $content );
return $output;
}
This is the line where i'm getting that error $replace = 'src="'.get_site_url( $2 ).'"';
Can anyone help me to fix it?
Thanks
You can't have '$2' as a variable name. It must start with a letter or underscore.
http://php.net/manual/en/language.variables.basics.php
Variable names follow the same rules as other labels in PHP. A valid variable name starts with a letter or underscore, followed by any number of letters, numbers, or underscores. As a regular expression, it would be expressed thus: '[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*'
Edit Above was my original answer and is the correct answer to the simple "syntax error" question. More in-depth answer below...
You are trying to use $2 to represent "the second capture group", but you haven't done anything at that point to match your regex. Even if $2 was a valid PHP variable name, it still wouldn't be set at that point in your script. Because of this, you can determine that you are using preg_replace improperly and that it may not suit your actual needs.
Note that the preg_replace documentation doesn't support using $n as a separate variable outside of the replacement operation. In other words, 'foo' . $1 . 'bar' is not a valid replacement string, but 'foo$1bar' is.
Depending on the complexity of get_site_url, you have 2 options:
If get_site_url is simply adding a root directory or server name, you could change your replacement string to src="/myotherlocation$2". This effectively replaces "/images/..." with "/myotherlocation/images/..." in the img src. This will not work if get_site_url is doing something more complex.
If get_site_url is complex, you should use preg_replace_callback per other answers. Give the documentation a read and post a new question (or I guess update this question?) if you have trouble with the implementation.
What you're trying to do (ie replacing the matched string with the result of a function call) can't be done using preg_replace, you'll need to use preg_replace_callback instead to get a function called for every match.
A short example of preg_replace_callback;
$get_site_url = // Returns replacement
function($row) {
return '!'.$row[1].'!'; // row[1] is first "backref"
};
$str = 'olle';
$regex = '/(ll)/'; // String to match
$output = preg_replace_callback( // Match, calling get_site_url for replacement
$regex,
$get_site_url,
$str);
var_dump($output); // output "o!ll!e"
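Applied to the pattern from the question, a sketch could look like the following. Here site_url_for() is a hypothetical stand-in for whatever get_site_url() is supposed to do with the path, and https://example.com is a placeholder host. The pattern is also adjusted: (0-9) in the question matches the literal text "0-9", so character classes are used instead, and the closing quote is matched with a backreference.

```php
<?php
// Hypothetical stand-in for the real URL helper; the host is a placeholder.
function site_url_for(string $path): string
{
    return 'https://example.com' . $path;
}

function filter_content(string $content): string
{
    // Group 1: the opening quote, group 2: the image path.
    $regex = '#src=(["\'])'
           . '(/images/(?:19|20)[0-9]{2}/[01][0-9]/[^.]+\.(?:jpg|png|gif|bmp|jpeg))'
           . '\1#';

    return preg_replace_callback($regex, function ($matches) {
        // The callback runs once per match, so a function call is possible here.
        return 'src="' . site_url_for($matches[2]) . '"';
    }, $content);
}

echo filter_content('<img src="/images/2012/03/cat.jpg">'), "\n";
// <img src="https://example.com/images/2012/03/cat.jpg">
```

The key point is that $matches[2] only exists inside the callback, at the time a match has actually been found.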
PHP variable names can't begin with a number.
$2 is not a valid PHP variable. If you meant the second group in the regex then you want to put \2 in a string. However, since you're passing it to a function then you'll need to use preg_replace_callback() instead and substitute appropriately in the callback.
If a property name begins with a number (as opposed to a variable name), use the following.
I was getting the following in the result set from a third-party API:
Code Works
$stockInfo->original->data[0]->close_yesterday
Code Failed
$stockInfo->original->data[0]->52_week_low
Solution
$stockInfo->original->data[0]->{'52_week_low'}

How to use preg_replace in php to replace text with function result?

I have a complicated problem:
I have a very long text and I need to call some php functions inside my text.
The function name is myfunction();
I've included the function in my text in the following way:
" text text text myfunction[1,2,3,4,5]; more text text ... "
And I want to replace each myfunction[...] with the result of the function myfunction with the variables from the [] brackets.
my code is:
<?php echo preg_replace('/myfunction[[0-9,]+]/i',myfunction($1),$post['content']); ?>
, but it's not working.
The parameter should be an array, because it can contain any number of values.
If I were you, I would avoid using the e modifier to preg_replace because it can leave you open to execution of arbitrary code. Use preg_replace_callback instead. It's slightly more verbose, but much more effective:
echo preg_replace_callback('/myfunction\[([0-9,]+)\]/i', function($matches) {
$args = explode(',', $matches[1]); // separate the arguments
return call_user_func_array('myfunction', $args); // pass the arguments to myfunction
}, $post['content']);
This uses an anonymous function. This functionality won't be available to you if you use a version of PHP before 5.3. You'll have to create a named function and use that instead, as per the instructions on the manual page.
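For a runnable illustration of the callback approach, the sketch below uses a made-up myfunction() that simply sums its arguments; the real function from the question is unknown, and expand_calls() is a hypothetical wrapper name. It assumes PHP 5.6 or later for the ... syntax.

```php
<?php
// Hypothetical stand-in for the asker's myfunction(): sums its arguments.
function myfunction(...$numbers)
{
    return array_sum($numbers);
}

// Hypothetical wrapper: replace every myfunction[...] with the call result.
function expand_calls($content)
{
    return preg_replace_callback('/myfunction\[([0-9,]+)\]/i', function ($matches) {
        // "1,2,3" becomes [1, 2, 3]
        $args = array_map('intval', explode(',', $matches[1]));
        return (string) myfunction(...$args);
    }, $content);
}

echo expand_calls('text text myfunction[1,2,3,4,5]; more text'), "\n";
// text text 15; more text
```

Because the arguments are parsed and converted explicitly, nothing from the text is ever evaluated as PHP code.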
You can use preg_replace()'s "e" modifier (for EVAL) used like this :
$text = preg_replace('/myfunction\[(.*?)\]/e', 'myfunction("$1")', $text);
I didn't really get how your data is structured so it's all I can do to help you at the moment. You can explore that solution.
From the PHP Manual :
e (PREG_REPLACE_EVAL)
If this modifier is set, preg_replace() does normal substitution of backreferences in the replacement string, evaluates it as PHP code, and uses the result for replacing the search string. Single quotes, double quotes, backslashes (\) and NULL chars will be escaped by backslashes in substituted backreferences.
You need to add the "e" modifier, escape [ and ] in the regular expression, capture the arguments in a group so that $1 is set, and pass the second argument as a string.
preg_replace('/myfunction\[([0-9,]+)\]/ei','myfunction("$1")',$post['content']);

PHP regex: what is "class at offset 0"?

I'm trying to strip all punctuation out of a string using a simple regular expression and the php preg_replace function, although I get the following error:
Compilation failed: POSIX named classes are supported only within a class at offset 0
I guess this means I can't use POSIX named classes outside of a class at offset 0. My question is, what does it mean when it says "within a class at offset 0"?
$string = "I like: perl";
if (eregi('[[:punct:]]', $string))
$new = preg_replace('[[:punct:]]', ' ', $string); echo $new;
The preg_* functions expect Perl compatible regular expressions with delimiters. So try this:
preg_replace('/[[:punct:]]/', ' ', $string)
NOTE: The g modifier is not needed with PHP's PCRE implementation!
In addition to Gumbo's answer, use the g modifier to replace all occurrences of punctuation:
preg_replace('/[[:punct:]]/g', ' ', $string)
// ^
From Johnathan Lonowski (see comments):
> [The g modifier] means "Global" -- i.e., find all existing matches. Without it, regex functions will stop searching after the first match.
An explanation of why you're getting that error: PCRE uses Perl's loose definition of what a delimiter is. Your outer []s look like valid delimiters to it, causing it to read [:punct:] as the regex part.
(Oh, and avoid the ereg functions if you can - they are deprecated as of PHP 5.3 and were removed in PHP 7.0.)
I just added g to the regexp as suggested in one of the answers. It did the opposite of what's expected and DIDN'T filter out the punctuation. It turns out preg_replace doesn't require g, as it's global in the first place.
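A short sketch confirms this behaviour with a delimited pattern. The sample string is made up; the optional fourth argument of preg_replace() is the replacement limit.

```php
<?php
$string = 'I like: perl, really!';

// Delimited pattern: every match is replaced, no "g" modifier involved.
$all = preg_replace('/[[:punct:]]/', ' ', $string);

// The optional fourth argument limits the number of replacements.
$first = preg_replace('/[[:punct:]]/', ' ', $string, 1);

var_dump($all === 'I like  perl  really ');   // bool(true)
var_dump($first === 'I like  perl, really!'); // bool(true)
```

So if only the first match should be replaced, the limit argument is the way to express it, rather than a modifier.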
