I am having difficulty converting the following preg_replace() call
preg_replace("/\{(.*?)\}/e", '$\1', $data)
to use preg_replace_callback() instead (because the e modifier was removed in PHP 7.0).
I have tried this, but I have no idea how to fully handle '$\1':
preg_replace_callback('/\{(.*?)\}/', function ($matches) {
return $matches[0];
}, $data);
Any help would be highly appreciated.
I'd like to suggest the following code as an answer to the concrete question:
$vars = get_defined_vars();
$result = preg_replace_callback('/{(.*?)}/', function ($matches) use ($vars) {
return isset($vars[$matches[1]]) ? $vars[$matches[1]] : null;
}, $data);
unset($vars);
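To see the scope lookup in action, here is a minimal sketch that wraps the suggested code in a function. The function name greet() and the template text are made up for illustration; the point is that get_defined_vars() picks up the local $name.

```php
<?php
// Hypothetical wrapper: the suggested port, run inside a function so that
// get_defined_vars() returns that function's local variables.
function greet(string $data): string
{
    $name = 'World'; // local variable referenced as {name} in the template

    $vars = get_defined_vars(); // ['data' => ..., 'name' => 'World']
    $result = preg_replace_callback('/{(.*?)}/', function ($matches) use ($vars) {
        return isset($vars[$matches[1]]) ? $vars[$matches[1]] : null;
    }, $data);
    unset($vars);

    return $result;
}

echo greet('Hello {name}!'), "\n"; // prints "Hello World!"
```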
The remaining part of the answer provides more information and references, mainly on two things:
Showing how this can be solved with divide and conquer, which also leads to a step-by-step guide on how to port such code.
Adding more context, as there can be differences depending on how and where the code is ported, also regarding error handling and PHP version compatibility requirements.
This should make the answer applicable to similar questions about migrating preg_replace() calls with the e modifier that use backreferences as variable variables.
The e (PREG_REPLACE_EVAL) Modifier
This feature was DEPRECATED in PHP 5.5.0, and REMOVED as of PHP 7.0.0.
It was only used by preg_replace() and was ignored by other PCRE functions.
From a previous PHP manual description revision:
If this deprecated modifier is set, preg_replace() does normal substitution of backreferences in the replacement string, evaluates it as PHP code, and uses the result for replacing the search string. Single quotes, double quotes, backslashes (\) and NULL chars will be escaped by backslashes in substituted backreferences.
The rationale for the deprecation and removal can be found in the RFC "Remove preg_replace /e modifier", which names mainly three classes of issues:
Security issues
Overescaping of quotes
Use as obfuscation in exploit scripts
The PHP RFC Wiki page has more details. The information is a good addition to this answer, as a port addresses at least issues 1 and 2 by removing the PHP code evaluation.
The '$\1' Replacement
As per the e modifier's description, '$\1' is evaluated after the backreference \1 (the first matching group) has been substituted.
In the question's example, that is the contents of the curly braces {...}:
'/\{(.*?)\}/'
    ~~~~~
    1 : first matching group
For example, when the subject string is "Hello {name}", the contents of the first matching group are "name". Substituting it leads to the following PHP code, which is then evaluated:
$name
That is a variable named "name". The evaluation is done within the scope where preg_replace() is called.
So much for the description of the replacement pattern.
How to make it compatible with PHP 7.0.0 (and earlier/later)?
A common way to start moving away from the e modifier is to replace preg_replace() with preg_replace_callback() and pass an anonymous function (or any other callable, although an anonymous function is usually preferable).
This is also (thankfully) outlined in the reference question. In the following, I'll first leave out the backslash escaping of the substituted backreferences to simplify the solution (and address it later).
An example of what has been done so far in the question (with only a slight correction of the $matches index: it needs to be 1, not 0):
preg_replace_callback('/\{(.*?)\}/', function ($matches) {
return $matches[1];
}, $data);
The \1 backreference to the first matching group becomes $matches[1] here. It contains the contents of the curly braces {...}, e.g. "name" from the previous example.
(compare: Changing preg_replace to preg_replace_callback)
For this specific $\1 replacement, it is fairly obviously incomplete: it would only replace with the name of the variable and not (yet) its value.
What is still missing is connecting the name with the original scope, which requires a little more work.
Obtain Variables in preg_replace() Scope
To obtain all variables defined in the same scope as the preg_replace_callback() (previously preg_replace()) call, the get_defined_vars() function is an option:
This function returns a multidimensional array containing a list of all defined variables, be them environment, server or user-defined variables, within the scope that get_defined_vars() is called.
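A minimal sketch of what that means inside a function (local_names() and the sample values are made up for illustration): only the function's parameters and local variables end up in the array, while the global $global does not.

```php
<?php
// Sketch: inside a function, get_defined_vars() returns that function's
// parameters and local variables, not variables from the global scope.
$global = 'not visible inside the function';

function local_names(string $data): array
{
    $name = 'World';
    return array_keys(get_defined_vars()); // names of the defined local variables
}

print_r(local_names('Hello {name}')); // data, name
```

Note that the parameter $data itself is part of the array, so a placeholder {data} would resolve to the template string.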
Using that array within the anonymous callback function then makes it possible to obtain the value of a variable by using its name as array key:
$vars = get_defined_vars(); # <1>
preg_replace_callback('/\{(.*?)\}/', function ($matches) use ($vars) { # <2>
$name = $matches[1];
return $vars[$name]; # <3>
}, $data);
<1> Obtain the variables from the preg_replace() scope.
<2> Make the variables available to the anonymous function (the use language construct).
<3> Access the variable's value by name and return it.
This was the part missing in the question: using the backreference as a variable name to obtain the actual value.
As so often, there are similar ways to achieve the same, some of them more dependent on context. get_defined_vars() is a fairly generic way to create a "variable table" that maps names to their values. But there can be circumstances in which such an array is already available, so there is no need to call that function.
Alternative to get_defined_vars(): Use of $GLOBALS array
This approach has been chosen by Wiktor Stribiżew in his answer:
If the scope is the global scope (likely not, but if it is), the $GLOBALS superglobal can be used instead:
$result = preg_replace_callback('/\{(.*?)\}/', function ($matches) {
$name = $matches[1];
return $GLOBALS[$name];
}, $data);
There is no need to call get_defined_vars() or to unset the $vars array after the call (or otherwise take care of it). But this binds the code to global variable state (which may or may not be an issue for the application).
Alternative to get_defined_vars(): Re-Use of another array (if available)
If the variables were previously imported from an array into the scope where preg_replace() with the e modifier was running, the import is redundant and the array itself can be passed to the callback function via its use clause. An example:
function replace_variables(string $data, array $vars) {
# previously here: extract($vars);
$result = preg_replace_callback('/\{(.*?)\}/', function ($matches) use ($vars) {
$name = $matches[1];
return $vars[$name];
}, $data);
# ...
}
As extract() comes with side effects you normally want to avoid, this kills two birds with one stone: the variables array is already available, so get_defined_vars() need not be called. Additionally, the unsafe extract() operation can be dropped, as it is no longer necessary to create variables in the scope of the earlier preg_replace().
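For reference, here is a completed, runnable sketch of the replace_variables() excerpt. The return statement and the fallback to an empty string for unknown names are assumptions made for this sketch; the excerpt above leaves them open.

```php
<?php
// Completed sketch of replace_variables(): the array is used directly,
// no extract() and no get_defined_vars() needed.
function replace_variables(string $data, array $vars): string
{
    // previously: extract($vars); followed by preg_replace() with /e
    return preg_replace_callback('/{(.*?)}/', function ($matches) use ($vars) {
        $name = $matches[1];
        return isset($vars[$name]) ? (string) $vars[$name] : ''; // unknown names become ''
    }, $data);
}

echo replace_variables('Dear {title} {last}', ['title' => 'Dr.', 'last' => 'Smith']), "\n";
// prints "Dear Dr. Smith"
```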
This should leave enough food for thought to connect the name in the backreference to the value. The PHP manual has more about variable scope in case there is a more specific context. Normally get_defined_vars() should address most issues if an array is not yet available.
Notes for the '/\{(.*?)\}/' Regular Expression Pattern
This pattern comes with some caveats, so I'm leaving some notes here as additional information and to open up the topic of error handling and how it changes due to porting, which will address more issues.
The backslashes "\" are redundant:
Just a minor thing to get it out of the way:
ok.....: '/\{(.*?)\}/'
correct: '/{(.*?)}/'
This change can always be done, as those backslashes are redundant: unescaped, the braces do not form a valid quantifier, so PCRE treats them as literal characters.
This improves readability of the pattern.
Change in Regular Expression Pattern PHP Error Behaviour
The second thing worth noting about the search pattern is a potential incompatibility:
The pattern allows a zero-length match: the empty braces {} match, leading to a zero-length (variable) name. This could be used to provide a default value (e.g. null), but you may prefer that it does not match at all, or you may want to add error handling.
w/ empty.: '/{(.*?)}/'
w/ length: '/{(.+?)}/'
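A small sketch to check how the two patterns treat an empty {} group (group_names() is a made-up helper). The third pattern, '/{([^{}]+)}/', is my own addition and not from the text above: because the lazy dot in '/{(.+?)}/' can still consume a closing brace, that pattern skips {} but can then run on into the next placeholder, while a negated character class cannot.

```php
<?php
// Sketch: contents of the first capturing group for each match.
function group_names(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $matches);
    return $matches[1];
}

$subject = 'a {} b {x}';
foreach (['/{(.*?)}/', '/{(.+?)}/', '/{([^{}]+)}/'] as $pattern) {
    printf("%-14s %s\n", $pattern, json_encode(group_names($pattern, $subject)));
}
// /{(.*?)}/      ["","x"]      empty name matched
// /{(.+?)}/      ["} b {x"]    runs across the first closing brace
// /{([^{}]+)}/   ["x"]
```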
Which brings up a related point: Undefined variable/index warnings.
To prevent undefined index warnings these could resolve to null silently (or you may want to add error handling). This has been done in the upfront code porting suggestion at the very beginning of the answer.
Note, though, that these errors were harsher with the previous preg_replace() call with the e modifier, as the empty name resulted in a parse error when evaluated, followed by a fatal error. Example:
PHP Parse error: syntax error, unexpected ';', expecting variable (T_VARIABLE) or '$' in ... : regexp code on line 1
PHP Fatal error: preg_replace(): Failed evaluating code:
$
To define such errors out of existence in a port compatible with PHP 7.0.0 (and above/below):
$vars = get_defined_vars();
$result = preg_replace_callback('/{(.*?)}/', function ($matches) use ($vars) {
$name = $matches[1];
return isset($vars[$name]) ? $vars[$name] : null;
}, $data);
unset($vars);
Alternatively, it is possible to mimic the old error behaviour (a bit) by throwing an exception (e.g. on an empty name), as an uncaught exception results in a fatal error:
$vars = get_defined_vars();
$result = preg_replace_callback('/{(.*?)}/', function ($matches) use ($vars) {
$name = $matches[1];
if ('' === $name) {
throw new \RuntimeException('preg_replace_callback(): callback: Expected variable name, got zero-length string.');
}
return isset($vars[$name]) ? $vars[$name] : null;
}, $data);
unset($vars);
(If backwards compatibility below PHP 7.0.0 is not an issue, throwing an \Error is a closer match for PHP 7.0.0 and above. Alternatively, use trigger_error() instead to cover versions below PHP 7.0.0 as well.)
However, I'd suggest looking more into how the overall process can be made more error-safe. Even though this depends a lot on the context of the original code and requires a closer look, it allows benefiting from the changes. The following discussion and example will show even more.
Changes in Replacement Pattern (previous Backslash Escapes for Backreferences)
Removing the e (PREG_REPLACE_EVAL) modifier not only requires a callback function but also comes with another change: backslash escapes were previously added to substituted backreferences, but the callback function no longer receives them.
This has been left out so far. To complete the answer, it deserves some attention. First, as a reminder, here is what the (now removed) e modifier documentation said about it:
Single quotes, double quotes, backslashes (\) and NULL chars will be escaped by backslashes in substituted backreferences.
This often led to code that contains one or more calls to stripslashes() within the replacement pattern. That is not the case in this question, so the consequence is simply that backslash escapes are no longer added.
As mario writes in an answer to the reference question:
[...] stripslashes() often becomes redundant in literal expressions.
In this question, it is a little different: as stripslashes() is not part of the replacement pattern, there is nothing redundant to remove from '$\1'.
To demonstrate the change for a double and a single quote within a "variable name", caused by the missing escaping in preg_replace_callback() compared to the e modifier:
Data        | e Modifier     | Callback
----------- | -------------- | ------------
{abc}       | $abc           | $abc
{a"bc}      | $a\"bc (E)     | $a"bc (I)
${${'abc'}} | $${\'abc\' (E) | $${'abc' (I)

(E): PHP Parse error
(I): Invalid variable name (informative only)
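The two replacement columns of the table can be reproduced with a short sketch (replacements() is a made-up helper). It assumes that addslashes(), which escapes single quotes, double quotes, backslashes and NUL, applies the same escaping the e modifier applied to substituted backreferences; the subject itself is left unchanged and only the would-be replacement code is collected.

```php
<?php
// Sketch: the PHP code '$\1' would produce per match, with and without the
// e modifier's backslash escaping.
function replacements(string $subject): array
{
    $rows = [];
    preg_replace_callback('/{(.*?)}/', function ($matches) use (&$rows) {
        $rows[] = [
            'e'        => '$' . addslashes($matches[1]), // what the e modifier evaluated
            'callback' => '$' . $matches[1],             // raw backreference, no escaping
        ];
        return $matches[0]; // keep the subject as-is; only $rows matters here
    }, $subject);
    return $rows;
}

foreach (['{abc}', '{a"bc}', '${${\'abc\'}}'] as $subject) {
    foreach (replacements($subject) as $row) {
        printf("%-12s %-14s %s\n", $subject, $row['e'], $row['callback']);
    }
}
```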
This once more highlights that the original replacement pattern has issues with the name stored in the backreference to the first matching group. As discussed above for a zero-length variable name, the pattern is lax and allows invalid names (which previously could have led to PHP parse errors when the replacement was evaluated).
The backslash escaping added to that. As the regular expression pattern does a lazy match (.*?, the question mark after the asterisk), it was at least not completely free-form.
The port therefore has fewer such issues, but the difference is only a fine one.
Therefore, porting by itself does not address this issue much. Actually, what was a PHP fatal error earlier now turns into an undefined index PHP warning, with the consequence that the script continues to run where it would previously have stopped.
This could be seen as an argument for (or against) failing early with the port - it depends.
It could be done by checking for invalid variable names (assuming those would have caused a fatal parse error during evaluation - not an undefined variable warning).
A PCRE regular expression pattern for variable names in PHP is ^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$. One idea that came to my mind was to use it to classify whether a name is a valid PHP variable name or not.
Additionally, the next example is an opportunity to show how the backslash escapes and the error/warning behaviour can be preserved:
$vars = get_defined_vars();
$result = preg_replace_callback('/{(.*?)}/', function ($matches) use ($vars) {
$name = addslashes($matches[1]); # <1>
if (!preg_match('(^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$)D', $name)) { # <2>
trigger_error("Not a variable name: $name", E_USER_ERROR);
}
if (!array_key_exists($name, $vars)) { # <3>
trigger_error("Undefined variable: $name");
}
return isset($vars[$name]) ? $vars[$name] : null; # <4>
}, $data);
unset($vars);
<1> Backslash-escape single quotes, double quotes, backslashes (\) and NULL chars in backreference \1 (as the e modifier did).
<2> Trigger a fatal error on invalid variable names, as with the e modifier those would have resulted in a parse error followed by a fatal error when evaluating the code.
<3> Trigger a notice (E_USER_NOTICE, the trigger_error() default) on an undefined variable.
<4> Undefined variables (now indexes that are not set) result in null values.
This more verbose example mimics even more of the original behaviour and could therefore be seen as a more complete port. However, it contradicts many of the reasons why the e modifier was deprecated and removed in the first place. So do not apply it blindly; it is an additional example to highlight the differences between the e modifier's eval and the callback version.
This is also the reason I've kept this out of the answer at the top.
PHP Version Compatibility
The port as outlined above is done with an anonymous function and therefore it is compatible with PHP 5.3 or later.
If backwards compatibility is not necessary or as an outlook for a future migration, some comments on more recent PHP versions:
Since PHP 7.4, arrow functions can be used. They have the benefit that the parent scope is automatically inherited ("closed" over), so the use clause becomes redundant. However, variable variables cannot be used with arrow functions to reach the parent scope, which still makes the array as "variable table" necessary, as before. Arrow functions can condense the code, though, especially if the error conditions from the pattern (see the discussion above) have already been removed (not the case in the following example code, which still uses the original pattern):
$vars = get_defined_vars();
$result = preg_replace_callback('/{(.*?)}/', fn($matches) => $vars[$matches[1]] ?? null, $data);
unset($vars);
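A minimal sketch (PHP 7.4+) of why the "variable table" is still needed with arrow functions; arrow_lookup() and its variables are made up for illustration. An arrow function only captures variables that appear literally in its body, so $$name finds nothing, while the array lookup works.

```php
<?php
// Sketch: variable variable vs. "variable table" inside arrow functions.
function arrow_lookup(): array
{
    $abc  = 'value';
    $name = 'abc';

    // $name is captured, but $abc is not (it never appears literally in the body)
    $viaVariableVariable = (fn() => $$name ?? 'undefined')();

    $vars = get_defined_vars();
    $viaTable = (fn() => $vars[$name] ?? 'undefined')(); // $vars and $name are captured

    return [$viaVariableVariable, $viaTable];
}

print_r(arrow_lookup()); // undefined, value
```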
Since PHP 8.0, throw is an expression, so throwing could be another option. For my taste, however, it is not of much benefit, as control is not fine-grained and readability is degraded. Your mileage may vary, though; it is an option since PHP 8.0:
$vars = get_defined_vars();
$result = preg_replace_callback(
'/{(.*?)}/',
fn($matches) =>
$vars[$matches[1]]
?? throw new \Error(sprintf('Expected existing variable name, got "%s" which is undefined', $matches[1])),
$data
);
unset($vars);
You can access global variables using the $GLOBALS superglobal array:
preg_replace_callback('/\{(.*?)\}/', function ($matches) {
return $GLOBALS[$matches[1]];
}, $data);
See the PHP demo:
$data = 'Some {abc} here';
$abc = "Word";
echo preg_replace_callback('/\{(.*?)\}/', function ($matches) {
return $GLOBALS[$matches[1]];
}, $data);
Output:
Some Word here
Related
I am porting an office management system from PHP 5.3 to 5.4, and the errors are caused by many function calls that use call-time pass-by-reference (the "&" symbol in the arguments), which stops the application at runtime. The code looks like this:
//This is a function with pass by reference:
function myfunc(&$x, $y, &$z) { ... }
//This is the above function being called:
$myClass->myfunc(&$var1, $var2, &$var3);
The PHP documentation tells me that I have to remove the "&" when I call the function, so I have to replace that line with:
$myClass->myfunc($var1, $var2, $var3);
because the function definition already has the symbol, meaning that the argument is passed by reference.
But I have many PHP files, more than 800, and I would have to fix them one by one, line by line.
So I just need a regex that helps me locate all these "&$" occurrences (skipping && which means AND, and any "&$" occurrences in a function declaration).
I built this regex: [^&]&\$ and it works, but I need to exclude any occurrence on a line that starts with "function" (that would be a function declaration).
^(\s*function.*)$|(?<!&)&(?=\$[^(]*\))
Try this. This should work for you. Replace by $1.
See demo.
https://regex101.com/r/sH8aR8/35
Your regex [^&]&\$ does not use zero-width assertions. It will replace 3 characters. So it's better to use lookaheads and lookbehinds.
$re = "/^(\\s*function.*)$|(?<!&)&(?=\\$[^(]*\\))/im";
$str = "&\$)\n&&\$\n\$myClass->myfunc(&\$var1, \$var2, &\$var3);\nfunction myfunc(&\$x, \$y, &\$z) { ... }\n\n";
$subst = "$1";
$result = preg_replace($re, $subst, $str);
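Since the question is about more than 800 files, here is a sketch that applies the same regex to every *.php file below a directory. The helper names remove_call_time_refs() and port_directory() are hypothetical; back up the files (or use version control) before running anything like this.

```php
<?php
// Sketch: rewrite call-time pass-by-reference in all *.php files under $dir.
function remove_call_time_refs(string $code): string
{
    // same pattern as above, written as a single-quoted string
    $re = '/^(\s*function.*)$|(?<!&)&(?=\$[^(]*\))/im';
    return preg_replace($re, '$1', $code);
}

function port_directory(string $dir): int
{
    $changed = 0;
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->getExtension() !== 'php') {
            continue; // only touch PHP sources
        }
        $path = $file->getPathname();
        $old  = file_get_contents($path);
        $new  = remove_call_time_refs($old);
        if ($new !== $old) {
            file_put_contents($path, $new);
            $changed++;
        }
    }
    return $changed; // number of files rewritten
}
```

Review the resulting diff afterwards: the lookahead also matches legitimate by-reference syntax that must keep its &, such as foreach ($items as &$item) or a closure's use (&$x).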
I've created a function that cleans a posted title.
function title_var($title_variable) {
$title_variable = mysql_real_escape_string(ucwords(strtolower(trim(htmlspecialchars($title_variable, ENT_QUOTES)))));
return stripslashes($title_variable);
}
I now need to be able to make anything between () or [] all uppercase. For instance "my business name (cbs) limited" or "my business name [cbs] limited", becomes "My Business Name (CBS) Limited", with "CBS" being in all capitals.
I've done the first part of capitalizing all the words; I just need a way of making anything between the brackets uppercase.
Always use context-based escaping
Do not try to build a single function to handle all the possible cases. Just don't. It's pointless. In your function, you're trying to "clean" the string by removing certain characters. You can't clean a string by removing a set of characters. That idea is flawed because you're always going to have to allow the use of some characters that are special in some syntax or the other.
Instead, treat the string according to the context where it's going to be used. For example:
If you are going to use this string in an SQL query, you have to use prepared statements (or mysqli_real_escape_string()) to properly escape the data.
If you're going to output this value in HTML markup, you need to use htmlspecialchars() to escape the data.
If you're going to use it as command-line argument, you need to use escapeshellcmd() or escapeshellarg().
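A small sketch of escaping one raw value per context; the sample title is made up. SQL is left out because it needs a database connection, and prepared statements are the better tool there anyway. The shell output shown assumes a Unix-like system, as escapeshellarg() behaves differently on Windows.

```php
<?php
// Sketch: the same raw value, escaped for the context where it is used.
$title = 'Tom & Jerry "live" <b>';

$html  = htmlspecialchars($title, ENT_QUOTES, 'UTF-8'); // for HTML output
$shell = escapeshellarg($title);                         // for a command-line argument

echo $html, "\n";  // Tom &amp; Jerry &quot;live&quot; &lt;b&gt;
echo $shell, "\n"; // 'Tom & Jerry "live" <b>' (Unix-like systems)
```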
Solving the problem at hand
Use preg_replace_callback() to accomplish this. You can use the following regex to match the text inside the brackets (including the brackets):
[\(\[].*?[\)\]]
Explanation:
[\(\[] - Matches the opening bracket
.*? - Matches the text in between
[\)\]] - Matches the closing bracket
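A quick sketch of what this pattern matches (bracket_matches() is a made-up helper). Because both character classes accept either bracket type, it also matches mismatched pairs like "(odd]". The alternation '/\(.*?\)|\[.*?\]/' is my own stricter variant, not part of the answer, and only pairs ( with ) and [ with ].

```php
<?php
// Sketch: whole matches (brackets included) for a pattern.
function bracket_matches(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $m);
    return $m[0];
}

$subject = 'name (cbs) [uk] (odd]';
print_r(bracket_matches('/[\(\[].*?[\)\]]/', $subject)); // (cbs), [uk], (odd]
print_r(bracket_matches('/\(.*?\)|\[.*?\]/', $subject)); // (cbs), [uk]
```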
$m[0] will contain the entire matched string. You can just transform it into upper-case with strtoupper().
Modifying your function, it becomes just:
function get_title($title) {
$title = ucwords(strtolower(trim($title))); // trim() takes a character list, not ENT_QUOTES
return preg_replace_callback('/[\(\[].*?[\)\]]/', function ($m) {
return strtoupper($m[0]);
}, $title);
}
Demo
Update 5/26
I've fixed the behavior of the regular expressions that were previously contained in this question, but as others have mentioned, my syntax still wasn't correct. Apparently the fact that it compiles is due to PHP's preg_* family of functions overlooking my mistakes.
I'm definitely a PCRE novice so I'm trying to understand what mistakes are present so that I can go about fixing them. I'm also open to critique about design/approach, and as others have mentioned, I am also going to build in compatibility with JSON and YAML, but I'd like to go ahead and finish this home-brewed parser since I have it working and I just need to work on the expression syntax (I think).
Here are all of the preg_match_all references and the one preg_replace reference extracted from the whole page of code:
// matches the outside container of objects {: and :}
$regex = preg_match_all('/\s\{:([^\}]+):\}/i', $this->html, $HTMLObjects);
// double checks that the object container is removed
$markup = preg_replace('/[\{:]([^\}]+):\}/i', '$1', $markup);
// matches all dynamic attributes (those containing bracketed data)
$dynamicRegEx = preg_match_all('/[\n]+([a-z0-9_\-\s]+)\[([^\]]+)\]/', $markup, $dynamicMatches);
// matches all static attributes (simple colon-separated attributes)
$staticRegEx = preg_match_all('/([^:]+):([^\n]+)/', $staticMarkup, $staticMatches);
If you'd like to see the preg_match_all and preg_replace references in context so that you can comment/critique that as well, you can see the containing source file by following the link below.
Note: viewing the source code of the page makes everything much more readable
http://mdl.fm/codeshare.php?htmlobject
Like I said, I have it functioning as it stands, I'm just asking for supervision on my PCRE syntax so that it isn't illegal. However, if you have comments on the structure/design or anything else I'm open to all suggestions.
(Rewritten to reflect new question)
The first regex is correct, but you don't need to escape } within a character class. Also, I usually include both braces to avoid matching nested objects (your regex would match {:foo {:bar:} in the string "{:foo {:bar:} baz:}"); mine would only match {:bar:}. The /i mode modifier is useless since there is no cased text in your regex.
// matches the outside container of objects {: and :}
$regex = preg_match_all('/\s\{:([^{}]+):\}/', $this->html, $HTMLObjects);
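A sketch comparing the original and the suggested first regex on a nested object. The subject starts with a space because both patterns require \s before the opening {:.

```php
<?php
// Sketch: the original pattern runs into a nested object, the suggested one
// only matches the innermost object.
$html = ' {:foo {:bar:} baz:}';

preg_match_all('/\s\{:([^\}]+):\}/i', $html, $original);
preg_match_all('/\s\{:([^{}]+):\}/', $html, $suggested);

var_dump($original[1]);  // "foo {:bar"
var_dump($suggested[1]); // "bar"
```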
In your second regex, there is an incorrect character class at the start that needs to be removed. Otherwise, it's the same.
// double checks that the object container is removed
$markup = preg_replace('/\{:([^{}]+):\}/', '$1', $markup);
Your third regex looks OK; there's another useless character class, though. Again, I've included both brackets in the negated character class. I'm not sure why you've made it case-sensitive - shouldn't there be an /i modifier here?
// matches all dynamic attributes (those containing bracketed data)
$dynamicRegEx = preg_match_all('/\n+([a-z0-9_\-\s]+)\[([^\[\]]+)\]/i', $markup, $dynamicMatches);
The last regex is OK, but it will always match from the very first character of the string until the first colon (and then on to the rest of the line). I think I would add a newline character to the first negated character class to make sure that can't happen:
// matches all static attributes (simple colon-separated attributes)
$staticRegEx = preg_match_all('/([^\n:]+):([^\n]+)/', $staticMarkup, $staticMatches);
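A sketch of the difference with a line that contains no colon (the sample markup is made up): the original pattern swallows that line into the next key, the suggested one does not.

```php
<?php
// Sketch: keys captured by the original and the suggested static-attribute regex.
$staticMarkup = "title: Foo\nno colon here\ncolor: red";

preg_match_all('/([^:]+):([^\n]+)/', $staticMarkup, $original);
preg_match_all('/([^\n:]+):([^\n]+)/', $staticMarkup, $suggested);

var_dump($original[1]);  // "title", "\nno colon here\ncolor"
var_dump($suggested[1]); // "title", "color"
```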
I have a complicated problem:
I have a very long text and I need to call some php functions inside my text.
The function name is myfunction();
I've included the function in my text in the following way:
" text text text myfunction[1,2,3,4,5]; more text text ... "
And I want to replace each myfunction[...] with the result of the function myfunction with the variables from the [] brackets.
My code is:
<?php echo preg_replace('/myfunction[[0-9,]+]/i',myfunction($1),$post['content']); ?>
but it's not working.
The parameter should be an array, because it can contain any number of values.
If I were you, I would avoid using the e modifier to preg_replace because it can leave you open to execution of arbitrary code. Use preg_replace_callback instead. It's slightly more verbose, but much more effective:
echo preg_replace_callback('/myfunction\[([0-9,]+)\]/i', function($matches) {
$args = explode(',', $matches[1]); // separate the arguments
return call_user_func_array('myfunction', $args); // pass the arguments to myfunction
}, $post['content']);
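The question says the parameter should be an array. If myfunction() takes a single array parameter, pass the exploded list as one argument instead of spreading it with call_user_func_array(). In this sketch, myfunction() is a hypothetical stand-in that sums its values, and $content replaces $post['content'].

```php
<?php
// Hypothetical stand-in for the real myfunction(): takes one array parameter.
function myfunction(array $values)
{
    return array_sum($values);
}

$content = 'text text myfunction[1,2,3,4,5]; more text';

$result = preg_replace_callback('/myfunction\[([0-9,]+)\]/i', function ($matches) {
    $args = array_map('intval', explode(',', $matches[1])); // "1,2,3" -> [1, 2, 3]
    return (string) myfunction($args);                      // one array argument
}, $content);

echo $result, "\n"; // text text 15; more text
```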
This uses an anonymous function. This functionality won't be available to you if you use a version of PHP before 5.3. You'll have to create a named function and use that instead, as per the instructions on the manual page.
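You can use preg_replace()'s "e" modifier (for EVAL), as long as you are on a PHP version below 7.0.0 (it was deprecated in PHP 5.5.0 and removed in PHP 7.0.0), like this: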
$text = preg_replace('/myfunction\[(.*?)\]/e', 'myfunction("$1")', $text);
I didn't really get how your data is structured so it's all I can do to help you at the moment. You can explore that solution.
From the PHP Manual:
e (PREG_REPLACE_EVAL)
If this modifier is set, preg_replace() does normal substitution of backreferences in the replacement string, evaluates it as PHP code, and uses the result for replacing the search string. Single quotes, double quotes, backslashes (\) and NULL chars will be escaped by backslashes in substituted backreferences.
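You need to add the "e" modifier, escape [ and ] in the regular expression, wrap the digits in a capturing group so that $1 refers to something, and stringify the second argument (this only works before PHP 7.0.0, which removed the e modifier):
preg_replace('/myfunction\[([0-9,]+)\]/ei','myfunction("$1")',$post['content']);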
I am stuck with parsing a string containing key-value pairs with operators in between (like the one below) in PHP. I am planning to use regex to parse it (I am not good at it though).
key: "value" & key2 : "value2" | title: "something \"here\"..." &( key: "this value in paranthesis" | key: "another value")
Basically the units in the above block are as follows
key - Anything that qualifies as a JavaScript variable name.
value - Any string long or short but enclosed in double quotes ("").
pair - (key:value) A key and value combined by colon just like in javascript objects.
operator - (& or |) Simply indicating 'AND' or 'OR'.
There can be multiple blocks nested within parentheses ( and ).
Inspired by Matt (http://stackoverflow.com/questions/2467955/convert-javascript-regular-expression-to-php-pcre-expression), I have used the following regular expressions.
$regs[':number'] = '(?:-?\\b(?:0|[1-9][0-9]*)(?:\\.[0-9]+)?(?:[eE][+-]?[0-9]+)?\\b)';
$regs[':oneChar'] = '(?:[^\\0-\\x08\\x0a-\\x1f\"\\\\]|\\\\(?:[\"/\\\\bfnrt]|u[0-9A-Fa-f]{4}))';
$regs[':string'] = '(?:\"'.$regs[':oneChar'].'*\")';
$regs[':varName'] = '\\$(?:'.$regs[':oneChar'].'[^ ,]*)';
$regs[':func'] = '(?:{[ ]*'.$regs[':oneChar'].'[^ ]*)';
$regs[':key'] = "({$regs[':varName']})";
$regs[':value'] = "({$regs[':string']})";
$regs[':operator'] = "(&|\|)";
$regs[':pair'] = "(({$regs[':key']}\s*:)?\s*{$regs[':value']})";
if(preg_match("/^{$regs[':value']}/", $query, $matches))
{
print_r($matches);
}
When executing the above, PHP throws an error near the IF condition
Warning: preg_match() [function.preg-match]: Unknown modifier '\' in /home/xxxx/test.xxxx.com/experiments/regex/index.php on line 23
I have tried to preg_match with :string and :oneChar but still I get the same error.
Therefore I feel there is something wrong in the :oneChar regex. Kindly help me resolve this issue.
I see at least one error in the second regular expression ($regs[':oneChar']): it contains a forward slash, which conflicts with the forward slashes used as delimiters in preg_match(). Use preg_match("#^{$regs[':value']}#", $query, $matches) instead.
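You do not need preg_quote() on the input string: it is meant for literal text that you interpolate into a pattern, not for the subject, and running it over $query would add backslashes before characters such as : | ( ) and change the text being matched. Where you do interpolate literal text into a pattern, pass the delimiter as the second argument, e.g. preg_quote($text, '/').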
Beyond that, I would run each of your regular expressions one at a time to see which one is throwing the error.
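A sketch reproducing the problem and the fix, rebuilding only the three regs needed for :value (copied from the question). The sample $query starts with a quoted value because the pattern is anchored with ^. With "/" as delimiter, the "/" inside [\"/...] ends the pattern early and the following backslash is read as a modifier, so preg_match() returns false (the warning is suppressed here with @).

```php
<?php
// Sketch: the question's :value pattern with "/" and "#" delimiters.
$regs[':oneChar'] = '(?:[^\\0-\\x08\\x0a-\\x1f\"\\\\]|\\\\(?:[\"/\\\\bfnrt]|u[0-9A-Fa-f]{4}))';
$regs[':string']  = '(?:\"'.$regs[':oneChar'].'*\")';
$regs[':value']   = "({$regs[':string']})";

$query = '"something \"here\"..." & key: "x"';

$slash = @preg_match("/^{$regs[':value']}/", $query, $matches); // false: "Unknown modifier '\'"
$hash  = preg_match("#^{$regs[':value']}#", $query, $matches);  // 1

var_dump($slash, $hash, $matches[1]);
```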