Backreference does not work in PHP

Lately I've been studying regex (mostly in practice, to tell the truth), and I'm starting to notice its power. Through a question I asked (link), I learned about backreferences. I think I understand how they work: they work in JavaScript, but not in PHP.
For example, I have these strings:
[b]Text B[/b]
[i]Text I[/i]
[u]Text U[/u]
[s]Text S[/s]
And I use the following regex:
\[(b|i|u|s)\]\s*(.*?)\s*\[\/\1\]
Testing it on regex101.com works, and it works in JavaScript too, but it does not work in PHP.
Example of preg_replace (not working):
echo preg_replace(
"/\[(b|i|u|s)\]\s*(.*?)\s*\[\/\1\]/i",
"<$1>$2</$1>",
"[b]Text[/b]"
);
While this way works:
echo preg_replace(
"/\[(b|i|u|s)\]\s*(.*?)\s*\[\/(b|i|u|s)\]/i",
"<$1>$2</$1>",
"[b]Text[/b]"
);
I can't understand what I'm doing wrong. Thanks to everyone who helps me.

It is because you use a double-quoted string. Inside a double-quoted string, \1 is read as an octal escape sequence (it produces the control character SOH, start of heading), not as a backslash followed by 1.
So there are two fixes:
Use a single-quoted string:
'/\[(b|i|u|s)\]\s*(.*?)\s*\[\/\1\]/i'
Or escape the backslash to obtain a literal backslash (for the string, not for the pattern):
"/\[(b|i|u|s)\]\s*(.*?)\s*\[\/\\1\]/i"
As an aside, you can write your pattern like this:
$pattern = '~\[([bius])]\s*(.*?)\s*\[/\1]~i';
// with the \g{n} notation (Perl 5.10 syntax, also supported by PCRE)
$pattern = '~\[([bius])]\s*(.*?)\s*\[/\g{1}]~i';
// the same notation, but relative:
// (\g{-2} is the second group to the left of the current position)
$pattern = '~\[([bius])]\s*(.*?)\s*\[/\g{-2}]~i';
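A minimal sketch to make the difference visible: it compares the two literals byte by byte and runs both through preg_replace() on the sample tags from the question (the variable names are only illustrative).

```php
<?php
// The same pattern as a double-quoted and as a single-quoted PHP literal.
$double = "/\[(b|i|u|s)\]\s*(.*?)\s*\[\/\1\]/i"; // PHP turns "\1" into chr(1)
$single = '/\[(b|i|u|s)\]\s*(.*?)\s*\[\/\1\]/i'; // "\1" reaches PCRE as a backreference

var_dump(strpos($double, chr(1)) !== false); // bool(true)
var_dump(strpos($single, '\1') !== false);   // bool(true)

$input = "[b]Text B[/b]\n[i]Text I[/i]\n[u]Text U[/u]\n[s]Text S[/s]";

// The double-quoted pattern requires a literal chr(1), so nothing is replaced.
var_dump(preg_replace($double, '<$1>$2</$1>', $input) === $input); // bool(true)

// The single-quoted pattern uses the real backreference.
echo preg_replace($single, '<$1>$2</$1>', $input), "\n";
```

The last line prints each tag converted to HTML, for example `<b>Text B</b>`.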

Related

PHP preg_replace pattern only seems to work if it's wrong?

I have a string that looks like this:
../Clean_Smarty_Projekt/tpl/templates_c\.
../Clean_Smarty_Projekt/tpl/templates_c\..
I want to replace ../, \. and \.. with a regular expression.
Previously, I did it like this:
$result = str_replace(array("../","\..","\."),"",$str);
The array has to be in this order, because changing the order makes the output a little buggy. So I decided to use a regular expression instead.
Now I came up with this pattern:
$result = preg_replace('/(\.\.\/)|(\\[\.]{1,2})/',"",$str);
which actually returns only empty strings...
The culprit is (\\[\.]{1,2}).
In Regex101 it's all OK. (It took me a couple of minutes to realize that I don't need the /g flag with preg_replace.)
If I use this pattern in preg_replace, I have to write (\\\\[\.]{1,2}) to get it to work. But that seems obviously wrong, because I'm not searching for two backslashes.
Of course I know the escaping rules (escaping slashes).
Why doesn't this match correctly?
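I suggest using a different delimiter, such as ~, so you don't have to escape the /. The backslash needs extra escaping no matter which delimiter you use: PHP consumes one level of escaping in the string literal and the regex engine consumes another, so it takes three \\\ or four \\\\ backslashes in the literal to match a single backslash. The dot after it also needs its own escape, otherwise it matches any character:
$string = '../Clean_Smarty_Projekt/tpl/templates_c\.'."\n".'../Clean_Smarty_Projekt/tpl/templates_c\..';
// '\\\\' reaches the regex engine as \\ (one literal backslash), '\.' as \. (a literal dot)
echo preg_replace('~\.\./|\\\\\.{1,2}~', '', $string);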
Output:
Clean_Smarty_Projekt/tpl/templates_c
Clean_Smarty_Projekt/tpl/templates_c
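A quick way to see the two escaping layers is to echo the pattern string before handing it to preg_match(). In the sketch below (the names $loose and $strict are just for illustration), three backslashes before the dot still match a backslash, but they leave the dot itself unescaped.

```php
<?php
// Single-quoted PHP literals: only \\ and \' are escapes, everything else is kept.
$loose  = '~\\\.{1,2}~';   // engine sees ~\\.{1,2}~  : backslash + any 1-2 chars
$strict = '~\\\\\.{1,2}~'; // engine sees ~\\\.{1,2}~ : backslash + 1-2 literal dots

// Printing the literals shows what the regex engine actually receives.
echo $loose, "\n", $strict, "\n";

$sample = 'a\xb'; // four characters: a, backslash, x, b
var_dump(preg_match($loose, $sample));  // int(1): "\xb" is matched
var_dump(preg_match($strict, $sample)); // int(0): no dot follows the backslash
```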

RegEx to match value of a variable or a string (with or without quotes)

Here is my dilemma:
I wrote this RegEx pattern, which works in my sandbox but does not work on my website:
Sandbox: http://regex101.com/r/vP3uG4
Pattern:
(.*[$]'.$variable.'\s*=\s*\'?)(.*?)(\'?;.*)
The line of code goes like this:
$savedsettings_new = preg_replace('/(.*[$]'.$variable.'\s*=\s*\'?)(.*?)(\'?;.*)/is','$1'. $value .'$3',$savedsettings_temp);
As you can see, it works in the sandbox but not live.
I am trying to match values of variables that can be expressed as strings (with single quotes around them) or as numerical values with no quotes, like so:
$match_string = 'value';
$match_number = 1;
Right now this code works fine with strings, but with numerical values that are not enclosed in quotes I just get the contents of the backreference $3 and nothing at all before it!
I'm scratching my head and really can't figure out why it works on RegEx101 but not live... Aren't I doing the right thing by matching one or zero single quotes (and escaping them, because the preg_replace pattern is itself in single quotes)?
Okay, I found the issue. The solution is to wrap the backreference numbers in ${}.
Quoting the PHP manual:
When working with a replacement pattern where a backreference is immediately followed by another number (i.e.: placing a literal number immediately after a matched pattern), you cannot use the familiar \\1 notation for your backreference. \\11, for example, would confuse preg_replace() since it does not know whether you want the \\1 backreference followed by a literal 1, or the \\11 backreference followed by nothing. In this case the solution is to use \${1}1.
So, your code should look like:
header('Content-Type: text/plain');
$variable = 'tbs_development';
$value = '333';
$savedsettings_temp = <<<'CODE'
$tbs_underconstruction = 'foo';
$tbs_development = 0;
CODE;
$pattern = '/(.*[$]'.preg_quote($variable, '/').'\s*=\s*\'?)(.*?)(\'?;.*)/is';
$replacement = '${1}'.$value.'${3}';
$savedsettings_new = preg_replace($pattern, $replacement, $savedsettings_temp);
echo $savedsettings_new;
Output:
$tbs_underconstruction = 'foo';
$tbs_development = 333;
If the variable $value contains a numerical value, then the replacement pattern in your preg_replace will look like this: $12$3
preg_replace() does not read that as $1 followed by a literal 2. References in the replacement string can go from 0 to 99, so $12 (like \12) is taken as a reference to capture group 12. Your pattern has only three groups, so $12 is replaced with an empty string, which is why you only get the contents of $3.
To avoid this ambiguity, wrap the backreference number in {}, so the replacement becomes ${1}2${3}.
Change your replacement pattern to '${1}'.$value.'${3}'
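The difference is easy to reproduce with a tiny, made-up subject string (the pattern and value below are illustrative, not the asker's):

```php
<?php
$subject = '$x = 0;';             // single quotes: no variable interpolation
$pattern = '/(.*=\s*)(.*?)(;.*)/s';
$value   = '2';

// '$1' . '2' . '$3' builds "$12$3": $12 refers to group 12, which does not exist.
$ambiguous = preg_replace($pattern, '$1' . $value . '$3', $subject);

// '${1}2${3}' keeps the group number separate from the literal digit.
$braced = preg_replace($pattern, '${1}' . $value . '${3}', $subject);

var_dump($ambiguous); // string(1) ";"
var_dump($braced);    // string(7) "$x = 2;"
```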

Can't get Regex working in PHP, works in RegEXP program

Here is the input I am searching:
\u003cspan class=\"prs\">email_address#me.com\u003c\/span>
Trying to just return email_address#me.com.
My regex class=\\"prs\\">(.*?)\\ returns "class=\"prs\">email_address#me.com\" in RegExp, which is OK; I can work with that result.
But I can't get it to work in PHP.
$regex = "/class=\\\"prs\\\">(.*?)\\/";
This gives me the error "No ending delimiter".
Can someone please help?
Your original code:
$regex = "/class=\\\"prs\\\">(.*?)\\/";
The reason you get No ending delimiter is that, although you are escaping the backslash before the closing forward slash, you have only escaped it in the context of the PHP string, not in the context of the regex engine.
So the PHP string escaping mechanism does its thing, and by the time the regex engine gets it, it will look like this:
/class=\"prs\">(.*?)\/
This means that the regular expression engine will see the backslash at the end of the expression as escaping the forward slash that you are intending to use to close the expression.
The usual PHP solution to this kind of thing is to switch to a single-quoted string instead of a double-quoted one, but that still won't work here, as \\ is an escaped backslash in both single- and double-quoted strings.
What you need to do is double up the number of backslash characters at the end of your string, so your code needs to look like this:
$regex = "/class=\\\"prs\\\">(.*?)\\\\/";
The way to prove what it's doing is to print the contents of the $regex variable, so you can see what the string will look like to the regex engine. These kinds of errors are actually very hard to spot, but looking at the actual content of the string will help you spot them.
Hope that helps.
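Following the advice to print the pattern, here is a small sketch that echoes the string the regex engine receives and then extracts the address. It assumes the input really contains literal backslashes, as shown in the question; in that case the \" in the pattern must also reach the engine as \\", which a single-quoted literal makes fairly easy.

```php
<?php
// The sample input from the question, assumed to contain literal backslashes.
$input = '\u003cspan class=\"prs\">email_address#me.com\u003c\/span>';

// Single-quoted: \\\" becomes \\" and \\\\ becomes \\ before PCRE sees it.
$regex = '/class=\\\"prs\\\">(.*?)\\\\/';

echo $regex, "\n"; // prints /class=\\"prs\\">(.*?)\\/

if (preg_match($regex, $input, $m)) {
    echo $m[1], "\n"; // prints email_address#me.com
}
```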
If you change to single quotes, you still need to double the backslashes before the closing delimiter; then it should work:
$regex = '/class=\\\"prs\\\">(.*?)\\\\/';

Regex pattern matching literal repeated \n

Given a literal string such as:
Hello\n\n\n\n\n\n\n\n\n\n\n\nWorld
I would like to reduce the repeated \n's to a single \n.
I'm using PHP and have been playing around with a bunch of different regex patterns. Here's a simple example of the code:
$testRegex = '/(\\n){2,}/';
$test = 'Hello\n\n\n\n\n\n\n\n\nWorld';
$test2 = preg_replace($testRegex ,'\n',$test);
echo "<hr/>test regex<hr/>".$test2;
I'm new to PHP, not that new to regex, but it seems '\n' conforms to special rules. I'm still trying to nail those down.
Edit: I've placed the literal code from my PHP file here. If I use str_replace() I can get good things to happen, but that's obviously not a complete solution.
To match a literal \n with regex, your string literal needs four backslashes to produce a string with two backslashes, which the regex engine interprets as an escape for one backslash.
$testRegex = '/(\\\\n){2,}/';
$test = 'Hello\n\n\n\n\n\n\n\n\n\n\n\nWorld';
$test2 = preg_replace($testRegex, '\n', $test);
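To see which kind of \n each pattern targets, a short sketch can run patterns from this thread against a single-quoted string (literal backslash-n) and a double-quoted one (real newlines); the variable names are only for illustration.

```php
<?php
$literal = 'Hello\n\n\n\nWorld'; // single quotes: each \n is a backslash and an n
$real    = "Hello\n\n\n\nWorld"; // double quotes: each \n is one newline character

// '\\n' in single quotes reaches PCRE as \n, which means a real newline,
// so it finds nothing in $literal.
$unchanged = preg_replace('/(\\n){2,}/', '\n', $literal);

// '\\\\n' reaches PCRE as \\n: a literal backslash followed by n.
$collapsedLiteral = preg_replace('/(\\\\n){2,}/', '\n', $literal);

// For real newlines, match \n directly and replace with a real newline.
$collapsedReal = preg_replace('/\n{2,}/', "\n", $real);

var_dump($unchanged === $literal);              // bool(true)
var_dump($collapsedLiteral === 'Hello\nWorld'); // bool(true)
var_dump($collapsedReal === "Hello\nWorld");    // bool(true)
```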
Perhaps you need to double up the escape in the regular expression?
$pattern = "/\\n+/";
$awesome_string = preg_replace($pattern, "\n", $string);
Edit: I just read your comment on the accepted answer. This doesn't apply to your case, but it's still useful.
If you're intending to expand this logic to other forms of white-space too:
$output = preg_replace('%(\s)+%', '$1', $input);
Reduces all repeated white-space characters to single instances of the matched white-space character.
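A quick check of the white-space version, written with + rather than * so that no empty matches are produced. Note that $1 holds the last character of each run, so a mixed run such as " \n " collapses to a space (the sample input is made up):

```php
<?php
$input = "a  b\t\tc \n d";
// (\s)+ matches a whole run of white-space; $1 holds the last character of that run.
$output = preg_replace('%(\s)+%', '$1', $input);
var_dump($output === "a b\tc d"); // bool(true)
```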
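It does follow special rules: in a pattern, \n matches a real newline character, not the two-character sequence \ followed by n. If your string contains real newlines, your pattern would look like
$pattern = '/(\n)+/';
which should provide you with the matches. The "multiline" modifier m is not needed for this; it only changes how ^ and $ behave. See the doc for all modifiers and their detailed meaning.
Since you're trying to reduce all newlines to one, the pattern above should work with the rest of your code, as long as the input contains real newlines. Good luck!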
Try this regular expression:
/\n+/
(With * instead of +, it would also match the empty string between every pair of characters.)
