Use preg_replace() to add two backslashes before each match - php

I have the code below. What do I need to change to get the result mercedes\\-benz instead of mercedes\-benz?
$value = 'mercedes-benz';
$pattern = '/(\+|-|\/|&&|\|\||!|\(|\)|\{|}|\[|]|\^|"|~|\*|\?|:|\\\)/';
$replace = '\\\\${1}';
echo preg_replace($pattern, $replace, $value);

Welcome to the joys of "leaning toothpick syndrome" - backslash is such a commonly used escape character that it frequently requires escaping multiple times. Let's have a look at your case:
Required output (presumably because of some other escaping context): \\
Escape each \ with an additional \ for use in the PCRE regex engine: \\\\
Escape each \ there for use in a PHP string: \\\\\\\\
$value = 'mercedes-benz';
$pattern = '/(\+|-|\/|&&|\|\||!|\(|\)|\{|}|\[|]|\^|"|~|\*|\?|:|\\\)/';
$replace = '\\\\\\\\${1}';
echo preg_replace($pattern, $replace, $value);
As mickmackusa points out, you can get away with six rather than eight backslashes in some cases, such as a replacement of '\\\\\\'; this works because the regex engine sees \\\, which is an escaped backslash (\\) followed by a single backslash (\) that can't be escaping anything because it's the end of the string. Simply doubling for each "layer" of escaping is probably safer than learning when this short-cut is and isn't valid, though.
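To make the layers concrete, here is a minimal sketch that compares the eight-backslash and six-backslash replacements when a ${1} reference follows them. The single-group pattern '/(-)/' is a simplified stand-in for the full pattern, chosen only for illustration.

```php
<?php
// Sketch only: '/(-)/' is a simplified stand-in for the full pattern.
$value   = 'mercedes-benz';
$pattern = '/(-)/';

// Eight backslashes in source -> four in the PHP string -> two in the output.
$eight = '\\\\\\\\${1}';
// Six backslashes in source -> three in the PHP string; the third one now sits
// directly in front of the $, so the ${1} reference is no longer expanded.
$six   = '\\\\\\${1}';

echo strlen('\\\\\\\\'), "\n";                      // backslashes left after PHP parsing
echo preg_replace($pattern, $eight, $value), "\n";  // mercedes\\-benz
echo preg_replace($pattern, $six, $value), "\n";    // not the desired result
```

This suggests the six-backslash short-cut is only safe when nothing that the replacement parser treats specially (such as a $ reference) follows the backslashes.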

I can't be sure that I've 100% translated your original attempt, but this works for your lone sample input.
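The pattern uses a character class and curly braced quantifiers to improve readability and brevity. Wrapping the alternatives in a lookahead makes each match zero-width, so the backslashes are inserted in front of the matched characters and the replacement string needs no reference. (A \K placed after the character class would instead put them after the matched character.)
Code:
$value = 'mercedes-benz';
$pattern = '`(?=&{2}|\|{2}|[-+/!(){}[\]^"~*?:\\\])`';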
$replace = '\\\\\\';
echo preg_replace($pattern, $replace, $value);
Ultimately, the trick was to keep adding backslashes to the replacement to get them to show up.
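When the replacement carries no reference, where the backslashes land depends on where the zero-width match sits. The following sketch, reduced to a single hyphen for illustration, compares \K after the character with a lookahead before it.

```php
<?php
$value   = 'mercedes-benz';
$replace = '\\\\\\'; // six in source, three in the PHP string, two in the output

// \K after the hyphen: the reported match is the empty string AFTER the hyphen.
$after  = preg_replace('/-\K/', $replace, $value);
// Lookahead: the match is the empty string BEFORE the hyphen.
$before = preg_replace('/(?=-)/', $replace, $value);

echo $after, "\n";  // mercedes-\\benz
echo $before, "\n"; // mercedes\\-benz
```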


Regex for matching single-quoted strings fails with PHP

So I have this regex:
/'((?:[^\\']|\\.)*)'/
It is supposed to match single-quoted strings while ignoring internal, escaped single quotes \'
It works here, but when executed with PHP, I get different results. Why is that?
This might be easier using a negative lookbehind. Note also that you need to escape the backslashes twice: once to tell PHP that you want a literal backslash, and then again to tell the regex engine that you want a literal backslash.
Note also that your capturing expression (.*) is greedy - it will capture everything between ' characters, including other ' characters, whether they are escaped or not. If you want it to stop after the first unescaped ', use .*? instead. I have used the non-greedy version in my example below.
<?php
$test = "This is a 'test \' string' for regex selection";
$pattern = "/(?<!\\\\)'(.*?)(?<!\\\\)'/";
echo "Test data: $test\n";
echo "Pattern: $pattern\n";
if (preg_match($pattern, $test, $matches)) {
    echo "Matches:\n";
    var_dump($matches);
}
This is kinda escaping hell. Although there's already an accepted answer, the original pattern is actually better. Why? It allows escaping the escape character itself. The same idea can be written with the "unrolling the loop" technique described by Jeffrey Friedl in "Mastering Regular Expressions": "([^\\"]*(?:\\.[^\\"]*)*)" (shown here with double quotes; adapted for single quotes in the sample code)
Unrolling the Loop (using double quotes)
" # the start delimiter
([^\\"]* # anything but the closing quote or the escape char
(?:\\. # the escape char preceding an escaped char (any char)
[^\\"]* # anything but the closing quote or the escape char
)*) # repeat
" # the end delimiter
This does not resolve the escaping hell but you have been covered here as well:
Sample Code:
$re = '/\'([^\\\\\']*(?:\\\\.[^\\\\\']*)*)\'/';
$str = '\'foo\', \'can\\\'t\', \'bar\'
\'foo\', \' \\\'cannott\\\'\\\\\', \'bar\'
';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
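To see why handling an escaped escape character matters, here is a small sketch that runs the lookbehind pattern and the unrolled pattern from above against a hypothetical input whose first quoted string ends in an escaped backslash. The input text and variable names are assumptions made for illustration.

```php
<?php
// Actual text: 'a\\' and 'b'  (the first quoted string ends with an escaped backslash)
$str = "'a\\\\' and 'b'";

$lookbehind = "/(?<!\\\\)'(.*?)(?<!\\\\)'/";
$unrolled   = '/\'([^\\\\\']*(?:\\\\.[^\\\\\']*)*)\'/';

preg_match_all($lookbehind, $str, $m1);
preg_match_all($unrolled, $str, $m2);

// The lookbehind rejects the first closing quote because a backslash precedes it,
// so it runs on to the next quote and swallows " and ".
var_dump($m1[1]);
// The unrolled pattern consumes the backslashes in pairs and finds both strings.
var_dump($m2[1]);
```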

Replacing backslash characters with preg_replace

I have this piece of code:
$data = 'Test\\vv testing\\vv';
$data = preg_replace('/(\\vv)/', '\test', $data);
Yet after the preg_replace, $data is still Test\vv testing\vv.
I don't understand why. On any regex tester, the regex matches the parts of the string I want to replace, which are the '\vv' parts.
A backslash character (\) is considered to be an escape character by both PHP's parser and the regular expression engine (PCRE). If you write a single backslash character, it will be considered as an escape character by PHP parser. If you write two backslashes, it will be interpreted as a literal backslash by PHP's parser. But when used in a regular expression, the regular expression engine also picks it up as an escape character.
So in order to match a literal backslash in a PHP regex, you need to write four backslashes: \\\\
From the PHP manual documentation on Escape Sequences:
Single and double quoted PHP strings have special meaning of backslash. Thus if \ has to be matched with a regular expression \\, then "\\\\" or '\\\\' must be used in PHP code.
Your code should be:
$data = preg_replace('/(\\\\vv)/', '\replaced', $data);
Note: As Bob0t correctly said in the comments, you do not need preg_replace() for this task since it only involves plain strings and not patterns. A plain ol' str_replace() should suffice:
$data = str_replace('\\vv', '\test', $data);
In PHP you can use:
$re = "/\\\\vv/";
$str = "Test\\vv testing\\vv";
$subst = '\\replaced';
$result = preg_replace($re, $subst, $str);
In PHP you need to use \\\\ to match a single backslash \. The first level of escaping is for the PHP string and the second is for the underlying PCRE regex engine.
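As a quick check of the two levels, this sketch runs the question's data through both the two-backslash and the four-backslash pattern. With only two, the regex engine receives \vv, and \v is PCRE's escape for vertical whitespace, so nothing matches.

```php
<?php
$data = 'Test\\vv testing\\vv'; // actual text: Test\vv testing\vv

// '/(\\vv)/' reaches PCRE as /(\vv)/: vertical whitespace followed by "v".
$unchanged = preg_replace('/(\\vv)/', '\test', $data);

// '/(\\\\vv)/' reaches PCRE as /(\\vv)/: a literal backslash followed by "vv".
$replaced = preg_replace('/(\\\\vv)/', '\test', $data);

echo $unchanged, "\n"; // Test\vv testing\vv
echo $replaced, "\n";  // Test\test testing\test
```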

Is it really required to escape backslashes in regex patterns?

This might be a dumb question, but I have trouble understanding why the following code works as expected
$text = "ab cd";
$text = preg_replace("/\s+/", "", $text);
echo $text;
and outputs abcd.
Shouldn't the backslash in \s be escaped to get its literal meaning inside the regular expression?
Not necessarily, because the string literal rules say that if \ is followed by anything other than another \ or a ' it is treated as any other character. This general rule also affects double-quoted strings, although in that case there are more recognized escape sequences than just these two.
You could escape it if you wanted to, but personally I think the world has enough backslashes already.
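A short sketch of that rule: because \s is not an escape sequence PHP recognizes, the escaped and unescaped spellings produce the same string, whereas a recognized sequence such as \n does not behave that way.

```php
<?php
$plain   = "/\s+/";  // \s is not a PHP escape sequence, so the backslash survives
$escaped = "/\\s+/"; // explicit escape: PHP collapses \\ to a single backslash
$single  = '/\s+/';

var_dump($plain === $escaped); // bool(true)
var_dump($plain === $single);  // bool(true)
echo strlen($plain), "\n";     // 5

// Contrast: \n IS a double-quoted escape sequence, so the two spellings differ.
var_dump("\n" === '\n');       // bool(false)
```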

matching either nothing (beginning of string) or any character but a \

To use a simplified example, I have:
$str = "Hello :special_text:! Look, I can write \:special_text:";
$pattern = /*???*/":special_text:";
$res = preg_replace($pattern, 'world', $str);
$res = str_replace("/:", ":", $res);
$res === "Hello world! Look, I can write :special_text:"; // => true
In other words, I'd like to be able to "escape" something that I'm writing.
I think that I have something almost working (using [^:]? as the first part of the pattern), but I don't think that works if $str === ":special_text:", in that ^ doesn't match [^:]?.
You can use a negative lookbehind:
(?<!\\):special_text:
This says "replace a :special_text: that isn't preceded by a backslash".
In your second step, the str_replace, it looks like you want to replace \: with :.
Also, don't forget that if you use backslashes in PHP strings you need to escape them once more (if you want a literal \ you need to write \\ in PHP, and to get a literal \\ you need to write \\\\):
$pattern = '#(?<!\\\\):([^:]+):#';
Here the # is just a regex delimiter.
$pattern = "/[^\\\\]*:special_text:/";
-or-
$pattern = "/(?<!\\\\):special_text:/";
The other answers don't take into account the need to super-escape the backslashes in this situation. It's a little crazy.
To match a literal backslash, one has to write \\\\ as the regex string because the regular expression must be \\, and each backslash must be expressed as \\ inside a string literal. In regexes that feature backslashes repeatedly, this leads to lots of repeated backslashes and makes the resulting strings difficult to understand.
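Putting the lookbehind and the extra escaping together, a minimal sketch of the whole round trip might look like this. It uses single-quoted strings throughout and assumes the intent is to strip the escaping backslash afterwards.

```php
<?php
$str     = 'Hello :special_text:! Look, I can write \:special_text:';
$pattern = '/(?<!\\\\):special_text:/'; // PCRE receives (?<!\\):special_text:

$res = preg_replace($pattern, 'world', $str);
$res = str_replace('\\:', ':', $res);   // turn the escaped \: back into :

echo $res, "\n"; // Hello world! Look, I can write :special_text:

// The lookbehind also succeeds at the very start of the string.
echo preg_replace($pattern, 'world', ':special_text:'), "\n"; // world
```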
Something like this should do it: /[^\\]\:([a-z]+)\:/i
You can use RegexPal to test your regex against possible strings in real time.

how do i correct this regular expressions pattern for php

How do I make this match the following text correctly?
$string = "(\'streamer\',\'http://dv_fs06.ovfile.com:182/d/pftume4ksnroarhlslexwl7bcnoqyljeudgmd7dimssniu2b2r2ikr2h/video.flv\')";
preg_match("/streamer\\'\,\\\'(.*?)\\\'\)/", $string , $result);
var_dump($result);
Your $string looks weird. Better to make a three pass parse:
$string = str_replace(array("\'"), '', $string);
Now we have string:
"(streamer,http://dv_fs06.ovfile.com:182/d/pftume4ksnroarhlslexwl7bcnoqyljeudgmd7dimssniu2b2r2ikr2h/video.flv)"
Now let's trim brackets:
$string = trim($string, '()');
And finally, explode:
list($streamer, $url) = explode(',', $string, 2);
No need of regex.
Btw, your string looks like it was badly escaped with slashes for a MySQL query.
It's been a while since I last did regexp matching in PHP, but I think you have to remember that:
' doesn't need to be escaped in PHP strings enclosed by "
\ always needs to be escaped in PHP strings
\ needs to be escaped yet another time in regexps (for it's a special character and you want to treat it as a normal one)
=> A \ that is part of the text to be matched must be written as four backslashes (\\\\) in the PHP source.
My suggestion:
preg_match("/\\(\\\\'streamer\\\\',\\\\'(.*?)\\\\'\\)/", $string , $result);
You're on the right track. There are two barriers to overcome (as codethief says):
1 - Double-quoted string interpolation
2 - Regex escape interpretation
For (2), neither commas nor quotes need to be escaped, because they are not metacharacters
special to regexes. Only the literal backslash needs to be escaped; otherwise,
in regex context, it represents the start of a metacharacter sequence (like \s).
For (1), PHP will try to interpret escaped characters as control codes (like \n); for
that reason the literal backslash needs to be escaped. Since the string is double-quoted,
\' (the escaped single quote) has no escape meaning.
Therefore, "\\\'" resolves as \\ -> \ followed by \' -> \', giving \\', which is what the regex sees.
The regex then interprets the sequence \\' as a literal \ followed by '.
Making a slight change of your regex solves the problem:
preg_match("/streamer\\\',\\\'(.*?)\\\'\)/", $string , $result);
A working example is here http://beta.ideone.com/47EIY
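Echoing the pattern is a simple way to see what the regex engine actually receives after PHP has processed the double-quoted string. The sketch below does that with the adjusted pattern; the shortened example.com URL is a hypothetical stand-in for the original one.

```php
<?php
// Hypothetical shortened URL, used only to keep the example readable.
$string  = "(\'streamer\',\'http://example.com/video.flv\')";
$pattern = "/streamer\\\',\\\'(.*?)\\\'\)/";

echo $pattern, "\n"; // what PCRE receives after PHP's string parsing

if (preg_match($pattern, $string, $result)) {
    echo $result[1], "\n"; // http://example.com/video.flv
}
```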
