Replacing backslash characters with preg_replace - php

I have this piece of code:
$data = 'Test\\vv testing\\vv';
$data = preg_replace('/(\\vv)/', '\test', $data);
Yet after the preg_replace, $data is still Test\vv testing\vv.
I don't understand why. On any regex tester, the regex matches the parts of the string I want to replace, which are the '\vv' parts.

A backslash character (\) is considered to be an escape character by both PHP's parser and the regular expression engine (PCRE). If you write a single backslash character, it will be considered as an escape character by PHP parser. If you write two backslashes, it will be interpreted as a literal backslash by PHP's parser. But when used in a regular expression, the regular expression engine also picks it up as an escape character.
So in order to match a literal backslash in a PHP regex, you need to write four backslashes: \\\\.
From the PHP manual documentation on Escape Sequences:
Single and double quoted PHP strings have special meaning of backslash. Thus if \ has to be matched with a regular expression \\, then "\\\\" or '\\\\' must be used in PHP code.
Your code should be:
$data = preg_replace('/(\\\\vv)/', '\replaced', $data);
Note: As Bob0t correctly said in the comments, you do not need preg_replace() for this task since it only involves plain strings and not patterns. A plain ol' str_replace() should suffice:
$data = str_replace('\\vv', '\test', $data);
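A quick way to see the two layers is to run the original and the corrected pattern side by side. This is only a sketch; the variable names ($broken, $fixed, $unchanged, $viaRegex, $viaString) are made up for illustration.

```php
<?php
// Contents after PHP parses the literal: Test\vv testing\vv
$data = 'Test\\vv testing\\vv';

// PHP turns '\\vv' into \vv, so PCRE sees \v (vertical whitespace) followed by v.
$broken = '/(\\vv)/';
// PHP turns '\\\\vv' into \\vv, so PCRE sees an escaped backslash followed by vv.
$fixed = '/(\\\\vv)/';

$unchanged = preg_replace($broken, '\\test', $data);
$viaRegex  = preg_replace($fixed, '\\test', $data);
$viaString = str_replace('\\vv', '\\test', $data);

var_dump($unchanged === $data);      // the broken pattern matched nothing
var_dump($viaRegex === $viaString);  // both give Test\test testing\test
```

Printing a pattern with echo before handing it to preg_replace() shows exactly what PCRE receives, which is usually the fastest way to debug this kind of problem.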

In PHP you can use:
$re = "/\\\\vv/";
$str = "Test\\vv testing\\vv";
$subst = '\\replaced';
$result = preg_replace($re, $subst, $str);
In PHP you need to use \\\\ to match a single backslash \. This is because the first level of escaping is for the PHP string and the second is for the underlying PCRE regex engine.

Related

Regex escape escape characters in PHP

So I have this regex that works on regex101.com
(?:[^\#\\S\\+]*)
It matches the first from first#second.
Whenever I try to use my regex with PHP's preg_replace I don't get the result I expect.
So far I tried it via preg_quote():
\(\?\:\[\^\\#\\S\\\+\]\*\)
And tried it with escaping the original \\ with 4 \'s:
\(\?\:\[\^\\#\\\\S\\\\\+\]\*\)
Still no success. Am I doing something fundamentally wrong?
I'm just using:
preg_replace("/$regex/", "", $string);
All my other regexes that don't need so many escape chars work perfectly that way.
When you use (?:[^\#\\S\\+]*) in a preg_match in PHP, in either a single- or double-quoted string literal, the \\S is parsed as the non-whitespace pattern \S. [^\S] is equal to \s, i.e. it matches whitespace.
The preg_quote() function is only meant to turn an arbitrary string into a literal one for a regex. It just escapes all chars that are special regex metacharacters / operators (like (, ), [, etc.), so you should not use it here.
While you could use a regex to match 1+ chars other than whitespace and # from the start of a string like preg_match('~^[^#\s]+~', $s, $match), you can just explode your input string with # and get the 0th item.
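Both options from the last paragraph, sketched on the sample input from the question (the variable names are only illustrative):

```php
<?php
$s = 'first#second';

// Option 1: split on '#' and take the first piece (limit 2 stops after the first '#').
$parts = explode('#', $s, 2);
$first = $parts[0];

// Option 2: anchored character class, one or more chars that are neither '#' nor whitespace.
$matched = preg_match('~^[^#\s]+~', $s, $match) === 1 ? $match[0] : null;

var_dump($first, $matched);
```

Note that the two differ on input with whitespace before the #: explode() keeps it, while the regex stops at the first whitespace character.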

regular expression for remove duplicate slashes

I have the following task. I have a lot of strings that can contain duplicate backslashes. I need to replace any run of duplicate backslashes with a single backslash, but when the run is followed by a quote, a double quote, or NUL (the NULL byte), all the backslashes should be removed. Thanks. My language is PHP. Some tests:
$s1 = 'test\\\\string';
// test\string
$s2 = 'test\\\\\"\\\\\'\\\\string';
// test"'\string
$s3 = 'test\\string\\\\\"';
// test\string"
Use
preg_replace("~\\\\+([\"\'\\x00\\\\])~", "$1", $string);
to replace arbitrary amounts of \ with just one \.
The pattern consists of an arbitrary number of initial backslashes (\\\\+) and a following symbol that is one of ", ', \x00, or \. The replacement effectively removes any preceding backslashes.
You need 4 backslashes in your regular expression. Two backslashes (\\) will lead to one backslash (\) inside the regular expression string because the PHP interpreter uses backslashes to escape special characters like " or \. For the same reason you will need two backslashes inside the regular expression itself: PCRE also treats a backslash as an escape character.
Or explained the other way around: To gain \+ as regular expression, you have to add a backslash to tell PCRE that the one backslash is not for escaping the +. To get \\+ as a string you will also need to add one backslash before each backslash to tell the PHP interpreter that you don't want to escape the second backslash with the first.
source code: \\\\+
inside regular expression string: \\+
pattern matches: \+
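To check the pattern above against the three test strings from the question, something like this should work. It is a sketch only; $cases and $results are names invented here.

```php
<?php
// The three test strings from the question, mapped to the results the asker expects.
$cases = array(
    'test\\\\string'             => 'test\\string',
    'test\\\\\"\\\\\'\\\\string' => 'test"\'\\string',
    'test\\string\\\\\"'         => 'test\\string"',
);

// What PCRE receives after PHP parses this literal: ~\\+(["\'\x00\\])~
$pattern = "~\\\\+([\"\'\\x00\\\\])~";

$results = array();
foreach ($cases as $input => $expected) {
    // Single-quoted '$1' so there is no doubt about PHP-level interpolation.
    $results[] = preg_replace($pattern, '$1', $input) === $expected;
}
var_dump($results);
```

Each entry in $results should be true if the pattern behaves as described.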
Replace 2 or more consecutive backslashes with a single backslash:
preg_replace('/\\\\+/','\\',$str);
Alternative way.
$s = 't\est\\\\\\\\\\\\stri\\\\\"\\\\\'\\\\0\\\\ng';
$s = preg_replace('~\\\\+~', '\\', $s);
$s = str_replace(array('\\"', '\\\'', '\\0'), array('"', '\'', "\0"), $s);
Try this:
preg_replace("/\\\\+(['\"\\x00\\\\])/", "$1", $string);
What's wrong with stripslashes? It accounts for slashes that escape a "special" character and removes "extra" slashes.

matching either nothing (beginning of string) or any character but a \

To use a simplified example, I have:
$str = "Hello :special_text:! Look, I can write \:special_text:";
$pattern = /*???*/":special_text:";
$res = preg_replace($pattern, 'world', $str);
$res = str_replace("/:", ":", $res);
$res === "Hello world! Look, I can write :special_text:"; // => true
In other words, I'd like to be able to "escape" something that I'm writing.
I think that I have something almost working (using [^:]? as the first part of pattern), but I don't think that works if $str === ":special_text:", in that ^ doesn't match [^:]?.
You can use a negative lookbehind:
(?<!\\):special_text:
This says "replace a :special_text: that isn't preceded by a backslash".
In your second str_replace, it looks like you want to replace \: with :.
Also, don't forget that if you use a backslash in PHP strings you need to escape it once more (if you want a literal \ you need to write \\ in PHP, and to get a literal \\ you need to write \\\\):
$pattern = '#(?<!\\\\):([^:]+):#';
Here the # is just a regex delimiter.
$pattern = "/[^\\\\]*:special_text:/";
-or-
$pattern = "/(?<!\\\\):special_text:/";
The other answers don't take into account the need to super-escape the backslashes in this situation. It's a little crazy.
To match a literal backslash, one has to write \\\\ as the regex string because the regular expression must be \\, and each backslash must be expressed as \\ inside a string literal. In regexes that feature backslashes repeatedly, this leads to lots of repeated backslashes and makes the resulting strings difficult to understand.
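Putting the lookbehind version together with the un-escaping step from the question, a minimal sketch might look like this (the input is written single-quoted here so that \: reaches the string unchanged):

```php
<?php
$str = 'Hello :special_text:! Look, I can write \:special_text:';

// '\\\\' in PHP source becomes \\ for PCRE, which matches one literal backslash.
$pattern = '/(?<!\\\\):special_text:/';

$res = preg_replace($pattern, 'world', $str);
// Un-escape what is left: '\\:' is the two-character string \:
$res = str_replace('\\:', ':', $res);

var_dump($res === 'Hello world! Look, I can write :special_text:');
```

Because the lookbehind is zero-width, it also works when :special_text: sits at the very start of the string.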
Something like this should do it: /[^\\]\:([a-z]+)\:/i
You can use RegexPal to test your regex against possible strings in realtime.

how do i correct this regular expressions pattern for php

How do I make this match the following text correctly?
$string = "(\'streamer\',\'http://dv_fs06.ovfile.com:182/d/pftume4ksnroarhlslexwl7bcnoqyljeudgmd7dimssniu2b2r2ikr2h/video.flv\')";
preg_match("/streamer\\'\,\\\'(.*?)\\\'\)/", $string , $result);
var_dump($result);
Your $string looks weird. Better to make a three pass parse:
$string = str_replace(array("\'"), '', $string);
Now we have string:
"(streamer,http://dv_fs06.ovfile.com:182/d/pftume4ksnroarhlslexwl7bcnoqyljeudgmd7dimssniu2b2r2ikr2h/video.flv)"
Now let's trim brackets:
$string = trim($string, '()');
And finally, explode:
list($streamer, $url) = explode(',', $string, 2);
No need of regex.
Btw, your string looks like it was crappily slashed in a MySQL query.
It's been a while since I last did regexp matching in PHP, but I think you have to remember that:
' doesn't need to be escaped in PHP strings enclosed by "
\ always needs to be escaped in PHP strings
\ needs to be escaped yet another time in regexps (for it's a special character and you want to treat it as a normal one)
=> \ as part of the string to be matched must be written as four backslashes.
My suggestion:
preg_match("/\\(\\\\'streamer\\\\',\\\\'(.*?)\\\\'\\)/", $string , $result);
You're on the right track. There are two barriers to overcome (as codethief says):
1 - Double-quoted string interpolation
2 - Regex escape interpretation
For (2), neither commas nor quotes need to be escaped because they are not metachars
special to regexes. Only the backslash as a literal needs to be escaped, otherwise
in regex context, it represents the start of a metachar sequence (like \s).
For (1), PHP will try to interpolate escaped chars as a control code (like \n); for
that reason the literal backslash needs to be escaped. Since this is double quoted,
\' (the escaped single quote) has no escape meaning.
Therefore, in "\\\'" the \\ becomes \ and the \' stays \', giving \\', which is what the regex sees.
The regex then interprets the sequence \\' as a literal \ followed by '.
Making a slight change of your regex solves the problem:
preg_match("/streamer\\\',\\\'(.*?)\\\'\)/", $string , $result);
A working example is here http://beta.ideone.com/47EIY
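If counting backslashes across two layers gets tiring, one option is to take PHP's string escaping out of the picture with nowdoc syntax (PHP 5.3+), so that only the regex-level escaping remains. A sketch, with example.com standing in for the real host and path from the question:

```php
<?php
// Nowdoc: the text between the markers is taken literally, with no PHP escape processing.
$string = <<<'TXT'
(\'streamer\',\'http://example.com/video.flv\')
TXT;

// Only regex escaping is left: \\ matches one backslash, \) matches a literal ")".
$pattern = <<<'RE'
/streamer\\',\\'(.*?)\\'\)/
RE;

$url = preg_match($pattern, $string, $result) === 1 ? $result[1] : null;
var_dump($url);
```

The pattern is the same one the regex engine sees in the double-quoted version above, just written without the extra PHP layer.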

How to use regex to match this html tag?

I can't seem to figure out what I'm doing wrong...
I'm trying to find matches of
<cite>stuffhere</cite>
Is this right?
preg_match_all('<cite>(.*?)</cite>/ms', $str, $matches)
Escape the /:
preg_match_all('/<cite>(.*?)<\/cite>/ms', $str, $matches);
Your confusion is not your fault; PHP is notoriously weird in this area.
In most programming languages, you create a regex object one of two ways. If the language supports regexes as a first-class language element, you can use a regex literal:
var re = /<b>"\w+"<\/b>/; // JavaScript
Here, the forward-slash (/) is the regex delimiter; if you want to match a literal /, you have to escape it with a backslash: \/.
In other languages, you have to write the regex in the form of a string literal, which you then pass to a constructor or a factory method:
Pattern p = Pattern.compile("<b>\"\\w+\"</b>"); // Java
The forward-slash doesn't need to be escaped, but both the double-quote (") and backslash (\) do, because of their special meanings in string literals.
But PHP is unique: it doesn't support regex literals, so you have to write the regex as a string, but the string has to look like a regex literal! That is, it has to have string delimiters (quotes) and regex delimiters. For example:
$re = '/<b>"\w+"<\/b>/';
It isn't all bad; as you can see, you can use PHP's single-quoted strings instead of double-quoted, so you don't have to escape all backslashes and double-quotes. You can also choose different regex delimiters, so you don't have to escape (for example) literal forward-slashes in your regex:
$re = '~<cite>(.*?)</cite>~s';
The modifiers ('s' for single-line, 'i' for ignore-case, etc.) go after the trailing regex delimiter, as in Perl or JavaScript. Almost any ASCII punctuation character can be used as a regex delimiter; ~ and # are popular choices.
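For instance, with ~ as the delimiter and the s modifier, a match that spans a line break is still picked up. The sample input below is made up for illustration.

```php
<?php
// The second <cite> spans two lines.
$str = "<p><cite>first</cite> and <cite>second\nline</cite></p>";

// ~ as delimiter: the / in </cite> needs no escaping.
// s modifier: . also matches newlines.
preg_match_all('~<cite>(.*?)</cite>~s', $str, $matches);

var_dump($matches[1]);
```

$matches[0] holds the full matches including the tags; $matches[1] holds only the captured text.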
You should use an HTML Parser to parse html, or you will end up with unexpected errors. However, this is what your regex should be:
'#<cite>(.*?)</cite>#s'
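As for the parser route, here is a rough sketch using DOMDocument. It assumes the dom extension is available (it usually is), and the sample markup is invented.

```php
<?php
$html = '<p><cite>first</cite> and <cite>second</cite></p>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$cites = array();
foreach ($doc->getElementsByTagName('cite') as $node) {
    $cites[] = $node->textContent;
}
var_dump($cites);
```

Unlike the regex, this keeps working when the tags carry attributes or are nested inside other markup.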
Try this:
preg_match_all('/<cite>(.*?)<\/cite>/ms', $str, $matches);
