regex searching for a backslash - php

Why is it that, when searching for a backslash in a regex, you need to write the backslash 4 times?
Example:
$pattern = '/\\\\/';
$string = 'to\m';
preg_match( $pattern, $string, $matches );
echo "<pre>";
print_r($matches);
echo "</pre>";
Returns:
Array
(
[0] => \
)

Because there are two levels of parsing being done, once by PHP, and a second time by the regular expression engine:
The intended target: \
Well I need to put that in a string without it escaping the character after it: "\\", PHP sees \
Now I need to feed that into a regex: "\\\\" PHP sees \\, regex engine sees \
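The two parsing levels can be checked directly. A minimal sketch, reusing the pattern and string from the question; the strlen() call is only there to show what PHP actually hands to the regex engine:

```php
<?php
// Level 1: the PHP parser reduces each \\ pair in the literal to one backslash.
$pattern = '/\\\\/';
echo $pattern, "\n";          // prints /\\/
echo strlen($pattern), "\n";  // prints 4

// Level 2: the regex engine reads \\ as one literal backslash.
$string = 'to\m';
echo preg_match($pattern, $string, $matches), "\n"; // prints 1
echo $matches[0], "\n";       // prints \
```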
The function preg_quote() will remove a layer of confusion for you by escaping all regular expression metacharacters for you. eg:
$foo = preg_quote("c:\\some\\path\\or_whatever", "/"); // 2nd argument also escapes the delimiter
preg_match("/$foo/", $bar);
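A small sketch of the preg_quote() approach in use. The $path and $subject values are assumed sample data, not anything from the question:

```php
<?php
// Assumed sample data: a Windows path and some text that contains it.
$path    = 'c:\\some\\path\\or_whatever';            // the string c:\some\path\or_whatever
$subject = 'Opening c:\\some\\path\\or_whatever now';

// preg_quote() adds the regex-level escaping; the second argument also
// escapes the delimiter in case the input contains one.
$quoted = preg_quote($path, '/');
echo $quoted, "\n";

$found = preg_match('/' . $quoted . '/', $subject);
var_dump($found); // int(1)
```

You still have to write the PHP-level escaping in the literal yourself; preg_quote() only takes care of the regex level.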
edit
You seem to be thinking of this as "units of \\", which doesn't seem like an accurate depiction of what is happening. For a better example let's use a different character that is also significant in both PHP and regular expressions, $.
Intended target: $
Escaping for a PHP string: "\$", the literal string seen by PHP is $
Escaping for a PHP string to be interpreted as a literal $ in a regular expression:
"\\\$", PHP sees the literal string \$, the regular expression sees the literal string $
Illustrated with different styles of braces representing different levels of escaping:
0: $ $
1: \$ [\$]
2: \\\$ [{\\}{\$}]
0: \ \
1: \\ [\\]
2: \\\\ [{\\}{\\}]
0: \\server\$c\Windows
1: [\\][\\]server[\\][\$]c[\\]Windows
2: [{\\}{\\}][{\\}{\\}]server[{\\}{\\}][{\\}{\$}]c[{\\}{\\}]Windows
Which also illustrates why dealing with Windows paths sucks butts.
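The levels above can also be generated rather than written by hand. This is only a sketch: preg_quote() produces level 1, and var_export() is used here as a stand-in for level 2. Note that var_export() emits a single-quoted literal, so its output will not look exactly like the double-quoted forms shown above.

```php
<?php
// Level 0: the text we actually want to match, \\server\$c\Windows
$target = '\\\\server\\$c\\Windows';

// Level 1: escaped for the regex engine.
$regex = preg_quote($target, '/');

// Level 2: the whole pattern escaped again as a PHP string literal.
$source = var_export('/^' . $regex . '$/', true);

echo $target, "\n", $regex, "\n", $source, "\n";

// The level-1 pattern matches the level-0 text.
var_dump(preg_match('/^' . $regex . '$/', $target)); // int(1)
```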

This is because the backslash has a special meaning in both a php string and a regular expression, so you must escape it twice:
To match a single backslash, the pure regex should be:
/\\/
If it was:
/\/
the backslash would escape the forward slash, producing an invalid regex that matches a single forward slash but is missing its closing delimiter.
Then, this pure regex is put into a php string, and each backslash is again escaped:
'/\\\\/'
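To see the difference, here is a quick check of both patterns. The subject string 'a\b' is an assumed example, and the @ operator is only there to silence the warning PHP emits for the broken pattern:

```php
<?php
$subject = 'a\b'; // the three characters a, \, b

// '/\/' reaches the regex engine as /\/ : the backslash escapes the closing
// slash, so no ending delimiter is found and preg_match() returns false.
$bad = @preg_match('/\/', $subject);
var_dump($bad);  // bool(false)

// '/\\\\/' reaches the regex engine as /\\/ : one literal backslash.
$good = preg_match('/\\\\/', $subject);
var_dump($good); // int(1)
```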

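Because the backslash is an escape character at both levels, it has to be escaped twice: the regex engine needs \\ to mean one literal backslash, and each of those two backslashes must itself be written as \\ in the PHP string, which gives \\\\.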

Related

PHP/Regex: Get content between Stars but not if there's a leading backslash

I would like to get everything between two stars, except if they have a leading backslash.
So for example:
*hello* world
should return "hello", but
*hello \* world*
should return "hello * world"
I tried the following regex:
/(?<!\\)\*(.+?)(?<!\\)\*/s
which works perfectly on http://regex101.com/, but PHP returns:
Warning: preg_replace(): Compilation failed: missing ) at offset 21
What am I doing wrong?
--
EDIT 1:
Here's my PHP-Code for that:
var_dump(preg_replace('/(?<!\\)\*(.+?)(?<!\\)\*/s', '<strong>$1</strong>', '*hello world*'));
You are not escaping the backslashes correctly which results in escaping the ) character.
To match a \ in PHP you need 4 backslashes
/(?<!\\\\)\*(.+?)(?<!\\\\)\*/s
It must be done like this because every backslash in a C-like string
must be escaped by a backslash. That would give us a regular
expression with 2 backslashes, as you might have assumed at first.
However, each backslash in a regular expression must be escaped by a
backslash, too. This is the reason that we end up with 4 backslashes.
Or use a character class. Note that it still needs 4 backslashes in the PHP string:
/(?<![\\\\])\*(.+?)(?<![\\\\])\*/s
In PCRE the backslash remains an escape character inside a character
class, so the class must contain \\ at the regex level. With only 2
backslashes in the PHP string, the engine receives [\], reads \] as an
escaped closing bracket, and fails to compile because the class is never
terminated. As before, each regex-level backslash must be escaped once
more by another backslash because it sits in a C-like string.
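For reference, a minimal sketch running the 4-backslash pattern against both inputs from the question:

```php
<?php
// In a single-quoted string \\\\ becomes \\ for the regex engine; \* is passed through unchanged.
$pattern = '/(?<!\\\\)\*(.+?)(?<!\\\\)\*/s';

$a = preg_replace($pattern, '<strong>$1</strong>', '*hello* world');
$b = preg_replace($pattern, '<strong>$1</strong>', '*hello \* world*');

echo $a, "\n"; // <strong>hello</strong> world
echo $b, "\n"; // <strong>hello \* world</strong>
```

The escaped star is kept as \* inside the captured text; removing that backslash afterwards would be a separate step.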
Edit Found this article which explains why this is necessary. Also, added explanations.
You can use this regex:
\*(.*?(?<!\\))\*
Working demo

Backslash in Regex- PHP

I am trying to learn Regex in PHP and stuck in here now. My ques may appear silly but pls do explain.
I went through a link:
Extra backslash needed in PHP regexp pattern
But I just could not understand something:
In the answer he mentions two statements:
2 backslashes are used for unescaping in a string ("\\\\" -> \\)
1 backslash is used for unescaping in the regex engine (\\ -> \)
My ques:
What does the word "unescaping" actually mean? What is the purpose of unescaping?
Why do we need 4 backslashes to include it in the regex?
The backslash has a special meaning in both regexen and PHP. In both cases it is used as an escape character. For example, if you want to write a literal quote character inside a PHP string literal, this won't work:
$str = ''';
PHP would get "confused" which ' ends the string and which is part of the string. That's where \ comes in:
$str = '\'';
It escapes the special meaning of ', so instead of terminating the string literal, it is now just a normal character in the string. There are more escape sequences like \n as well.
This now means that \ is a special character with a special meaning. To escape this conundrum when you want to write a literal \, you'll have to escape literal backslashes as \\:
$str = '\\'; // string literal representing one backslash
This works the same in both PHP and regexen. If you want to write a literal backslash in a regex, you have to write /\\/. Now, since you're writing your regexen as PHP strings, you need to double escape them:
$regex = '/\\\\/';
One pair of \\ is first reduced to one \ by the PHP string escaping mechanism, so the actual regex is /\\/, which is a regex which means "one backslash".
I think you can use "preg_quote()":
http://php.net/preg_quote
This function escapes special chars, so you can give an input as it is, without escaping by yourself:
<?php
$string = "online 24/7. Only for \o/";
$escaped_string = preg_quote($string, "/"); // 2nd param is optional and used if you want to escape also the delimiter of your regex
echo $escaped_string; // $escaped_string: "online 24\/7\. Only for \\o\/"
?>

matching either nothing (beginning of string) or any character but a \

To use a simplified example, I have:
$str = "Hello :special_text:! Look, I can write \:special_text:";
$pattern = /*???*/":special_text:";
$res = preg_replace($pattern, 'world', $str);
$res = str_replace("/:", ":", $res);
$res === "Hello world! Look, I can write :special_text:"; // => true
In other words, I'd like to be able to "escape" something that I'm writing.
I think that I have something almost working (using [^:]? as the first part of the pattern), but I don't think that works if $str === ":special_text:", in that ^ doesn't match [^:]?.
You can use a negative lookbehind:
(?<!\\):special_text:
This says "replace a :special_text: that isn't preceded by a backslash".
In your second statement, the str_replace, it looks like you want to replace \: with : (your code has "/:" instead).
See it in action here.
Also, don't forget that if you use backslashes in PHP strings you need to escape them once more (if you want a literal \ you need to use PHP \\, and to get a literal \\ you need to use PHP \\\\):
$pattern = '#(?<!\\\\):([^:]+):#';
Here the # is just a regex delimiter.
$pattern = "/(^|[^\\\\]):special_text:/"; // the group also captures the preceding character, so put $1 back in the replacement
-or-
$pattern = "/(?<!\\\\):special_text:/";
The other answers don't take into account the need to super-escape the backslashes in this situation. It's a little crazy.
To match a literal backslash, one has to write \\\\ as the regex string because the regular expression must be \\, and each backslash must be expressed as \\ inside a string literal. In regexes that feature backslashes repeatedly, this leads to lots of repeated backslashes and makes the resulting strings difficult to understand.
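One way to cut down the repeated backslashes is a nowdoc, in which PHP does no escape processing at all, so the pattern is written exactly as the regex engine should see it. A sketch using the lookbehind pattern and the sample string from the question (with \: as the escaped form):

```php
<?php
// Nowdoc: no PHP-level escaping, so \\ here is already the regex-level \\.
$pattern = <<<'RE'
/(?<!\\):special_text:/
RE;

$str = 'Hello :special_text:! Look, I can write \:special_text:';

// Replace only the occurrences that are not preceded by a backslash...
$res = preg_replace($pattern, 'world', $str);
// ...then turn the escaped form \: back into a plain colon.
$res = str_replace('\:', ':', $res);

echo $res, "\n"; // Hello world! Look, I can write :special_text:
```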
Something like this should do it: /[^\\]\:([a-z]+)\:/i
You can use RegexPal to test your regex against possible strings in realtime.

PHP Regex for matching a UNC path

I'm after a bit of regex to be used in PHP to validate a UNC path passed through a form. It should be of the format:
\\server\something
... and allow for further sub-folders. It might be good to strip off a trailing slash for consistency although I can easily do this with substr if need be.
I've read online that matching a single backslash in PHP requires 4 backslashes (when using a "C like string") and think I understand why that is (PHP escaping (e.g. 2 = 1, so 4 = 2), then regex engine escaping (the remaining 2 = 1)). I've seen the following two quoted as equivalent suitable regex to match a single backslash:
$regex = "/\\\\/s";
or apparently this also:
$regex = "/[\\]/s";
However these produce different results, and that is slightly aside from my final aim to match a complete UNC path.
To see if I could match two backslashes I used the following to test:
$path = "\\\\server";
echo "the path is: $path <br />"; // which is \\server
$regex = "/\\\\\\\\/s";
if (preg_match($regex, $path))
{
echo "matched";
}
else
{
echo "not matched";
}
The above however seems to match on two or more backslashes :( The pattern is 8 slashes, translating to 2, so why would an input of 3 backslashes ($path = "\\\\\\server") match?
I thought perhaps the following would work:
$regex = "/[\\][\\]/s";
and again, no :(
Please help before I jump out a window lol :)
Use this little gem:
$UNC_regex = '=^\\\\\\\\[a-zA-Z0-9-]+(\\\\[a-zA-Z0-9`~!##$%^&(){}\'._-]+([ ]+[a-zA-Z0-9`~!##$%^&(){}\'._-]+)*)+$=s';
Source: http://regexlib.com/REDetails.aspx?regexp_id=2285 (adapted to PHP string escaping)
The RegEx shown above matches for valid hostname (which allows only a few valid characters) and the path part behind the hostname (which allows many, but not all characters)
Sidenote on the backslashes issue:
When you use double quotes (") to enclose your string, you must be aware of PHP special character escaping: "\\" is a single \ in PHP.
Important: even with single quotes (') those backslashes must be escaped.
A PHP string with single quotes takes everything in the string literally (unescaped) with a few exceptions:
A backslash followed by a backslash (\\) is interpreted as a single backslash.
('C:\\*.*' => C:\*.*)
A backslash followed by a single-quote (\') is interpreted as a single quote.
('I\'ll be back' => I'll be back)
A backslash followed by anything else is interpreted as a backslash.
('Just a \ somewhere' => Just a \ somewhere)
Also, you must be aware of PCRE escape sequences.
The RegEx parser uses \ to introduce escape sequences and character classes (such as \d or \s), so you need to escape it again for the RegEx.
To match two \\ you must write $regex = "\\\\\\\\" or $regex = '\\\\\\\\'
From the PHP docs on PCRE escape sequences:
Single and double quoted PHP strings have special meaning of backslash. Thus if \ has to be matched with a regular expression \\, then "\\\\" or '\\\\' must be used in PHP code.
Regarding your Question:
why would an input of 3 backslashes ($path = "\\\\\\server") match with regex "/\\\\\\\\/s"?
The reason is that you have no boundaries defined (use ^ for beginning and $ for end of string), thus it finds \\ "somewhere" resulting in a positive match. To get the expected result, you should do something like this:
$regex = '/^\\\\\\\\[^\\\\]/s';
The RegEx above has 2 modifications:
^ at the beginning to only match two \\ at the beginning of the string
[^\\] negative character class to say: not followed by an additional backslash
Regarding your last RegEx:
$regex = "/[\\][\\]/s";
You have a confusion (see above for clarification) with backslash escaping here. "/[\\][\\]/s" is interpreted by PHP as /[\][\]/s, which makes the RegEx fail to compile: inside the character class, \] is read as an escaped bracket, so the class is never closed.
This variant of your RegEx would work, but it would also match any occurrence of two backslashes, for the reason I already explained above:
$regex = '/[\\\\][\\\\]/s';
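The points above can be verified with a short script. This is just a sketch; the variable names are made up for illustration, and @ only silences the compile warning of the broken pattern:

```php
<?php
$two   = '\\\\server';    // the string \\server
$three = '\\\\\\server';  // the string \\\server

$loose  = '/\\\\\\\\/';          // regex \\\\       : two backslashes anywhere
$strict = '/^\\\\\\\\[^\\\\]/';  // regex ^\\\\[^\\] : exactly two at the start

$looseOnThree  = preg_match($loose, $three);   // 1: finds \\ inside \\\
$strictOnThree = preg_match($strict, $three);  // 0: third character is a backslash
$strictOnTwo   = preg_match($strict, $two);    // 1

// Two backslashes per class: the engine sees [\][\] and cannot compile it.
$brokenClass = @preg_match('/[\\][\\]/', $two);      // false
// Four backslashes per class: the engine sees [\\][\\].
$fixedClass  = preg_match('/[\\\\][\\\\]/', $two);   // 1

var_dump($looseOnThree, $strictOnThree, $strictOnTwo, $brokenClass, $fixedClass);
```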
Echo your regex as well, so you can see the actual pattern and verify that it's correct; writing those slashes inside PHP can become awkward.
Also you should put ^ at the beginning of the pattern to match from string start and $ to the end to specify that the whole string has to be matched.
\\server\something
Regex:
~^\\\\server\\something$~
PHP String:
$pattern = '~^\\\\\\\\server\\\\something$~';
For the repetition, you want to say that a server exists and it's followed by one or more \something parts. If server is like something, this can be simplified:
^\\(?:\\[a-z]+){2,}$
PHP String:
$pattern = '~^\\\\(?:\\\\[a-z]+){2,}$~';
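Wrapped in a function, the simplified pattern could be used roughly like this. is_simple_unc() is a hypothetical helper name, the scalar type hints assume PHP 7 or later, and the [a-z]+ segments only accept lowercase letters, so this is a sketch rather than a complete UNC validator:

```php
<?php
// Hypothetical helper: true for \\server\share[\sub...] made of lowercase letters.
function is_simple_unc(string $path): bool
{
    // Strip trailing backslashes first, as suggested in the question.
    $path = rtrim($path, '\\');
    return preg_match('~^\\\\(?:\\\\[a-z]+){2,}$~', $path) === 1;
}

var_dump(is_simple_unc('\\\\server\\something'));       // bool(true)
var_dump(is_simple_unc('\\\\server\\something\\sub\\')); // bool(true)
var_dump(is_simple_unc('\\\\server'));                  // bool(false), no share part
var_dump(is_simple_unc('\\server\\something'));         // bool(false), only one leading backslash
```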
As there was some confusion about how \ characters should be written inside single quoted strings:
# Output:
#
# * Definition as '\\' ....... results in string(1) "\"
# * Definition as '\\\\' ..... results in string(2) "\\"
# * Definition as '\\\\\\' ... results in string(3) "\\\"
$slashes = array(
'\\',
'\\\\',
'\\\\\\',
);
foreach($slashes as $i => $slashed) {
$definition = sprintf('%s ', var_export($slashed, 1));
ob_start();
var_dump($slashed);
$result = rtrim(ob_get_clean());
printf(" * Definition as %'.-12s results in %s\n", $definition, $result);
}

how do i correct this regular expressions pattern for php

How do i make this match the following text correctly?
$string = "(\'streamer\',\'http://dv_fs06.ovfile.com:182/d/pftume4ksnroarhlslexwl7bcnoqyljeudgmd7dimssniu2b2r2ikr2h/video.flv\')";
preg_match("/streamer\\'\,\\\'(.*?)\\\'\)/", $string , $result);
var_dump($result);
Your $string looks weird. Better to make a three pass parse:
$string = str_replace(array("\'"), '', $string);
Now we have string:
"(streamer,http://dv_fs06.ovfile.com:182/d/pftume4ksnroarhlslexwl7bcnoqyljeudgmd7dimssniu2b2r2ikr2h/video.flv)"
Now let's trim brackets:
$string = trim($string, '()');
And finally, explode:
list($streamer, $url) = explode(',', $string, 2);
No need of regex.
Btw, your string looks like it was crappily slashed in a MySQL query.
It's been a while since I last did regexp matching in PHP, but I think you have to remember that:
' doesn't need to be escaped in PHP strings enclosed by "
\ always needs to be escaped in PHP strings
\ needs to be escaped yet another time in regexps (for it's a special character and you want to treat it as a normal one)
=> \ as part of the string to be matched must be escaped 4 times.
My suggestion:
preg_match("/streamer\\\\',\\\\'(.*?)\\\\'\\)/", $string , $result);
You're on the right track. Two barriers to overcome (As codethief says):
1 - Double quoted string interpolation
2 - Regex escape interpolation
For (2), neither commas nor quotes need to be escaped because they are not metacharacters
in a regex. Only a literal backslash needs to be escaped; otherwise,
in regex context, it starts an escape sequence (like \s).
For (1), PHP will try to interpret escaped characters as control codes (like \n), so
the literal backslash needs to be escaped. Since the string is double quoted,
\' (an escaped single quote) is not an escape sequence and is left as is.
Therefore, "\\\'" resolves as \\ -> \ followed by \' -> \', giving \\', which is what the regex sees.
The regex engine then interprets \\' as a literal \ followed by '.
Making a slight change of your regex solves the problem:
preg_match("/streamer\\\',\\\'(.*?)\\\'\)/", $string , $result);
A working example is here http://beta.ideone.com/47EIY
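A self-contained version of the same check, as a sketch. The URL is shortened to an assumed placeholder (example.invalid) instead of the one in the question:

```php
<?php
// In a double-quoted string \' is not an escape sequence, so the backslashes stay in the data.
$string = "(\'streamer\',\'http://example.invalid/video.flv\')";

// PHP turns \\\' into \\' ; the regex engine reads \\' as a literal \ followed by '.
$ok = preg_match("/streamer\\\',\\\'(.*?)\\\'\)/", $string, $result);

var_dump($ok);        // int(1)
echo $result[1], "\n"; // http://example.invalid/video.flv
```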
