I know I can create a verbatim string literal in C# by using the @ symbol. For example, the usual
String path = "C:\\MyDocs\\myText.txt";
can also be re-written as
String path = @"C:\MyDocs\myText.txt";
This way, the string literal isn't cluttered with escape characters, which makes it much more readable.
What I would like to know is whether PHP also has an equivalent or do I have to manually escape the string myself?
$path = 'C:\MyDocs\myText.txt';
" double quotes allow for all sorts of special character sequences, ' single quotes are verbatim (there's only some fine print about escaping ' and escaping an escape \).
Even single-quoted strings in PHP have the need for escaping at least literal single-quotes and literal backslashes:
$str = 'Single quotes won\'t help me \ avoid escapes or save a tree';
The only non-parsed solution for PHP is to use nowdocs. This requires PHP 5.3 or later.
$str = <<<'EOD'
I mustn't quote verbatim text \
maybe in the version next
EOD;
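As a rough illustration of the three quoting styles side by side (a minimal sketch; the variable names are made up for the example, and the nowdoc part assumes PHP 5.3 or later):

```php
<?php
// Three ways to write the same Windows path.
$single = 'C:\MyDocs\myText.txt';      // \M and \m are not escapes, so they stay as-is
$double = "C:\\MyDocs\\myText.txt";    // backslashes doubled to be safe in double quotes
$nowdoc = <<<'EOD'
C:\MyDocs\myText.txt
EOD;

var_dump($single === $double);  // bool(true)
var_dump($single === $nowdoc);  // bool(true)

// The fine print: a trailing backslash still has to be doubled in single quotes,
// otherwise it would escape the closing quote.
$dir = 'C:\MyDocs\\';
var_dump(strlen($dir));         // int(10)
```

So single quotes are close to verbatim, and only the nowdoc is completely free of escaping rules.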
I am trying to learn regex in PHP and am stuck here. My question may appear silly, but please do explain.
I went through a link:
Extra backslash needed in PHP regexp pattern
But I just could not understand something:
In the answer he mentions two statements:
2 backslashes are used for unescaping in a string ("\\\\" -> \\)
1 backslash is used for unescaping in the regex engine (\\ -> \)
My questions:
What does the word "unescaping" actually mean? What is the purpose of unescaping?
Why do we need 4 backslashes to include it in the regex?
The backslash has a special meaning in both regexen and PHP. In both cases it is used as an escape character. For example, if you want to write a literal quote character inside a PHP string literal, this won't work:
$str = ''';
PHP would get "confused" which ' ends the string and which is part of the string. That's where \ comes in:
$str = '\'';
It escapes the special meaning of ', so instead of terminating the string literal, it is now just a normal character in the string. There are more escape sequences like \n as well.
This now means that \ is a special character with a special meaning. To escape this conundrum when you want to write a literal \, you'll have to escape literal backslashes as \\:
$str = '\\'; // string literal representing one backslash
This works the same in both PHP and regexen. If you want to write a literal backslash in a regex, you have to write /\\/. Now, since you're writing your regexen as PHP strings, you need to double escape them:
$regex = '/\\\\/';
One pair of \\ is first reduced to one \ by the PHP string escaping mechanism, so the actual regex is /\\/, which is a regex which means "one backslash".
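The two levels can be checked directly. This is only a small sketch (the subject string and variable names are invented for the example):

```php
<?php
$regex = '/\\\\/';              // four backslashes in the source code

// Level 1: PHP string unescaping leaves two backslashes between the delimiters.
var_dump(strlen($regex));       // int(4)
var_dump($regex);               // string(4) "/\\/"

// Level 2: the regex engine reads \\ as "one literal backslash".
$subject = 'a\\b';              // the three characters a, \, b
var_dump(preg_match($regex, $subject));          // int(1)
var_dump(preg_replace($regex, '/', $subject));   // string(3) "a/b"
```

Dumping the pattern string before handing it to preg_match() is usually the quickest way to see what the regex engine actually receives.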
I think you can use "preg_quote()":
http://php.net/preg_quote
This function escapes special characters, so you can pass the input as it is, without escaping it yourself:
<?php
$string = "online 24/7. Only for \o/";
$escaped_string = preg_quote($string, "/"); // 2nd param is optional and used if you want to escape also the delimiter of your regex
echo $escaped_string; // $escaped_string: "online 24\/7\. Only for \\o\/"
?>
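A short sketch of how the quoted text might then be used inside a pattern (the $needle and $pattern names are just for illustration):

```php
<?php
$needle  = '\o/';                                   // text to find literally
$pattern = '/' . preg_quote($needle, '/') . '/';    // delimiter passed as 2nd argument

var_dump($pattern);                                  // string(7) "/\\o\//"
var_dump(preg_match($pattern, 'Only for \o/'));      // int(1)
```

Note that preg_quote() only handles the regex level; the PHP string level still applies to whatever literal you type into the source code.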
To match a literal backslash, many people and the PHP manual say: Always triple escape it, like this \\\\
Note:
Single and double quoted PHP strings have special meaning of backslash. Thus if \ has to be matched with a regular expression \\, then "\\\\" or '\\\\' must be used in PHP code.
Here is an example string: \test
$test = "\\test"; // outputs \test;
// WON'T WORK: pattern in double-quotes double-escaped backslash
#echo preg_replace("~\\\t~", '', $test); #output -> \test
// WORKS: pattern in double-quotes with triple-escaped backslash
#echo preg_replace("~\\\\t~", '', $test); #output -> est
// WORKS: pattern in single-quotes with double-escaped backslash
#echo preg_replace('~\\\t~', '', $test); #output -> est
// WORKS: pattern in double-quotes with double-escaped backslash inside a character class
#echo preg_replace("~[\\\]t~", '', $test); #output -> est
// WORKS: pattern in single-quotes with double-escaped backslash inside a character class
#echo preg_replace('~[\\\]t~', '', $test); #output -> est
Conclusion:
If the pattern is single-quoted, a backslash has to be double-escaped \\\ to match a literal \
If the pattern is double-quoted, it depends whether
the backslash is inside a character class, where it must be at least double-escaped \\\
outside a character-class it has to be triple-escaped \\\\
Who can show me a difference, where a double-escaped backslash in a single-quoted pattern e.g. '~\\\~' would match anything different than a triple-escaped backslash in a double-quoted pattern e.g. "~\\\\~" or fail.
When/why/in what scenario would it be wrong to use a double-escaped \ in a single-quoted pattern e.g. '~\\\~' for matching a literal backslash?
If there's no answer to this question, I would continue to always use a double-escaped backslash \\\ in a single-quoted PHP regex pattern to match a literal \ because there's possibly nothing wrong with it.
A backslash character (\) is considered to be an escape character by both PHP's parser and the regular expression engine (PCRE). If you write a single backslash character, the PHP parser will treat it as an escape character. If you write two backslashes, the PHP parser will interpret them as one literal backslash. But when that single backslash reaches the regular expression engine, the engine in turn picks it up as an escape character. To avoid this, you need to write four backslash characters (three can suffice, depending upon how you quote the pattern).
To understand the difference between the two types of quoting patterns, consider the following two var_dump() statements:
var_dump('~\\\~');
var_dump("~\\\\~");
Output:
string(4) "~\\~"
string(4) "~\\~"
The escape sequence \~ has no special meaning in PHP when it's used in a single-quoted string. Three backslashes do also work because the PHP parser doesn't know about the escape sequence \~. So \\ will become \ but \~ will remain as \~.
Which one should you use:
For clarity, I'd always use ~\\\\~ when I want to match a literal backslash. The other one works too, but I think ~\\\\~ is more clear.
There is no difference between the actual escaping of the backslash in single- and double-quoted strings in PHP, as long as you do it correctly. The reason your first example WON'T WORK is, as pointed out in the comments, that a double-quoted string expands \t to the tab character.
When you're using just three backslashes, the last one in your single-quoted string will be interpreted as part of \~, which, as far as single-quoted strings go, will be left as it is (since it does not match a valid escape sequence). It is, however, just a coincidence that this is parsed as you expect in this case and does not have some sort of side effect (e.g., \\\' would not behave the same way).
The reason for all the escaping is that the regular expression also needs backslashes escaped in certain situations, as they have special meaning there as well. This leads to the large number of backslashes after each other, such as \\\\ (which takes eight backslashes for the markdown parser, as it yet again adds another level of escaping).
Hopefully that clears it up, as you seem to be confused regarding the handling of backslashes in single/double quoted strings more than the behaviour in the regular expression itself (which will be the same regardless of " or ', as long as you escape things correctly).
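To make the coincidence concrete, here is a small sketch (the variable names are made up) comparing the three- and four-backslash spellings:

```php
<?php
// Both literals produce the same four-character pattern ~\\~ ...
$single = '~\\\~';
$double = "~\\\\~";
var_dump($single === $double);                       // bool(true)

// ... but three backslashes are fragile when the next character forms an escape:
$tabbed = "~\\\t~";                                  // \\ -> \, then \t -> TAB
var_dump(strpos($tabbed, "\t") !== false);           // bool(true)

// Four backslashes behave the same in either quoting style.
$test = '\\test';                                    // the string \test
var_dump(preg_replace('~\\\\t~', '', $test));        // string(3) "est"
var_dump(preg_replace("~\\\\t~", '', $test));        // string(3) "est"
```

That is the practical argument for always writing four: the result does not depend on the quoting style or on which character happens to follow.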
I am confused with following string function
echo strlen("l\n2"); //give 3 in output
where as
echo strlen('l\n2'); //give 4 in output
can anybody explain why ?
Because when you use single quotes (' '), PHP does not expand \n into a newline character, whereas in double quotes (" ") \n translates to the newline character (i.e. a single character), giving 3 characters in total.
Taken from PHP's String Documentation: http://php.net/manual/en/language.types.string.php
Note: Unlike the double-quoted and heredoc syntaxes, variables and escape sequences for special characters will not be expanded when they occur in single quoted strings.
\n is not parsed as a newline character when the string is wrapped in single quotes. Instead, it is treated as a literal \ followed by n.
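A quick way to see the difference is to dump the raw bytes. A minimal sketch using bin2hex():

```php
<?php
var_dump(strlen("l\n2"));    // int(3)
var_dump(strlen('l\n2'));    // int(4)

// Hex dump of each string: 0a is the newline, 5c 6e are backslash and n.
var_dump(bin2hex("l\n2"));   // string(6) "6c0a32"
var_dump(bin2hex('l\n2'));   // string(8) "6c5c6e32"
```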
How do I make this match the following text correctly?
$string = "(\'streamer\',\'http://dv_fs06.ovfile.com:182/d/pftume4ksnroarhlslexwl7bcnoqyljeudgmd7dimssniu2b2r2ikr2h/video.flv\')";
preg_match("/streamer\\'\,\\\'(.*?)\\\'\)/", $string , $result);
var_dump($result);
Your $string looks weird. Better to make a three pass parse:
$string = str_replace(array("\'"), '', $string);
Now we have string:
"(streamer,http://dv_fs06.ovfile.com:182/d/pftume4ksnroarhlslexwl7bcnoqyljeudgmd7dimssniu2b2r2ikr2h/video.flv)"
Now let's trim brackets:
$string = trim($string, '()');
And finally, explode:
list($streamer, $url) = explode(',', $string, 2);
No need of regex.
Btw, your string looks like it was sloppily slashed in a MySQL query.
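Put together, the three passes might look like this. It is only a sketch: the URL is a shortened placeholder (example.com), not the one from the question, and it assumes the URL itself contains no comma:

```php
<?php
// Input in the same shape as the question: quotes preceded by literal backslashes.
$string = "(\\'streamer\\',\\'http://example.com/video.flv\\')";

$string = str_replace("\\'", '', $string);         // pass 1: drop every \' pair
$string = trim($string, '()');                     // pass 2: strip the brackets
list($streamer, $url) = explode(',', $string, 2);  // pass 3: split on the first comma

var_dump($streamer);   // string(8) "streamer"
var_dump($url);        // string(28) "http://example.com/video.flv"
```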
It's been a while since I last did regexp matching in PHP, but I think you have to remember that:
' doesn't need to be escaped in PHP strings enclosed by "
\ always needs to be escaped in PHP strings
\ needs to be escaped yet another time in regexps (for it's a special character and you want to treat it as a normal one)
=> A \ that is part of the text to be matched must be written as four backslashes (\\\\).
My suggestion:
preg_match("/\\(streamer\\\\',\\\\'(.*?)\\\\'\\)/", $string , $result);
You're on the right track. Two barriers to overcome (As codethief says):
1 - Double quoted string interpolation
2 - Regex escape interpolation
For (2), neither commas nor quotes need to be escaped because they are not metacharacters
special to regexes. Only a literal backslash needs to be escaped; otherwise,
in regex context, it represents the start of a metacharacter sequence (like \s).
For (1), PHP will try to interpret escaped characters as control codes (like \n); for
that reason the literal backslash needs to be escaped. Since this is double-quoted,
\' (the escaped single quote) has no escape meaning.
Therefore, "\\\'" resolves as \\ -> \ followed by \' -> \' (unchanged), giving \\', which is what the regex sees.
The regex then interprets the sequence /\\'/ as a literal \ followed by '.
Making a slight change of your regex solves the problem:
preg_match("/streamer\\\',\\\'(.*?)\\\'\)/", $string , $result);
A working example is here http://beta.ideone.com/47EIY
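For reference, a self-contained sketch of the regex approach. The URL is a shortened placeholder (example.com) rather than the one from the question, and the pattern is written with four backslashes per literal backslash to keep it independent of the quoting style:

```php
<?php
// Subject: (\'streamer\',\'http://example.com/video.flv\')
$string = "(\\'streamer\\',\\'http://example.com/video.flv\\')";

// After PHP unescaping the regex is: /streamer\\',\\'(.*?)\\'\)/
$pattern = "/streamer\\\\',\\\\'(.*?)\\\\'\\)/";

var_dump(preg_match($pattern, $string, $result));   // int(1)
var_dump($result[1]);                                // string(28) "http://example.com/video.flv"
```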
What's the regular expression to find \"
I think it's this: '/\\"/' but I need to use it on a really large dataset so need to make sure this is correct.
I need to replace it with " so my code is : $data = preg_replace('/\\"/', '"', $data)
Is that correct?
For matching backslashes you need to 'double-escape' them, so you have four \ at the end:
$data = preg_replace('/\\\\"/', '"', $data);
Why you need 4 \: PHP parses a string \\" as \" and RegEx interprets this as " since in RegEx you don't need to escape ". So it won't match \". \\\\" will be parsed as \\" which will be interpreted as \" by RegEx.
A backslash does not need to be escaped in either a single-quoted string or a regular expression, unless the following character is a character that can be escaped (such as the backslash itself).
A double quote does not need to be escaped and cannot be escaped in a single-quoted string. In a regular expression it doesn't have to be either, but it can be.
That means \\ in both a single-quoted string and a regular expression becomes \, while \" in a single-quoted string remains \", while in a regular expression it becomes ".
However, in PHP you can only create a regular expression from a string, so you have to escape twice.
In other words...
Original string    String processed    Regexp processed
'/\"/'             /\"/                "
'/\\"/'            /\"/                "
'/\\\"/'           /\\"/               \"
'/\\\\"/'          /\\"/               \"
'/\\\\\"/'         /\\\"/              \"
'/\\\\\\"/'        /\\\"/              \"
'/\\\\\\\"/'       /\\\\"/             \\"
Bonus backslash
In a double-quoted string, of course, the " does need to be escaped, so...
"/\"/"             /"/                 "
"/\\"/"            syntax error
"/\\\"/"           /\"/                "
"/\\\\"/"          syntax error
"/\\\\\"/"         /\\"/               \"
"/\\\\\\"/"        syntax error
"/\\\\\\\"/"       /\\\"/              \"
"/\\\\\\\\"/"      syntax error
"/\\\\\\\\\"/"     /\\\\"/             \\"
I think you should probably go for preg_replace("/\\\\\\\"/", "\"", $data) just to be on the safe (well, confusing) side.
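The first rows of the single-quoted table can be reproduced with a few lines. This is a sketch only (the $matched array is just there to collect results, and the short array syntax assumes PHP 5.4 or later):

```php
<?php
$subject  = '\\"';   // the two characters \ and "
$patterns = ['/\"/', '/\\"/', '/\\\"/', '/\\\\"/'];

$matched = [];
foreach ($patterns as $pattern) {
    preg_match($pattern, $subject, $m);
    $matched[] = $m[0];
    // Prints the pattern as the regex engine sees it, then what it matched.
    printf("%-6s matched: %s\n", $pattern, $m[0]);
}
```

The first two patterns match only the ", while the last two match the full \" sequence.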
As long as you mean the literal string \", matching for those characters in a regular expression requires:
\\"
So, you'd use /\\\\"/ as the pattern parameter in a preg_* function.
(You only need to escape the backslash. Since PHP treats backslashes in single- and double-quoted strings as a special character, you need to escape them twice.)
Is this all you need to match? If so, I'd recommend just using str_replace():
$string = str_replace('\\"', '"', $string);
For a simple search/replace of literal characters like this, an iterative string function like str_replace() will be faster than a regular expression.
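A small sketch showing that both approaches give the same result on a sample string (the sample text and variable names are invented for the example):

```php
<?php
$data = 'He said \\"hi\\"';                          // He said \"hi\"

$viaStrReplace  = str_replace('\\"', '"', $data);      // search text: \"
$viaPregReplace = preg_replace('/\\\\"/', '"', $data); // regex: \\"

var_dump($viaStrReplace);                      // string(12) "He said "hi""
var_dump($viaStrReplace === $viaPregReplace);  // bool(true)
```

Note that str_replace() needs only one level of escaping (the PHP string level), which is one more reason it is the less error-prone choice here.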
this one is correct.
preg_replace('/\\\"/', '"', $data);
http://sandbox.phpcode.eu/g/1283c.php
In PHP, backslashes have special meaning. You can therefore represent a literal backslash as either of the following: \\\ or \\\\. The alternative method is to use a character class: [\\].
Refer to the section labeled "Note" here:
http://www.php.net/manual/en/regexp.reference.escape.php
Would this not work just as well for your data?
str_replace('\\"','"',$data);
$result = preg_replace('/\\\\"/i', '"', $subject);