Replacing ereg_replace() - php

I have some legacy code I'm trying to update to run on PHP 5.4:
$row['subject'] = ereg_replace('[\]', '', $row['subject']);
I know I have to replace this with preg_replace, but since I've never used ereg, I'm not sure what this actually does.
Is the equivalent:
preg_replace('/[\\]/'...
or
preg_replace('/\[\\\]/'...
Can anyone shine a light on what the correct replacement should be?

The code you posted removes the backslashes; it does not match the square brackets. The brackets create a character class that contains only the backslash character, and they are not needed in your regex.
The equivalent preg_replace() statement is:
$row['subject'] = preg_replace('/\\\\/', '', $row['subject']);
The regex is /\\/. Using a single backslash (/\/) produces an error: the backslash is an escape character in regex, so it escapes the closing / and makes it a literal character instead of the delimiter. The two extra backslashes are needed because the backslash is also an escape character in PHP strings, so it has to be escaped too.
I guess this is why the original coder put the backslash in a character class: to avoid escaping and double escaping.
The same trick does not carry over directly to preg_replace(). POSIX bracket expressions (used by ereg) treat a backslash inside [ ] literally, but PCRE treats it as an escape character even inside a character class, so the backslash still has to be escaped there:
$row['subject'] = preg_replace('/[\\\\]/', '', $row['subject']);
Writing '/[\]/' fails: PHP keeps \] unchanged in a single-quoted string, so PCRE sees [\], where \] is an escaped bracket. The class is never closed, and preg_replace() emits a compilation warning and returns null.
(In PHP single-quoted strings, only \' and \\ are escape sequences; a backslash before any other character is kept literally. That is why '/[\\\]/' produces the same pattern as '/[\\\\]/'. Still, doubling every backslash, as the documentation recommends, is easier to read correctly.)
There is a better (faster, cleaner) way to rewrite the ereg_replace() call: don't use a regex at all. All it does is match and remove backslashes, so a simple str_replace() is enough:
$row['subject'] = str_replace('\\', '', $row['subject']);
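To check the options side by side, here is a minimal sketch. The sample path is a made-up stand-in for $row['subject']; the code runs an escaped-backslash regex, a character-class regex and str_replace() on the same input and should produce the same result each time:

```php
<?php
// Hypothetical sample value standing in for $row['subject'].
$subject = 'C:\\Users\\me\\file.txt'; // single-quoted: each \\ is one backslash

// 1. Regex /\\/ : the PHP literal needs four backslashes.
$viaRegex = preg_replace('/\\\\/', '', $subject);

// 2. Character class /[\\]/ : PCRE still needs the backslash escaped inside [ ].
$viaClass = preg_replace('/[\\\\]/', '', $subject);

// 3. No regex at all: '\\' is a one-character string containing a backslash.
$viaStr = str_replace('\\', '', $subject);

echo $viaRegex, PHP_EOL; // C:Usersmefile.txt
echo $viaClass, PHP_EOL; // C:Usersmefile.txt
echo $viaStr, PHP_EOL;   // C:Usersmefile.txt
```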

How to properly escape a string for use in regular expression in PHP?

I am trying to escape a string for use in a regular expression in PHP. So far I tried:
preg_quote(addslashes($string));
I thought I needed addslashes in order to properly account for any quotes in the string. Then preg_quote escapes the regular expression characters.
However, the problem is that quotes are escaped with a backslash, e.g. \'. Then preg_quote escapes that backslash with another one, e.g. \\', which leaves the quote unescaped again. Swapping the two functions does not work either, because that would leave an unescaped backslash that is then interpreted as a special regular expression character.
Is there a function in PHP to accomplish the task? Or how would one do it?
The proper way is to use preg_quote() and pass the pattern delimiter as its second argument.
preg_quote() takes str and puts a backslash in front of every character that is part of the regular expression syntax... The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : - (and, since PHP 7.3, #)
A backslash cannot be used as a delimiter at all; PHP rejects it. Usually you pick a character that doesn't occur in the pattern. Common choices are slash /pattern/, tilde ~pattern~, number sign #pattern# or percent sign %pattern%. Bracket-style delimiters such as (pattern) are also possible.
Here is your regex with the modification suggested in the comments by @CasimiretHippolyte and @anubhava:
$pattern = '/(?<![a-z])' . preg_quote($string, "/") . '/i';
You may have wanted a \b word boundary instead. Either way, no additional escaping (such as addslashes()) is needed.
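A minimal sketch of why addslashes() is unnecessary. The input and subject strings are hypothetical; the code compares preg_quote() alone with the addslashes() + preg_quote() combination from the question:

```php
<?php
// Hypothetical user input containing a quote and regex metacharacters.
$string = 'O\'Neil (v1.2) costs $5?';
$text   = 'Report: o\'neil (V1.2) costs $5? Yes.';

// preg_quote() escapes ( ) . $ ? and the "/" delimiter; quotes are not special
// in a regex, so they are left alone.
$pattern = '/(?<![a-z])' . preg_quote($string, '/') . '/i';
var_dump(preg_match($pattern, $text)); // int(1)

// The escaped "." no longer acts as a wildcard, so "v1x2" does not match.
var_dump(preg_match($pattern, 'o\'neil (v1x2) costs $5?')); // int(0)

// addslashes() first inserts \' and preg_quote() then turns it into \\',
// which requires a literal backslash before the quote in the subject.
$broken = '/' . preg_quote(addslashes($string), '/') . '/i';
var_dump(preg_match($broken, $text)); // int(0)
```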

Backslash in Regex- PHP

I am trying to learn regex in PHP and am stuck here. My question may seem silly, but please explain.
I went through a link:
Extra backslash needed in PHP regexp pattern
But I just could not understand something:
In the answer he mentions two statements:
2 backslashes are used for unescaping in a string ("\\\\" -> \\)
1 backslash is used for unescaping in the regex engine (\\ -> \)
My questions:
What does the word "unescaping" actually mean? What is the purpose of unescaping?
Why do we need 4 backslashes to include it in the regex?
The backslash has a special meaning in both regexen and PHP. In both cases it is used as an escape character. For example, if you want to write a literal quote character inside a PHP string literal, this won't work:
$str = ''';
PHP would get "confused" which ' ends the string and which is part of the string. That's where \ comes in:
$str = '\'';
It escapes the special meaning of ', so instead of terminating the string literal, it is now just a normal character in the string. Double-quoted strings support more escape sequences, such as \n.
This now means that \ is a special character with a special meaning. To escape this conundrum when you want to write a literal \, you'll have to escape literal backslashes as \\:
$str = '\\'; // string literal representing one backslash
This works the same in both PHP and regexen. If you want to write a literal backslash in a regex, you have to write /\\/. Now, since you're writing your regexen as PHP strings, you need to double escape them:
$regex = '/\\\\/';
One pair of \\ is first reduced to one \ by the PHP string escaping mechanism, so the actual regex is /\\/, which is a regex which means "one backslash".
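The two layers can be observed separately with a small sketch (the subject string is a made-up example): first inspect the string PHP produces, then see what PCRE does with it.

```php
<?php
// Layer 1: PHP string parsing. Each \\ in the source becomes one backslash.
$regex = '/\\\\/';
echo $regex, PHP_EOL;         // prints /\\/
echo strlen($regex), PHP_EOL; // prints 4

// Layer 2: PCRE reads the remaining \\ as one literal backslash.
$subject = 'a\\b'; // the three characters a, \, b
echo preg_match($regex, $subject), PHP_EOL;        // prints 1
echo preg_replace($regex, '/', $subject), PHP_EOL; // prints a/b
```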
I think you can use "preg_quote()":
http://php.net/preg_quote
This function escapes special characters, so you can pass the input as-is without escaping it yourself:
<?php
$string = "online 24/7. Only for \o/";
$escaped_string = preg_quote($string, "/"); // the optional 2nd parameter also escapes your regex delimiter
echo $escaped_string; // $escaped_string: "online 24\/7\. Only for \\o\/"
?>

How to properly escape a backslash to match a literal backslash in single-quoted and double-quoted PHP regex patterns

To match a literal backslash, many people and the PHP manual say: Always triple escape it, like this \\\\
Note:
Single and double quoted PHP strings have special meaning of backslash. Thus if \ has to be matched with a regular expression \\, then "\\\\" or '\\\\' must be used in PHP code.
Here is an example string: \test
$test = "\\test"; // outputs \test;
// WON'T WORK: pattern in double-quotes double-escaped backslash
#echo preg_replace("~\\\t~", '', $test); #output -> \test
// WORKS: pattern in double-quotes with triple-escaped backslash
#echo preg_replace("~\\\\t~", '', $test); #output -> est
// WORKS: pattern in single-quotes with double-escaped backslash
#echo preg_replace('~\\\t~', '', $test); #output -> est
// WORKS: pattern in double-quotes with double-escaped backslash inside a character class
#echo preg_replace("~[\\\]t~", '', $test); #output -> est
// WORKS: pattern in single-quotes with double-escaped backslash inside a character class
#echo preg_replace('~[\\\]t~', '', $test); #output -> est
Conclusion:
If the pattern is single-quoted, a backslash has to be double-escaped \\\ to match a literal \
If the pattern is double-quoted, it depends on whether
the backslash is inside a character class, where it must be at least double-escaped \\\
or outside a character class, where it has to be triple-escaped \\\\
Can anyone show me a case where a double-escaped backslash in a single-quoted pattern, e.g. '~\\\~', matches something different from a triple-escaped backslash in a double-quoted pattern, e.g. "~\\\\~", or fails?
When/why/in what scenario would it be wrong to use a double-escaped \ in a single-quoted pattern e.g. '~\\\~' for matching a literal backslash?
If there's no answer to this question, I would continue to always use a double-escaped backslash \\\ in a single-quoted PHP regex pattern to match a literal \ because there's possibly nothing wrong with it.
A backslash character (\) is treated as an escape character by both PHP's parser and the regular expression engine (PCRE). If you write a single backslash, PHP's parser may treat it as the start of an escape sequence. If you write two backslashes, PHP's parser turns them into one literal backslash, but the regular expression engine then picks that backslash up as an escape character. To get a literal backslash through both layers you normally write four backslash characters; depending on how you quote the pattern, three can also work.
To understand the difference between the two types of quoting patterns, consider the following two var_dump() statements:
var_dump('~\\\~');
var_dump("~\\\\~");
Output:
string(4) "~\\~"
string(4) "~\\~"
The escape sequence \~ has no special meaning in a PHP single-quoted string. Three backslashes also work because the PHP parser doesn't know any escape sequence \~, so \\ becomes \ but \~ remains \~.
Which one should you use:
For clarity, I'd always use ~\\\\~ when I want to match a literal backslash. The other one works too, but I think ~\\\\~ is clearer.
There is no difference in how backslashes are escaped in single- or double-quoted strings in PHP, as long as you do it correctly. The reason your first example is marked WON'T WORK is, as pointed out in the comments, that \t is expanded to a tab character in a double-quoted string.
When you use just three backslashes, the last one in your single-quoted string is read together with the ~ as \~, which a single-quoted string leaves as-is because it is not a valid escape sequence. It is just a coincidence that this is parsed as you expect here without side effects; \\\', for example, would not behave the same way.
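A small sketch of that counterexample, using a made-up subject string. Three backslashes happen to work before ~, but before an apostrophe the last backslash is consumed by PHP:

```php
<?php
// Three backslashes before "~" in single quotes: \\ becomes \, and \~ is not an
// escape sequence, so it is kept. The result equals the four-backslash form.
$threeSingle = '~\\\~';
$fourDouble  = "~\\\\~";
var_dump($threeSingle === $fourDouble); // bool(true)

// Three backslashes before an apostrophe: \\ becomes \, but \' becomes a plain '.
// The regex is ~\'~, which matches only the apostrophe.
$subject = "x\\'y"; // x, backslash, apostrophe, y
var_dump(preg_replace('~\\\'~', '', $subject)); // string(3) "x\y"

// Five backslashes are needed here: \\ \\ \' gives the regex \\' (backslash, then ').
var_dump(preg_replace('~\\\\\'~', '', $subject)); // string(2) "xy"
```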
The reason for all the escaping is that the regular expression also needs backslashes escaped in certain situations, as they have special meaning there as well. This leads to the large number of backslashes after each other, such as \\\\ (which takes eight backslashes for the markdown parser, as it yet again adds another level of escaping).
Hopefully that clears it up, as you seem to be confused regarding the handling of backslashes in single/double quoted strings more than the behaviour in the regular expression itself (which will be the same regardless of " or ', as long as you escape things correctly).

PHP preg_replace backslash

I have double backslashes (\\) in my string that need to be converted into single backslashes (\). I've tried several combinations, and either the whole string disappears when I echo it or more backslashes are accidentally added to the string. This regex thing is making me go bonkers...lol...
I tried this amongst other failed attempts:
$pattern = '[\\]';
$replacement = '/\/';
?>
<td width="100%"> <?php echo preg_replace($pattern, $replacement,$q[$i]);?></td>
I do apologise if this is a foolish issue and I appreciate any pointers.
Use stripslashes() - it does exactly what you're looking for.
<td width="100%"> <?php echo stripslashes($q[$i]);?></td>
Use stripslashes instead. Also, in your regex, you are searching for single backslashes and your replacement is incorrect. \\{2} should search for double backslashes and \ should replace them with singles, although I haven't tested this.
Just to explain further, the pattern [\\] matches any character in a set made up of a single backslash. In PHP you also need to delimit your regex, for example with forward slashes: /[\\]/. Without other delimiters, PHP treats the square brackets themselves as the delimiters.
Your replacement, which is (without delimiters) \, is not a regular expression for matching a single backslash. The regex for matching a single backslash is \\. Note the escaping. This said, the replacement term needs to be a string, not a regex (with the exception of backreferences).
EDIT: Sven claims below that stripslashes removes all backslashes. This is simply not true, and I will explain why below.
If a string contains 2 backslashes, the first one will be considered an escaping backslash and will be removed. This can be seen at http://www.phpfiddle.org/main/code/3yn-2ut. The fact that any backslashes remain at all by itself contradicts the claim that stripslashes removes all backslashes.
Just to clarify, this string declaration is invalid: $x = "\";, since the backslash escapes the second quote. This string "\\" contains one backslash. In the process of unquoting this string, this backslash will be removed. This "\\\\" string contains two backslashes. When unquoting, the first will be considered an escaping backslash, and will be removed.
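A short sketch (with hypothetical path strings) shows both sides of this disagreement: stripslashes() turns a double backslash into a single one, but it also removes single backslashes, because it treats each one as escaping the next character.

```php
<?php
$doubled = 'C:\\\\dir\\\\file'; // actual content: C:\\dir\\file
$single  = 'C:\\dir\\file';     // actual content: C:\dir\file

echo stripslashes($doubled), PHP_EOL; // C:\dir\file  (double -> single)
echo stripslashes($single), PHP_EOL;  // C:dirfile    (single backslashes vanish)
```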
Use preg_replace to turn double backslash into single backslash:
preg_replace('/\\\\{2}/', '\\', $str)
The \ in the first parameter needs to be escaped twice, once for the string and once more for the regex, just like CodeAngry says.
In the second parameter it is escaped only once, for the string.
Make sense?
Never use a regular expression if the string you are looking for is constant, as is the case with "Every instance of double backslash".
Use str_replace() for this task. It is a simple function that replaces every occurrence of a string with another.
In your case: str_replace('\\\\', '\\', $var).
The double backslash actually translates into four backslashes, because inside any quotes (single or double), a single backslash is the start of an escape sequence for the following character. If you want one literal backslash, you have to write two of them. If you want two backslashes, you have to write four of them.
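To compare the two suggestions in this thread, here is a minimal sketch with a hypothetical $var that contains one double and one single backslash. Both str_replace('\\\\', '\\', ...) and preg_replace('/\\\\{2}/', '\\', ...) collapse only the double backslash:

```php
<?php
$var = 'a\\\\b\\c'; // actual content: a\\b\c

// str_replace(): '\\\\' is two backslashes, '\\' is one.
$viaStr = str_replace('\\\\', '\\', $var);

// preg_replace(): the regex is \\{2} (an escaped backslash, repeated twice).
// In the replacement, a lone backslash not followed by a digit is copied literally.
$viaRegex = preg_replace('/\\\\{2}/', '\\', $var);

echo $viaStr, PHP_EOL;   // a\b\c
echo $viaRegex, PHP_EOL; // a\b\c
```

Unlike stripslashes(), neither call touches the single backslash before c.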
I do not like the suggestion of stripslashes(). It will of course "decode" your double backslash into a single backslash, but it will also remove every single backslash in the whole string. If there were none, fine; otherwise things will break.
$pattern = '[\\]'; // wrong
$pattern = '[\\\\]'; // right
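Escape \ as \\ and \\ as \\\\, because '[\\]' becomes the string [\], where \] means an escaped ]. (Note that preg_replace() uses the square brackets here as delimiters, so '[\\\\]' matches a single backslash.)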
Use the htmlentities() function to convert your slashes to HTML entities, then use str_replace() or preg_replace() to change them to the new entity.

PHP regex periods

How do I put a period into a PHP regular expression?
The way it is used in the code is:
echo(preg_match("/\$\d{1,}\./", '$645.', $matches));
But apparently the period in that $645. doesn't get recognized. Requesting tips on how to make this work.
Since . is a special character, you need to escape it to match it literally: \.
Remember to also escape the escape character if you want to use it in a string. So if you want to write the regular expression foo\.bar in a string declaration, it needs to be "foo\\.bar".
Escape it. The period has a special meaning within a regular expression: it represents any character, i.e. it's a wildcard. To match a literal . it needs to be escaped with a backslash: \.
/[0-9]\.[ab]/
Matches a digit, a period, and either "a" or "b", whereas
/[0-9].[ab]/
Matches a digit, any single character¹, and either "a" or "b".
Be aware that PHP also uses the backslash as an escape character in double-quoted strings. In these cases you'll need to escape it twice:
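The difference between the two patterns can be checked directly; the subject strings below are made-up examples:

```php
<?php
$escaped  = '/[0-9]\.[ab]/'; // a digit, a literal period, then "a" or "b"
$wildcard = '/[0-9].[ab]/';  // a digit, any one character, then "a" or "b"

var_dump(preg_match($escaped, '1.a'));          // int(1)
var_dump(preg_match($escaped, '1xb'));          // int(0): "x" is not a period
var_dump(preg_match($wildcard, '1xb'));         // int(1): "." matches "x"
var_dump(preg_match($wildcard, "1\nb"));        // int(0): "." skips newlines by default
var_dump(preg_match($wildcard . 's', "1\nb")); // int(1) with the s modifier
```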
$single = '\.';
$double = "\\.";
UPDATE
This echo(preg_match("/\$\d{1,}./", '$645.', $matches)); could be rewritten as echo(preg_match('/\$\d{1,}\./', '$645.', $matches)); or echo(preg_match("/\\$\\d{1,}\\./", '$645.', $matches));. They both work.
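A sketch of why the double-quoted version from the question returns 0 even though the period is escaped: in double quotes PHP turns \$ into a plain $, which PCRE then reads as the end-of-subject anchor.

```php
<?php
// Double quotes: \$ becomes $, so PCRE receives /$\d{1,}\./ and nothing can
// follow the $ anchor.
$broken = "/\$\d{1,}\./";
var_dump($broken);                      // string(11) "/$\d{1,}\./"
var_dump(preg_match($broken, '$645.')); // int(0)

// Single quotes keep the backslash, so PCRE sees \$ (a literal dollar sign).
$fixed = '/\$\d{1,}\./';
var_dump(preg_match($fixed, '$645.', $matches)); // int(1)
var_dump($matches[0]);                           // string(5) "$645."
```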
¹ Except newlines, unless the s modifier is set.
