How are you? I have the following task. I have a lot of strings that can contain repeated backslashes. I need to replace each run of backslashes (of any length) with a single backslash, but when the run is followed by a single quote, a double quote, or a NUL byte, all of the backslashes should be removed. My language is PHP. Thanks. Some tests:
$s1 = 'test\\\\string';
// test\string
$s2 = 'test\\\\\"\\\\\'\\\\string';
// test"'\string
$s3 = 'test\\string\\\\\"';
// test\string"
Use
preg_replace("~\\\\+([\"\'\\x00\\\\])~", "$1", $string);
to replace arbitrary amounts of \ with just one \.
The pattern consists of one or more initial backslashes (\\\\+) and a following symbol that is one of ", ', \x00, or \. The replacement keeps only that symbol, which effectively removes any preceding backslashes.
You need 4 backslashes in your regular expression. Two backslashes (\\) in the source code produce one backslash (\) inside the regular expression string, because the PHP interpreter uses backslashes to escape special characters like " or \. PCRE uses the backslash as its escape character too, so you need two backslashes inside the regular expression itself to match one.
Or explained the other way around: To get \+ as a regular expression, you have to add a backslash to tell PCRE that the one backslash is not for escaping the +. To get \\+ as a string you also need to add one backslash before each backslash to tell the PHP interpreter that you don't want to escape the second backslash with the first.
source code: \\\\+
inside regular expression string: \\+
pattern matches: \+
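The three levels can be checked by printing the pattern string and running it against the test strings from the question. This is only a minimal sketch: the helper name collapse_backslashes is made up for illustration, and the pattern is written in single quotes rather than double quotes.

```php
<?php
// Hypothetical helper (the name is an assumption for this example).
// Collapses each run of backslashes to one; removes the run before ", ' or NUL.
function collapse_backslashes(string $s): string
{
    // Source code: 4 backslashes. String value: 2. PCRE matches: 1 literal backslash.
    $pattern = '~\\\\+(["\'\\x00\\\\])~';
    return preg_replace($pattern, '$1', $s);
}

// Print the pattern as PCRE receives it.
echo '~\\\\+(["\'\\x00\\\\])~', "\n";                          // ~\\+(["'\x00\\])~

echo collapse_backslashes('test\\\\string'), "\n";             // test\string
echo collapse_backslashes('test\\\\\"\\\\\'\\\\string'), "\n"; // test"'\string
echo collapse_backslashes('test\\string\\\\\"'), "\n";         // test\string"
```

Echoing the pattern first is a cheap way to see which of the backslashes survive PHP's own string parsing.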
Replace two or more consecutive backslashes with a single backslash:
preg_replace('/\\\\+/','\\',$str);
Alternative way.
$s = 't\est\\\\\\\\\\\\stri\\\\\"\\\\\'\\\\0\\\\ng';
$s = preg_replace('~\\\\+~', '\\', $s);
$s = str_replace(array('\\"', '\\\'', '\\0'), array('"', '\'', "\0"), $s);
Try this:
preg_replace("/\\\\+(['\"\\x00\\\\])/", "$1", $string);
What's wrong with stripslashes? It accounts for slashes that escape a "special" character and removes "extra" slashes.
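One way to answer that is to compare stripslashes() with the regex approach on a couple of inputs. The sketch below assumes the requirement from the question (any run of backslashes becomes one, and a lone backslash before an ordinary character is kept); the variable names are made up for illustration.

```php
<?php
$regex = '~\\\\+(["\'\\x00\\\\])~';

$four = 'a\\\\\\\\b'; // a, four backslashes, b
$one  = 'a\\b';       // a, one backslash, b

// stripslashes() unescapes pair by pair, so four backslashes become two.
var_dump(stripslashes($four));               // string(4) "a\\b"
var_dump(preg_replace($regex, '$1', $four)); // string(3) "a\b"

// stripslashes() also drops a lone backslash; the regex leaves it alone.
var_dump(stripslashes($one));                // string(2) "ab"
var_dump(preg_replace($regex, '$1', $one));  // string(3) "a\b"
```

So the two only agree when every backslash in the input is part of a valid escape pair.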
I have this piece of code:
$data = 'Test\\vv testing\\vv';
$data = preg_replace('/(\\vv)/', '\test', $data);
Yet after the preg_replace, $data is still Test\vv testing\vv.
I don't understand why. On any regex tester, the regex matches the parts of the string I want to replace, which are the '\vv' parts.
A backslash character (\) is considered to be an escape character by both PHP's parser and the regular expression engine (PCRE). If you write a single backslash character, it will be considered as an escape character by PHP parser. If you write two backslashes, it will be interpreted as a literal backslash by PHP's parser. But when used in a regular expression, the regular expression engine also picks it up as an escape character.
So in order to match a literal backslash in a PHP regex, you need to write four backslashes: \\\\
From the PHP manual documentation on Escape Sequences:
Single and double quoted PHP strings have special meaning of backslash. Thus if \ has to be matched with a regular expression \\, then "\\\\" or '\\\\' must be used in PHP code.
Your code should be:
$data = preg_replace('/(\\\\vv)/', '\replaced', $data);
Note: As Bob0t correctly said in the comments, you do not need preg_replace() for this task since it only involves plain strings and not patterns. A plain ol' str_replace() should suffice:
$data = str_replace('\\vv', '\test', $data);
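The difference between the two patterns, and the str_replace() alternative, can be checked side by side. This is a small sketch using the string from the question; the variable names are made up for illustration.

```php
<?php
$data = 'Test\\vv testing\\vv'; // Test\vv testing\vv

// '/\\vv/' reaches PCRE as /\vv/. There \v means "vertical whitespace",
// so nothing in $data matches and the string comes back unchanged.
$unchanged = preg_replace('/\\vv/', 'X', $data);

// '/\\\\vv/' reaches PCRE as /\\vv/: a literal backslash followed by "vv".
$viaRegex = preg_replace('/\\\\vv/', '\\test', $data);

// No pattern needed for a constant search string.
$viaStr = str_replace('\\vv', '\\test', $data);

var_dump($unchanged === $data); // bool(true)
echo $viaRegex, "\n";           // Test\test testing\test
var_dump($viaStr === $viaRegex); // bool(true)
```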
In PHP you can use:
$re = "/\\\\vv/";
$str = "Test\\vv testing\\vv";
$subst = '\\replaced';
$result = preg_replace($re, $subst, $str);
In PHP you need to use \\\\ to match a single backslash \. This is because the first level of escaping is for the PHP string and the second is for the underlying PCRE regex engine.
I have double backslashes '\\' in my string that need to be converted into single backslashes '\'. I've tried several combinations, and either the whole string disappears when I echo it or more backslashes get added to the string by accident. This regex thing is making me go bonkers...lol...
I tried this amongst other failed attempts:
$pattern = '[\\]';
$replacement = '/\/';
?>
<td width="100%"> <?php echo preg_replace($pattern, $replacement,$q[$i]);?></td>
I do apologise if this is a foolish issue and I appreciate any pointers.
Use stripslashes() - it does exactly what you're looking for.
<td width="100%"> <?php echo stripslashes($q[$i]);?></td>
Use stripslashes instead. Also, in your regex, you are searching for single backslashes and your replacement is incorrect. \\{2} should search for double backslashes and \ should replace them with singles, although I haven't tested this.
Just to explain further, the pattern [\\] matches any character in a set comprised of a single backslash. In PHP, you also need to delimit your regex, for example with forward slashes: /[\\]/
Your replacement, which is (without delimiters) \, is not a regular expression for matching a single backslash. The regex for matching a single backslash is \\. Note the escaping. This said, the replacement term needs to be a string, not a regex (with the exception of backreferences).
EDIT: Sven claims below that stripslashes removes all backslashes. This is simply not true, and I will explain why below.
If a string contains 2 backslashes, the first one will be considered an escaping backslash and will be removed. This can be seen at http://www.phpfiddle.org/main/code/3yn-2ut. The fact that any backslashes remain at all by itself contradicts the claim that stripslashes removes all backslashes.
Just to clarify, this string declaration is invalid: $x = "\";, since the backslash escapes the second quote. This string "\\" contains one backslash. In the process of unquoting this string, this backslash will be removed. This "\\\\" string contains two backslashes. When unquoting, the first will be considered an escaping backslash, and will be removed.
Use preg_replace to turn double backslash into single backslash:
preg_replace('/\\\\{2}/', '\\', $str)
The \ in the first parameter needs to be escaped twice, once for string and once more for regex, just like CodeAngry says.
In the second parameter it only gets escaped once, for the string.
Make sense?
Never use a regular expression if the string you are looking for is constant, as is the case with "Every instance of double backslash".
Use str_replace() for this task. It is a very simple function that replaces every occurrence of a string with another.
In your case: str_replace('\\\\', '\\', $var).
The double backslash actually translates into four backslashes in the source code, because inside any quotes (single or double), a single backslash is the start of an escape sequence for the following character. If you want one literal backslash, you have to write two of them. If you want two backslashes, you have to write four of them.
I do not like the suggestion of stripslashes(). This will of course "decode" your double backslash into one single backslash. But it will also remove all single backslashes in the whole string. If there were none - fine, otherwise things will fail now.
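A short comparison shows the difference. This is a sketch with a made-up input that contains both a double backslash and a lone backslash:

```php
<?php
// Actual string value: C:\\temp\new
// (double backslash after "C:", single backslash before "new")
$var = 'C:\\\\temp\\new';

$a = str_replace('\\\\', '\\', $var); // only the double backslash is touched
$b = stripslashes($var);              // the lone backslash is dropped as well

echo $a, "\n"; // C:\temp\new
echo $b, "\n"; // C:\tempnew
```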
$pattern = '[\\]'; // wrong
$pattern = '[\\\\]'; // right
Escape \ as \\ and escape \\ as \\\\, because \\] means an escaped ].
Use htmlentities function to convert your slashes to html entities then using str_replace or preg_match to change them with new entity
To use a simplified example, I have:
$str = "Hello :special_text:! Look, I can write \:special_text:";
$pattern = /*???*/":special_text:";
$res = preg_replace($pattern, 'world', $str);
$res = str_replace("/:", ":", $res);
$res === "Hello world! Look, I can write :special_text:"; // => true
In other words, I'd like to be able to "escape" something that I'm writing.
I think that I have something almost working (using [^:]? as the first part of the pattern), but I don't think that works if $str === ":special_text:", in that ^ doesn't match [^:]?.
You can use a negative lookbehind:
(?<!\\):special_text:
This says "replace a :special_text: that isn't preceded by a backslash".
In your second step (the str_replace), it looks like you want to replace \: with :.
Also, don't forget that if you use backslashes in PHP strings you need to escape them once more (if you want a literal \ you need to write \\ in PHP, and to get a literal \\ you need to write \\\\):
$pattern = '#(?<!\\\\):([^:]+):#';
Here the # is just a regex delimiter.
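Put together with the example from the question, the lookbehind approach can be sketched like this. The unescape step uses \: because that appears to be what the question intended:

```php
<?php
$str = 'Hello :special_text:! Look, I can write \\:special_text:';

// PCRE sees /(?<!\\):special_text:/
// i.e. match only when the preceding character is not a backslash.
$pattern = '/(?<!\\\\):special_text:/';

$res = preg_replace($pattern, 'world', $str);
// Turn the remaining escaped "\:" into a plain ":".
$res = str_replace('\\:', ':', $res);

echo $res, "\n"; // Hello world! Look, I can write :special_text:

// The lookbehind also succeeds at the very start of the string.
echo preg_replace($pattern, 'world', ':special_text:'), "\n"; // world
```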
$pattern = "/[^\\\\]*:special_text:/";
-or-
$pattern = "/(?<!\\\\):special_text:/";
The other answers don't take into account the need to super-escape the backslashes in this situation. It's a little crazy.
To match a literal backslash, one has to write \\\\ as the regex string because the regular expression must be \\, and each backslash must be expressed as \\ inside a string literal. In regexes that feature backslashes repeatedly, this leads to lots of repeated backslashes and makes the resulting strings difficult to understand.
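One way to cut down on the doubling, assuming PHP 5.3 or later, is a nowdoc. A nowdoc does no escape processing at all, so the text between the markers is handed to PCRE exactly as typed. The sketch below reuses the lookbehind pattern from this thread; the marker name RE is arbitrary.

```php
<?php
// Nowdoc: what you type is what PCRE gets, so one regex-level escape is enough.
$pattern = <<<'RE'
/(?<!\\):special_text:/
RE;

// Same pattern written as a single-quoted string, for comparison.
var_dump($pattern === '/(?<!\\\\):special_text:/');    // bool(true)

var_dump(preg_match($pattern, ':special_text:'));      // int(1)
var_dump(preg_match($pattern, '\\:special_text:'));    // int(0)
```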
Something like this should do it: /[^\\]\:([a-z]+)\:/i
You can use RegexPal to test your regex against possible strings in realtime.
I'm after a bit of regex to be used in PHP to validate a UNC path passed through a form. It should be of the format:
\\server\something
... and allow for further sub-folders. It might be good to strip off a trailing slash for consistency although I can easily do this with substr if need be.
I've read online that matching a single backslash in PHP requires 4 backslashes (when using a "C like string") and think I understand why that is: PHP escaping (e.g. 2 = 1, so 4 = 2), then regex engine escaping (the remaining 2 = 1). I've seen the following two quoted as equivalent suitable regex to match a single backslash:
$regex = "/\\\\/s";
or apparently this also:
$regex = "/[\\]/s";
However these produce different results, and that is slightly aside from my final aim to match a complete UNC path.
To see if I could match two backslashes I used the following to test:
$path = "\\\\server";
echo "the path is: $path <br />"; // which is \\server
$regex = "/\\\\\\\\/s";
if (preg_match($regex, $path))
{
    echo "matched";
}
else
{
    echo "not matched";
}
The above however seems to match on two or more backslashes :( The pattern is 8 slashes, translating to 2, so why would an input of 3 backslashes ($path = "\\\\\\server") match?
I thought perhaps the following would work:
$regex = "/[\\][\\]/s";
and again, no :(
Please help before I jump out a window lol :)
Use this little gem:
$UNC_regex = '=^\\\\\\\\[a-zA-Z0-9-]+(\\\\[a-zA-Z0-9`~!##$%^&(){}\'._-]+([ ]+[a-zA-Z0-9`~!##$%^&(){}\'._-]+)*)+$=s';
Source: http://regexlib.com/REDetails.aspx?regexp_id=2285 (adapted to PHP string escaping)
The RegEx shown above matches for valid hostname (which allows only a few valid characters) and the path part behind the hostname (which allows many, but not all characters)
Sidenote on the backslashes issue:
When you use double quotes (") to enclose your string, you must be aware of PHP special character escaping. "\\" is a single \ in PHP.
Important: even with single quotes (') those backslashes must be escaped.
A PHP string with single quotes takes everything in the string literally (unescaped) with a few exceptions:
A backslash followed by a backslash (\\) is interpreted as a single backslash.
('C:\\*.*' => C:\*.*)
A backslash followed by a single-quote (\') is interpreted as a single quote.
('I\'ll be back' => I'll be back)
A backslash followed by anything else is interpreted as a backslash.
('Just a \ somewhere' => Just a \ somewhere)
Also, you must be aware of PCRE escape sequences.
The RegEx parser uses \ as its own escape character (for example to introduce character classes), so you need to escape it again for the RegEx.
To match two \\ you must write $regex = "\\\\\\\\" or $regex = '\\\\\\\\'
From the PHP docs on PCRE escape sequences:
Single and double quoted PHP strings have special meaning of backslash. Thus if \ has to be matched with a regular expression \\, then "\\\\" or '\\\\' must be used in PHP code.
Regarding your Question:
why would an input of 3 backslashes ($path = "\\\\\\server") match with regex "/\\\\\\\\/s"?
The reason is that you have no boundaries defined (use ^ for beginning and $ for end of string), thus it finds \\ "somewhere" resulting in a positive match. To get the expected result, you should do something like this:
$regex = '/^\\\\\\\\[^\\\\]/s';
The RegEx above has 2 modifications:
^ at the beginning to only match two \\ at the beginning of the string
[^\\] negative character class to say: not followed by an additional backslash
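The effect of the two modifications can be checked against both inputs from the question. This is a small sketch; the variable names are made up for illustration.

```php
<?php
// PCRE sees ^\\\\[^\\]
// i.e. exactly two backslashes at the start, then a non-backslash.
$regex = '/^\\\\\\\\[^\\\\]/';

$two   = '\\\\server';   // \\server
$three = '\\\\\\server'; // \\\server

var_dump(preg_match($regex, $two));   // int(1)
var_dump(preg_match($regex, $three)); // int(0)

// Without the anchor and the negated class, \\ is found "somewhere" in \\\server.
var_dump(preg_match('/\\\\\\\\/', $three)); // int(1)
```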
Regarding your last RegEx:
$regex = "/[\\][\\]/s";
There is some confusion with backslash escaping here (see above for clarification). "/[\\][\\]/s" is interpreted by PHP as /[\][\]/s, which makes the RegEx fail: \ is the escape character in RegEx, so \] is read as an escaped ] and the character class is never closed.
This variant of your RegEx would work, but it would also match any occurrence of two backslashes, for the same reason I already explained above:
$regex = '/[\\\\][\\\\]/s';
Echo your regex as well, so you can see what the actual pattern is. Writing those slashes inside PHP can become awkward, and echoing the pattern lets you verify that it's correct.
Also you should put ^ at the beginning of the pattern to match from string start and $ to the end to specify that the whole string has to be matched.
\\server\something
Regex:
~^\\\\server\\something$~
PHP String:
$pattern = '~^\\\\\\\\server\\\\something$~';
For the repetition, you want to say that a server exists and it's followed by one or more \something parts. If server is like something, this can be simplified:
^\\(?:\\[a-z]+){2,}$
PHP String:
$pattern = '~^\\\\(?:\\\\[a-z]+){2,}$~';
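Combined with the trailing-slash cleanup mentioned in the question, the simplified pattern could be used roughly as follows. This is only a sketch: the function name normalize_unc is made up, and the character set [a-z0-9_-] with the i flag is an assumption, since real host and share names allow more characters.

```php
<?php
// Hypothetical helper: returns the path without trailing backslashes,
// or null if it does not look like \\server\share[\more...].
function normalize_unc(string $path): ?string
{
    // '\\' is a single backslash here; rtrim strips any trailing ones.
    $path = rtrim($path, '\\');

    // PCRE sees ^\\(?:\\[a-z0-9_-]+){2,}$
    // i.e. one backslash, then at least two "\name" parts.
    $pattern = '~^\\\\(?:\\\\[a-z0-9_-]+){2,}$~i';

    return preg_match($pattern, $path) === 1 ? $path : null;
}

var_dump(normalize_unc('\\\\server\\share\\')); // string(14) "\\server\share"
var_dump(normalize_unc('\\server\\share'));     // NULL (only one leading backslash)
var_dump(normalize_unc('\\\\server'));          // NULL (no share part)
```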
As there was some confusion about how \ characters should be written inside single quoted strings:
# Output:
#
# * Definition as '\\' ....... results in string(1) "\"
# * Definition as '\\\\' ..... results in string(2) "\\"
# * Definition as '\\\\\\' ... results in string(3) "\\\"
$slashes = array(
    '\\',
    '\\\\',
    '\\\\\\',
);
foreach ($slashes as $i => $slashed) {
    $definition = sprintf('%s ', var_export($slashed, 1));
    ob_start();
    var_dump($slashed);
    $result = rtrim(ob_get_clean());
    printf(" * Definition as %'.-12s results in %s\n", $definition, $result);
}
Need Some Help With Regex:
I want to replace
[url=http://youtube.com]YouTube.com[/url]
with
YouTube.com
the regex
preg_replace("/[url=(.*?)](.*?)[/url]/is", '$2', $text);
why does this give me:
Warning: preg_replace() [function.preg-replace]: Unknown modifier 'r' in C:\Programa\wamp\www\func.php on line 18
You should escape special characters in your regular expression:
preg_replace('/\[url=(.*?)](.*?)\[\/url]/is', '$2', $text);
I have escaped the [ characters (they specify the start of a character class) and the / character (it specifies the boundaries of the regular expression.)
Alternatively (for the / character) you can use another boundary character:
preg_replace('#\[url=(.*?)](.*?)\[/url]#is', '$2', $text);
You still have to escape the [ characters, though.
PHP is interpreting the '/' in /url as being the end of the regex and the start of the regex options. Insert a '\' before it to make it a literal '/'.
You need to escape the '['s in the same way (otherwise they will be interpreted as introducing a character class).
preg_replace("/\[url=(.*?)](.*?)\[\/url]/is", '$2', $text);
Both the slashes and square brackets are special characters in regex, you will need to escape them:
\/ \[ \]
The 2nd '/' in a regex string ends the regex. You need to escape it. Also, preg_replace will interpret the '[url=(.*?)]' as a character class, so you need to escape those as well.
preg_replace('/\[url=(.*?)\](.*?)\[\/url\]/is', '$2', $text);
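If escaping by hand feels error-prone, preg_quote() can escape a literal fragment for you, including the delimiter when it is passed as the second argument. The sketch below applies it to the closing tag only; the sample text and variable names are made up for illustration.

```php
<?php
$text = 'Visit [url=http://youtube.com]YouTube.com[/url] today';

// preg_quote() escapes regex metacharacters and, here, the "/" delimiter too.
$close = preg_quote('[/url]', '/');
echo $close, "\n"; // \[\/url\]

$plain = preg_replace('/\[url=(.*?)\](.*?)' . $close . '/is', '$2', $text);
echo $plain, "\n"; // Visit YouTube.com today
```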
You seem to be just starting out with regular expressions. If that is the case - or maybe even if it isn't - you will find the Regex Coach to be a very helpful tool. It provides a sandbox for us to test our pattern matches and our replace strings too. If you had been using that it would have highlighted the need to escape the special characters.