Why does PHP's preg_quote escape unnecessary characters? - php

From http://php.net/manual/en/function.preg-quote.php:
preg_quote() takes str and puts a backslash in front of every character that is part of the regular expression syntax. This is useful if you have a run-time string that you need to match in some text and the string may contain special regex characters.
The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -
Note that / is not a special regular expression character.
} is unnecessary but I can understand why they'd include it for symmetry. E.g. the following code works:
$re = '/}{This is fine}{/';
preg_match($re, $re, $match);
var_dump($match);
The output is:
array(1) {
[0] =>
string(16) "}{This is fine}{"
}
Why do they include = ! < > :? As far as I can tell, they're only ever special after being introduced by another unescaped meta character, e.g. immediately after (?, both of which characters also get escaped. : can also be special inside character classes like so: [[:alpha:]], but all four brackets get escaped.

I think the idea behind it is to have consistent behaviour.
The goal of preg_quote is to produce a literal string for a regex pattern. This means that no character in the returned string can be interpreted as anything other than itself, whatever the context, and the context can be a concatenation with another part of the pattern.
If I write '/(?' . preg_quote('>') . 'abc)/', I expect that the > will not be interpreted as the > of an atomic group, and that the pattern returns an error.
If I write '/.{3' . preg_quote('}') . '/', I expect that the } will not be interpreted as the closing curly bracket of a quantifier, and that the pattern matches a string like 'a{3}', but not 'abc'.
You can easily build the same kind of examples for = ! < > : using lookahead assertions, named groups, non-capturing groups, or atomic groups.
The important thing is that the expected behaviour is always the same, whatever the way or the context in which the function is used.
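To make the concatenation argument concrete, here is a minimal sketch. The ^ and $ anchors in the pattern are my addition, so that the pattern has to cover the whole subject, and the variable names are only illustrative.

```php
<?php
// preg_quote() escapes every listed character, whatever the surrounding context.
$quoted = preg_quote('a{3}=!<>:');   // a\{3\}\=\!\<\>\:

// The quoted } can no longer close the {3 quantifier, so {3} is matched literally.
$pattern = '/^.{3' . preg_quote('}') . '$/';      // the pattern is /^.{3\}$/
$matchesLiteral = preg_match($pattern, 'a{3}');   // 1
$matchesAbc     = preg_match($pattern, 'abc');    // 0
```

Without the call to preg_quote, the same concatenation would produce /^.{3}$/, which matches 'abc'.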

Well what happens if you're trying to write some code like this:
$lookahead = getUserInput(); // Not escaped
$results = preg_match('/abc(?' . $lookahead . ')/', $subject);
and the user gives the input !def? The answer is you get negative lookahead instead of regular lookahead. If you don't want to allow negative lookaheads, you're going to want to make sure that exclamation mark is escaped.
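A minimal sketch of that scenario, assuming the code (and not the user) is meant to decide that the group is a positive lookahead. The input value and the subjects are made up for illustration.

```php
<?php
// Hypothetical user input; getUserInput() from the snippet above is not defined here.
$lookahead = '!def';

// Unescaped: (?!def) is compiled as a negative lookahead.
$unsafe = '/abc(?' . $lookahead . ')/';
$unsafeResult = preg_match($unsafe, 'abcxyz');   // 1: "def" does not follow "abc"

// Escaped, with the lookahead type fixed by the code rather than by the user.
$safe = '/abc(?=' . preg_quote($lookahead, '/') . ')/';
$safeOnLiteral = preg_match($safe, 'abc!def');   // 1: "abc" is followed by the literal "!def"
$safeOnOther   = preg_match($safe, 'abcxyz');    // 0
```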

Related

PHP how to use a variable in match

I am using the code below to check whether my password contains special characters, and it works fine. But I would like to use a variable such as $mySpecialChar instead of the "[\'^£$%&*()}{##~?><>,|=_+¬-]" string, and I'm not sure whether I can do that. The reason is that I want to be able to pull the string from a database table.
I've tried preg_match_all("/".$mySpecialChar."/"), but no luck.
$matches = array();
if (preg_match_all("/[\'^£$%&*()}{##~?><>,|=_+¬-]/", $pwd, $matches) > 0) {
foreach ($matches[0] as $match) { $specialcase += strlen($match); }
}
Make sure to escape any variables you put in a regular expression
preg_match_all('/'.preg_quote($mySpecialChar, '/').'/', $pwd, $matches);
preg_quote
string preg_quote ( string $str [, string $delimiter = NULL ] )
preg_quote() takes str and puts a backslash in front of every character that is part of the regular expression syntax. This is useful if you have a run-time string that you need to match in some text and the string may contain special regex characters.
The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -
Note that / is not a special regular expression character.
You have 5 or more special characters in there.
Note the last line of the quote above: / is not a special regular expression character. While it is not strictly necessary in your case (I don't see a / in your string), if you pass the delimiter as the second argument, preg_quote will escape that too. That is exactly what I did above with preg_quote($mySpecialChar, '/').
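As a small sketch of what the second argument changes (the sample string is made up for illustration):

```php
<?php
$needle = '1/2+3';   // illustrative run-time string that contains the delimiter

$withoutDelimiter = preg_quote($needle);        // 1/2\+3   (the / is left alone)
$withDelimiter    = preg_quote($needle, '/');   // 1\/2\+3  (safe inside /.../)

// Only the second form can be embedded between / delimiters.
$found = preg_match('/' . $withDelimiter . '/', 'x = 1/2+3');   // 1
```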
If you don't quote these, it's anyone's guess what the pattern will do: you could get an error, you could get an empty capture group from (), you could match any character with the ., and so on. As you have it,
[\'^£$%&*()}{##~?><>,|=_+¬-]
This is a character class, so most of the characters inside it are treated literally, if that is what you intended. If you had [^\'£$%&*()}{##~?><>,|=_+¬-] you would have a negated character class.
Seeing as you are using preg_match_all, and not preg_match, I can probably assume you don't want the character class. Otherwise, why use preg_match_all?
It should simply be, if you want to match everything in $mySpecialChar:
preg_match('/['.preg_quote($mySpecialChar, '/').']+/', $pwd, $matches);
If you are just trying to match the characters between the [....], I would still escape them, since escaping does no harm. If the string comes from a database and starts with ^, it will make a difference; likewise if a - ends up between certain characters, 0-9 for example. Escaping never hurts: just remove the [] when you save the string and add them back as I have above.
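Here is a hedged sketch of that approach. The character list and the password are invented for illustration and stick to ASCII; characters such as £ and ¬ are multibyte in UTF-8, so a class containing them would also need the u modifier.

```php
<?php
// Hypothetical value, as it might be stored in a database (without the surrounding [ ]).
$mySpecialChar = '^-]$%&*';
$pwd = 'pa$$w-rd^';

// preg_quote() neutralises ^, - and ], which would otherwise change or break the class.
$pattern = '/[' . preg_quote($mySpecialChar, '/') . ']/';   // /[\^\-\]\$%&\*]/
$count = preg_match_all($pattern, $pwd, $matches);          // 4
// $matches[0] now holds '$', '$', '-' and '^'.
```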
The [ .... ]+ means 1 or more, the [ ... ]* means none or more, and [...]+? means one or more, non-greedy, and so on. You should then be able to use just [...]+ with preg_match, which will give you a cleaner match than matching one character at a time with [...] and preg_match_all.
Most of the time \W (uppercase) will also match most symbols; it basically means [^a-zA-Z0-9_], that is, anything other than a-z, A-Z, 0-9 and _.
You could always just look for characters that AREN'T the basic ones:
preg_match_all('/[^0-9A-Za-z]/', $pwd, $matches)
Much shorter and just as effective.
You can easily put this in a string if you like:
$specialChars = '[^0-9A-Za-z]';
preg_match_all("/{$specialChars}/", $pwd, $matches);
Running this on the provided password will fill $matches[0] with all of the special characters from the string. To evaluate password complexity, all you need is the length of $pwd and the number of entries in $matches[0] (which is also the return value of preg_match_all), as this tells you the number of special characters.
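A short sketch of that check, using a made-up password:

```php
<?php
$pwd = 'P@ss-w0rd!';   // illustrative password
$specialChars = '[^0-9A-Za-z]';

// preg_match_all() returns the number of full matches it found.
$specialCount = preg_match_all("/{$specialChars}/", $pwd, $matches);   // 3
$length = strlen($pwd);                                                // 10
// $matches[0] holds the matched characters: '@', '-' and '!'.
```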

'Delimiter must not be alphanumeric or backslash' and preg_replace() [duplicate]

I am trying to take a string of text like so:
$string = "This (1) is (2) my (3) example (4) text";
In every instance where there is a positive integer inside of parentheses, I'd like to replace that with simply the integer itself.
The code I'm using now is:
$result = preg_replace("\((\d+)\)", "$0", $string);
But I keep getting a "Delimiter must not be alphanumeric or backslash" warning.
Any thoughts? I know there are other questions on here that sort of answer the question, but my knowledge of regex is not enough to switch it over to this example.
You are almost there. You are using:
$result = preg_replace("((\d+))", "$0", $string);
The regex you specify as the 1st argument to the preg_* family of functions should be enclosed in a pair of delimiters. Since you are not using any delimiters, you get that error.
( and ) are metacharacters in a regex, meaning they have special meaning. Since you want to match a literal open parenthesis and close parenthesis, you need to escape them using a \. A metacharacter preceded by \ is treated literally.
You are capturing the integer correctly using \d+. But the captured integer will be in $1 and not $0. $0 will hold the entire match, that is, the integer within parentheses.
If you do all the above changes you'll get:
$result = preg_replace("#\((\d+)\)#", "$1", $string);
1) You need to have a delimiter, the / works fine.
2) You have to escape the ( and ) characters so it doesn't think it's another grouping.
3) Also, the replace variables here start at 1, not 0 (0 contains the FULL text match, which would include the parentheses).
$result = preg_replace("/\((\d+)\)/", "\\1", $string);
Something like this should work. Any further questions, go to PHP's preg_replace() documentation - it really is good.
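To see the difference between $0 and $1 side by side, here is a minimal sketch. It uses single-quoted strings so that PHP itself leaves the \ and $ characters alone; the variable names are illustrative.

```php
<?php
$string = "This (1) is (2) my (3) example (4) text";

// $1 is the first capture group: just the digits.
$withGroup = preg_replace('/\((\d+)\)/', '$1', $string);
// $0 is the whole match, parentheses included, so nothing changes.
$withWhole = preg_replace('/\((\d+)\)/', '$0', $string);

echo $withGroup, "\n";   // This 1 is 2 my 3 example 4 text
echo $withWhole, "\n";   // This (1) is (2) my (3) example (4) text
```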
Check the docs - you need to use a delimiter before and after your pattern: "/\((\d+)\)/"
You'll also want to escape the outer parentheses above as they are literals, not a nested matching group.
See: preg_replace manual page
Try:
<?php
$string = "This (1) is (2) my (3) example (4) text";
$output = preg_replace('/\((\d+)\)/', '$1', $string);
echo $output;
?>
The parenthesis chars are special chars in a regular expression. You need to escape them to use them.
I got the same "Delimiter must not be alphanumeric or backslash" warning. Try wrapping your pattern in "/ .... /" as shown below; otherwise PHP emits that warning.
$yourString='hi there, good friend';
$dividorString='there';
$someSstring=preg_replace("/$dividorString/",'', $yourString);
echo($someSstring);
// hi , good friend
This worked for me.

PCRE regex with lookahead and lookbehind always returns true

I’m trying to create a regex for form validation but it always returns true. The user must be able to add something like {user|2|S} as input but also use brackets if they are escaped with \.
This code checks for the left bracket { for now.
$regex = '/({(?=([a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)}))|[^{]|(?<=\\\){)*/';
if (preg_match($regex, $value)) {
return TRUE;
} else {
return FALSE;
}
A possible correct input would be:
Hello {user|1|S}, you have {amount|2|D2}
or
Hello {user|1|S}, you have {amount|2|D2} in \{the_bracket_bank\}
However, this should return false:
Hello {user|1|S}, you have {amount|2}
and this also:
Hello {user|1|S}, you have {amount|2|D2} in {the_bracket_bank}
A live example can be found here: http://regexr.com?37tpu Note that there is a \ in the lookbehind at the end, PHP was giving me error messages because I had to escape it an extra time in my code.
The main error is that you do not specify that the regex should match from the beginning to the end of the checked string. Use the ^ and $ assertions.
I think you have to escape { and } in your regex as they have special meaning. Together they form a quantifier.
The (?<=\\\) is better written (?<=\\\\). The backslash has to be double escaped as it has special meaning in both single-quoted string and PCRE regex. Using \\\ works too, because if single-quoted string contains any escape sequence except \\ and \', it handles it as literal backslash and letter, therefore \) is taken literally. But explicitly escaping the backslash twice seems easier to read to me.
The regex should be
$regex = '/^(\{(?=([a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)\}))|[^{]|(?<=\\\\)\{)*$/';
But notice that the look-around assertions are not necessary. This regex should do the job too:
$regex = '/^([^{]|\\\{|\{[a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)\})*$/';
Any non-{ characters are matched by the first alternative. When a { is read, one of the remaining two alternatives is used. Either the pattern for the brace thing matches, or the regex engine backtracks one character and tries to match \{ character sequence. If it fails, both ways, it backtracks further till it reaches string start and fails completely.
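As a quick check, the lookaround-free regex can be run against the four sample inputs from the question. This is only a sketch; the array and variable names are mine.

```php
<?php
$regex = '/^([^{]|\\\{|\{[a-zA-Z0-9]+\|[0-9]*\|(S|D[0-9]*)\})*$/';

// The four sample inputs from the question, in the order they were given.
$inputs = [
    'Hello {user|1|S}, you have {amount|2|D2}',
    'Hello {user|1|S}, you have {amount|2|D2} in \{the_bracket_bank\}',
    'Hello {user|1|S}, you have {amount|2}',
    'Hello {user|1|S}, you have {amount|2|D2} in {the_bracket_bank}',
];

$results = [];
foreach ($inputs as $input) {
    $results[] = preg_match($regex, $input) === 1;
}

var_dump($results);   // true, true, false, false
```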
Matching without lookbehind
You can write a regex for this without lookbehinds or lookaheads, which is usually the recommended approach.
For example, your requirement is that you can match any character but a { or a }, unless it's preceded by a \. You can also phrase this as:
Match any character but a { or a }, OR match a \{ or a \}. To match any character but a { or a } use:
[^{}]
To match a \{ use:
\\\{
The first two backslashes match one literal backslash (one escapes the other), and the third escapes the { (which might not be necessary, depending on your regex engine).
You would end up with this:
(?:
[^{}]
|
\\\{
|
\\\}
)+
I formatted this regex across several lines so that it's readable. If you want to use it in your code like this, make sure to use the PCRE_EXTENDED (x) modifier.
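In PHP, one way to keep that layout is a nowdoc, which takes backslashes literally and so avoids a second round of escaping. This is a sketch under my own assumptions: the ^ and $ anchors and the inline comments are additions, not part of the pattern above.

```php
<?php
// The x modifier makes PCRE ignore whitespace and treat # as a comment start.
$regex = <<<'REGEX'
/^(?:
    [^{}]   # any character except a brace
  | \\\{    # a backslash followed by an opening brace
  | \\\}    # a backslash followed by a closing brace
)+$/x
REGEX;

$escaped   = preg_match($regex, 'in \{the_bracket_bank\}');   // 1
$unescaped = preg_match($regex, 'in {the_bracket_bank}');     // 0
```

Note that comments inside such a pattern must not contain the delimiter character, because PHP looks for the closing delimiter before PCRE ever sees the comment.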
Looks more of a job for a lookbehind to me:
/((?<!\\\\)\{[a-zA-Z0-9]+\|[0-9]+\|[SD][0-9]*\})/
However, the obfuscation factor is so high that I would rather recognize all bracketed strings and parse them later.

PHP Regex for matching a UNC path

I'm after a bit of regex to be used in PHP to validate a UNC path passed through a form. It should be of the format:
\\server\something
... and allow for further sub-folders. It might be good to strip off a trailing slash for consistency although I can easily do this with substr if need be.
I've read online that matching a single backslash in PHP requires 4 backslashes (when using a "C like string") and think I understand why that is: PHP escaping first (2 = 1, so 4 = 2), then regex engine escaping (the remaining 2 = 1). I've seen the following two quoted as equivalent, suitable regexes to match a single backslash:
$regex = "/\\\\/s";
or apparently this also:
$regex = "/[\\]/s";
However these produce different results, and that is slightly aside from my final aim to match a complete UNC path.
To see if I could match two backslashes I used the following to test:
$path = "\\\\server";
echo "the path is: $path <br />"; // which is \\server
$regex = "/\\\\\\\\/s";
if (preg_match($regex, $path))
{
echo "matched";
}
else
{
echo "not matched";
}
The above however seems to match on two or more backslashes :( The pattern is 8 slashes, translating to 2, so why would an input of 3 backslashes ($path = "\\\\\\server") match?
I thought perhaps the following would work:
$regex = "/[\\][\\]/s";
and again, no :(
Please help before I jump out a window lol :)
Use this little gem:
$UNC_regex = '=^\\\\\\\\[a-zA-Z0-9-]+(\\\\[a-zA-Z0-9`~!##$%^&(){}\'._-]+([ ]+[a-zA-Z0-9`~!##$%^&(){}\'._-]+)*)+$=s';
Source: http://regexlib.com/REDetails.aspx?regexp_id=2285 (adapted to PHP string escaping)
The regex shown above matches a valid hostname (which allows only a few characters) followed by the path part after the hostname (which allows many, but not all, characters).
Sidenote on the backslashes issue:
When you use double quotes (") to enclose your string, you must be aware of PHP's special character escaping: "\\" is a single \ in PHP.
Important: even with single quotes (') those backslashes must be escaped.
A PHP string with single quotes takes everything in the string literally (unescaped) with a few exceptions:
A backslash followed by a backslash (\\) is interpreted as a single backslash.
('C:\\*.*' => C:\*.*)
A backslash followed by a single-quote (\') is interpreted as a single quote.
('I\'ll be back' => I'll be back)
A backslash followed by anything else is interpreted as a backslash.
('Just a \ somewhere' => Just a \ somewhere)
Also, you must be aware of PCRE escape sequences.
The regex parser also uses \ as its own escape character (for example in \d or \.), so a literal backslash has to be escaped again for the regex.
To match two \\ you must write $regex = "\\\\\\\\" or $regex = '\\\\\\\\'
From the PHP docs on PCRE escape sequences:
Single and double quoted PHP strings have special meaning of backslash. Thus if \ has to be matched with a regular expression \\, then "\\\\" or '\\\\' must be used in PHP code.
Regarding your Question:
why would an input of 3 backslashes ($path = "\\\\\\server") match with regex "/\\\\\\\\/s"?
The reason is that you have no boundaries defined (use ^ for beginning and $ for end of string), thus it finds \\ "somewhere" resulting in a positive match. To get the expected result, you should do something like this:
$regex = '/^\\\\\\\\[^\\\\]/s';
The RegEx above has 2 modifications:
^ at the beginning to only match two \\ at the beginning of the string
[^\\] negative character class to say: not followed by an additional backslash
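The effect of the two modifications can be checked with a short sketch (the variable names are illustrative; single-quoted strings are used throughout, so every \\ in the source is one backslash in the value):

```php
<?php
$twoSlashes   = '\\\\server';     // the value is \\server
$threeSlashes = '\\\\\\server';   // the value is \\\server

// Unanchored: finds two backslashes anywhere, so both inputs match.
$loose = '/\\\\\\\\/s';
$looseOnThree = preg_match($loose, $threeSlashes);     // 1

// Anchored, and the third character must not be a backslash.
$strict = '/^\\\\\\\\[^\\\\]/s';
$strictOnTwo   = preg_match($strict, $twoSlashes);     // 1
$strictOnThree = preg_match($strict, $threeSlashes);   // 0
```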
Regarding your last RegEx:
$regex = "/[\\][\\]/s";
Your backslash escaping is confused here (see above for clarification). "/[\\][\\]/s" is turned by PHP into /[\][\]/s, which makes the regex fail to compile, because \ is the escape character in a regex and thus must itself be escaped.
This variant of your regex would work, but it also matches any occurrence of two backslashes, for the same reason I already explained above:
$regex = '/[\\\\][\\\\]/s';
Echo your regex as well, so that you can see the actual pattern. Writing those slashes inside PHP can become awkward, and this lets you verify that the pattern is correct.
You should also put ^ at the beginning of the pattern to match from the start of the string and $ at the end to specify that the whole string has to be matched.
\\server\something
Regex:
~^\\\\server\\something$~
PHP String:
$pattern = '~^\\\\\\\\server\\\\something$~';
For the repetition, you want to say that a server exists and it's followed by one or more \something parts. If server is like something, this can be simplified:
^\\(?:\\[a-z]+){2,}$
PHP String:
$pattern = '~^\\\\(?:\\\\[a-z]+){2,}$~';
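A possible way to wrap this in a check, including the trailing-backslash stripping mentioned in the question. The function name isUncPath is hypothetical, and the pattern only accepts lower-case a-z segments, so a real validator would need a wider character class.

```php
<?php
// Hypothetical helper: validates a UNC path made of lower-case a-z segments only.
function isUncPath(string $path): bool
{
    // Strip trailing backslashes first, for consistency.
    $path = rtrim($path, '\\');

    return preg_match('~^\\\\(?:\\\\[a-z]+){2,}$~', $path) === 1;
}

$valid         = isUncPath('\\\\server\\something');         // true
$validTrailing = isUncPath('\\\\server\\something\\sub\\');  // true
$invalid       = isUncPath('\\\\server');                    // false: no share after the server
```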
As there was some confusion about how \ characters should be written inside single quoted strings:
# Output:
#
# * Definition as '\\' ....... results in string(1) "\"
# * Definition as '\\\\' ..... results in string(2) "\\"
# * Definition as '\\\\\\' ... results in string(3) "\\\"
$slashes = array(
'\\',
'\\\\',
'\\\\\\',
);
foreach($slashes as $i => $slashed) {
$definition = sprintf('%s ', var_export($slashed, 1));
ob_start();
var_dump($slashed);
$result = rtrim(ob_get_clean());
printf(" * Definition as %'.-12s results in %s\n", $definition, $result);
}
