Literal delimiter (delimiter inside \Q \E block) - php

I've been trying to make a few functions based on RegEx, and most of them use \Q and \E because part of the RegEx pattern is user input.
So, let's say hypothetically that we're using the delimiter / and want to match /; the function would construct something along the lines of /\Q/\E/.
I'm not sure why /\Q/\E/ doesn't match /. With every other delimiter it does, unless the input contains that same delimiter.
Maybe it considers the delimiter the end of the pattern even though it's in a literal-only block, and treats the escape as literal. I'm not sure; I've tried a bunch of things.
Hopefully someone can push me in the right direction as to what workarounds there are for this issue.

It helps to understand that / is not a regex metacharacter, like * or (. It's special because you're using it to delimit the regex itself. PHP finds the closing delimiter before the pattern is handed to the regex engine, and that scan only skips backslash-escaped characters; it knows nothing about \Q...\E. So the only way to escape the regex delimiter is with a backslash (\/).
But you shouldn't need to use \Q and \E. The preg_quote() function takes a delimiter argument, so it correctly adds backslashes everywhere they're needed.
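A minimal sketch of both points, assuming PHP 7 or later; the input string 'a/b' and the subject text are made-up examples. The first call shows the \Q...\E pattern being rejected (its warning is suppressed with @), and the rest builds the pattern with preg_quote() instead.

```php
<?php
// Hypothetical user input containing the '/' delimiter.
$input = 'a/b';

// PHP finds the closing '/' before PCRE ever sees \Q, so the pattern is just
// '\Q' and '\E/' is read as modifiers ("Unknown modifier '\'").
var_dump(@preg_match('/\Q/\E/', '/'));      // bool(false)

// preg_quote() escapes regex metacharacters and, because '/' is passed,
// the delimiter as well.
$pattern = '/' . preg_quote($input, '/') . '/';
var_dump($pattern);                         // string(6) "/a\/b/"
var_dump(preg_match($pattern, 'x a/b y'));  // int(1)
```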

Related

Comments in preg regexes using # as delimiter?

With Perl-like regular expression syntax, you can make inline comments using the /x modifier and the # character, but what if I'm using PHP with # as the delimiter for styling reasons? Is there any way to make a comment then?
preg_replace("/foo # This is a comment\n/x", "bar","foobar")
works but
preg_replace("#foo # This is a comment\n#x", "bar","foobar")
doesn't work, and neither does //, /**/ or any other common comment sequence I tried.
In a PHP regex pattern, a delimiter has more "weight" than a pattern part. If you define # as the delimiter, you cannot use it as part of another special construct. So "#foo # This is a comment\n#x" and "#foo (?# This is a comment\n)#x" won't work, because the first unescaped # after the opening delimiter is taken as the end of the pattern.
When you escape a #, it becomes a literal # symbol. The "#foo \\# This is a comment\n#x" will match "foo#Thisisacomment" as once it is escaped, it is matched as a literal symbol.
So, the best advice is available on the "Delimiters" page at php.net:
If the delimiter needs to be matched inside the pattern it must be escaped using a backslash. If the delimiter appears often inside the pattern, it is a good idea to choose another delimiter in order to increase readability.
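A small sketch, assuming the goal is simply to keep /x comments: pick a delimiter that never appears in the pattern or its comments (~ here is an arbitrary choice), and note that under # delimiters an escaped \# is a literal character, not a comment.

```php
<?php
// '~' is an arbitrary alternative delimiter that does not occur in the pattern,
// so the '#' can start an /x comment again.
$pattern = "~foo # This is a comment\n~x";
$result  = preg_replace($pattern, 'bar', 'foobar');
echo $result, "\n"; // barbar

// With '#' as the delimiter, an escaped \# is a literal '#'; under /x the
// spaces are ignored, so this pattern matches "foo#bar".
$literal = "#foo \\# bar#x";
var_dump(preg_match($literal, 'foo#bar')); // int(1)
```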

PHP preg_replace() pattern does not work

I want to replace text using preg_replace(), but my search string has a /, which causes a problem.
How can I solve it?
$search='r/trtrt';
echo preg_replace('/\b'.addslashes($search).'\b/', 'ERTY', 'TG FRT');
I am getting the error preg_replace(): Unknown modifier 'T'
Use a different delimiter and don't use addslashes(); it escapes characters that are not special in a regex (or a mix of regex and non-regex characters; I'd say that the majority of the time you shouldn't use addslashes() at all).
$search='r/trtrt';
echo preg_replace('~\b'. $search.'\b~', 'ERTY', 'TG FRT');
You could use preg_quote as an alternative. Just changing the delimiter is the easiest solution though.
Use ~ as the delimiter:
$search='r/trtrt';
echo preg_replace('~\b'.$search.'\b~', 'ERTY', 'TG FRT');
I always use ~ as it is one of the least used characters in a string, but you can use any non-alphanumeric, non-backslash, non-whitespace character you want, and then you won't need to escape the slashes in your regexp!
You don't need addslashes() in your case but if you have a more complex regexp and you want to escape chars you should use preg_quote($search).
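A sketch combining both suggestions, under the assumption that $search may later contain regex metacharacters: ~ as the delimiter, plus preg_quote() with '~' passed so that a ~ in the input would be escaped too. The subject 'TG r/trtrt FRT' is a made-up string that actually contains the search term.

```php
<?php
$search = 'r/trtrt';

// '/' is not special with '~' delimiters, but preg_quote() still escapes any
// regex metacharacters in $search, plus '~' because it is passed as delimiter.
$pattern = '~\b' . preg_quote($search, '~') . '\b~';

// Made-up subject that contains the search term.
echo preg_replace($pattern, 'ERTY', 'TG r/trtrt FRT'), "\n"; // TG ERTY FRT
```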
Why not escape it the way it is meant to be done?
$search='r/trtrt';
echo preg_replace('/\b'.preg_quote($search, '/').'\b/', 'ERTY', 'TG FRT');
http://php.net/manual/en/function.preg-quote.php
preg_quote() takes str and puts a backslash in front of every character that is part of the regular expression syntax. This is useful if you have a run-time string that you need to match in some text and the string may contain special regex characters.
delimiter
If the optional delimiter is specified, it will also be escaped. This is useful for escaping the delimiter that is required by the PCRE functions. The / is the most commonly used delimiter.
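addslashes() is not the function to use here. It only escapes single quotes, double quotes, backslashes and NUL bytes, so it leaves the / delimiter and regex metacharacters such as . + * ? ( ) untouched.
The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : - (and, as of PHP 7.3.0, #)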
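To see the difference concretely, here is a short comparison (the second input string, '1+1=2?', is a made-up example):

```php
<?php
$search = 'r/trtrt';

// addslashes() only escapes ', ", \ and NUL, so the '/' is left alone.
var_dump(addslashes($search));       // string(7) "r/trtrt"

// preg_quote() with the delimiter argument escapes the '/'.
var_dump(preg_quote($search, '/'));  // string(8) "r\/trtrt"

// Regex metacharacters make the difference more obvious.
var_dump(addslashes('1+1=2?'));      // string(6) "1+1=2?"
var_dump(preg_quote('1+1=2?'));      // string(9) "1\+1\=2\?"
```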
Using the proper functions promotes readability of the code. If at some later point in time you or another coder see the ~ delimiter, they may just think it's part of a personal "style" or pay it little attention. However, seeing the input properly escaped will tell any experienced coder that the input could contain characters that conflict with regular expressions.
Personally, readability is at the top of my list whenever I write code. If you can't understand it at a glance, what good is it?

Why are forward slashes used in the preg_match function of PHP?

I've been practicing the preg_match() function in PHP. The tutorial said that you need to add forward slashes around the pattern.
I also noticed that without the slashes, it works strangely. It gives a warning:
preg_match(): Delimiter must not be alphanumeric or backslash.
Q: What difference do the forward slashes make?
Here's the code:
$string = 'Okay, I\'m fine with it! ';
$math = 'Okay'; // I need to add forward slashes for it to work
echo preg_match($math, $string); // It supposedly echoes out 1 or 0
// depending on whether the former argument
// is in the latter argument
There is no particular reason; it's a syntactic choice. This syntax has the advantage of making it easy to add global modifiers to the pattern:
delimiter - pattern - delimiter - [global modifiers]
As explained in the error message and in the PHP manual, you can choose the delimiter from among the special characters. The most commonly used is the slash, but it's not always a good choice, in particular when the pattern contains a lot of literal slashes that need to be escaped.
It's because you can also apply switches to the regular expression (e.g. m for multiline, u for Unicode) and these need to be defined outside of the delimiter, so the syntax is
opening delimiter expression closing delimiter [optional switches]
e.g.
/^[a-z]*$/mi
for the multiline (m) and case insensitive (i) switches, using a delimiter of /
The delimiter must not be a character that can be misinterpreted by the regexp parser; it must be very clear that it is a delimiter, so it cannot be alphanumeric (e.g. i) or a \, which is used to "escape" characters in the regexp.
Note that you can also use bracket pairs as delimiters, so
[^[a-z]*$]mi
is valid.
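A quick way to see the delimiter and modifier syntax in action; the subject strings other than the question's $string are made-up examples, and the first call's warning is suppressed with @:

```php
<?php
$string = "Okay, I'm fine with it! ";

// No delimiters: PHP takes 'O' as the delimiter and rejects it.
var_dump(@preg_match('Okay', $string));            // bool(false)

// Slash delimiters, no modifiers: the match is case-sensitive.
var_dump(preg_match('/Okay/', $string));           // int(1)
var_dump(preg_match('/okay/', $string));           // int(0)

// The i modifier after the closing delimiter makes it case-insensitive.
var_dump(preg_match('/okay/i', $string));          // int(1)

// Square brackets as delimiters, with the m and i modifiers.
var_dump(preg_match('[^[a-z]*$]mi', "123\nabc"));  // int(1)
```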

PHP preg_match_all strange behaviour with "/" character

Using :
preg_match_all(
"/\b".$KeyWord."\b/u",
$SearchStr,
$Array1,
PREG_OFFSET_CAPTURE);
This code works fine for all cases except when there is a / in the $KeyWord var. Then I get a warning and unsuccessful match of course.
Any idea how to work around this?
Thanks
Use preg_quote() around the keyword.
http://us2.php.net/preg_quote
but also provide your delimiter, so it gets escaped: preg_quote($KeyWord, "/")
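A sketch of that fix with hypothetical values for $KeyWord and $SearchStr, also showing the byte offsets that PREG_OFFSET_CAPTURE returns:

```php
<?php
// Hypothetical values; the keyword contains the '/' delimiter.
$KeyWord   = 'and/or';
$SearchStr = 'this and/or that, and/or more';

$pattern = '/\b' . preg_quote($KeyWord, '/') . '\b/u';
$count   = preg_match_all($pattern, $SearchStr, $Array1, PREG_OFFSET_CAPTURE);

echo $count, "\n"; // 2
foreach ($Array1[0] as [$text, $offset]) {
    echo "$text at byte $offset\n"; // "and/or at byte 5", then "and/or at byte 18"
}
```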
You must escape $KeyWord by adding "\" before all special symbols; you can use preg_quote() for that.
Dynamic Values In Patterns
You are using a dynamic value inside the pattern. As with escaping for SQL or HTML, the value needs a specific escaping. If you do not escape it, metacharacters inside the value are interpreted by the regex engine. The escaping function for PCRE patterns is preg_quote().
preg_match_all(
"(\b".preg_quote($KeyWord)."\b)u",
$SearchStr,
$Array1,
PREG_OFFSET_CAPTURE
);
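A sketch of why the () delimiters combine well with preg_quote() even without the delimiter argument: ( and ) are regex metacharacters, so preg_quote() always escapes them, and the / needs no escaping because it is not the delimiter. The keyword and subject are hypothetical, and \b is left out here because a keyword ending in ) would not be followed by a word boundary.

```php
<?php
// Hypothetical keyword containing both '/' and parentheses.
$KeyWord   = 'f(a/b)';
$SearchStr = 'call f(a/b) now';

// preg_quote() always escapes '(' and ')', so the () delimiters stay
// balanced; '/' is left alone because it is not the delimiter here.
$pattern = '(' . preg_quote($KeyWord) . ')u';
var_dump($pattern);                               // string(11) "(f\(a/b\))u"
var_dump(preg_match($pattern, $SearchStr, $m));   // int(1)
var_dump($m[0]);                                  // string(6) "f(a/b)"
```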
Delimiters
The syntax of a pattern in PHP's preg_* functions is:
DELIMITER PATTERN DELIMITER OPTIONS
The / is the delimiter in your pattern, so the / inside $KeyWord was recognized as the closing delimiter.
But any non-alphanumeric, non-backslash, non-whitespace character can be used. In Perl and JS you can define a regular expression directly (not as a string) using /, so it is often the default in tutorials.
Most delimiters have to be escaped inside the pattern.
Match a /: '/\//'
The exception to this rule are brackets. You can use any of the bracket pairs as delimiters, and because they are a pair, they can still be used inside the pattern (as long as they are balanced).
Match a /: '(/)'
The () brackets are a good choice; you can think of them as "subpattern 0".
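A short sketch of both forms, with a made-up subject; the last call also shows balanced parentheses inside () delimiters, with the whole match available as $m[0]:

```php
<?php
$subject = 'a/b';

// Escaped delimiter.
var_dump(preg_match('/\//', $subject));          // int(1)

// Bracket delimiters: the '/' needs no escaping.
var_dump(preg_match('(/)', $subject));           // int(1)

// Balanced parentheses inside () delimiters still work as groups;
// $m[0] is the whole match, $m[1] and $m[2] are the groups.
var_dump(preg_match('((a)/(b))', $subject, $m)); // int(1)
var_dump($m[0], $m[1], $m[2]);                   // string(3) "a/b", string(1) "a", string(1) "b"
```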
You can use preg_quote() to handle the slash character.
From the manual:
puts a backslash in front of every character that is part of the regular expression syntax
You can also pass the delimiter as the second parameter and it will also be escaped. However, if you're using # as your delimiter, then there's no need to escape /.
So, you can either use:
preg_match_all("/\b".preg_quote($KeyWord, "/")."\b/u", $SearchStr, $Array1, PREG_OFFSET_CAPTURE);
or, if you are sure that your keyword does not contain any other regex-special characters (or a #), you can simply change the delimiter to #, so the / no longer needs escaping:
preg_match_all("#\b".$KeyWord."\b#u", $SearchStr, $Array1, PREG_OFFSET_CAPTURE);

why does this regex fail in PHP?

I got the expression directly from RegExr, but PHP has a problem with the =
"/[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/"
The expression is for matching email addresses.
You used / as the delimiter marking the start and end of the pattern, but then also used that character within the pattern. You must either use a different delimiter, or escape instances of it within the pattern. If you meant to escape the equals signs, then you used the wrong slash.
Escape the slash preceding the = (and the other slash in that expression). You use / as a delimiter, therefore if it occurs inside the pattern it has to be escaped.
"/[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/"
should work, then.
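A reduced version of the same failure, using a made-up pattern instead of the full email expression: an unescaped / before = ends the pattern, so PHP reads the rest as modifiers and rejects the =.

```php
<?php
$subject = 'a/=b';

// Unescaped: the pattern ends after 'a' and '=b/' is read as modifiers,
// so PHP warns "Unknown modifier '='" and preg_match() returns false.
var_dump(@preg_match('/a/=b/', $subject));  // bool(false)

// Escaped as \/: the slash is part of the pattern.
var_dump(preg_match('/a\/=b/', $subject));  // int(1)

// Different delimiter: no escaping of '/' needed.
var_dump(preg_match('~a/=b~', $subject));   // int(1)
```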
You are using / as delimiter. There are two / in the regex which are not escaped. Escape them as \/:
"/[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/"
