Escaping regular expressions in PHP

I am trying to escape a PCRE in PHP for use in a script. For some reason I can't get it to work once it has been escaped; I've only managed to get it working when the regex is given as a form input.
The regex I'm using is:
$pattern = '£((http|ftp|https):\/\/)?([\w\-_]+(?:(?:\.[\w\-_]+)+))([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?£';
So far I have tried:
preg_quote(): converts the Regex to the following and throws an error: £((http\|ftp\|https):\/\/)\?([\w\-_]+(\?:(\?:\.[\w\-_]+)+))([\w\-\.,#\?\^\=%&:/~\+#]*[\w\-\#\?\^\=%&/~\+#])\?£
htmlentities(): gives error: Warning: preg_match(): Unknown modifier 'a'
addslashes(): same as above
mixture of the 3: same as above
Does anyone have an idea of what I'm doing wrong?

The pound symbol was the issue here; replacing it with an exclamation mark solved the problem.
Working expression:
$pattern = '!((http|ftp|https):\/\/)?([\w\-_]+(?:(?:\.[\w\-_]+)+))([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?!';
For some reason this is working fine with no escape functions.
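A likely reason, assuming the script is saved as UTF-8: PHP reads the delimiter as a single byte, but £ takes two bytes in UTF-8, so the pattern is split in the wrong place. The ! delimiter is one ASCII byte and never occurs inside this pattern, so nothing has to be escaped. A minimal sketch (the sample URL is made up):

```php
<?php
// Working pattern from the answer: '!' delimits it and never occurs inside it.
$pattern = '!((http|ftp|https):\/\/)?([\w\-_]+(?:(?:\.[\w\-_]+)+))([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?!';

// Hypothetical input, used only for illustration.
$text = 'see http://example.com/path?x=1';

if (preg_match($pattern, $text, $m) === 1) {
    echo $m[3], "\n"; // example.com  (host part)
    echo $m[4], "\n"; // /path?x=1    (optional path/query part)
}

// In a UTF-8 file the pound sign is two bytes, '!' is one.
echo strlen('£'), ' vs ', strlen('!'), "\n"; // 2 vs 1
```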

Related

PHP preg_match_all regex expression weirdness

I am having some trouble with regex in PHP (preg_match_all).
I am using the following code to find an email address in the form <address>:
preg_match_all("<[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})>:", $body,$matches);
For some reason PHP is blowing up at the colon with the following error:
Warning: preg_match_all() [function.preg-match-all]: Unknown modifier ':' in...
Any help would be much appreciated, as I am no regex guru, and am just about out of hair to pull.
You need to use delimiters, e.g.:
preg_match_all('/<[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})>:/', $body,$matches);
See the / I added on both ends telling PHP where the regex starts and ends.
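Why the colon in particular: without explicit delimiters, PHP takes the leading < and the matching > as a bracket-style delimiter pair, so the : after them is parsed as a modifier. This sketch puts the failing call next to the fixed one; the $body value is made up, and the address separator is written as @.

```php
<?php
// Hypothetical message body used only for this example.
$body = "From: <john.doe@example.com>: hello";

// No delimiters: '<' ... '>' become the delimiters and ':' a modifier, so the
// pattern fails to compile (the @ operator only silences the warning here).
$bad = @preg_match_all("<[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})>:", $body, $matches);
var_dump($bad); // bool(false)

// Explicit '/' delimiters: '<', '>' and ':' are now ordinary pattern characters.
$count = preg_match_all('/<[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})>:/', $body, $matches);
echo $count, "\n";         // 1
echo $matches[0][0], "\n"; // <john.doe@example.com>:
```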
Alternatively, you could use the T-Regx library, which doesn't need delimiters at the start and end:
$pattern = "<[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})>:";
$matches = Pattern::of($pattern)->match($body)->all();

php regex error, unknown modifier 'd' [duplicate]

This question already has answers here:
Warning: preg_replace(): Unknown modifier
(3 answers)
Closed 3 years ago.
I'm trying to search for something on a page, but I keep getting this silly error.
This is the error I am getting:
Warning: preg_match() [function.preg-match]: Unknown modifier 'd'
This is the code I'm using:
$qa = file_get_contents($_GET['url']);
preg_match('/<a href="/download.php\?g=(?P<number>.+)">Click here</a>/',$qa,$result);
$_GET['url'] can equal many things, but in one case it was http://freegamesforyourwebsite.com/game/18-wheeler--2.html
so $qa is basically the HTML of that URL.
Anyone got a clue? I don't even know where to start, because I don't know what a modifier is, and the php.net site is no help.
Thank you!
You need to escape the '/' before download.php; otherwise PHP thinks you are ending your regex and providing 'd' as a modifier. You will also need to escape the '/' in the closing anchor tag.
preg_match('/<a href="\/download.php\?g=(?P<number>.+)">Click here<\/a>/',$qa,$result);
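A quick check of this escaped pattern, assuming $qa holds markup shaped like the link the pattern expects (the sample HTML and the number 1234 are made up):

```php
<?php
// Made-up markup standing in for file_get_contents($_GET['url']).
$qa = '<p><a href="/download.php?g=1234">Click here</a></p>';

// Every '/' inside the pattern is written as '\/' so it does not end the regex.
if (preg_match('/<a href="\/download.php\?g=(?P<number>.+)">Click here<\/a>/', $qa, $result)) {
    echo $result['number'], "\n"; // 1234
}
```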
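You have to escape your pattern delimiters or use different ones:
# escape each '/' inside the pattern
preg_match('/<a href="\/download.php\?g=(?P<number>.+)">Click here<\/a>/',$qa,$result);
# or use hash marks as delimiters instead
preg_match('#<a href="/download.php\?g=(?P<number>.+)">Click here</a>#',$qa,$result);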
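Instead of escaping by hand, preg_quote() takes the delimiter as its second argument and escapes it along with the regex metacharacters. A sketch with made-up markup; the variable names are only illustrative:

```php
<?php
$qa = '<p><a href="/download.php?g=1234">Click here</a></p>';

// Escape the literal parts (including every '/'), then splice in the capture group.
$prefix  = preg_quote('<a href="/download.php?g=', '/');
$suffix  = preg_quote('">Click here</a>', '/');
$pattern = '/' . $prefix . '(?P<number>.+?)' . $suffix . '/';

if (preg_match($pattern, $qa, $result)) {
    echo $result['number'], "\n"; // 1234
}
```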
Your regular expression needs to be escaped correctly.
It should be:
'/<a href="\/download.php\?g=(?P<number>.+)">Click here<\/a>/'
The problem is that your regular expression is delimited by / characters, but also contains / characters as data. What it's complaining about is /download -- it thinks the / has ended your regular expression and the d that follows is a modifier for your regular expression. However, there is no such modifier d.
The easiest solution is to use some character that is not contained in the regex to delimit it. In this case, # would work well.
preg_match('#<a href="/download.php\?g=(?P<number>.+)">Click here</a>#',$qa,$result);
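To see where the 'd' comes from, you can compile the unescaped pattern and read back the warning. A sketch, assuming PHP 7 or later; error_get_last() still records a warning that was silenced with @:

```php
<?php
// Unescaped pattern: the '/' before "download" ends the regex early, so PHP
// tries to read "download.php..." as modifiers and rejects the first one, 'd'.
$broken = '/<a href="/download.php\?g=(?P<number>.+)">Click here</a>/';

$ok  = @preg_match($broken, '');
$err = error_get_last();

var_dump($ok);              // bool(false)
echo $err['message'], "\n"; // the message mentions: Unknown modifier 'd'
```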

What does the Unknown modifier 'c' mean in Regex? [duplicate]

This question already has answers here:
Warning: preg_replace(): Unknown modifier
(3 answers)
Closed 3 years ago.
I'm a newbie with regular expressions and I need some help :).
I have this:
$url = '<img src="http://mi.url.com/iconos/oks/milan.gif" alt="Milan">';
$pattern = '/<img src="http:\/\/mi.url.com/iconos/oks/(.*)" alt="(.*)"\>/i';
preg_match_all($pattern, $url, $matches);
print_r($matches);
And I get this error:
Warning: preg_match_all() [function.preg-match-all]: Unknown modifier 'c'
I want to select that 'milan.gif'.
How can I do that?
If you’re using / as delimiter, you need to escape every occurrence of that character inside the regular expression. You didn’t:
/<img src="http:\/\/mi.url.com/iconos/oks/(.*)" alt="(.*)"\>/i
                              ^
Here the marked / is treated as the end delimiter of the regular expression, and everything after it is treated as modifiers. i is a valid modifier but c isn't (see your error message).
So:
/<img src="http:\/\/mi\.url\.com\/iconos\/oks\/(.*)" alt="(.*)"\>/i
But as Pekka already noted in the comments, you shouldn’t try to use regular expressions on a non-regular language like HTML. Use an HTML parser instead. Take a look at Best methods to parse HTML.
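A quick check of the fully escaped pattern against the tag from the question (a sketch; the regex caveat above still applies to real-world HTML):

```php
<?php
$url = '<img src="http://mi.url.com/iconos/oks/milan.gif" alt="Milan">';

// Every '/' inside the pattern is escaped, and so are the literal dots.
$pattern = '/<img src="http:\/\/mi\.url\.com\/iconos\/oks\/(.*)" alt="(.*)"\>/i';

preg_match_all($pattern, $url, $matches);
echo $matches[1][0], "\n"; // milan.gif
echo $matches[2][0], "\n"; // Milan
```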
The problem is that you haven't escaped the forward slashes in the URL string (you have escaped the ones in the http:// part, but not the ones in the URL path).
Therefore the first unescaped slash it comes across (the one after .com) is taken as the end of the regex, and everything after that slash is treated as modifier codes.
The next character ('i') is a valid modifier (as you know, since you're actually using it in your example), so that passes the test. However the next character ('c') is not, so it throws an error, which is what you're seeing.
To fix it, simply escape the slashes. So your example would look like this:
$pattern = '/<img src="http:\/\/mi.url.com\/iconos\/oks\/(.*)" alt="(.*)"\\>/i';
Hope that helps.
Note, as someone has already said, it's generally not advisable to use regex to match HTML, since HTML can be too complex to match accurately. It's generally preferable to use a DOM parser. In your example, the regex could fail if the alt attribute or the end of the image URL contains unexpected characters, or if the quoting in the HTML code isn't as you expect.
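Along the lines of that advice, here is a rough sketch using PHP's DOM extension (assumed to be enabled, as it is in most builds). Taking the file name from the src URL with parse_url() and basename() is just one way to do it:

```php
<?php
$url = '<img src="http://mi.url.com/iconos/oks/milan.gif" alt="Milan">';

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($url);
libxml_clear_errors();

$found = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    // parse_url() isolates the path; basename() keeps its last segment.
    $path = parse_url($img->getAttribute('src'), PHP_URL_PATH);
    $found[] = ['file' => basename($path), 'alt' => $img->getAttribute('alt')];
}

print_r($found);
```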

Regular expression error: no ending delimiter

I'm trying to execute this regular expression:
<?php
preg_match("/^([^\x00-\x1F]+?){0,1}/", 'test string');
?>
But keep getting an error:
Warning: preg_match() [function.preg-match]: No ending delimiter '/' found in /var/www/preg.php on line 6
I can't understand where it is coming from. I have an ending delimiter right there... I tried changing the delimiter to other symbols and it didn't help.
I would appreciate your help on this problem.
I guess PHP chokes on the NULL character that denotes the end of a string in C.
Try it with single quotes so that \x00 is interpreted by the PCRE engine and not by PHP:
'/^([^\x00-\x1F]+?){0,1}/'
It seems that this is an already known bug (see Problems with strings containing \x00).
Like Gumbo said, preg_match is not binary safe.
Use instead:
preg_match("/^([^\\x{00}-\\x{1F}]+?){0,1}/", 'test string');
This is the correct way to specify Unicode code points in PCRE.
I am not sure about PHP, but maybe the problem is that you need to escape your backslashes?
Try "/^([^\\x00-\\x1F]+?){0,1}/"
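The quoting difference behind these answers can be checked directly: in double quotes PHP turns \x00 into a real NUL byte before PCRE sees the pattern, while in single quotes (or with the backslash doubled) the escape sequence reaches PCRE, which decodes it itself. A small sketch:

```php
<?php
// Double quotes: PHP converts \x00 into one raw NUL byte.
echo strlen("\x00"), "\n"; // 1
// Single quotes: the four characters \x00 are passed through unchanged.
echo strlen('\x00'), "\n"; // 4

// Single-quoted pattern: PCRE reads \x00-\x1F as the control-character range.
$r1 = preg_match('/^([^\x00-\x1F]+?){0,1}/', 'test string', $m1);
// Braced \x{..} form, backslashes doubled because the string is double-quoted.
$r2 = preg_match("/^([^\\x{00}-\\x{1F}]+?){0,1}/", 'test string', $m2);

var_dump($r1, $r2); // int(1) int(1)
```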

Weird error using preg_match and unicode

if (preg_match('(\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+)', '2010/02/14/this-is-something'))
{
// do stuff
}
The above code works. However this one doesn't.
if (preg_match('/\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+/u', '2010/02/14/this-is-something'))
{
// do stuff
}
Maybe someone could shed some light as to why the one below doesn't work. This is the error that is being produced:
A PHP Error was encountered
Severity: Warning
Message: preg_match() [function.preg-match]: Unknown modifier '\'
Try this (delimit the regex with # instead of /):
if (preg_match('#\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+#', '2010/02/14/this-is-something'))
{
// do stuff
}
Edited
The modifier u is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32.
Also, as nvl observed, you are using / as the delimiter and you are not escaping the / characters in the regex. So you'll have to use:
/\p{Nd}{4}\/\p{Nd}{2}\/\p{Nd}{2}\/\p{L}+/u
To avoid this escaping you can use a different set of delimiters like:
#\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+#
or
@\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+@
As a tip, if your delimiter is present in your regex, it's better to choose a different delimiter that is not found in the regex. This keeps the regex clean and short.
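Following that tip, a hypothetical helper (pick_delimiter is not a PHP built-in, and the candidate list is an arbitrary choice) can pick the first delimiter that the pattern body does not contain:

```php
<?php
// Hypothetical helper: returns the first candidate delimiter absent from $body,
// or null if every candidate already appears in it.
function pick_delimiter(string $body, array $candidates = ['/', '#', '~', '%', '!', '@']): ?string
{
    foreach ($candidates as $d) {
        if (strpos($body, $d) === false) {
            return $d;
        }
    }
    return null;
}

$body    = '\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+';
$d       = pick_delimiter($body); // '#', because '/' occurs in the body
$pattern = $d . $body . $d . 'u';

var_dump(preg_match($pattern, '2010/02/14/this-is-something')); // int(1)
```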
In the second regex you're using / as the regex delimiter, but you're also using it in the regex. The compiler is trying to interpret this part as a complete regex:
/\p{Nd}{4}/
It thinks the next character after the second / should be a modifier like 'u' or 'm', but it sees a backslash instead, so it emits that cryptic warning.
In the first regex you're using parentheses as regex delimiters; if you wanted to add the u modifier, you would put it after the closing paren:
'(\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+)u'
Although it's legal to use parentheses or other bracketing characters ({}, [], <>) as regex delimiters, it's not a good idea IMO. Most people prefer to use one of the less common punctuation characters. For example:
'~\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+~u'
'%\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+%u'
Of course, you could also escape the slashes in the regex with backslashes, but why bother?
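The three working variants discussed in these answers can be compared side by side. A small sketch using the subject string from the question:

```php
<?php
$subject = '2010/02/14/this-is-something';

// Option 1: keep '/' as the delimiter and escape each '/' inside the pattern.
$escaped = preg_match('/\p{Nd}{4}\/\p{Nd}{2}\/\p{Nd}{2}\/\p{L}+/u', $subject, $m1);

// Option 2: use a delimiter that never occurs in the pattern, such as '~'.
$tilde = preg_match('~\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+~u', $subject, $m2);

// Option 3: bracket-style delimiters, with the modifier after the closing paren.
$paren = preg_match('(\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+)u', $subject, $m3);

// \p{L}+ stops at the first '-', so each match is '2010/02/14/this'.
echo $m1[0], "\n";
```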
