PHP preg_match_all regex expression weirdness - php

I am having some trouble with regex in php (preg_match_all).
I am using the following code to find an email address enclosed in <>: (angle brackets followed by a colon):
preg_match_all("<[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})>:", $body,$matches);
For some reason PHP is blowing up at the colon with the following error:
Warning: preg_match_all() [function.preg-match-all]: Unknown modifier ':' in...
Any help would be much appreciated, as I am no regex guru, and am just about out of hair to pull.

You need to use delimiters, for example:
preg_match_all('/<[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})>:/', $body,$matches);
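Note the / I added at both ends, telling PHP where the regex starts and ends. Without them, PHP treats the leading < and its matching > as the delimiters, so the : that follows is read as a modifier.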
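A minimal sketch of the difference, using a simplified pattern and a made-up $body (both are assumptions for illustration, not the asker's data). The @ operator only suppresses the warning so the return value can be inspected:

```php
<?php
// Hypothetical sample input, not from the original question.
$body = "To: <alice>: hello, <bob>: hi";

// No explicit delimiters: PHP pairs the leading < with >, so the
// pattern is just [a-z]+ and ':' is parsed as an (unknown) modifier.
$bad = @preg_match_all("<[a-z]+>:", $body, $ignored);

// Explicit / delimiters: the angle brackets and colon are now literal.
$good = preg_match_all('/<([a-z]+)>:/', $body, $matches);

var_dump($bad);        // false, because the pattern failed to compile
var_dump($good);       // number of matches found
print_r($matches[1]);  // the captured names
```

Any non-alphanumeric, non-backslash, non-whitespace character can serve as the delimiter, so # or ~ would work equally well here.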

You could use T-Regx, which doesn't need / at the start and end:
$pattern = "<[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})>:";
$matches = Pattern::of($pattern)->match($body)->all();

Related

Escaping regular expressions in PHP

I am trying to escape a PCRE in PHP for use in a script. For some reason I can't get it to function when it has been escaped, I've only managed to get it working when the REGEX is given as a form input.
The Regex I'm using is:
$pattern = '£((http|ftp|https):\/\/)?([\w\-_]+(?:(?:\.[\w\-_]+)+))([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?£';
So far I have tried:
preg_quote(): converts the Regex to the following and throws an error: £((http\|ftp\|https):\/\/)\?([\w\-_]+(\?:(\?:\.[\w\-_]+)+))([\w\-\.,#\?\^\=%&:/~\+#]*[\w\-\#\?\^\=%&/~\+#])\?£
htmlentities(): gives error: Warning: preg_match(): Unknown modifier 'a'
addslashes(): same as above
mixture of the 3: same as above
Does anyone have an idea of what I'm doing wrong?
The pound symbol was the issue here; replacing it with an exclamation mark solved the problem.
Working expression:
$pattern = '!((http|ftp|https):\/\/)?([\w\-_]+(?:(?:\.[\w\-_]+)+))([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?!';
For some reason this is working fine with no escape functions.
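For context, preg_quote() is meant for escaping literal text that gets embedded in a pattern, not for escaping a finished pattern, which is why running the whole regex through it breaks it. A small sketch, assuming a made-up literal URL ($needle is hypothetical):

```php
<?php
// Hypothetical literal text we want to find verbatim.
$needle = 'http://example.com/a.b?c=1';

// Escape the literal, passing the delimiter as the second argument
// so that it would be escaped too if it appeared in $needle.
$pattern = '!' . preg_quote($needle, '!') . '!';

// The escaped '.' and '?' now match only themselves.
$exact = preg_match($pattern, 'see http://example.com/a.b?c=1 now');
$other = preg_match($pattern, 'see http://example.com/aXb?c=1 now');

var_dump($exact, $other);
```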

How do I add line breaks at the end of lines in a string; to my pattern using preg_match_all? PHP

I am trying to add line breaks using this pattern: $pattern = "/^.$pattern.\$/m"; but I don't seem to get the results I want. Any help would be appreciated. Thanks.
Use the s modifier (PCRE_DOTALL), which makes . match newlines as well:
$pattern = "/^.$pattern.\$/s";
http://php.net/manual/en/reference.pcre.pattern.modifiers.php
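To illustrate how the m and s modifiers differ, here is a short sketch on a made-up two-line string ($text is an assumption, not the asker's input):

```php
<?php
// Hypothetical two-line input.
$text = "first\nsecond";

// No modifier: . stops at the newline and $ only matches at the very end.
$plain = preg_match('/^.+$/', $text);

// m (multiline): ^ and $ also match at the start and end of each line.
$multi = preg_match_all('/^.+$/m', $text, $lines);

// s (dotall): . also matches newlines, so one match spans both lines.
$dotall = preg_match('/^.+$/s', $text, $whole);

var_dump($plain, $multi, $dotall);
```

Which one is appropriate depends on whether each line should match separately (m) or the pattern should run across line breaks (s).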

Regex pattern causing syntax error when pasted into preg_match_all()

I have used the online regex tester http://gskinner.com/RegExr/ to come up with the following pattern (I've pasted all three lines as I'm not sure whether I should be pasting the RegExp or the pattern):
RegExp: /<div class="label">.*?<h3>(.*?)</h3>.*?"more">(.*?)\|/g
pattern: <div class="label">.*?<h3>(.*?)</h3>.*?"more">(.*?)\|
flags: g
If I use it in php like this:
$pattern = '/<div class="label">.*?<h3>(.*?)</h3>.*?"more">(.*?)\|/g';
preg_match_all($pattern,$page,$matches );
I get an error:
Warning: preg_match_all() [function.preg-match-all]: Unknown modifier '.' in ...
Can someone please explain how I can get my regex from this tool, into the correct format for use in PHP. Many thanks.
You're not escaping the slash in </h3>, so PHP treats it as the closing delimiter. The g flag means "apply globally"; it isn't needed here because preg_match_all already finds every match, and it isn't a valid modifier in PHP's implementation of regex, so just omit it.
Try this:
$pattern = '/<div class="label">.*?<h3>(.*?)<\/h3>.*?"more">(.*?)\|/';
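An alternative to escaping every slash is to pick a delimiter that doesn't occur in the pattern. A sketch using # as the delimiter, with a made-up $page fragment (hypothetical markup, chosen only to fit the pattern):

```php
<?php
// Hypothetical HTML fragment shaped to fit the pattern.
$page = '<div class="label"><h3>Title</h3><a class="more">Details|next</a></div>';

// With # as the delimiter, the / in </h3> needs no escaping.
$pattern = '#<div class="label">.*?<h3>(.*?)</h3>.*?"more">(.*?)\|#';

$count = preg_match_all($pattern, $page, $matches);

print_r($matches[1]); // heading text
print_r($matches[2]); // text between "more"> and the pipe
```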

Regular expression error: no ending delimiter

I'm trying to execute this regular expression:
<?php
preg_match("/^([^\x00-\x1F]+?){0,1}/", 'test string');
?>
But keep getting an error:
Warning: preg_match() [function.preg-match]: No ending delimiter '/' found in /var/www/preg.php on line 6
I can't understand where it is coming from. I have an ending delimiter right there... I tried changing the delimiter to other symbols and it didn't help.
I would appreciate your help on this problem.
I guess PHP chokes on the NULL character that denotes the end of a string in C.
Try it with single quotes so that \x00 is interpreted by the PCRE engine and not by PHP:
'/^([^\x00-\x1F]+?){0,1}/'
It seems that this is an already known bug (see Problems with strings containing \x00).
Like Gumbo said, preg_match is not binary safe.
Use instead:
preg_match("/^([^\\x{00}-\\x{1F}]+?){0,1}/", 'test string');
This is the correct way to specify Unicode code points in PCRE.
I am not sure about PHP, but maybe the problem is that you need to escape your backslashes?
Try "/^([^\\x00-\\x1F]+?){0,1}/"
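The quoting difference the answers describe can be checked directly. This sketch only inspects the two strings and runs the single-quoted form, since how preg_match reacts to a raw NUL byte in the pattern may depend on the PHP version:

```php
<?php
// Double quotes: PHP itself turns \x00 and \x1F into raw bytes.
$double = "/^([^\x00-\x1F]+?){0,1}/";

// Single quotes: the backslash sequences reach PCRE untouched.
$single = '/^([^\x00-\x1F]+?){0,1}/';

// Does each pattern string contain an actual NUL byte?
$doubleHasNul = strpos($double, "\0") !== false;
$singleHasNul = strpos($single, "\0") !== false;

// The single-quoted pattern compiles and matches as intended.
$result = preg_match($single, 'test string', $m);

var_dump($doubleHasNul, $singleHasNul, $result);
```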

Gruber's new and improved URL recognising regex

I've been trying to use Gruber's latest URL-matching regex in a PHP project.
To test it I threw together something very simple:
$regex = "(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:"'.,<>?«»“”‘’]))";
$array = preg_match_all($regex, $theblockofurltext);
print_r($array);
The first problem was that the " would terminate the string, depending on which quote character I wrapped the regex in, so I just removed it. The use of this is personal and I will never have " anywhere near a URL anyway. This left me with a new regex.
$regex = "(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'.,<>?«»“”‘’]))";
Raring to go I then ran my little script and it gave me the following error:
Warning: preg_split() [function.preg-split]: Unknown modifier '\' in D:\wwwroot\xxx\index.php on line 14
Unfortunately my regex class at school wasn't taught to anywhere near the level this regex requires, and I have no idea where to begin fixing this for use with PHP. Any help would be greatly appreciated. No doubt I'm doing something stupid too, so please go easy on me :)
Jon
Add # before and after your RE.
$regex = "#(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'.,<>?«»“”‘’]))#";
If you use PCRE, the regular expression must be enclosed in delimiters. Parentheses () can also act as delimiters, which is why the engine thinks your expression is only (?i) and interprets the following \ as a modifier.
You could use ~ as delimiter:
$regex = "~(?i)\b...]))~";
Update:
I don't know whether PHP supports the partial modifying of an expression with (?i). So you might have to remove this and put the modifier after the delimiter instead (you apply it to the whole expression anyway):
$regex = "~\b...]))~i";
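A quick way to compare the two forms is to run both against the same input. This sketch uses a much shorter stand-in pattern and a made-up $text (both assumptions; it is not Gruber's regex):

```php
<?php
// Hypothetical input with mixed-case text.
$text = 'Visit WWW.Example.com today';

// Inline option at the start of the pattern.
$inline = preg_match('~(?i)\bwww\.[a-z.]+~', $text, $a);

// Same pattern with the i modifier after the closing delimiter.
$trailing = preg_match('~\bwww\.[a-z.]+~i', $text, $b);

var_dump($inline, $trailing);
var_dump($a[0], $b[0]);
```

With ~ as the delimiter, the leading parenthesis is no longer mistaken for one, and both spellings of the case-insensitive flag give the same match here.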
