preg_match_all issue - php

I'm trying to get all {{product.smth}} placeholders with preg_match_all, but if I have several of them on one line I get the wrong result.
Example:
$smth = '<name>{{product.name}}</name><getname>{{product.getName()}}</getname>';
$pattern = '/\{\{product\.(.*)\}\}/';
preg_match_all($pattern, $smth, $matches);
//returns '{{product.name}}</name><getname>{{product.getName()}}'
//instead of '{{product.name}}' and '{{product.getName()}}'
What am I doing wrong? Please help.

The problem is that the repetition is greedy. Either make it ungreedy by using .*? or, better yet, disallow the } character inside the repetition:
$pattern = '/\{\{product\.([^}]*)\}\}/';
If you do want to allow a single } in that value (like {{product.some{thing}here}}), the equivalent solution uses a negative lookahead:
$pattern = '/\{\{product\.((?:(?!\}\}).)*)\}\}/';
For every single character matched by the repetition, the lookahead checks that this character doesn't mark the start of a }}.
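A small sketch comparing the greedy pattern from the question with the three fixes discussed above, run on the question's string. The second input, with a lone } inside the value, is a made-up sample chosen to show where the negated class and the lookahead differ.

```php
<?php
$smth = '<name>{{product.name}}</name><getname>{{product.getName()}}</getname>';

// Greedy: .* runs to the LAST "}}" on the line, so there is one (too long) match.
preg_match_all('/\{\{product\.(.*)\}\}/', $smth, $greedy);

// Lazy: .*? stops at the FIRST "}}" after each "{{product.".
preg_match_all('/\{\{product\.(.*?)\}\}/', $smth, $lazy);

// Negated class: the captured value may not contain any "}".
preg_match_all('/\{\{product\.([^}]*)\}\}/', $smth, $negated);

// Hypothetical value that contains a single "}".
$nested = '{{product.some{thing}here}}';

// The negated class cannot step over the lone "}", so nothing matches.
$negatedCount = preg_match_all('/\{\{product\.([^}]*)\}\}/', $nested, $noMatch);

// Negative lookahead: any character is allowed unless it starts "}}".
preg_match_all('/\{\{product\.((?:(?!\}\}).)*)\}\}/', $nested, $lookahead);

print_r($greedy[1]);     // one element: 'name}}</name><getname>{{product.getName()'
print_r($lazy[1]);       // 'name', 'getName()'
print_r($negated[1]);    // 'name', 'getName()'
var_dump($negatedCount); // int(0)
print_r($lookahead[1]);  // 'some{thing}here'
```

On plain placeholders the lazy and negated-class versions agree; they only diverge when the value itself may contain a }.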

I think it'll work if you change .* to .*?. This makes it lazy instead of greedy, so it tries to match as little as possible, i.e. up to the first occurrence of }} rather than the last.

Related

preg_replace with Regex - find number-sequence in URL

I'm a regex newbie, so sorry for this "simple" question:
I've got a URL like the following:
http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx
What I want to achieve is getting the number sequence (aka the job ID) right before the ".aspx" with preg_replace.
I've already figured out that the regex for finding it could be
(?!.*-).*(?=\.)
Now preg_replace needs the opposite of that regular expression. How can I achieve that? Also worth mentioning:
The URL can have multiple numbers in it, and I only need the sequence right before ".aspx". Also, there could be some parameters after the ".aspx", like "&mobile=true".
Thank you for your answers!
You can use:
$re = '/[^-.]+(?=\.aspx)/i';
preg_match($re, $input, $matches);
//=> 146370543
This matches a run of characters that are neither a hyphen nor a dot and that is followed by .aspx, which is checked with the lookahead (?=\.aspx).
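A quick check of the lookahead pattern against the two complications mentioned in the question. The second URL is a hypothetical variant with an extra number in the path and a trailing "&mobile=true"; it is not taken from the question.

```php
<?php
$re = '/[^-.]+(?=\.aspx)/i';

$urls = [
    'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx',
    // Hypothetical variant: another number in the path plus a trailing parameter.
    'http://stellenanzeige.monster.de/Job-2024-Mainz-146370543.aspx&mobile=true',
];

$ids = [];
foreach ($urls as $url) {
    if (preg_match($re, $url, $m)) {
        // The lookahead consumes nothing, so $m[0] is just the ID.
        $ids[] = $m[0];
    }
}
print_r($ids); // '146370543', '146370543'
```

Because [^-.] cannot cross a hyphen, the earlier "2024" is never followed directly by ".aspx" and is skipped.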
You can just use preg_match (you don't need preg_replace, as you don't want to change the original string) and capture the number right before .aspx, so the simplest way I could think of is:
<?php
$string = "http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx";
// No $ anchor, so trailing parameters such as &mobile=true don't prevent a match
$regex = '/([0-9]+)\.aspx/';
preg_match($regex, $string, $results);
print $results[1];
?>
A short explanation:
$results contains an array of results. The first element contains the text matched by the complete regex, so it would be 146370543.aspx in this example. The second element contains the group captured by the parentheses around [0-9]+.
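To make the array layout concrete, here is a minimal sketch that dumps the whole result array for the question's URL instead of printing only the captured group.

```php
<?php
$string = "http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx";

// Element 0: text matched by the whole pattern; element 1: the ([0-9]+) group.
if (preg_match('/([0-9]+)\.aspx/', $string, $results)) {
    print_r($results); // [0] => 146370543.aspx, [1] => 146370543
}
```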
You can get the opposite (everything except the number, captured as two groups) by using this regex:
(\D*)\d+(.*)
Against the example URL it gives:
MATCH 1
1. [0-100] `http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-`
2. [109-114] `.aspx`
If you just want the number for that URL, you can even use this regex (it returns the first run of digits, so it only works when the URL contains no other numbers):
(\d+)

How to write the regex to get the following pattern in PHP?

There is a website, and I would like to get all strings matching the pattern <td> (any content) </td>.
So I wrote this:
preg_match("/<td>.*</td>/", $web , $matches);
die(var_dump($matches));
That returns null. How can I fix the problem? Thanks for helping.
You are just not escaping properly, I guess.
Also use groups to capture your content properly:
<td>(.*)<\/td>
should do. Don't forget the global flag if you are matching ALL td's (preg_match_all in PHP).
Parsing HTML with regex is usually not a good idea; try to use a DOM parser instead.
Example: http://simplehtmldom.sourceforge.net/
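Following the DOM-parser advice, here is a minimal sketch that uses PHP's built-in DOMDocument (the dom extension, enabled in most builds) rather than the simplehtmldom library linked above. The $web markup is a made-up sample standing in for the downloaded page.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$web = '<table><tr><td>Alfreds</td><td>Maria <b>Anders</b></td></tr></table>';

$doc = new DOMDocument();
// Real-world HTML is often not well-formed; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($web);
libxml_clear_errors();

$cells = [];
foreach ($doc->getElementsByTagName('td') as $td) {
    // textContent flattens nested tags such as <b> into plain text.
    $cells[] = $td->textContent;
}
print_r($cells); // 'Alfreds', 'Maria Anders'
```

Unlike a regex, this handles nested tags inside a cell without any pattern changes.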
Test the above regex with
$web = file_get_contents('http://www.w3schools.com/html/html_tables.asp' );
preg_match_all("/<td>(.*)<\/td>/", $web , $matches);
print_r( $matches);
Lazy Quantifier, Different Delimiter
You need .*? rather than .*, otherwise you can overshoot the closing </td>. Also, your / delimiter needed to be escaped when it appeared in </td>. We can replace it with another one that doesn't need escaping.
Do this:
$regex = '~<td>.*?</td>~';
preg_match_all($regex, $web, $matches);
print_r($matches[0]);
Explanation
The ~ is just an aesthetic tweak: you can use any delimiter you like around your regex pattern, and in general ~ is more versatile than /, which needs to be escaped more often, for instance in </td>.
The star quantifier in .*? is made "lazy" by the ? so that the dot only matches as many characters as needed to allow the next token to match (shortest match). Without the ?, the .* first matches the whole string, then backtracks only as far as needed to allow the next token to match (longest match).
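The difference between the longest and shortest match is easy to see on a single line holding two cells. The $web value below is a hypothetical snippet, not the w3schools page.

```php
<?php
// Hypothetical one-line snippet standing in for $web.
$web = '<tr><td>one</td><td>two</td></tr>';

// Greedy: .* overshoots to the LAST </td>, giving a single match.
$greedyCount = preg_match_all('~<td>.*</td>~', $web, $greedy);
// Lazy: .*? stops at the FIRST </td> after each <td>.
$lazyCount = preg_match_all('~<td>.*?</td>~', $web, $lazy);

print_r($greedy[0]); // '<td>one</td><td>two</td>'
print_r($lazy[0]);   // '<td>one</td>', '<td>two</td>'
```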

Using delimiters with preg_match

I am having difficulty understanding the preg_match function. An example is way better:
$subject = "XY=abC%3Fedr%3Damp;35";
I am trying to extract
bC%3Fed
using preg_match and store it in a variable:
if(preg_match($pattern, $subject, $matches))
{
$string = $matches[1];
}
echo $string;
Here are the different variations that I used for $pattern.
I want to use # as a delimiter:
#bC(.*?)#
#bC.*?#
I just don't understand why it's not working; I guess something is wrong in $pattern.
Please don't use a complicated regex; try to fix my attempt instead, as the aim here is to understand how preg_match works and what is wrong here.
Regards
Using # as the delimiter is OK, but the regex is wrong. I guess you want:
#(bC.*?)r# // captures bC and the following characters up to, but not including, the first 'r'
A good starting point to learn the regex syntax is the PCRE manual
Example:
$subject="XY=abC%3Fedr%3Damp;35";
$pattern="#(bC.*?)r#";
preg_match($pattern, $subject, $matches);
$string = $matches[1];
echo $string; // bC%3Fed
The ? after .* switches the greediness of the quantifier. By default quantifiers are greedy: they try to find the longest match. So .*? means any character, any number of times, smallest possible match. In #bC(.*?)# nothing follows the group to anchor it, so the smallest possible match is an empty string. (#bC.*?# has no capturing group at all, so $matches[1] is never set.)
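A minimal sketch running the asker's two patterns next to the fixed one on the same subject, to show why the first returns an empty capture, the second has no $matches[1], and the third works.

```php
<?php
$subject = "XY=abC%3Fedr%3Damp;35";

// Lazy group with nothing after it: the smallest match is the empty string.
preg_match('#bC(.*?)#', $subject, $m1);
// No capturing group at all, so only $m2[0] exists.
preg_match('#bC.*?#', $subject, $m2);
// The literal 'r' after the lazy part anchors it, so the group stops just before 'r'.
preg_match('#(bC.*?)r#', $subject, $m3);

var_dump($m1[1]);     // string(0) ""
var_dump(count($m2)); // int(1)
var_dump($m3[1]);     // string(7) "bC%3Fed"
```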

preg_replace not matching properly

I know this has been asked before, as I've just been reading those answers, but I still can't get this to work properly.
I'm very new to regex and am trying to do something that sounds pretty simple:
The string would be:
http://www.something.com/section/filter/colour/red-#998682/size/small/
What I would like to do is a preg_replace to remove the -#?????? so the URL looks like:
http://www.something.com/section/filter/colour/red/size/small/
So I tried:
$string = $theURL;
$pattern = '/-\#(.*)\//i';
$replacement = '/';
$newURL = preg_replace($pattern, $replacement, $string);
That sort of works, but it doesn't stop: if I have anything after the -#??????, it removes that as well. I thought having the / at the end would stop it doing that?
Hoping someone can help, and thanks for reading.
PCRE is greedy by default, meaning that .* will match as big a chunk as possible. Make it ungreedy by adding the U flag (for the entire pattern) or use .*? (for just that wildcard part):
/-\#(.*)\//iU
or
/-\#(.*?)\//i
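A short sketch applying the original pattern and both fixes from this answer to the URL in the question, so the greedy overshoot is visible next to the corrected results.

```php
<?php
$string = 'http://www.something.com/section/filter/colour/red-#998682/size/small/';

// Greedy: (.*) runs to the LAST slash, also eating "/size/small".
$greedy = preg_replace('/-\#(.*)\//i', '/', $string);
// Lazy quantifier: stops at the FIRST slash after "-#".
$lazy = preg_replace('/-\#(.*?)\//i', '/', $string);
// U modifier: makes every quantifier in the pattern ungreedy.
$ungreedy = preg_replace('/-\#(.*)\//iU', '/', $string);

echo $greedy, "\n";   // http://www.something.com/section/filter/colour/red/
echo $lazy, "\n";     // http://www.something.com/section/filter/colour/red/size/small/
echo $ungreedy, "\n"; // same as $lazy
```

The U flag inverts greediness for the whole pattern, so a .*? written under U would become greedy again; .*? on its own is the more local fix.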
You need to use non-greedy quantifier.
$pattern = '/-\#(.*?)\//i';
Your regex is greedy, which means that (.*)\/ looks for the last slash, not the first one.
The (.*) pattern is greedy, which means it'll match as many characters as possible. To match everything up to the first slash, use (.*?):
$pattern = '/-\#(.*?)\//i';

Simple RegEx PHP

Since I am completely useless at regex and this has been bugging me for the past half an hour, I think I'll post this up here as it's probably quite simple.
hey.exe
hey2.dll
pomp.jpg
In PHP I need to extract what's between the <a> tags, for example:
hey.exe
hey2.dll
pomp.jpg
Avoid using '.*' even if you make it ungreedy, until you have some more practice with RegEx. I think a good solution for you would be:
'/<a[^>]+>([^<]+)<\/a>/i'
Note the '/' delimiters: PHP's preg suite of regex functions requires them. It would look like this:
preg_match_all($pattern, $string, $matches);
// matches get stored in '$matches' variable as an array
// matches in between the <a></a> tags will be in $matches[1]
print_r($matches);
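Since the question's sample lost its <a> markup, the sketch below uses hypothetical links (the href values are invented) to show what the negated-class pattern from this answer puts into $matches[1].

```php
<?php
// Hypothetical markup: the hrefs are made up for illustration.
$string = '<a href="files/hey.exe">hey.exe</a>'
        . '<a href="files/hey2.dll">hey2.dll</a>'
        . '<a href="img/pomp.jpg">pomp.jpg</a>';

$pattern = '/<a[^>]+>([^<]+)<\/a>/i';
preg_match_all($pattern, $string, $matches);

// $matches[0] holds the full <a>...</a> matches, $matches[1] only the link text.
print_r($matches[1]); // 'hey.exe', 'hey2.dll', 'pomp.jpg'
```

Because [^>]+ and [^<]+ cannot cross a tag boundary, several links on one line are still matched separately, with no lazy quantifier needed.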
This appears to work:
$pattern = '/<a.*?>(.*?)<\/a>/';
([^<]*)
Here is a very simple one:
<a.*>(.*)</a>
However, you should be careful if you have several matches in the same line, e.g.
hey.exehey2.dll
In this case, the correct regex would be:
<a.*?>(.*?)</a>
Note the '?' after the '*' quantifier. By default, quantifiers are greedy, which means they eat as many characters as they can (so they would return only "hey2.dll" in this example). By appending a question mark, you make them ungreedy, which should better fit your needs.
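A minimal sketch of the two patterns from this answer on a hypothetical single line with two links, written with ~ delimiters so the / in </a> needs no escaping.

```php
<?php
// Hypothetical single line with two links.
$line = '<a href="a">hey.exe</a><a href="b">hey2.dll</a>';

// Greedy: both .* run as far as possible, so only one match survives.
preg_match_all('~<a.*>(.*)</a>~', $line, $greedy);
// Lazy: each .*? stops as early as the rest of the pattern allows.
preg_match_all('~<a.*?>(.*?)</a>~', $line, $lazy);

print_r($greedy[1]); // 'hey2.dll'
print_r($lazy[1]);   // 'hey.exe', 'hey2.dll'
```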
