Regex pattern for matching mm<sup>3</sup> - php

I’m trying to write a regular expression to change mm3 to mL:
<?php
$match = 'mm<sup>3</sup>';
if(preg_match('/\b(mm<sup>3</sup>)\b/', $match))
{
$replacement = 'ml';
$replac = preg_replace('/\b(mm<sup>3</sup>)\b/', $replacement, $match);
echo $replac;
}
?>
But my regular expression doesn't capture the content in $match variable, and the $replac value isn't output. What am I doing wrong?

Change:
if(preg_match('/\b(mm<sup>3</sup>)\b/',$match))
to:
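if(preg_match('#\bmm<sup>3</sup>#',$match))
and similarly in the preg_replace call.
Since your regular expression contains /, you need to either escape it or use a different delimiter around the regular expression.
The trailing \b also has to go: > is not a word character, so a \b after it only matches when a word character follows, and it fails at the end of your string.
There's also no need for the parentheses, since you're not doing anything with the groups.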

You need to either escape that / in your regexp (preg_quote with '/' as its second argument will do it), or use a different delimiter (# is a common choice).
Also, the \b after the > should be removed (it won't match at the end of the string, because > is not a word character), and the parentheses are unnecessary since you don't seem to be capturing anything; you're basically doing a more expensive str_replace.
Finally, you can do everything in one move. If there's no match, nothing will happen.
<?php
$match = 'mm<sup>3</sup>';
$replacement='ML';
$replac = preg_replace('#\\bmm<sup>3</sup>#',
$replacement,
$match);
echo $replac;
?>
If you want to be picky, I guess you should also replace with 'ml', not 'ML' :-)
(for replacement of multiple strings, preg_replace supports arrays).
Note: unless you're sure that is exactly the HTML you want replaced, maybe you ought to try
$pattern = '#\\bmm\\s*<sup>\\s*3\\s*</sup>#';
in order to catch "mm <sup>3</sup>" and similar, in addition to "mm<sup>3</sup>" (in some circumstances they may look alike, and some editors might use or automatically "correct" either form into the other).
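To make the delimiter and whitespace points concrete, here is a minimal sketch. The function name cubicMmToMl is made up for this example, and the sample strings are assumptions rather than data from the question.

```php
<?php
// Hypothetical helper: turns "mm<sup>3</sup>" (with optional whitespace) into "ml".
function cubicMmToMl(string $html): string
{
    // '#' is the delimiter, so the '/' in '</sup>' needs no escaping.
    // The leading \b stops the match from starting inside a longer word.
    // There is no trailing \b, because '>' is not a word character.
    return preg_replace('#\bmm\s*<sup>\s*3\s*</sup>#', 'ml', $html);
}

echo cubicMmToMl('5 mm<sup>3</sup>'), "\n";    // 5 ml
echo cubicMmToMl('5 mm <sup> 3 </sup>'), "\n"; // 5 ml
echo cubicMmToMl('5 cm<sup>3</sup>'), "\n";    // unchanged: 5 cm<sup>3</sup>
```

If the input never varies, str_replace would do the same job without a regex at all.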

Related

preg_replace returns unexpected results to $1

<?php
$data='123
[test=abc]cba[/test]
321';
$test = preg_replace("(\[test=(.+?)\](.+?)\[\/test\])is","$1",$data);
echo $test;
?>
I expect the above code to return
abc
but instead of returning abc it returns
123 abc 321
Please tell me what I am doing wrong.
You're only replacing the matched part (the BBcode section). You're leaving the rest of the string untouched.
If you also want to remove the leading/trailing text, include those in the expression:
$test = preg_replace("(.*\[test=(.+?)\](.+?)\[\/test\].*)is","$1",$data);
I don't know if you're aware of this, but the outermost set of parentheses in your regex does not form a group (capturing or otherwise). PHP is interpreting them as regex delimiters. If you are aware of that, and you're using them as delimiters on purpose, please don't. It's usually best to use a non-bracketing character that never has any special meaning in regexes (~, %, #, etc.).
I agree with Casimir that preg_match() is the tool you should be using, not preg_replace(). But his solution is trickier than it needs to be. Your original regex works fine; all you have to do is grab the contents of the first capturing group, like so:
if (preg_match('%\[test=(.+?)\](.+?)\[/test\]%si', $data, $match)) {
$test = $match[1];
}
You don't need to use a replace here, all that you need is to take something in the string. To do that preg_match is more useful:
$data='123
[test=abc]cba[/test]
321';
$test = preg_match('~\[test=\K[^\]]++~', $data, $match);
echo $match[0];
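The difference between the two functions can be seen side by side in a short sketch like the one below. It reuses the sample data from the question; the variable names $replaced, $attribute and $body are only illustrative.

```php
<?php
$data = "123\n[test=abc]cba[/test]\n321";
$pattern = '~\[test=(.+?)\](.+?)\[/test\]~is';

// preg_replace only swaps out the matched tag; "123" and "321" stay in place.
$replaced = preg_replace($pattern, '$1', $data);
var_dump($replaced === "123\nabc\n321"); // bool(true)

// preg_match hands back the groups on their own.
$attribute = null;
$body = null;
if (preg_match($pattern, $data, $m)) {
    $attribute = $m[1]; // value after "test="
    $body = $m[2];      // text between the tags
}
echo $attribute, "\n"; // abc
```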

PHP Regular expression

I would like to capture the last folder in paths without the year. For this string path I would need just 'Millers Crossing' not 'Movies\Millers Crossing' which is what my current regex captures.
G:\Movies\Millers Crossing [1990]
preg_match('/\\\\(.*)\[\d{4}\]$/i', $this->parentDirPath, $title);
How about basename and substr instead of complicated expressions?
$title = rtrim(substr(basename($this->parentDirPath), 0, -6));
This assumes that there will always be a year in the format [xxxx] at the end of the string. substr drops the six characters of [xxxx], and rtrim removes the space left in front of it.
(And it works on *nix too ;))
Update: You can still use basename to get the folder and then apply a regular expression:
$folder = basename($this->parentDirPath);
preg_match('#^(.*?)\s*(?:\[\d{4}\])?$#', $folder, $match);
$title = $match[1];
Try
preg_match('/\\\\([^\\\\]*)\[\d{4}\]$/i', $this->parentDirPath, $title);
Basically, instead of matching any character with ., you're matching any character but \.
It looks like you want something like this:
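/([^\\]+)\s\[\d{4}\]$/
That's what I'd go with, at least. It should only include whatever comes after the last backslash in the string, and the movie title will be in the first capture group. (The + has to sit inside the parentheses; outside them, the group would only hold the last character.)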
Simpler approach:
([^\\]*)\s?\[\d{4}\]$
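A note on the "double backslashes": the \\\\ in your PHP string is correct. PHP turns it into \\, which the regex engine reads as one literal backslash. Inside a character class written in a PHP string you need the same four backslashes. You can also make life easier by using a negated class, prefixed with a caret (^), to list the characters you don't want.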
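Putting the negated class and the backslash escaping together, a sketch could look like this. The function name folderTitle is invented for illustration, and returning null when no year is present is an assumption about the desired behaviour.

```php
<?php
// Hypothetical helper: last folder name without a trailing " [YYYY]".
function folderTitle(string $path): ?string
{
    // In a single-quoted PHP string, '\\\\' becomes the regex \\ (one literal backslash).
    // [^\\\\]+?      : characters other than a backslash, as few as possible
    // \s*\[\d{4}\]$  : optional spaces, then a bracketed four-digit year at the end
    if (preg_match('/([^\\\\]+?)\s*\[\d{4}\]$/', $path, $m)) {
        return $m[1];
    }
    return null; // no "[YYYY]" suffix found
}

echo folderTitle('G:\Movies\Millers Crossing [1990]'), "\n"; // Millers Crossing
```

Because the class excludes backslashes, the capture cannot reach back past the last path separator.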

How to use preg match all in php?

Hi, I want to retrieve certain information from a website.
This is what is displayed on the website, with HTML tags.
<a href="ProductDisplay?catalogId=10051&storeId=90001&productId=258033&langId=-1" id="WC_CatalogSearchResultDisplay_Link_6_3" class="s_result_name">
SALT - Fine
</a>
What I want to extract is "SALT - Fine" using preg_match, but I do not know why it doesn't work. Is it because the tag is spread over several lines? I realised that if everything is on a single line, I can actually retrieve what I want.
This is my code -
$pattern = '/id="WC_CatalogSearchResultDisplay_Link_6_3.*<\/a>/';
preg_match_all($pattern, $response, $match);
print_r($match);
I do not get anything in my array. If they are on a single line it works. Why is that so?
Have a look at:
http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
especially the m and s modifiers.
Also, I would recommend, changing the pattern to something like:
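$pattern = '/id="WC_CatalogSearchResultDisplay_Link_6_3"[^>]*>(.*?)<\/a>/ims';
Otherwise, the match will also include the rest of the opening a tag. The lazy .*? stops at the first </a> instead of the last one in the document.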
And on a side note, don't use regex to parse html/xml.
Something like this:
<?php
$dom = new DOMDocument();
@$dom->loadHTML($response); // @ silences warnings about imperfect real-world markup
$xpath = new DOMXPath($dom);
$node = $xpath->query('//*[@id="WC_CatalogSearchResultDisplay_Link_6_3"]/text()')->item(0);
if ($node instanceof DOMText) {
echo trim($node->nodeValue);
}
will also work, and will be a lot more robust.
You should wrap what you want to capture in (). So I guess your pattern would then become
$pattern = '/id="WC_CatalogSearchResultDisplay_Link_6_3(.*)<\/a>/';
I however don't fully see how you arrived at this pattern, since it would be simpler to just match everything enclosed by a-tags.
Edit:
You also need the s modifier, as mentioned by Yoshi, so that . matches a newline. I would thus suggest you use this code (with a lazy .+? so that each link is matched separately):
$pattern = '/<a[^>]*>(.+?)<\/a>/si';
preg_match_all($pattern, $response, $match);
print_r($match);
You're right, it's because it's a multi-line input string.
You need to add the s modifier to the regex pattern so that it can match across lines:
$pattern = '/id="WC_CatalogSearchResultDisplay_Link_6_3.*<\/a>/s';
The s modifier makes the . match newline characters as well as all others (by default it doesn't match newlines).
The m modifier is often suggested for this too, but it only changes ^ and $ so that they match at the start and end of each line. This pattern has no anchors, so m makes no difference here.
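A quick way to see the effect of the s modifier is to run the same pattern with and without it. In this sketch the markup is modeled on the question, but the href is shortened for illustration, and the variable names $withoutS and $withS are made up.

```php
<?php
// Sample markup modeled on the question; the href is shortened for illustration.
$response = '<a href="ProductDisplay?productId=258033" id="WC_CatalogSearchResultDisplay_Link_6_3" class="s_result_name">
SALT - Fine
</a>';

$withoutS = '/id="WC_CatalogSearchResultDisplay_Link_6_3"[^>]*>(.*?)<\/a>/';
$withS    = '/id="WC_CatalogSearchResultDisplay_Link_6_3"[^>]*>(.*?)<\/a>/s';

// Without s, "." cannot cross the newline after the opening tag, so nothing matches.
var_dump(preg_match($withoutS, $response)); // int(0)

// With s, the capture spans the line breaks; trim() removes them afterwards.
if (preg_match($withS, $response, $m)) {
    echo trim($m[1]), "\n"; // SALT - Fine
}
```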

Regular expression anchor text for a link

I am trying to pull the anchor text from a link that is formatted this way:
<h3><b>File</b> : <a href="/en/browse/file/variable_text">i_want_this</a></h3>
I want only the anchor text for the link : "i_want_this"
"variable_text" varies according to the filename so I need to ignore that.
I am using this regex:
<a href=\"\/en\/browse\/file\/variable_text\">(.*?)<\/a>
This is matching of course the complete link.
PHP's preg functions use PCRE (Perl Compatible Regular Expressions), which is very close to Perl's own regex syntax. If you want to know a lot about regex, visit perlretut.org. Also, look into regex tools like Expresso.
For your use, know that regex quantifiers are greedy by default. That means that when you specify that you want something, followed by anything (any repetitions), followed by something, it will keep on going until the last occurrence of that second something is reached.
to be more clear, what you want is this:
<a href="
any character, any number of times (regex = .* )
">
any character, any number of times (regex = .* )
</a>
Beyond that, you want to capture the second group of "any character, any number of times". You can do that using what are called capture groups (anything inside parentheses is captured as a group for later reference, also called back references).
I would also look into named subpatterns - with those, you can reference your choice with a human-readable string rather than an array index. The syntax for those in PHP is (?P<name>pattern), where name is the name you want and pattern is the actual regex. I'll use that below.
So all that being said, here's the "lazy web" for your regex:
<?php
$str = '<h3><b>File</b> : <a href="/en/browse/file/variable_text">i_want_this</a></h3>';
$regex = '/(<a href\=".*">)(?P<target>.*)(<\/a>)/';
preg_match($regex, $str, $matches);
print $matches['target']; // This should output "i_want_this"
?>
Oh, and one final thought. Depending on what you are doing exactly, you may want to look into SimpleXML instead of using regex for this. This would probably require that the tags we see are just snippets of a larger whole, as SimpleXML requires well-formed XML (or XHTML).
I'm sure someone will probably have a more elegant solution, but I think this will do what you want done.
Where:
$subject = '<h3><b>File</b> : <a href="/en/browse/file/variable_text">i_want_this</a></h3>';
Option 1:
$pattern1 = '/(<a href=")(.*)(">)(.*)(<\/a>)/i';
preg_match($pattern1, $subject, $matches1);
print($matches1[4]);
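Option 2 (ereg() was deprecated in PHP 5.3 and removed in PHP 7.0, so this only runs on old versions):
$pattern2 = '(<a href=")(.*)(">)(.*)(</a>)';
ereg($pattern2, $subject, $matches2);
print($matches2[4]);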
Do not use regex to parse HTML. Use a DOM parser. Specify the language you're using, too.
Since it's in a captured group and since you claim it's matching, you should be able to reference it through $1 or \1 depending on the language.
$blah = preg_match( $pattern, $subject, $matches );
print_r($matches);
The thing to remember is that a regex returns everything you searched for if it matches. You need to specify that you only care about the part you've surrounded in parentheses (the anchor text). I'm not sure what language you're using the regex in, but here's an example in Ruby:
string = '<a href="/en/browse/file/variable_text">i_want_this</a>'
data = string.match(/<a href=\"\/en\/browse\/file\/variable_text\">(.*?)<\/a>/)
puts data # => outputs '<a href="/en/browse/file/variable_text">i_want_this</a>'
If you specify what you want in parentheses, you can reference it:
string = '<a href="/en/browse/file/variable_text">i_want_this</a>'
data = string.match(/<a href=\"\/en\/browse\/file\/variable_text\">(.*?)<\/a>/)[1]
puts data # => outputs 'i_want_this'
Perl will have you use $1 instead of [1] like this:
$string = '<a href="/en/browse/file/variable_text">i_want_this</a>';
$string =~ m/<a href=\"\/en\/browse\/file\/variable_text\">(.*?)<\/a>/;
$data = $1;
print $data . "\n";
Hope that helps.
I'm not 100% sure if I understand what you want. This will match the content between the anchor tags. The URL must start with /en/browse/file/, but may end with anything.
#<a href="/en/browse/file/[^"]*">(.*?)</a>#
I used # as a delimiter as it made it clearer. It'll also help if you put them in single quotes instead of double quotes so you don't have to escape anything at all.
If you want to limit to numbers instead, you can use:
#<a href="/en/browse/file/[0-9]+">(.*?)</a>#
If it should have just 5 numbers:
#<a href="/en/browse/file/[0-9]{5}">(.*?)</a>#
If it should have between 3 and 6 numbers:
#<a href="/en/browse/file/[0-9]{3,6}">(.*?)</a>#
If it should have more than 2 numbers:
#<a href="/en/browse/file/[0-9]{3,}">(.*?)</a>#
This should work:
<a href="[^"]*">([^<]*)
This says: take EVERYTHING you find until you meet a "
[^"]*
Same here! Take everything with you until you meet a <
[^<]*
The parentheses around [^<]*
([^<]*)
group it, so you can collect that data in PHP! If you look in the PHP manual on preg_match you will see many fine examples there!
Good luck!
And for your concrete example:
<a href="/en/browse/file/variable_text">([^<]*)
I use
[^<]*
because in some examples...
.*?
can be extremely slow! You shouldn't use that if you can use
[^<]*
You should use the tool Expresso for creating regular expressions... Pretty handy.
http://www.ultrapico.com/Expresso.htm
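Several of the suggestions above (a non-slash delimiter, negated classes instead of .*?, and a named subpattern) can be combined in one short sketch. The sample markup here is an assumption reconstructed from the regex in the question, and the group name target is borrowed from the earlier answer.

```php
<?php
// Assumed sample markup, reconstructed from the regex shown in the question.
$str = '<h3><b>File</b> : <a href="/en/browse/file/variable_text">i_want_this</a></h3>';

// '#' as delimiter avoids escaping the slashes in the URL and in </a>.
// [^"]* stands in for the variable part of the path; [^<]* is the link text.
$regex = '#<a href="/en/browse/file/[^"]*">(?P<target>[^<]*)</a>#';

$anchorText = null;
if (preg_match($regex, $str, $matches)) {
    $anchorText = $matches['target'];
}
echo $anchorText, "\n"; // i_want_this
```

For anything beyond a one-off extraction, a DOM parser remains the sturdier choice, as other answers point out.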

Simple RegEx PHP

Since I am completely useless at regex and this has been bugging me for the past half an hour, I think I'll post this up here as it's probably quite simple.
<a href="...">hey.exe</a>
<a href="...">hey2.dll</a>
<a href="...">pomp.jpg</a>
In PHP I need to extract what's between the <a> tags, for example:
hey.exe
hey2.dll
pomp.jpg
Avoid using '.*' even if you make it ungreedy, until you have some more practice with RegEx. I think a good solution for you would be:
'/<a[^>]+>([^<]+)<\/a>/i'
Note the '/' delimiters - you must use the preg suite of regex functions in PHP. It would look like this:
preg_match_all($pattern, $string, $matches);
// matches get stored in '$matches' variable as an array
// matches in between the <a></a> tags will be in $matches[1]
print_r($matches);
This appears to work:
$pattern = '/<a.*?>(.*?)<\/a>/';
([^<]*)
I found this regular expression tester to be helpful.
Here is a very simple one:
<a.*>(.*)</a>
However, you should be careful if you have several matches in the same line, e.g.
<a href="...">hey.exe</a><a href="...">hey2.dll</a>
In this case, the correct regex would be:
<a.*?>(.*?)</a>
Note the '?' after the '*' quantifier. By default, quantifiers are greedy, which means they eat as many characters as they can (meaning they would return only "hey2.dll" in this example). By appending a question mark, you make them ungreedy, which should better fit your needs.
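The greedy and ungreedy behaviour can be checked with a small sketch like this one. The href values are made up for the example, and # is used as the delimiter so that </a> needs no escaping.

```php
<?php
// Two links on one line; the href values are made up for this example.
$line = '<a href="a">hey.exe</a><a href="b">hey2.dll</a>';

// Greedy: "<a.*>" runs on to the last opening tag, so only one result comes back.
preg_match_all('#<a.*>(.*)</a>#', $line, $greedy);

// Ungreedy: each quantifier stops as early as it can, so both links are found.
preg_match_all('#<a.*?>(.*?)</a>#', $line, $lazy);

print_r($greedy[1]); // contains only "hey2.dll"
print_r($lazy[1]);   // contains "hey.exe" and "hey2.dll"
```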
