PHP regex lookahead not working as expected - php

I'm trying to match a version number like 2.3.3 Release fhf47fh and strip out the periods to get a desired result of 233.
I'm using the pattern /\d+(?=\.)\d+(?=\.)\d+/ with preg_match.
The lookahead for the period does not seem to work as expected.
thanks!

If you're looking to compare versions, you can split on the space and then use version_compare().
If you just want the numeric representation, use preg_replace() to remove all non-digits from the original version string.
$version = '2.3.3 Release';
echo preg_replace('/\D+/', '', $version);
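To make the two suggestions above concrete, here is a minimal sketch. The helper name extractVersion() and the comparison target '2.10.0' are made up for illustration; only explode(), version_compare() and preg_replace() are real PHP functions.

```php
<?php
// Hypothetical helper: keep only the text before the first space.
function extractVersion(string $raw): string
{
    return explode(' ', trim($raw), 2)[0];
}

$installed = extractVersion('2.3.3 Release fhf47fh'); // "2.3.3"

// With an operator as third argument, version_compare() returns a bool.
var_dump(version_compare($installed, '2.10.0', '<')); // bool(true)

// Strip non-digits from the extracted part only. On the full string the
// digits inside "fhf47fh" would be kept as well, giving 23347.
echo preg_replace('/\D+/', '', $installed), "\n"; // 233
```

Splitting first keeps the digits-only step from picking up numbers in whatever follows the version.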

This seemed to work for all my test cases.
preg_replace('/^(\d+)\.(\d+)\.(\d+).*$/', '$1$2$3', $version);

I'd use something like this:
$pattern = '~\d+(?:\.*\d*)*~';
For the string you've provided in the question:
if (preg_match('~\d+(?:\.*\d*)*~', $version, $matches))
echo $matches[0]; // => 2.3.3
Regex101 demo.
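As for why the original pattern fails: a lookahead is zero-width, so it checks for the period without consuming it. The sketch below (variable names are my own) shows the original pattern next to a version that consumes the dots and captures the digit groups.

```php
<?php
$version = '2.3.3 Release fhf47fh';

// Original attempt: after \d+ the lookahead confirms a "." is next but does
// not consume it, so the following \d+ is asked to match at the "." and fails.
$lookaheadResult = preg_match('/\d+(?=\.)\d+(?=\.)\d+/', $version);
var_dump($lookaheadResult); // int(0)

// Consuming the dots and capturing the digit groups matches as intended.
$digits = null;
if (preg_match('/(\d+)\.(\d+)\.(\d+)/', $version, $m)) {
    $digits = $m[1] . $m[2] . $m[3];
    echo $digits, "\n"; // 233
}
```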

Related

preg_replace with Regex - find number-sequence in URL

I'm a regex newbie, so sorry for this "simple" question:
I've got a URL like the following:
http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx
What I'm trying to achieve is getting the number sequence (aka Job-ID) right before the ".aspx" with preg_replace.
I've already figured out that the regex for finding it could be
(?!.*-).*(?=\.)
Now preg_replace needs the opposite of that regular expression. How can I achieve that? Also worth mentioning:
The URL can have multiple numbers in it. I only need the sequence right before ".aspx". Also, there could be some parameters after the ".aspx", like "&mobile=true".
Thank you for your answers!
You can use:
$re = '/[^-.]+(?=\.aspx)/i';
preg_match($re, $input, $matches);
//=> 146370543
This matches a run of characters that are neither hyphens nor dots and that is followed by .aspx, using the lookahead (?=\.aspx).
RegEx Demo
You can just use preg_match (you don't need preg_replace, as you don't want to change the original string) and capture the number before the .aspx, which is always at the end. The simplest way I could think of is:
<?php
$string = "http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx";
$regex = '/([0-9]+)\.aspx$/';
preg_match($regex, $string, $results);
print $results[1];
?>
A short explanation:
$results contains an array of results. The first element holds the text matched by the complete regex, which would be 146370543.aspx in this example. The second element contains the group captured by the parentheses around [0-9]+.
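The question mentions that something like "&mobile=true" may follow the ".aspx", in which case a pattern anchored with $ would no longer match. A possible variant, as a sketch only: extractJobId() is a hypothetical helper name, and the set of delimiters allowed after ".aspx" is my assumption.

```php
<?php
// Hypothetical helper: returns the job ID, or null if none is found.
function extractJobId(string $url): ?string
{
    // Digits, then ".aspx", then either end of string or a ?, & or # delimiter.
    if (preg_match('/(\d+)\.aspx(?=$|[?&#])/i', $url, $m)) {
        return $m[1];
    }
    return null;
}

$base = 'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-'
      . 'Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx';

echo extractJobId($base), "\n";                  // 146370543
echo extractJobId($base . '&mobile=true'), "\n"; // 146370543
```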
You can get the opposite by using this regex:
(\D*)\d+(.*)
Working demo
MATCH 1
1. [0-100] `http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-`
2. [109-114] `.aspx`
Even if you just want the number for that url you can use this regex:
(\d+)

Using delimiters with preg_match

I am having difficulty understanding the preg_match function. An example explains it better:
$subject="XY=abC%3Fedr%3Damp;35";
I am trying to extract
bC%3Fed
using preg_match and store it in variable
if(preg_match($pattern, $subject, $matches))
{
$string = $matches[1];
}
echo $string;
Here are the different variations that I use for $pattern.
I want to use # as a delimiter:
#bC(.*?)#
#bC.*?#
I just don't understand why it's not working. I guess something is wrong in the $pattern.
Please don't use complicated regex and try to fix my attempt as the aim here is to understand how preg_match works and what is wrong here.
Regards
Using # as the delimiter is OK, but the regex is wrong. I guess you want:
#(bC.*?)r# // matches bC and the following characters up to the first 'r' (see comments)
A good starting point to learn the regex syntax is the PCRE manual
Example:
$subject="XY=abC%3Fedr%3Damp;35";
$pattern="#(bC.*?)r#";
preg_match($pattern, $subject, $matches);
$string = $matches[1];
echo $string; // bC%3Fed
The ? after .* switches the greediness of the pattern. By default patterns are greedy: they try to find the longest match. So your .*? means any char, any count, smallest match. Because there is nothing after it that will anchor it, the smallest possible match is an empty string.
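The difference is easiest to see by running the variants side by side. This is only an illustrative sketch using the subject from the question; the result variable names are mine.

```php
<?php
$subject = 'XY=abC%3Fedr%3Damp;35';

// Lazy with nothing after it: the group is satisfied by zero characters.
preg_match('#bC(.*?)#', $subject, $m);
$lazyUnanchored = $m[1];
var_dump($lazyUnanchored); // string(0) ""

// No capture group at all: $matches[1] is never set.
preg_match('#bC.*?#', $subject, $m);
$hasGroup = isset($m[1]);
var_dump($hasGroup); // bool(false)

// Lazy, but anchored by the literal "r": expands until the first "r".
preg_match('#(bC.*?)r#', $subject, $m);
$lazyAnchored = $m[1];
echo $lazyAnchored, "\n"; // bC%3Fed

// Greedy: runs to the end of the subject.
preg_match('#bC(.*)#', $subject, $m);
$greedy = $m[1];
echo $greedy, "\n"; // %3Fedr%3Damp;35
```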

Convert Notepad++ Regex to PHP Regular Expression

I'm trying to convert a Notepad++ regex to a PHP regular expression which basically gets IDs from a list of URLs in this format:
http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html
http://www.example.com/category-example/1471337-text-blah-blah-2-blah-2010.html
Using the Notepad++ regex function I get the output that I need in two steps (a list of comma-separated IDs):
(.*)/ replace with space
-(.*) replace with comma
Result:
1371937,1471337
I tried to do something similar with PHP preg_replace but I can't figure out how to get the correct regex. The example below removes everything except digits, but it doesn't work as expected since there can also be numbers that do not belong to the ID.
$bb = preg_replace('/[^0-9]+/', ',', $_POST['Text']);
?>
Which is the correct structure?
Thanks
If you are matching against:
http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html
To get:
1371937
You would:
$url = "http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html";
preg_match( "/[^\d]+(\d+)-/", $url, $matches );
$code = $matches[1];
.. which matches all non-numeric characters, then an unbroken string of numbers, until it reaches a '-'
If all you want to do is find the ID, then you should use preg_match, not preg_replace.
You've got lots of options for the pattern, the simplest being:
$url = 'http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html';
preg_match('/\d+/', $url, $matches);
echo $matches[0];
Which simply finds the first bunch of numbers in the URL. This works for the examples.
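If the input is the whole list rather than a single URL, preg_match_all can collect every ID in one pass and give the comma-separated output from the question. A sketch, assuming the ID is always the run of digits directly after a "/" and directly before a "-" (the variable names are made up):

```php
<?php
$text = "http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html\n"
      . "http://www.example.com/category-example/1471337-text-blah-blah-2-blah-2010.html";

// Later numbers such as "-2-" or "-2012" follow a hyphen, not a slash,
// so the leading "/" in the pattern skips them.
preg_match_all('~/(\d+)-~', $text, $matches);

$idList = implode(',', $matches[1]);
echo $idList, "\n"; // 1371937,1471337
```

Using ~ as the delimiter avoids having to escape the slash inside the pattern.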

PHP - How to convert the YouTube URL with Regex

How can I convert the YouTube URLs below
$url1 = http://www.youtube.com/watch?v=136pEZcb1Y0&feature=fvhl
$url2 = http://www.youtube.com/watch?feature=fvhl&v=136pEZcb1Y0
into
$url_embedded = http://www.youtube.com/v/136pEZcb1Y0
using Regular Expressions?
Here's an example solution:
PHP:
preg_replace('/.+(\?|&)v=([a-zA-Z0-9]+).*/', 'http://youtube.com/watch?v=$2', 'http://www.youtube.com/watch?v=136pEZcb1Y0&feature=fvhl');
Match:
^.+(\?|&)v=([a-zA-Z0-9]+).*$
Replace with:
http://youtube.com/watch?v=$2
Here's how it works: regex analyzer.
suicideducky's answer is fine, but you changed the requirements. Try
preg_match("/v=(\w+)/", $url1, $matches);
$url_embedded = "http://www.youtube.com/v/" . $matches[1];
In case the wrong version was still cached, I meant $matches[1]!
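Wrapped into a function, the same idea might look like the sketch below. toEmbedUrl() is a hypothetical name, and the character class is widened to [\w-] on the assumption that video IDs can also contain hyphens, which \w alone would not match.

```php
<?php
// Hypothetical helper: returns the embed URL, or null if no v parameter is found.
function toEmbedUrl(string $url): ?string
{
    // [?&] makes sure "v" is a whole parameter name; [\w-]+ also accepts "-".
    if (preg_match('/[?&]v=([\w-]+)/', $url, $m)) {
        return 'http://www.youtube.com/v/' . $m[1];
    }
    return null;
}

echo toEmbedUrl('http://www.youtube.com/watch?v=136pEZcb1Y0&feature=fvhl'), "\n";
echo toEmbedUrl('http://www.youtube.com/watch?feature=fvhl&v=136pEZcb1Y0'), "\n";
// both print http://www.youtube.com/v/136pEZcb1Y0
```

Note that preg_match takes the pattern first and the subject second.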
Prefixing the string "http://www.youtube.com/watch/" to the result of applying the regex "v=(\w+)" to the URL(s) should do the job.
\w specifies alphanumeric characters (a-z, A-Z, 0-9 and _) and will thus stop at the &
EDIT for updated question.
My approach seems a little hackish.
so get the result of applying the regex "v=(\w+)" and then apply the regex "(\w+)" to it.
Then prefix it with the string "http://www.youtube.com/v/".
So to sum up:
"http://www.youtube.com/v/" + ( result of "(\w+)" applied to the result of ( "v=(\w+)" applied to the original url ) )
EDITED AGAIN: this approach assumes you are using a regex function that matches a substring instead of the whole string.
Also, MvanGeest's version is superior to mine.

using preg_match to strip specified underscore in php

There has always been a confusion with preg_match in php.
I have a string like this:
apsd_01_03s_somedescription
apsd_02_04_somedescription
Can I use preg_match to strip off everything from the 3rd underscore onward, including the 3rd underscore?
Thanks.
Try this:
preg_replace('/^([^_]*_[^_]*_[^_]*).*/', '$1', $str)
This will take only the first three sequences that are separated by _. So everything from the third _ on will be removed.
If you want to strip the "_somedescription" part: preg_replace('/([^_]*)_([^_]*)_([^_]*)_(.*)/', '$1_$2_$3', $str);
I agree with Gumbo's answer, however, instead of using regular expressions, you can use PHP's array functions:
$s = "apsd_01_03s_somedescription";
$parts = explode("_", $s);
echo implode("_", array_slice($parts, 0, 3));
// apsd_01_03s
This method appears to execute similarly in speed, compared to a regular expression solution.
If the third underscore is the last one, you can do this:
preg_replace('/^(.+)_.+?$/', '$1', $str);
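When the third underscore is not guaranteed to be the last one, counting the underscores explicitly is safer. A small sketch that runs this on both sample strings plus one with extra underscores; stripFromThirdUnderscore() is a made-up helper name.

```php
<?php
// Hypothetical helper: drop the third underscore and everything after it.
// Strings with fewer than three underscores are returned unchanged.
function stripFromThirdUnderscore(string $s): string
{
    return preg_replace('/^([^_]*_[^_]*_[^_]*)_.*$/', '$1', $s);
}

$samples = ['apsd_01_03s_somedescription', 'apsd_02_04_somedescription', 'a_b_c_d_e'];
foreach ($samples as $s) {
    echo stripFromThirdUnderscore($s), "\n";
}
// apsd_01_03s
// apsd_02_04
// a_b_c
```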
