PHP preg_match returns only first match - php

The first question is this:
I am using http://www.phpliveregex.com/ to check that my regex is right, and it finds more than one matching line.
This is the code I am running:
$lines = explode('\n', $text);
foreach($lines as $line) {
$matches = [];
preg_match("/[0-9]+[A-Z][a-z]+ [A-Z][a-z]+S[0-9]+\-[0-9]+T[0-9]+/uim", $line, $matches);
print_r($matches);
}
on the $text, which looks like this: http://pastebin.com/9UQ5wNRu
The problem is that only one match is printed:
Array
(
[0] => 3Bajus StanislavS2415079249-2615T01
)
Why is it doing this? Any ideas what could fix the problem?
The second question:
Maybe you've noticed the non-ASCII alphabetic characters of the Slovak language inside the text (from pastebin). How can I match those characters and select the users which have this format:
{number}{first_name}{space}{last_name}{id_number}
How can I do that?
OK, the first issue is fixed. Thank you @chris85. I should have used preg_match_all and run it on the whole text. Now I get an array of all students who have only non-Slovak (English) letters in their names.

preg_match stops after the first match. You need to use preg_match_all for a global search.
[A-Z] does not include any characters outside that range. Since you are using the i modifier, that character class is actually [A-Za-z], which may or may not be what you want. You can use \p{L} in its place to match letters from any language.
Demo: https://regex101.com/r/L5g3C9/1
So your PHP code should be:
preg_match_all("/^[0-9]+\p{L}+ \p{L}+S[0-9]+\-[0-9]+T[0-9]+$/uim", $text, $matches);
print_r($matches);
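One more detail worth checking in the question's loop: in PHP, '\n' in single quotes is a literal backslash followed by n, not a newline, so explode('\n', $text) most likely returned the whole text as a single element, and preg_match then reported only its first match. A minimal sketch of both points, assuming UTF-8 input; the second sample line is made up for illustration and is not taken from the pastebin:

```php
<?php
// Sample input: the first line is from the question, the second is a
// hypothetical line with Slovak letters (not taken from the pastebin).
$text = "3Bajus StanislavS2415079249-2615T01\n"
      . "4Kováč ĽubošS2415079250-2615T02";

// Single quotes: '\n' is a backslash plus "n", so nothing is split.
$notSplit = explode('\n', $text);
// Double quotes: "\n" is a real newline, so we get one element per line.
$split = explode("\n", $text);

// preg_match stops at the first match; preg_match_all collects all of them.
$pattern = '/^[0-9]+\p{L}+ \p{L}+S[0-9]+-[0-9]+T[0-9]+$/uim';
$firstOnly = [];
preg_match($pattern, $text, $firstOnly);
$all = [];
$count = preg_match_all($pattern, $text, $all);

echo count($notSplit), ' ', count($split), "\n"; // element counts of both splits
echo $count, "\n";                                // number of matching lines
print_r($all[0]);                                 // the full matches
```

Running the pattern once over the whole text also makes the explode loop unnecessary.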

You can also use the T-Regx library:
pattern("^[0-9]+\p{L}+ \p{L}+S[0-9]+\-[0-9]+T[0-9]+$", 'uim')->match($text)->all();

Related

Can this be solved with a regular expression?

I am trying to extract the numbers that are packed together in this string.
110.0046102.005699.0008103.0104....
Each number ends four digits after the dot (point/period), so I want to get:
110.0046
102.0056
99.0008
103.0104
I was wondering if this is possible with a regular expression, or if I should use some other way.
// replace the variable $numbers with your numbers
$numbers = "110.0046102.005699.0008103.0104";
preg_match_all("#\d+\.\d{4}#", $numbers, $matches);
var_dump($matches); // outputting all matches
https://regex101.com/r/oG1dK1/1 -> you can see the regex in action here. The numbers are in the box MATCH INFORMATION on the right.
Try this regex:
(\d{1,}\.\d{4})
Demo here: https://regex101.com/r/uJ1wU6/1
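To get the numbers as a flat list, read index 0 of the result, which holds the full matches. A short sketch using the string from the question (without the trailing dots):

```php
<?php
$numbers = "110.0046102.005699.0008103.0104";

// \d+ takes the integer part, \.\d{4} takes the dot and exactly four decimals,
// so the next match starts right where the previous one stopped.
preg_match_all('/\d+\.\d{4}/', $numbers, $matches);

// $matches[0] is the list of full matches; print one per line.
echo implode("\n", $matches[0]), "\n";
```

This relies on every number having exactly four decimals; with a variable number of decimals the boundaries between numbers could not be recovered this way.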

preg_replace with Regex - find number-sequence in URL

I'm a regex newbie, so sorry for this "simple" question:
I've got a URL like the following:
http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx
What I want to achieve is to get the number sequence (the job ID) right before the ".aspx" with preg_replace.
I've already figured out that the regex for finding it could be
(?!.*-).*(?=\.)
Now preg_replace needs the opposite of that regular expression. How can I achieve that? Also worth mentioning:
The URL can have multiple numbers in it. I only need the sequence right before ".aspx". Also, there could be some query parameters after the ".aspx", like "&mobile=true".
Thank you for your answers!
You can use:
$re = '/[^-.]+(?=\.aspx)/i';
preg_match($re, $input, $matches);
//=> 146370543
This matches a run of characters that are neither a hyphen nor a dot and that is followed by .aspx, using the lookahead (?=\.aspx).
You can just use preg_match (you don't need preg_replace, as you don't want to change the original string) and capture the number before the .aspx. Assuming .aspx is always at the end of the string, the simplest way I could think of is:
<?php
$string = "http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx";
$regex = '/([0-9]+)\.aspx$/';
preg_match($regex, $string, $results);
print $results[1];
?>
A short explanation:
$results contains an array of results. The first element holds the text matched by the complete regex, so it would be 146370543.aspx in this example. The second element holds the group captured by the parentheses around [0-9]+.
You can get the opposite by using this regex:
(\D*)\d+(.*)
Result on the URL from the question:
MATCH 1
1. [0-100] `http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-`
2. [109-114] `.aspx`
If you just want the number from that URL, you can use this regex:
(\d+)
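The question also mentions that the URL can contain other numbers and may carry parameters after .aspx, in which case a pattern anchored with $ or a bare (\d+) may pick the wrong thing or nothing at all. One possible sketch that captures only the digits directly before .aspx and tolerates a trailing suffix; the second URL is a made-up variant of the one in the question:

```php
<?php
$urls = [
    'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx',
    // Hypothetical variant: an extra number in the path and a parameter after .aspx.
    'http://stellenanzeige.monster.de/Job-2015-Mainz-146370543.aspx&mobile=true',
];

$jobIds = [];
foreach ($urls as $url) {
    // Digits, then ".aspx", then either the end of the string or ?, & or #.
    if (preg_match('/(\d+)\.aspx(?=$|[?&#])/i', $url, $m)) {
        $jobIds[] = $m[1];
    }
}

echo implode(', ', $jobIds), "\n";
```

The lookahead keeps the suffix out of the match, so only the captured group needs to be read.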

Convert Notepad++ Regex to PHP Regular Expression

I'm trying to convert a Notepad++ regex to a PHP regular expression that gets the IDs from a list of URLs in this format:
http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html
http://www.example.com/category-example/1471337-text-blah-blah-2-blah-2010.html
Using the Notepad++ regex function I get the output that I need in two steps (a list of comma-separated IDs):
(.*)/ replace with space
-(.*) replace with comma
Result:
1371937,1471337
I tried to do something similar with PHP preg_replace, but I can't figure out the correct regex. The example below removes everything except digits, but it doesn't work as expected since there can also be numbers that do not belong to the ID.
$bb = preg_replace('/[^0-9]+/', ',', $_POST['Text']);
What is the correct pattern?
Thanks
If you are matching against:
http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html
To get:
1371937
You would use:
$url = "http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html";
preg_match( "/[^\d]+(\d+)-/", $url, $matches );
$code = $matches[1];
.. which matches all non-numeric characters, then an unbroken run of digits, until it reaches a '-'.
If all you want to do is find the ID, then you should use preg_match, not preg_replace.
You've got lots of options for the pattern, the simplest being:
$url = 'http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html';
preg_match('/\d+/', $url, $matches);
echo $matches[0];
Which simply finds the first bunch of numbers in the URL. This works for the examples.
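For the original goal, a comma-separated list of IDs from several URLs, preg_match_all over the whole text plus implode is one option. A minimal sketch, assuming the ID is always the run of digits right after a slash and before a hyphen, as in the two sample URLs:

```php
<?php
$text = "http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html\n"
      . "http://www.example.com/category-example/1471337-text-blah-blah-2-blah-2010.html";

// Capture digits that sit between "/" and "-"; the years and the lone "2"
// are skipped because they follow a hyphen, not a slash.
preg_match_all('~/(\d+)-~', $text, $matches);

// $matches[1] holds the captured IDs, one per URL.
$ids = implode(',', $matches[1]);
echo $ids, "\n";
```

Using ~ as the delimiter avoids having to escape the slash inside the pattern.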

RegEx Capture Group with PHP preg_match Not Returning Values

I'm trying to capture the text "Capture This" in $string below.
$string = "</th><td>Capture This</td>";
$pattern = "/<\/th>\r.*<td>(.*)<\/td>$/";
preg_match ($pattern, $string, $matches);
echo($matches);
However, that just returns "Array". I also tried printing $matches using print_r, but that gave me "Array ( )".
This pattern will only come up once, so I just need it to match one time. Can somebody please tell me what I'm doing wrong?
The problem is that your pattern requires a CR character (\r), which the input string does not contain. You should also make the quantifier inside the capturing group lazy and use print_r to output the array. Like this:
$pattern = "/<\/th>.*<td>(.*?)<\/td>$/";
You can see it in action here: http://codepad.viper-7.com/djRJ0e
Note that it's recommended to parse HTML with a proper HTML parser rather than with a regex.
Two things:
You need to drop the \r from your regex as there is no carriage return character in your input string.
Change echo($matches) to print_r($matches) or var_dump($matches)
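Putting both fixes together, a corrected version of the snippet might look like this (a sketch for the exact string in the question, not a general HTML parser):

```php
<?php
$string = "</th><td>Capture This</td>";

// No \r in the pattern, and a lazy (.*?) inside the capturing group.
// The ~ delimiter means the slashes in the tags need no escaping.
$pattern = '~</th>.*<td>(.*?)</td>$~';

if (preg_match($pattern, $string, $matches)) {
    // $matches[0] is the full match, $matches[1] is the captured text.
    echo $matches[1], "\n";
}
```

Checking the return value of preg_match before reading $matches avoids an undefined index when the pattern does not match.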
