preg_match_all all combinations with word boundaries - php

I've got the following string:
$string = "König Friedrich August III. von Sachsen - Adel Sachsen, Waidmannsheil, Kapitaler 16ender erlegt auf der Jagd am 2. Oktober 1905, gelaufen 30.06.1909, Verlag, Karlowa Walter, Dresden";
Now I want to find words in that string using preg_match_all:
preg_match_all("/\b(abituria)\b|\b(absolvia)\b|\b(adel sachsen)\b|\b(adel)\b|\b(sachsen)\b|\b(könig)\b/i",$string,$matches);
The string matches only for
array(
0 => "König",
1 => "Adel Sachsen"
)
but I need it to also return "Adel" in the $matches array.
How can I do that? I think my problem is this: "After the first match is found, the subsequent searches are continued on from end of the last match."
Update
That does not work:
preg_match_all('/(?=\b(adel sachsen|adel)\b)/ui', $string, $matches);
print_r($matches[1]);
Array
(
[0] => Adel Sachsen
)
preg_match_all('/(?=\b(adel|adel sachsen)\b)/ui', $string, $matches);
print_r($matches[1]);
Array
(
[0] => Adel
)
But I need the following result:
Array
(
[0] => Adel Sachsen,
[1] => Adel
)

I would just search for each word/combination separately (generating a pattern for each), map each term to its match, or to false if it doesn't match, and then filter out the false elements:
$arr = ["nadel", "adel", "knödel", "sachsen", "adel sachsen"];
$str = "Friedrich August III. von Sachsen - Adel Sachsen";
$res = array_filter(array_map(function ($s) use (&$str) {
$s = '/\b'.preg_quote($s,'/').'\b/iu';
return preg_match($s, $str, $out) ? $out[0] : false; }, $arr));
sort($res); print_r($res);
See the test at eval.in. Anonymous functions require at least PHP 5.3, and the short array syntax [] requires PHP 5.4.
Array
(
[0] => Adel
[1] => Adel Sachsen
[2] => Sachsen
)
The function can be improved further to return arrays, for example if you want every differently cased occurrence of the same word, or if you want to capture the offsets.
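A minimal sketch of that improvement, assuming PHP 7 or later for the scalar type hints. The helper name find_terms() is hypothetical and not part of the original answer. It uses preg_match_all with PREG_OFFSET_CAPTURE so that each term maps to every occurrence together with its byte offset:

```php
<?php
// Hypothetical helper (not from the original answer): returns, per search
// term, every occurrence as [matched text, byte offset].
function find_terms(array $terms, string $subject): array
{
    $found = [];
    foreach ($terms as $term) {
        // One pattern per term, quoted so the term is treated literally.
        $pattern = '/\b' . preg_quote($term, '/') . '\b/iu';
        if (preg_match_all($pattern, $subject, $m, PREG_OFFSET_CAPTURE)) {
            $found[$term] = $m[0];
        }
    }
    return $found;
}

$terms = ["nadel", "adel", "knödel", "sachsen", "adel sachsen"];
$str = "Friedrich August III. von Sachsen - Adel Sachsen";
$result = find_terms($terms, $str);
print_r($result);
```

Terms that never match are simply absent from the result, so no separate filtering step is needed. Note that the offsets are byte offsets, even with the u modifier.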

You can use a lookahead to get your overlapping matches:
preg_match_all('/(?=\b(abituria|absolvia|adel sachsen|adel|sachsen|könig)\b)/ui',
$string, $matches);
print_r($matches[1]);
Array
(
[0] => König
[1] => Sachsen
[2] => Adel Sachsen
[3] => Sachsen
)
Update: Based on your updated code snippet you can do this:
preg_match_all('/(?=\b(adel sachsen)\b)(?=\b(adel)\b)/ui', $string, $matches);
unset($matches[0]);
print_r($matches);
Output:
Array
(
[1] => Array
(
[0] => Adel Sachsen
)
[2] => Array
(
[0] => Adel
)
)

As you already noticed, preg_match_all continues searching after the end of the previous match, so it is not the best tool for your task.
The easy but less performant solution is to run one preg_match per search term instead.
If the strings are not much longer than your example, I would go for this; optimizing does not seem worth it.
If performance is really critical, I would group each term with the terms that are prefixes of it, ordering each group by longest term first:
abituria
absolvia
adel sachsen, adel
sachsen
könig
Now use the regex with lookahead assertion:
preg_match_all('/(?=\b(abituria|absolvia|adel sachsen|adel|sachsen|könig)\b)/ui',
$string, $matches);
If $string contains "adel", but not "adel sachsen", it will match correctly. If it contains "adel sachsen", it will only match "adel sachsen", but from the groups that we constructed before, we know that it also matches prefixes of "adel sachsen", i.e. "adel".
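The expansion step described above could be sketched as follows. This assumes PHP 7 or later (for the ?? operator) and the mbstring extension. The $prefixes map is a hypothetical, hand-built structure that lists, for each long term, the shorter terms that are prefixes of it:

```php
<?php
// Hypothetical prefix map, built by hand from the groups above:
// each long term lists the shorter terms that are prefixes of it.
$prefixes = ['adel sachsen' => ['adel']];

$string = "König Friedrich August III. von Sachsen - Adel Sachsen, Waidmannsheil";
preg_match_all('/(?=\b(abituria|absolvia|adel sachsen|adel|sachsen|könig)\b)/ui',
    $string, $matches);

$found = [];
foreach ($matches[1] as $hit) {
    $found[] = $hit;
    // Look the hit up case-insensitively in the prefix map.
    $key = mb_strtolower($hit, 'UTF-8');
    foreach ($prefixes[$key] ?? [] as $prefix) {
        // Slice the matched text so the prefix keeps its original casing.
        $found[] = mb_substr($hit, 0, mb_strlen($prefix, 'UTF-8'), 'UTF-8');
    }
}
print_r($found);
```

The regex still runs only once over the subject; the prefixes are added from the lookup table rather than by further matching.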

Related

regex split a string between [ and ]

My string looks something like '[15][18][22]' and I would like to split it into an array of [15], [18] and [22]. I'm trying this regex:
\[\d+\]
But it only splits off the first one.
Thanks for the help.
You are better off using preg_match_all with what you want to capture:
if (preg_match_all('/\[\d+]/', $str, $m)) {
print_r($m[0]);
}
Output:
Array
(
[0] => [15]
[1] => [18]
[2] => [22]
)
Or else you may use this preg_split with a capture group:
$str = '[15][18][22]';
$arr = preg_split('/(\[\d+])/', $str, -1,
PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($arr);
Output:
Array
(
[0] => [15]
[1] => [18]
[2] => [22]
)
It just doesn't get any simpler than this: three characters in the pattern. You only need to split on the zero-width position after each ]. \K tells the regex engine to discard what it has matched so far, so the ] is not consumed as part of the delimiter and stays in the output.
Pattern: ~]\K~
Code:
$string = '[15][18][22]';
var_export(preg_split('~]\K~', $string, -1, PREG_SPLIT_NO_EMPTY));
Output:
array (
0 => '[15]',
1 => '[18]',
2 => '[22]',
)
This should perform well because the pattern has no capture groups, lookarounds, or alternations to slow it down.

How to get a particular string using preg_replace?

I want to get a particular value from a string in PHP. This is the string:
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_replace('/(.*)\[(.*)\](.*)\[(.*)\](.*)/', '$2', $str);
I want to get the value of data01, that is, [1,2].
How can I achieve this using preg_replace?
preg_replace() is the wrong tool. I have used preg_match_all() in case you need the other item later, and trimmed your regex down to capture only the part of the string you are looking for.
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_match_all('/\[([0-9,]+)\]/',$string,$match);
print_r($match);
/*
print_r($match) output:
Array
(
[0] => Array
(
[0] => [1,2]
[1] => [2,3]
)
[1] => Array
(
[0] => 1,2
[1] => 2,3
)
)
*/
echo "Your match: " . $match[1][0];
This gives you both the full matched pattern and the captured characters, so you can have [1,2] or just 1,2.
preg_replace is used to replace by regular expression!
I think you want to use preg_match_all() to get each data attribute from the string.
The regex you want is:
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_match_all('#data[0-9]{2}=(\[[0-9,]+\])#',$string,$matches);
print_r($matches);
Array
(
[0] => Array
(
[0] => data01=[1,2]
[1] => data02=[2,3]
)
[1] => Array
(
[0] => [1,2]
[1] => [2,3]
)
)
I have tested this as working.
preg_replace is for replacing stuff. preg_match is for extracting stuff.
So you want:
preg_match('/(.*?)\[(.*?)\](.*?)\[(.*?)\](.*)/', $str, $match);
var_dump($match);
See what you get, and work from there.
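If only one named value is needed, a tighter pattern avoids sifting through unrelated groups. The following is a minimal sketch, assuming PHP 7.1 or later for the nullable return type. The helper name data_value() is hypothetical and does not come from any of the answers:

```php
<?php
// Hypothetical helper: returns the bracketed value assigned to $name,
// or null when $name does not occur in $subject.
function data_value(string $subject, string $name): ?string
{
    // Quote the name so it is matched literally, then capture "[...]".
    $pattern = '/' . preg_quote($name, '/') . '=(\[[^\]]*\])/';
    return preg_match($pattern, $subject, $m) ? $m[1] : null;
}

$string = 'users://data01=[1,2]/data02=[2,3]/*';
echo data_value($string, 'data01'), "\n";
```

Using [^\]]* instead of .* keeps the capture from running past the first closing bracket.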

Finding the number of occurrences of a string inside another string using regex in PHP?

I want to find the number of occurrences of a (pattern-based) substring inside another string.
For example:
$mystring = "|graboard='KERALA'||graboarded='KUSAT'||graboard='MG'";
I want to find the number of graboards present in $mystring.
I used a regex for this, but how do I find the number of occurrences?
If you must use a regex, preg_match_all() returns the number of matches.
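A short sketch of using that return value directly, assuming PHP 5.4 or later, where the $matches argument may be omitted. The word boundaries are an addition of this sketch, used to keep "graboarded" out of the count:

```php
<?php
$mystring = "|graboard='KERALA'||graboarded='KUSAT'||graboard='MG'";

// With word boundaries, "graboarded" is not counted.
$exact = preg_match_all('/\bgraboard\b/', $mystring);

// Without them, every occurrence of the substring is counted.
$loose = preg_match_all('/graboard/', $mystring);

echo $exact, " ", $loose, "\n";
```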
Use preg_match_all:
$mystring = "|graboard='KERALA'||graboarded='KUSAT'||graboard='MG'";
preg_match_all("/(graboard)='(.+?)'/i", $mystring, $matches);
print_r($matches);
will yield:
Array
(
[0] => Array
(
[0] => graboard='KERALA'
[1] => graboard='MG'
)
[1] => Array
(
[0] => graboard
[1] => graboard
)
[2] => Array
(
[0] => KERALA
[1] => MG
)
)
You can then use count($matches[1]) to get the number of occurrences. This is just a basic example, so the regex may need to be modified to suit your needs.
Just use preg_match_all():
// The string.
$mystring="|graboard='KERALA'||graboarded='KUSAT'||graboard='MG'";
// The `preg_match_all()`.
preg_match_all('/graboard/is', $mystring, $matches);
// Echo the count of `$matches` generated by `preg_match_all()`.
echo count($matches[0]);
// Dumping the content of `$matches` for verification.
echo '<pre>';
print_r($matches);
echo '</pre>';

PHP preg_match_all not finding first match

I am trying to find all matches in a string. For some reason, if a match is at the start of the string, that particular match is not returned. Does it have something to do with index 0? I am also using PREG_OFFSET_CAPTURE to get the indexes of the matches. Below are the non-working and the working code.
$text = '[QUOTE]I wonder why[QUOTE]PHP[IMG]hates me[/IMG][/QUOTE][/QUOTE][URL="http://www.bing.com"]Click me![QUOTE]........[/QUOTE]Ok Bai![/URL]';
preg_match_all('#\[QUOTE\]#', $text, $matches, PREG_OFFSET_CAPTURE, PREG_PATTERN_ORDER);
print_r($matches);
The result of which is:
Array ( [0] => Array ( [0] => Array ( [0] => [QUOTE] [1] => 19 ) [1] => Array ( [0] => [QUOTE] [1] => 100 ) ) )
As you can see it only found two matches. If I add a character to the start of the string it will then find all three.
$text = 'a[QUOTE]I wonder why[QUOTE]PHP[IMG]hates me[/IMG][/QUOTE][/QUOTE][URL="http://www.bing.com"]Click me![QUOTE]........[/QUOTE]Ok Bai![/URL]';
preg_match_all('#\[QUOTE\]#', $text, $matches, PREG_OFFSET_CAPTURE, PREG_PATTERN_ORDER);
print_r($matches);
The result of which is:
Array ( [0] => Array ( [0] => Array ( [0] => [QUOTE] [1] => 1 ) [1] => Array ( [0] => [QUOTE] [1] => 20 ) [2] => Array ( [0] => [QUOTE] [1] => 101 ) ) )
All three matches. If anyone can help me figure out if my REGEX needs to be modified or if there is some quirk I'm unaware of it would be much appreciated. I've tried this same thing utilizing Python and the re library and it returns all my matches. I also utilized this http://www.regextester.com/ and it reports it as working in both scenarios and matching everything as it should. My only guess is something to do with the PREG_OFFSET_CAPTURE finding a match at position 0 and the 0 causing some issue.
Thanks in advance for any assistance!
The correct way to add multiple flags is with a pipe |, so:
preg_match_all('#\[QUOTE\]#', $text, $matches, PREG_OFFSET_CAPTURE | PREG_PATTERN_ORDER);
The comma before PREG_PATTERN_ORDER makes it the 'offset' parameter (the position in the string at which to start searching), and since PREG_PATTERN_ORDER == 1, the search starts at the second character.
The problem is in your function call:
preg_match_all('#\[QUOTE\]#', $text, $matches, PREG_OFFSET_CAPTURE, PREG_PATTERN_ORDER);
The fifth parameter is the offset, not another flag.
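The difference can be checked with a shortened input. This is a minimal sketch; the $text value here is a made-up, shorter string than the one in the question:

```php
<?php
// Shortened, made-up input with a [QUOTE] tag at position 0.
$text = '[QUOTE]a[QUOTE]b[/QUOTE][/QUOTE]';

// Flags combined with |; the offset keeps its default of 0.
$n1 = preg_match_all('#\[QUOTE\]#', $text, $m1,
    PREG_OFFSET_CAPTURE | PREG_PATTERN_ORDER);

// PREG_PATTERN_ORDER (value 1) passed as the fifth argument acts as
// the offset, so the search starts at index 1 and skips the first tag.
$n2 = preg_match_all('#\[QUOTE\]#', $text, $m2,
    PREG_OFFSET_CAPTURE, PREG_PATTERN_ORDER);

echo $n1, " ", $n2, "\n";
```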

Regex, get multiple occurrences

I would like to know how to get multiple occurrences from a regex.
$str = "Some validations <IF TEST>firstValue</IF> in <IF OK>secondValue</IF> end of string.";
$do = preg_match("/<IF(.*)>.*<\/IF>/i", $str, $matches);
This is what I've done so far. It works if I have only one <IF>, but if I have more it doesn't return the right values. Here is the result:
Array ( [0] => <IF TEST>firstValue</IF> in <IF OK>secondValue</IF> [1] => TEST>firstValue</IF> in <IF OK )
I need to get the "TEST" and the "OK" values.
EDIT: I've made the suggested modifications, thanks a lot, it works fine! However, I am now trying to add an ELSEIF branch and can't get it to work well. Here is what I've done:
$do = preg_match_all("~<IF([^<>]+)>([^<>]+)(</IF>|<ELSEIF([^<>]+)>([^<>]+)</IF>)~", $str, $matches, PREG_SET_ORDER);
and the result is
Array
(
[0] => Array
(
[0] => <IF TEST>firstValue<ELSEIF TEST1>secondValue</IF>
[1] => TEST
[2] => firstValue
[3] => <ELSEIF TEST1>secondValue</IF>
[4] => TEST1
[5] => secondValue
)
[1] => Array
(
[0] => <IF OK>thirdValue</IF>
[1] => OK
[2] => thirdValue
[3] => </IF>
)
)
Is there a way to make my array cleaner? It has many useless elements, like [0][4].
You should make the regex more specific. The .* that you are using should either be less greedy, or better yet disallow other angle brackets:
~<IF([^<>]+)>([^<>]+)</IF>~i
More importantly, you should use preg_match_all, not just preg_match.
preg_match_all("~<IF([^<>]+)>([^<>]+)</IF>~i", $str, $matches, PREG_SET_ORDER);
That'll give you a nested array like:
[0] => Array
(
[0] => <IF TEST>firstValue</IF>
[1] => TEST
[2] => firstValue
)
[1] => Array
(
[0] => <IF OK>secondValue</IF>
[1] => OK
[2] => secondValue
)
The answers pointing out that you should use preg_match_all are correct.
But there is another problem: the .* is greedy by default. This will cause it to match both tags in a single match, so you need to make the star non-greedy (i.e. lazy):
/<IF(.*?)>.*?<\/IF>/i
Use this code:
$string = "Some validations <IF TEST>firstValue</IF> in <IF OK>secondValue</IF> end of string.";
$regex = "/<IF (.*?)>.*?<\/IF>/i";
preg_match_all($regex, $string, $matches);
print_r($matches[1]);
Your regex is good, but you have to use non-greedy mode by adding the ? character, and use the preg_match_all() function.
Use a non-greedy match .*? and preg_match_all for this purpose.
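To see the effect of greediness side by side, here is a small sketch that runs the greedy pattern from the question and the lazy pattern from the answers over the same string:

```php
<?php
$str = "Some validations <IF TEST>firstValue</IF> in <IF OK>secondValue</IF> end of string.";

// Greedy: .* runs as far as it can, so both tags end up in one match.
preg_match_all('/<IF(.*)>.*<\/IF>/i', $str, $greedy);

// Lazy: .*? stops at the first possible ">" and the first "</IF>".
preg_match_all('/<IF (.*?)>.*?<\/IF>/i', $str, $lazy);

print_r($greedy[1]);
print_r($lazy[1]);
```

The greedy version yields a single capture that spans from the first tag into the second, while the lazy version captures TEST and OK separately.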
