Nested pattern matching with preg_match_all (Regex and PHP)

I'm working with text data that contains special flags in the form of "{X}" or "{XX}" where X could be any alphanumeric character. Special meaning is assigned to these flags when they are adjacent or when they are separated. I need a regex which will match adjacent flags AND separate each flag in the group.
For example, given the following input:
{B}{R}: Target player loses 1 life.
{W}{G}{U}: Target player gains 5 life.
The output should be approximately:
("{B}{R}",
"{W}{G}{U}")
("{B}",
"{R}")
("{W}",
"{G}",
"{U}")
My PHP code is returning the adjacents array properly, but the split array contains only the last matching flag in each group:
$input = '{B}{R}: Target player loses 1 life.
{W}{G}{U}: Target player gains 5 life.';
$pattern = '#((\{[a-zA-Z0-9]{1,2}})+)#';
preg_match_all($pattern, $input, $results);
print_r($results);
Output:
Array
(
[0] => Array
(
[0] => {B}{R}
[1] => {W}{G}{U}
)
[1] => Array
(
[0] => {B}{R}
[1] => {W}{G}{U}
)
[2] => Array
(
[0] => {R}
[1] => {U}
)
)
Thanks for any help!

unset($results[1]);
foreach($results[0] AS $match){
preg_match_all('/\{[a-zA-Z0-9]{1,2}}/', $match, $r);
$results[] = $r[0];
}
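That's the only way I know of to build the data structure you need: a repeated capture group keeps only its last iteration, so each group of adjacent flags has to be matched a second time. A preg_split would work as well: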
unset($results[1]);
foreach($results[0] AS $match)
$results[] = preg_split('/(?<=})(?=\{)/', $match);
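A minimal sketch of the two-pass approach described above, written as a standalone script. The outer pass uses a non-capturing group, since a capture group repeated with + would only keep its last iteration anyway. The 'adjacent' and 'split' keys are names chosen here for illustration and are not part of the original code.

```php
<?php
$input = "{B}{R}: Target player loses 1 life.\n{W}{G}{U}: Target player gains 5 life.";

// Pass 1: find each run of adjacent flags.
preg_match_all('#(?:\{[a-zA-Z0-9]{1,2}\})+#', $input, $groups);

$result = ['adjacent' => $groups[0], 'split' => []];

// Pass 2: split every run into its individual flags.
foreach ($groups[0] as $group) {
    preg_match_all('#\{[a-zA-Z0-9]{1,2}\}#', $group, $flags);
    $result['split'][] = $flags[0];
}

print_r($result);
```

Keeping the two levels under separate keys avoids the gaps in the numeric indexes that unset($results[1]) leaves behind.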

Related

preg_match_all is only matching one

I am trying to get the value after the dots, and I would like to get all of them (each as their own key/value).
The following is what I am running:
$string = "div.cat.dog#mouse";
preg_match_all("/\.(.+?)(\.|#|$)/", $string, $matches);
and when I do a dump of $matches I am getting this:
Array
(
[0] => Array
(
[0] => .cat.
)
[1] => Array
(
[0] => cat
)
[2] => Array
(
[0] => .
)
)
Item [1] contains only one value. What I was expecting was for it to contain (in this case) two items, cat and dog. Why isn't dog getting picked up by preg_match_all?
Use lookahead:
\.(.+?)(?=\.|#|$)
The problem with your regex is that it matches a dot on the left-hand side and a dot, a hash, or the end of input on the right-hand side of each match. After the first match, the internal pointer has moved past the dot that follows cat, leaving no dot to start the match for the next word.
(?=\.|#|$) is a positive lookahead: it does not consume these characters but only checks that one of them follows, so the pointer stays right after cat, in front of the dot, instead of moving past it.
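The difference can be checked side by side. This is a small sketch using the string from the question; the variable names $consuming and $lookahead are chosen here for illustration.

```php
<?php
$string = 'div.cat.dog#mouse';

// Original pattern: the trailing delimiter is consumed as part of the match,
// so the next search starts at "dog" and finds no leading dot.
preg_match_all('/\.(.+?)(\.|#|$)/', $string, $consuming);

// Lookahead version: the delimiter is only checked, not consumed,
// so the dot after "cat" is still available to start the next match.
preg_match_all('/\.(.+?)(?=\.|#|$)/', $string, $lookahead);

print_r($consuming[1]); // contains only "cat"
print_r($lookahead[1]); // contains "cat" and "dog"
```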

Strange behavior of preg_match_all php

I have a very long string of html. From this string I want to parse pairs of rus and eng names of cities. Example of this string is:
$html = '
Абакан
Хакасия республика
Абан
Красноярский край
Абатский
Тюменская область
';
My code is:
$subject = $this->html;
$pattern = '/<a href="([\/a-zA-Z0-9-"]*)">([а-яА-Я]*)/';
preg_match_all($pattern, $subject, $matches);
To experiment I use RegExr; you can see the pattern here: http://regexr.com/399co
That test uses the global modifier, /g.
Because PHP has no /g modifier, I use the preg_match_all function instead. But the result of preg_match_all is very strange:
Array
(
[0] => Array
(
[0] => <a href="/forecasts5000/russia/republic-khakassia/abakan">Абакан
[1] => <a href="/forecasts5000/russia/krasnoyarsk-territory/aban">Абан
[2] => <a href="/forecasts5000/russia/tyumen-area/abatskij">Аба�
[3] => <a href="/forecasts5000/russia/arkhangelsk-area/abramovskij-ma">Аб�
)
[1] => Array
(
[0] => /forecasts5000/russia/republic-khakassia/abakan
[1] => /forecasts5000/russia/krasnoyarsk-territory/aban
[2] => /forecasts5000/russia/tyumen-area/abatskij
[3] => /forecasts5000/russia/arkhangelsk-area/abramovskij-ma
)
[2] => Array
(
[0] => Абакан
[1] => Абан
[2] => Аба�
[3] => Аб�
)
)
First, it found only the first match (but I need an array with all matches).
Second, the result looks very strange to me. I want to get the following result:
pairs of /forecasts5000/russia/republic-khakassia/abakan and Абакан
What am I doing wrong?
Element 0 of the result is an array of each of the full matches of the regexp. Element 1 is an array of all the matches for capture group 1, element 2 contains capture group 2, and so on.
You can invert this by using the PREG_SET_ORDER flag. Then element 0 will contain all the results from the first match, element 1 will contain all the results from the second match, and so on. Within each of these, [0] will be the full match, and the remaining elements will be the capture groups.
If you use this option, you can then get the information you want with:
foreach ($matches as $match) {
$url = $match[1];
$text = $match[2];
// Do something with $url and $text
}
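A minimal sketch of the PREG_SET_ORDER approach as a complete script. The sample markup is an assumption reconstructed from the URLs in the question's output, and the 'url' and 'name' keys are illustrative. The u modifier is an addition that the answer above does not mention: the truncated names in the question (Аба�) appear to come from PCRE matching bytes rather than characters, and with u the pattern and subject are treated as UTF-8, so a class like [а-яА-Я] matches whole Cyrillic letters.

```php
<?php
// Sample markup assumed for illustration; the URLs come from the question's output.
$html = '<a href="/forecasts5000/russia/republic-khakassia/abakan">Абакан</a>
<a href="/forecasts5000/russia/tyumen-area/abatskij">Абатский</a>';

// The quote character is left out of the first class here; the u modifier enables UTF-8 mode.
$pattern = '/<a href="([\/a-zA-Z0-9-]*)">([а-яА-Я]*)/u';

// PREG_SET_ORDER: one subarray per match, holding the full match and its groups.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $match) {
    $pairs[] = ['url' => $match[1], 'name' => $match[2]];
}

print_r($pairs);
```

This assumes the source file and the HTML are both encoded as UTF-8.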
You can also use the T-Regx library, which has separate methods for each case:
pattern('<a href="([/a-zA-Z0-9-"]*)">([а-яА-Я]*)')
->match($this->html)
->forEach(function (Match $match) {
// Use a separate variable for the text so that $match is not overwritten before group() is called
$text = $match->text();
$group = $match->group(1);
echo "Match $text with group $group";
});
It also has automatic delimiters.

capturing group under capturing group?

Is it possible to nest capturing groups under a capturing group, so that I can get an array like this?
regex = (asd1).(lol1),(asd2).(lol2)
string = asd1.lol1,asd2.lol2
return_array[0]=>group[0]='asd1';
return_array[0]=>group[1]='lol1';
return_array[1]=>group[0]='asd2';
return_array[1]=>group[1]='lol2';
While using regular expressions can get what you want, you could also use strtok() to iterate through what seems to simply be comma separated sets:
$results = array();
$str = 'asd1.lol1,asd2.lol2';
$token = strtok($str, ',');
while ($token !== false) {
$results[] = explode('.', $token, 2);
$token = strtok(',');
}
Output:
Array
(
[0] => Array
(
[0] => asd1
[1] => lol1
)
[1] => Array
(
[0] => asd2
[1] => lol2
)
)
With regular expressions your pattern needs to only include the two terms surrounding a period, i.e.:
$pattern = '/(?<=^|,)(\w+)\.(\w+)/';
preg_match_all($pattern, $str, $result, PREG_SET_ORDER);
The (?<=^|,) is a look-behind assertion; it makes sure to only match what comes after if preceded by either the start of your search string or a comma, but it doesn't "consume" anything.
Output:
Array
(
[0] => Array
(
[0] => asd1.lol1
[1] => asd1
[2] => lol1
)
[1] => Array
(
[0] => asd2.lol2
[1] => asd2
[2] => lol2
)
)
You're probably looking for preg_match_all.
$regex = '/^((\w+)\.(\w+)),((\w+)\.(\w+))$/';
$string = 'asd1.lol1,asd2.lol2';
preg_match_all($regex, $string, $matches);
This function will create a 2-dimensional array, where the first dimension represents the matched groups (i.e. the parentheses; index 0 contains the whole matched string, though) and each entry holds a subarray with one element per match found (only 1 in this case).
[0] => ("asd1.lol1,asd2.lol2") // a view of $matches
[1] => ("asd1.lol1")
[2] => ("asd1")
[3] => ("lol1")
[4] => ("asd2.lol2")
[5] => ("asd2")
[6] => ("lol2")
Your best bet for getting nested groups is to take the entries you want from the first dimension of the array and then process them further, i.e. get "asd1.lol1" and "asd2.lol2" from $matches[1] and $matches[4] and then split these into asd1 and lol1, and asd2 and lol2.
You wouldn't need as many parentheses in your first run:
$regex = '/^(\w+\.\w+),(\w+\.\w+)$/';
will yield:
[0] => ("asd1.lol1,asd2.lol2")
[1] => ("asd1.lol1")
[2] => ("asd2.lol2")
Then you can split the values in $matches[1] and $matches[2] into more granular values.
Flags can be passed to preg_match_all to order the output differently. In particular, PREG_SET_ORDER puts everything belonging to one match into the same subarray. This is of little importance if you're only processing one string, but if you're matching a pattern repeatedly in a text, it might be more convenient to have all info about the first match in $matches[0], the second in $matches[1], and so forth.
Note that if you're just separating a string by comma and then by any periods, you might not need regular expressions and could conveniently use explode() as so:
$string = 'asd1.lol1,asd2.lol2';
$matches = explode(',', $string);
foreach($matches as &$match) {
$match = explode('.', $match);
}
unset($match); // break the reference left over from the loop
This will give you exactly what you want, but do note that you don't have as much control over the process as with regular expressions – for instance, asd1.lol1.lmao,asd2.lol2.rofl.hehe will also work and they'll produce bigger arrays than you may want. You can check with count() on the size of the subarray and handle the cases when the array isn't of the appropriate size, though. I still believe that's more comfortable than using regular expressions.
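The count() check mentioned above could look roughly like this. It is only a sketch: parsePairs is a hypothetical helper name, and silently skipping malformed items is one possible policy among several (throwing an exception would be another).

```php
<?php
// Hypothetical helper: split "a.b,c.d" into [[a, b], [c, d]],
// skipping any item that does not consist of exactly two parts.
function parsePairs(string $string): array
{
    $pairs = [];
    foreach (explode(',', $string) as $item) {
        $parts = explode('.', $item);
        if (count($parts) !== 2) {
            continue; // wrong number of parts, e.g. "asd2.lol2.rofl"
        }
        $pairs[] = $parts;
    }
    return $pairs;
}

print_r(parsePairs('asd1.lol1,asd2.lol2.rofl,asd3.lol3'));
```

Building a new array instead of modifying the input by reference also sidesteps the dangling reference that a foreach with &$match leaves behind.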

Regular Expression with wordpress shortcodes

I'm trying to find all shortcodes within a string which looks like this:
 [a_col] One
 [/a_col]
outside
[b_col]
Two
[/b_col] [c_col] Three [/c_col]
I need the content (eg "Three") and the letter from the col (a, b or c)
Here's the expression I'm using
preg_match_all('#\[(a|b|c)_col\](.*)\[\/\1_col\]#m', $string, $hits);
but $hits contains only the last one.
The content can have any character even "[" or "]"
EDIT:
I would like to get "outside" as well which can be any string (except these cols). How can I handle that or should I parse this in a second step?
This will capture anything in the content, as well as attributes, and will allow any characters in the content.
<?php
$input = '[a_col some="thing"] One[/a_col]
[b_col] Two [/b_col]
[c_col] [Three] [/c_col] ';
preg_match_all('#\[(a|b|c)_col([^\[]*)\](.*?)\[\/\1_col\]#msi', $input, $matches);
print_r($matches);
?>
EDIT:
You may want to then trim the matches, since it appears there may be some whitespace. Alternatively, you can use regex for removing the whitespace in the content:
preg_match_all('#\[(a|b|c)_col([^\[]*)\]\s*(.*?)\s*\[\/\1_col\]#msi', $input, $matches);
OUTPUT:
Array
(
[0] => Array
(
[0] => [a_col some="thing"] One[/a_col]
[1] => [b_col] Two [/b_col]
[2] => [c_col] [Three] [/c_col]
)
[1] => Array
(
[0] => a
[1] => b
[2] => c
)
[2] => Array
(
[0] => some="thing"
[1] =>
[2] =>
)
[3] => Array
(
[0] => One
[1] => Two
[2] => [Three]
)
)
It might also be helpful to use this for capturing the attribute names and values stored in $matches[2]. Consider $atts to be the first element in $matches[2]. Of course, you would iterate over the array of attributes and perform this on each.
preg_match_all('#([^="\'\s]+)[\t ]*=[\t ]*("|\')(.*?)\2#', $atts, $att_matches);
This gives an array where the names are stored in $att_matches[1] and their corresponding values are stored in $att_matches[3].
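Putting the two patterns together, the iteration over the attributes might be sketched as follows. PREG_SET_ORDER is used here so that each shortcode arrives as one subarray; the $shortcodes structure with its 'col', 'atts' and 'content' keys is an assumption made for this example, as is the second attribute in the sample input.

```php
<?php
$input = '[a_col some="thing" other=\'x y\'] One[/a_col]
[b_col] Two [/b_col]';

$shortcodePattern = '#\[(a|b|c)_col([^\[]*)\]\s*(.*?)\s*\[\/\1_col\]#msi';
$attributePattern = '#([^="\'\s]+)[\t ]*=[\t ]*("|\')(.*?)\2#';

preg_match_all($shortcodePattern, $input, $matches, PREG_SET_ORDER);

$shortcodes = [];
foreach ($matches as $m) {
    // $m[2] holds the raw attribute string of this shortcode (possibly empty).
    $atts = [];
    preg_match_all($attributePattern, $m[2], $attMatches, PREG_SET_ORDER);
    foreach ($attMatches as $a) {
        $atts[$a[1]] = $a[3]; // name => value
    }
    $shortcodes[] = ['col' => $m[1], 'atts' => $atts, 'content' => $m[3]];
}

print_r($shortcodes);
```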
Use ((.|\n)*) instead of (.*) to capture across multiple lines (by default, . does not match a newline):
<?php
$string = "
[a_col] One
[/a_col]
[b_col]
Two
[/b_col] [c_col] Three [/c_col]";
preg_match_all('#\[(a|b|c)_col\]((.|\n)*)\[\/\1_col\]#m', $string, $hits);
echo "<textarea style='width:90%;height:90%;'>";
print_r($hits);
echo "</textarea>";
?>
I don't have an environment I can test with here, but you could use lookbehind and lookahead assertions together with a backreference to match the tags around the content. Something like this:
(?<=\[(\w)\]).*(?=\[\/\1\])

Get position of all matches in group

Consider the following example:
$target = 'Xa,a,aX';
$pattern = '/X((a),?)*X/';
$matches = array();
preg_match_all($pattern,$target,$matches,PREG_OFFSET_CAPTURE|PREG_PATTERN_ORDER);
var_dump($matches);
It returns only the last 'a' in the series, but what I need is all the 'a's.
In particular, I need the position of each of the 'a's inside the string separately, hence PREG_OFFSET_CAPTURE.
The example is much more complex, see the related question: pattern matching an array, not their elements per se
Thanks
You get a single match because the regex X((a),?)*X matches the entire string. Only the last iteration of ((a),?) is kept in the capture groups.
What you want to match is an a that has an X before it (and the start of the string), has a comma ahead of it, or has an X ahead of it (and the end of the string).
$target = 'Xa,a,aX';
$pattern = '/(?<=^X)a|a(?=X$|,)/';
preg_match_all($pattern, $target, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => Array
(
[0] => a
[1] => 1
)
[1] => Array
(
[0] => a
[1] => 3
)
[2] => Array
(
[0] => a
[1] => 5
)
)
)
When your regex includes X, it matches once. It finds one large match with groups in it. What you want is many matches, each with its own position.
So, in my opinion, the best you can do is simply search for /a/ or /a,?/ without any X. Then $matches[0] will contain all occurrences of 'a'.
If you need them between X, pre-select this part of the string.
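One way to combine the pre-selection with the offsets is sketched below. The outer pattern /X((?:a,?)*)X/ is an assumption about what "between X" means for this input; the offset of the inner part is added back so that the positions refer to the original string.

```php
<?php
$target = 'Xa,a,aX';
$positions = [];

// Step 1: pre-select the part between the X's and remember where it starts.
if (preg_match('/X((?:a,?)*)X/', $target, $outer, PREG_OFFSET_CAPTURE)) {
    [$inner, $innerOffset] = $outer[1];

    // Step 2: every 'a' is now its own match, each with its own offset.
    preg_match_all('/a/', $inner, $m, PREG_OFFSET_CAPTURE);
    foreach ($m[0] as [$text, $offset]) {
        $positions[] = $innerOffset + $offset; // position in $target
    }
}

print_r($positions); // 1, 3, 5 for this input
```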
