Strange behavior of preg_match_all php - php

I have a very long HTML string. From it I want to parse pairs of Russian and English city names. An example of the string:
$html = '
Абакан
Хакасия республика
Абан
Красноярский край
Абатский
Тюменская область
';
My code is:
$subject = $this->html;
$pattern = '/<a href="([\/a-zA-Z0-9-"]*)">([а-яА-Я]*)/';
preg_match_all($pattern, $subject, $matches);
To test the pattern I used RegExr. You can see it here: http://regexr.com/399co
The test there uses the global modifier, /g.
Because PHP has no /g modifier, I use the preg_match_all function. But the result of preg_match_all is very strange:
Array
(
[0] => Array
(
[0] => <a href="/forecasts5000/russia/republic-khakassia/abakan">Абакан
[1] => <a href="/forecasts5000/russia/krasnoyarsk-territory/aban">Абан
[2] => <a href="/forecasts5000/russia/tyumen-area/abatskij">Аба�
[3] => <a href="/forecasts5000/russia/arkhangelsk-area/abramovskij-ma">Аб�
)
[1] => Array
(
[0] => /forecasts5000/russia/republic-khakassia/abakan
[1] => /forecasts5000/russia/krasnoyarsk-territory/aban
[2] => /forecasts5000/russia/tyumen-area/abatskij
[3] => /forecasts5000/russia/arkhangelsk-area/abramovskij-ma
)
[2] => Array
(
[0] => Абакан
[1] => Абан
[2] => Аба�
[3] => Аб�
)
)
First of all, it found only the first match (but I need an array with all matches).
Second, the result looks very strange to me. I want to get the following result:
pairs of /forecasts5000/russia/republic-khakassia/abakan and Абакан
What am I doing wrong?

Element 0 of the result is an array of each of the full matches of the regexp. Element 1 is an array of all the matches for capture group 1, element 2 contains capture group 2, and so on.
You can invert this by using the PREG_SET_ORDER flag. Then element 0 will contain all the results from the first match, element 1 will contain all the results from the second match, and so on. Within each of these, [0] will be the full match, and the remaining elements will be the capture groups.
If you use this option, you can then get the information you want with:
foreach ($matches as $match) {
$url = $match[1];
$text = $match[2];
// Do something with $url and $text
}
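A minimal sketch of the PREG_SET_ORDER approach, not a definitive fix. The sample markup and the simplified pattern are assumptions, since the question does not show the real page structure. The `u` modifier is also an addition: it assumes the input is UTF-8. Without it, PCRE treats the Cyrillic character class as a set of single bytes, which is consistent with names being cut off in the middle of a character in the output above.

```php
<?php
// Hypothetical sample markup; the real page structure is not shown in the question.
$html = '<a href="/forecasts5000/russia/republic-khakassia/abakan">Абакан</a>'
      . '<a href="/forecasts5000/russia/tyumen-area/abatskij">Абатский</a>';

// Simplified pattern (assumption). The "u" modifier makes PCRE read
// both the pattern and the subject as UTF-8 instead of raw bytes.
$pattern = '/<a href="([^"]*)">([а-яА-Я]+)/u';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $match) {
    // $match[0] is the full match, [1] the URL, [2] the link text.
    $pairs[$match[1]] = $match[2];
}
print_r($pairs);
```

Keying the result by URL gives the URL/name pairs directly; a list of two-element arrays would work just as well if URLs can repeat.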

You can also use the T-Regx library, which has separate methods for each case :)
pattern('<a href="([/a-zA-Z0-9-"]*)">([а-яА-Я]*)')
->match($this->html)
->forEach(function (Match $match) {
// Keep the Match object intact; store the matched text separately.
$text = $match->text();
$group = $match->group(1);
echo "Match $text with group $group";
});
It also has automatic delimiters.

Related

capturing group under capturing group?

Is it possible to nest a capturing group inside another capturing group, so that I get an array like this?
regex = (asd1).(lol1),(asd2).(asd2)
string = asd1.lol1,asd2.lol2
return_array[0]=>group[0]='asd1';
return_array[0]=>group[1]='lol1';
return_array[1]=>group[0]='asd2';
return_array[1]=>group[1]='lol2';
While regular expressions can get you what you want, you could also use strtok() to iterate through what seem to be simply comma-separated sets:
$results = array();
$str = 'asd1.lol1,asd2.lol2';
$token = strtok($str, ',');
while ($token !== false) {
$results[] = explode('.', $token, 2);
$token = strtok(',');
}
Output:
Array
(
[0] => Array
(
[0] => asd1
[1] => lol1
)
[1] => Array
(
[0] => asd2
[1] => lol2
)
)
With regular expressions your pattern needs to only include the two terms surrounding a period, i.e.:
$pattern = '/(?<=^|,)(\w+)\.(\w+)/';
preg_match_all($pattern, $str, $result, PREG_SET_ORDER);
The (?<=^|,) is a look-behind assertion; it makes sure to only match what comes after if preceded by either the start of your search string or a comma, but it doesn't "consume" anything.
Output:
Array
(
[0] => Array
(
[0] => asd1.lol1
[1] => asd1
[2] => lol1
)
[1] => Array
(
[0] => asd2.lol2
[1] => asd2
[2] => lol2
)
)
You're probably looking for preg_match_all.
$regex = '/^((\w+)\.(\w+)),((\w+)\.(\w+))$/';
$string = 'asd1.lol1,asd2.lol2';
preg_match_all($regex, $string, $matches);
This function creates a two-dimensional array. The first dimension is indexed by capture group (the parentheses; index 0 holds the whole matched string), and each entry is a subarray with one element per match (only one in this case).
[0] => ("asd1.lol1,asd2.lol2") // a view of $matches
[1] => ("asd1.lol1")
[2] => ("asd1")
[3] => ("lol1")
[4] => ("asd2.lol2")
[5] => ("asd2")
[6] => ("lol2")
Your best bet for getting groups is to pick the elements you want from the first dimension and process them further, i.e. take "asd1.lol1" and "asd2.lol2" from indexes 1 and 4 and then split each of them into asd1 and lol1, and so on.
You wouldn't need as many parentheses in your first run:
$regex = '/^(\w+\.\w+),(\w+\.\w+)$/';
will yield:
[0] => ("asd1.lol1,asd2.lol2")
[1] => ("asd1.lol1")
[2] => ("asd2.lol2")
Then you can split the strings at indexes 1 and 2 into more granular values.
Flags can be passed to preg_match_all to order the output differently. In particular, PREG_SET_ORDER puts all the data for one match in the same subarray. This is of little importance if you're only processing one string, but if you're matching a pattern repeatedly in a text, it can be more convenient to have all the information about the first match in $matches[0], the second in $matches[1], and so forth.
Note that if you're just separating a string by commas and then by periods, you might not need regular expressions at all and could conveniently use explode() like so:
$string = 'asd1.lol1,asd2.lol2';
$matches = explode(',', $string);
foreach($matches as &$match) {
$match = explode('.', $match);
}
This gives you exactly what you asked for, but note that you have less control over the process than with regular expressions: for instance, asd1.lol1.lmao,asd2.lol2.rofl.hehe will also be accepted and will produce bigger arrays than you may want. You can check the size of each subarray with count() and handle the cases where it isn't the expected size, though. I still believe that's more comfortable than using regular expressions.
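The count() check mentioned above could look roughly like this. The input string and the `$rejected` list are hypothetical, added only to show one way of handling entries that don't have exactly two parts.

```php
<?php
// Hypothetical input with one malformed entry (three parts instead of two).
$string = 'asd1.lol1,asd2.lol2.rofl,asd3.lol3';

$pairs = [];
$rejected = [];
foreach (explode(',', $string) as $entry) {
    $parts = explode('.', $entry);
    if (count($parts) === 2) {
        // Well-formed "left.right" entry.
        $pairs[] = $parts;
    } else {
        // Keep the raw entry so the caller can decide what to do with it.
        $rejected[] = $entry;
    }
}
print_r($pairs);
print_r($rejected);
```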

How to get a particular string using preg_replace?

I want to get a particular value from a string in PHP. This is the string:
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_replace('/(.*)\[(.*)\](.*)\[(.*)\](.*)/', '$2', $string);
I want to get the value of data01, i.e. [1,2].
How can I achieve this using preg_replace?
preg_replace() is the wrong tool. I have used preg_match_all() in case you need the other item later, and trimmed your regex down so it captures only the part of the string you are looking for.
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_match_all('/\[([0-9,]+)\]/',$string,$match);
print_r($match);
/*
print_r($match) output:
Array
(
[0] => Array
(
[0] => [1,2]
[1] => [2,3]
)
[1] => Array
(
[0] => 1,2
[1] => 2,3
)
)
*/
echo "Your match: " . $match[1][0];
This gives you both the full matched pattern and the captured characters, so you can have either [1,2] or just 1,2.
preg_replace is used to replace by regular expression!
I think you want to use preg_match_all() to get each data attribute from the string.
The regex you want is:
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_match_all('#data[0-9]{2}=(\[[0-9,]+\])#',$string,$matches);
print_r($matches);
Array
(
[0] => Array
(
[0] => data01=[1,2]
[1] => data02=[2,3]
)
[1] => Array
(
[0] => [1,2]
[1] => [2,3]
)
)
I have tested this as working.
preg_replace is for replacing stuff. preg_match is for extracting stuff.
So you want:
preg_match('/(.*?)\[(.*?)\](.*?)\[(.*?)\](.*)/', $string, $match);
var_dump($match);
See what you get, and work from there.
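If only data01 is needed, a sketch along these lines anchors the pattern on the key name. The variable names `$withBrackets` and `$numbers` are illustrative, and the pattern assumes the value contains only digits and commas, as in the example string.

```php
<?php
$string = 'users://data01=[1,2]/data02=[2,3]/*';

$withBrackets = null;
$numbers = [];

// Anchoring on "data01=" means the data02 value is never captured.
if (preg_match('/data01=(\[[0-9,]+\])/', $string, $match) === 1) {
    $withBrackets = $match[1];                           // value including brackets
    $numbers = explode(',', trim($match[1], '[]'));      // individual items as strings
}

var_dump($withBrackets, $numbers);
```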

Finding the no of occurence of a string inside another string using regex in PHP?

I want to find the number of occurrences of a (pattern-based) substring inside another string.
For example:
$mystring = "|graboard='KERALA'||graboarded='KUSAT'||graboard='MG'";
I want to find the number of graboards present in $mystring.
I am using a regex for this, but how do I find the number of occurrences?
If you must use a regex, preg_match_all() returns the number of matches.
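A short sketch of using that return value directly. The pattern is an assumption about what should count: including the `=` means `graboarded=` is not counted as a `graboard`.

```php
<?php
$mystring = "|graboard='KERALA'||graboarded='KUSAT'||graboard='MG'";

// preg_match_all() returns the number of full matches, or false on error.
// The $matches argument can be omitted when only the count is needed.
$count = preg_match_all('/graboard=/', $mystring);
if ($count === false) {
    $count = 0; // treat a regex error as "no matches" in this sketch
}

echo $count;
```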
Use preg_match_all:
$mystring = "|graboard='KERALA'||graboarded='KUSAT'||graboard='MG'";
preg_match_all("/(graboard)='(.+?)'/i", $mystring, $matches);
print_r($matches);
will yield:
Array
(
[0] => Array
(
[0] => graboard='KERALA'
[1] => graboard='MG'
)
[1] => Array
(
[0] => graboard
[1] => graboard
)
[2] => Array
(
[0] => KERALA
[1] => MG
)
)
You can then use count($matches[1]). This is just a basic example, and the regex may need to be modified to suit your needs.
Just use preg_match_all():
// The string.
$mystring="|graboard='KERALA'||graboarded='KUSAT'||graboard='MG'";
// The `preg_match_all()`.
preg_match_all('/graboard/is', $mystring, $matches);
// Echo the count of `$matches` generated by `preg_match_all()`.
echo count($matches[0]);
// Dumping the content of `$matches` for verification.
echo '<pre>';
print_r($matches);
echo '</pre>';

Pattern for preg_match

I have a string that contains the pattern "[link:activate/$id/$test_code]". I need to get the word activate, $id and $test_code out of it wherever the pattern [link.....] occurs.
I tried getting the inner items with grouping, but I only get activate and $test_code; I couldn't get $id. Please help me get the action name and all the parameters in an array.
Below are my code and its output.
Code
function match_test()
{
$string = "Sample string contains [link:activate/\$id/\$test_code] again [link:anotheraction/\$key/\$second_param]]] also how the other ationc like [link:action] works";
$pattern = '/\[link:([a-z\_]+)(\/\$[a-z\_]+)+\]/i';
preg_match_all($pattern,$string,$matches);
print_r($matches);
}
Output
Array
(
[0] => Array
(
[0] => [link:activate/$id/$test_code]
[1] => [link:anotheraction/$key/$second_param]
)
[1] => Array
(
[0] => activate
[1] => anotheraction
)
[2] => Array
(
[0] => /$test_code
[1] => /$second_param
)
)
Try this:
$subject = <<<'LOD'
Sample string contains [link:activate/$id/$test_code] again [link:anotheraction/$key/$second_param]]] also how the other ationc like [link:action] works
LOD;
$pattern = '~\[link:([a-z_]+)((?:/\$[a-z_]+)*)]~i';
preg_match_all($pattern, $subject, $matches);
print_r($matches);
If you need \$id and \$test_code in separate groups, you can use this instead:
$pattern = '~\[link:([a-z_]+)(/\$[a-z_]+)?(/\$[a-z_]+)?]~i';
Is this what you are looking for?
/\[link:([\w\d]+)\/(\$[\w\d]+)\/(\$[\w\d]+)\]/
Edit:
Also the problem with your expression is this part:
(\/\$[a-z\_]+)+
Although the group is repeated, the match returns only one value for it (the last repetition), because it is still a single group declaration in the pattern. The regex engine won't invent additional group numbers for you (not that I've ever seen, anyway).
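One way around the single-group limitation, sketched here under the assumption that the number of parameters is not fixed: capture the whole parameter list as one string and split it afterwards. The shortened subject string and the `$links` structure are illustrative only.

```php
<?php
// Single quotes keep "$id" etc. as literal text.
$subject = 'Sample [link:activate/$id/$test_code] again '
         . '[link:anotheraction/$key/$second_param] and [link:action]';

// Group 1: action name. Group 2: the entire "/$a/$b..." list as one string.
$pattern = '~\[link:([a-z_]+)((?:/\$[a-z_]+)*)\]~i';
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

$links = [];
foreach ($matches as $m) {
    // Split "/$id/$test_code" into individual parameters; no list means no params.
    $params = $m[2] === '' ? [] : explode('/', ltrim($m[2], '/'));
    $links[] = ['action' => $m[1], 'params' => $params];
}
print_r($links);
```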

Nested pattern matching with preg_match_all (Regex and PHP)

I'm working with text data that contains special flags in the form of "{X}" or "{XX}" where X could be any alphanumeric character. Special meaning is assigned to these flags when they are adjacent or when they are separated. I need a regex which will match adjacent flags AND separate each flag in the group.
For Example, given the following input:
{B}{R}: Target player loses 1 life.
{W}{G}{U}: Target player gains 5 life.
The output should be approximately:
("{B}{R}",
"{W}{G}{U}")
("{B}",
"{R}")
("{W}",
"{G}",
"{U}")
My PHP code is returning the adjacents array properly, but the split array contains only the last matching flag in each group:
$input = '{B}{R}: Target player loses 1 life.
{W}{G}{U}: Target player gains 5 life.';
$pattern = '#((\{[a-zA-Z0-9]{1,2}})+)#';
preg_match_all($pattern, $input, $results);
print_r($results);
Output:
Array
(
[0] => Array
(
[0] => {B}{R}
[1] => {W}{G}{U}
)
[1] => Array
(
[0] => {B}{R}
[1] => {W}{G}{U}
)
[2] => Array
(
[0] => {R}
[1] => {U}
)
)
Thanks for any help!
unset($results[1]);
foreach($results[0] AS $match){
preg_match_all('/\{[a-zA-Z0-9]{1,2}}/', $match, $r);
$results[] = $r[0];
}
That's the only way I know of to create the data structure you require, though preg_split would work as well:
unset($results[1]);
foreach($results[0] AS $match)
$results[] = preg_split('/(?<=})(?=\{)/', $match);
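For reference, a self-contained sketch of the same two-pass idea that keeps the adjacent groups and the individual flags in separate variables. The names `$groups` and `$flags` are illustrative, not part of the original code.

```php
<?php
$input = "{B}{R}: Target player loses 1 life.\n"
       . "{W}{G}{U}: Target player gains 5 life.";

// Pass 1: find each run of adjacent flags.
preg_match_all('/(?:\{[a-zA-Z0-9]{1,2}\})+/', $input, $runs);
$groups = $runs[0];

// Pass 2: split every run into its individual flags.
$flags = [];
foreach ($groups as $group) {
    preg_match_all('/\{[a-zA-Z0-9]{1,2}\}/', $group, $r);
    $flags[] = $r[0];
}

print_r($groups);
print_r($flags);
```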
