PHP Understanding callbacks - difference between preg_replace_callback() and preg_match_all() - php

I am a bit confused on the use of preg_replace_callback()
I have a $content with some URLs inside .
Previously I used
$content = preg_match_all( '/(http[s]?:[^\s]*)/i', $content, $links );
foreach ($links[1] as $link ) {
// we have the link. find image , download, replace the content with image
// echo '</br>LINK : '. $link;
$url = esc_url_raw( $link );
$url_name = parse_url($url);
$url_name = $description = $url_name['host'];// get rid of http://..
$url = 'http://somescriptonsite/v1/' . urlencode($url) . '?w=' . $width ;
}
return $url;
But what I really need is to REPLACE the original URL with my parsed URL...
So I tried the preg_replace_callback:
function o99_simple_parse($content){
$content = preg_replace_callback( '/(http[s]?:[^\s]*)/i', 'o99_simple_callback', $content );
return $content;
}
and :
function o99_simple_callback($url){
// how to get the URL which is actually the match? and width ??
$url = esc_url_raw( $link );
$url_name = parse_url($url);
$url_name = $description = $url_name['host'];// get rid of http://..
$url = 'http://something' . urlencode($url) . '?w=' . $width ;
return $url; // what i really need to replace
}
I assumed the callback works such that EVERY match calls the callback (recursively?) and gets back a result, which would let me replace the URLs in $content on the fly with the parsed $url from o99_simple_callback().
But another question here (and especially this comment) raised doubts.
If preg_replace_callback() actually passes the whole array of matches, then what is the difference between what I used previously (preg_match_all() in the first example) and the callback example?
What am I missing or misunderstanding?
What would be the correct way to replace the URLs found in $content with the parsed URLs?

The other answers may have been sufficient, but let me give you one more take using a simpler example.
Let's say we have the following data in $subject,
RECORD Male 1987-11-29 New York
RECORD Female 1987-07-13 Tennessee
RECORD Female 1990-04-14 New York
and the following regular expression in $pattern,
/RECORD (Male|Female) (\d\d\d\d)-(\d\d)-(\d\d) ([\w ]+)/
Let's compare three approaches.
preg_match_all
First, the vanilla preg_match_all:
preg_match_all($pattern, $subject, $matches);
Here's what $matches comes out to be:
Array
(
[0] => Array
(
[0] => RECORD Male 1987-11-29 New York
[1] => RECORD Female 1987-07-13 Tennessee
[2] => RECORD Female 1990-04-14 New York
)
[1] => Array
(
[0] => Male
[1] => Female
[2] => Female
)
[2] => Array
(
[0] => 1987
[1] => 1987
[2] => 1990
)
[3] => Array
(
[0] => 11
[1] => 07
[2] => 04
)
[4] => Array
(
[0] => 29
[1] => 13
[2] => 14
)
[5] => Array
(
[0] => New York
[1] => Tennessee
[2] => New York
)
)
Whether we're talking about the gender field in my example or the URL field in your example, it's clear that looping through $matches[1] iterates through just that field:
foreach ($matches[1] as $match)
{
$gender = $match;
// ...
}
However, as you noticed, changes you make to $matches[1], even if you iterated through its subarrays by reference, do not reflect in $subject, i.e. you cannot perform replacements via preg_match_all.
preg_match_all with PREG_SET_ORDER
Before we jump into preg_replace_callback though, let's take a look at one of preg_match_all's commonly used flags, PREG_SET_ORDER.
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
This outputs something (seemingly) completely different!
Array
(
[0] => Array
(
[0] => RECORD Male 1987-11-29 New York
[1] => Male
[2] => 1987
[3] => 11
[4] => 29
[5] => New York
)
[1] => Array
(
[0] => RECORD Female 1987-07-13 Tennessee
[1] => Female
[2] => 1987
[3] => 07
[4] => 13
[5] => Tennessee
)
[2] => Array
(
[0] => RECORD Female 1990-04-14 New York
[1] => Female
[2] => 1990
[3] => 04
[4] => 14
[5] => New York
)
)
Now, each subarray contains the set of capture groups per match, as opposed to the set of matches, per capture group. (In yet other words, this is the transpose of the other array.) If you wanted to play with the gender (or URL) of each match, you'd now have to write this:
foreach ($matches as $match)
{
$gender = $match[1];
// ...
}
preg_replace_callback
And that's what preg_replace_callback is like. It calls the callback for each set of matches (that is, including all its capture groups, at once), as if you were using the PREG_SET_ORDER flag. That is, contrast the way preg_replace_callback is used,
preg_replace_callback($pattern, 'my_callback', $subject);
function my_callback($matches)
{
$gender = $matches[1];
// ...
return $gender;
}
to the PREG_SET_ORDER example. Note how the two examples iterate through matches in exactly the same way, the only difference being that preg_replace_callback gives you an opportunity to return a value for replacement.
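To make the comparison concrete, here is a minimal runnable sketch using the same $subject and $pattern. The callback name reformat_record and the output format are made up for illustration; they are not part of the original answer.

```php
<?php
$subject = "RECORD Male 1987-11-29 New York\n"
         . "RECORD Female 1987-07-13 Tennessee\n"
         . "RECORD Female 1990-04-14 New York";
$pattern = '/RECORD (Male|Female) (\d\d\d\d)-(\d\d)-(\d\d) ([\w ]+)/';

// Hypothetical callback: it receives one match at a time, laid out like a
// single PREG_SET_ORDER entry ($m[0] = full match, $m[1..5] = capture groups).
function reformat_record(array $m): string
{
    // Whatever is returned here replaces $m[0] in the subject.
    return "{$m[5]}: {$m[1]}, born {$m[4]}/{$m[3]}/{$m[2]}";
}

$result = preg_replace_callback($pattern, 'reformat_record', $subject);
echo $result, "\n";
```

Each record line is rewritten in place, which is exactly the step that preg_match_all alone cannot do.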

It does not pass all matches, but invokes the callback for each match. The callback won't receive a single string parameter, but a list of strings. $match[0] is the whole match, and $match[1] the first capture group (what's in your regex between the first parens).
So this is how your callback should look:
function o99_simple_callback($match){
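$url = $match[1];
//$url = esc_url_raw( $link );
$url_name = parse_url($url);
$url_name = $description = $url_name['host'];// get rid of http://..
// note: $width is not defined inside this function; it has to be passed in, e.g. via a closure with use ($width)
$url = 'http://something' . urlencode($url) . '?w=' . $width ;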
return $url; // what i really need to replace
}
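One open point in the question is how to get $width into the callback. A possible approach, sketched here under assumptions, is an anonymous function that imports the variable with use. The function name rewrite_urls, the base URL and the default width are placeholders, and the WordPress-specific esc_url_raw() call is left out so the sketch runs on plain PHP.

```php
<?php
// Hypothetical helper: base URL and default width are placeholders,
// not values taken from the original question.
function rewrite_urls(string $content, int $width = 300): string
{
    $result = preg_replace_callback(
        '/https?:[^\s]*/i',
        // "use ($width)" copies $width into the closure's scope.
        function (array $match) use ($width): string {
            $url = $match[0]; // the whole matched URL
            return 'http://something/' . urlencode($url) . '?w=' . $width;
        },
        $content
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $content;
}

echo rewrite_urls('see http://example.com/a.png now', 120), "\n";
```

The closure is called once per URL found, and its return value is substituted for that URL only.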
Please also see the manual examples on preg_replace_callback

preg_replace_callback
Using preg_replace_callback() to Replace Patterns
Generating replacement strings with a callback function
Generating replacement strings with an anonymous function

Related

Flatten array of regular expressions

I have an array of regular expressions -$toks:
Array
(
[0] => /(?=\D*\d)/
[1] => /\b(waiting)\b/i
[2] => /^(\w+)/
[3] => /\b(responce)\b/i
[4] => /\b(from)\b/i
[5] => /\|/
[6] => /\b(to)\b/i
)
When I'm trying to flatten it:
$patterns_flattened = implode('|', $toks);
I get a regex:
/(?=\D*\d)/|/\b(waiting)\b/i|/^(\w+)/|/\b(responce)\b/i|/\b(from)\b/i|/\|/|/\b(to)\b/i
When I'm trying to:
if (preg_match('/'. $patterns_flattened .'/', "I'm waiting for a response from", $matches)) {
print_r($matches);
}
I get an error:
Warning: preg_match(): Unknown modifier '(' in ...index.php on line
Where is my mistake?
Thanks.
You need to remove the opening and closing slashes, like this:
$toks = [
'(?=\D*\d)',
'\b(waiting)\b',
'^(\w+)',
'\b(response)\b',
'\b(from)\b',
'\|',
'\b(to)\b',
];
And then, I think you'll want to use preg_match_all instead of preg_match:
$patterns_flattened = implode('|', $toks);
if (preg_match_all("/$patterns_flattened/i", "I'm waiting for a response from", $matches)) {
print_r($matches[0]);
}
Printing $matches[0] instead of the whole $matches array shows only the full match of whichever alternative matched each time:
Array
(
[0] => I
[1] => waiting
[2] => response
[3] => from
)
Try it on 3v4l.org
<?php
$data = Array
(
0 => '/(?=\D*\d)/',
1 => '/\b(waiting)\b/i',
2 => '/^(\w+)/',
3 => '/\b(responce)\b/i',
4 => '/\b(from)\b/i',
5 => '/\|/',
6 => '/\b(to)\b/i'
);
$patterns_flattened = implode('|', $data);
$regex = str_replace("/i",'',$patterns_flattened);
$regex = str_replace('/','',$regex);
if (preg_match_all( '/'.$regex.'/', "I'm waiting for a responce from", $matches)) {
echo '<pre>';
print_r($matches[0]);
}
You have to remove the slashes (the delimiters) from each regex, and also the i modifier, before joining them. That was the reason it was breaking.
A really nice tool for validating your regex is this:
https://regexr.com/
I always use it when I have to write a bigger than usual regular expression.
The output of the above code is :
Array
(
[0] => I
[1] => waiting
[2] => responce
[3] => from
)
There are a few adjustments to make with your $toks array.
To remove the error, you need to remove the pattern delimiters and pattern modifiers from each array element.
None of the capture grouping is necessary, in fact, it will lead to a higher step count and create unnecessary output array bloat.
Whatever your intention is with (?=\D*\d), it needs a rethink. If there is a number anywhere in your input string, you are potentially going to generate lots of empty elements which surely can't have any benefit for your project. Look at what happens when I put a space then 1 after from in your input string.
Here is my recommendation: (PHP Demo)
$toks = [
'\bwaiting\b',
'^\w+',
'\bresponse\b',
'\bfrom\b',
'\|',
'\bto\b',
];
$pattern = '/' . implode('|', $toks) . '/i';
var_export(preg_match_all($pattern, "I'm waiting for a response from", $out) ? $out[0] : null);
Output:
array (
0 => 'I',
1 => 'waiting',
2 => 'response',
3 => 'from',
)

Strange behavior of preg_match_all php

I have a very long string of html. From this string I want to parse pairs of rus and eng names of cities. Example of this string is:
$html = '
Абакан
Хакасия республика
Абан
Красноярский край
Абатский
Тюменская область
';
My code is:
$subject = $this->html;
$pattern = '/<a href="([\/a-zA-Z0-9-"]*)">([а-яА-Я]*)/';
preg_match_all($pattern, $subject, $matches);
For testing I use RegExr. You can see it here: http://regexr.com/399co
The test there uses the global modifier, /g.
Because PHP has no /g modifier, I use the preg_match_all function. But the result of preg_match_all is very strange:
Array
(
[0] => Array
(
[0] => <a href="/forecasts5000/russia/republic-khakassia/abakan">Абакан
[1] => <a href="/forecasts5000/russia/krasnoyarsk-territory/aban">Абан
[2] => <a href="/forecasts5000/russia/tyumen-area/abatskij">Аба�
[3] => <a href="/forecasts5000/russia/arkhangelsk-area/abramovskij-ma">Аб�
)
[1] => Array
(
[0] => /forecasts5000/russia/republic-khakassia/abakan
[1] => /forecasts5000/russia/krasnoyarsk-territory/aban
[2] => /forecasts5000/russia/tyumen-area/abatskij
[3] => /forecasts5000/russia/arkhangelsk-area/abramovskij-ma
)
[2] => Array
(
[0] => Абакан
[1] => Абан
[2] => Аба�
[3] => Аб�
)
)
First of all, it found only the first match (but I need an array with all matches).
Second, the result looks very strange to me. I want to get the following result:
pairs of /forecasts5000/russia/republic-khakassia/abakan and Абакан
What do I do wrong?
Element 0 of the result is an array of each of the full matches of the regexp. Element 1 is an array of all the matches for capture group 1, element 2 contains capture group 2, and so on.
You can invert this by using the PREG_SET_ORDER flag. Then element 0 will contain all the results from the first match, element 1 will contain all the results from the second match, and so on. Within each of these, [0] will be the full match, and the remaining elements will be the capture groups.
If you use this option, you can then get the information you want with:
foreach ($matches as $match) {
$url = $match[1];
$text = $match[2];
// Do something with $url and $text
}
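The truncated names in the question's output (such as Аба�) are what one would expect when a UTF-8 subject is matched without the u modifier, because the Cyrillic character class is then applied byte by byte. The following is a minimal sketch that combines the u modifier with PREG_SET_ORDER. The sample markup is an assumption (the real page structure may differ), and ё/Ё are added to the class because they fall outside the а-я range.

```php
<?php
// Sample markup is assumed for illustration; the real page may differ.
$html = '<a href="/forecasts5000/russia/republic-khakassia/abakan">Абакан</a>'
      . '<a href="/forecasts5000/russia/tyumen-area/abatskij">Абатский</a>';

// The trailing "u" makes PCRE treat both pattern and subject as UTF-8.
$pattern = '/<a href="([\/a-zA-Z0-9-]*)">([а-яА-ЯёЁ]*)/u';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $match) {
    $pairs[$match[1]] = $match[2]; // URL => Russian name
}
print_r($pairs);
```

With the modifier in place, each name is captured whole rather than being cut off in the middle of a multibyte character.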
You can also use the T-Regx library, which has separate methods for each case :)
pattern('<a href="([/a-zA-Z0-9-"]*)">([а-яА-Я]*)')
->match($this->html)
->forEach(function (Match $match) {
$text = $match->text();
$group = $match->group(1);
echo "Match $text with group $group";
});
It also has automatic delimiters.

capturing group under capturing group?

Is it possible to nest capturing groups inside a capturing group, so that I get an array like this?
regex = (asd1).(lol1),(asd2).(asd2)
string = asd1.lol1,asd2.lol2
return_array[0]=>group[0]='asd1';
return_array[0]=>group[1]='lol1';
return_array[1]=>group[0]='asd2';
return_array[1]=>group[1]='lol2';
While using regular expressions can get what you want, you could also use strtok() to iterate through what seems to simply be comma separated sets:
$results = array();
$str = 'asd1.lol1,asd2.lol2';
$token = strtok($str, ',');
while ($token !== false) {
$results[] = explode('.', $token, 2);
$token = strtok(',');
}
Output:
Array
(
[0] => Array
(
[0] => asd1
[1] => lol1
)
[1] => Array
(
[0] => asd2
[1] => lol2
)
)
With regular expressions your pattern needs to only include the two terms surrounding a period, i.e.:
$pattern = '/(?<=^|,)(\w+)\.(\w+)/';
preg_match_all($pattern, $str, $result, PREG_SET_ORDER);
The (?<=^|,) is a look-behind assertion; it makes sure to only match what comes after if preceded by either the start of your search string or a comma, but it doesn't "consume" anything.
Output:
Array
(
[0] => Array
(
[0] => asd1.lol1
[1] => asd1
[2] => lol1
)
[1] => Array
(
[0] => asd2.lol2
[1] => asd2
[2] => lol2
)
)
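If the full match in index 0 is unwanted, it can be dropped from every set to get exactly the two-element groups asked for in the question. This is a small sketch using a simpler pattern without the look-behind; the variable names $sets and $groups are made up for illustration.

```php
<?php
$str = 'asd1.lol1,asd2.lol2';
preg_match_all('/(\w+)\.(\w+)/', $str, $sets, PREG_SET_ORDER);

// Drop element 0 (the full match) from every set, keeping only the groups.
// array_slice() reindexes the remaining elements from 0.
$groups = array_map(function (array $set): array {
    return array_slice($set, 1);
}, $sets);

print_r($groups);
```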
You're probably looking for preg_match_all.
$regex = '/^((\w+)\.(\w+)),((\w+)\.(\w+))$/';
$string = 'asd1.lol1,asd2.lol2';
preg_match_all($regex, $string, $matches);
This function creates a 2-dimensional array: the first dimension indexes the capture groups (the parentheses; index 0 holds the whole matched string), and each entry is a subarray holding that group's text for every match (only one match in this case).
[0] => ("asd1.lol1,asd2.lol2") // a view of $matches
[1] => ("asd1.lol1")
[2] => ("asd1")
[3] => ("lol1")
[4] => ("asd2.lol2")
[5] => ("asd2")
[6] => ("lol2")
Your best bet to have groups is to process the first dimension of the array that you want and to then process them further, i.e. get "asd1.lol1" from 1 and 4 and then process these further into asd1 and lol1.
You wouldn't need as many parentheses in your first run:
$regex = '/^(\w+\.\w+),(\w+\.\w+)$/';
will yield:
[0] => ("asd1.lol1,asd2.lol2")
[1] => ("asd1.lol1")
[2] => ("asd2.lol2")
Then you can split the array in 1 and 2 into more granular values.
Flags can be set to preg_match_all to order the output differently. Particularly, PREG_SET_ORDER allows you to have all matched instances in the same subarray. This is of little importance if you're only processing one string, but if you're matching a pattern in a text, it might be more convenient to have all info about one match in $matches[0], and so forth.
Note that if you're just separating a string by comma and then by any periods, you might not need regular expressions and could conveniently use explode() as so:
$string = 'asd1.lol1,asd2.lol2';
$matches = explode(',', $string);
foreach($matches as &$match) {
$match = explode('.', $match);
}
unset($match); // break the reference left over from the loop
This will give you exactly what you want, but do note that you don't have as much control over the process as with regular expressions – for instance, asd1.lol1.lmao,asd2.lol2.rofl.hehe will also work and they'll produce bigger arrays than you may want. You can check with count() on the size of the subarray and handle the cases when the array isn't of the appropriate size, though. I still believe that's more comfortable than using regular expressions.

multiple patterns with preg_match

My situation is: I'm processing an array word by word. What I'm hoping to do and
working on, is to capture a certain word. But for that I need to test two patterns or more with preg-match.
This is my code :
function search_array($array)
{
$pattern = '[A-Z]{1,3}[0-9]{1,3}[A-Z]{1,2}[0-9]{1,2}[A-Z]?';
$pattern2 = '[A-Z]{1,7}[0-9]{1,2}';
$patterns = array($pattern, $pattern2);
$regex = '/(' .implode('|', $patterns) .')/i';
foreach ($array as $str) {
if (preg_match ($regex, $str, $m)){
$matches[] = $m[1];
return $matches[0];
}
}
}
Example of array I could have :
Array ( [0] => X [1] => XXXXXXX [2] => XXX [3] => XXXX [4] => ABC01DC4 )
Array ( [0] => X [1] => XXXXXXX [2] => XXX [3] => ABCDEF4 [4] => XXXX [5] => XX )
Words I would like to catch :
-In the first array : ABC01DC4
-In the second array : ABCDEF4
The problem is not the pattern itself, it's the syntax to use multiple pattern in the same pregmatch
Your code worked for me, and I didn't find any problem with the code or the regex. The description you provided is also not enough to fully understand your needs.
However, I can guess at one problem from your code: you didn't use anchors (^...$) to force the pattern to match the whole string. As written, your regex also finds a match in inputs such as %ABC01DC4V or ABCDEF4EE. So change this line in your code, adding ^ after the opening delimiter and $ before the closing one:
$regex = '/^(' .implode('|', $patterns) .')$/i';
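Here is a runnable sketch of the anchored version, using the two patterns from the question. The function name find_code, the non-capturing group and the null return for "nothing found" are choices made for this illustration only.

```php
<?php
// Same two patterns as in the question, anchored so the whole word must match.
function find_code(array $words): ?string
{
    $patterns = [
        '[A-Z]{1,3}[0-9]{1,3}[A-Z]{1,2}[0-9]{1,2}[A-Z]?',
        '[A-Z]{1,7}[0-9]{1,2}',
    ];
    // (?: ... ) groups the alternatives so ^ and $ apply to both of them.
    $regex = '/^(?:' . implode('|', $patterns) . ')$/i';

    foreach ($words as $word) {
        if (preg_match($regex, $word)) {
            return $word; // first word that matches either pattern
        }
    }
    return null; // nothing matched
}

var_dump(find_code(['X', 'XXXXXXX', 'XXX', 'XXXX', 'ABC01DC4']));
var_dump(find_code(['X', 'XXXXXXX', 'XXX', 'ABCDEF4', 'XXXX', 'XX']));
```

Without the grouping, the anchors would bind only to the first and last alternative, so `^` would apply to the first pattern and `$` to the second.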

Regex to find sequential integers

I am having a difficult time getting my regular expression code to work properly in PHP. Here is my code:
$array = array(); // Used to satisfy the 3rd argument requirement of preg_match_all.
$regex = '/(012|345|678|987|654|321|123|456|789|876|543|210|234|567|765|432)/';
$subject = '123456';
echo preg_match_all($regex, $subject, $array).'<br />';
print_r($array);
When this code is run, it outputs:
2
Array
(
[0] => Array
(
[0] => 123
[1] => 456
)
[1] => Array
(
[0] => 123
[1] => 456
)
)
What can I do so that it will match 123, 234, 345 and 456?
Thanks in advance!
Regex is not the right tool for this job (it's not going to return "sub-matches"). Simply use strpos in a loop.
$subject = '123456';
$seqs = array('012', '345', '678', '987', '654', '321', '123', '456', '234');
foreach ($seqs as $seq) {
if (strpos($subject, $seq) !== false) {
// found
}
}
$regex = '/(?=(012|345|678|987|654|321|123|456|789|876|543|210|234|567|765|432))/';
$subject = '123456';
preg_match_all($regex, $subject, $array);
print_r($array[1]);
output:
Array
(
[0] => 123
[1] => 234
[2] => 345
[3] => 456
)
You're trying to retrieve matches that overlap each other in the subject string, which in general is not possible. However, in many cases you can fake it by wrapping the whole regex in a capturing group, then wrapping that in a lookahead. Because the lookahead doesn't consume any characters when it matches, the regex engine manually bumps forward one position after each successful match, to avoid getting stuck in an infinite loop. But capturing groups still work, so you can retrieve the captured text in the usual way.
Notice that I only printed the contents of the first capturing group ($array[1]). If I had printed the whole array of arrays ($array), it would have looked like this:
Array
(
[0] => Array
(
[0] =>
[1] =>
[2] =>
[3] =>
)
[1] => Array
(
[0] => 123
[1] => 234
[2] => 345
[3] => 456
)
)
see it in action on ideone
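The same lookahead technique can also report where each overlapping match starts. This is a sketch with a shortened alternation list, using the PREG_OFFSET_CAPTURE flag; the variable names $m and $found are made up for illustration.

```php
<?php
$subject = '123456';
// Lookahead plus capture group: the match itself is zero-width,
// and the text of interest ends up in group 1.
$regex = '/(?=(123|234|345|456))/';

// With PREG_OFFSET_CAPTURE every entry becomes a [text, byte offset] pair.
preg_match_all($regex, $subject, $m, PREG_OFFSET_CAPTURE);

$found = [];
foreach ($m[1] as [$text, $offset]) {
    $found[$offset] = $text; // byte offset => overlapping match
}
print_r($found);
```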
It can be done with regular expressions. The problem with your original code is that once a match occurs, its characters are consumed and the search resumes after them, so overlapping matches are never tried. Here's one way to do it:
$array = array(); // Used to satisfy the 3rd argument requirement of preg_match_all.
$regex = '/012|345|678|987|654|321|123|456|789|876|543|210|234|567|765|432/';
$subject = '123456';
$tempSubject = $subject;
$finalAnswer = array();
do {
$matched = preg_match($regex, $tempSubject, $array);
$finalAnswer = array_merge($finalAnswer, $array);
$tempSubject = substr($tempSubject, 1);
} while ($matched && (strlen($tempSubject) >= 3));
print_r($finalAnswer);
As suggested in another answer, however, regular expressions might not be the correct tool to use in this situation, depending on your larger goal. In addition, the above code may not be the most efficient way (with respect to memory or performance) to solve this with regular expressions. It's just a straightforward fulfill-the-requirement solution.
Yeah it's a hack but you can use RegEx
<?php
$subject = '123456';
$rs = findmatches($subject);
echo '<pre>'.print_r($rs,true).'</pre><br />';
function findmatches($x) {
$regex = '/(\d{3})/';
$rs_array = array();
$rs = array();
// Loop through the subject string, one starting offset at a time
for($counter = 0; $counter < strlen($x); $counter++) {
$y = substr($x, $counter);
if(preg_match_all($regex, $y, $array)) {
// $array[0] holds the full matches found from this offset onward
$rs_array[$counter] = $array;
}
}
// Keep only the first match found at each offset
foreach($rs_array as $tmp_arr) {
$rs[] = $tmp_arr[0][0];
}
return $rs;
}
?>
Returns:
Array
(
[0] => 123
[1] => 234
[2] => 345
[3] => 456
)
NOTE: This only works when the subject is an unbroken run of digits.
