Iterating over matches from preg_match_all - php

I am trying to figure out the mechanics of this plugin in WordPress.
I have a preg_match_all function that looks like this:
preg_match_all('/(?<=\\[\\[).+?(?=\\]\\])/', $content, $matches, PREG_PATTERN_ORDER);
$numMatches = count($matches[0]);
for ($i = 0; $i < $numMatches; $i++) {
$postSlug = $matches[0][$i];
}
If I understand this correctly, count($matches[0]) assumes there is only one match in $content.
My goal here is to re-write the for statement to allow for the full array of matches in the preg_match_all script.
I'm assuming I should replace the for statement with foreach ($matches as $postSlug) and not even bother with the confusing $matches[0][$i] at the end.
Unfortunately the final output does not seem to loop through each element in the array. Any ideas? Thanks!

If I understand this correctly, count($matches[0]) assumes there is only one match in $content.
Not quite; $matches[0] is the array of matches of the whole regular expression (as opposed to, say, $matches[1], which would be the array of matches of the first capture group). Thus, count($matches[0]) is the number of times the whole pattern matched.
You could do what you've said and rewrite the for loop as a foreach loop, but this likely won't change anything, as both methods should traverse all elements in $matches[0]. Are you certain that the results you're looking for are matched in your regular expression?

If you do want to rewrite this code, then I suggest you look into PREG_SET_ORDER as the last argument instead of PREG_PATTERN_ORDER. This groups the result array by match first, with the capture groups at the second level.
Then you can just loop over it as follows:
foreach ($matches as $matchgroup) {
$postslug = $matchgroup[0];
}
You still need the [0] to get the "complete match". If your pattern had any (..) groups, then [1] and [2] would correspond to those.
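To make the difference between the two orderings concrete, here is a minimal sketch that runs the question's pattern over a made-up $content string (the slugs are assumptions for illustration only) and collects the slugs both ways:

```php
<?php
// Hypothetical content; the slugs are invented for this example.
$content = 'See [[first-post]] and [[second-post]] for details.';
$pattern = '/(?<=\[\[).+?(?=\]\])/';

// PREG_PATTERN_ORDER (the default): $byPattern[0] lists every full match.
preg_match_all($pattern, $content, $byPattern, PREG_PATTERN_ORDER);
$slugsA = $byPattern[0];

// PREG_SET_ORDER: one entry per match; index 0 of each entry is the full match.
preg_match_all($pattern, $content, $bySet, PREG_SET_ORDER);
$slugsB = array();
foreach ($bySet as $matchGroup) {
    $slugsB[] = $matchGroup[0];
}

print_r($slugsA);
print_r($slugsB);
```

Both arrays should hold the same two slugs, which suggests that switching the loop style alone will not change which matches are visited.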

Related

is it possible to find overlapping matches with a single regex?

Here's a sample which executes the preg_replace multiple times to find nested/overlapping matches:
$text = '[foo][foo][/foo][/foo]';
//1st:   ^^^^^     ^^^^^^
//2nd:        ^^^^^      ^^^^^^
//3rd: fails
do {
$text = preg_replace('~\[foo](.*?)\[/foo]~', '[bar]$1[/bar]', $text, -1, $replace_count);
} while ($replace_count);
echo $text; //'[bar][bar][/bar][/bar]'
I'm satisfied with the result and the behavior. However, it seems inefficient to scan through the whole string 3 times as in the example above. Is there any regex magic to do this in a single replace?
Conditions:
I can't simply replace ~\[(/)?foo]~ with [$1bar], I need to make sure there is a matching closing [/foo] tag after an opening [foo] tag and replace them both at a time. It doesn't matter whether they're nested or not. Unpaired [foo] and [/foo] should not be replaced.
In JS I could set the Regex object's lastIndex property to the beginning of the match so that it starts matching again from the beginning of the last match. I couldn't find any startIndex option for regex replacing in PHP, and working with substr() could also be inefficient. I've looked around to see whether PCRE has an anchor for "start next match at this position" or similar, but I had no luck.
Is there a better approach?
To clarify on unpaired tags, given the input:
[foo][foo][/foo]
I'm fine with either [bar][foo][/bar] or [foo][bar][/bar] as output. The former is the legacy behavior.
A full regex solution is not possible for this specific case.
Your solution adapted to match paired tags (in the common sense):
$pattern = '~\[foo]((?>[^[]++|\[(?!/?foo]))*)\[/foo]~';
$result = $text;
do {
$result = preg_replace($pattern, '[bar]$1[/bar]', $result, -1, $count);
} while ($count);
Another way that parses the string only once:
$arr = preg_split('~(\[/?foo])~', $text, -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);
$stack = array();
foreach ($arr as $key=>$item) {
if ($item == '[foo]') $stack[] = $key;
else if ($item == '[/foo]' && !empty($stack)) {
$arr[array_pop($stack)] = '[bar]';
$arr[$key] = '[/bar]';
}
}
$result = implode($arr);
The performance of this second script is independent of the nesting depth.
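As a rough sketch, the single-pass approach can be wrapped in a function and tried on nested and unpaired input. The name replaceFooTags() and the sample strings are assumptions made for this example, not part of the original answer:

```php
<?php
// replaceFooTags() is a hypothetical wrapper around the split-and-stack idea.
function replaceFooTags($text)
{
    // Split into tags and the text between them, keeping the tags themselves.
    $parts = preg_split('~(\[/?foo])~', $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $stack = array(); // keys of [foo] tags that are still unclosed
    foreach ($parts as $key => $item) {
        if ($item === '[foo]') {
            $stack[] = $key;
        } elseif ($item === '[/foo]' && !empty($stack)) {
            // Pair this closing tag with the most recent unclosed opening tag.
            $parts[array_pop($stack)] = '[bar]';
            $parts[$key] = '[/bar]';
        }
    }
    return implode('', $parts);
}

echo replaceFooTags('[foo][foo][/foo][/foo]'), PHP_EOL; // nested pairs
echo replaceFooTags('[foo][foo][/foo]'), PHP_EOL;       // one unpaired opening tag
echo replaceFooTags('[/foo]x[foo]'), PHP_EOL;           // nothing to pair
```

With the unpaired input, the innermost pair is the one replaced, which is one of the two outputs the question says is acceptable.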
To answer the title question: yes, it is possible to find overlapping matches with a single regex. However, you can't perform a replacement with this kind of pattern. Example:
$pattern = '~(?=(\[foo]((?>[^[]++|\[(?!/?foo)|(?1))*)\[/foo]))~';
preg_match_all($pattern, $text, $matches);
The trick is to use a lookahead and a capturing group. Note that the whole match is always an empty string, this is the reason why you can't use this pattern with preg_replace.
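The same lookahead-plus-capture trick is easier to see on a simpler pattern. The following sketch uses a made-up digit string (not the [foo] tags) to show that the full matches are empty while the capture group holds the overlapping results:

```php
<?php
// Hypothetical input chosen only to illustrate overlapping matches.
$subject = '1234';

// The lookahead consumes nothing, so the next attempt starts one character
// later and the captured pairs are allowed to overlap.
preg_match_all('/(?=(\d\d))/', $subject, $matches);

print_r($matches[0]); // the full matches: all empty strings
print_r($matches[1]); // the captured, overlapping pairs
```

Because every full match is empty, preg_replace with such a pattern would only insert text at each match position rather than replace the captured region.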
A better way to do this is to find the closing [/foo] and scan backwards until you find an opening [foo] or [foo(space).*]. Replace the matched region with something else and repeat until no closing tag is found. Do this with plain strpos/stripos or substr rather than regex.
It might be achievable with regex, but I've always done this kind of thing with regular seeks, as it's also faster.

Regular expression for between two dynamic patterns

I want to find anything that matches
[^1] and [/^1]
Eg if the subject is like this
sometext[^1]abcdef[/^1]somemoretext[^2]12345[/^2]
I want to get back an array with abcdef and 12345 as the elements.
I read this
And I wrote this code and I am unable to advance past searching between []
<?php
$test = '[12345]';
getnumberfromstring($test);
function getnumberfromstring($text)
{
$pattern= '~(?<=\[)(.*?)(?=\])~';
$matches= array();
preg_match($pattern, $text, $matches);
var_dump($matches);
}
?>
Your test uses the string '[12345]', which does not follow the rule of having an "opening" of [^digit] and a "closing" of [/^digit]. Also, you're using preg_match when you should be using preg_match_all.
Try this:
<?php
$test = 'sometext[^1]abcdef[/^1]somemoretext[^2]12345[/^2]';
getnumberfromstring($test);
function getnumberfromstring($text)
{
$pattern= '/(?<=\[\^\d\])(.*?)(?=\[\/\^\d\])/';
$matches= array();
preg_match_all($pattern, $text, $matches);
var_dump($matches);
}
?>
That other answer doesn't really apply to your case; your delimiters are more complex and you have to use part of the opening delimiter to match the closing one. Also, unless the numbers inside the tags are limited to one digit, you can't use a lookbehind to match the first one. You have to match the tags in the normal way and use a capturing group to extract the content. (Which is how I would have done it anyway. Lookbehind should never be the first tool you reach for.)
'~\[\^(\d+)\](.*?)\[/\^\1\]~'
The number from the opening delimiter is captured in the first group and the backreference \1 matches the same number, thus ensuring that the delimiters are correctly paired. The text between the delimiters is captured in group #2.
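A short sketch of that pattern in use follows. The subject string is made up for illustration; it includes a two-digit tag and a deliberately mismatched pair to show that the backreference rejects it:

```php
<?php
// Hypothetical subject: two well-formed pairs and one mismatched pair ([^3] ... [/^4]).
$subject = 'sometext[^1]abcdef[/^1]more[^12]xyz[/^12]end[^3]oops[/^4]';

preg_match_all('~\[\^(\d+)\](.*?)\[/\^\1\]~', $subject, $matches, PREG_SET_ORDER);

$contents = array();
foreach ($matches as $set) {
    // $set[1] is the tag number, $set[2] is the text between the delimiters.
    $contents[] = $set[2];
}
print_r($contents);
```

Only the two correctly paired tags contribute to $contents; the mismatched pair is skipped because \1 has to repeat the number captured from the opening tag.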
I have tested following code in php 5.4.5:
<?php
$foo = 'sometext[^1]abcdef[/^1]somemoretext[^2]12345[/^2]';
function getnumberfromstring($text)
{
$matches= array();
# match [^1]...[/^1], [^2]...[/^2]
preg_match_all('/\[\^(\d+)\]([^\[\]]+)\[\/\^\1\]/', $text, $matches, PREG_SET_ORDER);
for($i = 0; $i < count($matches); ++$i)
printf("%s\n", $matches[$i][2]);
}
getnumberfromstring($foo);
?>
output:
abcdef
12345

PHP's preg_match() returns (.*?) value only

I am struggling to get preg_match to return only the image URL and not the entire matched string. Do I need to use preg_replace after or is that getting hacky?
Possibly, would a different syntax get me what I need?
$source = file_get_contents('http://mysite.co.uk');
preg_match_all("'<div id=\"my-div\"><img src=\"(.*?)\" /></div>'", $source, $match);
echo $match[0][0];
If you use echo $match[0][0] you will have all the text.
<div id="my-div"><img src="blabla bla" /></div>
If you write $match[1][0] instead, you will get your subpattern match:
blabla bla
If you're looking for the first instance, you don't need to use preg_match_all():
$source = file_get_contents('http://mysite.co.uk');
if (preg_match('#<div id="my-div"><img src="(.*?)" /></div>#', $source, $match)) {
echo $match[1];
} else {
// no match found
}
Note that this regex will not match across multiple lines.
If you need all matches, then you were on the right track, but you were using index 0 instead of 1. Pass PREG_SET_ORDER so that each $m is one complete match:
preg_match_all(..., $match, PREG_SET_ORDER);
foreach ($match as $m) {
echo $m[1]; // Use 1 here instead of 0; 1 is the first capture group, where 0 is the entire matched string
}
By default, preg_match_all returns the array of fully matched strings as the first item (using the ordering type PREG_PATTERN_ORDER).
From the documentation for PREG_PATTERN_ORDER:
Orders results so that $matches[0] is an array of full pattern
matches, $matches[1] is an array of strings matched by the first
parenthesized subpattern, and so on.
If you're looking for the value of a capturing group, index $matches by the group number first and then by the match number.
E.g., capturing group 1 of the first match would be: $matches[1][0]
To change this behavior you can pass a constant as the fourth argument, such as PREG_SET_ORDER, which "Orders results so that $matches[0] is an array of first set of matches, $matches[1] is an array of second set of matches, and so on."
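The two orderings can be compared side by side on the question's pattern. This is a minimal sketch; the $source string is a made-up stand-in for the fetched page:

```php
<?php
// Hypothetical markup standing in for the downloaded page.
$source = '<div id="my-div"><img src="a.png" /></div>'
        . '<div id="my-div"><img src="b.png" /></div>';
$pattern = '#<div id="my-div"><img src="(.*?)" /></div>#';

// PREG_PATTERN_ORDER (default): group number first, then match number.
preg_match_all($pattern, $source, $byPattern);
$urlsA = $byPattern[1];

// PREG_SET_ORDER: match number first, then group number.
preg_match_all($pattern, $source, $bySet, PREG_SET_ORDER);
$urlsB = array();
foreach ($bySet as $set) {
    $urlsB[] = $set[1];
}

print_r($urlsA);
print_r($urlsB);
```

Either way the image URLs come from capture group 1; only the order of the two array indexes changes.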

preg_match_all print *all* matches

I need to print all matches using preg_match_all.
$search = preg_match_all($pattern, $string, $matches);
foreach ($matches as $match) {
echo $match[0];
echo $match[1];
echo $match[...];
}
The problem is I don't know how many matches there are in my string, and even if I knew, and it was 1000, it would be pretty dumb to type all those $match[] entries.
The $match[0], $match[1], etc., items are not the individual matches, they're the "captures".
Regardless of how many matches there are, the number of entries in $matches is constant, because it's based on what you're searching for, not the results. There's always at least one entry, plus one more for each pair of capturing parentheses in the search pattern.
For example, if you do:
$matches = array();
$search = preg_match_all("/\D+(\d+)/", "a1b12c123", $matches);
print_r($matches);
Matches will have only two items, even though three matches were found. $matches[0] will be an array containing "a1", "b12" and "c123" (the entire match for each item) and $matches[1] will contain only the first capture for each item, i.e., "1", "12" and "123".
I think what you want is something more like:
foreach ($matches[1] as $match) {
echo $match;
}
Which will print out the first capture expression from each matched string.
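To check those shapes directly, the following sketch reruns the example above and also shows the same data grouped per match with PREG_SET_ORDER (the variable names are only illustrative):

```php
<?php
$matches = array();
$count = preg_match_all('/\D+(\d+)/', 'a1b12c123', $matches);

echo $count, PHP_EOL;          // number of matches found
echo count($matches), PHP_EOL; // number of entries: full match + one capture group

// Grouped per match instead: each entry holds [full match, first capture].
preg_match_all('/\D+(\d+)/', 'a1b12c123', $sets, PREG_SET_ORDER);
foreach ($sets as $set) {
    echo $set[0], ' => ', $set[1], PHP_EOL;
}
```

The return value counts the matches, while count($matches) depends only on the number of capture groups in the pattern.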
Does print_r($matches) give you what you want?
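You could loop recursively. This example requires SPL and PHP 5.1+; the RecursiveArrayIterator has to be wrapped in a RecursiveIteratorIterator so that the nested arrays are actually descended into:
foreach (new RecursiveIteratorIterator(new RecursiveArrayIterator($matches)) as $match)
print $match;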
