preg_match match all starting words - php

I am trying to get all the words from a list that match at the start of a string:
$pattern = '/^(ab|abc|abcd|asdf)/';
preg_match_all($pattern, 'abcdefgh', $matches);
I want to get 'ab', 'abc' and 'abcd'.
But this returns only 'ab'. It works if I explode the pattern and loop through the words one by one.
Is there any way to solve it with a single match?

Regular expressions consume characters as they match through the string, so they can't natively find overlapping matches.
You can use extended features like lookahead assertions together with capturing groups, but that requires an ugly construction:
$subject = 'abcdefgh';
preg_match_all(
'/^
(?:(?=(ab)))?
(?:(?=(abc)))?
(?:(?=(abcd)))?
(?:(?=(asdf)))?
/x',
$subject, $result, PREG_SET_ORDER);
// Index 0 of each match is the (empty) overall match; the captured words start at index 1.
for ($matchi = 0; $matchi < count($result); $matchi++) {
    for ($backrefi = 1; $backrefi < count($result[$matchi]); $backrefi++) {
        // Matched text = $result[$matchi][$backrefi];
    }
}
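The same lookahead idea can be wrapped in a small helper that builds the pattern from the word list. This is only a sketch: the function name matchingPrefixes is made up for illustration, and it uses the slightly simpler form (?=(word)?) for each word, which should behave the same way as the construction above.

```php
<?php
// Hypothetical helper: returns every word from $words that $subject starts with.
function matchingPrefixes(array $words, string $subject): array
{
    // One zero-width lookahead per word, so no characters are consumed
    // and every word is tested against the start of the subject.
    $pattern = '/^';
    foreach ($words as $word) {
        $pattern .= '(?=(' . preg_quote($word, '/') . ')?)';
    }
    $pattern .= '/';

    if (!preg_match($pattern, $subject, $m)) {
        return [];
    }

    // $m[0] is the empty overall match; groups that did not match are
    // empty strings (or missing at the end), so filter them out.
    $captured = array_slice($m, 1);
    return array_values(array_filter($captured, function ($s) {
        return $s !== '';
    }));
}

print_r(matchingPrefixes(['ab', 'abc', 'abcd', 'asdf'], 'abcdefgh'));
// ['ab', 'abc', 'abcd']
```

Looping over the words with a plain string comparison is usually easier to read; the regex version mainly helps when a single pattern is required.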

Related

PHP preg_match matches

I'm trying to get all the matches from string:
$string = '[RAND_15]d4trg[RAND_23]';
with preg_match like this:
$match = array();
preg_match('#\[RAND_.*]#', $string, $match);
but after that $match array looks like this:
Array ( [0] => [RAND_15]d4trg[RAND_23] )
What should I do to get both occurrences as 2 separate elements in $match array? I would like to get result like this:
$match[0] = [RAND_15];
$match[1] = [RAND_23];
Use ...
$match = array();
preg_match_all('#\[RAND_.*?]#', $string, $match);
... instead. The ? after the quantifier makes it lazy, so it matches the shortest possible substring. Without it the greedy .* covers as much of the string as it can, and technically [RAND_15]d4trg[RAND_23] as a whole does match the pattern.
Another way is to restrict the set of characters to match with a negated character class:
$match = array();
preg_match_all('#\[RAND_[^]]*]#', $string, $match);
This way we won't have to turn the quantifier into a lazy one, as [^]] character class will stop matching at the first ] symbol.
Still, to catch all the matches you should use preg_match_all instead of preg_match. Here's the demo illustrating the difference.
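To see the three variants side by side, here is a short sketch that runs each pattern against the string from the question (the variable names $greedy, $lazy and $negated are only for illustration):

```php
<?php
$string = '[RAND_15]d4trg[RAND_23]';

// Greedy: .* runs to the last ] in the string, giving one long match.
preg_match_all('#\[RAND_.*]#', $string, $greedy);

// Lazy: .*? stops at the first ] it can, giving two matches.
preg_match_all('#\[RAND_.*?]#', $string, $lazy);

// Negated class: [^]]* cannot cross a ], giving the same two matches.
preg_match_all('#\[RAND_[^]]*]#', $string, $negated);

print_r($greedy[0]);  // ['[RAND_15]d4trg[RAND_23]']
print_r($lazy[0]);    // ['[RAND_15]', '[RAND_23]']
print_r($negated[0]); // ['[RAND_15]', '[RAND_23]']
```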

Find words from the array in the text received through file_get_contents

I fetch a remote page:
$page = file_get_contents ('http://sayt.ru/');
There is an array of words:
$word = array ("word", "second");
How do I count how many times the words from the array occur in the text of the page?
I started digging in this direction:
$matches = array ();
$count_words = preg_match_all ('/'. $word. '/ i',$page, $matches);
But this is clearly the wrong direction, because the count is always zero. And preg_match_all searches for one word here, not the entire array. : (
You have to either check for each word in the array separately, or use a regexp like this:
$searchWords = array_map(function($w){ return preg_quote($w,'/'); }, $word);
$search = implode('|', $searchWords);
$count_words = preg_match_all('/\b(?:'.$search.')\b/i', $page, $matches);
I added a few modifications for better results: every word is escaped so it can't break the expression, and word boundaries (\b) make each word match only as a whole word, not as part of another word such as "swords".
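As a self-contained sketch of that approach, the snippet below counts the words in a hard-coded sample string instead of a fetched page. The function name countWords and the sample text are assumptions made for the example.

```php
<?php
// Hypothetical helper: counts whole-word, case-insensitive occurrences
// of any word from $words in $text.
function countWords(array $words, string $text): int
{
    // Escape each word so regex metacharacters in it stay literal.
    $escaped = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);

    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';

    // preg_match_all returns the number of full matches (or false on error).
    return (int) preg_match_all($pattern, $text, $matches);
}

echo countWords(['word', 'second'], 'Word swords second word'), "\n"; // 3
```

"swords" is not counted because \b requires a word boundary on both sides of the match.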

Regular expression for between two dynamic patterns

I want to find anything that matches
[^1] and [/^1]
Eg if the subject is like this
sometext[^1]abcdef[/^1]somemoretext[^2]12345[/^2]
I want to get back an array with abcdef and 12345 as the elements.
I read this
And I wrote this code and I am unable to advance past searching between []
<?php
$test = '[12345]';
getnumberfromstring($test);
function getnumberfromstring($text)
{
$pattern= '~(?<=\[)(.*?)(?=\])~';
$matches= array();
preg_match($pattern, $text, $matches);
var_dump($matches);
}
?>
Your test uses the string '[12345]', which does not follow the rule of having an opening [^digit] and a closing [/^digit]. Also, you're using preg_match when you should be using preg_match_all.
Try this:
<?php
$test = 'sometext[^1]abcdef[/^1]somemoretext[^2]12345[/^2]';
getnumberfromstring($test);
function getnumberfromstring($text)
{
$pattern= '/(?<=\[\^\d\])(.*?)(?=\[\/\^\d\])/';
$matches= array();
preg_match_all($pattern, $text, $matches);
var_dump($matches);
}
?>
That other answer doesn't really apply to your case; your delimiters are more complex and you have to use part of the opening delimiter to match the closing one. Also, unless the numbers inside the tags are limited to one digit, you can't use a lookbehind to match the first one. You have to match the tags in the normal way and use a capturing group to extract the content. (Which is how I would have done it anyway. Lookbehind should never be the first tool you reach for.)
'~\[\^(\d+)\](.*?)\[/\^\1\]~'
The number from the opening delimiter is captured in the first group and the backreference \1 matches the same number, thus ensuring that the delimiters are correctly paired. The text between the delimiters is captured in group #2.
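A quick way to see the multi-digit limitation is to run both patterns on a tag numbered 12. This is a sketch with a made-up sample string; the variable names are only for illustration.

```php
<?php
$text = 'a[^12]hello[/^12]b';

// Lookbehind version: \d allows exactly one digit, so [^12] is never recognised.
$lookaround = '/(?<=\[\^\d\])(.*?)(?=\[\/\^\d\])/';
$countLookaround = preg_match_all($lookaround, $text, $m1);

// Capturing version: \d+ accepts any number of digits and \1 pairs the tags.
$backref = '~\[\^(\d+)\](.*?)\[/\^\1\]~';
$countBackref = preg_match_all($backref, $text, $m2);

var_dump($countLookaround); // int(0)
var_dump($m2[2]);           // array with one element: "hello"
```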
I have tested following code in php 5.4.5:
<?php
$foo = 'sometext[^1]abcdef[/^1]somemoretext[^2]12345[/^2]';
function getnumberfromstring($text)
{
$matches= array();
# match [^1]...[/^1], [^2]...[/^2]
preg_match_all('/\[\^(\d+)\]([^\[\]]+)\[\/\^\1\]/', $text, $matches, PREG_SET_ORDER);
for($i = 0; $i < count($matches); ++$i)
printf("%s\n", $matches[$i][2]);
}
getnumberfromstring($foo);
?>
output:
abcdef
12345

Iterating over matches from preg_match_all

I am trying to figure out the mechanics of this plugin in WordPress.
I have a preg_match_all function that looks like this:
preg_match_all('/(?<=\\[\\[).+?(?=\\]\\])/', $content, $matches, PREG_PATTERN_ORDER);
$numMatches = count($matches[0]);
for ($i = 0; $i < $numMatches; $i++) {
$postSlug = $matches[0][$i];
}
If I understand this correctly, count($matches[0]) assumes there is only one match in $content.
My goal here is to re-write the for statement to allow for the full array of matches in the preg_match_all script.
I'm assuming I should replace the for statement with foreach ($matches as $postSlug) and not even bother with the confusing $matches[0][$i] at the end.
Unfortunately the final output does not seem to loop through each element in the array. Any ideas? Thanks!
If I understand this correctly, count($matches[0]) assumes there is only one match in $content.
Not quite; $matches[0] represents the array of matches of the whole regular expression (as opposed to, say, $matches[1], which would be the array of matches of the first match group of the regular expression). Thus, count($matches[0]) is the number of matches of the whole pattern.
You could do what you've said and rewrite the for loop as a foreach loop, but this likely won't change anything, as both methods should traverse all elements in $matches[0]. Are you certain that the results you're looking for are matched in your regular expression?
If you do want to rewrite this code, then I suggest you look into PREG_SET_ORDER as last argument, instead of PREG_PATTERN_ORDER. This groups the result array by results first, and with match groups in the second level.
Then you can just loop over it as follows:
foreach ($matches as $matchgroup) {
$postslug = $matchgroup[0];
}
You still need the [0] to get the "complete match". If your pattern had any (..) groups then [1] and [2] would correspond to those.
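To compare the two result layouts, here is a small sketch that runs the pattern from the question with both flags. The sample $content string is an assumption made for the example.

```php
<?php
$content = 'See [[first-post]] and [[second-post]].';
$pattern = '/(?<=\[\[).+?(?=\]\])/';

// PREG_PATTERN_ORDER: $byPattern[0] holds all full matches.
preg_match_all($pattern, $content, $byPattern, PREG_PATTERN_ORDER);

// PREG_SET_ORDER: one sub-array per match, with index 0 as the full match.
preg_match_all($pattern, $content, $bySet, PREG_SET_ORDER);

$slugs = [];
foreach ($bySet as $matchgroup) {
    $slugs[] = $matchgroup[0];
}

print_r($byPattern[0]); // ['first-post', 'second-post']
print_r($slugs);        // ['first-post', 'second-post']
```

Both loops visit every match; the flag only changes how the result array is nested.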

PHP preg_match_all regex

If I have a string like: 10/10/12/12
I'm using:
$string = '10/10/12/12';
preg_match_all('/[0-9]+\/[0-9]+/', $string, $results);
This only seems to match 10/10, and 12/12. I also want to match 10/12. Is it because after the 10/10 is matched that is removed from the picture? So after the first match it'll only match things from /12/12?
If I want to match all 10/10, 10/12, 12/12, what should my regex look like? Thanks.
Edit: I did this
$arr = explode('/', $string);
$count = count($arr) - 1;
$newarr = array();
for ($i = 0; $i < $count; $i++)
{
$newarr[] = $arr[$i].'/'.$arr[$i+1];
}
I'd advise not using a regular expression here. Instead you could, for example, first split on the slash using explode, then iterate over the parts, checking for two consecutive parts which both consist only of digits.
The reason your regular expression doesn't work is that a match consumes the characters it matches. The search for the next match starts just after where the previous match ended.
If you really want to use regular expressions, you can use a zero-width match such as a lookahead to avoid consuming the characters, and put a capturing group inside the lookahead.
'#[0-9]+/(?=([0-9]+))#'
See it working online: ideone
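With that pattern the full match is only the left number plus the slash, and the right number sits in the capturing group, so the pairs still have to be put together. A minimal sketch, assuming a variant of the pattern that also captures the left number in its own group:

```php
<?php
$string = '10/10/12/12';

// Group 1: the left number (consumed together with the slash).
// Group 2: the right number, captured inside a lookahead so it is not
// consumed and can serve as the left number of the next match.
preg_match_all('#([0-9]+)/(?=([0-9]+))#', $string, $sets, PREG_SET_ORDER);

$pairs = [];
foreach ($sets as $set) {
    $pairs[] = $set[1] . '/' . $set[2];
}

print_r($pairs); // ['10/10', '10/12', '12/12']
```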
