Need to match ALL similar words/phrases using preg_match_all - php

I'm trying to create a pattern that matches all similar words/phrases within a string.
For example, I need to match: "this", "this is", "this is it", "that", "that was", "that was not".
It only matches the first occurrence of "this", but it should match all occurrences.
I even tried anchors and word boundaries, but nothing seems to work.
I tried (simplified):
$content = "this is it! that was not!";
preg_match_all('/(this|this is|this is it|that|that was|that was not)/i', $content, $results);
Which should output:
this
this is
this is it
that
that was
that was not

Given that you're only capturing the terms you're searching for, it might be better to simply use a foreach loop as well as substr_count to see how many times each string occurs.
For example:
$haystack = "this is it! that was not! this is not a test!";
$needles = array(
"this",
"this is",
"this is it",
"that",
"that was",
"that was not");
foreach ($needles as $needle) {
// substr_count is case sensitive, so make subject and search lowercase
$hits = substr_count(strtolower($haystack), strtolower($needle));
echo "Search '$needle' occurs $hits time(s)" . PHP_EOL;
}
The above will output:
Search 'this' occurs 2 time(s)
Search 'this is' occurs 2 time(s)
Search 'this is it' occurs 1 time(s)
Search 'that' occurs 1 time(s)
Search 'that was' occurs 1 time(s)
Search 'that was not' occurs 1 time(s)
If substr_count doesn't provide the flexibility that you need then you can always replace it with a preg_match_all and use your individual $needle values as search terms.
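As a rough sketch of that last suggestion, each needle can be turned into its own pattern. The use of preg_quote() and \b word boundaries here is an assumption about what "flexibility" might mean; the $counts array is a name introduced only for illustration.

```php
<?php
// Sample data reused from the substr_count example.
$haystack = "this is it! that was not! this is not a test!";
$needles = array("this", "this is", "this is it", "that", "that was", "that was not");

$counts = array();
foreach ($needles as $needle) {
    // preg_quote() escapes regex metacharacters in the needle;
    // \b keeps "this" from matching inside a longer word such as "thistle".
    $pattern = '/\b' . preg_quote($needle, '/') . '\b/i';
    // preg_match_all() returns the number of full matches found.
    $counts[$needle] = preg_match_all($pattern, $haystack, $matches);
}
print_r($counts);
```

Because every needle is searched separately, overlapping phrases such as "this" and "this is" are each counted, which a single alternation cannot do.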

The problem is that the shortest option appears first in your alternation group:
/(this|this is|this is it)/i
At each position, the regex engine tries the alternatives in (this|this is|this is it) from left to right. As soon as one alternative matches, it leaves the group without trying the remaining ones.
Putting the longest option first makes the engine try it first:
/(this is it|this is|this)/i
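A small comparison may make the ordering effect easier to see. This is only a sketch using the question's sample string; the variable names $short and $long are made up for the example.

```php
<?php
$content = "this is it! that was not!";

// Shortest alternative first: the engine stops at "this" / "that".
preg_match_all('/this|this is|this is it|that|that was|that was not/i', $content, $short);

// Longest alternative first: the full phrases are matched instead.
preg_match_all('/this is it|this is|this|that was not|that was|that/i', $content, $long);

print_r($short[0]); // this, that
print_r($long[0]);  // this is it, that was not
```

Note that in both cases each starting position produces at most one match, so reordering alone returns the longest phrases but not every overlapping phrase from the list.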

How about:
$content = "this is it";
preg_match_all('/(?=(this))(?=(this is))(?=(this is it))/i', $content, $results);
print_r($results);
Edit according to comments:
$content = "this is it! that was not!";
preg_match_all('/(?=(this))(?=(this is))(?=(this is it))|(?=(that))(?=(that was))(?=(that was not))/i', $content, $results);
print_r($results);
Output:
Array
(
[0] => Array
(
[0] =>
[1] =>
)
[1] => Array
(
[0] => this
[1] =>
)
[2] => Array
(
[0] => this is
[1] =>
)
[3] => Array
(
[0] => this is it
[1] =>
)
[4] => Array
(
[0] =>
[1] => that
)
[5] => Array
(
[0] =>
[1] => that was
)
[6] => Array
(
[0] =>
[1] => that was not
)
)
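If a flat list of phrases is wanted rather than one array per group, the captures can be collected afterwards. A minimal sketch, assuming the same pattern and the question's sample string; $phrases is a hypothetical name:

```php
<?php
$content = "this is it! that was not!";
$pattern = '/(?=(this))(?=(this is))(?=(this is it))|(?=(that))(?=(that was))(?=(that was not))/i';

preg_match_all($pattern, $content, $results);

$phrases = array();
// $results[0] holds the empty overall matches; groups 1..6 hold the phrases.
foreach (array_slice($results, 1) as $group) {
    foreach ($group as $capture) {
        // Groups that did not take part in a match are filled with ''.
        if ($capture !== '') {
            $phrases[] = $capture;
        }
    }
}
print_r($phrases);
```

The lookaheads consume no characters, which is why all three overlapping phrases can be captured at the same position.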
More universal:
$content = "this is it! that was not!";
preg_match_all('/\b(?=(\w+))(?=(\w+ \w+))(?=(\w+ \w+ \w+))\b/i', $content, $results);
print_r($results);
Output:
Array
(
[0] => Array
(
[0] =>
[1] =>
)
[1] => Array
(
[0] => this
[1] => that
)
[2] => Array
(
[0] => this is
[1] => that was
)
[3] => Array
(
[0] => this is it
[1] => that was not
)
)

You can also use the following regex instead.
/(this(?:\sis(?:\sit)?)?)/i
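To illustrate what the nested optional groups do, here is a short sketch with a made-up test string (not from the question). At each occurrence the pattern matches the longest prefix of "this is it" that is present:

```php
<?php
// Hypothetical input chosen to show all three possible match lengths.
$content = "this is it! this is not! this was it!";

// "this", optionally followed by " is", optionally followed by " it".
preg_match_all('/this(?:\sis(?:\sit)?)?/i', $content, $results);
print_r($results[0]);
```

As with the reordered alternation, this yields one match per occurrence, not every overlapping phrase.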

Related

PHP - Preg Match All - Wordpress Multiple short codes with multiple parameters

I'm trying to find a regex capable of capturing the content of shortcodes produced in WordPress.
My shortcodes have the following structure:
[shortcode name param1="value1" param2="value2" param3="value3"]
The number of parameters is variable.
I need to capture the shortcode name, the parameter name and its value.
The closest result I have achieved is with this:
/(?:\[(.*?)|\G(?!^))(?=[^][]*])\h+([^\s=]+)="([^\s"]+)"/
If I have the following content in the same string:
[specs product="test" category="body"]
[pricelist keyword="216"]
[specs product="test2" category="network"]
I get this:
0=>array(
0=>[specs product="test"
1=> category="body"
2=>[pricelist keyword="216"
3=>[specs product="test2"
4=> category="network")
1=>array(
0=>specs
1=>
2=>pricelist
3=>specs
4=>)
2=>array(
0=>product
1=>category
2=>keyword
3=>product
4=>category)
3=>array(
0=>test
1=>body
2=>216
3=>test2
4=>network)
)
I have tried different regex patterns, but I always end up with the same issue: if a shortcode has more than one parameter, it fails to detect it.
Do you have any idea how I could achieve this?
Thanks
Laurent
You could make use of the \G anchor with 3 capture groups, where capture group 1 is the name of the shortcode, and groups 2 and 3 are the key and value of each pair.
Then you can remove the first entry of the array (the full matches), and remove the empty strings from capture groups 1, 2 and 3.
This is a slightly updated pattern:
(?:\[(?=[^][]*])(\w+)|\G(?!^))\h+(\w+)="([^"]+)"
Example
$pattern = '/(?:\[(?=[^][]*])(\w+)|\G(?!^))\h+(\w+)="([^"]+)"/';
$strings = [
'[specs product="test" category="body"]',
'[pricelist keyword="216"]',
'[specs product="test2" category="network" key="value"]'
];
foreach($strings as $s) {
if (preg_match_all($pattern, $s, $matches)) {
unset($matches[0]);
$matches = array_map('array_filter', $matches);
print_r($matches);
}
}
Output
Array
(
[1] => Array
(
[0] => specs
)
[2] => Array
(
[0] => product
[1] => category
)
[3] => Array
(
[0] => test
[1] => body
)
)
Array
(
[1] => Array
(
[0] => pricelist
)
[2] => Array
(
[0] => keyword
)
[3] => Array
(
[0] => 216
)
)
Array
(
[1] => Array
(
[0] => specs
)
[2] => Array
(
[0] => product
[1] => category
[2] => key
)
[3] => Array
(
[0] => test2
[1] => network
[2] => value
)
)
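When several shortcodes sit in one string, it may be more convenient to group the pairs under their shortcode name. The following is only a sketch built on the same pattern; the $shortcodes structure and the sample $content are assumptions made for illustration.

```php
<?php
$pattern = '/(?:\[(?=[^][]*])(\w+)|\G(?!^))\h+(\w+)="([^"]+)"/';
// Hypothetical content with two shortcodes in the same string.
$content = '[specs product="test" category="body"] [pricelist keyword="216"]';

$shortcodes = array();
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
    if ($m[1] !== '') {
        // A non-empty group 1 marks the start of a new shortcode.
        $shortcodes[] = array('name' => $m[1], 'params' => array());
    }
    // Attach the key/value pair to the most recently opened shortcode.
    $shortcodes[count($shortcodes) - 1]['params'][$m[2]] = $m[3];
}
print_r($shortcodes);
```

Checking group 1 explicitly also sidesteps a quirk of array_filter() without a callback, which drops any value PHP treats as false, including the string "0".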

How to extract certain words from a php string?

I have a long string like this: I1:1;I2:2;I8:2;NA1:5;IA1:[1,2,3,4,5];S1:asadada;SA1:[1,2,3,4,5];SA1:[1,2,3,4,5];. Now I just want to get certain words like 'I1', 'I2', 'I8', 'NA1' and so on, i.e. only the words that come before each ':' (at the start of the string or after a ';'), and store them in an array. How can I do that efficiently?
I have already tried using preg_split(), and it runs but gives me the wrong output, as shown below.
// $a is the string I want to extract words from
$str = preg_split("/[;:]/", $a);
print_r($str);
The output I am getting is this
Array
(
[0] => I8
[1] => 2
[2] => I1
[3] => 1
[4] => I2
[5] => 2
[6] => I3
[7] => 2
[8] => I4
[9] => 4
[10] =>
)
Array
(
[0] => NA1
[1] => 5
[2] =>
)
Array
(
[0] => IA1
[1] => [1,2,3,4,5]
[2] =>
)
Array
(
[0] => S1
[1] => asadada
[2] =>
)
Array
(
[0] => SA1
[1] => [1,2,3,4,5]
[2] =>
)
But I am expecting 'I8', 'I1', 'I2', 'I3', 'I4' also in separate arrays, each at position [0]. Any help on how to do this?
You could try something like this:
<?php
$str = 'I1:1;I2:2;I8:2;NA1:5;IA1:[1,2,3,4,5];S1:asadada;SA1:[1,2,3,4,5];SA1:[1,2,3,4,5];';
// Capture a word only when it starts the string or follows ";" and is followed by ":"
preg_match_all('/(?:^|;)(\w+)(?=:)/', $str, $result);
print_r($result[1]); // Matches are here in $result[1]
You can perform a lazy (non-greedy) match to capture the item in front of each : using preg_match_all(). Allowing the start of the string as well as ; ensures the first item is included:
<?php
$str = 'I1:1;I2:2;I8:2;NA1:5;IA1:[1,2,3,4,5];S1:asadada;SA1:[1,2,3,4,5];SA1:[1,2,3,4,5];';
preg_match_all('/(?:^|;)(.+?):/',$str,$matches);
print_r($matches[1]);
Live Demo: https://3v4l.org/eBsod
One possible approach is using a combination of explode() and implode(). The keys are collected in the $output array; implode() is only used here to print them as a single string.
<?php
$input = "I1:1;I2:2;I8:2;NA1:5;IA1:[1,2,3,4,5];S1:asadada;SA1:[1,2,3,4,5];SA1:[1,2,3,4,5];.";
$output = array();
$array = explode(";", $input);
foreach($array as $item) {
$output[] = explode(":", $item)[0];
}
echo implode(",", $output);
?>
Output:
I1,I2,I8,NA1,IA1,S1,SA1,SA1,.
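A variation on the same idea, sketched here under the assumption that fragments without a ":" (such as the trailing "." above or the empty piece after a final ";") should be ignored. The $keys name is introduced only for this example.

```php
<?php
$input = "I1:1;I2:2;I8:2;NA1:5;IA1:[1,2,3,4,5];S1:asadada;SA1:[1,2,3,4,5];SA1:[1,2,3,4,5];";

$keys = array();
foreach (explode(";", $input) as $item) {
    // Skip fragments without a "key:value" shape, such as the empty
    // string left after the trailing ";".
    if (strpos($item, ":") === false) {
        continue;
    }
    // strstr(..., true) returns the part of $item before the first ":".
    $keys[] = strstr($item, ":", true);
}
print_r($keys);
```

This keeps the result as an array, which is what the question asks for, and needs no regular expression.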

php regex match before and after specific word

I have a string with data that looks like this:
$string = '
foo=bar
badge_name_foo=foo
bar_badge_name=bar
bar=baz
';
I want to match all *_badge_name and badge_name_* strings.
The regex I'm using is this:
preg_match_all('~(?:(\w+)_)?badge_name(?:_(\w+))?~', $string, $matches, PREG_SET_ORDER);
The result is:
Array
(
[0] => Array
(
[0] => badge_name_foo
[1] =>
[2] => foo
)
[1] => Array
(
[0] => bar_badge_name
[1] => bar
)
)
The *_badge_name case works fine, but for badge_name_* there is always an empty value in the result. How can I remove it with preg_match_all?
Expected result should be:
Array
(
[0] => Array
(
[0] => badge_name_foo
[1] => foo
)
[1] => Array
(
[0] => bar_badge_name
[1] => bar
)
)
It seems you need to use the branch reset feature:
Alternatives inside a branch reset group share the same capturing groups. The syntax is (?|regex) where (?| opens the group and regex is any regular expression. If you don't use any alternation or capturing groups inside the branch reset group, then its special function doesn't come into play. It then acts as a non-capturing group.
Use
(?|(\w+)_badge_name|badge_name_(\w+))
^^^
PHP demo:
$re = '/(?|(\w+)_badge_name|badge_name_(\w+))/';
$str = 'foo=bar
badge_name_foo=foo
bar_badge_name=bar
bar=baz';
preg_match_all($re, $str, $matches);
print_r($matches);
Result:
Array
(
[0] => Array
(
[0] => badge_name_foo
[1] => bar_badge_name
)
[1] => Array
(
[0] => foo
[1] => bar
)
)
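To get the per-match layout shown as the expected result in the question, the same branch reset pattern can be combined with the PREG_SET_ORDER flag the asker already used. A minimal sketch with the question's sample data:

```php
<?php
$re = '/(?|(\w+)_badge_name|badge_name_(\w+))/';
$str = "foo=bar\nbadge_name_foo=foo\nbar_badge_name=bar\nbar=baz";

// PREG_SET_ORDER groups the results per match instead of per capture group.
preg_match_all($re, $str, $matches, PREG_SET_ORDER);
print_r($matches);
```

Each entry of $matches then holds the full match at index 0 and the captured word at index 1, whichever alternative matched.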

Preg_match_all behaving weird

I am new to PHP and I have the code below. I basically wish to find all keywords enclosed between
'<#' and '#>'
Sample code:
<?php
$subject = "askdbvbaldjbvasdblasdbvl<#2134#>cbkdbskbkabdvb<#213aca4#>";
$pattern = "/(?<=\<\#)(.*?)(?=\#\>)/";
preg_match_all($pattern, $subject, $matches);
echo '<pre>',print_r($matches,true),'</pre>';
?>
Now I am expecting a result array like:
Array
(
[0] => Array
(
[0] => 2134
[1] => 213aca4
)
)
But I am getting an output like:
Array
(
[0] => Array
(
[0] => 2134
[1] => 213aca4
)
[1] => Array
(
[0] => 2134
[1] => 213aca4
)
)
Can anyone tell me why I am getting the second array and how I can get rid of it?
The second array contains the sub-match, or matched group, because you're using a capture group.
Simply remove the parens in your regex:
$pattern = "/(?<=\<\#).*?(?=\#\>)/";
Also, you should be able to use this regex without some escapes:
$pattern = "/(?<=<#).*?(?=#>)/";
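Another option, offered as a sketch and not as the answerer's method, is to keep a capture group, match the delimiters literally, and read only the group's entry. The $keywords name is hypothetical.

```php
<?php
$subject = "askdbvbaldjbvasdblasdbvl<#2134#>cbkdbskbkabdvb<#213aca4#>";

// Match the delimiters literally and capture only what lies between them.
preg_match_all('/<#(.*?)#>/', $subject, $matches);

// $matches[0] holds the full matches including <# and #>;
// $matches[1] holds just the captured keywords.
$keywords = $matches[1];
print_r($keywords);
```

Both approaches give the same keywords here; the difference is only whether the delimiters are part of the overall match.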

php returned array from preg_match_all

I have an array that is being returned like this:
Array ( [0] => Array ( [0] => ;3750;011; [1] => ;3750;012; [2] => ;3750;013; [3] => ;3750;014; [4] => ;3750;015; [5] => ;3750;016; [6] => ;3750;017; [7] => ;3750;018; [8] => ;3750;019; ))
The array is coming from preg_match_all.
I have tried to print it with a foreach loop and it always comes out the same way.
I can't work with it like this, and I do not understand what is going on.
This is the preg_match_all that it comes from:
$remove = preg_match_all('/;([\d]{4};[\d]{3});/', $str, $m);
preg_match_all() returns the number of matches; the matches themselves are stored in its third argument ($m here) as an array of arrays. To display all the whole matches, loop over $m[0]:
$remove = preg_match_all('/;([\d]{4};[\d]{3});/', $str, $m);
foreach($m[0] as $item) { echo $item . '<br/>'; }
If you only want the content of your capturing group, just replace $m[0] with $m[1].
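The difference between the return value and the $m array can be seen in a short sketch. The input string below is made up, since the original $str is not shown in the question, and $count is used in place of $remove to make its meaning clearer.

```php
<?php
// Hypothetical input; the real $str is not part of the question.
$str = ";3750;011;\n;3750;012;";

// The return value is the number of full matches, not the matches themselves.
$count = preg_match_all('/;(\d{4};\d{3});/', $str, $m);

echo $count . PHP_EOL;  // number of matches
print_r($m[0]);         // whole matches, including the outer semicolons
print_r($m[1]);         // contents of the capturing group only
```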
