Regex, get multiple occurrences - php

I would like to know how to get multiple occurrences from a regex.
$str = "Some validations <IF TEST>firstValue</IF> in <IF OK>secondValue</IF> end of string.";
$do = preg_match("/<IF(.*)>.*<\/IF>/i", $str, $matches);
This is what I've done so far. It works if I have only one <IF> block, but if I have more it doesn't return the right values. Here is the result:
Array ( [0] => <IF TEST>firstValue</IF> in <IF OK>secondValue</IF> [1] =>  TEST>firstValue</IF> in <IF OK )
I need to get the "TEST" and the "OK" values.
EDIT: I've made the suggested modifications, thanks a lot, it works fine! However, I am now trying to add an ELSEIF branch and can't get it to work well. Here is what I've done:
$do = preg_match_all("~<IF([^<>]+)>([^<>]+)(</IF>|<ELSEIF([^<>]+)>([^<>]+)</IF>)~", $str, $matches, PREG_SET_ORDER);
and the result is:
Array
(
[0] => Array
(
[0] => firstValuesecondValue
[1] => TEST
[2] => firstValue
[3] => secondValue
[4] => TEST1
[5] => secondValue
)
[1] => Array
(
[0] => thirdValue
[1] => OK
[2] => thirdValue
[3] =>
)
)
Is there a way to make my array cleaner? It has many useless elements, such as [0][4].

You should make the regex more specific. The .* that you are using should either be less greedy, or better yet disallow other angle brackets:
~<IF([^<>]+)>([^<>]+)</IF>~i
More importantly, you should use preg_match_all, not just preg_match.
preg_match_all("~<IF([^<>]+)>([^<>]+)</IF>~i", $str, $matches, PREG_SET_ORDER);
That'll give you a nested array like:
[0] => Array
(
[0] => <IF TEST>firstValue</IF>
[1] => TEST
[2] => firstValue
)
[1] => Array
(
[0] => <IF OK>secondValue</IF>
[1] => OK
[2] => secondValue
)
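For the ELSEIF variant in the question's edit, one possible way to get a cleaner array is to wrap the optional branch in a non-capturing group (?:...) so that only the condition and value groups are captured. This is a minimal sketch, not a definitive solution: the input string below is an assumption reconstructed from the output shown in the question, and the \s+ after IF and ELSEIF is my addition to drop the leading space from the captured condition.

```php
<?php
// Hypothetical input: the question does not show the string used for the ELSEIF test.
$str = "<IF TEST>firstValue<ELSEIF TEST1>secondValue</IF> and <IF OK>thirdValue</IF>";

// (?: ... )? makes the whole ELSEIF branch optional without adding a capture group for it.
$pattern = '~<IF\s+([^<>]+)>([^<>]+)(?:<ELSEIF\s+([^<>]+)>([^<>]+))?</IF>~i';

preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1], $m[2]: IF condition and value; $m[3], $m[4] exist only when an ELSEIF is present.
    echo $m[1], ' => ', $m[2], PHP_EOL;
    if (isset($m[3])) {
        echo $m[3], ' => ', $m[4], PHP_EOL;
    }
}
```

With PREG_SET_ORDER, trailing groups that did not participate in a match are left out of that match's array, so a plain IF block yields only three elements.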

The answers pointing out that you should use preg_match_all are correct.
But there is another problem: the .* is greedy by default. This will cause it to match both tags in a single match, so you need to make the star non-greedy (i.e. lazy):
/<IF(.*?)>.*?<\/IF>/i
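To see the difference, here is a small sketch that runs the greedy and the lazy pattern against the string from the question. The variable names are mine, and the \s+ in the lazy pattern is an addition that keeps the space out of the captured value.

```php
<?php
$str = "Some validations <IF TEST>firstValue</IF> in <IF OK>secondValue</IF> end of string.";

// Greedy: (.*) runs to the end of the string and backtracks only as far as it must,
// so both blocks end up inside a single match.
$greedyCount = preg_match_all('/<IF(.*)>.*<\/IF>/i', $str, $greedy);

// Lazy: (.*?) stops at the first ">" and .*? at the first "</IF>".
$lazyCount = preg_match_all('/<IF\s+(.*?)>.*?<\/IF>/i', $str, $lazy);

echo "greedy: $greedyCount match(es)", PHP_EOL;
echo "lazy: $lazyCount match(es)", PHP_EOL;
print_r($lazy[1]);
```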

Use this code:
$string = "Some validations <IF TEST>firstValue</IF> in <IF OK>secondValue</IF> end of string.";
$regex = "/<IF (.*?)>.*?<\/IF>/i";
preg_match_all($regex, $string, $matches);
print_r($matches[1]);
Your regex is good, but you have to make it non-greedy by adding the ? character, and use the preg_match_all() function.

Use a non-greedy match .*? and preg_match_all for this purpose.

Related

How to get a particular string using preg_replace?

I want to get a particular value from a string in PHP. The string is:
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_replace('/(.*)\[(.*)\](.*)\[(.*)\](.*)/', '$2', $str);
I want to get the value of data01, i.e. [1,2].
How can I achieve this using preg_replace?
preg_replace() is the wrong tool. I have used preg_match_all() in case you need the other item later, and trimmed down your regex to capture only the part of the string you are looking for.
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_match_all('/\[([0-9,]+)\]/',$string,$match);
print_r($match);
/*
print_r($match) output:
Array
(
[0] => Array
(
[0] => [1,2]
[1] => [2,3]
)
[1] => Array
(
[0] => 1,2
[1] => 2,3
)
)
*/
echo "Your match: " . $match[1][0];
This gives you either the full matched pattern or just the captured characters, so you can have [1,2] or just 1,2.
preg_replace is used to replace by regular expression!
I think you want to use preg_match_all() to get each data attribute from the string.
The regex you want is:
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_match_all('#data[0-9]{2}=(\[[0-9,]+\])#',$string,$matches);
print_r($matches);
Array
(
[0] => Array
(
[0] => data01=[1,2]
[1] => data02=[2,3]
)
[1] => Array
(
[0] => [1,2]
[1] => [2,3]
)
)
I have tested this as working.
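If you need the values keyed by attribute name, a variation on the same idea could look like the sketch below. The extra capture group around the name, the $values array and the conversion to integers are my assumptions about what you want to do with the result.

```php
<?php
$string = 'users://data01=[1,2]/data02=[2,3]/*';

// Group 1 captures the attribute name, group 2 the comma-separated list inside the brackets.
preg_match_all('#(data[0-9]{2})=\[([0-9,]+)\]#', $string, $matches, PREG_SET_ORDER);

$values = [];
foreach ($matches as $m) {
    // Turn "1,2" into [1, 2] and store it under its attribute name.
    $values[$m[1]] = array_map('intval', explode(',', $m[2]));
}

print_r($values['data01']);
```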
preg_replace is for replacing stuff. preg_match is for extracting stuff.
So you want:
preg_match('/(.*?)\[(.*?)\](.*?)\[(.*?)\](.*)/', $str, $match);
var_dump($match);
See what you get, and work from there.

Regex - Does not contain certain Characters preg_match

I need a regex that matches if an array element contains a certain word, which could be anywhere in the string. For example, given this array:
Array
(
[1] => Array
(
[0] => http://www.test1.com
[1] => 4
[2] => 4
)
[2] => Array
(
[0] => http://www.test2.fr/blabla.html
[1] => 2
[2] => 2
)
[3] => Array
(
[0] => http://www.stuff.com/admin/index.php
[1] => 2
[2] => 2
)
[4] => Array
(
[0] => http://www.test3.com/blabla/bla.html
[1] => 2
[2] => 2
)
[5] => Array
(
[0] => http://www.stuff.com/bla.html
[1] => 2
[2] => 2
)
)
I want to return all the sub-arrays except those that have the word stuff in them, but when I test with this it doesn't quite work:
return !preg_match('/(stuff)$/i', $element[0]);
Is there a solution for that?
Thanks
You don't need a regular expression for performing a simple search. Use array_filter() in conjunction with strpos():
$result = array_filter($array, function ($elem) {
    // Keep only the elements whose URL does not contain "stuff".
    return (strpos($elem[0], 'stuff') === FALSE);
});
Now, to answer your question, your current regex pattern will only match strings that contain stuff at the end of the line. You don't want that, so get rid of the "end of the line" anchor $ from your regex.
The updated regex should look like below:
return !preg_match('/stuff/i', $element[0]);
If the actual use-case is different from what is shown in your question and if the operation involves more than just simple pattern matching, then preg_match() is the right tool. It can be used with array_filter() to create a new array that satisfies your requirements.
Here's how you'd do it with a callback function:
$result = array_filter($array, function ($elem) {
    // Keep only the elements whose URL does not match the pattern.
    return !preg_match('/stuff/i', $elem[0]);
});
Note: The actual regex might be more complex; I've used /stuff/ as an example. The negation ! is kept so that elements containing stuff are filtered out.
Your pattern will only match a string where stuff appears at the end of the string or line. To fix this, just get rid of the end anchor ($):
return !preg_match('/stuff/i', $element[0]);
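Putting it together with the data from the question, a minimal sketch could look like this. I've swapped in stripos() as a case-insensitive counterpart to the /i flag, and the array_values() call is an optional extra in case you want consecutive keys; both are my choices, not something the question requires.

```php
<?php
$array = [
    1 => ['http://www.test1.com', 4, 4],
    2 => ['http://www.test2.fr/blabla.html', 2, 2],
    3 => ['http://www.stuff.com/admin/index.php', 2, 2],
    4 => ['http://www.test3.com/blabla/bla.html', 2, 2],
    5 => ['http://www.stuff.com/bla.html', 2, 2],
];

// Keep every element whose URL does not contain "stuff" (case-insensitive).
$filtered = array_filter($array, function ($elem) {
    return stripos($elem[0], 'stuff') === false;
});

// array_filter() preserves the original keys; reindex only if you need 0, 1, 2, ...
$reindexed = array_values($filtered);

print_r(array_keys($filtered));
```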

Finding the number of occurrences of a string inside another string using regex in PHP?

I want to find the number of occurrences of a substring (pattern based) inside another string.
For example:
$mystring = "|graboard='KERALA'||graboarded='KUSAT'||graboard='MG'";
I want to find the number of graboards present in $mystring.
So I used a regex for this, but how do I find the number of occurrences?
If you must use a regex, preg_match_all() returns the number of matches.
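As a rough illustration, the return value can be used directly without inspecting the matches array. The two patterns below are my own examples: one counts only the exact key graboard=, the other counts every occurrence of the text graboard, which also hits graboarded.

```php
<?php
$mystring = "|graboard='KERALA'||graboarded='KUSAT'||graboard='MG'";

// Exact key only: "graboard" at a word boundary, immediately followed by "=".
$exactCount = preg_match_all('/\bgraboard=/', $mystring);

// Any occurrence of the text "graboard", including the one inside "graboarded".
$looseCount = preg_match_all('/graboard/', $mystring);

echo "exact: $exactCount, loose: $looseCount", PHP_EOL;
```

Which of the two counts is the right one depends on whether graboarded should be included.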
Use preg_match_all:
$mystring = "|graboard='KERALA'||graboarded='KUSAT'||graboard='MG'";
preg_match_all("/(graboard)='(.+?)'/i", $mystring, $matches);
print_r($matches);
will yield:
Array
(
[0] => Array
(
[0] => graboard='KERALA'
[1] => graboard='MG'
)
[1] => Array
(
[0] => graboard
[1] => graboard
)
[2] => Array
(
[0] => KERALA
[1] => MG
)
)
You can then use count($matches[1]) to get the number of occurrences. This is just a basic example, so the regex may need to be modified to suit your needs.
Just use preg_match_all():
// The string.
$mystring="|graboard='KERALA'||graboarded='KUSAT'||graboard='MG'";
// The `preg_match_all()`.
preg_match_all('/graboard/is', $mystring, $matches);
// Echo the count of `$matches` generated by `preg_match_all()`.
echo count($matches[0]);
// Dumping the content of `$matches` for verification.
echo '<pre>';
print_r($matches);
echo '</pre>';

PHP preg_match_all not finding first match

I am trying to find all matches in a string. For some reason, if my match is at the start of the string, it does not return that particular match. Does it have something to do with index 0? I am also using PREG_OFFSET_CAPTURE to get the indexes as well as the matches. Below is the working and non-working code.
$text = '[QUOTE]I wonder why[QUOTE]PHP[IMG]hates me[/IMG][/QUOTE][/QUOTE][URL="http://www.bing.com"]Click me![QUOTE]........[/QUOTE]Ok Bai![/URL]';
preg_match_all('#\[QUOTE\]#', $text, $matches, PREG_OFFSET_CAPTURE, PREG_PATTERN_ORDER);
print_r($matches);
The result of which is:
Array ( [0] => Array ( [0] => Array ( [0] => [QUOTE] [1] => 19 ) [1] => Array ( [0] => [QUOTE] [1] => 100 ) ) )
As you can see it only found two matches. If I add a character to the start of the string it will then find all three.
$text = 'a[QUOTE]I wonder why[QUOTE]PHP[IMG]hates me[/IMG][/QUOTE][/QUOTE][URL="http://www.bing.com"]Click me![QUOTE]........[/QUOTE]Ok Bai![/URL]';
preg_match_all('#\[QUOTE\]#', $text, $matches, PREG_OFFSET_CAPTURE, PREG_PATTERN_ORDER);
print_r($matches);
The result of which is:
Array ( [0] => Array ( [0] => Array ( [0] => [QUOTE] [1] => 1 ) [1] => Array ( [0] => [QUOTE] [1] => 20 ) [2] => Array ( [0] => [QUOTE] [1] => 101 ) ) )
All three matches. If anyone can help me figure out if my REGEX needs to be modified or if there is some quirk I'm unaware of it would be much appreciated. I've tried this same thing utilizing Python and the re library and it returns all my matches. I also utilized this http://www.regextester.com/ and it reports it as working in both scenarios and matching everything as it should. My only guess is something to do with the PREG_OFFSET_CAPTURE finding a match at position 0 and the 0 causing some issue.
Thanks in advance for any assistance!
The correct way to add multiple flags is with a pipe |, so:
preg_match_all('#\[QUOTE\]#', $text, $matches, PREG_OFFSET_CAPTURE | PREG_PATTERN_ORDER);
The comma before PREG_PATTERN_ORDER makes it the 'offset' parameter (the position in the string at which to start), and since PREG_PATTERN_ORDER == 1, the search starts at the second character.
The problem is in your function call:
preg_match_all('#\[QUOTE\]#', $text, $matches, PREG_OFFSET_CAPTURE, PREG_PATTERN_ORDER);
The fifth parameter is the offset, not another flag.
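The effect is easy to reproduce on a shortened string. This is only a sketch: the $text value and the variable names are made up for illustration.

```php
<?php
$text = '[QUOTE]a[QUOTE]b';

// Comma: PREG_PATTERN_ORDER (value 1) is passed as the offset, so matching starts at index 1.
$wrongCount = preg_match_all('#\[QUOTE\]#', $text, $wrong, PREG_OFFSET_CAPTURE, PREG_PATTERN_ORDER);

// Pipe: both constants are combined into the single flags argument, and the offset stays 0.
$rightCount = preg_match_all('#\[QUOTE\]#', $text, $right, PREG_OFFSET_CAPTURE | PREG_PATTERN_ORDER);

echo "with comma: $wrongCount, with pipe: $rightCount", PHP_EOL;
```

The match at position 0 is only reported in the second call.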

Regular Expression with wordpress shortcodes

I'm trying to find all shortcodes within a string which looks like this:
 [a_col] One
 [/a_col]
outside
[b_col]
Two
[/b_col] [c_col] Three [/c_col]
I need the content (eg "Three") and the letter from the col (a, b or c)
Here's the expression I'm using
preg_match_all('#\[(a|b|c)_col\](.*)\[\/\1_col\]#m', $string, $hits);
but $hits contains only the last one.
The content can have any character even "[" or "]"
EDIT:
I would like to get "outside" as well which can be any string (except these cols). How can I handle that or should I parse this in a second step?
This will capture anything in the content, as well as attributes, and will allow any characters in the content.
<?php
$input = '[a_col some="thing"] One[/a_col]
[b_col] Two [/b_col]
[c_col] [Three] [/c_col] ';
preg_match_all('#\[(a|b|c)_col([^\[]*)\](.*?)\[\/\1_col\]#msi', $input, $matches);
print_r($matches);
?>
EDIT:
You may want to then trim the matches, since it appears there may be some whitespace. Alternatively, you can use regex for removing the whitespace in the content:
preg_match_all('#\[(a|b|c)_col([^\[]*)\]\s*(.*?)\s*\[\/\1_col\]#msi', $input, $matches);
OUTPUT:
Array
(
[0] => Array
(
[0] => [a_col some="thing"] One[/a_col]
[1] => [b_col] Two [/b_col]
[2] => [c_col] [Three] [/c_col]
)
[1] => Array
(
[0] => a
[1] => b
[2] => c
)
[2] => Array
(
[0] => some="thing"
[1] =>
[2] =>
)
[3] => Array
(
[0] => One
[1] => Two
[2] => [Three]
)
)
It might also be helpful to use this for capturing the attribute names and values stored in $matches[2]. Consider $atts to be the first element in $matches[2]. Of course, you would iterate over the array of attributes and perform this on each one.
preg_match_all('#([^="\'\s]+)[\t ]*=[\t ]*("|\')(.*?)\2#', $atts, $att_matches);
This gives an array where the names are stored in $att_matches[1] and their corresponding values are stored in $att_matches[3].
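For the follow-up in the question about also getting the text outside the shortcodes, one option is a second pass with preg_split() using the same pattern as the delimiter. This is a sketch under assumptions: it uses the input from the question without attributes, and the trimming and filtering of empty pieces is my choice.

```php
<?php
$string = " [a_col] One\n [/a_col]\noutside\n[b_col]\nTwo\n[/b_col] [c_col] Three [/c_col]";

// The s modifier lets "." match newlines; .*? keeps each shortcode in its own match.
$pattern = '#\[([abc])_col\]\s*(.*?)\s*\[/\1_col\]#s';

// First pass: letter in $hit[1], content in $hit[2].
preg_match_all($pattern, $string, $hits, PREG_SET_ORDER);

// Second pass: everything between the shortcodes, trimmed, with empty pieces dropped.
$pieces = array_map('trim', preg_split($pattern, $string));
$outside = array_values(array_filter($pieces, 'strlen'));

foreach ($hits as $hit) {
    echo $hit[1], ': ', $hit[2], PHP_EOL;
}
print_r($outside);
```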
Use ((.|\n)*) instead of (.*) to capture multiple lines:
<?php
$string = "
[a_col] One
[/a_col]
[b_col]
Two
[/b_col] [c_col] Three [/c_col]";
preg_match_all('#\[(a|b|c)_col\]((.|\n)*)\[\/\1_col\]#m', $string, $hits);
echo "<textarea style='width:90%;height:90%;'>";
print_r($hits);
echo "</textarea>";
?>
I don't have an environment I can test with here, but you could use lookbehind and lookahead assertions together with a backreference to match the tags around the content. Something like this:
(?<=\[(\w)\]).*(?=\[\/\1\])
