preg_match_all is only matching one - php

I am trying to get the value after the dots, and I would like to get all of them (each as their own key/value).
The following is what I am running:
$string = "div.cat.dog#mouse";
preg_match_all("/\.(.+?)(\.|#|$)/", $string, $matches);
and when I do a dump of $matches I am getting this:
Array
(
[0] => Array
(
[0] => .cat.
)
[1] => Array
(
[0] => cat
)
[2] => Array
(
[0] => .
)
)
Where item [1] is, it is only returning 1 value. What I was expecting was for it to return (for this case) 2 items cat and dog. How come dog isn't getting picked up by preg_match_all?

Use lookahead:
\.(.+?)(?=\.|#|$)
RegEx Demo
Problem in your regex is that you're matching DOT on LHS and a DOT or HASH or end of input on RHS of match. After matching that internal pointer moves ahead leaving no DOT to be matched for next word.
(?=\.|#|$) is a positive lookahead that doesn't match these characters but just looks ahead so pointer remains at the cat instead of DOT after cat..

Related

PHP preg_match_all $matches output contains 3 rows

Here is my test code:
$test = '#12345 abc #12 #abd engng#geneng';
preg_match_all('/(^|\s)#([^# ]+)/', $test, $matches);
print_r($matches);
And the output $matches:
Array ( [0] => Array ( [0] => #12345 [1] => #12 [2] => #abd ) [1] => Array ( [0] => [1] => [2] => ) [2] => Array ( [0] => 12345 [1] => 12 [2] => abd ) )
My question is why does it have an empty row?
[1] => Array ( [0] => [1] => [2] => )
If I get ride of (^|\s) in the regex, the second row will disappear. However I would not able to prevent matching #geneng.
Any answer will be appreciated.
The problem with your regular expression is that it matches # even when it is preceded by whitespace. Because \s will match the whitespace, it will be captured into $matches array. You can solve this problem by using lookarounds. In this case, it can be solved with a positive lookbehind:
preg_match_all('/(?<=^|\s)#([^# ]+)/', $test, $matches);
This will match the part after # only if it is preceded by a space or beginning-of-the line anchor. It's important to note that lookarounds do not actually consume characters. They just assert that the given regular expression is either followed or preceded by something.
Demo
It's because of the memory capture to test (^|\s):
preg_match_all('/(^|\s)#([^# ]+)/', $test, $matches);
^^^^^^
It's captured as memory location #1, so to avoid that you can simply use non-capturing parentheses:
preg_match_all('/(?:^|\s)#([^# ]+)/', $test, $matches);
^^
preg_match_all uses by default the PREG_PATTERN_ORDER flag. This means that you will obtain:
$matches[0] -> all substrings that matches the whole pattern
$matches[1] -> all capture groups 1
$matches[2] -> all capture groups 2
etc.
You can change this behavior using the PREG_SET_ORDER flag:
$matches[0] -> array with the whole pattern and the capture groups for the first result
$matches[1] -> same for the second result
$matches[2] -> etc.
In your code you (PREG_PATTERN_ORDER by default) you obtain $matches[1] with only empty or blank items because it is the content of capture group 1 (^|\s)
There is 2 set of parentheses that's why you get an empty row. PHP thinks, you want 2 set of matching in the string. Removing one of them will remove one array.
FYI: In this case, you can not use [^|\s] instead of (^|\s). Cause PHP will think, you want to exclude the white space.

Regular expersion repeat inside a pattern

I have the following text and I would like to preg_match_all what is within the {'s and }'s if it contains only a-zA-Z0-9 and :
some text,{SOMETHING21} {SOMETHI32NG:MORE}some msdf{TEXT:GET:2}sdfssdf sdf sdf
I am trying to match {SOMETHING21} {SOMETHI32NG:MORE} {TEXT:GET:2} there can be several :'s within the tag.
What I currently have is:
preg_match_all('/\{([a-zA-Z0-9\-]+)(\:([a-zA-Z0-9\-]+))*\}/', $from, $matches, PREG_SET_ORDER);
It works as expected for {SOMETHING21} and {SOMETHI32NG:MORE} but for {TEXT:GET:2} it only matches TEXT and 2
So it only matches the first and last word within the tag, and leaves the middle ones out of the $matches array. Is this even possible or should I just match them and then explode on : ?
-- edit --
Well the question isn't if I can get the tags, the question is if I can get them grouped without having to explode the results again. Even though my current regex finds all the results the subpattern does not come back with all the matches in $matches.
I hope the following will clear it up abit more:
\{ // the match has to start with {
([a-zA-Z0-9\-]+) // after the { the match needs to have alphanum consisting out of 1 or more characters
(
\: // if we have : it should be followed by alphanum consisting out of 1 or more characters
([a-zA-Z0-9\-]+) // <---- !! this is what it is about !! even though this subexpression is between brackets it is not put into $matches if more then one of these is found
)* // there could be none or more of the previous subexpression
\} // the match has to end with }
You can't get all the matched values of a capturing group, you only get the last one.
So you have to match the pattern:
preg_match_all('/{([a-z\d-]+(?::[a-z\d-]+)*)}/i', $from, $matches);
and then split each element in $matches[1] on :.
I used non-capture groupings to eliminate the inner groups, and just capture the outer complete colon-separated list.
$from = "some text,{SOMETHING21} {SOMETHI32NG:MORE}some msdf{TEXT:GET:2}sdfssdf sdf sdf";
preg_match_all('/\{((?:[a-zA-Z0-9\-]+)(?:\:(?:[a-zA-Z0-9\-]+))*)\}/', $from, $matches, PREG_SET_ORDER);
print_r($matches);
Result:
Array
(
[0] => Array
(
[0] => {SOMETHING21}
[1] => SOMETHING21
)
[1] => Array
(
[0] => {SOMETHI32NG:MORE}
[1] => SOMETHI32NG:MORE
)
[2] => Array
(
[0] => {TEXT:GET:2}
[1] => TEXT:GET:2
)
)
Maybe I didn't understand the requirement, but...
preg_match_all('/{[A-Za-z0-9:-]+}/', $from, $matches, PREG_PATTERN_ORDER);
results in:
Array
(
[0] => Array
(
[0] => {SOMETHING21}
[1] => {SOMETHI32NG:MORE}
[2] => {TEXT:GET:2}
)
)

Unexpected result with very simple regexp

I am fairly new to regexp and have encountered a regexp that delivers an unexpected result, when trying to match name parts in name of the form firstname-fristname firstname:
preg_match_all('/([^- ])*/i', 'aNNA-äöå Åsa', $result);
gives a print_r($result) that looks like this:
Array
(
[0] => Array
(
[0] => aNNA
[1] =>
[2] => äöå
[3] =>
[4] => Åsa
[5] =>
)
[1] => Array
(
[0] => A
[1] =>
[2] => å
[3] =>
[4] => a
[5] =>
)
)
Now the $result[0] has the items I would want and expect as result, but where the heck do the $results[1] come from - I see it's the word endings, but how come they are matched?
And as a little side question, how do I prevent the empty matches ($results[0][1], $results[0][3], ...), or better even: Why do they show up - they are not not- or not-space either?
Have a try with:
preg_match_all('/([^- ]+)/', 'aNNA-äöå Åsa', $result);
Your regex:
/([^- ])*/i
means: find one char that is not ^ or space and keep it in a group 0 or more times
This one:
/([^- ]+)/
means: find one or more char that is not ^ or space and keep it in a group
Moreover, there's no need for case insensitive.
The * means "0 or more of the preceding." Since a "-" is exactly 0 of the the character class, it is matched. However, since it is omitted from the character class, the capture fails to grab anything, leaving you an empty entry. The expression giving you the expected behavior would be:
preg_match_all('/([^- ])+/i', 'aNNA-äöå Åsa', $result);
("+" means "1 or more of the preceding.")
http://php.net/manual/en/function.preg-match-all.php says:
Orders results so that $matches[0] is an array of full pattern
matches, $matches[1] is an array of strings matched by the first
parenthesized subpattern, and so on.
Check the URL for more details

Regex pattern matches fine but output is not complete

I am trying this regex pattern:
$string = '<div class="className">AlwaysTheSame:</div>Subtitle <br /><span class="anotherClass">entry1</span><span class="anotherClass">entry2</span><span class="anotherClass">entry3</span>';
preg_match_all('|<div class="className">AlwaysTheSame:</div>(.*?)<br />(<span class="anotherClass">(.*?)</span>)*|', $string, $matches);
print_r($matches);
exit;
The <span class="anotherClass">entry</span> can not exists or exists multiple times, the pattern seems to match it fine works both when exists and when it doesn't, but the output is:
Array
(
[0] => Array
(
[0] => <div class="className">AlwaysTheSame:</div>Subtitle <br /><span class="anotherClass">entry1</span><span class="anotherClass">entry2</span><span class="anotherClass">entry3</span>
)
[1] => Array
(
[0] => Subtitle
)
[2] => Array
(
[0] => <span class="anotherClass">entry3</span>
)
[3] => Array
(
[0] => entry3
)
)
Array[0][0] contains the full string so its matching all I need, but in Array[2] and [3] I only get the last <span...
How can I get all those <span... in the output array and not just the last one?
You can't directly, at least not in PHP. Repeated capturing groups always contain the last expression they matched. The exception is .NET where regex matches have an additional property that allows you to access every single match of a repeated group. Also, Perl 6 can do something like this - but not PHP.
Solution: Use
~<div class="className">AlwaysTheSame:</div>(.*?)<br />((?:<span class="anotherClass">(.*?)</span>)*)~
Now the second capturing group contains all the <span> tags. With another regex you can then extract all the matches:
~(?<=<span class="anotherClass">).*?(?=</span>)~
I'm using ~ as a regex delimiter, by the way - using | is confusing IMO.

Get position of all matches in group

Consider the following example:
$target = 'Xa,a,aX';
$pattern = '/X((a),?)*X/';
$matches = array();
preg_match_all($pattern,$target,$matches,PREG_OFFSET_CAPTURE|PREG_PATTERN_ORDER);
var_dump($matches);
What it does is returning only the last 'a' in the series, but what I need is all the 'a's.
Particularly, I need the position of ALL EACH OF the 'a's inside the string separately, thus PREG_OFFSET_CAPTURE.
The example is much more complex, see the related question: pattern matching an array, not their elements per se
Thanks
It groups a single match since the regex X((a),?)*X matches the entire string. The last ((a),?) will be grouped.
What you want to match is an a that has an X before it (and the start of the string), has a comma ahead of it, or has an X ahead of it (and the end of the string).
$target = 'Xa,a,aX';
$pattern = '/(?<=^X)a|a(?=X$|,)/';
preg_match_all($pattern, $target, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => Array
(
[0] => a
[1] => 1
)
[1] => Array
(
[0] => a
[1] => 3
)
[2] => Array
(
[0] => a
[1] => 5
)
)
)
When your regex includes X, it matches once. It finds one large match with groups in it. What you want is many matches, each with its own position.
So, in my opinion the best you can do is simply search for /a/ or /a,?/ without any X. Then matches[0] will contain all appearances of 'a'
If you need them between X, pre-select this part of the string.

Categories