Why does this regex capture produce 2 matches? - php

I do not know why this regex returns two captures in addition to the full match, when I expected only one.
preg_match('/_(\d(-\d){0,3})\./', $str, $matches);
on this file string format name_A-B-C-D.ext.
I would expect to get a single match like this:
Example A
[0] => name_A-B-C-D.ext
[1] => A-B-C-D
Example B
[0] => name_A-B-C.ext
[1] => A-B-C
But this is the result I get:
Example A
[0] => name_A-B-C-D.ext
[1] => A-B-C-D
[2] => -D
Example B
[0] => name_A-B-C.ext
[1] => A-B-C
[2] => -C
I only wish to capture A up to D, each of B to D only if it's preceded by a hyphen.
This code is usable and I can simply ignore the second capture, but I would like to know why it's there. I can only assume it has something to do with my two capture groups. Where is my error?

Yes, you get two captures because you have two capturing groups in your regular expression.
To avoid the unwanted capture you could use a non-capturing group (?:...):
/_(\d(?:-\d){0,3})\./
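As a minimal sketch of the difference, the two patterns can be run side by side. The sample file name below is an assumption, with digits standing in for A to D. Note that a repeated capturing group keeps only its last iteration, which is why the extra entry is -4 rather than every hyphenated part.

```php
<?php
// Hypothetical sample input: digits stand in for the A-B-C-D placeholders.
$str = 'name_1-2-3-4.ext';

// Two pairs of capturing parentheses: $withInner gets entries [1] and [2].
preg_match('/_(\d(-\d){0,3})\./', $str, $withInner);

// Inner group made non-capturing with (?:...): only entry [1] remains.
preg_match('/_(\d(?:-\d){0,3})\./', $str, $withoutInner);

print_r($withInner);    // full match, "1-2-3-4", and the last "-4"
print_r($withoutInner); // full match and "1-2-3-4"
```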

I can only assume it has something to do with my two capture groups.
Your assumption is correct
Where is my error ?
There is no error, everything is behaving as expected.

You have two groups in your RE, so you get two captures. What is surprising?
Each pair of parentheses is a group.

Related

PHP regex match multiple tags within a string and add to array

I have a string with multiple tags in as so:
<item>foo bar</item> <item>foo bar</item>
I need to match each of these (they can be on separate lines) and add them to an array, but my pattern doesn't seem to match them. I'm new to regex, so I don't understand what is going wrong; an explanation would be great, thanks!
preg_match_all('/<item>(.*)<\/item>/',$content,$matches);
At the moment, it returns two empty indexes in the matches array.
I have also tried:
<item>([\s\S]*)<\/item>
This matches from the first tag until the very last one, so grabs everything essentially.
You can use this
preg_match_all('/<item>(.*?)<\/item>/',$content,$matches);
Result
Array
(
[0] => Array
(
[0] => <item>foo bar</item>
[1] => <item>foo bar</item>
)
[1] => Array
(
[0] => foo bar
[1] => foo bar
)
)
I only added ? after .*, which makes the quantifier lazy: it stops at the nearest closing tag instead of running on to the last one.
Read about lazy and greedy here: What do lazy and greedy mean in the context of regular expressions?
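A small sketch of greedy versus lazy matching follows. The s modifier is an assumption added here so that . also matches newlines, since the question mentions that items can span lines; the sample content is made up.

```php
<?php
// Hypothetical content: the second item spans two lines.
$content = "<item>foo bar</item> <item>baz\nqux</item>";

// Greedy: .* runs to the LAST </item>, so there is a single match.
preg_match_all('/<item>(.*)<\/item>/s', $content, $greedy);

// Lazy: .*? stops at the NEAREST </item>, giving one match per tag pair.
preg_match_all('/<item>(.*?)<\/item>/s', $content, $lazy);

print_r($greedy[1]);
print_r($lazy[1]);
```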

PHP preg_match_all named groups issue

I'm trying to get grouped matches from the following URI:
route: "/user/{user}/{action}"
input: "/user/someone/news"
What's the appropriate regex for this? I've been searching myself sour for the past couple of hours...
I've tried something like this, but no result :(
~\/app\/user\/(?P<user>[.*]+)\/(?P<action>[.*]+)~
I get the groups back in the matches array, but no results based on the input inside the groups.
Desired output:
Array
(
[0] => Array
(
[0] => "someone"
)
[user] => Array
(
[0] => "someone"
)
[1] => Array
(
[0] => "news"
)
[action] => Array
(
[0] => "news"
)
)
To clarify with an example:
My controller has the following route: /app/user/{username}/{action}
The request URI from the browser is: /app/user/john/news
How do I match that request URI against that route using a regex pattern while capturing the variables between the braces?
/user/(?P<user>[^/]+)/(?P<action>[^/]+)
http://regex101.com/r/gL1aS2
Just to explain a couple problems with your original regex:
[.*]+ means a positive number of occurrences of a dot and an asterisk only, example: *.*.* or . or ......; [^/]+ describes a positive number of any characters but slashes.
No need to escape slashes, as they're not special characters when you're using ~ as delimiters.
Your regex also required /app at the beginning, which wasn't present in your string.
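Putting those points together, a minimal sketch for the /app/user/{username}/{action} route from the clarification might look like this. The ^ and $ anchors are an addition not present in the answer's pattern, and the variable names are illustrative.

```php
<?php
// Request URI taken from the question's clarification.
$uri = '/app/user/john/news';

// ~ is the delimiter, so slashes need no escaping. The ^ and $ anchors are
// an assumption added here so that only the complete URI matches.
$pattern = '~^/app/user/(?P<user>[^/]+)/(?P<action>[^/]+)$~';

$params = [];
if (preg_match($pattern, $uri, $matches)) {
    // Named groups are available under their names (and also by number).
    $params = ['user' => $matches['user'], 'action' => $matches['action']];
}

print_r($params);
```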

PHP preg_match_all: extract parameters of a command

I have the following LaTeX command:
\autocites[][]{}[][]{}
where the parameters inside [] are optional and the others inside {} are mandatory. The \autocites command can be extended by additional groups of arguments like:
\autocites[a1][a2]{a3}[b1][b2]{b3}
\autocites[a1][a2]{a3}[b1][b2]{b3}[c1][c2]{c3}
...
It can also be used like this:
\autocites{a}{b}
\autocites{a}[b1][]{b3}
\autocites{a}[][b2]{b3}
...
I'd like to extract its parameters by using a regular expression in PHP. This is my first attempt:
/\\autocites(\[(.*?)\])(\[(.*?)\])(\{(.*?)\})(\[(.*?)\])(\[(.*?)\])(\{(.*?)\})/
Although this works fine if \autocites contains only two groups of three parameters I'm not able to figure out how to get it working for an unknown number of parameters.
I also tried using the following expression:
/\\autocites((\[(.*?)\]\[(.*?)\])?\{(.*?)\}){2,}/
This time I'm able to match even larger numbers of parameters but then I'm not able to extract all values because PHP always just gives me the content of the last three parameters:
Array
(
[0] => Array
(
[0] => \autocites[a][b]{c}[d][e]{f}[a][a]{a}
)
[1] => Array
(
[0] => [a][a]{a}
)
[2] => Array
(
[0] => [a][a]
)
[3] => Array
(
[0] => a
)
[4] => Array
(
[0] => a
)
[5] => Array
(
[0] => a
)
)
Any help is greatly appreciated.
You'll have to do this in two steps. Only .NET can retrieve an arbitrary amount of captures. In all other flavors, the amount of resulting captures is fixed by the number of groups in your pattern (repeating a group will only overwrite previous captures).
So first, match the entire thing to get the parameters, and then extract them in a second step:
preg_match('/\\\\autocites((?:\{[^}]*\}|\[[^]]*\])+)/', $input, $autocite);
preg_match_all('/(?|\{([^}]*)\}|\[([^]]*)\])/', $autocite[1], $parameters);
// $parameters[1] will now be an array of all parameters
Working demo.
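For illustration, here is the two-step approach applied to a made-up input string (the sample text is an assumption; the two patterns are the ones given above):

```php
<?php
// Hypothetical input; single quotes keep the backslash literal.
$input = 'See \autocites[a1][a2]{a3}[b1][]{b3} for details.';

$parameters = [1 => []];
// Step 1: grab the whole run of [..] and {..} arguments after the command.
if (preg_match('/\\\\autocites((?:\{[^}]*\}|\[[^]]*\])+)/', $input, $autocite)) {
    // Step 2: split that run into individual parameters. The branch reset
    // (?|...) makes both alternatives store their content in group 1.
    preg_match_all('/(?|\{([^}]*)\}|\[([^]]*)\])/', $autocite[1], $parameters);
}

print_r($parameters[1]);
```

Empty optional arguments such as [] show up as empty strings, so positions within each group of three are preserved.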
Using a slightly more elaborate approach and the anchor \G we could also do it all in one go, by using an arbitrary amount of matches instead of captures:
preg_match_all('/
    (?|                              # two alternatives whose group numbers both begin at 1
        \\\\autocites                # match the command
        (?|\{([^}]*)\}|\[([^]]*)\])  # and a parameter in group 1
    |                                # OR
        \G                           # anchor the match to the end of the last match
        (?|\{([^}]*)\}|\[([^]]*)\])  # and match a parameter in group 1
    )
    /x',
    $input,
    $parameters);
// again, you'll have an array of parameters in $parameters[1]
Working demo.
Note that with this approach, if you have multiple autocites in your code, you'll get all parameters from all commands in a single list. There are some ways to alleviate that, but I think the first approach would be cleaner in that case.
If you want to be able to distinguish between optional and mandatory parameters (with any approach), capture the opening or closing bracket/brace along with the parameter, and check against that character to find out which type it is.
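One possible way to do that, sketched under the assumption that the argument run has already been extracted (as in the first step above). It is a simplification: it does not check that the closing bracket type matches the opening one.

```php
<?php
// Hypothetical argument run, e.g. the content of $autocite[1].
$args = '[a1][]{a3}{b}';

// Group 1 captures the opening bracket or brace, group 2 the content.
preg_match_all('/([\[{])([^\]}]*)[\]}]/', $args, $sets, PREG_SET_ORDER);

$result = [];
foreach ($sets as [, $open, $value]) {
    $result[] = [
        'type'  => $open === '[' ? 'optional' : 'mandatory',
        'value' => $value,
    ];
}

print_r($result);
```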

php preg_match optional sub pattern

I'm trying to build a small system that extracts variables from an HTML template.
The variables are written as #XXX# and may (but are not required to) carry an extra parameter after a colon (:), e.g. #XXX#:j to send the data JSON-encoded.
I wrote a preg_match_all call to capture the variables and those extra parameters. This is what I came up with:
preg_match_all("/(#.*#(?:(j|n|x|z))?)/imU", $string,$this->localVariables,PREG_PATTERN_ORDER);
j|n|x|z are the available extra parameters.
The string I pass as $string is:
#geterr# #domain#:j #jhon#:n
The result I get from preg_match_all is:
Array
(
[0] => Array
(
[0] => #geterr#
[1] => #domain#
[2] => #jhon#
)
[1] => Array
(
[0] => #geterr#
[1] => #domain#
[2] => #jhon#
)
[2] => Array
(
[0] =>
[1] =>
[2] =>
)
)
I know (or think I know) that ?: is used for an optional sub-pattern.
The modifiers I've used are:
i for case-insensitive matching
m to allow my string to be multiline
U to be non-greedy
I have no other clue what I'm doing wrong.
Any help would be greatly appreciated.
There are some issues in your pattern /(#.*#(?:(j|n|x|z))?)/imU
You don't need a capturing group around the whole pattern.
?: creates a non-capturing group; it has nothing to do with making a group optional (that is the job of the ? after the group).
The modifier m is called multiline, but the name is a bit misleading: it only affects the anchors ^ and $, letting them match at the start and end of each line and not only of the whole string.
What you want is the modifier s, the singleline modifier. It treats the whole string as one line and makes . match newline characters as well.
The modifier U makes every quantifier in your regex ungreedy. This is not what you want, because it also affects the ? on your optional group: an ungreedy optional group prefers to match nothing, and since it sits at the end of the pattern nothing forces it to match, so it never does.
You need to match the : in your string.
So I would remove U and make only the first quantifier ungreedy by adding a ? after it.
So I think your regex should be:
/#(.*?)#(?::(j|n|x|z))?/is
This would put the first part between the # in the first capturing group and the parameter in the second group.
See it here on Regexr
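As a quick check of the suggested pattern against the sample string from the question (the variable name $m is illustrative):

```php
<?php
$string = '#geterr# #domain#:j #jhon#:n';

// Group 1: the name between the # signs (lazy, so it stops at the next #).
// Group 2: the optional parameter after the colon.
preg_match_all('/#(.*?)#(?::(j|n|x|z))?/is', $string, $m, PREG_PATTERN_ORDER);

print_r($m[1]); // variable names
print_r($m[2]); // parameters; an empty string where none was given
```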

Problem (un-)greedy RegExp

Consider the following Strings:
1: cccbbb
2: cccaaabbb
What I would like to end up with are matches like this:
1: Array
(
[1] =>
[2] => bbb
)
2: Array
(
[1] => aaa
[2] => bbb
)
How can I match both in one RegExp?
Here's my try:
#(aaa)?(.*)$#
I have tried many variants of greedy and ungreedy modifications but it doesn't work out. As soon as I add the '?' everything is matched in [2]. Making [2] ungreedy doesn't help.
My RegExp works as expected if I omit the 'ccc', but I have to allow other characters at the beginning...
/(aaa)?((.)\3*)$/
There will be an extra [3] though. I don't think that's a problem.
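To see why the original attempt puts everything in [2]: the engine reports the leftmost match, and at offset 0 the optional (aaa)? can match nothing while .* consumes the whole string. The pattern above avoids this because a run of one repeated character followed by $ cannot succeed until the engine has moved past "ccc". A small sketch comparing both (the array names are illustrative):

```php
<?php
$original = [];
$answer = [];

foreach (['cccbbb', 'cccaaabbb'] as $s) {
    // The question's attempt: succeeds at offset 0 with an empty group 1.
    preg_match('#(aaa)?(.*)$#', $s, $original[$s]);

    // The suggested pattern: group 2 must be a run of one repeated
    // character (captured in group 3) reaching the end of the string.
    preg_match('/(aaa)?((.)\3*)$/', $s, $answer[$s]);
}

print_r($original);
print_r($answer);
```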
Thanks for the brainstorming here guys! I have finally been able to figure something out that's working:
^(?:([^a]*)(aaa))?(.*)$
Here's a non-regex way: search for "aaa" and, if it is found, split on it and store the part to the right of "aaa" in the array.
$str="cccaaabbb";
$array = []; // initialise so print_r() works even when "aaa" is absent
if (strpos($str,"aaa")!==FALSE){
$array[]="aaa";
$s = explode("aaa",$str);
$array[]=end($s);
}
print_r($array);
output
$ php test.php
Array
(
[0] => aaa
[1] => bbb
)
As for [1], depending on what your criteria are when "aaa" is not found, it can be as simple as getting the substring from character 4 onwards using substr().
This will match the groups, but it's not very flexible. Can you give a little more detail about what you need to do? It may be much easier to grab three characters at a time and evaluate them.
Also, I tested this in PowerShell, which has a slightly different flavor of regex.
(a{3,3})*(b{3,3})
Do it like this:
$sPattern = "/(aaa?|)(bbb)/";
This works well.
