PHP preg_match optional sub pattern

I am trying to build a small system that extracts variables from an HTML template.
The variables are defined as #XXX# and may (but are not required to) have extra parameters that are passed after a colon (:), e.g. #XXX#:j to send the data JSON-encoded.
So far I have written a preg_match_all call to capture the variables and those extra parameters. I came up with this pattern:
preg_match_all("/(#.*#(?:(j|n|x|z))?)/imU", $string,$this->localVariables,PREG_PATTERN_ORDER);
j|n|x|z are the available extra parameters.
The string that I pass as $string is:
#geterr# #domain#:j #jhon#:n
The result I get from preg_match_all is:
Array
(
[0] => Array
(
[0] => #geterr#
[1] => #domain#
[2] => #jhon#
)
[1] => Array
(
[0] => #geterr#
[1] => #domain#
[2] => #jhon#
)
[2] => Array
(
[0] =>
[1] =>
[2] =>
)
)
I know (or think I know) that ?: is used for an optional sub-pattern.
The modifiers I've used are:
i for case-insensitive matching
m to allow my string to be multiline
U to be non-greedy
I have no other clue what I am doing wrong.
Any help would be greatly appreciated.

There are some issues in your pattern /(#.*#(?:(j|n|x|z))?)/imU:
You don't need a capturing group around the whole pattern.
?: creates a non-capturing group; it does not make the group optional (that is the job of the ? after the closing parenthesis).
The modifier m is called multiline, but the name is a bit misleading: it only affects the anchors ^ and $, so that they also match at the start and end of each line and not only of the whole string.
What you want is the modifier s, the single-line modifier. It treats the whole string as one line, so that . also matches newline characters.
The modifier U makes your whole regex ungreedy. This is not what you want, because it also affects your optional group: since that group is at the end of the pattern, the now-lazy ? will always prefer to match nothing.
You need to match the : in your string.
So I would remove U and make only the first quantifier ungreedy by adding a ? after it.
So I think your regex should be:
/#(.*?)#(?::(j|n|x|z))?/is
This would put the first part between the # in the first capturing group and the parameter in the second group.
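As a quick check, here is a minimal sketch that runs the suggested pattern against the sample string from the question. The variable names $string, $pattern and $matches are only illustrative, and PREG_PATTERN_ORDER is the same flag the question uses.

```php
<?php
// Sample input taken from the question.
$string = '#geterr# #domain#:j #jhon#:n';

// Lazy .*? for the name, then an optional non-capturing group
// that consumes the colon and captures the parameter letter.
$pattern = '/#(.*?)#(?::(j|n|x|z))?/is';

preg_match_all($pattern, $string, $matches, PREG_PATTERN_ORDER);

// $matches[1] holds the variable names, $matches[2] the parameters
// (an empty string where no parameter was given).
print_r($matches[1]);
print_r($matches[2]);
```

With PREG_PATTERN_ORDER, a variable without a parameter still gets an entry in $matches[2], so the two arrays stay aligned by index.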

Related

How to alter my regex so that preg_match returns the desired string

I have the string:
<mml:mi>P</mml:mi><mml:mn>2</mml:mn>
and wish to retrieve the 2.
My pattern is:
/(?:<mml:)(mn|mi|mo)>(.+)(?:<\/mml:\1>)$/
The return is the 2, as it should be.
But if the string is:
<mml:mi>P</mml:mi><mml:mi>s</mml:mi>
the pattern should return the s from inside the second set of tags; instead, the capture starts at the P inside the first set:
P</mml:mi><mml:mi>s
When I change the pattern, as in the suggestion below, to:
/<mml:(mn|mi|mo)>(.*?)<\/mml:\1>/sU
the return is the same. The line of PHP is:
preg_match('/<mml:(mn|mi|mo)>(.*?)<\/mml:\1>/sU', '<mml:mi>P</mml:mi><mml:mi>s</mml:mi>', $ret, PREG_OFFSET_CAPTURE);
and $ret contains:
Array
(
[0] => Array
(
[0] => <mml:mi>P</mml:mi><mml:mi>s</mml:mi>
[1] => 0
)
[1] => Array
(
[0] => mi
[1] => 5
)
[2] => Array
(
[0] => P</mml:mi><mml:mi>s
[1] => 8
)
)
When changed to the edited suggestion, with the ? removed:
/<mml:(mn|mi|mo)>(.*)<\/mml:\1>/sU
the return is P, from the first occurrence, rather than the s from the second.
Typing from my phone, so will be brief.
Instead of matching any character (.+), match any character that is not the beginning of the next tag ([^<]+).
This way you don't have to worry about using back references, nor will you grab everything between two identical tags.
(Double-check where I put the caret; this is off the top of my head.)
To get the last occurrence, wrap the whole regex in ()+. A repeated group keeps only the capture from its last iteration, and the inner groups shift up by one, so the back reference becomes \2:
/(<mml:(mn|mi|mo)>([^<]+)<\/mml:\2>)+/
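To see what the repeated group returns, here is a small sketch that runs this pattern against both strings from the question. The names $inputs and $last are made up for the example.

```php
<?php
// Both sample strings from the question.
$inputs = [
    '<mml:mi>P</mml:mi><mml:mn>2</mml:mn>',
    '<mml:mi>P</mml:mi><mml:mi>s</mml:mi>',
];

// Group 1 wraps one whole tag pair and is repeated; group 2 is the tag
// name, group 3 the text. A repeated group keeps only its last capture.
$pattern = '/(<mml:(mn|mi|mo)>([^<]+)<\/mml:\2>)+/';

$last = [];
foreach ($inputs as $input) {
    if (preg_match($pattern, $input, $m)) {
        $last[] = $m[3]; // text of the last tag pair in the run
    }
}
print_r($last);
```

Note that this only works when the tag pairs follow each other directly; any text between them ends the repetition.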
Here is an optimized pattern. Not only will it run faster than Tim's, but preg_match() will also return fewer elements in the output array:
~<m{2}l:(m[ino])>\K[^<](?=</m{2}l:\1>$)~
Enhancements:
Replace standard pattern delimiter slash / with ~ to avoid escaping for improved brevity.
Use quantifiers for consecutive characters for improved efficiency. {2}
Use character class instead of pipes for improved efficiency and brevity. m[ino]
Use \K to start the fullstring match from middle of pattern, effectively removing the need for an extra capture group for improved efficiency.
Use a negated character class to match the desired character: [^<]. Note: if your desired substring is more than one character, use [^<]+.
Use a positive lookahead to accurately match the closing tag followed by the end-of-string anchor $.
PHP Implementation:
echo preg_match('~<m{2}l:(m[ino])>\K[^<](?=</m{2}l:\1>$)~','<mml:mi>P</mml:mi><mml:mi>s</mml:mi>',$out)?$out[0]:'fail';
Output:
s
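A side note on the results reported in the question: PCRE's U modifier inverts greediness, so under U a plain .* is lazy and .*? becomes greedy again. That would explain why adding the ? seemed to change nothing. The sketch below reproduces this; the variable names are only illustrative.

```php
<?php
$input = '<mml:mi>P</mml:mi><mml:mi>s</mml:mi>';

// Without U: .*? is lazy and stops at the first closing tag.
preg_match('/<mml:(mn|mi|mo)>(.*?)<\/mml:\1>/s', $input, $lazy);

// With U: the trailing ? flips .* back to greedy,
// so it runs on to the last closing tag.
preg_match('/<mml:(mn|mi|mo)>(.*?)<\/mml:\1>/sU', $input, $flipped);

// With U and no ?: .* is lazy.
preg_match('/<mml:(mn|mi|mo)>(.*)<\/mml:\1>/sU', $input, $ungreedy);

echo $lazy[2], "\n", $flipped[2], "\n", $ungreedy[2], "\n";
```

In short, it is usually safer to drop U and mark individual quantifiers as lazy with ? where needed.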

PHP: Matching strings between two strings

I have a problem with preg_match that I can't figure out.
Let the code explain it:
function::wp_statistics_useronline::end
function::wp_statistics_visitor|today::end
function::wp_statistics_visitor|yesterday::end
function::wp_statistics_visitor|week::end
function::wp_statistics_visitor|month::end
function::wp_statistics_visitor|total::end
These are some strings that run functions inside PHP.
When I use just one function::*::end, it works fine.
But when the text contains more than one function, it does not work the way I want.
It parses the match like this:
function::wp_statistics_useronline::end function::wp_statistics_visitor|today::end AND ....::end
So basically I need a regex that separates them and gives me an array entry for each function::*::end.
I assume you were actually using function::(.*)::end since function::*::end is never going to work (it can only match strings like "function::::::end").
The reason your regex failed with multiple matches on the same line is that the quantifier * is greedy by default, matching as many characters as possible. You need to make it lazy: function::(.*?)::end
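A small sketch of the difference, using two of the strings from the question joined on one line ($subject, $greedy and $lazy are just example names):

```php
<?php
$subject = 'function::wp_statistics_useronline::end function::wp_statistics_visitor|today::end';

// Greedy: .* runs to the LAST ::end, so both calls end up in one match.
preg_match_all('/function::(.*)::end/', $subject, $greedy);

// Lazy: .*? stops at the FIRST ::end, giving one match per call.
preg_match_all('/function::(.*?)::end/', $subject, $lazy);

print_r($greedy[1]);
print_r($lazy[1]);
```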
It's pretty straightforward:
$result = preg_match_all('~function::(\S*)::end~m', $subject, $matches)
? $matches[1] : [];
Which gives:
Array
(
[0] => wp_statistics_useronline
[1] => wp_statistics_visitor|today
[2] => wp_statistics_visitor|yesterday
[3] => wp_statistics_visitor|week
[4] => wp_statistics_visitor|month
[5] => wp_statistics_visitor|total
)
And (for the second example):
Array
(
[0] => wp_statistics_useronline
[1] => wp_statistics_visitor|today
)
The regex puts a capturing group around the part in the middle, which does not contain whitespace, so \S* is a good fit.
As that capturing group is the first one, you can retrieve its matches from $matches[1] after running the regular expression.
This is what you're looking for:
function\:\:(.*?)\:
Make sure you have the dot-matches-all modifier (s) set.
After you get the matches, run them through a for loop, explode each one on "|", push the result to an array and boom goes the dynamite, you've got what you're looking for.
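The loop-and-explode step might look roughly like this. It is only a sketch: it uses the lazy ::end pattern from the earlier answer rather than the shorter one above, and the array keys 'name' and 'arg' are hypothetical names chosen for the example.

```php
<?php
$subject = "function::wp_statistics_useronline::end\n"
         . "function::wp_statistics_visitor|today::end";

// The s modifier lets . match newlines as well.
preg_match_all('/function::(.*?)::end/s', $subject, $matches);

$calls = [];
foreach ($matches[1] as $captured) {
    // Limit of 2: everything after the first | is kept as one argument.
    $parts = explode('|', $captured, 2);
    $calls[] = [
        'name' => $parts[0],
        'arg'  => $parts[1] ?? null, // null when there is no | part
    ];
}
print_r($calls);
```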

PHP preg_match_all: extract parameters of a command

I have the following LaTeX command:
\autocites[][]{}[][]{}
where the parameters inside [] are optional and the others inside {} are mandatory. The \autocites command can be extended by additional groups of arguments like:
\autocites[a1][a2]{a3}[b1][b2]{b3}
\autocites[a1][a2]{a3}[b1][b2]{b3}[c1][c2]{c3}
...
It can also be used like this:
\autocites{a}{b}
\autocites{a}[b1][]{b3}
\autocites{a}[][b2]{b3}
...
I'd like to extract its parameters by using a regular expression in PHP. This is my first attempt:
/\\autocites(\[(.*?)\])(\[(.*?)\])(\{(.*?)\})(\[(.*?)\])(\[(.*?)\])(\{(.*?)\})/
Although this works fine if \autocites contains exactly two groups of three parameters, I can't figure out how to get it working for an unknown number of parameters.
I also tried using the following expression:
/\\autocites((\[(.*?)\]\[(.*?)\])?\{(.*?)\}){2,}/
This time I'm able to match even larger numbers of parameters but then I'm not able to extract all values because PHP always just gives me the content of the last three parameters:
Array
(
[0] => Array
(
[0] => \autocites[a][b]{c}[d][e]{f}[a][a]{a}
)
[1] => Array
(
[0] => [a][a]{a}
)
[2] => Array
(
[0] => [a][a]
)
[3] => Array
(
[0] => a
)
[4] => Array
(
[0] => a
)
[5] => Array
(
[0] => a
)
)
Any help is greatly appreciated.
You'll have to do this in two steps. Only a few regex flavors (.NET being the best-known example) can retrieve an arbitrary number of captures from a repeated group. In PCRE, which PHP uses, the number of resulting captures is fixed by the number of groups in your pattern (repeating a group only overwrites the previous capture).
So first, match the entire thing to get the parameters, and then extract them in a second step:
preg_match('/\\\\autocites((?:\{[^}]*\}|\[[^]]*\])+)/', $input, $autocite);
preg_match_all('/(?|\{([^}]*)\}|\[([^]]*)\])/', $autocite[1], $parameters);
// $parameters[1] will now be an array of all parameters
Using a slightly more elaborate approach and the anchor \G, we could also do it all in one go by using an arbitrary number of matches instead of captures:
preg_match_all('/
    (?|                  # two alternatives whose group numbers both begin at 1
        \\\\autocites    # match the command
        (?|\{([^}]*)\}|\[([^]]*)\])
                         # and a parameter in group 1
    |                    # OR
        \G               # anchor the match to the end of the last match
        (?|\{([^}]*)\}|\[([^]]*)\])
                         # and match a parameter in group 1
    )
    /x',
    $input,
    $parameters);
// again, you'll have an array of parameters in $parameters[1]
Note that with this approach, if you have multiple autocites in your code, you'll get all parameters from all commands in a single list. There are some ways to alleviate that, but I think the first approach would be cleaner in that case.
If you want to be able to distinguish between optional and mandatory parameters (with any approach), capture the opening or closing bracket/brace along with the parameter, and check against that character to find out which type it is.
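As a rough sketch of that last idea, built on the two-step approach: the second pattern captures the opening delimiter in group 1 and the content in group 2. The keys 'type' and 'value' and the variable names are invented for this example, and it assumes parameters never contain nested brackets or braces.

```php
<?php
// One of the usage examples from the question.
$input = '\autocites{a}[b1][]{b3}';

// Step 1: grab the whole run of parameters after the command name.
preg_match('/\\\\autocites((?:\{[^}]*\}|\[[^]]*\])+)/', $input, $autocite);

// Step 2: branch reset (?|...) so that, in either alternative,
// group 1 is the opening delimiter and group 2 is the content.
preg_match_all(
    '/(?|(\{)([^}]*)\}|(\[)([^]]*)\])/',
    $autocite[1],
    $found,
    PREG_SET_ORDER
);

$parameters = [];
foreach ($found as $p) {
    $parameters[] = [
        'type'  => $p[1] === '{' ? 'mandatory' : 'optional',
        'value' => $p[2],
    ];
}
print_r($parameters);
```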

preg_match splitting one word in two, returning last character as a captured group

This is my pattern:
$fullname = '/^([a-zA-Z]+)(?:[,]?[\s]?)?([a-zA-Z]+)((?:\s)([a-zA-Z]{1}))?$/i';
and I'm using it with preg_match($fullname, $value, $match) in PHP.
I want to capture Lastname and then Firstname. Since I'll be using AJAX to load results on the fly with a LIKE in my SQL statement, I want to start with the last name and not wait for the first name.
The problem is that when I only enter the lastname (first word), I get the last character of the lastname as a captured group.
Array
(
[0] => SMITH
[1] => SMIT
[2] => H
)
I'd like to get
Array
(
[0] => SMITH
[1] => SMITH
)
but I don't understand what is going on here.
That is because you require at least one letter in the second [a-zA-Z]+. If you make that second capturing group optional it should work:
/^([a-zA-Z]+)(?:[,]?[\s]?)?([a-zA-Z]+)?((?:\s)([a-zA-Z]{1}))?$/i
However, you are using the case-insensitive flag but still provide both upper- and lower-case variants. Plus {1} is always redundant. Lastly, single-character character classes are unnecessary, too, and while it might be a matter of taste, I think they only aid readability for spaces and characters that need to be escaped. This can be shortened:
/^([a-z]+)(?:,?\s?)?([a-z]+)?((?:\s)([a-z]))?$/i
Maybe it would also be a good idea to nest some of your optional groups. For example, there is no reason to allow a second name if there is no comma or space to delimit it:
/^([a-z]+)(?:(?:,\s?|\s)([a-z]+)?)?(?:\s([a-z]))?$/i
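For comparison, a short sketch running the original pattern and the nested version against a few inputs. The input 'Smith, John A' and the variable names are made up for the example.

```php
<?php
$original = '/^([a-zA-Z]+)(?:[,]?[\s]?)?([a-zA-Z]+)((?:\s)([a-zA-Z]{1}))?$/i';
$nested   = '/^([a-z]+)(?:(?:,\s?|\s)([a-z]+)?)?(?:\s([a-z]))?$/i';

// Original: the second [a-zA-Z]+ needs at least one letter,
// so the first group has to give back the final "H".
preg_match($original, 'SMITH', $before);

// Nested version: the second name is optional, so group 1 keeps the whole word.
preg_match($nested, 'SMITH', $after);

// Last name, first name and a middle initial.
preg_match($nested, 'Smith, John A', $full);

print_r($before);
print_r($after);
print_r($full);
```

preg_match() drops trailing groups that did not take part in the match, which is why the array for 'SMITH' has only two elements with the nested pattern.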

Why does this regex capture produce 2 matches?

I do not know why this regex produces 2 captures in addition to the full match, when I expected only 1.
preg_match('/_(\d(-\d){0,3})\./', $str, $matches);
on file names of the format name_A-B-C-D.ext.
I would expect to get a single match like this:
Example A
[0] => name_A-B-C-D.ext
[1] => A-B-C-D
Example B
[0] => name_A-B-C.ext
[1] => A-B-C
But this is the result I get:
Example A
[0] => name_A-B-C-D.ext
[1] => A-B-C-D
[2] => -D
Example B
[0] => name_A-B-C.ext
[1] => A-B-C
[2] => -C
I only wish to capture A up to D if it's preceded by a hyphen.
This code is usable and I can simply ignore the 2nd match, but I would like to know why it's there. I can only assume it has something to do with my two capture groups. Where is my error?
Yes, you get two captures because you have two capturing groups in your regular expression.
To avoid the unwanted capture you could use a non-capturing group (?:...):
/_(\d(?:-\d){0,3})\./
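A quick side-by-side, as a sketch: digits stand in for the A to D placeholders from the question (the pattern uses \d), and the variable names are only illustrative.

```php
<?php
$str = 'name_1-2-3-4.ext';

// Two capturing groups: the inner one reports its last repetition.
preg_match('/_(\d(-\d){0,3})\./', $str, $capturing);

// Inner group made non-capturing: only one capture remains.
preg_match('/_(\d(?:-\d){0,3})\./', $str, $nonCapturing);

print_r($capturing);
print_r($nonCapturing);
```

Element 0 is always the text matched by the whole pattern, here from the underscore up to and including the dot.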
I can only assume it has something to do with my two capture groups.
Your assumption is correct.
Where is my error?
There is no error; everything is behaving as expected.
You have two groups in your RE, so you get 2 captures. What is surprising?
Each pair of parentheses is a group.
