I have a list of regular expressions:
suresnes|suresne|surenes|surene
pommier|pommiers
^musique$
^(faq|aide)$
^(file )?loss( )?less$
paris
faq <<< this match twice
My use case is that each pattern that matches displays a link to my user,
so several patterns can match at once.
I test those patterns against a simple string of text: "live in paris" / "faq" / "pom"...
The simple way to do it is to loop over all the patterns with preg_match, but I will do that a lot on a performance-critical page, so this looks bad to me.
Here is what I have tried: combining all those expressions into one with named groups:
preg_match("#(?P<group1>^(faq|aide|todo|paris)$)|(?P<group2>(paris)$)#im", "paris", $groups);
As you can see, each pattern is grouped as (?P<GROUPNAME>PATTERN), and the groups are separated by a pipe |.
The result is not what I expect, as only the first matching group is returned. It looks like matching stops as soon as one alternative matches.
What I want is the list of all the matching groups. preg_match_all does not help either.
Thanks!
How about:
preg_match("#(?=(?P<group1>^(faq|aide|todo|paris)$))(?=(?P<group2>(paris)$))#im", "paris", $groups);
print_r($groups);
output:
Array
(
[0] =>
[group1] => paris
[1] => paris
[2] => paris
[group2] => paris
[3] => paris
[4] => paris
)
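The (?= ) construct is called a lookahead: it checks that its pattern matches at the current position without consuming any characters, so the second lookahead starts from the same position as the first.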
Explanation of the regex:
(?= # start lookahead
(?P<group1> # start named group group1
^ # start of string
( # start capture group #2
faq|aide|todo|paris # match any of faq, aide, todo or paris
) # end capture group #2
$ # end of string
) # end of named group group1
) # end of lookahead
(?= # start lookahead
(?P<group2> # start named group group2
( # start capture group #4
paris # paris
) # end capture group #4
$ # end of string
) # end of named group group2
) # end of lookahead
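Note that both lookaheads above have to succeed for the overall match to succeed, so a subject such as "faq" yields no match at all. One possible variation, sketched below, gives each lookahead an empty alternative so that every pattern is tried independently within a single preg_match call. The helper name matching_patterns and the group names p0, p1, ... are made up for this illustration, the patterns are assumed not to contain the ~ delimiter, and whether this is faster than a plain loop is something to benchmark rather than assume.

```php
<?php
// Hypothetical helper: returns the keys of all patterns that match $subject,
// using one preg_match call built from lookaheads that are allowed to fail.
function matching_patterns(array $patterns, string $subject): array
{
    $names = [];
    $combined = '';
    $i = 0;
    foreach ($patterns as $key => $pattern) {
        $name = 'p' . $i++;
        $names[$name] = $key;
        // .*? lets an unanchored pattern match anywhere in the subject;
        // the empty alternative after | keeps a failing lookahead from
        // failing the whole match.
        $combined .= '(?:(?=.*?(?P<' . $name . '>' . $pattern . '))|)';
    }
    // PREG_UNMATCHED_AS_NULL (PHP 7.2+) reports groups that took no part as null.
    preg_match('~^' . $combined . '~is', $subject, $m, PREG_UNMATCHED_AS_NULL);
    $hits = [];
    foreach ($names as $name => $key) {
        if (isset($m[$name])) {
            $hits[] = $key;
        }
    }
    return $hits;
}

$patterns = [
    'exact' => '^(faq|aide|todo|paris)$',
    'paris' => 'paris',
    'faq'   => 'faq',
];
print_r(matching_patterns($patterns, 'paris'));         // keys: exact, paris
print_r(matching_patterns($patterns, 'live in paris')); // keys: paris
```

Because the result is keyed by pattern, each hit can be mapped directly to the link that should be displayed.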
Try this approach:
#/ define input string
$str_1 = "{STRING HERE}";
#/ Define regex array
$reg_arr = array(
'suresnes|suresne|surenes|surene',
'pommier|pommiers',
'^musique$',
'^(faq|aide)$',
'^(file )?loss( )?less$',
'paris',
'faq'
);
#/ define a callback function to process the regex array
function cb_reg($reg_t)
{
    global $str_1;
    if (preg_match("/{$reg_t}/ims", $str_1, $matches)) {
        // return the full match; $matches[1] only exists when the pattern
        // contains a capture group, which 'paris' or 'faq' do not
        return $matches[0];
    }
    return '';
}
#/ Apply the callback to every pattern
$results = array_map('cb_reg', $reg_arr); // one entry per pattern, '' where it did not match
$results = array_diff($results, array('')); // drop the patterns that did not match (keys are preserved)
Basically, this is the most straightforward way I could think of.
Combining hundreds of regexes into one call produces a very complex pattern that is hard to build and debug, and with plain alternation only the first matching alternative is reported. Running each pattern separately avoids both problems.
In my opinion, one huge combined regex (if it can be built at all) is not guaranteed to execute faster with preg_match than this callback-on-array approach, so benchmark both on your own data.
Also note:
array_map still calls the callback once per element, so the work grows linearly with the number of patterns, exactly as with a foreach loop. It does not change the algorithmic complexity; any difference between the two is a constant factor that only a benchmark can settle.
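For comparison, the same per-pattern check can be written without the global variable, for instance with array_filter and a closure. This is only a sketch: matched_patterns is a made-up name, the patterns are assumed not to contain the ~ delimiter, and it still runs one preg_match per pattern.

```php
<?php
// Hypothetical helper: keeps the patterns that match $subject.
function matched_patterns(array $patterns, string $subject): array
{
    return array_values(array_filter(
        $patterns,
        function (string $pattern) use ($subject): bool {
            // preg_match returns 1 on a match, 0 on no match, false on error
            return preg_match('~' . $pattern . '~i', $subject) === 1;
        }
    ));
}

$reg_arr = [
    'suresnes|suresne|surenes|surene',
    'pommier|pommiers',
    '^musique$',
    '^(faq|aide)$',
    '^(file )?loss( )?less$',
    'paris',
    'faq',
];
print_r(matched_patterns($reg_arr, 'faq'));           // ^(faq|aide)$ and faq
print_r(matched_patterns($reg_arr, 'live in paris')); // paris
```

Returning the patterns themselves (rather than the matched text) makes it easy to look up which link belongs to each hit.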
You can combine all of your regexes with "|" in between them. Then apply this: http://www.rexegg.com/regex-optimizations.html, which will optimize it, collapse common expressions, etc.
Related
I'm pretty lousy at regex, and need help with the following scenario. I need to locate and replace text that has a common structure, but one aspect will be different:
here is a string (with 3 values)
here is another string (with 5 values)
In the above examples, I need to locate and then replace the value in parentheses. I can't search by parentheses alone, as the string may contain other parentheses. But the parenthesized value that needs to be replaced is constructed consistently: (with # values). The only difference will be the number.
So ideally the regex returns (with 3 values) and (with 5 values) so I can use a simple str_replace to change the text.
This is regex in a PHP script.
Try with this regex :
\(with\s+\d+\s+values\)
The following regex should work for you:
/\(with (\d+) values\)/
This matches strings of the specified format and captures the number in a group so it can be used in the replacement. Note that PHP's preg_* functions have no g flag: preg_replace already replaces every occurrence, and preg_match_all finds them all.
If, however, there can only be one digit, then the following will work:
/\(with (\d) values\)/
Or, if the number can only be a single digit greater than 1, for example, then the following:
/\(with ([2-9]) values\)/
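As a minimal sketch of how the capture group can be used directly in the replacement, without a separate str_replace step (the sample sentence and the replacement wording are invented for the example):

```php
<?php
$text = 'here is a string (with 3 values), here is another string (with 5 values)';
// $1 in the replacement refers to the digits captured by (\d+)
$result = preg_replace('/\(with (\d+) values\)/', '(with $1 items)', $text);
echo $result, "\n";
// here is a string (with 3 items), here is another string (with 5 items)
```

preg_replace handles every occurrence in the subject, so no extra flag is needed.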
If I got you right, you are looking for exactly three or five items within parentheses (comma separated).
This could be accomplished by
\( # "(" literally
(?:[^,()]+,){2} # not , or ( or ) exactly two times
(?:(?:[^,()]+,){2})? # repeated
[^,()]+ # without the comma in the end
\) # the closing parenthesis
See a demo on regex101.com.
If you're really only looking for those two variants of the string, you could simply use
\(with (?:3|5) values\)
In general
\(with \d+ values\)
as proposed by @SchoolBoy.
Something like this maybe
$str ="here is another string (with 5 values)";
preg_match_all("/\(with (\d+) values\)/", $str, $out );
print_r( $out );
Output:
Array
(
[0] => Array
(
[0] => (with 5 values)
)
[1] => Array
(
[0] => 5
)
)
Here at ideone...
It uses the regex
\(with (\d+) values\)
that matches the literal opening parenthesis followed by the string with # values, capturing the actual number #, and finally the closing parenthesis.
It returns the complete match (the parenthesized string) in the first dimension and the actual number in the second.
I have the string:
<mml:mi>P</mml:mi><mml:mn>2</mml:mn>
and wish to retrieve the 2
My pattern is:
/(?:<mml:)(mn|mi|mo)>(.+)(?:<\/mml:\1>)$/
the return is the 2 as it should be,
but if the string is:
<mml:mi>P</mml:mi><mml:mi>s</mml:mi>
the pattern should then return the s, from inside the second set of tags, but returns the P from inside the first set
P</mml:mi><mml:mi>s
when changing the pattern as in the suggestion below to:
/<mml:(mn|mi|mo)>(.*?)<\/mml:\1>/sU
the return is the same. The line of php is:
preg_match('/<mml:(mn|mi|mo)>(.*?)<\/mml:\1>/sU', '<mml:mi>P</mml:mi><mml:mi>s</mml:mi>', $ret, PREG_OFFSET_CAPTURE);
and $ret contains:
Array
(
[0] => Array
(
[0] => <mml:mi>P</mml:mi><mml:mi>s</mml:mi>
[1] => 0
)
[1] => Array
(
[0] => mi
[1] => 5
)
[2] => Array
(
[0] => P</mml:mi><mml:mi>s
[1] => 8
)
)
and when changed to the edited suggestion, with the ? removed
/<mml:(mn|mi|mo)>(.*)<\/mml:\1>/sU
the return is P, from the first occurrence, rather than the s from the second.
Typing from my phone, so will be brief.
Instead of matching any character (.+), match any character that is not the beginning of the next tag ([^<]+)
This way you don't have to worry about using back references, nor will you grab everything between two identical tags.
(Double check where I put the caret, this is off the top of my head. )
To get the last occurrence, wrap the whole regex in ()+
/(<mml:(mn|mi|mo)>([^<]+)<\/mml:\2>)+/
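A quick check of this repeated-group pattern against the two strings from the question. When a capture group sits inside a repeated group, it keeps the text from the last repetition, which is why $m[3] holds the content of the last tag:

```php
<?php
$pattern = '/(<mml:(mn|mi|mo)>([^<]+)<\/mml:\2>)+/';

preg_match($pattern, '<mml:mi>P</mml:mi><mml:mi>s</mml:mi>', $m);
echo $m[3], "\n"; // s

preg_match($pattern, '<mml:mi>P</mml:mi><mml:mn>2</mml:mn>', $n);
echo $n[3], "\n"; // 2
```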
Here is an optimized pattern. Not only should it run faster than Tim's, but preg_match() will also return fewer elements in the output array:
~<m{2}l:(m[ino])>\K[^<](?=</m{2}l:\1>$)~
Enhancements:
Replace standard pattern delimiter slash / with ~ to avoid escaping for improved brevity.
Use quantifiers for consecutive characters for improved efficiency. {2}
Use character class instead of pipes for improved efficiency and brevity. m[ino]
Use \K to start the fullstring match from middle of pattern, effectively removing the need for an extra capture group for improved efficiency.
Use negated character class to match desired character [^<] *note, if your desired substring is more than one character use: [^<]+
Use positive lookahead to accurately match closing tag followed by end of line anchor $.
PHP Implementation: (Demo)
echo preg_match('~<m{2}l:(m[ino])>\K[^<](?=</m{2}l:\1>$)~','<mml:mi>P</mml:mi><mml:mi>s</mml:mi>',$out)?$out[0]:'fail';
Output:
s
I have two types of strings in one text:
a(bc)de(fg)h
a(bcd(ef)g)h
I need to get text between first level parentheses. In my example this is:
bc
fg
bcd(ef)g
I tried the following regular expression, /\((.+)\)/, with the Ungreedy (U) flag:
bc
fg
bcd(ef
And without it:
bc)de(fg
bcd(ef)g
Neither variant does what I need. Maybe someone knows how to solve my issue?
Use a PCRE recursive pattern to match substrings in nested parentheses:
$str = "a(bc)de(fg)h some text a(bcd(ef)g)h ";
preg_match_all("/\((((?>[^()]+)|(?R))*)\)/", $str, $m);
print_r($m[1]);
The output:
Array
(
[0] => bc
[1] => fg
[2] => bcd(ef)g
)
\( ( (?>[^()]+) | (?R) )* \)
First it matches an opening parenthesis. Then it matches any number of
substrings which can either be a sequence of non-parentheses, or a
recursive match of the pattern itself (i.e. a correctly parenthesized
substring). Finally, there is a closing parenthesis.
Technical cautions:
If there are more than 15 capturing parentheses in a pattern, PCRE has
to obtain extra memory to store data during a recursion, which it does
by using pcre_malloc, freeing it via pcre_free afterwards. If no
memory can be obtained, it saves data for the first 15 capturing
parentheses only, as there is no way to give an out-of-memory error
from within a recursion.
This question pretty much has the answer, but the implementations are a little ambiguous. You can use the logic in the accepted answer without the ~s to get this regex:
\(((?:[^()]++|(?R))*)\)
Tested with this output:
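A quick way to check the possessive variant, written here with an unescaped character class [^()], is to run it against the sample strings from the question (the combined test string is an assumption made for this sketch):

```php
<?php
$str = 'a(bc)de(fg)h some text a(bcd(ef)g)h';
// [^()]++ consumes runs of non-parenthesis characters without backtracking,
// (?R) recurses into the whole pattern for each nested pair
preg_match_all('/\(((?:[^()]++|(?R))*)\)/', $str, $m);
print_r($m[1]); // bc, fg, bcd(ef)g
```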
Please can you try that:
preg_match("/\((.+)\)/", $input_line, $output_array);
Test this code in http://www.phpliveregex.com/
Regex: \((.+)\)
Input: a(bcd(eaerga(er)gaergf)g)h
Output: array(2
0 => (bcd(eaerga(er)gaergf)g)
1 => bcd(eaerga(er)gaergf)g
)
Although I know regex well enough in pseudocode, I'm having trouble translating what I want into PHP's Perl-compatible regex.
I'm trying to use preg_match to extract part of my expression.
I have the following string ${classA.methodA.methodB(classB.methodC(classB.methodD)))} and I need to do two things:
a. validate the syntax
${classA.methodA.methodB(classB.methodC(classB.methodD)))} valid
${classA.methodA.methodB} valid
${classA.methodA.methodB()} not valid
${methodB(methodC(classB.methodD)))} not valid
b. I need to extract this information:
${classA.methodA.methodB(classB.methodC(classB.methodD)))} should return
1. classA
2. methodA
3. methodB(classB.methodC(classB.methodD)))
I've created this code
$expression = '${myvalue.fdsfs.fsdf.blo(fsdf.fsfds(fsfs.fs))}';
$pattern = '/\$\{(?:([a-zA-Z0-9]+)\.)(?:([a-zA-Z\d]+)\.)*([a-zA-Z\d.()]+)\}/';
if(preg_match($pattern, $expression, $matches))
{
echo 'found'.'<br/>';
for($i = 0; $i < count($matches); $i++)
echo $i." ".$matches[$i].'<br/>';
}
The result is :
found
0 ${myvalue.fdsfs.fsdf.blo(fsdf.fsfds(fsfs.fs))}
1 myvalue
2 fsdf
3 blo(fsdf.fsfds(fsfs.fs))
Obviously I'm having difficulty extracting the repeated methods, and it does not validate properly (honestly, I left that for last, for once I have solved the other problem): empty parentheses are allowed, and it does not check that every opened parenthesis is closed.
Thanks all
UPDATE
@m.buettner
Thanks for your help. I quickly tried your code and it has one very small issue, although I can work around it. It is the same issue as in one of my earlier attempts that I didn't post here, and it appears when I try this string:
$expression = '${myvalue.fdsfs}';
with your pattern definition it shows :
found
0 ${myvalue.fdsfs}
1 myvalue.fdsfs
2 myvalue
3
4 fdsfs
As you can see, the third capture comes back as an empty string. I couldn't understand why it does that. Can you suggest how to avoid it, or do I have to live with it due to PHP regex limits?
That said, I can only say thank you. Not only did you answer my problem, you also included as much information as possible, with many suggestions on the proper path to follow when developing patterns.
One last thing: I (stupidly) forgot to add one important case, which is multiple parameters separated by commas, so
$expression = '${classA.methodAA(classB.methodBA(classC.methodCA),classC.methodCB)}';
$expression = '${classA.methodAA(classB.methodBA(classC.methodCA),classC.methodCB,classD.mehtodDA)}';
must be valid.
I edited the pattern to this:
$expressionPattern =
'/
^ # beginning of the string
[$][{] # literal ${
( # group 1, used for recursion
( # group 2 (class name)
[a-z\d]+ # one or more alphanumeric characters
) # end of group 2 (class name)
[.] # literal .
( # group 3 (all intermediate method names)
(?: # non-capturing group that matches a single method name
[a-z\d]+ # one or more alphanumeric characters
[.] # literal .
)* # end of method name, repeat 0 or more times
) # end of group 3 (intermediate method names);
( # group 4 (final method name and arguments)
[a-z\d]+ # one or more alphanumeric characters
(?: # non-capturing group for arguments
[(] # literal (
(?1) # recursively apply the pattern inside group 1
(?: # non-capturing group for multiple arguments
[,] # literal ,
(?1) # recursively apply the pattern inside group 1 on parameters
)* # end of multiple arguments group; repeat 0 or more times
[)] # literal )
)? # end of argument-group; make optional
) # end of group 4 (method name and arguments)
) # end of group 1 (recursion group)
[}] # literal }
$ # end of the string
/ix';
@Casimir et Hippolyte
Your suggestion is also good, but it makes things a little complex when using this code. I mean the code itself is easy to understand, but it becomes less flexible. That said, it also gave me a lot of information that will surely be helpful in the future.
@Denomales
Thanks for your support, but your code fails when I try this:
$sourcestring='${classA1.methodA0.methodA1.methodB1(classB.methodC(classB.methodD))}';
the result is :
Array
(
[0] => Array
(
[0] => ${classA1.methodA0.methodA1.methodB1(classB.methodC(classB.methodD))}
)
[1] => Array
(
[0] => classA1
)
[2] => Array
(
[0] => methodA0
)
[3] => Array
(
[0] => methodA1.methodB1(classB.methodC(classB.methodD))
)
)
It should be
[2] => Array
(
[0] => methodA0.methodA1
)
[3] => Array
(
[0] => methodB1(classB.methodC(classB.methodD))
)
)
or
[2] => Array
(
[0] => methodA0
)
[3] => Array
(
[0] => methodA1
)
[4] => Array
(
[0] => methodB1(classB.methodC(classB.methodD))
)
)
This is a tough one. Recursive patterns are often beyond what's possible with regular expressions, and even where they are possible, they can lead to expressions that are very hard to understand and maintain.
You are using PHP and therefore PCRE, which indeed supports the recursive regex constructs (?n). As your recursive pattern is quite regular it is possible to find a somewhat practical solution using regex.
One caveat I should mention right away: since you allow an arbitrary number of "intermediate" method calls per level (in your snippet fdsfs and fsdf), you cannot get all of these in separate captures. That is simply impossible with PCRE. Each match will always yield the same finite number of captures, determined by the number of opening parentheses your pattern contains. If a capturing group is used repeatedly (e.g. using something like ([a-z]+\.)+) then every time the group is used the previous capture will be overwritten and you only get the last instance. Therefore, I recommend that you capture all the "intermediate" method calls together, and then simply explode that result.
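A small illustration of that overwriting behaviour, and of the capture-then-explode workaround; the input string is made up for the example:

```php
<?php
$input = 'abc.def.ghi.';

// Repeated capturing group: only the last repetition survives in $repeated[1].
preg_match('/^([a-z]+\.)+$/', $input, $repeated);
echo $repeated[1], "\n"; // ghi.

// Capture the whole run once, then split it in PHP.
preg_match('/^((?:[a-z]+\.)+)$/', $input, $whole);
// A negative limit makes explode drop the empty element after the trailing dot.
$methods = explode('.', $whole[1], -1);
print_r($methods); // abc, def, ghi
```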
Likewise you couldn't (if you wanted to) get the captures of multiple nesting levels at once. Hence, your desired captures (where the last one includes all nesting levels) are the only option - you can then apply the pattern again to that last match to go a level further down.
Now for the actual expression:
$pattern = '/
^ # beginning of the string
[$][{] # literal ${
( # group 1, used for recursion
( # group 2 (class name)
[a-z\d]+ # one or more alphanumeric characters
) # end of group 2 (class name)
[.] # literal .
( # group 3 (all intermediate method names)
(?: # non-capturing group that matches a single method name
[a-z\d]+ # one or more alphanumeric characters
[.] # literal .
)* # end of method name, repeat 0 or more times
) # end of group 3 (intermediate method names);
( # group 4 (final method name and arguments)
[a-z\d]+ # one or more alphanumeric characters
(?: # non-capturing group for arguments
[(] # literal (
(?1) # recursively apply the pattern inside group 1
[)] # literal )
)? # end of argument-group; make optional
) # end of group 4 (method name and arguments)
) # end of group 1 (recursion group)
[}] # literal }
$ # end of the string
/ix';
A few general notes: for complicated expressions (and in regex flavors that support it), always use the free-spacing x modifier which allows you to introduce whitespace and comments to format the expression to your desires. Without them, the pattern looks like this:
'/^[$][{](([a-z\d]+)[.]((?:[a-z\d]+[.])*)([a-z\d]+(?:[(](?1)[)])?))[}]$/ix'
Even if you've written the regex yourself and you are the only one who ever works on the project - try understanding this a month from now.
Second, I've slightly simplified the pattern by using the case-insensitive i modifier. It simply removes some clutter, because you can omit the upper-case variants of your letters.
Third, note that I use single-character classes like [$] and [.] to escape characters where this is possible. That is simply a matter of taste, and you are free to use the backslash variants. I just personally prefer the readability of the character classes (and I know others here disagree), so I wanted to present you this option as well.
Fourth, I've added anchors around your pattern, so that there can be no invalid syntax outside of the ${...}.
Finally, how does the recursion work? (?n) is similar to a backreference \n, in that it refers to capturing group n (counted by opening parentheses from left to right). The difference is that a backreference tries to match again what was matched by group n, whereas (?n) applies the pattern again. That is, (.)\1 matches the same character twice in a row, whereas (.)(?1) matches any character and then applies the pattern again, hence matching another arbitrary character. If you use one of those (?n) constructs within the nth group, you get recursion. (?0) or (?R) refers to the entire pattern. That is all the magic there is.
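The difference between the two constructs can be seen in a short sketch (the anchors and test strings are added here just to make the results unambiguous):

```php
<?php
// Backreference: the second character must equal what group 1 captured.
var_dump(preg_match('/^(.)\1$/', 'aa'));   // int(1)
var_dump(preg_match('/^(.)\1$/', 'ab'));   // int(0)

// Subroutine call: group 1's pattern (any character) is applied again.
var_dump(preg_match('/^(.)(?1)$/', 'ab')); // int(1)
```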
The above pattern applied to the input
'${abc.def.ghi.jkl(mno.pqr(stu.vwx))}'
will result in the captures
0 ${abc.def.ghi.jkl(mno.pqr(stu.vwx))}
1 abc.def.ghi.jkl(mno.pqr(stu.vwx))
2 abc
3 def.ghi.
4 jkl(mno.pqr(stu.vwx))
Note that there are a few differences to the outputs you actually expected:
0 is the entire match (and in this case just the input string again). PHP will always report this first, so you cannot get rid of it.
1 is the first capturing group which encloses the recursive part. You don't need this in the output, but (?n) unfortunately cannot refer to non-capturing groups, so you need this as well.
2 is the class name as desired.
3 is the list of intermediate method names, plus a trailing period. Using explode it's easy to extract all the method names from this.
4 is the final method name, with the optional (recursive) argument list. Now you could take this, and apply the pattern again if necessary. Note that for a completely recursive approach you might want to modify the pattern slightly. That is: strip off the ${ and } in a separate first step, so that the entire pattern has the exact same (recursive) pattern as the final capture, and you can use (?0) instead of (?1). Then match, remove method name, and parentheses, and repeat, until you get no more parentheses in the last capture.
For more information on recursion, have a look at PHP's PCRE documentation.
To illustrate my last point, here is a snippet that extracts all elements recursively:
if(!preg_match('/^[$][{](.*)[}]$/', $expression, $matches))
echo 'Invalid syntax.';
else
traverseExpression($matches[1]);
function traverseExpression($expression, $level = 0) {
$pattern = '/^(([a-z\d]+)[.]((?:[a-z\d]+[.])*)([a-z\d]+(?:[(](?1)[)])?))$/i';
if(preg_match($pattern, $expression, $matches)) {
$indent = str_repeat(" ", 4*$level);
echo $indent, "Class name: ", $matches[2], "<br />";
foreach(explode(".", $matches[3], -1) as $method)
echo $indent, "Method name: ", $method, "<br />";
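// split at the first "(" only, so that nested argument lists stay intact
$parts = explode('(', $matches[4], 2);
echo $indent, "Method name: ", $parts[0], "<br />";
if(count($parts) > 1) {
echo $indent, "With arguments:<br />";
// strip the ")" that closes this call before going one level down
traverseExpression(substr($parts[1], 0, -1), $level+1);
}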
}
else
{
echo 'Invalid syntax.';
}
}
Note again, that I do not recommend using the pattern as a one-liner, but this answer is already long enough.
You can do validation and extraction with the same pattern. Example:
$subjects = array(
'${classA.methodA.methodB(classB.methodC(classB.methodD))}',
'${classA.methodA.methodB}',
'${classA.methodA.methodB()}',
'${methodB(methodC(classB.methodD))}',
'${classA.methodA.methodB(classB.methodC(classB.methodD(classC.methodE)))}',
'${classA.methodA.methodB(classB.methodC(classB.methodD(classC.methodE())))}'
);
$pattern = <<<'LOD'
~
# definitions
(?(DEFINE)(?<vn>[a-z]\w*+))
# pattern
^\$\{
(?<classA>\g<vn>)\.
(?<methodA>\g<vn>)\.
(?<methodB>
\g<vn> (
\( \g<vn> \. \g<vn> (?-1)?+ \)
)?+
)
}$
~x
LOD;
foreach($subjects as $subject) {
echo "\n\nsubject: $subject";
if (preg_match($pattern, $subject, $m))
printf("\nclassA: %s\nmethodA: %s\nmethodB: %s",
$m['classA'], $m['methodA'], $m['methodB']);
else
echo "\ninvalid string";
}
Regex explanation:
At the end of the pattern you can see the x modifier, which allows spaces, newlines and comments inside the pattern.
The pattern begins with the definition of a named group vn (variable name). Here you can define what a name such as classA or methodB looks like for the whole pattern. You can then refer to this definition anywhere in the pattern with \g<vn>
Note that, if you want, you can define different kinds of names for classes and methods by adding other definitions. Example:
(?(DEFINE)(?<cn>....)) # for class name
(?(DEFINE)(?<mn>....)) # for method name
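As a sketch of what two separate definitions might look like in practice, the pattern below assumes (purely for illustration) that class names start with an upper-case letter and method names with a lower-case one; cn and mn are the group names suggested above:

```php
<?php
$pattern = '~
    (?(DEFINE)(?<cn>[A-Z]\w*+))   # class name: assumed to start upper-case
    (?(DEFINE)(?<mn>[a-z]\w*+))   # method name: assumed to start lower-case
    ^ \g<cn> \. \g<mn> $
~x';

var_dump(preg_match($pattern, 'ClassA.methodA')); // int(1)
var_dump(preg_match($pattern, 'classA.methodA')); // int(0)
```

The groups inside (?(DEFINE)...) are never matched directly; they only serve as targets for the \g<cn> and \g<mn> calls.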
The pattern itself:
(?<classA>\g<vn>) captures, in the named group classA, whatever the pattern defined in vn matches
The same goes for methodA
methodB is different because it can contain nested parentheses, which is why I use a recursive pattern for this part.
Detail:
\g<vn> # the method name (methodB)
( # open a capture group
\( # literal opening parenthesis
\g<vn> \. \g<vn> # for classB.methodC⑴
(?-1)?+ # call the most recently opened capture group (the one we are in)
# zero or one time (possessive), so that the recursion can stop
# when there are no more levels of parentheses
\) # literal closing parenthesis
)?+ # close the capture group
# zero or one time (possessive)
# to allow a method without parameters
⑴you can replace it by \g<vn>(?>\.\g<vn>)+ if you want to allow more than one method.
About possessive quantifiers:
You can add + after a quantifier ( * + ? ) to make it possessive. The advantage is that the regex engine knows it doesn't have to backtrack into the subpattern to try other ways of matching, which makes the regex more efficient.
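A minimal sketch of the behavioural difference, using a deliberately contrived pattern: the possessive a++ keeps every "a" it has taken, so nothing is left for the final a.

```php
<?php
// Greedy: a+ first takes all three letters, then gives one back for the last a.
var_dump(preg_match('/^a+a$/', 'aaa'));  // int(1)

// Possessive: a++ takes all three letters and never gives any back.
var_dump(preg_match('/^a++a$/', 'aaa')); // int(0)
```

So a possessive quantifier is only safe where whatever follows it can never need characters that the quantified part could also match.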
Description
This expression will match and capture only ${classA.methodA.methodB(classB.methodC(classB.methodD)))} or ${classA.methodA.methodB} formats.
(?:^|\n|\r)[$][{]([^.(}]*)[.]([^.(}]*)[.]([^(}]*(?:[(][^}]+[)])?)[}](?=\n|\r|$)
Groups
Group 0 gets the entire match, from the leading dollar sign to the closing curly bracket
Group 1 gets the class
Group 2 gets the first method
Group 3 gets the second method followed by all the text up to, but not including, the closing curly bracket. If this group contains empty round brackets (), the match fails
PHP Code Example:
<?php
// single quotes: inside double quotes PHP would try to interpolate ${...}
$sourcestring='${classA1.methodA1.methodB1(classB.methodC(classB.methodD)))}
${classA2.methodA2.methodB2}
${classA3.methodA3.methodB3()}
${methodB4(methodC4(classB4.methodD)))}
${classA5.methodA5.methodB5(classB.methodC(classB.methodD)))}';
preg_match_all('/(?:^|\n|\r)[$][{]([^.(}]*)[.]([^.(}]*)[.]([^(}]*(?:[(][^}]+[)])?)[}](?=\n|\r|$)/im',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
?>
$matches Array:
(
[0] => Array
(
[0] => ${classA1.methodA1.methodB1(classB.methodC(classB.methodD)))}
[1] =>
${classA2.methodA2.methodB2}
[2] =>
${classA5.methodA5.methodB5(classB.methodC(classB.methodD)))}
)
[1] => Array
(
[0] => classA1
[1] => classA2
[2] => classA5
)
[2] => Array
(
[0] => methodA1
[1] => methodA2
[2] => methodA5
)
[3] => Array
(
[0] => methodB1(classB.methodC(classB.methodD)))
[1] => methodB2
[2] => methodB5(classB.methodC(classB.methodD)))
)
)
Disclaimers
I added a number to the end of the class and method names to help illustrate what's happening in the groups
The sample text provided in the OP does not have balanced open and close round brackets.
Although () will be disallowed, (()) will be allowed
This is from the PHP manual regarding PCRE conditional subpatterns:
The two possible forms of conditional subpattern are:
(?(condition)yes-pattern)
(?(condition)yes-pattern|no-pattern)
That's OK as long as the condition is a digit or an assertion. But I don't quite understand the following
If the condition is the string (R), it is satisfied if a recursive
call to the pattern or subpattern has been made. At "top level", the
condition is false. (...) If the condition is not a sequence of digits
or (R), it must be an assertion.
I would be grateful if someone could explain on an example what (R) is in conditional subpattern and how to use it. Thanks in advance.
As an additional and clearer answer…
2 days ago I was writing a pattern to match an IPv4 address and I found myself using the recursion in condition so I thought I should share (because it makes more sense than imaginative examples).
~
(?:(?:f|ht)tps?://)? # possibly a protocol
(
(?(R)\.) # if it\'s a recursion, require a dot
(?: # this part basically looks for 0-255
2(?:[0-4]\d|5[0-5])
| 1\d\d
| \d\d?
)
)(?1){3} # go into recursion 3 times
# for clarity I\'m not including the remaining part
~xi
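To make the fragment above testable, here is a sketch that keeps only the octet group and the (?1){3} calls, with ^ and $ added so that the whole string has to be a dotted quad. The function name is_ipv4 is made up, and the protocol prefix is left out on purpose:

```php
<?php
// Hypothetical helper built around the recursion-as-condition group.
function is_ipv4(string $candidate): bool
{
    $pattern = '~^
        (
            (?(R)\.)                 # inside a (?1) call, require a leading dot
            (?: 2(?:[0-4]\d|5[0-5])  # 200-255
              | 1\d\d                # 100-199
              | \d\d?                # 0-99
            )
        )
        (?1){3}                      # three more octets, each via a subroutine call
    $~x';
    return preg_match($pattern, $candidate) === 1;
}

var_dump(is_ipv4('192.168.1.1')); // bool(true)
var_dump(is_ipv4('256.1.1.1'));   // bool(false)
var_dump(is_ipv4('1.2.3'));       // bool(false)
```

At the top level the (R) condition is false, so the first octet is matched without a dot; in each of the three subroutine calls it is true, so the dot becomes mandatory.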
From what I understand (from the recursion as the condition in a subpattern) here's a very basic example.
$str = 'ds1aadfg346fgf gd4th9u6eth0';
preg_match_all('~(?(R).(?(?=[^\d])(?R))|\d(?R)?)~'
/*
(? # [begin outer cond.subpat.]
(R) # if this is a recursion ------> IF
. # match the first char
(? # [begin inner cond.subpat.]
(?=[^\d]) # if the next char is not a digit
(?R) # reenter recursion
) # [end inner cond.subpat.]
| # otherwise -----> ELSE
\d(?R)? # match a digit and enter recursion (note the ?)
) # [end outer cond.subpat.]
*/
,$str,$m);
print_r($m[0]);
And the output:
Array
(
[0] => 1aadfg
[1] => 34
[2] => 6fgf gd
[3] => 4th
[4] => 9u
[5] => 6eth
[6] => 0
)
I know this is a silly example but I hope it makes sense.
The (R) stands for recursion. Here is a good example of using it.
Recursive patterns
Not sure I have ever seen (R) used as the condition, or even a situation where it would be useful, at least not as far as I understand it. But you learn new stuff every day in programming.
It can be used quite easily as the condition that selects the true or false branch, as in this example:
< (?: (?(R) \d++ | [^<>]*+) | (?R)) * >
Here (R) is the condition: \d++ is the branch used while recursing, and [^<>]*+ is the branch used at the top level. The separate (?R) is the recursive call itself.
It matches text in angle brackets, allowing for arbitrary nesting. Only digits are allowed in nested brackets (that is, when recursing), whereas any characters are permitted at the outer level.
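A short sketch of how that pattern behaves in PHP. The helper name is_valid_brackets is made up; it compares the match with the whole subject because the pattern itself is not anchored (anchors would also apply inside each (?R) call):

```php
<?php
// Pattern quoted above; the x flag makes PCRE ignore the spaces in it.
$pattern = '~< (?: (?(R) \d++ | [^<>]*+) | (?R)) * >~x';

// Hypothetical helper: true when the pattern matches the whole of $subject.
function is_valid_brackets(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject, $m) === 1 && $m[0] === $subject;
}

var_dump(is_valid_brackets($pattern, '<abc<123>def>')); // bool(true)
var_dump(is_valid_brackets($pattern, '<abc<x>def>'));   // bool(false): letters inside nested brackets
```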
I know this is not the answer you are looking for.... You have now sent me on a quest to research this.