preg_match_all : Assign same name to two subpatterns - php

I want to use a regex to match two different subpatterns and give them the same name using the PCRE_INFO_JCHANGED modifier (?J).
The two subpatterns are very different from each other, so I have to combine them using |.
What I usually do is give the two patterns different names and then choose the one I want using PHP, but I'd like to know if it is possible without PHP.
Example here : https://3v4l.org/GEMeT (edited thanks to #JustOnUnderMillions)
The 2nd ?P<number> will always capture and replace the first ?P<number>
What I want : Capture both patterns with one regex and store them both with the same key number
Desired output :
Pattern 1
string(1) "1"
Pattern 2
string(1) "2"
Thanks for your help !

Don't use preg_match_all here; use preg_match.
$regex = '/(?J)I wanna match pattern (?P<number>1) which is very different from pattern 2|(?P<number>2), again nothing to do with pattern 1 here/';
Result with preg_match:
array(3) {
[0]=>
string(62) "I wanna match pattern 1 which is very different from pattern 2"
["number"]=>
string(1) "1"
[1]=>
string(1) "1"
}
Full example with a fixed regex (the text 'nothing similar' did not appear in the original regex):
$text1 = 'I wanna match pattern 1 which is very different from pattern 2';
$text2 = 'I wanna match pattern 2, again nothing similar with pattern 1 here';
$regex = '/(?J)(I wanna match pattern (?P<number>1) which is very different from pattern 2|I wanna match pattern (?P<number>2), again nothing similar with pattern 1 here)/';
echo "Pattern 1\n";
preg_match( $regex, $text1, $matches );
var_dump($matches);
echo "\n\nPattern 2\n";
preg_match( $regex, $text2, $matches );
var_dump($matches);
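As an alternative to (?J), a branch reset group (?|...) may also give the desired result. This is only a sketch, assuming the PCRE build bundled with your PHP accepts the same name on groups that share a number inside a branch reset group (the PCRE documentation describes this as permitted). The variable $numbers is introduced here purely for illustration.

```php
<?php
// Same sample strings as in the answer above.
$text1 = 'I wanna match pattern 1 which is very different from pattern 2';
$text2 = 'I wanna match pattern 2, again nothing similar with pattern 1 here';

// (?| ... ) is a branch reset group: capture numbering restarts in each
// alternative, so both (?P<number>...) groups are capture group 1.
$regex = '/(?|I wanna match pattern (?P<number>1) which is very different from pattern 2'
       . '|I wanna match pattern (?P<number>2), again nothing similar with pattern 1 here)/';

$numbers = []; // hypothetical helper array, not part of the original code
foreach ([$text1, $text2] as $text) {
    if (preg_match($regex, $text, $matches)) {
        // Whichever alternative matched, the value sits under the same key.
        $numbers[] = $matches['number'];
    }
}
var_dump($numbers);
```

Because both alternatives write to group 1, the key number always holds the value of the alternative that actually matched, so no post-processing in PHP is needed.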

Related

Match string with 1 or more trailing substrings

I have an input that goes like this
[d/D/d1/d2/d3/d4/d5/d6/d7/D1/D2/D3/D4/D5/D6/D7]+[\.]+[r1/r2/r3/r4/r5/r6/R1/R2/R3/R4/R5/R6]+[\.]+[number 1 to 37]+[#]+[number 0 - 9 ]
An example would be "d2.r1.4#100.37#1.9#2.3#1(can have as many 1-37 # 0-9 as needed)"
How do I write a regex match that can allow the last part of the string to be dynamic (matches as many groups as needed as inputted)
I've tried this expression:
[dD1-7]+\.[rR1-5]+\.
and I'm not sure how to match the dynamic group that comes after the "d2.r1." part.
Assuming you merely need to validate the string (and not capture/extract specific substrings), the following pattern provides the same result as Emma's answer but with a tighter syntax.
The i pattern modifier means you only have to write the two letters in lowercase. I don't use any excess non-capturing groups. Two-character character classes don't need a hyphen. \d is the shorter way of expressing [0-9].
Wrapping the final/repeating characters in parentheses then writing * means the sequence in the parentheses may repeat zero or more times.
Code:
$inputs = [
'd2.r1.4#100.37#1.9#2.3#1',
'd2.r1.4#100.37#1.9#2.38#1.8#22',
'd2.r1.4#100.37#1.9#2.3#1.12#2.30#2',
];
$pattern = '/^d[1-7]\.r[1-6](?:\.(?:3[0-7]|[12]\d|[1-9])#\d+)*$/i';
foreach ($inputs as $input) {
echo "\n{$input}: ";
var_export((bool)preg_match($pattern, $input));
}
Output:
d2.r1.4#100.37#1.9#2.3#1: true
d2.r1.4#100.37#1.9#2.38#1.8#22: false
d2.r1.4#100.37#1.9#2.3#1.12#2.30#2: true
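If the trailing pairs also need to be extracted and not only validated, one possible approach is to validate first and then collect the pairs with a second call. This is a minimal sketch; the $pairs variable and the second pattern are assumptions added for illustration and are not part of the answer above.

```php
<?php
$input   = 'd2.r1.4#100.37#1.9#2.3#1';
$pattern = '/^d[1-7]\.r[1-6](?:\.(?:3[0-7]|[12]\d|[1-9])#\d+)*$/i';

$pairs = []; // hypothetical result array: one [left, right] entry per "n#m" pair
if (preg_match($pattern, $input)) {
    // The string is already known to be valid, so a loose pattern is enough
    // to pick up every ".<number>#<number>" segment.
    preg_match_all('/\.(\d+)#(\d+)/', $input, $sets, PREG_SET_ORDER);
    foreach ($sets as $set) {
        $pairs[] = [(int) $set[1], (int) $set[2]];
    }
}
print_r($pairs);
```

A repeated capture group inside the validating pattern would only keep its last iteration, which is why the extraction is done in a separate preg_match_all call.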
I'm guessing that maybe some expression similar to,
^[dD][1-7]\.[rR][1-6](?:(?:\.(?:3[0-7]|[1-2]\d|[1-9]))#[0-9]+)*$
or with some slight changes, would likely work here.
Test
$re = '/^[dD][1-7]\.[rR][1-6](?:(?:\.(?:3[0-7]|[1-2]\d|[1-9]))#[0-9]+)*$/m';
$str = 'd2.r1.4#100.37#1.9#2.3#1
d2.r1.4#100.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1
d2.r1.4#100.38#1.9#2.3#1
d2.r1.4#100.0#1.9#2.3#1
';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
Output
array(2) {
[0]=>
array(1) {
[0]=>
string(24) "d2.r1.4#100.37#1.9#2.3#1"
}
[1]=>
array(1) {
[0]=>
string(63) "d2.r1.4#100.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1"
}
}
If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com, where you can also watch how it matches against some sample inputs.

Problem regular express pattern hunting similar matches

I have one of these four patterns:
"Test"
'Test'
`Test`
(Test)
Is it possible to get "Test" with a single preg_match call?
I tried the following:
if ( preg_match( '/^(?:"(.*)"|\'(.*)\'|`(.*)`|\((.*)\')$/iu', $pattern, $matches ) )
... but this gives me five elements of $matches back. But I would like to have two only (One for the whole match and one for the found match with "Test" in it.)
To make sure that a single quote, backtick, or double quote is closed by the same character, you can use a capturing group with a backreference to that group.
To keep the same group numbers in the alternative that matches ( with its closing ), you can use a branch reset group.
The match for Test is in group 2.
(?|(["'`])(Test)\1|\(((Test)\)))
Explanation
(?| Branch reset group
(["'`]) Capture in group 1 any of the listed
(Test)\1 Capture in group 2 matching Test followed by a backreference \1 to group 1
| Or
\(((Test)\)) Match (, capture Test followed by ) in group 1, and capture Test alone in group 2
) Close branch reset group
For example:
$strings = [
"\"Test\"",
"'Test'",
"`Test`",
"(Test)",
"Test\"",
"'Test",
"Test`",
"(Test",
"\"Test'",
"'Test\"",
"`Test",
"Test)",
];
$pattern = '/(?|(["\'`])(Test)\1|\(((Test)\)))/';
foreach ($strings as $string){
$isMatch = preg_match($pattern, $string, $matches);
if ($isMatch) {
echo "Match $string ==> " . $matches[2] . PHP_EOL;
}
}
Result
Match "Test" ==> Test
Match 'Test' ==> Test
Match `Test` ==> Test
Match (Test) ==> Test
You can use a dot to match the characters around the word and use array_unique to remove duplicates.
preg_match_all("/.(\w+)./", $str,$match);
foreach($match as &$m) $m = array_unique($m);
var_dump($match);
https://3v4l.org/T2hnh
array(2) {
[0]=>
array(4) {
[0]=>
string(6) ""Test""
[1]=>
string(6) "'Test'"
[2]=>
string(6) "`Test`"
[3]=>
string(6) "(Test)"
}
[1]=>
&array(1) {
[0]=>
string(4) "Test"
}
}
You can use non-capturing groups :
'/^(?:"|\'|`|\()(.*)(?:"|\'|`|\))$/iu'
So just the (.*) group will capture data.
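A quick check of that pattern against the four inputs from the question might look like this. It is a sketch only; the $results array is an assumption added for illustration. Note that this pattern does not verify that the closing character pairs with the opening one.

```php
<?php
// Non-capturing groups (?:...) around the delimiters leave (.*) as the only
// capturing group, so $matches has exactly two entries.
$regex  = '/^(?:"|\'|`|\()(.*)(?:"|\'|`|\))$/u';
$inputs = ['"Test"', "'Test'", '`Test`', '(Test)'];

$results = []; // hypothetical: input => [number of entries, captured text]
foreach ($inputs as $pattern) {
    if (preg_match($regex, $pattern, $matches)) {
        $results[$pattern] = [count($matches), $matches[1]];
        echo $pattern, ' => ', count($matches), ' entries, captured: ', $matches[1], PHP_EOL;
    }
}
```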
Your regex could be:
^['"`(](.+)['"`)]$
Which would give off the following code in PHP:
if(preg_match('/^[\'"`(](.+)[\'"`)]$/', $pattern, $matches))
Explanation
In a regex, a character class, written with enclosing square brackets [], matches any one of the characters inside it.

php preg_match_all returning array of arrays

I want to replace some template tags:
$tags = '{name} text {first}';
preg_match_all('~\{(\w+)\}~', $tags, $matches);
var_dump($matches);
output is:
array(2) {
[0]=> array(2) {
[0]=> string(6) "{name}"
[1]=> string(7) "{first}"
}
[1]=> array(2) {
[0]=> string(4) "name"
[1]=> string(5) "first"
}
}
why are there inside 2 arrays? How to achieve only second one?
The short answer:
Is there an alternative? Of course there is: lookaround assertions allow you to use zero-width (non-captured) single char matches easily:
preg_match_all('/(?<=\{)\w+(?=})/', $tags, $matches);
var_dump($matches);
Will dump this:
array(1) {
[0]=>
array(2) {
[0]=>
string(4) "name"
[1]=>
string(5) "first"
}
}
The pattern:
(?<=\{): positive lookbehind - only match the rest of the pattern if there's a { character in front of it (but don't capture it)
\w+: word characters are matched
(?=}): only match preceding pattern if it is followed by a } character (but don't capture the } char)
It's that simple: the pattern uses the {} delimiter chars as conditions for the matches, but doesn't capture them.
Explaining this $matches array structure a bit:
The reason why $matches looks the way it does is quite simple: when using preg_match(_all), the first entry in the match array will always be the entire string matched by the given regex. That's why I used zero-width lookaround assertions, instead of groups. Your expression matches "{name}" in its entirety, and extracts "name" through grouping.
The matches array will hold the full match on index 0, and add groups at every subsequent index, in your case that means that:
$matches[0] will contain all substrings matching /\{\w+\}/ as a pattern.
$matches[1] will contain all substrings that were captured (/\{(\w+)\}/ captures (\w+)).
If you were to have a regex like this: /\{((\w)([^}]+))}/ the matches array will look something like this:
[
0 => [
'{name}',//as if you'd written /\{\w[^}]+}/
],
1 => [
'name',//matches group (\w)([^}]+), as if you wrote (\w[^}]+)
],
2 => [
'n',//matches (\w) group
],
3 => [
'ame',//and this is the ([^}]+) group obviously
]
]
Why? Simply because the pattern contains 3 capturing groups. Like I said: the first index in the matches array will always be the full match, regardless of capture groups. The groups are then appended to the array in the order they appear in the expression. So if we analyze the expression:
\{: not captured, but part of the pattern, so it will only be in the $matches[0] values
((\w)([^}]+)): Start of the first capturing group; the \w[^}]+ match is grouped here, and $matches[1] will contain these values
(\w): Second group, a single \w char (i.e. the first character after {). $matches[2] will therefore contain all first characters after a {
([^}]+): Third group, matches the rest of the string after {\w until a } is encountered; this makes up the $matches[3] values
To better understand, and be able to predict the way $matches will get populated, I'd strongly recommend you use this site: regex101. Write your expression there, and it'll break it all down for you on the right hand side, listing the groups. For example:
/\{((\w)([^}]+))}/
Is broken down like this:
/\{((\w)([^}]+))}/
\{ matches the character { literally
1st Capturing group ((\w)([^}]+))
2nd Capturing group (\w)
\w match any word character [a-zA-Z0-9_]
3rd Capturing group ([^}]+)
[^}]+ match a single character not present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
} the literal character }
} matches the character } literally
Looking at the capturing groups, you can now confidently say what $matches will look like, and you can safely say that $matches[2] will be an array of single characters.
Of course, this may leave you wondering why $matches is a 2D array. Well, that again is really quite easy: what you can predict is how many match indexes a $matches array will contain: 1 for the full pattern, then +1 for each capture group. What you can't predict, though, is how many matches you'll find.
So what preg_match_all does is really quite simple: fill $matches[0] with all substrings that match the entire pattern, then extract each group substring from these matches and append that value onto the respective $matches arrays. In other words, the number of arrays that you can find in $matches is a given: it depends on the pattern. The number of keys you can find in the sub-arrays of $matches is an unknown: it depends on the string you're processing. If preg_match_all were to return a 1D array, it would be a lot harder to process the matches. As it is, you can simply write this:
$total = count($matches);
foreach ($matches[0] as $k => $full) {
echo $full . ' contains: ' . PHP_EOL;
for ($i=1;$i<$total;++$i) {
printf(
'Group %d: %s' . PHP_EOL,
$i, $matches[$i][$k]
);
}
}
If preg_match_all created a flat array, you'd have to keep track of the number of groups in your pattern. Whenever the pattern changes, you'd also have to make sure to update the rest of the code to reflect the changes made to the pattern, making your code harder to maintain and more error-prone, too.
That's because your regex could have multiple capture groups: if you have more (..) you would have more entries in your array. The first one, [0], is always the whole match.
If you want a different order for the array, you could use PREG_SET_ORDER as the 4th argument for preg_match_all. Doing this would result in the following:
array(2) {
[0]=> array(2) {
[0]=> string(6) "{name}"
[1]=> string(4) "name"
}
[1]=> array(2) {
[0]=> string(7) "{first}"
[1]=> string(5) "first"
}
}
This could be easier if you loop over your result in a foreach loop.
If you are only interested in the first capture group, you should stay with the default PREG_PATTERN_ORDER and just use $matches[1].
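The two orderings can be compared side by side with the pattern from the question. This is a small sketch; the variable names $sets and $names are assumptions chosen for illustration.

```php
<?php
$tags = '{name} text {first}';

// PREG_SET_ORDER: one sub-array per match, holding the full match and its group.
preg_match_all('~\{(\w+)\}~', $tags, $sets, PREG_SET_ORDER);
foreach ($sets as $set) {
    echo $set[0], ' => ', $set[1], PHP_EOL;
}

// Default PREG_PATTERN_ORDER: index 1 already holds every captured tag name.
preg_match_all('~\{(\w+)\}~', $tags, $matches);
$names = $matches[1];
var_dump($names);
```

PREG_SET_ORDER tends to read better when each match is processed on its own, while the default order is convenient when only one group's values are needed as a list.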

PHP regex backreference not working

I wrote a regex pattern which works perfectly when I test it in Regexr, but when I use it in my PHP code it doesn't always match when it should match.
The regular expression, including some examples that should and shouldn't match.
Example PHP code that should match but doesn't:
preg_match('/^([~]{3,})\s*([\w-]+)?\s*(?:\{([\w-\s]+)\})?\s*(\2[\w-]+)?\s*$/', "~~~ {class} lang", $matches);
echo var_dump($matches);
I believe the problem is caused by the backreference in the last capture group (\2[\w-]+); however, I can't quite figure out how to fix this.
Because you're referring to a non-existing group (group 2). So remove \2 from the regex.
^([~]{3,})\s*([\w-]+)?\s*(?:\{([-\w\s]+)\})?\s*([\w-]+)?\s*$
~~~      {class}   lang
 |    |     |       |
Group1|  Group3  Group4
      |
Missing group 2
The problem is caused by capturing group #2, you have made this group optional. So since it may or may not exist, you need to make your backreference optional as well or else it always looks for a required group.
However, since all groups are optional I would just recurse the subpattern of the second group.
^(~{3,})\s*([\w-]+)?\s*(?:{([^}]+)})?\s*((?2))?\s*$
Example:
$str = '~~~ {class} lang';
preg_match('/^(~{3,})\s*([\w-]+)?\s*(?:{([^}]+)})?\s*((?2))?\s*$/', $str, $matches);
var_dump($matches);
Output
array(5) {
[0]=> string(16) "~~~ {class} lang"
[1]=> string(3) "~~~"
[2]=> string(0) "" # Returns "" for optional groups that dont exist
[3]=> string(5) "class"
[4]=> string(4) "lang"
}
The answers below helped me figure out why it wasn't working. However, both answers would give a positive match for $str = '~~~ lang {class} lang'; which I didn't want.
I fixed it by changing capturing group 2 to ([\w-]*) so that even if there is no string at that place, the capturing group exists but remains empty. This way all of the following strings match:
$str = '~~~ lang {no-lines float left} ';
$str = '~~~ {class} ';
$str = '~~~ lang';
$str = '~~~ {class } lang ';
$str = '~~~';
$str = '~~~lang{class}';
But this one won't:
$str = '~~~ css {class} php';
Full solution:
$str = '~~~ {class} lang';
preg_match('/^([~]{3,})\s*([\w-]*)?\s*(?:\{([\w\s-]+)\})?\s*(\2[\w-]+)?\s*$/', $str, $matches);
var_dump($matches);
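The strings listed above can be checked in one loop. This is a sketch under one assumption: the character class inside the braces is written as [\w\s-], with the hyphen last, so that it is read as a literal hyphen and not as a range. The $cases and $results arrays are hypothetical names added for illustration.

```php
<?php
// Group 2 uses * so it always takes part in the match (possibly empty),
// which lets the \2 backreference in group 4 succeed even without a language.
$regex = '/^([~]{3,})\s*([\w-]*)?\s*(?:\{([\w\s-]+)\})?\s*(\2[\w-]+)?\s*$/';

// Hypothetical test table: input string => whether it is expected to match.
$cases = [
    '~~~ lang {no-lines float left} ' => true,
    '~~~ {class} '                    => true,
    '~~~ lang'                        => true,
    '~~~ {class } lang '              => true,
    '~~~'                             => true,
    '~~~lang{class}'                  => true,
    '~~~ css {class} php'             => false,
];

$results = [];
foreach ($cases as $str => $expected) {
    $results[$str] = (bool) preg_match($regex, $str);
    printf("%-35s %s\n", "'$str'", $results[$str] ? 'match' : 'no match');
}
```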

Re-order regular expression matches

Is there a way to get the match patterns to change order? For example if you have a string with letters-digits and using preg_match_all(), and you want the resulting match array to have the digits before the letters. Is there a way to specify this in the regular expression itself?
So "aaa-111" would result in matches with
array(0 => '111', 1 => 'aaa');
Perhaps named capture groups will help. Example:
preg_match('/(?<alphapart>[a-z]+)-(?<numpart>[0-9]+)/', 'aaa-111', $matches);
$matches:
array('alphapart' => 'aaa', 'numpart' => '111')
This way you can refer to the matches by a name instead of whatever order index they were matched in.
Edit: Just for accuracy, I want to note that $matches will actually include the matches by index as well, so the actual $matches will be: array(5) { [0]=> string(7) "aaa-111" ["alphapart"]=> string(3) "aaa" [1]=> string(3) "aaa" ["numpart"]=> string(3) "111" [2]=> string(3) "111" }
The order of groups in a regex is dependent on their positions in the regex and the string. Changing the order would make it very confusing.
What you can do is use "named groups".
/(?P<letters>\w*)-(?P<digits>\d*)/
The array will still be in the same order, but, you can use $matches['digits'] to easily get just the digits.
DEMO: http://ideone.com/3tRJLZ
Yes you can. You can use lookaheads that don't push the 'cursor' and so you could first match the last part, and then the first part. It works with (?=regex)
This works:
(?=\w+\-(\d+))(\w+)\-\d+
but will also give the full match at index 0. Like ["aaa-111", "111", "aaa"]
is that a problem?
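In PHP, the lookahead approach could be tried like this. It is a minimal sketch; dropping index 0 with array_slice and the $reordered variable are additions for illustration, not part of the answer above.

```php
<?php
// The lookahead (?=...) captures the digits first without consuming input,
// so the digits become group 1 and the letters become group 2.
preg_match('/(?=\w+-(\d+))(\w+)-\d+/', 'aaa-111', $matches);

// Index 0 is always the full match; slice it off if only the groups matter.
$reordered = array_slice($matches, 1);
var_dump($reordered);
```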
I don't believe there is. Regex isn't designed to sort. You could set up two different regular expressions to check for each pattern, though. This code will echo the two strings in num/alpha order as you requested:
<?php
header('Content-Type: text/plain');
$string1 = 'aaa-123';
$string2 = '123-aaa';
echo 'String 1: '.$string1."\n";
echo 'String 2: '.$string2."\n";
$pattern1 = '/([\d]+)-([a-z]+)/i';
$pattern2 = '/([a-z]+)-([\d]+)/i';
echo 'Result 1: ';
if(preg_match($pattern1, $string1, $matches))
{
echo $matches[1].' '.$matches[2]."\n";
}
if(preg_match($pattern2, $string1, $matches))
{
echo $matches[2].' '.$matches[1]."\n";
}
echo 'Result 2: ';
if(preg_match($pattern1, $string2, $matches))
{
echo $matches[1].' '.$matches[2]."\n";
}
if(preg_match($pattern2, $string2, $matches))
{
echo $matches[2].' '.$matches[1]."\n";
}
?>
The resulting output is:
String 1: aaa-123
String 2: 123-aaa
Result 1: 123 aaa
Result 2: 123 aaa
If you want the digits to be there first, you need to sort the array yourself.
A built-in such as sort() or usort() will... sort it out for you.
