PHP regex backreference not working - php

I wrote a regex pattern which works perfectly when I test it in Regexr, but when I use it in my PHP code it doesn't always match when it should match.
The regular expression, including some examples that should and shouldn't match.
Example PHP code that should match but doesn't:
preg_match('/^([~]{3,})\s*([\w-]+)?\s*(?:\{([\w-\s]+)\})?\s*(\2[\w-]+)?\s*$/', "~~~ {class} lang", $matches);
var_dump($matches);
I believe the problem is caused by the backreference in the last capture group (\2[\w-]+); however, I can't quite figure out how to fix this.

Because you're referring to a group that did not take part in the match (group 2). So remove \2 from the regex.
^([~]{3,})\s*([\w-]+)?\s*(?:\{([-\w\s]+)\})?\s*([\w-]+)?\s*$
~~~ {class} lang
 |  |  |     |
 |  |  |     Group 4
 |  |  Group 3
 |  Missing group 2
 Group 1

The problem is caused by capturing group #2: you have made this group optional. Since it may or may not take part in the match, you need to make your backreference optional as well; otherwise it always requires a group that may never have been set.
However, since all groups are optional, I would just recurse the subpattern of the second group.
^(~{3,})\s*([\w-]+)?\s*(?:{([^}]+)})?\s*((?2))?\s*$
Example:
$str = '~~~ {class} lang';
preg_match('/^(~{3,})\s*([\w-]+)?\s*(?:{([^}]+)})?\s*((?2))?\s*$/', $str, $matches);
var_dump($matches);
Output
array(5) {
[0]=> string(16) "~~~ {class} lang"
[1]=> string(3) "~~~"
[2]=> string(0) "" # Returns "" for optional groups that don't exist
[3]=> string(5) "class"
[4]=> string(4) "lang"
}
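To make the unset-group behaviour concrete, here is a minimal sketch with two toy patterns (they are illustrative assumptions, not taken from the question). In PCRE, a backreference to a group that never took part in the match fails, whereas a group that matched the empty string can be referenced. Testers running a JavaScript engine treat an unset backreference as matching the empty string, which may be why the original pattern appeared to work outside PHP.

```php
<?php
// Group 1 is optional as a whole: when "a" is absent the group stays unset,
// and PCRE fails a backreference to an unset group.
$unset = preg_match('/^(a)?b\1$/', 'b');

// Here the group always participates and may capture an empty string,
// so \1 simply matches the empty string.
$empty = preg_match('/^(a?)b\1$/', 'b');

var_dump($unset, $empty); // int(0) and int(1)
```

Moving the quantifier inside the group, as in the second pattern, keeps the group set even when it captures nothing.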

The answers below helped me figure out why it wasn't working. However, both answers would give a positive match for $str = '~~~ lang {class} lang'; which I didn't want.
I fixed it by changing capturing group 2 to ([\w-]*) so that even if there is no string at that place, the capturing group exists but remains empty. This way all of the following strings match:
$str = '~~~ lang {no-lines float left} ';
$str = '~~~ {class} ';
$str = '~~~ lang';
$str = '~~~ {class } lang ';
$str = '~~~';
$str = '~~~lang{class}';
But this one won't:
$str = '~~~ css {class} php';
Full solution:
$str = '~~~ {class} lang';
preg_match('/^([~]{3,})\s*([\w-]*)\s*(?:\{([\w\s-]+)\})?\s*(\2[\w-]+)?\s*$/', $str, $matches);
var_dump($matches);
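A small harness like the following can check the listed strings in one go. It is only a sketch: the $cases array and the ok/unexpected labels are made up for illustration, and the pattern is written with the hyphen at the end of the character class ([\w\s-]), on the assumption that newer PCRE2-based PHP versions may reject a range such as \w-\s.

```php
<?php
// Pattern from the solution: group 2 uses * so it is always set,
// which lets \2 match an empty string when no language comes first.
$pattern = '/^([~]{3,})\s*([\w-]*)\s*(?:\{([\w\s-]+)\})?\s*(\2[\w-]+)?\s*$/';

// Input string => expected preg_match() result (1 = match, 0 = no match).
$cases = [
    '~~~ lang {no-lines float left} ' => 1,
    '~~~ {class} '                    => 1,
    '~~~ lang'                        => 1,
    '~~~ {class } lang '              => 1,
    '~~~'                             => 1,
    '~~~lang{class}'                  => 1,
    '~~~ css {class} php'             => 0,
    '~~~ lang {class} lang'           => 0,
];

$results = [];
foreach ($cases as $str => $expected) {
    $results[$str] = preg_match($pattern, $str);
    $label = $results[$str] === $expected ? 'ok' : 'unexpected';
    echo var_export($str, true), ' => ', $results[$str], " ($label)", PHP_EOL;
}
```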

Related

Match string with 1 or more trailing substrings

I have an input that goes like this
[d/D/d1/d2/d3/d4/d5/d6/d7/D1/D2/D3/D4/D5/D6/D7]+[\.]+[r1/r2/r3/r4/r5/r6/R1/R2/R3/R4/R5/R6]+[\.]+[number 1 to 37]+[#]+[number 0 - 9 ]
An example would be "d2.r1.4#100.37#1.9#2.3#1(can have as many 1-37 # 0-9 as needed)"
How do I write a regex that allows the last part of the string to be dynamic (matching as many groups as are entered)?
I've tried this expression:
[dD1-7]+\.[rR1-5]+\.
and I'm not sure how to match the dynamic group that comes after the "d2.r1." part.
Assuming you merely need to validate the string (and not capture/extract specific substrings), the following pattern provides the same result as Emma's answer but with a tighter syntax.
The i pattern modifier means you only have to write the two letters in lowercase. I don't use any excess non-capturing groups. Two-character character classes don't need a hyphen. \d is the shorter way of expressing [0-9].
Wrapping the final/repeating characters in parentheses then writing * means the sequence in the parentheses may repeat zero or more times.
Code: (Demo)
$inputs = [
'd2.r1.4#100.37#1.9#2.3#1',
'd2.r1.4#100.37#1.9#2.38#1.8#22',
'd2.r1.4#100.37#1.9#2.3#1.12#2.30#2',
];
$pattern = '/^d[1-7]\.r[1-6](?:\.(?:3[0-7]|[12]\d|[1-9])#\d+)*$/i';
foreach ($inputs as $input) {
echo "\n{$input}: ";
var_export((bool)preg_match($pattern, $input));
}
Output:
d2.r1.4#100.37#1.9#2.3#1: true
d2.r1.4#100.37#1.9#2.38#1.8#22: false
d2.r1.4#100.37#1.9#2.3#1.12#2.30#2: true
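If the number pairs are needed after validation, one possible follow-up is a second pass with preg_match_all. This is only a sketch: the $pairs variable and the extraction pattern are assumptions added for illustration and are not part of the answer above.

```php
<?php
$input     = 'd2.r1.4#100.37#1.9#2.3#1';
$validator = '/^d[1-7]\.r[1-6](?:\.(?:3[0-7]|[12]\d|[1-9])#\d+)*$/i';

$pairs = [];
if (preg_match($validator, $input)) {
    // The string is already validated, so a loose pattern is enough here:
    // a dot, the first number, a hash, the second number.
    preg_match_all('/\.(\d+)#(\d+)/', $input, $sets, PREG_SET_ORDER);
    foreach ($sets as $set) {
        $pairs[] = [(int) $set[1], (int) $set[2]];
    }
}

print_r($pairs); // four pairs: [4, 100], [37, 1], [9, 2], [3, 1]
```

Validating first and extracting second keeps the strict range check (1 to 37) in one place, because a repeated capture group only retains its last iteration.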
I'm guessing that maybe some expression similar to,
^[dD][1-7]\.[rR][1-6](?:(?:\.(?:3[0-7]|[1-2]\d|[1-9]))#[0-9]+)*$
or with some slight changes, would likely work here.
Test
$re = '/^[dD][1-7]\.[rR][1-6](?:(?:\.(?:3[0-7]|[1-2]\d|[1-9]))#[0-9]+)*$/m';
$str = 'd2.r1.4#100.37#1.9#2.3#1
d2.r1.4#100.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1
d2.r1.4#100.38#1.9#2.3#1
d2.r1.4#100.0#1.9#2.3#1
';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
Output
array(2) {
[0]=>
array(1) {
[0]=>
string(24) "d2.r1.4#100.37#1.9#2.3#1"
}
[1]=>
array(1) {
[0]=>
string(63) "d2.r1.4#100.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1"
}
}
If you wish to simplify/modify/explore the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.

Problem regular express pattern hunting similar matches

I have one of these four patterns:
"Test"
'Test'
`Test`
(Test)
Is it possible to get "Test" with a single preg_match call?
I tried the following:
if ( preg_match( '/^(?:"(.*)"|\'(.*)\'|`(.*)`|\((.*)\))$/iu', $pattern, $matches ) )
... but this gives me five elements back in $matches. I would like to have only two: one for the whole match and one for the captured text ("Test").
To make sure that the single quote, backtick, and double quote each have the same closing character, you might use a capturing group with a backreference to that group.
To have the alternative that matches ( with the closing ) reuse the same group numbers, you might use a branch reset group.
The match for Test is in group 2.
(?|(["'`])(Test)\1|\(((Test)\)))
Explanation
(?| Branch reset group
(["'`]) Capture in group 1 any of the listed
(Test)\1 Capture in group 2 matching Test followed by a backreference \1 to group 1
| Or
\(((Test)\)) Match (, then capture Test) in group 1 and Test in group 2
) Close branch reset group
For example:
$strings = [
"\"Test\"",
"'Test'",
"`Test`",
"(Test)",
"Test\"",
"'Test",
"Test`",
"(Test",
"\"Test'",
"'Test\"",
"`Test",
"Test)",
];
$pattern = '/(?|(["\'`])(Test)\1|\(((Test)\)))/';
foreach ($strings as $string){
$isMatch = preg_match($pattern, $string, $matches);
if ($isMatch) {
echo "Match $string ==> " . $matches[2] . PHP_EOL;
}
}
Result
Match "Test" ==> Test
Match 'Test' ==> Test
Match `Test` ==> Test
Match (Test) ==> Test
You can use a dot to match the characters around the word and use array_unique to remove duplicates.
preg_match_all("/.(\w+)./", $str,$match);
foreach($match as &$m) $m = array_unique($m);
var_dump($match);
https://3v4l.org/T2hnh
array(2) {
[0]=>
array(4) {
[0]=>
string(6) ""Test""
[1]=>
string(6) "'Test'"
[2]=>
string(6) "`Test`"
[3]=>
string(6) "(Test)"
}
[1]=>
&array(1) {
[0]=>
string(4) "Test"
}
}
You can use non-capturing groups :
'/^(?:"|\'|`|\()(.*)(?:"|\'|`|\))$/iu'
So just the (.*) group will capture data.
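As a quick check of that claim, the sketch below runs the non-capturing pattern against one well-formed input and one input with mismatched delimiters. The sample strings and the variable names $m and $loose are assumptions chosen for illustration.

```php
<?php
$regex = '/^(?:"|\'|`|\()(.*)(?:"|\'|`|\))$/u';

// Only the (.*) group captures, so $m has exactly two elements:
// the whole match and the captured text.
preg_match($regex, '"Test"', $m);
var_dump($m); // two elements: "Test" with its quotes, then Test without them

// Caveat: the opening and closing characters are not tied together,
// so a mismatched pair such as "Test) is accepted as well.
$loose = preg_match($regex, '"Test)');
var_dump($loose); // int(1)
```

When the closing character must correspond to the opening one, a backreference or a branch reset group is the safer choice.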
Your regex could be:
^['"`(](.+)['"`)]$
Which would give off the following code in PHP:
if(preg_match('/^[\'"`(](.+)[\'"`)]$/', $pattern, $matches))
Explanation
In regex, a character class, marked by enclosing square brackets [], matches exactly one of the characters listed inside it.

preg_match_all : Assign same name to two subpatterns

I want to use a regex to match two different subpatterns and give them the same name using the PCRE_INFO_JCHANGED modifier (?J)
The two subpatterns are very different from each other so I have to catch them using |
What I usually do is give the two patterns different names and then choose the one I want using PHP, but I'd like to know if it is possible without PHP.
Example here : https://3v4l.org/GEMeT (edited thanks to @JustOnUnderMillions)
The 2nd ?P<number> will always capture and replace the first ?P<number>
What I want : Capture both patterns with one regex and store them both with the same key number
Desired output :
Pattern 1
string(1) "1"
Pattern 2
string(1) "2"
Thanks for your help !
Don't use preg_match_all here.
$regex = '/(?J)I wanna match pattern (?P<number>1) which is very different from pattern 2|(?P<number>2), again nothing to do with pattern 1 here/';
Result with preg_match:
array(3) {
[0]=>
string(62) "I wanna match pattern 1 which is very different from pattern 2"
["number"]=>
string(1) "1"
[1]=>
string(1) "1"
}
Full example with a fixed regex (the phrase 'nothing similar' from the second text did not appear in the original regex):
$text1 = 'I wanna match pattern 1 which is very different from pattern 2';
$text2 = 'I wanna match pattern 2, again nothing similar with pattern 1 here';
$regex = '/(?J)(I wanna match pattern (?P<number>1) which is very different from pattern 2|I wanna match pattern (?P<number>2), again nothing similar with pattern 1 here)/';
echo "Pattern 1\n";
preg_match( $regex, $text1, $matches );
var_dump($matches);
echo "\n\nPattern 2\n";
preg_match( $regex, $text2, $matches );
var_dump($matches);
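A different technique from the one in this answer is a branch reset group, (?|...), in which every alternative numbers its groups from the same starting point. Groups that share a number may also share a name, so the (?J) flag is not needed. The sketch below assumes the same two sample texts; the $first, $second, and $all variables are introduced only for illustration.

```php
<?php
$text1 = 'I wanna match pattern 1 which is very different from pattern 2';
$text2 = 'I wanna match pattern 2, again nothing similar with pattern 1 here';

// Inside (?|...) both alternatives define group 1, and both call it "number".
$regex = '/I wanna match pattern (?|(?P<number>1) which is very different from pattern 2'
       . '|(?P<number>2), again nothing similar with pattern 1 here)/';

preg_match($regex, $text1, $first);
preg_match($regex, $text2, $second);
var_dump($first['number'], $second['number']); // "1" and "2"

// Because there is only one group slot, preg_match_all behaves the same way.
preg_match_all($regex, $text1 . "\n" . $text2, $all);
var_dump($all['number']); // "1", then "2"
```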

Php preg_match issue not working

I am trying to find a php preg_match that can match:
"2-20 to 2-25"
from this text:
user levels 2-20 to 2-25 not ready
I tried
preg_match("/([0-9]+) to ([0-9]+)/", $vars[1] , $matchesto);
but the result is:
"20 to 2"
Any help appreciated.
Your pattern is almost correct; just include the dashes and adjust the capture group:
([-0-9]+ to [-0-9]+)
Example:
https://regex101.com/r/eD6lQ2/1
That's because [0-9]+ matches one or more digits but won't match a hyphen (-).
Try this:
$pattern = '~([0-9]+-[0-9]+) to ([0-9]+-[0-9]+)~i';
preg_match($pattern, $vars[1] , $matchesto);
You can use "\d" to match the digits:
<?php
$str = 'user levels 2-20 to 2-25 not ready';
$matches = array();
preg_match('/(\d+-\d+) to (\d+-\d+)/', $str, $matches);
var_dump($matches);
Output:
array(3) {
[0]=>
string(12) "2-20 to 2-25"
[1]=>
string(4) "2-20"
[2]=>
string(4) "2-25"
}

PHP regex - Take the short one

I have the string: This is a [[bla]] and i want a [[burp]] and I need to put the two strings [[bla]] and [[burp]] into an array.
The regexp I am trying to use is:
$pattern = "/\[\[.+\]\]/";
The problem is that the output is: [[bla]] and i want a [[burp]], because I suppose it takes the first [[ together with the last ]].
How can I fix the pattern?
Make it ungreedy, see it on Regexr
/\[\[.+?\]\]/
or use a negated character class, see it on Regexr
/\[\[[^\]]+\]\]/
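To see the difference side by side, here is a minimal PHP sketch that runs the greedy pattern from the question and both suggested fixes against the sample string. The variable names $greedy, $lazy, and $negated are assumptions chosen for illustration.

```php
<?php
$string = 'This is a [[bla]] and i want a [[burp]]';

// Greedy: .+ runs to the last ]] in the string, giving one long match.
preg_match_all('/\[\[.+\]\]/', $string, $greedy);

// Lazy: .+? stops at the first ]] it can reach.
preg_match_all('/\[\[.+?\]\]/', $string, $lazy);

// Negated class: [^\]]+ cannot cross a closing bracket at all.
preg_match_all('/\[\[[^\]]+\]\]/', $string, $negated);

var_dump($greedy[0]);  // one element: "[[bla]] and i want a [[burp]]"
var_dump($lazy[0]);    // two elements: "[[bla]]" and "[[burp]]"
var_dump($negated[0]); // same two elements as the lazy version
```

The negated class avoids backtracking through the text, but unlike the lazy version it would reject content that itself contains a single ].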
You need ungreedy (lazy) repetition here, *?, to get only the text between one [[ ]] pair rather than everything from the first [[ to the last ]]:
$pattern = "/\[\[(.*?)\]\]/";
You also need a capturing group, (.*?), to get only the text between the square brackets and not the brackets themselves.
Example:
$string = "This is a [[bla]] and i want a [[burp]]";
$pattern = "/\[\[(.*?)\]\]/";
preg_match_all($pattern , $string, $matches);
var_dump($matches[1]);
Output:
array(2) {
[0]=>
string(3) "bla"
[1]=>
string(4) "burp"
}
