PHP regex - Take the short one - php

I have the string: This is a [[bla]] and i want a [[burp]], and I need to put the two strings [[bla]] and [[burp]] into an array.
The regexp I am trying to use is:
$pattern = "/\[\[.+\]\]/";
The problem is that the output is: [[bla]] and i want a [[burp]], because I suppose it takes the first [[ together with the last ]].
How can I fix the pattern?

Make it ungreedy, see it on Regexr
/\[\[.+?\]\]/
or use a negated character class, see it on Regexr
/\[\[[^\]]+\]\]/

You need lazy (ungreedy) repetition here, *?, so that each match covers only the text inside one [[ ]] pair rather than everything from the first [[ to the last ]]:
$pattern = "/\[\[(.*?)\]\]/";
You also need a capturing group, (.*?), to get only the text between the square brackets and not the brackets themselves.
Example:
$string = "This is a [[bla]] and i want a [[burp]]";
$pattern = "/\[\[(.*?)\]\]/";
preg_match_all($pattern , $string, $matches);
var_dump($matches[1]);
Output:
array(2) {
[0]=>
string(3) "bla"
[1]=>
string(4) "burp"
}
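To see the difference side by side, here is a minimal sketch that runs the greedy pattern from the question and the two fixes suggested above against the same string. The variable names ($greedy, $lazy, $negated) are only illustrative.

```php
<?php
$string = 'This is a [[bla]] and i want a [[burp]]';

// Greedy: .+ runs as far as it can, so the match ends at the last ]].
preg_match_all('/\[\[.+\]\]/', $string, $greedy);

// Lazy: .+? stops at the first ]] it can reach.
preg_match_all('/\[\[.+?\]\]/', $string, $lazy);

// Negated class: [^\]]+ can never cross a closing bracket.
preg_match_all('/\[\[[^\]]+\]\]/', $string, $negated);

print_r($greedy[0]);  // one element spanning from [[bla to burp]]
print_r($lazy[0]);    // two elements: [[bla]] and [[burp]]
print_r($negated[0]); // the same two elements
```

The negated class avoids backtracking over the text between the brackets, but it also refuses a single ] inside the brackets, which the lazy version would accept.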

Related

Problem regular express pattern hunting similar matches

I have one of these four patterns:
"Test"
'Test'
`Test`
(Test)
Is it possible to get "Test" with a single preg_match call?
I tried the following:
if ( preg_match( '/^(?:"(.*)"|\'(.*)\'|`(.*)`|\((.*)\')$/iu', $pattern, $matches ) )
... but this gives me five elements in $matches. I would like to have only two: one for the whole match and one for the captured text, Test.
To make sure that a single quote, backtick or double quote is closed by the same character, you can use a capturing group with a backreference to that group.
To let the same group number also cover the alternative that matches ( with the closing ), you can use a branch reset group.
The match for Test is in group 2.
(?|(["'`])(Test)\1|\(((Test)\)))
Explanation
(?| Branch reset group
(["'`]) Capture in group 1 any of the listed
(Test)\1 Capture in group 2 matching Test followed by a backreference \1 to group 1
| Or
\(((Test)\)) Match (, capture in group 2 matching Test followed by )
) Close branch reset group
Regex demo | Php demo
For example:
$strings = [
"\"Test\"",
"'Test'",
"`Test`",
"(Test)",
"Test\"",
"'Test",
"Test`",
"(Test",
"\"Test'",
"'Test\"",
"`Test",
"Test)",
];
$pattern = '/(?|(["\'`])(Test)\1|\(((Test)\)))/';
foreach ($strings as $string){
$isMatch = preg_match($pattern, $string, $matches);
if ($isMatch) {
echo "Match $string ==> " . $matches[2] . PHP_EOL;
}
}
Result
Match "Test" ==> Test
Match 'Test' ==> Test
Match `Test` ==> Test
Match (Test) ==> Test
You can use a dot to match the characters around the word and use array_unique to remove duplicates.
preg_match_all("/.(\w+)./", $str,$match);
foreach($match as &$m) $m = array_unique($m);
var_dump($match);
https://3v4l.org/T2hnh
array(2) {
[0]=>
array(4) {
[0]=>
string(6) ""Test""
[1]=>
string(6) "'Test'"
[2]=>
string(6) "`Test`"
[3]=>
string(6) "(Test)"
}
[1]=>
&array(1) {
[0]=>
string(4) "Test"
}
}
You can use non-capturing groups:
'/^(?:"|\'|`|\()(.*)(?:"|\'|`|\))$/iu'
So just the (.*) group will capture data.
Your regex could be:
^['"`(](.+)['"`)]$
Which gives the following code in PHP (note that preg_match needs pattern delimiters, here /):
if (preg_match('/^[\'"`(](.+)[\'"`)]$/', $pattern, $matches))
Explanation
In regex, a character class, written with enclosing square brackets [], matches exactly one of the characters listed inside it.
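One thing to keep in mind with the character-class approach: the opening and closing characters are checked independently, so a mismatched pair such as "Test' is accepted. The sketch below compares it with a stricter pattern built on the branch reset idea from the earlier answer. The names $loose and $strict are illustrative, and using (.*) in place of the literal Test is my own generalization, not something the answers state.

```php
<?php
// Character classes: any opener from the set, any closer from the set.
$loose = '/^[\'"`(](.+)[\'"`)]$/';

// Branch reset: quotes must close with the same character (\1),
// and ( must close with ). The content is group 2 in both branches.
$strict = '/^(?|(["\'`])(.*)\1|(\()(.*)\))$/';

foreach (['"Test"', '(Test)', "\"Test'", '(Test"'] as $input) {
    $l = preg_match($loose, $input);
    $s = preg_match($strict, $input, $m);
    echo $input, ' loose=', $l, ' strict=', $s,
         $s ? ' content=' . $m[2] : '', PHP_EOL;
}
```

With the strict pattern $matches has three elements instead of two, so which version fits depends on whether mismatched delimiters can occur in your input.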

How can I match this array-like notation using regex in PHP?

I'm trying to match the following array-like pattern with regex:
foo[bar][baz][bim]
I almost have it with the following regex:
~([^[]+)(?:\[(.+?)\])*~gm
However, the capturing groups only include:
Full match: foo[bar][baz][bim]
Group 1: foo
Group 2: bim
I can't figure out why it's only capturing the last occurrence of the [] structure. I'd like it to capture foo, bar, baz, and bim in this case.
Any ideas on what I'm missing?
A repeated capturing group in PCRE keeps only the value of its last iteration, not the values of the earlier ones. To collect every part, you can use the \G anchor:
(?|(\w+)|\G(?!\A)\[([^][]*)\])
See live demo here
Regex breakdown:
(?| Start of a branch reset group
(\w+) Capture word characters
| Or
\G(?!\A) Continue from where the previous match ended (but not at the start of the string)
\[ Match an opening bracket
([^][]*) Capture anything except [ and ]
\] Match a closing bracket
) End of branch reset group
PHP code:
preg_match_all('~(?|(\w+)|\G(?!\A)\[([^][]*)\])~', 'foo[bar][baz][bim]', $matches);
print_r($matches[1]);
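If the \G pattern feels hard to maintain, a two-step variant is also possible: validate the overall shape first, then pull out the bracketed parts with a second call. This is a different technique from the \G answer above, shown only as a sketch; the function name parsePath is hypothetical.

```php
<?php
// Hypothetical helper: returns ['foo', 'bar', ...] or null if the
// input is not of the form name[key][key]...
function parsePath(string $input): ?array
{
    // Step 1: the whole string must be a name followed by [..] groups.
    if (!preg_match('~^(\w+)((?:\[[^][]*\])*)$~', $input, $m)) {
        return null;
    }
    // Step 2: collect the content of every [..] group.
    preg_match_all('~\[([^][]*)\]~', $m[2], $keys);

    return array_merge([$m[1]], $keys[1]);
}

print_r(parsePath('foo[bar][baz][bim]'));
var_dump(parsePath('foo[bar'));
```

Unlike the single-pattern version, this rejects malformed input such as an unclosed bracket instead of returning a partial result.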
This can also be parsed without regex.
Remove the closing ] and then explode on the opening [.
$str = "foo[bar][baz][bim]";
$str = str_replace("]","",$str);
$arr = explode("[", $str);
var_dump($arr);
Returns:
array(4) {
[0]=>
string(3) "foo"
[1]=>
string(3) "bar"
[2]=>
string(3) "baz"
[3]=>
string(3) "bim"
}
Where the first item is the "array" name and the following items are the children/path.

PHP regex backreference not working

I wrote a regex pattern which works perfectly when I test it in Regexr, but when I use it in my PHP code it doesn't always match when it should match.
The regular expression, including some examples that should and shouldn't match.
Example PHP code that should match but doesn't:
preg_match('/^([~]{3,})\s*([\w-]+)?\s*(?:\{([\w-\s]+)\})?\s*(\2[\w-]+)?\s*$/', "~~~ {class} lang", $matches);
echo var_dump($matches);
I believe the problem is caused by the backreference in the last capture group (\2[\w-]+), however, I can't quite figure out how to fix this.
Because you're referring to a group that did not take part in the match (group 2). So remove \2 from the regex.
^([~]{3,})\s*([\w-]+)?\s*(?:\{([-\w\s]+)\})?\s*([\w-]+)?\s*$
DEMO
~~~ {class} lang
Group 1: ~~~
Group 2: (missing)
Group 3: class
Group 4: lang
The problem is caused by capturing group #2: you have made this group optional. When it does not take part in the match, a backreference to it fails, so you need to make the backreference optional as well; otherwise it always requires the group to have matched.
However, since all groups are optional, I would just recurse into the subpattern of the second group with (?2).
^(~{3,})\s*([\w-]+)?\s*(?:{([^}]+)})?\s*((?2))?\s*$
Example:
$str = '~~~ {class} lang';
preg_match('/^(~{3,})\s*([\w-]+)?\s*(?:{([^}]+)})?\s*((?2))?\s*$/', $str, $matches);
var_dump($matches);
Output
array(5) {
[0]=> string(16) "~~~ {class} lang"
[1]=> string(3) "~~~"
[2]=> string(0) "" # Returns "" for optional groups that dont exist
[3]=> string(5) "class"
[4]=> string(4) "lang"
}
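The difference between a backreference and a subpattern call is easy to miss, so here is a small sketch with a deliberately simplified pattern (not the one from the question): \1 must repeat the same text that group 1 captured, while (?1) only re-applies the pattern of group 1.

```php
<?php
$backref    = '/^(\w+) \1$/';   // second word must equal the first
$subroutine = '/^(\w+) (?1)$/'; // second word only has to match \w+

var_dump(preg_match($backref, 'ab ab'));    // int(1)
var_dump(preg_match($backref, 'ab cd'));    // int(0)
var_dump(preg_match($subroutine, 'ab cd')); // int(1)
```

This is why replacing \2 with (?2) makes the pattern match: the last group no longer depends on what group 2 actually captured.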
The answers below helped me figure out why it wasn't working. However both the answers would give a positive match for $str = '~~~ lang {class} lang'; which I didn't want.
I fixed it by changing capturing group 2 to ([\w-]*) so that even if there is no string at that place, the capturing group exists but remains empty. This way all of the following strings match:
$str = '~~~ lang {no-lines float left} ';
$str = '~~~ {class} ';
$str = '~~~ lang';
$str = '~~~ {class } lang ';
$str = '~~~';
$str = '~~~lang{class}';
But this one won't:
$str = '~~~ css {class} php';
Full solution:
$str = '~~~ {class} lang';
preg_match('/^([~]{3,})\s*([\w-]*)?\s*(?:\{([\w-\s]+)\})?\s*(\2[\w-]+)?\s*$/', $str, $matches);
var_dump($matches);
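A quick way to check the claim is to run the pattern over the listed strings in a loop. In this sketch the class inside the braces is written [-\w\s] rather than [\w-\s]; that is my own change, made on the assumption that some newer PCRE versions reject a hyphen placed between two class escapes. Everything else is the pattern from the full solution.

```php
<?php
$pattern = '/^([~]{3,})\s*([\w-]*)?\s*(?:\{([-\w\s]+)\})?\s*(\2[\w-]+)?\s*$/';

$shouldMatch = [
    '~~~ lang {no-lines float left} ',
    '~~~ {class} ',
    '~~~ lang',
    '~~~ {class } lang ',
    '~~~',
    '~~~lang{class}',
];
$shouldNotMatch = [
    '~~~ css {class} php',
    '~~~ lang {class} lang',
];

// Record the preg_match result (1 or 0) for every test string.
$results = [];
foreach (array_merge($shouldMatch, $shouldNotMatch) as $str) {
    $results[$str] = preg_match($pattern, $str);
    echo var_export($str, true), ' => ', $results[$str], PHP_EOL;
}
```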

php preg_match get numbers between two strings

Hi I'm starting to learn php regex and have the following problem:
I need to extract the numbers inside $string.
The regex I use returns "NULL".
$string = 'Clasificación</a> (2194) </li>';
$regex = '/Clasificación</a>((.*?))</li>/';
preg_match($regex , $string, $match);
var_dump($match);
Thanks in advance.
There are three problems with your regex:
You aren't escaping the forward slash. You're using the forward slash as a delimiter, so if you want to use it as a literal character inside the expression, you need to escape it
((.*?)) doesn't do what you think it does. It creates two capturing groups -- one nested inside the other. I assume you're trying to capture what's inside the parentheses. For that, you'll need to escape the ( and ) characters. The expression would become: \((.*?)\)
Your expression doesn't handle whitespace. In the string you've given, there is whitespace between the </a> and the beginning of the number -- </a> (2194). To ignore the whitespace and capture just the number, you need to use \s (which matches any whitespace character). For that, you need to write \s*\((.*?)\)\s*.
The final regular expression, after fixing all the above errors, will look like:
$regex = '~Clasificación</a>\s*\((.*?)\)\s*</li>~';
Full code:
$string = 'Clasificación</a> (2194) </li>';
$regex = '~Clasificación</a>\s*\((.*?)\)\s*</li>~';
preg_match($regex , $string, $match);
var_dump($match);
Output:
array(2) {
[0]=>
string(32) "Clasificación</a> (2194) </li>"
[1]=>
string(4) "2194"
}
Demo.
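If the literal parts of the pattern come from HTML, you can also let preg_quote() do the escaping instead of doing it by hand. The following is a minimal sketch, assuming the surrounding markup is exactly as in the question; the variable names $before and $after are illustrative, and \d+ is used so that only digits are captured.

```php
<?php
$string = 'Clasificación</a> (2194) </li>';

// preg_quote() escapes regex metacharacters and, via its second
// argument, the delimiter as well.
$before = preg_quote('Clasificación</a>', '/');
$after  = preg_quote('</li>', '/');

$regex = '/' . $before . '\s*\((\d+)\)\s*' . $after . '/';

// preg_match() returns 1 on a match, 0 on no match, false on error.
$number = preg_match($regex, $string, $match) === 1 ? (int) $match[1] : null;

var_dump($number); // int(2194)
```

Checking the return value explicitly also makes a pattern error visible, rather than leaving $match empty as in the original attempt.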
You forgot to escape / in your regex, since you're using / as the delimiter:
$regex = '/Clasificación<\/a>((.*?))<\/li>/';
// The first and last / are the delimiters.
// Each \/ inside the pattern is an escaped, literal slash.
Another way is to change the delimiter, and then you will not have to escape the slash:
$regex = '#Clasificación</a>((.*?))</li>#';
See the PHP documentation for more information.
You will have to escape the special characters that you want to match literally:
$regex = '/Clasificación<\/a> \((.*?)\) <\/li>/';
You may also want to make your match a little more specific where it matters (depending on your use case):
$regex = '/Clasificación<\/a>\s*\(([0-9]+)\)\s*<\/li>/';
That allows zero or more whitespace characters before and after the (1234) and only matches if the parentheses contain nothing but digits.
I just tried this in php:
php > preg_match($regex , $string, $match);
php > var_dump($match);
array(2) {
[0]=>
string(30) "Clasificacin</a> (2194) </li>"
[1]=>
string(4) "2194"
}

PHP Regex to match words with dash right infront and save to array?

I'm wondering what kind of regex I could use to extract all the words that have a dash in front of them in a given string. I'm going to use it to let users omit certain words from the search results of my website.
For example, say I have
$str = "this is a search -test1 -test2";
I'm trying to save "test1" and "test2" to an array because they have a dash right in front.
Could anyone help me out?
Use the following pattern /\-(\w+)/. Example:
$string = 'this is a search -test1 -test2';
$pattern = '/\-(\w+)/';
if(preg_match_all($pattern, $string, $matches)) {
$result = $matches[1];
}
var_dump($result);
Output:
array(2) {
[0] =>
string(5) "test1"
[1] =>
string(5) "test2"
}
Explanation:
/.../ delimiter chars
\- a literal dash (the backslash is optional here; a dash is only special inside a character class)
(...) capture group; stores the content matched inside it in $matches[1]
\w+ one or more word characters
This does the job:
<pre><?php
$string = 'Phileas Fog, Passe-Partout -time -day -#StrAn-_gE+*$Word²²²';
preg_match_all('~ -\K\S++~', $string, $results);
print_r($results);
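Note that a bare -(\w+) also picks up the second half of a hyphenated word such as Passe-Partout. If that is not wanted, one option is a negative lookbehind so that the dash only counts at the start of a token. This is a sketch under that assumption; the function name excludedWords is hypothetical.

```php
<?php
// Hypothetical helper: returns the words that are prefixed with a dash,
// ignoring dashes in the middle of a word.
function excludedWords(string $query): array
{
    // (?<!\S) succeeds only if the dash is at the start of the string
    // or is preceded by whitespace.
    preg_match_all('/(?<!\S)-(\w+)/', $query, $matches);

    return $matches[1];
}

print_r(excludedWords('this is a search -test1 -test2'));
print_r(excludedWords('Phileas Fog, Passe-Partout -time -day'));
```

The first call returns test1 and test2; the second returns time and day, leaving Partout out.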
