Is there a method to omit nested parentheses in regex? - php

I'm writing a regex for PHP's preg_match_all to find every ifs(...) call, together with all of its contents, in a string. (My example contains only one ifs, as that's not the root of the problem.)
Here's what I've got so far:
Pattern: /ifs\(.*?\)/i
String: =iferror(ifs(OR("foo", "bar"),"a",OR("tar", "scar"),"b",OR("lar"),"d"),"c")
Current output: ifs(OR("foo", "bar")
Expected output: ifs(OR("foo", "bar"),"a",OR("tar", "scar"),"b",OR("lar"),"d")
The problem: the regex stops at the first closing parenthesis.
Where am I going wrong? And how would you tackle nested parentheses?
Demo: https://regex101.com/r/SgBqbW/1

Actually, you can do this thanks to PHP's support for recursive regular expressions. This is inspired by this comment on that page:
$string = '=iferror(ifs(OR("foo", "bar"),"a",OR("tar", OR("scar", "baa")),"b",OR("lar"),"d"),"c")
blah blah ifs(OR("foo", "bar"),"a") and another one ifs("a", OR("tar", OR("scar", "baa")),"b",OR("lar"),"d")';
$regex = '/ifs(\(((?>[^()]+)|(?-2))*\))/';
preg_match_all($regex, $string, $matches);
print_r($matches[0]);
Output:
Array (
[0] => ifs(OR("foo", "bar"),"a",OR("tar", OR("scar", "baa")),"b",OR("lar"),"d")
[1] => ifs(OR("foo", "bar"),"a")
[2] => ifs("a", OR("tar", OR("scar", "baa")),"b",OR("lar"),"d")
)
Demo on 3v4l.org
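To make the recursion easier to follow, here is a commented sketch of the same idea written with the x (extended) flag. It uses (?1) instead of the relative (?-2) reference, and the shortened input string is an assumption made for illustration, not the asker's data.

```php
<?php
// Hypothetical sample input; only the ifs(...) call should be returned.
$string = '=iferror(ifs(OR("foo", "bar"),"a",OR("lar"),"d"),"c")';

$regex = '/
    ifs              # the literal function name
    (                # group 1: one balanced (...) block
        \(
        (?:
            [^()]++  # a run of non-parenthesis characters (possessive)
          |
            (?1)     # or a nested balanced block: recurse into group 1
        )*
        \)
    )
/xi';

preg_match_all($regex, $string, $matches);
print_r($matches[0]);
```

Each time the engine meets an opening parenthesis inside the block, (?1) re-enters group 1, so the match only ends at the parenthesis that balances the one right after ifs.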

Related

regex - finding multiple occurrences of a pattern and extracting a string [duplicate]

I have tried the non capturing group option ?:
Here is my data:
hello:"abcdefg"},"other stuff
Here is my regex:
/hello:"(.*?)"}/
Here is what it returns:
Array
(
[0] => Array
(
[0] => hello:"abcdefg"}
)
[1] => Array
(
[0] => abcdefg
)
)
I wonder, how can I make it so that [0] => abcdefg and [1] doesn't exist?
Is there any way to do this? I feel like it would be much cleaner and improve my performance. I understand that regex is simply doing what I told it to do, that is showing me the whole string that it found, and the group inside the string. But how can I make it only return abcdefg, and nothing more? Is this possible to do?
Thanks.
EDIT: I am using the regex on a website that says it uses Perl regex. I am not actually using the Perl interpreter.
EDIT Again: apparently I misread the website. It is indeed using PHP, and it is calling it with this function: preg_match_all('/hello:"(.*?)"}/', 'hello:"abcdefg"},"other stuff', $arr, PREG_PATTERN_ORDER);
I apologize for this error, I fixed the tags.
EDIT Again 2: This is the website http://www.solmetra.com/scripts/regex/index.php
preg_match_all
If you want a different captured string, you need to change your regex. Here I'm looking for anything that is not a double quote, sitting between two double quotes, where the opening quote directly follows a colon.
<?php
$string = 'hello:"abcdefg"},"other stuff';
$pattern = '!(?<=:")[^"]+(?=")!';
preg_match_all($pattern,$string,$matches);
echo $matches[0][0];
?>
Output
abcdefg
If you print_r($matches), you will see an outer array whose element 0 is itself an array holding the full matches. To reach the string you therefore need two keys: $matches[0][0]. With preg_match_all you will always have to deal with nested arrays.
Array
(
[0] => Array
(
[0] => abcdefg
)
)
preg_replace
Alternatively, if you were to use preg_replace instead, you could replace all of the contents of the string except for your capture group, and then you wouldn't need to deal with arrays (but you need to know a little more about regex).
<?php
$string = 'hello:"abcdefg"},"other stuff';
$pattern = '!^[^:]+:"([^"]+)".+$!s';
$new_string = preg_replace($pattern,"$1",$string);
echo $new_string;
?>
Output
abcdefg
preg_match_all is returning exactly what it is supposed to.
The first element holds the entire text that matched the regex. Every other element holds one capture group.
If you just want the capture group, ignore the first element.
preg_match_all('/hello:"(.*?)"}/', 'hello:"abcdefg"},"other stuff', $arr, PREG_PATTERN_ORDER);
$firstMatch = $arr[1][0];
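If the goal is for $matches[0] to contain only the wanted text, one option is PCRE's \K escape, which resets the start of the reported match. The following is a minimal sketch under that assumption, reusing the input string from the question; it is an alternative to the answers above, not a replacement for them.

```php
<?php
$subject = 'hello:"abcdefg"},"other stuff';

// \K discards everything matched so far from the reported match,
// so the full match is only the quoted value.
// The lookahead still requires the closing quote and brace without consuming them.
$pattern = '/hello:"\K[^"]*(?="})/';

preg_match_all($pattern, $subject, $matches);
print_r($matches);
```

Because the pattern has no capture groups, $matches contains a single inner array, and $matches[0][0] is abcdefg.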

Pattern matching css rules

I have the following pattern:
[\{\}].*[\{\}]
With the following test strings (can provide more if needed):
}.prop{hello:ars;} //shouldn't match
}#prop{} //should match
}.prop #prop {} //should match
The purpose of the pattern is to find empty css rulesets. Can someone suggest how I go about excluding matches with characters between the second set of brackets? I will be updating the pattern as I get closer to a solution.
edit:
on http://gskinner.com/RegExr/
this pattern: [\}].*[\{]{1}[/}]{1}
seems to have the desired result, although it breaks when transferred to PHP for reasons I don't understand.
edit:
First, apologies if this should be a separate question.
Using the pattern from the first edit in PHP:
$pattern = "/[\}].*[\{]{1}[/}]{1}/";
preg_match_all ($pattern, $new_css, $p);
print_r($p);
When $new_css is a string of the content of an uploaded css file containing empty rulesets, $p is never populated. Yet I know this pattern is ok. Can anyone see what the issue is?
edit: final solution
//take out other unwanted characters
$pattern = "/\}([\.#\w]\w+\s*)+{}/";
//do it twice to beat any deformation
$new_css = preg_replace ($pattern, '}', $new_css);
$new_css = preg_replace ($pattern, '}', $new_css);
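Try using single quotes around the regex, or doubling the \ characters, so that PHP's double-quoted string handling cannot interfere with the backslashes. Also note that the pattern in the edit contains an unescaped / inside [/}]. Because / is the delimiter, the pattern ends at that character and preg_match_all fails with an "Unknown modifier" warning instead of populating $p.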
Try the pattern: '/}([\.#]\w+\s*)+{}/'
$new_css = "{}.prop{hello:ars;}
{}#prop{} //should match
}.prop #prop {} //should match
}.prop { aslkdfj}
}.prop { }
";
$pattern = '/}([\.#]\w+\s*)+{}/';
preg_match_all ($pattern, $new_css, $p);
print_r($p);
This outputs:
Array
(
[0] => Array
(
[0] => }#prop{}
[1] => }.prop #prop {}
)
[1] => Array
(
[0] => #prop
[1] => #prop
)
)
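The "do it twice" step in the question's final solution is needed because each match consumes the } that the next empty ruleset would need as its own starting point. One way around that, sketched below, is to assert the preceding } with a lookbehind instead of consuming it. The sample stylesheet, the ~ delimiter, and treating a whitespace-only body as empty are assumptions made for this illustration.

```php
<?php
// Hypothetical stylesheet: two empty rulesets in a row, then a non-empty one.
$css = 'a{color:red}#empty{}.also-empty {}b{margin:0}';

// (?<=\}) asserts that a closing brace precedes the selector without consuming it.
// [^{}]+ is the selector text; \{\s*\} is an empty or whitespace-only body.
// "~" is the delimiter here, so a "/" in the pattern would not end it early.
$pattern = '~(?<=\})[^{}]+\{\s*\}~';

$cleaned = preg_replace($pattern, '', $css);
echo $cleaned, "\n";
```

Like the original pattern, this sketch only finds empty rulesets that follow another ruleset; an empty ruleset at the very start of the file has no preceding } and is left alone.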

PHP/REGEX: Get a string within parentheses

This is a really simple problem, but I couldn't find a solution anywhere.
I'm trying to use preg_match or preg_match_all to obtain a string from within parentheses, but without the parentheses.
So far, my expression looks like this:
\([A-Za-z0-9 ]+\)
and returns the following result:
3(hollow highlight) 928-129 (<- original string)
(hollow highlight) (<- result)
What I want is the string within parentheses, but without the parentheses. It would look like this:
hollow highlight
I could probably replace the parentheses afterwards with str_replace or something, but that doesn't seem to be a very elegant solution to me.
What do I have to add, so the parentheses aren't included in the result?
Thanks for your help, you guys are great! :)
try:
preg_match('/\((.*?)\)/', $s, $a);
output:
Array
(
[0] => (hollow highlight)
[1] => hollow highlight
)
You just need to add capturing parentheses, in addition to your escaped parentheses.
<?php
$in = "hello (world), my name (is andrew) and my number is (845) 235-0184";
preg_match_all('/\(([A-Za-z0-9 ]+?)\)/', $in, $out);
print_r($out[1]);
?>
This outputs:
Array ( [0] => world [1] => is andrew [2] => 845 )
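If you would rather have the full match itself exclude the parentheses, lookarounds are another option. This is a small sketch using the string from the question; the variable names are arbitrary.

```php
<?php
$s = '3(hollow highlight) 928-129';

// (?<=\() requires an opening parenthesis just before the match,
// (?=\)) requires a closing parenthesis just after it; neither is consumed.
if (preg_match('/(?<=\()[A-Za-z0-9 ]+(?=\))/', $s, $m)) {
    echo $m[0], "\n";
}
```

Here $m[0] is hollow highlight and there is no second element, since the pattern contains no capture group.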

Get all occurrences of words between curly brackets

I have a text like:
This is a {demo} phrase made for {test}
I need to get
demo
test
Note: My text can have more than one block of {}, not always two. Example:
This is a {demo} phrase made for {test} written in {English}
I used this expression /{([^}]*)}/ with preg_match but it returns only the first word, not all words inside the text.
Use preg_match_all instead:
preg_match_all($pattern, $input, $matches);
It's much the same as preg_match, with the following stipulations:
Searches subject for all matches to the regular expression given in
pattern and puts them in matches in the order specified by flags.
After the first match is found, the subsequent searches are continued
on from end of the last match.
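The "order specified by flags" part of that description decides how $matches is laid out. As a rough illustration, using the phrase from the question and hypothetical variable names, the two ordering flags give:

```php
<?php
$s = 'This is a {demo} phrase made for {test}';
$pattern = '/{([^}]*)}/';

// PREG_PATTERN_ORDER (the default): one inner array per group.
// $byGroup[0] holds all full matches, $byGroup[1] all group-1 captures.
preg_match_all($pattern, $s, $byGroup, PREG_PATTERN_ORDER);

// PREG_SET_ORDER: one inner array per match.
// $byMatch[0] is the first match (full text, then group 1), $byMatch[1] the second.
preg_match_all($pattern, $s, $byMatch, PREG_SET_ORDER);

print_r($byGroup[1]);
print_r($byMatch[0]);
```

With PREG_SET_ORDER, looping over $byMatch gives you one complete match per iteration, which is often more convenient when a pattern has several groups.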
Your expression is correct, but you should be using preg_match_all() instead to retrieve all matches. Here's a working example of what that would look like:
$s = 'This is a {demo} phrase made for {test}';
if (preg_match_all('/{([^}]*)}/', $s, $matches)) {
    echo join("\n", $matches[1]);
}
To also capture the position of each match, pass PREG_OFFSET_CAPTURE as the fourth parameter to preg_match_all. Each entry then becomes a pair of the matched text and its offset in the subject:
if (preg_match_all('/{([^}]*)}/', $s, $matches, PREG_OFFSET_CAPTURE)) {
    foreach ($matches[1] as $match) {
        echo "{$match[0]} occurs at position {$match[1]}\n";
    }
}
Because { and } are part of regex syntax (they delimit quantifiers such as {3,}), it is safest to escape them when you mean literal braces:
<?php
$text = <<<EOD
this {is} some text {from}
which I {may} want to {extract}
some words {between} brackets.
EOD;
preg_match_all("!\{(\w+)\}!", $text, $matches);
print_r($matches);
?>
produces
Array
(
[0] => Array
(
[0] => {is}
[1] => {from}
[2] => {may}
[3] => {extract}
[4] => {between}
)
... etc ...
)
This example may be helpful to understand the use of curly brackets in regexes:
<?php
$str = 'abc212def3456gh34ij';
preg_match_all("!\d{3,}!", $str, $matches);
print_r($matches);
?>
which returns:
Array
(
[0] => Array
(
[0] => 212
[1] => 3456
)
)
Note that '34' is excluded from the results because \d{3,} requires at least 3 consecutive digits.
Matching the text between pairs of braces with a regex is less robust than using a stack. A regex works as a quick and dirty patch, but to parse and process an input string properly, especially one with nested braces, you should use a stack.
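As a rough sketch of the stack idea, the hypothetical helper below walks the string once, pushes the offset of every { onto a stack, and pops it when the matching } is found. The function name and the sample input are made up for illustration, and unbalanced braces are simply ignored.

```php
<?php
/**
 * Hypothetical helper: returns the contents of every {...} block,
 * including nested ones, by tracking open braces on a stack.
 */
function extractBraceBlocks(string $text): array
{
    $stack = [];   // offsets of "{" characters that are still open
    $blocks = [];
    $length = strlen($text);

    for ($i = 0; $i < $length; $i++) {
        if ($text[$i] === '{') {
            $stack[] = $i;
        } elseif ($text[$i] === '}' && $stack !== []) {
            $start = array_pop($stack);
            // Text strictly between the matching pair of braces.
            $blocks[] = substr($text, $start + 1, $i - $start - 1);
        }
    }

    return $blocks;
}

$blocks = extractBraceBlocks('This is a {demo} phrase made for {test {nested}}');
print_r($blocks);
```

Blocks are reported in the order their closing brace appears, so an inner block comes before the block that contains it.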
