Get all text between brackets but skip nested brackets - PHP

I'm trying to figure out how to get the text between two parentheses without stopping at the first closing ):
__('This is a (TEST) all of this i want') i dont want any of this;
My current pattern is __\((.*?)\)
which gives me
__('This is a (TEST)
but I want
__('This is a (TEST) all of this i want')
Thanks

You may use a regex subroutine to match text inside nested parentheses after __:
if (preg_match_all('~__(\(((?:[^()]++|(?1))*)\))~', $s, $matches)) {
print_r($matches[2]);
}
See the regex demo.
Details
__ - a __ substring
(\(((?:[^()]++|(?1))*)\)) - Group 1 (it will be recursed using the (?1) subroutine):
\( - a ( char
((?:[^()]++|(?1))*) - Group 2 capturing 0 or more repetitions of any 1+ chars other than ( and ) or the whole Group 1 pattern is recursed
\) - a ) char.
See the PHP demo:
$s = "__('This is a (TEST) all of this i want') i dont want any of this; __(extract this)";
if (preg_match_all('~__(\(((?:[^()]++|(?1))*)\))~', $s, $matches)) {
print_r($matches[2]);
}
// => Array ( [0] => 'This is a (TEST) all of this i want' [1] => extract this )

You forgot to escape two parentheses in your regex: __\((.*)\);
Check on regex101.com.

Use the pattern __\((.*)?\).
The \ escapes the parentheses to catch literal parentheses. This then captures all the text inside that set of parentheses.
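To make the difference between the patterns in this thread concrete, here is a small sketch that runs the lazy, the greedy and the recursive pattern against the same input. The test string is made up for illustration and deliberately contains two __() calls on one line, which is where the greedy version stops being useful.

```php
<?php
// Hypothetical input: two translation calls, the first with nested parentheses.
$s = "__('a (b) c') x; __(d)";

// Lazy quantifier: stops at the first ")" it finds.
preg_match('~__\((.*?)\)~', $s, $lazy);

// Greedy quantifier: runs on to the last ")" in the string.
preg_match('~__\((.*)\)~', $s, $greedy);

// Recursive pattern: (?1) re-enters group 1, so every "(" gets its own ")".
preg_match_all('~__(\(((?:[^()]++|(?1))*)\))~', $s, $balanced);

print_r([$lazy[1], $greedy[1], $balanced[2]]);
```

Only the recursive pattern returns the contents of each call separately; the greedy one swallows everything between the first opening and the last closing parenthesis.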

Related

Regex of number inside brackets

I need to get the float number inside the brackets.
I tried '([0-9]*[.])?[0-9]+', but it returns the first number, like 6 in the first example.
I also tried
'/\((\d+)\)/'
but it returns 0.
Please note that the extracted number may be either an int or a float.
Can you please help?
Since you also need to match the brackets, add escaped \( and \) to the regular expression:
$str = 'Serving size 6 pieces (40)';
$str1 = 'Per bar (41.5)';
preg_match('#\(([0-9]*[.]?[0-9]+)\)#', $str, $matches);
print_r($matches);
preg_match('#\(([0-9]*[.]?[0-9]+)\)#', $str1, $matches);
print_r($matches);
Output:
Array
(
[0] => (40)
[1] => 40
)
Array
(
[0] => (41.5)
[1] => 41.5
)
You could escape brackets:
$str = 'Serving size 6 pieces (41.5)';
if (preg_match('~\((\d+\.?\d*)\)~', $str, $matches)) {
    print_r($matches);
}
Outputs:
Array
(
[0] => (41.5)
[1] => 41.5
)
Regex:
\( # open bracket
( # capture group
\d+ # one or more digits
\.? # optional dot (escaped, so it matches only a literal dot)
\d* # optional digits
) # end capture group
\) # close bracket
You could also use this to allow at most one digit after the dot:
'~\((\d+\.?\d?)\)~'
You need to escape the brackets:
preg_match('/\((\d+(?:\.\d+)?)\)/', $search, $matches);
Explanation:
\( escaped bracket to look for
( open subpattern
\d a digit
+ one or more occurrences of the preceding character
( open group
?: don't save data in a subpattern
\. escaped dot
\d a digit
+ one or more occurrences of the preceding character
) close group
? zero or one occurrence of the preceding group
) close subpattern
\) escaped closing bracket to look for
This matches numbers like 1, 1.1, 11, 11.11, 111 and 111.111, but NOT .1 or .
https://regex101.com/r/ei7bIM/1
You could match an opening parenthesis, use \K to reset the starting point of the reported match and then match your value:
\(\K\d+(?:\.\d+)?(?=\))
That would match:
\( Match (
\K Reset the starting point of the reported match
\d+ Match one or more digits
(?: Non capturing group
\.\d+ Match a dot and one or more digits
)? Close non capturing group and make it optional
(?= Positive lookahead that asserts what follows is
\) Match )
) Close positive lookahead
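As a rough sketch of how the escaped-bracket patterns above could be wrapped for reuse: the helper name extractBracketedNumber is made up for this example, and it assumes you want the first bracketed number as a float, or null when there is none.

```php
<?php
// Hypothetical helper: returns the first bracketed number as a float, or null.
function extractBracketedNumber(string $text): ?float
{
    // \( and \) match literal brackets; the fractional part is optional.
    if (preg_match('/\((\d+(?:\.\d+)?)\)/', $text, $m)) {
        return (float) $m[1];
    }
    return null;
}

var_dump(extractBracketedNumber('Serving size 6 pieces (40)'));
var_dump(extractBracketedNumber('Per bar (41.5)'));
var_dump(extractBracketedNumber('No number here (abc)'));

// The \K variant reports only the number, so no capture group is needed.
preg_match('/\(\K\d+(?:\.\d+)?(?=\))/', 'Per bar (41.5)', $k);
var_dump($k[0]);
```

The unbracketed 6 in the first string is ignored because the pattern only accepts digits that sit directly between ( and ).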

How to get the text between any number of parenthesis?

Suppose I have a document where I want to capture the strings that have brackets of any kind before or after them.
Example: This [is] a {{test}} sentence. The (((end))).
So basically I want to get the words is, test and end.
Thanks in advance.
According to your condition ("strings that have parenthesis before or after"), a word may either be preceded by, or only followed by, some type of bracket:
$text = 'This [is] a {{test}} sentence. The (((end))). Some word))';
preg_match_all('/(?:\[+|\{+|\(+)(\w+)|(\w+)(?:\]+|\}+|\)+)/', $text, $m);
$result = array_filter(array_merge($m[1],$m[2]));
print_r($result);
The output:
Array
(
[0] => is
[1] => test
[2] => end
[7] => word
)
The below code works for me.
<?php
$in = "This [is] a {{test}} sentence. The (((end))).";
preg_match_all('/(?<=\(|\[|{)[^()\[\]{}]+/', $in, $out);
echo $out[0][0]."<br>".$out[0][1]."<br>".$out[0][2];
?>
Your regex could be:
[\[{(]((?(?<=\[)[^\[\]]+|(?(?<={)[^{}]+|[^()]+)))
Explanation: the if-then-else construction is needed to make sure that an opening '{' is matched against a closing '}', etc.
[\[{(] # Read [, { or (
((?(?<=\[) # Lookbehind: IF preceding char is [
[^\[\]]+ # THEN read all chars unequal to [ and ]
| # ELSE
(?(?<={) # IF preceding char is {
[^{}]+ # THEN read all chars unequal to { and }
| # ELSE
[^()]+))) # read all chars unequal to ( and )
See regex101.com
Try this Regex:
(?<=\(|\[|{)[^()\[\]{}]+
OR this one:
(?<=\(|{|\[)(?!\(|{|\[)[^)\]}]+
Explanation (for the 1st regex):
(?<=\(|\[|{) - Positive lookbehind - looks for a zero-length match just preceded by a { or [ or a (
[^()\[\]{}]+ - one or more occurrences of any character which is not among the following characters: [, (, {, }, ), ]
Explanation (for the 2nd regex):
(?<=\(|\[|{) - Positive lookbehind - looks for a zero-length match just preceded by a { or [ or a (
(?!\(|{|\[) - Negative lookahead - In the previous step, it found the position which is just preceded by an opening bracket. This piece of regex verifies that it is not followed by another opening bracket. Hence, matching the position just after the innermost opening bracket - (, { or [.
[^)\]}]+ - One or more occurrences of characters which are not among these closing brackets - ], }, )
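If you also want to require that the closing bracket is of the same type as the opening one, one possible variation (not taken from any of the answers above) is to list the three bracket pairs as alternatives inside a branch reset group (?|...), so that whichever alternative matches, the word ends up in group 1. The trailing "(word]" in the sample text is added here only to show a mismatched pair being skipped.

```php
<?php
// Sample text from the question, plus a mismatched pair at the end.
$text = 'This [is] a {{test}} sentence. The (((end))). Some (word]';

// (?| ... ) is a branch reset group: each alternative's capture is group 1.
$pattern = '~(?|\[([^\[\]{}()]+)\]|\{([^\[\]{}()]+)\}|\(([^\[\]{}()]+)\))~';

preg_match_all($pattern, $text, $m);
print_r($m[1]);
```

Unlike the lookbehind patterns, this consumes the innermost pair of brackets, so it only returns words that are directly enclosed on both sides.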

Grab all characters inside {...} if not contain "{" and "}"

I want to capture all characters inside { ... },
but only if there is no "{" or "}" inside.
For example:
{amdnia91(\+wowa}
Capture it.
{amdn{ia91(\+wowa}
Don't capture it (it contains "{").
preg_match_all('#(.+?)\{(.+?)\}#', $input, $output);
How can I fix it?
EDIT.
More explanation:
I am trying to create a CSS minifier,
and I need to capture all the names and the content inside the braces as separate array values.
The current $input looks like this:
.something{property:value;prop2:value}#something2{someprop:val;prop:val}
It is already minified, so it contains multiple ...{}...{} on one line.
My code captures everything correctly, but
it also matches when there are braces inside the braces,
and I don't want a match in that case.
[^}{] means match any character that is not } or {.
So:
preg_match_all('#\{([^}{]+)\}#', $input, $output);
However, note that in your {amdn{ia91(+wowa} example, this will match the ia91(+wowa fragment.
EDIT
If you didn't want any match at all for that second example, then try this:
preg_match_all('#^[^}{]*\{([^}{]+)\}[^}{]*$#', $input, $output);
The regex broken down means:
^ - The start of the line
[^}{]* - Any character which is not { or } zero or more times
\{ - The literal { character
([^}{]+) - Capture one or more characters which are not { or }
\} - The literal } character
[^}{]* - Any character which is not { or } zero or more times
$ - The end of the line
Second Edit
Given your further explanation on what you need, I'd suggest this:
preg_match_all('#(?<=^|})[^}{]*?\{([^}{]+?)\}(?=[^}]*$|[^}]*\{)#', $input, $output);
This uses a "look-behind" and a "look-ahead". Broken down, it means:
(?<=^|}) Lookbehind: Assert that this is either the start of the line or that the previous character was a literal '}' but do not include that character as part of the whole match
[^}{]*? - Lazily match zero or more characters which are not { or }
\{ - A literal {
([^}{]+?) - Lazily capture one or more characters which are not { or }
\} - A literal }
(?=[^}]*$|[^}]*\{) - Lookahead: Ensure that the following characters are either zero or more characters which are not } followed by the line end, or zero or more characters which are not } followed by a literal { but do not include those characters as part of the whole match
I am posting an alternative to the regex posted by daiscog based on the concept of matching what we do not need and omitting it, and only match what we need later with the help of PCRE (*SKIP)(*FAIL) verbs:
[#.]?[^{}]*{[^{}]*[{}][^{}]*}(*SKIP)(*F)|[#.]?[^{}]*{([^{}]+)}
See the regex demo
What does it match?
[#.]?[^{}]*{[^{}]*[{}][^{}]*}(*SKIP)(*F) - an optional . or # (see [#.]?) followed with 0+ characters other than { and } (see [^{}]*) followed with a {, that is again followed with [^{}]*, followed with either { or } (see [{}]) and then again [^{}]* and a closing }. This part matches strings like .something{ or nothing. Then, once matched, discard this match from the matches returned due to the (*SKIP)(*FAIL) verbs.
| - or...
[#.]?[^{}]*{([^{}]+)} - an optional . or # (see [#.]?) followed with 0+ characters other than { and } (see [^{}]*), then {, then 1+ characters other than braces ([^{}]+) and a closing brace }. This is what we will keep and get as matches.
PHP demo:
$re = '~[#.]?[^{}]*{[^{}]*[{}][^{}]*}(*SKIP)(*F)|[#.]?[^{}]*{([^{}]+)}~';
$str = "{amdnia91(+wowa}\n{amdn{ia91(+wowa}\n.something{property:value;prop2:value}#something2{someprop:val;prop:val}\n.something{property:value{;prop2:value}#something2{someprop:val;prop:val}\n.something{property:v}alue;prop2:value}#something2{someprop:val;prop:val}";
preg_match_all($re, $str, $matches);
print_r($matches);
Result:
Array
(
[0] => Array
(
[0] => {amdnia91(+wowa}
[1] =>
.something{property:value;prop2:value}
[2] => #something2{someprop:val;prop:val}
[3] => #something2{someprop:val;prop:val}
[4] => #something2{someprop:val;prop:val}
)
[1] => Array
(
[0] => amdnia91(+wowa
[1] => property:value;prop2:value
[2] => someprop:val;prop:val
[3] => someprop:val;prop:val
[4] => someprop:val;prop:val
)
)
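For the minifier use case described in the question (names and contents as separate values), a minimal sketch could look like the following. It assumes well-formed input without nested braces; the $rules array is just an illustrative way to store the result, and rules with stray braces inside are not filtered out here the way the (*SKIP)(*F) pattern does.

```php
<?php
$input = '.something{property:value;prop2:value}#something2{someprop:val;prop:val}';

// Group 1: the selector (anything without braces), group 2: the declarations.
$rules = [];
if (preg_match_all('~([^{}]+)\{([^{}]+)\}~', $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        $rules[$match[1]] = $match[2];
    }
}

print_r($rules);
```

Using ~ as the delimiter avoids any confusion with the # that starts an ID selector in the input.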

PHP: Parse comma-delimited string outside single and double quotes and parentheses

I've found several partial answers to this question, but none that cover all my needs...
I am trying to parse a user generated string as if it were a series of php function arguments to determine the number of arguments:
This string:
$arg1,$arg2='ABC,DEF',$arg3="GHI\",JKL",$arg4=array(1,'2)',"3\"),")
will be inserted as the arguments of a function:
function my_function( [insert string here] ){ ... }
I need to parse the string on the commas, taking into account single- and double-quotes, parentheses, and escaped quotes and parentheses to create an array:
array(4) {
[0] => $arg1
[1] => $arg2='ABC,DEF'
[2] => $arg3="GHI\",JKL"
[3] => $arg4=array(1,'2)',"3\"),")
}
Any help with a regular expression or parser function to accomplish this is appreciated!
It isn't possible to solve this problem with a classic CSV tool, since more than one character can protect parts of the string.
Using preg_split is possible, but it results in a very complicated and inefficient pattern. So the best way is to use preg_match_all. There are, however, several problems to solve:
a comma enclosed in quotes or parentheses must be ignored (treated as a character without special meaning, not as a delimiter)
you need to extract the params, but you also need to check that the string is well formed, otherwise the match results may be completely wrong!
For the first point, you can define subpatterns to describe each case: the quoted parts, the parts enclosed in parentheses, and a more general subpattern that matches a complete param and uses the two previous subpatterns when needed.
Note that the parentheses subpattern needs to refer to the general subpattern too, since it can contain anything (including commas).
The second point can be solved using the \G anchor, which ensures that all matches are contiguous. But you also need to be sure that the end of the string has been reached. To do that, you can add an optional empty capture group at the end of the main pattern that is created only if the end-of-string anchor \z succeeds.
$subject = <<<'EOD'
$arg1,$arg2='ABC,DEF',$arg3="GHI\",JKL",$arg4=array(1,'2)',"3\"),")
EOD;
$pattern = <<<'EOD'
~
# named groups definitions
(?(DEFINE) # this definition group lets you define the subpatterns you want
# without matching anything
(?<quotes>
' [^'\\]*+ (?s:\\.[^'\\]*)*+ ' | " [^"\\]*+ (?s:\\.[^"\\]*)*+ "
)
(?<brackets> \( \g<content> (?: ,+ \g<content> )*+ \) )
(?<content> [^,'"()]*+ # ' # (<-- comment for SO syntax highlighting)
(?:
(?: \g<brackets> | \g<quotes> )
[^,'"()]* # ' #
)*+
)
)
# the main pattern
(?: # two possible beginnings
\G(?!\A) , # a comma contiguous to a previous match
| # OR
\A # the start of the string
)
(?<param> \g<content> )
(?: \z (?<check>) )? # create an item "check" when the end is reached
~x
EOD;
$result = false;
if ( preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER) &&
isset(end($matches)['check']) )
$result = array_map(function ($i) { return $i['param']; }, $matches);
else
echo 'bad format' . PHP_EOL;
var_dump($result);
You could split the argument string at ,$ and then prepend the $ back to each array value:
$args_array = explode(',$', $arg_str);
foreach($args_array as $key => $arg_raw) {
$args_array[$key] = '$'.ltrim($arg_raw, '$');
}
print_r($args_array);
Output:
(
[0] => $arg1
[1] => $arg2='ABC,DEF'
[2] => $arg3="GHI\",JKL"
[3] => $arg4=array(1,'2)',"3\"),")
)
If you want to use a regex, you can use something like this:
(.+?)(?:,(?=\$)|$)
Php code:
$re = '/(.+?)(?:,(?=\$)|$)/';
$str = "\$arg1,\$arg2='ABC,DEF',\$arg3=\"GHI\",JKL\",\$arg4=array(1,'2)',\"3\"),\")\n";
preg_match_all($re, $str, $matches);
Match information:
MATCH 1
1. [0-5] `$arg1`
MATCH 2
1. [6-21] `$arg2='ABC,DEF'`
MATCH 3
1. [22-39] `$arg3="GHI\",JKL"`
MATCH 4
1. [40-67] `$arg4=array(1,'2)',"3\"),")`
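Since the question also allows a parser function, here is a minimal sketch of a non-regex alternative: a single pass over the string that tracks the open quote character and the parenthesis depth, and only splits on commas seen outside both. The function name splitTopLevelCommas is made up for this example, and unlike the \G-based pattern it does not validate the input (unbalanced quotes or parentheses are not reported).

```php
<?php
// Hypothetical helper: split on commas that are outside quotes and parentheses.
function splitTopLevelCommas(string $input): array
{
    $parts = [];
    $current = '';
    $quote = null;   // the quote character currently open, or null
    $depth = 0;      // parenthesis nesting depth outside quotes
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        if ($quote !== null) {
            $current .= $char;
            if ($char === '\\' && $i + 1 < $length) {
                // Copy the escaped character verbatim and skip past it.
                $current .= $input[++$i];
            } elseif ($char === $quote) {
                $quote = null;
            }
            continue;
        }

        if ($char === '"' || $char === "'") {
            $quote = $char;
        } elseif ($char === '(') {
            $depth++;
        } elseif ($char === ')') {
            $depth--;
        } elseif ($char === ',' && $depth === 0) {
            // A top-level comma ends the current argument.
            $parts[] = $current;
            $current = '';
            continue;
        }
        $current .= $char;
    }
    $parts[] = $current;

    return $parts;
}

$subject = <<<'EOD'
$arg1,$arg2='ABC,DEF',$arg3="GHI\",JKL",$arg4=array(1,'2)',"3\"),")
EOD;

$args = splitTopLevelCommas($subject);
print_r($args);
```

This does not depend on every argument starting with $, which the ,$ split and the (?=\$) lookahead both assume.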

"Optional" substring matching with regex

I am writing a regular expression in PHP that will need to extract data from strings that look like:
Naujasis Salemas, Šiaurės Dakota
Jungtinės Valstijos (Centras, Šiaurės Dakota)
I would like to extract:
Naujasis Salemas
Centras
For the first case, I have written [^-]*(?=,), which works quite well. I would like to modify the expression so that if there are parentheses ( and ), it searches between those parentheses and then extracts everything before the comma.
Is it possible to do something like this with just 1 expression? If so, how can I make it search within parenthesis if they exist?
A conditional might help you here:
$stra = 'Naujasis Salemas, Šiaurės Dakota';
$strb = 'Jungtinės Valstijos (Centras, Šiaurės Dakota)';
$regex = '
/^ # Anchor at start of string.
(?(?=.*\(.+,.*\)) # Condition to check for: presence of text in parentheses.
.*\(([^,]+) # If condition matches, match inside the parentheses up to the first comma.
| ([^,]+) # Else match start of string to first comma.
)
/x
';
preg_match($regex, $stra, $matches) and print_r($matches);
/*
Array
(
[0] => Naujasis Salemas
[1] =>
[2] => Naujasis Salemas
)
*/
preg_match($regex, $strb, $matches) and print_r($matches);
/*
Array
(
[0] => Jungtinės Valstijos (Centras
[1] => Centras
)
*/
Note that the index in $matches changes slightly above, but you might be able to work around that using named subpatterns.
I think this one could do it:
[^-(]+(?=,)
This is the same regex as yours, but it doesn't allow an opening parenthesis in the matched string. It will still match on the first subject, and on the second it will match just after the opening parenthesis.
Try it here: http://ideone.com/Crhzz
You could use
[^(),]+(?=,)
That would match any text except commas or parentheses, followed by a comma.
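One way to get a stable group index regardless of which case matched is a branch reset group (?|...), where both alternatives write to group 1. The sketch below is only an illustration under that assumption; the function name extractPlace is made up, and the /u modifier is there because the sample strings contain UTF-8 characters.

```php
<?php
// Hypothetical helper: text before the first comma, taken from inside the
// parentheses when they are present, otherwise from the start of the string.
function extractPlace(string $text): ?string
{
    // Alternative 1: "(" followed by text up to a comma.
    // Alternative 2: text from the start of the string up to a comma.
    $pattern = '/(?|\(([^,()]+),|^([^,()]+),)/u';
    if (preg_match($pattern, $text, $m)) {
        return trim($m[1]);
    }
    return null;
}

var_dump(extractPlace('Naujasis Salemas, Šiaurės Dakota'));
var_dump(extractPlace('Jungtinės Valstijos (Centras, Šiaurės Dakota)'));
```

Note that a string with a comma before the parentheses would match the second alternative first, so this only fits input shaped like the two examples.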
