Regular expression matching unescaped paired brackets and nested function calls

Here's the problem: given a string like
"<p>The price for vehicles {capitalize(pluralize(vehicle))} is {format_number(value, language)}</p><span>{employee_name}</span><span>\{do not parse me}</span>"
I need (1) a regex pattern in PHP that matches all values between un-escaped pairs of curly brackets and (2) another regex pattern that matches function calls and nested function calls (once the first pattern is matched). Of course, if I could use one regex only for both tasks that would be awesome.
By the way, I can't use Smarty, Twig or any other library - that's the only reason I have to build a parsing mechanism myself.
Thanks a ton!
Solution
(1) A partial solution for the first problem can be found here. Basically, we use the regex (?={((?:[^{}]++|{(?1)})++)}) and find the matches at index 1 of the resulting array.
It's partial because I still need to find a way of ignoring escaped braces, though.
(2) I'm considering the use of recursive regex, as suggested by Mario. Will post result here.
Thanks, guys!

Copying the answer from the comments in order to remove this question from the "Unanswered" filter:
This appears to be what you're looking for:
(?<!\\){([^(){}]+)\(((?:[^(){}]+|\(((?2))\))*)\)}
Link: http://www.regex101.com/r/uI4qN0
~ answer per Jerry
Note: For comparison - simply ignoring escaped braces is accomplished by adding a negative lookbehind (?<!\\) to the beginning of the expression, like so:
(?<!\\)(?={((?:[^{}]++|{(?1)})++)})
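For illustration, both patterns can be run against the sample string from the question with preg_match_all. This is only a minimal sketch, not part of the original answer: the variable names are made up, the outer braces are written as \{ and \} for readability, and the backslash in the lookbehind is doubled again because the pattern sits in a single-quoted PHP string.

```php
<?php
$subject = '<p>The price for vehicles {capitalize(pluralize(vehicle))} is '
         . '{format_number(value, language)}</p><span>{employee_name}</span>'
         . '<span>\{do not parse me}</span>';

// Pattern from the note: contents of every unescaped, balanced {...} pair (group 1).
$bracePattern = '/(?<!\\\\)(?=\{((?:[^{}]++|\{(?1)\})++)\})/';
preg_match_all($bracePattern, $subject, $braces);

// Jerry's pattern: function name (group 1) and raw argument list (group 2).
$callPattern = '/(?<!\\\\)\{([^(){}]+)\(((?:[^(){}]+|\(((?2))\))*)\)\}/';
preg_match_all($callPattern, $subject, $calls, PREG_SET_ORDER);

print_r($braces[1]);
foreach ($calls as $call) {
    echo $call[1], ' => ', $call[2], PHP_EOL;
}
```

On the sample string the first pattern should capture three values (the two function expressions and employee_name), while the second only reports the two function calls; the escaped \{do not parse me} is skipped by both.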

Related

Replacing unique nested statement with regex or alternatives

From the countless questions posted I know it's not possible/advisable to use regex to replace nested statements.
I'm wondering if it makes any difference in a case where statements are unique:
[if #test]TEST[if #second]SECOND[/if][/if]
I've gotten it to work when the end blocks are also unique, which I know is a clumsy workaround:
[if #test]TEST[if #second]SECOND[/if #second][/if #test]
$pattern = '%\[if #'.$dynamic.'.*?\](.*?)\[/if #'.$dynamic.'\]%s'; //Works with above
Is it possible to use regex without the end block being unique? Are there alternatives to regex that would accomplish this?
I would like to parse something like: [if #test]TEST[if #second]SECOND[/if][/if] with arbitrary nesting levels. If regex is not practical, can anyone suggest viable alternative in PHP?

In a proper solution you should tokenize the string into its basic components such as tags, comments, text and whatever else you have there. This step can be done with regex, and produces a flat list of tokens. Next you go through the tokens, building a parse tree with all the structure and details needed. (Both steps can be combined and done in one pass as well.)
That way everything is under your control and you don't need to reparse any part of the code.
On the other hand it can be done with regex, but then you are more limited, and you need to reparse the nested parts of the code for every added depth.
Since you asked for a regex, here is one to match such nested ifs (under the x flag an unescaped # starts a comment, so it is written as \#):
~
\[if\ \#(\w++)]
(
(?>
(?: (?!\[if\ \#\w++]|\[/if]) . )++
|
(?R)
)*+
)
\[/if]
~xs
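As a rough sketch of the "reparse the nested parts for every added depth" approach, the recursive pattern can be combined with preg_replace_callback. The function name renderIfs and the $flags array are assumptions made for this example, not something from the question; the # is escaped because under the x flag it would otherwise start a comment.

```php
<?php
// Hypothetical helper: keeps the body of [if #name]...[/if] when $flags[name] is truthy,
// drops the whole block otherwise, and re-runs itself on the body for nested blocks.
function renderIfs(string $text, array $flags): string
{
    $pattern = '~
        \[if\ \#(\w++)]                         # opening tag, name in group 1
        (                                       # body in group 2
          (?>
            (?: (?!\[if\ \#\w++]|\[/if]) . )++  # text that does not start a tag
            |
            (?R)                                # a complete nested block
          )*+
        )
        \[/if]
    ~xs';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($flags): string {
            return empty($flags[$m[1]]) ? '' : renderIfs($m[2], $flags);
        },
        $text
    );

    return $result ?? $text; // preg_replace_callback returns null on error
}

echo renderIfs('[if #test]TEST[if #second]SECOND[/if][/if]', ['test' => true, 'second' => false]);
```

Each call only handles the outermost blocks; the callback then reparses the body, which is the extra pass per nesting level mentioned above.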

PHP / Regex : match json inside json

Just a quick regex question...hopefully
I have a string that looks something like this:
$string = 'some text [ something {"index":"{"index2":"value2"}"}] [something2 {"here to be":"more specific"}]';
I want to be able to get the value:
{"index":"{"index2":"value2"}"}
But all my attempts at matching (or replacing) keep giving me:
{"index":"{"index2":"value2"}
preg_replace('/\[(.*?)({.*?[^}]})*?\]/is', "", $string);
Here I'm matching the whole square bracket area, but hopefully you can see what I'm trying to do.
The negation of the "do not match }" doesn't seem to be doing anything. Maybe I just need an OR in there or something.
Well, thanks if you have time to answer.
The $string could contain multiple instances of the {} so a greedy regex won't work....that I know of.

A plain regex can't count the opening brackets and the corresponding closing brackets (short of PCRE's recursion features); a simple for loop handles that more easily. You can, however, get the complete string from the first opening bracket to the last closing one with a greedy expression like ({.*}). Note that simple string functions are usually faster than regular expressions, so prefer those where they suffice.
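The for-loop idea could look roughly like the sketch below. The function name extractBalanced is made up for this example, and it assumes every brace counts, including braces inside quoted values (which is what the sample string needs).

```php
<?php
// Hypothetical helper: returns the first balanced {...} block at or after $offset, or null.
function extractBalanced(string $text, int $offset = 0): ?string
{
    $start = strpos($text, '{', $offset);
    if ($start === false) {
        return null;
    }

    $depth = 0;
    for ($i = $start, $n = strlen($text); $i < $n; $i++) {
        if ($text[$i] === '{') {
            $depth++;
        } elseif ($text[$i] === '}') {
            $depth--;
            if ($depth === 0) {
                // Found the brace that closes the one at $start.
                return substr($text, $start, $i - $start + 1);
            }
        }
    }

    return null; // ran out of input before the braces balanced
}

$string = 'some text [ something {"index":"{"index2":"value2"}"}] [something2 {"here to be":"more specific"}]';
var_dump(extractBalanced($string));
```

Passing an offset lets the same function pick up later blocks, for example the one inside the second pair of square brackets.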

Matching Parentheses with Division Operator - Regex

Examples:
input: (n!/(1+n))
output: frac{n!}{1+n}
input: ((n+11)!/(n-k)^(-1))
output: frac{(n+11)!}{(n-k)^(-1)}
input: (9/10)
output: frac{9}{10}
input: ((n+11)!/(n-k)^(-1))+(11)/(2)
output: frac{(n+11)!}{(n-k)^(-1)}+(11)/(2)
The following regex works if there are no sub parentheses.
\(([^\/\)]*)\/([^\)]*)\)
The following does matching parentheses
#\((([^()]++|\((?:[^()]++|(?R))+\))+)\)#
I just can not figure out how to "combine" them - write a single regex to handle division and balanced parentheses.

I think something like this should work:
((?:\w+|\((?1)\))(?:[+*^-](?1)|!)?)\/((?1))
Now, this probably isn't perfect, but here's the idea:
The first group, $1, is ((?:\w+|\((?1)\))(?:[+*^-](?1)|!)?), which is:
(A literal) or (a balanced expression wrapped in parentheses), followed by an optional operator and another balanced expression if needed.
Writing it that way, we can use (?1) anywhere in the regex to refer to another balanced expression.
Working example: http://ideone.com/PNLOD
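To see what the two groups capture, here is a small sketch using preg_match on two of the simpler inputs from the question. The variable names are assumptions for this example, and the code only shows the captured numerator and denominator; it does not build the final frac{...}{...} output or strip the outer parentheses.

```php
<?php
$pattern = '~((?:\w+|\((?1)\))(?:[+*^-](?1)|!)?)\/((?1))~';

$results = [];
foreach (['(9/10)', '(n!/(1+n))'] as $input) {
    if (preg_match($pattern, $input, $m)) {
        // $m[1] is the expression before the slash, $m[2] the one after it.
        $results[$input] = [$m[1], $m[2]];
        echo $input, ' -> numerator ', $m[1], ', denominator ', $m[2], PHP_EOL;
    }
}
```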

I know you've already accepted a regexp, but the real answer to what you're trying to do is a proper parser... and no, you don't need to code this from scratch or reinvent the wheel. While a lot of people hate phpclasses, the evalMath class that you can find there is incredibly useful for parsing and evaluating mathematical formulae. See my answer to a similar question for details of how it can be used and extended.

Parsing nested structures in PHP with preg_match

Hello I want to make something like a meta language which gets parsed and cached to be more performant. So I need to be able to parse the meta code into objects or arrays.
Startidentifier: {
Endidentifier: }
You can navigate through objects with a dot(.) but you can also do arithmetic/logic/relational operations.
Here is an example of what the meta language looks like:
{mySelf.mother.job.jobName}
or nested
{mySelf.{myObj.{keys["ObjProps"][0]}.personAttribute.first}.size}
or with operations
{obj.val * (otherObj.intVal + myObj.longVal) == 1200}
or more logical
{obj.condition == !myObj.otherCondition}
I think most of you already understood what I want. At the moment I can do only simple operations (without nesting and with only 2 values), but nesting for getting values with dynamic property names works fine. Text concatenation also works fine,
e.g. "Hello {myObj.name}! How are you {myObj.type}?".
Also the possibility to make short if like (condition) ? (true-case) : (false-case) would be nice but I have no idea how to parse all that stuff. I am working with loops with some regex at the moment but it would be probably faster and even more maintainable if I had more in regex.
So could anyone give me some hints or want to help me? Maybe visit the project site to understand what I need that for: http://sourceforge.net/projects/blazeframework/
Thanks in advance!

It is non-trivial to parse an indeterminate number of matching braces using regular expressions, because in general, either you will match too much or too little.
For instance, consider Hello {myObj.name}! {mySelf.{myObj.{keys["ObjProps"][0]}.personAttribute.first}.size}? to use two examples from your input in the same string:
If you use the first regular expression that probably comes to mind \{.*\} to match braces, you will get one match: {myObj.name}! {mySelf.{myObj.{keys["ObjProps"][0]}.personAttribute.first}.size} This is because by default, regular expressions are greedy and will match as much as possible.
From there, we can try to use a non-greedy pattern \{.*?\}, which will match as little as possible between the opening and closing brace. Using the same string, this pattern will result in two matches: {myObj.name} and {mySelf.{myObj.{keys["ObjProps"][0]}. Obviously the second is not a full expression, but a non-greedy pattern will match as little as possible, and that is the smallest match that satisfies the pattern.
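The difference between the two patterns is easy to reproduce. The snippet below is just a quick check with preg_match_all; the variable names are chosen for this example.

```php
<?php
$text = 'Hello {myObj.name}! {mySelf.{myObj.{keys["ObjProps"][0]}.personAttribute.first}.size}?';

// Greedy: runs from the first "{" to the last "}".
preg_match_all('/\{.*\}/', $text, $greedy);

// Non-greedy: each match stops at the first "}" it can reach.
preg_match_all('/\{.*?\}/', $text, $lazy);

print_r($greedy[0]);
print_r($lazy[0]);
```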
PCRE does allow recursive regular expressions, but you're going to end up with a very complex pattern if you go down that route.
The best solution, in my opinion, would be to construct a tokenizer (which could be powered by regex) to turn your text into an array of tokens which can then be parsed.
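A very small version of that tokenizer idea might look like this. It is only a sketch under simplifying assumptions: the function names tokenize and parseTokens are invented here, the only tokens are the two braces and the text between them, and operators, strings and escapes are not handled at all.

```php
<?php
// Split the text into a flat list of tokens: "{", "}" and the text between them.
function tokenize(string $text): array
{
    return preg_split('/([{}])/', $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
}

// Build a nested array from the tokens; each {...} becomes ['expr' => [children]].
function parseTokens(array $tokens, int &$pos = 0, bool $nested = false): array
{
    $nodes = [];
    while ($pos < count($tokens)) {
        $token = $tokens[$pos++];
        if ($token === '{') {
            $nodes[] = ['expr' => parseTokens($tokens, $pos, true)];
        } elseif ($token === '}') {
            if (!$nested) {
                throw new RuntimeException('Unmatched closing brace');
            }
            return $nodes;
        } else {
            $nodes[] = $token;
        }
    }
    if ($nested) {
        throw new RuntimeException('Unclosed opening brace');
    }
    return $nodes;
}

$tree = parseTokens(tokenize('Hello {mySelf.{myObj.key}.size}!'));
print_r($tree);
```

The regex is only used for the flat split; the nesting is tracked by the recursive function, so unbalanced input can be reported instead of silently mismatched.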

Maybe have a look at the PREG_OFFSET_CAPTURE flag?

Need to negate this regex pattern, but no clue how

I found a regex pattern for PHP that does the exact OPPOSITE of what I'm needing, and I'm wondering how I can reverse it?
Let's say I have the following text: Item_154 ($12)
This pattern /\((.*?)\)/ gets what's inside the parentheses, but I need to get "Item_154" and cut out what's in the parentheses and the space before them.
Anybody know how I can do that?
Regex is above my head apparently...

/^([^( ]*)/
Match everything from the start of the string until the first space or (.
If the item you need to match can have spaces in it, and you only want to get rid of whitespace immediately before the parenthetical, then you can use this instead:
/^([^(]*?)\s*\(/
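Both variants can be tried with preg_match, as in the short sketch below. The second input, with a space inside the item name, is an invented example.

```php
<?php
// Variant 1: stop at the first space or "(".
preg_match('/^([^( ]*)/', 'Item_154 ($12)', $simple);

// Variant 2: allow spaces in the name, drop only the whitespace before "(".
preg_match('/^([^(]*?)\s*\(/', 'Item 154 ($12)', $spaced);

var_dump($simple[1], $spaced[1]);
```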

The following will match anything that looks like text (...) but returns just the text part in the match.
\w+(?=\s*\([^)]*\))
Explanation:
The \w includes alphanumeric and underscore, with + saying match one or more.
The (?= ) group is positive lookahead, saying "confirm this exists but don't match it".
Then we have \s for whitespace, and * saying zero or more.
The \( and \) match literal ( and ) characters (since they are normally special characters).
The [^)] is anything non-) character, and again * is zero or more.
Hopefully all makes sense?
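In PHP the lookahead version could be used like this (a minimal sketch; the variable names are arbitrary). Because the parenthesised part is only looked at, not consumed, the whole match in index 0 is already the text you want.

```php
<?php
$string = 'Item_154 ($12)';

// \w+ is the match; the lookahead only checks that "(...)" follows.
preg_match('/\w+(?=\s*\([^)]*\))/', $string, $matches);

var_dump($matches[0]);
```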

/(.*)\(.*\)/
What is not in () will now be your 1st match :)

One site that really helped me was http://gskinner.com/RegExr/
It'll let you build a regex and then paste in some sample targets/text to test it against, highlighting matches. All of the possible regex components are listed on the right with (essentially) a tooltip describing the function.

<?php
$string = 'Item_154 ($12)';
// Lazy (.*?) plus \s* keeps the space before the parenthesis out of group 1.
$pattern = '/(.*?)\s*\(.*?\)/';
preg_match($pattern, $string, $matches);
var_dump($matches[1]);
?>
Should get you Item_154

The following regex works for your string as a replacement if that helps? :-
\s*\(.*?\)
Here's an explanation of what it's doing...
Whitespace, any number of repetitions - \s*
Literal - \(
Any character, any number of repetitions, as few as possible - .*?
Literal - \)
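Used as a replacement, that pattern would be applied with preg_replace and an empty replacement string, roughly like this (variable names are just for the example):

```php
<?php
$string = 'Item_154 ($12)';

// Remove the optional whitespace and the parenthesised part.
$clean = preg_replace('/\s*\(.*?\)/', '', $string);

var_dump($clean);
```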
I've found Expresso (http://www.ultrapico.com/) is the best way of learning/working out regular expressions.
HTH

Here is a one-shot to do the whole thing
$text = 'Item_154 ($12)';
$text = preg_replace('/([^\s]*)\s(\()[^)]*(\))/', '$1$2$3', $text);
var_dump($text);
//Outputs: Item_154()
Keep in mind that using any PCRE functions involves a fair amount of overhead, so if you are using something like this in a long loop and the text is simple, you could probably do something like this with substr/strpos and then concat the parens on to the end since you know that they should be empty anyway.
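A regex-free version of the same result could be sketched with strpos and substr as below. It assumes the text contains at most one parenthesised part and that it comes last; the variable names are made up for this example.

```php
<?php
$text = 'Item_154 ($12)';

// Find the opening parenthesis; keep everything before it, minus trailing whitespace.
$open = strpos($text, '(');
$result = $open === false
    ? $text
    : rtrim(substr($text, 0, $open)) . '()';

var_dump($result);
```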
That said, if you are looking to learn REGEXs and be productive with them, I would suggest checking out: http://rexv.org
I've found the PCRE tool there to be very useful, though it can be quirky in certain ways. In particular, any examples that you work with there should only use single quotes if possible, as it doesn't work with double quotes correctly.
Also, to really get a grip on how to use regexs, I would check out Mastering Regular Expressions by Jeffrey Friedl ISBN-13:978-0596528126
Since you are using PHP, I would try to get the 3rd Edition since it has a section specifically on PHP PCRE. Just make sure to read the first 6 chapters first since they give you the foundation needed to work with the material in that particular chapter. If you see the 2nd Edition on the cheap somewhere, that's pretty much the same core material, so it would be a good buy as well.
