Regex: Split function call arguments - php

I've run into this problem and have been searching for days. I'm using PHP to parse formulas for a platform.
A formula could be something like:
object.Field
ADD(object.NumberOfTHings, object.NumberOfThings)
object.DoSomething(ADD(object.NumberOfTHings, object.NumberOfThings), 'words!')
The idea is, it can nest many levels. Users can include quotes (double and single) as well.
I'm working on a function that will return each parameter at the highest level. So
object.DoSomething(ADD(object.NumberOfTHings, object.NumberOfThings), 'words!')
Will need to return the following array:
ADD(object.NumberOfTHings, object.NumberOfThings)
'words!'
We then go back and parse each parameter appropriately (some are object calls, function calls, etc.). I'm open to parsing it all at once, but figured that would just be more complicated.
My current regex is as follows:
/(?'pullsinglequotes'\'.+?\')|(?'pulldoublequotes'\".+?\")|(?'pullfunctions'[^,]\(([^()]|(?R))*\))/
It MOSTLY works, but has two issues:
Won't return objects yet (ex. if I reference object.Field as a parameter).
Only includes the last letter of a function.
Here's a REGEXR with the issue:
https://regexr.com/41e20
I've tried many different variations of REGEX and each has its downsides.
My question is: Does anyone have enough regex knowledge to solve those two issues? If so, any help would be greatly appreciated.
Update
If anyone is interested, the following was my final regex:
/(?'pullsinglequotes'\'.+?\')|(?'pulldoublequotes'\".+?\")|(?'pullfunctions'\b[\w.]+\s*\(([^()]|(?R))*\))|(?'pullvars'\w+(?:\.\w+)?)/

Your pullfunctions group matches only one character that's not a comma before the opening parenthesis, which is why you only get the last letter of the function name. Allow it to repeat and precede it with a word boundary.
For the vars and objects, just use a repeating word character with an optional dot-separated part. You can change this to a character class to allow other characters such as - (\w already includes _).
Full regex:
(?'pullsinglequotes'\'.+?\')|(?'pulldoublequotes'\".+?\")|(?'pullfunctions'\b[\w]+\s*\(([^()]|(?R))*\))|(?'pullvars'\w+(?:\.\w+)?)
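For comparison, the top-level split can also be done without a recursive regex. The sketch below is in Python, whose built-in re module has no (?R), so it swaps the regex for a plain character scan that tracks parenthesis depth and quote state. The function name split_top_level_args is made up for illustration, and the code assumes that quotes inside strings are not backslash-escaped.

```python
def split_top_level_args(formula):
    """Return the top-level arguments of the first call in `formula`.

    Hypothetical helper: a character scan, not the recursive regex above.
    Assumes quoted strings contain no backslash-escaped quotes.
    """
    start = formula.find("(")
    if start == -1:
        return []  # no call, e.g. a bare "object.Field"

    args, current = [], []
    depth = 0     # nesting depth inside the outer parentheses
    quote = None  # the quote character we are inside, if any

    for ch in formula[start + 1:]:
        if quote:
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            current.append(ch)
        elif ch == "(":
            depth += 1
            current.append(ch)
        elif ch == ")":
            if depth == 0:
                break  # closing parenthesis of the outer call
            depth -= 1
            current.append(ch)
        elif ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)

    last = "".join(current).strip()
    if last:
        args.append(last)
    return args


example = "object.DoSomething(ADD(object.NumberOfTHings, object.NumberOfThings), 'words!')"
top_level = split_top_level_args(example)
```

Each returned item can then be fed back into the same function to descend one nesting level at a time.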

Related

Python: unserialize PHP array with regex

I'm pretty new to regular expressions but decided to use them to unserialize PHP arrays. Here's some background info:
I rewrote a database-based website for companies in django which was written in PHP. There is an M2M relation with companies and industries. In the previous model it was solved by using serialized PHP arrays so I now have to sync everything correctly. My first attempt was some splitting and cutting and it was really ugly so I decided to dive into regular expressions. Here is what I got (it's working perfectly fine) now:
def unserialize_array(serialized_array):
    import re
    matched_sub = re.search(r'^a:\d+:\{i:\d+;s:\d+:"(.*?)";\}$', serialized_array).group(1)
    industry_list = re.sub(r'";i:\d+;s:\d+:"', "? ", matched_sub).split("? ")
    new_dict = dict(enumerate(industry_list))
    return new_dict
I was wondering however if I couldn't do all this with a single regular expression instead of two.
Update: this now also correctly handles escaped quotes \" (actually written \\") and any other escape sequence, such as an escaped quote after an escaped backslash, \\\" (that is, \\\\\\").
I think, if I understood the structure of your input correctly, that you can rewrite your method like this:
def unserialize_array(serialized_array):
    import re
    return dict(enumerate(re.findall(r'"((?:[^"\\]|\\.)*)"', serialized_array)))
Assumed input (as it is not specified explicitly in your question):
a:3:{i:1;s:9:"industry\\\\\\"A\\"";i:2;s:9:"\\"industry2\\"";i:3;s:9:"industry3"}
Output
{0: 'industry1\\\"A\"', 1: '\"industry2\"', 2: 'industry3'} (actually: {0: 'industry1\\\\\\"A\\"', 1: '\\"industry2\\"', 2: 'industry3'})
How it works
There is no need to match the entire structure of the serialized array, because we are only interested in the string contents. The regex "((?:[^"\\]|\\.)*)" consumes one character at a time until it meets either an escape \ (in which case it accepts the backslash plus the character after it) or the closing double quote ".
The capturing group ensures that the double quotes are removed from the final result.
Finally, re.findall returns, in a single call, a list of strings holding our desired results.
This is a peculiarity of re.findall: if the pattern contains at least one capturing group, the returned items are the groups' contents rather than the entire matches. In fact, the docs state:
If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
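As a rough illustration of that findall behaviour, the sketch below runs the same pattern with and without the capturing group. The two sample inputs are made up for this example and are not taken from the question.

```python
import re

# Same pattern as in the answer: a quoted string whose body may contain
# backslash escapes; the group captures the body without the quotes.
STRING_BODY = re.compile(r'"((?:[^"\\]|\\.)*)"')


def unserialize_array(serialized_array):
    return dict(enumerate(STRING_BODY.findall(serialized_array)))


# Hypothetical inputs, invented for illustration.
plain = 'a:2:{i:1;s:9:"industry1";i:2;s:9:"industry2";}'
escaped = r'a:1:{i:1;s:4:"a\"b";}'

result_plain = unserialize_array(plain)
result_escaped = unserialize_array(escaped)

# Without the capturing group, findall returns the whole matches,
# so the surrounding double quotes stay in the result.
whole_matches = re.findall(r'"(?:[^"\\]|\\.)*"', plain)
```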
Another suggestion: use re.sub with a callback that calls unserialize_array recursively.

Regex Multiple Matches PHP

I'm trying to get all numbers from a string that have - or _ before the number and, optionally, -, _, a space, or the end of the string after it.
So, my regex looks like this:
[-_][\d]+[-_ $]?
My problem is that I don't match numbers right after each other. From a "foo-5234_2123_54-13-20" string, I only get 5234, 54 and 20.
What I tried is the following regexes: (?:[-_])[\d]+(?:[-_ $])? and [-_]([\d]+)[-_ $]? that obviously didn't work. I've been looking for hours now and I know it can't be that hard, so I hope someone can help me here.
If that makes any difference, I'm using PHP preg_match_all.
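You just need to use look-arounds, so that the delimiters are checked without being consumed (note also that $ inside a character class is a literal dollar sign, not an anchor):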
(?<=[-_])\d+(?=[-_ ]|$)
See demo
Fortunately, PHP supports look-behinds as long as they are fixed-width, which is all we need here.
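To see the difference on the string from the question, here is a small Python sketch (Python's re handles these look-arounds the same way for this pattern). The variable names are only for illustration.

```python
import re

text = "foo-5234_2123_54-13-20"

# Consuming version: the trailing delimiter is swallowed by the match,
# so it is no longer available as the leading delimiter of the next number.
consuming = re.findall(r'[-_]\d+[-_ ]?', text)

# Look-around version: delimiters are asserted but not consumed,
# so adjacent numbers can share one delimiter.
lookaround = re.findall(r'(?<=[-_])\d+(?=[-_ ]|$)', text)
```

The first list contains only three matches, each with its delimiters attached, while the second contains all five numbers and nothing else.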

preg_replace this: {{1}{2}{3}{..}} -- Likely a simple PHP / preg_replace / RegEx issue but I am honestly stumped

Working on a personal project I am using PHP and I'd like to run a preg_replace_callback function against the following strings:
1. {{hello}}
2. {{hello}{there}{how}{are}{you}}
I'd like to detect the hello there how are and you and send to a function as $matches[0-4] (or however many there may be, needs to be variable from 1 to infinity).
The above isn't too hard for me, but I'd also like it so that if I pass this string:
3. {{hello}{there}{how}{are}{you}} blabla {{I}{Am}{Fine}{Thanks for asking}}
The function I send the $matches[0-X] to should be run TWICE, as the little {{}} system I designed is opened and shut twice!
The pattern should also ignore {text just on its own like this} BUT SHOULD run for {{text like this, i.e. just one box}}.
If I can type things with a back slash as a condition, such as:
4. {{ignore this next closing curly bracket \} as the slash makes it text}
...And it also could then also remove that now un-required backslash... well... THAT WOULD GET MASSIVE BONUS POINTS!!
All this is a preg_replace_callback too so I need the entire {{thing}{here}} to be replaced by whatever the function returns.
Is this simple? Or hard? I'm stuck!
Love learning though so if anyone could help me, it would be more appreciated than you'd ever imagine. Thank you!
EDIT p.s. If it is just too hard to do as I explain above, I'd accept it working for something like:
[{hello}{there}{how}{are}{you}]
Using square brackets as well as curly - But that is much less desirable...
A variable amount of capture groups is impossible; however, you can do a global match and match all of them (it would be near impossible to see if it came from the first group or the second group though, with example #3):
(?:\G(?!\A)|\{)[^}]*?\K\{(.*?)(?<!\\)\}
Demo
P.S. Here is an example expression that shows why a variable number of capture groups is impossible: (a)+bc. The repeated capture group is overwritten on each repetition, so in the end its contents equal those of the final repetition only.
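The sketch below first shows the overwriting behaviour in Python, then one possible workaround: a two-pass approach, which is a different technique from the \G/\K regex above (Python's re supports neither). An outer pattern finds each {{...}} block and an inner pattern lists its parts, so the callback runs once per block. The names replace_blocks and join_parts are hypothetical, and a backslash is assumed to escape the character after it.

```python
import re

# A repeated group keeps only its last repetition.
repeated = re.match(r'(?:\{(\w+)\})+', '{hello}{there}{how}')
last_only = repeated.groups()

# Outer: "{" + one or more "{part}" + "}". Inner: a single "{part}".
# Inside a part, a backslash escapes the next character.
OUTER = re.compile(r'\{((?:\{(?:[^{}\\]|\\.)*\})+)\}')
INNER = re.compile(r'\{((?:[^{}\\]|\\.)*)\}')


def replace_blocks(text, handler):
    """Replace every {{a}{b}...} block with handler(['a', 'b', ...])."""
    def callback(match):
        raw_parts = INNER.findall(match.group(1))
        # Drop the now unneeded escaping backslashes.
        parts = [re.sub(r'\\(.)', r'\1', part) for part in raw_parts]
        return handler(parts)
    return OUTER.sub(callback, text)


calls = []


def join_parts(parts):
    calls.append(parts)  # record each invocation
    return "|".join(parts)


result = replace_blocks("{{hello}{there}} blabla {{I}{Am}} {alone}", join_parts)
escaped_result = replace_blocks(r"{{a \} b}}", join_parts)
```

The same split into an outer match plus an inner global match should carry over to preg_replace_callback with preg_match_all inside the callback, though that PHP version is not shown here.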

What Regex for this?

I'm trying to learn regular expressions, because I can't do without them.
So, this is a list of different dimension patterns (for products for sale):
40x30x75
46x38x23-27
Ø30H30
Ø25-18H27
So, which pattern should I use to find each kind of dimension?
For example, right now I'm using this to find patterns like 40x30x75, but it doesn't work:
if(preg_match("#^[0-9][x][0-9][x][0-9]#", $dimension))
echo "ok"
Could you help me ?
Try the following regex:
(^[0-9]+x[0-9]+x[0-9]+$)|(^[0-9]+x[0-9]+x[0-9]+-[0-9]+$)|(^Ø[0-9]+H[0-9]+$)|(^Ø[0-9]+-[0-9]+H[0-9]+$)
So:
if (preg_match("/(^[0-9]+x[0-9]+x[0-9]+$)|(^[0-9]+x[0-9]+x[0-9]+-[0-9]+$)|(^Ø[0-9]+H[0-9]+$)|(^Ø[0-9]+-[0-9]+H[0-9]+$)/", $dimension))
echo "ok";
It probably can be simplified even more, maybe someone would want to have a go at that?
By the way, did you know about a website called RegExr? It allows you to test your regular expressions, and it has been very useful to me whenever I work with regexes.
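As one possible simplification of the four alternatives above, the optional -NN range can be folded into each shape with an optional group. This is a Python sketch; the names DIMENSION and is_dimension are made up, and fullmatch is used in place of ^ and $.

```python
import re

# Two shapes, each with an optional "-NN" range:
#   NNxNNxNN[-NN]   and   ØNN[-NN]HNN
DIMENSION = re.compile(r'(?:[0-9]+x[0-9]+x[0-9]+(?:-[0-9]+)?|Ø[0-9]+(?:-[0-9]+)?H[0-9]+)')


def is_dimension(value):
    # fullmatch requires the whole string to match, like ^...$ in the answer.
    return DIMENSION.fullmatch(value) is not None


samples = ["40x30x75", "46x38x23-27", "Ø30H30", "Ø25-18H27"]
all_valid = all(is_dimension(sample) for sample in samples)
```

In PHP the same pattern body should work inside preg_match delimiters with ^ and $ added around the group.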
Your regex is missing quantifiers. Add a + sign after the character classes in question to signal that you're looking for one or more matches:
if(preg_match("#^[0-9]+x[0-9]+x[0-9]+#", $dimension))
echo "ok";
By default, a character class matches exactly one character. A single literal character does not need a character class (although [x] was not wrong); see the x's in the example above.
Your regex should be:
^[0-9]{2}x[0-9]{2}x[0-9]{2}$
[0-9] means a single character between 0 and 9. So you either need to write it twice or use a quantifier like {2}. Instead of [0-9] you could also use \d, meaning any digit. So, you could for example write:
^\d\dx\d\dx\d\d$
Tip: If you can't do without regular expressions, want to learn it and have an easier life, I can recommend you get RegexBuddy. Bought it for myself when I just got started, and it has helped me a lot.
This will validate the first two:
^[0-9]+x[0-9]+x[0-9]+-?[0-9]*$

replace exact match in php

I'm new to regular expressions in PHP.
I have some data in which some of the values are stored as zero (0). What I want to do is replace them with '-'. I don't know which values will be zero, as my database table gets updated daily, which is why I have to apply the replacement to all the data.
$r_val=preg_replace('/(0)/','-',$r_val);
The code I'm using replaces every zero it finds. For example, it even replaces the zero in 104.67, giving the output 1-4.67, which is wrong. I want only values that are exactly zero to be replaced by '-', not every zero that it encounters.
Can anyone please help!!
Examples of the values that $r_val holds:
10.31,
391.05,
113393,
15.31,
1000 etc.
This depends a lot on how your data is formatted inside $r_val, but a good place to start would be to try:
$r_val = preg_replace('/(?<!\.)\b0\b(?!\.)/', '-', $r_val);
Where \b is a zero-width assertion that matches at the start or end of a 'word'.
Strange as it may sound, but the Perl regex documentation is actually really good for explaining the regex part of the preg_* functions, since Perl is where the functionality is actually implemented.
Again, it would be more than helpful if you could supply an example of what the $r_val string really looks like.
Note that \b matches at every word boundary, so without the look-arounds a plain \b0\b would also turn a string like "0.75" into "-.75". Not a desirable result, I guess.
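The effect of the look-arounds can be checked with a short Python sketch (the same pattern syntax works in Python's re). The sample string is made up and assumes $r_val holds several comma-separated values.

```python
import re

# Hypothetical sample: several values in one string.
values = "10.31, 0, 104.67, 0.75, 1000, 0"

# Word boundaries alone: the 0 in "0.75" is also replaced.
boundaries_only = re.sub(r'\b0\b', '-', values)

# With look-arounds: a 0 next to a decimal point is left alone.
guarded = re.sub(r'(?<!\.)\b0\b(?!\.)', '-', values)
```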
Whilst the other answer does work, it seems overly complex to me. I think you need only to use the ^ and $ chars either side of 0.
$r_val = preg_replace('/^0+$/', '&#45;', $r_val);
^ indicates the regex should match from the beginning of the line.
$ indicates the regex should match to the end of the line.
+ means match this pattern 1 or more times
I altered the minus sign to its HTML code equivalent too. Paranoid, yes, but we are dealing with numbers after all, so I thought throwing a raw minus sign in there might not be the best idea.
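A quick Python sketch of the anchored pattern, assuming each value is checked on its own (one value per string) rather than as one long string. The sample values are invented, and a plain '-' is used as the replacement to keep the output readable.

```python
import re

# Hypothetical values, one per string.
values = ["0", "000", "104.67", "0.75", "1000"]

# ^0+$ only matches when the whole value consists of zeros.
replaced = [re.sub(r'^0+$', '-', value) for value in values]
```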
Why not just do this?
if ( $r_val == 0 )
$r_val = '-';
You do not need to use a regex for this. In fact, I'd advise against doing so for performance reasons. The operation above is approximately 20x faster than the regex solution.
Also, the PHP manual advises against using regexes for simple replacements:
If you don't need fancy replacing rules (like regular expressions), you should always use this function instead of ereg_replace() or preg_replace().
http://us.php.net/manual/en/function.str-replace.php
Hope that helps!
