Regex: match an operation between two factors in mathematical equation - php

I want a regex that matches a multiplication or division operation in a mathematical expression, which may contain the power symbol (^). The match should cover the factor inside the innermost brackets and the factor next to it. I have written my own regex, but I face two main problems:
It doesn't match two factors that have no * symbol between them (see example 2); I want it to match.
It matches an operation that contains only a - symbol (example 4); I want it not to match unless there is a * or / symbol before the - symbol (example 3).
Here are my experiments:
EXAMPLE 1
String:
(sdf^sdf*(sdf*(23^3s)))*sdf
Expected result:
(sdf*(23^3s))
My current result:
(sdf*(23^3s))
EXAMPLE 2
String
(232^23)dfdf+dfd(sfsf)
Expected Result
(232^23)dfdf
My current result:
(doesn't match at all)
EXAMPLE 3
String
dfd(sfsf^sdf+323)/-13+sfdfsdf
Expected Result (UPDATED)
dfd(sfsf^sdf+323)
My current result
(sfsf^sdf+323)/-13
EXAMPLE 4
String
(dfd^23sdf)-(234^dfd)
Expected Result
(doesn't match anything)
My current result
(dfd^23sdf)-(234^dfd)
EXAMPLE 5
String
(dfd^23sdf)-(234^dfd)*(x-3)
Expected Result
(234^dfd)*(x-3)
My current result
(dfd^23sdf)-(234^dfd)*(x-3)
Here is my regex:
(\-?)\(?(((\-?)\-?\d*\.?\d*[a-z]*\^?)+)\)?(\*?\/?)((\-?)\(([^\(\)]+)\))(\*?\/?)(\-?)\(?(((\-?)\-?\d*\.?\d*[a-z]*\^?)+)\)|(((\-?)\(([^\(\)]+)\))([\*\/])(\-?)(((?!\+)(\-?)\(?[\-\d\.\w\^\+\-\*\/]*\)?))?)

A suggestion: if you're happy with the regex you've got, you can speed it up by turning all the capturing groups into non-capturing clusters and then running it through the regex refactoring software at http://www.regexformat.com
Before:
https://regex101.com/r/5Wm1Eb/4
(\-?((\w+\.\^\(.*?\)|([\w\.\^]+))|(\(?\(([^\(\)]+)\)\)?))(((\/)(?!\-))|((\*)(?!\-))|(\/\-)|(\*\-))?\(([^\(\)]+)\))|(\-?\(([^\(\)]+)\)((((\/)(?!\-))|((\*)(?!\-))|(\/\-)|(\*\-))?((\w+\.\^\(.*?\)|([\w\.\^]+))|(\(?\(([^\(\)]+)\)\)?))))
After, twice as fast, half as big:
https://regex101.com/r/TbHlI1/1
\-?(?:(?:\w+\.\^\(.*?\)|[\w\.\^]+|\(?\([^\(\)]+\)\)?)(?:[*/](?:(?!\-)|\-))?\([^\(\)]+\)|\([^\(\)]+\)(?:[*/](?:(?!\-)|\-))?(?:\w+\.\^\(.*?\)|[\w\.\^]+|\(?\([^\(\)]+\)\)?))
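To see how a candidate pattern behaves, it can help to run it over the example strings from the question. The following is only a rough sketch: the $examples array and the ~ delimiter are choices made here for illustration (the delimiter avoids escaping the / inside [*/]), and the pattern is the refactored one quoted above. It prints whatever each string matches and makes no claim that every expected result from the question is met.

```php
<?php
// Refactored pattern from above, wrapped in ~ delimiters so [*/] needs no extra escaping.
$pattern = '~\-?(?:(?:\w+\.\^\(.*?\)|[\w\.\^]+|\(?\([^\(\)]+\)\)?)(?:[*/](?:(?!\-)|\-))?\([^\(\)]+\)|\([^\(\)]+\)(?:[*/](?:(?!\-)|\-))?(?:\w+\.\^\(.*?\)|[\w\.\^]+|\(?\([^\(\)]+\)\)?))~';

// Example strings copied from the question, keyed by example number.
$examples = [
    1 => '(sdf^sdf*(sdf*(23^3s)))*sdf',
    2 => '(232^23)dfdf+dfd(sfsf)',
    3 => 'dfd(sfsf^sdf+323)/-13+sfdfsdf',
    4 => '(dfd^23sdf)-(234^dfd)',
    5 => '(dfd^23sdf)-(234^dfd)*(x-3)',
];

$results = [];
foreach ($examples as $n => $subject) {
    // preg_match() returns 1 on a match and 0 when nothing matches.
    $results[$n] = preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
    echo "Example $n: ", $results[$n] ?? '(no match)', PHP_EOL;
}
```

Comparing the printed matches with the expected results in the question shows quickly which examples still need work.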

After a few hours of looking for a solution, here is what I got:
First, I wrote a regex to match an operator such as (*), (/), (*-), or (/-).
(((\/)(?!\-))|((\*)(?!\-))|(\/\-)|(\*\-))?
After that, I made a regex to find the factor inside the innermost brackets together with the closest factor before it that meets the condition.
(((\w+\^\(.*?\)|([\w\^]+))|(\(?\(([^\(\)]+)\)\)?))(((\/)(?!\-))|((\*)(?!\-))|(\/\-)|(\*\-))?\(([^\(\)]+)\))
If that doesn't match, it tries again to find the factor inside the innermost brackets together with the closest factor after it that meets the condition.
(\(([^\(\)]+)\)((((\/)(?!\-))|((\*)(?!\-))|(\/\-)|(\*\-))?((\w+\^\(.*?\)|([\w\^]+))|(\(?\(([^\(\)]+)\)\)?))))
Then I combine the two regexes above with the alternation operator (|) to get the desired result.
DEMO
UPDATED
I modified some parts so that it can also match negative factors and decimals (marked with the '.' symbol).
DEMO

Related

Retrieve 0 or more matches from comma separated list inside parenthesis using regex

I am trying to retrieve matches from a comma-separated list that is located inside parentheses, using a regular expression. (I also retrieve the version number in the first capture group, though that's not important to this question.)
What's worth noting is that the expression should ideally handle all possible cases, where the list could be empty or could have more than 3 entries, i.e. 0 or more matches in the second capture group.
The expression I have right now looks like this:
SomeText\/(.*)\s\(((,\s)?([\w\s\.]+))*\)
The string I am testing this on looks like this:
SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)
Result of this is:
1. [6-11] `1.0.4`
2. [32-52] `, Macbook Pro Retina`
3. [32-34] `, `
4. [34-52] `Macbook Pro Retina`
The desired result would look like this:
1. [6-11] `1.0.4`
2. [32-52] `debug`
3. [32-34] `OS X 10.11.2`
4. [34-52] `Macbook Pro Retina`
According to the image above (as far as I can see), the expression should work on the test string. What is the cause of the weird results and how could I improve the expression?
I know there are other ways of solving this problem, but I would like to use a single regular expression if possible. Please don't suggest other options.
When dealing with a varying number of groups, regex ain't the best. Solve it in two steps.
First, break down the statement using a simple regex:
SomeText\/([\d.]*) \(([^)]*)\)
1. [9-14] `1.0.4`
2. [16-55] `debug, OS X 10.11.2, Macbook Pro Retina`
Then just explode the second result by ',' to get your groups.
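The two steps could look roughly like this in PHP. It is only a sketch: the variable names are made up here, and trim() is added to drop the space that follows each comma.

```php
<?php
$line = 'SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)';

$version = null;
$items = [];

// Step 1: capture the version and the raw text between the parentheses.
if (preg_match('/SomeText\/([\d.]*) \(([^)]*)\)/', $line, $m)) {
    $version = $m[1];
    // Step 2: split on commas; an empty pair of parentheses gives an empty array.
    $items = $m[2] === '' ? [] : array_map('trim', explode(',', $m[2]));
}

print_r($items);
```

Because the splitting happens outside the regex, the list can hold any number of entries, including none.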
Probably the \G anchor works best here for binding the match to an entry point. This regex is designed for input that is always similar to the sample that is provided in your question.
(?<=SomeText\/|\G(?!^))[(,]? *\K[^,)(]+
(?<=SomeText\/|\G) the lookbehind is the part where matches should be glued to
\G matches where the previous match ended; (?!^) keeps it from matching at the start of the string
[(,]? * matches an optional opening parenthesis or comma followed by any number of spaces
\K resets beginning of the reported match
[^,)(]+ matches the wanted characters, that are none of ( ) ,
Demo at regex101 (grab matches of $0)
Another idea with use of capture groups.
SomeText\/([^(]*)\(|\G(?!^),? *([^,)]+)
This one, without the lookbehind, is a bit more accurate (it also requires the opening parenthesis), performs better (needs fewer steps), and is probably easier to understand and maintain.
SomeText\/([^(]*)\( the entry anchor and version is captured here to $1
|\G(?!^),? *([^,)]+) or glued to previous match: capture to $2 one or more characters, that are not , ) preceded by optional space or comma.
Another demo at regex101
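For what it's worth, here is a sketch of how the capture-group variant might be used from PHP with preg_match_all. The loop and variable names are assumptions made for illustration. The first match carries the version in group 1; every later match carries one list item in group 2.

```php
<?php
$line = 'SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)';
$pattern = '/SomeText\/([^(]*)\(|\G(?!^),? *([^,)]+)/';

$version = null;
$items = [];

preg_match_all($pattern, $line, $sets, PREG_SET_ORDER);
foreach ($sets as $set) {
    // Group 2 is absent or empty for the entry match and filled for item matches.
    $item = $set[2] ?? '';
    if ($item !== '') {
        $items[] = $item;
    } else {
        // [^(]* also picks up the space before "(", so trim it off.
        $version = trim($set[1]);
    }
}

print_r($items);
```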
Actually, stribizhev was close:
(?:SomeText\/([^() ]*)\s*\(|(?!^)\G),?\s*([^(),]+)(?=[^()]*\))
Just had to make that one class expect at least one match
(?:SomeText\/([0-9.]+)\s*\(|(?!^)\G),?\s*([^(),]+)(?=[^()]*\)) is a little more clear as long as the version number is always numbers and periods.
I wanted to come up with something more elegant than this (though this does actually work):
SomeText\/(.*)\s\(([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?\)
Obviously, the
([^\,]+)?\,?\s?
is repeated 6 times.
(It can be repeated any number of times and it will work for any number of comma-separated items equal to or below that number of times).
I tried to shorten the long, repetitive list of ([^\,]+)?\,?\s? above to
(?:([^\,]+)\,?\s?)*
but it doesn't work and my knowledge of regex is not currently good enough to say why not.
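A likely reason the shortened version does not work: when a capturing group sits inside a repeated group, PCRE keeps only the text from the last repetition, so the earlier items are overwritten. A small sketch of that behaviour (the sample string and the slightly adapted pattern are made up for illustration):

```php
<?php
$list = '(debug, OS X 10.11.2, Macbook Pro Retina)';

// The inner capturing group is repeated once per item,
// but only the value from the final repetition survives in $m[1].
preg_match('/\((?:([^,)]+),?\s?)*\)/', $list, $m);

echo $m[1], PHP_EOL;
```

This is also why the expression in the question reports only the last entry of the list.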
This should solve your problem. Use the code you already have and add something like this. It will determine where commas are in your string and delete them.
Use trim() to delete white spaces at the start or the end.
$a = strpos($line, ",");
$line = trim(substr($line, 55-$a));
I hope this helps you!

Stripping down Phonenumber (mobile)

Is there a function or an easy way to strip phone numbers down to a specific format?
The input can be any mobile number, with different country codes,
for example
+4917112345678
+49171/12345678
0049171 12345678
or maybe from another country
004312345678
+44...
I'm doing a
$mobile_new = preg_replace("/[^0-9]/","",$mobile);
to remove everything that is not a digit, because I need the number in the format 49171 (without + or 00 at the beginning). But I also need to handle the cases where 00 is entered first, where someone uses +49(0)171, or where someone enters 0171 (which needs to become 49171).
So the first digits ALWAYS need to be the country code, without +/00 and without any (0) in between.
Can someone give me advice on how to solve this?
You can use
(?:^(?:00|\+|\+\d{2}))|\/|\s|\(\d\)
to match most of your cases and simply replace them with nothing. For example:
$mobile = "+4917112345678";
$mobile_new = preg_replace("/(?:^(?:00|\+|\+\d{2}))|\/|\s|\(\d\)/","",$mobile);
echo $mobile_new;
//output: 4917112345678
regex101 Demo
Explanation:
I'm making use of OR here, matching each of your cases one by one:
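(?:^(?:00|\+|\+\d{2})) matches 00 or + at the beginning of your string (the \+\d{2} branch is never reached, because the plain \+ branch before it always matches first, so the country code digits are kept)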
\/ matches a / anywhere in the string
\s matches a whitespace character anywhere in the string (it matches the newline in the regex101 demo, but I suppose you match each number on its own)
\(\d\) matches a number enclosed in brackets anywhere in the string
The only case not covered by this regex is the input format 01712345678, as you can only take a guess what the country specific prefix can be. If you want it to be 49 by default, then simply replace each input starting with a single 0 with the 49:
$mobile = "01712345678";
$mobile_new = preg_replace("/^0/","49",$mobile);
echo $mobile_new;
//output: 491712345678
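Both steps could be wrapped in one helper, roughly like this. It is a sketch under a few assumptions: normalizeMobile() and its $defaultCountry parameter are hypothetical names, the unreachable \+\d{2} branch is left out, and the question's own "remove every non-digit" step is kept as a safety net.

```php
<?php
// Hypothetical helper: reduce a mobile number to digits that start with the country code.
function normalizeMobile(string $raw, string $defaultCountry = '49'): string
{
    // Strip a leading 00 or +, any slash or whitespace, and a bracketed digit such as (0).
    $number = preg_replace('/^(?:00|\+)|\/|\s|\(\d\)/', '', trim($raw));
    // Remove anything else that is not a digit (dashes, dots, ...).
    $number = preg_replace('/[^0-9]/', '', $number);
    // A remaining single leading 0 means national format: swap it for the default country code.
    return preg_replace('/^0/', $defaultCountry, $number);
}

$samples = ['+4917112345678', '+49171/12345678', '0049171 12345678', '+49(0)171 12345678', '0171 12345678'];
foreach ($samples as $sample) {
    echo $sample, ' => ', normalizeMobile($sample), PHP_EOL;
}
```

The default of 49 is still only a guess for numbers entered without a country code, as noted above.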
This pattern (49)\(?([0-9]{3})[\)\s\/]?([0-9]{8}) splits the number into three groups:
49 - country code
3 digits - area code
8 digits - number
After a match you can construct the clean number by concatenating the groups as \1\2\3.
Demo: https://regex101.com/r/tE5iY3/1
If this doesn't suit you, then please explain more precisely what you want, with test input and expected output.
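A minimal sketch of the concatenation step with preg_match; the sample inputs are taken from the question and the variable names are made up here. Note that the pattern is tied to country code 49 and to a 3 + 8 digit layout.

```php
<?php
$pattern = '/(49)\(?([0-9]{3})[\)\s\/]?([0-9]{8})/';

$clean = [];
foreach (['+49171/12345678', '0049171 12345678'] as $input) {
    if (preg_match($pattern, $input, $m)) {
        // Join country code, area code and number, dropping everything around them.
        $clean[$input] = $m[1] . $m[2] . $m[3];
    }
}

print_r($clean);
```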
I recommend taking a look at LibPhoneNumber by Google and its port for PHP.
It has support for many formats and countries and is well-maintained. Better not to figure this out yourself.
https://github.com/giggsey/libphonenumber-for-php
$phoneUtil = \libphonenumber\PhoneNumberUtil::getInstance();
$usNumberProto = $phoneUtil->parse("+1 650 253 0000", "US");

PHP Regexp capturing repeating group of chars, e.g. hahaha jajajaja hihihi

As the title says, is there a way in PHP, using preg_match_all, to catch all repetitions of a group of characters?
For instance, catch
hahahaha
jajajaj
hihihi
It's fine to catch repetition of any char, like abababab, acacacacac.
Also, is there a way to count the number of repetition?
The idea is to catch all this "forms" of smiling on social media.
I figured out that there are also other cases, such as misspelled instances like ahahhahaah (where you have two consecutive a or h). Any ideas?
How about this:
preg_match_all('/((?i)[a-z])((?i)[a-z])(\1\2)+/', $str, $m);
$matches = $m[0]; //$matches will contain an array of matches
A bit complicated, but it does work. To explain: the first subpattern, ((?i)[a-z]), matches any letter from a to z, regardless of case. The second subpattern, ((?i)[a-z]), does the same thing. The third subpattern, (\1\2)+, matches one or more repetitions of those first two letters, in the same case as they originally appeared. This regular expression also assumes that the match has an even number of characters. If you don't want that, you can add \1? at the end, meaning that (as long as the match contains one or more repetitions) it can end with the first character again (for instance, hahah and ikikikik would both be valid, but not asa).
To retrieve the number of repetitions for a specific match, you can do:
$numb = strlen($matches[$index])/2 - 1; //-1 because the first two letters aren't repetitions
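Put together, with the optional \1? added, it might look like this. This is a sketch with a made-up sample string; intdiv() is used instead of a plain division so that an odd-length match such as jajajaj still gives a whole number.

```php
<?php
$str = 'hahaha jajajaj hihihi';

// Two letters, then one or more repeats of that pair, optionally ending on the first letter.
preg_match_all('/((?i)[a-z])((?i)[a-z])(\1\2)+\1?/', $str, $m);

$counts = [];
foreach ($m[0] as $match) {
    // Number of pairs minus the first one, which is not a repetition.
    $counts[$match] = intdiv(strlen($match), 2) - 1;
}

print_r($counts);
```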
For the shortest repetition (e.g. ha gets repeated multiple times in hahahaha):
(.+?)\1+
See demo.
For the longest repetition (e.g. haha gets repeated in hahahaha):
(.+)\1+
Counting Repetitions
The non-regex solution is to compare the lengths of Group 1 (the repeated token) and the overall match.
With pure regex, in .NET, you could simply do (.+?)(\1)+ and look at the number of captures in the Group 1 CaptureCollection object.
In PHP, that's not possible, but there are some hacks. See, for instance, this question about matching a line number—it's the same technique. This is for "study purposes" only—you wouldn't want to use that in real life.
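The length comparison mentioned under Counting Repetitions could be sketched like this in PHP. The sample word is made up, and the count here includes the first occurrence of the token.

```php
<?php
$word = 'hahahaha';
$count = 0;

// Lazy (.+?) finds the shortest repeated token; \1+ consumes its repetitions.
if (preg_match('/(.+?)\1+/', $word, $m)) {
    // Overall match length divided by token length = how often the token occurs.
    $count = intdiv(strlen($m[0]), strlen($m[1]));
    echo $m[1], ' occurs ', $count, ' times', PHP_EOL;
}
```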

Regex match number consisting of specific range, and length?

I'm trying to match a number that may consist of [1-4], with a length of {1,1}.
I've tried multiple variations of the following, which won't work:
/^string\-(\d{1,1})[1-4]$/
Any guidelines? Thanks!
You should just use:
/^string-[1-4]$/
This matches the start of the string, followed by the literal "string-", followed by a single digit from 1 to 4, and then the end of the string. It will match only such a string and nothing else.
If this is part of a larger string and all you want is the one part you can use something like:
/string-[1-4]\b/
which matches pretty much the same as above just as part of a larger string.
You can (in either option) also wrap the character class ([1-4]) in parentheses to get that as a separate part of the matches array (when using preg_match/preg_match_all).
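A short sketch of the anchored version with the character class wrapped in parentheses; the candidate strings are made up for illustration.

```php
<?php
$matched = [];
foreach (['string-3', 'string-5', 'string-12'] as $candidate) {
    // Exactly one digit from 1 to 4 is allowed between "string-" and the end.
    if (preg_match('/^string-([1-4])$/', $candidate, $m)) {
        $matched[$candidate] = $m[1]; // the captured digit
    }
}

print_r($matched);
```

Here string-5 fails because 5 is outside the range, and string-12 fails because of the extra digit.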
This is not hard:
/^string-([1-4]{1})$/

How to match those numbers?

I have an array of numbers, for example:
10001234
10002345
Now I have a number, which should be matched against all of those numbers inside the array. The number could either be 10001234 (which would be easy to match), but it could also be 100001234 (4 zeros instead of 3) or 101234 (one zero instead of 3) for example. Any combination could be possible. The only fixed part is the 1234 at the end.
I can't just take the last 4 characters, because the tail can also be 3, 5, or 6 digits long, as in 1000123456.
What's a good way to match that? Maybe it's easy and I can't see the wood for the trees :D.
Thanks!
If the first digit is always 1, you can use this:
$Num = 1000436346;
echo (int) ltrim((string) $Num, "1");
output:
436346
$number % 10000
Will return the remainder of dividing a number by 10000. Meaning, the last four digits.
The question doesn't make the criteria for the match very clear. However, I'll give it a go.
First, my assumptions:
The number always starts with a 1 followed by an unknown number of 0s.
After that, we have a sequence of digits which could be anything (but presumably not starting with zero?), which you want to extract from the string?
Given the above, we can formulate an expression fairly easily:
$input = '10002345';
if (preg_match('/10+(\d+)/', $input, $matches)) {
    $output = $matches[1];
}
$output now contains the second part of the number -- ie 2345.
If you need to match more than just a leading 1, you can replace it in the expression with \d to match any digit, and add a plus sign after it to allow more than one digit there (although we're still relying on at least one zero between the first part of the number and the second). Making that quantifier lazy (\d+?) stops the prefix from swallowing zeros that belong to the second part, such as the 0 in 10002305.
$input = '10002345';
if (preg_match('/\d+?0+(\d+)/', $input, $matches)) {
    $output = $matches[1]; // 2345
}
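To compare the incoming number against the array, the same idea can be applied to both sides. This is a sketch under the assumptions listed above (a leading 1, at least one zero, and a tail that does not start with zero); tailOf() and findMatches() are hypothetical helper names.

```php
<?php
// Hypothetical helper: return the part after "1" and the run of zeros, or null if the shape is wrong.
function tailOf(string $number): ?string
{
    return preg_match('/^10+([1-9]\d*)$/', $number, $m) === 1 ? $m[1] : null;
}

// Hypothetical helper: list every candidate whose tail equals the tail of $needle.
function findMatches(string $needle, array $candidates): array
{
    $tail = tailOf($needle);
    $found = [];
    if ($tail === null) {
        return $found;
    }
    foreach ($candidates as $candidate) {
        if (tailOf((string) $candidate) === $tail) {
            $found[] = (string) $candidate;
        }
    }
    return $found;
}

$numbers = ['10001234', '10002345'];
print_r(findMatches('100001234', $numbers));
print_r(findMatches('101234', $numbers));
```

Numbers are handled as strings throughout so that the comparison stays a plain string match on the tails.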
