Regex to split string into array of numbers and characters using PHP

I have an arithmetic string that will be similar to the following pattern.
a. 1+2+3
b. 2/1*100
c. 1+2+3/3*100
d. (1*2)/(3*4)*100
Points to note are that
1. the string will never contain spaces.
2. the string will always be a combination of Numbers, Arithmetic symbols (+, -, *, /) and the characters '(' and ')'
I am looking for a regex in PHP to split the characters based on their type and form an array of individual string characters like below.
(Note: I cannot use str_split because multi-digit numbers such as 100 must not be split into single digits.)
a. 1+2+3
output => [
0 => '1'
1 => '+'
2 => '2'
3 => '+'
4 => '3'
]
b. 2/1*100
output => [
0 => '2'
1 => '/'
2 => '1'
3 => '*'
4 => '100'
]
c. 1+2+3/3*100
output => [
0 => '1'
1 => '+'
2 => '2'
3 => '+'
4 => '3'
5 => '/'
6 => '3'
7 => '*'
8 => '100'
]
d. (1*2)/(3*4)*100
output => [
0 => '('
1 => '1'
2 => '*'
3 => '2'
4 => ')'
5 => '/'
6 => '('
7 => '3'
8 => '*'
9 => '4'
10 => ')'
11 => '*'
12 => '100'
]
Thank you very much in advance.

Use this regex :
(?<=[()\/*+-])(?=[0-9()])|(?<=[0-9()])(?=[()\/*+-])
It will match every position between a digit or a parenthesis and an operator or a parenthesis.
(?<=[()\/*+-])(?=[0-9()]) matches the position with a parenthesis or an operator at the left and a digit or parenthesis at the right
(?<=[0-9()])(?=[()\/*+-]) is the same but with left and right reversed.
Demo here
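To see what those zero-width split positions produce, here is a minimal sketch with preg_split(). The variable names are only illustrative, and PREG_SPLIT_NO_EMPTY is added as a precaution rather than because the pattern requires it.

```php
<?php
// Split at every zero-width position between an operator/parenthesis
// and a digit/parenthesis (the pattern from the answer above).
$pattern = '/(?<=[()\/*+-])(?=[0-9()])|(?<=[0-9()])(?=[()\/*+-])/';

$tokens = preg_split($pattern, '(1*2)/(3*4)*100', -1, PREG_SPLIT_NO_EMPTY);

var_export($tokens);
```

Multi-digit numbers such as 100 stay whole because no split position exists between two digits.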

Since you state that the expressions are "clean", no spaces or such, you could split on
\b|(?<=\W)(?=\W)
It splits on all word boundaries and boundaries between non word characters (using positive lookarounds matching a position between two non word characters).
See an illustration here at regex101
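A short sketch of that split in PHP, assuming the input really is clean. Because \b also matches at the very start and end of a string that begins or ends with a digit, PREG_SPLIT_NO_EMPTY is used here to drop the empty pieces those matches would leave behind.

```php
<?php
// Split on word boundaries and on positions between two non-word characters.
$pattern = '/\b|(?<=\W)(?=\W)/';

$a = preg_split($pattern, '1+2+3', -1, PREG_SPLIT_NO_EMPTY);
$d = preg_split($pattern, '(1*2)/(3*4)*100', -1, PREG_SPLIT_NO_EMPTY);

var_export($a);
var_export($d);
```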

As I said, I will help you with that if you can provide some work you did by yourself to solve that problem.
However, if your objective in turning an arithmetic expression into a one-dimensional array is to parse and compute it, then you should build a tree instead and organise it hierarchically, putting the operators at the nodes with the operands as branches:
'(1*2)/(3*4)*100'
Array
(
[operand] => '*',
[left] => Array
(
[operand] => '/',
[left] => Array
(
[operand] => '*',
[left] => 1,
[right] => 2
),
[right] => Array
(
[operand] => '*',
[left] => 3,
[right] => 4
)
),
[right] => 100
)
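One possible way to arrive at such a tree is a small recursive-descent parser over the token list. The sketch below is only an illustration under some assumptions: the function names (tokenize, parseExpression, parseTerm, parseFactor) are hypothetical, only non-negative integers are handled, and the 'operand' key is kept to match the array above even though it holds the operator.

```php
<?php
// Hypothetical helpers for illustration; not part of the original answer.

// Break the expression into integers, operators and parentheses.
function tokenize(string $expression): array
{
    preg_match_all('~\d+|[()*/+-]~', $expression, $matches);
    return $matches[0];
}

// expression := term (('+' | '-') term)*
function parseExpression(array $tokens, int &$pos)
{
    $node = parseTerm($tokens, $pos);
    while (in_array($tokens[$pos] ?? null, ['+', '-'], true)) {
        $operator = $tokens[$pos++];
        $right = parseTerm($tokens, $pos);
        $node = ['operand' => $operator, 'left' => $node, 'right' => $right];
    }
    return $node;
}

// term := factor (('*' | '/') factor)*
function parseTerm(array $tokens, int &$pos)
{
    $node = parseFactor($tokens, $pos);
    while (in_array($tokens[$pos] ?? null, ['*', '/'], true)) {
        $operator = $tokens[$pos++];
        $right = parseFactor($tokens, $pos);
        $node = ['operand' => $operator, 'left' => $node, 'right' => $right];
    }
    return $node;
}

// factor := integer | '(' expression ')'
function parseFactor(array $tokens, int &$pos)
{
    $token = $tokens[$pos++] ?? null;
    if ($token === '(') {
        $node = parseExpression($tokens, $pos);
        if (($tokens[$pos++] ?? null) !== ')') {
            throw new InvalidArgumentException('Expected ")"');
        }
        return $node;
    }
    if ($token === null || !ctype_digit($token)) {
        throw new InvalidArgumentException('Expected a number or "("');
    }
    return (int) $token;
}

$pos = 0;
$tree = parseExpression(tokenize('(1*2)/(3*4)*100'), $pos);
var_export($tree);
```

Splitting the grammar into expression, term and factor is what gives * and / a higher precedence than + and -, and the loops make operators of equal precedence left-associative.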

There is no need to use regex for this. You just loop through the string and build the array as you want.
Edit: I just realized it can be done much faster with a while loop instead of two for loops and an if().
$str ="(10*2)/(3*40)*100";
$str = str_split($str); // make str an array
$arr = array();
$j=0; // counter for new array
for($i=0;$i<count($str);$i++){
if(is_numeric($str[$i])){ // if the item is a number
$arr[$j] = $str[$i]; // add it to new array
$k = $i+1;
while($k < count($str) && is_numeric($str[$k])){ // check bounds first, then keep appending while it's still a digit.
$arr[$j] .= $str[$k];
$k++; // add one to counter.
}
$j++; // we are done with this item, add one to counter.
$i=$k-1; // set new value to $i
}else{
// not number, add it to the new array and add one to array counter.
$arr[$j] = $str[$i];
$j++;
}
}
var_dump($arr);
https://3v4l.org/p9jZp

You can also use this matching regex: [()+\-*\/]|\d+
Demo
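With a matching pattern the tokens come from preg_match_all() rather than preg_split(). A minimal sketch, with variable names chosen for illustration:

```php
<?php
// Match either a single operator/parenthesis or a run of digits.
preg_match_all('/[()+\-*\/]|\d+/', '1+2+3/3*100', $matches);

// $matches[0] holds the full matches in order of appearance.
$tokens = $matches[0];

var_export($tokens);
```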

I was doing something similar to this for a php calculator demo. A related post.
Consider this pattern for preg_split():
~-?\d+|[()*/+-]~ (Pattern Demo)
This has the added benefit of allowing negative numbers without confusing them for operators. The first "alternative" matches positive or negative integers, while the second "alternative" (after the |) matches parentheses and operators, one at a time.
In the php implementation, I place the entire pattern in a capture group and retain the delimiters. This way no substrings are left behind. ~ is used as the pattern delimiter so that the slash in the pattern doesn't need to be escaped.
Code: (Demo)
$expression = '(1*2)/(3*4)*100+-10';
var_export(
preg_split(
'~(-?\d+|[()*/+-])~',
$expression,
0,
PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
)
);
Output:
array (
0 => '(',
1 => '1',
2 => '*',
3 => '2',
4 => ')',
5 => '/',
6 => '(',
7 => '3',
8 => '*',
9 => '4',
10 => ')',
11 => '*',
12 => '100',
13 => '+',
14 => '-10',
)
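One case worth checking before relying on this pattern is a binary minus written directly after a number, as in 5-3: the -?\d+ alternative is tried first and consumes the minus as a sign. The sketch below shows that behaviour next to a possible workaround, a negative lookbehind that only allows the sign when the previous character is not a digit or a closing parenthesis. The lookbehind variant is my own assumption, not part of the pattern above.

```php
<?php
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;

// Pattern as given above: the minus in "5-3" is read as a sign.
$original = preg_split('~(-?\d+|[()*/+-])~', '5-3', 0, $flags);

// Assumed variant: a leading minus belongs to the number only when it does
// not directly follow a digit or ")".
$guardedPattern = '~((?<![\d)])-?\d+|[()*/+-])~';
$guarded = preg_split($guardedPattern, '5-3', 0, $flags);

// Negative numbers after an operator are still recognised.
$negative = preg_split($guardedPattern, '(1*2)/(3*4)*100+-10', 0, $flags);

var_export($original);
var_export($guarded);
var_export($negative);
```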

Related

How do I split the letters and numbers to 2 arrays from a string in PHP

I would like to know how to split both letters and numbers in 2 separate arrays, for example if I have $string = "2w5d15h9s";, then I want it to become
$letters = ["w", "d", "h", "s"];
$numbers = [2, 5, 15, 9];
Anybody got any idea of how to do it?
I'm basically trying to make a ban command and I want to make it so you can specify a time for the ban to expire.
Use preg_split:
$string = "2w5d15h9s";
$letters = preg_split("/\d+/", $string);
array_shift($letters);
print_r($letters);
$numbers = preg_split("/[a-z]+/", $string);
array_pop($numbers);
print_r($numbers);
This prints:
Array
(
[0] => w
[1] => d
[2] => h
[3] => s
)
Array
(
[0] => 2
[1] => 5
[2] => 15
[3] => 9
)
Note that I am using array_shift or array_pop above to remove empty array elements which arise from the regex split. These empty entries occur because, for example, when splitting on digits the first character is a digit, which leaves behind an empty array element to the left of the first actual letter.
Using preg_split() with a PREG_SPLIT_NO_EMPTY flag is the most concise way that I can think of.
In terms of efficiency, preg_ functions are not famous for being fast. Scanning the string should arguably be done in one pass, but these micro-optimization considerations are probably not worth toiling over for such small input strings.
As for the patterns, \d+ means one or more consecutive digit characters and \D+ means one or more consecutive non-digit characters. The 0 is the limit parameter, which merely informs the function to have no limit (split the string as many times as it can).
Code: (Demo)
$string = "2w5d15h9s";
var_export(preg_split('/\d+/', $string, 0, PREG_SPLIT_NO_EMPTY)); // get non-numbers
var_export(preg_split('/\D+/', $string, 0, PREG_SPLIT_NO_EMPTY)); // get numbers
Output:
array (
0 => 'w',
1 => 'd',
2 => 'h',
3 => 's',
)
array (
0 => '2',
1 => '5',
2 => '15',
3 => '9',
)
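Since the goal is a ban duration, it may be simpler to keep each number paired with its unit instead of building two separate arrays. The following is a sketch only: the $secondsPerUnit table and its letters are assumptions for illustration, so adjust them to whatever the command actually accepts.

```php
<?php
// Hypothetical unit table (assumption): seconds per unit letter.
$secondsPerUnit = ['w' => 604800, 'd' => 86400, 'h' => 3600, 'm' => 60, 's' => 1];

$string = '2w5d15h9s';

// One pass: each match is [full match, amount, unit letter].
preg_match_all('/(\d+)([a-z])/', $string, $pairs, PREG_SET_ORDER);

$total = 0;
foreach ($pairs as $pair) {
    $amount = (int) $pair[1];
    $unit = $pair[2];
    if (!isset($secondsPerUnit[$unit])) {
        throw new InvalidArgumentException("Unknown unit: $unit");
    }
    $total += $amount * $secondsPerUnit[$unit];
}

echo $total; // 1695609 seconds for "2w5d15h9s"
```

Adding $total to time() would then give an expiry timestamp.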

preg_split regex - need to split user input around mathematical operators

I need to split a given user string into an array based around mathematical operators. The symbols I need the string splitting around are:
+
-
/
*
()
However I would like to expand on the regex to include other operators I will be adding into my program.
The regex I have so far is this:
"((\(|\d+.+|-|\*|\/\d+\|))"
which, when run through regex101.com, matches a given input string of:
(30*30)/(9+8) with '30*30)/(9+8)
I would like the output to be similar to this:
[0] =
[1] = (
[2] = 30
[3] = *
[4] = 30
[5] = )
or:
[0] =
[1] = 4
[2] = *
[3] = 4
depending on whether brackets are present in the user string or not.
I forgot to include current results of the current regex string:
using http://www.phpliveregex.com/ to test preg-split with an input string of:
(30*30)+(9*8)
the result:
array(3
0 =>
1 =>
2 =>
)
Is this the pattern you are looking for?
preg_match_all("/(\(|-\d+|\d+|-|\+|\/|\*|\))/", $input, $output);
https://regex101.com/r/acKW27/3
Preg_match_all: http://www.phpliveregex.com/p/l7L
I forgot / in the regex. Links updated also.
preg_split() retains the delimiters by using the PREG_SPLIT_DELIM_CAPTURE flag. Include the additional flag PREG_SPLIT_NO_EMPTY to eliminate any empty elements. Here is an improved answer that will handle your sample input data, as well as floats and negative numbers.
Code: (Demo)
$expression = '-1*(2/(3+4)--10*-110.5/0.009+-.1)';
var_export(
preg_split(
'~(-?\d*(?:\.\d+)?|[()*/+-])~',
$expression,
0,
PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
)
);
Output:
array (
0 => '-1',
1 => '*',
2 => '(',
3 => '2',
4 => '/',
5 => '(',
6 => '3',
7 => '+',
8 => '4',
9 => ')',
10 => '-',
11 => '-10',
12 => '*',
13 => '-110.5',
14 => '/',
15 => '0.009',
16 => '+',
17 => '-.1',
18 => ')',
)
*Note, my above pattern makes digits before the decimal optional. If you know that your floats will always have a number before the dot, then you can use this pattern:
~(-?\d+(?:\.\d+)?|[()*/+-])~
The advantages are no empty matches and improved pattern efficiency. Keep PREG_SPLIT_NO_EMPTY, though: adjacent tokens still leave empty strings between the captured delimiters.
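A quick check of the stricter pattern, run with and without PREG_SPLIT_NO_EMPTY, can be sketched as follows. The sample expression and variable names are made up for illustration.

```php
<?php
$pattern = '~(-?\d+(?:\.\d+)?|[()*/+-])~';
$expression = '2.5*(3+-4)/0.5';

// With the flag: only the tokens themselves are returned.
$withFlag = preg_split($pattern, $expression, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

// Without the flag: the empty strings between adjacent tokens are kept too.
$withoutFlag = preg_split($pattern, $expression, 0, PREG_SPLIT_DELIM_CAPTURE);

var_export($withFlag);
var_export(in_array('', $withoutFlag, true));
```

Because every character of the input belongs to some token, the pieces between delimiters are all empty strings, which is why the flag still matters even when the pattern itself never matches an empty string.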

Matching text that is not within the curly brackets, while also capturing the brackets after

My situation requires recursion, and I'm able to match what's in the curly brackets already the way I need it, but I'm unable to capture the surrounding text.
So this would be the example text:
This is foo {{foo}} and {{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}} more_text {{foo
And I need my result to look like this:
0 => This is foo
1 => {{foo}}
2 => and
3 => {{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}}
4 => more_text {{foo
With this: (\{\{([^{{}}]|(?R))*\}\}) I have been able to match {{foo}} and {{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}} very nicely, but not the surrounding text to achieve the result that I need.
I have tried many things, but without success.
You may use the following solution based on the preg_split and PREG_SPLIT_DELIM_CAPTURE flag:
$re = '/({{(?:[^{}]++|(?R))*}})/';
$str = 'This is foo {{foo}} and {{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}} more_text {{foo';
$res = preg_split($re, $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($res);
// => Array
(
[0] => This is foo
[1] => {{foo}}
[2] => and
[3] => {{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}}
[4] => more_text {{foo
)
See the PHP demo.
The whole pattern is captured with the outer capturing group, that is why when adding PREG_SPLIT_DELIM_CAPTURE this text (that is split upon) is added to the output array.
If there are unwanted empty elements, PREG_SPLIT_NO_EMPTY flag will discard them.
More details:
Pattern: I removed unnecessary escapes and symbols from your pattern. You do not have to escape { in a PHP regex when the context is enough for the regex engine to deduce its meaning, and you do not need to escape } in any context. Note that [{}] is the same as [{{}}]: both will match a single char that is either a { or }, no matter how many { and } you put into the character class. I also enhanced its performance by using a possessive quantifier ++ instead of a greedy +.
Details:
( - Group 1 start:
{{ - 2 consecutive {s
(?:[^{}]++|(?R))* - 0 or more sequences of:
[^{}]++ - 1 or more symbols other than { and } (no backtracking into this pattern is allowed)
| - or
(?R) - try matching the whole pattern
}} - a }} substring
) - Group 1 end.
PHP part:
When tokenizing a string using just one token type, it is easy to use a splitting approach. Since preg_split in PHP can split on a regex while keeping the text that is matched, it is ideal for this kind of task.
The only trouble is that empty entries might crawl into the resulting array if the matches appear to be consecutive or at the start/end of the string. Thus, PREG_SPLIT_NO_EMPTY is good to use here.
I would use a pattern like this
$patt = '/(?P<open>\{\{)|(?P<body>[-0-9a-zA-Z._]+)|(?P<whitespace>\s+)|(?P<operators>and|or|==)|(?P<close>\}\})/';
preg_match_all( $patt, $text, $matches );
The output is far too long to show in full, but you can loop over it and then match items up; basically it's tokenizing the string.
It looks like this:
array (
0 =>
array (
0 => '{{',
1 => 'bar.function',
2 => '{{',
3 => 'demo.funtion',
4 => '{{',
5 => 'inner',
6 => '}}',
7 => ' ',
8 => '==',
9 => ' ',
10 => 'demo',
11 => '}}',
12 => ' ',
13 => 'and',
14 => ' ',
15 => '{{',
16 => 'bar',
17 => '}}',
18 => ' ',
19 => 'or',
20 => ' ',
21 => 'foo',
22 => '}}',
),
'open' =>
array (
0 => '{{',
1 => '',
2 => '{{',
3 => '',
4 => '{{',
5 => '',
6 => '',
7 => '',
8 => '',
9 => '',
10 => '',
11 => '',
12 => '',
13 => '',
14 => '',
15 => '{{',
16 => '',
17 => '',
18 => '',
19 => '',
20 => '',
21 => '',
22 => '',
),
'body' =>
array (
0 => '',
1 => 'bar.function',
2 => '',
3 => 'demo.funtion',
4 => '',
5 => 'inner',
6 => '',
....
)
)
Then in a loop you can tell that match [0][0] is an open tag, match [0][1] is a body, match [0][2] is another open tag, etc., and by keeping track of open and close tags you can work out the nesting. It will tell you what is an open match, a body match, a close match, an operator match, etc.
Every thing you need, I don't have time for a full workup on a solution...
A quick example would be: an open followed by a body followed by a close is a variable, and an open followed by a body and another open is a function.
You can also add additional patterns by inserting like this (?P<function>function\.) with the pipe in there like '/(?P<open>\{\{)|(?P<function>function\.)|... . Then you could pick up keywords like function foreach block etc... what have you.
I've written full fledged template systems with this method. In my template system I build the RegX in an array like this
[ 'open' => '\{\{', 'function' => 'function\.', .... ]
And then compress it to the actual regx, makes life easy...
$r = [];
foreach( $patt_array as $key=>$value ){
$r[] = '(?P<'.$key.'>'.$value.')';
}
$patt = '/'.implode('|', $r ).'/';
Etc...
If you follow.
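To make the idea concrete, here is a rough sketch that builds the pattern from an array and labels each match with its token type. Everything in it is an assumption for illustration: the $pattArray entries, the tokenizeTemplate() helper, and in particular the choice to list the operator pattern before the body pattern (with word boundaries) so that words like "and" are not swallowed by the body character class.

```php
<?php
// Hypothetical token definitions; order matters because the first
// alternative that matches at a position wins.
$pattArray = [
    'open'       => '\{\{',
    'close'      => '\}\}',
    'operator'   => '\b(?:and|or)\b|==',
    'body'       => '[-0-9a-zA-Z._]+',
    'whitespace' => '\s+',
];

// Compress the array into one alternation of named groups.
$parts = [];
foreach ($pattArray as $name => $sub) {
    $parts[] = '(?P<' . $name . '>' . $sub . ')';
}
$patt = '/' . implode('|', $parts) . '/';

// Hypothetical helper: return [type, text] pairs in order of appearance.
function tokenizeTemplate(string $patt, array $names, string $text): array
{
    preg_match_all($patt, $text, $matches, PREG_SET_ORDER);
    $tokens = [];
    foreach ($matches as $match) {
        foreach ($names as $name) {
            // Exactly one named group is non-empty for each match.
            if (isset($match[$name]) && $match[$name] !== '') {
                $tokens[] = [$name, $match[$name]];
                break;
            }
        }
    }
    return $tokens;
}

$tokens = tokenizeTemplate($patt, array_keys($pattArray), '{{foo}} and {{bar}}');
var_export($tokens);
```

From a flat list like this, nesting can be tracked with a simple depth counter that goes up on every open token and down on every close token.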

Capture all occurrences of repeated formatted substrings

I've a string that follows this pattern [:it]Stringa in italiano[:en]String in english.
I'm trying to use preg_match_all() to capture the locales and the associated strings, ie:
[1] => 'it',
[2] => 'en',
...
[1] => 'Stringa in italiano',
[2] => 'String in english'
The regex that I'm using "/\[:(\w+)](.+?)(?=\[:\w+])/" (https://regex101.com/r/eZ1gT7/400) returns only the first group of data. What am I doing wrong?
The final formatted segment will not satisfy your lookahead. You will need to include the option of matching the position of the end of the string with an alternation. A pipe (|) means "or". A dollar symbol ($) means "end of string".
I am using negated character classes to match between literal square braces. If your \w is sufficient for your project, feel free to keep that portion as you originally posted.
Code: (Demo)
$string = '[:it]Stringa in italiano[:en]String in english';
preg_match_all('~\[:([^]]+)](.+?)(?=$|\[:[^]]+])~', $string, $m);
var_export($m);
Output:
array (
0 =>
array (
0 => '[:it]Stringa in italiano',
1 => '[:en]String in english',
),
1 =>
array (
0 => 'it',
1 => 'en',
),
2 =>
array (
0 => 'Stringa in italiano',
1 => 'String in english',
),
)
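If a locale => string map is more convenient than the parallel arrays, the two capture groups can be paired up with array_combine(). A small sketch, with $translations as an illustrative name:

```php
<?php
$string = '[:it]Stringa in italiano[:en]String in english';

preg_match_all('~\[:([^]]+)](.+?)(?=$|\[:[^]]+])~', $string, $m);

// Group 1 holds the locales, group 2 the matching strings.
$translations = array_combine($m[1], $m[2]);

var_export($translations);
```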

Regexp to parse a string delimited by number and a colon

I'm trying to parse a string like this
1:Tous les 6 mois2:Every 6 months4:Tutti i 6 mesi3:Cada 6 meses
Into an array like this
array (
0 =>
array (
0 => '1:Tous les 6 mois',
1 => '1',
2 => 'Tous les 6 mois',
),
1 =>
array (
0 => '2:Every 6 months',
1 => '2',
2 => 'Every 6 months',
),
2 =>
array (
0 => '4:Tutti i 6 mesi',
1 => '4',
2 => 'Tutti i 6 mesi',
),
3 =>
array (
0 => '3:Cada 6 meses',
1 => '3',
2 => 'Cada 6 meses',
),
)
I tried this
preg_match_all('/(\d+):([^\b(\d:)]+)/', $string, $matches, PREG_SET_ORDER);
But it stops the capture at the first digit. Parentheses are interpreted as literal characters inside the character class.
Another option would be to use
preg_split('/(\d):/', $string, -1, PREG_SPLIT_DELIM_CAPTURE)
But I'm genuinely interested in a preg_match_all solution
You can use positive look ahead like this
preg_match_all('/(\d+):(.*?)(?=\d+:|$)/', $str, $matches, PREG_SET_ORDER);
The lookahead (?=\d+:|$) makes the lazy (.*?) stop at the first position that is followed either by one or more digits and a colon or by the end of the string.
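Run against the sample string, this can be sketched as below. The extra array_column() step, which turns the match sets into an id => text map, is an optional addition of mine and not something the question asked for.

```php
<?php
$str = '1:Tous les 6 mois2:Every 6 months4:Tutti i 6 mesi3:Cada 6 meses';

// Each set is [full match, id, text].
preg_match_all('/(\d+):(.*?)(?=\d+:|$)/', $str, $matches, PREG_SET_ORDER);

// Optional: index the texts (column 2) by their id (column 1).
$byId = array_column($matches, 2, 1);

var_export($matches[0]);
var_export($byId);
```

The digit in "6 mois" does not end the capture because it is not followed by a colon.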
You can use a lookahead based regular expression:
preg_match_all('/(\d+):((?:(?!\d+:).)*)/', $str, $matches, PREG_SET_ORDER);
Note: You can't place word boundaries \b inside of a character class.
eval.in
