I need to split a given user string into an array based around mathematical operators. The symbols I need the string splitting around are:
+
-
/
*
()
However I would like to expand on the regex to include other operators I will be adding into my program.
The regex I have so far is this:
"((\(|\d+.+|-|\*|\/\d+\|))"
which, when run through regex101.com against this input string:
(30*30)/(9+8) matches '30*30)/(9+8)'
I would like the output to be similar to this:
[0] =
[1] = (
[2] = 30
[3] = *
[4] = 30
[5] = )
or:
[0] =
[1] = 4
[2] = *
[3] = 4
depending on whether brackets are present in the user string or not.
I forgot to include the results of the current regex string.
Using http://www.phpliveregex.com/ to test preg_split with an input string of:
(30*30)+(9*8)
the result:
array(3
0 =>
1 =>
2 =>
)
Is this the pattern you are looking for?
preg_match_all("/(\(|-\d+|\d+|-|\+|\/|\*|\))/", $input, $output);
https://regex101.com/r/acKW27/3
Preg_match_all: http://www.phpliveregex.com/p/l7L
I forgot / in the regex. Links updated also.
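Since the question asks for a pattern that is easy to extend with further operators, here is a minimal sketch of one way to do that. The $operators list, its extra ^ and % entries, and the sample input are illustrative assumptions, and the approach only covers single-character operators:

```php
<?php
// Hypothetical operator list; append new single-character operators here.
$operators = ['+', '-', '*', '/', '^', '%'];

// preg_quote() escapes regex metacharacters (including "-" and "^"),
// so each operator is safe to place inside a character class.
$class = implode('', array_map(
    function ($op) {
        return preg_quote($op, '~');
    },
    $operators
));

// Match either a run of digits or one operator/parenthesis at a time.
$pattern = '~\d+|[' . $class . '()]~';

preg_match_all($pattern, '(30*30)/(9+8)^2%7', $matches);
print_r($matches[0]);
```

Multi-character operators (for example **) would not fit in a character class; they would need to be added as separate alternatives before the class.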
preg_split() retains the delimiters by using the PREG_SPLIT_DELIM_CAPTURE flag. Include the additional flag PREG_SPLIT_NO_EMPTY to eliminate any empty elements. Here is an improved answer that will handle your sample input data, as well as floats and negative numbers.
Code: (Demo)
$expression = '-1*(2/(3+4)--10*-110.5/0.009+-.1)';
var_export(
preg_split(
'~(-?\d*(?:\.\d+)?|[()*/+-])~',
$expression,
0,
PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
)
);
Output:
array (
0 => '-1',
1 => '*',
2 => '(',
3 => '2',
4 => '/',
5 => '(',
6 => '3',
7 => '+',
8 => '4',
9 => ')',
10 => '-',
11 => '-10',
12 => '*',
13 => '-110.5',
14 => '/',
15 => '0.009',
16 => '+',
17 => '-.1',
18 => ')',
)
*Note, my above pattern makes digits before the decimal optional. If you know that your floats will always have a number before the dot, then you can use this pattern:
~(-?\d+(?:\.\d+)?|[()*/+-])~
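The advantages are that the pattern can no longer produce empty matches and is more efficient. Note that with preg_split() the PREG_SPLIT_NO_EMPTY flag is still required, because the empty substrings between adjacent tokens are returned as elements; the flag becomes unnecessary only if you switch to preg_match_all().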
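As a minimal sketch, the stricter pattern can also be used with preg_match_all(), which returns only the tokens and so needs no empty-element handling. The sample expression is an assumption, adapted from the one above so that every float has a digit before the dot:

```php
<?php
$expression = '-1*(2/(3+4)--10*-110.5/0.009+-0.1)';

// No capture group is needed here: the full matches are the tokens.
preg_match_all('~-?\d+(?:\.\d+)?|[()*/+-]~', $expression, $matches);
$tokens = $matches[0];

var_export($tokens);
```

One caveat with any pattern that starts with -?: a minus sign that directly follows a number, as in 3-4, is absorbed into the next number, giving '3' and '-4' rather than '3', '-', '4'.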
Related
I would like to know how to split both letters and numbers in 2 separate arrays, for example if I have $string = "2w5d15h9s";, then I want it to become
$letters = ["w", "d", "h", "s"];
$numbers = [2, 5, 15, 9];
Anybody got any idea of how to do it?
I'm basically trying to make a ban command and I want to make it so you can specify a time for the ban to expire.
Use preg_split:
$string = "2w5d15h9s";
$letters = preg_split("/\d+/", $string);
array_shift($letters);
print_r($letters);
$numbers = preg_split("/[a-z]+/", $string);
array_pop($numbers);
print_r($numbers);
This prints:
Array
(
[0] => w
[1] => d
[2] => h
[3] => s
)
Array
(
[0] => 2
[1] => 5
[2] => 15
[3] => 9
)
Note that I am using array_shift or array_pop above to remove empty array elements which arise from the regex split. These empty entries occur because, for example, when splitting on digits the first character is a digit, which leaves behind an empty array element to the left of the first actual letter.
Using preg_split() with a PREG_SPLIT_NO_EMPTY flag is the most concise way that I can think of.
In terms of efficiency, preg_ functions are not famous for being fast. Scanning the string should arguably be done in one pass, but these micro-optimization considerations are probably not worth toiling over for such small input strings.
As for the patterns, \d+ means one or more consecutive digit characters and \D+ means one or more consecutive non-digit characters. The 0 is the limit parameter, which merely informs the function to have no limit (split the string as many times as it can).
Code: (Demo)
$string = "2w5d15h9s";
var_export(preg_split('/\d+/', $string, 0, PREG_SPLIT_NO_EMPTY)); // get non-numbers
var_export(preg_split('/\D+/', $string, 0, PREG_SPLIT_NO_EMPTY)); // get numbers
Output:
array (
0 => 'w',
1 => 'd',
2 => 'h',
3 => 's',
)
array (
0 => '2',
1 => '5',
2 => '15',
3 => '9',
)
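If a single pass over the string is preferred, a sketch along these lines collects both arrays with one preg_match_all() call and then totals a ban duration. The meaning of each unit letter (w = week, d = day, h = hour, m = minute, s = second) and the names $unitSeconds and $banSeconds are assumptions, as the question does not define them:

```php
<?php
$string = "2w5d15h9s";

// Capture each number together with the unit letter that follows it.
preg_match_all('/(\d+)([a-z])/', $string, $matches);

$numbers = array_map('intval', $matches[1]); // [2, 5, 15, 9]
$letters = $matches[2];                      // ['w', 'd', 'h', 's']

// Assumed meaning of each unit, in seconds (not stated in the question).
$unitSeconds = ['w' => 604800, 'd' => 86400, 'h' => 3600, 'm' => 60, 's' => 1];

$banSeconds = 0;
foreach ($letters as $i => $unit) {
    // Unknown unit letters contribute nothing in this sketch.
    $banSeconds += $numbers[$i] * ($unitSeconds[$unit] ?? 0);
}

echo $banSeconds, PHP_EOL; // 1695609
```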
I have a database full of strings that I'd like to split into an array. Each string contains a list of directions that begin with a letter (U, D, L, R for Up, Down, Left, Right) and a number to tell how far to go in that direction.
Here is an example of one string.
$string = "U29R45U2L5D2L16";
My desired result:
['U29', 'R45', 'U2', 'L5', 'D2', 'L16']
I thought I could just loop through the string, but I don't know how to tell if the number is one or more spaces in length.
You can use preg_split to break up the string, splitting on something which looks like a U,L,D or R followed by numbers and using the PREG_SPLIT_DELIM_CAPTURE to keep the split text:
$string = "U29R45U2L5D2L16";
print_r(preg_split('/([UDLR]\d+)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY));
Output:
Array (
[0] => U29
[1] => R45
[2] => U2
[3] => L5
[4] => D2
[5] => L16
)
Demo on 3v4l.org
A regular expression should help you:
<?php
$string = "U29R45U2L5D2L16";
preg_match_all("/[A-Z]\d+/", $string, $matches);
var_dump($matches);
Because this task is about text extraction and not about text validation, you can merely split on the zero-width position after one or more digits. In other words, match one or more digits, then forget them with \K so that they are not consumed while splitting.
Code: (Demo)
$string = "U29R45U2L5D2L16";
var_export(
preg_split(
'/\d+\K/',
$string,
0,
PREG_SPLIT_NO_EMPTY
)
);
Output:
array (
0 => 'U29',
1 => 'R45',
2 => 'U2',
3 => 'L5',
4 => 'D2',
5 => 'L16',
)
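If each step later needs to be separated into its direction and its distance, a small variation with two capture groups can do both jobs at once. This is only a sketch; the key names direction and distance are illustrative:

```php
<?php
$string = "U29R45U2L5D2L16";

// PREG_SET_ORDER groups the captures per match instead of per group.
preg_match_all('/([UDLR])(\d+)/', $string, $matches, PREG_SET_ORDER);

$steps = [];
foreach ($matches as $match) {
    $steps[] = [
        'direction' => $match[1],
        'distance'  => (int) $match[2],
    ];
}

print_r($steps);
```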
$line = "Type:Bid, End Time: 12/20/2018 08:10 AM (PST), Price: $8,000,Bids: 14, Age: 0, Description: , Views: 120270, Valuation: $10,75, IsTrue: false";
I need to get this array:
Array ( [0] => Bid [1] => 12/20/2018 08:10 AM (PST) [2] => $8,000 [3] => 14 [4] => 0 [5] => [6] => 120270 [7] => $10,75 [8] => false )
I agree with Andreas about using preg_match_all(), but not with his pattern.
For stability, I recommend consuming the entire string from the beginning.
1. Match the label and its trailing colon. [^:]+:
2. Match zero or more spaces. \s*
3. Forget what you matched so far. \K
4. Lazily match zero or more characters (giving back when possible -- make minimal match). .*?
5. "Look ahead" and demand that the matched characters from #4 are immediately followed by a comma, then one or more non-comma, non-colon characters (the next label), then a colon ,[^,:]+: OR the end of the string $.
Code: (Demo)
$line = "Type:Bid, End Time: 12/20/2018 08:10 AM (PST), Price: $8,000,Bids: 14, Age: 0, Description: , Views: 120270, Valuation: $10,75, IsTrue: false";
var_export(
preg_match_all(
'/[^:]+:\s*\K.*?(?=\s*(?:$|,[^,:]+:))/',
$line,
$out
)
? $out[0] // isolate fullstring matches
: [] // no matches
);
Output:
array (
0 => 'Bid',
1 => '12/20/2018 08:10 AM (PST)',
2 => '$8,000',
3 => '14',
4 => '0',
5 => '',
6 => '120270',
7 => '$10,75',
8 => 'false',
)
New answer according to new request:
I use the same regex for splitting the string, and then I remove everything up to and including the colon from each part:
$line = "Type:Bid, End Time: 12/20/2018 08:10 AM (PST), Price: $8,000,Bids: 14, Age: 0, Description: , Views: 120270, Valuation: $10,75, IsTrue: false";
$parts = preg_split("/(?<!\d),|,(?!\d)/", $line);
$result = array();
foreach($parts as $elem) {
$result[] = preg_replace('/^[^:]+:\h*/', '', $elem);
}
print_r ($result);
Output:
Array
(
[0] => Bid
[1] => 12/20/2018 08:10 AM (PST)
[2] => $8,000
[3] => 14
[4] => 0
[5] =>
[6] => 120270
[7] => $10,75
[8] => false
)
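If the labels are worth keeping, the same split can feed an associative array instead. A minimal sketch, assuming every part contains at least one colon (the string is single-quoted here so that the dollar signs are clearly literal):

```php
<?php
$line = 'Type:Bid, End Time: 12/20/2018 08:10 AM (PST), Price: $8,000,Bids: 14, Age: 0, Description: , Views: 120270, Valuation: $10,75, IsTrue: false';

$result = [];
foreach (preg_split('/(?<!\d),|,(?!\d)/', $line) as $part) {
    // A limit of 2 keeps colons inside the value (e.g. "08:10") intact.
    [$label, $value] = explode(':', $part, 2);
    $result[trim($label)] = trim($value);
}

print_r($result);
```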
I'd use preg_match instead.
Here the pattern looks for digit(s) comma digit(s) or just digit(s) or a word and a comma.
I append a comma to the string to make the regex simpler.
$line = "TRUE,59,m,10,500";
preg_match_all("/(\d+,\d+|\d+|\w+),/", $line . ",", $match);
var_dump($match);
https://3v4l.org/HQMgu
Even with a different order of the items this code will still produce a correct output: https://3v4l.org/SRJOf
A much better idea:
$parts=explode(',',$line,4); //explode has a limit you can use in this case 4
Same result, less code.
I would keep it simple and do this
$line = "TRUE,59,m,10,500";
$parts = preg_split("/,/", $line);
//print_r ($parts);
$parts[3]=$parts[3].','.$parts[4]; //create a new part 3 from 3 and 4
//$parts[3].=','.$parts[4]; //alternative syntax to the above
unset($parts[4]);//remove old part 4
print_r ($parts);
I would also just use explode(), rather than a regular expression.
I have an arithmetic string that will be similar to the following pattern.
a. 1+2+3
b. 2/1*100
c. 1+2+3/3*100
d. (1*2)/(3*4)*100
Points to note are that
1. the string will never contain spaces.
2. the string will always be a combination of Numbers, Arithmetic symbols (+, -, *, /) and the characters '(' and ')'
I am looking for a regex in PHP to split the characters based on their type and form an array of individual string characters like below.
(Note: I cannot use str_split because I do not want numbers with more than one digit to be split.)
a. 1+2+3
output => [
0 => '1'
1 => '+'
2 => '2'
3 => '+'
4 => '3'
]
b. 2/1*100
output => [
0 => '2'
1 => '/'
2 => '1'
3 => '*'
4 => '100'
]
c. 1+2+3/3*100
output => [
0 => '1'
1 => '+'
2 => '2'
3 => '+'
4 => '3'
5 => '/'
6 => '3'
7 => '*'
8 => '100'
]
d. (1*2)/(3*4)*100
output => [
0 => '('
1 => '1'
2 => '*'
3 => '2'
4 => ')'
5 => '/'
6 => '('
7 => '3'
8 => '*'
9 => '4'
10 => ')'
11 => '*'
12 => '100'
]
Thank you very much in advance.
Use this regex :
(?<=[()\/*+-])(?=[0-9()])|(?<=[0-9()])(?=[()\/*+-])
It will match every position between a digit or a parenthesis and an operator or a parenthesis.
(?<=[()\/*+-])(?=[0-9()]) matches the position with a parenthesis or an operator at the left and a digit or parenthesis at the right
(?<=[0-9()])(?=[()\/*+-]) is the same but with left and right reversed.
Demo here
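A minimal sketch of that regex in use with preg_split(). The ~ delimiter and the PREG_SPLIT_NO_EMPTY flag are my additions (the flag is only a safeguard), not part of the pattern itself:

```php
<?php
$pattern = '~(?<=[()\/*+-])(?=[0-9()])|(?<=[0-9()])(?=[()\/*+-])~';

// The pattern matches positions only, so no characters are consumed by the split.
$tokens = preg_split($pattern, '(1*2)/(3*4)*100', -1, PREG_SPLIT_NO_EMPTY);

var_export($tokens);
```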
Since you state that the expressions are "clean", no spaces or such, you could split on
\b|(?<=\W)(?=\W)
It splits on all word boundaries and boundaries between non word characters (using positive lookarounds matching a position between two non word characters).
See an illustration here at regex101
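A short sketch of this split in PHP. Because \b also matches at the very start or end of a string that begins or ends with a digit, PREG_SPLIT_NO_EMPTY is used here to drop the empty elements this produces; the flag choice is mine, not part of the pattern:

```php
<?php
$pattern = '/\b|(?<=\W)(?=\W)/';

$a = preg_split($pattern, '1+2+3', -1, PREG_SPLIT_NO_EMPTY);
$d = preg_split($pattern, '(1*2)/(3*4)*100', -1, PREG_SPLIT_NO_EMPTY);

var_export($a);
var_export($d);
```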
As I said, I will help you with that if you can provide some work you did by yourself to solve that problem.
However, if your objective in crafting a one-dimensional array out of an arithmetic expression is to parse and compute that array, then you should build a tree instead and make it hierarchical by putting the operators as nodes, the branches being the operands:
'(1*2)/(3*4)*100'
Array
(
[operator] => '*',
[left] => Array
(
[operator] => '/',
[left] => Array
(
[operator] => '*',
[left] => 1,
[right] => 2
),
[right] => Array
(
[operator] => '*',
[left] => 3,
[right] => 4
)
),
[right] => 100
)
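One possible way to build such a tree is precedence climbing over the token list. The sketch below is an illustration under stated assumptions, not the answerer's code: it handles only non-negative integers, the four basic operators and well-formed parentheses, and the names tokenize, parseExpression, parsePrimary and PRECEDENCE are hypothetical. Each operator node stores its symbol under a key named operator:

```php
<?php
// Assumed precedence table: higher numbers bind more tightly.
const PRECEDENCE = ['+' => 1, '-' => 1, '*' => 2, '/' => 2];

// Split the expression into integers, operators and parentheses.
function tokenize(string $expr): array
{
    preg_match_all('~\d+|[()*/+-]~', $expr, $m);
    return $m[0];
}

// Parse operators whose precedence is at least $minPrec.
function parseExpression(array $tokens, int &$pos, int $minPrec = 1)
{
    $left = parsePrimary($tokens, $pos);
    while ($pos < count($tokens)
        && isset(PRECEDENCE[$tokens[$pos]])
        && PRECEDENCE[$tokens[$pos]] >= $minPrec) {
        $op = $tokens[$pos++];
        // "+ 1" makes operators of equal precedence left-associative.
        $right = parseExpression($tokens, $pos, PRECEDENCE[$op] + 1);
        $left = ['operator' => $op, 'left' => $left, 'right' => $right];
    }
    return $left;
}

// Parse a number or a parenthesised sub-expression.
function parsePrimary(array $tokens, int &$pos)
{
    if ($pos >= count($tokens)) {
        throw new InvalidArgumentException('Unexpected end of expression');
    }
    $token = $tokens[$pos++];
    if ($token === '(') {
        $node = parseExpression($tokens, $pos);
        $pos++; // skip the closing ")"
        return $node;
    }
    return (int) $token;
}

$tokens = tokenize('(1*2)/(3*4)*100');
$pos = 0;
$tree = parseExpression($tokens, $pos);
print_r($tree);
```

For the sample expression this yields the same shape as the tree above: a * node whose left branch is the / node and whose right branch is 100.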
There is no need to use regex for this. You just loop through the string and build the array as you want.
Edit, just realized it can be done much faster with a while loop instead of two for loops and if().
$str = "(10*2)/(3*40)*100";
$str = str_split($str); // turn the string into an array of single characters
$len = count($str);
$arr = array();
$j = 0; // index into the new array
for ($i = 0; $i < $len; $i++) {
    if (is_numeric($str[$i])) { // the character is a digit
        $arr[$j] = $str[$i]; // start a new number token
        $k = $i + 1;
        // keep appending while the next character exists and is still a digit;
        // the bounds check comes first so a number at the very end is safe
        while ($k < $len && is_numeric($str[$k])) {
            $arr[$j] .= $str[$k];
            $k++;
        }
        $j++; // this token is complete
        $i = $k - 1; // resume after the last digit consumed
    } else {
        // not a digit: store the character as its own token
        $arr[$j] = $str[$i];
        $j++;
    }
}
var_dump($arr);
https://3v4l.org/p9jZp
You can also use this matching regex: [()+\-*\/]|\d+
Demo
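A brief sketch of that matching regex with preg_match_all(), using the question's example c as input:

```php
<?php
// Each match is either one operator/parenthesis or a whole run of digits.
preg_match_all('/[()+\-*\/]|\d+/', '1+2+3/3*100', $matches);
$tokens = $matches[0];

var_export($tokens);
```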
I was doing something similar to this for a php calculator demo. A related post.
Consider this pattern for preg_split():
~-?\d+|[()*/+-]~ (Pattern Demo)
This has the added benefit of allowing negative numbers without confusing them for operators. The first alternative matches positive or negative integers, while the second alternative (after the |) matches parentheses and operators -- one at a time.
In the php implementation, I place the entire pattern in a capture group and retain the delimiters. This way no substrings are left behind. ~ is used as the pattern delimiter so that the slash in the pattern doesn't need to be escaped.
Code: (Demo)
$expression = '(1*2)/(3*4)*100+-10';
var_export(
preg_split(
'~(-?\d+|[()*/+-])~',
$expression,
0,
PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
)
);
Output:
array (
0 => '(',
1 => '1',
2 => '*',
3 => '2',
4 => ')',
5 => '/',
6 => '(',
7 => '3',
8 => '*',
9 => '4',
10 => ')',
11 => '*',
12 => '100',
13 => '+',
14 => '-10',
)
My situation requires recursion, and I'm able to match what's in the curly brackets already the way I need it, but I'm unable to capture the surrounding text.
So this would be the example text:
This is foo {{foo}} and {{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}} more_text {{foo
And I need my result to look like this:
0 => This is foo
1 => {{foo}}
2 => and
3 => {{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}}
4 => more_text {{foo
With this: (\{\{([^{{}}]|(?R))*\}\}) I have been able to match {{foo}} and {{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}} very nicely, but not the surrounding text to achieve the result that I need.
I have tried many things, but without success.
You may use the following solution based on the preg_split and PREG_SPLIT_DELIM_CAPTURE flag:
$re = '/({{(?:[^{}]++|(?R))*}})/';
$str = 'This is foo {{foo}} and {{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}} more_text {{foo';
$res = preg_split($re, $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($res);
// => Array
(
[0] => This is foo
[1] => {{foo}}
[2] => and
[3] => {{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}}
[4] => more_text {{foo
)
See the PHP demo.
The whole pattern is captured with the outer capturing group, that is why when adding PREG_SPLIT_DELIM_CAPTURE this text (that is split upon) is added to the output array.
If there are unwanted empty elements, PREG_SPLIT_NO_EMPTY flag will discard them.
More details:
Pattern: I removed unnecessary escapes and symbols from your pattern. You do not have to escape { in a PHP regex when the context is enough for the regex engine to deduce that it is a literal, and you do not need to escape } in any context. Note that [{}] is the same as [{{}}]: both match a single char that is either a { or a }, no matter how many { and } you put into the character class. I also improved performance by matching runs of those characters with the possessive quantifier ++ rather than one character per iteration.
Details:
( - Group 1 start:
{{ - 2 consecutive {s
(?:[^{}]++|(?R))* - 0 or more sequences of:
[^{}]++ - 1 or more symbols other than { and } (no backtracking into this pattern is allowed)
| - or
(?R) - try matching the whole pattern
}} - a }} substring
) - Group 1 end.
PHP part:
When tokenizing a string using just one token type, it is easy to use a splitting approach. Since preg_split in PHP can split on a regex while keeping the text that is matched, it is ideal for this kind of task.
The only trouble is that empty entries might crawl into the resulting array if the matches appear to be consecutive or at the start/end of the string. Thus, PREG_SPLIT_NO_EMPTY is good to use here.
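To illustrate the empty-entry issue, here is a small sketch with two adjacent matches and nothing around them (the input string is made up for the demonstration):

```php
<?php
$re  = '/({{(?:[^{}]++|(?R))*}})/';
$str = '{{a}}{{b}}';

// Without PREG_SPLIT_NO_EMPTY, the empty text before, between and after the matches is kept.
$withEmpty = preg_split($re, $str, -1, PREG_SPLIT_DELIM_CAPTURE);

// With the flag, only the non-empty pieces remain.
$withoutEmpty = preg_split($re, $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

var_export($withEmpty);
var_export($withoutEmpty);
```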
I would use a pattern like this
$patt = '/(?P<open>\{\{)|(?P<body>[-0-9a-zA-Z._]+)|(?P<whitespace>\s+)|(?P<operators>and|or|==)|(?P<close>\}\})/';
preg_match_all( $patt, $text, $matches );
The output is far too long to show in full, but you can loop over it and then match items up; basically it's tokenizing the string.
It's like this:
array (
0 =>
array (
0 => '{{',
1 => 'bar.function',
2 => '{{',
3 => 'demo.funtion',
4 => '{{',
5 => 'inner',
6 => '}}',
7 => ' ',
8 => '==',
9 => ' ',
10 => 'demo',
11 => '}}',
12 => ' ',
13 => 'and',
14 => ' ',
15 => '{{',
16 => 'bar',
17 => '}}',
18 => ' ',
19 => 'or',
20 => ' ',
21 => 'foo',
22 => '}}',
),
'open' =>
array (
0 => '{{',
1 => '',
2 => '{{',
3 => '',
4 => '{{',
5 => '',
6 => '',
7 => '',
8 => '',
9 => '',
10 => '',
11 => '',
12 => '',
13 => '',
14 => '',
15 => '{{',
16 => '',
17 => '',
18 => '',
19 => '',
20 => '',
21 => '',
22 => '',
),
'body' =>
array (
0 => '',
1 => 'bar.function',
2 => '',
3 => 'demo.funtion',
4 => '',
5 => 'inner',
6 => '',
....
)
)
Then in a loop you can tell that match [0][0] is an open tag, match [0][1] is a body, match [0][2] is another open tag, etc., and by keeping track of open and close tags you can work out the nesting. It will tell you what is an open match, a body match, a close match, an operator match, etc.
That is everything you need; I don't have time for a full workup of a solution.
A quick example would be an open followed by a body followed by a close, which is a variable. And an open followed by a body and another open is a function.
You can also add additional patterns by inserting them like this (?P<function>function\.) with the pipe in there, like '/(?P<open>\{\{)|(?P<function>function\.)|... . Then you could pick up keywords like function, foreach, block, etc.
I've written full-fledged template systems with this method. In my template system, I build the regex from an array like this:
[ 'open' => '\{\{', 'function' => 'function\.', .... ]
and then compile it into the actual regex, which makes life easy:
$r = [];
foreach( $patt_array as $key=>$value ){
$r[] = '(?P<'.$key.'>'.$value.')';
}
$patt = '/'.implode('|', $r ).'/';
Etc...
If you follow.
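Put together, the idea might look like the sketch below. The array contents, the sample $text and the flattened $tokens list are assumptions for illustration. Regex alternatives are tried left to right, so the operator entry is listed before body; otherwise and / or would be captured as body text:

```php
<?php
// Hypothetical token definitions; order matters because alternatives are tried in turn.
$pattArray = [
    'open'       => '\{\{',
    'close'      => '\}\}',
    'operator'   => '\b(?:and|or)\b|==',
    'body'       => '[-0-9a-zA-Z._]+',
    'whitespace' => '\s+',
];

// Compile the array into one pattern made of named groups.
$parts = [];
foreach ($pattArray as $name => $sub) {
    $parts[] = '(?P<' . $name . '>' . $sub . ')';
}
$patt = '/' . implode('|', $parts) . '/';

$text = '{{bar.function({{inner}} == "demo")}}';
preg_match_all($patt, $text, $matches, PREG_SET_ORDER);

// Flatten each match into a [type, text] pair by finding the group that matched.
$tokens = [];
foreach ($matches as $match) {
    foreach (array_keys($pattArray) as $name) {
        if (isset($match[$name]) && $match[$name] !== '') {
            $tokens[] = [$name, $match[$name]];
            break;
        }
    }
}

print_r($tokens);
```

Characters that no group matches, such as the parentheses and double quotes here, are simply skipped; tracking open and close tokens in a loop over $tokens then gives the nesting depth.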