PHP preg_split separates a number containing a comma into two different numbers - php

$line = "Type:Bid, End Time: 12/20/2018 08:10 AM (PST), Price: $8,000,Bids: 14, Age: 0, Description: , Views: 120270, Valuation: $10,75, IsTrue: false";
I need to get this array:
Array ( [0] => Bid [1] => 12/20/2018 08:10 AM (PST) [2] => $8,000 [3] => 14 [4] => 0 [5] => [6] => 120270 [7] => $10,75 [8] => false )

I agree with Andreas about using preg_match_all(), but not with his pattern.
For stability, I recommend consuming the entire string from the beginning.
1. Match the label and its trailing colon: [^:]+:
2. Match zero or more whitespace characters: \s*
3. Forget what has been matched so far: \K
4. Lazily match zero or more characters (giving back when possible, i.e. making a minimal match): .*?
5. "Look ahead" and demand that the characters matched in #4 are immediately followed by a comma, then one or more non-comma, non-colon characters (the next label), then a colon: ,[^,:]+: OR by the end of the string: $
Code: (Demo)
$line = "Type:Bid, End Time: 12/20/2018 08:10 AM (PST), Price: $8,000,Bids: 14, Age: 0, Description: , Views: 120270, Valuation: $10,75, IsTrue: false";
var_export(
    preg_match_all(
        '/[^:]+:\s*\K.*?(?=\s*(?:$|,[^,:]+:))/',
        $line,
        $out
    )
        ? $out[0] // isolate fullstring matches
        : []      // no matches
);
Output:
array (
0 => 'Bid',
1 => '12/20/2018 08:10 AM (PST)',
2 => '$8,000',
3 => '14',
4 => '0',
5 => '',
6 => '120270',
7 => '$10,75',
8 => 'false',
)

New answer according to new request:
I use the same regex for splitting the string, and afterwards I remove everything up to and including the colon:
$line = "Type:Bid, End Time: 12/20/2018 08:10 AM (PST), Price: $8,000,Bids: 14, Age: 0, Description: , Views: 120270, Valuation: $10,75, IsTrue: false";
$parts = preg_split("/(?<!\d),|,(?!\d)/", $line);
$result = array();
foreach ($parts as $elem) {
    $result[] = preg_replace('/^[^:]+:\h*/', '', $elem);
}
print_r ($result);
Output:
Array
(
[0] => Bid
[1] => 12/20/2018 08:10 AM (PST)
[2] => $8,000
[3] => 14
[4] => 0
[5] =>
[6] => 120270
[7] => $10,75
[8] => false
)
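To see what the two lookarounds in the split pattern do, here is a small sketch on shortened inputs (both sample strings are made up for illustration). A comma counts as a separator unless it has a digit on both sides. That also means a value ending in a digit, followed directly by a label starting with a digit, would not be split.

```php
<?php
// Same pattern as above: split on a comma unless digits sit on both sides of it.
$pattern = '/(?<!\d),|,(?!\d)/';

// The comma inside "$8,000" is kept; the one before "Bids" splits.
$ok = preg_split($pattern, 'Price: $8,000,Bids: 14');
print_r($ok);

// Hypothetical edge case: a digit,digit pair across a field boundary is not split.
$edge = preg_split($pattern, 'Bids: 14,2nd Bid: 7');
print_r($edge);
```

If the data can contain such a boundary, the label-based preg_match_all() approach is probably the safer choice.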

I'd use preg_match_all() instead.
Here the pattern looks for digit(s), a comma and digit(s); or just digit(s); or a word; each followed by a comma.
I append a comma to the string to make the regex simpler.
$line = "TRUE,59,m,10,500";
preg_match_all("/(\d+,\d+|\d+|\w+),/", $line . ",", $match);
var_dump($match);
https://3v4l.org/HQMgu
Even with a different order of the items this code will still produce a correct output: https://3v4l.org/SRJOf
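As a quick check of that pattern, the sketch below prints capture group 1, which holds the items without the trailing comma. The second input is an assumed edge case that is not in the question: when two purely numeric fields sit next to each other, the first alternative merges them.

```php
<?php
// Append a comma so every item, including the last, ends with one.
$line = 'TRUE,59,m,10,500';
preg_match_all('/(\d+,\d+|\d+|\w+),/', $line . ',', $match);
$items = $match[1]; // group 1: the items without the trailing comma
print_r($items);

// Assumed edge case: adjacent numeric fields are read as one number.
preg_match_all('/(\d+,\d+|\d+|\w+),/', 'TRUE,59,10,500' . ',', $match2);
$merged = $match2[1];
print_r($merged);
```

So the approach seems to rely on a non-numeric field sitting between the plain number and the comma-formatted one.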

A much better idea:
$parts=explode(',',$line,4); // explode() accepts a limit; in this case, 4
Same result, less code.
I would keep it simple and do this:
$line = "TRUE,59,m,10,500";
$parts = preg_split("/,/", $line);
//print_r ($parts);
$parts[3]=$parts[3].','.$parts[4]; //create a new part 3 from 3 and 4
//$parts[3].=','.$parts[4]; //alternative syntax to the above
unset($parts[4]);//remove old part 4
print_r ($parts);
I would also just use explode(), rather than a regular expression.
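For reference, a minimal sketch of the explode() variant with a limit. It assumes the comma-formatted number is always the last field and that exactly three fields come before it.

```php
<?php
$line = 'TRUE,59,m,10,500';
// A limit of 4 returns at most 4 elements; the last one keeps the rest of the string.
$parts = explode(',', $line, 4);
print_r($parts);
```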

Related

Grouping of regex with same name

I am trying to write a regex to get the ingredient name, quantity, and unit from a string. The string can follow any of these patterns: "pohe 2 kg", "2 Kg pohe" or "2Kg Pohe".
I have tried the code below:
<?PHP
$units = array("tbsp", "ml", "g", "grams", "kg", "few drops"); // add whatever other units are allowed
//mixed pattern
$pattern = '/(?J)(((?<i>^[a-zA-Z\s]+)(?<q>\d*\s*)(?<u>' . join("|", array_map("preg_quote", $units)) . '))|(?<q>^\d*\s*)(?<u>' . join("|", array_map("preg_quote", $units)) . ')(?<i>[a-zA-Z\s]+))/';
$ingredients = '2kg pohe';
preg_match_all($pattern, $ingredients, $m);
print_r($m);
$quantities = $m['q'];
$units = array_map('trim', $m['u']);
$ingrd = array_map('trim', $m['i']);
print_r($quantities);
print_r($units);
print_r($ingrd);
?>
The above code works for the string "2kg pohe", but not for "pohe 2kg".
If anyone has an idea what I am missing, please help me with this.
For pohe 2kg the duplicate named groups are empty because, as the documentation of preg_match_all states for the PREG_PATTERN_ORDER flag (which is the default):
If the pattern contains duplicate named subpatterns, only the
rightmost subpattern is stored in $matches[NAME].
In the pattern that you generate, there is a match in the second part (after the alternation) for 2kg pohe, but for pohe 2kg there is only a match in the first part, so no values are stored for the second part.
What you might do is use the PREG_SET_ORDER flag instead, which gives:
$ingredients = '2kg pohe';
preg_match_all($pattern, $ingredients, $m, PREG_SET_ORDER);
print_r($m[0]);
Output
Array
(
[0] => 2kg pohe
[i] => pohe
[1] =>
[q] => 2
[2] =>
[u] => kg
[3] =>
[4] => 2
[5] => kg
[6] => pohe
)
And
$ingredients = 'pohe 2kg';
preg_match_all($pattern, $ingredients, $m, PREG_SET_ORDER);
print_r($m[0]);
Output
Array
(
[0] => pohe 2kg
[i] => pohe
[1] => pohe
[q] => 2
[2] => 2
[u] => kg
[3] => kg
)
Then you can get the named subgroups for both strings with $m[0]['i'] etc.
Note that one of the examples in the question is 2Kg; you can make the pattern case insensitive (the i modifier) to match it.
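A different way to sidestep duplicate group names altogether is to try one pattern per word order. The sketch below is not the approach from the answer above: parseIngredient() is a hypothetical helper, the unit list is shortened, and it assumes the quantity is a whole number.

```php
<?php
// Hypothetical helper: tries "quantity unit name" first, then "name quantity unit".
function parseIngredient(string $text, array $units): ?array
{
    // Longest units first, so "kg" and "grams" are tried before "g".
    usort($units, fn($a, $b) => strlen($b) - strlen($a));
    $u = implode('|', array_map(fn($x) => preg_quote($x, '/'), $units));

    $patterns = [
        '/^(?<q>\d+)\s*(?<u>' . $u . ')\s+(?<i>[a-z\s]+)$/i',
        '/^(?<i>[a-z\s]+?)\s+(?<q>\d+)\s*(?<u>' . $u . ')$/i',
    ];
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, trim($text), $m)) {
            return ['i' => trim($m['i']), 'q' => $m['q'], 'u' => $m['u']];
        }
    }
    return null; // neither word order matched
}

$units = ['tbsp', 'ml', 'g', 'grams', 'kg'];
$first = parseIngredient('2Kg Pohe', $units);
$second = parseIngredient('pohe 2 kg', $units);
print_r($first);
print_r($second);
```

Each pattern uses every group name only once, so the result does not depend on how duplicate names are stored.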

Splitting a single string to an array on more than one delimiter

Is it possible to explode the following:
08 1.2/3(1(1)2.1-1
to an array of {08, 1, 2, 3, 1, 1, 2, 1, 1}?
I tried using preg_split("/ (\s|\.|\-|\(|\)) /g", '08 1.2/3(1(1)2.1-1') but it returned nothing. I tried checking my regex here and it matched well. What am I missing here?
You should use a character class containing all the delimiters which you want to use for splitting. Regex character classes appear inside [...]:
<?php
$keywords = preg_split("/[\s,\/().-]+/", '08 1.2/3(1(1)2.1-1');
print_r($keywords);
Result:
Array ( [0] => 08 [1] => 1 [2] => 2 [3] => 3 [4] => 1 [5] => 1 [6] => 2 [7] => 1 [8] => 1 )
You can use preg_match_all():
$str = '08 1.2/3(1(1)2.1-1';
preg_match_all('!\d+!', $str, $matches);
print_r($matches);
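A short sketch that puts the pieces side by side. As far as I can tell, the original call fails because PHP has no g modifier (preg_split() then returns false with a warning, suppressed here with @), and the literal spaces inside the pattern would have to appear in the input as well.

```php
<?php
$str = '08 1.2/3(1(1)2.1-1';

// The pattern from the question: the unknown "g" modifier makes preg_split() return false.
$broken = @preg_split('/ (\s|\.|\-|\(|\)) /g', $str);
var_dump($broken);

// Split on runs of delimiter characters...
$split = preg_split('/[\s,\/().-]+/', $str);
// ...or collect the digit runs directly.
preg_match_all('/\d+/', $str, $matches);

// Both approaches yield the same list for this input.
var_dump($split === $matches[0]);
```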

preg_split regex - need to split user input around mathematical operators

I need to split a given user string into an array based around mathematical operators. The symbols I need the string split around are:
+
-
/
*
()
However I would like to expand on the regex to include other operators I will be adding into my program.
The regex I have so far is this:
"((\(|\d+.+|-|\*|\/\d+\|))"
which, when run through regex101.com against the input string
(30*30)/(9+8), matches '30*30)/(9+8)'
I would like the output to be similar to this:
[0] =
[1] = (
[2] = 30
[3] = *
[4] = 30
[5] = )
or:
[0] =
[1] = 4
[2] = *
[3] = 4
depending on whether brackets are present in the user string or not.
I forgot to include the results of the current regex:
Using http://www.phpliveregex.com/ to test preg_split with an input string of:
(30*30)+(9*8)
the result:
array(3
0 =>
1 =>
2 =>
)
Is this the pattern you are looking for?
preg_match_all("/(\(|-\d+|\d+|-|\+|\/|\*|\))/", $input, $output);
https://regex101.com/r/acKW27/3
Preg_match_all: http://www.phpliveregex.com/p/l7L
I forgot / in the regex. Links updated also.
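For a quick look at what that pattern returns, here is a minimal sketch using the input from the question; the full matches end up in $output[0].

```php
<?php
$input = '(30*30)/(9+8)';
preg_match_all('/(\(|-\d+|\d+|-|\+|\/|\*|\))/', $input, $output);
$tokens = $output[0]; // one entry per bracket, number, or operator
print_r($tokens);
```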
preg_split() retains the delimiters by using the PREG_SPLIT_DELIM_CAPTURE flag. Include the additional flag PREG_SPLIT_NO_EMPTY to eliminate any empty elements. Here is an improved answer that will handle your sample input data, as well as floats and negative numbers.
Code: (Demo)
$expression = '-1*(2/(3+4)--10*-110.5/0.009+-.1)';
var_export(
    preg_split(
        '~(-?\d*(?:\.\d+)?|[()*/+-])~',
        $expression,
        0,
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
    )
);
Output:
array (
0 => '-1',
1 => '*',
2 => '(',
3 => '2',
4 => '/',
5 => '(',
6 => '3',
7 => '+',
8 => '4',
9 => ')',
10 => '-',
11 => '-10',
12 => '*',
13 => '-110.5',
14 => '/',
15 => '0.009',
16 => '+',
17 => '-.1',
18 => ')',
)
*Note, my above pattern makes digits before the decimal optional. If you know that your floats will always have a number before the dot, then you can use this pattern:
~(-?\d+(?:\.\d+)?|[()*/+-])~
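The advantages are: no empty matches and improved pattern efficiency. Keep PREG_SPLIT_NO_EMPTY with preg_split() all the same, because adjacent delimiters still leave empty strings between them in the result.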
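Before dropping any flags, it may help to look at what preg_split() returns for a tiny input. This sketch uses the stricter pattern above on made-up expressions. It also shows one behaviour to be aware of: a minus directly before a digit is always read as a sign, never as a subtraction operator.

```php
<?php
$pattern = '~(-?\d+(?:\.\d+)?|[()*/+-])~';

// Without PREG_SPLIT_NO_EMPTY, the empty strings between adjacent tokens are kept.
$raw = preg_split($pattern, '4*4', 0, PREG_SPLIT_DELIM_CAPTURE);
print_r($raw);

// With the flag, only the tokens remain.
$clean = preg_split($pattern, '4*4', 0, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($clean);

// "3-4" is tokenized as "3" and "-4", not as "3", "-", "4".
$minus = preg_split($pattern, '3-4', 0, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($minus);
```

Whether the sign handling matters depends on how the tokens are evaluated afterwards.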

Matching text that is not within the curly brackets, while also capturing the brackets after

My situation requires recursion, and I'm able to match what's in the curly brackets already the way I need it, but I'm unable to capture the surrounding text.
So this would be the example text:
This is foo {{foo}} and {{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}} more_text {{foo
And I need my result to look like this:
0 => This is foo
1 => {{foo}}
2 => and
3 => {{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}}
4 => more_text {{foo
With this: (\{\{([^{{}}]|(?R))*\}\}) I have been able to match {{foo}} and {{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}} very nicely, but not the surrounding text to achieve the result that I need.
I have tried many things, but without success.
You may use the following solution based on the preg_split and PREG_SPLIT_DELIM_CAPTURE flag:
$re = '/({{(?:[^{}]++|(?R))*}})/';
$str = 'This is foo {{foo}} and {{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}} more_text {{foo';
$res = preg_split($re, $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($res);
// => Array
(
[0] => This is foo
[1] => {{foo}}
[2] => and
[3] => {{bar.function({{demo.funtion({{inner}} == "demo")}} and {{bar}} or "foo")}}
[4] => more_text {{foo
)
See the PHP demo.
The whole pattern is captured with the outer capturing group, that is why when adding PREG_SPLIT_DELIM_CAPTURE this text (that is split upon) is added to the output array.
If there are unwanted empty elements, PREG_SPLIT_NO_EMPTY flag will discard them.
More details:
Pattern: I removed unnecessary escapes and symbols from your pattern, as you do not have to escape { and } in a PHP regex when the context is enough for the regex engine to deduce the meaning of { (and you do not need to escape } at all, in any context). Note that [{}] is the same as [{{}}]: both match a single char that is either a { or a }, no matter how many { and } you put into the character class. I also enhanced its performance by turning the greedy quantifier + into the possessive quantifier ++.
Details:
( - Group 1 start:
{{ - 2 consecutive {s
(?:[^{}]++|(?R))* - 0 or more sequences of:
[^{}]++ - 1 or more symbols other than { and } (no backtracking into this pattern is allowed)
| - or
(?R) - try matching the whole pattern
}} - a }} substring
) - Group 1 end.
PHP part:
When tokenizing a string using just one token type, it is easy to use a splitting approach. Since preg_split in PHP can split on a regex while keeping the text that is matched, it is ideal for this kind of task.
The only trouble is that empty entries might crawl into the resulting array if the matches appear to be consecutive or at the start/end of the string. Thus, PREG_SPLIT_NO_EMPTY is good to use here.
I would use a pattern like this:
$patt = '/(?P<open>\{\{)|(?P<body>[-0-9a-zA-Z._]+)|(?P<whitespace>\s+)|(?<opperators>and|or|==)|(?P<close>\}\})/';
preg_match_all( $patt, $text, $matches );
The output is far too long to show in full, but you can loop over it and then match items up; basically, it's tokenizing the string.
It's like this:
array (
0 =>
array (
0 => '{{',
1 => 'bar.function',
2 => '{{',
3 => 'demo.funtion',
4 => '{{',
5 => 'inner',
6 => '}}',
7 => ' ',
8 => '==',
9 => ' ',
10 => 'demo',
11 => '}}',
12 => ' ',
13 => 'and',
14 => ' ',
15 => '{{',
16 => 'bar',
17 => '}}',
18 => ' ',
19 => 'or',
20 => ' ',
21 => 'foo',
22 => '}}',
),
'open' =>
array (
0 => '{{',
1 => '',
2 => '{{',
3 => '',
4 => '{{',
5 => '',
6 => '',
7 => '',
8 => '',
9 => '',
10 => '',
11 => '',
12 => '',
13 => '',
14 => '',
15 => '{{',
16 => '',
17 => '',
18 => '',
19 => '',
20 => '',
21 => '',
22 => '',
),
'body' =>
array (
0 => '',
1 => 'bar.function',
2 => '',
3 => 'demo.funtion',
4 => '',
5 => 'inner',
6 => '',
....
)
)
Then in a loop you can tell that match [0][0] is an open tag, match [0][1] is a body, match [0][2] is another open, etc., and by keeping track of open and close tags you can work out the nesting. It will tell you what is an open match, a body match, a close match, an operator match, etc.
That is everything you need; I don't have time for a full workup of a solution...
A quick example: an open followed by a body followed by a close is a variable, and an open followed by a body and another open is a function.
You can also add additional patterns by inserting them like this: (?P<function>function\.), with the pipe in there, like '/(?P<open>\{\{)|(?P<function>function\.)|... . Then you could pick up keywords like function, foreach, block, etc.
I've written full-fledged template systems with this method. In my template system I build the regex in an array like this:
[ 'open' => '\{\{', 'function' => 'function\.', .... ]
And then compress it to the actual regex, which makes life easy...
$r = [];
foreach( $patt_array as $key=>$value ){
$r[] = '(?P<'.$key.'>'.$value.')';
}
$patt = '/'.implode('|', $r ).'/';
Etc...
If you follow.
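To make the idea concrete, here is a minimal sketch of such a tokenizer. The token table, the tokenize() helper, and the sample input are all hypothetical. The operator entry is placed before body so that "and" and "or" are not swallowed by the more general body pattern.

```php
<?php
// Hypothetical token table; names and sub-patterns are illustrative only.
$tokens = [
    'open'       => '\{\{',
    'close'      => '\}\}',
    'operator'   => '==|\band\b|\bor\b',
    'body'       => '[-0-9a-zA-Z._]+',
    'whitespace' => '\s+',
];

// Compress the table into one alternation of named groups.
$parts = [];
foreach ($tokens as $name => $sub) {
    $parts[] = '(?P<' . $name . '>' . $sub . ')';
}
$pattern = '/' . implode('|', $parts) . '/';

// Returns a list of [type, text] pairs in the order the tokens appear.
function tokenize(string $pattern, array $names, string $text): array
{
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);
    $out = [];
    foreach ($matches as $m) {
        foreach ($names as $name) {
            // Exactly one named group is non-empty per match.
            if (isset($m[$name]) && $m[$name] !== '') {
                $out[] = [$name, $m[$name]];
                break;
            }
        }
    }
    return $out;
}

$result = tokenize($pattern, array_keys($tokens), '{{foo}} and {{bar}}');
print_r($result);
```

From a list like this, nesting can be tracked with a simple counter that goes up on open and down on close.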

What is the regular expression to validate a comma delimited list but ending with '&' and a word

I'm able to extract till 11.20 but after that the comma stops and the regex I wrote fails. How can I write this expression? I'm using preg_match_all function.
input string:
8, 8.40, 9.20, 10, 10.40, 11.20, 12 & 12.40 latenight
output needed:
Array
(
[0] => 8,
[1] => 8.40,
[2] => 9.20,
[3] => 10,
[4] => 10.40,
[5] => 11.20,
[6] => 12,
[7] => 12.40,
)
$string = '8, 8.40, 9.20, 10, 10.40, 11.20, 12 & 12.40 latenight';
$string = str_replace('&', ',', $string);
$string = str_replace(' ', ',', $string);
$parts = preg_split('/,+/', $string);
print_r($parts);
prints
Array
(
[0] => 8
[1] => 8.40
[2] => 9.20
[3] => 10
[4] => 10.40
[5] => 11.20
[6] => 12
[7] => 12.40
[8] => latenight
)
Close enough?
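The same idea can be sketched in one split step, followed by a filter that drops the trailing word. The is_numeric filter is an assumption about what counts as a wanted value.

```php
<?php
$string = '8, 8.40, 9.20, 10, 10.40, 11.20, 12 & 12.40 latenight';
// Split on any run of whitespace, commas, or ampersands in one step.
$parts = preg_split('/[\s,&]+/', $string, -1, PREG_SPLIT_NO_EMPTY);
// Keep only the numeric pieces, which drops "latenight".
$times = array_values(array_filter($parts, 'is_numeric'));
print_r($times);
```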
There is no need to match the comma or ampersand, is there? Why not just match what you are looking for?
var str = "8, 8.40, 9.20, 10, 10.40, 11.20, 12 & 12.40 latenight";
var res = str.match( /\d+(\.\d{2})?|\w+$/g );
console.log( res ); //["8", "8.40", "9.20", "10", "10.40", "11.20", "12", "12.40", "latenight"]
//RegExp parts
\d+ - 1 or more digits
( - start optional group
\. - a literal decimal point
\d{2} - exactly 2 digits
)? - end optional group
| - or
\w+$ - a word at the end of the string
If you don't want the word at the end then leave the last clause out.
var str = "8, 8.40, 9.20, 10, 10.40, 11.20, 12 & 12.40 latenight";
var res = str.match( /\d+(\.\d{2})?/g );
console.log( res ); //["8", "8.40", "9.20", "10", "10.40", "11.20", "12", "12.40"]
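Since the question uses preg_match_all, here is a rough PHP equivalent of the JavaScript snippet above. The only change is a non-capturing group, so that $m[0] is the whole result.

```php
<?php
$str = '8, 8.40, 9.20, 10, 10.40, 11.20, 12 & 12.40 latenight';
// One or more digits, optionally followed by a dot and exactly two digits.
preg_match_all('/\d+(?:\.\d{2})?/', $str, $m);
$numbers = $m[0];
print_r($numbers);
```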
This expression,
[0-9]*(?:\.?[0-9]+)?(?=\s*&|\s*,)
might have worked too.
Demo
$re = '/[0-9]*(?:\.?[0-9]+)?(?=\s*&|\s*,)/s';
$str = '8, 8.40, 9.20, 10, 10.40, 11.20, 12 & 12.40 latenight';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also watch in this link how it matches against some sample inputs.
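Two things seem worth checking with this expression. Every part of it is optional, so it can also match the empty string in front of a comma or ampersand, and the lookahead never succeeds after the last number. The sketch below filters out the empty matches to show what is left for the sample input.

```php
<?php
$re = '/[0-9]*(?:\.?[0-9]+)?(?=\s*&|\s*,)/';
$str = '8, 8.40, 9.20, 10, 10.40, 11.20, 12 & 12.40 latenight';
preg_match_all($re, $str, $matches);

// Drop the empty matches produced in front of "," and "&".
$nonEmpty = array_values(array_filter($matches[0], fn($s) => $s !== ''));
print_r($nonEmpty);
```

Note that 12.40 is missing here, because it is followed by a word rather than a comma or ampersand.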
