Grouping of regex with same name - php

I am trying to write a regex to get the ingredient name, quantity, and unit from a string. The string can follow any of these patterns: "pohe 2 kg", "2 Kg pohe", or "2Kg Pohe".
I have tried the code below:
<?PHP
$units = array("tbsp", "ml", "g", "grams", "kg", "few drops"); // add whatever other units are allowed
//mixed pattern
$pattern = '/(?J)(((?<i>^[a-zA-Z\s]+)(?<q>\d*\s*)(?<u>' . join("|", array_map("preg_quote", $units)) . '))|(?<q>^\d*\s*)(?<u>' . join("|", array_map("preg_quote", $units)) . ')(?<i>[a-zA-Z\s]+))/';
$ingredients = '2kg pohe';
preg_match_all($pattern, $ingredients, $m);
print_r($m);
$quantities = $m['q'];
$units = array_map('trim', $m['u']);
$ingrd = array_map('trim', $m['i']);
print_r($quantities);
print_r($units);
print_r($ingrd);
?>
The above code works for the string "2kg pohe", but not for "pohe 2kg".
If anyone has an idea what I am missing, please help.

For pohe 2kg the duplicate named groups are empty because, as the documentation of preg_match_all states for the flag PREG_PATTERN_ORDER (which is the default):
If the pattern contains duplicate named subpatterns, only the
rightmost subpattern is stored in $matches[NAME].
In the pattern that you generate, 2kg pohe matches the second part (after the alternation), so the rightmost groups hold the values. For pohe 2kg only the first part matches, so the rightmost groups, which are the ones stored under the names, stay empty.
What you might do, is use the PREG_SET_ORDER flag instead, which gives:
$ingredients = '2kg pohe';
preg_match_all($pattern, $ingredients, $m, PREG_SET_ORDER);
print_r($m[0]);
Output
Array
(
[0] => 2kg pohe
[i] => pohe
[1] =>
[q] => 2
[2] =>
[u] => kg
[3] =>
[4] => 2
[5] => kg
[6] => pohe
)
And
$ingredients = 'pohe 2kg';
preg_match_all($pattern, $ingredients, $m, PREG_SET_ORDER);
print_r($m[0]);
Output
Array
(
[0] => pohe 2kg
[i] => pohe
[1] => pohe
[q] => 2
[2] => 2
[u] => kg
[3] => kg
)
Then you can get the named subgroups for both strings with $m[0]['i'], $m[0]['q'], and $m[0]['u'].
Note that one of the examples contains 2Kg; to match it, make the pattern case insensitive by adding the i modifier after the closing delimiter.
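As a rough sketch of the PREG_SET_ORDER approach combined with the i modifier, the hypothetical helper below tries both orderings and reads the named groups from the first match set. The function name parseIngredient(), its return keys, and the simplified pattern are assumptions for illustration, not part of the original code. It also assumes the quantity is a whole number.

```php
<?php
// Hypothetical helper illustrating the PREG_SET_ORDER approach; not from the original post.
function parseIngredient(string $text): ?array
{
    $units = ['tbsp', 'ml', 'grams', 'kg', 'g', 'few drops'];
    // Quote each unit so it is treated literally inside the pattern.
    $u = implode('|', array_map(fn ($unit) => preg_quote($unit, '/'), $units));

    // (?J) allows duplicate group names; the i modifier makes "Kg" match "kg".
    $pattern = '/(?J)^(?:(?<i>[a-z\s]+?)\s*(?<q>\d+)\s*(?<u>' . $u . ')'
             . '|(?<q>\d+)\s*(?<u>' . $u . ')\s+(?<i>[a-z\s]+))$/i';

    if (!preg_match_all($pattern, trim($text), $m, PREG_SET_ORDER)) {
        return null; // neither ordering matched
    }
    $set = $m[0]; // first (and only) match set
    return [
        'ingredient' => trim($set['i']),
        'quantity'   => (int) $set['q'],
        'unit'       => strtolower($set['u']),
    ];
}

foreach (['pohe 2 kg', '2 Kg pohe', '2Kg Pohe'] as $input) {
    print_r(parseIngredient($input));
}
```

Anchoring the whole alternation with ^ and $ keeps a partial match from being accepted, which is a design choice of this sketch rather than something the original pattern does.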

Related

Splitting a single string to an array on more than one delimiter

Is it possible to explode the following:
08 1.2/3(1(1)2.1-1
to an array of {08, 1, 2, 3, 1, 1, 2, 1, 1}?
I tried using preg_split("/ (\s|\.|\-|\(|\)) /g", '08 1.2/3(1(1)2.1-1') but it returned nothing. I tried checking my regex here and it matched well. What am I missing here?
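PHP has no g modifier (preg_split() already splits on every match), so the pattern fails to compile and preg_split() returns false with an "Unknown modifier" warning. The literal spaces around the group would also have to be present in the input. Instead, use a character class containing all the delimiters you want to split on. Regex character classes appear inside [...]: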
<?php
$keywords = preg_split("/[\s,\/().-]+/", '08 1.2/3(1(1)2.1-1');
print_r($keywords);
Result:
Array ( [0] => 08 [1] => 1 [2] => 2 [3] => 3 [4] => 1 [5] => 1 [6] => 2 [7] => 1 [8] => 1 )
You can use preg_match_all():
$str = '08 1.2/3(1(1)2.1-1';
preg_match_all('!\d+!', $str, $matches);
print_r($matches);
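A minimal sketch comparing the two answers, assuming you also want to drop empty strings when the input starts or ends with a delimiter. The PREG_SPLIT_NO_EMPTY flag and the second sample string are additions for illustration, not part of the original answers.

```php
<?php
$str = '08 1.2/3(1(1)2.1-1';

// Split on runs of delimiter characters; PREG_SPLIT_NO_EMPTY drops empty pieces.
$split = preg_split('/[\s,\/().-]+/', $str, -1, PREG_SPLIT_NO_EMPTY);

// Alternatively, collect the digit runs directly.
preg_match_all('/\d+/', $str, $matches);
$digits = $matches[0];

// Both approaches give the same tokens for this input.
var_dump($split === $digits);

// Illustrative input with leading and trailing delimiters.
print_r(preg_split('/[\s,\/().-]+/', '(08 1.2)', -1, PREG_SPLIT_NO_EMPTY));
```

Splitting is the better fit when the tokens may contain non-digit characters; matching digit runs is simpler when only numbers are wanted.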

PHP preg_split separates a number containing a comma into two different numbers

$line = "Type:Bid, End Time: 12/20/2018 08:10 AM (PST), Price: $8,000,Bids: 14, Age: 0, Description: , Views: 120270, Valuation: $10,75, IsTrue: false";
I need to get this array:
Array ( [0] => Bid [1] => 12/20/2018 08:10 AM (PST) [2] => $8,000 [3] => 14 [4] => 0 [5] => [6] => 120270 [7] => $10,75 [8] => false )
I agree with Andreas about using preg_match_all(), but not with his pattern.
For stability, I recommend consuming the entire string from the beginning.
1. Match the label and its trailing colon. [^:]+:
2. Match zero or more spaces. \s*
3. Forget what you matched so far. \K
4. Lazily match zero or more characters (giving back when possible, i.e. make a minimal match). .*?
5. "Look ahead" and demand that the matched characters from #4 are immediately followed by a comma, then one or more non-comma, non-colon characters (the next label), then a colon ,[^,:]+: OR the end of the string $.
Code: (Demo)
$line = "Type:Bid, End Time: 12/20/2018 08:10 AM (PST), Price: $8,000,Bids: 14, Age: 0, Description: , Views: 120270, Valuation: $10,75, IsTrue: false";
var_export(
preg_match_all(
'/[^:]+:\s*\K.*?(?=\s*(?:$|,[^,:]+:))/',
$line,
$out
)
? $out[0] // isolate fullstring matches
: [] // no matches
);
Output:
array (
0 => 'Bid',
1 => '12/20/2018 08:10 AM (PST)',
2 => '$8,000',
3 => '14',
4 => '0',
5 => '',
6 => '120270',
7 => '$10,75',
8 => 'false',
)
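If the labels are needed as well, the same lookahead idea can be extended with capture groups. This is only a sketch: the $pairs variable, the modified pattern, and the choice to key the result by the trimmed label are assumptions, not part of the answer above.

```php
<?php
$line = 'Type:Bid, End Time: 12/20/2018 08:10 AM (PST), Price: $8,000,Bids: 14, Age: 0, '
      . 'Description: , Views: 120270, Valuation: $10,75, IsTrue: false';

// Group 1: label (no colon or comma); group 2: value, matched lazily up to
// the next ",label:" or the end of the string.
preg_match_all('/([^:,]+):\s*(.*?)\s*(?=$|,[^,:]+:)/', $line, $sets, PREG_SET_ORDER);

$pairs = [];
foreach ($sets as $set) {
    // Labels after the first one carry a leading space, so trim them.
    $pairs[trim($set[1])] = $set[2];
}
print_r($pairs);
```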
New answer according to the new request:
I use the same regex for splitting the string, and afterwards I remove everything up to and including the colon:
$line = "Type:Bid, End Time: 12/20/2018 08:10 AM (PST), Price: $8,000,Bids: 14, Age: 0, Description: , Views: 120270, Valuation: $10,75, IsTrue: false";
$parts = preg_split("/(?<!\d),|,(?!\d)/", $line);
$result = array();
foreach($parts as $elem) {
$result[] = preg_replace('/^[^:]+:\h*/', '', $elem);
}
print_r ($result);
Output:
Array
(
[0] => Bid
[1] => 12/20/2018 08:10 AM (PST)
[2] => $8,000
[3] => 14
[4] => 0
[5] =>
[6] => 120270
[7] => $10,75
[8] => false
)
I'd use preg_match_all() instead.
Here the pattern looks for digit(s) comma digit(s) or just digit(s) or a word and a comma.
I append a comma to the string to make the regex simpler.
$line = "TRUE,59,m,10,500";
preg_match_all("/(\d+,\d+|\d+|\w+),/", $line . ",", $match);
var_dump($match);
https://3v4l.org/HQMgu
Even with a different order of the items this code will still produce a correct output: https://3v4l.org/SRJOf
A much better idea:
$parts=explode(',',$line,4); // explode() takes a limit; with 4, the last element keeps the rest of the string ("10,500")
Same result, less code.
I would keep it simple and do this
$line = "TRUE,59,m,10,500";
$parts = preg_split("/,/", $line);
//print_r ($parts);
$parts[3]=$parts[3].','.$parts[4]; //create a new part 3 from 3 and 4
//$parts[3].=','.$parts[4]; //alternative syntax to the above
unset($parts[4]);//remove old part 4
print_r ($parts);
I would also just use explode() rather than a regular expression.
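A short sketch of the explode() suggestion, assuming the number that contains the comma is always the last field in the line:

```php
<?php
$line = 'TRUE,59,m,10,500';

// With a limit of 4, the fourth element receives the rest of the string,
// so the comma inside "10,500" is left untouched.
$parts = explode(',', $line, 4);
print_r($parts);
```

If the comma-containing number can appear in another position, this no longer holds and one of the regex-based answers is the safer choice.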

PHP: Can preg_match include unmatched groups?

Can the preg_match() function include groups it did not find in the matches array?
Here is the pattern I'm using:
/^([0-9]+)(.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/
What I'm trying to do is parse a human-readable size into bytes. This pattern fits my requirement, but only if I can retrieve matches in the absolute group order.
This can produce up to 5 match groups, which would result in a matches array with indices 0-5. However, if the string does not match all groups, then the matches array may have, for example, group 5 actually at index 3.
What I'd like is the final match in that pattern (5) to always be at the same index of the matches array. Because multiple groups are optional it's very important that when reading the matches array we know which group in the expression got matched.
Example situation: The regex tester at regexr.com will show all 5 groups including those not matched always in the correct order. By enabling the "global" and "multi-line" flags and using the following text, you can hover over the blue matches for a good visual.
500.2 KiB
256M
700 Mb
1.2GiB
You'll notice that not all groups are always matched, however the group indexes are always in the correct order.
Edit: Yes I did already try this in PHP with the following:
$matches = [];
$matchesC = 0;
$matchesN = 6;
if (!preg_match("/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/", $size, $matches) || ($matchesC = count($matches)) < $matchesN) {
print_r($matches);
throw new \Exception(sprintf("Could not parse size string. (%d/%d)", $matchesC, $matchesN));
}
When $size is "256M" that print_r($matches); returns:
Array
(
[0] => 256M
[1] => 256
[2] =>
[3] => M
)
Groups 4 and 5 are missing.
Non-participating groups are not initialized with an empty string in PHP, so groups 4 and 5 are unset for the '256M' string, and preg_match() drops unset groups from the end of the array.
In your case, you can make the capturing groups themselves mandatory and make the patterns inside them optional, so the last group always participates and no trailing entries are dropped:
$arr = array('500.2 KiB', '256M', '700 Mb', '1.2GiB');
foreach ($arr as $s) {
if (preg_match('~^([0-9]+)(\.[0-9]+)?\s?([^ib]?)(i?)(b?)$~i', $s, $m)) {
print_r($m);
}
}
Output:
Array
(
[0] => 500.2 KiB
[1] => 500
[2] => .2
[3] => K
[4] => i
[5] => B
)
Array
(
[0] => 256M
[1] => 256
[2] =>
[3] => M
[4] =>
[5] =>
)
Array
(
[0] => 700 Mb
[1] => 700
[2] =>
[3] => M
[4] =>
[5] => b
)
Array
(
[0] => 1.2GiB
[1] => 1
[2] => .2
[3] => G
[4] => i
[5] => B
)
See the PHP demo.
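Two other ways to keep the array length fixed, sketched here and not part of the answer above: pad the result with array_pad(), or pass the PREG_UNMATCHED_AS_NULL flag so that unmatched groups are reported as null. The second option assumes PHP 7.4 or later, where trailing unmatched groups are also included.

```php
<?php
$pattern = '/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/';

// Option 1: pad the matches array to 6 entries (full match + 5 groups).
preg_match($pattern, '256M', $m);
$padded = array_pad($m, 6, '');
print_r($padded);

// Option 2 (assumes PHP >= 7.4): unmatched groups are reported as null,
// including trailing ones, so the array always has 6 entries.
preg_match($pattern, '256M', $n, PREG_UNMATCHED_AS_NULL);
var_dump($n);
```

Option 2 also lets you tell an unmatched group (null) apart from one that matched an empty string.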
You can use T-Regx which can handle such cases with ease! It always checks whether a group is matched, even if it's last and unmatched. It also can differentiate between "" (matched empty) or null (unmatched):
pattern('^([0-9]+)(.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$')
->match($size)
->first(function (Match $match) {
// whether the group was used in a pattern
$match->hasGroup(14);
// whether the group was matched, even if last or empty string
$match->matched(5);
// group, or default value if not matched
$match->group(5)->orReturn('unmatched');
});

How to split a string in multiple ones (Php)?

I want to split a big number/string for example 123456789123456789 into 6 smaller strings/numbers of 3 characters each. So the result would be 123 456 789 123 456 789. How can I do this?
Use chunk_split():
$var = "123456789123456789";
$split_string = chunk_split($var, 3); // 3 is the length of each chunk
If you want your result as an array, you can use str_split():
$var = "123456789123456789";
$array = str_split($var, 3); // 3 is the length of each chunk
You may use the chunk_split() function.
It splits a string into smaller chunks:
$string = "123456789123456789";
echo chunk_split ($string, 3, " ");
will output
123 456 789 123 456 789
The first parameter is the string to be chunked, the second is the chunk length, and the third is the string appended after each chunk (including the last one, so the output ends with a trailing space).
See PHP manual for further information
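A small sketch of the difference between the two functions when a separator is wanted only between the chunks. The variable names are illustrative.

```php
<?php
$string = '123456789123456789';

// chunk_split() appends the separator after every chunk, including the last.
$chunked = chunk_split($string, 3, ' ');
var_dump($chunked); // note the trailing space

// Joining the pieces from str_split() avoids the trailing separator.
$joined = implode(' ', str_split($string, 3));
var_dump($joined);
```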
You could do something like this:
$string = '123456789123456789';
preg_match_all('/(\d{3})/', $string, $matches);
print_r($matches[1]);
Output:
Array
(
[0] => 123
[1] => 456
[2] => 789
[3] => 123
[4] => 456
[5] => 789
)
\d matches a digit and {3} means exactly three of the preceding token (in this case, a digit).
....
Or, if the digits won't always divide evenly into groups of three:
$string = '12345678912345678922';
preg_match_all('/(\d{1,3})/', $string, $matches);
print_r($matches[1]);
Output:
Array
(
[0] => 123
[1] => 456
[2] => 789
[3] => 123
[4] => 456
[5] => 789
[6] => 22
)
Demo: https://regex101.com/r/rX0pJ1/1

What is the regular expression to validate a comma delimited list but ending with '&' and a word

I'm able to extract the values up to 11.20, but after that the commas stop and the regex I wrote fails. How can I write this expression? I'm using the preg_match_all function.
input string:
8, 8.40, 9.20, 10, 10.40, 11.20, 12 & 12.40 latenight
output needed:
Array
(
[0] => 8,
[1] => 8.40,
[2] => 9.20,
[3] => 10,
[4] => 10.40,
[5] => 11.20,
[6] => 12,
[7] => 12.40,
)
$string = '8, 8.40, 9.20, 10, 10.40, 11.20, 12 & 12.40 latenight';
$string = str_replace('&', ',', $string);
$string = str_replace(' ', ',', $string);
$parts = preg_split('/,+/', $string);
print_r($parts);
prints
Array
(
[0] => 8
[1] => 8.40
[2] => 9.20
[3] => 10
[4] => 10.40
[5] => 11.20
[6] => 12
[7] => 12.40
[8] => latenight
)
Close enough?
There is no need to match the comma or ampersand, is there? Why not just match what you are looking for?
var str = "8, 8.40, 9.20, 10, 10.40, 11.20, 12 & 12.40 latenight";
var res = str.match( /\d+(\.\d{2})?|\w+$/g );
console.log( res ); //["8", "8.40", "9.20", "10", "10.40", "11.20", "12", "12.40", "latenight"]
//RegExp parts
\d+ - 1 or more digits
( - start optional group
\. - a literal decimal point
\d{2} - exactly 2 digits
)? - end optional group
| - or
\w+$ - a word at the end of the string
If you don't want the word at the end then leave the last clause out.
var str = "8, 8.40, 9.20, 10, 10.40, 11.20, 12 & 12.40 latenight";
var res = str.match( /\d+(\.\d{2})?/g );
console.log( res ); //["8", "8.40", "9.20", "10", "10.40", "11.20", "12", "12.40"]
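Since the question uses preg_match_all, the JavaScript pattern above could be ported to PHP roughly as follows. This is a sketch; the only change to the pattern is a non-capturing group so that $matches[0] holds the complete values.

```php
<?php
$str = '8, 8.40, 9.20, 10, 10.40, 11.20, 12 & 12.40 latenight';

// One or more digits, optionally followed by a decimal point and two digits.
preg_match_all('/\d+(?:\.\d{2})?/', $str, $matches);
print_r($matches[0]);
```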
This expression,
[0-9]*(?:\.?[0-9]+)?(?=\s*&|\s*,)
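might have worked too, although it does not capture the final 12.40 (nothing after it satisfies the lookahead) and it also returns empty matches in front of the separators.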
Demo
$re = '/[0-9]*(?:\.?[0-9]+)?(?=\s*&|\s*,)/s';
$str = '8, 8.40, 9.20, 10, 10.40, 11.20, 12 & 12.40 latenight';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. You can also follow the link to see how it matches against some sample inputs.
