How to split a string using one pattern? - PHP

I have a string such as:
84 - Pampers mid (4-9кг) №180 [Procter&Gamble] - 1978.00
And I need to split it into an array (PHP), something like:
[0] 84
[1] Pampers mid (4-9кг) №180
[2] Procter&Gamble
[3] 1978.00
At the moment I am doing it step by step:
$pattern = '/\[(.*)\]/';//producer
preg_match($pattern, $subject, $matches_producer);
$provider=$matches_producer[1];
...
and so on for each element.
But this is an ugly method, isn't it? How can I do it with one pattern?

You can combine all the regexes into one to extract all values into $match at once:
$string = "84 - Pampers mid (4-9кг) №180 [Procter&Gamble] - 1978.00";
preg_match('/(\d+) - (.*) \[(.*)\] - (\d+\.\d+)/', $string, $match);
After running this code $match contains:
Array
(
[0] => 84 - Pampers mid (4-9кг) №180 [Procter&Gamble] - 1978.00
[1] => 84
[2] => Pampers mid (4-9кг) №180
[3] => Procter&Gamble
[4] => 1978.00
)
If you have a lot of these strings (for example, one per line in a single block of text), you might consider using preg_match_all.
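A minimal sketch of the preg_match_all route, assuming the records sit one per line in a single string. The second record and the group names (id, name, producer, price) are hypothetical, chosen for illustration. The m flag lets ^ and $ anchor at each line, and PREG_SET_ORDER returns one array per record.

```php
<?php
// Hypothetical input: one record per line; the second record is made up.
$text = "84 - Pampers mid (4-9кг) №180 [Procter&Gamble] - 1978.00\n"
      . "12 - Example item [Example Co] - 10.50";

// m: ^ and $ match at line boundaries; u: treat the subject as UTF-8.
// Named groups make the result self-describing instead of relying on indexes.
$pattern = '/^(?<id>\d+) - (?<name>.*) \[(?<producer>[^\]]*)\] - (?<price>\d+\.\d+)$/mu';

// PREG_SET_ORDER: $sets[0] holds all groups of the first record, $sets[1] the second, ...
preg_match_all($pattern, $text, $sets, PREG_SET_ORDER);

foreach ($sets as $m) {
    echo $m['id'], ' | ', $m['name'], ' | ', $m['producer'], ' | ', $m['price'], "\n";
}
```

Because the dot does not match a newline (no s flag), the greedy (.*) cannot run past the end of its own line.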

Something like:
$x = '84 - Pampers mid (4-9кг) №180 [Procter&Gamble] - 1978.00';
preg_match_all('/(\d*) - (.*) \[(.*)\] - (.*)/', $x, $matches);
var_dump($matches);
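Since the question asks about splitting, preg_split() with one alternation of the three separators is another option. This sketch assumes the product name itself never contains " - ", " [" or "] - "; if it can, the capturing-group answers above are the safer choice.

```php
<?php
$x = '84 - Pampers mid (4-9кг) №180 [Procter&Gamble] - 1978.00';

// Split on the three separators: " - ", " [" and "] - ".
// Note "4-9кг" has no spaces around the hyphen, so it is not split.
$parts = preg_split('/ - | \[|\] - /', $x);

print_r($parts);
```

The result is already indexed 0 to 3, matching the layout asked for in the question.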

Related

Grouping of regex with the same name

I am trying to write a regex to get the ingredient name, quantity and unit from a string. The string can follow any of these patterns: "pohe 2 kg", "2 Kg pohe" or "2Kg Pohe".
I have tried the code below:
<?PHP
$units = array("tbsp", "ml", "g", "grams", "kg", "few drops"); // add whatever other units are allowed
//mixed pattern
$pattern = '/(?J)(((?<i>^[a-zA-Z\s]+)(?<q>\d*\s*)(?<u>' . join("|", array_map("preg_quote", $units)) . '))|(?<q>^\d*\s*)(?<u>' . join("|", array_map("preg_quote", $units)) . ')(?<i>[a-zA-Z\s]+))/';
$ingredients = '2kg pohe';
preg_match_all($pattern, $ingredients, $m);
print_r($m);
$quantities = $m['q'];
$units = array_map('trim', $m['u']);
$ingrd = array_map('trim', $m['i']);
print_r($quantities);
print_r($units);
print_r($ingrd);
?>
The above code works for the string "2kg pohe", but not for the "pohe 2kg".
If anyone has an idea what I am missing, please help.
For pohe 2kg the duplicate named groups are empty because, as the preg_match_all documentation states for the flag PREG_PATTERN_ORDER (which is the default):
If the pattern contains duplicate named subpatterns, only the
rightmost subpattern is stored in $matches[NAME].
In the pattern that you generate, 2kg pohe is matched by the second branch (after the alternation), but pohe 2kg is only matched by the first branch, so the rightmost named groups (the ones in the second branch) hold no values.
What you might do is use the PREG_SET_ORDER flag instead, which gives:
$ingredients = '2kg pohe';
preg_match_all($pattern, $ingredients, $m, PREG_SET_ORDER);
print_r($m[0]);
Output
Array
(
[0] => 2kg pohe
[i] => pohe
[1] =>
[q] => 2
[2] =>
[u] => kg
[3] =>
[4] => 2
[5] => kg
[6] => pohe
)
And
$ingredients = 'pohe 2kg';
preg_match_all($pattern, $ingredients, $m, PREG_SET_ORDER);
print_r($m[0]);
Output
Array
(
[0] => pohe 2kg
[i] => pohe
[1] => pohe
[q] => 2
[2] => 2
[u] => kg
[3] => kg
)
Then you can get the named subgroups for both strings as $m[0]['i'], $m[0]['q'] and $m[0]['u'].
Note that the examples also contain 2Kg; add the i flag to make the pattern case-insensitive so that it matches as well.
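A different technique, not the one used in the answer above: avoid duplicate named groups altogether. This sketch matches the "quantity unit" pair once, wherever it appears, and treats the leftover text as the ingredient name. The function name parseIngredient and the returned keys are hypothetical; the unit list is copied from the question.

```php
<?php
/**
 * Sketch: parse "pohe 2 kg", "2 Kg pohe" or "2Kg Pohe" without (?J).
 * Returns ['i' => name, 'q' => quantity, 'u' => unit] or null if no unit is found.
 */
function parseIngredient(string $text, array $units): ?array
{
    // Try longer units first so "grams" is preferred over "g".
    usort($units, fn(string $a, string $b): int => strlen($b) <=> strlen($a));
    $alternation = implode('|', array_map(fn(string $u): string => preg_quote($u, '/'), $units));

    // i flag: "Kg" and "kg" both match; \b stops a unit from matching
    // only the beginning of a longer word.
    $pattern = '/(?<q>\d+(?:\.\d+)?)\s*(?<u>' . $alternation . ')\b/i';
    if (!preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE)) {
        return null;
    }

    // Cut the matched "quantity unit" span out of the input; the rest is the name.
    [$whole, $offset] = $m[0];
    $rest = substr($text, 0, $offset) . substr($text, $offset + strlen($whole));

    return [
        'i' => trim(preg_replace('/\s+/', ' ', $rest)),
        'q' => $m['q'][0],
        'u' => strtolower($m['u'][0]),
    ];
}

$units = ["tbsp", "ml", "g", "grams", "kg", "few drops"];
foreach (['pohe 2 kg', '2 Kg pohe', '2Kg Pohe', 'pohe 2kg'] as $s) {
    print_r(parseIngredient($s, $units));
}
```

Because there is only one q and one u group, the result no longer depends on which side of the string the quantity is on.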

PHP: Can preg_match include unmatched groups?

Can the preg_match() function include groups it did not find in the matches array?
Here is the pattern I'm using:
/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/
What I'm trying to do is parse a human-readable size into bytes. This pattern fits my requirement, but only if I can retrieve matches in the absolute group order.
This can produce up to 5 match groups, which would result in a matches array with indices 0-5. However, if the string does not match all groups, then the matches array may have, for example, group 5 actually at index 3.
What I'd like is the final match in that pattern (5) to always be at the same index of the matches array. Because multiple groups are optional it's very important that when reading the matches array we know which group in the expression got matched.
Example situation: the regex tester at regexr.com always shows all 5 groups, including those not matched, in the correct order. If you enable the "global" and "multi-line" flags and use the following text, you can hover over the blue matches to see this.
500.2 KiB
256M
700 Mb
1.2GiB
You'll notice that not all groups are always matched; however, the group indexes are always in the correct order.
Edit: Yes, I already tried this in PHP with the following:
$matches = [];
$matchesC = 0;
$matchesN = 6;
if (!preg_match("/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/", $size, $matches) || ($matchesC = count($matches)) < $matchesN) {
print_r($matches);
throw new \Exception(sprintf("Could not parse size string. (%d/%d)", $matchesC, $matchesN));
}
When $size is "256M" that print_r($matches); returns:
Array
(
[0] => 256M
[1] => 256
[2] =>
[3] => M
)
Groups 4 and 5 are missing.
In PHP, non-participating groups are not initialized with an empty string value, so groups 4 and 5 have no value for the '256M' string, and preg_match drops such trailing non-participating groups from the end of the array.
In your case, you can make the capturing groups themselves non-optional and make the patterns inside them optional instead.
$arr = array('500.2 KiB', '256M', '700 Mb', '1.2GiB');
foreach ($arr as $s) {
if (preg_match('~^([0-9]+)(\.[0-9]+)?\s?([^ib]?)(i?)(b?)$~i', $s, $m)) {
print_r($m);
}
}
Output:
Array
(
[0] => 500.2 KiB
[1] => 500
[2] => .2
[3] => K
[4] => i
[5] => B
)
Array
(
[0] => 256M
[1] => 256
[2] =>
[3] => M
[4] =>
[5] =>
)
Array
(
[0] => 700 Mb
[1] => 700
[2] =>
[3] => M
[4] =>
[5] => b
)
Array
(
[0] => 1.2GiB
[1] => 1
[2] => .2
[3] => G
[4] => i
[5] => B
)
See the PHP demo.
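On PHP 7.4 or later there is also a built-in option: with the PREG_UNMATCHED_AS_NULL flag (added in 7.2), PHP 7.4 also reports trailing unmatched groups, as null, so the array always has one entry per group. This sketch keeps the question's original pattern unchanged and assumes PHP 7.4+.

```php
<?php
// Requires PHP 7.4+: trailing unmatched groups are kept (as null) only from 7.4 on.
$pattern = '/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/';

preg_match($pattern, '256M', $m, PREG_UNMATCHED_AS_NULL);

// Unmatched groups are null, so "matched an empty string" ('')
// and "did not participate" (null) can be told apart.
var_dump($m);
```

With this flag the question's count($matches) check against 6 would pass for "256M" as well.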
You can use T-Regx, which handles such cases easily. It always checks whether a group was matched, even if it is the last one and unmatched. It can also differentiate between "" (matched empty) and null (unmatched):
pattern('^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$')
->match($size)
->first(function (Match $match) {
// whether the group was used in a pattern
$match->hasGroup(14);
// whether the group was matched, even if last or empty string
$match->matched(5);
// group, or default value if not matched
$match->group(5)->orReturn('unmatched');
});
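To finish the conversion to bytes that the question is after, here is a sketch built on the "non-optional groups, optional contents" pattern from the first answer. The function name parseSize is hypothetical, and the unit conventions are assumptions, not something the question states: an "i" means binary multiples (1024), otherwise decimal (1000), and a lowercase "b" means bits rather than bytes.

```php
<?php
/**
 * Sketch: human-readable size -> number of bytes.
 * Assumed conventions: "i" (KiB, GiB) = powers of 1024, otherwise 1000;
 * lowercase "b" (Mb) = bits, anything else = bytes.
 */
function parseSize(string $size): float
{
    if (!preg_match('~^([0-9]+)(\.[0-9]+)?\s?([^ib]?)(i?)(b?)$~i', $size, $m)) {
        throw new InvalidArgumentException("Could not parse size string: $size");
    }

    // Groups 3-5 always participate, so $m[2] is at least '' here.
    $number = (float) ($m[1] . $m[2]);   // "500" . ".2" -> 500.2
    $base   = $m[4] !== '' ? 1024 : 1000;

    // Prefix letter -> power of the base; no prefix means power 0.
    $power = 0;
    if ($m[3] !== '') {
        $pos = stripos('KMGTPE', $m[3]);
        if ($pos === false) {
            throw new InvalidArgumentException("Unknown unit prefix: {$m[3]}");
        }
        $power = $pos + 1;
    }

    $bytes = $number * $base ** $power;
    return $m[5] === 'b' ? $bytes / 8 : $bytes;   // lowercase b = bits
}

foreach (['500.2 KiB', '256M', '700 Mb', '1.2GiB'] as $s) {
    echo $s, ' => ', parseSize($s), "\n";
}
```

If your inputs treat "Mb" as megabytes, drop the bits branch.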

How to split a string into multiple ones (PHP)?

I want to split a big number/string, for example 123456789123456789, into 6 smaller strings/numbers of 3 characters each. So the result would be 123 456 789 123 456 789. How can I do this?
Use chunk_split():
$var = "123456789123456789";
$split_string = chunk_split($var, 3); // 3 is the length of each chunk; by default each chunk is followed by "\r\n"
If you want your result as an array, you can use str_split():
$var = "123456789123456789";
$array = str_split($var, 3); // 3 is the length of each chunk
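str_split() also copes with lengths that are not a multiple of 3: the last chunk is simply shorter. Joining the chunks with implode() gives the space-separated form without the trailing separator that chunk_split() appends. The 20-digit input below is an illustrative assumption.

```php
<?php
$var = "12345678912345678922";      // 20 digits, not a multiple of 3

$array = str_split($var, 3);        // last chunk is "22"
echo implode(' ', $array), "\n";    // space-separated, no trailing space
```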
You may use the chunk_split() function.
It splits a string into smaller chunks:
$string = "123456789123456789";
echo chunk_split ($string, 3, " ");
will output
123 456 789 123 456 789
First parameter is the string to be chunked. The second is the chunk length and the third is what you want at the end of each chunk.
See the PHP manual for further information.
You could do something like this:
$string = '123456789123456789';
preg_match_all('/(\d{3})/', $string, $matches);
print_r($matches[1]);
Output:
Array
(
[0] => 123
[1] => 456
[2] => 789
[3] => 123
[4] => 456
[5] => 789
)
\d matches a digit, and {3} means exactly three of the preceding token (in this case, a digit).
Or, if the length is not always a multiple of three:
$string = '12345678912345678922';
preg_match_all('/(\d{1,3})/', $string, $matches);
print_r($matches[1]);
Output:
Array
(
[0] => 123
[1] => 456
[2] => 789
[3] => 123
[4] => 456
[5] => 789
[6] => 22
)
Demo: https://regex101.com/r/rX0pJ1/1

preg_match_all pattern to get phone numbers

I have tried many times to use preg_match_all to extract phone numbers.
These are the formats I want to match:
09123456789
+989123456789
989123456789
0912 345 6789
+98 912 345 6789
How can I use preg_match_all to find the numbers above?
They may or may not contain spaces.
Each may start with +98 or 98 as the country code,
and the phone number itself must then start with 9 or 0.
I have tried this, but it does not work for all of them:
/[+989][09]*([0-9]{9,})/i
I think you want something like this:
(?:\+?98|0)(?:\s*\d{3}){2}\s*\d{4}
<?php
$str = <<<EOT
09123456789
+989123456789
989123456789
0912 345 6789
+98 912 345 6789
EOT;
$regex = '~(?:\+?98|0)(?:\s*\d{3}){2}\s*\d{4}~';
preg_match_all($regex, $str, $matches);
print_r($matches);
?>
Output:
Array
(
[0] => Array
(
[0] => 09123456789
[1] => +989123456789
[2] => 989123456789
[3] => 0912 345 6789
[4] => +98 912 345 6789
)
)
Try this:
(((\+?98)?|0) ?9[\d ]+)
See demo: http://regex101.com/r/rG6qE8/1
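The first answer's pattern, (?:\+?98|0)(?:\s*\d{3}){2}\s*\d{4}, can also match a fragment of a longer digit run, and \s* can cross a line break. This sketch adds two assumed tweaks: lookarounds that reject a match glued to other digits, and [ ]* so only spaces are allowed inside a number. The last input line is a hypothetical non-phone digit run.

```php
<?php
$str = <<<EOT
09123456789
+989123456789
989123456789
0912 345 6789
+98 912 345 6789
1234091234567890
EOT;

// (?<![\d+]) : no digit or "+" directly before the match
// (?!\d)     : no digit directly after the match
// [ ]*       : only spaces (not newlines) between the digit groups
$regex = '~(?<![\d+])(?:\+?98|0)(?:[ ]*\d{3}){2}[ ]*\d{4}(?!\d)~';

preg_match_all($regex, $str, $matches);
print_r($matches[0]);
```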

RegEx Statement Issues - PHP

I am attempting to use a regex to break down the following data:
mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_1=W: Hughes L: Britton&mlb_s_right1_count=1&mlb_s_url1=http://sports.espn.go.com/mlb/boxscore?gameId=320801110&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_1=W: Peavy L: Diamond S: Reed&mlb_s_right2_count=1&mlb_s_url2=http://sports.espn.go.com/mlb/boxscore?gameId=320801109
I am hoping to split it apart by home team (first city), home score (first number), away team (second city), away score (second number), and where in the game it is (in parentheses). This is the regex I currently have, but I feel it is very wrong:
preg_match_all('/mlb_s_left[0-9]=(?P<hometeam>.*?) (?P<homescore>.*?) (?P<awayteam>.*?) (?P<awayscore>.*?)\((?P<time>.*?)\)/', $content, $matches);
I would appreciate any and all help in getting this working.
I have tested the following code snippet in PHP 5.4.5:
<?php
$foo = 'mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_1=W: Hughes L: Britton&mlb_s_right1_count=1&mlb_s_url1=http://sports.espn.go.com/mlb/boxscore?gameId=320801110&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_1=W: Peavy L: Diamond S: Reed&mlb_s_right2_count=1&mlb_s_url2=http://sports.espn.go.com/mlb/boxscore?gameId=320801109';
preg_match_all('/mlb_s_left\d=\^?(?P<hometeam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(?P<homescore>\d+)\s+\^?(?P<awayteam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(?P<awayscore>\d+)\s+\((?P<time>\w+)\)/', $foo, $matches, PREG_SET_ORDER);
print_r($matches);
?>
output:
Array
(
[0] => Array
(
[0] => mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)
[hometeam] => Baltimore
[1] => Baltimore
[homescore] => 3
[2] => 3
[awayteam] => NY Yankees
[3] => NY Yankees
[awayscore] => 12
[4] => 12
[time] => FINAL
[5] => FINAL
)
[1] => Array
(
[0] => mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)
[hometeam] => Chicago Sox
[1] => Chicago Sox
[homescore] => 3
[2] => 3
[awayteam] => Minnesota
[3] => Minnesota
[awayscore] => 2
[4] => 2
[time] => FINAL
[5] => FINAL
)
)
Something like this should get you close.
preg_match_all('/mlb_s_left\d+=(?P<hometeam>\D+)\s+(?P<homescore>\d+)\s+(?P<awayteam>\D+)\s+(?P<awayscore>\d+)\s*\((?P<time>[^)]+)\)/',
$content, $matches);
Note that \d matches any digit, and \D matches anything that is not a digit.
[^)]+ matches one or more non-close parens characters; \s+ matches one or more whitespace chars, and \s* matches zero or more whitespace characters.
This wouldn't work very well if you have a city name with a number in it, and if you have a huge string, it's possible it could get hung up somewhere; you might consider splitting it up and matching a bit more piecemeal.
Generally speaking I would avoid .*? as a pattern match, as it basically matches almost anything. It's best for your regular expression to be as specific as possible, based on what you know about the data.
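One way to follow the "split it up first" advice: the data looks like a URL query string, so parse_str() can separate the fields, and the regex then only has to handle one short mlb_s_leftN value at a time. This is a sketch under that assumption; the anchored pattern and the output layout are illustrative choices, not from either answer.

```php
<?php
$content = 'mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_1=W: Hughes L: Britton&mlb_s_right1_count=1&mlb_s_url1=http://sports.espn.go.com/mlb/boxscore?gameId=320801110&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_1=W: Peavy L: Diamond S: Reed&mlb_s_right2_count=1&mlb_s_url2=http://sports.espn.go.com/mlb/boxscore?gameId=320801109';

// Split the "key=value&key=value" string into an associative array.
parse_str($content, $fields);

// Anchored to one value, so the lazy .+? cannot wander into other fields.
$pattern = '/^\^?(?<hometeam>.+?)\s+(?<homescore>\d+)\s+\^?(?<awayteam>.+?)\s+(?<awayscore>\d+)\s+\((?<time>[^)]+)\)$/';

$games = [];
foreach ($fields as $key => $value) {
    if (preg_match('/^mlb_s_left\d+$/', $key) && preg_match($pattern, $value, $m)) {
        $games[] = [
            'hometeam'  => $m['hometeam'],
            'homescore' => $m['homescore'],
            'awayteam'  => $m['awayteam'],
            'awayscore' => $m['awayscore'],
            'time'      => $m['time'],
        ];
    }
}

print_r($games);
```

Unlike a \D+ team pattern, .+? here would also accept a city name containing a digit, as long as it is not followed by whitespace and a score.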
