I want to split the string 'MTWTHFSSU' (the days of the week) and store each day in an array.
So far I am using preg_split with this regex:
$days = 'MTWTHFSSU';
$splitdays = preg_split('/(.H?)/', $days, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
print_r($splitdays);
// this is my output:
// Array ( [0] => M [1] => T [2] => W [3] => TH [4] => F [5] => S [6] => S [7] => U )
// if I change the group to (.U?) I get the correct Sunday, 'SU', but TH is split into T and H
Note that the . can match any character.
As an alternative, you can use a more precise match with preg_match_all: use an alternation | that lists the more specific two-letter matches first, followed by a character class that lists the single-character days.
TH|SU|[MTWFS]
For example
$days = "MTWTHFSSU";
$pattern = "/TH|SU|[MTWFS]/";
preg_match_all($pattern, $days, $matches);
print_r($matches[0]);
Output
Array
(
[0] => M
[1] => T
[2] => W
[3] => TH
[4] => F
[5] => S
[6] => SU
)
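Because PCRE tries the alternatives from left to right at each position, the two-letter tokens must come before the character class. A small sketch comparing both orders (the $good and $bad names are only for illustration):

```php
<?php
$days = 'MTWTHFSSU';

// Specific two-letter tokens first: TH and SU win over their first letter.
preg_match_all('/TH|SU|[MTWFS]/', $days, $good);

// Character class first: T and S always match on their own,
// so the leftover H and U match nothing and are skipped.
preg_match_all('/[MTWFS]|TH|SU/', $days, $bad);

print_r($good[0]); // M, T, W, TH, F, S, SU
print_r($bad[0]);  // M, T, W, T, F, S, S
```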
This should do the trick:
$days = 'MTWTHFSSU';
$splitdays = preg_split('/(.H|.U?)/', $days, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
print_r($splitdays);
# anubhava's more concise pattern from the comments works as well:
$splitdays = preg_split('/(.[HU]?)/', $days, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
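The preg_split() trick works because every day token is treated as a delimiter: the pieces between delimiters are all empty strings, PREG_SPLIT_DELIM_CAPTURE keeps the captured tokens, and PREG_SPLIT_NO_EMPTY discards the empty pieces. A short sketch that makes the intermediate empty strings visible, using a shortened hypothetical input 'MTW':

```php
<?php
$pattern = '/(.[HU]?)/';

// Without PREG_SPLIT_NO_EMPTY the empty pieces around each delimiter remain.
$raw = preg_split($pattern, 'MTW', -1, PREG_SPLIT_DELIM_CAPTURE);
var_export($raw); // 7 elements: '', 'M', '', 'T', '', 'W', ''

// With both flags only the captured tokens are left.
$clean = preg_split($pattern, 'MTWTHFSSU', -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
var_export($clean); // 'M', 'T', 'W', 'TH', 'F', 'S', 'SU'
```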
I need to split a string by numbers and by spaces, but I'm not sure what regex to use for that. My code is:
$array = preg_split('/[0-9].\s/', $content);
The value of $content is:
Weight 229.6104534866 g
Energy 374.79170898476 kcal
Total lipid (fat) 22.163422468932 g
Carbohydrate, by difference 13.641848209743 g
Sugars, total 4.3691034101428 g
Protein 29.256342349938 g
Sodium, Na 468.99386390008 mg
Which gives the result:
Array ( [0] => Weight 229.61045348 [1] => g
Energy 374.791708984 [2] => kcal
Total lipid (fat) 22.1634224689 [3] => g
Carbohydrate, by difference 13.6418482097 [4] => g
Sugars, total 4.36910341014 [5] => g
Protein 29.2563423499 [6] => g
Sodium, Na 468.993863900 [7] => mg
) 1
I need to split the text from the number, but I'm not sure how, so that I get:
[0] => Weight
[1] => 229.6104534866
[2] => g
and so on...
I also need it to ignore the commas, brackets and spaces inside the label. When using explode() I found that 'Total lipid (fat)', instead of being one value, was separated into 3 values, and I'm not sure how to fix that with regex.
When using explode() I get:
[0] => Total
[1] => lipid
[2] => (fat)
but I need those values as a single label. Is there any way to do that?
Any help is very appreciated!
Instead of splitting, you might very well match and capture the required parts, e.g. with the following pattern:
^(?P<category>\D+)\s+(?P<value>[\d.]+)\s+(?P<unit>.+)
See a demo on regex101.com.
In PHP this could be:
<?php
$data = 'Weight 229.6104534866 g
Energy 374.79170898476 kcal
Total lipid (fat) 22.163422468932 g
Carbohydrate, by difference 13.641848209743 g
Sugars, total 4.3691034101428 g
Protein 29.256342349938 g
Sodium, Na 468.99386390008 mg ';
$pattern = '~^(?P<category>\D+)\s+(?P<value>[\d.]+)\s+(?P<unit>.+)~m';
preg_match_all($pattern, $data, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
print_r($matches);
?>
See a demo on ideone.com.
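If the goal is a lookup keyed by label, the PREG_SET_ORDER result can be re-keyed in a loop. A minimal sketch, assuming labels are unique and using a shortened sample of the data; the $nutrients name and the value/unit keys are only illustrative:

```php
<?php
$data = "Weight 229.6104534866 g\nTotal lipid (fat) 22.163422468932 g\nSodium, Na 468.99386390008 mg";

$pattern = '~^(?P<category>\D+)\s+(?P<value>[\d.]+)\s+(?P<unit>.+)~m';
preg_match_all($pattern, $data, $matches, PREG_SET_ORDER);

// Re-key each match by its label; trim() guards against stray trailing spaces.
$nutrients = [];
foreach ($matches as $m) {
    $nutrients[trim($m['category'])] = [
        'value' => (float) $m['value'],
        'unit'  => trim($m['unit']),
    ];
}

print_r($nutrients);
```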
As an alternative to the preg_ functions, sscanf() allows the decimal value to be explicitly typed as a float (if that is valuable).
Unfortunately, because the %[^0-9] set greedily consumes every non-digit character, the space between the label and the float value will still be attached to the label string. If this is a problem, the label value will need to be rtrim()ed.
Code: (Demo)
// $contentLines = file('path/to/content.txt');
$contentLines = [
'Weight 229.6104534866 g',
'Energy 374.79170898476 kcal',
'Total lipid (fat) 22.163422468932 g',
'Carbohydrate, by difference 13.641848209743 g',
'Sugars, total 4.3691034101428 g',
'Protein 29.256342349938 g',
'Sodium, Na 468.99386390008 mg',
];
var_export(
array_map(
fn($line) => sscanf(
$line,
'%[^0-9]%f%s',
),
$contentLines
)
);
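Applying the rtrim() mentioned above inside the callback could look like this; a sketch using a shortened copy of the sample lines (the $rows name is illustrative, and labels are assumed never to contain digits):

```php
<?php
$contentLines = [
    'Weight 229.6104534866 g',
    'Total lipid (fat) 22.163422468932 g',
    'Sodium, Na 468.99386390008 mg',
];

$rows = array_map(
    function (string $line): array {
        // %[^0-9] grabs the label (including its trailing space),
        // %f reads the number as a float, %s reads the unit.
        [$label, $value, $unit] = sscanf($line, '%[^0-9]%f%s');
        return [rtrim($label), $value, $unit];
    },
    $contentLines
);

var_export($rows);
```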
Thanks to everyone for the help. I found that by putting a double space between all values and then using a double space as the explode() delimiter, the single spaces inside the labels were ignored, which is what I needed.
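A minimal sketch of that idea, assuming each line has already been rewritten with double spaces between the fields (how that rewriting is done is not shown in the post):

```php
<?php
// Hypothetical line, reformatted so that a double space separates the fields.
$line = 'Total lipid (fat)  22.163422468932  g';

// Single spaces inside the label are left alone; only "  " splits.
$parts = explode('  ', $line);
print_r($parts); // Total lipid (fat), 22.163422468932, g
```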
I have tried many times to write a pattern that validates that a given string is a natural number and splits it into single digits.
With my limited understanding of regex, the closest I can come up with is:
^([1-9])([0-9])*$ or ^([1-9])([0-9])([0-9])*$ or something like that...
It only returns the first and last digits (plus the second digit with the second pattern).
I wonder what I need to know to solve this problem. Thanks!
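The underlying reason is that a repeated capturing group such as ([0-9])* keeps only the text from its last iteration, so no pattern with a fixed number of groups can return every digit. A quick sketch showing this with a hypothetical input '12345':

```php
<?php
preg_match('/^([1-9])([0-9])*$/', '12345', $m);

// Group 1 is the first digit; group 2 only remembers the last
// digit it matched, even though it matched four times.
print_r($m); // [0] => 12345, [1] => 1, [2] => 5
```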
You may use a two-step solution like:
if (preg_match('~\A\d+\z~', $s)) { // if a string is all digits
print_r(str_split($s)); // Split it into chars
}
See a PHP demo.
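Note that \d+ also accepts strings such as '0' or '007', whereas the question's own attempt starts with [1-9]. If leading zeros should be rejected, a sketch of the same two-step idea could reuse that first character class (the helper name splitNaturalNumber is hypothetical):

```php
<?php
// Returns the digits of $s, or null if $s is not a natural number
// written without leading zeros.
function splitNaturalNumber(string $s): ?array
{
    if (!preg_match('~\A[1-9]\d*\z~', $s)) {
        return null;
    }
    return str_split($s);
}

var_dump(splitNaturalNumber('4096')); // '4', '0', '9', '6'
var_dump(splitNaturalNumber('007'));  // NULL
```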
A one-step regex solution:
(?:\G(?!\A)|\A(?=\d+\z))\d
See the regex demo
Details
(?:\G(?!\A)|\A(?=\d+\z)) - either the end of the previous match (\G(?!\A)) or (|) the start of the string (\A) that is followed by 1 or more digits up to the end of the string ((?=\d+\z))
\d - a digit.
PHP demo:
$re = '/(?:\G(?!\A)|\A(?=\d+\z))\d/';
$str = '1234567890';
if (preg_match_all($re, $str, $matches)) {
print_r($matches[0]);
}
Output:
Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
[5] => 6
[6] => 7
[7] => 8
[8] => 9
[9] => 0
)
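Because the \A(?=\d+\z) branch must succeed before \G can chain any further matches, a string containing a non-digit yields no matches at all. A small sketch wrapping the pattern in a helper (digitsOf is a hypothetical name):

```php
<?php
function digitsOf(string $s): array
{
    $re = '/(?:\G(?!\A)|\A(?=\d+\z))\d/';
    // preg_match_all returns the number of matches; 0 means $s was not all digits.
    return preg_match_all($re, $s, $matches) ? $matches[0] : [];
}

print_r(digitsOf('2048')); // 2, 0, 4, 8
print_r(digitsOf('12a3')); // empty array
```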
Can the preg_match() function include groups it did not find in the matches array?
Here is the pattern I'm using:
/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/
What I'm trying to do is parse a human-readable size into bytes. This pattern fits my requirement, but only if I can retrieve the matches in the absolute group order.
This can produce up to 5 match groups, which would result in a matches array with indices 0-5. However, if the string does not match all groups, then the matches array may have, for example, group 5 actually at index 3.
What I'd like is for the final group in that pattern (5) to always be at the same index of the matches array. Because multiple groups are optional, it's very important that when reading the matches array we know which group in the expression got matched.
Example situation: The regex tester at regexr.com will show all 5 groups including those not matched always in the correct order. By enabling the "global" and "multi-line" flags and using the following text, you can hover over the blue matches for a good visual.
500.2 KiB
256M
700 Mb
1.2GiB
You'll notice that not all groups are always matched, however the group indexes are always in the correct order.
Edit: Yes I did already try this in PHP with the following:
$matches = [];
$matchesC = 0;
$matchesN = 6;
if (!preg_match("/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/", $size, $matches) || ($matchesC = count($matches)) < $matchesN) {
print_r($matches);
throw new \Exception(sprintf("Could not parse size string. (%d/%d)", $matchesC, $matchesN));
}
When $size is "256M" that print_r($matches); returns:
Array
(
[0] => 256M
[1] => 256
[2] =>
[3] => M
)
Groups 4 and 5 are missing.
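Non-participating groups are not initialized with an empty string value in PHP, so groups 4 and 5 are unset in the case of the '256M' string. preg_match discards those uninitialized values from the end of the array; an unmatched group in the middle (group 2 here) is still reported as an empty string.
In your case, you can make the capturing groups themselves non-optional and make the patterns inside them optional instead, so every group always participates (possibly matching an empty string).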
$arr = array('500.2 KiB', '256M', '700 Mb', '1.2GiB');
foreach ($arr as $s) {
if (preg_match('~^([0-9]+)(\.[0-9]+)?\s?([^ib]?)(i?)(b?)$~i', $s, $m)) {
print_r($m);
}
}
Output:
Array
(
[0] => 500.2 KiB
[1] => 500
[2] => .2
[3] => K
[4] => i
[5] => B
)
Array
(
[0] => 256M
[1] => 256
[2] =>
[3] => M
[4] =>
[5] =>
)
Array
(
[0] => 700 Mb
[1] => 700
[2] =>
[3] => M
[4] =>
[5] => b
)
Array
(
[0] => 1.2GiB
[1] => 1
[2] => .2
[3] => G
[4] => i
[5] => B
)
See the PHP demo.
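On PHP 7.4 or later there is also the PREG_UNMATCHED_AS_NULL flag: unmatched groups are reported as null instead of "", and trailing unmatched groups are kept, so the array always has one entry per group. A sketch with the question's pattern, assuming PHP 7.4+ (the $m2 name is illustrative):

```php
<?php
$pattern = '/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/';

preg_match($pattern, '256M', $m, PREG_UNMATCHED_AS_NULL);
// Always 6 entries; groups 2, 4 and 5 did not participate, so they are null.
var_dump($m);

preg_match($pattern, '700 Mb', $m2, PREG_UNMATCHED_AS_NULL);
// Group 5 stays at index 5 even though group 4 is unmatched.
var_dump($m2);
```

This keeps the original optional groups, and null makes "not matched" distinguishable from "matched an empty string".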
You can use T-Regx which can handle such cases with ease! It always checks whether a group is matched, even if it's last and unmatched. It also can differentiate between "" (matched empty) or null (unmatched):
pattern('^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$')
->match($size)
->first(function (Match $match) {
// whether the group was used in a pattern
$match->hasGroup(14);
// whether the group was matched, even if last or empty string
$match->matched(5);
// group, or default value if not matched
$match->group(5)->orReturn('unmatched');
});
Hello, I need help with a regex. I use this to split strings on capital letters, like OldMcDonald:
preg_split('/(?=[A-Z])/', $data, -1, PREG_SPLIT_NO_EMPTY);
Output:
[0] => Old
[1] => Mc
[2] => Donald
Now I need to split strings like MWTTH.
I need to tell the regex that a T followed by an H is one word. How can I apply that in my regex?
I need the output:
[0] => M
[1] => W
[2] => T
[3] => TH
When I tried:
$array = preg_split('/(?=[A-Z][TH])/', $data, -1, PREG_SPLIT_NO_EMPTY);
Output is
Array
(
[0] => MTW
[1] => F
[2] => TH
)
MTW does not break apart. No time to study regex now.
I should have studied a little further; I could have figured it out. Anyway, I found a solution myself. I used:
$data = 'MTWFTH';
$array = preg_split('/(?=TH|M|T|W|F|S)/', $data, -1, PREG_SPLIT_NO_EMPTY);
OUTPUT
array (size=5)
0 => string 'M' (length=1)
1 => string 'T' (length=1)
2 => string 'W' (length=1)
3 => string 'F' (length=1)
4 => string 'TH' (length=2)
This will only work for predefined data like mine, though.
No time to study regex... So, you are basically asking us to figure out the problem for you.
It took about 5 seconds to figure it out, 30 seconds to write it down, and 5 more seconds to copy and paste it:
$string = 'OldMcMWTTHDonald';
preg_match_all('/(?:TH|[A-Z][a-z]*)/', $string, $matches);
var_dump($matches);
You just wasted 40 seconds of my life.
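If you would rather keep the preg_split() approach from the question, one option is to split before every capital letter except an H that directly follows a T. A sketch using a negative lookahead that contains a lookbehind:

```php
<?php
// Split before any capital letter, unless the previous character is T
// and the next one is H (so "TH" stays together).
$pattern = '/(?=[A-Z])(?!(?<=T)H)/';

print_r(preg_split($pattern, 'MWTTH', -1, PREG_SPLIT_NO_EMPTY));       // M, W, T, TH
print_r(preg_split($pattern, 'OldMcDonald', -1, PREG_SPLIT_NO_EMPTY)); // Old, Mc, Donald
```

Like the other answers, this only special-cases TH; other two-letter tokens would need their own exception.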
I am attempting to use a regex to break down the following data:
mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_1=W: Hughes L: Britton&mlb_s_right1_count=1&mlb_s_url1=http://sports.espn.go.com/mlb/boxscore?gameId=320801110&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_1=W: Peavy L: Diamond S: Reed&mlb_s_right2_count=1&mlb_s_url2=http://sports.espn.go.com/mlb/boxscore?gameId=320801109
I am hoping to split it apart by home team (first city), home score (first number), away team (second city), away score (second number), and where in the game it is (in parentheses). This is the regex I have currently, but I feel it is very wrong:
preg_match_all('/mlb_s_left[0-9]=(?P<hometeam>.*?) (?P<homescore>.*?) (?P<awayteam>.*?) (?P<awayscore>.*?)\((?P<time>.*?)\)/', $content, $matches);
I would appreciate any and all help in getting this working.
I have tested the following code snippet in PHP 5.4.5:
<?php
$foo = 'mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_1=W: Hughes L: Britton&mlb_s_right1_count=1&mlb_s_url1=http://sports.espn.go.com/mlb/boxscore?gameId=320801110&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_1=W: Peavy L: Diamond S: Reed&mlb_s_right2_count=1&mlb_s_url2=http://sports.espn.go.com/mlb/boxscore?gameId=320801109';
preg_match_all('/mlb_s_left\d=\^?(?P<hometeam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(?P<homescore>\d+)\s+\^?(?P<awayteam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(?P<awayscore>\d+)\s+\((?P<time>\w+)\)/', $foo, $matches, PREG_SET_ORDER);
print_r($matches);
?>
output:
Array
(
[0] => Array
(
[0] => mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)
[hometeam] => Baltimore
[1] => Baltimore
[homescore] => 3
[2] => 3
[awayteam] => NY Yankees
[3] => NY Yankees
[awayscore] => 12
[4] => 12
[time] => FINAL
[5] => FINAL
)
[1] => Array
(
[0] => mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)
[hometeam] => Chicago Sox
[1] => Chicago Sox
[homescore] => 3
[2] => 3
[awayteam] => Minnesota
[3] => Minnesota
[awayscore] => 2
[4] => 2
[time] => FINAL
[5] => FINAL
)
)
Something like this should get you close.
preg_match_all('/mlb_s_left\d+=(?P<hometeam>\D+)\s+(?P<homescore>\d+)\s+(?P<awayteam>\D+)\s+(?P<awayscore>\d+)\s*\((?P<time>[^)]+)\)/',
$content, $matches);
Note that \d matches any digit, and \D matches anything that is not a digit.
[^)]+ matches one or more characters other than a closing parenthesis; \s+ matches one or more whitespace characters, and \s* matches zero or more whitespace characters.
This wouldn't work very well if you have a city name with a number in it, and if you have a huge string, it's possible it could get hung up somewhere; you might consider splitting it up and matching a bit more piecemeal.
Generally speaking I would avoid .*? as a pattern match, as it basically matches almost anything. It's best for your regular expression to be as specific as possible, based on what you know about the data.
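Following the suggestion to split the data up and match piecemeal: the string looks like a query string (key=value pairs joined by &), so parse_str() can do the splitting and a small, anchored regex can handle each mlb_s_leftN value. A sketch under that assumption; note that parse_str() also URL-decodes values (for example, + becomes a space), which may or may not be wanted for real feeds. The $games array and its keys simply reuse the group names from the question.

```php
<?php
$content = 'mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_1=W: Hughes L: Britton&mlb_s_right1_count=1&mlb_s_url1=http://sports.espn.go.com/mlb/boxscore?gameId=320801110&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_1=W: Peavy L: Diamond S: Reed&mlb_s_right2_count=1&mlb_s_url2=http://sports.espn.go.com/mlb/boxscore?gameId=320801109';

// Step 1: the data is key=value pairs joined by "&", so let parse_str() split it.
parse_str($content, $fields);

// Step 2: an anchored pattern for one score line; \^? skips the optional caret,
// and the lazy \D+? stops at the first whitespace that is followed by a score.
$scoreRe = '/^\^?(?P<hometeam>\D+?)\s+(?P<homescore>\d+)\s+\^?(?P<awayteam>\D+?)\s+(?P<awayscore>\d+)\s*\((?P<time>[^)]+)\)$/';

$games = [];
foreach ($fields as $key => $value) {
    // Only the mlb_s_leftN fields hold the scores.
    if (preg_match('/^mlb_s_left\d+$/', $key) && preg_match($scoreRe, $value, $m)) {
        $games[] = [
            'hometeam'  => $m['hometeam'],
            'homescore' => (int) $m['homescore'],
            'awayteam'  => $m['awayteam'],
            'awayscore' => (int) $m['awayscore'],
            'time'      => $m['time'],
        ];
    }
}

print_r($games);
```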