Can the preg_match() function include groups it did not find in the matches array?
Here is the pattern I'm using:
/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/
What I'm trying to do is parse a human-readable size into bytes. This pattern fits my requirement, but only if I can retrieve matches in the absolute group order.
This can produce up to 5 match groups, which would result in a matches array with indices 0-5. However, if the string does not match all groups, then the matches array may have, for example, group 5 actually at index 3.
What I'd like is for the final group in that pattern (5) to always be at the same index of the matches array. Because multiple groups are optional, it's very important that when reading the matches array we know which group in the expression got matched.
Example situation: The regex tester at regexr.com will show all 5 groups including those not matched always in the correct order. By enabling the "global" and "multi-line" flags and using the following text, you can hover over the blue matches for a good visual.
500.2 KiB
256M
700 Mb
1.2GiB
You'll notice that not all groups are always matched; however, the group indexes are always in the correct order.
Edit: Yes I did already try this in PHP with the following:
$matches = [];
$matchesC = 0;
$matchesN = 6;
if (!preg_match("/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/", $size, $matches) || ($matchesC = count($matches)) < $matchesN) {
print_r($matches);
throw new \Exception(sprintf("Could not parse size string. (%d/%d)", $matchesC, $matchesN));
}
When $size is "256M" that print_r($matches); returns:
Array
(
[0] => 256M
[1] => 256
[2] =>
[3] => M
)
Groups 4 and 5 are missing.
In PHP, groups that do not participate in the match are not filled in with an empty string, so groups 4 and 5 are unset for the string '256M'. preg_match drops such unmatched groups from the end of the array, while an unmatched group that precedes a matched one (like group 2 here) is reported as an empty string.
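If you would rather keep the original pattern, one option is the PREG_UNMATCHED_AS_NULL flag. The sketch below assumes PHP 7.4 or later, where the flag also reports trailing unmatched groups as null so the array always has the same size; on PHP 7.2 and 7.3 the flag exists but the trailing groups are still dropped.

```php
<?php
// The question's pattern, with the dot escaped as in the asker's PHP snippet.
$pattern = '/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/';

// PREG_UNMATCHED_AS_NULL reports groups that did not participate as null.
// Assumption: PHP >= 7.4, where this also applies to trailing groups.
preg_match($pattern, '256M', $m, PREG_UNMATCHED_AS_NULL);

var_dump(count($m)); // all six entries (0-5) are present
var_dump($m[3]);     // the unit prefix "M"
var_dump($m[5]);     // NULL: group 5 did not participate
```

With this approach null means "group not matched", which keeps it distinct from a group that matched an empty string.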
In your case, you can make the capturing groups themselves non-optional and make the patterns inside them optional instead:
$arr = array('500.2 KiB', '256M', '700 Mb', '1.2GiB');
foreach ($arr as $s) {
if (preg_match('~^([0-9]+)(\.[0-9]+)?\s?([^ib]?)(i?)(b?)$~i', $s, $m)) {
print_r($m);
}
}
Output:
Array
(
[0] => 500.2 KiB
[1] => 500
[2] => .2
[3] => K
[4] => i
[5] => B
)
Array
(
[0] => 256M
[1] => 256
[2] =>
[3] => M
[4] =>
[5] =>
)
Array
(
[0] => 700 Mb
[1] => 700
[2] =>
[3] => M
[4] =>
[5] => b
)
Array
(
[0] => 1.2GiB
[1] => 1
[2] => .2
[3] => G
[4] => i
[5] => B
)
See the PHP demo.
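Since the stated goal is converting the size to bytes, here is a minimal sketch built on the pattern above. The function name parseSize and the unit rules are assumptions, not something from the question: K/M/G/T prefixes, base 1024 when the "i" is present, base 1000 otherwise, and the trailing b/B ignored.

```php
<?php
/**
 * Hypothetical helper: convert a human-readable size to bytes.
 * Assumed rules (not from the original post): K/M/G/T prefixes,
 * "i" selects base 1024, otherwise base 1000, trailing b/B is ignored.
 */
function parseSize(string $size): float
{
    if (!preg_match('~^([0-9]+)(\.[0-9]+)?\s?([^ib]?)(i?)(b?)$~i', $size, $m)) {
        throw new InvalidArgumentException("Could not parse size string: $size");
    }

    // Groups 3-5 always exist because the groups themselves are not optional.
    $number = (float) ($m[1] . $m[2]);
    $prefix = strtoupper($m[3]);
    $base   = $m[4] !== '' ? 1024 : 1000;

    $exponents = ['' => 0, 'K' => 1, 'M' => 2, 'G' => 3, 'T' => 4];
    if (!isset($exponents[$prefix])) {
        throw new InvalidArgumentException("Unknown unit prefix: $prefix");
    }

    $multiplier = $base ** $exponents[$prefix];

    return $number * $multiplier;
}

foreach (['500.2 KiB', '256M', '700 Mb', '1.2GiB'] as $s) {
    echo $s, ' => ', parseSize($s), "\n";
}
```

Because every group is guaranteed to be present, the function can index $m[3] and $m[4] directly without checking count($m) first.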
You can use T-Regx, which handles such cases with ease. It always checks whether a group is matched, even if it's the last group and unmatched. It can also distinguish between "" (matched an empty string) and null (unmatched):
pattern('^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$')
->match($size)
->first(function (Match $match) {
// whether the group was used in a pattern
$match->hasGroup(14);
// whether the group was matched, even if last or empty string
$match->matched(5);
// group, or default value if not matched
$match->group(5)->orReturn('unmatched');
});
Related
I want to split the text 'MTWTHFSSU' into days of the week and store them in an array.
So far I am using this code with preg_split and a regex:
$splitdays = preg_split('/(.H?)/', $days, -1, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
print_r($splitdays);
// This is my output:
// [0 => M, 1 => T, 2 => W, 3 => TH, 4 => F, 5 => S, 6 => S, 7 => U]
// If I change the pattern to use U instead of H, I get the correct 'SU' for Sunday, but TH is split into T and H.
Note that the . can match any character.
As an alternative, you can use a bit more precise match with preg_match_all and use a pattern with an alternation | to list the more specific matches at the beginning and use a character class to list the single character variations.
TH|SU|[MTWFS]
For example
$days = "MTWTHFSSU";
$pattern = "/TH|SU|[MTWFS]/";
preg_match_all($pattern, $days, $matches);
print_r($matches[0]);
Output
Array
(
[0] => M
[1] => T
[2] => W
[3] => TH
[4] => F
[5] => S
[6] => SU
)
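To see why the more specific alternatives have to come first, here is a small sketch that runs the same input through both orderings. The variable names are only illustrative.

```php
<?php
$days = 'MTWTHFSSU';

// Two-letter codes first: TH and SU are tried before the single letters.
preg_match_all('/TH|SU|[MTWFS]/', $days, $specificFirst);

// Single letters first: T and S win at every position, so TH and SU never
// match, and the stray H and U are skipped because no alternative covers them.
preg_match_all('/[MTWFS]|TH|SU/', $days, $lettersFirst);

echo implode(',', $specificFirst[0]), "\n"; // M,T,W,TH,F,S,SU
echo implode(',', $lettersFirst[0]), "\n";  // M,T,W,T,F,S,S
```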
This should do the trick:
$days = 'MTWTHFSSU';
$splitdays = preg_split('/(.H|.U?)/', $days, -1, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
print_r($splitdays);
@anubhava's more concise suggestion from the comments works as well:
$splitdays = preg_split('/(.[HU]?)/', $days, -1, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
I need to figure out a method using PHP to chunk the 1's and 0's into sections.
1001 would look like: array(100,1)
1001110110010011 would look like: array(100,1,1,10,1,100,100,1,1)
It gets different when the sequence starts with 0's: I would like the leading 0's to be segmented into their own blocks until the first 1 is reached.
00110110 would look like: array(0,0,1,10,1,10)
How would this be done with PHP?
You can use preg_match_all to split your string, using the following regex:
10*|0
This matches either a 1 followed by some number of 0s, or a 0. Since a regex always tries to match the parts of an alternation in the order they occur, the second part will only match 0s that are not preceded by a 1, that is those at the start of the string. PHP usage:
$beatstr = '1001110110010011';
preg_match_all('/10*|0/', $beatstr, $m);
print_r($m);
$beatstr = '00110110';
preg_match_all('/10*|0/', $beatstr, $m);
print_r($m);
Output:
Array
(
[0] => Array
(
[0] => 100
[1] => 1
[2] => 1
[3] => 10
[4] => 1
[5] => 100
[6] => 100
[7] => 1
[8] => 1
)
)
Array
(
[0] => Array
(
[0] => 0
[1] => 0
[2] => 1
[3] => 10
[4] => 1
[5] => 10
)
)
Demo on 3v4l.org
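If you need this in more than one place, the call can be wrapped in a small function. This is only a sketch; the name chunkBits is made up for illustration.

```php
<?php
// Hypothetical helper: split a string of 1s and 0s into "1 plus trailing 0s"
// chunks, with any leading 0s returned as individual chunks.
function chunkBits(string $bits): array
{
    preg_match_all('/10*|0/', $bits, $m);

    // $m[0] holds the full matches in the order they were found.
    return $m[0];
}

print_r(chunkBits('1001'));
print_r(chunkBits('00110110'));
```

Joining the chunks back together with implode('', ...) gives the original string as long as the input contains only 1s and 0s; any other characters are silently skipped by preg_match_all.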
Let's take an example of following string:
$string = "length:max(260):min(20)";
In the above string, :max(260):min(20) is optional. I want to get it if it is present otherwise only length should be returned.
I have following regex but it doesn't work:
/(.*?)(?::(.*?))?/se
It doesn't return anything in the array when I use preg_match function.
Remember, there can be something else than above string. Maybe like this:
$string = "number:disallow(negative)";
Is there any problem in my regex or PHP won't return anything? Dumping preg_match returns int 1 which means the string matches the regex.
Fully Dumped:
int 1
array (size=2)
0 => string '' (length=0)
1 => string '' (length=0)
Your pattern starts with a lazy quantifier, (.*?), and nothing after it is required, so the engine is satisfied with an empty match at position zero. If you change your preg_match call to preg_match_all you'll see the captured groups.
Your regular expression has other problems too: it makes the engine do far more work than necessary. The e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, and it only ever applied to preg_replace.
The s modifier isn't needed either.
This works in your case:
/([^:]+)(:.*)?/
Online demo
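As a rough sketch of how that pattern could be used, the helper below (parseRule is a hypothetical name) returns the name and a list of the optional parts. It assumes the option values themselves never contain a colon.

```php
<?php
// Hypothetical helper: split "name:opt1:opt2" into the name and its options.
function parseRule(string $rule): array
{
    if (!preg_match('/([^:]+)(:.*)?/', $rule, $m)) {
        return [null, []];
    }

    // Group 2 is optional and trailing, so it may be missing from $m entirely.
    $rest = $m[2] ?? '';

    // Assumption: option values never contain ":" themselves.
    $options = $rest === '' ? [] : explode(':', ltrim($rest, ':'));

    return [$m[1], $options];
}

print_r(parseRule('length:max(260):min(20)'));
print_r(parseRule('length'));
```

The ?? fallback matters here: for a plain "length" the optional group does not take part in the match, so index 2 is not set at all.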
I tried to prepare a regex that should solve your issue and add some value as well:
it not only matches the optional elements but also captures them as key-value pairs.
Regex
/(?<=:|)(?'prop'\w+)(?:\((?'val'.+?)\))?/g
Test string
length:max(260):min(20)
length
number:disallow(negative)
Result
MATCH 1
prop [0-6] length
MATCH 2
prop [7-10] max
val [11-14] 260
MATCH 3
prop [16-19] min
val [20-22] 20
MATCH 4
prop [24-30] length
MATCH 5
prop [31-37] number
MATCH 6
prop [38-46] disallow
val [47-55] negative
try demo here
EDIT
I think I understand what you meant by a duplicate array with different keys: it was due to the named captures (prop and val).
Here is the revision without named capturing:
Regex
/(?<=:|)(\w+)(?:\((.+?)\))?/
Sample code
$str = "length:max(260):min(20)";
$str .= "\nlength";
$str .= "\nnumber:disallow(negative)";
preg_match_all("/(?<=:|)(\w+)(?:\((.+?)\))?/",
$str,
$matches);
print_r($matches);
Result
Array
(
[0] => Array
(
[0] => length
[1] => max(260)
[2] => min(20)
[3] => length
[4] => number
[5] => disallow(negative)
)
[1] => Array
(
[0] => length
[1] => max
[2] => min
[3] => length
[4] => number
[5] => disallow
)
[2] => Array
(
[0] =>
[1] => 260
[2] => 20
[3] =>
[4] =>
[5] => negative
)
)
try demo here
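If you want actual key-value pairs rather than parallel arrays, something along these lines might work. It is a sketch that uses PREG_SET_ORDER on a single rule string and stores null for properties without a value; the $rules name is just for illustration.

```php
<?php
$str = 'length:max(260):min(20)';

// PREG_SET_ORDER returns one array per match instead of one per group.
preg_match_all('/(?<=:|)(\w+)(?:\((.+?)\))?/', $str, $sets, PREG_SET_ORDER);

$rules = [];
foreach ($sets as $set) {
    // The value group may be absent or empty when there are no parentheses.
    $value = $set[2] ?? '';
    $rules[$set[1]] = $value === '' ? null : $value;
}

print_r($rules);
```

Note that a repeated property name would overwrite the earlier entry, so this only fits rule strings where each property appears once.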
I was wondering how can I create preg_match for catching:
id=4
4 being any number and how can I search for the above example in a string?
Would /^id=[0-9]/ be correct? The reason I'm asking is that I'm not really good with preg_match.
For 4 being any number, we must allow the digit to repeat:
/^id\=[0-9]+/
The backslash escapes the equals sign (this is harmless but not required, since = is not a metacharacter), and the plus after the character class means one or more digits.
You should go with the following:
/id=(\d+)/g
Explanations:
id= - Literal id=
(\d+) - Capturing group; \d matches a digit (0-9) and + repeats it one or more times
/g - modifier: global. All matches (don't return on first match)
Example online
If you want to grab all ids and its values in PHP you could go with:
$string = "There are three ids: id=10 and id=12 and id=100";
preg_match_all("/id=(\d+)/", $string, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => id=10
[1] => id=12
[2] => id=100
)
[1] => Array
(
[0] => 10
[1] => 12
[2] => 100
)
)
Example online
Note: The /g modifier shown above means "match all" in the online tester. PHP doesn't support it; it has a separate function for that, preg_match_all. All you need to do is remove the g from the regex.
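If you only need the first id rather than all of them, plain preg_match is enough. A minimal sketch, with firstId as a made-up helper name:

```php
<?php
// Hypothetical helper: return the first id found in the string, or null.
function firstId(string $text): ?int
{
    // preg_match stops at the first match and returns 1 on success, 0 otherwise.
    if (preg_match('/id=(\d+)/', $text, $m)) {
        return (int) $m[1];
    }

    return null;
}

var_dump(firstId('page.php?id=42&sort=asc')); // int(42)
var_dump(firstId('no identifier here'));      // NULL
```

Keep in mind that without an anchor or word boundary the pattern also matches inside longer names such as "userid=7".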
I have some strings like:
some words 1-25 to some words 26-50
more words 1-10
words text and words 30-100
How can I find and extract all of the ranges, such as "1-25" and "26-50", from these strings?
If it’s integers, match multiple digits: \d+. To match the whole range expression: (\d+)-(\d+).
Maybe you also want to allow whitespace between the dash and the numbers:
(\d+)\s*-\s*(\d+)
And maybe you want to make sure that the expression stands free, i.e. isn’t part of a word:
\b(\d+)\s*-\s*(\d+)\b
\b is a zero-width match and tests for word boundaries. This expression forbids things like “Some1 -2text” but allows “Some 1-2 text”.
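A quick way to check the effect of the word boundaries is to run the final pattern against both sample strings. This is just a sketch of that check:

```php
<?php
$pattern = '/\b(\d+)\s*-\s*(\d+)\b/';

// Stands free: the digits are delimited by spaces on both sides.
$allowed = preg_match($pattern, 'Some 1-2 text', $m);

// Glued to words: no boundary before the 1 and none after the 2.
$forbidden = preg_match($pattern, 'Some1 -2text');

var_dump($allowed);   // int(1)
var_dump($m[1], $m[2]);
var_dump($forbidden); // int(0)
```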
You can do this with regular expressions:
echo preg_match_all('/([0-9]+)-([0-9]+)/', 'some words 1-25 to some words 26-50 more words 1-10 words text and words 30-100', $matches);
4
print_r($matches);
Array
(
[0] => Array
(
[0] => 1-25
[1] => 26-50
[2] => 1-10
[3] => 30-100
)
[1] => Array
(
[0] => 1
[1] => 26
[2] => 1
[3] => 30
)
[2] => Array
(
[0] => 25
[1] => 50
[2] => 10
[3] => 100
)
)
For each range, the first value is in $matches[1] and the second is in $matches[2], at the same index.
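If you would rather have each range as its own pair, PREG_SET_ORDER groups the results per match. A small sketch, where the 'from' and 'to' keys are just names picked for illustration:

```php
<?php
$text = 'some words 1-25 to some words 26-50 more words 1-10 words text and words 30-100';

// PREG_SET_ORDER groups the results per match instead of per capture group.
preg_match_all('/([0-9]+)-([0-9]+)/', $text, $sets, PREG_SET_ORDER);

$ranges = [];
foreach ($sets as $set) {
    // $set[0] is the full match, $set[1] and $set[2] are the two numbers.
    $ranges[] = ['from' => (int) $set[1], 'to' => (int) $set[2]];
}

print_r($ranges);
```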
I think this line is enough
preg_replace("/[^0-9]/","",$string);