PHP Regex: How to get optional text if present? - php

Take the following string as an example:
$string = "length:max(260):min(20)";
In the above string, :max(260):min(20) is optional. I want to get it if it is present; otherwise only length should be returned.
I have the following regex, but it doesn't work:
/(.*?)(?::(.*?))?/se
It doesn't return anything useful in the array when I use preg_match.
Remember, the string can also be something else, for example:
$string = "number:disallow(negative)";
Is there a problem in my regex, or will PHP not return anything? Dumping the result of preg_match gives int 1, which means the string matches the regex.
Fully Dumped:
int 1
array (size=2)
0 => string '' (length=0)
1 => string '' (length=0)

Your pattern starts with a lazy (.*?), and everything after it is optional. A lazy quantifier takes as few characters as possible, so the engine is satisfied with an empty match at offset 0 and stops there. That is why preg_match returns 1 but both entries are empty. Switching to preg_match_all only gives you more fragments from later positions, not the two pieces you want.
The modifiers are a problem too. The e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, and it only ever applied to preg_replace.
The s modifier is not needed either; it only makes the dot match newlines.
This works in your case:
/([^:]+)(:.*)?/
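To see the difference, the sketch below runs both patterns through preg_match. The e modifier is left off the original pattern because current PHP versions reject it, and the layout of the $parsed array is only an illustration, not part of the answer.

```php
<?php
$inputs = ['length:max(260):min(20)', 'length', 'number:disallow(negative)'];

// Original pattern: the lazy (.*?) accepts an empty match at offset 0,
// so the whole match and group 1 both come back empty.
preg_match('/(.*?)(?::(.*?))?/s', $inputs[0], $lazy);

$parsed = [];
foreach ($inputs as $input) {
    if (preg_match('/([^:]+)(:.*)?/', $input, $m)) {
        // Group 2 is absent from $m when the optional part did not match,
        // so fall back to an empty string.
        $parsed[$input] = ['name' => $m[1], 'rest' => $m[2] ?? ''];
    }
}

print_r($lazy);
print_r($parsed);
```

The ?? fallback matters because preg_match leaves trailing unmatched groups out of the array by default.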

I tried to put together a regex that should solve your problem and also adds something extra:
it not only matches the optional elements but also captures each one as a key/value pair.
Regex
/(?<=:|)(?'prop'\w+)(?:\((?'val'.+?)\))?/g
Test string
length:max(260):min(20)
length
number:disallow(negative)
Result
MATCH 1
prop [0-6] length
MATCH 2
prop [7-10] max
val [11-14] 260
MATCH 3
prop [16-19] min
val [20-22] 20
MATCH 4
prop [24-30] length
MATCH 5
prop [31-37] number
MATCH 6
prop [38-46] disallow
val [47-55] negative
EDIT
I think I understand what you meant by a duplicate array with a different key: named captures such as prop and val are stored under both their name and their numeric index.
Here is the revision without named captures:
Regex
/(?<=:|)(\w+)(?:\((.+?)\))?/
Sample code
$str = "length:max(260):min(20)";
$str .= "\nlength";
$str .= "\nnumber:disallow(negative)";
preg_match_all("/(?<=:|)(\w+)(?:\((.+?)\))?/",
$str,
$matches);
print_r($matches);
Result
Array
(
[0] => Array
(
[0] => length
[1] => max(260)
[2] => min(20)
[3] => length
[4] => number
[5] => disallow(negative)
)
[1] => Array
(
[0] => length
[1] => max
[2] => min
[3] => length
[4] => number
[5] => disallow
)
[2] => Array
(
[0] =>
[1] => 260
[2] => 20
[3] =>
[4] =>
[5] => negative
)
)
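The duplicate entries mentioned in the edit can be reproduced with a short sketch. It uses the (?<name>...) spelling, which PCRE treats the same as (?'name'...), and the array_filter step is just one possible way to keep only the named keys.

```php
<?php
// Named groups show up twice in $m: once under the name, once under the number.
preg_match('/(?<prop>\w+)(?:\((?<val>.+?)\))?/', 'max(260)', $m);
print_r($m);

// Keep only the string keys if the numeric duplicates are unwanted.
$named = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
print_r($named);
```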

Related

REGEX Pattern for Validation that check all string is integer and split into single integers

I have tried several times to write a pattern that validates that a given string is a natural number and splits it into single digits.
With my limited understanding of regex, the closest I can come up with is
^([1-9])([0-9])*$ or ^([1-9])([0-9])([0-9])*$ or something like that.
It only captures the first and last digits, or the first, second and last digits.
What do I need to know to solve this problem? Thanks.
You may use a two step solution like
if (preg_match('~\A\d+\z~', $s)) { // if a string is all digits
print_r(str_split($s)); // Split it into chars
}
See a PHP demo.
A one step regex solution:
(?:\G(?!\A)|\A(?=\d+\z))\d
See the regex demo
Details
(?:\G(?!\A)|\A(?=\d+\z)) - either the end of the previous match (\G(?!\A)) or (|) the start of the string (\A) that is followed by 1 or more digits up to the end of the string ((?=\d+\z))
\d - a digit.
PHP demo:
$re = '/(?:\G(?!\A)|\A(?=\d+\z))\d/';
$str = '1234567890';
if (preg_match_all($re, $str, $matches)) {
print_r($matches[0]);
}
Output:
Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
[5] => 6
[6] => 7
[7] => 8
[8] => 9
[9] => 0
)

PHP: Can preg_match include unmatched groups?

Can the preg_match() function include groups it did not find in the matches array?
Here is the pattern I'm using:
/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/
What I'm trying to do is parse a human-readable size into bytes. This pattern fits my requirement, but only if I can retrieve the matches in absolute group order.
This can produce up to 5 match groups, which would result in a matches array with indices 0-5. However, if the string does not match all groups, then the matches array may have, for example, group 5 actually at index 3.
What I'd like is the final match in that pattern (5) to always be at the same index of the matches array. Because multiple groups are optional it's very important that when reading the matches array we know which group in the expression got matched.
Example situation: The regex tester at regexr.com will show all 5 groups including those not matched always in the correct order. By enabling the "global" and "multi-line" flags and using the following text, you can hover over the blue matches for a good visual.
500.2 KiB
256M
700 Mb
1.2GiB
You'll notice that not all groups are always matched, however the group indexes are always in the correct order.
Edit: Yes I did already try this in PHP with the following:
$matches = [];
$matchesC = 0;
$matchesN = 6;
if (!preg_match("/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/", $size, $matches) || ($matchesC = count($matches)) < $matchesN) {
print_r($matches);
throw new \Exception(sprintf("Could not parse size string. (%d/%d)", $matchesC, $matchesN));
}
When $size is "256M" that print_r($matches); returns:
Array
(
[0] => 256M
[1] => 256
[2] =>
[3] => M
)
Groups 4 and 5 are missing.
Groups that do not participate in the match are not filled in with an empty string, so groups 4 and 5 are unset for '256M'. preg_match drops unmatched groups from the end of the array; an unmatched group in the middle, such as group 2 here, is kept as an empty string so that the later indices stay in place.
In your case, you can make the capturing groups themselves non-optional and make the patterns inside them optional instead.
$arr = array('500.2 KiB', '256M', '700 Mb', '1.2GiB');
foreach ($arr as $s) {
if (preg_match('~^([0-9]+)(\.[0-9]+)?\s?([^ib]?)(i?)(b?)$~i', $s, $m)) {
print_r($m);
}
}
Output:
Array
(
[0] => 500.2 KiB
[1] => 500
[2] => .2
[3] => K
[4] => i
[5] => B
)
Array
(
[0] => 256M
[1] => 256
[2] =>
[3] => M
[4] =>
[5] =>
)
Array
(
[0] => 700 Mb
[1] => 700
[2] =>
[3] => M
[4] =>
[5] => b
)
Array
(
[0] => 1.2GiB
[1] => 1
[2] => .2
[3] => G
[4] => i
[5] => B
)
See the PHP demo.
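Another option, if the original pattern has to stay as it is, is the PREG_UNMATCHED_AS_NULL flag. The sketch below assumes PHP 7.4 or later, where this flag also keeps trailing unmatched groups (as null), so the array always has the same size. The $results variable is only illustrative.

```php
<?php
// Assumes PHP 7.4+: with PREG_UNMATCHED_AS_NULL every group gets an entry,
// and groups that did not participate are reported as null.
$pattern = '/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/';

$results = [];
foreach (['500.2 KiB', '256M', '700 Mb', '1.2GiB'] as $size) {
    if (preg_match($pattern, $size, $m, PREG_UNMATCHED_AS_NULL)) {
        $results[$size] = $m;
    }
}

var_dump($results['256M']);
```

With this flag a group that matched an empty string and a group that did not match at all can be told apart, which the rewritten pattern above cannot do.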
You can use T-Regx which can handle such cases with ease! It always checks whether a group is matched, even if it's last and unmatched. It also can differentiate between "" (matched empty) or null (unmatched):
pattern('^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$')
->match($size)
->first(function (Match $match) {
// whether the group was used in a pattern
$match->hasGroup(14);
// whether the group was matched, even if last or empty string
$match->matched(5);
// group, or default value if not matched
$match->group(5)->orReturn('unmatched');
});

preg_match to match number csv format and capture each number?

I want to use preg_match to parse '123,456,789,323' and capture each number into the array $m.
My PHP code:
preg_match("/^(\d+)(?:,(\d+))*?$/",'123,456,789,323',$m);
print_r($m);
This is how I interpret my regexp:
^: Begin of line
1st (\d+): Capture 1st number
,(\d+): Match pattern 'a comma then a number'.
(?:,(\d+))*?: Match zero or more [using *] of above pattern but don't
capture whole pattern [using ?:] instead only capture
the number [using (\d+)]. Lastly, match pattern
nongreedy [using last ?]
$: Match end of line.
But I get this output:
Array
(
[0] => 123,456,789,323
[1] => 123
[2] => 323
)
What I want is:
Array
(
[0] => 123,456,789,323
[1] => 123
[2] => 456
[3] => 789
[4] => 323
)
I thought (...)* was too greedy, so I used (...)*?. But it doesn't improve the output. What am I missing?
PS: I want to know how a regexp can do this, rather than using another approach such as explode().
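For what it is worth, greediness is not the issue here: a capturing group inside a repetition is overwritten on every pass, so only the last number survives in $m[2]. A minimal sketch of one common workaround, assuming it is acceptable to validate the whole string first and then collect the numbers with preg_match_all (variable names are illustrative):

```php
<?php
$csv = '123,456,789,323';

// A repeated group keeps only its last iteration, greedy or lazy.
preg_match('/^(\d+)(?:,(\d+))*$/', $csv, $m);

// Step 1: check the overall format. Step 2: pull out every number.
$numbers = [];
if (preg_match('/\A\d+(?:,\d+)*\z/', $csv)) {
    preg_match_all('/\d+/', $csv, $all);
    $numbers = $all[0];
}

print_r($m);
print_r($numbers);
```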

Catching ids and its values from a string with preg_match

I was wondering how I can write a preg_match pattern that catches:
id=4
where 4 can be any number, and how I can search for this in a string.
Would /^id=[0-9]/ be correct? I'm asking because I'm not very good with preg_match.
For 4 to be any number, we must allow a range of digits:
/^id\=[0-9]+/
The backslash escapes the equals sign (optional, since = is not special here), and the plus after the character class means one or more digits.
You should go with the following:
/id=(\d+)/g
Explanations:
id= - Literal id=
(\d+) - Capturing group; \d matches a digit between 0 and 9, and + repeats it one or more times
/g - modifier: global. All matches (don't return on first match)
If you want to grab all ids and its values in PHP you could go with:
$string = "There are three ids: id=10 and id=12 and id=100";
preg_match_all("/id=(\d+)/", $string, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => id=10
[1] => id=12
[2] => id=100
)
[1] => Array
(
[0] => 10
[1] => 12
[2] => 100
)
)
Note: In many regex flavors you need the /g modifier to match all occurrences. PHP doesn't support it, but provides a separate function for that, preg_match_all. All you need to do is remove the g from the regex.

How can I split a string into LETTERS and FLOAT/INTEGER numbers

I've been trying for a couple of days to split a string into letters and numbers. I've found various solutions, but they don't do what I need: some of them only separate letters from digits, not from integers, floats or negative numbers.
Here's an example:
$input = '-4D-3A'; // edit: the TEXT part can have multiple chars, i.e. -4AB-3A-5SD
$result = preg_split('/(?<=\d)(?=[a-z])|(?<=[a-z])(?=\d)/i', $input);
print_r($result);
Result:
Array ( [0] => -4 [1] => D-3 [2] => A )
And I need it to be [0] => -4 [1] => D [2] => -3 [3] => A
I've tried doing several changes but no result so far, could you please help me if possible?
Thank you.
Try this:
$input = '-4D-3A';
// PREG_SPLIT_NO_EMPTY drops empty pieces; array_filter() would also drop a literal "0"
$result = preg_split('/(-?[0-9]+\.?[0-9]*)/', $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($result);
It splits on the numbers but also captures the delimiter (the number),
giving: Array ( [0] => -4 [1] => D [2] => -3 [3] => A )
I've patterned the number as:
1. has optional negative sign (you may want to do + too)
2. followed by one or more digits
3. followed by an optional decimal point
4. followed by zero or more digits
Can anyone point out how to stop "-0." from being accepted as a valid number?
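One way to stop a trailing dot from being accepted is to make the decimal point and the digits after it a single optional group, so the dot only matches when at least one digit follows. A minimal sketch under that assumption (the helper name splitParts and the sample inputs are hypothetical):

```php
<?php
// Hypothetical helper: split a string into numbers and the text between them.
function splitParts(string $input): array
{
    // (?:\.[0-9]+)? accepts a fraction only when digits follow the dot.
    return preg_split(
        '/(-?[0-9]+(?:\.[0-9]+)?)/',
        $input,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
}

print_r(splitParts('-4.5AB-3A'));
print_r(splitParts('-0.'));
```

With this pattern "-0." is split into the number -0 and a leftover dot, instead of being accepted as one number.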
How about this regex? ([-]{,1}\d+|[a-zA-Z]+)
I tested it on http://www.rubular.com/ and it seems to work as you want.
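Rubular tests Ruby regexes, and {,1} is a Ruby-style quantifier; older PCRE versions treat it as literal text, so in PHP the optional minus is more safely written as -?. A hedged PHP sketch of the same idea using preg_match_all, where the optional fraction part is an addition that is not in the regex above:

```php
<?php
$input = '-4AB-3A-5SD';

// -? is the portable spelling of "optional minus" in PCRE.
// (?:\.\d+)? is an added assumption so that floats such as -4.5 stay whole.
preg_match_all('/-?\d+(?:\.\d+)?|[a-zA-Z]+/', $input, $m);

print_r($m[0]);
```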
