Two or more matches in expression - php

Is it possible to make two matches of text - /123/123/123?edit
I need to match 123, 123 ,123 and edit
For the first(123,123,123): pattern is - ([^\/]+)
For the second(edit): pattern is - ([^\?=]*$)
Is it possible to match in one preg_match_all function, or I need to do it twice - one time for one pattern, second one for second?
Thanks !

You can do this with a single preg_match_all call:
$string = '/123/123/123?edit';
$matches = array();
preg_match_all('#(?<=[/?])\w+#', $string, $matches);
/* $matches will be:
Array
(
[0] => Array
(
[0] => 123
[1] => 123
[2] => 123
[3] => edit
)
)
*/
See this in action at http://www.ideone.com/eb2dy
The pattern ((?<=[/?])\w+) uses a lookbehind to assert that either a slash or a question mark must precede a sequence of word characters (\w is a shorthand class equivalent to [a-z0-9_]).

Related

Get all matches of repeating subgroup [duplicate]

I'm trying to get all substrings matched with a multiplier:
$list = '1,2,3,4';
preg_match_all('|\d+(,\d+)*|', $list, $matches);
print_r($matches);
This example returns, as expected, the last match in [1]:
Array
(
[0] => Array
(
[0] => 1,2,3,4
)
[1] => Array
(
[0] => ,4
)
)
However, I would like to get all strings matched by (,\d+), to get something like:
Array
(
[0] => ,2
[1] => ,3
[2] => ,4
)
Is there a way to do this with a single function such as preg_match_all()?
According to Kobi (see comments above):
PHP has no support for captures of the same group
Therefore this question has no solution.
It's true that PHP (or better to say PCRE) doesn't store values of repeated capturing groups for later access (see PCRE docs):
If a capturing subpattern is matched repeatedly, it is the last portion of the string that it matched that is returned.
But in most cases the known token \G does the job. \G 1) matches the beginning of input string (as \A or ^ when m modifier is not set) or 2) starts match from where the previous match ends. Saying that, you have to use it like the following:
preg_match_all('/^\d+|\G(?!^)(,?\d+)\K/', $list, $matches);
See live demo here
or if capturing group doesn't matter:
preg_match_all('/\G,?\d+/', $list, $matches);
by which $matches will hold this (see live demo):
Array
(
[0] => Array
(
[0] => 1
[1] => ,2
[2] => ,3
[3] => ,4
)
)
Note: the benefit of using \G over the other answers (like explode() or lookbehind solution or just preg_match_all('/,?\d+/', ...)) is that you are able to validate the input string to be only in the desired format ^\d+(,\d+)*$ at the same time while exporting the matches:
preg_match_all('/(?:^(?=\d+(?:,\d+)*$)|\G(?!^),)\d+/', $list, $matches);
Using lookbehind is a way to do the job:
$list = '1,2,3,4';
preg_match_all('|(?<=\d),\d+|', $list, $matches);
print_r($matches);
All the ,\d+ are in group 0.
output:
Array
(
[0] => Array
(
[0] => ,2
[1] => ,3
[2] => ,4
)
)
Splitting is only an option when the character to split isn't used in the patterns to match itself.
I had a situation where a badly formatted comma separated line has to be parsed into any of a number of known options.
i.e. options '1,2', '2', '2,3'
subject '1,2,3'.
Splitting on ',' will result in '1', '2', and '3'; only one ('2') of which is a valid match, this happens because the separator is also part of the options.
The naïve regex would be something like '~^(1,2|2|2,3)(?:,(1,2|2|2,3))*$~i', but this runs into the problem of same-group captures.
My "solution" was to just expand the regex to match the maximum number of matches possible:
'~^(1,2|2|2,3)(?:,(1,2|2|2,3))?(?:,(1,2|2|2,3))?$~i'
(if more options were available, just repeat the '(?:,(1,2|2|2,3))?' bit.
This does result in empty string results for "unused" matches.
It's not the cleanest solution, but works when you have to deal with badly formatted input data.
Why not just:
$ar = explode(',', $list);
print_r($ar);
From http://www.php.net/manual/en/regexp.reference.repetition.php :
When a capturing subpattern is repeated, the value captured is the substring that matched the final iteration.
Also similar thread:
How to get all captures of subgroup matches with preg_match_all()?

Regular expression to extract a numeric value on a changing position within a variable string

How can I extract the bold numeric part of a string, when the most of the string can change? /data/ is always present and followed by the relevant, variable, numeric part (in this case 123456).
differentcontentLocationhttps://example.com/api/result/13548/data/123456differentstuffincludingwhitespacesandnewlines8484
$str = "differentcontentLocationhttps://example.com/api/result/13548/data/123456differentstuffincludingwhitespacesandnewlines8484";
$str2 = "differentcontentLocationhttps://example.com/api/result/13548/data/123456";
In this example I need 123456. The only constant parts in the string are /data/ and maybe the first part of the URL, like https://.
preg_match("#/data/([0-9]+)([^0-9]+)#siU", $str, $matches);
Results in Array ( [0] => /data/123456d [1] => 123456 [2] => d ), what would be acceptable. But if there's nothing following the relevant numeric part, like in $str2, this expression fails. I've tried to make the tailing part optional with preg_match("#/ads/([0-9]+)(([^0-9]+)?)#siU", $x, $matches);, but it fails, too; returning only the first number of the numeric part.
The U greediness swapping modifier makes all greedy subpattern lazy here, you should remove it together with ([^0-9]+). You also do not need DOTALL modifier because there is no . in your pattern whose behavior could be modified with that s flag.
preg_match("#/data/([0-9]+)#i", $str, $matches);
Now, the pattern will match:
/data/ - a sequence of literal chars
([0-9]+) - Group 1 capturing 1+ digits (same as (\d+))
See the PHP demo.
$str = "differentcontentLocationhttps://e...content-available-to-author-only...e.com/api/result/13548/data/123456differentstuffincludingwhitespacesandnewlines8484";
$str2 = "differentcontentLocationhttps://e...content-available-to-author-only...e.com/api/result/13548/data/123456";
preg_match("#/data/([0-9]+)#i", $str, $matches);
print_r($matches); // Array ( [0] => /data/123456 [1] => 123456 )
preg_match("#/data/([0-9]+)#i", $str2, $matches2);
print_r($matches2); // Array ( [0] => /data/123456 [1] => 123456 )

Using regex to not match periods between numbers

I have a regex code that splits strings between [.!?], and it works, but I'm trying to add something else to the regex code. I'm trying to make it so that it doesn't match [.] that's between numbers. Is that possible? So, like the example below:
$input = "one.two!three?4.000.";
$inputX = preg_split("~(?>[.!?]+)\K(?!$)~", $input);
print_r($inputX);
Result:
Array ( [0] => one. [1] => two! [2] => three? [3] => 4. [4] => 000. )
Need Result:
Array ( [0] => one. [1] => two! [2] => three? [3] => 4.000. )
You should be able to split on this:
(?<=(?<!\d(?=[.!?]+\d))[.!?])(?![.!?]|$)
https://regex101.com/r/kQ6zO4/1
It uses lookarounds to determine where to split. It looks behind to try to match anything in the set [.!?] one or more times as long as it isn't preceded by and succeeded by a digit.
It also won't return the last empty match by ensuring the last set isn't the end of the string.
UPDATE:
This should be much more efficient actually:
(?!\d+\.\d+).+?[.!?]+\K(?!$)
https://regex101.com/r/eN7rS8/1
Here is another possibility using regex flags:
$input = "one.two!three???4.000.";
$inputX = preg_split("~(\d+\.\d+[.!?]+|.*?[.!?]+)~", $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($inputX);
It includes the delimiter in the split and ignores empty matches. The regex can be simplified to ((?:\d+\.\d+|.*?)[.!?]+), but I think what is in the code sample above is more efficient.

PHP preg_match_all $matches output contains 3 rows

Here is my test code:
$test = '#12345 abc #12 #abd engng#geneng';
preg_match_all('/(^|\s)#([^# ]+)/', $test, $matches);
print_r($matches);
And the output $matches:
Array ( [0] => Array ( [0] => #12345 [1] => #12 [2] => #abd ) [1] => Array ( [0] => [1] => [2] => ) [2] => Array ( [0] => 12345 [1] => 12 [2] => abd ) )
My question is why does it have an empty row?
[1] => Array ( [0] => [1] => [2] => )
If I get ride of (^|\s) in the regex, the second row will disappear. However I would not able to prevent matching #geneng.
Any answer will be appreciated.
The problem with your regular expression is that it matches # even when it is preceded by whitespace. Because \s will match the whitespace, it will be captured into $matches array. You can solve this problem by using lookarounds. In this case, it can be solved with a positive lookbehind:
preg_match_all('/(?<=^|\s)#([^# ]+)/', $test, $matches);
This will match the part after # only if it is preceded by a space or beginning-of-the line anchor. It's important to note that lookarounds do not actually consume characters. They just assert that the given regular expression is either followed or preceded by something.
Demo
It's because of the memory capture to test (^|\s):
preg_match_all('/(^|\s)#([^# ]+)/', $test, $matches);
^^^^^^
It's captured as memory location #1, so to avoid that you can simply use non-capturing parentheses:
preg_match_all('/(?:^|\s)#([^# ]+)/', $test, $matches);
^^
preg_match_all uses by default the PREG_PATTERN_ORDER flag. This means that you will obtain:
$matches[0] -> all substrings that matches the whole pattern
$matches[1] -> all capture groups 1
$matches[2] -> all capture groups 2
etc.
You can change this behavior using the PREG_SET_ORDER flag:
$matches[0] -> array with the whole pattern and the capture groups for the first result
$matches[1] -> same for the second result
$matches[2] -> etc.
In your code you (PREG_PATTERN_ORDER by default) you obtain $matches[1] with only empty or blank items because it is the content of capture group 1 (^|\s)
There is 2 set of parentheses that's why you get an empty row. PHP thinks, you want 2 set of matching in the string. Removing one of them will remove one array.
FYI: In this case, you can not use [^|\s] instead of (^|\s). Cause PHP will think, you want to exclude the white space.

Regular expersion repeat inside a pattern

I have the following text and I would like to preg_match_all what is within the {'s and }'s if it contains only a-zA-Z0-9 and :
some text,{SOMETHING21} {SOMETHI32NG:MORE}some msdf{TEXT:GET:2}sdfssdf sdf sdf
I am trying to match {SOMETHING21} {SOMETHI32NG:MORE} {TEXT:GET:2} there can be several :'s within the tag.
What I currently have is:
preg_match_all('/\{([a-zA-Z0-9\-]+)(\:([a-zA-Z0-9\-]+))*\}/', $from, $matches, PREG_SET_ORDER);
It works as expected for {SOMETHING21} and {SOMETHI32NG:MORE} but for {TEXT:GET:2} it only matches TEXT and 2
So it only matches the first and last word within the tag, and leaves the middle ones out of the $matches array. Is this even possible or should I just match them and then explode on : ?
-- edit --
Well the question isn't if I can get the tags, the question is if I can get them grouped without having to explode the results again. Even though my current regex finds all the results the subpattern does not come back with all the matches in $matches.
I hope the following will clear it up abit more:
\{ // the match has to start with {
([a-zA-Z0-9\-]+) // after the { the match needs to have alphanum consisting out of 1 or more characters
(
\: // if we have : it should be followed by alphanum consisting out of 1 or more characters
([a-zA-Z0-9\-]+) // <---- !! this is what it is about !! even though this subexpression is between brackets it is not put into $matches if more then one of these is found
)* // there could be none or more of the previous subexpression
\} // the match has to end with }
You can't get all the matched values of a capturing group, you only get the last one.
So you have to match the pattern:
preg_match_all('/{([a-z\d-]+(?::[a-z\d-]+)*)}/i', $from, $matches);
and then split each element in $matches[1] on :.
I used non-capture groupings to eliminate the inner groups, and just capture the outer complete colon-separated list.
$from = "some text,{SOMETHING21} {SOMETHI32NG:MORE}some msdf{TEXT:GET:2}sdfssdf sdf sdf";
preg_match_all('/\{((?:[a-zA-Z0-9\-]+)(?:\:(?:[a-zA-Z0-9\-]+))*)\}/', $from, $matches, PREG_SET_ORDER);
print_r($matches);
Result:
Array
(
[0] => Array
(
[0] => {SOMETHING21}
[1] => SOMETHING21
)
[1] => Array
(
[0] => {SOMETHI32NG:MORE}
[1] => SOMETHI32NG:MORE
)
[2] => Array
(
[0] => {TEXT:GET:2}
[1] => TEXT:GET:2
)
)
Maybe I didn't understand the requirement, but...
preg_match_all('/{[A-Za-z0-9:-]+}/', $from, $matches, PREG_PATTERN_ORDER);
results in:
Array
(
[0] => Array
(
[0] => {SOMETHING21}
[1] => {SOMETHI32NG:MORE}
[2] => {TEXT:GET:2}
)
)

Categories