Unexpected preg_match result from pattern with "?:" - php

I tried this pattern:
(?:(\d+)\/|)reports\/(\d+)-([\w-]+).html
on this string (preg_match with modifiers "Axu"):
reports/683868-derger-gergewrger.html
and I expected this result (https://regex101.com/r/kX6yZ5/1):
[1] => 683868
[2] => derger-gergewrger
But I get this:
[1] =>
[2] => 683868
[3] => derger-gergewrger
Why? Where does the empty value at index 1 come from? The group is marked "?:", so it should not capture.
I have two cases:
"reports/683868-derger-gergewrger.html"
"757/reports/683868-derger-gergewrger.html"
In the first case I need two captures, but in the second case I need three.

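The ?: only makes the outer group non-capturing; the (\d+) nested inside it is still capturing group 1, and it comes back empty when the optional 757/ prefix is absent. If you don't need that number, drop the inner capture: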
preg_match('~(?:\d+/)?reports/(\d+)-([\w-]+)\.html~',
'reports/683868-derger-gergewrger.html', $m);
print_r($m);
Array
(
[0] => reports/683868-derger-gergewrger.html
[1] => 683868
[2] => derger-gergewrger
)
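A quick way to see where the empty value comes from is to run the asker's pattern directly. This sketch escapes the dot and leaves out the "Axu" modifiers for brevity; it also assumes PHP 7.2+ for the PREG_UNMATCHED_AS_NULL flag, which reports a group that took no part in the match as null rather than as an empty string.

```php
<?php
// Asker's pattern, with the dot escaped. "(?:" only makes the OUTER group
// non-capturing; the "(\d+)" nested inside it is still capturing group 1.
$pattern = '~(?:(\d+)/|)reports/(\d+)-([\w-]+)\.html~';
$subject = 'reports/683868-derger-gergewrger.html';

preg_match($pattern, $subject, $m);
var_dump($m[1]); // string(0) "": group 1 exists but matched nothing

// PREG_UNMATCHED_AS_NULL (PHP 7.2+) marks non-participating groups as null
preg_match($pattern, $subject, $m2, PREG_UNMATCHED_AS_NULL);
var_dump($m2[1]); // NULL
```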
EDIT: You probably want this behavior:
$s = '757/reports/683868-derger-gergewrger.html';
preg_match('~(?|(\d+)/reports/(\d+)-([\w-]+)\.html|reports/(\d+)-([\w-]+)\.html)~',
$s, $m); print_r($m);
Array
(
[0] => 757/reports/683868-derger-gergewrger.html
[1] => 757
[2] => 683868
[3] => derger-gergewrger
)
and:
$s = 'reports/683868-derger-gergewrger.html';
preg_match('~(?|(\d+)/reports/(\d+)-([\w-]+)\.html|reports/(\d+)-([\w-]+)\.html)~',
$s, $m); print_r($m);
Array
(
[0] => reports/683868-derger-gergewrger.html
[1] => 683868
[2] => derger-gergewrger
)
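(?|...) is a branch reset group. Like (?:...), it does not capture by itself, but the capturing subpatterns in each alternative start numbering from the same index, so 757 or 683868 lands in group 1 depending on which alternative matched.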
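The two cases from the question can be wrapped in a small function built on the branch-reset pattern above. parseReportPath is a hypothetical name used only for illustration; it returns the captured values without the full match, or null when the path fits neither shape.

```php
<?php
// Hypothetical helper; the pattern is the branch-reset one from the answer.
function parseReportPath(string $path): ?array
{
    $re = '~(?|(\d+)/reports/(\d+)-([\w-]+)\.html|reports/(\d+)-([\w-]+)\.html)~';
    if (!preg_match($re, $path, $m)) {
        return null; // not a report path
    }
    // Drop the full match ($m[0]) and keep only the numbered captures.
    return array_slice($m, 1);
}

print_r(parseReportPath('reports/683868-derger-gergewrger.html'));
print_r(parseReportPath('757/reports/683868-derger-gergewrger.html'));
```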

Related

Array named capture using PHP regex

If named capture matches multiple times, is it possible to retrieve all matches?
Example
<?php
$string = 'TextToMatch [some][random][tags] SomeMoreMatches';
$pattern = "!(TextToMatch )(?P<tags>\[.+?\])+( SomeMoreMatches)!";
preg_match($pattern, $string, $matches);
print_r($matches);
Which results in
Array
(
[0] => TextToMatch [some][random][tags] SomeMoreMatches
[1] => TextToMatch
[tags] => [tags]
[2] => [tags]
[3] => SomeMoreMatches
)
Is it possible to get something like
Array
(
[0] => TextToMatch [some][random][tags] SomeMoreMatches
[1] => TextToMatch
[tags] => Array
(
[0] => [some]
[1] => [random]
[2] => [tags]
)
[2] => Array
(
[0] => [some]
[1] => [random]
[2] => [tags]
)
[3] => SomeMoreMatches
)
using only preg_match?
I am aware that I can explode the tags, but I wonder if I can do this with preg_match (or a similar function) only.
Other example
$input = "Some text [many][more][other][tags][here] and maybe some text here?";
Desired output
Array
(
[0] => Some text [many][more][other][tags][here] and maybe some text here?
[1] => Some text
[tags] => Array
(
[0] => [many]
[1] => [more]
[2] => [other]
[3] => [tags]
[4] => [here]
)
[2] => Array
(
[0] => [many]
[1] => [more]
[2] => [other]
[3] => [tags]
[4] => [here]
)
[3] => and maybe some text here?
)
You need to use preg_match_all and modify the regex:
preg_match_all('/(?P<tags>\[.+?\])/', $string, $matches);
Remove the + after the closing parenthesis so the group matches one tag at a time; preg_match_all then performs a global search and collects every tag.
If you need the specific structure that you posted, try:
$string = '[some][random][tags]';
$pattern = "/(?P<tags>\[.+?\])/";
preg_match_all($pattern, $string, $matches);
$matches = [
implode($matches['tags']), end($matches['tags'])
] + $matches;
print_r($matches);
You get:
Array
(
[0] => [some][random][tags]
[1] => [tags]
[tags] => Array
(
[0] => [some]
[1] => [random]
[2] => [tags]
)
)
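Applied to the asker's full original string, the single-tag pattern from the preg_match_all answer collects every tag under both the named key and its numeric twin. A minimal sketch:

```php
<?php
$string = 'TextToMatch [some][random][tags] SomeMoreMatches';

// One tag per match (no "+" after the group); preg_match_all() repeats the search.
preg_match_all('/(?P<tags>\[.+?\])/', $string, $matches);

print_r($matches['tags']);
```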
Since you stated in your comments that you are not actually interested in the leading substring before the set of tags, and because you stated that you don't necessarily need the named capture group (I never use them), you really only need to remove the first bit, split the string on the space after the set of tags, then split each tag in the set of tags.
Code: (Demo)
// strstr() trims off the leading substring; the limit of 2 tells explode() to stop after making 2 elements
$split = explode(' ', strstr($input, '['), 2);
var_export($split);
Produces:
array (
0 => '[many][more][other][tags][here]',
1 => 'and maybe some text here?',
)
Then the most direct way to split those square-bracketed tags is to use the zero-width position right after each closing bracket (]), i.e. between each ] and the following [. Since only regex can isolate these positions as delimiters, I'll suggest preg_split(). The split after the final ] produces an empty trailing piece, which PREG_SPLIT_NO_EMPTY discards.
// \K releases/forgets the previously matched "]", so each split happens right after it
$split[0] = preg_split('~]\K~', $split[0], -1, PREG_SPLIT_NO_EMPTY);
var_export($split);
This is the final output:
array (
0 =>
array (
0 => '[many]',
1 => '[more]',
2 => '[other]',
3 => '[tags]',
4 => '[here]',
),
1 => 'and maybe some text here?',
)
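The strstr()/explode()/preg_split() steps above can be bundled into one function. splitTagsAndTail is a hypothetical helper name, and returning an empty tag list when there is no [ at all is an assumption about the desired behavior, not something the answer specifies.

```php
<?php
// Hypothetical helper that bundles the two steps from the answer above.
function splitTagsAndTail(string $input): array
{
    $fromFirstTag = strstr($input, '[');
    if ($fromFirstTag === false) {
        return [[], $input]; // no tags at all
    }
    // Step 1: separate the tag run from the trailing text at the first space.
    $parts = explode(' ', $fromFirstTag, 2);
    // Step 2: split the tag run right after every "]".
    $tags = preg_split('~]\K~', $parts[0], -1, PREG_SPLIT_NO_EMPTY);
    return [$tags, $parts[1] ?? ''];
}

var_export(splitTagsAndTail('Some text [many][more][other][tags][here] and maybe some text here?'));
```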
No; as Wiktor stated, this is not possible using only preg_match.
A solution that just works:
<?php
$string = 'TextToMatch [some][random][tags] SomeMoreMatches';
$pattern = "!(TextToMatch )(?P<tags>\[.+?\]+)( SomeMoreMatches)!";
preg_match($pattern, $string, $matches);
// Strip the outer brackets, split on "][", then put the brackets back on each tag
$matches[2] = $matches["tags"] = array_map(function($s){return "[$s]";}, explode("][", substr($matches["tags"],1,-1)));
print_r($matches);
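A variant of the same idea replaces the substr()/explode() step with a second regex call: capture the tag run once with preg_match(), then split it with preg_match_all(). This is a sketch rather than the answerer's code, and the stricter \[[^\]]*\] tag pattern assumes tags never contain a ] of their own.

```php
<?php
$string = 'TextToMatch [some][random][tags] SomeMoreMatches';

// Step 1: capture the whole run of bracketed tags once with preg_match().
$pattern = '!(TextToMatch )(?P<tags>(?:\[[^\]]*\])+)( SomeMoreMatches)!';
if (preg_match($pattern, $string, $matches)) {
    // Step 2: split that run into individual tags with preg_match_all().
    preg_match_all('/\[[^\]]*\]/', $matches['tags'], $tagMatches);
    // Mirror the requested layout: the named key and numeric key 2 hold the same array.
    $matches[2] = $matches['tags'] = $tagMatches[0];
}
print_r($matches);
```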

A regular expression to match a single character followed by numbers and ends with 'c'

I need a regex for the following pattern:
a single character from [e-g] followed by one or more digits, repeated, with each run ending in the character 'c'.
For example:
e123f654g933c
expected result:
Array
(
[0] => Array
(
[0] => e123
[1] => f654
[2] => g933
)
)
or
e123f654g933ce99f77g66c
expected result:
Array
(
[0] => Array
(
[0] => e123
[1] => f654
[2] => g933
)
[1] => Array
(
[0] => e99
[1] => f77
[2] => g66
)
)
I tried the following, but I don't know what to do with the 'c' part.
I used ([e-g]{1}[0-9]{1,}c)+ but it fails.
$subject="e123f654g933ce99f786g776c";
preg_match_all('/[e-g]{1}[0-9]{1,}/', $subject, $match);
print '<pre>' . print_r($match,1) . '</pre>';
Array
(
[0] => Array
(
[0] => e123
[1] => f654
[2] => g933
[3] => e99
[4] => f786
[5] => g776
)
)
thanks.
I couldn't manage to generate your multi dimensional output array via a single regex function call.
Code (Demo)
$strings = [
'e123f654g933c',
'e123f654g933ce99f77g66c'
];
foreach ($strings as $string) {
var_export(
array_map(
function($v) {
return preg_match_all('/[e-g]\d+/', $v, $out2) ? $out2[0] : []; // split the groups by string format
// or return preg_split('/\d+\K/', $v, 0, PREG_SPLIT_NO_EMPTY);
// or return preg_split('/(?=[e-g])/', $v, 0, PREG_SPLIT_NO_EMPTY);
},
preg_match_all('/(?:[e-g]\d+)+(?=c)/', $string, $out1) ? $out1[0] : [] // split into groups using c
// or explode('c', rtrim($string, 'c'))
// or array_slice(explode('c', $string), 0, -1)
// or preg_split('/c/', $string, 0, PREG_SPLIT_NO_EMPTY)
)
);
echo "\n\n";
}
Output:
array (
0 =>
array (
0 => 'e123',
1 => 'f654',
2 => 'g933',
),
)
array (
0 =>
array (
0 => 'e123',
1 => 'f654',
2 => 'g933',
),
1 =>
array (
0 => 'e99',
1 => 'f77',
2 => 'g66',
),
)
It seems you are looking for
[e-g]\d+
This needs to be matched and extracted in PHP like so...
<?php
$strings = ['e123f654g933c', 'e123f654g933ce99f77g66c'];
$regex = '~[e-g]\d+~';
foreach ($strings as $string) {
if (preg_match_all($regex, $string, $matches)) {
print_r($matches[0]);
}
}
?>
... and yields
Array
(
[0] => e123
[1] => f654
[2] => g933
)
Array
(
[0] => e123
[1] => f654
[2] => g933
[3] => e99
[4] => f77
[5] => g66
)
You may use
'~(?:\G(?!^)|(?=(?:[e-g]\d+)+c))[e-g]\d+~'
See the regex demo. In short, thanks to the (?:\G(?!^)|(?=(?:[e-g]\d+)+c)) part, [e-g]\d+ only matches inside a run of one or more [e-g]\d+ tokens that is followed by c.
Details
(?:\G(?!^)|(?=(?:[e-g]\d+)+c)) - match either the end of the previous successful match (\G(?!^)) or (|) a position followed by one or more occurrences of an e, f or g letter plus 1+ digits and then a c (the (?=(?:[e-g]\d+)+c) positive lookahead)
[e-g]\d+ - an e, f or g letter followed with 1+ digits
PHP demo:
$re = '/(?:\G(?!^)|(?=(?:[e-g]\d+)+c))[e-g]\d+/';
$str = 'e123f654g933c and e123f654g933ce99f77g66c';
preg_match_all($re, $str, $matches);
print_r($matches[0]);
// => Array ( [0] => e123 [1] => f654 [2] => g933 [3] => e123 [4] => f654 [5] => g933 [6] => e99 [7] => f77 [8] => g66 )
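To see the lookahead doing its job, the sketch below feeds the same pattern a string whose second run of tokens is not terminated by c; only the tokens from the terminated run should come back. The input string is made up for illustration.

```php
<?php
$re = '/(?:\G(?!^)|(?=(?:[e-g]\d+)+c))[e-g]\d+/';

// "e3f4" at the end is NOT followed by "c", so the lookahead never lets it start,
// and \G(?!^) cannot chain into it because the previous match ended before "c".
preg_match_all($re, 'e1f2c e3f4', $matches);
print_r($matches[0]);
```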
You can't easily achieve this with a single RegExp.
The solution is to split the string at the occurrences of 'c', handle the parts separately, and then build the result array:
<?php
$strings = [
'e123f654g933c',
'e123f654g933ce99f77g66c',
];
foreach ($strings as $input)
{
print_r(matchGroups($input));
}
// Renamed from match(): "match" is a reserved keyword since PHP 8.0
function matchGroups($input)
{
$result = [];
$parts = array_filter(explode('c', $input));
foreach ($parts as $part)
{
preg_match_all('~[e-g]\d+~', $part, $matches);
$result[] = $matches[0];
}
return $result;
}
The output will be
Array
(
[0] => Array
(
[0] => e123
[1] => f654
[2] => g933
)
)
Array
(
[0] => Array
(
[0] => e123
[1] => f654
[2] => g933
)
[1] => Array
(
[0] => e99
[1] => f77
[2] => g66
)
)

Split __HELLO____HAPPY_BIRTHDAY__ to __HELLO__ and __HAPPY_BIRTHDAY__

I have some php code like this:
$input = "
__HELLO__
__HAPPY_BIRTHDAY__
__HELLO____HAPPY_BIRTHDAY__";
preg_match_all('/__(\w+)__/', $input, $matches);
print_r($matches[0]);
Currently the result of $matches[0] is this:
Array
(
[0] => __HELLO__
[1] => __HAPPY_BIRTHDAY__
[2] => __HELLO____HAPPY_BIRTHDAY__
)
As you can see my regex is interpreting __HELLO____HAPPY_BIRTHDAY__ as one match, which I don't want.
I want the matches to return this:
Array
(
[0] => __HELLO__
[1] => __HAPPY_BIRTHDAY__
[2] => __HELLO__
[3] => __HAPPY_BIRTHDAY__
)
Where __HELLO____HAPPY_BIRTHDAY__ is split into __HELLO__ and __HAPPY_BIRTHDAY__. How can I do this?
(Between the outer double underscores there will only ever be single underscores, e.g. __HAPPY__BIRTHDAY__ is illegal.)
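You need to use the U (ungreedy) modifier. It inverts the greediness of quantifiers, making them "lazy" by default, so \w+ stops at the first __ that lets the rest of the pattern match.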
$input = "
__HELLO__
__HAPPY_BIRTHDAY__
__HELLO____HAPPY_BIRTHDAY__";
preg_match_all('/__(\w+)__/U', $input, $matches);
print_r($matches[0]);
Output:
Array
(
[0] => __HELLO__
[1] => __HAPPY_BIRTHDAY__
[2] => __HELLO__
[3] => __HAPPY_BIRTHDAY__
)
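If you prefer not to flip greediness for the whole pattern, the same effect can be had by making just this one quantifier lazy with \w+? and dropping the U modifier. A minimal sketch on the question's input:

```php
<?php
$input = "
__HELLO__
__HAPPY_BIRTHDAY__
__HELLO____HAPPY_BIRTHDAY__";

// "\w+?" is the per-quantifier lazy form; here it behaves like "\w+" under /U.
preg_match_all('/__(\w+?)__/', $input, $matches);
print_r($matches[0]);
```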

What's wrong with this regex expression?

I want to preg_match the following string:
{{{/foo:bar/a/0/b}}}
This is my regex (which doesn't work, and I don't understand why):
|{{{\/([[:alpha:]][[:alnum:]\_]*\:[[:alpha:]][[:alnum:]\_]*)(?:\/([[:alnum:]\_]*))+}}}|Uism
Expected result:
Array (
[0] => Array
(
[0] => {{{/foo:bar/a/0/b}}}
)
[1] => Array
(
[0] => foo:bar
)
[2] => Array
(
[0] => a
)
[3] => Array
(
[0] => 0
)
[4] => Array
(
[0] => b
)
)
The result i get:
Array (
[0] => Array
(
[0] => {{{/foo:bar/a/0/b}}}
)
[1] => Array
(
[0] => foo:bar
)
[2] => Array
(
[0] => b
)
)
I only get the last element back. So what's wrong with it?
You're repeating the second capturing group:
(?:
\/
(
[[:alnum:]\_]*
)
)+
On each repetition of the outer non-capturing group, the contents of the inner capturing group are overwritten, which is why only the last match is preserved. This is standard PCRE behavior, shared by most regex engines: a repeated capturing group reports only its last iteration.
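The overwrite behavior is easy to reproduce on a smaller subject. The sketch below (the subject string is made up for illustration) contrasts a repeated group in preg_match() with running a single-segment pattern through preg_match_all(), which keeps every segment:

```php
<?php
$subject = '/a/0/b';

// A capturing group inside a repeated group keeps only its last iteration.
preg_match('~(?:/(\w+))+~', $subject, $m);
var_dump($m[1]); // string(1) "b"

// preg_match_all() applies the single-segment pattern repeatedly instead.
preg_match_all('~/(\w+)~', $subject, $all);
print_r($all[1]);
```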
(?=(^.*$)|(?:\/(.*?)(?:\/|})))
Try this. See demo:
http://regex101.com/r/lS5tT3/3
Each subsequent match of the same capture group will overwrite the previous one; that's why you end up with just b.
What I would suggest in this case is to match the whole block first and then use a simpler explode() to dig out the inner data; use this expression:
|{{{\/([[:alpha:]][[:alnum:]\_]*\:[[:alpha:]][[:alnum:]\_]*(?:\/[[:alnum:]\_]*)+)}}}|U
Then, with the resulting $matches array (third argument to preg_match()):
$data = explode('/', $matches[1]);
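Put together, the match-then-explode approach looks roughly like this on the question's input. The variable names are illustrative:

```php
<?php
$str = '{{{/foo:bar/a/0/b}}}';

// Capture the whole slash-separated run once, as a single group.
$pattern = '|{{{\/([[:alpha:]][[:alnum:]\_]*\:[[:alpha:]][[:alnum:]\_]*(?:\/[[:alnum:]\_]*)+)}}}|U';

if (preg_match($pattern, $str, $matches)) {
    // Then split the captured run on "/" to recover every segment.
    $data = explode('/', $matches[1]);
    print_r($data);
}
```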
Your pattern is complete overkill for something that should be quite simple (note that this version hard-codes exactly three single-character segments after foo:bar):
$rex = "#[{]{3}/(\w+:\w+)/(\w)/(\d)/(\w)[}]{3}#";
$str = "{{{/foo:bar/a/0/b}}}";
preg_match($rex, $str, $res);
Result:
Array
(
[0] => {{{/foo:bar/a/0/b}}}
[1] => foo:bar
[2] => a
[3] => 0
[4] => b
)

How can I split a list with multiple delimiters?

Basically, I want to enter text into a textarea and then use the values. For example:
variable1:variable2#variable3
variable1:variable2#variable3
variable1:variable2#variable3
I know I could use explode to split the input into lines and then use a foreach loop to handle each line separately, but how would I separate the three variables on each line?
Besides preg_split:
$line = 'variable11:variable12#variable13';
print_r(preg_split('/[:#]/', $line));
/*
Array
(
[0] => variable11
[1] => variable12
[2] => variable13
)
*/
you could do a preg_match_all:
$text = 'variable11:variable12#variable13
variable21:variable22#variable23
variable31:variable32#variable33';
preg_match_all('/([^\r\n:]+):([^\r\n#]+)#(.*)\s*/', $text, $matches, PREG_SET_ORDER);
print_r($matches);
/*
Array
(
[0] => Array
(
[0] => variable11:variable12#variable13
[1] => variable11
[2] => variable12
[3] => variable13
)
[1] => Array
(
[0] => variable21:variable22#variable23
[1] => variable21
[2] => variable22
[3] => variable23
)
[2] => Array
(
[0] => variable31:variable32#variable33
[1] => variable31
[2] => variable32
[3] => variable33
)
)
*/
Try preg_split: http://php.net/manual/en/function.preg-split.php
If necessary, you could also make several calls to explode:
http://jp.php.net/manual/en/function.explode.php
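The several-explode() route could look like the sketch below. parseLines is a hypothetical helper; normalizing \r\n line endings (textarea submissions typically use CRLF) and silently skipping malformed lines are assumptions you may want to change.

```php
<?php
// Hypothetical helper: parse "a:b#c" lines from textarea input using only explode().
function parseLines(string $text): array
{
    $rows = [];
    // Normalize Windows line endings, then split the input into lines.
    foreach (explode("\n", str_replace("\r\n", "\n", $text)) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        $first = explode(':', $line, 2);      // ["variable1", "variable2#variable3"]
        if (count($first) !== 2) {
            continue; // no ":" on this line
        }
        $second = explode('#', $first[1], 2); // ["variable2", "variable3"]
        if (count($second) !== 2) {
            continue; // no "#" after the ":"
        }
        $rows[] = [$first[0], $second[0], $second[1]];
    }
    return $rows;
}

print_r(parseLines("variable11:variable12#variable13\r\nvariable21:variable22#variable23"));
```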
