preg_split and multiple delimiters - php

Let me start by saying that the number before the first - is the ID I need to extract. The text from the first - to the first / is the 'name' I need to extract. I do not care about anything after that.
Test String:
1-gc-communications/edit/profile_picture
Expected Output:
Array ( [0] => 1 [1] => gc-communications [2] => /edit/profile_picture )
The best I could come up with was the following patterns (along with their results - with a limit of 3)
Pattern: /-|edit\/profile_picture/
Result: Array ( [0] => 1 [1] => gc [2] => communications/edit/profile_picture )
^ This one is flawed because it splits on both dashes.
Pattern: /~-~|edit\/profile_picture/
Result: Array ( [0] => 1-gc-communications/ [1] => )
^ major fail.
I know I can do a 2-element limit and just break on the first / and then do a preg_split on the result array, but I would love a way to make this work with one line.
If this is a no-go I am open to other "one liner" solutions.

Try this one:
$str = '1-gc-communications/edit/profile_picture';
$match = preg_split('#([^-]+)-([^/]+)/(.*)#', $str, 0, PREG_SPLIT_DELIM_CAPTURE);
print_r($match);
This returns:
array (
0 => '',
1 => '1',
2 => 'gc-communications',
3 => 'edit/profile_picture',
4 => '',
)
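Since the goal is to extract fields rather than to split on a delimiter, preg_match() with an anchored pattern may be a more direct fit than preg_split(). This is a different technique from the answer above, shown as a minimal sketch. It assumes the ID is always digits and the name never contains a slash; the variable names are illustrative only.

```php
<?php
$str = '1-gc-communications/edit/profile_picture';

$id = $name = $rest = null;

// Group 1: leading digits (the ID).
// Group 2: everything up to the first slash (the name).
// Group 3: the remaining path, slash included; optional so "1-name" alone still matches.
if (preg_match('#^(\d+)-([^/]+)(/.*)?$#', $str, $m)) {
    $id   = (int) $m[1];
    $name = $m[2];
    $rest = $m[3] ?? '';   // not set when there is no trailing path
}

print_r([$id, $name, $rest]);
```

Unlike the preg_split() call, this leaves no empty leading or trailing elements to discard.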

the first number before the first - will be the ID I need to extract. from the first - to the first / will be the 'name' I need to extract. Everything after that I do not care for.
This task seems a great candidate for sscanf(), which is specifically designed for parsing (scanning) a formatted string. Not only is the syntax brief, you also know that the pattern does not need to be applied repeatedly. The output values, in case it matters, can be pre-cast as integer or string for convenience. Everything from the first slash onward is simply ignored.
Code:
$str = '1-gc-communications/edit/profile_picture';
var_export(
    sscanf($str, '%d-%[^/]')
    #             ^^ ^^^^^- greedily match one or more non-slash characters
    #             ^^------- greedily match one or more numeric characters
);
Output:
array (
0 => 1, #<-- integer-typed
1 => 'gc-communications', #<-- string-typed
)
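In practice the two values are usually wanted in named variables. A small sketch of the same sscanf() call with array destructuring (the variable names $id and $name are assumptions for illustration):

```php
<?php
$str = '1-gc-communications/edit/profile_picture';

// Without output arguments, sscanf() returns the parsed values as an array,
// which can be destructured directly.
[$id, $name] = sscanf($str, '%d-%[^/]');

var_dump($id);    // the ID, already cast to int by %d
var_dump($name);  // the name, up to (not including) the first slash
```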

Related

How to get only last bracket from multiple arrays? PHP

I have a few arrays and I need to extract only what is in the last pair of brackets. How can I do it?
For example, my arrays always look similar but differ in detail:
Array
(
[0] => 3 BUILTIN\Users:(OI)(CI)(F)
)
Array
(
[0] => BUILTIN\Users:(OI)(CI)(R)
)
Array
(
[0] => 22 BUILTIN\Users:(OI)(CI)(R,W)
)
And I want to get a result like:
(F)
(R)
(R,W)
Do I have to use substr, or something else?
Regards
You can do this simply with preg_filter:
$arr = array(
'3 BUILTIN\Users:(OI)(CI)(F)',
'BUILTIN\Users:(OI)(CI)(R)',
'22 BUILTIN\Users:(OI)(CI)(R,W)'
);
print_r(preg_filter('/^.+(\([^)]+\))$/', '\1', $arr));
Output
Array
(
[0] => (F)
[1] => (R)
[2] => (R,W)
)
The Regex
^ - match the start of the string
.+ - match any character one or more times (greedy)
(...) - first capture group
\( - a literal (
[^)]+ - one or more characters that are not )
\) - a literal )
$ - match the end of the string
This replaces each array item with \1, the contents of the first capture group. Because .+ is greedy, the group ends up matching the last parenthesized set, from its opening ( to the closing ) at the end of the string, which is exactly the part we want.
This should also remove anything from the array that does not match that pattern. For example:
$arr = array(
'3 BUILTIN\Users:(OI)(CI)(F)',
'BUILTIN\Users:(OI)(CI)(R)',
'22 BUILTIN\Users:(OI)(CI)(R,W)',
'foo' //--- foo will not appear in the results, because it does not end with (...)
);
Hope it helps!
preg_filter() is identical to preg_replace() except it only returns the (possibly transformed) subjects where there was a match. For details about how this function works, read the preg_replace() documentation.
https://www.php.net/manual/en/function.preg-filter.php
*PS I gave the above example as it highlights the difference between preg_replace() and preg_filter() (mentioned above). You could do the same with just preg_replace() if you are sure there will always be a match in each item.
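The difference between the two functions can be seen side by side. A short sketch using the same pattern, with one non-matching item ('foo') added to the sample data:

```php
<?php
$arr = [
    '3 BUILTIN\Users:(OI)(CI)(F)',
    'foo',                              // does not end with "(...)"
    '22 BUILTIN\Users:(OI)(CI)(R,W)',
];
$pattern = '/^.+(\([^)]+\))$/';

// preg_filter() returns only the items that matched, keeping their original keys.
$filtered = preg_filter($pattern, '$1', $arr);

// preg_replace() returns every item; non-matching ones pass through unchanged.
$replaced = preg_replace($pattern, '$1', $arr);

print_r($filtered);
print_r($replaced);
```

Note that the keys of the filtered array are not renumbered; wrap the call in array_values() if a contiguous list is needed.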
Here you go:
$arr = array(
'3 BUILTIN\Users:(OI)(CI)(F)',
'BUILTIN\Users:(OI)(CI)(R)',
'22 BUILTIN\Users:(OI)(CI)(R,W)'
);
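$newArr = array();
foreach ($arr as $v) {
    // Take the part after the colon and split it on "(", dropping the empty leading piece.
    $parts = array_filter(explode('(', explode(':', $v)[1]));
    // end() returns the last piece regardless of how array_filter() left the keys.
    $newArr[] = '(' . end($parts);
}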
print_r($newArr);
Result:
Array
(
[0] => (F)
[1] => (R)
[2] => (R,W)
)
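To answer the substr part of the question directly: a regex-free sketch is also possible, assuming the wanted group is always the last "(" through to the end of the string. The variable name $last is illustrative.

```php
<?php
$arr = [
    '3 BUILTIN\Users:(OI)(CI)(F)',
    'BUILTIN\Users:(OI)(CI)(R)',
    '22 BUILTIN\Users:(OI)(CI)(R,W)',
];

$last = [];
foreach ($arr as $v) {
    $pos = strrpos($v, '(');            // offset of the last "(", or false if none
    if ($pos !== false) {
        $last[] = substr($v, $pos);     // from that "(" to the end of the string
    }
}

print_r($last);
```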

Get all matches of repeating subgroup [duplicate]

I'm trying to get all substrings matched with a multiplier:
$list = '1,2,3,4';
preg_match_all('|\d+(,\d+)*|', $list, $matches);
print_r($matches);
This example returns, as expected, the last match in [1]:
Array
(
[0] => Array
(
[0] => 1,2,3,4
)
[1] => Array
(
[0] => ,4
)
)
However, I would like to get all strings matched by (,\d+), to get something like:
Array
(
[0] => ,2
[1] => ,3
[2] => ,4
)
Is there a way to do this with a single function such as preg_match_all()?
According to Kobi (see comments above):
PHP has no support for captures of the same group
Therefore this question has no solution.
It's true that PHP (or better to say PCRE) doesn't store values of repeated capturing groups for later access (see PCRE docs):
If a capturing subpattern is matched repeatedly, it is the last portion of the string that it matched that is returned.
In most cases, though, the \G token does the job. \G matches either 1) at the beginning of the input string (like \A, or ^ when the m modifier is not set), or 2) at the position where the previous match ended. With that in mind, you can use it like this:
preg_match_all('/^\d+|\G(?!^)(,?\d+)\K/', $list, $matches);
Or, if the capturing group doesn't matter:
preg_match_all('/\G,?\d+/', $list, $matches);
after which $matches will hold:
Array
(
[0] => Array
(
[0] => 1
[1] => ,2
[2] => ,3
[3] => ,4
)
)
Note: the benefit of using \G over the other answers (such as explode(), the lookbehind solution, or just preg_match_all('/,?\d+/', ...)) is that you can validate that the input string has exactly the desired format ^\d+(,\d+)*$ while extracting the matches in the same call:
preg_match_all('/(?:^(?=\d+(?:,\d+)*$)|\G(?!^),)\d+/', $list, $matches);
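A quick way to see the validation effect is to run the pattern against a well-formed and a malformed list. In this sketch the helper name extractItems is hypothetical and only wraps the call above:

```php
<?php
// Hypothetical helper wrapping the validating \G pattern.
function extractItems(string $list): array
{
    // First alternative: at the start, assert the whole string looks like \d+(,\d+)*.
    // Second alternative: continue exactly where the previous match ended, consuming a comma.
    preg_match_all('/(?:^(?=\d+(?:,\d+)*$)|\G(?!^),)\d+/', $list, $matches);
    return $matches[0];
}

print_r(extractItems('1,2,3,4'));  // well-formed list
print_r(extractItems('1,2,x,4'));  // malformed list
```

For the malformed input the lookahead fails at the start, and \G cannot match anywhere else, so nothing is returned rather than a partial list.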
Using lookbehind is a way to do the job:
$list = '1,2,3,4';
preg_match_all('|(?<=\d),\d+|', $list, $matches);
print_r($matches);
All the ,\d+ matches are in group 0.
Output:
Array
(
[0] => Array
(
[0] => ,2
[1] => ,3
[2] => ,4
)
)
Splitting is only an option when the separator character does not also appear inside the values you want to match.
I had a situation where a badly formatted comma-separated line had to be parsed into any of a number of known options.
For example, with the options '1,2', '2', '2,3'
and the subject '1,2,3'.
Splitting on ',' results in '1', '2', and '3', only one of which ('2') is a valid option. This happens because the separator is also part of the options.
The naïve regex would be something like '~^(1,2|2|2,3)(?:,(1,2|2|2,3))*$~i', but this runs into the problem of same-group captures.
My "solution" was to just expand the regex to match the maximum number of matches possible:
'~^(1,2|2|2,3)(?:,(1,2|2|2,3))?(?:,(1,2|2|2,3))?$~i'
(If more options were available, just repeat the '(?:,(1,2|2|2,3))?' bit.)
This does result in empty string results for "unused" matches.
It's not the cleanest solution, but works when you have to deal with badly formatted input data.
Why not just:
$ar = explode(',', $list);
print_r($ar);
From http://www.php.net/manual/en/regexp.reference.repetition.php :
When a capturing subpattern is repeated, the value captured is the substring that matched the final iteration.
Also similar thread:
How to get all captures of subgroup matches with preg_match_all()?

Using regex to not match periods between numbers

I have a regex that splits a string after each run of [.!?], and it works, but I'm trying to extend it so that it doesn't split on a . that sits between digits. Is that possible? For example:
$input = "one.two!three?4.000.";
$inputX = preg_split("~(?>[.!?]+)\K(?!$)~", $input);
print_r($inputX);
Result:
Array ( [0] => one. [1] => two! [2] => three? [3] => 4. [4] => 000. )
Needed result:
Array ( [0] => one. [1] => two! [2] => three? [3] => 4.000. )
You should be able to split on this:
(?<=(?<!\d(?=[.!?]+\d))[.!?])(?![.!?]|$)
https://regex101.com/r/kQ6zO4/1
It uses lookarounds to determine where to split. It looks behind for a character in the set [.!?], as long as that punctuation is not both preceded and followed by a digit.
It also avoids a trailing empty element by refusing to split at the end of the string.
UPDATE:
This should be much more efficient actually:
(?!\d+\.\d+).+?[.!?]+\K(?!$)
https://regex101.com/r/eN7rS8/1
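For reference, a sketch of how the first (lookaround) pattern from this answer could be plugged into preg_split(), using the input from the question. The ~ delimiter and the variable names are arbitrary choices.

```php
<?php
$input = 'one.two!three?4.000.';

// Split at the zero-width position after [.!?], unless the punctuation sits
// between digits (as in "4.000"), is followed by more punctuation, or ends the string.
$pattern = '~(?<=(?<!\d(?=[.!?]+\d))[.!?])(?![.!?]|$)~';
$parts = preg_split($pattern, $input);

print_r($parts);
```

Because every split point is zero-width, the punctuation stays attached to the preceding piece and no flags are needed.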
Here is another possibility using regex flags:
$input = "one.two!three???4.000.";
$inputX = preg_split("~(\d+\.\d+[.!?]+|.*?[.!?]+)~", $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($inputX);
It includes the delimiter in the split and ignores empty matches. The regex can be simplified to ((?:\d+\.\d+|.*?)[.!?]+), but I think what is in the code sample above is more efficient.

Splitting GET string

I need to split my GET string into some array. The string looks like this:
ident[0]=<IDENT_0>&value[0]=<VALUE_0>&version[0]=<VERSION_0>&....&ident[N]=<IDENT_N>&value[N]=<VALUE_N>&version[N]=<VERSION_N>
So, I need to split this string by every third ampersand character, like this:
ident[0]=<IDENT_0>&value[0]=<VALUE_0>&version[0]=<VERSION_0>
ident[1]=<IDENT_1>&value[1]=<VALUE_1>&version[1]=<VERSION_1> and so on...
How can I do it? What regular expression should I use? Or is there a better way to do it?
There is a better way (assuming this is data being sent to your PHP page, not some other thing you're dealing with).
PHP provides a "magic" array called $_GET which already has the values parsed out for you.
For example:
one=1&two=2&three=3
Would result in this array:
Array ( [one] => 1 [two] => 2 [three] => 3 )
So you could access the variables like so:
$oneValue = $_GET['one']; // answer is 1
$twoValue = $_GET['two']; // and so on
If you provide array indexes, which your example does, it'll sort those out for you as well. So, to use your example above $_GET would look like:
Array
(
[ident] => Array
(
[0] => <IDENT_0>
[N] => <IDENT_N>
)
[value] => Array
(
[0] => <VALUE_0>
[N] => <VALUE_N>
)
[version] => Array
(
[0] => <VERSION_0>
[N] => <VERSION_N>
)
)
I'd assume your N keys will actually be numbers, so you'll be able to look them up like so:
$_GET['ident'][0] // => <IDENT_0>
$_GET['value'][0] // => <VALUE_0>
$_GET['version'][0] // => <VERSION_0>
You could loop across them all or whatever, and you will never have to worry about splitting them all out yourself.
Hope it helps you.
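If the string does not arrive as the actual request query (so $_GET is not populated), parse_str() applies the same bracket parsing. A sketch with made-up sample values, which also regroups the three parallel arrays into one record per index:

```php
<?php
// Hypothetical query string with two groups; real values would come from elsewhere.
$query = 'ident[0]=a&value[0]=1&version[0]=x&ident[1]=b&value[1]=2&version[1]=y';

// parse_str() builds nested arrays from the bracket syntax, as PHP does for $_GET.
parse_str($query, $params);

// Regroup the parallel arrays into one record per index.
$records = [];
foreach ($params['ident'] as $i => $ident) {
    $records[$i] = [
        'ident'   => $ident,
        'value'   => $params['value'][$i] ?? null,
        'version' => $params['version'][$i] ?? null,
    ];
}

print_r($records);
```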
You can use preg_split with this pattern: &(?=ident)
$result = preg_split('~&(?=ident)~', $yourstring);
Regex detail: &(?=ident) means an & followed by ident.
(?=..) is a lookahead assertion; it only performs a check and consumes nothing.
Or using preg_match_all:
preg_match_all('~(?<=^|&)[^&]+&[^&]+&[^&]+(?=&|$)~', $yourstring, $matches);
$result = $matches[0];
Pattern detail: (?<=..) is a lookbehind assertion.
(?<=^|&) means preceded by the beginning of the string (^) or an ampersand.
[^&]+ means any character except the ampersand, one or more times.
(?=&|$) means followed by an ampersand or the end of the string ($).
Or you can use explode, and then a for loop:
$items = explode('&', $yourstring);
$result = array(); // initialize before appending
for ($i = 0; $i < count($items); $i += 3) {
    $result[] = implode('&', array_slice($items, $i, 3));
}
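The explode-and-regroup idea can also be written without an explicit loop by using array_chunk(). A sketch with a made-up two-group sample string, assuming every group has exactly three parameters:

```php
<?php
$yourstring = 'ident[0]=a&value[0]=1&version[0]=x&ident[1]=b&value[1]=2&version[1]=y';

// Split into individual parameters, group them in threes, then rejoin each group.
$result = array_map(
    fn(array $chunk): string => implode('&', $chunk),
    array_chunk(explode('&', $yourstring), 3)
);

print_r($result);
```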

Unexpected result with very simple regexp

I am fairly new to regexp and have encountered a regexp that delivers an unexpected result when trying to match the name parts in a name of the form firstname-firstname firstname:
preg_match_all('/([^- ])*/i', 'aNNA-äöå Åsa', $result);
gives a print_r($result) that looks like this:
Array
(
[0] => Array
(
[0] => aNNA
[1] =>
[2] => äöå
[3] =>
[4] => Åsa
[5] =>
)
[1] => Array
(
[0] => A
[1] =>
[2] => å
[3] =>
[4] => a
[5] =>
)
)
Now $result[0] has the items I want and expect, but where the heck does $result[1] come from? I see it's the word endings, but how come they are matched?
And as a little side question, how do I prevent the empty matches ($result[0][1], $result[0][3], ...), or better yet: why do they show up? They do not consist of non-dash, non-space characters either.
Have a try with:
preg_match_all('/([^- ]+)/', 'aNNA-äöå Åsa', $result);
Your regex:
/([^- ])*/i
means: find one character that is not - or a space and keep it in a group, repeated 0 or more times.
This one:
/([^- ]+)/
means: find one or more characters that are not - or a space and keep them in a group.
Moreover, there's no need for the case-insensitive flag.
The * means "0 or more of the preceding." At a "-", the pattern matches exactly 0 characters of the character class, so it still counts as a (zero-length) match. Since "-" is excluded from the character class, the capture grabs nothing, leaving you an empty entry. The expression giving you the expected behavior would be:
preg_match_all('/([^- ])+/i', 'aNNA-äöå Åsa', $result);
("+" means "1 or more of the preceding.")
http://php.net/manual/en/function.preg-match-all.php says:
Orders results so that $matches[0] is an array of full pattern
matches, $matches[1] is an array of strings matched by the first
parenthesized subpattern, and so on.
Check the URL for more details
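The effect of where the quantifier sits relative to the group can be checked directly. A small sketch using an ASCII sample name (an assumption, chosen so that byte-wise matching of multibyte characters does not affect the output):

```php
<?php
$name = 'anna-maria lisa';

// Quantifier outside the group: the group is overwritten on every repetition,
// so $a[1] ends up holding only the last character of each name part.
preg_match_all('/([^- ])+/', $name, $a);

// Quantifier inside the group: the group captures the whole name part.
preg_match_all('/([^- ]+)/', $name, $b);

print_r($a[1]);
print_r($b[1]);
```

In both cases index 0 holds the full matches; only the contents of index 1 differ, which explains the "word endings" seen in the question.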
