php preg_split ignore duplicate delimiters

I want to split strings produced by an older version of phpstan we are constrained to use (v0.9).
Each error string is separated by :, but there are sometimes static calls marked with :: which I want to ignore.
My code:
$error = '/path/to/file/namespace/filename:line_number:error message Namespace\ClassName::method().';
$output = preg_split('/:/', $error);
A var_dump of $output gives this:
Array
(
[0] => /path/to/file/namespace/filename
[1] => line_number
[2] => error message Namespace\ClassName
[3] =>
[4] => method().
)
The result I want is this:
Array
(
[0] => /path/to/file/namespace/filename
[1] => line_number
[2] => error message Namespace\ClassName::method().
)
I was hoping this could be solved with regex.
I have been reading similar questions and have tried variations of regex, none of which worked.

You can use lookahead and lookbehind for your split:
$error = '/path/to/file/namespace/filename:line_number:error message Namespace\ClassName::method().';
$arr = preg_split('/(?<!:):(?!:)/', $error, -1, PREG_SPLIT_NO_EMPTY);
print_r($arr);
Array
(
[0] => /path/to/file/namespace/filename
[1] => line_number
[2] => error message Namespace\ClassName::method().
)
RegEx Details:
(?<!:) - negative lookbehind: fail the match if the previous character is a :
: - match a single :
(?!:) - negative lookahead: fail the match if the next character is a :
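As a rough sketch of how the lookarounds behave on a line with more than one static call, the snippet below wraps the split in a helper. The function name splitOnSingleColon, its optional $limit argument, and the sample lines are illustrative assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper: split a phpstan-style line on single colons only.
function splitOnSingleColon(string $line, int $limit = -1): array
{
    // A ':' counts as a delimiter only when no ':' sits directly before or after it.
    return preg_split('/(?<!:):(?!:)/', $line, $limit, PREG_SPLIT_NO_EMPTY);
}

// Both '::' sequences survive; only the two single colons split the line.
$parts = splitOnSingleColon('/src/Foo.php:12:Call to Foo::bar() and Baz::qux().');
print_r($parts);

// With a limit of 3, a lone colon inside the message stays in the last piece.
$limited = splitOnSingleColon('a.php:3:Note: see Foo::bar()', 3);
print_r($limited);
```

Passing a limit is only worth considering if the message part can itself contain a single colon; otherwise the default of -1 is enough.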

Another option is to match 2 or more occurrences of : and use (*SKIP)(*F). Then match a single : to split on.
:{2,}(*SKIP)(*F)|:
Explanation
:{2,}(*SKIP)(*F) Match 2 or more occurrences of :, then force the match to fail and skip past the characters matched so far, so a :: is never used as a split point
| Or
: Match a single :
$error = '/path/to/file/namespace/filename:line_number:error message Namespace\ClassName::method().';
$output = preg_split('/:{2,}(*SKIP)(*F)|:/', $error);
print_r($output);
Output
Array
(
[0] => /path/to/file/namespace/filename
[1] => line_number
[2] => error message Namespace\ClassName::method().
)

Using preg_match_all (matching the fields is sometimes simpler than splitting):
preg_match_all('~[^:]+(?>::[^:]*)*~', $error, $matches);
print_r($matches[0]);
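For completeness, here is the same call as a self-contained snippet, with the pattern annotated. It reuses the $error string from the question; nothing else is assumed.

```php
<?php
$error = '/path/to/file/namespace/filename:line_number:error message Namespace\ClassName::method().';

// [^:]+          one or more characters that are not a colon
// (?>::[^:]*)*   then any number of '::' pairs, each followed by non-colon characters
preg_match_all('~[^:]+(?>::[^:]*)*~', $error, $matches);

// $matches[0] holds the three fields; the '::' stays inside the last one.
print_r($matches[0]);
```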


Flatten array of regular expressions

I have an array of regular expressions, $toks:
Array
(
[0] => /(?=\D*\d)/
[1] => /\b(waiting)\b/i
[2] => /^(\w+)/
[3] => /\b(responce)\b/i
[4] => /\b(from)\b/i
[5] => /\|/
[6] => /\b(to)\b/i
)
When I'm trying to flatten it:
$patterns_flattened = implode('|', $toks);
I get a regex:
/(?=\D*\d)/|/\b(waiting)\b/i|/^(\w+)/|/\b(responce)\b/i|/\b(from)\b/i|/\|/|/\b(to)\b/i
When I'm trying to:
if (preg_match('/'. $patterns_flattened .'/', "I'm waiting for a response from", $matches)) {
print_r($matches);
}
I get an error:
Warning: preg_match(): Unknown modifier '(' in ...index.php on line
Where is my mistake?
Thanks.
You need to remove the opening and closing slashes, like this:
$toks = [
'(?=\D*\d)',
'\b(waiting)\b',
'^(\w+)',
'\b(response)\b',
'\b(from)\b',
'\|',
'\b(to)\b',
];
And then, I think you'll want to use preg_match_all instead of preg_match:
$patterns_flattened = implode('|', $toks);
if (preg_match_all("/$patterns_flattened/i", "I'm waiting for a response from", $matches)) {
print_r($matches[0]);
}
Printing $matches[0] rather than the whole $matches array shows only the full match of each alternative:
Array
(
[0] => I
[1] => waiting
[2] => response
[3] => from
)
<?php
$data = Array
(
0 => '/(?=\D*\d)/',
1 => '/\b(waiting)\b/i',
2 => '/^(\w+)/',
3 => '/\b(responce)\b/i',
4 => '/\b(from)\b/i',
5 => '/\|/',
6 => '/\b(to)\b/i'
);
$patterns_flattened = implode('|', $data);
$regex = str_replace("/i",'',$patterns_flattened);
$regex = str_replace('/','',$regex);
if (preg_match_all( '/'.$regex.'/', "I'm waiting for a responce from", $matches)) {
echo '<pre>';
print_r($matches[0]);
}
You have to remove the slash delimiters and the i modifier from each pattern before joining them; leaving them in is what broke the combined regex.
A really nice tool for validating a regex is:
https://regexr.com/
I always use it when I have to write a larger-than-usual regular expression.
The output of the above code is:
Array
(
[0] => I
[1] => waiting
[2] => responce
[3] => from
)
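A blanket str_replace would also strip any / or /i that happens to be part of a pattern itself. As a sketch of a narrower approach, the snippet below removes only the outer delimiters and trailing modifiers from each element. The helper name stripDelimiters is hypothetical, and the code assumes every pattern uses / as its delimiter, as in the question.

```php
<?php
// Hypothetical helper: drop the outer '/' delimiters and any trailing modifiers.
function stripDelimiters(string $pattern): string
{
    // Assumption: '/' is the delimiter of every pattern in the list.
    return preg_replace('~^/(.*)/[a-zA-Z]*$~s', '$1', $pattern);
}

$toks = [
    '/(?=\D*\d)/',
    '/\b(waiting)\b/i',
    '/^(\w+)/',
    '/\b(responce)\b/i',
    '/\b(from)\b/i',
    '/\|/',
    '/\b(to)\b/i',
];

// Join the bare patterns and add one pair of delimiters around the result.
$regex = '/' . implode('|', array_map('stripDelimiters', $toks)) . '/i';

preg_match_all($regex, "I'm waiting for a responce from", $matches);
print_r($matches[0]);
```

Note that the single /i at the end applies to every alternative, including the ones that were case-sensitive in the original list.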
There are a few adjustments to make to your $toks array.
To remove the error, you need to remove the pattern delimiters and pattern modifiers from each array element.
None of the capture groups are necessary; they lead to a higher step count and bloat the output array.
Whatever your intention is with (?=\D*\d), it needs a rethink. If there is a number anywhere in your input string, you are potentially going to generate lots of empty elements, which surely can't benefit your project. For example, append a space and then 1 after from in your input string and look at the result.
Here is my recommendation:
$toks = [
'\bwaiting\b',
'^\w+',
'\bresponse\b',
'\bfrom\b',
'\|',
'\bto\b',
];
$pattern = '/' . implode('|', $toks) . '/i';
var_export(preg_match_all($pattern, "I'm waiting for a response from", $out) ? $out[0] : null);
Output:
array (
0 => 'I',
1 => 'waiting',
2 => 'response',
3 => 'from',
)

Strange behavior of preg_match_all php

I have a very long string of HTML. From this string I want to parse pairs of Russian and English names of cities. An example of this string is:
$html = '
Абакан
Хакасия республика
Абан
Красноярский край
Абатский
Тюменская область
';
My code is:
$subject = $this->html;
$pattern = '/<a href="([\/a-zA-Z0-9-"]*)">([а-яА-Я]*)/';
preg_match_all($pattern, $subject, $matches);
For testing I use RegExr; you can see it here: http://regexr.com/399co
The test there uses the global modifier, /g.
Because PHP has no /g modifier, I use the preg_match_all function. But the result of preg_match_all is very strange:
Array
(
[0] => Array
(
[0] => <a href="/forecasts5000/russia/republic-khakassia/abakan">Абакан
[1] => <a href="/forecasts5000/russia/krasnoyarsk-territory/aban">Абан
[2] => <a href="/forecasts5000/russia/tyumen-area/abatskij">Аба�
[3] => <a href="/forecasts5000/russia/arkhangelsk-area/abramovskij-ma">Аб�
)
[1] => Array
(
[0] => /forecasts5000/russia/republic-khakassia/abakan
[1] => /forecasts5000/russia/krasnoyarsk-territory/aban
[2] => /forecasts5000/russia/tyumen-area/abatskij
[3] => /forecasts5000/russia/arkhangelsk-area/abramovskij-ma
)
[2] => Array
(
[0] => Абакан
[1] => Абан
[2] => Аба�
[3] => Аб�
)
)
First, it seems to find only the first match (but I need an array with all matches).
Second, the result looks very strange to me. I want to get the following result:
pairs of /forecasts5000/russia/republic-khakassia/abakan and Абакан
What am I doing wrong?
Element 0 of the result is an array of each of the full matches of the regexp. Element 1 is an array of all the matches for capture group 1, element 2 contains capture group 2, and so on.
You can invert this by using the PREG_SET_ORDER flag. Then element 0 will contain all the results from the first match, element 1 will contain all the results from the second match, and so on. Within each of these, [0] will be the full match, and the remaining elements will be the capture groups.
If you use this option, you can then get the information you want with:
foreach ($matches as $match) {
$url = $match[1];
$text = $match[2];
// Do something with $url and $text
}
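A minimal sketch of the loop above with PREG_SET_ORDER actually passed to preg_match_all. The sample markup is an assumption, since the question's HTML is not shown in full. The u modifier is also an addition not mentioned in this answer: cut-off names such as "Аба�" are what byte-wise matching of multi-byte Cyrillic text tends to produce, and u makes PCRE treat the pattern and subject as UTF-8 (the file itself is assumed to be saved as UTF-8).

```php
<?php
// Assumed sample markup; the real page structure may differ.
$html = '<a href="/forecasts5000/russia/republic-khakassia/abakan">Абакан</a>'
      . '<a href="/forecasts5000/russia/tyumen-area/abatskij">Абатский</a>';

// u modifier: match whole UTF-8 characters instead of single bytes.
$pattern = '/<a href="([\/a-zA-Z0-9-]*)">([а-яА-Я]*)/u';

$pairs = [];
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // $match[1] is the URL, $match[2] is the city name.
        $pairs[$match[1]] = $match[2];
    }
}
print_r($pairs);
```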
You can also use the T-Regx library, which has separate methods for each case:
pattern('<a href="([/a-zA-Z0-9-"]*)">([а-яА-Я]*)')
->match($this->html)
->forEach(function (Match $match) {
$text = $match->text();
$group = $match->group(1);
echo "Match $text with group $group";
});
It also has automatic delimiters.

How to get a particular string using preg_replace?

I want to get a particular value from a string in PHP. The string is:
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_replace('/(.*)\[(.*)\](.*)\[(.*)\](.*)/', '$2', $string);
I want to get the value of data01, i.e. [1,2].
How can I achieve this using preg_replace?
preg_replace() is the wrong tool. I have used preg_match_all() in case you need the other item later, and trimmed your regex down to capture only the part of the string you are looking for.
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_match_all('/\[([0-9,]+)\]/',$string,$match);
print_r($match);
/*
print_r($match) output:
Array
(
[0] => Array
(
[0] => [1,2]
[1] => [2,3]
)
[1] => Array
(
[0] => 1,2
[1] => 2,3
)
)
*/
echo "Your match: " . $match[1][0];
This gives you both the full matched pattern and the captured characters, so you can have [1,2] or just 1,2.
preg_replace is used to replace by regular expression!
I think you want to use preg_match_all() to get each data attribute from the string.
The regex you want is:
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_match_all('#data[0-9]{2}=(\[[0-9,]+\])#',$string,$matches);
print_r($matches);
Array
(
[0] => Array
(
[0] => data01=[1,2]
[1] => data02=[2,3]
)
[1] => Array
(
[0] => [1,2]
[1] => [2,3]
)
)
I have tested this as working.
preg_replace is for replacing stuff. preg_match is for extracting stuff.
So you want:
preg_match('/(.*?)\[(.*?)\](.*?)\[(.*?)\](.*)/', $string, $match);
var_dump($match);
See what you get, and work from there.
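If only the data01 value is needed, one possible narrowing is to anchor the pattern on the key name and capture just its bracketed value. This is a sketch; the pattern below is an illustration rather than something taken from the answers above.

```php
<?php
$string = 'users://data01=[1,2]/data02=[2,3]/*';

// Capture the bracketed value that directly follows "data01=".
$value = preg_match('/data01=(\[[^\]]*\])/', $string, $match) ? $match[1] : null;

echo $value; // prints [1,2]
```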

Split a string while keeping delimiters and string outside

I'm trying to do something that must be really simple, but I'm fairly new to PHP and I'm struggling with this one. What I want is to split a string containing 0, 1 or more delimiters (braces), while keeping the delimiters AND the string between AND the string outside.
ex: 'Hello {F}{N}, how are you?' would output :
Array ( [0] => Hello
[1] => {F}
[2] => {N}
[3] => , how are you? )
Here's my code so far:
$value = 'Hello {F}{N}, how are you?';
$array= preg_split('/[\{\}]/', $value,-1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($array);
which outputs (missing braces) :
Array ( [0] => Hello
[1] => F
[2] => N
[3] => , how are you? )
I also tried :
preg_match_all('/\{[^}]+\}/', $myValue, $array);
Which outputs (braces are there, but the text outside is flushed) :
Array ( [0] => {F}
[1] => {N} )
I'm pretty sure I'm on the right track with preg_split, but with the wrong regex. Can anyone help me with this? Or tell me if I'm way off?
You aren't capturing the delimiters. Add them to a capturing group:
/(\{.*?\})/
You need parentheses around the part of the expression to be captured:
preg_split('/(\{[^}]+\})/', $myValue, -1, PREG_SPLIT_DELIM_CAPTURE);
See the documentation for preg_split().
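Putting the captured delimiter together with the flags from the question gives the output originally asked for. This sketch combines PREG_SPLIT_DELIM_CAPTURE with PREG_SPLIT_NO_EMPTY; without the second flag, the empty string between {F} and {N} would show up as its own element.

```php
<?php
$value = 'Hello {F}{N}, how are you?';

// The parentheses make preg_split return each {…} delimiter as an element;
// PREG_SPLIT_NO_EMPTY drops the empty piece between adjacent delimiters.
$parts = preg_split(
    '/(\{[^}]+\})/',
    $value,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

print_r($parts);
```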

PHP preg_match_all not finding first match

I am trying to find all matches in a string. For some reason, if my match is at the start of the string, that particular match is not returned. Does it have something to do with index 0? I am also using PREG_OFFSET_CAPTURE to get the indexes as well as the matches. Below are the non-working and working versions of the code.
$text = '[QUOTE]I wonder why[QUOTE]PHP[IMG]hates me[/IMG][/QUOTE][/QUOTE][URL="http://www.bing.com"]Click me![QUOTE]........[/QUOTE]Ok Bai![/URL]';
preg_match_all('#\[QUOTE\]#', $text, $matches, PREG_OFFSET_CAPTURE, PREG_PATTERN_ORDER);
print_r($matches);
The result of which is:
Array ( [0] => Array ( [0] => Array ( [0] => [QUOTE] [1] => 19 ) [1] => Array ( [0] => [QUOTE] [1] => 100 ) ) )
As you can see it only found two matches. If I add a character to the start of the string it will then find all three.
$text = 'a[QUOTE]I wonder why[QUOTE]PHP[IMG]hates me[/IMG][/QUOTE][/QUOTE][URL="http://www.bing.com"]Click me![QUOTE]........[/QUOTE]Ok Bai![/URL]';
preg_match_all('#\[QUOTE\]#', $text, $matches, PREG_OFFSET_CAPTURE, PREG_PATTERN_ORDER);
print_r($matches);
The result of which is:
Array ( [0] => Array ( [0] => Array ( [0] => [QUOTE] [1] => 1 ) [1] => Array ( [0] => [QUOTE] [1] => 20 ) [2] => Array ( [0] => [QUOTE] [1] => 101 ) ) )
All three matches. If anyone can help me figure out if my REGEX needs to be modified or if there is some quirk I'm unaware of it would be much appreciated. I've tried this same thing utilizing Python and the re library and it returns all my matches. I also utilized this http://www.regextester.com/ and it reports it as working in both scenarios and matching everything as it should. My only guess is something to do with the PREG_OFFSET_CAPTURE finding a match at position 0 and the 0 causing some issue.
Thanks in advance for any assistance!
The correct way to add multiple flags is with a pipe |, so:
preg_match_all('#\[QUOTE\]#', $text, $matches, PREG_OFFSET_CAPTURE | PREG_PATTERN_ORDER);
The comma before PREG_PATTERN_ORDER makes it the fifth argument, the 'offset' parameter (the position in the string at which to start searching), and since PREG_PATTERN_ORDER == 1, the search starts at the second character.
The problem is in your function call:
preg_match_all('#\[QUOTE\]#', $text, $matches, PREG_OFFSET_CAPTURE, PREG_PATTERN_ORDER);
The fifth parameter is the offset, not another flag.
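A short check of the corrected call, using the string from the question. The flags are combined with | in the fourth argument and the fifth argument is left at its default of 0; array_column is used here only to pull the offsets out of the result.

```php
<?php
$text = '[QUOTE]I wonder why[QUOTE]PHP[IMG]hates me[/IMG][/QUOTE][/QUOTE][URL="http://www.bing.com"]Click me![QUOTE]........[/QUOTE]Ok Bai![/URL]';

// Flags are OR-ed together; no fifth argument, so the search starts at offset 0.
preg_match_all('#\[QUOTE\]#', $text, $matches, PREG_OFFSET_CAPTURE | PREG_PATTERN_ORDER);

// Each entry of $matches[0] is [matched text, byte offset]; keep the offsets.
$offsets = array_column($matches[0], 1);
print_r($offsets);
```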
