Regex - how to split string by commas, omitting commas in brackets - php

I have a string, say:
$str = "myTemplate, testArr => [1868,1869,1870], testInteger => 3, testString => 'test, can contain a comma'";
It basically represents a comma delimited list of parameters I need to parse.
I need to split this string in PHP (probably using preg_match_all) by commas (but omitting those in brackets and quotes) so the end result would be array of the following four matches:
myTemplate
testArr => [1868,1869,1870]
testInteger => 3
testString => 'test, can contain a comma'
The problem is with the array and string values. So any commas inside [ ] or ' ' or " " should not be considered as a delimiter.
There are many similar questions here, but I wasn't able to get it working for this particular situation. What would be the correct regex to get this result? Thank you!

You can use this lookaround based regex:
$str = "myTemplate, testArr => [1868,1869,1870], testInteger => 3, testString => 'test, can contain a comma'";
$arr = preg_split("/\s*,\s*(?![^][]*\])(?=(?:(?:[^']*'){2})*[^']*$)/", $str);
print_r( $arr );
There are 2 lookarounds used in this regex:
(?![^][]*\]) - Asserts comma is not inside [...]
(?=(?:(?:[^']*'){2})*[^']*$) - Asserts comma is not inside '...'
PS: This is assuming we don't have unbalanced/nested/escaped quotes and brackets.
Output:
Array
(
[0] => myTemplate
[1] => testArr => [1868,1869,1870]
[2] => testInteger => 3
[3] => testString => 'test, can contain a comma'
)
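If the input may contain nested brackets or escaped quotes, which the lookaround regex above explicitly does not handle, a small character scanner is one alternative. This is only a sketch of a different technique, not the regex from this answer; the function name split_top_level is made up for illustration, and it assumes single-byte delimiters and backslash escapes inside quotes.

```php
<?php
// Hypothetical helper: split on top-level commas, skipping commas inside
// [...] (nesting allowed) and inside '...' or "..." (backslash escapes kept).
function split_top_level(string $input): array
{
    $parts = [];
    $current = '';
    $depth = 0;      // current [...] nesting depth
    $quote = null;   // active quote character, or null when outside quotes
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $ch = $input[$i];

        if ($quote !== null) {
            // Inside a quoted string: copy characters until the closing quote.
            $current .= $ch;
            if ($ch === '\\' && $i + 1 < $length) {
                $current .= $input[++$i]; // keep the escaped character as-is
            } elseif ($ch === $quote) {
                $quote = null;
            }
            continue;
        }

        if ($ch === "'" || $ch === '"') {
            $quote = $ch;
        } elseif ($ch === '[') {
            $depth++;
        } elseif ($ch === ']' && $depth > 0) {
            $depth--;
        } elseif ($ch === ',' && $depth === 0) {
            // A comma outside brackets and quotes ends the current item.
            $parts[] = trim($current);
            $current = '';
            continue;
        }
        $current .= $ch;
    }

    $parts[] = trim($current);
    return $parts;
}

$str = "myTemplate, testArr => [1868,1869,1870], testInteger => 3, testString => 'test, can contain a comma'";
$parts = split_top_level($str);
print_r($parts);
```

For the sample string this should give the same four items as the regex. Unbalanced brackets or an unterminated quote are not reported as errors in this sketch; the remainder simply ends up in the last item.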

I would do it like this:
<?php
$str = "myTemplate, testArr => [1868,1869,1870], testInteger => 3, testString => 'test, can contain a comma'";
$pattern[0] = "[a-zA-Z]+,"; // textonly entry
$pattern[1] = "\w+\s*?=>\s*\[.*\]\s*,?"; // array type entry with value enclosed in square brackets
$pattern[2] = "\w+\s*?=>\s*\d+\s*,?"; // array type entry with decimal value
$pattern[3] = "\w+\s*?=>\s*\'.*\'\s*,?"; // array type entry with string value
$regex = implode('|', $pattern);
preg_match_all("/$regex/", $str, $matches);
// You can also use the one-liner commented out below if you don't want to use the array
//preg_match_all("/[a-zA-Z]+,|\w+\s*?=>\s*\[.*\]\s*,?|\w+\s*?=>\s*\d+\s*,?|\w+\s*?=>\s*\'.*\'\s*,?/", $str, $matches);
print_r($matches);
This is easier to manage, and I can easily add or remove patterns if needed. Note that every match except the last keeps its trailing comma. The output looks like this:
Array
(
[0] => Array
(
[0] => myTemplate,
[1] => testArr => [1868,1869,1870],
[2] => testInteger => 3,
[3] => testString => 'test, can contain a comma'
)
)

Related

Find all occurrences of a "unknown" substring in a string with PHP

I have a string and I need to find all occurrences of some substrings in it, but I know only the initial characters of the substrings. How can I do this?
Example:
$my_string = "This is a text cointaining [substring_aaa attr], [substring_bbb attr] and [substring], [substring], [substring] and I'll try to find them!";
I know all substrings begin with '[substring' and end with a space char (before attr) or ']' char, so in this example I need to find substring_aaa, substring_bbb and substring and count how many occurrences for each one of them.
The result would be an associative array with the substrings as keys and the number of occurrences as values, for example:
$result = array(
'substring' => 3,
'substring_aaa' => 1,
'substring_bbb' => 1
)
Match [substring and then NOT ] zero or more times and then a ]:
preg_match_all('/\[(substring[^\]]*)\]/', $my_string, $matches);
$matches[1] will yield:
Array
(
[0] => substring_aaa attr
[1] => substring_bbb attr
[2] => substring
[3] => substring
[4] => substring
)
Then you can count the values:
$result = array_count_values($matches[1]);
After rereading the question, if you don't want what comes after a space (attr in this case) then:
preg_match_all('/\[(substring[^\]\s]*)[\]\s]/', $my_string, $matches);
For which $matches[1] will yield:
Array
(
[0] => substring_aaa
[1] => substring_bbb
[2] => substring
[3] => substring
[4] => substring
)
With the array_count_values yielding:
Array
(
[substring_aaa] => 1
[substring_bbb] => 1
[substring] => 3
)
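Putting the two steps together, a minimal sketch might look like the following. The helper name count_tags is hypothetical, and the use of preg_quote() is an addition so that the prefix is treated as literal text; the sample string is the one from the question with the typo in "containing" corrected.

```php
<?php
// Hypothetical helper: count bracketed tags that start with a given prefix.
function count_tags(string $text, string $prefix): array
{
    // preg_quote() escapes the prefix so it is matched literally.
    $pattern = '/\[(' . preg_quote($prefix, '/') . '[^\]\s]*)[\]\s]/';
    preg_match_all($pattern, $text, $matches);

    // $matches[1] holds the captured tag names; count how often each occurs.
    return array_count_values($matches[1]);
}

$my_string = "This is a text containing [substring_aaa attr], [substring_bbb attr] and [substring], [substring], [substring] and I'll try to find them!";
$result = count_tags($my_string, 'substring');
print_r($result);
```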

Flatten array of regular expressions

I have an array of regular expressions, $toks:
Array
(
[0] => /(?=\D*\d)/
[1] => /\b(waiting)\b/i
[2] => /^(\w+)/
[3] => /\b(responce)\b/i
[4] => /\b(from)\b/i
[5] => /\|/
[6] => /\b(to)\b/i
)
When I'm trying to flatten it:
$patterns_flattened = implode('|', $toks);
I get a regex:
/(?=\D*\d)/|/\b(waiting)\b/i|/^(\w+)/|/\b(responce)\b/i|/\b(from)\b/i|/\|/|/\b(to)\b/i
When I'm trying to:
if (preg_match('/'. $patterns_flattened .'/', "I'm waiting for a response from", $matches)) {
print_r($matches);
}
I get an error:
Warning: preg_match(): Unknown modifier '(' in ...index.php on line
Where is my mistake?
Thanks.
You need to remove the opening and closing slashes, like this:
$toks = [
'(?=\D*\d)',
'\b(waiting)\b',
'^(\w+)',
'\b(response)\b',
'\b(from)\b',
'\|',
'\b(to)\b',
];
And then, I think you'll want to use preg_match_all instead of preg_match:
$patterns_flattened = implode('|', $toks);
if (preg_match_all("/$patterns_flattened/i", "I'm waiting for a response from", $matches)) {
print_r($matches[0]);
}
Printing $matches[0] rather than the whole $matches array shows only the full match for each hit, without the per-group sub-arrays:
Array
(
[0] => I
[1] => waiting
[2] => response
[3] => from
)
Try it on 3v4l.org
<?php
$data = Array
(
0 => '/(?=\D*\d)/',
1 => '/\b(waiting)\b/i',
2 => '/^(\w+)/',
3 => '/\b(responce)\b/i',
4 => '/\b(from)\b/i',
5 => '/\|/',
6 => '/\b(to)\b/i'
);
$patterns_flattened = implode('|', $data);
$regex = str_replace("/i",'',$patterns_flattened);
$regex = str_replace('/','',$regex);
if (preg_match_all( '/'.$regex.'/', "I'm waiting for a responce from", $matches)) {
echo '<pre>';
print_r($matches[0]);
}
You have to remove the slashes from your regexes, and also the i modifier, in order to make it work. The delimiters embedded in the middle of the combined pattern were the reason it was breaking. Note that dropping i makes the match case-sensitive.
A really nice tool to validate your regex is this:
https://regexr.com/
I always use it when I have to write a bigger than usual regular expression.
The output of the above code is :
Array
(
[0] => I
[1] => waiting
[2] => responce
[3] => from
)
There are a few adjustments to make to your $toks array.
To remove the error, you need to remove the pattern delimiters and pattern modifiers from each array element.
None of the capture grouping is necessary, in fact, it will lead to a higher step count and create unnecessary output array bloat.
Whatever your intention is with (?=\D*\d), it needs a rethink. If there is a number anywhere in your input string, you are potentially going to generate lots of empty elements which surely can't have any benefit for your project. Look at what happens when I put a space then 1 after from in your input string.
Here is my recommendation: (PHP Demo)
$toks = [
'\bwaiting\b',
'^\w+',
'\bresponse\b',
'\bfrom\b',
'\|',
'\bto\b',
];
$pattern = '/' . implode('|', $toks) . '/i';
var_export(preg_match_all($pattern, "I'm waiting for a response from", $out) ? $out[0] : null);
Output:
array (
0 => 'I',
1 => 'waiting',
2 => 'response',
3 => 'from',
)
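If the patterns arrive with delimiters and per-pattern modifiers and you would rather not edit them by hand, one possible approach is to convert each one into an inline-modifier group before joining. This is a sketch under assumptions: the helper name to_inline_group is made up, every pattern is assumed to use / as its delimiter, and the trailing flags are assumed to be ones that are also valid as inline options (such as i, m, s, x).

```php
<?php
// Hypothetical helper: turn '/body/flags' into the inline group '(?flags:body)'.
// Assumes '/' is the delimiter and the flags are valid inline options.
function to_inline_group(string $pattern): string
{
    if (!preg_match('~^/(.*)/([a-zA-Z]*)$~s', $pattern, $m)) {
        throw new InvalidArgumentException("Unsupported pattern: $pattern");
    }
    // With no flags this yields a plain non-capturing group '(?:body)'.
    return '(?' . $m[2] . ':' . $m[1] . ')';
}

$toks = [
    '/\b(waiting)\b/i',
    '/^(\w+)/',
    '/\b(response)\b/i',
    '/\b(from)\b/i',
    '/\|/',
    '/\b(to)\b/i',
];

// Join the converted groups and add a single pair of delimiters.
$combined = '/' . implode('|', array_map('to_inline_group', $toks)) . '/';
preg_match_all($combined, "I'm waiting for a response from", $matches);
print_r($matches[0]);
```

Each alternative keeps its own case sensitivity this way, instead of one modifier being applied to the whole combined pattern.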

Most elegant way to clean a string into only comma separated numerals

After instructing clients to input only
number comma number comma number
(no set length, but generally < 10), the results of their input have been, erm, unpredictable.
Given the following example input:
3,6 ,bannana,5,,*,
How could I most simply, and reliably end up with:
3,6,5
So far I am trying a combination:
$test= trim($test,","); //Remove any leading or trailing commas
$test= preg_replace('/\s+/', '', $test); //Remove any whitespace
$test= preg_replace("/[^0-9]/", ",", $test); //Replace any non-number with a comma
But before I keep throwing things at it...is there an elegant way, probably from a regex boffin!
In a purely abstract sense this is what I'd do:
$test = array_filter(array_map('trim',explode(",",$test)),'is_numeric');
Example:
http://sandbox.onlinephpfunctions.com/code/753f4a833e8ff07cd9c7bd780708f7aafd20d01d
<?php
$str = '3,6 ,bannana,5,,*,';
// Split on commas and trim whitespace around each item
$items = array_map('trim', explode(',', $str));
// Filter with is_numeric: a bare array_filter() would also drop a literal "0"
$numbers = array_filter($items, 'is_numeric');
print_r($numbers); // <-- this will give you an array
echo implode(',', $numbers); // <--- this will give you a string
?>
Here's an example using regex,
$string = '3,6 ,bannana,5,-6,*,';
preg_match_all('#(-?[0-9]+)#',$string,$matches);
print_r($matches);
will output
Array
(
[0] => Array
(
[0] => 3
[1] => 6
[2] => 5
[3] => -6
)
[1] => Array
(
[0] => 3
[1] => 6
[2] => 5
[3] => -6
)
)
Use $matches[0] and you should be on your way.
If you don't need negative numbers, just remove the leading -? from the regex.
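To get back the comma separated string the question asks for, the split, trim and filter steps from the answers above can be combined. The following is a sketch only; the function name clean_number_list is hypothetical, and it assumes that only unsigned whole numbers count as valid items.

```php
<?php
// Hypothetical helper: keep only the items that are plain unsigned integers.
function clean_number_list(string $input): string
{
    // Split on commas and trim whitespace around each item.
    $items = array_map('trim', explode(',', $input));

    // Keep items made of digits only; this keeps "0" and drops "" and "*".
    $numbers = array_filter($items, function (string $item): bool {
        return preg_match('/^\d+$/', $item) === 1;
    });

    return implode(',', $numbers);
}

echo clean_number_list('3,6 ,bannana,5,,*,'), "\n";
```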

PHP: split a string of alternating groups of characters into an array

I have a string whose correct syntax is the regex ^([0-9]+[abc])+$. So examples of valid strings would be: '1a2b' or '00333b1119a555a0c'
For clarity, the string is a list of (value, letter) pairs and the order matters. I'm stuck with the input string so I can't change that. While testing for correct syntax seems easy in principle with the above regex, I'm trying to think of the most efficient way in PHP to transform a compliant string into a usable array something like this:
Input:
'00333b1119a555a0c'
Output:
array (
0 => array('num' => '00333', 'let' => 'b'),
1 => array('num' => '1119', 'let' => 'a'),
2 => array('num' => '555', 'let' => 'a'),
3 => array('num' => '0', 'let' => 'c')
)
I'm having difficulty using preg_match for this. For example this doesn't give the expected result, the intent being to greedy-match on EITHER \d+ (and save that) OR [abc] (and save that), repeated until end of string reached.
$text = '00b000b0b';
$out = array();
$x = preg_match("/^(?:(\d+|[abc]))+$/", $text, $out);
This didn't work either, the intent here being to greedy-match on \d+[abc] (and save these), repeated until end of string reached, and split them into numbers and letter afterwards.
$text = '00b000b0b';
$out = array();
$x = preg_match("/^(?:\d+[abc])+$/", $text, $out);
I'd planned to check syntax as part of the preg_match, then use the preg_match output to greedy-match the 'blocks' (or keep the delimiters if using preg_split), then if needed loop through the result 2 items at a time using for (...; i+=2) to extract value-letter in their pairs.
But I can't seem to even get that basic preg_split() or preg_match() approach to work smoothly, much less explore if there's a 'neater' or more efficient way.
Your regex needs a few matching groups
/([0-9]+?)([a-z])/i
This captures the digits in one group and the letter in another; preg_match_all() then collects every pair.
The lazy quantifier +? is optional here: [0-9] can never match a letter, so a greedy [0-9]+ captures exactly the same digits.
match[0] is the whole match
match[1] is the first match group (the numbers)
match[2] is the second match group (the letter)
example below
<?php
$input = '00333b1119a555a0c';
$regex = '/([0-9]+?)([a-z])/i';
$out = [];
$parsed = [];
if (preg_match_all($regex, $input, $out)) {
foreach ($out[0] as $index => $value) {
$parsed[] = [
'num' => $out[1][$index],
'let' => $out[2][$index],
];
}
}
var_dump($parsed);
output
array(4) {
[0] =>
array(2) {
'num' =>
string(5) "00333"
'let' =>
string(1) "b"
}
[1] =>
array(2) {
'num' =>
string(4) "1119"
'let' =>
string(1) "a"
}
[2] =>
array(2) {
'num' =>
string(3) "555"
'let' =>
string(1) "a"
}
[3] =>
array(2) {
'num' =>
string(1) "0"
'let' =>
string(1) "c"
}
}
Simple solution with preg_match_all(with PREG_SET_ORDER flag) and array_map functions:
$input = '00333b1119a555a0c';
preg_match_all('/([0-9]+?)([a-z]+?)/i', $input, $matches, PREG_SET_ORDER);
$result = array_map(function($v) {
return ['num' => $v[1], 'let' => $v[2]];
}, $matches);
print_r($result);
The output:
Array
(
[0] => Array
(
[num] => 00333
[let] => b
)
[1] => Array
(
[num] => 1119
[let] => a
)
[2] => Array
(
[num] => 555
[let] => a
)
[3] => Array
(
[num] => 0
[let] => c
)
)
You can use:
$str = '00333b1119a555a0c';
$arr=array();
if (preg_match_all('/(\d+)(\p{L}+)/', $str, $m)) {
array_walk( $m[1], function ($v, $k) use(&$arr, $m ) {
$arr[] = [ 'num'=>$v, 'let'=>$m[2][$k] ]; });
}
print_r($arr);
Output:
Array
(
[0] => Array
(
[num] => 00333
[let] => b
)
[1] => Array
(
[num] => 1119
[let] => a
)
[2] => Array
(
[num] => 555
[let] => a
)
[3] => Array
(
[num] => 0
[let] => c
)
)
All of the above work. But they didn't seem to have the elegance I wanted - they needed to loop, use array mapping, or (for preg_match_all()) they needed another almost identical regex as well, just to verify the string matched the regex.
I eventually found that preg_match_all() combined with named captures solved it for me. I hadn't used named captures for that purpose before and it looks powerful.
I also added an optional extra step to simplify the output if dups aren't expected (which wasn't in the question but may help someone).
$input = '00333b1119a555a0c';
preg_match_all("/(?P<num>\d+)(?P<let>[abc])/", $input, $raw_matches, PREG_SET_ORDER);
print_r($raw_matches);
// if dups not expected this is also worth doing
$matches = array_column($raw_matches, 'num', 'let');
print_r($matches);
More complete version with input+duplicate checking
$input = '00333b1119a555a0c';
if (!preg_match("/^(\d+[abc])+$/",$input)) {
// OPTIONAL: detected $input incorrectly formatted
}
preg_match_all("/(?P<num>\d+)(?P<let>[abc])/", $input, $raw_matches, PREG_SET_ORDER);
$matches = array_column($raw_matches, 'num', 'let');
if (count($matches) != count($raw_matches)) {
// OPTIONAL: detected duplicate letters in $input
}
print_r($matches);
Explanation:
This uses preg_match_all() as suggested by @RomanPerekhrest and @exussum to break out the individual groups and split the numbers and letters. I used named groups so that the resulting array of $raw_matches is created with the correct names already.
But if dups aren't expected, then I used an extra step with array_column(), which directly extracts data from a nested array of entries and creates a desired flat array, without any need for loops, mapping, walking, or assigning item by item: from
(group1 => (num1, let1), group2 => (num2, let2), ... )
to the "flat" array:
(let1 => num1, let2 => num2, ... )
If named captures feel too advanced, they can be left out. The groups are numbered anyway, so this works just as well; you refer to the groups by number instead of by name, which is a little harder to follow.
preg_match_all("/(\d+)([abc])/", $input, $raw_matches, PREG_SET_ORDER);
$matches = array_column($raw_matches, 1, 2);
If you need to check for duplicated letters (which wasn't in the question but could be useful), here's how: array_column() uses each letter as a key of the new array, and duplicate keys can't exist, so if the original matches contained more than one entry for a letter, only one of them is kept. So we just test whether the number of matches originally found is the same as the number of entries in the final array after array_column(). If not, there were duplicates.
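For the array shape requested in the question (a list of num/let pairs in input order, duplicates kept), validation and named captures can be combined roughly as follows. This is a sketch; the function name parse_pairs is hypothetical, and returning null for invalid input is just one possible convention.

```php
<?php
// Hypothetical helper: validate the whole string, then extract ordered pairs.
function parse_pairs(string $input): ?array
{
    // Reject anything that is not one or more (digits, letter) pairs.
    if (!preg_match('/^(?:\d+[abc])+$/D', $input)) {
        return null;
    }
    preg_match_all('/(?P<num>\d+)(?P<let>[abc])/', $input, $raw, PREG_SET_ORDER);

    // Keep only the named keys; each match also stores its groups under
    // numeric keys (0, 1, 2), which are dropped here.
    $wanted = ['num' => true, 'let' => true];
    return array_map(function (array $m) use ($wanted): array {
        return array_intersect_key($m, $wanted);
    }, $raw);
}

print_r(parse_pairs('00333b1119a555a0c'));
```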

Split a single string into an array using specific Regex rules

I'm processing a single string which contains many pairs of data. Each pair is separated by a ; sign. Each pair contains a number and a string, separated by an = sign.
I thought it would be easy to process, but I've found that the string half of the pair can contain the = and ; signs, making simple splitting unreliable.
Here is an example of a problematic string:
123=one; two;45=three=four;6=five;
For this to be processed correctly I need to split it up into an array that looks like this:
'123', 'one; two'
'45', 'three=four'
'6', 'five'
I'm at a bit of dead end so any help is appreciated.
UPDATE:
Thanks to everyone for the help, this is where I am so far:
$input = '123=east; 456=west';
// split matches into array
preg_match_all('~(\d+)=(.*?);(?=\s*(?:\d|$))~', $input, $matches);
$newArray = array();
// extract the relevant data
for ($i = 0; $i < count($matches[2]); $i++) {
$type = $matches[2][$i];
$price = $matches[1][$i];
// add each key-value pair to the new array
$newArray[$i] = array(
'type' => "$type",
'price' => "$price"
);
}
Which outputs
Array
(
[0] => Array
(
[type] => east
[price] => 123
)
)
The second item is missing because it doesn't have a semicolon at the end, and I'm not sure how to fix that.
I've now realised that the numeric part of the pair sometimes contains a decimal point, and that the last pair does not have a semicolon after it. Any hints would be appreciated as I'm not having much luck.
Here is the updated string taking into account the things I missed in my initial question (sorry):
12.30=one; two;45=three=four;600.00=five
You need a look-ahead assertion for this; the look-ahead matches if a ; is followed by a digit or the end of your string:
$s = '12.30=one; two;45=three=four;600.00=five';
preg_match_all('/(\d+(?:\.\d+)?)=(.+?)(?=(;\d|$))/', $s, $matches);
print_r(array_combine($matches[1], $matches[2]));
Output:
Array
(
[12.30] => one; two
[45] => three=four
[600.00] => five
)
I think this is the regex you want:
\s*(\d+)\s*=(.*?);(?=\s*(?:\d|$))
The trick is to consider only the semicolon that's followed by a digit as the end of a match. That's what the lookahead at the end is for.
You can see a detailed visualization on www.debuggex.com.
You can use following preg_match_all code to capture that:
$str = '123=one; two;45=three=four;6=five;';
if (preg_match_all('~(\d+)=(.+?);(?=\d|$)~', $str, $arr))
print_r($arr);
Live Demo: http://ideone.com/MG3BaO
$str = '123=one; two;45=three=four;6=five;';
preg_match_all('/(\d+)=([a-zA-Z ;=]+)/', $str,$matches);
echo '<pre>';
print_r($matches);
echo '</pre>';
Output:
Array
(
[0] => Array
(
[0] => 123=one; two;
[1] => 45=three=four;
[2] => 6=five;
)
[1] => Array
(
[0] => 123
[1] => 45
[2] => 6
)
[2] => Array
(
[0] => one; two;
[1] => three=four;
[2] => five;
)
)
Then you can combine the two arrays:
echo '<pre>';
print_r(array_combine($matches[1],$matches[2]));
echo '</pre>';
Output:
Array
(
[123] => one; two;
[45] => three=four;
[6] => five;
)
Try this. The code is written in C#, but you can port it to PHP:
string[] res = Regex.Split("123=one; two;45=three=four;6=five;", @";(?=\d)");
--SJ
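A PHP port of that split-based idea might look like the sketch below. The function name split_pairs is hypothetical, and the optional \s* after the semicolon is an addition so that input such as '123=east; 456=west' is also split. Like the lookahead answers above, it assumes that a semicolon followed by a digit never occurs inside the text half of a pair.

```php
<?php
// Hypothetical helper: split on ';' only where a digit follows, then cut
// each chunk at its first '='.
function split_pairs(string $input): array
{
    $pairs = [];
    // rtrim() drops an optional trailing ';' so it does not end up in the last value.
    foreach (preg_split('/;\s*(?=\d)/', rtrim($input, ';')) as $chunk) {
        if (strpos($chunk, '=') === false) {
            continue; // skip malformed chunks that lack a separator
        }
        // The limit of 2 keeps any further '=' signs inside the text half.
        [$number, $text] = explode('=', $chunk, 2);
        $pairs[] = [trim($number), $text];
    }
    return $pairs;
}

print_r(split_pairs('12.30=one; two;45=three=four;600.00=five'));
```

The pairs are returned as a list rather than keyed by number, so repeated numbers and decimal values such as 12.30 are preserved exactly as written.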
