Regular expression needed for PHP preg_split - php

I need help with a regular expression in PHP.
I have one string containing a lot of data and the format could be like this.
key=value,e4354ahj\,=awet3,asdfa\=asdfa=23f23
So I have two delimiters, , and =, where , separates the key/value pairs and = separates a key from its value. The catch is that keys and values can contain the same symbols , and =, but those will always be escaped with a backslash. So I can't use explode. I need to use preg_split, but I am no good at regular expressions.
Could someone give me a hand with this one?

You need to use negative lookbehind:
// 4 backslashes because they are in a PHP string, so PHP translates them to \\
// and then the regex engine translates the \\ to a literal \
$keyValuePairs = preg_split('/(?<!\\\\),/', $input);
This will split on every , that is not escaped, so you get key-value pairs. You can do the same for each pair to separate the key and value:
list($key, $value) = preg_split('/(?<!\\\\)=/', $pair);
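Putting the two splits together might look like the sketch below. It assumes every pair contains an unescaped =, and the unescapeDelimiters helper is a hypothetical addition (not part of the answer) that strips the backslash in front of an escaped , or = once the splitting is done.

```php
<?php
// Sample input from the question; single quotes keep the backslashes literal.
$input = 'key=value,e4354ahj\,=awet3,asdfa\=asdfa=23f23';

// Hypothetical helper: remove the backslash in front of an escaped "," or "=".
function unescapeDelimiters($text)
{
    return preg_replace('/\\\\([,=])/', '$1', $text);
}

$result = array();
foreach (preg_split('/(?<!\\\\),/', $input) as $pair) {
    // Limit 2: split only at the first unescaped "=".
    list($key, $value) = preg_split('/(?<!\\\\)=/', $pair, 2);
    $result[unescapeDelimiters($key)] = unescapeDelimiters($value);
}

print_r($result);
```

Note that the lookbehind only checks the single character before the delimiter, so an escaped backslash directly in front of a delimiter (\\,) would still be treated as an escaped comma.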

@Jon's answer is awesome. I thought of providing a solution by matching the string instead:
preg_match_all('#(.*?)(?<!\\\\)=(.*?)(?:(?<!\\\\),|$)#', $string, $m);
// You'll find the keys in $m[1] and the values in $m[2]
$array = array_combine($m[1], $m[2]);
print_r($array);
Output:
Array
(
[key] => value
[e4354ahj\,] => awet3
[asdfa\=asdfa] => 23f23
)
Explanation:
(.*?)(?<!\\\\)= : match anything and group it until = not preceded by \
(.*?)(?:(?<!\\\\),|$) : match anything and group it until , not preceded by \ or end of line.

Related

php RegEx extract values from string

I am new to regular expressions and I am trying to extract some specific values from this string:
"Iban: EU4320000713864374\r\nSwift: DTEADCCC\r\nreg.no 2361 \r\naccount no. 1234531735"
Values that I am trying to extract:
EU4320000713864374
2361
This is what I am trying to do now:
preg_match('/[^Iban: ](?<iban>.*)[^\\r\\nreg.no ](?<regnr>.*)[^\\r\\n]/',$str,$matches);
All I am getting back is null or empty array. Any suggestions would be highly appreciated
The square brackets make no sense here: [^...] is a negated character class, not an anchor. Perhaps you meant to anchor at the beginning of a line:
$result = preg_match(
'/^Iban: (?<iban>.*)\R.*\R^reg.no (?<regnr>.*)/m'
, $str, $matches
);
This requires setting the multi-line modifier (see the m at the very end). I also replaced \r\n with \R so that this handles all kinds of line-separator sequences easily.
Example: https://eval.in/47062
A slightly better variant captures only non-whitespace values:
$result = preg_match(
'/^Iban: (?<iban>\S*)\R.*\R^reg.no (?<regnr>\S*)/m'
, $str, $matches
);
Example: https://eval.in/47069
Result then is (beautified):
Array
(
[0] => "Iban: EU4320000713864374
Swift: DTEADCCC
reg.no 2361"
[iban] => "EU4320000713864374"
[1] => "EU4320000713864374"
[regnr] => "2361"
[2] => "2361"
)
preg_match("/Iban: (\\S+).*reg.no (\\S+)/s", $str, $matches);
Note one detail about newlines: the dot (.) does not match a newline character unless the s flag is specified.
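The effect of the s flag can be checked with a small sketch that runs the same pattern with and without it on the string from the question. The variable names are illustrative only, and the dot in reg.no is escaped here so that it matches a literal dot.

```php
<?php
$str = "Iban: EU4320000713864374\r\nSwift: DTEADCCC\r\nreg.no 2361 \r\naccount no. 1234531735";

// Without the s flag, .* cannot cross the line feed, so the pattern never reaches "reg.no".
$withoutS = preg_match('/Iban: (\S+).*reg\.no (\S+)/', $str, $m1);

// With the s flag, the dot also matches newlines and both values are captured.
$withS = preg_match('/Iban: (\S+).*reg\.no (\S+)/s', $str, $m2);

var_dump($withoutS, $withS);
echo $m2[1], "\n", $m2[2], "\n";
```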

Php regexp for escaping characters

I have a string that the user may split manually using commas.
For example, the string value1,value2,value3 should result in the array:
["value1", "value2", "value3"]
Now what if the user wishes to allow a comma as a substring? I would like to solve that problem by letting the user escape a comma using two commas or a backslash. For example, the string
"Hi, Stackoverflow" would be written as "Hi,, Stackoverflow" or "Hi\, Stackoverflow".
I find it difficult to evaluate such a string, however. I have attempted splitting with preg_split, but there is no way to see whether a lookbehind or lookahead series of characters consists of an even or odd number. Furthermore, backslashes and double commas meant for escaping must be removed as well, which probably requires an additional replace function.
$text = 'Hello, World \,asdas, 123';
$data = preg_split('/(?<=[^\\\]),/',$text);
print_r($data);
Result
Array ( [0] => Hello [1] => World \,asdas [2] => 123 )
For this I would run preg_replace_callback, which lets you count the escape characters used and decide what to do with them. If it turns out that a comma is not escaped, replace it with some non-printable character that the user should not use in the input, and then explode by this character:
<?php
$str = "One,Two\\, Two\\\\,Three";
$delimiter = chr(0x0B); // vertical tab, hope you do not expect it in the input?
$escaped = preg_replace_callback('/(\\\\)*,?/', function($m) use($delimiter){
    if(!isset($m[1]) || strlen($m[0])%2) {
        return str_replace(',',$delimiter,preg_replace('/\\\\{2}/','\\',$m[0]));
    } else {
        return str_replace('\\,',',', preg_replace('/\\\\{2}/','\\',$m[0]));
    }
}, $str);
$array = explode($delimiter, $escaped);
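A different approach, sketched here as an alternative rather than as the method above, is to match the fields instead of splitting on the delimiter. The splitEscaped helper is hypothetical. It treats a backslash plus any character, or a doubled comma, as part of a field, and then removes the escaping. It assumes that empty fields can be dropped and that a run of commas is paired from the left.

```php
<?php
// Hypothetical helper, not part of the answer above.
function splitEscaped($text)
{
    // A field is a run of: backslash + any character, a doubled comma,
    // or any character that is neither a comma nor a backslash.
    preg_match_all('/(?:\\\\.|,,|[^,\\\\])+/s', $text, $matches);

    $fields = array();
    foreach ($matches[0] as $field) {
        // Unescape: "\x" becomes "x" and ",," becomes ",".
        $fields[] = preg_replace('/\\\\(.)|,(,)/s', '$1$2', $field);
    }
    return $fields;
}

print_r(splitEscaped('One,Two\, Two\\\\,Three'));
print_r(splitEscaped('Hi,, Stackoverflow,next'));
```

Because no placeholder character is needed, nothing has to be reserved in the user's input. A lone backslash at the very end of the string is silently dropped by this sketch.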

Regular expression to parse pipe-delimited data enclosed in double braces

I'm trying to match a string like this:
{{name|arg1|arg2|...|argX}}
with a regular expression
I'm using preg_match with
/{{(\w+)\|(\w+)(?:\|(.+))*}}/
but I get something like this, whenever I use more than two args
Array
(
[0] => {{name|arg1|arg2|arg3|arg4}}
[1] => name
[2] => arg1
[3] => arg2|arg3|arg4
)
The first two items cannot contain spaces, the rest can.
Perhaps I'm working too long on this, but I can't find the error - any help would be greatly appreciated.
Thanks Jan
Don't use regular expressions for this kind of simple task. What you really need is:
$inner = substr($string, 2, -2);
$parts = explode('|', $inner);
# And if you want to make sure the string has opening/closing braces:
$length = strlen($string);
assert($string[0] === '{');
assert($string[1] === '{');
assert($string[$length - 1] === '}');
assert($string[$length - 2] === '}');
The problem is here: \|(.+)
Regular expressions, by default, match as many characters as possible. Since . is any character, other instances of | are happily matched too, which is not what you would like.
To prevent this, you should exclude | from the expression, saying "match anything except |", resulting in \|([^\|]+).
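The difference between the greedy dot and the negated character class can be seen in a short sketch. The sample string is just the argument tail from the question.

```php
<?php
$args = '|arg2|arg3|arg4';

// Greedy dot: runs across the remaining pipes.
preg_match('/\|(.+)/', $args, $greedy);

// Negated class: stops before the next pipe.
preg_match('/\|([^|]+)/', $args, $negated);

echo $greedy[1], "\n";   // arg2|arg3|arg4
echo $negated[1], "\n";  // arg2
```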
This should work for anywhere from zero to N arguments:
<?php
$pattern = "/^\{\{([a-z]+)(?:\}\}$|(?:\|([a-z]+))(?:\|([a-z ]+))*\}\}$)/i";
$tests = array(
"{{name}}" // should pass
, "{{name|argOne}}" // should pass
, "{{name|argOne|arg Two}}" // should pass
, "{{name|argOne|arg Two|arg Three}}" // should pass
, "{{na me}}" // should fail
, "{{name|arg One}}" // should fail
, "{{name|arg One|arg Two}}" // should fail
, "{{name|argOne|arg Two|arg3}}" // should fail
);
foreach ( $tests as $test )
{
if ( preg_match( $pattern, $test, $matches ) )
{
echo $test, ': Matched!<pre>', print_r( $matches, 1 ), '</pre>';
} else {
echo $test, ': Did not match =(<br>';
}
}
Of course you would get something like this :) A regular expression cannot return a dynamic number of capture groups, which is what your arguments would need.
Looking at what you want to do, you should keep your current regular expression and just explode the extra args by '|' and add them to an args array.
Indeed, this is from the PCRE manual:
When a capturing subpattern is repeated, the value captured is the substring that matched the final iteration. For example, after (tweedle[dume]{3}\s*)+ has matched "tweedledum tweedledee" the value of the captured substring is "tweedledee". However, if there are nested capturing subpatterns, the corresponding captured values may have been set in previous iterations. For example, after /(a|(b))+/ matches "aba" the value of the second captured substring is "b".
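The behaviour described in the manual, and the explode workaround suggested above, can be sketched as follows. The pattern in the second half is an assumption of mine, not the asker's original: it captures the whole argument tail in one group and does not enforce the rule that the first argument must not contain spaces.

```php
<?php
// Only the final iteration of a repeated group is kept.
preg_match('/(tweedle[dume]{3}\s*)+/', 'tweedledum tweedledee', $m);
echo $m[1], "\n"; // tweedledee

// Workaround sketch: capture the whole argument tail once, then explode it.
$string = '{{name|arg1|arg 2|arg 3}}';
$name = null;
$args = array();
if (preg_match('/^\{\{(\w+)((?:\|[^|{}]+)*)\}\}$/', $string, $parts)) {
    $name = $parts[1];
    // $parts[2] is "" or "|arg1|arg 2|..."; drop the leading pipe before exploding.
    $args = $parts[2] === '' ? array() : explode('|', substr($parts[2], 1));
}
print_r($args);
```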

Simple question, comma delimited id's to array in php 5.2

I've got a comma delimited string of id's coming in and I need some quick way to split them into an array.
I know that I could hardcode it, but that's just gross and pointless.
I know nothing about regex at all and I can't find a SIMPLE example anywhere on the internet, only huge tutorials trying to teach me how to master regular expressions in 2 hours or something.
fgetcsv is only applicable for a file and str_getcsv is only available in PHP 5.3 and greater.
So, am I going to have to write this by hand or is there something out there that will do it for me?
I would prefer a simple regex solution with a little explanation as to why it does what it does.
$string = "1,3,5,9,11";
$array = explode(',', $string);
See explode()
Returns an array of strings, each of which is a substring of string formed by splitting it on boundaries formed by the string delimiter.
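Any problem with the normal split function? (Note that split() was deprecated in PHP 5.3 and removed in PHP 7, so explode() or preg_split() is the safer choice.)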
$array = split(',', 'One,Two,Three');
will give you
Array
(
[0] => One
[1] => Two
[2] => Three
)
If you want to just split on commas:
$values = explode(",", $string);
If you also want to get rid of whitespace around the commas (eg: your string is 1, 3, 5)
$values = preg_split('/\s*,\s*/', $string);
If you want to be able to have commas in your string when surrounded by quotes, (eg: first, "se,cond", third)
$regex = <<<ENDOFREGEX
/ " ( (?:[^"\\\\]++|\\\\.)*+ ) \"
| ' ( (?:[^'\\\\]++|\\\\.)*+ ) \'
| ,+
/x
ENDOFREGEX;
$values = preg_split($regex, $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
A simple regular expression should do the trick.
$a_ids = preg_split('%,%', $ids);
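For a list of ids specifically, the whitespace-tolerant split can be combined with a cast to integers. This is a minimal sketch: the sample string and variable names are assumptions, and it presumes every non-empty piece is numeric.

```php
<?php
$ids = ' 1, 3 ,5,, 9 ';

// \s*,\s* swallows whitespace around each comma; PREG_SPLIT_NO_EMPTY
// drops the empty piece produced by the doubled comma.
$parts = preg_split('/\s*,\s*/', trim($ids), -1, PREG_SPLIT_NO_EMPTY);

// Convert the string pieces to integers.
$idList = array_map('intval', $parts);

print_r($idList);
```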

Split text using multiple delimiters into an array of trimmed values

I've got a group of strings which I need to chunk into an array.
The string needs to be split on either /, ,, with, or &.
Unfortunately it is possible for a string to contain two of the strings which needs to be split on, so I can't use split() or explode().
For example, a string could say first past/ going beyond & then turn, so I am trying to get an array that would return:
array('first past', 'going beyond', 'then turn')
The code I am currently using is
$splittersArray=array('/', ',', ' with ','&');
foreach($splittersArray as $splitter){
if(strpos($string, $splitter)){
$splitString = split($splitter, $string);
foreach($splitString as $split){
I can't seem to find a function in PHP that allows me to do this.
Do I need to be passing the string back into the top of the funnel, and continue to go through the foreach() after the string has been split again and again?
This doesn't seem very efficient.
Use a regular expression and preg_split.
In the case you mention, you would get the split array with:
$splitString = preg_split('/(\/|\,| with |\&)/', $string);
To concisely write the pattern use a character class for the single-character delimiters and add the with delimiter as a value after the pipe (the "or" character in regex). Allow zero or more spaces on either side of the group of delimiters so that the values in the output don't need to be trimmed.
I am using the PREG_SPLIT_NO_EMPTY function flag in case a delimiter occurs at the start or end of the string and you don't want to have any empty elements generated.
Code:
$string = 'first past/ going beyond & then turn with everyone';
var_export(
preg_split('~ ?([/,&]|with) ?~', $string, 0, PREG_SPLIT_NO_EMPTY)
);
Output:
array (
0 => 'first past',
1 => 'going beyond',
2 => 'then turn',
3 => 'everyone',
)
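One caveat: a bare with also matches inside longer words such as without. A possible variant, offered as a sketch rather than as the answer's pattern, adds word boundaries around with and uses \s* to trim any amount of surrounding whitespace. The sample string is extended here to show the difference.

```php
<?php
$string = 'first past/ going beyond & then turn with everyone without fail';

// \bwith\b only matches "with" as a whole word, so "without" is left intact.
$parts = preg_split('~\s*(?:[/,&]|\bwith\b)\s*~', $string, -1, PREG_SPLIT_NO_EMPTY);

var_export($parts);
```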
