Whitespace delimiter not being captured in preg_split - php

<?php
$text = "Testing text splitting\nWith a newline!";
$textArray = preg_split('/\s+/', $text, 0, PREG_SPLIT_DELIM_CAPTURE);
print_r($textArray);
The above code will output the following:
Array
(
[0] => Testing
[1] => text
[2] => splitting
[3] => With
[4] => a
[5] => newline!
)
However, to my knowledge, the PREG_SPLIT_DELIM_CAPTURE flag should capture the whitespace delimiters in the array. Am I missing something?
edit: Ok, after rereading the documentation I now understand PREG_SPLIT_DELIM_CAPTURE is not meant for this case. My desired output would be something like:
Array
(
[0] => Testing
[1] => ' '
[2] => text
[3] => ' '
[4] => splitting
[5] => '\n'
[6] => With
[7] => ' '
[8] => a
[9] => ' '
[10] => newline!
)

Read the manual entry for PREG_SPLIT_DELIM_CAPTURE once more. It says:
If this flag is set, parenthesized expression in the delimiter pattern will be captured and returned as well.
In other words, the delimiter (in your case \s+) is captured, i.e. added to the result, only when it is wrapped in parentheses. So you can write:
$text = "Testing text splitting\nWith a newline!";
$textArray = preg_split('/(\s+)/', $text, 0, PREG_SPLIT_DELIM_CAPTURE);
// parentheses!
print_r($textArray);
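As a quick check of the difference, here is a minimal sketch that runs the pattern both with and without the capture group. The variable names $without and $with are only illustrative.

```php
<?php
$text = "Testing text splitting\nWith a newline!";

// No parentheses: delimiters are dropped even though the flag is set.
$without = preg_split('/\s+/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

// Parentheses around \s+: each run of whitespace becomes its own element.
$with = preg_split('/(\s+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

var_dump(count($without));              // int(6)
var_dump(count($with));                 // int(11)
var_dump(implode('', $with) === $text); // bool(true)
```

Because every delimiter is kept, joining the second array with implode('', ...) gives back the original string.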

You can also use T-Regx library:
$textArray = pattern('(\s+)')->split("Testing text splitting\nWith a newline!")->inc();

Related

regex split a string between [ and ]

My string looks like '[15][18][22]' and I would like to split it into an array of [15], [18] and [22]. I'm trying this regex:
\[\d+\]
But it only splits off the first one.
Thanks for any help.
You are better off using preg_match_all with what you want to capture:
if (preg_match_all('/\[\d+]/', $str, $m)) {
    print_r($m[0]);
}
Output:
Array
(
[0] => [15]
[1] => [18]
[2] => [22]
)
Or else you may use this preg_split with a capture group:
$str = '[15][18][22]';
$arr = preg_split('/(\[\d+])/', $str, -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($arr);
Output:
Array
(
[0] => [15]
[1] => [18]
[2] => [22]
)
It doesn't get much simpler than this: three characters in the pattern. You only need to split on the zero-width position after each ]. \K tells the regex engine to discard what it has matched so far, so the reported match is empty and no characters are consumed as the delimiter.
Pattern: ~]\K~
Code:
$string = '[15][18][22]';
var_export(preg_split('~]\K~', $string, -1, PREG_SPLIT_NO_EMPTY));
Output:
array (
0 => '[15]',
1 => '[18]',
2 => '[22]',
)
This should be efficient, because the pattern has no capture groups, lookarounds, or alternation to slow it down.
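To compare the three answers side by side, here is a small sketch that runs each approach on the same input. The variable names are made up for illustration.

```php
<?php
$str = '[15][18][22]';

// 1. Collect the matches directly.
preg_match_all('/\[\d+]/', $str, $m);
$viaMatchAll = $m[0];

// 2. Split on the bracketed number, keeping it via the capture group.
$viaCapture = preg_split('/(\[\d+])/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

// 3. Split on the empty position after each ].
$viaK = preg_split('~]\K~', $str, -1, PREG_SPLIT_NO_EMPTY);

var_dump($viaMatchAll === $viaCapture); // bool(true)
var_dump($viaCapture === $viaK);        // bool(true)
```

The results only agree on input shaped like this one. If other text sits between the brackets, preg_match_all drops it, while both preg_split variants keep it in some form.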

Sscanf with regex to match even an empty string

I am using this format string in sscanf:
sscanf($seat, "%d-%[^(](%[^#]#%[^)])");
It works well when I get strings like this:
173-9B(AA#3.45 EUR#32H)
but when I get a string like this:
173-9B(#3.14 EUR#32H)
the result is all messed up. How can I also accept an empty string between the first ( and the first #?
You would be better off using a regex with preg_match, which can handle optional data in the input:
$re = '/(\d*)-([^(]*)\(([^#]*)#([^)]*)\)/';
preg_match($re, '173-9B(#3.45 EUR#32H)', $m);
unset($m[0]);
print_r($m);
Output:
Array
(
[1] => 173
[2] => 9B
[3] =>
[4] => 3.45 EUR#32H
)
And 2nd example:
preg_match($re, '173-9B(AA#3.45 EUR#32H)', $m);
unset($m[0]);
print_r($m);
Array
(
[1] => 173
[2] => 9B
[3] => AA
[4] => 3.45 EUR#32H
)
Using ([^#]*) makes it match 0 or more characters that are not #.
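If the parsing is needed in more than one place, the regex can be wrapped in a small function. This is only a sketch; parseSeat is a hypothetical name, not something from the question.

```php
<?php
// Hypothetical helper: returns the four captured fields,
// or null when the string does not match the expected shape.
function parseSeat($seat)
{
    $re = '/(\d*)-([^(]*)\(([^#]*)#([^)]*)\)/';
    if (!preg_match($re, $seat, $m)) {
        return null;
    }
    return array_slice($m, 1); // drop the full match at index 0
}

var_export(parseSeat('173-9B(AA#3.45 EUR#32H)'));
var_export(parseSeat('173-9B(#3.14 EUR#32H)'));
```

Checking the return value of preg_match avoids working with a stale or empty $m when the input is malformed.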

php regex split string by [%%%]

Hi I need a preg_split regex that will split a string at substrings in square brackets.
This example input:
$string = 'I have a string containing [substrings] in [brackets].';
should provide this array output:
[0]= 'I have a string containing '
[1]= '[substrings]'
[2]= ' in '
[3]= '[brackets]'
[4]= '.'
After reading your revised question:
This might be what you want:
$string = 'I have a string containing [substrings] in [brackets].';
preg_split('/(\[.*?\])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
You should get:
Array
(
[0] => I have a string containing
[1] => [substrings]
[2] => in
[3] => [brackets]
[4] => .
)
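print_r makes the leading and trailing spaces in each piece hard to see. A minimal sketch using var_export (the $parts variable is just an illustrative name) shows them explicitly:

```php
<?php
$string = 'I have a string containing [substrings] in [brackets].';

// The capture group keeps each [..] delimiter as its own element.
$parts = preg_split('/(\[.*?\])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);

// var_export quotes each element, so the surrounding spaces stay visible.
var_export($parts);
```

The text pieces keep their whitespace, so ' in ' comes back with a space on each side, as in the requested output.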
Original answer:
preg_split('/%+/i', 'ot limited to 3 %%% so it can be %%%% or % or %%%%%, etc Tha');
You should get:
Array
(
[0] => ot limited to 3
[1] => so it can be
[2] => or
[3] => or
[4] => , etc Tha
)
Or if you want a minimum of 3, then try:
preg_split('/%%%+/i', 'Not limited to 3 %%% so it can be %%%% or % or %%%%%, etc Tha');
Have a go at http://regex.larsolavtorvik.com/
I think this is what you are looking for:
$array = preg_split('/(\[.*?\])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);

preg_split() problem with strings containing '&'

I am using preg_split() to get an array of sentences from a string.
$sentences = preg_split("/([.?!\r\n]+)/", $text, 0, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
But when $text contains '&', for example:
$text = 'this is test. we are testing this & we are over.';
then it stops matching after the '&'.
Your preg_split handles sentences with ampersands correctly, for example:
$text = 'Sample sentence. Another sentence! Sentence with the special character & (ampersand). Last sentence.';
$sentences = preg_split("/([.?!\r\n]+)/", $text, 0, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
print_r($sentences);
Output:
Array
(
[0] => Sample sentence
[1] => .
[2] => Another sentence
[3] => !
[4] => Sentence with the special character & (ampersand)
[5] => .
[6] => Last sentence
[7] => .
)
Your Script:
$text = 'this is test. we are testing this & we are over.';
$sentences = preg_split("/([.?!\r\n]+)/", $text, 0, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
echo '<pre>'.print_r($sentences, true).'</pre>';
My Output:
Array
(
[0] => this is test
[1] => .
[2] => we are testing this & we are over
[3] => .
)
I don't understand your problem.

REGEX: Splitting by commas that are not in single quotes, allowing for escaped quotes

I am looking for a regular expression using preg_match_all in PHP 5 that would allow me to split a string by commas, so long as the commas do not exist inside single quotes, allowing for escaped single quotes. Example data would be:
(some_array, 'some, string goes here','another_string','this string may contain "double quotes" but, it can\'t split, on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')
This should produce a match that looks like this:
(some_array
'some, string goes here'
'another_string'
'this string may contain "double quotes" but, it can\'t split, on escaped single quotes'
anonquotedstring
83448545
1210597346 + '000'
1241722133 + '000')
I've tried many, many regexes... My current one looks like this, although it doesn't match 100% correctly. (It still splits some commas inside single quotes.)
"/'(.*?)(?<!(?<!\\\)\\\)'|[^,]+/"
Have you tried str_getcsv? It does exactly what you need without a regular expression.
$result = str_getcsv($str, ",", "'");
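You can even provide this function in PHP versions older than 5.3 by mapping it to fgetcsv, using this snippet from a comment in the docs:
if (!function_exists('str_getcsv')) {
    // Fallback for PHP < 5.3; $escape and $eol are accepted but ignored.
    function str_getcsv($input, $delimiter = ',', $enclosure = '"', $escape = null, $eol = null) {
        // Write the input to an in-memory stream so fgetcsv can read it.
        $temp = fopen('php://memory', 'r+');
        fwrite($temp, $input);
        fseek($temp, 0);
        // 4096 is the maximum line length read; longer lines are not read in full.
        $r = fgetcsv($temp, 4096, $delimiter, $enclosure);
        fclose($temp);
        return $r;
    }
}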
From PHP 5.3 onwards you can save yourself that pain with str_getcsv:
$data=str_getcsv($input, ",", "'");
To take your example...
$input=<<<STR
(some_array, 'some, string goes here','another_string','this string may contain "double quotes" but it can\'t split on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')
STR;
$data=str_getcsv($input, ",", "'");
print_r($data);
Outputs this
Array
(
[0] => (some_array
[1] => some, string goes here
[2] => another_string
[3] => this string may contain "double quotes" but it can\'t split on escaped single quotes
[4] => anonquotedstring
[5] => 83448545
[6] => 1210597346 + '000'
[7] => 1241722133 + '000')
)
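A smaller input makes the behaviour easier to verify. This is only a sketch: the sample string is made up for illustration, and the escape character is passed explicitly because recent PHP versions deprecate relying on its default.

```php
<?php
// Comma-delimited input with a single-quoted field; the comma inside the quotes must survive.
$input = "alpha,'beta, gamma',delta";

// Arguments: delimiter ",", enclosure "'", escape "\".
$fields = str_getcsv($input, ",", "'", "\\");

print_r($fields);
```

Note that the enclosing quotes are stripped from quoted fields, as in the output above, so this differs slightly from the result the question asked for.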
With some look-behind, you can get something close to what you want:
$test = "(some_array, 'some, string goes here','another_string','this string may contain \"double quotes\" but, it can\'t split, on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')";
preg_match_all('`
(?:[^,\']|
\'((?<=\\\\)\'|[^\'])*\')*
`x', $test, $result);
print_r($result);
Gives you this result:
Array
(
[0] => Array
(
[0] => (some_array
[1] =>
[2] => 'some, string goes here'
[3] =>
[4] => 'another_string'
[5] =>
[6] => 'this string may contain "double quotes" but, it can\'t split, on escaped single quotes'
[7] =>
[8] => anonquotedstring
[9] =>
[10] => 83448545
[11] =>
[12] => 1210597346 + '000'
[13] =>
[14] => 1241722133 + '000')
[15] =>
)
[1] => Array
(
[0] =>
[1] =>
[2] => e
[3] =>
[4] => g
[5] =>
[6] => s
[7] =>
[8] =>
[9] =>
[10] =>
[11] =>
[12] => 0
[13] =>
[14] => 0
[15] =>
)
)
I second the use of a CSV parser here; that's what they are there for.
If you're stuck with regex, you could use
preg_match_all(
'/\s*" # either match " (optional preceding whitespace),
(?:\\\\. # followed either by an escaped character
| # or
[^"] # any character except "
)* # any number of times,
"\s* # followed by " (and optional whitespace).
| # Or: do the same thing for single-quoted strings.
\s*\'(?:\\\\.|[^\'])*\'\s*
| # Or:
[^,]* # match anything except commas (i.e. any remaining unquoted strings)
/x',
$subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
But, as you can see, this is ugly and hard to maintain. Use the right tool for the job.
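For reference, here is one way the pattern above might be exercised end to end. This is a sketch with a shortened sample string; the pattern is the same three alternatives collapsed onto one line. The empty matches it filters out come from the [^,]* branch, which can match nothing at each comma and at the end of the input.

```php
<?php
// Nowdoc strings avoid a second layer of PHP escaping for the backslashes and quotes.
$pattern = <<<'RE'
/\s*"(?:\\.|[^"])*"\s*|\s*'(?:\\.|[^'])*'\s*|[^,]*/
RE;

$subject = <<<'TXT'
(some_array, 'some, string goes here', 'it can\'t split, here', 1210597346 + '000')
TXT;

preg_match_all($pattern, $subject, $result, PREG_PATTERN_ORDER);

// Trim whitespace and drop the empty matches produced at each comma.
$parts = array_values(array_filter(array_map('trim', $result[0]), 'strlen'));

print_r($parts);
```

Unlike str_getcsv, this keeps the surrounding quotes on each quoted item, which is closer to the output requested in the question.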
