i'm trying to use php to split a string into array components using either " or ' as the delimiter. i just want to split by the outermost string. here are four examples and the desired result for each:
$pattern = "?????";
$str = "the cat 'sat on' the mat";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
[0] => the cat
[1] => 'sat on'
[2] => the mat
)*/
$str = "the cat \"sat on\" the mat";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
[0] => the cat
[1] => "sat on"
[2] => the mat
)*/
$str = "the \"cat 'sat' on\" the mat";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
[0] => the
[1] => "cat 'sat' on"
[2] => the mat
)*/
$str = "the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
[0] => the
[1] => 'cat "sat" on'
[2] => the mat
[3] => 'when "it" was'
[4] => seventeen
)*/
as you can see i only want to split by the outermost quotation, and i want to ignore any quotations within quotations.
the closest i have come up with for $pattern is
$pattern = "/((?P<quot>['\"])[^(?P=quot)]*?(?P=quot))/";
but obviously this is not working.
You can use preg_split with the PREG_SPLIT_DELIM_CAPTURE option. The regular expressions is not quite as elegant as #Jan TuroĊ's back reference approach because the required capture group messes up the results.
$str = "the 'cat \"sat\" on' the mat the \"cat 'sat' on\" the mat";
$match = preg_split("/('[^']*'|\"[^\"]*\")/U", $str, null, PREG_SPLIT_DELIM_CAPTURE);
print_r($match);
You can use just preg_match for this:
$str = "the \"cat 'sat' on\" the mat";
$pattern = '/^([^\'"]*)(([\'"]).*\3)(.*)$/';
if (preg_match($pattern, $str, $matches)) {
printf("[initial] => %s\n[quoted] => %s\n[end] => %s\n",
$matches[1],
$matches[2],
$matches[4]
);
}
This prints:
[initial] => the
[quoted] => "cat 'sat' on"
[end] => the mat
Here is an explanation of the regex:
/^([^\'"]*) => put the initial bit until the first quote (either single or double) in the first captured group
(([\'"]).*\3) => capture in \2 the text corresponding from the initial quote (either single or double) (that is captured in \3) until the closing quote (that must be the same type as the opening quote, hence the \3). The fact that the regexp is greedy by nature helps to get from the first quote to the last one, regardless of how many quotes are inside.
(.*)$/ => Capture until the end in \4
Yet another solution using preg_replace_callback
$result1 = array();
function parser($p) {
global $result1;
$result1[] = $p[0];
return "|"; // temporary delimiter
}
$str = "the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen";
$str = preg_replace_callback("/(['\"]).*\\1/U", "parser", $str);
$result2 = explode("|",$str); // using temporary delimiter
Now you can zip those arrays using array_map
$result = array();
function zipper($a,$b) {
global $result;
if($a) $result[] = $a;
if($b) $result[] = $b;
}
array_map("zipper",$result2,$result1);
print_r($result);
And the result is
[0] => the
[1] => 'cat "sat" on'
[2] => the mat
[3] => 'when "it" was'
[4] => seventeen
Note: I'd would be probably better to create a class doing this feat, so the global variables can be avoided.
You can use back references and ungreedy modifier in preg_match_all
$str = "the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen";
preg_match_all("/(['\"])(.*)\\1/U", $str, $match);
print_r($match[0]);
Now you have your outermost quotation parts
[0] => 'cat "sat" on'
[1] => 'when "it" was'
And you can find the rest of the string with substr and strpos (kind of blackbox solution)
$a = $b = 0; $result = array();
foreach($match[0] as $part) {
$b = strpos($str,$part);
$result[] = substr($str,$a,$b-$a);
$result[] = $part;
$a = $b+strlen($part);
}
$result[] = substr($str,$a);
print_r($result);
Here is the result
[0] => the
[1] => 'cat "sat" on'
[2] => the mat
[3] => 'when "it" was'
[4] => seventeen
Just strip eventual empty heading/trailing element if the quotation is at the very beginning/end of the string.
Related
I'm a bit lost with preg_split() in parsing a string with multiple delimiters and keeping the delimiter in the 'after' part of the split.
My delimiters are $, #, and ?.
For instance:
$str = 'participant-$id#-group';
$ar = preg_split('/([^$#?]+[$#?]+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
echo "<pre>"; print_r( $ar); echo "</pre>";
will show:
Array
(
[0] => participant_data-$
[1] => id#
[2] => -group
)
However I need:
Array
(
[0] => participant_data-
[1] => $id
[2] => #-group
)
Regex makes my brain hurt. so could someone advise how I use PREG_SPLIT_DELIM_CAPTURE and keep the delimiter at the beginning of the segment?
Try this:
$ar = preg_split('/(\$[^#]+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
How about this. I am capturing the delimiters and then put them back together.
<?php
$str = 'participant-$id#-group';
$ar = preg_split('/([^$#?]+[^$#?]+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
echo "<pre>"; print_r( $ar); echo "</pre>";
/*
Array
(
[0] => participant-
[1] => $
[2] => id
[3] => #
[4] => -group
) */
$result = array();
$result[] = $ar[0];
for($i=1;$i<count($ar);$i+=2) {
$result[] = $ar[$i] . $ar[$i+1];
}
echo "<pre>"; print_r( $result); echo "</pre>";
/*
Array
(
[0] => participant-
[1] => $id
[2] => #-group
)
*/
?>
You don't need to capture the delimiters, just use a lookahead for the $, #, or ?. This will split your string on the zero-width position before the delimiters. No characters will be lost/consumed why exploding.
Code: (Demo)
$str = 'participant-$id#-group';
var_export(
preg_split('/(?=[$#?])/', $str)
);
Output:
array (
0 => 'participant-',
1 => '$id',
2 => '#-group',
)
Here is my code, which currently does not working properly. How can I make it working? My wish is to make output like one string (of course I know how to "convert" array to string):
words altered, added, and removed to make it
Code:
<?php
header('Content-Type: text/html; charset=utf-8');
$text = explode(" ", strip_tags("words altered added and removed to make it"));
$stack = array();
$words = array("altered", "added", "something");
foreach($words as $keywords){
$check = array_search($keywords, $text);
if($check>(-1)){
$replace = " ".$text[$check].",";
$result = str_replace($text[$check], $replace, $text);
array_push($stack, $result);
}
}
print_r($stack);
?>
Output:
Array
(
[0] => Array
(
[0] => words
[1] => altered,
[2] => added
[3] => and
[4] => removed
[5] => to
[6] => make
[7] => it
)
[1] => Array
(
[0] => words
[1] => altered
[2] => added,
[3] => and
[4] => removed
[5] => to
[6] => make
[7] => it
)
)
Without more explanation it's as simple as this:
$text = strip_tags("words altered added and removed to make it");
$words = array("altered", "added", "something");
$result = $text;
foreach($words as $word) {
$result = str_replace($word, "$word,", $result);
}
Don't explode your source string
Loop the words and replace the word with the word and added comma
Or abandoning the loop approach:
$text = strip_tags("words altered added and removed to make it");
$words = array("altered", "added", "something");
$result = preg_replace('/('.implode('|', $words).')/', '$1,', $text);
Create a pattern by imploding the words on the alternation (OR) operator |
Replace found word with the word $1 and a comma
You can use an iterator
// Array with your stuff.
$array = [];
$iterator = new RecursiveIteratorIterator(new RecursiveArrayIterator($array));
foreach($iterator as $value) {
echo $v, " ";
}
Your original approach should work with a few modifications. Loop over the words in the exploded string instead. For each one, if it is in the array of words to modify, add the comma. If not, don't. Then the modified (or not) word goes on the stack.
$text = explode(" ", strip_tags("words altered added and removed to make it"));
$words = array("altered", "added", "something");
foreach ($text as $word) {
$stack[] = in_array($word, $words) ? "$word," : $word;
}
I have this variable:
$str = "w15";
What is the best way to split it to w and 15 separately?
explode() is removing 1 letter, and str_split() doesn't have the option to split the string to an unequal string.
$str = "w15xx837ee";
$letters = preg_replace('/\d/', '', $str);
$numbers = preg_replace('/[^\d]/', '', $str);
echo $letters; // outputs wxxee
echo $numbers; // outputs 15837
You could do something like this to separate strings and numbers
<?php
$str = "w15";
$strarr=str_split($str);
foreach($strarr as $val)
{
if(is_numeric($val))
{
$intarr[]=$val;
}
else
{
$stringarr[]=$val;
}
}
print_r($intarr);
print_r($stringarr);
Output:
Array
(
[0] => 1
[1] => 5
)
Array
(
[0] => w
)
If you want it to be as 15 , you could just implode the $intarr !
Use preg_split() to achieve this:
$arr = preg_split('~(?<=\d)(?=[a-zA-Z])|(?<=[a-zA-Z])(?=\d)~', $str);
Output:
Array
(
[0] => w
[1] => 15
)
For example, the string w15g12z would give the following array:
Array
(
[0] => w
[1] => 15
[2] => g
[3] => 12
[4] => z
)
Demo
Much cleaner:
$result = preg_split("/(?<=\d)(?=\D)|(?<=\D)(?=\d)/",$str);
Essentially, it's kind of like manually implementing \b with a custom set (rather than \w)
I'm trying to split the following string:
Hello how are you<br>Foo bar hello
Into
"Hello", " how", " are", " you", "<br>", " Foo", " bar", " Hello"
Is this possible?
Don't make things harder than you have to. Use preg_split() with the PREG_SPLIT_DELIM_CAPTURE flag, and capture the <br>:
$str = 'Hello how are you<br>Foo bar hello';
$array = preg_split( '/\s+|(<br>)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r( $array);
Output:
Array
(
[0] => Hello
[1] => how
[2] => are
[3] => you
[4] => <br>
[5] => Foo
[6] => bar
[7] => hello
)
Edit: To include the space in the following token, you can use an assertion:
$array = preg_split( '/(?:\s*(?=\s))|(<br>)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
So, the goal of preg_split() is to find a spot in the string to split. The regex we use consists of two parts, OR'd together with |:
(?:\s*(?=\s)). This starts off with a non-capturing group (?:), because when we match this part of the regex, we do not want it returned to us. Inside the non-capturing group, is \s*(?=\s), which says "match zero or more whitespace characters, but assert that the next character is a whitespace character". Looking at our input string, this makes sense:
Hello how are you<br>Foo bar hello
^ ^
The regex will start from left to right, find "Hello{space}how", and decide how to split the string. It tries to match \s* with the restriction that if it consumes any space, there needs to be one space left. So, it breaks up the string at just "Hello". When it continues, it has " how are youFoo bar hello" left. It starts the match again, trying to match from where it left off, and sees " how are", and does the same split as above. It continues until there are no matches left.
Capture <br>, with (<br>). It is captured because when we match this, we want to keep it in the output, so capturing it along with the PREG_SPLIT_DELIM_CAPTURE causes it to be returned to us when it is matched (instead of being completely consumed).
This results in:
array(8)
{
[0]=> string(5) "Hello"
[1]=> string(4) " how"
[2]=> string(4) " are"
[3]=> string(4) " you"
[4]=> string(4) "<br>"
[5]=> string(3) "Foo"
[6]=> string(4) " bar"
[7]=> string(6) " hello"
}
Not pretty, but simple enough:
$data = 'Hello how are you<br>Foo bar hello';
$split = array();
foreach (explode('<br>', $data) as $line) {
$split[] = array_merge($split, explode(' ', $line));
$split[] = '<br>';
}
array_pop($split);
print_r($split);
Or version 2:
$data = 'Hello how are you<br>Foo bar hello';
$data = preg_replace('#\s|(<br>)#', '**$1**', $data);
$split = array_filter(explode('**', $data));
print_r($split);
This is how I'd do it:
Explode the string with space as a delimiter
Loop through the parts
Use strpos and check if part contains the given tag -- <br> in this case
If it does, explode the string again with the tag as the delimiter
Push all the three items into the result array
If it doesn't, then push it into the result array
Code:
$str = 'Hello how are you<br>Foo bar hello';
$parts = explode(' ', $str);
$result = array();
foreach ($parts as $part) {
if(strpos($part, '<br>') !== FALSE) {
$arr = explode('<br>', $part);
$result = array_merge($result, $arr);
$result[] = "<br>";
}
else {
$result[] = $part;
}
}
print_r($result);
Output:
Array
(
[0] => Hello
[1] => how
[2] => are
[3] => you
[4] => Foo
[5] => <br>
[6] => bar
[7] => hello
)
Demo!
Here is a brief solution. Replace <br> by (space <br> space) and split using space:
<?php
$newStr=str_replace("<br>"," <br> ","Hello how are you<br>Foo bar hello");
$str= explode(' ',$newStr);
?>
Output of print_r($str):
(
[0] => Hello
[1] => how
[2] => are
[3] => you
[4] => <br>
[5] => Foo
[6] => bar
[7] => hello
)
Borrowing the preg_split pattern from #nickb's answer:
<?php
$string = 'Hello how are you<br>Foo bar hello';
$array = preg_split('/\s/',$string);
foreach($array as $key => $value) {
$a = preg_split( '/\s+|(<br>)/', $value, -1, PREG_SPLIT_DELIM_CAPTURE);
if(is_array($a)) {
foreach($a as $key2 => $value2) {
$result[] = $value2;
}
}
}
print_r($result);
?>
Output:
Array
(
[0] => Hello
[1] => how
[2] => are
[3] => you
[4] => <br>
[5] => Foo
[6] => bar
[7] => hello
)
$str = "This is a string";
$words = explode(" ", $str);
Works fine, but spaces still go into array:
$words === array ('This', 'is', 'a', '', '', '', 'string');//true
I would prefer to have words only with no spaces and keep the information about the number of spaces separate.
$words === array ('This', 'is', 'a', 'string');//true
$spaces === array(1,1,4);//true
Just added: (1, 1, 4) means one space after the first word, one space after the second word and 4 spaces after the third word.
Is there any way to do it fast?
Thank you.
For splitting the String into an array, you should use preg_split:
$string = 'This is a string';
$data = preg_split('/\s+/', $string);
Your second part (counting spaces):
$string = 'This is a string';
preg_match_all('/\s+/', $string, $matches);
$result = array_map('strlen', $matches[0]);// [1, 1, 4]
Here is one way, splitting the string and running a regex once, then parsing the results to see which segments were captured as the split (and therefore only whitespace), or which ones are words:
$temp = preg_split('/(\s+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
$spaces = array();
$words = array_reduce( $temp, function( &$result, $item) use ( &$spaces) {
if( strlen( trim( $item)) === 0) {
$spaces[] = strlen( $item);
} else {
$result[] = $item;
}
return $result;
}, array());
You can see from this demo that $words is:
Array
(
[0] => This
[1] => is
[2] => a
[3] => string
)
And $spaces is:
Array
(
[0] => 1
[1] => 1
[2] => 4
)
You can use preg_split() for the first array:
$str = 'This is a string';
$words = preg_split('#\s+#', $str);
And preg_match_all() for the $spaces array:
preg_match_all('#\s+#', $str, $m);
$spaces = array_map('strlen', $m[0]);
Another way to do it would be using foreach loop.
$str = "This is a string";
$words = explode(" ", $str);
$spaces=array();
$others=array();
foreach($words as $word)
{
if($word==' ')
{
array_push($spaces,$word);
}
else
{
array_push($others,$word);
}
}
Here are the results of performance tests:
$str = "This is a string";
var_dump(time());
for ($i=1;$i<100000;$i++){
//Alma Do Mundo - the winner
$rgData = preg_split('/\s+/', $str);
preg_match_all('/\s+/', $str, $rgMatches);
$rgResult = array_map('strlen', $rgMatches[0]);// [1,1,4]
}
print_r($rgData); print_r( $rgResult);
var_dump(time());
for ($i=1;$i<100000;$i++){
//nickb
$temp = preg_split('/(\s+)/', $str, -1,PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
$spaces = array();
$words = array_reduce( $temp, function( &$result, $item) use ( &$spaces) {
if( strlen( trim( $item)) === 0) {
$spaces[] = strlen( $item);
} else {
$result[] = $item;
}
return $result;
}, array());
}
print_r( $words); print_r( $spaces);
var_dump(time());
int(1378392870)
Array
(
[0] => This
[1] => is
[2] => a
[3] => string
)
Array
(
[0] => 1
[1] => 1
[2] => 4
)
int(1378392871)
Array
(
[0] => This
[1] => is
[2] => a
[3] => string
)
Array
(
[0] => 1
[1] => 1
[2] => 4
)
int(1378392873)
$financialYear = 2015-2016;
$test = explode('-',$financialYear);
echo $test[0]; // 2015
echo $test[1]; // 2016
Splitting with regex has been demonstrated well by earlier answers, but I think this is a perfect case for calling ctype_space() to determine which result array should receive the encountered value.
Code: (Demo)
$string = "This is a string";
$words = [];
$spaces = [];
foreach (preg_split('~( +)~', $string, null, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE) as $s) {
if (ctype_space($s)) {
$spaces[] = strlen($s);
} else {
$words[] = $s;
}
}
var_export([
'words' => $words,
'spaces' => $spaces
]);
Output:
array (
'words' =>
array (
0 => 'This',
1 => 'is',
2 => 'a',
3 => 'string',
),
'spaces' =>
array (
0 => 1,
1 => 1,
2 => 4,
),
)
If you want to replace the piped constants used by preg_split() you can just use 3 (Demo). This represents PREG_SPLIT_NO_EMPTY which is 1 plus PREG_SPLIT_DELIM_CAPTURE which is 2. Be aware that with this reduction in code width, you also lose code readability.
preg_split('~( +)~', $string, -1, 3)
What about this? Does someone care to profile this?
$str = str_replace(["\t", "\r", "\r", "\0", "\v"], ' ', $str); // \v -> vertical space, see trim()
$words = explode(' ', $str);
$words = array_filter($words); // there would be lots elements from lots of spaces so skip them.