Split a string on different substrings, but conserve those substrings - php

I'm trying to split the following string:
Hello how are you<br>Foo bar hello
Into
"Hello", " how", " are", " you", "<br>", " Foo", " bar", " Hello"
Is this possible?

Don't make things harder than you have to. Use preg_split() with the PREG_SPLIT_DELIM_CAPTURE flag, and capture the <br>:
$str = 'Hello how are you<br>Foo bar hello';
$array = preg_split( '/\s+|(<br>)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r( $array);
Output:
Array
(
[0] => Hello
[1] => how
[2] => are
[3] => you
[4] => <br>
[5] => Foo
[6] => bar
[7] => hello
)
Edit: To include the space in the following token, you can use an assertion:
$array = preg_split( '/(?:\s*(?=\s))|(<br>)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
So, the goal of preg_split() is to find a spot in the string to split. The regex we use consists of two parts, OR'd together with |:
(?:\s*(?=\s)). This starts off with a non-capturing group (?:), because when we match this part of the regex, we do not want it returned to us. Inside the non-capturing group, is \s*(?=\s), which says "match zero or more whitespace characters, but assert that the next character is a whitespace character". Looking at our input string, this makes sense:
Hello how are you<br>Foo bar hello
     ^   ^
The regex scans from left to right, finds "Hello{space}how", and decides where to split the string. It tries to match \s* with the restriction that, however much whitespace it consumes, one whitespace character must remain after the match. With a single space between words, that leaves an empty match just before the space, so the string is broken right after "Hello". It then has " how are you<br>Foo bar hello" left, resumes matching from where it left off, sees " how are", and makes the same split as above. It continues until there are no matches left.
Capture <br>, with (<br>). It is captured because when we match this, we want to keep it in the output, so capturing it along with the PREG_SPLIT_DELIM_CAPTURE causes it to be returned to us when it is matched (instead of being completely consumed).
This results in:
array(8)
{
[0]=> string(5) "Hello"
[1]=> string(4) " how"
[2]=> string(4) " are"
[3]=> string(4) " you"
[4]=> string(4) "<br>"
[5]=> string(3) "Foo"
[6]=> string(4) " bar"
[7]=> string(6) " hello"
}
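The first pattern extends naturally to more than one tag. The following is a minimal sketch rather than part of the original answer: split_keep_tags() is a hypothetical helper name, the <hr> tag is made-up sample input, and PREG_SPLIT_NO_EMPTY is added on the assumption that empty tokens (for example between a space and a tag) are unwanted.

```php
<?php
// Hypothetical helper: split on whitespace, but keep each listed tag as its own token.
function split_keep_tags(string $text, array $tags): array
{
    // Quote every tag so characters such as "<" or "/" are matched literally.
    $quoted = array_map(function ($tag) {
        return preg_quote($tag, '/');
    }, $tags);

    // Whitespace is a plain delimiter; the tags sit in a capturing group,
    // so PREG_SPLIT_DELIM_CAPTURE returns them as tokens.
    $pattern = '/\s+|(' . implode('|', $quoted) . ')/';

    return preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
}

$tokens = split_keep_tags('Hello how are you<br>Foo <hr> bar', ['<br>', '<hr>']);
print_r($tokens);
```

Building the alternation from a list keeps the regex itself unchanged when more tags are needed.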

Not pretty, but simple enough:
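$data = 'Hello how are you<br>Foo bar hello';
$split = array();
foreach (explode('<br>', $data) as $line) {
    // Merge the words into the running result (assigning with [] would nest the array)
    $split = array_merge($split, explode(' ', $line));
    $split[] = '<br>';
}
array_pop($split); // drop the extra <br> added after the last line
print_r($split);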
Or version 2:
$data = 'Hello how are you<br>Foo bar hello';
$data = preg_replace('#\s|(<br>)#', '**$1**', $data);
$split = array_values(array_filter(explode('**', $data))); // drop the empty strings, then re-index
print_r($split);

This is how I'd do it:
Explode the string with space as a delimiter
Loop through the parts
Use strpos and check if part contains the given tag -- <br> in this case
If it does, explode the string again with the tag as the delimiter
Push all the three items into the result array
If it doesn't, then push it into the result array
Code:
$str = 'Hello how are you<br>Foo bar hello';
$parts = explode(' ', $str);
$result = array();
foreach ($parts as $part) {
    if (strpos($part, '<br>') !== FALSE) {
        // Split on the tag and put it back between the pieces to keep the original order
        foreach (explode('<br>', $part) as $i => $piece) {
            if ($i > 0) {
                $result[] = '<br>';
            }
            $result[] = $piece;
        }
    }
    else {
        $result[] = $part;
    }
}
print_r($result);
Output:
Array
(
[0] => Hello
[1] => how
[2] => are
[3] => you
[4] => <br>
[5] => Foo
[6] => bar
[7] => hello
)

Here is a brief solution. Replace <br> by (space <br> space) and split using space:
<?php
$newStr=str_replace("<br>"," <br> ","Hello how are you<br>Foo bar hello");
$str= explode(' ',$newStr);
?>
Output of print_r($str):
(
[0] => Hello
[1] => how
[2] => are
[3] => you
[4] => <br>
[5] => Foo
[6] => bar
[7] => hello
)

Borrowing the preg_split pattern from @nickb's answer:
<?php
$string = 'Hello how are you<br>Foo bar hello';
$array = preg_split('/\s/',$string);
foreach($array as $key => $value) {
$a = preg_split( '/\s+|(<br>)/', $value, -1, PREG_SPLIT_DELIM_CAPTURE);
if(is_array($a)) {
foreach($a as $key2 => $value2) {
$result[] = $value2;
}
}
}
print_r($result);
?>
Output:
Array
(
[0] => Hello
[1] => how
[2] => are
[3] => you
[4] => <br>
[5] => Foo
[6] => bar
[7] => hello
)

Related

I want to explode a variable in a little different way [duplicate]

I have this variable.
$var = "A,B,C,D,'1,2,3,4,5,6',E,F";
I want to explode it so that I get the following array.
array(
[0] => A,
[1] => B,
[2] => C,
[3] => D,
[4] => 1,2,3,4,5,6,
[5] => E,
[6] => F
);
I used explode(',',$var) but I am not getting my desired output. Any suggestions?
There is an existing function that can parse your comma-separated string: str_getcsv.
Its signature is:
array str_getcsv ( string $input [, string $delimiter = "," [, string $enclosure = '"' [, string $escape = "\\" ]]] )
The only change you need is to the third argument, the enclosure: pass a single quote instead of the default double quote.
Here is a sample.
$var = "A,B,C,D,'1,2,3,4,5,6',E,F";
$array = str_getcsv($var,',',"'");
If you var_dump the array, you'll get the format you wanted:
array(7) {
[0]=>
string(1) "A"
[1]=>
string(1) "B"
[2]=>
string(1) "C"
[3]=>
string(1) "D"
[4]=>
string(11) "1,2,3,4,5,6"
[5]=>
string(1) "E"
[6]=>
string(1) "F"
}
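Because str_getcsv() parses enclosures itself, it is not limited to one quoted field. A small sketch with an assumed input (two quoted fields, which is not the string from the question), passing the delimiter, enclosure and escape arguments explicitly:

```php
<?php
// Assumed sample input with two quoted fields (not taken from the question).
$var = "A,'1,2',B,'3,4'";

// Arguments: delimiter, enclosure, escape character.
$fields = str_getcsv($var, ',', "'", "\\");

print_r($fields);
```

Each quoted field comes back as a single element with its enclosing quotes removed.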
Simply use preg_match_all with the following regex:
preg_match_all("/(?<=').*(?=')|\w+/",$var,$m);
print_r($m[0]);
Regex explanation:
(?<=').*(?=') captures every character between the single quotes
|\w+ (OR) grabs the remaining word characters, skipping the commas
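One caveat with this pattern: .* is greedy, so when the input contains more than one quoted field it runs from the first quote to the last one. A possible variant, sketched here under the assumption that empty fields do not need to be preserved, consumes the quotes instead of using lookarounds. The input string is made up for illustration.

```php
<?php
// Assumed sample input with two quoted fields (not taken from the question).
$var = "A,B,'1,2,3',C,'4,5'";

// (?| ... ) is a branch-reset group: both alternatives store their capture in group 1.
// First alternative: a quoted field, capturing only the text between the quotes.
// Second alternative: a run of characters that are neither commas nor quotes.
preg_match_all("/(?|'([^']*)'|([^,']+))/", $var, $m);

print_r($m[1]);
```

Since every field ends up in group 1, $m[1] can be used directly as the result array.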
Although preg_split along with array_map works very well, here is an example using explode and trim:
$var = "A,B,C,D,'1,2,3,4,5,6',E,F";
$a = explode("'",$var);
//print_r($a);
/*
outputs
Array
(
[0] => A,B,C,D,
[1] => 1,2,3,4,5,6
[2] => ,E,F
)
*/
$firstPart = explode(',',trim($a[0],',')); //take out the trailing comma
/*
print_r($firstPart);
outputs
Array
(
[0] => A
[1] => B
[2] => C
[3] => D
)
*/
$secondPart = array($a[1]);
$thirdPart = explode(',',trim($a[2],',')); //take out the leading comma
/*
print_r($thirdPart);
Array
(
[0] => E
[1] => F
)
*/
$fullArray = array_merge($firstPart,$secondPart,$thirdPart);
print_r($fullArray);
/*
outputs
Array
(
[0] => A
[1] => B
[2] => C
[3] => D
[4] => 1,2,3,4,5,6
[5] => E
[6] => F
)
*/
You need to explode the string into an array.
However, you also want a comma after every element except the last one.
Here is a working example:
<?php
$var = "A,B,C,D,'1,2,3,4,5,6',E,F";
$arr = explode("'", $var);
$num = ! empty($arr[1]) ? str_replace(',', '_', $arr[1]) : '';
$nt = $arr[0] . $num . $arr[2];
$nt = explode(',', $nt);
$len = count($nt);
$na = array();
$cnt = 0;
foreach ($nt as $v) {
$v = str_replace('_', ',', $v);
$v .= ($cnt != $len - 1) ? ',' : '';
$na[] = $v;
++$cnt;
}
$var = "A,B,C,D,'1,2,3,4,5,6',E,F";
$arr = preg_split("/(,)(?=(?:[^']|'[^']*')*$)/",$var);
foreach ($arr as $data) {
$requiredData[] = str_replace("'","",$data);
}
echo '<pre>';
print_r($requiredData);
Description:
The regular expression (,)(?=(?:[^']|'[^']*')*$) matches a comma only when the rest of the string consists of unquoted characters and complete '...' pairs, that is, when the comma itself is not inside single quotes.
Then, within the foreach loop, I strip the single quotes from each piece.

Regex match everything after

I have a url for which i want to match a certain pattern
/events/display/id/featured
where
match everything after /events/
display is matched into key
id and featured are 1 or more matched into a key
thus i end up with
Array (
[method] => display
[param] => Array ([0]=>id,[1]=>featured,[2]=>true /* if there was another path */)
)
so far i have
(?:/events/)/(?P<method>.*?)/(?P<parameter>.*?)([^/].*?)
But it's not working out as expected.
What's wrong with the syntax?
P.S. No, I don't want to use parse_url() or another built-in PHP function; I need a regex.
You can use this pattern:
<pre><?php
$subject = '/events/display/id1/param1/id2/param2/id3/param3';
$pattern = '~/events/(?<method>[^/]+)|\G(?!\A)/(?<id>[^/]+)/(?<param>[^/]+)~';
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
foreach($matches as $match) {
if (empty($match['method'])) {
$keyval[] = array('id'=>$match['id'], 'param'=>$match['param']);
} else {
$result['method'] = $match['method'];
}
}
if (isset($keyval)) $result['param'] = $keyval;
print_r($result);
pattern details:
~
/events/(?<method>[^/]+) # "events" followed by the method name
| # OR
\G # contiguous with the previous match
(?!\A) # not at the start of the string
/(?<id>[^/]+) # id
/(?<param>[^/]+) # param
~
Why not use a mix of preg_match() and explode()?
$str = '/events/display/id/featured';
$pattern = '~/events/(?P<method>.*?)/(?P<parameter>.*)~';
preg_match($pattern, $str, $matches);
// explode the params by '/'
$matches['parameter'] = explode('/', $matches['parameter']);
var_dump($matches);
Output:
array(5) {
[0] =>
string(27) "/events/display/id/featured"
'method' =>
string(7) "display"
[1] =>
string(7) "display"
'parameter' =>
array(2) {
[0] =>
string(2) "id"
[1] =>
string(8) "featured"
}
[2] =>
string(11) "id/featured"
}
Here I'm basically using preg_match_all() to recreate functionality similar to explode(). Then I'm remapping the results to a new array. Unfortunately this can't be done with Regex alone.
<?php
$url = '/events/display/id/featured/something-else';
if(preg_match('!^/events!',$url)){
$pattern = '!(?<=/)[^/]+!';
$m = preg_match_all($pattern,$url,$matches);
$results = array();
foreach($matches[0] as $key=>$value){
if($key==1){
$results['method']=$value;
} elseif(!empty($key)) {
$results['param'][]=$value;
}
}
}
print_r($results);
?>
Output
Array
(
[method] => display
[param] => Array
(
[0] => id
[1] => featured
[2] => something-else
)
)
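The preg_match() plus explode() idea from the answers above can be wrapped in one function that also copes with a URL that has no parameters. This is only a sketch: parse_events_path() and the group name rest are hypothetical names, and returning null for a non-matching path is an assumption about the desired behaviour.

```php
<?php
// Hypothetical helper: returns ['method' => ..., 'param' => [...]] or null if the path does not match.
function parse_events_path(string $path): ?array
{
    // method: the first segment after /events/
    // rest:   zero or more further "/segment" parts, captured as one string
    $pattern = '~^/events/(?P<method>[^/]+)(?P<rest>(?:/[^/]+)*)/?$~';
    if (!preg_match($pattern, $path, $m)) {
        return null;
    }

    $rest = $m['rest'] ?? '';
    // Drop the leading slash before exploding so no empty first element appears.
    $params = $rest === '' ? [] : explode('/', ltrim($rest, '/'));

    return ['method' => $m['method'], 'param' => $params];
}

print_r(parse_events_path('/events/display/id/featured'));
```

Anchoring the pattern with ^ and $ means paths that merely contain /events/ somewhere in the middle are rejected.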

string to array, split by single and double quotes

i'm trying to use php to split a string into array components using either " or ' as the delimiter. i just want to split by the outermost string. here are four examples and the desired result for each:
$pattern = "?????";
$str = "the cat 'sat on' the mat";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
[0] => the cat
[1] => 'sat on'
[2] => the mat
)*/
$str = "the cat \"sat on\" the mat";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
[0] => the cat
[1] => "sat on"
[2] => the mat
)*/
$str = "the \"cat 'sat' on\" the mat";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
[0] => the
[1] => "cat 'sat' on"
[2] => the mat
)*/
$str = "the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
[0] => the
[1] => 'cat "sat" on'
[2] => the mat
[3] => 'when "it" was'
[4] => seventeen
)*/
as you can see i only want to split by the outermost quotation, and i want to ignore any quotations within quotations.
the closest i have come up with for $pattern is
$pattern = "/((?P<quot>['\"])[^(?P=quot)]*?(?P=quot))/";
but obviously this is not working.
You can use preg_split with the PREG_SPLIT_DELIM_CAPTURE option. The regular expression is not quite as elegant as @Jan Turoň's back reference approach because the required capture group messes up the results.
$str = "the 'cat \"sat\" on' the mat the \"cat 'sat' on\" the mat";
$match = preg_split("/('[^']*'|\"[^\"]*\")/U", $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($match);
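A variant of this preg_split() call that produces the arrays shown in the question is sketched below. It assumes that whitespace around a quoted section should be dropped and that empty elements are unwanted; split_outer_quotes() is a hypothetical name. Like the pattern above, it does not handle a quote nested inside quotes of the same type.

```php
<?php
// Hypothetical helper: split around '...' or "..." sections, keeping them (quotes included).
function split_outer_quotes(string $text): array
{
    // Group 1 is a whole quoted section. The surrounding \s* is part of the
    // delimiter but outside the group, so that whitespace is not returned.
    $pattern = '/\s*(\'[^\']*\'|"[^"]*")\s*/';

    return preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
}

print_r(split_outer_quotes("the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen"));
```

The scan runs left to right, so whichever quote character opens first decides the section, and quotes of the other type inside it are treated as ordinary text.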
You can use just preg_match for this:
$str = "the \"cat 'sat' on\" the mat";
$pattern = '/^([^\'"]*)(([\'"]).*\3)(.*)$/';
if (preg_match($pattern, $str, $matches)) {
printf("[initial] => %s\n[quoted] => %s\n[end] => %s\n",
$matches[1],
$matches[2],
$matches[4]
);
}
This prints:
[initial] => the
[quoted] => "cat 'sat' on"
[end] => the mat
Here is an explanation of the regex:
/^([^\'"]*) => put the initial bit until the first quote (either single or double) in the first captured group
(([\'"]).*\3) => capture in \2 the text corresponding from the initial quote (either single or double) (that is captured in \3) until the closing quote (that must be the same type as the opening quote, hence the \3). The fact that the regexp is greedy by nature helps to get from the first quote to the last one, regardless of how many quotes are inside.
(.*)$/ => Capture until the end in \4
Yet another solution using preg_replace_callback
$result1 = array();
function parser($p) {
global $result1;
$result1[] = $p[0];
return "|"; // temporary delimiter
}
$str = "the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen";
$str = preg_replace_callback("/(['\"]).*\\1/U", "parser", $str);
$result2 = explode("|",$str); // using temporary delimiter
Now you can zip those arrays using array_map
$result = array();
function zipper($a,$b) {
global $result;
if($a) $result[] = $a;
if($b) $result[] = $b;
}
array_map("zipper",$result2,$result1);
print_r($result);
And the result is
[0] => the
[1] => 'cat "sat" on'
[2] => the mat
[3] => 'when "it" was'
[4] => seventeen
Note: It would probably be better to create a class for this, so the global variables can be avoided.
You can use back references and ungreedy modifier in preg_match_all
$str = "the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen";
preg_match_all("/(['\"])(.*)\\1/U", $str, $match);
print_r($match[0]);
Now you have your outermost quotation parts
[0] => 'cat "sat" on'
[1] => 'when "it" was'
And you can find the rest of the string with substr and strpos (kind of blackbox solution)
$a = $b = 0; $result = array();
foreach($match[0] as $part) {
$b = strpos($str,$part);
$result[] = substr($str,$a,$b-$a);
$result[] = $part;
$a = $b+strlen($part);
}
$result[] = substr($str,$a);
print_r($result);
Here is the result
[0] => the
[1] => 'cat "sat" on'
[2] => the mat
[3] => 'when "it" was'
[4] => seventeen
Just strip any empty leading/trailing element if a quotation sits at the very beginning/end of the string.

How do you split a string into word pairs?

I am trying to split a string into an array of word pairs in PHP. So for example if you have the input string:
"split this string into word pairs please"
the output array should look like
Array (
[0] => split this
[1] => this string
[2] => string into
[3] => into word
[4] => word pairs
[5] => pairs please
[6] => please
)
some failed attempts include:
$array = preg_split('/\w+\s+\w+/', $string);
which gives me an empty array, and
preg_match('/\w+\s+\w+/', $string, $array);
which splits the string into word pairs but doesn't repeat the word. Is there an easy way to do this? Thanks.
Why not just use explode ?
$str = "split this string into word pairs please";
$arr = explode(' ',$str);
$result = array();
for($i=0;$i<count($arr)-1;$i++) {
$result[] = $arr[$i].' '.$arr[$i+1];
}
$result[] = $arr[$i];
If you want to repeat with a regular expression, you'll need some sort of look-ahead or look-behind. Otherwise, the expression will not match the same word multiple times:
$s = "split this string into word pairs please";
preg_match_all('/(\w+) (?=(\w+))/', $s, $matches, PREG_SET_ORDER);
$a = array_map(
function($a)
{
return $a[1].' '.$a[2];
},
$matches
);
var_dump($a);
Output:
array(6) {
[0]=>
string(10) "split this"
[1]=>
string(11) "this string"
[2]=>
string(11) "string into"
[3]=>
string(9) "into word"
[4]=>
string(10) "word pairs"
[5]=>
string(12) "pairs please"
}
Note that it does not repeat the last word "please" as you requested, although I'm not sure why you would want that behavior.
You could explode the string and then loop through it:
$str = "split this string into word pairs please";
$strSplit = explode(' ', $str);
$final = array();
for ($i = 0; $i < count($strSplit); $i++)
{
    // The last word has no successor, so it is added on its own
    $final[$i] = isset($strSplit[$i+1]) ? $strSplit[$i] . ' ' . $strSplit[$i+1] : $strSplit[$i];
}
I think this works, but there should be a way easier solution.
Edited to make it conform to OP's spec. - as per codaddict
$s = "split this string into word pairs please";
$b1 = $b2 = explode(' ', $s);
array_shift($b2);
$r = array_map(function($a, $b) { return rtrim("$a $b"); }, $b1, $b2); // rtrim: the last word is paired with null
print_r($r);
gives:
Array
(
[0] => split this
[1] => this string
[2] => string into
[3] => into word
[4] => word pairs
[5] => pairs please
[6] => please
)
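The explode-based answers above can be folded into one reusable function. A minimal sketch, assuming that words are separated by any run of whitespace and that the final word should appear on its own as in the question; word_pairs() is a hypothetical name.

```php
<?php
// Hypothetical helper: returns overlapping word pairs, with the last word on its own.
function word_pairs(string $text): array
{
    // Split on runs of whitespace and ignore empty pieces.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $pairs = [];
    foreach ($words as $i => $word) {
        // Join with the next word when there is one; the last word stands alone.
        $pairs[] = isset($words[$i + 1]) ? $word . ' ' . $words[$i + 1] : $word;
    }

    return $pairs;
}

print_r(word_pairs('split this string into word pairs please'));
```

Checking isset() on the next index avoids reading past the end of the array on the final iteration.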

Split a string on every third instance of character

How can I explode a string on every third semicolon (;)?
example data:
$string = 'piece1;piece2;piece3;piece4;piece5;piece6;piece7;piece8;';
Desired output:
$output[0] = 'piece1;piece2;piece3;'
$output[1] = 'piece4;piece5;piece6;'
$output[2] = 'piece7;piece8;'
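I am sure you can do something slick with regular expressions, but why not just explode on each semicolon and then add the pieces back three at a time?
$tmp = explode(";", rtrim($string, ";"));
$result = array();
foreach ($tmp as $i => $piece) {
    $j = (int) ($i / 3); // group index, increments every 3 pieces
    if (!isset($result[$j])) $result[$j] = '';
    $result[$j] .= $piece . ';'; // explode() drops the delimiter, so add it back
}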
Easiest solution I can think of is:
$chunks = array_chunk(explode(';', $string), 3);
// create_function() was removed in PHP 8, so use an anonymous function instead
$output = array_map(function ($a) { return implode(';', $a); }, $chunks);
Essentially the same solution as the other ones that explode and join again...
$tmp = explode(";", $string);
while ($tmp) {
$output[] = implode(';', array_splice($tmp, 0, 3));
};
$string = "piece1;piece2;piece3;piece4;piece5;piece6;piece7;piece8;piece9;";
preg_match_all('/([A-Za-z0-9\.]*;[A-Za-z0-9\.]*;[A-Za-z0-9\.]*;)/',$string,$matches);
print_r($matches);
Array
(
[0] => Array
(
[0] => piece1;piece2;piece3;
[1] => piece4;piece5;piece6;
[2] => piece7;piece8;piece9;
)
[1] => Array
(
[0] => piece1;piece2;piece3;
[1] => piece4;piece5;piece6;
[2] => piece7;piece8;piece9;
)
)
Maybe approach it from a different angle. Explode() it all, then combine it back in triples. Like so...
$str = "1;2;3;4;5;6;7;8;9";
$pieces = explode(";", $str);
$bar = array();
while (!empty($pieces))
{
    // Take up to three pieces off the front and join them back together
    $foo = array_splice($pieces, 0, 3);
    $bar[] = implode(";", $foo) . ";";
}
print_r($bar);
Array
(
[0] => 1;2;3;
[1] => 4;5;6;
[2] => 7;8;9;
)
Here's a regex approach, which I can't say is all too good looking.
$str='';
for ($i=1; $i<20; $i++) {
$str .= "$i;";
}
$split = preg_split('/((?:[^;]*;){3})/', $str, -1,
PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
Output:
Array
(
[0] => 1;2;3;
[1] => 4;5;6;
[2] => 7;8;9;
[3] => 10;11;12;
[4] => 13;14;15;
[5] => 16;17;18;
[6] => 19;
)
Another regex approach.
<?php
$string = 'piece1;piece2;piece3;piece4;piece5;piece6;piece7;piece8';
preg_match_all('/([^;]+;?){1,3}/', $string, $m, PREG_SET_ORDER);
print_r($m);
Results:
Array
(
[0] => Array
(
[0] => piece1;piece2;piece3;
[1] => piece3;
)
[1] => Array
(
[0] => piece4;piece5;piece6;
[1] => piece6;
)
[2] => Array
(
[0] => piece7;piece8
[1] => piece8
)
)
Regex Split
$test = ";2;3;4;5;6;7;8;9;10;;12;;14;15;16;17;18;19;20";
// match all groups that:
// (?<=^|;) follow the beginning of the string or a ;
// [^;]* have zero or more non ; characters
// ;? maybe a semi-colon (so we catch a single group)
// [^;]*;? again (catch second item)
// [^;]* without the trailing ; (to not capture the final ;)
preg_match_all("/(?<=^|;)[^;]*;?[^;]*;?[^;]*/", $test, $matches);
var_dump($matches[0]);
array(7) {
[0]=>
string(4) ";2;3"
[1]=>
string(5) "4;5;6"
[2]=>
string(5) "7;8;9"
[3]=>
string(6) "10;;12"
[4]=>
string(6) ";14;15"
[5]=>
string(8) "16;17;18"
[6]=>
string(5) "19;20"
}
<?php
$str = 'piece1;piece2;piece3;piece4;piece5;piece6;piece7;piece8;';
$arr = array_map(function ($arr) {
return implode(";", $arr);
}, array_chunk(explode(";", $str), 3));
var_dump($arr);
outputs
array(3) {
[0]=>
string(20) "piece1;piece2;piece3"
[1]=>
string(20) "piece4;piece5;piece6"
[2]=>
string(14) "piece7;piece8;"
}
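In the output above, only the last group keeps its trailing semicolon, whereas the question asks for one after every piece. A sketch of the same explode() and array_chunk() idea that puts the delimiters back follows; split_every_nth() is a hypothetical name, and a non-empty delimiter and $n >= 1 are assumed.

```php
<?php
// Hypothetical helper: split $text after every $n-th occurrence of $delimiter, keeping the delimiters.
function split_every_nth(string $text, string $delimiter, int $n): array
{
    $pieces = explode($delimiter, $text);

    // A trailing delimiter leaves one empty piece at the end; remember that and drop it.
    $trailing = end($pieces) === '';
    if ($trailing) {
        array_pop($pieces);
    }

    $chunks = array_chunk($pieces, $n);
    $last = count($chunks) - 1;

    $out = [];
    foreach ($chunks as $i => $chunk) {
        // Every group ends with the delimiter, except a final group
        // when the input itself did not end with one.
        $needsDelimiter = $i < $last || $trailing;
        $out[] = implode($delimiter, $chunk) . ($needsDelimiter ? $delimiter : '');
    }

    return $out;
}

print_r(split_every_nth('piece1;piece2;piece3;piece4;piece5;piece6;piece7;piece8;', ';', 3));
```

Joining the groups back together with implode('', ...) reproduces the original string, which is a quick way to check that nothing was lost.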
Similar to @Sebastian's earlier answer, I recommend preg_split() with a repeated pattern. The difference is that by using a non-capturing group and appending \K to restart the fullstring match, you can spare writing the PREG_SPLIT_DELIM_CAPTURE flag.
Code:
$string = 'piece1;piece2;piece3;piece4;piece5;piece6;piece7;piece8;';
var_export(preg_split('/(?:[^;]*;){3}\K/', $string, 0, PREG_SPLIT_NO_EMPTY));
A similar technique for splitting after every 2 things can be found here. That snippet actually writes the \K before the last space character so that the trailing space is consumed while splitting.
