PHP explode: split a string into words using space as a delimiter

$str = "This is a    string";
$words = explode(" ", $str);
This works, but the extra spaces still end up in the array:
$words === array ('This', 'is', 'a', '', '', '', 'string');//true
I would prefer to get only the words, with no empty elements, and keep the number of spaces separately:
$words === array ('This', 'is', 'a', 'string');//true
$spaces === array(1,1,4);//true
Edit: (1, 1, 4) means one space after the first word, one space after the second word, and four spaces after the third word.
Is there a fast way to do this?

To split the string into an array, use preg_split():
$string = 'This is a    string';
$data = preg_split('/\s+/', $string);
For the second part (counting the spaces), use preg_match_all():
$string = 'This is a    string';
preg_match_all('/\s+/', $string, $matches);
$result = array_map('strlen', $matches[0]);// [1, 1, 4]
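There is one caveat with preg_split('/\s+/'). If the string starts or ends with whitespace, the result gets an empty first or last element. The minimal sketch below assumes you want those elements dropped and adds the PREG_SPLIT_NO_EMPTY flag. The padded sample string is made up for illustration.

```php
<?php
$str = "  This is a    string ";

// Without the flag: leading/trailing whitespace produces empty elements.
$raw = preg_split('/\s+/', $str);
// ['', 'This', 'is', 'a', 'string', '']

// With PREG_SPLIT_NO_EMPTY the empty pieces are discarded.
$words = preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
// ['This', 'is', 'a', 'string']

// Lengths of every whitespace run, including the leading and trailing ones.
preg_match_all('/\s+/', $str, $matches);
$spaces = array_map('strlen', $matches[0]);
// [2, 1, 1, 4, 1]

print_r($words);
print_r($spaces);
```

Note that with leading whitespace, $spaces no longer lines up one-to-one with "spaces after word n". Decide how you want to treat that first run.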

Here is one way. Split the string once with a regex that captures the delimiters, then check each segment to see whether it is a captured delimiter (whitespace only) or a word:
$temp = preg_split('/(\s+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
$spaces = array();
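$words = array_reduce($temp, function ($result, $item) use (&$spaces) {
    // The carry ($result) is passed by value; returning it is enough.
    if (strlen(trim($item)) === 0) {
        $spaces[] = strlen($item); // whitespace-only segment: record its length
    } else {
        $result[] = $item;         // word segment: keep it
    }
    return $result;
}, array());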
With the sample string, $words is:
Array
(
[0] => This
[1] => is
[2] => a
[3] => string
)
And $spaces is:
Array
(
[0] => 1
[1] => 1
[2] => 4
)

You can use preg_split() for the first array:
$str = 'This is a    string';
$words = preg_split('#\s+#', $str);
And preg_match_all() for the $spaces array:
preg_match_all('#\s+#', $str, $m);
$spaces = array_map('strlen', $m[0]);

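Another way is to explode on a single space and walk the result in a foreach loop. Runs of spaces leave empty strings in the exploded array, so each empty element counts as one extra space:
$str = "This is a    string";
$words = explode(" ", $str);
$spaces = array();
$others = array();
$gap = 0;
foreach ($words as $word) {
    if ($word === '') {
        $gap++;                   // an empty element is one extra space
    } else {
        if (!empty($others)) {
            $spaces[] = $gap + 1; // the delimiter itself plus the extra spaces
        }
        $others[] = $word;        // a real word
        $gap = 0;
    }
}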

Here are the results of a quick performance test. Note that time() has one-second resolution, so these timings are only rough:
$str = "This is a    string";
var_dump(time());
for ($i=1;$i<100000;$i++){
//Alma Do Mundo - the winner
$rgData = preg_split('/\s+/', $str);
preg_match_all('/\s+/', $str, $rgMatches);
$rgResult = array_map('strlen', $rgMatches[0]);// [1,1,4]
}
print_r($rgData); print_r( $rgResult);
var_dump(time());
for ($i=1;$i<100000;$i++){
//nickb
$temp = preg_split('/(\s+)/', $str, -1,PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
$spaces = array();
$words = array_reduce( $temp, function( $result, $item) use ( &$spaces) {
if( strlen( trim( $item)) === 0) {
$spaces[] = strlen( $item);
} else {
$result[] = $item;
}
return $result;
}, array());
}
print_r( $words); print_r( $spaces);
var_dump(time());
int(1378392870)
Array
(
[0] => This
[1] => is
[2] => a
[3] => string
)
Array
(
[0] => 1
[1] => 1
[2] => 4
)
int(1378392871)
Array
(
[0] => This
[1] => is
[2] => a
[3] => string
)
Array
(
[0] => 1
[1] => 1
[2] => 4
)
int(1378392873)
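For finer-grained numbers, here is a sketch of the same comparison using microtime(true), which returns the current Unix timestamp in seconds as a float with microsecond precision. The second loop uses a plain foreach instead of array_reduce(). The iteration count is arbitrary, and timings will vary by machine and PHP version.

```php
<?php
$str = "This is a    string";
$iterations = 100000;

// Approach 1: preg_split() for the words plus preg_match_all() for the gaps.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $words1 = preg_split('/\s+/', $str);
    preg_match_all('/\s+/', $str, $m);
    $spaces1 = array_map('strlen', $m[0]);
}
$t1 = microtime(true) - $start;

// Approach 2: one preg_split() with captured delimiters, then sort the pieces.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $words2 = [];
    $spaces2 = [];
    $parts = preg_split('/(\s+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    foreach ($parts as $part) {
        if (trim($part) === '') {
            $spaces2[] = strlen($part); // whitespace run
        } else {
            $words2[] = $part;          // word
        }
    }
}
$t2 = microtime(true) - $start;

printf("preg_split + preg_match_all: %.3f s\n", $t1);
printf("single preg_split + foreach: %.3f s\n", $t2);
```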

$financialYear = '2015-2016'; // quoted: unquoted, 2015-2016 would be evaluated as the integer -1
$test = explode('-',$financialYear);
echo $test[0]; // 2015
echo $test[1]; // 2016
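Because explode() returns a plain indexed array, the two parts can also be unpacked directly with short list syntax (PHP 7.1+). A minimal sketch:

```php
<?php
$financialYear = '2015-2016';

// Destructure the two halves; the limit of 2 keeps any further '-' in the second part.
[$startYear, $endYear] = explode('-', $financialYear, 2);

echo $startYear, PHP_EOL; // 2015
echo $endYear, PHP_EOL;   // 2016
```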

Splitting with regex has been demonstrated well by earlier answers, but I think this is a perfect case for calling ctype_space() to determine which result array should receive the encountered value.
Code:
$string = "This is a    string";
$words = [];
$spaces = [];
foreach (preg_split('~( +)~', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE) as $s) {
if (ctype_space($s)) {
$spaces[] = strlen($s);
} else {
$words[] = $s;
}
}
var_export([
'words' => $words,
'spaces' => $spaces
]);
Output:
array (
'words' =>
array (
0 => 'This',
1 => 'is',
2 => 'a',
3 => 'string',
),
'spaces' =>
array (
0 => 1,
1 => 1,
2 => 4,
),
)
If you want to replace the piped constants passed to preg_split(), you can just use 3. This represents PREG_SPLIT_NO_EMPTY, which is 1, plus PREG_SPLIT_DELIM_CAPTURE, which is 2. Be aware that this shorter form is also less readable:
preg_split('~( +)~', $string, -1, 3)
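Here is a quick check that the two forms are equivalent on your own PHP build. The sample string matches the question's.

```php
<?php
$string = "This is a    string";

// Named flags vs. their combined integer value.
$named   = preg_split('~( +)~', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
$numeric = preg_split('~( +)~', $string, -1, 3);

var_dump(PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE); // int(3)
var_dump($named === $numeric);                              // bool(true)
```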

What about this? Does anyone care to profile it?
$str = str_replace(["\t", "\n", "\r", "\0", "\v"], ' ', $str); // the same characters trim() strips; "\v" is the vertical tab
$words = explode(' ', $str);
$words = array_values(array_filter($words, 'strlen')); // runs of spaces leave empty elements, so drop them ('strlen' keeps a literal "0")

Related

Comma-separated string to associative array with lowercase keys

I have a string like this:
$string = 'Apple, Orange, Lemone';
I want to turn this string into:
$array = array('apple'=>'Apple', 'orange'=>'Orange', 'lemone'=>'Lemone');
How can I achieve this?
I used the explode function like this:
$string = explode(',', $string );
But that only gives me this:
Array ( [0] => Apple [1] => Orange [2] => Lemone )
Someone said this question is a duplicate of this SO question: Split a comma-delimited string into an array?
But that question does not address my problem; I already have its answer. Please read my question: I need to change the result array's keys and their case. For example, from:
Array
(
[0] => 9
[1] => admin#example.com
[2] => 8
)
to this:
Array
(
[9] => 9
[admin#example.com] => admin#example.com
[8] => 8
)
You may try:
$string = 'Apple, Orange, Lemone';
$string = str_replace(' ', '', $string);
$explode = explode(',',$string);
$res = array_combine($explode,$explode);
echo '<pre>';
print_r($res);
echo '</pre>';
Or, if you need lowercase keys in the resulting array, use the following:
echo '<pre>';
print_r(array_change_key_case($res,CASE_LOWER));
echo '</pre>';
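Note that str_replace(' ', '', ...) removes every space, including any inside a value. A hypothetical 'Green Apple' entry would become 'GreenApple'. The sketch below assumes values may contain internal spaces and trims each piece instead:

```php
<?php
$string = 'Apple, Orange, Lemone, Green Apple';

// Split on commas, then trim only the surrounding whitespace of each piece.
$values = array_map('trim', explode(',', $string));

// Use each value as its own key, then lowercase the keys.
$res = array_change_key_case(array_combine($values, $values), CASE_LOWER);

print_r($res);
```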
<?php
$string = 'Apple, Orange, Lemone'; // your string having space after each value
$string = str_replace(' ', '', $string); // remove the blank spaces
$array = explode(',', $string);
$final_array = array_combine($array, $array);
$final_array = array_change_key_case($final_array, CASE_LOWER); // convert all the keys to lower case, as required
echo '<pre>';
print_r($final_array);
?>
You can use array functions such as array_combine() and array_values():
$string = 'Apple, Orange, Lemone';
$arr = explode(', ', $string);
$assocArr = array_change_key_case(
array_combine(array_values($arr), $arr)
);
<?php
$string = 'Apple, Orange, Lemone';
$array = explode(', ', $string);
print_r($array);
The output will be:
Array
(
[0] => Apple
[1] => Orange
[2] => Lemone
)
Now build the associative array:
$ar = array();
foreach ($array as $value) {
$ar[$value] = $value;
}
print_r($ar);
Your desired output:
Array
(
[Apple] => Apple
[Orange] => Orange
[Lemone] => Lemone
)
$valuesInArrayWithSpace = explode(',', $string);
$finalArray = [];
foreach ($valuesInArrayWithSpace as $singleItem) {
    // trim() removes the space after each comma; the key is also lowercased
    $finalArray[strtolower(trim($singleItem))] = trim($singleItem);
}
This way you'll have what you need:
$lowerString = strtolower($string);
$values = explode(', ', $string);
$keys = explode(', ', $lowerString);
$array = array_combine($keys, $values);
print_r($array);
Or...
$csvdata = str_getcsv('Apple,Orange,Lemone');
$arr = [];
foreach($csvdata as $a) {
$arr[strtolower(trim($a))] = $a;
}
print_r($arr);
Use array_flip() to turn the values into keys, then array_combine():
<?php
$string = 'Apple, Orange, Lemone';
$array = explode(', ', $string);
$new_array = array_flip($array); // ['Apple' => 0, 'Orange' => 1, 'Lemone' => 2]
$final_array = array_combine(array_keys($new_array), $array); // ['Apple' => 'Apple', ...]
print_r($final_array);
?>
A clean, functional-style approach can use array_reduce() after splitting on commas followed by spaces.
Before pushing the new associative values into the result array, change the key to lowercase.
Code:
$string = 'Apple, Orange, Lemone';
var_export(
array_reduce(
explode(', ', $string),
fn($result, $v) =>
$result + [strtolower($v) => $v],
[]
)
);
Output:
array (
'apple' => 'Apple',
'orange' => 'Orange',
'lemone' => 'Lemone',
)

PHP foreach() loop

$string = "The complete archive of The New York Times can now be searched from NYTimes.com "; // the actual input is unknown; it is read from a textarea
$size = the length of the longest word in the string
I create and initialize one array per word length in a for loop (array1, array2, ... arrayN). Here is how I did it:
for ($i = 1; $i <= $size; $i++) {
${"array" . $i} = array();
}
so that the words of $string are grouped by their length:
$array1 = [""];
$array2 = ["of", "be", ...]
$array3 = ["the", "can", "now", ...] and so on
So my question is: how can I assign the words of $string to $array1, $array2, $array3, ... in a simple for or foreach loop, given that the input text, and therefore the length of the longest word, is unknown?
I'd probably start with $words = explode(' ', $string);
then sort the words by length:
usort($words, function($word1, $word2) {
if (strlen($word1) == strlen($word2)) {
return 0;
}
return (strlen($word1) < strlen($word2)) ? -1 : 1;
});
$longestWordSize = strlen(end($words)); // end() returns the last (longest) word after sorting
Loop over the words and place in their respective buckets.
Rather than separate variables for each length array, you should consider something like
$sortedWords = array(
1 => array('a', 'I'),
2 => array('to', 'be', 'or', 'is'),
3 => array('not', 'the'),
);
By looping over the words, you don't need to know the maximum word length in advance.
The final solution is as simple as
foreach ($words as $word) {
$wordLength = strlen($word);
$sortedWords[ $wordLength ][] = $word;
}
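Below is a self-contained sketch of this bucketing approach. It swaps in preg_split() with PREG_SPLIT_NO_EMPTY for explode(). The question's string ends with a space, which explode(' ', ...) would turn into an empty element of length 0. It also assumes you want the buckets in length order, hence ksort().

```php
<?php
$string = "The complete archive of The New York Times can now be searched from NYTimes.com ";

$sortedWords = [];
foreach (preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY) as $word) {
    // The word length is the bucket index; [] appends to that bucket.
    $sortedWords[strlen($word)][] = $word;
}
ksort($sortedWords); // order buckets by word length

// The longest word length is simply the largest key.
$longestWordSize = max(array_keys($sortedWords));

print_r($sortedWords);
echo $longestWordSize, PHP_EOL;
```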
You could use something like this:
$words = explode(" ", $string);
foreach ($words as $w) {
array_push(${"array" . strlen($w)}, $w);
}
This splits up $string into an array of $words and then evaluates each word for length and pushes that word to the appropriate array.
You can use explode():
$string = "The complete archive of The New York Times can now be searched from NYTimes.com " ;
$arr=explode(" ",$string);
$count=count($arr);
$big=0;
for ($i = 0; $i < $count; $i++) {
$p=strlen($arr[$i]);
if($big<$p){ $big_val=$arr[$i]; $big=$p;}
}
echo $big_val;
Just use the word length as the index and append each word with []:
foreach(explode(' ', $string) as $word) {
$array[strlen($word)][] = $word;
}
To remove duplicates, use $array = array_map('array_unique', $array);
Yields:
Array
(
[3] => Array
(
[0] => The
[2] => New
[3] => can
[4] => now
)
[8] => Array
(
[0] => complete
[1] => searched
)
[7] => Array
(
[0] => archive
)
[2] => Array
(
[0] => of
[1] => be
)
[4] => Array
(
[0] => York
)
[5] => Array
(
[0] => Times
)
)
If you want to re-index the main array use array_values() and to re-index the subarrays use array_map() with array_values().

PHP - Multiple delimiter implode

I want to implode an array with multiple delimiters. I've already exploded the string with this PHP function:
function multiexplode ($delimiters,$string) {
$ready = str_replace($delimiters, $delimiters[0], $string);
$launch = explode($delimiters[0], $ready);
return $launch;
}
$text = "here is a sample: this text, and this will be exploded. this also | this one too :)";
$exploded = multiexplode(array(",",".","|",":"),$text);
The output of this is:
Array (
[0] => here is a sample
[1] => this text
[2] => and this will be exploded
[3] => this also
[4] => this one too
[5] => )
)
Can I implode this array with the following multiple delimiters: , . | : ?
Edit:
To define the rules, I think this is the best option:
$test = array(':', ',', '.', '|', ':');
$i = 0;
foreach ($exploded as $value) {
$exploded[$i] .= $test[$i];
$i++;
}
$test2 = implode($exploded);
The output of $test2 is:
here is a sample: this text, and this will be exploded. this also | this one too :)
Now I only need to know how to build the $test array (maybe with preg_match()?) so that it matches the characters , . | : and stores them in an array in the order they occur in the string. Is this possible?
function multiexplode ($delimiters,$string) {
$ready = str_replace($delimiters, $delimiters[0], $string);
$launch = explode($delimiters[0], $ready);
return $launch;
}
$string = "here is a sample: this text, and this will be exploded. this also | this one too :)";
echo "Input:".PHP_EOL.$string;
$needle = array(",",".","|",":");
$split = multiexplode($needle, $string);
$chars = implode($needle);
$found = array();
while (false !== $search = strpbrk($string, $chars)) {
$found[] = $search[0];
$string = substr($search, 1);
}
echo PHP_EOL.PHP_EOL."Found needle:".PHP_EOL.PHP_EOL;
print_r($found);
$i = 0;
foreach ($split as $value) {
$split[$i] .= $found[$i] ?? ''; // the last piece has no delimiter after it
$i++;
}
$output = implode($split);
echo PHP_EOL."Output:".PHP_EOL.$output;
The output of this is:
Input:
here is a sample: this text, and this will be exploded. this also | this one too :)
Found needle:
Array
(
[0] => :
[1] => ,
[2] => .
[3] => |
[4] => :
)
Output:
here is a sample: this text, and this will be exploded. this also | this one too :)
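strpbrk() returns the part of the string starting at the first occurrence of any of the given characters. So $search[0] is the delimiter just found, and substr($search, 1) continues the search after it.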
It's my first contribution to Stack Overflow, hope it helps.
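As an alternative sketch (not the approach above), preg_split() with PREG_SPLIT_DELIM_CAPTURE returns the delimiters in the same array as the pieces, in the order they occur. The original string can then be rebuilt with a plain implode(). The character class assumes the same four delimiters as the question.

```php
<?php
$text = "here is a sample: this text, and this will be exploded. this also | this one too :)";

// Capture group around the delimiter class, so each delimiter is kept.
$parts = preg_split('/([,.|:])/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

// Without PREG_SPLIT_NO_EMPTY the array alternates text, delimiter, text, ...
// so odd indexes hold the delimiters and even indexes hold the text between them.
$delimiters = [];
$pieces = [];
foreach ($parts as $i => $part) {
    if ($i % 2 === 1) {
        $delimiters[] = $part;
    } else {
        $pieces[] = $part;
    }
}

print_r($delimiters);               // [':', ',', '.', '|', ':']
echo implode('', $parts), PHP_EOL;  // the original string
```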

string to array, split by single and double quotes

I'm trying to use PHP to split a string into array components using either " or ' as the delimiter. I just want to split by the outermost quoted string. Here are four examples and the desired result for each:
$pattern = "?????";
$str = "the cat 'sat on' the mat";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
[0] => the cat
[1] => 'sat on'
[2] => the mat
)*/
$str = "the cat \"sat on\" the mat";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
[0] => the cat
[1] => "sat on"
[2] => the mat
)*/
$str = "the \"cat 'sat' on\" the mat";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
[0] => the
[1] => "cat 'sat' on"
[2] => the mat
)*/
$str = "the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
[0] => the
[1] => 'cat "sat" on'
[2] => the mat
[3] => 'when "it" was'
[4] => seventeen
)*/
As you can see, I only want to split by the outermost quotation, and I want to ignore any quotations within quotations.
The closest I have come up with for $pattern is
$pattern = "/((?P<quot>['\"])[^(?P=quot)]*?(?P=quot))/";
but obviously this is not working.
You can use preg_split() with the PREG_SPLIT_DELIM_CAPTURE option. The regular expression is not quite as elegant as Jan Turoň's back-reference approach, because the required capture group messes up the results.
$str = "the 'cat \"sat\" on' the mat the \"cat 'sat' on\" the mat";
$match = preg_split("/('[^']*'|\"[^\"]*\")/U", $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($match);
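Here is a minimal sketch building on the same preg_split() idea. It drops the /U modifier, since the negated character classes already stop at the matching quote. It then trims the unquoted pieces and removes any that end up empty. The sketch assumes quotes are balanced, and splitOnQuotes() is a hypothetical helper name.

```php
<?php
function splitOnQuotes(string $str): array
{
    // Alternation: a single-quoted run OR a double-quoted run. Quotes of the
    // other kind inside are ordinary characters, so "cat 'sat' on" stays whole.
    $parts = preg_split('/(\'[^\']*\'|"[^"]*")/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Trim surrounding whitespace (quoted parts end in quotes, so they are unchanged)
    // and drop pieces that became empty, e.g. when a quote starts the string.
    $parts = array_map('trim', $parts);
    return array_values(array_filter($parts, 'strlen'));
}

print_r(splitOnQuotes("the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen"));
```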
You can use just preg_match for this:
$str = "the \"cat 'sat' on\" the mat";
$pattern = '/^([^\'"]*)(([\'"]).*\3)(.*)$/';
if (preg_match($pattern, $str, $matches)) {
printf("[initial] => %s\n[quoted] => %s\n[end] => %s\n",
$matches[1],
$matches[2],
$matches[4]
);
}
This prints:
[initial] => the
[quoted] => "cat 'sat' on"
[end] => the mat
Here is an explanation of the regex:
/^([^\'"]*) => put the initial bit until the first quote (either single or double) in the first captured group
(([\'"]).*\3) => capture in \2 the text from the opening quote (single or double, captured in \3) up to the closing quote, which must be the same type as the opening one, hence the \3. Because the regex is greedy by nature, it runs from the first quote to the last one, regardless of how many quotes are inside.
(.*)$/ => Capture until the end in \4
Yet another solution, using preg_replace_callback():
$result1 = array();
function parser($p) {
global $result1;
$result1[] = $p[0];
return "|"; // temporary delimiter
}
$str = "the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen";
$str = preg_replace_callback("/(['\"]).*\\1/U", "parser", $str);
$result2 = explode("|",$str); // using temporary delimiter
Now you can zip those arrays using array_map():
$result = array();
function zipper($a,$b) {
global $result;
if($a) $result[] = $a;
if($b) $result[] = $b;
}
array_map("zipper",$result2,$result1);
print_r($result);
And the result is
[0] => the
[1] => 'cat "sat" on'
[2] => the mat
[3] => 'when "it" was'
[4] => seventeen
Note: it would probably be better to wrap this in a class, so the global variables can be avoided.
You can use back references and the ungreedy modifier in preg_match_all():
$str = "the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen";
preg_match_all("/(['\"])(.*)\\1/U", $str, $match);
print_r($match[0]);
Now you have your outermost quotation parts
[0] => 'cat "sat" on'
[1] => 'when "it" was'
And you can find the rest of the string with substr() and strpos() (a kind of black-box solution):
$a = $b = 0; $result = array();
foreach($match[0] as $part) {
$b = strpos($str, $part, $a); // search from the end of the previous match, in case a part repeats
$result[] = substr($str,$a,$b-$a);
$result[] = $part;
$a = $b+strlen($part);
}
$result[] = substr($str,$a);
print_r($result);
Here is the result
[0] => the
[1] => 'cat "sat" on'
[2] => the mat
[3] => 'when "it" was'
[4] => seventeen
If a quotation sits at the very beginning or end of the string, just strip the resulting empty leading or trailing element.

Only print defined str_word_count matches?

How can I use str_word_count($str, 1) as an array and output only the words at positions I define, leaving the others out? For "Hello World This Is a Test" I get Array ( [0] => Hello [1] => World [2] => This [3] => Is [4] => a [5] => Test ), which is six words. I want to output only the positions I choose. For example, 1 and 2 would omit "This Is a Test" and leave only "Hello World", or 1 and 6 would give "Hello Test".
You can do that with array_intersect() and either str_word_count() or explode():
$input = 'Hello World This Is a Test';
$allow = array('Hello', 'Test');
$data = explode(' ', $input);
// or your way
$data = str_word_count($input, 1);
$output = array_intersect($data, $allow);
$count = count($output);
echo 'Found ' . $count;
var_dump($output);
PHP 5.3+ solution:
$input = 'Hello World This Is a Test';
$allow = array('Hello', 'World');
$array = array_filter(
str_word_count( $input, 1 ),
function( $v ) use( $allow ) {
return in_array( $v, $allow );
}
);
print_r( $array );
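Both answers above filter by word value, but the question asks for word positions. Here is a sketch that selects by position instead, assuming the positions are counted from 1 as in the question:

```php
<?php
$input = 'Hello World This Is a Test';

// str_word_count($input, 1) returns the words with 0-based keys.
$words = str_word_count($input, 1);

// Positions as counted in the question (1-based); 1 and 6 should give "Hello Test".
$positions = [1, 6];

// Convert to 0-based keys and keep only those entries (original keys are preserved).
$keys = array_map(function ($p) { return $p - 1; }, $positions);
$picked = array_intersect_key($words, array_flip($keys));

$sentence = implode(' ', $picked);
echo $sentence, PHP_EOL; // Hello Test
```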
