Why is there an extra empty row when splitting by multibyte punctuation? - php

Try this:
$pattern = '/[\x{ff0c},]/u';
//$string = "something here ; and there, oh,that's all!";
$string = 'hei,nihao,a ';
echo '<pre>', print_r( preg_split( $pattern, $string ), 1 ), '</pre>';
exit();
output:
<pre>Array
(
[0] => hei,nihao,a
)
</pre>

The string you have contains a fullwidth comma (hex ff0c) as well as a regular comma. Have you tried updating the pattern to my version, which accounts for both?
<?php
$pattern = '/[\x{ff0c},]/u';
//$string = "something here ; and there, oh,that's all!";
$string = 'hei,nihao,a ';
echo '<pre>', print_r( preg_split( $pattern, $string ), 1 ), '</pre>';
Output:
Array
(
[0] => hei
[1] => nihao
[2] => a
)
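To illustrate where the extra empty rows can come from, here is a minimal sketch. It assumes PHP 7 or later (for the "\u{ff0c}" escape) and an input string of my own choosing. Without the u modifier the character class is treated as a set of single bytes, so the three bytes of the fullwidth comma each act as a separate delimiter.

```php
<?php
// Hypothetical input: a fullwidth comma (U+FF0C, bytes EF BC 8C) and an ASCII comma.
$string = "hei\u{ff0c}nihao,a";

// Without /u the class is a set of single bytes, so each byte of the
// fullwidth comma is its own delimiter and empty pieces appear between them.
$bytewise = preg_split("/[\xEF\xBC\x8C,]/", $string);

// With /u the class holds two code points, so the fullwidth comma splits once.
$unicode = preg_split('/[\x{ff0c},]/u', $string);

print_r($bytewise);
print_r($unicode);
```

If empty pieces can still occur for other reasons (for example two commas in a row), passing PREG_SPLIT_NO_EMPTY as the fourth argument drops them.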

Related

PHP String(which contain array of object) to Array conversion

Suppose I have a string (which contain array of objects):
$string = "[{'test':'1', 'anothertest':'2'}, {'test':'3', 'anothertest':'4'}]";
My goal is to get the output to look like this when I print_r:
Array
(
[0] => Array
(
[test] => 1
[anothertest] => 2
)
[1] => Array
(
[test] => 3
[anothertest] => 4
)
)
I tried json_decode($string), but it returned NULL.
I also tried my own workaround, which more or less solves the problem:
$string = "[{'test':'1', 'anothertest':'2'}, {'test':'3', 'anothertest':'4'}]";
$string = substr($string, 1, -1);
$string = str_replace("'","\"", $string);
$string = str_replace("},","}VerySpecialSeparator", $string);
$arrayOfString = explode("VerySpecialSeparator",$string);
$results = [];
foreach($arrayOfString as $string) {
$results[] = json_decode($string, true);
}
echo "<pre>";
print_r($results);
die;
But are there any other ways to solve this?
With the data you have given, correcting the quotes is enough to get your desired output, so do it like this:
<?php
$string = "[{'test':'1', 'anothertest':'2'}, {'test':'3', 'anothertest':'4'}]";
$string = str_replace("'",'"', $string);
print_r(json_decode($string,true));
https://3v4l.org/PDI7O
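A short sketch of why the original json_decode() call returned NULL, and how to see the reason. It only uses the string from the question; the variable names are mine.

```php
<?php
$string = "[{'test':'1', 'anothertest':'2'}, {'test':'3', 'anothertest':'4'}]";

// Single-quoted keys and values are not valid JSON, so decoding fails.
$raw = json_decode($string, true);
$errorCode = json_last_error();        // read before the next json_decode() resets it
$errorText = json_last_error_msg();

// Swapping the quotes makes the string valid JSON for this particular input.
$fixed = json_decode(str_replace("'", '"', $string), true);

var_dump($raw, $errorText);
print_r($fixed);
```

Note that a blanket str_replace() would also rewrite apostrophes inside the values themselves, so this approach assumes the data contains none.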

Split a string just before each occurrence of 3 specific delimiters

I'm a bit lost with preg_split() in parsing a string with multiple delimiters and keeping the delimiter in the 'after' part of the split.
My delimiters are $, #, and ?.
For instance:
$str = 'participant-$id#-group';
$ar = preg_split('/([^$#?]+[$#?]+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
echo "<pre>"; print_r( $ar); echo "</pre>";
will show:
Array
(
[0] => participant-$
[1] => id#
[2] => -group
)
However I need:
Array
(
[0] => participant-
[1] => $id
[2] => #-group
)
Regex makes my brain hurt, so could someone advise how I can use PREG_SPLIT_DELIM_CAPTURE and keep the delimiter at the beginning of the segment?
Try this:
$ar = preg_split('/(\$[^#]+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
How about this. I am capturing the delimiters and then put them back together.
<?php
$str = 'participant-$id#-group';
$ar = preg_split('/([$#?])/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY); // capture each delimiter on its own
echo "<pre>"; print_r( $ar); echo "</pre>";
/*
Array
(
[0] => participant-
[1] => $
[2] => id
[3] => #
[4] => -group
) */
$result = array();
$result[] = $ar[0];
for($i=1;$i<count($ar);$i+=2) {
$result[] = $ar[$i] . $ar[$i+1];
}
echo "<pre>"; print_r( $result); echo "</pre>";
/*
Array
(
[0] => participant-
[1] => $id
[2] => #-group
)
*/
?>
You don't need to capture the delimiters; just use a lookahead for $, #, or ?. This splits your string at the zero-width position before each delimiter, so no characters are lost or consumed while exploding.
Code: (Demo)
$str = 'participant-$id#-group';
var_export(
preg_split('/(?=[$#?])/', $str)
);
Output:
array (
0 => 'participant-',
1 => '$id',
2 => '#-group',
)
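One case the lookahead answer does not show is a string that begins with a delimiter, where the zero-width match at the very start can yield an empty first element. A minimal sketch, using a made-up input string, that guards against this with PREG_SPLIT_NO_EMPTY:

```php
<?php
// Hypothetical input that begins with one of the delimiters.
$str = '$id#-group?flag';

// The lookahead matches the empty position before each of $, # and ?.
// PREG_SPLIT_NO_EMPTY discards any empty piece produced at the start.
$parts = preg_split('/(?=[$#?])/', $str, -1, PREG_SPLIT_NO_EMPTY);

var_export($parts);
```

Each segment still starts with its delimiter, as in the answer above.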

php explode: split string into words by using space as a delimiter

$str = "This is a    string";
$words = explode(" ", $str);
Works fine, but spaces still go into array:
$words === array ('This', 'is', 'a', '', '', '', 'string');//true
I would prefer to have words only with no spaces and keep the information about the number of spaces separate.
$words === array ('This', 'is', 'a', 'string');//true
$spaces === array(1,1,4);//true
Just added: (1, 1, 4) means one space after the first word, one space after the second word and 4 spaces after the third word.
Is there any way to do it fast?
Thank you.
For splitting the String into an array, you should use preg_split:
$string = 'This is a    string';
$data = preg_split('/\s+/', $string);
Your second part (counting spaces):
$string = 'This is a    string';
preg_match_all('/\s+/', $string, $matches);
$result = array_map('strlen', $matches[0]);// [1, 1, 4]
Here is one way, splitting the string and running a regex once, then parsing the results to see which segments were captured as the split (and therefore only whitespace), or which ones are words:
$temp = preg_split('/(\s+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
$spaces = array();
$words = array_reduce( $temp, function( $result, $item) use ( &$spaces) { // the carry is passed by value and returned
if( strlen( trim( $item)) === 0) {
$spaces[] = strlen( $item);
} else {
$result[] = $item;
}
return $result;
}, array());
You can see from this demo that $words is:
Array
(
[0] => This
[1] => is
[2] => a
[3] => string
)
And $spaces is:
Array
(
[0] => 1
[1] => 1
[2] => 4
)
You can use preg_split() for the first array:
$str = 'This is a    string';
$words = preg_split('#\s+#', $str);
And preg_match_all() for the $spaces array:
preg_match_all('#\s+#', $str, $m);
$spaces = array_map('strlen', $m[0]);
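The two calls above can be wrapped into one function. This is only a sketch: splitWordsAndSpaces() is a hypothetical name of mine, and I added PREG_SPLIT_NO_EMPTY so that leading or trailing whitespace does not produce empty words.

```php
<?php
// splitWordsAndSpaces() is a hypothetical helper name, not part of PHP.
function splitWordsAndSpaces(string $str): array
{
    // Words: split on runs of whitespace, dropping empty pieces at the edges.
    $words = preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);

    // Gaps: collect each whitespace run and record its length.
    preg_match_all('/\s+/', $str, $m);
    $spaces = array_map('strlen', $m[0]);

    return [$words, $spaces];
}

[$words, $spaces] = splitWordsAndSpaces("This is a    string");
print_r($words);
print_r($spaces);
```

Be aware that if the input starts or ends with whitespace, those runs are counted in $spaces as well, so it will no longer line up one-to-one with the gaps between words.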
Another way to do it would be to use a foreach loop.
$str = "This is a    string";
$words = explode(" ", $str);
$spaces=array();
$others=array();
foreach($words as $word)
{
if($word==='') // explode() yields empty strings, not spaces, between consecutive delimiters
{
array_push($spaces,$word);
}
else
{
array_push($others,$word);
}
}
Here are the results of performance tests:
$str = "This is a    string";
var_dump(time());
for ($i=1;$i<100000;$i++){
//Alma Do Mundo - the winner
$rgData = preg_split('/\s+/', $str);
preg_match_all('/\s+/', $str, $rgMatches);
$rgResult = array_map('strlen', $rgMatches[0]);// [1,1,4]
}
print_r($rgData); print_r( $rgResult);
var_dump(time());
for ($i=1;$i<100000;$i++){
//nickb
$temp = preg_split('/(\s+)/', $str, -1,PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
$spaces = array();
$words = array_reduce( $temp, function( $result, $item) use ( &$spaces) { // the carry is passed by value and returned
if( strlen( trim( $item)) === 0) {
$spaces[] = strlen( $item);
} else {
$result[] = $item;
}
return $result;
}, array());
}
print_r( $words); print_r( $spaces);
var_dump(time());
int(1378392870)
Array
(
[0] => This
[1] => is
[2] => a
[3] => string
)
Array
(
[0] => 1
[1] => 1
[2] => 4
)
int(1378392871)
Array
(
[0] => This
[1] => is
[2] => a
[3] => string
)
Array
(
[0] => 1
[1] => 1
[2] => 4
)
int(1378392873)
$financialYear = '2015-2016'; // must be a quoted string; unquoted, 2015-2016 evaluates to -1
$test = explode('-',$financialYear);
echo $test[0]; // 2015
echo $test[1]; // 2016
Splitting with regex has been demonstrated well by earlier answers, but I think this is a perfect case for calling ctype_space() to determine which result array should receive the encountered value.
Code: (Demo)
$string = "This is a    string";
$words = [];
$spaces = [];
foreach (preg_split('~( +)~', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE) as $s) {
if (ctype_space($s)) {
$spaces[] = strlen($s);
} else {
$words[] = $s;
}
}
var_export([
'words' => $words,
'spaces' => $spaces
]);
Output:
array (
'words' =>
array (
0 => 'This',
1 => 'is',
2 => 'a',
3 => 'string',
),
'spaces' =>
array (
0 => 1,
1 => 1,
2 => 4,
),
)
If you want to replace the piped constants used by preg_split() you can just use 3 (Demo). This represents PREG_SPLIT_NO_EMPTY which is 1 plus PREG_SPLIT_DELIM_CAPTURE which is 2. Be aware that with this reduction in code width, you also lose code readability.
preg_split('~( +)~', $string, -1, 3)
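A quick check of the claim that 3 stands for both flags together. The sketch only assumes the sample string with four spaces before "string"; the variable names are mine.

```php
<?php
// The flags are bit values, so combining them with | produces their sum here.
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;
var_dump($flags);

$string = "This is a    string";
$named   = preg_split('~( +)~', $string, -1, $flags);
$numeric = preg_split('~( +)~', $string, -1, 3);

// Both calls return the same words and captured space runs.
var_dump($named === $numeric);
```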
What about this? Does someone care to profile this?
$str = str_replace(["\t", "\n", "\r", "\0", "\v"], ' ', $str); // \v -> vertical tab, see trim()
$words = explode(' ', $str);
$words = array_filter($words); // runs of spaces produce many empty elements, so drop them (this also drops a literal "0")
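One thing to watch with a bare array_filter() is that it removes every falsy value, not only empty strings. A small sketch with a made-up input shows the difference when a callback is supplied:

```php
<?php
// Hypothetical input containing a literal "0" word and a run of two spaces.
$str = "a 0  b";
$words = explode(' ', $str);           // ['a', '0', '', 'b']

// Default array_filter() drops every falsy value, including the string "0".
$loose = array_values(array_filter($words));

// Filtering on strlen keeps "0" and removes only the empty strings.
$strict = array_values(array_filter($words, 'strlen'));

print_r($loose);
print_r($strict);
```

array_values() is there because array_filter() preserves the original keys, which would otherwise leave gaps in the numbering.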

string to array, split by single and double quotes

I'm trying to use PHP to split a string into array components using either " or ' as the delimiter. I just want to split by the outermost quotes. Here are four examples and the desired result for each:
$pattern = "?????";
$str = "the cat 'sat on' the mat";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
[0] => the cat
[1] => 'sat on'
[2] => the mat
)*/
$str = "the cat \"sat on\" the mat";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
[0] => the cat
[1] => "sat on"
[2] => the mat
)*/
$str = "the \"cat 'sat' on\" the mat";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
[0] => the
[1] => "cat 'sat' on"
[2] => the mat
)*/
$str = "the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
[0] => the
[1] => 'cat "sat" on'
[2] => the mat
[3] => 'when "it" was'
[4] => seventeen
)*/
As you can see, I only want to split by the outermost quotation, and I want to ignore any quotations within quotations.
The closest I have come up with for $pattern is
$pattern = "/((?P<quot>['\"])[^(?P=quot)]*?(?P=quot))/";
but obviously this is not working.
You can use preg_split with the PREG_SPLIT_DELIM_CAPTURE option. The regular expression is not quite as elegant as @Jan Turoň's back reference approach because the required capture group messes up the results.
$str = "the 'cat \"sat\" on' the mat the \"cat 'sat' on\" the mat";
$match = preg_split("/('[^']*'|\"[^\"]*\")/U", $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($match);
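To get closer to the output asked for in the question, the same split can be combined with PREG_SPLIT_NO_EMPTY and a trim pass. This is a sketch run against the fourth example string from the question, not a general quote parser:

```php
<?php
$str = "the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen";

// Capture each quoted run so preg_split() returns it alongside the text between runs.
$parts = preg_split(
    "/('[^']*'|\"[^\"]*\")/",
    $str,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

// Trim the unquoted pieces; the quoted ones have no outer whitespace to lose.
$parts = array_map('trim', $parts);
print_r($parts);
```

The pattern assumes that a quoted run never contains its own quote character, and a whitespace-only piece between two adjacent quoted runs would become an empty string after trimming.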
You can use just preg_match for this:
$str = "the \"cat 'sat' on\" the mat";
$pattern = '/^([^\'"]*)(([\'"]).*\3)(.*)$/';
if (preg_match($pattern, $str, $matches)) {
printf("[initial] => %s\n[quoted] => %s\n[end] => %s\n",
$matches[1],
$matches[2],
$matches[4]
);
}
This prints:
[initial] => the
[quoted] => "cat 'sat' on"
[end] => the mat
Here is an explanation of the regex:
/^([^\'"]*) => put the initial bit until the first quote (either single or double) in the first captured group
(([\'"]).*\3) => capture in \2 the text corresponding from the initial quote (either single or double) (that is captured in \3) until the closing quote (that must be the same type as the opening quote, hence the \3). The fact that the regexp is greedy by nature helps to get from the first quote to the last one, regardless of how many quotes are inside.
(.*)$/ => Capture until the end in \4
Yet another solution using preg_replace_callback
$result1 = array();
function parser($p) {
global $result1;
$result1[] = $p[0];
return "|"; // temporary delimiter
}
$str = "the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen";
$str = preg_replace_callback("/(['\"]).*\\1/U", "parser", $str);
$result2 = explode("|",$str); // using temporary delimiter
Now you can zip those arrays using array_map
$result = array();
function zipper($a,$b) {
global $result;
if($a) $result[] = $a;
if($b) $result[] = $b;
}
array_map("zipper",$result2,$result1);
print_r($result);
And the result is
[0] => the
[1] => 'cat "sat" on'
[2] => the mat
[3] => 'when "it" was'
[4] => seventeen
Note: It would probably be better to create a class for this, so the global variables can be avoided.
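As a lighter alternative to a class, a closure that captures its collector by reference also avoids the globals. The following is a sketch under my own assumptions: it uses a lazy quantifier instead of the U modifier, interleaves with a plain loop instead of array_map, and still relies on "|" not appearing in the text.

```php
<?php
$str = "the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen";

$quoted = [];
// The closure collects each quoted run by reference, so no global is needed.
$rest = preg_replace_callback(
    "/(['\"]).*?\\1/",
    function (array $m) use (&$quoted): string {
        $quoted[] = $m[0];
        return '|'; // temporary delimiter; assumes the text contains no "|"
    },
    $str
);

// Interleave the unquoted pieces with the quoted runs, in order.
$result = [];
foreach (explode('|', $rest) as $i => $piece) {
    if (trim($piece) !== '') {
        $result[] = trim($piece);
    }
    if (isset($quoted[$i])) {
        $result[] = $quoted[$i];
    }
}
print_r($result);
```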
You can use back references and ungreedy modifier in preg_match_all
$str = "the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen";
preg_match_all("/(['\"])(.*)\\1/U", $str, $match);
print_r($match[0]);
Now you have your outermost quotation parts
[0] => 'cat "sat" on'
[1] => 'when "it" was'
And you can find the rest of the string with substr and strpos (kind of blackbox solution)
$a = $b = 0; $result = array();
foreach($match[0] as $part) {
$b = strpos($str,$part);
$result[] = substr($str,$a,$b-$a);
$result[] = $part;
$a = $b+strlen($part);
}
$result[] = substr($str,$a);
print_r($result);
Here is the result
[0] => the
[1] => 'cat "sat" on'
[2] => the mat
[3] => 'when "it" was'
[4] => seventeen
Just strip any empty leading/trailing element if the quotation is at the very beginning/end of the string.

Only print defined str_word_count matches?

How can I use str_word_count($str, 1) as an array and leave out the words whose numbers I do not define? For example, with Array ( [0] => Hello [1] => World [2] => This [3] => Is [4] => a [5] => Test ), which holds 6 words, I want to output only the positions I define: [1] and [2] would omit "This Is a Test" and leave only "Hello World", and [1] and [6] would give "Hello Test".
You can do that with array_intersect and str_word_count or explode
$input = 'Hello World This Is a Test';
$allow = array('Hello', 'Test');
$data = explode(' ', $input);
// or your way
$data = str_word_count($input, 1);
$output = array_intersect($data, $allow);
$count = count($output);
echo 'Found ' . $count; // . concatenates strings; + is arithmetic in PHP
var_dump($output);
PHP 5.3 solution
$input = 'Hello World This Is a Test';
$allow = array('Hello', 'World');
$array = array_filter(
str_word_count( $input, 1 ),
function( $v ) use( $allow ) {
return in_array( $v, $allow ) ? $v : false;
}
);
print_r( $array );
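Both answers above filter by word. If the goal is to pick words by position instead, as the question seems to ask, one possible sketch uses array_intersect_key(). The $wanted list is a hypothetical input of mine, and positions are zero-based here, whereas the question counts from 1.

```php
<?php
$input = 'Hello World This Is a Test';
$words = str_word_count($input, 1);   // zero-based list of words

// $wanted is a hypothetical list of positions to keep (0 = first word).
$wanted = [0, 5];
$picked = array_intersect_key($words, array_flip($wanted));

echo implode(' ', $picked);
```

The original keys are preserved in $picked, so the position of each kept word is still available.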
