str_getcsv not parsing the data correctly - php

I have a problem with str_getcsv function for PHP.
I have this code:
<?php
$string = '#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=714000,RESOLUTION=640x480,CODECS="avc1.77.30, mp4a.40.34"';
$array = str_getcsv($string, ",", '"');
print_r($array);
Which should return:
Array
(
[0] => #EXT-X-STREAM-INF:PROGRAM-ID=1
[1] => BANDWIDTH=714000
[2] => RESOLUTION=640x480
[3] => CODECS=avc1.77.30, mp4a.40.34
)
But instead, it is returning:
Array
(
[0] => #EXT-X-STREAM-INF:PROGRAM-ID=1
[1] => BANDWIDTH=714000
[2] => RESOLUTION=640x480
[3] => CODECS="avc1.77.30
[4] => mp4a.40.34"
)
This is because it ignores the enclosure around the last parameter (CODECS) and splits that value as well. I'm using str_getcsv instead of explode(",", $string) precisely for that reason (the function should respect the enclosure), but it behaves the same as explode would.
The code being executed: http://eval.in/17471

The enclosure (third) parameter does not have quite that effect. The enclosure character is treated as such only when it appears next to the delimiter.
To get your desired output, the input would need to be
#EXT-X-STREAM-INF:PROGRAM-ID=1,...,"CODECS=avc1.77.30, mp4a.40.34"
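As a rough sketch of the difference (the variable names are only illustrative): the first call parses the rewritten input, where the quote opens the field. The second part is one possible way to keep the original input and split only on commas that sit outside double quotes. It uses a lookahead regex with preg_split rather than str_getcsv, which is an assumption about what is acceptable here and not a feature of str_getcsv itself.

```php
<?php
// Rewritten input: the enclosure starts right after the delimiter.
$quoted = '#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=714000,RESOLUTION=640x480,"CODECS=avc1.77.30, mp4a.40.34"';
$fields = str_getcsv($quoted, ',', '"', '\\');
print_r($fields); // four fields, the last one keeps its inner comma

// Original input: split on commas followed by an even number of quotes,
// i.e. commas that are not inside a quoted value.
$original = '#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=714000,RESOLUTION=640x480,CODECS="avc1.77.30, mp4a.40.34"';
$parts = preg_split('/,(?=(?:[^"]*"[^"]*")*[^"]*$)/', $original);
// Drop the quotes themselves to match the desired output.
$parts = str_replace('"', '', $parts);
print_r($parts);
```

The regex variant assumes quotes are always balanced and never escaped inside a value.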

Related

str_getcsv doesn't enclose first column in double quotation marks in multi-line CSV

I noticed that str_getcsv doesn't seem to enclose the first value it receives in double quotation marks, even when the string data is passed this way.
In the example below, the first value in the 3rd row is "Small Box, But Smaller", but after running it through str_getcsv it becomes Small Box, But Smaller (without double quotation marks). Like this:
// multi-line csv string
$csvString = <<<'CSV'
"Title","Description",Quantity
"Small Box","For storing magic beans.",2
"Small Box, But Smaller","Not sure why we need this.",0
CSV;
// split string into rows (don't use explode in case multi-line values exist)
$csvRows = str_getcsv($csvString, "\n"); // parse rows
echo '<pre>';
print_r($csvRows);
echo '</pre>';
Outputs:
Array
(
[0] => Title,"Description",Quantity
[1] => Small Box,"For storing magic beans.",2
[2] => Small Box, But Smaller,"Not sure why we need this.",0
)
The problem this causes is that if each row is then parsed using str_getcsv, a comma in the first value makes it split into two values. Continuing with this code:
foreach($csvRows as &$csvRow) {
$csvRow = str_getcsv($csvRow); // parse each row into values and save over original array value
}
unset($csvRow); // clean up
// output
echo '<pre>';
print_r($csvRows);
echo '</pre>';
Outputs:
Array
(
[0] => Array
(
[0] => Title
[1] => Description
[2] => Quantity
)
[1] => Array
(
[0] => Small Box
[1] => For storing magic beans.
[2] => 2
)
[2] => Array
(
[0] => Small Box
[1] => But Smaller
[2] => Not sure why we need this.
[3] => 0
)
)
The problem is in the last array value, which is an array of 4 keys instead of 3. It's split on the comma of the value "Small Box, But Smaller".
On the other hand, parsing just one row string works:
$csvRowData = '"Small Box, But Smaller","Not sure why we need this.",0';
$csvValues = str_getcsv($csvRowData);
echo '<pre>';
print_r($csvValues);
echo '</pre>';
Outputs:
Array
(
[0] => Small Box, But Smaller
[1] => Not sure why we need this.
[2] => 0
)
Why is this happening and how do I solve the problem with multi-line CSV data? Is there a best practice for working with multi-line CSV data when it is a string and is not read directly from a file? Also, I need to handle multi-line values, such as "foo \n bar" so I can't just use explode() instead of the first str_getcsv().
After much headache I think I understand the problem now. According to the PHP folks, "str_getcsv() is designed to parse a single CSV record into fields" (see https://bugs.php.net/bug.php?id=55763). I discovered that using str_getcsv() for multiple rows causes these not-so-well documented problems:
Double quotation marks are not maintained (as I demonstrate above).
Line breaks in values cause it to think a new row has begun. This can have many unintended consequences.
I solved the issue by creating a temporary file and writing the CSV content to it. Then I read the file using fgetcsv(), which did not result in the 2 issues I described above. Example code:
// multi-line csv string
$csvString = <<<'CSV'
"Title","Description",Quantity
"Small Box","For storing magic beans.",2
"Small Box, But Smaller","This value
contains
multiple
lines.",0
CSV;
// ^ notice the multiple lines in the last row's value
// create a temporary file
$tempFile = tmpfile();
// write the CSV to the file
fwrite($tempFile, $csvString);
// go to first character
fseek($tempFile, 0);
// track CSV rows
$csvRows = array();
// read the CSV temp file line by line
while (($csvColumns = fgetcsv($tempFile)) !== false) {
$csvRows[] = $csvColumns; // push columns to array (really it would be more memory-efficient to process the data here and not append to an array)
}
// Close and delete the temp file
fclose($tempFile);
// output
echo '<pre>';
print_r($csvRows);
echo '</pre>';
Results in:
Array
(
[0] => Array
(
[0] => Title
[1] => Description
[2] => Quantity
)
[1] => Array
(
[0] => Small Box
[1] => For storing magic beans.
[2] => 2
)
[2] => Array
(
[0] => Small Box, But Smaller
[1] => This value
contains
multiple
lines.
[2] => 0
)
)
I'll also add that I found some options on GitHub, and 2 major projects for PHP 5.4+ and PHP 5.5+. However, I am still using PHP 5.3 and only saw options with limited activity. Furthermore, some of those processed CSV strings by writing to files and reading them out also.
I should also note that the documentation for PHP has some comments about str_getcsv() not being RFC-compliant: http://php.net/manual/en/function.str-getcsv.php. The same seems to be true for fgetcsv() yet the latter did meet my needs, at least in this case.
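A variation on the temporary-file approach, sketched under the assumption that PHP's built-in php://temp stream is acceptable in place of tmpfile(): the CSV string is written to an in-memory stream (which spills to disk only when it grows large) and read back with fgetcsv(). The shortened sample data and variable names are illustrative.

```php
<?php
// Multi-line CSV string; the last value spans several lines.
$csvString = "\"Title\",\"Description\",Quantity\n"
    . "\"Small Box, But Smaller\",\"This value\ncontains\nmultiple\nlines.\",0";

// Open a read/write temporary stream and load the CSV into it.
$stream = fopen('php://temp', 'r+');
fwrite($stream, $csvString);
rewind($stream);

// Read one CSV record at a time; quoted line breaks stay inside the value.
$rows = array();
while (($columns = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
    $rows[] = $columns;
}
fclose($stream);

print_r($rows);
```

This reads the same way as the tmpfile() version, so the choice between the two is mostly a matter of preference.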
I don't know why your PHP_EOL is not working correctly, as it does on my server; however, I did encounter this problem before.
The approach I took goes as follows.
Firstly, I like to make sure all my fields are surrounded by double quotes regardless of the value in the field. To use your example text (with some slight modifications):
// multi-line csv string
$csvString = <<<CSV
"Title","Description","Quantity"
"Small Box","For storing magic beans.","2"
"Small Box, But Smaller","Not sure why we need this.","0"
"a","\n","b","c"
CSV;
$csvString .= PHP_EOL . '"a","' . "\n" . '","' . PHP_EOL . '","c"'; // start a new row, then add a value that is a lone PHP_EOL
Secondly, I target any lone PHP_EOL sequences that may be lingering inside values and remove them, then replace the PHP_EOL sequences between rows with "\r\n":
// Clear any solo end of line characters that are within values
$csvString = str_replace('","' . PHP_EOL . '"', '",""',$csvString);
$csvString = str_replace('"' . PHP_EOL . '","', '"","',$csvString);
$csvString = str_replace('"' . PHP_EOL . '"', '"'. "\r\n" . '"',$csvString);
Finally, this allows me to use PHP's explode function and display the output:
$csvArr = explode("\r\n",$csvString);
foreach($csvArr as &$csvRow) {
$csvRow = str_getcsv($csvRow); // parse each row into values and save over original array value
}
unset($csvRow); // clean up
// output
echo '<pre>';
print_r($csvArr);
echo '</pre>';
Which outputs:
Array
(
[0] => Array
(
[0] => Title
[1] => Description
[2] => Quantity
)
[1] => Array
(
[0] => Small Box
[1] => For storing magic beans.
[2] => 2
)
[2] => Array
(
[0] => Small Box, But Smaller
[1] => Not sure why we need this.
[2] => 0
)
[3] => Array
(
[0] => a
[1] =>
[2] => b
[3] => c
)
[4] => Array
(
[0] => a
[1] =>
[2] =>
[3] => c
)
)
As you can see from the output, the "\n" characters inside values are not targeted, only the PHP_EOL sequences.

Split string in php with comma and new line

I'm trying to split a string in PHP using two delimiters: newline and comma. My code is:
$array = preg_split("/\n|,/", $str);
But the string is split on commas only, not on \n. Why is that? Also, do I have to take the "\r\n" sequence into account?
I can think of two possible reasons that this is happening.
1. You are using a single quoted string:
$array = preg_split("/\n|,/", 'foo,bar\nbaz');
print_r($array);
Array
(
[0] => foo
[1] => bar\nbaz
)
If so, use double quotes " instead ...
$array = preg_split("/\n|,/", "foo,bar\nbaz");
print_r($array);
Array
(
[0] => foo
[1] => bar
[2] => baz
)
2. You have multiple kinds of newline sequences, in which case I would recommend using \R. This matches any newline sequence, including \n, \r, and \r\n.
$array = preg_split('/\R|,/', "foo,bar\nbaz\r\nquz");
print_r($array);
Array
(
[0] => foo
[1] => bar
[2] => baz
[3] => quz
)

PHP preg_replace and explode function

I have some raw data like this:
\u002522\u00253A\u002522https\u00253A\u00255C\u00252F\u00255C\
My intention is to remove the backslash "\" and the first 7 characters of every segment between backslashes, such as \u002522https\. For that segment the output would be only https.
If a segment contains only those 7 characters, like \u002522\, the output should be empty.
My final goal is to put every result in an array, which for the raw data above would look like this:
Array
(
[0] =>
[1] =>
[2] => https
[3] =>
[4] =>
[5] =>
[6] =>
)
I want this result for constructing a URL. I have tried preg_replace and explode to get the expected result, but I have failed.
$text = '\u002522\u00253A\u002522https\u00253A\u00255C\u00252F\u00255C\\';
$text = preg_replace("#(\\\\[a-z0-9]{7})#is",",",$text);
$text_array = explode(",",trim($text,'\\'));
print_r($text_array);
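One possible approach, offered only as a sketch: the code above produces an extra leading empty element because every segment, including the first, is replaced by a comma. Splitting on the backslash first and then dropping the first 7 characters of each segment avoids that. The variable names $segments and $result are made up for this illustration.

```php
<?php
$text = '\u002522\u00253A\u002522https\u00253A\u00255C\u00252F\u00255C\\';

// Strip the outer backslashes, then split into segments on the backslash.
$segments = explode('\\', trim($text, '\\'));

// Drop the first 7 characters ("u" plus six more) of every segment.
// The cast guards against substr() returning false on older PHP versions.
$result = array();
foreach ($segments as $segment) {
    $result[] = (string) substr($segment, 7);
}

print_r($result);
```

This assumes every segment starts with exactly 7 characters to discard, as in the sample data.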

split regular expression php

I have a string like this:
0d(Hi)i(Hello)4d(who)i(where)540d(begin)i(began)
And I want to turn it into an array.
I first tried adding separators so that I could use the PHP function explode:
;0,d(Hi),i(Hello);4,d(who),i(where);540,d(begin),i(began)
It works, but I want to minimize the separators to save disk space.
Therefore I want to know whether it's possible, using preg_split and a regular expression, to get a nested array like this without any separators:
Array ( [0] => Array ( [0] => 0 [1] => d(Hi) [2] => i(Hello) )
[1] => Array ( [0] => 4 [1] => d(who) [2] => i(where) )
[2] => Array ( [0] => 540 [1] => d(begin) [2] => i(began) )
)
I tried some code and regexes, but I saw that the text matched by the regular expression was not present in the final result (as with explode, the delimiter does not appear in the final array).
Moreover, I have some difficulty building the regex. Here is the one that I made:
$modif = preg_split("/[0-9]+(d(.+))?(i(.+))?/", $data);
I should point out that d() and i() may each be absent (but at least one of them is always present).
Thanks
If you do
preg_match_all('/(\d+)(d\([^()]*\))?(i\([^()]*\))?/', $subject, $result, PREG_SET_ORDER);
on your original string, then you'll get an array where
$result[$i][0]
contains the ith match (i. e. $result[0][0] would be 0d(Hi)i(Hello)) and where
$result[$i][$c]
contains the cth capturing group of the ith match (i. e. $result[0][1] is 0, $result[0][2] is d(Hi) and $result[0][3] is i(Hello)).
Is that what you wanted?
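To get from there to the nested array in the question, a minimal sketch could drop the full match from each entry. The names $rows and $match are illustrative.

```php
<?php
$subject = '0d(Hi)i(Hello)4d(who)i(where)540d(begin)i(began)';

// One entry per record: [full match, number, d(...), i(...)].
preg_match_all('/(\d+)(d\([^()]*\))?(i\([^()]*\))?/', $subject, $result, PREG_SET_ORDER);

$rows = array();
foreach ($result as $match) {
    array_shift($match); // remove the full match, keep only the groups
    $rows[] = $match;
}

print_r($rows);
```

Note that when the d() group is absent but i() is present, the d() slot is an empty string, so you may want to filter empty values in that case.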

trying to filter string with <br> tags using explode, does not work

I get a string that looks like this
<br>
ACCEPT:YES
<br>
SMMD:tv240245ce
<br>
It is contained in the variable $_SESSION['result'].
I am trying to parse this string and get the following, either in an array or as separate variables:
ACCEPT:YES
tv240245ce
I first tried to explode the string using <br> as the delimiter, and that did not work.
Then I tried:
$yes = explode(":", strip_tags($_SESSION['result']));
echo print_r($yes);
which gives me an array like so
Array ( [0] => ACCEPT [1] => YESSEED [2] => tv240245ce ) 1
which gives me one of my answers.
What would be a good way to achieve this?
Is there a way to get rid of the first and last <br>,
then use the remaining one as a delimiter to explode the string?
Or what's the best way to go about this?
This will do it:
$data = preg_split('/\s?<br>\s?/', str_replace('SMMD:', '', $data), -1, PREG_SPLIT_NO_EMPTY);
You can also skip caring about the spurious <br> and treat the whole string as key:value format with a simple regex like:
preg_match_all('/^(\w+):(.*)/m', $text, $result, PREG_SET_ORDER);
This requires that you really have line breaks in it though. Gives you a $result list which is easy to convert into an associative array afterwards:
[0] => Array
(
[0] => ACCEPT:YES
[1] => ACCEPT
[2] => YES
)
[1] => Array
(
[0] => SMMD:tv240245ce
[1] => SMMD
[2] => tv240245ce
)
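The conversion to an associative array could look roughly like this. The sample $text and the name $pairs are assumptions for the illustration, and the m modifier is used so that ^ matches at the start of every line.

```php
<?php
// Sample input with real line breaks between the tags and the values.
$text = "<br>\nACCEPT:YES\n<br>\nSMMD:tv240245ce\n<br>";

preg_match_all('/^(\w+):(.*)/m', $text, $result, PREG_SET_ORDER);

// Map each key to its value; trim() removes a stray "\r" if present.
$pairs = array();
foreach ($result as $match) {
    $pairs[$match[1]] = trim($match[2]);
}

print_r($pairs);
```

After this, $pairs['SMMD'] holds the value without its prefix, so no separate str_replace is needed.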
First, do a str_replace to remove all instances of "SMMD:". Then explode on "<br>\n".
Include the new line character and you should get the array you want:
$mystr = str_replace( 'SMMD:', '', $mystr );
$res_array = explode( "<br>\n", $mystr );
