Escaping for CSV - php

I need to store a string in a MySQL database. The values will later be used in a CSV. How do I escape the string so that it is CSV-safe? I assume I need to escape the following: comma, single quote, double quote.
PHP's addslashes function escapes:
single quote ('), double quote ("), backslash (\) and NUL (the NULL byte).
So that won't work. Suggestions? I'd rather not try to create some sort of regex solution.
Also, I need to be able to unescape.

Use fputcsv() to write, and fgetcsv() to read.
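As a rough sketch of that round trip (the sample rows and the use of an in-memory stream are assumptions for illustration, and the empty escape argument assumes PHP 7.4 or later), writing with fputcsv() and reading back with fgetcsv() might look like this:

```php
<?php
// Sample data containing quotes, a comma and a line break.
$rows = [
    ['hello', 'this is my "quote"', "catch 'em all"],
    ['a,b', "two\nlines", '5'],
];

// php://memory keeps the buffer in RAM, so no file is created.
$fp = fopen('php://memory', 'w+');
foreach ($rows as $row) {
    // The empty escape string disables PHP's backslash escape mechanism.
    fputcsv($fp, $row, ',', '"', '');
}

// Read everything back; fgetcsv() undoes the quoting ("unescapes").
rewind($fp);
$readBack = [];
while (($row = fgetcsv($fp, 0, ',', '"', '')) !== false) {
    $readBack[] = $row;
}
fclose($fp);
```

Every value comes back as a string, so the original strings are recovered unchanged.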

fputcsv() is not always necessary, especially if you don't need to write a file and only want to return the CSV as an HTTP response.
All you need to do is wrap each value in double quotes and escape any double quote inside the value by doubling it.
Here are a few examples:
hello -> "hello"
this is my "quote" -> "this is my ""quote"""
catch 'em all -> "catch 'em all"
As you can see, the single quote character doesn't need any escaping.
A full working example follows:
<?php
$arrayToCsvLine = function(array $values) {
$line = '';
$values = array_map(function ($v) {
return '"' . str_replace('"', '""', $v) . '"';
}, $values);
$line .= implode(',', $values);
return $line;
};
$csv = [];
$csv[] = $arrayToCsvLine(["hello", 'this is my "quote"', "catch 'em all"]);
$csv[] = $arrayToCsvLine(["hello", 'this is my "quote"', "catch 'em all"]);
$csv[] = $arrayToCsvLine(["hello", 'this is my "quote"', "catch 'em all"]);
$csv = implode("\r\n", $csv);
If you get an error, it is probably because you're using an old version of PHP. Fix it by declaring the arrays with the old array() syntax and replacing the anonymous function with a named one.
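The question also asks how to unescape. One possible way, sketched here with a hypothetical helper named toCsvLine and assuming PHP 7.4 or later for the empty escape argument, is to pass the line to str_getcsv():

```php
<?php
// Hypothetical helper: quote every value and double embedded quotes.
function toCsvLine(array $values): string
{
    $quoted = array_map(function ($v) {
        return '"' . str_replace('"', '""', (string) $v) . '"';
    }, $values);
    return implode(',', $quoted);
}

$original = ['hello', 'this is my "quote"', "catch 'em all"];
$line = toCsvLine($original);

// str_getcsv() strips the enclosing quotes and collapses the doubled ones.
$decoded = str_getcsv($line, ',', '"', '');
```

Here $decoded should hold the same three strings as $original.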

For those of you trying to sanitise data using PHP and output as a CSV this can be done using PHP's fputcsv() function without having to write to a file as such:
<?php
// An example PHP array holding data to be put into CSV format
$data = [];
$data[] = ['row1_val1', 'row1_val2', 'row1_val3'];
$data[] = ['row2_val1', 'row2_val2', 'row2_val3'];
// Write to memory (if the buffer exceeds 2 MB it spills over to a temporary file)
$fp = fopen('php://temp', 'w+');
foreach ($data as $fields) {
// Add row to CSV buffer
fputcsv($fp, $fields);
}
rewind($fp); // Set the pointer back to the start
$csv_contents = stream_get_contents($fp); // Fetch the contents of our CSV
fclose($fp); // Close our pointer and free up memory and /tmp space
// Handle/Output your final sanitised CSV contents
echo $csv_contents;

Don't store the data CSV escaped in the database. Escape it when you export to CSV using fputcsv. If you're storing it CSV escaped you're essentially storing garbage for all purposes other than CSV exporting.

Related

explode csv file on delimiter (;) and delimiter(,)?

When I split a CSV file on the delimiter (;), it opens correctly in some versions of Excel and fails in others.
The same happens when I split the CSV file on the delimiter (,): it works in some versions of Excel and fails in others.
How can I make this work in all versions of Excel?
How can I find out which delimiter is the right one to use?
Here is the code:
if (!function_exists('create_csv')) {
function create_csv($query, &$filename = false, $old_csv = false) {
if(!$filename) $filename = "data_export_".date("Y-m-d").".csv";
$ci = &get_instance();
$ci->load->helper('download');
$ci->load->dbutil();
$delimiter = ";";
$newline = "\r\n";
$csv = "Data:".date("Y-m-d").$newline;
if($old_csv)
$csv .= $old_csv;
else
$csv .= $ci->dbutil->csv_from_result($query, $delimiter, $newline);
$columns = explode($newline, $csv);
$titles = explode($delimiter, $columns[1]);
$new_titles = array();
foreach ($titles as $item) {
array_push($new_titles, lang(trim($item,'"')));
}
$columns[1] = implode($delimiter, $new_titles);
$csv = implode($newline, $columns);
return $csv;
}
}
Sometimes I set $delimiter = ";";
and sometimes $delimiter = ",";
Thanks.
You can use a helper function to detect the best delimiter, for example:
public function find_delimiter($csv)
{
$delimiters = array(',', '.', ';');
$bestDelimiter = false;
$count = 0;
foreach ($delimiters as $delimiter)
if (substr_count($csv, $delimiter) > $count) {
$count = substr_count($csv, $delimiter);
$bestDelimiter = $delimiter;
}
return $bestDelimiter;
}
If you have an idea of the expected data (number of columns) then this might work as a good guess, and could be a good alternative to comparing which occurs the most (depending on what kind of data you're expecting).
It would work even better if you have a header record, I'd imagine. (You could put in a check for specific header values)
Sorry for not fitting it into your code; I am not really sure what the calls you are making do, but you should be able to adapt it.
$expected_num_of_columns = 10;
$delimiter = "";
foreach (array(",", ";") as $test_delimiter) {
    $fid = fopen($filename, "r");
    $csv_row = fgetcsv($fid, 0, $test_delimiter);
    // Close before any break so the file handle is never leaked
    fclose($fid);
    // fgetcsv() returns false on failure, so make sure we got an array
    if (is_array($csv_row) && count($csv_row) == $expected_num_of_columns) {
        $delimiter = $test_delimiter;
        break;
    }
}
if (empty($delimiter)) {
    die("Input file did not contain the correct number of fields (" . $expected_num_of_columns . ")");
}
Don't use this if, for example, all or most of the fields contain non-integer numbers (e.g. a list of monetary amounts) and the file has no header record, because files separated by ; are most likely to use , as the decimal point, so there could be the same number of commas and semicolons.
The short answer is, you probably can't unless you can apply some heuristic to determine the file format. If you don't know and can't detect the format of the file you're parsing, then parsing it is going to be difficult.
However, once you have determined the delimiter (or required a particular one), you will probably find that PHP's built-in fgetcsv is easier and more accurate than a manual explode-based strategy.
There is no way to be 100% sure you are targeting the real delimiter. All you can do is guess.
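To illustrate why a CSV-aware parser tends to be more accurate than explode, here is a small sketch. The sample line is made up, and the empty escape argument assumes PHP 7.4 or later:

```php
<?php
// A line whose first field contains the delimiter inside quotes.
$line = '"name,value",other name,other value';

// A plain explode() also splits inside the quoted field.
$naive = explode(',', $line);

// str_getcsv() honours the quotes and keeps "name,value" together.
$parsed = str_getcsv($line, ',', '"', '');
```

With this input, explode() produces four pieces while str_getcsv() produces the three intended fields.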
You should start by finding the right delimiter, then explode the CSV on this delimiter.
To find the delimiter, basically, you want a function that counts the number of , and the number of ; and that returns the greater.
Something like:
$array = explode(find_delimiter($csv), $csv);
Hope it helps ;)
Edit: Your find_delimiter function could be something like:
function find_delimiter($csv)
{
$arrDelimiters = array(',', '.', ';');
$arrResults = array();
foreach ($arrDelimiters as $delimiter)
{
$arrResults[$delimiter] = count(explode($delimiter, $csv));
}
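// arsort() sorts in place by count (descending) and keeps the delimiter keys;
// rsort() would discard the keys and return a boolean
arsort($arrResults);
return array_keys($arrResults)[0];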
}
Well, it looks like you know that your delimiter will be either "," or ";". This is a good place to start. You could try replacing all commas (,) with semicolons (;) and then exploding on the semicolon only. However, this approach will definitely cause problems in some cases, because some lines of your CSV files could look like this:
"name,value",other name,other value,last name;last value
Here the delimiter is the comma, and the line has four columns. However, after changing commas to semicolons you would get five columns, which is incorrect. So replacing one delimiter with another is not a good approach.
Still, if your CSV file is correctly formatted, you can find the correct delimiter in any of its lines. You could write a function like find_delimiter($csvLine), as proposed by @johnkork, but the problem is that the function itself can't know which delimiter to search for. Since you know all the possible delimiters, you could instead write a similar function, delimiter_exists($csvLine, $delimiter), which returns true or false.
But even delimiter_exists($csvLine, $delimiter) is not enough. Why? Because for the example CSV line above, both "," and ";" would be reported as existing delimiters. With the comma it would be a CSV file with four columns, and with the semicolon it would have two columns.
Thus, there is no universal way to get exactly what you want. However, there is one more thing you can check: the first line of the CSV file, which is the header (assuming your CSV files have one). Headers usually, though not necessarily, contain nothing but the alphanumeric column names, separated by the delimiter. So you could apply delimiter_exists($csvHeader, $delimiter) to the header, with an implementation like this:
function delimiter_exists($csvHeader, $delimiter) {
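// preg_quote() keeps delimiters such as "." or "|" from being read as regex syntax
return (bool)preg_match('/' . preg_quote($delimiter, '/') . '/', $csvHeader);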
}
For your specific case you may use it like this:
$csvHeader = "abc;def";
$delimiter = delimiter_exists($csvHeader, ',') ? ',' : ';';
Hope this helps!
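The header-based idea can be taken one step further by parsing the header once per candidate and keeping the delimiter that yields the most columns. This is only a sketch of that heuristic: the function name guessDelimiterFromHeader and the candidate list are assumptions, and the empty escape argument assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: pick the candidate delimiter that splits the header
// line into the most columns, or null if none splits it at all.
function guessDelimiterFromHeader(string $headerLine, array $candidates = [',', ';', "\t"]): ?string
{
    $best = null;
    $bestCount = 1; // a real delimiter must produce at least two columns
    foreach ($candidates as $candidate) {
        $columns = count(str_getcsv($headerLine, $candidate, '"', ''));
        if ($columns > $bestCount) {
            $bestCount = $columns;
            $best = $candidate;
        }
    }
    return $best;
}

$delimiter = guessDelimiterFromHeader('abc;def;ghi');
```

Like every approach in this thread it remains a guess, so a null result is best treated as "unknown format" rather than silently falling back to a default.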

How to prevent character conversion to integer in data export in PHP

I have a link on a page that, when clicked, exports an array of data to csv using fputcsv. When Excel displays the data, there is a column that looks like an integer, but it's not, and Excel is converting it to scientific notation. How do I export the data so that this column is displayed as characters (not a scientific number) ?
The code I'm using for export is from Alain Tiemblo's answer here:
Link to Code
function array2csv(array &$array)
{
if (count($array) == 0) {
return null;
}
ob_start();
$df = fopen("php://output", 'w');
fputcsv($df, array_keys(reset($array)));
foreach ($array as $row) {
fputcsv($df, $row);
}
fclose($df);
return ob_get_clean();
}
Not sure about Excel, but LibreOffice and OpenOffice will import fields as strings if the CSV field is quoted. For example, you want your CSV to be something like:
foo,bar,"12345",baz
(You may also have to check "Quoted field as text" option in the file open dialog.)
Edit: PHP's fputcsv() function will only use quote wrappers if it needs to, so you'll likely have to manually force quotes around the actual field value yourself:
$field = 12345;
$quoted_field = '"' . $field . '"';
Edit 2: If you don't need to worry about escaping, this might work for you instead of fputcsv():
fwrite($fp, '"' . implode('","', $fields) . '"' . "\n"); // separator first: the reversed argument order was removed in PHP 8
Try to force your int into a string before your fputcsv.
For example
$foo = "$foo";
http://php.net/manual/en/language.types.type-juggling.php
But then again Excel might make up its own mind when converting your CSV to an Excel format...
Also this question might help: Excel CSV - Number cell format
You could cast the value to string using strval.
Converting the integer to a string with a (string) cast, strval(), enclosing the value in double or single quotes, or even concatenating a space to the variable does not work, because CSV doesn't hold field type information.
The only way I found is to add some character or symbol to force it to be treated as a string, but that will show in the output too.
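The point about CSV carrying no type information can be checked with a small sketch. The helper name csvLine is hypothetical, and the empty escape argument assumes PHP 7.4 or later:

```php
<?php
// Hypothetical helper: return the exact line fputcsv() writes for one row.
function csvLine(array $fields): string
{
    $fp = fopen('php://memory', 'w+');
    fputcsv($fp, $fields, ',', '"', '');
    rewind($fp);
    $line = stream_get_contents($fp);
    fclose($fp);
    return $line;
}

// The same row, once with an integer and once with a string.
$asInt    = csvLine(['foo', 12345, 'baz']);
$asString = csvLine(['foo', '12345', 'baz']);
```

Both calls produce the same bytes, so whatever reads the file has no way to tell which PHP type the value had.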

Leading 0s are truncated when writing to a CSV using a PHP function

I am trying to write data to a CSV file using PHP's fputcsv function,
but the leading 0s are truncated when the data is written to the CSV.
$data = array ( 'aaa,bbb,ccc,dddd', '000123,456,789','"aaa","bbb"');
$fp = fopen('data.csv', 'w');
foreach($data as $line)
{
$val = explode(",",$line);
fputcsv($fp, $val);
}
fclose($fp);
If you open the CSV in Excel or OpenOffice, the spreadsheet program will truncate the leading zeros.
To avoid this, construct the string with a "\t" before the leading zero.
I think Excel has treated it as a number and omitted the 0.
You may try to do this:
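fputcsv($fp, array_map(function ($v) { return '="' . $v . '"'; }, $val)); // fputcsv() needs an array; Excel reads ="000123" as text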
See if it works
You could find that it's Excel, etc. that's eating the 0 characters. You can test this by opening the CSV file in Notepad (or whatever your favourite text editor is) and seeing if they're there.
If that's not the case then try using the following line:
fputcsv($fp, array_map('strval', $val)); // fputcsv() expects an array, so cast each field
Just in case the variable is somehow being cast to an integer somewhere.
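Instead of opening the file in a text editor, the same check can be done from PHP by reading the raw bytes back. This is a minimal sketch that uses a temporary stream in place of data.csv and assumes PHP 7.4 or later for the empty escape argument:

```php
<?php
// Write one of the rows from the question to a temporary stream.
$fp = fopen('php://temp', 'w+');
fputcsv($fp, explode(',', '000123,456,789'), ',', '"', '');

// Read the raw bytes back, exactly as a text editor would show them.
rewind($fp);
$raw = stream_get_contents($fp);
fclose($fp);

echo $raw;
```

If the zeros are present in $raw, PHP wrote them correctly and the truncation happens in the spreadsheet program.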

PHP file generating CSV, how to correctly output text

Currently I am working on a PHP script to output a CSV file from entries in a MySQL database. My problem lies in how to correctly output the values. Many of the entries in the MySQL database will contain commas and quotes, which destroy the format of the CSV file if I just plainly print them out to the file.
I'm aware that I can surround the text in quotes, but the entries that contain quotes would mess up the format of the file.
My question is, what can I do to keep this from happening?
Also, do new lines affect the interpretation of the file?
In addition, I'd rather not use the fputcsv function in PHP. I'm attempting to make the PHP script output the contents of the file (with appropriate headers) rather than write to a new file.
Thanks in advance!
Regards,
celestialorb
I think your dismissal of fputcsv might have been premature. In the comments of the fputcsv manual there's an example where they use fputcsv to output to the browser instead of a file.
http://php.net/manual/en/function.fputcsv.php
Here is that code, plus some headers to show that it does indeed prompt the user to download a csv file.
$mydata = array(
array('data11', 'data12', 'data13'),
array('data21', 'data22', 'data23'),
array('data31', 'data32', 'data23'));
header("Content-type: application/octet-stream");
header("Content-Disposition: attachment; filename=\"my-data.csv\"");
outputCSV($mydata);
function outputCSV($data) {
    $outstream = fopen("php://output", 'w');
    // A plain loop avoids declaring a nested named function, which would
    // cause a "cannot redeclare" error if outputCSV() were called twice
    foreach ($data as $vals) {
        fputcsv($outstream, $vals, ';', '"');
    }
    fclose($outstream);
}
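The process is called escaping, and many parsers (including PHP's fgetcsv) accept the backslash as an escape character, although spreadsheet applications generally expect embedded quotes to be doubled instead: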
"This string contains literal \"quotes\" denoted by backslashes"
You can escape characters in a string with addcslashes:
// escape double-quotes
$string = addcslashes('this string contains "quotes"', '"');
echo $string; // 'this string contains \"quotes\"'
Given an array of data you want to separate by commas, you can do the following:
// Escape all double-quotes
foreach ($data as $key => $value)
$data[$key] = addcslashes($value, '"');
// Wrap each element in double quotes
echo '"' . implode('", "', $data), '"';
I have found tab separated value files to be helpful in these situations. TSV is less susceptible to the issues with commas etc in your data.
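fputcsv() can produce tab-separated output as well if a tab is passed as the delimiter. A small sketch, with made-up sample fields and assuming PHP 7.4 or later for the empty escape argument:

```php
<?php
// Fields containing commas and semicolons, but no tabs.
$fields = ['a,b', 'c;d', 'plain'];

$fp = fopen('php://memory', 'w+');
// Third argument: use a tab instead of a comma as the delimiter.
fputcsv($fp, $fields, "\t", '"', '');
rewind($fp);
$tsvLine = stream_get_contents($fp);
fclose($fp);
```

Commas and semicolons no longer need any quoting here, although fields that contain tabs, quotes, spaces or line breaks are still enclosed by fputcsv().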

Forcing fputcsv to Use Enclosure For *all* Fields

When I use fputcsv to write out a line to an open file handle, PHP will add an enclosing character to any column that it believes needs it, but will leave other columns without the enclosures.
For example, you might end up with a line like this
11,"Bob ",Jenkins,"200 main st. USA ",etc
Short of appending a bogus space to the end of every field, is there any way to force fputcsv to always enclose columns with the enclosure (defaults to a ") character?
No, fputcsv() only encloses the field under the following conditions:
/* enclose a field that contains a delimiter, an enclosure character, or a newline */
if (FPUTCSV_FLD_CHK(delimiter) ||
FPUTCSV_FLD_CHK(enclosure) ||
FPUTCSV_FLD_CHK(escape_char) ||
FPUTCSV_FLD_CHK('\n') ||
FPUTCSV_FLD_CHK('\r') ||
FPUTCSV_FLD_CHK('\t') ||
FPUTCSV_FLD_CHK(' ')
)
There is no "always enclose" option.
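The effect of those conditions can be seen in a short sketch (sample values are made up; the empty escape argument assumes PHP 7.4 or later):

```php
<?php
$fp = fopen('php://memory', 'w+');
// Only the fields containing a space, the delimiter or a quote get enclosed.
fputcsv($fp, ['11', 'Bob ', 'Jenkins', 'a,b', 'say "hi"'], ',', '"', '');
rewind($fp);
$line = stream_get_contents($fp);
fclose($fp);

echo $line;
```

The fields 11 and Jenkins are written bare, while the other three are wrapped in double quotes, with the embedded quotes doubled.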
I'm not happy with this solution, but it is what I did and it worked. The idea is to set an empty character as the enclosure character for fputcsv and add quotes around every element of your array yourself.
function encodeFunc($value) {
return "\"$value\"";
}
fputcsv($handler, array_map('encodeFunc', $array), ',', chr(0)); // the callback name must be a quoted string
Building on Martin's answer, if you want to avoid inserting any characters that don't stem from the source array (Chr(127), Chr(0), etc), you can replace the fputcsv() line with the following instead:
fputs($fp, implode(",", array_map("encodeFunc", $row))."\r\n");
Granted, fputs() is slower than fputcsv(), but it's a cleaner output. The complete code is thus:
/**
 * @param string $value a single field value
 * @return string the value enclosed in double quotes every time
 */
function encodeFunc($value) {
///remove any ESCAPED double quotes within string.
$value = str_replace('\\"','"',$value);
//then force escape these same double quotes And Any UNESCAPED Ones.
$value = str_replace('"','\"',$value);
//force wrap value in quotes and return
return '"'.$value.'"';
}
$fp = fopen("filename.csv", 'w');
foreach($table as $row){
fputs($fp, implode(",", array_map("encodeFunc", $row))."\r\n");
}
fclose($fp);
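The encodeFunc above escapes quotes with backslashes. As an alternative sketch, assuming the consumer follows the doubled-quote convention, every field can be enclosed and embedded quotes doubled, which fgetcsv() reads back cleanly. The helper name writeQuotedRow is hypothetical, and the empty escape argument assumes PHP 7.4 or later:

```php
<?php
// Hypothetical helper: write one row with every field enclosed in quotes.
function writeQuotedRow($fp, array $row): void
{
    $quoted = array_map(function ($value) {
        // Double any embedded quote, then wrap the whole value.
        return '"' . str_replace('"', '""', (string) $value) . '"';
    }, $row);
    fwrite($fp, implode(',', $quoted) . "\r\n");
}

$fp = fopen('php://memory', 'w+');
writeQuotedRow($fp, ['11', 'Bob', 'said "hi"']);

// Inspect the raw output, then parse it again to confirm it round-trips.
rewind($fp);
$written = stream_get_contents($fp);
rewind($fp);
$readBack = fgetcsv($fp, 0, ',', '"', '');
fclose($fp);
```

This keeps the output free of extra characters such as chr(0) or chr(127) while still quoting every column.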
After a lot of faffing around and some somewhat tedious character checking, I have a version of the code referenced above by Diego and Mahn that correctly strips out the enclosures and replaces them with double quotes on all fields in fputcsv, and then outputs the file to the browser for download.
I also had a secondary issue of not being able to be sure that double quotes were always or never escaped.
This is specifically for outputting directly to the browser using the php://output stream as referenced by Diego. Chr(127) is the DEL control character, so the CSV file contains a few extra characters, but I believe this sidesteps the issue of chr(0) NULL characters in UTF-8.
/**
 * @param string $value a single field value
 * @return string the value enclosed in double quotes every time
 */
function encodeFunc($value) {
///remove any ESCAPED double quotes within string.
$value = str_replace('\\"','"',$value);
//then force escape these same double quotes And Any UNESCAPED Ones.
$value = str_replace('"','\"',$value);
//force wrap value in quotes and return
return '"'.$value.'"';
}
$result = $array_Set_Of_DataBase_Results;
$fp = fopen('php://output', 'w');
if ($fp && $result) {
header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="export-'.date("d-m-Y").'.csv"');
foreach($result as $row) {
fputcsv($fp, array_map("encodeFunc", $row), ',', chr(127));
}
unset($result,$row);
die;
}
I hope this is useful for someone.
A "quick and dirty" solution is to add ' ' at the end of all of your fields, if that's acceptable for you:
function addspace($v) {
return $v.' ';
}
fputcsv($handle, array_map('addspace', $fields));
PS: why does it work? See Volkerk's answer ;)
