UTF16-LE with BOM not recognizing sep in csv file - php

I need to generate a CSV through PHP in UTF-16LE to support Excel (on Windows and Mac OS X). As suggested here, I used mb_convert_encoding and added the BOM at the start of the file, followed by sep=; so that it opens properly in Excel.
header('Content-Type: application/csv; charset=UTF-16LE');
header('Content-Disposition: attachment; filename=export.csv');
$output = fopen('php://output', 'w');
fputs($output, mb_convert_encoding("\xEF\xBB\xBF" . "sep=;\n" . implode(";", $labels) . "\n", 'UTF-16LE', 'UTF-8'));
foreach($data as $data_line) {
fputs($output, mb_convert_encoding(implode(";", $data_line) . "\n", 'UTF-16LE', 'UTF-8'));
}
The character encoding is OK, but when I try to open it in OpenOffice, here is what I get:
The sep=;\n line isn't recognized; it shouldn't show up as the first line. I don't think it's a BOM issue, because when I open the file with a hex editor this is what I get:
The BOM seems to be correct, as it's ÿþ (bytes FF FE), which is the UTF-16LE BOM. I tried \r\n in place of \n after sep, with no luck.

I can't be sure if this is the cause of your problems, but an obvious issue I see is that you haven't encoded the sep=;\n string as UTF-16LE.
To fix this, change your first fputs() line to:
$bom = "\xEF\xBB\xBF";
$header = $bom . "sep=;\n" . implode(";", $labels) . "\n";
fputs($output, mb_convert_encoding($header, 'UTF-16LE', 'UTF-8'));
(The string \xEF\xBB\xBF is the Unicode Byte Order Mark in the UTF-8 encoding; it will yield the correct BOM when converted to UTF-16.)
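One way to check this without opening a spreadsheet program is to build the whole payload in UTF-8, convert it once, and look at the first bytes. The following is only a minimal sketch: the helper name toUtf16LeCsv and the sample rows are made up for illustration, and it assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper: builds a UTF-16LE CSV payload with a BOM and a sep= line.
function toUtf16LeCsv(array $labels, array $data, string $sep = ';'): string
{
    // U+FEFF encoded as UTF-8; it becomes FF FE after conversion to UTF-16LE.
    $utf8 = "\xEF\xBB\xBF" . "sep=" . $sep . "\n";
    $utf8 .= implode($sep, $labels) . "\n";
    foreach ($data as $line) {
        $utf8 .= implode($sep, $line) . "\n";
    }
    // Convert everything in one call so the BOM, sep= line and rows share one encoding.
    return mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');
}

// Sample data (made up): "Niño" and "Málaga" written as UTF-8 byte escapes.
$bytes = toUtf16LeCsv(['name', 'city'], [["Ni\xC3\xB1o", "M\xC3\xA1laga"]]);

// Hex dump of the first four bytes: BOM followed by the "s" of "sep".
$firstBytes = bin2hex(substr($bytes, 0, 4));
```

Building the string first and converting once avoids mixing encodings between the header and the data rows.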

Related

If I add sep="\t" to my CSV, it doesn't open as UTF-8 in Excel (PHP)

I want to export CSV files from PHP. I decided to use \t (tab) as the separator to make it unambiguous (using a comma as the separator is not a good choice because the default changes from country to country). But if I add sep="\t" to my CSV, Excel no longer recognizes the UTF-8 BOM, and when I open the CSV it shows a ? character in place of °, à, etc.
If I remove the string "sep=\t", the file is decoded perfectly as UTF-8, but \t isn't recognized as the separator.
How can I use both "sep=\t" and the UTF-8 BOM?
Here is my code:
header('Content-Encoding: UTF-8');
header('Content-type: text/csv; charset=UTF-8');
header('Content-Disposition: attachment; filename="export.csv"');
/* UTF-8 BOM (Byte Order Mark)
\x = hexadecimal character escape */
echo "\xEF\xBB\xBF";
$f = fopen('php://output', 'w');
fwrite($f, "sep=\t");
foreach ($array_to_csv as $line) {
fputcsv($f, $line,"\t");
}

PHP: Export to CSV with special characters

I am trying to export some data stored in a table, but when I export to CSV the letter č shows up as Ä or &#269.
I tried everything (utf8_decode, utf8_encode, html_entity_decode), but nothing works. What can I do?
Thanks,
Leandro.
Additional information: I am now testing the following directly:
$delimiter = ";";
$enclosure = '"';
header("Content-Disposition: attachment; filename=memorandos.csv");
header("Pragma: no-cache");
header("Expires: 0");
$output = fopen('php://output', 'w');
$header = array('Apellido');
fputcsv($output, $header, $delimiter, $enclosure);
$memorando = Memorando::getById(3263);
if ($memorando){
$dd = array ();
$dd[] = $memorando->apellido; // In the database it is stored as Jurič
fputcsv($output, $dd, $delimiter, $enclosure);
}
In the file I see Juri&#269; instead of Jurič.
There are many angles and approaches for dealing with this issue; you can even try ISO-8859-1.
Let's say you have
$input = "Fóø Bår Zacarías ?S?B?D Ferreíra"; // original text
Use iconv to transliterate the special characters to ASCII
$output = iconv("utf-8", "ascii//TRANSLIT//IGNORE", $input);
Then use a regexp to remove whatever is left that is not a letter, digit, whitespace or hyphen
$output = preg_replace("/^'|[^A-Za-z0-9\s-]|'$/", '', $output);
Results in: Foo Bar Zacarias ASABAD Ferreira
echo $output;
And where is your code? Can you share it?

Exporting array to CSV in PHP with special characters

I'm exporting a PHP array to CSV.
The problem is that all special characters are garbled (e.g. á é ñ), and the start of the file displays ï»¿field1 instead of field1.
The issue seems to be caused by Content-disposition: attachment; if I comment out that line, the file is created without any issues (sadly, it is then downloaded with a FILENAME.php extension).
# CSV headers
header('Cache-Control: public');
header('Content-Type: application/octet-stream');
header('Content-type: application/csv; charset=utf-8');
header('Content-disposition: attachment; filename='.date('Y-m-d H\hi').'.csv');
# Columns
$o = 'field1,field2,field3,field4';
$o .= "\n";
# Data
$rows = array();
foreach($data as $item) {
$fields = array();
foreach($item as $field) {
$fields[] = $field;
}
$rows[] = implode(', ', $fields);
}
$o .= implode("\n", $rows);
echo $o;
Any ideas? Thanks!
As commented by Dagon, ï»¿ is the BOM and may be causing problems when the file is read (especially if that is done in CMD on Windows). Remove the BOM from your script file.
As for the special characters, you may need to convert them, especially if your source isn't UTF-8.
I had a similar problem once, and the solution for me was to make sure the input was being read correctly and to convert it before outputting.
For converting the characters, I did something like this:
mb_convert_case($result['products_name'], MB_CASE_UPPER, "UTF-8");
To make sure I was working with UTF-8, I issued
$connection->set_charset("utf8");
when connecting to my database.
Take care.
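The two checks described above (spotting the BOM bytes and converting input that isn't UTF-8) could be sketched roughly like this. Both function names are made up for illustration, and the fallback source encoding ISO-8859-1 is only an assumption; use whatever your data source really delivers.

```php
<?php
// Hypothetical helper: removes a leading UTF-8 BOM (bytes EF BB BF) if present.
function stripUtf8Bom(string $text): string
{
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        return substr($text, 3);
    }
    return $text;
}

// Hypothetical helper: converts to UTF-8 only when the input is not valid UTF-8 already.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1 (may not hold for your data).
function ensureUtf8(string $text, string $assumedSource = 'ISO-8859-1'): string
{
    return mb_check_encoding($text, 'UTF-8')
        ? $text
        : mb_convert_encoding($text, 'UTF-8', $assumedSource);
}

$clean = stripUtf8Bom("\xEF\xBB\xBFfield1"); // BOM removed, "field1" remains
$utf8  = ensureUtf8("\xE1");                 // Latin-1 byte for "á", converted to UTF-8
```

Checking before converting avoids double-encoding values that are already UTF-8.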

Export CSV for Excel

I'm writing a CSV file in PHP using fputcsv($file, $data).
It all works, however I can't just open it in Excel but have to import it and specify the encoding and which delimiter to use (in a wizard).
I've seen exports from other websites that open correctly just by clicking on them and now would like to know what I should do to my file to achieve that.
I tried using this library: http://code.google.com/p/parsecsv-for-php/
But I couldn't even get it to run and am not really confident if it would really help me...
This is how I make Excel-readable CSV files from PHP:
Add BOM to fix UTF-8 in Excel
Set semicolon (;) as delimiter
Set correct header ("Content-Type: text/csv; charset=utf-8")
For example:
$headers = array('Lastname :', 'Firstname :');
$rows = array(
array('Doe', 'John'),
array('Schlüter', 'Rudy'),
array('Alvarez', 'Niño')
);
// Create file and make it writable
$file = fopen('file.csv', 'w');
// Add BOM to fix UTF-8 in Excel
fputs($file, $bom = (chr(0xEF) . chr(0xBB) . chr(0xBF)));
// Headers
// Set ";" as delimiter
fputcsv($file, $headers, ";");
// Rows
// Set ";" as delimiter
foreach ($rows as $row) {
fputcsv($file, $row, ";");
}
// Close file
fclose($file);
// Send file to browser for download
$dest_file = 'file.csv';
$file_size = filesize($dest_file);
header("Content-Type: text/csv; charset=utf-8");
header("Content-disposition: attachment; filename=\"file.csv\"");
header("Content-Length: " . $file_size);
readfile($dest_file);
Works with Excel 2013.
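If you would rather not write a file to disk first, the same steps can be sketched with an in-memory stream. This is only an illustration under assumptions: the function name buildExcelCsv is made up, and the enclosure and escape arguments are passed explicitly so the call does not rely on fputcsv defaults.

```php
<?php
// Hypothetical helper: returns CSV bytes with a UTF-8 BOM and ";" as the default delimiter.
function buildExcelCsv(array $headers, array $rows, string $delimiter = ';'): string
{
    $stream = fopen('php://temp', 'w+');
    // Add BOM to fix UTF-8 in Excel.
    fwrite($stream, "\xEF\xBB\xBF");
    fputcsv($stream, $headers, $delimiter, '"', '\\');
    foreach ($rows as $row) {
        fputcsv($stream, $row, $delimiter, '"', '\\');
    }
    // Read back everything that was written to the temporary stream.
    rewind($stream);
    $csv = stream_get_contents($stream);
    fclose($stream);
    return $csv;
}

// Sample rows; "Schlüter" is written as UTF-8 byte escapes.
$csv = buildExcelCsv(
    ['Lastname', 'Firstname'],
    [['Doe', 'John'], ["Schl\xC3\xBCter", 'Rudy']]
);

// In a download script the headers would be sent before echoing $csv, e.g.:
// header('Content-Type: text/csv; charset=utf-8');
// header('Content-Length: ' . strlen($csv));
// echo $csv;
```

Having the full string in memory also makes the Content-Length straightforward, since strlen($csv) is the byte count.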
This is really a mess. You can certainly use sep=; or sep=, or sep=\t or whatever to make Excel aware of the separator used in your CSV. Just put this string at the beginning of your CSV contents, e.g.:
fwrite($handle, "sep=,\n");
fputcsv($handle,$yourcsvcontent);
This works smoothly. BUT it doesn't work in combination with a BOM, which is required to make Excel aware of UTF-8 in case you need to support special or multibyte characters.
In the end, to make it bulletproof you need to read the user's locale and set the separator accordingly, as mentioned above.
Put a BOM ("\xEF\xBB\xBF") at the beginning of your CSV content, then write the CSV with e.g.: fputcsv($handle, $fields, $user_locale_seperator);
where $user_locale_seperator is the separator you retrieved by checking the user's locale.
Not comfortable, but it works...
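How you obtain the locale is up to your application. As a rough sketch only, a lookup like the one below could produce $user_locale_seperator. The function name and the language list are illustrative assumptions, not an authoritative table, and a locale taken from the browser (for example from $_SERVER['HTTP_ACCEPT_LANGUAGE']) need not match the list separator configured in the user's operating system.

```php
<?php
// Hypothetical helper: guesses the CSV delimiter from a locale string such as "de_DE" or "en-US".
// Assumption: the languages listed here commonly use ";" as list separator in Excel.
// The list is illustrative, not exhaustive.
function delimiterForLocale(string $locale): string
{
    $semicolonLanguages = ['de', 'fr', 'it', 'es', 'nl', 'pt'];
    // Only the two-letter language prefix is inspected.
    $language = strtolower(substr($locale, 0, 2));
    return in_array($language, $semicolonLanguages, true) ? ';' : ',';
}

$user_locale_seperator = delimiterForLocale('de_DE');
```

Offering the user an explicit choice of separator is a reasonable fallback when the guess is wrong.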
Despite the "C=comma" in CSV, Excel uses your locale's native separator. Since fputcsv uses a comma by default, it won't work if your locale's separator is, for example, a semicolon.
What Google AdSense does, when you click "Export to Excel CSV", is that it uses Tab as a separator. And that works.
To replicate that, set the third parameter (delimiter) of fputcsv to override the default comma. E.g. for Tab use: fputcsv($handle, $fields, "\t");
Compare the format of the CSV that works for you against the one generated by fputcsv.
Consider including example of both in your question. You might get better answers.
You may have an encoding issue.
Try this post:
http://onwebdev.blogspot.com.es/2010/10/php-encoding-of-csv-file-for-excel.html
I notice that you need to consider:
Content-Type header
BOM (Byte Order Mark)
Actual character encoding in the file
With BOM (works):
$bom = pack("CCC", 0xEF, 0xBB, 0xBF);
header('Content-Type: text/csv');
header('Content-Length: '.(strlen($csv)+strlen($bom)));
header('Content-Disposition: attachment;filename=my.csv');
echo $bom;
echo $csv;
Without BOM (works but you need to replace “smart quotes” then run utf8_decode on each value or cell, and it converts some characters, for example FRĒ is converted to FRE')
header('Content-Type: application/csv;charset=utf-8');
header('Content-Length: '.strlen($csv));
header('Content-Disposition: attachment;filename=my.csv');
echo $csv;
If the wrong combination of charset and BOM is used, it just comes out wrong when opening in MS Excel.
Bonus fact: mb_strlen tells you the number of characters, strlen tells you the number of bytes. You do NOT want to use mb_strlen for calculating the Content-Length header.
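A small sketch of the difference, using made-up sample data written as UTF-8 byte escapes and assuming the mbstring extension is loaded:

```php
<?php
$bom = "\xEF\xBB\xBF";                    // UTF-8 BOM: 3 bytes, 1 character
$csv = "Ni\xC3\xB1o;Schl\xC3\xBCter\n";   // "Niño;Schlüter\n" as UTF-8 bytes

// strlen counts bytes, which is what Content-Length expects.
$byteCount = strlen($bom . $csv);

// mb_strlen counts characters; multibyte characters make this number smaller.
$charCount = mb_strlen($bom . $csv, 'UTF-8');

// header('Content-Length: ' . $byteCount);
```

Whenever the payload contains multibyte characters, the character count is lower than the byte count, so a Content-Length based on it would understate the size of the response.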
Bonus 2: replace microsoft "smart" characters (em dash, curly quotes, etc):
$map = array(chr(145) => "'"
,chr(146) => "'"
,chr(147) => '"'
,chr(148) => '"'
,chr(149) => '-'
,chr(150) => '-'
,chr(151) => '-'
,chr(152) => '-'
,chr(171) => '-'
,chr(187) => '-'
);
// faster than strtr
return str_replace( array_keys($map), $map, $str );

Writing a CSV file for Mac users with PHP

I use a generic algorithm to write CSV files that in theory works for the major OSes. However, the client started to use Mac a few weeks ago, and they keep telling me the CSV file cannot be read in Microsoft Excel 2008 for Mac 12.2.1.
They have their OS configured to use semicolon (;) as the list separator, which is exactly what I write in the CSV. They also say that when they open the file in notepad there are no line breaks and everything is displayed on a single line, which is why Excel cannot read the file properly. But in my code I am using the cross-platform line break \r\n.
This is the full code I use:
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
// Output to browser with appropriate mime type, you choose ;)
header("Content-type: text/x-csv");
//header("Content-type: text/csv");
//header("Content-type: application/csv");
header("Content-Disposition: attachment; filename=participantes.csv");
$separator = ";";
$rs = $sql->doquery("SELECT A QUERY TO RETRIEVE DATA FROM THE DB");
$header = "";
$num_fields = mysql_num_fields($rs);
for($i=0; $i<$num_fields; $i++){
$field = mysql_field_name($rs, $i);
$header .= $field.$separator;
}
echo $header."\r\n";
while($row = $sql->fetch($rs)){
$str = "";
for($i=0; $i<$num_fields; $i++){
$field = mysql_field_name($rs, $i);
$value = str_replace(";", ",", $row->{$field});
$value = str_replace("\n", ",", $value);
$value = str_replace("\d", ",", $value);
$value = str_replace(chr(13), ",", $value);
$str .= $value.$separator;
}
echo $str."\r\n";
}
Is there anything I can do so Mac users can read the file properly?
For debugging purposes:
Create a CSV file and send it to them by mail. Can they open it OK?
Have them download the file from your page and have it sent back to you. Compare the files in a Hex-editor to rule out the off-chance that they look differently from what you send to the browser or from what you have saved.
Have them double-check their Excel-settings.
Have them create a working CSV file from scratch (text editor on a mac) and spot any differences from your approach.
Here's some code I did to convert a tab delimited data into CSV, and it comes in fine on my mac. Note that I have it set up to make me click to download, rather than pushing it to the browser. It's not a great solution (I'm pretty sure the code is crap) but it works for what I need it for.
$input = $_POST['input'];
//Remove commas.
$replacedinput1 = str_replace(",", "-", $input);
//remove tabs, replace with commas
$replacedinput2 = str_replace("\t", ",", $replacedinput1);
//create an array
$explodedinput = explode("\n", $replacedinput2);
//open the CSV file to write to; delete other text in it
$opencsvfile = fopen("/var/www/main/tools/replace_tab.csv","w+");
//for each line in the array, write it to the file
foreach($explodedinput as $line) {
// split() was removed in PHP 7; explode() does the same job for a plain comma
fputcsv($opencsvfile, explode(',', $line));
}
//close the file
fclose($opencsvfile);
//have the user download the file.
/*
header('Pragma: public');
header('Expires: Fri, 01 Jan 2010 00:00:00 GMT');
header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
header('Content-Type: application/csv');
header('Content-Disposition: filename=replace_tab.csv'); */
//Or not, since I can't get it to work.
echo "<a href='replace_tab.csv'>Download CSV File, then open in Numbers or Excel (Note, you may need to right click, save as.)</a>.";
The newline character on Mac OS X is LF (Unicode U+000A), and on Windows it is CR + LF (Unicode U+000D followed by U+000A).
Maybe that's why the CSV is unreadable on a Mac.
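To experiment with different line endings, the output could be normalized in one place before it is sent. This is a minimal sketch: the function name is made up, and which line ending a given Excel version on the Mac actually accepts is something you would have to test.

```php
<?php
// Hypothetical helper: rewrites every line break (CRLF, lone CR, lone LF) to one chosen sequence.
function normalizeLineEndings(string $text, string $eol = "\r\n"): string
{
    // "\r\n" is listed first so a CRLF pair is treated as a single line break.
    return preg_replace('/\r\n|\r|\n/', $eol, $text);
}

$mixed   = "a;b\nc;d\r\ne;f\rg;h";
$windows = normalizeLineEndings($mixed);        // every break becomes CR + LF
$unix    = normalizeLineEndings($mixed, "\n");  // every break becomes LF
```

Data values that themselves contain line breaks would be rewritten as well, so this is best applied to values before they are joined into rows, or to files where fields never contain line breaks.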
