Convert html entities to unicode characters via php mysql csv export - php

In my MySQL database this is a sample of an HTML entity that I have:
&Uacute;
When I export it through my script this is what I get:
ú
As you can see in my script I already have 'html_entity_decode' which should convert it appropriately to this (which is what I want):
Ú
Obviously, I am doing something wrong. I have tried various other scripts and solutions, and have been trying to resolve this issue for over a day. Here is my PHP code:
$link = mysqli_connect("localhost", "user", "pass", "db");
$sql="SELECT * FROM wtf";
$result=mysqli_query($link,$sql);
if (!$result) die('Couldn\'t fetch records');
$fp = fopen('php://output', 'w');
if ($fp && $result) {
header("Content-type: application/vnd.ms-excel");
header("Content-Encoding: UTF-8");
header('Content-Disposition: attachment; filename="results.csv"');
header('Pragma: no-cache');
header('Expires: 0');
fputcsv($fp, array('Nome'));
while ($row = $result->fetch_array(MYSQLI_NUM)) {
fputcsv($fp, array_map('html_entity_decode',array_values($row)), ',', '"');
}
die;
}
mysqli_close($link);
exit;
Could someone please help or at least point me in the right direction? Having taken on a project that requires European characters in the CSV results, it has been nothing less than a nightmare...

It sounds like you're probably using a newer version of PHP (5.4 or later), where html_entity_decode() defaults to "UTF-8" when no charset is passed. Maybe try something like this:
Instead of this:
fputcsv($fp, array_map('html_entity_decode',array_values($row)), ',', '"');
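Try this, which passes the flags and the charset to html_entity_decode() for each field (html_entity_decode() expects a string, so it has to be applied per field rather than to the whole array):
fputcsv($fp, array_map(function ($value) { return html_entity_decode($value, ENT_COMPAT, 'ISO-8859-1'); }, array_values($row)), ',', '"');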

The problem is caused by Excel misinterpreting the character encoding of your output.
An output like ú indicates that a multi-byte character is being interpreted as two separate single-byte characters. If you echo that same string instead of writing it as CSV, it renders correctly, which means the problem is not in the string as stored by PHP.
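The misinterpretation can be reproduced in a few lines. This is only an illustrative sketch, assuming the mbstring extension is available; the variable names are made up for the example. It takes the two UTF-8 bytes of ú and re-reads each byte as an ISO-8859-1 character, which is roughly what a consumer does when it guesses the wrong encoding:

```php
<?php
// "ú" (U+00FA) encoded as UTF-8 is two bytes: 0xC3 0xBA.
$utf8 = "\u{00FA}";
$bytes = bin2hex($utf8);

// Treat every byte as an ISO-8859-1 character and re-encode as UTF-8.
// 0xC3 becomes "Ã" and 0xBA becomes "º", so the text displays as "Ãº".
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo $bytes, ' ', $misread, PHP_EOL;
```

Seeing a pair like this in Excel suggests the bytes in the file are fine and only the declared or guessed encoding is off.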
The header Content-Encoding: UTF-8 does not find its way to Excel, so in order to make Excel aware of the UTF-8 encoding, output a Byte Order Mark at the start of the output:
$fp = fopen('php://output', 'w');
if ($fp && $result) {
header("Content-type: application/vnd.ms-excel");
header("Content-Encoding: UTF-8");
header('Content-Disposition: attachment; filename="results.csv"');
header('Pragma: no-cache');
header('Expires: 0');
fwrite($fp, "\xEF\xBB\xBF"); // <--- add this
Secondly, things tend to also work better when you use a TAB character as separator instead of a comma, as in Europe some regional settings define the semi-colon as the separator (the comma being taken as decimal separator), and this will make all columns collapse into one. So write:
fputcsv($fp, array('Nome'), "\t", '"');
while ($row = $result->fetch_array(MYSQLI_NUM)) {
fputcsv($fp, array_map('html_entity_decode',array_values($row)), "\t", '"');
}
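Putting both suggestions together, a minimal sketch could look like the following. It assumes the data is already UTF-8 once the entities are decoded; build_excel_csv() and the sample rows are hypothetical names and values used only for illustration, and the output goes to php://temp instead of php://output so the result can be inspected. The escape character is passed explicitly rather than relying on its default.

```php
<?php
// Hypothetical helper: returns the CSV body as a string, starting with a UTF-8 BOM.
function build_excel_csv(array $header, array $rows): string
{
    $fp = fopen('php://temp', 'w+');

    // Byte Order Mark so that Excel treats the file as UTF-8.
    fwrite($fp, "\xEF\xBB\xBF");

    // TAB as separator, double quote as enclosure, backslash as escape character.
    fputcsv($fp, $header, "\t", '"', '\\');

    foreach ($rows as $row) {
        // Decode HTML entities in every field into UTF-8 characters.
        $decoded = array_map(function ($value) {
            return html_entity_decode((string) $value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
        }, $row);
        fputcsv($fp, $decoded, "\t", '"', '\\');
    }

    rewind($fp);
    $csv = stream_get_contents($fp);
    fclose($fp);

    return $csv;
}

// Made-up sample data standing in for the database rows.
$csv = build_excel_csv(['Nome'], [['&Uacute;ltimo'], ['Jos&eacute;']]);
echo $csv;
```

In the real export script the same calls would write to php://output after the headers have been sent.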

Related

PHP: Export to CSV with special characters

I am trying to export some data that is stored in a table, but when I export it to CSV the letter č shows up as Ä or &#269.
I tried everything (utf8_decode, utf8_encode, html_entity_decode), but nothing works. What can I do?
Thanks,
Leandro.
Additional information: I am now testing the following directly:
$delimiter = ";";
$enclosure = '"';
header("Content-Disposition: attachment; filename=memorandos.csv");
header("Pragma: no-cache");
header("Expires: 0");
$output = fopen('php://output', 'w');
$header = array('Apellido');
fputcsv($output, $header, $delimiter, $enclosure);
$memorando = Memorando::getById(3263);
if ($memorando){
$dd = array ();
$dd[] = $memorando->apellido; // IN THE DATABASE IT IS STORED LIKE Jurič
fputcsv($output, $dd, $delimiter, $enclosure);
}
In the file I see Juri&#269; instead of Jurič
There are many ways to approach this issue; you can even try ISO-8859-1.
Let's say you have
$input = "Fóø Bår Zacarías ?S?B?D Ferreíra"; // original text
Use iconv to transliterate the special characters to ASCII
$output = iconv("utf-8", "ascii//TRANSLIT//IGNORE", $input);
Then use a regular expression to remove whatever special characters remain, keeping only letters, digits, whitespace and hyphens
$output = preg_replace("/^'|[^A-Za-z0-9\s-]|'$/", '', $output);
Results in: Foo Bar Zacarias ASABAD Ferreira
echo $output;
And where is your code? Can you share it?
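If the goal is to keep the č rather than strip it, decoding the numeric entity into UTF-8 is another option. A minimal sketch, assuming the field contains the entity exactly as it shows up in the exported file; the variable names are only for illustration:

```php
<?php
// Value as it appears in the exported file (assumed from the question).
$stored = 'Juri&#269;';

// Decode named and numeric entities into UTF-8 characters.
$decoded = html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $decoded, PHP_EOL;
```

The decoded value can then be passed to fputcsv() like any other field; whether Excel displays it correctly still depends on the file being read as UTF-8.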

Exporting array to CSV in PHP with special characters

I'm exporting a PHP array to CSV.
The problem is that all special characters are messed up (e.g. á é ñ), and the start of the file displays ï»¿field1 instead of field1.
The issue is happening because of Content-disposition: attachment;, if I comment that line, the file is created without any issues (sadly it is downloaded as FILENAME.php extension).
# CSV headers
header('Cache-Control: public');
header('Content-Type: application/octet-stream');
header('Content-type: application/csv; charset=utf-8');
header('Content-disposition: attachment; filename='.date('Y-m-d H\hi').'.csv');
# Columns
$o = 'field1,field2,field3,field4';
$o .= "\n";
# Data
$rows = array();
foreach($data as $item) {
$fields = array();
foreach($item as $field) {
$fields[] = $field;
}
$rows[] = implode(', ', $fields);
}
$o .= implode("\n", $rows);
echo $o;
Any ideas? Thanks!
As commented by Dagon, ï»¿ is the BOM and may be causing problems with the file being read (especially if it is done in CMD on Windows). Remove the BOM from your script file.
As for the special characters, you may need to convert them, especially if your source isn't UTF-8.
I had a similar problem once, and the solution for me was to make sure the input was being read correctly and to convert it before outputting.
For converting the characters, I did something like this:
mb_convert_case($result['products_name'], MB_CASE_UPPER, "UTF-8");
To make sure I was working with UTF-8, I issued
$connection->set_charset("utf8");
when connecting to my database.
Take care.
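For data that arrives with a BOM or in another encoding, both steps can be sketched as below. This is only an illustration: strip_bom() and to_utf8() are hypothetical helper names, the source encoding is an assumption you would have to confirm for your own data, and a BOM saved in the script file itself still has to be removed in the editor.

```php
<?php
// Hypothetical helper: drops a leading UTF-8 BOM (EF BB BF) if present.
function strip_bom(string $text): string
{
    $bom = "\xEF\xBB\xBF";
    return strncmp($text, $bom, 3) === 0 ? substr($text, 3) : $text;
}

// Hypothetical helper: converts text from an assumed source encoding to UTF-8.
function to_utf8(string $text, string $sourceEncoding): string
{
    return mb_convert_encoding($text, 'UTF-8', $sourceEncoding);
}

$header = strip_bom("\xEF\xBB\xBFfield1,field2");
$name = to_utf8("Espa\xF1a", 'ISO-8859-1'); // 0xF1 is ñ in ISO-8859-1

echo $header, PHP_EOL, $name, PHP_EOL;
```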

Escape special characters for fputcsv();

Is there an easy way to escape special characters, double quotes and multiple spaces for fputcsv()? In short, with a lot of regex I get data from a PDF and build my array. Inside the array there are no HTML tags, with the exception of the <br> tag. The problem is that I have to escape characters like these:
Èàèìòù"\' <br>
This is the messy output that i have now:
&lt;br&gt; instead of <br>
� instead of È, ò, à, ', "
Lots of ������� where there are multiple spaces, which I don't want
I'm using this code:
$fichier = 'file.csv';
header("Content-Type: text/csv;charset=UTF-8" );
header("Content-Disposition: attachment;filename=\"$fichier\"" );
header("Pragma: no-cache");
header("Expires: 0");
$fp= fopen('php://output', 'w');
foreach ($output as $fields)
{
fputcsv($fp, $fields);
}
fclose($fp);
exit();
I tried array_map and many other functions with no success. I'm sure it is a simple question of charset encoding, but it seems that the UTF-8 content type is not working at all.
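One thing that might be worth trying is to normalise every field before it reaches fputcsv(). The sketch below rests on assumptions that are not confirmed by the question: that the � characters come from bytes that are not valid UTF-8 (for example ISO-8859-1 text taken from the PDF), and that the runs of � are non-breaking spaces. clean_csv_field() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns a UTF-8 field with <br> tags and repeated spaces collapsed.
function clean_csv_field(string $value, string $sourceEncoding = 'ISO-8859-1'): string
{
    // Convert only when the bytes are not already valid UTF-8.
    if (!mb_check_encoding($value, 'UTF-8')) {
        $value = mb_convert_encoding($value, 'UTF-8', $sourceEncoding);
    }

    // Replace <br>, <br/> and <br /> with a single space.
    $value = preg_replace('/<br\s*\/?>/i', ' ', $value);

    // Collapse runs of whitespace and non-breaking spaces into one space.
    $value = preg_replace('/[\s\x{00A0}]+/u', ' ', $value);

    return trim($value);
}

// Made-up input: ISO-8859-1 bytes with a <br> tag and two non-breaking spaces.
$cleaned = clean_csv_field("\xC8  a<br>b\xA0\xA0c");
echo $cleaned, PHP_EOL;
```

It could be applied with array_map('clean_csv_field', $fields) inside the foreach loop. Double quotes do not need manual escaping, since fputcsv() encloses and doubles them itself.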

Comma Separated File, appending encrypted character instead of new line

I am creating a comma-separated file, but it appends a special character at the end of each line instead of starting the next line on a new line.
$inner_exported_header_array[] = 'ID';
$inner_exported_header_array[] = 'First Name';
$exported_customers_arr [] = $inner_exported_header_array;
foreach ($_list AS $row) {
$inner_exported_array = array();
$inner_exported_array[] = $row['id'];
$inner_exported_array[] = $row['v_first_name'];
$exported_customers_arr[] = $inner_exported_array;
}
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
header('Content-Description: File Transfer');
header("Content-Type: text/plain");
header("Content-Disposition: attachment; filename=test.txt");
header("Pragma: no-cache");
header("Expires: 0");
outputCSV($exported_customers_arr);
function outputCSV($data) {
$outstream = fopen("php://output", "w");
function __outputCSV(&$vals, $key, $filehandler) {
fputcsv($filehandler, $vals); // add parameters if you want
}
array_walk($data, "__outputCSV", $outstream);
fclose($outstream);
}
Output is showing:
"ID","First Name"[special character not copied here]1,Testname
But it should be:
"ID","First Name"
1,Testname
Any thoughts on how to get each record onto a new line, rather than having a special character inserted at the end of the line and the next record starting where the first one ends?
Summary from the comments:
The easiest solution may be to simply read the file in and replace every "\n" with "\r\n". The consumer of the file expects different line endings.
What are these characters? Seems like they're coming from your database, somehow? These "special" characters seem to contain newline (i.e. \n), which messes up your output.
I suggest checking the value of the last character of your problematic field using PHP's ord() function:
http://php.net/manual/en/function.ord.php
You could clean your CSV output using PHP's trim() function:
http://php.net/manual/en/function.trim.php
trim() also strips newlines, which should solve your issue if it is applied to each field, i.e. change this:
$inner_exported_array[] = $row['v_first_name'];
to:
$inner_exported_array[] = trim($row['v_first_name']);
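Both suggestions from the summary (trimming each field and switching to "\r\n" line endings) can be combined in a short sketch. The helper name csv_with_crlf() is hypothetical, and the sample rows are made up to mirror the question:

```php
<?php
// Hypothetical helper: writes rows as CSV and returns the text with CRLF line endings.
function csv_with_crlf(array $rows): string
{
    $fp = fopen('php://temp', 'w+');
    foreach ($rows as $row) {
        // trim() removes stray newlines and spaces around each field.
        fputcsv($fp, array_map('trim', $row), ',', '"', '\\');
    }
    rewind($fp);
    $csv = stream_get_contents($fp);
    fclose($fp);

    // Normalise to "\n" first so an existing "\r\n" is not doubled.
    return str_replace("\n", "\r\n", str_replace("\r\n", "\n", $csv));
}

$csv = csv_with_crlf([['ID', 'First Name'], [1, "Testname\n"]]);
echo $csv;
```

Note that a blanket replacement would also rewrite newlines inside quoted fields, which is why the fields are trimmed first.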

Writing a CSV file for Mac users with PHP

I use a generic algorithm to write CSV files that in theory works for the major OSes. However, the client started to use Mac a few weeks ago, and they keep telling me the CSV file cannot be read in Microsoft Excel 2008 for Mac 12.2.1.
They have their OS configured to use the semicolon (;) as list separator, which is exactly what I am writing in the CSV. They also say that when they open the file in notepad, there are no line breaks and everything is displayed on a single line, which is why Excel cannot read the file properly. But in my code I am using the cross-browser line break \r\n
This is the full code I use:
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
// Output to browser with appropriate mime type, you choose ;)
header("Content-type: text/x-csv");
//header("Content-type: text/csv");
//header("Content-type: application/csv");
header("Content-Disposition: attachment; filename=participantes.csv");
$separator = ";";
$rs = $sql->doquery("SELECT A QUERY TO RETRIEVE DATA FROM THE DB");
$header = "";
$num_fields = mysql_num_fields($rs);
for($i=0; $i<$num_fields; $i++){
$field = mysql_field_name($rs, $i);
$header .= $field.$separator;
}
echo $header."\r\n";
while($row = $sql->fetch($rs)){
$str = "";
for($i=0; $i<$num_fields; $i++){
$field = mysql_field_name($rs, $i);
$value = str_replace(";", ",", $row->{$field});
$value = str_replace("\n", ",", $value);
$value = str_replace("\d", ",", $value);
$value = str_replace(chr(13), ",", $value);
$str .= $value.$separator;
}
echo $str."\r\n";
}
Is there anything I can do so Mac users can read the file properly?
For debugging purposes:
Create a CSV file and send it to them by mail. Can they open it OK?
Have them download the file from your page and have it sent back to you. Compare the files in a Hex-editor to rule out the off-chance that they look differently from what you send to the browser or from what you have saved.
Have them double-check their Excel-settings.
Have them create a working CSV file from scratch (text editor on a mac) and spot any differences from your approach.
Here's some code I did to convert a tab delimited data into CSV, and it comes in fine on my mac. Note that I have it set up to make me click to download, rather than pushing it to the browser. It's not a great solution (I'm pretty sure the code is crap) but it works for what I need it for.
$input = $_POST['input'];
//Remove commas.
$replacedinput1 = str_replace(",", "-", $input);
//remove tabs, replace with commas
$replacedinput2 = str_replace("\t", ",", $replacedinput1);
//create an array, one element per line
$explodedinput = explode("\n", $replacedinput2);
//open the CSV file to write to; delete other text in it
$opencsvfile = fopen("/var/www/main/tools/replace_tab.csv","w+");
//for each line in the array, write it to the file
foreach($explodedinput as $line) {
fputcsv($opencsvfile, explode(',', $line)); // split() was removed in PHP 7, explode() does the same here
}
//close the file
fclose($opencsvfile);
//have the user download the file.
/*
header('Pragma: public');
header('Expires: Fri, 01 Jan 2010 00:00:00 GMT');
header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
header('Content-Type: application/csv');
header('Content-Disposition: filename=replace_tab.csv'); */
//Or not, since I can't get it to work.
echo "<a href='replace_tab.csv'>Download CSV File, then open in Numbers or Excel (Note, you may need to right click, save as.)</a>.";
The newline character on a Mac is LF (Unicode U+000A) and on Windows it is CR + LF (Unicode U+000D followed by U+000A).
Maybe that's why the CSV is unreadable on a Mac.
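To check which line endings a downloaded file actually contains, something like the following could help when comparing the file the client received with the one you sent. It is only a sketch; count_line_endings() is a hypothetical helper name and the sample string is made up.

```php
<?php
// Hypothetical helper: counts CRLF, lone LF and lone CR sequences in a string.
function count_line_endings(string $data): array
{
    $crlf = substr_count($data, "\r\n");

    return [
        'CRLF' => $crlf,
        'LF'   => substr_count($data, "\n") - $crlf,
        'CR'   => substr_count($data, "\r") - $crlf,
    ];
}

// Made-up sample containing one of each kind of line ending.
$counts = count_line_endings("a;b\r\nc;d\ne;f\r");
print_r($counts);
```

For a real file, the string would come from file_get_contents() on the downloaded CSV.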
