I want to export data in CSV format and open it in Excel. This method writes one row into the file:
public function writeRow(array $row)
{
$str = $this->rowToStr($row);
$encodedStr = mb_convert_encoding($str, 'UTF-16LE', 'UTF-8');
$ret = fwrite($this->_getFilePointer('w+'), $encodedStr);
/* According to http://php.net/fwrite the fwrite() function
should return false on error. However not writing the full
string (which may occur e.g. when disk is full) is not considered
as an error. Therefore both conditions are necessary. */
if (($ret === false) || (($ret === 0) && (strlen($str) > 0))) {
throw new Exception("Cannot open file $this",
Exception::WRITE_ERROR, NULL, 'writeError');
}
}
Then I write the BOM and the header row:
$csvFile->writeRow(array(chr(0xEF) . chr(0xBB) . chr(0xBF)));
$csvHeaders = array('ID', 'Email', 'Variabilní symbol', 'Jméno', 'Příjmení',
'Stav', 'Zaregistrován', 'Zaregistrován do');
$csvFile->writeRow($csvHeaders);
And the result is :
ID,"Email","Variabilní symbol","Jméno","PYíjmení","Stav","Zaregistrován","Zaregistrován do"
Only a few letters come out wrong (so the mb_convert_encoding call mostly does the trick).
I have tried the traditional way
// Open file pointer to standard output
$fp = fopen($filePath, 'w');
// Add BOM to fix UTF-8 in Excel
fputs($fp, $bom = (chr(0xEF) . chr(0xBB) . chr(0xBF)));
fclose($fp);
And the result was the same.
The BOM you've mentioned is for UTF-8, but your data is UTF-16LE. Therefore you should use a different BOM:
$bom = chr(0xFF) . chr(0xFE)
Or in your code:
$fp = fopen($filePath, 'w');
fputs($fp, chr(0xFF) . chr(0xFE));
// Add lines here...
fclose($fp);
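Putting it together, a minimal sketch of the whole write flow with the corrected BOM (the file name and rows are just placeholders, and rows are joined with a plain implode() instead of your rowToStr()):
$fp = fopen('export.csv', 'w');
// UTF-16LE byte order mark so Excel knows how to decode the bytes
fputs($fp, chr(0xFF) . chr(0xFE));
$rows = array(
    array('ID', 'Email', 'Jméno', 'Příjmení'),
);
foreach ($rows as $row) {
    $line = implode(',', $row) . "\r\n";
    // convert each line, including the line ending, to UTF-16LE
    fputs($fp, mb_convert_encoding($line, 'UTF-16LE', 'UTF-8'));
}
fclose($fp);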
I am using PHP from the OS X terminal to open a file generated on Windows.
I confirmed the file is UTF-16LE encoded:
$file --mime myfile.ini
myfile.ini: text/plain; charset=utf-16le
Now I convert it to UTF-8 with this script.
while ($line = fgets($handle)) {
$line = rtrim($line);
$line = mb_convert_encoding($line,"UTF-8","UTF-16LE");
var_dump($line);
}
Somehow it shows corruption like this:
string(63) "䘀爀漀洀䐀愀琀攀㴀㈀ ⸀ ⸀ ഀ"
How can I get the correct encoding?
When I don't use mb_convert_encoding:
while ($line = fgets($handle)) {
    $line = rtrim($line);
    var_dump($line);
    if (preg_match('/Optimization/', $line)) { print "hit"; }
}
var_dump shows a strange result. Why 28?
string(28) "Optimization=0"
and preg_match doesn't hit either.
You could try doing this:
while ($line = fgets($handle)) {
$line = rtrim($line);
$line = iconv(mb_detect_encoding($line, mb_detect_order(), true), "UTF-8", $line);
var_dump($line);
}
fgets() cannot reliably detect line endings if the stream isn't encoded in an ASCII-compatible encoding. Similarly, when rtrim() looks for e.g. \n ('LINE FEED (LF)', U+000A) it expects a literal 0x0A byte, but in UTF-16LE the newline is encoded as 0x0A 0x00. Bad things can happen.
I suggest you read the file in chunks that are a multiple of 4 bytes, so you won't split individual characters, and forget about line endings until you've successfully re-encoded the file:
$output = '';
// read fixed-size binary chunks (a multiple of 4 bytes) rather than "lines"
while ($chunk = fread($handle, 4 * 4096)) {
    $output .= mb_convert_encoding($chunk, "UTF-8", "UTF-16LE");
}
var_dump(bin2hex($output));
Ideally, save output to a file so you can use a text editor or hexadecimal editor to inspect the result.
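Once everything is UTF-8, line handling is safe again. A small follow-up sketch, reusing the $output built in the loop above:
// $output is now plain UTF-8, so splitting on line endings works as expected
foreach (preg_split('/\r\n|\n|\r/', $output) as $line) {
    if (preg_match('/Optimization/', $line)) {
        print "hit";
    }
}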
Finally I used UTF-16BE instead of UTF-16LE, and it shows the correct strings.
My problem was solved.
$line = mb_convert_encoding($line, "UTF-8", "UTF-16BE");
However, I don't know why it works, since even the file command says the file is UTF-16LE:
$file --mime myfile.ini
myfile.ini: text/plain; charset=utf-16le
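For what it's worth, one way to avoid guessing the endianness is to check the byte order mark yourself before converting. A hedged sketch (not part of the original answers):
$raw = file_get_contents('myfile.ini');
if (substr($raw, 0, 2) === "\xFF\xFE") {
    // little-endian BOM
    $utf8 = mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16LE');
} elseif (substr($raw, 0, 2) === "\xFE\xFF") {
    // big-endian BOM
    $utf8 = mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16BE');
} else {
    // no BOM: fall back to what the file command reported
    $utf8 = mb_convert_encoding($raw, 'UTF-8', 'UTF-16LE');
}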
I realized that strings which contain spaces are written to the CSV file with an extra " at the beginning and at the end.
if (!file_exists("./csv/file.csv")) {
$header = array("Arbol completo","Títol","Code","Parent Code","Servei","Urgència per defecte","Impacte","No es pot sol·licitar","Flux de Treball","SLA","Grup Resolutor-1","Grupo responsable catalogo","Informació","Documentació","Descripció","Llista autoritzats","Icona","Caracteristica", "Valor");
$fp = fopen("./csv/catalogo_de _peticiones_de_servicio.csv", "w");
fprintf($fp, $bom = (chr(0xEF) . chr(0xBB) . chr(0xBF)));
fputcsv($fp, $header,";");
fclose($fp);
}
Is something wrong? It doesn't have encoding problems; I'm using the UTF-8 charset.
Specify the enclosure (the optional 4th parameter of fputcsv) as blank. The default enclosure is ".
fputcsv($fp, $header,";", '');
Reference Link
Hi everyone, when I execute this code to write to a file:
$fileTXT = 'prodotti.txt';
$newfileTXT = 'prodotti_2'.date("d-m-Y_h_m_s").'.txt';
if (!copy($fileTXT, $newfileTXT)) {
echo "Impossibile continuare, impossibile creare file TXT.";
exit;
}
$towriteinfile = "";
$fp = fopen($path . $filename, "r") or die("Couldn't open $filename");
$fpTXT = fopen($newfileTXT, 'w') or die("Couldn't open $newfileTXT");
while (!feof($fp)) {
$line = fgets($fp, 1024);
$arr = explode("\t", $line);
$arr[7] = '<img src="http://link/imgHigh/' . $arr[7] . '.jpg" />;';
echo "Prodotto: ".$arr[4]."<br>";
foreach ($arr as $fields) {
fwrite($fpTXT, $fields.";");
}
fwrite($fpTXT, "\n");
}
fclose($fpTXT);
fclose($fp);
I have this result in the txt file:
175;13563;desc;01;category;..............c etc etc.....
mercato.㰻浩牳㵣栢瑴㩰⼯睷獯畣慬楴挮浯椯⽴慣⽴浩䡧杩⽨ ⸀ ⸀砀砀 漀欀ഀ樮杰•㸯㬻
The HTML code for the image is written as Chinese characters. Why?
Do you want to append content from $filename to the end of $newfileTXT?
If so, you should change:
$fpTXT = fopen($newfileTXT, 'w') or die("Couldn't open $newfileTXT");
to
$fpTXT = fopen($newfileTXT, 'a') or die("Couldn't open $newfileTXT");
The file is probably interpreted as Unicode (probably UTF-8). In Unicode, characters can consist of multiple bytes. When you read the file, you read just 1024 bytes, which can leave half of a multi-byte character at the end of one chunk and the other half at the start of the next. When you then insert new characters in between, you get different byte sequences instead, and the text becomes a complete mess.
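To illustrate the point, a minimal sketch: reading whole lines instead of fixed 1024-byte chunks means a multi-byte character can never be cut in half at a read boundary, because fgets() without a length argument reads up to the next newline:
while (($line = fgets($fp)) !== false) {
    $arr = explode("\t", rtrim($line, "\r\n"));
    // ... process the fields as before ...
}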
I have resolved the problem; I passed every line through this function:
function cleanString($string){
$string = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);
return $string;
}
My old string contained binary characters; I cleaned the string and now everything is OK.
I have a PHP app with hundreds of files. The problem is that one or several files apparently have a BOM in them, so including them causes errors when creating the session... Is there a way to reconfigure PHP or the server, or how can I get rid of the BOM? Or at least identify the source? I would prefer a PHP solution if available.
The real solution, of course, is to fix your editor settings (and those of the other team members as well) so files are not stored with a UTF byte order mark. Read on here: https://stackoverflow.com/a/2558793/43959
You could use this function to "transparently" remove the BOM before including another PHP file.
Note: I really recommend that you fix your editor(s) / files instead of doing nasty things with eval() as I demonstrate here.
This is just a proof of concept:
bom_test.php:
<?php
function bom_safe_include($file) {
$fd = fopen($file, "r");
// read 3 bytes to detect BOM. file read pointer is now behind BOM
$possible_bom = fread($fd, 3);
// if the file has no BOM, reset pointer to beginning file (0)
if ($possible_bom !== "\xEF\xBB\xBF") {
fseek($fd, 0);
}
$content = stream_get_contents($fd);
fclose($fd);
// execute (partial) script (without BOM) using eval
eval ("?>$content");
// export global vars
$GLOBALS += get_defined_vars();
}
// include a file
bom_safe_include("test_include.php");
// test function and variable from include
test_function($test);
test_include.php, with BOM at beginning
test
<?php
$test = "Hello World!";
function test_function ($text) {
echo $text, PHP_EOL;
}
OUTPUT:
kaii#test$ php bom_test.php
test
Hello World!
I have been able to identify the files that carried BOM inside them with this script, maybe it helps someone else with the same problem in the future. Works without eval().
function fopen_utf8 ($filename) {
    $file = @fopen($filename, "r");
    if (!$file) {
        return false;
    }
    $bom = fread($file, 3);
    fclose($file);
    // true if the file starts with a UTF-8 BOM
    return $bom === "\xEF\xBB\xBF";
}
function file_array($path, $exclude = ".|..|libraries", $recursive = true) {
$path = rtrim($path, "/") . "/";
$folder_handle = opendir($path);
$exclude_array = explode("|", $exclude);
$result = array();
while(false !== ($filename = readdir($folder_handle))) {
if(!in_array(strtolower($filename), $exclude_array)) {
if(is_dir($path . $filename . "/")) {
// Need to include full "path" or it's an infinite loop
if($recursive) $result[] = file_array($path . $filename . "/", $exclude, true);
} else {
if ( fopen_utf8($path . $filename) )
{
//$result[] = $filename;
echo ($path . $filename . "<br>");
}
}
}
}
return $result;
}
$files = file_array(".");
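Since the question also asked for a PHP way to get rid of the BOM, here is a small hypothetical helper (not part of the detection script above) that strips the UTF-8 BOM in place from a reported file:
function strip_bom($filename) {
    $contents = file_get_contents($filename);
    if (substr($contents, 0, 3) === "\xEF\xBB\xBF") {
        file_put_contents($filename, substr($contents, 3));
    }
}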
vim $(find . -name \*.php)
once inside vim:
:argdo :set nobomb | :w
I'm using fputcsv in PHP to output a comma-delimited file from a database query. When opening the file in gedit on Ubuntu, it looks correct: each record has a line break (no visible line-break characters, but you can tell each record is separated), and opening it in an OpenOffice spreadsheet lets me view the file correctly.
However, we're sending these files on to a client on Windows, and on their systems the file comes in as one big, long line. Opening it in Excel, it doesn't recognize multiple lines at all.
I've read several questions on here that are pretty similar, including this one, which includes a link to the really informative Great Newline Schism explanation.
Unfortunately, we can't just tell our clients to open the files in a "smarter" editor. They need to be able to open them in Excel. Is there any programmatic way to ensure that the correct newline characters are added so the file can be opened in a spreadsheet program on any OS?
I'm already using a custom function to force quotes around all values, since fputcsv is selective about it. I've tried doing something like this:
function my_fputcsv($handle, $fieldsarray, $delimiter = "~", $enclosure ='"'){
$glue = $enclosure . $delimiter . $enclosure;
return fwrite($handle, $enclosure . implode($glue,$fieldsarray) . $enclosure."\r\n");
}
But when the file is opened in a Windows text editor, it still shows up as a single long line.
// Writes an array to an open CSV file with a custom end of line.
//
// $fp: a seekable file pointer. Most file pointers are seekable,
// but some are not. example: fopen('php://output', 'w') is not seekable.
// $eol: probably one of "\r\n", "\n", or for super old macs: "\r"
function fputcsv_eol($fp, $array, $eol) {
fputcsv($fp, $array);
if("\n" != $eol && 0 === fseek($fp, -1, SEEK_CUR)) {
fwrite($fp, $eol);
}
}
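A hedged usage sketch of the helper above ($rows is a placeholder for your query results):
$fp = fopen('export.csv', 'w');
foreach ($rows as $row) {
    fputcsv_eol($fp, $row, "\r\n"); // force Windows-style line endings
}
fclose($fp);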
This is an improved version of @John Douthat's great answer, preserving the possibility of using custom delimiters and enclosures and returning fputcsv's original output:
function fputcsv_eol($handle, $array, $delimiter = ',', $enclosure = '"', $eol = "\n") {
$return = fputcsv($handle, $array, $delimiter, $enclosure);
if($return !== FALSE && "\n" != $eol && 0 === fseek($handle, -1, SEEK_CUR)) {
fwrite($handle, $eol);
}
return $return;
}
The PHP function fputcsv writes only \n, and this cannot be customized. That makes the function nearly useless in a Microsoft environment, although some packages will detect the Linux newline as well.
Still, the benefits of fputcsv kept me digging for a solution that replaces the newline character just before writing to the file. This can be done by streaming the fputcsv output to PHP's built-in temp stream first, then adapting the newline character(s) to whatever you want before saving to the file. Like this:
function getcsvline($list, $seperator, $enclosure, $newline = "" ){
$fp = fopen('php://temp', 'r+');
fputcsv($fp, $list, $seperator, $enclosure );
rewind($fp);
$line = fgets($fp);
if( $newline and $newline != "\n" ) {
if( $line[strlen($line)-2] != "\r" and $line[strlen($line)-1] == "\n") {
$line = substr_replace($line,"",-1) . $newline;
} else {
// return the line as is (literal string)
//die( 'original csv line is already \r\n style' );
}
}
return $line;
}
/* to call the function with the array $row and save to file with filehandle $fp */
$line = getcsvline( $row, ",", "\"", "\r\n" );
fwrite( $fp, $line);
As webbiedave pointed out (thanks!), probably the cleanest way is to use a stream filter.
It is a bit more complex than the other solutions, but it even works on streams that cannot be modified after writing to them (like a download using $handle = fopen('php://output', 'w');).
Here is my approach:
class StreamFilterNewlines extends php_user_filter {
function filter($in, $out, &$consumed, $closing) {
while ( $bucket = stream_bucket_make_writeable($in) ) {
$bucket->data = preg_replace('/([^\r])\n/', "$1\r\n", $bucket->data);
$consumed += $bucket->datalen;
stream_bucket_append($out, $bucket);
}
return PSFS_PASS_ON;
}
}
stream_filter_register("newlines", "StreamFilterNewlines");
stream_filter_append($handle, "newlines");
fputcsv($handle, $list, $seperator, $enclosure);
...
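For completeness, a hedged usage sketch for the download case mentioned above, where fseek() is not an option ($rows is again a placeholder):
header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="export.csv"');
$handle = fopen('php://output', 'w');
stream_filter_register("newlines", "StreamFilterNewlines");
stream_filter_append($handle, "newlines");
foreach ($rows as $row) {
    fputcsv($handle, $row);
}
fclose($handle);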
Alternatively, you can output in native Unix format (\n only) and then run unix2dos on the resulting file to convert \n to \r\n in the appropriate places. Just be careful that your data contains no \n's. Also, I see you are using a default separator of ~; try a default separator of \t.
I've been dealing with a similar situation. Here's a solution I've found that outputs CSV files with Windows-friendly line endings.
http://www.php.net/manual/en/function.fputcsv.php#90883
I wasn't able to use the fseek-based answers above since I'm trying to stream a file to the client and can't use fseek.
Windows needs \r\n as the line-break/carriage-return combo in order to show separate lines.
I did eventually get an answer over at experts-exchange; here's what worked:
function my_fputcsv($handle, $fieldsarray, $delimiter = "~", $enclosure ='"'){
$glue = $enclosure . $delimiter . $enclosure;
return fwrite($handle, $enclosure . implode($glue,$fieldsarray) . $enclosure.PHP_EOL);
}
to be used in place of standard fputcsv.