PHP: write Hebrew / UTF-8 text to a CSV file

Hi, I am trying to write a CSV file containing Hebrew text, but instead of the Hebrew text it writes some strange symbols. Below is my PHP code:
<?php
$list = array (
array('שלטל', 'שלטל', 'שלטל', 'שלטל'),
array('123', '456', '789'),
array('"שלטל"', '"שלטל"')
);
$fp = fopen('file.csv', 'w');
//fputs($fp, $bom =( chr(0xEF) . chr(0xBB) . chr(0xBF) ));
foreach ($list as $fields) {
fputcsv($fp, $fields);
}
fclose($fp);
?>
I searched online and added "fputs($fp, $bom = (chr(0xEF) . chr(0xBB) . chr(0xBF)));" (commented out above), but it didn't help. Can someone help me?

I just ran your code. The text is correctly encoded and the generated CSV is valid: opening it in a text editor that supports Hebrew shows the text correctly. To open a CSV containing Hebrew, you need to follow the instructions suggested here.
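The claim that the generated file is valid UTF-8 can be checked from PHP itself. This is a minimal sketch; the helper name isValidUtf8Csv() is hypothetical, and it relies on the fact that preg_match() with the /u modifier returns false for malformed UTF-8.

```php
<?php
// Hypothetical helper (not from the thread): reports whether the bytes of a
// CSV file are valid UTF-8, ignoring an optional leading UTF-8 BOM.
function isValidUtf8Csv(string $bytes): bool
{
    // Strip the UTF-8 BOM (EF BB BF) if present.
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        $bytes = substr($bytes, 3);
    }
    // With the /u modifier, preg_match() returns false on malformed UTF-8.
    return preg_match('//u', $bytes) === 1;
}

// Usage, assuming file.csv was written by the code in the question:
// var_dump(isValidUtf8Csv(file_get_contents('file.csv')));
```

If this returns true, the bytes are fine and any remaining problem lies in how the viewing application decodes them.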
Update:
It turns out that Microsoft uses UTF-16, and specifically UTF-16LE (little-endian). For a Microsoft application such as Excel to open the CSV correctly, you need a tab-delimited file with UTF-16LE-encoded text.
$list = array (
array('שלטל', 'שלטל', 'שלטל', 'שלטל'),
array('123', '456', '789'),
array('"שלטל"', '"שלטל"')
);
$fp = fopen('file.csv', 'w');
//UTF-16LE BOM
fputs($fp, chr(0xFF) . chr(0xFE));
foreach ($list as $fields) {
$out = '';
foreach ($fields as $k => $v){
$fields[$k] = mb_convert_encoding($v, 'UTF-16LE', 'UTF-8');
}
// UTF-16LE tab
$out = implode(chr(0x09).chr(0x00), $fields);
// UTF-16LE new line
fputs($fp, $out.chr(0x0A).chr(0x00));
}
fclose($fp);
The code above works, though I'm not sure about its efficiency.
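On efficiency: one option is to join each row in UTF-8 first and convert the whole line with a single call, instead of converting every field separately. The sketch below keeps the answer's output format (UTF-16LE BOM, tab delimiter, LF line endings). It uses iconv() instead of mb_convert_encoding(), and it assumes the fields are UTF-8 and contain no tabs or newlines, since no quoting is done. The function name is hypothetical.

```php
<?php
// Sketch: same output format as the answer (UTF-16LE BOM, tab-delimited,
// LF line endings), but each row is joined in UTF-8 and converted with a
// single iconv() call instead of one conversion per field.
// Assumes fields are UTF-8 and contain no tabs or newlines (no quoting).
function rowsToUtf16leTsv(array $rows): string
{
    $out = "\xFF\xFE"; // UTF-16LE byte order mark
    foreach ($rows as $fields) {
        // Tab and newline are encoded by iconv() along with the text.
        $line = implode("\t", $fields) . "\n";
        $out .= iconv('UTF-8', 'UTF-16LE', $line);
    }
    return $out;
}

// Usage with the $list array from the answer:
// file_put_contents('file.csv', rowsToUtf16leTsv($list));
```

Because the tab and newline pass through iconv() with the rest of the line, the chr(0x09).chr(0x00) and chr(0x0A).chr(0x00) literals are no longer needed.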

I think I've encountered this before: some versions of Excel just don't display UTF-8 characters properly, even though the file itself is already UTF-8.
Try opening your CSV in Notepad++ and check whether it is "UTF-8-BOM" and whether it shows the UTF-8 characters correctly.
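The BOM check can also be done from a script. This hypothetical helper only recognises the UTF-8, UTF-16LE and UTF-16BE marks; a UTF-32LE file (which starts with FF FE 00 00) would be reported as UTF-16LE.

```php
<?php
// Hypothetical helper: names the byte order mark, if any, at the start of
// a file's contents.
function detectBom(string $bytes): string
{
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8 BOM';
    }
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        return 'UTF-16LE BOM';
    }
    if (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        return 'UTF-16BE BOM';
    }
    return 'no BOM';
}

// Usage: echo detectBom(file_get_contents('file.csv')), "\n";
```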

Related

utf-16le to UTF-8

I am using PHP in the macOS terminal to open a file generated on Windows.
I confirmed that the file is UTF-16LE encoded:
$file --mime myfile.ini
myfile.ini: text/plain; charset=utf-16le
Now I convert it to UTF-8 with this script.
while ($line = fgets($handle)) {
$line = rtrim($line);
$line = mb_convert_encoding($line,"UTF-8","UTF-16LE");
var_dump($line);
}
Somehow the output is corrupted, like this:
string(63) "䘀爀漀洀䐀愀琀攀㴀㈀ ㄀㄀⸀ ㄀⸀ ㄀ഀ਀"
How can I get the correct encoding?
When I don't use mb_convert_encoding:
while ($line = fgets($handle)) {
$line = rtrim($line);
var_dump($line);
if (preg_match('/Optimization/',$line)){print "hit";}
}
var_dump shows a strange result: why is the length 28?
string(28) "Optimization=0"
and preg_match doesn't match either.
You could try doing this:
while ($line = fgets($handle)) {
$line = rtrim($line);
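$line = iconv(mb_detect_encoding($line, mb_detect_order(), true), "UTF-8", $line);
var_dump($line);
}
Note that with the default detection order (ASCII, UTF-8), mb_detect_encoding() cannot identify UTF-16 input.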
fgets() can't detect line endings reliably if the stream isn't in an ASCII-compatible encoding. Similarly, when rtrim() looks for e.g. \n ('LINE FEED (LF)' (U+000A)), it expects a literal 0x0A byte, but in UTF-16LE that character is the two-byte sequence 0x0A 0x00. Bad things can happen.
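The byte-level difference is easy to see with bin2hex(). This small illustration converts "A\n" with iconv() and shows where a cut right after the first 0x0A byte (which is where fgets() stops) would land.

```php
<?php
// Illustration: the same text in UTF-8 and UTF-16LE, shown as hex.
$utf8    = "A\n";
$utf16le = iconv('UTF-8', 'UTF-16LE', $utf8);

echo bin2hex($utf8), "\n";    // 410a
echo bin2hex($utf16le), "\n"; // 41000a00

// fgets() stops right after the first 0x0A byte, so a UTF-16LE line is cut
// in the middle of its 2-byte newline and the 0x00 is left for the next read.
$cut = strpos($utf16le, "\n");
echo bin2hex(substr($utf16le, 0, $cut + 1)), "\n"; // 41000a
```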
I suggest you read the file in chunks that are a multiple of 4 bytes, so you won't split individual characters, and forget about line endings until you've successfully re-encoded the file:
$output = '';
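// fread() does not stop at 0x0A bytes (fgets() does, and returns at most length - 1 bytes)
while (($chunk = fread($handle, 4 * 4096)) !== false && $chunk !== '') {
$output .= mb_convert_encoding($chunk, "UTF-8", "UTF-16LE");
}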
var_dump(bin2hex($output));
Ideally, save output to a file so you can use a text editor or hexadecimal editor to inspect the result.
In the end I used UTF-16BE instead of UTF-16LE, and it shows the correct strings.
My problem was solved.
$line = mb_convert_encoding($line,"UTF-8","UTF-16BE");
However, I don't know why it works,
since the file command says the file is UTF-16LE:
$file --mime myfile.ini
myfile.ini: text/plain; charset=utf-16le
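One plausible explanation, inferred from the bytes rather than stated in the thread: fgets() splits UTF-16LE text right after the 0x0A byte of each newline, so every line after the first begins with the leftover 0x00. The remaining bytes are shifted by one position and happen to form valid UTF-16BE, and because rtrim() also strips trailing "\0", the trimmed line decodes cleanly as big-endian. (The output 䘀 is U+4600, i.e. the bytes 00 46, an "F" shifted by one byte.) The simulation below uses a hypothetical splitAfterLf() helper in place of fgets().

```php
<?php
// Hypothetical helper: splits a byte string the way repeated fgets() calls
// would, i.e. right after every 0x0A byte.
function splitAfterLf(string $bytes): array
{
    $lines = array();
    $start = 0;
    while (($pos = strpos($bytes, "\n", $start)) !== false) {
        $lines[] = substr($bytes, $start, $pos - $start + 1);
        $start = $pos + 1;
    }
    if ($start < strlen($bytes)) {
        $lines[] = substr($bytes, $start); // leftover bytes after the last LF
    }
    return $lines;
}

$file  = iconv('UTF-8', 'UTF-16LE', "ab\ncd\n");
$lines = splitAfterLf($file);

// The second piece starts with the 0x00 left over from the first newline.
echo bin2hex($lines[1]), "\n"; // 00630064000a

// rtrim() strips the trailing "\n" and "\0", leaving 00 63 00 64,
// which is "cd" when read as big-endian.
$second = rtrim($lines[1]);
echo iconv('UTF-16BE', 'UTF-8', $second), "\n"; // cd

// Robust approach: decode the whole file first, then split into lines.
$text = iconv('UTF-16LE', 'UTF-8', $file);
$good = explode("\n", rtrim($text, "\n")); // array('ab', 'cd')
```

If this is what happened, the file command is right about the encoding, and decoding before splitting (as in the last two lines) avoids the need to pretend the data is big-endian.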

Convert encoding of file data using stream_filter_append

I am trying to convert a file I am creating to UTF-16LE during the creation process using stream_filter_append(), but I'm just getting garbled data in the output. Example code below:
$fp = fopen($filename, "w");
fwrite($fp, chr(255) . chr(254));
$rows = array(array('Состав Gerber', 'Секреты производства'), array('Полезные аксессуары', 'Инструменты'));
foreach($rows as $row)
{
fputcsv($fp, array_values($row));
}
stream_filter_append($fp, 'convert.iconv.UTF-8/UTF-16LE', STREAM_FILTER_WRITE);
fclose($fp);
I can't use iconv on the data before passing it through fputcsv, because fputcsv doesn't handle UTF-16 (see the question "saving CSV with UTF-16BE encoding in PHP").
I know I have the option to create the file, then read and convert it. Or even create a custom fputcsv to handle UTF-16. But I was wondering if this would be possible during the initial file creation process using stream_filter_append().
This was solved by @Sammitch in the comments on the question. He noticed that stream_filter_append() was being called after the data had already been written; it has to be called before writing. A correct example is below.
$fp = fopen($filename, "w");
fwrite($fp, chr(255) . chr(254));
stream_filter_append($fp, 'convert.iconv.UTF-8/UTF-16LE', STREAM_FILTER_WRITE);
$rows = array(array('Состав Gerber', 'Секреты производства'), array('Полезные аксессуары', 'Инструменты'));
foreach($rows as $row)
{
fputcsv($fp, array_values($row));
}
fclose($fp);
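To confirm the corrected order produces UTF-16LE, the file can be read back and decoded. This sketch wraps the answer's code in a hypothetical writeUtf16leCsv() function, writes to a temporary file (an assumption made only so the check is self-contained), and passes fputcsv's enclosure and escape arguments explicitly with their default values.

```php
<?php
// Sketch: write a UTF-16LE CSV through a stream filter, then read it back.
function writeUtf16leCsv(string $filename, array $rows): void
{
    $fp = fopen($filename, 'w');
    fwrite($fp, "\xFF\xFE"); // BOM written raw, before the filter exists
    // From here on, every byte written is converted from UTF-8 to UTF-16LE.
    stream_filter_append($fp, 'convert.iconv.UTF-8/UTF-16LE', STREAM_FILTER_WRITE);
    foreach ($rows as $row) {
        // Delimiter, enclosure and escape shown explicitly (PHP's defaults).
        fputcsv($fp, array_values($row), ',', '"', '\\');
    }
    fclose($fp); // closing flushes the filter
}

$tmp = tempnam(sys_get_temp_dir(), 'csv');
writeUtf16leCsv($tmp, array(array('Состав Gerber', 'Инструменты')));
$bytes = file_get_contents($tmp);
unlink($tmp);

// Drop the 2-byte BOM and decode the rest back to UTF-8.
echo iconv('UTF-16LE', 'UTF-8', substr($bytes, 2));
// "Состав Gerber",Инструменты
```

The first field is quoted because it contains a space; that is fputcsv's normal behaviour and is unrelated to the encoding.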

fputcsv inserting extra "

I noticed that strings containing spaces are written to the CSV file with an extra " at the beginning and at the end:
if (!file_exists("./csv/file.csv")) {
$header = array("Arbol completo","Títol","Code","Parent Code","Servei","Urgència per defecte","Impacte","No es pot sol·licitar","Flux de Treball","SLA","Grup Resolutor-1","Grupo responsable catalogo","Informació","Documentació","Descripció","Llista autoritzats","Icona","Caracteristica", "Valor");
$fp = fopen("./csv/catalogo_de _peticiones_de_servicio.csv", "w");
fprintf($fp, $bom =( chr(0xEF) . chr(0xBB) . chr(0xBF)));
fputcsv($fp, $header,";");
fclose($fp);
}
Something is wrong, and it isn't an encoding problem; I'm using the UTF-8 charset.
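The extra " is the enclosure, the optional 4th parameter of fputcsv (default "). fputcsv puts it around any field that contains a space, a tab, a line break, the delimiter, the enclosure or the escape character. This is standard CSV quoting, and CSV readers remove it again. The enclosure cannot simply be set to blank, though: fputcsv($fp, $header, ";", '') fails because the enclosure must be a single character (PHP 8 throws a ValueError; PHP 7 emits a warning and returns false). If you really need unquoted fields, build each line yourself with implode() and fwrite().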
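A minimal sketch of writing a row without any enclosure. The helper name and the choice to throw InvalidArgumentException are assumptions for illustration: it refuses values that would need quoting instead of silently producing a broken row.

```php
<?php
// Hypothetical helper: write one row with no enclosure characters at all.
function putRowNoEnclosure($fp, array $fields, $delimiter = ';')
{
    foreach ($fields as $field) {
        // Refuse values that would need quoting to stay valid CSV.
        if (strpbrk($field, $delimiter . "\"\r\n") !== false) {
            throw new InvalidArgumentException("Field needs quoting: $field");
        }
    }
    return fwrite($fp, implode($delimiter, $fields) . "\n");
}

$fp = fopen('php://memory', 'w+');
putRowNoEnclosure($fp, array('Arbol completo', 'Títol', 'Code'));
rewind($fp);
echo stream_get_contents($fp); // Arbol completo;Títol;Code
```

The header in the question contains no semicolons or quotes, so it passes the check; data rows that might contain them are safer with fputcsv's normal quoting.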

Excel utf8 encoding (BOM is not working)

I just want to export data in CSV format and open it in Excel. This method writes one row to the file:
public function writeRow(array $row)
{
$str = $this->rowToStr($row);
$encodedStr = mb_convert_encoding($str, 'UTF-16LE', 'UTF-8');
$ret = fwrite($this->_getFilePointer('w+'), $encodedStr);
/* According to http://php.net/fwrite the fwrite() function
should return false on error. However not writing the full
string (which may occur e.g. when disk is full) is not considered
as an error. Therefore both conditions are necessary. */
if (($ret === false) || (($ret === 0) && (strlen($str) > 0))) {
throw new Exception("Cannot open file $this",
Exception::WRITE_ERROR, NULL, 'writeError');
}
}
Then I try to write the rows:
$csvFile->writeRow(array(chr(0xEF) . chr(0xBB) . chr(0xBF)));
$csvHeaders = array('ID', 'Email', 'Variabilní symbol', 'Jméno', 'Příjmení',
'Stav', 'Zaregistrován', 'Zaregistrován do');
$csvFile->writeRow($csvHeaders);
And the result is:
ID,"Email","Variabilní symbol","Jméno","PYíjmení","Stav","Zaregistrován","Zaregistrován do"
Only a few letters are wrong (mb_convert_encoding gets the rest right).
I have also tried the traditional way:
// Open the output file
$fp = fopen($filePath, 'w');
// Add BOM to fix UTF-8 in Excel
fputs($fp, $bom = (chr(0xEF) . chr(0xBB) . chr(0xBF)));
fclose($fp);
And the result was the same.
The BOM you've mentioned is for UTF-8, but your data is UTF-16LE, so you should use the UTF-16LE BOM instead:
$bom = chr(0xFF) . chr(0xFE);
Or in your code:
$fp = fopen($filePath, 'w');
fputs($fp, chr(0xFF) . chr(0xFE));
// Add lines here...
fclose($fp);
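Why FF FE is the right mark here: a BOM is always the code point U+FEFF, and EF BB BF is only its UTF-8 encoding. Encoding the same code point as UTF-16LE gives FF FE. This illustration uses PHP 7's "\u{...}" string escape and iconv().

```php
<?php
// The BOM is the single code point U+FEFF; only its byte encoding differs.
$bom = "\u{FEFF}"; // PHP 7+ escape; produces the UTF-8 bytes EF BB BF

echo bin2hex($bom), "\n";                             // efbbbf
echo bin2hex(iconv('UTF-8', 'UTF-16LE', $bom)), "\n"; // fffe
echo bin2hex(iconv('UTF-8', 'UTF-16BE', $bom)), "\n"; // feff
```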

PHP writes strange characters to a TXT file

Hi everyone, when I execute this code to write to a file:
$fileTXT = 'prodotti.txt';
$newfileTXT = 'prodotti_2'.date("d-m-Y_h_m_s").'.txt';
if (!copy($fileTXT, $newfileTXT)) {
echo "Impossibile continuare, impossibile creare file TXT."; // "Cannot continue, unable to create the TXT file."
exit;
}
$towriteinfile = "";
$fp = fopen($path . $filename, "r") or die("Couldn't open $filename");
$fpTXT = fopen($newfileTXT, 'w') or die("Couldn't open $newfileTXT");
while (!feof($fp)) {
$line = fgets($fp, 1024);
$arr = explode("\t", $line);
$arr[7] = '<img src="http://link/imgHigh/' . $arr[7] . '.jpg" />;';
echo "Prodotto: ".$arr[4]."<br>";
foreach ($arr as $fields) {
fwrite($fpTXT, $fields.";");
}
fwrite($fpTXT, "\n");
}
fclose($fpTXT);
fclose($fp);
I get this result in the TXT file:
175;13563;desc;01;category;..............c etc etc.....
mercato.㰻浩⁧牳㵣栢瑴㩰⼯睷⹷獯畣慬楴挮浯椯⽴慣⽴浩䡧杩⽨ ㄀⸀㄀  ⸀砀砀 漀欀ഀ਀樮杰•㸯㬻
The HTML code for the image is written as Chinese characters. Why?
Do you want to append the content of $filename to the end of $newfileTXT?
If so, you should change:
$fpTXT = fopen($newfileTXT, 'w') or die("Couldn't open $newfileTXT");
to
$fpTXT = fopen($newfileTXT, 'a') or die("Couldn't open $newfileTXT");
The file is probably being interpreted as Unicode (probably UTF-8), in which a character can consist of multiple bytes. fgets($fp, 1024) reads at most 1023 bytes at a time, so one chunk can end with half a character and the next chunk can start with the other half. When you insert new characters in between, the bytes form different sequences, and the text becomes a complete mess.
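Another possible reading of the output, offered as an assumption rather than a confirmed diagnosis: the garbled run looks like ordinary ASCII bytes decoded as UTF-16LE, two bytes per character. That would fit a source file that is itself UTF-16LE (the ㄀⸀ pairs near the end look like NUL-plus-ASCII bytes) with single-byte ASCII appended by the script. The snippet below decodes ";<img src" that way using unpack() with the little-endian 16-bit format 'v'; the resulting code points appear to match the characters that follow "mercato." in the output.

```php
<?php
// Read plain ASCII bytes as little-endian 16-bit code units, which is what
// a viewer does once it has decided that a file is UTF-16LE.
$ascii = ';<img src';
$units = unpack('v*', substr($ascii, 0, 8)); // 8 bytes -> 4 code units

foreach ($units as $unit) {
    printf("U+%04X\n", $unit);
}
// U+3C3B
// U+6D69
// U+2067
// U+7273
```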
I resolved the problem by passing every line through this function:
function cleanString($string){
$string = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);
return $string;
}
My old strings contained binary characters; after cleaning them, everything is OK.
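Stripping bytes works for plain ASCII data, but the pattern [\x00-\x1F\x80-\xFF] also removes tabs, line breaks and every byte of a non-ASCII UTF-8 character, so accented letters are lost. If the input file is UTF-16LE (an assumption suggested by the NUL bytes, not confirmed in the thread), decoding it first keeps every character. readUtf16leTsv() is a hypothetical helper and reads the whole file into memory.

```php
<?php
// Alternative sketch, assuming the input file is UTF-16LE: decode the whole
// file to UTF-8 first, then split it into lines and tab-separated fields as
// the original loop does.
function readUtf16leTsv(string $bytes): array
{
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        $bytes = substr($bytes, 2); // drop the UTF-16LE BOM if present
    }
    $text = iconv('UTF-16LE', 'UTF-8', $bytes);
    $rows = array();
    foreach (preg_split('/\r\n|\n/', $text, -1, PREG_SPLIT_NO_EMPTY) as $line) {
        $rows[] = explode("\t", $line);
    }
    return $rows;
}

// Usage: $rows = readUtf16leTsv(file_get_contents($path . $filename));
```

Each row can then be modified and written out as UTF-8, with no bytes removed.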
