PHP : csv file encoding?


I have a stupid problem. I use a piece of software to export .csv files, and the result is strangely formatted text. When I try to process the files in PHP, everything goes wrong.
When I copy and paste the text into MS Word, there is a strange character between each letter.
In PHP I tried to convert it using utf8_decode/utf8_encode, iconv("ISO-8859-1", "WINDOWS-1252", $str)... in vain.
I guess it's UTF-16 encoded text, but I'm not sure of it. I tried some functions to decode UTF-16, in vain too.
Does anyone have a solution for this?

Your guess is correct:
file -i NL_JGFR_130326_bac.csv
NL_JGFR_130326_bac.csv: text/plain; charset=utf-16le
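To make a similar check from PHP instead of the shell, a rough sketch is to look for a byte order mark and, failing that, for the NUL-byte pattern that mostly-ASCII UTF-16 text produces. The helper name guessUnicodeEncoding and the 50% threshold are illustrative assumptions; this is not how file(1) itself decides.

```php
<?php
// Rough sketch: guess UTF-8 / UTF-16 from a BOM or from the NUL-byte pattern.
// guessUnicodeEncoding() is a hypothetical helper; the 50% threshold is an assumption.
function guessUnicodeEncoding(string $bytes): ?string
{
    // A byte order mark is unambiguous when present.
    // (A UTF-32LE BOM also starts with FF FE; that case is ignored here.)
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8';
    }
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        return 'UTF-16LE';
    }
    if (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        return 'UTF-16BE';
    }

    // No BOM: mostly-ASCII UTF-16LE text has a NUL in every second byte.
    $sample = substr($bytes, 0, 512);
    $len = strlen($sample) - (strlen($sample) % 2);
    if ($len === 0) {
        return null;
    }
    $evenNuls = 0;
    $oddNuls = 0;
    for ($i = 0; $i < $len; $i += 2) {
        if ($sample[$i] === "\0") {
            $evenNuls++;
        }
        if ($sample[$i + 1] === "\0") {
            $oddNuls++;
        }
    }
    $pairs = $len / 2;
    if ($oddNuls > $pairs / 2 && $evenNuls === 0) {
        return 'UTF-16LE';
    }
    if ($evenNuls > $pairs / 2 && $oddNuls === 0) {
        return 'UTF-16BE';
    }
    return null; // Not recognisably UTF-16: maybe UTF-8 or a legacy code page.
}

// Usage: guessUnicodeEncoding(file_get_contents('NL_JGFR_130326_bac.csv'));
```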
You can probably use the PHP Multibyte String (mbstring) extension to work with UTF-16:
http://php.net/manual/en/ref.mbstring.php
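A minimal sketch of that suggestion, assuming the mbstring extension is loaded and the export really is UTF-16LE: strip the optional BOM, convert the whole buffer to UTF-8 with mb_convert_encoding(), and only then hand it to fgetcsv(). The function name readUtf16leCsv is hypothetical.

```php
<?php
// Minimal sketch: re-encode a UTF-16LE CSV export to UTF-8, then parse it.
// Assumes mbstring is available and the input really is UTF-16LE.
function readUtf16leCsv(string $raw, string $delimiter = ','): array
{
    // Drop the UTF-16LE byte order mark (FF FE) if the exporter wrote one.
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        $raw = substr($raw, 2);
    }
    $utf8 = mb_convert_encoding($raw, 'UTF-8', 'UTF-16LE');

    // Feed the UTF-8 text through an in-memory stream so fgetcsv() can
    // still handle quoted fields that span several lines.
    $fh = fopen('php://temp', 'r+');
    fwrite($fh, $utf8);
    rewind($fh);

    $rows = [];
    while (($row = fgetcsv($fh, 0, $delimiter, '"', '\\')) !== false) {
        if ($row !== [null]) { // fgetcsv() returns [null] for a blank line
            $rows[] = $row;
        }
    }
    fclose($fh);
    return $rows;
}

// Usage: $rows = readUtf16leCsv(file_get_contents('NL_JGFR_130326_bac.csv'));
```

Converting the whole buffer first, rather than each line, avoids splitting the two-byte UTF-16 units in the wrong place.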

Related

PHP - How to save a file in Windows-1252?

I work on a system that automates signature generation for Outlook. The part that generates the .htm files works great, but now I also need to add files in .txt format. If I use the content without changing the encoding, all my accented characters are converted to different values, for example "é" becomes "Ã©" and "ô" becomes "Ã´".
This clearly looked like an encoding conflict of some sort. I tried to correct it by converting the input text to the "Windows-1252" encoding:
$myText = iconv( mb_detect_encoding( $myText ) , "Windows-1252//TRANSLIT", $myText);
But it didn't change anything. I also tried:
$myText = mb_convert_encoding($myText, "Windows-1252");
And it didn't work either. For both of these tests, I checked the file type with Atom (my IDE) and it recognises these files as UTF-8. But when I check in the terminal with file -I signature.txt, it responds with: signature.txt: text/plain; charset=iso-8859-1
Note that if I manually change the encoding to Windows-1252 in Atom, the characters are correct.
Has anyone run into the same problem? Is there another way in PHP to specify the encoding of the file?

I figured it out. The code to use was (as pointed out by @Powerlord):
$monTexteTXT = mb_convert_encoding($monTexteTXT, "Windows-1252", "UTF-8");
I got a false negative when I first tried this solution: when I opened the file, the characters seemed broken, but once it was opened in Outlook it was fine.
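A small sketch of the working approach, wrapped in hypothetical helpers (toWindows1252, saveAsWindows1252): the essential detail is passing the source encoding "UTF-8" explicitly instead of letting mbstring assume one. The file path in the usage line is only an example.

```php
<?php
// Minimal sketch: convert UTF-8 text to Windows-1252 before writing a .txt file.
// Naming the source encoding explicitly is the fix from the answer;
// the helper names are hypothetical.
function toWindows1252(string $utf8Text): string
{
    return mb_convert_encoding($utf8Text, 'Windows-1252', 'UTF-8');
}

function saveAsWindows1252(string $path, string $utf8Text): void
{
    // The bytes written are single-byte Windows-1252, e.g. "é" becomes 0xE9.
    if (file_put_contents($path, toWindows1252($utf8Text)) === false) {
        throw new RuntimeException("Could not write $path");
    }
}

// Usage (hypothetical path): saveAsWindows1252('signature.txt', $monTexteTXT);
```

An editor that assumes UTF-8 will show the lone 0xE9 byte as broken, which fits the false negative in the answer: the file itself is correct for a Windows-1252 reader such as Outlook.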

Convert txt file encoding from DOS737 to UTF8

I have a txt file that contains Greek characters. When I open the file with Notepad, it shows that the encoding is ASCII.
But the only way I can read the Greek characters is to change the character set to DOS737 (in OpenOffice Writer or EditPad Lite).
The process I need to implement in PHP is to open the file, split the text and import it into a database. Everything works except that I cannot get the Greek characters as they are.
I tried iconv, but with no result.
I also tried mb_convert_encoding($data[0], "DOS737"); but I get the warning mb_convert_encoding(): Unknown encoding "DOS737"
I also tried utf8_encode, but with no luck.
Any suggestions?

Finally found it.
It was easy... For anyone who might have the same issue, use iconv("cp737", "UTF-8", $string);
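Putting that call into the open/split/import workflow from the question, a minimal sketch could look like this. The function name readCp737Lines and the ";" delimiter are assumptions for illustration, and it relies on the iconv build supporting the CP737 code page (glibc's does).

```php
<?php
// Minimal sketch: decode a DOS737 (CP737) Greek text file to UTF-8, then split
// each line into fields. The ';' delimiter is an assumed example value.
function readCp737Lines(string $path, string $delimiter = ';'): array
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $records = [];
    while (($line = fgets($fh)) !== false) {
        // CP737 is a single-byte code page, so converting line by line is safe.
        $utf8 = iconv('cp737', 'UTF-8', rtrim($line, "\r\n"));
        if ($utf8 === false) {
            fclose($fh);
            throw new RuntimeException('iconv could not convert a line');
        }
        $records[] = explode($delimiter, $utf8);
    }
    fclose($fh);
    return $records; // UTF-8 fields, ready to insert into a UTF-8 database
}
```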

html_entity_decode in FPDF

I have the same problem as this post from two years ago on this website. I tried everything, but nothing helped.
Does anybody here have a working solution for this problem?
html_entity_decode in FPDF (using tFPDF extension)
I am using tFPDF to generate a PDF. The PHP file is UTF-8 encoded. I want &copy;, for example, to be output in the PDF as the copyright symbol.
I have tried iconv, html_entity_decode and htmlspecialchars_decode. When I take the string I am trying to decode, hard-code it into a different file and decode it, it works as expected. So for some reason it is not being output in the PDF. I have tried output buffering. I am using DejaVuSansCondensed.ttf (TrueType fonts).
Link to tFPDF: http://fpdf.org/en/script/script92.php
I am out of ideas. I tried double decoding, and I checked everywhere to make sure it was not being encoded anywhere else.
Help!
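The thread has no answer. One thing worth ruling out, offered as an assumption rather than a confirmed fix, is the charset argument: before PHP 5.4, html_entity_decode() defaulted to ISO-8859-1, so text meant for tFPDF's UTF-8 fonts should name 'UTF-8' explicitly. The hypothetical helper below does that and also repeats the decoding until nothing changes, to cover double-encoded input such as "&amp;copy;".

```php
<?php
// Minimal sketch: decode HTML entities to UTF-8 before handing text to tFPDF.
// decodeEntitiesForPdf() is hypothetical; the repeat loop assumes the input may
// be entity-encoded more than once.
function decodeEntitiesForPdf(string $s): string
{
    do {
        $previous = $s;
        // Pass the charset explicitly instead of relying on the PHP default.
        $s = html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    } while ($s !== $previous);
    return $s;
}
```

The trade-off of looping is that text which should legitimately display an entity (for example a literal "&amp;lt;") gets decoded too far; if that matters, decode exactly once.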

Save filename with unicode chars

I have searched all over the Internet and SO, and still have had no luck with the following:
I would like to know how to properly save a file using file_put_contents when the filename contains Unicode characters (the OS is Windows 7).
$string = "jérôme.jpg" ; //UTF-8 string
file_put_contents("images/" . $string, "stuff");
Results in a file:
jГ©rГґme.jpg
I tried all possible combinations of functions such as iconv and mb_convert_encoding with all possible encodings, and converted the source file into different encodings as well.
All the proper headers are set, and the browser recognises UTF-8 properly.
However, I can successfully copy-paste and create a file with such a name in Explorer's GUI, so how can I do it from PHP?
My last-resort solution was to urlencode the string and save the file under that name.

This might be late, but I just found a solution to this painful issue as well.
Forget about iconv and multibyte solutions; the problem is Windows itself! (In the link you'll find all its beauty.)
After numerous attempts to solve this, I came across URLify and decided that the best way to cope with Unicode in filenames is to transliterate them before writing the file.
Example of transliterating a filename before saving it:
$filename = "Αρχείο.php"; // greek name for 'file'
echo URLify::filter($filename,128,"",TRUE);
// output: arxeio.php
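If adding URLify is not an option, a hypothetical minimal fallback is a hand-written transliteration map followed by replacing anything outside a safe ASCII set. The map below is a deliberately tiny illustrative subset, not URLify's table, and asciiFilename is an invented name.

```php
<?php
// Hypothetical fallback when URLify is unavailable: transliterate a few known
// characters, then replace anything outside a safe ASCII set with "_".
function asciiFilename(string $name): string
{
    // Tiny illustrative map; extend it for the characters you actually meet.
    $map = [
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'à' => 'a', 'ô' => 'o', 'ü' => 'u',
        'Α' => 'A', 'α' => 'a', 'ρ' => 'r', 'χ' => 'x', 'ε' => 'e', 'ί' => 'i', 'ο' => 'o',
    ];
    $ascii = strtr($name, $map);
    // Without the /u flag this works byte by byte, so each byte of an
    // unmapped multibyte character becomes its own "_".
    return preg_replace('/[^A-Za-z0-9._-]/', '_', $ascii);
}

// Usage: file_put_contents('images/' . asciiFilename('jérôme.jpg'), 'stuff');
```

Unlike URLify::filter with its lowercase behaviour, this keeps the original case; lower-case the result if matching URLify's output matters.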

First letter disappear if it has an accent (CSV file, UTF-8 encoded)

I'm currently working on a web application written in PHP with Zend Framework. I need to translate every page into French and English, so I use CSV files to do it.
My problem is that when a word starts with an accented letter like É or À, the letter just disappears, but the rest of the word is displayed.
For example, if my CSV file contains Écriture, it displays criture. But if I have exécution, it displays exécution without any problems.
Every time I want to display text in my view, I just call <?php echo $this->translate('line to call in csv'); ?> and my text is displayed.
Like I said, my application is encoded in UTF-8, and I don't have any problems with special characters, except when they come first. I googled it but couldn't find anything so far.
Thanks in advance for your help!
UPDATE
I forgot to say that when I run my application in Zend's browser to debug it, everything's fine: my É displays. It's only in browsers like IE or FF that I have the problem.
UPDATE #2
I just found another post talking about fgetcsv, and it looks like the function I use to translate from my CSV file uses fgetcsv()... could that be the problem? And if it is, how can I fix it? It's coded like that in the Zend Translate library, and I'm not sure I want to start changing things there...
UPDATE #3
I continued my research and found reports of issues in PHP with UTF-8 encoded data. But Zend Framework uses UTF-8 by default, so I'm sure there is a way to make this work. I'm still searching, but I hope someone has the solution!

I had the same problem; I tried AJ's solution and it worked:
Missing first character of fields in csv
The problem seems to be that fgetcsv() depends on the locale settings; just use:
setlocale(LC_ALL, 'en_US.UTF-8');
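A sketch of applying that fix only around the parsing call, under two assumptions: the helper name parseUtf8CsvLine is hypothetical, and at least one of the listed UTF-8 locale names is installed (names differ between systems, so the fallbacks after 'en_US.UTF-8' are guesses). It switches LC_CTYPE rather than LC_ALL, a narrower choice than the answer's, so number formatting is left alone, and it restores the previous locale afterwards.

```php
<?php
// Minimal sketch: parse one CSV line with a UTF-8 LC_CTYPE so fgetcsv-style
// parsing does not drop a leading multibyte character such as "É".
function parseUtf8CsvLine(string $line, string $delimiter = ','): array
{
    $previous = setlocale(LC_CTYPE, '0'); // '0' only queries the current setting
    // Returns false (and changes nothing) if none of these locales is installed.
    setlocale(LC_CTYPE, 'en_US.UTF-8', 'en_US.utf8', 'C.UTF-8');
    try {
        return str_getcsv($line, $delimiter, '"', '\\');
    } finally {
        setlocale(LC_CTYPE, $previous); // restore the caller's locale
    }
}
```

In a Zend_Translate setup the same idea means calling setlocale() once during bootstrap, before the CSV adapter reads its files, instead of editing the library.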

In the .csv file content, try using ; as the delimiter and " as the enclosure,
something like this inside the .csv file:
"key1";"value1" ##first line
"key1";"value1" ##second line
"key1";"value1" ##third line
This solved a similar issue for me.
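For reading a file in that "key";"value" layout outside Zend, a minimal sketch with a hypothetical loadTranslations() helper (not Zend_Translate's own CSV adapter) passes ";" and '"' to fgetcsv() explicitly. A plausible reason the quoting helps is that the first byte fgetcsv() inspects in each field is then the ASCII '"', not the first byte of "É"; treat that explanation as an assumption.

```php
<?php
// Minimal sketch: load a "key";"value" translation file into an array.
// loadTranslations() is hypothetical and not part of Zend Framework.
function loadTranslations(string $path): array
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $map = [];
    while (($row = fgetcsv($fh, 0, ';', '"', '\\')) !== false) {
        // Skip blank lines (returned as [null]) and rows without a value.
        if (count($row) >= 2) {
            $map[$row[0]] = $row[1];
        }
    }
    fclose($fh);
    return $map;
}
```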

View the CSV file in a hex editor and make sure it is encoded correctly. In UTF-8:
"É" is 0xC3 0x89,
"À" is 0xC3 0x80
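The same check can be done from PHP without a hex editor. A minimal sketch with two hypothetical helpers: describeBytes() prints the leading bytes in hex, and looksLikeUtf8() uses mb_check_encoding() to report whether the bytes form valid UTF-8 at all. A Latin-1 "É" is the single byte 0xC9, which is not valid UTF-8 when followed by an ASCII letter.

```php
<?php
// Minimal sketch: inspect raw bytes the way a hex editor would.
function describeBytes(string $bytes, int $max = 16): string
{
    // Space-separated lowercase hex of the first $max bytes.
    return implode(' ', str_split(bin2hex(substr($bytes, 0, $max)), 2));
}

function looksLikeUtf8(string $bytes): bool
{
    return mb_check_encoding($bytes, 'UTF-8');
}

// Usage: echo describeBytes(file_get_contents('fr.csv')); // file name is hypothetical
```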

Do you have strtoupper(), ucfirst() or similar functions in your code? If so, try mb_strtoupper($str, 'UTF-8') instead.
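ucfirst() and strtoupper() work on single bytes, so they cannot uppercase a multibyte first letter such as "é" and, depending on the PHP version and locale, may mangle it. A minimal multibyte-safe replacement for ucfirst(), with the hypothetical name mbUcfirst, builds on mb_substr() and mb_strtoupper(); PHP 8.4 and later also ship a built-in mb_ucfirst().

```php
<?php
// Minimal sketch: a multibyte-safe ucfirst() built from mbstring functions.
function mbUcfirst(string $s, string $encoding = 'UTF-8'): string
{
    $first = mb_substr($s, 0, 1, $encoding);   // whole first character, not first byte
    $rest = mb_substr($s, 1, null, $encoding); // everything after it
    return mb_strtoupper($first, $encoding) . $rest;
}
```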
