Convert txt file encoding from DOS737 to UTF8 - php

I have a txt file that contains Greek characters. When I open the file in Notepad, it reports the encoding as ASCII.
But the only way I can read the Greek characters is to change the character set to DOS737 (in OpenOffice Writer or EditPad Lite).
The process I need to implement in PHP is to open the file, split the text and import it into a database. Everything works except that the Greek characters do not come through correctly.
I tried iconv, but with no result.
I also tried mb_convert_encoding($data[0], "DOS737"); but I get the warning mb_convert_encoding(): Unknown encoding "DOS737"
I also tried utf8_encode, but with no luck.
Any suggestions?

Finally found it.
It was easy... For anyone who might have the same issue, use iconv("cp737", "UTF-8", $string);
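A minimal sketch of how that call could be wrapped for the open/split/import process described in the question. The helper name legacyToUtf8 and the file name in the usage comment are hypothetical, and whether "CP737" is accepted depends on the iconv library PHP was built against.

```php
<?php
// Hypothetical helper: convert text from a legacy code page to UTF-8.
// The default source encoding is CP737 (the DOS Greek code page from the question).
function legacyToUtf8(string $text, string $from = 'CP737'): string
{
    $converted = iconv($from, 'UTF-8', $text);
    if ($converted === false) {
        // iconv returns false for unsupported encodings or invalid input bytes.
        throw new RuntimeException("iconv could not convert from $from");
    }
    return $converted;
}

// Example usage (the file name is a placeholder):
// foreach (file('greek.txt', FILE_IGNORE_NEW_LINES) as $line) {
//     $utf8Line = legacyToUtf8($line);
//     // ... split $utf8Line and insert it into the database ...
// }
```

Converting each line before splitting keeps the database import working on UTF-8 strings only.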

Related

How do I to export utf8mb4 mysql data using php to a csv file

I am looking for help on a csv file export.
I have a mysql database encoded as utf8mb4 (unicode_ci) with a table using collation utf8mb4_unicode_ci for my fields. The data contains special characters such as copyright symbols, foreign characters such as "é", etc. I am trying to export data to a csv file but the string values that contain special characters are not translating over properly. For example, the copyright symbol comes up as "¬Æ" in the csv file I generate.
My environment is Laravel 7, PHP 7 and MySQL 5.7 on Ubuntu 18.04. My database connection is already set up with charset = "utf8mb4" and collation = "utf8mb4_unicode_ci" in my Laravel database config file. The meta tag in my page header is already set to use charset=utf-8 and the header used to generate the csv file is set to:
header('Content-Type: text/csv; charset=utf-8');
I have tried using:
iconv("utf-8", "ascii//TRANSLIT//IGNORE", $mystring);
but this only replaces some of the values with ascii representations and not the proper symbols. I have also tried using something like
htmlspecialchars($mystring, ENT_QUOTES, "UTF-8");
but this still returns "®" for the copyright symbol and other strange character sequences in the csv file. When I echo the values in php, they appear correctly on my page. Am I right in thinking that I need to somehow convert the utf8mb4 string to regular utf-8 when I append the row to my csv file? I have not been able to find a solution and am looking for some help.
Can anyone tell me what I need to do to get the expected symbols in my csv file?
Jerry's comment
You don't show the code you use to actually write the file. Also, you don't say how you're inspecting the result (if you are using Excel, that could be the problem).
and Sammitch's comment
It's not that the data is not exporting properly, it's that the program that is reading or displaying it is not using the correct charset. You can try adding a UTF8 BOM \xEF\xBB\xBF to the beginning of the file and the program may use that as a signal to apply the correct charset. Failing that, look up how to open UTF8 CSVs properly in that program. Failing that you'll need to translate the data to a charset that the program does handle correctly.
were helpful. I was using Excel to preview the file. When I looked at the raw csv data in a code editor, the expected characters are there so it is something with the way Excel handles the file. Since I am working on a Mac and the © symbol is being entered with [Option] + [G], the é is [Option] + [E], etc. it would make sense that it could be a translation problem with how Excel reads the file. Adding \xEF\xBB\xBF to the beginning of the file seems to have done the trick!
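A minimal sketch of the BOM approach when the CSV is written with fputcsv. The helper name writeCsvWithBom and the sample rows are hypothetical; the original code that writes the file was not shown.

```php
<?php
// Hypothetical helper: write rows to a CSV file that starts with a UTF-8 BOM,
// so that programs such as Excel are more likely to read it as UTF-8.
function writeCsvWithBom(string $path, array $rows): void
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    // The three BOM bytes must be the very first bytes of the file.
    fwrite($handle, "\xEF\xBB\xBF");
    foreach ($rows as $row) {
        // Separator, enclosure and escape are passed explicitly.
        fputcsv($handle, $row, ',', '"', '\\');
    }
    fclose($handle);
}

// Example usage (placeholder path and data):
// writeCsvWithBom('export.csv', [['name', 'symbol'], ['Acme', '©']]);
```

The data itself is written unchanged; only the three leading bytes are added.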
If you stored utf8 values into a column declared latin1, fix that first.
Do not use any conversion routines.
Do verify the data in the tables using SELECT HEX(col) and SHOW CREATE TABLE.
More: Trouble with UTF-8 characters; what I see is not what I stored

Reading czech characters from a txt file using PHP

I’m having issues with reading Czech characters from a txt file.
I want to read .txt files containing categories line by line. With general languages I have no issue. I can read the txt file line by line and copy the categories that I want in an array.
But as soon as I want to read a txt file that contains categories in the Czech language I get problems processing the output of my code. The Czech specific characters are coming out rubbish even though the text file is showing the characters correctly.
As an example:
The letters ě, č, ů or ř are all output as a square, as st\u001b or as other rubbish, depending on the way I read the file.
Originally I used the fgets function to read a line from the text file.
But as this didn’t return the correct characters, I started testing with utf8_encode. While that changed some characters, it still didn’t restore all of them.
Then I started experimenting with mb_detect_encoding combined with mb_convert_encoding. Later I read somewhere that fgets could sometimes return incorrect characters, so I started testing with file_get_contents. This also didn’t solve the issue.
I assume the main issue is with the way I’m reading the txt file, as the output from the fgets and file_get_contents functions is garbled from the start.
Can anyone tell me how to read a text file with Czech characters correctly?
Thanks in advance.
OK, I found the solution myself. Just in case someone else runs into this issue: the txt file was in the wrong encoding. The file was in the "UCS-2 Little Endian" encoding. After loading the file in Notepad++ I could convert it to UTF-8, and that solved the problem.
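If re-saving the file in an editor is not an option, the same conversion can be sketched in PHP. This is an assumption-laden example, not what the poster did: it treats the file as UTF-16LE (which covers what Notepad++ calls "UCS-2 Little Endian"), and the helper name readUtf16LeLines is hypothetical. The whole file is converted before it is split into lines, because a line break in UTF-16LE is two bytes and splitting on a single newline byte would leave the following text misaligned.

```php
<?php
// Hypothetical helper: read a UTF-16LE text file and return its lines as UTF-8.
function readUtf16LeLines(string $path): array
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Drop the little-endian byte order mark (FF FE) if the file starts with one.
    if (substr($raw, 0, 2) === "\xFF\xFE") {
        $raw = substr($raw, 2);
    }
    $utf8 = iconv('UTF-16LE', 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException("File is not valid UTF-16LE: $path");
    }
    // Split only after conversion; \R matches \r\n, \n and \r.
    return preg_split('/\R/u', rtrim($utf8, "\r\n"));
}

// Example usage (placeholder file name):
// $categories = readUtf16LeLines('categories_cz.txt');
```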

Saving csv file in Latin 2 (CP852) encoding

I have to create a CSV (semicolon-separated) file as an export for a DOS-based Česká pošta system. That is not a problem in itself, but the file has to be strictly in CP852 encoding (an extended ASCII table, also called Latin 2, that includes the Czech special characters ěščřžýáíé etc.). So removing the diacritics is not a solution.
I tried many approaches and searched through Stack Overflow and Google, but found no working solution. The source is saved in UTF-8. I tried iconv, mb_convert_encoding and some other libraries, but nothing works right.
Closest is to use
iconv("UTF-8", "ISO-8859-2", 'abcěščřžýáíé');
In the browser this looks like it is correctly encoded as ISO-8859-2:
abc���������
(when I switch the browser encoding to ISO-8859-2, it shows the characters correctly):
abcěščřžýáíé
But when I write it to a file with fputs, read it back with fgets and run mb_detect_encoding on that string, it reports UTF-8 instead. Pošta's software is very strict about encoding, so I have to get it right.
When I simply use, for example,
$text = iconv("UTF-8", "ASCII", 'abcěščřžýáíé');
it removes all characters with diacritics, but that is not a solution.
Is anyone able to help with this? I am really stuck ...
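No answer is recorded for this question. One possible direction, sketched here under assumptions, is to pass CP852 itself as the iconv target instead of ISO-8859-2 and to check the resulting bytes with bin2hex rather than with mb_detect_encoding, which only guesses among its candidate encodings. The helper name encodeCp852Row is hypothetical, the CRLF line ending is an assumption about what a DOS system expects, and whether "CP852" is accepted depends on the iconv library PHP was built against.

```php
<?php
// Hypothetical helper: encode one semicolon-separated row as CP852 bytes.
// Fields are joined as-is; this sketch does not quote fields containing ';'.
function encodeCp852Row(array $fields): string
{
    $line = implode(';', $fields) . "\r\n"; // CRLF is an assumption for a DOS target
    $bytes = iconv('UTF-8', 'CP852', $line);
    if ($bytes === false) {
        // Happens when a character has no CP852 equivalent or CP852 is unsupported.
        throw new RuntimeException('Row cannot be represented in CP852');
    }
    return $bytes;
}

// Example usage (placeholder file name):
// $row = encodeCp852Row(['abc', 'áé']);
// echo bin2hex($row), "\n";                            // inspect the raw bytes
// file_put_contents('export.csv', $row, FILE_APPEND);  // written byte for byte
```

Each Czech letter should come out as a single byte; two bytes per letter in the hex dump would mean the text is still UTF-8.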

PHP : csv file encoding?

I have a stupid problem. I use some software to export .csv files, and the result is strangely formatted text. When I try to process the files in PHP, everything goes wrong.
When I copy and paste the text into MS Word, there is a strange character between each letter.
In PHP I tried to convert it using utf8_decode/utf8_encode, iconv("ISO-8859-1", "WINDOWS-1252", $str)... in vain.
I guess it's UTF-16 encoded text, but I'm not sure. I tried some functions to decode UTF-16, also in vain.
Does anyone have a solution to fix this?
Your guess is correct:
file -i NL_JGFR_130326_bac.csv
NL_JGFR_130326_bac.csv: text/plain; charset=utf-16le
You can probably use the PHP MultiByte extension to work with UTF-16:
http://php.net/manual/en/ref.mbstring.php
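A minimal sketch of what that could look like with mb_convert_encoding. The helper name utf16ToUtf8 is hypothetical. It picks the byte order from the BOM when one is present and otherwise assumes UTF-16LE, which is what file -i reported for this particular file.

```php
<?php
// Hypothetical helper: convert UTF-16 file contents to UTF-8 using mbstring.
function utf16ToUtf8(string $raw): string
{
    $bom = substr($raw, 0, 2);
    if ($bom === "\xFF\xFE") {
        // Little-endian BOM: strip it and convert the rest.
        return mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16LE');
    }
    if ($bom === "\xFE\xFF") {
        // Big-endian BOM.
        return mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16BE');
    }
    // No BOM: assumption for this sketch, treat the data as UTF-16LE.
    return mb_convert_encoding($raw, 'UTF-8', 'UTF-16LE');
}

// Example usage with the file from the answer:
// $csv = utf16ToUtf8(file_get_contents('NL_JGFR_130326_bac.csv'));
```

After the conversion, the usual string and CSV functions can work on the text.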

uploaded file contents being echoed out but not showing accent marks

INT. PALO TORCIDO HIGH SCHOOL, CAFETER�A - DAY
Hi, I uploaded a .txt file to my server and got the contents with fopen/fread, and also used file_get_contents just in case.
I can't seem to figure out how to encode the special characters...
In my HTML I have the charset set to UTF-8. I also tried a PHP header to use UTF-8 encoding.
What is the proper way to handle files with letters that are not part of the English alphabet?
Try utf8_encode()
echo utf8_encode(file_get_contents('file.txt'));
This works if the *.txt is encoded in Latin1. If other encoding may be used too, detect the encoding using mb_detect_encoding() and encode it to UTF8 with mb_convert_encoding()
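A hedged sketch of the same idea using mbstring only. Note that utf8_encode is deprecated as of PHP 8.2, and mb_convert_encoding with an explicit source encoding covers the same Latin1 case. The helper name ensureUtf8 is hypothetical, and the Latin1 fallback is an assumption: when the text is not valid UTF-8, the real source encoding has to be known or guessed.

```php
<?php
// Hypothetical helper: return $text as UTF-8, converting only when needed.
function ensureUtf8(string $text, string $assumed = 'ISO-8859-1'): string
{
    // Text that is already valid UTF-8 is returned unchanged,
    // which avoids double-encoding.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Otherwise assume the given legacy encoding (Latin1 by default).
    return mb_convert_encoding($text, 'UTF-8', $assumed);
}

// Example usage:
// echo ensureUtf8(file_get_contents('file.txt'));
```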
