Why does fgetcsv add extra characters between characters? - php

I use a PHP script with fgetcsv() to import data from CSV files that I did not create (I have no choice in the matter).
My problem is that the data is imported into my database with a special character between each character, in every field.
For example, "Mounted print" is imported as "�M�o�u�n�t�e�d� �P�r�i�n�t�"
I tried changing the encoding with utf8_encode/utf8_decode and iconv, with no result.
Any ideas?
Thanks
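The symptom described above is consistent with a file saved as UTF-16 being read as if it used a single-byte encoding: every ASCII character is then accompanied by a NUL byte, which is displayed as a replacement character. That is an assumption about the file, not something the question confirms. A minimal Python sketch of the effect, and of decoding the bytes as UTF-16 before parsing (the sample line is hypothetical):

```python
import csv
import io

# Hypothetical sample: one CSV line as an editor might save it in UTF-16.
raw = "Mounted print,42\n".encode("utf-16")  # BOM + 2 bytes per character

# Reading the bytes with a single-byte encoding exposes the interleaved NUL bytes.
misread = raw.decode("latin-1")
print(repr(misread))

# Decoding as UTF-16 first (the codec consumes the BOM) restores the text,
# which can then be parsed as ordinary CSV.
text = raw.decode("utf-16")
rows = list(csv.reader(io.StringIO(text)))
print(rows)
```

If this assumption holds, the PHP-side equivalent would be to convert the file contents from UTF-16 to UTF-8 (for example with iconv or mb_convert_encoding) before handing them to fgetcsv().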

Related

How do I export utf8mb4 MySQL data to a CSV file using PHP

I am looking for help on a csv file export.
I have a mysql database encoded as utf8mb4 (unicode_ci) with a table using collation utf8mb4_unicode_ci for my fields. The data contains special characters such as copyright symbols, foreign characters such as "é", etc. I am trying to export data to a csv file but the string values that contain special characters are not translating over properly. For example, the copyright symbol comes up as "¬Æ" in the csv file I generate.
My environment is Laravel 7, PHP 7 and MySQL 5.7 on Ubuntu 18.04. My database connection is already set up with charset = "utf8mb4" and collation = "utf8mb4_unicode_ci" in my Laravel database config file. The meta tag in my page header already uses charset=utf-8, and the header used to generate the csv file is set to:
header('Content-Type: text/csv; charset=utf-8');
I have tried using:
iconv("utf-8", "ascii//TRANSLIT//IGNORE", $mystring);
but this only replaces some of the values with ascii representations and not the proper symbols. I have also tried using something like
htmlspecialchars($mystring, ENT_QUOTES, "UTF-8");
but this still returns "®" for the copyright symbol and other strange character sequences in the csv file. When I echo the values in php, they appear correctly on my page. Am I right in thinking that I need to somehow convert the utf8mb4 string to regular utf-8 when I append the row to my csv file? I have not been able to find a solution and am looking for some help.
Can anyone tell me what I need to do to get the expected symbols in my csv file?
Jerry's comment
You don't show the code you use to actually write the file. Also, you don't say how you're inspecting the result (if you are using Excel, that could be the problem).
and Sammitch's comment
It's not that the data is not exporting properly, it's that the program that is reading or displaying it is not using the correct charset. You can try adding a UTF8 BOM \xEF\xBB\xBF to the beginning of the file and the program may use that as a signal to apply the correct charset. Failing that, look up how to open UTF8 CSVs properly in that program. Failing that you'll need to translate the data to a charset that the program does handle correctly.
were helpful. I was using Excel to preview the file. When I looked at the raw csv data in a code editor, the expected characters are there so it is something with the way Excel handles the file. Since I am working on a Mac and the © symbol is being entered with [Option] + [G], the é is [Option] + [E], etc. it would make sense that it could be a translation problem with how Excel reads the file. Adding \xEF\xBB\xBF to the beginning of the file seems to have done the trick!
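The BOM fix described above can be sketched in Python as follows. This is only an illustration of the byte layout, not the code used in the original export; the function name csv_bytes_with_bom and the sample rows are hypothetical.

```python
import csv
import io

def csv_bytes_with_bom(rows):
    """Serialize rows as UTF-8 CSV bytes, prefixed with the UTF-8 BOM."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    # The "utf-8-sig" codec prepends the bytes EF BB BF when encoding.
    return buffer.getvalue().encode("utf-8-sig")

# Hypothetical sample data containing non-ASCII characters.
data = csv_bytes_with_bom([["symbol", "name"], ["©", "café"]])
print(data[:3])
```

The rest of the payload is plain UTF-8; only the three leading bytes are added, which is the signal Excel can use to pick the right charset.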
If you stored utf8 values into a column declared latin1, fix that first.
Do not use any conversion routines.
Do verify the data in the tables using SELECT HEX(col) (with your own column name in place of col) and SHOW CREATE TABLE.
More: Trouble with UTF-8 characters; what I see is not what I stored

Python equivalent of php FILTER_FLAG_STRIP_HIGH

I am parsing a large, poor-quality data set that was converted from physical form using OCR, and I use PostgreSQL COPY to insert the .csv files into Postgres. Some records contain non-ASCII bytes that cause errors on import, since I want the data in UTF-8 varchar() columns; I believe that using a TEXT type column would not produce this error.
DataError: invalid byte sequence for encoding "UTF8": 0xd6 0x53
CONTEXT: COPY table_name, line 112809
I want to filter all these bytes before writing to the csv file.
I believe something like PHP's FILTER_FLAG_STRIP_HIGH (http://php.net/manual/en/filter.filters.sanitize.php) would work, since it removes all high ASCII values (> 127).
Is there such a function in python?
Encode your string to ASCII, ignoring errors, then decode that back to a string.
text = "ƒart"
text = text.encode("ascii", "ignore").decode()
print(text) # art
If you are starting with a byte string in UTF-8, then you just need to decode it:
bites = "ƒart".encode("utf8")
text = bites.decode("ascii", "ignore")
print(text) # art
This works specifically with UTF-8 because multi-byte characters always use values outside of the ASCII range, so partial characters are never stripped out. It mightn't work so well with other encodings.

PHP fputcsv does not display Chinese Character Correctly

I need your help to finish my project. I take the data from my JSON files, some of which contain Chinese characters, but when I write it to a .csv file it does not display properly.
This is my code
function writeCsv()
{
    $resource = fopen('c:/xampp/test.json','w');
    $csvBodyData = [ 'item'=> '逆始感録機政'];
    fputcsv($resource, $csvBodyData);
}
I have tried the following solution but it's still not working.
write utf-8 characters to file with fputcsv in php
I get the characters "???" instead.
In your case the problem was not in PHP. When you open a csv file in Excel, it shows you a window where you can set up CSV import options such as the delimiter and the encoding. You should choose UTF-8 encoding to view those Chinese characters.

php + fgetcsv does not support some special characters

I am uploading a CSV file and reading its contents with fgetcsv().
I am already using UTF-8 encoding, but some characters are still converted to "?".
Some of the affected characters:
ť č ň
Is there a way to read a CSV file so that special characters from any language are accepted?
How do I add the BOM while reading the CSV?
Try fgets instead of fgetcsv. fgetcsv() tries to be binary-safe, but it's actually not.

Convert txt file encoding from DOS737 to UTF8

I have a txt file that contains Greek characters. When I open the file with Notepad, it shows the encoding as ASCII.
But the only way I can read the Greek characters is to change the character set to DOS737 (in OpenOffice Writer or EditPad Lite).
The process I need to implement in PHP is to open the file, split the text and import it into a database. Everything works except that I cannot get the Greek characters as they are.
I tried iconv, but with no result.
I also tried mb_convert_encoding($data[0], "DOS737"); but I get the warning mb_convert_encoding(): Unknown encoding "DOS737"
I also tried utf8_encode, but with no luck.
Any suggestions?
Finally found it.
It was easy... For anyone that might have the same issue use iconv("cp737","UTF-8","$string");
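For comparison with the Python answers elsewhere on this page, the same conversion can be sketched in Python, whose standard codecs include "cp737". The helper name cp737_to_utf8 and the sample text are hypothetical and only serve the illustration.

```python
def cp737_to_utf8(raw: bytes) -> bytes:
    """Re-encode CP737 (DOS Greek) bytes as UTF-8."""
    return raw.decode("cp737").encode("utf-8")

# Hypothetical sample: Greek letters stored one byte each in CP737.
sample = "αβγ ΑΒΓ".encode("cp737")
converted = cp737_to_utf8(sample)
print(converted.decode("utf-8"))
```

As with iconv("cp737","UTF-8", ...), the key point is naming the source encoding explicitly; the bytes themselves carry no marker that identifies them as CP737.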
