I have a PHP script saved with UTF-8 encoding. It contains an array with special characters (like an n with a ~ on top), which looks fine in my editor. The script matches the array against text coming in from an HTML form and writes a CSV file. I write the file like this:
fwrite($fp,utf8_encode($data),strlen($data)+100);
When I open the file, it is reported as UTF-8 encoded, but the characters are all messed up.
Have you tried without using utf8_encode() on the data?
It seems that you are re-encoding something that is already UTF-8 encoded.
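To make the double-encoding point concrete, here is a minimal sketch. It assumes the mbstring extension is available; writeUtf8() is a hypothetical helper name, not something from the question. mb_convert_encoding() from ISO-8859-1 stands in for utf8_encode(), which does the same conversion but is deprecated as of PHP 8.2.

```php
<?php
// "ñ" as UTF-8 bytes, written as escapes so the example does not
// depend on the encoding this file is saved in.
$data = "\xC3\xB1";

// utf8_encode() treats every input byte as an ISO-8859-1 character and
// converts it to UTF-8. Applied to text that is already UTF-8, each byte
// of a multi-byte character is encoded again.
$doubleEncoded = mb_convert_encoding($data, 'UTF-8', 'ISO-8859-1');

echo bin2hex($data), "\n";          // bytes of the original string
echo bin2hex($doubleEncoded), "\n"; // bytes after the extra encoding step

// Hypothetical helper: validate the data and write it unchanged.
function writeUtf8($fp, string $data): int
{
    if (!mb_check_encoding($data, 'UTF-8')) {
        throw new InvalidArgumentException('Data is not valid UTF-8');
    }
    // The length argument of fwrite() is optional; without it the whole string is written.
    $written = fwrite($fp, $data);
    if ($written === false) {
        throw new RuntimeException('Write failed');
    }
    return $written;
}
```

If the hex dump of your own $data already shows multi-byte sequences such as c3b1, the data is UTF-8 and can be passed to fwrite() as is.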
Related
I am looking for help on a csv file export.
I have a mysql database encoded as utf8mb4 (unicode_ci) with a table using collation utf8mb4_unicode_ci for my fields. The data contains special characters such as copyright symbols, foreign characters such as "é", etc. I am trying to export data to a csv file but the string values that contain special characters are not translating over properly. For example, the copyright symbol comes up as "¬Æ" in the csv file I generate.
My environment is Laravel 7, PHP 7 and MySQL 5.7 on Ubuntu 18.04. My database connection is already set up with charset = "utf8mb4" and collation = "utf8mb4_unicode_ci" in my Laravel database config file. The meta tag in my page header already uses charset=utf-8, and the header used to generate the csv file is set to:
header('Content-Type: text/csv; charset=utf-8');
I have tried using:
iconv("utf-8", "ascii//TRANSLIT//IGNORE", $mystring);
but this only replaces some of the values with ascii representations and not the proper symbols. I have also tried using something like
htmlspecialchars($mystring, ENT_QUOTES, "UTF-8");
but this still returns "®" for the copyright symbol and other strange character sequences in the csv file. When I echo the values in php, they appear correctly on my page. Am I right in thinking that I need to somehow convert the utf8mb4 string to regular utf-8 when I append the row to my csv file? I have not been able to find a solution and am looking for some help.
Can anyone tell me what I need to do to get the expected symbols in my csv file?
Jerry's comment
You don't show the code you use to actually write the file. Also, you don't say how you're inspecting the result (if you are using Excel, that could be the problem).
and Sammitch's comment
It's not that the data is not exporting properly, it's that the program that is reading or displaying it is not using the correct charset. You can try adding a UTF8 BOM \xEF\xBB\xBF to the beginning of the file and the program may use that as a signal to apply the correct charset. Failing that, look up how to open UTF8 CSVs properly in that program. Failing that you'll need to translate the data to a charset that the program does handle correctly.
were helpful. I was using Excel to preview the file. When I looked at the raw csv data in a code editor, the expected characters are there so it is something with the way Excel handles the file. Since I am working on a Mac and the © symbol is being entered with [Option] + [G], the é is [Option] + [E], etc. it would make sense that it could be a translation problem with how Excel reads the file. Adding \xEF\xBB\xBF to the beginning of the file seems to have done the trick!
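For anyone landing here, a minimal sketch of the BOM approach. writeCsvWithBom() and the sample rows are made up for illustration; the asker's actual export code was not shown. The delimiter, enclosure and escape arguments are passed explicitly so the call does not rely on defaults.

```php
<?php
// Hypothetical helper: write rows to a CSV file that starts with a UTF-8 BOM,
// which Excel may use as a hint to read the file as UTF-8.
function writeCsvWithBom(string $path, array $rows): void
{
    $fp = fopen($path, 'w');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }

    // The BOM must be the very first bytes of the file.
    fwrite($fp, "\xEF\xBB\xBF");

    foreach ($rows as $row) {
        // The strings are written as they are; no charset conversion is applied.
        fputcsv($fp, $row, ',', '"', '\\');
    }

    fclose($fp);
}
```

When the CSV is streamed as a download rather than saved to disk, the same idea should apply with 'php://output' as the path, as long as nothing else is echoed before the BOM.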
If you stored utf8 values into a column declared latin1, fix that first.
Do not use any conversion routines.
Do verify the data in the tables using SELECT HEX(col) and SHOW CREATE TABLE.
More: Trouble with UTF-8 characters; what I see is not what I stored
I need your help to finish my project. I take the data from my JSON files, some of which contain Chinese characters, but when I write it to a .csv file it does not display properly.
This is my code
function writeCsv()
{
$resource = fopen('c:/xampp/test.json','w');
$csvBodyData = [ 'item'=> '逆始感録機政'];
fputcsv($resource, $csvBodyData);
}
I have tried the following solution but it's still not working.
write utf-8 characters to file with fputcsv in php
I got this character "???".
In your case the problem was not in PHP. When you open a csv file in Excel it shows you a window, where you can setup CSV importing options like delimiter and encoding. You should choose UTF-8 encoding to view those Chinese characters.
I have a txt file that contains Greek characters. When I open the file with Notepad, it shows the encoding as ASCII.
But the only way I can read the Greek characters is to change the character set to DOS737 (in OpenOffice Writer or EditPad Lite).
The process I need to implement in PHP is to open the file, split the text and import it into a database. Everything works except that I cannot get the Greek characters as they are.
I tried iconv, but with no result.
I also tried mb_convert_encoding($data[0], "DOS737"); but I get the warning mb_convert_encoding(): Unknown encoding "DOS737"
I also tried utf8_encode, but with no luck.
Any suggestions?
Finally found it.
It was easy... For anyone that might have the same issue use iconv("cp737","UTF-8","$string");
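A slightly fuller sketch of the same call, for reading a whole file. cp737ToUtf8() and readCp737Lines() are hypothetical helper names, and this assumes the iconv build on your system knows the CP737 charset (that can vary by platform).

```php
<?php
// Hypothetical helper: convert a CP737 (DOS Greek) byte string to UTF-8.
function cp737ToUtf8(string $bytes): string
{
    $utf8 = iconv('CP737', 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException('iconv could not convert the input from CP737');
    }
    return $utf8;
}

// Hypothetical helper: read a CP737 text file and return its lines as UTF-8 strings.
function readCp737Lines(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return array_map('cp737ToUtf8', $lines);
}
```

Converting once, right after reading, means the splitting and the database import can then work on UTF-8 strings throughout.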
The structure of this XML is corrupted when I "include" my database connection file.
As you can see, there are strange characters in the first line of the file ('╗ ┐' ╗ ┐).
However, they do not appear on the web, since they only appear when I use cmd.exe to type the file. Here is a screenshot of the offending file:
Here's the URL of the file:
http://web.wipix.com.br/aniversariantes.xml
In my PHP file, I have two "includes" in the files connection.php (connection to database) AND "serialize.php" to generate the XML.
This only works if I take out the "includes" and use everything on one page only. How can I fix this?
That is a byte order mark (Unicode character U+FEFF), but it is being displayed in an incorrect encoding. Since your document claims to be encoded as ISO-8859-1, there should not be a byte order mark.
Probably your xml file is in UTF-8 format with BOM.
http://en.wikipedia.org/wiki/Byte_order_mark
Remove the offending 8 bytes or save your XML without a BOM using a text editor.
If the XML is dynamically generated, you have to modify the generation code.
Moreover, the BOM bytes seem to be badly encoded. Probably the XML was converted in a wrong way and the BOM bytes were corrupted.
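If the BOM comes from the included PHP files themselves (files saved as "UTF-8 with BOM" emit those bytes before any output), a small sketch like the following can help find and remove it. Both function names are made up for illustration, and checking the included files is an assumption about the cause, not something confirmed in the question.

```php
<?php
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: remove one or more leading UTF-8 BOMs from a string.
function stripUtf8Bom(string $s): string
{
    while (strncmp($s, UTF8_BOM, 3) === 0) {
        // The cast covers older PHP versions, where substr() can return false.
        $s = (string) substr($s, 3);
    }
    return $s;
}

// Hypothetical helper: report whether a file on disk starts with a UTF-8 BOM.
function startsWithUtf8Bom(string $path): bool
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $head = fread($fp, 3);
    fclose($fp);
    return $head === UTF8_BOM;
}
```

Running startsWithUtf8Bom() on connection.php and serialize.php would show whether either file needs to be re-saved without a BOM.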
The odd stuff at the beginning could be a byte-order mark, but I'm not sure.
A byte-order mark is a byte sequence inserted at the beginning of a file to indicate its endianness, that is, whether the most significant byte comes first.
From your output, there are other weird characters (not text) in the file, so it is possible that the program inserted them in.
Hi guys, after 5 hours of research and trying everything I'm desperate, so I'm writing here.
I have an XML file coming from a third party. When I try to parse it with SimpleXMLElement, it says that the string is not valid XML, and I found out that this is due to the ANSI encoding of the file. I tried converting the file to UTF-8: the parser then reads it, but all my Cyrillic symbols are lost, replaced by meaningless characters.
Then, in Notepad++, I copied the content, created a new file with UTF-8 encoding, and pasted the content in: that file was fine and the parser read it. I tried to do the same in code without success: I read the contents of the file, create a new file that starts with the leading bytes of a UTF-8 file, and write the content, but when I open it I see meaningless characters instead of Cyrillic. Please help: I need to convert this file to UTF-8 that is valid for the XML parser, or to find another way to parse the XML into an array.
Try looking at
http://php.net/manual/en/function.utf8-decode.php
and
http://php.net/manual/en/function.iconv.php
You need to figure out what encoding the original XML file is in, then you can use iconv to convert it to UTF8.
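A rough sketch of that approach follows. loadXmlAsUtf8() is a hypothetical helper, and the source encoding is something you have to supply yourself; for Cyrillic "ANSI" files Windows-1251 (CP1251) is a common guess, but that is an assumption, not something known about your file.

```php
<?php
// Hypothetical helper: convert raw XML bytes from a known source encoding
// to UTF-8 and parse the result.
function loadXmlAsUtf8(string $raw, string $sourceEncoding): SimpleXMLElement
{
    $utf8 = iconv($sourceEncoding, 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException("Could not convert from $sourceEncoding to UTF-8");
    }

    // If the XML declaration names an encoding, make it match the converted bytes.
    $fixed = preg_replace(
        '/^(<\?xml[^>]*encoding=)(["\'])[^"\']*\2/i',
        '${1}${2}UTF-8${2}',
        $utf8,
        1
    );
    if ($fixed === null) {
        throw new RuntimeException('Could not rewrite the XML declaration');
    }

    // SimpleXMLElement throws an Exception if the string is still not valid XML.
    return new SimpleXMLElement($fixed);
}
```

Usage would look like loadXmlAsUtf8(file_get_contents($path), 'CP1251'), with the encoding name replaced by whatever the file really uses. Writing UTF-8 leading bytes in front of the content does not convert anything, which is probably why that attempt produced meaningless characters.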