I am doing the following steps to import data into my e-commerce shop:
Convert the Excel sheet to CSV in Excel
Open the CSV file in Notepad++ and convert it to UTF-8
Import the CSV file in phpMyAdmin
If I look at the front end of the webpage, the French characters are displayed as ?. The charset of the page is utf-8.
If I change the charset to iso-8859-1 everything displays correctly.
If I check the item in phpMyAdmin, the accents are displayed correctly.
How come utf-8 is not displaying them correctly? I thought it should display é etc.
If I go to the back end of the website and edit the product, the French description displays properly in the WYSIWYG editor. If I then save the product, the French characters show correctly. But this is because the WYSIWYG editor is converting the characters to HTML entities.
A common issue when collecting Unicode data is leaving the connection and database/table/column character set configured as ISO-8859-1, but then inserting data that is actually utf-8. The database is essentially told, "here's some 8859-1-encoded data, store it in this 8859-1 table". It doesn't do any conversions because it doesn't realize the data isn't in 8859-1. So the data is utf-8 but the database has essentially been told it's in 8859-1.
It's an insidious problem because, as you say, the database will convert the characters wrongly if you change your charset to UTF-8: it will convert the "8859-1" data (remember, the database thinks it's 8859-1) to utf-8, a conversion that of course garbles the text, as the data really is in utf-8 already.
So basically the problem is that phpMyAdmin is set to 8859-1: you told it to store the data as 8859-1, told it you were providing 8859-1 data, and then gave it utf-8 data. The database thinks it's 8859-1, so the only easy ways to solve the problem are to a) keep acting like it's 8859-1 even though it's not, and hope you never have to deal with sorting, searching, collation, etc. (may work in your case), or b) pull out the data as 8859-1 (leaving it unconverted), then re-insert it after setting the database and connection to utf-8 so the database knows what character set the data really is in.
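To make the byte-level picture concrete, here is a minimal sketch. It is written in Python only because it is easy to run; it is not the shop's code, and both function names are hypothetical stand-ins for what the database does.

```python
def store_mislabelled(text: str) -> bytes:
    """Simulate a client sending UTF-8 bytes over a connection declared as ISO-8859-1.

    The database stores the bytes untouched, because it believes they are
    already ISO-8859-1 and sees nothing to convert.
    """
    return text.encode("utf-8")


def read_with_utf8_connection(stored: bytes) -> str:
    """Simulate reading the same bytes after switching the connection to UTF-8.

    The database now converts what it believes is ISO-8859-1 text to UTF-8,
    so each byte of the original UTF-8 sequence becomes its own character.
    """
    return stored.decode("iso-8859-1")


stored = store_mislabelled("é")
print(stored)                             # the two UTF-8 bytes that make up é
print(read_with_utf8_connection(stored))  # two characters where one was intended
```

As long as everything keeps pretending the data is 8859-1, the bytes pass through unchanged, which is why option a) appears to work.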
Hope that makes sense. Let me know if it doesn't. This is a hard one to wrap your head around.
You might consider opening your csv with PHP (since you mention it in your tags), and use utf8_encode on the fields before saving them with queries.
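For illustration, the same idea as a small Python sketch (PHP's utf8_encode assumes its input is ISO-8859-1, and so does this example by default). The function name reencode_csv is made up, and the sketch assumes the CSV really is ISO-8859-1; running it on a file that is already UTF-8 would double-encode the accents.

```python
import csv
import io


def reencode_csv(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode CSV bytes from source_encoding and return the same rows as UTF-8 bytes."""
    # newline="" lets the csv module handle line endings itself.
    reader = csv.reader(io.StringIO(raw.decode(source_encoding), newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out)
    for row in reader:
        writer.writerow(row)
    return out.getvalue().encode("utf-8")


latin1_csv = "id,name\r\n1,intérêt\r\n".encode("iso-8859-1")
print(reencode_csv(latin1_csv))
```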
This question is so old, but changing the encoding of the file from ISO-8859-1 to UTF-8 in various programs such as Excel etc was not working for me.
My issue is that words like intérêt show up as intÃ©rÃªt in the file.
In case this helps someone, here is what finally worked for me:
Starting with a CSV file, open in Notepad
Click "File > Save As".
In the dialog window that appears - select "ANSI" from the "Encoding" field. Then click "Save".
That's it! Opening this new CSV file using Excel should now show the non-English characters properly.
I know this question might be redundant, but I had to go through them and I cannot solve my problem.
1 - The data is in the DB and stored in UTF-8 encoding.
2 - The connection charset also had been set to UTF8.
3 - I have tried manually encoding the value as UTF-8 while printing, using encode_utf8($value), where $value is the Chinese text.
I used PHP 5.621, and PHPExcel class from http://www.codeplex.com/PHPExcel .
The result is still showing the Chinese character as "赛维网络".
I am not printing the result on the page. I am creating a new xlsx file and printing onto it. The Chinese character is unreadable on the xlsx file, not on the page. On the page yes I can use the meta charset. But the problem is in the xlsx file.
Any advice for the best solution?
If your content is already UTF-8 and you encode it again you certainly will get broken characters.
So, just omit the third step.
I noticed some people with non-Western characters had issues with XLSX/Excel 2007 too, so maybe it's a bug. Check the PHPExcel GitHub tracker.
Hello guys, I'm trying to retrieve stored Arabic text from my SQL database. The data arrives successfully, but not as Arabic; it comes out like this:
NON Arabic characters
Can anyone help?
here is my code
Suppose the database tables were set to Latin-1:
1-Export the data as Latin-1. Because MySQL knows that the table is already using a Latin-1 encoding, it will do a straight export of the data without trying to convert the data to another character set. If you try to export as UTF-8, MySQL appears to attempt to convert the (supposedly) Latin-1 data to UTF-8 – resulting in double encoded characters (since the data was actually already UTF-8).
2-Change the character set in the exported data file from ‘latin1’ to ‘utf8’. Since the dumped data was not converted during the export process, it’s actually UTF-8 encoded data.
3-Create your new table as UTF-8. If your CREATE TABLE command is in your SQL dump file, change the character set from ‘latin1’ to ‘utf8’.
4-Import your data normally. Since you’ve got UTF-8 encoded data in your dump file, the declared character set in the dump file is now UTF-8, and the table you’re importing into is UTF-8, everything will go smoothly.
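Steps 2 and 3 can be sketched as a small text rewrite. This is only an illustration in Python: relabel_dump is a hypothetical helper, the patterns cover the common SET NAMES and CHARSET declarations but not every form a real dump may contain (COLLATE clauses, for example, are left alone), and the result should be checked by hand before importing.

```python
import re


def relabel_dump(dump_sql: str) -> str:
    """Rewrite latin1 charset declarations in a dump to utf8, leaving row data alone."""
    patterns = [
        # SET NAMES latin1
        (r"(?i)\b(SET NAMES\s+)latin1\b", r"\1utf8"),
        # [DEFAULT] CHARSET=latin1 / CHARACTER SET latin1
        (r"(?i)\b((?:DEFAULT\s+)?(?:CHARSET|CHARACTER SET)\s*=?\s*)latin1\b", r"\1utf8"),
    ]
    for pattern, replacement in patterns:
        dump_sql = re.sub(pattern, replacement, dump_sql)
    return dump_sql


dump = (
    "/*!40101 SET NAMES latin1 */;\n"
    "CREATE TABLE t (name VARCHAR(50)) ENGINE=InnoDB DEFAULT CHARSET=latin1;\n"
    "INSERT INTO t VALUES ('latin1 text');\n"
)
print(relabel_dump(dump))
```

When applying this to a real file, read and write it with the same single-byte encoding (for example `encoding="latin-1"` and `newline=""` on both `open()` calls) so the row bytes themselves are not altered.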
I have a column in my UTF-8 MySQL table that is datatype 'longtext'. When I display the string on a charset=UTF-8 page in PHP, I get a unicode character (� or U+FFFD) occasionally. Example:
"None of these adjustments affects existing force structure or military capabilities, and the efficiencies will further enable U.S. European Command to resource high priority missions,"� Pentagon Press Secretary Navy Rear Adm. John Kirby said in the release.
I have tried wrapping my string in preg_replace() and html_entity_decode(), to replace the unicode character with nothing, but without much luck:
$content = html_entity_decode(preg_replace("/U\+([0-9A-F]{4,5})/", "", $getstory[0]['content']), ENT_NOQUOTES, 'UTF-8');
As a side note, this issue doesn't occur with new data inserted into this table column, only with older data.
Any suggestions?
Try to change the encoding of your php file to utf8. This can be done in your editor, somewhere at Tools - Character Encoding, and change it to UTF-8.
If you can't find it, open the file in Notepad and go to File - Save As. In the save dialog, below the file name, there is an option to choose the character encoding in which to save the file.
EDIT:
It looks like you want to change your database charset. Go to phpmyadmin and there you can change it for your database, and for each table separately
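One plausible cause, consistent with only the older rows being affected, is that those rows contain Windows-1252 bytes such as curly quotes, which are not valid UTF-8 and get rendered as U+FFFD. That is an assumption about the data, not something the question confirms. Note also that the regex in the question matches the literal text "U+FFFD", not the character itself. A small Python sketch of both points, using a made-up byte string:

```python
REPLACEMENT_CHAR = "\ufffd"

# Hypothetical old row: saved as Windows-1252, where byte 0x94 is a closing curly quote.
old_bytes = b"missions,\x94 Pentagon Press Secretary"

# Read as UTF-8, the lone 0x94 byte is invalid and becomes U+FFFD.
as_utf8 = old_bytes.decode("utf-8", errors="replace")

# Read as Windows-1252, the same byte is the intended closing quote.
as_cp1252 = old_bytes.decode("cp1252")

# Stripping the character itself (rather than the text "U+FFFD") hides the
# symptom, but the original quote is lost.
cleaned = as_utf8.replace(REPLACEMENT_CHAR, "")

print(as_utf8)
print(as_cp1252)
print(cleaned)
```

If the assumption holds, converting the old rows from Windows-1252 to UTF-8 once is a better fix than stripping characters at display time.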
I have a MySQL table & fields that are all set to UTF-8. The thing is, a previous PHP script, which was in charge of the database writing, was using some other encoding, not sure whether it is in the script itself, the MySQL connection or somewhere else. The result is that although the table & fields are set to UTF-8, we see the wrong chars instead of Chinese.
It looks like that:
Now, the previous scripts (which were in charge of the writing and corrupted the data) can read it fine for some reason, but my new script, which is all encoded in UTF-8, shows chars like ½©. How can that be fixed?
By the sound of it, you have a utf8 column but you are writing to it and reading from it using a latin1 connection, so what is actually being stored in the table is mis-encoded. Your problem is that when you read from the table using a utf8 connection, you see the data that's actually stored there, which is why it looks wrong. You can fix the mis-encoded data in the table by converting to latin1, then back to utf8 via the binary character set (three steps in total).
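The round trip described here can be illustrated outside the database. A minimal Python sketch, assuming exactly one round of "UTF-8 bytes read as Latin-1" and using a hypothetical helper name; MySQL's latin1 is really closer to Windows-1252, so a few bytes can behave differently from this strict Latin-1 version:

```python
def repair_mojibake(garbled: str) -> str:
    """Undo one round of 'UTF-8 bytes read as Latin-1'.

    Encoding to Latin-1 recovers the original bytes (the 'binary' step),
    and decoding those bytes as UTF-8 restores the intended text.
    """
    return garbled.encode("latin-1").decode("utf-8")


print(repair_mojibake("intÃ©rÃªt"))
```

Text that was never mis-encoded will usually raise an error here instead of being silently changed, which makes the function reasonably safe to try on a copy of the data first.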
The original database was in a Chinese encoding – GB-18030 or similar, not Latin-1 – and the bytes that make up these characters, when displayed in UTF-8, show up as a bunch of Latin diacritics. Read each string as GB-18030, convert it to UTF-8, and save.
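If that diagnosis is right, the conversion itself is short. A Python sketch with a hypothetical function name and a sample string chosen only for illustration; it assumes you can get at the raw stored bytes and that they really are GB-18030:

```python
def gb18030_to_utf8(raw: bytes) -> bytes:
    """Interpret raw bytes as GB-18030 text and return the same text as UTF-8 bytes."""
    return raw.decode("gb18030").encode("utf-8")


# Sample bytes standing in for what the old script wrote to the table.
raw = "赛维网络".encode("gb18030")
print(gb18030_to_utf8(raw).decode("utf-8"))
```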
So, I have built on this system for quite some time, and it is currently outputting Latin1 (ISO-8859-1) to the web browser. These are the components:
MySQL - all data is stored with the Latin1 character set
PHP - All PHP text files are stored on disk with Latin1 encoding
HTML - The output has the http-equiv="content-type" content="text/html; charset=iso-8859-1" meta tag
So, I'm trying to understand how the encoding of the different parts come into play in my workflow. If I open a PHP script and change its encoding within the text editor to UTF-8 and save it back to disk and reload the web browser, the text is all messed up - unless the text comes from the DB. If I change the encoding of the DB to UTF-8 and keep the PHP files in latin1 I have to use utf8_decode() for the data to display correctly. And if I change the HTML code the browser will read it incorrectly.
So yeah, I realise that if I want to "upgrade" to UTF8, I have to update all three parts of this setup for it to work correctly, but since it's a huge system with some 180k lines of PHP code and millions of posts in a lot of databases/tables, I don't want to start something like this without understanding everything correctly.
What haven't I thought about? What could mess this up beyond fixing? What are the procedures for changing the encoding of an entire MySQL installation and what's the easiest way to change the encoding of hundreds or thousands of PHP files on disk?
The META tag is luckily added dynamically, so I'll change that in one place only :)
Let me hear about your experiences with this.
It's tricky.
You have to:
change the DB and every table character set/encoding – I don't know much about MySQL, but see here
set the client encoding to UTF-8 in PHP (SET NAMES UTF8) before the first query
change the meta tag and possibly the Content-type header (note the Content-type header has precedence)
convert all the PHP files to UTF-8 w/out BOM – you can easily do that with a loop and iconv.
the trickiest of all: you have to change most of your string function calls. That means mb_strlen instead of strlen, mb_substr instead of substr and $str[index], etc.
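The file-conversion step above uses iconv in a loop; the same thing can be sketched in Python. The function name and defaults are hypothetical, and the sketch assumes every matching file really is ISO-8859-1. Run it on a copy, and only once, since a second pass would double-encode the files.

```python
from pathlib import Path


def convert_tree_to_utf8(root: Path, pattern: str = "*.php",
                         source_encoding: str = "iso-8859-1") -> int:
    """Rewrite every matching file under root from source_encoding to UTF-8 without a BOM."""
    converted = 0
    for path in root.rglob(pattern):
        text = path.read_bytes().decode(source_encoding)
        # The plain "utf-8" codec never writes a BOM ("utf-8-sig" would).
        path.write_bytes(text.encode("utf-8"))
        converted += 1
    return converted
```

Usage would look like `convert_tree_to_utf8(Path("/path/to/project"))`, which returns the number of files rewritten.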
Don't convert to UTF8 if you don't have to. It's not worth the trouble.
UTF8 is (becoming) the new standard, so for new projects I can recommend it.
Functions
Certain function calls don't work anymore. For latin1 it's:
echo htmlentities($string);
For UTF8 it's:
echo htmlentities($string, ENT_COMPAT, 'UTF-8');
strlen(), substr(), etc. aren't aware of multibyte characters.
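The difference between counting bytes and counting characters is easy to see in a few lines. This Python sketch is only an analogy: operating on the encoded bytes mimics what byte-oriented functions like strlen() and substr() see, while operating on the string mimics the mb_* functions.

```python
text = "intérêt"
utf8_bytes = text.encode("utf-8")

char_count = len(text)        # characters, like mb_strlen
byte_count = len(utf8_bytes)  # bytes, like strlen on a UTF-8 string

# Cutting after 4 bytes splits the two-byte é in half, leaving invalid UTF-8.
byte_cut = utf8_bytes[:4].decode("utf-8", errors="replace")
# Cutting after 4 characters keeps é intact.
char_cut = text[:4]

print(char_count, byte_count)
print(byte_cut)
print(char_cut)
```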
MySQL
mysql_set_charset('UTF8') or mysql_query('SET NAMES UTF8') will convert all text coming from the database (SELECTs) to UTF8. It will also convert incoming strings (INSERT, UPDATE) from UTF8 to the encoding of the table.
So for reading from a latin1 table it's not necessary to convert the table encoding.
But certain characters are only available in unicode (like the snowman ☃, iPhone emoticons, etc) and can't be converted to latin1. (The data will be truncated)
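A quick way to see which characters cannot survive the trip into latin1 is to try the encoding directly. This Python sketch only illustrates the encoding limit: Python substitutes ? for an unrepresentable character, whereas what MySQL does with it depends on the server, as noted above. The function name is made up.

```python
def to_latin1(text: str) -> bytes:
    """Encode text as Latin-1, substituting ? for characters Latin-1 cannot represent."""
    return text.encode("latin-1", errors="replace")


print(to_latin1("é"))  # é exists in Latin-1, so it survives as a single byte
print(to_latin1("☃"))  # the snowman does not, so it is replaced

try:
    "☃".encode("latin-1")  # strict mode refuses instead of substituting
except UnicodeEncodeError as err:
    print("cannot encode:", err.reason)
```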
Scripts
I try to avoid special characters in my PHP scripts / templates.
I use the &euml; notation instead of ë etc. This way it doesn't matter whether the file is saved in latin1 or utf8.
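The entity approach can also be automated, so templates stay pure ASCII. A Python sketch with a hypothetical helper name; it emits numeric references (&#235; is the numeric form of the named entity for ë), which browsers render the same way:

```python
def to_ascii_html(text: str) -> str:
    """Replace every non-ASCII character with a numeric HTML character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_ascii_html("Citroën"))
```

The output contains only ASCII, so it reads the same whether the file is saved as latin1 or utf8. It does not escape <, > or &, so it is not a substitute for htmlentities() on untrusted input.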