I have a MySQL table & fields that are all set to UTF-8. The thing is, a previous PHP script, which was in charge of the database writing, was using some other encoding, not sure whether it is in the script itself, the MySQL connection or somewhere else. The result is that although the table & fields are set to UTF-8, we see the wrong chars instead of Chinese.
It looks like that:
Now, the previous scripts (which were in charge of the writing and corrupted the data) can read it well for some reason, but my new script which all encoded in UTF-8, shows chars like ½©. How can that be fixed?
By the sound of it, you have a utf8 column but you are writing to it and reading from it using a latin1 connection, so what is actually being stored in the table is mis-encoded. Your problem is that when you read from the table using a utf8 connection, you see the data that's actually stored there, which is why it looks wrong. You can fix the mis-encoded data in the table by converting to latin1, then back to utf8 via the binary character set (three steps in total).
The original database was in a Chinese encoding – GB-18030 or similar, not Latin-1 – and the bytes that make up these characters, when displayed in UTF-8, show up as a bunch of Latin diacritics. Read each string as GB-18030, convert it to UTF-8, and save.
Related
Heelo guys , i'm trying to retrieve a stored arabic information from my sql database , the data has arrived successfully , but not arabic , it came like that :
NON Arabic characters
any one can help ?
here is my code
we suppose database tables were set to a Latin-1
1-Export the data as Latin-1. Because MySQL knows that the table is already using a Latin-1 encoding, it will do a straight export of the data without trying to convert the data to another character set. If you try to export as UTF-8, MySQL appears to attempt to convert the (supposedly) Latin-1 data to UTF-8 – resulting in double encoded characters (since the data was actually already UTF-8).
2-Change the character set in the exported data file from ‘latin1’ to ‘utf8’. Since the dumped data was not converted during the export process, it’s actually UTF-8 encoded data.
3-Create your new table as UTF-8 If your CREATE TABLE command is in your SQL dump file, change the character set from ‘latin1’ to ‘utf8’.
4-Import your data normally. Since you’ve got UTF-8 encoded data in your dump file, the declared character set in the dump file is now UTF-8, and the table you’re importing into is UTF-8, everything will go smoothly.
According to the official MySQL manual the collation used defines the order of records when sorting alphabetically:
http://dev.mysql.com/doc/refman/5.0/en/charset-general.html
However: I have a PHP script (UTF-8) and I save some foreign characters in my MySQL database it's saved all weird (first row). This is when the collation I choose is latin1_swedish_ci. When I change the collation to utf8_unicode_ci all is good (second row).
When saving this data everything is exactly the same except for the collation.
So how about that "collation is used solely for sorting records"?
How someone can clarify this for me :-) Thanks in advance!
It appears that the charset of your connection is not set right, therefore the conversion from the programming language charset to the database is not correct.
You should set the charset in your connection, then both will workfine.
as pointed out in the comments a little explanation on how things work.
when you have not set the character set in your connections, the server assumes it to be the same as the collocation of the database. when data is recieved in a another encoding, the data is written nevertheless. just with wrong or other characters than they have been in the encoding of the data from the script.
as long as nothing changes, the script gets back the same data as it has written and everything appears to be fine.
however when either the connection encoding or the database encoding is changed at this point, the already stored data gets converted to the new encoding. the problem here is that the source data is not in the encoding that is assumend when converting.
all encodings share the ascii set with the same bits, thats why ascii charactes dont mess up. only special charaters do.
so you have to set your conneciton encoding in order to dont produce the mess that you are already in.
now what can you do about the data you already have?
you can make a dump of your database using mysqldump and use the --skip-set-charset option. then you get a plaintext file. in this plane text file replace all occurences of the actual database charset with the one the data is really in (the one you had in your script when you wrote the data).
then save the file and make sure your editor does not do any conversion (i recommend vim).
then import that file and you will get a database with data in the correct encoding. then you can change the encoding however you like and as long as your conneciton charset gets set also you will be fine from now on.
also make sure that the mysql server has the charsets installed, but it should have that already.
this is only my approach, i have cleaned up a lot of messed up installations like that. most of which at some point have garbled characters in their projects (after switching server, updating or restoring a backup...).
turns out not setting the connection charset is something that is very often forgotten.
I recenly had problem in importing latin1_swedish database into new one. Somone made Latin1 Database to store Latin2 characters. It was all working till I made database dump and wanted to import it to another database.
It's really complicated. In the end I corrected sql dump to proper ISO-8859-2 Encoded file with all characters displaying correctly. Still import into tables with Latin2 encoding didn't work, all special characters were lost (maybe its a PHPMyAdmin bug?).
Converting file to UTF-8 encoding and changing table encoding to utf8_general_ci imported everything correctly.
Next, whole PHP site uses and displays ISO-8859-2 characters (its old PHPBB forum).
While connecting to Database I use "SET NAMES latin2" command to change encoding.
To my surprise, page displays as proper ISO-8859-2.
If table is UTF-8 and Set names is latin2. Does MySQL connection convert characters into ISO-8859-2 before returning them???
(didnt know if I shoud write it all or not. Edit it if I put too much not needed info)
SET NAMES effectively sets how the data is translated before being stored or after recalled, prior to presenting to the client. For the case of storage, the character set definition of the column is the ultimate determining factor (if it differs from table, and database character set definition). See this informative blog post about encoding in MySQL.
I have a Mysql database with all tables collated as 'utf8_unicode_ci'.
Also all data I wrote to the Database with php was encoded in utf8.
But I forgot to set the mysql connection encoding to utf8, so it probably defaulted to ISO-8859.
For a long time this was not a problem. Although special characters where displayed wrong in Tools like phpMyAdmin, the data was correct when loading it into my php application, as long as I kept using the wrong connection encoding.
But now I need to use my database from another application, that (correctly) does not use ISO-8859 as connection encoding and gets broken special characters.
Now I want to convert my database so I can use the right connection encoding.
I already tried this:
mysql wrong connection encoding
But I does not help for me. The closest I got to a solution was 'ut8_decode(utf8_decode($data))'.
But this breaks fields that start with a special character.
Additional Information:
So what might happen is the following:
My application sends some utf8 Data to the database.
Mysql gets the data but thinks (due to the connection encoding) that it is not utf8, and converts it, to fit for the 'utf8_unicode_ci' collation.
When my php application reads the data from the database mysql seems to undo the previous conversion so everything looks fine again from my php app.
There was a table in latin1 and site in cp1252
I want to have table in utf8 and site in utf-8
I've done:
1) on web page: Content-Type: text/html;charset=utf-8
2) Mysql: ALTER TABLE XXX CONVERT TO CHARACTER SET utf8
_
This SQL doesn't work as I want - it doesn't convert ä & ü characters in database to their multibyte equivalents
Please Help.
Tanks
As this blog post says, using MySQL's ALTER TABLE CONVERT syntax is A Bad Idea [TM]. Export your data, convert the table and then reimport the data, as described in the blog post.
Another idea: Have you set your default client connection charset via /etc/my.cnf or mysqli::set-charset .
I've been a fool. SET NAMES was missing.
What I know now:
1) Every time the charset of a column is changed, the actual data is ALWAYS recoded! Change field to binary to see that.
2) The charset of a column is prior!, the table and db charset follow in the priority. They are used mainly for setting defaults. (not 100% sure about last sentence)
3) SET NAMES is very important. German characters can come in latin1 and be placed get correctly in utf8 table(recoded by Mysql silently) when you SET NAMES correctly. The server can send data to a web page in the encoding you desire, no matter what the table encoding is. It can be recoded for output
4) If there is a column in encoding A and a column in encoding B, and you compare them (or use LIKE), the Mysql will silently convert them so that it looks like they are in one encoding
5) Mysql is smart. It never operates with text as with bytes unless the type is binary. It always operates as characters! He wants that ё in latin1 would equal ё in utf8 if he knows the data encoding
Since you claim you now get s**t back, it suggests that the characters were modified in the database.
How are you accessing the data in mysql? If you are using a programming interface such as PHP, then you may need to tell that interface what character encoding to expect.
For example, in PHP you will need to call something like mysql_set_charset("utf8"); but it can also be done with an SQL query of SET NAMES utf8
You will then also need to make sure that whatever is displaying the text knows it is utf8 and is rendering with an appropriate encoding. For example, on a web page you would need to set the content type to utf-8. something like Content-Type: text/html;charset=utf-8