convert(cast(convert('$username' using utf8) as binary) using latin1)
This is what I do in my MySQL query.
I have a string that was encoded as utf8 but recorded into MySQL as latin1.
Now I read out the latin1, but I want to retrieve it as utf8 and display it as utf8.
I tried mb_convert_encoding, utf8_encode and utf8_decode, all to no avail.
How can I restore the original utf8 with PHP?
Step 1: Decide which case you have:
How mangling happens
Step 2: Decide what you want to do. You seem to want to leave the table messed up but retrieve the data. It would be better to fix the data, then retrieve the data without contortions.
I have a MySQL database with its collation set to latin1_swedish_ci, but my site uses the windows-1256 encoding. This means the data inside the tables is encoded as windows-1256.
What is the correct way to convert my database tables/fields and data to utf-8 using iconv or any other library?
First, you need to verify that the data in the table(s) is really latin1. Could you run SELECT HEX(col), col ... to see what it looks like?
Which steps to perform depends on whether the stored bytes are latin1, utf8, or something else. (If you do these steps without knowing, you could make things worse.)
These references give you the next steps:
http://dev.mysql.com/doc/refman/5.0/en/alter-table.html and/or
http://mysql.rjweb.org/doc.php/charcoll
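The HEX check suggested above can be read by hand, but here is a rough sketch of the same reasoning in Python. The function name classify_hex is made up for illustration, and the check is only a heuristic: it assumes any double encoding went through cp1252 (which is what MySQL calls latin1), and short strings can be ambiguous.

```python
def classify_hex(hex_string: str) -> str:
    """Guess how a value is encoded from its SELECT HEX(col) output (heuristic)."""
    raw = bytes.fromhex(hex_string)
    if raw.isascii():
        # Plain ASCII looks the same in latin1, windows-1256 and utf8.
        return "ascii"
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: most likely raw single-byte data (latin1, windows-1256, ...).
        return "single-byte"
    try:
        # If the decoded characters map back to cp1252 bytes that are
        # themselves valid UTF-8, the value was probably encoded twice.
        text.encode("cp1252").decode("utf-8")
        return "double-encoded utf8"
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "utf8"


word = "سلام"
print(classify_hex(word.encode("cp1256").hex()))
print(classify_hex(word.encode("utf-8").hex()))
print(classify_hex(word.encode("utf-8").decode("cp1252").encode("utf-8").hex()))
```

The three calls cover the three cases that matter here: windows-1256 bytes stored as-is, correct utf8, and utf8 that was encoded a second time.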
I'm not sure whether someone has asked this question before, but I didn't find it. I hadn't set the connection charset in PDO while the table collation was utf8_persian_ci, so all data has been stored as unreadable characters like Ø³Ù„Ø§Ù…, which is سلام in Persian.
Before setting the charset by adding mysql:charset=utf8mb4; to the PDO DSN I was able to retrieve all data correctly, but now I see Ø³Ù„Ø§Ù… instead of سلام in the browser.
My website is a blog, and now it seems I have to re-enter all the texts and posts so that they are saved correctly. That's a disaster!
I used mb_detect_encoding() on both Ø³Ù„Ø§Ù… and سلام and found out that both of them are UTF-8. It is rather funny to search for "How to convert utf8 to utf8?", and of course I get no useful result.
Is there any way to convert Ø³Ù„Ø§Ù… to سلام using MySQL? If not, I thought another way could be to use PHP to read the old data, convert it, and insert it into the database again.
What should I do?
You can simply do:
UPDATE my_table SET my_column = BINARY CONVERT(my_column USING latin1)
(where latin1 is the character set in which your connection was set at the time of insertion).
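To see why that works, here is a small sketch of the same round trip outside MySQL. It assumes the wrong connection charset was MySQL's latin1, which corresponds to cp1252 in Python; repair_mojibake is a hypothetical helper name, not part of any library.

```python
def repair_mojibake(text: str, wrong_charset: str = "cp1252") -> str:
    """Undo one round of 'UTF-8 bytes read as wrong_charset'."""
    # Step 1: turn the mangled characters back into the bytes they came from.
    raw = text.encode(wrong_charset)
    # Step 2: read those bytes as what they really are, UTF-8.
    return raw.decode("utf-8")


original = "سلام"
# Simulate the damage: UTF-8 bytes interpreted as cp1252 characters.
mangled = original.encode("utf-8").decode("cp1252")
print(mangled)
print(repair_mojibake(mangled))
```

The UPDATE does the equivalent inside the server: CONVERT ... USING latin1 gets the original bytes back, and BINARY stops MySQL from converting them again when they are written into the utf8 column. Try it on a copy of the table first.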
I recently had a problem importing a latin1_swedish database into a new one. Someone had used a Latin1 database to store Latin2 characters. It all worked until I made a database dump and wanted to import it into another database.
It's really complicated. In the end I corrected the SQL dump into a properly ISO-8859-2 encoded file with all characters displaying correctly. Still, importing into tables with Latin2 encoding didn't work; all special characters were lost (maybe it's a phpMyAdmin bug?).
Converting the file to UTF-8 and changing the table collation to utf8_general_ci imported everything correctly.
Next, the whole PHP site uses and displays ISO-8859-2 characters (it's an old phpBB forum).
When connecting to the database I use the "SET NAMES latin2" command to change the encoding.
To my surprise, the page displays as proper ISO-8859-2.
If the table is UTF-8 and SET NAMES is latin2, does the MySQL connection convert characters into ISO-8859-2 before returning them?
(I didn't know whether I should write all of this or not. Edit it if I included too much unneeded info.)
SET NAMES tells the server which character set the client sends and expects, so data is converted from that character set before being stored and back into it after being read, before it is presented to the client. For storage, the character set definition of the column is the ultimate determining factor (it may differ from the table and database character set definitions). See this informative blog post about encoding in MySQL.
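As a rough model of that conversion (not MySQL's actual code), the sketch below treats the column as utf8 and the connection as latin2. The function name send_to_client and the sample text are only illustrative.

```python
def send_to_client(stored: bytes, column_charset: str, client_charset: str) -> bytes:
    """Model of the server converting a column value for the connection charset."""
    text = stored.decode(column_charset)
    # Characters the client charset cannot represent become '?'.
    return text.encode(client_charset, errors="replace")


stored = "Zażółć gęślą jaźń".encode("utf-8")         # bytes in a utf8 column
wire = send_to_client(stored, "utf-8", "iso-8859-2")  # after SET NAMES latin2
print(len(stored), len(wire))
print(wire.decode("iso-8859-2"))
```

The bytes on the wire are shorter than the stored ones because every Polish letter is one byte in ISO-8859-2 but two bytes in UTF-8, which is why the old pages keep displaying correctly.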
I've got JSON data. There are Cyrillic strings in the JSON file, like this one:
\u0418\u0432\u0430\u043D\u043E\u0432 \u0418.
When I decode the JSON and put this data into a database table, I get the string
Ð˜Ð²Ð°Ð½Ð¾Ð² Ð˜.
I entered this string on a decoding website and got a very good result (the one I need):
Иванов И.
The site also told me that it was converted from CP1252 to UTF-8.
So I tried to convert data from json after decoding manually using
mb_convert_encoding ( $string, "UTF-8","windows-1252");
mb_convert_encoding ( $string, "UTF-8","CP1252");
and
iconv("windows-1252","UTF-8",$string);
iconv("CP1252","UTF-8",$string);
Each of these functions made the string in the database table look like
Øòðýþò ÃËœ.
or
Øòðýþò Ø.
Neither of these is decoded properly by the site mentioned above. So the question is, how do I convert this string?
Upd: used this sql request:
ALTER DATABASE logenterprise
CHARACTER SET utf8
After that I tried the same things described above; the result is the same.
Also tried this just in case:
alter table mytable convert to CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Curse you damned encodings ^^
They gave me a hard time too.
Everything looked fine (database, encoding of the input data and on the website), but I still got cryptic chars in my tables. So what's the problem then? It's the connection to your database server.
Fortunately you can fix this with a simple query.
Right after establishing the mysql-connection you need to execute the following query:
mysql_query("SET NAMES 'utf8'");
Voilà. When you execute your INSERT-Query the data gets nicely saved in your db.
This saved my ass many times when I was handling 'Umlauts' and the €-sign.
Note: You shouldn't use mysql_xxx functions anymore, as they are deprecated (and were removed in PHP 7). I just used them in the example to make the code clearer.
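For anyone wondering what actually goes wrong without SET NAMES, here is a small simulation. It is only a model under one assumption: that the connection defaults to MySQL's latin1 (cp1252) while the client sends UTF-8 and the column is utf8. store_via_connection is a made-up name.

```python
import json


def store_via_connection(client_bytes: bytes, connection_charset: str) -> bytes:
    """Model: the server reads incoming bytes using the connection charset,
    then stores the resulting characters as UTF-8 in a utf8 column."""
    return client_bytes.decode(connection_charset).encode("utf-8")


# The JSON escapes decode to a normal Unicode string.
name = json.loads(r'"\u0418\u0432\u0430\u043D\u043E\u0432 \u0418."')
client_bytes = name.encode("utf-8")  # what a UTF-8 script sends

wrong = store_via_connection(client_bytes, "cp1252")  # connection left at latin1
right = store_via_connection(client_bytes, "utf-8")   # after SET NAMES 'utf8'

print(wrong.decode("utf-8"))  # mangled: every byte became its own character
print(right.decode("utf-8"))  # the original name
```

In the wrong case each UTF-8 byte is treated as a separate cp1252 character and encoded again, so the table really contains the cryptic text. Running iconv or mb_convert_encoding from CP1252 to UTF-8 in PHP only adds another round of the same damage.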
I have a MySQL table & fields that are all set to UTF-8. The thing is, a previous PHP script, which was in charge of the database writing, was using some other encoding, not sure whether it is in the script itself, the MySQL connection or somewhere else. The result is that although the table & fields are set to UTF-8, we see the wrong chars instead of Chinese.
It looks like that:
Now, the previous scripts (which were in charge of the writing and corrupted the data) can read it fine for some reason, but my new script, which is entirely encoded in UTF-8, shows chars like ½©. How can that be fixed?
By the sound of it, you have a utf8 column but you are writing to it and reading from it using a latin1 connection, so what is actually being stored in the table is mis-encoded. Your problem is that when you read from the table using a utf8 connection, you see the data that's actually stored there, which is why it looks wrong. You can fix the mis-encoded data in the table by converting to latin1, then back to utf8 via the binary character set (three steps in total).
The original database was in a Chinese encoding – GB-18030 or similar, not Latin-1 – and the bytes that make up these characters, when displayed in UTF-8, show up as a bunch of Latin diacritics. Read each string as GB-18030, convert it to UTF-8, and save.
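If that is indeed the case, the conversion itself is short. A minimal sketch, assuming the raw bytes really are GB-18030 (check the hex of a known string first); gb18030_to_utf8 is just an illustrative name.

```python
def gb18030_to_utf8(raw: bytes) -> bytes:
    """Read bytes as GB-18030 and re-encode them as UTF-8."""
    return raw.decode("gb18030").encode("utf-8")


raw = "中文".encode("gb18030")      # bytes as the old script wrote them
as_latin = raw.decode("latin-1")   # what a Latin-1 viewer would display
utf8 = gb18030_to_utf8(raw)

print(as_latin)
print(utf8.decode("utf-8"))
```

If the bytes are not valid GB-18030, decode raises UnicodeDecodeError, which is a useful signal that the data is in some other state (for example double-encoded through latin1) and needs a different fix.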