convert utf8 characters using PHP or MySQL - php

I'm not sure whether someone has asked this before, but I couldn't find it. I hadn't set the connection charset in PDO, and the table collation was utf8_persian_ci, so all data was stored as unreadable characters like Ø³Ù„Ø§Ù…, which is سلام in Persian.
Before setting the charset by adding mysql:charset=utf8mb4; to the PDO DSN, I was able to retrieve all data correctly, but now I see Ø³Ù„Ø§Ù… instead of سلام in the browser.
My website is a blog, and now it seems I have to re-enter all the texts and posts so that they are saved correctly. That's a disaster!
I used mb_detect_encoding() on both Ø³Ù„Ø§Ù… and سلام and found that both of them are UTF-8. It is funny to search for "How to convert utf8 to utf8?", and of course I get no useful result.
Is there any way to convert Ø³Ù„Ø§Ù… to سلام using MySQL? If not, another way could be to use PHP to read the old data, convert it, and insert it into the database again.
What should I do?

You can simply do:
UPDATE my_table SET my_column = BINARY CONVERT(my_column USING latin1)
(where latin1 is the character set your connection was using at the time of insertion).
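The asker also considered doing the conversion in PHP. The following is a minimal sketch of the same byte-level repair, assuming the mbstring extension is loaded and that MySQL's latin1 corresponds to Windows-1252. The function name repairDoubleEncoded is hypothetical and not part of the answer above.

```php
<?php
declare(strict_types=1);

/**
 * Undo one round of "UTF-8 bytes read as latin1 and re-encoded as UTF-8".
 * Returns the input unchanged when it does not look double-encoded.
 * Assumption: MySQL's latin1 is treated as Windows-1252.
 */
function repairDoubleEncoded(string $text): string
{
    // Map each character back to the single byte it was mistaken for.
    $bytes = mb_convert_encoding($text, 'Windows-1252', 'UTF-8');

    // Characters outside Windows-1252 become "?", so check the mapping was lossless.
    $lossless = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252') === $text;

    // Accept the result only if the recovered bytes form valid UTF-8.
    return ($lossless && mb_check_encoding($bytes, 'UTF-8')) ? $bytes : $text;
}

// Simulate the damage: treat the UTF-8 bytes of the word as Windows-1252 text.
$original = "\u{0633}\u{0644}\u{0627}\u{0645}"; // the Persian word from the question
$mangled  = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');

var_dump(repairDoubleEncoded($mangled) === $original);
```

Because the function leaves already-correct text untouched, it could be run over a mix of old and new rows, but testing it on a backup first is still advisable.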

Related

Rgd How to Convert cast in php

convert(cast(convert('$username' using utf8) as binary) using latin1)
This is what I do for my MySQL query.
I have a string that is encoded as utf8 but was recorded into MySQL as latin1.
Now I read out the latin1, but I want to retrieve it as utf8 and display it as utf8.
I tried mb_convert, utf8_encode and utf8_decode, all to no avail.
How can I restore the original utf8 with PHP?
Step 1: Decide which case you have (see "How mangling happens").
Step 2: Decide what you want to do. You seem to want to leave the table messed up but retrieve the data. It would be better to fix the data, then retrieve the data without contortions.
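One way to decide which case applies is to look at the stored bytes character by character. This is a rough sketch, assuming mbstring and PHP 7.4 or later (for mb_str_split); the helper name hexPerCharacter is made up for illustration. A correctly stored é is the two bytes c3 a9, a double-encoded one is the four bytes c3 83 c2 a9, and a lone e9 byte would mean raw latin1.

```php
<?php
declare(strict_types=1);

/**
 * Return the hex bytes of each character in a UTF-8 string.
 * Hypothetical helper for eyeballing how a value is really stored.
 */
function hexPerCharacter(string $utf8): array
{
    return array_map('bin2hex', mb_str_split($utf8, 1, 'UTF-8'));
}

$correct = "\u{00E9}"; // é stored as proper UTF-8
// Simulated double encoding: the UTF-8 bytes are misread as Windows-1252.
$double = mb_convert_encoding($correct, 'UTF-8', 'Windows-1252');

print_r(hexPerCharacter($correct)); // one character, two bytes
print_r(hexPerCharacter($double));  // two characters, four bytes in total
```

On the MySQL side, SELECT HEX(my_column) gives the same kind of view of what is actually stored.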

mysql shows chinese characters like squares

I want to save Chinese characters in a MySQL db. The charset is set to UTF8 when connecting to the db, and the field's charset is also utf8, with collation utf8_general_ci.
But instead of the word, it shows squares. I use sqlyog.
One more thing: if I run the query and echo the word in the browser, it shows the right Chinese word.
So I am wondering why it shows the correct word in the browser when in the db it looks like squares.
I am afraid that when exporting or importing in the future I may have some data loss.
Thanks
Your data might be stored correctly in the DB, but read wrongly by sqlyog.
I haven't used sqlyog, but this problem might be caused by the way sqlyog connects to MySQL. Look for connection parameters in sqlyog that are related to the character set and make sure they are also utf8.
I had a similar problem when I had to insert Latin characters into the database. I used mb_convert_encoding($str, 'utf8', 'HTML-ENTITIES') and it got stored correctly in the database, and when I had to show it in the HTML page I just set the encoding to utf-8.
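Note that the 'HTML-ENTITIES' pseudo-encoding of mb_convert_encoding() is deprecated as of PHP 8.2. A small sketch of the same idea with html_entity_decode() follows; the sample input is invented for illustration.

```php
<?php
declare(strict_types=1);

// Numeric character references, as a form or an old export might deliver them.
$encoded = '&#20013;&#25991;';

// Decode the entities into real UTF-8 characters before storing the value.
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Show the resulting UTF-8 bytes (three bytes per character here).
echo bin2hex($decoded), PHP_EOL;
```

The decoded string can then be inserted over a connection whose charset is utf8 or utf8mb4.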

Different Characters returned via remote database connection utf8 than local connection

I am trying to retrieve foreign language UTF8 data via a remote mysql database connection. When I retrieve the data remotely, the utf8 doesn't appear properly in the browser. However, when I retrieve the data via a local database connection, both on the live site, and on the local testing machine, the characters appear correctly in the browser.
My remote connection is from wamp local server to the online live website.
For every page I have set:
header('Content-Type: text/html; charset=utf-8');
I've also tried to set UTF-8 meta tag. I also have UTF8 specified in .htaccess as the default charset.
It's an older website so am still using mysqli. I have also tried setting:
$mysqli->set_charset("utf8");
For example, with the remote connection Français is appearing as FranÃ§ais.
I have no idea what to do with this. I have spent hours trying to figure it out, but to no avail. I know it's the norm to ask for code, but there is just so much code, that I can't include it all here.
Thanks!
Your solution: on the remote database, the data has been encoded to utf8 twice, which yields incorrect results. There is no problem in your code; that database is at fault. You can fix it there (if it's a varchar; make a backup first!): convert the column to latin1 first, then to binary, then to utf8. A working SQL Fiddle shows how; I'll paste the code here too in case sqlfiddle removes it at some point in the future:
-- database column correctly defined as utf8
CREATE TABLE base (col VARCHAR(128) CHARSET utf8);
-- wrong data is entered:
INSERT INTO base SELECT UNHEX('4672616EC383C2A7616973');
-- first, convert back to latin-1, we have now proper utf-8 data, but in a latin1 column
ALTER TABLE base MODIFY COLUMN col VARCHAR(128) CHARSET latin1;
-- convert to binary first, so MySQL leaves the bytes as is without conversion
ALTER TABLE base MODIFY COLUMN col VARBINARY(128);
-- then convert to the proper character set, which will leave the bytes once again intact
ALTER TABLE base MODIFY COLUMN col VARCHAR(128) CHARSET utf8;
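To see where the hex string in the INSERT comes from, the double encoding can be reproduced in PHP. This is only a sketch, assuming mbstring is available; it uses ISO-8859-1, which is sufficient for this particular word.

```php
<?php
declare(strict_types=1);

$clean = "Fran\u{00E7}ais"; // correct UTF-8

// Double encoding: the UTF-8 bytes are misread as latin1 and encoded again.
$double = mb_convert_encoding($clean, 'UTF-8', 'ISO-8859-1');
echo strtoupper(bin2hex($double)), PHP_EOL;

// The reverse step, which is what the column conversions achieve.
$fixed = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
var_dump($fixed === $clean);
```

The first line printed should match the value passed to UNHEX() above.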
I made it work by adding the following to the script that calls the remote database:
$mysqli->set_charset("latin1");
I don't know if it's a bit of a hack, because it still means the chars are probably not encoded or collated correctly, but it works. Thanks Wrikken for showing me the character set modifications, I can try to use those here in the future to correct things properly.

MySQL SET NAMES - working mechanizm explanation?

I recently had a problem importing a latin1_swedish database into a new one. Someone had made a Latin1 database to store Latin2 characters. It was all working until I made a database dump and wanted to import it into another database.
It's really complicated. In the end I corrected the SQL dump into a proper ISO-8859-2 encoded file with all characters displaying correctly. Still, the import into tables with Latin2 encoding didn't work; all special characters were lost (maybe it's a phpMyAdmin bug?).
Converting the file to UTF-8 encoding and changing the table encoding to utf8_general_ci imported everything correctly.
Next, the whole PHP site uses and displays ISO-8859-2 characters (it's an old phpBB forum).
When connecting to the database I use the "SET NAMES latin2" command to change the encoding.
To my surprise, the page displays as proper ISO-8859-2.
If the table is UTF-8 and SET NAMES is latin2, does the MySQL connection convert characters into ISO-8859-2 before returning them?
(I didn't know whether I should write it all or not. Edit it if I put in too much unneeded info.)
SET NAMES effectively sets how the data is translated before being stored or after recalled, prior to presenting to the client. For the case of storage, the character set definition of the column is the ultimate determining factor (if it differs from table, and database character set definition). See this informative blog post about encoding in MySQL.
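The effect can be imitated in PHP. The sketch below emulates, with mbstring, the conversion the server performs between a utf8 column and a latin2 connection; it is an illustration, not MySQL's own code, and the sample word is invented.

```php
<?php
declare(strict_types=1);

// What a utf8 column holds for a Polish word.
$stored = "\u{017C}\u{00F3}\u{0142}w";

// Reading with SET NAMES latin2: the server converts to ISO-8859-2 before sending.
$sentToClient = mb_convert_encoding($stored, 'ISO-8859-2', 'UTF-8');
echo bin2hex($sentToClient), PHP_EOL; // one byte per character

// Writing with SET NAMES latin2: incoming ISO-8859-2 is converted for the utf8 column.
$writtenToColumn = mb_convert_encoding($sentToClient, 'UTF-8', 'ISO-8859-2');
var_dump($writtenToColumn === $stored);
```

This is why an ISO-8859-2 page can sit on top of a utf8 table as long as the connection charset is declared honestly.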

Mysql: latin1-> utf8. Convert characters to their multibyte equivalents

There was a table in latin1 and site in cp1252
I want to have table in utf8 and site in utf-8
I've done:
1) on web page: Content-Type: text/html;charset=utf-8
2) Mysql: ALTER TABLE XXX CONVERT TO CHARACTER SET utf8
This SQL doesn't work as I want: it doesn't convert the ä and ü characters in the database to their multibyte equivalents.
Please help.
Thanks
As this blog post says, using MySQL's ALTER TABLE CONVERT syntax is A Bad Idea [TM]. Export your data, convert the table and then reimport the data, as described in the blog post.
Another idea: have you set your default client connection charset via /etc/my.cnf or mysqli::set_charset?
I've been a fool. SET NAMES was missing.
What I know now:
1) Every time the charset of a column is changed, the actual data is ALWAYS recoded! Change the field to binary to see that.
2) The charset of a column takes priority; the table and db charsets follow in priority. They are used mainly for setting defaults. (Not 100% sure about the last sentence.)
3) SET NAMES is very important. German characters can come in as latin1 and be placed correctly in a utf8 table (recoded by MySQL silently) when you SET NAMES correctly. The server can send data to a web page in the encoding you desire, no matter what the table encoding is. It can be recoded for output.
4) If there is a column in encoding A and a column in encoding B, and you compare them (or use LIKE), MySQL will silently convert them so that it looks like they are in one encoding.
5) MySQL is smart. It never treats text as bytes unless the type is binary. It always operates on characters! It wants ё in latin1 to equal ё in utf8, provided it knows the data encoding.
Since you claim you now get s**t back, it suggests that the characters were modified in the database.
How are you accessing the data in mysql? If you are using a programming interface such as PHP, then you may need to tell that interface what character encoding to expect.
For example, in PHP you will need to call something like mysql_set_charset("utf8"); but it can also be done with an SQL query of SET NAMES utf8
You will then also need to make sure that whatever is displaying the text knows it is utf8 and renders it with the appropriate encoding. For example, on a web page you would need to set the content type to utf-8, with something like Content-Type: text/html;charset=utf-8
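The mysql_* extension mentioned above was removed in PHP 7.0, so on current PHP the charset is set through PDO or mysqli instead. The following is a minimal sketch; the helper mysqlDsn, the host name and the database name are placeholders chosen for illustration.

```php
<?php
declare(strict_types=1);

/** Build a PDO MySQL DSN that fixes the connection charset (hypothetical helper). */
function mysqlDsn(string $host, string $dbName, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

$dsn = mysqlDsn('localhost', 'blog');
echo $dsn, PHP_EOL;

// With real credentials, the connection and the page header would be set up as:
// $pdo = new PDO($dsn, $user, $password);
// header('Content-Type: text/html; charset=utf-8');
// For mysqli, the equivalent call is $mysqli->set_charset('utf8mb4');
```

Setting the charset in the DSN keeps the declaration in one place, so every query on that connection uses the same encoding.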
