used wrong mysql connection encoding - need to convert data - php

I have a MySQL database with all tables collated as 'utf8_unicode_ci'.
All the data I wrote to the database with PHP was encoded in UTF-8.
But I forgot to set the MySQL connection encoding to utf8, so it probably defaulted to ISO-8859-1 (latin1).
For a long time this was not a problem. Although special characters were displayed wrong in tools like phpMyAdmin, the data was correct when loaded into my PHP application, as long as I kept using the wrong connection encoding.
But now I need to use my database from another application that (correctly) does not use ISO-8859-1 as its connection encoding, and it gets broken special characters.
Now I want to convert my database so I can use the right connection encoding.
I already tried this:
mysql wrong connection encoding
But it does not help me. The closest I got to a solution was 'utf8_decode(utf8_decode($data))'.
But this breaks fields that start with a special character.
Additional Information:
So what might happen is the following:
My application sends some UTF-8 data to the database.
MySQL gets the data but thinks (due to the connection encoding) that it is not UTF-8, and converts it to fit the 'utf8_unicode_ci' collation.
When my PHP application reads the data from the database, MySQL seems to undo the previous conversion, so everything looks fine again from my PHP app.
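The round trip described above can be reproduced without a database. The following is a minimal sketch, assuming the connection defaulted to latin1 and approximating MySQL's latin1 with PHP's ISO-8859-1 (MySQL's latin1 is closer to Windows-1252, so bytes in the 0x80-0x9F range can behave differently). The variable names are illustrative only.

```php
<?php
// UTF-8 bytes the application sends for "é": 0xC3 0xA9.
$sent = "\xC3\xA9";

// Over a latin1 connection the server treats each byte as one latin1
// character and converts it to utf8 for the utf8_unicode_ci column.
$stored = mb_convert_encoding($sent, 'UTF-8', 'ISO-8859-1');

// Reading over the same latin1 connection converts utf8 back to latin1,
// which happens to restore the original bytes.
$readLatin1 = mb_convert_encoding($stored, 'ISO-8859-1', 'UTF-8');

// A client with a utf8 connection receives the stored bytes unchanged.
$readUtf8 = $stored;

echo bin2hex($sent), "\n";       // c3a9
echo bin2hex($stored), "\n";     // c383c2a9
echo bin2hex($readLatin1), "\n"; // c3a9
echo bin2hex($readUtf8), "\n";   // c383c2a9
```

In this sketch a single utf8-to-latin1 conversion restores the original bytes, which is what the old application was relying on without knowing it.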

Related

PHP/MySQL Encoding

I have a website with Arabic content which has been migrated from a different server. On the old server everything was displaying correctly, and supposedly everything was encoded in UTF-8.
On the current server, the data started displaying incorrectly, showing Ù†Ø¨Ø°Ø© Ø¹Ù† and similar characters.
The application is built on the CakePHP framework.
After many trials, I changed the 'encoding' parameter in the MySQL connection array to 'latin1'. For people who don't know CakePHP, this sets MySQL's connection encoding. Setting this value to UTF8 did not change anything, even after the steps described below.
Some of the records started showing correctly in Arabic, while others remained gibberish.
I have already gone through all the database and server checks, confirming that:
The database created is UTF-8.
The table is UTF-8.
The columns are not explicitly set to any encoding, thus encoded in UTF-8.
Default Character set in PHP is UTF-8
mysql.cnf settings default to UTF-8
After that, I retrieved my data and looped through it, printing the encoding of each string (from each row) using mb_detect_encoding. The rows that are displaying correctly are returning UTF8 while it is returning nothing for the rows that are corrupt.
The data of the website has been edited multiple times, possibly with different encodings; this is something I cannot know for sure. What I can confirm, though, is that the only two encodings this data might have passed through are UTF-8 and latin1.
Is there any possible way to recover the data when mb_detect_encoding is not returning anything and the current dataset is unknown?
UPDATE: I have found out that while the database was active on the new server, the my.cnf was updated.
The below directive was changed:
character-set-server=utf8
To
default-character-set=utf8
I am not sure how much this makes a difference though.
Checking the modified dates, I can conclude with some certainty that the data I could recover was not edited on the new server, while the data I couldn't retrieve has been edited.
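When mb_detect_encoding returns nothing, it can help to sort the rows into a few categories before attempting any repair. The following is a rough sketch, not a reliable detector: classifyEncoding is a hypothetical helper, it assumes the strings were read over a utf8 connection, and it assumes the only possible damage is UTF-8 bytes that were misread as latin1 (treated here as Windows-1252).

```php
<?php
// Hypothetical helper: guess how a string read over a utf8 connection
// is encoded. Returns 'ascii', 'invalid', 'double-encoded' or 'utf8'.
function classifyEncoding(string $value): string
{
    if (!preg_match('/[^\x00-\x7F]/', $value)) {
        return 'ascii';   // pure ASCII looks the same in both encodings
    }
    if (!mb_check_encoding($value, 'UTF-8')) {
        return 'invalid'; // e.g. raw latin1 bytes
    }
    // Map every character back to a single Windows-1252 byte.
    $bytes = mb_convert_encoding($value, 'Windows-1252', 'UTF-8');
    // If mapping forward again does not reproduce the input, some
    // character had no Windows-1252 byte, so this is ordinary UTF-8.
    if (mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252') !== $value) {
        return 'utf8';
    }
    // The recovered bytes form valid UTF-8 themselves: likely mojibake.
    return mb_check_encoding($bytes, 'UTF-8') ? 'double-encoded' : 'utf8';
}

$good = "نبذة عن";
$bad  = mb_convert_encoding($good, 'UTF-8', 'Windows-1252'); // simulated mojibake

foreach ([$good, $bad, 'abc', "\xE9"] as $sample) {
    echo classifyEncoding($sample), "\n";
}
```

Rows reported as 'double-encoded' are candidates for a single reverse conversion; rows reported as 'invalid' need a closer look at their raw bytes.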
Try to fix the problem on the database side, not in PHP or the DB connection.
I advise you to go to your old server and export your DB again with the character set UTF8.
Then, after importing it on the new server, make sure that you can see the Arabic characters inside the tables (with phpMyAdmin).
If your tables look fine, you can move on to checking the following:
DB connection
PHP file encoding
the header encoding in HTML
As far as I know, if the problem comes from the DB, there is no way around exporting the data again from the old server.
Edit:
If you do not have access to your old DB, please check this answer; it may help you.
You were expecting نبذة عن? Mojibake. See duplicate for discussion and solution, including how to recover the data via a pair of ALTER TABLEs.
I had a similar problem when migrating database tables encoded in utf8 from a public server to localhost. The solution was to set the connection encoding from PHP by calling
$db->set_charset("utf8")
right after the mysqli connection.
Now it works properly.
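A minimal sketch of that call in context, assuming the mysqli extension is available. The function name openUtf8Connection and its parameters are placeholders, not part of any existing code base.

```php
<?php
// Hypothetical helper: open a mysqli connection and declare its
// character set before any query is sent.
function openUtf8Connection(string $host, string $user, string $password, string $database): mysqli
{
    // Throw exceptions on connection or query errors instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $db = new mysqli($host, $user, $password, $database);

    // Tell the server which encoding this client sends and expects.
    // 'utf8mb4' may be preferable where the tables use it.
    $db->set_charset('utf8');

    return $db;
}

// Usage (credentials are placeholders):
// $db = openUtf8Connection('localhost', 'app_user', 'secret', 'app_db');
```

Setting the charset once, right after connecting, keeps every later read and write on the same encoding.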

Is MySQL's collation solely used for sorting?

According to the official MySQL manual the collation used defines the order of records when sorting alphabetically:
http://dev.mysql.com/doc/refman/5.0/en/charset-general.html
However, I have a PHP script (UTF-8), and when I save some foreign characters in my MySQL database, they are saved all weird (first row). This happens when the collation I choose is latin1_swedish_ci. When I change the collation to utf8_unicode_ci, all is good (second row).
When saving this data everything is exactly the same except for the collation.
So how about that "collation is used solely for sorting records"?
Hope someone can clarify this for me :-) Thanks in advance!
It appears that the charset of your connection is not set correctly, so the conversion from the programming language's charset to the database's is not correct.
You should set the charset in your connection; then both collations will work fine.
As pointed out in the comments, here is a little explanation of how things work.
When you have not set the character set for your connection, the server assumes it to be the same as the collation of the database. When data is received in another encoding, the data is written nevertheless, just with wrong characters, i.e. other characters than the ones the script's encoding meant.
As long as nothing changes, the script gets back the same data it has written and everything appears to be fine.
However, when either the connection encoding or the database encoding is changed at this point, the already stored data gets converted to the new encoding. The problem here is that the source data is not in the encoding that is assumed when converting.
The encodings involved here encode the ASCII range with the same bytes; that's why ASCII characters don't get messed up. Only special characters do.
So you have to set your connection encoding in order not to produce the mess that you are already in.
Now what can you do about the data you already have?
You can make a dump of your database using mysqldump with the --skip-set-charset option. Then you get a plain-text file. In this plain-text file, replace all occurrences of the actual database charset with the one the data is really in (the one you had in your script when you wrote the data).
Then save the file and make sure your editor does not do any conversion (I recommend vim).
Then import that file and you will get a database with data in the correct encoding. After that you can change the encoding however you like, and as long as your connection charset is also set, you will be fine from now on.
Also make sure that the MySQL server has the charsets installed, but it should have them already.
This is only my approach; I have cleaned up a lot of messed-up installations like that, most of which at some point had garbled characters in their projects (after switching servers, updating or restoring a backup...).
It turns out that not setting the connection charset is something that is very often forgotten.
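The search-and-replace step on the dump can also be scripted instead of done in an editor. This is only a sketch: relabelDumpCharset is a hypothetical helper, it assumes the dump declares charsets in the usual "CHARSET=name" and "CHARACTER SET name" forms, and it drops the matching COLLATE clauses so the new charset's default collation applies. It is deliberately naive and would also rewrite those phrases if they occurred inside row data.

```php
<?php
// Hypothetical helper: relabel charset declarations in a mysqldump file
// without touching the bytes of the data itself.
function relabelDumpCharset(string $dumpSql, string $from, string $to): string
{
    $f = preg_quote($from, '/');

    // Table level, e.g. "DEFAULT CHARSET=latin1".
    $out = preg_replace('/\bCHARSET=' . $f . '\b/i', 'CHARSET=' . $to, $dumpSql);

    // Column level, e.g. "CHARACTER SET latin1".
    $out = preg_replace('/\bCHARACTER SET ' . $f . '\b/i', 'CHARACTER SET ' . $to, $out);

    // Collations tied to the old charset, e.g. "COLLATE latin1_swedish_ci"
    // or "COLLATE=latin1_swedish_ci", no longer match and are removed.
    $out = preg_replace('/\s+COLLATE[= ]' . $f . '_\w+/i', '', $out);

    return $out;
}

$ddl = 'CREATE TABLE t (name varchar(10) CHARACTER SET latin1 COLLATE latin1_swedish_ci)'
     . ' ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci;';

echo relabelDumpCharset($ddl, 'latin1', 'utf8'), "\n";
```

Reading and writing the dump with file_get_contents and file_put_contents keeps the bytes untouched, which matches the advice about editors that do not convert.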

PHP mysql fixed connection to utf8, but now existing greek data is useless

I have a MySQL database storing some fields in Greek characters. In my HTML I have charset=utf-8, and my database columns are defined with the collation utf8_general_ci. But so far I was not setting the connection encoding. As a result I have a database that doesn't display the Greek characters well, but when reading them back in PHP, it all shows well.
Now I am trying to do this the right way, so I also added this to my database functions:
$mysqli->set_charset("utf8");
This works great for new entries.
But for existing entries, the problem is that when I read data in PHP, it comes garbled, since now the connection encoding has changed.
Is there a way to fix my data and make them useful again? I can continue working my old way, but I know it's wrong and can cause me more problems in the future.
I solved this issue as follows:
In a PHP script, retrieve the information as I did before, i.e. without setting the connection encoding. This way the mistake is inverted, and the PHP script holds the characters in correct UTF-8.
In the same PHP script, write the information back over a connection that is set to utf8.
At this point the correct characters are in the database.
Finally, I changed all the read/write functions of my site to use utf8 from now on.
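The steps above could look roughly like the following. This is a sketch under several assumptions: two mysqli connections are available (one left at the old default, one with set_charset('utf8')), the table has a single-column key, and the table and column names passed in are trusted placeholders. Both function names are hypothetical. The guard skips rows that were already written correctly, since those do not come back as valid UTF-8 over the old connection.

```php
<?php
// Hypothetical check: a value read over the legacy connection is worth
// writing back only if it is valid UTF-8 containing non-ASCII characters.
function needsRewrite(string $valueFromLegacyConnection): bool
{
    return preg_match('/[^\x00-\x7F]/', $valueFromLegacyConnection) === 1
        && mb_check_encoding($valueFromLegacyConnection, 'UTF-8');
}

// $legacy: connection with no charset set (as the old code used it).
// $fixed:  connection after set_charset('utf8').
// Returns the number of rows written back.
function rewriteColumn(mysqli $legacy, mysqli $fixed, string $table, string $idColumn, string $textColumn): int
{
    $rows = $legacy->query("SELECT `$idColumn`, `$textColumn` FROM `$table`");
    $update = $fixed->prepare("UPDATE `$table` SET `$textColumn` = ? WHERE `$idColumn` = ?");

    $rewritten = 0;
    while ($row = $rows->fetch_assoc()) {
        $text = (string) $row[$textColumn];
        $id = $row[$idColumn];
        if (!needsRewrite($text)) {
            continue; // ASCII only, or already stored correctly
        }
        $update->bind_param('ss', $text, $id);
        $update->execute();
        $rewritten++;
    }
    return $rewritten;
}
```

Taking a backup first and trying this on a copy of one table seems prudent, because the write-back overwrites the stored values.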

problems with character encoding pulled from mysql database [duplicate]

Closed 10 years ago.
Possible Duplicate:
UTF-8 all the way through
okay, this is stupid that I can't figure it out.
The MySQL database is set to utf8_general_ci collation. The field I'm having problems with is of type longtext.
Characters added to the database as &eacute; or other accented characters are returned as �.
I run the output through stripslashes, and I've tried both with and without html_entity_decode, but can find no change in the output. What am I doing wrong?
Cheers
What character encoding does the string have that you try to insert? If it is in ISO-8859-1 you can use the PHP function utf8_encode() to encode it to UTF-8 before inserting it into the database.
http://php.net/manual/en/function.utf8-encode.php
Getting encoding right is really tricky - there are too many layers:
Browser
Page
PHP
MySQL
The SQL command "SET CHARSET utf8" from PHP will ensure that the client side (PHP) will get the data in utf8, no matter how they are stored in the database. Of course, they need to be stored correctly first.
DDL definition vs. real data
The encoding defined for a table or column doesn't really mean that the data are in that encoding. If you happen to have a table defined as utf8 but its data stored in a different encoding, MySQL will still treat them as utf8 and you're in trouble. This means you have to fix that first.
What to check
You need to check in what encoding the data flow at each layer.
Check the HTTP headers and the HTML <meta> headers.
Check what's really sent in body of the request.
Don't forget that MySQL has encoding almost everywhere:
Database
Tables
Columns
Server as a whole
Client
Make sure that there's the right one everywhere.
Conversion
If you receive data in e.g. windows-1250, and want to store in utf-8, then use this SQL before storing:
SET NAMES 'cp1250';
If you have data in the DB as windows-1250 and want to retrieve utf8, use:
SET CHARSET 'utf8';
Last note:
Don't rely on too "smart" tools to show the data. E.g. phpMyAdmin handles (or handled, when I was using it) encoding really badly. And it goes through all the layers, so it's hard to find out what went wrong. Also, Internet Explorer had the really stupid behavior of "guessing" the encoding based on weird rules. Use simple editors where you can switch the encoding. Also, I recommend MySQL Workbench.

Mysql display chinese characters

I have been using PHP + MySQL (phpMyAdmin) to build websites with Chinese content (UTF-8) for a long time.
When submitting forms and generating PHP output from the db, the Chinese words display well; but when I look at the database, sometimes they are normal Chinese characters and sometimes they are not (they become strange strings). That made me notice that the way MySQL handles and stores input data is not always UTF-8.
Some experts on the web mentioned that MySQL used to record input data as latin1; nevertheless, I note that the existing charset in phpMyAdmin is utf-8...
Is there any solid way to detect the encoding of the Chinese characters that appear in a phpMyAdmin table cell?
Also, apart from declaring it in the header of the page, is there any method to make sure the data entered into the db is UTF-8 and nothing else?
Thank you.
The biggest problem that people encounter in this regard is that they don't tell MySQL that they're sending/expecting UTF-8 encoded data when connecting to the database, so MySQL thinks it's supposed to handle latin1 encoded data and converts it accordingly. Issue the command SET NAMES utf8 after connecting to the db or use mysql_set_charset.
In my case, it was just because of htmlentities(). The solution was to change echo htmlentities($email_db); to echo htmlentities($email_db, ENT_COMPAT, 'UTF-8');
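The effect of the third argument can be seen on a short string. This is a small illustration with a made-up sample value; PHP versions before 5.4 used ISO-8859-1 as the default charset for htmlentities, which is the likely reason the explicit 'UTF-8' mattered here.

```php
<?php
$value = "caf\xC3\xA9"; // "café" as UTF-8 bytes

// Declared as UTF-8: the two bytes of "é" become a single entity.
$asUtf8 = htmlentities($value, ENT_COMPAT, 'UTF-8');

// Declared as ISO-8859-1: each byte is escaped as its own character.
$asLatin1 = htmlentities($value, ENT_COMPAT, 'ISO-8859-1');

echo $asUtf8, "\n";   // caf&eacute;
echo $asLatin1, "\n"; // caf&Atilde;&copy;
```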
