Advice on converting ISO-8859-1 data to UTF-8 in MySQL - php

We have a very large InnoDB MySQL 5.1 database with all tables using the latin1_swedish_ci collation. We want to convert all of the data which should be in ISO-8859-1 into UTF-8. How effective would changing the collation to utf8_general_ci be, if at all?
Would we be better off writing a script to convert the data and inserting into a new table? Obviously our goal is to minimise the risk of losing any data when re-encoding.
Edit: We do have accented characters, £ symbols, etc.

If the data currently uses only Latin characters and you just want to change the character set and collation to UTF-8 so that UTF-8 data can be added in future, then there should be no problem simply changing the character set and collation. I would of course do it on a copy of the table first.
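As a rough illustration of why this kind of re-encoding should be lossless, the Python sketch below takes the ISO-8859-1 bytes for a few accented characters and a £ sign and converts them to UTF-8. Python is used only to show the bytes, and the sample string is made up. Every ISO-8859-1 byte maps to exactly one Unicode code point, so nothing is dropped, although each non-ASCII character grows from one byte to two.

```python
# Hypothetical sample: characters like the ones mentioned in the question.
sample = "£ ö ä å"

# What a column holding ISO-8859-1 data contains, byte for byte.
latin1_bytes = sample.encode("latin-1")

# Re-encode: interpret the bytes as ISO-8859-1, then write them out as UTF-8.
utf8_bytes = latin1_bytes.decode("latin-1").encode("utf-8")

# Read the UTF-8 bytes back to confirm nothing was lost.
round_trip = utf8_bytes.decode("utf-8")

print(latin1_bytes.hex(" "))
print(utf8_bytes.hex(" "))
print(round_trip == sample)
```

The growth in byte length is worth keeping in mind for columns whose size limits are close to being reached.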

About a week ago I had to do the same task (issues with ö, ä, å):
Created a dump.sql.
Searched and replaced all CHARSET=latin1 with CHARSET=utf8 (in the dump.sql).
Searched and replaced all COLLATE=latin1_swedish_ci with COLLATE=utf8_unicode_ci (in the dump.sql).
Created a new database with the collation utf8_unicode_ci.
Imported the dump.sql.
Altered the database's charset with alter database MY_DB charset=utf8;
and it worked perfectly.
Note: after Mike Brant's remark, I think it's better to do the search and replace manually, only for the fields you specifically want. Or you can simply use ALTER for each field without needing the dump.sql. It didn't make much difference in my case, as most of my fields needed to be UTF-8 encoded.
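The search-and-replace steps above could be scripted roughly as follows. This is only a sketch: the function name retarget_dump and the sample dump text are made up, and the patterns assume the declarations look like the ones mysqldump usually writes. It will also rewrite any row data that happens to contain the same text, which is one reason to review the result by hand.

```python
import re


def retarget_dump(sql: str) -> str:
    """Rewrite latin1 charset and collation declarations in dump text."""
    # Match the keyword followed by '=' or a space, then the exact name.
    # The trailing \b keeps 'latin1' from matching inside 'latin1_swedish_ci'.
    sql = re.sub(r"\b(CHARSET|CHARACTER SET)([= ])latin1\b", r"\1\2utf8", sql)
    sql = re.sub(
        r"\bCOLLATE([= ])latin1_swedish_ci\b", r"COLLATE\1utf8_unicode_ci", sql
    )
    return sql


# Hypothetical fragment of a dump.sql file.
dump = (
    "CREATE TABLE `t` (\n"
    "  `name` varchar(50) COLLATE latin1_swedish_ci\n"
    ") ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci;"
)

print(retarget_dump(dump))
```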

Related

Emojis on textarea does not save post

I have a commenting system and when it's just text, no problem - it is saved to the database. When I add a 😄 (for instance), no comment is saved to the database. Nothing is saved when there is an emoji.
What can I do to allow emojis?
The "message" is where I am saving the actual comment and where there should be an emoji.
You might want to update the charset and potentially the collation. I'm assuming you're using MySQL. This is very confusing, but in MySQL the utf8 charset isn't actually full UTF-8. It is a MySQL-specific variant (also known as utf8mb3) that stores at most three bytes per character, so it cannot hold characters that need four bytes in UTF-8, which includes most emojis.
The way to handle it is to switch to the actual UTF-8, which in the world of MySQL is the utf8mb4 charset (utf8mb4_general_ci is one of its collations). You can do so by running
ALTER DATABASE <your db name> CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
(this will affect only the new tables that you create)
and
ALTER TABLE <your existing table name> CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
(this will update an already existing table, although the emojis that you already lost cannot be recovered)
The connection has to use utf8mb4 as well, for example with mysqli_set_charset($link, "utf8mb4");
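To see why the emoji is the part that breaks, it may help to look at how many bytes each character takes in UTF-8. The sketch below is only an illustration in Python; the helper name fits_in_three_bytes is made up and simply mirrors the three-byte limit of MySQL's utf8 charset.

```python
def fits_in_three_bytes(text: str) -> bool:
    """True if every character encodes to at most 3 bytes in UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


print(len("é".encode("utf-8")))   # accented Latin letter
print(len("😄".encode("utf-8")))  # emoji outside the Basic Multilingual Plane

# A plain comment passes the check, the same comment with an emoji does not.
print(fits_in_three_bytes("nice post"))
print(fits_in_three_bytes("nice post 😄"))
```

A check like this could be used to spot which incoming comments would be rejected or truncated by a table that has not been converted yet.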

Changed mysql charset to utf8, non-latin characters already in database now unreadable

I have several years of data in the DB, which is 99% Latin characters. Recently, I've added the following after the mysql connection:
mysqli_set_charset($link, "utf8");
Now all the existing data in the database that is composed of Asian, Hebrew, etc. characters is no longer readable and appears as garbage data.
How can I fix the data in the DB so it's readable with a utf8 charset?
The table charset was always utf8. The only thing that changed is the fact that there is a charset set during the connection (as shown above), and before that line was absent.
The table creation is fairly basic, the collation is utf8_general_ci
CREATE TABLE `test` (
COLUMNS + INDEXES
) ENGINE=InnoDB DEFAULT CHARSET=utf8
You now have data that is double-encoded, and you are going to need to fix the data before you can read it on a connection that uses utf8 as the charset.
Here's a blog post that explains in detail how to fix your data:
http://www.mysqlperformanceblog.com/2013/10/16/utf8-data-on-latin1-tables-converting-to-utf8-without-downtime-or-double-encoding/
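For readers unsure what "double-encoded" means here, the following Python sketch walks through one plausible way it happens. The sample value is hypothetical, and Python's latin-1 codec stands in for MySQL's latin1 (which is based on Windows-1252), so this is an approximation of the server's behavior rather than an exact model.

```python
original = "שלום"  # hypothetical Hebrew value the application meant to store

# The client sent UTF-8 bytes, but the connection was treated as latin1,
# so each byte was read as one latin1 character.
misread = original.encode("utf-8").decode("latin-1")

# Those characters were then stored as UTF-8 in the utf8 column.
stored_bytes = misread.encode("utf-8")

# A utf8 connection now returns the misread characters, not the original.
seen_over_utf8 = stored_bytes.decode("utf-8")

# Undoing the extra layer recovers the original text.
recovered = seen_over_utf8.encode("latin-1").decode("utf-8")

print(len(original.encode("utf-8")), len(stored_bytes))
print(seen_over_utf8 == original)
print(recovered == original)
```

The stored value is twice as long as the intended one, which is a quick sign that a column holds double-encoded data.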

PHP MYSQL Collation Special Characters XML->PHP->MYSQL

I am trying to import data from an XML file into a MYSQL DB using PHP. I am able to get the code to work just fine, but when I look at the data in the DB there are special characters. For example, when I look at the XML in my browser it shows up as "outdoors in good weather..." but in the DB it appears as "outdoors in good weather…".
I've cycled through all the different types of collation for that field in my DB but it does not seem to help much. Sometimes it shows up with the characters mentioned above and others as ???.
I have also tried to sync up the data with the following code in my PHP
$mysqli->query("SET NAMES 'utf8' COLLATE 'utf8_general_ci'");
But, again I have had no luck.
Thank you for reading this and for your help!
Akshay
You need to change the character set to UTF-8, along with your collation:
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
What you are seeing is a Unicode ellipsis character (…) being converted into another character set, which is probably Latin1. That is why it looks garbled.
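Both symptoms from the question can be reproduced outside MySQL. The Python sketch below is an illustration only: it assumes the misreading charset is Windows-1252, which MySQL's latin1 is based on, and shows the UTF-8 bytes of the ellipsis turning into three unrelated characters, then shows a question mark appearing when the target charset has no ellipsis at all.

```python
ellipsis = "\u2026"  # the single-character ellipsis

# UTF-8 bytes read back as Windows-1252: three bytes become three characters.
garbled = ellipsis.encode("utf-8").decode("cp1252")
print(garbled)

# Converting to ISO-8859-1, which has no ellipsis, substitutes a question mark.
lossy = ellipsis.encode("latin-1", errors="replace")
print(lossy)
```

The first case is usually reversible because the original bytes are still there; the second is not, because the character has already been replaced.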

Find *actual* character encoding of data in MySQL DB: UTF8 Latin1 illegal collation

Spent hours on this now and could use some help! Our website queries our db - table columns are set to Latin1 collation, website has set names to UTF8 for queries.
The data is French and when we do a search for a string including accented characters we get the "Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation 'like'" error.
If you navigate to a page which loads the data completely it shows accented characters no problem, it is just when using the search function that it breaks.
We have tried a number of methods including ALTER TABLE t1 CHANGE c1 c1 BLOB; ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;
but this damages the data: the text before the first accented character in each field is fine, but then the rest of the text in the field is completely dropped: 'Métal' becomes 'M'. I am using phpMyAdmin to try and fix this BTW. Not sure if that's a problem.
So is the data UTF8 encoded? If so, why does the ALTER TABLE not work, I've seen it mentioned as THE way to fix this problem on so many webpages! If the fact it doesn't work means that the data is not UTF8 encoded, how do I find out what it is?
Having different encodings for your website and your database is not a very good idea. To avoid this problem, it's better to have everything in utf8. That said, it should be possible to convert the encoding of your tables by changing their character set and collation.
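On the question of finding out what the data actually is, one approach is to look at the raw bytes (for example with SELECT HEX(c1) FROM t1) and check whether they are valid UTF-8. The Python sketch below is a minimal version of that check; the function name first_invalid_utf8_offset is made up. With 'Métal' stored as latin1, the é is the single byte E9, which is not valid UTF-8, and the valid prefix is just 'M', matching the truncation described in the question.

```python
def first_invalid_utf8_offset(raw: bytes):
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


as_latin1 = "Métal".encode("latin-1")  # bytes a latin1 column would hold
as_utf8 = "Métal".encode("utf-8")      # bytes a utf8 column would hold

print(as_latin1.hex(" "), first_invalid_utf8_offset(as_latin1))
print(as_utf8.hex(" "), first_invalid_utf8_offset(as_utf8))
```

If the bytes fail this check, the data is probably genuine latin1, and going through BLOB and then declaring the column utf8 would mislabel it rather than convert it.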

inserting latin1-encoded text into utf8 tables (forgot to use mysql_set_charset)

I have a PHP web app with MySQL tables taking utf8 text. I recently converted the data from latin1 to utf8 along with the tables and columns accordingly. I did, however, forget to use mysql_set_charset and the latest incoming data I would assume came through the MySQL connection as latin1. I don't know what happens when latin1 comes in to a utf8 column, but it's causing some strange display issues for items like comma, quotes, ampersand, etc.
Now that mysql_set_charset is in place, it is pulling the data out with funky characters. Is there any way to convert the latin1-utf8 soup to straight utf8 now that I have the database connection resource using the correct charset?
Found the fix with your comment. Here was the SQL line that seemingly has solved my issue.
UPDATE tablename SET col = CONVERT(CONVERT(CONVERT(col USING latin1) USING binary) USING utf8);
Even though the column is utf8, this converts the stored text to latin1, treats the resulting bytes as binary, then interprets those bytes as utf8 and writes the result back.
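Because the table holds a mix of correct and double-encoded rows, applying the conversion to every row risks damaging the ones that were already fine. The Python sketch below shows the same three-step idea with a guard, so that a value is only changed when the reversal actually produces valid UTF-8. The function name repair_if_double_encoded is made up, and cp1252 is used as an approximation of MySQL's latin1.

```python
def repair_if_double_encoded(text: str) -> str:
    """Undo one layer of latin1-as-utf8 double encoding, if there is one."""
    try:
        # Same chain as the SQL: text -> latin1 bytes -> read as UTF-8.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not reversible this way, so assume the value was stored correctly.
        return text


print(repair_if_double_encoded("Ã©"))      # double-encoded accented letter
print(repair_if_double_encoded("é"))       # already correct, left alone
print(repair_if_double_encoded("a & b"))   # plain ASCII, left alone
```

A value that legitimately contains a sequence such as "Ã©" would be changed by this guard too, so it is worth testing on a copy of the table first.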
