MySQL: latin1 -> utf8. Convert characters to their multibyte equivalents - PHP

I had a table in latin1 and a site in cp1252.
I want the table in utf8 and the site in UTF-8.
I've done:
1) On the web page: Content-Type: text/html;charset=utf-8
2) In MySQL: ALTER TABLE XXX CONVERT TO CHARACTER SET utf8
This SQL doesn't work as I want: it doesn't convert the ä and ü characters in the database to their multibyte equivalents.
Please help.
Thanks

As this blog post says, using MySQL's ALTER TABLE CONVERT syntax is A Bad Idea [TM]. Export your data, convert the table and then reimport the data, as described in the blog post.
Another idea: have you set your default client connection charset, via /etc/my.cnf or mysqli::set_charset?

I've been a fool. SET NAMES was missing.
What I know now:
1) Every time the charset of a column is changed, the actual data is ALWAYS recoded! Change the field to binary to see that.
2) The charset of a column takes priority; the table and database charsets come after it. They are used mainly for setting defaults (not 100% sure about that last sentence).
3) SET NAMES is very important. German characters can arrive in latin1 and still be stored correctly in a utf8 table (MySQL recodes them silently) when you SET NAMES correctly. The server can also send data to a web page in whatever encoding you want, no matter what the table encoding is; it is recoded for output.
4) If one column is in encoding A and another in encoding B and you compare them (or use LIKE), MySQL silently converts them so that they behave as if they were in one encoding.
5) MySQL is smart. It never treats text as bytes unless the type is binary; it always works with characters. It wants ä in latin1 to equal ä in utf8, as long as it knows the encoding of the data.
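Points 1 and 3 are easier to follow at the byte level. What follows is a minimal Python sketch, not MySQL itself: server_store is a hypothetical toy model of what the server does once it knows the connection charset (what SET NAMES declares) and the column charset. Python is used only because it makes the bytes easy to inspect.

```python
def server_store(client_bytes: bytes, connection_charset: str, column_charset: str) -> bytes:
    """Toy model (an assumption, not MySQL code): decode the incoming bytes
    with the declared connection charset, then re-encode for the column."""
    text = client_bytes.decode(connection_charset)
    return text.encode(column_charset)


# "ä" is one byte in latin1 and two bytes in utf8.
latin1_bytes = "ä".encode("latin-1")
utf8_bytes = "ä".encode("utf-8")

# Client sends latin1 and declares latin1: the utf8 column receives correct utf8.
stored_ok = server_store(latin1_bytes, "latin-1", "utf-8")

# Client sends utf8 while the connection is still declared as latin1
# (SET NAMES missing): every byte is taken as a character and encoded again.
stored_double = server_store(utf8_bytes, "latin-1", "utf-8")

print(latin1_bytes.hex(), utf8_bytes.hex(), stored_ok.hex(), stored_double.hex())
```

The last value is the "double encoded" form that a missing SET NAMES tends to produce.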

Since you claim you now get s**t back, it suggests that the characters were modified in the database.
How are you accessing the data in mysql? If you are using a programming interface such as PHP, then you may need to tell that interface what character encoding to expect.
For example, in PHP you will need to call something like mysql_set_charset("utf8"); the same can be done with the SQL query SET NAMES utf8.
You will then also need to make sure that whatever displays the text knows it is utf8 and renders it with the appropriate encoding. For example, on a web page you would need to set the content type to UTF-8, with something like Content-Type: text/html;charset=utf-8

Related

PHP 5.6 encoding latin1 MySQL encoding

After migrating from PHP 5.3 to PHP 5.6 I have an encoding problem. My MySQL database is latin1 and my PHP files are in windows-1251. Now everything is displayed like "ñëåäíèòå àäðåñè" or "�����".
It should display Cyrillic text like "кирилица". I've tried mysqli_set_charset but it didn't solve my problem.
First, let's see what you have in the table. Do SELECT col, HEX(col)... to see how these are encoded. Here is the HEX that should be there if it is correctly utf8-encoded:
ñëå --> C3B1C3ABC3A5; кир --> D0BAD0B8D180
If you don't get those, then the problem was on inserting, and we may (or may not) be able to repair the data. If you have C390C2BAC390C2B8C391E282AC for the Cyrillic, then you have "double encoding", and it will take some work to 'fix'.
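The HEX values quoted above can be reproduced outside the database. This is a small Python sketch; mysql_hex and double_encoded_hex are hypothetical helper names, and cp1252 is used as a stand-in for MySQL's latin1, which is an assumption that holds for the bytes in this example.

```python
def mysql_hex(text: str, charset: str = "utf-8") -> str:
    """Hex of the bytes that HEX(col) would show if `text` is stored in `charset`."""
    return text.encode(charset).hex().upper()


def double_encoded_hex(text: str) -> str:
    """Hex after utf8 bytes were mistaken for latin1 (cp1252 here) and converted to utf8 again."""
    return text.encode("utf-8").decode("cp1252").encode("utf-8").hex().upper()


print(mysql_hex("ñëå"))           # correctly encoded Latin text
print(mysql_hex("кир"))           # correctly encoded Cyrillic text
print(double_encoded_hex("кир"))  # the "double encoding" case
```

Comparing these strings with the output of SELECT col, HEX(col) shows which case a given row falls into.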
utf8 needs to be established in about 4 places:
1) The column(s) in the database. Use SHOW CREATE TABLE to verify that they are explicitly set to utf8, or defaulted from the table definition. (It is not enough to change the database default.)
2) The connection between the client and the server. See SET NAMES utf8.
3) The bytes you have in the client, which must really be utf8-encoded. (This is probably the case.)
4) If you are displaying the text in a web page, check the <meta> tag.
Halfer is right. Change both your PHP and MySQL encoding to UTF-8. Start with PHP by putting
mb_internal_encoding("UTF-8");
mb_http_output("UTF-8");
at the top of your PHP pages.
If you leave out the "UTF-8" argument and print the return values of these functions, they will show your current PHP encoding, probably windows-1251.
Also note that in MySQL you need to change the character encoding on each column in the table as well as on the table and the database overall; otherwise the defaults stay latin1, and any new fields you add will be latin1 unless you check them carefully.
If you are trying to save Cyrillic text to the database, you will need a character set that covers Cyrillic in the database, rather than latin1.
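To see why latin1 cannot hold this text, here is a short Python sketch. It assumes the garbled string from the question is windows-1251 bytes that were displayed as latin1, which is a guess based on the question rather than something checked against the actual database.

```python
garbled = "ñëåäíèòå àäðåñè"

# Get back the raw bytes the page actually contained, then read them as windows-1251.
raw = garbled.encode("latin-1")
recovered = raw.decode("cp1251")
print(recovered)

# latin1 has no Cyrillic letters at all, so a real conversion cannot succeed.
try:
    "кирилица".encode("latin-1")
    latin1_can_hold_cyrillic = True
except UnicodeEncodeError:
    latin1_can_hold_cyrillic = False
print(latin1_can_hold_cyrillic)
```

A latin1 column can carry these bytes only as long as nothing ever converts them, which is why things break when the connection charset changes.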

Convert Database Data to utf8 - What is the correct way?

My MySQL database collation is set to latin1_swedish_ci, but my site uses the windows-1256 encoding. This means the data inside the tables is encoded as windows-1256.
What is the correct way to convert my database tables/fields and data to utf-8 using iconv or any other library?
First, you need to verify that the data in the table(s) is really latin1. Could you do SELECT HEX(col), col ... to see what it looks like?
Whether it is latin1 encoding or utf8 encoding (or something else) determines what steps to perform. (If you do these steps without knowing, you could make things worse.)
These references give you the next steps:
http://dev.mysql.com/doc/refman/5.0/en/alter-table.html and/or
http://mysql.rjweb.org/doc.php/charcoll
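As a rough illustration of the "look at the bytes first" advice, this Python sketch classifies a byte string taken from HEX(col) output and then converts windows-1256 bytes to UTF-8. classify is a hypothetical helper and only a heuristic: bytes that pass as valid utf8 may still be double encoded, so its result is a hint, not proof.

```python
def classify(raw: bytes) -> str:
    """Heuristic guess at how the bytes of a column value are encoded."""
    if all(b < 0x80 for b in raw):
        return "ascii"            # identical in latin1, windows-1256 and utf8
    try:
        raw.decode("utf-8")
        return "utf8"             # valid utf8 (possibly double encoded)
    except UnicodeDecodeError:
        return "single-byte"      # latin1, windows-1256 or similar


# Sample value as a windows-1256 site would have sent it.
sample = "مرحبا".encode("cp1256")
print(classify(sample))

# Conversion once the real source encoding is known to be windows-1256.
converted = sample.decode("cp1256").encode("utf-8")
print(classify(converted))
```

In Python, bytes.fromhex() turns the HEX(col) string back into bytes for this kind of check.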

MySQL outputs Western encoding in UTF-8 PHP file

I have the following problem: on a very simple php-mysqli query:
if ( $result = $mysqli->query( $sqlquery ) )
{
    $res = $result->fetch_all();
    $result->close();
}
I get strings wrongly encoded as Western-encoded strings, although the database, the table and the column are all in utf8_general_ci collation. The PHP script itself is UTF-8 encoded and the parts of the script that do not involve MySQL get the correct encoding. So, say, echo "ő" works perfectly, but echo $res[0] from the previous example outputs the EF BF BD character when the file is viewed in the correct UTF-8 encoding. If I manually switch the browser's encoding to Western, the strings that come from mysqli are decoded properly, except that the non-Western characters are replaced with "?".
What makes it even stranger is that this doesn't happen in my development environment, while it does on my web server. The development environment is a LAMP stack (The Uniform Server), while the web server uses nginx.
In this case, I entered the data into the database using phpMyAdmin, and inside phpMyAdmin it displays perfectly. phpMyAdmin's collation is UTF-8 too. I believe the problem must be somewhere around here, because on the same web server, for another site where I enter data through PHP (using POST), the same problem doesn't happen. In that case, the data shows correctly both while entering and while viewing it (I mean in the PHP-generated web pages), but the special characters are not correct in phpMyAdmin.
Can you help me work out where to start debugging? Is it connected to PHP, MySQL, nginx or phpMyAdmin?
Use mysqli_set_charset to change the client encoding to UTF-8 just after you connect:
$mysqli->set_charset("utf8");
The client encoding is what MySQL expects your input to be in (e.g. when you insert user-supplied text into a search query) and what it gives you the results in (so it has to match your output encoding in order for echo to display things correctly).
You need to have it match the encoding of your web page to account for the two scenarios above and the encoding of the PHP source file (so that the hardcoded parts of your queries are interpreted correctly).
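The symptoms in the question (EF BF BD in the page and "?" for non-Western characters) are what a latin1 connection feeding a UTF-8 page would produce. This Python sketch imitates that; the sample words are made up for illustration.

```python
# The server converts results to the connection charset (latin1 by default here).
latin1_from_server = "für".encode("latin-1")

# The browser decodes the page as UTF-8; the lone latin1 byte is invalid UTF-8
# and is shown as U+FFFD, the replacement character.
shown = latin1_from_server.decode("utf-8", errors="replace")
replacement_hex = "\ufffd".encode("utf-8").hex().upper()

# A character that latin1 cannot represent is replaced before it is even sent.
unrepresentable = "ő".encode("latin-1", errors="replace")

print(repr(shown), replacement_hex, unrepresentable)
```

Once the connection charset matches the page encoding, neither substitution happens.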
Update: How to convert data inserted using latin-1 to utf-8
Regarding data that have already been inserted using the wrong connection encoding there is a convenient solution to fix the problem. For each column that contains this kind of data you need to do:
ALTER TABLE table_name MODIFY column_name existing_column_type CHARACTER SET latin1;
ALTER TABLE table_name MODIFY column_name BLOB;
ALTER TABLE table_name MODIFY column_name existing_column_type CHARACTER SET utf8;
The placeholders table_name, column_name and existing_column_type should be replaced with the correct values from your database each time.
What this does is:
1) Tell MySQL that it needs to store data in that column in latin1. This character set contains only a small subset of utf8, so in general this conversion involves data loss, but in this specific scenario the data was already interpreted as latin1 on input, so there will be no side effects. In doing so, MySQL internally converts the byte representation of your data back to what was originally sent from PHP.
2) Convert the column to a binary type (BLOB) that has no associated encoding information. At this point the column contains raw bytes that form a proper utf8 character string.
3) Convert the column back to its previous character type, telling MySQL that the raw bytes should be considered to be in utf8 encoding.
WARNING: You can only use this indiscriminate approach if the column in question contains only incorrectly inserted data. Any data that has been correctly inserted will be truncated at the first occurrence of any non-ASCII character!
Therefore it's a good idea to do it right now, before the PHP-side fix goes into effect.
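The three ALTER TABLE steps and the warning can be traced on a single value. This is a Python sketch of the byte transformations only, with cp1252 standing in for MySQL's latin1; three_step_fix is a hypothetical name, and where MySQL would truncate correctly stored data, Python raises an exception instead.

```python
def three_step_fix(stored_utf8: bytes) -> str:
    """Imitate: utf8 column -> latin1 column -> BLOB -> utf8 column."""
    as_latin1 = stored_utf8.decode("utf-8").encode("cp1252")  # step 1: real conversion
    raw = as_latin1                                           # step 2: BLOB, bytes untouched
    return raw.decode("utf-8")                                # step 3: bytes relabelled as utf8


# Data inserted through the wrong (latin1) connection: utf8 bytes encoded twice.
damaged = "ü".encode("utf-8").decode("cp1252").encode("utf-8")
fixed = three_step_fix(damaged)
print(fixed)

# Correctly inserted data does not survive the same treatment.
correct = "ü".encode("utf-8")
try:
    three_step_fix(correct)
    survived = True
except UnicodeDecodeError:
    survived = False
print(survived)
```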
Use the mysqli::set_charset function.
$mysqli->set_charset('utf8'); //returns false if the encoding was not valid... won't happen
http://php.net/manual/en/mysqli.set-charset.php
I haven't used mysqli for some time, but if things are the same, connections use latin1 (ISO-8859-1, with the latin1_swedish_ci collation) by default.
I will assume your page is already using utf8 encoding by having:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
Inside the <head> tag.
If you have strings that are already in latin1, you can use mb_convert_encoding:
http://php.net/manual/en/function.mb-convert-encoding.php
$fixedStr = mb_convert_encoding($wrongStr, 'UTF-8', 'ISO-8859-1');
iconv does something very similar. Truth be told, I don't know the difference, but here's the link to the function reference:
http://php.net/manual/en/function.iconv.php
I just realized that you might have some strings in utf8 and others in latin1. You can use mb_detect_encoding for that: http://php.net/manual/en/function.mb-detect-encoding.php
You can also dump the database and use iconv (command line) if you have it installed:
iconv -f ISO-8859-1 -t UTF-8 < currentdb.sql > fixeddb.sql
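For the mixed case, where some strings are already utf8 and others are still latin1, one possible approach is sketched below in Python. to_utf8 is a hypothetical helper, not a translation of mb_detect_encoding, and it relies on the assumption that real latin1 text rarely happens to be valid utf8.

```python
def to_utf8(raw: bytes) -> bytes:
    """Return raw unchanged if it is valid utf8, otherwise convert it from latin1."""
    try:
        raw.decode("utf-8")
        return raw                                     # already utf8
    except UnicodeDecodeError:
        return raw.decode("latin-1").encode("utf-8")   # treat as latin1


from_latin1 = to_utf8("Grüße".encode("latin-1"))
from_utf8 = to_utf8("Grüße".encode("utf-8"))
print(from_latin1 == from_utf8)
```

Running it twice on the same value is harmless, because a value that is already utf8 is left alone.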

PHP MySQL Chinese UTF-8 Issue

I have a MySQL table & fields that are all set to UTF-8. The thing is, a previous PHP script, which was in charge of the database writing, was using some other encoding, not sure whether it is in the script itself, the MySQL connection or somewhere else. The result is that although the table & fields are set to UTF-8, we see the wrong chars instead of Chinese.
It looks like that:
Now, the previous scripts (which were in charge of the writing and corrupted the data) can read it fine for some reason, but my new script, which is all encoded in UTF-8, shows chars like ½©. How can that be fixed?
By the sound of it, you have a utf8 column but you are writing to it and reading from it using a latin1 connection, so what is actually being stored in the table is mis-encoded. Your problem is that when you read from the table using a utf8 connection, you see the data that's actually stored there, which is why it looks wrong. You can fix the mis-encoded data in the table by converting to latin1, then back to utf8 via the binary character set (three steps in total).
The original database was in a Chinese encoding – GB-18030 or similar, not Latin-1 – and the bytes that make up these characters, when displayed in UTF-8, show up as a bunch of Latin diacritics. Read each string as GB-18030, convert it to UTF-8, and save.
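The GB-18030 suggestion can be tried on a sample value. This Python sketch assumes you have already got back the bytes exactly as the old script wrote them (for example by reading them through a latin1 or binary connection); the sample text is made up.

```python
original = "中文"

# Bytes as a GB-18030 script would have written them.
gb_bytes = original.encode("gb18030")

# What those bytes look like when each one is taken for a Latin character.
seen_as_latin1 = gb_bytes.decode("latin-1")
print(seen_as_latin1)

# The repair: read as GB-18030, write as UTF-8.
repaired = gb_bytes.decode("gb18030").encode("utf-8")
print(repaired.decode("utf-8"))
```

If decoding your real data as GB-18030 fails or gives nonsense, the latin1 round trip described in the other answer is the more likely explanation.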

Don't fix what's not broken

I have made code that stores utf-8 in a database.
It shows well in the browser but looks distorted in the database. Since the functionality seems to work and I don't seem to have had any problems processing the string input, is there any point in 'fixing what is not broken' and making utf-8 characters like Japanese show properly in the database?
I don't search the database since the strings are serialized anyway.
You have to specify the text encoding of the queries you are sending to MySQL, for instance with
SET NAMES `utf8` COLLATE `utf8_unicode_ci`
If you don't, MySQL may interpret your query with the server's default text encoding, which can be different from UTF-8, e.g. iso-latin. So you will have strings in your tables that are UTF-8 encoded but that MySQL has marked as iso-latin. That won't have much effect on your code, because MySQL just returns your UTF-8 strings back to you and you ignore the text encoding. If you view the data in phpMyAdmin or any other application that sets the connection's character encoding, you will end up with distorted strings.
You could, on the other hand, utf8_decode your query strings and utf8_encode the results provided by MySQL, and not change the connection's text encoding from iso-latin. But if you then query a different MySQL server that uses UTF-8 as its default text encoding, you will end up with the same problem the other way around. So just set the connection's text encoding once after connecting.
What do you use to access the database? If you use a console, just set the encoding in the console to utf-8. If you use GUI software, check the options and set the encoding to utf-8. You can also try 'set names' to set the client encoding.
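Why the data looks fine in the application but distorted elsewhere can be shown with a toy round trip. This Python sketch assumes a latin1 connection and a latin1 column, and uses ISO-8859-1 as a simplification of MySQL's latin1.

```python
text = "日本語"

# The application sends utf8 bytes over a connection the server believes is latin1.
sent = text.encode("utf-8")

# The server takes each byte for a latin1 character; a latin1 column keeps the same bytes.
stored = sent.decode("latin-1").encode("latin-1")

# The same application reads the bytes back unchanged and treats them as utf8: looks fine.
same_app_reads = stored.decode("utf-8")

# A tool with a utf8 connection gets a real latin1 -> utf8 conversion: distorted.
other_tool_reads = stored.decode("latin-1").encode("utf-8").decode("utf-8")

print(same_app_reads == text, other_tool_reads == text)
```

So nothing is lost as long as every reader and writer uses the same wrong setting; the trouble starts with the first tool that sets the connection encoding correctly.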
