Database charset conversion - php

I've moved a database from one host to another, using PMA to export and BigDump to import. The whole database has the latin2 charset set everywhere possible. However, special characters in the database (Polish ąęłó, etc.) are broken. When I run a SELECT I see "bushes": "Ä�" instead of "ą". When I set the document encoding to utf-8, the characters display correctly. How can I fix this? Can it be done using CONVERT in a query? I don't want to export/import the database again, because it is over 200MB. What's wrong?
Any PHP/MySQL query solution would save me.
Sorry if this is hard to understand; I'm still learning English.

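If a column contains data in the wrong charset (let's say utf-8 has slipped into a latin1 column declared varchar(255)), pass it through a binary type so the bytes are relabeled rather than converted. Use VARBINARY rather than BINARY here, because BINARY(255) pads shorter values with 0x00 bytes:
ALTER TABLE tablename MODIFY columnname VARBINARY(255);
ALTER TABLE tablename MODIFY columnname VARCHAR(255) CHARSET utf8;
ALTER TABLE tablename MODIFY columnname VARCHAR(255) CHARSET latin1;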
See also: http://dev.mysql.com/doc/refman/4.1/en/charset-conversion.html
However, it is more likely that your connection simply defaults to the wrong character set. What do you get if you run SET NAMES latin1; before selecting?
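To see why a charset mismatch produces exactly this kind of garbage, here is a minimal Python sketch (Python is used only to show the bytes; the names misread and repair are made up for illustration). It assumes the stored bytes are UTF-8 and that something along the way reads them as ISO-8859-2 (latin2):

```python
def misread(text: str, actual: str = "utf-8", assumed: str = "iso8859_2") -> str:
    """Encode text as `actual`, then decode the bytes as if they were `assumed`."""
    return text.encode(actual).decode(assumed)


def repair(garbled: str, actual: str = "utf-8", assumed: str = "iso8859_2") -> str:
    """Reverse misread(): recover the raw bytes, then decode them correctly."""
    return garbled.encode(assumed).decode(actual)


polish = "ąęłó"
garbled = misread(polish)

# "ą" is the two bytes C4 85 in UTF-8. Read as latin2 they become "Ä" plus a
# control character, which many viewers draw as a replacement glyph.
assert misread("ą") == "\u00c4\u0085"

# No bytes were lost, so the damage is reversible.
assert repair(garbled) == polish
```

If the round trip works like this on a sample of your data, the bytes in the table are probably intact and only the declared or connection charset is wrong.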

Related

Emojis on textarea does not save post

I have a commenting system, and when a comment is just text there is no problem: it is saved to the database. When I add a 😄 (for instance), the comment is not saved at all. Nothing is saved when there is an emoji.
What can I do to allow emojis?
The "message" is where I am saving the actual comment and where there should be an emoji.
You probably need to update the charset and possibly the collation. I'm assuming you're using MySQL. Confusingly, MySQL's utf8 charset isn't full UTF-8: it is a MySQL-specific variant (also known as utf8mb3) that stores at most three bytes per character, so it cannot hold characters that need four bytes, which includes most emoji.
The way to handle it is to switch to real UTF-8, which MySQL calls utf8mb4 (utf8mb4_general_ci is one of its collations). You can do so by running
ALTER DATABASE <your db name> CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
(this affects only tables that you create afterwards)
and
ALTER TABLE <your existing table name> CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
(this updates an already existing table, although emojis that you have already lost cannot be recovered)
The connection charset has to be utf8mb4 as well, for example via SET NAMES utf8mb4.
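The three-byte limit is easy to check outside the database. The following Python sketch (the helper names utf8_len and needs_utf8mb4 are hypothetical) counts the UTF-8 bytes per character and flags text that would not fit into MySQL's three-byte utf8 charset:

```python
def utf8_len(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """True if any character lies outside the Basic Multilingual Plane,
    i.e. needs four bytes in UTF-8."""
    return any(ord(ch) > 0xFFFF for ch in text)


assert utf8_len("a") == 1
assert utf8_len("ą") == 2
assert utf8_len("≠") == 3
assert utf8_len("😄") == 4   # one byte too many for the 3-byte utf8 charset

assert needs_utf8mb4("nice post 😄")
assert not needs_utf8mb4("Крылья Советов")
```

A check like this could be used to log which comments would fail before the tables are converted; it is not a substitute for the conversion itself.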

Bulk insert string containing Russian

I am converting a spreadsheet using PHPExcel to a Database and the cell value happens to contain Russian. If I run mb_detect_encoding() I am told the text is UTF8 and if I set a header of UTF8 then I see the correct Russian characters.
However if I compile it into a string (with only addslashes involved in the process) and insert it into the table I see lots of ????. I have set the table characterset as utf8mb4 and also set the collation as utf8mb4_general_ci. I have also run $this->db->query("SET NAMES 'utf8mb4'"); on my DB connection.
I run PDO query() with my multi part insert and get the ???s but if I output the query to screen I get ÐŸÐ¾Ñ which would be valid UTF8. Why would this not be stored correctly in the database?
I have kept this question rather than deleting it so someone may find the answer helpful.
The reason I was struggling is that SQLyog doesn't show the column charset by default. Toggling the "Hide language options" option on the Alter Table view reveals that when SQLyog creates a table, the columns get the server's default charset rather than the charset you defined for the table. I'm not sure whether that is correct behaviour, but the solution is simply to turn on the column charset settings and check that they match what you expect.
ÐŸÐ¾ is Mojibake for По. Probably...
The bytes you have in the client are correctly encoded in utf8 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
The column in the tables may or may not have been CHARACTER SET utf8, but it should have been that.
The question marks imply...
you had utf8-encoded data (good)
SET NAMES latin1 was in effect (default, but wrong)
the column was declared CHARACTER SET latin1 (default, but wrong)
One way to help diagnose the problem(s) is to run
SELECT col, HEX(col) FROM tbl WHERE ...
For По, the hex should be D09FD0BE. Each Cyrillic character, in utf8, is hex D0xx or D1xx.
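To know what to look for in the HEX() output, the expected values can be computed up front. This is a rough Python sketch with hypothetical helper names; it assumes the wrong connection charset behaves like Windows-1252, which is close to what MySQL calls latin1:

```python
def utf8_hex(text: str) -> str:
    """Hex of correctly stored UTF-8, comparable to HEX(col) in MySQL."""
    return text.encode("utf-8").hex().upper()


def double_encoded_hex(text: str) -> str:
    """Hex you would see if UTF-8 bytes were treated as cp1252 text and then
    converted to UTF-8 a second time ("double encoding")."""
    mojibake = text.encode("utf-8").decode("cp1252")
    return mojibake.encode("utf-8").hex().upper()


assert utf8_hex("По") == "D09FD0BE"                  # healthy data
assert double_encoded_hex("По") == "C390C5B8C390C2BE"  # double-encoded data
```

A hex value of 3F3F for the same text would mean literal question marks were stored, in which case the original characters are gone.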

Is there a way to fix an encoding issue directly in my Mysql?

I'm having an issue with my WordPress install. Somehow all the content was inserted into the database with the wrong charset, yet the frontend displays it fine.
As you can see here:
http://prntscr.com/8vifc3
I'm changing hosts, but after importing, my site won't render the encoding properly because of the way the content was inserted.
Is there a way to fix the encoding directly in my previous MySQL database before I export it?
Thanks
You can fix the issue by converting your strings to binary and then do charset conversion. The example below converts UTF8 data to CP1251:
UPDATE table SET column=CONVERT(CONVERT(CONVERT(column USING binary) USING utf8) USING cp1251) WHERE id=123;
You can use the set_charset function in mysqli:
$mysqli->set_charset("utf8")
or you can change the charset to utf8_* from phpMyAdmin.
That's Mojibake
The bytes you have in the client are correctly encoded in utf8 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
The column in the tables may or may not have been CHARACTER SET utf8, but it should have been that.
If you need to fix the data it takes a "2-step ALTER", something like
ALTER TABLE Tbl MODIFY COLUMN col VARBINARY(...) ...;
ALTER TABLE Tbl MODIFY COLUMN col VARCHAR(...) ... CHARACTER SET utf8 ...;

insert ≠ (not equal to ) in mysql field

I run the following query in mysql
UPDATE `gamequestions` SET a2 = '≠' WHERE id = 564
It runs successfully but the '?' is inserted in a2 field in place of '≠'
The datatype of a2 is text and also tried with varchar
Any Help greatly appreciated.
You need to change the column's character set (and with it the collation) to UTF-8 to store special characters.
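The "?" is what a lossy conversion leaves behind when the target charset has no slot for the character. A small Python sketch of the same effect (store_as is a made-up helper, and latin-1 stands in for a non-Unicode column charset):

```python
def store_as(text: str, charset: str) -> bytes:
    """Encode text the lossy way: unrepresentable characters become '?'."""
    return text.encode(charset, errors="replace")


# latin-1 has no code for U+2260, so the character is replaced for good.
assert store_as("a ≠ b", "latin-1") == b"a ? b"

# UTF-8 can represent it, so the value survives a round trip.
assert store_as("a ≠ b", "utf-8").decode("utf-8") == "a ≠ b"
```

Unlike Mojibake, this loss cannot be undone afterwards: the stored byte really is a question mark.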
The goal in these conversions is always to decide which charset/collation combination you want to use (UTF8 being the best choice in almost all scenarios) and then to convert all tables/columns in your database to that charset. At that point you can set DB_CHARSET and DB_COLLATE to the desired charset and collation to match.
Note:
In most cases if a collation is not defined MySQL will assume the default collation for the CHARSET which is specified. For UTF8 the default is utf8_general_ci, which is usually the right choice.
Changing the default charset of the database
ALTER DATABASE MyDb CHARACTER SET utf8;
Changing the default charset of individual tables
ALTER TABLE MyTable CHARACTER SET utf8;
https://dev.mysql.com/doc/refman/5.1/en/charset-unicode-utf8.html
You can add that option in /mysql/my.cnf. In the [mysqld] section add "character-set-server=UTF8"; in the [client] section add "default-character-set=UTF8".
You can find more information in these links:
http://dev.mysql.com/doc/refman/5.1/en/charset-… http://dev.mysql.com/doc/refman/5.0/en/server-o…
If you need to convert existing data, you can execute:
ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name;
You need to check the following things:
use SET NAMES utf8 before you query/insert into the database
use DEFAULT CHARSET=utf8 when creating new tables

Page with UTF-8 encoding sends data to MySQL with UTF-8 encoding but entry is scrambled

I realize there's a dozen similar questions, but none of the solutions suggested there work in this case.
I have a PHP variable on a page, initialized as:
$hometeam="Крылья Советов"; //Cyrillic string
When I print it out on the page, it prints out correctly. So echo $hometeam displays the string Крылья Советов, as it should.
The content meta tag in the header is set as follows:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
And, at the very beginning of the page, I have the following (as suggested in one of the solutions found in my search):
ini_set('default_charset', 'utf-8');
So that should be all good.
The MySQL table I'm trying to save this to, and the column in question, have utf8_bin as their encoding. When I go to phpMyAdmin and manually enter Крылья Советов, it saves properly in the field.
However, when I try to save it through a query on the page, using the following basic query:
mysql_query("insert into tablename (round,hometeam) values ('1','$hometeam') ");
The mysql entry looks like this:
c390c5a1c391e282acc391e280b9c390c2bbc391c592c391c28f20c390c2a1c390c2bec390c2b2c390c2b5c391e2809ac390c2bec390c2b2
So what's going on here? If everything is ok on the page, and everything is ok with MySQL itself, where is the issue? Is there something I should add to the query itself to make it keep the string UTF-8 encoded?
Note that I have set mysql_set_charset('utf8'); after connecting to the database (at the top of the page).
EDIT: Running the query SHOW VARIABLES LIKE "%character_set%" gives the following:
Variable_name Value
character_set_client utf8
character_set_connection utf8
character_set_database latin1
character_set_filesystem binary
character_set_results utf8
character_set_server latin1
character_set_system utf8
character_sets_dir /usr/share/mysql/charsets/
Seems like there could be something here, since there are 2 latin1's in that list. What do you think?
Also, when I type a Cyrillic string directly into phpMyAdmin, it appears fine at first (it displays correctly after I save it). But reloading the table, it displays in HEX like the inserted ones. I apologize for the misinformation regarding this in the question. As it turns out, this should mean the problem is with phpMyAdmin or the database itself.
EDIT #2: this is what show create table tablename returns:
CREATE TABLE `tablename` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `round` int(11),
  `hometeam` varchar(32) COLLATE utf8_bin NOT NULL,
  `competition` varchar(32) CHARACTER SET latin1 NOT NULL DEFAULT 'Russia',
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=119 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
Do you get this hex string in phpMyAdmin? I suppose when you SELECT the inserted value by e.g. PHP or the MySQL console client, you would be given the expected cyrillic UTF8 string.
If so, it's a configuration issue with phpMyAdmin, see e.g. here: http://theyouri.blogspot.ch/2010/12/phpmyadmin-collated-db-in-utf8bin-shows.html
phpMyAdmin collated db in utf8_bin shows hex data instead of UTF8 text
$cfg['DisplayBinaryAsHex'] = false;
Moreover, please don't use mysql_query that way, since you're totally open to SQL injections. I'm also not sure if you really want to use utf8_bin, see e.g. this discussion: utf8_bin vs. utf_unicode_ci or this: UTF-8: General? Bin? Unicode?
EDIT There's something weird going on. If you translate the given hex string to UTF8 characters, you get this: "ÐšÑ€Ñ‹Ð»ÑŒÑ Ð¡Ð¾Ð²ÐµÑ‚Ð¾Ð²" (see e.g. http://software.hixie.ch/utilities/cgi/unicode-decoder/utf8-decoder). If you utf8_decode this, you get the desired "Крылья Советов". So, it seems that it's at least utf8 encoded twice (besides the problem that it somewhere shows up as hex characters).
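The double-encoding theory can be checked mechanically against the hex string from the question. The Python sketch below is only an illustration (the function names are invented); it assumes the intermediate charset was MySQL-style latin1, approximated here as cp1252 with a plain ISO-8859-1 fallback for the few bytes cp1252 leaves undefined:

```python
STORED_HEX = (
    "c390c5a1c391e282acc391e280b9c390c2bbc391c592c391c28f20"
    "c390c2a1c390c2bec390c2b2c390c2b5c391e2809ac390c2bec390c2b2"
)


def latin1_bytes(text: str) -> bytes:
    """Map each character back to the single byte it was misread from."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Bytes such as 0x8F have no cp1252 character and show up as
            # C1 control characters; ISO-8859-1 maps those back 1:1.
            out += ch.encode("latin-1")
    return bytes(out)


def undo_double_encoding(stored: bytes) -> str:
    """Decode UTF-8 once, collapse to bytes, then decode UTF-8 again."""
    once = stored.decode("utf-8")
    return latin1_bytes(once).decode("utf-8")


assert undo_double_encoding(bytes.fromhex(STORED_HEX)) == "Крылья Советов"
```

Because the original text comes back intact, the stored value appears to be UTF-8 that was encoded exactly twice, not data that was truncated or replaced.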
Could you please provide the complete script? Do you utf8_encode your string anywhere? If your script is this and only this (besides a valid, opened MySQL connection):
<?php
$hometeam="Крылья Советов"; //Cyrillic string
// open mysql connection here
mysql_set_charset('utf8');
mysql_query("INSERT INTO tablename (round, hometeam) VALUES ('1', '$hometeam')");
$result = mysql_query("SELECT * FROM tablename WHERE round = '1'");
$row = mysql_fetch_assoc($result);
echo $row['hometeam'];
?>
And you call the page, what is the result (in the page source of the browser, not what is displayed in the browser)?
Also, please check what happens if you change the collation to utf8_unicode_ci, as suggested in another answer here. That at least rules out phpMyAdmin issues with displaying binary data, and it is probably what you want anyway (since you probably want ORDER BY clauses to perform as expected, see the discussions in the SO questions I linked above).
EDIT2 Perhaps you could also provide some snippets like SHOW CREATE TABLE tablename or SHOW VARIABLES LIKE "%character_set%". Might help.
Also, when I type a Cyrillic string directly into phpMyAdmin, it
appears fine at first (it displays correctly after I save it). But
reloading the table, it displays in HEX like the inserted ones.
This almost certainly looks like there is a problem in your table! Run show create table tablename. I bet there is latin1 instead of utf8, because you have it set as the default in the character_set_database variable.
To change this, run the following command (with utf8 as charset_name):
ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name;
This will convert all your varchar fields to utf8. But be careful with the records already in the table: they are already malformed, and converting them to UTF8 will leave them malformed. The best idea may be to create the database again, adding the following at the end of each table definition:
CREATE TABLE `tablename` (
....
) ENGINE=<whatever you use> DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci
1) Try to save the entry to the database with the PhpMyAdmin and then also look at the result in PhpMyAdmin. Does it look OK? If yes, database is created and set up properly.
2) Try to use utf8_general_ci instead. This shouldn't matter, but give it a try.
3) Tune all necessary settings on the PHP side - follow this post:
http://blog.loftdigital.com/blog/php-utf-8-cheatsheet . Especially try this trick:
echo htmlentities($hometeam, ENT_QUOTES, 'UTF-8')
From the comments, it seems you aren't able to change your database configuration, is that right?
I suspect the encoding is misconfigured, judging by what I saw in the official MySQL documentation.
I can suggest a PHP-side workaround. Since encoding causes so many problems, you can transform the string before inserting it into the database. You have to find a common language for PHP and the database to talk in.
The one I used in another project consists of transforming the string with urlencode($string) before saving and urldecode($string) after reading.
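The idea behind this workaround is that percent-encoded text is plain ASCII, so no charset setting can mangle it. A rough Python equivalent of the round trip, with made-up helper names and quote_plus standing in for PHP's URL encoding:

```python
from urllib.parse import quote_plus, unquote_plus


def to_ascii_safe(text: str) -> str:
    """Percent-encode the UTF-8 bytes of text so only ASCII is stored."""
    return quote_plus(text, encoding="utf-8")


def from_ascii_safe(stored: str) -> str:
    """Reverse to_ascii_safe() after reading the value back."""
    return unquote_plus(stored, encoding="utf-8")


team = "Крылья Советов"
stored = to_ascii_safe(team)

assert stored.isascii()
assert from_ascii_safe(stored) == team
# Each Cyrillic letter becomes six ASCII characters, so the stored value is
# far longer than the 32 characters the hometeam column allows.
assert len(stored) > 32
```

The trade-off is that the database can no longer search or sort the column meaningfully and the column has to be widened, so this only hides the misconfiguration instead of fixing it.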
