Convert and show data in mysql from 1252 to UTF-8

Convert and show data in mysql from 1252 to UTF-8 - php

I have one old database which I must use. The problem is that the old data(mostly text) is stored in 1252(latin1_general_ci) and is showed like ?????? on the page. Then I've converted whole database and the table to UTF-8 collation like this:
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
But the problem whit old records remains. I know that the queries above are just change the fields collation. My question is there an way to show those ????? records properly on the web page now?

1) Create dump
mysqldump --default-character-set=latin1 --skip-set-charset mydatabase mytable > ./mytable.sql
2) In mytable.sql replace latin1 in utf8
CREATE TABLE `test` (
`id` int(10) unsigned NOT NULL,
`name` char(255) NOT NULL default '',
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
3) Import DB
mysql --user=login -p --database=mydatabase < ./mytable.sql
mysqldump — A Database Backup Program

Related

MySQL Character Set & Select Query Performance in stored procedure

Recently I noticed few queries are taking very long time in execution, checked further and found that MySQL Optimizer is trying to use COLLATE in Where clause and that's causing performance issue, if I run below query without COLLATE then getting quick response from database:
SELECT notification_id FROM notification
WHERE ref_table = 2
AND ref_id = NAME_CONST('v_wall_detail_id',_utf8mb4'c37e32fc-b3b5-11ec-befc-02447a44a47c' COLLATE 'utf8mb4_unicode_ci')
MySQL version 5.7
Database Character Set: utf8mb4
Column Character set: UTF8
Column Data Type: CHAR(36) UUID
From PHP in Connection object passing: utf8mb4
Index is applied
This query is written in MySQL stored procedure
SHOW CREATE TABLE
CREATE TABLE `notification` (
`notification_id` CHAR(36) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`title` VARCHAR(500) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`created` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`notification_id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8mb4
SHOW VARIABLES LIKE 'coll%';
collation_connection utf8_general_ci
collation_database utf8mb4_unicode_ci
collation_server latin1_swedish_ci
SHOW VARIABLES LIKE 'char%';
character_set_client, Connection,Result, System: utf8
character_set_database utf8mb4
character_set_server latin1
character_set_system utf8
Any suggestion, what improvements are needed to make my queries faster?

The table's character set is utf8, so I guess its collation is one of utf8_general_ci or utf8_unicode_ci. You can check this way:
SELECT collation_name from INFORMATION_SCHEMA.COLUMNS
WHERE table_schema = '...your schema...' AND table_name = 'notification'
AND column_name = 'ref_id';
You are forcing it to compare to a string with a utf8mb4 charset and collation. An index is a sorted data structure, and the sort order depends on the collation of the column. Using that index means taking advantage of the sort order to look up values rapidly, without examining every row.
When you compared the column to a string with a different collation, MySQL cannot infer that the sort order or string equivalence of your UUID constant is compatible. So it must do string comparison the hard way, row by row.
This is not a bug, this is the intended way for collations to work. To take advantage of the index, you must compare to a string with a compatible collation.
I tested and found that the following expressions fail to use the index:
Different character set, different collation:
WHERE ref_id = _utf8mb4'c37e32fc-b3b5-11ec-befc-02447a44a47c' COLLATE utf8mb4_general_ci
WHERE ref_id = _utf8mb4'c37e32fc-b3b5-11ec-befc-02447a44a47c' COLLATE utf8mb4_unicode_ci
Same character set, different collation:
WHERE ref_id = _utf8'c37e32fc-b3b5-11ec-befc-02447a44a47c' COLLATE 'utf8_unicode_ci'
The following expressions successfully use the index:
Different character set, default collation:
WHERE ref_id = _utf8mb4'c37e32fc-b3b5-11ec-befc-02447a44a47c'
Same character set, same collation:
WHERE ref_id = _utf8'c37e32fc-b3b5-11ec-befc-02447a44a47c' COLLATE 'utf8_general_ci'
Same character set, default collation:
WHERE ref_id = _utf8'c37e32fc-b3b5-11ec-befc-02447a44a47c'
To simplify your environment, I recommend you should just use one character set and one collation in all tables and in your session. I suggest:
ALTER TABLE notification CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
This will rebuild the indexes on string columns, using the sort order for the specified collation.
Then using COLLATE utf8mb4_unicode_ci will be compatible, and will use the index.
P.S. In all cases I omitted the NAME_CONST() function, because it has no purpose in a WHERE clause as far as I know. I don't know why you are using it.

These say what the client is talking in:
collation_connection utf8_general_ci
character_set_client, Connection,Result, System: utf8
Either change them or change the various columns to match them.
If you have Stored routines, they need to be dropped, do SET NAMES to match what you picked, then re-CREATEd.
Since you are using 5.7, I recommend using utf8mb4 and utf8mb4_unicode_520_ci throughout.

SQL collation not working

I am trying to insert some utf8 encoded data into a mysql database. The special characters are correctly displayed in the browser, but not in my database. After manually changing the collation to utf8_unicode_ci and confirming that "this operation will attempt to convert your data to the new collation", the data gets displayed correctly. However if I create the table using
CREATE TABLE IF NOT EXISTS table_name (
date date NOT NULL,
searchengine VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
location VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
keyword VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
position INT NOT NULL,
competition VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
url VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL
)
and insert the data after creating the table, the data is still not shown correctly, even though the colletion is utf8_unicode_ci. Any ideas on how to fix this?

A collation is a set of rules that defines how to compare and sort character strings. Each collation in MySQL belongs to a single character set. Every character set has at least one collation, and most have two or more collations. A collation orders characters based on weights.
utf8mb4_unicode_ci is based on the Unicode standard for sorting and comparison, which sorts accurately in a very wide range of languages.

Wrong character encoding with database output using laravel

I've recently started using laravel for a project I'm working on, and I'm currently having problems displaying data from my database in the correct character encoding.
My current system consists of a separate script responsible for populating the database with data, while the laravel project is reponsible for displaying the data. The view that is used, is set to display all text as utf-8, which works as I've successfully printed special characters in the view. Text from the database is not printed as utf8, and will not print special characters the right way. I've tried using both eloquent models and DB::select(), but they both show the same poor result.
charset in database.php is set to utf8 while collation is set to utf8_unicode_ci.
The database table:
CREATE TABLE `RssFeedItem` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`feedId` smallint(5) unsigned NOT NULL,
`title` varchar(250) COLLATE utf8_unicode_ci NOT NULL,
`url` varchar(250) COLLATE utf8_unicode_ci NOT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`text` mediumtext COLLATE utf8_unicode_ci,
`textSha1` varchar(250) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `url` (`url`),
KEY `feedId` (`feedId`),
CONSTRAINT `RssFeedItem_ibfk_1` FOREIGN KEY (`feedId`) REFERENCES `RssFeed` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6370 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I've also set up a test page in order to see if the problem could be my database setup, but the test page prints everything just fine. The test page uses PDO to select all data, and prints it on a simple html page.
Does anyone know what the problem might be? I've tried searching around with no luck besides this link, but I haven't found anything that might help me.

I did eventually end up solving this myself. The problem was caused by the separate script responsible for populating my database with data. This was solved by running a query with SET NAMES utf8 before inserting data to the database. The original data was pulled out, and then sent back in after running said query.
The reason for it working outside laravel, was simply because the said query wasn't executed on my test page. If i ran the query before retrieving the data, it came out with the wrong encoding because the query stated that the data was encoded as utf8, when it really wasn't.

How to convert latin1_swedish_ci data into utf8_general_ci?

I have a MySQL database with all the table fields collation as
latin1_swedish_ci
It has almost 1000 of the records already stored and now I want to convert all these data into
utf8_general_ci
So that I can display any language content. I have already altered the field collations into utf8_general_ci but this does not CONVERT all the old records into utf8_general_ci

one funny thing.
CONVERT TO CHARSET and CONVERT()/CAST() suggested by Anshu will work fine if charset in the table is in right encoding.
If for some reason latin1 column containts utf8 text, CONVERT() and CAST() will not be able to help. I had "messed" my database with that setup so spend bit more time on solving this.
to fix this in addition to character set conversion, there are several exercises required.
"Hard one" is to recreate the database from dump that will be converted via console
"Simple one" is to convert row by row or table by table:
INSERT INTO UTF8_TABLE (UTF8_FIELD)
SELECT convert(cast(convert(LATIN1_FIELD using latin1) as binary) using utf8)
FROM LATIN1_TABLE;
basically, both cases will process string to original symbols and then to right encoding, that won't happen with simple convert(field using encoding) from table; command.

Export your table.
Drop the table.
Open the export file in the editor.
Edit it manually where the table structure is created.
old query:
CREATE TABLE `message` (
`message_id` int(11) NOT NULL,
`message_thread_id` int(11) NOT NULL,
`message_from` int(11) NOT NULL,
`message_to` int(11) NOT NULL,
`message_text` longtext NOT NULL,
`message_time` varchar(50) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
new query: ( suppose you want to change message_text field. )
CREATE TABLE `message` (
`message_id` int(11) NOT NULL,
`message_thread_id` int(11) NOT NULL,
`message_from` int(11) NOT NULL,
`message_to` int(11) NOT NULL,
`message_text` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`message_time` varchar(50) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
save the file and import back to the database.

Trouble with encondig in Mysql

I have the next table:
CREATE TABLE IF NOT EXISTS `applications` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I want to store the value "España" in the "name" field.
I have a PHP FILE (encoded in UTF8) with a form to save that. When i save "España" using the php file and i read from mysql with php i see the data ok.
But if a go to PMA o Mysql Query Browser i see this: "EspaÃ±a"
If i save it from PMA (with encoding set to UTF-8 ) or mysql query browser i see ok on that two tools, but i see "Espa�a" from PHP.
I dont understand why.
In bytes:
If is saved from PHP i see: C3 83 C2 B1 (for ñ)
If is saved from MQB or PMA i see: C3 B1 (for ñ)

Run mysql_set_charset() before executing queries on the open connection.
mysql_set_charset("UTF-8");

The problem was the php mysql client. It uses latin1 as encoding for the connection, you can see this with:
echo mysql_client_encoding($con);
or
print_r(mysql_fetch_assoc(mysql_query("show variables like 'char%';")));
There is two ways to solve this:
mysql_set_charset("utf8");
or
mysql_query("SET CHARACTER SET 'UTF8'", $con); // data send by the server
mysql_query("SET NAMES 'UTF8'", $con); // data send by the client

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.