I created a database with the collation 'latin1_swedish_ci' and inserted some text into it.
Later I changed the database's character set to 'UTF-8', and the text stored in the database turned into strange characters.
How can I convert the existing text in the database to UTF-8 so that I can read it again?
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8;
Here is how to convert a single column to utf8:
ALTER TABLE tablename CHANGE column_name column_name VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NULL DEFAULT NULL;
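The "strange text" in the question is typically what appears when bytes written in one character set are read back in another. A minimal Python sketch of that effect follows. It is only an illustration, not a MySQL tool: the sample string is hypothetical, and Python's cp1252 codec stands in for MySQL's latin1, which is close to cp1252.

```python
# Hypothetical sample text; any non-ASCII string shows the same effect.
original = "café"

# Case A: bytes written as latin1 (approximated by cp1252) but read as UTF-8.
latin1_bytes = original.encode("cp1252")
read_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# Case B: bytes written as UTF-8 but read as latin1.
utf8_bytes = original.encode("utf-8")
read_as_latin1 = utf8_bytes.decode("cp1252")

# Case B is reversible as long as no bytes were lost along the way.
repaired = read_as_latin1.encode("cp1252").decode("utf-8")

print(repr(read_as_utf8))
print(repr(read_as_latin1))
print(repr(repaired))
```

Which case applies decides the fix, so it helps to inspect the stored bytes (for example with HEX()) before running any ALTER.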
Recently I noticed that a few queries take a very long time to execute. On closer inspection I found that a COLLATE clause is being applied in the WHERE clause, and that seems to cause the performance problem: if I run the query below without COLLATE, the database responds quickly:
SELECT notification_id FROM notification
WHERE ref_table = 2
AND ref_id = NAME_CONST('v_wall_detail_id',_utf8mb4'c37e32fc-b3b5-11ec-befc-02447a44a47c' COLLATE 'utf8mb4_unicode_ci')
MySQL version 5.7
Database Character Set: utf8mb4
Column Character set: UTF8
Column Data Type: CHAR(36) UUID
From PHP in Connection object passing: utf8mb4
Index is applied
This query is written in MySQL stored procedure
SHOW CREATE TABLE
CREATE TABLE `notification` (
`notification_id` CHAR(36) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`title` VARCHAR(500) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`created` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`notification_id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8mb4
SHOW VARIABLES LIKE 'coll%';
collation_connection utf8_general_ci
collation_database utf8mb4_unicode_ci
collation_server latin1_swedish_ci
SHOW VARIABLES LIKE 'char%';
character_set_client, Connection,Result, System: utf8
character_set_database utf8mb4
character_set_server latin1
character_set_system utf8
Any suggestions on what changes are needed to make my queries faster?
The column's character set is utf8, so I guess its collation is either utf8_general_ci or utf8_unicode_ci. You can check this way:
SELECT collation_name from INFORMATION_SCHEMA.COLUMNS
WHERE table_schema = '...your schema...' AND table_name = 'notification'
AND column_name = 'ref_id';
You are forcing it to compare to a string with a utf8mb4 charset and collation. An index is a sorted data structure, and the sort order depends on the collation of the column. Using that index means taking advantage of the sort order to look up values rapidly, without examining every row.
When you compare the column to a string with a different collation, MySQL cannot infer that the sort order or string equivalence of your UUID constant is compatible. So it must do the string comparison the hard way, row by row.
This is not a bug; it is how collations are intended to work. To take advantage of the index, you must compare to a string with a compatible collation.
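The point about sort order can be illustrated outside MySQL. The Python sketch below is an analogy only, not how InnoDB is implemented: the list plays the role of an index sorted under a case-insensitive ordering, and `index_lookup` is a hypothetical helper that binary-searches it under a given ordering.

```python
def index_lookup(sorted_keys, value, sort_key):
    """Binary search that is only valid if sorted_keys is ordered by sort_key."""
    target = sort_key(value)
    lo, hi = 0, len(sorted_keys)
    while lo < hi:
        mid = (lo + hi) // 2
        if sort_key(sorted_keys[mid]) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sorted_keys) and sort_key(sorted_keys[lo]) == target

# "Index" built with a case-insensitive ordering (stand-in for a _ci collation).
index_keys = sorted(["date", "Cherry", "banana", "Apple"], key=str.lower)

# Same ordering as the index: the binary search finds the value.
found_same_order = index_lookup(index_keys, "Cherry", str.lower)

# Different ordering (plain code-point order, stand-in for another collation):
# the list is not sorted under this ordering, so the search misses the value.
found_other_order = index_lookup(index_keys, "Cherry", lambda s: s)

# The only safe fallback under the other ordering is to scan every entry.
found_by_scan = any(key == "Cherry" for key in index_keys)

print(found_same_order, found_other_order, found_by_scan)
```

A database cannot risk the wrong answer in the second case, which is why it falls back to examining every row when the collations are incompatible.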
I tested and found that the following expressions fail to use the index:
Different character set, different collation:
WHERE ref_id = _utf8mb4'c37e32fc-b3b5-11ec-befc-02447a44a47c' COLLATE utf8mb4_general_ci
WHERE ref_id = _utf8mb4'c37e32fc-b3b5-11ec-befc-02447a44a47c' COLLATE utf8mb4_unicode_ci
Same character set, different collation:
WHERE ref_id = _utf8'c37e32fc-b3b5-11ec-befc-02447a44a47c' COLLATE 'utf8_unicode_ci'
The following expressions successfully use the index:
Different character set, default collation:
WHERE ref_id = _utf8mb4'c37e32fc-b3b5-11ec-befc-02447a44a47c'
Same character set, same collation:
WHERE ref_id = _utf8'c37e32fc-b3b5-11ec-befc-02447a44a47c' COLLATE 'utf8_general_ci'
Same character set, default collation:
WHERE ref_id = _utf8'c37e32fc-b3b5-11ec-befc-02447a44a47c'
To simplify your environment, I recommend using a single character set and a single collation in all tables and in your session. I suggest:
ALTER TABLE notification CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
This will rebuild the indexes on string columns, using the sort order for the specified collation.
Then using COLLATE utf8mb4_unicode_ci will be compatible, and will use the index.
P.S. In all cases I omitted the NAME_CONST() function, because it has no purpose in a WHERE clause as far as I know. I don't know why you are using it.
These settings say which character set and collation the client is using:
collation_connection utf8_general_ci
character_set_client, Connection,Result, System: utf8
Either change them or change the various columns to match them.
If you have stored routines, they need to be dropped and re-created: DROP them, run SET NAMES to match what you picked, then CREATE them again.
Since you are using 5.7, I recommend using utf8mb4 and utf8mb4_unicode_520_ci throughout.
I have a MySQL table with the utf8_general_ci collation. In the table, I can see two entries:
abad
abád
I am using a query that looks like this:
SELECT * FROM `words` WHERE `word` = 'abád'
The query result gives both words:
abad
abád
Is there a way to indicate that I only want MySQL to find the accented word? I want the query to only return
abád
I have also tried this query:
SELECT * FROM `words` WHERE BINARY `word` = 'abád'
It gives me no results. Thank you for the help.
If your searches on that field are always going to be accent-sensitive, declare the collation of the field as utf8_bin (which compares the utf8-encoded bytes for equality), or use a language-specific collation that distinguishes between accented and unaccented characters.
col_name varchar(10) collate utf8_bin
If searches are normally accent-insensitive, but you want to make an exception for this search, try:
WHERE col_name = 'abád' collate utf8_bin
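To see why the two collations give different results, here is a rough Python approximation. It is a sketch under stated assumptions: real MySQL collations use weight tables, whereas the hypothetical helpers below simply strip combining marks and fold case to imitate an accent- and case-insensitive comparison, and compare UTF-8 bytes to imitate a _bin comparison.

```python
import unicodedata

def strip_accents(text):
    """Decompose characters and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def equal_insensitive(a, b):
    """Rough stand-in for an accent- and case-insensitive collation."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()

def equal_binary(a, b):
    """Rough stand-in for a _bin collation: compare the encoded bytes."""
    return a.encode("utf-8") == b.encode("utf-8")

words = ["abad", "ab\u00e1d"]   # the two rows from the question
search = "ab\u00e1d"            # 'abád'

matches_insensitive = [w for w in words if equal_insensitive(w, search)]
matches_binary = [w for w in words if equal_binary(w, search)]

print(matches_insensitive)
print(matches_binary)
```

The insensitive comparison returns both rows, while the byte comparison returns only the accented one.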
Update for MySQL 8.0, plus addressing some of the Comments and other Answers:
The CHARACTER SET matches the beginning of the COLLATION.
Any COLLATION name ending in _bin compares the raw bytes, so it does no case folding and no accent stripping: both case and accents are significant. Examples: latin1_bin, utf8mb4_bin.
Any COLLATION name containing _as_ is accent-sensitive; whether it folds case depends on _ci vs _cs.
To see the collations available (on any version), do SHOW COLLATION;.
utf8mb4 is now the default charset. You should be using that instead of utf8.
It is better to have the CHARACTER SET and COLLATION set 'properly' on each column (or defaulted by the table definition) than to dynamically use any conversion routine such as CONVERT().
In my version (MySQL 5.0), there is no utf8 collation available for case-insensitive, accent-sensitive searches. The only accent-sensitive collation for utf8 is utf8_bin; however, it is also case-sensitive.
My workaround has been to use something like this:
SELECT * FROM `words` WHERE LOWER(`word`) = LOWER('aBád') COLLATE utf8_bin
Accepted answer is good, but beware that you may have to use COLLATE utf8mb4_bin instead!
WHERE col_name = 'abád' collate utf8mb4_bin
This fixes errors like:
MySQL said: #1253 - COLLATION 'utf8_bin' is not valid for CHARACTER SET 'utf8mb4'
The MySQL bug, for future reference, is http://bugs.mysql.com/bug.php?id=19567.
Check whether the table's collation ends with "_ci", which stands for case-insensitive.
Change it to the same or the nearest collation without the "_ci" suffix.
For example, change "utf8_general_ci" to "utf8_bin".
I was getting the same error.
I've changed the collation of my table to utf8_bin (through phpMyAdmin) and the problem was solved.
SELECT * FROM `words` WHERE `word` = 'abád' collate latin1_general_cs
(or your collation including cs)
You can try searching by the hex value of the character: use HEX() within MySQL and an equivalent function in your programming language, and match the results. This worked well for me when I was building a listing where a user could select the first letter of a person's name.
Well, you just described what the utf8_general_ci collation is all about (a, á, à, â, ä, å all compare equal to a).
There have also been changes in MySQL server 5.1 with regard to utf8_general_ci and utf8_unicode_ci, so it depends on the server version too. Better check the docs.
So, if it's MySQL server 5.0, I'd go for utf8_unicode_ci instead of utf8_general_ci, which is obviously wrong for your use case.
The following works for me for an accent-insensitive and case-insensitive search on MySQL server 5.1, in a utf8_general_ci database where the column is a LONGBLOB:
select * from words where '%word%' LIKE column collate utf8_unicode_ci
With
select * from words where '%word%' LIKE column collate utf8_general_ci
the result is case-sensitive but not accent-sensitive.
While altering my column, I got the following error:
ALTER TABLE mytab MODIFY mydate date CHARACTER SET utf8;
1:05:00 ALTER mytab TABLE MODIFY mydate Date CHARACTER SET utf8
COLLATE utf8_unicode_ci Error Code: 1064. Syntax error near 'CHARACTER
SET utf8 COLLATE utf8_unicode_ci' at line 1 0.000 sec
Please help. The same thing is happening with the datatypes int and datetime.
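You can only specify a character set for string columns such as CHAR, VARCHAR and TEXT (ENUM and SET also accept one). DATE, INT and DATETIME columns do not take a character set, which is why the statement fails with a syntax error.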
I can't convert data from Latin1_swedish to UTF-8.
The application is based on Symfony2 and the database is MySQL.
I've already tried this query:
ALTER TABLE <tablename> CONVERT TO CHARACTER
SET utf8 COLLATE utf8_unicode_ci
and:
ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET utf8;
I would like a solution that handles all the tables and columns, because the MySQL database has 1000 tables. Modifying them all manually would take too long.
Since you mentioned "weird characters", I suspect that "changing from latin1 to utf8" is not the real task, but rather to fix up some kind of mess that happened during INSERTs.
There are about 5 cases to deal with, and we don't yet know which one you have. Please provide:
SHOW CREATE TABLE for a table that you are trying to change.
SELECT col, HEX(col) ... for some cell that has non-ascii text.
Let's review the attempts:
ALTER TABLE <tablename> CONVERT TO CHARACTER SET utf8;
That assumes the table is declared to be latin1 and correctly contains latin1 bytes, but you would like to change it to utf8. Since 'Ă' and 'Ĺ' do not exist in latin1, this ALTER feels very wrong.
ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET utf8;
is similar to the above, but works only one column at a time, and needs exactly the right stuff in the MODIFY clause. Hence, it would be quite tedious.
ALTER DATABASE databasename DEFAULT CHARACTER SET utf8;
merely sets the default CHARACTER SET for any new tables created in that databasename. The word DEFAULT is optional.
HEX('ĂĹ') = 'C482C4B9' -- So it looks like you are working with some Eastern European language, perhaps using utf8, perhaps not. Please provide further details. What came before and after Ă?
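The two observations above (the hex value, and the fact that these characters have no latin1 encoding) can be checked with a short Python sketch. It only inspects the two characters quoted in the answer; it does not diagnose the actual table.

```python
# The two characters quoted above: U+0102 (Ă) and U+0139 (Ĺ).
text = "\u0102\u0139"

# UTF-8 encoding of the two characters, as uppercase hex (what HEX() would show
# if the column really holds these characters in utf8).
utf8_hex = text.encode("utf-8").hex().upper()

# Neither character has a latin1 code point, so encoding must fail.
try:
    text.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

print(utf8_hex)
print(fits_latin1)
```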
The fix for the "weird characters" is probably in my blog, but I need more details to point you to the right part.
Have you tried this?
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
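For a database with many tables, the per-table statement can be generated rather than typed. The Python sketch below only builds the SQL text; the helper name and the sample table names are hypothetical, and in practice the list of names would come from INFORMATION_SCHEMA.TABLES. It assumes the tables really contain correctly stored latin1 data, which, as noted above, may not be the case when the text already looks garbled.

```python
def build_convert_statements(table_names, charset="utf8", collation="utf8_unicode_ci"):
    """Return one ALTER TABLE ... CONVERT TO statement per table name."""
    statements = []
    for name in table_names:
        # Quote the identifier with backticks, doubling any embedded backtick.
        quoted = "`" + name.replace("`", "``") + "`"
        statements.append(
            f"ALTER TABLE {quoted} CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements

# Hypothetical table names, for illustration only.
tables = ["customer", "order_item"]

for statement in build_convert_statements(tables):
    print(statement)
```

Reviewing the generated statements and trying them on a copy of the data first is safer than running them directly against production.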
If you would like to change the entire database, run this command:
ALTER DATABASE db_name DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
I have a MySQL table that I created entirely in latin1. How can I make a table that is all latin1 except for one column, which needs to accept Chinese characters?
Also, what is the best structure for a column holding Chinese characters?
alter table your_table
modify column
chinese_column varchar(255) collate utf8_general_ci; -- or any other suitable collation
details can be found here
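A short Python sketch shows why that column needs a Unicode character set. The sample characters are hypothetical. It also shows a case the answer does not cover: MySQL's utf8 stores at most 3 bytes per character, so rarer characters that need 4 bytes in UTF-8 require utf8mb4 instead.

```python
# Two common Chinese characters (U+4E2D U+6587).
common = "\u4e2d\u6587"

# latin1 has no code points for them, so a latin1 column cannot hold them.
try:
    common.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

# Each of these characters takes 3 bytes in UTF-8.
bytes_per_common_char = [len(ch.encode("utf-8")) for ch in common]

# A rarer character from CJK Extension B (U+20000) takes 4 bytes in UTF-8.
rare = "\U00020000"
bytes_for_rare_char = len(rare.encode("utf-8"))

print(fits_latin1, bytes_per_common_char, bytes_for_rare_char)
```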