Confusion with utf8_general_ci & utf8_unicode_ci - php

The MySQL server collation is utf8_general_ci in my.cnf.
I am using the utf8_general_ci collation for the database, but I have now created a few tables with the utf8_unicode_ci collation in the same database.
Now I would like to use utf8_unicode_ci for the server, database, tables and fields. To do that, I first need to change the server collation to utf8_unicode_ci, then the database, tables and fields.
My question is: I already have data in tables stored using utf8_general_ci. Can I just keep it as it is, or do I need to do some kind of conversion?
The other thing is that, as you can see, the server-level collation is utf8_general_ci but the table- and field-level collation is utf8_unicode_ci. With my current setup, which collation does MySQL use when I store and retrieve data from these tables?
Thank you.

"Server level" collation means nothing for tables that already exist.
Server- and database-level charset (and collation) serve as mere default values for database (and table) creation.
Say, if you don't supply a collation when creating a database, it will be created using the server collation. But if you do, the supplied one will be used and the server collation won't interfere at all.
If you don't supply a collation in the table definition, the table will be created using the database collation. But if you do, the supplied one will be used and neither the server nor the database collation will affect your queries.
It's only the table- and field-level collation that matters.
"If I already have data in tables stored using utf8_general_ci, can I just keep it as it is?"
Yes. You can have tables with any charset in your database.
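The defaulting chain described above can be checked directly. The following is a minimal sketch, not taken from the original answer; `demo_db`, `t_default` and `t_explicit` are hypothetical names used only for illustration.

```sql
-- Hypothetical database and table names, for illustration only.
CREATE DATABASE demo_db CHARACTER SET utf8 COLLATE utf8_general_ci;
USE demo_db;

-- No collation given: the table inherits the database default (utf8_general_ci).
CREATE TABLE t_default (name VARCHAR(50));

-- Collation given explicitly: the database default is ignored.
CREATE TABLE t_explicit (name VARCHAR(50)) CHARACTER SET utf8 COLLATE utf8_unicode_ci;

-- Show which collation each column actually ended up with.
SELECT TABLE_NAME, COLUMN_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'demo_db';

-- Switching an existing table later is a single statement. The character set
-- stays utf8, so the stored bytes are unchanged; only the comparison rules
-- (and therefore the indexes on text columns) change.
ALTER TABLE t_default CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
```

Because both collations belong to the same utf8 character set, keeping the old tables as they are or converting them later are both workable; the ALTER is only needed if you want consistent sorting and comparison rules.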

Related

Working with SET NAMES utf8mb4 with utf8 tables

We have a large system based on MySQL 5.5.57 and PHP 5.6.37.
Currently the whole system works in utf8, including SET NAMES utf8 at the beginning of each DB connection.
I need to support emojis in one of the tables, so I need to switch it to utf8mb4. I don't want to switch the other tables.
My question is: if I change to SET NAMES utf8mb4 for all connections (utf8 and utf8mb4) and switch only the specific table to utf8mb4 (and only write mb4 data to this table), will the rest of the system work as before?
Can there be any issue from working with SET NAMES utf8mb4 with the utf8 tables/data/connections?
I think there should be no problem using SET NAMES utf8mb4 for all connections.
(utf8mb3 is a synonym of utf8 in MySQL; I'll use the former for clarity.)
utf8mb3 is a subset of utf8mb4, so your client's bytes will be happy either way (except for Emoji, which needs utf8mb4). When the bytes get to (or come from) a column that is declared only utf8mb3, there will be a check to verify that you are not storing Emoji or certain Chinese characters, but otherwise, it goes through with minimal fuss.
I suggest
ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4
as the 'right' way to convert a table. However, it converts all varchar/text columns. This may be bad...
If you JOIN a converted table to an unconverted table, then you will be trying to compare a utf8mb3 string to a utf8mb4 string. MySQL will throw up its hands and convert all rows from one to the other. That is, no INDEX will be useful.
So... Be sure to at least be consistent about any columns that are involved in JOINs.
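As a rough sketch of the two conversion options, assume a hypothetical `messages` table that needs Emoji and a hypothetical `users` table that stays utf8; the column names `body`, `user_name` and `name` are also invented for illustration.

```sql
-- Hypothetical table and column names, for illustration only.

-- Option 1: convert every VARCHAR/TEXT column of the table.
ALTER TABLE messages CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Option 2: convert only the column that must hold Emoji, leaving JOIN
-- columns in their original character set. MODIFY replaces the whole column
-- definition, so restate NOT NULL / DEFAULT if the column has them.
ALTER TABLE messages
  MODIFY body TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- List the character set of the candidate JOIN columns on both sides
-- to confirm that they still match.
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('messages', 'users')
  AND COLUMN_NAME IN ('user_name', 'name');
```

Option 2 is the narrower change when only one column needs Emoji, since it leaves the columns used in JOINs untouched.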

utf8mb4_unicode_ci Selected in PhpMyAdmin but WordPress Tables using utf8mb4_unicode_520_ci Collation

I have selected the utf8mb4_unicode_ci collation (since this was recommended instead of latin..) in phpMyAdmin in both places: in "Server connection collation" under General settings, and for the database under the Operations tab.
However, the tables in that database, which belong to a WordPress blog, are using the utf8mb4_unicode_520_ci collation (this can be seen in the main window, e.g. by clicking on that database).
My question is: is it a bad thing, or does it have any negative effect, that I have selected utf8mb4_unicode_ci while all of the tables in the WordPress database are using utf8mb4_unicode_520_ci?
1) Should I change the options from utf8mb4_unicode_ci to utf8mb4_unicode_520_ci in phpMyAdmin (in both places mentioned above)?
2) Or does it have no bad effect, so I should leave it as it is?
Hoping to get an answer to this query.
Thank you for reading.
When doing CREATE TABLE ..., the collation comes from one of two places:
the collation you state explicitly in the CREATE, or
the database's default collation (from CREATE DATABASE ...).
Similarly, when declaring a column, you can be either explicit or default to the TABLE's settings.
I prefer to be explicit, not letting things default.
There is no harm when the database / table / column disagree on CHARACTER SET and/or COLLATION.
Until you get to MySQL 8.0, utf8mb4_unicode_520_ci is the "best" collation available. (Best according to the Unicode standard; 8.0 adds utf8mb4_0900_ai_ci, which is based on the newer Unicode 9.0.)
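Being explicit at every level might look like the sketch below. The names `blog` and `notes` are hypothetical, and it assumes a server new enough to have utf8mb4_unicode_520_ci (MySQL 5.6 or later).

```sql
-- Hypothetical database and table names, for illustration only.
CREATE DATABASE blog CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;

CREATE TABLE blog.notes (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
  title VARCHAR(200) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci;

-- Compare the database default with what each table actually uses.
SELECT SCHEMA_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'blog';

SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'blog';
```

The two SELECTs make any disagreement between the database default and the tables visible, which is the situation described in the question.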

PHP + MySQL + Spanish

My system deals with Spanish data. I am using Laravel + MySQL. My database uses the default latin1 collation, and my table structure looks something like this:
CREATE TABLE `product` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(100) CHARACTER SET latin1 NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=298 DEFAULT CHARSET=utf8mb4;
I have a few questions:
1) I load data from a file into the db. Is it good practice to utf8_encode($name) before inserting into the db? I am currently doing so, otherwise some comparisons throw an error: SQLSTATE[HY000]: General error: 1267 Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_unicode_ci,COERCIBLE) for operation '='
2) If using utf8_encode is the way to go, do I need to utf8_encode even the name I want to search for? i.e. select ... where name = utf8_encode(name)?
3) Are there any flaws in the above, or a better way to handle it? I am working with Spanish (characters with accents) for the first time.
Your product.name column has the character set latin1. You know that. It also has the collation latin1_swedish_ci. That's the default. The original developers of MySQL are Swedish. Because you're working in Spanish, you probably want to use latin1_spanish_ci for your collation; it sorts Ñ after N. The other Latin-language collations sort them together.
Because your product.name column is stored in latin1, it is a bad, not a good, idea to use utf8_encode() on text before storing it to that column.
Your best course of action, especially if your application is new, is to make the character set for all columns utf8mb4. That means changing the defined character set of your name column. Then you can convert text strings to unicode before storing them.
You probably would be wise to make the default collation of each table utf8mb4_spanish_ci as well. Collations get baked into indexes for varchar() columns. (If you're working in traditional Spanish, in which ch is a distinct letter, use utf8mb4_spanish2_ci.)
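A possible way to carry out that change on the `product` table from the question is sketched below. It assumes the existing values in `name` are valid latin1 text (not bytes that were already double-encoded), and that you take a backup first.

```sql
-- Convert the whole table (all text columns plus the table default) in one step.
-- MySQL re-encodes the existing latin1 values as utf8mb4 during the conversion.
ALTER TABLE product CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_spanish_ci;

-- Make the connection use utf8mb4 too, so literals and results match the columns.
SET NAMES utf8mb4 COLLATE utf8mb4_spanish_ci;

-- Verify the character set and collation of each column afterwards.
SHOW FULL COLUMNS FROM product;
```

With the column and the connection using the same character set and collation, the "Illegal mix of collations" error from the question should no longer be triggered by a plain `name = '...'` comparison.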

What does collation utf8mb4_unicode_ci mean

I was working on a project and wanted to implement a posts table, similar to the WordPress posts table, to store page content.
So I basically copied the wp_posts table, whose content column is longtext. However, I noticed that under collation it had utf8mb4_unicode_ci.
I'm wondering what this means and what it's necessary for?
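utf8mb4_unicode_ci is a collation for the utf8mb4 character set, which supports full Unicode in MySQL databases. The _unicode_ci part means strings are compared using Unicode rules, case-insensitively.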
More information can be found here https://mathiasbynens.be/notes/mysql-utf8mb4
Basically, there are many characters in Unicode that can't be stored in a table with utf8, which results in data loss.
MySQL's utf8 stores symbols that take one to three bytes, but some UTF-8 symbols take four bytes, and these weren't supported until utf8mb4 replaced utf8.
In WordPress, this change from the utf8 collation caused problems for some users, mostly because utf8mb4_unicode_ci is supported only in MySQL 5.5.3+.
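The byte counts mentioned above can be seen with a short query. This is only an illustrative sketch: it uses hex literals with the `_utf8mb4` introducer so the result does not depend on the client encoding, and it needs a server that has utf8mb4 (MySQL 5.5.3 or later).

```sql
-- Byte length vs. character length for 1-, 2-, 3- and 4-byte UTF-8 sequences.
SELECT LENGTH(_utf8mb4 x'61')            AS bytes_a,       -- 'a' (U+0061)
       LENGTH(_utf8mb4 x'C3B1')          AS bytes_n_tilde, -- 'ñ' (U+00F1)
       LENGTH(_utf8mb4 x'E282AC')        AS bytes_euro,    -- '€' (U+20AC)
       LENGTH(_utf8mb4 x'F09F9880')      AS bytes_emoji,   -- U+1F600
       CHAR_LENGTH(_utf8mb4 x'F09F9880') AS chars_emoji;
```

The four LENGTH columns should come back as 1, 2, 3 and 4, while CHAR_LENGTH reports the emoji as a single character; only the last sequence is out of reach for the older utf8 character set.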

How to set a specific default charset for Multi Collation for a table

I created a table whose columns use different collations, including:
utf8_persian_ci
cp1256
Why different collations? Because some of the columns hold paths, and the charset that lets PHP create Persian folders/files is windows-1256, so I thought I needed to set the charset to cp1256 for saving paths in MySQL.
When I fetch rows from the table to show them in PHP, it shows ???? instead of Farsi characters. My default charset is set to UTF8.
So what is the problem: are the rows stored as ???, or does PHP show ??? instead of the Persian words?
