Could someone tell me how to force Doctrine to create database tables with UTF-8 encoding and the utf8_polish_ci collation? My Doctrine config file has these DB configuration parameters:
$conn = array(
'driver' => 'pdo_mysql',
'dbname' => 'test',
'user' => 'root',
'password' => '*****',
'charset' => 'utf8',
'driverOptions' => array(1002=>'SET NAMES utf8'));
Nevertheless, it still creates tables with the default encoding and collation: latin1 and latin1_swedish_ci.
You set it in your database; Doctrine just uses the database's default values. See this question from the Doctrine 2.1 FAQ:
4.1.1. How do I set the charset and collation for MySQL tables?
You can’t set these values inside the annotations, yml or xml mapping files. To make a database work with the default charset and collation you should configure MySQL to use it as default charset, or create the database with charset and collation details. This way they get inherited to all newly created database tables and columns.
Use the code below to set the collation, charset and engine for a Doctrine table:
/**
* @ORM\Table(name="temporary", options={"collate"="utf8_polish_ci", "charset"="utf8", "engine"="MyISAM"})
* @ORM\Entity
*/
When you create your database, you should create it like this:
CREATE DATABASE `your_database_name` DEFAULT CHARACTER SET utf8 COLLATE utf8_polish_ci;
That will allow the tables you create to inherit the charset and collation values.
Related
I'm working on a project using Doctrine 2.4.3 with a MySQL 5.7.21 database with utf8 as default charset.
Recently, I've been looking to implement emoji support. To overcome MySQL's limitation of 3 bytes for utf8, I need to change the columns that can receive emojis to the utf8mb4 charset (see https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html).
However, I have not found a way to reflect this in my entities (using annotations).
My database connection config is the following :
$data = array(
'driver' => 'pdo_mysql',
'host' => $dbhost,
'port' => $dbport,
'dbname' => $dbname,
'user' => $dbuser,
'password' => $dbpw,
'charset' => 'utf8mb4'
);
I tried adding annotations to the table :
/** @Entity(repositoryClass="path\to\DAO") @Table(name="post", indexes={@Index(name="uid", columns={"uid"})}, options={"charset":"utf8mb4", "collation":"utf8mb4_unicode_ci"})
* @HasLifecycleCallbacks */
class Post extends BaseEntity
{
...
}
In the same fashion, tried adding annotations to the column (in the same table) itself :
/** @Column(type="text", options={"charset":"utf8mb4", "collation":"utf8mb4_unicode_ci"}) */
protected $text;
None of the above worked. I expected an ALTER TABLE query when executing doctrine orm:schema-tool:update --dump-sql, but Doctrine sees no change, and I still can't insert 4-byte emojis.
If I update the column's charset myself directly in MySQL, emojis are supported. But when I then run orm:schema-tool:update, Doctrine sees a difference between my entity and the schema and does not seem to know what to make of it, since the output I get is the following:
ALTER TABLE post CHANGE text text LONGTEXT NOT NULL ;
I also tried adding SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci as driverOptions in my database connection config array, but that had no effect either.
Unfortunately, I could not find anything regarding this matter in Doctrine's documentation.
If any of you has any clue regarding this matter, feel free to hit me up! Thanks in advance.
To convert the whole table:
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8mb4;
Please provide
SHOW CREATE TABLE ...
For more troubleshooting: Trouble with UTF-8 characters; what I see is not what I stored
As I have legacy requirements and cannot update Doctrine's lib as of right now, I had to find a workaround.
What I did was manually convert my tables to utf8mb4 with SQL queries. Doctrine does not revert them to utf8 when orm:schema-tool:update --force is executed after the charset conversion.
For the record, I generated the update statements with the following script :
SELECT CONCAT('ALTER TABLE ', t.table_schema, '.', t.table_name, ' CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;')
FROM information_schema.tables t
WHERE t.table_schema LIKE '{your_schema}'
AND t.table_type = 'BASE TABLE';
^ Do not execute this blindly - check beforehand if existing data will fit while utf8mb4 encoded. For more details check the very good article from Mathias Bynens on the matter : https://mathiasbynens.be/notes/mysql-utf8mb4#column-index-length
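One thing that may stop fitting is an indexed string column. The sketch below is only an illustration in Python: it assumes the 767-byte InnoDB index key prefix limit described in the linked article (the real limit depends on the MySQL version and row format, so treat the constant as an assumption), and the function name is made up for this example.

```python
# Assumption: 767 bytes is the index key prefix limit for older InnoDB row formats.
INDEX_PREFIX_LIMIT_BYTES = 767


def max_indexable_chars(bytes_per_char: int, limit: int = INDEX_PREFIX_LIMIT_BYTES) -> int:
    """Longest string column (in characters) that fits entirely in an index key."""
    return limit // bytes_per_char


print(max_indexable_chars(3))  # utf8: up to 3 bytes per character -> 255
print(max_indexable_chars(4))  # utf8mb4: up to 4 bytes per character -> 191
```

Under that assumption, an indexed VARCHAR(255) column would have to be shortened to VARCHAR(191), or indexed by prefix only, before the conversion.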
I also changed the database's charset settings.
ALTER DATABASE {database_name} CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
I did keep 'charset' => 'utf8mb4' in Doctrine's database connection settings array so that the data is transmitted correctly.
For new entities (tables), annotating them with correct settings in table options does create them with the right charset and collation :
@Entity @Table(name="table", options={"charset":"utf8mb4", "collate":"utf8mb4_unicode_ci"})
Cheers.
Inserting UTF-8 encoded string into UTF-8 encoded table gives incorrect string value.
PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xF0\x9D\x84\x8E i...' for column 'body_value' at row 1: INSERT INTO
I have a 𝄎 character, in a string that mb_detect_encoding claims is UTF-8 encoded.
I try to insert this string into a MySQL table, which is defined as (among other things) DEFAULT CHARSET=utf8
Edit: Drupal always does SET NAMES utf8 with optional COLLATE (at least when talking to MySQL).
Edit 2: Some more details that appear to be relevant. I grab some text from a PostgreSQL database. I stick it onto an object, use mb_detect_encoding to verify that it's UTF-8, and persist the object to the database, using node_save. So while there is an HTTP request that triggers the import, the data does not come from the browser.
Edit 3: Data is denormalized over two tables:
SELECT character_set_name FROM information_schema.COLUMNS C WHERE table_schema = "[database]" AND table_name IN ("field_data_body", "field_revision_body") AND column_name = "body_value";
+--------------------+
| character_set_name |
+--------------------+
| utf8               |
| utf8               |
+--------------------+
Edit 4: Is it possible that the character is "too new"? I'm more than a little fuzzy on the relationship between Unicode and UTF-8, but this Wikipedia article implies that the character was standardized very recently.
I don't understand how that can fail with "Incorrect string value".
𝄎 (U+1D10E) is a Unicode character outside the BMP (Basic Multilingual Plane), that is, above U+FFFF, and thus can't be represented in UTF-8 in 3 bytes; it takes 4. MySQL's utf8 charset only accepts UTF-8 characters that can be represented in 3 bytes. If you need to store this character in MySQL, you'll need the utf8mb4 charset, which requires MySQL 5.5.3 or later. You can use ALTER TABLE to change the character set without much trouble, but since utf8mb4 needs more space per character, a couple of issues may show up that require you to reduce string sizes. See http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html .
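To see which characters in a string are affected, you can count the bytes each one takes in UTF-8. This is a minimal Python sketch rather than anything MySQL-specific; the function names are made up for illustration, and it relies only on the 3-byte limit of MySQL's utf8 charset described above.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text: str) -> list:
    """Characters in text that take 4 bytes in UTF-8 (code points above U+FFFF)."""
    return [ch for ch in text if utf8_length(ch) == 4]


# U+00F1 (2 bytes), U+2600 (3 bytes), U+1D10E (4 bytes, the character from the question)
sample = "\u00f1\u2600\U0001D10E"
for ch in sample:
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s)")

print(needs_utf8mb4(sample) == ["\U0001D10E"])
```

Only the last character needs utf8mb4; its UTF-8 bytes are F0 9D 84 8E, the same sequence shown in the error message.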
To solve this issue, first change your database field to the utf8mb4 charset. For example:
ALTER TABLE `tb_name` CHANGE `field_name` `field_name` VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL;
Then, in your DB connection, set the charset to utf8mb4. For example, if you use PDO:
$db = new PDO('mysql:host=localhost;dbname=testdb;charset=utf8mb4', 'username', 'password');
Or in Zend Framework 1.2:
$dbParam = array('host' => 'localhost', 'username' => 'db_user_name',
'password' => 'password', 'dbname' => 'db_name',
'driver_options' => array(
'1002' => "SET NAMES 'utf8mb4'",
'12' => 0 //this is not necessary
)
);
In your PDO connection, set the charset.
new PDO('mysql:host=localhost;dbname=the_db;charset=utf8mb4', $user, $password);
I fixed the error:
SQLSTATE[HY000]: General error: 1366 Incorrect string value ......
with this method:
I use utf8mb4_unicode_ci for database
Set utf8mb4_unicode_ci for all tables
Set the longblob datatype for the column (not text or longtext; you need a big datatype to store the 4 bytes of your content)
It is okay now.
If you use laravel, continue to edit config/database.php
'charset' => 'utf8mb4',
'collation' => 'utf8mb4_unicode_ci',
If you use the strtolower function, replace it with mb_strtolower.
Note: you have to put <meta charset="utf-8"> in your head tag.
THE SITUATION:
Sorry in advance if this question has already been asked, but the solutions aren't working for me.
No matter what I try, I cannot store emoji in my database. They are saved as ????.
The only emojis that are properly saved are the ones that require only 3 bytes, like the shy face or the sun.
The actual utf8mb4 is not working.
It has been tested on both Android and iOS, with the same results.
VERSIONS:
Mysql: 5.5.49
CodeIgniter: 3.0.0
THE STEPS:
I have modified database character set and collation properties.
ALTER DATABASE my_database CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci
I have modified table character set and collation properties.
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
I have set each field of the table, where possible, as Encoding: UTF-8 (utf8mb4) and Collation: utf8mb4_unicode_ci
I have modified the database connection in the CodeIgniter app.
I have run the following: SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci
Lastly I have also tried this:
REPAIR TABLE table_name;
OPTIMIZE TABLE table_name;
Everything should have been set up properly, yet it doesn't work.
DATABASE SETTINGS:
This is the outcome running the following command:
`SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';`
TABLE SETTINGS:
A screenshot of the table structure:
DATABASE CONNECTION:
These are the database connection settings inside database.php (note this is not the only database, there are also others that connect using utf8)
$db['my_database'] = array(
'dsn' => '',
'hostname' => PROJECT_DATABASE_HOSTNAME,
'username' => PROJECT_DATABASE_USERNAME,
'password' => PROJECT_DATABASE_PASSWORD,
'database' => PROJECT_DATABASE_NAME,
'dbdriver' => 'mysqli',
'dbprefix' => '',
'pconnect' => FALSE,
'db_debug' => TRUE,
'cache_on' => FALSE,
'cachedir' => '',
'char_set' => 'utf8mb4',
'dbcollat' => 'utf8mb4_unicode_ci',
'swap_pre' => '',
'encrypt' => FALSE,
'compress' => FALSE,
'stricton' => FALSE,
'failover' => array(),
'save_queries' => TRUE
);
MY.CNF SETTINGS:
This is the whole content of the file my.cnf:
[mysqld]
default-storage-engine=MyISAM
innodb_file_per_table=1
max_allowed_packet=268435456
open_files_limit=10000
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
THE QUESTION:
Do you know why it is not working? Am I missing something?
HYPOTHESIS 1:
I am not sure, but the cause of the problem may be this:
As you can see in my.cnf character-set-server is clearly set as utf8mb4:
But after running the query in the database:
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
The outcome is that character-set-server = latin1
Do you know why that is? Why is it not actually updating?
HYPOTHESIS 2:
The application use several different databases.
This one is set to utf8mb4 but all the others are set to utf8. Could that be a problem even though they are separate databases?
Thank you!
EDIT:
This is the outcome of SHOW CREATE TABLE app_messages;
CREATE TABLE `app_messages` (
`message_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`project_id` bigint(20) NOT NULL,
`sender_id` bigint(20) NOT NULL,
`receiver_id` bigint(20) NOT NULL,
`message` text COLLATE utf8mb4_unicode_ci,
`timestamp` bigint(20) DEFAULT NULL,
`is_read` enum('x','') COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`message_id`)
) ENGINE=InnoDB AUTO_INCREMENT=496 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
EDIT 2:
I have run the following command:
INSERT INTO app_messages (message_id, project_id, sender_id, receiver_id, message, timestamp, is_read)
VALUES ('496','322','77','188', '😜' ,'1473413606','x');
And two other similar ones with 😂 and 👻.
They were inserted in the table without problems:
But in the actual app what I really see is: ? (this time only one ? and not 4)
Okay, I finally managed to get it working!
Thanks to everybody that tried to help me, especially @Rick James and @Gerard Roche.
SUGGESTION:
If you need to work with emoji, first of all run simple tests on localhost. Create a new database and a fresh app for testing purposes.
If you follow the steps I wrote in the question or if you follow this tutorial: https://mathiasbynens.be/notes/mysql-utf8mb4#utf8-to-utf8mb4 it should work.
Working locally on a fresh basic app gives you more control and more room to run all the tests you need.
SOLUTION:
In my case the problem was in the database configuration in CodeIgniter. The char_set and the collation were not being set properly because of a silly oversight: I was overriding the database settings in the function that saves messages, to be sure it was working with the mobile database.
BEFORE:
function message_save ( $data = FALSE )
{
$project_db_config = array();
$project_db_config['hostname'] = 'MY_HOST';
$project_db_config['username'] = 'MY_USERNAME';
$project_db_config['password'] = 'MY_PASSWORD';
$project_db_config['database'] = 'MY_DATABASE';
$mobile_db = $this->load->database( $project_db_config, TRUE );
// other code to save message
}
AFTER:
function message_save ( $data = FALSE )
{
$mobile_db_connection = $this->load->database('admin_mobile_mh', TRUE);
// other code to save message
}
CONCLUSION:
The app must set up the connection to the database properly.
If the database is set up properly but your app doesn't make the proper connection, it won't work.
So if you encounter similar problems, make sure the API properly sets char_set to utf8mb4 and dbcollat to utf8mb4_unicode_ci.
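The check can be automated. The following is a language-neutral illustration written in Python, not CodeIgniter code: it treats a connection config as a plain dictionary using the char_set and dbcollat keys from database.php, and the helper name is hypothetical.

```python
# Settings the connection config is expected to carry (taken from database.php above).
REQUIRED = {"char_set": "utf8mb4", "dbcollat": "utf8mb4_unicode_ci"}


def missing_charset_settings(config: dict) -> dict:
    """Return the required settings that are absent from config or have another value."""
    return {key: value for key, value in REQUIRED.items() if config.get(key) != value}


# Mirrors the ad-hoc config from the BEFORE snippet: no charset keys at all.
before = {
    "hostname": "MY_HOST",
    "username": "MY_USERNAME",
    "password": "MY_PASSWORD",
    "database": "MY_DATABASE",
}
after = dict(before, char_set="utf8mb4", dbcollat="utf8mb4_unicode_ci")

print(missing_charset_settings(before))
print(missing_charset_settings(after))
```

For the BEFORE-style config both settings are reported as missing, which matches what went wrong here: the override silently dropped the charset settings.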
The only way I know of to get ???? for an Emoji is to not have the column declared utf8mb4. I understand that you have apparently determined that the column is declared that way, but please run SHOW CREATE TABLE table_name; to further confirm it.
The system default, the database default, and the table default are irrelevant if the column overrides the CHARACTER SET.
A note to all the other attempted answers: The COLLATION is irrelevant, only the CHARACTER SET is relevant for this question.
my.cnf is loaded first, then conf.d/*.cnf.
Instead of modifying my.cnf (which may be overridden by configurations in conf.d/*.cnf), create a custom override configuration, e.g. conf.d/90-my.cnf.
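Prefixing 90 is meant to make the custom settings load last, so that they override any earlier settings. The MySQL manual does not guarantee the order in which files in an included directory are read, so confirm the effective values with SHOW VARIABLES.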
To ensure the new configuration is reloaded, see Reload Without Restarting the MySQL service.
Example Configuration Structure (Linux)
.
├── conf.d
│ ├── 90-my.cnf
│ ├── conn.cnf
│ ├── my5.6.cnf
│ └── mysqld_safe_syslog.cnf
├── debian.cnf
├── debian-start
└── my.cnf
conf.d/90-my.cnf
# https://mathiasbynens.be/notes/mysql-utf8mb4
# http://stackoverflow.com/q/3513773/934739
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE
# The server character set and collation are used as default values if the
# database character set and collation are not specified in CREATE DATABASE
# statements. They have no other purpose.
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
Instead of leaving it as a plain varchar, you can change the table field's charset to utf8mb4 as follows:
Make sure all your tables' default character sets and text fields are converted to utf8mb4, in addition to setting the client & server character sets, e.g. ALTER TABLE mytable charset=utf8mb4, MODIFY COLUMN textfield1 VARCHAR(255) CHARACTER SET utf8mb4,MODIFY COLUMN textfield2 VARCHAR(255) CHARACTER SET utf8mb4; and so on.
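When a table has many text columns, the statement can be generated instead of typed by hand. This is a rough Python sketch with a hypothetical helper name. It assumes the table and column names come from a trusted source (they are not escaped), and that each type string carries the full column definition, since MODIFY COLUMN replaces the whole definition.

```python
def build_convert_statement(table: str, columns: dict, charset: str = "utf8mb4") -> str:
    """Build one ALTER TABLE statement that changes the table default and each text column.

    columns maps a column name to its full type definition, e.g. "VARCHAR(255)".
    """
    parts = [f"ALTER TABLE `{table}` CHARSET={charset}"]
    for name, col_type in columns.items():
        parts.append(f"MODIFY COLUMN `{name}` {col_type} CHARACTER SET {charset}")
    return ",\n  ".join(parts) + ";"


print(build_convert_statement(
    "mytable",
    {"textfield1": "VARCHAR(255)", "textfield2": "VARCHAR(255)"},
))
```

Review the generated SQL before running it, for the same reasons as any other bulk conversion.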
Hi, I have used emoji in Android and stored them in an ORM database using EMOJI_INDEX. I save the message in the DB as a normal string, and when I read it back I check whether it contains any emoji and, if so, convert them with processEmoji.
textMessage.setText(getItem(pos).file != null ? "":EmojiUtil.getInstance(context).processEmoji(getItem(pos).message, textMessage.getTextSize()));
Here is how I changed Emoji_Index to process them:
if (emojiImages == null || emojiImages.isRecycled()) {
InputStream localInputStream;
try {
localInputStream = context.getAssets().open("emoji/emoji_2x.png");
Options opts = new Options();
opts.inPurgeable = true;
opts.inInputShareable = true;
emojiImages = BitmapFactory.decodeStream(localInputStream, null, opts);
} catch (IOException e) {
return Html.fromHtml(paramString);
}
}
For more information take a look from here.
Thanks hope this will help you.
I had a problem with the server version, on Linux. I had to change the file database_interface.lib.php manually. Around this:
if (!PMA_DRIZZLE) {
if (! empty($GLOBALS['collation_connection'])) {
change it so that it becomes this (note the utf8mb4_unicode_ci references):
// Skip charsets for Drizzle
if (!PMA_DRIZZLE) {
if (! empty($GLOBALS['collation_connection'])) {
PMA_DBI_query("SET CHARACTER SET 'utf8mb4';", $link, PMA_DBI_QUERY_STORE);
$set_collation_con_query = "SET collation_connection = '"
. PMA_Util::sqlAddSlashes($GLOBALS['collation_connection']) . "';";
PMA_DBI_query(
$set_collation_con_query,
$link,
PMA_DBI_QUERY_STORE
);
} else {
PMA_DBI_query(
"SET NAMES 'utf8mb4' COLLATE 'utf8mb4_unicode_ci';",
$link,
PMA_DBI_QUERY_STORE
);
}
}
Updated answer
You can try charset utf8 collation utf8_unicode_ci instead of utf8mb4_unicode_ci.
run this query
ALTER TABLE table_name CHANGE `column_name` `column_name` TEXT CHARSET utf8 COLLATE utf8_unicode_ci;
old answer
You should use collation utf8mb4_bin instead of utf8mb4_unicode_ci.
run this query
ALTER TABLE table_name CHANGE `column_name` `column_name` TEXT CHARSET utf8mb4 COLLATE utf8mb4_bin;
Emojis will be stored as code and converted into emojis again in Android and iOS apps. I have used this code in my projects as well.
I have a project in Phalcon PHP and MySql.
When I have to store UTF-8 characters, they are stored with errors.
For example:
I save : nueva descripción ñññ
in Database: nueva descipción ñññ
I have tried several types of collations both in the database, tables and fields.
Thanks for your help.
Besides having properly defined database elements, you also have to set your connection to use UTF-8 encoding. Since Phalcon uses PDO, you can try modifying your connection like this:
$di["db"] = function() {
return new \Phalcon\Db\Adapter\Pdo\Mysql(array(
"host" => "localhost",
"username" => "root",
"password" => "1234",
"dbname" => "test",
"options" => array( // this is your important part
PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8'
)
));
};
Example from Phalcon Forum.
Since I work with the Polish language, my DB collations are mostly set to utf8_polish_ci or sometimes to utf8_unicode_ci. You have to test this yourself, because the collation affects how results are sorted.
Check whether your project database uses the utf8_unicode_ci collation.
Also check that each individual table uses the utf8_unicode_ci collation.
If that is not the case, check the my.ini file of your Apache/MySQL setup.
In it, make sure the UTF-8 settings are not commented out with a hash (#), like this:
## UTF 8 Settings
init-connect='SET NAMES utf8'
collation_server=utf8_unicode_ci
character_set_server=utf8
I have a feed that I pull data into a database from. It provides the data in XML format. However, the data includes "illegal" characters. For example:
A GREAT NEIGHBOURHOOD – WITH A
or
large “country style†eat-in
or
Garage 14’x32’, large
or
OR…….ENDLESS POSSIBILITIES!!
My question is first, how do I identify the encoding of these characters, and second, how do I change the encoding to match the UTF8 format expected by my database?
EDIT: To be clear, there's no database involved in this process (at this point in the process, anyway). The data will be inserted into the DB later, but at the moment I'm just reading the data via a PHP script and printing it on screen using var_dump.
EDIT 2: the data is being pulled from a RETS feed using the PHP PHRETS library
The problem is that your UTF-8 response is treated in a different way or the database is not set up correctly. Here are some examples of where this could happen and how to fix it.
Before Using Curl
header("Content-Type: text/html; charset=utf-8");
Mysql (my.cnf)
[client]
default-character-set=utf8
[mysql]
default-character-set=utf8
[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
When Creating The Database Manually
CREATE DATABASE `your_database_name` DEFAULT CHARACTER SET utf8 COLLATE utf8_polish_ci;
When Using Frameworks such as Doctrine
$conn = array(
'driver' => 'pdo_mysql',
'dbname' => 'test',
'user' => 'root',
'password' => '*****',
'charset' => 'utf8',
'driverOptions' => array(1002=>'SET NAMES utf8')
);
It seems that at some point the XML source or data, which is UTF-8, is treated as ISO-8859-1 and converted to UTF-8 again. Depending on how you generate the feed, this could happen at several points.
The most likely point is the encoding for the database connection. Make sure it is UTF-8.
Another possibility is the content type header you send.
Please add your database encoding type so we can answer better.
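The double conversion described above can be reproduced, and often reversed, in a few lines. This is a hedged sketch in Python with made-up function names. It assumes the wrong codec was Windows-1252 (cp1252), which is what "ISO-8859-1" usually means in practice and what produces sequences like "â€"; the repair only works if no bytes were lost along the way.

```python
def simulate_double_encoding(text: str, wrong_codec: str = "cp1252") -> str:
    """Encode as UTF-8, then misread the bytes using the wrong single-byte codec."""
    return text.encode("utf-8").decode(wrong_codec)


def repair_double_encoding(garbled: str, wrong_codec: str = "cp1252") -> str:
    """Undo the misreading: recover the original bytes and decode them as UTF-8."""
    return garbled.encode(wrong_codec).decode("utf-8")


original = "descripci\u00f3n \u2013"  # contains an accented o and an en dash
garbled = simulate_double_encoding(original)

print(garbled)
print(repair_double_encoding(garbled) == original)
```

If the repair raises UnicodeDecodeError or UnicodeEncodeError, the text was probably damaged in some other way, and the source of the data should be fixed instead.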
To detect the encoding of a string, use mb_detect_encoding as follows:
echo mb_detect_encoding("your-string");
To convert from one encoding to another, use mb_convert_encoding; note that the target encoding comes before the source encoding:
$str = mb_convert_encoding($str, $destination_encode, $source_encode);
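Keep in mind that detection of this kind is a guess. The idea can be sketched in Python as trial decoding: try candidate encodings in order and take the first one that decodes without error. The function names and the candidate list are assumptions for illustration, not an equivalent of the PHP functions.

```python
from typing import Optional, Sequence


def guess_encoding(data: bytes,
                   candidates: Sequence[str] = ("ascii", "utf-8", "cp1252")) -> Optional[str]:
    """Return the first candidate encoding that decodes data without error, else None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


def convert(data: bytes, source: str, destination: str) -> bytes:
    """Re-encode bytes from the source encoding to the destination encoding."""
    return data.decode(source).encode(destination)


print(guess_encoding("ma\u00f1ana".encode("utf-8")))
print(guess_encoding(b"ma\xf1ana"))
print(convert(b"ma\xf1ana", "cp1252", "utf-8"))
```

The order of candidates matters: single-byte encodings such as cp1252 accept almost any byte sequence, so they belong at the end of the list.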