This question already has answers here: UTF-8 all the way through (13 answers)
Closed 6 years ago.
My database collation is set to utf8_unicode_ci, and all files are encoded in UTF-8 (without BOM).
Here is my PHP code:
<?php
require_once("./includes/config.php");
$article = new Article();
$fields = array(
'status' => '0',
'title' => 'מכבי ת"א אלופת אירופה בפעם ה-9',
'shorttitle' => 'מכבי ת"א אלופת אירופה',
'priority' => '1',
'type' => '1',
'category' => '2',
'template' => '68',
'author' => '1',
'date' => date("Y-m-d H:i"),
'lastupdate' => date("Y-m-d H:i"),
'preview' => 'בלה בלה בלה',
'content' => 'עוד קצת בלה בלה בלה',
'tags' => 'מכבי ת"א,יורוליג,אליפות אירופה',
'comments' => '1'
);
$article->set($fields);
$article->save();
For some reason, the Hebrew characters appear like this in phpMyAdmin:
מכבי ת"× ×לופת ×ירופה ×‘×¤×¢× ×”-9
Database connection code:
<?php
final class Database
{
    protected $fields;
    protected $con;

    public function __construct($host = "", $name = "", $username = "", $password = "")
    {
        if ($host == "")
        {
            global $config;
            $this->fields = array(
                'dbhost' => $config['Database']['host'],
                'dbname' => $config['Database']['name'],
                'dbusername' => $config['Database']['username'],
                'dbpassword' => $config['Database']['password']
            );
            $this->con = new mysqli($this->fields['dbhost'], $this->fields['dbusername'], $this->fields['dbpassword'], $this->fields['dbname']);
            if ($this->con->connect_errno > 0)
                die("<b>Database connection error:</b> ".$this->con->connect_error);
        }
        else
        {
            $this->con = new mysqli($host, $username, $password, $name);
            if ($this->con->connect_errno > 0)
                die("<b>Database connection error:</b> ".$this->con->connect_error);
        }
    }
}
Any ideas why?
You have set the character set of the database and of your files to UTF-8, but the connection between PHP and the database also needs to use the correct character set.
You can do this using set_charset:
Sets the default character set to be used when sending data from and to the database server.
Add the following as the last statement of your Database constructor:
$this->con->set_charset("utf8");
This will not fix the issue for the data that is already in the database, but for new data written to the database you should notice the difference.
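The × characters in phpMyAdmin become clear once you look at the raw bytes. Every Hebrew letter (U+05D0 to U+05EA) is two bytes long in UTF-8, and the first byte is always 0xD7. If the connection charset is latin1 (a common default in MySQL 5.x), each byte is stored as a separate latin1 character, and 0xD7 in latin1 is ×. A minimal sketch to inspect this; `utf8BytesPerChar` is a hypothetical helper, and the last line needs the mbstring extension:

```php
<?php
// Hypothetical helper: split a UTF-8 string into characters and return
// the hex-encoded bytes of each character.
function utf8BytesPerChar(string $text): array
{
    // With the /u modifier an empty pattern splits between UTF-8
    // characters instead of between bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return array_map('bin2hex', $chars);
}

// "מכבי": every letter is two bytes and every one starts with d7.
print_r(utf8BytesPerChar('מכבי')); // d79e, d79b, d791, d799

// Byte 0xD7 read as ISO-8859-1 is the multiplication sign.
echo mb_convert_encoding("\xD7", 'UTF-8', 'ISO-8859-1'), PHP_EOL; // ×
```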
If you decide to rebuild your database, then please consider using the superior utf8mb4 character set, as described in the MySQL docs:
The character set named utf8 uses a maximum of three bytes per character and contains only BMP characters. As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character and supports supplementary characters:
For a BMP character, utf8 and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.
For a supplementary character, utf8 cannot store the character at all, while utf8mb4 requires four bytes to store it. Since utf8 cannot store the character at all, you do not have any supplementary characters in utf8 columns and you need not worry about converting characters or losing data when upgrading utf8 data from older versions of MySQL.
utf8mb4 is a superset of utf8
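Whether a given string fits into a utf8 column can be checked in PHP before inserting it: it fits only if it contains no character above U+FFFF. A minimal sketch; `fitsInMysqlUtf8` is a hypothetical helper, not a MySQL or PHP API:

```php
<?php
// Hypothetical helper: true when every character in $text can be stored
// in a MySQL column using the 3-byte "utf8" charset. Characters above
// U+FFFF (outside the BMP) need 4 bytes and therefore utf8mb4.
function fitsInMysqlUtf8(string $text): bool
{
    // preg_match returns false for invalid UTF-8; that counts as "does not fit".
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 0;
}

var_dump(fitsInMysqlUtf8('מכבי ת"א'));   // bool(true)
var_dump(fitsInMysqlUtf8('Amazing 💰')); // bool(false)
var_dump(strlen('💰'));                  // int(4)
```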
It's important that your entire chain (files, HTTP headers, database connection, database) uses the same charset, to avoid characters being displayed incorrectly.
There are a few settings that need to be defined properly. I'd strongly recommend UTF-8, as it covers the letters you need (Hebrew) and a wide variety of other scripts too (Scandinavian, Greek, Arabic).
Here's a short list of things that have to be set to a specific charset.
Headers
Set the charset in both the HTML and the PHP headers to UTF-8.
PHP: header('Content-Type: text/html; charset=utf-8');
(PHP headers have to be sent before any kind of output: echo, whitespace, HTML.)
HTML: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
(The HTML meta tag goes inside the <head> element.)
Connection
You also need to specify the charset in the connection itself (placed directly after creating the connection).
$this->con->set_charset("utf8");
Database and tables
Your database and all its tables have to be set to UTF-8. Note that a charset is not exactly the same as a collation (see this post).
You can do that by running the queries below once for each database and table (for example in phpMyAdmin):
ALTER DATABASE yourDatabase CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE yourTable CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
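With many tables, the statements can be generated instead of typed. A sketch assuming you already have the table names (for example from SHOW TABLES); `buildCharsetMigration` is a hypothetical helper, and the charset and collation are inserted verbatim, so only pass trusted values:

```php
<?php
// Hypothetical helper: build the ALTER DATABASE / ALTER TABLE statements
// above for a list of tables, so they can be reviewed and then run once.
function buildCharsetMigration(string $database, array $tables,
                               string $charset = 'utf8',
                               string $collation = 'utf8_unicode_ci'): array
{
    // Quote identifiers with backticks, doubling any embedded backtick.
    $quote = fn(string $name): string => '`' . str_replace('`', '``', $name) . '`';

    $statements = [sprintf('ALTER DATABASE %s CHARACTER SET %s COLLATE %s;',
        $quote($database), $charset, $collation)];
    foreach ($tables as $table) {
        $statements[] = sprintf('ALTER TABLE %s CONVERT TO CHARACTER SET %s COLLATE %s;',
            $quote($table), $charset, $collation);
    }
    return $statements;
}

echo implode(PHP_EOL, buildCharsetMigration('yourDatabase', ['articles', 'users'])), PHP_EOL;
```

Review the generated SQL before running it; CONVERT TO rewrites the whole table, which can take a while on large tables.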
Other
Some functions take a charset argument of their own (for example htmlspecialchars() and most mb_* functions); if you use such functions, specify UTF-8 there as well.
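htmlspecialchars() is one such function: its third parameter names the input encoding. A small sketch escaping the article title from the question before printing it into a page (the `$title` variable is illustrative):

```php
<?php
$title = 'מכבי ת"א אלופת אירופה';

// ENT_QUOTES escapes both " and '; the third argument is the input charset.
// If omitted it falls back to the default_charset ini setting (UTF-8 by
// default since PHP 5.6), but being explicit documents the intent.
echo htmlspecialchars($title, ENT_QUOTES, 'UTF-8'), PHP_EOL;
// מכבי ת&quot;א אלופת אירופה

// Input that is not valid in the given charset yields an empty string
// (unless ENT_SUBSTITUTE or ENT_IGNORE is passed).
var_dump(htmlspecialchars("caf\xE9", ENT_QUOTES, 'UTF-8')); // string(0) ""
```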
It may be that you already have values in your database that are not encoded with UTF-8. Updating them manually could be a pain and could consume a lot of time. Should this be the case, you could use something like ForceUTF8 and loop through your databases, updating the fields with that function.
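If you would rather not pull in a library, a much simpler stand-in can be written with mbstring. This sketch assumes that any value which is not valid UTF-8 is Windows-1252 (the usual source of latin1 data); ForceUTF8 itself handles more cases, such as mixed encodings within one string. `ensureUtf8` is a hypothetical name:

```php
<?php
// Hypothetical helper: return $value as UTF-8. Valid UTF-8 is returned
// unchanged; anything else is ASSUMED to be Windows-1252 and converted.
function ensureUtf8(string $value): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Argument order: string, TO encoding, FROM encoding.
    return mb_convert_encoding($value, 'UTF-8', 'Windows-1252');
}

echo ensureUtf8("caf\xE9"), PHP_EOL; // latin1 bytes become "café"
echo ensureUtf8('מכבי'), PHP_EOL;    // already UTF-8, unchanged
```

Note that this does not repair text that was already double-encoded (such as the × output above): that text is valid UTF-8, just with the wrong characters in it.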
Should you follow all of the pointers above, chances are your problem will be solved. If not, you can take a look at this StackOverflow post: UTF-8 all the way through.
Related
In my table, I have a row like this:
Amazing ... 💰💰💰
When I try to display it in my view, it shows this:
Amazing ... ???
In the head of the HTML page, I do have the tag
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
In core.php I have:
Configure::write('App.encoding', 'UTF-8');
In my database.php, I have:
public $default = array(
'datasource' => 'Database/Mysql',
'persistent' => false,
'host' => 'localhost',
'login' => 'root',
'password' => 'xxx',
'database' => 'xxx',
'prefix' => '',
'encoding' => 'utf8',
);
I am converting an existing Python script to PHP and I can see this code:
'comment_text': row[2].encode('unicode-escape'),
I tried to find the equivalent of encode('unicode-escape') in PHP but found nothing.
Do I need a similar function for my PHP display, or is such an equivalent unnecessary and something is wrong with my encoding setup?
I had the same problem before. The thing is that MySQL's utf8 encoding supports only up to three bytes per character. You can read the details here:
MySQL’s utf8 isn’t UTF-8. So you can't save some characters, such as emoji, and sometimes your text may get cut off.
What I did was apply utf8mb4 to all tables and the schema.
E.g.:
ALTER TABLE your_table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER SCHEMA `your_schema` DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_unicode_ci;
After that, I was happily able to save emoji characters as well :).
I'm working on a project using Doctrine 2.4.3 with a MySQL 5.7.21 database with utf8 as default charset.
Recently, I've been looking to implement emoji support. To overcome MySQL's limitation of 3 bytes for utf8, I need to change the columns that can receive emojis to the utf8mb4 charset (see https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html).
However, I have not found a way to reflect this in my entities (using annotations).
My database connection config is the following:
$data = array(
'driver' => 'pdo_mysql',
'host' => $dbhost,
'port' => $dbport,
'dbname' => $dbname,
'user' => $dbuser,
'password' => $dbpw,
'charset' => 'utf8mb4'
);
I tried adding annotations to the table:
/* #Entity(repositoryClass="path\to\DAO") #Table(name="post", indexes={#Index(name="uid", columns={"uid"})}, options={"charset":"utf8mb4", "collation":"utf8mb4_unicode_ci"})
* #HasLifecycleCallbacks */
class Post extends BaseEntity
{
...
}
In the same fashion, I tried adding annotations to the column (in the same table) itself:
/* #Column(type="text", options={"charset":"utf8mb4", "collation":"utf8mb4_unicode_ci"}) */
protected $text;
None of the above worked. I expected an ALTER TABLE query when executing doctrine orm:schema-tool:update --dump-sql, but Doctrine sees no change, and I still can't insert 4-byte emojis.
If I update the column's charset myself directly in MySQL, emojis are supported, but when I then run orm:schema-tool:update, Doctrine sees a difference between my entity and the schema yet doesn't seem to know what to make of it, since the output I get is the following:
ALTER TABLE post CHANGE text text LONGTEXT NOT NULL ;
I also tried adding SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci as driverOptions in my database connection config array, alas, to no effect either.
Unfortunately, I could not find anything regarding this matter in Doctrine's documentation.
If any of you has any clue regarding this matter, feel free to hit me up! Thanks in advance.
To convert the whole table:
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8mb4;
Please provide
SHOW CREATE TABLE ...
For more troubleshooting: Trouble with UTF-8 characters; what I see is not what I stored
As I have legacy requirements and cannot update Doctrine's lib as of right now, I had to find a workaround.
What I did was convert my tables to utf8mb4 manually with SQL queries; Doctrine does not revert them to utf8 when I run orm:schema-tool:update --force after the conversion.
For the record, I generated the update statements with the following query:
SELECT CONCAT('ALTER TABLE ', t.table_schema, '.', t.table_name, ' CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;')
FROM information_schema.tables t
WHERE t.table_schema LIKE {your_schema};
Do not execute these statements blindly: check beforehand whether existing data will still fit once it is utf8mb4-encoded. For more details, see the very good article by Mathias Bynens on the matter: https://mathiasbynens.be/notes/mysql-utf8mb4#column-index-length
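Index length is one concrete thing to check. Assuming the 767-byte index key prefix limit of InnoDB's COMPACT/REDUNDANT row formats (the default setup in MySQL 5.6), MySQL reserves the maximum bytes per character: 3 for utf8, 4 for utf8mb4. So an indexed VARCHAR(255) fits in utf8 (765 bytes) but not in utf8mb4 (1020 bytes). The arithmetic as a sketch; `maxIndexableVarchar` is a hypothetical name:

```php
<?php
// Hypothetical helper: longest VARCHAR(n) that can be fully indexed when the
// key prefix limit is $maxKeyBytes and each character may take up to
// $bytesPerChar bytes (3 for MySQL utf8, 4 for utf8mb4).
function maxIndexableVarchar(int $maxKeyBytes, int $bytesPerChar): int
{
    return intdiv($maxKeyBytes, $bytesPerChar);
}

// ASSUMPTION: 767-byte limit (COMPACT/REDUNDANT row format, no large prefix).
echo maxIndexableVarchar(767, 3), PHP_EOL; // 255
echo maxIndexableVarchar(767, 4), PHP_EOL; // 191
```

This is where the VARCHAR(191) recommendation in that article comes from.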
I also changed the database's charset settings.
ALTER DATABASE {database_name} CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
I did keep the 'charset' => 'utf8mb4' in the Doctrine's database connection settings array for correct transmission of the data.
For new entities (tables), annotating them with the correct table options does create them with the right charset and collation:
#Entity #Table(name="table", options={"charset":"utf8mb4", "collate":"utf8mb4_unicode_ci"})
Cheers.
Inserting UTF-8 encoded string into UTF-8 encoded table gives incorrect string value.
PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xF0\x9D\x84\x8E i...' for column 'body_value' at row 1: INSERT INTO
I have a 𝄎 character, in a string that mb_detect_encoding claims is UTF-8 encoded.
I try to insert this string into a MySQL table, which is defined as (among other things) DEFAULT CHARSET=utf8
Edit: Drupal always does SET NAMES utf8 with an optional COLLATE (at least when talking to MySQL).
Edit 2: Some more details that appear to be relevant. I grab some text from a PostgreSQL database. I stick it onto an object, use mb_detect_encoding to verify that it's UTF-8, and persist the object to the database, using node_save. So while there is an HTTP request that triggers the import, the data does not come from the browser.
Edit 3: Data is denormalized over two tables:
SELECT character_set_name FROM information_schema.COLUMNS C WHERE table_schema = "[database]" AND table_name IN ("field_data_body", "field_revision_body") AND column_name = "body_value";
+--------------------+
| character_set_name |
+--------------------+
| utf8 |
| utf8 |
+--------------------+
Edit 4: Is it possible that the character is "too new"? I'm more than a little fuzzy on the relationship between Unicode and UTF-8, but this Wikipedia article implies that the character was standardized very recently.
I don't understand how that can fail with "Incorrect string value".
𝄎 (U+1D10E) is a Unicode character outside the BMP (Basic Multilingual Plane, i.e. above U+FFFF), and thus can't be represented in UTF-8 in 3 bytes. MySQL's utf8 charset only accepts UTF-8 characters that can be represented in 3 bytes. If you need to store this in MySQL, you'll need to use the utf8mb4 charset, which requires MySQL 5.5.3 or later. You can use ALTER TABLE to change the character set without much trouble; since utf8mb4 may need more space per character, a couple of issues can show up that may require you to reduce string sizes. See http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html .
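To see why 𝄎 needs four bytes, here is UTF-8 encoding done by hand. This is only an illustration (for real code PHP already has "\u{...}" string escapes and mb_chr()); `codePointToUtf8` is a hypothetical name, and it does not reject invalid code points such as surrogates:

```php
<?php
// Hypothetical helper: encode one Unicode code point as UTF-8 by hand.
function codePointToUtf8(int $cp): string
{
    if ($cp < 0x80) {             // 1 byte:  0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {            // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {          // 3 bytes: the most MySQL's utf8 accepts
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes (U+10000 to U+10FFFF): only utf8mb4 can store these
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// The same bytes as '\xF0\x9D\x84\x8E' in the error message.
echo bin2hex(codePointToUtf8(0x1D10E)), PHP_EOL; // f09d848e
```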
To solve this issue, first change your database field to the utf8mb4 charset. For example:
ALTER TABLE `tb_name` CHANGE `field_name` `field_name` VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL;
Then set the connection charset to utf8mb4 as well. For example, if you use PDO:
$db = new PDO('mysql:host=localhost;dbname=testdb;charset=utf8mb4', 'username', 'password');
Or in Zend Framework 1.2, via driver_options:
$dbParam = array('host' => 'localhost', 'username' => 'db_user_name',
'password' => 'password', 'dbname' => 'db_name',
'driver_options' => array(
'1002' => "SET NAMES 'utf8mb4'", // 1002 = PDO::MYSQL_ATTR_INIT_COMMAND
'12' => 0 // 12 = PDO::ATTR_PERSISTENT; this is not necessary
)
);
In your PDO connection, set the charset.
new PDO('mysql:host=localhost;dbname=the_db;charset=utf8mb4', $user, $password);
I fixed the error:
SQLSTATE[HY000]: General error: 1366 Incorrect string value ......
with this method:
I use utf8mb4_unicode_ci for the database
Set utf8mb4_unicode_ci for all tables
Set the column's data type to LONGBLOB instead of TEXT, LONGTEXT, etc. (a binary type stores the raw 4-byte sequences without any charset check; a TEXT or LONGTEXT column whose charset is utf8mb4 accepts them as well)
It works now.
If you use Laravel, also edit config/database.php:
'charset' => 'utf8mb4',
'collation' => 'utf8mb4_unicode_ci',
If you use function strtolower, replace it with mb_strtolower
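The reason is that strtolower() works byte by byte and only knows single-byte characters, so multibyte UTF-8 letters pass through unchanged (or, under some legacy single-byte locales, get corrupted), while mb_strtolower() understands UTF-8. A short example with a French word, since Hebrew has no letter case:

```php
<?php
$word = 'ÉCOLE';

// strtolower() only lowercases ASCII letters here; "É" is two bytes in
// UTF-8 and stays as it is (in the default C or a UTF-8 locale).
echo strtolower($word), PHP_EOL;             // École

// mb_strtolower() is Unicode-aware when told the string is UTF-8.
echo mb_strtolower($word, 'UTF-8'), PHP_EOL; // école
```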
Note: you also have to put <meta charset="utf-8"> in your head tag
I have a feed that I pull data into a database from. It provides the data in XML format. However, the data includes "illegal" characters. For example:
A GREAT NEIGHBOURHOOD – WITH A
or
large “country style†eat-in
or
Garage 14’x32’, large
or
OR…….ENDLESS POSSIBILITIES!!
My question is first, how do I identify the encoding of these characters, and second, how do I change the encoding to match the UTF8 format expected by my database?
EDIT: To be clear, there's no database involved in this process (at this point in the process, anyway). The data will be inserted into the DB later, but at the moment I'm just reading the data via a PHP script and printing it on screen using var_dump.
EDIT 2: the data is being pulled from a RETS feed using the PHP PHRETS library
The problem is that your UTF-8 response is treated in a different way, or the database is not set up correctly. Here are some examples of where this could happen and how to fix it.
Before Using Curl
header("Content-Type: text/html; charset=utf-8");
MySQL (my.cnf)
[client]
default-character-set=utf8
[mysql]
default-character-set=utf8
[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
When Creating The Database Manually
CREATE DATABASE `your_database_name` DEFAULT CHARACTER SET utf8 COLLATE utf8_polish_ci;
When Using Frameworks such as Doctrine
$conn = array(
'driver' => 'pdo_mysql',
'dbname' => 'test',
'user' => 'root',
'password' => '*****',
'charset' => 'utf8',
'driverOptions' => array(1002 => 'SET NAMES utf8') // 1002 = PDO::MYSQL_ATTR_INIT_COMMAND
);
It seems that at some point the XML source or data, that is UTF-8, is treated as ISO-8859-1 and converted to UTF-8. Depending on how you generate the feed this could happen at several points.
The most likely point is the encoding for the database connection. Make sure it is UTF-8.
Another possibility is the content type header you send.
Please add your database encoding type so we can answer better.
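The double conversion described in this answer can be reproduced, and often undone, with mbstring. Strictly speaking, output such as â€™ for ’ points at Windows-1252, the superset of ISO-8859-1 that browsers use for ISO-8859-1 pages and that MySQL's latin1 actually implements. A sketch under that assumption; both function names are hypothetical:

```php
<?php
// Hypothetical helper: simulate the bug. UTF-8 bytes are misread as
// Windows-1252 and the result is re-encoded as UTF-8 ("double encoding").
function mangleAsWindows1252(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
}

// Hypothetical helper: undo one round of that damage. Converting back to
// Windows-1252 recovers the original UTF-8 byte sequence.
function repairWindows1252Mojibake(string $mangled): string
{
    return mb_convert_encoding($mangled, 'Windows-1252', 'UTF-8');
}

$original = 'Garage 14’x32’, large';
$mangled  = mangleAsWindows1252($original);

echo $mangled, PHP_EOL;                            // Garage 14â€™x32â€™, large
echo repairWindows1252Mojibake($mangled), PHP_EOL; // Garage 14’x32’, large
```

The repair is only safe on text that was damaged exactly once in this way; running it on correct text, or on text with characters Windows-1252 cannot represent (such as Hebrew), corrupts it further, so try it on a copy first.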
To detect the encoding of a string, you can use mb_detect_encoding as follows (note that the result is a best guess, not a guarantee):
echo mb_detect_encoding("your-string");
To convert from one encoding to another, use mb_convert_encoding (the target encoding comes before the source encoding):
$str = mb_convert_encoding($str, $destination_encode, $source_encode);
I am trying to insert Hebrew values into my MySQL database, but the output is only strange characters, question marks (???? ???) or empty rectangles ▯▯▯▯.
I've tried changing the collation and the charset to utf8, but it didn't help much.
When I run SHOW VARIABLES LIKE 'char%':
everything is utf8 (except character_set_filesystem --> binary of course)
By the way, I am using WAMP Server.
How can I fix it and use Hebrew in a MySQL database?
Thank you.
Set the mysqli charset before inserting or selecting, e.g.:
mysqli_set_charset($con,"utf8");
and set your table fields to:
Character Set: utf8
Collation: utf8_general_ci
Update based on your comment:
Your string is JSON-encoded; to decode it, use:
$string = '{"name":"\u05d1\u05d9\u05d2 \u05d1\u05d5\u05e8\u05d2\u05e8"}';
print_r( json_decode($string));
OUTPUT:
stdClass Object
(
[name] => ביג בורגר
)
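The \u05d1-style escapes are what json_encode() produces by default for non-ASCII text. If the string is generated by your own PHP code, the JSON_UNESCAPED_UNICODE flag (PHP 5.4+) keeps the Hebrew readable, and json_decode() with true as its second argument returns an array instead of a stdClass object. A short sketch:

```php
<?php
$data = ['name' => 'ביג בורגר'];

// Default: non-ASCII characters are written as \uXXXX escapes.
echo json_encode($data), PHP_EOL;
// {"name":"\u05d1\u05d9\u05d2 \u05d1\u05d5\u05e8\u05d2\u05e8"}

// JSON_UNESCAPED_UNICODE writes the UTF-8 characters directly.
echo json_encode($data, JSON_UNESCAPED_UNICODE), PHP_EOL;
// {"name":"ביג בורגר"}

// Decoding the escaped form gives the original string back;
// true => associative array instead of stdClass.
$decoded = json_decode('{"name":"\u05d1\u05d9\u05d2 \u05d1\u05d5\u05e8\u05d2\u05e8"}', true);
echo $decoded['name'], PHP_EOL; // ביג בורגר
```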