Utf-8 characters displayed as ISO-8859-1 - php

I've got an issue with inserting/reading utf8 content from a db. All verifications I'm doing seem to point to the fact that the content in my DB should be utf8 encoded, however it seems to be latin encoded. The data are initially imported from a PHP script from the CLI.
Configuration:
Zend Framework Version: 1.10.5
mysql-server-5.0: 5.0.51a-3ubuntu5.7
php5-mysql: 5.2.4-2ubuntu5.10
apache2: 2.2.8-1ubuntu0.16
libapache2-mod-php5: 5.2.4-2ubuntu5.10
Verifications:
-mysql:
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_bin |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
-database
created with
CREATE DATABASE mydb CHARACTER SET utf8 COLLATE utf8_bin;
CREATE SCHEMA `mydb` DEFAULT CHARACTER SET utf8 COLLATE utf8_bin ;
mysql> status;
--------------
mysql Ver 14.12 Distrib 5.0.51a, for debian-linux-gnu (i486) using readline 5.2
Connection id: 7
Current database: mydb
Current user: root@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 5.0.51a-3ubuntu5.7-log (Ubuntu)
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: utf8
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8
UNIX socket: /var/run/mysqld/mysqld.sock
Uptime: 9 min 45 sec
-sql: before doing my inserts I run the
SET names 'utf8';
-php: before doing my inserts I use utf8_encode() and mb_detect_encoding(), which gives me 'UTF-8'. After retrieving the content from the db and before sending it to the user, mb_detect_encoding() also gives 'UTF-8'.
Validation test:
the only way for me to have the content displayed properly is to set the content type to latin (If I sniff the traffic I can see the content-type header with ISO-8859-1):
ini_set('default_charset', 'ISO-8859-1');
This test shows that the content comes out as latin. I don't understand why.
Does anybody have any idea?
Thanks.

Well, I've found that SET NAMES isn't really all that great. Take a peek at the docs...
What I typically do is execute 4 queries:
SET CHARACTER SET 'UTF8';
SET character_set_database = 'UTF8';
SET character_set_connection = 'UTF8';
SET character_set_server = 'UTF8';
Give that a shot and see if that does it for you...
Oh, and remember, all UTF-8 characters <= 127 are valid ISO-8859-1 characters as well. So if you only have characters <= 127 in the stream, mb_detect_encoding will fall on the higher prevalence charset (which is by default "UTF-8")...
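To illustrate that last point, here is a minimal sketch. It assumes the mbstring extension is loaded, and the sample strings are made up for illustration. It shows why an ASCII-only string cannot tell you which of the two encodings your data is in, while a single non-ASCII byte can:

```php
<?php
// Hypothetical sample strings, not taken from the question.
$ascii  = "plain text";   // every byte is <= 0x7F
$latin1 = "caf\xE9";      // "é" as the single ISO-8859-1 byte 0xE9
$utf8   = "caf\xC3\xA9";  // "é" as the two-byte UTF-8 sequence C3 A9

// mb_check_encoding() validates against one named encoding instead of guessing.
$asciiIsUtf8   = mb_check_encoding($ascii, 'UTF-8');      // true
$asciiIsLatin1 = mb_check_encoding($ascii, 'ISO-8859-1'); // true: same bytes, both valid
$latin1IsUtf8  = mb_check_encoding($latin1, 'UTF-8');     // false: a lone 0xE9 is not valid UTF-8
$utf8IsUtf8    = mb_check_encoding($utf8, 'UTF-8');       // true

var_dump($asciiIsUtf8, $asciiIsLatin1, $latin1IsUtf8, $utf8IsUtf8);
```

So a 'UTF-8' result on its own only proves something if the test data actually contains characters above 127.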

What are you doing before retrieval? Also a 'SET NAMES utf8;'? Otherwise, MySQL will silently convert to the charset the connection indicates as used.
If that isn't it, what does SHOW FULL COLUMNS FROM table; show? A table having a default charset does not mean each column uses it. For example, this is valid:
CREATE TABLE test (
`name` varchar(10) character set latin1
) CHARSET=utf8


Chinese characters in database prepared statement

I might have a simple encoding problem, but I can't figure it out.
I have addresses that can be in English or in Chinese in a MySQL database, so I used utf8_unicode_ci. I don't have problems retrieving my Chinese characters from the database, but I can't use the Chinese characters in a prepared statement.
To explain:
If I type
$bdd= new PDO('mysql:host=localhost:3306; dbname=****;charset=utf8', 'root', '');
$list_business = $bdd->query('SELECT * FROM business WHERE address LIKE N\'台灣台南市\' ');
$nb_business=$list_business->rowCount();
I will get one result, because one of the addresses contains "台灣台南市"
But if I try to use a prepared request:
$list_business = $bdd->prepare('SELECT * FROM business WHERE address LIKE ? ');
$list_business->execute(array('%'.$_POST['address'].'%'));
$nb_business=$list_business->rowCount();
If $_POST['address'] is in English it works, in Chinese it doesn't :p
EDIT :
If i echo $_POST['address'] it shows the address in chinese that I input so that part is okay, although, if I echo the address from database it will look like this : "701\u53f0\u7063\u53f0\u5357\u5e02\u6771\u5340\u88d5\u8c50\u885775\u865f".
EDIT2:
When asking for show variables like 'char%'; I got this result
character_set_client utf8mb4
character_set_connection utf8mb4
character_set_database latin1
character_set_filesystem binary
character_set_results utf8mb4
character_set_server latin1
character_set_system utf8
character_sets_dir c:\wamp\bin\mysql\mysql5.6.17\share\charsets\
Please, help!
Thanks beforehand,
Q
have you set your language environment to "UTF-8"?
have you set your mysql character set to utf-8?
in mysql, exec "show variables like '%char%';" and it should return
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
try setting the character set of the page itself (if you haven't already).
header('Content-Type: text/html; charset=utf-8');
Turned out that the encoding was correct everywhere, except for the browser itself, on the form i was using to test my php file. I don't get why Google Chrome would encode it as European although i saved the html file as UTF-8.
Anyways, problem is solved. Thanks for your help, guys =)

Error on accentuated characters with PHP and MySQL

My problem is that what is written directly via PHP is correctly accentuated, but when the accentuated word comes from the MySQL, the letters come like this �.
I tried using the html charset as ISO-8859-1 and it fixed the MySQL letters, but broke the others. One way to fix it all is to set my .php files to ISO-8859-1, but I can't do it, I need to use it in utf-8 encode.
What can I do?
Current workaround: include mysqli_set_charset($link, "utf8"); before the queries (only needed once per connection). I'm still looking for a conclusive solution on the server, not on the client.
EDIT:
mysql> SHOW VARIABLES LIKE 'char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
mysql> show variables like "character_set_database";
+------------------------+-------+
| Variable_name | Value |
+------------------------+-------+
| character_set_database | utf8 |
+------------------------+-------+
1 row in set (0.00 sec)
mysql> show variables like "collation_database";
+--------------------+-----------------+
| Variable_name | Value |
+--------------------+-----------------+
| collation_database | utf8_general_ci |
+--------------------+-----------------+
1 row in set (0.00 sec)
These are the values of my database, but I still cannot make it right.
EDIT2:
<meta charset="utf-8">
...
$con = mysqli_connect('localhost', 'root', 'root00--', 'eicomnor_db');
$query = "SELECT * FROM table";
$result = mysqli_query($con, $query);
while ($row = mysqli_fetch_assoc($result)) {
echo "<tr>";
echo "<td>" . $row['id'] . "</td>";
echo "<td>" . $row['nome'] . "</td>";
echo "</tr>";
}
mysqli_close($con);
Here's the PHP code.
First off, don't try to modify your php files in the direction of ISO-8859-1, that's going backwards, and may lead to compatibility issues with browsers on down the line. Instead, you want to be following the path to utf-8 from the bottom up.
The easiest thing to check is to make sure that you're serving your html as utf-8: AddDefaultCharset utf-8 in your apache config may help with that, and <meta charset="utf-8"> in your html header will as well.
The second thing to check is to make sure that the mysql connection & collation uses utf-8: http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html or http://docs.moodle.org/23/en/Converting_your_MySQL_database_to_UTF8
The final and most annoying step is to convert any data actually in the database to utf-8. Back up your data with a standard mysql dump first! There are a few tricks to simplify this process by creating a dump of the database as utf-8 and then putting it back into the system with the right collation, but be aware that this is a delicate process and be sure you have a solid backup to work with first! http://docs.moodle.org/23/en/Converting_your_MySQL_database_to_UTF8 is a good guide to that process.
Good luck! charset issues with old databases are often more work than they initially appear.
Have you tried iconv? As you know that the charset used on the DB is ISO-8859-1, you can convert to your charset (I'm assuming UTF-8):
// Assuming that $text is the text coming from the DB
$text = iconv("ISO-8859-1", "UTF-8", $text);
Assuming you send the output to the browser, you need to ensure that the proper charset <meta charset="utf-8" /> is set and that you don't override it in your browser settings (check that it's either "auto" or "utf-8").
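One caveat with this kind of conversion: it should only be applied to text that really is ISO-8859-1. A small sketch (the single-byte sample value is made up for illustration) of what happens when text that is already UTF-8 gets converted a second time:

```php
<?php
// Hypothetical value: "é" as the single ISO-8859-1 byte 0xE9.
$fromDb = "\xE9";

// First conversion: correct, yields the two-byte UTF-8 sequence C3 A9.
$once = iconv('ISO-8859-1', 'UTF-8', $fromDb);

// Second conversion: each UTF-8 byte is treated as a Latin-1 character
// and re-encoded, which is what shows up on screen as "Ã©".
$twice = iconv('ISO-8859-1', 'UTF-8', $once);

echo bin2hex($once), "\n";   // c3a9
echo bin2hex($twice), "\n";  // c383c2a9
```

Both results are valid UTF-8, so an encoding check will not flag the second one; only the displayed text looks wrong.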
Including mysqli_set_charset($link, "utf8"); before the queries (it only needs to be done once per connection) resolves the problem.

encoding Romanian characters in php

I have a problem encoding characters that look like this: ĂăÂâÎîȘșȚț
I am using the following MySQL table:
CREATE TABLE `news` (
`NewsID` int(11) NOT NULL AUTO_INCREMENT,
`UserID` int(11) NOT NULL,
`Title` varchar(255) CHARACTER SET utf8 NOT NULL,
`Date` datetime NOT NULL,
PRIMARY KEY (`NewsID`),
FULLTEXT KEY `Title` (`Title`,`Content`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
I try to insert the upper mentioned character sequence in the Title field by using the following code (runs on zend framework):
$params = $this->getRequest()->getParams();
$mysqli = new mysqli("localhost", "user", "pass", "database_name");
$mysqli->query("INSERT INTO `news` (`NewsID`, `Title`) VALUES (NULL, '".$params['text']."');");
And in the database i get for the field Title the following value: ÃãÂâÎîȘșȚț
Why are these characters html encoded? And why aren't the first characters encoded to their utf8_bin equivalent ?
Thanks.
In my case I just updated php db connection settings with the following line:
mysqli_set_charset( $con, 'utf8');
Also, I added <meta http-equiv="content-type" content="text/html; charset=UTF-8"> to the HTML file, as @liyakat mentioned.
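For the insert in the question, the mysqli_set_charset() call could be combined with a prepared statement, so the submitted text is never concatenated into the SQL string. This is only a sketch: insertNewsTitle() is a hypothetical helper name, it assumes an already-connected mysqli instance, and it fills only the Title column as the original query did.

```php
<?php
/**
 * Hypothetical helper: inserts one title into the `news` table from the question.
 * Assumes $mysqli is an already-connected mysqli instance.
 */
function insertNewsTitle(mysqli $mysqli, string $title): bool
{
    // Declare that this connection sends and receives UTF-8.
    if (!$mysqli->set_charset('utf8')) {
        return false;
    }

    // The placeholder keeps user-supplied text out of the SQL string.
    // Note: on PHP 8.1+ mysqli throws mysqli_sql_exception by default,
    // so the false checks mainly matter on older versions.
    $stmt = $mysqli->prepare('INSERT INTO `news` (`Title`) VALUES (?)');
    if ($stmt === false) {
        return false;
    }

    $stmt->bind_param('s', $title);
    $ok = $stmt->execute();
    $stmt->close();

    return $ok;
}
```

It would be called as insertNewsTitle($mysqli, $params['text']) in place of the query() call.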
Old thread, but maybe someone needs to know this.
Be sure that your IDE or text editor is also set to use UTF-8 characters.
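To set the default to UTF-8, add the following to my.cnf (on MySQL 5.5 and later, the [mysqld] section needs character-set-server instead, because default-character-set was removed from the server options):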
[client]
default-character-set=utf8
[mysqld]
default-character-set = utf8
Then, to verify:
mysql> show variables like "%character%";show variables like "%collation%";
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)
OR TRY
Try setting the MySQL connection to UTF-8:
SET NAMES 'utf8'
And send explicit UTF-8 headers, just in case your server has some other default settings:
header('Content-type: text/html; charset=utf-8');

mysql delivering a 'Can't initialize character set utf-8 (path: /usr/share/mysql/charsets/)' error, no utf8.xml file there

I am on the Path of learning more about mysqli and all that exciting stuff but I get blocked quite soon.
I have a local server on my debian box. It is up to date, has php and mysql installed and running smoothly.
I was looking to learn a bit more on mysqli and as I tried the following code:
<?php
$db = new mysqli('localhost', 'userdb', 'pwuserdb', 'db');
if(!$db->set_charset('utf-8')) {
printf("Error setting the character set utf-8: %s\n", $db->error);
} else {
printf("Current character set is: %s\n", $db->character_set_name());
}
print_r($db->get_charset());
?>
I was, to my surprise, getting the following message, when visiting the page:
Error setting the character set utf-8: Can't initialize character set utf-8 (path: /usr/share/mysql/charsets/) stdClass Object ( [charset] => latin1 [collation] => latin1_swedish_ci [dir] => [min_length] => 1 [max_length] => 1 [number] => 8 [state] => 801 [comment] => cp1252 West European )
I thought to myself that it is logical as I didn't set up utf-8 as the standard charset of mysql so I completed with the following settings in the my.cnf file:
for [mysqld]
default-character-set=utf8
for [client]
default-character-set=utf8
I also logged into mysql from the command line and ran
ALTER DATABASE db CHARSET=utf8;
I also reloaded mysql from the command line, as well as apache.
When looking how things are going on in mysql, almost everything looks alright:
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
But due to the fact that it seems like mysql cannot locate the utf8 file, I checked for a utf8.xml file in the /usr/share/mysql/charsets/ folder and there isn't one.
In the Index.xml file under this directory there is the mention of utf8, in the list of the charsets but I suppose that the problem comes from the fact that the xml file is missing in the directory.
Just for information, my system locales are all UTF8 (en and pl) and I cannot understand why the utf8.xml file is not in the directory, as I haven't been goofing around with this directory or its content at all.
Any idea/ advice/ recommendation is welcome.
Thank you in advance!
Cheers!
did you try
if(!$db->set_charset('utf8')) {
without the dash?
since all your research on your system points to utf8 instead of utf-8 ;)
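The naming mismatch is easy to trip over because PHP and HTTP use the IANA-style name "UTF-8" while MySQL calls the charset "utf8". A small sketch of a lookup that translates between the two; toMysqlCharset() and its mapping table are hypothetical and only cover the names that appear in this thread:

```php
<?php
/**
 * Hypothetical helper: maps an IANA-style charset name to MySQL's name for it.
 * Unknown names are returned lowercased and otherwise unchanged.
 */
function toMysqlCharset(string $name): string
{
    $map = [
        'utf-8'      => 'utf8',
        // MySQL's latin1 is really cp1252, as the get_charset() output above shows.
        'iso-8859-1' => 'latin1',
    ];

    $key = strtolower(trim($name));

    return $map[$key] ?? $key;
}

echo toMysqlCharset('UTF-8'), "\n";  // utf8
echo toMysqlCharset('utf8'), "\n";   // utf8
```

With that in place, $db->set_charset(toMysqlCharset('utf-8')) would pass the name MySQL expects.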

UTF8 issues PHP -> MySQL. Getting question marks in database?

OK, I am currently in PHP/MySQL/UTF-8/Unicode hell!
My environment:
MySQL: 5.1.53
Server characterset: latin1
Db characterset: latin1
Client characterset: latin1
Conn. characterset: latin1
PHP: 5.3.3
My PHP files are saved as UTF-8 format, not ASCII files.
In my PHP code when I make the database connection I do the following:
ini_set('default_charset', 'utf-8');
$my_db = mysql_connect(DEV_DB, DEV_USER, DEV_PASS);
mysql_select_db(MY_DB);
// I have tried both of the following utf8 connection functions
// mysql_query("SET NAMES 'utf8'", $my_db);
mysql_set_charset('utf8', $my_db);
// Detect if form value is not UTF-8
if (mb_detect_encoding($_POST['lang_desc']) == 'UTF-8') {
$lang_description = $_POST['lang_desc'];
} else {
$lang_description = utf8_encode($_POST['lang_desc']);
}
$language_sql = sprintf(
'INSERT INTO app_languages (language_id, app_id, description) VALUES (%d, %d, "%s")',
intval($lang_data['lang_id']),
intval($new_app_id),
mysql_real_escape_string($lang_description, $my_db)
);
The format/create of my MySQL database is:
CREATE TABLE IF NOT EXISTS app_languages (
language_id int(10) unsigned NOT NULL,
app_id int(10) unsigned NOT NULL,
description tinytext collate utf8_unicode_ci,
PRIMARY KEY (language_id,app_id)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The SQL statements that are generated from my PHP code look like this:
INSERT INTO app_languages (language_id, app_id, description) VALUES (91, 2055, "阿拉伯体育新闻和信息")
INSERT INTO app_languages (language_id, app_id, description) VALUES (26, 2055, "阿拉伯體育新聞和信息")
INSERT INTO app_languages (language_id, app_id, description) VALUES (56, 2055, "בערבית ספורט חדשות ומידע")
INSERT INTO app_languages (language_id, app_id, description) VALUES (69, 2055, "アラビア語のスポーツニュースと情報")
Yet, the output appears in my database as this:
| 69 | 2055 | ????????????????? |
| 56 | 2055 | ?????? ????? ????? ????? |
| 28 | 2055 | Arapski sportske vijesti i informacije |
| 42 | 2055 | Arabe des nouvelles sportives et d\'information |
| 91 | 2055 | ?????????? |
What am I doing wrong??
P.S. We can use Putty to SSH directly to the database server and via the command line Paste one of the unicode/multi-lingual insert statements. And they work successfully!?
Thanks for any light you can shed on this, it's driving me mad.
Cheers, Jason
try to execute the following query after you selected the db:
SET NAMES 'utf8'
this query should solve the problem with different charsets in your files and the db.
felix
The answer is right in your question. You're using latin1 throughout your database, and it can't handle unicode. You need to change those to UTF-8 as well.
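The question marks themselves are what you get whenever text is converted into a charset that has no slot for its characters. The sketch below reproduces the effect in PHP with mbstring; it only mimics what a latin1 conversion does and is not a trace of what the MySQL server did in this case:

```php
<?php
// Make the replacement character explicit; '?' (0x3F) is also mbstring's default.
mb_substitute_character(0x3F);

// Three Chinese characters, written in a UTF-8 encoded source file.
$utf8 = "阿拉伯";

// ISO-8859-1 has no code points for these, so each one is replaced.
$asLatin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo $asLatin1, "\n";  // ???
```

Once the characters have been replaced this way, the original text is gone; the stored question marks cannot be converted back.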
//first make sure your file produce utf-8 chars
header('Content-Type: text/html; charset=utf-8');
mb_detect_encoding is quite useless unless you already know what you are dealing with. You probably should not rely on it unless you specify the second and third argument. Currently it probably does not return what you think it does.
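A more predictable alternative to the mb_detect_encoding()/utf8_encode() branch in the question is to validate against UTF-8 explicitly and convert only when validation fails. This is a sketch: ensureUtf8() is a hypothetical helper, and the ISO-8859-1 fallback is an assumption about where non-UTF-8 input would come from.

```php
<?php
/**
 * Hypothetical helper: returns $input unchanged if it is valid UTF-8,
 * otherwise converts it from an assumed legacy encoding.
 */
function ensureUtf8(string $input, string $assumedLegacy = 'ISO-8859-1'): string
{
    // Validate instead of guessing: valid UTF-8 is passed through untouched,
    // which avoids double-encoding text that is already correct.
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }

    // Assumption: anything that is not valid UTF-8 is in $assumedLegacy.
    return mb_convert_encoding($input, 'UTF-8', $assumedLegacy);
}

echo bin2hex(ensureUtf8("caf\xC3\xA9")), "\n";  // 636166c3a9 (already UTF-8, unchanged)
echo bin2hex(ensureUtf8("caf\xE9")), "\n";      // 636166c3a9 (converted from Latin-1)
```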
I see that the words that show up as ??????? are Arabic words, which must have the collation
cp1256_general_ci
not
UTF-8_general_ci
change that, it may solve the problem.
