Chinese characters in database prepared statement - php

I probably have a simple encoding problem, but I can't figure it out.
I have addresses that can be in English or in Chinese in a MySQL database, so I used utf8_unicode_ci. I have no problem retrieving Chinese characters from the database, but I can't use them in a prepared statement.
To explain:
If I type
$bdd= new PDO('mysql:host=localhost:3306; dbname=****;charset=utf8', 'root', '');
$list_business = $bdd->query('SELECT * FROM business WHERE address LIKE N\'台灣台南市\' ');
$nb_business=$list_business->rowCount();
I will get one result, because one of the addresses contains "台灣台南市"
But if I try to use a prepared statement:
$list_business = $bdd->prepare('SELECT * FROM business WHERE address LIKE ? ');
$list_business->execute(array('%'.$_POST['address'].'%'));
$nb_business=$list_business->rowCount();
If $_POST['address'] is in English it works; in Chinese it doesn't.
EDIT:
If I echo $_POST['address'] it shows the Chinese address that I typed in, so that part is okay. However, if I echo the address from the database it looks like this: "701\u53f0\u7063\u53f0\u5357\u5e02\u6771\u5340\u88d5\u8c50\u885775\u865f".
EDIT2:
Running SHOW VARIABLES LIKE 'char%'; gives this result:
character_set_client utf8mb4
character_set_connection utf8mb4
character_set_database latin1
character_set_filesystem binary
character_set_results utf8mb4
character_set_server latin1
character_set_system utf8
character_sets_dir c:\wamp\bin\mysql\mysql5.6.17\share\charsets\
Please, help!
Thanks beforehand,
Q

Have you set your language environment to "UTF-8"?
Have you set your MySQL character set to UTF-8?
In MySQL, run show variables like '%char%'; it should return:
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

Try setting the character set of the page itself (if you haven't already):
header('Content-Type: text/html; charset=utf-8');

It turned out that the encoding was correct everywhere except in the browser itself, on the form I was using to test my PHP file. I don't get why Google Chrome would encode it as European although I saved the HTML file as UTF-8.
Anyways, problem is solved. Thanks for your help, guys =)
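Since the root cause here was the form being submitted in a non-UTF-8 encoding, one way to catch this earlier is to validate the incoming bytes before binding them. The following is only a sketch: buildLikePattern is a hypothetical helper name, not something from the question, and it assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper: returns a LIKE pattern for $input, or null when the
// bytes are not valid UTF-8 (for example, a form submitted as Latin-1).
function buildLikePattern(string $input): ?string
{
    if (!mb_check_encoding($input, 'UTF-8')) {
        return null;
    }
    // Escape LIKE wildcards and the escape character so input matches literally.
    $escaped = addcslashes($input, '%_\\');
    return '%' . $escaped . '%';
}

// "台南" written as explicit UTF-8 bytes, so this file's own encoding does not matter.
$pattern = buildLikePattern("\xE5\x8F\xB0\xE5\x8D\x97");
var_dump($pattern !== null); // bool(true)
```

If the helper returns null, the request was probably not sent as UTF-8, and the form page's charset (or an accept-charset="UTF-8" attribute on the form) is the first thing to check. A non-null result can then be passed to execute() in place of '%'.$_POST['address'].'%'.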

Related

Error on accentuated characters with PHP and MySQL

My problem is that text written directly in PHP shows its accents correctly, but when an accented word comes from MySQL, the letters come out like this: �.
I tried setting the HTML charset to ISO-8859-1 and it fixed the MySQL letters but broke the others. One way to fix it all would be to save my .php files as ISO-8859-1, but I can't do that; I need them to be UTF-8 encoded.
What can I do?
Current workaround: include mysqli_set_charset($link, "utf8"); before the queries (it only needs to be done once per connection). I'm still looking for a conclusive solution on the server, not on the client.
EDIT:
mysql> SHOW VARIABLES LIKE 'char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
mysql> show variables like "character_set_database";
+------------------------+-------+
| Variable_name | Value |
+------------------------+-------+
| character_set_database | utf8 |
+------------------------+-------+
1 row in set (0.00 sec)
mysql> show variables like "collation_database";
+--------------------+-----------------+
| Variable_name | Value |
+--------------------+-----------------+
| collation_database | utf8_general_ci |
+--------------------+-----------------+
1 row in set (0.00 sec)
These are the values of my database, but I still cannot make it right.
EDIT2:
<meta charset="utf-8">
...
$con = mysqli_connect('localhost', 'root', 'root00--', 'eicomnor_db');
$query = "SELECT * FROM table";
$result = mysqli_query($con, $query);
while ($row = mysqli_fetch_assoc($result)) {
echo "<tr>";
echo "<td>" . $row['id'] . "</td>";
echo "<td>" . $row['nome'] . "</td>";
echo "</tr>";
}
mysqli_close($con);
Here's the PHP code.
First off, don't try to move your PHP files towards ISO-8859-1; that's going backwards and may lead to compatibility issues with browsers down the line. Instead, you want to follow the path to UTF-8 from the bottom up.
The easiest thing to check is that you're serving your HTML as UTF-8: AddDefaultCharset utf-8 in your Apache config may help with that, and <meta charset="utf-8"> in your HTML header will as well.
The second thing to check is that the MySQL connection and collation use UTF-8: http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html or http://docs.moodle.org/23/en/Converting_your_MySQL_database_to_UTF8
The final and most annoying step is to convert any data actually in the database to UTF-8. Back up your data with a standard MySQL dump first! There are a few tricks that simplify this process by creating a dump of the database as UTF-8 and then loading it back into the system with the right collation, but be aware that this is a delicate process, so be sure you have a solid backup to work with first. http://docs.moodle.org/23/en/Converting_your_MySQL_database_to_UTF8 is a good guide to that process.
Good luck! Charset issues with old databases are often more work than they initially appear.
Have you tried iconv? Since you know that the charset used in the DB is ISO-8859-1, you can convert to your charset (I'm assuming UTF-8):
// Assuming that $text is the text coming from the DB
$text = iconv("ISO-8859-1", "UTF-8", $text);
Assuming you send the output to the browser, you need to ensure that the proper charset <meta charset="utf-8" /> is set and that you don't override it in your browser settings (check that it's either "auto" or "utf-8").
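To see what that conversion does at the byte level, here is a small sketch. It assumes the iconv extension is available; the sample string is made up for illustration and written with explicit byte escapes so the file's own encoding does not matter.

```php
<?php
// "café" as it would arrive over a latin1 connection: é is the single byte 0xE9.
$fromDb = "caf\xE9";

// Convert ISO-8859-1 bytes to UTF-8; é becomes the two bytes C3 A9.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $fromDb);

echo bin2hex($fromDb), "\n"; // 636166e9
echo bin2hex($utf8), "\n";   // 636166c3a9
```

Note that iconv converts unconditionally, so it should only be applied to data that really is ISO-8859-1; running it on text that is already UTF-8 would garble it.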
Including mysqli_set_charset($link, "utf8"); before the queries (it only needs to be done once per connection) resolves the problem.

encoding Romanian characters in php

I have a problem encoding characters that look like this: ĂăÂâÎîȘșȚț
I am using the following MySQL table:
CREATE TABLE `news` (
`NewsID` int(11) NOT NULL AUTO_INCREMENT,
`UserID` int(11) NOT NULL,
`Title` varchar(255) CHARACTER SET utf8 NOT NULL,
`Date` datetime NOT NULL,
PRIMARY KEY (`NewsID`),
FULLTEXT KEY `Title` (`Title`,`Content`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
I try to insert the above-mentioned character sequence into the Title field using the following code (it runs on Zend Framework):
$params = $this->getRequest()->getParams();
$mysqli = new mysqli("localhost", "user", "pass", "database_name");
$mysqli->query("INSERT INTO `news` (`NewsID`, `Title`) VALUES (NULL, '".$params['text']."');");
And in the database I get the following value for the Title field: ÃãÂâÎîȘșȚț
Why are these characters HTML encoded? And why aren't the first characters encoded to their utf8_bin equivalent?
Thanks.
In my case I just updated php db connection settings with the following line:
mysqli_set_charset( $con, 'utf8');
I also added <meta http-equiv="content-type" content="text/html; charset=UTF-8"> to the HTML file, as @liyakat mentioned.
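As a rough sketch of how the charset call fits together with the insert from the question: insertNewsTitle is a hypothetical function name, $db is assumed to be an already open mysqli connection, and the column list assumes the news table needs no other required columns than the ones shown in the question. A prepared statement is used here instead of string concatenation.

```php
<?php
// Sketch only: $db is assumed to be an open mysqli connection to the
// database that holds the `news` table from the question.
function insertNewsTitle(mysqli $db, int $userId, string $title): bool
{
    // Tell the server that the bytes sent on this connection are UTF-8.
    if (!$db->set_charset('utf8')) {
        return false;
    }

    // Placeholders keep the title out of the SQL text itself.
    $stmt = $db->prepare(
        'INSERT INTO `news` (`UserID`, `Title`, `Date`) VALUES (?, ?, NOW())'
    );
    if ($stmt === false) {
        return false;
    }

    $stmt->bind_param('is', $userId, $title); // i = integer, s = string
    $ok = $stmt->execute();
    $stmt->close();

    return $ok;
}
```

In a real application set_charset would normally be called once right after connecting rather than inside every insert; it is placed in the function here only to keep the sketch self-contained.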
Old thread, but maybe someone needs to know this.
Be sure that your IDE or text editor is also set to use UTF-8 characters.
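To set the default to UTF-8, add the following to my.cnf (on MySQL 5.5 and later the [mysqld] option is character-set-server; default-character-set is no longer accepted in that section):
[client]
default-character-set=utf8
[mysqld]
character-set-server=utf8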
Then, to verify:
mysql> show variables like "%character%";show variables like "%collation%";
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)
Or try setting the MySQL connection to UTF-8:
SET NAMES 'utf8'
And send explicit UTF-8 headers, just in case your server has some other default settings:
header('Content-type: text/html; charset=utf-8');

mysql delivering a 'Can't initialize character set utf-8 (path: /usr/share/mysql/charsets/)' error, no utf8.xml file there

I am on the path of learning more about mysqli and all that exciting stuff, but I got blocked quite soon.
I have a local server on my debian box. It is up to date, has php and mysql installed and running smoothly.
I was looking to learn a bit more on mysqli and as I tried the following code:
<?php
$db = new mysqli('localhost', 'userdb', 'pwuserdb', 'db');
if(!$db->set_charset('utf-8')) {
printf("Error setting the character set utf-8: %s\n", $db->error);
} else {
printf("Current character set is: %s\n", $db->character_set_name());
}
print_r($db->get_charset());
?>
I was, to my surprise, getting the following message, when visiting the page:
Error setting the character set utf-8: Can't initialize character set utf-8 (path: /usr/share/mysql/charsets/) stdClass Object ( [charset] => latin1 [collation] => latin1_swedish_ci [dir] => [min_length] => 1 [max_length] => 1 [number] => 8 [state] => 801 [comment] => cp1252 West European )
I thought to myself that it is logical as I didn't set up utf-8 as the standard charset of mysql so I completed with the following settings in the my.cnf file:
for [mysqld]
default-character-set=utf8
for [client]
default-character-set=utf8
I also logged into mysql from the command line and ran
ALTER DATABASE db CHARSET=utf8;
I also reloaded mysql from the command line, as well as apache.
When looking how things are going on in mysql, almost everything looks alright:
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
But due to the fact that it seems like mysql cannot locate the utf8 file, I checked for a utf8.xml file in the /usr/share/mysql/charsets/ folder and there isn't one.
In the Index.xml file under this directory there is the mention of utf8, in the list of the charsets but I suppose that the problem comes from the fact that the xml file is missing in the directory.
Just for information, my system locales are all UTF-8 (en and pl) and I cannot understand why the utf8.xml file is not in the directory, as I haven't been goofing around with this directory or its content at all.
Any idea/ advice/ recommendation is welcome.
Thank you in advance!
Cheers!
Did you try
if(!$db->set_charset('utf8')) {
without the dash?
All your research on your system points to utf8 rather than utf-8 ;)
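If the charset name comes from configuration or from code shared with HTTP headers (where "utf-8" is the usual spelling), a small normalizing step avoids this mistake. This is only a sketch; mysqlCharsetName is a hypothetical helper, not part of mysqli.

```php
<?php
// Hypothetical helper: maps common spellings of UTF-8 ("utf-8", "UTF-8",
// "UTF8") to the name MySQL expects; other names are returned unchanged.
function mysqlCharsetName(string $name): string
{
    $normalized = strtolower(str_replace(['-', '_'], '', $name));
    return $normalized === 'utf8' ? 'utf8' : $name;
}

echo mysqlCharsetName('utf-8'), "\n";  // utf8
echo mysqlCharsetName('latin1'), "\n"; // latin1

// Usage with the connection from the question (not executed here):
// $db->set_charset(mysqlCharsetName('utf-8'));
```

On servers that support it, utf8mb4 may be the better choice, since MySQL's utf8 only covers characters of up to three bytes.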

Utf-8 characters displayed as ISO-8859-1

I've got an issue with inserting/reading utf8 content from a db. All verifications I'm doing seem to point to the fact that the content in my DB should be utf8 encoded, however it seems to be latin encoded. The data are initially imported from a PHP script from the CLI.
Configuration:
Zend Framework Version: 1.10.5
mysql-server-5.0: 5.0.51a-3ubuntu5.7
php5-mysql: 5.2.4-2ubuntu5.10
apache2: 2.2.8-1ubuntu0.16
libapache2-mod-php5: 5.2.4-2ubuntu5.10
Verifications:
-mysql:
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_bin |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
-database
created with
CREATE DATABASE mydb CHARACTER SET utf8 COLLATE utf8_bin;
CREATE SCHEMA `mydb` DEFAULT CHARACTER SET utf8 COLLATE utf8_bin ;
mysql> status;
--------------
mysql Ver 14.12 Distrib 5.0.51a, for debian-linux-gnu (i486) using readline 5.2
Connection id: 7
Current database: mydb
Current user: root@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 5.0.51a-3ubuntu5.7-log (Ubuntu)
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: utf8
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8
UNIX socket: /var/run/mysqld/mysqld.sock
Uptime: 9 min 45 sec
-sql: before doing my inserts I run the
SET names 'utf8';
-php: before doing my inserts I use utf8_encode() and mb_detect_encoding(), which gives me 'UTF-8'. After retrieving the content from the db and before sending it to the user, mb_detect_encoding() also gives 'UTF-8'.
Validation test:
the only way for me to have the content displayed properly is to set the content type to latin (If I sniff the traffic I can see the content-type header with ISO-8859-1):
ini_set('default_charset', 'ISO-8859-1');
This test shows that the content comes out as latin. I don't understand why.
Does anybody have any idea?
Thanks.
Well, I've found that SET NAMES isn't really all that great. Take a peek at the docs...
What I typically do is execute 4 queries:
SET CHARACTER SET 'UTF8';
SET character_set_database = 'UTF8';
SET character_set_connection = 'UTF8';
SET character_set_server = 'UTF8';
Give that a shot and see if that does it for you...
Oh, and remember, all UTF-8 characters <= 127 are valid ISO-8859-1 characters as well. So if you only have characters <= 127 in the stream, mb_detect_encoding will fall on the higher prevalence charset (which is by default "UTF-8")...
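A related caveat about relying on encoding detection, sketched under the assumption that the imported data was already UTF-8 before utf8_encode() was applied: text that has been encoded twice is still perfectly valid UTF-8, so a validity check passes even though the content is garbled. mb_convert_encoding from ISO-8859-1 is used below because it performs the same conversion as utf8_encode(), which is deprecated as of PHP 8.2.

```php
<?php
$once  = "\xC3\xA9";                                        // "é" encoded as UTF-8
$twice = mb_convert_encoding($once, 'UTF-8', 'ISO-8859-1'); // encoded a second time

var_dump(mb_check_encoding($once, 'UTF-8'));  // bool(true)
var_dump(mb_check_encoding($twice, 'UTF-8')); // bool(true), although the text is now "Ã©"

echo bin2hex($twice), "\n"; // c383c2a9
```

So a 'UTF-8' result before insert and after retrieval does not by itself show that the stored text is correct; comparing the raw bytes is more telling.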
What are you doing before retrieval? Do you also run SET NAMES utf8; there? Otherwise, MySQL will silently convert to the charset the connection indicates as used.
If that doesn't help, what does SHOW FULL COLUMNS FROM table; show? A table having a default charset does not mean every column uses it. For example, this is valid:
CREATE TABLE test (
`name` varchar(10) character set latin1
) CHARSET=utf8

UTF-8, PHP and XML Mysql

I am having great problems solving this one:
I have a mysql database encoding latin1_swedish_ci and a table that stores names and addresses.
I am trying to output a UTF-8 XML file, but I am having problems with the following string:
The string Otivägen is being output as OtivÃ¤gen when I open the file in vim. Also, when it is opened in IE I get
"An invalid character was found in text content. Error processing resource"
I have the following code:
function fixEncoding($in_str)
{
$cur_encoding = mb_detect_encoding($in_str) ;
if($cur_encoding == "UTF-8" && mb_check_encoding($in_str,"UTF-8"))
return $in_str;
else
return utf8_encode($in_str);
}
header("Content-type: text/plain;charset=utf-8");
$mystring = "Otivägen"; // this is actually obtained from the database
$myxml = "<myxml>
....
<node>".$mystring."</node>
....
</myxml>
";
$myxml = fixEncoding($myxml);
The actual XML output is below:
<?xml version="1.0" encoding="UTF-8" ?>
<myxml>
....
<node>OtivÃ¤gen</node>
....
</myxml>
Any ideas how I can output the file so that in vim it reads Otivägen and not OtivÃ¤gen?
EDIT:
I did mysql_client_encoding() and got latin1
I then did mysql_set_charset()
and again ran mysql_client_encoding() and got utf8, but still the same outputting issues.
Edit 2
I have logged into the command line and run the query SELECT address1 FROM address WHERE id = 1000;
SELECT address1 FROM address WHERE id = 1000;
Current database: ftpuser_db
+-------------+
| address1 |
+-------------+
| Otivägen 32 |
+-------------+
1 row in set (0.06 sec)
Thanks in advance!
I think you did everything correctly, except that your terminal is in Latin-1.
The UTF-8 sequence for ä is C3 A4, which is Ã¤ if displayed as Latin-1.
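One way to take the terminal and the editor out of the picture is to look at the raw bytes instead of the rendered characters. A minimal sketch follows; describeBytes is a hypothetical helper name.

```php
<?php
// Hypothetical helper: renders a string as space-separated hex bytes.
function describeBytes(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

echo describeBytes("\xC3\xA4"), "\n"; // c3 a4  (ä encoded as UTF-8)
echo describeBytes("\xE4"), "\n";     // e4     (ä encoded as Latin-1)
```

Applied to the value fetched from the database, c3 a4 would mean the string is correct UTF-8 and only the display is off, while c3 83 c2 a4 would mean it has been encoded twice.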
Is your MySQL connection encoding properly set to UTF-8 ?
Check mysql_set_charset() and mysql_client_encoding() for more details.
Oh boy. UTF8 issues can be a real pain and they get almost impossible to solve when something is doing re-encodings for you.
You really need to start at one end and make sure every process is UTF8. That stops anything in the chain from misinterpreting the data and 'converting' it for you. But significantly, it will also let you much more easily spot when something has already mis-encoded text for you (yes, I've had that problem).
And if you have UTF8 data in tables that aren't set to UTF8 and might be mis-encoded, you need to do the tables last, after the data has been re-encoded. Otherwise you will damage your data irretrievably. I've had that problem, too.
First steps:
Check your terminal is UTF8 compliant. Gnome-terminal is. Kterm is. ETerm is not.
Check your LANG setting in your shell. It should probably have .UTF-8 on the end of its value.
Check that vim is picking up the UTF8 setting correctly. You can check with :set encoding
This will mean that your files will be edited in UTF8.
Now we check MySQL.
In the MySQL CLI, do show variables like 'character_set%';. The results will probably be something like:
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
What you're aiming for is to change all those latin1 values (or whatever you're seeing) to utf8.
set names utf8; will change most of them and you might need to do that with every new connection in your database. This was the solution I had to adopt in a previous application. The other settings to change are in the my.cnf file for which I need to direct you to the documentation. It is unlikely you will need to set them all.
I see you're already setting the output headers, so that's good.
Now you can look at the data from the database and see why it's "wrong".
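For the "every new connection" point above, PDO can set the connection charset through the DSN, which has the same effect as issuing SET NAMES after connecting (the charset DSN parameter is honoured from PHP 5.3.6 on). The sketch below only builds the DSN string; mysqlDsn is a hypothetical helper, and the host, database name and credentials in the usage comment are placeholders.

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN with an explicit charset.
// Note that the port is its own DSN parameter, not part of the host.
function mysqlDsn(string $host, string $dbname, int $port = 3306, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;port=%d;dbname=%s;charset=%s', $host, $port, $dbname, $charset);
}

echo mysqlDsn('localhost', 'mydb'), "\n";
// mysql:host=localhost;port=3306;dbname=mydb;charset=utf8

// Usage (not executed here; needs a running server and real credentials):
// $pdo = new PDO(mysqlDsn('localhost', 'mydb'), 'user', 'secret');
```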
latin1_swedish_ci is a collation, not a charset. Since collations are supposed to match their charset, it suggests that the table is using latin1, but it's not a guarantee.
Strictly speaking, the charset of tables is irrelevant here, since MySql can convert input/output. That's what the connection charset (mysql_set_charset) is for. However, for that to work properly, the data needs to be encoded properly in the database. I would begin by checking that strings are correct in the database. Simplest thing is to log in on the command line and select a row which has non-ascii characters in it. Does it look OK?
$mystring = "Otivägen"; // this is actually obtained from the database
Watch out. The encoding of the data in $mystring will now depend on the encoding of the php file. That may or may not be the same as the data in the database.
Before output, run the query SET NAMES utf8.
After output you can go back and run SET NAMES latin1.
Look here, I've got the same problem
It seems you are "double encoding" Otivägen. You get this behaviour if Otivägen is already UTF-8 and you run utf8_encode() on it again. Example:
$str = "Otivägen"; // already a UTF-8 string
echo utf8_encode($str); // outputs OtivÃ¤gen
I'm not sure where the actual "double encoding" occurs, but it may be due to settings in your editor. My theory: let's say you are running Aptana Studio and your actual character set is ISO-8859-1. (In Aptana, you can check this by right-clicking a file and choosing "Properties". To set the default character encoding for all projects, choose Preferences from the Aptana main menu -> General -> Workspace.) If that's the case, the PHP source file where you have $myxml and its string <myxml><node>... is detected as ISO-8859-1, but $mystring received from the database is UTF-8. Your fixEncoding function would then run the else clause, since $myxml as a whole is seen as ISO-8859-1 and not UTF-8. This double-encodes the values from the database and may be the cause of your problem.
Check the encoding of your actual source file in your editor, and verify that it is set to UTF-8. Alternatively, experiment with applying or removing fixEncoding/utf8_encode/utf8_decode on $myxml. Observe the results and see what needs to be done to get the value Otivägen right.
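One way to make the result independent of the source file's encoding is to normalize each database value on its own before it is placed into the XML, rather than re-encoding the assembled document. This is a sketch under that assumption; toUtf8Once and xmlNode are hypothetical names, and the fallback assumes any non-UTF-8 input is ISO-8859-1.

```php
<?php
// Hypothetical helper: converts to UTF-8 only when the bytes are not
// already valid UTF-8, so correct data is never encoded a second time.
function toUtf8Once(string $value): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Assumption: anything that is not valid UTF-8 is ISO-8859-1.
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

// Hypothetical helper: wraps one value in a <node> element, XML-escaped.
function xmlNode(string $value): string
{
    return '<node>' . htmlspecialchars(toUtf8Once($value), ENT_XML1 | ENT_QUOTES, 'UTF-8') . '</node>';
}

echo xmlNode("Otiv\xC3\xA4gen"), "\n"; // input already UTF-8: left as it is
echo xmlNode("Otiv\xE4gen"), "\n";     // input Latin-1: converted once
```

Both calls produce the same UTF-8 bytes. The check is a heuristic: a Latin-1 string can occasionally happen to be valid UTF-8, so fixing the connection charset remains the more reliable approach.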
