Error on accented characters with PHP and MySQL - php

My problem is that text written directly via PHP is correctly accented, but when the accented word comes from MySQL, the letters come out like this �.
I tried setting the HTML charset to ISO-8859-1, which fixed the letters from MySQL but broke the others. One way to fix everything would be to save my .php files as ISO-8859-1, but I can't do that; I need to keep them UTF-8 encoded.
What can I do?
Current solution: include mysqli_set_charset($link, "utf8"); before the queries (it only needs to be done once per connection). I'm still looking for a definitive solution on the server side, not on the client.
EDIT:
mysql> SHOW VARIABLES LIKE 'char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
mysql> show variables like "character_set_database";
+------------------------+-------+
| Variable_name | Value |
+------------------------+-------+
| character_set_database | utf8 |
+------------------------+-------+
1 row in set (0.00 sec)
mysql> show variables like "collation_database";
+--------------------+-----------------+
| Variable_name | Value |
+--------------------+-----------------+
| collation_database | utf8_general_ci |
+--------------------+-----------------+
1 row in set (0.00 sec)
These are the values of my database, but I still cannot make it right.
EDIT2:
<meta charset="utf-8">
...
$con = mysqli_connect('localhost', 'root', 'root00--', 'eicomnor_db');
$query = "SELECT * FROM table";
$result = mysqli_query($con, $query);
while ($row = mysqli_fetch_assoc($result)) {
echo "<tr>";
echo "<td>" . $row['id'] . "</td>";
echo "<td>" . $row['nome'] . "</td>";
echo "</tr>";
}
mysqli_close($con);
Here's the PHP code.

First off, don't convert your PHP files to ISO-8859-1. That's going backwards, and may lead to compatibility issues with browsers down the line. Instead, follow the path to UTF-8 from the bottom up.
The easiest thing to check is that you're serving your HTML as UTF-8: AddDefaultCharset utf-8 in your Apache config may help with that, and <meta charset="utf-8"> in your HTML header will as well.
The second thing to check is that the MySQL connection and collation use UTF-8:
http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html or http://docs.moodle.org/23/en/Converting_your_MySQL_database_to_UTF8
The final and most annoying step is to convert any data actually in the database to UTF-8. Back up your data with a standard mysql dump first! There are a few tricks to simplify this process by creating a dump of the database as UTF-8 and then loading it back into the system with the right collation, but be aware that this is a delicate process, so be sure you have a solid backup to work with first! http://docs.moodle.org/23/en/Converting_your_MySQL_database_to_UTF8 is a good guide to that process.
Good luck! Charset issues with old databases are often more work than they initially appear.

Have you tried iconv? If you know that the charset used in the DB is ISO-8859-1, you can convert to your charset (I'm assuming UTF-8):
// Assuming that $text is the text coming from the DB
$text = iconv("ISO-8859-1", "UTF-8", $text);
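A minimal sketch of that conversion, with a guard so text that is already UTF-8 is not converted twice. The helper name to_utf8_from_latin1 is hypothetical, and the validity check is only a heuristic: a Latin-1 string whose bytes happen to form valid UTF-8 would be left unconverted.

```php
<?php
// Hypothetical helper: convert a value only if it is not already valid UTF-8.
function to_utf8_from_latin1(string $text): string
{
    // preg_match with the /u modifier returns 1 only for well-formed UTF-8.
    if (preg_match('//u', $text) === 1) {
        return $text; // already UTF-8 (or plain ASCII): leave untouched
    }
    $converted = iconv('ISO-8859-1', 'UTF-8', $text);
    return $converted === false ? $text : $converted;
}

// "é" as a single Latin-1 byte becomes the two-byte UTF-8 sequence.
echo bin2hex(to_utf8_from_latin1("\xE9")), "\n";     // c3a9
// "é" that is already UTF-8 passes through unchanged.
echo bin2hex(to_utf8_from_latin1("\xC3\xA9")), "\n"; // c3a9
```

Converting per value like this treats the symptom; setting the connection charset fixes the cause for every query.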

Assuming you send the output to the browser, you need to ensure that the proper charset <meta charset="utf-8" /> is set and that you don't override it in your browser settings (check that it's either "auto" or "utf-8").

Including mysqli_set_charset($link, "utf8"); before the queries (it only needs to be done once per connection) resolves the problem.
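As a rough sketch, the charset call can live in one connection helper so it is never forgotten. The function name connect_utf8 and its parameters are hypothetical. The default of utf8mb4 is my assumption rather than what the post uses: it needs MySQL 5.5.3 or later and covers all of Unicode, whereas MySQL's utf8 stores at most three bytes per character. Pass 'utf8' to match the post exactly.

```php
<?php
// Hypothetical helper: open a mysqli connection and fix its charset once.
function connect_utf8(
    string $host,
    string $user,
    string $password,
    string $database,
    string $charset = 'utf8mb4' // assumption; the post uses 'utf8'
): mysqli {
    // Make mysqli throw exceptions instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $db = new mysqli($host, $user, $password, $database);

    // Tells both the server and the client library which encoding
    // this connection uses, once per connection.
    $db->set_charset($charset);

    return $db;
}
```

It would be called once at startup, for example $con = connect_utf8('localhost', 'user', 'secret', 'mydb'); with placeholder credentials, and the resulting connection reused for all queries.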

Related

Chinese characters in database prepared statement

I might have a simple encoding problem, but I can't figure it out.
I have addresses that can be in English or in Chinese in a MySQL database, so I used utf8_unicode_ci. I don't have problems retrieving my Chinese characters from the database, but I can't use Chinese characters in a prepared statement.
To explain:
If I type
$bdd= new PDO('mysql:host=localhost:3306; dbname=****;charset=utf8', 'root', '');
$list_business = $bdd->query('SELECT * FROM business WHERE address LIKE N\'台灣台南市\' ');
$nb_business=$list_business->rowCount();
I will get one result, because one of the addresses contains "台灣台南市"
But if I try to use a prepared request:
$list_business = $bdd->prepare('SELECT * FROM business WHERE address LIKE ? ');
$list_business->execute(array('%'.$_POST['address'].'%'));
$nb_business=$list_business->rowCount();
If $_POST['address'] is in English it works, in Chinese it doesn't :p
EDIT :
If I echo $_POST['address'], it shows the Chinese address that I entered, so that part is okay. However, if I echo the address from the database, it looks like this: "701\u53f0\u7063\u53f0\u5357\u5e02\u6771\u5340\u88d5\u8c50\u885775\u865f".
EDIT2:
When asking for show variables like 'char%'; I got this result
character_set_client utf8mb4
character_set_connection utf8mb4
character_set_database latin1
character_set_filesystem binary
character_set_results utf8mb4
character_set_server latin1
character_set_system utf8
character_sets_dir c:\wamp\bin\mysql\mysql5.6.17\share\charsets\
Please, help!
Thanks beforehand,
Q
Have you set your language environment to "UTF-8"?
Have you set your MySQL character set to utf8?
In mysql, run show variables like '%char%'; it should return
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
try setting the character set of the page itself (if you haven't already).
header('Content-Type: text/html; charset=utf-8');
It turned out that the encoding was correct everywhere except in the browser itself, on the form I was using to test my PHP file. I don't get why Google Chrome would encode it as European although I saved the HTML file as UTF-8.
Anyways, problem is solved. Thanks for your help, guys =)

encoding Romanian characters in php

I have a problem encoding characters that look like this: ĂăÂâÎîȘșȚț
I am using the following MySQL table:
CREATE TABLE `news` (
`NewsID` int(11) NOT NULL AUTO_INCREMENT,
`UserID` int(11) NOT NULL,
`Title` varchar(255) CHARACTER SET utf8 NOT NULL,
`Date` datetime NOT NULL,
PRIMARY KEY (`NewsID`),
FULLTEXT KEY `Title` (`Title`,`Content`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
I try to insert the character sequence mentioned above into the Title field using the following code (it runs on Zend Framework):
$params = $this->getRequest()->getParams();
$mysqli = new mysqli("localhost", "user", "pass", "database_name");
$mysqli->query("INSERT INTO `news` (`NewsID`, `Title`) VALUES (NULL, '".$params['text']."');");
And in the database I get the following value for the Title field: ÃãÂâÎîȘșȚț
Why are these characters html encoded? And why aren't the first characters encoded to their utf8_bin equivalent ?
Thanks.
In my case I just updated php db connection settings with the following line:
mysqli_set_charset( $con, 'utf8');
I also added <meta http-equiv="content-type" content="text/html; charset=UTF-8"> to the HTML file, as @liyakat mentioned.
Old thread, but maybe someone needs to know this.
Be sure that your IDE or text editor is also set to use UTF-8 characters.
To set the default to UTF-8, you want to add the following to my.cnf
[client]
default-character-set=utf8
[mysqld]
character-set-server = utf8
Then, to verify:
mysql> show variables like "%character%";show variables like "%collation%";
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)
Alternatively, try setting the MySQL connection to UTF-8:
SET NAMES 'utf8'
And send explicit UTF-8 headers, just in case your server has some other default settings:
header('Content-type: text/html; charset=utf-8');

mysql delivering a 'Can't initialize character set utf-8 (path: /usr/share/mysql/charsets/)' error, no utf8.xml file there

I am on the Path of learning more about mysqli and all that exciting stuff but I get blocked quite soon.
I have a local server on my debian box. It is up to date, has php and mysql installed and running smoothly.
I was looking to learn a bit more on mysqli and as I tried the following code:
<?php
$db = new mysqli('localhost', 'userdb', 'pwuserdb', 'db');
if(!$db->set_charset('utf-8')) {
printf("Error setting the character set utf-8: %s\n", $db->error);
} else {
printf("Current character set is: %s\n", $db->character_set_name());
}
print_r($db->get_charset());
?>
I was, to my surprise, getting the following message, when visiting the page:
Error setting the character set utf-8: Can't initialize character set utf-8 (path: /usr/share/mysql/charsets/) stdClass Object ( [charset] => latin1 [collation] => latin1_swedish_ci [dir] => [min_length] => 1 [max_length] => 1 [number] => 8 [state] => 801 [comment] => cp1252 West European )
I thought to myself that it is logical as I didn't set up utf-8 as the standard charset of mysql so I completed with the following settings in the my.cnf file:
for [mysqld]
default-character-set=utf8
for [client]
default-character-set=utf8
I also logged into mysql from the command line and ran
ALTER DATABASE db CHARSET=utf8;
I also reloaded mysql from the command line, as well as apache.
When looking how things are going on in mysql, almost everything looks alright:
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
But due to the fact that it seems like mysql cannot locate the utf8 file, I checked for a utf8.xml file in the /usr/share/mysql/charsets/ folder and there isn't one.
In the Index.xml file under this directory there is the mention of utf8, in the list of the charsets but I suppose that the problem comes from the fact that the xml file is missing in the directory.
Just for information, my system locales are all UTF8 (en and pl), and I cannot understand why the utf8.xml file is not in the directory, as I haven't been goofing around with this directory or its content at all.
Any idea/ advice/ recommendation is welcome.
Thank you in advance!
Cheers!
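Did you try
if(!$db->set_charset('utf8')) {
without the dash?
MySQL's name for the character set is utf8, not utf-8, which is what all your research on your system points to ;) The name utf-8 is not one MySQL recognizes, so the client cannot initialize it.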
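One way to guard against this spelling mistake is to normalize the name before handing it to set_charset(). This is only a sketch: normalize_mysql_charset is a hypothetical helper, and the list of accepted names is deliberately tiny.

```php
<?php
// Hypothetical helper: map common spellings to the names MySQL uses.
function normalize_mysql_charset(string $name): string
{
    // "UTF-8", "utf_8" and "Utf8" all collapse to the key "utf8".
    $key = strtolower(str_replace(['-', '_'], '', $name));

    // Deliberately small whitelist for this sketch.
    $known = ['utf8', 'utf8mb4', 'latin1'];
    if (!in_array($key, $known, true)) {
        throw new InvalidArgumentException("Unknown charset name: $name");
    }
    return $key;
}

echo normalize_mysql_charset('utf-8'), "\n";   // utf8
echo normalize_mysql_charset('UTF8MB4'), "\n"; // utf8mb4
```

The result can then be passed straight to $db->set_charset().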

Utf-8 characters displayed as ISO-8859-1

I've got an issue with inserting/reading utf8 content from a db. All verifications I'm doing seem to point to the fact that the content in my DB should be utf8 encoded, however it seems to be latin encoded. The data are initially imported from a PHP script from the CLI.
Configuration:
Zend Framework Version: 1.10.5
mysql-server-5.0: 5.0.51a-3ubuntu5.7
php5-mysql: 5.2.4-2ubuntu5.10
apache2: 2.2.8-1ubuntu0.16
libapache2-mod-php5: 5.2.4-2ubuntu5.10
Verifications:
-mysql:
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_bin |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
-database
created with
CREATE DATABASE mydb CHARACTER SET utf8 COLLATE utf8_bin;
CREATE SCHEMA `mydb` DEFAULT CHARACTER SET utf8 COLLATE utf8_bin ;
mysql> status;
--------------
mysql Ver 14.12 Distrib 5.0.51a, for debian-linux-gnu (i486) using readline 5.2
Connection id: 7
Current database: mydb
Current user: root@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 5.0.51a-3ubuntu5.7-log (Ubuntu)
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: utf8
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8
UNIX socket: /var/run/mysqld/mysqld.sock
Uptime: 9 min 45 sec
-sql: before doing my inserts I run
SET names 'utf8';
-php: before doing my inserts I use utf8_encode() and mb_detect_encoding(), which gives me 'UTF-8'. After retrieving the content from the db and before sending it to the user, mb_detect_encoding() also gives 'UTF-8'.
Validation test:
the only way for me to have the content displayed properly is to set the content type to latin (If I sniff the traffic I can see the content-type header with ISO-8859-1):
ini_set('default_charset', 'ISO-8859-1');
This test shows that the content comes out as latin. I don't understand why.
Does anybody have any idea?
Thanks.
Well, I've found that SET NAMES isn't really all that great. Take a peek at the docs...
What I typically do is execute 4 queries:
SET CHARACTER SET 'UTF8';
SET character_set_database = 'UTF8';
SET character_set_connection = 'UTF8';
SET character_set_server = 'UTF8';
Give that a shot and see if that does it for you...
Oh, and remember, all UTF-8 characters <= 127 are valid ISO-8859-1 characters as well. So if you only have characters <= 127 in the stream, mb_detect_encoding will fall on the higher prevalence charset (which is by default "UTF-8")...
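The point about characters <= 127 can be checked directly. The sketch below uses two hypothetical helpers built on preg_match rather than mb_detect_encoding, so it does not depend on the mbstring extension.

```php
<?php
// Hypothetical helper: true when every byte is in the ASCII range 0..127.
function is_ascii_only(string $s): bool
{
    // Without the /u modifier, PCRE matches byte by byte.
    return preg_match('/[\x80-\xFF]/', $s) === 0;
}

// Hypothetical helper: true when the bytes form well-formed UTF-8.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Pure ASCII is byte-identical in UTF-8 and ISO-8859-1, so no detector
// can tell the two apart for this string.
var_dump(is_ascii_only("cafe"));        // bool(true)
var_dump(is_valid_utf8("cafe"));        // bool(true)

// "café" with a Latin-1 é (0xE9) is not valid UTF-8 ...
var_dump(is_valid_utf8("caf\xE9"));     // bool(false)
// ... while the UTF-8 form (0xC3 0xA9) is.
var_dump(is_valid_utf8("caf\xC3\xA9")); // bool(true)
```

So a 'UTF-8' result from detection only tells you something when the data actually contains bytes above 127.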
What are you doing before retrieval? Also a 'SET NAMES utf8;'? Otherwise, MySQL will silently convert to the charset the connection indicates as used.
If not even that, what does SHOW FULL COLUMNS FROM table; show? A table having a default charset does not mean every column uses it. For example, this is valid:
CREATE TABLE test (
`name` varchar(10) character set latin1
) CHARSET=utf8

MySQL or PHP is appending a Â whenever the £ is used

Answers provided have all been great, I mentioned in the comments of Alnitak's answer that I would need to go take a look at my CSV Generation script because for whatever reason it wasn't outputting UTF-8.
As was correctly pointed out, it WAS outputting UTF-8 - the problem existed with Ye Olde Microsoft Excel which wasn't picking up the encoding the way I would have liked.
My existing CSV generation looked something like:
// Create file and exit;
$filename = $file."_".date("Y-m-d_H-i",time());
header("Content-type: application/vnd.ms-excel");
header("Content-disposition: csv" . date("Y-m-d") . ".csv");
header( "Content-disposition: filename=".$filename.".csv");
echo $csv_output;
It now looks like:
// Create file and exit;
$filename = $file."_".date("Y-m-d_H-i",time());
header("Content-type: text/csv; charset=ISO-8859-1");
header("Content-disposition: csv" . date("Y-m-d") . ".csv");
header("Content-disposition: filename=".$filename.".csv");
echo iconv('UTF-8', 'ISO-8859-1', $csv_output);
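The conversion step can be tried in isolation, without headers or a browser. In this sketch the rows are made-up stand-ins for the report data; the CSV is built in memory as UTF-8 and converted once at the output boundary, as in the script above.

```php
<?php
// Hypothetical rows standing in for the report data.
$rows = [
    ['amount', 'domain'],
    ["\xC2\xA310.93", 'blargh.com'], // "£10.93" as UTF-8 bytes
];

// Build the CSV in memory as UTF-8.
$fh = fopen('php://temp', 'r+');
foreach ($rows as $row) {
    fputcsv($fh, $row, ',', '"', '\\');
}
rewind($fh);
$csvUtf8 = stream_get_contents($fh);
fclose($fh);

// Convert once, just before output, to the encoding Excel is assumed to expect.
$csvLatin1 = iconv('UTF-8', 'ISO-8859-1', $csvUtf8);

// First byte of the second line: the pound sign is now the single byte 0xA3.
echo bin2hex(substr($csvLatin1, strpos($csvLatin1, "\n") + 1, 1)), "\n"; // a3
```

Keeping the data as UTF-8 everywhere else and converting only for this one consumer avoids reintroducing Latin-1 into the rest of the application.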
-------------------------------------------------------
ORIGINAL QUESTION
Hi,
I've got a form which collects data. The form works OK, but I've just noticed that if someone types or uses a '£' symbol, the MySQL DB ends up with 'Â£'.
Not really sure where or how to stop this from happening, code and DB information to follow:
MySQL details
mysql> SHOW COLUMNS FROM fraud_report;
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | mediumint(9) | | PRI | NULL | auto_increment |
| crm_number | varchar(32) | YES | | NULL | |
| datacash_ref | varchar(32) | YES | | NULL | |
| amount | varchar(32) | YES | | NULL | |
| sales_date | varchar(32) | YES | | NULL | |
| domain | varchar(32) | YES | | NULL | |
| date_added | datetime | YES | | NULL | |
| agent_added | varchar(32) | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
8 rows in set (0.03 sec)
PHP Function
function processFraudForm($crm_number, $datacash_ref, $amount, $sales_date, $domain, $agent_added) {
// Insert Data to DB
$sql = "INSERT INTO fraud_report (id, crm_number, datacash_ref, amount, sales_date, domain, date_added, agent_added) VALUES (NULL, '$crm_number', '$datacash_ref', '$amount', '$sales_date', '$domain', NOW(), '$agent_added')";
$result = mysql_query($sql) or die (mysql_error());
if ($result) {
$outcome = "<div id=\"success\">Emails sent and database updated.</div>";
} else {
$outcome = "<div id=\"error\">Something went wrong!</div>";
}
return $outcome;
}
Example DB Entry
+----+------------+--------------+---------+------------+--------------------+---------------------+------------------+
| id | crm_number | datacash_ref | amount | sales_date | domain | date_added | agent_added |
+----+------------+--------------+---------+------------+--------------------+---------------------+------------------+
| 13 | 100xxxxxxx | 10000000 | Â£10.93 | 18/12/08 | blargh.com | 2008-12-22 10:53:53 | agent.name |
What you're seeing is UTF-8 encoding - it's a way of storing Unicode characters in a relatively compact format.
The pound symbol has value 0x00a3 in Unicode, but when it's written in UTF-8 that becomes 0xc2 0xa3 and that's what's stored in the database. It seems that your database table is already set to use UTF-8 encoding. This is a good thing!
If you pull the value back out from the database and display it on a UTF-8 compatible terminal (or on a web page that's declared as being UTF-8 encoded) it will look like a normal pound sign again.
Â£ is 0xC2 0xA3, which is the UTF-8 encoding of the £ symbol, so you're storing it as UTF-8 but presumably viewing it as Latin-1 or something other than UTF-8.
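The misreading can be reproduced in a few lines. This sketch takes the UTF-8 bytes of the pound sign and deliberately treats them as ISO-8859-1, which is what a viewer with the wrong charset does.

```php
<?php
// The pound sign as UTF-8 bytes, as stored in the database.
$utf8Pound = "\xC2\xA3";

// Simulate a viewer that wrongly assumes ISO-8859-1: every byte becomes
// its own character. Converting "from ISO-8859-1" to UTF-8 reproduces that.
$misread = iconv('ISO-8859-1', 'UTF-8', $utf8Pound);

echo $misread, "\n";          // Â£
echo bin2hex($misread), "\n"; // c382c2a3

// Reversing the same mistaken step recovers the original bytes.
echo bin2hex(iconv('UTF-8', 'ISO-8859-1', $misread)), "\n"; // c2a3
```

The stored data is fine in this situation; only the declared encoding of the viewer needs to change.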
It's useful to know how to spot and decode UTF-8 by hand - check the wikipedia page for info on how the encoding works:
0xC2A3 = 110 00010 10 100011
The leading 110 and 10 are marker bits; the remaining bits (00010 and 100011) are the actual "payload", which gives 00010100011, i.e. 10100011, which is 0xA3, the pound symbol.
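The same hand decoding can be written as a short function. This is a sketch for two-byte sequences only; decode_two_byte_utf8 is a hypothetical name, and it does not reject overlong encodings.

```php
<?php
// Decode one two-byte UTF-8 sequence (110xxxxx 10yyyyyy) into a code point.
function decode_two_byte_utf8(string $bytes): int
{
    if (strlen($bytes) !== 2) {
        throw new InvalidArgumentException('expected exactly two bytes');
    }
    $lead = ord($bytes[0]);
    $cont = ord($bytes[1]);

    // Check the marker bits: 110 on the lead byte, 10 on the continuation byte.
    if (($lead & 0xE0) !== 0xC0 || ($cont & 0xC0) !== 0x80) {
        throw new InvalidArgumentException('not a two-byte UTF-8 sequence');
    }

    // Strip the markers and join the 5 + 6 payload bits.
    return (($lead & 0x1F) << 6) | ($cont & 0x3F);
}

printf("U+%04X\n", decode_two_byte_utf8("\xC2\xA3")); // U+00A3
```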
In PHP, another small scale solution is to do a string conversion on the returned utf8 string:
print iconv('UTF-8', 'ASCII//TRANSLIT', "Mystring â"); //"Mystring "
Or on other platforms, fire a system call to the iconv command (Linux / OS X)
http://php.net/manual/en/function.iconv.php#83238
You need to serve your HTML in utf-8 encoding (actually everyone needs to do this I think!)
Header like:
Content-Type: text/html; charset=UTF-8
Or the equivalent. Double check the details though. Should always be declaring the charset as a browser can default to anything it likes.
To remove a Â use:
$column = str_replace("\xc2\xa0", '', $column);
Credits among others: How to remove all occurrences of c2a0 in a string with PHP?
Thanks a lot. I had been suspecting MySQL of corrupting the pound symbol. Now all I need to do, wherever the CSV record is generated, is wrap it in the iconv function. This is a good job, and I am happy that at least someone showed exactly what to do. I sincerely appreciate displaying the previous and the new 'header' values. It was a great help to me.
-mark
Suppose you save the line "The £50,000 Development Challenge" in columns of two different data types, i.e. a "varchar" and a "text" field.
Before saving, I replaced the symbol with its HTML equivalent using the following call:
str_replace("£", "&pound;", $title);
You will find that the value stored in the text field is &pound; whereas in the varchar field it is "£".
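If the goal is an HTML-safe form of the title, PHP's built-in htmlentities() does the same replacement when told the input is UTF-8. This is a swap for the str_replace call above, shown as a sketch; the pound sign is written as escaped bytes so the example does not depend on how the source file is saved.

```php
<?php
// "The £50,000 Development Challenge" with the pound sign as UTF-8 bytes.
$title = "The \xC2\xA350,000 Development Challenge";

// Declare the input encoding explicitly so the conversion does not depend
// on the default_charset setting.
$encoded = htmlentities($title, ENT_QUOTES, 'UTF-8');
echo $encoded, "\n"; // The &pound;50,000 Development Challenge

// html_entity_decode() restores the original UTF-8 bytes.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
var_dump($decoded === $title); // bool(true)
```

Encoding at output time rather than before storage keeps the database holding plain UTF-8 text.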
