PHP displaying Chinese characters: SET NAMES 'utf8' not working

I'm trying to work with a database I have, but I can't display the Chinese characters in it. The database was originally an MS Access file that I converted to MySQL with a program. Many rows contain Chinese characters, and I can't get them to display properly in any browser.
I can display Chinese characters just fine otherwise, and I can also see them when I look at the tables in phpMyAdmin. I searched for a solution, and the usual fix seems to be the "SET NAMES 'utf8'" query, but that only changed the displayed characters from question marks to other, weird symbols.
In phpMyAdmin, the collation is utf8_general_ci for the database and all the tables.
Any ideas?

With the old mysql_* extension (deprecated, and removed in PHP 7), set the connection charset right after connecting:
$dbh = mysql_connect($hostname, $username, $password);
mysql_select_db($db, $dbh);
mysql_set_charset('utf8', $dbh);
PDO solution:
$dbh = new PDO("mysql:host=$hostname;dbname=$db;charset=utf8", $username, $password);
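PHP only interpolates variables inside double-quoted strings, so a DSN written in single quotes passes the literal text $hostname to the driver. Also, MySQL expects its own charset names (utf8, utf8mb4) rather than "UTF-8". A minimal sketch of a DSN builder; buildMysqlDsn is a hypothetical helper, and utf8mb4 is an assumed default (utf8 also covers common Chinese characters):

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN with an explicit connection charset.
// Double quotes are required so that $host, $db and $charset are interpolated.
function buildMysqlDsn(string $host, string $db, string $charset = 'utf8mb4'): string
{
    // Use MySQL charset names here (utf8, utf8mb4), not "UTF-8".
    return "mysql:host={$host};dbname={$db};charset={$charset}";
}

// Usage (requires a running MySQL server and the pdo_mysql extension):
// $dbh = new PDO(buildMysqlDsn($hostname, $db), $username, $password);
echo buildMysqlDsn('localhost', 'shop'), PHP_EOL;
```

Keeping the charset in the DSN means it is applied when the connection is opened, before any query runs.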

You'd have to make sure of a few things:
Before import, the character set of the table you're going to use has to be set as utf8. You must also make sure the imported data actually contains proper utf8 encoded characters.
At the time of import, you have to specify the character set of the established session (e.g. by running SET NAMES utf8;).
After import, you should write a small script that reads a row you know contains special characters; the script must:
send header('Content-Type: text/plain; charset=utf-8'); (or whichever MIME type you need)
set the correct character set for the established MySQL connection (utf8)
If all goes well, it should display your data correctly.
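In that verification script it can help to print the raw bytes next to the value, so you can tell whether the stored data or only its display is wrong. A small sketch, assuming the mbstring extension is available; describeBytes is a hypothetical helper:

```php
<?php
// In the real script, send header('Content-Type: text/plain; charset=utf-8')
// before any output.

// Hypothetical helper: shows the raw bytes of a value and whether they form valid UTF-8.
function describeBytes(string $value): string
{
    $valid = mb_check_encoding($value, 'UTF-8') ? 'valid UTF-8' : 'NOT valid UTF-8';
    return bin2hex($value) . ' (' . $valid . ')';
}

// "中文" encoded as UTF-8 is six bytes: e4 b8 ad e6 96 87.
echo describeBytes("\u{4E2D}\u{6587}"), PHP_EOL; // e4b8ade69687 (valid UTF-8)
```

If the bytes are correct UTF-8 but the page still shows garbage, the problem is in the headers or the connection charset rather than in the data.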

Related

Characters getting encoded to �

I am using PHP + MySQL to make a dynamic page. My database contains “Make, which appears as �Make on the web page. I thought it was an encoding issue, so I tried using <html lang='en' dir='ltr'> and <meta charset="utf-8" />, but that didn't help.
When dealing with any charset, it's important to set everything to the same one. You mentioned having set both the PHP and HTML headers to UTF-8, which often does the trick, but the database connection, the database itself and its tables must also use UTF-8.
Connection
You also need to specify the charset in the connection itself.
PDO (specified in the object itself):
$handler = new PDO('mysql:host=localhost;dbname=database;charset=utf8', 'username', 'password', array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET CHARACTER SET UTF8"));
MySQLi: (placed directly after creating the connection)
For OOP: $mysqli->set_charset("utf8");
For procedural: mysqli_set_charset($mysqli, "utf8");
(where $mysqli is the MySQLi connection)
MySQL (deprecated and removed in PHP 7; you should convert to PDO or MySQLi): (placed directly after creating the connection)
mysql_set_charset("utf8");
Database and tables
Your database and all its tables have to be set to UTF-8. Note that a charset is not the same thing as a collation (see this post).
You can do that by running the queries below once for each database and each table (for example in phpMyAdmin):
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
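Since the ALTER TABLE statement has to be run once per table, generating the statements can save typing. A sketch under stated assumptions: conversionStatements is a hypothetical helper, the table list would come from something like SHOW TABLES, and the defaults mirror the statements above (utf8mb4 with utf8mb4_unicode_ci is the choice if you need characters above U+FFFF). Keep in mind that CONVERT TO transcodes existing data, so it assumes the stored bytes really are in the table's current charset; back up first.

```php
<?php
// Hypothetical helper: generates the one-off conversion statements for a database
// and each of its tables. Review the output before running it.
function conversionStatements(
    string $database,
    array $tables,
    string $charset = 'utf8',
    string $collation = 'utf8_unicode_ci'
): array {
    // Backticks quote the identifiers; names are assumed not to contain backticks.
    $statements = [
        "ALTER DATABASE `{$database}` CHARACTER SET {$charset} COLLATE {$collation};",
    ];
    foreach ($tables as $table) {
        $statements[] = "ALTER TABLE `{$table}` CONVERT TO CHARACTER SET {$charset} COLLATE {$collation};";
    }
    return $statements;
}

foreach (conversionStatements('databasename', ['posts', 'users']) as $sql) {
    echo $sql, PHP_EOL;
}
```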
File-encoding
The file itself might also need to be UTF-8 encoded. If you're using Notepad++ to write your code, this can be done in the "Format" menu on the menu bar (use Convert to..., as this won't mess up your current file), and any decent IDE has a similar option. You should use UTF-8 without BOM (see this StackOverflow question).
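A BOM at the start of a PHP file is sent to the browser as output before <?php, which can break later header() calls. A minimal sketch of checking for and stripping it; hasUtf8Bom and stripUtf8Bom are hypothetical names:

```php
<?php
// The UTF-8 byte order mark is the three bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// True if the contents start with a UTF-8 BOM.
function hasUtf8Bom(string $contents): bool
{
    return strncmp($contents, UTF8_BOM, 3) === 0;
}

// Returns the contents without a leading BOM (unchanged if there is none).
function stripUtf8Bom(string $contents): string
{
    return hasUtf8Bom($contents) ? substr($contents, 3) : $contents;
}

// Usage sketch: $clean = stripUtf8Bom(file_get_contents('page.php'));
```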
Other
It may be that you already have values in your database that are not encoded with UTF-8. Updating them manually could be a pain and could consume a lot of time. Should this be the case, you could use something like ForceUTF8 and loop through your databases, updating the fields with that function.
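If you would rather not pull in a library, the core idea can be sketched with mbstring alone. This is not ForceUTF8's API; toUtf8 is a hypothetical helper, and it assumes the non-UTF-8 values were stored as ISO-8859-1. Values that already happen to be valid UTF-8 are left untouched, so treat it as a heuristic:

```php
<?php
// Hypothetical helper: leave valid UTF-8 alone, re-encode anything else from
// an assumed legacy charset. Change $fallback if your old data used another one.
function toUtf8(string $value, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid UTF-8, nothing to do
    }
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

// "caf\xE9" is "café" in ISO-8859-1; it is not valid UTF-8, so it gets converted.
echo toUtf8("caf\xE9"), PHP_EOL; // café
```

In a migration loop you would read each field, pass it through the function, and write it back only if it changed.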
Should you follow all of the pointers above, chances are your problem will be solved. If not, you can take a look at this StackOverflow post: UTF-8 all the way through.
If the � is stored in the database column itself, replace the original character with the corresponding entry from this character reference:
http://www.w3schools.com/charsets/ref_html_ansi.asp

Storing Arabic text in MySQL using PDO in PHP

I'm working on an Arabic site, and I want to store the Arabic input in the database. I've set the character set to utf8mb4_general_ci. When I print the data before the insert query, it shows the correct Arabic value, but when I insert it into the database it is stored as اÙرÙاضâ. I am using PDO in PHP, and I've also set the character set to UTF-8 in the connection string.
$this->pdo = new PDO($dsn, $this->settings["user"],
$this->settings["password"], array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));
But I am not able to store arabic character in my table.
When setting the client charset, you have to make it match the actual encoding of the data.
So, if your input data is in utf-8, everything should work, but in this case why would you set database charset to utf8mb4, not utf8?
If your input data encoding is different from utf-8, then you have to set names to match this actual encoding.
Also, setting the charset in PDO::MYSQL_ATTR_INIT_COMMAND is largely superstition. Although it works in most cases, it is better to set it via the DSN, which works in all currently supported PHP versions. Note that MySQL's encoding names differ slightly from the commonly used ones (utf8, not UTF-8).
Regarding the strange characters you're observing, they are most likely no more than measurement error. The tool you are using to browse the database has to both support that encoding and be set up to display it properly.
All the above assumes that the statement "I'd set the character set to utf8mb4_general_ci." is about setting the table charset.
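On the utf8 versus utf8mb4 question: MySQL's utf8 stores at most three bytes per character, which covers Arabic, while characters above U+FFFF (most emoji, for example) need utf8mb4. A quick check, sketched with a hypothetical needsUtf8mb4 helper and PCRE's /u mode:

```php
<?php
// Hypothetical helper: true if the string contains any code point above U+FFFF,
// i.e. a character that MySQL's three-byte utf8 cannot store but utf8mb4 can.
function needsUtf8mb4(string $value): bool
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    // For invalid UTF-8, preg_match returns false, so this also returns false.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $value) === 1;
}

$arabic = "\u{0627}\u{0644}\u{0631}\u{064A}\u{0627}\u{0636}"; // الرياض
var_dump(needsUtf8mb4($arabic));     // bool(false)
var_dump(needsUtf8mb4("\u{1F600}")); // bool(true), an emoji
```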

Doing INSERT using mysqli_query corrupts utf-8 encoding (Converts to question marks)

I am migrating my database using two connections (one to old and other to the new).
On both the connections I am executing:
mysqli_set_charset($connection, "utf8");
mysqli_query($connection, "SET NAMES utf8");
Also, the PHP file's encoding is set to UTF-8 using the following commands in vim:
:set bomb
:set fileencoding=utf-8
:wq
Both the source and destination tables have respective fields set to "utf8_general_ci" collation.
The source is on MySQL 5.0 and the destination is on MySQL 5.5.
I am also using mysqli_real_escape_string on the content that is extracted from source.
The above steps don't seem to work: Unicode data (in UTF-8 encoding) gets converted to question marks in the destination. What am I doing wrong?
(Please also note that I cannot directly import the data into the destination, as the table structure is changing.)
Both the source and destination tables have respective fields set to "utf8_general_ci" collation.
Looks like destination table doesn't
Found the problem:
Though the fields and the tables were set to "utf8_general_ci" collation, the database itself was set to "latin1_swedish_ci" collation.
Changed the collation to "utf8_general_ci" and it fixed the problem.

Converting latin1_swedish_ci to utf8 with PHP

I have a database filled with values like ♥•â—♥ Dhaka ♥•â—♥ (which should be ♥•●♥ Dhaka ♥•●♥), because I didn't specify the collation when creating the database.
Now I want to fix it. I can't fetch the data again from its original source, so I was wondering whether it's possible to fetch the data in a PHP script and convert it to the correct characters.
I've changed the collation of the database and the fields to utf8_general_ci.
The collation is NOT the same as the character set. The collation is only used for sorting and comparison of text (that's why there's a language term in there). The actual character set may be different.
The most common failure is not in the database but in the connection between PHP and MySQL. The default charset for the connection is usually ISO-8859-1. You need to change it as the first thing you do after connecting, using either the SQL query SET NAMES 'utf8'; or the mysql_set_charset function.
Also check the character set of your tables. This may be wrong as well if you have not specified UTF-8 to begin with (again: this is not the same as the collation). But make sure to take a backup before changing anything here. MySQL will try to convert the charset from the previous one, so you may need to reload the data from backup if you have actually saved UTF-8 data in ISO-8859-1 tables.
I would look into mb_detect_encoding() and mb_convert_encoding() and see if they can help you.
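For data like the examples above, where UTF-8 bytes were read as a single-byte charset and encoded again, mb_convert_encoding() can often reverse the extra pass. A sketch, assuming the extra pass was ISO-8859-1; undoDoubleEncoding is a hypothetical helper. MySQL's latin1 actually behaves like Windows-1252, so values containing characters outside ISO-8859-1 (such as ™) are returned unchanged by the guard below rather than repaired. Test on a copy of the data first.

```php
<?php
// Hypothetical helper: reverse one extra UTF-8 encoding pass, assuming the
// mangled text came from reading UTF-8 bytes as ISO-8859-1.
function undoDoubleEncoding(string $value): string
{
    // Map each character back to a single byte (U+0000..U+00FF -> 0x00..0xFF).
    $bytes = mb_convert_encoding($value, 'ISO-8859-1', 'UTF-8');
    // Accept the result only if no character was lost and the bytes are valid UTF-8.
    $lossless = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1') === $value;
    return ($lossless && mb_check_encoding($bytes, 'UTF-8')) ? $bytes : $value;
}

$mangled = "\u{C3}\u{A6}\u{C3}\u{B8}\u{C3}\u{A5}"; // "Ã¦Ã¸Ã¥", i.e. "æøå" encoded twice
echo undoDoubleEncoding($mangled), PHP_EOL; // æøå
```

Correctly encoded text such as "æøå" fails the UTF-8 check after the reverse step, so it passes through unchanged.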

Double UTF-8 encoding, but why? PHP/MySQL

I have some forms that insert data into a MySQL database, and for some reason the characters get double UTF-8 encoded. You don't see it on the front end of my website, but you do in the back end: if I look at the data in phpMyAdmin, it's double encoded.
Also, to display data entered through phpMyAdmin, I have to utf8_encode() it.
If I use utf8_decode() on my data before I put it into the database, it works, but then I have to use utf8_encode() again to display the data properly, and I would like a better solution than rewriting most of my code.
The characters I'm dealing with are the Danish æ, ø and å.
I have set every setting I can find in php.ini to UTF-8, everything I can find in phpMyAdmin to utf8, and the HTML meta tag to UTF-8, and I still have this problem.
So my question is: does anyone know why this happens, or how I could fix it?
Update: After running the MySQL code Jako suggested, data coming from the front end is properly encoded in the database, but I still need to run utf8_encode() to display the data properly on the front end. Any ideas?
Update 2: Again, after running the code from the answer to this question I still had problems: the encoding kept switching between UTF-8 and something else, and I suspect phpMyAdmin of resetting it somehow. I found a new way of doing things, and it works flawlessly; it's described in my answer below.
I ran into similar issues in the past and this did the trick for me.
mysql_query("SET names UTF8");
Okay, I found the answer! I searched a bit more (I had been at it for hours) and found this question: Whether to use “SET NAMES”
Using the answer from that question i ran this query:
mysql_query("SET character_set_client = UTF8");
mysql_query("SET character_set_results = UTF8");
mysql_query("SET character_set_connection = UTF8");
mysql_query("SET names UTF8");
That's it, it works fine now on all my php scripts. Still thanks to Jako for leading me in the right way. ;)
Update: Since the encoding kept switching and the settings didn't persist, I found a new solution that works. I added mysql_set_charset() to my connection script:
mysql_set_charset("UTF8");
It gives me the right data from the database every time and inserts the right data as well, so this was only one line of code.
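Whichever approach you use, you can confirm what the session actually ended up with. A sketch, assuming the results of SHOW VARIABLES LIKE 'character_set%' have been fetched into a name => value array; charsetMismatches is a hypothetical helper and checks only the three variables that SET NAMES sets. Newer MySQL versions may report utf8 as utf8mb3, so pass the expected name accordingly.

```php
<?php
// Hypothetical helper: list the session charset variables that differ from $expected.
// Missing variables are reported with a null value.
function charsetMismatches(array $variables, string $expected = 'utf8'): array
{
    $checked = ['character_set_client', 'character_set_connection', 'character_set_results'];
    $mismatches = [];
    foreach ($checked as $name) {
        $actual = $variables[$name] ?? null;
        if ($actual !== $expected) {
            $mismatches[$name] = $actual;
        }
    }
    return $mismatches;
}

// Usage sketch with mysqli (needs a live connection):
// $rows = mysqli_query($link, "SHOW VARIABLES LIKE 'character_set%'")->fetch_all();
// $vars = array_column($rows, 1, 0); // name => value
$vars = [
    'character_set_client' => 'utf8',
    'character_set_connection' => 'utf8',
    'character_set_results' => 'latin1',
];
print_r(charsetMismatches($vars));
```

An empty result means all three settings match the expected charset.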
