Issue with Chinese characters, PHP and MySQL - php

I have a MySQL database with the following settings:
MySQL charset: UTF-8 Unicode (utf8)
MySQL connection collation: utf8_unicode_ci
I have a table with a column named "softtitle", whose collation is utf8_general_ci. The entries in this column contain Chinese characters. If I run SQL through the phpMyAdmin control panel, the Chinese characters are shown correctly. But if I run SQL through a PHP file, all Chinese characters are displayed incorrectly. Here is the PHP file:
<?php
header("Content-Type: text/html; charset=utf8_general_ci");
mysql_connect("116.123.163.73","xxdd_f1","xxdd123"); // host, username, password
mysql_select_db("xxdd");
mysql_query("SET names 'utf8_general_ci'");
$q = mysql_query("SELECT softtitle
FROM dede_ext
LIMIT 0 , 30");
while($e = mysql_fetch_assoc($q))
$output[] = $e;
print(json_encode($output));
mysql_close();
?>
What is wrong here? What should I do to fix this problem? Thank you very much!

Your header is wrong. It takes the name of a character encoding, not the collation of the table/database:
header("Content-Type: text/html; charset=UTF-8");
The same applies to "SET NAMES":
mysql_query("SET names 'utf8'");
One last thing: you are printing JSON-encoded data, so your Content-Type shouldn't be text/html but application/json.
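To illustrate the JSON point, here is a minimal sketch. It assumes the rows coming back from the database are already valid UTF-8; the `$rows` array is made-up sample data standing in for the query result, not output from the asker's table.

```php
<?php
// Hypothetical sample rows standing in for the result of the SELECT above.
$rows = [
    ['softtitle' => '中文软件'],
    ['softtitle' => '测试'],
];

// json_encode() requires UTF-8 input. By default it escapes non-ASCII
// characters as \uXXXX; JSON_UNESCAPED_UNICODE keeps them readable.
$escaped  = json_encode($rows);
$readable = json_encode($rows, JSON_UNESCAPED_UNICODE);

// Send the JSON media type instead of text/html.
if (!headers_sent()) {
    header('Content-Type: application/json; charset=utf-8');
}
echo $readable, "\n";
```

Both forms decode to the same data on the client; the unescaped one is just easier to read when debugging.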

Because this Q&A still ranks highly in Google search results:
Setting aside for a moment the general advice to switch from the mysql functions in PHP to mysqli, replace this:
mysql_connect("116.123.163.73","xxdd_f1","xxdd123");
mysql_select_db("xxdd");
with this:
$con = mysql_connect("116.123.163.73","xxdd_f1","xxdd123");
mysql_set_charset('utf8', $con); // the charset comes first, then the link
mysql_select_db("xxdd");
Check that your database, or at least the column, really is collated as utf8_general_ci, though you might get better results with utf8mb4_unicode_ci.
If your PHP produces purely JSON output, for example a JSON object that you pick up with an AJAX call to pull data into another document, the header you should use in the PHP file is
header("Content-Type: application/json", true);
and not a Content-Type of text/html or anything else.
Finally, assuming you eventually present the values taken from the DB and placed into your JSON in an HTML document, remember to start that document with the following:
<!DOCTYPE html>
<html lang="zh-Hans">
<head>
<meta charset="utf-8">
...etc
Note the declaration of utf-8 in the meta element of the head, and the declaration of the language (Simplified Chinese in my code above = zh-Hans). If you are using a script that is written from right to left, e.g. Arabic scripts, add dir="rtl" to the html tag as well.

Related

How do I recover the actual utf8 code from data within MySQL?

I am storing an emoji as part of a string in a text field in MySQL:
<div><span id="emoji_1f600">&#x1f600</span></div>
The field in MySQL has utf8_general_ci set. When the data is stored into MySQL the field, the data now looks like this:
<div><span id="emoji_1f600">😀</span></div>
I am assuming that is because of how the emoji is stored. Please educate me if I am wrong on this point, as I thought I would have seen the unicode of &#x1f600 instead of the strange characters.
I then fetch the data from the MySQL field into a php var and do a substring to get just the actual emoji between the span tags. The value in the php var now looks like this:
"C0E8Kb,"
My code makes an attempt to get the unicode back by doing the following:
$code = utf8_encode($code) //$code contains the string "C0E8KB,"
The result is "CB0CB8CBC"BB,"
I am obviously not handling the emoji utf8 code properly and welcome any and all help and instruction.
Thanks in advance.
I don't really need UTF8 all the way through, just on one field, which is typed as utf8 in MySQL.
Ok I made a major mistake in my problem description. It is true that my code is producing the following html
<div><span id="emoji_1f600">&#x1f600</span></div>
However, this html is within an editor from a 3rd party and the emoji code within my span tag is actually being rendered as an emoji. So when I save the data from the editor, what I get back from the editor is the following:
<div>test 2 <span id="emoji_1f600">😀</span></div>
I am assuming the strange chars between the span tags is the actual emoji, since it is being rendered. Is this ok as is, or should I be replacing that with the actual &#x1f600 code, prior to storing it in the database? My fear is that if I do that, then the actual emoji will not get rendered when I place the string from the database into an html string to be rendered.
Your problem is assuming that MySQL's character set called utf8 is actually UTF-8. It isn't. MySQL's utf8 is a subset of UTF-8 limited to 3 bytes per character, which does not cover emoji. To tell MySQL to stop corrupting your data and give an error instead when invalid characters are given for the column, enable the STRICT_TRANS_TABLES sql_mode. To make MySQL use real UTF-8, with up to 4 bytes per character, set the column's character set to "utf8mb4". In short, MySQL's utf8 is a misleadingly named subset of UTF-8, and real UTF-8 is called utf8mb4 in MySQL. (This is also true of MariaDB, by the way, which inherited the naming from the MySQL source code it was forked from.)
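A quick way to see the 3-byte limit in action is to count bytes in PHP. This is only a sketch: `utf8_bytes` and `needs_utf8mb4` are hypothetical helper names, and the check assumes the input is valid UTF-8.

```php
<?php
// Bytes needed to encode one character in UTF-8 (strlen counts bytes).
function utf8_bytes(string $char): int
{
    return strlen($char);
}

// True if the text contains any 4-byte UTF-8 sequence, i.e. something
// MySQL's 3-byte "utf8" cannot store. In valid UTF-8 the bytes F0-F4
// only ever appear as the lead byte of a 4-byte sequence.
function needs_utf8mb4(string $text): bool
{
    return preg_match('/[\xF0-\xF4]/', $text) === 1;
}

echo utf8_bytes("\u{4E2D}"), "\n";   // a Chinese character: 3 bytes
echo utf8_bytes("\u{1F600}"), "\n";  // the emoji from the question: 4 bytes
var_dump(needs_utf8mb4("test 2 \u{1F600}"));
```

A check like this could be run before an insert to decide whether a value would fit in a utf8 column or needs utf8mb4.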
utf8_encode should not be used here, as your data is already UTF-8. It converts from ISO-8859-1 (often found with MySQL) to UTF-8, so it produces bad characters if the input is already UTF-8 encoded. Is the HTML page containing the data that you want to store declared as UTF-8? Something like this:
<head>
<meta charset="UTF-8">
</head>
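To show what goes wrong when utf8_encode is applied to data that is already UTF-8, here is a small sketch. It uses mb_convert_encoding (which needs the mbstring extension) to perform the same ISO-8859-1 to UTF-8 conversion, since utf8_encode is deprecated as of PHP 8.2; the variable names are just for illustration.

```php
<?php
// "é" as UTF-8 is the two bytes C3 A9.
$utf8 = "\u{E9}";

// The same conversion utf8_encode() performs: treat every input byte as an
// ISO-8859-1 character and re-encode it as UTF-8.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";    // c3a9
echo bin2hex($double), "\n";  // c383c2a9, which displays as "Ã©"

// Converting back the other way undoes one layer of encoding.
$repaired = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
echo bin2hex($repaired), "\n";
```

Each byte of the original character becomes its own two-byte character, which is the typical "Ã©" style garbage.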
I was bored so I tried the following code with no issue :
`<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title></title>
</head>
<body>
<div><span id="emoji_1f600">&#x1f600</span></div>
<?php
$mysqli=new mysqli("127.0.0.1", "root", "","utf8_general_mysql");
$num=1;
$text="&#x1f600";
$stmt = $mysqli->prepare("INSERT INTO testtable VALUES (?, ?)");
$stmt->bind_param('ds', $num, $text);
$stmt->execute();
echo '<div><span id="emoji_1f600">&#x1f600</span></div>';
$stmt = $mysqli->prepare("SELECT * FROM testtable WHERE testtable.text='&#x1f600'");
$stmt->execute();
$result = $stmt->get_result();
while ($row = $result->fetch_array(MYSQLI_NUM))
{
foreach ($row as $r)
{
print "$r ";
}
print "\n";
}
?>
</body>
</html>`
Edit:
I really think it has to do with your Content-Type header.
Try adding:
header('Content-type: text/html; charset=utf-8');
then try
header('Content-type: text/html; charset=iso-8859-1'); (this is how you seem to be set)
on the page from which you insert data into MySQL. The two headers produce 2 different rows.
I think the meta charset does not work because HTTP headers can be set elsewhere and take precedence over it; these PHP lines should do the trick, hopefully.
To get these rows, I had to set the headers and replace the previous $text value with $text="😀" in my code sample.

why does my html not display special characters taken from my database

I included this at the top of my php file:
<?php
header('Content-Type: text/html; charset=UTF-8');
?>
I did this because my file.php was not displaying "á, é, í, ó, ú or ¿" in the html file or from data queried from my database.
After I added the header('Content-Type: text/html; charset=UTF-8'); line, my HTML page started to display the special characters in the HTML file correctly, but data received from my database now shows a black rhombus with a question mark.
The collation my database has is "utf8_spanish_ci".
On the html tag I tried to put lang=es, but this never worked. I also tried to put the meta tag inside the head tag:
<!DOCTYPE html>
<html lang=es>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<head>
I also tried:
<meta charset="utf-8">
and:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
I don't know what the problem is. When I insert data directly into the database the special characters are there, but when I insert them from my file.php they appear as random characters.
Does anyone know why this is happening?
There are a couple of reasons this could be happening. The important thing is that your entire pipeline uses the same charset, and that every function that can be set to a specific charset is set to the same one. The most widely used one is UTF-8, which is the one I suggest you use.
Connection
You also need to specify the charset in the connection itself.
PDO (specified in the object itself):
$handler = new PDO('mysql:host=localhost;dbname=database;charset=utf8', 'username', 'password', array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET CHARACTER SET UTF8"));
MySQLi: (placed directly after creating the connection)
* For OOP: $mysqli->set_charset("utf8");
* For procedural: mysqli_set_charset($mysqli, "utf8");
(where $mysqli is the MySQLi connection)
MySQL (deprecated, and removed in PHP 7; you should convert to PDO or MySQLi): (placed directly after creating the connection)
mysql_set_charset("utf8");
Database
Your database and all its tables have to be set to UTF-8. Note that charset is not the same as collation.
You can do that by running the queries below once for each database and tables (for example in phpMyAdmin)
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
File-encoding
It's also important that the .php file itself is UTF-8 encoded. If you're using Notepad++ to write your code, this can be done in the "Format" drop-down on the menu bar (Convert to UTF-8 w/o BOM). You should use UTF-8 w/o BOM.
Should you follow all of the pointers above, chances are your problem will be solved. If not, you can take a look at this StackOverflow post: UTF-8 all the way through.
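Putting the connection step into code, a minimal MySQLi sketch might look like the following. The function name `open_utf8_connection` is hypothetical, the credentials are placeholders, and utf8mb4 is assumed instead of utf8 so that 4-byte characters such as emoji also survive.

```php
<?php
// Hypothetical helper: opens a MySQLi connection and sets its charset.
// Host, user, password and database name are placeholders from the caller.
function open_utf8_connection(string $host, string $user, string $pass, string $db): mysqli
{
    // Throw exceptions on errors instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $mysqli = new mysqli($host, $user, $pass, $db);

    // Sets the charset used for statements, results and real_escape_string().
    // utf8mb4 is an assumption here; use "utf8" if the server lacks it.
    $mysqli->set_charset('utf8mb4');

    return $mysqli;
}

// Usage (needs a reachable server, so it is left commented out):
// $mysqli = open_utf8_connection('localhost', 'username', 'password', 'database');
// echo $mysqli->character_set_name(), "\n";
```

Calling character_set_name() afterwards is a cheap way to confirm that the connection really ended up on the charset you asked for.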
Are you sure? And are you sure that you are retrieving your data from the data base? Having said that, most databases require you to save data in a way that is NOT exactly like your question. There is really valid security reasons for this.
You should use utf8_general_ci as the database collation. Also, before the insert query, you should run this query:
mysql_query("SET NAMES 'utf8'");

Can't figure out whether MySQL or PHP badly encodes UTF8 strings

I have a form on a PHP page which inserts data into a MySQL database.
Some input fields may contain UTF8 characters such as é, è, â, etc. When they are actually inserted into the database, everything gets messed up. For example, a column shows QrÃ©on instead of Qréon.
I used setLocale(LC_CTYPE, 'FR_fr.UTF-8'); at the top of my page for PHP and this <meta charset="utf-8"> is in my HTML header.
My database is run by MySQL, the storage engine is InnoDB, and the collation is utf8_general_mysql500_ci. I also tried utf8_general_ci and utf8_bin but I had no luck.
How do I know if this comes either from PHP or MySQL processing, and how can I fix it ?
Thank you for your time.
I think this can help you:
If you're using mysql:
mysql_query("SET NAMES 'utf8'");
If you're using PDO, use this:
$dbh->exec("set names utf8");
Otherwise it could be one of these that helps in your specific case:
//At the top of your files
ini_set("default_charset", "UTF-8");
header('Content-type: text/html; charset=UTF-8');
//Before your queries
mysql_query("SET CHARACTER SET utf8 ");
mysql_set_charset('utf8');
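To find out which side mangles the data, it can help to look at the raw bytes in PHP and compare them with what `SELECT HEX(column)` reports in MySQL. The sketch below is a rough heuristic only, not a definitive detector; `classify_bytes` is a hypothetical helper and it needs the mbstring extension.

```php
<?php
// Rough classification of a byte string: valid UTF-8, not UTF-8 at all,
// or UTF-8 that looks like it was encoded twice.
function classify_bytes(string $s): string
{
    if (!mb_check_encoding($s, 'UTF-8')) {
        return 'not-utf8';
    }
    // A double-encoded 2-byte character shows up as C3 82/83 followed by
    // C2 and a byte in the range 80-BF (for example "Ã©" for "é").
    if (preg_match('/\xC3[\x82\x83]\xC2[\x80-\xBF]/', $s) === 1) {
        return 'double-encoded';
    }
    return 'utf8';
}

foreach (["Qr\xC3\xA9on", "Qr\xE9on", "Qr\xC3\x83\xC2\xA9on"] as $sample) {
    echo bin2hex($sample), ' => ', classify_bytes($sample), "\n";
}
```

If the bytes are already wrong in PHP before the query runs, the form or the file encoding is the culprit; if they are correct in PHP but HEX() shows the double-encoded pattern, the connection charset is the likely suspect.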

Can not insert french string in database mysql php

I have a form with a text input. When I add the text
Un sac à main de femme recèlerait une quantité importante de bactéries
only "Un sac" is added to the database.
I have tried addslashes, mysql_real_escape_string, htmlspecialchars, etc., and also using UTF-8 encoding, but it still cannot insert the whole string.
You should use utf8_unicode_ci as your column's collation in order for French strings to be added to it.
In order to store non-US strings in the database, you must ensure that each of the following 3 steps is correctly implemented:
Your database table must be set to a charset compatible with French. To be future proof, I recommend creating tables with UTF-8. For more information see the MySQL documentation.
Your database connection must be set to a proper character set both when storing and when querying. To do this, use mysqli_set_charset() (or whatever your MySQL connector offers).
Your input form AND your view page must be served with the exact character set as your data. To do that, you will need to set the following header: header('Content-Type: text/html; charset=UTF-8'); (If you are using a different charset, change it accordingly.)
You can of course use a different character set for storage and representation but why would you want to do that?
Also, when working with databases and HTML, you should consider:
ALWAYS escape your data as it goes into the database. Use mysqli_real_escape_string() or whatever escape method your database connector offers. Also, do NOT set the connection charset by using SET NAMES UTF8, otherwise your connector library will not know what charset to use for escaping. For more information google "sql injection".
ALWAYS escape your data as it goes into HTML with htmlspecialchars(). Also pay attention to ALWAYS provide the correct character set. For more information google "xss".
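For the HTML side, a minimal sketch of escaping with an explicit charset could look like this. The wrapper name `h` is hypothetical and the sample string is adapted from the question.

```php
<?php
// Escape a value for output inside an HTML page that is served as UTF-8.
function h(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

$title = 'Un sac à main <b>de femme</b>';

// The accented letter is left untouched; only the markup is neutralised.
echo h($title), "\n";
```

With these flags, input that is not valid UTF-8 (for example a Latin-1 "à" as the single byte E0) yields an empty string, which is one reason the charset passed here has to match the actual encoding of the data.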
After breaking my head for 2 days straight and reading all the possible answers here's what solved the problem and allows me to insert additional weird characters like em dash etc. and retrieve data without seeing weird characters.
Here's the complete step-by-step setup.
The collation of the db column needs to be: utf8_general_ci
The type is: varchar(250)
In the PHP header set the default client character set to UTF8
mysql_set_charset("UTF8", $link);
Set the character set result so we can show french characters
$sql = "SET character_set_results=utf8";
$result = mysql_query($sql);
In the html header specify, so you can view the french characters:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
When inserting the data do NOT use utf8_decode, just the below will work fine
$query = 'insert into tbl (col) VALUES ("'.mysql_real_escape_string($variable).'")';
Use normal queries to retrieve data, example query:
$query = "select * from table;";
Finally got this fixed, hope this is helpful to others.
In the php:
header ('Content-type: text/html; charset=utf-8');
After connection:
mysql_set_charset("utf8");
Just to follow up on this: I was using dbForge Studio and just pasting in French text, and I had all the collations/encodings set properly. The one thing I didn't have set was the actual encoding for the connection to the db. I set it to UTF8 and all was well again. This is point 2 in Janoszen's answer.
I had the same problem. The input text came from an ANSI file, so it wasn't quite UTF8, despite all my utf8 settings. utf8_encode(input_text) solved it.
I have tried htmlentities(); it saves the string as it is in the database.
You should try this to insert special characters in MySQL:
$con = mysql_connect($server,$uname,$pass);
$res = mysql_select_db($database,$con);
mysql_set_charset("latin1", $con);

mysql encoding problem

I have a problem when inserting text in a foreign language into the database.
I have set the collation of the database to utf8_general_ci (I tried utf8_unicode_ci too),
but when I insert some text into the table, it is saved like this:
Õ€Õ¡ÕµÕ¥Ö€Õ¥Õ¶ Ô±Õ¶Õ¸Ö‚Õ¶
But when I read from the database, the text shows in the correct form. It only looks like that in the database.
I have set the encoding in my HTML document to charset=UTF-8:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
and i set
mysql_query("SET NAMES UTF-8");
mysql_query("SET CHARACTER SET UTF-8");
when connecting to the database.
So I think I've done everything, but the text is still saved in that unknown format.
Could you help me?
Thanks in advance.
I believe you have to SET NAMES utf8, instead of UTF-8, in MySQL.
It looks like maybe your phpmyadmin isn't using the correct charset. In your phpmyadmin folder, open config.default.php and edit the lines
$cfg['DefaultCharset'] = 'iso-8859-1';
$cfg['DefaultLang'] = 'en-iso-8859-1';
To your chosen encoding.
It is suggested to use mysql_set_charset() instead of "SET NAMES" query, however the impact should be the same.
