So I have a weird truncate issue! Can't find a specific answer on this.
So basically there's an issue with an apparent ISO character ½ that truncates the rest of the text upon insertion into a column with UTF-8 specified.
Let's say that my string is: "You need to add ½ cup of water." MySQL will truncate that to "You need to add"
If I run:
print iconv("ISO-8859-1", "UTF-8//IGNORE", $text);
Then it outputs:
½
O_o
OK that doesn't work because I need the 1/2 by itself. If I go to phpMyAdmin and copy and paste the sentence in and submit it, it works like a charm as the whole string is in there with half symbol and remaining text! Something is wrong and I'm puzzled at what it is. I know this will probably affect other characters so the underlying problem needs to be addressed.
The language I'm using is PHP; the file itself is encoded as UTF-8, and the data I'm bringing in has its content-type set to ISO-8859-1. The column is utf8_general_ci and all the MySQL character sets are set to UTF-8 in PHP: "SET character_set_result = 'utf8', etc..."
Something in your code isn't handling the string as UTF8. It could be your PHP/HTML, it could be in your connection to the DB, or it could be the DB itself - everything has to be set as UTF8 consistently, and if anything isn't, the string will get truncated exactly as you see when passing across a UTF8/non-UTF8 boundary.
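To make that boundary concrete, here is a minimal sketch of why the ½ breaks things. In ISO-8859-1 it is the single byte 0xBD, which is not a valid UTF-8 sequence on its own, so a UTF-8 column has nothing sensible to store from that point on. The helper name to_utf8() is made up for this example, and it assumes that anything which is not valid UTF-8 is ISO-8859-1, which may not hold for your data.

```php
<?php
// Hypothetical helper: return $text as UTF-8, converting only when needed.
// Assumption: any input that is not valid UTF-8 is ISO-8859-1.
function to_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        // Already valid UTF-8; converting again would double-encode it.
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "add \xBD cup";   // the half sign is the single byte 0xBD in ISO-8859-1
$utf8   = to_utf8($latin1); // 0xBD becomes the two bytes C2 BD

echo bin2hex($utf8), "\n";
```

Running the helper on text that is already UTF-8 leaves it alone, which is the point: converting unconditionally is what produces output like "Â½".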
I will assume your DB is UTF8 compliant - that is easiest to check. Note that the collation can be set at the server level, database level, the table level, and the column level within the table. Setting UTF8 collation on the column should override anything else for storage, but the others will still kick in when talking to the DB if they're not also UTF8. If you're not sure, explicitly set the connection to UTF8 after you open it:
$dbh->exec("SET NAMES 'utf8'");
Now your DB & connection are UTF8, make sure your web page is too. Again, this can be set in more than one place (.htaccess, php.ini). If you're not sure / don't have access, just override whatever PHP is picking up as default at the top of your page:
<?php ini_set('default_charset', 'UTF-8'); ?>
Note that you want the above right at the start, before any text is output from your page. Once text gets output, it is potentially too late to try and specify an encoding - you may already be locked into whatever is default on your server. I also then repeat this in my headers (possibly overkill):
<head>
<meta charset="UTF-8">
<meta http-equiv="Content-type" content="text/html; charset=UTF-8">
</head>
And I override it on forms where I'm taking data as well:
<FORM NAME="utf8-test" METHOD="POST" ACTION="utf8-test.php" enctype="multipart/form-data" accept-charset="UTF-8">
To be honest, if you've set the encoding at the top, my understanding is that the other overrides aren't required - but I keep them anyway, because it doesn't break anything either, and I'd rather just state the encoding explicitly, than let the server make assumptions.
Finally, you mentioned that in phpMyAdmin you inserted the string and it looked as expected - are you sure though that the phpMyAdmin pages are UTF8? I don't think they are. When I store UTF8 data from my PHP code, it views like raw 8-bit characters in phpMyAdmin. If I take the same string and store it directly in phpMyAdmin, it looks 'correct'. So I'm guessing phpMyAdmin is using the default character set of my local server, not necessarily UTF8.
For example, the following string stored from my web page:
I can’t wait
Reads like this in my phpMyAdmin:
I can’t wait
So be careful when testing that way, as you don't really know what encoding phpMyAdmin is using for display or DB connection.
If you're still having issues, try my code below. First I create a table to store the text in UTF8:
CREATE TABLE IF NOT EXISTS `utf8_test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`my_text` varchar(8000) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
And here's some PHP to test it. It basically takes your input on a form, echoes that input back at you, and stores/retrieves the text from the DB. Like I said, if you view the data directly in phpMyAdmin, you might find it doesn't look right there, but through the page below it should always appear as expected, due to the page & db connection both being locked to UTF8.
<?php
// Override whatever is set in php.ini
ini_set('default_charset', 'UTF-8');
// The following should not be required with the above override
//header('Content-Type:text/html; charset=UTF-8');
// Open the database
$dbh = new PDO('mysql:dbname=utf8db;host=127.0.0.1;charset=utf8', 'root', 'password');
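// Set the connection to UTF8 (redundant with charset=utf8 in the DSN on PHP >= 5.3.6, but harmless).
// PDO::MYSQL_ATTR_INIT_COMMAND only takes effect when passed to the constructor, so run the query directly.
$dbh->exec("SET NAMES 'utf8'");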
// Tell MySql to do the parameter replacement, not PDO
$dbh->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
// Throw exceptions (and break the code) if a query is bad
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$id = 0;
if (isset($_POST["StoreText"]))
{
$stmt = $dbh->prepare('INSERT INTO utf8_test (my_text) VALUES (:my_text)');
$stmt->execute(array(':my_text' => $_POST['my_text']));
$id = $dbh->lastInsertId();
}
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="UTF-8">
<meta http-equiv="Content-type" content="text/html; charset=UTF-8">
<title>UTF-8 Test</title>
</head>
<body>
<?php
// If something was posted, output it
if (isset($_POST['my_text']))
{
echo "POSTED<br>\n";
echo htmlspecialchars($_POST['my_text'], ENT_QUOTES, 'UTF-8') . "<br>\n";
}
// If something was written to the database, read it back, and output it
if ($id > 0)
{
$stmt = $dbh->prepare('SELECT my_text FROM utf8_test WHERE id = :id');
$stmt->execute(array(':id' => $id));
if ($result = $stmt->fetch())
{
echo "STORED<br>\n";
echo htmlspecialchars($result['my_text'], ENT_QUOTES, 'UTF-8') . "<br>\n";
}
}
// Create a form to take some user input
echo "<FORM NAME=\"utf8-test\" METHOD=\"POST\" ACTION=\"utf8-test.php\" enctype=\"multipart/form-data\" accept-charset=\"UTF-8\">";
echo "<br>";
echo "<textarea name=\"my_text\" rows=\"20\" cols=\"90\">";
// If something was posted, include it on the form
if (isset($_POST['my_text']))
{
echo htmlspecialchars($_POST['my_text'], ENT_QUOTES, 'UTF-8');
}
echo "</textarea>";
echo "<br>";
echo "<INPUT TYPE = \"Submit\" Name = \"StoreText\" VALUE=\"Store It\" />";
echo "</FORM>";
?>
<br>
</body>
</html>
Look into mb_convert_encoding if you can't change the way the data is handled. Otherwise, do yourself a favor and get all your encodings consistent before it gets out of hand. UTF-8 uses multibyte characters, which aren't recognized in the ISO-8859-1 (Latin-1) encoding (see Wikipedia).
Finally, I've run into this when various combinations of htmlentities, htmlspecialchars and html_entity_decode are used.
Did you call set_charset() on your MySQLi database connection? It's required to properly use real_escape_string().
$db = new mysqli(...);
$db->set_charset('utf8');
Setting session variables on your connection is not enough: those only affect what happens on the server side. set_charset() also affects what happens on the client side.
You can check out the PHP reference for mysqli::real_escape_string.
I am storing an emoji as part of a string in a text field in MySQL:
<div><span id="emoji_1f600">&#x1f600;</span></div>
The field in MySQL has utf8_general_ci set. When the data is stored in the MySQL field, it looks like this:
<div><span id="emoji_1f600">😀</span></div>
I am assuming that is because of how the emoji is stored. Please educate me if I am wrong on this point, as I thought I would have seen the Unicode code &#x1f600; instead of the strange characters.
I then fetch the data from the MySQL field into a php var and do a substring to get just the actual emoji between the span tags. The value in the php var now looks like this:
"C0E8Kb,"
My code makes an attempt to get the unicode back by doing the following:
$code = utf8_encode($code); // $code contains the string "C0E8KB,"
The result is "CB0CB8CBC"BB,"
I am obviously not handling the emoji utf8 code properly and welcome any and all help and instruction.
Thanks in advance.
I don't really need UTF-8 all the way through, just on one field, and that field is typed as utf8 in MySQL.
Ok I made a major mistake in my problem description. It is true that my code is producing the following html
<div><span id="emoji_1f600">&#x1f600;</span></div>
However, this html is within an editor from a 3rd party and the emoji code within my span tag is actually being rendered as an emoji. So when I save the data from the editor, what I get back from the editor is the following:
<div>test 2 <span id="emoji_1f600">😀</span></div>
I am assuming the strange chars between the span tags are the actual emoji, since it is being rendered. Is this OK as is, or should I replace it with the actual &#x1f600; code before storing it in the database? My fear is that if I do that, the emoji will not get rendered when I place the string from the database into an HTML string to be rendered.
Your problem is assuming that MySQL's character set called utf8 is actually UTF-8. It isn't. MySQL's utf8 is a 3-byte subset of UTF-8 that does not cover emoji. To make MySQL raise an error instead of silently corrupting your data when it is given characters the column cannot store, enable the STRICT_TRANS_TABLES sql_mode. To make MySQL use real 4-byte UTF-8, set the column's character set to utf8mb4. In short, MySQL's utf8 is a misleadingly named subset of UTF-8, and real UTF-8 is called utf8mb4 in MySQL. (This is also true for MariaDB, which inherited the naming from the MySQL source code it was forked from.)
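If you want to see the 3-byte limit for yourself, here is a small sketch. It checks whether a string contains a code point above U+FFFF, which is exactly the range MySQL's utf8 cannot store. The function name needs_utf8mb4() is made up for this example; nothing here talks to a database.

```php
<?php
// Hypothetical helper: does $text contain a code point outside the Basic
// Multilingual Plane, i.e. one that takes 4 bytes in UTF-8?
function needs_utf8mb4(string $text): bool
{
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

$emoji  = "\u{1F600}";        // the grinning face from the question
$german = "Gem\u{E4}\u{DF}";  // "Gemäß", 2 bytes per accented character

var_dump(strlen($emoji));          // byte length of the emoji in UTF-8
var_dump(needs_utf8mb4($emoji));
var_dump(needs_utf8mb4($german));
```

If this returns true for your data, the column and the connection both have to be utf8mb4 (for mysqli that would be $mysqli->set_charset('utf8mb4')), otherwise the text is cut or rejected at the first such character.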
utf8_encode should not be used, as your DB is already UTF-8; it converts from ISO-8859-1 (often found with MySQL) to UTF-8, and it may produce bad characters if your data is already UTF-8 encoded. Is the HTML page containing the data that you want to store declared as UTF-8? Something like this:
<head>
<meta charset="UTF-8">
</head>
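To illustrate the utf8_encode point, here is a minimal sketch of what happens when text that is already UTF-8 gets encoded a second time. It uses mb_convert_encoding from ISO-8859-1 to UTF-8, which does the same conversion as utf8_encode, and works on raw bytes so the result does not depend on how the script file itself is saved.

```php
<?php
// "é" already encoded as UTF-8 is the two bytes C3 A9.
$utf8 = "\xC3\xA9";

// Treating those bytes as ISO-8859-1 and encoding them again (the same
// conversion utf8_encode() performs) turns each byte into its own
// UTF-8 sequence.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // bytes before
echo bin2hex($double), "\n"; // bytes after the second encoding
```

The four bytes that come out are what a browser shows as "Ã©", the familiar symptom of double encoding.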
I was bored so I tried the following code with no issue :
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title></title>
</head>
<body>
<div><span id="emoji_1f600">😀</span></div>
<?php
$mysqli=new mysqli("127.0.0.1", "root", "","utf8_general_mysql");
$num=1;
$text="😀";
$stmt = $mysqli->prepare("INSERT INTO testtable VALUES (?, ?)");
$stmt->bind_param('is', $num, $text);
$stmt->execute();
echo '<div><span id="emoji_1f600">😀</span></div>';
$stmt = $mysqli->prepare("SELECT * FROM testtable WHERE testtable.text='😀'");
$stmt->execute();
$result = $stmt->get_result();
while ($row = $result->fetch_array(MYSQLI_NUM))
{
foreach ($row as $r)
{
print "$r ";
}
print "\n";
}
?>
</body>
</html>
Edit:
I really think it has to do with your Content-Type header. On the page that inserts the data into MySQL, try adding:
header('Content-type: text/html; charset=utf-8');
then try (this is how you seem to be set up):
header('Content-type: text/html; charset=iso-8859-1');
and compare the two rows that end up in the table.
I think the meta charset does not work because HTTP headers can be set elsewhere and take precedence; these PHP lines should do the trick, hopefully.
To get these rows, I had to set the headers and replace the previous $text value with $text="😀" in my code sample.
I have a problem with German characters on my website.
In the HTML/PHP part of the website I have this code to set UTF-8:
<meta charset="utf-8">
In MySQL, I have this code to set UTF-8:
SET CHARSET 'utf8';
Here is a word in German: Gemäß
Here is how that word looks in the MySQL table:
Gemäß
Here is how that word is shown on the site: Gemäß
What is the problem? Thanks.
I was using this code to get title:
$title = mysql_real_escape_string(htmlentities($_POST['title']));
I just changed that to
$title = $_POST['title'];
First, make sure that you actually have UTF-8 characters in your database.
After that, try using SET NAMES 'UTF8' after connecting to MySQL:
$con=mysqli_connect("host", "user", "pw", "db");
if (!$con)
{
die('Failed to connect to mySQL: ' .mysqli_connect_errno());
}
mysqli_query($con, "SET NAMES 'UTF8'") or die("ERROR: ". mysqli_error($con));
As the manual says:
SET NAMES indicates what character set the client will use to send SQL
statements to the server... It also specifies the character set that the server should
use for sending results back to the client.
Use SET NAMES 'utf8'. MySQL's name for the character set has no hyphen, so SET NAMES 'utf-8' produces an error. This works fine for Portuguese, and should for German too.
You should make sure that the connection is also UTF-8.
With mysqli this is done with something like this:
$connection = mysqli_connect($host, $user, $pass, $db_name);
$connection->set_charset("utf8");
If you have somehow ended up with wrong characters in the database, there is a way to put them right:
1. In a PHP script, retrieve the data as you do now, i.e. without setting the connection charset. The mistake is thereby reversed, and your PHP script ends up holding the characters in correct UTF-8.
2. In a PHP script, write the data back with the connection set to UTF-8.
3. At this point you should see the characters correctly in your database.
4. Now change all the read/write functions of your site to use the UTF-8 connection from now on.
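The repair steps above rely on the connection undoing the wrong conversion for you. The same reversal can be sketched in plain PHP, which makes it easier to see what is going on. This is only an illustration: the function name undo_double_encoding() is made up, and it assumes the wrong step treated UTF-8 bytes as Windows-1252 (roughly what MySQL calls latin1). Try it on a copy of your data first, since some bytes have no Windows-1252 equivalent.

```php
<?php
// Hypothetical repair helper for text that was encoded to UTF-8 twice.
// Assumption: the faulty step read UTF-8 bytes as Windows-1252.
function undo_double_encoding(string $text): string
{
    return mb_convert_encoding($text, 'Windows-1252', 'UTF-8');
}

$good   = "Gem\xC3\xA4\xC3\x9F";                                // "Gemäß" as UTF-8 bytes
$broken = mb_convert_encoding($good, 'UTF-8', 'Windows-1252');  // simulate the faulty write
$fixed  = undo_double_encoding($broken);

var_dump($broken === $good); // the stored text differs from the original
var_dump($fixed === $good);  // the reversal restores it
```

Only run a repair like this on rows that are actually double-encoded; applying it to text that is already correct damages it.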
In HTML5 use
<meta charset="utf-8">
In HTML 4.01 use
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
The results are HTML-entity encoded, as if they had been processed by htmlentities(). I wonder whether your variables are inserted as received from the form, or are being processed by, say, a WYSIWYG editor?
Anyway, these should print fine in an HTML template, but html_entity_decode() should do it too.
Hope this helps
Set the data type in your database to use UTF-8 as well, this should solve the problem.
I had the same problem, which I solved as follows.
If you have already created your table, you need to modify its character set:
ALTER TABLE <table name> CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
In MySQL versions before 8.0, your table's character set is latin1 (collation latin1_swedish_ci) by default.
You might also face problems when retrieving the data and displaying it on your page. For that, call mysql_set_charset('utf8') just below the line where you connect to your database.
eg:
mysql_connect('localhost','root','');
mysql_select_db('my db');
mysql_set_charset('utf8');
You will need to do this for php 5.x
$yourNiceLookingString =
htmlspecialchars ($YourStringFromDB, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');
and for php 4.x
$yourNiceLookingString = htmlspecialchars($YourStringFromDB);
I have a form with a text input. When I add the text
Un sac à main de femme recèlerait une quantité importante de bactéries
only "Un sac" is added to the database.
I have tried addslashes, mysql_real_escape_string, htmlspecialchars etc., and also UTF-8 encoding, but it still cannot insert the whole string.
You should use utf8_unicode_ci as your column's collation in order for French strings to be stored in it.
In order to store non-US strings in the database, you must ensure that each of the following 3 steps is correctly implemented:
Your database table must be set to a charset compatible with French. To be future-proof, I recommend creating tables with UTF-8. For more information see the MySQL documentation.
Your database connection must be set to a proper character set both when storing and when querying. To do this, use mysqli_set_charset() (or whatever your MySQL connector offers).
Your input form AND your view page must be served with the exact character set as your data. To do that, you will need to set the following header: header('Content-Type: text/html; charset=UTF-8'); (If you are using a different charset, change it accordingly.)
You can of course use a different character set for storage and representation but why would you want to do that?
Also, when working with databases and HTML, you should consider:
ALWAYS escape your data as it goes into the database. Use mysqli_real_escape_string() or whatever escape method your database connector offers. Also, do NOT set the connection charset by using SET NAMES UTF8, otherwise your connector library will not know what charset to use for escaping. For more information google "sql injection".
ALWAYS escape your data as it goes into HTML with htmlspecialchars(). Also pay attention to ALWAYS provide the correct character set. For more information google "xss".
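Here is a small sketch of why the character set passed to htmlspecialchars() matters. When you tell it the input is UTF-8 and the bytes are really ISO-8859-1, it returns an empty string unless you ask for substitution, which looks a lot like the "my text disappeared" problem. The sample strings are made up for the example.

```php
<?php
$latin1 = "caf\xE9"; // "café" in ISO-8859-1; the lone 0xE9 is not valid UTF-8

// Declared as UTF-8 without a substitute flag: invalid input gives ''.
$strict = htmlspecialchars($latin1, ENT_QUOTES, 'UTF-8');

// ENT_SUBSTITUTE replaces the invalid byte with U+FFFD instead.
$lenient = htmlspecialchars($latin1, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// Valid UTF-8 is escaped as expected and the accent is left alone.
$ok = htmlspecialchars("<b>caf\xC3\xA9</b>", ENT_QUOTES, 'UTF-8');

var_dump($strict, $lenient, $ok);
```

An empty result from the strict call is a useful warning sign that the data is not in the encoding you think it is.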
After breaking my head for 2 days straight and reading all the possible answers here's what solved the problem and allows me to insert additional weird characters like em dash etc. and retrieve data without seeing weird characters.
Here's the complete step-by-step setup.
The collation of the DB column needs to be: utf8_general_ci
The type is: varchar(250)
In the PHP header set the default client character set to UTF8
mysql_set_charset("UTF8", $link);
Set the character set result so we can show french characters
$sql = "SET character_set_results=utf8";
$result = mysql_query($sql);
In the html header specify, so you can view the french characters:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
When inserting the data do NOT use utf8_decode, just the below will work fine
$query = 'insert into tbl (col) VALUES ("'.mysql_real_escape_string($variable).'")';
Use normal queries to retrieve data, example query:
$query = "select * from table;";
Finally got this fixed, hope this is helpful to others.
In the php:
header ('Content-type: text/html; charset=utf-8');
After connection:
mysql_set_charset("utf8");
Just to follow up on this: I was using dbForge Studio and just pasting in French text, and I had all the collations/encodings set properly. The one thing I didn't have set was the actual encoding for the connection to the DB. I set it to UTF8 and all was well again. This is step 2 in Janoszen's answer.
Had the same problem. The input text came from an ANSI file, so it wasn't quite UTF8, despite all my utf8 settings. utf8_encode(input_text) solved it.
I have tried htmlentities(); it saves the string as it is in the database.
You should try this to insert special characters into MySQL:
$con = mysql_connect($server,$uname,$pass);
$res = mysql_select_db($database,$con);
mysql_set_charset("latin1", $con);
I'm trying to save French accents in my database, but they aren't saved like they should be in the DB. For example, a "é" is saved as "é". I've tried to set my files to "Unicode (utf-8)", and the fields in the DB are "utf8_general_ci", as is the DB itself. When I look at my data posted through AJAX with Firebug, I see the accent passed as "é", so it's correct. Thanks, and let me know if you need more info!
Personally I solved the same issue by adding after the MySQL connection code:
mysql_set_charset("utf8");
or for mysqli:
mysqli_set_charset($conn, "utf8");
or the mysqli OOP equivalent:
$conn->set_charset("utf8");
And sometimes you'll have to define the main php charset by adding this code:
mb_internal_encoding('UTF-8');
On the client HTML side you have to add the following header data :
<meta http-equiv="Content-type" content="text/html;charset=utf-8" />
In order to use JSON AJAX results (e.g. with jQuery), you should define the header by adding:
header("Content-type: application/json;charset=utf-8");
echo json_encode($some_data);
This should do the trick
The best bet is that your database connection is not UTF-8 encoded - it is usually ISO-8859-1 by default.
Try sending a query
SET NAMES utf8;
after making the connection.
mysqli_set_charset($conn, "utf8");
If you use PDO, instantiate it like this:
new \PDO("mysql:host=$host;dbname=$schema", $username, $password, array(\PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8') );
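As an alternative to the init command, PDO's MySQL driver also accepts a charset parameter in the DSN itself (honoured since PHP 5.3.6). Here is a minimal sketch; the helper name build_mysql_dsn() and the host/database values are made up for the example, and the actual connection is left commented out because it needs a reachable server.

```php
<?php
// Hypothetical helper that builds a MySQL DSN with an explicit charset.
function build_mysql_dsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

$dsn = build_mysql_dsn('127.0.0.1', 'mydb');
echo $dsn, "\n";

// Usage, assuming $username and $password are defined and a server is running:
// $pdo = new PDO($dsn, $username, $password, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
```

Putting the charset in the DSN means the driver knows the connection encoding from the start, rather than learning about it from a query sent afterwards.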
Use UTF8:
Set a meta tag in your <head>:
<meta http-equiv="Content-type" content="text/html;charset=utf-8" />
When you connect to your MySQL DB, force the encoding so you DON'T have to play with your MySQL settings:
$conn = mysql_connect('server', 'user', 'password') or die('Could not connect to mysql server.');
mysql_select_db('mydb') or die('Could not select database.');
mysql_set_charset('utf8',$conn); //THIS IS THE IMPORTANT PART
If you use AJAX, set your encoding like this:
header('Content-type: text/html; charset=utf-8');
Have you reviewed http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html?
Client applications that need to communicate with the server using Unicode should set the client character set accordingly; for example, by issuing a SET NAMES 'utf8' statement. ucs2 cannot be used as a client character set, which means that it does not work for SET NAMES or SET CHARACTER SET. (See Section 9.1.4, “Connection Character Sets and Collations”.)
Further to that:
If you get data via PHP from your MySQL DB (everything UTF-8) but still get '?' for some special characters in your browser, try this: after mysql_connect() and mysql_select_db(), add this line:
mysql_query("SET NAMES utf8");
Worked for me. I first tried utf8_encode, but this only worked for äüöéè... and so on, not for Cyrillic and other characters.
You need to a) make sure your tables are using a character encoding that can encode such characters (UTF-8 tends to be the go-to encoding these days) and b) make sure that your form submissions are being sent to the database in the same character encoding. You do this by saving your HTML/PHP/whatever files as UTF-8, and by including a meta tag in the head that tells the browser to use UTF-8 encoding.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Oh, and don't forget c): when connecting to the database, make sure you're actually using the correct character set by executing SET NAMES utf8.
PHP(.net) advises against setting charsets after connecting using a query like SET NAMES utf8 because your functionality for escaping data inside MySQL statements might not work as intended.
Do not use SET NAMES utf8 but use the appropriate ..._set_charset() function (or method) instead, in case you are using PHP.
Ok I have found a working solution for me :
Run this mysql command
show variables like 'char%';
Here you have many variables : "character_set_server", "character_set_system" etc.
In my case I have "é" for "é" in the database and I want to show "é" on my website.
To make it work I had to change the "character_set_server" value from "utf8mb4" to "latin1".
All my correct values are :
And other values are :
With these values the wrongly stored accents are corrected and displayed properly by the server.
But each case can be different.
I have a MySQL database with the following settings:
MySQL charset: UTF-8 Unicode (utf8)
MySQL connection collation: utf8_unicode_ci
I have a table with a column named "softtitle"; this column uses the utf8_general_ci collation. The entries in this column contain Chinese characters. If I run the SQL through the phpMyAdmin control panel, the Chinese characters are shown correctly. But if I run the SQL through a PHP file, all the Chinese characters are shown wrongly. Here is the PHP file:
<?php
header("Content-Type: text/html; charset=utf8_general_ci");
mysql_connect("116.123.163.73","xxdd_f1","xxdd123"); // host, username, password
mysql_select_db("xxdd");
mysql_query("SET names 'utf8_general_ci'");
$q = mysql_query("SELECT softtitle
FROM dede_ext
LIMIT 0 , 30");
while($e = mysql_fetch_assoc($q))
$output[] = $e;
print(json_encode($output));
mysql_close();
?>
What is wrong here? What should I do to fix this problem? Thank you very much!
Your header is wrong. It takes the name of an encoding, not the collation of the table/database:
header("Content-Type: text/html; charset=UTF-8");
The same applies for "SET NAMES":
mysql_query("SET names 'utf8'");
And as a last thing, you are printing out json encoded data, your Content-type shouldn't be text/html but application/json.
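Since the script prints JSON, it also helps to know how json_encode() treats non-ASCII text. The sketch below uses a made-up sample title and raw bytes so it does not depend on the encoding of the script file. It shows the default escaping, the JSON_UNESCAPED_UNICODE flag, and the failure you get when the bytes coming back from the connection are not valid UTF-8.

```php
<?php
$title = "\xE4\xB8\xAD"; // the character U+4E2D as UTF-8

// Default: non-ASCII characters are escaped as \uXXXX, which is still valid JSON.
$escaped = json_encode(['softtitle' => $title]);

// JSON_UNESCAPED_UNICODE keeps the UTF-8 bytes as they are.
$raw = json_encode(['softtitle' => $title], JSON_UNESCAPED_UNICODE);

// The same character in GBK is the bytes D6 D0, which is not valid UTF-8,
// so json_encode() fails and returns false.
$gbk    = "\xD6\xD0";
$failed = json_encode(['softtitle' => $gbk]);
$isUtf8Error = (json_last_error() === JSON_ERROR_UTF8);

echo $escaped, "\n", $raw, "\n";
var_dump($failed, $isUtf8Error);
```

So if the page prints false or nothing at all for rows with Chinese text, the connection charset is the first thing to check.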
Because this Q&A still ranks highly in Google search results:
Setting aside for a moment the general advice to switch from the mysql functions in PHP to mysqli, replace this:
mysql_connect("116.123.163.73","xxdd_f1","xxdd123");
mysql_select_db("xxdd");
with this:
$con = mysql_connect("116.123.163.73","xxdd_f1","xxdd123");
mysql_set_charset('utf8', $con);
mysql_select_db("xxdd");
check that your database, or at least the column, really IS collated as utf8_general_ci - though you might get better results with utf8mb4_unicode_ci
if your php is producing a purely JSON output, for example, a JSON 'object' that you're picking up with an AJAX call to pull data into another document, the header you should use in the PHP file is
header("Content-Type: application/json", true);
and not a header content-type of text/html or anything else.
And finally, assuming you're eventually presenting the values taken from the DB and placed into your JSON in an HTML document, remember to start that document with the following:
<!DOCTYPE html>
<html lang="zh-Hans">
<head>
<meta charset="utf-8">
...etc
Note the declaration of utf-8 in the meta tag of the head, and the declaration of the language (Simplified Chinese in my code above = zh-Hans). If you are using a script which is written from right to left, e.g. Arabic scripts, add dir="rtl" to the <html> tag as well.