How do I recover the actual utf8 code from data within MySQL? - php

I am storing an emoji as part of a string in a text field in MySQL:
<div><span id="emoji_1f600">&#x1f600</span></div>
The field in MySQL uses the utf8_general_ci collation. After the data is stored in the MySQL field, it looks like this:
<div><span id="emoji_1f600">😀</span></div>
I am assuming that is because of how the emoji is stored. Please educate me if I am wrong on this point, as I thought I would have seen the unicode of &#x1f600 instead of the strange characters.
I then fetch the data from the MySQL field into a php var and do a substring to get just the actual emoji between the span tags. The value in the php var now looks like this:
"C0E8Kb,"
My code makes an attempt to get the unicode back by doing the following:
$code = utf8_encode($code); // $code contains the string "C0E8KB,"
The result is "CB0CB8CBC"BB,"
I am obviously not handling the emoji utf8 code properly and welcome any and all help and instruction.
Thanks in advance.
I don't really need UTF8 all the way through, just on one field. That field in MySQL is typed as utf8.
Ok I made a major mistake in my problem description. It is true that my code is producing the following html
<div><span id="emoji_1f600">&#x1f600</span></div>
However, this html is within an editor from a 3rd party and the emoji code within my span tag is actually being rendered as an emoji. So when I save the data from the editor, what I get back from the editor is the following:
<div>test 2 <span id="emoji_1f600">😀</span></div>
I am assuming the strange chars between the span tags are the actual emoji, since it is being rendered. Is this OK as is, or should I replace them with the actual &#x1f600 code before storing the string in the database? My fear is that if I do that, the actual emoji will not get rendered when I place the string from the database into an HTML string to be rendered.

Your problem is assuming that MySQL's character set called utf8 is actually UTF-8. It isn't. MySQL's utf8 (an alias for utf8mb3) is a subset of UTF-8 that stores at most 3 bytes per character, so it cannot hold emojis, which need 4 bytes. To make MySQL raise an error instead of silently corrupting your data when it is given characters the column cannot store, enable the STRICT_TRANS_TABLES sql_mode. To make MySQL use real 4-byte UTF-8, set the column's character set to utf8mb4. In short, MySQL's utf8 is a misleadingly named subset, and the real UTF-8 is called utf8mb4 in MySQL. (This is also true for MariaDB, which inherited the naming from the MySQL source code it was forked from.)
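To see why the three-byte limit matters for this particular emoji, here is a minimal sketch. It assumes PHP 7.0 or later (for the "\u{...}" escape), and the helper name needsUtf8mb4 is made up for illustration. It checks whether a string contains any character above U+FFFF, which is the kind of character MySQL's utf8 cannot store:

```php
<?php
// Hypothetical helper: true if $s contains a code point above U+FFFF,
// i.e. a character that takes 4 bytes in UTF-8 and needs utf8mb4 in MySQL.
function needsUtf8mb4(string $s): bool
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $s) === 1;
}

$emoji = "\u{1F600}";   // the grinning face from the question
$plain = "caf\u{E9}";   // "café": é takes only 2 bytes

echo strlen($emoji), "\n";         // byte length of the emoji
var_dump(needsUtf8mb4($emoji));    // the emoji needs utf8mb4
var_dump(needsUtf8mb4($plain));    // plain accented text does not
```

A check like this could run before an INSERT to decide whether the target column has to be utf8mb4, though converting the column is the more robust fix.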

utf8_encode should not be used, as your DB is already UTF-8. It converts from ISO-8859-1 (often found with MySQL) to UTF-8 and may produce bad characters if your data is already UTF-8 encoded. Is the HTML page containing the data that you want to store declared as UTF-8? Something like this:
<head>
<meta charset="UTF-8">
</head>
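The double-encoding effect of utf8_encode mentioned above can be reproduced in a few lines. This is only a sketch: it uses mb_convert_encoding from the mbstring extension with ISO-8859-1 as the source, which performs the same conversion as utf8_encode (itself deprecated as of PHP 8.2), and it assumes PHP 7.0 or later for the "\u{...}" escape:

```php
<?php
// "é" as correct UTF-8 is the two bytes C3 A9.
$utf8 = "\u{E9}";

// Treat those bytes as ISO-8859-1 and convert them to UTF-8 again,
// which is what utf8_encode() does. Each byte becomes its own character.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";    // c3a9
echo bin2hex($double), "\n";  // c383c2a9
```

The second value is what a browser shows as two junk characters in place of "é", the same kind of garbage as in the question.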
I was bored so I tried the following code with no issue :
`<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title></title>
</head>
<body>
<div><span id="emoji_1f600">&#x1f600</span></div>
<?php
$mysqli=new mysqli("127.0.0.1", "root", "","utf8_general_mysql");
$num=1;
$text="&#x1f600";
$stmt = $mysqli->prepare("INSERT INTO testtable VALUES (?, ?)");
$stmt->bind_param('is', $num, $text);
$stmt->execute();
echo '<div><span id="emoji_1f600">&#x1f600</span></div>';
$stmt = $mysqli->prepare("SELECT * FROM testtable WHERE testtable.text='&#x1f600'");
$stmt->execute();
$result = $stmt->get_result();
while ($row = $result->fetch_array(MYSQLI_NUM))
{
foreach ($row as $r)
{
print "$r ";
}
print "\n";
}
?>
</body>
</html>`
Edit:
I really think it has to do with your Content-Type header. Try adding:
header('Content-type: text/html; charset=utf-8');
then try
header('Content-type: text/html; charset=iso-8859-1'); (this is how you seem to be set)
on the page that inserts the data into MySQL; the two headers produce two different rows.
I think the meta charset does not work because HTTP headers can be set elsewhere; these PHP lines should do the trick, hopefully.
To get these rows, I had to set the headers and replace the previous $text value with $text="😀" in my code sample.
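As for recovering the code point from the stored character, which is what the question title asks, the following is a rough sketch rather than a definitive method. It assumes PHP 7.2 or later with the mbstring extension (for mb_ord), and the helper name astralToEntities is invented for this example. Note that it emits the entity with a trailing semicolon, unlike the markup in the question:

```php
<?php
// Hypothetical helper: turn every character above U+FFFF into a
// hexadecimal HTML entity such as &#x1f600; and leave the rest alone.
function astralToEntities(string $html): string
{
    return preg_replace_callback(
        '/[\x{10000}-\x{10FFFF}]/u',
        function (array $m): string {
            // mb_ord() returns the Unicode code point of the character.
            return sprintf('&#x%x;', mb_ord($m[0], 'UTF-8'));
        },
        $html
    ) ?? $html; // fall back to the input if it was not valid UTF-8
}

$stored  = '<span id="emoji_1f600">' . "\u{1F600}" . '</span>';
$encoded = astralToEntities($stored);
echo $encoded, "\n";

// Going the other way: entity back to the raw UTF-8 character.
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');
var_dump($decoded === $stored);
```

Storing the entity form sidesteps the 3-byte limit of MySQL's utf8, and a browser renders both forms the same way, so either choice should display the emoji.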

Related

why does my html not display special characters taken from my database

I included this at the top of my php file:
<?php
header('Content-Type: text/html; charset=UTF-8');
?>
I did this because my file.php was not displaying "á, é, í, ó, ú or ¿" in the html file or from data queried from my database.
After I added the header('Content-Type: text/html; charset=UTF-8'); line, my HTML page started to display the special characters in the HTML file, but data received from my database now shows a black rhombus with a question mark.
The collation my database has is "utf8_spanish_ci"
On the html tag I tried lang=es, but this never worked. I also tried putting the meta tag inside the head tag:
<!DOCTYPE html>
<html lang=es>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<head>
I also tried:
<meta charset="utf-8">
and:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
I don't know what the problem is. When I insert data directly into the data base the special characters are there but when I insert them from my file.php they appear with random characters.
Does anyone know why this is happening?
There are a couple of reasons this could be happening. The important thing is that your entire pipeline uses the same charset, and that every function that can be set to a specific charset is set to the same one. The most widely used one is UTF-8, which is the one I suggest you use.
Connection
You also need to specify the charset in the connection itself.
PDO (specified in the object itself):
$handler = new PDO('mysql:host=localhost;dbname=database;charset=utf8', 'username', 'password', array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET CHARACTER SET UTF8"));
MySQLi: (placed directly after creating the connection)
* For OOP: $mysqli->set_charset("utf8");
* For procedural: mysqli_set_charset($mysqli, "utf8");
(where $mysqli is the MySQLi connection)
MySQL (deprecated and removed in PHP 7.0; you should convert to PDO or MySQLi): (placed directly after creating the connection)
mysql_set_charset("utf8");
Database
Your database and all its tables have to be set to UTF-8. Note that charset is not the same as collation.
You can do that by running the queries below once for each database and tables (for example in phpMyAdmin)
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
File-encoding
It's also important that the .php file itself is UTF-8 encoded. If you're using Notepad++ to write your code, this can be done in the "Format" drop-down on the menu bar (Convert to UTF-8 w/o BOM). You should use UTF-8 w/o BOM.
Should you follow all of the pointers above, chances are your problem will be solved. If not, you can take a look at this StackOverflow post: UTF-8 all the way through.
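The "without BOM" advice can also be checked from code. A BOM at the start of a .php file is sent as output before any header() call, which typically triggers a "headers already sent" warning. The sketch below uses two helper names, hasUtf8Bom and stripUtf8Bom, that are invented for this example:

```php
<?php
// Hypothetical helper: does the given file content start with the
// three-byte UTF-8 byte order mark (EF BB BF)?
function hasUtf8Bom(string $contents): bool
{
    return strncmp($contents, "\xEF\xBB\xBF", 3) === 0;
}

// Hypothetical helper: return the content with a leading BOM removed.
function stripUtf8Bom(string $contents): string
{
    return hasUtf8Bom($contents) ? substr($contents, 3) : $contents;
}

$withBom    = "\xEF\xBB\xBF<html>";
$withoutBom = "<html>";

var_dump(hasUtf8Bom($withBom));                    // BOM detected
var_dump(hasUtf8Bom($withoutBom));                 // no BOM
var_dump(stripUtf8Bom($withBom) === $withoutBom);  // stripping restores the content
```

In practice the input would come from file_get_contents() on the file being checked.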
Are you sure? And are you sure that you are retrieving your data from the database? Having said that, most databases require you to save data in a way that is NOT exactly like your question. There are really valid security reasons for this.
You should use utf8_general_ci as the database collation. Also, before the insert query you should run this query:
mysql_query("SET NAMES 'utf8'");

ISO-8859-1 Character truncates text inserting into utf-8 mysql column

So I have a weird truncate issue! Can't find a specific answer on this.
So basically there's an issue with an apparent ISO character ½ that truncates the rest of the text upon insertion into a column with UTF-8 specified.
Lets say that my string is: "You need to add ½ cup of water." MySQL will truncate that to "You need to add"
if I:
print iconv("ISO-8859-1", "UTF-8//IGNORE", $text);
Then it outputs:
½
O_o
OK that doesn't work because I need the 1/2 by itself. If I go to phpMyAdmin and copy and paste the sentence in and submit it, it works like a charm as the whole string is in there with half symbol and remaining text! Something is wrong and I'm puzzled at what it is. I know this will probably affect other characters so the underlying problem needs to be addressed.
The language I'm using is php, the file itself is encoded as UTF-8 and the data I'm bringing in has content-type set to ISO-8859-1. The column is utf8_general_ci and all the mysql character sets are set to UTF-8 in php: "SET character_set_result = 'utf8', etc..."
Something in your code isn't handling the string as UTF8. It could be your PHP/HTML, it could be in your connection to the DB, or it could be the DB itself - everything has to be set as UTF8 consistently, and if anything isn't, the string will get truncated exactly as you see when passing across a UTF8/non-UTF8 boundary.
I will assume your DB is UTF8 compliant - that is easiest to check. Note that the collation can be set at the server level, database level, the table level, and the column level within the table. Setting UTF8 collation on the column should override anything else for storage, but the others will still kick in when talking to the DB if they're not also UTF8. If you're not sure, explicitly set the connection to UTF8 after you open it:
$dbh->exec("SET NAMES 'utf8'");
Now your DB & connection are UTF8, make sure your web page is too. Again, this can be set in more than one place (.htaccess, php.ini). If you're not sure / don't have access, just override whatever PHP is picking up as default at the top of your page:
<?php ini_set('default_charset', 'UTF-8'); ?>
Note that you want the above right at the start, before any text is output from your page. Once text gets output, it is potentially too late to try and specify an encoding - you may already be locked into whatever is default on your server. I also then repeat this in my headers (possibly overkill):
<head>
<meta charset="UTF-8">
<meta http-equiv="Content-type" content="text/html; charset=UTF-8">
</head>
And I override it on forms where I'm taking data as well:
<FORM NAME="utf8-test" METHOD="POST" ACTION="utf8-test.php" enctype="multipart/form-data" accept-charset="UTF-8">
To be honest, if you've set the encoding at the top, my understanding is that the other overrides aren't required - but I keep them anyway, because it doesn't break anything either, and I'd rather just state the encoding explicitly, than let the server make assumptions.
Finally, you mentioned that in phpMyAdmin you inserted the string and it looked as expected - are you sure though that the phpMyAdmin pages are UTF8? I don't think they are. When I store UTF8 data from my PHP code, it views like raw 8-bit characters in phpMyAdmin. If I take the same string and store it directly in phpMyAdmin, it looks 'correct'. So I'm guessing phpMyAdmin is using the default character set of my local server, not necessarily UTF8.
For example, the following string stored from my web page:
I can’t wait
Reads like this in my phpMyAdmin:
I can’t wait
So be careful when testing that way, as you don't really know what encoding phpMyAdmin is using for display or DB connection.
If you're still having issues, try my code below. First I create a table to store the text in UTF8:
CREATE TABLE IF NOT EXISTS `utf8_test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`my_text` varchar(8000) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
And here's some PHP to test it. It basically takes your input on a form, echoes that input back at you, and stores/retrieves the text from the DB. Like I said, if you view the data directly in phpMyAdmin, you might find it doesn't look right there, but through the page below it should always appear as expected, due to the page & db connection both being locked to UTF8.
<?php
// Override whatever is set in php.ini
ini_set('default_charset', 'UTF-8');
// The following should not be required with the above override
//header('Content-Type:text/html; charset=UTF-8');
// Open the database
$dbh = new PDO('mysql:dbname=utf8db;host=127.0.0.1;charset=utf8', 'root', 'password');
// Set the connection to UTF8. MYSQL_ATTR_INIT_COMMAND only takes effect as a
// constructor option, so run the statement directly instead.
$dbh->exec("SET NAMES 'utf8'");
// Tell MySql to do the parameter replacement, not PDO
$dbh->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
// Throw exceptions (and break the code) if a query is bad
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$id = 0;
if (isset($_POST["StoreText"]))
{
$stmt = $dbh->prepare('INSERT INTO utf8_test (my_text) VALUES (:my_text)');
$stmt->execute(array(':my_text' => $_POST['my_text']));
$id = $dbh->lastInsertId();
}
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional/EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="UTF-8">
<meta http-equiv="Content-type" content="text/html; charset=UTF-8">
<title>UTF-8 Test</title>
</head>
<body>
<?php
// If something was posted, output it
if (isset($_POST['my_text']))
{
echo "POSTED<br>\n";
echo htmlspecialchars($_POST['my_text'], ENT_QUOTES, 'UTF-8') . "<br>\n";
}
// If something was written to the database, read it back, and output it
if ($id > 0)
{
$stmt = $dbh->prepare('SELECT my_text FROM utf8_test WHERE id = :id');
$stmt->execute(array(':id' => $id));
if ($result = $stmt->fetch())
{
echo "STORED<br>\n";
echo htmlspecialchars($result['my_text'], ENT_QUOTES, 'UTF-8') . "<br>\n";
}
}
// Create a form to take some user input
echo "<FORM NAME=\"utf8-test\" METHOD=\"POST\" ACTION=\"utf8-test.php\" enctype=\"multipart/form-data\" accept-charset=\"UTF-8\">";
echo "<br>";
echo "<textarea name=\"my_text\" rows=\"20\" cols=\"90\">";
// If something was posted, include it on the form
if (isset($_POST['my_text']))
{
echo htmlspecialchars($_POST['my_text'], ENT_QUOTES, 'UTF-8');
}
echo "</textarea>";
echo "<br>";
echo "<INPUT TYPE = \"Submit\" Name = \"StoreText\" VALUE=\"Store It\" />";
echo "</FORM>";
?>
<br>
</body>
</html>
Check into mb_convert_encoding if you can't change the way the data is handled. Otherwise, do yourself a favor and get your encodings consistent before it gets out of hand. UTF-8 uses multibyte characters, which aren't recognized in the ISO-8859-1 (Latin-1) encoding (see Wikipedia).
Finally, I've run into this when various combinations of htmlentities, htmlspecialchars and html_entity_decode are used.
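A minimal sketch of the mb_convert_encoding suggestion, applied to the "½" sentence from the question. It assumes the incoming bytes really are ISO-8859-1, as the question states, and the helper name toUtf8 is made up for illustration. Validity checking is only a heuristic, since some ISO-8859-1 byte sequences also happen to be valid UTF-8:

```php
<?php
// Hypothetical helper: return $text as UTF-8, converting only when the
// bytes are not already valid UTF-8. The source encoding is a guess the
// caller must supply; ISO-8859-1 is assumed by default here.
function toUtf8(string $text, string $assumedSource = 'ISO-8859-1'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid; converting again would double-encode
    }
    return mb_convert_encoding($text, 'UTF-8', $assumedSource);
}

// "½" is the single byte BD in ISO-8859-1, which is not valid UTF-8.
$latin1 = "You need to add \xBD cup of water.";

var_dump(mb_check_encoding($latin1, 'UTF-8'));   // is the input valid UTF-8?
$fixed = toUtf8($latin1);
var_dump($fixed === "You need to add \u{BD} cup of water.");
var_dump(toUtf8($fixed) === $fixed);             // a second pass changes nothing
```

The lone BD byte is the point at which MySQL stops when it expects UTF-8, which matches the truncation after "You need to add".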
Did you call set_charset() on your MySQLi database connection? It's required to properly use real_escape_string().
$db = new mysqli(...);
$db->set_charset('utf8');
Setting session variables in your connection is not enough -- those affect what happens on the server-side. The set_charset will affect what happens client side.
You can check out the PHP reference for mysqli::real_escape_string.

Can not insert french string in database mysql php

I have form with input text, when i add text
Un sac à main de femme recèlerait une quantité importante de bactéries
it adds in database only Un sac
i have tried with addslashes, mysql_real_escape_string, htmlspecialchars etc. also using UTF-8 encoding, but still it can not insert whole string
You should use utf8_unicode_ci as your column's collation in order for French strings to be stored in it.
In order to store non-US strings in the database, you must ensure that each of the following 3 steps is correctly implemented:
Your database table must be set to a charset compatible with French. To be future proof, I recommend creating tables with UTF-8. For more information see the MySQL documentation.
Your database connection must be set to a proper character set both when storing and when querying. To do this, use mysqli_set_charset() (or whatever your MySQL connector offers).
Your input form AND your view page must be served with the exact character set as your data. To do that, you will need to set the following header: header('Content-Type: text/html; charset=UTF-8'); (If you are using a different charset, change it accordingly.)
You can of course use a different character set for storage and representation but why would you want to do that?
Also, when working with databases and HTML, you should consider:
ALWAYS escape your data as it goes into the database. Use mysqli_real_escape_string() or whatever escape method your database connector offers. Also, do NOT set the connection charset by using SET NAMES UTF8, otherwise your connector library will not know what charset to use for escaping. For more information google "sql injection".
ALWAYS escape your data as it goes into HTML with htmlspecialchars(). Also pay attention to ALWAYS provide the correct character set. For more information google "xss".
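The output-escaping rule can be sketched as a small wrapper that always passes the character set explicitly. The function name e is an assumption made for this example, not part of PHP, and the "\u{...}" escape needs PHP 7.0 or later:

```php
<?php
// Hypothetical helper: escape a value for output inside HTML, stating the
// character set explicitly instead of relying on the ini default.
function e(string $value): string
{
    // ENT_SUBSTITUTE replaces invalid byte sequences with U+FFFD instead
    // of making htmlspecialchars() return an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$input = "<b>Un sac \u{E0} main</b> & \"co\"";
echo e($input), "\n";
```

Accented characters such as "à" pass through untouched; only the HTML-significant characters are replaced.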
After breaking my head for 2 days straight and reading all the possible answers here's what solved the problem and allows me to insert additional weird characters like em dash etc. and retrieve data without seeing weird characters.
Here's the complete step-by-step setup.
The collation of the db column needs to be: utf8_general_ci
The type is: varchar(250)
In the PHP header set the default client character set to UTF8
mysql_set_charset("UTF8", $link);
Set the character set result so we can show french characters
$sql = "SET character_set_results=utf8";
$result = mysql_query($sql);
In the html header specify, so you can view the french characters:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
When inserting the data do NOT use utf8_decode, just the below will work fine
$query = 'insert into tbl (col) VALUES ("'.mysql_real_escape_string($variable).'")';
Use normal queries to retrieve data, example query:
$query = "select * from table;";
Finally got this fixed, hope this is helpful to others.
In the php:
header ('Content-type: text/html; charset=utf-8');
After connection:
mysql_set_charset("utf8");
Just to follow up on this: I was using dbForge Studio and pasting in French text, and I had all the collations/encodings set properly. The one thing I didn't have set was the actual encoding for the connection to the db. I set it to UTF8 and all was well again. See point 2 in Janoszen's answer.
Had the same problem. The input text came from an ANSI file, so it wasn't quite UTF8, despite all my utf8 settings. utf8_encode(input_text) solved it.
I have tried htmlentities(); it saves the string as it is in the database.
You should try this to insert special characters in MySQL:
$con = mysql_connect($server,$uname,$pass);
$res = mysql_select_db($database,$con);
mysql_set_charset("latin1", $con);

Superscript character in PHP causing a MySQLi select query to find 0 rows

I am using PHP 5.3.3 and MySQL 5.1.61. The column in question is using UTF-8 encoding and the PHP file is encoded in UTF-8 without BOM.
When doing a MySQLi query with a ² character in SQLyog on Windows, the query executes properly and the correct search result displays.
If I do this same exact query in PHP, it will execute but will show 0 affected_rows.
Here's what I tried:
Using LIKE instead of =
Changing the encoding of the PHP file to ANSI, UTF-8 without BOM, and UTF-8
Doing 'SET NAMES utf-8' and 'latin1' before running the query
Did header('Content-Type: text/html; charset=UTF-8'); in PHP
Escaping using MySQLi::real_escape_string
Doing a filter_var($String, FILTER_SANITIZE_STRING)
Tried a MySQLi stmt bind
The only way I could get it to work properly is if I swapped the ² for a % and changed = to LIKE in PHP.
How can I get it query properly in PHP when using the ²?
You should be able to get the query to work by ensuring the following:
Prepping PHP for UTF-8
You first need to make sure the PHP pages that will be issuing these queries are served as UTF-8 encoded pages. This will ensure that any UTF-8 output coming from the database is displayed properly. In Firefox, you can check to see if this is the case by visiting the page you're interested in and using the View Page Info menu item. When you do so, you should see UTF-8 as the value for the page's Encoding. If the page isn't being served as UTF-8, you can do so one of two ways. Either you can set the encoding in a call to header(), like this:
header('Content-Type: text/html; charset=UTF-8');
Or, you can use a meta tag in your page's head block:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Prepping MySQL for UTF-8
Next up, you need to make sure the database is set up to use the UTF-8 encoding. This can be set at the server, database, table, or column levels. If you're on a shared host, you probably can only control the table and column levels of your hierarchy. If you have control of the server or database, you can check to see what character encoding they are using by issuing these two commands:
SHOW VARIABLES LIKE 'character_set_server';
SHOW VARIABLES LIKE 'character_set_database';
Changing the database level encoding can be done using a command like this:
(CREATE | ALTER) DATABASE ... DEFAULT CHARACTER SET utf8;
To see what character encoding a table uses, simply do:
SHOW CREATE TABLE myTable;
Similarly, here's how to change a table-level encoding:
(CREATE | ALTER) TABLE ... DEFAULT CHARACTER SET utf8;
I recommend setting the encoding as high as you possibly can in the hierarchy. This way, you don't have to remember to manually set it for new tables. Now, if your character encoding for a table is not already set to UTF-8, you can attempt to convert it using an alter statement like this:
ALTER TABLE ... CONVERT TO CHARACTER SET utf8;
Be very careful about using this statement! If you already have UTF-8 values in your tables, they may become corrupted when you attempt to convert. There are some ways to get around this, however.
Forcing MySQLi to Use UTF-8
Finally, right after you connect to your database, make sure you issue the appropriate call to say that you are using the UTF-8 encoding. Here's how:
$db = new mysqli(DB_HOST, DB_USERNAME, DB_PASSWORD, DB_NAME);
// Change the character set to UTF-8 (have to do it early)
if(! $db->set_charset("utf8"))
{
printf("Error loading character set utf8: %s\n", $db->error);
}
Once you do that, everything should hopefully work as expected. The only characters you need to worry about encoding are the big 5 for HTML: <, >, ', ", and &. You can handle that using the htmlspecialchars() function.
If you want to read more (and get links to additional resources), feel free to check out the articles I wrote about this process. There are two parts: Unicode and the Web: Part 1, and Unicode and the Web: Part 2. Good luck!
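Why the "²" query can match in one client and not in another becomes clearer when the bytes are compared directly. The snippet below is only an illustration of the encoding mismatch, not of the MySQL internals, and it assumes PHP 7.0 or later with the mbstring extension:

```php
<?php
// The same character, "²" (U+00B2), in two encodings.
$utf8   = "\u{B2}";                                          // bytes C2 B2
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8'); // byte  B2

echo bin2hex($utf8), "\n";
echo bin2hex($latin1), "\n";

// A byte-wise comparison sees two different strings...
var_dump($utf8 === $latin1);

// ...which is roughly what happens when the client sends UTF-8 bytes
// while the server has been told to expect latin1, or the reverse.
```

Calling set_charset("utf8") on the connection tells the server which of the two byte sequences to expect.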

Issue with Chinese characters, PHP and MySQL

I have a MySQL database with the following settings:
MySQL charset: UTF-8 Unicode (utf8)
MySQL connection collation: utf8_unicode_ci
I have a table with a column named "softtitle", this column is coded in utf8_general_ci. The entries of this column contain Chinese characters. If I run SQL through the PHPMyAdmin Control pane, the Chinese characters are shown correctly. But if I run SQL through a PHP file, all Chinese characters are shown wrongly. Here is the PHP file:
<?php
header("Content-Type: text/html; charset=utf8_general_ci");
mysql_connect("116.123.163.73","xxdd_f1","xxdd123"); // host, username, password
mysql_select_db("xxdd");
mysql_query("SET names 'utf8_general_ci'");
$q = mysql_query("SELECT softtitle
FROM dede_ext
LIMIT 0 , 30");
while($e = mysql_fetch_assoc($q))
$output[] = $e;
print(json_encode($output));
mysql_close();
?>
What is wrong here? What should I do to fix this problem? Thank you very much!
Your header is wrong. You're not supposed to set it to the collation of the table/database.
header("Content-Type: text/html; charset=UTF-8");
The same applies for "SET NAMES":
mysql_query("SET names 'utf8'");
And as a last thing, you are printing out json encoded data, your Content-type shouldn't be text/html but application/json.
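For the JSON part, a short sketch of how Chinese text survives json_encode. The helper name rowsToJson is invented for this example, and JSON_THROW_ON_ERROR assumes PHP 7.3 or later:

```php
<?php
// Hypothetical helper: encode rows as JSON, keeping non-ASCII characters
// readable and throwing on invalid UTF-8 instead of returning false.
function rowsToJson(array $rows): string
{
    return json_encode($rows, JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);
}

$rows = [['softtitle' => "\u{4E2D}\u{6587}"]]; // the two characters of "中文"

// In a web request this would be preceded by:
// header('Content-Type: application/json');
echo rowsToJson($rows), "\n";

// Without the flag the output is pure ASCII with \uXXXX escapes,
// which is equally valid JSON.
echo json_encode($rows), "\n";
```

If the connection charset is wrong, the rows may contain invalid UTF-8 and plain json_encode returns false, so a thrown exception makes that failure easier to spot.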
Because this Q&A still ranks highly in Google search results:
Setting aside for a moment the general advice to switch from using mysql statements in PHP to using mysqli, replace this:
mysql_connect("116.123.163.73","xxdd_f1","xxdd123");
mysql_select_db("xxdd");
with this:
$con = mysql_connect("116.123.163.73","xxdd_f1","xxdd123");
mysql_set_charset('utf8', $con);
mysql_select_db("xxdd");
Check that your database, or at least the column, really IS collated as utf8_general_ci, though you might get better results with utf8mb4_unicode_ci.
If your PHP is producing purely JSON output, for example a JSON 'object' that you're picking up with an AJAX call to pull data into another document, the header you should use in the PHP file is
header("Content-Type: application/json", true);
and not a header content-type of text/html or anything else.
And finally, assuming you're eventually presenting the values taken from the DB and placed into your JSON in an HTML document, remember to start that document with the following:
<!DOCTYPE html>
<html lang="zh-Hans">
<head>
<meta charset="utf-8">
...etc
Note the declaration of utf-8 in the meta of the head, and the declaration of the language (Chinese simplified in my code above = zh-Hans). If you are using a script which is written from right to left, e.g. Arabic scripts, add dir="rtl" to the tag as well.
