So I was having an issue with json_encode returning null. I found the solution here, but I don't understand why it was an issue in the first place. The MySQL tables I was drawing the data from are defined like this:
CREATE TABLE `super_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(100) DEFAULT NULL,
`values` text,
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
So shouldn't values and name already be UTF-8 encoded when I pull them out in PHP? A simplified version of what I'm doing:
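$sql = "SELECT * FROM super_table";
$query = $mysqli->query($sql);
$success = ($query !== false);
$data = $success ? $query->fetch_all(MYSQLI_ASSOC) : array();
// utf8_encode() takes a string, so apply it to every column of every row
foreach ($data as $i => $row) {
    foreach ($row as $col => $val) {
        $data[$i][$col] = utf8_encode($val);
    }
}
$result = array('success'=>$success, 'data'=>$data);
echo json_encode($result);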
Why does skipping the extra utf8_encode step sometimes make json_encode return null?
MySQL's utf8 is not quite the universally understood UTF-8, although that is not your issue here: MySQL's utf8 (an alias for utf8mb3) is a subset of real UTF-8 that covers only the BMP and does not support 4-byte characters (utf8mb4 does). This just means that characters outside the BMP cannot be stored (depending on the SQL mode, the insert either fails or the string is truncated); otherwise it is still UTF-8 compatible.
Your actual issue is that the table definition only tells MySQL how to store the data; it says nothing about how your database client receives it. MySQL converts text on the fly between the stored encoding and the connection encoding (in both directions). When connecting to the database in your PHP code, you can choose which encoding you want to receive your data in: use $mysqli->set_charset('utf8') to retrieve it as UTF-8.
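To see why the missing conversion produced an empty result, here is a small sketch that needs no database. It assumes the connection fell back to latin1, so MySQL sent a stored "café" with "é" as the single byte 0xE9. The helper `latin1_to_utf8()` is hypothetical and does by hand what `utf8_encode()` does (that function is deprecated as of PHP 8.2). On PHP 5.5 and later, `json_encode()` returns false rather than null for invalid UTF-8.

```php
<?php
// Hypothetical helper: re-encode ISO-8859-1 bytes as UTF-8, which is what
// utf8_encode() does. Bytes below 0x80 are ASCII and are copied as-is;
// every other byte becomes a two-byte UTF-8 sequence.
function latin1_to_utf8(string $s): string
{
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $b = ord($s[$i]);
        $out .= $b < 0x80 ? chr($b) : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

// Assumption: a latin1 connection delivers "café" as these bytes.
$fromLatin1Connection = "caf\xE9";

var_dump(json_encode(array('name' => $fromLatin1Connection))); // bool(false)
var_dump(json_last_error() === JSON_ERROR_UTF8);               // bool(true)

// After re-encoding, the same value encodes fine.
echo json_encode(array('name' => latin1_to_utf8($fromLatin1Connection))), "\n";
// {"name":"caf\u00e9"}
```

With `$mysqli->set_charset('utf8')` the server sends UTF-8 in the first place, so the manual re-encoding is no longer needed.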
Related
I have the following serialized array stored in a MySQL longblob data field:
a:1:{s:10:"attributes";a:1:{s:13:"Ticket Holder";a:1:{i:0;s:8:"Joe Blow";}}}
In PHP, when I query the field, unserialize it, and print it out, the following empty array is printed:
Array
(
)
This is the table create statement:
CREATE TABLE `order` (
`state` varchar(255) CHARACTER SET ascii DEFAULT NULL,
`data` longblob,
`created` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Your serialized data is invalid. You must never manipulate serialized data manually.
Strings are serialized as:
s:<i>:"<s>";
where <i> is an integer representing the string length of <s>, and <s> is the string value.
So in this case, the valid data is:
a:1:{s:10:"attributes";a:1:{s:13:"Ticket Holder";a:1:{i:0;s:8:"Joe Blow";}}}
The string "Joe Blow" has a length of 8, but your serialized string declares it as 13.
See: Structure of a Serialized PHP string
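The length prefix counts bytes, not characters, which is why any change to the stored bytes breaks `unserialize()`. A minimal sketch: it serializes the question's array (reproducing the string as it appears in the question), shows that a multibyte value gets a byte count, and corrupts one length prefix on purpose. The value "Zoë Blow" is only an illustrative assumption.

```php
<?php
// The question's array; serialize() reproduces the string shown in the question.
$data = array('attributes' => array('Ticket Holder' => array('Joe Blow')));
echo serialize($data), "\n";
// a:1:{s:10:"attributes";a:1:{s:13:"Ticket Holder";a:1:{i:0;s:8:"Joe Blow";}}}

// <i> is a byte count (strlen), not a character count:
// "ë" is two bytes in UTF-8, so the 8-character "Zoë Blow" is stored as s:9.
echo serialize("Zo\xC3\xAB Blow"), "\n"; // s:9:"Zoë Blow";

// Declaring a wrong length makes unserialize() fail and return false
// (@ hides the notice/warning PHP emits for malformed input).
$bad = str_replace('s:8:"Joe Blow"', 's:13:"Joe Blow"', serialize($data));
var_dump(@unserialize($bad)); // bool(false)
```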
Use base64_encode
The core problem comes from the encoding applied by the database, which can change the bytes of the stored value. This "corrupts" the data fetched from the database and causes unserialize to fail. To mitigate this, you can use base64_encode and base64_decode, which let you insert and extract the data painlessly.
// data is encoded before insert
$dataToInsert = base64_encode(serialize($myArray));
// decode the data before unserializing it
$dataToRead = unserialize(base64_decode($longblob));
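A quick way to see why the base64 layer helps: its output is plain ASCII, so a character-set conversion that rewrites non-ASCII bytes leaves it alone. In this sketch the conversion is simulated by a hypothetical `simulateConversion()` that turns the UTF-8 "ë" (C3 AB) into the single latin1 byte EB; that simulation and the sample array are assumptions for illustration, not a description of what MySQL does to a given column.

```php
<?php
$myArray = array('attributes' => array('Ticket Holder' => array("Zo\xC3\xAB Blow")));

// Hypothetical stand-in for a UTF-8 -> latin1 conversion of "ë".
function simulateConversion(string $bytes): string
{
    return str_replace("\xC3\xAB", "\xEB", $bytes);
}

// Without base64: the stored string still says s:9 but now holds 8 bytes.
$raw = simulateConversion(serialize($myArray));
var_dump(@unserialize($raw)); // bool(false)

// With base64: only A-Z, a-z, 0-9, '+', '/' and '=' are stored, so nothing changes.
$dataToInsert = base64_encode(serialize($myArray));
$longblob = simulateConversion($dataToInsert);
$dataToRead = unserialize(base64_decode($longblob));
var_dump($dataToRead === $myArray); // bool(true)
```

The cost is roughly a third more storage, and the column can no longer be searched meaningfully in SQL.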
Converting the column to UTF8 in the SELECT statement and then unserializing it.
CONVERT (data USING utf8)
I'm not sure why this is necessary. I'm guessing it has something to do with the utf8mb4_general_ci collation.
Thank you for all of your help guys!
This is on my Windows test platform.
I have the following CSV:
You have signed out successfully!,ar,لقد خرجت بنجاح!
I have the following table definition:
CREATE TABLE `translations` (
`sourcephrase` varchar(250) NOT NULL,
`language` char(5) NOT NULL,
`translatedphrase` varchar(250) CHARACTER SET utf8 DEFAULT NULL,
PRIMARY KEY (`sourcephrase`,`language`),
KEY `language` (`language`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
If I load this CSV into the table (via MySQL Workbench, Import CSV), the data comes out just fine:
sourcephrase, language, translation
You have signed out successfully! ar لقد خرجت بنجاح!
If I instead run this PHP code (where psquery just executes a prepared statement):
$sourcephrase="You have signed out successfully!";
$language="ar";
$translated="لقد خرجت بنجاح!";
$sql = "insert into translations (sourcephrase, language, translatedphrase) values (?,?,?)";
$this->DB->psquery($sql, array("sss", $sourcephrase, $language, $translated));
The table contains the following data:
You have signed out successfully! ar لقد خرجت بنجاØ!
Why am I getting a different result in PHP? (I know it's something UTF-8 related, but I can't see what.) I don't believe it's MySQL related, as the CSV import works just fine.
لقد خرجت بنجاØ! is Mojibake for the desired string. See this for the likely causes, best practice, and debugging techniques.
This item is probably the one relevant to your PHP connection: "The connection when INSERTing and SELECTing text needs to specify utf8 or utf8mb4."
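A sketch of how this kind of Mojibake arises, assuming the PHP connection is left at latin1 while the script sends UTF-8: the server reads each UTF-8 byte as one latin1 character and converts that character to UTF-8 again for the utf8 `translatedphrase` column ("double encoding"). The helper below is hypothetical and mimics only that latin1-to-UTF-8 step. For "ح" (bytes D8 AD) the result is "Ø" followed by an invisible soft hyphen, which is consistent with the "Ø" at the end of the stored phrase.

```php
<?php
// Hypothetical helper: treat every byte as an ISO-8859-1 character and
// encode that character as UTF-8 (the double-encoding step).
function latin1_to_utf8(string $s): string
{
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $b = ord($s[$i]);
        $out .= $b < 0x80 ? chr($b) : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

$ha = "\xD8\xAD"; // "ح" (U+062D) in UTF-8
echo bin2hex(latin1_to_utf8($ha)), "\n"; // c398c2ad = "Ø" (U+00D8) + soft hyphen (U+00AD)

// The same mechanism turns "é" (C3 A9) into the familiar "Ã©".
echo latin1_to_utf8("\xC3\xA9"), "\n";
```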
Hi, I am developing a mobile app using PhoneGap and I am querying the MySQL database through AJAX (JSONP). However, special characters such as Ż are displayed as "?" when they are returned.
At the moment I have added this to my PHP, but it did not do the trick:
header('content-type: application/json; charset=UTF-8');
Is anyone aware of any other charset that can be used which includes special characters like the above?
First things first:
a) Fix the DB tables
Make sure the tables are defined with the proper character set,
e.g.:
CREATE TABLE `types` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=0 DEFAULT CHARSET=utf8
b) After connecting to the DB, make sure to run the following:
SET NAMES 'utf8';
[I also run the following:]
SET character_set_client ='utf8',
character_set_connection ='utf8',
character_set_database ='utf8',
character_set_results ='utf8',
character_set_server ='utf8',
collation_connection ='utf8_general_ci',
collation_database ='utf8_general_ci',
collation_server ='utf8_general_ci';
c) Finally, set the proper content type for the HTML page.
Hope this helps.
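Why the header alone cannot help: `json_encode()` escapes non-ASCII characters by default, so correct UTF-8 data comes out as a `\u` escape that survives any transport charset. A literal "?" in the output therefore means the value was already "?" when PHP received it, which is what MySQL substitutes when the connection character set cannot represent a character such as Ż. A minimal sketch, with both input values assumed rather than fetched:

```php
<?php
// Assumed value as a UTF-8 connection would deliver it: "Ż" (U+017B) = C5 BB.
$fromUtf8Connection = "\xC5\xBB";
// Assumed value as a latin1 connection would deliver it: Ż has no latin1
// representation, so the server has already replaced it with "?".
$fromLatin1Connection = '?';

echo json_encode(array('name' => $fromUtf8Connection)), "\n";   // {"name":"\u017b"}
echo json_encode(array('name' => $fromLatin1Connection)), "\n"; // {"name":"?"}
```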
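You're working with a MySQL database? Then try setting a utf8 charset on the database connection, like this:
$conn = mysql_connect('localhost','user1','pass1',TRUE);
mysql_set_charset('utf8',$conn); // mysql_* was removed in PHP 7; with mysqli, call set_charset('utf8') on the connection object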
Or try UTF-8 encoding with utf8_encode (deprecated as of PHP 8.2):
string utf8_encode ( string $data )
Parameters:
data: An ISO-8859-1 string.
Return Values:
Returns the UTF-8 translation of data.
According to the JSON standard, JSON text must be encoded in Unicode, the default encoding being UTF-8.
The original RFC 4627 also allowed other UTF formats, such as UTF-32BE, UTF-16BE, UTF-32LE and UTF-16LE; the current RFC 8259 requires UTF-8 for JSON exchanged between systems that are not part of a closed ecosystem.
For the detailed standard, see the IETF specification.
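In PHP terms, `json_encode()` always produces UTF-8: by default plain ASCII with `\u` escapes, or, with the `JSON_UNESCAPED_UNICODE` flag, the raw UTF-8 bytes. It has no option for UTF-16 or UTF-32, so such a conversion would be a separate step. A short sketch using "Ż" as the sample value:

```php
<?php
$value = array('name' => "\xC5\xBB"); // "Ż" as UTF-8

$escaped = json_encode($value);                         // {"name":"\u017b"}
$raw     = json_encode($value, JSON_UNESCAPED_UNICODE); // {"name":"Ż"}

// Either way the output is valid UTF-8; the unescaped form keeps Ż as c5 bb.
var_dump(strpos(bin2hex($raw), 'c5bb') !== false); // bool(true)
```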
I'm reading a UTF-8 encoded file using PHP and splatting the contents directly into a database. The problem is that when I encounter a character such as ”, it places the following into the database: †
How can I encode this correctly? I'm reading a UTF-8 file and my database column's collation is UTF-8. What am I doing wrong? Is there a nice function I'm missing? Any help is welcome.
This is my table:
CREATE TABLE tblProductData (
intProductDataId int(10) unsigned NOT NULL AUTO_INCREMENT,
strProductName varchar(50) NOT NULL,
strProductDesc varchar(255) NOT NULL,
strProductCode varchar(10) NOT NULL,
dtmAdded datetime DEFAULT NULL,
dtmDiscontinued datetime DEFAULT NULL,
stmTimestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (intProductDataId),
UNIQUE KEY (strProductCode)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE utf8_unicode_ci;
EDIT:
I'm reading the data like this:
$hFile = @fopen($FileName, "r") or exit("\nUnable to open file: " . $FileName);
if ($hFile)
{
    // fgets() returns false at end of file, so test its result instead of feof()
    while (($Line = fgets($hFile)) !== false)
    {
        $this->Products[] = new Product($Line);
    }
    fclose($hFile);
}
Use
mysql_query("SET NAMES utf8");
just after connecting to the DB, and make sure the browser encoding is UTF-8 too:
header("Content-Type: text/html; charset=utf-8");
You should set your connection encoding with this query
SET NAMES 'utf8'
before storing any data.
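Part of the confusion is that ” is a single character but three bytes in UTF-8, and `fgets()` hands those bytes to your code unchanged. If the connection is declared as latin1 (which in MySQL is really Windows-1252), the server reads them as three separate characters, the first two of which show up as "â" and "€". A minimal sketch of the byte/character difference, with no database involved:

```php
<?php
$quote = "\xE2\x80\x9D"; // "”" (U+201D) encoded as UTF-8

echo strlen($quote), "\n";                  // 3 bytes
echo preg_match_all('/./su', $quote), "\n"; // 1 character (/u makes PCRE match UTF-8 characters)
echo bin2hex($quote), "\n";                 // e2809d

// preg_match() with /u returns false for malformed UTF-8, a cheap way to
// confirm a line read from the file really is UTF-8 before inserting it.
var_dump(preg_match('//u', "Widget $quote") === 1); // bool(true)
```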
Also keep in mind that some database GUIs or web GUIs (e.g. phpMyAdmin) show the wrong encoding even if your data is encoded correctly. This happens, for example, with Sequel Pro on Mac and with phpMyAdmin in some environments.
You should trust your browser, i.e. show your inserted content in a page which has the
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
header and check whether the data is shown correctly. Or, even better, trust the mysql command-line client in the shell:
echo 'SELECT yourdata FROM your_table' | mysql -uuser -pyourpwd db_name
I'm doing a project with Zend Framework and I'm pulling data from a UTF-8 database. The project is UTF-8 as well.
In a form, I have a select element displaying a list of countries. The problem is:
In French or Spanish, some countries are not displayed.
After doing a var_dump() of my country list, I saw that those were the countries with special (accented) characters.
In the var_dump I could see the character represented as a ? in a diamond. I tried changing the encoding to ISO-8859-1, and then the var_dump result showed the special characters just fine.
How come data coming from a UTF-8 database is displayed as ISO-8859-1?
Can I store ISO-8859-1 characters in a UTF-8 table in MySQL without problems? Shouldn't they display as messed-up characters?
Confused.
--
delimiter $$
CREATE TABLE `geo_Country` (
`CountryID` int(10) NOT NULL,
`CountryName` varchar(45) NOT NULL,
`CountryCompleteName` varchar(45) NOT NULL,
`Nationality` varchar(45) NOT NULL,
`CreationDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`Status` tinyint(1) NOT NULL DEFAULT '1',
`LanguageCode` char(2) NOT NULL,
`ZoneID` int(10) NOT NULL,
PRIMARY KEY (`CountryID`,`LanguageCode`),
KEY `fk_geo_Country_web_Language1` (`LanguageCode`),
KEY `fk_geo_Country_geo_Zone` (`ZoneID`),
KEY `idx_CountryName` (`CountryName`),
CONSTRAINT `fk_geo_Country_geo_Zone` FOREIGN KEY (`ZoneID`) REFERENCES `geo_Zone` (`ZoneID`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `fk_geo_Country_web_Language1` FOREIGN KEY (`LanguageCode`) REFERENCES `web_Language` (`LanguageCode`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8$$
The thing to remember with UTF-8 is this:
Everything in your entire application needs to be UTF-8!
For a normal PHP/MySQL web application (a form, posting to a database), you need to check if:
1. Your database connection uses UTF-8 (execute this query right after your connection is set up: SET NAMES UTF8;)
2. Your PHP code uses UTF-8. That means no character set translation/encoding functions (there is no need for them when everything is UTF-8).
3. Your HTML output is UTF-8, either by sending a Content-Type: text/html; charset=utf-8 header or by using a <meta charset="utf-8"> tag (for HTML5; for other HTML variants, use <meta http-equiv="Content-Type" content="text/html; charset=utf-8">).
In your var_dump case, plain text is sent to the browser without any mention of a character set. Looking at rule #3, this means your browser is displaying it in a different character set, presumably latin1, which gives you the diamonds/question marks/blocks.
If you need to check whether your data is stored properly, use a database client like phpMyAdmin to view the record. This way you're viewing the content as UTF-8 (NOTE: this is a setting in PMA, so check that it is not set to a different charset!).
On a side note, set the collation of your databases' text columns to utf8_general_ci. Collation is not used for storing but for sorting, so it isn't related to your problem, but it's good practice to set it.
When connecting to the database, you should set the client encoding.
For Zend_Db it seems it should look like this (notice 'driver_options'):
$params = array(
'host' => 'localhost',
'username' => 'username',
'password' => 'password',
'dbname' => 'dbname',
'driver_options' => array(PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES UTF8;')
);
For application.ini:
resources.db.params.charset = utf8
As a last resort, you could just run the query SET NAMES UTF8 manually, just like any other query.
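To check which encoding the connection actually delivers, you can test the bytes of a fetched value. The sketch below uses "España" as an assumed example value and a hypothetical helper `isValidUtf8()`: over a latin1 connection "ñ" arrives as the single byte F1, which is not valid UTF-8 and is what a UTF-8 page renders as a "?" in a diamond; after SET NAMES UTF8 it arrives as C3 B1.

```php
<?php
// Hypothetical check: is this string valid UTF-8?
// preg_match() with the /u modifier returns false on malformed UTF-8.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$asLatin1 = "Espa\xF1a";     // what a latin1 connection would return
$asUtf8   = "Espa\xC3\xB1a"; // the same value after SET NAMES UTF8

var_dump(isValidUtf8($asLatin1)); // bool(false)
var_dump(isValidUtf8($asUtf8));   // bool(true)
```

In the application you would call it on a value from the result set, for example a hypothetical `$row['CountryName']`.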