Another charset problem with PHP and MySQL

I'm having a problem with some characters like 'í' or 'ñ' in a web project with PHP and MySQL.
The database table uses the UTF-8 charset and the web page is ISO-8859-1 (latin-1). At first glance everything is handled fine, but a problem appears when I use PHP's json_encode function.
When I get a query result, let's say this row:
| ID | VALUE |
--------------------
| 1 | Línea |
I got the following (correct) array in PHP:
Array("ID"=>"1","VALUE"=>"Línea");
So far, so good. But when I apply json_encode:
$result = json_encode($result);
//$result is {"id":"1","value":"L"}
Then I tried some encoding/decoding, but I couldn't get the right result.
First I tried to decode the UTF-8 chars as follows:
$result['value'] = utf8_decode($result['value']);
//and I get $result['value'] is "L?a"
Then I tried with mb functions:
$result['value'] = mb_convert_encoding($result['value'],"ISO-8859-1","UTF-8");
//and I get that $result['value'] is "Lnea"
I don't really know why json_encode is breaking my string, and I can't figure out what else to try. I would appreciate any help :)
Thanks!

The documentation for json_encode states that the function will only work on UTF-8 data. If it's not working for you, it means that your data is not UTF-8.
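A minimal sketch of that failure mode, assuming a reasonably recent PHP version (older versions reported the problem differently, for example by truncating the string as in the question). The byte strings are hand-written stand-ins for what the database might return; they are not taken from the question.

```php
<?php
// "Línea" encoded as ISO-8859-1: the í is the single byte 0xED,
// which is not a valid UTF-8 sequence on its own.
$latin1 = "L\xEDnea";

// With invalid UTF-8 input, json_encode fails and returns false.
$encoded = json_encode(['value' => $latin1]);
var_dump($encoded);
var_dump(json_last_error() === JSON_ERROR_UTF8);

// The same text as UTF-8 (í is the two bytes 0xC3 0xAD) encodes fine.
$utf8 = "L\xC3\xADnea";
$ok = json_encode(['value' => $utf8]);
echo $ok, "\n"; // {"value":"L\u00ednea"}
```

Checking json_last_error() (or json_last_error_msg()) right after the call is usually the quickest way to confirm that the input encoding is the problem.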
To understand what's going wrong, you need to know what your connection character set is. Is it UTF-8? Something else? Run SET NAMES utf8 (MySQL spells the charset name without the hyphen) and see if it makes any difference.
Assuming the connection character set is indeed UTF-8, json_encode should work just fine. Then, you still have the final issue of converting the encoded data to ISO-8859-1. For example:
// assume any strings in $result are UTF-8 encoded
$json = json_encode($result);
$output = mb_convert_encoding($json, 'ISO-8859-1', 'UTF-8');
echo $output;
If it still doesn't work, it means that your UTF-8 strings contain characters not available in the ISO-8859-1 character set. There's nothing you can do about that.
Update:
When debugging complex character set conversions like this, you can use file_put_contents to write intermediate results to a file which you can inspect with a hex editor. This will help confirm that the output of a particular step of the process is correct or not.
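As a quick alternative to a hex editor, bin2hex can print the raw bytes directly. This is only a sketch: hex_bytes is a hypothetical helper name, and the sample strings are made up for illustration.

```php
<?php
// Hypothetical helper: show the raw bytes of a string as spaced hex pairs.
function hex_bytes(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

$utf8   = "L\xC3\xADnea";                                    // "Línea" as UTF-8
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8'); // same text as Latin-1

echo hex_bytes($utf8), "\n";   // 4c c3 ad 6e 65 61
echo hex_bytes($latin1), "\n"; // 4c ed 6e 65 61
```

If the í shows up as c3 ad the string is UTF-8; if it shows up as a lone ed it is ISO-8859-1.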

Related

Æøå in returned JSON result - the data doesn't look like it's supposed to

I have fetched some data from a url request using JSON with the following code:
$url = 'https://recruit.zoho.com/ats/private/xml/JobOpenings/getRecords?authtoken=$at&scope=recruitapi';
$request = new WP_Http;
$result = $request->request($url, $data = array());
$input = json_encode($result, true);
var_dump($input);
This code worked absolutely fine, except the data coming out looked really weird, such as:
"content-encoding":"gzip","vary":"Accept-Encoding","strict-transport-security":"max-age=15768000"},"body":"\u003C?xml version=\"1.0\" encoding=\"UTF-8\" ?\u003E\n\u003Cresponse uri=\"\/ats\/private\/xml\/JobOpenings\/getRecords\"\u003E\u003Cresult\u003E\u003CJobOpenings\u003E\u003Crow no=\"1\"\u003E\u003CFL val=\"JOBOPENINGID\"\u003E\u003C![CDATA[213748000001263043]]\u003E\u003C\/FL\u003E\u003CFL val=\"Published in website\"\u003E\u003C![CDATA[false]]\u003E\u003C\/FL\u003E\u003CFL val=\"Modified by\"\u003E\u003C![CDATA
After some research, I realize that part of the problem most likely is the fact that there are æ, ø, and å in the data I'm requesting. Others have solved the problem this way:
$input = json_encode(utf8_decode($result), true);
However this gives me this error:
Warning: utf8_decode() expects parameter 1 to be string, array given in
I know the array is not a string, but how else do I deal with this? It seems to have worked for others, and I can't figure out why.
Thanks.
Edit:
I noticed this in the beginning of the printed data.
string(31486) "{"headers":{"server":"ZGS","date":"Wed, 12 Aug 2015 13:59:32 GMT","content-type":"text\/xml;charset=utf-8"
Does that mean it is already UTF-8 and I'm totally off?
What you receive in $result is UTF-8 data from the URL you requested. In any case, json_encode escapes non-ASCII characters as \uXXXX sequences by default.
If you don't want the UTF-8 characters escaped, this question is relevant to you: Why does the PHP json_encode function convert UTF-8 strings to hexadecimal entities?
Everything seems to work fine from what I can see, although the string you have provided seems to be truncated. I guess that happened when you pasted it.
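To see that the \uXXXX escapes are harmless, here is a small sketch that decodes them again. The JSON fragment is a shortened, hand-made imitation of the output in the question, not the real response.

```php
<?php
// A JSON string literal containing escapes like the ones in the question.
$json = '"\u003C?xml version=\"1.0\"?\u003E"';
$decoded = json_decode($json);
echo $decoded, "\n"; // <?xml version="1.0"?>

// The same holds for æøå: escaped on the way out, restored on the way in.
$text = "\xC3\xA6\xC3\xB8\xC3\xA5";       // "æøå" as UTF-8
$roundTrip = json_decode(json_encode($text));
var_dump($roundTrip === $text);           // bool(true)
```

In other words, the escaped form and the original text are the same data; any JSON parser on the receiving side turns the escapes back into characters.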

PHP json_encode assigns null instead of value?

I have a CSV file that looks like this:
http://ideone.com/YWuuWx
I read the file and convert it to an array, which works completely fine. But when I JSON-encode the array, json_encode doesn't put in the real values; it puts null. Here is the dump of the array and of the JSON-encoded array:
http://jave.jecool.net/stackoverflowdemos/csv_to_json_to_arraydump.php
I convert like this: $php_array = json_encode($json_array, JSON_PRETTY_PRINT);
Does anyone know what might cause the problem?
EDIT: I think there is a 90% chance that it's caused by the latin1 characters. Does anyone know the best workaround?
Assuming that it is in fact an encoding error, and that your data is actually encoded in some ISO-8859 variant (I'm guessing latin2 rather than latin1 based on your use of LATIN SMALL LETTER R WITH CARON), and that it is CONSISTENTLY so, you can use iconv() to re-encode it as UTF-8 before doing json_encode():
$foo = iconv('ISO-8859-2', 'UTF-8', $foo);
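Since the data here is an array rather than a single string, the same call can be applied to every value. The following is a sketch only: to_utf8 is a hypothetical helper, ISO-8859-2 is the guess from above, and the sample row is invented.

```php
<?php
// Hypothetical helper: re-encode every string value in a (possibly nested)
// array to UTF-8. Array keys are left untouched.
function to_utf8(array $data, string $from = 'ISO-8859-2'): array
{
    array_walk_recursive($data, function (&$value) use ($from) {
        if (is_string($value)) {
            $value = iconv($from, 'UTF-8', $value);
        }
    });
    return $data;
}

// "Jiří" in ISO-8859-2: ř is the byte 0xF8, í is the byte 0xED.
$rows = [['name' => "Ji\xF8\xED"]];

$converted = to_utf8($rows);
$json = json_encode($converted, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);
echo $json, "\n";
```

This only helps if every string really is in the assumed source encoding; running it over data that is already UTF-8 would garble it.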

php ActiveRecord and json_encode æøå encoding issue

In my own opinion I have tried everything there is for this encoding problem and looked through a lot of answered questions, but nothing worked for me, so here I go.
I have a MySQL database with a Users table. This table has a "firstname" column whose collation is set to utf8_general_ci (as all varchar columns are). I have inserted a row where the firstname column is set to "Løw", with the Scandinavian special character "ø".
I now use the php-ActiveRecord library, where the connection string ends with ";charset=utf8", to retrieve the row and then output the user as JSON, like so:
$user = User::find($ID);
$userArr = $user->to_array();
header('Content-Type: application/json; charset=utf-8');
print(json_encode($userArr));
Now the weird things start. The firstname is now NOT "Løw" as displayed in the MySQL database, but "L\u00f8w". I then tried to see if this was also the case without the json_encode function, like so:
$user = User::find($ID);
$userArr = $user->to_array();
header('Content-Type: text/plain; charset=utf-8');
print_r($userArr);
But here the output was correct: firstname was "Løw". I then tried to encode the fields in the array to UTF-8, since everybody told me that it should work if the strings were UTF-8, like so:
$return[] = array_map('utf8_encode', $userArr);
print_r(json_encode($return));
But this gave me "L\u00c3\u00b8w", so that didn't work. Since I was out of ideas, I then tried to utf8_decode it:
$return[] = array_map('utf8_decode', $userArr);
print_r(json_encode($return));
But that made the string come back as "null". I then tried to check what encoding my variables had when they came out of the database, like so:
header('Content-Type: text/plain; charset=utf-8');
print(mb_detect_encoding($userArr['firstname']));
But this returned UTF-8.
So as you can hopefully see, I have tried everything and I still don't know why json_encode changes the "ø" character to "\u00f8". Please help, I don't want to write my own json_encode method.
OK, I found an answer pretty quickly, but I'll let other Scandinavian people know, since I couldn't find anything on the subject.
I solved the problem by adding the following flag to the json_encode call:
print(json_encode($userArr,JSON_UNESCAPED_UNICODE));
This tells the function NOT to escape Unicode chars (I think), or as it says in the PHP docs:
JSON_UNESCAPED_UNICODE (integer)
Encode multibyte Unicode characters literally (default is to escape as
\uXXXX). Available since PHP 5.4.0.
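A short side-by-side sketch of the two forms; the $user array is a made-up stand-in for the to_array() result above.

```php
<?php
$user = ['firstname' => "L\xC3\xB8w"]; // "Løw" as UTF-8

$escaped = json_encode($user);
$literal = json_encode($user, JSON_UNESCAPED_UNICODE);

echo $escaped, "\n"; // {"firstname":"L\u00f8w"}
echo $literal, "\n"; // {"firstname":"Løw"}

// Both forms are valid JSON and decode to exactly the same data.
var_dump(json_decode($escaped, true) === json_decode($literal, true)); // bool(true)
```

So the default output was never wrong; the flag only changes how the same string is written, which mostly matters for readability and size.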

UTF-8 data received by php isn't decoded

I'm having some trouble with my $_POST/$_REQUEST data; it appears to still be UTF-8 encoded.
I am sending conventional ajax post requests, in these conditions:
oXhr.setRequestHeader("Content-type", "application/x-www-form-urlencoded; charset=utf-8");
js file saved under utf8-nobom format
meta-tags in html <header> tag setup
php files saved under utf-8-nobom format as well
encodeURIComponent is used but I tried without and it gives the same result
Ok, so everything is fine: the database is also in utf8, and receives it this way, pages show well.
But when I receive the character "º", for example (through $_REQUEST or $_POST), its binary representation is 11000010 10111010, while the binary representation of "º" hardcoded in PHP (utf8...) is 10111010 only.
wtf? I just don't know whether this is a good thing or not. For instance, if I use "#º#" as the delimiter for PHP's explode function, it doesn't get detected, and this is actually the problem that led me here.
Any help will, as usual, be greatly appreciated. Thank you so much for your time.
Best rgds.
EDIT1: checking against mb_check_encoding
if (mb_check_encoding($_REQUEST[$i], 'UTF-8')) {
raise("$_REQUEST is encoded properly in utf8 at index " . $i);
} else {
raise(false);
}
The encoding was confirmed; the message was raised properly.
Single-byte UTF-8 characters do not have bit 7 (the eighth bit) set, so 10111010 on its own is not valid UTF-8. Your PHP file is probably saved as ISO-8859-1.
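The mismatch can be reproduced with explicit byte strings. This is a sketch with invented input; the delimiter bytes stand in for what a Latin-1 saved source file and a UTF-8 request would each contain.

```php
<?php
$utf8   = "\xC2\xBA"; // "º" (U+00BA) as UTF-8:  11000010 10111010
$latin1 = "\xBA";     // "º" as ISO-8859-1:      10111010

var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)

// explode() compares raw bytes, so a Latin-1 delimiter never matches
// UTF-8 input even though both look like "#º#" on screen.
$input = "a#\xC2\xBA#b";
$withLatin1 = explode("#\xBA#", $input);
$withUtf8   = explode("#\xC2\xBA#", $input);

var_dump(count($withLatin1)); // int(1)
var_dump(count($withUtf8));   // int(2)
```

Re-saving the PHP file as UTF-8 makes the hardcoded delimiter use the same two bytes as the incoming data.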

How to read Asiatic characters (Japanese, Chinese) after json_encode in PHP

I've read every post on the topic, but I don't think I've found an answer to my question, and it's driving me crazy.
I have a couple of PHP files: one stores data in a MySQL db, the other reads that data. I get data from all over the world, and it seems that I manage to store Asian characters correctly, but when I try to read the data I can't get those characters back.
Like many other users, I get ?? instead of the correct chars.
Top of my PHP files I got:
header('Content-Type: application/json; charset=utf-8');
then
mysql_query("SET CHARACTER SET utf8", $link);
mysql_query("SET NAMES 'utf8'", $link);
then
$fab[] = array_map(utf8_encode,$array);
Here, if I print_r($fab), the Asian chars are already lost :-(
Then when I do:
$json_string = json_encode($fab); //originale
What I get is "??".
What is the correct way to get the right chars back? The JSON string is then passed
to an iPhone client.
Any suggestion or help would be sooo appreciated.
Thank you anyway,
Fabrizio
Seems like you're double encoding it? If you get the data from MySQL and it is already UTF-8 encoded, what's the point of $fab[] = array_map(utf8_encode,$array); then?
I had a similar thing just 2 days ago, when I was accepting UTF-8 data from an ExtJs form and it got messed up. It was because I used utf8_encode on the data I received from the script (which was already in UTF-8). So I broke it by double encoding. Maybe it's the same in your case.
The problem was what Tseng said: double encoding of the array. I thought I had run the right test, but I simply hadn't.
So the only code I need is:
while($obj = mysql_fetch_object($rs)) {
$arr[] = $obj;
}
$json_string = json_encode($arr);
echo ($json_string);
Again Tseng, thanx.
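For anyone who wants to see the double encoding in bytes, here is a small sketch. It uses mb_convert_encoding from ISO-8859-1 to UTF-8 as a stand-in for utf8_encode (which performs the same conversion and is deprecated in newer PHP versions); the sample text is invented.

```php
<?php
$utf8 = "\xE6\x97\xA5\xE6\x9C\xAC"; // "日本" as UTF-8: 2 characters, 6 bytes

// Treating UTF-8 bytes as Latin-1 and "encoding" them again turns every
// byte into its own character: this is the double encoding.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

var_dump(strlen($utf8));               // int(6)
var_dump(strlen($double));             // int(12)
var_dump(mb_strlen($double, 'UTF-8')); // int(6)

// Reversing the extra step recovers the original bytes.
$repaired = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
var_dump($repaired === $utf8);         // bool(true)
```

The double-encoded string is still valid UTF-8, so json_encode accepts it without complaint; the client just receives the wrong characters.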
