Php json_encode converts utf8 string to characters codes - php

I have a Persian text "سرما"
And then when I convert it to JSON using json_encode(), I get a series of escaped character codes such as \u0633 which seems to be expected and of a rational process. But my confusion lies where I don't know how to convert them back into readable string of characters. How should I do that in PHP?
Should I use anything of mb_* family? I also have checked json_encode() parameters and have found nothing appropriate for me.
UPDATE
what I get saved in my DB is:
["u0633u0631u0645u0627"]
Which shows the characters are not escaped properly. While if I change it to
["\u0633\u0631\u0645\u0627"] it becomes easily readable by json_decode()

They should be converted back on the other end when it's decoded. This is the safest option as it might not be possible to guaranteed that the transmission or storage will not corrupt a multi-byte encoding.
If you're certain that everything is safe for UTF8 end-to-end you can do:
$res = json_encode($foo, \JSON_UNESCAPED_UNICODE);
http://php.net/manual/en/function.json-encode.php

Maybe try encoding the unicode characters, and then json_encoding it, then on the other side (receiving JSON) decode the json, then decode the unicode.
Example:
//Encode
json_encode(utf8_encode($string));
//Decode
utf8_decode(json_decode($string));

its simple just use JSON_UNESCAPED_SLASHES atribute
your problem is't utf8 you need force JSON to don't escape Slashes
example
$bar = "سرما";
$res = json_encode($bar, JSON_UNESCAPED_SLASHES );
// $res equal to ["\u0633\u0631\u0645\u0627"]
if you check the result in your MYSQL Database
it happen when you did't Use addslashes()
example
$bar = "سرما";
$res = json_encode($bar, JSON_UNESCAPED_SLASHES );
$res = addslashes($res);
// $res equal to ["\\u0633\\u0631\\u0645\\u0627"] now it's ready to use in MYSQL

Related

Json encode with special char [duplicate]

I have some json I need to decode, alter and then encode without messing up any characters.
If I have a unicode character in a json string it will not decode. I'm not sure why since json.org says a string can contain: any-Unicode-character- except-"-or-\-or- control-character. But it doesn't work in python either.
{"Tag":"Odómetro"}
I can use utf8_encode which will allow the string to be decoded with json_decode, however the character gets mangled into something else. This is the result from a print_r of the result array. Two characters.
[Tag] => Odómetro
When I encode the array again I the character escaped to ascii, which is correct according to the json spec:
"Tag"=>"Od\u00f3metro"
Is there some way I can un-escape this? json_encode gives no such option, utf8_encode does not seem to work either.
Edit I see there is an unescaped_unicode option for json_encode. However it's not working as expected. Oh damn, it's only on php 5.4. I will have to use some regex as I only have 5.3.
$json = json_encode($array, JSON_UNESCAPED_UNICODE);
Warning: json_encode() expects parameter 2 to be long, string ...
I have found following way to fix this issue... I hope this can help you.
json_encode($data,JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES);
Judging from everything you've said, it seems like the original Odómetro string you're dealing with is encoded with ISO 8859-1, not UTF-8.
Here's why I think so:
json_encode produced parseable output after you ran the input string through utf8_encode, which converts from ISO 8859-1 to UTF-8.
You did say that you got "mangled" output when using print_r after doing utf8_encode, but the mangled output you got is actually exactly what would happen by trying to parse UTF-8 text as ISO 8859-1 (ó is \x63\xb3 in UTF-8, but that sequence is ó in ISO 8859-1.
Your htmlentities hackaround solution worked. htmlentities needs to know what the encoding of the input string to work correctly. If you don't specify one, it assumes ISO 8859-1. (html_entity_decode, confusingly, defaults to UTF-8, so your method had the effect of converting from ISO 8859-1 to UTF-8.)
You said you had the same problem in Python, which would seem to exclude PHP from being the issue.
PHP will use the \uXXXX escaping, but as you noted, this is valid JSON.
So, it seems like you need to configure your connection to Postgres so that it will give you UTF-8 strings. The PHP manual indicates you'd do this by appending options='--client_encoding=UTF8' to the connection string. There's also the possibility that the data currently stored in the database is in the wrong encoding. (You could simply use utf8_encode, but this will only support characters that are part of ISO 8859-1).
Finally, as another answer noted, you do need to make sure that you're declaring the proper charset, with an HTTP header or otherwise (of course, this particular issue might have just been an artifact of the environment where you did your print_r testing).
JSON_UNESCAPED_UNICODE was added in PHP 5.4 so it looks like you need upgrade your version of PHP to take advantage of it. 5.4 is not released yet though! :(
There is a 5.4 alpha release candidate on QA though if you want to play on your development machine.
A hacky way of doing JSON_UNESCAPED_UNICODE in PHP 5.3. Really disappointed by PHP json support. Maybe this will help someone else.
$array = some_json();
// Encode all string children in the array to html entities.
array_walk_recursive($array, function(&$item, $key) {
if(is_string($item)) {
$item = htmlentities($item);
}
});
$json = json_encode($array);
// Decode the html entities and end up with unicode again.
$json = html_entity_decode($rson);
$json = array('tag' => 'Odómetro'); // Original array
$json = json_encode($json); // {"Tag":"Od\u00f3metro"}
$json = json_decode($json); // Od\u00f3metro becomes Odómetro
echo $json->{'tag'}; // Odómetro
echo utf8_decode($json->{'tag'}); // Odómetro
You were close, just use utf8_decode.
try setting the utf-8 encoding in your page:
header('content-type:text/html;charset=utf-8');
this works for me:
$arr = array('tag' => 'Odómetro');
$encoded = json_encode($arr);
$decoded = json_decode($encoded);
echo $decoded->{'tag'};
Try Using:
utf8_decode() and utf8_encode
To encode an array that contains special characters, ISO 8859-1 to UTF8. (If utf8_encode & utf8_decode is not what is working for you, this might be an option)
Everything that is in ISO-8859-1 should be converted to UTF8:
$utf8 = utf8_encode('이 감사의 마음을 전합니다!'); //contains UTF8 & ISO 8859-1 characters;
$iso88591 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
$data = $iso88591;
Encode should work after this:
$encoded_data = json_encode($data);
Convert UTF-8 to & from ISO 8859-1

How do I convert an utf8 string in "\uxxxx" format to latin1?

I have to save a JSON formatted string into my latin1 mysql db. In order to be able to use the uft8_encode function, I have to convert the entire array to utf8, and then convert the resultant string back to latin1.
So I tried the following code:
// $context is equal to array('produção' => 'homologação'), for testing purposes
$context = Helper::getHelper('Util')->encodeUtf8($context); // Encodes key and value with utf8_encode
$context = json_encode($context); // {"produ\u00e7\u00e3o":"homologa\u00e7\u00e3o"}
$context = utf8_decode($context); // Still {"produ\u00e7\u00e3o":"homologa\u00e7\u00e3o"}
But as you can see, it just doesn't work as I expected. I tried to use Zend_Json library too, but it doesn't work with those chars either.
To simplify: I need to encode a latin1 array to JSON, and then insert that JSON in my latin1 db.
Anybody knows how I can do that? A better way to accomplish the same result will be much appreciated as well.
You are performing utf8_decode on something that is not utf8.
JSON encoded content is always ASCII so performing utf8_decode will do nothing (ASCII is a subset of UTF8). You must first decode the JSON.
The correct sequence would be:
$string = "some UTF8 string"; // utf8
$json = json_encode($string); // json
$utf8 = json_decode($json); // utf8
$latin = utf8_decode($utf8); // latin1
Of course, this JSON step here is unnecessary but I'm guessing you're using JSON to transmit or store your data (which is a good idea!).
Since you updated the question:
JSON is ASCII so storing it in a latin1 encoded field should be no problem.
If you want your utf8 encoded data to be sent to the client as latin1 then you need to do some encoding conversion, either before you put it in the database or after you pull it out.
My point is that you don't need to do any tricks to store the JSON in the database. This should not be part of the question. At this point it is to me still unclear what you want. The statement:
To simplify: I need to encode a latin1 array to JSON, and then insert that JSON in my latin1 db.
does not rhyme with your code sample where your input is (I assume) utf8 encoded JSON.
I have an latin1 encoded array. I have to encode that array to JSON, and then store that JSON in my also latin1 db. My first problem was that json_encode only accepts utf8 encoded arrays, so I had to encode my entire array to utf8.
But the real problem was my db. When I inserted the JSON, it inserts the literal string, with some "\uxxxx" sequences. I first thought those were just utf8 chars, so I tried to decode them. Obviously, I was wrong.
#Frits' explanation that json_encode's result is pure ascii helped me a lot and made me look at different directions, and I found the solution for my problem.
Since the "\uxxxx" sequences were just ascii, what I really needed was to replace those sequences with the proper utf8 chars, and then decode the entire string.
It's well explained here:
How to decode Unicode escape sequences like "\u00ed" to proper UTF-8 encoded characters?
I'm particulary not really happy about that solution, but I have a deadline. So, if someone has a better way to do that, please, share with me.
I hope that helps some people in the same situation. Despite its ugliness, it works.

write unicode characters into a file in php

I have a json array which is holding the correct string independent of language but when the json is encoded and wrriten into the file it doesnot have the correct values. Its has the the other value random english alphabets eg:(uuadb) I want to write a string into a file where the string could be in any language.Now i am testing with tamil language. But i found PHP doesn't support unicode. please help me how to write unicode charaters into the file using PHP.
I tried using pack function but how to use the pack function for any languages Or is there any other way of doing this.Please help me......
My guess is that you're seeing \uXXXX escapes instead of the non-ASCII characters you asked for. json_encode appears to always escape Unicode characters:
<?php
$arr = array("♫");
$json = json_encode($arr);
echo "$json\n";
# Prints ["\u266b"]
$str = '["♫"]';
$array = json_decode($str);
echo "{$array[0]}\n";
# Prints ♫
?>
If this is what you're getting, it's not wrong. You just have to ensure it's being decoded properly on the receiving end.
Another possibility is that the string you're passing is not in UTF-8. According to the documentation for json_encode and json_decode, these functions only work with UTF-8 data. Call mb_detect_encoding on your input string, and make sure it outputs either UTF-8 or ASCII.

Translate URLENCODED data into UTF-8 in PHP

I've got a string that is in my database like 中华武魂 when I post my request to retrieve the data via my website I'm getting the data to the server in the format %E4%B8%AD%E5%8D%8E%E6%AD%A6%E9%AD%82
What decoding steps to I have to take in order to get it back to the usable form?
While also cleaning the user input to ensure they're not going to try an SQL injection attack?
(escape string before or after encoding?)
EDIT:
rawurldecode(); // returns "中åŽæ­¦é­‚"
urldecode(); // returns "中åŽæ­¦é­‚"
public function utf8_urldecode($str) {
$str = preg_replace("/%u([0-9a-f]{3,4})/i","&#x\\1;",urldecode($str));
return html_entity_decode($str,null,'UTF-8');
}
// returns "中åŽæ­¦é­‚"
... which actually works when I try and use it in an SQL statement.
I think because I was doing an echo and die(); without specifying a header of UTF-8 (thus I guess that was reading to me as latin)
Thanks for the help!
When your data is actually that percent-encoded form, you just have to call rawurldecode:
$data = '%E4%B8%AD%E5%8D%8E%E6%AD%A6%E9%AD%82';
$str = rawurldecode($data);
This suffices as the data already is encoded in UTF-8: 中 (U+4E2D) is encoded with the byte sequence 0xE4B8AD in UTF-8 and that is encoded with %E4%B8%AD when using the percent-encoding.
That your output does not seem to be as expected is probably because the output is interpreted with the wrong character encoding, probably Windows-1252 instead of UTF-8. Because in Windows-1252, 0xE4 represents ä, 0xB8 represents ¸, 0xAD represents å, and so on. So make sure to specify the output character encoding properly.
Use PHP's urldecode:
http://php.net/manual/en/function.urldecode.php
You have choices here: urldecode or rawurldecode.
If you had encoded your string using urlencode, you must use urldecode because of the way spaces are handled. While urlencode converts spaces to +, it is not the same with rawurlencode.

Convert a JSON into a UTF-8 string

I want to convert a JSON object into a string. when I am using json_encode I get a string but all with hex letters. I want to convert it to a UTF-8. In other words I want to see the characters. How do I do it?
I was using json_encode to store data such as Arabic Characters in MySQL fields.
It would store the Arabic characters as HEX within the Database like this:
u0644 u063a...
Which is incorrect. You must ensure that you wrap your json_encode with mysql_escape_string().
This will make sure that the data is put in MySQL as:
\u0644\u063a...
Then, when you use json_decode, it converts the HEX strings into UTF-8 and is output correctly.
You can try passing an option to json_encode():
json_encode ( $value, JSON_UNESCAPED_UNICODE );
The JSON_UNESCAPED_UNICODE option is only available in PHP version 5.4.0 and later.
Thanks.
You can't, in PHP. Besides, the strings will still be the same once you decode them.
you are looking exactly for the funcition json_decode
it can convert json strings into utf8
here is an example of arabic word
$re = json_encode('لغة عربية');
echo $re ;
$dd = json_decode($re);
echo $dd ;
die;
it output :
"\u0644\u063a\u0629 \u0639\u0631\u0628\u064a\u0629"
لغة عربية
more examples here
http://php.net/manual/en/function.json-decode.php

Categories