I'm trying to decode a JSON feed containing some Cyrillic characters. Not all of the characters in the feed are Cyrillic, though. I'm using json_decode, which works fine for everything else, but returns garbage when there are Cyrillic characters.
The results look like this: Ð”ÐµÑ„Ñ„Ð°Ñ‡ÐºÐ¸
Any ideas?
Your page is being decoded as CP1252 when it's actually UTF-8. Set your headers properly, i.e. declare charset=UTF-8 in the Content-Type header.
>>> print('Ð”ÐµÑ„Ñ„Ð°Ñ‡ÐºÐ¸'.encode('cp1252').decode('utf-8'))
Деффачки
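The same diagnosis can be checked from the PHP side. This is a minimal sketch, assuming the mbstring extension is installed: it reproduces the mix-up by reading UTF-8 bytes as Windows-1252, then reverses it. The reversal only confirms the diagnosis; the real fix is still to declare UTF-8 so the text is never misread in the first place.

```php
<?php
// The feed's text, correctly stored as UTF-8.
$original = 'Деффачки';

// Simulate the bug: interpret each UTF-8 byte as a Windows-1252 character
// and re-encode that as UTF-8, which produces the "garbage".
$garbled = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');
echo $garbled, "\n"; // Ð”ÐµÑ„Ñ„Ð°Ñ‡ÐºÐ¸

// Reverse it: map each character back to its Windows-1252 byte,
// which reassembles the original UTF-8 byte sequence.
$repaired = mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');
echo $repaired, "\n"; // Деффачки
```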
If you cannot decode Unicode characters with json_decode, use addslashes() on the output of json_encode. The problem comes from Unicode escapes that start with a backslash, such as \u30d7.
$json_data = addslashes(json_encode($unicode_string_or_array));
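A small sketch of what that line does, using U+30D7 (プ) from the example above. addslashes() doubles the backslash of the \u30d7 escape and escapes the double quotes, which is useful if the JSON text is going to be embedded inside another quoted string literal (an assumption about the use case, not something stated in the answer). The escaped text is no longer valid JSON on its own, so stripslashes() is needed before decoding it directly.

```php
<?php
$data = ['プ']; // U+30D7

$json = json_encode($data);   // ["\u30d7"]
$escaped = addslashes($json); // [\"\\u30d7\"]

// The escaped text is not valid JSON by itself...
var_dump(json_decode($escaped)); // NULL

// ...but removing the added slashes gives back the original JSON.
var_dump(json_decode(stripslashes($escaped), true) === $data); // bool(true)
```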
hermanschutte: use the escape function when sending data through JavaScript.
Related
My PHP application outputs JSON where special characters are encoded, e.g. the string "Brøndum" is represented as "Br\u00f8ndum".
Can you tell me which encoding this is, and how I get back from "Br\u00f8ndum" to "Brøndum"?
I have tried utf8_encode/decode but they don't work as expected.
Thanks!
That's standard JSON unicode escaping.
You get back to the actual character by using a JSON parser. json_decode in the case of PHP.
You can tell PHP not to escape Unicode characters in the first place with the JSON_UNESCAPED_UNICODE flag.
json_encode("Brøndum", JSON_UNESCAPED_UNICODE)
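Both directions side by side, as a minimal sketch (JSON_UNESCAPED_UNICODE needs PHP 5.4 or later):

```php
<?php
$name = 'Brøndum';

// Default: non-ASCII characters become \uXXXX escapes.
$escaped = json_encode($name);
echo $escaped, "\n"; // "Br\u00f8ndum"

// A JSON parser turns the escape back into the character.
echo json_decode($escaped), "\n"; // Brøndum

// Or avoid the escape in the first place.
echo json_encode($name, JSON_UNESCAPED_UNICODE), "\n"; // "Brøndum"
```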
mb_detect_encoding is your function. You just pass it the string and it detects the encoding. You can also pass it an array of candidate encodings (since a regular string like "hello" could be valid in several different encodings).
echo mb_detect_encoding("Br\u00f8ndum");
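A short sketch of the candidate-list form, with strict mode turned on so that only encodings in which the bytes are actually valid can be reported. The list ['ASCII', 'UTF-8'] is an illustrative choice. Keep in mind that the escaped text Br\u00f8ndum is itself plain ASCII, so detection is only informative on the raw, unescaped string.

```php
<?php
$candidates = ['ASCII', 'UTF-8']; // illustrative candidate list

// Valid UTF-8 (ø is two bytes), but not valid ASCII.
var_dump(mb_detect_encoding('Brøndum', $candidates, true)); // string(5) "UTF-8"

// Byte 0xF8 is "ø" in ISO-8859-1, but is neither valid ASCII nor valid UTF-8.
var_dump(mb_detect_encoding("Br\xF8ndum", $candidates, true)); // bool(false)
```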
I have text coming from a CSV file that is full of accented characters.
I check if mb_check_encoding($my_text, 'utf-8') is true, and yes, it is.
With this text I build a variable $json, to which I apply
json_encode($json,JSON_NUMERIC_CHECK);
var_dump($json);
shows an array of arrays with all the accented characters correct (é, ì, etc.), but the generated JSON text is wrong (e.g. "Donn\u00e9es" instead of "Données").
I know that json_encode only works with UTF-8 encoded data; that's why I checked beforehand that the text was UTF-8.
I also tried adding header("Content-type: application/json; charset=UTF-8"); without success.
So what could be the reason for this?
This is how JSON encodes "strange marks", i.e. Unicode characters. When you use json_decode() on your JSON-encoded string, it will return to normal.
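A minimal sketch with a made-up CSV row: the escape is only a representation, json_decode() restores the accents, and adding JSON_UNESCAPED_UNICODE (PHP 5.4+) to the existing JSON_NUMERIC_CHECK flag keeps them literal in the output.

```php
<?php
// Hypothetical row as it might come out of the CSV.
$rows = [['Données', '42']];

$json = json_encode($rows, JSON_NUMERIC_CHECK);
echo $json, "\n"; // [["Donn\u00e9es",42]]

// Decoding restores the original characters ('42' became an int via JSON_NUMERIC_CHECK).
var_dump(json_decode($json, true) === [['Données', 42]]); // bool(true)

// Combine flags with | to keep é unescaped in the JSON text.
echo json_encode($rows, JSON_NUMERIC_CHECK | JSON_UNESCAPED_UNICODE), "\n"; // [["Données",42]]
```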
I have Chinese characters encoded in ISO-8859-1, for example &#20860; = 兼.
Those characters are taken from the database using AJAX and sent as JSON using json_encode.
I then use Handlebars templates to put the data on the page.
When I look at the AJAX page directly, the characters are displayed correctly, although the source is still encoded.
But the final result displays the encoded characters.
I tried to decode them on the JavaScript side with unescape, but the template has no foreach that would let me decode that specific variable, so it crashes.
I tried to decode them on the PHP side with htmlspecialchars_decode, but without success.
Both pages are encoded in ISO-8859-1. I can change them to UTF-8 if necessary, but the data in the database will remain encoded in ISO-8859-1.
Thank you for your help.
You're simply representing your characters in HTML entities. If you want them as "actual characters", you'll need to use an encoding that can represent those characters, ISO-8859 won't do. htmlspecialchars_decode doesn't work because it only decodes a handful of characters that are special in HTML and leaves other characters alone. You'll need html_entity_decode to decode all entities, and you'll need to provide it with a character set to decode to which can handle Chinese characters, UTF-8 being the obvious best choice:
$str = html_entity_decode($str, ENT_COMPAT, 'UTF-8');
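A sketch of the PHP side under stated assumptions: the stored value is ISO-8859-1 and may mix raw Latin-1 bytes with numeric entities (the sample value is made up, and mbstring is assumed to be available). Converting to UTF-8 first leaves the entity text untouched, since it is plain ASCII; html_entity_decode() then produces the real character, and JSON_UNESCAPED_UNICODE keeps it literal in the JSON.

```php
<?php
// Hypothetical ISO-8859-1 value from the database: "caf" + byte 0xE9 (é) + an entity.
$fromDb = "caf\xE9 &#20860;";

// Step 1: convert Latin-1 bytes to UTF-8 (ASCII, including the entity, is unchanged).
$utf8 = mb_convert_encoding($fromDb, 'UTF-8', 'ISO-8859-1');

// htmlspecialchars_decode() leaves &#20860; alone; it only handles &, ", ', <, >.
echo htmlspecialchars_decode($utf8), "\n"; // café &#20860;

// Step 2: decode all entities into UTF-8 characters.
$text = html_entity_decode($utf8, ENT_COMPAT, 'UTF-8');
echo $text, "\n"; // café 兼

echo json_encode(['name' => $text], JSON_UNESCAPED_UNICODE), "\n"; // {"name":"café 兼"}
```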
You'll then need to make sure the browser knows that you're sending it UTF-8. If you want to store the text in the database in UTF-8 as well (which you really should), best follow the guide "How to handle UTF-8 in a web app", which explains all the pitfalls.
Are you including your text with the "double-stache" Handlebars syntax?
{{your expression}}
As the Handlebars documentation mentions, that syntax HTML-escapes its output, which would cause the results you're describing, where you're seeing the entity &#20860; instead of 兼.
Using three braces instead ("triple-stache") won't escape the output and will let the browser correctly interpret those numeric entities:
{{{your expression}}}
I've got an ISO-8859-1 string that I fetch from the database, and when I utf8_encode it, I get \u00f6 instead of Ö. This confuses the JavaScript/HTML page that calls this PHP script via AJAX. Why is there a \u00f6 instead of Ö? How do I get Ö instead?
Edit:
OK, I did some more experimenting, and it turns out this is caused by the combination of utf8_encode and json_encode. However, if I don't utf8_encode at all, the value is null in the JSON.
json_encode(array("city"=>utf8_encode("göteborg")))
utf8_encode doesn't encode characters to \uxxxx, as you figured out yourself it's json_encode doing this. And that's fine, because the JSON format specifies this behavior. If your client properly decodes the JSON string into a Javascript data type, the \uxxxx escapes will be turned into proper Unicode characters.
As for json_encode discarding characters if your string is Latin1 encoded: It's not explicitly stated on the manual page, but Javascript and JSON are entirely Unicode based, so I suspect Latin1 is an invalid and unexpected encoding to use with JSON strings, so it breaks.
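A sketch of both points, with a made-up Latin-1 value. Raw ISO-8859-1 bytes are not valid UTF-8, so in current PHP versions json_encode() returns false and json_last_error() reports JSON_ERROR_UTF8 (the null the question mentions may depend on PHP version or flags). Converted text encodes fine. mb_convert_encoding() stands in for utf8_encode() here because utf8_encode() is deprecated as of PHP 8.2; both convert ISO-8859-1 to UTF-8.

```php
<?php
// Hypothetical value from an ISO-8859-1 column: byte 0xF6 is "ö" in Latin-1.
$latin1 = "g\xF6teborg";

// Invalid UTF-8: encoding fails instead of producing \u escapes.
$bad = json_encode(['city' => $latin1]);
$err = json_last_error(); // capture before the next json_encode call resets it
var_dump($bad, $err === JSON_ERROR_UTF8); // bool(false) bool(true)

// Convert to UTF-8 first, then encode.
$city = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
echo json_encode(['city' => $city]), "\n";                         // {"city":"g\u00f6teborg"}
echo json_encode(['city' => $city], JSON_UNESCAPED_UNICODE), "\n"; // {"city":"göteborg"}
```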
How do you print that? JavaScript natively supports \uXXXX escapes, and doing this in JavaScript:
var x = "\u00f6"; alert(x);
should print out a small ö.
EDIT: According to your code, if you output that directly to the response stream and use the actual response as a variable in js on the client side, you shouldn't care about json_encode at all.
You would just tell the browser that the content is UTF-8 by setting the Content-Type header:
header('Content-Type: text/plain; charset=UTF-8');
And then the jQuery.data() code would work just fine.
I have a JSON array that holds the correct strings regardless of language, but when the JSON is encoded and written to a file, it does not contain the correct values. Instead it has other values, random English letters, e.g. (uuadb). I want to write a string to a file where the string could be in any language. Right now I am testing with Tamil. But I found that PHP doesn't support Unicode. Please help me write Unicode characters to a file using PHP.
I tried using the pack function, but how do I use pack for any language? Or is there another way of doing this? Please help me.
My guess is that you're seeing \uXXXX escapes instead of the non-ASCII characters you asked for. By default, json_encode escapes Unicode characters:
<?php
$arr = array("♫");
$json = json_encode($arr);
echo "$json\n";
# Prints ["\u266b"]
$str = '["♫"]';
$array = json_decode($str);
echo "{$array[0]}\n";
# Prints ♫
?>
If this is what you're getting, it's not wrong. You just have to ensure it's being decoded properly on the receiving end.
Another possibility is that the string you're passing is not in UTF-8. According to the documentation for json_encode and json_decode, these functions only work with UTF-8 data. Call mb_detect_encoding on your input string, and make sure it outputs either UTF-8 or ASCII.
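Putting this together for the file-writing case, as a minimal sketch: check that the input is UTF-8, encode with JSON_UNESCAPED_UNICODE (PHP 5.4+) if you want the Tamil text readable in the file, and write the bytes with file_put_contents(). The sample string and the temporary file path are placeholders, and mbstring is assumed to be available.

```php
<?php
// Placeholder Tamil text; any UTF-8 string behaves the same way.
$data = ['greeting' => 'வணக்கம்'];

// json_encode only accepts UTF-8, so validate before encoding.
if (!mb_check_encoding($data['greeting'], 'UTF-8')) {
    throw new RuntimeException('Input is not valid UTF-8');
}

// Without the flag the output is pure ASCII with \uXXXX escapes; with it, the characters stay.
$json = json_encode($data, JSON_UNESCAPED_UNICODE);

$path = sys_get_temp_dir() . '/greeting.json'; // placeholder path
file_put_contents($path, $json);

// Reading the file back and decoding gives the original string.
$decoded = json_decode(file_get_contents($path), true);
echo $decoded['greeting'], "\n"; // வணக்கம்
```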