I have a JSON array that holds the correct string regardless of language, but when the JSON is encoded and written to a file, the file does not contain the correct values. Instead it contains what look like random English letters, e.g. (uuadb). I want to write a string to a file where the string could be in any language. Right now I am testing with Tamil. It looks to me as if PHP doesn't support Unicode. Please help me write Unicode characters to a file using PHP.
I tried the pack function, but how would I use pack for arbitrary languages? Or is there another way of doing this?
My guess is that you're seeing \uXXXX escapes instead of the non-ASCII characters you asked for. By default, json_encode escapes all non-ASCII characters:
<?php
$arr = array("♫");
$json = json_encode($arr);
echo "$json\n";
# Prints ["\u266b"]
$str = '["♫"]';
$array = json_decode($str);
echo "{$array[0]}\n";
# Prints ♫
?>
If this is what you're getting, it's not wrong. You just have to ensure it's being decoded properly on the receiving end.
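If the goal is a file that shows the characters themselves, something along these lines may work. This is a minimal sketch that assumes PHP 5.4 or later and UTF-8 input; the key name, the sample Tamil word, and the temporary file path are made up for illustration.

```php
<?php
// Hypothetical sample data: a Tamil word stored as UTF-8.
$data = array('greeting' => 'வணக்கம்');

// JSON_UNESCAPED_UNICODE (PHP >= 5.4) keeps non-ASCII characters as-is.
$json = json_encode($data, JSON_UNESCAPED_UNICODE);

// Write to a temporary file; the path is only for illustration.
$path = tempnam(sys_get_temp_dir(), 'json');
file_put_contents($path, $json);

// Reading the file back and decoding returns the original string.
$roundTrip = json_decode(file_get_contents($path), true);
echo $roundTrip['greeting'], "\n";
unlink($path);
```

Without the flag the file would contain \uXXXX escapes, which decode to exactly the same data.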
Another possibility is that the string you're passing is not in UTF-8. According to the documentation for json_encode and json_decode, these functions only work with UTF-8 data. Call mb_detect_encoding on your input string, and make sure it outputs either UTF-8 or ASCII.
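To check that second possibility, a yes/no validity test can be easier to interpret than a guess at the encoding. The sketch below assumes the bad input is ISO-8859-1 (that is an assumption for the example, not something the question states) and uses mb_check_encoding rather than mb_detect_encoding.

```php
<?php
// "Brøndum" as ISO-8859-1 bytes: 0xF8 on its own is not valid UTF-8.
$latin1 = "Br\xF8ndum";

// mb_check_encoding answers a yes/no question instead of guessing.
$isUtf8 = mb_check_encoding($latin1, 'UTF-8');

// On PHP 7 and later, json_encode returns false for invalid UTF-8 input.
$failed = json_encode($latin1);
$error = json_last_error();

// Converting from the (assumed) source encoding repairs the input.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$json = json_encode($utf8);
echo $json, "\n";
```

If json_last_error() reports JSON_ERROR_UTF8, the input encoding is the thing to fix, not the JSON functions.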
My PHP application outputs JSON in which special characters are escaped; for example, the string "Brøndum" is represented as "Br\u00f8ndum".
Can you tell me which encoding this is, and how I get from "Br\u00f8ndum" back to "Brøndum"?
I have tried utf8_encode/decode but they don't work as expected.
Thanks!
That's standard JSON unicode escaping.
You get back to the actual character by using a JSON parser. json_decode in the case of PHP.
You can tell PHP not to escape Unicode characters in the first place with the JSON_UNESCAPED_UNICODE flag.
json_encode("Brøndum", JSON_UNESCAPED_UNICODE)
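A small round trip may make this concrete. It is only a sketch; the variable names are mine, and the flag requires PHP 5.4 or later.

```php
<?php
// Single quotes keep the backslash literal, as it would arrive in JSON text.
$escaped = '"Br\u00f8ndum"';

// json_decode turns the escape back into the real character (as UTF-8).
$decoded = json_decode($escaped);
echo $decoded, "\n";

// Default encoding escapes again; the flag (PHP >= 5.4) leaves it readable.
$default = json_encode($decoded);
$readable = json_encode($decoded, JSON_UNESCAPED_UNICODE);
echo $default, "\n", $readable, "\n";
```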
mb_detect_encoding can help here. You pass it the string and it guesses the encoding. You can also pass it a list of candidate encodings, since a plain string like "hello" is valid in several encodings at once. Keep in mind that the escaped form "Br\u00f8ndum" consists only of ASCII characters, so encoding detection says nothing about the escapes themselves.
echo mb_detect_encoding("Br\u00f8ndum");
A quick question: does JSON support Arabic characters? For example, when I search for something like the following:
$values = $database->get_by_name('معاً');
echo json_encode(array('returnedFromValue' => $value."<br/>"));
I'm expecting Arabic results from the database, but the returned values look like this:
{"returnedFromValue":"\u0627\u0644\u0645\u0639\u0627\u062f\u0649<br\/>"}{"returnedFromValue":"\u0627\u0644\u0645\u0639\u0627\u062f\u0649<br\/>"}
What am I missing here? Would it be better to use XML for Arabic characters?
JSON is, like XML, a data-interchange format. It isn't tied to a particular charset, so Arabic characters are fine as long as you use an encoding that supports them (UTF-8, for example).
PHP 5.4.0 and later support an option for json_encode() called JSON_UNESCAPED_UNICODE. It turns off the default behaviour of converting non-ASCII characters to their \uXXXX form.
$value = 'معاً';
echo json_encode($value, JSON_UNESCAPED_UNICODE);
// Outputs: "معاً"
These \u0627-style numbers are the Unicode code points of your Arabic letters. By default PHP writes them as escapes rather than as raw UTF-8, but the letters are all there. So yes, JSON does support Arabic. If the result string were printed client-side (using JavaScript), you would see the letters again.
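The output in the question also shows two JSON objects printed back to back, which a client cannot parse as one document. One possible fix, sketched below, is to collect the rows first and encode once. The $rows array stands in for the database result and is hypothetical; leaving the <br/> markup to the client is my own choice, not part of the question.

```php
<?php
// Build the sample value from the escapes shown in the question's output.
$value = json_decode('"\u0627\u0644\u0645\u0639\u0627\u062f\u0649"');

// Hypothetical result set; in real code these rows come from the database.
$rows = array($value, $value);

// Collect everything first, then encode once so the output is one JSON document.
$payload = array();
foreach ($rows as $row) {
    $payload[] = array('returnedFromValue' => $row);
}

$escaped  = json_encode($payload);
$readable = json_encode($payload, JSON_UNESCAPED_UNICODE);
echo $escaped, "\n", $readable, "\n";
```

Both strings decode to the same array; only the readability of the raw text differs.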
I have a problem with json_encode function with special characters.
For example I try this:
$string="Svrček";
echo "ENCODING=".mb_detect_encoding($string); //ENCODING=UTF-8
echo "JSON=".json_encode($string); //JSON="Svr\u010dek"
What can I do to display the string correctly, so JSON="Svrček"?
Thank you very much.
json_encode() is not actually outputting JSON* there. It’s outputting a javascript string. (It outputs JSON when you give it an object or an array to encode.) That’s fine, as a javascript string is what you want.
In javascript (and in JSON), č may be escaped as \u010d. The two are equivalent. So there’s nothing wrong with what json_encode() is doing. It should work fine. I’d be very surprised if this is actually causing you any form of problem. However, if the transfer is safely in a Unicode encoding (UTF-8, usually)†, there’s no need for it either. If you want to turn off the escaping, you can do so thus: json_encode('Svrček', JSON_UNESCAPED_UNICODE). Note that the flag JSON_UNESCAPED_UNICODE was introduced in PHP 5.4.0, and is unavailable in earlier versions.
By the way, contrary to what @onteria_ says, JSON does use UTF-8:
The character encoding of JSON text is always Unicode. UTF-8 is the only encoding that makes sense on the wire, but UTF-16 and UTF-32 are also permitted.
* Or, at least, it's not outputting JSON as defined in RFC 4627. However, there are other definitions of JSON, by which scalar values are allowed.
† JSON may be in UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, or UTF-32BE.
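To see that the two forms really are equivalent, a quick check along these lines may help (a sketch, assuming PHP 5.4 or later for the flag):

```php
<?php
$string = 'Svrček';

$escaped  = json_encode($string);                         // "Svr\u010dek"
$readable = json_encode($string, JSON_UNESCAPED_UNICODE); // "Svrček"

// Both forms decode to the same PHP string.
$same = json_decode($escaped) === json_decode($readable);
var_dump($same);
```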
After you make the database connection in your PHP script, add this line. It solved the problem for me, at least with the old mysql_* extension (removed in PHP 7; mysqli_set_charset() is the closest mysqli counterpart):
mysql_query('SET CHARACTER SET utf8');
Yes, json_encode escapes non-ASCII characters. If you decode the result, you'll get your original string back:
$string="こんにちは";
echo "ENCODING: " . mb_detect_encoding($string) . "\n";
$encoded = json_encode($string);
echo "ENCODED JSON: $encoded\n";
$decoded = json_decode($encoded);
echo "DECODED JSON: $decoded\n";
Output:
ENCODING: UTF-8
ENCODED JSON: "\u3053\u3093\u306b\u3061\u306f"
DECODED JSON: こんにちは
EDIT: It's worth noting that:
JSON uses Unicode exclusively.
The self-documenting format that describes structure and field names as well as specific values;
Source: http://www.json.org/fatfree.html
It uses Unicode, NOT UTF-8. This FAQ explains the difference between UTF-8 and Unicode:
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
When you use json_encode, your non-ASCII characters get escaped as Unicode code points. For example, こ is code point U+3053.
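The distinction between a code point and its UTF-8 bytes can be seen directly. This is a small sketch; mb_ord requires PHP 7.2 or later, and the variable names are mine.

```php
<?php
$char = 'こ';

// The Unicode code point is an abstract number (mb_ord needs PHP >= 7.2).
$codePoint = sprintf('U+%04X', mb_ord($char, 'UTF-8'));

// UTF-8 is one way to store that number as bytes.
$utf8Bytes = bin2hex($char);

// json_encode writes the code point as an ASCII escape.
$json = json_encode($char);

echo "$codePoint $utf8Bytes $json\n";
```

The escape carries the code point (3053), while the string in memory holds the three UTF-8 bytes e3 81 93.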
I want to convert a JSON object into a string. When I use json_encode I get a string, but all non-ASCII characters appear as hex escapes. I want UTF-8 instead; in other words, I want to see the actual characters. How do I do it?
I was using json_encode to store data such as Arabic characters in MySQL fields.
It would store the Arabic characters in the database as hex escapes with the backslashes missing, like this:
u0644 u063a...
That is incorrect: without the backslashes, json_decode no longer recognises the escapes. You must wrap the json_encode output in mysql_escape_string() before inserting it.
This makes sure that the data is stored in MySQL as:
\u0644\u063a...
Then, when you use json_decode, it converts the hex escapes back into UTF-8 and the text is output correctly.
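The effect of the lost backslashes can be reproduced without a database. In the sketch below, stripslashes() is only a stand-in for what happens to backslashes in an unescaped SQL string literal; it is an illustration, not what MySQL calls internally.

```php
<?php
// JSON as json_encode produces it for three Arabic letters.
$json = json_encode(json_decode('"\u0644\u063a\u0629"'));

// stripslashes() stands in for an unescaped SQL string literal
// swallowing the backslashes (illustration only).
$stored = stripslashes($json);

echo $json, "\n";    // "\u0644\u063a\u0629"
echo $stored, "\n";  // "u0644u063au0629"

// The damaged text still parses, but only as plain letters and digits.
$broken = json_decode($stored);

// With JSON_UNESCAPED_UNICODE (PHP >= 5.4) there are no \u escapes to lose.
$safe = json_encode(json_decode($json), JSON_UNESCAPED_UNICODE);
```

Even with unescaped output, the value still needs proper SQL escaping or a prepared statement, since it can contain quotes.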
You can try passing an option to json_encode():
json_encode ( $value, JSON_UNESCAPED_UNICODE );
The JSON_UNESCAPED_UNICODE option is only available in PHP version 5.4.0 and later.
Thanks.
You can't, in PHP versions before 5.4. Besides, the strings will still be the same once you decode them.
You are looking for the function json_decode.
It converts JSON strings into UTF-8.
Here is an example with an Arabic phrase:
$re = json_encode('لغة عربية');
echo $re ;
$dd = json_decode($re);
echo $dd ;
die;
It outputs:
"\u0644\u063a\u0629 \u0639\u0631\u0628\u064a\u0629"
لغة عربية
More examples here:
http://php.net/manual/en/function.json-decode.php
I am pulling data from the Facebook graph which has characters encoded like so: \u2014 and \u2014
Is there a function to convert those characters into HTML? i.e \u2014 -> —
If you have some further reading on these character codes, or suggested reading about Unicode in general, I would appreciate it. This is so confusing to me. I don't know what to call these codes... I guess Unicode, but Unicode seems to mean a whole lot of things.
That's not entirely true, bobince.
How do you handle JSON containing Spanish accents?
There are two problems.
I call FB.api(url, function(response)
... var s=JSON.stringify(response);
and pass it to a PHP script via $.post.
First, I get a truncated string, so I need escape(JSON.stringify(response)).
Then I get a full JSON-encoded string with Spanish accents.
As a test, I put it in a text file, load it with file_get_contents, apply PHP's json_decode, and get nothing.
You first need utf8_encode.
And then you get the long-awaited object of your desire.
After a full day of test and google without any result when decoding unicode properly, I found your post.
So many thanks to you.
Someone asked me to solve the problem of Arabic text in the Facebook JSON archive. Maybe this code helps someone who is trying to read Arabic text from Facebook (or Instagram) JSON:
$str = '\u00d8\u00ae\u00d9\u0084\u00d8\u00b5';
function decode_encoded_utf8($string) {
    // Replace each \uXXXX escape with the UTF-8 encoding of that code point.
    return preg_replace_callback('#\\\\u([0-9a-f]{4})#ism', function ($matches) {
        return mb_convert_encoding(pack("H*", $matches[1]), "UTF-8", "UCS-2BE");
    }, $string);
}
// The archive stores each UTF-8 byte as its own code point; converting to ISO-8859-1 restores the bytes.
echo iconv("UTF-8", "ISO-8859-1//TRANSLIT", decode_encoded_utf8($str));
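The same repair can probably be done without a hand-written regex by letting json_decode handle the escapes. This is a sketch under the assumption that the archive text is UTF-8 whose bytes were each written out as a separate \u00XX escape, as in the sample string above.

```php
<?php
// The escaped text from the snippet above, wrapped in quotes to form a JSON string.
$json = '"\u00d8\u00ae\u00d9\u0084\u00d8\u00b5"';

// Step 1: let the JSON parser handle the \uXXXX escapes.
$mojibake = json_decode($json);

// Step 2: each decoded character is really one byte of UTF-8, so converting
// to ISO-8859-1 turns the characters back into those raw bytes.
$text = mb_convert_encoding($mojibake, 'ISO-8859-1', 'UTF-8');

echo $text, "\n";
```

The resulting bytes are d8 ae d9 84 d8 b5, which is the UTF-8 encoding of three Arabic letters.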
Facebook Graph API returns JSON objects. Use json_decode() to read them into PHP and you do not have to worry about handling string literal escapes like \uNNNN. Don't try to decode JSON/JavaScript string literals by yourself, or extract chosen properties using regex.
Having read the string value, you'll have a UTF-8-encoded string. If your target HTML is also UTF-8-encoded, you don't need to replace — (U+2014) with any entity reference. Just use htmlspecialchars() on the string when outputting it, so that any < or & characters in the string are properly encoded.
If you do for some reason need to produce ASCII-safe HTML, use htmlentities() with the charset arg set to 'utf-8'.
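Put together, the steps above might look like the following sketch. The JSON fragment is a made-up stand-in for a Graph API response, and the variable names are mine.

```php
<?php
// Hypothetical response fragment with an escaped em dash and markup characters.
$json = '{"message":"R&D \u2014 <b>news<\/b>"}';

$data = json_decode($json, true);
$message = $data['message'];

// For a UTF-8 page: escape only the HTML-special characters.
$utf8Html = htmlspecialchars($message, ENT_QUOTES, 'UTF-8');

// For ASCII-safe output: also turn the em dash into an entity.
$asciiHtml = htmlentities($message, ENT_QUOTES, 'UTF-8');

echo $utf8Html, "\n", $asciiHtml, "\n";
```

In the first result the dash stays a literal UTF-8 character; in the second it becomes &mdash;.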