Any way to return PHP json_encode with encode UTF-8 and not Unicode?
$arr=array('a'=>'á');
echo json_encode($arr);
Neither mb_internal_encoding('UTF-8'); nor $arr = array_map('utf8_encode', $arr); fixes it.
Result: {"a":"\u00e1"}
Expected result: {"a":"á"}
{"a":"\u00e1"} and {"a":"á"} are different ways to write the same JSON document; The JSON decoder will decode the unicode escape.
In PHP 5.4+, json_encode has the JSON_UNESCAPED_UNICODE option for plain output. On older PHP versions, you can roll your own JSON encoder that does not escape non-ASCII characters, or use PEAR's JSON encoder and remove lines 349 to 433.
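To see that the two spellings really are equivalent, here is a minimal sketch (assuming PHP 5.4 or later; the variable names are only illustrative) that encodes the same array both ways and decodes each result:

```php
<?php
// "\xc3\xa1" is the UTF-8 byte sequence for á, written as escapes so the
// example does not depend on how this source file is saved.
$arr = array('a' => "\xc3\xa1");

$escaped   = json_encode($arr);                         // {"a":"\u00e1"}
$unescaped = json_encode($arr, JSON_UNESCAPED_UNICODE); // {"a":"á"} as raw UTF-8

// Both documents decode to the same PHP value.
$same = json_decode($escaped, true) === json_decode($unescaped, true);

echo $escaped, "\n", $unescaped, "\n";
var_dump($same); // bool(true)
```

The unescaped form is easier to read, but any conforming decoder treats the two identically.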
I resolved my problem by doing this:
The .php file is saved as ANSI. This file contains the function that creates the .json file.
I use json_encode($array, JSON_UNESCAPED_UNICODE) to encode the data.
The result is a .json file encoded to ANSI as UTF-8.
This function, found here, works fine for me:
function jsonRemoveUnicodeSequences($struct) {
    // The /e modifier was removed in PHP 7, so use a callback instead.
    return preg_replace_callback('/\\\\u([a-f0-9]{4})/', function ($m) {
        return iconv('UCS-4LE', 'UTF-8', pack('V', hexdec($m[1])));
    }, json_encode($struct));
}
Use JSON_UNESCAPED_UNICODE inside json_encode() if your PHP version is >= 5.4.
Just use this:
utf8_encode($string);
You have to replace your $arr with $string.
I think it will work. Try it.
I have some json I need to decode, alter and then encode without messing up any characters.
If I have a Unicode character in a JSON string, it will not decode. I'm not sure why, since json.org says a string can contain any-Unicode-character-except-"-or-\-or-control-character. But it doesn't work in Python either.
{"Tag":"Odómetro"}
I can use utf8_encode which will allow the string to be decoded with json_decode, however the character gets mangled into something else. This is the result from a print_r of the result array. Two characters.
[Tag] => Odómetro
When I encode the array again, I get the character escaped to ASCII, which is correct according to the JSON spec:
"Tag"=>"Od\u00f3metro"
Is there some way I can un-escape this? json_encode gives no such option, and utf8_encode does not seem to work either.
Edit: I see there is a JSON_UNESCAPED_UNICODE option for json_encode. However, it's not working as expected. Oh damn, it's only in PHP 5.4. I will have to use some regex, as I only have 5.3.
$json = json_encode($array, JSON_UNESCAPED_UNICODE);
Warning: json_encode() expects parameter 2 to be long, string ...
I have found the following way to fix this issue. I hope it helps.
json_encode($data,JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES);
Judging from everything you've said, it seems like the original Odómetro string you're dealing with is encoded with ISO 8859-1, not UTF-8.
Here's why I think so:
json_encode produced parseable output after you ran the input string through utf8_encode, which converts from ISO 8859-1 to UTF-8.
You did say that you got "mangled" output when using print_r after doing utf8_encode, but the mangled output you got is exactly what happens when UTF-8 text is parsed as ISO 8859-1 (ó is \xc3\xb3 in UTF-8, and that byte sequence reads as ó in ISO 8859-1).
Your htmlentities hackaround solution worked. htmlentities needs to know the encoding of the input string to work correctly. If you don't specify one, it assumes ISO 8859-1. (html_entity_decode, confusingly, defaults to UTF-8, so your method had the effect of converting from ISO 8859-1 to UTF-8.)
You said you had the same problem in Python, which would seem to exclude PHP from being the issue.
PHP will use the \uXXXX escaping, but as you noted, this is valid JSON.
So, it seems like you need to configure your connection to Postgres so that it will give you UTF-8 strings. The PHP manual indicates you'd do this by appending options='--client_encoding=UTF8' to the connection string. There's also the possibility that the data currently stored in the database is in the wrong encoding. (You could simply use utf8_encode, but this will only support characters that are part of ISO 8859-1).
Finally, as another answer noted, you do need to make sure that you're declaring the proper charset, with an HTTP header or otherwise (of course, this particular issue might have just been an artifact of the environment where you did your print_r testing).
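The mangling described above can be reproduced on purpose. This is only a sketch, assuming the mbstring extension is loaded; the variable names are made up for illustration:

```php
<?php
// "Od\xc3\xb3metro" is "Odómetro" encoded as UTF-8 (ó = C3 B3).
$utf8 = "Od\xc3\xb3metro";

// Misreading those bytes as ISO 8859-1 and converting them to UTF-8 again
// yields the two-character mangling Ã³ (bytes C3 83 C2 B3).
$mangled = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

// The reverse conversion undoes the damage.
$repaired = mb_convert_encoding($mangled, 'ISO-8859-1', 'UTF-8');

echo bin2hex($mangled), "\n";  // 4f64c383c2b36d6574726f
var_dump($repaired === $utf8); // bool(true)
```

Seeing Ã followed by another symbol where one accented letter should be is usually a sign of exactly this double conversion.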
JSON_UNESCAPED_UNICODE was added in PHP 5.4, so it looks like you need to upgrade your version of PHP to take advantage of it. 5.4 is not released yet though! :(
There is a 5.4 alpha release candidate on QA though if you want to play on your development machine.
A hacky way of doing JSON_UNESCAPED_UNICODE in PHP 5.3. Really disappointed by PHP json support. Maybe this will help someone else.
$array = some_json();
// Encode all string children in the array to HTML entities.
array_walk_recursive($array, function(&$item, $key) {
    if (is_string($item)) {
        $item = htmlentities($item);
    }
});
$json = json_encode($array);
// Decode the HTML entities and end up with Unicode again.
$json = html_entity_decode($json);
$json = array('tag' => 'Odómetro'); // Original array
$json = json_encode($json); // {"tag":"Od\u00f3metro"}
$json = json_decode($json); // Od\u00f3metro becomes Odómetro
echo $json->{'tag'}; // Odómetro
echo utf8_decode($json->{'tag'}); // Odómetro
You were close, just use utf8_decode.
try setting the utf-8 encoding in your page:
header('content-type:text/html;charset=utf-8');
this works for me:
$arr = array('tag' => 'Odómetro');
$encoded = json_encode($arr);
$decoded = json_decode($encoded);
echo $decoded->{'tag'};
Try Using:
utf8_decode() and utf8_encode
To encode an array that contains special characters, convert it from ISO 8859-1 to UTF-8. (If utf8_encode and utf8_decode do not work for you, this might be an option.)
Everything that is in ISO-8859-1 should be converted to UTF8:
$utf8 = utf8_encode('이 감사의 마음을 전합니다!'); //contains UTF8 & ISO 8859-1 characters;
$iso88591 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
$data = $iso88591;
Encode should work after this:
$encoded_data = json_encode($data);
Convert UTF-8 to & from ISO 8859-1
I want to get a .html or .txt file from a folder with PHP, but this file is UTF-8 encoded, and if I use $html=file_get_contents('somewhere/somewhat.html'); and after that I echo $html; then this won't be UTF-8 encoded. I see many "�" in the text. Any idea? How can I prevent this?
You need to convert it to UTF-8 yourself. To do that, use the PHP functions mb_convert_encoding() and mb_detect_encoding().
Like this,
$html=file_get_contents('somewhere/somewhat.html');
$html=mb_convert_encoding($html, 'UTF-8',mb_detect_encoding($html, 'UTF-8, ISO-8859-1', true));
echo $html;
mb_convert_encoding() converts character encoding
mb_detect_encoding() detects character encoding
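Detection is only a guess, so a slightly more defensive variant is to convert only when the text is not already valid UTF-8. This is a sketch: toUtf8 is a hypothetical helper name, and the ISO-8859-1 fallback is an assumption you would adjust to your own data.

```php
<?php
// Hypothetical helper: returns $text unchanged when it is already valid
// UTF-8, otherwise assumes the input is ISO-8859-1 and converts it.
function toUtf8($text) {
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

$alreadyUtf8 = "caf\xc3\xa9"; // "café" in UTF-8
$latin1      = "caf\xe9";     // "café" in ISO-8859-1

var_dump(toUtf8($alreadyUtf8) === $alreadyUtf8); // bool(true), left untouched
var_dump(toUtf8($latin1) === $alreadyUtf8);      // bool(true), converted
```

If the file is already UTF-8 and the browser still shows "�", the conversion is not the problem; the page is probably being served without a UTF-8 charset declaration.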
Try to use iconv on your string:
http://php.net/manual/pl/function.iconv.php
Other solution:
http://php.net/manual/en/function.mb-convert-encoding.php
Or:
http://php.net/manual/en/function.utf8-encode.php
I have a JSON array that holds the correct string regardless of language, but when the JSON is encoded and written to the file, it does not have the correct values. It has other values made of seemingly random English letters, e.g. (uuadb). I want to write a string to a file where the string could be in any language. For now I am testing with Tamil. But I found that PHP doesn't support Unicode. Please help me write Unicode characters to a file using PHP.
I tried using the pack function, but how do I use it for any language? Or is there another way of doing this? Please help me.
My guess is that you're seeing \uXXXX escapes instead of the non-ASCII characters you asked for. json_encode appears to always escape Unicode characters:
<?php
$arr = array("♫");
$json = json_encode($arr);
echo "$json\n";
# Prints ["\u266b"]
$str = '["♫"]';
$array = json_decode($str);
echo "{$array[0]}\n";
# Prints ♫
?>
If this is what you're getting, it's not wrong. You just have to ensure it's being decoded properly on the receiving end.
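If you would rather have readable text in the file itself, something along these lines may work. It is only a sketch, assuming PHP 7 or later (for the "\u{...}" string escapes; JSON_UNESCAPED_UNICODE itself needs 5.4+), and the temporary file path is just a stand-in for your real one:

```php
<?php
// The "\u{...}" escapes spell the word "Tamil" in Tamil script, so the
// example does not depend on how this source file is saved.
$data = array('greeting' => "\u{0BA4}\u{0BAE}\u{0BBF}\u{0BB4}\u{0BCD}");

$path = tempnam(sys_get_temp_dir(), 'json'); // hypothetical scratch file
file_put_contents($path, json_encode($data, JSON_UNESCAPED_UNICODE));

// The file now holds raw UTF-8 text, and decoding restores the same array.
$roundTrip = json_decode(file_get_contents($path), true);
unlink($path);

var_dump($roundTrip === $data); // bool(true)
```

Whatever opens the file afterwards has to read it as UTF-8, otherwise the characters will look wrong even though the bytes are right.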
Another possibility is that the string you're passing is not in UTF-8. According to the documentation for json_encode and json_decode, these functions only work with UTF-8 data. Call mb_detect_encoding on your input string, and make sure it outputs either UTF-8 or ASCII.
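As a rough illustration of that second case (assuming PHP 7 or later, where json_encode returns false on invalid UTF-8, and assuming the input turns out to be ISO-8859-1), you can check json_last_error() and convert before retrying:

```php
<?php
$latin1 = "Od\xf3metro"; // "Odómetro" in ISO-8859-1, which is not valid UTF-8

$result = json_encode(array('Tag' => $latin1));
$failedOnUtf8 = ($result === false && json_last_error() === JSON_ERROR_UTF8);

if ($failedOnUtf8) {
    // Convert to UTF-8 first, then encode again.
    $utf8   = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
    $result = json_encode(array('Tag' => $utf8));
}

echo $result, "\n"; // {"Tag":"Od\u00f3metro"}
```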
I have a problem with json_encode function with special characters.
For example I try this:
$string="Svrček";
echo "ENCODING=".mb_detect_encoding($string); //ENCODING=UTF-8
echo "JSON=".json_encode($string); //JSON="Svr\u010dek"
What can I do to display the string correctly, so JSON="Svrček"?
Thank you very much.
json_encode() is not actually outputting JSON* there. It’s outputting a javascript string. (It outputs JSON when you give it an object or an array to encode.) That’s fine, as a javascript string is what you want.
In javascript (and in JSON), č may be escaped as \u010d. The two are equivalent. So there’s nothing wrong with what json_encode() is doing. It should work fine. I’d be very surprised if this is actually causing you any form of problem. However, if the transfer is safely in a Unicode encoding (UTF-8, usually)†, there’s no need for it either. If you want to turn off the escaping, you can do so thus: json_encode('Svrček', JSON_UNESCAPED_UNICODE). Note that the flag JSON_UNESCAPED_UNICODE was introduced in PHP 5.4.0, and is unavailable in earlier versions.
By the way, contrary to what @onteria_ says, JSON does use UTF-8:
The character encoding of JSON text is always Unicode. UTF-8 is the only encoding that makes sense on the wire, but UTF-16 and UTF-32 are also permitted.
* Or, at least, it's not outputting JSON as defined in RFC 4627. However, there are other definitions of JSON, by which scalar values are allowed.
† JSON may be in UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, or UTF-32BE.
Ok, so, after you make database connection in your php script, put this line, and it should work, at least it solved my problem:
mysql_query('SET CHARACTER SET utf8');
Yes, json_encode escapes non-ascii characters. If you decode it you'll get your original result:
$string="こんにちは";
echo "ENCODING: " . mb_detect_encoding($string) . "\n";
$encoded = json_encode($string);
echo "ENCODED JSON: $encoded\n";
$decoded = json_decode($encoded);
echo "DECODED JSON: $decoded\n";
Output:
ENCODING: UTF-8
ENCODED JSON: "\u3053\u3093\u306b\u3061\u306f"
DECODED JSON: こんにちは
EDIT: It's worth noting that:
JSON uses Unicode exclusively.
The self-documenting format that
describes structure and field names as
well as specific values;
Source: http://www.json.org/fatfree.html
It uses Unicode NOT UTF-8. This FAQ Explains the difference between UTF-8 and Unicode:
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
When you use JSON, your non-ASCII characters get escaped as Unicode code points. For example, こ is code point U+3053.
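A small sketch of that distinction, using the same character (the variable names are only for illustration):

```php
<?php
$char = "\xe3\x81\x93"; // こ written as its three UTF-8 bytes

$bytes   = bin2hex($char);     // the UTF-8 encoding, byte by byte
$escaped = json_encode($char); // the code point, as a JSON escape

echo $bytes, "\n";   // e38193
echo $escaped, "\n"; // "\u3053"
```

The escape carries the code point number, while the string in memory carries the UTF-8 bytes; json_decode turns the former back into the latter.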
I want to convert a JSON object into a string. When I use json_encode I get a string, but all in hex escapes. I want to convert it to UTF-8; in other words, I want to see the characters. How do I do it?
I was using json_encode to store data such as Arabic Characters in MySQL fields.
It would store the Arabic characters as HEX within the Database like this:
u0644 u063a...
Which is incorrect. You must ensure that you wrap your json_encode with mysql_escape_string().
This will make sure that the data is put in MySQL as:
\u0644\u063a...
Then, when you use json_decode, it converts the HEX strings into UTF-8 and is output correctly.
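To illustrate why the backslashes matter, here is a sketch that decodes the same two escapes with and without them (no database involved; the strings are typed in by hand):

```php
<?php
$good   = '"\u0644\u063a"'; // backslashes intact
$broken = '"u0644u063a"';   // backslashes lost on the way into the database

$decodedGood   = json_decode($good);   // two Arabic letters as UTF-8
$decodedBroken = json_decode($broken); // just the literal text

echo bin2hex($decodedGood), "\n"; // d984d8ba
echo $decodedBroken, "\n";        // u0644u063a
```

Once the backslashes are gone, json_decode has no way to tell that the text was ever meant to be an escape.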
You can try passing an option to json_encode():
json_encode ( $value, JSON_UNESCAPED_UNICODE );
The JSON_UNESCAPED_UNICODE option is only available in PHP version 5.4.0 and later.
Thanks.
You can't, in PHP. Besides, the strings will still be the same once you decode them.
You are looking for the function json_decode.
It can convert JSON strings into UTF-8.
Here is an example with an Arabic phrase:
$re = json_encode('لغة عربية');
echo $re ;
$dd = json_decode($re);
echo $dd ;
die;
It outputs:
"\u0644\u063a\u0629 \u0639\u0631\u0628\u064a\u0629"
لغة عربية
more examples here
http://php.net/manual/en/function.json-decode.php