I have text coming from a CSV that is full of accent marks.
I check if mb_check_encoding($my_text, 'utf-8') is true, and yes, it is.
With this text I generate a variable $json by applying
json_encode($json, JSON_NUMERIC_CHECK);
var_dump($json)
gives an array of arrays with all the strange marks correct (é, ì, etc.), but the generated JSON text is incorrect (e.g. "Donn\u00e9es" instead of "Données").
I know that json_encode only works properly on UTF-8 encoded data, which is why I checked beforehand that it was UTF-8.
I also tried adding a header("Content-type: application/json; charset=UTF-8"); without success.
So what could be the reason for this?
This is how JSON encodes "strange marks", i.e. Unicode characters. When you use json_decode() on your JSON-encoded string, it will return to normal.
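A minimal round-trip sketch of what that looks like (the array contents here are made up for illustration, and the PHP file is assumed to be saved as UTF-8):
$data = array(array('label' => 'Données'));          // stands in for the CSV rows
$json = json_encode($data, JSON_NUMERIC_CHECK);
echo $json;                                          // [{"label":"Donn\u00e9es"}] -- escaped, but valid JSON
$back = json_decode($json, true);
echo $back[0]['label'];                              // Données -- the original character is back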
Related
My PHP application outputs JSON where special characters are encoded, f.ex. the string "Brøndum" is represented as "Br\u00f8ndum".
Can you tell me which encoding this is, and how I can get back from "Br\u00f8ndum" to "Brøndum"?
I have tried utf8_encode/decode but they don't work as expected.
Thanks!
That's standard JSON unicode escaping.
You get back to the actual character by using a JSON parser. json_decode in the case of PHP.
You can tell PHP not to escape Unicode characters in the first place with the JSON_UNESCAPED_UNICODE flag.
json_encode("Brøndum", JSON_UNESCAPED_UNICODE)
mb_detect_encoding is the function you want. You just pass it the string and it detects the encoding. You can also pass it an array of candidate encodings, since a plain string like "hello" could potentially be valid in several different encodings.
echo mb_detect_encoding("Br\u00f8ndum");
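A short sketch of passing an explicit list of candidates (the third argument enables strict checking; the list is just an example):
$candidates = array('UTF-8', 'ISO-8859-1', 'Windows-1252');
echo mb_detect_encoding('Brøndum', $candidates, true); // UTF-8, assuming this file is saved as UTF-8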
I have to save a JSON-formatted string into my latin1 MySQL db. In order to be able to use the utf8_encode function, I have to convert the entire array to UTF-8, and then convert the resulting string back to latin1.
So I tried the following code:
// $context is equal to array('produção' => 'homologação'), for testing purposes
$context = Helper::getHelper('Util')->encodeUtf8($context); // Encodes key and value with utf8_encode
$context = json_encode($context); // {"produ\u00e7\u00e3o":"homologa\u00e7\u00e3o"}
$context = utf8_decode($context); // Still {"produ\u00e7\u00e3o":"homologa\u00e7\u00e3o"}
But as you can see, it just doesn't work as I expected. I tried to use Zend_Json library too, but it doesn't work with those chars either.
To simplify: I need to encode a latin1 array to JSON, and then insert that JSON in my latin1 db.
Does anybody know how I can do that? A better way to accomplish the same result would be much appreciated as well.
You are performing utf8_decode on something that is not utf8.
JSON encoded content is always ASCII so performing utf8_decode will do nothing (ASCII is a subset of UTF8). You must first decode the JSON.
The correct sequence would be:
$string = "some UTF8 string"; // utf8
$json = json_encode($string); // json
$utf8 = json_decode($json); // utf8
$latin = utf8_decode($utf8); // latin1
Of course, this JSON step here is unnecessary but I'm guessing you're using JSON to transmit or store your data (which is a good idea!).
Since you updated the question:
JSON is ASCII so storing it in a latin1 encoded field should be no problem.
If you want your utf8 encoded data to be sent to the client as latin1 then you need to do some encoding conversion, either before you put it in the database or after you pull it out.
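A tiny sketch of the "after you pull it out" branch, using mb_convert_encoding (utf8_decode would also work for pure latin1 text; the JSON literal just stands in for a value read from the db):
$utf8Value   = json_decode('"homologa\u00e7\u00e3o"');                 // UTF-8 text again
$latin1Value = mb_convert_encoding($utf8Value, 'ISO-8859-1', 'UTF-8'); // same text for a latin1 client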
My point is that you don't need to do any tricks to store the JSON in the database, so this should not be part of the question. At this point it is still unclear to me what you want. The statement:
To simplify: I need to encode a latin1 array to JSON, and then insert that JSON in my latin1 db.
does not square with your code sample, where your input is (I assume) UTF-8 encoded JSON.
I have a latin1-encoded array. I have to encode that array to JSON, and then store that JSON in my db, which is also latin1. My first problem was that json_encode only accepts UTF-8 encoded arrays, so I had to encode my entire array to UTF-8.
But the real problem was my db. When I inserted the JSON, it stored the literal string, with some "\uxxxx" sequences in it. I first thought those were just UTF-8 chars, so I tried to decode them. Obviously, I was wrong.
@Frits' explanation that json_encode's result is pure ASCII helped me a lot and made me look in a different direction, and I found the solution to my problem.
Since the "\uxxxx" sequences were just ASCII, what I really needed was to replace those sequences with the proper UTF-8 chars, and then decode the entire string.
It's well explained here:
How to decode Unicode escape sequences like "\u00ed" to proper UTF-8 encoded characters?
I'm not particularly happy with that solution, but I have a deadline. So if someone has a better way to do this, please share it with me.
I hope that helps some people in the same situation. Despite its ugliness, it works.
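For reference, one minimal way to do the replacement described above; this is only a sketch of the general idea, handling plain BMP characters like these and not surrogate pairs or escaped backslashes:
$stored = '{"produ\u00e7\u00e3o":"homologa\u00e7\u00e3o"}';   // literal string as it came back from the db
$withChars = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($m) {
    // turn the 4 hex digits into UTF-16BE bytes, then into a UTF-8 character
    return mb_convert_encoding(pack('H*', $m[1]), 'UTF-8', 'UTF-16BE');
}, $stored);
// $withChars is now {"produção":"homologação"} in UTF-8; utf8_decode() on it gives the latin1 version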
I've got an ISO string that I fetch from the database, and when I utf8_encode it, I get \u00f6 instead of Ö. This confuses the JavaScript/HTML that AJAXes this PHP script. Why is there \u00f6 instead of Ö? How do I get Ö instead?
edit:
Ok, I did some more experimenting and it turns out this is caused by the combination of utf8_encode and json_encode. Though if I don't utf8_encode at all, the value will be null in the JSON.
json_encode(array("city"=>utf8_encode("göteborg")))
utf8_encode doesn't encode characters to \uxxxx, as you figured out yourself it's json_encode doing this. And that's fine, because the JSON format specifies this behavior. If your client properly decodes the JSON string into a Javascript data type, the \uxxxx escapes will be turned into proper Unicode characters.
As for json_encode discarding characters if your string is Latin1 encoded: It's not explicitly stated on the manual page, but Javascript and JSON are entirely Unicode based, so I suspect Latin1 is an invalid and unexpected encoding to use with JSON strings, so it breaks.
How do you print that? JavaScript natively supports the \uXXXX encoding, and doing this in JavaScript:
var x = "\u00f6"; alert(x);
should print out a small ö.
EDIT: According to your code, if you output that directly to the response stream and use the actual response as a variable in js on the client side, you shouldn't care about json_encode at all.
You would just tell the browser that the content is utf8 by setting the content-type header:
header('Content-Type: text/plain; charset=utf-8');
And then the jQuery.data() code would work just fine.
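A minimal sketch of that variant (the latin1 literal below just stands in for the value fetched from the ISO-encoded database):
$city = "g\xf6teborg";                               // latin1 bytes, as they come from the db
header('Content-Type: text/plain; charset=utf-8');
echo utf8_encode($city);                             // raw UTF-8 in the response body, no json_encode needed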
I'm developing a dependent select script using jQuery, PHP and JSON as the response.
Everything goes well except when using special characters like the French ones (é, è, à...).
If I pre-encode them as HTML entities (&eacute;, &egrave;, &agrave;) it works, but when rendered with jQuery the characters are not converted to what they should look like (é...); instead they are shown as-is (&eacute;).
If I write them literally (é) and don't pre-encode them, the full value in this array entry is not shown.
What should I do here?
Thanks.
If I write them literally (é) and don't pre-encode them, the full value in this array entry is not shown. What should I do here?
In JSON you do not HTML-encode values. You send them literally (é) and set the Content-Type correctly:
header('Content-Type: application/json; Charset=UTF-8');
Declare the encoding your data is in, of course.
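A minimal sketch of that, assuming the PHP source file itself is saved as UTF-8 so the literals are valid UTF-8 (the option values are placeholders for whatever the dependent select returns):
header('Content-Type: application/json; charset=UTF-8');
echo json_encode(array('options' => array('é', 'è', 'à')));
// any \u00e9-style escapes in the output are turned back into é, è, à by the client-side JSON parser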
This worked for me, hopefully it will work for anyone else experiencing similar issues.
$title = '&eacute;'; // an HTML-entity-encoded value
$title = mb_convert_encoding($title, "UTF-8", "HTML-ENTITIES"); // now a literal é in UTF-8
header('Content-Type: application/json; charset=UTF-8');
echo json_encode(array('title' => $title));
The mb_convert_encoding function takes a value and converts it from (in this case) HTML-ENTITIES to UTF-8.
See here for more details on the function: http://php.net/manual/en/function.mb-convert-encoding.php
Just like the first answer:
Do you use a database? If yes, make sure the database table is declared as UTF-8.
How is the HTML page declared? It should be UTF-8.
Is the string in the PHP script file? If yes, make sure the file is saved in UTF-8 format.
You could also use utf8_encode (to send to HTML) and utf8_decode (to receive), but that is not the right way.
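For the database point, a minimal sketch of forcing the connection itself to UTF-8 (mysqli is just one option here, and the credentials are placeholders):
$db = new mysqli('localhost', 'user', 'pass', 'mydb'); // placeholder credentials
$db->set_charset('utf8');                              // data now travels over the connection as UTF-8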
I'm trying to decode a JSON feed containing some Cyrillic characters. Not all of the characters in the feed are Cyrillic though. I'm using json_decode, which works fine for everything else, but returns garbage when there are Cyrillic characters.
The results look like this: Деффачки
Any ideas?
Your page is being decoded as CP1252 when it's actually UTF-8. Set your headers properly.
>>> print u'Деффачки'.encode('cp1252').decode('utf-8')
Деффачки
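In PHP terms, "set your headers properly" boils down to declaring UTF-8 before echoing the decoded values, for example (a sketch, with the question's example string hard-coded as a JSON escape sequence):
header('Content-Type: text/html; charset=utf-8');                        // tell the browser the bytes are UTF-8
echo json_decode('"\u0414\u0435\u0444\u0444\u0430\u0447\u043a\u0438"');  // Деффачки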
If you cannot decode Unicode characters with json_decode, use addslashes() when calling json_encode. The problem comes from Unicode escapes starting with \ such as \u30d7.
$json_data = addslashes(json_encode($unicode_string_or_array));
@hermanschutte: Use the escape() function when sending data through JavaScript.