I receive a JSON response from another script. I then call $json = json_decode($json) and die($json['message']) to show a specific string; this value contains Cyrillic text.
The function mb_detect_encoding() reports that the string is UTF-8.
I use charset="utf-8" in the HTML file, but
I see this output "Пользователь с этим адресом электронной почты уже существует" in my browser.
I used mb_convert_encoding($json['message'], 'UTF-8'), without any effect.
Only var_dump($json) shows me the decoded string.
Maybe I am accessing the data in the JSON incorrectly?
Use mb_convert_encoding($json['message'], "utf-8", "windows-1251"); to convert the string properly.
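A minimal sketch of the access question raised above, under stated assumptions: the payload is hypothetical, and windows-1251 is only assumed as the source encoding when the bytes are not already valid UTF-8. Note that json_decode() without a second argument returns an object, so array-style access needs json_decode($raw, true).

```php
<?php
// Hypothetical payload standing in for the other script's response.
$raw = '{"message":"\u041f\u0440\u0438\u0432\u0435\u0442"}';

$asObject = json_decode($raw);        // stdClass: read it with ->message
$asArray  = json_decode($raw, true);  // associative array: read it with ['message']

$message = $asArray['message'];

// Strings produced by json_decode() are UTF-8, so this check is expected to pass.
$isUtf8 = mb_check_encoding($message, 'UTF-8');

// Convert only when the bytes are NOT valid UTF-8 (assumed source: windows-1251).
if (!$isUtf8) {
    $message = mb_convert_encoding($message, 'UTF-8', 'Windows-1251');
}

// In a web context, declare the charset before any output, for example:
// header('Content-Type: text/html; charset=utf-8');
echo $message, PHP_EOL;
```

If the string is already UTF-8 and still displays as garbage, the page is probably being served or read with a different charset, so the HTTP header is worth checking before converting anything.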
Related
I receive some generated JSON files, and some of them contain the ™ symbol. If a file has that symbol, json_decode fails on it: when I print $data, NULL is printed. If I manually remove the symbol, I see the data.
I am using the code below, and it prints what is in each JSON file until it reaches a file that has the ™ symbol
$json = file_get_contents($count . '.json');
$data = json_decode($json);
echo '<pre>';
var_dump($data);
echo '</pre>';
I have tried using urlencode, urldecode and htmlspecialchars, but they don't work either.
json_decode will only parse UTF-8 strings.
If the file you are reading is not in UTF-8 format, it will fail.
If you do not know the encoding of the file you will be reading,
there are ways to convert the data to UTF-8 before parsing it, as shown in this post:
PHP: Convert any string to UTF-8 without knowing the original character set, or at least try
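As a rough sketch of converting before parsing: the helper name decode_json_any_encoding and the fallback encoding Windows-1252 are assumptions made for illustration, not something the question specifies. The idea is to check whether the bytes are valid UTF-8, convert them if not, and report the JSON error instead of silently getting NULL.

```php
<?php
// Hypothetical helper: converts the input to UTF-8 if needed, then decodes it.
// $assumedEncoding is a guess (Windows-1252 here); it is not detected reliably.
function decode_json_any_encoding(string $json, string $assumedEncoding = 'Windows-1252')
{
    if (!mb_check_encoding($json, 'UTF-8')) {
        $json = mb_convert_encoding($json, 'UTF-8', $assumedEncoding);
    }

    $data = json_decode($json, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('JSON error: ' . json_last_error_msg());
    }
    return $data;
}

// "\x99" is the trademark sign in Windows-1252; on its own it is not valid UTF-8.
$bytes = "{\"name\":\"Acme\x99\"}";

var_dump(json_decode($bytes));             // NULL, because the input is not UTF-8
$data = decode_json_any_encoding($bytes);
echo $data['name'], PHP_EOL;               // the name, now with a UTF-8 trademark sign
```

Calling json_last_error_msg() right after a failed json_decode() is usually the quickest way to confirm that the encoding is the problem.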
I'm parsing a page using PHP cURL and extracting data with DOMDocument. The parsed page is encoded in UTF-8.
Then I write the data to the database. But instead of
музыка
it is writing ASCII codes like this:
&#1052;&#1091;&#1079;&#1099;&#1082;&#1072;
I've tried iconv(), mb_convert_encoding(), and utf8_encode(), but I still get the same result. strlen() returns the length of the encoded string.
How do I convert this to normal text?
<?php
$string = "&#1052;&#1091;&#1079;&#1099;&#1082;&#1072;";
echo html_entity_decode($string, ENT_NOQUOTES, 'UTF-8');
prints:
Музыка
I have a text coming from a CSV full of accent marks.
I check if mb_check_encoding($my_text, 'utf-8') is true, and yes, it is.
From this text I build a variable $json, to which I apply
json_encode($json,JSON_NUMERIC_CHECK);
var_dump($json)
gives an array of arrays with all the accented characters correct (é, ì, etc.), but the generated JSON text is incorrect (e.g. "Donn\u00e9es" instead of "Données").
I know that json_encode only works correctly with UTF-8 encoded data, which is why I checked beforehand that the text was UTF-8.
I also tried adding header("Content-type: application/json; charset=UTF-8"); without success.
What could be the reason for that?
This is how json_encode() represents "strange marks", i.e. non-ASCII Unicode characters, by default: as \uXXXX escape sequences, which are valid JSON. When you use json_decode() on your JSON-encoded string, it will return to normal.
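A short sketch of the round trip described above. The JSON_UNESCAPED_UNICODE flag is not mentioned in the answer; it is shown here only as an option, on the assumption that the literal characters are wanted in the output.

```php
<?php
$text = 'Données';

// Default behaviour: non-ASCII characters are written as \uXXXX escapes.
$escaped = json_encode($text);

// Optional flag (PHP 5.4+): keep the characters as literal UTF-8.
$unescaped = json_encode($text, JSON_UNESCAPED_UNICODE);

echo $escaped, PHP_EOL;    // "Donn\u00e9es"
echo $unescaped, PHP_EOL;  // "Données"

// Both forms are valid JSON and decode to the same PHP string.
var_dump(json_decode($escaped) === $text);    // bool(true)
var_dump(json_decode($unescaped) === $text);  // bool(true)
```

Either form should be accepted by any conforming JSON parser, so the escaped output is not an error in itself.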
I've got a string in my database like 中华武魂. When I post a request to retrieve the data via my website, the data reaches the server in the format %E4%B8%AD%E5%8D%8E%E6%AD%A6%E9%AD%82
What decoding steps do I have to take in order to get it back into a usable form,
while also cleaning the user input to ensure they're not going to try an SQL injection attack?
(Escape the string before or after encoding?)
EDIT:
rawurldecode(); // returns "ä¸åŽæ¦é‚"
urldecode(); // returns "ä¸åŽæ¦é‚"
public function utf8_urldecode($str) {
    $str = preg_replace("/%u([0-9a-f]{3,4})/i","&#x\\1;",urldecode($str));
    return html_entity_decode($str,ENT_NOQUOTES,'UTF-8');
}
// returns "ä¸åŽæ¦é‚"
... which actually works when I try and use it in an SQL statement.
I think the problem was that I was doing an echo and die(); without specifying a UTF-8 header (so I guess the output was being read as Latin).
Thanks for the help!
When your data is actually that percent-encoded form, you just have to call rawurldecode:
$data = '%E4%B8%AD%E5%8D%8E%E6%AD%A6%E9%AD%82';
$str = rawurldecode($data);
This suffices as the data already is encoded in UTF-8: 中 (U+4E2D) is encoded with the byte sequence 0xE4B8AD in UTF-8 and that is encoded with %E4%B8%AD when using the percent-encoding.
That your output does not look as expected is probably because it is being interpreted with the wrong character encoding, probably Windows-1252 instead of UTF-8. In Windows-1252, 0xE4 represents ä, 0xB8 represents ¸, 0xAD is the invisible soft hyphen, 0xE5 represents å, and so on. So make sure to specify the output character encoding properly.
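To make the byte-level explanation concrete, here is a small sketch that decodes the percent-encoded value and inspects the resulting bytes. The header() call is an assumption about how the output is served and is skipped when the script runs on the command line.

```php
<?php
$data = '%E4%B8%AD%E5%8D%8E%E6%AD%A6%E9%AD%82';
$str  = rawurldecode($data);

// The decoded bytes are exactly the UTF-8 encoding of the four characters.
echo bin2hex($str), PHP_EOL;  // e4b8ade58d8ee6ada6e9ad82

// preg_match() with the /u modifier only succeeds on valid UTF-8 input.
$isValidUtf8 = preg_match('//u', $str) === 1;
var_dump($isValidUtf8);       // bool(true)

// In a web context, tell the browser how to interpret the bytes.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
echo $str, PHP_EOL;
```

If the hex dump matches but the browser still shows ä¸ and similar characters, the data is fine and only the declared output encoding is wrong.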
Use PHP's urldecode:
http://php.net/manual/en/function.urldecode.php
You have choices here: urldecode or rawurldecode.
If you encoded your string using urlencode, you must use urldecode because of the way spaces are handled: urlencode converts spaces to +, whereas rawurlencode converts them to %20, and rawurldecode leaves a + untouched.
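The difference in space handling can be checked with a few calls. The sample value is arbitrary and only serves as an illustration.

```php
<?php
$value = 'a b+c';

$formEncoded = urlencode($value);     // space becomes "+", "+" becomes "%2B"
$rawEncoded  = rawurlencode($value);  // space becomes "%20", "+" becomes "%2B"

echo $formEncoded, PHP_EOL;           // a+b%2Bc
echo $rawEncoded, PHP_EOL;            // a%20b%2Bc

// Decoding: only urldecode() turns "+" back into a space.
echo urldecode('a+b'), PHP_EOL;       // a b
echo rawurldecode('a+b'), PHP_EOL;    // a+b
```

Both decoders handle %XX sequences identically, so for a value such as %E4%B8%AD the choice only matters if the input may contain a literal +.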
I am pulling data from the Facebook Graph API, which has characters encoded like so: \u2014 and \u2014
Is there a function to convert those characters into HTML? i.e. \u2014 -> —
If you have some further reading on these character codes, or suggested reading about Unicode in general, I would appreciate it. This is so confusing to me. I don't know what to call these codes... I guess Unicode, but Unicode seems to mean a whole lot of things.
That's not entirely true, bobince.
How do you handle JSON containing Spanish accents?
There are two problems.
I call FB.api(url, function(response)
... var s=JSON.stringify(response);
and pass it to a PHP script via $.post
First, I get a truncated string. I need escape(JSON.stringify(response))
Then I get a full JSON-encoded string with Spanish accents.
As a test, I place it in a text file that I load with file_get_contents, apply PHP's json_decode, and get nothing.
You first need utf8_encode.
And then you get the object you were waiting for.
After a full day of test and google without any result when decoding unicode properly, I found your post.
So many thanks to you.
Someone asked me to solve the problem of Arabic text in the Facebook JSON archive. Maybe this code helps someone who is trying to read Arabic text from Facebook (or Instagram) JSON:
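$str = '\u00d8\u00ae\u00d9\u0084\u00d8\u00b5';
function decode_encoded_utf8($string){
    // Replace each literal \uXXXX escape with the UTF-8 encoding of that code point.
    return preg_replace_callback('#\\\\u([0-9a-f]{4})#i', function($matches) {
        return mb_convert_encoding(pack("H*", $matches[1]), "UTF-8", "UCS-2BE");
    }, $string);
}
// Each code point here is really one byte of the original UTF-8 text, so map them back to single bytes.
echo iconv("UTF-8", "ISO-8859-1//TRANSLIT", decode_encoded_utf8($str));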
Facebook Graph API returns JSON objects. Use json_decode() to read them into PHP and you do not have to worry about handling string literal escapes like \uNNNN. Don't try to decode JSON/JavaScript string literals by yourself, or extract chosen properties using regex.
Having read the string value, you'll have a UTF-8-encoded string. If your target HTML is also UTF-8-encoded, you don't need to replace — (U+2014) with any entity reference. Just use htmlspecialchars() on the string when outputting it, so that any < or & characters in the string are properly encoded.
If you do for some reason need to produce ASCII-safe HTML, use htmlentities() with the charset arg set to 'utf-8'.
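A minimal sketch of that flow, assuming a made-up payload (a real Graph API response has many more fields): decode with json_decode(), then escape at output time with htmlspecialchars(), or with htmlentities() when ASCII-safe markup is wanted.

```php
<?php
// Hypothetical payload standing in for a Graph API response.
$json = '{"message":"Caf\u00e9 \u2014 <b>bold</b> & more"}';

$data = json_decode($json, true);
if (!is_array($data)) {
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

// json_decode() has already resolved the \uNNNN escapes into UTF-8 characters.
$message = $data['message'];

// For a UTF-8 page: escape only the HTML-significant characters.
$forUtf8Page = htmlspecialchars($message, ENT_QUOTES, 'UTF-8');

// For ASCII-safe output: also replace characters that have named entities.
$asciiSafe = htmlentities($message, ENT_QUOTES, 'UTF-8');

echo $forUtf8Page, PHP_EOL;
echo $asciiSafe, PHP_EOL;
```

Note that htmlentities() only replaces characters that have a named entity, so text in scripts without named entities (Cyrillic or Arabic, for instance) stays as UTF-8 bytes.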