How is json decoding of utf emoji possible? - php

As far as I can recall, valid JSON data comes in this format: '{"key":"value"}'.
But while surfing, I found an article about sending UTF-8 codes as emoji.
The emoji was stored in a variable as
$emoji = "\ud83d\udc4e";. For it to work properly, the answer was to use json_decode($emoji);. I tried it out and it returned a thumbs-down emoji. I was expecting NULL, but it turns out that it was valid JSON data. So I'm confused about how that is possible.
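One possible explanation, as a minimal sketch: a JSON text does not have to be an object. A bare string in double quotes is also a valid JSON value, and \ud83d\udc4e is the JSON escape (a UTF-16 surrogate pair) for U+1F44E. The sketch assumes the string passed to json_decode() really contains the surrounding double quotes, which the snippet as quoted above does not show; the variable names are illustrative.

```php
<?php
// Single quotes keep the backslashes literal, so json_decode() sees the
// JSON escape sequences. The inner double quotes make this a JSON string
// (assumption: the string in the original article included them).
$json = '"\ud83d\udc4e"';

// Decodes the surrogate pair into the UTF-8 bytes of U+1F44E.
$emoji = json_decode($json);

// Without the double quotes the text is not valid JSON, so the result is NULL.
$bare = json_decode('\ud83d\udc4e');
$bareError = json_last_error();

var_dump(bin2hex($emoji), $bare);
```

If the quotes are missing, json_decode() returns NULL, which matches what the question expected.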

Related

Converting Hebrew characters to UTF-8 using PHP

I have a MySQL field value with a JSON object containing Hebrew characters like this:
[{"name":"אספנות ואומנות","value":1,"target":null},{"name":"אופניים","value":2,"target":null}]
(the ones in the name field)
This field's output is giving me some trouble with a certain web interface.
So, looking around in the database, I found another field containing a JSON object, and its output works fine.
[{"name":"\u05d0\u05e1\u05e4\u05e0\u05d5\u05ea \u05d5\u05d0\u05d5\u05de\u05e0\u05d5\u05ea","value":1,"target":null},{"name":"\u05d0\u05d5\u05e4\u05e0\u05d9\u05d9\u05dd","value":2,"target":null}]
So I would like to convert the first field to this encoding to see if it solves the output issue.
What is this encoding? Is it UTF-8? How can I convert it using PHP?
I tried to isolate the value and convert it to UTF-8 using
echo iconv("Windows-1255","UTF-8",'אספנות ואומנות');
but it's just returning an empty value.
Any help would be great.
So, in PHP
json_encode('אספנות ואומנות');
did the trick
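To illustrate why this works: the \u05d0 form is not a separate character encoding but JSON's escape notation for Unicode code points, which json_encode() emits by default for non-ASCII characters. The sketch below assumes the PHP source file and the stored field are already UTF-8 (which would also be a plausible reason the iconv() call from Windows-1255 produced nothing useful); the variable names are illustrative.

```php
<?php
$name = 'אופניים';   // assumed to be UTF-8 already

// Default behaviour: non-ASCII characters become \uXXXX escapes.
$escaped = json_encode($name);

// JSON_UNESCAPED_UNICODE (PHP 5.4+) keeps the raw UTF-8 characters instead.
$unescaped = json_encode($name, JSON_UNESCAPED_UNICODE);

// Both forms decode to the identical UTF-8 string.
$same = json_decode($escaped) === json_decode($unescaped);

// To rewrite a whole stored field, decode it and encode it again.
$field = '[{"name":"אופניים","value":2,"target":null}]';
$rewritten = json_encode(json_decode($field, true));

echo $rewritten, "\n";
```

Decoding and re-encoding the whole field avoids having to isolate individual values by hand.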

PHP json_decode returns null

I'm writing PHP code that uses a database. To do so, I use an array as a hash-map.
Every time content is added or removed from my DB, I save it to file.
I'm forced by my DB structure to use this method and can't use mysql or any other standard DB (School project, so structure stays as is).
I built two functions:
function saveDB($db){
    $json_db = json_encode($db);
    file_put_contents("wordsDB.json", $json_db);
} // saveDB
function loadDB(){
    $json_db = file_get_contents("wordsDB.json");
    return json_decode($json_db, true);
} // loadDB
When echoing the string I get after encoding or after loading from the file, I get valid JSON (tested it in a JSON viewer). Whenever I try to decode the string using json_decode(), I get null (tested it with var_dump()).
The json string itself is very long (~200,000 characters, and that's just for testing).
I tried the following:
Replacing single/double-quotes with double/single-quotes (Without any backslashes, with one backslash and three backslashes. And any combination I could think of with a different number of backslashes in the original and replaced string), both manually and using str_replace().
Adding quotes before and after the json string.
Changing the page's encoding.
Decoding without saving to file (Right after encoding).
Checked for slashes and backslashes. None to be found.
Tried addslashes().
Tried using various "Escape String" variants.
json_last_error() doesn't work. I get no error number (Get null, not 0).
It's not my server, so I'm not sure what PHP version is used, and I can't upgrade/downgrade/install anything.
I believe the size has something to do with it, because small strings seem to work fine.
Thanks Everybody :)
In your JSON file change null to "null" and it will solve the problem.
Check whether your file is UTF-8 encoded. json_decode works with UTF-8 encoded data only.
EDIT:
After I saw the uploaded JSON data, I did some digging and found that there is a 'null' key. Search for:
"exceeding":{"S01E01.html":{"2217":1}},null:{"S01E01.html":
Change that null to be valid property name and json_decode will do the job.
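As a rough sketch of how such a failure could be surfaced, the hypothetical helper below (decodeOrExplain is not part of the original code) wraps json_decode() and reports the decoder's own error message. It assumes PHP 5.5 or newer for json_last_error_msg(), and the sample strings are simplified stand-ins for the real data, not the actual file contents.

```php
<?php
// Hypothetical helper: decode JSON and report why decoding failed.
function decodeOrExplain($json)
{
    $data = json_decode($json, true);
    if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
        // json_last_error_msg() requires PHP 5.5 or newer.
        return array('ok' => false, 'error' => json_last_error_msg());
    }
    return array('ok' => true, 'data' => $data);
}

// Simplified stand-in: an unquoted null used as a property name.
$bad = decodeOrExplain('{"a":{"x":1},null:{"x":2}}');

// The same structure with the key quoted decodes normally.
$good = decodeOrExplain('{"a":{"x":1},"null":{"x":2}}');

var_dump($bad['ok'], $good['ok']);
```

Checking json_last_error() as well as the return value matters because the JSON text null also decodes to NULL without being an error.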
I had a similar problem last week. My JSON was valid according to jsonlint.com.
My JSON string contained a # and a &, and those two made json_decode fail and return null.
By using var_dump(json_decode($myvar)), which stops right where it fails, I managed to figure out where the problem was coming from.
I suggest var_dumping and using the find function to look for these kinds of characters.
Just on the off chance, and more for anyone hitting this thread than for the OP's issue: I missed the following. Someone had called htmlentities($json) way above me in the call stack. Just make sure you haven't been bitten by the same thing, and check the HTML source.
Kickself #124
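The htmlentities() pitfall mentioned above can be reproduced with a short sketch. The payload is made up for illustration, and the call to htmlentities() stands in for whatever earlier layer escaped the string.

```php
<?php
$json = '{"word":"hello","count":2}';   // illustrative payload

// Simulates an earlier layer escaping the payload for HTML output:
// every double quote becomes &quot;.
$escaped = htmlentities($json);

// The escaped text is no longer valid JSON, so decoding yields NULL.
$broken = json_decode($escaped, true);

// Undoing the HTML escaping restores a decodable string.
$restored = json_decode(html_entity_decode($escaped), true);

var_dump($escaped, $broken, $restored);
```

Viewing the raw HTML source, rather than the rendered page, shows whether &quot; entities are present in the string.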

UTF-8 data received by php isn't decoded

I'm having some trouble with my $_POST/$_REQUEST data: it appears to still be UTF-8 encoded.
I am sending conventional ajax post requests, in these conditions:
oXhr.setRequestHeader("Content-type", "application/x-www-form-urlencoded; charset=utf-8");
JS file saved in UTF-8 (no BOM) format
meta tags set up in the HTML <head> tag
PHP files saved in UTF-8 (no BOM) format as well
encodeURIComponent is used, but I tried without it and it gives the same result
OK, so everything is fine: the database is also in UTF-8 and receives the data this way, and pages display well.
But when I receive the character "º", for example (through $_REQUEST or $_POST), its binary representation is 11000010 10111010, while the binary representation of "º" hardcoded in PHP (utf8...) is 10111010 only.
wtf? I just don't know whether it is a good thing or not... For instance, if I use "#º#" as the delimiter of the PHP explode function, it won't get detected, and this is actually the problem which led me here.
Any help will be as usual greatly appreciated, thank you so much for your time.
Best rgds.
EDIT1: checking against mb_check_encoding
if (mb_check_encoding($_REQUEST[$i], 'UTF-8')) {
    // Single quotes so that $_REQUEST is printed literally, not interpolated.
    raise('$_REQUEST is encoded properly in utf8 at index ' . $i);
} else {
    raise(false);
}
The encoding was confirmed: the message was raised properly.
Single-byte UTF-8 characters do not have bit 7 (the eighth bit) set, so 10111010 is not UTF-8; your file is probably encoded in ISO-8859-1.
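The byte patterns from the question can be checked directly. The sketch below writes both forms of "º" as explicit byte escapes so it does not depend on how the PHP file itself is saved; it assumes the mbstring extension is available, as the question's own mb_check_encoding() call suggests.

```php
<?php
$utf8   = "\xC2\xBA";  // "º" (U+00BA) as UTF-8: 11000010 10111010
$latin1 = "\xBA";      // "º" as ISO-8859-1:     10111010

$utf8Valid   = mb_check_encoding($utf8, 'UTF-8');    // valid two-byte sequence
$latin1Valid = mb_check_encoding($latin1, 'UTF-8');  // lone continuation byte

// Converting the single-byte form produces the two-byte UTF-8 form.
$converted = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// explode() compares raw bytes, so the delimiter must use the same
// encoding as the incoming data.
$data  = "left#" . $utf8 . "#right";
$parts = explode("#" . $utf8 . "#", $data);    // delimiter found
$miss  = explode("#" . $latin1 . "#", $data);  // delimiter not found

var_dump($parts, $miss);
```

Re-saving the PHP file as UTF-8 would make the hardcoded delimiter match the two-byte form that arrives in $_POST.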

How to properly encode UTF-8 for JavaScript and JSON?

I have a problem creating an input validation hash. JavaScript submits data to the API, and the API validates the sent data with json_encode. Basically it works like this:
$input=array('name'=>'John Doe','city'=>'New York');
$validationHash=sha1(json_encode($input).$key); // Key is known to both servers
If PHP connects to another server, then everything works. It also works through JavaScript (I have a custom sha1() function there as well):
var validationHash=sha1(JSON.stringify({'name':'John Doe','city':'New York'})+key);
My problem comes when the string contains UTF-8 characters. For example, if one of the values is:
Ränisipelgasöösel
Then the PHP server that receives the command converts it to this after JSON encoding:
R\u00e4nisipelgas\u00f6\u00f6sel
I need to do this in JavaScript as well, but I haven't been able to work out how. I need to make sure that I send the proper validation hash to the server, or the command fails. I found via Google that unescape(encodeURIComponent(string)) and decodeURIComponent() could be used, but neither gives me the same string that PHP has and validates with.
UTF-8 is used on both client and server.
Any ideas?
It does not seem to be possible. The only working solution I have found is to encode all data with encodeURIComponent() on browser side and with rawurlencode() on PHP side and then calculate the JSON from these values in arrays.
My fix was to raw url encode my json data like so.
rawurlencode( json_encode( $data ) );
And then from within javascript decode the raw url encoded json and then parse the json string like so.
JSON.parse( decodeURIComponent( data ) );
Hope this helps.
Why not base64 encode the data for safe transport? It encodes UTF-8 characters into a safe string that can be sent across different mediums (PHP, JavaScript, etc.). That way you can base64 decode the string at the receiving end. Voila!
By base64 encoding the data, I mean base64 encoding the values and not the whole JSON string.
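Another option, sketched here on the PHP side only, is to stop PHP from producing the \u00e4 escapes so that its output has the same shape as what JSON.stringify() typically produces. This assumes PHP 5.4 or newer for the JSON_UNESCAPED_* flags, that the JavaScript sha1() function hashes the UTF-8 bytes of the string, and that both sides emit keys in the same order; the key value is hypothetical.

```php
<?php
$input = array('name' => 'Ränisipelgasöösel', 'city' => 'New York');

// Default output escapes non-ASCII characters as \uXXXX.
$default = json_encode($input);

// Keep raw UTF-8 and unescaped slashes, closer to JSON.stringify() output.
$raw = json_encode($input, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);

$key = 'shared-secret';  // hypothetical key known to both sides
$validationHash = sha1($raw . $key);

echo $default, "\n", $raw, "\n";
```

Whether the two hashes then agree still depends on the JavaScript side hashing bytes rather than UTF-16 code units, so this is worth testing with a non-ASCII value before relying on it.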
Is your HTML page encoded as UTF-8?
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

more specific about sending json with utf8 to iphone from php server

We have a PHP server that sends a JSON string in UTF-8 encoding.
I'm responsible for the iPhone app that gets the data.
I want to be sure that everything is correct on my side:
// after I download the data stream:
NSString* content = [[NSString alloc] initWithData:self.m_dataToParse encoding:NSUTF8StringEncoding];
// here the data is shown correctly in the console
NSLog(@"%@", content);
SBJsonParser *_parser = [[SBJsonParser alloc] init];
NSDictionary *jsonContentDictionary = [_parser objectWithData:self.m_dataToParse];
// here, when I print the values of an array inside an array, I see the \u454 u\545 \4545 format. Any ideas why?
for (id key in jsonContentDictionary)
{
    NSLog(@"key:%@, value:%@", key, [jsonContentDictionary objectForKey:key]);
}
I'm using the latest version of the JSON library:
https://github.com/stig/json-framework/
Is the problem on the iPhone side (the JSON parser?) or in the PHP server?
Just to be clear again:
1. On the console, before JSON parsing, the string looks OK.
2. After JSON parsing, the array-in-array values are in the format \u545 \u453 \u545.
Thanks in advance.
Your code is correct.
A possible reason for the issue, which you must investigate with your content provider (the server that sends the JSON to you), is this: even if the whole JSON string is correctly encoded as UTF-8 (remember: the JSON text is a sequence of characters, so an encoding must be specified), some or all of the text content (that is, the values of the individual objects contained in the JSON message) may originally have been encoded in another format. Typically this is the encoding of an HTML page (ISO-8859), especially when particular characters are used (e.g. Cyrillic or Asian). The JSON framework decodes all data as UTF-8 by default, but if there is a coding mismatch between the UTF-8 characters and the ISO-8859 ones (to stay with the example), then the only way to represent them in UTF-8 is to use the \u format. This happens quite often, especially when PHP scripts extract the info from HTML pages, which are usually encoded using ISO-8859. Consider also that iOS is not able to convert the whole set of ISO-8859 characters to Unicode (e.g. Cyrillic).
So possible solutions are:
- do a content encoding of texts server side (iso-8859 --> utf-8)
- or if this is not possible, then it's up to you to recognize the \uxxx sequences coming more often from your content provider and replace them with the corresponding utf-8 characters.
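The first option (converting text on the server before it is put into the JSON) might look roughly like the sketch below. It assumes the source text really is ISO-8859-1 and that the mbstring extension is available; the helper name and the 'title' key are hypothetical.

```php
<?php
// Hypothetical helper: convert ISO-8859-1 text to UTF-8, then encode it.
function encodeLatin1AsJson($text)
{
    $utf8 = mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
    // JSON_UNESCAPED_UNICODE (PHP 5.4+) sends raw UTF-8 instead of \uXXXX.
    return json_encode(array('title' => $utf8), JSON_UNESCAPED_UNICODE);
}

$latin1 = "caf\xE9";  // "café" as ISO-8859-1 bytes

// Encoding the unconverted bytes fails, because they are not valid UTF-8.
$failed = json_encode(array('title' => $latin1));

// After conversion the value is encoded as ordinary UTF-8 text.
$json = encodeLatin1AsJson($latin1);

var_dump($failed, $json);
```

If the server already holds UTF-8 text, this conversion must be skipped, since converting twice would corrupt the characters.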
