I am using AJAX to handle data containing Arabic characters, and everything works well: I can store Arabic characters in the database, retrieve them, and print them to the screen. My problem is that when I check the JavaScript console in Google Chrome to inspect the retrieved data, the Arabic characters are not shown. Instead, the data prints like this (this is just an example, not all of the data):
["\u0645\u062f\u064a\u0646\u0629","\u0645\u062f\u064a\u0646\u0629 \u062a\u0627\u0631\u064a\u062e\u064a\u0651\u0629","\u0634\u062e\u0635\u064a\u0651\]
I mean like this.
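JSON text is usually encoded as UTF-8, and an encoder is allowed to write any character as \u followed by 4 hexadecimal digits; many encoders (PHP's json_encode among them, by default) do this for every non-ASCII character.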
In your case, if you try to decode that string -- for example, with the first item of your array :
>>> str = "\u0645\u062f\u064a\u0646\u0629";
"مدينة"
I don't read arabic, but this looks like arabic to me :-)
Even if the JSON doesn't look good, it's not what matters : the important thing is that you get your original data back, once the JSON is decoded ; and, here, it seems you'll do.
To get the original, decoded, string in the browser's console (for debugging purposes, I suppose), you should be able to use the same JS library you are using in your application (if any), or the JSON.parse() function (I just tested this in Firefox's console, actually) :
>>> JSON.parse('"\u0645\u062f\u064a\u0646\u0629"');
"مدينة"
Of course, you'll have to write some code to actually output that decoded value to the browser's console (be it "by hand" or when getting the JSON back from your server); but since the browser's console is a debugging tool, that seems OK.
By default, the console, as a debugging tool, outputs the raw JSON string it gets from the server -- and, with JSON, special characters are encoded. There is nothing you can do about it (except decode the JSON string and display it yourself, if you need to).
If you want to output the decoded string to the console each time you get a result from your server, you'll have to call JSON.parse() each time you get a result from your server ; and then output it, probably using console.log().
Don't forget to remove that debugging code before distributing your application / uploading it to your production server, though.
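As a rough illustration of the debugging step described above, here is a minimal sketch. The rawResponse value and the logDecoded helper are made up for the example and are not part of the original application; the backslashes are doubled so that the \u escapes stay literal in the JS source, the way they arrive from the server.

```javascript
// Hypothetical raw response body, as it would arrive from the server.
// The doubled backslashes keep the \u escapes literal inside this JS source.
var rawResponse = '["\\u0645\\u062f\\u064a\\u0646\\u0629"]';

// Debug-only helper (hypothetical name): decode the JSON text,
// log the readable values, and return them.
function logDecoded(jsonText) {
  var decoded = JSON.parse(jsonText);
  console.log(decoded);
  return decoded;
}

var cities = logDecoded(rawResponse);
```

The logged array then holds the decoded strings rather than the escape sequences.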
The following problem occurs when I output something with PHP in JSON format and read it in my Android:
I have an & symbol in a string that occurs in the JSON code which isn't really displayed correctly. I'm sure it occurs with other symbols too but I haven't tested that.
I tried the following:
Raw "&" symbol:
Browser reads &, Android reads &
htmlentities("&"):
Browser reads &, Android reads &
htmlspecialchars("&"):
Browser reads &, Android reads &
html_entity_decode("&"):
Browser reads &, Android reads &
The last one is the desired result, but it's just wrong to decode something before it's even encoded. What am I doing wrong?
PS: The content is outputted in UTF-8, not sure what json_encode does with it, and read in UTF-8.
This is something I encounter often, and I usually end up resorting to the try and try again method until the data works. I figured SO would know what the best practice is in order to maintain the data and not mess up the json.
Let's assume the data I want to send is text data of the most annoying sort - special characters, &,<,", \n, \n\r, \t, +, etc.
Let's also assume I want to keep everything in utf8, and my mysql table is configured to be utf8. However, since PHP's utf8 support is lacking, this should be considered.
What encoding / escaping / htmlentities should I be doing from:
1) Sending JSON data from client JS to PHP via AJAX POST (anything different for GET?)
2) Decoding data in PHP and storing text string in mysql database (or store the escaped/encoded data? )
3) Retrieving data from MySQL DB in PHP and returned as JSON response to JS AJAX request
4) In a JSON response from our REST api
Whenever I use php/mysql/jquery to pass data back and forth, I end up using the following combination of encodings/escapings, and it seems to work well for me.
1) you don't need to do anything here, UNLESS you are sending a URL (I think this is only for GET requests) - but if you're sending a url you need to use encodeURIComponent(url), which will properly escape the &'s and special characters in the url (see more here).
2) Use mysqli and bound parameters, it will do all the escaping for you (read about it here)
3) I always use this when echoing data into an HTML file :
<?php
echo htmlspecialchars($string_to_escape, ENT_QUOTES, 'UTF-8', false);
?>
This will properly encode all special characters (the false is for "no double encoding"). Also make sure you have the proper UTF-8 meta tags at the top of your HTML pages.
4) Using json_encode should always escape your data properly, but I would use the code from #3 just to make sure. But you'll probably only need it if you're returning data with special characters in it.
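To make step 1 concrete, here is a small sketch of sending a JSON string as a form field with encodeURIComponent. The buildFormBody helper and the field name json are assumptions made for the example, not something taken from the question.

```javascript
// Hypothetical helper: build an application/x-www-form-urlencoded body,
// escaping every name and value with encodeURIComponent.
function buildFormBody(fields) {
  return Object.keys(fields).map(function (name) {
    return encodeURIComponent(name) + '=' + encodeURIComponent(fields[name]);
  }).join('&');
}

// Text of the "annoying" sort: &, <, quotes, newline, tab, plus sign.
var payload = JSON.stringify({ text: 'a & b < "c"\n\t+' });
var body = buildFormBody({ json: payload });
```

Because the & and + inside the payload are percent-encoded, the server should not mistake them for field separators or spaces.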
1) For sending JSON data to PHP you don't have to do anything special. JSON is just a serialized JavaScript 'variable'.
2) Use prepared statements! Do not try to decode/strip/alter the content with PHP.
3) Use the appropriate functions to escape data for JSON (I don't know if there's a builtin PHP function for that).
4) Same as 3.
My web application communicates with the server over JSON protocol. Before sending each JSON message from the web application, I run a hmac-sha1 function on it (on already encoded object) and insert the resulting HMAC into the header of JSON request.
On server side, I decode JSON message with PHP, extract the HMAC, unset() the HMAC from the object and then encode the object back into JSON and create a HMAC of it.
The HMACs match as long as I don't use characters like "ž, š, č". When I use those characters in the message, the HMACs don't match anymore.
In the web application I'm using jQuery.post() to transmit the already encoded JSON string.
If I send the data I got from the web application back to it in the JSON encoded reply, the application will display "ž, č, š" just nicely.
How can I make the HMACs match?
UPDATE:
This is only a problem on the latest versions of Firefox and Opera. It works fine on IE8 and Chrome. On the former browsers, the JSON string (before it is sent) is:
{"body":[{"name":"Žiga Kraljevič","email":"test#email.com","password":"secretpass"}],"header":{"apiID":"person-27jhfa83ha-js84sjj18dasjd","hmac":"e4259d6ef8f477c020d644409cc16dd9c42301e8"}}
While on the latter browsers (IE8 and Chrome, where it works) is the following:
{"body":[{"name":"\u017diga Kraljevi\u010d","email":"test#email.com","password":"secretpass"}],"header":{"apiID":"person-27jhfa83ha-js84sjj18dasjd","hmac":"e4e9e2d0d8d11728a2b4329ad6dacdb9409b1de1"}}
You're probably running into multiple issues. One may be that the character encoding used on the client differs from the one used on the server; it's worth ensuring they're the same (more about character encoding in Joel's excellent essay). Another is that there are multiple correct ways to encode the same data, and the two encoders may be choosing different ones. For instance, you can encode a " within a string as either \" or \u0022. Both are valid and equivalent, but the hashes won't match. Similarly, I'm a bit surprised you're not running into trouble even without accented characters, for instance with whitespace.
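A quick way to see the "equivalent but not identical" point is to compare two JSON texts that differ only in how one character is written. This is only an illustrative sketch; the variable names are invented for the example.

```javascript
// Two JSON texts for the same data: one escapes the character, one does not.
var escaped = '{"name":"\\u017diga"}'; // contains the six characters \u017d
var literal = '{"name":"Žiga"}';       // contains the character itself

// After decoding, the data is identical...
var sameData = JSON.parse(escaped).name === JSON.parse(literal).name;

// ...but the texts themselves differ, so any hash computed over them differs too.
var sameText = escaped === literal;
```

Here sameData is true while sameText is false, which is why hashing the re-encoded JSON on the server can disagree with the hash computed in the browser.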
What is your hmac-sha1 function, where's it from? If it is taking a JSON String as input then there's an implicit encode-to-bytes step going on here because SHA1 operates on bytes, not UTF-16 code units like JS String.
I would suspect that your JS function is using a “one code unit n per byte n” type of encoding, for easy calculation with tools like charCodeAt. This is effectively the same as if the character string input had been encoded to ISO-8859-1. Whereas if you are using encodeURIComponent or posting the raw characters via XMLHttpRequest, the implicit encoding there is UTF-8.
You could convert the String to UTF-8-bytes-stored-as-code-units format for the JS hmac-sha1 function, that might make it match PHP. There's a sneaky idiom to do this:
var utf8= unescape(encodeURIComponent(s));
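To show what that idiom produces, the sketch below applies it to a single character and lists the resulting code units. The variable names are made up for the example.

```javascript
var s = '\u017e'; // 'ž', a single UTF-16 code unit

// encodeURIComponent percent-encodes the UTF-8 bytes ("%C5%BE");
// unescape then turns each %XX back into one code unit per byte.
var utf8 = unescape(encodeURIComponent(s));

// Collect the code units to inspect them.
var codes = [];
for (var i = 0; i < utf8.length; i++) {
  codes.push(utf8.charCodeAt(i));
}
```

The result has two code units, 0xC5 and 0xBE, which are the UTF-8 bytes of 'ž', so a byte-per-code-unit hash function sees the same bytes PHP does.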
When POSTing JSON I base64 and urlencode it anyway
URL-encoding should be enough (with encodeURIComponent, not escape which is the wrong thing for absolutely everything except the reverse step of the UTF-8-conversion trick above).
BTW, what's the purpose of this? You do know it doesn't in any way secure the connection between the browser and the server, yeah?
Edit:
I'm using jssha.sourceforge.net for sha1-hmac. In PHP I'm using hash_hmac.
Works for me:
var data= '\u017E, \u010D, \u0161'; // 'ž, č, š' in a Unicode string
var utf8bytes= unescape(encodeURIComponent(data));
var hmac= new jsSHA(utf8bytes).getHMAC('foo', 'ASCII', 'SHA-1', 'HEX');
alert(hmac); // 5d15f0b9...
var form= 'message='+encodeURIComponent(data)+'&hmac='+encodeURIComponent(hmac);
xmlhttprequest.send(form);
...
$utf8bytes= $_POST['message']; // "\xc5\xbe, \xc4\x8d, \xc5\xa1"
// which is 'ž, č, š' as UTF-8 in byte string
$hmac= hash_hmac('sha1', $utf8bytes, 'foo');
echo $hmac; // 5d15f0b9...
echo strtolower($hmac)===strtolower($_POST['hmac']); // true
This uses the binary ('ASCII' to jsSHA) key foo. If you are using a binary key with non-ASCII characters in it, you would have to make sure that those are properly encoded too, in the same way as the data.
The key for HMAC is a shared secret between the server and the client, which has been previously exchanged over a secure connection.
It's not only the key you'd have to send over a secure connection, but the entire page and all scripts in it. Otherwise a man in the middle attack could sabotage your scripts on the way to the browser to replace them with a version that used the secret key to sign bogus messages. If you've got an HTTPS server for all this stuff, fine. I'm not sure what the HMAC would be doing in that case though, it seems a bit involved for an anti-XSRF scheme.
I'm having some strange issues with decoding an XML snippet, contained with a cookie, with PHP's base64_decode function:
In our PHPUnit tests, we can decode the XML and echo it out to the console and it prints XML as you would expect (all unit tests pass as well).
As soon as we try running the same code in the browser, the decoded XML appears to contain loads of UTF-16 characters interspersed with fragments of the expected XML tags. For example:
<CreateSession\u000f\u0013Y...
As you might then expect, we end up with an Exception: String could not be parsed as XML... error when passing this string to the SimpleXMLElement constructor.
Some further info:
The XML itself comes from an external login system and we don't have any control over its format; it doesn't come with any <?xml...?> declaration and the root node is this <CreateSession>...</CreateSession> tag.
I've checked the character encoding of the page being served and have verified that it is UTF-8.
The site being developed is using Drupal
We tried passing the XML / UTF-16 string through Drupal's drupal_convert_to_utf8 function, but this just returns the Chinese (I think) symbols e.g. 敲
Has anyone come across anything like this before or have any idea what might be causing this?
Aha! It turns out that, when run in the browser, the cookie values were automatically URL decoded by PHP, meaning that any '+' in the base64 encoded text were being replaced by spaces. Adding this line of code before calling base64_decode fixed things:
$tmp = str_replace(' ', '+', $value);
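The same effect can be reproduced outside PHP. The sketch below is a JavaScript analogue of the fix, assuming a Node.js environment for Buffer; restoreBase64 is a hypothetical helper and the sample value is invented so that its base64 form contains a '+'.

```javascript
// Hypothetical helper mirroring the str_replace() fix: put back the '+'
// characters that URL decoding turned into spaces.
function restoreBase64(value) {
  return value.replace(/ /g, '+');
}

// A sample whose base64 form ends in '+'.
var original = Buffer.from('>>>', 'utf8').toString('base64');

// What URL decoding of the cookie value would leave behind.
var mangled = original.replace(/\+/g, ' ');

var repaired = restoreBase64(mangled);
var decoded = Buffer.from(repaired, 'base64').toString('utf8');
```

This only works because a space is never a valid base64 character, so every space in the value must have been a '+' originally.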
I'm working on a project in PHP (5.3.1) where I need to send a JSON string to a web service (in Python), but the result I get from json_encode does not pass as valid JSON (I'm using JSLint to check validity).
I should add that the structure I'm trying to encode is fairly big (13K encoded) and consists partially of UTF-8 data. While json_encode does handle it, I get spaces in weird places in the result. For example, I could get {"hello":tru e} or {"hell o":true}, which results in an error from the web service since the JSON is invalid (or the data is, as in the second example).
I've also tried using Zend Framework for JSON encoding, but that didn't make much difference.
Is there a known issue with JSON in PHP? Did anyone encounter that behavior and find a solution?
You state that "the structure I'm trying to encode ... consists partially of UTF8 data." This implies that it is also partially of non-UTF8 data. The json_encode doc has a comment at the bottom, that
json_encode() expects strings to be encoded to be in UTF8 format, while by default PHP strings are ISO-8859-1 encoded.
This means that
json_encode(array('àü'));
will produce a json representation of an empty string, while
json_encode(array(utf8_encode('àü')));
will work.
Are the failing segments of the JSON due to non-UTF8 input?
For sure, object keys must be quoted strings. Unquoted values can only be booleans, null, numbers, objects and arrays; string values should always be quoted.
Also, I would recommend that you add the correct header before your JSON output.
if(!headers_sent())
header('Content-Type: application/json; charset=utf-8', true,200);
Can you also post the array or object that you are passing to json_encode?
I was handling some automatically generated emails the other day and noticed the same weird behavior (spaces were inserted to the email body), so I started to check the email post and found the culprit:
From the SMTP RFC2821:
The maximum total length of a text line including the <CRLF> is 1000 characters (not counting the leading dot duplicated for transparency).
My email body was indeed in one line, so breaking it with \n's fixed the spaces issue.
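If you want to check a message body for this before sending it, something along these lines could flag the offending lines. It is only a sketch: findLongLines and MAX_LINE are hypothetical names, and the 998 figure is simply the quoted 1000-character limit minus the two characters of the CRLF.

```javascript
// 1000 characters per line including CRLF leaves 998 for the text itself.
var MAX_LINE = 998;

// Hypothetical helper: return the indexes of lines that exceed the limit.
function findLongLines(body) {
  var longLines = [];
  body.split(/\r?\n/).forEach(function (line, index) {
    if (line.length > MAX_LINE) {
      longLines.push(index);
    }
  });
  return longLines;
}

var oneLine = new Array(2001).join('x'); // 2000 characters, no line breaks
var offenders = findLongLines(oneLine + '\nshort line');
```

Here offenders contains only index 0, the 2000-character line; a large JSON document mailed as a single line would be flagged the same way.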
After scratching my head for nearly a day, I've come to the conclusion that the problem was not in the json_encode function. It was with my post function.
Basically, the json_encode was preparing the data to be sent to another service. Before today, I've used stream_context_create and fopen to post data to the external service, but now I use fsockopen and fputs and it seems to be working.
Although I'm unsure as to the nature of the problem, I'm happy it works now :)
BTW: After this process, I mail myself the input and output (both in JSON) and this is how I saw there was a problem in the first place. This problem still persists but I guess that's related to the encoding of the mail or something of that sort.