Encoding gone Wrong (PHP => JSON => Android) - php

The following problem occurs when I output something with PHP in JSON format and read it in my Android:
I have an & symbol in a string that occurs in the JSON code which isn't really displayed correctly. I'm sure it occurs with other symbols too but I haven't tested that.
I tried the following:
Raw "&" symbol:
Browser reads &, Android reads &
htmlentities("&"):
Browser reads &, Android reads &
htmlspecialchars("&"):
Browser reads &, Android reads &
html_entity_decode("&"):
Browser reads &, Android reads &
The last one is the desired result, but it's just wrong to decode something before it's even encoded.. What am I doing wrong??
PS: The content is outputted in UTF-8, not sure what json_encode does with it, and read in UTF-8.
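One way to narrow this down is to check which layer actually touches the "&". The sketch below is in Python 3 rather than PHP and uses a made-up sample string; it only illustrates that JSON serialization leaves "&" alone, while HTML escaping is a separate layer that stacks if it is applied twice. A browser rendering HTML undoes one level of escaping, whereas a client that reads the JSON as plain text does not. Whether that is the cause in this particular setup is an assumption.

```python
import html
import json

raw = "Tom & Jerry"  # hypothetical sample value

# JSON gives "&" no special meaning, so it passes through unchanged.
as_json = json.dumps({"title": raw})

# HTML escaping is a separate layer; applying it twice stacks the entities.
escaped_once = html.escape(raw)            # Tom &amp; Jerry
escaped_twice = html.escape(escaped_once)  # Tom &amp;amp; Jerry

# Undoing one level of escaping only removes one layer.
unescaped_once = html.unescape(escaped_twice)
```

If the value already contains an entity before it reaches the JSON step, the place to look is whatever stored or escaped it earlier, not the JSON encoder.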

Related

Convert characters to decoded html then back to encoded on submitted to page

I have special characters like # & and so on. What was suggested is I html decode these characters before passing them to my PHP image generator (a font previewer like things remembered).
I have no issues turning the text into html decoded, but how do I turn them back to encoded on the page the text gets submitted to?
The PHP superglobals $_GET and $_POST should be decoded automatically if you send properly encoded data from the client side, i.e. using the encodeURIComponent() JS function.
If the data doesn't seem to be decoded for some reason, you should find out why (maybe you double-encoded it?), or work around it, for example with PHP's urldecode() function (there is also a workaround for UTF-8 decoding).
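The round trip and the double-encoding pitfall can be sketched as follows. This is Python 3 standing in for the JS and PHP calls named above: urllib.parse.quote with safe="" plays the role of encodeURIComponent(), and unquote plays the role of the automatic decoding. The sample text is made up.

```python
from urllib.parse import quote, unquote

original = "# & text"  # hypothetical user input

# Encode once on the client side; safe="" also escapes "/" like encodeURIComponent().
encoded_once = quote(original, safe="")

# One decode on the server side restores the original text.
decoded = unquote(encoded_once)

# Encoding twice by mistake escapes the "%" signs themselves.
encoded_twice = quote(encoded_once, safe="")

# A single decode then yields the still-encoded text, not the original.
still_encoded = unquote(encoded_twice)
```

If the server sees text that looks like encoded_once after its automatic decode, double encoding on the client is the likely culprit.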

Google Chrome AJAX call doesn't show Arabic characters in JS console

I am using AJAX to exchange data containing Arabic characters, and everything works: I can store Arabic characters in the database, retrieve them, and print them to the screen. My problem is that when I check the JavaScript console in Google Chrome to inspect the retrieved data, the Arabic characters are not shown; instead the data prints like this (this is just an example, not all of the data):
["\u0645\u062f\u064a\u0646\u0629","\u0645\u062f\u064a\u0646\u0629 \u062a\u0627\u0631\u064a\u062e\u064a\u0651\u0629","\u0634\u062e\u0635\u064a\u0651\]
i mean like this
When using JSON, strings are in UTF-8, and special characters are encoded as \u followed by 4 hexadecimal characters.
In your case, if you try to decode that string -- for example, with the first item of your array :
>>> str = "\u0645\u062f\u064a\u0646\u0629";
"مدينة"
I don't read arabic, but this looks like arabic to me :-)
Even if the JSON doesn't look good, it's not what matters : the important thing is that you get your original data back, once the JSON is decoded ; and, here, it seems you'll do.
To get the original, decoded, string in the browser's console (for debugging purposes, I suppose), you should be able to use the same JS library you are using in your application (if any), or the JSON.parse() function (I just tested this in Firefox's console, actually) :
>>> JSON.parse('"\u0645\u062f\u064a\u0646\u0629"');
"مدينة"
Of course, you'll have to write some code to actually output that decoded value to the browser's console (be it "by hand" or when getting the JSON back from your server) ; but since the browser's console is a debugging tool it seems OK.
By default, the console, as a debugging tool, outputs the raw JSON string it gets from the server -- and, with JSON, special characters are encoded, there is nothing you can do about it (except decode the JSON string and display it yourself, if you need to).
If you want to output the decoded string to the console each time you get a result from your server, you'll have to call JSON.parse() each time you get a result from your server ; and then output it, probably using console.log().
Don't forget to remove that debugging code before distributing your application / uploading it to your production server, though.
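The same check can be made outside the browser. The sketch below is in Python 3, not JS, and the raw_response value is a shortened, hand-written stand-in for the server output shown in the question. It shows that the \u escapes are only the wire format and that any JSON parser gives the Arabic text back.

```python
import json

# Raw JSON text as it would arrive from the server (raw string, so the
# backslashes are kept literally instead of being interpreted by Python).
raw_response = r'["\u0645\u062f\u064a\u0646\u0629"]'

# Parsing the JSON turns every \uXXXX escape back into the real character.
decoded = json.loads(raw_response)
first_item = decoded[0]

# The decoded string has 5 characters, not 30 backslash-escaped ones.
item_length = len(first_item)
```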

php system, python and utf-8

I have a python program running very well. It connects to several websites and outputs the desired information. Since not all websites are encoded with utf-8, I am requesting the charset from the headers and using the unicode(string, encoding) method to decode (I am not sure whether it's the appropriate way to do this but it works pretty well). When I run the python program I receive no ??? marks and it works fine. But when I run the program using php's system function, I receive this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0131' in position 41: ordinal not in range(128)
This is a python specific error but what confuses me is that I don't receive this error when I run the program using the terminal. I only receive this when I use php's system function and call the program from php. What may be the cause behind this problem?
Here is a sample code:
php code that calls python program:
system("python somefile.py $search"); // where $search is the variable coming from an input
python code:
encoding = "iso-8859-9"
l = "some string here with latin characters"
print unicode("<div class='line'>%s</div>" % l, encoding)
# when I run this code from terminal it works perfect and I receive no ??? marks
# when I run this code from php, I receive the error above
From the PrintFails wiki:
When Python finds its output attached to a terminal, it sets the
sys.stdout.encoding attribute to the terminal's encoding. The print
statement's handler will automatically encode unicode arguments into
str output.
This is why your program works when called from the terminal.
When Python does not detect the desired character set of the
output, it sets sys.stdout.encoding to None, and print will invoke the
"ascii" codec.
This is why your program fails when called from php.
To make it work when called from php, you need to make explicit what encoding print should use. For example, to make explicit that you want the output encoded in utf-8 (when not attached to a terminal):
import sys
# Fall back to UTF-8 when no output encoding was detected (e.g. when called from php)
ENCODING = sys.stdout.encoding if sys.stdout.encoding else 'utf-8'
print unicode("<div class='line'>%s</div>" % l, encoding).encode(ENCODING)
Alternatively, you could set the PYTHONIOENCODING environment variable.
Then your code should work without changes (both from the terminal and when called from php).
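Both options can be sketched in Python 3 (the question's code is Python 2, so this is an illustration of the idea rather than a drop-in fix). The helper name encode_for_stream is hypothetical, and the child process stands in for a script launched by another program such as PHP.

```python
import os
import subprocess
import sys


def encode_for_stream(text, stream_encoding, fallback="utf-8"):
    """Encode text with the stream's encoding, or with the fallback if none is known."""
    return text.encode(stream_encoding or fallback)


# U+0131 is the character named in the error message.
line = "<div class='line'>\u0131</div>"

# The ASCII codec cannot represent U+0131; this is the failure from the question.
try:
    line.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Option 1: no encoding detected (None), so fall back to UTF-8 explicitly.
encoded = encode_for_stream(line, None)

# Option 2: the caller sets PYTHONIOENCODING for the child process, so the
# child's print() encodes as UTF-8 even though its stdout is a pipe.
child_env = dict(os.environ, PYTHONIOENCODING="utf-8")
result = subprocess.run(
    [sys.executable, "-c", "print('\\u0131')"],
    stdout=subprocess.PIPE,
    env=child_env,
    check=True,
)
child_output = result.stdout.strip()
```

In a real script the first option would be used as encode_for_stream(line, sys.stdout.encoding), with the resulting bytes written to the output stream.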
When you run the Python script in your terminal, your terminal is likely to be encoded in UTF-8 (especially if you are using Linux or Mac).
When you set the l variable to "some string with latin characters", that string is encoded in the default encoding; if you are using a terminal, l will be UTF-8 and the script won't crash.
A little tip: if you have a string encoded in latin1 and you want it in unicode you can do:
variable.decode('latin1')
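A small Python 3 sketch of that tip follows, where the decode step applies to bytes objects. It also shows why the charset has to match the source: the sample byte 0xFD is chosen for illustration because it maps to different characters in iso-8859-9 (the encoding used in the question) and in latin1.

```python
# One byte as it might arrive from a page served as iso-8859-9 (Turkish).
raw_bytes = b"\xfd"

# Decoded with the correct charset, it is the dotless i (U+0131).
as_turkish = raw_bytes.decode("iso-8859-9")

# Decoded as latin1, the same byte becomes a different character (U+00FD).
as_latin1 = raw_bytes.decode("latin-1")

# Once decoded, the text can be re-encoded to any target encoding, e.g. UTF-8.
utf8_bytes = as_turkish.encode("utf-8")
```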

JSON getting "name":"\u05d7\u05d1\u05e8\u05d4" for non-English

I am getting a JSON returned by Ajax from PHP json_encode
I have Hebrew characters that turned into "\u05d7\u05d1\u05e8\u05d4"
How can I turn them back into Hebrew?
(The DB is encoded UTF8 and when calling the PHP file the Hebrew is displayed correctly)
You use any (non-broken) JSON parser.
As Quentin pointed out, this is correct. \uXXXX is a correct escape sequence for a Unicode character. In fact, if you type it into the Firebug console, it will print "חברה". That does look Hebrew to me, although I can't tell whether it's correct.
Therefore after parsing the data you received (either with eval or JSON.parse) the character should be unescaped automatically.
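To see that the escaped form and the readable form are the same data, here is a sketch in Python 3 (not PHP or JS). Python's ensure_ascii switch is used as an analogue of the escaping that json_encode applies by default; the dictionary mirrors the "name" key from the question title.

```python
import json

name = "\u05d7\u05d1\u05e8\u05d4"  # the Hebrew word from the question

# Default serialization escapes every non-ASCII character as \uXXXX.
escaped = json.dumps({"name": name})

# With ensure_ascii=False the characters are written out as they are.
unescaped = json.dumps({"name": name}, ensure_ascii=False)

# Both texts parse to exactly the same value.
same_after_parsing = json.loads(escaped) == json.loads(unescaped)
```

PHP offers a comparable switch, the JSON_UNESCAPED_UNICODE flag of json_encode, but the escaped output is equally valid and needs no change on the client.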

Decoding XML with base64_decode works fine in PHPUnit but returns UTF-16 encoded data in browser

I'm having some strange issues with decoding an XML snippet, contained within a cookie, with PHP's base64_decode function:
In our PHPUnit tests, we can decode the XML and echo it out to the console and it prints XML as you would expect (all unit tests pass as well).
As soon as we try running the same code in the browser, the decoded XML appears to contain loads of UTF-16 characters interspersed with fragments of the expected XML tags. For example:
<CreateSession\u000f\u0013Y...
As you might then expect, we end up with an Exception: String could not be parsed as XML... error when passing this string to the SimpleXMLElement constructor.
Some further info:
The XML itself comes from an external login system and we don't have any control over its format; it doesn't come with any <?xml...?> declaration and the root node is this <CreateSession>...</CreateSession> tag.
I've checked the character encoding of the page being served and have verified that it is UTF-8.
The site being developed is using Drupal
We tried passing the XML / UTF-16 string through Drupal's drupal_convert_to_utf8 function, but this just returns the Chinese (I think) symbols e.g. 敲
Has anyone come across anything like this before or have any idea what might be causing this?
Aha! It turns out that, when run in the browser, the cookie values were automatically URL decoded by PHP, meaning that any '+' in the base64 encoded text were being replaced by spaces. Adding this line of code before calling base64_decode fixed things:
$tmp = str_replace(' ', '+', $value);
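The effect can be reproduced in a few lines. This is a Python 3 sketch, not the PHP from the post: unquote_plus stands in for the automatic URL decoding of the cookie value, and the tiny XML payload is made up because its base64 form happens to contain '+'.

```python
import base64
from urllib.parse import unquote_plus

payload = b"<a>ok</a>"  # hypothetical XML snippet

# Base64 text as the login system would put it into the cookie.
encoded = base64.b64encode(payload).decode("ascii")
has_plus = "+" in encoded

# URL decoding treats "+" as an encoded space, which corrupts the base64 text.
mangled = unquote_plus(encoded)

# Same repair as the str_replace() line above: turn the spaces back into "+".
repaired = mangled.replace(" ", "+")
restored = base64.b64decode(repaired)
```

The repair is safe here because the base64 alphabet contains no spaces, so every space in the mangled value must have been a '+'.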
