I'm attempting to send a string from client-side JavaScript to a back-end PHP script. The string contains special quotes like ’ and “.
When I look at the console in Chrome, I can see that these characters are sent in the POST request as-is. On the PHP side I then immediately json_encode() the $_POST array and send it back to see what it has collected. The special characters now look like this: \u2019. (This is just for testing; please note I would normally sanitize all POST data.)
I wish to use UTF-8 but I'm not sure what I'm missing. My HTML includes:
<meta charset="utf-8">
My PHP server has UTF-8 set as its default charset.
If I start saving such data to the database, I end up with strings like â€™ in place of ’. However, this is not a database issue: the characters are already corrupted before they reach the database. MySQL merely makes the problem more visible.
Any ideas?
Update
I've noticed that if I return the string back to JavaScript without using json_encode(), it is still in its original format, with the special quotes (’ and “) intact.
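For what it's worth, \u2019 is json_encode()'s standard escape for U+2019, not corruption; json_decode() turns it straight back into ’. A quick illustration, assuming the script file itself is saved as UTF-8:
<?php
$s = '’'; // RIGHT SINGLE QUOTATION MARK (U+2019)
echo json_encode($s); // "\u2019" -- valid JSON
echo json_decode(json_encode($s)); // ’ -- round-trips intact
// On PHP 5.4+ you can keep the literal character:
echo json_encode($s, JSON_UNESCAPED_UNICODE); // "’"
?>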
Have you tried:
utf8_decode()
on the server side for the variables you're passing? PHP is likely expecting ISO-8859-1 rather than UTF-8.
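A minimal sketch of that, with a hypothetical field name. Note that utf8_decode() targets ISO-8859-1, so characters outside Latin-1 (including ’ and “) come out as ?:
<?php
// Convert the UTF-8 bytes the browser sent into ISO-8859-1.
$text = utf8_decode(isset($_POST['text']) ? $_POST['text'] : '');
?>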
Turns out there was an issue on both sides of the pond. The issue on the PHP side, which this question concerns, was that the data was being sent to the back-end via a GET request (URL-encoded). I have changed this to a POST request.
This has allowed me to specify the UTF-8 charset when sending the headers for the POST request.
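A quick way to verify on the PHP side that the bytes now arriving really are UTF-8 (the field name here is hypothetical):
<?php
// ’ should show up as the UTF-8 byte sequence e2 80 99.
echo bin2hex(isset($_POST['text']) ? $_POST['text'] : '');
?>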
Related
I have special characters like # and & and so on. What was suggested is that I HTML-decode these characters before passing them to my PHP image generator (a font previewer like Things Remembered).
I have no issue HTML-decoding the text, but how do I encode it again on the page the text gets submitted to?
The PHP superglobals $_GET and $_POST should be automatically decoded if you send properly encoded data from the client side, i.e. using the encodeURIComponent() JS function.
If it doesn't seem to be decoded for some reason, you should find out why (maybe you double-encoded it?), or work around it with the urldecode() PHP function (there is also a workaround for UTF-8 decoding).
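For the re-encoding half of the question, a minimal sketch with a hypothetical field name:
<?php
// $_POST arrives already URL-decoded; re-encode the special
// characters (&, <, >, ") when printing them back into HTML.
$raw = isset($_POST['previewText']) ? $_POST['previewText'] : '';
echo htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
?>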
My PHP web application is running on ISO-8859-1 encoding. When I post an HTML form with accept-charset="ISO-8859-1" to a PHP server, this is what I receive on the server side:
a. ASCII characters in their original form.
b. Special characters that can be represented in ISO-8859-1 in their original form.
c. Special characters that cannot be represented in ISO-8859-1 converted to HTML entities.
All of the data is received as properly ISO-8859-1 encoded characters. I want to replicate the exact same behavior with JavaScript or jQuery. All the solutions I have tried end up sending the special characters to the PHP file as double-encoded characters, and I have to utf8_decode them.
I have made a working solution as follows:
a. Convert unsupported special characters to HTML entities.
b. Send a JavaScript AJAX POST request with XMLHttpRequest/ActiveXObject. To do that I have to encodeURIComponent() the data.
c. Receive the POST data on the server and utf8_decode() it, because encodeURIComponent() encodes the special characters as UTF-8.
I want a JavaScript solution which makes sure that I receive proper ISO-8859-1 encoded data from POST on server side, so that I don't have to do 'utf8_decode()'. In other words, I want the JavaScript solution to exactly replicate the form behavior with accept-charset="ISO-8859-1".
Thank you
To my knowledge, there is no sure-fire way in JavaScript to do this. Character encoding is not a forte of JavaScript, I believe mainly because it leaves such areas to the discretion of others: the HTML that describes the content, the server language that does the parsing, or the web server that provides the environment.
As far as any URI encoding/decoding function in JS is concerned, it is all UTF-8, and that is built in; there is no parameter you can pass to override it.
You could also try something along the lines of overriding the MIME type and specifying a new one, like:
var xhr = new XMLHttpRequest();
if (xhr.overrideMimeType) {
    xhr.overrideMimeType('application/x-javascript; charset=ISO-8859-1');
}
Not sure how stable it is, but it is one direction you might want to explore. (Note that overrideMimeType() affects how the response is interpreted, not how the request body is encoded.)
This is something I encounter often, and I usually end up resorting to trial and error until the data works. I figured SO would know the best practice for keeping the data intact and not messing up the JSON.
Let's assume the data I want to send is text data of the most annoying sort: special characters such as &, <, ", \n, \r\n, \t, +, etc.
Let's also assume I want to keep everything in UTF-8, and my MySQL table is configured as UTF-8. However, since PHP's UTF-8 support is lacking, this should be taken into account.
What encoding / escaping / htmlentities should I be doing at each of these steps:
1) Sending JSON data from client JS to PHP via AJAX POST (anything different for GET?)
2) Decoding data in PHP and storing the text string in the MySQL database (or should I store the escaped/encoded data?)
3) Retrieving data from the MySQL DB in PHP and returning it as a JSON response to the JS AJAX request
4) In a JSON response from our REST API
Whenever I use php/mysql/jquery to pass data back and forth, I end up using the following combination of encodings/escapings, and it seems to work well for me.
1) You don't need to do anything here, unless you are sending a URL (I think this only applies to GET requests). If you are sending a URL, you need to use encodeURIComponent(url), which will properly escape the &'s and other special characters in it.
2) Use mysqli and bound parameters; they do all the escaping for you (see the sketch after this list).
3) I always use this when echoing data into an HTML file:
<?php
echo htmlspecialchars($string_to_escape, ENT_QUOTES, 'UTF-8', false);
?>
This will properly encode all special characters (the false means "no double encoding"). Also make sure you have the proper UTF-8 meta tags at the top of your HTML pages.
4) Using json_encode should always escape your data properly, but I would use the code from #3 just to make sure. You'll probably only need it if you're returning data with special characters in it.
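A minimal sketch of points 2 and 4 together; the connection details, table, and column names are hypothetical:
<?php
// 2) Store the raw UTF-8 text with a bound parameter -- mysqli escapes it.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$db->set_charset('utf8'); // keep the MySQL connection in UTF-8 as well

$stmt = $db->prepare('INSERT INTO notes (body) VALUES (?)');
$stmt->bind_param('s', $_POST['body']);
$stmt->execute();
$stmt->close();

// 4) Return it as JSON -- json_encode() escapes quotes, backslashes,
// and control characters such as \n and \t for you.
header('Content-Type: application/json; charset=utf-8');
echo json_encode(array('body' => $_POST['body']));
?>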
For sending JSON data to PHP you don't have to do anything special; JSON is just a serialized JavaScript value.
Use prepared statements! Do not try to decode/strip/alter the content with PHP.
Use the appropriate functions to escape data for JSON (PHP's json_encode() handles this).
Same as 3.
I have a javascript/PHP script that does the following:
Uses JavaScript to find text on a web page.
Transmits the text using jQuery AJAX to a PHP page.
The PHP stores the text in a MySQL database.
The trouble is, when I look at what has been stored in the database, some non-ASCII characters are corrupted.
I have simplified the problem and printed out the character codes of each letter to investigate what is going on.
For example: send over a single character, the pound sterling symbol.
When I check in PHP, what is being received is the byte 0xC2 followed by 0xA3
(capital A circumflex followed by pound sterling in Latin-1),
i.e. I am getting a spurious extra character (Â) before the £.
I've looked at similar problems that suggested setting the jQuery.ajax contentType, etc., but none of this made sense to me.
Thanks
Sounds like you've got mixed character sets: UTF-8 here, ISO-8859 there. PHP won't mangle the single pound character into two on its own, but the browser might if it's been told to expect ISO-8859 but is sent UTF-8 instead. The Â is a dead giveaway for that.
Basically, make sure you're using UTF-8 at all stages of processing (database, PHP, html) and usually things will work much better.
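A minimal sketch of declaring UTF-8 at each stage (connection details are hypothetical; the HTML side is the <meta charset="utf-8"> tag mentioned earlier):
<?php
header('Content-Type: text/html; charset=utf-8'); // PHP -> browser

$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$db->set_charset('utf8'); // PHP <-> MySQL connection
?>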
Finally got this to work.
The problem seems to be that jQuery.ajax transmits data to the server as UTF-8, but the PHP expects ISO-8859-1.
Solution: in PHP, convert UTF-8 to ISO-8859-1 using the utf8_decode function, e.g.
$incomming = utf8_decode($_REQUEST['incomming']);
And when you send data back for the ajax return handler, use utf8_encode() to convert back to UTF-8.
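A minimal sketch of that round trip, reusing the parameter name from the example above:
<?php
// Incoming: the browser sent UTF-8; this application works in ISO-8859-1.
$incomming = utf8_decode($_REQUEST['incomming']);

// ... process / store $incomming as ISO-8859-1 ...

// Outgoing: convert back to UTF-8 for the AJAX return handler.
echo utf8_encode($incomming);
?>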
Other things that seem to work include using the JavaScript escape() function on the data prior to transmission to the server, then un-escaping the data in PHP with urldecode().
Other things I tried but couldn't get to work:
I tried to make AJAX transmit in ISO-8859-1 so it would be compatible with the PHP, using the jQuery.ajax setting contentType: "application/x-www-form-urlencoded; charset=iso-8859-1".
Seemed to have no effect.
I tried to make PHP use UTF-8: header('Content-Type: text/html; charset=utf-8').
Again, it didn't work.
I'm working on a project in PHP (5.3.1) where I need to send a JSON string to a web service (in Python), but the result I get from json_encode does not pass as valid JSON (I'm using JSLint to check validity).
I should add that the structure I'm trying to encode is fairly big (13K encoded) and consists partially of UTF-8 data. While json_encode does handle it, I get spaces in weird places in the result. For example, I could get {"hello":tru e} or {"hell o":true}, which results in an error from the web service, since the JSON is invalid (or the data is, as in the second example).
I've also tried using the Zend Framework for JSON encoding, but that didn't make much difference.
Is there a known issue with JSON in PHP? Did anyone encounter that behavior and found a solution?
You state that "the structure I'm trying to encode ... consists partially of UTF8 data." This implies that it also consists partially of non-UTF8 data. The json_encode doc has a comment at the bottom stating that
json_encode() expects strings to be encoded to be in UTF8 format, while by default PHP strings are ISO-8859-1 encoded.
This means that
json_encode(array('àü'));
will produce a JSON representation of an empty string, while
json_encode(array(utf8_encode('àü')));
will work.
Are the failing segments of the JSON due to non-UTF8 input?
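If so, a minimal sketch of normalizing the input first, assuming the mbstring extension is available and the source data is ISO-8859-1:
<?php
// Convert ISO-8859-1 input to UTF-8 before handing it to json_encode().
$value = mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
echo json_encode(array($value));
?>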
For sure, object keys cannot contain spaces or any non-Unicode characters; unquoted values can only be boolean, integer, float, object, or array values; strings should always be quoted.
Also, I would recommend adding the correct header before your JSON output:
if (!headers_sent()) {
    header('Content-Type: application/json; charset=utf-8', true, 200);
}
Could you also post the array or object that you're passing to json_encode?
I was handling some automatically generated emails the other day and noticed the same weird behavior (spaces were inserted into the email body), so I started checking the email output and found the culprit:
From the SMTP RFC 2821:
The maximum total length of a text line including the <CRLF> is 1000 characters (not counting the leading dot duplicated for transparency).
My email body was indeed all on one line, so breaking it up with \n's fixed the spaces issue.
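A minimal sketch of that fix; the recipient address is hypothetical, and 70 is simply a conventional width well under the limit:
<?php
// Break long lines before mailing so no single line exceeds the SMTP limit.
$body = wordwrap($body, 70, "\r\n");
mail('me@example.com', 'JSON input/output', $body);
?>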
After scratching my head for nearly a day, I've come to the conclusion that the problem was not in the json_encode function. It was with my post function.
Basically, the json_encode was preparing the data to be sent to another service. Before today, I had used stream_context_create and fopen to post data to the external service, but now I use fsockopen and fputs, and it seems to be working.
Although I'm unsure as to the nature of the problem, I'm happy it works now :)
BTW: after this process, I mail myself the input and output (both in JSON), and this is how I saw there was a problem in the first place. That problem still persists, but I guess it's related to the encoding of the mail or something of that sort.