Please note that this is a jQuery-Ajax-CodeIgniter issue.
All files involved in the post process are UTF-8 encoded.
In non-Ajax calls, all of my forms work perfectly with UTF-8 characters.
In Ajax calls, the Firebug/Chrome sniffers display the posted values correctly.
If I print the $_POST array in the index.php (root file), the values appear correctly.
CI is doing something that is placing question marks where the curly quotes are and trimming the strings at the first accented character.
From the sound of it, iconv could be messing with the strings, so I tried removing XSS filtering and commenting out the iconv and mb_convert_encoding lines in CI's system/core/Input.php and system/core/Utf8.php, but no joy.
After suffering a bit I decided to have a look at the $_REQUEST. OMG, the strings are correctly displayed there, so instead of $_POST I use $_REQUEST (not that I like it).
I found the following post saying that we should use $_REQUEST instead of $_POST:
`$('#form').serialize()` messes up UTF-8 characters, which then points back to the PHP urldecode function.
I am not sure the problem is related to the urldecode as the $_POST array changes after being treated by CI.
So, should I keep using $_REQUEST in my Ajax controllers or is there a better solution for this problem?
Any thoughts in the security/XSS side?
Thanks!
Related
I'm grabbing a bunch of data from a database and putting it into a PHP array. I'm then looking to json_encode that array using $output = json_encode($out).
My issue is that from time to time, something in the array is not able to be read by json_encode and the whole thing fails. If I use print_r($out) to have a look, I can clearly see where it's failing, because the character that is screwing things up always appears as a question mark inside of a black diamond �.
First - what are these characters?
Second - Is there a function I can pass the elements through prior to adding them to the array that would strip these out, or replace 'them' with blanks?
I found the answer to this. Since the data coming FROM the database was stored with the "black diamond" character, I needed to get this out after grabbing it from the database.
$x[4] = utf8_encode(odbc_result($query, 'B'));
By passing the result through utf8_encode, the string is converted from ISO-8859-1 to UTF-8, so the byte that was invalid as UTF-8 becomes a properly encoded character rather than a black diamond.
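As a rough illustration of what that conversion does, here is a minimal sketch. It assumes the driver really returns ISO-8859-1 bytes, which the original post does not confirm. The helper name toUtf8 is hypothetical, and mb_convert_encoding (from the mbstring extension) is used in place of utf8_encode, which is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: return $value as valid UTF-8.
// Assumption: anything that is not already valid UTF-8 is ISO-8859-1.
function toUtf8(string $value): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid, leave untouched
    }
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

$raw = "caf\xE9";        // "café" as ISO-8859-1 bytes; invalid as UTF-8
$clean = toUtf8($raw);   // the same text, now encoded as UTF-8
```

Checking validity first avoids double-encoding strings that are already UTF-8, which is a common side effect of calling a converter on every value unconditionally.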
Say echo json_encode($out);
This will solve your issue
Black diamonds are a browser issue; the database uses plain question marks.
It seems you are already getting wrong data from the database, although it is quite tricky to end up with incorrect UTF-8 with your settings. You need to check everything:
whether your table is marked with the utf8 charset
whether your data is indeed encoded in UTF-8 (not just marked, but actually encoded)
whether your server is sending the correct charset in the Content-Type header.
It is also useful to view the page while choosing different charsets from your browser menu.
But first of all you have to wipe every trace of the random things you tried, all these various encode and decode calls and so on. Use just plain and direct output from the database. Otherwise you will never get to the problem.
I'm attempting to send a string from client-side JavaScript to a back-end PHP script. The string contains special quotes like ’ and “.
When I look at the console in Chrome I can see that these are sent in the POST headers as they are. On the PHP side I then immediately json_encode() the $_POST array and send it back to see what it has collected. The special characters now look like this: \u2019. Please note this is for testing; I would normally sanitize all post data.
I wish to use UTF-8 but I'm not sure what I'm missing. My HTML includes:
<meta charset="utf-8">
My PHP server has UTF-8 set as its default charset.
If I start saving such data to the database I end up with strings like this: â for ’. However, this is not a database issue; the characters are already bad before going into the database. MySQL purely accentuates them.
Any ideas?
Update
I've noticed that if I return the string back to javascript without using json_encode() then it's in its original format with the special quotes (’ and “) still.
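For context on the \u2019 seen in the json_encode() output: by default json_encode writes non-ASCII characters as \uXXXX escapes, which is still valid JSON and decodes back to the same character. The sketch below uses a hypothetical stand-in for the $_POST array and shows the JSON_UNESCAPED_UNICODE flag, which keeps the literal character instead.

```php
<?php
// Stand-in for $_POST; U+2019 is the curly apostrophe.
$post = ['quote' => "It\u{2019}s"];

// Default behaviour: non-ASCII characters are written as \uXXXX escapes.
$escaped = json_encode($post);

// With the flag, the character is emitted as literal UTF-8.
$literal = json_encode($post, JSON_UNESCAPED_UNICODE);

// Both forms decode back to the identical UTF-8 string.
$same = json_decode($escaped, true) === json_decode($literal, true);
```

So the escape by itself does not indicate corruption; a JSON parser on the JavaScript side should turn it back into the original quote.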
Have you tried:
utf8_decode()
On the server side for the variables you're passing? PHP is likely expecting ISO-8859-1 rather than UTF-8.
Turns out there was an issue on both sides of the pond. The issue on the PHP side, which this question concerns, was that the data was being sent to the back-end via a GET request (URL encoded). I have changed this to a POST request.
This has allowed me to specify the UTF-8 charset when sending the headers for the POST request.
This is something I encounter often, and I usually end up resorting to the try and try again method until the data works. I figured SO would know what the best practice is in order to maintain the data and not mess up the json.
Let's assume the data I want to send is text data of the most annoying sort - special characters, &,<,", \n, \n\r, \t, +, etc.
Let's also assume I want to keep everything in utf8, and my mysql table is configured to be utf8. However, since PHP's utf8 support is lacking, this should be considered.
What encoding / escaping / htmlentities should I be doing from:
1) Sending JSON data from client JS to PHP via AJAX POST (anything different for GET?)
2) Decoding data in PHP and storing text string in mysql database (or store the escaped/encoded data? )
3) Retrieving data from MySQL DB in PHP and returned as JSON response to JS AJAX request
4) In a JSON response from our REST api
Whenever I use php/mysql/jquery to pass data back and forth, I end up using the following combination of encodings/escapings, and it seems to work well for me.
1) you don't need to do anything here, UNLESS you are sending a URL (I think this is only for GET requests) - but if you're sending a url you need to use encodeURIComponent(url), which will properly escape the &'s and special characters in the url (see more here).
2) Use mysqli and bound parameters, it will do all the escaping for you (read about it here)
3) I always use this when echoing data into an HTML file :
<?php
htmlspecialchars($string_to_escape, ENT_QUOTES, 'UTF-8', false);
?>
This will properly encode all special characters (the false is for "no double encoding"). Also make sure you have the proper UTF-8 meta tags at the top of your HTML pages.
4) Using json_encode should always escape your data properly, but I would use the code from #3 just to make sure. But you'll probably only need it if you're returning data with special characters in it.
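The two escaping steps from the list above can be compared side by side in a small sketch. The sample string is made up for illustration; the calls are the htmlspecialchars invocation from point 3 and a plain json_encode as in point 4.

```php
<?php
// Made-up sample containing the "annoying" characters from the question.
$text = "Tom & \"Jerry\" <b>\n\t+ caf\u{00E9}";

// HTML context: escape markup characters. The final false means
// entities that are already present are not encoded a second time.
$html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8', false);
$again = htmlspecialchars($html, ENT_QUOTES, 'UTF-8', false);

// JSON context: json_encode escapes quotes, control characters
// and non-ASCII characters on its own.
$json = json_encode(['text' => $text]);
$roundTrip = json_decode($json, true)['text'];
```

Here $again equals $html because of the "no double encoding" argument, and $roundTrip equals the original $text, which suggests the JSON step preserves the data without any extra escaping.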
1) For sending JSON data to PHP you don't have to do anything special. JSON is just a serialized JavaScript 'variable'.
2) Use prepared statements! Do not try to decode/strip/alter the content with PHP.
3) Use the appropriate functions to escape data for JSON (I don't know if there's a builtin PHP function for that).
4) Same as 3.
My friend has been playing around with some language stuff on our site and our file names are being output with these characters now. Usually I'd wait for him to wake up but this is a pretty big issue as we are getting e-mails through about the weird characters in the file names.
You don't see the characters when echoed in HTML, but we have the names being output to a header, which does show the characters, like so:
header('Content-Disposition: attachment; filename="'.$title.'.'.strtolower($type).'";');
How can we prevent these characters from displaying? They are also being input to our database, file names such as asdfmovie - I have googled the codes but I can't find any results for them.
Does anyone know what they are? and how to avoid them?
Thank you
html_entity_decode()
http://php.net/manual/en/function.html-entity-decode.php
These are html entities that are valid in HTML. Your email client is actually encoding them into HTML entities (a double effect), which means that the actual entities are what you're seeing. Just make sure that anything passed into the email runs through the html_entity_decode() function.
These are HTML entities which can be decoded using html_entity_decode, like echo html_entity_decode($str, ENT_COMPAT, 'UTF-8').
It's wrong to store such values in the database though, as you are seeing. The values should be stored in their original form and only HTML entity encoded when necessary for outputting to HTML. Figure out where they're being HTML encoded and fix that. If you already have a database full of this nonsense... um, have fun reversing it. :o)
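A minimal sketch of decoding the entities before the name reaches the header might look like the following. The helper name plainFilename and the sample title are hypothetical, and stripping quotes and line breaks is an added assumption to keep the header value well-formed, not something the answers above prescribe.

```php
<?php
// Hypothetical helper: turn an entity-encoded title into a plain file name.
function plainFilename(string $title, string $type): string
{
    // Convert entities such as &eacute; or &amp; back to real characters.
    $decoded = html_entity_decode($title, ENT_QUOTES, 'UTF-8');
    // Assumption: drop quotes and line breaks so the header stays well-formed.
    $safe = str_replace(['"', "\r", "\n"], '', $decoded);
    return $safe . '.' . strtolower($type);
}

$name = plainFilename('Caf&eacute; &amp; bar', 'MP4');
// header('Content-Disposition: attachment; filename="' . $name . '"');
```

This only treats the symptom at output time; the stored values would still need to be fixed at the point where they get entity-encoded.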
I'm working on a project in PHP (5.3.1) where I need to send a JSON string to a webservice (in Python), but the result I get from json_encode does not pass as valid JSON (I'm using JSLint to check validity).
I should add that the structure I'm trying to encode is fairly big (13K encoded), and consists partially of UTF-8 data, and while json_encode does handle it, I get spaces in weird places in the result. For example, I could get {"hello":tru e} or {"hell o":true} which results in an error from the webservice since the JSON is invalid (or the data is, like in the second example).
I've also tried to use Zend Framework for JSON encoding, but that didn't make much difference.
Is there a known issue with JSON in PHP? Did anyone encounter that behavior and found a solution?
You state that "the structure I'm trying to encode ... consists partially of UTF8 data." This implies that it is also partially of non-UTF8 data. The json_encode doc has a comment at the bottom, that
json_encode() expects strings to be encoded to be in UTF8 format, while by default PHP strings are ISO-8859-1 encoded.
This means that
json_encode(array('àü'));
will produce a json representation of an empty string, while
json_encode(array(utf8_encode('àü')));
will work.
Are the failing segments of the JSON due to non-UTF8 input?
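To check whether non-UTF-8 input is the cause, the failure can be reproduced in isolation. The sketch below assumes the offending strings are ISO-8859-1, and uses mb_convert_encoding (mbstring extension) where the comment above uses utf8_encode. Note that recent PHP versions return false for invalid UTF-8 rather than an empty string.

```php
<?php
// "àü" as ISO-8859-1 bytes, which is not a valid UTF-8 sequence.
$latin1 = "\xE0\xFC";

// Encoding the raw bytes fails, and the error code says why.
$failed = json_encode([$latin1]);
$reason = json_last_error();

// Convert first (assumption: the source encoding really is ISO-8859-1).
$fixed = json_encode([mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1')]);
```

Comparing $reason against JSON_ERROR_UTF8 gives a direct way to tell an encoding problem apart from other json_encode failures.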
In valid JSON, object keys must be quoted strings, and the only values that may appear unquoted are booleans, null, numbers, objects and arrays; string values must always be quoted.
Also, I would recommend adding the correct header before your JSON output.
if(!headers_sent())
header('Content-Type: application/json; charset=utf-8', true,200);
Can you also post the array or object that you are passing to json_encode?
I was handling some automatically generated emails the other day and noticed the same weird behavior (spaces were inserted to the email body), so I started to check the email post and found the culprit:
From the SMTP RFC2821:
The maximum total length of a text line including the <CRLF> is 1000 characters (not counting the leading dot duplicated for transparency).
My email body was indeed in one line, so breaking it with \n's fixed the spaces issue.
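One way to break a long body into shorter lines is PHP's wordwrap, sketched below. The 76-character width is a conventional choice for email and an assumption here, not something stated in the post, and the sample body is made up.

```php
<?php
// A single very long line, similar to an unbroken email body.
$body = str_repeat('lorem ipsum ', 200);

// Break at word boundaries so no line exceeds 76 characters;
// CRLF is the line ending SMTP expects.
$wrapped = wordwrap($body, 76, "\r\n", true);

// Length of the longest resulting line, as a sanity check.
$longest = max(array_map('strlen', explode("\r\n", $wrapped)));
```

Because wordwrap replaces spaces with line breaks, this suits prose bodies; a JSON payload in an email would be better sent as an encoded attachment or part so its content is not altered.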
After scratching my head for nearly a day, I've come to the conclusion that the problem was not in the json_encode function. It was with my post function.
Basically, the json_encode was preparing the data to be sent to another service. Before today, I've used stream_context_create and fopen to post data to the external service, but now I use fsockopen and fputs and it seems to be working.
Although I'm unsure as to the nature of the problem, I'm happy it works now :)
BTW: After this process, I mail myself the input and output (both in JSON) and this is how I saw there was a problem in the first place. This problem still persists but I guess that's related to the encoding of the mail or something of that sort.