I am using CodeIgniter in an app. There is a form with a textarea element, in which I wrote some text that included
%Features%
However, when I want to echo this by $this->input->post(key), I get something like
�atures%
The '%Fe' has vanished.
In the main index.php file of CI, I tried var_dump($_POST) and the word above is intact. But when I fetch it with the Input library (XSS filtering is on), I get the problem.
When XSS filtering is off, it appears fine initially. However, if I store it in the database and show it the next time, I see the same problem (even though XSS filtering is off).
%Fe happens to look like a URL-encoded sequence %FE, representing character 254. It's being munched into the Unicode "I have no idea what that sequence means" glyph, �.
It's clear that the "XSS filter" is being over-zealous when decoding the field on submission.
It's also very likely that a URL-decode is being run again at some point later in the process, when you output the result from the database. Check the database to make sure that the actual string is being represented properly.
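To see why exactly three characters go missing, it can help to URL-decode the text by hand. The snippet below is only an illustration using PHP's built-in rawurldecode(); that CodeIgniter's filter does something equivalent internally is an assumption, and the variable names are made up.

```php
<?php
// Illustration only: what a URL-decode pass does to the submitted text.
$submitted = '%Features%';

// "%Fe" is read as the escape for the single byte 0xFE. The trailing "%"
// is not followed by two hex digits, so it is left alone.
$decoded = rawurldecode($submitted);

// Hex dump of the result: one byte 0xFE followed by "atures%".
echo bin2hex($decoded), "\n";

// 0xFE never occurs in valid UTF-8, which is why a browser shows the
// replacement glyph in its place.
var_dump(mb_check_encoding($decoded, 'UTF-8'));
```

If the hex dump of the value read from the database starts with fe rather than 254665, the damage happened before or during storage, not on output.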
First: escape the variables before storing them in the database. % has a special meaning in SQL (it is a wildcard in LIKE patterns).
Second: % also has a special meaning in URLs, e.g. %20 is a space. %FE will map to some character, which will be decoded by input().
I thought the proper way to "sanitize" incoming data from an HTML form before entering it into a mySQL database was to use real_escape_string on it in the PHP script, like this:
$newsStoryHeadline = $_POST['newsStoryHeadline'];
$newsStoryHeadline = $mysqli->real_escape_string($newsStoryHeadline);
$storyDate = $_POST['storyDate'];
$storyDate = $mysqli->real_escape_string($storyDate);
$storySource = $_POST['storySource'];
$storySource = $mysqli->real_escape_string($storySource);
// etc.
And once that's done you could just insert the data to the DB like this:
$mysqli->query("INSERT INTO NewsStoriesTable (Headline, Date, DateAdded, Source, StoryCopy) VALUES ('".$newsStoryHeadline."', '".$storyDate."', '".$dateAdded."', '".$storySource."', '".$storyText."')");
So I thought doing this would take care of cleaning up all the invisible "junk" characters that may be coming in with your submitted text.
However, I just pasted some text I copied from a web page into my HTML form and clicked "submit", which ran the above script and inserted that text into my DB. But when I read that text back from the DB, I discovered that it still had junk characters in it, such as â€“.
And those junk characters of course caused the PHP script I wrote that retrieves the information from the DB to crash.
So what am I doing wrong?
Is using real_escape_string not the way to go here? Or should I be using it in conjunction with something else?
Or is there something I should be doing (like more escaping) when reading data back out from the MySQL database?
(I should mention that I'm an Objective-C developer, not a PHP/mySQL developer, but I've unfortunately been given this task to do some DB stuff - hence my question...)
thanks!
Your assumption is wrong. mysqli_real_escape_string’s only intention is to escape certain characters so that the resulting string can be safely used in a MySQL string literal. That’s it, nothing more, nothing less.
The result should be that exactly the passed data is retained, including ‘junk’. If you don’t want that ‘junk’ in your database, you need to detect, validate, or filter it before passing it to MySQL.
In your case, the ‘junk’ seems to be due to different character encodings: your input data seems to be encoded with UTF-8 while it’s later displayed using Windows-1250. In this scenario, the character – (U+2013) would be encoded as 0xE28093 in UTF-8, which would represent the three characters â, €, and “ in Windows-1250. Properly declaring the document’s encoding would probably fix this.
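The byte-level claim can be checked with a few lines of PHP. This is a sketch, not part of the original answer: it assumes the iconv extension is available and uses 'CP1250' as iconv's name for Windows-1250.

```php
<?php
// U+2013 EN DASH, written with an escape so the bytes are UTF-8 regardless
// of how this file itself is saved.
$dash = "\u{2013}";

// The three UTF-8 bytes of the dash.
echo bin2hex($dash), "\n"; // e28093

// Treat those same bytes as Windows-1250 text and convert the result to
// UTF-8 so it can be printed. This reproduces the three "junk" characters.
$misread = iconv('CP1250', 'UTF-8', $dash);
echo $misread, "\n";
```

Nothing was added to the data here; the same three bytes are simply being read with the wrong code page, which is why no amount of escaping changes the outcome.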
Sanitization is a tricky subject, because it means different things depending on the context. :)
real_escape_string just makes sure your data can be included in a query (inside quotes, of course) without being able to change the "meaning" of the query.
The manual page explains what the function really does: it escapes NUL characters, line feeds, carriage returns, backslashes, single quotes, double quotes, and "Control-Z" (probably the SUBSTITUTE character). So it just inserts a backslash before those characters.
That's it. It "sanitizes" the string so it can be passed unchanged in a query. But it doesn't sanitize it from any other point of view: users can still pass, for instance, HTML markup or "strange" characters. You need to make rules depending on what your output format is (most of the time HTML, but HTTP isn't restricted to HTML documents) and what you want to let your users do.
If your code can't handle some characters, or if they have a special meaning in the output format, or if they cause your output to appear "corrupted" in some way, you need to escape or remove them yourself.
You will probably be interested in htmlspecialchars. Control characters generally aren't a problem with HTML. If your output encoding is the same as your input encoding, they won't be displayed and thus won't be an issue for your users (well, maybe for the W3C validator). If you think it is, make your own function to check and remove them.
I'm storing data in a MySQL database that may have some special characters. I'm wondering how to store it so that these characters are preserved if they're either output to HTML via PHP OR via JavaScript, e.g. createTextNode.
For example, the division symbol (÷) has the HTML code &divide;, and when I store it as that it shows up fine when put directly into HTML by PHP, but when I pull it into JavaScript using $.getJSON and then insert it with createTextNode it shows up looking like &divide;.
I also tried storing the symbol in the SQL directly, but my understanding is that the column would need to be changed from VARCHAR to NVARCHAR and that would cause a performance hit that doesn't seem necessary.
Given that I can modify the SQL, the PHP, or the JavaScript, is there an easy fix here? Maybe a way to unescape the HTML entity in JavaScript?
As answered by Yogesh, you should switch your collation of the DB to utf8_general_ci
So there's probably two things going on:
JSON escapes special characters.
Somewhere, something in your code flow is URL encoding the strings too.
So you just need to decode the string in your JavaScript, or you need to find what part of your code is URL encoding those strings and fix it.
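If the PHP side can be changed, another option is to hand JavaScript the real character instead of an entity. The sketch below assumes the value is stored as an HTML entity and that the response is built with json_encode(); the 'symbol' key is hypothetical.

```php
<?php
// Value as stored in the database, using an HTML entity. Entities are only
// interpreted by the HTML parser, so createTextNode shows them literally.
$stored = '&divide;';

// Decode the entity to the actual character before building the response.
$raw = html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// json_encode() writes non-ASCII characters as \uXXXX escapes by default.
// The JSON parser on the JavaScript side turns that back into the character.
$json = json_encode(['symbol' => $raw]);
echo $json, "\n"; // {"symbol":"\u00f7"}
```

With this approach the HTML output path would need to escape on the way out (for example with htmlspecialchars()) instead of relying on a stored entity.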
I'm having some trouble with the dreaded UTF-8 Character Encoding! It's driving me insane, no matter which way I approach it or how many online guides I follow, I can never get it to return the desired results. Here's what's going on:
My whole website uses a simple text-file database that is UTF-8 encoded, and it correctly shows all manner of special characters, latin, arabic, japanese, you name it, they all show correctly, with one exception:
When the user uses the "Search" input box I have on my website, I use $search = $_REQUEST['search']; to get the input data on the results page and show results accordingly. When a user inserts special characters in the search box, they get "Percent Encoded" in the URL (for example, "ï" becomes "%C3%AF"). When showing $search in the actual website, any special character appears as � (black diamond with question mark).
I have tried everything suggested at http://malevolent.com/weblog/archive/2007/03/12/unicode-utf8-php-mysql/ with the exception of the header(). I have set the charset as UTF-8 in my head section with an http-equiv meta, but for some reason whenever I set it as a header() my PHP stylesheet stops working (and the character problem remains). Maybe this is a clue?
I have tried urldecode and rawurldecode too, but they don't change anything.
Keep in mind special characters appear correctly elsewhere on the site, it's only with the $search string where this problem appears. As a side-note, even though the characters are not visualizing correctly, my search engine does actually interpret the special characters correctly when filtering the results. This makes me understand that the special character is actually there and correctly encoded, but it's just a matter of making it visualize correctly with the correct charset. However... everything appears to be UTF-8.
To be honest I'm so confused about this that this question might also appear to be confusing and the information I'm giving you might not be very well structured either, so I apologize and will try to provide more detailed information for any questions.
Thank you!
Make sure not to have any function which alters your $_REQUEST. Some functions are not aware of special encodings.
The best way to investigate is checking the state of the variables before and after they are altered.
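One way to do that check is to dump the bytes of the variable at each stage rather than looking at how the browser renders it. The helper below is hypothetical (the name describe_bytes and the sample input are not from the question) and assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper: describe a string's bytes so that the value can be
// compared before and after each processing step.
function describe_bytes(string $label, string $value): string
{
    $valid = mb_check_encoding($value, 'UTF-8') ? 'valid' : 'INVALID';
    return sprintf('%s: %s (%s UTF-8)', $label, bin2hex($value), $valid);
}

// Stand-in for $_REQUEST['search']: the two UTF-8 bytes of "ï", which is
// what PHP yields after percent-decoding %C3%AF.
$search = rawurldecode('%C3%AF');
echo describe_bytes('raw', $search), "\n";

// Escaping for HTML with the charset stated explicitly leaves valid UTF-8
// bytes untouched.
$escaped = htmlspecialchars($search, ENT_QUOTES, 'UTF-8');
echo describe_bytes('escaped', $escaped), "\n";

// A lone byte such as 0xEF (Latin-1 "ï") is reported as invalid.
echo describe_bytes('latin1', "\xEF"), "\n";
```

If the hex dump changes between two steps, or flips from valid to INVALID, the function called in between is the one to look at.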
I would like to add one more point regarding UTF-8 string manipulation.
When manipulating UTF-8 strings, always use the multibyte string functions.
For example, use mb_strtolower() in place of strtolower().
http://php.net/manual/en/ref.mbstring.php.
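A short illustration of the difference, assuming the mbstring extension is loaded; the sample word is made up.

```php
<?php
// "ÉCOLE", written with an escape so the bytes are UTF-8 regardless of how
// this file is saved.
$word = "\u{C9}COLE";

// Byte count versus character count: É takes two bytes in UTF-8.
echo strlen($word), "\n";             // 6
echo mb_strlen($word, 'UTF-8'), "\n"; // 5

// Multibyte-aware lowercasing handles the accented letter as well.
echo mb_strtolower($word, 'UTF-8'), "\n"; // école
```

The byte-oriented functions work on individual bytes, so anything that cuts or transforms a string (substr, strtolower, str_pad and so on) can split or skip a multibyte character.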
I'm unsure if this is a php-, filemaker-, mysql- or an odbc driver issue.
For security reasons the input fields of my current PHP webform convert special characters into hex codes (for example: # becomes &#39;). This hex code is saved in the database and is also shown in FileMaker 11 as the hex code. This is not what I want.
How can I make sure the special character will be displayed as it should be?
The other way round (from filemaker to db), no conversion will be done on inserting the special characters.
How can I make sure everything will be consistent?
Kind regards,
Jeroen
FileMaker is just showing the data stored in MySQL. If you pull up the DB in a tool like phpMyAdmin you should see that the varchar contains the encoding as well. Since FMP is looking at it simply as a text field, it shows the encoding that was stored. If you wanted to decode in FMP you could show a calc field of the varchar that has a custom function to decode the text (but that won't allow for updating the data). You could also try a trigger on record load to decode the data in the fields so that you can properly view and edit it.
Solved it! It appeared that I had to add an extra line to my PHP script.
After setting up the connection, PHP needs to tell MySQL which encoding to use. This can be done with the following line:
$dbh->query("SET NAMES 'utf8'");
Thanks for the effort guys!
This: &#39; type of encoding is not done automatically by the browser. Something is doing it. Normally you do it only on output, not on input.
You can use html_entity_decode() to undo it. But I strongly suggest you figure out why it's happening in the first place.
The example web page has 2 fields and allows a user to enter a title and code. Both fields would later be embedded and displayed in an HTML page for viewing and/or editing but not execution. In other words, any PHP or JavaScript or similar should not run but be displayed for editing and copying.
In this case, what is the best way to escape these fields before database insertion and after (for HTML display)?
You need to use the function htmlspecialchars() in PHP.
That will change any special characters (e.g. < and >) into their HTML-encoded equivalents (e.g. &lt; and &gt;). When you get these from the database and output them as HTML, they will display as code but won't harm your script or execute.
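A minimal sketch of that, assuming the text is kept unmodified in the database and only escaped at the moment it is written into the page; the sample input is made up.

```php
<?php
// Hypothetical user-submitted code that must be displayed, not executed.
$code = '<script>alert("hi");</script>';

// Escape at output time, stating the charset and escaping both quote styles
// so the result is also safe inside attribute values.
$safe = htmlspecialchars($code, ENT_QUOTES, 'UTF-8');
echo $safe, "\n";
// &lt;script&gt;alert(&quot;hi&quot;);&lt;/script&gt;

// The original text is recoverable if it was escaped exactly once.
$back = htmlspecialchars_decode($safe, ENT_QUOTES);
```

The same call works for the content of a textarea in an edit form: the browser shows the original characters and submits them back unescaped.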
I faced the same problem a few days back. To put the code (JavaScript or PHP) in the HTML in a non-executable way, I used a textarea, and it served the purpose.
The problem, however, was with the database. I could not use the typical escape functions on the data, as they were affecting it; for example, the tags were getting messed up.
To solve this problem, I encoded the data in Base64 before putting it in the database. The encoded result is no longer JavaScript code, so I can use the escape functions on it and store it in the database.
I am open to suggestions, feel free to comment.