I usually escape user input by doing the following:
htmlspecialchars($str,ENT_QUOTES,"UTF-8");
as well as mysql_real_escape_string($str) whenever a mysql connection is available.
How can this be improved? I have not had any problems with this so far, but I am unsure about it.
Thank you.
Data should be escaped (sanitized) for storage and encoded for display. Data should never be encoded for storage. You want to store only the raw data. Note that escaping does not alter raw data at all as escape characters are not stored; they are only used to properly signal the difference between raw data and command syntax.
In short, you want to do the following:
$data = $_POST['raw data'];
//Shorthand used; you all know what a query looks like.
mysql_query("INSERT " . mysql_real_escape_string($data));
$show = mysql_query("SELECT ...");
echo htmlentities($show);
// Note that htmlentities() is usually overzealous.
// htmlspecialchars() is enough the majority of the time.
// You also don't have to use ENT_QUOTES unless you are using single
// quotes to delimit input (or someone please correct me on this).
You may also need to strip slashes from user input if magic quotes is enabled. stripslashes() is enough.
As for why you should not encode for storage, take the following example:
Say that you have a DB field that is char(5). The html input is also maxlength="5". If a user enters "&&&&&", which may be perfectly valid, this is stored as "&amp;&amp;." When it's retrieved and displayed back to the user, if you do not encode, they will see "&&," which is incorrect. If you do encode, they see "&amp;&amp;," which is also incorrect. You are not storing the data that the user intended to store. You need to store the raw data.
This also becomes an issue in a case where a user wants to store special characters. How do you handle the storage of these? You don't. Store it raw.
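A minimal sketch of the char(5) problem described above. The column is only simulated here by cutting the string to 5 bytes with substr(), which is an assumption made for illustration; how much actually survives depends on the database and its settings. Either way, the five ampersands the user typed are lost when the value is encoded before storage.

```php
<?php
// The user's input, exactly as typed into a maxlength="5" field.
$raw = '&&&&&';

// Encoding before storage turns 5 characters into 25.
$encoded = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// Simulate a char(5) column by keeping only the first 5 bytes (assumption).
$storedEncoded = substr($encoded, 0, 5);
$storedRaw     = substr($raw, 0, 5);

echo strlen($raw), ' -> ', strlen($encoded), "\n";
echo 'encoded, then stored: ', $storedEncoded, "\n";

// Storing raw and encoding only on output keeps all five ampersands.
echo 'stored raw, encoded on output: ',
    htmlspecialchars($storedRaw, ENT_QUOTES, 'UTF-8'), "\n";
```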
To defend against sql injection, at the very least escape input with mysql_real_escape_string, but it is recommended to use prepared statements with a DB wrapper like PDO. Figure out which one works best, or write your own (and test it thoroughly).
To defend against XSS (cross-site-scripting), encode user input before it is displayed back to them.
If you only use mysql_real_escape_string($str) to avoid sql injection, make sure you always add single quotes around it in your query.
htmlspecialchars is fine when printing unsafe output to the screen.
For the database, switch to PDO.
It's much easier and does the escaping for you.
http://php.net/pdo
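As a rough illustration of what switching to PDO can look like, here is a small sketch. It assumes the pdo_sqlite extension and uses an in-memory SQLite database in place of MySQL so that it runs without a server; the comments table and its body column are made up for the example.

```php
<?php
// Assumption: in-memory SQLite stands in for MySQL (needs pdo_sqlite).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)');

// Raw user input with characters that matter to SQL and to HTML.
$body = "It's <b>bold</b> & \"quoted\"";

// The placeholder keeps the value out of the SQL string entirely.
$insert = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
$insert->execute([':body' => $body]);

// The stored value is the raw input, unchanged.
$stored = $pdo->query('SELECT body FROM comments')->fetchColumn();

// Encode only at the point of output into HTML.
echo htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'), "\n";
```

With a real MySQL server only the DSN and credentials passed to the PDO constructor would change; the prepare/execute calls stay the same.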
Related
I'm interested to know whether or not it is necessary to escape output from a MySQL server if the data that is being retrieved has already been filtered when the user submitted a form.
Example:
1. The user submits a form with a comment for a blog post.
2. On form submission, prior to sending data to MySQL server, their input is filtered with FILTER_SANITIZE_SPECIAL_CHARS to prevent injection attacks.
3. Once the data has been posted to server, the user is rerouted to another screen where they can view their comment.
4. When retrieving their comment from the server (which has stored the filtered input), is it necessary to escape this output as well?
Here's the main issue for me. I'm taking user input from a form (for a blog post), sanitizing it with FILTER_SANITIZE_SPECIAL_CHARS, and then posting it to the MySQL server. If I retrieve this information from the server and display it in html, there are no issues. HOWEVER, I have been reading that you should ALWAYS escape output from servers as well. So I escaped the same post with htmlspecialchars(). Now, I have the issue that ALL special chars (including parentheses, and any quotes that are used by the user in their post) are coming back in their escaped html format. Not user friendly whatsoever.
What is the best work around for this, or is it even necessary to escape the output if it is coming from the server and has already been sanitized on user input?
Sanitization is not the same as escaping, and you should make sure not to confuse the two.
Sanitization is removing unwanted input. That is, if the user adds a <script> tag to their input, and you don't want their input to include <script> tags, then removing that <script> tag would be sanitization. Sanitization is not escaping data for an output context.
Escaping is properly encoding data for an output context. For example, to prevent HTML injection, you might call htmlspecialchars() to correctly encode & as &amp;. To prevent SQL injection, you might use mysqli::real_escape_string() to convert ' to \'. (Though it would be highly preferable to use prepared statements / parameterized queries to prevent having to worry about sql injection or escaping at all.)
Importantly, escaping is context-specific. An escaping you use for HTML is not necessarily valid or sufficient for SQL (or vice-versa, or any other output context).
The problem with FILTER_SANITIZE_SPECIAL_CHARS is that it's poorly named: it's doing both in one step, which is confusing for your database (since your database now has html-encoded data), and confusing for output (because now you have already-escaped data that is vulnerable to being multiply-escaped).
Instead, you should explicitly separate your sanitization and escaping efforts. Only sanitize data on input that you don't want to persist. Only escape data on output, and according to its proper output context.
The reason you want to store raw (pre-output-escaped) data in the database is so that if you ever need to output to a different context (e.g. now you're doing JSON output, or you need to write it to a file, or actually see what the raw data is), you won't need to unescape it first. (If you really have to, you might reasonably store a pre-escaped copy in a separate column, but you should always have your original data available.) It also makes the rule simple: always sanitize input; always escape output.
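To make the "escape per output context" point concrete, here is a small sketch with an invented sample string. It shows the same raw value prepared for HTML and for JSON, and what happens when data that was already HTML-encoded on input gets encoded again on output.

```php
<?php
// Raw value as it should sit in the database.
$raw = 'Tom & "Jerry" <script>';

// HTML context: encode the characters that are significant in markup.
$forHtml = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// JSON context: a different encoder with different rules.
$forJson = json_encode(['name' => $raw]);

// Encoding already-encoded data a second time mangles it.
$doubled = htmlspecialchars($forHtml, ENT_QUOTES, 'UTF-8');

echo $forHtml, "\n";
echo $forJson, "\n";
echo $doubled, "\n";
```

The doubled string is what the asker saw: entities such as &amp;quot; that the browser renders as literal &quot; text.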
I'm adding some XSS protection to the website I'm working on. The platform is Zend Framework 2, and therefore I'm using Zend\Escaper. From the Zend documentation I learned that:
Zend\Escaper is meant to be used only for escaping data that is to be
output, and as such should not be misused for filtering input data.
For such tasks, the Zend\Filter component, HTMLPurifier.
But what are the risks if I escape the data before inserting it into the database? Am I so wrong to do that? Please explain, as I'm somewhat new to this topic.
Thanks.
If you encode data before storing it, you have to decode it again before you can do anything sensible with it other than output it. That's why I wouldn't do it.
Say you have an international application and you store the escaped value of a form field that may contain non-ASCII characters, which might be escaped into HTML entities. What if you have to measure the content of that field, for example by counting its characters? You will always have to unescape the content before counting it, and then re-escape it again. Much work done, but nothing gained.
The same applies to search operations in your database. You will have to escape the search phrase the same way as your input for the database to understand what you are looking for.
I'd use one character set throughout the application and database (I prefer UTF-8; beware of the MySQL connection...) and only escape content on output. That way I can do whatever I like with the data and am on the safe side on output. Escaping is done automatically in my view layer, so I don't have to think about it every time I handle data. That way you can't forget it.
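The search problem can be sketched in a few lines. The sample text is invented, and plain strpos() stands in for a database search purely for illustration.

```php
<?php
// "café crème", written with escapes so the file encoding does not matter.
$raw     = "caf\u{e9} cr\u{e8}me";
$encoded = htmlentities($raw, ENT_QUOTES, 'UTF-8');

// What the user types into the search box.
$needle = "caf\u{e9}";

// Searching the raw value works directly.
$foundInRaw = strpos($raw, $needle) !== false;

// Searching the entity-encoded value fails unless the needle is encoded too.
$foundInEncoded        = strpos($encoded, $needle) !== false;
$foundWithEncodedQuery = strpos($encoded, htmlentities($needle, ENT_QUOTES, 'UTF-8')) !== false;

// Getting the original back requires an explicit decode step.
$roundTrip = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

var_dump($foundInRaw, $foundInEncoded, $foundWithEncodedQuery);
```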
That does not prevent me from filtering and sanitizing the input. And it doesn't prevent me from escaping the database-content using the appropriate database-escaping mechanisms like mysqli_real_escape_string or similar or using prepared statements!
But that's just my opinion, others might think otherwise!
"Output" here refers to the web page. A form field ( HTML tag) is an INPUT (from the webpage), any text is an OUTPUT (to the webpage). You need to ensure any output (to the webpage) does not contain dangerous characters that could be used to forge XSS attack vectors.
This said, if you have DANGEROUS_INPUT_X given by the user and then
$NOT_DANGEROUS_ANYMORE = ZED.HtmlPurifier(DANGEROUS_INPUT_X)
DBSave($NOT_DANGEROUS_ANYMORE)
and somewhere else
$OUTPUT = DBLoad($NOT_DANGEROUS_ANYMORE)
echo $OUTPUT
you should be fine, as long as you do not apply any additional encoding/decoding to this output. It will be displayed in the way it is saved, that was safe.
I would suggest to look at output encoding more than validation: HtmlPurifier cleans the HTML, while you could accept any kind of bad characters if you ensure your output is encoded in the page.
The OWASP cheat sheet at https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet gives some general rules; here is the PHP example:
echo htmlspecialchars($DANGEROUS_INPUT_X_NOW_OUTPUT, ENT_QUOTES, "UTF-8");
Remember to set the Character Set and be consistent with the same one throughout your pages/scripts/binaries and in the database as well.
I'm trying to sanitize a string to be saved in a db.
First step I took was to use addslashes(), but then I realized it didn't solve many security issues, so I added htmlspecialchars(), and now I have this line of code:
$val=htmlspecialchars(addslashes(trim($val)));
But then I was wondering if it makes any sense at all to use addslashes() on a string that will be processed by htmlspecialchars(), since the latter will "remove" any element that would cause problems, if I'm not mistaken.
In particular, I was wondering if that makes the server work twice without any real need.
You are wrong altogether. addslashes() is not a database escaping function; use the one that comes with your database access extension, like mysqli_real_escape_string().
htmlspecialchars() makes no sense at all here. Only use it if you want to place a string within HTML, which should be when you output stuff, not when storing it in the database.
I wouldn't use either of those when saving the string to the database.
addslashes() escapes only quote characters and the backslash character (\). It's not adequate for avoiding SQL injection, because the DBMS may use other special characters which would have to be escaped as well. The best way to avoid SQL injection is to use PHP data objects and its support for bind parameters, which let you keep the parameter values out of the SQL string entirely. If PDO isn't an option for some reason, you should at least use a database-specific escaping function, e.g. mysqli_real_escape_string if you're using MySQL, to ensure that all the necessary characters are escaped.
htmlspecialchars() is for use when incorporating a non-HTML string into an HTML page; it escapes characters that are significant to a web browser, such as angle brackets, and has nothing to do with databases. Assuming that you're not generating and storing complete HTML documents in your database, you shouldn't be calling this function on values before putting them into the database. Store what the user actually entered, and call htmlspecialchars() when you retrieve the value from the database and you're about to actually put it into some HTML output.
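A short sketch of why a generic function like addslashes() is not a substitute for the driver's own escaping. It assumes the pdo_sqlite extension and an in-memory SQLite database, chosen only because it runs without a server; SQLite happens to escape a single quote by doubling it, so the backslash that addslashes() adds is not understood there.

```php
<?php
// Assumption: in-memory SQLite via pdo_sqlite, used only for illustration.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$name = "O'Reilly";

// Generic escaping: puts a backslash in front of the quote.
$slashed = addslashes($name);

// Driver-specific quoting: SQLite doubles the quote instead.
$quoted = $pdo->quote($name);

// The addslashes() version is not valid SQL for this database.
$failed = false;
try {
    $pdo->query("SELECT '" . $slashed . "'");
} catch (PDOException $e) {
    $failed = true;
}

// The driver-quoted literal round-trips to the original value.
$value = $pdo->query('SELECT ' . $quoted)->fetchColumn();

echo $slashed, "\n", $quoted, "\n";
```

Bind parameters avoid the question altogether, since the value never becomes part of the SQL text.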
In the interest of avoiding SQL injection attacks, I'm looking to cleanse all of the text (and most other data) entered by the user of my website before sending it into the database for storage.
I was under the impression that the mysqli_real_escape_string function inserted backslashes ( \ ) before all characters capable of being malicious ( \n , ' , " , etc ), and expected that the returned string would contain the newly added backslashes.
I performed a simple test on a made up string containing such potentially malicious characters and echo'd it to the document, seeing exactly what I expected: the string with backslashes escaping these characters.
So, I proceeded to add the cleansing function to the data before storing into the database. I inserted it (mysqli_real_escape_string( $link , $string)) into the query I build for data storage. Testing the script, I was surprised (a bit to my chagrin) to notice that the data stored in the database did not seem to contain the backslashes. I tested and tested and tested, but all to no avail, and I'm at a loss...
Any suggestions? Am I missing something? I was expecting to then have to remove the backslashes with the stripslashes($string) function, but there doesn't seem to be anything to strip...
When you view your data in the database after a successful insert, having escaped it with mysql_real_escape_string(), you will not see the backslashes in the database. This is because the escaping backslashes are only needed in the SQL query statement. mysql_real_escape_string() sanitizes it for insert (or update, or other query input) but doesn't result in a permanently modified version of the data when it is stored.
In general, you do not want to store modified or sanitized data in your database, but instead should be storing the data in its original version. For example, it is best practice to store complete HTML strings, rather than to store HTML that has been encoded with PHP's htmlspecialchars().
When you retrieve it back out from the database, there is no need for stripslashes() or other similar unescaping. There are some legacy (mis-)features of PHP like magic_quotes_gpc that had been designed to protect programmers from themselves by automatically adding backslashes to quoted strings, requiring stripslashes() to be used on output, but those features have been deprecated and now mostly removed.
MySQL stores the data without the slashes (although it is passed to the RDBMS with the slashes). So you don't need to use stripslashes() later on.
You can be sure that the string was escaped, because otherwise the query would have failed.
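The behaviour is easy to check. The sketch below assumes pdo_sqlite and an in-memory database rather than MySQL, and a made-up notes table. SQLite escapes by doubling the quote instead of adding a backslash, but the principle is the same: the escape characters exist only in the SQL text, not in the stored value.

```php
<?php
// Assumption: in-memory SQLite via pdo_sqlite stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE notes (body TEXT)');

$raw = "She said: it's \"fine\"";

// The escaped SQL literal differs from the raw string...
$literal = $pdo->quote($raw);
$pdo->exec("INSERT INTO notes (body) VALUES ($literal)");

// ...but what comes back out is exactly the raw string.
$stored = $pdo->query('SELECT body FROM notes')->fetchColumn();

echo $literal, "\n", $stored, "\n";
```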
I'm looking to cleanse all of the text (and most other data) entered by the user of my website
This is what you are doing wrong.
mysqli_real_escape_string does not "cleanse" anything. There is no word "cleanse" in its name.
You should format, not "cleanse" your data. And different data require different formatting.
You should format ALL the data, not only "data entered by the user of my website".
In the current form you are leaving your site highly vulnerable to attacks and errors.
I was under the impression that the function inserted backslashes ( \ ) before all characters capable of being malicious ( \n , ' , " , etc ),
For the record, there is nothing malicious in any character. There are some characters with special meaning that can be misinterpreted in some circumstances.
But adding backslashes doesn't make your data automatically "safe". Some injections don't require any special characters. So you need to properly format your data, not just rely on some sort of magic that will make you safe.
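Here is a sketch of an injection that needs no special characters at all. It assumes pdo_sqlite with an in-memory database and an invented users table. The escape_only() helper is hypothetical: it strips the surrounding quotes from PDO::quote() to mimic an escaping function that returns only the escaped string.

```php
<?php
// Assumption: in-memory SQLite via pdo_sqlite; table and rows are made up.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO users (name) VALUES ('alice'), ('bob'), ('carol')");

// Hypothetical helper: escape a string without adding the outer quotes.
function escape_only(PDO $pdo, string $s): string
{
    return substr($pdo->quote($s), 1, -1);
}

// Hostile input with nothing to escape: no quotes, no backslashes.
$input   = '1 OR 1=1';
$escaped = escape_only($pdo, $input);

// Used unquoted in a numeric context, the "escaped" value still injects.
$unsafeRows = $pdo->query("SELECT name FROM users WHERE id = $escaped")
                  ->fetchAll(PDO::FETCH_COLUMN);

// Formatting the value as what it is meant to be (an integer) fixes it.
$stmt = $pdo->prepare('SELECT name FROM users WHERE id = :id');
$stmt->bindValue(':id', (int) $input, PDO::PARAM_INT);
$stmt->execute();
$safeRows = $stmt->fetchAll(PDO::FETCH_COLUMN);

echo count($unsafeRows), ' vs ', count($safeRows), "\n";
```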
I have a simple textbox in a form and I want to safely store special characters in the database after POST or GET and I use the code below.
$text=mysql_real_escape_string(htmlspecialchars_decode(stripslashes(trim($_GET["text"])),ENT_QUOTES));
When I read the text from the database and put it in the text value I use the code below.
$text=htmlspecialchars($text_from_DB,ENT_QUOTES,'UTF-8',false);
<input type="text" value="<?=$text?>" />
I am trying to save the text in the database without HTML entities (meaning I don't want to write &quot; or &#039; into the database field).
So when writing to the database, I apply htmlspecialchars_decode to the text.
When writing to the form text box, I apply htmlspecialchars to the text.
Is this the best approach for safely writing special chars to the database?
You have the right idea of keeping the text in the database as raw. Not sure what all the HTML entity stuff is for; you shouldn't need to be doing that for a database insertion.
[The only reason I can think of why you might try to entity-decode incoming input for the database would be if you find you are getting character references like &#352; in your form submission input. If that's happening, it's because the user is inputting characters that don't exist in the encoding used by the page with the form. This form of encoding is totally bogus because you then can't distinguish between the user typing Š and literally typing &#352;! You should avoid this by using the UTF-8 encoding for all your pages and content, as every possible character fits in this encoding.]
Strings in your script should always be raw text with no escaping. That means you don't do anything to them until the time you output them into a context that isn't plain-text. So for putting them into an SQL string:
$category= trim($_POST['category']);
mysql_query("SELECT * FROM things WHERE category='".mysql_real_escape_string($category)."'");
(or use parameterised queries to avoid having to manually escape it.) When putting content into HTML:
<input type="text" name="category" value="<?php echo htmlspecialchars($category); ?>" />
(you can define a helper function with a shorter name like function h($s) { echo htmlspecialchars($s, ENT_QUOTES); } if you want to cut down on the amount of typing you have to do in templates.)
And... that's pretty much it. You don't need to process strings that come out of the database, as they're already raw strings. You don't need to process input strings(*), other than any application-specific field validation you want to do.
*: well, except if magic_quotes_gpc is turned on, in which case you do either need to stripslashes() everything that comes in from get/post/cookie, or, my favoured option, just immediately fail:
if (get_magic_quotes_gpc())
die(
'Magic quotes are turned on. They are utterly bogus and no-one should use them. '.
'Turn them off, you idiot, or I refuse to run. So there!'
);
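A small variation on the h() helper mentioned in this answer, sketched so that it returns the encoded string instead of echoing it, which makes it easier to check. The sample value is made up.

```php
<?php
// Variation on the helper above: returns the encoded string.
function h(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

// Raw value, as it would come out of the database or the request.
$category = 'say "hi" & <wave>';

// Encode only at the moment the value is placed into the attribute.
$html = '<input type="text" name="category" value="' . h($category) . '" />';

echo $html, "\n";
```

In a template this would be used as value="<?php echo h($category); ?>".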
When you write to db, use htmlentities but when you read back, use html_entity_decode function.
As a sidenote, if you are looking for some security, then for strings use mysql_real_escape_string and for numbers use intval.
I'd like to point out a couple of things:
there is nothing wrong with saving characters like ' and " in a database. SQL injections are just a matter of string manipulation; they actually have nothing to do with SQL or databases. The problem lies only in how the query string is built. If you want to write your own queries (not recommended), you don't have to encode every apostrophe or double quote: just escape them once to build a safe string, and save them in the database. A better approach is to use PDO, as mentioned, or the mysqli extension, which allows queries with prepared statements
htmlentities() and similar functions should be used when sending data as output to the browser, not for encoding data to be stored in a database for at least two reasons: first of all it's useless, the DB doesn't care about html entities, it just contains data; secondly you should always treat data coming from the database as potentially insecure, so you should save it in "raw" format and encode it when using it.
The best approach to safely write to a DB is to use the PDO abstraction layer and make use of prepared statements.
http://www.php.net/manual/en/intro.pdo.php
A good tutorial (I learned from this one) is
http://www.phpro.org/tutorials/Introduction-to-PHP-PDO.html
However, you might have to rewrite a lot of your site just to implement this. But it is without doubt a more elegant method than making use of all those functions. Plus, prepared statements are becoming the de facto standard. Another benefit is that you may not have to rewrite your queries if you switch to a different database (such as from MySQL to PostgreSQL). I would say consider this if you plan to scale your site.