I've been working on a code that escapes your posts if they are strings before you enter them in DB, is it an good idea? Here is the code: (Updated to numeric)
static function securePosts(){
$posts = array();
foreach($_POST as $key => $val){
if(!is_numeric($val)){
if(is_string($val)){
if(get_magic_quotes_gpc())
$val = stripslashes($val);
$posts[$key] = mysql_real_escape_string($val);
}
}else
$posts[$key] = $val;
}
return $posts;
}
Then in an other file:
if(isset($_POST)){
$post = ChangeHandler::securePosts();
if(isset($post['user'])){
AddUserToDbOrWhatEver($post['user']);
}
}
Is this good or will it have bad effects when escaping before even entering it in the function (addtodborwhater)
When working with user-input, one should distinguish between validation and escaping.
Validation
There you test the content of the user-input. If you expect a number, you check if this is really a numerical input. Validation can be done as early as possible. If the validation fails, you can reject it immediately and return with an error message.
Escaping
Here you bring the user-input into a form, that can not damage a given target system. Escaping should be done as late as possible and only for the given system. If you want to store the user-input into a database, you would use a function like mysqli_real_escape_string() or a parameterized PDO query. Later if you want to output it on an HTML page you would use htmlspecialchars().
It's not a good idea to preventive escape the user-input, or to escape it for several target systems. Each escaping can corrupt the original value for other target systems, you can loose information this way.
P.S.
As YourCommonSense correctly pointed out, it is not always enough to use escape functions to be safe, but that does not mean that you should not use them. Often the character encoding is a pitfall for security efforts, and it is a good habit to declare the character encoding explicitely. In the case of mysqli this can be done with $db->set_charset('utf8'); and for HTML pages it helps to declare the charset with a meta tag.
It is ALWAYS a good idea to escape user input BEFORE inserting anything in database. However, you should also try to convert values, that you expect to be a number to integers (signed or unsigned). Or better - you should use prepared SQL statements. There is a lot of info of the latter here and on PHP docs.
Related
I am a PHP newbie and am working on a basic form validation script. I understand that input filtering and output escaping are both vital for security reasons. My question is whether or not the code I have written below is adequately secure? A few clarifying notes first.
I understand there is a difference between sanitizing and validating. In the example field below, the field is plain text, so all I need to do is sanitize it.
$clean['myfield'] is the value I would send to a MySQL database. I am using prepared statements for my database interaction.
$html['myfield'] is the value I am sending back to the client so that when s/he submits the form with invalid/incomplete data, the sanitized fields that have data in them will be repopulated so they don't have to type everything in from scratch.
Here is the (slightly cleaned up) code:
$clean = array();
$html = array();
$_POST['fname'] = filter_var($_POST['fname'], FILTER_SANITIZE_STRING);
$clean['fname'] = $_POST['fname'];
$html['fname'] = htmlentities($clean['fname'], ENT_QUOTES, 'UTF-8');
if ($_POST['fname'] == "") {
$formerrors .= 'Please enter a valid first name.<br/><br/>';
}
else {
$formerrors .= 'Name is valid!<br/><br/>';
}
Thanks for your help!
~Jared
I understand that input filtering and output escaping are both vital for security reasons.
I'd say rather that output escaping is vital for security and correctness reasons, and input filtering is potentially-useful measure for defence-in-depth and to enforce specific application rules.
The input filtering step and the output escaping step are necessarily separate concerns, and cannot be combined into one step, not least because there are many different types of output escaping, and the right one has to be chosen for each output context (eg HTML-escaping in a page, URL-escaping to make a link, SQL-escaping, and so on).
Unfortunately PHP is traditionally very hazy on these issues and so offers a bunch of mixed-message functions that are likely to mislead you.
In the example field below, the field is plain text, so all I need to do is sanitize it.
Yes. Alas, FILTER_SANITIZE_STRING is not in any way a sane sanitiser. It completely removes some content (strip_tags, which is itself highly non-sensible) whilst HTML-escaping other content. eg quotes turn into ". This is a nonsense.
Instead, for input sanitisation, look at:
checking it's a valid string for the encoding you're using (hopefully UTF-8; see eg this regex for that);
removing control characters, U+0000–U+001F and U+007F–U+009F. Allow the newline through only on deliberate multi-line text fields;
removing the characters that are not suitable for use in markup;
validating the input conforms to application requirements on a field-by-field basis, for data whose content model is more specific than arbitrary text strings. Although your escaping should handle a < character correctly, it's probably a good idea to get rid of it early in fields where it makes no sense to have one.
For the output escaping step I'd generally prefer htmlspecialchars() to htmlentities(), though your correct use of the UTF-8 argument stops the latter function breaking in the way it usually does.
Depending on what you want to secure, the filter you call might be overactive (see comments). Injectionwise you should be safe since you're using Prepared Statements (see this answer)
On a design note you might want to filter first, then check for empty values. Doing that you can shorten your code ;)
I understand that input filtering ... is vital for security reasons.
This is wrong statement.
Although it can be right in some circumstances, in such a generalised form it can do no good but false feeling of safety.
all I need to do is sanitize it.
There is no such thing like "general sanitizing". You have to understand each particular case and it's limitations. For example, for the database you need to use several different sanitization techniques, not one. While for the filenames it is going to be completely different one.
I am using prepared statements for my database interaction.
Thus, you should not touch the data at all. Just leave it as is.
Here is the (slightly cleaned up) code:
It seems there is some overkill in your code.
you are cleaning your HTML data twice while it is possible that you won't need it at all.
and for some reason you are raising an error on success.
I'd make it rather this way
$formerrors = '';
if ($_POST['fname'] == "") {
$formerrors .= 'Please enter a valid first name.<br/><br/>';
}
if (!$formerrors) {
$html = array();
foreach ($_POST as $key => $val) {
$html[$key] = htmlspecialchars($val,ENT_QUOTES);
}
}
I'm quite confused now and would like to know, if you could clear things up for me.
After the lateste Anon/Lulsec attacks, i was questioning my php/mysql security.
So, i thought, how could I protect both, PHP and Mysql.
Question: Could anyone explain me, what's best practice to handle PHP and Mysql when it comes to quotes?
Especially in forms, I would need some kind of htmlspecialchars in order to protect the html, correct?
Can PHP be exploitet at all with a form? Is there any kind of protection needed?
Should I use real_escape_string just before a query? Would it be wrong/bad to use it already within PHP (see sanitize_post function)?
Currently i'm using the following function. The function "sanitizes" all $_POST and $_GET variables. Is this "safe"?
function sanitize_post($array) {
global $db;
if(is_array($array)) {
foreach($array as $key=>$value) {
if(is_array($array[$key])) {
$array[$key] = sanitize_post($array[$key]);
} elseif(is_string($array[$key])) {
$array[$key] = $db->real_escape_string(strtr(stripslashes(trim($array[$key])), array("'" => '', '"' => '')));
}
}
} elseif(is_string($array)) {
$array = $db->real_escape_string(strtr(stripslashes(trim($array)), array("'" => '', '"' => '')));
}
return $array;
}
I'm using PHP 5.3.5 with Mysql 5.1.54.
Thanks.
mysql_real_escape_string deserves your attention.
However direct queries are a quagmire and no longer considered safe practice. You should read up on PDO prepared statements and binding parameters which has a side benefit of quoting, escaping, etc. built-in.
BEST practice is always to use prepared statements. This makes SQL injection impossible. This is done with either PDO or mysqli. Forget about all the mysql_* functions. They are old and obsolete.
Question: Could anyone explain me, what's best practice to handle PHP
and Mysql when it comes to quotes?
That's easy: Use prepared statements, e. g. with PDO::prepare or mysqli_prepare.
There is nothing like "universal sanitization". Let's call it just quoting, because that's what its all about.
When quoting, you always quote text for some particular output, like:
string value for mysql query
like expression for mysql query
html code
json
mysql regular expression
php regular expression
For each case, you need different quoting, because each usage is present within different syntax context. This also implies that the quoting shouldn't be made at the input into PHP, but at the particular output! Which is the reason why features like magic_quotes_gpc are broken (always assure it is switched off!!!).
So, what methods would one use for quoting in these particular cases? (Feel free to correct me, there might be more modern methods, but these are working for me)
mysql_real_escape_string($str)
mysql_real_escape_string(addcslashes($str, "%_"))
htmlspecialchars($str)
json_encode() - only for utf8! I use my function for iso-8859-2
mysql_real_escape_string(addcslashes($str, '^.[]$()|*+?{}')) - you cannot use preg_quote in this case because backslash would be escaped two times!
preg_quote()
Don't waste the effort using mysql_real_escape_string() or anything like that. Just use prepared statements with PDO and SQL injection is impossible.
I usually use the PHP functions stripslashes and strip_tags on the variables as they come in via $_POST (or $_GET, depending on what you use) and mysql_real_escape_string during the query. (I'm not sure if this is "right" but it's worked for me so far.) You can also use PHP's built in validate filters to check things like email addresses, url's, data types, etc. PDO is supposedly decent at preventing SQL injection but I haven't had any experience with it yet.
The basic workflow should be
$data = $_POST['somefield which will go into the database'];
... do data validation ...
if (everything ok) {
$escaped_data = escape_function($data);
$sql = " ... query here with $escaped_data ... ";
do_query($sql);
}
Basically, data that's been escaped for database insertion should ONLY be used for database insertion. There's no point in pre-processing everything and overwriting all data with db-escaped values, when only 2 or 3 of 50(say) values actually go anywhere near the db.
Ditto for htmlspecialchars. Don't send data through htmlspecialchars unless it's headed for an HTML-type display.
Don't store data in the DB formatted for one particular purpose, because if you ever need the data in a different form for some other purpose, you have to undo the escaping. Always store raw/unformatted data in the db. And note: the escaping done with mysql_real_escape_string() and company does not actually get stored in the db. It's there only to make sure the data gets into the database SAFELY. What's actually stored in the db is the raw unescaped/unquoted data. Once it's in the database, it's "safe".
e.g. consider the escaping functions as handcuffs on a prisoner being transferred. While the prisoner is inside either jail, cuffs are not needed.
I have done some research and still confused, This is my outcome of that research. Can someone please comment and advise to how I can make these better or if there is a rock solid implementation already out there I can use?
Method 1:
array_map('trim', $_GET);
array_map('stripslashes', $_GET);
array_map('mysql_real_escape_string', $_GET);
Method 2:
function filter($data) {
$data = trim(htmlentities(strip_tags($data)));
if (get_magic_quotes_gpc())
$data = stripslashes($data);
$data = mysql_real_escape_string($data);
return $data;
}
foreach($_GET as $key => $value) {
$data[$key] = filter($value);
}
Both methods you show are not recommendable
Blanket "sanitizing" data is counter-productive, because data needs to be sanitised in different ways depending on how it is going to be used: Using it in a database query needs different sanitation from outputting it in HTML, or from using it as parameters in a command line call, etc. etc.
The best way to do sanitation is immediately before the data is being used. That way, it is easy for the programmer to see whether all data is actually getting sanitized.
If you use the mysql_* family of functions, do a mysql_real_escape_string() on every argument you use in a query:
$safe_name = mysql_real_escape_string($_POST["name"]);
$safe_address = mysql_real_escape_string($_POST["address"]);
$result = mysql_query ("INSERT INTO table VALUES '$safe_name', '$safe_address'");
If you use the PDO or mysqli families of functions, you can make use of parametrized queries, which eliminate most of the SQL injection woes - all everyday ones at any rate.
It is perfectly possible to write safe queries with mysql_*, and it is also perfectly possible to introduce a security hole in a PDO query, but if you are starting from scratch, consider using PDO or mysqli straight away.
Use strip_tags() only if you are planning to output user entered data in HTML; note that you need to do an additional htmlspecialchars() to reliably prevent XSS attacks.
The only blanket sanitation method that has some merit is the
if (get_magic_quotes_gpc())
$data = stripslashes($data);
call which filters out the layer of escaping added by the now-deprecated "magic quotes" feature of earlier versions of PHP.
I think this is enough (EDIT: with $data we mean here e.g. one text field from a form, $_POST['example'] etc):
if (get_magic_quotes_gpc())
$data = stripslashes($data);
$data = mysql_real_escape_string($data);
I usually do the first when receiving the $_POST or $_GET data (before input testing) and the latter right before - or during - composing of the sql query.
Additionaly trim it, if you want (maybe you don't always want that). However the best solution would be using some libraries for working with database.
You can either:
Escape all user input supposed for the DB using mysql_real_escape_string (or the mysqli_variant)
Use prepared statements (with mysqli_ or PDO)
Note that you should turn off magic_quotes if possible. If it's on, just use stripslashes (like in your example) as it's deprecated and you should not rely on it.
Also note that you should not do this for data which is not supposed to be inserted into the database as it's completely useless.
I like type-casting whenever you're dealing with ints;
e.g.
$id=(int)$_GET['id'];
if($id<1){//I generally don't pass around 0 or negative numbers
//Injection Attempt
die('Go Away');
}
//Continue with the number I **know** is a number
Well, among your codes only mysql_real_escape_string() function is SQL injection related. However, being completely misplaced.
And it's not enough.
In fact, this function should be used with quoted strings only.
And it's useless for the everything else. So, in any other case other precautions should be taken.
For the complete explanation see my earlier answer: In PHP when submitting strings to the database should I take care of illegal characters using htmlspecialchars() or use a regular expression?
All other functions you're using have no relation to SQL at all.
Note that using both strip_tags and htmlentities is redundant. and htmlentities is obsolete.
I'd use only htmlspecialchars() instead of these two, and not before inserting into database but right before sending it to browser.
function sanitizeString($var)
{
$var = stripslashes($var);
$var = htmlentities($var);
$var = strip_tags($var);
return $var;
}
function sanitizeMySQL($var)
{
$var = mysql_real_escape_string($var);
$var = sanitizeString($var);
return $var;
}
I got these two functions from a book and the author says that by using these two, I can be extra safe against XSS(the first function) and sql injections(2nd func).
Are all those necessary?
Also for sanitizing, I use prepared statements to prevent sql injections.
I would use it like this:
$variable = sanitizeString($_POST['user_input']);
$variable = sanitizeMySQL($_POST['user_input']);
EDIT: Get rid of strip_tags for the 1st function because it doesn't do anything.
Would using these two functions be enough to prevent the majority of attacks and be okay for a public site?
To be honest, I think the author of these function has either no idea what XSS and SQL injections are or what exactly the used function do.
Just to name two oddities:
Using stripslashes after mysql_real_escape_string removes the slashes that were added by mysql_real_escape_string.
htmlentities replaces the chatacters < and > that are used in strip_tags in order to identify tags.
Furthermore: In general, functions that protect agains XSS are not suitable to protect agains SQL injections and vice versa. Because each language and context hast its own special characters that need to be taken care of.
My advice is to learn why and how code injection is possible and how to protect against it. Learn the languages you are working with, especially the special characters and how to escape these.
Edit Here’s some (probably weird) example: Imagine you allow your users to input some value that should be used as a path segment in a URI that you use in some JavaScript code in a onclick attribute value. So the language context looks like this:
HTML attribute value
JavaScript string
URI path segment
And to make it more fun: You are storing this input value in a database.
Now to store this input value correctly into your database, you just need to use a proper encoding for the context you are about to insert that value into your database language (i.e. SQL); the rest does not matter (yet). Since you want to insert it into a SQL string declaration, the contextual special characters are the characters that allow you to change that context. As for string declarations these characters are (especially) the ", ', and \ characters that need to be escaped. But as already stated, prepared statements do all that work for you, so use them.
Now that you have the value in your database, we want to output them properly. Here we proceed from the innermost to the outermost context and apply the proper encoding in each context:
For the URI path segment context we need to escape (at least) all those characters that let us change that context; in this case / (leave current path segment), ?, and # (both leave URI path context). We can use rawurlencode for this.
For the JavaScript string context we need to take care of ", ', and \. We can use json_encode for this (if available).
For the HTML attribute value we need to take care of &, ", ', and <. We can use htmlspecialchars for this.
Now everything together:
'… onclick="'.htmlspecialchars('window.open("http://example.com/'.json_encode(rawurlencode($row['user-input'])).'")').'" …'
Now if $row['user-input'] is "bar/baz" the output is:
… onclick="window.open("http://example.com/"%22bar%2Fbaz%22"")" …
But using all these function in these contexts is no overkill. Because although the contexts may have similar special characters, they have different escape sequences. URI has the so called percent encoding, JavaScript has escape sequences like \" and HTML has character references like ". And not using just one of these functions will allow to break the context.
It's true, but this level of escaping may not be appropriate in all cases. What if you want to store HTML in a database?
Best practice dictates that, rather than escaping on receiving values, you should escape them when you display them. This allows you to account for displaying both HTML from the database and non-HTML from the database, and it's really where this sort of code logically belongs, anyway.
Another advantage of sanitizing outgoing HTML is that a new attack vector may be discovered, in which case sanitizing incoming HTML won't do anything for values that are already in the database, while outgoing sanitization will apply retroactively without having to do anything special
Also, note that strip_tags in your first function will likely have no effect, if all of the < and > have become < and >.
You are doing htmlentities (which turns all > into >) and then calling strip_tags which at this point will not accomplish anything more, since there are no tags.
If you're using prepared statements and SQL placeholders and never interpolating user input directly into your SQL strings, you can skip the SQL sanitization entirely.
When you use placeholders, the structure of the SQL statement (SELECT foo, bar, baz FROM my_table WHERE id = ?) is send to the database engine separately from the data values which are (eventually) bound to the placeholders. This means that, barring major bugs in the database engine, there is absolutely no way for the data values to be misinterpreted as SQL instructions, so this provides complete protection from SQL injection attacks without requiring you to mangle your data for storage.
No, this isn't overkill this is a vulnerability.
This code completely vulnerable to SQL Injection. You are doing a mysql_real_escape_string() and then you are doing a stripslashes(). So a " would become \" after mysql_real_escape_string() and then go back to " after the stripslashes(). mysql_real_escape_string() alone is best to stop sql injection. Parameterized query libraries like PDO and ADODB uses it, and Parameterized queries make it very easy to completely stop sql injection.
Go ahead test your code:
$variable = sanitizeString($_POST['user_input']);
$variable = sanitizeMySQL($_POST['user_input']);
mysql_query("select * from mysql.user where Host='".$variable."'");
What if:
$_POST['user_input'] = 1' or 1=1 /*
Patched:
mysql_query("select * from mysql.user where Host='".mysql_real_escape_string($variable)."'");
This code is also vulnerable to some types XSS:
$variable = sanitizeString($_POST['user_input']);
$variable = sanitizeMySQL($_POST['user_input']);
print("<body background='http://localhost/image.php?".$variable."' >");
What if:
$_POST['user_input']="' onload=alert(/xss/)";
patched:
$variable=htmlspecialchars($variable,ENT_QUOTES);
print("<body background='http://localhost/image.php?".$variable."' >");
htmlspeicalchars is encoding single and double quotes, make sure the variable you are printing is also encased in quotes, this makes it impossible to "break out" and execute code.
Well, if you don't want to reinvent the wheel you can use HTMLPurifier. It allows you to decide exactly what you want and what you don't want and prevents XSS attacks and such
I wonder about the concept of sanitization. You're telling Mysql to do exactly what you want it to do: run a query statement authored in part by the website user. You're already constructing the sentence dynamically using user input - concatenating strings with data supplied by the user. You get what you ask for.
Anyway, here's some more sanitization methods...
1) For numeric values, always manually cast at least somewhere before or while you build the query string: "SELECT field1 FROM tblTest WHERE(id = ".(int) $val.")";
2) For dates, convert the variable to unix timestamp first. Then, use the Mysql FROM_UNIXTIME() function to convert it back to a date. "SELECT field1 FROM tblTest WHERE(date_field >= FROM_UNIXTIME(".strtotime($val).")";. This is actually needed sometimes anyway to deal with how Mysql interprets and stores dates different from the script or OS layers.
3) For short and predictable strings that must follow a certain standard (username, email, phone number, etc), you can a) do prepared statements; or b) regex or other data validation.
4) For strings which wouldn't follow any real standard and which may or may not have pre- or double-escaped and executable code all over the place (text, memos, wiki markup, links, etc), you can a) do prepared statements; or b) store to and convert from binary/blob form - converting each character to binary, hex, or decimal representation before even passing the value to the query string, and converting back when extracting. This way you can focus more on just html validation when you spit the stored value back out.
I have one "go" script that fetches any other script requested and this is what I wrote to sanitize user input:
foreach ($_REQUEST as $key => $value){
if (get_magic_quotes_gpc())
$_REQUEST[$key] = mysql_real_escape_string(stripslashes($value));
else
$_REQUEST[$key] = mysql_real_escape_string($value);
}
I haven't seen anyone else use this approach. Is there any reason not to?
EDIT - amended for to work for arrays:
function mysql_escape($thing) {
if (is_array($thing)) {
$escaped = array();
foreach ($thing as $key => $value) {
$escaped[$key] = mysql_escape($value);
}
return $escaped;
}
// else
if (get_magic_quotes_gpc()) $thing = stripslashes($thing);
return mysql_real_escape_string($thing);
}
foreach ($_REQUEST as $key => $value){
$_REQUEST[$key] = mysql_escape($value);
}
I find it much better to escape the data at the time it is used, not on the way in. You might want to use that data in JSON, XML, Shell, MySQL, Curl, or HTML and each will have it's own way of escaping the data.
Lets have a quick review of WHY escaping is needed in different contexts:
If you are in a quote delimited string, you need to be able to escape the quotes.
If you are in xml, then you need to separate "content" from "markup"
If you are in SQL, you need to separate "commands" from "data"
If you are on the command line, you need to separate "commands" from "data"
This is a really basic aspect of computing in general. Because the syntax that delimits data can occur IN THE DATA, there needs to be a way to differentiate the DATA from the SYNTAX, hence, escaping.
In web programming, the common escaping cases are:
1. Outputting text into HTML
2. Outputting data into HTML attributes
3. Outputting HTML into HTML
4. Inserting data into Javascript
5. Inserting data into SQL
6. Inserting data into a shell command
Each one has a different security implications if handled incorrectly. THIS IS REALLY IMPORTANT! Let's review this in the context of PHP:
Text into HTML:
htmlspecialchars(...)
Data into HTML attributes
htmlspecialchars(..., ENT_QUOTES)
HTML into HTML
Use a library such as HTMLPurifier to ENSURE that only valid tags are present.
Data into Javascript
I prefer json_encode. If you are placing it in an attribute, you still need to use #2, such as
Inserting data into SQL
Each driver has an escape() function of some sort. It is best. If you are running in a normal latin1 character set, addslashes(...) is suitable. Don't forget the quotes AROUND the addslashes() call:
"INSERT INTO table1 SET field1 = '" . addslashes($data) . "'"
Data on the command line
escapeshellarg() and escapeshellcmd() -- read the manual
--
Take these to heart, and you will eliminate 95%* of common web security risks! (* a guess)
If you have arrays in your $_REQUEST their values won't be sanitized.
I've made and use this one:
<?php
function _clean($var){
$pattern = array("/0x27/","/%0a/","/%0A/","/%0d/","/%0D/","/0x3a/",
"/union/i","/concat/i","/delete/i","/truncate/i","/alter/i","/information_schema/i",
"/unhex/i","/load_file/i","/outfile/i","/0xbf27/");
$value = addslashes(preg_replace($pattern, "", $var));
return $value;
}
if(isset($_GET)){
foreach($_GET as $k => $v){
$_GET[$k] = _clean($v);
}
}
if(isset($_POST)){
foreach($_POST as $k => $v){
$_POST[$k] = _clean($v);
}
}
?>
Your approach tries to sanitize all the request data for insertion into the database, but what if you just wanted to output it? You will have unnecessary backslashes in your output. Also, escaping is not a good strategy to protect from SQL exceptions anyway. By using parametrized queries (e.g. in PDO or in MySQLi) you "pass" the problem of escaping to the abstraction layer.
Apart from the lack of recursion into arrays and the unnecessary escaping of, say, integers, this approach encodes data for use in an SQL statement before sanitization. mysql_real_escape_string() escapes data, it doesn't sanitize it -- escaping and sanitizing aren't the same thing.
Sanitization is the task many PHP scripts have of scrutinizing input data for acceptability before using it. I think this is better done on data that hasn't been escaped. I generally don't escape data until it goes into the SQL. Those who prefer to use Prepared Statements achieve the same that way.
One more thing: If input data can include utf8 strings, it seems these ought to be validated before escaping. I often use a recursive utf8 cleaner on $_POST before sanitization.