PHP Form Input Filtering

I am a PHP newbie and am working on a basic form validation script. I understand that input filtering and output escaping are both vital for security reasons. My question is whether or not the code I have written below is adequately secure? A few clarifying notes first.
I understand there is a difference between sanitizing and validating. In the example field below, the field is plain text, so all I need to do is sanitize it.
$clean['myfield'] is the value I would send to a MySQL database. I am using prepared statements for my database interaction.
$html['myfield'] is the value I am sending back to the client so that when s/he submits the form with invalid/incomplete data, the sanitized fields that have data in them will be repopulated so they don't have to type everything in from scratch.
Here is the (slightly cleaned up) code:
$clean = array();
$html = array();
$_POST['fname'] = filter_var($_POST['fname'], FILTER_SANITIZE_STRING);
$clean['fname'] = $_POST['fname'];
$html['fname'] = htmlentities($clean['fname'], ENT_QUOTES, 'UTF-8');
if ($_POST['fname'] == "") {
    $formerrors .= 'Please enter a valid first name.<br/><br/>';
}
else {
    $formerrors .= 'Name is valid!<br/><br/>';
}
Thanks for your help!
~Jared

"I understand that input filtering and output escaping are both vital for security reasons."
I'd say rather that output escaping is vital for security and correctness reasons, and input filtering is a potentially useful measure for defence-in-depth and to enforce specific application rules.
The input filtering step and the output escaping step are necessarily separate concerns, and cannot be combined into one step, not least because there are many different types of output escaping, and the right one has to be chosen for each output context (eg HTML-escaping in a page, URL-escaping to make a link, SQL-escaping, and so on).
Unfortunately PHP is traditionally very hazy on these issues and so offers a bunch of mixed-message functions that are likely to mislead you.
"In the example field below, the field is plain text, so all I need to do is sanitize it."
Yes. Alas, FILTER_SANITIZE_STRING is not in any way a sane sanitiser. It completely removes some content (strip_tags, which is itself highly non-sensible) whilst HTML-escaping other content, eg quotes turn into &#34;. This is nonsense.
Instead, for input sanitisation, look at:
checking it's a valid string for the encoding you're using (hopefully UTF-8; see eg this regex for that);
removing control characters, U+0000–U+001F and U+007F–U+009F. Allow the newline through only on deliberate multi-line text fields;
removing the characters that are not suitable for use in markup;
validating the input conforms to application requirements on a field-by-field basis, for data whose content model is more specific than arbitrary text strings. Although your escaping should handle a < character correctly, it's probably a good idea to get rid of it early in fields where it makes no sense to have one.
For the output escaping step I'd generally prefer htmlspecialchars() to htmlentities(), though your correct use of the UTF-8 argument stops the latter function breaking in the way it usually does.
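A minimal sketch of the steps listed above, assuming UTF-8 input. The function names sanitise_text() and html_escape() are hypothetical, and the field-by-field validation step is left out because it depends on the application:

```php
<?php
// Hypothetical helper: returns null when the input is not valid UTF-8,
// otherwise the text with control characters removed.
function sanitise_text(string $raw, bool $multiline = false): ?string
{
    // With the /u modifier, preg_match() returns false on invalid UTF-8.
    if (preg_match('//u', $raw) !== 1) {
        return null;
    }
    // Strip U+0000-U+001F and U+007F-U+009F; keep \n only for multi-line fields.
    $pattern = $multiline
        ? '/[\x{00}-\x{09}\x{0B}-\x{1F}\x{7F}-\x{9F}]/u'
        : '/[\x{00}-\x{1F}\x{7F}-\x{9F}]/u';
    return preg_replace($pattern, '', $raw);
}

// Output escaping is a separate step, done only when writing into HTML.
function html_escape(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
```

The sanitised value is what would go to the prepared statement; html_escape() is applied only at the point where the value is written back into the form.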

Depending on what you want to secure, the filter you call might be overactive (see comments). Injection-wise you should be safe, since you're using prepared statements (see this answer).
On a design note you might want to filter first, then check for empty values. Doing that you can shorten your code ;)

"I understand that input filtering ... is vital for security reasons."
This is a wrong statement.
Although it can be right in some circumstances, in such a generalised form it does no good and only gives a false feeling of safety.
"all I need to do is sanitize it."
There is no such thing as "general sanitizing". You have to understand each particular case and its limitations. For example, for the database you need to use several different sanitization techniques, not one, while for filenames it is going to be a completely different one.
"I am using prepared statements for my database interaction."
Thus, you should not touch the data at all. Just leave it as is.
"Here is the (slightly cleaned up) code:"
It seems there is some overkill in your code.
You are cleaning your HTML data twice, while it is possible that you won't need it at all.
And for some reason you are raising an error on success.
I'd rather do it this way:
$formerrors = '';
if ($_POST['fname'] == "") {
    $formerrors .= 'Please enter a valid first name.<br/><br/>';
}
if (!$formerrors) {
    $html = array();
    foreach ($_POST as $key => $val) {
        $html[$key] = htmlspecialchars($val,ENT_QUOTES);
    }
}

Related

Checkmarx XSS while using htmlpurifier

I have a php page that echoes something like this:
echo "<div>" . $_REQUEST["id"] . "</div>";
This leads to an XSS issue, which I tried to fix using htmlpurifier through a function that cleans $_REQUEST by reference, leading to this code:
function sanitizer(array &$array) {
    global $htmlpurifierInstance; // assumed: an HTMLPurifier object created elsewhere
    foreach ($array as $key => $value) {
        $array[$key] = $htmlpurifierInstance->purify($value);
    }
}
sanitizer($_REQUEST);
echo "<div>" . $_REQUEST["id"] . "</div>";
After another Checkmarx test, the issue still pops up. What's the fix for this issue?

Sanitising HTML should be a very rare requirement, not something you do regularly on all input.
Whenever a value has a limited range of valid values, validate it. Reject it or unset it if it's not valid. So if "id" is supposed to be a number, reject non-numeric input.
Whenever outputting or sending any variable somewhere, escape it for the relevant context. In this case, you are outputting in an HTML context, so use htmlspecialchars. This is not something you can do ahead of time, because the same variable might be used in multiple contexts.
For the particular case of database queries, don't use escaping, use parameterised queries.
In the rare cases where you really need the user to be able to enter HTML, come up with a strict whitelist of tags and attributes they can use, and sanitise the particular variable based on that, as part of your input processing. (This is what HTMLPurifier is for.)
Never, ever, try to write a "universal" sanitising or escaping function. At best, you will end up mangling data by applying too many things at once; at worst, you'll defeat your own security.
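As a rough illustration of the first two points, here is a sketch that assumes "id" is meant to be a positive integer (the question does not say so). The function names are hypothetical, and whether a particular scanner accepts this pattern is not something the sketch can promise:

```php
<?php
// Hypothetical helper: the id as an int, or null if it is not a positive integer.
function validated_id($input): ?int
{
    $id = filter_var($input, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    return $id === false ? null : $id;
}

// Validate first, then escape for the HTML context at the point of output.
function render_id_div($input): string
{
    $id = validated_id($input);
    if ($id === null) {
        return '<div>Invalid id</div>';
    }
    return '<div>' . htmlspecialchars((string) $id, ENT_QUOTES, 'UTF-8') . '</div>';
}

echo render_id_div($_REQUEST['id'] ?? null), "\n";
```

The escaping call is redundant for a validated integer, but keeping it at the output site means the line stays safe if the validation rule changes later.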

Function escaping before inserting in mysql

I've been working on code that escapes your posts, if they are strings, before you enter them in the DB. Is it a good idea? Here is the code: (Updated to numeric)
static function securePosts(){
    $posts = array();
    foreach($_POST as $key => $val){
        if(!is_numeric($val)){
            if(is_string($val)){
                if(get_magic_quotes_gpc())
                    $val = stripslashes($val);
                $posts[$key] = mysql_real_escape_string($val);
            }
        }else
            $posts[$key] = $val;
    }
    return $posts;
}
Then in another file:
if(isset($_POST)){
    $post = ChangeHandler::securePosts();
    if(isset($post['user'])){
        AddUserToDbOrWhatEver($post['user']);
    }
}
Is this good, or will it have bad effects to escape the value before it even enters the function (AddUserToDbOrWhatEver)?

When working with user-input, one should distinguish between validation and escaping.
Validation
There you test the content of the user-input. If you expect a number, you check if this is really a numerical input. Validation can be done as early as possible. If the validation fails, you can reject it immediately and return with an error message.
Escaping
Here you bring the user-input into a form, that can not damage a given target system. Escaping should be done as late as possible and only for the given system. If you want to store the user-input into a database, you would use a function like mysqli_real_escape_string() or a parameterized PDO query. Later if you want to output it on an HTML page you would use htmlspecialchars().
It's not a good idea to escape the user-input preventively, or to escape it for several target systems. Each escaping can corrupt the original value for other target systems, and you can lose information this way.
P.S.
As YourCommonSense correctly pointed out, it is not always enough to use escape functions to be safe, but that does not mean that you should not use them. Often the character encoding is a pitfall for security efforts, and it is a good habit to declare the character encoding explicitly. In the case of mysqli this can be done with $db->set_charset('utf8'); and for HTML pages it helps to declare the charset with a meta tag.
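A sketch of "escape as late as possible, and only for the given system", using PDO with a parameterized query. The users table, its columns, and the function names are hypothetical; with PDO and MySQL the charset can be declared in the DSN:

```php
<?php
// Connection example with placeholder values (charset declared explicitly):
//   $pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', $user, $pass);

// Store the raw value; the parameter travels separately from the SQL text.
function add_user(PDO $pdo, string $name): void
{
    $stmt = $pdo->prepare('INSERT INTO users (name) VALUES (:name)');
    $stmt->execute([':name' => $name]);
}

// HTML-escape only now, when the value enters an HTML page.
function user_names_as_html(PDO $pdo): string
{
    $html = '';
    foreach ($pdo->query('SELECT name FROM users ORDER BY id') as $row) {
        $html .= '<li>' . htmlspecialchars($row['name'], ENT_QUOTES, 'UTF-8') . "</li>\n";
    }
    return $html;
}
```

Because the database holds the unmodified value, the same row can later be sent to a different target (JSON, email, a terminal) with that target's own escaping.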

It is ALWAYS a good idea to escape user input BEFORE inserting anything into the database. However, you should also try to convert values that you expect to be numbers to integers (signed or unsigned). Or better, you should use prepared SQL statements. There is a lot of info on the latter here and in the PHP docs.

Sanitize user input in laravel

I've got a simple question:
When is it best to sanitize user input?
And which one of these is considered the best practice:
Sanitize data before writing to database.
Save raw data and sanitize it in the view.
For example use HTML::entities() and save result to database.
Or by using HTML methods in the views because in this case laravel by default uses HTML::entities().
Or maybe by using the both.
EDIT: I found interesting example http://forums.laravel.com/viewtopic.php?id=1789. Are there other ways to solve this?

I would say you need both locations but for different reasons. When data comes in you should validate the data according to the domain, and reject requests that do not comply. As an example, there is no point in allowing a tag (or text for that matter) if you expect a number. For a parameter representing a year, you may even want to check that it is within some range.
Sanitization kicks in for free text fields. You can still do simple validation for unexpected characters like 0-bytes. IMHO it's best to store raw through safe sql (parameterized queries) and then correctly encode for output. There are two reasons. The first is that if your sanitizer has a bug, what do you do with all the data in your database? Resanitizing can have unwanted consequences. Secondly you want to do contextual escaping, for whichever output you are using (JSON, HTML, HTML attributes etc.)
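The incoming-data half of that advice might look like the plain PHP sketch below. The year range 1900 to 2100 and both function names are assumptions made for illustration only:

```php
<?php
// Domain validation: a year must be an integer inside an (assumed) range.
function validate_year($input): ?int
{
    $year = filter_var($input, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1900, 'max_range' => 2100],
    ]);
    return $year === false ? null : $year;
}

// Free text: reject 0-bytes, otherwise keep the value raw for storage.
function is_acceptable_text(string $input): bool
{
    return strpos($input, "\0") === false;
}
```

Anything that passes is stored as-is through a parameterized query; HTML, JSON or attribute encoding happens later, in the code that produces that output.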

I have a full article on input filtering in Laravel, you might find it useful http://usman.it/xss-filter-laravel/, here is the excerpt from this article:
You can do a global XSS clean yourself. If you don't have a library for common methods you need frequently, then I suggest you create a new library Common in application/library. Put these two methods in your Common library:
/*
 * Method to strip tags globally.
 */
public static function global_xss_clean()
{
    // Recursive cleaning for array [] inputs, not just strings.
    $sanitized = static::array_strip_tags(Input::get());
    Input::merge($sanitized);
}
public static function array_strip_tags($array)
{
    $result = array();
    foreach ($array as $key => $value) {
        // Don't allow tags on key either, maybe useful for dynamic forms.
        $key = strip_tags($key);
        // If the value is an array, we will just recurse back into the
        // function to keep stripping the tags out of the array,
        // otherwise we will set the stripped value.
        if (is_array($value)) {
            $result[$key] = static::array_strip_tags($value);
        } else {
            // I am using strip_tags(), you may use htmlentities(),
            // also I am doing trim() here, you may remove it, if you wish.
            $result[$key] = trim(strip_tags($value));
        }
    }
    return $result;
}
Then put this code in the beginning of your before filter (in application/routes.php):
//Our own method to defend XSS attacks globally.
Common::global_xss_clean();

I just found this question. Another way to do it is to enclose dynamic output in triple brackets like this {{{ $var }}} and blade will escape the string for you. That way you can keep the potentially dangerous characters in case they are important somewhere else in the code and display them as escaped strings.

I found this because I was worried about XSS in Laravel, so here is the package from gvlatko.
It is easy:
To clean inputs: $cleaned = Xss::clean(Input::get('comment'));
To use in views: $cleaned = Xss::clean(Input::file('profile'), TRUE);

It depends on the user input. If you're generally going to be outputting code they may provide (for example maybe it's a site that provides code snippets), then you'd sanitize on output. It depends on the context. If you're asking for a username, and they're entering HTML tags, your validation should be picking this up and going "no, this is not cool, man!"
If it's like the example I stated earlier (code snippets), then let it through as RAW (but be sure to make sure your database doesn't break), and sanitize on output. When using PHP, you can use htmlentities($string).

What are the necessary and most important things, we should do at validation, if an web application gets the users input or parameters?

I am always thinking about validation of any kind on a webpage (PHP or ASP, it doesn't matter), but I never find a good and accurate answer.
For example, I have a GET parameter which defines part of a SQL query, like DESC or ASC. (SQL injection?)
Or I have a comment function for users, where the data is also saved in a database.
Is it enough to check for HTML tags inside the data? Should the validation be done before adding the data to the database or before showing it on the page?
I am looking for the to-dos that should always be performed on any data coming from "outside".
Thanks.

Have a good idea of what you want from the user.
You want them to specify ascending/descending order? That's an enumeration (or a boolean), not part of an SQL query:
$query = "SELECT [...] ORDER BY field " . escape($_GET['sortOrder']); //wrong
This is wrong no matter how much you escape and sanitize their string, because this is not the way to validate an enumeration. Compare:
if ($_GET['sortOrder'] == 'desc') {
    $ascending = false;
} else {
    $ascending = true;
}
if ($ascending) {
    ...
} else {
    ...
}
...which does not warrant a discussion of string escaping or SQL injection because all you want from the user is a yes/no (or ascending/descending) answer.
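One way to carry that yes/no answer into the query is to map it onto one of two fixed keywords, so that nothing the user typed ever reaches the SQL text. This is only a sketch; order_direction() and the posts table with its columns are made up for the example:

```php
<?php
// Hypothetical helper: anything other than the exact string 'desc' means ascending.
function order_direction($sortOrder): string
{
    return $sortOrder === 'desc' ? 'DESC' : 'ASC';
}

// The query is assembled only from literals written by the programmer.
$sql = 'SELECT id, title FROM posts ORDER BY created_at '
     . order_direction($_GET['sortOrder'] ?? null);
```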
You want them to enter a comment? Why disallow HTML tags? What if the user wants to enter HTML code?
Again, what you want from them is, say, "a text... any text with a maximum length of 1024 characters*." What does this have to do with SQL or injection? Nothing:
$text = $_POST['commentText'];
if (mb_strlen($text, ENCODING) <= 1024) {
    //valid!
}
The value in the database should reflect what the user entered verbatim; not translated, not escaped. Say you're stripping all HTML <tags> from the comment. What happens when you decide to send comments somewhere in JSON format? Do you strip JSON control characters as well? What about some other format? What happens if HTML introduces a tag called ":)"? Do you go around in your database stripping off smileys from all comments?
The answer is no, as you don't want HTML-safe, JSON-safe, some-weird-format-with-smileys-safe input from the user. You want text that is at maximum 1024 characters. Check for that. Store that.
Now, the displaying part is trickier. In order to display:
<b>I like HTML "tags"
in HTML, you need to write something like:
&lt;b&gt;I like HTML &quot;tags&quot;
In JSON, you would do:
{ "I like HTML \"tags\" }
That is why you should use your language facilities to escape the data when you're using it.
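In PHP, for instance, the stored text stays the same and only the escaping function changes with the target. The "comment" key in the JSON object is just an illustrative name:

```php
<?php
$comment = '<b>I like HTML "tags"';   // stored verbatim

// HTML context
$forHtml = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');
// JSON context
$forJson = json_encode(['comment' => $comment]);

echo $forHtml, "\n";   // &lt;b&gt;I like HTML &quot;tags&quot;
echo $forJson, "\n";   // {"comment":"<b>I like HTML \"tags\""}
```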
The same of course goes for SQL, which is why you should escape the data when using simple query functions like mysql_query() in PHP. (Parametrized queries, which you should really be using, on the other hand, need no escaping.)
Summary
Have a really good idea of what you want as the input, keeping in mind that you almost never need, say, "HTML-safe text." Validate against that. Escape when required, meaning escape HTML as you send to the browser, SQL as you send to the database, and so on.
*: You should also define what a "character" means here. UTF-8, for example, may use multiple bytes to encode a code point. Does "character" mean "byte" or "Unicode code point"?
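To make the footnote concrete, here is a small sketch of the two meanings for a UTF-8 string. It counts code points with a regular expression so it needs no extra extension; where the mbstring extension is available, mb_strlen($text, 'UTF-8') gives the same count:

```php
<?php
$text = "na\u{00EF}ve";                         // "naive" with a diaeresis on the i, UTF-8 encoded

$bytes = strlen($text);                         // counts bytes: 6
$codePoints = preg_match_all('/./su', $text);   // counts Unicode code points: 5

echo $bytes, ' bytes, ', $codePoints, " code points\n";
```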

If you're using PDO, be sure to use prepared statements; these send the data separately from the SQL text, so it needs no manual escaping.
If using the mysql_* functions, run each variable through mysql_real_escape_string first.
You can also do validation such as making sure the variable is one of an acceptable range:
$allowed_values = array('name', 'date', 'last_login');
if(in_array($v, $allowed_values, true)) {
    // now we can use the variable
}

You are talking about two kinds of data sanitation. One is about putting user-generated data in your database and the other is about putting user-generated data on your webpage. For the former you should follow adam's suggestions. For the latter you should look into htmlspecialchars.
Do not mix these two as they do two completely different things. For that purpose sanitation should only take place at the last moment. Use adam's suggestion just before updating the database. Use htmlspecialchars just before echoing data. Do not use htmlspecialchars on data before adding it to the database.
You might also want to look around Stackoverflow, because this sort of question has been asked and answered countless times in the past.

How to sanitize user inputs in PHP?

Is this enough?
$listing = mysql_real_escape_string(htmlspecialchars($_POST['listing']));

Depends - if you are expecting text, it's just fine, although you shouldn't put the htmlspecialchars in input. Do it in output.
You might want to read this: What's the best method for sanitizing user input with PHP?

you can use php function : filter_var()
a good tutorial in the link :
http://www.phpro.org/tutorials/Filtering-Data-with-PHP.html
example to sanitize integer :
To sanitize an integer is simple with the FILTER_SANITIZE_NUMBER_INT filter. This filter strips out all characters except for digits and the + and - signs.
It is simple to use and we no longer need to boggle our minds with regular expressions.
<?php
/*** an integer ***/
$int = "abc40def+;2";
/*** sanitize the integer ***/
echo filter_var($int, FILTER_SANITIZE_NUMBER_INT);
?>
The above code produces an output of 40+2, as the non-INT characters, as specified by the filter, have been removed.
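Note that the sanitised result is not necessarily a valid integer. A short sketch comparing the sanitising filter with FILTER_VALIDATE_INT on the same input:

```php
<?php
$raw = 'abc40def+;2';

$sanitised = filter_var($raw, FILTER_SANITIZE_NUMBER_INT);  // string "40+2"
$asInt     = filter_var($sanitised, FILTER_VALIDATE_INT);   // false: "40+2" is not an integer
$ok        = filter_var('42', FILTER_VALIDATE_INT);         // int 42

var_dump($sanitised, $asInt, $ok);
```

When a number is expected, validating and rejecting bad input is usually easier to reason about than sanitising it into something that merely looks numeric.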

See:
Best way to stop SQL Injection in PHP
What are the best practices for avoid xss attacks in a PHP site
And sanitise data immediately before it is used in the context it needs to be made safe for. (e.g. don't run htmlspecialchars until you are about to output HTML, you might need the unedited data before then (such as if you ever decide to send content from the database by email)).

Yes. However, you shouldn't use htmlspecialchars on input, only on output, when you print it.
This is because it's not certain that the output will always be HTML. It could go to a terminal, where it would confuse users if weird codes suddenly showed up.

It depends on what you want to achieve. Your version prevents (probably) all SQL injections and strips out HTML (more exactly: Prevents it from being interpreted when sent to the browser). You could (and probably should) apply the htmlspecialchars() on output, not input. Maybe some time in the future you want to allow simple things like <b>.
But there's more to sanitizing, e.g. if you expect an Email Address you could verify that it's indeed an email address.

As has been said, don't use htmlspecialchars on input, only on output. Another thing to take into consideration is ensuring the input is as expected. For instance, if you're expecting a number, use is_numeric(), or if you're expecting a string to be at most or at least a certain size, check for this. This way you can then alert users to any errors they have made in their input.

What if your listing variable is an array ?
You should sanitize this variable recursively.
Edit:
Actually, with this technique you can avoid SQL injections but you can't avoid XSS.
In order to sanitize an "unreliable" string, I usually combine strip_tags and html_entity_decode.
This way, I avoid all code injection, even if characters are encoded as HTML entities (the &#NNN; form).
$cleaned_string = strip_tags( html_entity_decode( $var, ENT_QUOTES, 'UTF-8' ) );
Then, you have to build a recursive function which calls the previous functions and walks through multi-dimensional arrays.
In the end, when you want to use a variable into an SQL statement, you can use the DBMS-specific (or PDO's) escaping function.
$var_used_with_mysql = mysql_real_escape_string( $cleaned_string );

In addition to sanitizing the data you should also validate it, like checking for a number after you ask for an age, or making sure that an email address is valid. Besides the security benefit, you can also notify your users about problems with their input.
I would assume it is almost impossible to make an SQL injection if the input is definitely a number or definitely an email address so there is an added level of safety.
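A sketch of the email check using PHP's built-in filter, with a hypothetical valid_email() wrapper. One caveat worth checking for yourself: an address that passes validation can still contain an apostrophe, so validation works alongside parameterized queries rather than replacing them:

```php
<?php
// Hypothetical wrapper: the address if it is syntactically valid, otherwise null.
function valid_email(string $input): ?string
{
    $email = filter_var($input, FILTER_VALIDATE_EMAIL);
    return $email === false ? null : $email;
}

var_dump(valid_email('user@example.com'));     // accepted
var_dump(valid_email('not an email'));         // NULL
var_dump(valid_email("o'brien@example.com"));  // accepted, apostrophe included
```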
