I found it inside the "symphony CMS" app, it's very small:
https://github.com/symphonycms/xssfilter/blob/master/extension.driver.php#L100
And I was thinking of stealing it and use it in my own application to sanitize string with HTML for display. Do you think it does a good job?
ps: I know there's HTML Purifier, but that thing is huge. And I'd rather prefer something less permissive, but I still want it to be efficient.
I've been testing it against strings from this page: http://ha.ckers.org/xss.html. But if fails against "XSS locator 2". Not sure how can anyone use that string to hack a site though :)
No, I wouldn’t use it. There are many different attacks that all depend on the context the data is inserted into. One single function would not cover them all. If you take a close look, there are actually just four tests:
// Set the patterns we'll test against
$patterns = array(
// Match any attribute starting with "on" or xmlns
'#(<[^>]+[\x00-\x20\"\'\/])(on|xmlns)[^>]*>?#iUu',
// Match javascript:, livescript:, vbscript: and mocha: protocols
'!((java|live|vb)script|mocha):(\w)*!iUu',
'#-moz-binding[\x00-\x20]*:#u',
// Match style attributes
'#(<[^>]+[\x00-\x20\"\'\/])style=[^>]*>?#iUu',
// Match unneeded tags
'#</*(applet|meta|xml|blink|link|style|script|embed|object|iframe|frame|frameset|ilayer|layer|bgsound|title|base)[^>]*>?#i'
);
Nothing else is tested. Besides attacks that these tests don’t detect (false negative), it could also report some input mistakenly as an attack (false positive).
So instead of trying to detect XSS attacks, just make sure to use proper sanitizing.
I think it does a good job for testing strings,at least that's what I can say according to my tests.
Related
I'm trying to find the best way to sanitize requests in PHP.
From what I've read I learned that GET variables should be sanitized only when they're being displayed, not at the beginning of the "request flow". Post variables (which don't come from the database) either.
I can see several problems here:
Of course I can create functions sanitizing these variables, and by calling something like Class::post('name'), or Class::get('name') everything will be safe. But what if a person who will use my code in the future will forget about it and use casual $_POST['name'] instead of my function? Can I provide, or should I provide a bit of security here?
There is never a one-size-fits-all sanitization. "Sanitization" means you manipulate a value to conform to certain properties. For example, you cast something that's supposed to be a number to a number. Or you strip <script> tags out of supposed HTML. What and how exactly to sanitize depends on what the value is supposed to be and whether you need to sanitize at all. Sanitizing HTML for whitelisted tags is really complex, for instance.
Therefore, there's no magic Class::sanitize which fits everything at once. Anybody using your code needs to think about what they're trying to do anyway. If they just blindly use $_POST values as is, they have already failed and need to turn in their programmer card.
What you always need to do is to escape based on the context. But since that depends on the context, you only do it where necessary. You don't blindly escape all all $_POST values, because you have no idea what you're escaping for. See The Great Escapism (Or: What You Need To Know To Work With Text Within Text) for more background information on the whole topic.
The variables are basically "sanitized" when PHP reads them. Meaning if I were to submit
"; exec("some evil command"); $blah="
Then it won't be a problem as far as PHP is concerned - you will get that literal string.
However, when passing it on from PHP to something else, it's important to make sure that "something else" won't misinterpret the string. So, if it's going into a MySQL database then you need to escape it according to MySQL rules (or use prepared statements, which will do this for you). If it's going into HTML, you need to encode < as < as a minimum. If it's going into JavaScript, then you need to JSON-encode it, and so on.
You can do something like this... Not foolproof, but it works..
foreach($_POST as $key => $val)
{
//do sanitization
$val = Class::sanitize($val);
$_POST[$key] = $val;
}
Edit: You'd want to put this as close to the header as you can get. I usually put mine in the controller so it's executed from the __construct() automagically.
Replace the $_POST array with a sanitizer object which is beheaving like an array.
I've got a simple question:
When is it best to sanitize user input?
And which one of these is considered the best practice:
Sanitize data before writing to database.
Save raw data and sanitize it in the view.
For example use HTML::entities() and save result to database.
Or by using HTML methods in the views because in this case laravel by default uses HTML::entities().
Or maybe by using the both.
EDIT: I found interesting example http://forums.laravel.com/viewtopic.php?id=1789. Are there other ways to solve this?
I would say you need both locations but for different reasons. When data comes in you should validate the data according to the domain, and reject requests that do not comply. As an example, there is no point in allowing a tag (or text for that matter) if you expect a number. For a parameter representing.a year, you may even want to check that it is within some range.
Sanitization kicks in for free text fields. You can still do simple validation for unexpected characters like 0-bytes. IMHO it's best to store raw through safe sql (parameterized queries) and then correctly encode for output. There are two reasons. The first is that if your sanitizer has a bug, what do you do with all the data in your database? Resanitizing can have unwanted consequences. Secondly you want to do contextual escaping, for whichever output you are using (JSON, HTML, HTML attributes etc.)
I have a full article on input filtering in Laravel, you might find it useful http://usman.it/xss-filter-laravel/, here is the excerpt from this article:
You can do a global XSS clean yourself, if you don’t have a library to write common methods you may need frequently then I ask you to create a new library Common in application/library. Put this two methods in your Common library:
/*
* Method to strip tags globally.
*/
public static function global_xss_clean()
{
// Recursive cleaning for array [] inputs, not just strings.
$sanitized = static::array_strip_tags(Input::get());
Input::merge($sanitized);
}
public static function array_strip_tags($array)
{
$result = array();
foreach ($array as $key => $value) {
// Don't allow tags on key either, maybe useful for dynamic forms.
$key = strip_tags($key);
// If the value is an array, we will just recurse back into the
// function to keep stripping the tags out of the array,
// otherwise we will set the stripped value.
if (is_array($value)) {
$result[$key] = static::array_strip_tags($value);
} else {
// I am using strip_tags(), you may use htmlentities(),
// also I am doing trim() here, you may remove it, if you wish.
$result[$key] = trim(strip_tags($value));
}
}
return $result;
}
Then put this code in the beginning of your before filter (in application/routes.php):
//Our own method to defend XSS attacks globally.
Common::global_xss_clean();
I just found this question. Another way to do it is to enclose dynamic output in triple brackets like this {{{ $var }}} and blade will escape the string for you. That way you can keep the potentially dangerous characters in case they are important somewhere else in the code and display them as escaped strings.
i'd found this because i was worried about xss in laravel, so this is the packages gvlatko
it is easy:
To Clear Inputs = $cleaned = Xss::clean(Input::get('comment');
To Use in views = $cleaned = Xss::clean(Input::file('profile'), TRUE);
It depends on the user input. If you're generally going to be outputting code they may provide (for example maybe it's a site that provides code snippets), then you'd sanitize on output. It depends on the context. If you're asking for a username, and they're entering HTML tags, your validation should be picking this up and going "no, this is not cool, man!"
If it's like the example I stated earlier (code snippets), then let it through as RAW (but be sure to make sure your database doesn't break), and sanitize on output. When using PHP, you can use htmlentities($string).
I'm used to the habit of checking the type of my parameters when writing functions. Is there a reason for or against this? As an example, would it be good practice to keep the string verification in this code or remove it, and why?
function rmstr($string, $remove) {
if (is_string($string) && is_string($remove)) {
return str_replace($remove, '', $string);
}
return '';
}
rmstr('some text', 'text');
There are times when you may expect different parameter types and run different code for them, in which case the verification is essential, but my question is if we should explicitly check for a type and avoid an error.
Yes, it's fine. However, php is not strongly typed to begin with, so I think this is not very useful in practice.
Additionally, if one uses an object other than string, an exception is a more informative; therefore, I'd try to avoid just returning an empty string at the end, because it's not semantically explaining that calling rmstr(array, object) returns an empty string.
My opinion is that you should perform such verification if you are accepting input from the user. If those strings were not accepted from the user or are sanitized input from the user, then doing verification there is excessive.
As for me, type checking actual to data, getted from user on top level of abstraction, but after that, when You call most of your functions you already should now their type, and don't check it out in every method. It affects performance and readability.
Note: you can add info, which types is allowed to arguments for your functions by phpDoc
It seems local folks understood this question as "Should you verify parameters" where it was "Should you verify parameter types", and made nonsense answers and comments out of it.
Personally I am never checking operand types and never experienced any trouble of it.
It depends which code you produce. If it's actually production code, you should ensure that your function is working properly under any circumstances. This includes checking that parameters contain the data you expect. Otherwise throw an exception or have another form of error handling (which your example is totally missing).
If it's not for production use and you don't need to code defensively, you can ignore anything and follow the garbage-in-garbage-out principle (or the three shit principle: code shit, process shit, get shit).
In the end it is all about matching expectations: If you don't need your function to work properly, you don't need to code it properly. If you are actually relying on your code to work precisely, you even need to validate input data per each unit (function, class).
I'm posting it for a clarification in a specific situation, though user input sanitization/validations is a cliche subject.
A section of the code contain
$haystack=$_GET['user'];
$input is never used for 'echo' or 'print' or in any SQL query or in any such thing. The only further use of the user input ( $haystack ) is to check if the string contains a predefined $needle.
if (preg_match($needle,$haystack)) {
$result="A";
} else {
$result="B";
}
My worry is the execution of a malicious code, rather than the presence of it in the user input.
So the question is, if the user input is used only in the context (no usage in echo,print,SQL etc) mentioned above, is there still a possibility of a malicious code in the user input get executed.
I wanted to add the security measures that is just required for the context than overdoing it.
If used only in the context, there's no way to execute malicious code from the user input.
You should be careful with eval, preg_replace (with modifier e, thanks Pelshoff), database queries and echo (& print, sprintf…).
Its not possible to just execute arbitrary code by being able to alter a string. Only when you output the string directly, or use it in SQL should you be really worried.
preg_match won't end up executing your input. It's too simple and straightforward to have a hidden exploitable bug. If you toss $haystack after running preg_match on it, then it can't possibly hurt you.
While the $haystack may not be reflected, it can obviously affect program flow. The (extremely short) code you posted certainly doesn't look directly vulnerable, but not sanitizing your input may enable code execution in conjunction with other vulnerabilities.
Is this enough?
$listing = mysql_real_escape_string(htmlspecialchars($_POST['listing']));
Depends - if you are expecting text, it's just fine, although you shouldn't put the htmlspecialchars in input. Do it in output.
You might want to read this: What's the best method for sanitizing user input with PHP?
you can use php function : filter_var()
a good tutorial in the link :
http://www.phpro.org/tutorials/Filtering-Data-with-PHP.html
example to sanitize integer :
To sanitize an Integer is simple with the FILTER_SANITIZE_INT filter. This filter strips out all characters except for digits and . + -
It is simple to use and we no longer need to boggle our minds with regular expressions.
<?php
/*** an integer ***/
$int = "abc40def+;2";
/*** sanitize the integer ***/
echo filter_var($int, FILTER_SANITIZE_NUMBER_INT);
?>
The above code produces an output of 40+2 as the none INT values, as specified by the filter, have been removed
See:
Best way to stop SQL Injection in PHP
What are the best practices for avoid xss attacks in a PHP site
And sanitise data immediately before it is used in the context it needs to be made safe for. (e.g. don't run htmlspecialchars until you are about to output HTML, you might need the unedited data before then (such as if you ever decide to send content from the database by email)).
Yes. However, you shouldn't use htmlspecialchars on input. Only on output, when you print it.
This is because, it's not certain that the output will always be through html. It could be through a terminal, so it could confuse users if weird codes suddenly show up.
It depends on what you want to achieve. Your version prevents (probably) all SQL injections and strips out HTML (more exactly: Prevents it from being interpreted when sent to the browser). You could (and probably should) apply the htmlspecialchars() on output, not input. Maybe some time in the future you want to allow simple things like <b>.
But there's more to sanitizing, e.g. if you expect an Email Address you could verify that it's indeed an email address.
As has been said don't use htmlspecialchars on input only output. Another thing to take into consideration is ensuring the input is as expected. For instance if you're expecting a number use is_numeric() or if you're expecting a string to only be of a certain size or at least a certain size check for this. This way you can then alert users to any errors they have made in their input.
What if your listing variable is an array ?
You should sanitize this variable recursively.
Edit:
Actually, with this technique you can avoid SQL injections but you can't avoid XSS.
In order to sanitize "unreliable" string, i usually combine strip_tags and html_entity_decode.
This way, i avoid all code injection, even if characters are encoded in a Ł way.
$cleaned_string = strip_tags( html_entity_decode( $var, ENT_QUOTES, 'UTF-8' ) );
Then, you have to build a recursive function which call the previous functions and walks through multi-dimensional arrays.
In the end, when you want to use a variable into an SQL statement, you can use the DBMS-specific (or PDO's) escaping function.
$var_used_with_mysql = mysql_real_escape_string( $cleaned_string );
In addition to sanitizing the data you should also validate it. Like checking for numbers after you ask for an age. Or making sure that a email address is valid. Besides for the security benefit you can also notify your users about problems with their input.
I would assume it is almost impossible to make an SQL injection if the input is definitely a number or definitely an email address so there is an added level of safety.