I've been searching for this, but I can't find the most important part: what database field type to use.
I want to save a textarea without allowing any kind of javascript, html or php. What functions should I run the posted textarea through before saving it in the database? And what field type should I use for it in the database? It'll be a description, max 1000 chars.
There are a number of ways to go about removing/handling code so that it can be saved in your database.
Regular Expressions
One way (though it can be hard and unreliable) is to detect/remove code using regular expressions.
For example, the following PHP code removes all script tags (taken from here):
$mystring = preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', "", $mystring);
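To see why the regex approach is unreliable, here is a minimal sketch that wraps the pattern above in a hypothetical helper, strip_script_tags(), and feeds it three made-up payloads. A plain script element is removed, but an event-handler attribute passes straight through, and a nested tag reassembles itself into a working script element after the removal.

```php
<?php
// Hypothetical helper wrapping the regex quoted above.
function strip_script_tags(string $input): string
{
    return preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', '', $input);
}

$scriptPayload = '<script>alert(1)</script>Hello';
$imgPayload    = '<img src="x" onerror="alert(1)">Hello';
$nestedPayload = '<scr<script></script>ipt>alert(1)</script>';

echo strip_script_tags($scriptPayload), "\n"; // Hello
echo strip_script_tags($imgPayload), "\n";    // unchanged: the onerror handler survives
echo strip_script_tags($nestedPayload), "\n"; // <script>alert(1)</script>
```

The third case shows the general problem: a single pass of "remove what looks dangerous" can produce new dangerous input.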
The strip_tags PHP function
You can also make use of the built-in strip_tags function, which strips HTML and PHP tags from a string. The manual provides several examples, one shown below for your convenience:
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text); // Test paragraph. Other text
echo "\n";
// Allow <p> and <a>
echo strip_tags($text, '<p><a>'); // <p>Test paragraph.</p> Other text
?>
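One caveat worth knowing: strip_tags() does not modify the attributes of the tags you allow, so allowing even a harmless-looking tag lets event handlers through. A minimal sketch with a made-up payload:

```php
<?php
// <p> is allowed, so the tag is kept together with its onclick attribute;
// <b> is not allowed, so only its text content survives.
$withHandler = '<p onclick="alert(1)">Click me</p><b>bold</b>';

$stripped = strip_tags($withHandler, '<p>');
echo $stripped, "\n"; // <p onclick="alert(1)">Click me</p>bold
```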
HTML Purifier
You can check out HTML Purifier, which is a common HTML filter PHP library intended to detect and remove dangerous code.
The following simple example is from their Getting Started section:
require_once '/path/to/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
In Practice (Safe Output)
If you are trying to avoid XSS attacks or injection attacks, cleaning user data is the wrong way to go about it. Removing tags is not a 100% guarantee of keeping your service safe from these attacks. Therefore, in practice, user data containing code is not usually filtered/cleaned, but rather escaped during output. More specifically, the special characters within the string are escaped, and which characters count as special depends on the syntax of the output language. An example of this is using PHP's htmlspecialchars function to convert special characters to their respective HTML entities. A code snippet taken from the manual is shown below:
<?php
$new = htmlspecialchars("<a href='test'>Test</a>", ENT_QUOTES);
echo $new; // &lt;a href=&#039;test&#039;&gt;Test&lt;/a&gt;
?>
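The ENT_QUOTES flag in the snippet above matters whenever the value ends up inside a quoted attribute. A minimal sketch comparing it with ENT_COMPAT, using a made-up payload and a single-quoted title attribute:

```php
<?php
$userValue = "x' onmouseover='alert(1)";

// ENT_COMPAT converts double quotes only, so the single quote can close the attribute.
$compat = htmlspecialchars($userValue, ENT_COMPAT, 'UTF-8');
// ENT_QUOTES also converts single quotes to &#039;.
$quotes = htmlspecialchars($userValue, ENT_QUOTES, 'UTF-8');

echo "<a title='$compat'>link</a>\n"; // the onmouseover attribute breaks out
echo "<a title='$quotes'>link</a>\n"; // the whole value stays inside title
```

Passing the flags and the charset explicitly also keeps the behaviour the same across PHP versions, since the default flags changed in PHP 8.1.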
For more information about escaping and a very good explanation related to your question, look at this page. It shows you other forms of output escaping. Also, for a question and answer related to escaping, click here.
Furthermore, one more short but VITAL point I want to throw at you is that ANY data received from a user CANNOT be trusted.
SQL Injection Attacks
Definition (From here)
A SQL injection attack consists of insertion or "injection" of a SQL query via the input data from the client to the application. A successful SQL injection exploit can read sensitive data from the database, modify database data (Insert/Update/Delete), execute administration operations on the database (such as shutdown the DBMS), recover the content of a given file present on the DBMS file system and in some cases issue commands to the operating system.
For SQL Injection attacks: Use prepared statements and parameterized queries when storing information to the database. (Question and Answer found here) A tutorial of prepared statements using PDO can be found here.
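Putting the original question together (a 1000-character description, stored safely), here is a minimal sketch using PDO prepared statements. It assumes the pdo_sqlite extension is available so it can run against an in-memory database; a real application would use its own DSN (e.g. MySQL), and the table and column names are made up. The text is stored exactly as typed, with no HTML escaping; the prepared statement alone takes care of SQL injection.

```php
<?php
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// TEXT (or VARCHAR(1000) in MySQL) is enough for a 1000-character description.
$pdo->exec('CREATE TABLE items (id INTEGER PRIMARY KEY, description TEXT NOT NULL)');

$description = "O'Brien's <b>bold</b> claim; DROP TABLE items; --";

// Enforce the limit on the server. preg_match_all with /u counts characters,
// not bytes, and returns false for invalid UTF-8.
$length = preg_match_all('/./su', $description);
if ($length === false || $length > 1000) {
    throw new InvalidArgumentException('Description must be valid UTF-8, at most 1000 characters');
}

// The value is bound as a parameter, never concatenated into the SQL string.
$insert = $pdo->prepare('INSERT INTO items (description) VALUES (:description)');
$insert->execute([':description' => $description]);

$select = $pdo->prepare('SELECT description FROM items WHERE id = :id');
$select->execute([':id' => (int) $pdo->lastInsertId()]);
$stored = $select->fetchColumn();

echo $stored === $description ? "stored verbatim\n" : "altered\n";
```

Escaping for HTML is then done only when the stored value is echoed into a page.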
Cross-site Scripting (XSS)
Definition (from here):
Cross-Site Scripting attacks are a type of injection problem, in which malicious scripts are injected into the otherwise benign and trusted web sites. Cross-site scripting (XSS) attacks occur when an attacker uses a web application to send malicious code, generally in the form of a browser side script, to a different end user.
For XSS attacks: you should consult this famous page, which describes, rule by rule, what needs to be done.
TLDR:
It is conventional to use htmlspecialchars() to encode text on output, rather than filter the text on input. A text field is fine for this purpose.
What you need to defend against
You are trying to protect yourself from XSS. XSS happens when users can store HTML control characters on your site. Other users will see this HTML markup, so a malicious user can use your page to redirect people to other sites, steal cookies, and so on.
You need to consider this for all of your inputs: this should include any varchar or text field that can be stored in your database; not just your textareas. I can add malicious content to an input field just as easily as I can add it to a textarea.
How do we defend against this?
Let's say that a user claims that their username is:
<script src="http://example.com/malicious.js"></script>
The simplest way to handle this is to save this into the database "as is". However, whenever you echo it on the site, you should filter it through the PHP htmlspecialchars() function:
echo 'Hi, my name is ' . htmlspecialchars($user->username) . '!';
htmlspecialchars turns the HTML control characters (<, >, &, ', and ") into their HTML entities (&lt;, &gt;, &amp;, &#039;, and &quot;); the single quote is only converted when ENT_QUOTES is set, which is the default as of PHP 8.1. These look like the original characters in a browser (i.e. to normal users), but they do not act as actual HTML markup.
The result is that instead of running malicious JavaScript, the user's name would literally display as <script src="http://example.com/malicious.js"></script>.
Why filter on output? Why not on input?
1 - OWASP recommends this way
2 - If you forget to protect an input field, and someone figures that out and adds malicious content, you now need to find the malicious content in the database as well as repair the faulty code on your site.
3 - If you forget to encode an output field, and someone manages to sneak in malicious input, then you only need to repair the faulty code on your site.
4 - It is possible for users to write usernames that would break the HTML fields used to edit the usernames. If you encode the content before you store it in the database, then you need to display it "as is" in the appropriate input fields (let's assume that an admin or the user can change their username later). But, let's suppose that a user found a way to inject malicious code into the database. What if they said that their username is: " style="display:none;" />. The input field that would let the administrator change this username now looks like:
<input type="text" name="username" value="" style="display:none;" />" />
                     malicious content -> ^^^^^^^^^^^^^^^^^^^^^^^^^^
Now, the admins can't fix the problem: the input field has disappeared. But if you encode the text on output, then all of your input fields will have protection against malicious content. Now, your inputs will look like this:
<input type="text" name="username" value="&quot; style=&quot;display:none;&quot; /&gt;" />
                          safe content -> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
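Here is a minimal sketch of the two renderings above, assuming $storedUsername holds the raw value read back from the database (the variable name is hypothetical):

```php
<?php
$storedUsername = '" style="display:none;" />';

// Raw value concatenated into the attribute: the markup is altered.
$unsafe = '<input type="text" name="username" value="' . $storedUsername . '" />';

// Value escaped at output time: it stays inside value="..." and is shown as typed.
$safe = '<input type="text" name="username" value="'
      . htmlspecialchars($storedUsername, ENT_QUOTES, 'UTF-8') . '" />';

echo $unsafe, "\n";
echo $safe, "\n";
```

Because the database holds the raw string, the admin's edit field shows exactly what the user entered and can be corrected.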
Related
I would appreciate an answer to settle a disagreement between me and some co-workers.
We have a typical PHP / LAMP web application.
The only input we want from users is plain text. We do not invite or want users to enter HTML at any point. Form elements are mostly basic input text tags. There might be a few textareas, checkboxes etc.
There is currently no sanitizing of output to pages. All dynamic content, some of which came from user input, is simply echoed to the page. We obviously need to make it safe.
My solution is to use htmlspecialchars on all output at the time it is echoed on the page.
My co-workers' solution is to add HTML Purifier to the database layer. They want to pass all user entered input through HTML Purifier before it is saved to the database. Apparently they've used it like this on other projects but I think that is a misunderstanding of what HTML Purifier is for.
My understanding is that it only makes sense to use HTML Purifier on a site which allows the user to enter HTML. It takes HTML and makes it safer and cleaner based on a whitelist and other rules.
Who's right and who's wrong?
There's also the whole "escape on input or output" issue but I guess that's a debate for another time and place.
Thanks
As a general rule, escaping should be done for context and for use-case.
If what you want to do is output plain text in an HTML context (and you do), then you need to use escaping functionality that will ensure that you will always output plain text in an HTML context. Given basic PHP, that would indeed be htmlspecialchars($yourString, ENT_QUOTES, 'yourEncoding');.
If what you want to do is output HTML in an HTML context (you don't), then you would want to sanitise the HTML when you output it to prevent it from doing damage; here you would call $purifier->purify($yourString); on output.
If you want to store plain text user input in a database (again, you do) by executing SQL statements, then you should either use prepared statements to prevent SQL injection, or an escaping function specific to your DB, such as mysqli_real_escape_string() (the old mysql_real_escape_string() was removed in PHP 7.0).
You should not:
escape for HTML when you are putting data into the database
sanitise as HTML when you are putting data into the database
sanitise as HTML when you are outputting data as plain text
Of those, all are outright harmful, albeit to different degrees. Note that the following assumes the database is your only or canonical storage medium for the data (it also assumes you have SQL injection taken care of in some other way - if you don't, that'll be your primary issue):
if you escape for HTML when you put the data into the database, you rely on the guarantee that you will always be outputting the data into an HTML context; suddenly if you want to just put it into a plaintext file for printing as-is, you need to decode the data before you output it.
if you sanitise as HTML when you put the data into the database, you are destroying information that your user put there. Is it a messaging system and your user wanted to tell someone else about <script> tags? Your user can't do that - you'll destroy that part of his message!
Sanitising as HTML when you're outputting data as plain text (without also escaping it) may have confusing, page-breaking results if you don't set your sanitising module to strip all HTML (which you shouldn't, since then you clearly don't want to be outputting HTML).
Did you sanitise for a <div> context, but are putting your data into an inline element? Your user might put a <div> into your inline element, forcing a break in your page layout (how annoying this is depends on your layout), or influencing user perception of metadata (for example to make phishing easier), e.g. like this:
Name: John Doe(Site admin)
Did you sanitise for a <span> context? The user could use other tags to influence user perception of metadata, e.g. like this:
Name: John Doe (this user is an administrator)
Worst-case scenario: Did you sanitise your HTML with a version of HTML Purifier that later turns out to have a bug that does allow a certain kind of malicious HTML to survive? Now you're outputting untrusted data and putting users that view this data on your web page at risk.
Sanitising as HTML and escaping for HTML (in that order!) does not have this problem, but it means the sanitising step is unnecessary, meaning this constellation will just cost you performance. (Presumably that's why your colleague wanted to do the sanitising when saving the data, not when displaying it - presumably your use-case (like most) will display the data more often than the data will be submitted, meaning you would avoid having to deal with the performance hit frequently.)
tl;dr
Sanitising as HTML when you're outputting as plain text is not a good idea.
Escape / sanitise for use-case and context.
In your situation, you want to escape plain text for an HTML context (= use htmlspecialchars()).
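To illustrate the harm of escaping for HTML when putting data into the database (the first point in the list above), here is a minimal sketch with a made-up input. The escaped copy only looks right in an HTML context; anywhere else it has to be decoded first.

```php
<?php
$userInput = 'Fish & Chips <special>';

$storedEscaped = htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8'); // escaped on input
$storedRaw     = $userInput;                                        // stored as-is

// Plain-text output (e.g. a file for printing): the escaped copy leaks entities.
echo $storedEscaped, "\n"; // Fish &amp; Chips &lt;special&gt;
echo $storedRaw, "\n";     // Fish & Chips <special>

// Recovering the original from the escaped copy requires an extra decoding step.
echo htmlspecialchars_decode($storedEscaped, ENT_QUOTES), "\n"; // Fish & Chips <special>

// Escaping the raw copy at HTML output time gives the same HTML as the escaped copy.
echo htmlspecialchars($storedRaw, ENT_QUOTES, 'UTF-8'), "\n";
```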
I read user profiles from the database and show them. Before I show them, I sanitize them with PHP's htmlentities. They display correctly. But when I let the user edit the profile, the value appears double-filtered:
echo '<input id="about" name="about" value="'.$php_filtered_value.'">';
Inside the input, an ampersand then looks like &amp;
If I don't filter the variable there is worry about html injection.
What should I do?
I prefer to follow OWASP RULE#2:
> RULE #2 - Attribute Escape Before Inserting Untrusted Data into HTML
Requirements:
- Aggressive HTML Entity Encoding
- Only place untrusted data into a whitelist of safe attributes (listed below).
- Strictly validate unsafe attributes such as background, id and name.
Please see XSS (Cross Site Scripting) Prevention Cheat Sheet
Don't double escape the text then (as in: once before storing it in the database and again before echoing it).
Unescaped (what you typed): Able & Baker
Escaped once (what is being stored in the DB): Able &amp; Baker
Double-escaped (what is ending up in your HTML): Able &amp;amp; Baker
Rather, escape the text only once: generally on the output side, not on the input side.
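A minimal sketch of the difference, using the example string above. It also shows htmlspecialchars()'s double_encode parameter: setting it to false hides the symptom, but it is not a real fix, because it also mangles text a user literally typed as "&amp;".

```php
<?php
$typed = 'Able & Baker';

$once  = htmlspecialchars($typed, ENT_QUOTES, 'UTF-8'); // Able &amp; Baker
$twice = htmlspecialchars($once, ENT_QUOTES, 'UTF-8');  // Able &amp;amp; Baker

// double_encode = false leaves existing entities alone.
$guarded = htmlspecialchars($once, ENT_QUOTES, 'UTF-8', false); // Able &amp; Baker

// Store $typed as-is; escape exactly once when building the edit field.
echo '<input id="about" name="about" value="' . $once . '">', "\n";
```

The browser decodes the attribute, so the edit field shows "Able & Baker" and submitting the form sends back the original text.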
I have read OWASP's XSS prevention cheat sheet but I don't really recognize my application with those rules. I don't feel like I have any of the vulnerabilities pointed out in those rules.
I am doing a PHP application that follows all the following principles:
Not a single user input is displayed directly on the HTML page without being processed and sanitized on the server-side
All my user input is sanitized with htmlentities(). Is that sufficient? (I use prepared statements for SQL injection)
Some of the user input have a maxlength condition of 5 characters on server-side. Does that help protect against XSS? (since I hardly see an XSS code being shorter than 6 characters)
Apart from data from the database, the only user input that is displayed back to the user was sent to the server via ajax, sanitized with htmlentities and reintroduced in the DOM using text() instead of html() (using jQuery)
Should I be concerned about XSS in my case? What else can I do to protect myself from XSS?
All my user input is sanitized with htmlentities(). Is that sufficient? (I use prepared statements for SQL injection)
No. First, you should filter on output, not on input. In programming never trust any data, even those from your own database! On input, you just need to escape it for use in SQL, logs, etc. But you also have to filter basic html + some special characters: \0 & < > ( ) + - = " ' \ on output. htmlentities() is just not enough.
Imagine you have an image on your site:
<img src="xxx" onload="image_loaded({some_text_from_db});">
and {some_text_from_db} is );alert(String.fromCharCode(58,53,53)
If you escape it just with htmlentities it will become:
<img src="xxx" onload="image_loaded();alert(String.fromCharCode(58,53,53));">
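htmlentities() has nothing to encode in that payload, because it contains no <, >, &, or quotes. A minimal sketch of one common approach for a JavaScript value inside an HTML attribute (swapped in here, not taken from the answer above): encode for the JavaScript context with json_encode(), then encode that result for the attribute context with htmlspecialchars().

```php
<?php
$fromDb = ');alert(String.fromCharCode(58,53,53)';

// htmlentities() leaves the payload untouched.
$htmlOnly = htmlentities($fromDb, ENT_QUOTES, 'UTF-8');

// Step 1 (JS context): turn the value into a quoted JS string literal.
$jsLiteral = json_encode($fromDb, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
// Step 2 (attribute context): escape the literal's own quotes for the attribute.
$attrValue = htmlspecialchars($jsLiteral, ENT_QUOTES, 'UTF-8');

echo '<img src="xxx" onload="image_loaded(' . $attrValue . ');">', "\n";
```

After the browser decodes the attribute, the handler reads image_loaded(");alert(String.fromCharCode(58,53,53)"); so the payload is just a string argument.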
Some of the user input have a maxlength condition of 5 characters on server-side. Does that help protect against XSS? (since I hardly see an XSS code being shorter than 6 characters)
Always check data on the server side. If you also want to check it on the client side, that's fine, but never skip the server-side check. Many modern browsers (Chrome, Firefox, Opera) let users edit the page "on the fly", so they can easily remove the maxlength attribute.
Apart from data from the database, the only user input that is displayed back to the user was sent to the server via ajax, sanitized with htmlentities and reintroduced in the DOM using text() instead of html() (using jQuery)
From the jQuery .text() documentation:
We need to be aware that this method escapes the string provided as necessary so that it will render correctly in HTML. To do so, it calls the DOM method .createTextNode(), which replaces special characters with their HTML entity equivalents (such as &lt; for <).
So it is probably enough, but be aware that data can still escape its context, as in the onload example above.
Your application filtering should look like this:
INPUT USER -> FILTER -> APPLICATION
OUTPUT APPLICATION -> FILTER -> USER
Not just input filtering.
I suggest using HTMLawed or HTMLPurifier for user input that needs to be displayed as HTML, or just completely stripping all HTML from user input that shouldn't contain it anyway. HTMLPurifier is the more powerful of the two, and I've never had any XSS issues in any projects with which I have used it.
Our company has made a website for our client. The client hired a web security company to test the pages for security before the product launches.
We've removed most of our XSS problems. We developed the website with zend. We add the StripTags, StringTrim and HtmlEntities filters to the order form elements.
They ran another test and it still failed :(
They used the following for one input field in the data of the HTTP request: name=%3Cscript%3Ealert%28123%29%3C%2Fscript%3E which basically translates to name=<script>alert(123)</script>
I've added alpha and alnum to some of the fields, which fixes the XSS vulnerability (touch wood) by removing the %; however, now the boss doesn't like it because of names like O'Brien and double-barrelled surnames...
I haven't come across the %3C as < problem reading up about XSS. Is there something wrong with my html character set or encoding or something?
I probably now have to write a custom filter, but that would be a huge pain to do that with every website and deployment. Please help, this is really frustrating.
EDIT:
If it's about escaping the form's output, how do I do that? The form submits to the same page; how do I escape if I only have <?= $this->form ?> in my view?
How can I get Zend Form to escape its output?
%3Cscript%3Ealert%28123%29%3C%2Fscript%3E is the URL-encoded form of <script>alert(123)</script>. Any time you include < in a form value, it will be submitted to the server as %3C. PHP will read and decode that back to < before anything in your application gets a look at it.
That is to say, there is no special encoding that you have to handle; you won't actually see %3C in your input, you see <. If you're failing to encode that for on-page display then you don't have even the most basic defenses against XSS.
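A minimal sketch of that decoding step: parse_str() decodes a query string the same way PHP fills $_GET and $_POST, so by the time your code runs, the value already contains a literal <. Escaping it on output is what neutralizes it.

```php
<?php
$raw = '%3Cscript%3Ealert%28123%29%3C%2Fscript%3E';

$decoded = urldecode($raw);
echo $decoded, "\n"; // <script>alert(123)</script>

parse_str('name=' . $raw, $fields);
echo $fields['name'], "\n"; // <script>alert(123)</script>

echo htmlspecialchars($fields['name'], ENT_QUOTES, 'UTF-8'), "\n"; // &lt;script&gt;alert(123)&lt;/script&gt;
```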
We've removed most of our XSS problems. We developed the website with zend. We add the StripTags, StringTrim and HtmlEntities filters to the order form elements.
I'm afraid you have not fixed your XSS problems at all. You may have merely obfuscated them.
Input filtering is a depressingly common but quite wrong strategy for blocking XSS.
It is not the input that's the problem. As your boss says, there is no reason you shouldn't be able to input O'Brien. Or even <script>, like I am just now in this comment box. You should not attempt to strip tags in the input or even HTML-encode them, because who knows at input time that the data is going to end up in an HTML page? You don't want your database filled with nonsense like 'Fish&amp;Chips' which then ends up in an e-mail or other non-HTML context with weird HTML escapes in it.
HTML-encoding is an output-stage issue. Leave the incoming strings alone, keep them as raw strings in the database (of course, if you are hacking together queries in strings to put the data in the database instead of parameterised queries, you would need to SQL-escape the content at exactly that point). Then only when you are inserting the values in HTML, encode them:
Name: <?php echo htmlspecialchars($row['name']); ?>
If you have a load of dodgy code like echo "Name: $name"; then I'm afraid you have much rewriting to do to make it secure.
Hint: consider defining a function with a short name like h so you don't have to type htmlspecialchars so much. Don't use htmlentities(), which will usually unnecessarily encode non-ASCII characters, and will also mess them up unless you supply the correct $charset argument.
(Or, if you are using Zend_View, $this->escape().)
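A minimal sketch of such a helper. The name h() is only a convention (it is not built in), and the flags and UTF-8 charset are assumptions about your application. The last lines show the htmlentities() difference mentioned above.

```php
<?php
// Hypothetical helper: escape a value for HTML text or a quoted attribute.
function h(?string $value): string
{
    return htmlspecialchars($value ?? '', ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$row = ['name' => "O'Brien <admin>"];
echo 'Name: ', h($row['name']), "\n"; // Name: O&#039;Brien &lt;admin&gt;

// htmlentities() also rewrites non-ASCII letters; htmlspecialchars() leaves them alone.
echo htmlentities('café', ENT_QUOTES, 'UTF-8'), "\n"; // caf&eacute;
echo h('café'), "\n";                                 // café
```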
Input validation is useful on an application-specific level, for things like ensuring telephone number fields contain numbers and not letters. It is not something you can apply globally to avoid having to think about the issues that arise when you put a string inside the context of another string—whether that's inside HTML, SQL, JavaScript string literals or one of the many other contexts that require escaping.
If you correctly escape strings every time you write them to the HTML page, you won't have any issues.
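A minimal sketch of the application-specific validation described above, for a telephone field. The exact rule (digits, spaces, parentheses, hyphens, an optional leading +, 6 to 20 characters) is a hypothetical example, not a standard.

```php
<?php
// Hypothetical phone rule; adjust to whatever formats your application accepts.
function is_valid_phone(string $input): bool
{
    return preg_match('/^\+?[0-9 ()\-]{6,20}$/', $input) === 1;
}

var_dump(is_valid_phone('+44 (0)20 7946 0958')); // bool(true)
var_dump(is_valid_phone('call me maybe'));       // bool(false)
```

This rejects bad data for business reasons; it is not a substitute for escaping when the value is written into HTML.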
%3C is a URL-encoded <; it is decoded by the server.
I am using HTML Purifier to protect my application from XSS attacks. Currently I am purifying content from WYSIWYG editors because that is the only place where users are allowed to use XHTML markup.
My question is, should I use HTML Purifier also on username and password in a login authentication system (or on input fields of sign up page such as email, name, address etc)? Is there a chance of XSS attack there?
You should purify anything that could ever be displayed on a page, because with XSS attacks, hackers put in <script> tags or other malicious tags that can link to other sites.
Passwords and emails should be fine. Passwords should never be shown and emails should have their own validator to make sure that they are in the proper format.
Finally, always remember to run htmlentities() on content you output.
Oh... and look at filter_var as well. It's a very nice way of filtering variables.
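A minimal sketch of using filter_var() as the e-mail validator mentioned above. FILTER_VALIDATE_EMAIL returns the address on success and false on failure; the addresses are made-up examples.

```php
<?php
$email = filter_var('user@example.com', FILTER_VALIDATE_EMAIL); // 'user@example.com'
$bad   = filter_var('not-an-email', FILTER_VALIDATE_EMAIL);     // false

if ($bad === false) {
    echo "Please enter a valid e-mail address.\n";
}
```

A validated address is still user input, so it should be escaped like anything else when it is displayed.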
XSS risks exist where ever data entered by one user may be viewed by other users. Even if this data isn't currently viewable, don't assume that a need to do this won't arise.
As far as the username and password go, you should never display a password, or even store it in a form that can be displayed (i.e. hash it, preferably with password_hash() rather than a fast hash such as sha1()). For usernames, restrict the legal characters, e.g. to [A-Za-z0-9_]. Finally, as the other answer suggests, use your language's HTML entity encoding function for any entered data that may contain reserved or special HTML characters, which prevents this data from causing syntax errors when displayed.
No, I wouldn't use HTML Purifier on the username and password during login authentication. In my applications I use alphanumeric usernames with an input validation filter, and display them with htmlspecialchars with ENT_QUOTES. This is very effective and a hell of a lot faster than HTML Purifier. I'm yet to see an XSS attack using an alphanumeric string. And by the way, HTML Purifier is useless when filtering alphanumeric content anyway, so if you force the input string through an alphanumeric filter, there is no point in displaying it with HTML Purifier. When it comes to passwords, they should never be displayed to anybody in the first place, which eliminates the possibility of XSS. And if for some perverse reason you want to display passwords, then you should design your application so that only the owner of a password can see it; otherwise you are screwed big time and XSS is the least of your worries!
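A minimal sketch of both suggestions: a username whitelist and a non-displayable password. The 3 to 32 length bounds are a hypothetical choice; password_hash() and password_verify() are PHP's built-in password API, which produces a salted, slow hash rather than something that can be decrypted and shown.

```php
<?php
// Hypothetical username rule: letters, digits and underscore, 3-32 characters.
function is_valid_username(string $username): bool
{
    return preg_match('/^[A-Za-z0-9_]{3,32}$/', $username) === 1;
}

// Store only the hash; check a login attempt with password_verify().
$hash = password_hash('correct horse battery staple', PASSWORD_DEFAULT);
$ok   = password_verify('correct horse battery staple', $hash);

var_dump(is_valid_username('john_doe')); // bool(true)
var_dump(is_valid_username('<script>')); // bool(false)
var_dump($ok);                           // bool(true)
```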
HTML Purifier takes HTML as input, and produces HTML as output. Its purpose is to allow the user to enter html with some tags, attributes, and values, while filtering out others. This uses a whitelist to prevent any data that can contain scripts. So this is useful for something like a WYSIWYG editor.
Usernames and passwords on the other hand are not HTML. They're plain text, so HTML purifier is not an option. Trying to use HTML Purifier here would either corrupt the data, or allow XSS attacks.
For example, it lets the following through unchanged, which can cause XSS issues when inserted as an attribute value in some elements:
" onclick="javascript:alert()" href="
Or if someone tried to use special symbols in their password, and entered:
<password
then their password would become blank, and make it much easier to guess.
Instead, you should encode the text. The encoding required depends on the context, but you can use htmlentities when outputting these values if you stick to rule #0 and rule #1, at the OWASP XSS Prevention Cheat Sheet
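A minimal sketch of encoding instead of purifying, using the two examples above. The payload is stored as-is and only escaped for the attribute it lands in; the title attribute is a made-up placement.

```php
<?php
$payload = '" onclick="javascript:alert()" href="';

// Inserted raw, the value closes title="..." and adds an onclick attribute.
$unsafe = '<a title="' . $payload . '">x</a>';

// Escaped for the attribute context, it stays a harmless title.
$safe = '<a title="' . htmlspecialchars($payload, ENT_QUOTES, 'UTF-8') . '">x</a>';

echo $unsafe, "\n";
echo $safe, "\n";

// Encoding does not alter the stored value, so a password like "<password" survives intact.
echo htmlspecialchars('<password', ENT_QUOTES, 'UTF-8'), "\n"; // &lt;password
```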