I am using a regex to block the words document|window|alert|onmouseover|onclick to prevent XSS, but people seem to be able to bypass it just by typing doc\ument. How do I fix this?
thanks!
--
Edit: What about preventing XSS server-side? Maybe refuse to serve any file that contains stuff in a GET variable?
Obviously, you would have to supply some meaningful detail to get any serious answer for your problem at hand.
As @David Dorward notes, the easiest option is to escape all HTML special characters. That disables all HTML, but you don't have to fight XSS attacks.
If you need to support HTML, consider using a pre-made anti-XSS filter like HTML Purifier, which promises to reliably block such attempts.
HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications.
The simple option is to disallow any HTML and then convert all &, < and > to their respective entities (&amp;, &lt; and &gt;).
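A minimal sketch of the simple option in PHP, assuming the text ends up in an HTML body or a quoted attribute value. The helper name escape_html is made up for illustration; htmlspecialchars() is the built-in function doing the work:

```php
<?php
// escape_html is a hypothetical wrapper name, not part of PHP.
function escape_html(string $text): string
{
    // ENT_QUOTES also converts " and ', which matters inside attribute values.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// No word filter is needed: the markup simply stops being markup.
echo escape_html('<img src=x onerror="doc\ument.cookie">'), "\n";
// prints: &lt;img src=x onerror=&quot;doc\ument.cookie&quot;&gt;
```

The escaping is applied at output time, so the stored text stays unchanged.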
The more complicated approach is to run the input through an HTML parser, apply a whitelist to element and attribute names, then serialise it back to HTML.
Is this system at all important/critical?
If so, turn it off immediately and hire a security consultant to secure it for you.
Security is a hard problem. Don't think you can get it right first time, because you won't.
If this is just a system you play around with?
Trying to stop XSS by filtering particular words is a losing battle. If you don't want HTML insertion, just HTML-encode everything. If you do want some HTML, then you need to parse the HTML, make sure it's valid and isn't going to break the page, and only then make sure it doesn't contain any elements or attributes that you don't want.
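The parse-then-whitelist idea could be sketched roughly as follows with PHP's DOM extension. This is only an illustration under stated assumptions, not a replacement for an audited library: the ALLOWED_TAGS list and the function names are hypothetical, ext-dom is assumed to be installed, and no URL-carrying attributes (href, src) are allowed, because those would need scheme checks as well.

```php
<?php
// Hypothetical allowlist for illustration: tag name => permitted attribute names.
const ALLOWED_TAGS = [
    'p' => [], 'br' => [], 'b' => [], 'i' => [], 'em' => [], 'strong' => [],
    'ul' => [], 'ol' => [], 'li' => [], 'blockquote' => [],
];

// Recursively remove every child that is neither text nor an allowlisted element.
function clean_node(DOMNode $node): void
{
    // Copy the child list first, because the loop modifies the tree.
    foreach (iterator_to_array($node->childNodes, false) as $child) {
        if ($child instanceof DOMText) {
            continue; // text is escaped again when serialised
        }
        $tag = strtolower($child->nodeName);
        if (!($child instanceof DOMElement) || !array_key_exists($tag, ALLOWED_TAGS)) {
            // Drops comments and unknown elements together with their content.
            $node->removeChild($child);
            continue;
        }
        foreach (iterator_to_array($child->attributes, false) as $attr) {
            if (!in_array(strtolower($attr->nodeName), ALLOWED_TAGS[$tag], true)) {
                $child->removeAttributeNode($attr);
            }
        }
        clean_node($child);
    }
}

function sanitize_html(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // user input is rarely valid HTML
    // The XML declaration prefix is a common trick to make loadHTML() read UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    clean_node($body);

    // Serialise only the cleaned children of <body>, not the implied wrapper.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo sanitize_html('<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>'), "\n";
```

Dropping unknown elements together with their content is a design choice of this sketch; a real filter might keep the text of harmless unknown tags instead.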
I had the same problem and asked this very question only yesterday. Personally, rather than deleting tags, I created a list of all the tags I did want. I now use the PHP function strip_tags for this.
strip_tags ( string $str [, string $allowable_tags ] )
You can apply it in your filter like this.
text entered:
<b>Hi</b><malicious tag>
strip_tags("<b>Hi</b><malicious tag>","<b>")
This would output <b>Hi</b>.
I am building a small test CMS using PHP and MySQL.
Everything is working well for adding, editing, deleting and displaying, but after finishing my code I wanted to add a WYSIWYG editor to the admin back end.
My problem is that I use an escape method to make my forms a bit more secure against injection, so when I add styled text, an image or any other HTML in the editor, it is printed as literal code on my page (which is exactly right for avoiding attacks).
MY ESCAPE METHOD:
function e($text) {
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
Is there any way to work around my escape method (which I think should not be done, because if I can do it, every attacker could)?
Or should I change my escape method to another method?
If I understand you correctly, you want to allow your users to put some formatting into the text they create, and for this you are going to add a WYSIWYG editor. The question is how to distinguish the formatting and special characters that are allowed from what is not. You need to clean up the text, keep only the allowed formatting (HTML tags) and remove all malicious JavaScript or HTML.
This is not as easy as it might sound at first. I can see several approaches here.
The easiest solution is to use strip_tags and specify which tags are allowed.
But please keep in mind that strip_tags is not perfect. Let me quote the manual here.
Because strip_tags() does not actually validate the HTML, partial or
broken tags can result in the removal of more text/data than expected.
This function does not modify any attributes on the tags that you
allow using allowable_tags, including the style and onmouseover
attributes that a mischievous user may abuse when posting text that
will be shown to other users.
This is a known issue, and libraries exist that do a better job of cleaning up HTML and JS to prevent breakage.
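The limitation quoted from the manual is easy to reproduce. A small sketch, with a made-up input string:

```php
<?php
$input = '<b onmouseover="alert(1)">Hi</b><script>alert(2)</script>';

// Only <b> is allowed, so the <script> tags themselves are removed...
$output = strip_tags($input, '<b>');

// ...but the attribute on <b> survives, and so does the text inside <script>.
echo $output, "\n";
// prints: <b onmouseover="alert(1)">Hi</b>alert(2)
```

So the allowed tag still carries a working event handler, which is exactly the case the manual warns about.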
A somewhat more complicated solution would be to use a dedicated library to clean up the HTML, for example HTML Purifier.
Quote from the documentation:
HTML Purifier will not only remove all malicious code (better known as
XSS) with a thoroughly audited, secure yet permissive whitelist, it
will also make sure your documents are standards compliant, something
only achievable with a comprehensive knowledge of W3C's
specifications.
Other libraries exist that solve the same task. You can check, for example, this article where libraries are compared, and then choose the best one.
A completely different approach is to keep users from writing HTML tags at all. Ask them to write some other markup instead, as is done on Stack Overflow, Basecamp or GitHub. Markdown might be a good choice.
Using a simple markup for text allows you to completely avoid issues with broken HTML and JavaScript, because you can escape everything and build the HTML markup on your own.
The editor might look like the one I'm using to write this message :)
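To illustrate "escape everything and build the HTML on your own", here is a toy sketch. The two-rule syntax (**bold** and *italic*) and the function name render_simple_markup are invented for this example; a real site would more likely use an existing Markdown or BBCode library.

```php
<?php
// Hypothetical mini-markup: **bold** and *italic* only.
function render_simple_markup(string $text): string
{
    // Step 1: escape everything, so no user-supplied tag can survive.
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Step 2: generate our own tags from the markup. Only tags we write can appear.
    $html = preg_replace('/\*\*(.+?)\*\*/s', '<strong>$1</strong>', $escaped);
    $html = preg_replace('/\*(.+?)\*/s', '<em>$1</em>', $html);

    return $html;
}

echo render_simple_markup('**hi** <script>'), "\n";
// prints: <strong>hi</strong> &lt;script&gt;
```

The order matters: escaping first and converting afterwards means the generated tags are never escaped and user input never becomes a tag.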
You can use strip_tags() to remove the unwanted tags. Read about it on this manual:
http://php.net/manual/en/function.strip-tags.php
Example 1 (Based on the manual)
<?php
$text = '<p>Test paragraph, With link.</p>';
# Output: Test paragraph, With link. (Tags are stripped)
echo strip_tags($text);
echo "\n";
# Allow <p> and <a>
#Output: <p>Test paragraph, With link.</p>
echo strip_tags($text, '<p><a>');
?>
I hope this will help you!
I'm building a WYSIWYG editor with HTML5 and JavaScript.
I'll allow users to post pure HTML via the WYSIWYG editor, so it has to be sanitized.
A basic task like protecting the site from cross-site scripting (XSS) is becoming difficult, because there is no up-to-date purifying and filtering software for PHP.
HTML Purifier doesn't support HTML5 at the moment and its overall status looks very bad (HTML5 support isn't coming anytime soon).
So how should I sanitize untrusted HTML5 with PHP (back end)?
Options so far...
HTML Purifier (lack of new HTML5 tags, data-attributes etc.)
Implementing own purifier with strip_tags() and Tidy or PHP's DOM classes/functions
Using some "random" Tidy implementations like http://eksith.wordpress.com/2013/11/23/whitelist-html-sanitizing-with-php/
Google Caja (Javascript / Cloud)
htmLawed (there's beta for HTML5 support)
Are there any other options out there? Is PHP dying? ;)
PHP offers functions to protect against PHP/SQL code injection (e.g. mysql_real_escape_string()). This is not the case for HTML/CSS/JavaScript. Why is that?
First: the sole purpose of HTML/CSS/JavaScript is to display information. It is pretty much up to you to accept or reject particular HTML elements depending on your requirements.
Second: because the number of HTML/CSS/JS elements is very high (and constantly increasing), it is impossible to control all of HTML. You cannot expect a fully functional solution.
This is why I would suggest a top-down solution: start by restricting everything and then allow only a certain number of tags. A good base is probably BBCode, which is pretty popular. If you want to "unlock" additional specific tags beyond BBCode, you can always add some.
This is the reason BBCode-like scripts are popular on forums and websites (including Stack Overflow). WYSIWYG editors are designed for admin/internal use, because you don't expect your website administrator to inject bad content.
Bottom-up approaches are doomed to fail. HTML sanitizers face exponentially growing complexity and do not guarantee anything.
EDIT 1
You say it is a sanitization problem, not a front-end issue. I disagree: since you cannot handle all present and future HTML elements, you had better restrict input at the front-end level to be 100% sure.
This said, perhaps the below is a working solution for you:
You can do a bit to sanitize your code by stripping all tags
except those in a whitelist using PHP's strip_tags().
You can also remove all attributes (properties) of the remaining tags
by using PHP's preg_replace() with a regular expression.
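$string = "put some very dirty HTML here.";
// Keep only whitelisted tags.
$string = strip_tags($string, '<p><a><span><h1><li><ul><br>');
// Strip the attributes from every remaining opening tag except <a>.
$string = preg_replace('/<(?!a\b)([a-z][a-z0-9]*)[^>]*?(\/?)>/i', '<$1$2>', $string);
echo $string;
This will return your sanitized text.
Note: I have excluded <a> tags from attribute removal because you may still want to keep href="" properties, hence the (?!a\b) lookahead in the regex. Attributes on <a> tags, including event handlers, are therefore left untouched.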
I believe the ideal is to use a combination:
mysql_real_escape_string(addslashes($_REQUEST['data']));
on write
and
stripslashes($data)
on read. This has always done the trick for me; I think it is better than
htmlentities($data) on write
and
html_entity_decode($data) on read.
I have an entertainment website, and I have thought of a new method to prevent XSS attacks. I have created the following word list:
alert(, javascript, <script>,<script,vbscript,<layer>,
<layer,scriptalert,HTTP-EQUIV,mocha:,<object>,<object,
AllowScriptAccess,text/javascript,<link>, <link,<?php, <?import,
Because my site is about entertainment, I do not expect a normal (non-malicious) user to use such words in a comment. So I have decided to remove all of the comma-separated words above from user-submitted strings. I need your advice: do I still need to use tools like HTML Purifier after doing this?
Note: I am not using htmlspecialchars() because it would also convert the tags generated by my rich text editor (CKEditor), so the user's formatting would be lost.
Using a black list is a bad idea as it is simple to circumvent. For example, you are checking for and presumably removing <script>. To circumvent this, a malicious user can enter:
<scri<script>pt>
Your code will strip out the middle <script>, leaving the outer <script> intact and saved to the page.
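The bypass can be reproduced in a few lines. The filter below is a hypothetical stand-in for any "remove the bad word" approach; it makes a single pass with str_ireplace(), as such filters typically do:

```php
<?php
// Hypothetical naive blacklist filter: delete every occurrence of "<script>".
function naive_filter(string $input): string
{
    return str_ireplace('<script>', '', $input);
}

$payload = '<scri<script>pt>alert(1)</scri<script>pt>';

// Removing the inner "<script>" glues the outer fragments back together.
echo naive_filter($payload), "\n";
// prints: <script>alert(1)</script>
```

Looping until nothing changes would stop this particular payload, but not the many other encodings and elements a blacklist cannot enumerate.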
If you need to enter HTML and your users do not, then prevent them from entering HTML. You need a separate method, accessible only to you, for entering articles that contain HTML.
This approach misunderstands what the HTML-injection problem is, and is utterly ineffective.
There are many, many more ways to put scripting in HTML than the above list, and many ways to evade the filter by using escaped forms. You will never catch all potential "harmful" constructs with this kind of naive sequence blacklisting, and if you try you will inconvenience users with genuine comments. (eg banning use of words beginning with on...)
The correct way to prevent HTML-injection XSS is:
use htmlspecialchars() when outputting content that is supposed to be normal text (which is the vast majority of content);
if you need to allow user-supplied HTML markup, whitelist the harmless tags and attributes you wish to allow, and enforce that using HTMLPurifier or another similar library.
This is a standard and well-understood part of writing a web application, and is not difficult to implement.
Why not just make a function that reverts the changes htmlspecialchars() made for the specific tags you want to be available, such as <b><i><a> etc?
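A rough sketch of that idea, assuming only attribute-less tags are restored (so <a href="..."> would not work with it as written). The function name escape_allowing is made up for illustration:

```php
<?php
// Hypothetical helper: escape everything, then restore a few bare tags.
function escape_allowing(string $text, array $tags): string
{
    $out = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    foreach ($tags as $tag) {
        // Only the exact forms <tag> and </tag> are restored; anything with
        // attributes stays escaped and is shown as literal text.
        $out = str_replace(
            ["&lt;$tag&gt;", "&lt;/$tag&gt;"],
            ["<$tag>", "</$tag>"],
            $out
        );
    }
    return $out;
}

echo escape_allowing('<b>hi</b><script>x</script>', ['b', 'i']), "\n";
// prints: <b>hi</b>&lt;script&gt;x&lt;/script&gt;
```

This does not balance tags, so a stray </b> in the input is restored as well; it cannot inject script, but it can affect page layout.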
Hacks to circumvent your list aside, it's always better to use a whitelist than a blacklist.
In this case, you would already have a clear list of tags that you want to support, so just whitelist tags like <em>, <b>, etc, using some HTML purifier.
You can try
htmlentities()
echo htmlentities("<b>test word</b>");
output: &lt;b&gt;test word&lt;/b&gt;
strip_tags()
echo strip_tags("<b>test word</b>");
output: test word
mysql_real_escape_string()
or try a simple function
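function clean_string($str) {
    // get_magic_quotes_gpc() was removed in PHP 8.0, so check that it exists first.
    if (!function_exists('get_magic_quotes_gpc') || !get_magic_quotes_gpc()) {
        $str = addslashes($str);
    }
    $str = strip_tags(htmlspecialchars($str));
    return $str;
}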
I'm planning to use Markdown syntax on my web page. I will keep the users' input (raw, with no escaping or anything) in the database and then, as usual, escape it on the fly with htmlspecialchars() when printing.
This is how it could look:
echo markdown(htmlspecialchars($content));
By doing that I'm protected from XSS vulnerabilities and Markdown works. Or, at least, it kind of works.
The problem is, let's say, the > syntax (there are other cases too, I think).
In short, to quote you do something like this:
> This is my quote.
After escaping and parsing to Markdown I get this:
&gt; This is my quote.
Naturally, the Markdown parser does not recognize &gt; as the "quote symbol", and it does not work! :(
I came here to ask for solutions to this problem. One idea was:
First parse the Markdown, then remove the "bad parts" with HTML Purifier.
What do you think about it? Would it actually work?
I'm sure someone has been in the same situation and can help me too. :)
Yes, a certain website has that exact same situation. At the time I'm writing this, you have 1664 reputation on that website :)
On Stack Overflow, we do exactly what you describe (except that we don't render on the fly). The user-entered Markdown source is converted to plain HTML, and the result is then sanitized using a whitelist approach (JavaScript version, C# version part 1, part 2).
That's the same approach that HTML Purifier takes (having never used it, I can't speak for details though).
The approach you are using is not secure. Consider, for instance, this example: "[clickme](javascript:alert%28%22xss%22%29)". In general, don't escape the input to the Markdown processor. Instead, use Markdown properly in a safe mode, or apply HTML Purifier or another HTML sanitizer to the output of the Markdown processor.
I've written elsewhere about how to use Markdown securely. See the link for details about how to use it safely, but the short version is: it is important to use the latest version, to set safe_mode, and to set enable_attributes=False.
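One piece of what a sanitizer has to do with the Markdown output is checking link targets. The sketch below is only that piece, with a hypothetical function name and scheme list, and it assumes the href value has already been entity-decoded (for example because it was read from a parsed DOM):

```php
<?php
// Hypothetical helper: accept relative URLs and a small allowlist of schemes.
// Assumes $url is the already entity-decoded attribute value.
function is_safe_url(string $url): bool
{
    $allowed = ['http', 'https', 'mailto'];

    // Browsers ignore whitespace and control characters inside the scheme,
    // so remove them before looking at it.
    $compact = (string) preg_replace('/[\x00-\x20]+/', '', $url);

    if (preg_match('/^([a-z][a-z0-9+.\-]*):/i', $compact, $m) === 1) {
        return in_array(strtolower($m[1]), $allowed, true);
    }

    return true; // no scheme at all: treat as a relative URL
}

var_dump(is_safe_url('javascript:alert%28%22xss%22%29')); // bool(false)
var_dump(is_safe_url('https://example.com/'));            // bool(true)
```

An allowlist of schemes is used rather than a list of bad ones, for the same reason as with tags.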
I developed a web application that lets my users manage some aspects of a website dynamically (yes, some kind of CMS) in a LAMP environment (Debian, Apache, PHP, MySQL).
For example, they create a news item in their private area on my server, and it is then published on their website via a cURL request (or by Ajax).
The news item is created with a WYSIWYG editor (FCK at the moment, probably TinyMCE in the near future).
So I can't disallow HTML tags, but how can I be safe?
What kind of tags MUST I delete (JavaScript?)?
That is about being server-safe, but how can I be 'legally' safe?
If a user uses my application to perform XSS, could I get into legal trouble?
If you are using php, an excellent solution is to use HTMLPurifier. It has many options to filter out bad stuff, and as a side effect, guarantees well formed html output. I use it to view spam which can be a hostile environment.
It doesn't really matter what you're looking to remove, someone will always find a way to get around it. As a reference take a look at this XSS Cheat Sheet.
As an example, how are you ever going to remove this valid XSS attack:
<IMG SRC=javascript:alert('XSS')>
Your best option is to allow only a subset of acceptable tags and remove anything else. This practice is known as whitelisting and is the best method for preventing XSS (besides disallowing HTML).
Also use the cheat sheet in your testing; fire as much as you can at your website and try to find some ways to perform XSS.
The general best strategy here is to whitelist specific tags and attributes that you deem safe, and escape/remove everything else. For example, a sensible whitelist might be <p>, <ul>, <ol>, <li>, <strong>, <em>, <pre>, <code>, <blockquote>, <cite>. Alternatively, consider human-friendly markup like Textile or Markdown that can be easily converted into safe HTML.
Rather than allow HTML, you should have some other markup that can be converted to HTML. Trying to strip out rogue HTML from user input is nearly impossible, for example
<scr<script>ipt etc="...">
Removing <script> from this will leave
<script etc="...">
Kohana's security helper is pretty good. From what I remember, it was taken from a different project.
However I tested out
<IMG SRC=javascript:alert('XSS')>
From LFSR Consulting's answer, and it escaped it correctly.
For a C# example of white list approach, which stackoverflow uses, you can look at this page.
If removing the tags is too difficult, you could reject the whole HTML input until the user enters valid input.
I would reject html if it contains the following tags:
frameset,frame,iframe,script,object,embed,applet.
Other tags you will want to disallow are head (and its sub-tags), body and html, because you want to provide them yourself and do not want the user to manipulate your metadata.
But generally speaking, allowing users to provide their own HTML code always raises security issues.
You might want to consider, rather than allowing HTML at all, implementing some standin for HTML like BBCode or Markdown.
I use PHP's strip_tags function because I want users to be able to post safely, and I allow just a few tags in posts. In this way nobody can hack your website through script injection, so I think strip_tags is the best option.
Click here for the code for this PHP function.
It is a very good PHP function that you can use:
$string = strip_tags($_POST['comment'], "<b>");