Preventing server-side scripting, XSS - php

Are there any pre-made scripts that I can use for PHP / MySQL to prevent server-side scripting and JS injections?
I know about the typical functions such as htmlentities, special characters, string replace etc. but is there a simple bit of code or a function that is a failsafe for everything?
Any ideas would be great. Many thanks :)
EDIT: Something generic that strips out anything that could be hazardous, ie. greater than / less than signs, semi-colons, words like "DROP", etc?
I basically just want to compress everything to be alphanumeric, I guess...?

Never output any bit of data whatsoever to the HTML stream that has not been passed through htmlspecialchars() and you're done. Simple rule, easy to follow, completely eradicates any XSS risk.
As a programmer it's your job to do it, though.
You can define
function h(s) { return htmlspecialchars(s); }
if htmlspecialchars() is too long to write 100 times per PHP file. On the other hand, using htmlentities() is not necessary at all.
The key point is: There is code, and there is data. If you intermix the two, bad things ensue.
In the case of HTML, code is elements, attribute names, entities, comments. Data is everything else. Data must be escaped to avoid being mistaken for code.
In case of URLs, code is the scheme, the host name, the path, the mechanism of the query string (?, &, =, #). Data is everything in the query string: parameter names and values. They must be escaped to avoid being mistaken for code.
URLs embedded in HTML must be doubly escaped (by URL-escaping and HTML-escaping) to ensure proper separation of code and data.
Modern browsers are capable of parsing amazingly broken and incorrect markup into something useful. This capability should not be stressed, though. The fact that something happens to work (like URLs in <a href> without proper HTML-escaping applied) does not mean that it's good or correct to do it. XSS is a problem that roots in a) people unaware of data/code separation (i.e. "escaping") or those that are sloppy and b) people that try to be clever about what part of data they don't need to escape.
XSS is easy enough to avoid if you make sure you don't fall into categories a) and b).

I think Google-caja maybe a solution. I write a taint analyzer for java web application to detect and prevent XSS automatically. But not for PHP. I think Learning to using caja not bad for web developer.

No, there isn't. Risks depend on what you do with data, you can't write something that makes data safe for everything (unless you want to discard most of the data)

is there a simple bit of code or a function that is a failsafe for everything?
No.
The representation of data leaving PHP must be converted / encoded specifically according where it is going. And therefore should only be converted/encoded at the point where it leaves PHP.
C.

You can refer to OWASP to get more understanding of XSS attacks:
https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet
To avoid js attacks, you can try this project provided by open source excellence:
https://www.opensource-excellence.com/shop/ose-security-suite.html
my website had the js attacks before, and this tool helps me block all attacks everyday. i think it can help you guys to avoid the problem.
Further, you can add a filter in your php script to filter all js attacks, here is one pattern that can do job:
if (preg_match('/(?:[".]script\s*()|(?:\$\$?\s*(\s*[\w"])|(?:/[\w\s]+/.)|(?:=\s*/\w+/\s*.)|(?:(?:this|window|top|parent|frames|self|content)[\s*[(,"]\s[\w\$])|(?:,\s*new\s+\w+\s*[,;)/ms', strtolower($POST['VARIABLENAME'])))
{
filter_variable($POST['VARIABLENAME']);
}

To answer to your edition: everything except <> symbols has nothing to do with XSS.
And htmlspecialchars() can deal with them.
There is no harm in the word DROP table in the page's text ;)

for clean user data use
html_special_chars(); str_replace() and other funcs to cut unsafe data.

Related

Is my method of finding XSS vulnerability's a good place to start?

Information
This is a quick question regarding finding possible XSS attacks in my website.
I am currently securing my website and have learnt that a good way to prevent XSS is to use htmlspecialchars($string, ENT_QUOTES, 'UTF-8'); to make sure that the html is displayed rather than ran.
My solution
I have pumped my database with test data which is exactly as below <script>console.log("This Page is Vulnerable");</script>
Therefore any page that displays any row of data that isnt escaped will put out a console.log which will then allow me to hunt it down in my source and escape it.
Question
Now I understand this isn't the only thing I would have to do to prevent XSS, but does this at least narrow the possibility of persistent XSS attacks?
Lastly, does anyone have any advice on where to go from here? (I understand this is a vague question, so please ignore if you like. Otherwise, any questions will be answered ASAP.)
Thank you.
Yes, ofcourse, a single function wrapping the input/output data is never enough. Programming ain't no magic, it's logic.
Assuming you have this example HTML/JS/PHP for some reason:
<form action="" method="POST" />
<input type="text" name="yourInput" />
<input type="submit"/>
</form>
<script><?= htmlspecialchars($_POST['yourInput'], ENT_QUOTES, 'UTF-8');?></script>
For some reason, which maybe nobody can explain, you put the user input into <script> tags. You have used HTML Special Chars, so no quotes or <, > will be present, but let's say the user enter this:
alert(document.cookie)
there's no quotes and opening tags, but still after submitting the form, on the next load an ALERT with the current user cookie will be displayed.
So, as I said in my comments, you should go through all your inputs and test their behavior. Try to think as a potential hacker.
It's not all about using a function which is wrapping the data, but where the data is used. If you put it in the wrong place, no function will save you. In the example above, you need to place data in the right place, and if you are going to use it - use it as a string.
I'd say it is exactly what you say — "a good start" — insofar as it should catch any simple mistakes where you simply meant to escape the user input but forgot.
In fact, I'd be inclined to simplify it further, and just try injecting an arbitrary string that should get escaped, like, say "<<<<<", and searching the HTML source of the returned page for that string to see if it appears unescaped anywhere. (Don't forget to also test it with single and double quotes, since those need to be escaped too in attributes.) This kind of testing would even be fairly simple to automate.
What neither of those methods will catch is cases where you've tried to escape or sanitize the input, but have done it poorly, so that the simple test inputs won't pass, but other vulnerabilities still remain. However, if you can be confident that you haven't done that, then these simple tests should be enough.
Anyway, where you IMO should go from here is getting in the habit of writing your code such that XSS vulnerabilities can't even happen. In particular, get used to:
Escaping every string that isn't supposed to be HTML with htmlspecialchars() before embedding it in HTML. (Of course, the same goes, mutatis mutandis, for embedding anything that isn't supposed to be SQL into SQL code, or anything that isn't supposed to be JavaScript into JavaScript code, etc.)
This is IMO best done just before the string is printed or concatenated with HTML, so that you never have to worry about whether or not the string has already been escaped earlier. (Tip: if you're tired of typing htmlspecialchars() all the time, consider defining your own, shorter alias for it.)
Thinking about user inputs (or anything else potentially coming from unknown sources) not in terms of what they're supposed to contain, or even what an attacker might inject into them, but simply in terms of what they could possibly contain.
That is, if your code takes a string as an input, write it so that it behaves correctly even if given any string of any length containing any arbitrary bytes in any order.
Conversely, if you cannot be sure that some code will do that (e.g. because you didn't write it), explicitly verify and/or force any inputs you pass into it to only have values that you know it can handle.

Is using htmlspecialchars() sufficient in all situations?

My users are allowed to insert anything into my database.
So using a whitelist / blacklist of characters is not an option.
I'm not worried (covered it) about the database end (SQL injection), but rather code injection in my pages.
Are there any situations where htmlspecialchars() wouldn't be sufficient to prevent code injection?
Plain htmlspecialchars is not sufficient when inserting user text into single quoted attributes. You need to add ENT_QUOTES in that case and you need to pass the encoding.
<tag attr='<?php echo htmlspecialchars($usertext);?>'> //dangerous if ENT_QUOTES is not used
When inserting user text into javascript/json as string you'll need additional escaping.
I think it fails for stange character sets too. But if you use one of the usual charsets UTF-8, Latin1,... it will work as expected.
No, it's not sufficient in all situations. It highly depends on your codebase. For example, if you use JavaScript to make certain AJAX requests to a database, htmlspecialchars() will sometimes not be enough (depending where you use it). If you want to protect cookies from JavaScript XSS, htmlspecialchars() will also not be good enough.
Here are some examples of when htmlspecialchars() may fail: https://www.owasp.org/index.php/Interpreter_Injection#Why_htmlspecialchars_is_not_always_enough. Your question is also highly dependent on what database you're using (not everyone uses MySQL). If you're writing a complex applicaton I highly suggest using one of the many frameworks out there that abstract these annoying little idiosyncrasies and let you worry about the application code.
Using htmlspecialchars is sufficient when inserting inside HTML code. The way it encodes the characters makes it impossible for the resulting text to “break out” of the current element. That way it can neither create other elements, nor script segments etc.
However in all other situations, htmlspecialchars it not automatically enough. For example when you use it to insert code within some JavaScript area, for example when you fill a JavaScript string with it, you will need additional methods to make it safe. In that case addslashes could help.
So depending on where you insert the resulting text, htmlspecialchars gives you either enough security or not. As the function name already suggests, it just promises security for HTML.
htmlspecialchars will suffice. With < and > being converted to < and > you cannot include scripts anymore.

What is the correct/safest way to escape input in a forum?

I am creating a forum software using php and mysql backend, and want to know what is the most secure way to escape user input for forum posts.
I know about htmlentities() and strip_tags() and htmlspecialchars() and mysql_real_escape_string(), and even javascript's escape() but I don't know which to use and where.
What would be the safest way to process these three different types of input (by process, I mean get, save in a database, and display):
A title of a post (which will also be the basis of the URL permalink).
The content of a forum post limited to basic text input.
The content of a forum post which allows html.
I would appreciate an answer that tells me how many of these escape functions I need to use in combination and why.
Thanks!
When generating HTLM output (like you're doing to get data into the form's fields when someone is trying to edit a post, or if you need to re-display the form because the user forgot one field, for instance), you'd probably use htmlspecialchars() : it will escape <, >, ", ', and & -- depending on the options you give it.
strip_tags will remove tags if user has entered some -- and you generally don't want something the user typed to just disappear ;-)
At least, not for the "content" field :-)
Once you've got what the user did input in the form (ie, when the form has been submitted), you need to escape it before sending it to the DB.
That's where functions like mysqli_real_escape_string become useful : they escape data for SQL
You might also want to take a look at prepared statements, which might help you a bit ;-)
with mysqli - and with PDO
You should not use anything like addslashes : the escaping it does doesn't depend on the Database engine ; it is better/safer to use a function that fits the engine (MySQL, PostGreSQL, ...) you are working with : it'll know precisely what to escape, and how.
Finally, to display the data inside a page :
for fields that must not contain HTML, you should use htmlspecialchars() : if the user did input HTML tags, those will be displayed as-is, and not injected as HTML.
for fields that can contain HTML... This is a bit trickier : you will probably only want to allow a few tags, and strip_tags (which can do that) is not really up to the task (it will let attributes of the allowed tags)
You might want to take a look at a tool called HTMLPUrifier : it will allow you to specify which tags and attributes should be allowed -- and it generates valid HTML, which is always nice ^^
This might take some time to compute, and you probably don't want to re-generate that HTML each time is has to be displayed ; so you can think about storing it in the database (either only keeping that clean HTML, or keeping both it and the not-clean one, in two separate fields -- might be useful to allow people editing their posts ? )
Those are only a few pointers... hope they help you :-)
Don't hesitate to ask if you have more precise questions !
mysql_real_escape_string() escapes everything you need to put in a mysql database. But you should use prepared statements (in mysqli) instead, because they're cleaner and do any escaping automatically.
Anything else can be done with htmlspecialchars() to remove HTML from the input and urlencode() to put things in a format for URL's.
There are two completely different types of attack you have to defend against:
SQL injection: input that tries to manipulate your DB. mysql_real_escape_string() and addslashes() are meant to defend against this. The former is better, but parameterized queries are better still
Cross-Site scripting (XSS): input that, when displayed on your page, tries to execute JavaScript in a visitor's browser to do all kinds of things (like steal the user's account data). htmlspecialchars() is the definite way to defend against this.
Allowing "some HTML" while avoiding XSS attacks is very, very hard. This is because there are endless possibilities of smuggling JavaScript into HTML. If you decided to do this, the safe way is to use BBCode or Markdown, i.e. a limited set of non-HTML markup that you then convert to HTML, while removing all real HTML with htmlspecialchars(). Even then you have to be careful not to allow javascript: URLs in links. Actually allowing users to input HTML is something you should only do if it's absolutely crucial for your site. And then you should spend a lot of time making sure you understand HTML and JavaScript and CSS completely.
The answer to this post is a good answer
Basically, using the pdo interface to parameterize your queries is much safer and less error prone than escaping your inputs manually.
I have a tendency to escape all characters that would be problematic in page display, Javascript and SQL all at the same time. It leaves it readable on the web and in HTML eMail and at the same time removes any problems with the code.
A vb.NET Line Of Code Would Be:
SafeComment = Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
HttpUtility.HtmlEncode(Trim(strInput)), _
":", ":"), "-", "-"), "|", "|"), _
"`", "`"), "(", "("), ")", ")"), _
"%", "%"), "^", "^"), """", """), _
"/", "/"), "*", "*"), "\", "\"), _
"'", "'")
First of all, general advice: don't escape variables literally when inserting in the database. There are plenty of solutions that let you use prepared statements with variable binding. The reason to not do this explicitly is because it is only a matter of time then before you forget it just once.
If you're inserting plain text in the database, don't try to clean it on insert, but instead clean it on display. That is to say, use htmlentities to encode it as HTML (and pass the correct charset argument). You want to encode on display because then you're no longer trusting that the database contents are correct, which isn't necessarily a given.
If you're dealing with rich text (html), things get more complicated. Removing the "evil" bits from HTML without destroying the message is a difficult problem. Realistically speaking, you'll have to resort to a standardized solution, like HTMLPurifier. However, this is generally too slow to run on every page view, so you'll be forced to do this when writing to the database. You'll also have to ensure that the user can see their "cleaned up" html and correct the cleaned up version.
Definitely try to avoid "rolling your own" filter or encoding solution at any step. These problems are notoriously tricky, and you run a large risk of overlooking some minor detail that has big security implications.
I second Joeri, do not roll your own, go here to see some of the the many possible XSS attacks
http://ha.ckers.org/xss.html
htmlentities() -> turns text into html, converting characters to entities. If using UTF-8 encoding then use htmlspecialchars() instead as the other entities are not needed. This is the best defence against XSS. I use it on every variable I output regardless of type or origin unless I intend it to be html. There is only a tiny performance cost and it is easier than trying to work out what needs escaping and what doesn't.
strip_tags() - turns html into text by removing all html tags. Use this to ensure that there is nothing nasty in your input as a adjunct to escaping your output.
mysql_real_escape_string() - escapes a string for mysql and is your defence against SQL injections from little Bobby tables (better to use mysqli and prepare/bind as escaping is then done for you and you can avoid lots of messy string concatenations)
The advice given obve re avoiding HTML input unless it is essential and opting for BBCode or similar (make your own up if needs be) is very sound indeed.

Comprehensive server-side validation

I currently have a fairly robust server-side validation system in place, but I'm looking for some feedback to make sure I've covered all angles. Here is a brief outline of what I'm doing at the moment:
Ensure the input is not empty, or is too long
Escape query strings to prevent SQL injection
Using regular expressions to reject invalid characters (this depends on what's being submitted)
Encoding certain html tags, like <script> (all tags are encoded when stored in a database, with some being decoded when queried to render in the page)
Is there anything I'm missing? Code samples or regular expressions welcome.
You shouldn't need to "Escape" query strings to prevent SQL injection - you should be using prepared statements instead.
Ideally your input filtering will happen before any other processing, so you know it will always be used. Because otherwise you only need to miss one spot to be vulnerable to a problem.
Don't forget to encode HTML entities on output - to prevent XSS attacks.
You should encode every html tag, not only 'invalid' ones. This is a hot debate, but basically it boils down to there will always be some invalid HTML combination that you will forget to handle correctly (nested tags, mismatched tags some browsers interpret 'correctly' and so on). So the safest option in my opinion is to store everything as htmlentities and then, on output, print a validated HTML-safe-subset tree (as entities) from the content.
Run all server-side validation in a library dedicated to the task so that improvements in one area affect all of your application.
Additionally include work against known attacks, such as directory traversal and attempts to access the shell.
This Question/Answer has some good responses that you're looking for
(PHP-oriented, but then again you didn't specify language/platform and some of it applies beyond the php world):
What's the best method for sanitizing user input with PHP?
You might check out the Filter Extension for data filtering. It won't guarantee that you're completely airtight, but personally I feel a lot better using it because that code has a whole lot of eyeballs looking over it.
Also, consider prepared statements seconded. Escaping data in your SQL queries is a thing of the past.

How do I HTML Encode all the output in a web application?

I want to prevent XSS attacks in my web application. I found that HTML Encoding the output can really prevent XSS attacks. Now the problem is that how do I HTML encode every single output in my application? I there a way to automate this?
I appreciate answers for JSP, ASP.net and PHP.
One thing that you shouldn't do is filter the input data as it comes in. People often suggest this, since it's the easiest solution, but it leads to problems.
Input data can be sent to multiple places, besides being output as HTML. It might be stored in a database, for example. The rules for filtering data sent to a database are very different from the rules for filtering HTML output. If you HTML-encode everything on input, you'll end up with HTML in your database. (This is also why PHP's "magic quotes" feature is a bad idea.)
You can't anticipate all the places your input data will travel. The safe approach is to prepare the data just before it's sent somewhere. If you're sending it to a database, escape the single quotes. If you're outputting HTML, escape the HTML entities. And once it's sent somewhere, if you still need to work with the data, use the original un-escaped version.
This is more work, but you can reduce it by using template engines or libraries.
You don't want to encode all HTML, you only want to HTML-encode any user input that you're outputting.
For PHP: htmlentities and htmlspecialchars
For JSPs, you can have your cake and eat it too, with the c:out tag, which escapes XML by default. This means you can bind to your properties as raw elements:
<input name="someName.someProperty" value="<c:out value='${someName.someProperty}' />" />
When bound to a string, someName.someProperty will contain the XML input, but when being output to the page, it will be automatically escaped to provide the XML entities. This is particularly useful for links for page validation.
A nice way I used to escape all user input is by writing a modifier for smarty wich escapes all variables passed to the template; except for the ones that have |unescape attached to it. That way you only give HTML access to the elements you explicitly give access to.
I don't have that modifier any more; but about the same version can be found here:
http://www.madcat.nl/martijn/archives/16-Using-smarty-to-prevent-HTML-injection..html
In the new Django 1.0 release this works exactly the same way, jay :)
My personal preference is to diligently encode anything that's coming from the database, business layer or from the user.
In ASP.Net this is done by using Server.HtmlEncode(string) .
The reason so encode anything is that even properties which you might assume to be boolean or numeric could contain malicious code (For example, checkbox values, if they're done improperly could be coming back as strings. If you're not encoding them before sending the output to the user, then you've got a vulnerability).
You could wrap echo / print etc. in your own methods which you can then use to escape output. i.e. instead of
echo "blah";
use
myecho('blah');
you could even have a second param that turns off escaping if you need it.
In one project we had a debug mode in our output functions which made all the output text going through our method invisible. Then we knew that anything left on the screen HADN'T been escaped! Was very useful tracking down those naughty unescaped bits :)
If you do actually HTML encode every single output, the user will see plain text of <html> instead of a functioning web app.
EDIT: If you HTML encode every single input, you'll have problem accepting external password containing < etc..
The only way to truly protect yourself against this sort of attack is to rigorously filter all of the input that you accept, specifically (although not exclusively) from the public areas of your application. I would recommend that you take a look at Daniel Morris's PHP Filtering Class (a complete solution) and also the Zend_Filter package (a collection of classes you can use to build your own filter).
PHP is my language of choice when it comes to web development, so apologies for the bias in my answer.
Kieran.
OWASP has a nice API to encode HTML output, either to use as HTML text (e.g. paragraph or <textarea> content) or as an attribute's value (e.g. for <input> tags after rejecting a form):
encodeForHTML($input) // Encode data for use in HTML using HTML entity encoding
encodeForHTMLAttribute($input) // Encode data for use in HTML attributes.
The project (the PHP version) is hosted under http://code.google.com/p/owasp-esapi-php/ and is also available for some other languages, e.g. .NET.
Remember that you should encode everything (not only user input), and as late as possible (not when storing in DB but when outputting the HTTP response).
Output encoding is by far the best defense. Validating input is great for many reasons, but not 100% defense. If a database becomes infected with XSS via attack (i.e. ASPROX), mistake, or maliciousness input validation does nothing. Output encoding will still work.
there was a good essay from Joel on software (making wrong code look wrong I think, I'm on my phone otherwise I'd have a URL for you) that covered the correct use of Hungarian notation. The short version would be something like:
Var dsFirstName, uhsFirstName : String;
Begin
uhsFirstName := request.queryfields.value['firstname'];
dsFirstName := dsHtmlToDB(uhsFirstName);
Basically prefix your variables with something like "us" for unsafe string, "ds" for database safe, "hs" for HTML safe. You only want to encode and decode where you actually need it, not everything. But by using they prefixes that infer a useful meaning looking at your code you'll see real quick if something isn't right. And you're going to need different encode/decode functions anyways.

Categories