Protect HTML form from Javascript/Attacks

I have a form and, as of right now, you can type any JavaScript you want into it: any XSS, etc.
How do I go about creating a whitelist so that only plain characters can be posted?
At some point I would like anything that starts with http:// to be converted to a link.
Thanks
Is this efficient?
http://htmlpurifier.org/

jQuery or Javascript is preferred
Well, no, you can't do that. Even if you 'sanitize' your data using JavaScript, no one is stopping anyone from:
- turning off JavaScript
- using a browser's developer console to mess with the data
- doing the POST directly, without a browser
In other words, you have to perform the validation/sanitization on the server side. Javascript validation is there to enhance the experience of your users (by providing instant feedback on invalid input, for example).
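
A minimal sketch of what a server-side whitelist check could look like. The function name and the set of allowed characters (letters, digits, whitespace and a little punctuation) are assumptions for illustration, not something the question specifies.

```php
<?php
// Hypothetical helper: accept only letters, digits, whitespace and a small
// set of punctuation. The allowed set is an assumption; adjust it as needed.
function is_plain_text(string $input): bool
{
    // \A and \z anchor the whole string; the /u flag enables Unicode classes.
    // preg_match() returns false on invalid UTF-8, which is rejected too.
    return preg_match('/\A[\p{L}\p{N}\s.,!?\'-]+\z/u', $input) === 1;
}

var_dump(is_plain_text("Hello, world!"));             // bool(true)
var_dump(is_plain_text("<script>alert(1)</script>")); // bool(false)
```

Because the check runs in PHP, it still applies when JavaScript is disabled or the POST is sent without a browser.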

That said, many high-load applications do part of their validation on the client side, but every input still has to be checked and escaped on the server before it is written to the database.
As you will be using PHP, I suggest passing your $_POST values through htmlspecialchars() when you output them, and through mysqli_real_escape_string() (or, better, prepared statements) when you write them to the database; the older mysql_real_escape_string() was removed in PHP 7.0.
You will have to use a regular expression to convert anything that starts with "http://" into a link (you can also use explode('.', $_POST['yourInput']), which may be easier for you).
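
As a rough sketch of the regular-expression approach: escape the text first, then wrap anything that looks like a URL in a link. The function name linkify() and the rel="nofollow" attribute are assumptions for illustration; the pattern is deliberately simple and will not handle every URL shape.

```php
<?php
// Hypothetical helper: escape first, then wrap http:// and https:// URLs in links.
function linkify(string $text): string
{
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // After escaping, a match cannot contain raw <, > or quote characters,
    // so it can be placed inside the href attribute as-is.
    $linked = preg_replace(
        '~\bhttps?://[^\s<>"\']+~i',
        '<a href="$0" rel="nofollow">$0</a>',
        $safe
    );

    // preg_replace() returns null on error; fall back to the escaped text.
    return $linked ?? $safe;
}

echo linkify('See http://example.com/?a=1&b=2 <b>now</b>');
```

Escaping before linking matters: doing it the other way round would turn the generated link markup into visible text.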

Related

Stop users from entering PHP code in textarea?

I have a PHP page with textareas that users can change, and their values get saved and displayed on another PHP page - I'm afraid this could be vulnerable to XSS attacks (or whatever malicious hackers are using today)... I see http://htmlpurifier.org is a nice solution to avoid XSS attacks, and I read in an SO thread that PHP code entered into a textarea is ignored by browsers and not executed server-side. I just want to know if htmlpurifier will protect my site fully and if there's any chance that old browsers like IE6 aren't smart enough to ignore PHP code like that. It's my first time making a complex site so I'm tip-toeing around the topic of security... Thanks :)
On a side note, I've used stripslashes and nl2br to avoid formatting issues with apostrophes and line breaks, but is there anything else I should be using to avoid unexpected display issues?

Just use htmlspecialchars() on output and the special characters no longer have their literal meaning and won't be processed by the browser.
PHP code itself will be ignored by the browser. The browser will think it is just some large, weird <?php ... ?> element.
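
A small sketch of escaping on output, combined with the nl2br() call mentioned in the question. The helper name render_comment() is hypothetical.

```php
<?php
// Hypothetical helper for printing user-supplied textarea content.
function render_comment(string $raw): string
{
    // Escape first, then convert newlines; the other order would escape
    // the <br /> tags that nl2br() adds.
    return nl2br(htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'));
}

echo render_comment("It's <?php echo 'hi'; ?>\n<script>alert(1)</script>");
```

Both the PHP snippet and the script tag end up as visible text in the page rather than as markup.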

To answer your questions specifically...
No, you don't have to worry about the browser executing PHP code that a user has entered. That's typically only something you have to worry about when you do "includes" inside PHP scripts, and even then, as long as you structure them properly, you have nothing to worry about. This is because PHP is interpreted server-side (on your webserver) rather than client-side (in the browser). Also, this type of attack would be more in line with RFI or Code Injection (if you'd like some terms to google), rather than XSS.
Stripslashes only removes backslashes (for example, the ones added by magic quotes), so it can help with formatting issues but is not a defense against XSS attacks.
With HTMLPurifier running by itself, you will be fine against XSS attacks (provided you configure it correctly, etc.)
That said, it's always best to filter user input against a whitelist rather than trying to blacklist 'bad' characters/input. What type of data do you want users to be able to input? Just regular text? BBCode + text? Html?

PHP code is server code. Browsers don't include a PHP interpreter so they won't execute it.

Sanitizing Form Input for administrators

In my site's administration area, I have been using mysqli_real_escape_string when retrieving form input that goes into the database. It works fine but I realize that it does not prevent script injections. I mean I can pass through scripts like:
<script>alert('hello');</script>
What do I use in addition to this to prevent a malicious admin from injecting some nasty stuff?
htmlentities()?
strip_tags()?
htmlspecialchars()?
What is the proper way to sanitize form input in back-end forms where html is not required for input data? I am confused?

htmlentities() and htmlspecialchars() are used when you're outputting data. Encoding and escaping are different.
If you don't want HTML, my recommendation would be to use strip_tags() to clean it of any HTML tags and use html* when you're outputting the content.
Also, you might consider switching to PDO with the MySQL driver. Prepared statements are a much more secure way of running your queries.
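
Put together, the flow suggested above might look roughly like this. To keep the sketch self-contained it assumes an in-memory SQLite database (which needs the pdo_sqlite extension) in place of MySQL, and the notes table is hypothetical; with MySQL only the DSN and credentials would differ.

```php
<?php
// Assumption: in-memory SQLite stands in for MySQL so the sketch runs on its own.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)'); // hypothetical table

$input = "<script>alert('hello');</script>Robert'); DROP TABLE notes;--";

// 1. HTML is not wanted here, so drop the tags on input.
$clean = strip_tags($input);

// 2. Bound parameters keep the value out of the SQL text entirely.
$stmt = $pdo->prepare('INSERT INTO notes (body) VALUES (:body)');
$stmt->execute([':body' => $clean]);

// 3. Escape again when printing, whatever the database holds.
$body = $pdo->query('SELECT body FROM notes')->fetchColumn();
echo htmlspecialchars($body, ENT_QUOTES, 'UTF-8');
```

Note that strip_tags() removes the tags but keeps the text between them, which is one more reason to escape on output as well.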

The term you are looking for is Cross Site Scripting or XSS for short. Searching for that should give you plenty of resources, such as this question right here on StackOverflow.

The proper answer is highly dependent on your application.
Many administration systems need a way for admins to manipulate HTML. But some HTML is more dangerous than others.
As JohnP said, strip_tags() can be handy, since the second parameter allows you to explicitly allow certain harmless tags while stripping out everything else.
If you need more sophistication than that, you'll need to do a more careful analysis and come up with a solution tailored to your needs. (Hint: If that solution involves using regular expressions to match HTML tags, you probably want to take a step back)
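
A quick sketch of the second parameter in action, and of why it may not be enough on its own. The choice of <b> and <i> as the allowed tags is an assumption for illustration.

```php
<?php
$html = '<b onclick="alert(1)">bold</b> <i>it</i> <script>alert(2)</script>';

// Assumption: <b> and <i> are the tags this application chooses to allow.
$filtered = strip_tags($html, '<b><i>');

// The <script> tags are gone, but their text content stays, and the
// onclick attribute on the allowed <b> tag is left untouched.
echo $filtered;
```

Since attributes on allowed tags survive, this is the point where a more careful solution (such as the HTMLPurifier library mentioned elsewhere in this thread) becomes worth considering.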

You should use htmlentities().

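On PHP 4, or PHP 5.2 and earlier, you could rely on the magic_quotes setting to escape incoming data, but it was deprecated in PHP 5.3 and removed in PHP 5.4, so it is not an option on current versions.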

Serving JSON and HTML securely to JavaScript

I am thinking of secure ways to serve HTML and JSON to JavaScript. Currently I am just outputting the JSON like:
ajax.php?type=article&id=15
{
"name": "something",
"content": "some content"
}
but I do realize this is a security risk -- because the articles are created by users. So, someone could insert script tags (just an example) for the content and link to his article directly in the AJAX API. Thus, I am now wondering what's the best way to prevent such issues. One way would be to encode all non alphanumerical characters from the input, and then decode in JavaScript (and encode again when put in somewhere).
Another option could be to send some headers that force the browser to never render the response of the AJAX API requests (Content-Type and X-Content-Type-Options).

If you set the Content-Type to application/json then no browser will execute JavaScript on that page. This is a part of RFC 4627, and Google uses this to protect themselves. Other application/ content types follow similar rules.
You still have to worry about DOM-based XSS; however, this would be a problem with your JavaScript, not really with the content of the JSON. Another, more exotic security concern with JSON is information leakage, like this vulnerability in Gmail.
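
A minimal sketch of how ajax.php might send such a response. The helper name article_json() is hypothetical, and the JSON_HEX_* flags are an extra precaution beyond what this answer describes; JSON_THROW_ON_ERROR assumes PHP 7.3 or later.

```php
<?php
// Hypothetical helper: encode an article so that <, >, &, ' and " are emitted
// as \uXXXX escapes, leaving no raw markup in the response body.
function article_json(array $article): string
{
    return json_encode(
        $article,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

$body = article_json([
    'name'    => 'something',
    'content' => '<script>alert(/xss/)</script>',
]);

// Headers can only be sent before any output, hence the guard.
if (!headers_sent()) {
    header('Content-Type: application/json; charset=utf-8');
    header('X-Content-Type-Options: nosniff');
}
echo $body;
```

The client still gets the original string back after parsing the JSON, so the JavaScript that inserts it into the page has to treat it as text, not as HTML.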
Make sure to always test your code. There is the Sitewatch free xss scanner, or the open source Skipfish and finally you could test this manually with a simple <script>alert(/xss/)</script>.

Instead of worrying about how you could encode the malicious code when you return it, you should probably take care that it does not even get into your database. A quick google search about preventing cross-site scripting and input validation might help you here. Cheers

If the user has to be logged in to view the web page then secure the ajax.php with the same authorization mechanism. Then a client that's not logged in cannot access ajax.php directly to retrieve the data.

I don't think your question is about validating user input, as others pointed out. You don't want to provide your JSON api to other people... right?
If this is the case then there isn't much you can do... in fact, even if you were serving HTML instead of JSON, people would still be doing HTML scraping to get what they wanted from your site (this is how Search Engine spiders work).
A good way to limit scraping is to allow only a certain number of downloads per IP address. This way, if someone requests http://yoursite.com/somejson.json more than 100 times a day, you probably know it's a scraper and not someone visiting your page 100 times in one day.
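
The per-IP limit could be sketched like this. The class name DailyLimiter is hypothetical, and the counters live in memory only for the sake of a self-contained example; a real site would need to keep them in a shared store such as a database or cache.

```php
<?php
// Hypothetical in-memory limiter, for illustration only.
final class DailyLimiter
{
    /** @var array<string,int> request counts keyed by "ip|day" */
    private $hits = [];

    /** @var int maximum requests per IP per day */
    private $limit;

    public function __construct(int $limit = 100)
    {
        $this->limit = $limit;
    }

    // Records one request and reports whether it is still within the limit.
    public function allow(string $ip, string $day): bool
    {
        $key = $ip . '|' . $day;
        $this->hits[$key] = ($this->hits[$key] ?? 0) + 1;
        return $this->hits[$key] <= $this->limit;
    }
}

$limiter = new DailyLimiter(100);
$allowed = 0;
for ($i = 0; $i < 105; $i++) {
    if ($limiter->allow('203.0.113.7', '2024-01-01')) {
        $allowed++;
    }
}
echo $allowed; // 100
```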

Insertion of script tags (or SQL) is only a problem if you fail to neutralize it at the point where it could do harm.
A <script> tag in the middle of a comment that somebody submits will not hurt your server and it won't hurt your database. What it would hurt, if you fail to take appropriate measures, would be a page that includes the comment when you subsequently serve it up and it reaches a client browser. In order to prevent that from happening, your code that prepares the page must make sure that user-supplied content is always scrubbed before it is exposed to an unaware interpreter. In this case, that unaware interpreter is a client web browser. In fact, your client web browser really involves two unaware interpreters: the HTML parser & layout engine and the Javascript interpreter.
Another important example of an unaware interpreter is your database server. Note that a <script> tag is (almost certainly) harmless to your database, because "<script>" doesn't mean anything in SQL. It's other sorts of input that cause problems for SQL, like quotes in strings (which are harmless to your HTML pages!).
Stackoverflow would be pretty lame if I couldn't put <script> tags in my answers, as I'm doing now. Same goes for examples of SQL Injection attacks. Recently somebody linked a page from some prominent US bank, where a big <textarea> was footnoted by a warning not to include the characters "<" or ">" in whatever you typed. Predictably, the bank was ridiculed over hundreds of Reddit comments, and rightly so.
Exactly how you "scrub" user-supplied content depends on the unaware interpreter to which you're delivering it. If it's going to be dropped in the middle of HTML markup, then you have to make sure that the "<", ">", and "&" characters are all encoded as HTML entities. (You might want to do quote characters too, if the content might end up in an HTML element attribute value.) If the content is to be dropped into JavaScript, however, you may not need to worry about HTML escaping, but you do need to worry about quotes, and possibly Unicode characters outside the 7-bit range.
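
In PHP, the three contexts described above might be handled roughly as follows. This is a sketch with made-up variable names, not a complete escaping library; the JSON_HEX_* flags are an added precaution for content placed inside an inline script element.

```php
<?php
$comment = '</script><img src=x onerror="alert(1)"> "quoted" & more';

// HTML text context: <, > and & become entities.
$forHtml = htmlspecialchars($comment, ENT_NOQUOTES, 'UTF-8');

// HTML attribute context: quote characters are encoded as well.
$forAttr = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

// JavaScript string context: JSON encoding takes care of quotes, writes
// non-ASCII characters as \uXXXX by default, and the HEX flags keep a
// literal "</script>" out of the output.
$forJs = json_encode(
    $comment,
    JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
);

echo "<p>$forHtml</p>\n";
echo "<input value=\"$forAttr\">\n";
echo "<script>var comment = $forJs;</script>\n";
```

The stored comment is the same in all three cases; only the escaping applied at output time changes with the destination.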

For outputting safe html from php, I recommend http://htmlpurifier.org/

WYSIWYG editor security question (preventing malicious input)

I'm using jWYSIWYG in a form I'm creating that posts to a database and was wondering how you can prevent a malicious user from trying to inject code in the frame?
Doesn't the editor need brackets (which I'd normally strip during the post process) in order to display styles?

If the editor allows arbitrary HTML, you're fighting a losing battle since users could simply use the editor to craft their malicious content.
If the editor only allows for a subset of markup, then it should use an alternative syntax (similar to how stackoverflow does it), or you should escape all HTML except for specific, whitelisted tags.
Note that it's pretty easy to not do this correctly so I would use a third-party solution that has been appropriately tested for security.

Ultimately, the output is in your own hands: when you insert the data into the database, you need to make sure that you strip away anything malicious. The simplest way is probably to run htmlentities() on such data; however, there are other ways bad guys can bypass that. Here is a nice script, also implemented by the popular Kohana PHP framework in its input class, against possible XSS attacks:
http://svn.bitflux.ch/repos/public/popoon/trunk/classes/externalinput.php

I have encountered similar situations, and I have started using HTMLPurifier on my PHP backend which will prevent every attack vector I can think of. It is easy to install, and will allow you to whitelist the elements and attributes. It also prevents the XSS attacks that could still exist whilst using htmlentities.

How to safely allow embed content?

I run a website (sorta like a social network) that I wrote myself. I allow the members to send comments to each other. I take each comment and call this line before saving it in the db:
$com = htmlentities($com);
When I want to display it; I call this piece of code..
$com = html_entity_decode($com);
This works out well most of the time. It allows the users to copy/paste youtube/imeem embed code and send each other videos and songs. It also allows them to upload images to photobucket and copy/paste the embed code to send picture comments.
The problem I have is that some people are basically putting in javascript code there as well that tends to do nasty stuff such as open up alert boxes, change location of webpage and things like that.. I am trying to find a good solution to solving this problem once and for all.. How do other sites allow this kind of functionality?
Thanks for your feedback

First: htmlentities or just htmlspecialchars should be used for escaping strings that you embed into HTML. You shouldn't use it for escaping strings when you insert them into a SQL query. Use mysql_real_escape_string (for MySQL) or, better yet, use prepared statements, which have bound parameters. Make sure that magic_quotes are turned off or otherwise disabled when you manually escape strings.
Second: You don't unescape strings when you pull them out again. E.g. there is no mysql_real_unescape_string. And you shouldn't use stripslashes either. If you find that you need it, then you probably have magic_quotes turned on; turn them off instead, and fix the data in the database before proceeding.
Third: What you're doing with html_entity_decode completely nullifies the intended use of htmlentities. Right now, you have absolutely no protection against a malicious user injecting code into your site (you're vulnerable to cross-site scripting, aka XSS). Strings that you embed into an HTML context should be escaped with htmlspecialchars (or htmlentities). If you absolutely have to embed HTML into your page, you have to run it through a cleaning solution first. strip_tags does this in theory, but in practice it's very inadequate. The best solution I currently know of is HtmlPurifier. However, whatever you do, it is always a risk to let random users embed code into your site. If at all possible, try to design your application such that it isn't needed.
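
To make the third point concrete, here is a small sketch of the round trip from the question, using a made-up comment:

```php
<?php
$com = '<script>alert("nasty")</script>';

$stored  = htmlentities($com, ENT_QUOTES, 'UTF-8');          // what the question saves
$printed = html_entity_decode($stored, ENT_QUOTES, 'UTF-8'); // what the question displays

var_dump($printed === $com); // bool(true): the script comes back intact

// Storing the raw text and escaping once, at output time, avoids this.
$escaped = htmlspecialchars($com, ENT_QUOTES, 'UTF-8');
echo $escaped;
```

Escaping this way would also turn the pasted embed codes into plain text, which is why allowing them safely calls for a cleaning solution rather than a decode step.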

I so hope you are scrubbing the data before you send it to the database. It sounds like you are a prime target for a SQL injection attack. I know this is not your question, but it is something that you need to be aware of.

Yes, this is a problem. A lot of sites solve it by only allowing their own custom markup in user fields.
But if you really want to allow HTML, you'll need to scrub out all "script" tags, along with event-handler attributes (such as onclick) and javascript: URLs, which can also execute code. I believe there are libraries available that do this.

This is how Stackoverflow does it, I think, over at RefacterMyCode.

You may want to consider Zend Filter, it offers a lot more than strip_tags and you do not have to include the entire Zend Framework to use it.
