Confused on htmlspecialchars, real_escape_string, etc


I've written a decent admin interface that includes inventory management, content management, and blogging. Now it's time to lock it down and make it secure (yes, I should have been doing that from the beginning...).
For blog creation/editing, I'm using CKEditor, which posts HTML output to editblog.php. I'm also using simple text inputs for Title, Author, etc.
I'm concerned because the blog posts will contain img src="uploads/etc.jpg", as well as divs, spans, etc.
SO! When I sanitize this data, how do I make sure that all those quotes and slashes can be safely stored in my SQL database, and what do I do to spit it back out on the frontend? I'm also concerned because if the blogger "quotes" something, I don't want that to be messed with either.
For simple inputs like title and author, I'm using $title = mysqli_real_escape_string($link, $title); ($link being the mysqli connection).
But is that enough? How do I preserve the user's intended input while avoiding attack?
I've done my research and yet I still don't get it. I hope someone can break it down nice and simple for me...

Nice and simple...
You always sanitize for the context to which you want to write.
These techniques will preserve the user's input, but prevent that input from being interpreted as code within a specific context.
When you want to query the database, you are worried about SQL injection attacks:
Use mysqli_real_escape_string to escape string values for the database query.
When you want to display something (as HTML) that will be parsed by the browser, you are worried about cross site scripting:
Use htmlspecialchars to sanitize for HTML output.
This will provide a basic level of security.
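For plain-text fields such as title and author, escaping at output time might look like the minimal sketch below. It assumes the page is served as UTF-8. The helper name e() is hypothetical. With ENT_QUOTES, htmlspecialchars converts <, >, & and both quote characters into entities. The blogger's "quotes" therefore show up exactly as typed but are never parsed as markup. This approach does not suit the CKEditor body, which is meant to be HTML.

```php
<?php
// Hypothetical helper: escape a string for HTML text or a quoted attribute value.
function e(string $value): string
{
    // ENT_QUOTES escapes both " and '; the charset must match the page's charset.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

$title = 'She said "hi" & <b>waved</b>';

// The browser displays this text literally instead of treating <b> as a tag.
echo '<h1>' . e($title) . '</h1>', PHP_EOL;
// → <h1>She said &quot;hi&quot; &amp; &lt;b&gt;waved&lt;/b&gt;</h1>
```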
For more security on the database side, you should look at prepared statements and PHP PDO.
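Here is a sketch of the prepared-statement approach with PDO. To keep it runnable without a MySQL server, it uses an in-memory SQLite database, which assumes the pdo_sqlite extension is available. Against MySQL you would mainly change the DSN. The table and column names are hypothetical. The sketch also shows that quotes and slashes come back out exactly as they went in, with no unescaping step.

```php
<?php
// Sketch only: in-memory SQLite stands in for MySQL so this runs without a server.
// For MySQL, a DSN such as 'mysql:host=localhost;dbname=blog;charset=utf8mb4' would be used.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)');

// Made-up input containing quotes, a backslash, and HTML markup.
$title = 'O\'Reilly says "hi" \\o/';
$body  = '<p>See <img src="uploads/etc.jpg"></p>';

// With placeholders, the driver keeps the values from being interpreted as SQL,
// so no manual escaping call is needed.
$insert = $pdo->prepare('INSERT INTO posts (title, body) VALUES (:title, :body)');
$insert->execute([':title' => $title, ':body' => $body]);

$select = $pdo->prepare('SELECT title, body FROM posts WHERE id = ?');
$select->execute([(int) $pdo->lastInsertId()]);
$row = $select->fetch(PDO::FETCH_ASSOC);

// The stored values come back unchanged; there is nothing to unescape.
var_dump($row['title'] === $title, $row['body'] === $body);
```

The HTML body is stored as-is. Making it safe to display is a separate, output-time concern.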
For more information about some pitfalls of htmlspecialchars, take a look at @Cheekysoft's excellent explanation: htmlspecialchars and mysql_real_escape_string

Related

How dangerous is it to output certain content without escaping it first

Following on from a question I asked about escaping content when building a custom CMS, I wanted to find out how dangerous not escaping content from the DB can be. Assume the data has been filtered/validated prior to insertion into the DB.
I know it's a best practice to escape output but I'm just not sure how easy or even possible it is for someone to 'inject' a value into page content that is to be displayed.
For example, let's assume this content with HTML markup is displayed using a simple echo statement:
<p>hello</p>
Admittedly it won't win any awards as far as content writing goes ;)
My question is: can someone alter that for evil purposes, assuming it was filtered/validated before insertion into the DB?

Always escape for the appropriate context; it doesn't matter if it's JSON or XML/HTML or CSV or SQL (although you should be using placeholders for SQL and a library for JSON), etc.
Why? Because it's consistent. And being consistent is also a form of being lazy: you don't need to ponder if the data is "safe for HTML" because it shouldn't matter. And being lazy (in a good way) is a valuable programming trait. (In this case it's also being lazy about avoiding having to fix "bugs" due to changes in the future.)
Don't skip escaping "because it will never contain data that needs to be escaped". One day, as the code and the ways it is used change, that assumption will be wrong.
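To illustrate "escape for the appropriate context", the sketch below takes one untrusted string and prepares it for three destinations: HTML, JSON embedded in a <script> block, and a CSV line. It assumes PHP 7.4 or later, which is needed for the empty escape argument to fputcsv. The sample string is made up.

```php
<?php
// One untrusted string, escaped separately for three output contexts.
$comment = '</script><b>"Hi", she said & left</b>';

// HTML text or attribute value: entity-encode the special characters.
$html = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

// JSON inside a <script> block: the JSON_HEX_* flags encode < > & ' " as \uXXXX,
// so the string cannot close the surrounding script element early.
$json = json_encode($comment, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

// CSV: let fputcsv handle quoting (an empty escape char gives RFC 4180 style doubling).
$fh = fopen('php://memory', 'r+');
fputcsv($fh, ['42', $comment], ',', '"', '');
rewind($fh);
$csv = stream_get_contents($fh);
fclose($fh);

echo $html, PHP_EOL, $json, PHP_EOL, $csv;
```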

If you do not escape your HTML output, someone could simply insert scripts into the HTML of your page, and they would run in the browser of every client that visits it. This is called cross-site scripting (XSS).
For example:
<p>hello</p><script>alert('I could run any other Javascript code here!');</script>
In the place of the alert(), you can use basically anything: access cookies, manipulate the DOM, communicate with other servers, et cetera.
Well, this is a very easy way of inserting scripts, and strip_tags can protect against this one. But there are hundreds of more sophisticated tricks that strip_tags simply won't protect against.
If you really want to store and output HTML, HTMLPurifier could be your solution:
Hackers have a huge arsenal of XSS vectors hidden within the depths of
the HTML specification. HTML Purifier is effective because it
decomposes the whole document into tokens and removing non-whitelisted
elements, checking the well-formedness and nesting of tags, and
validating all attributes according to their RFCs. HTML Purifier's
comprehensive algorithms are complemented by a breadth of knowledge,
ensuring that richly formatted documents pass through unstripped.

It could also become a problem in combination with other vulnerabilities, such as SQL injection. An attacker could then bypass your filtering/validation, write directly to the DB, and display whatever they want.

If you are pulling the word hello from the database and displaying it, nothing will happen. If the content contains <script> tags, though, it is dangerous, because users' cookies can be stolen and used to hijack their sessions.

What is a function that will allow output with HTML and avoid XSS attacks

I am looking for a way or function that will let me display data from my MySQL database. Users are allowed to post articles, and I use mysql_real_escape_string to avoid SQL injection before inserting their posts into the DB.
For testing purposes, I write a post in a textarea with tags like <b> <a> <i> <li>.
Later I will use an editor like the one here on Stack Overflow to help users with their posts.
However, I am aware of XSS, and I know that echoing straight from the DB may lead to XSS attacks. So for my tests I chose to output the content with htmlentities or htmlspecialchars. Neither of them shows the post correctly with its HTML formatting.
So I tried strip_tags, but as far as I know and have read, it is not safe.
What function do you use that will let me output the data correctly, formatted just like this, while preventing XSS?

If you want HTML to display correctly, you have to print the HTML as you received it.
But to avoid XSS, try to remove <script> tags, and don't allow images to load from external resources.
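One way to apply "don't allow images from external resources" is to check each img src against a strict allow-list before accepting the HTML. This sketch assumes a hypothetical policy: images must live under the site's own uploads/ directory, with a plain file name and a common image extension. Extracting the src values from the submitted HTML is not shown, and that step would need a real HTML parser.

```php
<?php
// Hypothetical policy: only images under the site's own uploads/ directory are allowed.
// The character class contains no ':' and no '.', so schemes (http:, javascript:, data:),
// hosts, and "../" segments cannot match; \z rejects a trailing newline.
function is_allowed_image_src(string $src): bool
{
    return preg_match('~^uploads/[A-Za-z0-9_/-]+\.(?:jpe?g|png|gif)\z~i', $src) === 1;
}

$samples = ['uploads/etc.jpg', 'http://evil.example/x.jpg', 'javascript:alert(1)', 'uploads/../../secret.png'];
foreach ($samples as $src) {
    echo $src, ' => ', is_allowed_image_src($src) ? 'allowed' : 'rejected', PHP_EOL;
}
```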

Sanitizing Form Input for administrators

In my site's administration area, I have been using mysqli_real_escape_string when retrieving form input that goes into the database. It works fine but I realize that it does not prevent script injections. I mean I can pass through scripts like:
<script>alert('hello');</script>
What do I use in addition to this to prevent a malicious admin from injecting some nasty stuff?
htmlentities()?
strip_tags()?
htmlspecialchars()?
What is the proper way to sanitize form input in back-end forms where HTML is not required for input data? I am confused.

htmlentities() and htmlspecialchars() are used when you're outputting data. Encoding and escaping are different.
If you don't want HTML, my recommendation would be to use strip_tags() to clean the input of any HTML tags, and then use htmlentities() or htmlspecialchars() when you're outputting the content.
Also, you might consider switching to PDO. It is the preferred and more secure way of running your queries.
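Here is a minimal sketch of the strip-on-input, escape-on-output approach for a plain-text admin field, using the script from the question. The variable names are illustrative, and the actual database write is left out (ideally it would be a prepared statement). Note that strip_tags removes the tags but keeps the text between them. That is one reason to still escape on output.

```php
<?php
// Plain-text admin field (e.g. a product name) where no HTML is wanted.
$raw = "<script>alert('hello');</script>Blue Widget";

// On input: remove any tags before storing.
$stored = trim(strip_tags($raw));

// On output: escape anyway, since stripping and escaping are separate steps.
$safe = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

echo $safe, PHP_EOL;
```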

The term you are looking for is cross-site scripting, or XSS for short. Searching for that should give you plenty of resources, such as this question right here on Stack Overflow.

The proper answer is highly dependent on your application.
Many administration systems need a way for admins to manipulate HTML. But some HTML is more dangerous than others.
As JohnP said, strip_tags() can be handy, since its second parameter lets you explicitly allow certain harmless tags while stripping out everything else.
If you need more sophistication than that, you'll need to do a more careful analysis and come up with a solution tailored to your needs. (Hint: If that solution involves using regular expressions to match HTML tags, you probably want to take a step back)
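One detail is worth knowing before relying on that second parameter. The PHP manual notes that strip_tags does not modify attributes on the tags you allow. This sketch, with made-up input, shows an allowed <b> tag carrying an event handler straight through.

```php
<?php
// Allow-list of <b> and <i>; every other tag is removed.
$input = '<b onmouseover="alert(document.cookie)">hover me</b> <i>ok</i> <script>bad()</script>';
$filtered = strip_tags($input, '<b><i>');

// The allowed <b> tag keeps all of its attributes, including the event handler.
echo $filtered, PHP_EOL;
```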

You should use htmlentities().

You can use the magic_quotes_gpc setting to escape input if you're using PHP 5.2 or earlier. (magic_quotes was deprecated in PHP 5.3 and removed in PHP 5.4.)

WYSIWYG editor security question (preventing malicious input)

I'm using jWYSIWYG in a form I'm creating that posts to a database and was wondering how you can prevent a malicious user from trying to inject code in the frame?
Doesn't the editor need brackets (which I'd normally strip during the post process) in order to display styles?

If the editor allows arbitrary HTML, you're fighting a losing battle since users could simply use the editor to craft their malicious content.
If the editor only allows a subset of markup, then it should use an alternative syntax (similar to how Stack Overflow does it), or you should escape all HTML except for specific, whitelisted tags.
Note that it's pretty easy to not do this correctly so I would use a third-party solution that has been appropriately tested for security.
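To show why doing this correctly is fiddly, here is a minimal sketch of "escape all HTML except specific, whitelisted tags". It assumes only attribute-free <b>, <i> and <li> tags are needed, and the function name is hypothetical. The sketch escapes the whole input first, then restores only the exact escaped forms of the allowed tags, so any tag with attributes stays inert. It does not check that tags are balanced or properly nested. That gap is one reason a tested third-party solution is the safer choice.

```php
<?php
// Hypothetical helper: escape everything, then restore only bare whitelisted tags.
// Only the exact "&lt;b&gt;" / "&lt;/b&gt;" forms are matched, so tags with attributes stay escaped.
// Limitation: tags are not checked for balance or nesting.
function escape_except_whitelist(string $input, array $allowedTags = ['b', 'i', 'li']): string
{
    $out = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
    foreach ($allowedTags as $tag) {
        $pattern = '~&lt;(/?)' . preg_quote($tag, '~') . '&gt;~i';
        $out = preg_replace($pattern, '<${1}' . $tag . '>', $out);
    }
    return $out;
}

echo escape_except_whitelist('<b>bold</b> <script>alert(1)</script> <i onclick="x()">hi</i>'), PHP_EOL;
```

In the output, the unbalanced trailing </i> illustrates the nesting limitation.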

Ultimately, what gets output is in your own hands: when you insert the data into the database, you need to make sure that you strip away anything malicious. The simplest way is probably to use htmlentities on such data; however, there are other ways attackers can bypass that. Here is a nice script, also used by the popular Kohana PHP framework in its input class, to guard against possible XSS attacks:
http://svn.bitflux.ch/repos/public/popoon/trunk/classes/externalinput.php

I have encountered similar situations, and I have started using HTMLPurifier on my PHP backend which will prevent every attack vector I can think of. It is easy to install, and will allow you to whitelist the elements and attributes. It also prevents the XSS attacks that could still exist whilst using htmlentities.

How to safely allow embed content?

I run a website (sort of like a social network) that I wrote myself. I allow members to send comments to each other. Before saving a comment in the DB, I call this line:
$com = htmlentities($com);
When I want to display it, I call this code:
$com = html_entity_decode($com);
This works out well most of the time. It allows the users to copy/paste youtube/imeem embed code and send each other videos and songs. It also allows them to upload images to photobucket and copy/paste the embed code to send picture comments.
The problem is that some people are also putting JavaScript code in there that does nasty stuff, such as opening alert boxes, changing the location of the page, and things like that. I am trying to find a good solution to this problem once and for all. How do other sites allow this kind of functionality?
Thanks for your feedback

First: htmlentities (or just htmlspecialchars) should be used for escaping strings that you embed into HTML. You shouldn't use it for escaping strings when you insert them into an SQL query. Use mysql_real_escape_string (for MySQL) or, better yet, prepared statements with bound parameters. When you escape strings manually, make sure that magic_quotes is turned off; otherwise your data gets escaped twice.
Second: you don't unescape strings when you pull them out again. For example, there is no mysql_real_unescape_string. And you shouldn't use stripslashes either: if you find that you need it, you probably have magic_quotes turned on. Turn it off instead, and fix the data in the database before proceeding.
Third: what you're doing with html_entity_decode completely nullifies the intended use of htmlentities. Right now, you have absolutely no protection against a malicious user injecting code into your site (you're vulnerable to cross-site scripting, aka XSS). Strings that you embed into an HTML context should be escaped with htmlspecialchars (or htmlentities). If you absolutely have to embed HTML into your page, you have to run it through a cleaning solution first. strip_tags does this in theory, but in practice it's very inadequate. The best solution I currently know of is HtmlPurifier. However, whatever you do, it is always a risk to let random users embed code into your site. If at all possible, try to design your application so that it isn't needed.
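The round-trip problem described in the third point can be checked directly. This sketch uses a made-up payload. Encoding with htmlentities and then decoding with html_entity_decode returns the original string, so nothing is escaped by the time it reaches the browser. Escaping once at output time keeps the markup inert.

```php
<?php
$com = '<script>location.href = "http://evil.example/";</script>';

// What the question does: encode before saving, decode before display.
$saved     = htmlentities($com);
$displayed = html_entity_decode($saved);
// The decode undoes the encode, so the script reaches the browser unchanged.
var_dump($displayed === $com);

// Escaping once, at output time, keeps the markup from being parsed.
$escapedForOutput = htmlspecialchars($com, ENT_QUOTES, 'UTF-8');
echo $escapedForOutput, PHP_EOL;
```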

I so hope you are scrubbing the data before you send it to the database. It sounds like you are a prime target for an SQL injection attack. I know this is not your question, but it is something that you need to be aware of.

Yes, this is a problem. A lot of sites solve it by only allowing their own custom markup in user fields.
But if you really want to allow HTML, you'll need to scrub out all "script" tags. I believe there are libraries available that do this. But that should be sufficient to prevent JS execution in user-entered code.

This is how Stack Overflow does it, I think, over at RefactorMyCode.

You may want to consider Zend Filter, it offers a lot more than strip_tags and you do not have to include the entire Zend Framework to use it.
