As a general safety measure, is it safe to use htmlspecialchars() or strip_tags() on user POST or GET data in PHP?
For example, applying htmlspecialchars() to every POST and GET value sent with the request and saving the result to the database.
For display purposes you could just use htmlspecialchars() or htmlentities() to ward off the common XSS attacks.
Using strip_tags() on the data is not recommended (unless it is really necessary), because it discards any formatting the user provided.
I would do sanity-checks depending on what you're expecting to get.
A good read (as always) is the OWASP cheat sheet: https://www.owasp.org/index.php/PHP_Security_Cheat_Sheet#XSS_Cheat_Sheet
If you're expecting plain text, always use htmlspecialchars() when showing it in the web client. Some template engines, like Twig, already do that by default. For this case, I wouldn't do any checks when saving to the database, because you may need to encode it differently for another client later - and you expect it to be plain text, right?
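As a minimal sketch of the plain-text case: the value is stored as-is and escaped only at the point where it is printed into HTML. The helper name e() and the sample comment are made up for illustration.

```php
<?php
// Hypothetical helper name; escapes a plain-text value for an HTML context.
function e(string $text): string
{
    // ENT_QUOTES also encodes single quotes; ENT_SUBSTITUTE replaces invalid
    // UTF-8 sequences instead of returning an empty string.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Stands in for a value loaded unmodified from the database.
$comment = "<script>alert('xss')</script> & friends";

echo '<p>' . e($comment) . '</p>', "\n";
```

Because the stored value stays raw, the same row can later be encoded differently for a non-HTML client.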
If the user has an RTE and can make use of HTML, I'd use strip_tags() or a method like those used in other frameworks. An example is http://svn.openfoundry.org/wowsecmodules/trunk/filter/RemoveXSS.php. TYPO3 also has a pretty good one that you can view by downloading the package and looking into typo3/contrib/RemoveXSS/RemoveXSS.php
A workaround would be to use something like BB-Code or Markdown, handled as plain text and later compiled to HTML in your code, but this tends to confuse editors who aren't used to that kind of syntax.
What I do not recommend at all, although it is possible, is to let the browser do the job - see XSS Basic Understanding
EDIT:
The two libs I linked here for removing XSS from HTML data are both based on the same one, but have been forked into different projects, and the communities applied fixes and so on. The goal of this method is like yours, even though I do not support it, because it sounds like a one-size-fits-all solution:
Usage: Run *every* variable passed in through it.
* The goal of this function is to be a generic function that can be used to
* parse almost any input and render it XSS safe. ...
Why am I against running this method on every input variable? Because you do not think about what you really want to get. Maybe you just want plain text ... In this case, as I wrote earlier here, you don't need to do that, but just use htmlspecialchars() when showing it in an HTML context.
Related
Following on from a question I asked about escaping content when building a custom CMS, I wanted to find out how dangerous not escaping content from the db can be - assume the data has been filtered/validated prior to insertion in the db.
I know it's a best practice to escape output but I'm just not sure how easy or even possible it is for someone to 'inject' a value into page content that is to be displayed.
For example let's assume this content with HTML markup is displayed using a simple echo statement:
<p>hello</p>
Admittedly it won't win any awards as far as content writing goes ;)
My question is can someone alter that for evil purposes assuming filtered/validated prior to db insertion?
Always escape for the appropriate context; it doesn't matter if it's JSON or XML/HTML or CSV or SQL (although you should be using placeholders for SQL and a library for JSON), etc.
Why? Because it's consistent. And being consistent is also a form of being lazy: you don't need to ponder if the data is "safe for HTML" because it shouldn't matter. And being lazy (in a good way) is a valuable programming trait. (In this case it's also being lazy about avoiding having to fix "bugs" due to changes in the future.)
Don't omit escaping "because it will never contain data that needs to be escaped", because one day, through some combination of circumstances, that assumption will be wrong.
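To illustrate "escape for the appropriate context", here is a small sketch that sends one value into three contexts. The sample value and the /search URL are invented; SQL is left out because it should go through placeholders rather than escaping.

```php
<?php
// Sample value containing characters that are special in several contexts.
$name = 'Tom & "Jerry" <b>';

// HTML context: encode &, <, > and both quote characters.
$html = '<p>' . htmlspecialchars($name, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . '</p>';

// JSON context: let the library build the literal; the JSON_HEX_* flags
// additionally keep <, >, &, ' and " out of the encoded string.
$json = json_encode(['name' => $name], JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

// URL context: percent-encode a value used as a query parameter.
$url = '/search?q=' . rawurlencode($name);

echo $html, "\n", $json, "\n", $url, "\n";
```

The same input needs a different encoding in each place, which is why escaping once "for safety" before storage cannot cover all of them.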
If you do not escape your HTML output, one could simply insert scripts into the HTML code of your page - running in the browser of every client that visits your page. It is called Cross-site scripting (XSS).
For example:
<p>hello</p><script>alert('I could run any other Javascript code here!');</script>
In the place of the alert(), you can use basically anything: access cookies, manipulate the DOM, communicate with other servers, et cetera.
Well, this is a very easy way of inserting scripts, and strip_tags can protect against this one. But there are hundreds of more sophisticated tricks, that strip_tags simply won't protect against.
If you really want to store and output HTML, HTMLPurifier could be your solution:
Hackers have a huge arsenal of XSS vectors hidden within the depths of
the HTML specification. HTML Purifier is effective because it
decomposes the whole document into tokens and removing non-whitelisted
elements, checking the well-formedness and nesting of tags, and
validating all attributes according to their RFCs. HTML Purifier's
comprehensive algorithms are complemented by a breadth of knowledge,
ensuring that richly formatted documents pass through unstripped.
It could also be a problem in combination with other vulnerabilities, e.g. SQL injection. Then someone would be able to bypass the filtering/validation done before insertion into the db and display whatever they want.
If you are pulling the word hello from the database and displaying it, nothing will happen. If the content contains <script> tags, though, it is dangerous, because a user's cookies can be stolen and used to hijack their session.
I'm planning to use Markdown syntax in my web page. I will keep users input (raw, no escaping or whatever) in the database and then, as usual, print out and escape on-the-fly with htmlspecialchars().
This is how it could look:
echo markdown(htmlspecialchars($content));
By doing that I'm protected from XSS vulnerabilities and Markdown works. Or, at least, it kinda works.
The problem is, let's say, the > syntax (there are other cases too, I think).
In short, to quote you do something like this:
> This is my quote.
After escaping and parsing to Markdown I get this:
&gt; This is my quote.
Naturally, the Markdown parser does not recognize &gt; as the quote symbol, so it does not work! :(
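The effect can be reproduced without any Markdown library. A quick sketch, using the quote line from the question:

```php
<?php
$content = '> This is my quote.';

// Escaping first turns the leading ">" into an entity, so a Markdown parser
// no longer sees the blockquote marker at the start of the line.
$escaped = htmlspecialchars($content, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n";
var_dump($escaped[0] === '>'); // the first character is now "&"
```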
I came here to ask for solutions to this problem. One idea was to:
First parse the Markdown, then remove the “bad parts” with HTML Purifier.
What do you think about it? Would it actually work?
I'm sure that someone has been in the same situation and can help me too. :)
Yes, a certain website has that exact same situation. At the time I'm writing this, you have 1664 reputation on that website :)
On Stack Overflow, we do exactly what you describe (except that we don't render on the fly). The user-entered Markdown source is converted to plain HTML, and the result is then sanitized using a whitelist approach (JavaScript version, C# version part 1, part 2).
That's the same approach that HTML Purifier takes (having never used it, I can't speak for details though).
The approach you are using is not secure. Consider, for instance, this example: "[clickme](javascript:alert%28%22xss%22%29)". In general, don't escape the input to the Markdown processor. Instead, use Markdown properly in a safe mode, or apply HTML Purifier or another HTML sanitizer to the output of the Markdown processor.
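A rough sketch of why escaping the input does not help here, followed by one possible check on link targets. The function name is_allowed_link() and the scheme allowlist are assumptions for illustration; this does not replace sanitizing the Markdown output.

```php
<?php
$input = '[clickme](javascript:alert%28%22xss%22%29)';

// None of &, <, >, " or ' occur in the input, so escaping changes nothing
// and the javascript: link reaches the Markdown processor intact.
var_dump(htmlspecialchars($input, ENT_QUOTES, 'UTF-8') === $input);

// Hypothetical helper: accept a link target only if it has no scheme
// (a relative URL) or a scheme from a small allowlist.
function is_allowed_link(string $url): bool
{
    // Browsers ignore some whitespace and control characters inside a scheme,
    // so drop them before looking at it.
    $url = preg_replace('/[\x00-\x20]+/', '', $url);
    if (!preg_match('/^([a-z][a-z0-9+.\-]*):/i', $url, $m)) {
        return true; // no scheme: treat as relative
    }
    return in_array(strtolower($m[1]), ['http', 'https', 'mailto'], true);
}

var_dump(is_allowed_link('javascript:alert%28%22xss%22%29'));
var_dump(is_allowed_link('https://example.com/'));
```

A check like this would have to run on the URLs in the generated HTML, which is what a whitelist sanitizer does for you.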
I've written elsewhere about how to use Markdown securely. See the link for details about how to use it safely, but the short version is: it is important to use the latest version, to set safe_mode, and to set enable_attributes=False.
Hoping this isn't a duplicate, I couldn't find an original question on the topic. If you have an area for users to input data, how do you store and retrieve the data without them inserting javascript or html?
As an example, say a user is making a forum post. They decide to write an HTML list or a JavaScript function that runs when the post is viewed. How do you mitigate this when you receive their input on the server side? Specifically, a PHP server side.
Remove parts of their string data based on patterns?
Use an html tag around their entry like ?
Thanks
All you have to do, going for the bare minimum, is replace < with &lt;.
I use HTML Purifier to strip out the bits I don't want and leave in the bits I do. The default rules are pretty good, but it offers enormous flexibility if you need it.
You have to remove or translate the offending parts of their post. You can do it once as the post is coming in, and save the translated post in the database, or you can do it every time you display the post, and store the raw post in the database. Both approaches have their good and bad points.
As to how to strip the bad stuff, using simple matching to replace all < and > with &lt; and &gt; goes a long way -- but there's plenty more to do besides that.
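One case where replacing the angle brackets alone falls short is a value printed inside an HTML attribute. A small sketch, with an invented payload:

```php
<?php
// Attacker-controlled value with no angle brackets at all.
$value = '" onmouseover="alert(1)';

// Replacing only < and > leaves the value untouched...
$naive = str_replace(['<', '>'], ['&lt;', '&gt;'], $value);

// ...whereas htmlspecialchars() with ENT_QUOTES also encodes the quotes.
$safe = htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

echo '<input value="' . $naive . '">', "\n"; // the quote closes the attribute early
echo '<input value="' . $safe . '">', "\n";
```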
There are lots of tutorials out there on preventing code injection. Microsoft's, found here, is pretty comprehensive.
For HTML injection, depending on how thorough you want to be, you can usually just add a string parser that checks for < and > and removes them without exception.
In my site's administration area, I have been using mysqli_real_escape_string when retrieving form input that goes into the database. It works fine but I realize that it does not prevent script injections. I mean I can pass through scripts like:
<script>alert('hello');</script>
What do I use in addition to this to prevent a malicious admin from injecting some nasty stuff?
htmlentities()?
strip_tags()?
htmlspecialchars()?
What is the proper way to sanitize form input in back-end forms where html is not required for input data? I am confused?
htmlentities() and htmlspecialchars() are used when you're outputting data. Encoding and escaping are different.
If you don't want HTML, my recommendation would be to use strip_tags() to clean it of any HTML tags and use html* when you're outputting the content.
Also, you might consider switching to PDO with prepared statements. This is a preferred and more secure way of running your queries.
The term you are looking for is Cross Site Scripting or XSS for short. Searching for that should give you plenty of resources, such as this question right here on StackOverflow.
The proper answer is highly dependent on your application.
Many administration systems need a way for admins to manipulate HTML. But some HTML is more dangerous than others.
As JohnP said, strip_tags() can be handy, since the second parameter allows you to explicitly allow certain, harmless tags (like or ), while stripping out anything else (like or )
If you need more sophistication than that, you'll need to do a more careful analysis and come up with a solution tailored to your needs. (Hint: If that solution involves using regular expressions to match HTML tags, you probably want to take a step back)
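A short sketch of what the second parameter of strip_tags() does and does not do. The choice of <b> as the allowed tag and the sample input are mine:

```php
<?php
$input = '<b onmouseover="alert(1)">bold</b><script>alert(2)</script>';

// Allow only <b>; the <script> tags are dropped, but their text content
// and the attributes on the allowed <b> tag are kept.
$result = strip_tags($input, '<b>');

echo $result, "\n";
```

Since attributes on allowed tags survive, an allowlist of tags alone is not enough once any tag is permitted.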
You should use htmlentities().
If you're on PHP 4, or PHP 5.2 or earlier, the magic_quotes setting can escape input for you, though it was deprecated in PHP 5.3 and removed in PHP 5.4.
I run a website (sorta like a social network) that I wrote myself. I allow the members to send comments to each other. I take the comment and call this line before saving it in the db:
$com = htmlentities($com);
When I want to display it; I call this piece of code..
$com = html_entity_decode($com);
This works out well most of the time. It allows the users to copy/paste youtube/imeem embed code and send each other videos and songs. It also allows them to upload images to photobucket and copy/paste the embed code to send picture comments.
The problem I have is that some people are putting JavaScript code in there as well, which tends to do nasty stuff such as opening alert boxes, changing the location of the webpage and things like that. I am trying to find a good solution to this problem once and for all. How do other sites allow this kind of functionality?
Thanks for your feedback
First: htmlentities or just htmlspecialchars should be used for escaping strings that you embed into HTML. You shouldn't use it for escaping strings when you insert them into a SQL query - use mysql_real_escape_string (for MySQL) or, better yet, use prepared statements, which have bound parameters. Make sure that magic_quotes is turned off, or otherwise disabled, when you manually escape strings.
Second: You don't unescape strings when you pull them out again. E.g. there is no mysql_real_unescape_string. And you shouldn't use stripslashes either - if you find that you need to, then you probably have magic_quotes turned on - turn it off instead, and fix the data in the database before proceeding.
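A minimal sketch of the prepared-statement route, assuming the pdo_sqlite extension is available so that an in-memory database keeps it self-contained. The table and column names are hypothetical.

```php
<?php
// Assumes pdo_sqlite is installed; any PDO driver works the same way.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)');

$com = "Robert'); DROP TABLE comments;-- <b>hi</b>";

// The value is bound as a parameter, never concatenated into the SQL text.
$stmt = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
$stmt->execute([':body' => $com]);

// Stored unchanged; HTML escaping happens only when printing.
$stored = $pdo->query('SELECT body FROM comments')->fetchColumn();
echo htmlspecialchars($stored, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'), "\n";
```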
Third: What you're doing with html_entity_decode completely nullifies the intended use of htmlentities. Right now, you have absolutely no protection against a malicious user injecting code into your site (you're vulnerable to cross-site scripting, aka XSS). Strings that you embed into an HTML context should be escaped with htmlspecialchars (or htmlentities). If you absolutely have to embed HTML into your page, you have to run it through a cleaning solution first. strip_tags does this - in theory - but in practice it's very inadequate. The best solution I currently know of is HTML Purifier. However, whatever you do, it is always a risk to let random users embed code into your site. If at all possible, try to design your application such that it isn't needed.
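The round trip from the question can be checked directly. A small sketch using the two calls from the question, with the flags and charset spelled out:

```php
<?php
$com = "<script>alert('hello');</script>";

// What the question stores in the database.
$stored = htmlentities($com, ENT_QUOTES, 'UTF-8');

// What the question prints: decoding restores the original markup exactly.
$displayed = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');

var_dump($displayed === $com);
```

So the script tag that went in is the script tag that comes out.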
I so hope you are scrubbing the data before you send it to the database. It sounds like you are a prime target for a SQL injection attack. I know this is not your question, but it is something that you need to be aware of.
Yes, this is a problem. A lot of sites solve it by only allowing their own custom markup in user fields.
But if you really want to allow HTML, you'll need to scrub out all "script" tags. I believe there are libraries available that do this. But that should be sufficient to prevent JS execution in user-entered code.
This is how Stack Overflow does it, I think, over at RefactorMyCode.
You may want to consider Zend Filter, it offers a lot more than strip_tags and you do not have to include the entire Zend Framework to use it.