PHP validate HTML to prevent html/xss injections [duplicate] - php

This question already has answers here:
How to prevent XSS with HTML/PHP?
(9 answers)
Closed 2 years ago.
I need to implement logic where the user sends some content (it can be a plain string, but it can also be part of HTML markup). We store that content in a database and, at some later point, substitute it for a placeholder in the base email template and send the email.
There is a chance that the data sent by the user contains HTML/XSS injections. How can we efficiently validate the data before storing it in the database?

Against XSS you can generally use htmlspecialchars. However, since you intend to allow HTML, your validation will have to check for the presence of <script; if it is present in the input, treat the input as invalid. There is another way of providing JavaScript in HTML: inline JavaScript in event-handler attributes such as onclick, onmouseover and so on. I would advise that, if such an event handler is present between the < and > of a tag, you simply treat the input as invalid.
You also mentioned HTML injection, that is, injected HTML that causes undesirable behavior. Because you welcome HTML in the input, you can distinguish "bad" HTML from "good" HTML by:
checking the validity of the HTML you get
checking for any problems that the HTML might cause in your application
The first criterion is easy to check (read the link); the second depends on business logic. The HTML might ruin your design, for example, if the design has certain expectations, so you need to lay down what you expect in terms of HTML.
And also, since we are speaking about security, make sure you protect your database against SQL injection as well.
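The two checks described above (rejecting input that contains a script element or an inline event handler) can be sketched with PHP's DOM extension instead of string matching. This is only a minimal sketch: containsScripting is a hypothetical helper name, it assumes ext-dom is enabled, and it does not cover other vectors such as javascript: URLs, so it is not a complete XSS filter.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns true when the HTML fragment contains a
 * <script> element or any attribute whose name starts with "on"
 * (onclick, onmouseover, ...). Not a complete XSS filter.
 */
function containsScripting(string $html): bool
{
    $doc = new DOMDocument();
    // Collect parser warnings internally; user input is rarely clean HTML.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so that empty or text-only input still parses.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Check 1: any <script> element anywhere in the fragment.
    if ($xpath->query('//script')->length > 0) {
        return true;
    }

    // Check 2: any attribute named on* on any element.
    foreach ($xpath->query('//@*') as $attribute) {
        if (stripos($attribute->nodeName, 'on') === 0) {
            return true;
        }
    }

    return false;
}

var_dump(containsScripting('<p>hello</p>'));
var_dump(containsScripting('<a href="#" onclick="alert(1)">x</a>'));
```

Parsing the markup first means the check works on what the parser actually sees as elements and attributes, rather than on raw substrings.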

Related

XSS Clean for Gets and Posts [duplicate]

This question already has answers here:
How can I sanitize user input with PHP?
(16 answers)
Closed 8 years ago.
For overall safety, is it safe to use htmlspecialchars or strip_tags on user input from POST or GET in PHP?
For example, applying htmlspecialchars to every POST and GET value sent with the request and saving the result to the database.
For display purposes you could just use htmlspecialchars() or htmlentities() to ward off the common XSS attacks.
It is not advisable to strip_tags() the data (unless it is really necessary), because that may lose all formatting the user provided.
I would do sanity-checks depending on what you're expecting to get.
A good reading (like always) is the OWASP cheat-sheet: https://www.owasp.org/index.php/PHP_Security_Cheat_Sheet#XSS_Cheat_Sheet
If you're expecting plain text, always use htmlspecialchars() when showing it by the web-client. Some template-engines, like Twig, already do that by default. For this case, I wouldn't do any checks when saving to the database, because you may need to encode it differently for another client later - and you expect it to be plain-text, right?
If the user has an RTE and can make use of HTML, I'd use strip_tags() or a method like used in other frameworks. An example is http://svn.openfoundry.org/wowsecmodules/trunk/filter/RemoveXSS.php. TYPO3 also has a pretty good one that you can view by downloading the package and looking into typo3/contrib/RemoveXSS/RemoveXSS.php
A workaround would be to use something like BBCode or Markdown, handled as plain text and later compiled to HTML in your code, but this mostly confuses editors who aren't used to it.
What I do not recommend at all, although it is possible, is to let the browser do the job - see XSS Basic Understanding
EDIT:
The two libs I linked here for removing XSS from HTML data are both based on the same original, but they have been forked into different projects and the communities have applied their own fixes. The goal of this method is like yours, even though I do not support it, because it sounds like a one-size-fits-all solution:
Usage: Run *every* variable passed in through it.
* The goal of this function is to be a generic function that can be used to
* parse almost any input and render it XSS safe. ...
Why am I against running this method on every input variable? Because you do not think about what you really want to get. Maybe you just want plain text ... In that case, as I wrote earlier, you don't need to do that; just use htmlspecialchars() when showing it in an HTML context.
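For the plain-text case, the "store as-is, escape when showing it in an HTML context" approach can be sketched as follows. The helper name escapeHtml and the sample comment are assumptions made for illustration only.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: escape plain text for an HTML context at output time. */
function escapeHtml(string $text): string
{
    // ENT_QUOTES also encodes single quotes; ENT_SUBSTITUTE replaces
    // invalid UTF-8 sequences instead of returning an empty string.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// The value is stored unmodified (for example, in the database) ...
$storedComment = 'I <3 PHP & <script>alert(1)</script>';

// ... and escaped only at the point where it is written into HTML.
echo '<p>' . escapeHtml($storedComment) . '</p>', PHP_EOL;
```

Because the stored value is untouched, the same data can later be encoded differently for another client, which is the reason given above for not escaping before saving.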

How dangerous is it to output certain content without escaping it first

Following on from a question I asked about escaping content when building a custom CMS, I wanted to find out how dangerous not escaping content from the DB can be. Assume the data has been filtered/validated prior to insertion in the DB.
I know it's a best practice to escape output but I'm just not sure how easy or even possible it is for someone to 'inject' a value into page content that is to be displayed.
For example let's assume this content with HTML markup is displayed using a simple echo statement:
<p>hello</p>
Admittedly it won't win any awards as far as content writing goes ;)
My question is can someone alter that for evil purposes assuming filtered/validated prior to db insertion?
Always escape for the appropriate context; it doesn't matter if it's JSON or XML/HTML or CSV or SQL (although you should be using placeholders for SQL and a library for JSON), etc.
Why? Because it's consistent. And being consistent is also a form of being lazy: you don't need to ponder if the data is "safe for HTML" because it shouldn't matter. And being lazy (in a good way) is a valuable programming trait. (In this case it's also being lazy about avoiding having to fix "bugs" due to changes in the future.)
Don't omit escaping "because it will never contain data that needs to be escaped" .. because, one day, over a course of a number of situations, that assumption will be wrong.
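As a rough illustration of "escape for the appropriate context", the same value can be encoded three different ways depending on where it ends up. The sample value is made up; for SQL, as noted above, placeholders (prepared statements) are the right tool and are not shown here.

```php
<?php
declare(strict_types=1);

$value = 'Tom & "Jerry" </script><script>alert(1)</script>';

// HTML text or quoted attribute: encode the HTML metacharacters.
$forHtml = htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// JSON: let the library build the string. The HEX flags additionally keep
// <, >, &, ' and " out of the output, which matters inside a <script> block.
$forJson = json_encode(
    $value,
    JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
);

// URL query component: percent-encode.
$forUrl = rawurlencode($value);

echo $forHtml, PHP_EOL;
echo $forJson, PHP_EOL;
echo $forUrl, PHP_EOL;
```

Each encoding is reversible by the consumer of that context, so nothing is lost by always applying it.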
If you do not escape your HTML output, one could simply insert scripts into the HTML code of your page - running in the browser of every client that visits your page. It is called Cross-site scripting (XSS).
For example:
<p>hello</p><script>alert('I could run any other Javascript code here!');</script>
In the place of the alert(), you can use basically anything: access cookies, manipulate the DOM, communicate with other servers, et cetera.
Well, this is a very easy way of inserting scripts, and strip_tags can protect against this one. But there are hundreds of more sophisticated tricks, that strip_tags simply won't protect against.
If you really want to store and output HTML, HTMLPurifier could be your solution:
Hackers have a huge arsenal of XSS vectors hidden within the depths of
the HTML specification. HTML Purifier is effective because it
decomposes the whole document into tokens and removing non-whitelisted
elements, checking the well-formedness and nesting of tags, and
validating all attributes according to their RFCs. HTML Purifier's
comprehensive algorithms are complemented by a breadth of knowledge,
ensuring that richly formatted documents pass through unstripped.
It could also be linked with other vulnerabilities, e.g. SQL injection. Someone would then be able to bypass the filtering/validation prior to insertion into the DB and display whatever they want.
If you are pulling the word hello from the database and displaying it, nothing will happen. If the content contains <script> tags, though, it is dangerous, because a user's cookies can be stolen and used to hijack their session.

Is htmlspecialchars required if you are not outputting html?

I have a script that registers users based on their user input. This uses prepared statements plus whitelists to prevent sql injection. But I am struggling to understand the prevention of XSS.
From what I understand, you only need to prevent XSS if you are outputting HTML onto the page? What does this mean?
I'm guessing that with this register page it doesn't apply because I am not outputting HTML to the web page? Is that right?
If I was to prevent XSS, do I use htmlspecialchars?
Generally correct, if you are having any returned values show up on the page, or if you are inserting information into the database for later retrieval and display (like user profile information) you will want to use htmlspecialchars.
For me, when I do my user registration, if they fail to enter a correct value in an input field, I redisplay the page with the values they entered. In this case, I have it encoded with htmlspecialchars.
If at any point ever, you plan on redisplaying the information from the DB into a webpage (as mentioned with profiles and the like) you should use htmlspecialchars.
Better safe than sorry I always say - never trust user input
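Redisplaying a submitted value in a form field, as described above, might look like this. renderTextInput is a hypothetical helper, and the submitted value is a made-up attack string.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: renders a text input pre-filled with a submitted value. */
function renderTextInput(string $name, string $submitted): string
{
    $safeName = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
    $safeValue = htmlspecialchars($submitted, ENT_QUOTES, 'UTF-8');

    // The attribute values are always wrapped in double quotes.
    return '<input type="text" name="' . $safeName . '" value="' . $safeValue . '">';
}

// A value that tries to close the attribute and the tag.
echo renderTextInput('username', '"><script>alert(1)</script>'), PHP_EOL;
// prints: <input type="text" name="username" value="&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;">
```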
Basically, XSS happens when you take the user's input and display it unsanitized in your web page.
For example: A user inputs
<script>alert('hello you are hacked');</script>
in a text box, and after registration you show it in your web page like
Hello, $username
This suddenly gets turned into
Hello, <script>alert('hello you are hacked');</script>
This is one form of XSS.
An efficient way to prevent XSS is:
echo htmlspecialchars($varname, ENT_QUOTES, 'UTF-8');
From what I understand, you only need to prevent XSS if you are outputting HTML onto the page? What does this mean?
XSS is an attack carried out by the server outputting HTML (in practice, Javascript) to the client when it did not mean to do so (and obviously when that HTML was specially crafted and supplied by a hostile user).
I'm guessing that with this register page it doesn't apply because I am not outputting HTML to the web page? Is that right?
If you are not outputting anything that comes from user input you are safe.
If I was to prevent XSS, do I use htmlspecialchars?
Yes, for output into HTML text or quoted attribute values that is sufficient.

Examples of XSS that I can use to test my page input?

I have had issues with XSS. Specifically, I had an individual inject a JS alert showing that my input had vulnerabilities. I have done research on XSS and found examples, but for some reason I can't get them to work.
Can I get example(s) of XSS that I can throw into my input and when I output it back to the user see some sort of change like an alert to know it's vulnerable?
I'm using PHP and I am going to implement htmlspecialchars() but I first am trying to reproduce these vulnerabilities.
Thanks!
You can use this firefox addon:
XSS Me
XSS-Me is the Exploit-Me tool used to test for reflected Cross-Site Scripting (XSS). It does NOT currently test for stored XSS.
The tool works by submitting your HTML forms and substituting the form value with strings that are representative of an XSS attack. If the resulting HTML page sets a specific JavaScript value (document.vulnerable=true) then the tool marks the page as vulnerable to the given XSS string. The tool does not attempting to compromise the security of the given system. It looks for possible entry points for an attack against the system. There is no port scanning, packet sniffing, password hacking or firewall attacks done by the tool.
You can think of the work done by the tool as the same as the QA testers for the site manually entering all of these strings into the form fields.
For example:
<script>alert("XSS")</script>
"><b>Bold</b>
'><u>Underlined</u>
It is very good to use some of the automated tools, however you won't gain any insight or experience from those.
The point of an XSS attack is to execute JavaScript in a browser window that is not supplied by the site. So first you must look at the context in which the user-supplied data is printed on the website: it might be within a <script></script> code block, within a <style></style> block, used as an attribute of an element (<input type="text" value="USER DATA" />), or, for instance, in a <textarea>. Depending on that, you will see what syntax to use to escape the context (or use it). For instance, if you are within <script> tags, it might be sufficient to close the parenthesis of a function and end the line with a semicolon, so the final injection will look like ); alert(555);. If the supplied data is used as an HTML attribute, the injection might look like " onclick="alert(1)" which will cause JS execution if you click on the element (this area is rich to play with, especially with HTML5).
The point is that the context of the XSS is as important as any filtering/sanitizing functions that might be in place, and there are often small nuances that an automated tool will not catch. As you can see above, even without quotes and HTML tags, in a limited number of circumstances you might be able to bypass the filters and execute JS.
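To make the point about context concrete, here is a small sketch of the attribute case. It shows that htmlspecialchars() neutralizes the quoted-attribute payload from above, while a payload without quotes or tags passes through unchanged and is only harmless if the attribute value is quoted in the template. Both payload strings are illustrative assumptions.

```php
<?php
declare(strict_types=1);

$quoteBreaker = '" onclick="alert(1)';   // tries to close a quoted attribute
$noSpecials = 'x onmouseover=alert(1)';  // contains no quotes and no tags

$escapedQuoteBreaker = htmlspecialchars($quoteBreaker, ENT_QUOTES, 'UTF-8');
$escapedNoSpecials = htmlspecialchars($noSpecials, ENT_QUOTES, 'UTF-8');

// Quoted attribute: the payload can no longer terminate the value.
echo '<input type="text" value="' . $escapedQuoteBreaker . '">', PHP_EOL;

// Unquoted attribute: escaping changed nothing, so the browser would parse
// onmouseover as a second attribute. Always quote attribute values.
echo '<input type=text value=' . $escapedNoSpecials . '>', PHP_EOL;
```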
Browser encoding also needs to be considered; for instance, you might be able to bypass filters if the target browser uses UTF-7 encoding (and you encode your injection that way). Filter evasion is a whole other story; however, the current PHP functions are pretty bulletproof if used correctly.
Also here is a long enough list of XSS vectors
As a last thing, here is an actual example of a XSS string that was found on a site, and I guarantee you that not a single scanner would've found that (there were various filters and word blacklists, the page allowed to insert basic html formatting to customize your profile page):
<a href="Boom"><font color=a"onmouseover=alert(document.cookie);"> XSS-Try ME</span></font>
Ad-hoc testing is OK, however I also recommend trying a web application vulnerability scanning tool to ensure you haven't missed anything.
acunetix is pretty good and has a free trial of their application:
http://www.acunetix.com/websitesecurity/xss.htm
(Note I have no affiliation with this company, however I have used the product to test my own applications).

Stop users from entering PHP code in textarea?

I have a PHP page with textareas that users can change, and their values get saved and displayed on another PHP page - I'm afraid this could be vulnerable to XSS attacks (or whatever malicious hackers are using today)... I see http://htmlpurifier.org is a nice solution to avoid XSS attacks, and I read in an SO thread that PHP code entered into a textarea is ignored by browsers and not executed server-side. I just want to know if htmlpurifier will protect my site fully and if there's any chance that old browsers like IE6 aren't smart enough to ignore PHP code like that. It's my first time making a complex site so I'm tip-toeing around the topic of security... Thanks :)
On a side note, I've used stripslashes and nl2br to avoid formatting issues with apostrophes and line breaks, but is there anything else I should be using to avoid unexpected display issues?
Just use htmlspecialchars() on output and the special characters no longer have their literal meaning and won't be processed by the browser.
PHP code itself will be ignored by the browser. The browser will think it is just some large weird <?php ... ?> element.
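Since the question also mentions nl2br, here is a minimal sketch of combining it with htmlspecialchars() when displaying textarea content. renderUserText is a hypothetical helper. The order matters: escape first, then convert line breaks, otherwise the generated br tags would be escaped as well.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: display multi-line user text safely as HTML. */
function renderUserText(string $text): string
{
    // 1. Escape so that tags (including PHP tags) are shown as literal text.
    // 2. Only then turn newlines into <br /> tags.
    return nl2br(htmlspecialchars($text, ENT_QUOTES, 'UTF-8'));
}

$submitted = "First line\n<?php echo 'hi'; ?>";
echo renderUserText($submitted), PHP_EOL;
```

The PHP snippet inside the submitted text is never executed; it is simply a string that ends up displayed as text.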
To answer your questions specifically...
No, you don't have to worry about the browser executing PHP code that a user has inputted. That's typically only something you have to worry about when you do "includes" inside php scripts, and even then, as long as you structure them properly, you have nothing to worry about. This is because PHP is interpreted server-side (on your webserver) rather than client-side (in the browser). Also, this type of attack would be more in-line with RFI or Code Injection (if you'd like some terms to google), rather than XSS.
Stripslashes can be useful for certain things (potentially with regards to SQL attacks, etc.) but isn't the main defense for XSS attacks.
With HTMLPurifier running by itself, you will be fine against XSS attacks (providing you configure it correctly, etc.)
That said, it's always best to filter user input against a whitelist rather than trying to blacklist 'bad' characters/input. What type of data do you want users to be able to input? Just regular text? BBCode + text? Html?
PHP code is server code. Browsers don't include a PHP interpreter so they won't execute it.
