I have a user form with a textarea that allows users to submit html formatted data. The html itself is limited by PHP strip_tags, but of course that does no completion checking etc.
My basic problem is that if a user leaves a tag unclosed, such as the <a> tag, then everything after it, including page content that lies outside the user content display area, could end up malformed.
Checking for proper tag completion is one solution I will look at, but ideally I'd like to firewall the user htmlified content away from the rest of the site somehow.
Use HTML Purifier. Very thorough and easy-to-use standalone plugin. It makes sure all markup is valid XHTML and also prevents XSS attacks.
I would recommend saving two copies of the user's HTML input in your database. One copy would be the raw form that they submitted, which you can use when they edit their page later, and the second would be the version sanitized by HTML Purifier, which you display on output. Storing the sanitized version is much faster than running HTML Purifier on every page load.
The only way to achieve complete isolation would be to use an iframe.
The other solution would be to limit the HTML tags users could employ. Limiting users to paragraph and inline tags (strong, em, a, etc.) would ensure that you could wrap all of the content in a div tag and not have to worry about open tags.
Just use some function for completing unclosed tags.
This can help you:
http://concepts.waetech.com/unclosed_tags/
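If the linked function is not available, PHP's bundled DOM extension can do a similar job. The following is only a sketch under assumptions: `closeOpenTags` is a made-up helper name, it relies on the parser's own error recovery, and it balances tags without removing scripts or dangerous attributes. Encoding handling is left out, so non-ASCII input would need extra care.

```php
<?php
/**
 * Re-serialize an HTML fragment so that every opened tag is closed.
 * Hypothetical helper: it balances tags only and does NOT sanitize.
 */
function closeOpenTags(string $fragment): string
{
    if (trim($fragment) === '') {
        return '';
    }

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($fragment);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // loadHTML() wraps the fragment in <html><body>; emit only the
    // children of <body>, which the parser has already balanced.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

// The <a> tag is never closed in this sample input.
echo closeOpenTags('<p>See <a href="/page">this page</p>'), PHP_EOL;
```

Because it does nothing about `<script>` or event attributes, this would have to run alongside a real sanitizer, not instead of one.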
I got a textarea where the user can write an article. The article can contain text (bold and italic), links and youtube videos. How do I allow those certain html tags and still post secure xss-preventing code?
I would use HTMLPurifier, to ensure that you only keep HTML
That is valid
and only contains tags and attributes you've chosen to allow
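A rough sketch of what that can look like. The assumptions here are mine: HTML Purifier is installed through Composer (package `ezyang/htmlpurifier`), `purifyArticle` is a hypothetical wrapper name, and the allow-list is only an example. Embedding YouTube videos needs further configuration that is not shown.

```php
<?php
// Assumption: HTML Purifier was installed with Composer, so its
// classes can be autoloaded from the vendor directory.
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require_once $autoload;
}

/**
 * Return $dirty reduced to the allowed tags and attributes.
 * The allow-list below is an example, not a recommendation.
 */
function purifyArticle(string $dirty): string
{
    $config = HTMLPurifier_Config::createDefault();
    // Allowed tags, with permitted attributes in square brackets.
    $config->set('HTML.Allowed', 'p,b,strong,i,em,a[href]');

    $purifier = new HTMLPurifier($config);
    return $purifier->purify($dirty);
}
```

It would be called on the submitted text, for example `purifyArticle($_POST['article'] ?? '')`, before the result is displayed.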
I should add that PHP provides the strip_tags() function, but it's not that good (quoting) :
Because strip_tags() does not actually validate the HTML, partial or
broken tags can result in the removal of more text/data than expected.
This function does not modify any attributes on the tags that
you allow using allowable_tags, including the style and onmouseover
attributes that a mischievous user may abuse when posting text that
will be shown to other users.
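The second point in that quote is easy to reproduce with built-in PHP alone. A minimal sketch, with sample input made up for illustration:

```php
<?php
// Hypothetical user input: an allowed tag carrying an event handler,
// followed by a script element.
$input = '<b onmouseover="alert(1)">hi</b><script>steal()</script>';

// Allow only <b>; every other tag is removed.
$output = strip_tags($input, '<b>');

// The <script> tags are gone (their text content stays), but the
// onmouseover attribute on the allowed <b> tag passes through untouched.
echo $output, PHP_EOL;
// prints: <b onmouseover="alert(1)">hi</b>steal()
```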
If you are looking for real XSS protection, I suggest using HTMLPurifier. Doing it yourself is pretty hard, if not impossible, and is bound to leave mistakes (holes) in it.
All,
I am building a small site using PHP. In the site, I receive user-generated text content. I want to allow some safe HTML tags (e.g., formatting) as well as MathML. How do I go about compiling a white list for the strip_tags() function? Is there a well-accepted white list I can use?
The standard strip_tags function is not enough for security, since it doesn't validate attributes at all. Use a more complete library explicitly for the purpose of completely sanitizing HTML like HTML Purifier.
If your aim is to not allow javascript through, then your whitelist of tags is going to be pretty close to the empty set.
Remember that pretty much all tags can have event attributes that contain javascript code to be executed when the specified event occurs.
If you don't want to go down the HTMLPurifier kind of route, consider a different language, such as markdown (that this site uses) or some other wiki-like markup language; however, be sure to disable any use of passthrough HTML that may be allowed.
I run a niche social network site. I would like to disallow HTML content, such as embedded videos, in user-posted messages. What options are there in PHP to clean this up before I insert it into the DB?
There are three basic solutions:
Strip all HTML tags from the post. In PHP you can do this using the strip_tags() function.
Encode all the characters, so that if a user types <b>hello</b> it shows up as &lt;b&gt;hello&lt;/b&gt; in the HTML source, and as the literal text <b>hello</b> on the page itself. In PHP this is the htmlspecialchars() function. (Note: in this situation you would generally store the content in the database as-is, and use htmlspecialchars wherever you output the content.)
Use an HTML sanitizer such as HTML Purifier. This allows users to use certain HTML formatting such as bold/italic, but blocks malicious JavaScript and any other tags you wish (e.g. <object> in your case). You may or may not wish to do this before storing in the database, but you must always do it before output in either case.
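The first two options need nothing beyond built-in functions. A small sketch with a made-up post body (the third option needs the HTML Purifier library and is not shown here):

```php
<?php
// Hypothetical post body used for illustration.
$post = '<b>hello</b> <object data="movie.swf"></object>';

// Option 1: drop every tag and keep only the text.
$stripped = strip_tags($post);

// Option 2: keep everything, but encode it so the browser shows the
// markup as literal text instead of interpreting it.
$encoded = htmlspecialchars($post, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

echo $stripped, PHP_EOL;
echo $encoded, PHP_EOL;
```

With option 2 the raw post would typically go into the database unchanged, and the `htmlspecialchars()` call would sit in the template that prints it.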
You could use the strip_tags() function.
Let's assume we have a user form that generates HTML input, and the following could be an example of what gets POSTed to PHP.
<p>Hello</p>
<p><strong>World</strong></p>
Now, these will show up later, injected into the HTML output inside some DIV.
What I'd like to prevent is the following being entered in:
</div>
<p>Hello</p>
<p><strong>World</strong></p>
<div>
Or even something like:
</div>
<script> someScript(); </script>
<iframe src="http://www.example.com">......
<p>Hello</p>
<p><strong>World</strong></p>
<div>
How can I use PHP to determine that this input will not break the document, include bad iframes, or run scripts? The most important part is that I still want that information; I'm not throwing it out, but it needs to be included as harmless text of some sort.
Using alternative markup is not an option, it needs to be HTML.
What you need is HTML Purifier.
Not only does it output HTML according to standards, it also cleans the posted code of XSS vulnerabilities.
Edit 1: you should also check out the comparison; it's interesting :)
Edit 2: you can also check out htmlspecialchars and htmlentities,
but IMO HTML Purifier is far better and much more customizable when it comes to more complex things like yours.
If you want to keep the broken tags but render them harmless, I'd suggest saving it twice. Save the unmodified post data into one database column, and the Purified into another. Display the Purified version usually, and the dangerous version only when you need to.
Somewhere on the HTML Purifier support forums there's an example of how to change text to <span>text (dangerous.url.or.javascript)</span>. This may be the sort of thing you're looking for when you say you want to keep the information, not throw it out.
HTML Purifier is highly customisable, and the author, Ambush Commander, is very helpful both on the HTML Purifier forum and here at StackOverflow.
I am developing a php-based web application in which there is a text area within which user can type whatever he/she wants and the content later gets displayed on another page after being stored in a database. The scenario is that the user can type in HTML tags. But as far as functionality constraints are concerned, I wish to allow the user to execute some tags such as <a>, <div> etc., leaving the rest of the tags to be displayed as plaintext.
I had previously pasted this question:
Prevent HTML data from being posted into form textboxes
But the answers covered only strip_tags() and htmlspecialchars(), which respectively either strip the HTML completely and display the remaining plaintext, or display everything as plaintext, with no option to make an exception for any tag. Please help. Cheers.
You can look at HTML Purifier. This is a library specially designed for this.
It seems it can handle any form of xss attack. See also the comparison page.
As was said in the last post, strip_tags() is the answer. If you read the manual page for strip_tags(), you will see that you can tell it which tags to allow, which is exactly what you want.
Check the documentation for strip_tags and you'll see that the second (optional) argument is a string of allowable tags (an array is also accepted as of PHP 7.4).
Edit: Misunderstood it. Never mind D: More sleep is needed methinks. Looks like you should just run htmlspecialchars and convert the required tags back with a regex
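A sketch of that escape-then-restore idea, assuming PHP 7.4 or later. The helper name `escapeExceptTags` is invented for illustration. It restores only tags written without attributes, so something like `<a href="...">` stays escaped while its closing `</a>` would be restored on its own, and it does not check that tags are balanced. For anything beyond simple formatting tags, a real sanitizer is the safer route.

```php
<?php
/**
 * Escape all markup, then turn back the listed attribute-less tags.
 * Hypothetical helper: no balancing, no attribute support.
 *
 * @param string[] $allowed Tag names such as ['b', 'i'].
 */
function escapeExceptTags(string $html, array $allowed): string
{
    // Step 1: everything becomes harmless literal text.
    $escaped = htmlspecialchars($html, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    if ($allowed === []) {
        return $escaped;
    }

    // Step 2: restore "&lt;tag&gt;" and "&lt;/tag&gt;" for allowed names.
    $names = implode('|', array_map(
        fn (string $tag): string => preg_quote($tag, '/'),
        $allowed
    ));
    return preg_replace('/&lt;(\/?)(' . $names . ')&gt;/i', '<$1$2>', $escaped);
}

echo escapeExceptTags('<b>hi</b><script>x()</script>', ['b', 'i']), PHP_EOL;
// prints: <b>hi</b>&lt;script&gt;x()&lt;/script&gt;
```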
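Get PHP's translation table, add the tags you want to keep as entries that map to themselves, then call strtr():
$table = get_html_translation_table(HTML_SPECIALCHARS);
// Map each allowed tag to itself. strtr() tries the longest key first,
// so these entries win over the bare '<' and '>' entries.
// This only works for tags written without attributes.
$table['<b>'] = '<b>';
$table['</b>'] = '</b>';
$escaped = strtr($str, $table);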
I haven't tested but it should work.