I am using CKEditor on my site to let users post comments. CKEditor has many buttons for formatting a comment. Suppose a user makes his comment bold and italic, like this:
This is comment
CKEditor will then output the following HTML:
<i><strong>This is comment</strong></i>
Now, if I store this HTML in the MySQL database and output it on the web page as is, without wrapping it in htmlspecialchars(), the comment is shown bold and italic, which is what I want.
On the other hand, if I wrap the comment in htmlspecialchars() and display it on the web page, it is shown as:
<i><strong>This is comment</strong></i>
But I don't want it shown like this; I want to keep the user's formatting. However, if I don't wrap it in htmlspecialchars(), it is risky: it can lead to XSS attacks and other security problems.
How can I achieve both goals?
(1) Keep the user's formatting.
(2) Keep the HTML content secure.
You need to draw up a whitelist of the elements and attributes you want to allow your users to include (e.g. allow <strong> but not <script>; allow <a href> but not <div onmouseover>), and then enforce it by parsing the input, removing all elements and attributes that don't fit your pattern, and serialising the result back into HTML.
This is a hard job that cannot be done with a few simple regexes or strip_tags (which is NOT an adequate defence against XSS even if it did fit your needs). You would be well advised to use an existing library for it; HTML Purifier is one such library for PHP.
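The parse / filter / serialise approach described above can be sketched with PHP's built-in DOM extension. The ALLOWED_TAGS and ALLOWED_ATTRS lists and the helper names are hypothetical choices for a comment box, and the sketch ignores many edge cases a maintained library such as HTML Purifier handles, so treat it as an illustration of the idea, not a drop-in filter.

```php
<?php
// Minimal whitelist sanitizer sketch built on PHP's bundled DOM extension.
// The tag/attribute lists are assumptions; this is not a substitute for HTML Purifier.

const ALLOWED_TAGS      = ['b', 'strong', 'i', 'em', 'p', 'br', 'a'];
const ALLOWED_ATTRS     = ['a' => ['href', 'title']];
const DROP_WITH_CONTENT = ['script', 'style'];

function clean_children(DOMNode $parent): void
{
    // Copy the child list first, because we modify it while iterating.
    foreach (iterator_to_array($parent->childNodes, false) as $child) {
        if ($child instanceof DOMText) {
            continue; // text is re-escaped when serialised
        }
        if (!($child instanceof DOMElement)) {
            $parent->removeChild($child); // comments, processing instructions, ...
            continue;
        }
        $tag = strtolower($child->tagName);
        if (in_array($tag, DROP_WITH_CONTENT, true)) {
            $parent->removeChild($child); // drop the element and everything inside it
            continue;
        }
        clean_children($child); // clean descendants first
        if (!in_array($tag, ALLOWED_TAGS, true)) {
            // Unknown tag: keep its (already cleaned) children, drop the tag itself.
            while ($child->firstChild !== null) {
                $parent->insertBefore($child->firstChild, $child);
            }
            $parent->removeChild($child);
            continue;
        }
        $allowedAttrs = ALLOWED_ATTRS[$tag] ?? [];
        foreach (iterator_to_array($child->attributes, false) as $attr) {
            $name = strtolower($attr->name);
            // href is only kept for http(s) and mailto: URLs.
            $badUrl = $name === 'href'
                && !preg_match('~^(https?://|mailto:)~i', trim($attr->value));
            if (!in_array($name, $allowedAttrs, true) || $badUrl) {
                $child->removeAttribute($attr->name);
            }
        }
    }
}

function sanitize_html(string $html): string
{
    $doc = new DOMDocument();
    // The meta tag tells libxml the fragment is UTF-8; the <div> is our wrapper.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div>' . $html . '</div>',
        LIBXML_NOERROR | LIBXML_NOWARNING
    );
    $root = $doc->getElementsByTagName('div')->item(0);
    clean_children($root);

    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo sanitize_html('<i><strong>This is comment</strong></i><script>alert(1)</script>'), "\n";
```

With this design the user's <i><strong> formatting survives, <script> and <style> disappear together with their content, other unknown tags are unwrapped rather than deleted, and event-handler attributes such as onmouseover are removed.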
I think you are looking for strip_tags. It removes all HTML and PHP tags from the string, except the ones you list as allowed, such as <strong> and <i>:
<?php
$str = "<i><strong>this is a comment</strong></i><script>here is script</script>";
echo strip_tags($str, "<i><strong>");
?>
php.net documentation for strip_tags
The strip_tags function has an option to allow or disallow tags; see php.net for more details. You must strip unwanted or disallowed tags; if you don't, the page may also be vulnerable to injected JavaScript.
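One detail worth checking before relying on the allowed-tags option: strip_tags() copies an allowed tag verbatim, attributes included. A minimal demonstration (the sample comment is illustrative):

```php
<?php
// strip_tags() removes disallowed tags, but it never touches the attributes of allowed ones.
$comment = '<i onmouseover="alert(1)">This is comment</i><script>alert(2)</script>';

$filtered = strip_tags($comment, '<i><strong>');

echo $filtered, "\n";
// <i onmouseover="alert(1)">This is comment</i>alert(2)
```

The <script> tags are gone (their text stays as plain text), but the event handler on the allowed <i> tag survives, which is why the whitelist approach also has to filter attributes.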
Use htmlspecialchars() when storing and htmlspecialchars_decode() when displaying. This will help you keep the formatting of user-formatted content.
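Since htmlspecialchars_decode() reverses htmlspecialchars(), it is worth checking what this round trip sends to the browser. A minimal check with an illustrative hostile comment:

```php
<?php
// What the store-encoded / display-decoded approach does to a hostile comment.
$input = '<i>hi</i><script>alert(1)</script>';

$stored    = htmlspecialchars($input, ENT_QUOTES, 'UTF-8'); // value written to the DB
$displayed = htmlspecialchars_decode($stored, ENT_QUOTES);   // value echoed to the page

var_dump($displayed === $input); // bool(true): the <script> tag comes back unchanged
```

The formatting is kept, but so is everything else: the page receives exactly what the user typed, just as if the raw input had been echoed.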
Two options spring to mind. First of all you can strip out all HTML and use a BB code parser to allow the user to post BB tags, rather than HTML - http://php.net/manual/en/book.bbcode.php
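The BB code idea can be sketched without the PECL extension linked above: escape all user HTML first, then turn a small whitelist of paired BB tags into markup. bbcode_to_html() and its two rules are hypothetical, chosen only to show the order of operations.

```php
<?php
// Minimal BB code sketch (a hypothetical helper, not the PECL bbcode extension):
// escape ALL user HTML first, then turn a small whitelist of paired BB tags into markup.
function bbcode_to_html(string $text): string
{
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    $rules = [
        '~\[b\](.*?)\[/b\]~is' => '<strong>$1</strong>',
        '~\[i\](.*?)\[/i\]~is' => '<em>$1</em>',
    ];
    // Only matched pairs are converted, so a stray "[b]" stays as literal text.
    return preg_replace(array_keys($rules), array_values($rules), $html);
}

echo bbcode_to_html('[i][b]This is comment[/b][/i] <script>alert(1)</script>'), "\n";
// <em><strong>This is comment</strong></em> &lt;script&gt;alert(1)&lt;/script&gt;
```

Because escaping happens first, the only tags that can reach the page are the fixed ones in the replacement strings. Crossed tags such as [b][i]x[/b][/i] still produce mis-nested (though script-free) HTML; a real BB code parser handles that.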
Secondly, you could strip out all HTML except a few tags. I don't personally know of any parser that does that, though I have seen it in action on sites before (by Murphy's law, I can't find any right now). You should be able to achieve this with a sufficiently sophisticated regex replacement.
Use this before printing it back on screen:
function html_escape($raw_input)
{
return htmlspecialchars($raw_input, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}
Related
I have an input box in which a user entered a string like:
"/> <img src=xxx onError=alert('test is here')
I used the strip_tags function before saving the value into the database. It removes the image tag, but the string "/> is saved in the database as is.
How can I deal with this?
To be honest, there is unfortunately no single go-to solution.
The strip_tags function works well on well-formed HTML, and your example is not valid HTML.
One of your options is to write custom code that "cleans" the input depending on its nature. For example, if the input should collect someone's age, strip anything that's not a digit. You can do the same for names, phone numbers, etc.
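The age example, as a minimal sketch (clean_digits() is a hypothetical helper name):

```php
<?php
// Sketch of field-specific cleaning: an age field keeps only digits.
function clean_digits(string $raw): string
{
    return preg_replace('/[^0-9]/', '', $raw);
}

echo clean_digits(' 42 years'), "\n";                         // 42
echo clean_digits('"/> <img src=xxx onError=alert(1)'), "\n"; // 1
```

The markup from the question cannot survive this filter. If the value must be a real number in a range, filter_var() with FILTER_VALIDATE_INT rejects bad input instead of silently trimming it down to whatever digits it happens to contain.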
Of course, we as developers can't foresee all the possible nonsense a user can enter (on purpose or not), and sometimes we end up with such data in the DB. That's why it's always a good idea to escape data before printing it into HTML. Most frameworks and template engines already do this for you. If you're not using one, you can use the htmlentities function: http://php.net/manual/en/function.htmlentities.php.
htmlentities() makes all HTML-reserved characters safe, so they won't break your page. For example:
htmlentities("/> <img src=xxx onError=alert('test is here')");
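would result in the following (with the default flags before PHP 8.1; since PHP 8.1 the default also encodes the single quotes, as &#039;):
/&gt; &lt;img src=xxx onError=alert('test is here')
And once rendered by the browser, that would look like the original text, displayed literally instead of being interpreted as markup:
/> <img src=xxx onError=alert('test is here')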
I use Markdown in my forum script to give my users a simple way to write posts.
I'm trying to sanitize all user input, but I have a problem with Markdown input.
I need to store the Markdown text in the database, not the converted HTML, because users are allowed to edit their posts.
Basically I need something like what Stack Overflow does.
I read this article about Markdown's XSS vulnerability, and the only solution I found is to run HTML Purifier on every output my script produces.
I think this could slow down my script: imagine outputting 20 posts and running HTML Purifier on each one...
So I was trying to find a way to protect against XSS by sanitizing the input instead of the output.
I can't run HTML Purifier on the input because my text is Markdown, not HTML. And if I convert it to HTML, I can't convert it back to Markdown.
I already remove (I hope) all HTML code with:
htmlspecialchars(strip_tags($text));
I've thought about another solution:
When a user tries to submit a new post:
Convert the input from Markdown to HTML, run HTML Purifier, and if it finds an XSS injection, simply return an error.
But I don't know how to do this, nor whether HTML Purifier allows it.
I've found a lot of questions about the same problem here, but all the solutions were to store the input as HTML. I need to store it as Markdown.
Does anyone have any advice?
Run Markdown on the input
Run HTML Purifier on the HTML generated by Markdown. Configure it to allow links, href attributes and so on (it should still strip javascript: URLs).
// the nasty stuff :)
$content = "> hello <a name=\"n\" \n href=\"javascript:alert('xss')\">*you*</a>";
require '/path/to/markdown.php';
// at this point, the generated HTML is vulnerable to XSS
$content = Markdown($content);
require '/path/to/HTMLPurifier/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'UTF-8');
$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
$config->set('Cache.DefinitionImpl', null);
// put here every tag and attribute that you want to pass through
$config->set('HTML.Allowed', 'a[href|title],blockquote[cite],p,em,strong');
$purifier = new HTMLPurifier($config);
// here, the javascript command is stripped off
$content = $purifier->purify($content);
print $content;
Solved...
$text = "> hello <a name=\"n\"
> href=\"javascript:alert('xss')\">*you*</a>";
$text = strip_tags($text);
$text = Markdown($text);
echo $text;
It return:
<blockquote>
<p>hello href="javascript:alert('xss')"><em>you</em></p>
</blockquote>
And not:
<blockquote>
<p>hello <a name="n" href="javascript:alert('xss')"><em>you</em></a></p>
</blockquote>
So it seems that strip_tags() does the job.
Combined with:
$text = preg_replace('/href=(\"|)javascript:/', "", $text);
the entire input should be sanitized against XSS injections. Correct me if I'm wrong.
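Since the answer asks to be corrected, here is a quick check of that regex against a few variants it does not anticipate. strip_js_href() just wraps the answer's preg_replace call; the sample payloads are illustrative, not exhaustive.

```php
<?php
// The "Combined with" filter from the answer, wrapped in a hypothetical helper.
function strip_js_href(string $text): string
{
    return preg_replace('/href=(\"|)javascript:/', '', $text);
}

$samples = [
    'href="javascript:alert(1)"',   // matched: the quote is optional in the pattern
    "href='javascript:alert(1)'",   // single quotes: not matched
    'HREF="javascript:alert(1)"',   // upper case: no /i flag, not matched
    '[click](javascript:alert(1))', // Markdown link: the href is only created later by the converter
];

foreach ($samples as $s) {
    $out = strip_js_href($s);
    echo $out === $s ? "UNCHANGED: $s" : "changed:   $s -> $out", "\n";
}
```

Only the first variant is touched, and the Markdown link syntax never contains href= at all before conversion. This is why the other answers check the generated HTML, or every generated URL, rather than the Markdown source.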
The HTML output of your Markdown depends only on the Markdown parser, so you can convert your Markdown to HTML and sanitize the HTML after that, as described here:
Escape from XSS vulnerability maintaining Markdown syntax?
Or you can modify your Markdown parser to check every parameter that goes into an HTML attribute for signs of XSS. Of course, you should escape HTML tags before parsing. I think this solution is much faster than the other, because for simple texts you usually only need to check the URLs of images and links.
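The per-attribute check can be sketched as a scheme whitelist that the Markdown parser would call whenever it emits a link or image URL. is_safe_url() and render_link() are hypothetical helpers, the whitelist (http, https, mailto) is an assumption, and where to hook them in depends on the Markdown library you use.

```php
<?php
// Sketch of the "check every URL that goes into an attribute" idea.
function is_safe_url(string $url): bool
{
    // Browsers ignore tabs/newlines inside URLs ("java\tscript:"), so reject
    // control characters and spaces outright.
    if (preg_match('/[\x00-\x20\x7F]/', $url)) {
        return false;
    }
    // Absolute URLs must use a whitelisted scheme.
    if (preg_match('~^(https?|mailto):~i', $url)) {
        return true;
    }
    // Anything else must be relative: no ":" before the first "/", "?" or "#".
    return !preg_match('~^[^/?#]*:~', $url);
}

function render_link(string $url, string $text): string
{
    $safeText = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    if (!is_safe_url($url)) {
        return $safeText; // drop the link, keep the visible text
    }
    // The URL is still HTML-escaped, because it ends up inside an attribute.
    return '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">' . $safeText . '</a>';
}

echo render_link('javascript:alert(1)', 'you'), "\n";
echo render_link('https://example.com/?a=1&b=2', 'example'), "\n";
```

The scheme test is a whitelist rather than a search for "javascript:", so case tricks and unknown schemes fail closed; escaping the attribute value covers entity-based tricks.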
For example, a snippet of 50 characters. The problem, of course, is closing any opened tags. What's a good way to do this? Or, to make things easier, what's a good way to strip all HTML content from the snippet completely?
You can strip out all HTML tags, etc. with the strip_tags() function, which is (realistically) probably the best way to go, as otherwise you'll most likely end up with more tags than actual content.
For example:
$first50Chars = substr(trim(strip_tags($longString)), 0, 50);
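The one-liner above counts bytes with substr(), so on UTF-8 text it can cut a multibyte character in half, and an entity such as &amp; can also be split. A slightly longer sketch that avoids both, assuming the mbstring extension is available (make_snippet() is a hypothetical name):

```php
<?php
// Sketch of a plain-text snippet: strip tags, decode entities, collapse whitespace,
// cut by characters (not bytes), then escape for output.
function make_snippet(string $html, int $length = 50): string
{
    $text = strip_tags($html);
    // Decode first so "&amp;" counts as one character and is never cut in half.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text = trim(preg_replace('/\s+/u', ' ', $text));
    $text = mb_substr($text, 0, $length, 'UTF-8');
    // Re-escape, since decoding may have produced "<" or "&" again.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo make_snippet('<p>Héllo <b>wörld</b></p>', 7), "\n"; // Héllo w
```

Since the snippet is plain text, there are no tags left to close, and the final escaping makes it safe to echo.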
If tags are generally allowed in the text (that is, if the text contains <b>, it must be displayed in bold, etc.), then strip_tags() looks like the easiest way to remove the tags from the snippet.
If tags are generally not allowed in the text (for example, "<b>" must simply be displayed as "<b>"), then you can use the htmlentities() function.
I have data coming in from an RSS feed. I want to be safe and use htmlentities, but if I do and there is HTML code in the feed, the page fills up with markup mixed into the content. I don't mind the formatting the feed offers and would be glad to use it, as long as I can display it safely. I'm after the content of the feed, but I also want it to be formatted decently (if there is a break tag, paragraph or div). Does anyone know a way?
Do you want to protect from XSS in the feed? If so, you'll need an HTML sanitizer to run on the HTML prior to displaying it:
HTMLSanitizer
HTMLPurifier
If you just want to escape whatever is there, just call htmlspecialchars() on it. But any HTML will appear as escaped text...
You can use the strip_tags function and specify the allowed tags:
echo strip_tags($content, '<p><a>');
This way any tag not specified in allowed tags will be removed.
You can transform the HTML into Markdown and back again using various libraries.
I googled a lot, since this kind of problem has been asked about many times before, but I didn't find anything that matches my needs.
I have HTML-formatted text from a form, like this:
Hey, I am just some kind of <strong>formatted</strong> text!
Now I want to strip all the HTML tags I don't allow. PHP's built-in strip_tags() function does that very well.
But I want to go a step further: I want to allow some tags only inside, or only outside, other tags. I also want to define my own XML tags.
Another example:
I am a custom xml tag: <book><strong>Hello!</strong></book>. Ok... <strong>Hi!</strong>
Now, I want the <strong/> inside of <book/> to be stripped, but the <strong>Hi!</strong> can stay the way it is.
So, I want to define some rules of what I allow or don't allow, and want to have any filter do the rest.
Is there an easy way to do that? Regexes aren't what I'm looking for, since they can't parse HTML properly.
Regards, Jan Oliver
I don't think there is such a thing; I don't think even HTML Purifier does that.
I suggest you parse the XHTML by hand using something like Simple HTML Dom.
Use the second argument of strip_tags, which specifies the allowable tags:
$text = strip_tags($text, '<book><myxml:tag>');
I don't think there's a way to strip certain tags only when they're not inside other tags without using regex.
Also, it's not that regexes are bad at parsing HTML; they're just slow compared to the other options. But that's not what you're doing here anyway: you're going through the string and removing things you don't want. For your complex requirement, I think your only option is to use regex.
To be completely honest I think you should decide which tags are allowable and which aren't. Whether or not they are inside of other tags shouldn't matter at all. It's markup, not a script.
The second argument lets you allow some tags:
string strip_tags ( string $str [, string $allowable_tags ] )
From php.net
I wrote my own Filter class based on the DOM classes of PHP. Look here: XHTMLFilter class
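For the "<strong> only outside <book>" rule, the DOM approach can be sketched with PHP's built-in DOMDocument and DOMXPath. This is not the XHTMLFilter class linked above, whose API isn't shown here. unwrap_nested() is a hypothetical helper, and it relies on libxml's HTML parser keeping unknown tags such as <book> as ordinary elements (its warnings are suppressed).

```php
<?php
// Sketch: unwrap every <$inner> that has an <$outer> ancestor, e.g. <strong> inside <book>.
function unwrap_nested(string $html, string $outer, string $inner): string
{
    $doc = new DOMDocument();
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div>' . $html . '</div>',
        LIBXML_NOERROR | LIBXML_NOWARNING
    );
    $root = $doc->getElementsByTagName('div')->item(0);
    $xpath = new DOMXPath($doc);

    // "//book//strong" selects every <strong> below a <book>, at any depth.
    foreach (iterator_to_array($xpath->query("//{$outer}//{$inner}"), false) as $node) {
        $parent = $node->parentNode;
        // Move the children up, then remove the now-empty tag.
        while ($node->firstChild !== null) {
            $parent->insertBefore($node->firstChild, $node);
        }
        $parent->removeChild($node);
    }

    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo unwrap_nested(
    'I am a custom xml tag: <book><strong>Hello!</strong></book>. Ok... <strong>Hi!</strong>',
    'book',
    'strong'
), "\n";
```

Each context rule becomes one XPath expression, which is the kind of ancestor check that strip_tags() and regexes cannot express reliably.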