How to prevent dangerous HTML input with PHP using Summernote? - php

I recently discovered Summernote, and it seems like a decent application, but I have run into a problem.
When you switch to the source-code view, you can add malicious HTML, for example:
<plaintext>
<script>
So how can I prevent that using PHP? I do want users to be able to use certain formatting tags, for example:
<h1>
<p>
The editor inserts these automatically.
I know I can go ahead and use str_replace() to check if the string has any of the malicious HTML in it, but I figured there must be an easier way to do it.

Normally, the problem here is that you're using text in the context of HTML without escaping all the reserved entities properly, which can lead to the injection of arbitrary HTML like you describe. htmlspecialchars() is the normal solution for this problem.
However, you want to support HTML, but don't really want to support all of it. Therefore, you need a different solution entirely. HTML Purifier is one solution that does what you want. It parses the data and only passes through white-listed tags. From their documentation:
require_once '/path/to/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
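To limit the output to the tags the editor is expected to produce, the whitelist can be set explicitly. This is a minimal sketch, assuming the HTML Purifier classes are already loaded (through HTMLPurifier.auto.php as above, or an autoloader). The function name purify_editor_html and the particular tag list are illustrative assumptions, not taken from the Summernote or HTML Purifier documentation:

```php
<?php
// Hypothetical helper; assumes the HTML Purifier classes are already loaded.
function purify_editor_html(string $dirty_html): string
{
    $config = HTMLPurifier_Config::createDefault();
    // Only these elements and attributes survive; everything else is removed.
    // The list is an example and should be adjusted to what the editor emits.
    $config->set('HTML.Allowed', 'h1,h2,h3,p,br,strong,em,ul,ol,li,a[href]');
    $purifier = new HTMLPurifier($config);
    return $purifier->purify($dirty_html);
}
```

Calling purify_editor_html() on the submitted editor content before storing or displaying it should then drop tags such as <script> and <plaintext> while keeping headings and paragraphs.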

Related

Get plain text from ckeditor

I want to know if it's possible to get plain text (text without html code) when I submit my form having a ckeditor textarea.
In fact, I want to have a simple textarea with the spellchecker option of ckeditor.
p.s. I am using vtiger 6.
It's possible, and given that you're using PHP a decent-ish solution would be HTML Purifier. Assuming you can install it via PEAR (which is the most straightforward way to do it), your example would look like:
require_once 'HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Allowed', ''); // Allow nothing
$purifier = new HTMLPurifier($config);
return $purifier->purify($_REQUEST['field_name']);
You're going to want to remove most of the editor buttons and other options from CKeditor as well to prevent your users from adding a lot of formatting that won't survive filtration.
However:
Using the full CKEditor stack for spellcheck is overkill! Why not just use a standalone, jQuery spellchecker like this one?
You can use the PHP function that strips HTML tags from a string, strip_tags:
$plaintext = strip_tags($_POST['mytexteditor']);
You can also allow certain tags:
$plaintext_with_ps = strip_tags($_POST['mytexteditor'], '<p>');
Don't attempt to use it as a security measure, however.
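The reason for that warning can be seen with a small experiment. This sketch uses only built-in functions, and the input string is made up for illustration:

```php
<?php
$input = '<p onclick="alert(1)">Hello</p><script>alert(2)</script>';

// Allow <p> only. strip_tags() keeps allowed tags verbatim, attributes included.
$result = strip_tags($input, '<p>');

echo $result, "\n";
// The onclick handler on the allowed <p> tag is still present, and the text
// inside the removed <script> element remains in the output as plain text.
```

Because attributes on allowed tags pass through untouched, an event handler like onclick survives the filtering.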

Download text-only webpage

The question title says it all. After a bit of Googling and several days of tinkering with code, I cannot figure out how to download the plain text of a webpage.
Using strip_tags() still leaves the JavaScript and CSS, and trying to clean it up with regex also causes issues.
Is there any (simple or complicated) way to download a webpage (say a Wikipedia article) in plain-text using PHP?
I downloaded the page using PHP's file_get_contents(), like this:
$homepage = file_get_contents('http://www.example.com/');
As I said, I tried using strip_tags() and so on, but I can't get the plain text.
I've tried using: http://millkencode.googlecode.com/svn/trunk/htmlxtractor/ContentExtractor.php to get the main content but it doesn't seem to work.
This is not nearly as easy as it seems. I'd recommend looking at something like PHP Simple HTML DOM Parser. Aside from JavaScript and CSS being hard to remove (and using regex on HTML is not appropriate), there could still be inline styling and similar leftovers.
This, of course, is relative to the complexity of the HTML. strip_tags could be sufficient in some cases.
Use this code:
require_once('simple_html_dom.php');
$content=file_get_html('http://en.wikipedia.org/wiki/FYI');
$title=$content->find("#firstHeading",0)->plaintext ;
$text=$content->find("#bodyContent",0)->plaintext;
echo $title.$text;
http://simplehtmldom.sourceforge.net
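If adding a third-party parser is not an option, PHP's bundled DOM extension can do something similar. This is a sketch of a different technique from the Simple HTML DOM code above, not a drop-in replacement for it. The helper name html_to_plain_text is made up, the DOM extension is assumed to be enabled, and the sample markup is invented:

```php
<?php
// Hypothetical helper: extract the text of an HTML string, skipping
// the contents of <script> and <style> elements.
function html_to_plain_text(string $html): string
{
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them,
    // since real-world markup is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the node list to an array first so nodes can be removed safely.
    $xpath = new DOMXPath($doc);
    foreach (iterator_to_array($xpath->query('//script | //style')) as $node) {
        $node->parentNode->removeChild($node);
    }

    $root = $doc->documentElement;
    $text = $root === null ? '' : $root->textContent;

    // Collapse runs of whitespace into single spaces.
    return trim((string) preg_replace('/\s+/u', ' ', $text));
}

$sample = '<html><head><style>p{color:red}</style></head>'
        . '<body><h1>Title</h1><script>var x = 1;</script><p>Body text.</p></body></html>';
echo html_to_plain_text($sample), "\n";
```

Note that loadHTML() guesses the character encoding from the document, so pages without a charset declaration may need extra handling, and adjacent block elements are not separated by whitespace in the result.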

How to Secure Data Submitted Through CKEditor

I am using CKEditor on my site to let users post comments. CKEditor has many buttons for composing a comment. Suppose a user makes a comment bold and italic, like this:
This is comment
CKEditor will then output the following HTML:
<i><strong>This is comment</strong></i>
Now, if I store this HTML in the MySQL database and output it on the web page as is, without wrapping it in htmlspecialchars(), the comment is shown bold and italic, which is what I want.
But if I wrap the comment in htmlspecialchars() and display it on the web page, it is shown as
<i><strong>This is comment</strong></i>
But I do not want it shown like this; I want to keep the user's formatting. If I do not wrap it in htmlspecialchars(), however, it is risky and can lead to XSS attacks and other security problems.
How can I achieve both goals?
(1) Keep the user's formatting
(2) Secure the HTML content
You need to draw up a whitelist of what elements and attributes you want to allow your users to include (eg allow <strong> but not <script>; allow <a href> but not <div onmouseover>), and then enforce it by parsing the input, removing all elements and attributes that don't fit your pattern, and serialising the results back into HTML.
This is a hard job that cannot be done with a few simple regexes or strip_tags (which is NOT an adequate solution for XSS even if it did fit your needs). You would be well advised to use an existing library to do it - HTML Purifier is one such for PHP.
I think you are looking for strip_tags. It removes all HTML and PHP tags from the string and keeps only the tags you allow, such as <strong> and <i>.
<?php
$str = "<i><strong>this is a comment</strong></i><script>here is script</script>";
echo $str = strip_tags($str,"<i><strong>");
?>
php.net documentation for strip_tags
The strip_tags function has an option to allow specific tags; see php.net for more details. You must strip unwanted or disallowed tags. If you don't, the page might be vulnerable to JavaScript injection too.
Use htmlspecialchars when storing and htmlspecialchars_decode when displaying. This will help you keep the formatting of user-formatted content.
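One thing worth checking before relying on this suggestion is that decoding at display time undoes the encoding completely. A small sketch with a made-up comment string:

```php
<?php
$comment = '<strong>Hi</strong><script>alert(1)</script>';

// Encode for storage, as suggested above.
$stored = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

// Decode again for display.
$shown = htmlspecialchars_decode($stored, ENT_QUOTES);

// The round trip returns the original markup, <script> element included.
var_dump($shown === $comment);
```

The formatting survives, but so does any injected script, so the displayed HTML would still need whitelist filtering.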
Two options spring to mind. First of all you can strip out all HTML and use a BB code parser to allow the user to post BB tags, rather than HTML - http://php.net/manual/en/book.bbcode.php
Secondly, you could strip out all HTML except a few tags. I don't know of any parser that does that personally, however I have seen it in action on sites before (Murphy's law I can't find any right now). You should be able to achieve this with a sophisticated enough RegEx replacement check.
Use this before printing it back on screen:
function html_escape($raw_input)
{
    return htmlspecialchars($raw_input, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}
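A short usage sketch of this helper (repeated here so the snippet runs on its own), with a made-up input. Note that it escapes all markup, so it fits values that should appear as text, such as form field values, rather than comments whose formatting must be kept:

```php
<?php
function html_escape($raw_input)
{
    return htmlspecialchars($raw_input, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}

// Hypothetical input that tries to break out of an attribute value.
$name = '" onmouseover="alert(1)';

// ENT_QUOTES turns the double quotes into &quot;, so the whole value
// stays inside the value="..." attribute.
$field = '<input type="text" name="name" value="' . html_escape($name) . '">';
echo $field, "\n";
```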

Sanitize Markdown from XSS

I use Markdown to give the users of my forum script a simple way to write posts.
I'm trying to sanitize every user input, but I have a problem with Markdown input.
I need to store the Markdown text in the database, not the converted HTML, because users are allowed to edit their posts.
Basically I need something like what StackOverflow does.
I read this article about the XSS vulnerability of Markdown, and the only solution I found is to run HTML_purifier on every output my script produces.
I think this could slow down my script: imagine outputting 20 posts and running HTML_purifier on each one...
So I was trying to find a way to protect against XSS by sanitizing the input instead of the output.
I can't run HTML_purifier on the input because my text is Markdown, not HTML. And if I convert it to HTML, I can't convert it back to Markdown.
I already remove (I hope) all HTML code with:
htmlspecialchars(strip_tags($text));
I've thought of another solution:
When a user tries to submit a new post:
Convert the input from Markdown to HTML, run HTML_purifier, and if it finds an XSS injection, simply return an error.
But I don't know how to do this, nor whether HTML_purifier allows it.
I've found lots of questions about the same problem here, but all the solutions were to store the input as HTML. I need to store it as Markdown.
Does anyone have any advice?
Run Markdown on the input
Run HTML Purifier on the HTML generated by Markdown. Configure it so it allows links, href attributes and so on (it should still strip javascript: commands)
// the nasty stuff :)
$content = "> hello <a name=\"n\" \n href=\"javascript:alert('xss')\">*you*</a>";
require '/path/to/markdown.php';
// at this point, the generated HTML is vulnerable to XSS
$content = Markdown($content);
require '/path/to/HTMLPurifier/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'UTF-8');
$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
$config->set('Cache.DefinitionImpl', null);
// put here every tag and attribute that you want to pass through
$config->set('HTML.Allowed', 'a[href|title],blockquote[cite]');
$purifier = new HTMLPurifier($config);
// here, the javascript command is stripped off
$content = $purifier->purify($content);
print $content;
Solved...
$text = "> hello <a name=\"n\"
> href=\"javascript:alert('xss')\">*you*</a>";
$text = strip_tags($text);
$text = Markdown($text);
echo $text;
It returns:
<blockquote>
<p>hello href="javascript:alert('xss')"><em>you</em></p>
</blockquote>
And not:
<blockquote>
<p>hello <a name="n" href="javascript:alert('xss')"><em>you</em></a></p>
</blockquote>
So it seems that strip_tags() does the job.
Combined with:
$text = preg_replace('/href=(\"|)javascript:/', "", $text);
The entire input should then be sanitized against XSS injections. Correct me if I'm wrong.
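A quick way to test that assumption is to feed the same two steps an input that contains no HTML at all. The sketch below uses a made-up Markdown link; whether the resulting link is dangerous depends on how the Markdown parser in use treats URLs, which is not verified here:

```php
<?php
// Markdown inline link syntax, no HTML tags involved.
$text = "[click me](javascript:alert('xss'))";
$original = $text;

$text = strip_tags($text);
$text = preg_replace('/href=(\"|)javascript:/', "", $text);

// Neither step changes the input: there are no tags to strip
// and no literal "href=" for the pattern to match.
var_dump($text === $original);
```

If the parser then turns this into a link with a javascript: URL in its href, the filter has been bypassed, which is why purifying the generated HTML is the more robust option.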
The HTML output of your Markdown depends only on the Markdown parser, so you can
convert your Markdown to HTML and then sanitize the HTML as described here:
Escape from XSS vulnerability maintaining Markdown syntax?
Or you can modify your Markdown parser to check every parameter that ends up in an HTML attribute for signs of XSS. Of course, you should escape HTML tags before parsing. I think this solution is much faster than the other, because for simple text you usually only need to check the URLs of images and links.
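The attribute check mentioned above could look roughly like this. It is a sketch only: the function name is_allowed_link_url and the list of accepted schemes are assumptions, and where to call it depends on the Markdown parser being modified:

```php
<?php
// Hypothetical check for URLs that a Markdown parser is about to place
// in an href or src attribute. Anything not explicitly allowed is rejected.
function is_allowed_link_url(string $url): bool
{
    $url = trim($url);

    // Absolute URLs: accept only a short list of schemes (case-insensitive).
    if (preg_match('~^(?:https?://|mailto:)~i', $url) === 1) {
        return true;
    }

    // Site-relative paths and fragment links carry no scheme at all.
    return $url !== '' && ($url[0] === '/' || $url[0] === '#');
}

var_dump(is_allowed_link_url('https://example.com/'));     // allowed scheme
var_dump(is_allowed_link_url("javascript:alert('xss')"));  // rejected
var_dump(is_allowed_link_url('/posts/42'));                // site-relative path
```

Rejecting everything outside a whitelist, rather than searching for "javascript:", also covers mixed-case or whitespace-obfuscated schemes. The accepted URL still has to be passed through htmlspecialchars() when it is written into the attribute.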

Remove XSS attacks while still allowing html?

OK, now I have a dilemma: I need to allow users to insert raw HTML but also block all JS, not just script tags but also JS in href attributes and so on. At the moment, all I know of is
htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
But this also converts valid tags into encoded characters. If I use strip_tags, it doesn't work either, as it removes the tags! (I know that you can allow tags, but if I allow any tags such as <a></a>, people can add malicious JS to them.)
Is there anything I can do to allow html tags but without the XSS injection? I have planned a function: xss() and have setup my site's template with that. It returns the escaped string. (I just need help with escaping :))
Thanks!
Related: Preventing XSS but still allowing some HTML in PHP
Example code from HTMLPurifier Docs:
require_once '/path/to/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
You can use that code as a reference in your xss(...) method.
