I am using CKEditor to let users post comments, and I wanted to use HTML Purifier to secure the HTML. But when I tried it, it removed all the formatting applied by CKEditor.
CKEditor generated the following HTML:
<div class="originalpost"><span style="color:#B22222;">
<em><u><strong><span style="font-size:250%;">
This is Pakistan</span></strong></u></em></span></div>
After purifying with HTML Purifier, the HTML became:
<div class=""originalpost""><span><em><u><strong>
<span>This is Pakistan</span></strong></u></em></span></div>
It removes all the inline CSS styles, and the class=""originalpost"" (with doubled quotes) makes no sense to me.
I purify the HTML with HTML Purifier like this:
require_once 'path/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);
$html = "xyzhtml";
$clean_html = $purifier->purify($html);
I want to keep the user's formatting. How can I configure HTML Purifier to keep it and leave the inline CSS unchanged?
It actually removes all the inline css styles
Inline styles are indeed dangerous - JavaScript can be injected into them using url(), IE's dodgy expression() and browser-specific behavioural extensions.
HTMLPurifier can parse inline styles and filter the dangerous properties and values. You can turn this on by including style in your whitelisted attributes.
$config->set('HTML.AllowedAttributes', '*.style, ...');
style is not included in the default attribute list because parsing styles is a lot of extra complexity (with accompanying chance of bugs) and most applications don't need it.
You can configure the properties that are permitted using %CSS.AllowedProperties if you wish.
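As a rough sketch of how these directives could fit together for the HTML in the question: the element list passed to HTML.Allowed and the two CSS properties are assumptions chosen for this example, and the require path is the same placeholder the question uses.

```php
<?php
require_once 'path/HTMLPurifier.auto.php'; // placeholder path from the question

$config = HTMLPurifier_Config::createDefault();
// Assumed whitelist for this example: the elements CKEditor produced above,
// with class and style explicitly allowed where they were used.
$config->set('HTML.Allowed', 'div[class|style],span[style],em,u,strong');
// Only these CSS properties are kept inside style attributes.
$config->set('CSS.AllowedProperties', array('color', 'font-size'));

$purifier = new HTMLPurifier($config);
$html = '<div class="originalpost"><span style="color:#B22222;">'
      . '<em><u><strong><span style="font-size:250%;">This is Pakistan'
      . '</span></strong></u></em></span></div>';
$clean_html = $purifier->purify($html);
echo $clean_html;
```

Any property that is not in the list should be dropped from the style attribute, so extend the array to match whatever formatting your editor toolbar exposes.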
I can't reproduce the " problem but certainly ensuring PHP's magic_quotes_gpc option is turned off is an all-round good thing...
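If the option cannot be switched off on a legacy server, one possible workaround is to undo the escaping before purifying. This is only a sketch: undo_magic_quotes is a hypothetical helper name and 'comment' is an assumed form field. Magic quotes no longer exist in PHP 5.4 and later, so there the helper returns its input unchanged (on PHP 7.4 the call itself triggers a deprecation notice).

```php
<?php
// Hypothetical helper: strip the backslashes added by magic_quotes_gpc.
function undo_magic_quotes($value)
{
    // get_magic_quotes_gpc() was removed in PHP 8, so check that it exists first.
    if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc()) {
        return stripslashes($value);
    }
    return $value;
}

// 'comment' is an assumed field name for the posted editor content.
$html = undo_magic_quotes(isset($_POST['comment']) ? $_POST['comment'] : '');
```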
I bet that you need to turn off Sybase quotes.
I'm using HTML Purifier in my project.
My HTML looks something like this (simple HTML elements plus a script and an iframe):
<p>content...<p>
<iframe></iframe>
<script>alert('abc');</script>
<p>content2</p>
With default config, it turned into this
<p>content...</p>
<p></p>
<p>Content2</p>
But if I set the config like this...
$config->set('HTML.Trusted', true);
$config->set('HTML.SafeIframe', true);
I got this
<p>content...</p>
<p>
<iframe></iframe>
<script type="text/javascript"><!--//--><![CDATA[//><!--
alert('abc');
//--><!]]></script>
</p>
<p>content2</p>
Is there any way to use HTML Purifier to completely remove the 'script' tag but preserve the 'iframe' tag? Or is there an alternative to HTML Purifier?
I've tried
$config->set('Filter.YouTube', true);
$config->set('URI.SafeIframeRegexp', '%^https://(www.youtube.com/embed/|player.vimeo.com/video/)%');
But the 'script' tag is still there.
[edited]
full example.
$config = HTMLPurifier_Config::createDefault();
$html = "<p>content...<p><iframe ...></iframe><script>alert('abc');</script><p>content2</p>";
$config->set(
'HTML.ForbiddenElements',
'script'
);
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($html);
Result
<p>content...</p><p></p><p>content2</p>
You were half on the right track. If you set HTML.SafeIframe to true and URI.SafeIframeRegexp to the URLs you want to accept (%^https://(www.youtube.com/embed/|player.vimeo.com/video/)% works fine), an input example of:
<p>content...<p>
<iframe src="https://www.youtube.com/embed/blep"></iframe>
<script>alert('abc');</script>
<p>content2</p>
...turns into...
<p>content...</p><p>
<iframe src="https://www.youtube.com/embed/blep"></iframe>
</p><p>content2</p>
Explanation: HTML.SafeIframe allows the <iframe> tag, but HTML Purifier still expects a whitelist for the URLs that the iframe can contain, since otherwise an <iframe> opens too much malicious potential. URI.SafeIframeRegexp supplies the whitelist (in the form of a regex that needs to be matched).
See if that works for you!
Code
This is the code that made the transformation I just mentioned:
$dirty = '<p>content...<p>
<iframe src="https://www.youtube.com/embed/blep"></iframe>
<script>alert(\'abc\');</script>
<p>content2</p>';
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.SafeIframe', true);
$config->set('URI.SafeIframeRegexp', '%^https://(www.youtube.com/embed/|player.vimeo.com/video/)%');
$purifier = new HTMLPurifier($config);
$clean = $purifier->purify($dirty);
Regarding HTML.Trusted
I implore you to never set HTML.Trusted to true if you don't fully trust each and every one of the people submitting the HTML.
Among other things, it allows forms in your input HTML to survive purification untouched, which (if you're purifying for a website, which I assume you are) makes phishing attacks trivial. It also lets style tags in your input survive unscathed. It will still strip some things (any HTML tag that HTML Purifier knows nothing about, which includes most HTML5 tags, as well as various JavaScript event-handler attributes), but there are enough attack vectors left that you might as well not be purifying if you use this directive. As Ambush Commander once put it:
You shouldn't be using %HTML.Trusted anyway; it really ought to be named %HTML.Unsafe or something.
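A safer direction than HTML.Trusted is usually to whitelist exactly what you need. The sketch below assumes a small set of elements picked purely for illustration and a placeholder include path; HTML.Trusted stays at its default of false.

```php
<?php
require_once 'path/HTMLPurifier.auto.php'; // placeholder path

$config = HTMLPurifier_Config::createDefault();
// Assumed whitelist for illustration: anything not listed here is stripped.
$config->set('HTML.Allowed', 'p,em,strong,a[href]');

$purifier = new HTMLPurifier($config);
$dirty = '<p>Hello <strong>world</strong></p>'
       . '<form action="/login"><input name="pw"></form>';
echo $purifier->purify($dirty);
```

With a whitelist like this, the form markup should not survive, because neither form nor input is in the allowed list.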
Consider using a full-fledged HTML parser like Masterminds html5-php. HTML code would then be parsed without undesired alterations like wrapping IFRAME in P, and you would be able to manipulate the resulting DOM tree the way you want, including removing some elements while keeping other ones.
For example, the following code could be used for removing SCRIPT elements from the document:
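// Copy the live node list first; removing nodes while iterating it skips elements.
foreach (iterator_to_array($dom->getElementsByTagName('script')) as $script) {
    $script->parentNode->removeChild($script);
}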
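For a self-contained illustration of the same idea, here is a sketch that swaps in PHP's built-in DOMDocument instead of html5-php, so that it runs without extra packages. The input string is the example from the question with an assumed iframe URL. Note that this only removes script elements; it does not sanitize anything else (event-handler attributes, javascript: URLs and so on).

```php
<?php
$html = '<p>content...</p>'
      . '<iframe src="https://www.youtube.com/embed/blep"></iframe>'
      . '<script>alert(\'abc\');</script>'
      . '<p>content2</p>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Copy the live node list to an array before removing nodes from the tree.
foreach (iterator_to_array($dom->getElementsByTagName('script')) as $script) {
    $script->parentNode->removeChild($script);
}

// loadHTML() wraps the fragment in <html><body>; serialize only the body's children.
$result = '';
$body = $dom->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $child) {
    $result .= $dom->saveHTML($child);
}
echo $result;
```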
And note that code like this:
<script type="text/javascript"><!--//--><![CDATA[//><!--
alert('abc');
//--><!]]></script>
is obsolete. The modern HTML5 equivalent is:
<script>alert('abc');</script>
exactly as in your source code before being processed by HTML Purifier.
I am using htmlpurifier and have a few questions.
1- My config file contains
$config->set('HTML.Trusted' ,true);
$config->set('CSS.Trusted', true);
But a quick Google search lands me on pages that recommend against setting *.Trusted to "true".
I don't understand why we should not set *.Trusted to true. Can you please explain? If I remove it, I no longer get inline CSS, and even CSS.AllowTricky does not help.
2- I found that HTML5 and CSS3 selectors are not allowed.
For example, the code in htmlpurifier/library/HTMLPurifier/Filter/ExtractStyleBlocks.php
// - No Unicode support
// - No escapes support
// - No string support (by proxy no attrib support)
// - element_name is matched against allowed
// elements (some people might find this
// annoying...)
// - Pseudo-elements one of :first-child, :link,
// :visited, :active, :hover, :focus
// handle ruleset
$selectors = array_map('trim', explode(',', $selector));
$new_selectors = array();
foreach ($selectors as $sel) {
//some code to filter css selectors
}
does not contain any code that would allow selectors like '[class*="grid-"]'. Hence all such CSS is removed after purification. Is there some way to allow all CSS3?
3- Is there some way to allow all HTML5 tags? For example, if we have HTML like
<section class="mainhead">
<div class="subhead"> </div> </section>
then the purifier removes the section element, so CSS such as
.mainhead .subhead { /* some css */ }
won't work.
Setting HTML.Trusted to true enables unsafe elements, such as script and forms. If you, for example, want to allow forms, but don't want to allow scripts, just add them to HTML.ForbiddenElements:
$config->set('HTML.Trusted', true);
$config->set('HTML.ForbiddenElements', ['script']);
CSS3 selectors are not supported. There is a pending pull request, but it has been inactive for more than a year (as of Aug 2019), and I don't think this will change anytime soon.
As for HTML5 support, you can use an extension package https://github.com/xemlock/htmlpurifier-html5 (which I'm the author of), which adds spec-compliant definitions of HTML5 elements. Its usage is almost the same as bare HTML Purifier: you just have to replace HTMLPurifier_Config with HTMLPurifier_HTML5Config.
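A minimal sketch of that swap, assuming both HTML Purifier and the extension package were installed with Composer (the autoload path is an assumption about your project layout):

```php
<?php
require_once 'vendor/autoload.php'; // assumes a Composer-based install

// Same workflow as before, only the config class changes.
$config = HTMLPurifier_HTML5Config::createDefault();
$purifier = new HTMLPurifier($config);

$dirty = '<section class="mainhead"><div class="subhead"> </div></section>';
echo $purifier->purify($dirty);
```

With the HTML5 definitions loaded, the section element from the question should be kept rather than stripped.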
Does anyone know how to enable GetHtml in xinha editor?
I tried to implement this:
GetHtml is a replacement for the getHTML() function in htmlarea.js. It
offers several improvements over the original, including:
Produces valid XHTML code
Formats code in HTML view in indented, readable format.
Much faster than HTMLArea.getHTML()
Eliminates many hacks to accommodate browser quirks
Returns correct code for Flash objects and scripts
Preserves formatting inside script and pre tags
You can enable this by setting xinha_config.getHtmlMethod to "TransformInnerHTML"
But I didn't fully understand it. This is my attempt:
var xinha_plugins =
[
'GetHtml','ExtendedFileManager','Linker','InsertSmiley'
];
I just added the plug-in to the plug-in list (as in the code above), but it didn't work! I need to insert Flash in the Xinha editor.
I am using htmlpurifier to clean up user content. I am trying to remove inline style attributes like
<div style="float:left">some text</div>
I want to remove the whole style attribute.
How to do it using htmlpurifier?
You can tweak the CSS.AllowedProperties configuration by passing it an array of the CSS properties that should not be removed (a whitelist approach).
However, the following should remove all CSS properties:
$config->set('CSS.AllowedProperties', array());
See this online demo of purifying your input html
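Another option, sketched here under the assumption that you want the style attribute gone from every element, is to forbid the attribute itself with HTML.ForbiddenAttributes. The include path is a placeholder.

```php
<?php
require_once 'path/HTMLPurifier.auto.php'; // placeholder path

$config = HTMLPurifier_Config::createDefault();
// Forbid the style attribute on all elements.
$config->set('HTML.ForbiddenAttributes', 'style');

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<div style="float:left">some text</div>');
```

This states the intent more directly than an empty property whitelist, and it leaves the CSS settings free for other uses.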
I need some help writing an awesome class to take a style sheet, detect browser-specific CSS3 rules, and add support for all compatible browsers. This way, we can just write our style sheets for one browser and then process the CSS files when we are ready for production.
Here are my thoughts on the class so far:
class CssRewriter {
    public function reformCss($file) {
        // Get the CSS as a string
        $cssString = file_get_contents($file);
        // Use regex to capture all styles delimited by {...}
        // Use regex to determine if any of the captured styles are browser
        // specific (starts with -moz, -webkit, etc)
        // Determine which CSS3 rules are not present and add them to the style
        // (so if you have -moz-linear-gradient, automatically add the webkit
        // version)
    }
}
Yikes. CSS parsers are not as easy as you imagine, man. Depending on regular expressions is just asking for one typo to be totally misinterpreted.
Not the answer you were looking for, but quite possibly a better one: have you considered using Sass and mixins? You're not the first to hit the issue of the repetitive nature of CSS, so someone else has already faced the challenge of a CSS pre-processor for you.
Your best bet would be to modify an existing CSS parser like CSS Tidy and add additional logic to output backwards-compatible CSS.