Get plain text from ckeditor - php

I want to know if it's possible to get plain text (text without html code) when I submit my form having a ckeditor textarea.
In fact, I want to have a simple textarea with the spellchecker option of ckeditor.
p.s. I am using vtiger 6.

It's possible, and given that you're using PHP a decent-ish solution would be HTML Purifier. Assuming you can install it via PEAR (which is the most straightforward way to do it), your example would look like:
require_once 'HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Allowed', ''); // Allow nothing
$purifier = new HTMLPurifier($config);
return $purifier->purify($_REQUEST['field_name']);
You're going to want to remove most of the editor buttons and other options from CKeditor as well to prevent your users from adding a lot of formatting that won't survive filtration.
However:
Using the full CKEditor stack for spellcheck is overkill! Why not just use a standalone, jQuery spellchecker like this one?

You can use the PHP function that strips HTML tags from a string, strip_tags:
$plaintext = strip_tags($_POST['mytexteditor']);
You can also allow certain tags:
$plaintext_with_ps = strip_tags($_POST['mytexteditor'], '<p>');
Don't attempt to use it as a security measure, however.
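As a rough sketch of how the strip_tags approach could be rounded out for editor content: CKEditor usually submits entities such as &nbsp; and &amp;, which strip_tags leaves untouched, so decoding them afterwards gives cleaner plain text. The function name editor_html_to_text is made up for this illustration.

```php
<?php
// Hypothetical helper (not part of CKEditor or vtiger): turn submitted
// editor HTML into plain text using only core PHP functions.
function editor_html_to_text(string $html): string
{
    // Remove all tags; the text between them is kept.
    $text = strip_tags($html);

    // Decode entities such as &amp; and &nbsp; into real characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of spaces, tabs and non-breaking spaces into one space.
    $text = preg_replace('/[ \t\x{00A0}]+/u', ' ', $text);

    return trim($text);
}

echo editor_html_to_text('<p>Hello&nbsp;<b>world</b> &amp; co</p>');
```

The result is plain text, not safe HTML, so it still needs htmlspecialchars() when it is printed back into a page.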

Related

How to properly escape HTML editor content correctly?

So I am using the TinyMCE editor and have handled getting the content into the text area by using htmlspecialchars(), which works fine, but I'm a little confused about the other side of using a WYSIWYG editor: the content output part.
I am using HTML Purifier to output the content, and so far I've just been doing, for example:
$purifierConfig = HTMLPurifier_Config::createDefault();
$purifierConfig->set('HTML.Allowed', 'p');
$Purifier = new HTMLPurifier($purifierConfig);
$input = $Purifier->purify($input);
I've only tested with the p tags, but does this mean I am going to have to go through everything TinyMCE uses and add it in as what is allowed? Or is there a better way of tackling this problem with safe output of a WYSIWYG editor?
Yes, you need to set all allowed tags you want, separated by a comma. You can also specify what attributes are allowed by enclosing them with brackets:
$purifierConfig = HTMLPurifier_Config::createDefault();
$purifierConfig->set('HTML.Allowed', 'p,a[href],b,i,strong,em');
$Purifier = new HTMLPurifier($purifierConfig);
$input = $Purifier->purify($input);
I guess for a better understanding, the printDefinition can help.

HTML visual editing like WhatsApp in PHP

A recent version of WhatsApp introduced a little bit of message styling. Suppose we want to write something like this:
Input: This is a ~statement~ which has styling in it
Output: This is a statement which has styling in it
Even Stack Overflow has this kind of minimal styling, which looks great. We want to implement this on our platform, where teachers giving remarks to students can use ol, ul, bold, and italic. We also want to make sure they are not allowed to use traditional HTML tags, because whenever a tag changes we would have to make changes too. Instead, we like the approach where you add a special character around a word and render it however you want in the output.
I don't know the specific term for this type of editing, so please excuse that.
Language: since our platform already runs on PHP, we would like to implement this in PHP.
Thought process: we thought it might be possible with regex, but we don't know how to implement ol and ul that way, and we are not sure it is the correct method.
Why not allow traditional HTML tags:
Not all of the teachers know traditional HTML tags.
We want to keep our application secure.
Take a look at this GitHub library
Here are some examples:
// traditional markdown and parse full text
$parser = new \cebe\markdown\Markdown();
$parser->parse($markdown);
// use github markdown
$parser = new \cebe\markdown\GithubMarkdown();
$parser->parse($markdown);
// use markdown extra
$parser = new \cebe\markdown\MarkdownExtra();
$parser->parse($markdown);
// parse only inline elements (useful for one-line descriptions)
$parser = new \cebe\markdown\GithubMarkdown();
$parser->parseParagraph($markdown);
You can use regular expressions like this:
/~([\w]*)~/
With the preg_replace() function you can replace the content between the ~ symbols with whatever you need. For example:
https://regex101.com/r/vD8wI4/2
Note the substitution tab, where I replace ~text~ with <pre>text</pre>.
The same technique applies to bold, italic, etc.:
Bold:
/\*([\w]*)\*/
Italic:
/_([\w]*)_/
Etc.
Good luck.
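A minimal sketch of how the regex idea above might look in PHP, with some assumptions of mine: the function name render_light_markup is made up, ~text~ is mapped to <del> rather than <pre>, and the patterns use negated character classes instead of [\w]* so that a marked span can contain spaces. The input is escaped with htmlspecialchars() first, so any raw HTML a teacher types is shown as text instead of being rendered.

```php
<?php
// Hypothetical helper: convert ~strike~, *bold* and _italic_ markers to HTML.
function render_light_markup(string $text): string
{
    // Escape first so user-typed tags can never reach the browser as HTML.
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Each marker pair wraps a run of characters that excludes the marker
    // itself and newlines, so spans cannot cross lines.
    $rules = [
        '/~([^~\n]+)~/'   => '<del>$1</del>',
        '/\*([^*\n]+)\*/' => '<b>$1</b>',
        '/_([^_\n]+)_/'   => '<i>$1</i>',
    ];

    return preg_replace(array_keys($rules), array_values($rules), $html);
}

echo render_light_markup('This is a ~statement~ with *bold* and <b>');
// This is a <del>statement</del> with <b>bold</b> and &lt;b&gt;
```

This does not cover ol and ul, and the underscore rule will also fire inside things like snake_case_names, which is part of why a real Markdown parser is usually the safer choice once lists are needed.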

How to prevent dangerous HTML input with PHP using Summernote?

I recently discovered Summernote and it seems it is a decent application, although I have stumbled upon a problem.
When you switch to the source code view, you are able to add malicious HTML code, for example:
<plaintext>
<script>
So how can I prevent that using PHP? I do want users to be able to use certain style tags, for example:
<h1>
<p>
Which the editor uses automatically.
I know I can go ahead and use str_replace() to check if the string has any of the malicious HTML in it, but I figured there must be an easier way to do it.
Normally, the problem here is that you're using text in the context of HTML without escaping all the reserved entities properly, which can lead to the injection of arbitrary HTML like you describe. htmlspecialchars() is the normal solution for this problem.
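For the plain-text case, a small sketch of what that escaping looks like (the wrapper name escape_html is just for illustration):

```php
<?php
// Hypothetical wrapper around htmlspecialchars() with explicit flags.
function escape_html(string $value): string
{
    // ENT_QUOTES escapes both quote styles; ENT_SUBSTITUTE replaces
    // invalid UTF-8 sequences instead of returning an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo escape_html('<script>alert("x")</script>');
// &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```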
However, you want to support HTML, but don't really want to support all of it. Therefore, you need a different solution entirely. HTML Purifier is one solution that does what you want. It parses the data and only passes through white-listed tags. From their documentation:
require_once '/path/to/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);

How to set the value of hyperlink in a PHP Variable? [duplicate]

I am looking at getting the plain text from html. Which one should I choose, php strip_tags or simplehtmldom plaintext extraction?
One pro for simplehtmldom is support of invalid html, is that sufficient in itself?
strip_tags is sufficient for that.
Extracting text from HTML is tricky, so the best option is to use a library like Html2Text. It was built specifically for this purpose.
https://github.com/mtibben/html2text
Install using composer:
composer require html2text/html2text
Basic usage:
$html = new \Html2Text\Html2Text('Hello, "<b>world</b>"');
echo $html->getText(); // Hello, "WORLD"
You should probably use simplehtmldom for the reason you mentioned, and because strip_tags may leave you with non-text content such as JavaScript or CSS contained within script/style blocks.
You would also be able to filter out text from elements that aren't displayed (inline style=display:none).
That said, if the HTML is simple enough, strip_tags may be faster and will accomplish the same task.
If you just want a plain text rendering of a page then strip_tags is faster and simpler. If you want to do any manipulation of the text during that process, however, simplehtmldom is going to serve you better in the long run.
You may also want to remove slashes with stripslashes().
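To illustrate the script/style point above: strip_tags removes the tags but keeps whatever was between them, so the contents of script and style blocks end up in the text. A rough workaround, assuming reasonably well-formed input, is to drop those blocks with a regex first. The function name is made up, and a regex like this is not a substitute for a real parser on messy HTML.

```php
<?php
// Hypothetical helper: strip tags, but drop <script>/<style> contents first.
function strip_tags_and_scripts(string $html): string
{
    // Remove whole script and style elements, including their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', ' ', $html);

    // Remove the remaining tags, keeping the text between them.
    $text = strip_tags($html);

    // Collapse whitespace left behind by the removed markup.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$html = '<p>Hi</p><script>alert(1)</script><style>p{color:red}</style><p>there</p>';
echo strip_tags($html), "\n";              // Hialert(1)p{color:red}there
echo strip_tags_and_scripts($html), "\n";  // Hi there
```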

Sanitize Markdown from XSS

I use Markdown to give my users a simple way to write posts in my forum script.
I'm trying to sanitize every user input, but I have a problem with Markdown input.
I need to store the Markdown text in the database, not the converted HTML, because users are allowed to edit their posts.
Basically I need something like what StackOverflow does.
I read this article about the XSS vulnerability of Markdown, and the only solution I found is to use HTML_purifier before every output my script produces.
I think this could slow down my script; imagine outputting 20 posts and running HTML_purifier on each one...
So I was trying to find a way to guard against XSS by sanitizing the input instead of the output.
I can't run HTML_purifier on the input because my text is Markdown, not HTML. And if I convert it to HTML, I can't convert it back into Markdown.
I already remove (I hope) all HTML code with:
htmlspecialchars(strip_tags($text));
I've thought of another solution:
When a user tries to submit a new post:
Convert the input from Markdown to HTML, run HTML_purifier, and if it finds an XSS injection, simply return an error.
But I don't know how to do this, nor whether HTML_purifier allows it.
I've found lots of questions about the same problem here, but all the solutions were to store the input as HTML. I need to store it as Markdown.
Does anyone have any advice?
Run Markdown on the input
Run HTML Purifier on the HTML generated by Markdown. Configure it so it allows links, href attributes and so on (it should still strip javascript: commands)
// the nasty stuff :)
$content = "> hello <a name=\"n\" \n href=\"javascript:alert('xss')\">*you*</a>";
require '/path/to/markdown.php';
// at this point, the generated HTML is vulnerable to XSS
$content = Markdown($content);
require '/path/to/HTMLPurifier/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'UTF-8');
$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
$config->set('Cache.DefinitionImpl', null);
// put here every tag and attribute that you want to pass through
$config->set('HTML.Allowed', 'a[href|title],blockquote[cite]');
$purifier = new HTMLPurifier($config);
// here, the javascript command is stripped off
$content = $purifier->purify($content);
print $content;
Solved...
$text = "> hello <a name=\"n\"
> href=\"javascript:alert('xss')\">*you*</a>";
$text = strip_tags($text);
$text = Markdown($text);
echo $text;
It returns:
<blockquote>
<p>hello href="javascript:alert('xss')"><em>you</em></p>
</blockquote>
And not:
<blockquote>
<p>hello <a name="n" href="javascript:alert('xss')"><em>you</em></a></p>
</blockquote>
So it seems that strip_tags() does the job.
Combined with:
$text = preg_replace('/href=(\"|)javascript:/', "", $text);
The entire input should be sanitized against XSS injections. Correct me if I'm wrong.
The HTML output of your Markdown depends only on the Markdown parser, so you can convert your Markdown to HTML and sanitize the HTML afterwards, as described here:
Escape from XSS vulnerability maintaining Markdown syntax?
Or you can modify your Markdown parser to check every parameter that goes into an HTML attribute for signs of XSS. Of course, you should escape HTML tags before parsing. I think this solution is much faster than the other, because for simple text you usually only need to check the URLs of images and links.
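A sketch of what such a URL check could look like, assuming the parser hands you the raw link or image target before it is written into the attribute. The function name is_safe_url and the list of allowed schemes are my own choices. An allow-list is used because block-list patterns such as matching the literal text javascript: are easy to get around with mixed case or embedded whitespace.

```php
<?php
// Hypothetical check for link/image targets produced by a Markdown parser.
function is_safe_url(string $url): bool
{
    // Drop control characters and whitespace, which can be used to
    // disguise a scheme (for example "java\nscript:").
    $normalized = preg_replace('/[\x00-\x20]+/', '', $url);

    // No scheme at all means a relative URL, which is treated as safe here.
    if (!preg_match('/^([a-z][a-z0-9+.\-]*):/i', $normalized, $match)) {
        return true;
    }

    // Only explicitly allowed schemes pass; everything else is rejected.
    return in_array(strtolower($match[1]), ['http', 'https', 'mailto'], true);
}

var_dump(is_safe_url("javascript:alert('xss')")); // bool(false)
var_dump(is_safe_url('https://example.com/'));    // bool(true)
```

If the parser gives you attribute values that still contain HTML entities, they would need to be decoded before this check, and the value still has to be escaped with htmlspecialchars() when it is written into the href or src attribute.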
