This question already has answers here:
How to display raw HTML code on an HTML page
(30 answers)
Any suitable plugin is available in aloha editor to add html contents for data projects
Closed 8 years ago.
I have a forum where any user can write articles. The forum uses Aloha Editor for user input. It has one page for editing and another page for display. The problem arises when a user tries to input HTML code.
Suppose a user writes a header tag:
`<h1>header</h1>`
The page outputs
Header
Instead, I want it to output
<h1>Header</h1>
Any suggestions?
When the user submits the article, you need to run it through htmlentities (PHP htmlentities) before rendering it back.
You should escape the HTML code. For example, you should output:
&lt;h1&gt;Header&lt;/h1&gt;
instead of
<h1>Header</h1>
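As a minimal sketch of the escaping both answers describe: the helper name `escapeForDisplay` is made up for illustration, while `htmlspecialchars()` is PHP's built-in function (`htmlentities()` works the same way here but also converts accented characters).

```php
<?php
// Hypothetical helper name; htmlspecialchars() itself is a built-in PHP function.
function escapeForDisplay(string $userHtml): string
{
    // ENT_QUOTES escapes both single and double quotes as well as <, > and &.
    return htmlspecialchars($userHtml, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Prints &lt;h1&gt;Header&lt;/h1&gt;, which a browser shows as the literal text <h1>Header</h1>.
echo escapeForDisplay('<h1>Header</h1>'), PHP_EOL;
```

The escaping is applied when the article is rendered, so the stored text can stay exactly as the user typed it.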
I'm sorry, but I'm afraid my answer will not be the best one. I know there are better techniques for this (such as what @chiwangc says), but if you ever get stuck you can use what I am thinking of.
Just place the HTML code that you want to output in a 'textarea' (you may want to disable it), like
<textarea><h1>Header</h1></textarea>
Note: this is just an alternative, but if it works for you, then cheers.
Hope it helps.
I don't know what Aloha Editor is, but you absolutely should check user input for attempted security breaches (JS includes, SQL injection, etc.). For example, it could be a function like this one, which I used in one of my projects:
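function filterMsg($m){
    // Decode URL encoding, remove backslashes, and trim surrounding whitespace.
    $m = trim( stripslashes( urldecode($m) ) );
    // Escape HTML special characters, including both quote styles.
    $m = htmlentities( $m, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8' );
    // Collapse every run of whitespace (newlines included) into a single space.
    $m = preg_replace( '/[\\s\\t]+/iu', ' ', $m );
    // Collapse runs of two or more pipes/whitespace characters into a single pipe.
    $m = preg_replace( '/[\\s\\|]{2,999}/iu', '|', $m);
    // A pipe that follows 1 to 40 non-pipe characters becomes a space.
    $m = preg_replace( '/([^\\|]{1,40})[\\|]+/iu', '$1 ', $m);
    // Any remaining pipe becomes a line break.
    $m = str_replace( '|', '<br>', $m);
    // Newlines were already collapsed above, so this rarely matches.
    $m = str_replace( "\n", '<br>', $m);
    return $m;
}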
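But if you simply want to show unparsed content, you can use the <xmp></xmp> HTML tags. Note that <xmp> is obsolete in current HTML, so escaping the content is the more reliable option.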
I fetch data from MySQL.
Then I echo it with echo htmlspecialchars( $content['MainText'] , ENT_NOQUOTES, "UTF-8");.
I tried to echo JavaScript like <script> alert('Hello');</script> without htmlspecialchars and saw a pop-up box. So it is not safe to echo without htmlspecialchars.
But with htmlspecialchars, hyperlinks and images are not displayed correctly, and neither are <h2>, <span>, etc. That is also not acceptable.
At the moment I have tried to replace some character sequences, like this:
$content_main_text_modified =
str_replace(
array( '&lt;br/&gt;', '&lt;br&gt;', '&lt;/a&gt;', "&gt;", '&lt;a', '&lt;div', '&lt;img', '&lt;/div', '&lt;h2', '&lt;/h2', 'amp;amp;', '&lt;span', '&lt;/span' ),
array( '<br/>', '<br>', '</a>', ">", '<a', '<div', '<img', '</div', '<h2', '</h2', '', '<span', '</span' ),
( htmlspecialchars( $content['MainText'] , ENT_NOQUOTES, "UTF-8") )
);
echo $content_main_text_modified;
My idea is not to replace every &lt; with <, but only to replace &lt; when it is followed by br, a, h2 and so on. So if the text in MySQL contained something like <script>, it would not execute.
I would like to check (and get opinions on) whether my idea is safe, and possibly get some recommendations.
I think a different approach would be to stop it from being stored that way in the first place, so that you can simply echo it. What you are describing is an XSS attack, where someone enters JavaScript that is then executed in another user's browser. Take a look at this link for more detailed information about XSS: click here.
As for a way to remove it, I would apply some form of validation to the input data. There are many ways to do this; I would suggest reading the link above, which will give you an idea of how to stop it, so that you can then do a simple echo. Validation like this will also help prevent SQL injection attacks, although that requires some more work.
This won't work every time. Some people also suggest using htmlspecialchars, but as you know, that causes issues when you are working with HTML. You just have to make your best attempt; no system can stop everything.
Without knowing exactly what you are doing it is impossible to say, but you might find it useful to use a template engine so that the HTML is separate from the code. You can then use htmlspecialchars() on any text you pass to the template.
Take a look at http://htmlpurifier.org/ and the HTML.Allowed directive; where you can set tags that are allowed.
Use strip_tags($content['MainText'], '<a><h2><div><span><br><img>');
Or you can use htmlspecialchars and then this: preg_replace('#&lt;(/?(?:a|h2|div|span|br|img))&gt;#', '<$1>', $html);. Note that this pattern only restores bare tags without attributes, so something like <a href="..."> stays escaped. For example:
$content_main_text_modified = htmlspecialchars($content['MainText']);
$content_main_text_modified = preg_replace('#&lt;(/?(?:a|h2|div|span|br|img))&gt;#', '<$1>', $content_main_text_modified);
http://au1.php.net/manual/en/function.strip-tags.php
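One caveat with the strip_tags() suggestion above is worth checking before relying on it: the function filters by tag name only. The short sketch below uses a made-up input string to show that attributes on an allowed tag survive, and that the text inside a removed <script> element is kept.

```php
<?php
// Made-up input for illustration: an allowed <a> tag carrying an event handler,
// followed by a <script> element that is not on the allow list.
$input = '<a href="#" onclick="alert(1)">link</a><script>alert(2)</script>';

// Only the tag names are filtered; attributes of allowed tags are left untouched.
$result = strip_tags($input, '<a>');

echo $result, PHP_EOL;
```

Because the onclick attribute is still present in the result, strip_tags() alone is not a complete defence; an allow-list sanitizer such as the HTML Purifier library mentioned earlier also filters attributes.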
This question already has answers here:
Converting HTML to plain text in PHP for e-mail
(14 answers)
Closed 8 years ago.
The task: take an HTML page and keep only its text, with whatever formatting plain text allows. So if there was a <br> tag I'd like to convert it to \r\n, if there was a table I'd like to keep the initial structure of that table in the resulting text, and so on.
There is a built-in PHP function, strip_tags(), but it does not really fit my requirements: it keeps the contents of styles and scripts, and it does not keep the formatting, since it simply deletes <br>, <table> and other tags.
I have also read the Stack Overflow question 'strip html,css from string', but it does not have the answer I'm looking for.
Essentially I'm looking for a way to render an HTML page to a TXT file (with no links and images). Is it possible? Are there any libraries that do this?
One thing you can do is a reverse Markdown conversion. There are a lot of HTML-to-Markdown implementations that do the job you want. They convert the HTML to text, including the line breaks, etc.
One such implementation is html2markdown. It uses NodeJS, and you just need to add this:
html2markdown("<h1>Hello markdown!</h1>")
At the very least, this strips the tags and gives you the result as text. The Markdown markup is then easy to strip, because it uses only a few characters, such as #s and ---s.
There is also an implementation of html2markdown in PHP on GitHub. The syntax is again simple:
$html = "<h3>Quick, to the Batpoles!</h3>";
$markdown = new HTML_To_Markdown($html);
And this gives you:
echo $markdown; // ==> ### Quick, to the Batpoles!
This library can strip the tags too:
$html = '<span>Turnips!</span>';
$markdown = new HTML_To_Markdown($html, array('strip_tags' => true)); // $markdown now contains "Turnips!"
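If a Markdown converter is more than you need, here is a rough sketch that uses only built-in PHP functions instead of either library above. The function name `htmlToPlainText` and the choice of which closing tags become line breaks or tabs are assumptions made for illustration; it approximates the layout and is not a real HTML renderer.

```php
<?php
// Hypothetical helper; a rough approximation, not a full HTML-to-text renderer.
function htmlToPlainText(string $html): string
{
    // Drop <script> and <style> elements together with their contents.
    $text = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);
    // Turn <br> into a Windows-style line break.
    $text = preg_replace('#<br\s*/?>#i', "\r\n", $text);
    // End a line after block-level elements and table rows.
    $text = preg_replace('#</(p|div|tr|li|h[1-6])>#i', "\r\n", $text);
    // Separate table cells with tabs so the row structure stays visible.
    $text = preg_replace('#</(td|th)>#i', "\t", $text);
    // Remove every remaining tag, then decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return trim($text);
}

echo htmlToPlainText('<style>p{color:red}</style><p>Hello<br>world</p><script>alert(1)</script>'), PHP_EOL;
```

Regular expressions cope with simple markup like this, but nested or malformed HTML is better handled by a real parser.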
Hi, I am using the CKEditor plugin to format the text entered by the user. It was working properly, but now I am trying to increase the security of my website, so I used the htmlentities() function everywhere echo is used.
The problem is that text output from CKEditor is now shown as HTML tags on my website, because of the htmlentities() call I added. This is the output I am getting on my website:
<p><strong><span style="color:#008080">Superhero</span></strong></p>
So the look of the website is damaged. I want to show the CKEditor text as it is, but htmlentities() has to be used.
I searched Stack Overflow and found many questions related to this, so I tried the following solution in my ckeditor/config.js:
config.entities = false;
config.basicEntities = false;
config.entities_greek = false;
config.entities_latin = false;
But it's not working in my code.
Thanks in advance!
Well, as far as I am aware there is no built-in way in PHP to distinguish between maliciously injected script tags and normal HTML tags.
This leads to the problem that you want to block malicious scripts but not valid HTML tags.
When I have to accept user input that may contain HTML tags and display it again, I use htmlpurifier instead of htmlentities. Another one I am aware of is safeHtml.
However, there might be better solutions than this, and I am interested in knowing about them as well. Unfortunately I haven't come across one.
I'm using the MediaWiki API to get content from Wikipedia pages.
I've written code which generates the following query (for example):
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&rvsection=0&titles=hawaii
This retrieves only the leading section of the Wikipedia page about Hawaii.
The problem is that, as you might notice, there are a lot of irrelevant substrings such as:
"[[Molokai|Moloka{{okina}}i]], [[Lanai|Lāna{{okina}}i]], [[Kahoolawe|Kaho{{okina}}olawe]], [[Maui]] and the [[Hawaii (island)|".
All those brackets [[]] are not relevant, and I wonder whether there is an elegant method to pull only 'clean' content from such pages?
Thanks in advance.
You can get a clean HTML text from Wikipedia with this query:
https://en.wikipedia.org/w/api.php?action=query&prop=extracts&titles=hawaii
If you want just plain text, without HTML, try this:
https://en.wikipedia.org/w/api.php?action=query&prop=extracts&titles=hawaii&explaintext
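A sketch of how the second query could be built and its result read from PHP. The helper names are made up, the `format=json` and `exintro` parameters are additions not in the answer above, and the sample response is hand-written with a made-up page id to show the general shape, so treat it as an assumption rather than real API output.

```php
<?php
// Hypothetical helper: builds the extracts query for one page title.
function buildExtractUrl(string $title): string
{
    $params = [
        'action'      => 'query',
        'prop'        => 'extracts',
        'titles'      => $title,
        'explaintext' => 1,      // plain text instead of HTML
        'exintro'     => 1,      // assumption: only the lead section is wanted
        'format'      => 'json',
    ];
    return 'https://en.wikipedia.org/w/api.php?' . http_build_query($params);
}

// Hypothetical helper: returns the first "extract" found in a JSON response.
function extractFromResponse(string $json): ?string
{
    $data = json_decode($json, true);
    foreach ($data['query']['pages'] ?? [] as $page) {
        if (isset($page['extract'])) {
            return $page['extract'];
        }
    }
    return null;
}

// Hand-written sample response (page id and text are made up).
$sample = '{"query":{"pages":{"12345":{"title":"Hawaii","extract":"Hawaii is a state."}}}}';
echo buildExtractUrl('Hawaii'), PHP_EOL;
echo extractFromResponse($sample), PHP_EOL;
```

In real use the JSON would come from an HTTP request to the built URL, for example with curl and a descriptive User-Agent header set.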
Please try this (the square brackets have to be escaped, otherwise they are read as a character class):
$relevant = preg_replace('/\[\[.*?\]\]/', '', $string);
EDIT: just found this - hope it is helpful
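Removing the bracketed links entirely also removes the words inside them. As an alternative sketch, the hypothetical function below keeps the visible text: `[[Target|Label]]` becomes `Label` and `[[Target]]` becomes `Target`. Templates such as `{{okina}}` are left untouched, so this is only a partial cleanup.

```php
<?php
// Hypothetical helper: unwraps wiki links but keeps their visible text.
function stripWikiLinks(string $wikitext): string
{
    // [[Target|Label]] -> Label
    $text = preg_replace('/\[\[[^\]|]*\|([^\]]*)\]\]/u', '$1', $wikitext);
    // [[Target]] -> Target
    $text = preg_replace('/\[\[([^\]|]*)\]\]/u', '$1', $text);
    return $text;
}

// Prints: Maui and the Big Island
echo stripWikiLinks('[[Maui]] and the [[Hawaii (island)|Big Island]]'), PHP_EOL;
```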
This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
How to parse and process HTML with PHP?
How do I go about pulling specific content from a given live online HTML page?
For example: http://www.gumtree.com/p/for-sale/ovation-semi-acoustic-guitar/93991967
I want to retrieve only the text description, the path to the main image, and the price. So basically, I want to retrieve content which is inside specific divs, perhaps with specific IDs or classes, inside an HTML page.
Pseudocode:
$page = load_html_contents('http://www.gumtr..');
$price = getPrice($page);
$description = getDescription($page);
$title = getTitle($page);
Please note I do not intend to steal any content from gumtree, or anywhere else for that matter, I am just providing an example.
First of all, what you want to do is called web scraping.
Basically, you load the HTML content into a variable, and then you need to use regular expressions to search for specific IDs, etc.
Search for "web scraping".
HERE is a basic tutorial
THIS book should be useful too.
Something like this would be a good starting point if you wanted tabular output:
$raw = file_get_contents($url) or die('could not select');
// Characters and tags to remove before searching.
$newlines = array("\t","\n","\r","\x20\x20","\0","\x0B","<br/>");
$content = str_replace($newlines, "", html_entity_decode($raw));
// '<some id> ' and '</ending id>' are placeholders for the markers around the table.
$start = strpos($content,'<some id> ');
$end = strpos($content,'</ending id>');
$table = substr($content,$start,$end-$start);
preg_match_all("|<tr(.*)</tr>|U",$table,$rows);
foreach ($rows[0] as $row){
    // Skip header rows.
    if ((strpos($row,'<th')===false)){
        // array to vars
        preg_match_all("|<td(.*)</td>|U",$row,$cells);
        $var1 = strip_tags($cells[0][0]);
        $var2 = strip_tags($cells[0][1]);
        // etc. for the remaining cells
    }
}
The tutorial Easy web scraping with PHP, recommended by robotrobert, is a good place to start; I have left several comments on it. For better performance, use curl. Among other things, it handles HTTP headers, SSL, cookies, and proxies. Cookies in particular are something you must pay attention to.
I just found HTML Parsing and Screen Scraping with the Simple HTML DOM Library. It is more advanced, and it makes page parsing easier and faster by using a DOM parser instead of regular expressions, which are hard to master and resource-hungry. I recommend this last one 100%.
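To illustrate the DOM-parser approach, here is a minimal sketch using PHP's built-in DOM extension (DOMDocument and DOMXPath) rather than the Simple HTML DOM library named above. The sample markup, ids, class names, and the `firstText` helper are all made up for illustration; a real page would need its own selectors.

```php
<?php
// Hand-written sample markup; the ids, classes, and values are made up.
$html = <<<'HTML'
<html><body>
  <h1 id="title">Ovation semi-acoustic guitar</h1>
  <span class="price">GBP 250</span>
  <div id="description">Good condition, includes case.</div>
  <img id="main-image" src="/images/guitar.jpg" alt="">
</body></html>
HTML;

// Hypothetical helper: trimmed text of the first node matching an XPath query.
function firstText(DOMXPath $xpath, string $query): ?string
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Real pages are rarely valid HTML, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$title = firstText($xpath, '//*[@id="title"]');
$price = firstText($xpath, '//span[@class="price"]');
$description = firstText($xpath, '//*[@id="description"]');
$image = firstText($xpath, '//img[@id="main-image"]/@src');

echo "$title | $price | $image", PHP_EOL;
```

For a live page, the `$html` string would come from curl or file_get_contents() instead of the inline sample.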