A secure commenting system for php - php

I want to implement a commenting system for my website. I looked around and found CKEditor to be the best WYSIWYG editor available. I tried its BBCode output and it works perfectly. However, if I use BBCode output, I need a reliable parser to convert the BBCode to HTML when I show the comments to users. If I use HTML output, I need something to prevent XSS in the comments. Which approach do you suggest for a simple commenting system? I have already integrated CKEditor into my system and would prefer a very lightweight and simple approach without much bloat (like PEAR). Also, StackOverflow seems pretty awesome. Is it possible to use something similar in my PHP application?

Regarding "I should use a reliable parser to parse the bbcode to HTML": PHP has a PECL BBCode extension.
Regarding "Also, StackOverflow seems pretty awesome. Is it possible to use something similar for my php?": SO uses Markdown, and Markdown parsers for PHP are also available.
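Whichever markup is chosen, one common way to keep comments safe is to escape the raw input first and only then convert a small whitelist of tags. The following is a minimal sketch of that idea for BBCode. The function name bbcode_to_safe_html and the set of supported tags are assumptions made for illustration. It is not the API of the PECL extension and is not a complete BBCode parser.

```php
<?php
// Hypothetical helper (not the PECL BBCode API): converts a small whitelist
// of BBCode tags to HTML after escaping the raw input.
function bbcode_to_safe_html(string $text): string
{
    // Escape first so any HTML typed by the user is rendered inert.
    $html = htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // Simple paired tags without attributes.
    $simple = ['b' => 'strong', 'i' => 'em', 'u' => 'u'];
    foreach ($simple as $bb => $tag) {
        $html = preg_replace(
            '~\[' . $bb . '\](.*?)\[/' . $bb . '\]~is',
            '<' . $tag . '>$1</' . $tag . '>',
            $html
        );
    }

    // Links: only http(s) URLs are accepted, so javascript: URLs stay plain text.
    $html = preg_replace(
        '~\[url=(https?://[^\]\s]+)\](.*?)\[/url\]~is',
        '<a href="$1" rel="nofollow">$2</a>',
        $html
    );

    // Keep the line breaks the commenter typed.
    return nl2br($html);
}

echo bbcode_to_safe_html('[b]Hi[/b] <script>alert(1)</script>');
```

Because escaping happens before any tag is generated, the only HTML in the output is what the converter itself produces. Tags outside the whitelist are simply shown as typed.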

Related

Using php to get all translatable text from a website/html-page

I'm trying to set up a translation tool to translate websites. What I want to do is import html-code and get all translatable texts from that site.
One idea would be to use strip_tags, but it would drop strings that should be translated, such as alt texts, title texts and probably others that I haven't thought of yet. Is there a clean way to do this?
In this case you need to parse the HTML and extract the text yourself. As you probably already know, parsing HTML with regular expressions is A Bad Idea (tm). So the only right solution is to parse the DOM of the document. For this step you are free to use any tool, including the standard DOMDocument class.
If you are looking for libraries or scripts to help, I would suggest looking at html2text, which can be used commercially. As far as I can see, it doesn't support attributes of <img> tags, but that is very easy to fix (use the <a> tag handling as an example).
If you are looking for automated text extraction, then you should definitely look at something like Boilerpipe.
I would personally use the DomCrawler component from Symfony2, which is a nice wrapper around PHP's DOM functions, and start from there.
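As a rough illustration of the DOM-based approach, the sketch below uses DOMDocument and DOMXPath to collect visible text nodes together with alt and title attribute values. The function name extract_translatable and the choice of attributes are assumptions for this example. A real tool would probably also cover placeholder, value and similar attributes.

```php
<?php
// Hypothetical helper: collects visible text plus alt/title attribute values.
function extract_translatable(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // The XML prolog is a common workaround so loadHTML() reads the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Text outside <script>/<style>, plus every alt and title attribute.
    $query = '//text()[not(ancestor::script) and not(ancestor::style)]'
           . ' | //@alt | //@title';

    $strings = [];
    foreach ($xpath->query($query) as $node) {
        $value = trim($node->nodeValue);
        if ($value !== '') {
            $strings[] = $value;
        }
    }
    return $strings;
}

print_r(extract_translatable(
    '<p title="Greeting">Hello <img src="a.png" alt="A cat"> world</p>'
));
```

For the sample input this collects the strings Greeting, Hello, A cat and world. Inline markup splits a sentence into several text nodes, so a translation tool may prefer to work at the level of whole block elements instead.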

Is there a better way than using Lynx to convert HTML to plaintext reliably in PHP?

I want to convert a HTML file with a table based layout to plaintext in order to send a multipart email via PHP.
I have tried a few different pre built classes / functions that I've found on SO, but none of them seem to produce decent results, which I believe is down to the table-based layout.
I don't want to roll my own class for stripping HTML and formatting the results, as I am sure there are edge cases which I won't account for or be able to test until I come across them in production.
The best solution I've come up with so far is:
1. Create a temporary HTML file
2. Use something like shell_exec("/path/to/lynx -dump temporary.html"); to create a plaintext version of the email
3. Use some regex to get rid of any remaining unwanted tags
This works fine, but I'm a little worried that it's not the optimal way of achieving a decent multipart email. Is anyone aware of a better way?
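For reference, the Lynx steps described above could be sketched roughly as follows. The function names and the default Lynx path /usr/bin/lynx are assumptions for illustration. The -force_html flag is added because a file created by tempnam() has no .html extension, and -nolist suppresses the numbered link list that -dump otherwise appends.

```php
<?php
// Hypothetical helper: builds the Lynx command line for a given HTML file.
// The default path is an assumption; adjust it for your system.
function lynx_dump_command(string $htmlFile, string $lynx = '/usr/bin/lynx'): string
{
    // escapeshellarg() keeps the file name from being interpreted by the shell.
    return escapeshellarg($lynx)
        . ' -dump -nolist -force_html '
        . escapeshellarg($htmlFile);
}

// Hypothetical helper: writes the HTML to a temporary file and dumps it as text.
// Returns null when the temporary file or the Lynx call fails.
function html_to_text_via_lynx(string $html): ?string
{
    $tmp = tempnam(sys_get_temp_dir(), 'mail');
    if ($tmp === false) {
        return null;
    }
    file_put_contents($tmp, $html);
    $output = shell_exec(lynx_dump_command($tmp));
    unlink($tmp);

    return is_string($output) ? $output : null;
}
```

Calling html_to_text_via_lynx($htmlBody) would then give the text part for the multipart email, provided Lynx is installed and shell_exec is not disabled on the host.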
To clarify, I have already tried the following without success:
html2text class - http://www.chuggnutt.com/html2text.php
Markdownify - http://milianw.de/projects/markdownify/
html2text version 2 - http://www.howtocreate.co.uk/php/html2texthowto.html
http://journals.jevon.org/users/jevon-phd/entry/19818
I don't think Lynx is the best solution :) I've used html2text myself; it works fine and gives better results than Lynx. If you would rather do it with regexes, that is likely to be much heavier than calling out to the system shell (shell_exec, system, exec, popen), because you need to preg_replace every unwanted tag, and in my experience regex processing in PHP is very slow. So if this runs on a Linux machine, I'd pass the HTML to html2text.
PHP DomDocument should help you in this.
You can traverse the DOM tree and strip out relevant content as you want.
http://php.net/manual/en/class.domdocument.php
Related question on SO:
Parse HTML with PHP's HTML DOMDocument
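A minimal sketch of such a DOM traversal, aimed at table-based layouts, might look like this. The function names and the list of elements treated as line breaks are assumptions for illustration. Table cells are separated by tabs and rows by newlines. Real email templates will likely need more rules (links, images, lists) and explicit character encoding handling.

```php
<?php
// Hypothetical helper: recursively converts a DOM node to plain text.
function node_to_text(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        // Collapse whitespace inside text nodes.
        return preg_replace('/\s+/', ' ', $node->nodeValue);
    }
    if (in_array($node->nodeName, ['script', 'style', 'head'], true)) {
        return '';
    }
    if ($node->nodeName === 'br') {
        return "\n";
    }

    $text = '';
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $text .= node_to_text($child);
        }
    }

    switch ($node->nodeName) {
        case 'td':
        case 'th':
            return trim($text) . "\t";   // cells become tab-separated
        case 'tr':
        case 'table':
        case 'p':
        case 'div':
        case 'li':
        case 'h1':
        case 'h2':
        case 'h3':
            return trim($text) . "\n";   // block-level elements end a line
        default:
            return $text;
    }
}

// Hypothetical entry point: parses the HTML and returns its text content.
function html_to_plaintext(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return trim(node_to_text($doc));
}

echo html_to_plaintext(
    '<table><tr><td>Name</td><td>Price</td></tr><tr><td>Tea</td><td>3</td></tr></table>'
);
```

The advantage over regex stripping is that the parser deals with malformed or nested markup, and the formatting rules live in one switch statement that can be extended as new edge cases turn up.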

php wiki parser for trac-style formatting

I am creating a very simple cms for my site and rather than using html, I'd like to insert content in the same kind of wiki-format that's used by the Trac project.
Do you know of any open-source php scripts/classes that I can grab and use for this?
Note: I am not trying to create a wiki site. Just that formatting aspect - like how this stack exchange site accepts wiki mark-up and renders it nicely.
After doing some more research, I think I've found it.
The Forever For Now wiki-syntax-to-html parser is pretty much the same as the formatting on the Trac project.
~I have not looked at the code yet, but it's pretty likely to be cool. (like Fonzie)~
Edit - I've now looked at the code, and it's beautiful and elegant and does the job.
PHP Markdown might work for you.

Pretty-print HTML via PHP without validation?

I'd like to automatically pretty-print (indentation, mostly) the HTML output that my PHP scripts generate. I've been messing with Tidy, but have found that in its efforts to validate and clean my code, Tidy is changing way too much. I know Tidy's intentions are good but I'm really just looking for an HTML beautifier. Is there a simpler library out there that can run in PHP and just do the pretty-printing? Or, is there a way to configure Tidy to skip all the validation stuff and just beautify?
The behaviour that you've observed when using Tidy is a result of its underlying use of the DOM API. Instead of manipulating the provided source code, the DOM API reconstructs the whole source, making fixes along the way.
I've written Dindent, a library that uses regular expressions. It does nothing beyond adding indentation and removing whitespace. However, I advise against using this implementation beyond development purposes.
I've never used Tidy but it seems pretty customizable.
Here's the quick reference of configuration options: http://tidy.sourceforge.net/docs/quickref.html
But really, with tools like Firebug, I've never seen the need to Tidy HTML output.
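As a hedged sketch of what a "beautify only" configuration might look like with PHP's tidy extension: the option names below come from the Tidy quick reference, while the function name indent_html and the fallback behaviour are assumptions for this example. Tidy still parses and repairs the markup, so these options only reduce what it adds. They do not switch off its fixes.

```php
<?php
// Hypothetical helper: indents HTML with the tidy extension when it is available.
function indent_html(string $html): string
{
    if (!extension_loaded('tidy')) {
        return $html; // No tidy extension: return the input unchanged.
    }

    $config = [
        'indent'         => true,  // add indentation
        'indent-spaces'  => 2,
        'wrap'           => 0,     // do not wrap long lines
        'show-body-only' => true,  // do not add <html>, <head>, <body> wrappers
        'tidy-mark'      => false, // no generator meta tag
    ];

    $tidy = new tidy();
    $tidy->parseString($html, $config, 'utf8');
    $tidy->cleanRepair();

    return tidy_get_output($tidy);
}

echo indent_html('<div><p>Hello</p><p>World</p></div>');
```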
Since you do not want to have it validate for whatever reason, I will not suggest htmlpurifier ; ). Why not just use an IDE to get everything indented nicely, like Alt-Shift-F in Netbeans.
Facing the same problem, I currently use a combination of two commands:
cat template-home.php | js-beautify --type html | prettier --parser php
js-beautify formats the html bits and prettier formats the php code

Clean up PHP/HTML pages

Does anybody know of a good tool that cleans up files with php and html in it? I've used Tidy before but it doesn't do a good job at leaving the php code alone. I know there are various implementations of tidy but does any tool reign champion specifically for pages with html and php?
Cleaning your code starts with separating PHP from HTML!
I am aware that this is a pretty old question but still a valid one. I currently use this and it seems to be doing a decent job: PHP Formatter
For HTML, CSS and JS, DirtyMarkup is a handy tool. The only drawback of these is that you have to copy and paste the code twice.
As far as I know, Tidy is the "reigning champion" when it comes to cleaning HTML code. The only other tool I've personally used for cleaning code is the one built into Adobe Dreamweaver.
I would agree with separating your HTML and your PHP code. However, I think you have to think of it kind of backwards: separate your HTML code from your PHP code. Take your HTML, split it into blocks, and use include 'html_code_1.php';. That way you can run Tidy on your HTML and not worry about it affecting your PHP code.
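A minimal sketch of that include-based separation follows. The helper name render_template is an assumption for illustration. The template is written to a temporary file here only to keep the example self-contained. In a real project it would be a separate file such as html_code_1.php.

```php
<?php
// Hypothetical helper: renders a template file with the given variables
// and returns the resulting HTML as a string.
function render_template(string $file, array $vars): string
{
    // Expose the array keys as local variables for the template.
    extract($vars, EXTR_SKIP);
    ob_start();
    include $file;
    return ob_get_clean();
}

// Demo only: the HTML lives in its own file, with the logic kept out of it.
$template = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents($template, '<h1><?= htmlspecialchars($title) ?></h1>');

echo render_template($template, ['title' => 'Tom & Jerry']);
unlink($template);
```

With the markup kept in template files that contain little more than echo statements, an HTML formatter has far less PHP to trip over.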
I previously had this problem, but other programs kept reorganizing what I had coded, and trying to clean it up usually did more harm than good. To solve this, I am starting to learn the ins and outs of CodeIgniter, a basic PHP framework that uses the MVC approach to split HTML and PHP. I haven't tested much, but it looks like much less hassle than writing HTML and PHP straight into a single file.
You can use this PHP class if you can't install the "Tidy" module (on some hosting plans you can't).
http://www.barattalo.it/html-fixer/
