I was wondering whether there is a class or something similar that I can include in my PHP pages to beautify the HTML output.
For example, putting new lines after tags and indenting correctly so that my source code isn't all on one line. I know that it doesn't matter to the browser, but I wish to do this.
I have heard of http://www.php.net/manual/en/book.tidy.php but am not clear on what it does and how to implement it, i.e. I don't understand what the manual says about it.
The Tidy extension is the way to go.
If you don't understand the documentation (OK, admittedly it's not very thorough), then the first results on Google for php tidy tutorials look very promising:
http://devzone.zend.com/article/761
http://www.devshed.com/c/a/PHP/Working-with-the-Tidy-Library-in-PHP-5/
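For what it's worth, basic usage of the Tidy extension might look roughly like the sketch below. It assumes the extension is installed. beautifyHtml() is a made-up helper name, and the option values are just a starting point, not a recommended configuration.

```php
<?php
// Minimal sketch: pretty-print an HTML string with the Tidy extension.
// beautifyHtml() is a hypothetical helper name, not part of Tidy itself.
function beautifyHtml(string $html): string
{
    if (!extension_loaded('tidy')) {
        // Tidy is not installed: return the markup untouched.
        return $html;
    }

    $config = [
        'indent'        => true, // indent nested block-level elements
        'indent-spaces' => 2,    // two spaces per nesting level
        'wrap'          => 0,    // do not wrap long lines
    ];

    $tidy = new tidy();
    $tidy->parseString($html, $config, 'utf8');
    $tidy->cleanRepair();        // repair the parsed document before output

    return tidy_get_output($tidy);
}

$oneLine = '<html><head><title>Demo</title></head><body><div><p>Hello</p></div></body></html>';
echo beautifyHtml($oneLine), "\n";
```

Note that Tidy repairs the markup as well as indenting it, so the output may contain tags the input did not have.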
HTML Purifier or HTML Tidy seem to be the way to go for this, combined with this set of functions: http://www.php.net/manual/en/ref.outcontrol.php
http://htmlpurifier.org/
http://tidy.sourceforge.net/
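A rough sketch of how the output control functions could be combined with Tidy: the page is buffered and a callback reformats it just before it is sent. formatOutput() is a hypothetical name, and the fallback to the unmodified buffer is an assumption about what you would want when Tidy is missing or fails.

```php
<?php
// Sketch: capture the page with output buffering and post-process it
// before it is sent. formatOutput() is a hypothetical callback name.
function formatOutput(string $buffer): string
{
    if (!extension_loaded('tidy')) {
        return $buffer; // no Tidy available: send the page unchanged
    }

    // tidy_repair_string() parses, repairs and returns the markup in one call.
    $result = tidy_repair_string($buffer, ['indent' => true, 'wrap' => 0], 'utf8');

    // Fall back to the original buffer if Tidy could not process it.
    return is_string($result) ? $result : $buffer;
}

ob_start('formatOutput'); // everything echoed from here on is buffered

echo '<html><head><title>Demo</title></head><body><ul><li>One</li><li>Two</li></ul></body></html>';

ob_end_flush();           // runs formatOutput() on the buffer and sends the result
```

The nice part of this approach is that the rest of the page code does not need to know that any formatting happens.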
Try Pretty Diff: http://prettydiff.com/?m=beautify&html
It appears to be a more complete algorithm than Tidy.
Related
I'm trying to set up a translation tool to translate websites. What I want to do is import html-code and get all translatable texts from that site.
One idea would be to use strip_tags, but it would ignore strings that could be translated such as alt-texts, title-texts and probably others that I don't have on my mind yet. Is there a clean way to do this?
In this case you need to parse the HTML and extract the text yourself. As you probably already know, parsing HTML with regular expressions is A Bad Idea (tm). So the only right solution is to parse the DOM of the document. For this step you are free to use any tool, including the standard DOMDocument class.
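As a rough illustration of the DOM approach, the sketch below collects text nodes plus a few attributes with DOMDocument and DOMXPath. extractTranslatable() is a hypothetical helper, and the attribute list (alt, title, placeholder) is an assumption you would extend for your own markup.

```php
<?php
// Sketch: collect translatable strings from HTML with DOMDocument + DOMXPath.
// extractTranslatable() is a hypothetical helper; the attribute list is an
// assumption. Input is assumed to be ASCII or entity-encoded; UTF-8 pages
// may need an explicit encoding hint before loadHTML().
function extractTranslatable(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $strings = [];

    // Text nodes, skipping the contents of <script> and <style>.
    foreach ($xpath->query('//text()[not(ancestor::script) and not(ancestor::style)]') as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') {
            $strings[] = $text;
        }
    }

    // Attribute values that usually carry human-readable text.
    foreach ($xpath->query('//@alt | //@title | //@placeholder') as $attr) {
        $value = trim($attr->nodeValue);
        if ($value !== '') {
            $strings[] = $value;
        }
    }

    return $strings;
}

print_r(extractTranslatable('<p title="Greeting">Hello <img src="a.png" alt="A cat"></p>'));
```

Unlike strip_tags, this keeps the alt and title texts, and it is easy to record which node each string came from if the translations need to be written back later.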
If you are looking for libraries or scripts to help, I would suggest looking at html2text, which can be used commercially. As far as I can see, it doesn't support attributes of <img> tags, but that's very easy to fix (use its handling of the <a> tag as an example).
If you are looking for automated text extraction, then you should definitely look at something like Boilerpipe.
I would personally use the DomCrawler component from Symfony2, which is a nice wrapper around the PHP DOM functions, and start from there.
I want to convert an HTML file with a table-based layout to plaintext in order to send a multipart email via PHP.
I have tried a few different prebuilt classes and functions that I've found on SO, but none of them seems to produce decent results, which I believe is down to the table-based layout.
I don't want to roll my own class for stripping HTML and formatting the results as I am sure there are edge issues which I won't account for or be able to test until I come across them in production.
The best solution I've come up with so far is:
1. Create a temporary HTML file
2. Use something like shell_exec("/path/to/lynx -dump temporary.html"); to create a plaintext version of the email
3. Use some regex to get rid of any remaining unwanted tags
This works fine, but I'm a little worried that it's not the optimal way of achieving a decent multipart email. Is anyone aware of a better way?
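The temp-file and lynx steps above can be sketched roughly as follows. The function names are hypothetical, /path/to/lynx is the placeholder from the post, and the regex cleanup step is left out. The code assumes lynx is installed and that shell_exec is allowed on the host.

```php
<?php
// Sketch of the temp-file + lynx approach. buildLynxCommand() and
// htmlToTextWithLynx() are hypothetical names; the lynx path is a placeholder.
function buildLynxCommand(string $lynxPath, string $htmlFile): string
{
    // escapeshellarg() guards against spaces and shell metacharacters.
    return escapeshellarg($lynxPath) . ' -dump ' . escapeshellarg($htmlFile);
}

function htmlToTextWithLynx(string $html, string $lynxPath = '/path/to/lynx'): ?string
{
    // Give the temp file an .html suffix so that lynx treats it as HTML.
    $tmp = tempnam(sys_get_temp_dir(), 'mail');
    $htmlFile = $tmp . '.html';
    rename($tmp, $htmlFile);
    file_put_contents($htmlFile, $html);

    $text = shell_exec(buildLynxCommand($lynxPath, $htmlFile));
    unlink($htmlFile); // always remove the temporary file

    // shell_exec() returns null or false when there is no output or on failure.
    return is_string($text) ? $text : null;
}

// Usage (requires lynx):
// $plain = htmlToTextWithLynx('<table><tr><td>Hello</td></tr></table>', '/usr/bin/lynx');
echo buildLynxCommand('/path/to/lynx', 'temporary.html'), "\n";
```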
To clarify, I have already tried the following without success:
html2text class - http://www.chuggnutt.com/html2text.php
Markdownify - http://milianw.de/projects/markdownify/
html2text version 2 - http://www.howtocreate.co.uk/php/html2texthowto.html
http://journals.jevon.org/users/jevon-phd/entry/19818
I truly don't believe Lynx is the best solution :) I've used html2text myself; it works fine and is better than lynx. If you prefer regexes, that would be much heavier than using the system shell (shell_exec, system, exec, popen), because you need to preg_replace all the unnecessary tags, and regex in PHP is very slow. So I guess if it's on a Linux machine, it's better to pass the HTML to html2text.
PHP's DOMDocument should help you with this.
You can traverse the DOM tree and strip out relevant content as you want.
http://php.net/manual/en/class.domdocument.php
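A minimal sketch of such a traversal for a table-based layout is below. tableHtmlToText() is a hypothetical helper, and "one line per row, cells separated by tabs" is just one possible convention for the plaintext part of the email.

```php
<?php
// Sketch: flatten a table-based layout to plain text by walking the DOM.
// tableHtmlToText() is a hypothetical helper; the row/tab layout is an
// arbitrary choice for illustration.
function tableHtmlToText(string $html): string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $lines = [];

    foreach ($xpath->query('//tr') as $row) {
        $cells = [];
        // Direct child cells only. Note that a nested table's text still
        // shows up in its parent cell, so nested layouts need extra care.
        foreach ($xpath->query('./td | ./th', $row) as $cell) {
            // Collapse runs of whitespace inside a cell to single spaces.
            $cells[] = trim(preg_replace('/\s+/', ' ', $cell->textContent));
        }
        $lines[] = implode("\t", $cells);
    }

    return implode("\n", $lines);
}

echo tableHtmlToText('<table><tr><th>Item</th><th>Price</th></tr><tr><td>Tea</td><td>3.50</td></tr></table>'), "\n";
```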
Related question on SO:
Parse HTML with PHP's HTML DOMDocument
Here is my idea: I want to create a tool that can create static HTML pages out of PHP pages, perhaps generated by a CMS.
Then I want to use some kind of regex or cleanup tool to reorganize the HTML and generate cleaner, more standardized, YSlow-compliant HTML pages.
I may be asking for something that does not exist; if so, any suggestions for a close-cousin solution?
Thank you for your time.
Take a look at Tidy: http://php.net/manual/en/book.tidy.php
Works great for cleaning up html.
Not regex but an extension.
I'd like to automatically pretty-print (indentation, mostly) the HTML output that my PHP scripts generate. I've been messing with Tidy, but have found that in its efforts to validate and clean my code, Tidy is changing way too much. I know Tidy's intentions are good but I'm really just looking for an HTML beautifier. Is there a simpler library out there that can run in PHP and just do the pretty-printing? Or, is there a way to configure Tidy to skip all the validation stuff and just beautify?
The behaviour that you've observed when using Tidy is a result of its underlying use of the DOM API. Instead of manipulating the provided source code, the DOM API reconstructs the whole source, making fixes along the way.
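The same effect is easy to see with PHP's own DOMDocument, used here as a stand-in rather than Tidy itself: the parser builds a tree and serializes it again, so unclosed tags come back closed and wrapper elements appear. domRoundTrip() is a hypothetical name for this small demonstration.

```php
<?php
// Sketch: a DOM round trip rebuilds the markup instead of editing it in place.
// domRoundTrip() is a hypothetical helper using DOMDocument, not Tidy.
function domRoundTrip(string $html): string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Serialize the tree that the parser built from the input.
    return $doc->saveHTML();
}

// The input has no closing tags at all; the output does.
echo domRoundTrip('<p>Unclosed paragraph<p>Second <b>bold');
```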
I've written Dindent, a library that uses regular expressions. It does nothing beyond adding indentation and removing whitespace. However, I advise against using this implementation for anything other than development purposes.
I've never used Tidy but it seems pretty customizable.
Here's the quick reference of configuration options: http://tidy.sourceforge.net/docs/quickref.html
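As a starting point, a configuration along these lines keeps the focus on layout. This is only a sketch: prettyPrintFragment() is a hypothetical helper, the option values are guesses at what "just beautify" means, and Tidy will still parse and repair the markup, so it cannot be made into a pure indenter this way.

```php
<?php
// Sketch: Tidy options aimed at indentation rather than clean-up.
// prettyPrintFragment() is a hypothetical helper; Tidy still repairs
// invalid markup regardless of these settings.
function prettyPrintFragment(string $html): string
{
    if (!extension_loaded('tidy')) {
        return $html; // Tidy is not installed: leave the markup as it is
    }

    $config = [
        'indent'           => true,  // indent nested block-level elements
        'indent-spaces'    => 4,     // four spaces per nesting level
        'wrap'             => 0,     // never wrap long lines
        'show-body-only'   => true,  // no <html>/<head>/<body> around a fragment
        'drop-empty-paras' => false, // keep empty <p> elements
        'tidy-mark'        => false, // do not add a generator meta tag
    ];

    $result = tidy_repair_string($html, $config, 'utf8');

    // Fall back to the input if Tidy could not process it.
    return is_string($result) ? $result : $html;
}

echo prettyPrintFragment('<div><ul><li>One</li><li>Two</li></ul></div>'), "\n";
```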
But really, with tools like Firebug, I've never seen the need to Tidy HTML output.
Since you do not want to have it validate for whatever reason, I will not suggest htmlpurifier ; ). Why not just use an IDE to get everything indented nicely, like Alt-Shift-F in Netbeans.
Facing the same problem, I currently use a combination of two commands:
cat template-home.php | js-beautify --type html | prettier --parser php
js-beautify formats the HTML bits and prettier formats the PHP code.
Does anybody know of a good tool that cleans up files with php and html in it? I've used Tidy before but it doesn't do a good job at leaving the php code alone. I know there are various implementations of tidy but does any tool reign champion specifically for pages with html and php?
Cleaning your code starts with separating PHP from HTML!
I am aware that this is a pretty old question but still a valid one. I currently use this and it seems to be doing a decent job: PHP Formatter
For HTML, CSS and JS, DirtyMarkup is a handy tool. Only drawback of these is that you have to copy and paste the code twice.
As far as I know, Tidy is the "reigning champion" when it comes to cleaning HTML code. The only other tool I've personally used for cleaning code is the one within Adobe Dreamweaver.
I would agree with separating your HTML and your PHP code. However, I think you have to think of it kind of backwards: separate your HTML code from your PHP code. Take your HTML, split it into blocks, and pull each one in with include 'html_code_1.php';. That way you can run Tidy on your HTML and not worry about it affecting your PHP code.
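One way this separation could look in practice is sketched below. renderTemplate() and the $title variable are hypothetical. The template is written to a temp file only so that the example is self-contained. Normally it would be a file such as html_code_1.php next to your script.

```php
<?php
// Sketch: keep the markup in its own file and pull it in with include.
// renderTemplate() and $title are hypothetical names for illustration.
function renderTemplate(string $file, array $vars): string
{
    extract($vars, EXTR_SKIP); // expose the array keys as local variables
    ob_start();                // capture what the included file prints
    include $file;
    return ob_get_clean();
}

// Demo only: create a tiny template file on the fly.
$template = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents($template, '<h1><?= htmlspecialchars($title) ?></h1>' . "\n");

echo renderTemplate($template, ['title' => 'Hello & welcome']);

unlink($template);
```

With the markup in its own file, a formatter only ever sees HTML plus a few short echo tags, which is much less likely to be mangled.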
I previously had this problem, but other programs kept reorganizing what I had coded, and trying to clean it up usually did more harm than good. To solve this, I am starting to learn the ins and outs of CodeIgniter, a basic PHP framework that uses the MVC approach to split HTML and PHP. I haven't tested much, but it looks like much less hassle than writing HTML and PHP straight into a single file.
You can use this PHP class if you can't install the "Tidy" module (some hosting plans don't let you).
http://www.barattalo.it/html-fixer/