Pretty-print HTML via PHP without validation? - php

I'd like to automatically pretty-print (indentation, mostly) the HTML output that my PHP scripts generate. I've been messing with Tidy, but have found that in its efforts to validate and clean my code, Tidy is changing way too much. I know Tidy's intentions are good but I'm really just looking for an HTML beautifier. Is there a simpler library out there that can run in PHP and just do the pretty-printing? Or, is there a way to configure Tidy to skip all the validation stuff and just beautify?

The behaviour you've observed with Tidy results from its underlying use of the DOM API. Instead of manipulating the source code you provide, the DOM API reconstructs the whole document, making fixes along the way.
I've written Dindent, a library that uses regular expressions. It does nothing beyond adding indentation and removing whitespace. However, I advise against using this implementation for anything beyond development purposes.

I've never used Tidy but it seems pretty customizable.
Here's the quick reference of configuration options: http://tidy.sourceforge.net/docs/quickref.html
But really, with tools like Firebug, I've never seen the need to Tidy HTML output.
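As a rough illustration of the configuration options mentioned above, here is a minimal sketch that asks Tidy only for indentation. The helper name indentOnly() is made up for this example, and the options chosen are an assumption about what "just beautify" should mean. Tidy will still repair markup it considers broken, so this reduces the changes rather than eliminating them.

```php
<?php
// Sketch only: indentOnly() is a hypothetical helper name, not part of Tidy.
function indentOnly(string $html): string
{
    // Without ext/tidy, return the markup untouched instead of failing.
    if (!function_exists('tidy_repair_string')) {
        return $html;
    }

    $config = [
        'indent'         => true,  // indent block-level elements
        'indent-spaces'  => 2,
        'wrap'           => 0,     // disable line wrapping
        'show-body-only' => true,  // do not wrap fragments in <html>/<head>/<body>
        'tidy-mark'      => false, // no generator <meta> tag
    ];

    $result = tidy_repair_string($html, $config, 'utf8');

    // Fall back to the input if Tidy could not process it.
    return is_string($result) ? $result : $html;
}

$pretty = indentOnly('<div><p>Hello</p></div>');
echo $pretty, "\n";
```

If the Tidy extension is not loaded, the helper deliberately returns its input unchanged, so it is safe to leave in a development setup.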

Since you do not want to have it validate for whatever reason, I will not suggest HTML Purifier ;). Why not just use an IDE to get everything indented nicely, for example with Alt-Shift-F in NetBeans?

Facing the same problem, I currently use a combination of two commands:
cat template-home.php | js-beautify --type html | prettier --parser php
js-beautify formats the HTML bits and prettier formats the PHP code.

Related

Is there a better way than using Lynx to convert HTML to Plaintext reliably in PHP

I want to convert a HTML file with a table based layout to plaintext in order to send a multipart email via PHP.
I have tried a few different pre built classes / functions that I've found on SO, but none of them seem to produce decent results, which I believe is down to the table-based layout.
I don't want to roll my own class for stripping HTML and formatting the results as I am sure there are edge issues which I won't account for or be able to test until I come across them in production.
The best solution I've come up with so far is:
Create a temporary HTML file
Use something like shell_exec("/path/to/lynx -dump temporary.html"); to create a plaintext version of the email
Use some regex to get rid of any remaining unwanted tags
This works fine, but I'm a little worried that it's not the optimal way of achieving a decent multipart email. Is anyone aware of a better way?
To clarify, I have already tried the following without success:
html2text class - http://www.chuggnutt.com/html2text.php
Markdownify - http://milianw.de/projects/markdownify/
html2text version 2 - http://www.howtocreate.co.uk/php/html2texthowto.html
http://journals.jevon.org/users/jevon-phd/entry/19818
I truly believe Lynx is not the best solution :) I've used html2text myself; it works fine and is better than Lynx. If you would rather use regexes, that would be much heavier than calling out to the system shell (shell_exec, system, exec, popen), since you need to preg_replace every unnecessary tag, and regex in PHP is deadly slow. So on a Linux machine, I guess it's better to pass the HTML to html2text.
PHP's DOMDocument should help you with this.
You can traverse the DOM tree and strip out relevant content as you want.
http://php.net/manual/en/class.domdocument.php
Related question on SO :
Parse HTML with PHP's HTML DOMDocument
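To make the DOM traversal idea above concrete, here is a minimal sketch that turns table rows into tab-separated lines of text. The helper name tableToText() is made up for this example. It assumes a simple, non-nested table; nested tables, links, and images would need extra handling.

```php
<?php
// Sketch only: tableToText() is a hypothetical helper, not a library function.
function tableToText(string $html): string
{
    // Collect parser warnings internally; real-world email HTML is rarely clean.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $lines = [];
    foreach ($dom->getElementsByTagName('tr') as $row) {
        $cells = [];
        foreach ($row->childNodes as $cell) {
            // Keep only direct <td>/<th> children of the row.
            if ($cell instanceof DOMElement && in_array($cell->nodeName, ['td', 'th'], true)) {
                $cells[] = trim($cell->textContent);
            }
        }
        if ($cells !== []) {
            $lines[] = implode("\t", $cells);
        }
    }

    return implode("\n", $lines);
}

$text = tableToText('<table><tr><th>Item</th><th>Price</th></tr><tr><td>Tea</td><td>2.50</td></tr></table>');
echo $text, "\n";
```

Each table row becomes one line of output, which keeps the reading order of a table-based layout without any regex tag stripping.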

Beautify html output

I was wondering whether there is class or something similar which I can include into my PHP pages to beautify the HTML output.
For example, putting new lines in after tags and indenting correctly, so that my source code isn't all on one line. I know that it doesn't matter to the browser, but I wish to do this.
I have heard of http://www.php.net/manual/en/book.tidy.php but am not clear on what it does and how to implement it, i.e. I don't understand what the manual says about it.
The Tidy extension is the way to go.
If you don't understand the documentation (OK, admittedly it's not very thorough), then the first results on Google for php tidy tutorials look very promising:
http://devzone.zend.com/article/761
http://www.devshed.com/c/a/PHP/Working-with-the-Tidy-Library-in-PHP-5/
HTML purifier or HTML tidy seems to be the way to go for this, combined with this set of functions: http://www.php.net/manual/en/ref.outcontrol.php
http://htmlpurifier.org/
http://tidy.sourceforge.net/
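The combination of output control functions and a beautifier suggested above could look roughly like this. Both function names are made up for this example, and breakBetweenTags() is only a stand-in formatter to keep the sketch runnable without extensions; a real beautifier such as Tidy would be passed in the same way.

```php
<?php
// Sketch only: captureFormatted() and breakBetweenTags() are hypothetical names.

// Stand-in formatter: puts each tag that directly follows another on its own line.
function breakBetweenTags(string $html): string
{
    return preg_replace('/>\s*</', ">\n<", $html) ?? $html;
}

// Runs $render, captures everything it echoes, and returns the formatted result.
function captureFormatted(callable $render, callable $formatter): string
{
    ob_start();
    try {
        $render();
    } finally {
        // Always close the buffer, even if rendering throws.
        $raw = (string) ob_get_clean();
    }

    return $formatter($raw);
}

$html = captureFormatted(
    function (): void {
        echo '<ul><li>One</li><li>Two</li></ul>';
    },
    'breakBetweenTags'
);
echo $html, "\n";
```

For a whole page, the same formatter can instead be registered once with ob_start('breakBetweenTags') at the top of the script, so PHP applies it when the buffer is flushed.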
Try Pretty Diff - http://prettydiff.com/?m=beautify&html It appears to be a more complete algorithm than Tidy.

X/Html Validator in PHP

First thing: I know that there is an interface to the W3C validator: http://pear.php.net/package/Services_W3C_HTMLValidator/
But I don't know if I can install it on a cheap hosting server. I don't think so.
I need a validator for the SEO tools within my Content Management System, so it must be pretty much portable.
I would love to use W3C, but only if it would be portable. I can also use Curl for this, but it won't be an elegant solution.
The best one I found so far is: http://phosphorusandlime.blogspot.com/2007/09/php-html-validator-class.html
Is there any validator comparable to W3C but portable (only PHP that does not depend on custom packages)?
If you want to validate (X)HTML documents, you can use PHP's native DOM extension:
DOMDocument::validate — Validates the document based on its DTD
Example from Manual:
$dom = new DOMDocument;
$dom->load('book.xml'); // see docs for load, loadXml, loadHtml and loadHtmlFile
if ($dom->validate()) {
    echo "This document is valid!\n";
}
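If you want the individual errors, fetch them with libxml_get_errors() (call libxml_use_internal_errors(true) first so that they are collected rather than emitted as warnings).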
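A minimal sketch of collecting those individual errors follows. The helper name dtdErrors() and the small note DTD are made up for this example; the DTD is declared inline so the snippet runs without external files.

```php
<?php
// Sketch only: dtdErrors() is a hypothetical helper name.
// Returns the libxml messages produced while validating $xml against its DTD.
function dtdErrors(string $xml): array
{
    // Collect errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument();
    $messages = [];

    if (!$dom->loadXML($xml) || !$dom->validate()) {
        foreach (libxml_get_errors() as $error) {
            $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

// Illustrative inline DTD: a <note> must contain <to> followed by <body>.
$dtd = <<<'DTD'
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
DTD;

$valid   = dtdErrors($dtd . '<note><to>Alice</to><body>Hi</body></note>');
$invalid = dtdErrors($dtd . '<note><to>Alice</to></note>');

print_r($invalid);
```

An empty array means the document loaded and validated; anything else is a list of human-readable messages that a CMS could show next to the page being checked.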
I asked a similar question and you might check out some of the answers there.
In summary, I would recommend either running the HTML through tidy on the host or writing a short script to validate through W3C remotely. Personally, I don't like the tidy option because it reformats your code and I hate how it puts <p> tags on every line.
Here's a link to tidy and here's a link to the various W3C validation tools.
One thing to keep in mind is that HTML validation doesn't work with server-side code; it only works after your PHP is evaluated. This means that you'd need to run your code through the host's PHP interpreter and then 'pipe' it to either the tidy utility or the remote validation service. That command would look something like:
$ php myscript.php | tidy #options go here
Personally, I eventually chose to forgo the headache and simply render the page, copy the source and validate via direct input on the W3C validation utility. There are only so many times you need to validate a page anyway and automating it seemed more trouble than it's worth.
Good luck.

Clean up PHP/HTML pages

Does anybody know of a good tool that cleans up files with php and html in it? I've used Tidy before but it doesn't do a good job at leaving the php code alone. I know there are various implementations of tidy but does any tool reign champion specifically for pages with html and php?
Cleaning your code starts with separating PHP from HTML !
I am aware that this is a pretty old question but still a valid one. I currently use this and it seems to be doing a decent job: PHP Formatter
For HTML, CSS and JS, DirtyMarkup is a handy tool. Only drawback of these is that you have to copy and paste the code twice.
As far as I know, Tidy is the "reigning champion" when it comes to cleaning HTML code. The only other tool I've personally used in cleaning code is within Adobe Dreamweaver.
I would agree with separating your HTML and your PHP code. However, I think you have to think of it kind of backwards. I would separate your HTML code from your PHP code. Take your HTML and block it up and use include 'html_code_1.php';. Thus you can run Tidy on your HTML and not worry about it affecting your PHP code.
I previously had this problem, but had issues with other programs reorganizing what I coded, and trying to clean it up usually ended up doing more harm than good. To solve this, I am starting to learn the ins and outs of CodeIgniter, a basic PHP framework that uses the MVC approach to splitting HTML and PHP. I haven't tested much, but it looks like much less hassle than writing HTML and PHP straight into a single file.
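One way to picture the include-based split described above is a small render helper that keeps logic in one place and markup in a template file. This is only a sketch: renderTemplate() is a made-up name, and the temporary file stands in for a real template such as html_code_1.php.

```php
<?php
// Sketch only: renderTemplate() is a hypothetical helper; the temp file stands in
// for a real template file kept separate from the PHP logic.
function renderTemplate(string $file, array $vars): string
{
    // Expose array keys as local variables for the template (existing names win).
    extract($vars, EXTR_SKIP);

    ob_start();
    include $file;

    return (string) ob_get_clean();
}

// Create a throwaway template so the example is self-contained.
$template = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents($template, '<p>Hello, <?= htmlspecialchars($name) ?>!</p>');

$html = renderTemplate($template, ['name' => 'Ada & Co']);
unlink($template);

echo $html, "\n";
```

Because the helper returns the rendered markup as a string, a cleaner such as Tidy can be run on that string alone, without ever seeing the PHP source.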
You can use this PHP class, if you can't install the "Tidy" module (sometimes when you buy hosts you can't).
http://www.barattalo.it/html-fixer/

When writing XML, is it better to hand write it, or to use a generator such as simpleXML in PHP?

I have normally hand written xml like this:
<tag><?= $value ?></tag>
Having found tools such as simpleXML, should I be using those instead? What's the advantage of doing it using a tool like that?
Good XML tools will ensure that the resulting XML file properly validates against the DTD you are using.
Good XML tools also save a bunch of repetitive typing of tags.
If you're dealing with a small bit of XML, there's little harm in doing it by hand (as long as you can avoid typos). However, with larger documents you're frequently better off using an editor, which can validate your doc against the schema and protect against typos.
You could use the DOM extension, which can be quite cumbersome to code against. My personal opinion is that the most effective way to write XML documents from the ground up is the XMLWriter extension, which comes with PHP and is enabled by default in recent versions.
$w=new XMLWriter();
$w->openMemory();
$w->startDocument('1.0','UTF-8');
$w->startElement("root");
$w->writeAttribute("ah", "OK");
$w->text('Wow, it works!');
$w->endElement();
echo htmlentities($w->outputMemory(true));
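A slightly larger variation on the XMLWriter snippet above shows nested elements with indentation switched on. The element and attribute names are made up for illustration.

```php
<?php
// Sketch only: element names and values are made up for illustration.
$w = new XMLWriter();
$w->openMemory();
$w->setIndent(true);        // put child elements on their own lines
$w->setIndentString('  ');  // two spaces per nesting level
$w->startDocument('1.0', 'UTF-8');
$w->startElement('menu');
$w->startElement('item');
$w->writeAttribute('price', '2.50');
$w->text('Tea');
$w->endElement();           // closes <item>
$w->endElement();           // closes <menu>
$w->endDocument();

$xml = $w->outputMemory();
echo $xml;
```

The writer tracks which elements are open, so a closing tag cannot be misspelled or forgotten the way it can in a hand-written template.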
Using a good XML generator will greatly reduce potential errors due to fat-fingering, lapse of attention, or whatever other human frailty. There are several different levels of machine assistance to choose from, however:
At the very least, use a programmer's text editor that does syntax highlighting and auto-indentation. Just noticing that your text is a different color than you expect, or not lining up the way you expect, can tip you off to a typo you might otherwise have missed.
Better yet, take a step back and write the XML as a data structure in whatever language you prefer, then convert that data structure to XML. Perl gives you modules such as the lightweight XML::Simple for small jobs or the heftier XML::Generator; using XML::Simple is just a matter of arranging your content into a standard Perl hash of hashes and running it through the appropriate method.
-steve
Producing XML via any sort of string manipulation opens the door for bugs to get into your code. The extremely simple example you posted, for instance, won't produce well-formed XML if $value contains an ampersand.
There aren't a lot of edge cases in XML, but there are enough that it's a waste of time to write your own code to handle them. (And if you don't handle them, your code will unexpectedly fail someday. Nobody wants that.) Any good XML tool will automatically handle those cases.
Use the generator.
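The ampersand case mentioned above is easy to reproduce. In this sketch, isWellFormed() is a made-up helper that simply asks DOMDocument whether a string parses, and htmlspecialchars() with ENT_XML1 is shown as the minimum a hand-written template would need to add.

```php
<?php
// Sketch only: isWellFormed() is a hypothetical helper name.
function isWellFormed(string $xml): bool
{
    // Suppress parser warnings; only the boolean result matters here.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $ok = $dom->loadXML($xml);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}

$value = 'Fish & Chips';

// What the hand-written template from the question effectively produces.
$handWritten = "<tag>$value</tag>";

// The same element with the value escaped for XML.
$escaped = '<tag>' . htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8') . '</tag>';

var_dump(isWellFormed($handWritten));
var_dump(isWellFormed($escaped));
```

The first string is rejected by the parser because of the bare ampersand, while the escaped one loads normally.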
The advantage of using a generator is you have consistent markup and don't run the risk of fat-fingering a bracket or quote, or forgetting to encode something. This is crucial because these mistakes will not be found until runtime, unless you have significant tests to ensure otherwise.
Hand writing isn't always the best practice, because in a large XML document you can write wrong tags, and it can be difficult to find the cause of an error. So I suggest using XML parsers to create XML files.
Speed may be an issue... handwritten can be a lot faster.
The XML tools in eclipse are really useful too. Just create a new xml schema and document, and you can easily use most of the graphical tools. I do like to point out that a prior understanding of how schemas work will be of use.
Always use a tool of some kind. XML can be very complex. I know that the PHP guys are used to working with hacky little stuff, but it's a huge code smell in the .NET world if someone doesn't use System.Xml for creating XML.
