HTML/PHP beautifier/formatter library written in PHP

I am trying to find an HTML beautifier written in PHP.
My sole purpose is to format (tabify) a few HTML/PHP files that are generated by my program.
I don't need to check whether it is valid or not.
I tried looking up different libraries like Tidy etc. but I couldn't decide which one to use.
Since my only goal is to format the files on the server, I don't want the overhead of validating them. I also need support for HTML5 tags, which many of these libraries lack. So all I am looking for is a way to format the files: something exactly like http://tools.arantius.com/tabifier, but written in PHP so it can run on the server side.
The files are generated using PHP's DOMDocument library.
I tried to use
$this->file_doc->formatOutput = TRUE;
$this->file_doc->preserveWhiteSpace = FALSE;
$this->file_doc->saveHTMLFile($this->filepath);
but it doesn't work.
The files are not generated entirely from scratch. A few tags are added when my program runs; the data is sent back to the server, where these tags are appended to the file and saved.

This question is old, but you can use HTML Purifier:
http://htmlpurifier.org/
It has many options, including one to tidy HTML code.
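On the DOMDocument attempt in the question: as far as I know, preserveWhiteSpace only takes effect if it is set before the document is loaded, and formatOutput is honoured by saveXML(). Here is a minimal sketch, assuming the markup is well-formed XML (the sample string and variable names are made up for illustration, and HTML5 that is not well-formed XML will not load this way):

```php
<?php
// Assumption: the input is well-formed XML/XHTML. This sample is hypothetical.
$markup = '<div><p>Hello</p><p>World</p></div>';

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false; // set BEFORE loading, so existing indentation is dropped
$doc->formatOutput = true;        // ask the serializer to indent nested elements
$doc->loadXML($markup);

// Serialize only the root element, which skips the XML declaration.
$pretty = $doc->saveXML($doc->documentElement);
echo $pretty, "\n";
```

Elements that mix text and child tags are generally left on one line, so this is closer to a tabifier than to a full pretty-printer.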

Related

Can I generate HTML code using other coding languages?

I want to send mails using Mailchimp. To speed up the process of creating those mailings, I want to have a standard mailing template.
I tried using .htaccess so that I could write PHP in my .html file. Sadly, Mailchimp reads nothing but the HTML code and completely ignores the PHP.
Is there a way to generate HTML code, so that I can do things such as importing data from my database into an HTML file without using PHP in the file itself?
Or does anyone know a better way of doing this?

Yes, you can generate HTML with other languages, such as Python.
See: https://stackoverflow.com/a/6748854/12149235
If you are having issues with MailChimp, there are a few alternatives, such as GetResponse, which isn't directly used with PHP etc, but can be integrated into web applications and so forth.

cURL PHP - load a full page

I am currently trying to load an HTML page via cURL. I can retrieve the HTML content, but part of it is loaded later via scripting (AJAX POST). I cannot retrieve that part of the HTML (it is a table).
Is it possible to load a page entirely?
Thank you for your answers

No, you cannot do this.
CURL does nothing more than download a file from a URL -- it doesn't care whether it's HTML, Javascript, an image, a spreadsheet, or any other arbitrary data; it just downloads. It doesn't run anything or parse anything or display anything, it just downloads.
You are asking for something more than that. You need to download, parse the result as HTML, then run some Javascript that downloads something else, then run more Javascript that parses that result into more HTML and inserts it into the original HTML.
What you're basically looking for is a full-blown web browser, not CURL.
Since your goal involves "running some Javascript code", it should be fairly clear that it is not achievable without having a Javascript interpreter available. This means that it is obviously not going to work inside of a PHP program (*). You're going to need to move beyond PHP. You're going to need a browser.
The solution I'd suggest is to use a very specialised browser called PhantomJS. This is actually a full Webkit browser, but without a user interface. It's specifically designed for automated testing of websites and other similar tasks. Your requirement fits it pretty well: write a script to get PhantomJS to open your URL, wait for the table to finish rendering, and grab the finished HTML code.
You'll need to install PhantomJS on your server, and then use a library like this one to control it from your PHP code.
I hope that helps.
(*) yes, I'm aware of the PHP extension that provides a JS interpreter inside of PHP, and it would provide a way to solve the problem, but it's experimental, unfinished, would be still difficult to implement as a solution, and I don't think it's a particularly good idea anyway, so let's not consider it for the purposes of this answer.

No. The only way you can do that is to make a separate cURL request to the URL that the AJAX call uses, and put the two results together afterwards.
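A rough sketch of that two-request approach follows. The helper name fetchUrl, the URLs, and the POST field in the usage comment are all hypothetical; the real AJAX endpoint and its parameters have to be read from the browser's network inspector.

```php
<?php
// Hypothetical helper: fetch a URL with cURL, optionally as a POST request.
function fetchUrl(string $url, ?array $postFields = null): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    if ($postFields !== null) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postFields));
    }
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}

// Usage (placeholder URLs and field names, not taken from the question):
// $page  = fetchUrl('https://example.com/page.html');
// $table = fetchUrl('https://example.com/ajax/table', ['page' => 1]);
// $combined = $page . $table; // or insert $table into $page with DOMDocument
```

This only works when the AJAX endpoint can be called directly; if it depends on cookies or tokens set by the first page, those have to be carried over as well.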

X/Html Validator in PHP

First thing: I know that there is an interface to the W3C validator: http://pear.php.net/package/Services_W3C_HTMLValidator/
But I don't know whether I can install it on a cheap hosting server. I don't think so.
I need a validator for the SEO tools within my Content Management System, so it must be fairly portable.
I would love to use W3C, but only if it is portable. I could also use cURL for this, but that would not be an elegant solution.
The best one I found so far is: http://phosphorusandlime.blogspot.com/2007/09/php-html-validator-class.html
Is there any validator comparable to W3C but portable (only PHP that does not depend on custom packages)?

If you want to validate (X)HTML documents, you can use PHP's native DOM extension:
DOMDocument::validate — Validates the document based on its DTD
Example from Manual:
$dom = new DOMDocument;
$dom->load('book.xml'); // see docs for load, loadXml, loadHtml and loadHtmlFile
if ($dom->validate()) {
    echo "This document is valid!\n";
}
If you want the individual errors, fetch them with libxml_get_errors()
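To illustrate collecting the individual errors, here is a small sketch. The inline DTD and the sample document are invented for the example (the document deliberately omits a required element); a real (X)HTML document would reference its own DTD instead.

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical document with an inline DTD; <body> is required but missing.
$xml = <<<XML
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Reader</to></note>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$isValid = $dom->validate();

// Turn each LibXMLError object into a readable message.
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}
libxml_clear_errors();

echo $isValid ? "This document is valid!\n" : implode("\n", $messages) . "\n";
```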

I asked a similar question and you might check out some of the answers there.
In summary, I would recommend either running the HTML through tidy on the host or writing a short script to validate through W3C remotely. Personally, I don't like the tidy option because it reformats your code and I hate how it puts <p> tags on every line.
Here's a link to tidy and here's a link to the various W3C validation tools.
One thing to keep in mind is that HTML validation doesn't work with server-side code; it only works after your PHP is evaluated. This means that you'd need to run your code through the host's PHP interpreter and then 'pipe' it to either the tidy utility or the remote validation service. That command would look something like:
$ php myscript.php | tidy #options go here
Personally, I eventually chose to forgo the headache and simply render the page, copy the source and validate via direct input on the W3C validation utility. There are only so many times you need to validate a page anyway and automating it seemed more trouble than it's worth.
Good luck.

Alternative of html purifier

I want to accept HTML input from users and post it on my site, and I also want to make sure that it doesn't break my site template because of dirty HTML code.
I was using HTML Purifier in the past, but HTML Purifier is not working on one of my servers, so I am searching for the best alternative.
It should be written purely in PHP,
and it should be able to fix dirty HTML code such as
</div> which is dirty because the div is closed without ever being opened.

Simple solution without third-party libraries: create a DOMDocument and call loadHTML on it with your input. Surround the input with <html> and <body> tags if you are only parsing a little snippet. You'll probably want to suppress warnings too, as you'll get them spat out for common bad HTML.
Then simply walk over the resulting document tree, removing any elements and attributes you've not included in a known-good list. You should also check allowed URL attributes to ensure they use known-good schemes like http:, and not potentially troublesome schemes like javascript:. If you want to go the extra mile you can check that only allowed combinations of elements are nested inside each other (this is easier the smaller number of elements you're allowing).
Finally, serialise the snippet's node again using saveHTML. Because you're creating new markup from a DOM, not maintaining the original—potentially malformed—markup, that's a whole class of odd-markup injection techniques you're blocking.
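The steps above could be sketched roughly as follows. The function names, the allowlist, and the sample input are all made up for illustration, and this is nowhere near as thorough as a dedicated library; it just shows the parse, filter, and re-serialise idea.

```php
<?php
// Hypothetical allowlist: tag name => attribute names permitted on that tag.
$allowed = ['p' => [], 'b' => [], 'i' => [], 'a' => ['href']];

// Recursively drop anything that is not on the allowlist.
function cleanNode(DOMNode $node, array $allowed): void
{
    // Copy the child list first, because nodes are removed while iterating.
    foreach (iterator_to_array($node->childNodes, false) as $child) {
        if ($child instanceof DOMText) {
            continue; // plain text is kept as-is
        }
        $tag = strtolower($child->nodeName);
        if (!($child instanceof DOMElement) || !isset($allowed[$tag])) {
            $node->removeChild($child); // unknown element, comment, etc.
            continue;
        }
        foreach (iterator_to_array($child->attributes, false) as $attribute) {
            $name = strtolower($attribute->nodeName);
            $isUrl = in_array($name, ['href', 'src'], true);
            // URL attributes must be absolute http(s) URLs in this sketch.
            $badUrl = $isUrl && !preg_match('#^https?://#i', trim($attribute->nodeValue));
            if (!in_array($name, $allowed[$tag], true) || $badUrl) {
                $child->removeAttribute($attribute->nodeName);
            }
        }
        cleanNode($child, $allowed);
    }
}

function sanitizeHtml(string $input, array $allowed): string
{
    $previous = libxml_use_internal_errors(true); // suppress parser warnings
    $doc = new DOMDocument();
    // Note: non-ASCII input may additionally need a charset hint.
    $doc->loadHTML('<html><body>' . $input . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    cleanNode($body, $allowed);

    // Serialise only the snippet's nodes, not the wrapper tags.
    $output = '';
    foreach ($body->childNodes as $child) {
        $output .= $doc->saveHTML($child);
    }
    return $output;
}

$dirty = '</div><p onclick="x()">Hi <a href="javascript:alert(1)">link</a><script>alert(1)</script></p>';
$clean = sanitizeHtml($dirty, $allowed);
echo $clean, "\n";
```

Removing a disallowed element here also removes its contents, which is the safer default for things like script; unwrapping the element and keeping its children would be a different design choice.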

You can try PHP Tidy, which is the Tidy library in PHP.

I believe Tidy will help close your tags, but it isn't as comprehensive as HTML Purifier which can remove valid but unwanted tags or attributes (i.e. JavaScript onclick events, that kind of thing).
Be aware that Tidy requires libtidy to be installed on your server, so it's not just straight PHP.
I know Pádraic Brady has been working on an alternative to HTML Purifier for Zend Framework, though I think it's just experimental code at this time:
http://framework.zend.com/wiki/pages/viewpage.action?pageId=25002168
http://github.com/padraic/wibble

Do also consider HTMLawed at https://www.bioinformatics.org/phplabware/internal_utilities/htmLawed/
From that page:
use to filter, secure & sanitize HTML in blog comments or forum posts, generate XML-compatible feed items from web-page excerpts, convert HTML to XHTML, pretty-print HTML, scrape web-pages, reduce spam, remove XSS code, etc.
Note that Tidy/HTML Tidy is NOT an anti-XSS solution. It is a clean-and-repair utility which allows you to clean HTML, XHTML, and XML markup.
HTMLawed is a 55 KB single PHP file, whilst HTML Purifier is a 3 MB folder.

Using PHP to retrieve information from a different site

I was wondering if there's a way to use PHP (or any other server-side or even client-side [if possible] language) to obtain certain pieces of information from a different website (NOT a local file like the include 'nav.php').
What I mean is this: say I have a blog at www.blog.com and I have another website at www.mysite.com.
Is there a way to gather ALL of the h2 links from www.blog.com and put them in a div in www.mysite.com?
Also, is there a way I could grab the entire contents of a DIV (with an ID, of course) from blog.com and insert it in mysite.com?
Thanks,
Amit

First of all, if you want to retrieve content from a blog, check whether the blog platform (e.g. Blogger, WordPress) has an API, so that you don't have to reinvent the wheel. Good APIs usually come with good documentation (meaning that probably 5% of all APIs are good APIs), and that documentation should include code examples for popular languages such as PHP, JavaScript, Java, etc. Once again, if the goal is to retrieve content from a blog, there should be plenty of frameworks available to help you.

Check out the PHP Simple HTML DOM library
Can be as easy as:
// Create DOM from URL or file
$html = file_get_html('http://www.otherwebsite.com/');
// Find all h2 headings and print their text
foreach($html->find('h2') as $element)
    echo $element->plaintext;

This can be done by opening the remote website as a file, then taking the HTML and using the DOM parser to manipulate it.
$site_html = file_get_contents('http://www.example.com/');
$document = new DOMDocument();
$document->loadHTML($site_html);
$all_of_the_h2_tags = $document->getElementsByTagName('h2');
Read more about PHP's DOM functions for what to do from here, such as grabbing other tags, creating new HTML out of bits and pieces of the DOM, and displaying that on your own site.
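As a rough illustration of the next step, the sketch below collects the text of every h2 and the markup of one div by its ID. The sample HTML string, the ID "about", and the variable names are invented for the example; in practice the string would come from file_get_contents as above.

```php
<?php
// Stand-in for the downloaded page (hypothetical content).
$siteHtml = '<html><body>'
    . '<h2><a href="/post-1">First post</a></h2>'
    . '<div id="about"><p>About this blog</p></div>'
    . '<h2><a href="/post-2">Second post</a></h2>'
    . '</body></html>';

libxml_use_internal_errors(true); // real-world HTML tends to trigger parser warnings
$document = new DOMDocument();
$document->loadHTML($siteHtml);
libxml_clear_errors();

// Text of every h2 on the page.
$headings = [];
foreach ($document->getElementsByTagName('h2') as $h2) {
    $headings[] = trim($h2->textContent);
}

// Full markup of the div with a given ID, located via XPath.
$xpath = new DOMXPath($document);
$div = $xpath->query('//div[@id="about"]')->item(0);
$divHtml = $div !== null ? $document->saveHTML($div) : '';

print_r($headings);
echo $divHtml, "\n";
```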

Your first step would be to use cURL to make a request to the other site and bring down the HTML of the page you want to access. Then comes the part of parsing the HTML to find all the content you're looking for. You could use a bunch of regular expressions and probably get the job done, but the Stack Overflow crew might frown at you. You could also take the resulting HTML and use the DOMDocument object and loadHTML to parse the HTML and extract the content you want.
Also, if you control both sites, you can set up a special page on the first site (www.blog.com) with exactly the information you need, properly formatted either in HTML you can output directly, or XML that you can manipulate more easily from www.mysite.com.
