Save html file as PDF

Save html file as PDF - php

I'm using a PHP output buffer to capture a dynamic 'Data Review' page and save it to the server as an HTML file. I would now like to create a PDF from that stored HTML file. Every solution I've looked at requires putting the HTML code into a variable, whereas I have an .html file that I want to convert to PDF automatically, and I can't seem to find a solution.
The overall idea here is to supply the user a 'copy' of the data review via email, so I assumed a PDF would be best, but if there are any other suggestions, I would happily consider something else.
Any help would be greatly appreciated.
Thank you!

I've looked heavily into generating PDFs in PHP and so here is what I've found over a few years...
PDF Conversion tools
FPDF
This option is really good if you want to generate a PDF file using the PDF method (I will coin it this because you literally generate the PDF piece by piece).
Features include:
Choice of measure unit, page format and margins
Page header and footer management
Automatic page break
Automatic line break and text justification
Image support (JPEG, PNG and GIF)
Colors
Links
TrueType, Type1 and encoding support
Page compression
Notes
Performance: Fast
Cost: Free
Ease of use: Difficult
Difficult to use unless you play a lot with it.
Good documentation.
Other:
Duplication of files (need to have HTML version of a page and an FPDF version of a page if you need to generate PDFs)
MPDF
This option is really good if you want to generate a PDF file from HTML and CSS and still have additional and extensive PDF customization.
Features include:
PDF generation from UTF-8 encoded HTML
It is based on FPDF and HTML2FPDF with a number of enhancements
Notes
Performance: Mediocre
Not the fastest but does the job
Cost: Free
Ease of use: Easy
Hardest part is knowing what is and is not valid HTML and CSS for MPDF
Great documentation.
Not all CSS is supported and some CSS is extended causing some confusion
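Since the question starts from an .html file already saved on the server, here is a minimal sketch of how that file could be fed to a library that expects HTML in a variable. The helper name `loadSavedHtml` and the temporary file are hypothetical; the commented mPDF calls assume mPDF 7 or later installed through Composer.

```php
<?php
// Hypothetical helper: reads a saved HTML file into a string so it can be
// handed to any converter that expects HTML in a variable.
function loadSavedHtml(string $path): string
{
    if (!is_readable($path)) {
        throw new RuntimeException("Cannot read HTML file: $path");
    }
    $html = file_get_contents($path);
    if ($html === false) {
        throw new RuntimeException("Failed to load HTML file: $path");
    }
    return $html;
}

// Demo: a temporary file stands in for the page saved from the output buffer.
$path = tempnam(sys_get_temp_dir(), 'review');
file_put_contents($path, '<h1>Data Review</h1>');
$html = loadSavedHtml($path);
unlink($path);
echo strlen($html), ' bytes of HTML loaded', PHP_EOL;

// With mPDF available (assumption), the string could then be converted:
// $mpdf = new \Mpdf\Mpdf();
// $mpdf->WriteHTML($html);
// $mpdf->Output('/path/to/review.pdf', \Mpdf\Output\Destination::FILE);
```

Images and stylesheets referenced with relative paths may need to be made absolute before conversion, depending on the library.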
PrinceXML
This option is probably the best if you want high performance and high reliability.
Features include:
Powerful Layout
Headers and footers
Page numbers, duplex printing
Tables, lists, columns, floats
Footnotes, cross-references
Web Standards
HTML, XHTML, XML, SVG
Cascading Style Sheets (CSS)
JavaScript/ECMAScript
JPEG, PNG, GIF, TIFF
PDF Output
Bookmarks, links, metadata
Encryption and Document Security
Font embedding and subsetting
PDF attachments
Easy Integration
PHP and Ruby on Rails
Java class for servlets
.NET for C# and ASP
ActiveX/COM for VB6
Fonts & Unicode
OpenType fonts, TrueType and CFF
Kerning, Ligatures, Small Caps
Chinese, Japanese, Korean, Arabic, Hebrew, Hindi and others
Friendly Support
Prompt email support
Web forum, user guide
Regular upgrades
Notes
Performance: Fast
Pricing: $$$
Server License
1 license - $3,800
2 license - $3,420
3 license - $3,040
4 license - $2,850
5+ license - $2,800
OEM (with minimum commitment of 2 years, can be run on any number of servers; so you can create a server farm if you really need)
20,000 documents/month at $5,000
100,000 documents/month at $7,500
500,000 documents/month at $10,000
They also have an academic discount of 50% at $1,900 and a Desktop License for $495 as well as other plans (see here for full list)
Ease of use: Easy
I have not used PrinceXML directly (pricey), but we are currently looking into this as an option for our business.
DocRaptor
This option is really good if you want a high quality API. This is a cloud-hosted option for creating PDF and XLS files. Uses PrinceXML in the backend.
Features include:
You just send HTML, JS, and CSS
Uptime guaranteed
Unlimited document size
Expert support, including document debugging
Pretty much offers everything that PrinceXML does, but double check with their support or documentation for anything specific you may require.
API-based: Works with PHP, NodeJS, Ruby, Python, Java, C#
Notes
Performance: Fast
Depends on internet connection, so if your internet goes down, so does this part of your code.
Pricing: $ - $$$
Currently, their pricing plans are as follows (taken from their website):
Basic - 125 docs/mo - $15/mo
Professional - 325 docs/mo - $29/mo
Premium - 1,250 docs/mo - $75/mo
Max - 5,000 docs/mo - $149/mo
Bronze - 15,000 docs/mo - $399/mo
Silver - 40,000 docs/mo - $1,000/mo
Gold - 100,000 docs/mo - $2,250/mo
Enterprise - ∞ docs/mo - unlisted (contact them)
Ease of use: Very easy
Probably the easiest because you don't actually deal with the document or setup, etc. You just send your files and get a PDF back.
Great documentation
I contacted their support in the past and it was actually very helpful.
They use a proprietary JavaScript engine that allows you to use delayed or asynchronous JavaScript
wkhtmltopdf
This option is really good if you want the next best thing behind the purchased options above (PrinceXML and DocRaptor).
Features include:
[Uses] the Qt WebKit rendering engine
Create your HTML document that you want to turn into a PDF (or image). Run your HTML document through the tool.
Notes
Performance: Fast
Cost: Free
Ease of use: Easy
Uses command line unless you use a library such as the one created by MikeHaertl
We currently use this option and find it performs very well and has great support for HTML tags and CSS properties.
If you need to send variables to the PDF pages that need to be generated, you cannot use $_SESSION variables, because the conversion is run through the command line and uses a separate browser. You need to pass all your variables as $_GET variables.
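A minimal sketch of the notes above, assuming the `wkhtmltopdf` binary is installed and on the PATH. The helper name `buildWkhtmltopdfCommand`, the URL and the parameter names are hypothetical. It passes values through the query string instead of $_SESSION and escapes every shell argument.

```php
<?php
// Hypothetical helper: builds the shell command that asks wkhtmltopdf to
// render a page (a URL or a local .html path) into a PDF file.
function buildWkhtmltopdfCommand(string $source, array $params, string $pdfPath): string
{
    // wkhtmltopdf runs as a separate browser without the user's session,
    // so anything the page needs is passed in the query string ($_GET).
    if ($params !== []) {
        $separator = (strpos($source, '?') === false) ? '?' : '&';
        $source .= $separator . http_build_query($params);
    }
    // escapeshellarg() keeps supplied values from breaking the command line.
    return 'wkhtmltopdf ' . escapeshellarg($source) . ' ' . escapeshellarg($pdfPath);
}

$command = buildWkhtmltopdfCommand(
    'https://example.com/data-review.php',      // assumed URL
    ['review_id' => 42, 'token' => 'abc123'],   // assumed parameters
    '/tmp/data-review.pdf'
);
echo $command, PHP_EOL;
// To actually run it: exec($command, $output, $exitCode);
```

For an HTML file that is already stored on the server, the local path can be passed as `$source` with an empty parameter array.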
Other options: Many taken from this question
Cloud-based
HTM2PDF: Source
PDFmyURL: Source
PDFCrowd: Source 1, Source 2
PDFLayer: Source
RotativaHQ: Source
Client-side
jsPDF: Source
Server-side
TCPDF - Many people recommended this option: Source
ZendPDF - Part of Zend Framework: Source
flying-saucer - Java library usable via system(): Source 1, Source 2
CutyCapt: Source
PhantomJS: Source
Snappy: Source
DOMPDF: Source
HTML2PDF: Source
PDFReactor
HTML2PS - No solid links for this project, so I linked to Google search for it
Apache FOP
PHP - PHP has its native library for creating PDFs, I assume this is probably one of the most difficult ways to go about doing this, but if you're really adventurous, why not?
PDFLib - Many other libraries are based off this one
ReportLab - Python-based
iText - Java-based: Source
ActivePDF
WeasyPrint - Python-based. This is apparently really good?
xHTML2PDF - Python-based
Other options
We deal with many vendors. Some vendors send us PDFs for their invoices or other documents while others send us HTML emails (with all our invoice information in it), and some others even send us links to the invoices.
The easiest option is to create the document in HTML and send users a link to that document (secured obviously). This would allow users to view the invoice whenever they want (and from any device with a browser) and would also allow them to print from the browser if needed. This method also generates traffic to your website which is usually also beneficial to the business.
What we've done in the past is create a link to the file on the website (secured) so that they can view it in the browser, and then have a button to download the invoice (which just downloads a PDF version of that webpage generated with one of the PDF Conversion tools listed above - currently wkhtmltopdf).
In my opinion, the best method would be to combine all delivery approaches into one. Send an email with the file information in the email's HTML content and attach a PDF of that file. Inside the header portion of the email content (at the top of the email), send a link giving the recipient direct access to the webpage containing all the information (located within their account in your secure portal). This allows them to view it in the browser just in case they can't view it properly in their email and in case they don't have a PDF viewer (I know it's rare nowadays, but you'd be surprised just how many people out there have outdated systems - we still need to send faxes to some clients because they still don't have emails; yes still now in 2017, sigh...). On your website, also provide them with a download link for the PDF document (which would again just take the page they are currently on and convert it into a PDF and automatically download it through the browser).
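The combined approach can be sketched with plain PHP, assuming the PDF has already been generated by one of the tools above. The helper name `buildReviewEmail`, the addresses and the portal URL are hypothetical, and a mail library would normally be preferable to hand-built MIME in production.

```php
<?php
// Hypothetical helper: builds the headers and body of a multipart e-mail with
// an HTML part (portal link at the top) and the PDF as an attachment.
// $from and $pdfName are assumed to be trusted values (no CR/LF or quotes).
function buildReviewEmail(string $from, string $htmlContent, string $portalUrl, string $pdfBytes, string $pdfName): array
{
    $boundary = 'b_' . bin2hex(random_bytes(12));
    $link = htmlspecialchars($portalUrl, ENT_QUOTES);
    $html = '<p><a href="' . $link . '">View this document online</a></p>' . $htmlContent;

    $headers = "From: $from\r\n"
        . "MIME-Version: 1.0\r\n"
        . "Content-Type: multipart/mixed; boundary=\"$boundary\"";

    $body = "--$boundary\r\n"
        . "Content-Type: text/html; charset=UTF-8\r\n"
        . "Content-Transfer-Encoding: base64\r\n\r\n"
        . chunk_split(base64_encode($html))
        . "--$boundary\r\n"
        . "Content-Type: application/pdf; name=\"$pdfName\"\r\n"
        . "Content-Transfer-Encoding: base64\r\n"
        . "Content-Disposition: attachment; filename=\"$pdfName\"\r\n\r\n"
        . chunk_split(base64_encode($pdfBytes))
        . "--$boundary--\r\n";

    return [$headers, $body];
}

[$headers, $body] = buildReviewEmail(
    'billing@example.com',                       // assumed sender
    '<p>Your data review is attached.</p>',
    'https://example.com/portal/reviews/42',     // assumed portal link
    '%PDF-1.4 demo',                             // placeholder for real PDF bytes
    'data-review.pdf'
);
// mail('client@example.com', 'Your data review', $body, $headers);
```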
I hope this helps!

I would like to add another option in the probable solution list. Aspose.PDF Cloud API also offers features to convert HTML to PDF. It provides SDKs for all popular programming languages.
PHP sample code for HTML to PDF conversion:
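// $pdfApi is assumed to be an already-configured Aspose PdfApi client instance
// Zip archive in storage containing the HTML file and its resource files
$name = "HtmlWithImage.zip";
// Name of the HTML file inside the archive
$html_file_name = "HtmlWithImage.html";
$height = 650;
$width = 250;
$src_path = $name;
$response = $pdfApi->getHtmlInStorageToPdf($src_path, $html_file_name, $height, $width);
print_r($response);
echo "Completed!!!!";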
I work with Aspose as developer evangelist.

Related

Convert PDF to HTML version 3.2 with images and html file in a folder

I hope you are doing well.
I need a PHP library that converts a PDF file containing images into an HTML file and meets the following requirements:
The HTML file needs to be HTML 3.2 compatible
The images in the PDF file are saved with a .jpg extension
The correct fonts from the PDF are used in the HTML file
The result is a single folder containing the images and the HTML file
I have tried most of the PHP libraries, but most of them do NOT do what I need.
Please let me know about a library that meets all 4 of the above requirements (image attached for reference).
Waiting for your kind responses.
Thanks

I am not sure, but here is a PHP library I found.
Here

Try this:
http://www.pdfaid.com/pdf-to-html.aspx
Or this:
http://webdesign.about.com/od/pdf/tp/tools-for-converting-pdf-to-html.htm
Or this...
http://www.pdfconvertonline.com/pdf-to-html-online.html
There are plenty of options available to you; the secret is to use a newfangled thing called a Search Engine, such as Bing or Google.
You will also do well to research on Stack Overflow before asking your question:
1) HTML 3.2 was superseded in 1997, which is very nearly twenty years ago. Why on earth do you still need a comparatively ancient technology when there are far better alternatives available, such as XHTML, HTML 4.01 and HTML5?
2) Please read How can I extract embedded fonts from a PDF as valid font files?
3) Also to extract images you can use:
http://www.makeuseof.com/tag/extract-images-pdf-files-save-windows/
but again, there are several options available to you if you care to look for them.
You seem to imply a fundamental misunderstanding about HTML; there are several different ways of getting any desired result with HTML. You have a PDF file and you want it to look a certain way, this look depends on the browser you are looking at it on. For example if you use a PDF to HTML converter as linked above you will very probably find that the output will look different on Internet Explorer 7 versus on Firefox versus Internet Explorer 10. There is no one way of writing output on HTML or with CSS.
If you want a custom built library to do your specific task then you will need to employ a professional to do it, or you will need to code it yourself. This obviously should be charged to the client for requiring a technology that is extremely outdated. You can probably search github for a similar library (the one linked by CK Khan looks like what you're after) and then fork it and make your own variation for your needs. I very much doubt anyone is going to put time into developing a system to output HTML 3.2 from a PDF, and even less likely to develop this system for free and to your exact specifications.
It also appears that you can not directly incorporate font families into the <font> tag in HTML 3.2, only being able to edit size and colour of fonts. You can use CSS1 font-family to show font families. See here.

How do I extract the text layer and background layer from a PDF?

In my project I have to build a PDF viewer in HTML5/CSS3, and the application has to allow users to add comments and annotations. In fact, I have to build something very similar to crocodoc.com.
At the beginning I was thinking of creating images from the PDF and letting users create areas and post comments associated with those areas. Unfortunately, the client also wants to navigate within the PDF and add comments only on allowed sections (for example, paragraphs or selected text).
So now I am facing one problem: how to get the text, and what the best way to do it is. If anybody has some clues on how I can achieve this, I would appreciate it.
I tried pdftohtml, but the output doesn't look like the original document, which is really complex (example of document). Even this one doesn't really reflect the output, but it is much better than pdftohtml.
I'm open to any solution, with a preference for command-line tools under Linux.

I've been down the same road as you, with even much more complex tasks.
After trying out everything I ended up using C# under Mono (so it runs on linux) with iTextSharp.
Even with a very complete library such as iTextSharp, some tasks required a lot of trial and error :)
Extracting the text from a page is easy (check the snippet below); however, if you intend to keep the text coordinates, fonts and sizes, you will have more work to do.
int pdf_page = 5;
string page_text = "";
PdfReader reader = new PdfReader("path/to/pdf/file.pdf");
PRTokeniser token = new PRTokeniser(reader.GetPageContent(pdf_page));
while (token.NextToken())
{
    if (token.TokenType == PRTokeniser.TokType.STRING)
    {
        // A string token holds a run of text shown on the page
        page_text += token.StringValue;
    }
    else if (token.StringValue == "Tj")
    {
        // "Tj" is the show-text operator; add a space between text runs
        page_text += " ";
    }
}
Do a Console.WriteLine(token.StringValue) on all tokens to see how paragraphs of text are structured in PDFs. This way you can detect coordinates, font, font size, etc.
Addition:
Given the task you are required to do, I have a suggestion for you:
Extract the text with coordinates, font families and sizes: all the information about each paragraph. Then do a PDF-to-images conversion, and in your online viewer, apply invisible selectable text over the paragraphs on the image where needed.
This way your users can select a part of the text where needed, without the need of reconstructing the whole PDF in html :)
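The overlay idea could look roughly like this on the output side. This is a sketch in PHP rather than C#, and the helper name `renderPageOverlay` and the shape of the `$blocks` array are assumptions. Coordinates are taken to be pixels from the top-left corner of the page image, so PDF coordinates (which start at the bottom-left by default) would need converting first.

```php
<?php
// Hypothetical helper: overlays transparent, selectable text on a page image.
// Each $block is ['text' => ..., 'x' => ..., 'y' => ..., 'size' => ...] in
// pixels relative to the image's top-left corner (an assumed data shape).
function renderPageOverlay(string $imageUrl, int $width, int $height, array $blocks): string
{
    $html = sprintf(
        '<div style="position:relative;width:%dpx;height:%dpx">'
        . '<img src="%s" width="%d" height="%d" alt="">',
        $width, $height, htmlspecialchars($imageUrl, ENT_QUOTES), $width, $height
    );
    foreach ($blocks as $block) {
        // color:transparent hides the glyphs but keeps the text selectable.
        $html .= sprintf(
            '<span style="position:absolute;left:%dpx;top:%dpx;font-size:%dpx;color:transparent">%s</span>',
            $block['x'], $block['y'], $block['size'],
            htmlspecialchars($block['text'], ENT_QUOTES)
        );
    }
    return $html . '</div>';
}

$html = renderPageOverlay('page-5.png', 800, 1100, [
    ['text' => 'First paragraph of the page', 'x' => 60, 'y' => 120, 'size' => 14],
    ['text' => 'Second paragraph', 'x' => 60, 'y' => 180, 'size' => 14],
]);
echo $html, PHP_EOL;
```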

I recently researched and discovered a native PHP solution to achieve this using FOSS. The FPDI PHP class can be used to import a PDF document for use with either the TCPDF or FPDF PHP classes, both of which provide functionality for creating, reading, updating and writing PDF documents. Personally, I prefer TCPDF as it provides a larger feature set (TCPDF vs. FPDF), a richer API (TCPDF vs. FPDF), more usage examples (TCPDF vs. FPDF) and a more active community forum (TCPDF vs. FPDF).
Choose one of the aforementioned classes, or another, to programmatically handle PDF documents. Focusing on both current and possible future deliverables, as well as the desired user experience, decide where (e.g. server - PHP, client - JavaScript, both) and to what extent (feature driven) your interactive logic should be implemented.
Personally, I would use a TCPDF instance obtained by importing a PDF document via FPDI to iteratively inspect, translate to a common format (XML, JSON, etc.) and store the resulting representation in relational tables designed to persist data pertinent to the desired level of document hierarchy and detail. The necessary level of detail is often dictated by a specifications document and its mention of both current and possible future deliverables.
Note: In this case, I strongly advise translating documents and storing them in a common format to create a layer of abstraction and transparency. For example, a possible and unforeseen future deliverable might be to provide the same application functionality for users uploading Microsoft Word documents. If the uploaded Microsoft Word document was not translated and stored in a common format then updates to the Web service API and dependent business logic would almost certainly be necessary. This ultimately results in storing bloated, sub-optimal data and inefficient use of development resources in designing, developing and supporting multiple translators. It would also be an inefficient use of server resources to translate outbound data for every request, as opposed to translating inbound data to an optimal format only once.
I would then extend the base document tables by designing and relating additional tables for persisting functionality specific document asset data such as:
Versioned Additions / Edits / Deletions
What
Header / Footer
Text
Original Value
New Value
Image
Page(s) (one, many or all)
Location (relative - textual anchor, absolute - x/y coordinates)
File (relative or absolute directory or url)
Brush (drawing)
Page(s) (one, many or all)
Location (relative - textual anchor, absolute - x/y coordinates)
Shape (x/y coordinates to redraw line, square, circle, user defined, etc.)
Type (pen, pencil, marker, etc.)
Weight (1px, 3px, 5px, etc.)
Color
Annotation
Page
Location (relative - textual anchor, absolute - x/y coordinates)
Shape (line, square, circle, user defined, etc.)
Value (annotation text)
Comment
Target (page, another text/image/brush/annotation asset, parent comment - threading)
Value (comment text)
When
Date
Time
Who
User
Once some, all or more, of the document and its asset data has a place to persist I would design, document and develop a PHP Web service API to expose CRUD and PDF document upload functionality to the UI consumer, while enforcing core business rules. At this point, the remaining work now lies on the Client-side. Currently, I have relational tables persisting both a document and its asset data, as well as an API exposing sufficient functionality to the consumer, in this case the Client-side JavaScript.
I can now design and develop a Client-side application using the latest Web technologies such as HTML5, JavaScript and CSS3. I can upload and request PDF documents using the Web service API and easily render the returned common format out to the browser however I decide (probably HTML in this case). I can then use 100% native JavaScript and/or 3rd party libraries for DOM helper functionality, creating vector graphics to provide drawing and annotation features, as well as access and control functional and stylistic attributes of currently selected document text and/or images. I can provide a real-time collaborative experience by employing WebSockets (before mentioned WebService API does not apply), or a semi-delayed, but still fairly seamless experience using XMLHttpRequest.
From this point forward the sky is the limit and the ball is in your court!

It's a hard task you're trying to accomplish.
To read text from a PDF, have a look at PEAR's PDF_Reader proposal code.

There's also very extensive documentation around Zend_PDF(), which also allows the loading and parsing of a PDF document. The various elements of the PDF can be iterated over and thus transformed to HTML5 or whatever you like. You may even embed the annotations from your website into the PDFs and vice versa.
Still, you have been given no easy task. Good Luck.

pdftk is a very good tool to do things like that (I don't know if it can do exactly this task).
http://www.pdflabs.com/docs/pdftk-cli-examples/

Creating printable content with Php/JavaScript/Html/CSS

I work for a care centre that would like a feature on their website where friends and family can choose from a selection of care cards to deliver to someone they know. They will be able to choose a title, an image and type in some text on the card that we assemble and deliver. They need me to make an application for them that assembles the cards in a printer-friendly fashion (placing text and images in the right areas) that they will print and fold before delivery.
Image of what I am trying to create: http://i.imgur.com/f8GnD.png
Reading about how to do this I realize that I have two issues:
Size of card on-screen can't be fixed due to printer DPI
Should I use html/CSS to make a table with 4 cells to create this card? Php image library? JavaScript?
Any help would be great.

I have the best luck, in terms of printing, with PDFs. The document format is nice, too, because it is portable and the user may choose to print somewhere other than where they accessed your site.
The best PDF-generating library I've used for PHP is fPDF: http://www.fpdf.org/
PDFs are great for printing full-page documents. All but the most ancient operating systems provide users the ability to open and print PDFs, and because PDF is a document format the printed output is fairly consistent between systems and printers.
The other route you suggest is certainly possible - you can build it up using HTML and CSS. There are serious drawbacks to this, however. Foremost, each user is going to have varying printer settings in their browser, and the browser is not configured by default to be good to your full-page printing. Most user agents add page numbers, margins, the date & time, the URL.... in short, your print from the browser is going to rely on the user tinkering with their browser print settings. There is nothing you can do to influence these settings from your end.

There are third-party utilities that generate PDFs on the server, based on your HTML. PDFs have solved many print-related issues internally so you don't have to worry about them yourself.

PHP create PDF invoice

Hi does anyone know how I can create a nicely formatted PDF invoice through PHP?
Ideally I'm looking for something with a header and then an itemised listing of the products with some sort of table around. After a quick Google I would be comfortable with generating a PDF but to try and style it nicely would be another thing altogether.
Thanks

I would recommend creating an html / css representation of what you want a PDF of and using that to generate the PDF. There are dozens of applications to handle the conversion, and there is a good thread with a lot of answers here: Convert HTML + CSS to PDF with PHP?
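As a rough illustration of the HTML-first approach, the sketch below builds a header and an itemised table that could then be passed to whichever converter you choose. The helper name `renderInvoiceHtml` and the item fields are hypothetical, and floats are used for amounts only to keep the example short.

```php
<?php
// Hypothetical helper: renders an invoice as an HTML header plus table.
// Each $item is ['description' => string, 'qty' => int, 'unit_price' => float].
function renderInvoiceHtml(string $invoiceNumber, array $items): string
{
    $rows = '';
    $total = 0.0;
    foreach ($items as $item) {
        $lineTotal = $item['qty'] * $item['unit_price'];
        $total += $lineTotal;
        $rows .= sprintf(
            '<tr><td>%s</td><td>%d</td><td>%s</td><td>%s</td></tr>',
            htmlspecialchars($item['description'], ENT_QUOTES),
            $item['qty'],
            number_format($item['unit_price'], 2),
            number_format($lineTotal, 2)
        );
    }
    return '<h1>Invoice ' . htmlspecialchars($invoiceNumber, ENT_QUOTES) . '</h1>'
        . '<table border="1" cellpadding="4">'
        . '<tr><th>Item</th><th>Qty</th><th>Unit price</th><th>Total</th></tr>'
        . $rows
        . '<tr><td colspan="3">Total</td><td>' . number_format($total, 2) . '</td></tr>'
        . '</table>';
}

$html = renderInvoiceHtml('INV-001', [
    ['description' => 'Widget', 'qty' => 2, 'unit_price' => 9.50],
    ['description' => 'Setup fee', 'qty' => 1, 'unit_price' => 25.00],
]);
echo $html, PHP_EOL;
```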

I use TCPDF (see http://www.tcpdf.org/) for this: it's pretty capable and not too painful to get set up. I will say that depending on your data source you may have some issues. In my case my data is sourced from a SOAP interface to my accounting system and I use CodeIgniter for my app, so I can do this:
$address = $extraclient->get_company_address();
// generate the PDF invoice
$this->load->library('pdfinvoice');
// set document information
$this->pdfinvoice->SetSubject("Invoice " . $data_invoice['code_invoice']);
// add a page
$this->pdfinvoice->AddPage();
$this->pdfinvoice->StartPageOffset();
// write the client's details out
$width = $this->pdfinvoice->GetPageWidth()/2;
$margins = $this->pdfinvoice->getMargins();
$this->pdfinvoice->SetFont('times', 'b', $this->pdfinvoice->bigFont );
$this->_row($width, array("From:", "To:"));
$this->pdfinvoice->SetFont('times', 'i', $this->pdfinvoice->smallFont );
$this->_row($width, array("MY NAME", $customer['name_contact']));
$this->_row($width, array($address['phone'], $customer['name_customer']));
$this->_row($width, array($address['street'], $customer['address1_street']));
$this->_row($width, array($address['city']. ", ".$address['state']." ".$address['zipcode'],
$customer['address1_city']. ", ".$customer['address1_state']." ".$customer['address1_zip
The full code is quite frankly too long to insert here, but you should get the idea, and you get fairly precise layout control.

You may want to look at html2Pdf. It is a PHP class based on FPDF that allows you to create a PDF file from HTML. So just format your text using HTML and then create its PDF. It's very flexible and gives great control.
SourceForge Link

Checkout the php-pdf-invoice library from composer. The library is already designed for generating an invoice so it's easy and quick to integrate!
composer require quickshiftin/php-pdf-invoice
You can generate PDF files that look like this
There is a usage example on the README. Essentially you implement two interfaces, one for orders, another for order items. You can configure fonts and colors to your liking as well.

PDF generation tends to be tricky. As long as the invoice is relatively simple, I would recommend creating a template image file and use imagettftext to print strings over it (or any other library).
For image file to PDF conversion imagemagick tool suite can be used (as long as it is available on the server):
exec('convert -units PixelsPerInch -density 150 ' . escapeshellarg($filename . '.png') . ' ' . escapeshellarg($filename . '.pdf'));
For me the whole approach worked smoothly.

Even though you can generate pdf documents programmatically, we have found out that it's best if you just generate an HTML page of the PDF document and use some kind of a tool / service to convert the HTML to PDF.
If you are looking for an app that you can host you can check these:
wkhtmltopdf - uses webkit to render html to pdf. works ok with some caveats.
PhantomJS - DISCONTINUED, I would not suggest using it.
Headless Chrome - Best of all but by itself it's not a solution. You need a client module according to your programming language environment.
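For the Headless Chrome option, one way to call it from PHP without a client module is to shell out to the browser. This is a sketch only: the binary name `google-chrome` is an assumption that varies by system, and the helper name `buildChromePdfCommand` and the paths are hypothetical.

```php
<?php
// Hypothetical helper: builds a command line that asks headless Chrome to
// print a page (a URL or a local file path) to a PDF file.
function buildChromePdfCommand(string $source, string $pdfPath, string $binary = 'google-chrome'): string
{
    return escapeshellcmd($binary)
        . ' --headless --disable-gpu'
        . ' --print-to-pdf=' . escapeshellarg($pdfPath)
        . ' ' . escapeshellarg($source);
}

$command = buildChromePdfCommand('https://example.com/invoice/42', '/tmp/invoice-42.pdf');
echo $command, PHP_EOL;
// To actually run it: exec($command, $output, $exitCode);
```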
In case you do not wish to handle the utility yourself, there are a lot of free / paid online services, some of them listed below:
PDF Layer (tested - good)
Restpack (tested - good)
DocRaptor
HTMLPDFAPI
HTML to PDF Rocket

creating pdf from web page with SWF files

I am trying to generate a pdf from a web page which has pictures and swf files.
Final pdf should have pictures (swf should be converted into image, last frame is sufficient).
I am able to generate pdf when only images are there but i am stuck in creating pdf when the web page has swf files.

I've used wkhtmltopdf before to render PDFs programmatically from web sites. I'm not sure if it'll cope with SWF, but it may, since it uses a version of WebKit compiled into Qt.

You might be able to use wkhtmltopdf --enable-plugins. However, according to this bug report (http://code.google.com/p/wkhtmltopdf/issues/detail?id=48) it might not work with the Flash plugin (Java, however, does!).
Another option is running a browser in headless mode, or on a virtual X. Firefox3 works supposedly if you use the extension "CommandLinePrint".
Xvfb :2 -screen 0 1600x1200x24 &
firefox --display=localhost:2.0 -print http://flashgames.com -printmode pdf -printfile '/tmp/test.pdf'
Infos stolen from http://spielwiese.la-evento.com/xelasblog/archives/31-Headless-Firefox-als-HTML-to-PDF.html (in German however).
But there are a few more guides like this ("headless browser, HTML to PDF"). I would totally link to one of the dupes here on Stackoverflow. But I'm too lazy to search right now.

Since you are wanting to output the target page as a PDF I would look at using .rdlc (Report Definition Language Client). It is part of the Microsoft.Reporting namespace and is designed to work with asp.net. It is freely usable and redistributable.
In many cases the layout of a web page is not "printer friendly". By using this technique you can re-arrange the layout and spacing of the PDF output to a presentation that is more printer friendly.
This will not "directly" convert your page to a PDF, but rather allow you to adapt your page layout and data to a dataset and use that to build a report. That report can then be output programmatically at runtime using the reportviewer control. If this approach interests you, let me know and I will be glad to provide more help getting you through setting it up and using.
