I am doing a bulk generation of pdf files based on templates and I ran into big performance issues pretty fast.
My current scenario is as follows:
get data to be filled from db
create fdf based on single data row and pdf form
write .fdf file to disk
merge the pdf with fdf using pdftk (fill_form with flatten command)
continue iterating over rows until all .pdf's are generated
all the generated files are merged together in the end and the single pdf is given to the client
I use passthru to send the raw output to the client (which saves the time of writing a file), but this is only a small performance improvement. The total operation time is about 50 seconds for 200 records, and I would like to get it down to 10 seconds or less.
The ideal scenario would be to process all these PDFs in memory instead of writing each one to a separate file, but then I could not produce the output, because I can't pass that kind of in-memory data to an external tool like pdftk.
One other idea was to generate one big .fdf file with all those rows, but it looks like that is not allowed.
Am I missing something very trivial here?
I'm thankful for any advice.
PS. I know I could use some good library like pdflib but I am considering only open licensed libraries now.
EDIT:
I am trying to figure out the syntax for building an .fdf file with multiple pages that uses the same pdf as a template. I have spent a few hours on it and couldn't find any good documentation.
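For reference, here is a minimal sketch of what a single-record FDF looks like, written in Python purely for illustration (the thread itself is about PHP, and the same string building carries over). The helper names `escape_fdf` and `build_fdf` are hypothetical, and the sketch assumes field values fit in Latin-1. It builds one FDF per data row and does not address the multi-page case asked about above.

```python
def escape_fdf(value: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return (
        value.replace("\\", "\\\\")
        .replace("(", "\\(")
        .replace(")", "\\)")
    )


def build_fdf(fields: dict) -> bytes:
    """Return a minimal FDF document for one data row (field name -> value)."""
    entries = "".join(
        f"<< /T ({escape_fdf(name)}) /V ({escape_fdf(str(value))}) >>\n"
        for name, value in fields.items()
    )
    fdf = (
        "%FDF-1.2\n"
        "1 0 obj\n"
        "<< /FDF << /Fields [\n"
        f"{entries}"
        "] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )
    # Assumption: values are Latin-1; other characters are replaced with "?".
    return fdf.encode("latin-1", errors="replace")
```

Because the result is a byte string, it can stay in memory and does not have to be written to disk before being handed to another tool.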
After being faced with the same problem for a long time (I wanted to generate my PDFs with LaTeX), I finally decided to switch to another crude but effective technique:
I generate my PDFs in two steps: first I generate HTML with a template engine like Twig or Smarty; second, I use mPDF to generate PDFs from it. I tried many other HTML-to-PDF frameworks and ended up using mPDF; it is very mature and has been developed for a long time (frequent updates, rich functionality). The benefit of this technique: you can use CSS to design your documents (mPDF supports a large part of CSS), which brings the usual CSS benefits (http://www.csszengarden.com), and you can generate dynamic tables very easily.
mPDF parses the HTML tables, looks for the thead and tfoot elements, and repeats them on each page if a table is longer than one page. You can also define page header and page footer elements with dynamic values such as the page number.
I know this detour seems like a workaround, but to be honest, no LaTeX or PDF engine is as strong and simple as HTML!
Try a different, less complex library like FPDF (http://www.fpdf.org/).
I find it quite good and lightweight.
Always find libraries that are small and only do what you need them to do.
The bigger the library the more resources it consumes.
This won't help your multiple-page problem, but I notice that pdftk accepts the - character to mean 'read from standard input'.
You may be able to send the .fdf to the pdftk process via its stdin, in order to avoid having to write it to disk.
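A rough sketch of that idea, in Python for illustration (in PHP the equivalent would go through something like proc_open). It assumes pdftk is installed and on the PATH, and that `-` is accepted both for the FDF input and for the output; the function names are hypothetical.

```python
import subprocess


def build_fill_command(template_pdf: str) -> list:
    """pdftk command that reads the FDF from stdin and writes the PDF to stdout."""
    return ["pdftk", template_pdf, "fill_form", "-", "output", "-", "flatten"]


def fill_form(template_pdf: str, fdf_bytes: bytes) -> bytes:
    """Run pdftk once, passing the FDF on stdin and returning the flattened PDF."""
    result = subprocess.run(
        build_fill_command(template_pdf),
        input=fdf_bytes,         # FDF data goes to pdftk's stdin
        stdout=subprocess.PIPE,  # filled PDF comes back on stdout
        check=True,              # raise if pdftk exits with an error
    )
    return result.stdout
```

This removes the temporary .fdf and .pdf files, but it still starts one pdftk process per row, so the process start-up cost remains.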
Related
Our company allows its clients to view reports via our website. The pages are php based and the data is collected from MySQL. These reports were written a long time ago and include inline css. The pages themselves look fine, but the print version is lacking. I want to take the reports and create visually appealing "printable" pages that contain our branding.
I have found three solutions so far.
@media Print Stylesheets
This is the easiest method, but does not give me complete layout control. I want landscape mode and need to control where the page breaks occur so this method has been eliminated from my list of possible solutions. The reports are built by looping through PHP data, so while I can always put a page break after a or for example, I can't stop the page from breaking before it gets to the next set of data.
TCPDF/FPDF
From what I have seen, these classes will give me all of the control I need to customize a PDF. The challenge is that this appears to be a little more advanced than my programming skills allow, and all of the inline CSS contained within the HTML tables may throw off the formatting.
FDF
I am leaning towards this method, if I understand it correctly. First I would create a PDF form and define all of the fields to be populated by the MySQL data. Then I would create an FDF file that would populate the form template with the data from the database. It seems easier to me to create a visually pleasing form as a PDF and then populate that form using this method, rather than create the entire PDF from scratch using method 2.
Does it sound like I am on the right track? Are any of these methods "easier" than the other?
Any help is greatly appreciated.
TCPDF has the most control of each page which is what I am looking for. It is extremely sensitive when writing HTML, but that is the only downside I have found so far.
There's this excellent answer on SO already.
If you're looking for easy, my money is on mPDF. I found it to be the easiest, and essentially an out-of-the-box solution (often zero server configuration to do).
I think you should try out wkhtmltopdf.
https://code.google.com/p/wkhtmltopdf/
As for the TCPDF/FPDF pagination issue, you can see this other question for the solution provided and use the flow in it to sort yours out.
TCPDF / FPDF - Page break issue
Just found this other solution as well and think you'll need it
Convert HTML + CSS to PDF with PHP?
For me personally, FPDF works great to fetch data from my database, insert it into the FPDF class and dynamically create PDFs for customers.
I see some people want to write HTML/CSS to create PDFs, but you will always have
differences, as the browser parses the HTML/CSS differently than a PDF library does.
When using FPDF's built-in methods, I have been able to get exactly what I wanted
and haven't seen any issues (yet).
I need to generate a large PDF, 2480 pages to be exact.
Currently I am using InDesign, and while the output is exactly what I want,
I would rather not be involved in the document creation process.
It takes 31 minutes for InDesign to execute the data merge, generate the pdf, save the pdf, and save the pdf.indd file. (I don't really need the pdf.indd file, but I would rather not have to recreate the data merge if something were to happen to the pdf.)
I am hoping for a php, or similar solution. Currently my data is stored in MySQL.
The majority of the pdf is static text, with 19 dynamically driven text fields.
There is one image on the pdf, 75x100px @ 72dpi.
The output needs to be exact, the pdf file is printed and cut in half at 4.25 inches.
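Since most PDF libraries position content in points (1/72 inch) or a unit derived from it, the measurements above can be converted as in this small Python sketch. The helper names are hypothetical, and which unit a given library expects is something to check in its documentation.

```python
POINTS_PER_INCH = 72  # PDF user space: 1 point = 1/72 inch


def inches_to_points(inches: float) -> float:
    """Convert a length in inches to PDF points."""
    return inches * POINTS_PER_INCH


def pixels_to_points(pixels: float, dpi: float) -> float:
    """Convert a pixel dimension at the given resolution to PDF points."""
    return pixels * POINTS_PER_INCH / dpi


cut_line_pt = inches_to_points(4.25)                                   # 306.0
image_size_pt = (pixels_to_points(75, 72), pixels_to_points(100, 72))  # (75.0, 100.0)
```

At 72 dpi one pixel equals one point, so the image is 75 by 100 points, and the cut line sits 306 points from the page edge.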
I have tried TCPDF; while it is fast at generating up to 50 pages, beyond that it would rather die than give me any output. I have also played with mPDF and found it to be ... not as friendly. I have also considered generating many small files and using some utility to merge the smaller PDFs into one large PDF, though that seems like driving around the mountain.
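The "many small files" idea could look roughly like the following Python sketch. It assumes a batch size of 50 pages (the point up to which generation was reported to be fast) and pdftk as the merge utility; both are assumptions, and the function and file names are hypothetical.

```python
def batch_ranges(total_pages: int, batch_size: int) -> list:
    """Split pages 1..total_pages into (first, last) ranges of at most batch_size."""
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


def build_merge_command(part_files: list, output_pdf: str) -> list:
    """pdftk command that concatenates the part files in the given order."""
    return ["pdftk", *part_files, "cat", "output", output_pdf]


ranges = batch_ranges(2480, 50)  # 50 batches, the last one covering pages 2451-2480
parts = [f"part_{i:03d}.pdf" for i in range(1, len(ranges) + 1)]
merge_command = build_merge_command(parts, "all.pdf")
```

Each range would be rendered to its own part file by whichever PDF class is in use, and the merge command would then be run once at the end.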
Any thoughts would be helpful.
You certainly can create documents directly with PHP, but it can be difficult. One method is to use one of the various PDF classes to create the document, as you have found. Another is to create images (using ImageMagick, GD, etc.) and convert those to PDF. (This method is less efficient, as you are creating raster graphics, making the whole PDF page one giant graphic.)
However, I think you should consider simply scripting InDesign. InDesign has the capability to read data in via XML and create the document. This way, the design of your document isn't dependent on your programming abilities and you can still have the power of programmatically creating the document.
When it comes to a huge number of pages in a PDF, LaTeX is always the best answer. Nothing can really handle huge PDF generation as quickly, accurately and elegantly as LaTeX.
Check this question to see how to retrieve your data from the database.
I want to generate PDF from a PHP file that includes HTML controls like textbox, and textarea. I attached CSS in the same. I tried FPDF, DOMPDF and TCPDF, but still I don't get exactly what I want. How do I pass HTML controls with PHP variables and CSS to these libraries?
mpdf is another option that you could try.
EDIT :
Found another solution for it: TCPDF is a FLOSS PHP class for generating PDF documents. It looks like the more dominant library.
Prince XML is a good library (not completely free now).
Others:
If you mean to create a PDF file from PHP, pdflib will help you (as some others suggested).
Otherwise, if you want to convert an HTML page to PDF via PHP, you'll run into
some trouble. For three years I have been trying to do it as best as I
can.
So, the options I know are:
HTML2PS: the same as DOMPDF, but this one converts first to .ps
(Ghostscript), then to whatever format you need (PDF, JPEG, PNG). For
me it is a little better than DOMPDF, but I have the same speed problem. Oh,
and it has better compatibility with CSS.
Those two are PHP classes, but if you can install some software on the
server, and access it through passthru() or system(), have a look at
these too:
wkhtmltopdf: based on WebKit (the engine Safari uses), it is really fast and
powerful. It seems to be the best one (at the moment) for converting HTML pages to PDF on the fly, taking only two seconds for a three-page XHTML document
with CSS 2. It is a recent project. Anyway, the Google Code page is often
updated.
htmldoc: this one is a tank, it really never stops or crashes. The project
seems to have died in 2007, but if you don't need CSS compatibility,
it can be a nice option for you.
** Thumbs Up For Strae.
If I understand your needs correctly I don't think any PHP-PDF class would do that.
Mostly you could insert only text and images to a PDF file, so if you would want something that looks like an HTML element you would need to insert it as an image.
Usually just putting HTML doesn't mean all your elements would stay intact in the PDF . (Different world, after all)
http://www.fpdf.org/ is the site of a great HTML-to-PDF class which works well. I am using it, but you have to study its functionality first and then start.
I'm currently doing a task where I'm taking forms from a local government body, and converting them so that they are able to have a PDF generated dynamically via FPDF based on passed parameters. Currently the only copies of these documents are in read-only pdf files. What I'm wondering is if there is a way to have these files read somehow to where these documents could be converted into FPDF format somehow? Normally I'd just create them manually, but with 50 files to convert, and with some being multiple page forms, it'll probably take months, and hence looking for a quicker way.
The short answer is you're stuck. As far as I and my many hours of research know there's no such process. I would love to be proven wrong.
I recently went through a similar situation with insurance forms. I used the free trial of Adobe LiveCycle Designer to build out the forms. It basically turns the old pdf into a flat background image you can draw form fields over. Then I used PDF Toolkit and PDFTK-PHP to populate the fields.
The process wasn't ideal, but it worked out well enough. I set up 20 forms consisting of about 50 pages, with filling code and some other operations, in a week.
On my site I am fetching my MySQL data using PHP. I want to open that data in a pdf file when I click a pdf print button. Is that possible?
First of all, if you want a high-quality professional product to do that, you want Prince XML.
If you are looking for an open source tool to achieve something similar, you can look into this SO question.
You could prepare a static PDF form file, then just fill it in with values using PHP's FDF module.
It depends on which platform you are using. This would be an easy job if you are using Groovy on Grails. There are plugins which facilitate pdf reporting, like the jasper-plugin.
Luis
Check out jsPDF, an open-source library for generating PDF documents using nothing but JavaScript.
You can process the data with Apache FOP after transforming it to XML. (http://xmlgraphics.apache.org/fop/).
If your page is template based, you may create a template which produces XML output and process that. You'll have extremely good control over the pdf construction. The tradeoff is that it is not a "plug this in and it will work" solution, but I've done that, and once it's set up, it works like a charm.
I've used TCPDF in the past, it's a little kludgy but can definitely get the job done. (http://www.tecnick.com/public/code/cp_dpage.php?aiocp_dp=tcpdf)
The FPDF module in PHP is simple enough to get the data together. It is a safe option since you know what data you are passing out to the PDF engine. There are some streaming pdf options which can take in a bunch of HTML and then output that to pdf; however, they can get it quite wrong without you knowing.
I have used WKHTMLTOIMAGE/WKHTMLTOPDF on Linux machines a number of times, on many projects. It works like a charm and is easy to use: just a script that you run.