Write faster PDFs with TCPDF - PHP

I generate PDFs with TCPDF using writeHTML. What I do is build the entire HTML document first, and then generate the PDF from it with writeHTML.
My problem is that it's very slow: generating 5 pages of a data table (5 cols x 12 rows per page) takes about 10 seconds.
I followed almost all of the instructions from here: http://www.tcpdf.org/performances.php .
I put
$pdf->setFontSubsetting(false);
Do you have any other tips? Will it be faster if I generate the PDFs programmatically?

Generating HTML, letting TCPDF parse that HTML and rejiggle it into low-level PDF drawing instructions, then writing those instructions out is of course going to be far slower than emitting the instructions directly to begin with. Use the regular Ln, Cell, Write etc. methods to generate the PDF directly if you want maximum performance. Yes, it's somewhat more complicated than writing HTML, but that's because they're different things, and the slow part is translating between those different things.
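For illustration, here is a minimal sketch of that direct approach for the 5-column table from the question; the $rows array, column widths, row height, and font are placeholder assumptions:

<?php
require_once('tcpdf.php');

// $rows is assumed to hold the table data, e.g. from a DB query.
$rows = array(/* ... */);

$pdf = new TCPDF();
$pdf->setFontSubsetting(false);        // the setting from the question
$pdf->SetFont('helvetica', '', 9);     // core font: nothing to embed
$pdf->AddPage();

$widths = array(36, 36, 36, 36, 36);   // placeholder column widths in mm

foreach ($rows as $row) {
    foreach (array_values($row) as $i => $value) {
        // Cell(width, height, text, border, ln, align)
        $pdf->Cell($widths[$i], 6, $value, 1, 0, 'L');
    }
    $pdf->Ln();                        // start the next table row
}

$pdf->Output('report.pdf', 'I');       // auto page breaks handle paging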

Related

TCPDF runs slow and consumes huge amounts of memory

TCPDF is an open source PDF generator I'm using in a web app of mine for generating reports. These reports are standard pre-made HTML tables, which TCPDF reads and converts into a PDF. Lately the reports have been running to multiple pages and taking much longer than I'd like to load.
The problems
TCPDF currently generates the PDFs at 0.48 s per page.
TCPDF uses on average 3-4 MB of RAM per page during generation, often blowing through the half-gigabyte PHP memory limit I have set (512 / 4 = 128 pages max), even though the final file is below 1 MB.
What I've tried
Initially I thought the long wait time might be down to the database calls for the required info and my PHP script generating the HTML; however, timestamps on each page of the report, plus timestamps printed at the completion of the database calls and of the HTML generation, rule out that possibility.
The first thing I tried was updating TCPDF, since I was running a 2010 version; this actually increased the load time by a factor of 4 (although the new version was more memory efficient).
I've tried rewriting the HTML, removing all images and putting all the CSS in a style tag at the top (previously it was repeated in each HTML tag), but this actually slowed generation by 50%!
Question
Has anyone who's had a similar problem found other common edits that improve the load time?
What are my alternatives to HTML tables and TCPDF that would provide a more efficient way of generating the PDF?
Worst case, I think I might have to set up a separate PDF-generating machine that builds every report page by page, mashes the pages together, and emails the result when it's done, but that sounds nasty.

Generating a 130-page PDF with PHP

I'm using DOMpdf to generate PDF files. I'm able to generate 15 pages so far, but I need to be able to generate 130 pages, and I know this might exhaust memory, so I'm looking for a way to generate each batch of 10 pages in a separate process and have all of them write to the same file. My question is: is that possible?
There are many tools that can join PDFs together. One that I have had a lot of luck with is PDFtk Server. If you generate many small PDFs (e.g. 001.pdf, 002.pdf, 003.pdf) you can generate a joined PDF (e.g. final.pdf) with something like
pdftk 001.pdf 002.pdf 003.pdf cat output final.pdf
The page ordering of final.pdf will respect the order of the input files.
PDFtk Server is a standalone command-line program, not a PHP library, so you'll have to kick the tool off with something like shell_exec or any number of other options.
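A hypothetical sketch of that call, assuming pdftk is on the PATH and the chunk files have already been written to a working directory (paths and names are placeholders):

<?php
// Join per-chunk PDFs with PDFtk Server from PHP.
$dir    = '/tmp/report-chunks';
$chunks = glob($dir . '/*.pdf');
sort($chunks);   // make sure the page order follows the file-name order

$cmd = 'pdftk ' . implode(' ', array_map('escapeshellarg', $chunks))
     . ' cat output ' . escapeshellarg($dir . '/final.pdf');
shell_exec($cmd);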

How do I create a large pdf, semi quickly

I need to generate a large PDF, 2480 pages to be exact.
Currently I am using InDesign, and while the output is exactly what I want, I would rather not be involved in the document creation process.
It takes 31 minutes for InDesign to execute the data merge, generate the PDF, save the PDF, and save the pdf.indd file. (I don't really need the pdf.indd file, but I would rather not have to recreate the data merge if something were to happen to the PDF.)
I am hoping for a php, or similar solution. Currently my data is stored in MySQL.
The majority of the PDF is static text, with 19 dynamically driven text fields.
There is one image on the PDF, 75x100 px at 72 dpi.
The output needs to be exact; the PDF file is printed and cut in half at 4.25 inches.
I have tried TCPDF; while it is fast at generating up to 50 pages, after that it would rather die than give me an output. I have also played with mPDF and found it to be... not as friendly. I have also considered generating many small files and using some utility to merge the smaller PDFs into one large PDF, though that seems like driving around the mountain.
Any thoughts would be helpful.
You certainly can create documents directly with PHP, but it can be difficult. One method is to use one of the various PDF classes to create the document, as you have found. Another is to create images (using ImageMagick, GD, etc.) and convert those to PDF. (This method is less efficient, as you are creating raster graphics, which makes each PDF page one giant image.)
However, I think you should consider simply scripting InDesign. InDesign has the capability to read data in via XML and create the document. This way, the design of your document isn't dependent on your programming abilities and you can still have the power of programmatically creating the document.
When it comes to a huge number of pages in a PDF, LaTeX is always the best answer. Nothing can really handle huge PDF generation as fast, accurately and elegantly as LaTeX.
Check this question to see how to retrieve your data from the database.
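If you want to try the LaTeX route from PHP, a crude sketch could look like this; it assumes pdflatex is installed, and the one-field-per-page template, paths, and column name are placeholders:

<?php
// Write a .tex file from the MySQL rows, then shell out to pdflatex.
function latex_escape($s) {
    // Escape the common LaTeX special characters (~ and ^ need extra care).
    return preg_replace('/([&%$#_{}])/', '\\\\$1', $s);
}

$tex = "\\documentclass{article}\n\\begin{document}\n";
foreach ($rows as $row) {              // $rows comes from your MySQL query
    $tex .= latex_escape($row['field']) . "\n\\newpage\n";
}
$tex .= "\\end{document}\n";

file_put_contents('/tmp/report.tex', $tex);
shell_exec('pdflatex -interaction=nonstopmode -output-directory=/tmp /tmp/report.tex');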

Bulk template-based PDF generation in PHP using pdftk

I am doing a bulk generation of pdf files based on templates and I ran into big performance issues pretty fast.
My current scenario is as follows (a rough sketch in code follows the list):
1) get the data to be filled from the DB
2) create an FDF based on a single data row and the PDF form
3) write the .fdf file to disk
4) merge the PDF with the FDF using pdftk (fill_form with the flatten option)
5) continue iterating over the rows until all the PDFs are generated
6) all the generated files are merged together at the end and the single PDF is given to the client
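For context, steps 2) to 4) could look roughly like this; the field name, paths, and template.pdf are placeholders, and the FDF is the minimal single-field form:

<?php
// Build a minimal FDF for one data row, write it to disk, and fill the
// form with pdftk (fill_form plus flatten, as in step 4).
function make_fdf(array $fields) {
    $body = '';
    foreach ($fields as $name => $value) {
        // Parentheses and backslashes must be escaped inside PDF strings.
        $body .= '<< /T (' . $name . ') /V (' . addcslashes($value, '()\\') . ') >>' . "\n";
    }
    return "%FDF-1.2\n1 0 obj\n<< /FDF << /Fields [\n" . $body
         . "] >> >>\nendobj\ntrailer\n<< /Root 1 0 R >>\n%%EOF\n";
}

file_put_contents('/tmp/row.fdf', make_fdf(array('client' => 'ACME Ltd.')));
shell_exec('pdftk template.pdf fill_form /tmp/row.fdf output /tmp/row.pdf flatten');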
I use passthru to stream the raw output to the client (saves the time of writing a file), but this is only a small performance improvement. The total operation takes about 50 seconds for 200 records, and I would like to get it down to at least 10 seconds somehow.
The ideal scenario would be to handle all these PDFs in memory and not write every single one to a separate file, but then producing the output seems impossible, as I can't pass that kind of data to an external tool like pdftk.
One other idea was to generate one big .fdf file with all those rows, but it looks like that is not allowed.
Am I missing something very trivial here?
I'm thankful for any advice.
PS. I know I could use some good library like pdflib but I am considering only open licensed libraries now.
EDIT:
I am now trying to figure out the syntax to build an .fdf file with multiple pages using the same PDF as a template; I've spent a few hours on it and couldn't find any good documentation.
After being faced with the same problem for a long time (I wanted to generate my PDFs based on LaTeX), I finally decided to switch to another crude but effective technique:
I generate my PDFs in two steps: first I generate HTML with a template engine like Twig or Smarty; second I use mPDF to generate PDFs out of it. I tried many other html2pdf frameworks and ended up using mPDF; it's very mature and has been in development for a long time (frequent updates, rich functionality). The benefit of this technique: you can use CSS to design your documents (mPDF has broad CSS support), which brings all the usual CSS benefits (http://www.csszengarden.com) and makes generating dynamic tables very easy.
mPDF parses HTML tables and looks for the thead and tfoot elements, repeating them on each page if your tables are bigger than one page. You can also define page header and page footer elements with dynamic entities like the page number and so on.
I know using this detour seems like a workaround, but to be honest, no LaTeX or PDF engine is as strong and simple as HTML!
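A minimal sketch of that two-step pipeline, assuming mPDF and Twig are installed via Composer and that a templates/report.html.twig file exists:

<?php
require 'vendor/autoload.php';

// Step 1: render the HTML with a template engine (Twig here).
$twig = new \Twig\Environment(new \Twig\Loader\FilesystemLoader('templates'));
$html = $twig->render('report.html.twig', array('rows' => $rows));

// Step 2: convert the HTML to a PDF with mPDF.
$mpdf = new \Mpdf\Mpdf();
$mpdf->WriteHTML($html);   // thead/tfoot are repeated on every page
$mpdf->Output('report.pdf', \Mpdf\Output\Destination::INLINE);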
Try a different, less complex library like FPDF (http://www.fpdf.org/).
I find it quite good and lightweight.
Always pick libraries that are small and only do what you need them to do:
the bigger the library, the more resources it consumes.
This won't help your multiple-page problem, but I notice that pdftk accepts the - character to mean 'read from standard input'.
You may be able to send the .fdf to the pdftk process via its stdin, in order to avoid having to write it to disk.
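A sketch of that idea with proc_open, assuming the same hypothetical template.pdf and an $fdf string already built in memory; the - placeholder stands in for both the form data and the output file:

<?php
// Feed the FDF to pdftk via stdin and read the filled PDF from stdout,
// so nothing touches the disk.
$spec = array(
    0 => array('pipe', 'r'),   // pdftk's stdin  <- our FDF string
    1 => array('pipe', 'w'),   // pdftk's stdout -> the filled PDF
);
$proc = proc_open('pdftk template.pdf fill_form - output - flatten', $spec, $pipes);

fwrite($pipes[0], $fdf);       // $fdf holds the FDF built earlier
fclose($pipes[0]);

$pdf = stream_get_contents($pipes[1]);
fclose($pipes[1]);
proc_close($proc);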

How to Handle EXTREMELY Large Strings in PHP When Generating a PDF

I've got a report that can generate over 30,000 records if given a large enough date range. From the HTML side of things, a result set this large is not a problem, since I implement a pagination system that limits the viewable results to 100 at a time.
My real problem occurs once the user presses the "Get PDF" button. When this happens, I essentially re-run the portion of the report that prints the data (the results of the report itself are stored in a 'save' table, so there's no need to re-run the data-gathering logic) and store the results in a variable called $html. Keep in mind that this variable now contains 30,000 records of data plus the HTML needed to format it correctly in the PDF. Once I've got this HTML string created, I pass it to TCPDF to try to generate the PDF file for the user. However, instead of generating the PDF file, it just craps out without an error message: the 'Generating PDF...' dialog disappears and the system acts like you never asked it to do anything.
Through tests, I've discovered that the problem lies in the size of the $html variable being passed in. If the report is under 3,000 records, it works fine; if it's over that, the HTML side of the report will print but the PDF won't.
Helpful Info
PHP 5.3
TCPDF for PDF generation (also tried PS2PDF)
Script Memory Limit: 500 MB
How would you guys handle this scale of data when generating a PDF of this size?
Here is how I solved this issue: I noticed that some of the strings in my HTML output had slight encoding issues. I ran htmlentities on those particular strings as I queried them from the database, and that cleared the problem.
I don't know if this was what was causing your problem, but my experience was very similar: when I was trying to output a large HTML table with about 80,000 rows, TCPDF would display the page header but nothing table-related. The behaviour was the same with different sets of data and different table structures.
After many attempts I started adding my own pagination: every 15 table rows, I would break the page and add a new table on the following page. That's when I noticed that every once in a while I would get blank pages between lots of full and correct ones, which made me realise there must be a problem with those particular subsets of data, and I discovered the encoding issue. It may be that you had something similar and TCPDF was not making it clear what the problem was.
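The fix described above amounts to one call per suspect string as the rows are read; the column name here is only a placeholder:

<?php
// Normalize suspect strings before they go into the HTML handed to TCPDF.
foreach ($rows as &$row) {
    $row['description'] = htmlentities($row['description'], ENT_QUOTES, 'UTF-8');
}
unset($row);   // break the reference left over from the loop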
Are you using the writeHTML method?
I went through the performance recommendations here: http://www.tcpdf.org/performances.php
It says "Split large HTML blocks in smaller pieces;".
I found that if a block of HTML went over 20,000 characters, the PDF would take well over 2 minutes to generate.
I simply split my HTML into blocks and called writeHTML for each block, and it improved things dramatically: a file that wouldn't generate within 2 minutes before now takes 16 seconds.
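One possible shape for that chunking, assuming the table rows are already available as an array of <tr>...</tr> strings; batching 100 rows at a time keeps each block well under the 20,000-character mark:

<?php
// Render the table in row batches and call writeHTML once per batch
// instead of once for the whole document. $pdf is the TCPDF instance.
$header = '<table border="1"><thead><tr><th>Col A</th><th>Col B</th></tr></thead>';
$footer = '</table>';

foreach (array_chunk($rowsHtml, 100) as $batch) {
    $block = $header . implode('', $batch) . $footer;
    // writeHTML($html, $ln, $fill, $reseth, $cell, $align)
    $pdf->writeHTML($block, true, false, false, false, '');
}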
TCPDF seems to be a native PHP implementation of PDF generation. You may get better performance using a compiled library like PDFlib or a command-line app like htmldoc; the latter has the best chance of generating a large PDF.
Also, are you breaking the output PDF into multiple pages? That is, does TCPDF know to take a single HTML document and cut it into multiple pages, or are you generating multiple HTML files for it to combine into a single PDF document? That may also help.
I would break the PDF into parts, just like the pagination.
1) Have a "Get PDF" button on every paginated HTML page and allow downloading of the records from that HTML page only.
2) Limit the maximum number of records that can be downloaded at once. If the maximum is reached, split the PDF and let the user download multiple PDFs.
