TCPDF runs slow and consumes huge amounts of memory - php

TCPDF is an open source PDF generator I'm using in a web app of mine for generating reports. These reports are standard pre-made HTML tables, which TCPDF reads and converts into a PDF. Lately the reports have been running to multiple pages and are taking much longer to load than I'd like.
The problems
TCPDF is currently generating the PDFs at 0.48s per page.
TCPDF uses on average 3-4 MB of RAM per page during generation, often exhausting the half-gigabyte PHP memory limit I have set (512 / 4 = 128 pages, my effective page limit), even though the final file will be below 1 MB.
What I've tried
Initially I thought the long wait time might have been down to the database calls for the required info and my PHP script generating the HTML; however, timestamps on each page of the report, plus timestamps printed at the completion of the database calls and the HTML generation, rule out that possibility.
The first thing I tried was updating TCPDF, because I was running a 2010 version. This actually increased the load time by a factor of four! (The new version was more memory efficient, however.)
I've also tried rewriting the HTML, removing all images and moving all CSS into a style tag at the top (previously it was repeated in each HTML tag), but this actually slowed generation by 50%!
Question
Has anyone who's had a similar problem found other common changes that improve the load time?
What are my alternatives to HTML tables and TCPDF that would provide a more efficient way of generating the PDF?
Worst case, I think I might have to set up a separate PDF-generating machine which builds every report page by page, merges them together, and emails the result when it's done, but that sounds nasty.

Related

TCPDF takes 10 minutes to generate a 40 page pdf file

I have a report-generation PHP program which used to work fine. I use two third-party libraries in it: the Google image chart library (which returns an image if I supply values in the URL) and TCPDF (for PDF generation). I am using mysql, not mysqli, for queries. There are lots of queries and loops in the page.
It used to take less than 3 minutes to generate the report. I use an AJAX call to generate it, which returns a completed message once file generation is done. The program saves the PDF file in a folder, and I have a link with the same name to download the file.
Recently I noticed it's not generating properly.
The error was TCPDF being unable to get the image, because the Google chart library was not returning the image properly. When I access the chart URL in a browser it gives me the image without any issue, but if I put it in an image src inside a PHP file, it doesn't show. So I decided to save the file to a folder using functions like file_get_contents and file_put_contents and link to it in the image src. This part is now working correctly; I can see the image.
But now the problem is that it takes a long time to generate the report, even in a local environment. I tried generating the report without the chart printing, but even then it's slow. At one point it was around 25 minutes; now it's close to 10 minutes to generate a 40-page PDF file.
I really don't know why it's taking so much time. All of this worked fine before. The only thing that changed was the Google image chart library, but even without it (I commented that part out and checked) it's still slow.
How do I speed this up? Is there any way to check which part of the program is slow?
I tried Xdebug, but its output file is more than 400 MB and webgrind is not able to process it.
Please help.
Your next step is to troubleshoot performance.
Is TCPDF doing a lot of work you don't need done? Presumably you've seen the tips from TCPDF's author on increasing performance, and put them into practice. http://www.tcpdf.org/performances.php
Are some of your MySQL queries inefficient? Obtain an interactive connection to your MySQL server using phpMyAdmin or the mysql command-line client. While your PDF-creation process is running, repeatedly issue this command:
SHOW FULL PROCESSLIST
It presents an INFO column showing the active MySQL query for each connection, along with each query's elapsed time in seconds. If you have queries that run for many hundreds of milliseconds or more, consider using MySQL's EXPLAIN command to analyze them. Often adding an appropriate index to a MySQL table can dramatically speed things up.
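For example (the table and column names here are hypothetical), the workflow looks like this:
EXPLAIN SELECT * FROM report_rows WHERE report_id = 42;
-- if EXPLAIN reports type = ALL (a full table scan), index the filter column:
ALTER TABLE report_rows ADD INDEX idx_report_id (report_id);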
Is the machine running your PDF program short on RAM? Use a performance monitor like *nix top or Windows perfmon to take a look.
Is your 40-page report, simply put, a huge job to create? If so, you might consider switching to a faster report-generation program than PHP + TCPDF.
Sorted out.
The issue was with the database: one of the tables had more than 120,000 records in it. I deleted the irrelevant records; that's not a permanent solution, but now it generates the same report in 2.1 minutes.
I can't do the same thing on my production server, though, so I would love your input on how to optimize the database.
Thank you.

Generating PDFs with images way too slow

A big bottleneck I have at the moment is PDF generation each time someone places an order. It's not a big deal for a single order, but when there are a lot in a short time frame, this process is very slow.
The PDF needs text information, a QR code, a Bar code, a logo, and 1 or more (up to 20+) 1/4-width images.
Current process w/ DOMPDF:
QR code image created w/ PHP and saved as png
Bar code image created and saved as png
DomPDF generates PDF
New thought:
HTML2PDF creates the PDF and uses its QR and barcode tags to generate the codes
That theoretically would take care of the QR and Barcode images, but still, the rest of the images make it too slow.
Doing it this way, without any images other than the QR and bar codes, the PDF can generate in ~500 ms, but as soon as I start adding images it goes up to 2, 3, 4, 5+ seconds each.
When I ran tests processing ~10k orders (placed within a few minutes), it was still working through the PDFs around 12 hours later, until I shut it down in frustration.
The PDF is generated in a separate queue process, so the person doesn't have to wait when ordering, but still... it can't take 5+ hours for them to receive their confirmation PDF during high traffic.
Questions / TLDR:
How can I make my process of creating PDFs with a dynamic QR code, a dynamic bar code, dynamic text, and 1-20 static images (the images are the same across all PDFs) faster?
Are there other approaches I haven't thought of? Maybe making a template PDF and somehow using PHP to just fill in the dynamic spots?
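One way to sketch that template idea, assuming the setasign/fpdi-tcpdf package (the file names, coordinates, and order number below are illustrative):
require 'vendor/autoload.php';
// Import a pre-built static template and overlay only the dynamic parts.
$pdf = new \setasign\Fpdi\Tcpdf\Fpdi();
$pdf->AddPage();
$pdf->setSourceFile('order_template.pdf'); // hypothetical template with the static images baked in
$tpl = $pdf->importPage(1);
$pdf->useTemplate($tpl);                   // draw the template as the page background
$pdf->SetFont('helvetica', '', 10);
$pdf->SetXY(20, 40);
$pdf->Write(0, 'Order #12345');            // dynamic text on top
$pdf->write2DBarcode('ORDER-12345', 'QRCODE,M', 20, 55, 30, 30); // TCPDF's built-in QR generator
$pdf->Output('order.pdf', 'F');
Since the static artwork lives in the imported template, only the text and barcodes are rendered per order.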
I would strongly advise you to use the TCPDF library. It's quite fast and can easily be integrated into CakePHP. You can find a lot of examples of how to include images, barcodes and QR codes in a PDF on the TCPDF examples page.
To further improve the performance use tips from this page:
Install and configure a PHP opcode cacher like XCache;
Edit the php.ini file and increase the maximum amount of memory a script may consume (memory_limit);
Edit the php.ini file and increase the maximum execution time of each script (max_execution_time);
Edit the config/tcpdf_config.php file: manually set the $_SERVER['DOCUMENT_ROOT'], K_PATH_MAIN and K_PATH_URL constants, and remove the automatic calculation part;
If you are not using the Thai language, edit the config/tcpdf_config.php file and set the K_THAI_TOPCHARS constant to false;
If you do not need extended chars, edit the config/tcpdf_config.php file and set the default fonts to core fonts;
If you do not need UTF-8 Unicode, set the $unicode parameter on TCPDF constructor to false and the $encoding parameter to 'ISO-8859-1' or other character map.
By default TCPDF enables font subsetting to reduce the size of embedded Unicode TTF fonts; this process is very slow and requires a lot of memory, and can be turned off using the setFontSubsetting(false) method;
Use core fonts instead of embedded fonts whenever possible;
Avoid using the HTML syntax (writeHTML and writeHTMLCell methods) if not strictly required;
Split large HTML blocks in smaller pieces;
Avoid using transactions if not strictly required;
Restart the webserver after changes.
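As a rough sketch, several of these tips translate into code like the following (assuming a Latin-1 report that only needs core fonts):
// Skip Unicode processing and font subsetting for a plain Latin-1 report.
$pdf = new TCPDF('P', 'mm', 'A4', false, 'ISO-8859-1'); // $unicode = false
$pdf->setFontSubsetting(false);     // subsetting is slow and memory-hungry
$pdf->SetFont('helvetica', '', 10); // core font, so nothing gets embedded
$pdf->AddPage();
$pdf->Write(0, 'Report body goes here');
$pdf->Output('report.pdf', 'F');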
If that does not improve the performance to the acceptable level you can install your CakePHP application (or just the script that runs the generation of PDFs if it doesn't use CakePHP) on a second server with more available resources and use that server only for PDF generation.
You can try to use JPEG instead of PNG files if you don't need transparency.
For example, in TCPDF, I had to generate a PDF with a big PNG in the background (18cm x 18cm, 300dpi). I had to wait 11 seconds before the file was generated.
I replaced the image with a JPEG of the same size and DPI, and it took less than 1 second.

Can I pre-load an image for use in TCPDF?

I am working with a large number of pages (letters) that are the same except for the address and a few other minor details. I believe what slows PDF creation down the most is the logo image that I'm including on every page (even though it is fairly small).
I'm hoping to speed up the process some more by caching the logo, i.e. by loading the file once, storing it in a variable, and having TCPDF use that instead of loading the image every time. TCPDF can load a "PHP image data stream", and the example given is this:
$imgdata = base64_decode('iVBORw0KGgoAAAANSUhEUgAAABwAAAASCAMAAAB/2U7WAAAABlBMVEUAAAD///+l2Z/dAAAASUlEQVR4XqWQUQoAIAxC2/0vXZDrEX4IJTRkb7lobNUStXsB0jIXIAMSsQnWlsV+wULF4Avk9fLq2r8a5HSE35Q3eO2XP1A1wQkZSgETvDtKdQAAAABJRU5ErkJggg==');
$pdf->Image('@'.$imgdata);
However, I have no idea how to create an image stream like this from a file.
My logo is a small (4kB) PNG file. If I use readfile($file) and send that to $pdf->Image with the '@' in front, it errors out with something about the cache folder, which is already set to chmod 777 (it's a test server; I'll work on proper permissions on the live server). I believe I also tried base64_encode, which also didn't work.
Any thoughts on how to do this?
PS: I already noticed that the more pages I include in the PDF, the slower it gets, so I'll find a good middle ground (probably 200-250 pages per file instead of the current 500).
Thanks!
Posted the same question in the TCPDF forum on sourceforge (sourceforge forum post), and the author of TCPDF answered.
He said that images are cached internally, however if the images need processing, he suggests using the XObject() template system (see example 62 on TCPDF site).
It took me a while to get it working (still not sure why it didn't work for me at first), but once I had it looking exactly like my original version using Image(), I ran a few tests with about 3,000 entries divided into PDF files of 500 pages each.
There was no speed gain at all between XObject() and Image(), and XObject() actually appeared to make the resulting files just a tiny bit larger (2.5kB in a 1.2MB file).
While this doesn't directly answer my original question (how to create a PHP data stream that can be used directly in TCPDF via Image('@'.$image)), it tells me what I really needed to know: the image is already cached, and caching via XObject() does not provide any advantage in my situation.
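For the record, the data stream from the original question can be built from a file with file_get_contents; readfile() echoes the file to the output buffer instead of returning its contents, which likely explains the error above. A minimal sketch, assuming a local logo.png:
$imgdata = file_get_contents('logo.png'); // returns the raw bytes, unlike readfile()
$pdf->Image('@'.$imgdata, 10, 10, 40);    // '@' prefix marks the argument as image data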

Bulk template based pdf generation in PHP using pdftk

I am doing a bulk generation of pdf files based on templates and I ran into big performance issues pretty fast.
My current scenario is as follows:
get the data to be filled from the db
create an .fdf based on a single data row and the pdf form
write the .fdf file to disk
merge the pdf with the .fdf using pdftk (fill_form with the flatten option; see the sketch after this list)
continue iterating over the rows until all pdfs are generated
all the generated files are merged together in the end and the single pdf is given to the client
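Step four boils down to one pdftk invocation per row; a sketch with hypothetical file names (output - streams the result to stdout, which pairs with the passthru approach mentioned below):
passthru('pdftk template.pdf fill_form row_42.fdf output - flatten');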
I use passthru to send the raw output to the client (saves time writing a file), but this is only a small performance improvement. The total operation takes about 50 seconds for 200 records, and I would like to get it down to at least 10 seconds in some way.
The ideal scenario would be handling all these PDFs in memory and not writing every single one of them to a separate file, but then the output would be impossible, as I can't pass that kind of data to an external tool like pdftk.
One other idea was to generate one big .fdf file with all those rows, but it looks like that is not allowed.
Am I missing something very trivial here?
I'm thankful for any advice.
PS. I know I could use a good library like PDFlib, but I am considering only openly licensed libraries right now.
EDIT:
I am still trying to figure out the syntax to build an .fdf file with multiple pages using the same PDF as a template; I spent a few hours on it and couldn't find any good documentation.
After being faced with the same problem for a long time (I wanted to generate my PDFs based on LaTeX), I finally decided to switch to another crude but effective technique:
I generate my PDFs in two steps: first I generate HTML with a template engine like Twig or Smarty; second I use mPDF to generate PDFs out of it. I tried many other html2pdf frameworks and ended up using mPDF; it's very mature and has been in development for a long time (frequent updates, rich functionality). The benefit of this technique: you can use CSS to design your documents (mPDF has comprehensive CSS support), which brings all the usual benefits of CSS (http://www.csszengarden.com), and you can generate dynamic tables very easily.
mPDF parses HTML tables and looks for the thead and tfoot elements, repeating them on each page if your tables are bigger than one page. You can also define page header and page footer elements with dynamic entities like the page number and so on.
I know this detour seems like a workaround, but to be honest, no LaTeX or PDF engine is as strong and simple as HTML!
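A minimal sketch of that two-step pipeline, assuming composer-installed Twig and mPDF and a hypothetical template name:
require 'vendor/autoload.php';
// Step 1: render the HTML from a template engine (Twig here).
$twig = new \Twig\Environment(new \Twig\Loader\FilesystemLoader('templates'));
$html = $twig->render('report.html.twig', ['rows' => $rows]); // $rows: data fetched earlier
// Step 2: hand the finished HTML to mPDF.
$mpdf = new \Mpdf\Mpdf();
$mpdf->WriteHTML($html);
$mpdf->Output('report.pdf', \Mpdf\Output\Destination::FILE);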
Try a different, less complex library like FPDF (http://www.fpdf.org/).
I find it quite good and lightweight.
Always pick libraries that are small and only do what you need them to do.
The bigger the library, the more resources it consumes.
This won't help your multiple-page problem, but I notice that pdftk accepts the - character to mean 'read from standard input'.
You may be able to send the .fdf to the pdftk process via its stdin, in order to avoid having to write them to disk.
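A sketch of that idea (hypothetical file names; pdftk reads the FDF from stdin via - and writes the filled PDF to stdout):
$spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w']];
$proc = proc_open('pdftk template.pdf fill_form - output - flatten', $spec, $pipes);
fwrite($pipes[0], $fdfData);           // $fdfData: the FDF string built for one row
fclose($pipes[0]);                     // signal EOF so pdftk starts processing
$pdf = stream_get_contents($pipes[1]); // the filled, flattened PDF
fclose($pipes[1]);
proc_close($proc);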

How to Handle EXTREMELY Large Strings in PHP When Generating a PDF

I've got a report that can generate over 30,000 records if given a large enough date range. From the HTML side of things, a resultset this large is not a problem since I implement a pagination system that limits the viewable results to 100 at a given time.
My real problem occurs once the user presses the "Get PDF" button. When this happens, I essentially re-run the portion of the report that prints the data (the results of the report itself are stored in a 'save' table, so there's no need to re-run the data-gathering logic) and store the results in a variable called $html. Keep in mind that this variable now contains 30,000 records of data plus the HTML needed to format it correctly in the PDF. Once I've got this HTML string created, I pass it to TCPDF to try to generate the PDF file for the user. However, instead of generating the PDF file, it just craps out without an error message: the 'Generating PDF...' dialog disappears and the system acts like you never asked it to do anything.
Through testing, I've discovered that the problem lies in the size of the $html variable being passed in. If the report is under 3K records, it works fine. If it's over that, the HTML side of the report will print, but not the PDF.
Helpful Info
PHP 5.3
TCPDF for PDF generation (also tried PS2PDF)
Script Memory Limit: 500 MB
How would you guys handle this scale of data when generating a PDF of this size?
Here is how I solved this issue: I noticed that some of the strings in my HTML output had slight encoding issues. I ran htmlentities on those particular strings as I queried them from the database, and that cleared the problem.
I don't know if this is what was causing your problem, but my experience was very similar: when I was trying to output a large HTML table with about 80,000 rows, TCPDF would display the page header but nothing table-related. The behaviour was the same with different sets of data and different table structures.
After many attempts I started adding my own pagination: every 15 table rows, I would break the page and add a new table on the following page. That's when I noticed that every once in a while I would get blank pages between lots of full and correct ones, realised there must be a problem with those particular subsets of data, and discovered the encoding issue. It may be that you have something similar and TCPDF is not making the problem clear.
Are you using the writeHTML method?
I went through the performance recommendations here: http://www.tcpdf.org/performances.php
It says "Split large HTML blocks in smaller pieces;".
I found that if my blocks of HTML went over 20,000 characters the PDF would take well over 2 minutes to generate.
I simply split my HTML into blocks and called writeHTML for each block, and it improved dramatically. A file that wouldn't generate in 2 minutes before now takes 16 seconds.
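A sketch of that chunking, assuming the report rows are already available as an array of <tr>...</tr> strings:
// Flush the table in 15-row pieces instead of one giant writeHTML() call.
foreach (array_chunk($rowHtml, 15) as $chunk) {
    $pdf->writeHTML('<table>' . implode('', $chunk) . '</table>', true, false, true, false, '');
}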
TCPDF seems to be a native implementation of PDF generation in pure PHP. You may get better performance using a compiled library like PDFlib or a command-line app like htmldoc; the latter has the best chance of generating a large PDF.
Also, are you breaking the output PDF into multiple pages? I.e. does TCPDF know to take a single HTML document and cut it into multiple pages, or are you generating multiple HTML files for it to combine into a single PDF document? That may also help.
I would break the PDF into parts, just like pagination.
1) Have "Get PDF" button on every paginated HTML page and allow downloading of records from that HTML page only.
2) Limit the maximum number of records that can be downloaded. If the maximum limit is reached, split the PDF and let the user download multiple PDFs.
