A big bottleneck I have at the moment is PDF generation each time someone places an order. It's not a big deal for a single order, but when there are a lot in a short time frame, this process is very slow.
The PDF needs text information, a QR code, a Bar code, a logo, and 1 or more (up to 20+) 1/4-width images.
Current process w/ DOMPDF:
QR code image created w/ PHP and saved as png
Bar code image created and saved as png
DomPDF generates PDF
New thought:
HTML2PDF creates the PDF and uses its QR and bar code tags to generate the codes
That theoretically would take care of the QR and Barcode images, but still, the rest of the images make it too slow.
Doing it this way, with no images other than the QR code and bar code, the PDF generates in ~500ms, but as soon as I start adding images, it goes up to 2, 3, 4, 5+ seconds each.
When running tests, and processing ~10k orders (in a few minutes), it was still processing the PDFs around 12 hours later until I just shut it down in frustration.
The PDF is generated in a separate queue process, so the person doesn't have to wait when ordering, but - still... it can't take 5+ hours for them to receive their confirmation PDF during high traffic.
Questions / TLDR:
How can I make my process of creating PDFs with a dynamic qr code, a dynamic bar code, dynamic text, and 1-20 static images (images are same across all PDFs) faster?
Are there other potential things I haven't thought of? Maybe making a template PDF and somehow use PHP to just fill in the dynamic spots?
I would strongly advise you to use the TCPDF library. It's quite fast and can be easily integrated into CakePHP. You can find a lot of examples of how to include images, barcodes and QR codes in a PDF on the TCPDF examples page.
To further improve the performance use tips from this page:
Install and configure a PHP opcode cacher like XCache;
Edit the php.ini file and increase the maximum amount of memory a script may consume (memory_limit);
Edit the php.ini file and increase the maximum execution time of each script (max_execution_time);
Edit the config/tcpdf_config.php file: manually set the $_SERVER['DOCUMENT_ROOT'], K_PATH_MAIN and K_PATH_URL constants, and remove the automatic calculation part;
If you are not using the Thai language, edit the config/tcpdf_config.php file and set the K_THAI_TOPCHARS constant to false;
If you do not need extended chars, edit the config/tcpdf_config.php file and set the default fonts to core fonts;
If you do not need UTF-8 Unicode, set the $unicode parameter on TCPDF constructor to false and the $encoding parameter to 'ISO-8859-1' or other character map.
By default TCPDF enables font subsetting to reduce the size of embedded Unicode TTF fonts. This process is very slow and requires a lot of memory; it can be turned off using the setFontSubsetting(false) method;
Use core fonts instead of embedded fonts whenever possible;
Avoid using the HTML syntax (writeHTML and writeHTMLCell methods) if not strictly required;
Split large HTML blocks into smaller pieces;
Avoid using transactions if not strictly required;
Restart the webserver after changes.
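As a rough illustration of the constructor, font and subsetting tips above, here is a minimal sketch. It assumes TCPDF was installed through Composer (the autoload path is a guess about your layout), and the function name renderOrderPdf and the sample order number are made up for the example. It uses only a core font and native drawing calls, with no writeHTML.

```php
<?php
// Assumption: TCPDF is available through Composer's autoloader at this path.
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require_once $autoload;
}

// Hypothetical helper: builds a one-page confirmation and returns the PDF bytes,
// or null when the TCPDF class cannot be found.
function renderOrderPdf(string $orderNo): ?string
{
    if (!class_exists('TCPDF')) {
        return null;
    }

    // $unicode = false with a single-byte encoding avoids the UTF-8 handling.
    $pdf = new TCPDF('P', 'mm', 'A4', false, 'ISO-8859-1', false);
    $pdf->setPrintHeader(false);
    $pdf->setPrintFooter(false);
    $pdf->setFontSubsetting(false);        // skip the slow subsetting step
    $pdf->SetFont('helvetica', '', 11);    // core font, nothing to embed

    $pdf->AddPage();
    $pdf->Cell(0, 8, 'Order ' . $orderNo, 0, 1);

    // Barcodes are drawn by TCPDF itself, so no intermediate PNG files are written.
    $pdf->write2DBarcode($orderNo, 'QRCODE,L', 15, 30, 30, 30);
    $pdf->write1DBarcode($orderNo, 'C128', 15, 70, 60, 15);

    // 'S' returns the document as a string instead of sending or saving it.
    return $pdf->Output('order-' . $orderNo . '.pdf', 'S');
}

$bytes = renderOrderPdf('12345');
```

Returning the PDF as a string leaves it to the queue worker to decide where the file is stored or how it is mailed.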
If that does not improve the performance to an acceptable level, you can install your CakePHP application (or just the script that generates the PDFs, if it doesn't use CakePHP) on a second server with more available resources and use that server only for PDF generation.
You can try to use JPEG instead of PNG files if you don't need transparency.
For example, in TCPDF, I had to generate a PDF with a big PNG in the background (18cm x 18cm, 300dpi). I had to wait 11 seconds before the file was generated.
I replaced the image with a JPEG of the same size and DPI, and it took less than 1 second.
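If the static images are currently PNGs without meaningful transparency, they can be converted once, ahead of time, rather than per order. Below is a small sketch using PHP's GD extension. The function name pngToJpeg, the quality value and the temporary file names are assumptions for illustration, and any transparent areas are flattened onto white.

```php
<?php
// Hypothetical helper: flattens a PNG onto a white background and saves it as JPEG.
// Requires the GD extension.
function pngToJpeg(string $pngPath, string $jpegPath, int $quality = 85): bool
{
    $png = imagecreatefrompng($pngPath);
    if ($png === false) {
        return false;
    }

    $width  = imagesx($png);
    $height = imagesy($png);

    // JPEG has no alpha channel, so paint a white canvas first.
    $canvas = imagecreatetruecolor($width, $height);
    $white  = imagecolorallocate($canvas, 255, 255, 255);
    imagefilledrectangle($canvas, 0, 0, $width - 1, $height - 1, $white);
    imagecopy($canvas, $png, 0, 0, 0, 0, $width, $height);

    return imagejpeg($canvas, $jpegPath, $quality);
}

// Demo: create a small PNG in the temp directory and convert it.
$src = tempnam(sys_get_temp_dir(), 'logo_');
$dst = $src . '.jpg';
imagepng(imagecreatetruecolor(40, 20), $src);

$converted = pngToJpeg($src, $dst);
$info = getimagesize($dst);   // width, height and MIME type of the result
```

Whether this gives the same speed-up as reported above depends on the images and the library version, so it is worth timing on your own files.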
Related
TCPDF is an open source PDF generator I'm using in a web app of mine for generating reports. These reports are standard pre-made HTML tables, which TCPDF reads and converts into a PDF. Lately the reports have been running to multiple pages and are taking much longer to load than I'd like.
The problems
TCPDF is currently generating the PDFs at 0.48s per page.
TCPDF uses on average 3-4 MB of RAM per page during generation, often hitting the half-gig PHP memory limit I have set (512/4 = 128 = my page limit), even though the final file will be below 1MB.
What I've tried
Initially I thought the long wait time may have been down to database calls for the required info and my PHP script generating the HTML. However, timestamps on each page of the report, and timestamps printed at the completion of the database calls and HTML generation, rule out that possibility.
The first thing I tried was updating TCPDF, because I am running a 2010 version. This actually increased the load time by a factor of 4! (However, the new version was more memory efficient.)
I've tried re-writing the HTML, removing all images and including all CSS in a style tag at the top (previously it was contained in each HTML tag), however this actually slowed the generation by 50%!
Question
Is there anyone who has had a similar problem and can point out some other common edits which can improve the loading time?
What are my alternatives from using HTML tables and TCPDF that can provide a more efficient way of generating the PDF?
Worst case scenario, I think I might have to set up a separate PDF-generating machine which builds every report page by page, mashes them together, and emails it when it's done, but that sounds naaasty.
I am working with a large amount of pages (letters) that are the same except for the address and a few other minor details. I believe what slows the PDF creation down the most is the logo image that I'm including on every page (even though it is fairly small).
I'm hoping to speed up the process some more by caching the logo, i.e. by loading the file once and storing it in a variable and have TCPDF use that instead of loading the image every time. TCPDF can load a "PHP image data stream", and the example given is this:
$imgdata = base64_decode('iVBORw0KGgoAAAANSUhEUgAAABwAAAASCAMAAAB/2U7WAAAABlBMVEUAAAD///+l2Z/dAAAASUlEQVR4XqWQUQoAIAxC2/0vXZDrEX4IJTRkb7lobNUStXsB0jIXIAMSsQnWlsV+wULF4Avk9fLq2r8a5HSE35Q3eO2XP1A1wQkZSgETvDtKdQAAAABJRU5ErkJggg==');
$pdf->Image('#'.$imgdata);
However, I have no idea how to create an image stream like this from a file.
My logo is a small (4kB) PNG file. If I use readfile($file) and send that to $pdf->Image with the '#' in front, it errors out - something about the cache folder which is already set to chmod 777 (it's a test server - I'll work on proper permissions on the live server). I believe I also tried base64_encode which also didn't work.
Any thoughts on how to do this?
PS: I already noticed that the more pages I include into the PDF, the slower it gets, so I'll find a good middle (probably 200-250 pages per file instead of the current 500).
Thanks!
Posted the same question in the TCPDF forum on sourceforge (sourceforge forum post), and the author of TCPDF answered.
He said that images are cached internally, however if the images need processing, he suggests using the XObject() template system (see example 62 on TCPDF site).
It took me a while to get it working (still not sure why it didn't work for me at first), but once I had it looking exactly like my original version using Image(), I ran a few tests with about 3,000 entries divided into PDF files of 500 pages each.
There was no speed gain at all between XObject() and Image(), and XObject() actually appeared to make the resulting files just a tiny bit larger (2.5kB in a 1.2MB file).
While this doesn't directly answer my original question (how to create a PHP data stream that can be directly used in TCPDF using Image('#'.$image)), it tells me what I really needed to know - the image is already cached, and caching using XObject() does not provide any advantage to my situation.
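For the original question, a possible sketch of getting an image data stream from a file: file_get_contents() returns the file's bytes as a string, whereas readfile() prints them and returns only a byte count, which would explain the error. The helper name loadImageData and the temp file are made up for the demo. TCPDF's documentation describes '@' as the marker for inline image data, as far as I can tell; the question above shows '#', so check which one your version expects.

```php
<?php
// Hypothetical helper: returns the raw bytes of an image file.
function loadImageData(string $path): string
{
    // file_get_contents() returns the bytes; readfile() would echo them
    // to the output and return only the number of bytes read.
    $data = file_get_contents($path);
    if ($data === false) {
        throw new RuntimeException("Cannot read image file: $path");
    }
    return $data;
}

// Demo: write the sample PNG from the question to a temp file, then load it back.
$logoPath = tempnam(sys_get_temp_dir(), 'logo_');
file_put_contents($logoPath, base64_decode(
    'iVBORw0KGgoAAAANSUhEUgAAABwAAAASCAMAAAB/2U7WAAAABlBMVEUAAAD///+l2Z/dAAAASUlEQVR4XqWQUQoAIAxC2/0vXZDrEX4IJTRkb7lobNUStXsB0jIXIAMSsQnWlsV+wULF4Avk9fLq2r8a5HSE35Q3eO2XP1A1wQkZSgETvDtKdQAAAABJRU5ErkJggg=='
));

$imgdata = loadImageData($logoPath);   // load once, before the page loop
unlink($logoPath);

// With a TCPDF instance in $pdf, the bytes are passed with the marker in front.
// No base64 step is needed, because $imgdata is already binary:
// $pdf->Image('@' . $imgdata, 15, 10, 30);
```

The base64_decode() call in TCPDF's example is only there because the image is embedded as text in the source file; bytes read from disk can be used directly.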
I am using the HTML2PDF converter to export a web page to a PDF file. The issue I have faced is that the resulting PDF is too large (more than 1 MB). I want to reduce it, so here is what I basically have:
2 images (100 KB both)
1 Courier Bulgarian font - added
3 tables with a lot of inline styles of each cell
Could these things lead to the large size of the output PDF? And could anyone share some experience and best practices with the library for getting a smaller PDF as a result?
Thanks in advance.
If you have access to Adobe Acrobat, it'll let you peek in the PDF and see which areas use what percentage of space. This would tell you how much of the space is taken up by images, fonts etc... If you make a sample PDF available I'll be happy to take a look at it.
How much space in the PDF is used by different objects really depends on how the PDF was written and how efficiently it uses different compression algorithms. For example, images can be ZIP, JPEG or even JPEG-2000 compressed in PDF, the question is what HTML2PDF does with your images.
Fonts can be big as well - depending on what the size of the original font is.
Page content (your tables with inline styles etc) is written in a condensed textual format (for example 1 0 0 rg would be the instruction to set the fill color for text and line-art to red). Normally efficient PDF writers will write all of these textual instructions and then ZIP compress them, which means they won't take much space in the file.
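To get a feel for how well such instructions compress, here is a small sketch that builds a made-up content stream of repeated operators and deflates it with PHP's zlib functions. The stream is invented for illustration and is not what HTML2PDF actually emits.

```php
<?php
// Build a fake page content stream from many repeated drawing instructions.
$ops = '';
for ($row = 0; $row < 500; $row++) {
    $y = 800 - $row;
    $ops .= "1 0 0 rg\n";             // set the fill color to red
    $ops .= "50 $y 200 10 re f\n";    // define a rectangle and fill it
}

// gzcompress() produces zlib-format data, the same deflate scheme that
// PDF content streams commonly use.
$compressed = gzcompress($ops, 9);

printf("raw: %d bytes, compressed: %d bytes\n", strlen($ops), strlen($compressed));
```

Repetitive instructions like table cell styling shrink a lot this way, which is why fonts and images are usually the first things to check in a large PDF.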
So you'd really need to take a look at the PDF file itself to see where the space is used. That will allow you to start looking at the library to see if it can be made more efficient.
To start with...
Have a separate page for printing, distinct from the one used for display.
When printing, use a print.css stylesheet and include as few classes as possible.
In those three tables, keep only the required classes in the CSS. Use compressed images for printing, e.g. if you use BMP for display, use PNG for printing.
Keep the tables simple, using only TR and TD, make them less colorful, and do not use inline CSS.
This may help you get a smaller PDF.
I have managed to find what causes the large size of my output PDF files. It was the fonts.
For my task I used Courier Bulgarian (italic, normal and bold). This means I embedded three additional fonts.
In addition, for reasons I do not know, Helvetica is the default font for this PHP library, and although I am not using it, it is embedded in the PDFs too.
If you run into the same issue, open all files that reference "helvetica" and replace it with the font you are using.
Using TinyButStrong and OpenTBS, I created a script in PHP that opens multiple docx templates and replaces a lot of variables with values from a database. In a nutshell, clients can download their unique files, add information and pictures, and upload them again. This works excellently. But of course I wouldn't post here if there wasn't some sort of problem.
Because of the barcodes (I am using barcode fonts and embed them in Word because the documents will be scanned far later in the process), the documents get huge. Instead of 100 KB average, they'll easily get 7MB. This is a problem, because per year about 20.000 documents will be scanned. That's an extra +/- 130 GB per year.
It's a long story but we need docx, so we can't simply replace it with some sort of PHP / MySQL template that would be far more efficient.
Word has the option to just embed the font symbols that are being used to cut on the size. But that isn't an option, because the main template needs to have all chars available. It's also not an option to send the font to the users, since there are +/- 20.000 new ones each year.
Is there another solution to cut the file size or use compression? Perhaps in Word, PHP, FTP, Apache?
I'm afraid the solution of using the option "Embed fonts in the file" with "Embed only characters used in the document" cannot be exploited. Ms Word saves the font using a special format with the extension ODTTF (for example, you have it in "word\fonts\font1.odttf"). But this format is binary, it seems badly documented and thus it stays as a proprietary format. Only Ms Word will be able to build such a sub-file.
Since you don't have a lighter font for the barcode, the only solution I can see is to use an image instead of a font for your barcode:
OpenTBS has a feature to easily replace a picture inside a DOCX file (parameter "ope=changepic").
Barcode2Image tools are easy to find in PHP. For example : Barcode Generator.
Then you only have to code your process like this :
Load the DOCX template,
Create the temporary image of the barcode.
Change the image inside the template.
Merge the template, and save or send the result.
Delete the temporary image.
It's important to delete the temporary image only after the final merge of the template, because OpenTBS actually inserts the image only when method $tbs->Show() is called.
It's also important to use a different temporary file for each merge, because many merges can occur at the same time.
If temporary files have a prefix or are saved into a dedicated directory, then it is advisable to clean up old temporary images regularly.
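The last two points could be handled along these lines. This is only a sketch: the helper names, the "barcode_" prefix, the .png extension and the one-hour age limit are all assumptions, and the OpenTBS merge itself is left out.

```php
<?php
// Hypothetical helper: returns a unique path for one merge's barcode image.
function newBarcodeImagePath(string $dir, string $prefix = 'barcode_'): string
{
    return $dir . DIRECTORY_SEPARATOR . $prefix . bin2hex(random_bytes(8)) . '.png';
}

// Hypothetical helper: deletes prefixed files older than $maxAgeSeconds
// and returns how many were removed.
function cleanOldTempImages(string $dir, string $prefix, int $maxAgeSeconds): int
{
    $removed = 0;
    $files = glob($dir . DIRECTORY_SEPARATOR . $prefix . '*') ?: [];
    foreach ($files as $file) {
        if (is_file($file) && time() - filemtime($file) > $maxAgeSeconds && unlink($file)) {
            $removed++;
        }
    }
    return $removed;
}

// Demo in a dedicated directory: one fresh file and one that looks two hours old.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'barcode_demo_' . bin2hex(random_bytes(4));
mkdir($dir);

$fresh = newBarcodeImagePath($dir);
$stale = newBarcodeImagePath($dir);
touch($fresh);
touch($stale, time() - 7200);

// Remove anything older than one hour; only the stale file qualifies.
$removed = cleanOldTempImages($dir, 'barcode_', 3600);
```

The age limit should be comfortably longer than the slowest merge, so that a cleanup run never deletes an image that a concurrent merge has not yet inserted.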
I know there have been a lot of posts about these two but figured I'd address a glaring question I have. A designer of ours recently sent me a few files with TCPDF already tied in because a friend of his said it was "better".
In the past we have used FPDF for everything PDF generation in PHP but right off the bat I noticed an enormous glaring difference:
Filesize of fpdf.php: 46KB
filesize of tcpdf.php: 996 KB
note: the file sizes above are of the actual php file, not the PDF's generated.
I don't really have too much patience to sit down and look at all of the differences between the two but it doesn't seem it is worth the switch really for the huge file difference. Most on SO seem to really like TCPDF but what gives?
Main Question
Why the difference in size, and should I be worried about my server having to load a 1MB file hundreds or thousands of times a day versus a 50KB file that does nearly the same thing? I am NOT saying my PDF file is larger here, folks. It is the PHP script itself that is about 1MB versus about 46KB.
I avoid TCPDF because of its unfriendly license (you must leave link + logo intact in generated PDF documents). (Note: it seems the license has changed and is now standard LGPLv3: http://www.tcpdf.org/license.php)
That said, the usual cause for larger file size is embedded fonts. You can specify fonts in several different ways:
specify them and do not embed them (smallest size, however, text might not display correctly)
embed them fully (FPDF already supports this)
embed just the parts of characters that are used
The first option produces the smallest files - I guess this is what you use with FPDF. Note that your PDF might display differently on different systems.
The second option produces the largest files. Since the font is there, it is (in theory - I have no experience with this) possible to edit the file and add text in the same font.
The third option is the one that should be used in most cases, however, it is the most difficult to implement in libraries and core FPDF does not support it (TFPDF however does). It only embeds glyphs that are used so it produces cross-platform PDFs which are quite small.
The third option was not supported with TCPDF a few years ago (however, this might have changed by now). As I mentioned, it is also not supported in core FPDF - however, it is supported in MPDF and TFPDF (which I have successfully used in many projects).
On a side note, another reason for me not using TCPDF was the unfriendly and unhelpful attitude of Mr. Asuni (the developer), in contrast with the help from the FPDF / MPDF / TFPDF community (Oliver, Ian,...) on the FPDF forum. It took 2 weeks of correspondence on the forum before he admitted that TCPDF does not support partial font embedding. However, it is the license that is the real deal-breaker for me.
So, to answer your question: you could make TCPDF produce smaller files by not embedding fonts. However the license should be the main reason for switching from it. :)
I took a moment to compare both sources.
fpdf has almost no comments.
tcpdf has a few more methods but also has full blocks of phpDoc-like comments with explanations of every parameter, usage notes and examples in HTML format before every method and property. I'd say that's the main reason for the big file size.
Note that the sentence "YOU CAN'T REMOVE ANY TCPDF COPYRIGHT NOTICE OR LINK FROM THE GENERATED PDF DOCUMENTS" in the TCPDF license refers only to an INVISIBLE link (metadata) and not to the header logo and links that appear in the default examples.
This means that documents produced with TCPDF don't contain any visible artifact that interferes with your document layout.
Then, to answer your question, the document size depends on the new features you are using that are not present in FPDF. If you don't need some features and you want to reduce the document size, then disable Unicode, use only core fonts like helvetica and apply the other tips on the http://www.tcpdf.org website.
TCPDF contains hundreds of features more than FPDF, that's why it's bigger (and better).
FPDF is a very primitive library compared to what TCPDF is and what it can do.