Fastest PDF generation in PHP? - php

I'm attempting to generate some reports dynamically, very simple HTML tables with borders.
I've tried TCPDF and it renders up to 400 rows (about 20 pages) just fine, but it can't handle anything more than that. DOMPDF can't even get that far.
These reports can be thousands of rows.
Any idea on a faster library or a better plan of attack?

Try the php-wkhtml2x PHP extension. It uses the popular WebKit rendering engine (the one Chrome and Safari use).

I use the FPDF library; the output is fast and resource-efficient. Try it out:
http://www.fpdf.org/

I don't know if these methods are the fastest, but they can certainly handle more than 20 pages.
You could use LaTeX in combination with PHP:
http://www.linuxjournal.com/article/7870
or Zend_Service_LiveDocx_MailMerge
http://www.phphatesme.com/blog/webentwicklung/pdf-erzeugung-mit-dem-zend-framework/

Try DocRaptor.com. It's a web-based app that converts HTML to PDF. Easy to use.

Depending on the report, it's possible that PHP is not the right tool for this, and you might consider an alternative language such as Perl. I have no experience with other server-side languages, but it is something to keep in mind. Definitely follow @Pekka's advice: determine the limits and work on adjusting them.

Fact
PHP can handle thousands of rows.
My assumption
Most probably you are fetching the data from the database into an array, and then looping over that array to write the rows.
This will eat memory.
My suggestion
Write into the PDF as each row is fetched from the database, and drop the step of storing everything in an array first (see the sketch below).
Check the execution time and memory limits in php.ini.
Finally, once you do generate it, consider whether a PDF reader can even cope with it :-)
It will surely be huge.
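A minimal sketch of that streaming approach, assuming FPDF and mysqli; the connection details and column names are made up:

```php
<?php
require 'fpdf.php'; // FPDF: http://www.fpdf.org/

// Hypothetical connection and query - adjust credentials, table and columns.
$db  = new mysqli('localhost', 'user', 'pass', 'reports');
$res = $db->query('SELECT col_a, col_b, col_c FROM report_rows', MYSQLI_USE_RESULT);

$pdf = new FPDF();
$pdf->SetFont('Arial', '', 9);
$pdf->AddPage(); // automatic page breaks handle the remaining pages

// Write each row straight into the PDF as it arrives - no intermediate array.
while ($row = $res->fetch_assoc()) {
    $pdf->Cell(60, 6, $row['col_a'], 1);
    $pdf->Cell(60, 6, $row['col_b'], 1);
    $pdf->Cell(60, 6, $row['col_c'], 1);
    $pdf->Ln();
}

$res->close();
$pdf->Output(); // send the finished PDF to the browser
```

MYSQLI_USE_RESULT gives an unbuffered result set, so neither the result set nor an intermediate array ever sits in PHP memory all at once.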

Related

Database for Content - OK to store HTML?

The basic question is: is it safe to store HTML in a database if I restrict who can submit to it?
I have a pretty simple question. I provide video tutorials and other content. Without spending months writing a proper BBCode parser, I would need to store the HTML so I can have it look exactly the way I want when I grab it from the database.
Basically I plan to store all information in the database about a tutorial series and each episode. I would like to have some formatting for the descriptions for both so I can add multiple paragraphs, ordered and unordered lists, links to required resources, and so on.
I am using PHP and creating my own database. I am using phpMyAdmin to store the information in the table right now. I will use a user with read-only rights when I pull the information in the PHP code.
What is the best way to do this? Thank you!
As others have pointed out, there's nothing dangerous about storing HTML in the DB. But when you display it you need to know the HTML is safe, and seeing as you're the only one editing it, I see no problem.
However, I wouldn't store HTML at all. If all you need are headings, paragraphs, lists, links, images etc., I'd say Markdown is a perfect fit. The benefit of Markdown is that it looks just like normal text (i.e. you could send your articles as e-mails or save them as .txt documents), it takes up a lot less space than HTML, and you don't have to change it when HTML itself gets updated.
http://michelf.ca/projects/php-markdown/
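A minimal sketch of the conversion at display time, assuming the michelf/php-markdown Composer package (older single-file releases expose a plain Markdown() function instead):

```php
<?php
require 'vendor/autoload.php';

use Michelf\Markdown;

// Markdown as it would come out of the database...
$markdown = "## Episode 1\n\nCovers:\n\n- setup\n- first steps\n\nSee [the docs](http://example.com).";

// ...converted to HTML only when the page is rendered.
$html = Markdown::defaultTransform($markdown);
echo $html;
```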
From a security point of view, storing your HTML in a database is no less secure than storing it anywhere else - if you are the only author of that HTML. Then again, if other people can author HTML on your website, it doesn't matter where you store it - only how you sanitize it and how and where you display it.
Whether or not it is an efficient way to store HTML is a completely different matter. If I were you, I would use a decent templating system and store the HTML in files.
Storing HTML code is fine. But if it is not from a trusted source, you need to check it and allow only a secure subset of markup. The HTML Tidy library will help you with that.
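If Tidy isn't available, a rough sketch of the "secure subset" idea using PHP's built-in strip_tags(); note that it keeps attributes on allowed tags, so it is only a first pass, not a full sanitizer:

```php
<?php
// Whitelist of basic structural tags; everything else is stripped.
$allowed = '<p><br><a><ul><ol><li><strong><em><h2><h3>';

$dirty = '<p onclick="x()">Hi <script>alert(1)</script></p>';
$clean = strip_tags($dirty, $allowed);

// $clean is now: <p onclick="x()">Hi alert(1)</p>
// The <script> tags are gone, but the onclick attribute survives -
// for genuinely untrusted input use a real sanitizer such as HTML Purifier.
echo $clean;
```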
Also, you need to account for a future change in website design, so do not use too much markup - only basic tags. To make it look the way you want, use global CSS rules and semantically named classes in the markup.
But even better is to use Markdown or another wiki-like syntax. There are nice JS editors for Markdown with real-time preview (like the one here at Stack Overflow), and you can avoid HTML altogether.
My initial answer to "should I store HTML in a DB?" is generally no. Sure, it's safe if you know what you're storing, but are you really considering best practices when you ask only that question? The true answer is "it depends".
I'm sure there are things like WordPress that store HTML in a database; however, as a professional website designer, I like to remember the separation-of-concerns principle. How reusable is HTML stored in your database for a mobile app? Is your back end now in charge of display as well as data? Do you have many implementation possibilities for a front end, or are you stuck with whatever the back end produces? What if you want a different color and you've nested ul within ul within ul - how easy is the CSS styling now? How easy is it to change or update that HTML?
I could be wrong, but even Sitecore and Kentico may store an HTML template in a database somewhere; the data associated with that template, though, is a model, not something embedded directly in the template.
So when you are considering this question, you may want to store your models in one place and your templates in another. That way, when you say "hey, let's build a mobile app", you can grab your data and go, rather than creating yet another table to store the same data.
I made a really big mistake by storing text data in MongoDB GridFS with compression and using mongodump for daily backups. The GridFS store is only 1 GB of text files, but after each backup memory usage rose, sometimes by 1 GB a day - after a month, 20 GB was sitting in memory - because of how this backup works.
With MongoDB you should take a snapshot of the data folder rather than run mongodump. The likely reason is that mongodump copies unused data from disk into memory and then writes the BSON dump, so text that had not been touched for a long time was loaded into memory anyway. That seems to be how the backup works: even right now my MongoDB uses about 200 MB of RAM, and after running mongodump it can rise to 3 GB.
So I think the best solution is to use the filesystem for storing HTML files, since even a RAID controller like the PERC H700 has excellent caching features, including read-ahead. It has some limitations, such as network access, and in my experience some data got corrupted over time and needed chkdsk to repair it, as many GB of data were added or removed daily. You should also consider proper RAID features like write-through caching, which prevents data loss on power failure.
SQLite is not designed for extremely large data sets, so you shouldn't use it; it is also missing many caching features.
A less-than-perfect solution is to use MariaDB, or your own caching script in Node.js backed by memcached or a Linux ramdisk with maybe 1 GB of hot cache. Relying on Node.js's internal caching mechanism can produce memory leaks over time, so I use it only for the network side; I/O uses filesystem locks, and the most-used "hot" files can be cached in RAM or just left as they are.

Bulk template based pdf generation in PHP using pdftk

I am doing bulk generation of PDF files based on templates, and I ran into big performance issues pretty fast.
My current scenario is as follows:
get the data to be filled in from the db
create an .fdf file based on a single data row and the PDF form
write the .fdf file to disk
merge the PDF with the .fdf using pdftk (fill_form with the flatten option)
continue iterating over the rows until all the PDFs are generated
merge all the generated files together at the end and serve the single PDF to the client
I use passthru to send the raw output to the client (saves the time of writing a file), but that is only a small performance improvement. The whole operation takes about 50 seconds for 200 records, and I would like to get it down to at most 10 seconds somehow.
The ideal scenario would be to handle all these PDFs in memory and not write every single one to a separate file, but then producing the output seems impossible, since I can't pass that kind of data to an external tool like pdftk.
One other idea was to generate one big .fdf file with all those rows, but it looks like that is not allowed.
Am I missing something very trivial here?
I'm thankful for any advice.
PS. I know I could use a good commercial library like PDFlib, but I am only considering openly licensed libraries right now.
EDIT:
I am still trying to figure out the syntax for building an .fdf file with multiple pages that use the same PDF as a template; I spent a few hours on it and couldn't find any good documentation.
After being faced with the same problem for a long time (I wanted to generate my PDFs from LaTeX), I finally decided to switch to another crude but effective technique:
I generate my PDFs in two steps: first I generate HTML with a template engine like Twig or Smarty; second I use mPDF to generate PDFs out of it. I tried many other html2pdf frameworks and ended up using mPDF - it is very mature and has been developed for a long time (frequent updates, rich functionality). The benefit of this technique: you can use CSS to design your documents (mPDF has thorough CSS support) - which brings all the usual benefits of CSS (http://www.csszengarden.com) - and generating dynamic tables is very easy.
mPDF parses the HTML tables, looks for the thead and tfoot elements, and repeats them on each page if your tables are bigger than one page. You also have the possibility to define page header and footer elements with dynamic entities such as the page number.
I know this detour seems like a workaround, but to be honest, no LaTeX or PDF engine is as powerful and simple as HTML!
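A minimal sketch of that two-step flow; the Twig template name and data are made up, and the mPDF 7+ namespaced API is assumed (older releases use new mPDF() instead):

```php
<?php
require 'vendor/autoload.php';

// Hypothetical report data - in practice this comes from your DB layer.
$rows = [['name' => 'Widget', 'qty' => 2], ['name' => 'Gadget', 'qty' => 5]];

// Step 1: render the HTML with a template engine (Twig here, purely as an example).
$twig = new \Twig\Environment(new \Twig\Loader\FilesystemLoader('templates'));
$html = $twig->render('report.html.twig', ['rows' => $rows]);

// Step 2: feed the CSS and HTML to mPDF.
$mpdf = new \Mpdf\Mpdf();
$mpdf->WriteHTML(file_get_contents('templates/report.css'), \Mpdf\HTMLParserMode::HEADER_CSS);
$mpdf->WriteHTML($html, \Mpdf\HTMLParserMode::HTML_BODY);
$mpdf->Output('report.pdf', \Mpdf\Output\Destination::DOWNLOAD);
```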
Try a different, less complex library like FPDF (http://www.fpdf.org/).
I find it quite good and lightweight.
Always find libraries that are small and only do what you need them to do.
The bigger the library the more resources it consumes.
This won't help your multiple-page problem, but I notice that pdftk accepts the - character to mean 'read from standard input'.
You may be able to send the .fdf to the pdftk process via its stdin, in order to avoid having to write the files to disk.
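A rough sketch of that idea, assuming the FDF content has already been built in $fdf and the form template lives at template.pdf:

```php
<?php
// '-' after fill_form makes pdftk read the FDF from stdin;
// 'output -' makes it write the filled, flattened PDF to stdout.
$cmd = 'pdftk template.pdf fill_form - output - flatten';

$spec = [
    0 => ['pipe', 'r'], // pdftk's stdin  (we write the FDF here)
    1 => ['pipe', 'w'], // pdftk's stdout (we read the PDF here)
    2 => ['pipe', 'w'], // stderr, for diagnostics
];

$proc = proc_open($cmd, $spec, $pipes);
if (!is_resource($proc)) {
    throw new RuntimeException('Could not start pdftk');
}

fwrite($pipes[0], $fdf);   // $fdf = the generated FDF content
fclose($pipes[0]);

$pdf = stream_get_contents($pipes[1]);
fclose($pipes[1]);
fclose($pipes[2]);
proc_close($proc);

header('Content-Type: application/pdf');
echo $pdf; // no temporary files involved
```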

Using PHP and SQL to generate Excel files with statistical graphs and piecharts?

I've got a database with a lot of statistical data which I'm looking to export to an Excel file.
So first of all, does anyone here have experience with any popular libraries/scripts for generating Excel files?
Secondly, I'm thinking it would be nice to display some of these very dull and boring numbers in pie charts and graphs. Again, does anyone have experience with popular libraries/scripts for generating such things inside an Excel file?
Thanks a lot!
Try the PHPExcel library - http://phpexcel.codeplex.com/
It satisfied all my needs, including the ones you specified, and more.
You might need to increase the memory limit a bit, though, since with large data sets like yours it consumes a significant amount of memory.
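A minimal sketch of the classic PHPExcel API (the project has since been superseded by PhpSpreadsheet, but the idea is the same); the sheet contents here are made up:

```php
<?php
require 'PHPExcel.php';

ini_set('memory_limit', '512M'); // large sheets need headroom

$excel = new PHPExcel();
$sheet = $excel->getActiveSheet();
$sheet->setTitle('Stats');

// Header row plus a couple of hypothetical figures.
$sheet->setCellValue('A1', 'Month');
$sheet->setCellValue('B1', 'Visitors');
$sheet->setCellValue('A2', 'January');
$sheet->setCellValue('B2', 1234);

$writer = new PHPExcel_Writer_Excel2007($excel);
$writer->save('stats.xlsx');
```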
I can second the PHPExcel library; I have used it a couple of times in the past and it works great. Unfortunately, as far as I remember, you cannot create charts with it. Someone else brought up this question before - maybe you will find something helpful here:
Create Excel chart programmatically in PHP

How can I reduce time to generate PDF dynamically using PHP?

I am generating a PDF dynamically using the html2ps PHP library, and I want to reduce the generation time. Is there any way to cut the time down or optimize the process?
Help me please.
Are you running this code on recent hardware?
While it may sound like avoiding the question, the Coding Horror blog (written by one of the people who made this site) preaches that you shouldn't spend time tweaking performance if your hardware is the limit.
If you're doing this on a single-core CPU (e.g. a Pentium 4), you are wasting your time worrying about which library to use or what code to change. Even the slowest Core 2 Duos and newer AMDs start at twice the speed of the best Pentium 4.
PS: I wasn't able to find the article on their site to link for you.
PPS: Most Pentium 4 motherboards support the 65nm Core 2 Duos.
One way to optimize is to try another library. I use dompdf when I need to convert HTML to PDF, and I haven't found any need for optimization: it's very fast, supports CSS properly, and produces accurate results.
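For reference, a minimal dompdf call (the modern namespaced API is assumed here; older releases used a global DOMPDF class and a config file instead):

```php
<?php
require 'vendor/autoload.php';

use Dompdf\Dompdf;

$dompdf = new Dompdf();
$dompdf->loadHtml('<h1>Report</h1><p>Hello world</p>'); // your generated HTML
$dompdf->setPaper('A4', 'portrait');
$dompdf->render();
$dompdf->stream('report.pdf'); // send the PDF to the browser
```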
Reduce the complexity of the output.
Reduce the output quantity.
If the PDF generation is impacting other operations, delegate it to another process or server.
I'm sure you can avoid a lot of processing by skipping the HTML/CSS step(s) and going directly to PDF. Check out FPDF or PDFlib.

Direct Print webpage in PDF file

On my site I'm fetching MySQL data using PHP. I want to open that data as a PDF file when I click a "print PDF" button - is that possible?
First of all, if you want a high-quality professional product to do that, you want Prince XML.
If you are looking for an open source tool to achieve something similar, you can look at this SO question.
You could prepare a static PDF form file and just fill it in with values using PHP's FDF module.
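A rough sketch with the old fdf PECL extension (rarely enabled these days; the field names and URL below are made up, and pdftk is a more common way to do the same thing now):

```php
<?php
// Requires the fdf extension and a PDF form whose fields are named
// customer_name and total (both hypothetical).
$row = ['name' => 'Jane Doe', 'total' => '99.95']; // hypothetical DB row

$fdf = fdf_create();
fdf_set_value($fdf, 'customer_name', $row['name'], 0);
fdf_set_value($fdf, 'total', $row['total'], 0);
fdf_set_file($fdf, 'http://example.com/form.pdf'); // the static form to fill in
fdf_save($fdf, 'filled.fdf'); // Acrobat opens form.pdf and fills it from this FDF
fdf_close($fdf);
```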
It depends on which platform you are using. This would be an easy job if you were using Groovy on Grails; there are plugins which facilitate PDF reporting, like the Jasper plugin.
Luis
Check out jsPDF, an open-source library for generating PDF documents using nothing but JavaScript.
You can process the data with Apache FOP after transforming it to XML (http://xmlgraphics.apache.org/fop/).
If your page is template based, you can create a template which produces XML output and process that. You'll have extremely good control over the PDF construction. The tradeoff is that it is not a "plug this in and it will work" solution, but I've done it, and once it's set up it works like a charm.
I've used TCPDF in the past; it's a little kludgy but can definitely get the job done. (http://www.tecnick.com/public/code/cp_dpage.php?aiocp_dp=tcpdf)
The FPDF module in PHP is simple enough for getting the data together. It is a safe option, since you know exactly what data you are passing to the PDF engine. There are some HTML-to-PDF options which can take in a bunch of HTML and output it as PDF, but they can get it quite wrong without you knowing.
I have used WKHTMLTOPDF/WKHTMLTOIMAGE on Linux machines a number of times, on many projects. It works like a charm and is easy to use - just a script that you run.
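For illustration, the whole thing can be as small as one shell call from PHP; the paths below are placeholders, and escaping matters if any of them ever come from user input:

```php
<?php
// Render a saved HTML report to PDF with the wkhtmltopdf binary.
$in  = escapeshellarg('/tmp/report.html');
$out = escapeshellarg('/tmp/report.pdf');

exec("wkhtmltopdf $in $out", $output, $status);
if ($status !== 0) {
    die('wkhtmltopdf failed: ' . implode("\n", $output));
}

header('Content-Type: application/pdf');
readfile('/tmp/report.pdf');
```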
