In my web application I have some forms, and based on those forms users generate Excel and chart files (.xls and .png). Each user has to generate at least 2 Excel files and 5 PNG files.
The problem is that generating those files takes about 2 seconds per Excel file and 1 second per chart. I'm using the PHPExcel and pChart libraries.
How can I optimize this task?
Generation time definitely depends on how many records you are working with.
Let's suppose that number is in a small range.
Option 1
1 second for chart generation is OK. For Excel files, it is not.
The PHPExcel library is probably the one you want to replace with your own functions for faster XLS file generation. Check out here. I have used functions from that article before and they work like a charm. Of course, if you need nicely formatted output, you are forced to use PHPExcel and there is nothing you can do about that.
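For reference, the functions in that kind of article typically write raw BIFF records straight to the output stream. A minimal sketch of the technique (the function names and the usage snippet are illustrative, not from the article; the record values come from the legacy BIFF format):

    // Hedged sketch of raw BIFF record writing, the technique such articles describe.
    function xlsBOF() { echo pack("ssssss", 0x809, 0x8, 0x0, 0x10, 0x0, 0x0); } // beginning of file
    function xlsEOF() { echo pack("ss", 0x0A, 0x00); }                          // end of file
    function xlsWriteNumber($row, $col, $value) {
        echo pack("sssss", 0x203, 14, $row, $col, 0x0);
        echo pack("d", $value);
    }
    function xlsWriteLabel($row, $col, $value) {
        $len = strlen($value);
        echo pack("ssssss", 0x204, 8 + $len, $row, $col, 0x0, $len);
        echo $value;
    }

    // Usage: stream a bare-bones .xls directly to the browser
    header('Content-Type: application/vnd.ms-excel');
    header('Content-Disposition: attachment; filename="report.xls"');
    xlsBOF();
    xlsWriteLabel(0, 0, "Total");
    xlsWriteNumber(0, 1, 42);
    xlsEOF();

This writes the file far faster than PHPExcel, at the cost of any real formatting.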
Option 2
Cache your files server-side. For example, once you have generated a chart/XLS, save it in an appropriate place; when the user later requests the same chart/XLS, you don't need to rebuild it.
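A minimal sketch of that caching idea, assuming the generated file is fully determined by the form parameters; the cache path and the generateExcelReport() helper are illustrative, not from the question:

    // Derive a cache key from everything that influences the output,
    // and only regenerate when no cached copy exists yet.
    $cacheKey  = md5(serialize($formParams));          // $formParams: the form inputs
    $cachePath = "/var/cache/reports/$cacheKey.xls";   // illustrative location

    if (!file_exists($cachePath)) {
        generateExcelReport($formParams, $cachePath);  // hypothetical: your PHPExcel code
    }

    header('Content-Type: application/vnd.ms-excel');
    readfile($cachePath);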
Related
I need to export data from PHP to an Excel template that contains VBA code and data validation.
I tried the PHPExcel library, but it removes the VBA code and the data validations from the template.
I tried the PHPReport library and didn't get a proper solution either.
The template contains multiple worksheets and they are interdependent.
E.g.: worksheet 1 contains employee data, and worksheet 2 contains salary data keyed to employee names.
I have spent a great deal of time working on this problem, and the problem of memory consumption for large data sets. All of the PHP libs I have found keep all of the cells in memory, which is not viable for anything more than a small sheet.
What I ended up doing was writing a set of Java utilities using Apache POI and packaged them with PHP/Java Bridge so I can call them from PHP. This will allow you to create a new workbook based upon a template, keeping the macros intact. You can also use POI's streaming API, so you can handle massive data sets without crashing your server.
If you have Java chops, I highly recommend going this route; it's really the only way to do brain surgery on Excel files from PHP.
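To give a taste of what the PHP side can look like, here is a hedged sketch (not the answerer's actual utilities) using POI's streaming SXSSF API through PHP/Java Bridge; the bridge URL, paths, and row counts are assumptions:

    // Assumes a running PHP/Java Bridge servlet with Apache POI on its classpath.
    require_once "http://localhost:8080/JavaBridge/java/Java.inc";

    // SXSSF keeps only a sliding window of rows in memory (here: 100)
    $wb    = new Java("org.apache.poi.xssf.streaming.SXSSFWorkbook", 100);
    $sheet = $wb->createSheet("data");

    for ($r = 0; $r < 1000000; $r++) {
        $row = $sheet->createRow($r);
        $row->createCell(0)->setCellValue("row $r");
    }

    $out = new Java("java.io.FileOutputStream", "/tmp/huge.xlsx");
    $wb->write($out);
    $out->close();
    $wb->dispose();   // cleans up SXSSF's temporary files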
If you have any questions about how to do this or would like some example code, I'll be glad to help.
I need to generate a large PDF, 2480 pages to be exact.
Currently I am using InDesign, and while the output is exactly what I want, I would rather not be involved in the document creation process.
It takes 31 minutes for InDesign to execute the data merge, generate the PDF, save the PDF, and save the pdf.indd file. (I don't really need the pdf.indd file, but I would rather not have to recreate the data merge if something were to happen to the PDF.)
I am hoping for a PHP or similar solution. Currently my data is stored in MySQL.
The majority of the PDF is static text, with 19 dynamically driven text fields.
There is one image on the PDF, 75x100 px @ 72 dpi.
The output needs to be exact; the PDF file is printed and cut in half at 4.25 inches.
I have tried TCPDF; while it is fast at generating up to 50 pages, after that it would rather die than give me an output. I have also played with mPDF and found it to be, well, not as friendly. I have also considered generating many small files and using some utility to merge the smaller PDFs into one large PDF, though that seems like driving around the mountain.
Any thoughts would be helpful.
You certainly can create documents directly with PHP, but it can be difficult. One method is to use one of the various PDF classes to create the document, as you have found. Another is to create images (using ImageMagick, GD, etc.) and convert those to PDF. (This method is less efficient, as you are creating raster graphics, making the whole PDF page one giant graphic.)
However, I think you should consider simply scripting InDesign. InDesign has the capability to read data in via XML and create the document. This way, the design of your document isn't dependent on your programming abilities and you can still have the power of programmatically creating the document.
When it comes to a huge number of pages in a PDF, LaTeX is always the best answer. Nothing handles huge PDF generation as fast, accurately, and elegantly as LaTeX.
Check this question to see how to retrieve your data from the database.
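To make the approach concrete (my sketch, not part of the answer): generate a .tex file from the database rows and compile it with pdflatex. The table, column name, and paths are illustrative, and real data would need LaTeX special characters escaped:

    // Build one big .tex document from MySQL rows, then compile it.
    $tex = "\\documentclass{article}\n\\usepackage[margin=1cm]{geometry}\n\\begin{document}\n";
    foreach ($pdo->query("SELECT field1 FROM records") as $row) {
        // illustrative page body; escape \ { } $ & # % etc. in real data
        $tex .= "Static text with " . $row['field1'] . " merged in.\n\\newpage\n";
    }
    $tex .= "\\end{document}\n";

    file_put_contents("/tmp/job.tex", $tex);
    exec("pdflatex -interaction=nonstopmode -output-directory=/tmp /tmp/job.tex");

pdflatex stays fast even at thousands of pages, which is exactly the regime where TCPDF gave up.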
I am doing bulk generation of PDF files based on templates, and I ran into big performance issues pretty fast.
My current scenario is as follows:
get the data to be filled from the db
create an FDF based on a single data row and the PDF form (steps 2-4 are sketched after this list)
write the .fdf file to disk
merge the PDF with the FDF using pdftk (fill_form with the flatten command)
continue iterating over the rows until all .pdfs are generated
all the generated files are merged together in the end and the single PDF is given to the client
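For readers unfamiliar with the format, a hedged sketch of steps 2-4; the FDF skeleton is the standard one, while the field names and paths are illustrative (real values also need their parentheses and backslashes escaped):

    // Build an FDF for one data row, then let pdftk fill and flatten the form.
    function buildFdf(array $fields) {
        $body = "";
        foreach ($fields as $name => $value) {
            $body .= "<< /T ($name) /V ($value) >>\n";
        }
        return "%FDF-1.2\n1 0 obj\n<< /FDF << /Fields [\n$body] >> >>\nendobj\n"
             . "trailer\n<< /Root 1 0 R >>\n%%EOF\n";
    }

    $fdf = buildFdf(['name' => $row['name'], 'amount' => $row['amount']]); // step 2
    file_put_contents('/tmp/row42.fdf', $fdf);                             // step 3
    exec('pdftk form.pdf fill_form /tmp/row42.fdf output /tmp/row42.pdf flatten'); // step 4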
I use passthru to give the raw output to the client (it saves the time of writing a file), but this is only a small performance improvement. The total operation takes about 50 seconds for 200 records, and I would like to get it down to at least 10 seconds in some way.
The ideal scenario would be handling all these PDFs in memory and not writing every single one to a separate file, but then the output would be impossible, as I can't pass that kind of data to an external tool like pdftk.
One other idea was to generate one big .fdf file with all those rows, but it looks like that is not allowed.
Am I missing something very trivial here?
I'm thankful for any advice.
PS. I know I could use a good library like PDFlib, but I am only considering openly licensed libraries right now.
EDIT:
I have been trying to figure out the syntax to build an .fdf file with multiple pages using the same PDF as a template; I spent a few hours on it and couldn't find any good documentation.
After being faced with the same problem for a long time (I wanted to generate my PDFs based on LaTeX), I finally decided to switch to another crude but effective technique:
I generate my PDFs in two steps: first I generate HTML with a template engine like Twig or Smarty; second, I use mPDF to generate PDFs out of it. I tried many other html2pdf frameworks and ended up using mPDF; it's very mature and has been in development for a long time (frequent updates, rich functionality). The benefit of this technique: you can use CSS to design your documents (mPDF has full CSS support), which brings all the benefits of CSS (http://www.csszengarden.com), and you can generate dynamic tables very easily.
mPDF parses HTML tables and looks for the thead and tfoot elements, repeating them on each page if your tables are bigger than one page. You also have the possibility to define page header and page footer elements with dynamic entities like the page number and so on.
I know using this detour seems to be a workaround, but to be honest, no LaTeX or PDF engine is as strong and simple as HTML!
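A condensed sketch of that two-step pipeline; the template name and data are illustrative, and the namespaced calls below are the current Twig and mPDF APIs:

    require 'vendor/autoload.php';

    // Step 1: render HTML with Twig
    $twig = new \Twig\Environment(new \Twig\Loader\FilesystemLoader('templates'));
    $html = $twig->render('report.html.twig', ['rows' => $rows]);

    // Step 2: let mPDF turn the HTML (and its CSS) into a PDF
    $mpdf = new \Mpdf\Mpdf();
    $mpdf->WriteHTML($html);
    $mpdf->Output('report.pdf', \Mpdf\Output\Destination::FILE);

The nice part of this split is that designers can iterate on the template and CSS without ever touching the PDF code.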
Try a different, less complex library like FPDF (http://www.fpdf.org/).
I find it quite good and light.
Always find libraries that are small and only do what you need them to do.
The bigger the library, the more resources it consumes.
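For a sense of how small FPDF is, the canonical minimal example (the text and font choice are illustrative):

    require 'fpdf.php';

    $pdf = new FPDF();
    $pdf->AddPage();
    $pdf->SetFont('Arial', 'B', 16);
    $pdf->Cell(40, 10, 'Hello World!');
    $pdf->Output();   // sends the PDF to the browser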
This won't help your multiple-page problem, but I notice that pdftk accepts the - character to mean 'read from standard input'.
You may be able to send the .fdf to the pdftk process via its stdin, in order to avoid having to write them to disk.
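A hedged sketch of that stdin trick with proc_open; the form and output paths are illustrative, and $fdf is assumed to hold the FDF string built for one row:

    // '-' tells pdftk to read the FDF from standard input.
    $cmd  = 'pdftk form.pdf fill_form - output /tmp/row.pdf flatten';
    $spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w'], 2 => ['pipe', 'w']];
    $proc = proc_open($cmd, $spec, $pipes);

    fwrite($pipes[0], $fdf);   // feed the FDF without touching the disk
    fclose($pipes[0]);         // EOF makes pdftk start processing

    $errors = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);
    proc_close($proc);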
Using TinyButStrong and OpenTBS I created a PHP script that opens multiple DOCX templates and replaces a lot of variables with values from a database. In a nutshell, clients can download their unique files, add information and pictures, and upload them again. This works excellently. But of course I wouldn't post here if there weren't some sort of problem.
Because of the barcodes (I am using barcode fonts and embedding them in Word because the documents will be scanned much later in the process), the documents get huge. Instead of 100 KB on average, they easily reach 7 MB. This is a problem, because about 20,000 documents will be scanned per year. That's an extra +/- 130 GB per year.
It's a long story, but we need DOCX, so we can't simply replace it with some sort of PHP/MySQL template that would be far more efficient.
Word has the option to embed only the font symbols that are actually used, to cut down the size. But that isn't an option, because the main template needs to have all characters available. It's also not an option to send the font to the users, since there are +/- 20,000 new ones each year.
Is there another solution to cut the file size, or some way to use compression? Perhaps in Word, PHP, FTP, or Apache?
I'm afraid the option "Embed fonts in the file" with "Embed only characters used in the document" cannot be exploited. MS Word saves the font using a special format with the extension ODTTF (for example, "word\fonts\font1.odttf"). This format is binary and seems badly documented, so it remains proprietary; only MS Word can build such a sub-file.
Since you don't have any lighter font for the barcode, the only solution I can see is to use an image instead of a font for your barcode:
OpenTBS has a feature to easily replace a picture inside a DOCX file (parameter "op=changepic").
Barcode2Image tools are easy to find in PHP. For example : Barcode Generator.
Then you only have to code your process like this :
Load the DOCX template.
Create the temporary image of the barcode.
Change the image inside the template.
Merge the template, and save or send the result.
Delete the temporary image.
It's important to delete the temporary image only after the final merge of the template, because OpenTBS actually inserts the image only when the method $tbs->Show() is called.
It's also important to use a different temporary file for each merge, because many merges can occur at the same time.
If temporary files have a prefix or are saved into a dedicated directory, it is advisable to clean up old temporary images regularly.
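Putting the steps together, roughly; the field name, the barcode helper, and the exact op=changepic field syntax expected in the template are assumptions to check against the OpenTBS documentation:

    include_once 'tbs_class.php';
    include_once 'tbs_plugin_opentbs.php';

    $TBS = new clsTinyButStrong;
    $TBS->Plugin(TBS_INSTALL, OPENTBS_PLUGIN);
    $TBS->LoadTemplate('template.docx');                         // 1. load the template

    $tmpImg = sys_get_temp_dir() . '/' . uniqid('bc_') . '.png'; // unique per merge
    makeBarcodePng($code, $tmpImg);                              // 2. hypothetical barcode helper

    // 3. the template holds a placeholder picture bound to a field using
    //    op=changepic; merging the image path swaps the picture
    $TBS->MergeField('barcode', $tmpImg);

    $TBS->Show(OPENTBS_FILE, 'result.docx');                     // 4. merge and save;
                                                                 //    the image is embedded here
    unlink($tmpImg);                                             // 5. delete only after Show()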
Using PHPExcel I can run each tab separately and get the results I want, but if I add them all into one Excel file it just stops, with no error or anything.
Each tab consists of about 60 to 80 thousand records, and I have about 15 to 20 tabs. So about 1,600,000 records split into multiple tabs (this number will probably grow as well).
Also, I have worked around the 65,000-row limitation of .xls by using the .xlsx extension, with no problems as long as I run each tab in its own Excel file.
Pseudo code:
read data from db
start the PHPExcel process
parse out data for each page (some styling/formatting but not much)
(each numeric field value does get summed up in a totals column at the bottom of the excel using the formula SUM)
save excel (xlsx format)
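In PHPExcel terms, that pseudo code corresponds roughly to the following (a sketch; the sheet layout and column names are illustrative):

    $xls = new PHPExcel();
    foreach ($tabs as $i => $tab) {                          // 15-20 tabs
        $sheet = ($i === 0) ? $xls->getActiveSheet() : $xls->createSheet();
        $sheet->setTitle($tab['name']);
        $r = 1;
        foreach ($tab['rows'] as $row) {                     // 60-80k rows per tab
            $sheet->setCellValue("A$r", $row['label']);
            $sheet->setCellValue("B$r", $row['value']);
            $r++;
        }
        $last = $r - 1;
        $sheet->setCellValue("B$r", "=SUM(B1:B$last)");      // totals row via SUM
    }
    $writer = PHPExcel_IOFactory::createWriter($xls, 'Excel2007'); // xlsx writer
    $writer->save('report.xlsx');

Every cell here becomes an object held in memory until save(), which is why the single-workbook run dies while the per-tab runs survive.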
I have 3GB of RAM so this is not an issue and the script is set to execute with no timeout.
I have used PHPExcel in a number of projects and have had great results but having such a large data set seems to be an issue.
Has anyone ever had this problem? Workarounds? Tips? etc...
UPDATE:
The error log shows: memory exhausted.
Besides adding more RAM to the box, are there any other tips?
Has anyone ever saved the current state and edited the Excel file with new data?
I had the exact same problem, and googling around did not turn up a viable solution.
As PHPExcel generates objects and stores all data in memory before finally generating the document file, which itself is also stored in memory, setting higher memory limits in PHP will never entirely solve this problem; that solution does not scale well.
To really solve the problem, you need to generate the XLS file "on the fly". That's what I did, and now I can be sure that "download SQL result set as XLS" works no matter how many (millions of) rows are returned by the database.
The pity is, I could not find any library that features on-the-fly XLS(X) generation.
I found this article on IBM Developer Works which gives an example on how to generate the XLS XML "on-the-fly":
http://www.ibm.com/developerworks/opensource/library/os-phpexcel/#N101FC
It works pretty well for me: I have multiple sheets with LOTS of data and did not even touch the PHP memory limit. It scales very well.
Note that this example uses the Excel plain XML format (file extension .xml), so you can send your uncompressed data directly to the browser.
http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats#Excel_XML_Spreadsheet_example
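The core of the approach is emitting the SpreadsheetML markup row by row while fetching from the database, so nothing accumulates in memory. A hedged sketch, assuming a mysqli result set in $result:

    header('Content-Type: application/vnd.ms-excel');
    header('Content-Disposition: attachment; filename="export.xml"');

    echo '<?xml version="1.0"?>' . "\n";
    echo '<?mso-application progid="Excel.Sheet"?>' . "\n";
    echo '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"'
       . ' xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">' . "\n";
    echo "<Worksheet ss:Name=\"Sheet1\"><Table>\n";

    while ($row = $result->fetch_assoc()) {     // one row at a time
        echo '<Row>';
        foreach ($row as $value) {
            echo '<Cell><Data ss:Type="String">'
               . htmlspecialchars($value, ENT_XML1)
               . '</Data></Cell>';
        }
        echo "</Row>\n";
        flush();                                // push to the client, keep memory flat
    }

    echo "</Table></Worksheet></Workbook>\n";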
If you really need to generate an XLSX, things get even more complicated. XLSX is a compressed archive containing multiple XML files, so you must write all your data to disk (or keep it in memory, with the same problem as PHPExcel) and then create the archive from that data.
http://en.wikipedia.org/wiki/Office_Open_XML
It may also be possible to generate compressed archives on the fly, but that approach seems really complicated.