PHPExcel large data sets with multiple tabs - memory exhausted - php

Using PHPExcel I can run each tab separately and get the results I want, but if I add them all into one Excel file it just stops, with no error or anything.
Each tab consists of about 60 to 80 thousand records, and I have about 15 to 20 tabs, so roughly 1,600,000 records split across multiple tabs (this number will probably grow as well).
I have also worked around the 65,000-row limitation of .xls by using the .xlsx extension, with no problems as long as I run each tab in its own Excel file.
Pseudo code:
read data from db
start the PHPExcel process
parse out data for each page (some styling/formatting but not much)
(each numeric field value does get summed up in a totals column at the bottom of the excel using the formula SUM)
save excel (xlsx format)
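In PHPExcel terms, that flow looks roughly like the sketch below (the sheet layout, the SUM column, and the getDataGroupedByTab() helper are placeholders, not the original script):

<?php
require_once 'Classes/PHPExcel.php';

// Hypothetical helper: returns one array of rows per tab from the database.
$tabs = getDataGroupedByTab();

$excel = new PHPExcel();
$excel->removeSheetByIndex(0);                 // drop the default empty sheet

foreach ($tabs as $tabName => $rows) {
    $sheet = $excel->createSheet();
    $sheet->setTitle($tabName);

    $rowNum = 1;
    foreach ($rows as $row) {
        $col = 0;
        foreach ($row as $value) {
            $sheet->setCellValueByColumnAndRow($col++, $rowNum, $value);
        }
        $rowNum++;
    }

    // Totals row: SUM the numeric column (column B assumed numeric here).
    $sheet->setCellValue('B' . $rowNum, '=SUM(B1:B' . ($rowNum - 1) . ')');
}

$writer = PHPExcel_IOFactory::createWriter($excel, 'Excel2007');
$writer->save('report.xlsx');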
I have 3 GB of RAM, so that should not be an issue, and the script is set to execute with no timeout.
I have used PHPExcel in a number of projects and have had great results, but such a large data set seems to be an issue.
Has anyone ever had this problem? Workarounds? Tips?
UPDATE:
The error log shows: memory exhausted.
Besides adding more RAM to the box, are there any other tips?
Has anyone ever saved the current state and then edited the Excel file with new data?

I had the exact same problem, and googling around did not turn up a workable solution.
As PHPExcel generates Objects and stores all data in memory, before finally generating the document file which itself is also stored in memory, setting higher memory limits in PHP will never entirely solve this problem - that solution does not scale very well.
To really solve the problem, you need to generate the XLS file "on the fly". That's what I did, and now I can be sure that the "download SQL result set as XLS" feature works no matter how many (millions of) rows are returned by the database.
The pity is, I could not find any library that offers this kind of on-the-fly XLS(X) generation.
I found this article on IBM developerWorks which gives an example of how to generate the XLS XML on the fly:
http://www.ibm.com/developerworks/opensource/library/os-phpexcel/#N101FC
It works pretty well for me: I have multiple sheets with LOTS of data and did not even come close to the PHP memory limit. It scales very well.
Note that this example uses the Excel plain XML format (file extension "xml") so you can send your uncompressed data directly to the browser.
http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats#Excel_XML_Spreadsheet_example
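A rough sketch of that on-the-fly approach (not the article's exact code): each row is echoed to the browser as it is fetched, so nothing accumulates in PHP memory. The connection details, query, and filename are placeholders:

<?php
// Stream an Excel 2003 XML (SpreadsheetML) document row by row.
$pdo  = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$stmt = $pdo->query('SELECT id, name, amount FROM big_table', PDO::FETCH_ASSOC);

header('Content-Type: application/vnd.ms-excel');
header('Content-Disposition: attachment; filename="report.xml"');

echo '<?xml version="1.0"?>' . "\n";
echo '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"'
   . ' xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">' . "\n";
echo '<Worksheet ss:Name="Sheet1"><Table>' . "\n";

foreach ($stmt as $row) {
    echo '<Row>';
    foreach ($row as $value) {
        $type = is_numeric($value) ? 'Number' : 'String';
        echo '<Cell><Data ss:Type="' . $type . '">'
           . htmlspecialchars($value) . '</Data></Cell>';
    }
    echo "</Row>\n";
    flush();   // push output to the client (subject to output buffering settings)
}

echo '</Table></Worksheet></Workbook>';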
If you really need to generate an XLSX, things get even more complicated. XLSX is a compressed archive containing multiple XML files. For that, you must write all your data to disk (or keep it in memory, which brings back the same problem as with PHPExcel) and then create the archive from that data.
http://en.wikipedia.org/wiki/Office_Open_XML
It may also be possible to generate compressed archives on the fly, but that approach seems really complicated.
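For what it's worth, the packaging step itself is simple with PHP's ZipArchive; the hard part is generating the content of all the mandatory package parts. A sketch, assuming those parts have already been written to temporary files (the temp paths are placeholders; the entry names are the ones the OOXML package expects):

<?php
// Package pre-generated XML parts into an .xlsx (which is just a ZIP archive).
// Producing correct contents for each part is the complicated bit and is not shown here.
$zip = new ZipArchive();
$zip->open('report.xlsx', ZipArchive::CREATE | ZipArchive::OVERWRITE);

$zip->addFile('/tmp/content_types.xml', '[Content_Types].xml');
$zip->addFile('/tmp/rels.xml',          '_rels/.rels');
$zip->addFile('/tmp/workbook.xml',      'xl/workbook.xml');
$zip->addFile('/tmp/workbook_rels.xml', 'xl/_rels/workbook.xml.rels');
$zip->addFile('/tmp/sheet1.xml',        'xl/worksheets/sheet1.xml');

$zip->close();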

Related

Creating large Excel files with PHPExcel

I am trying to create some very large Excel documents with PHPExcel; however, I don't want to have to increase the memory limit within PHP.
Would it be possible to have it write, say, 100 rows at a time and then save them to disk? That way the whole document doesn't need to be stored in memory.
I am hosted on AWS, so I was thinking of using S3 to store the temporary spreadsheet before it is completed and downloaded to the user's computer.
If someone has experience with this method and could provide some guidance that would be great.
No, it isn't possible to write 100 rows, save them to disk, then write the next 100 rows and save them to disk (appending them to the original 100 rows). Unlike CSV, native-format Excel files aren't structured in a simple linear manner.
Have you looked at any of the options that PHPExcel provides for reducing memory usage, such as cell caching?
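For example, cell caching is configured globally before the workbook object is created; cache_to_phpTemp spills cell data to php://temp once it grows past a threshold (the 32MB value below is just illustrative):

<?php
require_once 'Classes/PHPExcel.php';

// Cache cell data in php://temp once it grows past 32MB,
// instead of holding every cell object in PHP memory.
$cacheMethod   = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
$cacheSettings = array('memoryCacheSize' => '32MB');
PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);

// Must be set before any worksheet or cell is created.
$excel = new PHPExcel();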
Try https://packagist.org/packages/avadim/fast-excel-writer; I use this library to generate huge XLSX files with 100K+ rows.
I advise using FastExcelWriter instead of the PHPExcel library. It writes to the file row by row, exactly as you want, so using FastExcelWriter resolves the memory-limit problem.
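A minimal sketch of that row-by-row style with avadim/fast-excel-writer; the method names follow the package README at the time of writing, and fetchRowsFromDb() is a hypothetical data source, so check the documentation for your installed version:

<?php
require 'vendor/autoload.php';

use avadim\FastExcelWriter\Excel;

// Rows are written out as they are generated, so memory use stays flat.
$excel = Excel::create(['Report']);
$sheet = $excel->sheet();

$sheet->writeRow(['id', 'name', 'amount']);   // header row
foreach (fetchRowsFromDb() as $row) {         // hypothetical row source
    $sheet->writeRow($row);
}

$excel->save('report.xlsx');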

Possible to write large csv file as xls sheets in PHP?

So I have a situation where I need to offer the user a multi-sheet Excel document where each sheet has thousands of rows and ~10 columns. The data comes from multiple MySQL queries.
I am currently using the "Laravel Excel" library to do this, but it uses far too much memory and is giving me huge scalability problems.
So my idea is to use MySQL's SELECT ... INTO OUTFILE to write a set of CSV files to disk, one for each sheet, and then to create an XLS document and write the previously written CSV data into it as sheets.
Is there a way to write the CSV contents to a sheet "in bulk", so to speak, without iterating line by line through the CSV or using a large amount of memory (by writing to disk directly, perhaps)?
Thanks for any help!
I had a very similar problem recently. My solution was to use the very lightweight PHP library PHP_XLSXWriter.
You can find it here: https://github.com/mk-j/PHP_XLSXWriter
It streams the output so it doesn't have to retain as much in memory.
In my use case, I broke the "writeStream" method apart into three methods: one each for the header and the footer, and one for the sheet content (i.e. the actual rows). This way I could write the header and then use Laravel's "chunking" feature to make the writes even more gradual.
The execution time increased slightly, but the script went from ~200 MB of RAM usage to under 15 MB!
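A minimal sketch of the same idea with PHP_XLSXWriter, writing rows as they arrive instead of building the sheet in memory; fetchRowsInChunks() stands in for Laravel's chunk() here and is hypothetical:

<?php
require 'vendor/autoload.php';   // or include xlsxwriter.class.php directly

$writer = new XLSXWriter();

// Declare the columns and their types/formats for the sheet.
$writer->writeSheetHeader('Report', array('id' => 'integer', 'name' => 'string', 'amount' => '#,##0.00'));

// Write rows one at a time; nothing is held beyond the current row.
foreach (fetchRowsInChunks(1000) as $chunk) {   // hypothetical chunked fetch
    foreach ($chunk as $row) {
        $writer->writeSheetRow('Report', $row);
    }
}

$writer->writeToFile('report.xlsx');
// or: $writer->writeToStdOut(); for a direct download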
Excel file formats (both BIFF and OfficeOpenXML) are not conducive to writing line-by-line like a CSV, because data isn't stored linearly. This means that all PHP libraries for writing native format Excel files have to work in PHP memory to manage the order of writing data to that file format, which means they will all consume large amounts of memory for larger volumes of data.
Laravel Excel is a wrapper around PHPExcel, which provides some options for reducing memory usage (eg. caching cell data to disk or SQLite database rather than holding it all in PHP memory), albeit at a cost in execution speed. What I don't know is whether Laravel Excel provides calls to enable these caching methods, though I believe some options are available allowing you to configure this.
Your alternative on a Linux platform is to use non-PHP solutions such as libXl or PUNO with Open/LibreOffice.

How to process 80MB+ xlsx to database MySQL with PHPExcel?

I need to insert all the data in an Excel file (.xlsx) into my database. I have tried all the available methods, such as caching and reading chunk by chunk, but nothing seems to work. Has anyone done this with a big file before? My spreadsheet has about 32 columns and about 700,000 rows of records.
The file is already uploaded to the server, and I want to write a cron job to read the Excel file and insert the data into the database. I chunked the reads to 5000, 3000 or even just 10 records at a time, but none of that worked. It always returns this error:
simplexml_load_string(): Memory allocation failed: growing buffer.
I did try the CSV file type and managed to get it to run at 4000k records each time, but each run takes about five minutes to process, and anything higher fails with the same error. However, the requirement calls for .xlsx files, so I need to stick with that.
Consider converting it to CSV format using an external tool, like ssconvert from the Gnumeric package, and then reading the CSV line by line with the fgetcsv function.
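A sketch of that approach (paths, table and column names are placeholders): ssconvert infers the output format from the .csv extension, and fgetcsv only ever holds one line in memory:

<?php
// Convert the workbook to CSV with Gnumeric's ssconvert, then stream it in.
$xlsx = '/data/upload/big.xlsx';
$csv  = '/data/upload/big.csv';
exec('ssconvert ' . escapeshellarg($xlsx) . ' ' . escapeshellarg($csv));

$pdo  = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO records (col_a, col_b, col_c) VALUES (?, ?, ?)');

$handle = fopen($csv, 'r');
fgetcsv($handle);                                   // skip the header row
while (($row = fgetcsv($handle)) !== false) {
    $stmt->execute(array($row[0], $row[1], $row[2]));
}
fclose($handle);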
Your issue occurs because you are trying to read the contents of a whole XML file. Caching and reading chunk by chunk does not help because the library you are using needs to read the entire XML file at one point to determine the structure of the spreadsheet.
So for very large files, the XML file is so big that reading it consumes all the available memory. The only working option is to use streamers and optimize the reading.
This is still a pretty complex problem. For instance, to resolve the data in your sheet, you need to read the shared strings from one XML file and the structure of your sheet from another one. Because of the way shared strings are stored, you need to have those strings in memory when reading the sheet structure. If you have thousands of shared strings, that becomes a problem.
If you are interested, Spout solves this problem. It is open-source so you can take a look at the code!
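For reference, reading a large XLSX with Spout looks roughly like this (the API shown is the box/spout 3.x style; adjust for the version you install):

<?php
require 'vendor/autoload.php';

use Box\Spout\Reader\Common\Creator\ReaderEntityFactory;

// Rows are streamed one at a time, so memory use stays low even for huge files.
$reader = ReaderEntityFactory::createXLSXReader();
$reader->open('/data/upload/big.xlsx');

foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        $cells = $row->toArray();
        // insert $cells into the database here (batched inserts recommended)
    }
}

$reader->close();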

convert xml to sql

I need to convert an XML file (about 200 MB in size) to SQL files and insert them into a MySQL table (one table; it looks like there are about 10 million rows, with only a few columns).
Unfortunately, I don't have access to shell / command-line tools.
It looks like I would need to use the phpMyAdmin import tool, where the import size is limited to 50 MB per upload.
Or, since PHP is only available through the web server, write a PHP script and execute it from the browser.
So, the steps are (please let me know if there is a better way to go about this):
unpack the file into the server
write a PHP script to convert and insert
or
do it locally and use phpadmin to upload them separately
What would be a good way to get this done? Any ideas / feedback / details are appreciated.
DOMDocument is very good at dealing with XML data. You can parse the data with it and convert it to the format you need.
You might have an issue with the size of the file if you can't change the configuration, though. I believe the default allowed memory size is ~8 MB.
If you are really dealing with 10 million rows (200 MB of data at 10 million rows is approximately 21 bytes per row?), then unpacking the file on the server and writing a script to handle the inserts would probably be the best bet.
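Given the memory concern mentioned above, a streaming alternative to DOMDocument is XMLReader, which pulls one element at a time; a rough sketch with made-up element and column names:

<?php
// Stream-parse the XML with XMLReader so the 200 MB file is never fully loaded,
// and insert rows in batches. Element and column names here are hypothetical.
$reader = new XMLReader();
$reader->open('/path/to/data.xml');

$doc  = new DOMDocument();
$pdo  = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO items (name, price) VALUES (?, ?)');

$pdo->beginTransaction();
$count = 0;

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        // Expand just this one element into a small SimpleXML node.
        $node = simplexml_import_dom($doc->importNode($reader->expand(), true));
        $stmt->execute(array((string) $node->name, (string) $node->price));

        if (++$count % 1000 === 0) {        // commit in batches of 1000 rows
            $pdo->commit();
            $pdo->beginTransaction();
        }
    }
}

$pdo->commit();
$reader->close();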

How to Handle EXTREMELY Large Strings in PHP When Generating a PDF

I've got a report that can generate over 30,000 records if given a large enough date range. From the HTML side of things, a resultset this large is not a problem since I implement a pagination system that limits the viewable results to 100 at a given time.
My real problem occurs once the user presses the "Get PDF" button. When this happens, I essentially re-run the portion of the report that prints the data (the results of the report itself are stored in a 'save' table, so there's no need to re-run the data-gathering logic) and store the results in a variable called $html. Keep in mind that this variable now contains 30,000 records of data plus the HTML needed to format it correctly in the PDF. Once I've got this HTML string created, I pass it to TCPDF to try to generate the PDF file for the user. However, instead of generating the PDF file, it just craps out without an error message (the 'Generating PDF...' dialog disappears and the system acts like you never asked it to do anything).
Through tests, I've discovered that the problem lies in the size of the $html variable being passed in. If the report is under 3K records, it works fine; if it's over that, the HTML side of the report will print but not the PDF.
Helpful Info
PHP 5.3
TCPDF for PDF generation (also tried PS2PDF)
Script Memory Limit: 500 MB
How would you guys handle this scale of data when generating a PDF of this size?
Here is how I solved this issue: I noticed that some of the strings in my HTML output had slight encoding issues. I ran htmlentities on those particular strings as I queried them from the database, and that cleared up the problem.
I don't know if this was what was causing your problem, but my experience was very similar: when I was trying to output a large HTML table, with about 80,000 rows, TCPDF would display the page header but nothing table-related. This behaviour was the same with different sets of data and different table structures.
After many attempts I started adding my own pagination: every 15 table rows, I would break the page and add a new table on the following page. That's when I noticed that every once in a while I would get blank pages between a lot of full and correct ones, and I realised there must be a problem with those particular subsets of data, which is how I discovered the encoding issue. It may be that you had something similar and TCPDF was not making it clear what your problem was.
Are you using the writeHTML method?
I went through the performance recommendations here: http://www.tcpdf.org/performances.php
It says "Split large HTML blocks in smaller pieces;".
I found that if my blocks of HTML went over 20,000 characters the PDF would take well over 2 minutes to generate.
I simply split my HTML into blocks and called writeHTML for each block, and it improved things dramatically. A file that wouldn't generate within 2 minutes before now takes 16 seconds.
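A sketch of that chunked writeHTML approach; the 500-row block size, table markup, and fetchReportRows() helper are illustrative, not from the original answer:

<?php
require_once 'tcpdf/tcpdf.php';

$rows = fetchReportRows();   // hypothetical: rows pulled from the report's 'save' table

$pdf = new TCPDF();
$pdf->AddPage();

// Write the table in blocks of rows instead of one huge HTML string.
foreach (array_chunk($rows, 500) as $chunk) {
    $html = '<table border="1" cellpadding="2">';
    foreach ($chunk as $row) {
        $html .= '<tr><td>' . htmlspecialchars($row['name']) . '</td>'
               . '<td>' . htmlspecialchars($row['amount']) . '</td></tr>';
    }
    $html .= '</table>';

    $pdf->writeHTML($html, true, false, true, false, '');
}

$pdf->Output('report.pdf', 'D');   // force download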
TCPDF seems to be a native implementation of PDF generation in PHP. You may have better performance using a compiled library like PDFlib or a command-line app like htmldoc. The latter will have the best chances of generating a large PDF.
Also, are you breaking the output PDF into multiple pages? I.e. does TCPDF know to take a single HTML document and cut it into multiple pages, or are you generating multiple HTML files for it to combine into a single PDF document? That may also help.
I would break the PDF into parts, just like pagination.
1) Have "Get PDF" button on every paginated HTML page and allow downloading of records from that HTML page only.
2) Limit the maximum number of records that can be downloaded. If the maximum limit reaches, split the PDF and let the user to download multiple PDFs.
