So I have a situation where I need to offer the user a multi-sheet Excel document where each sheet has thousands of rows and ~10 columns. The data comes from multiple MySQL queries.
Currently I'm using the "Laravel Excel" library to do this, but it uses far too much memory and is giving me huge scalability problems.
So I have an idea: use MySQL's OUTFILE to write a set of CSV files to disk, one for each sheet, and then create an xls document and write the previously written CSV data into it as sheets.
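Roughly what I have in mind for the dump step is sketched below; the table, columns and file path are just placeholders, and the MySQL user needs the FILE privilege plus write access to the target directory.

$sql = <<<'SQL'
SELECT id, customer, amount
INTO OUTFILE '/tmp/sheet_orders.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM orders
SQL;

// MySQL writes the CSV itself, so PHP never holds the result set in memory.
DB::statement($sql);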
Is there a way to write the CSV contents to a sheet "in bulk", so to speak, without iterating line-by-line through the CSV or using up a large amount of memory (like writing to disk directly, perhaps)?
Thanks for any help!
I had a very similar problem recently. My solution was to use the very lightweight PHP library PHP_XLSXWriter.
You can find it here: https://github.com/mk-j/PHP_XLSXWriter
It streams the output so it doesn't have to retain as much in memory.
In my usage, I broke the "writeStream" method apart into three methods: one for the header, one for the footer, and one for the sheet content (i.e. the actual rows). That way I could write the header and then use Laravel's "chunking" feature to make the row writes even more gradual.
The run time increased slightly, but the script went from ~200 MB of RAM usage to under 15 MB!
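If you use the library's stock API instead of a modified writeStream, a minimal sketch looks something like this (table and column names are made up; I'm assuming XLSXWriter's row-by-row methods together with Laravel's chunk()):

$writer = new XLSXWriter();
$writer->writeSheetHeader('Orders', ['id' => 'integer', 'customer' => 'string', 'amount' => 'price']);

DB::table('orders')->orderBy('id')->chunk(1000, function ($rows) use ($writer) {
    foreach ($rows as $row) {
        // Each row is appended to the writer's temporary stream, so nothing piles up in PHP.
        $writer->writeSheetRow('Orders', [$row->id, $row->customer, $row->amount]);
    }
});

$writer->writeToFile(storage_path('exports/orders.xlsx'));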
Excel file formats (both BIFF and OfficeOpenXML) are not conducive to writing line-by-line like a CSV, because the data isn't stored linearly. This means that all PHP libraries for writing native format Excel files have to build the workbook in PHP memory to manage the order in which data is written, so they will all consume large amounts of memory for larger volumes of data.
Laravel Excel is a wrapper around PHPExcel, which does provide options for reducing memory usage (e.g. caching cell data to disk or to an SQLite database rather than holding it all in PHP memory), albeit at a cost in execution speed. I don't know whether Laravel Excel exposes calls to enable these caching methods, though I believe some configuration options are available.
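If you end up talking to PHPExcel directly, enabling its cell caching looks roughly like this (the cache directory is only an example, and this has to run before the workbook is created or loaded):

$cacheMethod   = PHPExcel_CachedObjectStorageFactory::cache_to_discISAM;
$cacheSettings = array('dir' => '/tmp/phpexcel-cache');
PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);

$objPHPExcel = new PHPExcel();   // cell data is now cached to disk instead of being held entirely in RAM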
Your alternative on a Linux platform is to use non-PHP solutions such as libXL, or PUNO with OpenOffice/LibreOffice.
Related
I am trying to create some very large Excel documents with PHPExcel, but I don't want to have to increase the memory limit in PHP.
Would it be possible to have it write, say, 100 rows at a time and then save them to disk, so that the whole document doesn't need to be stored in memory?
I am hosted on AWS, so I was thinking of using S3 to store the temporary spreadsheet before it is completed and downloaded to the user's computer.
If someone has experience with this method and could provide some guidance that would be great.
No, it isn't possible to write 100 rows, save them to disk, then write the next 100 rows and append them to the original 100. Unlike CSV, native format Excel files aren't structured in a simple linear manner.
Have you looked at any of the options that PHPExcel provides for reducing memory usage, such as cell caching?
Try https://packagist.org/packages/avadim/fast-excel-writer; I use this library to generate huge XLSX files with 100K+ rows.
I advise using FastExcelWriter instead of the PHPExcel library. It writes to the file row by row, exactly as you want, so using FastExcelWriter resolves the memory-limit problem.
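A minimal sketch of how that might look, assuming the current FastExcelWriter API (file name and rows are placeholders):

use avadim\FastExcelWriter\Excel;

$excel = Excel::create(['Sheet1']);
$sheet = $excel->getSheet();

foreach ($rows as $row) {       // $rows could be a generator or a chunked DB result
    $sheet->writeRow($row);     // each row is written out as you go
}

$excel->save('export.xlsx');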
I need to insert all the data from an Excel file (.xlsx) into my database. I have tried all the available methods, such as caching and reading chunk by chunk, but nothing seems to work at all. Has anyone tried this with a big file before? My spreadsheet has about 32 columns and about 700,000 rows of records.
The file is already uploaded to the server, and I want to write a cron job to read the Excel file and insert the data into the database. I chunked the reads down to 5000, 3000 or even 10 records at a time, but none of that worked. What happens is it returns this error:
simplexml_load_string(): Memory allocation failed: growing buffer.
I did try the CSV file type and managed to get it to run at 4000k records each time, but it takes about five minutes per batch to process, and anything higher fails with the same error. The requirement is for the .xlsx file type, though, so I need to stick with that.
Consider converting it to CSV format using an external tool, like ssconvert from the Gnumeric package, and then reading the CSV line by line with the fgetcsv function.
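Roughly, and purely as a sketch (paths are placeholders, and insertBatch() is a hypothetical helper that does a multi-row INSERT):

exec('ssconvert /path/to/upload.xlsx /tmp/upload.csv');

$handle = fopen('/tmp/upload.csv', 'r');
$batch  = array();
while (($row = fgetcsv($handle)) !== false) {
    $batch[] = $row;
    if (count($batch) === 1000) {   // insert in small batches to keep memory flat
        insertBatch($batch);        // hypothetical multi-row INSERT helper
        $batch = array();
    }
}
if ($batch) {
    insertBatch($batch);            // flush the final partial batch
}
fclose($handle);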
Your issue occurs because you are trying to read the contents of a whole XML file. Caching and reading chunk by chunk does not help because the library you are using needs to read the entire XML file at one point to determine the structure of the spreadsheet.
So for very large files, the XML file is so big that reading it consumes all the available memory. The only working option is to use streamers and optimize the reading.
This is still a pretty complex problem. For instance, to resolve the data in your sheet, you need to read the shared strings from one XML file and the structure of your sheet from another one. Because of the way shared strings are stored, you need to have those strings in memory when reading the sheet structure. If you have thousands of shared strings, that becomes a problem.
If you are interested, Spout solves this problem. It is open-source so you can take a look at the code!
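For reference, reading a large XLSX with Spout looks roughly like this (assuming the Spout 3 API; older versions use a slightly different factory):

use Box\Spout\Reader\Common\Creator\ReaderEntityFactory;

$reader = ReaderEntityFactory::createXLSXReader();
$reader->open('/path/to/upload.xlsx');

foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        $cells = $row->toArray();   // one row at a time, so memory stays low
        // ... insert into the database here, ideally in batches
    }
}

$reader->close();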
Using PHPExcel I can run each tab separately and get the results I want, but if I add them all into one Excel file it just stops, with no error or anything.
Each tab consists of about 60 to 80 thousand records, and I have about 15 to 20 tabs. So that's about 1,600,000 records split across multiple tabs (this number will probably grow as well).
Also, I have gotten past the 65,000-row limitation of .xls by using the .xlsx extension, with no problems if I run each tab in its own Excel file.
Pseudo code:
read data from db
start the PHPExcel process
parse out data for each page (some styling/formatting but not much)
(each numeric field value gets summed up at the bottom of the sheet using the SUM formula; see the sketch after this outline)
save excel (xlsx format)
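Purely as an illustration of that totals step, with PHPExcel it might look like this (the column range and sheet are placeholders):

$sheet   = $objPHPExcel->getActiveSheet();
$lastRow = $sheet->getHighestRow();                 // last data row written
foreach (range('B', 'J') as $col) {                 // assumed numeric columns
    $sheet->setCellValue($col . ($lastRow + 1), "=SUM({$col}2:{$col}{$lastRow})");
}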
I have 3GB of RAM so this is not an issue and the script is set to execute with no timeout.
I have used PHPExcel in a number of projects and have had great results but having such a large data set seems to be an issue.
Has anyone ever had this problem? Workarounds? Tips?
UPDATE:
The error log shows: memory exhausted.
Besides adding more RAM to the box, are there any other tips I could try?
Has anyone ever saved the current state and then edited the Excel file with new data?
I had the exact same problem, and googling around did not turn up a workable solution.
As PHPExcel generates objects and stores all data in memory before finally generating the document file, which is itself also stored in memory, setting higher memory limits in PHP will never entirely solve this problem - that approach does not scale very well.
To really solve the problem, you need to generate the XLS file "on the fly". That's what I did, and now I can be sure that "download SQL result set as XLS" works no matter how many (million) rows the database returns.
The pity is that I could not find any library offering this kind of on-the-fly XLS(X) generation.
I found this article on IBM Developer Works which gives an example on how to generate the XLS XML "on-the-fly":
http://www.ibm.com/developerworks/opensource/library/os-phpexcel/#N101FC
It works pretty well for me - I have multiple sheets with LOTS of data and didn't even touch the PHP memory limit. It scales very well.
Note that this example uses the Excel plain XML format (file extension "xml") so you can send your uncompressed data directly to the browser.
http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats#Excel_XML_Spreadsheet_example
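The core of that approach, as I understand the article, is just streaming the SpreadsheetML markup row by row; a stripped-down sketch (fetchRows() is a hypothetical stand-in for an unbuffered database result):

header('Content-Type: application/vnd.ms-excel');
header('Content-Disposition: attachment; filename="export.xml"');

echo '<?xml version="1.0"?>' . "\n";
echo '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"' . "\n";
echo '          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">' . "\n";
echo '<Worksheet ss:Name="Sheet1"><Table>' . "\n";

foreach (fetchRows() as $row) {
    echo '<Row>';
    foreach ($row as $value) {
        echo '<Cell><Data ss:Type="String">' . htmlspecialchars($value) . '</Data></Cell>';
    }
    echo '</Row>' . "\n";   // each row is sent immediately; nothing accumulates in memory
}

echo '</Table></Worksheet></Workbook>';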
If you really need to generate an XLSX, things get even more complicated. XLSX is a compressed archive containing multiple XML files. For that, you must write all your data on disk (or memory - same problem as with PHPExcel) and then create the archive with that data.
http://en.wikipedia.org/wiki/Office_Open_XML
It may also be possible to generate compressed archives "on the fly", but that approach seems really complicated.
I am exporting huge data to Excel using PHP.
I changed that to CSV and text, and I see no difference in file size.
So does PHP performance have anything to do with the file format? Even if the file format is different, the rows and columns are the same.
Consider 60 columns and 100,000 rows.
Is there any optimizing technique we have to take care of, other than the ini memory limit and execution time?
As far as I know, the various Excel libraries for PHP will build the spreadsheet in-memory, which could cause problems for very large data sets. CSV/txt, on the other hand, can be written out to disk or the client for each row, so memory usage is minimal.
Performance-wise, the Excel libraries will always have larger overhead. There are all kinds of extra Excel-specific binary bits in the file which need special handling in PHP, whereas CSV is just plain text. PHP's core purpose is to spit out large amounts of text very quickly, so generating CSV/txt is always going to be faster. And of course, there's function call overhead. In pseudo code, consider the difference between:
CSV:
echo "$column1, $column2, $column3, $column4";
versus Excel:
$workbook->write('A1', $column1);
$workbook->write('B1', $column2);
$workbook->write('C1', $column3);
$workbook->write('D1', $column4);
etc...
On the plus side for Excel, particularly with XLSX, there is some compression so the same amount of data will take up less space. This can be mitigated somewhat by using compression in the webserver, or feeding the CSV/txt output into a Zip library server-side.
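To make the CSV side concrete, a streaming export can go straight to the client with fputcsv, so memory use stays flat regardless of row count (fetchRows() below is a hypothetical iterator over your query results):

header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="export.csv"');

$out = fopen('php://output', 'w');
fputcsv($out, array('column1', 'column2', 'column3', 'column4'));   // header row
foreach (fetchRows() as $row) {
    fputcsv($out, $row);   // each row goes straight to the output stream
}
fclose($out);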
I'm trying to read a 17 MB Excel file (2003) with PHPExcel 1.7.3c, but it crashes while loading the file, after exceeding the 120-second limit I have.
Is there another library that can do this more efficiently? I have no need for styling, I only need it to support UTF-8.
Thanks for your help
File size isn't a good measure when using PHPExcel; it's more important to get some idea of the number of cells (rows × columns) in each worksheet.
If you have no need for styling, are you calling:
$objReader->setReadDataOnly(true);
before loading the file?
If you don't need to access all worksheets, or only certain cells within a worksheet, look at using
$objReader->setLoadSheetsOnly(array(1,2))
or
$objReader->setLoadSheetsOnly(1)
or defining a readFilter
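A read filter for loading the file in chunks, along the lines of the standard PHPExcel example, might look like this (chunk size is illustrative; 'Excel5' is the BIFF .xls reader):

class ChunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $startRow;
    private $endRow;

    public function __construct($startRow, $chunkSize)
    {
        $this->startRow = $startRow;
        $this->endRow   = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '')
    {
        // Always read the heading row, plus the rows in the current chunk.
        return $row == 1 || ($row >= $this->startRow && $row < $this->endRow);
    }
}

$objReader = PHPExcel_IOFactory::createReader('Excel5');
$objReader->setReadDataOnly(true);
$objReader->setReadFilter(new ChunkReadFilter(2, 10000));
$objPHPExcel = $objReader->load($inputFileName);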
Are you using cell caching? If so, what method? That slows down the load time.
If the only thing you need from the Excel file is the data, here is my way of reading huge Excel files:
I install Gnumeric on my server, e.g. on Debian/Ubuntu:
apt-get install gnumeric
Then the PHP calls to read my Excel file and store it in a two-dimensional array (dimensions are rows and columns) are incredibly simple:
system("ssconvert \"$excel_file_name\" \"temp.csv\"");
$array = array_map("str_getcsv", file("temp.csv"));
Then I can do what I want with my array. This takes less than 10 seconds for a 10 MB xls file, about the same time I would need to open the file in my favorite spreadsheet software!
For very large files, you should use the fopen() and fgetcsv() functions and do what you have to do row by row, without storing the data in a huge array, to avoid loading the whole CSV file into memory via the file() function. This will be slower, but will not eat all your server's memory!
17MB is a hefty file.
Time how long a 1MB file takes to parse so you can work out how long a 17MB file would take. Then one option might be just to increase your 120 second limit.
Alternatively, you could export to CSV, which will be way more efficient, and import via PHP's fgetcsv.
I've heard that Excel Explorer is better at reading large files.
Maybe you could convert/export to CSV and use the built-in fgetcsv(). It depends on what kind of functionality you need.
I'm currently using spreadsheet-reader (https://github.com/nuovo/spreadsheet-reader), which is quite fast at reading XLSX, ODS and CSV, and only has the problems mentioned when reading the XLS format.