Reading large Excel file with PHP

I'm trying to read a 17MB Excel file (2003 format) with PHPExcel 1.7.3c, but it crashes while loading the file, after exceeding the 120-second limit I have.
Is there another library that can do this more efficiently? I have no need for styling; I only need it to support UTF-8.
Thanks for your help

File size isn't a good measure when using PHPExcel; it's more important to get some idea of the number of cells (rows x columns) in each worksheet.
If you have no need for styling, are you calling:
$objReader->setReadDataOnly(true);
before loading the file?
If you don't need to access all worksheets, or only need certain cells within a worksheet, look at using:
$objReader->setLoadSheetsOnly(array('Sheet1', 'Sheet2'));
or
$objReader->setLoadSheetsOnly('Sheet1');
or defining a readFilter (see the sketch below).
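For illustration (not part of the original answer): a readFilter is a class implementing PHPExcel_Reader_IReadFilter. This sketch only loads columns A to J of the first 1,000 rows; the class name and limits are arbitrary.

class FirstChunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    // Only load cells in columns A..J of the first 1,000 rows.
    public function readCell($column, $row, $worksheetName = '')
    {
        return $row <= 1000 && in_array($column, range('A', 'J'));
    }
}

$objReader = PHPExcel_IOFactory::createReader('Excel5');   // 'Excel5' reads .xls (Excel 97-2003)
$objReader->setReadDataOnly(true);
$objReader->setReadFilter(new FirstChunkReadFilter());
$objPHPExcel = $objReader->load($inputFileName);

Combined with setReadDataOnly(true), this keeps the in-memory workbook down to just the cells you actually need.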
Are you using cell caching? If so, what method? That slows down the load time.

If the only thing you need from your Excel file is the data, here is my way of reading huge Excel files:
I install Gnumeric on my server, e.g. on Debian/Ubuntu:
apt-get install gnumeric
Then the PHP calls to read my Excel file and store it in a two-dimensional data array (rows and columns) are incredibly simple:
system("ssconvert \"$excel_file_name\" \"temp.csv\"");
$array = array_map("str_getcsv", file("temp.csv"));
Then I can do what I want with my array. This takes less than 10 seconds for a 10MB xls file, about the same time I would need to open the file in my favourite spreadsheet software!
For very huge files, you should use the fopen() and fgetcsv() functions and process each row as you go, rather than storing the whole CSV in memory with the file() function. This will be slower, but it won't eat all your server's memory!
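A minimal sketch of that streaming approach (the temp file name matches the example above; what you do inside the loop is up to you):

$handle = fopen('temp.csv', 'r');
if ($handle !== false) {
    while (($row = fgetcsv($handle)) !== false) {
        // process one row at a time here, e.g. insert it into a database
    }
    fclose($handle);
}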

17MB is a hefty file.
Time how long a 1MB file takes to parse so you can work out how long a 17MB file would take. Then one option might be just to increase your 120 second limit.
Alternatively, you could export to CSV, which will be way more efficient, and import via PHP's fgetcsv.

I've heard that Excel Explorer is better at reading large files.

Maybe you could convert/export to CSV and use the built-in fgetcsv(). It depends on what kind of functionality you need.

I'm currently using spreadsheet-reader (https://github.com/nuovo/spreadsheet-reader), which is quite fast at reading XLSX, ODS and CSV, and only has the problems mentioned when reading the XLS format.

Related

Spout open function issue with large xlsx

Spout's open function takes ages to open an xlsx file containing 180,000 rows.
I am using Spout to read Excel sheets. When an xlsx file contains 180,000 records, the open process takes 25 minutes. What is the possible cause of this issue, and what is a possible solution? N.B. I have no control over how the Excel file is produced.
I was using PHPExcel but was advised to use Spout to avoid memory leaks.
use Box\Spout\Reader\Common\Creator\ReaderEntityFactory;   // Spout 3.x

$callStartTime = microtime(true);
$reader = ReaderEntityFactory::createXLSXReader();
$reader->open($filePath);
During the loading process, Spout extracts all the strings used in the spreadsheet. This may take quite some time, depending on the number of strings, as it requires some IO: reading the file and writing chunks of 10,000 strings to disk (so if you have 200,000 strings, Spout will write 20 files).
If you had access to the creation of the file, I would have suggested using "inline strings" instead of "shared strings". But since you don't, there's not much you can do without editing Spout's code directly. If you wish to do so, you can take a look at this part; you can try tweaking those values to suit your needs and see if that works better.

Creating large Excel files with PHPExcel

I am trying to create some very large Excel documents with PHPExcel, but I don't want to have to increase the memory limit within PHP.
Would it be possible to have it write, say, 100 rows at a time and then save them to disk, so that the whole document doesn't need to be stored in memory?
I am based in AWS, so I was thinking of using S3 to store the temporary spreadsheet before it is completed and downloaded to the user's computer.
If someone has experience with this method and could provide some guidance that would be great.
No, it isn't possible to write 100 rows, save them to disk, then write the next 100 rows and append them to the original 100. Unlike CSV, native-format Excel files aren't structured in a simple linear manner.
Have you looked at any of the options that PHPExcel provides for reducing memory usage, such as cell caching?
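For reference, cell caching is enabled with a single call before the workbook is instantiated; this sketch uses the gzip-compressed in-memory cache, but disk and SQLite backed methods exist too:

$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_in_memory_gzip;
if (!PHPExcel_Settings::setCacheStorageMethod($cacheMethod)) {
    die('Requested cache method is not available' . PHP_EOL);
}
$objPHPExcel = new PHPExcel();   // create the workbook only after the cache method is set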
Try https://packagist.org/packages/avadim/fast-excel-writer; I use this library to generate huge XLSX files with 100K+ rows.
I advise using FastExcelWriter instead of the PHPExcel library. It writes the file row by row, exactly as you want, so using FastExcelWriter resolves the memory limit problem.
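A rough sketch of that row-by-row usage with avadim/fast-excel-writer, based on its README; treat the exact class and method names as assumptions and check the library's documentation for your version:

use avadim\FastExcelWriter\Excel;

$excel = Excel::create(['Sheet1']);
$sheet = $excel->sheet();
foreach ($rows as $row) {        // $rows: any iterable of row arrays (placeholder)
    $sheet->writeRow($row);      // rows are written out incrementally rather than held in memory
}
$excel->save('report.xlsx');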

Possible to write large csv file as xls sheets in PHP?

So I have a situation where I need to offer the user a multi-sheet excel document where each sheet has thousands of rows and ~10 columns. Data is coming from multiple MySQL queries.
Currently using "Laravel Excel" library to do this, but it uses up way too much memory and is giving me huge scalability problems.
So I have an idea: use MySQL's OUTFILE to write a set of CSV files to disk, one for each sheet, and then create an xls document and write the previously written CSV data as sheets in the xls.
Is there a way to write the CSV contents to a sheet "in bulk", so to speak, without iterating line by line through the CSV or using a large amount of memory (perhaps by writing to disk directly)?
Thanks for any help!
I had a very similar problem recently. My solution was to use the very lightweight PHP library PHP_XLSXWriter.
You can find it here: https://github.com/mk-j/PHP_XLSXWriter
It streams the output so it doesn't have to retain as much in memory.
In my usage case, I broke the "writeStream" method apart into three methods: one each for the header and footer, and one for the sheet content (i.e. the actual rows). This way I could write the header and then use Laravel's "chunking" feature to make the writes even more gradual.
The time increased slightly, but the script went from ~200MB of RAM usage to under 15MB!
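For context, the basic (un-chunked) PHP_XLSXWriter flow looks roughly like this; the sheet name, column types and $rows variable are placeholders:

include_once 'xlsxwriter.class.php';

$writer = new XLSXWriter();
$writer->writeSheetHeader('Sheet1', ['name' => 'string', 'amount' => 'integer']);
foreach ($rows as $row) {                     // $rows: e.g. a chunked query result
    $writer->writeSheetRow('Sheet1', $row);   // rows are streamed out, not accumulated in memory
}
$writer->writeToFile('output.xlsx');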
Excel file formats (both BIFF and OfficeOpenXML) are not conducive to writing line-by-line like a CSV, because data isn't stored linearly. This means that all PHP libraries for writing native format Excel files have to work in PHP memory to manage the order of writing data to that file format, which means they will all consume large amounts of memory for larger volumes of data.
Laravel Excel is a wrapper around PHPExcel, which provides some options for reducing memory usage (eg. caching cell data to disk or SQLite database rather than holding it all in PHP memory), albeit at a cost in execution speed. What I don't know is whether Laravel Excel provides calls to enable these caching methods, though I believe some options are available allowing you to configure this.
Your alternative on a Linux platform is using non-PHP solutions like libXl or PUNO with Open/Libre Office

How to process 80MB+ xlsx to database MySQL with PHPExcel?

I need to insert all the data from an Excel file (.xlsx) into my database. I have tried all the available methods, such as caching and reading chunk by chunk, but nothing seems to work at all. Has anyone tried to do this with a big file before? My spreadsheet has about 32 columns and about 700,000 rows of records.
The file is already uploaded to the server, and I want to write a cron job to read the Excel file and insert the data into the database. I chunked the reads to 5,000, 3,000 or even 10 records at a time, but none of that worked. It returns this error:
simplexml_load_string(): Memory allocation failed: growing buffer.
I did try with the CSV file type and managed to get it running at 4000k records each time, but each run takes about five minutes to process, and anything higher fails with the same error. But the requirement calls for .xlsx files, so I need to stick with that.
Consider converting it to CSV format using an external tool, like ssconvert from the Gnumeric package, and then reading the CSV line by line with the fgetcsv function (see the sketch below).
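A hedged sketch of that pipeline; the paths, DSN, table and column names below are placeholders, not taken from the question:

// Convert the uploaded xlsx to CSV, then stream rows into MySQL one at a time.
system('ssconvert ' . escapeshellarg($xlsxPath) . ' /tmp/import.csv');

$pdo  = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO records (col1, col2, col3) VALUES (?, ?, ?)');

$handle = fopen('/tmp/import.csv', 'r');
while (($row = fgetcsv($handle)) !== false) {
    $stmt->execute(array_slice($row, 0, 3));   // one row at a time, constant memory
}
fclose($handle);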
Your issue occurs because you are trying to read the contents of a whole XML file. Caching and reading chunk by chunk does not help because the library you are using needs to read the entire XML file at one point to determine the structure of the spreadsheet.
So for very large files, the XML file is so big that reading it consumes all the available memory. The only working option is to use streamers and optimize the reading.
This is still a pretty complex problem. For instance, to resolve the data in your sheet, you need to read the shared strings from one XML file and the structure of your sheet from another one. Because of the way shared strings are stored, you need to have those strings in memory when reading the sheet structure. If you have thousands of shared strings, that becomes a problem.
If you are interested, Spout solves this problem. It is open-source so you can take a look at the code!
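For illustration, reading an XLSX as a stream with Spout 3.x looks roughly like this (a sketch, not code from the question):

use Box\Spout\Reader\Common\Creator\ReaderEntityFactory;

$reader = ReaderEntityFactory::createXLSXReader();
$reader->open($filePath);
foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        $values = $row->toArray();   // one row in memory at a time
        // insert $values into the database here
    }
}
$reader->close();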

PHPExcel large data sets with multiple tabs - memory exhausted

Using PHPExcel I can generate each tab separately and get the results I want, but if I add them all into one Excel file it just stops, with no error or anything.
Each tab consists of about 60 to 80 thousand records, and I have about 15 to 20 tabs. So that's about 1,600,000 records split across multiple tabs (and this number will probably grow).
I have also worked around the 65,000-row limitation of .xls by using the .xlsx extension, and there are no problems if I run each tab in its own Excel file.
Pseudo code:
read data from db
start the PHPExcel process
parse out data for each page (some styling/formatting but not much)
(each numeric field value does get summed up in a totals column at the bottom of the excel using the formula SUM)
save excel (xlsx format)
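In PHPExcel terms, that pseudo code corresponds roughly to the sketch below ($tabs, the sheet titles and the totals column are placeholders):

$objPHPExcel = new PHPExcel();
foreach ($tabs as $index => $tabData) {                 // $tabs: per-tab result sets from the db
    $sheet = ($index === 0) ? $objPHPExcel->getActiveSheet() : $objPHPExcel->createSheet();
    $sheet->setTitle('Tab ' . ($index + 1));
    $rowNum = 1;
    foreach ($tabData as $record) {
        $sheet->fromArray($record, null, 'A' . $rowNum++);
    }
    $sheet->setCellValue('B' . $rowNum, '=SUM(B1:B' . ($rowNum - 1) . ')');   // totals row
}
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel2007');
$objWriter->save('report.xlsx');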
I have 3GB of RAM so this is not an issue and the script is set to execute with no timeout.
I have used PHPExcel in a number of projects and have had great results but having such a large data set seems to be an issue.
Has anyone ever had this problem? Workarounds? Tips? etc...
UPDATE:
The error log shows: memory exhausted
Besides adding more RAM to the box, are there any other tips?
Has anyone ever saved the current state and then edited the Excel file with new data?
I had the exact same problem, and googling around did not turn up a viable solution.
As PHPExcel generates objects and stores all data in memory before finally generating the document file, which itself is also held in memory, setting higher memory limits in PHP will never entirely solve this problem; that approach does not scale well.
To really solve the problem, you need to generate the XLS file "on the fly". That's what I did, and now I can be sure that the "download SQL result set as XLS" feature works no matter how many (millions of) rows are returned by the database.
The pity is, I could not find any library that offers this kind of "drive-by" XLS(X) generation.
I found this article on IBM developerWorks, which gives an example of how to generate the XLS XML "on the fly":
http://www.ibm.com/developerworks/opensource/library/os-phpexcel/#N101FC
It works pretty well for me; I have multiple sheets with lots of data and did not even touch the PHP memory limit. It scales very well.
Note that this example uses the Excel plain XML format (file extension ".xml"), so you can send your uncompressed data directly to the browser; see the sketch after the link below.
http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats#Excel_XML_Spreadsheet_example
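As a rough illustration of the approach (not the article's actual code; the headers, sheet name and $resultSet variable are assumptions), the XML can be echoed row by row straight to the browser:

header('Content-Type: application/vnd.ms-excel');
header('Content-Disposition: attachment; filename="export.xml"');

echo '<?xml version="1.0"?>' . "\n";
echo '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"'
   . ' xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">' . "\n";
echo '<Worksheet ss:Name="Sheet1"><Table>' . "\n";

foreach ($resultSet as $row) {              // $resultSet: any iterable DB result
    echo '<Row>';
    foreach ($row as $value) {
        echo '<Cell><Data ss:Type="String">' . htmlspecialchars($value) . '</Data></Cell>';
    }
    echo "</Row>\n";
}

echo '</Table></Worksheet></Workbook>';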
If you really need to generate an XLSX, things get even more complicated. XLSX is a compressed archive containing multiple XML files, so you must write all your data to disk (or keep it in memory, which has the same problem as PHPExcel) and then create the archive from that data.
http://en.wikipedia.org/wiki/Office_Open_XML
It may also be possible to generate compressed archives "on the fly", but this approach seems really complicated.
