Spout's open() function takes ages to open an XLSX file containing 180,000 rows.
I am using Spout to read Excel sheets. When an XLSX file contains 180,000 records, the open process takes 25 minutes. What is the possible cause of this issue, and what is a possible solution? N.B. I have no control over how the Excel file is produced.
I was using PHPExcel but was advised to use Spout to avoid memory leaks.
// Spout 3.x is assumed here, based on the ReaderEntityFactory call
use Box\Spout\Reader\Common\Creator\ReaderEntityFactory;

$callStartTime = microtime(true);
$reader = ReaderEntityFactory::createXLSXReader();
$reader->open($filePath);
During the open step, Spout extracts all the strings that are used in the spreadsheet. This may take quite some time, depending on the number of strings, as it requires some I/O: the file is read and chunks of 10,000 strings are written to disk (so if you have 200,000 strings, Spout will write 20 temporary files).
If you had access to the creation of the file, I would have suggested using "inline strings" instead of "shared strings". But since you don't, there's not much you can do without editing Spout's code directly. If you do wish to do that, you can take a look at this part of the code and try tweaking these values to suit your needs, to see if that works better.
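To confirm where the time goes, here is a minimal sketch (assuming Spout 3.x, to match the ReaderEntityFactory call in the question; the file path is a placeholder) that times open() separately from the row iteration:

use Box\Spout\Reader\Common\Creator\ReaderEntityFactory;

require 'vendor/autoload.php';

$filePath = 'big-file.xlsx'; // placeholder path

$start = microtime(true);
$reader = ReaderEntityFactory::createXLSXReader();
$reader->open($filePath); // shared strings are extracted and cached to temp files here
echo 'open(): ' . round(microtime(true) - $start, 2) . "s\n";

$start = microtime(true);
$rowCount = 0;
foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        $row->toArray(); // rows are materialized one at a time
        $rowCount++;
    }
}
echo "iterated {$rowCount} rows in " . round(microtime(true) - $start, 2) . "s\n";

$reader->close();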
Related
I use PHPExcel 1.8 to generate a Microsoft Excel 97-2003 worksheet (.xls):
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel5');
$objWriter->save($myFile);
If I open the file and press Ctrl+S (without making any change), the size drops by almost 50%.
Simple... PHPExcel doesn't spend precious PHP resources (speed and memory) optimizing how the file data is stored, whereas MS Excel will reduce the file size by optimizing storage. Building a native Excel format file in PHP is slow and memory-hungry enough already.
As an example of this, all string data is maintained in a shared strings table. When two or more cells contain the same string value, MS Excel points them both to the same entry in the shared strings table, so the string data is only stored once. PHPExcel doesn't check whether a string value is already in the shared strings table; it simply creates a new entry, so the string is stored twice. This reduces the time taken to save the data, by eliminating the overhead of checking whether the string is already in the table, but at a cost in duplication and hence in file size.
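To illustrate the idea only (this is not PHPExcel's actual code), a shared strings table is essentially a map from each distinct string to an index, with cells storing the index rather than the string:

// Illustration of shared-strings deduplication, the way MS Excel does it on save.
$cells = ['apple', 'banana', 'apple', 'apple', 'banana'];

$sharedStrings = []; // string => index in the table
$cellRefs      = []; // what each cell stores: an index, not the string itself

foreach ($cells as $value) {
    if (!isset($sharedStrings[$value])) {
        $sharedStrings[$value] = count($sharedStrings); // store each distinct string once
    }
    $cellRefs[] = $sharedStrings[$value];
}

// Result: 2 strings in the table instead of 5; the cells hold [0, 1, 0, 0, 1].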
The key question is: which is more important? Being able to read/edit/write native Excel files in PHP? Keeping PHPExcel as fast and low-memory as possible? Or having the smallest possible file size on disk?
So I have a situation where I need to offer the user a multi-sheet Excel document where each sheet has thousands of rows and ~10 columns. The data comes from multiple MySQL queries.
I am currently using the "Laravel Excel" library to do this, but it uses far too much memory and is giving me huge scalability problems.
So my idea is to use MySQL's OUTFILE to write a set of CSV files to disk, one per sheet, and then create an XLS document and write the previously written CSV data into it as sheets.
Is there a way to write the CSV contents to a sheet "in bulk", so to speak, without iterating line by line through the CSV or using a large amount of memory (perhaps by writing to disk directly)?
Thanks for any help!
I had a very similar problem recently. My solution was to use the very lightweight PHP library PHP_XLSXWriter.
You can find it here: https://github.com/mk-j/PHP_XLSXWriter
It streams the output so it doesn't have to retain as much in memory.
In my use case, I broke the "writeStream" method apart into three methods: one for the header, one for the footer, and one for the sheet content (i.e. the actual rows). This way I could write the header first and then use Laravel's "chunking" feature to make the writes even more gradual.
The time taken increased slightly, but the script went from ~200 MB of RAM usage to under 15 MB!
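Here is a rough sketch of that approach, assuming a Laravel context; the table name, column names and output file are made up, and the include path follows PHP_XLSXWriter's README:

include_once 'xlsxwriter.class.php';

use Illuminate\Support\Facades\DB;

$writer = new XLSXWriter();
$writer->writeSheetHeader('Orders', ['id' => 'integer', 'customer' => 'string', 'total' => 'string']);

// Laravel's chunking keeps only 1,000 rows in memory at a time;
// PHP_XLSXWriter appends each row to a temporary stream as it goes.
DB::table('orders')->orderBy('id')->chunk(1000, function ($rows) use ($writer) {
    foreach ($rows as $row) {
        $writer->writeSheetRow('Orders', [$row->id, $row->customer, $row->total]);
    }
});

$writer->writeToFile('orders.xlsx');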
Excel file formats (both BIFF and OfficeOpenXML) are not conducive to writing line by line like a CSV, because the data isn't stored linearly. This means that all PHP libraries for writing native-format Excel files have to work in PHP memory to manage the order in which data is written to the file, which means they will all consume large amounts of memory for larger volumes of data.
Laravel Excel is a wrapper around PHPExcel, which provides some options for reducing memory usage (e.g. caching cell data to disk or an SQLite database rather than holding it all in PHP memory), albeit at a cost in execution speed. What I don't know is whether Laravel Excel exposes calls to enable these caching methods, though I believe some configuration options are available.
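For reference, this is how cell caching is enabled in PHPExcel itself; it is a hedged sketch, and whether Laravel Excel exposes an equivalent setting is something to check in its configuration:

// Cache cell data in php://temp, spilling to disk once it exceeds 8 MB,
// instead of holding every cell object in PHP memory.
$cacheMethod   = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
$cacheSettings = ['memoryCacheSize' => '8MB'];

PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);

// The cache method must be set before the PHPExcel object is created or loaded.
$objPHPExcel = new PHPExcel();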
Your alternative on a Linux platform is to use non-PHP solutions such as libXl, or PUNO with Open/Libre Office.
I need to insert all the data from an Excel file (.xlsx) into my database. I have tried all the available methods, such as caching and reading chunk by chunk, but nothing seems to work. Has anyone tried to do this with a big file before? My spreadsheet has about 32 columns and about 700,000 rows of records.
The file is already uploaded to the server, and I want to write a cron job to read the Excel file and insert the data into the database. I chunked it to read 5,000, 3,000 or even 10 records at a time, but none of that worked. What happens is that it returns this error:
simplexml_load_string(): Memory allocation failed: growing buffer.
I did try with the CSV file type and managed to get it running at 4000k records each time, but each run takes about five minutes to process, and anything higher fails with the same error. But the requirement is for .xlsx files, so I need to stick with that.
Consider converting it to CSV using an external tool, such as ssconvert from the Gnumeric package, and then reading the CSV line by line with the fgetcsv() function.
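A sketch of that pipeline, with placeholder paths, table and column names, and a plain PDO connection assumed:

$xlsx = '/path/to/upload.xlsx';
$csv  = '/tmp/upload.csv';

// Convert the workbook to CSV with Gnumeric's ssconvert
// (multi-sheet workbooks may need its per-sheet export option).
system('ssconvert ' . escapeshellarg($xlsx) . ' ' . escapeshellarg($csv));

$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO records (col_a, col_b, col_c) VALUES (?, ?, ?)');

$handle = fopen($csv, 'r');
fgetcsv($handle); // skip the header row
$pdo->beginTransaction();
while (($row = fgetcsv($handle)) !== false) {
    $stmt->execute([$row[0], $row[1], $row[2]]); // one row at a time, nothing kept in memory
}
$pdo->commit();
fclose($handle);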
Your issue occurs because you are trying to read the contents of a whole XML file into memory. Caching and reading chunk by chunk do not help, because the library you are using needs to read the entire XML file at some point to determine the structure of the spreadsheet.
So for very large files, the XML is so big that reading it consumes all the available memory. The only workable option is to use streaming readers and optimize how the file is read.
This is still a fairly complex problem. For instance, to resolve the data in your sheet, you need to read the shared strings from one XML file and the structure of the sheet from another. Because of the way shared strings are stored, you need to have them available when reading the sheet structure. If you have thousands of shared strings, that becomes a problem.
If you are interested, Spout solves this problem. It is open-source so you can take a look at the code!
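For comparison, here is a sketch of the same import done with a streaming reader (Spout 3.x API assumed; connection details, table and column names are placeholders), committing in batches so neither the rows nor the transaction grow unbounded:

use Box\Spout\Reader\Common\Creator\ReaderEntityFactory;

require 'vendor/autoload.php';

$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO records (col_a, col_b, col_c) VALUES (?, ?, ?)');

$reader = ReaderEntityFactory::createXLSXReader();
$reader->open('/path/to/upload.xlsx');

$count = 0;
$pdo->beginTransaction();
foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        $cells = $row->toArray();
        $stmt->execute([$cells[0], $cells[1], $cells[2]]);
        if (++$count % 5000 === 0) { // commit every 5,000 rows
            $pdo->commit();
            $pdo->beginTransaction();
        }
    }
}
$pdo->commit();
$reader->close();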
I am exporting huge amounts of data to Excel using PHP.
I changed the output to CSV and plain text.
I see no difference in file size.
So does PHP performance have anything to do with the file format?
Even if the file format is different, the rows and columns are the same.
Consider 60 columns and 100,000 rows.
Is there any optimization technique we should take care of, other than raising the memory limit and execution time in php.ini?
As far as I know, the various Excel libraries for PHP build the spreadsheet in memory, which can cause problems for very large data sets. CSV/text, on the other hand, can be written out to disk or to the client row by row, so memory usage is minimal.
Performance-wise, the Excel libraries will always have more overhead. There are all kinds of extra Excel-specific binary bits in the file that need special handling in PHP, whereas CSV is just plain text. PHP's core purpose is to spit out large amounts of text very quickly, so generating CSV/text is always going to be faster. And of course there's function-call overhead. In pseudo-code, consider the difference between:
CSV:
echo "$column1, $column2, $column3, $column4";
versus Excel:
$workbook->write('A1', $column1);
$workbook->write('B1', $column2);
$workbook->write('C1', $column3);
$workbook->write('D1', $column4);
etc...
On the plus side for Excel, particularly with XLSX, there is some compression, so the same amount of data takes up less space. This can be mitigated somewhat by enabling compression in the webserver, or by feeding the CSV/text output into a zip library server-side.
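As a sketch of the CSV side (query, column names and connection details are placeholders), each row can be pushed straight to the client with fputcsv(), so almost nothing accumulates in PHP memory, and the webserver's compression can shrink it on the wire:

header('Content-Type: text/csv; charset=utf-8');
header('Content-Disposition: attachment; filename="export.csv"');

$out = fopen('php://output', 'w');
fputcsv($out, ['column1', 'column2', 'column3', 'column4']); // header row

$pdo    = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$result = $pdo->query('SELECT column1, column2, column3, column4 FROM big_table');
while ($row = $result->fetch(PDO::FETCH_NUM)) {
    fputcsv($out, $row); // each row is sent immediately, nothing is buffered in PHP
}
fclose($out);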
I'm trying to read a 17MB Excel (2003) file with PHPExcel 1.7.3c, but it crashes while loading the file, after exceeding the 120-second limit I have.
Is there another library that can do it more efficiently? I have no need for styling; I only need it to support UTF-8.
Thanks for your help
File size isn't a good measure when using PHPExcel; it's more important to get some idea of the number of cells (rows × columns) in each worksheet.
If you have no need for styling, are you calling:
$objReader->setReadDataOnly(true);
before loading the file?
If you don't need to access all the worksheets, or only certain cells within a worksheet, look at using
$objReader->setLoadSheetsOnly(array('Sheet1', 'Sheet2'));
or
$objReader->setLoadSheetsOnly('Sheet1');
(the arguments are worksheet names), or at defining a read filter (see the sketch after this answer).
Are you using cell caching? If so, what method? That slows down the load time.
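For the read-filter route mentioned above, here is a rough sketch of the chunked-reading pattern from the PHPExcel documentation; the file name, chunk size and total row count are placeholders:

// Read filter that only accepts rows inside the current chunk window.
class ChunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $startRow = 0;
    private $endRow   = 0;

    public function setRows($startRow, $chunkSize)
    {
        $this->startRow = $startRow;
        $this->endRow   = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '')
    {
        // Always read the heading row, plus the rows in the current chunk.
        return $row == 1 || ($row >= $this->startRow && $row < $this->endRow);
    }
}

$objReader = PHPExcel_IOFactory::createReader('Excel5');
$objReader->setReadDataOnly(true);

$chunkFilter = new ChunkReadFilter();
$objReader->setReadFilter($chunkFilter);

$totalRows = 100000; // placeholder: however many rows the worksheet holds
for ($startRow = 2; $startRow <= $totalRows; $startRow += 1000) {
    $chunkFilter->setRows($startRow, 1000);         // load 1,000 rows at a time
    $objPHPExcel = $objReader->load('bigfile.xls'); // only cells in the current chunk are created
    // ...process the chunk, then free it before loading the next one...
    $objPHPExcel->disconnectWorksheets();
    unset($objPHPExcel);
}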
If the only thing you need from the Excel file is the data, here is my way to read huge Excel files:
I install gnumeric on my server, e.g. on Debian/Ubuntu:
apt-get install gnumeric
Then the PHP calls to read my Excel file and store it in a two-dimensional data array (rows and columns) are incredibly simple:
system("ssconvert \"$excel_file_name\" \"temp.csv\"");
$array = array_map("str_getcsv", file("temp.csv"));
Then I can do whatever I want with my array. This takes less than 10 seconds for a 10MB XLS file, about the same time I would need to open the file in my favourite spreadsheet software!
For really huge files, you should use fopen() and fgetcsv() and process each row as you go, without storing the data in a huge array, because the file() call above loads the whole CSV into memory. This will be slower, but it won't eat all your server's memory!
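A minimal sketch of that lower-memory variant, using the same temp.csv name as above:

// fgetcsv() reads one line per call, so the whole CSV never sits in memory.
$handle = fopen('temp.csv', 'r');
while (($row = fgetcsv($handle)) !== false) {
    // $row is a numerically indexed array of the columns on this line
    // ...process the row here...
}
fclose($handle);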
17MB is a hefty file.
Time how long a 1MB file takes to parse so you can work out how long a 17MB file would take. Then one option might be just to increase your 120 second limit.
Alternatively, you could export to CSV, which will be way more efficient, and import via PHP's fgetcsv.
I've heard that Excel Explorer is better in reading large files.
Maybe you could convert/export to CSV and use the built-in fgetcsv(). It depends on what kind of functionality you need.
I'm currently using spreadsheet-reader (https://github.com/nuovo/spreadsheet-reader), which is quite fast at reading XLSX, ODS and CSV, and has the problems mentioned only when reading the XLS format.