Reading large .xls file with PHP - php

At the moment I am doing a mass interface of files/data, and some files are in XLS format, which I need to normalize into CSV (so basically, convert XLS files to CSV files).
The problem is that PHPExcel (and similar libraries) load the entire sheet data at once, thus exhausting memory.
So far I have tried various libraries (in the meantime I am negotiating to have the data delivered as CSV, though no luck so far).
I am running my tests on various large file sizes, and my memory allocation is set properly before and after my script runs, using ini_set etc.
Is there a way that I can read an xls line by line or in chunks (like fgetcsv or fread) please?
I am programming this so it can work with any filesize (even if it takes ages to run) as this is a fully automated system.
PS: I checked this post and various others already
Reading an Excel file in PHP

Possible ways...
Get help from other languages. e.g. find a Python excel library and use it. Then call Python from PHP.
Modify the source code of those Excel readers
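Use a command line tool to convert Excel to CSV, e.g. Gnumeric's ssconvert or LibreOffice in headless mode, and use the CSV in PHP.
Since an .xlsx file is nothing but a zip archive of XML files, it can be unzipped and the values read from the XML. This does not apply to the older .xls format, which is a binary file rather than a zip archive.
First split one xls into many small xls files via a non-PHP solution, e.g. VBA in Excel, then read each of them.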

Related

PHP - Read excel file

I want to read an Excel file with PHP row by row, because reading the entire file at once causes a memory overflow.
I have searched a lot, but no luck until now.
I think the PHPExcel library can read an Excel file in chunks when you implement the read filter class, but each time it fetches a chunk it reads the entire file again, which is impractical for huge .xls files because of the time it takes.
Any help?
This may be something that is totally out of the question, but from the information in your question the following seems like an obvious option, or at least something to consider...
I get the impression that this is a really big file that needs to be accessed often. So, I would just try to import its data in a database.
I guess there is no need to explain that databases are masters in performance and caching.
And it is still possible to export the contents of the database to an excel file afterwards.
MySQL works great with PHP and is certainly easier to access than an Excel file. Most PHP hosting providers offer a MySQL database by default, with phpMyAdmin as a management tool.
How to do it:
If you have phpMyAdmin installed, then you can follow these simple steps.
If you have command-line access to the server, then you can even import the file from the command line directly into a MySQL database.
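If the import has to run from PHP itself rather than from phpMyAdmin or the command line, it could look roughly like the sketch below. It assumes the sheet has already been exported to CSV and that a PDO connection is available. The function name importCsvIntoTable and the table and column names passed to it are hypothetical, not something from the question.

```php
<?php
/**
 * Stream a CSV file into a database table using one prepared INSERT.
 * $table and $columns are caller-supplied (hypothetical) identifiers; they are
 * validated because SQL placeholders cannot stand in for identifiers.
 * Returns the number of rows inserted.
 */
function importCsvIntoTable(PDO $pdo, $csvPath, $table, array $columns)
{
    foreach (array_merge([$table], $columns) as $identifier) {
        if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $identifier)) {
            throw new InvalidArgumentException("Unsafe identifier: $identifier");
        }
    }

    $placeholders = implode(', ', array_fill(0, count($columns), '?'));
    $statement = $pdo->prepare(sprintf(
        'INSERT INTO %s (%s) VALUES (%s)',
        $table,
        implode(', ', $columns),
        $placeholders
    ));

    $handle = fopen($csvPath, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $csvPath");
    }

    $inserted = 0;
    $pdo->beginTransaction();
    try {
        // Only one row is held in memory at a time.
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            if ($row === [null]) {
                continue; // fgetcsv() returns [null] for a blank line
            }
            // Pad short rows with NULL and drop surplus fields.
            $values = array_slice(array_pad($row, count($columns), null), 0, count($columns));
            $statement->execute($values);
            $inserted++;
        }
        $pdo->commit();
    } catch (Exception $e) {
        $pdo->rollBack();
        throw $e;
    } finally {
        fclose($handle);
    }

    return $inserted;
}
```

With MySQL the connection would be created with something like new PDO('mysql:host=localhost;dbname=import;charset=utf8mb4', $user, $password), where the host and database name are placeholders. Wrapping the loop in one transaction is a design choice that usually speeds up bulk inserts on transactional engines.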
If the only thing you need from your Excel file is the data, here is my way to read huge Excel files:
I install gnumeric on my server, e.g. on Debian/Ubuntu:
apt-get install gnumeric
Then the PHP calls to read my Excel file and store it in a two-dimensional data array are incredibly simple (the dimensions are rows and columns):
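// escapeshellarg() keeps spaces and shell metacharacters in the file name from breaking the command
system("ssconvert " . escapeshellarg($excel_file_name) . " " . escapeshellarg("temp.csv"));
$array = array_map("str_getcsv", file("temp.csv"));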
Then I can do what I want with my array. This takes less than 10 seconds for a 10MB xls file, about the same time I would need to open the file in my favorite spreadsheet software!
For very large files, use fopen() and fgetcsv() and process each row as it is read, rather than loading the whole CSV file into memory with file(). This will be slower, but will not eat all your server's memory!
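The row-by-row variant can be sketched as a generator. This is a minimal sketch, assuming the CSV has already been produced by the converter; the function name readCsvRows is made up for illustration.

```php
<?php
/**
 * Yield one parsed CSV row at a time, so memory use stays flat
 * regardless of how large the file is.
 */
function readCsvRows($csvPath)
{
    $handle = fopen($csvPath, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $csvPath");
    }
    try {
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            if ($row === [null]) {
                continue; // fgetcsv() returns [null] for a blank line
            }
            yield $row;
        }
    } finally {
        fclose($handle);
    }
}
```

Usage would be a plain loop such as foreach (readCsvRows('temp.csv') as $row) { ... }, with each row handled and discarded before the next one is read.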

how to load just a few rows of .xls file with phpexcel?

I am having trouble converting a .xls file (Excel) to CSV with PHPExcel.
All works fine until a big file comes along. My PHP script just exceeds the memory limit and blows up. I cannot use more than 64MB because of the specifics of the computer. I'm running Apache on it.
We need to find a solution.
I think I have to tell PHPExcel to load just a few lines of the Excel file, then convert them to a small CSV, save it, free the used memory, and so on with the rest of the file until it's done...
What do you think? Is there a better way of doing it?
You have a few options for saving memory with PHPExcel. The main two are:
Cell caching, described in section 4.2.1 of the developer documentation. This allows you to reduce the memory overhead for each cell that is read from the file.
Chunking, described in section 4.3 of the User Documentation for Readers. This allows you to read small ranges of rows and columns from a file, rather than the whole worksheet.
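The two options can be combined roughly as follows. This is a sketch, not the documentation's own example: the class name RowChunkFilter and the function loadRowChunk are made up, and the 8MB cache size is an arbitrary value. The stand-in interface at the top only exists so the snippet parses without the library.

```php
<?php
// Stand-in so this sketch parses without PHPExcel. When the library is in use,
// load PHPExcel before this code so the real interface exists and this branch
// is skipped.
if (!interface_exists('PHPExcel_Reader_IReadFilter')) {
    interface PHPExcel_Reader_IReadFilter
    {
        public function readCell($column, $row, $worksheetName = '');
    }
}

/** Read filter that only accepts cells inside a window of rows. */
class RowChunkFilter implements PHPExcel_Reader_IReadFilter
{
    private $startRow = 1;
    private $endRow = 1;

    public function setRows($startRow, $chunkSize)
    {
        $this->startRow = $startRow;
        $this->endRow = $startRow + $chunkSize - 1;
    }

    public function readCell($column, $row, $worksheetName = '')
    {
        return $row >= $this->startRow && $row <= $this->endRow;
    }
}

/** Load only rows $startRow .. $startRow + $chunkSize - 1 of a .xls file. */
function loadRowChunk($xlsPath, $startRow, $chunkSize)
{
    // Cell caching: cell data beyond 8MB spills to php://temp.
    // It must be configured before the workbook is loaded.
    PHPExcel_Settings::setCacheStorageMethod(
        PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp,
        ['memoryCacheSize' => '8MB']
    );

    $filter = new RowChunkFilter();
    $filter->setRows($startRow, $chunkSize);

    $reader = PHPExcel_IOFactory::createReader('Excel5'); // reader for binary .xls
    $reader->setReadDataOnly(true);                       // skip formatting
    $reader->setReadFilter($filter);

    return $reader->load($xlsPath);
}
```

Each call parses the file again from the start and merely skips cells outside the window, so a larger chunk size trades memory for fewer passes. Calling disconnectWorksheets() on the returned workbook and unsetting it before loading the next chunk helps PHP release the memory.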

Best way to import large amounts of data (Csv vs xls vs xlsx vs ?)

I have a PHP application that needs to work on many PHP configurations, with as few requirements outside of the CodeIgniter framework as possible.
I have an import function right now that uses .csv files. CSV is pretty good as it is cross-platform. But people have trouble with it when using Excel. It also can't display Chinese characters correctly.
Then there are .xls and .xlsx files. There are libraries for these, but they often require php_zip.
What option should I choose that works with many PHP installs and is good for display and import?
Information may be lost in the export to CSV.
It will only save the values of the cells, not their formatting information.
There's no way you'll read an .xlsx file without unzipping it, which means you'll need a zip lib.
PHPExcel handles several formats of excel files, but it can be a bit resource hungry.
http://phpexcel.codeplex.com/
XLSX2CSV is less resource intensive, but it only reads one sheet of a multi-sheet workbook, doesn't parse formulas and doesn't handle .xls files.
http://davidacollins.com/weblog/xlsx2csv

Converting plain text HTML file to binary Excel 97-2003

I have an 'Excel' file (with a .xls extension) which turns out to be a plain text HTML file masquerading as a spreadsheet (if I run 'file [filename]' I get 'HTML document text' as the type). The file comes from a third party supplier and I have no control over the format.
I want to convert the file into Excel 97-2003 format so that I can read it in a PHP library (PHPExcel). I can do this by opening the file in Excel, ignoring the warning message and then explicitly saving it as Excel 97-2003, but I want to automate the whole process from the initial file coming in to extracting the cell data and dumping it into a database.
Ideally I'd like to use a PHP library for the conversion, because that would integrate better with the rest of the codebase, but libraries written in Perl, Java or (at a pinch) C# would also work, provided they don't rely on the server running Windows and Office.
Is there a tool or library available which can provide this functionality?
PHPExcel http://phpexcel.codeplex.com/ is decent, but you'll have issues with it gobbling up memory with large sheets. For large sheets or speed I'd recommend the Perl module Spreadsheet::WriteExcel http://search.cpan.org/~jmcnamara/Spreadsheet-WriteExcel-2.37/lib/Spreadsheet/WriteExcel.pm
Spreadsheet::WriteExcel is faster and uses less memory than PHPExcel. I then use
<?php
// passthru() sends the script's output straight to the client, so no echo is needed
passthru('perl filename.pl');
?>
to run the Perl script through PHP.
It looks like for the moment the only answer is to manually process the file by opening it in Excel and re-saving it, which does work but doesn't allow for complete automation.
I'll take a look at the new version of PHPExcel with HTML support once it has been released though as that sounds promising.
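Whatever the conversion route ends up being, an automated pipeline can at least recognise such mislabelled files up front by looking at the first bytes instead of trusting the extension. The following is only a sketch: the function name detectSpreadsheetFormat and its return labels are invented here, and the HTML check is a simple heuristic.

```php
<?php
/**
 * Guess the real format of a file that claims to be a spreadsheet.
 * Returns 'xls' (OLE2 container, used by binary Excel 97-2003 files),
 * 'zip' (e.g. .xlsx), 'html', or 'unknown'.
 */
function detectSpreadsheetFormat($path)
{
    $head = file_get_contents($path, false, null, 0, 512);
    if ($head === false) {
        throw new RuntimeException("Cannot read $path");
    }

    // OLE2 compound document signature.
    if (strncmp($head, "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8) === 0) {
        return 'xls';
    }
    // Zip local file header.
    if (strncmp($head, "PK\x03\x04", 4) === 0) {
        return 'zip';
    }
    // Heuristic: skip a UTF-8 BOM and whitespace, then look for common opening tags.
    $text = strtolower(ltrim($head, "\xEF\xBB\xBF \t\r\n"));
    if (preg_match('/^<(!doctype html|html|head|meta|table)\b/', $text)) {
        return 'html';
    }
    return 'unknown';
}
```

Files reported as 'html' could then be routed to a separate conversion step before they reach the Excel reader.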

writing to a XLS file using PHP

My application generates some .xls files, and until now I was using the PHPExcel lib. Someone on SO recommended that I use this approach. The problem is that I have to use some .xls templates and append some data to them.
Can anyone help me with some pointers? I don't get how xlsBOF() and xlsEOF() work, or how they should work in my case.
If the approach you use right now works for you, don't bother with anything else.
PHPExcel writes XML files (or more accurately zip files containing XML files), in the new Excel 2007 format. For this reason, it's not compatible with older office versions (unless you install the compatibility plugin in the older office).
What this code does is write a binary XLS file in Excel 97 (BIFF8) format. It's a bit of a hack though. This won't deal correctly with unicode issues and so on. xlsBOF writes the binary header of the XLS file, and xlsEOF the footer.
If you want to write binary XLS files, you're better off using PEAR Excel Writer. I have mixed experiences with that. It gets the job done, but to use it with unicode you have to look through the bug list for a few patches that fix BIFF8 format bugs (the package is poorly maintained). It's still better than the code you linked to though.
Update: PHPExcel supports export as Excel 97 also. I remember that it used to be limited to the office 2007 file format, but apparently currently it's not. So I would recommend using PHPExcel.
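The code the question links to is not reproduced here. As a rough illustration of what helpers with those names typically do, the sketch below assumes they follow the usual BIFF record layout: a 2-byte record id, a 2-byte payload length, then the payload, all little-endian. The function bodies are an assumption, not the linked code, and they return the bytes instead of echoing them. pack('e') needs PHP 7.1 or later.

```php
<?php
/** BOF record (id 0x0809): marks the start of the stream; 0x0010 = worksheet. */
function xlsBOF()
{
    return pack('vvvvvv', 0x0809, 8, 0x0000, 0x0010, 0x0000, 0x0000);
}

/** EOF record (id 0x000A): empty payload, marks the end of the stream. */
function xlsEOF()
{
    return pack('vv', 0x000A, 0);
}

/** NUMBER record (id 0x0203): row, column, format index, 8-byte double. */
function xlsWriteNumber($row, $col, $value)
{
    return pack('vvvvv', 0x0203, 14, $row, $col, 0) . pack('e', $value);
}

/** LABEL record (id 0x0204): row, column, format index, length, raw bytes. */
function xlsWriteLabel($row, $col, $value)
{
    $length = strlen($value);
    return pack('vvvvvv', 0x0204, 8 + $length, $row, $col, 0, $length) . $value;
}
```

A file would then be xlsBOF(), followed by the cell records, followed by xlsEOF(). Because the label is written as raw bytes with no encoding information, this matches the answer's warning about unicode, and nothing in this approach can read or append to an existing template.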
