PHPExcel Writer object uses huge memory and time outs eventually - php

I am creating an Excel sheet using:
Codeigniter 2.2.1
PHP 5.4.25
Apache 2.4.7
XAMPP 1.8.2
PHPExcel 1.8.0
I am fetching 35,000 rows (mostly empty) with 79 columns from the database and writing them to an Excel file (Excel5).
It works fine for 30,000 rows, with a peak memory usage of 1.44GB and a file size of 24MB. But when I go to 35,000 rows it fails, saying:
Fatal error: Out of memory (allocated 1780219904) (tried to allocate 17301483 bytes) in E:\XAMPP\htdocs\ProjectName\application\libraries\PHPExcel\Writer\Excel5\BIFFwriter.php on line 144
When I create the PHPExcel object it uses a large amount of memory and eventually times out. I have set "memory_limit = -1" and "max_execution_time = 1000" in php.ini and tried different cache storage methods in PHPExcel.
My controller code looks like this:
public function write_controller() {
error_reporting(E_ALL);
ini_set("display_errors", 1);
ini_set('memory_limit', '-1'); //-1 for unlimited memory
$dir = "assets/output/";
//FIRST CHECK IF PREVIOUS FILE EXISTS OR NOT
$this->clear_directory($dir);
// Loading PHPExcel library
$this->load->library('PHPExcel');
$this->load->library('PHPExcel/IOFactory');
$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
$cacheSettings = array('memoryCacheSize' => '5000MB', 'cacheTime' => '1000');
PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);
//First Create the xls file and then insert rest of the data
$objPHPExcel = new PHPExcel();
$objPHPExcel->getProperties()->setTitle("export")->setDescription("none");
//activate sheet number 1
$objPHPExcel->setActiveSheetIndex(0);
//Setting font styles
$objPHPExcel->getActiveSheet()->getDefaultStyle()->getFont()->setName('Arial')->setSize(8)->setBold(false);
//Setting number format as TEXT
$objPHPExcel->getActiveSheet()->getDefaultStyle()->getNumberFormat()->setFormatCode(PHPExcel_Style_NumberFormat::FORMAT_TEXT);
//Freezing first top row
$objPHPExcel->getActiveSheet()->freezePane('A2');
$objWorksheet = $objPHPExcel->getActiveSheet();
$row1 = 1;
$objWorksheet->setCellValueByColumnAndRow(0, $row1, "Site name");
$objWorksheet->setCellValueByColumnAndRow(1, $row1, "Vendor_1");
$objWorksheet->setCellValueByColumnAndRow(2, $row1, "Status");
$objWorksheet->setCellValueByColumnAndRow(3, $row1, "Easting");
$objWorksheet->setCellValueByColumnAndRow(4, $row1, "Northing");
$objWorksheet->setCellValueByColumnAndRow(5, $row1, "Sector_1");
.
.
.
//Rest of the 74 columns
$style = array(
'alignment' => array(
'horizontal' => PHPExcel_Style_Alignment::HORIZONTAL_CENTER,
'vertical' => PHPExcel_Style_Alignment::VERTICAL_CENTER)
);
$objWorksheet->getDefaultStyle()->applyFromArray($style);
$objWriter = IOFactory::createWriter($objPHPExcel, 'Excel5');
$saved_location ='assets/output/Piano11.xls';
$objWriter->save($saved_location);
//Now reading the saved xls file
$objReader = new PHPExcel_Reader_Excel5();
$newPHPExcel = $objReader->load($saved_location);
$newWorksheet = $newPHPExcel->getActiveSheet();
//Now insert rest of the data from Piano table which will come from database
$table_name = 'piano_test';
$query = $this->db->get('tbl_piano');
if (!$query) {
return false;
}
// Fetching the data from table
$fields = $query->list_fields();
$row = 2;
foreach ($query->result() as $data) {
set_time_limit(0);
$col = 0;
foreach ($fields as $field) {
$newWorksheet->setCellValueByColumnAndRow($col, $row, $data->$field); //<- This skips leading 0s
$col++;
}
$row++;
}
$newobjWriter = IOFactory::createWriter($newPHPExcel, 'Excel5');
$newobjWriter->save('assets/output/Piano11.xls');
echo 'Memory peak usage: <b>'.$this->convert(memory_get_peak_usage(true)).'</b><br/>';
gc_collect_cycles();//garbage collector
echo 'inserted.';
}
Is there any way I can minimize memory usage and execution time? Is there an alternative solution? Or should I change my algorithm?

You're using phpTemp for caching, but with these settings:
$cacheSettings = array('memoryCacheSize' => '5000MB', 'cacheTime' => '1000');
This means that PHPExcel will use 5000MB (5GB) of your PHP memory before it starts to make use of php://temp for caching. I'd be surprised if your php.ini memory settings allowed PHP to use that much memory.
You should use a much lower value for memoryCacheSize, perhaps 512MB, which means that PHPExcel will use only 512MB of PHP memory for caching cell data before it switches to php://temp.
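As far as I understand it, the memoryCacheSize threshold corresponds to the "maxmemory" option of PHP's php://temp stream, which holds data in RAM up to a byte limit and then moves it to a temporary file. The sketch below shows only that stream behaviour using plain PHP, with no PHPExcel involved; the 1 MB limit and the variable names are assumptions chosen for the demo.

```php
<?php
// Sketch: php://temp keeps data in RAM up to "maxmemory" bytes,
// then transparently continues in a temporary file on disk.
$limitBytes = 1024 * 1024; // 1 MB threshold, chosen only for this demo

$handle = fopen('php://temp/maxmemory:' . $limitBytes, 'r+');

// Write 4 MB, well past the threshold, in 64 KB pieces.
$chunk = str_repeat('x', 64 * 1024);
for ($i = 0; $i < 64; $i++) {
    fwrite($handle, $chunk);
}

// The stream is read back the same way wherever the bytes are held.
rewind($handle);
$bytesRead = 0;
while (!feof($handle)) {
    $bytesRead += strlen(fread($handle, 8192));
}
fclose($handle);

echo "Read back {$bytesRead} bytes\n";
```

A threshold larger than the memory PHP is allowed to use means the spill to disk never happens, which is the situation described above.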

Related

Memory while reading large Excel 2007 (.xlsx)

I'm using PHPExcel, which I've used many times before. The problem I have now is with reading Excel2007 files (.xlsx format). What I'm doing is simply looping through the .xlsx file, creating an array by row/column, and then print_r()-ing the results after the read operation to make sure the data output is good before importing it into a MySQL database.
Now when reading the Excel2007 .xlsx file (6MB) the output fails, but what's interesting is that if I save the file in the older .xls format (1992-2004 - Excel5) the file becomes larger (16MB) but outputs correctly. This originally made me think it wasn't a memory problem, since the older, larger .xls file (16MB) ran with no problems and was almost 3x the size of the .xlsx file (6MB).
For test purposes I then copied 25 of the 30,000 rows in the .xlsx (6MB) file into a new Excel2007 .xlsx and ran the import against the smaller 25-row data set, and it output correctly. This then led me to think that it is a memory problem after all, but one related specifically to the .xlsx format...
I'm running the server on Amazon Web Services and have C4.Xlarge (16-core, 30GB RAM) so I should have plenty of resources to run this operation.
Question: Why does my output fail when reading a smaller .xlsx file vs a larger .xls file, but then succeed with a smaller .xlsx (25-row) file?
//PHP Function
function parse_xls($file){
ini_set('memory_limit','-1');
$type = PHPExcel_IOFactory::identify($file);
$reader = PHPExcel_IOFactory::createReader($type);
$reader->setReadDataOnly(true);
$xls = $reader->load($file);
$sheet = $xls->getActiveSheet();
$highestRow = $sheet->getHighestRow();
$highestColumn = $sheet->getHighestColumn();
$highestColumnIndex = PHPExcel_Cell::columnIndexFromString($highestColumn);
for($row=2; $row <= ($highestRow+2); $row++){
$import[$row] = [];
for($col=0; $col < $highestColumnIndex; $col++){
$result = $sheet->getCellByColumnAndRow($col, $row)->getValue();
array_push($import[$row],$result);
}
}
print_r($import);
die();
}
For big files I use a chunkReadFilter:
$iChunkSize=1000;
for($iStartRow = $row_start; $iStartRow <= $totalRows; $iStartRow += $iChunkSize) {
$objReader = $oExcel->SetCreateReader();
$oChunkFilter = new chunkReadFilter();
$objReader->setReadFilter($oChunkFilter);
$oChunkFilter->setRows($iStartRow,$iChunkSize);
$objReader->setReadFilter($oChunkFilter);
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load($files['path']);
$objPHPExcel->setActiveSheetIndex($iList);
$sFromCell = 'A'.$iStartRow;
$aData = $objPHPExcel->getActiveSheet()->toArray(null,true,true,false,$sFromCell);
// free memory
unset($objPHPExcel);
unset($objReader);
unset($oChunkFilter);
// parse data
foreach ($aData as $sKey => $aValue) {
...
}
// real data rows
if (count($aData) < $iChunkSize) {
unset($aData);
break;
}
unset($aData);
}

phpexcel memory exhausted with 128Mb memory reading only first row of a big file

I have a memory problem with an xlsx file of about 95,500 rows and 28 columns.
To handle such a big file (more than 10MB as xlsx) I wrote the code below, but when I run it and call the load method I get a memory exhausted error, even with only one row read! (I've assigned only 128MB to the PHP interpreter.)
Please consider that:
Currently I try to read only a single row and still receive the memory exhausted error (see $chunkFilter->setRows(1,1);)
After solving the problem of reading the first line, I need to read all the other lines to load the content into a database table
If you think there is another library or solution, please note that I prefer PHP because it is the main language used for this application, but I can accept solutions in other languages (like Go)
Please don't simply suggest increasing the memory of the PHP process. I already know that this is possible, but this code runs on a shared VPS with a maximum of 512MB of RAM and I need to keep memory use as low as possible
Is there a solution? Please find below the code that I use:
/** Define a Read Filter class implementing PHPExcel_Reader_IReadFilter to read file in "chunks" */
class chunkReadFilter implements PHPExcel_Reader_IReadFilter {
private $_startRow = 0;
private $_endRow = 0;
/** Set the list of rows that we want to read */
public function setRows($startRow, $chunkSize) {
$this->_startRow = $startRow;
$this->_endRow = $startRow + $chunkSize;
}
public function readCell($column, $row, $worksheetName = '') {
// Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow
if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {
return true;
}
return false;
}
}
function loadXLSFile($inputFile){
// Initiate cache
$cacheMethod = PHPExcel_CachedObjectStorageFactory:: cache_to_sqlite3;
if (!PHPExcel_Settings::setCacheStorageMethod($cacheMethod)) {
echo date('H:i:s'), " Unable to set Cell Caching using ", $cacheMethod,
" method, reverting to memory", EOL;
}
$inputFileType = PHPExcel_IOFactory::identify($inputFile);
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$chunkFilter = new chunkReadFilter();
// Tell the Read Filter, the limits on which rows we want to read this iteration
$chunkFilter->setRows(1,1);
// Tell the Reader that we want to use the Read Filter that we've Instantiated
$objReader->setReadFilter($chunkFilter);
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load($inputFile);
}
UPDATE
Below the error returned as requested by pamelus
PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 112 bytes) in /vendor/phpoffice/phpexcel/Classes/PHPExcel/Reader/Excel2007.php on line 471
PHP Stack trace:
PHP 1. {main}() dataimport.php:0
PHP 2. loadFileToDb($inputFile = *uninitialized*, $tabletoupdate = *uninitialized*) dataimport.php:373
PHP 3. PHPExcel_Reader_Excel2007->load($pFilename = *uninitialized*) dataimport.php:231
Given the low memory limit you have, I can suggest an alternative to PHPExcel that should solve your problem: Spout. It only requires around 10MB of memory, so you should be good!
Your loadXLSFile() function would become:
use Box\Spout\Reader\ReaderFactory;
use Box\Spout\Common\Type;
function loadXLSFile($inputFile) {
$reader = ReaderFactory::create(Type::XLSX);
$reader->open($inputFile);
foreach ($reader->getSheetIterator() as $sheet) {
foreach ($sheet->getRowIterator() as $row) {
// $row is the first row of the sheet. Do something with it
break; // you won't read any other rows
}
break; // if you only want to read the first sheet
}
$reader->close();
}
It's that simple! No need for caching, filters, and other optimizations :)
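Low-memory readers generally work by streaming: they walk the sheet XML one node at a time instead of building the whole workbook in memory. The sketch below illustrates that general idea using only PHP's ZipArchive and XMLReader extensions (PHP 5.5+ for generators). It is not Spout's actual code; streamRows() is a hypothetical name, and it returns raw <v> values, which for string cells in a real workbook are indexes into xl/sharedStrings.xml that the sketch ignores. It builds a tiny stand-in archive so it can run on its own.

```php
<?php
// Build a minimal stand-in for an .xlsx archive (sheet XML only).
$path = tempnam(sys_get_temp_dir(), 'xlsx');
$zip = new ZipArchive();
$zip->open($path, ZipArchive::CREATE | ZipArchive::OVERWRITE);
$zip->addFromString(
    'xl/worksheets/sheet1.xml',
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><sheetData>'
    . '<row r="1"><c r="A1"><v>1</v></c><c r="B1"><v>2</v></c></row>'
    . '<row r="2"><c r="A2"><v>3</v></c><c r="B2"><v>4</v></c></row>'
    . '</sheetData></worksheet>'
);
$zip->close();

// Hypothetical helper: yields one row (array of raw <v> values) at a time.
function streamRows($xlsxPath, $sheetEntry = 'xl/worksheets/sheet1.xml')
{
    $reader = new XMLReader();
    if (!$reader->open('zip://' . $xlsxPath . '#' . $sheetEntry)) {
        throw new RuntimeException("Cannot open $sheetEntry in $xlsxPath");
    }
    $cells = array();
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            if ($reader->localName === 'row') {
                $cells = array();
                if ($reader->isEmptyElement) {
                    yield $cells; // <row/> has no end tag to wait for
                }
            } elseif ($reader->localName === 'v') {
                $cells[] = $reader->readString();
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT
            && $reader->localName === 'row') {
            yield $cells;
        }
    }
    $reader->close();
}

// Only the current row is held in memory while iterating.
$firstValues = array();
foreach (streamRows($path) as $cells) {
    $firstValues[] = $cells[0];
}
unlink($path);

echo implode(',', $firstValues), "\n";
```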

is it possible to import and export excel file with size 70MB using PHPExcel library?

I have an Excel file with 3 columns, in which the 2nd column contains an email hyperlink. I have to import this file and export it with only 2 columns: the first should contain the name and the second the email, which means I have to split that hyperlink into name and email.
For a 31MB file I changed the memory limit to 2048MB and the execution time to 1200 in php.ini. I could successfully import and export the 31MB file, but exporting a 70MB file takes a very long time and gives the following error message.
Fatal error: Allowed memory size of 2147483648 bytes exhausted (tried to allocate 15667514 bytes) in /var/www/html/PHPExcel/Reader/Excel2007.php on line 327
Is it possible to import and export a 70MB Excel file using the PHPExcel library? And what do I have to change in php.ini, such as the memory limit and max execution time?
require "PHPExcel.php";
require "PHPExcel/IOFactory.php";
$inputFileName = 'xxx.xlsx';
$inputFileType = PHPExcel_IOFactory::identify($inputFileName);
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load($inputFileName);
$outputObj = new PHPExcel();
// Get worksheet dimensions
$sheet = $objPHPExcel->getSheet(0);
$highestRow = $sheet->getHighestRow();
$outputObj->setActiveSheetIndex(0);
$outSheet = $outputObj->getActiveSheet();
// Loop through each row of the worksheet in turn
for ($row = 2; $row <= $highestRow; $row++){ // As row 1 seems to be header
// Read cell B2, B3, etc.
$line = $sheet->getCell('B' . $row)->getValue();
preg_match("|([^\.]+)\ <([^>]+)>|", $line, $data);
if(!empty($data))
{
// $data[1] will be name & $data[2] will be email
$outSheet->setCellValue('A' . $row, $data[1]);
$outSheet->setCellValue('B' . $row, $data[2]);
}
}
$objWriter = new PHPExcel_Writer_CSV($outputObj);
$objWriter->save("xxx.csv");
NOTE: Can I export the Excel file without making any changes to php.ini?
I found a solution: I completed this task successfully in Python. Hopefully it will help someone. :)
# Time taken 2min 4sec for 69.9MB file.
import csv
import re
from openpyxl import Workbook, load_workbook
location = 'big.xlsx'
wb = load_workbook(filename=location, read_only=True)
users_data = []
# pattern = '^(.+?) <([^>].+)>$' # matches "your name <email@email.com>"
# pattern_new = '^(.+?)<([^>].+)>$' # matches "your name<email@email.com>"
# pattern_email = '([\w.-]+@[\w.-]+)' # extracts email from sentence
# Define patterns to check on string.
patterns = ['^(.+?) <([^>].+)>$', '^(.+?)<([^>].+)>$']
# Loop through all sheets in XLSX
for wsheet in wb.get_sheet_names():
    # Load data from Sheet.
    ws = wb.get_sheet_by_name(wsheet)
    # Loop through each row in current Sheet.
    for row in ws.rows:
        # We need column B data, so get that directly.
        # Check if its not empty.
        if row[1].value:
            val = ""
            # Get column B data, remove unnecessary data and encode using utf-8 format.
            data = row[1].value.replace("(at)", "@").replace("(dot)", ".").encode('utf-8')
            # Loop through all patterns to match in current data.
            for pattern in patterns:
                # Apply regex on data.
                name_data = re.search(pattern, data)
                # If match found.
                if name_data:
                    # Create list of matched data and break loop to avoid extra searches on current row.
                    val = [name_data.group(1), name_data.group(2)]
                    # val = name_data.group()
                    break
            # If no matches found, check for only email, if not then use data as it is.
            if not val:
                # val = data
                name_data = re.search('([\w.-]+@[\w.-]+)', data)
                # If match found, then use that, else use data.
                if name_data:
                    val = [name_data.group(1)]
                else:
                    val = data
            # Append new data to users_data array.
            users_data.append(val)
# Open CSV file for writing list.
myfile = open('big.csv', 'wb')
# Open file in write mode.
wr = csv.writer(myfile, dialect='excel', delimiter = ',', quotechar='"', quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
# Loop through each value in list.
for word in users_data:
    # Append data in CSV.
    wr.writerow([word])
# Close CSV file.
myfile.close()
@Priyanka, you can also try using Spout: https://github.com/box/spout. It works great for large files! You won't have to change your php.ini file, as it won't require more than 10MB of memory and should finish before the default time limit.
You can do something like this:
$filePath = 'xxx.xlsx';
$reader = ReaderFactory::create(Type::XLSX);
$reader->open($filePath);
$writer = WriterFactory::create(Type::CSV);
$writer->openToFile('xxx.csv');
$rowCount = 0;
while ($reader->hasNextSheet()) {
$reader->nextSheet();
while ($reader->hasNextRow()) {
$row = $reader->nextRow();
$rowCount++;
if ($rowCount === 1) {
continue; // that's for the header row
}
// get the values you need in the current row
// for example:
$name = $row[1];
$email = $row[2];
// write the data to the CSV file
$writer->addRow([$name, $email]);
}
}
$reader->close();
$writer->close();
Give it a try! Hopefully it will solve your problem :)
I don't see the point in loading one spreadsheet file, copying everything from it to a second, then saving the second; that will be memory and performance intensive.
Why not just load the first, delete heading row 1, then save to your CSV output?
// Read the original spreadsheet
$inputFileName = 'TraiDBDump.xlsx';
$inputFileType = PHPExcel_IOFactory::identify($inputFileName);
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load($inputFileName);
// Remove header row
$objPHPExcel->getSheet(0)->removeRow(1, 1);
// Save as a csv file
$objWriter = new PHPExcel_Writer_CSV($objPHPExcel);
$objWriter->save("TraiDBDump.csv");
If your original has a lot of columns, and you only need A and B, then you could use a read filter to read only those two columns
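A minimal sketch of such a column filter is below. It follows the readCell() signature of PHPExcel_Reader_IReadFilter as used in the chunk filter earlier on this page; the class name ColumnReadFilter is my own, and the stub interface at the top exists only so the snippet runs without PHPExcel installed (with the library loaded, the real interface is used).

```php
<?php
// Stub so the sketch runs standalone; skipped when PHPExcel is loaded.
if (!interface_exists('PHPExcel_Reader_IReadFilter')) {
    interface PHPExcel_Reader_IReadFilter
    {
        public function readCell($column, $row, $worksheetName = '');
    }
}

// Hypothetical filter: accept only cells in the listed column letters.
class ColumnReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $columns;

    public function __construct(array $columns)
    {
        $this->columns = $columns;
    }

    public function readCell($column, $row, $worksheetName = '')
    {
        // $column is the column letter, e.g. 'A' or 'BW'
        return in_array($column, $this->columns, true);
    }
}

$filter = new ColumnReadFilter(array('A', 'B'));
var_dump($filter->readCell('A', 10)); // bool(true)
var_dump($filter->readCell('C', 10)); // bool(false)

// With PHPExcel loaded, the filter would be attached before load():
// $objReader->setReadFilter($filter);
```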

Uploading data from Excel to DB in PHP making size issues

I have to upload an Excel file, read its data, and write it to a DB. I need to get data from a 2MB Excel file. I have finished the code up to reading the Excel data. While uploading the file, this error was shown:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20 bytes)
I solved this by using
ini_set('memory_limit', '-1');
and the execution time issues were also managed, somehow, with
set_time_limit(0);
The problem is that when I try to upload an Excel file larger than 300KB, it takes far too long to complete. Can anyone suggest the best way to read data from Excel? Is there a performance issue if I store the values in an array before inserting them into the DB?
Adding my code here:
<?php
/** Include path **/
set_include_path(get_include_path() . PATH_SEPARATOR . 'Classes/');
/** PHPExcel_IOFactory */
include 'PHPExcel/IOFactory.php';
include('../includes/class_read_xl.php');
$obj_read_xl=new class_read_xl();
if(isset($_POST["upload"])) {
move_uploaded_file($_FILES["ufile"]["tmp_name"], "../uploads/xlsheet/" . $_FILES["ufile"]["name"]);
$file = $_FILES['ufile']['name'];
$inputFileName = '../uploads/xlsheet/'.$file;
//$inputFileName = "test.xls";
$inputFileType = PHPExcel_IOFactory::identify($inputFileName);
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$objReader->setReadDataOnly(false);
// Load $inputFileName to a PHPExcel Object
$objPHPExcel = $objReader->load($inputFileName);
$total_sheets=$objPHPExcel->getSheetCount(); // here 4
$allSheetName=$objPHPExcel->getSheetNames(); // array ([0]=>'student',[1]=>'teacher',[2]=>'school',[3]=>'college')
//print_r($allSheetName);
$objWorksheet = $objPHPExcel->setActiveSheetIndex(0); // first sheet
$highestRow = $objWorksheet->getHighestRow(); // here 5
$highestColumn = $objWorksheet->getHighestColumn(); // here 'E'
$highestColumnIndex = PHPExcel_Cell::columnIndexFromString($highestColumn); // here 5
//exit();
$arr_data=array();
for ($row = 1; $row <= $highestRow; ++$row) {
for ($col = 0; $col <= $highestColumnIndex; ++$col) {
$value = $objWorksheet->getCellByColumnAndRow($col, $row)->getValue();
if(is_array($arr_data) ) {
$arr_data[$row-1][$col]=$value;
}
}
}
print_r($arr_data);
}
?>
It doesn't look as though you're using any of the recommended memory-saving methods such as cell caching or "chunking", which are described in the PHPExcel documentation; see section 4.2.1 of the developer documentation (entitled "Cell Caching") and the various sections of the user documentation on reading spreadsheet files.
If you're only actually reading data from a single worksheet in the file, just load that one worksheet, not all four
Storing values to an array will significantly increase the memory requirements of the script, because your array will also take up memory. However, rather than reading every cell in a row individually, you could use the rangeToArray() method for each row, which wouldn't be such a big memory overhead, and is faster than reading each individual cell

PHPExcel taking an extremely long time to read Excel file

I'm using PHPExcel 1.7.8, PHP 5.4.14, Windows 7, and an Excel 2007 spreadsheet. The spreadsheet consists of 750 rows, columns A through BW, and is about 600KB in size. This is my code for opening the spreadsheet--pretty standard:
//Include PHPExcel_IOFactory
include 'PHPExcel/IOFactory.php';
include 'PHPExcel.php';
$inputFileName = 'C:\xls\lspimport\GetLSP1.xlsx';
// Read your Excel workbook
try {
$inputFileType = PHPExcel_IOFactory::identify($inputFileName);
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load($inputFileName);
} catch(Exception $e) {
die('Error loading file "'.pathinfo($inputFileName,PATHINFO_BASENAME).'": '.$e->getMessage());
}
//set active worksheet
$objWorksheet = $objPHPExcel->setActiveSheetIndexbyName('Sheet1');
$j = 0;
for($i = 2; $i < 3; $i++)
{
...
}
In the end, I eventually want to loop through each row in the spreadsheet, but for the time being while I perfect the script, I'm only looping through one row. The problem is, it takes 30 minutes for this script to execute. I echo'd messages after each section of code so I could see what was being processed and when, and my script basically waits for 30 minutes at this part:
$objPHPExcel = $objReader->load($inputFileName);
Have I configured something incorrectly? I can't figure out why it takes 30 minutes to load the spreadsheet. I appreciate any and all help.
PHPExcel has a problem identifying where the end of your Excel file is. Or rather, Excel has a hard time knowing where its own end is. If you touch a cell at A1000000, it thinks it needs to read that far.
I have done 2 things in the past to fix this:
1) Cut and paste the data you need into a new Excel file.
2) Specify the exact dimensions you want to read.
Edit: how to do option 2
public function readExcelDataToArray($excelFilePath, $maxRowNumber=-1, $maxColumnNumber=-1)
{
$objPHPExcel = PHPExcel_IOFactory::load($excelFilePath);
$objWorksheet = $objPHPExcel->getActiveSheet();
//Get last row and column that have data
if ($maxRowNumber == -1){
$lastRow = $objWorksheet->getHighestDataRow();
} else {
$lastRow = $maxRowNumber;
}
if ($maxColumnNumber == -1){
$lastCol = $objWorksheet->getHighestDataColumn();
//Change Column letter to column number
$lastCol = PHPExcel_Cell::columnIndexFromString($lastCol);
} else {
$lastCol = $maxColumnNumber;
}
//Get Data Array
$dataArray = array();
for ($currentRow = 1; $currentRow <= $lastRow; $currentRow++){
for ($currentCol = 0; $currentCol <= $lastCol; $currentCol++){
$dataArray[$currentRow][$currentCol] = $objWorksheet->getCellByColumnAndRow($currentCol, $currentRow)->getValue();
}
}
return $dataArray;
}
Unfortunately these solutions aren't very dynamic.
Note that a modern excel file is really just a zip with an xlsx extension. I have written extensions to PHPExcel that unzip them, and modify certain xml files to get the kinds of behaviors I want.
A third suggestion for you would be to monitor the contents of each row and stop when you get an empty one.
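That third suggestion could be sketched roughly as follows. This is plain PHP over arrays of cell values, not a PHPExcel API; takeUntilEmptyRow() is a hypothetical helper name, and treating a row of nulls or whitespace-only strings as "empty" is an assumption you may need to adjust.

```php
<?php
// Hypothetical helper: keep rows until the first completely empty one.
function takeUntilEmptyRow($rows)
{
    $kept = array();
    foreach ($rows as $row) {
        $isEmpty = true;
        foreach ($row as $value) {
            if ($value !== null && trim((string) $value) !== '') {
                $isEmpty = false;
                break;
            }
        }
        if ($isEmpty) {
            break; // treat the first empty row as the end of the data
        }
        $kept[] = $row;
    }
    return $kept;
}

$rows = array(
    array('Site', 'Status'),
    array('Alpha', 'OK'),
    array(null, '  '),        // empty row: reading stops here
    array('Ignored', 'row'),
);

$data = takeUntilEmptyRow($rows);
echo count($data), " data rows kept\n";
```

Note that this only helps once rows are being produced lazily (for example by a chunked or streaming read); it does not shorten a full load() call by itself.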
Resolved (for me) - see note at bottom of this post
I'm trying to use pretty much identical code on a dedicated quad core server with 16GB of RAM, also running similar versions - PHPExcel 1.7.9 and PHP 5.4.16
Just creating an empty reader takes 50 seconds!
// $inputFileType is 'Excel5';
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
Loading the spreadsheet (1 sheet, 2000 rows, 25 columns) I want to process (readonly) then takes 1802 seconds.
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load($inputFileName);
Of the various types of reader I consistently get timings for instantiation as shown below
foreach(array(
'Excel2007', // 350 seconds
'Excel5', // 50 seconds
'Excel2003XML', // 50 seconds
'OOCalc', // 50 seconds
'SYLK', // 50 seconds
'Gnumeric', // 50 seconds
'HTML', // 250 seconds
'CSV' // 50 seconds
) as $inputFileType) {
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
}
Peak memory usage was about 8MB... far less than the 250MB the script has available to it.
My suspicion WAS that PHPExcel_IOFactory::createReader($inputFileType) was calling something within a loop that's extremely slow under PHP 5.4.x ?
However, the excessive time was due to how PHPExcel names its classes and lays out the corresponding files. It has an autoloader that converts class names such as *PHPExcel_abc_def* into PHPExcel/abc/def.php for the require statement. Although we had PHPExcel's class directory defined in our include path, our own (already defined) autoloader couldn't handle the required translation from class name to file name (it was looking for *PHPExcel_abc_def.php*). When a class file cannot be included, our autoloader loops 5 times with a 10 second delay to see if the file is being updated and so might become available. So for every PHPExcel class that needed to be loaded we were introducing a delay of 50 seconds before hitting PHPExcel's own autoloader, which then loaded the file fine.
Now that I've got that resolved PHPExcel is proving to be truly awesome.
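For anyone hitting the same thing, the class-name-to-path translation described above can be sketched like this. It is a simplified illustration, not PHPExcel's own autoloader code; phpExcelClassToPath() and the Classes directory are assumptions for the example.

```php
<?php
// Hypothetical helper: map PHPExcel_Reader_Excel5 to
// <baseDir>/PHPExcel/Reader/Excel5.php, or return null for other classes.
function phpExcelClassToPath($className, $baseDir)
{
    if (strpos($className, 'PHPExcel') !== 0) {
        return null; // not a PHPExcel class, leave it to other autoloaders
    }
    return rtrim($baseDir, '/') . '/' . str_replace('_', '/', $className) . '.php';
}

// Registering it ahead of a slower autoloader avoids that loader's retries.
spl_autoload_register(function ($className) {
    $path = phpExcelClassToPath($className, __DIR__ . '/Classes'); // assumed location
    if ($path !== null && is_file($path)) {
        require $path;
    }
}, true, true); // third argument "true" prepends this loader to the queue

echo phpExcelClassToPath('PHPExcel_Reader_Excel5', '/lib'), "\n";
```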
I'm using the latest version of PHPExcel (1.8.1) in a Symfony project, and I also ran into time delays when using the $objReader->load($file) method. The time delays were not due to an autoloader, but to the load method itself. This method actually reads every cell in every worksheet. And since my data worksheet was 30 columns wide by 5000 rows, it took about 90 seconds to read all this information on my ancient work computer.
I assumed that the real loading/reading of cell values would occur on the fly as I requested them, but it looks like short of a pretty major re-write of the PHPExcel code, there's no real way around this initial load time delay.
If you know your file is a pretty plain Excel file, you can read it manually. An .xlsx file is just a zip archive with the spreadsheet values and structure stored in XML files. This script took me from the 60 seconds used by PHPExcel down to 0.18 seconds.
$zip = new ZipArchive();
$zip->open('path_to/file.xlsx');
$sheet_xml = simplexml_load_string($zip->getFromName('xl/worksheets/sheet1.xml'));
$sheet_array = json_decode(json_encode($sheet_xml), true);
$values = simplexml_load_string($zip->getFromName('xl/sharedStrings.xml'));
$values_array = json_decode(json_encode($values), true);
$end_result = array();
if ($sheet_array['sheetData']) {
foreach ($sheet_array['sheetData']['row'] as $r => $row) {
$end_result[$r] = array();
foreach ($row['c'] as $c => $cell) {
if (isset($cell['@attributes']['t'])) {
if ($cell['@attributes']['t'] == 's') {
$end_result[$r][] = $values_array['si'][$cell['v']]['t'];
} else if ($cell['@attributes']['t'] == 'e') {
$end_result[$r][] = '';
}
} else {
$end_result[$r][] = $cell['v'];
}
}
}
}
Result:
Array
(
[0] => Array
(
[0] => A1
[1] => B1
[2] => C1
)
[1] => Array
(
[0] => A2
[1] => B2
[2] => C2
)
)
This is error-prone and not optimized, but it works and illustrates the basic idea. If you know your file, you can make reading very fast. If you allow users to upload the files, you should perhaps avoid it, or at least do the necessary checks.
