Is there a way to tell PHPExcel to just write rows supplied from an array, without any of the calculation, styling, or other work it does while writing or when using fromArray()?
I need this for performance.
$inputFileName = 'client_files/sample.xlsx';
$objPHPExcel = PHPExcel_IOFactory::load($inputFileName);
$objPHPExcel->getSheet(0)->setCellValue('D2', '#' . $user . ' followers');
$objPHPExcel->getSheet(0)->fromArray(
    $followersData,
    NULL,
    'A5'
);
$objWriter = new PHPExcel_Writer_Excel2007($objPHPExcel);
$objWriter->setPreCalculateFormulas(false);
$objWriter->save(FINAL_FOLDER . '/' . $line[0] . '.xlsx');
Memory consumption isn't an issue, but the above simply takes too much time (2 minutes for 2,700 rows):
the ->save() call takes 93 seconds and the ->fromArray() call takes 53 seconds.
Also, is there any other, much faster Excel library that allows loading an existing xlsx and then writing to it?
Thanks
You can try using Spout. If you don't care about styling/calculation, it should solve your performance problem (it takes only a few seconds).
Something along these lines should work:
$inputFileName = 'client_files/sample.xlsx';
$reader = ReaderFactory::create(Type::XLSX);
$reader->open($inputFileName);

$outputFileName = FINAL_FOLDER . '/' . $line[0] . '.xlsx';
$writer = WriterFactory::create(Type::XLSX);
$writer->openToFile($outputFileName);

$reader->nextSheet();
$rowCount = 0;

while ($reader->hasNextRow()) {
    $row = $reader->nextRow();

    if ($rowCount === 1) {
        $row[1] = '#' . $user . ' followers';
    }

    $followersDataForCurrentRow = $followersData[$rowCount];
    $columnIndexStart = 4; // To add stuff in the 5th column

    foreach ($followersDataForCurrentRow as $followerValue) {
        $row[$columnIndexStart] = $followerValue;
        $columnIndexStart++;
    }

    $writer->addRow($row);
    $rowCount++;
}

$reader->close();
$writer->close();
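For completeness, a minimal setup sketch for the snippet above, assuming Composer autoloading and the early Box\Spout release whose ReaderFactory/WriterFactory and nextSheet()/nextRow() reader API are used here (newer Spout versions expose an iterator-based API instead, so adjust the imports to your version):

use Box\Spout\Reader\ReaderFactory;
use Box\Spout\Writer\WriterFactory;
use Box\Spout\Common\Type;

// Composer autoloader (hypothetical path; adjust to your install).
require_once 'vendor/autoload.php';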
I did a bunch of things that resulted in much faster performance:
ran the script outside the IDE
set the memory limit to 3GB
used a different version of PHP
fixed a memory leak:
$objPHPExcel->disconnectWorksheets();
unset($objPHPExcel);
I am not sure which of these solved the issue.
In a scheduled task of my Laravel application I'm reading several large gzipped CSV files, ranging from 80 MB to 4 GB, from an external FTP server; they contain products which I store in my database based on a product attribute.
I loop through a list of product feeds that I want to import, but each time a fatal error is returned: 'Allowed memory size of 536870912 bytes exhausted'. I can bump up the length parameter of the fgetcsv function from 1000 to 100000, which solves the problem for the smaller files (< 500 MB), but for the larger files it still returns the fatal error.
Is there a solution that allows me to either download or unzip the .csv.gz files, read the lines (in batches or one by one), and insert the products into my database without running out of memory?
$feeds = [
    "feed_baby-mother-child.csv.gz",
    "feed_computer-games.csv.gz",
    "feed_general-books.csv.gz",
    "feed_toys.csv.gz",
];

foreach ($feeds as $feed) {
    $importedProducts = array();
    $importedFeedProducts = 0;

    $csvfile = 'compress.zlib://ftp://' . config('app.ftp_username') . ':' . config('app.ftp_password') . '@' . config('app.ftp_host') . '/' . $feed;

    if (($handle = fopen($csvfile, "r")) !== FALSE) {
        $row = 1;
        $header = fgetcsv($handle, 1, "|");

        while (($data = fgetcsv($handle, 1000, "|")) !== FALSE) {
            if ($row == 1 || array(null) !== $data) { $row++; continue; }

            $product = array_combine($header, $data);
            $importedProducts[] = $product;
        }

        fclose($handle);
    } else {
        echo 'Failed to open: ' . $feed . PHP_EOL;
        continue;
    }

    // start inserting products into the database below here
}
The problem is probably not the gzip file itself.
Of course you can download it and then process it, but that keeps the same issue: you are loading all products into a single array (memory):
$importedProducts[] = $product;
You could comment this line out and see if that prevents hitting your memory limit.
Usually I would create a method like addProduct($product) to handle this in a memory-safe way.
From there you can decide on a maximum number of products to collect before doing a bulk insert, to achieve optimal speed; I usually use something between 1000 and 5000 rows.
For example:
class ProductBatchInserter
{
    private $maxRecords = 1000;
    private $records = [];

    public function addProduct($record)
    {
        // Buffer the record in memory...
        $this->records[] = $record;

        // ...and flush the buffer as a single bulk insert once it is full.
        if (count($this->records) >= $this->maxRecords) {
            EloquentModel::insert($this->records);
            $this->records = [];
        }
    }
}
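A usage sketch in the reading loop could then look like this (hypothetical: ProductBatchInserter as defined above, with EloquentModel standing in for your actual product model; any records left in the buffer would still need a final flush after the loop):

$inserter = new ProductBatchInserter();

while (($data = fgetcsv($handle, 1000, "|")) !== FALSE) {
    $product = array_combine($header, $data);
    $inserter->addProduct($product); // buffered, bulk-inserted every 1000 rows
}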
However, I usually don't implement it as a single class; in my projects I integrate it as a BulkInsertable trait that can be used on any Eloquent model.
But this should give you a direction for how to avoid memory limits.
Alternatively, the easier but significantly slower option is to insert each row right where you currently assign it to the array.
But that will put a ridiculous load on your database and will be really very slow.
If the GZIP stream is the bottleneck:
I don't expect this to be the issue, but if it were, you could use gzopen()
https://www.php.net/manual/en/function.gzopen.php
and pass the gzopen handle to fgetcsv.
I expect the stream wrapper you are using already does this for you.
If not, I mean like this:
$input = gzopen('input.csv.gz', 'r');

while (($row = fgetcsv($input)) !== false) {
    // do something memory safe, like suggested above
}
If you need to download the file anyway, there are many ways to do it, but make sure you use something memory safe, like fopen()/fgets() or a Guzzle stream, and don't use something like file_get_contents() that loads the whole file into memory.
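A minimal download sketch along those lines, with hypothetical FTP credentials and a Laravel storage path; stream_copy_to_stream() copies the data in chunks, so the file never has to fit in memory:

// Hypothetical: stream the remote file straight to local disk in chunks.
$source = fopen('ftp://user:pass@ftp.example.com/feed_toys.csv.gz', 'rb');
$target = fopen(storage_path('app/feed_toys.csv.gz'), 'wb');

stream_copy_to_stream($source, $target);

fclose($source);
fclose($target);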
I'm a beginner-level developer learning PHP. The task I need to do is upload a 6 GB CSV file, which contains data, into the database. I need to read the file through a controller.php file, split that huge CSV file into output CSV files of 10,000 rows each, and write the data into those output files. I have been at this task for a week already and haven't figured it out yet. Would you please help me solve this issue?
<?php

namespace App\Http\Controllers;

use Illuminate\Queue\SerializesModels;
use App\User;
use DateTime;
use Illuminate\Http\Request;
use Storage;
use Validator;
use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;
use Queue;
use App\model;

class Name extends Controller
{
    public function Post(Request $request)
    {
        if ($request->hasfile('upload')) {
            ini_set('auto_detect_line_endings', TRUE);
            $main_input = $request->file('upload');
            $main_output = 'output';
            $filesize = 10000;
            $input = fopen($main_input, 'r');
            $rowcount = 0;
            $filecount = 1;
            $output = '';

            // echo "here1";
            while (!feof($input)) {
                if (($rowcount % $filesize) == 0) {
                    if ($rowcount > 0) {
                        fclose($output);
                    }
                    $output = fopen(storage_path() . "/tmp/" . $main_output . $filecount++ . '.csv', 'w');
                }

                $data = fgetcsv($input);
                print_r($data);
                if ($data) {
                    fputcsv($output, $data);
                }
                $rowcount++;
            }

            fclose($output);
        }
    }
}
Maybe it's because you are creating a new $output file handle on each iteration.
I've made some adjustments so that we only open a file when the row count is 0 and close it when the file size is reached. The row count also has to be reset to 0 each time we close a file.
public function Post(Request $request)
{
    if ($request->hasfile('upload')) {
        ini_set('auto_detect_line_endings', TRUE);
        $main_input = $request->file('upload');
        $main_output = 'output';
        $filesize = 10000;
        $input = fopen($main_input, 'r');
        $rowcount = 0;
        $filecount = 1;
        $output = '';

        // echo "here1";
        while (!feof($input)) {
            // Open a new output file whenever the row counter is at 0.
            if ($rowcount == 0) {
                $output = fopen(storage_path() . "/tmp/" . $main_output . $filecount++ . '.csv', 'w');
            }

            // Once the chunk is full, close the file and reset the counter.
            if (($rowcount % $filesize) == 0) {
                if ($rowcount > 0) {
                    fclose($output);
                    $rowcount = 0;
                    continue;
                }
            }

            $data = fgetcsv($input);
            print_r($data);
            if ($data) {
                fputcsv($output, $data);
            }
            $rowcount++;
        }

        fclose($output);
    }
}
Here is a working example of splitting a CSV file by a number of lines (defined by $numberOfLines). Just set your path in $filePath and run the script from the shell, for example:
php -f convert.php
script code:
convert.php
<?php

$filePath = 'data.csv';
$numberOfLines = 10000;

$file = new SplFileObject($filePath);

// get header of the csv
$header = $file->fgets();

$outputBuffer = '';
$outputFileNamePrefix = 'datasplit-';
$readLinesCount = 1;
$readLinesTotalCount = 1;
$suffix = 0;

$outputBuffer .= $header;

while ($currentLine = $file->fgets()) {
    $outputBuffer .= $currentLine;
    $readLinesCount++;
    $readLinesTotalCount++;

    if ($readLinesCount >= $numberOfLines) {
        $outputFilename = $outputFileNamePrefix . $suffix . '.csv';
        file_put_contents($outputFilename, $outputBuffer);
        echo 'Wrote ' . $readLinesCount . ' lines to: ' . $outputFilename . PHP_EOL;

        $outputBuffer = $header;
        $readLinesCount = 0;
        $suffix++;
    }
}

// write remainder of the output buffer if it is not empty
if ($outputBuffer !== $header) {
    $outputFilename = $outputFileNamePrefix . $suffix . '.csv';
    file_put_contents($outputFilename, $outputBuffer);
    echo 'Wrote (last time) ' . $readLinesCount . ' lines to: ' . $outputFilename . PHP_EOL;

    $outputBuffer = '';
    $readLinesCount = 0;
}
You will not be able to convert that amount of data in one PHP execution if it is run from the web, because of the maximum execution time of PHP scripts, which is usually between 30 and 60 seconds - and there is a reason for that, so don't even try to extend it to some huge number. If you want your script to run for hours, you need to call it from the command line, but you can also call it in a similar way from another script (for example the controller you have).
You do that this way:
exec('php -f convert.php');
and that's it.
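If the conversion should not block the web request at all, a common pattern (assuming a Unix-like host) is to detach the process by redirecting its output and sending it to the background:

// Hypothetical: run convert.php in the background so the request returns immediately.
exec('php -f convert.php > /dev/null 2>&1 &');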
The controller you have will not be able to tell whether the whole data set was converted, because it will be terminated before that happens. What you can do is write code in convert.php that updates some field in the database, and another controller in your application can read that field and show the user the progress of the running convert.php.
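As a sketch of that progress reporting (hypothetical: a status file is used here instead of a database field, and the path is made up), convert.php could periodically write its counters somewhere another controller can read:

// Hypothetical progress reporting inside convert.php.
file_put_contents(__DIR__ . '/convert-progress.json', json_encode([
    'lines_written' => $readLinesTotalCount,
    'updated_at'    => date('c'),
]));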
The other approach is to create a job (or jobs) that you can put in a queue, to be run by a job manager process with workers that take care of the conversion, but I think that would be overkill for your needs.
Keep in mind that if you split something in one place and join it in another, something may go wrong in the process. A method that assures you the data was split, transferred, and joined successfully is to calculate a hash (e.g. SHA-1) of the whole 6 GB file before splitting, send that hash to the destination where all the small parts will be combined, combine them into one 6 GB file, calculate the hash of that file, and compare it with the one that was sent. Also keep in mind that after splitting, each of the small parts gets its own header row so that it is a valid, easy-to-import CSV file, whereas the original file has only one header row.
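A minimal sketch of that hash check, with hypothetical file names on each side:

// On the source machine, before splitting:
$originalHash = sha1_file('data.csv');

// On the destination machine, after re-joining the parts into one file:
$rejoinedHash = sha1_file('rejoined.csv');

if (!hash_equals($originalHash, $rejoinedHash)) {
    throw new RuntimeException('Re-joined file does not match the original.');
}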
Trying to maintain some old, dusty code, I am facing a problem with PHPExcel in an import Symfony command.
It seems that the library cannot correctly calculate a formula that references another sheet of the same document as the active sheet.
The error I get is:
[PHPExcel_Calculation_Exception]
Price Template Map!B2 -> Invalid cell coordinate A
My code is :
try {
    $inputFileType = \PHPExcel_IOFactory::identify($filePath);
    $objReader = \PHPExcel_IOFactory::createReader($inputFileType);
    $objReader->setLoadSheetsOnly(array($this->getListName(), 'Template Info'));
    $objReader->setIncludeCharts(true);
    $objPHPExcel = $objReader->load($filePath);
} catch (\Exception $e) {
    throw new \Exception("Invalid file");
}

$sheet = $objPHPExcel->getActiveSheet();
$highestRow = $sheet->getHighestRow();
$highestColumn = $sheet->getHighestColumn();

$fieldsNumber = array();
$filterData = array();
$templateIdList = array();

for ($i = 1; $i <= $highestRow; $i++) {
    $rowData = $sheet->rangeToArray('A' . $i . ':' . $highestColumn . $i, null, true, false);
    $rowData = $rowData[0];
    var_dump($rowData);
}
The first line, with the headers, is read correctly, but the rest is not.
My formula is:
=VLOOKUP(A18,'Template Info'!A:C,3,FALSE)
Do not hesitate to ask me for more information if you need it!
Thank you all in advance :)!
The problem is the column reference: the PHPExcel calculation engine supports range references (even to other worksheets), but not whole-row or whole-column references.
So
=VLOOKUP(A18,'Template Info'!A1:C100,3,FALSE)
would be valid, but
=VLOOKUP(A18,'Template Info'!A:C,3,FALSE)
can't be calculated.
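One possible workaround (a sketch only, assuming the lookup range can be bounded by the 'Template Info' sheet's highest row; cell B2 and the range are taken from the error above) is to rewrite the unbounded reference before evaluating it:

// Hypothetical: replace the whole-column reference with an explicit range
// so the PHPExcel calculation engine can evaluate it.
$infoSheet  = $objPHPExcel->getSheetByName('Template Info');
$highestRow = $infoSheet->getHighestRow();

$cell    = $objPHPExcel->getActiveSheet()->getCell('B2');
$formula = $cell->getValue(); // =VLOOKUP(A18,'Template Info'!A:C,3,FALSE)
$formula = str_replace("'Template Info'!A:C", "'Template Info'!A1:C" . $highestRow, $formula);
$cell->setValue($formula);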
Use the getCalculatedValue() function.
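For example, as a minimal illustration (B2 being the cell named in the error above):

$value = $objPHPExcel->getActiveSheet()->getCell('B2')->getCalculatedValue();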
I'm trying to export some records to Excel from my MySQL database (on a web server), and when the query returns more than 4k records the script hangs the web browser and, temporarily, the web hosting.
My PHP version is 5.2.13-pl1-gentoo and the memory_limit configured in php.ini is 128M.
The resulting Excel file only has one column and N rows. With 100 or 200 rows the PHP script runs fine.
This is the PHP script:
<? session_start();
ini_set('memory_limit', '1024M');
set_time_limit(0);

include("include/conexion.php");
require_once 'include/PHPExcel/Classes/PHPExcel.php';
require_once 'include/PHPExcel/Classes/PHPExcel/IOFactory.php';

$objPHPExcel = new PHPExcel();
$objPHPExcel->getProperties()->setCreator("Name")
    ->setLastModifiedBy("Name")
    ->setTitle("Listado")
    ->setSubject("Listado")
    ->setDescription("Listado.")
    ->setKeywords("Listado")
    ->setCategory("Listado");

$query = explode("|", stripcslashes($_POST['query']));

$objPHPExcel->getActiveSheet()->setTitle('List');

$resEmp = mysql_query($query, $conexion) or die(mysql_error());
$tot = mysql_num_rows($resEmp);
$num_fields = mysql_num_fields($resEmp);

$fistIndex = $objPHPExcel->getActiveSheet()->getCellByColumnAndRow(0, 1)->getColumn();
$lastIndex = $objPHPExcel->getActiveSheet()->getCellByColumnAndRow($num_fields - 1, 1)->getColumn();

// titles
for ($e = 0; $e < $num_fields; $e++) {
    $objPHPExcel->getActiveSheet()->setCellValueByColumnAndRow($e, 2, utf8_decode(ucwords(mysql_field_name($resEmp, $e))));
    $objPHPExcel->getActiveSheet()->getColumnDimension($objPHPExcel->getActiveSheet()->getCellByColumnAndRow($e, 2)->getColumn())->setAutoSize(true);
}

// color titles
$objPHPExcel->getActiveSheet()->getStyle($fistIndex . '1:' . $lastIndex . '2')->getFill()->setFillType(PHPExcel_Style_Fill::FILL_SOLID)->getStartColor()->setRGB('c5c5c7');
$objPHPExcel->getActiveSheet()->getStyle($fistIndex . '1:' . $lastIndex . '2')->getFont()->setBold(true);

if (isset($_POST['mail'])) {
    $objPHPExcel->getActiveSheet()->setCellValueByColumnAndRow(0, 2, "Email");
    $emails = array();

    for ($row = 0; $row < $tot; $row++) {
        // more than one mail in field, separated by ";"
        $aux = explode(";", mysql_result($resEmp, $row, $col));

        for ($i = 0; $i < count($aux); $i++) {
            $cleaned = utf8_encode(strtolower(trim($aux[$i])));

            // filter repeated mails
            if (!in_array($cleaned, $emails) && $aux[$i] != "") {
                $num_rows = $objPHPExcel->getActiveSheet()->getHighestRow();
                $objPHPExcel->getActiveSheet()->insertNewRowBefore($num_rows + 1, 1);
                array_push($emails, $cleaned);
                $objPHPExcel->getActiveSheet()->setCellValueByColumnAndRow(0, $num_rows + 1, $cleaned);
            }
        }
    }
}

$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel2007');

header('Content-type: application/vnd.ms-excel');
header("Content-Disposition: attachment; filename=" . $nom_archivo . ".xlsx");

// Write file to the browser
$objWriter->save('php://output');
exit();
?>
When the script runs, it executes a MySQL query and then iterates over the result to get the mail field; if an obtained mail does not already exist in an array, it is inserted into the Excel sheet.
I've tried to set
ini_set('memory_limit', '1024M');
set_time_limit(0);
But the problem persists.
Any idea how to solve the problem?
Thanks a lot
EDIT 1
I've updated the code with the recommendations and now it works fine.
Anyway, how can I find out whether any error occurs, and what the memory usage is, just before it hangs?
How can I get the maximum memory_limit available to set with ini_set('memory_limit', '2048M');?
<? session_start();
ini_set('memory_limit', '2048M');
set_time_limit(0);

include("include/conexion.php");
require_once 'include/PHPExcel/Classes/PHPExcel.php';
require_once 'include/PHPExcel/Classes/PHPExcel/IOFactory.php';

$objPHPExcel = new PHPExcel();
$objPHPExcel->getProperties()->setCreator("Name")
    ->setLastModifiedBy("Name")
    ->setTitle("Listado")
    ->setSubject("Listado")
    ->setDescription("Listado.")
    ->setKeywords("Listado")
    ->setCategory("Listado");

$activeSheet = $objPHPExcel->getActiveSheet();

$query = explode("|", stripcslashes($_POST['query']));

$activeSheet->setTitle('List');

$resEmp = mysql_query($query, $conexion) or die(mysql_error());
$tot = mysql_num_rows($resEmp);
$num_fields = mysql_num_fields($resEmp);

$fistIndex = $activeSheet->getCellByColumnAndRow(0, 1)->getColumn();
$lastIndex = $activeSheet->getCellByColumnAndRow($num_fields - 1, 1)->getColumn();

// titles
for ($e = 0; $e < $num_fields; $e++) {
    $activeSheet->setCellValueByColumnAndRow($e, 2, utf8_decode(ucwords(mysql_field_name($resEmp, $e))));
    $activeSheet->getColumnDimension($activeSheet->getCellByColumnAndRow($e, 2)->getColumn())->setAutoSize(true);
}

// color titles
$activeSheet->getStyle($fistIndex . '1:' . $lastIndex . '2')->getFill()->setFillType(PHPExcel_Style_Fill::FILL_SOLID)->getStartColor()->setRGB('c5c5c7');
$activeSheet->getStyle($fistIndex . '1:' . $lastIndex . '2')->getFont()->setBold(true);

if (isset($_POST['mail'])) {
    $activeSheet->setCellValueByColumnAndRow(0, 2, "Email");
    $emails = array();

    for ($row = 0; $row < $tot; $row++) {
        // more than one mail in field, separated by ";"
        $aux = explode(";", mysql_result($resEmp, $row, $col));

        for ($i = 0; $i < count($aux); $i++) {
            $cleaned = utf8_encode(strtolower(trim($aux[$i])));

            // filter repeated mails
            if (!in_array($cleaned, $emails) && $aux[$i] != "") {
                array_push($emails, $cleaned);
            }
        }
    }

    for ($row = 0; $row < count($emails); $row++) {
        $activeSheet->setCellValueByColumnAndRow(0, $row + 3, $emails[$row]);
    }
}

$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel2007');

header('Content-type: application/vnd.ms-excel');
header("Content-Disposition: attachment; filename=" . $nom_archivo . ".xlsx");

// Write file to the browser
$objWriter->save('php://output');
exit();
?>
It seems this library has a serious problem parsing large spreadsheets. I had this issue already and couldn't find a proper solution. I guess this is normal behaviour, because the library is written entirely in PHP, which causes a lot of parsing overhead.
I strongly suggest you use an Excel-parsing PHP extension like this one.
As another thinkable solution (if it's possible), you could break your big file down into several smaller files (e.g. by sheets); otherwise I guess you should use a faster CPU, or use another library or programming language to parse your Excel files (e.g. apache-poi in Java, maybe with a PHP/Java bridge).
Unfortunately, PHPExcel does not perform well with large data, because PHP is not really a good language for binary file processing.
Some people export their data to the XML format of Excel (http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats) and it can work well. However, the XML format does not have the full feature set of the binary Excel format, and of course it produces a bigger file.
In order to work with large data (import/export to binary Excel files), our system now uses libxl, which costs $199 for a license, and php_excel, which is a wrapper for libxl. Our system now exports an Excel file with more than 5k rows in just a few seconds using libxl, and I think it's currently the only real option for working with binary Excel files at this scale.
P.S.: $objPHPExcel->getActiveSheet() also has a cost, so you could store its value in a variable for reuse, which will speed up your code a little bit.
I had this problem, but after changing some options in php.ini and in the script I could reduce the file from 28 MB to 4 MB:
increase memory_limit to 2048M in php.ini;
change max_execution_time to more seconds;
in the script, use the Excel2007 writer like below:
ob_end_clean();
header('Content-Type: application/vnd.ms-excel');
header("Content-Disposition: attachment;filename=$date.xls");
header('Cache-Control: max-age=0');
ob_end_clean();
$objWriter =PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel2007');
$objWriter->save('php://output');
I am using PHPExcel to create an Excel document, using data from a MySQL database. My script must execute in under 512MB of RAM, and I am running into trouble as my export reaches 200k records:
PHP Fatal error: Allowed memory size of...
How can I use PHPExcel to create large documents in as little amount of RAM as possible?
My current code:
// Autoload classes
ProjectConfiguration::registerPHPExcel();

$xls = new PHPExcel();
$xls->setActiveSheetIndex(0);
$i = 0;
$j = 2;

// Write the col names
foreach ($columnas_excel as $columna) {
    $xls->getActiveSheet()->setCellValueByColumnAndRow($i, 1, $columna);
    $xls->getActiveSheet()->getColumnDimensionByColumn($i)->setAutoSize(true);
    $i++;
}

// paginate the result from database
$pager = new sfPropelPager('Antecedentes', 50);
$pager->setCriteria($query_personas);
$pager->init();
$last_page = $pager->getLastPage();

// write the data to excel object
for ($pagina = 1; $pagina <= $last_page; $pagina++) {
    $pager->setPage($pagina);
    $pager->init();

    foreach ($pager->getResults() as $persona) {
        $i = 0;
        foreach ($columnas_excel as $key_col => $columnas) {
            $xls->getActiveSheet()->setCellValueByColumnAndRow($i, $j, $persona->getByName($key_col, BasePeer::TYPE_PHPNAME));
            $i++;
        }
        $j++;
    }
}

// write the file to the disk
$writer = new PHPExcel_Writer_Excel2007($xls);
$filename = sfConfig::get('sf_upload_dir') . DIRECTORY_SEPARATOR . "$cache.listado_personas.xlsx";

if (file_exists($filename)) {
    unlink($filename);
}

$writer->save($filename);
CSV version:
// Write the col names to the file
$columnas_key = array_keys($columnas_excel);
file_put_contents($filename, implode(",", $columnas_excel) . "\n");

// write data to the file
for ($pagina = 1; $pagina <= $last_page; $pagina++) {
    $pager->setPage($pagina);
    $pager->init();

    foreach ($pager->getResults() as $persona) {
        $persona_arr = array();

        // make an array
        foreach ($columnas_excel as $key_col => $columnas) {
            $persona_arr[] = $persona->getByName($key_col, BasePeer::TYPE_PHPNAME);
        }

        // append to the file
        file_put_contents($filename, implode(",", $persona_arr) . "\n", FILE_APPEND | LOCK_EX);
    }
}
I still have the RAM problem when Propel makes requests to the database; it's as if Propel does not release the RAM each time a new request is made. I even tried creating and deleting the pager object in each iteration.
Propel has formatters in the Query API; you'll be able to write this kind of code:
<?php
$query = AntecedentesQuery::create()
    // Some ->filter()
;

$csv = $query->toCSV();
$csv contains CSV content that you'll be able to render by setting the correct mime-type.
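For instance, a sketch of rendering it from a plain PHP action (the filename is made up; in Symfony you would normally set these headers on a Response object instead):

header('Content-Type: text/csv; charset=utf-8');
header('Content-Disposition: attachment; filename="antecedentes.csv"');
echo $csv;
exit;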
Since it appears you can use a CSV, try pulling one record at a time and appending it to your CSV. Don't try to get all 200k records at the same time.
$cursor = mysql_query( $sqlToFetchData );    // get a MySQL resource for your query
$fileHandle = fopen( 'data.csv', 'a');       // use 'a' for Append mode

while( $row = mysql_fetch_row( $cursor ) ){  // pull your data 1 record at a time
    fputcsv( $fileHandle, $row );            // append the record to the CSV file
}

fclose( $fileHandle );                       // clean up
mysql_free_result( $cursor );
I'm not sure how to transform the CSV into an XLS file, but hopefully this will get you on your way.