Export a large number of rows to an Excel document with a small memory footprint - PHP

I am using PHPExcel to create an Excel document, using data from a MySQL database. My script must execute in under 512MB of RAM, and I am running into trouble as my export reaches 200k records:
PHP Fatal error: Allowed memory size of...
How can I use PHPExcel to create large documents in as little amount of RAM as possible?
My current code:
// Autoload classes
ProjectConfiguration::registerPHPExcel();

$xls = new PHPExcel();
$xls->setActiveSheetIndex(0);
$i = 0;
$j = 2;

// Write the column names
foreach ($columnas_excel as $columna) {
    $xls->getActiveSheet()->setCellValueByColumnAndRow($i, 1, $columna);
    $xls->getActiveSheet()->getColumnDimensionByColumn($i)->setAutoSize(true);
    $i++;
}

// Paginate the result from the database
$pager = new sfPropelPager('Antecedentes', 50);
$pager->setCriteria($query_personas);
$pager->init();
$last_page = $pager->getLastPage();

// Write the data to the Excel object
for ($pagina = 1; $pagina <= $last_page; $pagina++) {
    $pager->setPage($pagina);
    $pager->init();
    foreach ($pager->getResults() as $persona) {
        $i = 0;
        foreach ($columnas_excel as $key_col => $columnas) {
            $xls->getActiveSheet()->setCellValueByColumnAndRow($i, $j, $persona->getByName($key_col, BasePeer::TYPE_PHPNAME));
            $i++;
        }
        $j++;
    }
}

// Write the file to disk
$writer = new PHPExcel_Writer_Excel2007($xls);
$filename = sfConfig::get('sf_upload_dir') . DIRECTORY_SEPARATOR . "$cache.listado_personas.xlsx";
if (file_exists($filename)) {
    unlink($filename);
}
$writer->save($filename);
CSV version:
// Write the column names to the file
$columnas_key = array_keys($columnas_excel);
file_put_contents($filename, implode(",", $columnas_excel) . "\n");

// Write the data to the file
for ($pagina = 1; $pagina <= $last_page; $pagina++) {
    $pager->setPage($pagina);
    $pager->init();
    foreach ($pager->getResults() as $persona) {
        $persona_arr = array();
        // Build an array for the row
        foreach ($columnas_excel as $key_col => $columnas) {
            $persona_arr[] = $persona->getByName($key_col, BasePeer::TYPE_PHPNAME);
        }
        // Append the row to the file
        file_put_contents($filename, implode(",", $persona_arr) . "\n", FILE_APPEND | LOCK_EX);
    }
}
I still have the RAM problem when Propel queries the database; it seems as if Propel does not release memory after each new request. I even tried creating and destroying the pager object on each iteration.

Propel has formatters in the Query API; you'll be able to write this kind of code:
<?php
$query = AntecedentesQuery::create()
    // Some ->filter()
;
$csv = $query->toCSV();
$csv contains the CSV content, which you'll be able to render by setting the correct mime-type.
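A rough sketch of that rendering step, assuming $query->toCSV() is available in your Propel version and that the output should be offered as a download:
<?php
$query = AntecedentesQuery::create();
// Some ->filter()
$csv = $query->toCSV();

// Send the correct mime-type so the browser treats the response as a CSV download
header('Content-Type: text/csv; charset=utf-8');
header('Content-Disposition: attachment; filename="listado_personas.csv"');
echo $csv;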

Since it appears you can use a CSV, try pulling 1 record at a time and appending it to your CSV. Don't try to get all 200k records at the same time.
$cursor = mysql_query( $sqlToFetchData ); // get a MySQL result resource for your query
$fileHandle = fopen( 'data.csv', 'a' );   // use 'a' for append mode

while( $row = mysql_fetch_row( $cursor ) ){ // pull your data 1 record at a time
    fputcsv( $fileHandle, $row );           // append the record to the CSV file
}

fclose( $fileHandle );        // clean up the file handle
mysql_free_result( $cursor ); // free the result resource (mysql_close() expects a link, not a result)
I'm not sure how to transform the CSV into an XLS file, but hopefully this will get you on your way.
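On the memory side, it may also help to ask MySQL not to buffer the whole result set in the PHP client. A rough sketch of the same loop with an unbuffered query (mysql_unbuffered_query() belongs to the same old mysql_* extension used above, and $sqlToFetchData is the same placeholder query):
$cursor = mysql_unbuffered_query( $sqlToFetchData ); // rows are streamed instead of buffered in PHP memory
$fileHandle = fopen( 'data.csv', 'a' );

while( $row = mysql_fetch_row( $cursor ) ){
    fputcsv( $fileHandle, $row );
}

fclose( $fileHandle );
Note that with an unbuffered query you must fetch all rows before issuing another query on the same connection.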

Related

How do I split a 6 GB CSV file into chunks using PHP?

I'm a beginner-level developer learning PHP. The task I need to do is upload a 6 GB CSV file containing data into the database. I need to access the data, i.e. read the file through the controller.php file, then split that huge CSV file into 10,000-row output CSV files and write the data into those output files. I have been at this task for a week already and haven't figured it out yet. Would you please help me solve this issue?
<?php

namespace App\Http\Controllers;

use Illuminate\Queue\SerializesModels;
use App\User;
use DateTime;
use Illuminate\Http\Request;
use Storage;
use Validator;
use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;
use Queue;
use App\model;

class Name extends Controller
{
    public function Post(Request $request)
    {
        if ($request->hasfile('upload')) {
            ini_set('auto_detect_line_endings', TRUE);
            $main_input = $request->file('upload');
            $main_output = 'output';
            $filesize = 10000;
            $input = fopen($main_input, 'r');
            $rowcount = 0;
            $filecount = 1;
            $output = '';
            // echo "here1";
            while (!feof($input)) {
                if (($rowcount % $filesize) == 0) {
                    if ($rowcount > 0) {
                        fclose($output);
                    }
                    $output = fopen(storage_path() . "/tmp/" . $main_output . $filecount++ . '.csv', 'w');
                }
                $data = fgetcsv($input);
                print_r($data);
                if ($data) {
                    fputcsv($output, $data);
                }
                $rowcount++;
            }
            fclose($output);
        }
    }
}
Maybe it's because you are creating a new $output file handle for each iteration.
I've made some adjustments so that we only open a file when the row count is 0 and close it when the file size limit is reached. The row count also has to be reset to 0 each time we close a file.
public function Post(Request $request)
{
    if ($request->hasfile('upload')) {
        ini_set('auto_detect_line_endings', TRUE);
        $main_input = $request->file('upload');
        $main_output = 'output';
        $filesize = 10000;
        $input = fopen($main_input, 'r');
        $rowcount = 0;
        $filecount = 1;
        $output = '';
        // echo "here1";
        while (!feof($input)) {
            // open a new output file whenever the row counter is back at 0
            if ($rowcount == 0) {
                $output = fopen(storage_path() . "/tmp/" . $main_output . $filecount++ . '.csv', 'w');
            }
            // close the current file once the chunk size is reached, reset the counter and start over
            if (($rowcount % $filesize) == 0) {
                if ($rowcount > 0) {
                    fclose($output);
                    $rowcount = 0;
                    continue;
                }
            }
            $data = fgetcsv($input);
            print_r($data);
            if ($data) {
                fputcsv($output, $data);
            }
            $rowcount++;
        }
        fclose($output);
    }
}
Here is a working example of splitting a CSV file by a number of lines (defined by $numberOfLines). Just set your path in $filePath and run the script from a shell, for example:
php -f convert.php
Script code (convert.php):
<?php

$filePath = 'data.csv';
$numberOfLines = 10000;

$file = new SplFileObject($filePath);

// get the header of the csv
$header = $file->fgets();

$outputBuffer = '';
$outputFileNamePrefix = 'datasplit-';
$readLinesCount = 0;      // lines in the current chunk
$readLinesTotalCount = 0; // lines read overall
$suffix = 0;
$outputBuffer .= $header;

while ($currentLine = $file->fgets()) {
    $outputBuffer .= $currentLine;
    $readLinesCount++;
    $readLinesTotalCount++;

    if ($readLinesCount >= $numberOfLines) {
        $outputFilename = $outputFileNamePrefix . $suffix . '.csv';
        file_put_contents($outputFilename, $outputBuffer);
        echo 'Wrote ' . $readLinesCount . ' lines to: ' . $outputFilename . PHP_EOL;
        $outputBuffer = $header;
        $readLinesCount = 0;
        $suffix++;
    }
}

// write the remainder of the output buffer if it is not empty
if ($outputBuffer !== $header) {
    $outputFilename = $outputFileNamePrefix . $suffix . '.csv';
    file_put_contents($outputFilename, $outputBuffer);
    echo 'Wrote (last time) ' . $readLinesCount . ' lines to: ' . $outputFilename . PHP_EOL;
    $outputBuffer = '';
    $readLinesCount = 0;
}
You will not be able to convert this amount of data in one PHP execution if it is run from the web, because of the maximum execution time of PHP scripts, which is usually between 30 and 60 seconds, and there is a reason for that - don't even try to extend it to some huge number. If you want your script to run for hours, you need to call it from the command line, but you can also call it in a similar way from another script (for example the controller you have).
You do that this way:
exec('php -f convert.php');
and that's it.
The controller you have will not be able to tell whether the whole data set was converted, because it will be terminated before that happens. What you can do is write your own code in convert.php that updates some field in the database, and another controller in your application can read that field and show the user the progress of the running convert.php.
The other approach is to create a job (or jobs) that you can put in a queue and have run by a job-manager process with workers that take care of the conversion, but I think that would be overkill for your needs.
Keep in mind that if you split data in one place and join it in another, something may go wrong in that process. A method that assures you the data was split, transferred and joined successfully is to calculate a hash, e.g. SHA-1, of the whole 6 GB file before splitting, send that hash to the destination where all the small parts are to be combined, combine them into one 6 GB file, calculate the hash of that file and compare it with the one that was sent. Also keep in mind that after splitting, each of the small parts has its own header so that it is a CSV file that is easy to interpret (import), whereas the original file has only one header row.
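A minimal sketch of that integrity check using PHP's built-in sha1_file(); 'combined.csv' is a hypothetical name for the rejoined file:
// On the source machine, before splitting:
$sourceHash = sha1_file('data.csv');

// On the destination, after recombining the parts into one file (hypothetical name):
$combinedHash = sha1_file('combined.csv');

if ($sourceHash === $combinedHash) {
    echo 'Transfer OK: hashes match' . PHP_EOL;
} else {
    echo 'Transfer failed: hashes differ' . PHP_EOL;
}
In practice the source hash would be sent alongside the parts and compared on the destination machine.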

PHP memory exhausted while using array_combine in a foreach loop

I'm having trouble when I try to use array_combine in a foreach loop. It ends up with an error:
PHP Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 85 bytes) in
Here is my code:
$data = array();
$csvData = $this->getData($file);

if ($columnNames) {
    $columns = array_shift($csvData);
    foreach ($csvData as $keyIndex => $rowData) {
        $data[$keyIndex] = array_combine($columns, array_values($rowData));
    }
}

return $data;
The source CSV file I'm using has approximately 1,000,000 rows. For the line
$csvData = $this->getData($file)
I was using a while loop to read the CSV and assign it to an array, and it works without any problem. The trouble comes from array_combine and the foreach loop.
Do you have any idea how to resolve this, or simply a better solution?
UPDATED
Here is the code to read the CSV file (using a while loop):
$data = array();
if (!file_exists($file)) {
    throw new Exception('File "' . $file . '" does not exist');
}

$fh = fopen($file, 'r');
while ($rowData = fgetcsv($fh, $this->_lineLength, $this->_delimiter, $this->_enclosure)) {
    $data[] = $rowData;
}
fclose($fh);

return $data;
UPDATED 2
The code above works without any problem if you are playing with a CSV file of roughly 20,000-30,000 rows. From 50,000 rows and up, the memory is exhausted.
You're in fact keeping (or trying to keep) two distinct copies of the whole dataset in memory. First you load the whole CSV data into memory using getData(), and then you copy the data into the $data array by looping over the data in memory and creating a new array.
You should use stream-based reading when loading the CSV data to keep just one data set in memory. If you're on PHP 5.5+ (which you definitely should be, by the way), this is as simple as changing your getData method to look like this:
protected function getData($file) {
    if (!file_exists($file)) {
        throw new Exception('File "' . $file . '" does not exist');
    }

    $fh = fopen($file, 'r');
    while ($rowData = fgetcsv($fh, $this->_lineLength, $this->_delimiter, $this->_enclosure)) {
        yield $rowData;
    }
    fclose($fh);
}
This makes use of a so-called generator, which is a PHP >= 5.5 feature. The rest of your code should continue to work, as the inner workings of getData should be transparent to the calling code (only half of the truth).
UPDATE to explain how extracting the column headers will work now.
$data = array();
$csvData = $this->getData($file);

if ($columnNames) { // don't know what this one does exactly
    $columns = null;
    foreach ($csvData as $keyIndex => $rowData) {
        if ($keyIndex === 0) {
            $columns = $rowData;
        } else {
            $data[$keyIndex /* -1 if you need 0-index */] = array_combine(
                $columns,
                array_values($rowData)
            );
        }
    }
}

return $data;
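If even the resulting $data array is too large to hold roughly a million combined rows, the same generator idea can be pushed one step further. This sketch is not part of the original answer; getCombinedData is a hypothetical helper, and it assumes the calling code only needs to iterate the rows once:
protected function getCombinedData($file) {
    $columns = null;
    foreach ($this->getData($file) as $keyIndex => $rowData) {
        if ($keyIndex === 0) {
            $columns = $rowData; // first row holds the column names
        } else {
            // yield one combined row at a time instead of collecting them all in an array
            yield array_combine($columns, array_values($rowData));
        }
    }
}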

PHP fputcsv not writing all data to CSV

I have this simple function to write data to a CSV file, but it is not writing all the data that I fetch.
Consider an example where I have 500 records to write, which should correspond to 500 rows in the CSV. When I run this operation it sometimes writes only 120 rows, or 150, or even 10. Every time I run this function I get a variable number of rows, not 500.
I have not yet found the root cause of why it is not writing all 500 records that I fetch.
Following is the PHP code:
$arrHead = array();
$arrHead = array('Assign to','Group assigned to','Customer name','Email','Phone','created Date','Ticket type',
    'Ticket Status','internal note','Ticket number','Subject','Origin','Travel Start Date','Destination',
    'Package Type','Lead Received','Mail box ID');

$csvfname = time().'.csv';

if (Yii::app()->params['env'] == "LOCAL")
{
    $basePath = '/var/www/c360/uploads/1/1/';
    $basePath = '/var/www/html/c360/uploads/1/1/'; // Mangesh
}
else
{
    $basePath = '/data/cview/run/nginx/htdocs/c360/uploads/1/1/';
}

$completeFilePath = $basePath . $csvfname;

if (!empty($arrRecords) && count($arrRecords) > 0)
{
    $fp = fopen($completeFilePath, 'w+');
    fputcsv($fp, $arrHead, ",", '"');
    chmod($completeFilePath, 0777);

    foreach ($arrRecords as $keyTicket => $arrTicket)
    {
        // preparing the column data as an array
        $columnValue = array();
        $columnValue['col1'] = trim('some value');
        .
        .
        .
        .
        // after the data is prepared in the array, I write it to the CSV
        fputcsv($fp, $columnValue, ",", '"');
    }
    fclose($fp);
}
After it writes to the CSV, when I check it, the file has fewer rows appended, not all 500.
I don't know where I'm going wrong in this case.

Convert CSV to Excel with PHPExcel in Laravel?

I have found this answer:
PHP Converting CSV to XLS - phpExcel error
but I have tried it in Laravel 4 and I am not able to get it to work; any help would be appreciated.
My Code
public function CsvExcelConverter($filename)
{
    $objReader = Excel::createReader('CSV');
    $objReader->setDelimiter(";");
    $objPHPExcel = $objReader->load('uploads/'.$filename);

    $objWriter = Excel::createWriter($objPHPExcel, 'Excel5');

    // new file
    $new_filename = explode('.', $filename);
    $new_name = $new_filename[1];

    $objWriter->save($new_name.'.xls');

    return $new_name.'.xls';
}
Thanks for the answers, but for some reason we can't seem to set the delimiter on load. However, I have found that you can set it in the config file:
vendor/maatwebsite/excel/src/config/csv.php
Then just specify the delimiter. This way, when loading the file it actually separates each entry, and when converting it each entry ends up in its own cell.
Thanks for all the help.
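For reference, the relevant entries in that config file look roughly like this (a sketch only; the key names are taken from the maatwebsite/excel package and may differ between versions):
<?php
// vendor/maatwebsite/excel/src/config/csv.php
return array(
    'delimiter'   => ';',   // delimiter used when reading and writing CSV files
    'enclosure'   => '"',
    'line_ending' => "\r\n",
);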
/* Get the excel.php class here: http://www.phpclasses.org/browse/package/1919.html */
require_once("../classes/excel.php");
$inputFile = $argv[1];
$xlsFile = $argv[2];

if( empty($inputFile) || empty($xlsFile) ) {
    die("Usage: " . basename($argv[0]) . " in.csv out.xls\n" );
}

$fh = fopen( $inputFile, "r" );
if( !is_resource($fh) ) {
    die("Error opening $inputFile\n" );
}

/* Assuming that the first line is column headings */
if( ($columns = fgetcsv($fh, 1024, "\t")) == false ) {
    print( "Error, couldn't get header row\n" );
    exit(-2);
}
$numColumns = count($columns);

/* Now read each of the rows, and construct a
   big array that holds the data to be Excel-ified: */
$xlsArray = array();
$xlsArray[] = $columns;

while( ($rows = fgetcsv($fh, 1024, "\t")) != FALSE ) {
    $rowArray = array();
    for( $i = 0; $i < $numColumns; $i++ ) {
        $key = $columns[$i];
        $val = $rows[$i];
        $rowArray["$key"] = $val;
    }
    $xlsArray[] = $rowArray;
    unset($rowArray);
}
fclose($fh);

/* Now let the excel class work its magic. excel.php
   has registered a stream wrapper for "xlsfile:/"
   and that's what triggers its 'magic': */
$xlsFile = "xlsfile://" . $xlsFile;
$fOut = fopen( $xlsFile, "wb" );
if( !is_resource($fOut) ) {
    die( "Error opening $xlsFile\n" );
}
fwrite($fOut, serialize($xlsArray));
fclose($fOut);

exit(0);
If you use the maatwebsite/excel library in Laravel, you can only use native PHPExcel instance methods, not static methods. To convert from CSV to Excel, this code can be found on the documentation page:
Excel::load($filename, function($file) {
    // modify file content
})->setFileName($new_name)->store('xls');
In theory, you should create your own custom class to set the delimiter:
class CSVExcel extends Excel {
    protected $delimiter = ';';
}
and now you could use:
CSVExcel::load('csvfilename.csv')->setFileName('newfilename')->export('xls');
But the problem is that $delimiter isn't used in this case. Delimiter support seems to have been added not long ago, so maybe there is a bug, or it needs to be used in some other way. I've opened an issue for that just in case: https://github.com/Maatwebsite/Laravel-Excel/issues/262

PHP simpleXML, instead of appending, writes over everything

I have an ongoing XML file. When I call a PHP function to add a new child, it loops through an array of strings, queries a DB, adds a new child to the document, and saves it under the current string in the array. However, it is not appending; it is overwriting everything. Do I need to load the file first and check if it exists?
function createUnitsXML($units, $wcccanumber, $mysqli) {
    // Delete whitespace and create an array of units assigned to the call
    $unit = preg_replace('/\s+/', '', $units);
    $unitsarray = explode(",", $unit);

    for ($i = 0; $i < count($unitsarray); $i++) {
        $xml = new SimpleXMLElement('<xml/>');
        $query = "SELECT * FROM calls WHERE wcccanumber = '$wcccanumber'";
        $result = $mysqli->query($query);

        while ($row = mysqli_fetch_assoc($result)) {
            $draw = $xml->addChild('call');
            $draw->addChild('wcccanumber', $row['wcccanumber']);
            $draw->addChild('currentcall', $row['call']);
            $draw->addChild('county', $row['county']);
            $draw->addChild('id', $row['id']);
            $draw->addChild('location', $row['location']);
            $draw->addChild('callcreated', $row['callcreated']);
            $draw->addChild('station', $row['station']);
            $draw->addChild('units', $row['units']);
            $draw->addChild('calltype', $row['calltype']);
            $draw->addChild('lat', $row['lat']);
            $draw->addChild('lng', $row['lng']);
            $draw->addChild('inputtime', $row['inputtime']);
        }

        $fp = fopen("xml/units/$unitsarray[$i].xml", "wb");
        fwrite($fp, $xml->asXML());
        fclose($fp);
    }

    echo "--- Created units XML document for call: $wcccanumber";
    echo "</br>";
}
$fp = fopen("xml/units/$unitsarray[$i].xml","wb");
By opening the file as "wb", you are truncating the file to write. Try using "ab" (write-only, appends to end of file) or "ab+" (read or write, appends to end of file) instead.
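The question also asks whether the file needs to be loaded first. If each unit's file should remain a single well-formed XML document (appending a second <xml> document after the first would not be valid), one approach is to load the existing file and add the new children to it before saving. A rough sketch, reusing the loop variables from the question:
// Sketch only: load the existing per-unit file if it is there, otherwise start a new document
$path = "xml/units/$unitsarray[$i].xml";
if (file_exists($path)) {
    $xml = simplexml_load_file($path);
} else {
    $xml = new SimpleXMLElement('<xml/>');
}

// ... the addChild() calls from the question go here ...

$xml->asXML($path); // asXML() with a filename writes the whole (old + new) document back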
