fputcsv - running out of memory during creation of larger files - php

I am sometimes creating large CSV files from database information for users to download - 100k or more rows. It appears I am running into a memory issue during CSV creation on some of the larger files. Below is an example of how I am currently creating the CSV.
Is there any way around this? The memory limit was originally 32 MB; I raised it to 64 MB and am still having the issue.
//columns array
$log_columns = array(
    '1',
    '2',
    '3',
    '4',
    '5',
    '6',
    '7',
    '8',
    '9'
);

//results from the db
$results = $log_stmt->fetchAll(PDO::FETCH_ASSOC);

$log_file = 'test.csv';
$log_path = $_SERVER['DOCUMENT_ROOT'].'/../user-data/'.$_SESSION['user']['account_id'].'/downloads/';

// if location does not exist create it
if(!file_exists($log_path))
{
    mkdir($log_path, 0755, true);
}

// open file handler
$fp = fopen($log_path.$log_file, 'wb');

// write the csv column titles / labels
fputcsv($fp, $log_columns);

//are there any logs?
if($results)
{
    //write the rows
    foreach($results as $row)
    {
        //rows array
        $log_rows = array(
            $row['1'],
            $row['2'],
            $row['3'],
            $row['4'],
            $row['5'],
            $row['6'],
            $row['7'],
            $row['8'],
            $row['9']
        );
        //write the rows
        $newcsv = fputcsv($fp, $log_rows);
    }//end foreach
}
// there were no results so just return an empty log
else
{
    $newcsv = fputcsv($fp, array('No results found.'));
}

//close handler
fclose($fp);

// if csv was created return true
if($newcsv)
{
    return true;
}
UPDATE :
Using a while loop and fetch instead of foreach and fetchAll still produces a memory error.
while($result = $log_stmt->fetch(PDO::FETCH_ASSOC))
How is that possible if I am only loading one row at a time?
UPDATE 2 :
I have further tracked this down to the while loop using memory_get_usage();
echo (floor( memory_get_usage() / 1024) ).' kb<br />';
Before the while loop starts the reading is 4658 kb, and then during the loop it increases by 1 kb every 2-3 iterations until it reaches the 32748 kb maximum memory allowed.
What can I do to solve this issue?
UPDATE 3 :
Played around with this more today... the way it behaves just does not make much sense to me - I can only assume it is strange behaviour in PHP's garbage collector.
Scenario 1: my query gets all 80k rows and a while loop outputs them. Memory used is around 4500 kb after the query is executed, then increments by 1 kb for every two to three rows output in the loop. Memory is not released whatsoever and at some point it crashes from running out of memory.
while($results = $log_stmt->fetch(PDO::FETCH_ASSOC))
{
    echo $results['timestamp'].'<br/>';
}
Scenario 2: my query is now run in a loop that gets 1000 rows at a time, with an inner loop outputting each row. Memory maxes out at 400k as it loops, and it completes the entire output with no memory issues.
For this example I just used a counter that runs 80 times, as I know there are more than 80k rows to retrieve. In reality I would obviously have to do this differently.
$t_counter = 0;
while($t_counter < 80)
{
    //set bindings
    $binding = array(
        'cw_start' => $t_counter * 1000,
        //some other bindings...
    );
    $log_stmt->execute($binding);

    echo $t_counter.' after statement '.floor( memory_get_usage() / 1024 ).' kb<br />';

    while($results = $log_stmt->fetch(PDO::FETCH_ASSOC))
    {
        echo $results['capture_timestamp'].'<br/>';
    }

    echo $t_counter.' after while '.floor( memory_get_usage() / 1024 ).' kb<br />';

    $t_counter++;
}
So I guess my question is: why does the first scenario keep incrementing memory usage with nothing being released? In that while loop there are no new variables and everything is 'reused'. The exact same situation happens in the second scenario, just inside another loop.

fetchAll() fetches all records at once. Why not just run the query and do a while loop with fetch()? Then it does not need to load the whole result set into memory.
http://php.net/manual/en/pdostatement.fetch.php
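A minimal sketch of that streaming approach, reusing $log_columns, $log_path and $log_file from the question; the PDO connection $pdo, the query text and $bindings are assumptions, and the unbuffered-query attribute is MySQL-specific:
// Assumption: $pdo is your PDO connection. With the MySQL driver, a buffered
// statement still keeps the whole result set in client memory even when you
// fetch() row by row; unbuffered mode avoids that.
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$log_stmt = $pdo->prepare('SELECT ... FROM ...'); // your real query here
$log_stmt->execute($bindings);                    // your real bindings here

$fp = fopen($log_path.$log_file, 'wb');
fputcsv($fp, $log_columns);

// Write each row straight to the file; nothing accumulates in PHP.
while ($row = $log_stmt->fetch(PDO::FETCH_ASSOC)) {
    fputcsv($fp, array_values($row));
}

fclose($fp);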

Then I think you should try reading the results in chunks. Read each chunk and append it to one CSV file; that way you free memory during the process.
You could do a COUNT(*) to find the total number of rows before the chunked collection starts.
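A rough sketch of that chunked idea (the table name, the id column and the COUNT(*) query are placeholders; adjust them to your schema):
// Placeholder schema: a `logs` table ordered by an `id` column.
$chunkSize = 1000;
$total = (int) $pdo->query('SELECT COUNT(*) FROM logs')->fetchColumn();

$fp = fopen($log_path.$log_file, 'wb');
fputcsv($fp, $log_columns);

$stmt = $pdo->prepare('SELECT * FROM logs ORDER BY id LIMIT :limit OFFSET :offset');

for ($offset = 0; $offset < $total; $offset += $chunkSize) {
    $stmt->bindValue(':limit',  $chunkSize, PDO::PARAM_INT);
    $stmt->bindValue(':offset', $offset,    PDO::PARAM_INT);
    $stmt->execute();

    // Append this chunk to the CSV, then move on to the next one.
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        fputcsv($fp, array_values($row));
    }
}

fclose($fp);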

I have been using PHP's CSV functions myself; I even use CSV as a database system (NoSQL).
Try this.
CSV code for reading:
<?php
$CSVfp = fopen("filename.csv", "r");
if ($CSVfp !== FALSE) {
    while (!feof($CSVfp)) {
        $line = fgetcsv($CSVfp);
        // do something with $line
    }
    fclose($CSVfp);
}
?>
CSV code for writing:
<?php
$list = array(
    "edmond,dog,cat,redonton",
    "Glenn,Quagmire,Oslo,Norway",
);

$file = fopen("filename.csv", "w");

foreach ($list as $line) {
    fputcsv($file, explode(',', $line));
}

fclose($file);
?>

Related

PHP Array Processing Ability Decreases

I need help processing files holding about 46k lines or more than 30MB of data.
My original idea was to open the file and turn each line into an array element. This worked the first time as the array held about 32k values total.
The second time the process was repeated, the array only held 1011 elements, and finally, the third time it could only hold 100.
I'm confused and don't know much about the backend array processes. Can someone explain what is happening and fix the code?
function file_to_array($cvsFile){
    $handle = fopen($cvsFile, "r");
    $path = fread($handle, filesize($cvsFile));
    fclose($handle);

    //Turn the file into an array and separate lines to elements
    $csv = explode(",", $path);

    //Remove common double spaces
    foreach ($csv as $key => $line){
        $csv[$key] = str_replace(' ', '', str_getcsv($line));
    }

    array_filter($csv);

    //get the row count for the file and array
    $rows = count($csv);
    $filerows = count(file($cvsFile)); //this no longer works

    echo "File has $filerows and array has $rows";
    return $csv;
}
The approach here can be split in two:
1. Optimized file reading and processing
2. A proper storage solution
Optimized file processing can be done like so:
$csv = [];
$rowsSucceed = 0;
$rowsFailed = 0;

$handle = fopen($cvsFile, "r");
if ($handle) {
    while (($line = fgets($handle)) !== false) { // Reading file by line
        // Process the CSV line and check if it was parsed correctly,
        // counting as you go
        $parsedLine = str_getcsv($line);
        if (!empty($parsedLine)) {
            $csv[] = $parsedLine; // or whatever processing you need
            $rowsSucceed++;
        } else {
            $rowsFailed++;
        }
    }
    fclose($handle);
} else {
    // Error handling
}
$totalLines = $rowsSucceed + $rowsFailed;
Also, you can avoid array_filter() simply by not adding a processed line if it is empty.
That helps optimize memory usage during script execution.
Proper storage
Proper storage is needed here for performing operations on this amount of data. Repeatedly reading files is inefficient and expensive. Using a simple file-based database like SQLite can help you a lot and increase the overall performance of your script.
For this purpose you should probably parse your CSV directly into the database and then perform the count operation on the parsed data, avoiding excessive file line counts and so on.
It also gives you the further advantage of working with the data without keeping it all in memory.
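A rough sketch of that idea with PDO's SQLite driver (the table name and the three-column layout are made up for illustration; $cvsFile is the file from the question):
// Parse the CSV straight into a file-based SQLite database, then query it.
$db = new PDO('sqlite:' . __DIR__ . '/import.sqlite');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE IF NOT EXISTS rows (col1 TEXT, col2 TEXT, col3 TEXT)');

$insert = $db->prepare('INSERT INTO rows (col1, col2, col3) VALUES (?, ?, ?)');

$handle = fopen($cvsFile, 'r');
$db->beginTransaction(); // one transaction makes bulk inserts much faster
while (($line = fgetcsv($handle)) !== false) {
    if ($line[0] === null) {
        continue; // skip blank lines instead of filtering afterwards
    }
    $insert->execute(array_slice(array_pad($line, 3, ''), 0, 3));
}
$db->commit();
fclose($handle);

// Counting is now a cheap query instead of re-reading the file.
$rows = (int) $db->query('SELECT COUNT(*) FROM rows')->fetchColumn();
echo "Imported $rows rows";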
Your question says you want to "turn each line into an array element" but that is definitely not what you are doing. The code is quite clear; it reads the entire file into $path and then uses explode() to make one massive flat array of every element on every line. Then later you're trying to run str_getcsv() on each item, which of course isn't going to work; you've already exploded all the commas away.
Looping over the file using fgetcsv() makes more sense:
function file_to_array($cvsFile) {
    $csv = [];
    $filerows = 0;
    $handle = fopen($cvsFile, "r");
    while ($line = fgetcsv($handle)) {
        $filerows++;
        // skip empty lines
        if ($line[0] === null) {
            continue;
        }
        //Remove common double spaces
        $csv[] = str_replace(' ', '', $line);
    }
    //get the row count for the file and array
    $rows = count($csv);
    echo "File has $filerows and array has $rows";
    fclose($handle);
    return $csv;
}

Reading multiple columns from large CSV files in PHP

I need to read two columns from a large CSV file. The CSV has multiple columns and can sometimes have the following properties:
~25,000 lines
Contain spaces and blank rows
Be uneven (some columns longer than others)
In the example CSV file above, I would only be interested in the codes in the "Buy" and "Sell" columns (columns A and D).
I have written the following code (warning: it's not very elegant) to iterate over all rows and read only the columns I require. I create strings as inputs for one large MySQL query (as opposed to running many small queries).
<?php
//Increase the allowed execution time
set_time_limit(0);
ini_set('memory_limit', '256M');
ini_set('max_execution_time', 0);

//Set to detect the ending of CSV files
ini_set('auto_detect_line_endings', true);

$file = "test.csv";
$buy = $sold = ""; //Initialize empty strings

if (($handle = @fopen($file, "r")) !== FALSE) {
    while (($pieces = fgetcsv($handle, 100, ",")) !== FALSE) {
        if ( ! empty($pieces[0]) ) {
            $buy .= $pieces[0] ." ";
        }
        if ( ! empty($pieces[3]) ) {
            $sold .= $pieces[3] ." ";
        }
    }
    echo "Buy ". $buy ."<br>"; //Do something with strings...
    echo "Sold ". $sold ."<br>";
    //Close the file
    fclose($handle);
}
?>
My question is: is this the best way to perform such a task? The code works for smaller test files, but are there shortcomings I've overlooked in iterating over the CSV file like this?
First, reading any large file is memory consuming if you store it in variables. You may want to look into reading large files (more than 4 GB) on Unix.
Secondly, you can output (or otherwise process) $buy and $sold inside the while loop, which may be more memory efficient because those two variables are then not accumulated in memory.
Lastly, you can use PHP's file seek method, fseek() (see the documentation).
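A minimal sketch of the second point: instead of growing $buy and $sold, write each value out as you go. Here they go to two temporary files, which is just an assumption; it could equally be a batched INSERT.
$buyOut  = fopen('buy_codes.txt', 'w');   // assumed temp files
$soldOut = fopen('sold_codes.txt', 'w');

if (($handle = fopen($file, 'r')) !== FALSE) {
    while (($pieces = fgetcsv($handle, 100, ',')) !== FALSE) {
        // Write each value immediately; nothing accumulates in PHP memory.
        if ( ! empty($pieces[0]) ) {
            fwrite($buyOut, $pieces[0] . "\n");
        }
        if ( ! empty($pieces[3]) ) {
            fwrite($soldOut, $pieces[3] . "\n");
        }
    }
    fclose($handle);
}

fclose($buyOut);
fclose($soldOut);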

Handle Large File with PHP

I have a file with a size of around 10 GB or more. The file contains only numbers from 1 to 10, one per line, and nothing else. The task is to read the numbers from the file, sort them in ascending or descending order, and create a new file with the sorted numbers.
Can anyone help me with this?
I'm assuming this is some kind of homework and the goal is to sort more data than you can hold in RAM?
Since you only have the numbers 1-10, this is not that complicated a task. Just open your input file and count how many occurrences of each specific number you have. After that you can construct a simple loop and write the values into another file. The following example is pretty self-explanatory.
$inFile = '/path/to/input/file';
$outFile = '/path/to/output/file';

$input = fopen($inFile, 'r');
if ($input === false) {
    throw new Exception('Unable to open: ' . $inFile);
}

//$map will be an array of size 10, filled with 0-s
$map = array_fill(1, 10, 0);

//Read the file line by line and count how many of each specific number you have
while (($line = fgets($input)) !== false) {
    $int = (int) trim($line);
    if ($int >= 1 && $int <= 10) {
        $map[$int]++;
    }
}
fclose($input);

$output = fopen($outFile, 'w');
if ($output === false) {
    throw new Exception('Unable to open: ' . $outFile);
}

/*
 * Reverse the array (preserving keys) if you need to change direction
 * between ascending and descending order
 */
//$map = array_reverse($map, true);

//Write values into your output file
foreach ($map as $number => $count) {
    $string = ((string) $number) . PHP_EOL;
    for ($i = 0; $i < $count; $i++) {
        fwrite($output, $string);
    }
}
fclose($output);
Taking into account the fact that you are dealing with huge files, you should also check the script execution time limit of your PHP environment. The example above will take VERY long for 10 GB+ files, but since I didn't see any limitations concerning execution time and performance in your question, I'm assuming that is OK.
I had a similar issue before. Trying to manipulate such a large file ended up being a huge drain on resources and the script couldn't cope. The easiest solution I ended up with was to import it into a MySQL database using the fast bulk-loading statement LOAD DATA INFILE:
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
Once it's in you should be able to manipulate the data.
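A hedged sketch of that route from PHP; the table and column names are placeholders, LOAD DATA is MySQL-specific, and LOCAL INFILE has to be allowed on both client and server:
// Placeholder connection and table; adjust credentials and schema.
$pdo = new PDO(
    'mysql:host=localhost;dbname=test;charset=utf8mb4',
    'user',
    'pass',
    array(PDO::MYSQL_ATTR_LOCAL_INFILE => true) // needed for LOCAL INFILE
);

// Nowdoc so the '\n' reaches MySQL untouched.
$sql = <<<'SQL'
LOAD DATA LOCAL INFILE '/path/to/numbers.txt'
INTO TABLE numbers
LINES TERMINATED BY '\n'
(value)
SQL;
$pdo->exec($sql);

// The database can now do the sorting for you.
$stmt = $pdo->query('SELECT value FROM numbers ORDER BY value ASC');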
Alternatively, you could just read the file line by line while outputting the result into another file line by line with the sorted numbers. Not too sure how well this would work though.
Have you had any previous attempts at it or are you just after a possible method of doing it?
If that's all, you don't need PHP (provided you have a Linux machine at hand):
sort -n file > file_sorted-asc
sort -nr file > file_sorted-desc
Edit: OK, here's your solution in PHP (if you have a Linux machine at hand):
<?php
// Sort ascending
`sort -n file > file_sorted-asc`;
// Sort descending
`sort -nr file > file_sorted-desc`;
?>
:)

php fgetcsv multiple lines not only one or all

I want to read very big CSV files and insert them into a database. That already works:
if(($handleF = fopen($path."\\".$file, 'r')) !== false){
    $i = 1;
    // loop through the file line-by-line
    while(($dataRow = fgetcsv($handleF, 0, ";")) !== false) {
        // Only start at the startRow, otherwise skip the row.
        if($i >= $startRow){
            // Check if to use headers
            if($lookAtHeaders == 1 && $i == $startRow){
                $this->createUberschriften( array_map(array($this, "convert"), $dataRow ) );
            } else {
                $dataRow = array_map(array($this, "convert"), $dataRow );
                $data = $this->changeMapping($dataRow, $startCol);
                $this->executeInsert($data, $tableFields);
            }
            unset($dataRow);
        }
        $i++;
    }
    fclose($handleF);
}
My problem with this solution is that it's very slow, but the files are too big to load directly into memory... So I want to ask: is there a possibility to read, for example, 10 lines at a time into the $dataRow array instead of only one or all of them?
I want to get a better balance between memory and performance.
Do you understand what I mean? Thanks for the help.
Greetz
V
EDIT:
OK, I still had to find a solution that works with the MSSQL database. My solution was to stack the data and then make a multi-row MSSQL insert:
while(($dataRow = fgetcsv($handleF, 0, ";")) !== false) {
    // Only start at the startRow, otherwise skip the row.
    if($i >= $startRow){
        // Check if to use headers
        if($lookAtHeaders == 1 && $i == $startRow){
            $this->createUberschriften( array_map(array($this, "convert"), $dataRow ) );
        } else {
            $dataRow = array_map(array($this, "convert"), $dataRow );
            $data = $this->changeMapping($dataRow, $startCol);
            $this->setCurrentRow($i);
            if(count($dataStack) > 210){
                array_push($dataStack, $data);
                #echo '<pre>', print_r($dataStack), '</pre>';
                $this->executeInsert($dataStack, $tableFields, true);
                // reset the stack
                unset($dataStack);
                $dataStack = array();
            } else {
                array_push($dataStack, $data);
            }
            unset($data);
        }
        $i++;
        unset($dataRow);
    }
}
Finally, inside the method "executeInsert" I loop over the stack and build a multi-row insert, to create a query like this:
INSERT INTO [myTable] (field1, field2) VALUES ('data1', 'data2'),('data2', 'data3')...
That works much better. I still have to find the best balance, but for that I only need to change the value '210' in the code above. I hope that helps everybody with a similar problem.
Attention: don't forget to execute the method "executeInsert" one more time after reading the complete file, because there may still be data left in the stack, and inside the loop the method is only executed once the stack reaches a size of 210.
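In code, that final flush could look roughly like this, using the same variables as above:
// After the while loop over fgetcsv() has finished:
if (!empty($dataStack)) {
    // Insert whatever is left, even if the stack never reached 210 rows.
    $this->executeInsert($dataStack, $tableFields, true);
    $dataStack = array();
}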
Greetz
V
I think your bottleneck is not reading the file, which is just a text file. Your bottleneck is the INSERT into the SQL table.
To check, just comment out the line that actually does the insert and you will see the difference.
I had this same issue in the past, where I did exactly what you are doing: reading a 5+ million line CSV and inserting it into a MySQL table. The execution time was 60 hours, which is
unrealistic.
My solution was to switch to another database technology. I selected MongoDB and the execution time
was reduced to 5 minutes. MongoDB performs really fast in these scenarios and also has a tool called mongoimport that allows you to import a CSV file directly from the command line.
Give it a try if the database technology is not a constraint on your side.
Another solution would be splitting the huge CSV file into chunks and then running the same PHP script multiple times in parallel, with each instance taking care of the chunks with a specific prefix or suffix on the filename.
I don't know which specific OS you are using, but in Unix/Linux there is a command line tool
called split that will do that for you and will also add any prefix or suffix you want to the filenames of the chunks.
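If the command-line tool is not an option, a rough PHP equivalent of that splitting step might look like this (chunk size and file naming are arbitrary):
// Split big.csv into files of 100,000 lines each: chunk_000.csv, chunk_001.csv, ...
$linesPerChunk = 100000;
$in = fopen('big.csv', 'r');

$chunkIndex = 0;
$lineCount  = 0;
$out = fopen(sprintf('chunk_%03d.csv', $chunkIndex), 'w');

while (($line = fgets($in)) !== false) {
    if ($lineCount === $linesPerChunk) {
        fclose($out);
        $chunkIndex++;
        $lineCount = 0;
        $out = fopen(sprintf('chunk_%03d.csv', $chunkIndex), 'w');
    }
    fwrite($out, $line);
    $lineCount++;
}

fclose($out);
fclose($in);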

Output 1,000s of records to text file

So I was hoping to be able to get by with a simple solution to read records from a database and save them to a text file that the user downloads. I have been doing this on the fly and for under 20,000 records, this works great. Over 20,000 records and I'm loading too much data into memory and PHP hits a fatal error.
My thought was to just grab everything in chunks. So I grab XX number of rows and echo them to the file and then loop to get the next XX rows until I'm done.
I am just echoing the results right now though, not building the file and then sending it for download, which I'm guessing I'll have to do.
Succinctly, the issue at this point is that with up to 20,000 rows the file builds and downloads perfectly. With more than that, I get an empty file.
The code:
header('Content-type: application/txt');
header('Content-Disposition: attachment; filename="export.'.$file_type.'"');
header('Expires: 0');
header('Cache-Control: must-revalidate');

// I do other things to check for records before, hence the do-while loop
$this->items = $model->getItems();

do {
    foreach ($this->items as $k => $item) {
        $i = 0;
        $tables = count($this->data['column']);
        foreach ($this->data['column'] as $table => $fields) {
            $columns = count($fields);
            $j = 0;
            foreach ($fields as $field => $junk) {
                if ($quote_output) {
                    echo '"'.ucwords(str_replace(array('"'), array('\"'), $item->$field)).'"';
                } else {
                    echo ''.$item->$field.'';
                }
                $j++;
                if ($j < $columns) {
                    echo $delim;
                }
            }
            $i++;
            if ($i < $tables) {
                echo $delim;
            }
        }
        echo "\n";
    }
} while ($this->items = $this->_model->getItems());
Very large tables won't work that way.
You have to output the data as you read it from the database. If you need it sorted, use the database's ORDER BY for that purpose.
So more or less
// assuming you use a var such as $query to handle the DB
while (!$query->eof())
{
    $fields = $query->read_next();
    echo $fields; // with your formatting, maybe call a function...
}
The empty result is normal. If the memory is exhausted before any echo happens then nothing was sent to the browser.
Note also that PHP has a time limit (a watchdog) that you may need to tweak. The default is defined in your php.ini. You may set it to zero if you expect the tables to grow very much.
You could change your str_replace() to addslashes(). This will probably free some memory.
Then I suggest you save to a file, using PHP's file functions to do so: fopen() or file_put_contents().
I hope that helps!
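A minimal sketch of that suggestion, reusing the loop from the question; the export path and the formatLine() helper are hypothetical stand-ins for your own formatting code:
// Build the file on disk first...
$exportPath = '/tmp/export.csv'; // hypothetical path
$fp = fopen($exportPath, 'w');

do {
    foreach ($this->items as $item) {
        fwrite($fp, $this->formatLine($item) . "\n"); // hypothetical formatting helper
    }
} while ($this->items = $this->_model->getItems());

fclose($fp);

// ...then hand it to the browser, so no large string ever sits in PHP memory.
header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="export.csv"');
header('Content-Length: ' . filesize($exportPath));
readfile($exportPath);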
Actually, this might be a simple fix. If PHP is running out of memory, it's probably because the output buffer is filling up before the file is sent. If so, simply flush() at regular intervals.
This will flush after each line:
do {
    foreach (...) {
        // assemble your output line here
    }
    echo "\n";
    flush();
} while ($this->items = $this->_model->getItems());
Flushing after each line might prove too slow, in which case add a counter and flush after every hundred, or whatever works best.
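For example, a sketch of the counter variant applied to the loop from the question (the per-column echoes are elided):
$lines = 0;
do {
    foreach ($this->items as $k => $item) {
        // ...assemble and echo the columns for this item, as in the question...
        echo "\n";
        if (++$lines % 100 === 0) {
            flush(); // only flush every hundred lines
        }
    }
} while ($this->items = $this->_model->getItems());
flush(); // push out whatever remains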
