PHP memory usage for fopen in append mode - php

I have a custom CakePHP shopping cart application where I'm trying to create a CSV file that contains a row of data for each transaction. I'm running into memory problems when PHP builds the CSV file all at once by compiling the relevant data from the MySQL database. Currently the CSV file contains about 200 rows of data.
Alternatively, I've considered creating the CSV in a piecemeal process by appending a row of data to the file every time a transaction is made, using: fopen($mFile . '.csv', 'a');
My developers are saying that I will still run into memory issues with this approach when the CSV file gets too large, as PHP will read the whole file into memory. Is this the case? When using append mode, will PHP attempt to read the whole file into memory? If so, can you recommend a better approach?
Thanks in advance,
Ben

I ran the following script for a few minutes and generated a 1.4 GB file, well over my PHP memory limit. I also read from the file without issue. If you are running into memory issues, it is probably something else that is causing the problem.
$fp = fopen("big_file.csv", "a");
for ($i = 0; $i < 100000000; $i++)
{
    fputcsv($fp, array("val1", "val2", "val3", "val4", "val5", "val6", "val7", "val8", "val9"));
}
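If you want to double-check this on your own server, here is a variant of the same test that also reports peak memory afterwards (memory_get_peak_usage() is a standard PHP function; the file name is just the example above):
$fp = fopen("big_file.csv", "a");
for ($i = 0; $i < 1000000; $i++) {
    fputcsv($fp, array("val1", "val2", "val3"));
}
fclose($fp);

// if append mode pulled the existing file into memory, peak usage would track the file size
echo "File size:   " . filesize("big_file.csv") . " bytes\n";
echo "Peak memory: " . memory_get_peak_usage(true) . " bytes\n";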

Can't you just export from the database like so:
SELECT list_fields INTO OUTFILE '/tmp/result.text'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM test_table;
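If you go that route, the statement can be issued straight from PHP; a minimal sketch assuming a PDO connection (the credentials are placeholders, the field list and table name come from the example above, and note that INTO OUTFILE writes the file on the MySQL server and requires the FILE privilege):
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass'); // hypothetical credentials
$pdo->exec("SELECT list_fields INTO OUTFILE '/tmp/result.text'
            FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
            LINES TERMINATED BY '\\n'
            FROM test_table");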

Related

PHP - How to append to a JSON file

I generate JSON files which I load into datatables, and these JSON files can contain thousands of rows from my database. To generate them, I need to loop through every row in the database and add each database row as a new row in the JSON file. The problem I'm running into is this:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 262643 bytes)
What I'm doing is I get the JSON file with file_get_contents($json_file), decode it into an array, add a new row to the array, then encode the array back into JSON and write it to the file with file_put_contents($json_file).
Is there a better way to do this? Is there a way I can prevent the memory usage from increasing with each loop iteration? Or is there a way I can clear the memory before it reaches the limit? I need the script to run to completion, but with this memory problem it barely gets to 5% completion before crashing.
I can keep rerunning the script, and each time I rerun it, it adds more rows to the JSON file. So if this memory problem is unavoidable, is there a way to automatically rerun the script numerous times until it's finished? For example, could I detect the memory usage, detect when it's about to reach the limit, then exit the script and restart it? I'm on wpengine so they won't allow security-risky functions like exec().
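On the "detect the memory usage" part of the question: PHP can report its own consumption with memory_get_usage(), so a bail-out check is possible in principle. A rough sketch (the 90% threshold, the memory_limit parsing, and the idea of persisting a resume point are all assumptions; you would still need something external, such as a scheduled WP-Cron event, to re-trigger the script):
// returns true once usage crosses ~90% of the configured memory_limit
function near_memory_limit($fraction = 0.9) {
    $limit = ini_get('memory_limit');   // e.g. "128M"
    $bytes = (int) $limit;
    if (stripos($limit, 'G') !== false)     { $bytes *= 1024 * 1024 * 1024; }
    elseif (stripos($limit, 'M') !== false) { $bytes *= 1024 * 1024; }
    elseif (stripos($limit, 'K') !== false) { $bytes *= 1024; }
    return memory_get_usage(true) >= $fraction * $bytes;
}

foreach ($rows as $row) {                   // $rows = whatever you are looping over
    // ...append $row to the output file...
    if (near_memory_limit()) {
        // persist the last processed ID somewhere, then stop cleanly instead of crashing
        break;
    }
}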
So I switched to using CSV files and it solved the memory problem. The script runs vastly faster too. jQuery DataTables doesn't have built-in support for CSV files, so I wrote a function to convert the CSV file to JSON:
public function csv_to_json($post_type) {
    $data = array(
        "recordsTotal"    => $this->num_rows,
        "recordsFiltered" => $this->num_rows,
        "data"            => array()
    );

    if (($handle = fopen($this->csv_file, 'r')) === false) {
        die('Error opening file');
    }

    $headers  = fgetcsv($handle, 1024, "\t");
    $complete = array();
    while ($row = fgetcsv($handle, 1024, "\t")) {
        $complete[] = array_combine($headers, $row);
    }
    fclose($handle);

    $data['data'] = $complete;
    file_put_contents($this->json_file, json_encode($data, JSON_PRETTY_PRINT));
}
So the result is I create a CSV file and a JSON file much faster than creating a JSON file alone, and there are no issues with memory limits.
Personally, as I said in the comments, I would use CSV files. They have several advantages:
you can read / write one line at a time, so you only manage the memory for one line
you can just append new data to the file
PHP has plenty of built-in support using either fputcsv() or the SPL file objects (see the sketch below)
you can load them directly into the database using "Load Data Infile"
http://dev.mysql.com/doc/refman/5.7/en/load-data.html
The only cons are:
you have to keep the same schema through the whole file
no nested data structures
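A rough illustration of the line-at-a-time handling mentioned in the list above (the file name and row layout are made up):
// append one new record without touching the rest of the file
$fh = fopen('transactions.csv', 'a');
fputcsv($fh, array('2016-01-01', 'order-123', 19.99));
fclose($fh);

// read it back one row at a time; only the current row sits in memory
$file = new SplFileObject('transactions.csv');
$file->setFlags(SplFileObject::READ_CSV);
foreach ($file as $row) {
    // process $row here
}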
The issue with JSON is (as far as I know) that you have to keep the whole thing in memory as a single data set. Therefore you cannot stream it (line by line) like a normal text file. There is really no solution besides limiting the size of the JSON data, which may or may not even be easy to do. You can increase the memory some, but that is just a temporary fix if you expect the data to continue to grow.
We use CSV files in a production environment and I regularly deal with datasets of 800k or 1M rows. I've even seen one that was 10M rows. We have a single MySQL table of 60M rows that is populated from CSV uploads. So it will work and be robust.
If you're set on JSON, then I would just come up with a fixed number of rows that works and design your code to only run that many rows at a time. It's impossible for me to guess how to do that without more details.

php loop 502 Bad Gateway

Good afternoon.
I have a problem with a PHP foreach loop.
I parse an XML file (~20 MB) using SimpleXML and then insert the data into MySQL.
The XML contains over 37,000 items, so I must loop 37,000 times to read the data from the XML.
Every 100 iterations I build a string like this:
insert into my_table values (...)
But I get a 502 error at around the 10,500th iteration.
I have tried sending the string after the loop, but I get the error again. My settings are:
memory_limit = 240
max_execution_time = 500
How can I solve this problem?
Thanks and best regards.
I think the problem is that your script is timing out. You can overcome this by using set_time_limit(0) in your script or by changing max_execution_time in your php.ini:
while (1) {
    set_time_limit(0);
    // do something
}
You also need to increase your memory_limit by editing your php.ini and restarting your webserver.
Read the documentation for set_time_limit().
Can you split the one big XML file into several smaller files?
I'd queue the 37,000 items into several batches and process them one after another or asynchronously. I've done this a few times in PHP. A better language for jobs like this would be Python or RoR.
However, try creating batches of the items, as in the sketch below.
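The batching could look roughly like this with SimpleXML and multi-row INSERTs (the XML structure, column name, batch size, and mysqli credentials are all assumptions):
$xml    = simplexml_load_file('items.xml');                   // hypothetical file name
$mysqli = new mysqli('localhost', 'user', 'pass', 'db');      // hypothetical credentials

$values = array();
foreach ($xml->item as $item) {
    // collect one escaped row per item
    $values[] = "('" . $mysqli->real_escape_string((string) $item->name) . "')";

    // flush every 100 rows as a single multi-row INSERT
    if (count($values) === 100) {
        $mysqli->query('INSERT INTO my_table (name) VALUES ' . implode(',', $values));
        $values = array();
    }
}
// flush whatever is left over
if ($values) {
    $mysqli->query('INSERT INTO my_table (name) VALUES ' . implode(',', $values));
}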
 
Regarding storing the data, I use CSV batch inserts with MySQL.
I use this function to convert the strings into CSV format:
<?php
function convertStrToCsv($data, $delimiter = ';', $enclosure = '"')
{
    ob_start();
    $fp = fopen('php://output', 'w');
    fputcsv($fp, $data, $delimiter, $enclosure);
    fclose($fp);
    return ob_get_clean();
}
… then I save the function's output to a file and finally use this query to load the CSV data into the database:
LOAD DATA LOW_PRIORITY LOCAL INFILE '$file' IGNORE INTO TABLE `$table` CHARACTER SET utf8 FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '\"' ESCAPED BY '\"' LINES TERMINATED BY '\\n';
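The glue between those two steps might look like this (a sketch; $rows, the file path, the table name, and the mysqli connection are assumptions, and LOCAL INFILE must be enabled on both client and server):
$mysqli = new mysqli('localhost', 'user', 'pass', 'db');   // hypothetical credentials
$file   = '/tmp/batch.csv';
$table  = 'my_table';

// write one CSV line per row using the helper above (fputcsv adds the newline)
$fh = fopen($file, 'w');
foreach ($rows as $row) {                                  // $rows = array of row arrays
    fwrite($fh, convertStrToCsv($row));
}
fclose($fh);

$mysqli->query("LOAD DATA LOW_PRIORITY LOCAL INFILE '$file' IGNORE INTO TABLE `$table` CHARACTER SET utf8 FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '\"' ESCAPED BY '\"' LINES TERMINATED BY '\\n'");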
 
Read more about this:
LOAD DATA INFILE
PHP's str_getcsv()
Happy coding!

Process very big csv file without timeout and memory error

At the moment I'm writing an import script for a very big CSV file. The problem is that most of the time it stops after a while because of a timeout, or it throws a memory error.
My idea was to parse the CSV file in steps of 100 lines and, after every 100 lines, re-invoke the script automatically. I tried to achieve this with header('Location: ...') and passing the current line as a GET parameter, but it didn't work out the way I wanted.
Is there a better way to do this, or does someone have an idea how to get rid of the memory error and the timeout?
I've used fgetcsv to read a 120 MB CSV in a stream-wise manner (is that correct English?). That reads it in line by line, and I've then inserted every line into a database. That way only one line is held in memory on each iteration. The script still needed 20 min to run. Maybe I'll try Python next time… Don't try to load a huge CSV file into an array, that really would consume a lot of memory.
// WDI_GDF_Data.csv (120.4MB) are the World Bank collection of development indicators:
// http://data.worldbank.org/data-catalog/world-development-indicators
if (($handle = fopen('WDI_GDF_Data.csv', 'r')) !== false)
{
    // get the first row, which contains the column-titles (if necessary)
    $header = fgetcsv($handle);

    // loop through the file line-by-line
    while (($data = fgetcsv($handle)) !== false)
    {
        // resort/rewrite data and insert into DB here
        // try to use conditions sparingly here, as those will cause slow-performance

        // I don't know if this is really necessary, but it couldn't harm;
        // see also: http://php.net/manual/en/features.gc.php
        unset($data);
    }
    fclose($handle);
}
I find uploading the file and inserting it using MySQL's LOAD DATA LOCAL query a fast solution, e.g.:
$sql = "LOAD DATA LOCAL INFILE '/path/to/file.csv'
REPLACE INTO TABLE table_name FIELDS TERMINATED BY ','
ENCLOSED BY '\"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES";
$result = $mysqli->query($sql);
If you don't care about how long it takes and how much memory it needs, you can simply increase the values for this script. Just add the following lines to the top of your script:
ini_set('memory_limit', '512M');
ini_set('max_execution_time', '180');
With the function memory_get_usage() you can find out how much memory your script needs to find a good value for the memory_limit.
You might also want to have a look at fgets() which allows you to read a file line by line. I am not sure if that takes less memory, but I really think this will work. But even in this case you have to increase the max_execution_time to a higher value.
There seems to be an enormous difference between fgetcsv() and fgets() when it comes to memory consumption.
A simple CSV with only one column passed my 512M memory limit for just 50,000 records with fgetcsv(), and it took 8 minutes to report that.
With fgets() it took only 3 minutes to successfully process 649,175 records, and my local server wasn't even gasping for additional air.
So my advice is to use fgets() if the number of columns in your CSV is limited. In my case fgets() returned the string of column 1 directly.
For more than one column, you might use explode() into a disposable array which you unset() after each record operation, as in the sketch below.
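A minimal sketch of that fgets()/explode() approach (the file name and delimiter are made up, and this only works when the values themselves never contain the delimiter):
$handle = fopen('big.csv', 'r');
while (($line = fgets($handle)) !== false) {
    $fields = explode(',', rtrim($line, "\r\n"));   // disposable array for this record
    // ...process $fields and insert into the DB here...
    unset($fields);                                  // free it before the next record
}
fclose($handle);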
Thumbs up for answer 3, #ndkauboy.
Oh, just run this script from the CLI, not via the silly web interface. Then no execution time limit will affect it.
And do not keep the parsed results around forever, but write them out immediately; that way you won't be affected by the memory limit either.

Writing a very large CSV file From DB output in PHP

I have a DB of sensor data that is being collected every second. The client would like to be able to download 12hour chunks in CSV format - This is all done.
The output is sadly not straight data and needs to be processed before the CSV can be created (parts are stored as JSON in the DB), so I can't just dump the table.
So, to reduce load, I figured that the first time the file is downloaded I would cache it to disk, and then any further requests just download that file.
If I don't try to write it (using file_put_contents with FILE_APPEND), and just echo every line, it is fine; but when writing it, even if I give the script 512M, it runs out of memory.
So this works:
while ($stmt->fetch()) {
    // processing code
    $content = // CSV formatting
    echo $content;
}
This does not:
while ($stmt->fetch()) {
    // processing code
    $content = // CSV formatting
    file_put_contents($pathToFile, $content, FILE_APPEND);
}
It seems like even though I am calling file_put_contents on every line, it is storing it all in memory.
Any suggestions?
The problem is that file_put_contents is trying to dump the entire thing at once. Instead you should loop through in your formatting and use fopen, fwrite, fclose, opening the file once outside the loop:
$file = fopen($pathToFile, 'a');
while ($stmt->fetch()) {
    // processing code
    $content = // CSV formatting
    fwrite($file, $content);
}
fclose($file);
This will limit the amount of data being tossed around in memory at any given time.
I agree completely with writing one line at a time; you will never have memory issues this way since there is never more than one line loaded into memory at a time. I have an application that does the same. A problem I have found with this method, however, is that the file takes forever to finish writing. So this post is to back up what has already been said, but also to ask all of you for an opinion on how to speed this up.
For example, my system cleans a data file against a suppression file, so I read in one line at a time and look for a match in the suppression file; if no match is found, I write the line into the new cleaned file. A 50k line file is taking about 4 hours to finish, however, so I am hoping to find a better way. I have tried this several ways, and at this point I load the entire suppression file into memory to avoid having my main reading loop run another loop through each line of the suppression file, but even that is still taking hours.
So, line by line is by far the best way to manage your system's memory, but I'd like to get the processing time for a 50k line file (lines are email addresses and first and last names) down to less than 30 minutes if possible.
FYI: the suppression file is 16,000 KB in size, and the total memory used by the script, as reported by memory_get_usage(), is about 35 MB.
Thanks!
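One common way to speed up that in-memory suppression check is to index the suppression entries as array keys, so each lookup is a constant-time isset() instead of a scan. A sketch, assuming one email address per line in the suppression file and the email in the first column of the data file:
// build a hash set of suppressed addresses once
$suppressed = array();
$sh = fopen('suppression.txt', 'r');
while (($line = fgets($sh)) !== false) {
    $suppressed[strtolower(trim($line))] = true;
}
fclose($sh);

// the main loop is then a single cheap lookup per record
$in  = fopen('data.csv', 'r');
$out = fopen('cleaned.csv', 'w');
while (($row = fgetcsv($in)) !== false) {
    if (!isset($suppressed[strtolower(trim($row[0]))])) {
        fputcsv($out, $row);
    }
}
fclose($in);
fclose($out);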

How to read from where it stopped in a loop?

I have a text file with 500k lines of data.
I am running a loop to store some info, something like:
$file = fopen("top-1-500000.txt", "r") or exit("Unable to open file!");
while (!feof($file)) {
    // some function
    mysql_query("INSERT INTO table (name) VALUES ('$value')");
}
fclose($file);
The issue is that when the loop stops in the middle, I need to delete the data already read from the text file by checking the MySQL database manually, to prevent the loop from reading it again from the first line of the text file. This is a huge effort when there are multiple files.
An alternative method of reading a large file is to use MySQL's LOAD DATA INFILE functionality.
Example:
LOAD DATA INFILE 'top-1-500000.txt' INTO TABLE tbl_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;
I'm assuming by "stopped in the middle" you mean that the script is timing out. You should use set_time_limit to prevent your script from timing out (I'm assuming your server config allows you to do this).
If you use file() to read the file, you can just use a counter inside a for loop counting the number of iterations; then, when you need to go back to where the loop stopped, use that counter.
$file_array = file("top-1-500000.txt");
for ($i = 0; $i < count($file_array); $i++)
{
    // ...code here...
    mysql_query("INSERT INTO table (name) VALUES ('$value')");
}
Also, this is assuming this isn't an absolutely massive file.
You are reading the file line by line, so
$line = fgets($handle);
will read up to the end of the current line (or up to a length you specify as the second argument, e.g. 1024 bytes).
There's a function called fseek() with which you can move the internal file pointer and read from that place on.
A possible solution is to keep in the database how far you have read, for example the byte offset reported by ftell() after each line; when (if) the loop dies in the middle, you can pass that offset to fseek()
and continue reading from there.
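A minimal sketch of that resume-by-offset idea, storing the exact byte position from ftell() (the progress file is just an example; the offset could equally be kept in MySQL):
$progressFile = 'import.offset';                       // hypothetical place to persist progress
$offset = file_exists($progressFile) ? (int) file_get_contents($progressFile) : 0;

$file = fopen("top-1-500000.txt", "r") or exit("Unable to open file!");
fseek($file, $offset);                                 // jump past everything already imported

while (($line = fgets($file)) !== false) {
    $value = trim($line);
    // insert $value into the database here

    // remember how far we got, so a crash or timeout can resume from this point
    file_put_contents($progressFile, ftell($file));
}
fclose($file);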
