PHP memory exhaused while using array_combine in foreach loop - php

I'm having a trouble when tried to use array_combine in a foreach loop. It will end up with an error:
PHP Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 85 bytes) in
Here is my code:
$data = array();
$csvData = $this->getData($file);
if ($columnNames) {
$columns = array_shift($csvData);
foreach ($csvData as $keyIndex => $rowData) {
$data[$keyIndex] = array_combine($columns, array_values($rowData));
}
}
return $data;
The source file CSV which I've used has approx ~1,000,000 rows. This row
$csvData = $this->getData($file)
I was using a while loop to read CSV and assign it into an array, it's working without any problem. The trouble come from array_combine and foreach loop.
Do you have any idea to resolve this or simply have a better solution?
UPDATED
Here is the code to read the CSV file (using while loop)
$data = array();
if (!file_exists($file)) {
throw new Exception('File "' . $file . '" do not exists');
}
$fh = fopen($file, 'r');
while ($rowData = fgetcsv($fh, $this->_lineLength, $this->_delimiter, $this->_enclosure)) {
$data[] = $rowData;
}
fclose($fh);
return $data;
UPDATED 2
The code above is working without any problem if you are playing around with a CSV file <=20,000~30,000 rows. From 50,000 rows and up, the memory will be exhausted.

You're in fact keeping (or trying to keep) two distinct copies of the whole dataset in your memory. First you load the whole CSV date into memory using getData() and the you copy the data into the $data array by looping over the data in memory and creating a new array.
You should use stream based reading when loading the CSV data to keep just one data set in memory. If you're on PHP 5.5+ (which you definitely should by the way) this is a simple as changing your getData method to look like that:
protected function getData($file) {
if (!file_exists($file)) {
throw new Exception('File "' . $file . '" do not exists');
}
$fh = fopen($file, 'r');
while ($rowData = fgetcsv($fh, $this->_lineLength, $this->_delimiter, $this->_enclosure)) {
yield $rowData;
}
fclose($fh);
}
This makes use of a so-called generator which is a PHP >= 5.5 feature. The rest of your code should continue to work as the inner workings of getData should be transparent to the calling code (only half of the truth).
UPDATE to explain how extracting the column headers will work now.
$data = array();
$csvData = $this->getData($file);
if ($columnNames) { // don't know what this one does exactly
$columns = null;
foreach ($csvData as $keyIndex => $rowData) {
if ($keyIndex === 0) {
$columns = $rowData;
} else {
$data[$keyIndex/* -1 if you need 0-index */] = array_combine(
$columns,
array_values($rowData)
);
}
}
}
return $data;

Related

Reading > 1GB GZipped CSV files from external FTP server

In a scheduled task of my Laravel application I'm reading several large gzipped CSV files, ranging from 80mb to 4gb on an external FTP server, containing products which I store in my database based on a product attribute.
I loop through a list of product feeds that I want to import but each time a fatal error is returned: 'Allowed memory size of 536870912 bytes exhausted'. I can bump up the length parameter of the fgetcsv function from 1000 to 100000 which solves the problem for the smaller files (< 500mb) but for the larger files it will return the fatal error.
Is there a solution that allows me to either download or unzip the .csv.gz files, reading the lines (by batch or one by one) and inserting the products into my database without running out of memory?
$feeds = [
"feed_baby-mother-child.csv.gz",
"feed_computer-games.csv.gz",
"feed_general-books.csv.gz",
"feed_toys.csv.gz",
];
foreach ($feeds as $feed) {
$importedProducts = array();
$importedFeedProducts = 0;
$csvfile = 'compress.zlib://ftp://' . config('app.ftp_username') . ':' . config('app.ftp_password') . '#' . config('app.ftp_host') . '/' . $feed;
if (($handle = fopen($csvfile, "r")) !== FALSE) {
$row = 1;
$header = fgetcsv($handle, 1, "|");
while (($data = fgetcsv($handle, 1000, "|")) !== FALSE) {
if($row == 1 || array(null) !== $data){ $row++; continue; }
$product = array_combine($header, $data);
$importedProducts[] = $product;
}
fclose($handle);
} else {
echo 'Failed to open: ' . $feed . PHP_EOL;
continue;
}
// start inserting products into the database below here
}
The problem is probably not the gzip file itself,
Of course you can download it, on process it then, this will keep the same issues.
Because you are loading all products in a single array (Memory)
$importedProducts[] = $product;
You could comment this line out, and see it if this prevent's hitting your memory limit.
Usually i would create a method like this addProduct($product) to handle it memory safe.
You can then from there decide a max number of products before doing a bulk insert. to achieve optimal speed.. i usually use something between 1000 en 5000 rows.
For example
class ProductBatchInserter
{
private $maxRecords = 1000;
private $records = [];
function addProduct($record) {
$this->records[] = $record;
if (count($this->records) >= $this->maxRecords) {
EloquentModel::insert($this->records);
$this->records = [];
}
}
}
However i usualy don't implement it as a single class, but in my projects i used to integrate them as a BulkInsertable trait that could be used on any eloquent model.
But this should give you an direction, how you can avoid memory limits.
Or, the easier , but significantly slower, just insert the row where you now assign it to array.
But that will put a ridiculous load on your database and will be really very slow.
If the GZIP stream is the bottleneck
As i expect this is not the issue, but if it would, then you could use gzopen()
https://www.php.net/manual/en/function.gzopen.php
and nest the gzopen handle as handle for fgetcsv.
But i expect the streamhandler you are using, is doing this already the same way for you..
If not, i mean like this:
$input = gzopen('input.csv.gz', 'r');
while (($row = fgetcsv($input)) !== false) {
// do something memory safe, like suggested above
}
If you need to download it anyway there are many ways to do it, but make sure you use something memory safe, like fopen / fgets , or a guzzle stream and don't try to use something like file_get_contents() that loads it into memory

How to parse a csv file that contains 15 million lines of data in php

I have a script which parses the CSV file and start verifying the emails. this works fine for 1000 lines. but on 15 million lines it shows memory exhausted error. the file size is 400MB. any suggestions? how to parse and verify them?
Server Specs: Core i7 with 32GB of Ram
function parse_csv($file_name, $delimeter=',') {
$header = false;
$row_count = 0;
$data = [];
// clear any previous results
reset_parse_csv();
// parse
$file = fopen($file_name, 'r');
while (!feof($file)) {
$row = fgetcsv($file, 0, $delimeter);
if ($row == [NULL] || $row === FALSE) { continue; }
if (!$header) {
$header = $row;
} else {
$data[] = array_combine($header, $row);
$row_count++;
}
}
fclose($file);
return ['data' => $data, 'row_count' => $row_count];
}
function reset_parse_csv() {
$header = false;
$row_count = 0;
$data = [];
}
Iterating over a large dataset (file lines, etc.) and pushing into array it increases memory usage and this is directly proportional to the number of items handling.
So the bigger file, the bigger memory usage - in this case.
If it's desired a function to formatting the CSV data before processing it, backing it on the of generators sounds like a great idea.
Reading the PHP doc it fits very well for your case (emphasis mine):
A generator allows you to write code that uses foreach to iterate over a set of data without needing to build an array in memory, which
may cause you to exceed a memory limit, or require a considerable
amount of processing time to generate.
Something like this:
function csv_read($filename, $delimeter=',')
{
$header = [];
$row = 0;
# tip: dont do that every time calling csv_read(), pass handle as param instead ;)
$handle = fopen($filename, "r");
if ($handle === false) {
return false;
}
while (($data = fgetcsv($handle, 0, $delimeter)) !== false) {
if (0 == $row) {
$header = $data;
} else {
# on demand usage
yield array_combine($header, $data);
}
$row++;
}
fclose($handle);
}
And then:
$generator = csv_read('rdu-weather-history.csv', ';');
foreach ($generator as $item) {
do_something($item);
}
The major difference here is:
you do not get (from memory) and consume all data at once. You get items on demand (like a stream) and process it instead, one item at time. It has huge impact on memory usage.
P.S.: The CSV file above has taken from: https://data.townofcary.org/api/v2/catalog/datasets/rdu-weather-history/exports/csv
It is not necessary to write a generator function. The SplFileObject also works fine.
$fileObj = new SplFileObject($file);
$fileObj->setFlags(SplFileObject::READ_CSV
| SplFileObject::SKIP_EMPTY
| SplFileObject::READ_AHEAD
| SplFileObject::DROP_NEW_LINE
);
$fileObj->setCsvControl(';');
foreach($fileObj as $row){
//do something
}
I tried that with the file "rdu-weather-history.csv" (> 500KB). memory_get_peak_usage() returned the value 424k after the foreach loop. The values ​​must be processed line by line.
If a 2-dimensional array is created, the storage space required for the example increases to more as 8 Mbytes.
One thing you could possibly attempt, is a Bulk Import to MySQL which may give you a better platform to work from once it's imported.
LOAD DATA INFILE '/home/user/data.csv' INTO TABLE CSVImport; where CSVimport columns match your CSV.
Bit of a left field suggestion, but depending on what your use case is can be a better way to parse massive datasets.

Fatal Error - Out of Memory while reading a *.dat file in php [duplicate]

I am reading a file containing around 50k lines using the file() function in Php. However, its giving a out of memory error since the contents of the file are stored in the memory as an array. Is there any other way?
Also, the lengths of the lines stored are variable.
Here's the code. Also the file is 700kB not mB.
private static function readScoreFile($scoreFile)
{
$file = file($scoreFile);
$relations = array();
for($i = 1; $i < count($file); $i++)
{
$relation = explode("\t",trim($file[$i]));
$relation = array(
'pwId_1' => $relation[0],
'pwId_2' => $relation[1],
'score' => $relation[2],
);
if($relation['score'] > 0)
{
$relations[] = $relation;
}
}
unset($file);
return $relations;
}
Use fopen, fread and fclose to read a file sequentially:
$handle = fopen($filename, 'r');
if ($handle) {
while (!feof($handle)) {
echo fread($handle, 8192);
}
fclose($handle);
}
EDIT after update of question and comments to answer of fabjoa:
There is definitely something fishy if a 700kb file eats up 140MB of memory with that code you gave (you could unset $relation at the end of the each iteration though). Consider using a debugger to step through it to see what happens. You might also want to consider rewriting the code to use SplFileObject's CSV functions as well (or their procedural cousins)
SplFileObject::setCsvControl example
$file = new SplFileObject("data.csv");
$file->setFlags(SplFileObject::READ_CSV);
$file->setCsvControl('|');
foreach ($file as $row) {
list ($fruit, $quantity) = $row;
// Do something with values
}
For an OOP approach to iterate over the file, try SplFileObject:
SplFileObject::fgets example
$file = new SplFileObject("file.txt");
while (!$file->eof()) {
echo $file->fgets();
}
SplFileObject::next example
// Read through file line by line
$file = new SplFileObject("misc.txt");
while (!$file->eof()) {
echo $file->current();
$file->next();
}
or even
foreach(new SplFileObject("misc.txt") as $line) {
echo $line;
}
Pretty much related (if not duplicate):
How to save memory when reading a file in Php?
If you don't know the maximum line length and you are not comfortable to use a magic number for the max line length then you'll need to do an initial scan of the file and determine the max line length.
Other than that the following code should help you out:
// length is a large number or calculated from an initial file scan
while (!feof($handle)) {
$buffer = fgets($handle, $length);
echo $buffer;
}
Old question but since I haven't seen anyone mentioning it, PHP generators is a great way to reduce save memory consumption.
For example:
function read($fileName)
{
$fileHandler = fopen($fileName, 'rb');
while(($line = fgets($fileHandler)) !== false) {
yield rtrim($line, "\r\n");
}
fclose($fileHandler);
}
foreach(read(__DIR__ . '/filenameHere') as $line) {
echo $line;
}
allocate more memory during the operation, maybe something like ini_set('memory_limit', '16M');. Don't forget to go back to initial memory allocation once operation is done

php memory limit and reading/writing temp files

using the function below I am pulling rows from tables, encoding them, then putting them in csv format. I am wondering if there is an easier way to prevent high memory usage. I don't want to have to rely on ini_set. I believe the memory consumption is caused from reading the temp file and gzipping it up. I'd love to be able to have a limit of 64mb ram to work with. Any ideas? Thanks!
function exportcsv($tables) {
foreach ($tables as $k => $v) {
$fh = fopen("php://temp", 'w');
$sql = mysql_query("SELECT * FROM $v");
while ($row = mysql_fetch_row($sql)) {
$line = array();
foreach ($row as $key => $vv) {
$line[] = base64_encode($vv);
}
fputcsv($fh, $line, chr(9));
}
rewind($fh);
$data = stream_get_contents($fh);
$gzdata = gzencode($data, 6);
$fp = fopen('sql/'.$v.'.csv.gz', 'w');
fwrite($fp, $gzdata);
fclose($fp);
fclose($fh);
}
}
untested, but hopefully you understand
function exportcsv($tables) {
foreach ($tables as $k => $v) {
$fh = fopen('compress.zlib://sql/' .$v. '.csv.gz', 'w');
$sql = mysql_unbuffered_query("SELECT * FROM $v");
while ($row = mysql_fetch_row($sql)) {
fputcsv($fh, array_map('base64_encode', $row), chr(9));
}
fclose($fh);
mysql_free_result($sql);
}
}
edit-
points of interest are the use of mysql_unbuffered_query and use of php's compression stream. regular mysql_query() buffers entire result set into memory. and using the compression stream gets rid of having to buffer the data yet again into php memory as a string before writing to a file.
Pulling the whole file into memory via the stream_get_contents() is probably what's killing you. Not only are you having to hold the base64 data (which is usually about 33% than its raw content), you've got the csv overhead to deal with as well. If memory is a problem, consider simply calling a command-line gzip app instead of gzipping inside of PHP, something like:
... database loop here ...
exec('gzip yourfile.csv');
And you can probably optimize things a little bit better inside the DB loop, and encode in-place, rather than building a new array for each row:
while($row = mysql_fetch_row($result)) {
foreach ($row as $key => $val) {
$row[$key] = base64_encode($val);
fputcsv($fh, $row, chr(9));
}
}
Not that this will reduce memory usage much - it's only a single row of data, so unless you're dealing with huge record fields, it won't have much effect.
You could insert some flushing there, currently your entire php file will be held in memory then flushed at the end, however if you manually
fflush($fh);
Also instead of gzipping the entire file you could gzip line by line using
$gz = gzopen ( $fh, 'w9' );
gzwrite ( $gz, $content );
gzclose ( $gz );
This will write line by line packed data rather than creating an entire file and then gzipping it.
I found this suggestion for compressing in chunks on http://johnibanez.com/node/21
It looks like it wouldn't be hard to modify for your purposes.
function gzcompressfile($source, $level = false){
$dest = $source . '.gz';
$mode = 'wb' . $level;
$error = false;
if ($fp_out = gzopen($dest, $mode)) {
if ($fp_in = fopen($source, 'rb')) {
while(!feof($fp_in)) {
gzwrite($fp_out, fread($fp_in, 1024*512));
}
fclose($fp_in);
}
else
$error=true;
gzclose($fp_out);
}
else
$error=true;
if ($error)
return false;
else
return $dest;
}

Least memory intensive way to read a file in PHP

I am reading a file containing around 50k lines using the file() function in Php. However, its giving a out of memory error since the contents of the file are stored in the memory as an array. Is there any other way?
Also, the lengths of the lines stored are variable.
Here's the code. Also the file is 700kB not mB.
private static function readScoreFile($scoreFile)
{
$file = file($scoreFile);
$relations = array();
for($i = 1; $i < count($file); $i++)
{
$relation = explode("\t",trim($file[$i]));
$relation = array(
'pwId_1' => $relation[0],
'pwId_2' => $relation[1],
'score' => $relation[2],
);
if($relation['score'] > 0)
{
$relations[] = $relation;
}
}
unset($file);
return $relations;
}
Use fopen, fread and fclose to read a file sequentially:
$handle = fopen($filename, 'r');
if ($handle) {
while (!feof($handle)) {
echo fread($handle, 8192);
}
fclose($handle);
}
EDIT after update of question and comments to answer of fabjoa:
There is definitely something fishy if a 700kb file eats up 140MB of memory with that code you gave (you could unset $relation at the end of the each iteration though). Consider using a debugger to step through it to see what happens. You might also want to consider rewriting the code to use SplFileObject's CSV functions as well (or their procedural cousins)
SplFileObject::setCsvControl example
$file = new SplFileObject("data.csv");
$file->setFlags(SplFileObject::READ_CSV);
$file->setCsvControl('|');
foreach ($file as $row) {
list ($fruit, $quantity) = $row;
// Do something with values
}
For an OOP approach to iterate over the file, try SplFileObject:
SplFileObject::fgets example
$file = new SplFileObject("file.txt");
while (!$file->eof()) {
echo $file->fgets();
}
SplFileObject::next example
// Read through file line by line
$file = new SplFileObject("misc.txt");
while (!$file->eof()) {
echo $file->current();
$file->next();
}
or even
foreach(new SplFileObject("misc.txt") as $line) {
echo $line;
}
Pretty much related (if not duplicate):
How to save memory when reading a file in Php?
If you don't know the maximum line length and you are not comfortable to use a magic number for the max line length then you'll need to do an initial scan of the file and determine the max line length.
Other than that the following code should help you out:
// length is a large number or calculated from an initial file scan
while (!feof($handle)) {
$buffer = fgets($handle, $length);
echo $buffer;
}
Old question but since I haven't seen anyone mentioning it, PHP generators is a great way to reduce save memory consumption.
For example:
function read($fileName)
{
$fileHandler = fopen($fileName, 'rb');
while(($line = fgets($fileHandler)) !== false) {
yield rtrim($line, "\r\n");
}
fclose($fileHandler);
}
foreach(read(__DIR__ . '/filenameHere') as $line) {
echo $line;
}
allocate more memory during the operation, maybe something like ini_set('memory_limit', '16M');. Don't forget to go back to initial memory allocation once operation is done

Categories