I have a big CSV file (about 30 MB) that I want to read with my PHP program, convert to another format, and save as several smaller files. When I use the traditional fopen/fwrite approach I get an error that says Fatal error: Allowed memory size of 134217728 bytes exhausted. I know I can raise the memory limit in php.ini, but is there a way to read the file as a stream so it doesn't create much memory overhead? Maybe something like the StreamReader classes in Java?
You could just read the file one line at a time with fgets(), provided you are reassigning your variable each time through (and not storing the lines in an array or something, where they would remain in memory).
One way, with a ~65 MB file:
// load the whole thing
$file = file_get_contents('hugefile.txt');
echo memory_get_peak_usage() / 1024 / 1024, ' MB';
// prints '66.153938293457 MB'
Second way:
// load only one line at a time
$fh = fopen('hugefile.txt', 'r');
while ($line = fgets($fh)) {}
echo memory_get_peak_usage() / 1024 / 1024, ' MB';
// prints '0.62477111816406 MB'
Also, if you want to rearrange the data in a different format, you could parse each line as CSV as you go using fgetcsv() instead.
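For example, a minimal sketch of that approach, assuming a hypothetical input file and an arbitrary 5000-row chunk size, that streams the CSV and writes the converted rows out as a series of small files:
$in = fopen('hugefile.csv', 'r');
$rowsPerFile = 5000; // arbitrary chunk size
$fileIndex = 0;
$rowCount = 0;
$out = null;

while (($row = fgetcsv($in)) !== false) {
    // Start a new output file every $rowsPerFile rows.
    if ($rowCount % $rowsPerFile === 0) {
        if ($out) {
            fclose($out);
        }
        $out = fopen(sprintf('output_%03d.txt', $fileIndex++), 'w');
    }
    // Convert the row to whatever format you need; tab-separated here.
    fwrite($out, implode("\t", $row) . "\n");
    $rowCount++;
}

if ($out) {
    fclose($out);
}
fclose($in);
Only one row is held in memory at a time, so the peak usage stays roughly as low as in the second measurement above.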
I have a Laravel 5.3 project. I need to import and parse a pretty large (1.6M lines) text file.
I am running into memory issues. I think I need to process the file in chunks at some point, but I am having trouble even getting it loaded in order to do so.
Here is what I am trying:
if (Input::hasFile('file')) {
    $path = Input::file('file')->getRealPath(); // assign file from input
    $data = file($path);                        // load the file
    $data->chunk(100, function ($content) {     // parse it 100 lines at a time
        foreach ($content as $line) {
            // use $line
        }
    });
}
I understand that file() will return an array vs. File::get() which will return a string.
I have increased my php.ini upload and memory limits to be able to handle the file size, but I am running into this error:
Allowed memory size of 524288000 bytes exhausted (tried to allocate 4096 bytes)
This is occurring at the line:
$data = file($path);
What am I missing? And is this even the best way to do this?
Thanks!
As mentioned, file() reads the entire file into an array, in this case about 1.6 million elements, which is unlikely to fit in memory. Instead, read each line one by one, overwriting the previous one:
$fh = fopen($path, "r");
if ($fh) {
    while (($line = fgets($fh)) !== false) {
        // use $line
    }
    fclose($fh);
}
The only way to keep it from timing out is to set the maximum execution time:
set_time_limit(0);
If the file is too large, you should split it outside of PHP; you can call the Linux split command safely via exec(). Doing the split with the PHP interpreter alone needs a lot of memory and takes a long time, so the Linux command saves you time on every run.
exec('split -C 20m --numeric-suffixes input_filename output_prefix');
After that you can use DirectoryIterator to read each resulting file.
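For example, a rough sketch, assuming the pieces were written to the current directory with the output_prefix name from the split command above:
foreach (new DirectoryIterator('.') as $fileInfo) {
    // Only pick up the pieces produced by split (output_prefix00, output_prefix01, ...).
    if ($fileInfo->isFile() && strpos($fileInfo->getFilename(), 'output_prefix') === 0) {
        $fh = fopen($fileInfo->getPathname(), 'r');
        while (($line = fgets($fh)) !== false) {
            // use $line
        }
        fclose($fh);
    }
}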
Regards
I'm using the following code to download a large file (>100 MB). The code is executed from a shell.
$fileHandle = fopen($url, 'rb');
$bytes = 100000;
while ($read = @fread($fileHandle, $bytes)) {
    debug(strlen($read));
    if (!file_put_contents($filePath, $read, FILE_APPEND)) {
        return false;
    }
}
Where I would expect that debug(strlen($read)) would output 100000, this is the actual output:
10627
8192
8192
8192
...
Why doesn't fread read more than 8192 bytes after the first time, and why does it read 10627 bytes on the first iteration?
This makes downloading the file very slow. Is there a better way to do this?
The answer to your question is (quoting from the PHP docs for fread()):
if the stream is read buffered and it does not represent a plain file, at most one read of up to a number of bytes equal to the chunk size (usually 8192) is made; depending on the previously buffered data, the size of the returned data may be larger than the chunk size
The solution to your performance problem is to use stream_copy_to_stream(), which should be faster than block reading with fread(), and more memory efficient as well.
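A minimal sketch, reusing the $url and $filePath variables from the question (error handling kept deliberately light):
$source = fopen($url, 'rb');
$dest   = fopen($filePath, 'wb');

if ($source === false || $dest === false) {
    return false;
}

// Copies in internal chunks, so the whole file is never held in memory.
$bytesCopied = stream_copy_to_stream($source, $dest);

fclose($source);
fclose($dest);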
I checked the manual, and found this: http://php.net/manual/en/function.fread.php
"If the stream is read buffered and it does not represent a plain file, at most one read of up to a number of bytes equal to the chunk size (usually 8192) is made;"
Since you're opening a URL this is probably the case.
It doesn't explain the 10627 though...
Besides that, why do you expect 100000 byte reads to be faster than 8192?
I doubt that's your bottleneck. My guess is that either the download speed from the URL or the write speed of the disk is the problem.
I am trying to parse a tab-delimited file that is ~1 GB in size.
When I run the script I get:
Fatal error: Allowed memory size of 1895825408 bytes exhausted (tried to allocate 1029206974 bytes) ...
My script at the moment is just:
$file = file_get_contents('allCountries.txt') ;
$file = str_replace(array("\r\n", "\t"), array("[NEW*LINE]", "[tAbul*Ator]"), $file) ;
I have set the memory limit in php.ini to -1, which then gives me:
Fatal error: Out of memory (allocated 1029963776) (tried to allocate 1029206974 bytes)
Is there any way to partially open the file and then move on to the next part, so that less memory is used up at one time?
Yes, you can read it line by line:
$handle = #fopen("/tmp/inputfile.txt", "r");
if ($handle) {
while (($buffer = fgets($handle, 4096)) !== false) {
echo $buffer;
}
fclose($handle);
}
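Applied to your original replacement, a sketch along these lines streams the markers into a second file instead of building one giant string (the output filename is an assumption):
$in  = fopen('allCountries.txt', 'r');
$out = fopen('allCountries.converted.txt', 'w');

if ($in && $out) {
    while (($buffer = fgets($in, 4096)) !== false) {
        // Same marker replacement as before, but only one line is in memory.
        $buffer = str_replace(
            array("\r\n", "\t"),
            array("[NEW*LINE]", "[tAbul*Ator]"),
            $buffer
        );
        fwrite($out, $buffer);
    }
    fclose($in);
    fclose($out);
}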
You have to read the file in blocks. Check the answer to this question:
https://stackoverflow.com/a/6564818/1572528
For files that are not quite as large, you can also try this:
ini_set('memory_limit', '32M'); //max size 32m
Are you sure that it's fopen that's failing and not your script's timeout setting? The default is usually around 30 seconds or so, and if your file is taking longer than that to read in, it may be tripping that up.
Another thing to consider may be the memory limit on your script - reading the file into an array may trip over this, so check your error log for memory warnings.
If neither of the above are your problem, you might look into using fgets to read the file in line-by-line, processing as you go.
$handle = fopen("/tmp/uploadfile.txt", "r") or die("Couldn't get handle");
if ($handle) {
while (!feof($handle)) {
$buffer = fgets($handle, 4096);
// Process buffer here..
}
fclose($handle);
}
Edit
PHP doesn't seem to throw an error; it just returns false.
Is the path to $rawfile correct relative to where the script is running? Perhaps try setting an absolute path here for the filename.
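For example (the filename here is only a placeholder for whatever $rawfile should point at):
// Anchor the path to the script's own directory instead of the working directory.
$rawfile = __DIR__ . '/uploadfile.txt';
$handle  = fopen($rawfile, "r") or die("Couldn't get handle");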
Yes, use fopen and fread / fgets for this:
http://www.php.net/manual/en/function.fread.php
string fread ( resource $handle , int $length )
Set $length to how much of the file you want to read per call.
The $handle keeps track of the position for subsequent reads; with fseek() you can also set the position explicitly, as in the sketch below.
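A rough sketch, where the filename and the 1 MB chunk size are assumptions:
$handle = fopen('allCountries.txt', 'r');
$chunkSize = 1024 * 1024; // read 1 MB at a time

while (!feof($handle)) {
    $chunk = fread($handle, $chunkSize);
    // process $chunk here...
}

// Or jump to a specific byte offset before reading:
fseek($handle, 0); // back to the start
$firstChunk = fread($handle, $chunkSize);

fclose($handle);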
I'm looking for the most efficient way to write the contents of the PHP input stream to disk, without using much of the memory that is granted to the PHP script. For example, if the max file size that can be uploaded is 1 GB but PHP only has 32 MB of memory.
define('MAX_FILE_LEN', 1073741824); // 1 GB in bytes
$hSource = fopen('php://input', 'r');
$hDest = fopen(UPLOADS_DIR.'/'.$MyTempName.'.tmp', 'w');
fwrite($hDest, fread($hSource, MAX_FILE_LEN));
fclose($hDest);
fclose($hSource);
Does an fread inside an fwrite, as the code above shows, mean that the entire file will be loaded into memory?
For doing the opposite (writing a file to the output stream), PHP offers a function called fpassthru which I believe does not hold the contents of the file in the PHP script's memory.
I'm looking for something similar but in reverse (writing from input stream to file). Thank you for any assistance you can give.
Yep - fread used in that way would read up to 1 GB into a string first, and then write that back out via fwrite. PHP just isn't smart enough to create a memory-efficient pipe for you.
I would try something akin to the following:
$hSource = fopen('php://input', 'r');
$hDest = fopen(UPLOADS_DIR . '/' . $MyTempName . '.tmp', 'w');
while (!feof($hSource)) {
    /*
     * I'm going to read in 1K chunks. You could make this
     * larger, but as a rule of thumb I'd keep it to 1/4 of
     * your php memory_limit.
     */
    $chunk = fread($hSource, 1024);
    fwrite($hDest, $chunk);
}
fclose($hSource);
fclose($hDest);
If you wanted to be really picky, you could also unset($chunk); within the loop after the fwrite to absolutely ensure that PHP frees up the memory, but that shouldn't be necessary, as the next iteration will overwrite whatever memory $chunk is using at that time.
I have the following code:
<?php
$FILE="giant-data-barf.txt";
$fp = fopen($FILE,'r');
//read everything into data
$data = fread($fp, filesize($FILE));
fclose($fp);
$data_arr = json_decode($data);
var_dump($data_arr);
?>
The file giant-data-barf.txt is, as its name suggests, a huge file (it's 5.4 MB right now, but it could grow to several GB).
When I execute this script, I get the following error:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 71 bytes) in ........./data.php on line 12
I looked at possible solutions, and saw this:
ini_set('memory_limit','16M');
My question is: is there a limit to how high I should set my memory limit? Or is there a better way of solving this problem?
THIS IS A VERY BAD IDEA; that said, you'll need to set
ini_set('memory_limit',filesize($FILE) + SOME_OVERHEAD_AMOUNT);
because you're reading the entire thing into memory. You may very well have to set the memory limit to twice the size of the file, since you also want to json_decode() it.
NOTE THAT ON A WEB SERVER THIS WILL CONSUME MASSIVE AMOUNTS OF MEMORY AND YOU SHOULD NOT DO THIS IF THE FILE WILL BE MANY GIGABYTES AS YOU SAID!!!!
Is it really a giant JSON blob? You should look at converting this to a database, or some other format you can access randomly or row by row, before parsing it with PHP.
I've given all my servers a memory_limit of 100M and haven't run into trouble yet.
I would consider splitting up that file somehow, or getting rid of it and using a database instead.