I have many file chunks and I need to merge them using PHP's fopen function. However, I'm worried about the memory usage.
For example, I have about 100 files listed in split_hash.txt, each about 100 MB. Here I combine them together:
<?php
$hash = file_get_contents("split_hash.txt");
$list = explode("\r\n", $hash);
$fp = fopen("hadoop2.zip", "ab");
foreach ($list as $value) {
    if (!empty($value)) {
        $handle = fopen($value, "rb");
        fwrite($fp, fread($handle, filesize($value)));
        fclose($handle);
        unset($handle);
    }
}
fclose($fp);
echo "ok";
Will it cost a lot of my memory?
It will if you keep fread($handle, filesize($value)), because that reads each whole file into memory at once. Use fread in smaller chunks per file instead.
I would change:
fwrite($fp, fread($handle, filesize($value)));
to:
while (!feof($handle)) {
    fwrite($fp, fread($handle, 1048576));
}
so that you are only dealing with 1 MB at a time.
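For completeness, here is a minimal sketch of the whole merge with that chunked read in place (assuming, as in the question, that split_hash.txt lists one part per line):
<?php
// Sketch only: merge the parts listed in split_hash.txt in 1 MB chunks,
// so peak memory stays near the buffer size instead of the largest part size.
$list = array_filter(array_map('trim', file("split_hash.txt")));
$fp = fopen("hadoop2.zip", "ab");
foreach ($list as $part) {
    $handle = fopen($part, "rb");
    while (!feof($handle)) {
        fwrite($fp, fread($handle, 1048576)); // 1 MB per read
    }
    fclose($handle);
}
fclose($fp);
echo "ok";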
At peak it will cost you roughly the size of your largest file in memory.
You are reading each whole file and writing it to another file, so it doesn't matter how many files you have; what matters is how big each one is.
You say each is ~100 MB; with the default PHP memory limit of 128 MB that is still acceptable in your case.
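If you want to verify that yourself, memory_get_peak_usage() makes the difference between the two variants easy to see (a quick check, not a rigorous benchmark):
<?php
// Run either merge variant first, then print the peak.
// With fread($handle, filesize($value)) the peak roughly tracks the largest part;
// with a fixed 1 MB buffer it stays near the buffer size.
printf("peak: %.2f MB\n", memory_get_peak_usage(true) / 1048576);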
I have a Laravel 5.3 project. I need to import and parse a pretty large (1.6M lines) text file.
I am having memory resource issues. I think at some point I need to use chunk, but am having trouble getting the file loaded to do so.
Here is what I am trying:
if (Input::hasFile('file')) {
    $path = Input::file('file')->getRealPath(); //assign file from input
    $data = file($path); //load the file
    $data->chunk(100, function ($content) { //parse it 100 lines at a time
        foreach ($content as $line) {
            //use $line
        }
    });
}
I understand that file() will return an array vs. File::get() which will return a string.
I have increased my php.ini upload and memory limits to be able to handle the file size, but am running into this error:
Allowed memory size of 524288000 bytes exhausted (tried to allocate 4096 bytes)
This is occurring at the line:
$data = file($path);
What am I missing? And/or is this the most ideal way to do this?
Thanks!
As mentioned, file() reads the entire file into an array, in this case 1.6 million elements. I doubt that is possible within your memory limit. You can instead read each line one by one, overwriting the previous one:
$fh = fopen($path, "r");
if ($fh) {
    while (($line = fgets($fh)) !== false) {
        //use $line
    }
    fclose($fh);
}
The only way to keep it from timing out is to set the maximum execution time:
set_time_limit(0);
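If you still want the chunk(100, ...) style from the question, one option (just a sketch; lineChunks is a hypothetical helper, not a Laravel API) is to wrap the fgets() loop in a generator that yields batches of lines, so only 100 lines are in memory at a time:
function lineChunks($path, $size = 100) {
    $fh = fopen($path, "r");
    $batch = [];
    while (($line = fgets($fh)) !== false) {
        $batch[] = $line;
        if (count($batch) === $size) {
            yield $batch;   // hand back 100 lines at a time
            $batch = [];
        }
    }
    if ($batch) {
        yield $batch;       // remainder
    }
    fclose($fh);
}

foreach (lineChunks($path, 100) as $content) {
    foreach ($content as $line) {
        //use $line
    }
}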
If the file is too large, you need to split it outside of PHP. You can use the exec command safely for that; doing it with just the PHP interpreter needs a lot of memory and takes a long time, while the Linux command saves you that time on each run.
exec('split -C 20m --numeric-suffixes input_filename output_prefix');
After that you can use DirectoryIterator and read each file.
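A rough sketch of that second step, assuming split wrote its pieces as output_prefix00, output_prefix01, ... into the current directory (note that DirectoryIterator does not guarantee any particular order):
foreach (new DirectoryIterator('.') as $fileInfo) {
    // only pick up the pieces produced by split
    if ($fileInfo->isFile() && strpos($fileInfo->getFilename(), 'output_prefix') === 0) {
        $fh = fopen($fileInfo->getPathname(), 'r');
        while (($line = fgets($fh)) !== false) {
            //use $line
        }
        fclose($fh);
    }
}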
I'm using the following code to download a large file (>100 MB). The code is executed in a shell.
$fileHandle = fopen($url, 'rb');
$bytes = 100000;
while ($read = @fread($fileHandle, $bytes)) {
    debug(strlen($read));
    if (!file_put_contents($filePath, $read, FILE_APPEND)) {
        return false;
    }
}
Where I would expect that debug(strlen($read)) would output 100000, this is the actual output:
10627
8192
8192
8192
...
Why doesn't fread read more than 8192 bytes after the first time, and why does it read 10627 bytes on the first iteration?
This makes downloading the file very slow, is there a better way to do this?
The answer to your question is (quoting from the PHP docs for fread()):
if the stream is read buffered and it does not represent a plain file, at most one read of up to a number of bytes equal to the chunk size (usually 8192) is made; depending on the previously buffered data, the size of the returned data may be larger than the chunk size
The solution to your performance problem is to use stream_copy_to_stream(), which should be faster than block reading with fread(), and more memory efficient as well.
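A minimal sketch of that approach, reusing the $url and $filePath variables from the question (error handling omitted):
$src  = fopen($url, 'rb');
$dest = fopen($filePath, 'wb');
// stream_copy_to_stream() copies in internal chunks, so the whole download
// never has to sit in a PHP string.
$copied = stream_copy_to_stream($src, $dest);
fclose($src);
fclose($dest);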
I checked the manual, and found this: http://php.net/manual/en/function.fread.php
"If the stream is read buffered and it does not represent a plain file, at most one read of up to a number of bytes equal to the chunk size (usually 8192) is made;"
Since you're opening a URL this is probably the case.
It doesn't explain the 10627 though...
Besides that, why do you expect 100000 byte reads to be faster than 8192?
I doubt that's your bottleneck. My guess is that either the download speed from the URL or the write speed of the HD is the problem.
I'd like to know if there is a faster way of concatenating 2 text files in PHP than the usual way of opening txt1 in a+, reading txt2 line by line and copying each line to txt1.
If you want to use a pure-PHP solution, you could use file_get_contents to read the whole file into a string and then write that out (no error checking, just to show how you could do it):
$fp1 = fopen("txt1", 'a+');
$file2 = file_get_contents("txt2");
fwrite($fp1, $file2);
It's probably much faster to use the cat program on Linux, if you have command line permissions for PHP:
system('cat txt1 txt2 > txt3');
$content = file_get_contents("file1");
file_put_contents("file2", $content, FILE_APPEND);
I have found using *nix cat to be the most effective here, but if for whatever reason you don't have access to it, and you are concatenating large files, then you can use this line-by-line function (error handling stripped for simplicity):
function catFiles($arrayOfFiles, $outputPath) {
    $dest = fopen($outputPath, "a");
    foreach ($arrayOfFiles as $f) {
        $FH = fopen($f, "r");
        $line = fgets($FH);
        while ($line !== false) {
            fputs($dest, $line);
            $line = fgets($FH);
        }
        fclose($FH);
    }
    fclose($dest);
}
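Usage might look like this (the file names are just placeholders):
catFiles(['part1.txt', 'part2.txt'], 'combined.txt');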
While the fastest way is undoubtedly to use OS commands like cp or cat, this is hardly advisable for compatibility.
The fastest PHP-only way is file_get_contents, which reads the whole source file in one shot, but it also has a drawback: it requires a lot of memory for large files and may therefore fail, depending on the memory assigned to PHP.
A universal, clean and fast solution is to use fread and fwrite with a large buffer.
If the file is smaller than the buffer, all reading happens in one burst, so speed is optimal; otherwise reading happens in big chunks (the size of the buffer), so the overhead is minimal and speed is still quite good.
Reading line by line with fgets, by contrast, has to test every character, one by one, to see whether it is a newline.
Also, reading a file with many short lines via fgets will be slower, as you read many little pieces of different sizes, depending on where the newlines happen to fall.
fread is faster, as it only checks for EOF (which is cheap) and reads the file in fixed-size chunks you choose, so it can be tuned for your OS, disk, or kind of files (say you have many files under 12 KB: set the buffer size to 16 KB and they are all read in one shot).
<?php
// Code is untested, written on a mobile phone inside Stack Overflow;
// it comes from various examples online you can also check.
$BUFFER_SIZE = 1 * 1024 * 1024; // 1 MB, bigger is faster.. depending on file sizes and count
$dest = fopen($fileToAppendTo, "a+");
if (FALSE === $dest) die("Failed to open destination");
$handle = fopen("source.txt", "rb");
if (FALSE === $handle) {
    fclose($dest);
    die("Failed to open source");
}
while (!feof($handle)) {
    fwrite($dest, fread($handle, $BUFFER_SIZE));
}
fclose($handle);
fclose($dest);
?>
I have a file that I'm reading with PHP. I want to look for some lines that start with some white space and then some key words I'm looking for (for example, "project_name:") and then change other parts of that line.
Currently, the way I handle this is to read the entire file into a string variable, manipulate that string and then write the whole thing back to the file, fully replacing the entire file (via fopen( filepath, "wb" ) and fwrite()), but this feels inefficient. Is there a better way?
Update: After finishing my function I had time to benchmark it. I used a 1 GB file for testing, but the results were unsatisfying :|
Yes, the peak memory allocation is significantly smaller:
standard solution: 1.86 GB
custom solution: 653 KB (4096 byte buffer size)
But compared to the following solution there is just a slight performance boost:
ini_set('memory_limit', -1);
file_put_contents(
'test.txt',
str_replace('the', 'teh', file_get_contents('test.txt'))
);
The script above took ~16 seconds, the custom solution took ~13 seconds.
Summary: the custom solution is slightly faster on large files and consumes much less memory(!!!).
Also, if you want to run this in a web server environment the custom solution is better, as many concurrent scripts would otherwise likely consume all of the system's available memory.
Original Answer:
The only thing that comes to mind is to read the file in chunks that fit the file system's block size and write the content, or the modified content, to a temporary file. After processing finishes you use rename() to overwrite the original file.
This reduces the memory peak and should be significantly faster if the file is really large.
Note: On a Linux system you can get the file system block size using:
sudo dumpe2fs /dev/yourdev | grep 'Block size'
I got 4096
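If you prefer to stay in PHP, stat() exposes the same value on platforms that report it (the blksize entry is -1 where the system does not provide one):
$info = stat('test.txt');
echo $info['blksize'], "\n"; // typically 4096 on ext4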
Here comes the function:
function freplace($search, $replace, $filename, $buffersize = 4096) {
    $fd1 = fopen($filename, 'r');
    if (!is_resource($fd1)) {
        die('error opening file');
    }
    // the temp file can be anywhere, but it should be on the same partition
    // as the original so the final rename() stays cheap and atomic
    $tmpfile = tempnam('.', uniqid());
    $fd2 = fopen($tmpfile, 'w+');
    // we store len(search) - 1 chars from the end of the buffer on each loop;
    // this is the maximum number of chars of the search string that can sit
    // on the border between two buffers
    $tmp = '';
    while (!feof($fd1)) {
        $buffer = fread($fd1, $buffersize);
        // prepend the rest from the last iteration
        $buffer = $tmp . $buffer;
        // replace
        $buffer = str_replace($search, $replace, $buffer);
        // store len(search) - 1 chars from the end of the buffer
        $tmp = substr($buffer, -1 * (strlen($search)) + 1);
        // write the processed buffer (minus the rest)
        fwrite($fd2, $buffer, strlen($buffer) - strlen($tmp));
    }
    if (!empty($tmp)) {
        fwrite($fd2, $tmp);
    }
    fclose($fd1);
    fclose($fd2);
    rename($tmpfile, $filename);
}
Call it like this:
freplace('foo', 'bar', 'test.txt');
I'm looking for the most efficient way to write the contents of the PHP input stream to disk, without using much of the memory that is granted to the PHP script. For example, if the max file size that can be uploaded is 1 GB but PHP only has 32 MB of memory.
define('MAX_FILE_LEN', 1073741824); // 1 GB in bytes
$hSource = fopen('php://input', 'r');
$hDest = fopen(UPLOADS_DIR.'/'.$MyTempName.'.tmp', 'w');
fwrite($hDest, fread($hSource, MAX_FILE_LEN));
fclose($hDest);
fclose($hSource);
Does an fread inside an fwrite, as the above code shows, mean that the entire file will be loaded into memory?
For doing the opposite (writing a file to the output stream), PHP offers a function called fpassthru which I believe does not hold the contents of the file in the PHP script's memory.
I'm looking for something similar but in reverse (writing from input stream to file). Thank you for any assistance you can give.
Yep - fread used in that way would read up to 1 GB into a string first, and then write that back out via fwrite. PHP just isn't smart enough to create a memory-efficient pipe for you.
I would try something akin to the following:
$hSource = fopen('php://input', 'r');
$hDest = fopen(UPLOADS_DIR . '/' . $MyTempName . '.tmp', 'w');
while (!feof($hSource)) {
    /*
     * I'm going to read in 1K chunks. You could make this
     * larger, but as a rule of thumb I'd keep it to 1/4 of
     * your php memory_limit.
     */
    $chunk = fread($hSource, 1024);
    fwrite($hDest, $chunk);
}
fclose($hSource);
fclose($hDest);
If you wanted to be really picky, you could also unset($chunk); within the loop after fwrite to absolutely ensure that PHP frees up the memory - but that shouldn't be necessary, as the next loop will overwrite whatever memory is being used by $chunk at that time.
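As a variant under the same constraints (just a sketch, not what the answer above proposes), stream_copy_to_stream(), mentioned earlier in this thread, can replace the manual loop; it also copies between the two handles in internal chunks, so memory use stays flat regardless of the upload size:
$hSource = fopen('php://input', 'r');
$hDest = fopen(UPLOADS_DIR . '/' . $MyTempName . '.tmp', 'w');
stream_copy_to_stream($hSource, $hDest); // chunked copy handled internally
fclose($hSource);
fclose($hDest);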