Read and parse contents of very large file [duplicate] - php

This question already has answers here:
Least memory intensive way to read a file in PHP
(5 answers)
Closed 10 years ago.
I am trying to parse a tab delimited file that is ~1GB in size.
When I run the script I get:
Fatal error: Allowed memory size of 1895825408 bytes exhausted (tried to allocate 1029206974 bytes) ...
My script at the moment is just:
$file = file_get_contents('allCountries.txt') ;
$file = str_replace(array("\r\n", "\t"), array("[NEW*LINE]", "[tAbul*Ator]"), $file) ;
I have set the memory limit in php.ini to -1, which then gives me:
Fatal error: Out of memory (allocated 1029963776) (tried to allocate 1029206974 bytes)
Is there any way to partially open the file and then move on to the next part, so less memory is used up at one time?

Yes, you can read it line by line:
$handle = @fopen("/tmp/inputfile.txt", "r");
if ($handle) {
    while (($buffer = fgets($handle, 4096)) !== false) {
        echo $buffer;
    }
    fclose($handle);
}

You have to read the file in blocks. Check the answer to this question:
https://stackoverflow.com/a/6564818/1572528
For files that are not quite as large, you can also just raise the memory limit:
ini_set('memory_limit', '32M'); // max memory 32M
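A minimal sketch of reading in blocks with fread, assuming the same allCountries.txt input (the block size here is arbitrary):
$handle = fopen('allCountries.txt', 'r');
if ($handle) {
    while (!feof($handle)) {
        $block = fread($handle, 8192); // read 8 KB at a time
        // process $block here; note that a block can end in the middle of a line
    }
    fclose($handle);
}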

Are you sure that it's fopen that's failing and not your script's timeout setting? The default is usually around 30 seconds or so, and if your file is taking longer than that to read in, it may be tripping that up.
Another thing to consider may be the memory limit on your script - reading the file into an array may trip over this, so check your error log for memory warnings.
If neither of the above is your problem, you might look into using fgets to read the file line by line, processing as you go.
$handle = fopen("/tmp/uploadfile.txt", "r") or die("Couldn't get handle");
if ($handle) {
while (!feof($handle)) {
$buffer = fgets($handle, 4096);
// Process buffer here..
}
fclose($handle);
}
Edit
fopen() doesn't seem to throw an error here; it just returns false.
Is the path to $rawfile correct relative to where the script is running? Perhaps try setting an absolute path here for the filename.
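For example (a hypothetical file name, building an absolute path from the script's own directory):
$handle = fopen(__DIR__ . '/uploadfile.txt', 'r') or die("Couldn't get handle"); // hypothetical absolute path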

Yes, use fopen and fread / fgets for this:
http://www.php.net/manual/en/function.fread.php
string fread ( resource $handle , int $length )
Set $length to how much of the file you want to read.
The $handle keeps track of the position for subsequent reads; with fseek you can also set the position explicitly.
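A rough sketch (file name, lengths and offsets are placeholders):
$handle = fopen('allCountries.txt', 'rb');
if ($handle) {
    $chunk = fread($handle, 4096);  // read the first 4 KB
    fseek($handle, 1024 * 1024);    // jump to the 1 MB mark
    $chunk = fread($handle, 4096);  // read 4 KB starting from there
    fclose($handle);
}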

Related

What's the proper way to import 1.6M line file? [duplicate]

This question already has answers here:
Streaming a large file using PHP
(5 answers)
Closed 4 years ago.
I have a Laravel 5.3 project. I need to import and parse a pretty large (1.6M lines) text file.
I am having memory resource issues. I think at some point, I need to use chunk but am having trouble getting the file loaded to do so.
Here is what I am trying:
if (Input::hasFile('file')) {
    $path = Input::file('file')->getRealPath(); //assign file from input
    $data = file($path); //load the file
    $data->chunk(100, function ($content) { //parse it 100 lines at a time
        foreach ($content as $line) {
            //use $line
        }
    });
}
I understand that file() will return an array vs. File::get() which will return a string.
I have increased my php.ini upload and memory limits to be able to handle the file size, but am running into this error:
Allowed memory size of 524288000 bytes exhausted (tried to allocate 4096 bytes)
This is occurring at the line:
$data = file($path);
What am I missing? And/or is this the best way to do this?
Thanks!
As mentioned, file() reads the entire file into an array, in this case 1.6 million elements, which is unlikely to fit in memory. You can instead read each line one by one, overwriting the previous one:
$fh = fopen($path, "r");
if ($fh) {
    while (($line = fgets($fh)) !== false) {
        //use $line
    }
    fclose($fh);
}
The only way to keep it from timing out is to set the maximum execution time:
set_time_limit(0);
If the file is too large, you need to split it outside of PHP. You can use the exec command safely for this; doing the split purely in the PHP interpreter needs a lot of memory and takes a long time, while the Linux command saves you that on every run.
exec('split -C 20m --numeric-suffixes input_filename output_prefix');
After that you can use DirectoryIterator and read each file.
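A rough sketch of reading the split pieces back with DirectoryIterator (assuming the output_prefix files were written to the current directory):
foreach (new DirectoryIterator('.') as $fileInfo) {
    if ($fileInfo->isFile() && strpos($fileInfo->getFilename(), 'output_prefix') === 0) {
        $fh = fopen($fileInfo->getPathname(), 'r');
        while (($line = fgets($fh)) !== false) {
            //use $line
        }
        fclose($fh);
    }
}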
Regards

Concatenate files in PHP

I'd like to know if there is a faster way of concatenating 2 text files in PHP than the usual way of opening txt1 in a+ mode, reading txt2 line by line, and copying each line to txt1.
If you want to use a pure-PHP solution, you could use file_get_contents to read the whole file in a string and then write that out (no error checking, just to show how you could do it):
$fp1 = fopen("txt1", 'a+');
$file2 = file_get_contents("txt2");
fwrite($fp1, $file2);
It's probably much faster to use the cat program in Linux, if you have command-line permissions for PHP:
system('cat txt1 txt2 > txt3');
$content = file_get_contents("file1");
file_put_contents("file2", $content, FILE_APPEND);
I have found using *nix cat to be the most effective here, but if for whatever reason you don't have access to it, and you are concatenating large files, then you can use this line-by-line function. (Error handling stripped for simplicity.)
function catFiles($arrayOfFiles, $outputPath) {
    $dest = fopen($outputPath, "a");
    foreach ($arrayOfFiles as $f) {
        $FH = fopen($f, "r");
        $line = fgets($FH);
        while ($line !== false) {
            fputs($dest, $line);
            $line = fgets($FH);
        }
        fclose($FH);
    }
    fclose($dest);
}
While the fastest way is undoubtedly to use OS commands like cp or cat, this is hardly advisable for compatibility reasons.
The fastest "PHP only" way is file_get_contents, which reads the whole source file in one shot, but it also has drawbacks: it requires a lot of memory for large files and may therefore fail, depending on the memory assigned to PHP.
A universal, clean and fast solution is to use fread and fwrite with a large buffer.
If the file is smaller than the buffer, all reading happens in one burst, so speed is optimal; otherwise reading happens in big chunks (the size of the buffer), so the overhead is minimal and speed is still quite good.
Reading line by line with fgets instead has to test every character, one by one, to see whether it is a newline.
Also, reading a file with many short lines line by line with fgets will be slower, as you read many little pieces of different sizes, depending on where the newlines fall.
fread is faster because it only checks for EOF (which is easy) and reads the file in fixed-size chunks you decide, so it can be tuned for your OS, disk, or kind of files (say you have many files under 12 KB: set the buffer size to 16 KB so each is read in one shot).
// Code is untested, written on a mobile phone inside Stack Overflow; it comes from various examples online you can also check.
<?php
$BUFFER_SIZE = 1 * 1024 * 1024; // 1 MB, bigger is faster.. depending on file sizes and count
$dest = fopen($fileToAppendTo, "a+");
if (FALSE === $dest) die("Failed to open destination");
$handle = fopen("source.txt", "rb");
if (FALSE === $handle) {
    fclose($dest);
    die("Failed to open source");
}
while (!feof($handle)) {
    fwrite($dest, fread($handle, $BUFFER_SIZE));
}
fclose($handle);
fclose($dest);
?>

Stream reading a file in php

I have a big CSV file (about 30 MB) that I want to read using my PHP program, convert to another format, and save as different small files. When I use the traditional fopen, fwrite methods I get an error that says Fatal error: Allowed memory size of 134217728 bytes exhausted. I am aware that I can set the memory limit in php.ini, but is there any way that I can read the file as a stream so that it won't create much memory overhead? Maybe something like the StreamReader classes in Java?
You could just read the file one line at a time with fgets(), provided you are reassigning your variable each time through (and not storing the lines in an array or something, where they would remain in memory).
One way, with a ~65 MB file:
// load the whole thing
$file = file_get_contents('hugefile.txt');
echo memory_get_peak_usage() / 1024 / 1024, ' MB';
// prints '66.153938293457 MB'
Second way:
// load only one line at a time
$fh = fopen('hugefile.txt', 'r');
while ($line = fgets($fh)) {}
echo memory_get_peak_usage() / 1024 / 1024, ' MB';
// prints '0.62477111816406 MB'
Also, if you want to rearrange the data in a different format, you could parse each line as CSV as you go using fgetcsv() instead.
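A minimal sketch with fgetcsv() (the file name is a placeholder):
$fh = fopen('hugefile.csv', 'r');
while (($row = fgetcsv($fh)) !== false) {
    // $row is an array of the fields on this line
    // convert it and write it out to one of the smaller files here
}
fclose($fh);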

PHP using fwrite and fread with input stream

I'm looking for the most efficient way to write the contents of the PHP input stream to disk, without using much of the memory that is granted to the PHP script. For example, if the max file size that can be uploaded is 1 GB but PHP only has 32 MB of memory.
define('MAX_FILE_LEN', 1073741824); // 1 GB in bytes
$hSource = fopen('php://input', 'r');
$hDest = fopen(UPLOADS_DIR.'/'.$MyTempName.'.tmp', 'w');
fwrite($hDest, fread($hSource, MAX_FILE_LEN));
fclose($hDest);
fclose($hSource);
Does fread inside an fwrite like the above code shows mean that the entire file will be loaded into memory?
For doing the opposite (writing a file to the output stream), PHP offers a function called fpassthru which I believe does not hold the contents of the file in the PHP script's memory.
I'm looking for something similar but in reverse (writing from input stream to file). Thank you for any assistance you can give.
Yep - fread used in that way would read up to 1 GB into a string first, and then write that back out via fwrite. PHP just isn't smart enough to create a memory-efficient pipe for you.
I would try something akin to the following:
$hSource = fopen('php://input', 'r');
$hDest = fopen(UPLOADS_DIR . '/' . $MyTempName . '.tmp', 'w');
while (!feof($hSource)) {
    /*
     * I'm going to read in 1K chunks. You could make this
     * larger, but as a rule of thumb I'd keep it to 1/4 of
     * your php memory_limit.
     */
    $chunk = fread($hSource, 1024);
    fwrite($hDest, $chunk);
}
fclose($hSource);
fclose($hDest);
If you wanted to be really picky, you could also unset($chunk); within the loop after fwrite to absolutely ensure that PHP frees up the memory - but that shouldn't be necessary, as the next loop will overwrite whatever memory is being used by $chunk at that time.
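If you would rather let PHP do the chunked copy internally, stream_copy_to_stream() covers the same pattern; a minimal sketch:
$hSource = fopen('php://input', 'r');
$hDest = fopen(UPLOADS_DIR . '/' . $MyTempName . '.tmp', 'w');
// Copies the source stream to the destination in chunks, without building a huge string
stream_copy_to_stream($hSource, $hDest);
fclose($hSource);
fclose($hDest);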

php read large text file log

I have a text log file, about 600 MB.
I want to read it using php and display the data on a html page, but I only need the last 18 lines that were added each time I run the script.
Since it's a large file, I can't read it all in and then flip the array as I would have hoped. Is there another way?
Use fopen, filesize and fseek to open the file and start reading it only near the end of the file.
Comments on the fseek manual page include full code to read the last X lines of a large file.
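One such approach, as a rough sketch (function name, block size and line count are placeholders, not the exact code from the manual comments): seek backwards in blocks from the end until enough newlines have been collected, then keep only the tail.
function tailLines($path, $lines = 18, $blockSize = 4096) {
    $fh = fopen($path, 'rb');
    if (!$fh) return false;
    fseek($fh, 0, SEEK_END);
    $pos = ftell($fh);
    $buffer = '';
    // walk backwards block by block until we have seen more newlines than we need
    while ($pos > 0 && substr_count($buffer, "\n") <= $lines) {
        $read = min($blockSize, $pos);
        $pos -= $read;
        fseek($fh, $pos);
        $buffer = fread($fh, $read) . $buffer;
    }
    fclose($fh);
    return array_slice(explode("\n", rtrim($buffer, "\n")), -$lines);
}
// e.g. echo implode("\n", tailLines('/logs/log.txt', 18));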
Loading that size file into memory would probably not be a good idea. This should get you around that.
$file = escapeshellarg($file);
$line = 'tail -n 18 '.$file;
system($line);
You can stream it backwards with:
$file = popen("tac $filename", 'r');
while ($line = fgets($file)) {
    echo $line;
}
The best way to do this is to use fseek and fgets to read line by line; this is extremely fast, as only one line is read at a time and not the whole file:
Example of usage would be:
$handle = fopen("/logs/log.txt", "r")
if ($handle)
{
fseek($handle,-18,SEEK_END); //Seek to the end minus 18 lines
while (!feof($handle))
{
echo fgets($handle, 4096); //Make sure your line is less that 4096, otherwise update
$line++;
}
fclose($handle);
}
For the record, had the same problem and tried every solution here.
Turns out Dagon's popen "tac $filename" way is the fastest and the one with the lowest memory and CPU loads.
Tested with a 2 GB log file, reading 500, 1000 and 2000 lines each time. Smooth. Thank you.
