Related
I am reading a file containing around 50k lines using the file() function in Php. However, its giving a out of memory error since the contents of the file are stored in the memory as an array. Is there any other way?
Also, the lengths of the lines stored are variable.
Here's the code. Also the file is 700kB not mB.
private static function readScoreFile($scoreFile)
{
$file = file($scoreFile);
$relations = array();
for($i = 1; $i < count($file); $i++)
{
$relation = explode("\t",trim($file[$i]));
$relation = array(
'pwId_1' => $relation[0],
'pwId_2' => $relation[1],
'score' => $relation[2],
);
if($relation['score'] > 0)
{
$relations[] = $relation;
}
}
unset($file);
return $relations;
}
Use fopen, fread and fclose to read a file sequentially:
$handle = fopen($filename, 'r');
if ($handle) {
while (!feof($handle)) {
echo fread($handle, 8192);
}
fclose($handle);
}
EDIT after update of question and comments to answer of fabjoa:
There is definitely something fishy if a 700kb file eats up 140MB of memory with that code you gave (you could unset $relation at the end of the each iteration though). Consider using a debugger to step through it to see what happens. You might also want to consider rewriting the code to use SplFileObject's CSV functions as well (or their procedural cousins)
SplFileObject::setCsvControl example
$file = new SplFileObject("data.csv");
$file->setFlags(SplFileObject::READ_CSV);
$file->setCsvControl('|');
foreach ($file as $row) {
list ($fruit, $quantity) = $row;
// Do something with values
}
For an OOP approach to iterate over the file, try SplFileObject:
SplFileObject::fgets example
$file = new SplFileObject("file.txt");
while (!$file->eof()) {
echo $file->fgets();
}
SplFileObject::next example
// Read through file line by line
$file = new SplFileObject("misc.txt");
while (!$file->eof()) {
echo $file->current();
$file->next();
}
or even
foreach(new SplFileObject("misc.txt") as $line) {
echo $line;
}
Pretty much related (if not duplicate):
How to save memory when reading a file in Php?
If you don't know the maximum line length and you are not comfortable to use a magic number for the max line length then you'll need to do an initial scan of the file and determine the max line length.
Other than that the following code should help you out:
// length is a large number or calculated from an initial file scan
while (!feof($handle)) {
$buffer = fgets($handle, $length);
echo $buffer;
}
Old question but since I haven't seen anyone mentioning it, PHP generators is a great way to reduce save memory consumption.
For example:
function read($fileName)
{
$fileHandler = fopen($fileName, 'rb');
while(($line = fgets($fileHandler)) !== false) {
yield rtrim($line, "\r\n");
}
fclose($fileHandler);
}
foreach(read(__DIR__ . '/filenameHere') as $line) {
echo $line;
}
allocate more memory during the operation, maybe something like ini_set('memory_limit', '16M');. Don't forget to go back to initial memory allocation once operation is done
Can I read a file in PHP from my end, for example if I want to read last 10-20 lines?
And, as I read, if the size of the file is more than 10mbs I start getting errors.
How can I prevent this error?
For reading a normal file, we use the code :
if ($handle) {
while (($buffer = fgets($handle, 4096)) !== false) {
$i1++;
$content[$i1]=$buffer;
}
if (!feof($handle)) {
echo "Error: unexpected fgets() fail\n";
}
fclose($handle);
}
My file might go over 10mbs, but I just need to read the last few lines. How do I do it?
Thanks
You can use fopen and fseek to navigate in file backwards from end. For example
$fp = #fopen($file, "r");
$pos = -2;
while (fgetc($fp) != "\n") {
fseek($fp, $pos, SEEK_END);
$pos = $pos - 1;
}
$lastline = fgets($fp);
It's not pure PHP, but the common solution is to use the tac command which is the revert of cat and loads the file in reverse. Use exec() or passthru() to run it on the server and then read the results. Example usage:
<?php
$myfile = 'myfile.txt';
$command = "tac $myfile > /tmp/myfilereversed.txt";
exec($command);
$currentRow = 0;
$numRows = 20; // stops after this number of rows
$handle = fopen("/tmp/myfilereversed.txt", "r");
while (!feof($handle) && $currentRow <= $numRows) {
$currentRow++;
$buffer = fgets($handle, 4096);
echo $buffer."<br>";
}
fclose($handle);
?>
It depends how you interpret "can".
If you wonder whether you can do this directly (with PHP function) without reading the all the preceding lines, then the answer is: No, you cannot.
A line ending is an interpretation of the data and you can only know where they are, if you actually read the data.
If it is a really big file, I'd not do that though.
It would be better if you were to scan the file starting from the end, and gradually read blocks from the end to the file.
Update
Here's a PHP-only way to read the last n lines of a file without reading through all of it:
function last_lines($path, $line_count, $block_size = 512){
$lines = array();
// we will always have a fragment of a non-complete line
// keep this in here till we have our next entire line.
$leftover = "";
$fh = fopen($path, 'r');
// go to the end of the file
fseek($fh, 0, SEEK_END);
do{
// need to know whether we can actually go back
// $block_size bytes
$can_read = $block_size;
if(ftell($fh) < $block_size){
$can_read = ftell($fh);
}
// go back as many bytes as we can
// read them to $data and then move the file pointer
// back to where we were.
fseek($fh, -$can_read, SEEK_CUR);
$data = fread($fh, $can_read);
$data .= $leftover;
fseek($fh, -$can_read, SEEK_CUR);
// split lines by \n. Then reverse them,
// now the last line is most likely not a complete
// line which is why we do not directly add it, but
// append it to the data read the next time.
$split_data = array_reverse(explode("\n", $data));
$new_lines = array_slice($split_data, 0, -1);
$lines = array_merge($lines, $new_lines);
$leftover = $split_data[count($split_data) - 1];
}
while(count($lines) < $line_count && ftell($fh) != 0);
if(ftell($fh) == 0){
$lines[] = $leftover;
}
fclose($fh);
// Usually, we will read too many lines, correct that here.
return array_slice($lines, 0, $line_count);
}
Following snippet worked for me.
$file = popen("tac $filename",'r');
while ($line = fgets($file)) {
echo $line;
}
Reference: http://laughingmeme.org/2008/02/28/reading-a-file-backwards-in-php/
If your code is not working and reporting an error you should include the error in your posts!
The reason you are getting an error is because you are trying to store the entire contents of the file in PHP's memory space.
The most effiicent way to solve the problem would be as Greenisha suggests and seek to the end of the file then go back a bit. But Greenisha's mecanism for going back a bit is not very efficient.
Consider instead the method for getting the last few lines from a stream (i.e. where you can't seek):
while (($buffer = fgets($handle, 4096)) !== false) {
$i1++;
$content[$i1]=$buffer;
unset($content[$i1-$lines_to_keep]);
}
So if you know that your max line length is 4096, then you would:
if (4096*lines_to_keep<filesize($input_file)) {
fseek($fp, -4096*$lines_to_keep, SEEK_END);
}
Then apply the loop I described previously.
Since C has some more efficient methods for dealing with byte streams, the fastest solution (on a POSIX/Unix/Linux/BSD) system would be simply:
$last_lines=system("last -" . $lines_to_keep . " filename");
For Linux you can do
$linesToRead = 10;
exec("tail -n{$linesToRead} {$myFileName}" , $content);
You will get an array of lines in $content variable
Pure PHP solution
$f = fopen($myFileName, 'r');
$maxLineLength = 1000; // Real maximum length of your records
$linesToRead = 10;
fseek($f, -$maxLineLength*$linesToRead, SEEK_END); // Moves cursor back from the end of file
$res = array();
while (($buffer = fgets($f, $maxLineLength)) !== false) {
$res[] = $buffer;
}
$content = array_slice($res, -$linesToRead);
If you know about how long the lines are, you can avoid a lot of the black magic and just grab a chunk of the end of the file.
I needed the last 15 lines from a very large log file, and altogether they were about 3000 characters. So I just grab the last 8000 bytes to be safe, then read the file as normal and take what I need from the end.
$fh = fopen($file, "r");
fseek($fh, -8192, SEEK_END);
$lines = array();
while($lines[] = fgets($fh)) {}
This is possibly even more efficient than the highest rated answer, which reads the file character by character, compares each character, and splits based on newline characters.
Here is another solution. It doesn't have line length control in fgets(), you can add it.
/* Read file from end line by line */
$fp = fopen( dirname(__FILE__) . '\\some_file.txt', 'r');
$lines_read = 0;
$lines_to_read = 1000;
fseek($fp, 0, SEEK_END); //goto EOF
$eol_size = 2; // for windows is 2, rest is 1
$eol_char = "\r\n"; // mac=\r, unix=\n
while ($lines_read < $lines_to_read) {
if (ftell($fp)==0) break; //break on BOF (beginning...)
do {
fseek($fp, -1, SEEK_CUR); //seek 1 by 1 char from EOF
$eol = fgetc($fp) . fgetc($fp); //search for EOL (remove 1 fgetc if needed)
fseek($fp, -$eol_size, SEEK_CUR); //go back for EOL
} while ($eol != $eol_char && ftell($fp)>0 ); //check EOL and BOF
$position = ftell($fp); //save current position
if ($position != 0) fseek($fp, $eol_size, SEEK_CUR); //move for EOL
echo fgets($fp); //read LINE or do whatever is needed
fseek($fp, $position, SEEK_SET); //set current position
$lines_read++;
}
fclose($fp);
Well while searching for the same thing, I can across the following and thought it might be useful to others as well so sharing it here:
/* Read file from end line by line */
function tail_custom($filepath, $lines = 1, $adaptive = true) {
// Open file
$f = #fopen($filepath, "rb");
if ($f === false) return false;
// Sets buffer size, according to the number of lines to retrieve.
// This gives a performance boost when reading a few lines from the file.
if (!$adaptive) $buffer = 4096;
else $buffer = ($lines < 2 ? 64 : ($lines < 10 ? 512 : 4096));
// Jump to last character
fseek($f, -1, SEEK_END);
// Read it and adjust line number if necessary
// (Otherwise the result would be wrong if file doesn't end with a blank line)
if (fread($f, 1) != "\n") $lines -= 1;
// Start reading
$output = '';
$chunk = '';
// While we would like more
while (ftell($f) > 0 && $lines >= 0) {
// Figure out how far back we should jump
$seek = min(ftell($f), $buffer);
// Do the jump (backwards, relative to where we are)
fseek($f, -$seek, SEEK_CUR);
// Read a chunk and prepend it to our output
$output = ($chunk = fread($f, $seek)) . $output;
// Jump back to where we started reading
fseek($f, -mb_strlen($chunk, '8bit'), SEEK_CUR);
// Decrease our line counter
$lines -= substr_count($chunk, "\n");
}
// While we have too many lines
// (Because of buffer size we might have read too many)
while ($lines++ < 0) {
// Find first newline and remove all text before that
$output = substr($output, strpos($output, "\n") + 1);
}
// Close file and return
fclose($f);
return trim($output);
}
As Einstein said every thing should be made as simple as possible but no simpler. At this point you are in need of a data structure, a LIFO data structure or simply put a stack.
A more complete example of the "tail" suggestion above is provided here. This seems to be a simple and efficient method -- thank-you. Very large files should not be an issue and a temporary file is not required.
$out = array();
$ret = null;
// capture the last 30 files of the log file into a buffer
exec('tail -30 ' . $weatherLog, $buf, $ret);
if ( $ret == 0 ) {
// process the captured lines one at a time
foreach ($buf as $line) {
$n = sscanf($line, "%s temperature %f", $dt, $t);
if ( $n > 0 ) $temperature = $t;
$n = sscanf($line, "%s humidity %f", $dt, $h);
if ( $n > 0 ) $humidity = $h;
}
printf("<tr><th>Temperature</th><td>%0.1f</td></tr>\n",
$temperature);
printf("<tr><th>Humidity</th><td>%0.1f</td></tr>\n", $humidity);
}
else { # something bad happened }
In the above example, the code reads 30 lines of text output and displays the last temperature and humidity readings in the file (that's why the printf's are outside of the loop, in case you were wondering). The file is filled by an ESP32 which adds to the file every few minutes even when the sensor reports only nan. So thirty lines gets plenty of readings so it should never fail. Each reading includes the date and time so in the final version the output will include the time the reading was taken.
I am reading a file containing around 50k lines using the file() function in Php. However, its giving a out of memory error since the contents of the file are stored in the memory as an array. Is there any other way?
Also, the lengths of the lines stored are variable.
Here's the code. Also the file is 700kB not mB.
private static function readScoreFile($scoreFile)
{
$file = file($scoreFile);
$relations = array();
for($i = 1; $i < count($file); $i++)
{
$relation = explode("\t",trim($file[$i]));
$relation = array(
'pwId_1' => $relation[0],
'pwId_2' => $relation[1],
'score' => $relation[2],
);
if($relation['score'] > 0)
{
$relations[] = $relation;
}
}
unset($file);
return $relations;
}
Use fopen, fread and fclose to read a file sequentially:
$handle = fopen($filename, 'r');
if ($handle) {
while (!feof($handle)) {
echo fread($handle, 8192);
}
fclose($handle);
}
EDIT after update of question and comments to answer of fabjoa:
There is definitely something fishy if a 700kb file eats up 140MB of memory with that code you gave (you could unset $relation at the end of the each iteration though). Consider using a debugger to step through it to see what happens. You might also want to consider rewriting the code to use SplFileObject's CSV functions as well (or their procedural cousins)
SplFileObject::setCsvControl example
$file = new SplFileObject("data.csv");
$file->setFlags(SplFileObject::READ_CSV);
$file->setCsvControl('|');
foreach ($file as $row) {
list ($fruit, $quantity) = $row;
// Do something with values
}
For an OOP approach to iterate over the file, try SplFileObject:
SplFileObject::fgets example
$file = new SplFileObject("file.txt");
while (!$file->eof()) {
echo $file->fgets();
}
SplFileObject::next example
// Read through file line by line
$file = new SplFileObject("misc.txt");
while (!$file->eof()) {
echo $file->current();
$file->next();
}
or even
foreach(new SplFileObject("misc.txt") as $line) {
echo $line;
}
Pretty much related (if not duplicate):
How to save memory when reading a file in Php?
If you don't know the maximum line length and you are not comfortable to use a magic number for the max line length then you'll need to do an initial scan of the file and determine the max line length.
Other than that the following code should help you out:
// length is a large number or calculated from an initial file scan
while (!feof($handle)) {
$buffer = fgets($handle, $length);
echo $buffer;
}
Old question but since I haven't seen anyone mentioning it, PHP generators is a great way to reduce save memory consumption.
For example:
function read($fileName)
{
$fileHandler = fopen($fileName, 'rb');
while(($line = fgets($fileHandler)) !== false) {
yield rtrim($line, "\r\n");
}
fclose($fileHandler);
}
foreach(read(__DIR__ . '/filenameHere') as $line) {
echo $line;
}
allocate more memory during the operation, maybe something like ini_set('memory_limit', '16M');. Don't forget to go back to initial memory allocation once operation is done
The closest I've seen in the PHP docs, is to fread() a given length, but that doesnt specify which line to start from. Any other suggestions?
Yes, you can do that easily with SplFileObject::seek
$file = new SplFileObject('filename.txt');
$file->seek(1000);
for($i = 0; !$file->eof() && $i < 1000; $i++) {
echo $file->current();
$file->next();
}
This is a method from the SeekableIterator interface and not to be confused with fseek.
And because SplFileObject is iterable you can do it even easier with a LimitIterator:
$file = new SplFileObject('longFile.txt');
$fileIterator = new LimitIterator($file, 1000, 2000);
foreach($fileIterator as $line) {
echo $line, PHP_EOL;
}
Again, this is zero-based, so it's line 1001 to 2001.
You not going to be able to read starting from line X because lines can be of arbitrary length. So you will have to read from the start counting the number of lines read to get to line X. For example:
<?php
$f = fopen('sample.txt', 'r');
$lineNo = 0;
$startLine = 3;
$endLine = 6;
while ($line = fgets($f)) {
$lineNo++;
if ($lineNo >= $startLine) {
echo $line;
}
if ($lineNo == $endLine) {
break;
}
}
fclose($f);
Unfortunately, in order to be able to read from line x to line y, you'd need to be able to detect line breaks... and you'd have to scan through the whole file. However, assuming you're not asking about this for performance reasons, you can get lines x to y with the following:
$x = 10; //inclusive start line
$y = 20; //inclusive end line
$lines = file('myfile.txt');
$my_important_lines = array_slice($lines, $x, $y);
See: array_slice
Well, you can't use function fseek to seek the appropriate position because it works with given number of bytes.
I think that it's not possible without some sort of cache or going through lines one after the other.
Here is the possible solution :)
<?php
$f = fopen('sample.txt', 'r');
$lineNo = 0;
$startLine = 3;
$endLine = 6;
while ($line = fgets($f)) {
$lineNo++;
if ($lineNo >= $startLine) {
echo $line;
}
if ($lineNo == $endLine) {
break;
}
}
fclose($f);
?>
If you're looking for lines then you can't use fread because that relies on a byte offset, not the number of line breaks. You actually have to read the file to find the line breaks, so a different function is more appropriate. fgets will read the file line-by-line. Throw that in a loop and capture only the lines you want.
I was afraid of that... I guess it's plan B then :S
For each AJAX request I'm going to:
Read into a string the number of lines I'm going to return to the client.
Copy the rest of the file into a temp file.
Return string to the client.
It's lame, and it will probably be pretty slow with 10,000+ lines files, but I guess it's better than reading the same over and over again, at least the temp file is getting shorter with every request... No?
I have a script which, each time is called, gets the first line of a file. Each line is known to be exactly of the same length (32 alphanumeric chars) and terminates with "\r\n".
After getting the first line, the script removes it.
This is done in this way:
$contents = file_get_contents($file));
$first_line = substr($contents, 0, 32);
file_put_contents($file, substr($contents, 32 + 2)); //+2 because we remove also the \r\n
Obviously it works, but I was wondering whether there is a smarter (or more efficient) way to do this?
In my simple solution I basically read and rewrite the entire file just to take and remove the first line.
I came up with this idea yesterday:
function read_and_delete_first_line($filename) {
$file = file($filename);
$output = $file[0];
unset($file[0]);
file_put_contents($filename, $file);
return $output;
}
There is no more efficient way to do this other than rewriting the file.
No need to create a second temporary file, nor put the whole file in memory:
if ($handle = fopen("file", "c+")) { // open the file in reading and editing mode
if (flock($handle, LOCK_EX)) { // lock the file, so no one can read or edit this file
while (($line = fgets($handle, 4096)) !== FALSE) {
if (!isset($write_position)) { // move the line to previous position, except the first line
$write_position = 0;
} else {
$read_position = ftell($handle); // get actual line
fseek($handle, $write_position); // move to previous position
fputs($handle, $line); // put actual line in previous position
fseek($handle, $read_position); // return to actual position
$write_position += strlen($line); // set write position to the next loop
}
}
fflush($handle); // write any pending change to file
ftruncate($handle, $write_position); // drop the repeated last line
flock($handle, LOCK_UN); // unlock the file
}
fclose($handle);
}
This will shift the first line of a file, you dont need to load the entire file in memory like you do using the 'file' function. Maybe for small files is a bit more slow than with 'file' (maybe but i bet is not) but is able to manage largest files without problems.
$firstline = false;
if($handle = fopen($logFile,'c+')){
if(!flock($handle,LOCK_EX)){fclose($handle);}
$offset = 0;
$len = filesize($logFile);
while(($line = fgets($handle,4096)) !== false){
if(!$firstline){$firstline = $line;$offset = strlen($firstline);continue;}
$pos = ftell($handle);
fseek($handle,$pos-strlen($line)-$offset);
fputs($handle,$line);
fseek($handle,$pos);
}
fflush($handle);
ftruncate($handle,($len-$offset));
flock($handle,LOCK_UN);
fclose($handle);
}
you can iterate the file , instead of putting them all in memory
$handle = fopen("file", "r");
$first = fgets($handle,2048); #get first line.
$outfile="temp";
$o = fopen($outfile,"w");
while (!feof($handle)) {
$buffer = fgets($handle,2048);
fwrite($o,$buffer);
}
fclose($handle);
fclose($o);
rename($outfile,$file);
I wouldn't usually recommend opening up a shell for this sort of thing, but if you're doing this infrequently on really large files, there's probably something to be said for:
$lines = `wc -l myfile` - 1;
`tail -n $lines myfile > newfile`;
It's simple, and it doesn't involve reading the whole file into memory.
I wouldn't recommend this for small files, or extremely frequent use though. The overhead's too high.
You could store positional info into the file itself. For example, the first 8 bytes of the file could store an integer. This integer is the byte offset of the first real line in the file.
So, you never delete lines anymore. Instead, deleting a line means altering the start position. fseek() to it and then read lines as normal.
The file will grow big eventually. You could periodically clean up the orphaned lines to reduce the file size.
But seriously, just use a database and don't do stuff like this.
Here's one way:
$contents = file($file, FILE_IGNORE_NEW_LINES);
$first_line = array_shift($contents);
file_put_contents($file, implode("\r\n", $contents));
There's countless other ways to do that also, but all the methods would involve separating the first line somehow and saving the rest. You cannot avoid rewriting the whole file. An alternative take:
list($first_line, $contents) = explode("\r\n", file_get_contents($file), 2);
file_put_contents($file, implode("\r\n", $contents));
My problem was large files. I just needed to edit, or remove the first line. This was a solution I used. Didn't require to load the complete file in a variable. Currently echos, but you could always save the contents.
$fh = fopen($local_file, 'rb');
echo "add\tfirst\tline\n"; // add your new first line.
fgets($fh); // moves the file pointer to the next line.
echo stream_get_contents($fh); // flushes the remaining file.
fclose($fh);
I think this is best for any file size
$myfile = fopen("yourfile.txt", "r") or die("Unable to open file!");
$ch=1;
while(!feof($myfile)) {
$dataline= fgets($myfile) . "<br>";
if($ch == 2){
echo str_replace(' ', ' ', $dataline)."\n";
}
$ch = 2;
}
fclose($myfile);
The solutions here didn't work performantly for me.
My solution grabs the last line (not the first line, in my case it was not relevant to get the first or last line) from the file and removes that from that file.
This is very quickly even with very large files (>150000000 lines).
function file_pop($file)
{
if ($fp = #fopen($file, "c+")) {
if (!flock($fp, LOCK_EX)) {
fclose($fp);
}
$pos = -1;
$found = 0;
while ($found < 2) {
if (fseek($fp, $pos--, SEEK_END) < 0) { // can not seek to position
rewind($fp); // rewind to the beginnung of the file
break;
};
if (ord(fgetc($fp)) == 10) { // newline
$found++;
}
}
$lastpos = ftell($fp); // get current position of file
$lastline = fgets($fp); // get current line
ftruncate($fp, $lastpos); // truncate file to last position
flock($fp, LOCK_UN); // unlock
fclose($fp); // close the file
return trim($lastline);
}
}
You could use file() method.
Gets the first line
$content = file('myfile.txt');
echo $content[0];