PHP & stress tests - load large file into memory

I have to stress test a URL (a PHP script) using Apache Benchmark, but for each request I need a different set of data to be processed while the URL stays the same. So inside that PHP script I need to read a 3,000,000-line file and pick a random line. That means for each ab request, I need to read that file, get a random line, and then process it.
What method do you recommend?
I was thinking of somehow loading that file into memory once (so it's available to all requests) and then picking a random line from it.
In other words, I need to read one random line from a large file without really "feeling" the cost of reading it.
Thank you!

$fh = fopen($file, 'r');
$stats = fstat($fh);
// jump to a random byte offset within the file
fseek($fh, mt_rand(0, $stats['size'] - 1));
// read the character at that offset
$chr = fread($fh, 1);
// walk backwards until we hit a newline or the start of the file
while ($chr != "\n" && ftell($fh) > 1) {
    // go back one character (plus the one we've just read)
    fseek($fh, -2, SEEK_CUR);
    $chr = fread($fh, 1);
}
// if we stopped at the very first character, rewind so the first line is usable too
if ($chr != "\n") {
    rewind($fh);
}
// we're now just past a newline (or at the start); read the whole next line
$line = fgets($fh);
fclose($fh);
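If you do want to pursue the question's idea of loading the file into memory once and sharing it across requests, an in-memory cache such as APCu is one option. A rough sketch, assuming the apcu extension is available (the cache key and path are placeholders, and 3,000,000 cached lines will still cost a fair amount of shared memory):

// load the dataset once and keep it in shared memory for later requests
$lines = apcu_fetch('dataset_lines');
if ($lines === false) {
    $lines = file('/path/to/dataset.txt', FILE_IGNORE_NEW_LINES);
    apcu_store('dataset_lines', $lines); // subsequent requests skip the file read
}
$randomLine = $lines[mt_rand(0, count($lines) - 1)];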

Related

What is the most efficient PHP way to read first and last line of a file?

I'm trying to open a file and determine whether it is valid. It's valid if the first line is START and the last line is END.
I've seen different ways of getting the last line of a file, but none of them pay particular attention to the first line as well.
How should I go about this? I was thinking of loading the file contents into an array and checking $array[0] and $array[x] for START and END. But this seems wasteful given all the junk that could be in the middle.
If it's a valid file, I will then read/process the contents of the file between START and END.
Don't read the entire file into an array if you don't need to. If the file can be big, you can do it this way:
$h = fopen('text.txt', 'r');
$firstLine = fgets($h);
// jump straight to the end and grab the last few bytes
// (assumes the file does not end with a trailing newline)
fseek($h, -3, SEEK_END);
$lastThreeChars = fgets($h);
fclose($h);
The memory footprint is much lower.
That's from me:
$lines = file($pathToFile, FILE_IGNORE_NEW_LINES);
if ($lines[0] == 'START' && end($lines) == 'END') {
    // do stuff
}
Reading the whole file with fgets() will be efficient enough for small files. If your file is big, then:
open it and read the first line;
use the tail-like function (I didn't test it, but it looks OK) posted in the fseek() documentation comments on php.net.
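A minimal sketch of that combined check (the function name and the 4096-byte tail read are my own assumptions, and it expects the markers to sit on their own lines):

// read the first line, then jump near the end for the last line,
// without loading what's in between
function isValidFile($path) {
    $fh = fopen($path, 'r');
    $firstLine = rtrim(fgets($fh));

    // grab the last few KB and take its last non-empty line
    fseek($fh, max(-4096, -filesize($path)), SEEK_END);
    $tail = stream_get_contents($fh);
    fclose($fh);

    $lines = array_filter(array_map('trim', explode("\n", $tail)), 'strlen');
    $lastLine = (string) end($lines);

    return $firstLine === 'START' && $lastLine === 'END';
}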

What's the best (most efficient) way to search for content in a file and change it with PHP? [duplicate]

I have a file that I'm reading with PHP. I want to look for lines that start with some whitespace and then a keyword I'm looking for (for example, "project_name:"), and then change other parts of that line.
Currently, the way I handle this is to read the entire file into a string variable, manipulate that string, and then write the whole thing back to the file, fully replacing the entire file (via fopen(filepath, "wb") and fwrite()), but this feels inefficient. Is there a better way?
Update: After finishing my function I had time to benchmark it. I used a 1 GB file for testing, but the results were somewhat underwhelming :|
Yes, the peak memory allocation is significantly smaller:
standard solution: 1.86 GB
custom solution: 653 KB (4096-byte buffer size)
But compared to the following solution there is only a slight performance boost:
ini_set('memory_limit', -1);
file_put_contents(
    'test.txt',
    str_replace('the', 'teh', file_get_contents('test.txt'))
);
The script above took ~16 seconds, the custom solution took ~13 seconds.
To sum up: the custom solution is slightly faster on large files and consumes much less memory(!).
Also, if you want to run this in a web server environment, the custom solution is preferable, as many concurrent scripts would otherwise likely consume all of the system's available memory.
Original Answer:
The only thing that comes to mind is to read the file in chunks that fit the file system's block size and write the (possibly modified) content back to a temporary file. After processing is finished, you use rename() to overwrite the original file.
This reduces the peak memory usage and should be significantly faster if the file is really large.
Note: On a Linux system you can get the file system block size using:
sudo dumpe2fs /dev/yourdev | grep 'Block size'
I got 4096.
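If you prefer to query the block size from PHP itself, stat() exposes it on most Unix systems; a small sketch (the 4096 fallback is my own assumption):

// 'blksize' is reported by stat() on most Unix filesystems; it can be -1 elsewhere
$stat = stat('test.txt');
$buffersize = ($stat['blksize'] > 0) ? $stat['blksize'] : 4096;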
Here comes the function:
function freplace($search, $replace, $filename, $buffersize = 4096) {
    $fd1 = fopen($filename, 'r');
    if (!is_resource($fd1)) {
        die('error opening file');
    }
    // the temp file can live anywhere, but it should be on the same
    // partition as the original so the final rename() is a cheap move
    $tmpfile = tempnam('.', uniqid());
    $fd2 = fopen($tmpfile, 'w+');
    // we carry over len(search) - 1 chars from the end of the buffer on each loop;
    // that is the maximum number of chars of the search string that can straddle
    // the border between two buffers
    $tmp = '';
    while (!feof($fd1)) {
        $buffer = fread($fd1, $buffersize);
        // prepend the carry-over from the last iteration
        $buffer = $tmp . $buffer;
        // replace
        $buffer = str_replace($search, $replace, $buffer);
        // keep len(search) - 1 chars from the end of the buffer
        $tmp = substr($buffer, -(strlen($search) - 1));
        // write the processed buffer (minus the carry-over)
        fwrite($fd2, $buffer, strlen($buffer) - strlen($tmp));
    }
    if (!empty($tmp)) {
        fwrite($fd2, $tmp);
    }
    fclose($fd1);
    fclose($fd2);
    rename($tmpfile, $filename);
}
Call it like this:
freplace('foo', 'bar', 'test.txt');

What is a practical way to process, parse, and stream a large text file?

I currently have a log file stream (text format) which is constantly appended to by a running process. I am using PHP to process it into JSON format and then parsing it with jQuery's getJSON.
I am wondering what would be a practical way to fetch the data in the log file. I've tried jQuery's post function, but the file is too long to fetch that way. getJSON works, but the log file eventually gets long enough that PHP can't process it, so nothing gets passed to the function.
I have thought about limiting the number of lines in the log file (tee'd on CentOS), and about fetching a certain number of lines from the log file (impractical for speed), but how would I do so?
To read only the last part of the file, fseek to a good position and start from there. For example:
define('FILE', '/var/log/logfile');
define('SIZE', 1024*1024);
if (filesize(FILE) <= SIZE) {
    $text = file_get_contents(FILE);
} else {
    $fh = fopen(FILE, 'r');
    fseek($fh, -SIZE, SEEK_END);
    // skip up to the next newline to avoid starting on a broken line
    $skip = strlen(fgets($fh));
    $text = fread($fh, SIZE - $skip);
    fclose($fh);
}
// Do your work with $text here...
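For the getJSON use case mentioned in the question, one possible follow-up is to emit that tail as JSON, split into lines; a rough sketch (the 'lines' field name is an assumption):

// hand the recovered tail to the client as JSON
header('Content-Type: application/json');
echo json_encode(array('lines' => explode("\n", trim($text))));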

How to save memory when reading a file in PHP?

I have a 200 KB file that I use on multiple pages, but on each page I need only 1-2 lines of that file, so how can I read only the lines I need if I know the line number?
For example, if I need only the 10th line, I don't want to load all the lines into memory, just the 10th line.
Sorry for my bad English!
Try SplFileObject
echo memory_get_usage(), PHP_EOL; // 333200
$file = new SplFileObject('bible.txt'); // 996kb
$file->seek(5000); // jump to line 5000 (zero-based)
echo $file->current(), PHP_EOL; // output current line
echo memory_get_usage(), PHP_EOL; // 342984 vs 3319864 when using file()
For outputting the current line, you can either use current() or just echo $file; I find it clearer to use the method, though. You can also use fgets(), but that would get the next line.
Of course, you only need the middle three lines. I've added the memory_get_usage() calls just to show that this approach eats almost no memory.
Unless you know the offset of the line, you will need to read every line up to that point. You can just throw away the old lines (the ones you don't want) by looping through the file with something like fgets(). (EDIT: Rather than fgets(), I would suggest @Gordon's solution.)
Possibly a better solution would be to use a database, as the database engine will do the grunt work of storing the strings and allow you to (very efficiently) fetch a certain "line" (it wouldn't be a line but a record with a numeric ID, however it amounts to the same thing) without having to read the records before it.
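A rough sketch of that database idea using SQLite through PDO (requires the pdo_sqlite driver; the database file, table, and column names are assumptions):

// look up "line" 10 by its numeric id instead of scanning a file
$db = new PDO('sqlite:lines.db');
$stmt = $db->prepare('SELECT body FROM lines WHERE id = ?');
$stmt->execute(array(10));
$line = $stmt->fetchColumn();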
Do the contents of the file change? If it's static, or relatively static, you can build a list of offsets where you want to read your data. For instance, if the file changes once a year, but you read it hundreds of times a day, then you can pre-compute the offsets of the lines you want and jump to them directly like this:
$offsets = array();
$fh = fopen('file.txt', 'rb');
for ($lineNo = 1; fgets($fh) !== false; $lineNo++) {
    if ($lineNo == 9)  { $offsets[10] = ftell($fh); } // line 10 starts right here
    if ($lineNo == 19) { $offsets[20] = ftell($fh); } // line 20 starts right here
}
fclose($fh);
and so on. Afterwards, you can trivially jump to that line's location like this:
$fh = fopen('file.txt', 'rb');
fseek($fh, $offsets[20]); // jump straight to line 20
$line20 = fgets($fh);     // and read it
But this could be complete overkill. Try benchmarking the operations: compare how long an old-fashioned "read 20 lines" takes versus precompute/jump.
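A quick-and-dirty way to run that comparison with microtime(), reusing the $offsets array from above (the file name and line count are assumptions):

// time the old-fashioned sequential read of 20 lines...
$t0 = microtime(true);
$fh = fopen('file.txt', 'rb');
for ($i = 0; $i < 20; $i++) {
    fgets($fh);
}
fclose($fh);
echo 'sequential read: ', microtime(true) - $t0, " s\n";

// ...versus jumping straight to the precomputed offset
$t0 = microtime(true);
$fh = fopen('file.txt', 'rb');
fseek($fh, $offsets[20]);
fgets($fh);
fclose($fh);
echo 'offset jump: ', microtime(true) - $t0, " s\n";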
<?php
$lines = array(1, 2, 10);
$handle = @fopen("/tmp/inputfile.txt", "r");
if ($handle) {
    $i = 0;
    while (!feof($handle)) {
        $line = stream_get_line($handle, 1000000, "\n");
        if (in_array($i, $lines)) {
            echo $line;
            $line = ''; // Don't forget to clean the buffer!
        }
        if ($i > end($lines)) {
            break;
        }
        $i++;
    }
    fclose($handle);
}
?>
Just loop through them without storing, e.g.
$i = 1;
$file = fopen('file.txt', 'r');
while (!feof($file)) {
    $line = fgets($file); // this gets the whole line from the file
    if ($i == 10) {
        break; // stop on the tenth line
    }
    $i++;
}
fclose($file);
The above example keeps only the last line it read from the file in memory, so this is the most memory-efficient way to do it.
Use fgets() 10 times :-) That way you will not store all 10 lines in memory.
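A minimal sketch of that (the file name is an assumption):

// call fgets() ten times; after the loop $line holds the 10th line
$fh = fopen('file.txt', 'r');
for ($i = 0; $i < 10; $i++) {
    $line = fgets($fh);
}
fclose($fh);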
Why are you trying to avoid loading the whole file? Do you know that loading all of it is in fact a problem?
If you haven't measured, then you don't know that it's a problem. Don't waste your time optimizing non-problems. Chances are that any performance change you get from not loading the entire 200 KB file will be imperceptible, unless you know for a fact that loading that file is indeed a bottleneck.

Efficient flat file searching in PHP

I'd like to store 0 to ~5000 IP addresses in a plain text file, with an unrelated header at the top. Something like this:
Unrelated data
Unrelated data
----SEPARATOR----
1.2.3.4
5.6.7.8
9.1.2.3
Now I'd like to find out whether '5.6.7.8' is in that text file using PHP. I've only ever loaded an entire file and processed it in memory, but I wondered whether there was a more efficient way of searching a text file in PHP. I only need a true/false answer for whether it's there.
Could anyone shed any light? Or am I stuck with loading the whole file first?
Thanks in advance!
5000 isn't a lot of records. You could easily do this:
$addresses = explode("\n", file_get_contents('filename.txt'));
and search it manually; it'll be quick.
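For completeness, a hedged sketch of that manual search (the file name and example address are assumptions):

// load once and check membership; trivial at ~5000 entries
$addresses = array_map('trim', explode("\n", file_get_contents('filename.txt')));
$found = in_array('5.6.7.8', $addresses, true);
var_dump($found); // bool(true) if the address is listed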
If you were storing a lot more I would suggest storing them in a database, which is designed for that kind of thing. But for 5000 I think the full load plus brute force search is fine.
Don't optimize a problem until you have a problem. There's no point needlessly overcomplicating your solution.
I'm not sure whether Perl's command-line tool needs to load the whole file to handle it, but you could do something similar to this:
<?php
// ...
$result = system("perl -ne 'print if /5\.6\.7\.8/' yourfile.txt");
if ($result) {
    // ...
} else {
    // ...
}
// ...
?>
Another option would be to store the IPs in separate files based on their first or second octet group:
# 1.2.txt
1.2.3.4
1.2.3.5
1.2.3.6
...
# 5.6.txt
5.6.7.8
5.6.7.9
5.6.7.10
...
... etc.
That way you wouldn't necessarily have to worry about the files being so large you incur a performance penalty by loading the whole file into memory.
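A hedged sketch of the lookup for that layout, following the naming in the example above:

// map the address to its bucket file, e.g. "5.6.7.8" -> "5.6.txt", then search only that file
$ip = '5.6.7.8';
list($a, $b) = explode('.', $ip);
$bucketFile = "$a.$b.txt";
$found = file_exists($bucketFile)
    && in_array($ip, array_map('trim', file($bucketFile)), true);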
You could shell out and grep for it.
You might try fgets().
It reads a file line by line. I'm not sure how much more efficient this is, though. I'm guessing that if the IP is towards the top of the file it would be more efficient, and if the IP is towards the bottom it would be less efficient than just reading in the whole file.
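A sketch of that line-by-line scan with an early exit (the function name and file path are assumptions):

// stop reading as soon as the address is found
function ipInFile($needle, $path) {
    $fh = fopen($path, 'r');
    $found = false;
    while (($line = fgets($fh)) !== false) {
        if (trim($line) === $needle) {
            $found = true;
            break;
        }
    }
    fclose($fh);
    return $found;
}

var_dump(ipInFile('5.6.7.8', 'file.txt')); // bool(true) if present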
You could use the grep command with backticks if you're on a Linux server. Something like:
$searchFor = '5.6.7.8';
$file = '/path/to/file.txt';
// -F makes grep treat the pattern as a literal string (the dots are regex wildcards otherwise)
$grepCmd = `grep -F $searchFor $file`;
echo $grepCmd;
I haven't tested this personally, but there is a snippet of code in the PHP manual that is written for large file parsing:
http://www.php.net/manual/en/function.fgets.php#59393
//File to be opened
$file = "huge.file";
//Open file (DON'T USE a+, pointer will be wrong!)
$fp = fopen($file, 'r');
//Read 16 MB chunks
$read = 16777216;
//\n marker
$part = 0;
while (!feof($fp)) {
    $rbuf = fread($fp, $read);
    for ($i = $read; $i > 0 || $n == chr(10); $i--) {
        $n = substr($rbuf, $i, 1);
        if ($n == chr(10)) break;
        //If we are at the end of the file, just grab the rest and stop the loop
        elseif (feof($fp)) {
            $i = $read;
            $buf = substr($rbuf, 0, $i + 1);
            break;
        }
    }
    //This is the buffer we want to do stuff with, maybe throw it to a function?
    $buf = substr($rbuf, 0, $i + 1);
    //Point the marker back to the last \n point
    $part = ftell($fp) - ($read - ($i + 1));
    fseek($fp, $part);
}
fclose($fp);
The snippet was written by the original author: hackajar yahoo com
Are you trying to compare the current IP with the IPs listed in the text file? The unrelated data wouldn't match anyway.
So just use strpos() on the full file contents (file_get_contents()).
<?php
$file = file_get_contents('data.txt');
$pos = strpos($file, $_SERVER['REMOTE_ADDR']);
if ($pos === false) {
    echo "no match for $_SERVER[REMOTE_ADDR]";
} else {
    echo "match for $_SERVER[REMOTE_ADDR]!";
}
?>
