Efficient flat file searching in PHP

I'd like to store 0 to ~5000 IP addresses in a plain text file, with an unrelated header at the top. Something like this:
Unrelated data
Unrelated data
----SEPARATOR----
1.2.3.4
5.6.7.8
9.1.2.3
Now I'd like to find if '5.6.7.8' is in that text file using PHP. I've only ever loaded an entire file and processed it in memory, but I wondered if there was a more efficient way of searching a text file in PHP. I only need a true/false if it's there.
Could anyone shed any light? Or would I be stuck with loading in the whole file first?
Thanks in advance!

5000 isn't a lot of records. You could easily do this:
$addresses = explode("\n", file_get_contents('filename.txt'));
and search it manually and it'll be quick.
If you were storing a lot more I would suggest storing them in a database, which is designed for that kind of thing. But for 5000 I think the full load plus brute force search is fine.
Don't optimize a problem until you have a problem. There's no point needlessly overcomplicating your solution.
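As a rough sketch of the "load everything and search" approach for the file layout in the question: the helper name ipInFile and the default separator string are assumptions made for illustration, and the file is assumed to use "\n" line endings.

```php
<?php
// Hypothetical helper: returns true if $ip is listed after the separator line.
function ipInFile(string $path, string $ip, string $separator = '----SEPARATOR----'): bool
{
    // file() returns one array element per line; the flags drop line endings and blank lines.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return false; // unreadable file: treat as "not found"
    }
    // Skip the unrelated header: keep only the lines after the separator.
    $start = array_search($separator, $lines, true);
    $addresses = ($start === false) ? $lines : array_slice($lines, $start + 1);
    // Strict comparison so only an exact, whole-line match counts.
    return in_array($ip, $addresses, true);
}
```

Calling ipInFile('filename.txt', '5.6.7.8') then gives the plain true/false the question asks for.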

I'm not sure if perl's command line tool needs to load the whole file to handle it, but you could do something similar to this:
<?php
...
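// -n reads the file line by line without printing; print the first exact match and stop
$script = 'if (/^5\.6\.7\.8$/) { print; exit }';
$result = exec('perl -ne ' . escapeshellarg($script) . ' yourfile.txt');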
if ($result)
....
else
....
...
?>
Another option would be to store the IPs in separate files, named after the first two groups of the address:
# 1.2.txt
1.2.3.4
1.2.3.5
1.2.3.6
...
# 5.6.txt
5.6.7.8
5.6.7.9
5.6.7.10
...
... etc.
That way you wouldn't necessarily have to worry about the files being so large you incur a performance penalty by loading the whole file into memory.
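A lookup against such split files could look roughly like this. The function names shardFileFor and ipInShard are made up for the example, and the sketch assumes well-formed IPv4 addresses and files named after the first two groups, as in the listing above.

```php
<?php
// Hypothetical helper: maps an IP to the file named after its first two groups,
// e.g. "5.6.7.8" -> "<dir>/5.6.txt". Assumes a well-formed IPv4 address.
function shardFileFor(string $ip, string $dir = '.'): string
{
    $groups = explode('.', $ip);
    return $dir . '/' . $groups[0] . '.' . $groups[1] . '.txt';
}

// Hypothetical helper: only the one small file that could contain the IP is loaded.
function ipInShard(string $ip, string $dir = '.'): bool
{
    $path = shardFileFor($ip, $dir);
    if (!is_file($path)) {
        return false; // no file for this prefix means the IP was never stored
    }
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return $lines !== false && in_array($ip, $lines, true);
}
```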

You could shell out and grep for it.

You might try fgets()
It reads a file line by line. I'm not sure how much more efficient this is though. I'm guessing that if the IP was towards the top of the file it would be more efficient and if the IP was towards the bottom it would be less efficient than just reading in the whole file.
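A minimal sketch of the line-by-line variant, stopping as soon as the address is found. The function name ipInFileStreaming and the separator default are assumptions; only one line is held in memory at a time.

```php
<?php
// Hypothetical helper: scans the file with fgets() and stops at the first match.
function ipInFileStreaming(string $path, string $ip, string $separator = '----SEPARATOR----'): bool
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return false;
    }
    $pastHeader = false;
    $found = false;
    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n"); // fgets() keeps the line ending
        if (!$pastHeader) {
            // Ignore everything up to and including the separator line.
            $pastHeader = ($line === $separator);
            continue;
        }
        if ($line === $ip) {
            $found = true;
            break; // early exit: the rest of the file is never read
        }
    }
    fclose($handle);
    return $found;
}
```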

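You could use the grep command with backticks in your PHP script on a Linux server. Something like:
$searchFor = '5.6.7.8';
$file = '/path/to/file.txt';
// Escape both values before they reach the shell
$searchArg = escapeshellarg($searchFor);
$fileArg = escapeshellarg($file);
// -F treats the dots literally, -x only matches whole lines
$grepCmd = `grep -Fx -- $searchArg $fileArg`;
echo $grepCmd;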

I haven't tested this personally, but there is a snippet of code in the PHP manual that is written for large file parsing:
http://www.php.net/manual/en/function.fgets.php#59393
//File to be opened
$file = "huge.file";
//Open file (DON'T USE a+ pointer will be wrong!)
$fp = fopen($file, 'r');
//Read 16meg chunks
$read = 16777216;
while (!feof($fp)) {
    $rbuf = fread($fp, $read);
    //Nothing left to read
    if ($rbuf === false || $rbuf === '') break;
    //Position of the last \n in this chunk
    $i = strrpos($rbuf, "\n");
    if ($i === false || feof($fp)) {
        //No \n in the chunk, or we are at the end of the file: just grab the rest
        $buf = $rbuf;
    } else {
        //Cut the chunk after the last \n and point the marker back to that position
        $buf = substr($rbuf, 0, $i + 1);
        fseek($fp, $i + 1 - strlen($rbuf), SEEK_CUR);
    }
    //$buf is the buffer we want to do stuff with, maybe throw to a function?
}
fclose($fp);
The snippet is adapted from one written by the original author: hackajar yahoo com

Are you trying to compare the current IP with the IPs listed in the text file? The unrelated data wouldn't match anyway.
So just use strpos on the full file contents (file_get_contents). Wrapping both the contents and the IP in "\n" makes sure only a whole line matches, so that 1.2.3.4 is not found inside 11.2.3.45 (this assumes "\n" line endings).
<?php
$file = "\n" . file_get_contents('data.txt') . "\n";
$pos = strpos($file, "\n" . $_SERVER['REMOTE_ADDR'] . "\n");
if ($pos === false) {
    echo "no match for $_SERVER[REMOTE_ADDR]";
}
else {
    echo "match for $_SERVER[REMOTE_ADDR]!";
}
?>

Related

Is it good to use system calls to write and read a file in php ?

I need to keep a 20k to 30k register file with a simple key:value pair per line.
It has to be a file, since other instances will also use it.
I will then need to find a specific key to get its value, and also write a key:value pair to the file.
I was wondering which of the following methods is faster, better, or considered good practice.
To write to the file, I know of three ways.
First, with fopen and fwrite:
$fh = fopen('myfile.txt', 'a') or die("can't open file");
fwrite($fh, "key:value\n");
fclose($fh);
Second, with file_put_contents:
file_put_contents('myfile.txt', "key:value\n", FILE_APPEND);
And third, using a system call:
exec("echo key:value >> myfile.txt");
To read the file and find a line, I can do the following.
Using file_get_contents
$filename = 'info.txt';
$contents = file_get_contents($filename);
// file_get_contents returns one string, so split it into lines first
foreach (explode("\n", $contents) as $line) {
    $pos = strpos($line, $key);
}
Using file
$filename = 'info.txt';
$contents = file($filename);
foreach($contents as $line) {
$pos = strpos($line, $key);
}
And with a system call:
exec("grep " . escapeshellarg($key) . " info.txt | wc -l", $result);

I guess you already considered using a database? Because otherwise you are reinventing the wheel. A database has all the advantages with fast-seeking and row-level locking.
If you are using a file, you have to build this by yourself.
I strongly advise switching to some kind of database.
BTW, you don't mention if you are replacing values or just appending to the file.
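If you do stay with a file, the locking you would have to build yourself could be sketched roughly as below. The helper names appendEntry and findValue are invented for this example, and it assumes entries are only ever appended, so the last line written for a key is the current value.

```php
<?php
// Hypothetical helper: appends one "key:value" line under an exclusive lock.
function appendEntry(string $path, string $key, string $value): bool
{
    // LOCK_EX makes file_put_contents() hold an exclusive lock while it writes.
    return file_put_contents($path, "$key:$value\n", FILE_APPEND | LOCK_EX) !== false;
}

// Hypothetical helper: returns the most recently appended value for $key, or null.
function findValue(string $path, string $key): ?string
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return null;
    }
    flock($handle, LOCK_SH); // shared lock: readers wait for a writer to finish
    $value = null;
    while (($line = fgets($handle)) !== false) {
        // Split on the first colon only, so values may contain colons.
        $parts = explode(':', rtrim($line, "\r\n"), 2);
        if (count($parts) === 2 && $parts[0] === $key) {
            $value = $parts[1]; // keep scanning so the last appended value wins
        }
    }
    flock($handle, LOCK_UN);
    fclose($handle);
    return $value;
}
```

Every lookup still reads the whole file, which is exactly the seeking work a database would do for you.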

Concatenate files in PHP

I'd like to know if there is a faster way of concatenating 2 text files in PHP, than the usual way of opening txt1 in a+, reading txt2 line by line and copying each line to txt1.

If you want to use a pure-PHP solution, you could use file_get_contents to read the whole file in a string and then write that out (no error checking, just to show how you could do it):
$fp1 = fopen("txt1", 'a+');
$file2 = file_get_contents("txt2");
fwrite($fp1, $file2);

It's probably much faster to use the cat program in linux if you have command line permissions for PHP
system('cat txt1 txt2 > txt3');

$content = file_get_contents("file1");
file_put_contents("file2", $content, FILE_APPEND);

I have found using *nix cat to be the most effective here, but if for whatever reason you don't have access to it, and you are concatenating large files, then you can use this line by line function. (Error handling stripped for simplicity).
function catFiles($arrayOfFiles, $outputPath) {
$dest = fopen($outputPath,"a");
foreach ($arrayOfFiles as $f) {
$FH = fopen($f,"r");
$line = fgets($FH);
while ($line !== false) {
fputs($dest,$line);
$line = fgets($FH);
}
fclose($FH);
}
fclose($dest);
}

While the fastest way is undoubtedly to use OS commands, like cp or cat, this is hardly advisable for compatibility.
The fastest "PHP only" way is file_get_contents, which reads the whole source file in one shot, but it also has some drawbacks: it requires a lot of memory for large files, so it may fail depending on the memory limit assigned to PHP.
A universal, clean and fast solution is to use fread and fwrite with a large buffer.
If the file is smaller than the buffer, all reading happens in one burst, so speed is optimal; otherwise reading happens in big chunks (the size of the buffer), so the overhead is minimal and speed is quite good.
Reading line by line with fgets, by contrast, has to test every character, one by one, to see whether it is a line ending.
Also, reading a file with many short lines using fgets will be slower, as you read many little pieces of different sizes, depending on where the newlines are positioned.
fread is faster as it only checks for EOF (which is easy) and reads files using a fixed-size chunk you decide, so it can be made optimal for your OS, disk, or kind of files (say you have many files under 12k: you can set the buffer size to 16k so they are all read in one shot).
// Code is untested, written on a mobile phone inside Stack Overflow; it comes from various examples online that you can also check.
<?php
$BUFFER_SIZE = 1 * 1024 * 1024; // 1MB, bigger is faster.. depending on file sizes and count
// $fileToAppendTo must hold the path of the destination file
$dest = fopen($fileToAppendTo, "a+");
if (FALSE === $dest) die("Failed to open destination");
$handle = fopen("source.txt", "rb");
if (FALSE === $handle) {
    fclose($dest);
    die("Failed to open source");
}
// Copy one buffer-sized chunk at a time until the source is exhausted
while (!feof($handle)) {
    fwrite($dest, fread($handle, $BUFFER_SIZE));
}
fclose($handle);
fclose($dest);
?>
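As a possible alternative to the manual loop, PHP's built-in stream_copy_to_stream() does the chunked copy internally. This is a different technique from the fread/fwrite loop above, shown only as a sketch; the function name appendFile is an assumption.

```php
<?php
// Hypothetical helper: appends the contents of $source to the end of $dest.
function appendFile(string $source, string $dest): bool
{
    $in = fopen($source, 'rb');
    if ($in === false) {
        return false;
    }
    $out = fopen($dest, 'ab'); // append mode: existing content is kept
    if ($out === false) {
        fclose($in);
        return false;
    }
    // Copies the rest of $in to $out in chunks, without loading the whole file.
    $copied = stream_copy_to_stream($in, $out);
    fclose($in);
    fclose($out);
    return $copied !== false;
}
```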

Which method is better? Hashing each line in a file with PHP

This question was asked on a message board, and I want to get a definitive answer and intelligent debate about which method is more semantically correct and less resource intensive.
Say I have a file with each line in that file containing a string. I want to generate an MD5 hash for each line and write it to the same file, overwriting the previous data. My first thought was to do this:
$file = 'strings.txt';
$lines = file($file);
$handle = fopen($file, 'w+');
foreach ($lines as $line)
{
fwrite($handle, md5(trim($line))."\n");
}
fclose($handle);
Another user pointed out that file_get_contents() and file_put_contents() were better than using fwrite() in a loop. Their solution:
$thefile = 'strings.txt';
$newfile = 'newstrings.txt';
$current = file_get_contents($thefile);
// Double quotes are needed for "\n" to mean a newline; split the contents, not the file name
$explodedcurrent = explode("\n", $current);
$temp = '';
foreach ($explodedcurrent as $string)
    $temp .= md5(trim($string)) . "\n";
$newfile = file_put_contents($newfile, $temp);
My argument is that since the main goal of this is to get the file into an array, and file_get_contents() is the preferred way to read the contents of a file into a string, file() is more appropriate and allows us to cut out another unnecessary function, explode().
Furthermore, by directly manipulating the file using fopen(), fwrite(), and fclose() (which is the exact same as one call to file_put_contents()) there is no need to have extraneous variables in which to store the converted strings; you're writing them directly to the file.
My method is the exact same as the alternative - the same number of opens/closes on the file - except mine is shorter and more semantically correct.
What do you have to say, and which one would you choose?
This should be more efficient and less resource-intensive than the previous two methods:
$file = 'passwords.txt';
$passwords = file($file);
$converted = fopen($file, 'w+');
$i = 0;
while (count($passwords) > 0)
{
    fwrite($converted, md5(trim($passwords[$i])) . "\n");
    // Free each line as soon as it has been written
    unset($passwords[$i]);
    $i++;
}
fclose($converted);
echo 'Done.';

As one of the comments suggests, do what makes more sense to you, since you might come back to this code in a few months and will want to spend the least amount of time trying to understand it.
However, if speed is your concern, I would create two test cases (you pretty much already have them) and time them: store a timestamp in a variable at the beginning of the script, then subtract it from the timestamp at the end of the script to work out how long the script took to run. Prepare a few files; I would go for about three, two extremes and one normal file, to see which version runs faster.
http://php.net/manual/en/function.time.php
I would think that differences would be marginal, but it also depends on your file sizes.
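A timing harness along those lines might look like the sketch below. It uses microtime(true) rather than time(), because time() only has one-second resolution; the helper name timeIt and the number of runs are assumptions.

```php
<?php
// Hypothetical helper: runs $callback $runs times and returns the average seconds per run.
function timeIt(callable $callback, int $runs = 5): float
{
    $start = microtime(true); // current Unix time as a float, with microseconds
    for ($i = 0; $i < $runs; $i++) {
        $callback();
    }
    return (microtime(true) - $start) / $runs;
}
```

You would wrap each version in a closure, for example timeIt(function () { /* file() + fwrite() version */ }), and compare the returned averages for the small, normal and large test files.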

I'd propose to write a new temporary file, while you process the input one. Once done, overwrite the input file with the temporary one.
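A rough sketch of that temporary-file approach, applied to the MD5 example from the question. The function name hashLinesInPlace is made up; the temporary file is created in the same directory so that rename() does not have to cross filesystems.

```php
<?php
// Hypothetical helper: replaces every line of $path with its MD5 hash,
// going through a temporary file so the input is never half-overwritten.
function hashLinesInPlace(string $path): bool
{
    $in = fopen($path, 'r');
    if ($in === false) {
        return false;
    }
    $tmpPath = tempnam(dirname($path), 'md5');
    $out = ($tmpPath === false) ? false : fopen($tmpPath, 'w');
    if ($out === false) {
        fclose($in);
        return false;
    }
    // Only one line is held in memory at a time.
    while (($line = fgets($in)) !== false) {
        fwrite($out, md5(trim($line)) . "\n");
    }
    fclose($in);
    fclose($out);
    // Overwrite the input file with the finished temporary file.
    return rename($tmpPath, $path);
}
```

Note that the replaced file takes on the permissions of the temporary file, so you may want to chmod() it afterwards.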

How to pass a file as an argument to php exec?

I would like to know how I can pass the content of a file (csv in my case) as an argument for a command line executable (in C or Objective C) to be called by exec in php.
Here is what I have done: the user loads the content of its file from an URL like this:
http://www.myserver.com/model.php?fileName=test.csv
Then the following code allows php to parse and load the csv file:
<?php
$f = $_GET['fileName'];
$handle = fopen("$f", "r");
$data = array();
while (($line = fgetcsv($handle)) !== FALSE) {
$data[] = $line;
}
?>
Where I'm stuck is how to pass the content of this csv file as an argument to exec. Even if I can assume the csv is known to have only two columns, the number of rows is user-specific, so I cannot pass all the values one by one as parameters, e.g.
exec("/path_to_executable/model -a {$data[0][0]} -b {$data[0][1]} .....");
The only alternative solution, I guess, would be to write something like this:
exec("/path_to_executable/model -fileName test.csv");
and have the command line executable do the csv parsing, but in that case I think the csv file needs to be physically written on the server side. I'm wondering what happens if several people access the webpage at the same time, each with their own csv file: would they overwrite each other's files?
I guess there must be a more proper way to do this that I have not figured out. Any ideas? Thanks!

I would recommend having that data on disk, and loading it within the command line utility - it is much less messing about. But if you can't do that, just pass it in one (unparsed) line at a time, assuming $fileData holds the raw lines of the file:
$command = "/path_to_executable/model";
foreach ($fileData as $line) {
    // escapeshellarg() adds its own quotes, so no extra quoting is needed
    $command .= ' ' . escapeshellarg($line);
}
exec($command);
Then you can just fetch the data into your utility by looping over argv, where argv[1] is the first line, argv[2] is the second line, and so on (argv[0] is the program name).

You could use popen() to get a handle on the process to write to. If you need to go both ways (read/write) and might require some more power, have a look at proc_open().
You could also just write your data to some random file (to avoid multiple users kicking each other's race-conditioned butts). Something along the lines of
<?php
$csv = file_get_contents('http://www.myserver.com/model.php?fileName=test.csv');
$filename = '/tmp/' . uniqid(sha1($csv)) . '.csv';
file_put_contents($filename, $csv);
exec('/your/thing < '. escapeshellarg($filename));
unlink($filename);
And since you're also in charge of the executable, you might figure out how to get the number of arguments passed (hint: argc) and read them in (hint: argv). Passing them through line-based like so:
<?php
$csvRow = fgetcsv($fh);
if ($csvRow) {
$escaped = array_map('escapeshellarg', $csvRow);
exec('/your/thing '. join(' ', $escaped));
}
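The proc_open() route mentioned at the start of this answer could be sketched as follows: the csv content is written to the executable's standard input, so no file has to be created on the server. The helper name pipeToCommand is an assumption, and the executable is assumed to read its data from stdin.

```php
<?php
// Hypothetical helper: feeds $input to $command on stdin and returns what it prints.
function pipeToCommand(string $command, string $input): ?string
{
    $descriptors = [
        0 => ['pipe', 'r'], // the child reads its stdin from this pipe
        1 => ['pipe', 'w'], // the child writes its stdout to this pipe
    ];
    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        return null;
    }
    fwrite($pipes[0], $input);
    fclose($pipes[0]); // closing stdin tells the child that the input is complete
    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($process);
    return $output === false ? null : $output;
}
```

For very large inputs this simple write-then-read order can block if the child produces a lot of output before it has consumed all of its input, so treat it as a starting point only.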

How to save memory when reading a file in Php?

I have a 200kb file that I use on multiple pages, but on each page I need only 1-2 lines of that file. How can I read only the lines I need if I know the line number?
For example, if I need only the 10th line, I don't want to load all the lines into memory, just the 10th line.
Sorry for my bad English!

Try SplFileObject
echo memory_get_usage(), PHP_EOL; // 333200
$file = new SplFileObject('bible.txt'); // 996kb
$file->seek(5000); // jump to line 5000 (zero-based)
echo $file->current(), PHP_EOL; // output current line
echo memory_get_usage(), PHP_EOL; // 342984 vs 3319864 when using file()
For outputting the current line, you can either use current() or just echo $file. I find it clearer to use the method though. You can also use fgets(), but that would get the next line.
Of course, you only need the middle three lines. I've added the memory_get_usage calls just to prove this approach does eat almost no memory.

Unless you know the offset of the line, you will need to read every line up to that point. You can just throw away the old lines (that you don't want) by looping through the file with something like fgets(). (EDIT: Rather than fgets(), I would suggest @Gordon's solution)
Possibly a better solution would be to use a database, as the database engine will do the grunt work of storing the strings and allow you to (very efficiently) get a certain "line" (It wouldn't be a line but a record with an numeric ID, however it amounts to the same thing) without having to read the records before it.

Do the contents of the file change? If it's static, or relatively static, you can build a list of offsets where you want to read your data. For instance, if the file changes once a year, but you read it hundreds of times a day, then you can pre-compute the offsets of the lines you want and jump to them directly like this:
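$offsets = array();
$filehandle = fopen('file.txt', 'rb');
$lineNumber = 1;
$offsets[$lineNumber] = 0; // line 1 starts at byte 0
while (fgets($filehandle) !== false) {
    // after reading line N, the pointer sits at the start of line N+1
    $offsets[++$lineNumber] = ftell($filehandle);
}
fclose($filehandle);
Store the offsets you need, such as $offsets[10] and $offsets[20]. Afterwards, you can trivially jump to that line's location like this: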
$fh = fopen('file.txt', 'rb');
fseek($fh, $offsets[20]); // jump to line 20
But this could be entirely overkill. Try benchmarking the operations - compare how long it takes to do an old-fashioned "read 20 lines" versus precompute/jump.

<?php
$lines = array(1, 2, 10);
$handle = @fopen("/tmp/inputfile.txt", "r");
if ($handle) {
$i = 0;
while (!feof($handle)) {
$line = stream_get_line($handle, 1000000, "\n");
if (in_array($i, $lines)) {
echo $line;
$line = ''; // Don't forget to clean the buffer!
}
if ($i > end($lines)) {
break;
}
$i++;
}
fclose($handle);
}
?>

Just loop through them without storing, e.g.
$i = 1;
$file = fopen('file.txt', 'r');
while (!feof($file)) {
$line = fgets($file); // this gets whole line from the file;
if ($i == 10) {
break; // break on tenth line
}
$i ++;
}
The above example would keep memory for only the last line it got from the file, so this is the most memory efficient way to do it.

use fgets(). 10 times :-) in this case you will not store all 10 lines in the memory

Why are you only trying to load the first ten lines? Do you know that loading all those lines is in fact a problem?
If you haven't measured, then you don't know that it's a problem. Don't waste your time optimizing for non-problems. Chances are that any performance change you'll have in not loading the entire 200K file will be imperceptible, unless you know for a fact that loading that file is indeed a bottleneck.
