I come from a C and C++ background but also play with some web stuff. All us C folks (hopefully) know that calling feof on a FILE* before doing a read is an error. This is something that stings newbies to C and C++ very often. Is this also the case for PHP's implementation?
I figure it has to be, because the file could be a socket or anything else where it is impossible to know the size before finishing reading. But just about every PHP example I've seen (even those found on php.net) looks something like this (and alarms go off in my head):
$f = fopen("whatever.txt", "rb");
while (!feof($f)) {
    echo fgets($f);
}
fclose($f);
I know it is preferable to write it like this and avoid the issue:
$f = fopen("whatever.txt", "rb");
while ($line = fgets($f)) {
    echo $line;
}
fclose($f);
but that's beside the point. I tried testing whether things would fail if I did it "the wrong way", but I could not get it to cause incorrect behavior. This isn't exactly scientific, but I figured it was worth a try.
So, is it incorrect to call feof before an fread in PHP?
There are a couple of ways that PHP could have done this differently than C's version, but I feel they have downsides.
they could have it default to !EOF. This is sub-optimal because it may be incorrect for some corner cases.
they could get the file size during an fopen call, but this couldn't work on all types of file resources, yielding inconsistent behavior and would be slower.
PHP doesn't know whether it's at the end of the file until you've tried to read from it. Try this simple example:
<?php
$fp = fopen("/dev/null","rb");
while (!feof($fp)) {
    $data = fread($fp, 1);
    echo "read " . strlen($data) . " bytes";
}
fclose($fp);
?>
You will get one line reading read 0 bytes: feof() returned false even though you were already at the end of the file. Usually this doesn't cause a problem, because fread($fp, 1) returns no data and whatever processing you're doing handles empty data just fine. If you really do need to know whether you're at the end of the file, you have to do a read first.
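If you would rather avoid that extra iteration entirely, a minimal sketch of the read-first pattern (the same idea as the fgets() loop in the question, just with fread()) might look like this:
<?php
$fp = fopen("/dev/null", "rb");
// Key the loop off the read itself: an empty (or false) result ends it,
// so the spurious extra iteration from the feof() version never happens.
while (($data = fread($fp, 8192)) !== false && $data !== '') {
    echo "read " . strlen($data) . " bytes\n";
}
fclose($fp);
?>
For /dev/null this prints nothing at all, which is exactly the behavior the feof()-first loop fails to give you.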
This may be more of a question than an answer, but why not use file_get_contents()? In my experience, if you are reading a file or a stream, this function does the dirty work for you (assuming you want to read the entire resource to a string, or know/can compute a limit and offset). Its sister function file_put_contents() works well, in reverse.
For instance, here's an example:
$expected_content = "Hello Stack Overflow!"
$real_content = file_get_contents("/path/to/file.txt");
if ($expected_content != $real_content){
file_put_contents("/path/to/file.txt", $real_content);
}
or a stream:
$expected_content = "Hello Stack Overflow!"
$real_content = file_get_contents("http://host.domain.com/file.txt");
if ($expected_content != $real_content){
$options = array('ftp' => array('overwrite' => true));
$stream = stream_context_create($options);
file_put_contents("ftp://user:pass#host.domain.com/file.txt", $real_content, 0, $stream);
}
Then you don't need to worry about EOF or anything; the function does it for you (an FTP put gets a bit dicey, but that's OK). Of course, this won't work in all situations...
Is there something that I'm missing from the original question that makes this approach unfeasible?
Your assertion that feof must not be called before an fread is not correct for PHP; consequently the question is not valid.
Related
I'd like to know if there is a faster way of concatenating two text files in PHP than the usual way of opening txt1 in a+ mode, reading txt2 line by line, and copying each line to txt1.
If you want to use a pure-PHP solution, you could use file_get_contents to read the whole file into a string and then write that out (no error checking, just to show how you could do it):
$fp1 = fopen("txt1", 'a+');
$file2 = file_get_contents("txt2");
fwrite($fp1, $file2);
It's probably much faster to use the cat program on Linux, if you have command-line permissions for PHP:
system('cat txt1 txt2 > txt3');
$content = file_get_contents("file1");
file_put_contents("file2", $content, FILE_APPEND);
I have found using *nix cat to be the most effective here, but if for whatever reason you don't have access to it and you are concatenating large files, you can use this line-by-line function. (Error handling stripped for simplicity.)
function catFiles($arrayOfFiles, $outputPath) {
    $dest = fopen($outputPath, "a");
    foreach ($arrayOfFiles as $f) {
        $FH = fopen($f, "r");
        $line = fgets($FH);
        while ($line !== false) {
            fputs($dest, $line);
            $line = fgets($FH);
        }
        fclose($FH);
    }
    fclose($dest);
}
While the fastest way is undoubtedly to use OS commands like cp or cat, this is hardly advisable for compatibility reasons.
The fastest "PHP only" way is using file_get_contents, that reads the whole source file, in one shot but it also has some drawbacks. It will require a lot of memory for large files and for this reason it may fail depending on the memory assigned to PHP.
A universal clean and fast solution is to use fread and fwrite with a large buffer.
If the file is smaller than the buffer, all reading happens in one burst, so speed is optimal; otherwise reading happens in big chunks (the size of the buffer), so the overhead is minimal and speed is quite good.
Reading line by line with fgets, by contrast, has to test every character, one by one, to see whether it's a newline.
Also, reading a file with many short lines line by line with fgets will be slower, as you will read many little pieces of different sizes, depending on where the newlines are positioned.
fread is faster, as it only checks for EOF (which is easy) and reads files using a fixed-size chunk you decide, so it can be made optimal for your OS, disk, or kind of files (say you have many files under 12 kB; you can set the buffer size to 16 kB so they are all read in one shot).
// Code is untested (written on a mobile phone inside Stack Overflow) and comes from various examples online that you can also check.
<?php
$BUFFER_SIZE = 1*1024*1024; // 1MB, bigger is faster.. depending on file sizes and count

$dest = fopen($fileToAppendTo, "a+");
if (FALSE === $dest) die("Failed to open destination");

$handle = fopen("source.txt", "rb");
if (FALSE === $handle) {
    fclose($dest);
    die("Failed to open source");
}

while (!feof($handle)) {
    fwrite($dest, fread($handle, $BUFFER_SIZE));
}

fclose($handle);
fclose($dest);
?>
I'm using fsockopen to connect to an OpenVAS manager and send XML. The code I am using is:
$connection = fsockopen('ssl://'.$server_data['host'], $server_data['port']);
stream_set_timeout($connection, 5);
fwrite($connection, $xml);
while ($chunk = fread($connection, 2048)) {
    $response .= $chunk;
}
However, after reading the first two chunks of data, PHP hangs on fread and doesn't time out after 5 seconds. I have tried using stream_get_contents, which gives the same result, BUT if I only use one fread, it works OK; it's just that I want to read everything, regardless of length.
I am guessing it is an issue with OpenVAS, which doesn't end the stream the way PHP expects it to, but that's a shot in the dark. How do I read the stream?
I believe that fread is hanging up because, on that last chunk, it is expecting 2048 bytes of information and is probably getting less than that, so it waits until it times out.
You could try to refactor your code like this:
$bytes_to_read = 2048;
while ($chunk = fread($connection, $bytes_to_read)) {
    $response .= $chunk;
    $status = socket_get_status($connection);
    $bytes_to_read = $status["unread_bytes"];
}
That way, you'll read everything in two chunks.... I haven't tested this code, but I remember having a similar issue a while ago and fixing it with something like this.
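If the loop still blocks, another hedged option (untested against OpenVAS) is to keep the stream_set_timeout() from the question and stop reading once the stream reports a timeout:
$response = '';
while (!feof($connection)) {
    $chunk = fread($connection, 2048);
    if ($chunk === false) {
        break;                         // read error
    }
    $response .= $chunk;

    // With stream_set_timeout() in effect, a read that waited too long
    // sets the 'timed_out' flag instead of blocking forever.
    $meta = stream_get_meta_data($connection);
    if ($meta['timed_out']) {
        break;
    }
}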
Hope it helps!
This question was asked on a message board, and I want to get a definitive answer and intelligent debate about which method is more semantically correct and less resource intensive.
Say I have a file with each line in that file containing a string. I want to generate an MD5 hash for each line and write it to the same file, overwriting the previous data. My first thought was to do this:
$file = 'strings.txt';
$lines = file($file);
$handle = fopen($file, 'w+');
foreach ($lines as $line)
{
    fwrite($handle, md5(trim($line))."\n");
}
fclose($handle);
Another user pointed out that file_get_contents() and file_put_contents() were better than using fwrite() in a loop. Their solution:
$thefile = 'strings.txt';
$newfile = 'newstrings.txt';
$current = file_get_contents($thefile);
$explodedcurrent = explode("\n", $current);
$temp = '';
foreach ($explodedcurrent as $string)
    $temp .= md5(trim($string)) . "\n";
file_put_contents($newfile, $temp);
My argument is that since the main goal of this is to get the file into an array, and file_get_contents() is the preferred way to read the contents of a file into a string, file() is more appropriate and allows us to cut out another unnecessary function, explode().
Furthermore, by directly manipulating the file using fopen(), fwrite(), and fclose() (which is the exact same as one call to file_put_contents()) there is no need to have extraneous variables in which to store the converted strings; you're writing them directly to the file.
My method is the exact same as the alternative - the same number of opens/closes on the file - except mine is shorter and more semantically correct.
What do you have to say, and which one would you choose?
This should be more efficient and less resource-intensive than the previous two methods:
$file = 'passwords.txt';
$passwords = file($file);
$converted = fopen($file, 'w+');
$i = 0; // initialise the counter once, outside the loop
while (count($passwords) > 0)
{
    fwrite($converted, md5(trim($passwords[$i])) . "\n");
    unset($passwords[$i]); // free each processed line to keep memory use low
    $i++;
}
fclose($converted);
echo 'Done.';
As one of the comments suggests, do what makes more sense to you, since you might come back to this code in a few months and will want to spend the least amount of time trying to understand it.
However, if speed is your concern, then I would create two test cases (you pretty much already have them) and use timestamps: store a timestamp in a variable at the beginning of the script, then subtract it from a timestamp taken at the end of the script to work out how long the script took to run. Prepare a few test files; I would go for about three: two extremes and one normal file. Then see which version runs faster.
http://php.net/manual/en/function.time.php
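A minimal sketch of that kind of timing harness (using microtime(true) for sub-second resolution, which plain time() lacks):
<?php
$start = microtime(true);

// ... run one of the two versions against a test file here ...

$elapsed = microtime(true) - $start;
echo "Took " . round($elapsed, 4) . " seconds, peak memory " .
     memory_get_peak_usage(true) . " bytes\n";
?>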
I would think that differences would be marginal, but it also depends on your file sizes.
I'd propose writing to a new temporary file while you process the input one. Once done, overwrite the input file with the temporary one.
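A rough sketch of that approach, assuming the strings.txt layout from the question (the file names here are only illustrative):
<?php
$source = 'strings.txt';
$tmp    = 'strings.txt.tmp';

$in  = fopen($source, 'rb');
$out = fopen($tmp, 'wb');

// Hash each input line into the temporary file.
while (($line = fgets($in)) !== false) {
    fwrite($out, md5(trim($line)) . "\n");
}

fclose($in);
fclose($out);

// Swap the processed copy in only after the whole input was handled.
rename($tmp, $source);
?>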
To make this clearer, I'm going to post some code samples:
$file = fopen('filename.ext', 'rb');
// Assume $pos has been declared
// method 1
fseek($file, $pos);
$parsed = fread($file, 2);
// method 2
while (!feof($file)) {
    $data = fread($file, 1000000);
}
$data = bin2hex($data);
$parsed = substr($data, $pos, 2);
fclose($file);
There are about 40 fread() in method 1 (with maybe 15 fseek()) vs 1 fread() in method 2. The only thing I am wondering is if loading in 1000000 bytes is overkill when you're really only extracting maybe 100 total bytes (all relatively close together in the middle of the file).
So which code is going to perform better? Which code makes more sense to use? A quick explanation would be greatly appreciated.
If you already know the offset you are looking for, fseek is the best method here, as there is no reason to load the whole file into memory if you only need a few bytes of it. The first method is better because you skip right to what you want in the file stream and read out a small portion. The second method requires you to read the entire file into memory and then search through it, when you could have just read it straight from the file. Hope this answers your question.
Files are read in units of clusters, and a cluster is usually something like 8 kb. Usually a few clusters are read ahead.
So, if the file is only a few kb there is very little to gain by using fseek compared to reading the entire file. The file system will read the entire file anyway.
If the file is considerably larger, as in your case, only a few of the clusters have to be read, so the first method should perform better. At worst all the data will still be read from the disk, but your application will still use less memory.
It seems that seeking to the position you want and then reading only the bytes you need is the best approach.
But the correct answer is (as always) to test it for real instead of guessing. Run your two examples in your server environment and make some time measurements. Also check memory usage. Then make your optimization once you have some hard data to back it up.
I'd like to store 0 to ~5000 IP addresses in a plain text file, with an unrelated header at the top. Something like this:
Unrelated data
Unrelated data
----SEPARATOR----
1.2.3.4
5.6.7.8
9.1.2.3
Now I'd like to find out whether '5.6.7.8' is in that text file using PHP. I've only ever loaded an entire file and processed it in memory, but I wondered if there was a more efficient way of searching a text file in PHP. I only need a true/false answer for whether it's there.
Could anyone shed any light? Or would I be stuck with loading in the whole file first?
Thanks in advance!
5000 isn't a lot of records. You could easily do this:
$addresses = explode("\n", file_get_contents('filename.txt'));
and search it manually and it'll be quick.
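For instance, a minimal sketch of that manual search (the file name is illustrative; trimming guards against stray \r or spaces):
$addresses = explode("\n", file_get_contents('filename.txt'));
$addresses = array_map('trim', $addresses);

// Strict in_array() gives the plain true/false the question asks for.
$found = in_array('5.6.7.8', $addresses, true);
var_dump($found);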
If you were storing a lot more I would suggest storing them in a database, which is designed for that kind of thing. But for 5000 I think the full load plus brute force search is fine.
Don't optimize a problem until you have a problem. There's no point needlessly overcomplicating your solution.
I'm not sure whether Perl's command-line tool needs to load the whole file to handle it, but you could do something similar to this:
<?php
...
$result = system("perl -p -i -e '5\.6\.7\.8' yourfile.txt");
if ($result)
....
else
....
...
?>
Another option would be to store the IP's in separate files based on the first or second group:
# 1.2.txt
1.2.3.4
1.2.3.5
1.2.3.6
...
# 5.6.txt
5.6.7.8
5.6.7.9
5.6.7.10
...
... etc.
That way you wouldn't necessarily have to worry about the files being so large you incur a performance penalty by loading the whole file into memory.
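As a rough sketch of how a lookup against those per-prefix files might go (the bucket naming is just the convention suggested above):
// Hypothetical lookup against the per-prefix files.
$ip = '5.6.7.8';
list($a, $b) = explode('.', $ip);
$bucket = "$a.$b.txt";   // e.g. "5.6.txt"

$found = file_exists($bucket)
      && in_array($ip, array_map('trim', file($bucket)), true);

var_dump($found);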
You could shell out and grep for it.
You might try fgets()
It reads a file line by line. I'm not sure how much more efficient this is, though. I'm guessing that if the IP were towards the top of the file it would be more efficient, and if it were towards the bottom it would be less efficient, than just reading in the whole file.
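A minimal sketch of that line-by-line search (the file name is illustrative), stopping as soon as the IP is found:
$needle = '5.6.7.8';
$found  = false;

$fp = fopen('ips.txt', 'rb');
while (($line = fgets($fp)) !== false) {
    if (trim($line) === $needle) {
        $found = true;
        break;            // stop reading once the IP has been seen
    }
}
fclose($fp);

var_dump($found);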
You could use the grep command with backticks if you're on a Linux server. Something like:
$searchFor = '5.6.7.8';
$file = '/path/to/file.txt';
$grepCmd = `grep $searchFor $file`;
echo $grepCmd;
I haven't tested this personally, but there is a snippet of code in the PHP manual that is written for large file parsing:
http://www.php.net/manual/en/function.fgets.php#59393
//File to be opened
$file = "huge.file";
//Open file (DON'T USE a+ pointer will be wrong!)
$fp = fopen($file, 'r');
//Read 16meg chunks
$read = 16777216;
//\n Marker
$part = 0;

while (!feof($fp)) {
    $rbuf = fread($fp, $read);
    for ($i = $read; $i > 0 || $n == chr(10); $i--) {
        $n = substr($rbuf, $i, 1);
        if ($n == chr(10)) break;
        //If we are at the end of the file, just grab the rest and stop loop
        elseif (feof($fp)) {
            $i = $read;
            $buf = substr($rbuf, 0, $i+1);
            break;
        }
    }
    //This is the buffer we want to do stuff with, maybe throw to a function?
    $buf = substr($rbuf, 0, $i+1);
    //Point marker back to last \n point
    $part = ftell($fp) - ($read - ($i+1));
    fseek($fp, $part);
}
fclose($fp);
The snippet was written by the original author: hackajar yahoo com
Are you trying to compare the current IP with the IPs listed in the text file? The unrelated data wouldn't match anyway.
So just use strpos on the full file contents (file_get_contents).
<?php
$file = file_get_contents('data.txt');
$pos = strpos($file, $_SERVER['REMOTE_ADDR']);
if ($pos === false) {
    echo "no match for $_SERVER[REMOTE_ADDR]";
}
else {
    echo "match for $_SERVER[REMOTE_ADDR]!";
}
?>