I found this script:
Quick and easy flood protection?
and turned it into a function.
It works great for the most part, but from time to time I see this error:
[function.unlink]: No such file or directory
in line:
else if ($diff>3600) { unlink($path); } // If first request was more than 1 hour, new ip file
Apparently some IP files are getting deleted for some reason?
I have tried to find the logic error, but I'm not good at that at all. Maybe somebody could help.
The function:
function ht_request_limiter() {
    if (!isset($_SERVER['REMOTE_ADDR'])) { return; } // Maybe it's impossible, but we check it first
    if (empty($_SERVER['REMOTE_ADDR'])) { return; }  // Maybe it's impossible, but we check it first
    $path = '/home/czivbaby/valuemarket.gr/ip-sec/'; // I use a function to validate a path first and return if false...
    $path = $path.$_SERVER['REMOTE_ADDR'].'.txt';    // Real file path (filename = <ip>.txt)
    $now = time();                                   // Current timestamp
    if (!file_exists($path)) { // If first request or new request after 1 hour / 24 hour ban, new file with <timestamp>|<counter>
        if ($handle = fopen($path, 'w+')) {
            if (fwrite($handle, $now.'|0')) { chmod($path, 0700); } // Chmod to prevent access via web
            fclose($handle);
        }
    }
    else if (($content = file_get_contents($path)) !== false) { // Load existing file
        $content = explode('|', $content);    // Create parameter set: [0] -> timestamp, [1] -> counter
        $diff = (int)$now - (int)$content[0]; // Time difference in seconds from first request to now
        if ($content[1] == 'ban') { // If [1] = ban we check if it was less than 24 hours ago and die if so
            if ($diff > 86400) { unlink($path); } // 24 hours in seconds; if more, delete ip file
            else {
                header("HTTP/1.1 503 Service Unavailable");
                exit("Your IP is banned for 24 hours, because of too many requests.");
            }
        }
        else if ($diff > 3600) { unlink($path); } // If first request was more than 1 hour ago, new ip file
        else {
            $current = ((int)$content[1]) + 1; // Counter + 1
            if ($current > 200) { // We check rpm (requests per minute) after 200 requests to get a good ~value
                $rpm = ($current / ($diff / 60));
                if ($rpm > 10) { // If there were more than 10 rpm -> ban (if you make a request every 5 secs. you will be banned after ~17 minutes)
                    if ($handle = fopen($path, 'w+')) {
                        fwrite($handle, $content[0].'|ban');
                        fclose($handle);
                        // Maybe you'd like to log the ip once -> die after next request
                    }
                    return;
                }
            }
            if ($handle = fopen($path, 'w+')) { // else write counter
                fwrite($handle, $content[0].'|'.$current);
                fclose($handle);
            }
        }
    }
}
Your server is processing two (or more) requests at the same time from the same client, and the script does not seem to handle this (completely normal) situation correctly. Web browsers download multiple objects from a server in parallel in order to speed up browsing. It's quite likely that, every now and then, a browser does two requests which then end up executing in parallel so that two copies of that script end up at the same unlink() call at roughly the same time. One succeeds in deleting the file, and the other one gives the error message.
Even if your server has a single CPU, the operating system will be happily providing multitasking by context switching between multiple PHP processes which are executing the same PHP script at the same time for the same client IP address.
The script should probably use file locking (http://php.net/manual/en/function.flock.php) to lock the file while working on it. Or simply ignore the unlink() error (by placing an @ in front of the unlink() call), but other concurrency problems are likely to come up.
The script should:
1. Open the file for reading and writing using $f = fopen($filename, 'r+');
2. Lock the opened file using the file handle. The flock($f, LOCK_EX) call will block and wait if some other process already holds the lock.
3. Read the file contents.
4. Decide what to do (increment the counter, refuse to serve the request).
5. fseek($f, 0, SEEK_SET) to the beginning of the file, ftruncate($f, 0) to make it empty, and rewrite the file contents if necessary, or unlink() the file if necessary.
6. Close the file handle with fclose($f), which also releases the lock on it and lets another process continue with step 3.
The pattern is the same in all programming languages.
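For illustration, here is a minimal sketch of that pattern applied to a counter file in the <timestamp>|<counter> format from the question. The function name is made up and the decision step is simplified to just bumping the counter, so this is not a drop-in replacement for the original function:

<?php
// Minimal sketch of the flock() read-modify-write pattern, assuming the
// <timestamp>|<counter> format from the question.
function ht_request_limiter_locked($path) {
    $f = fopen($path, 'c+');               // 1. open (create if missing), read/write
    if ($f === false) { return; }
    if (!flock($f, LOCK_EX)) {             // 2. blocks until we own the file
        fclose($f);
        return;
    }
    $content = stream_get_contents($f);    // 3. read
    if ($content === false || $content === '') {
        $data = array(time(), 0);          // new file: first request now, counter 0
    } else {
        $data = explode('|', $content);
    }
    $data[1] = (int)$data[1] + 1;          // 4. decide: here we only bump the counter
                                           //    (the ban/expiry checks would go here)
    ftruncate($f, 0);                      // 5. empty the file ...
    rewind($f);
    fwrite($f, $data[0] . '|' . $data[1]); //    ... and rewrite it
    fclose($f);                            // 6. releases the lock as well
}

The 'c+' mode matters here: unlike 'w+', it does not truncate the file before the lock has been acquired.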
Related
I have a log file maintained by a PHP script that is subject to parallel processing. I cannot get the flock() mechanism to work on it: in my case, flock() does not prevent the log file, shared by scripts running in parallel, from being accessed at the same time and sometimes being overwritten.
I want to be able to read a file, do some processing, modify the data and write back, without the same code running in parallel on the server doing the same thing at the same time. The read-modify-write has to happen in sequence.
On one of my shared hostings (OVH France), it does not work as expected. In that case, we see that the counter $c has the same value in different iframes, which should not be possible if the lock works as expected, which it does on another shared hosting.
Any suggestions to make this work, or for an alternative method?
Googling "read modify write" php or fetch and add or test and set did not provide useful information: all solutions are based on a working flock().
Here is some standalone demo code to illustrate. It generates a number of parallel requests from the browser to the server and displays the results. The malfunction is easy to observe visually: if your web server does not support flock(), like one of mine, the counter value and the number of log lines will be the same in some frames.
<!DOCTYPE html>
<html lang="en">
<title>File lock test</title>
<style>
    iframe {
        width: 10em;
        height: 300px;
    }
</style>
<?php
$timeStart = microtime(true);
if ($_GET) { // iframe
    // GET
    $time = $_GET['time'] ?? 'no time';
    $instance = $_GET['instance'] ?? 'no instance';
    // open file
    // $mode = 'w+'; // no read
    // $mode = 'r+'; // does not create the file, we would have to lock file creation too
    $mode = 'c+'; // read, write, create
    $fhandle = fopen(__FILE__ .'.rwtestfile.txt', $mode) or exit('fopen');
    // lock
    flock($fhandle, LOCK_EX) or exit('flock');
    // start of file (optional, only some modes require it)
    rewind($fhandle);
    // read file (or default initial value if new file)
    $fcontent = fread($fhandle, 10000) or ' 0';
    // counter value from previous write is the last integer value of the file
    $c = strrchr($fcontent, ' ') + 1;
    // new line for file
    $fcontent .= "<br />\n$time $instance $c";
    // reset once in a while
    if ($c > 20) {
        $fcontent = ' 0'; // avoid long content
    }
    // simulate other activity
    usleep(rand(1000, 2000));
    // start of file
    rewind($fhandle);
    // write
    fwrite($fhandle, $fcontent) or exit('fwrite');
    // truncate (in the unexpected case the file is shorter now)
    ftruncate($fhandle, ftell($fhandle)) or exit('ftruncate');
    // close
    fclose($fhandle) or exit('fclose');
    // echo
    echo "instance:$instance c:$c<br />";
    echo $timeStart ."<br />";
    echo microtime(true) - $timeStart ."<br />";
    echo $fcontent ."<br />";
} else {
    echo 'File lock test<br />';
    // iframes that will be requested in parallel, to check flock
    for ($i = 0; $i < 14; $i++) {
        echo '<iframe src="?instance='. $i .'&time='. date('H:i:s') .'"></iframe>'."\n";
    }
}
There is a warning about flock() limitations on the PHP manual page for flock(), but it concerns ISAPI (Windows) and FAT (Windows). My server configuration is:
PHP Version 7.2.5
System: Linux cluster026.gra.hosting.ovh.net
Server API: CGI/FastCGI
A way to do an atomic test-and-set instruction in PHP is to use mkdir(). It is a bit strange to use a directory for that instead of a file, but mkdir() will create a directory or return false (and a suppressible warning) if it already exists. File commands like fopen(), fwrite() and file_put_contents() do not test and set in one instruction.
<?php
// lock
$fnLock = __FILE__ .'.lock'; // lock directory filename
$lockLooping = 0; // counter can be used for tuning depending on lock duration
do {
    if (@mkdir($fnLock, 0777)) { // mkdir is a test-and-set command
        $lockLooping = 0;
    } else {
        $lockLooping += 1;
        $lockAge = time() - filemtime($fnLock);
        if ($lockAge > 10) {
            rmdir($fnLock); // robustness, in case a lock was not erased
        } else {
            // wait without consuming CPU before trying again
            usleep(rand(2500, 25000)); // random to avoid a parallel process conflict again
        }
    }
} while ($lockLooping > 0);

// do stuff under atomic protection
// don't take too long, because parallel processes are waiting for the unlock (rmdir)
$content = file_get_contents($protected_file_name); // example read
$content = $modified_content; // example modify
file_put_contents($protected_file_name, $modified_content); // example write

// unlock
rmdir($fnLock);
If you use files for data management coordinated only by PHP request handlers, you are heading for a world of pain - you've only just dipped your toes in the water so far.
Using LOCK_EX, your writer needs to wait for any (and every) instance of LOCK_SH to be released before it can acquire the lock. Here you are telling flock to block until the lock can be acquired. On a relatively busy system, the writer could be blocked indefinitely: on most operating systems there is no priority queuing of locks that would place a subsequent reader requesting the lock behind a process waiting for a write lock.
A further complication is that you can only use flock() on an open file handle, meaning that opening the file and acquiring the lock is not atomic; furthermore, you need to flush the stat cache in order to determine the age of the file after acquiring the lock.
Any write to the file (even using file_put_contents()) is not atomic, so in the absence of exclusive locking you can't be sure that nobody will read a partial file.
In the absence of additional components (e.g. a daemon providing a lock queuing mechanism, a caching reverse proxy in front of the web server, or a relational database), your only option is to assume that you cannot ensure exclusive access and to use atomic operations to semaphore the file, something like:
$lock_age = time() - filectime(dirname(CACHE_FILE) . "/lock");
if (filemtime(CACHE_FILE) > time() - CACHE_TTL
    && $lock_age > MAX_LOCK_TIME) {
    rmdir(dirname(CACHE_FILE) . "/lock");
    mkdir(dirname(CACHE_FILE) . "/lock") || die("I give up");
    $content = generate_content(); // might want to add specific timing checks around this
    file_put_contents(CACHE_FILE, $content);
    rmdir(dirname(CACHE_FILE) . "/lock");
} else if (is_dir(dirname(CACHE_FILE) . "/lock")) {
    $snooze = MAX_LOCK_TIME - $lock_age;
    sleep($snooze);
    $content = file_get_contents(CACHE_FILE);
} else {
    $content = file_get_contents(CACHE_FILE);
}
(note that this is a really ugly hack)
There is one fopen() test and set mode: the x mode.
x Create and open for writing only; place the file pointer at the beginning of the file. If the file already exists, the fopen() call will fail by returning FALSE and generating an error of level E_WARNING. If the file does not exist, attempt to create it.
The behaviour of fopen($filename, 'x') is the same as that of mkdir(), and it can be used in the same way:
<?php
// lock
$fnLock = __FILE__ .'.lock'; // lock file filename
$lockLooping = 0; // counter can be used for tuning depending on lock duration
do {
    if ($lockHandle = @fopen($fnLock, 'x')) { // test-and-set command
        $lockLooping = 0;
    } else {
        $lockLooping += 1;
        $lockAge = time() - filemtime($fnLock);
        if ($lockAge > 10) {
            unlink($fnLock); // robustness, in case a lock file was not erased
        } else {
            // wait without consuming CPU before trying again
            usleep(rand(2500, 25000)); // random to avoid a parallel process conflict again
        }
    }
} while ($lockLooping > 0);

// do stuff under atomic protection
// don't take too long, because parallel processes are waiting for the unlock (unlink)
$content = file_get_contents($protected_file_name); // example read
$content = $modified_content; // example modify
file_put_contents($protected_file_name, $modified_content); // example write

// unlock
fclose($lockHandle);
unlink($fnLock);
It is a good idea to test this, e.g. using the code in the question.
Many people rely on locking as documented, but surprises may appear during testing or in production under load (parallel requests from one browser may be enough).
I am building a log parser in PHP. The log parser runs in an infinite loop, scans through the log lines, and does some additional processing for each line.
The log parser uses inotify to detect whether the log file was modified; it then opens the file again, seeks to the previously processed line number and continues from there. The previously processed line number is stored in a variable and incremented each time a log line is processed. It is also stored in a file, so if the program crashes it can continue where it last stopped processing.
My problem is that if the log is modified, the parser does not see the new contents of the file it originally opened before the modification. After the loop reaches the end of the log, it waits for inotify to signal that the file was modified, which is fine, but then it reopens the whole file and walks line by line to the last processed line again. This can be expensive if the log contains many lines. How can I avoid this and pick up the file updates immediately, without reopening the file and skipping the N processed lines all over again?
Example code:
$ftp_log_file = '/var/log/proftpd/my_log.log';
$ftp_log_status_file = '/var/log/proftpd/log_status.log';

if ( ! file_exists($ftp_log_status_file)) {
    die("failed to load the ftp log status file $ftp_log_status_file!\n");
}

$log_status = json_decode(file_get_contents($ftp_log_status_file));

if ( ! isset($log_status->read_position)) {
    $read_position = 0;
} else {
    $read_position = $log_status->read_position;
}

// Open an inotify instance
$inoInst = inotify_init();
$watch_id = inotify_add_watch($inoInst, '/var/log/proftpd/my_log.log', IN_MODIFY);

while (1) {
    $current_read_index = 0;

    $events = inotify_read($inoInst);

    $fd = fopen($ftp_log_file, 'r+');
    if ($fd === false)
        die("unable to open $ftp_log_file!\n");

    while ($line = trim(fgets($fd))) {
        $current_read_index++;
        if ($current_read_index < $read_position) {
            continue;
        }

        // DO SOME LOG PROCESSING

        $read_position++;
        $log_status->read_position++;
        file_put_contents($ftp_log_status_file, json_encode($log_status));
    }

    fclose($fd);
}

// stop watching our directory
inotify_rm_watch($inoInst, $watch_id);

// close our inotify instance
fclose($inoInst);
fgets seems to remember the fact that the end of the file was reached, and subsequent fgets calls fail silently. An explicit fseek() before fgets() seems to fix this.
<?php
$inoInst = inotify_init();
inotify_add_watch($inoInst, 'foo.txt', IN_MODIFY);

$f = fopen('foo.txt', 'r');

for (;;) {
    while ($line = fgets($f)) {
        echo $line;
    }
    inotify_read($inoInst);
    fseek($f, 0, SEEK_CUR); // make fgets work again
}
Note that there is still the issue of incomplete lines. The line you are currently reading may not be complete yet (e.g. proftpd will finish it with its next write() call).
Since fgets doesn't tell you whether it stopped at a newline or at the end of the file, I don't see a convenient way to handle this off the top of my head. The only thing I can think of is to read N bytes at a time and split the lines yourself, as sketched below.
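For illustration, a minimal sketch of that approach. The file name, buffer size and polling interval are arbitrary, and the "processing" is just an echo:

<?php
// Minimal sketch: read in chunks and split lines ourselves, so an
// incomplete trailing line stays in the buffer until it is finished.
$f = fopen('foo.txt', 'r') or exit("cannot open foo.txt\n");
$buffer = '';

for (;;) {
    $chunk = fread($f, 8192);           // read whatever is available
    if ($chunk === false || $chunk === '') {
        usleep(200000);                 // nothing new yet; inotify_read() could be used instead
        fseek($f, 0, SEEK_CUR);         // clear the EOF flag
        continue;
    }
    $buffer .= $chunk;
    // only lines terminated by "\n" are complete
    while (($pos = strpos($buffer, "\n")) !== false) {
        $line = substr($buffer, 0, $pos);
        $buffer = substr($buffer, $pos + 1);
        echo $line, "\n";               // process a complete line here
    }
}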
Following the good advice on this link:
How to keep checking for a file until it exists, then provide a link to it
The loop will never end if the file is never created.
In a perfect system that should not happen, but if it does, how would one exit from that loop?
I have a similar case:
/* More code above */

// writing the file
$csvfile = $foldername.$date.$version.".csv";
$csv = fopen($csvfile, 'w+');

foreach ($_POST['lists'] as $pref) {
    fputcsv($csv, $pref, ";");
}

// close and wait for IO creation
fclose($csv);
sleep(1);

// Running the Java
$exec = shell_exec("/usr/bin/java -jar $app $csvfile");
sleep(3);

$xmlfile = preg_replace('/\\.[^.\\s]{3,4}$/', '.xml', $csvfile);

if (file_exists("$csvfile") && (file_exists("$xmlfile"))) {
    header("Location:index.php?msg");
    exit;
}
else if (!file_exists("$csvfile")) {
    header("Location:index.php?msgf=".basename($csvfile)." creation failed!");
    exit;
}
else if (!file_exists("$xmlfile")) {
    header("Location:index.php?msgf=".basename($xmlfile)." creation failed!");
    exit;
}
//exit;
} // Just the end
?>
(Yes, bad idea to pass variables in the URL... I've got that covered.)
I use sleep(N); because I know the Java program takes a short time to create the file, and the same goes for the CSV on the PHP side.
How can I improve the check on the file, so that it waits only the necessary time before reporting whether the file was created or not?
After reading your comments, I think asking for "the best loop" isn't the right way to get a better answer.
The linked script just gives a good approach for when the script expects a file: it will wait until the file is created, or forever (but there the creator ensures the file gets created).
Better than that, you can allow a limited period of time to determine whether the file exists or not.
If after the shell_exec the Java program didn't create the file (which I think is almost impossible, but it's just a thought), you could use code like the following:
$cycles = 0;
while (!($isFileCreated = file_exists($filename)) && $cycles < 1000) {
    $cycles++;
    usleep(1);
}

if (!$isFileCreated)
{
    //some action
    //throw new RuntimeException("File doesn't exist");
}

//another action
The script above will wait until the file is created or until it reaches a particular number of cycles (it's better to talk about cycles than microseconds, because I can't guarantee that each cycle will execute in exactly one microsecond). The number of cycles can be increased if you need more time.
I have a script that rewrites a file every few hours. This file is inserted into end users' HTML via a PHP include.
How can I check whether my script is, at this exact moment, working on (e.g. rewriting) the file while it is being called for display to a user? Is it even an issue? In other words, what will happen if they access the file at the same time, what are the odds of that, and will the user just have to wait until the script has finished its work?
Thanks in advance!
More on the subject...
Is this a way forward, using file_put_contents() with LOCK_EX?
When the script saves its data every now and then:
file_put_contents("text", $content, LOCK_EX);
and when the user opens the page:
if (file_exists("text")) {
    function include_file() {
        $file = fopen("text", "r");
        if (flock($file, LOCK_EX)) {
            include_file();
        }
        else {
            echo file_get_contents("text");
        }
    }
} else {
    echo 'no such file';
}
Could anyone advise me on the syntax: is this a proper way to call include_file() after the condition, and how can I limit the number of such calls?
I guess this solution is also good, except for the same call to include_file(); would it even work?
function include_file() {
    $time = time();
    $file = filectime("text");
    if ($file + 1 < $time) {
        echo "good to read";
    } else {
        echo "have to wait";
        include_file();
    }
}
To check whether the file is currently being written to, you can use the filectime() function to get the time the file was last changed.
You can save the current timestamp at the top of your script, and whenever you need to access the file, compare that timestamp with the filectime() of the file. If the file's change time is more recent, you have hit the case where you have to wait for the file to be written, and you can log that in a database or in another file.
To prevent this scenario from happening, you can change the script which writes the file so that it first creates a temporary file and, once it is done, replaces (moves or renames) the temporary file over the original file. This action takes far less time than writing the file, which makes the scenario much less likely to occur.
Even if the read and the replace operation happen simultaneously, the time the reading script has to wait will be very short.
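For illustration, a minimal sketch of the write-then-rename idea; the target file name is made up, and the "generated content" stands in for whatever the rewrite script produces:

<?php
// Minimal sketch, assuming a hypothetical target file name.
$target = __DIR__ . '/incfile.html';
$tmp    = $target . '.tmp.' . getmypid();   // unique temp name per writer

$content = "generated content\n";           // whatever the rewrite script produces

file_put_contents($tmp, $content);          // the slow part happens on the temp file
rename($tmp, $target);                      // replaces the target in one step on the same filesystem

Readers that include or read $target either get the old version or the new one, never a half-written file.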
Depending on the size of the file, this might be a concurrency issue. But you can solve it quite easily: before starting to write the file, create a kind of "lock file", i.e. if your file is named "incfile.php", create an "incfile.php.lock". Once you're done with writing, remove this file.
On the include side, you can check for the existence of "incfile.php.lock" and wait until it has disappeared; you'll need some looping and sleeping for the unlikely case of concurrent access, roughly as sketched below.
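A rough sketch of that include side, assuming the writer creates incfile.php.lock before rewriting and removes it afterwards (both file names are taken from the example above):

<?php
// Wait for the lock file to disappear, then include the generated file.
$lock = __DIR__ . '/incfile.php.lock';

$tries = 0;
while (file_exists($lock) && $tries < 50) {  // give up after roughly 5 seconds
    usleep(100000);                          // 0.1 s between checks
    clearstatcache();                        // don't trust the cached file_exists() result
    $tries++;
}

include __DIR__ . '/incfile.php';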
Basically, you should consider another solution: just write the data which is rendered into that file to a database (locks etc. are available) and render it in a module which then gets included in your page. Solutions like yours are hard to maintain in the long run...
This question is old, but I add this answer because the other answers have no code.
function write_to_file(string $fp, string $string) : bool {
    $timestamp_before_fwrite = date("U");

    $stream = fopen($fp, "w");
    fwrite($stream, $string);
    while (is_resource($stream)) {
        fclose($stream);
    }

    $file_last_changed = filemtime($fp);
    if ($file_last_changed < $timestamp_before_fwrite) {
        // File not changed code
        return false;
    }
    return true;
}
This is the function I use to write to a file. It first records the current timestamp before making changes to the file, and then compares that timestamp to the time the file was last changed.
I am trying to build a small daemon in PHP that analyzes the log files on a Linux system (e.g. follows the syslog).
I have managed to open the file via fopen and continuously read it with stream_get_line. My problem starts when the monitored file is deleted and recreated (e.g. when rotating logs). The program then does not read anything anymore, even if the file has grown larger than before.
Is there an elegant solution for this? stream_get_meta_data does not help, and using tail -f on the command line shows the same problem.
EDIT, added sample code
I tried to boil the code down to a minimum to illustrate what I am looking for:
<?php
$break = FALSE;
$handle = fopen('./testlog.txt', 'r');

do {
    $line = stream_get_line($handle, 100, "\n");
    if (!empty($line)) {
        // do something
        echo $line;
    }

    while (feof($handle)) {
        sleep(5);
        $line = stream_get_line($handle, 100, "\n");
        if (!empty($line)) {
            // do something
            echo $line;
        }

        // a comment on php.net indicated it is possible
        // with tcp streams to distinguish empty and lost
        // does NOT work here --> need somefunction($handle)
        if ($line !== FALSE && $line === '') $break = TRUE;
    }
} while (!$break);

fclose($handle);
?>
When log files are rotated, the original file is copied, then deleted, and a new file with the same name is created. The new file may have the same name as the original, but it has a different inode. Inodes (dumbed-down description follows) are like hidden incremental index numbers for your files. You can rename a file or move it, and it takes its inode with it. Once the original log file is deleted, you can't keep reading the file with the same name through the same file handle, because the inode has changed. Your best bet is to detect the failure and attempt to open the new file.
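For illustration, a minimal sketch of that idea, reusing the ./testlog.txt path from the test code above. It compares the inode of the open handle with the inode currently behind the path, and reopens the file when they differ:

<?php
// Minimal sketch: follow a file across rotation by watching its inode.
$path   = './testlog.txt';           // assumed path, as in the test code above
$handle = fopen($path, 'r');

for (;;) {
    while (($line = stream_get_line($handle, 4096, "\n")) !== false) {
        echo $line, "\n";            // process the line
    }
    sleep(5);
    fseek($handle, 0, SEEK_CUR);     // clear the EOF flag, see the answer further up
    clearstatcache();                // make stat()/fstat() results fresh
    $open = fstat($handle);          // inode of the file we are reading
    $disk = @stat($path);            // inode currently behind the name (false while rotating)
    if ($disk === false || $disk['ino'] !== $open['ino']) {
        fclose($handle);             // the file was rotated: reopen the new one
        while (($handle = @fopen($path, 'r')) === false) {
            sleep(1);                // wait until the new file exists
        }
    }
}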