I'd like to know if fstat() is cached.
Documentation says nothing about it.
(PHP 8.1.12)
EDIT:
I ran a LOT of tests, and I now have an accurate answer.
Introduction:
I created two scripts and ran them both simultaneously, and also each on its own. To keep them from overwriting each other, I locked the file pointer with an exclusive lock (flock($fp, LOCK_EX))*. I printed print_r(fstat($fp)) at various stages of a loop that ended up writing 5 MB of data. I also used var_dump(realpath_cache_get()) (which "returns an array of realpath cache entries; the keys are original path entries, and the values are arrays of data items, containing the resolved path, expiration date, and other options kept in the cache") to inspect the realpath cache, and I ran the tests both clearing and not clearing the cache via clearstatcache().
What happened:
I found that the only item that gets cached is 'atime' (the time of last access, as a Unix timestamp). Most upsetting of all: it is not put in the realpath cache, so nothing changes if you clear that cache via clearstatcache(); you can only restart the machine or wait for the cache timeout to expire (which depends on the machine).
Fortunately this doesn't affect me, since I only need the 'size' element of fstat. Sadly I cannot share the source code, because I changed it so many times while trying all the possibilities.
Thank you to everyone who spent time helping me.
*NOTE: Use flock() in both scripts. If you use it in only one, the other script can still write to the file, because flock() locks are advisory: they only block processes that also ask for the lock.
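Since the exact source is gone, here is a rough sketch of the kind of writer script used (the file name, chunk size, and print interval are illustrative, not the original values):
$fp = fopen(__DIR__.'/test.bin', 'ab');
flock($fp, LOCK_EX); // exclusive lock, see the note above
for ($written = 0; $written < 5 * 1024 * 1024; $written += 1024) {
    fwrite($fp, str_repeat('x', 1024));
    if ($written % (1024 * 1024) === 0) { // at various stages of the loop...
        print_r(fstat($fp));             // ...inspect the stat array
        var_dump(realpath_cache_get());  // ...and the realpath cache
    }
}
flock($fp, LOCK_UN);
fclose($fp);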
CONCLUSION
The only cached element of fstat() is atime, and since it is not kept in the realpath cache, clearstatcache() is useless here.
I made a quick demo that appears to show that it is not cached. However, see the edit below, too.
$filename = __DIR__.'/test.txt';
#unlink($filename);
touch($filename);
echo 'STAT - Size before write: '.stat($filename)['size'], PHP_EOL;
file_put_contents($filename, 'test');
echo 'STAT - Size after write: '.stat($filename)['size'], PHP_EOL;
clearstatcache();
echo 'STAT - Size after cache clear: '.stat($filename)['size'], PHP_EOL;
#unlink($filename);
touch($filename);
$fp = fopen($filename, 'wb');
echo 'FSTAT - Size before write: '.fstat($fp)['size'], PHP_EOL;
fwrite($fp, 'test');
echo 'FSTAT - Size after write: '.fstat($fp)['size'], PHP_EOL;
clearstatcache();
echo 'FSTAT - Size after cache clear: '.fstat($fp)['size'], PHP_EOL;
Output:
STAT - Size before write: 0
STAT - Size after write: 0
STAT - Size after cache clear: 4
FSTAT - Size before write: 0
FSTAT - Size after write: 4
FSTAT - Size after cache clear: 4
Edit
Per @Barmar, I ran the test again, this time with just an fstat() call followed by a sleep(10); then I quickly updated the file with vim manually, and then made one final fstat() call (all in the same request), and that one came back as cached.
I then ran that again, this time with clearstatcache() before the final fstat(), and the result did not change. I also tried the tests with both the w and r modes for fopen(); same results.
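For reference, a sketch of what that test looked like (the file name is illustrative):
$fp = fopen(__DIR__.'/test.txt', 'r');
var_dump(fstat($fp)['size']); // initial size
sleep(10); // edit and save the file in vim during this window
clearstatcache(); // made no difference in my runs
var_dump(fstat($fp)['size']); // came back unchanged, i.e. apparently cached
fclose($fp);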
So there does appear to be a cache of some sort, but I don't think it is the stat cache. (One possible explanation: some editors, vim included, can save by writing a new file and renaming it over the old one, in which case the still-open handle keeps pointing at the old inode and fstat() legitimately reports the old data.)
As a supplement:
I also ran exactly the kind of test that Chris Haas describes in the comments.
/*
* Open test.txt with a text editor
* Run this script
* Add some characters in the editor and save while the script is running
*/
$file = __DIR__.'/../test/test.txt';
$fp = fopen($file,'r');
$size[] = fstat($fp)['size'];
sleep(10);
$size[] = fstat($fp)['size'];
var_dump($size);
//array(2) { [0]=> int(3) [1]=> int(13) }
Same results for PHP on Win10 and on a small Linux box.
Related
I want to wait for all processes reading a certain file in PHP by obtaining an exclusive lock on that file, and then delete (unlink) the file. This concerns files like profile pictures, which a user can delete or change. The name of the file will be something like the user ID.
My code:
//Obtain lock
$file = fopen("path/to/file", "r"); //(I'm not sure which mode to use here btw)
flock($file, LOCK_EX);
//Delete file
unlink("path/to/file");
Line 3 waits for all locks to be released, which is good, but the unlink function throws an error: Warning: unlink(path/to/file): Resource temporarily unavailable in path/to/script on line xx
To prevent this I could release the lock before calling unlink, but this means another process could lock on the file again, which would cause the same error.
My questions are:
Is it possible to delete a file in PHP without releasing the lock? That is, without the risk of other processes trying to use the file at the same time.
If not:
Is this possible in Windows at all? How about Unix?
Should I involve my database for this matter and lock on rows in the database instead, or is there a better way?
Another option I can see is repeating this piece of code, including a release of the lock before calling unlink, until unlink succeeds, but this seems a bit messy, right?
Hey, I'm struggling with this too, two years later. It seems kind of dumb that you can't acquire an exclusive lock on a file when trying to rename or unlink it, or at least the documentation for doing this isn't there.
One solution is to open the file for writing, acquire an exclusive lock, clear the contents of the file using ftruncate(), close it, and then unlink it. When reading from the file, you can check the size to make sure the file still has contents.
When deleting (untested code):
$fh = fopen('yourfile.txt', 'c'); // 'w' mode truncates file, you don't want to do that yet!
flock($fh, LOCK_EX); // blocking, but you can use LOCK_EX | LOCK_NB for nonblocking and a loop + sleep(1) for a timeout
ftruncate($fh, 0); // truncate file to 0 length
fclose($fh);
unlink('yourfile.txt');
When reading (untested code):
clearstatcache(); // file_exists()/filesize() results are cached within a request
if (!file_exists('yourfile.txt') || filesize('yourfile.txt') <= 0) {
    print 'nah.jpg, must be dELeTeD :O';
}
In my PHP project, I use some kind of counter that appends to an existing (or new) file very often:
$f = fopen($filename, 'ab');
fwrite($f, $data); // $data: whatever the counter appends
fclose($f);
When a new file is created, I have to edit this file's permissions, so another user may access the file as well:
$existed = file_exists($filename);
// Do the append
$f = fopen($filename, 'ab');
fwrite($f, $data);
fclose($f);
// Update permissions
if (!$existed) {
    @chmod($filename, 0666);
}
Is there any way to find out whether 'a' (append) created a new file or appended to an existing one, without using file_exists()? To my understanding, file_exists() retrieves the file stats, which causes some unnecessary overhead compared to a simple file append. As the function is used very often, I wonder if there's a way to tell whether fopen(..., 'a') created a new file without calling file_exists().
Note: This is mostly a question of style and interest, not a true performance issue. But if I am mistaken and fopen() already retrieves the file stats, please let me know!
Update
Okay, it really is a rather academic question. Here are some performance tests run on a Windows system (Apache, Win8.1, no UNIX file permissions) and a Linux machine (Nginx, Ubuntu 14.04, virtual machine).
Each test run with 1000 repetitions, file deleted before the first repetition.
                                    Win     Linux
simply append one byte              1.8ms    9.4ms
append + clearstatcache()           1.8ms    9.3ms
test file_exists() + append         2.2ms   10.5ms
file_exists() + append + clear      2.2ms   11.0ms
append + chmod()                    2.7ms   12.3ms
append + file_exists() -> chmod()   3.3ms   10.6ms
Note: The last one is the only one that uses an IF within the test loop.
PHP's fopen() is just a call to the libc fopen(), which automatically creates the file for the modes w, w+, a and a+. As far as I can see, there is no way to get the stat with the permission bits from the returned file pointer.
It seems that PHP stores the stat array for each opened file, and you can access it via fstat($fp) with the opened file handle $fp. But the mode field contains "inode permission bits", and I can't immediately see how those are related to the UNIX file mode; the stat system call does not use this term.
You can use "r+" mode to open your file and create it if that fails. If not you need to SEEK to then end to achieve something similar.
But finally it's best to check for existence before you open the file.
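A minimal sketch of that idea (variable names are illustrative, and note there is a small race window between the two fopen() calls if several processes create the file at once):
$f = @fopen($filename, 'r+b'); // fails if the file does not exist yet
$created = false;
if ($f === false) {
    $f = fopen($filename, 'xb'); // 'x' creates the file, and fails if it already exists
    $created = true;
} else {
    fseek($f, 0, SEEK_END); // emulate append mode
}
fwrite($f, $data);
fclose($f);
if ($created) {
    @chmod($filename, 0666); // only chmod when this call actually created the file
}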
No, fopen() just returns the resource; it doesn't return or set a flag that indicates whether the file already existed - http://php.net/manual/en/function.fopen.php
EDIT: see the performance test in the edited question.
Why not call chmod() every time?
Your file_exists() is probably more expensive than a chmod() (run a little performance test...).
// Do the append
$f = fopen($filename, 'ab');
fwrite($f, $data);
fclose($f);
// Update permissions
@chmod($filename, 0666);
I have a file on a website. A PHP script modifies it like this:
$contents = file_get_contents("MyFile");
// ** Modify $contents **
// Now rewrite:
$file = fopen("MyFile","w+");
fwrite($file, $contents);
fclose($file);
The modification is pretty simple. It grabs the file's contents and adds a few lines. Then it overwrites the file.
I am aware that PHP has a function for appending contents to a file rather than overwriting it all over again. However, I want to keep using this method since I'll probably change the modification algorithm in the future (so appending may not be enough).
Anyway, I was testing this out, making like 100 requests. Each time I call the script, I add a new line to the file:
First call:
First!
Second call:
First!
Second!
Third call:
First!
Second!
Third!
Pretty cool. But then:
Fourth call:
Fourth!
Fifth call:
Fourth!
Fifth!
As you can see, the first, second and third lines simply disappeared.
I've determined that the problem isn't the contents string modification algorithm (I've tested it separately). Something is messed up either when reading or writing the file.
I think it is very likely that the issue is when the file's contents are read: if $contents, for some odd reason, is empty, then the behavior shown above makes sense.
I'm no expert with PHP, but perhaps the fact that I performed 100 calls almost simultaneously caused this issue. What if there are two processes, and one is writing the file while the other is reading it?
What is the recommended approach for this issue? How should I manage file modifications when several processes could be writing/reading the same file?
What you need to do is use flock() (file lock)
What I think is happening is that your script grabs the file while the previous request is still writing to it. Because the "w+" open truncates the file, PHP can read an empty string at that moment, and once the later process is done it overwrites the previous contents.
The solution is to have the script usleep() for a few milliseconds when the file is locked and then try again. Just be sure to put a limit on how many times your script can try.
NOTICE:
If another PHP script or application accesses the file, it may not necessarily use/check for file locks. This is because file locks are often seen as an optional extra, since in most cases they aren't needed.
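A minimal sketch of that usleep() retry loop, using a non-blocking lock (the attempt limit and sleep interval are illustrative):
$fh = fopen('file.txt', 'c+'); // 'c+' opens for read/write without truncating
$attempts = 0;
while (!flock($fh, LOCK_EX | LOCK_NB)) { // LOCK_NB makes flock() return immediately
    if (++$attempts > 100) { // give up after roughly one second
        fclose($fh);
        die('Could not obtain a lock on file.txt');
    }
    usleep(10000); // wait 10 ms before trying again
}
// ... read/modify/write the file here ...
flock($fh, LOCK_UN);
fclose($fh);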
So the issue is parallel access to the same file: while one instance is writing, another is reading before the file has been updated.
Luckily, PHP has a mechanism for locking a file so that no one can read from it until the lock is released and the file has been updated: flock() can be used, and the documentation is here.
You need to create a lock, so that any concurrent requests will have to wait their turn. This can be done using the flock() function. You will have to use fopen(), as opposed to file_get_contents(), but it should not be a problem:
$file = 'file.txt';
$fh = fopen($file, 'r+');
if (flock($fh, LOCK_EX)) { // Get an exclusive lock
    // fread() errors on a length of 0, so guard against an empty file
    $data = filesize($file) > 0 ? fread($fh, filesize($file)) : '';
    // Do something with $data here...
    ftruncate($fh, 0); // Empty the file
    rewind($fh); // Move the pointer back to the start before writing
    fwrite($fh, $newData); // Write new data to file
    fclose($fh); // Close handle and release lock
} else {
    die('Unable to get a lock on file: '.$file);
}
Here's my code:
$cachefile = "cache/ttcache.php";
if(file_exists($cachefile) && ((time() - filemtime($cachefile)) < 900))
{
include($cachefile);
}
else
{
ob_start();
/*resource-intensive loop that outputs
a listing of the top tags used on the website*/
$fp = fopen($cachefile, 'w');
fwrite($fp, ob_get_contents());
fflush($fp);
fclose($fp);
ob_end_flush();
}
This code seemed to work fine at first sight, but I found a bug and I can't figure out how to solve it. Basically, after I leave the page alone for a period of time, the cache file empties (either that, or refreshing the page clears the cache file, leaving it blank). The conditional then sees the now-blank cache file, sees that its age is less than 900 seconds, and serves the blank file's contents instead of re-running the loop and refilling the cache.
I catted the cache file on the command line and confirmed that it is indeed blank when the problem occurs.
I tried setting the timeout to 60 seconds to replicate the problem more often and hopefully get to the bottom of it, but it doesn't seem to happen when I'm watching for it, only when I leave the page and come back after a while.
Any help?
In the caching routines that I write, I almost always check the filesize, as I want to make sure I'm not spewing blank data, because I rely on a bash script to clear out the cache.
if(file_exists($cachefile) && (filesize($cachefile) > 1024) && ((time() - filemtime($cachefile)) < 900))
This assumes that your outputted cache file is > 1024 bytes, which it usually will be if it's anything relatively large. Adding a lock file would be useful as well, as noted in the comments above, to avoid multiple processes trying to write to the same cache file.
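Another option is to make the cache write atomic, so a reader can never see a half-written file: write to a temporary file first and rename() it into place (a sketch, assuming the temp file is created on the same filesystem as the cache file):
$tmpfile = tempnam(dirname($cachefile), 'ttc'); // temp file next to the real one
file_put_contents($tmpfile, ob_get_contents());
rename($tmpfile, $cachefile); // atomic on the same filesystem: readers see the old or the new file, never a partial one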
You can double-check the file size with the filesize() function; if it's too small, act as if the cache were stale.
If there's no PHP in the file, you may want to use readfile() for performance reasons, to just spit the file back out to the end user.
This first script gets called several times for each user via an AJAX request. It calls another script on a different server to get the last line of a text file. It works fine, but I think there is a lot of room for improvement; I am not a very good PHP coder, so I am hoping that with the help of the community I can optimize it for speed and efficiency:
AJAX POST Request made to this script
<?php session_start();
$fileName = $_POST['textFile'];
$result = file_get_contents($_SESSION['serverURL']."fileReader.php?textFile=$fileName");
echo $result;
?>
It makes a GET request to this external script which reads a text file
<?php
$fileName = $_GET['textFile'];
if (file_exists('text/'.$fileName.'.txt')) {
$lines = file('text/'.$fileName.'.txt');
echo $lines[sizeof($lines)-1];
}
else{
echo 0;
}
?>
I would appreciate any help. I think there is more improvement that can be made in the first script. It makes an expensive function call (file_get_contents), well, at least I think it's expensive!
This script should limit the locations and file types that it's going to return.
Think of somebody trying this:
http://www.yoursite.com/yourscript.php?textFile=../../../etc/passwd (or something similar)
Try to find out where the delays occur: does the HTTP request take long, or is the file so large that reading it takes long?
If the request is slow, try caching results locally.
If the file is huge, then you could set up a cron job that extracts the last line of the file at regular intervals (or at every change), and save that to a file that your other script can access directly.
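A sketch of that cron idea (the paths and the 4 KB tail size are illustrative assumptions): read only the tail of the big file and store the last line in a small file the web script can serve directly.
$fp = fopen('/var/data/big.txt', 'rb');
fseek($fp, -4096, SEEK_END); // if the file is smaller than 4 KB, this fails and we read it all
$tail = stream_get_contents($fp);
fclose($fp);
$lines = array_filter(explode("\n", $tail)); // drop empty pieces, e.g. after a trailing newline
file_put_contents('/var/data/big.txt.lastline', end($lines));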
readfile() is your friend here: it reads a file on disk and streams it to the client.
script 1:
<?php
session_start();
// added basic argument filtering
$fileName = preg_replace('/[^A-Za-z0-9_]/', '', $_POST['textFile']);
$fileName = $_SESSION['serverURL'].'text/'.$fileName.'.txt';
if (file_exists($fileName)) {
// script 2 could be pasted here
//for the entire file
//readfile($fileName);
//for just the last line
$lines = file($fileName);
echo $lines[count($lines)-1];
exit(0);
}
echo 0;
?>
This script could further be improved by adding caching to it. But that is more complicated.
A very basic version could be:
script 2:
<?php
$lastModifiedTimeStamp = filemtime($fileName);
if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
$browserCachedCopyTimestamp = strtotime(preg_replace('/;.*$/', '', $_SERVER['HTTP_IF_MODIFIED_SINCE']));
if ($browserCachedCopyTimestamp >= $lastModifiedTimeStamp) {
header("HTTP/1.0 304 Not Modified");
exit(0);
}
}
header('Content-Length: '.filesize($fileName));
header('Expires: '.gmdate('D, d M Y H:i:s \G\M\T', time() + 604800)); // (3600 * 24 * 7)
header('Last-Modified: '.gmdate('D, d M Y H:i:s \G\M\T', $lastModifiedTimeStamp));
?>
First things first: Do you really need to optimize that? Is that the slowest part in your use case? Have you used xdebug to verify that? If you've done that, read on:
You cannot really optimize the first script usefully: if you need an HTTP request, you need an HTTP request. Skipping the HTTP request could be a performance gain, though, if it is possible (i.e. if the first script can access the same files the second script would operate on).
As for the second script: reading the whole file into memory does look like some overhead, but that is negligible if the files are small. The code is very readable; I would leave it as is in that case.
If your files are big, however, you might want to use fopen() and its friends fseek() and fread()
# Do not forget to sanitize the file name here!
# An attacker could demand the last line of your password
# file or similar! ($fileName = '../../passwords.txt')
$filePointer = fopen($fileName, 'r');
$fileSize = filesize($fileName);
$i = 1;
$chunkSize = 200;
# Read successively larger chunks from the end of the file until the
# chunk contains a newline (clamp the offset so we never seek past
# the start of the file)
do {
    $offset = min($i++ * $chunkSize, $fileSize);
    fseek($filePointer, -$offset, SEEK_END);
    $line = fread($filePointer, $offset);
} while (($pos = strrpos($line, "\n")) === false && $offset < $fileSize);
return $pos === false ? $line : substr($line, $pos + 1);
If the files are unchanging, you should cache the last line.
If the files are changing and you control the way they are produced, it might or might not be an improvement to reverse the order lines are written, depending on how often a line is read over its lifetime.
Edit:
Your server could figure out what it wants to write to its log, put it in memcache, and then write it to the log. The request for the last line could then be fulfilled from memcache instead of a file read.
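A sketch of that idea with the Memcached extension (the server address, key, and log path are illustrative assumptions):
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

// When the server writes a log line, cache it as well:
$line = date('c').' something happened';
file_put_contents('/var/log/app.log', $line."\n", FILE_APPEND | LOCK_EX);
$mc->set('app.log.lastline', $line);

// The AJAX endpoint can then answer from memory instead of reading the file:
echo $mc->get('app.log.lastline');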
The most probable source of delay is that cross-server HTTP request. If the files are small, the cost of fopen/fread/fclose is nothing compared to the whole HTTP request.
(Not long ago I used HTTP to retrieve images to dynamically generate image-based menus. Replacing the HTTP request with a local file read reduced the delay from seconds to tenths of a second.)
I assume that the obvious solution of accessing the file server filesystem directly is out of the question. If not, then it's the best and simplest option.
If not, you could use caching. Instead of getting the whole file, you just issue a HEAD request and compare the timestamp to a local copy.
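A sketch of that (assuming the remote script sends a Last-Modified header, as the caching example above does; the local cache path is an illustrative choice):
$url   = $_SESSION['serverURL'].'fileReader.php?textFile='.$fileName;
$local = '/tmp/lastline_'.md5($url);

// HEAD request: fetch only the headers
$ctx  = stream_context_create(['http' => ['method' => 'HEAD']]);
$head = get_headers($url, true, $ctx);
$remoteTime = isset($head['Last-Modified']) ? strtotime($head['Last-Modified']) : time();

if (!file_exists($local) || filemtime($local) < $remoteTime) {
    file_put_contents($local, file_get_contents($url)); // refresh the local copy
}
echo file_get_contents($local);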
Also, if you are ajax-updating a lot of clients based on the same files, you might consider looking at using comet (meteor, for example). It's used for things like chats, where a single change has to be broadcasted to several clients.