What is the PHP manual talking about with clearstatcache()? - php

You should also note that PHP doesn't cache information about non-existent files. So, if you call file_exists() on a file that doesn't exist, it will return false until you create the file. If you create the file, it will return true even if you then delete the file. However unlink() clears the cache automatically.
Source: https://www.php.net/manual/en/function.clearstatcache.php
I've read this numerous times but simply cannot make any sense of it. What is the actual information it's trying to convey?
To me, it sounds as if it's contradicting itself. First it says that PHP doesn't cache information about non-existent files. Then it goes on to state that it will return true even if you delete the file. But also that unlink() clears the cache automatically.
Is it referring to the file being deleted outside of PHP? Isn't that the only thing it can mean? But the way it's put is so incredibly confusing, ambiguous and weird. Why even mention that file_exists() will return false until you create the file? It's like saying that water will remain wet even if you clap your hands.
Is it actually saying, in a very round-about way, that I have to always run clearstatcache() before file_exists() unless I want a potentially lying response because a non-PHP script/program has deleted the file in question after the script was launched?
I swear I've spent half my life just re-reading cryptic paragraphs like this because they just don't seem to be written by a human being. I've many, many times had to ask questions like this about small parts of various manuals, and even then, who knows if your interpretations are correct?

I'd like to first address your last paragraph:
I swear I've spent half my life just re-reading cryptic paragraphs like this because they just don't seem to be written by a human being.
Quite the opposite: like all human beings, the people who contribute to the PHP manual are not perfect, and make mistakes. It's worth stressing that in this case these people are not professional writers being paid to write the text, they are volunteers who have spent their free time working on it, and yet the result is better than many manuals I've seen for paid software. If there are parts you think could be improved, I encourage you to join that effort.
Now, onto the actual question. Before going onto the part you quote, let's look at the first sentence on that page:
When you use stat(), lstat(), or any of the other functions listed in the affected functions list (below), PHP caches the information those functions return in order to provide faster performance.
What this is saying is that when PHP asks the system about the status of a file (permissions, modification times, etc), it stores the answer in a cache. Next time you ask about the same file, it looks in that cache rather than asking the system again.
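To make that concrete, here is a small illustration of the cache at work (the path is just an example, and the exact behaviour can vary, but filesize() is one of the affected functions listed in the manual):

<?php
// A small illustration of the stat cache; the path is just an example.
$path = '/tmp/statcache_demo.txt';

file_put_contents($path, 'hello');
var_dump(filesize($path));   // PHP asks the filesystem: 5 bytes, and caches the answer

file_put_contents($path, 'hello world');
var_dump(filesize($path));   // may still report 5, served from the stat cache

clearstatcache();            // throw the cached answer away
var_dump(filesize($path));   // 11 -- PHP asks the filesystem again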
Now, onto the part you quoted:
You should also note that PHP doesn't cache information about non-existent files.
Straight-forward enough: if PHP asks the system about the status of a file, and the answer is "it doesn't exist", PHP does not store that answer in its cache.
So, if you call file_exists() on a file that doesn't exist, it will return false until you create the file.
The first time you call file_exists() for a file, PHP will ask the system; if the system says it doesn't exist, and you call file_exists() again, PHP will ask the system again. As soon as the file starts existing, a call to file_exists() will return true.
Put another way, file_exists() is guaranteed not to return false if the file exists at the time you call it.
If you create the file, it will return true even if you then delete the file.
This is the point of the paragraph: as soon as the system says "yes, the file exists", PHP will store the information about it in its cache. If you then call file_exists() again, PHP will not ask the system; it will assume that it still exists, and return true.
In other words, file_exists() is not guaranteed to return false if the file doesn't exist, because the file might have previously existed, and had information filed in the cache.
However unlink() clears the cache automatically.
As you guessed, all of the above is about you monitoring if something else has created or deleted the file. This is just confirming that if you delete it from within PHP itself, PHP knows that any information it had cached about that file is now irrelevant, and discards it.
Perhaps a different way to word this would be to give a scenario: Imagine you have a piece of software that creates a temporary file while it's running; you want to monitor when it is created, and when it is deleted. If you write a loop which repeatedly calls file_exists(), it will start returning true as soon as the software creates the file, without any delay or false negatives; however, it will then carry on returning true, even after the software deletes the file. In order to see when it is deleted, you need to additionally call clearstatcache() on each iteration of the loop, so that PHP asks the system every time.
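A minimal sketch of that loop, with the path and the one-second polling interval as placeholders:

<?php
// Watch for a temporary file created and later deleted by another program.
$tempFile = '/tmp/other_software.tmp';

while (true) {
    clearstatcache();                // make PHP ask the filesystem, not its cache

    if (file_exists($tempFile)) {
        echo "file is there\n";      // reliable even without clearstatcache()
    } else {
        echo "file is gone\n";       // only reliable because of the clearstatcache() above
    }

    sleep(1);
}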

Related

PHP include inside loop - is it hard on disk io?

I'm working on a website that shows products.
The site is written in PHP.
To make it easier to maintain, I created a PHP file for the "product" item with thumbnail, price, etc.
I would like to know if it is hard on disk IO to put an include file inside a foreach. Let's say the array counts about 200 items.
foreach ($wines as $wine):
    require 'components/wine.php';
endforeach;
Are we still OK, or will there be some hosting issues?
Thanks!
Answer
Regarding your question though, it's probably OK with the disk. Files imported using require are cached as precompiled bytecode the same way as the main file (if you have OPcache or another opcode cache enabled), so PHP won't read the file from disk every time you include it.
Recommendation
I would not recommend that approach at all. A better approach would be to define a function that returns or displays whatever you want to show, then require the file once and call the function inside the loop.
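For illustration, that could look roughly like this (the file, function, and field names are made up):

<?php
// In practice render_wine() would live in components/wine.php and be pulled in
// once with require_once; here everything is in one snippet with example data.
$wines = [
    ['thumbnail' => 'img/merlot.jpg', 'name' => 'Merlot', 'price' => '12.50'],
    ['thumbnail' => 'img/syrah.jpg',  'name' => 'Syrah',  'price' => '14.00'],
];

function render_wine(array $wine): string
{
    return sprintf(
        '<div class="wine"><img src="%s" alt=""><h3>%s</h3><span>%s</span></div>',
        htmlspecialchars($wine['thumbnail']),
        htmlspecialchars($wine['name']),
        htmlspecialchars($wine['price'])
    );
}

// require_once 'components/wine.php';   // one include, compiled once
foreach ($wines as $wine) {
    echo render_wine($wine);             // a cheap function call per item
}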
I see several downsides to your approach:
It's bad practice: it couples your code, because the included file can now only be used from this loop. Its contents have to be aware of the file that is including it, which makes it harder to maintain and more prone to errors in the future.
It can cause problems in the future, e.g. if someone declares a function inside the included file, requiring it a second time would redeclare the function and trigger a fatal error.
It will cause some overhead in the execution, as PHP performs some validations and operations every time a file is included.
If you want more information about require or OPcache, the documentation is linked below:
https://www.php.net/manual/en/function.include.php
https://www.php.net/manual/en/intro.opcache.php

Is this file operation really "atomic" and "safe"? If not, how can I make it so?

I have a function which is supposed to return a unique filename based on the input file path. For example, if the user sends in C:\meow\bark.txt, and C:\meow\bark.txt already exists, it returns C:\meow\bark_1.txt, or _2 if the _1 one already exists, and so on.
The crucial part of this function is currently like this:
if (!file_exists($candidate_file_path))
{
    if (touch($candidate_file_path))
        return $candidate_file_path;
}
That is, if the "candidate" file does not exist, and it can be created ("touched"), then it appears to be safe to return this as a unique file path.
However, I'm not so sure about this. My fear is that some other script, running at the same time, also tries to assign a file with the same name in the same dir at the same time. If script A is currently inside the first if, and is about to "touch" its candidate, and script B has just already done that, then script A will still have its touch() return true, because it just sets the file system metadata for the file, and that's what it returns as "true"/successful. Thus, in such a situation, however unlikely, I will end up with both scripts thinking that they have the same "unique" file path!
This is obviously a recipe for disaster, and eerily similar to my database "transactions" nightmares of the not-so-distant past.
While I'm vaguely aware of the concept of "exclusive file locks", I have to be honest: no matter how much I read about that and tried to use it in the past for other things, I could never figure out how. It would really help me if you could think of some way to make this operation "atomic", or perhaps simply to replace the touch call with some other function which I don't even know exists?
I should note that the PHP function is_writeable has caused me serious issues in the past, which made me stop using it, according to my own internal comments to myself. If this can be avoided, it would also be good.
This operation is not atomic and not safe.
But there is a way to create a file that is guaranteed not to have existed before: use fopen() with the 'x' (or 'x+', if you also want to read) mode, as documented in the manual.
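A rough sketch of how that could look for your "bark_1.txt" naming scheme (the helper name and the loop cap are assumptions, not part of any standard API):

<?php
function create_unique_file(string $path): string
{
    $dir  = dirname($path);
    $base = pathinfo($path, PATHINFO_FILENAME);
    $ext  = pathinfo($path, PATHINFO_EXTENSION);
    $ext  = ($ext === '') ? '' : '.' . $ext;

    for ($i = 0; $i < 10000; $i++) {
        $candidate = ($i === 0)
            ? $path
            : $dir . DIRECTORY_SEPARATOR . $base . '_' . $i . $ext;

        // 'x' creates the file and fails if it already exists; the check and
        // the creation happen as a single filesystem operation, so another
        // script cannot grab the same name in between. The @ suppresses the
        // warning fopen() emits when the file already exists.
        $handle = @fopen($candidate, 'x');
        if ($handle !== false) {
            fclose($handle);
            return $candidate;
        }
    }

    throw new RuntimeException("Could not find a free name based on $path");
}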

is PHP fopen() as APPEND better than file_get_(put)contents(), to avoid losing logs from multiple requests?

Most examples I've seen for updating text-based log files seem to suggest checking that the file exists, and if so, loading it into a big string with file_get_contents(), adding your new logs onto it, and then writing it back with file_put_contents().
I may be over-thinking this, but I think I see two problems there. First, if the log file gets big, isn't it somewhat wasteful of the script's available memory to stuff the huge file contents into a variable? Second, it seems that if you did any processing between the 'get' and 'put', you risk the possibility that multiple site visitors may update between the two calls, resulting in lost log info.
So for a script that is simply called (GET or POST) and exited after doing some work, wouldn't it be better to just build up your current (shorter) log string to be written, and then, just before exit(), open in APPEND mode and WRITE?
It would seem that either approach could lead to losing data if there were no LOCK on the file between get and put. In the case of file_get/put_contents, I see that method does have a flag available called "LOCK_EX", which I assume attempts to prevent that occurrence. But then there is the issue of the time taken to move a large file into a variable and add to it before writing back. Wouldn't it be better to use fopen (append) with some kind of 'lock' between the fopen() and the fwrite()?
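For concreteness, the append-and-lock approach I have in mind looks roughly like this (the log path and entry are placeholders):

<?php
$logFile = __DIR__ . '/app.log';
$entry   = sprintf("[%s] request handled\n", date('c'));

// One call: append with an exclusive lock held for the duration of the write.
file_put_contents($logFile, $entry, FILE_APPEND | LOCK_EX);

// Or the fopen()/flock()/fwrite() equivalent.
$fh = fopen($logFile, 'ab');          // 'a' mode always writes at the end of the file
if ($fh !== false) {
    if (flock($fh, LOCK_EX)) {
        fwrite($fh, $entry);
        fflush($fh);
        flock($fh, LOCK_UN);
    }
    fclose($fh);
}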
I apologise, as I DO understand that "best way to do something" questions are not appreciated by the community. But surely there is a preferred way that addresses the concerns I'm raising?
Thanks for any help.

Symfony 2 asset, check if file exists performance

On Symfony 2, using Twig, you can use the asset() function to link an image in the way
{{ asset('path_to_the_image') }}. Now, if the file does not exist, the src of the image stays the same.
Thinking about that, I was tempted to create another Twig function in my Twig extensions to check for the file's existence, in order to do the following: if the file exists, I will use the given URL, and if it does not exist, I will change the result of the function to a default image that I will use as a "not found" image.
The motivation for this function is to always show an image to the user.
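For concreteness, the extension I have in mind would look roughly like this (class, function, and path names are placeholders; this is written against the Twig 1.x Twig_SimpleFunction API that Symfony 2 ships with):

<?php
// Placeholder extension: returns the given path if the file exists under the
// web root, otherwise a default "not found" image.
class ImageFallbackExtension extends \Twig_Extension
{
    private $webRoot;

    public function __construct($webRoot)
    {
        $this->webRoot = $webRoot; // e.g. the project's web/ directory
    }

    public function getFunctions()
    {
        return array(
            new \Twig_SimpleFunction('image_or_default', array($this, 'imageOrDefault')),
        );
    }

    public function imageOrDefault($path, $default = 'images/not_found.png')
    {
        return file_exists($this->webRoot . '/' . $path) ? $path : $default;
    }

    public function getName()
    {
        return 'image_fallback_extension';
    }
}

In the template it would be used as {{ asset(image_or_default('images/product.jpg')) }}, with the extension registered as a service tagged twig.extension.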
Now, my question.
I can't figure out the performance impact of this approach, because the file will be checked twice: the first time to check whether it exists, and the second time when the browser actually requests the file. The usual thing to do is to output the asset address and, in case the file does not exist, replace it with some default file using JavaScript.
I will use PHP's file_exists function, and in the manual I have read that it is very inexpensive, and that in case the file does exist the result is cached to avoid performance issues.
Thanks in advance...
file_exists triggers a read access to your file system (or rather, just to the filesystem metadata). This is very inexpensive indeed. Keep in mind that, when running a Symfony application, you’re usually accessing hundreds of PHP files alone.
The result of file_exists is indeed cached, but only during the execution of a script. So, if you call file_exists several times within one script execution (and don’t call clearstatcache in between), the result is cached. If you call the script again later, PHP will look for the file again.
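To illustrate that per-request caching (the path is just an example and is assumed to exist):

<?php
$path = __DIR__ . '/web/images/logo.png'; // example path, assumed to exist

var_dump(file_exists($path)); // asks the filesystem; the answer is cached for this request
var_dump(file_exists($path)); // answered from the stat cache, no second filesystem access

// The next request is a fresh script execution: the cache starts out empty,
// so PHP asks the filesystem again.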
However, if you really worry about performance, you shouldn’t check for the files’ existence “on the fly”, but instead create a Symfony command that checks if all linked assets are valid during building or deployment.
This command would basically render all pages, and your custom asset function would, instead of returning a dummy image, throw an exception. This way, you only need to check once, during deployment, if all assets are valid.

php script that deletes itself after completion

There's a problem that I'm currently investigating: after a coworker left, one night some files that he created, basically all his work on a completed project that the boss hasn't paid him for, got deleted. From what I know, all access credentials have been changed.
Is it possible to do this by setting up a file to do the deletion task and then delete the file in question? Or something similar that would change the code after the task has been done? Is this untraceable? (I'm thinking he could have cleverly disguised the request as a normal request, and I have skimmed through the code base and through the raw access logs and found nothing.)
It's impossible to tell whether this is what actually happened or not, but setting up a mechanism that deletes files is trivial.
This works for me:
<?php // index.php
unlink("index.php"); // the script deletes its own source file
It would be a piece of cake to set up a script that, if given a certain GET variable for example, would delete itself and a number of other files.
Except for the server access logs, I'm not aware of a way to trace this - however, depending on your OS and file system, an undelete utility may be able to recover the files.
It has already been said in the comments how to prevent this - using centralized source control, and backups. (And of course paying your developers - although this kind of stuff can happen to anyone.)
Is it possible to do this by setting up a file to do the deletion task
and then delete the file in question?
Yes, it is. He could have left an innocuous-looking PHP file on the server which, when accessed over the web later, would give him shell access. Getting this file to self-delete when he is done is possible.
Create a PHP file with the following in it:
<?php
// The unlink() call is commented out with '#' so that nothing is actually
// deleted while testing; the preg_replace() strips anything from a
// "(number) " onwards from the script's own path before unlinking it.
if (isset($_GET['vanish']) && $_GET['vanish'] == 'y') {
    echo "You wouldn't find me the next time you look!";
    #unlink(preg_replace('!\(\d+\)\s.*!', '', __FILE__));
} else {
    echo "I can self destruct ... generally";
}
?>
Put it on your server and navigate to it. Then navigate again with a "vanish=y" query argument and see what happens.
