Symfony 2 asset, check if file exists performance - php

In Symfony 2, using Twig, you can link an image with the asset() function: {{ asset('path_to_the_image') }}. However, if the file does not exist, the src still points to the missing file.
With that in mind, I'm tempted to add a function to my Twig extensions that checks whether the file exists: if it does, the function returns the given URL; if it doesn't, it returns a default not_image placeholder instead.
The motivation for this function is to always show an image to the user.
Now, my question.
I can't quite work out the performance implications of this approach, because the file will be checked twice: once to test whether it exists, and again when the browser actually requests it. The usual approach is to output the asset URL and, if the file does not exist, replace it with a default image using JavaScript.
I would use PHP's file_exists function; the manual says it is very inexpensive and that, if the file does exist, the result is cached to avoid performance problems.
Thanks in advance...

file_exists triggers a read access to your file system (or rather, just to the filesystem metadata). This is indeed very inexpensive. Keep in mind that running a Symfony application already accesses hundreds of PHP files on its own.
The result of file_exists is indeed cached, but only for the duration of a single script execution. So if you call file_exists several times within one execution (and don't call clearstatcache in between), the cached result is reused. If you run the script again later, PHP will look for the file again.
However, if you really worry about performance, you shouldn’t check for the files’ existence “on the fly”, but instead create a Symfony command that checks if all linked assets are valid during building or deployment.
This command would basically render all pages, and your custom asset function would, instead of returning a dummy image, throw an exception. This way, you only need to check once, during deployment, if all assets are valid.
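For reference, a minimal sketch of what such a Twig extension function could look like. The class and function names, the not_found.png fallback, and the web-root constructor argument are all assumptions for illustration; registering the extension as a service is omitted.

<?php
// Sketch of a Twig 1.x / Symfony 2 style extension that falls back to a
// placeholder image when the asset file is missing. All names are examples.
class AssetExistsExtension extends \Twig_Extension
{
    private $webRoot;

    public function __construct($webRoot)
    {
        // e.g. the absolute path to the web/ directory
        $this->webRoot = rtrim($webRoot, '/');
    }

    public function getFunctions()
    {
        return array(
            new \Twig_SimpleFunction('asset_or_default', array($this, 'assetOrDefault')),
        );
    }

    public function assetOrDefault($path, $default = 'images/not_found.png')
    {
        // One file_exists() call per (uncached) path per request.
        return file_exists($this->webRoot . '/' . $path) ? $path : $default;
    }

    public function getName()
    {
        return 'asset_exists_extension';
    }
}

In a template it would then be combined with asset(), e.g. {{ asset(asset_or_default('uploads/photo.jpg')) }}.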

Related

What is the PHP manual talking about with clearstatcache()?

You should also note that PHP doesn't cache information about non-existent files. So, if you call file_exists() on a file that doesn't exist, it will return false until you create the file. If you create the file, it will return true even if you then delete the file. However unlink() clears the cache automatically.
Source: https://www.php.net/manual/en/function.clearstatcache.php
I've read this numerous times but simply cannot make any sense of it. What is the actual information it's trying to convey?
To me, it sounds as if it's contradicting itself. First it says that PHP doesn't cache information about non-existent files. Then it goes on to state that it will return true even if you delete the file. But also that unlink() clears the cache automatically.
Is it referring to the file being deleted outside of PHP? Isn't that the only thing it can mean? But the way it's put is so incredibly confusing, ambiguous and weird. Why even mention that file_exists() will return false until you create the file? It's like saying that water will remain wet even if you clap your hands.
Is it actually saying, in a very round-about way, that I have to always run clearstatcache() before file_exists() unless I want a potentially lying response because a non-PHP script/program has deleted the file in question after the script was launched?
I swear I've spent half my life just re-reading cryptic paragraphs like this because they just don't seem to be written by a human being. I've many, many times had to ask questions like this about small parts of various manuals, and even then, who knows if your interpretations are correct?
I'd like to first address your last paragraph:
I swear I've spent half my life just re-reading cryptic paragraphs like this because they just don't seem to be written by a human being.
Quite the opposite: like all human beings, the people who contribute to the PHP manual are not perfect, and make mistakes. It's worth stressing that in this case these people are not professional writers being paid to write the text, they are volunteers who have spent their free time working on it, and yet the result is better than many manuals I've seen for paid software. If there are parts you think could be improved, I encourage you to join that effort.
Now, onto the actual question. Before going onto the part you quote, let's look at the first sentence on that page:
When you use stat(), lstat(), or any of the other functions listed in the affected functions list (below), PHP caches the information those functions return in order to provide faster performance.
What this is saying is that when PHP asks the system about the status of a file (permissions, modification times, etc), it stores the answer in a cache. Next time you ask about the same file, it looks in that cache rather than asking the system again.
Now, onto the part you quoted:
You should also note that PHP doesn't cache information about non-existent files.
Straight-forward enough: if PHP asks the system about the status of a file, and the answer is "it doesn't exist", PHP does not store that answer in its cache.
So, if you call file_exists() on a file that doesn't exist, it will return false until you create the file.
The first time you call file_exists() for a file, PHP will ask the system; if the system says it doesn't exist, and you call file_exists() again, PHP will ask the system again. As soon as the file starts existing, a call to file_exists() will return true.
Put another way, file_exists() is guaranteed not to return false if the file exists at the time you call it.
If you create the file, it will return true even if you then delete the file.
This is the point of the paragraph: as soon as the system says "yes, the file exists", PHP will store the information about it in its cache. If you then call file_exists() again, PHP will not ask the system; it will assume that it still exists, and return true.
In other words, file_exists() is not guaranteed to return false if the file doesn't exist, because the file might have previously existed and still have information filed in the cache.
However unlink() clears the cache automatically.
As you guessed, all of the above is about you monitoring if something else has created or deleted the file. This is just confirming that if you delete it from within PHP itself, PHP knows that any information it had cached about that file is now irrelevant, and discards it.
Perhaps a different way to word this would be to give a scenario: Imagine you have a piece of software that creates a temporary file while it's running; you want to monitor when it is created, and when it is deleted. If you write a loop which repeatedly calls file_exists(), it will start returning true as soon as the software creates the file, without any delay or false negatives; however, it will then carry on returning true, even after the software deletes the file. In order to see when it is deleted, you need to additionally call clearstatcache() on each iteration of the loop, so that PHP asks the system every time.
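To make that scenario concrete, a small sketch of such a monitoring loop (the path and the one-second interval are made up for the example):

<?php
// Watch a file that another program creates and later deletes.
$path = '/tmp/other-program.lock';

while (true) {
    // Without this, file_exists() would keep returning true from the stat
    // cache after the other program has deleted the file.
    clearstatcache();

    echo file_exists($path) ? "file is present\n" : "file is absent\n";

    sleep(1);
}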

Is this file operation really "atomic" and "safe"? If not, how can I make it so?

I have a function which is supposed to return a unique filename based on the input file path. For example, if the user sends in C:\meow\bark.txt, and C:\meow\bark.txt already exists, it returns C:\meow\bark_1.txt, or _2 if the _1 one already exists, and so on.
The crucial part of this function is currently like this:
if (!file_exists($candidate_file_path))
{
    if (touch($candidate_file_path))
        return $candidate_file_path;
}
That is, if the "candidate" file does not exist, and it can be created ("touched"), then it appears to be safe to return this as a unique file path.
However, I'm not so sure about this. My fear is that some other script, running at the same time, also tries to claim a file with the same name in the same directory. If script A is currently inside the first if and about to touch() its candidate, and script B has just done exactly that, script A's touch() will still return true, because touch() merely updates the file's timestamps and succeeds whether or not the file already existed. Thus, in such a situation, however unlikely, both scripts will end up thinking they hold the same "unique" file path!
This is obviously a recipe for disaster, and eerily similar to my database "transactions" nightmares of the not-so-distant past.
While I'm vaguely aware of the concept of "exclusive file locks", I have to be honest: no matter how much I read about that and tried to use it in the past for other things, I could never figure out how. It would really help me if you could think of some way to make this operation "atomic", or perhaps simply to replace the touch call with some other function which I don't even know exists?
I should note that the PHP function is_writeable has caused me serious issues in the past, which made me stop using it, according to my own internal comments to myself. If this can be avoided, it would also be good.
This operation is not atomic and not safe.
But there is a way to create a file only if it does not already exist: use fopen() with the 'x' mode (or 'x+' if you also need to read), as documented in the manual. If the file already exists, fopen() fails and returns false, so the existence check and the creation happen in a single atomic call.
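A rough sketch of how that could replace the file_exists()/touch() pair; the function name and the numbering scheme are my own, following the bark_1.txt example from the question:

<?php
// Create a file that is guaranteed not to have existed before, appending
// _1, _2, ... to the name until creation succeeds.
function create_unique_file($path)
{
    $candidate = $path;
    $i = 0;

    while (true) {
        // 'x' mode: create for writing, fail if the file already exists.
        // The @ suppresses the warning; the false return value is our signal.
        $handle = @fopen($candidate, 'x');
        if ($handle !== false) {
            fclose($handle);
            return $candidate;   // we, and only we, created this file
        }

        $i++;
        $info = pathinfo($path);
        $candidate = $info['dirname'] . '/' . $info['filename'] . '_' . $i
                   . (isset($info['extension']) ? '.' . $info['extension'] : '');
    }
}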

PHP - Global, Update-Able Variable Available to All Clients

GOAL: To have a global variable that any PHP script on my website can access. The variable would be a bool.
What I am stuck on is how I can store such a variable that is available to all php scripts, but can also be updated via php.
The variable is a bool that determines whether or not the site loads advertisements, based on whether a certain criterion was met that day. Every day a cron job will run to reset this variable, which means it needs to be updateable via PHP.
The only way I can think of to store it is either via a db table, which seems like overkill just for one little bool, or a json file that I store outside of the public_html directory.
With the JSON file, I would just read it on page load with file_get_contents via my "class lib" file that is present on all pages of the site, and do something similar to update it from the cron job.
NOTE: I do have a php file that is present on ALL of my pages, so including a file on every page is not a problem.
Is there a better way? It would be nice if I could just set a PHP superglobal or something, but I'm unsure whether setting something like $_SERVER['custom-variable'] sticks around or only lasts for that single request.
Apologies if this is a simple misunderstanding of how PHP superglobals/constants work.
I appreciate any help.
A couple of options:
Just store it in the database. This is a perfectly reasonable solution.
Store it in a file, in whatever format you want. JSON is handy if you want to store anything more complex than a single string or number.
Store it in a PHP file which returns a value, e.g.
<?php return array("ads_enabled" => true);
then require() that file -- the require() call will return that value. If your server has a PHP opcode cache enabled, this will be faster than loading a normal file, as the contents of the file will be cached.
Note that the file cannot return false, as that's indistinguishable from the include() failing.
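A rough sketch of how the PHP-file option might look in practice; the file name and the array key are examples, not anything the question defines:

<?php
// Shared flag stored as a PHP file that returns an array.
$configFile = __DIR__ . '/ads_config.php';

// Reading it (e.g. in the "class lib" file included on every page):
$config = require $configFile;              // returns array('ads_enabled' => true)
$adsEnabled = !empty($config['ads_enabled']);

// Updating it (e.g. from the daily cron script):
$newConfig = array('ads_enabled' => false);
file_put_contents(
    $configFile,
    '<?php return ' . var_export($newConfig, true) . ';',
    LOCK_EX
);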
The following are not viable options:
Storing it in a session. Sessions are per-user, and start out empty.
Storing it in an in-memory cache, like APCu or Memcache. Caches are not persistent storage; the value may be evicted from the cache.
Storing it in an environment variable. Environment variables are awkward to update in most server environments.
SetEnv APPLICATION_ENV "development"
Use that in your Apache vhost definition if you are using Apache and have access to modify it. I think you can do something similar in .htaccess files.
You can use a .env file and a library for reading that file.
If you are using a front controller where all requests pass through a single index.php file then you can set a global variable or constant in there.
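A small sketch of those last two suggestions; ADS_ENABLED is a made-up constant name and the fallback value is an assumption:

<?php
// Reading a value set with SetEnv (or loaded from a .env file by a library):
$environment = getenv('APPLICATION_ENV') ?: 'production';

// In a front controller (index.php) that every request passes through,
// a constant defined here is visible to all code included afterwards:
define('ADS_ENABLED', true);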

Php File function doubts: URL or Path

Using the file function, is there any difference between using a URL and using a path?
$my_array = file("http://www.mydomain.com/my_script.php?id=1");
$my_array = file($_SERVER['DOCUMENT_ROOT']."my_script.php?id=1");
I'm using the first one, but I guess it depends on my server's internet connection, because sometimes, even though the script is called (I know because my_script.php inserts a row in the database), I don't get the response and $my_array is empty.
Am I right?
If so, the second call would always fill $my_array with a response, wouldn't it?
Can I call a file from a path and pass arguments the same way I would with the URL?
Edit: Thanks a lot for your answers, and sorry if this question is too stupid.. I'm working on some other guy's code. He was doing it this way because my_script.php is also called from another server. I'll try to do it with require, preparing the $_GET variable first (a bit tricky, but I don't want to touch my_script.php).
Any path starting with http://... will make an actual HTTP request. Essentially, it does the same thing as typing that URL into your browser. If you're doing this for a script on your own server, it makes very little sense, because:
it needs to "go out" and establish a TCP connection to itself
it ties up another web server process
it spins up another, separate PHP process
it doesn't share the context of the current PHP process
it may not be able to return anything at all, because it will only return whatever the other script echoes out
This only ever makes sense if you're trying to communicate with some other remote server.
On the other hand, using ?id=1 on a local, non-http://... file is not possible, since ?id=1 is not a valid part of a file name (or at least it probably doesn't do what you think it does).
What you typically want is something like:
require __DIR__ . '/foo.php';
This includes the other PHP script in your current script, as PHP code. Ideally you should be defining functions and classes, autoloading them, and calling them as needed, but proper code organisation is quite a broad topic that I won't go into here.
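Following the edit in the question, a rough sketch of what that could look like. The function name insert_and_fetch_rows is an assumption; the second variant is the quick-and-dirty workaround of preparing $_GET before including the untouched script:

<?php
// Preferred: put the logic of my_script.php into a function and call it
// directly, in the current PHP process, instead of making an HTTP request.
require_once __DIR__ . '/my_script_functions.php';
$my_array = insert_and_fetch_rows(1);

// Workaround if my_script.php cannot be refactored: fake the query string
// and capture its output.
$_GET['id'] = 1;
ob_start();
require __DIR__ . '/my_script.php';
$my_array = explode("\n", ob_get_clean());   // roughly what file() returned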
All file() does is read a file. It's intended to load something from disk, but supports any fopen wrapper PHP provides, such as HTTP.
When you pass a URL to file(), it goes and fetches that URL from your web server. Your web server executes the PHP and returns the result, which is what you get back from file(). So no, the two calls are completely different, mainly because one involves the web server and PHP, and the other doesn't.
Don't do it this way. Use require() or require_once() instead.

Security implications of writing files using PHP

I'm currently trying to create a CMS using PHP, purely in the interest of education. I want administrators to be able to create content, which will be parsed and saved to server storage as plain HTML, to avoid the overhead of executing PHP scripts. Unfortunately, I could only think of a few ways of doing so:
Setting write permission on every directory where the CMS should want to write a file. This sounds like quite a bad idea.
Setting write permissions on a single cached directory. A PHP script could then include or fopen/fread/echo the content from a file in the cached directory at request-time. This could perhaps be carried out in a Mediawiki-esque fashion: something like index.php?page=xyz could read and echo content from cached/xyz.html at runtime. However, I'll need to ensure the sanity of $_GET['page'] to prevent nasty variations like index.php?page=http://www.bad-site.org/malicious-script.js.
I'm personally not too thrilled by the second idea, but the first one sounds very insecure. Could someone please suggest a good way of getting this done?
EDIT: I'm not in favour of fetching data from the database on every request. The only time I would want to fetch data from the database would be when the content is being cached. Secondly, I do not have access to memcached or any PHP accelerator.
Since you're building a CMS, you'll have to accept that if the user wants to do evil things to visitors, they very likely can. That's true regardless of where you store your content.
If the public site is all static content, there's nothing wrong with letting the CMS write the files directly. However, you'll want to configure the web server to not execute anything in any directory writable by the CMS.
Even though you don't want to hit the database every time, you can set up a cache to minimize database reads. Zend_Cache works very nicely for this, and can be used quite effectively as a stand-alone component.
You should put your pages in a database and retrieve them using parameterized SQL queries.
I'd go with the second option, but modify it so the files are retrieved using mod_rewrite rather than a custom PHP function.
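As for the sanity check on $_GET['page'] mentioned in the question, a minimal sketch of the second option (the file layout and the whitelist pattern are assumptions):

<?php
// index.php?page=xyz serves cached/xyz.html, rejecting anything that is not
// a plain page name (e.g. URLs or ../ path traversal).
$page = isset($_GET['page']) ? $_GET['page'] : 'home';

if (!preg_match('/^[a-z0-9_-]+$/i', $page)) {
    header('HTTP/1.1 404 Not Found');
    exit('Page not found');
}

$file = __DIR__ . '/cached/' . $page . '.html';

if (is_file($file)) {
    readfile($file);
} else {
    header('HTTP/1.1 404 Not Found');
    exit('Page not found');
}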
