I have to make sure a certain PHP script (started by a web request) does not run more than once simultaneously.
With binaries, it is quite easy to check if a process of a certain binary is already around.
However, a PHP script may be run via several pathways, e.g. CGI, FCGI, inside web server modules, etc., so I cannot use system commands to find it.
So how do I reliably check whether another instance of a certain script is currently running?
The exact same strategy is used as one would choose with local applications:
The process manages a "lock file".
You define a static location in the file system. Upon script startup you check whether a lock file exists in that location; if so, you bail out. If not, you first create that lock file, then proceed. During tear-down of your script you delete that lock file again. Such a lock file is a simple passive file; only its existence is of interest, often not its content. That is a standard procedure.
You can win extra candy points if you use the lock file not only as a passive semaphore, but also store the process id of the generating process in it. That allows subsequent attempts to verify whether that process actually still exists or has crashed in the meantime. That makes sense because such a crash would leave a stale lock file behind and thus create a deadlock.
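A minimal sketch of that idea, assuming a Unix-like host with the posix extension available; the lock path and the acquireLock() helper name are illustrative, not a fixed API:
// Sketch of a lock file that stores the PID of its creator, so a later
// run can detect a stale lock left behind by a crashed instance.
function acquireLock($lockFile)
{
    if (file_exists($lockFile)) {
        $oldPid = (int) trim((string) file_get_contents($lockFile));
        // Signal 0 does not kill anything; it only tests if the process exists.
        if ($oldPid > 0 && posix_kill($oldPid, 0)) {
            return false;               // another instance really is running
        }
        unlink($lockFile);              // stale lock from a crashed run
    }
    file_put_contents($lockFile, (string) getmypid());
    return true;
}

$lockFile = '/tmp/myscript.lock';       // assumed location
if (!acquireLock($lockFile)) {
    exit("Another instance is running\n");
}
register_shutdown_function(function () use ($lockFile) {
    unlink($lockFile);                  // tear-down: remove the lock again
});
// ... actual work ...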
The comments correctly point out that in some of the scenarios in which PHP scripts are used in a web environment, a process ID by itself may not be enough to reliably test whether a given task has been successfully and completely processed. To work around that issue, one could use a slightly modified setup:
The incoming request does not directly trigger the task-performing PHP script itself, but merely a wrapper script. That wrapper manages the lock file whilst delegating the actual task into a sub-request to the HTTP server. This allows the controlling wrapper script to use the additional information of the request state. If the task-performing PHP script really crashes without prior notice, then the requesting wrapper knows about it: each request is terminated with a specific HTTP status code, which allows the wrapper to decide whether the task-performing request has terminated normally or not. That setup should be reliable enough for most purposes. The chance of the trivial wrapper script itself crashing or being terminated falls into the area of a system failure, which is something no locking strategy can reliably handle.
As PHP does not always provide a reliable way of file locking (it depends on how the script is run, e.g. CGI, FCGI, server modules, and the configuration), some other environment for locking should be used.
The PHP script can, for example, call another PHP interpreter in its CLI variant. That provides a unique PID that can be checked for locking. The PID should then be stored in a lock file, which can be checked for a stale lock by querying whether a process with that PID is still around.
Maybe it is also possible to do all tasks needing the lock inside a shell script. Shell scripts also provide a unique PID and release it reliably after exit. A shell script may also use a unique filename that can be used to check whether it is still running.
Semaphores (http://php.net/manual/de/book.sem.php) could also be used; they are explicitly managed by the PHP interpreter to reflect a script's lifetime. They seem to work quite well, however there is not much information around about how reliable they are in case of premature script death.
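For illustration, a small sketch using the System V semaphore functions from the linked manual page; the ftok() project id is an arbitrary choice and the sysvsem extension must be available:
// Semaphore-based single-instance guard (sysvsem extension).
$key = ftok(__FILE__, 'a');             // derive an IPC key from this file
$sem = sem_get($key, 1);                // a semaphore allowing one holder

if (!sem_acquire($sem, true)) {         // non-blocking acquire (PHP 5.6.1+)
    exit("Another instance holds the semaphore\n");
}
// ... do the exclusive work ...
sem_release($sem);
Note that sem_get() defaults to auto-releasing the semaphore when the request shuts down, which is exactly the lifetime coupling described above.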
Also keep in mind that external processes launched by a PHP script may continue executing even if the script ends. For example, a user abort under FCGI releases passthru processes, which carry on working even though the client connection is closed. They may be killed later once enough output has accumulated, or not at all.
So such external processes have to be locked as well, which can't be done by the PHP-acquired semaphores alone.
I will try to summarize my problem in order to make it understandable.
I have a script serverHandler.php that can start multiple servers using another script, server.php.
So I start a new server like this:
$server = shell_exec("php server.php");
So now I will have a server.php script running in the background until I manually kill it.
Is there a way to directly manage the killing of this server within the script serverHandler.php, like this?
// Start the script server.php
$server = shell_exec("php server.php");
// Stop the script that run on background
// So the server will be stopped
killTask($server);
Shell management of tasks is typically done using the ID of a process (PID). In order to kill the process, you must keep track of this PID and then provide it to your kill command. If your serverHandler is a command line script then keeping a local copy of the PID could suffice, but in a web interface over HTTP/HTTPS you would need to send back the PID so it could be managed.
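A hedged sketch of that idea, staying close to the names in the question (server.php and killTask() come from there); the nohup/redirect trick assumes a Unix-like host and is only one way to obtain the PID:
// Start server.php in the background and capture its PID instead of its output.
// shell_exec() as used in the question would block until server.php exits.
$pid = (int) shell_exec("nohup php server.php > /dev/null 2>&1 & echo $!");

// ... later, to stop that particular server ...
function killTask($pid)
{
    $pid = (int) $pid;
    if ($pid > 0) {
        exec('kill ' . $pid);           // or posix_kill($pid, SIGTERM)
    }
}
killTask($pid);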
Using a stateless language like PHP for this is not recommended, however, as attempting to retrieve process information, determine whether or not the process is one of the server processes previously dispatched, and other fine little details will be unnecessarily complicated and, if you're not careful, error-prone and potentially even dangerous.
Better would be to use a stateful language like Java or Python for managing these processes. By using a single point of access with a maintained state, you can have several threads "waiting" on these processes so that:
you know for certain which PIDs are expected to be valid at all times,
you can avoid the need for excessive PID validation,
you minimize the security risks of bad PID validation,
you know if these processes end prematurely so you can remove them from the list of expected processes automatically
you can keep track of which PID is associated with which server instance.
Use the right tools for the problem you're trying to solve. PHP really isn't the tool for this particular problem (your servers can be written in PHP, but use a different language for your serverHandler to avoid headaches).
I've seen a lot of examples where a "lock" file is used to keep track of if a PHP script is currently running.
Example:
script starts
checks if "/tmp/lockfile" is currently locked
If it is locked, exit. If not, lock the file and continue
This way, if a long-running script is started up twice, only the first instance will run. Which is great.
However, it seems like the wrong way to go around it. Why don't we just check if the process is already running like this?
if(exec("ps -C " . basename(__FILE__) . " --no-headers | wc -l") > 1){
echo "Already running.";
exit;
}
Are there any potential pitfalls to this method? Why do I see the "lock" file workaround so often? It definitely seems more accurate to count the processes with the name we're looking for....
Based on comments here and my own observations, I've composed a list of pro's and con's of both approaches:
flock method:
pros:
More compatible across operating systems
No knowledge of bash required
More common approach, lots of examples
Works even with exec() disabled
Can use multiple locks in a single file to allow different running "modes" of the same file at the same time
cons:
It's not definite. If your lock file is deleted by an external process / user, you could end up with multiple processes. If you're saving the lock file in the /tmp directory, that's a valid possibility, since everything in this directory is supposed to be "temporary"
Under certain circumstances, when a process dies unexpectedly, the file lock can be transferred to an unrelated process (I didn't believe this at first, but I found instances of it happening (although rarely) across 200+ unix based systems, in 3 different operating systems)
exec("ps -C...") method
pros:
Since you're actually counting the processes, it will work every time, regardless of the state of file locks, etc.
cons:
Only works in linux
requires "exec" to be enabled
If you change the name of your script, it could cause double processes (and make sure your script name isn't hard-coded in the code)
Assumes that your script only has one running "mode"
EDIT: I ended up using this:
if (exec("pgrep -x " . $scriptName . " -u ". $currentUser . " | wc -l") > 1)
{
echo $scriptName . " is already running.\n";
exit;
}
... because ps doesn't allow you to filter on the owner of the process in addition to the process name, and I wanted to allow this script to run multiple times if a different user was running it.
EDIT 2:
... So, after having that running for a few days, it's not perfect either. Somehow, the process started up multiple times on the same machine under the same user. My only guess is that there was some issue (ran out of memory, etc) that caused the pgrep to return nothing, when it should have returned something.
So that means that NEITHER the flock method NOR the process-counting method is 100% reliable. You'll have to determine which approach will work better for your project.
Ultimately, I'm using another solution that stores the PID of the current task in a "lock" file that's not actually locked with flock. Then, when the script starts up, it checks whether the lock file exists and, if it does, reads its contents (the PID from the last time the script started up). Then it checks whether that process is still running by comparing the /proc/#PID#/cmdline contents with the name of the script that's running.
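A rough sketch of that last approach, assuming Linux (because of /proc); the PID-file path is illustrative:
// Plain PID file (no flock) plus a /proc/<pid>/cmdline comparison.
$pidFile = '/tmp/myscript.pid';

if (file_exists($pidFile)) {
    $oldPid  = (int) trim((string) file_get_contents($pidFile));
    $cmdline = @file_get_contents("/proc/$oldPid/cmdline");
    // cmdline is NUL-separated; checking that the script name occurs is enough here.
    if ($cmdline !== false && strpos($cmdline, basename(__FILE__)) !== false) {
        exit(basename(__FILE__) . " is already running.\n");
    }
}
file_put_contents($pidFile, (string) getmypid());
// ... long-running work ...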
The most important reasons that lock files are used, unfortunately not given in the other answers, are that lock files use a locking mechanism that is atomic, allow you to run multiple instances of your script, work at a higher-level context than the script itself, and are more secure.
Lock files are atomic
Iterating through a process list is inherently prone to race conditions; in the time it takes to retrieve and iterate through the list a second process might have just spawned, and you unintentionally end up with multiple processes.
The file locking mechanism is strictly atomic. Only one process can get an exclusive lock to a file, so when it does have a lock, there is no possibility the same command will be running twice.
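For illustration, a non-blocking exclusive lock in PHP could look like this (the path is an arbitrary choice):
// The OS grants LOCK_EX to at most one process at a time, so the check
// and the acquisition happen atomically.
$fp = fopen('/tmp/myscript.lock', 'c'); // 'c': create if missing, don't truncate

if (!flock($fp, LOCK_EX | LOCK_NB)) {
    exit("Already running.\n");         // someone else holds the lock
}
// ... exclusive work ...
flock($fp, LOCK_UN);                    // also released automatically on exit
fclose($fp);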
Lock files allow for multiple instances of the same script
Say you want to have several separate instances of a script running, but each with its own scope/options. If you'd simply count the number of times the script appears in ps output, you'd only be able to run a single instance. By using separate lock files you can properly lock its running state in an individual scope.
Lock files are more secure
Lock files are simply a much more elegant and secure design. Instead of abusing a process list (which requires a lot more permissions and would have to be accessed and parsed differently on each OS) to deduce if a script is already running, you have a single dedicated system to explicitly lock a state.
Summary
So to reiterate all the reasons lock files are used over other methods:
Lock files allow for atomic locking, negating race conditions.
They allow for running multiple instances/scopes of the same script.
File system access is more universally available than execute permissions or full access to process information.
They are more secure, because no permissions outside of the specific lock file are needed.
Using file locks is more compatible across different operating systems.
It is a more elegant solution; doesn't require parsing through a process list.
As for the cons mentioned in other answers; that the lock file can be deleted and sometimes the lock is transferred to another process:
Deletion of lock files is prevented by setting proper permissions and storing the file on non-volatile storage. Lock files are "definite" if used according to spec, unlike a process list.
It's true that a process which forks a child process will "bequeath" its lock to any child processes that are still running; however, this is easily remedied by explicitly unlocking the file once the script is done, e.g. using flock --unlock.
Long story short: you should always use lock files over parsing running processes.
Other solutions
There are other solutions, though they usually offer no benefit over simple file locks unless you have additional requirements:
Mutexes in a separate database / store (e.g. redis).
Exclusively listening to a port on a network interface.
First, the command is not right. When I run php test.php, the command
ps -C test.php
returns nothing, because the process name is php, not test.php. You can use ps -aux|grep 'test.php' -c to get a process count, but the number returned by exec("ps -aux|grep 'php test.php' -c"); has to have 2 subtracted from it to get the real number of processes, because the grep itself and the shell spawned by exec() also match.
The main reason to use a lock file instead is that exec() and the other command functions need special permissions and are often disabled via disable_functions in the php.ini configuration.
A test script looks like this:
// grep matches its own command line and the shell spawned by exec(),
// so a single real instance yields a count of 3; anything higher means
// another copy of this script is already running.
$count = exec("ps -aux|grep 'php test.php' -c");
if ($count > 3) {
    echo "Already running. " . $count;
    exit;
}
while (1) {
    sleep(20);
}
The main reason why "lock files" are used is simply because they can be expected to work in any host-environment. If you can "create a file," and if you can "lock it," such code will work ... anywhere.
Also – "there is nothing to be gained by being 'clever.'" This well-known strategy is known to work well – therefore, "run with it."
I want to have my own variable (most likely an array) storing what my PHP application is up to right now.
The application can trigger a few background processes (like downloading files), and I want to have a list of what is currently being processed.
For example
if php calls exec() that will be downloading for 15mins
and then another download starts
and another download starts
then if I access my application I want to be able to see that 3 downloads are in progress, if none of them have finished yet.
Can I do that? Only in memory, without storing anything on the disk?
I thought that the solution would be a some kind of server variable.
PHP doesn't have knowledge of previous processes. As soon as a PHP process is finished, everything it knows about itself goes with it.
I can think of two options. Write knowledge about the spawned processes to a file or database and use it to sync all your PHP requests (store the PID of each spawned process), as sketched below.
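A minimal sketch of that first option, assuming a Linux host; the registry file name and the wget URL are placeholders:
// Every spawned download appends its PID to a registry file, and the web
// request later filters out PIDs that are no longer alive.
$registry = '/tmp/downloads.pids';      // illustrative location

// When spawning a background download:
$pid = (int) shell_exec("nohup wget -q 'http://example.com/file' > /dev/null 2>&1 & echo $!");
file_put_contents($registry, $pid . PHP_EOL, FILE_APPEND | LOCK_EX);

// When the web app wants to list what is currently in progress:
$active = 0;
foreach (file($registry, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $pid) {
    if (is_dir('/proc/' . (int) $pid)) {    // Linux-only liveness check
        $active++;
    }
}
echo $active . " downloads in progress\n";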
Or
Create a daemon. The people behind PHP have worked hard to clean up PHP memory handling and such to make this more feasible. Take a look at their PEAR package - http://pear.php.net/package/System_Daemon
Off the top of my head, a quick architecture would be composed of three pieces:
Part A) The web app that will take in requests for downloads, and report back the progress of all requests
Part B) Your daemon, which accepts requests for downloads, spawns processes, and reports back the status of all spawned requests
Part C) The spawned process that performs the download you need.
Anyone for shared memory?
Obviously you would have to have some sort of daemon, but you could use the built-in semaphore and shared memory functions to easily have contact between each of the scripts. You need to be careful though, because if you're not closing the memory blocks properly, you risk ending up with no blocks left.
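If the sysvshm extension is available, a rough sketch of sharing a small status list between processes might look like this (the ftok() project id, segment size, and variable index are arbitrary choices):
// Shared memory sketch using the System V shm functions (sysvshm extension).
$key = ftok(__FILE__, 'd');             // derive an IPC key from this file
$shm = shm_attach($key, 16384);         // attach/create a 16 KB segment

// A background worker registers what it is doing:
$list   = shm_has_var($shm, 1) ? shm_get_var($shm, 1) : array();
$list[] = array('pid' => getmypid(), 'file' => 'video.mp4');
shm_put_var($shm, 1, $list);

// A web request reads the current list:
$current = shm_has_var($shm, 1) ? shm_get_var($shm, 1) : array();
echo count($current) . " tasks in progress\n";

shm_detach($shm);                       // detach; shm_remove() frees the block
In practice you would wrap the read-modify-write in a semaphore (sem_acquire/sem_release) so two workers cannot overwrite each other's update.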
You can't store your own variables in $_SERVER. The best method would be to store your data in a database where and query/update it as required.
I am new to PHP. I understand I can use flock() to lock a file and avoid race conditions when two users reach the same php file adding content to the lockable file.
However, what happens if a php process crashes? What happens to the next user waiting for the lockable file? What happens if the server crashes (someone pulls the plug)? Is the lock automatically released? Will the file remain locked after rebooting the server?
To make it short, does PHP make sure such critical situations (i.e., lock not explicitly released) are handled properly? If not, how should one deal with these situations? How to recover from these?
Locks are handled by the OS. Therefore:
if a process crashes, all locks it held are released (along with any other kind of resource it held)
if the system crashes, locks are meaningless because they do not "carry over" to the next reboot
PHP does not need to do anything special other than use the OS-provided mechanism for locking files, so in general you are perfectly safe.
However, if your web server setup is such that each request is not handled by a new process then if one request is abnormally terminated (let's say a thread is aborted) the lock will persist and block all further requests for the lock, quickly resulting in a deadlocked web server. That's one of the many reasons that you really, really should not use setups that do not provide process-level isolation among requests (disclaimer: I am not a web server expert -- I could be wrong in the "should not" part, even though I doubt it).
I am developing a website that requires a lot of background processes for the site to run. For example, a queue, a video encoder and a few other types of background processes. Currently I have these running as a PHP CLI script that contains:
while (true) {
// some code
sleep($someAmountOfSeconds);
}
OK, these work fine and everything, but I was thinking of setting these up as a daemon, which will give them an actual process ID that I can monitor; also I could run them in the background and not have a terminal open all the time.
I would like to know if there is a better way of handling these. I was also thinking about cron jobs, but some of these processes need to loop every few seconds.
Any suggestions?
Creating a daemon which you can make calls to and ask questions of would seem the sensible option. It depends on whether your hoster permits such things; especially if you require it to do work every few seconds, an OS-based service/daemon definitely seems far more sensible than anything else.
You could create a daemon in PHP, but in my experience this is a lot of hard work and the result is unreliable due to PHP's memory management and error handling.
I had the same problem, I wanted to write my logic in PHP but have it daemonised by a stable program that could restart the PHP script if it failed and so I wrote The Fat Controller.
It's written in C, runs as a daemon and can run PHP scripts, or indeed anything. If the PHP script ends for whatever reason, The Fat Controller will restart it. This means you don't have to take care of daemonising or error recovery - it's all handled for you.
The Fat Controller can also do lots of other things such as parallel processing which is ideal for queue processing, you can read about some potential use cases here:
http://fat-controller.sourceforge.net/use-cases.html
I've done this for 5 years, using PHP to run background tasks, and it's no different from doing it in any other language. Just use CRON and lock files. The lock file will prevent multiple instances of your script running.
Also, it's important to monitor your code, and one check I always do to prevent stale lock files from blocking scripts is to have a second CRON job that checks whether the lock file is older than a few minutes and whether an instance of the PHP script is running; if not, it removes the lock file.
Using this technique allows you to set your CRON to run the script every minute without issues.
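A sketch of that second, stale-lock cleanup job, assuming pgrep is available and using placeholder names for the worker script and lock file:
// cleanup.php - run from CRON every minute alongside the worker's own CRON entry.
$lockFile   = '/tmp/worker.lock';
$maxAgeSecs = 300;                      // "older than a few minutes"

if (file_exists($lockFile) && (time() - filemtime($lockFile)) > $maxAgeSecs) {
    // Is an instance of worker.php actually still running?
    $running = (int) exec("pgrep -f worker.php | wc -l");
    if ($running === 0) {
        unlink($lockFile);              // stale lock: remove it
    }
}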
Use the System::Daemon module from PEAR.
One solution (that I really need to try myself, as I may need it) is to use cron, but get the process to loop for five mins or so. Then, get cron to kick it off every five minutes. As one dies, the next one should be finishing (or close to finishing).
Bear in mind that the two may overlap a bit, and so you need to ensure that this doesn't cause a clash (e.g. writing to the same video file). Some simple inter-process communication may be useful, even if it is just writing to a PID file in the temp directory.
This approach is a bit low-tech but helps avoid PHP hanging onto memory over the longer term - sort of in-built task restarts!