How to avoid file deadlocks when PHP process/server crashes? - php

I am new to PHP. I understand I can use flock() to lock a file and avoid race conditions when two users hit the same PHP script that appends content to the lockable file.
However, what happens if a php process crashes? What happens to the next user waiting for the lockable file? What happens if the server crashes (someone pulls the plug)? Is the lock automatically released? Will the file remain locked after rebooting the server?
To make it short, does PHP make sure such critical situations (i.e., lock not explicitly released) are handled properly? If not, how should one deal with these situations? How to recover from these?

Locks are handled by the OS. Therefore:
if a process crashes, all locks it held are released (along with any other kind of resource it held)
if the system crashes, locks are meaningless because they do not "carry over" to the next reboot
PHP does not need to do anything special other than use the OS-provided mechanism for locking files, so in general you are perfectly safe.
However, if your web server setup is such that each request is not handled by a new process, then if one request is abnormally terminated (say, a thread is aborted), the lock will persist and block all further requests for it, quickly resulting in a deadlocked web server. That's one of the many reasons you really, really should not use setups that do not provide process-level isolation between requests (disclaimer: I am not a web server expert -- I could be wrong about the "should not" part, even though I doubt it).
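For reference, a minimal sketch of the flock() pattern the question refers to; the file name and the counter logic are just placeholders. If the process dies while holding the lock, the OS releases it when the file handle goes away:
$fp = fopen(__DIR__ . '/counter.txt', 'c+');
if ($fp === false) {
    exit('Could not open file');
}

if (flock($fp, LOCK_EX)) {              // blocks until the exclusive lock is granted
    $count = (int) stream_get_contents($fp);
    ftruncate($fp, 0);
    rewind($fp);
    fwrite($fp, (string) ($count + 1));
    fflush($fp);
    flock($fp, LOCK_UN);                // explicit release; the OS would also release
                                        // the lock if this process crashed here
}
fclose($fp);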

Related

LiteSpeed / PHP server parallel requests

On our PHP website we have noticed that if I request pages in 2 separate tabs, the second tab never starts loading until the first tab finishes processing on the PHP side (we use the LiteSpeed webserver). It's a cloud webhosting plan that should handle a notable number of visitors. We observed the same behavior on our previous hosting with Apache.
Could the issue be in the webserver configuration? Or could something in our application be blocking the processing?
If your application uses PHP sessions (intentionally or not), this will generally be the behavior.
When an application calls session_start(), PHP creates (or resumes) a session for you as a user and places a file lock on that session file until the script finishes executing; only then is the lock removed, and the next script waiting for the same session can proceed.
If you're not going to write any data into the session, it's beneficial to call session_write_close() early on, since this frees the lock and lets other requests (your other tab) be processed.
The locking happens because you obviously don't want two processes writing to the same session file at once, possibly overwriting data the other process wrote to the session.
Obviously it could also be something else in your application causing this, but you'd have to actually debug the code to figure it out. You could also use strace on the command line to gather more information.
PHP session locking is explained in depth in https://ma.ttias.be/php-session-locking-prevent-sessions-blocking-in-requests/ - it simply goes into a lot more detail than I did above.
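For illustration, a rough sketch of closing the session early; the session key and the sleep() stand-in are assumptions, not part of any real application:
session_start();                         // acquires the lock on this user's session file
$userId = $_SESSION['user_id'] ?? null;  // read whatever the script needs
session_write_close();                   // writes the session and releases the lock right away

// Long-running work below no longer blocks the same user's other requests.
// Note: anything written to $_SESSION after this point is silently discarded.
sleep(5);
echo 'Done for user ' . htmlspecialchars((string) $userId);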

configure Apache and MySQL for parallel and cancelable service to clients

We have a client-server architecture with Angular on the client side and Apache2, PHP (PDO) and MySQL on the server side. The server side exposes an API that gives the clients the data to show.
Some observations :
some API calls can take very long to compute and return a response.
the server side seems to handle a single request per client at any given time (I see only one corresponding query being executed in MySQL); that limit comes either from Apache or from MySQL, since the front-end definitely sends requests in parallel.
the front-end cancels requests that are no longer relevant (the data being fetched will not be visible)
requests cancelled by the front-end do not seem to be cancelled on the server side and continue to run anyway; I think that even if they are queued they will still run when their turn arrives (even though they were cancelled on the client side)
Need help to understand :
what exactly is the cause of not having all requests (or at least X>1 requests) run in parallel? Can it be changed?
What configuration should I change in either Apache or MySQL to overcome this?
Is there a way to make Apache drop cancelled requests, at least those that are still queued and not started?
Thanks!
EDIT
Following #Markus AO's comment (Thanks Markus!!!) this turned out to be session-blocking related... wish I had known about that before!
OP has a number of tangled problems on the table. However I feel these are worthwhile concerns (having wrestled with them myself), so let's take this apart. For great justice; main screen turn on:
Solving Concurrent Request Problems
There are several possible problems and solutions with concurrent connections in a (L)AMP stack. Before looking at tuning Apache and MySQL, however, let me gloss a common "mystery" issue that creates concurrency problems; namely, a necessary evil called "PHP Session Locking".
PHP Session Blocking & Concurrent Requests
In a nutshell: When you use sessions in your application, after calling session_start(), PHP locks the session file stored at your session.save_path directory. This file lock will remain in place until the script ends, or session_write_close() is called. Result: Any subsequent calls by the same user will be queued, rather than concurrently processed, to ensure there's no session data corruption. (Imagine parallel scripts writing into the same $_SESSION!)
An easy way to demonstrate this is to create a long-running script; then call it in your browser; and then open a new tab, and call it again (or in fact, call any script sharing the same session cookie/ID). You'll see that the second call won't execute until the first one is concluded. This is a common cause of strange AJAX lags, especially with parallel AJAX requests from a single page. Processing will be consecutive instead of concurrent. Then, 10 calls at 0.3 sec each will take a total of 3 sec to conclude, and so on. We don't want that, do we!
You can remedy request blocking caused by PHP session lock by ensuring that:
Scripts using sessions should call session_write_close() once done storing session data. The session lock will be immediately released.
Scripts that don't require sessions shouldn't start sessions to begin with.
Scripts that need to only read session data: Using session_start() with ['read_and_close' => true] option will give you a read-only (non-persistent) $_SESSION variable without session locking. (Available since PHP 7.)
Options 1 and 3 will leave you with read access for the $_SESSION variable and release/avoid the session lock. Any changes made to $_SESSION after the session is closed will be silently discarded; no warnings/errors are displayed.
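As a hedged illustration of option 3, a minimal read-only sketch (the 'theme' key is just a placeholder; requires PHP 7+):
// Read-only session access without holding the lock:
session_start(['read_and_close' => true]);

// $_SESSION is populated, but the session file lock is already released.
$theme = $_SESSION['theme'] ?? 'default';

// Any change made here is NOT persisted:
$_SESSION['theme'] = 'dark';   // silently discarded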
The session lock request blocking issue is only consequential for a single user (using the same session). It has no impact on multi-user concurrency. For further reading, please see:
SO: Session (Auto)-Start, Performance & Session Locking
SO: PHP & Sessions: Is there any way to disable PHP session locking?
In-Depth: PHP Session Locking: How To Prevent Sessions Blocking in PHP requests.
Apache & MySQL Concurrent Requests
Once upon a time, before realizing PHP was the culprit behind blocking/queuing my concurrent calls, I spent a small aeon in tweaking Apache and MySQL and wondering, what happen?
Apache 2.4 supports 150 concurrent requests by default; any further requests will queue up. There are several settings under the MPM/Multi-Processing Module that you can tune to support the desired level of concurrent connections. Please see:
MPM Docs
Worker Docs
Overview at Oxpedia
MySQL has options for max_connections (default 151) and max_user_connections (default unlimited). If your application sends a lot of concurrent requests per user, you'll want to ensure the global max connections is high enough to ensure a handful of users don't hog the entire DBMS.
Obviously, you'll further want to tune these settings in light of your server CPU/RAM specs. (The calculations for which are beyond this answer.) Your concurrency issues probably aren't caused by too many open TCP sockets, but hey, you never know...
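As a rough orientation only, here is what such tuning typically looks like; the directive names are real, but every number below is a placeholder to be sized against your own hardware and workload:
# httpd.conf -- event MPM sketch (illustrative values, not recommendations)
<IfModule mpm_event_module>
    StartServers             2
    ServerLimit             16
    ThreadsPerChild         25
    MaxRequestWorkers      400     # ServerLimit * ThreadsPerChild
    MaxConnectionsPerChild   0
</IfModule>

# my.cnf -- corresponding MySQL sketch
[mysqld]
max_connections      = 300
max_user_connections = 50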
Canceling Requests to Apache/PHP/MySQL
We don't have much to go on as far as your application's specific wiring, but I understand from the comments that as it stands, a user can cancel a request at the front-end, but no back-end action is taken. (Ie. any back-end response is simply ignored/discarded.)
"Is there a way to make Apache drop cancelled requests?" I'm assuming that your front-end sends the requests directly and without delay to Apache; and onward to PHP > MySQL > PHP > Apache. In that case, no, you can't really have Apache cancel the request that it's already received; or you could hit "stop", but chances are PHP and MySQL are already munching it away...
Holding a "Cancel Window"
However, you could program a "cancel window" lag into your front-end, where requests are only passed on to Apache after e.g. a 0.5-second wait for a possible cancel. This may or may not have a negative impact on the UX; it may be worth implementing to save server resources if a significant portion of requests are cancelled. This assumes a UI with JavaScript. If you're getting direct HTTP calls to the API, you could have a "sleepy proxy receiver" instead.
Using a "Cancel Controller"
How would one cancel PHP/MySQL processes? This is obviously only feasible if calls to your API result in processing of any significant duration. If the back-end takes 0.28 sec to process and the user cancels after 0.3 seconds, then there isn't much left to cancel, is there.
However, if you do have scripts that may run for longer, say a couple of seconds, you could find relevant break-points in your code where you have a "not-cancelled" check or a kill/rollback routine. Basically, you'd have the following flow:
Front-end sends request with unique ID to main script
PHP script begins the long march for building a response
On cancel: Front-end re-sends the ID to a light-weight cancel controller
Cancel controller logs ID to temporary file/database/wherever
PHP checks at break-points if there's a cancel request for current process
On cancel, PHP executes a kill/rollback routine instead of further processing
This sort of "cancel watch" will obviously create some overhead, and as such you may want to only incorporate this into heavier scripts, to ensure you actually save some processing time in the big picture. Further, you'd only want at most a couple of breakpoints at significant junctions. For read requests, you could just kill the process; but for write requests, you'd probably want to have a graceful rollback to ensure data integrity in your system.
You can also cancel/kill a long-running MySQL thread, already initiated by PHP, with mysqli::kill. For this to make sense, you'd want to run it as MYSQLI_ASYNC, so PHP's around to pull the plug. PDO doesn't seem to have a native equivalent for either async queries or kill. Came across $pdo->query('KILL CONNECTION_ID()'); and PHP Asynchronous MySQL Query (see answer for PDO). Haven't tested these myself. Also see: Kill MySQL query on user abort
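Also untested, but roughly, a MYSQLI_ASYNC plus mysqli::kill() combination could look like this (host and credentials are placeholders):
// Two connections: one runs the query asynchronously, the other can kill it.
$worker  = new mysqli('localhost', 'user', 'pass', 'db');
$monitor = new mysqli('localhost', 'user', 'pass', 'db');

$threadId = $worker->thread_id;
$worker->query('SELECT SLEEP(30)', MYSQLI_ASYNC);   // stand-in for a long-running query

// ... later, at a cancel check-point:
if (connection_aborted()) {
    $monitor->kill($threadId);      // terminates the query on the worker connection
}
// Otherwise, collect the result via mysqli::poll() / reap_async_query().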
PHP Connection Handling
As an alternative to a controller that passes the cancel signal "from the side", you could look into PHP Connection Handling and poll for aborted connection status at your cancel check-points with connection_aborted(). (See "MySQL kill" link above for a code example.)
A CONNECTION_ABORTED state follows if a user clicks the "stop" button in their browser. PHP has an ignore_user_abort setting (also settable at runtime via the ignore_user_abort() function), default "Off"; with the default, a user abort should abort the script. (In my experience though, if I have a rogue script and the session lock is held, I can't do anything until it times out, even when I hit "stop" in the browser. Go figure.)
If you leave "ignore user abort" off, i.e. the PHP script terminates on user abort, be aware that this will be a wholly uncontrolled termination unless you have register_shutdown_function() implemented. Even so, you'd have to flag check-points in your code for your shutdown function to be able to "rewind the clock" from the termination point onward. Also note this caveat:
PHP will not detect that the user has aborted the connection until an attempt is made to send information to the client. Simply using an echo statement does not guarantee that information is sent, see flush(). ~ PHP Manual on ignore_user_abort
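A minimal sketch of such polling, assuming the work can be split into chunks (the loop and sleep() are stand-ins for real work):
ignore_user_abort(true);   // keep control instead of letting PHP terminate the script mid-flight

for ($i = 0; $i < 20; $i++) {
    echo ' ';              // PHP only notices the abort when it tries to send output
    flush();               // add ob_flush() first if output buffering is enabled

    if (connection_aborted()) {
        // graceful rollback / cleanup here
        exit;
    }
    sleep(1);              // stand-in for one chunk of real work
}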
I have no experience with implementing "user abort" over AJAX/JS. For a starting point, see: Abort AJAX Request and Cancel an HTTP fetch() request. Not sure how/if they register with PHP. If you decide to travel down this road, please return and update us with your code / research!

PHP exec() Throttling

I have a webpage that, if some data is missing, runs an exec() command to gather the data I need. It all works great; the issue I want to avoid is someone spamming my site with the wrong kinds of URLs and triggering this call thousands of times.
I am removing potential items in the call, but I would really like to limit the call to, say, once every 5 seconds, no matter who makes it. I.e. the first call goes through fine, but if someone else tries during that time it would not be allowed for the set amount of time.
I wouldn't mind adding a tar to it later, i.e. if the next call is under 5 seconds increase to 10 seconds, etc., but for now I just want to add a safety throttle on the call.
Thanks
What you are looking for is called a "Lock" or a "Mutex"
In computer science, a lock or mutex (from mutual exclusion) is a synchronization mechanism for enforcing limits on access to a resource in an environment where there are many threads of execution. A lock is designed to enforce a mutual exclusion concurrency control policy.
There are a couple of libraries out there that take care of this for you and I would encourage you to use those, if this is an option.
Symfony Lock
php-lock/lock
PHP has/had a mutex implementation but I believe (if I understand things correctly) it is only for CLI applications because it came from the optional pthreads extension.
If you want to roll your own, you could literally just use a file on disk. If that file exists, you know that a process is running and other requests should abort/do something else. Once the running process is complete, you just need to delete that special file. You should also register an error handler and/or shutdown function that guarantees the file is deleted, just in case your exec logic throws a fatal error.
The actual file isn't important; it could just be /tmp/my-app-lock, but it is common on *nix to put these in /var/run or similar, if your process has access to it. You could also just put this in your website's folder.
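A hedged sketch of that roll-your-own variant, using fopen() in 'x' mode so the existence check and the creation are a single atomic step; the file path and command are placeholders:
$lockFile = sys_get_temp_dir() . '/my-app-exec.lock';

$fh = @fopen($lockFile, 'x');   // fails if the file already exists
if ($fh === false) {
    http_response_code(429);
    exit('Another request is already running this command.');
}

// Remove the lock even if the exec logic dies with a fatal error.
register_shutdown_function(function () use ($fh, $lockFile) {
    fclose($fh);
    @unlink($lockFile);
});

exec('some-long-command');      // placeholder for the actual exec() call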
Instead of a file, you could also use a database with the same idea. If a value exists, the resource should be assumed locked. If it isn't, the caller should create the value in a transaction (to guarantee that it exists), perhaps even using a shared key plus a unique per-process random value, which lets it double-check that it actually acquired the lock. This is actually built into the Symfony component, too, as one of its lock stores.
edit
Here's an example using Symfony's lock component:
use Symfony\Component\Lock\LockFactory;
use Symfony\Component\Lock\Store\FlockStore;

// the argument is the path of the directory where the locks are created
// if none is given, sys_get_temp_dir() is used internally.
$store = new FlockStore('/var/stores');
$factory = new LockFactory($store);

$lock = $factory->createLock('pdf-invoice-generation');

if ($lock->acquire()) {
    // The resource "pdf-invoice-generation" is locked.
    // You can compute and generate invoice safely here.

    $lock->release();
}
Also, Symfony's lock component doesn't require the entire Symfony framework, it is standalone.

Reliable PHP script reentrant lock

I have to make sure a certain PHP script (started by a web request) does not run more than once simultaneously.
With binaries, it is quite easy to check whether a process of a certain binary is already around.
However, a PHP script may be run via several pathways, e.g. CGI, FCGI, inside webserver modules etc., so I cannot use system commands to find it.
So how can I reliably check whether another instance of a certain script is currently running?
The exact same strategy is used as one would choose with local applications:
The process manages a "lock file".
You define a static location in the file system. Upon script startup you check whether a lock file exists in that location; if so, you bail out. If not, you first create that lock file, then proceed. During teardown of your script you delete that lock file again. Such a lock file is a simple passive file; only its existence is of interest, often not its content. That is a standard procedure.
You can win extra candy points if you use the lock file not only as a passive semaphore, but store the process ID of the generating process in it. That allows subsequent attempts to verify whether that process actually still exists or has crashed in the meantime. That makes sense because such a crash would leave a stale lock file behind, thus creating a deadlock.
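A minimal sketch of that PID-carrying variant, assuming a *nix host with the posix extension available; the lock-file path is an example:
$lockFile = sys_get_temp_dir() . '/myscript.lock';

if (is_file($lockFile)) {
    $oldPid = (int) file_get_contents($lockFile);
    // Signal 0 only tests for existence; note it can also fail on permission errors.
    if ($oldPid > 0 && posix_kill($oldPid, 0)) {
        exit('Another instance is still running (PID ' . $oldPid . ').');
    }
    // Stale lock left by a crashed run: fall through and take it over.
}

file_put_contents($lockFile, (string) getmypid());
register_shutdown_function(function () use ($lockFile) {
    @unlink($lockFile);
});

// ... actual work here ...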
To work around the issue discussed in the comments, which correctly states that in some of the scenarios in which PHP scripts are used in a web environment a process ID by itself may not be enough to reliably test whether a given task has been successfully and completely processed, one could use a slightly modified setup:
The incoming request does not directly trigger the task-performing PHP script itself, but merely a wrapper script. That wrapper manages the lock file whilst delegating the actual task into a sub-request to the HTTP server. That allows the controlling wrapper script to use the additional information of the request state. If the task-performing PHP script really crashes without prior notice, the requesting wrapper knows about it: each request is terminated with a specific HTTP status code, which allows deciding whether the task-performing request terminated normally or not. That setup should be reliable enough for most purposes. The chance of the trivial wrapper script crashing or being terminated falls into the area of a system failure, which is something no locking strategy can reliably handle.
As PHP does not always provide a reliable way of file locking (it depends on how the script is run, e.g. CGI, FCGI, server modules, and the configuration), some other mechanism should be used for locking.
The PHP script can, for example, call another PHP interpreter in its CLI variant. That provides a unique PID that can be used for locking. The PID should then be stored in a lock file, which can be checked for a stale lock by querying whether a process with that PID is still around.
Maybe it is also possible to do all tasks needing the lock inside a shell script. Shell scripts also provide a unique PID and release it reliably after exit. A shell script may also use a unique filename that can be used to check whether it is still running.
Semaphores (http://php.net/manual/de/book.sem.php) could also be used; they are explicitly managed by the PHP interpreter to reflect a script's lifetime. They seem to work quite well, however there is not much discussion around about how reliable they are in case of premature script death.
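For illustration, a minimal semaphore sketch under the assumption that the sysvsem extension is available (*nix only):
$key = ftok(__FILE__, 'l');      // derive an IPC key from this script's path
$sem = sem_get($key, 1);         // at most one concurrent holder

if (!sem_acquire($sem, true)) {  // true = non-blocking; bail out instead of waiting
    exit('Another instance is running.');
}

// ... critical work here ...

sem_release($sem);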
Also keep in mind that external processes launched by a PHP script may continue executing even if the script ends. For example, a user abort under FCGI releases passthru processes, which carry on working even though the client connection is closed. They may be killed later if enough output accumulates, or not at all.
So such external processes have to be locked as well, which can't be done by the PHP-acquired semaphores alone.

Asynchronous Logging strategy for Python & PHP

Here's the situation: We have a bunch of python scripts continuously doing stuff and ultimately writing data in mysql, and we need a log to analyse the error rate and script performance.
We also have php front-end that interacts with the mysql data and we also need to log the user actions so that we can analyse their behaviour, and compute some scoring functions.
So we thought of having a MySQL table for each case (one for the "python scripts" log and one for the "user actions" log).
Ideally, we would write to these log tables asynchronously, for performance and low-latency reasons. Is there a way to do so in Python (we are using the Django ORM) and in PHP (we are using the Yii Framework)?
Are there any better approaches for solving this problem ?
Update :
for the user actions (Web UI), we are now considering loading the Apache log into MySQL with the relevant session info automatically, through simple Apache configuration
There are (AFAIK) only two ways to do anything asynchronously in PHP:
Fork the process (requires pcntl_fork)
exec() a process and release it by (assuming *nix) appending > /dev/null & to the end of the command string.
Both of these approaches result in a new process being created, albeit temporarily, so whether this would afford any performance increase is debatable and depends highly on your server environment - I suspect it would make things worse, not better. If your database is very heavily loaded (and therefore the thing that is slowing you down), you might get a faster result from dumping the log messages to a file and having a daemon script that crawls it for entries to insert into the DB - but again, whether this would help is debatable.
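A hedged sketch of the second approach, assuming a hypothetical log_worker.php CLI script that performs the actual database insert:
$payload = escapeshellarg(json_encode(['event' => 'user_action', 'ts' => time()]));
exec('php /path/to/log_worker.php ' . $payload . ' > /dev/null 2>&1 &');
// exec() returns immediately because output is redirected and the command
// is backgrounded with "&"; the worker does the slow DB write on its own.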
Python supports multi-threading which makes life a lot easier.
You could open a raw Unix or network socket to a logging service that caches messages and writes them to disk or database asynchronously. If your PHP and Python processes are long-running and generate many messages per execution, keeping an open socket would be more performant than making separate HTTP/database requests synchronously.
You'd have to measure it against appending to a file (open once, then lock, seek, write, and unlock while running, and close at the end) to see which is faster.
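As a hedged sketch of the socket idea on the PHP side, assuming a local logging daemon listening on a Unix domain socket at a made-up path, with a line-delimited JSON format as the assumed protocol:
$sock = stream_socket_client('unix:///var/run/logger.sock', $errno, $errstr, 1);
if ($sock !== false) {
    fwrite($sock, json_encode(['level' => 'info', 'msg' => 'user action']) . "\n");
    // For a long-running process, keep $sock open and reuse it per message
    // instead of reconnecting every time.
    fclose($sock);
}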
