I need to use mutexes or semaphores in PHP, and it scares me. To clarify, I'm not scared of writing deadlock-free code that synchronizes properly or afraid of the perils of concurrent programming, but of how well PHP handles fringe cases.
Quick background: writing a credit card handler interface that sits between the users and the 3rd party credit card gateway. Need to prevent duplicate requests, and already have a system in place that works, but if the user hits submit (w/out JS enabled so I can't disable the button for them) milliseconds apart, a race condition ensues where my PHP script does not realize that a duplicate request has been made. Need a semaphore/mutex so I can ensure only one successful request goes through for each unique transaction.
I'm running PHP behind nginx via PHP-FPM with multiple processes on a multi-core Linux machine. I want to be sure that:
1. semaphores are shared between all php-fpm processes and across all cores (i686 kernel);
2. php-fpm handles a PHP process crash while holding a mutex/semaphore and releases it accordingly;
3. php-fpm handles a session abort while holding a mutex/semaphore and releases it accordingly.
Yes, I know. Very basic questions, and it would be foolish to think that a proper solution doesn't exist for any other piece of software. But this is PHP, and it was most certainly not built with concurrency in mind, it crashes often (depending on which extensions you have loaded), and is in a volatile environment (PHP-FPM and on the web).
With regards to (1), I'm assuming that if PHP is using the POSIX functions, both these conditions hold true on an SMP i686 machine. As for (2), I see from briefly skimming the docs that there is a parameter that decides this behavior (though why one would ever want PHP to NOT release a mutex if the session is killed, I don't understand). But (3) is my main concern, and I don't know if it's safe to assume that php-fpm properly handles all fringe cases for me. I (obviously) don't ever want a deadlock, but I'm not sure I can trust PHP to never leave my code in a state where it cannot obtain a mutex because the session that grabbed it was either gracefully or ungracefully terminated.
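For reference, this is the sysvsem usage I have in mind (sem_get()'s fourth argument is the auto-release parameter mentioned above); a minimal sketch, assuming the sysvsem extension is available:

<?php
// ftok() derives a shared key from a path, so every php-fpm worker gets the same semaphore.
$key = ftok(__FILE__, 'c');

// Fourth argument is the auto_release flag: true (the default) releases the
// semaphore when the request shuts down.
$sem = sem_get($key, 1, 0666, true);

if (sem_acquire($sem)) {
    // critical section: check for and record this transaction exactly once
    sem_release($sem);
}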
I have considered using a MySQL LOCK TABLES approach, but there's even more doubt there because while I trust the MySQL lock more than the PHP lock, I fear if PHP aborts a request (with*out* crashing) while holding the MySQL session lock, MySQL might keep the table locked (esp. because I can easily envision the code that would cause this to take place).
Honestly, I'd be most comfortable with a very basic C extension where I can see exactly what POSIX calls are being made and with what params to ensure the exact behavior I want, but I don't look forward to writing that code.
Anyone have any concurrency-related best practices regarding PHP they'd like to share?
In fact, I think there is no need for a complex mutex/semaphore solution.
Form keys stored in a PHP $_SESSION are all you need. As a nice side effect, this method also protects your form against CSRF attacks.
In PHP, sessions are locked by acquiring a POSIX flock(), and PHP's session_start() waits until the user's session is released. You just have to unset() the form key on the first valid request. The second request has to wait until the first one releases the session.
However, when running in a load balancing scenario (one that is not session or source-IP based) involving multiple hosts, things get more complicated. For such a scenario, I'm sure you will find a valuable solution in this great paper: http://thwartedefforts.org/2006/11/11/race-conditions-with-ajax-and-php-sessions/
I reproduced your use case with the following demonstration. Just throw this file onto your webserver and test it:
<?php
session_start();

if (isset($_REQUEST['do_stuff'])) {
    // do stuff only if the submitted form key matches the one in the session
    if (isset($_SESSION['uniquehash'])
            && $_REQUEST['uniquehash'] == $_SESSION['uniquehash']) {
        echo "valid, doing stuff now ... "; flush();
        // delete form key from session
        unset($_SESSION['uniquehash']);
        // release session early - after committing, the session data is read-only
        session_write_close();
        sleep(20);
        echo "stuff done!";
    }
    else {
        echo "nope, {$_REQUEST['uniquehash']} is invalid.";
    }
}
else {
    // show form with form key
    $_SESSION['uniquehash'] = md5("foo" . microtime() . rand(1, 999999));
?>
<html>
<head><title>session race condition example</title></head>
<body>
    <form method="POST">
        <input type="hidden" name="PHPSESSID" value="<?= session_id() ?>">
        <input type="text" name="uniquehash"
               value="<?= $_SESSION['uniquehash'] ?>">
        <input type="submit" name="do_stuff" value="Do stuff!">
    </form>
</body>
</html>
<?php } ?>
An interesting question, but you don't have any data or code to show.
For 80% of cases, the chances of anything nasty happening because of PHP itself are virtually zero if you follow the standard procedures and practices for stopping users from submitting forms multiple times; these apply to nearly every setup, not just PHP.
If you're in the 20% and your environment demands it, then one option is using message queues, which I'm sure you are familiar with. Again, this idea is language agnostic. It has nothing to do with languages; it's all about how data moves around.
You can store a random hash in an array within your session data and also print that hash as a hidden form input value. When a request comes in, if the hidden hash value exists in your session array, delete the hash from the session and process the form; otherwise, don't.
This should prevent duplicate form submits and also help prevent CSRF attacks.
If the problem only arises when hitting a button milliseconds apart, wouldn't a software debouncer work? Like saving the time of a button press in a session variable and not allowing any more for, say, a second? Just a before-my-morning-coffee idea. Cheers.
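Something like the following, say (the one-second window and the session field name are arbitrary):

<?php
session_start();

$now = microtime(true);
// Reject a resubmission that arrives within one second of the previous accepted one.
if (isset($_SESSION['last_submit']) && $now - $_SESSION['last_submit'] < 1.0) {
    exit('Duplicate submission ignored.');
}
$_SESSION['last_submit'] = $now;
// ... process the form ...

Because PHP holds the session lock for the first request, the second request only runs this check after the first has finished, so it will see the updated timestamp.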
What I do in order to prevent session race conditions in my code is call session_write_close() after the last operation that stores data in the session. Note that if you are using PHP 7 you need to disable default output buffering in php.ini. If you have time-consuming operations, it's better to execute them after session_write_close() is invoked.
I hope it'll help someone, for me it saved my life :)
Until recently I wasn't even aware it was possible for PHP to abort a script due to user disconnect.
Anyways, it could cause some real trouble for my database if the script could just abort midway through. Like if I'm inserting rows into multiple tables that partially depend on each other and only half of it gets done, I'd have to get real defensive with my programming.
Oddly enough, I found that ignore_user_abort defaults to false (at least on my installation), which seems like the sort of thing that could confuse the hell out of developers not aware of this possibility when something goes wrong because of it.
So to make things easier, shouldn't I just always set it to true? Or is there a good reason why it defaults to false?
Passing true to ignore_user_abort() as its only parameter will instruct PHP that the script is not to be terminated even if your end-user closes their browser, has navigated away to another site, or has clicked Stop. This is useful if you have some important processing to do and you do not want to stop it even if your users click cancel, such as running a payment through on a credit card. You can of course also pass false to ignore_user_abort(), thereby making PHP exit when the user closes the connection.
For handling shutdown tasks, register_shutdown_function() is perfect, as it allows you to register a function with PHP to be run when script execution ends. So it depends on your project.
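A minimal sketch combining the two; the sleep() just stands in for the important processing:

<?php
// Keep running even if the user disconnects mid-request.
ignore_user_abort(true);

// Register a callback that runs when script execution ends,
// whether the script finished normally or was shut down.
register_shutdown_function(function () {
    error_log('payment script finished at ' . date('c'));
});

sleep(10);                  // stand-in for the long-running work (e.g. charging the card)
echo "Payment processed";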
Anyways, it could cause some real trouble for my database if the script could just abort midway through. Like if I'm inserting rows into multiple tables that partially depend on each other and only half of it gets done, I'd have to get real defensive with my programming.
This can happen with or without ignore_user_abort, and should be addressed using database transactions.
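For example, with PDO (the table and column names here are invented purely for illustration):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->beginTransaction();
try {
    $pdo->exec("INSERT INTO orders (user_id) VALUES (42)");
    $orderId = $pdo->lastInsertId();
    $pdo->exec("INSERT INTO order_items (order_id, sku) VALUES ($orderId, 'ABC-1')");
    $pdo->commit();       // both rows become visible together
} catch (Exception $e) {
    $pdo->rollBack();     // an abort midway leaves nothing half-done
    throw $e;
}

If the script is killed before commit(), the database discards the whole transaction, so you never end up with only half of the related rows.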
So to make things easier, shouldn't I just always set it to true? Or are there a good reason why it defaults to false?
Since people are typically writing PHP code for the web, ignoring a user abort means your server would be sitting around doing useless work that's never going to be of value. Enough of them and you might find your server bogged down on abandoned, long-running HTTP requests.
If you've got lots of long-running requests that should ignore a user abort, a queue is a much better approach.
Hi, I have had an issue where two visitors hit a PHP function within a second of each other. This function sends them a one-time-use code from a pool of codes, and it sent both people the same code.
What methods can I use in my script to check if someone else is already being processed and either delay or wait for the other person to finish?
I know this seems like a really general question; it's hard to explain what I mean! Hopefully someone can help!
What methods can I use in my script to check if someone else is already being processed and either delay or wait for the other person to finish?
That would be what we call a "mutex", short for mutual exclusion.
Notice that without knowing how your PHP is run on your server, it's hard to know whether PHP's built-in mutex routines will work. PHP is a bad language when it comes to multithreading.
If your pool of codes lives in the database, you could use transactions and lock the table for reading when one of the requests is trying to obtain the code. Wherever the data live, you will have to introduce some way of locking or queuing requests to deal with concurrent requests.
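Roughly, using a transaction with a row lock (SELECT ... FOR UPDATE) on an InnoDB table; the schema is invented for illustration:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->beginTransaction();
// The row lock makes a concurrent request block here until this transaction commits.
$stmt = $pdo->query("SELECT id, code FROM codes WHERE used = 0 ORDER BY id LIMIT 1 FOR UPDATE");
$row  = $stmt->fetch(PDO::FETCH_ASSOC);
if ($row) {
    $upd = $pdo->prepare("UPDATE codes SET used = 1 WHERE id = ?");
    $upd->execute([$row['id']]);
}
$pdo->commit();
// $row['code'] (if any) is now reserved for this visitor alone.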
This will be a newbie question, but I'm learning PHP for one sole purpose (atm): to implement a solution. Everything I've learned about PHP was learned in the last 18 hours.
The goal is adding indirection to my JavaScript GET requests to allow for cross-domain access to another website. I also don't want to overload said website, so I want to put safeguards in place. I can't rely on them being in JavaScript because that can't account for other peers sending their requests.
So right now I have the following makeshift code, without any throttling measures:
<?php
$expires = 15;

if (!isset($_GET["target"]))
    exit();

$fn   = md5($_GET["target"]);
$file = "cache/" . $fn;

if (!isset($_GET["cache"])) {
    // read mode: serve the cached copy only if it exists and has not expired
    if (file_exists($file) && time() - filemtime($file) <= $expires)
        echo file_get_contents($file);
}
else if (isset($_GET["data"])) {
    // write mode: store the supplied response body in the cache
    file_put_contents($file, $_GET["data"]);
}
?>
It works, as far as I can tell (it doesn't account for the improbable checksum clash). Now what I want to know, and what my search queries in Google refuse to procure for me, is how PHP actually launches and when it ends.
Obviously, if I were running my own web server I'd have a bit more insight into this; I'm not, and I have no shell access either.
Basically, I'm trying to figure out whether I can control in the code when the script ends, and whether every GET request to the PHP file would launch a new instance of the script or whether it can 'wake up' the same script. The reason being, I wish to track whether, say, it already sent a request to 'target' within the last n milliseconds, and it seems a bit wasteful to dump the value to a savefile and then recover it, over and over, for something that doesn't need to be kept in memory for very long.
Every HTTP request starts a new instance of the interpreter; it's basically an implementation detail whether this is a whole new process, or a reuse of an existing one.
This generally pushes you towards good simple and scalable designs: you can run multiple server processes and threads and you won't get varying behaviour depending whether the request goes back to the same instance or not.
Loading a recently-touched file will be very fast on Linux, since it will come right from the cache. Don't worry about it.
Do worry about the fact that by directly appending request parameters to the path you have a serious security hole: people can get data=../../../etc/passwd and so on. Read http://www.php.net/manual/en/security.variables.php and so on. (In this particular example you're hashing the inputs before putting them in the path so it's not a practical problem but it is something to watch for.)
More generally, if you want to hold a cache across multiple requests the typical thing these days is to use memcached.
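For example, the same read/write endpoints backed by the Memcached extension instead of files (the key prefix and 15-second TTL are just placeholders):

<?php
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$key = 'proxy:' . md5($_GET['target']);

if (isset($_GET['data'])) {
    // write mode: store the fetched response, shared by all requests, for 15 seconds
    $mc->set($key, $_GET['data'], 15);
} else {
    // read mode: returns false (echoes nothing) after expiry or if nothing was stored
    echo $mc->get($key);
}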
PHP is done on a per-connection basis, i.e., each request for a PHP file is seen as a new instance. Each instance ends, generally, when the connection is closed. You can, however, use sessions to save data between connections for a specific user (a short example follows the list below).
For basic use of sessions look into:
session_start()
$_SESSION
session_destroy()
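A bare-bones example of that flow (the counter is arbitrary):

<?php
session_start();                        // resume (or create) this visitor's session

$_SESSION['visits'] = isset($_SESSION['visits']) ? $_SESSION['visits'] + 1 : 1;
echo "You have loaded this page {$_SESSION['visits']} times.";

// session_destroy();                   // uncomment to throw the data away, e.g. on logout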
I've been thinking for a while about the idea of allowing user to inject code on website and run it on a web server. It's not a new idea - many websites allow users to "test" their code online - such as http://ideone.com/.
For example: Let's say that we have a form containing <textarea> element in which that user enters his piece of code and then submits it. Server reads POST data, saves as PHP file and require()s it while being surrounded by ob_*() output buffering handlers. Captured output is presented to end user.
My question is: how to do it properly? Things that we should take into account [and possible solutions]:
security: the user must not be allowed to do anything evil (possible solution: php.ini's disable_functions);
stability: the user must not be able to kill the web server by submitting while(true){} (possible solution: set_time_limit());
performance: the server returns an answer in an acceptable time;
control: the user can do anything that does not conflict with the previous points.
I would prefer PHP-oriented answers, but general approach is also welcome. Thank you in advance.
I would think about this problem one level higher, above and outside of the web server. Have a very unprivileged, jailed, chroot'ed standalone process for running these uploaded PHP scripts, then it doesn't matter what PHP functions are enabled or not, they will fail based on permissions and lack of access.
Have a parent process that monitors how long the above-mentioned "worker" process has been running; if it's been too long, kill it and report a timeout error back to the end user.
Obviously there are many implementation details to work out as to how to run this system asynchronously outside of the browser request, but I think it would provide a pretty secure way to run your untrusted PHP scripts.
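A rough sketch of the monitor/worker part using proc_open(); the script path and the 5-second limit are placeholders, and the jail/privilege dropping is assumed to be set up outside PHP:

<?php
// Run the uploaded script in a separate PHP process and kill it if it runs too long.
$descriptors = [1 => ['pipe', 'w'], 2 => ['pipe', 'w']];
$proc = proc_open('php /jail/uploaded_script.php', $descriptors, $pipes);

$deadline = time() + 5;                     // arbitrary time limit
while (proc_get_status($proc)['running']) {
    if (time() > $deadline) {
        proc_terminate($proc, 9);           // SIGKILL the runaway worker
        echo "Timeout: your script ran too long.";
        break;
    }
    usleep(100000);                         // poll every 0.1 s
}
$output = stream_get_contents($pipes[1]);   // captured output of the worker
fclose($pipes[1]);
fclose($pipes[2]);
proc_close($proc);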
Wouldn't disabling functions in your server's ini file limit some of the functions of the application itself?
I think you have to do some hardcore sanitization on the POST data and strip "illegal" code there. I think doing that with the addition of the other methods you describe might make it work.
Just remember. Sanitize the everloving daylight out of that POST data.
I am battling with race condition protection in PHP.
My application is written in symfony 1.4 and PHP locks session data until a page completes processing. I have a long running (~10 second) login script and I want to display a progress bar showing the user what is being done while they wait. (I want to actually display what is being done and not use a standard [faux] loading bar.)
Whenever a script calls session_start(), PHP locks that user's session data until that script completes. This prevents my status check ajax calls from returning anything until the longer running script completes. (My question on why my ajax calls were not asynchronous is here.)
I have devised a way to do this but I want to make sure this way is secure enough for general purposes (i.e.- this is not a banking application).
My idea is:
On authentication of username & password (before the long login script starts), a cookie is set on the client computer with a unique identifier.
This same unique identifier is written to a file on the server along with the client IP address.
While the long login script runs, it will update that file with the status of the login process.
The ajax status check will ping the server on a special page that does not use session_start(). This page will get the cookie value and the client IP and check the server side file for any status updates.
Are there any glaringly obvious problems with this solution?
Again, from the security angle: even if someone hacked this, all they would get is a number representing the state of the login progress.
I don't see anything inherently wrong with the approach that you are proposing.
But if your machine has APC installed you can use apc_store and apc_fetch to store your status in a sort of shared memory instead of writing to disk. Use something like apc_store(SID, 'login not started') to initialize and update the request state in memory, then apc_fetch(SID) to retrieve it on subsequent requests.
There are other shared memory systems, including Apache, or even a database connection might be simpler.
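A sketch of that pattern with APC; the key and the status strings are placeholders (with APCu the calls are apcu_store()/apcu_fetch()):

<?php
// In the long-running login script, after the username/password check:
$uniqueId  = $_COOKIE['login_id'];          // the identifier set in the cookie (name is a placeholder)
$statusKey = 'login_status_' . $uniqueId;
apc_store($statusKey, 'checking credentials', 60);
// ... do some work ...
apc_store($statusKey, 'loading profile', 60);
// ... do more work ...
apc_store($statusKey, 'done', 60);

// In the ajax status page (no session_start(), so it is never blocked):
echo apc_fetch('login_status_' . $_COOKIE['login_id']);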
I had the same problem, and I think the trick is session_write_close(), which frees the session file.
Please see my https://github.com/jlaso/MySession repository and check whether this can be applied to your particular question.