PHP mutual exclusion (mutex)

I've read some texts about locking in PHP.
They all mainly point to http://php.net/manual/en/function.flock.php .
That page talks about opening a file on the hard disk!!
Is it really so? I mean, this makes locking really expensive - it means each time I want to lock I'll have to access the hard disk )=
Can anyone comfort me with some delightful news?
Edit:
Due to some replies I've got here, I want to ask this:
Will my script be run by only one thread, or by several? Because if it's only one, then I obviously don't need a mutex. Is there a concise answer?
What exactly I'm trying to do
Asked by ircmaxell.
This is the story:
I have two FTP servers. I want to be able to show on my website how many users are online.
So, I thought these FTP servers would "POST" their stats to a certain PHP script page. Let's assume that the URL of this page is "http://mydomain.com/update.php".
On the website's main page ("http://mydomain.com/index.php") I will display the cumulative statistics (online users).
That's it.
My problem is that I'm not sure what happens when one FTP server updates its stats at the same time as the other one - will the info get mixed up?
Like in multithreading: two threads increment some "int" variable at the same time. It will not happen as expected unless you synchronize them.
So, will I have a problem? Yes, no, maybe?
Possible solution
Thinking hard about it all day long, I've come up with an idea, and I want your opinion on it.
As said, these FTP servers will post their stats once every 60 seconds.
I'm thinking about having this file "stats.php".
It will be included by the updating script that the FTP servers post to ("update.php") and by the "index.php" page where visitors see how many users are online.
Now, when an FTP server posts an update, the script at "update.php" will modify "stats.php" with the new cumulative statistics.
First it will read the stats included in "stats.php", then accumulate, and then rewrite the file.
If I'm not mistaken, PHP will detect that the file ("stats.php") has changed and load the new one. Correct?

Well, most PHP setups run each request in a different process space (there are a few threading implementations). The easy option is flock(). It's guaranteed to work on all platforms.
However, if you compile in support, you can use a few other things, such as the Semaphore extension (compile PHP with --enable-sysvsem). Then you can do something like the following (note that sem_acquire() should block, but if it can't acquire for some reason it will return false):
$sem = sem_get(1234, 1);
if (sem_acquire($sem)) {
    // successful lock, go ahead
    sem_release($sem);
} else {
    // something went wrong...
}
The other options you have are MySQL user-level locks (GET_LOCK('name', timeout)), or rolling your own using something like APC or XCache (note that this wouldn't be a true lock, since a race condition could occur where someone else grabs the lock between your check and your acceptance of the lock).
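For illustration, here's a minimal sketch of the MySQL route using PDO (the DSN, credentials and the lock name 'stats_lock' are placeholders, not anything from the original setup):
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'secret');

// GET_LOCK(name, timeout) returns 1 on success, 0 on timeout, NULL on error.
$got = $pdo->query("SELECT GET_LOCK('stats_lock', 10)")->fetchColumn();

if ($got == 1) {
    // ... critical section ...

    // Release as soon as you're done; the lock also dies with the connection.
    $pdo->query("SELECT RELEASE_LOCK('stats_lock')");
} else {
    // Could not acquire the lock within 10 seconds.
}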
Edit: To answer your edited question:
It all depends on your server configuration. PHP may be run multi-threaded (where each request is served by a different thread), or multi-process (where each request is served by a different process).
It's VERY rare for PHP to serve all requests serially, with only one process (and one thread) handling all requests. If you're using CGI, it's multi-process by default. If you're using FastCGI, it's likely multi-process and multi-threaded. If you're using mod_php with Apache, it depends on the worker type:
mpm_worker will be both multi-process and multi-threaded, with the number of processes capped by the ServerLimit directive.
prefork will be multi-process
perchild will be multi-process as well
Edit: To answer your second edited question:
It's quite easy. Store it in a file:
function readStatus() {
    $f = fopen('/path/to/myfile', 'r');
    if (!$f) return false;
    if (flock($f, LOCK_SH)) {            // shared lock: many readers at once
        $ret = fread($f, 8192);
        flock($f, LOCK_UN);
        fclose($f);
        return $ret;
    }
    fclose($f);
    return false;
}

function updateStatus($new) {
    $f = fopen('/path/to/myfile', 'c');  // 'c': create if missing, but don't truncate before we hold the lock
    if (!$f) return false;
    if (flock($f, LOCK_EX)) {            // exclusive lock: sole access for the writer
        ftruncate($f, 0);
        fwrite($f, $new);
        flock($f, LOCK_UN);
        fclose($f);
        return true;
    }
    fclose($f);
    return false;
}

function incrementStatus() {
    $f = fopen('/path/to/myfile', 'c+'); // 'c+': read/write, create if missing ('rw' is not a valid mode)
    if (!$f) return false;
    if (flock($f, LOCK_EX)) {
        $current = (int) fread($f, 8192);
        $current++;
        ftruncate($f, 0);
        rewind($f);                      // reposition before writing, or the file gets null-padded
        fwrite($f, (string) $current);
        flock($f, LOCK_UN);
        fclose($f);
        return true;
    }
    fclose($f);
    return false;
}

The question is: where will you store the stats that the FTP servers are pushing via POST to your update.php file? If it's a local file, then ircmaxell in the second post has answered you. You could do this with a mutex as well - the semaphore functions. Another solution is to use a MySQL MyISAM table to store the stats and use something like UPDATE info_table SET value = value + 1. It should lock the table, serialize your requests, and you will have no problems.
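As a sketch of that last idea (the DSN, table and column names are made up for illustration):
$pdo = new PDO('mysql:host=localhost;dbname=stats', 'user', 'secret');

// A single UPDATE is atomic on the server, so two FTP servers posting
// at the same time cannot interleave their read-modify-write steps.
$stmt = $pdo->prepare('UPDATE info_table SET value = value + ? WHERE name = ?');
$stmt->execute(array((int) $_POST['online'], 'online_users'));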

I recently created my own simple implementation of a mutex-like mechanism using PHP's flock function. Of course the code below can be improved, but it works for most use cases.
function mutex_lock($id, $wait = 10)
{
    // storage_path() is a Laravel helper; substitute any writable directory.
    $resource = fopen(storage_path("app/".$id.".lck"), "w");
    $lock = false;
    // Poll a non-blocking exclusive lock once per second, up to $wait seconds.
    for ($i = 0; $i < $wait && !($lock = flock($resource, LOCK_EX | LOCK_NB)); $i++)
    {
        sleep(1);
    }
    if (!$lock)
    {
        trigger_error("Not able to create a lock in $wait seconds");
    }
    return $resource;
}

function mutex_unlock($id, $resource)
{
    $result = flock($resource, LOCK_UN);
    fclose($resource);
    @unlink(storage_path("app/".$id.".lck"));  // best effort; the file may already be gone
    return $result;
}

Yes, that's true: since PHP is run by Apache, Apache can organize the threads of execution as it deems best (see the various worker models). So if you want to access a resource one at a time, you either lock on a file (which is good if you are dealing with cron jobs, for example), or you rely on the database's transaction mechanism, ACID features, and resource locking, if you are dealing with data.

PHP doesn't support multithreading; every request (and therefore every PHP script) is executed in only one thread (or even process, depending on the way you run PHP).

Related

PHP cURL; Wait for API status change before continuing [duplicate]


PHP virtual resource access serialization with automatic release

I would like to implement a quick and efficient serialization mechanism between PHP requests, for virtual named resources, that unlocks when the script finishes, either normally or due to an error. I had eaccelerator_lock() and its corresponding eaccelerator_unlock() in the past, but eaccelerator doesn't implement those functions anymore. What I want to do is something like:
lock_function("my-named-resource");
..
my_might_abort_abruptly_function();
..
unlock_function("my-named-resource");
Other PHP scripts calling lock_function() with the exact same parameter should block until this script calls unlock_function() or aborts. The resource name is unknown before processing (it's a generated string) and can't be constrained to a small set (i.e., the locking mechanism should have good granularity). I would like to avoid try/catch code, because there are circumstances in which catch is not called. Also, any mechanism that depends on manual usleep() spinning (instead of native OS blocking) should be avoided.
Mine is the only running application in the server. The system is a CentOS 6 Linux with PHP 5.3.3, Apache 2.2.15 and I have full control over it.
I explored the following alternatives:
semaphores: they are not well implemented in PHP; Linux allows arrays of thousands, while PHP only allocates one per id.
flock(): my resources are virtual, and flock() would only lock whole/real/existing files; I'd need to pre-create thousands of files and choose one to lock with a hash function. The granularity would depend on the number of files.
dio_fcntl(): I could attempt to reproduce the idea of flock() with a single file and fcntl(F_SETLK). This would have the advantage of good granularity without the need for many files; the file could even be zero bytes long! (F_SETLK can lock beyond the end of the file.) Alas! The problem is that nowhere does the documentation say that dio_fcntl() will release its resources when the script terminates.
database lock: I could implement key locking in a database with good granularity, although this is too database-dependent. It would not be as quick, either.
implement my own PHP extension: I'd really like to avoid that path.
The thing is, I think someone somewhere should have thought of this before me. What would be a good choice? Is there another solution I'm not seeing?
Thanks in advance. Guillermo.
You can always go old school and touch a file when your script starts and remove it when complete.
You could use register_shutdown_function() to remove the file.
The existence or absence of the file would indicate the locked state of the resource.
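A minimal sketch of that approach (the path is a placeholder; note that a hard power-off can still leave a stale file behind, since register_shutdown_function only runs on normal PHP shutdown, not on a machine crash):
$lockfile = '/tmp/my-named-resource.lck';

if (file_exists($lockfile)) {
    exit("Resource is locked.\n");
}
touch($lockfile);

// Remove the file however the script terminates (normally or via die()).
register_shutdown_function(function () use ($lockfile) {
    @unlink($lockfile);
});

// ... work with the named resource ...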
It turns out dio_open() does release its resources upon script termination. So I ended up writing the following functions:
$lockfile = $writable_dir."/serialized.lock";

function serialize_access($name)
{
    $f = serialize_openfile();
    if( !$f ) return false;
    $h = serialize_gethash($name);
    // Block until we hold a one-byte write lock at offset $h.
    return dio_fcntl($f, F_SETLKW, array("whence"=>SEEK_SET, "start"=>$h, "length"=>1, "type"=>F_WRLCK)) >= 0;
}

function serialize_release($name)
{
    $f = serialize_openfile();
    if( !$f ) return false;
    $h = serialize_gethash($name);
    @dio_fcntl($f, F_SETLK, array("whence"=>SEEK_SET, "start"=>$h, "length"=>1, "type"=>F_UNLCK));
}

function serialize_gethash($name)
{
    // Very good granularity (2^31 possible lock offsets)
    return crc32($name) & 0x7fffffff;
}

function serialize_openfile()
{
    global $lockfile, $serialize_file;
    if( !isset($serialize_file) )
    {
        $serialize_file = false;
        if( extension_loaded("dio") )
        {
            $serialize_file = @dio_open($lockfile, O_RDWR);
            if( !$serialize_file )
            {
                // Do not attempt to create the file with dio_open(),
                // because the file permissions get all mangled.
                $prev = umask(0);
                $temp = fopen($lockfile, "a");
                if( $temp )
                {
                    $serialize_file = @dio_open($lockfile, O_RDWR);
                    fclose($temp);
                }
                umask($prev);
            }
        }
    }
    return $serialize_file;
}
It seems to work very well.
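With these helpers, the usage pattern from the question becomes something like:
if (serialize_access("my-named-resource")) {
    my_might_abort_abruptly_function();
    serialize_release("my-named-resource");
}
// If the script aborts before releasing, the OS drops the fcntl() lock
// when the descriptor is closed at script termination.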
implement my own PHP extension
You might want to check the ninja-mutex library, which does exactly what you want.

How do I retry a PHP flock() for a period of time?

I need to open a log file for writing. Trouble is, many things may do this at the same time, and I don't want conflicts. Each write will be a single line, generally about 150 bytes (and always less than 1K), and getting things in chronological order is not strictly required.
I think what I want is to attempt to flock(), and if it fails, keep trying for a few seconds. If a lock can't be established after a number of tries, then give up.
$fh = fopen($logfile, "a");
if (flock($fh, LOCK_EX | LOCK_NB)) {
    $locked = TRUE;
} else {
    $locked = FALSE;
    // Retry lock every 0.1 seconds for 3 seconds...
    $x = 0;
    while ($x++ < 30) {
        usleep(100000);
        if (flock($fh, LOCK_EX | LOCK_NB)) {
            $locked = TRUE;
            break;
        }
    }
}
if ($locked) {
    if (fwrite($fh, strftime("[%Y-%m-%d %T] ") . $logdata . "\n")) {
        print "Success.\n";
    } else {
        print "Fail.\n";
    }
    flock($fh, LOCK_UN);
} else {
    print "Lock failed.\n";
}
fclose($fh);
I have two questions, one general and one specific. First, aside from implementing the same solution in different ways (do...while, etc), is there a better general strategy for handling this kind of problem, that runs solely in PHP? Second, is there a better way of implementing this in PHP? (Yes, I separated these because I'm really interested in the strategy part.)
One alternative I've considered is to use syslog(), but the PHP code may need to run on platforms where system-level administration (i.e. adding things to /etc/syslog.conf) may not be available as an option.
UPDATE: added |LOCK_NB to the code above, per randy's suggestion.
In my long experience making logs in PHP (under Linux!) I've never run into conflicts (even with hundreds of simultaneous and concurrent writes). So simply skip lock management:
$fh = fopen($logfile, "a");
if (fwrite($fh, strftime("[%Y-%m-%d %T] ") . $logdata . "\n")) {
    print "Success.\n";
} else {
    print "Fail.\n";
}
fclose($fh);
Regarding strategy: file logging (with or without locks) is not the best solution, because every fopen() with "a" implies a seek syscall to set the cursor at the end of the file. syslog keeps the file open, avoiding that overhead.
Of course the overhead becomes significant (performance-wise) with "big" files; an easy mitigation is to create log files with the date (or date-time) in the name.
ADD
The Apache package includes a testing program, ab, that can issue queries concurrently; you can test my thesis by stressing your server with 1,000,000 queries issued by 10 to 1000 threads.
ADD - following the comment
No, it's not an impossible task.
I found a note at http://php.net/manual/en/function.fwrite.php:
"If handle was fopen()ed in append mode, fwrite()s are atomic (unless the size of string exceeds the filesystem's block size, on some platforms, and as long as the file is on a local filesystem). That is, there is no need to flock() a resource before calling fwrite(); all of the data will be written without interruption."
To find out how big a block is in bytes (usually 4 KB):
dumpe2fs /dev/sd_your_disk_partition | less -i
The "atomicity" of the write is achieved by blocking other "agents" from writing (you can see processes in "D" status in ps ax), but the PHP stream facility can work around this; see stream_set_blocking().
This approach can introduce partial writes, so you will have to validate the integrity of your records.
In any case an fwrite (network or file) is susceptible to blocking/failure regardless of the use of flock. IMHO flock introduces only overhead.
Regarding your original questions AND your goal (trying to implement corporate policy in a highly risk-averse environment), where even an fwrite can be problematic, I can only imagine one simple solution: use a DB.
Most of the complexity is out of PHP and written in C!
Total control over the flow of the operation
High level of concurrency
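As a sketch of that route (the DSN and the table layout are placeholders):
$pdo = new PDO('mysql:host=localhost;dbname=logs', 'user', 'secret');

// One INSERT per log line; the database serializes concurrent writers,
// so no flock() is needed on the PHP side.
$stmt = $pdo->prepare('INSERT INTO log (created_at, line) VALUES (NOW(), ?)');
$stmt->execute(array($logdata));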

How do I restore this script after a hardware failure?

I know this is a bit generic, but I'm sure you'll understand my explanation. Here is the situation:
The following code is executed every 10 minutes. The variable "var_x" is always read from/written to an external text file when it's referred to.
if ( var_x != 1 )
{
    var_x = 1;
    //
    // here is where the main body of the script is.
    // it can take hours to completely execute.
    //
    var_x = 0;
}
else
{
    // exit script as it's already running.
}
The problem is: if I simulate a hardware failure (do a hard reset when the script is executing) then the main script logic will never execute again because "var_x" will always be "1". (I already have logic to work out the restore point).
Thanks.
You should lock and unlock files with flock:
$fp = fopen($your_file, 'c');       // 'c' creates the file if it doesn't exist
if (flock($fp, LOCK_EX | LOCK_NB))  // LOCK_NB: fail immediately if already held
{
    //
    // here is where the main body of the script is.
    // it can take hours to completely execute.
    //
    flock($fp, LOCK_UN);
}
else
{
    // exit script as it's already running.
}
Edit:
As flock seems not to work correctly on Windows machines, you have to resort to other solutions. Off the top of my head, an idea for a possible solution:
Instead of writing 1 to var_x, write the process ID retrieved via getmypid. When a new instance of the script reads the file, it should then look for a running process with this ID, and check whether that process is a PHP script. Of course, this can still go wrong, as another PHP script could obtain the same PID after a hardware failure, so the solution is far from optimal.
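A sketch of that idea (the process check shown uses /proc and is therefore Linux-only; on Windows you would query the process list instead, e.g. via tasklist, and as noted it remains subject to PID reuse):
$pidfile = '/path/to/var_x.pid';  // replaces the var_x text file
$pid = (int) @file_get_contents($pidfile);

// If the recorded PID no longer belongs to a live process, the previous
// run died (e.g. the hardware failure), so we may start over.
if ($pid > 0 && file_exists('/proc/' . $pid)) {
    exit; // already running
}

file_put_contents($pidfile, getmypid());
// ... main body of the script ...
unlink($pidfile);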
Don't you think this would be better solved using file locks? (When the reset occurs file locks are reset as well)
http://php.net/flock
It sounds like you're doing some kind of manual semaphore for process management.
Rather than writing to a file, perhaps you should use an environment variable instead. That way, in the event of failure, your script will not have a closed semaphore when you restore.

Run PHP Task Asynchronously

I work on a somewhat large web application, and the backend is mostly in PHP. There are several places in the code where I need to complete some task, but I don't want to make the user wait for the result. For example, when creating a new account, I need to send them a welcome email. But when they hit the 'Finish Registration' button, I don't want to make them wait until the email is actually sent, I just want to start the process, and return a message to the user right away.
Up until now, in some places I've been using what feels like a hack with exec(). Basically doing things like:
exec("doTask.php $arg1 $arg2 $arg3 >/dev/null 2>&1 &");
Which appears to work, but I'm wondering if there's a better way. I'm considering writing a system which queues up tasks in a MySQL table, and a separate long-running PHP script that queries that table once a second, and executes any new tasks it finds. This would also have the advantage of letting me split the tasks among several worker machines in the future if I needed to.
Am I re-inventing the wheel? Is there a better solution than the exec() hack or the MySQL queue?
I've used the queuing approach, and it works well as you can defer that processing until your server load is idle, letting you manage your load quite effectively if you can partition off "tasks which aren't urgent" easily.
Rolling your own isn't too tricky; here are a few other options to check out:
GearMan - this answer was written in 2009, and since then GearMan has become a popular option; see the comments below.
ActiveMQ if you want a full blown open source message queue.
ZeroMQ - this is a pretty cool socket library which makes it easy to write distributed code without having to worry too much about the socket programming itself. You could use it for message queuing on a single host - you would simply have your webapp push something to a queue that a continuously running console app would consume at the next suitable opportunity
beanstalkd - only found this one while writing this answer, but looks interesting
dropr is a PHP based message queue project, but hasn't been actively maintained since Sep 2010
php-enqueue is a recently (2017) maintained wrapper around a variety of queue systems
Finally, a blog post about using memcached for message queuing
Another, perhaps simpler, approach is to use ignore_user_abort - once you've sent the page to the user, you can do your final processing without fear of premature termination, though this does have the effect of appearing to prolong the page load from the user perspective.
When you just want to execute one or several HTTP requests without having to wait for the response, there is a simple PHP solution, as well.
In the calling script:
$socketcon = fsockopen($host, 80, $errno, $errstr, 10);
if ($socketcon) {
    $socketdata = "GET $remote_house/script.php?parameters=... HTTP/1.1\r\nHost: $host\r\nConnection: Close\r\n\r\n";
    fwrite($socketcon, $socketdata);
    fclose($socketcon);
}
// repeat this with different parameters as often as you like
On the called script.php, you can invoke these PHP functions in the first lines:
ignore_user_abort(true);
set_time_limit(0);
This causes the script to continue running without time limit when the HTTP connection is closed.
Another way to fork processes is via curl. You can set up your internal tasks as a webservice. For example:
http://domain/tasks/t1
http://domain/tasks/t2
Then in your user accessed scripts make calls to the service:
$service->addTask('t1', $data); // post data to URL via curl
Your service can keep track of the queue of tasks with MySQL or whatever you like; the point is that it's all wrapped up within the service, and your script is just consuming URLs. This frees you up to move the service to another machine/server if necessary (i.e., it's easily scalable).
Adding HTTP authorization or a custom authorization scheme (like Amazon's web services) lets you open up your tasks to be consumed by other people/services (if you want), and you could take it further and add a monitoring service on top to keep track of queue and task status.
http://domain/queue?task=t1
http://domain/queue?task=t2
http://domain/queue/t1/100931
It does take a bit of set-up work but there are a lot of benefits.
If it is just a question of running expensive tasks, and php-fpm is supported, why not use the fastcgi_finish_request() function?
This function flushes all response data to the client and finishes the request. This allows for time-consuming tasks to be performed without leaving the connection to the client open.
You don't really get asynchronicity this way:
Run all your main code first.
Execute fastcgi_finish_request().
Do all the heavy stuff.
Once again, php-fpm is needed.
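A minimal sketch of that flow (send_welcome_email() is a hypothetical stand-in for your heavy work):
// 1. Main code: build and send the response.
echo "Registration complete!";

// 2. Flush the response and close the connection (php-fpm only).
fastcgi_finish_request();

// 3. The user is no longer waiting; do the heavy stuff now.
send_welcome_email($user);  // hypothetical helper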
I've used Beanstalkd for one project, and planned to again. I've found it to be an excellent way to run asynchronous processes.
A couple of things I've done with it are:
Image resizing - and with a lightly loaded queue passing off to a CLI-based PHP script, resizing large (2mb+) images worked just fine, but trying to resize the same images within a mod_php instance was regularly running into memory-space issues (I limited the PHP process to 32MB, and the resizing took more than that)
near-future checks - beanstalkd has delays available to it (make this job available to run only after X seconds) - so I can fire off 5 or 10 checks for an event, a little later in time
I wrote a Zend-Framework based system to decode a 'nice' url, so for example, to resize an image it would call QueueTask('/image/resize/filename/example.jpg'). The URL was first decoded to an array(module,controller,action,parameters), and then converted to JSON for injection to the queue itself.
A long running cli script then picked up the job from the queue, ran it (via Zend_Router_Simple), and if required, put information into memcached for the website PHP to pick up as required when it was done.
One wrinkle I did also put in was that the cli-script only ran for 50 loops before restarting, but if it did want to restart as planned, it would do so immediately (being run via a bash-script). If there was a problem and I did exit(0) (the default value for exit; or die();) it would first pause for a couple of seconds.
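For reference, with a client library such as pheanstalk the producer/worker split looks roughly like this (a sketch; the tube name and payload are made up, and the API differs a little between library versions):
use Pheanstalk\Pheanstalk;

$queue = Pheanstalk::create('127.0.0.1');  // connect to beanstalkd

// Producer (web request): enqueue the job and return immediately.
$queue->useTube('resize');
$queue->put(json_encode(array('file' => 'example.jpg')));

// Worker (long-running CLI script): consume jobs one at a time.
$queue->watch('resize');
while ($job = $queue->reserve()) {
    $data = json_decode($job->getData(), true);
    // ... resize $data['file'] here ...
    $queue->delete($job);
}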
Here is a simple class I coded for my web application. It allows for forking PHP scripts and other scripts. Works on UNIX and Windows.
class BackgroundProcess {
    static function open($exec, $cwd = null) {
        if (!is_string($cwd)) {
            $cwd = @getcwd();
        }
        @chdir($cwd);
        if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN') {
            // Windows: launch via WScript.Shell with a hidden window, without waiting.
            $WshShell = new COM("WScript.Shell");
            $WshShell->CurrentDirectory = str_replace('/', '\\', $cwd);
            $WshShell->Run($exec, 0, false);
        } else {
            // UNIX: detach by discarding output and backgrounding with &.
            exec($exec . " > /dev/null 2>&1 &");
        }
    }

    static function fork($phpScript, $phpExec = null) {
        $cwd = dirname($phpScript);
        @putenv("PHP_FORCECLI=true");
        if (!is_string($phpExec) || !file_exists($phpExec)) {
            if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN') {
                // Guess the path of php.exe from the extension directory.
                $phpExec = str_replace('/', '\\', dirname(ini_get('extension_dir'))) . '\php.exe';
                if (@file_exists($phpExec)) {
                    BackgroundProcess::open(escapeshellarg($phpExec) . " " . escapeshellarg($phpScript), $cwd);
                }
            } else {
                $phpExec = exec("which php-cli");
                if ($phpExec[0] != '/') {
                    $phpExec = exec("which php");
                }
                if ($phpExec[0] == '/') {
                    BackgroundProcess::open(escapeshellarg($phpExec) . " " . escapeshellarg($phpScript), $cwd);
                }
            }
        } else {
            if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN') {
                $phpExec = str_replace('/', '\\', $phpExec);
            }
            BackgroundProcess::open(escapeshellarg($phpExec) . " " . escapeshellarg($phpScript), $cwd);
        }
    }
}
PHP HAS multithreading; it's just not enabled by default. There is an extension called pthreads which does exactly that.
You'll need PHP compiled with ZTS (Zend Thread Safety), though.
Links:
Examples
Another tutorial
pthreads PECL Extension
UPDATE: since PHP 7.2, the parallel extension comes into play
Tutorial/Example
reference manual
This is the same method I have been using for a couple of years now and I haven't seen or found anything better. As people have said, PHP is single threaded, so there isn't much else you can do.
I have actually added one extra level to this and that's getting and storing the process id. This allows me to redirect to another page and have the user sit on that page, using AJAX to check if the process is complete (process id no longer exists). This is useful for cases where the length of the script would cause the browser to timeout, but the user needs to wait for that script to complete before the next step. (In my case it was processing large ZIP files with CSV like files that add up to 30 000 records to the database after which the user needs to confirm some information.)
I have also used a similar process for report generation. I'm not sure I'd use "background processing" for something such as an email, unless there is a real problem with a slow SMTP. Instead I might use a table as a queue and then have a process that runs every minute to send the emails within the queue. You would need to be wary of sending emails twice or other similar problems. I would consider a similar queueing process for other tasks as well.
It's a great idea to use cURL as suggested by rojoca.
Here is an example. You can monitor text.txt while the script is running in background:
<?php
function doCurl($begin)
{
    echo "Do curl<br />\n";
    $url = 'http://'.$_SERVER['SERVER_NAME'].$_SERVER['REQUEST_URI'];
    $url = preg_replace('/\?.*/', '', $url);
    $url .= '?begin='.$begin;
    echo 'URL: '.$url.'<br>';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    echo 'Result: '.$result.'<br>';
    curl_close($ch);
}

if (empty($_GET['begin'])) {
    doCurl(1);
} else {
    // Flush a short response, tell the client to close the connection,
    // then keep running in the background.
    while (ob_get_level())
        ob_end_clean();
    header('Connection: close');
    ignore_user_abort(true);  // without true, the setting is only read, not changed
    ob_start();
    echo 'Connection Closed';
    $size = ob_get_length();
    header("Content-Length: $size");
    ob_end_flush();
    flush();

    $begin = $_GET['begin'];
    $fp = fopen("text.txt", "w");
    fprintf($fp, "begin: %d\n", $begin);
    for ($i = 0; $i < 15; $i++) {
        sleep(1);
        fprintf($fp, "i: %d\n", $i);
    }
    fclose($fp);
    if ($begin < 10)
        doCurl($begin + 1);
}
?>
There is a PHP extension called Swoole.
Although it might not be enabled by default, on my hosting it can be enabled at the click of a button.
Worth checking out. I haven't had time to use it yet, as I was searching here for info when I stumbled across it and thought it worth sharing.
Unfortunately PHP does not have any kind of native threading capabilities. So I think in this case you have no choice but to use some kind of custom code to do what you want to do.
If you search around the net for PHP threading stuff, some people have come up with ways to simulate threads on PHP.
If you set the Content-Length HTTP header in your "Thank You For Registering" response, then the browser should close the connection after the specified number of bytes are received. This leaves the server side process running (assuming that ignore_user_abort is set) so it can finish working without making the end user wait.
Of course you will need to calculate the size of your response content before rendering the headers, but that's pretty easy for short responses (write output to a string, call strlen(), call header(), render string).
This approach has the advantage of not forcing you to manage a "front end" queue, and although you may need to do some work on the back end to prevent racing HTTP child processes from stepping on each other, that's something you needed to do already, anyway.
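A sketch of that approach (depending on your SAPI and output-buffering settings you may also need to flush PHP's own buffers first, as in the cURL example above):
ignore_user_abort(true);  // keep running after the client disconnects

$body = "Thank You For Registering!";
header('Connection: close');
header('Content-Length: ' . strlen($body));
echo $body;
flush();                  // the browser can close the connection here

// ... continue with the slow work (e.g. sending the welcome email) ...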
If you don't want the full-blown ActiveMQ, I recommend considering RabbitMQ. RabbitMQ is lightweight messaging that uses the AMQP standard.
I also recommend looking into php-amqplib - a popular AMQP client library for accessing AMQP-based message brokers.
Spawning new processes on the server using exec(), or hitting another server directly using curl, doesn't scale well at all. With exec you are basically filling your web server with long-running processes that could be handled by other, non-web-facing servers, and with curl you tie up another server unless you build in some sort of load balancing.
I have used Gearman in a few situations, and I find it better for this sort of use case. I can use a single job-queue server to handle the queuing of all the jobs needing to be done, and spin up worker servers, each of which can run as many instances of the worker process as needed, scaling the number of worker servers up as needed and spinning them down when not. It also lets me shut down the worker processes entirely when needed, and it queues the jobs up until the workers come back online.
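A rough sketch with the PECL gearman extension (the function name 'send_email' and the payload are illustrative):
// Producer (web request): fire and forget.
$client = new GearmanClient();
$client->addServer('127.0.0.1');  // job server
$client->doBackground('send_email', json_encode(array('to' => 'user@example.com')));

// Worker (separate long-running process, possibly on another machine):
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1');
$worker->addFunction('send_email', function (GearmanJob $job) {
    $data = json_decode($job->workload(), true);
    // ... actually send the email here ...
});
while ($worker->work());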
I think you should try this technique. It will help you call as many pages as you like - all pages will run at once, independently, without waiting for each page's response, i.e. asynchronously.
cornjobpage.php //mainpage
<?php
post_async("http://localhost/projectname/testpage.php", "Keywordname=testValue");
//post_async("http://localhost/projectname/testpage.php", "Keywordname=testValue2");
//post_async("http://localhost/projectname/otherpage.php", "Keywordname=anyValue");
//call as many pages as you like; all will run at once, independently, without waiting for each other's response.
?>
<?php
/*
 * Executes a PHP page asynchronously so the current page does not have to wait for it to finish running.
 */
function post_async($url, $params)
{
    $post_string = $params;
    $parts = parse_url($url);
    $fp = fsockopen($parts['host'],
        isset($parts['port']) ? $parts['port'] : 80,
        $errno, $errstr, 30);
    // Send the parameters as a real POST body, so the Content-Length header
    // matches what is actually transmitted (the original mixed a GET query
    // string with POST-style headers, which can make strict servers wait
    // for a body that never arrives).
    $out  = "POST ".$parts['path']." HTTP/1.1\r\n";
    $out .= "Host: ".$parts['host']."\r\n";
    $out .= "Content-Type: application/x-www-form-urlencoded\r\n";
    $out .= "Content-Length: ".strlen($post_string)."\r\n";
    $out .= "Connection: Close\r\n\r\n";
    $out .= $post_string;
    fwrite($fp, $out);
    fclose($fp);
}
?>
testpage.php
<?php
echo $_REQUEST["Keywordname"]; //case1 Output > testValue
?>
PS: if you want to send URL parameters in a loop, then follow this answer: https://stackoverflow.com/a/41225209/6295712
PHP is a single-threaded language, so there is no official way to start an asynchronous process with it other than using exec or popen. There is a blog post about that here. Your idea for a queue in MySQL is a good idea as well.
Your specific requirement here is for sending an email to the user. I'm curious as to why you are trying to do that asynchronously since sending an email is a pretty trivial and quick task to perform. I suppose if you are sending tons of email and your ISP is blocking you on suspicion of spamming, that might be one reason to queue, but other than that I can't think of any reason to do it this way.
