I'm using PCNTL to multiprocess a big script in PHP on an ubuntu server.
Here is the code ( simplified and commented )
function signalHandler($signo = null) {
$pid = posix_getpid();
switch ($signo) {
case SIGTERM:
case SIGINT:
case SIGKILL:
// a process is asked to stop (from user or father)
exit(3);
break;
case SIGCHLD:
case SIGHUP:
// ignore signals
break;
case 10: // signal user 1
// a process finished its work
exit(0);
break;
case 12: // signal user 2
// a process got an error.
exit(3);
break;
default:
// nothing
}
}
public static function run($nbProcess, $nbTasks, $taskFunc, $args) {
$pid = 0;
// there will be $nbTasks tasks to do, and no more than $nbProcess children must work at the same time
$MAX_PROCESS = $nbProcess;
$pidFather = posix_getpid();
$data = array();
pcntl_signal(SIGTERM, "signalHandler");
pcntl_signal(SIGINT, "signalHandler");
// pcntl_signal(SIGKILL, "signalHandler"); // SIGKILL can't be overloaded
pcntl_signal(SIGCHLD, "signalHandler");
pcntl_signal(SIGHUP, "signalHandler");
pcntl_signal(10, "signalHandler"); // user signal 1
pcntl_signal(12, "signalHandler"); // user signal 2
for ($indexTask = 0; $indexTask < $nbTasks ; $indexTask++) {
$pid = pcntl_fork();
// Father and new child both read code from here
if ($pid == -1) {
// log error
return false;
} elseif ($pid > 0) {
// We are in father process
// storing child id in an array
$arrayPid[$pid] = $indexTask;
} else {
// We are in child, nothing to do now
}
if ($pid == 0) {
// We are in child process
$pidChild = posix_getpid();
try {
//$taskFunc is an array containing an object, and the method to call from that object
$ret = (array) call_user_func($taskFunc, $indexTask, $args);// similar to $ret = (array) $taskFunc($indexTask, $args);
$returnArray = array(
"tasknb" => $indexTask,
"time" => $timer,
"data" => $ret,
);
} catch(Exception $e) {
// some stuff to exit child
}
$pdata = array();
array_push($pdata, $returnArray);
$data_str = serialize($pdata);
$shm_id = shmop_open($pidChild, "c", 0644, strlen($data_str));
if (!$shm_id) {
// log error
} else {
if(shmop_write($shm_id, $data_str, 0) != strlen($data_str)) {
// log error
}
}
// We are in a child and job is done. Let's exit !
posix_kill($pidChild, 10); // sending user signal 1 (OK)
pcntl_signal_dispatch();
} else {
// we are in father process,
// we check number of running children
while (count($arrayPid) >= $MAX_PROCESS) {
// There are more children than allowed
// waiting for any child to send signal
$pid = pcntl_wait($status);
// A child sent a signal !
if ($pid == -1) {
// log error
}
if (pcntl_wifexited($status)) {
$statusChild = pcntl_wexitstatus($status);
} else
$statusChild = $status;
// father ($pidFather) saw a child ($pid) exiting with status $statusChild (or $status ?)
// ^^^^ ^^^^^^
// (=3) (= random number ?)
if(isset($arrayPid[$pid])) {
// father knows this child
unset($arrayPid[$pid]);
if ($statusChild == 0 || $statusChild == 10 || $statusChild == 255) {
// Child did not report any error
$shm_id = shmop_open($pid, "a", 0, 0);
if ($shm_id === false)
// log error
else {
$shm_data = unserialize(shmop_read($shm_id, 0, shmop_size($shm_id)));
shmop_delete($shm_id);
shmop_close($shm_id);
$data = array_merge($data, $shm_data);
}
// kill (again) child
posix_kill($pid, 10);
pcntl_signal_dispatch();;
}
else {
// Child reported an error
}
}
}
}
}
}
The problem I'm facing is about the value returned by wexitstatus.
To make it simple, there is a father-process, that must create 200 threads.
He makes process one at a time, and wait for a process to finish if there are more than 8 threads actually running.
I added many logs, so I see a child finished its work.
I see it calling the line posix_kill($pidChild, 10);.
I see the signal handler is called with signal user 1 (which results in an exit(0)).
I see the father awakening, but when he gets the returned code from wexitstatus, he sees a code 3, and so thinks the child got an error, whereas it has exited with code 0 !!.
The pid is the good child's pid.
Maybe I misunderstand how signals work... Any clue ?
I found the problem.
In my application, register_shutdown_function(myFrameworkShutdownFunction) was used to shut down the script "smoothly".
So an exit(0) didn't immediately stop the child process. It first went into myFrameworkShutdownFunction, and converted the return code 0 to a code 3 (because of a misconfigured variable).
I currently have a website written in PHP, utilizing the curl_multi for polling external APIs. The server forks child processes to standalone from web requests and is working well, but it is somewhat limited to a per process basis.
Occasionally it hit bandwidth bottlenecks and needs a better centralized queuing logic.
I am currently trying PHP IPC with a standalone background process to handle all the outgoing requests, but was stuck in things that, normally said, is not likely to be catered by casual programmers. Says, garbage collection, inter-process exception handling, request-response matching... and etc. Am I going the wrong way?
Is there a common practise (implementation theory) out there, or even libraries I could make use of?
EDIT
Using localhost TCP/IP communication would double the stress of the local trafic, which is definitely not a good approach.
I am currently working on IPC message queue with some home-brew protocol... not looking right at all. Would appreciate any help.
There are several different things to distinguish here:
jobs : you have N jobs to process. Executed tasks can crash or hang, in any way all jobs should be proceeded without any data loss.
resources : you are processing your jobs in a single machine and/or in a single connection, so you need to take care of your cpu and bandwith.
synchronization : if you have interactions between your processes, you need to share information, taking care of concurrent data access.
Keep control on resources
Everybody want to get into the bus...
Because there is no built-in threads in PHP, we will need to simulate mutexes. The principle is quite simple:
1 All jobs are put in a queue
2 There is N resources available and no more in a pool
3 We iterate the queue (a while on each job)
4 Before execution, job asks for a resource in the pool
5 If there is available resources, job is executed
6 If there is no more resources, pool hangs until a job has finished or is considered dead
How to do that in PHP?
To proceed, we have several possibilities but the principle is the same:
We have 2 programs:
There is a process launcher that will launch no more than N tasks simultaneously.
There is a process child, it represents a thread's context
How can look a process launcher?
The process launcher knows how many tasks should be run, and run them without taking care of their result. It only control their execution (a process starts, finishes, or hangs, and N are already running).
PHP I give you the idea here, I'll give you usable examples later:
<?php
// launcher.php
require_once("ProcessesPool.php");
// The label identifies your process pool, it should be unique for your process launcher and your process children
$multi = new ProcessesPool($label = 'test');
// Initialize a new pool (creates the right directory or file, cleans a database or whatever you want)
// 10 is the maximum number of simultaneously run processes
if ($multi->create($max = '10') == false)
{
echo "Pool creation failed ...\n";
exit();
}
// We need to launch N processes, stored in $count
$count = 100; // maybe count($jobs)
// We execute all process, one by one
for ($i = 0; ($i < $count); $i++)
{
// The waitForResources method looks for how many processes are already run,
// and hangs until a resource is free or the maximum execution time is reached.
$ret = $multi->waitForResource($timeout = 10, $interval = 500000);
if ($ret)
{
// A resource is free, so we can run a new process
echo "Execute new process: $i\n";
exec("/usr/bin/php ./child.php $i > /dev/null &");
}
else
{
// Timeout is reached, we consider all children as dead and we kill them.
echo "WaitForResources Timeout! Killing zombies...\n";
$multi->killAllResources();
break;
}
}
// All process has been executed, but this does not mean they finished their work.
// This is important to follow the last executed processes to avoid zombies.
$ret = $multi->waitForTheEnd($timeout = 10, $interval = 500000);
if ($ret == false)
{
echo "WaitForTheEnd Timeout! Killing zombies...\n";
$multi->killAllResources();
}
// We destroy the process pool because we run all processes.
$multi->destroy();
echo "Finish.\n";
And what about the process child, the simulated thread's context
A child (executed job) has only 3 things to do :
tell the process launcher it started
do his job
tell the process launcher it finished
PHP It could contains something like this:
<?php
// child.php
require_once("ProcessesPool.php");
// we create the *same* instance of the process pool
$multi = new ProcessesPool($label = 'test');
// child tells the launcher it started (there will be one more resource busy in pool)
$multi->start();
// here I simulate job's execution
sleep(rand() % 5 + 1);
// child tells the launcher it finished his job (there will be one more resource free in pool)
$multi->finish();
Your usage example is nice, but where is the ProcessPool class?
There is a lot of ways to synchronize tasks but it really depends on your requirements and constraints.
You can synchronize your tasks using :
an unique file
a database
a directory and several files
probably other methods (such as system IPCs)
As we seen already, we need at least 7 methods:
1 create() will create an empty pool
2 start() takes a resource on the pool
3 finish() releases a resource
4 waitForResources() hangs if there is no more free resource
5 killAllResources() get all launched jobs in the pool and kills them
6 waitForTheEnd() hangs until there is no more busy resource
7 destroy() destroys pool
So let's begin by creating an abstract class, we'll be able to implement it using the above ways later.
PHP AbstractProcessPool.php
<?php
// AbstractProcessPool.php
abstract class AbstractProcessesPool
{
abstract protected function _createPool();
abstract protected function _cleanPool();
abstract protected function _destroyPool();
abstract protected function _getPoolAge();
abstract protected function _countPid();
abstract protected function _addPid($pid);
abstract protected function _removePid($pid);
abstract protected function _getPidList();
protected $_label;
protected $_max;
protected $_pid;
public function __construct($label)
{
$this->_max = 0;
$this->_label = $label;
$this->_pid = getmypid();
}
public function getLabel()
{
return ($this->_label);
}
public function create($max = 20)
{
$this->_max = $max;
$ret = $this->_createPool();
return $ret;
}
public function destroy()
{
$ret = $this->_destroyPool();
return $ret;
}
public function waitForResource($timeout = 120, $interval = 500000, $callback = null)
{
// let enough time for children to take a resource
usleep(200000);
while (44000)
{
if (($callback != null) && (is_callable($callback)))
{
call_user_func($callback, $this);
}
$age = $this->_getPoolAge();
if ($age == -1)
{
return false;
}
if ($age > $timeout)
{
return false;
}
$count = $this->_countPid();
if ($count == -1)
{
return false;
}
if ($count < $this->_max)
{
break;
}
usleep($interval);
}
return true;
}
public function waitForTheEnd($timeout = 3600, $interval = 500000, $callback = null)
{
// let enough time to the last child to take a resource
usleep(200000);
while (44000)
{
if (($callback != null) && (is_callable($callback)))
{
call_user_func($callback, $this);
}
$age = $this->_getPoolAge();
if ($age == -1)
{
return false;
}
if ($age > $timeout)
{
return false;
}
$count = $this->_countPid();
if ($count == -1)
{
return false;
}
if ($count == 0)
{
break;
}
usleep($interval);
}
return true;
}
public function start()
{
$ret = $this->_addPid($this->_pid);
return $ret;
}
public function finish()
{
$ret = $this->_removePid($this->_pid);
return $ret;
}
public function killAllResources($code = 9)
{
$pids = $this->_getPidList();
if ($pids == false)
{
$this->_cleanPool();
return false;
}
foreach ($pids as $pid)
{
$pid = intval($pid);
posix_kill($pid, $code);
if ($this->_removePid($pid) == false)
{
return false;
}
}
return true;
}
}
Synchronisation using a directory and several files
If you want to use the directory method (on a /dev/ram1 partition for example), the implementation will be :
1 create() will create an empty directory using the given $label
2 start() creates a file in the directory, named by the child's pid
3 finish() destroy the child's file
4 waitForResources() counts file inside that directory
5 killAllResources() reads directory content and kill all pids
6 waitForTheEnd() reads directory until there is no more files
7 destroy() removes directory
This method looks costy, but it is really efficient if you want to run hundred tasks simultaneously without taking as many database connections as there is jobs to execute.
Implementation :
PHP ProcessPoolFiles.php
<?php
// ProcessPoolFiles.php
class ProcessesPoolFiles extends AbstractProcessesPool
{
protected $_dir;
public function __construct($label, $dir)
{
parent::__construct($label);
if ((!is_dir($dir)) || (!is_writable($dir)))
{
throw new Exception("Directory '{$dir}' does not exist or is not writable.");
}
$sha1 = sha1($label);
$this->_dir = "{$dir}/pool_{$sha1}";
}
protected function _createPool()
{
if ((!is_dir($this->_dir)) && (!mkdir($this->_dir, 0777)))
{
throw new Exception("Could not create '{$this->_dir}'");
}
if ($this->_cleanPool() == false)
{
return false;
}
return true;
}
protected function _cleanPool()
{
$dh = opendir($this->_dir);
if ($dh == false)
{
return false;
}
while (($file = readdir($dh)) !== false)
{
if (($file != '.') && ($file != '..'))
{
if (unlink($this->_dir . '/' . $file) == false)
{
return false;
}
}
}
closedir($dh);
return true;
}
protected function _destroyPool()
{
if ($this->_cleanPool() == false)
{
return false;
}
if (!rmdir($this->_dir))
{
return false;
}
return true;
}
protected function _getPoolAge()
{
$age = -1;
$count = 0;
$dh = opendir($this->_dir);
if ($dh == false)
{
return false;
}
while (($file = readdir($dh)) !== false)
{
if (($file != '.') && ($file != '..'))
{
$stat = #stat($this->_dir . '/' . $file);
if ($stat['mtime'] > $age)
{
$age = $stat['mtime'];
}
$count++;
}
}
closedir($dh);
clearstatcache();
return (($count > 0) ? (#time() - $age) : (0));
}
protected function _countPid()
{
$count = 0;
$dh = opendir($this->_dir);
if ($dh == false)
{
return -1;
}
while (($file = readdir($dh)) !== false)
{
if (($file != '.') && ($file != '..'))
{
$count++;
}
}
closedir($dh);
return $count;
}
protected function _addPid($pid)
{
$file = $this->_dir . "/" . $pid;
if (is_file($file))
{
return true;
}
echo "{$file}\n";
$file = fopen($file, 'w');
if ($file == false)
{
return false;
}
fclose($file);
return true;
}
protected function _removePid($pid)
{
$file = $this->_dir . "/" . $pid;
if (!is_file($file))
{
return true;
}
if (unlink($file) == false)
{
return false;
}
return true;
}
protected function _getPidList()
{
$array = array ();
$dh = opendir($this->_dir);
if ($dh == false)
{
return false;
}
while (($file = readdir($dh)) !== false)
{
if (($file != '.') && ($file != '..'))
{
$array[] = $file;
}
}
closedir($dh);
return $array;
}
}
PHP demo, the process launcher:
<?php
// pool_files_launcher.php
require_once("AbstractProcessesPool.php");
require_once("ProcessesPoolFiles.php");
$multi = new ProcessesPoolFiles($label = 'test', $dir = "/tmp");
if ($multi->create($max = '10') == false)
{
echo "Pool creation failed ...\n";
exit();
}
$count = 20;
for ($i = 0; ($i < $count); $i++)
{
$ret = $multi->waitForResource($timeout = 10, $interval = 500000, 'test_waitForResource');
if ($ret)
{
echo "Execute new process: $i\n";
exec("/usr/bin/php ./pool_files_calc.php $i > /dev/null &");
}
else
{
echo "WaitForResources Timeout! Killing zombies...\n";
$multi->killAllResources();
break;
}
}
$ret = $multi->waitForTheEnd($timeout = 10, $interval = 500000, 'test_waitForTheEnd');
if ($ret == false)
{
echo "WaitForTheEnd Timeout! Killing zombies...\n";
$multi->killAllResources();
}
$multi->destroy();
echo "Finish.\n";
function test_waitForResource($multi)
{
echo "Waiting for available resource ( {$multi->getLabel()} )...\n";
}
function test_waitForTheEnd($multi)
{
echo "Waiting for all resources to finish ( {$multi->getLabel()} )...\n";
}
PHP demo, the process child:
<?php
// pool_files_calc.php
require_once("AbstractProcessesPool.php");
require_once("ProcessesPoolFiles.php");
$multi = new ProcessesPoolFiles($label = 'test', $dir = "/tmp");
$multi->start();
// here I simulate job's execution
sleep(rand() % 7 + 1);
$multi->finish();
Synchronisation using a database
MySQL If you prefer using the database method, you'll need a table like:
CREATE TABLE `processes_pool` (
`label` varchar(40) PRIMARY KEY,
`nb_launched` mediumint(6) unsigned NOT NULL,
`pid_list` varchar(2048) default NULL,
`updated` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Then, the implementation will be something like:
1 create() will insert a new row in the above table
2 start() inserts a pid on the pid list
3 finish() remove one pid from the pid list
4 waitForResources() reads nb_launched field
5 killAllResources() gets and kills every pid
6 waitForTheEnd() hangs and checks regularily until nb_launched equals 0
7 destroy() removes the row
Implementation:
PHP ProcessPoolMySql.php
<?php
// ProcessPoolMysql.php
class ProcessesPoolMySQL extends AbstractProcessesPool
{
protected $_sql;
public function __construct($label, PDO $sql)
{
parent::__construct($label);
$this->_sql = $sql;
$this->_label = sha1($label);
}
protected function _createPool()
{
$request = "
INSERT IGNORE INTO processes_pool
VALUES ( ?, ?, NULL, CURRENT_TIMESTAMP )
";
$this->_query($request, $this->_label, 0);
return $this->_cleanPool();
}
protected function _cleanPool()
{
$request = "
UPDATE processes_pool
SET
nb_launched = ?,
pid_list = NULL,
updated = CURRENT_TIMESTAMP
WHERE label = ?
";
$this->_query($request, 0, $this->_label);
return true;
}
protected function _destroyPool()
{
$request = "
DELETE FROM processes_pool
WHERE label = ?
";
$this->_query($request, $this->_label);
return true;
}
protected function _getPoolAge()
{
$request = "
SELECT (CURRENT_TIMESTAMP - updated) AS age
FROM processes_pool
WHERE label = ?
";
$ret = $this->_query($request, $this->_label);
if ($ret === null)
{
return -1;
}
return $ret['age'];
}
protected function _countPid()
{
$req = "
SELECT nb_launched AS nb
FROM processes_pool
WHERE label = ?
";
$ret = $this->_query($req, $this->_label);
if ($ret === null)
{
return -1;
}
return $ret['nb'];
}
protected function _addPid($pid)
{
$request = "
UPDATE processes_pool
SET
nb_launched = (nb_launched + 1),
pid_list = CONCAT_WS(',', (SELECT IF(LENGTH(pid_list) = 0, NULL, pid_list )), ?),
updated = CURRENT_TIMESTAMP
WHERE label = ?
";
$this->_query($request, $pid, $this->_label);
return true;
}
protected function _removePid($pid)
{
$req = "
UPDATE processes_pool
SET
nb_launched = (nb_launched - 1),
pid_list =
CONCAT_WS(',', (SELECT IF (LENGTH(
SUBSTRING_INDEX(pid_list, ',', (FIND_IN_SET(?, pid_list) - 1))) = 0, null,
SUBSTRING_INDEX(pid_list, ',', (FIND_IN_SET(?, pid_list) - 1)))), (SELECT IF (LENGTH(
SUBSTRING_INDEX(pid_list, ',', (-1 * ((LENGTH(pid_list) - LENGTH(REPLACE(pid_list, ',', ''))) + 1 - FIND_IN_SET(?, pid_list))))) = 0, null,
SUBSTRING_INDEX(pid_list, ',', (-1 * ((LENGTH(pid_list) - LENGTH(REPLACE(pid_list, ',', ''))) + 1 - FIND_IN_SET(?, pid_list))
)
)
)
)
),
updated = CURRENT_TIMESTAMP
WHERE label = ?";
$this->_query($req, $pid, $pid, $pid, $pid, $this->_label);
return true;
}
protected function _getPidList()
{
$req = "
SELECT pid_list
FROM processes_pool
WHERE label = ?
";
$ret = $this->_query($req, $this->_label);
if ($ret === null)
{
return false;
}
if ($ret['pid_list'] == null)
{
return array();
}
$pid_list = explode(',', $ret['pid_list']);
return $pid_list;
}
protected function _query($request)
{
$return = null;
$stmt = $this->_sql->prepare($request);
if ($stmt === false)
{
return $return;
}
$params = func_get_args();
array_shift($params);
if ($stmt->execute($params) === false)
{
return $return;
}
if (strncasecmp(trim($request), 'SELECT', 6) === 0)
{
$return = $stmt->fetch(PDO::FETCH_ASSOC);
}
return $return;
}
}
PHP demo, the process launcher:
<?php
// pool_mysql_launcher.php
require_once("AbstractProcessesPool.php");
require_once("ProcessesPoolMySQL.php");
$dbh = new PDO("mysql:host=127.0.0.1;dbname=fuz", 'root', 'root');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$multi = new ProcessesPoolMySQL($label = 'test', $dbh);
if ($multi->create($max = '10') == false)
{
echo "Pool creation failed ...\n";
exit();
}
$count = 20;
for ($i = 0; ($i < $count); $i++)
{
$ret = $multi->waitForResource($timeout = 10, $interval = 500000, 'test_waitForResource');
if ($ret)
{
echo "Execute new process: $i\n";
exec("/usr/bin/php ./pool_mysql_calc.php $i > /dev/null &");
}
else
{
echo "WaitForResources Timeout! Killing zombies...\n";
$multi->killAllResources();
break;
}
}
$ret = $multi->waitForTheEnd($timeout = 10, $interval = 500000, 'test_waitForTheEnd');
if ($ret == false)
{
echo "WaitForTheEnd Timeout! Killing zombies...\n";
$multi->killAllResources();
}
$multi->destroy();
echo "Finish.\n";
function test_waitForResource($multi)
{
echo "Waiting for available resource ( {$multi->getLabel()} )...\n";
}
function test_waitForTheEnd($multi)
{
echo "Waiting for all resources to finish ( {$multi->getLabel()} )...\n";
}
PHP demo, the process child:
<?php
// pool_mysql_calc.php
require_once("AbstractProcessesPool.php");
require_once("ProcessesPoolMySQL.php");
$dbh = new PDO("mysql:host=127.0.0.1;dbname=fuz", 'root', 'root');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$multi = new ProcessesPoolMySQL($label = 'test', $dbh);
$multi->start();
// here I simulate job's execution
sleep(rand() % 7 + 1);
$multi->finish();
What is the output of the above code?
Demo output Those demo gives - fortunately - about the same output. If the timeout is not reached (the dreams's case), output is:
KolyMac:TaskManager ninsuo$ php pool_files_launcher.php
Waiting for available resource ( test )...
Execute new process: 0
Waiting for available resource ( test )...
Execute new process: 1
Waiting for available resource ( test )...
Execute new process: 2
Waiting for available resource ( test )...
Execute new process: 3
Waiting for available resource ( test )...
Execute new process: 4
Waiting for available resource ( test )...
Execute new process: 5
Waiting for available resource ( test )...
Execute new process: 6
Waiting for available resource ( test )...
Execute new process: 7
Waiting for available resource ( test )...
Execute new process: 8
Waiting for available resource ( test )...
Execute new process: 9
Waiting for available resource ( test )...
Waiting for available resource ( test )...
Execute new process: 10
Waiting for available resource ( test )...
Waiting for available resource ( test )...
Waiting for available resource ( test )...
Execute new process: 11
Waiting for available resource ( test )...
Execute new process: 12
Waiting for available resource ( test )...
Execute new process: 13
Waiting for available resource ( test )...
Waiting for available resource ( test )...
Waiting for available resource ( test )...
Waiting for available resource ( test )...
Waiting for available resource ( test )...
Execute new process: 14
Waiting for available resource ( test )...
Waiting for available resource ( test )...
Execute new process: 15
Waiting for available resource ( test )...
Execute new process: 16
Waiting for available resource ( test )...
Execute new process: 17
Waiting for available resource ( test )...
Execute new process: 18
Waiting for available resource ( test )...
Execute new process: 19
Waiting for all resources to finish ( test )...
Waiting for all resources to finish ( test )...
Waiting for all resources to finish ( test )...
Waiting for all resources to finish ( test )...
Waiting for all resources to finish ( test )...
Waiting for all resources to finish ( test )...
Waiting for all resources to finish ( test )...
Waiting for all resources to finish ( test )...
Waiting for all resources to finish ( test )...
Waiting for all resources to finish ( test )...
Waiting for all resources to finish ( test )...
Waiting for all resources to finish ( test )...
Waiting for all resources to finish ( test )...
Waiting for all resources to finish ( test )...
Finish.
Demo output In a worse case (I change sleep(rand() % 7 + 1); to sleep(rand() % 7 + 100);, this gives:
KolyMac:TaskManager ninsuo$ php pool_files_launcher.php
Waiting for available resource ( test )...
Execute new process: 0
Waiting for available resource ( test )...
Execute new process: 1
Waiting for available resource ( test )...
Execute new process: 2
Waiting for available resource ( test )...
Execute new process: 3
Waiting for available resource ( test )...
Execute new process: 4
Waiting for available resource ( test )...
Execute new process: 5
Waiting for available resource ( test )...
Execute new process: 6
Waiting for available resource ( test )...
Execute new process: 7
Waiting for available resource ( test )...
Execute new process: 8
Waiting for available resource ( test )...
Execute new process: 9
Waiting for available resource ( test )...
Waiting for available resource ( test )...
Waiting for available resource ( test )...
Waiting for available resource ( test )...
Waiting for available resource ( test )...
Waiting for available resource ( test )...
(...)
Waiting for available resource ( test )...
Waiting for available resource ( test )...
Waiting for available resource ( test )...
WaitForResources Timeout! Killing zombies...
Waiting for all resources to finish ( test )...
Finish.
Go to page 2 to continue reading this answer.
Page 2: there is a limit for SO answer's body to 30k chars, so I need to create a new one.
Keep control on results
No mistake allowed!
YEAH ! You can launch tons of processes without taking care of resources. But what if one child process fails? There will be one undone or incomplete job!...
In fact, this is simpler (far simpler) than controlling process execution. We have a queue of jobs executed using a pool, and we only need to know if one has failed or succeded after its execution. If there is failure when the whole pool is executed, then failed processes are put on a new pool, and are executed again.
How to proceed in PHP ?
This principle is based on clusters: queue contains several jobs, but represents only one entity. Each calcul of the cluster
should succeed to get that entity complete.
Roadmap:
1 We create a todo list (to mismatch with the queue, used for process management) containing all calculs of the cluster. Each job has a status: waiting (not executed), running (executed and not finished), success and error (according to their result), and of course, at this step, their status is WAITING.
2 We run all jobs using the process manager (to keep control on resources), each one begin by telling the task manager it runs, and according to his own context, it finishes by indicating his state (failed or success).
3 When the entire queue is executed, the task manager creates a new queue with failed jobs, and loops again.
4 When all jobs succeed, you're done, and you're sure nothing went wrong. Your cluster is complete and your entity usable at upper level.
Proof of concept
There is nothing more to tell about the subject, so let's write some code, continuing the previous sample code.
As for process management, you can use several ways to synchronize your parents and children, but there is no hard logic, so there is no need for an abstraction.
So I developped a MySQL example (faster to write), you'll be free to adapt this concept for your requirements and constraints.
MySQL Create the following table:
CREATE TABLE `tasks_manager` (
`cluster_label` varchar(40),
`calcul_label` varchar(40),
`status` enum('waiting', 'running', 'failed', 'success') default 'waiting',
`updated` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
PRIMARY KEY (`cluster_label`, `calcul_label`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
PHP Here is the TaskManager.php file:
<?php
class TasksManager
{
protected $_cluster_label;
protected $_calcul_label;
protected $_sql;
const WAITING = "waiting";
const RUNNING = "running";
const SUCCESS = "success";
const FAILED = "failed";
public function __construct($label, PDO $sql)
{
$this->_sql = $sql;
$this->_cluster_label = substr($label, 0, 40);
}
public function getClusterLabel()
{
return $this->_cluster_label;
}
public function getCalculLabel()
{
return $this->_calcul_label;
}
public function destroy()
{
$request = "
DELETE FROM tasks_manager
WHERE cluster_label = ?
";
$this->_query($request, $this->_cluster_label);
return $this;
}
public function start($calcul_label)
{
$this->_calcul_label = $calcul_label;
$this->add($calcul_label, TasksManager::RUNNING);
return $this;
}
public function finish($status = TasksManager::SUCCESS)
{
if (!$this->_isStatus($status))
{
throw new Exception("{$status} is not a valid status.");
}
if (is_null($this->_cluster_label))
{
throw new Exception("finish() called, but task never started.");
}
$request = "
UPDATE tasks_manager
SET status = ?
WHERE cluster_label = ?
AND calcul_label = ?
";
$this->_query($request, $status, $this->_cluster_label, substr($this->_calcul_label, 0, 40));
return $this;
}
public function add($calcul_label, $status = TasksManager::WAITING)
{
if (!$this->_isStatus($status))
{
throw new Exception("{$status} is not a valid status.");
}
$request = "
INSERT INTO tasks_manager (
cluster_label, calcul_label, status
) VALUES (
?, ?, ?
)
ON DUPLICATE KEY UPDATE
status = ?
";
$calcul_label = substr($calcul_label, 0, 40);
$this->_query($request, $this->_cluster_label, $calcul_label, $status, $status);
return $this;
}
public function delete($calcul_label)
{
$request = "
DELETE FROM tasks_manager
WHERE cluster_label = ?
AND calcul_label = ?
";
$this->_query($request, $this->_cluster_label, substr($calcul_label, 0, 40));
return $this;
}
public function countStatus($status = TasksManager::SUCCESS)
{
if (!$this->_isStatus($status))
{
throw new Exception("{$status} is not a valid status.");
}
$request = "
SELECT COUNT(*) AS cnt
FROM tasks_manager
WHERE cluster_label = ?
AND status = ?
";
$ret = $this->_query($request, $this->_cluster_label, $status);
return $ret[0]['cnt'];
}
public function count()
{
$request = "
SELECT COUNT(id) AS cnt
FROM tasks_manager
WHERE cluster_label = ?
";
$ret = $this->_query($request, $this->_cluster_label);
return $ret[0]['cnt'];
}
public function getCalculsByStatus($status = TasksManager::SUCCESS)
{
if (!$this->_isStatus($status))
{
throw new Exception("{$status} is not a valid status.");
}
$request = "
SELECT calcul_label
FROM tasks_manager
WHERE cluster_label = ?
AND status = ?
";
$ret = $this->_query($request, $this->_cluster_label, $status);
$array = array();
if (!is_null($ret))
{
$array = array_map(function($row) {
return $row['calcul_label'];
}, $ret);
}
return $array;
}
public function switchStatus($statusA = TasksManager::RUNNING, $statusB = null)
{
if (!$this->_isStatus($statusA))
{
throw new Exception("{$statusA} is not a valid status.");
}
if ((!is_null($statusB)) && (!$this->_isStatus($statusB)))
{
throw new Exception("{$statusB} is not a valid status.");
}
if ($statusB != null)
{
$request = "
UPDATE tasks_manager
SET status = ?
WHERE cluster_label = ?
AND status = ?
";
$this->_query($request, $statusB, $this->_cluster_label, $statusA);
}
else
{
$request = "
UPDATE tasks_manager
SET status = ?
WHERE cluster_label = ?
";
$this->_query($request, $statusA, $this->_cluster_label);
}
return $this;
}
private function _isStatus($status)
{
if (!is_string($status))
{
return false;
}
return in_array($status, array(
self::FAILED,
self::RUNNING,
self::SUCCESS,
self::WAITING,
));
}
protected function _query($request)
{
$return = null;
$stmt = $this->_sql->prepare($request);
if ($stmt === false)
{
return $return;
}
$params = func_get_args();
array_shift($params);
if ($stmt->execute($params) === false)
{
return $return;
}
if (strncasecmp(trim($request), 'SELECT', 6) === 0)
{
$return = $stmt->fetchAll(PDO::FETCH_ASSOC);
}
return $return;
}
}
PHP The task_launcher.php is the usage example
<?php
require_once("AbstractProcessesPool.php");
require_once("ProcessesPoolMySQL.php");
require_once("TasksManager.php");
// Initializing database connection
$dbh = new PDO("mysql:host=127.0.0.1;dbname=fuz", 'root', 'root');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// Initializing process pool
$pool = new ProcessesPoolMySQL($label = "pool test", $dbh);
$pool->create($max = "10");
// Initializing task manager
$multi = new TasksManager($label = "jobs test", $dbh);
$multi->destroy();
// Simulating jobs
$count = 20;
$todo_list = array ();
for ($i = 0; ($i < $count); $i++)
{
$todo_list[$i] = "Job {$i}";
$multi->add($todo_list[$i], TasksManager::WAITING);
}
// Infinite loop until all jobs are done
$continue = true;
while ($continue)
{
$continue = false;
echo "Starting to run jobs in queue ...\n";
// put all failed jobs to WAITING status
$multi->switchStatus(TasksManager::FAILED, TasksManager::WAITING);
foreach ($todo_list as $job)
{
$ret = $pool->waitForResource($timeout = 10, $interval = 500000, "waitResource");
if ($ret)
{
echo "Executing job: $job\n";
exec(sprintf("/usr/bin/php ./tasks_program.php %s > /dev/null &", escapeshellarg($job)));
}
else
{
echo "waitForResource timeout!\n";
$pool->killAllResources();
// All jobs currently running are considered dead, so, failed
$multi->switchStatus(TasksManager::RUNNING, TasksManager::FAILED);
break;
}
}
$ret = $pool->waitForTheEnd($timeout = 10, $interval = 500000, "waitEnd");
if ($ret == false)
{
echo "waitForTheEnd timeout!\n";
$pool->killAllResources();
// All jobs currently running are considered dead, so, failed
$multi->switchStatus(TasksManager::RUNNING, TasksManager::FAILED);
}
echo "All jobs in queue executed, looking for errors...\n";
// Counts if there is failures
$nb_failed = $multi->countStatus(TasksManager::FAILED);
if ($nb_failed > 0)
{
$todo_list = $multi->getCalculsByStatus(TasksManager::FAILED);
echo sprintf("%d jobs failed: %s\n", $nb_failed, implode(', ', $todo_list));
$continue = true;
}
}
function waitResource($multi)
{
echo "Waiting for a resource ....\n";
}
function waitEnd($multi)
{
echo "Waiting for the end .....\n";
}
// All jobs finished, destroying task manager
$multi->destroy();
// Destroying process pool
$pool->destroy();
echo "Finish.\n";
PHP And here is the child program (a calcul)
<?php
if (!isset($argv[1]))
{
die("This program must be called with an identifier (calcul_label)\n");
}
$calcul_label = $argv[1];
require_once("AbstractProcessesPool.php");
require_once("ProcessesPoolMySQL.php");
require_once("TasksManager.php");
// Initializing database connection
$dbh = new PDO("mysql:host=127.0.0.1;dbname=fuz", 'root', 'root');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// Initializing process pool (with same label as parent)
$pool = new ProcessesPoolMySQL($label = "pool test", $dbh);
// Takes one resource in pool
$pool->start();
// Initializing task manager (with same label as parent)
$multi = new TasksManager($label = "jobs test", $dbh);
$multi->start($calcul_label);
// Simulating execution time
$secs = (rand() % 2) + 3;
sleep($secs);
// Simulating job status
$status = rand() % 3 == 0 ? TasksManager::FAILED : TasksManager::SUCCESS;
// Job finishes indicating his status
$multi->finish($status);
// Releasing pool's resource
$pool->finish();
Demo output This demo will give you something like this (too large for SO).
Synchronization and communication between processes
There is solutions to communicate easily without errors.
Take a coffee, we're close to the end!
We are now able to launch tons of processes and they are all giving the expected result, well, this is not bad.
But now, all our processes are executed stand-alone, and they actually can't communicate each others. That is your
core problem, and there is a lot of solutions.
It is really difficult to tell you exactly what kind of communication you need. You were speaking about what you tried (IPCs,
communication using files, or home-made protocols), but not what kind of information are shared between your processes.
Anyway, I invite you to think about an OOP solution.
PHP is powerful.
PHP has magic methods :
__get($property) let us implement the access of a $property on an object
__set($property, $value) let us implement the assignation of a $property on an object
PHP can handle files, with concurrent-access management
fopen($file, 'c+') opens a file with advisory lock options enabled (allow you to use flock)
flock($descriptor, LOCK_SH) takes a shared lock (for reading)
flock($descriptor, LOCK_EX) takes an exclusive lock (for writting)
Finally, PHP has:
json_encode($object) to get a json representation of an object
json_decode($string) to get back an object from a json string
You see where I'm going? We will create a Synchro class that will work the same way as the stdClass class, but it will be always safely synchronized on a file. Our processes will be able to access the same instance of that object at the same time.
Some Linux systems tricks
Of course, if you have 150 processes working on the same file at the same time, your hard drive will slow down your processes. To
handle this issue, why not creating a filesystem partition on RAM? Writing into that file will be about as quick as writing
in memory!
shell As root, type the following commands:
mkfs -q /dev/ram1 65536
mkdir -p /ram
mount /dev/ram1 /ram
Some notes :
65536 is in kilobytes, here you get a 64M partition.
if you want to mount that partition at startup, create a shell script and call it inside /etc/rc.local file.
Implementation
PHP Here is the Synchro.php class.
<?php
class Synchro
{
private $_file;
public function __construct($file)
{
$this->_file = $file;
}
public function __get($property)
{
// File does not exist
if (!is_file($this->_file))
{
return null;
}
// Check if file is readable
if ((is_file($this->_file)) && (!is_readable($this->_file)))
{
throw new Exception(sprintf("File '%s' is not readable.", $this->_file));
}
// Open file with advisory lock option enabled for reading and writting
if (($fd = fopen($this->_file, 'c+')) === false)
{
throw new Exception(sprintf("Can't open '%s' file.", $this->_file));
}
// Request a lock for reading (hangs until lock is granted successfully)
if (flock($fd, LOCK_SH) === false)
{
throw new Exception(sprintf("Can't lock '%s' file for reading.", $this->_file));
}
// A hand-made file_get_contents
$contents = '';
while (($read = fread($fd, 32 * 1024)) !== '')
{
$contents .= $read;
}
// Release shared lock and close file
flock($fd, LOCK_UN);
fclose($fd);
// Restore shared data object and return requested property
$object = json_decode($contents);
if (property_exists($object, $property))
{
return $object->{$property};
}
return null;
}
public function __set($property, $value)
{
// Check if directory is writable if file does not exist
if ((!is_file($this->_file)) && (!is_writable(dirname($this->_file))))
{
throw new Exception(sprintf("Directory '%s' does not exist or is not writable.", dirname($this->_file)));
}
// Check if file is writable if it exists
if ((is_file($this->_file)) && (!is_writable($this->_file)))
{
throw new Exception(sprintf("File '%s' is not writable.", $this->_file));
}
// Open file with advisory lock option enabled for reading and writting
if (($fd = fopen($this->_file, 'c+')) === false)
{
throw new Exception(sprintf("Can't open '%s' file.", $this->_file));
}
// Request a lock for writting (hangs until lock is granted successfully)
if (flock($fd, LOCK_EX) === false)
{
throw new Exception(sprintf("Can't lock '%s' file for writing.", $this->_file));
}
// A hand-made file_get_contents
$contents = '';
while (($read = fread($fd, 32 * 1024)) !== '')
{
$contents .= $read;
}
// Restore shared data object and set value for desired property
if (empty($contents))
{
$object = new stdClass();
}
else
{
$object = json_decode($contents);
}
$object->{$property} = $value;
// Go back at the beginning of file
rewind($fd);
// Truncate file
ftruncate($fd, strlen($contents));
// Save shared data object to the file
fwrite($fd, json_encode($object));
// Release exclusive lock and close file
flock($fd, LOCK_UN);
fclose($fd);
return $value;
}
}
Demonstration
We will continue (and finish) our processes / tasks example by making our processes communicate each others.
Rules :
Our goal is to get the sum of all numbers between 1 and 20.
We have 20 processes, with ID from 1 to 20.
Those processes are randomly queued for execution.
Each process (except process 1) can do only one calculation : its id + the previous process's result
Process 1 directly put his id
Each process succeed if it can do his calculation (means, if the previous process's result is available), else it fails (and is candidate for a new queue)
The pool's timeout expires after 10 seconds
Well, it looks complicated but actually, it is a good representation of what you'll find in a real-life situation.
PHP synchro_launcher.php file.
<?php
require_once("AbstractProcessesPool.php");
require_once("ProcessesPoolMySQL.php");
require_once("TasksManager.php");
require_once("Synchro.php");
// Removing old synchroized object
if (is_file("/tmp/synchro.txt"))
{
unlink("/tmp/synchro.txt");
}
// Initializing database connection
$dbh = new PDO("mysql:host=127.0.0.1;dbname=fuz", 'root', 'root');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// Initializing process pool
$pool = new ProcessesPoolMySQL($label = "synchro pool", $dbh);
$pool->create($max = "10");
// Initializing task manager
$multi = new TasksManager($label = "synchro tasks", $dbh);
$multi->destroy();
// Simulating jobs
$todo_list = array ();
for ($i = 1; ($i <= 20); $i++)
{
$todo_list[$i] = $i;
$multi->add($todo_list[$i], TasksManager::WAITING);
}
// Infinite loop until all jobs are done
$continue = true;
while ($continue)
{
$continue = false;
echo "Starting to run jobs in queue ...\n";
// Shuffle all jobs (else this will be too easy :-))
shuffle($todo_list);
// put all failed jobs to WAITING status
$multi->switchStatus(TasksManager::FAILED, TasksManager::WAITING);
foreach ($todo_list as $job)
{
$ret = $pool->waitForResource($timeout = 10, $interval = 500000, "waitResource");
if ($ret)
{
echo "Executing job: $job\n";
exec(sprintf("/usr/bin/php ./synchro_program.php %s > /dev/null &", escapeshellarg($job)));
}
else
{
echo "waitForResource timeout!\n";
$pool->killAllResources();
// All jobs currently running are considered dead, so, failed
$multi->switchStatus(TasksManager::RUNNING, TasksManager::FAILED);
break;
}
}
$ret = $pool->waitForTheEnd($timeout = 10, $interval = 500000, "waitEnd");
if ($ret == false)
{
echo "waitForTheEnd timeout!\n";
$pool->killAllResources();
// All jobs currently running are considered dead, so, failed
$multi->switchStatus(TasksManager::RUNNING, TasksManager::FAILED);
}
echo "All jobs in queue executed, looking for errors...\n";
// Counts if there is failures
$multi->switchStatus(TasksManager::WAITING, TasksManager::FAILED);
$nb_failed = $multi->countStatus(TasksManager::FAILED);
if ($nb_failed > 0)
{
$todo_list = $multi->getCalculsByStatus(TasksManager::FAILED);
echo sprintf("%d jobs failed: %s\n", $nb_failed, implode(', ', $todo_list));
$continue = true;
}
}
function waitResource($multi)
{
echo "Waiting for a resource ....\n";
}
function waitEnd($multi)
{
echo "Waiting for the end .....\n";
}
// All jobs finished, destroying task manager
$multi->destroy();
// Destroying process pool
$pool->destroy();
// Recovering final result
$synchro = new Synchro("/tmp/synchro.txt");
echo sprintf("Result of the sum of all numbers between 1 and 20 included is: %d\n", $synchro->result20);
echo "Finish.\n";
PHP And its associed synchro_calcul.php file.
<?php
if (!isset($argv[1]))
{
die("This program must be called with an identifier (calcul_label)\n");
}
$current_id = $argv[1];
require_once("AbstractProcessesPool.php");
require_once("ProcessesPoolMySQL.php");
require_once("TasksManager.php");
require_once("Synchro.php");
// Initializing database connection
$dbh = new PDO("mysql:host=127.0.0.1;dbname=fuz", 'root', 'root');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// Initializing process pool (with same label as parent)
$pool = new ProcessesPoolMySQL($label = "synchro pool", $dbh);
// Takes one resource in pool
$pool->start();
// Initializing task manager (with same label as parent)
$multi = new TasksManager($label = "synchro tasks", $dbh);
$multi->start($current_id);
// ------------------------------------------------------
// Job begins here
$synchro = new Synchro("/tmp/synchro.txt");
if ($current_id == 1)
{
$synchro->result1 = 1;
$status = TasksManager::SUCCESS;
}
else
{
$previous_id = $current_id - 1;
if (is_null($synchro->{"result{$previous_id}"}))
{
$status = TasksManager::FAILED;
}
else
{
$synchro->{"result{$current_id}"} = $synchro->{"result{$previous_id}"} + $current_id;
$status = TasksManager::SUCCESS;
}
}
// ------------------------------------------------------
// Job finishes indicating his status
$multi->finish($status);
// Releasing pool's resource
$pool->finish();
Output
The following demo will give you something like this output (too large for SO)
Conclusion
Task management in PHP is not really easy because of the missing threads. As many developers, I hope this feature will come
built-in someday. Anyway, this is possible to control resources and results, and to share data between processes, so we can do,
with some work I assume, task management efficiently.
Synchronization and communication can be done by several ways, but you need to check out pros and cons for each one, according
to your constraints and requirements. For example:
if you need to launch 500 tasks at once and want to use a MySQL synchronization method, you'll need 1+500 simultaneous connections to the database (it may not appreciate a lot).
If you need to share a big amount of data, using only one file may be inefficient.
If you are using file for synchronization, don't forget to have a look to system built-in tools such as /dev/sdram.
Try to stay as more as possible in an object-oriented programming to handle your troubles. Home-made protocols or such will make your app harder to maintain.
I gave you my 2-cents about this interesting subject, and I hope it will give you some ideas to solve your issues.
I recommend you take a look at this library called PHP-Queue: https://github.com/CoderKungfu/php-queue
An short description from its github page:
A unified front-end for different queuing backends. Includes a REST
server, CLI interface and daemon runners.
Check out its github page for more details.
With a bit of tinkering, I think this library will help you solve your problem.
Hope this helps.
Problem: I want to implement several php-worker processes who are listening on a MQ-server queue for asynchronous jobs. The problem now is that simply running this processes as daemons on a server doesn't really give me any level of control over the instances (Load, Status, locked up)...except maybe for dumping ps -aux.
Because of that I'm looking for a runtime environment of some kind that lets me monitor and control the instances, either on system (process) level or on a higher layer (some kind of Java-style appserver)
Any pointers?
Here's some code that may be useful.
<?
define('WANT_PROCESSORS', 5);
define('PROCESSOR_EXECUTABLE', '/path/to/your/processor');
set_time_limit(0);
$cycles = 0;
$run = true;
$reload = false;
declare(ticks = 30);
function signal_handler($signal) {
switch($signal) {
case SIGTERM :
global $run;
$run = false;
break;
case SIGHUP :
global $reload;
$reload = true;
break;
}
}
pcntl_signal(SIGTERM, 'signal_handler');
pcntl_signal(SIGHUP, 'signal_handler');
function spawn_processor() {
$pid = pcntl_fork();
if($pid) {
global $processors;
$processors[] = $pid;
} else {
if(posix_setsid() == -1)
die("Forked process could not detach from terminal\n");
fclose(stdin);
fclose(stdout);
fclose(stderr);
pcntl_exec(PROCESSOR_EXECUTABLE);
die('Failed to fork ' . PROCESSOR_EXECUTABLE . "\n");
}
}
function spawn_processors() {
global $processors;
if($processors)
kill_processors();
$processors = array();
for($ix = 0; $ix < WANT_PROCESSORS; $ix++)
spawn_processor();
}
function kill_processors() {
global $processors;
foreach($processors as $processor)
posix_kill($processor, SIGTERM);
foreach($processors as $processor)
pcntl_waitpid($processor);
unset($processors);
}
function check_processors() {
global $processors;
$valid = array();
foreach($processors as $processor) {
pcntl_waitpid($processor, $status, WNOHANG);
if(posix_getsid($processor))
$valid[] = $processor;
}
$processors = $valid;
if(count($processors) > WANT_PROCESSORS) {
for($ix = count($processors) - 1; $ix >= WANT_PROCESSORS; $ix--)
posix_kill($processors[$ix], SIGTERM);
for($ix = count($processors) - 1; $ix >= WANT_PROCESSORS; $ix--)
pcntl_waitpid($processors[$ix]);
} elseif(count($processors) < WANT_PROCESSORS) {
for($ix = count($processors); $ix < WANT_PROCESSORS; $ix++)
spawn_processor();
}
}
spawn_processors();
while($run) {
$cycles++;
if($reload) {
$reload = false;
kill_processors();
spawn_processors();
} else {
check_processors();
}
usleep(150000);
}
kill_processors();
pcntl_wait();
?>
It sounds like you already have a MQ up and running on a *nix system and just want a way to manage workers.
A very simple way to do so is to use GNU screen. To start 10 workers you can use:
#!/bin/sh
for x in `seq 1 10` ; do
screen -dmS worker_$x php /path/to/script.php worker$x
end
This will start 10 workers in the background using screens named worker_1,2,3 and so on.
You can reattach to the screens by running screen -r worker_ and list the running workers by using screen -list.
For more info this guide may be of help:
http://www.kuro5hin.org/story/2004/3/9/16838/14935
Also try:
screen --help
man screen
or google.
For production servers I would normally recommend using the normal system startup scripts, but I have been running screen commands from the startup scripts for years with no problems.
Do you actually need it to be continuously running?
If you only want to spawn new process on request, you can register it as a service in xinetd.
a pcntl plugin type server daemon for PHP
http://dev.pedemont.com/sonic/
Bellow is our working implementation of #chaos answer. Code to handle signals was removed as this script lives usually just few milliseconds.
Also, in code we added 2 functions to save pids between calls: restore_processors_state() and save_processors_state(). We've used redis there, but you can decide to use implementation on files.
We run this script every minute using cron. Cron checks if all processes alive. If not - it re-run them and then dies. If we want to kill existing processes then we simply run this script with argument kill: php script.php kill.
Very handy way of running workers without injecting scripts into init.d.
<?php
include_once dirname( __FILE__ ) . '/path/to/bootstrap.php';
define('WANT_PROCESSORS', 5);
define('PROCESSOR_EXECUTABLE', '' . dirname(__FILE__) . '/path/to/worker.php');
set_time_limit(0);
$run = true;
$reload = false;
declare(ticks = 30);
function restore_processors_state()
{
global $processors;
$redis = Zend_Registry::get('redis');
$pids = $redis->hget('worker_procs', 'pids');
if( !$pids )
{
$processors = array();
}
else
{
$processors = json_decode($pids, true);
}
}
function save_processors_state()
{
global $processors;
$redis = Zend_Registry::get('redis');
$redis->hset('worker_procs', 'pids', json_encode($processors));
}
function spawn_processor() {
$pid = pcntl_fork();
if($pid) {
global $processors;
$processors[] = $pid;
} else {
if(posix_setsid() == -1)
die("Forked process could not detach from terminal\n");
fclose(STDIN);
fclose(STDOUT);
fclose(STDERR);
pcntl_exec('/usr/bin/php', array(PROCESSOR_EXECUTABLE));
die('Failed to fork ' . PROCESSOR_EXECUTABLE . "\n");
}
}
function spawn_processors() {
restore_processors_state();
check_processors();
save_processors_state();
}
function kill_processors() {
global $processors;
foreach($processors as $processor)
posix_kill($processor, SIGTERM);
foreach($processors as $processor)
pcntl_waitpid($processor, $trash);
unset($processors);
}
function check_processors() {
global $processors;
$valid = array();
foreach($processors as $processor) {
pcntl_waitpid($processor, $status, WNOHANG);
if(posix_getsid($processor))
$valid[] = $processor;
}
$processors = $valid;
if(count($processors) > WANT_PROCESSORS) {
for($ix = count($processors) - 1; $ix >= WANT_PROCESSORS; $ix--)
posix_kill($processors[$ix], SIGTERM);
for($ix = count($processors) - 1; $ix >= WANT_PROCESSORS; $ix--)
pcntl_waitpid($processors[$ix], $trash);
}
elseif(count($processors) < WANT_PROCESSORS) {
for($ix = count($processors); $ix < WANT_PROCESSORS; $ix++)
spawn_processor();
}
}
if( isset($argv) && count($argv) > 1 ) {
if( $argv[1] == 'kill' ) {
restore_processors_state();
kill_processors();
save_processors_state();
exit(0);
}
}
spawn_processors();