Stopping gearman workers nicely

Stopping gearman workers nicely - php

I have a number of Gearman workers running constantly, saving things like records of user page views, etc. Occasionally, I'll update the PHP code that is used by the Gearman workers. In order to get the workers to switch to the new code, I the kill and restart the PHP processes for the workers.
What is a better way to do this? Presumably, I'm sometime losing data (albeit not very important data) when I kill one of those worker processes.
Edit: I found an answer that works for me, and posted it below.

Solution 1
Generally I run my workers with the unix daemon utility with the -r flag and let them expire after one job. Your script will end gracefully after each iteration and daemon will restart automatically.
Your workers will be stale for one job but that may not be as big a deal to you as losing data
This solution also has the advantage of freeing up memory. You may run into problems with memory if you're doing large jobs as PHP pre 5.3 has god awful GC.
Solution 2
You could also add a quit function to all of your workers that exits the script. When you'd like to restart you simply give gearman calls to quit with a high priority.

function AutoRestart() {
static $startTime = time();
if (filemtime(__FILE__) > $startTime) {
exit();
}
}
AutoRestart();

Well, I posted this question, now I think I have found a good answer to it.
If you look in the code for Net_Gearman_Worker, you'll find that in the work loop, the function stopWork is monitored, and if it returns true, it exits the function.
I did the following:
Using memcache, I created a cached value, gearman_restarttime, and I use a separate script to set that to the current timestamp whenever I update the site. (I used Memcache, but this could be stored anywhere--a database, a file, or anything).
I extended the Worker class to be, essentially, Net_Gearman_Worker_Foo, and had all of my workers instantiate that. In the Foo class, I overrode the stopWork function to do the following: first, it checks gearman_restarttime; the first time through, it saves the value in a global variable. From then on, each time through, it compares the cached value to the global. If it has changed, the stopWork returns true, and the worker quits. A cron checks every minute to see if each worker is still running, and restarts any worker that has quit.
It may be worth putting a timer in stopWork as well, and checking the cache only once every x minutes. In our case, Memcache is fast enough that checking the value each time doesn't seem to be a problem, but if you are using some other system to store off the current timestamp, checking less often would be better.

Hmm, You could implement a code in the workers to check occasionally if the source code was modified, if yes then just just kill themselves when they see fit. That is, check while they are in the middle of the job, and if job is very large.
Other way would be implement some kind of an interrupt, maybe via network to say stop whenever you have the chance and restart.
The last solution is helping to modify Gearman's source to include this functionality.

I've been looking at this recently as well (though in perl with Gearman::XS). My usecase was the same as yours - allow a long-running gearman worker to periodically check for a new version of itself and reload.
My first attempt was just having the worker keep track of how long since it last checked the worker script version (an md5sum would also work). Then once N seconds had elapsed, between jobs, it would check to see if a new version of itself was available, and restart itself (fork()/exec()). This did work OK, but workers registered for rare jobs could potentially end up waiting hours for work() to return, and thus for checking the current time.
So I'm now setting a fairly short timeout when waiting for jobs with work(), so I can check the time more regularly. The PHP interface suggest that you can set this timeout value when registering for the job. I'm using SIGALRM to trigger the new-version check. The perl interface blocks on work(), so the alarm wasn't being triggered initially. Setting the timeout to 60 seconds got the SIGALRM working.

If someone were looking for answer for a worker running perl, that's part of what the GearmanX::Starter library is for. You can stop workers after completing the current job two different ways: externally by sending the worker process a SIGTERM, or programmatically by setting a global variable.

Given the fact that the workers are written in PHP, it would be a good idea to recycle them on a known schedule. This can be a static amount of time since started or can be done after a certain number of jobs have been attempted.
This essentially kills (no pun intended) two birds with one stone. You are are mitigating the potential for memory leaks, and you have a consistent way to determine when your workers will pick up on any potentially new code.
I generally write workers such that they report their interval to stdout and/or to a logging facility so it is simple to check on where a worker is in the process.

I ran into this same problem and came up with a solution for python 2.7.
I'm writing a python script which uses gearman to communicate with other components on the system. The script will have multiple workers, and I have each worker running in separate thread. The workers all receive gearman data, they process and store that data on a message queue, and the main thread can pull the data off of the queue as necessary.
My solution to cleanly shutting down each worker was to subclass gearman.GearmanWorker and override the work() function:
from gearman import GearmanWorker
POLL_TIMEOUT_IN_SECONDS = 60.0
class StoppableWorker(GearmanWorker):
def __init__(self, host_list=None):
super(StoppableWorker,self).__init__(host_list=host_list)
self._exit_runloop = False
# OVERRIDDEN
def work(self, poll_timeout=POLL_TIMEOUT_IN_SECONDS):
worker_connections = []
continue_working = True
def continue_while_connections_alive(any_activity):
return self.after_poll(any_activity)
while continue_working and not self._exit_runloop:
worker_connections = self.establish_worker_connections()
continue_working = self.poll_connections_until_stopped(
worker_connections,
continue_while_connections_alive,
timeout=poll_timeout)
for current_connection in worker_connections:
current_connection.close()
self.shutdown()
def stopwork(self):
self._exit_runloop = True
Use it just like GearmanWorker. When it's time to exit the script, call the stopwork() function. It won't stop immediately--it can take up to poll_timeout seconds before it kicks out of the run loop.
There may be multiple smart ways to invoke the stopwork() function. In my case, I create a temporary gearman client in the main thread. For the worker that I'm trying to shutdown, I send a special STOP command through the gearman server. When the worker gets this message, it knows to shut itself down.
Hope this helps!

http://phpscaling.com/2009/06/23/doing-the-work-elsewhere-sidebar-running-the-worker/
Like the above article demonstrates, I've run a worker inside a BASH shell script, exiting occasionally between jobs to cleanup (or re-load the worker-script) - or if a given task is given to it it can exit with a specific exit code and to shut down.

I use following code which supports both Ctrl-C and kill -TERM. By default supervisor sends TERM signal if have not modified signal= setting. In PHP 5.3+ declare(ticks = 1) is deprecated, use pcntl_signal_dispatch() instead.
$terminate = false;
pcntl_signal(SIGINT, function() use (&$terminate)
{
$terminate = true;
});
pcntl_signal(SIGTERM, function() use (&$terminate)
{
$terminate = true;
});
$worker = new GearmanWorker();
$worker->addOptions(GEARMAN_WORKER_NON_BLOCKING);
$worker->setTimeout(1000);
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('reverse', function(GearmanJob $job)
{
return strrev($job->workload());
});
$count = 500 + rand(0, 100); // rand to prevent multple workers restart at same time
for($i = 0; $i < $count; $i++)
{
if ( $terminate )
{
break;
}
else
{
pcntl_signal_dispatch();
}
$worker->work();
if ( $terminate )
{
break;
}
else
{
pcntl_signal_dispatch();
}
if ( GEARMAN_SUCCESS == $worker->returnCode() )
{
continue;
}
if ( GEARMAN_IO_WAIT != $worker->returnCode() && GEARMAN_NO_JOBS != $worker->returnCode() )
{
$e = new ErrorException($worker->error(), $worker->returnCode());
// log exception
break;
}
$worker->wait();
}
$worker->unregisterAll();

This would fit nicely into your continuous integration system. I hope you have it or you should have it soon :-)
As you check in new code, it automatically gets built and deployed onto the server. As a part of the build script, you kill all workers, and launch new ones.

What I do is use gearmadmin to check if there are any jobs running. I used the admin API to make a UI for this. When the jobs are sitting idly, there is no harm in killing them.

Related

Best way to guarantee a job is been executed

I have a script that is running continuously in the server, in this case a PHP script, like:
php path/to/my/index.php.
It's been executed, and when it's done, it's executed again, and again, forever.
I'm looking for the best way to be notified if that event stop running(been executed).
There are many reasons why it stops been called, like server memory, new deployment, human error... etc.
I just want to be notified(email, sms, slack...) if that script was not executed for certain amount of time(like 1 hour, 1 day, etc...)
My server is Ubuntu living in AWS.
An idea:
I was thinking on having an index in REDIS/MEMCACHED/ETC with a TTL. Every time the script run, renovate that TTL for this index.
If the script stop working for that TTL time, this index will expire. I just need a way to trigger a notification when that expiration happen, but looks like REDIS/MEMCACHED are not prepared for that

register_shutdown_function might help, but might not... https://www.php.net/manual/en/function.register-shutdown-function.php
I can't say i've ever seen a script that needs to run indefinitely in PHP. Perhaps there is another way to solve the problem you are after?

Update - Following your redis idea, I'd look at keyspace notifications. https://redis.io/topics/notifications
I've not tested the idea since I'm not actually a redis user. But it may be possible to subscribe to capture the expiration event (perhaps from another server?) and generate your notification.
There's no 'best' way to do this. Ultimately, what works best will boil down to the specific workflow you're supporting.
tl;dr version: Find what constitutes success and record the most recent time it happened. Use that for your notification trigger in another script.
Long version:
That said, persistent storage with a separate watcher is probably the most straight-forward way to do this. Record the last successful run, and then check it with a cron job every so often.
For what it's worth, for scripts like this I generally monitor exit codes or logs produced by the script in question. This isolates the error notification process from the script itself so a flaw in the script (hopefully) doesn't hamper the notification.
For a barebones example, say we have a script to invoke the actual script... (This is very much untested pseudo-code)
<?php
//Run and record.
exec("php path/to/my/index.php", $output, $return_code);
//$return_code will be 255 on fatal errors. You can use other return codes
//with exit in your called script to report other fail states.
if($return_code == 0) {
file_put_contents('/path/to/folder/last_success.txt', time());
} else {
file_put_contents('/path/to/folder/error_report.json', json_encode([
'return_code' => $return_code,
'time' => time(),
'output' => implode("\n", $output),
//assuming here that error output isn't silently logged somewhere already.
], JSON_PRETTY_PRINT));
}
And then a watcher.php that monitors these files on a cron job.
<?php
//Notify us immediately on failure maybe?
//If you have a lot of transient failures it may make more sense to
//aggregate and them in a single report at a specific time instead.
if(is_file('/path/to/folder/error_report.json')) {
//Mail details stored in JSON here.
//rename file so it's recorded, but we don't receive it again.
rename('/path/to/folder/error_report.json', '/path/to/folder/error_report.json'.'-sent-'.date('Y-m-d-H-i-s'));
} else {
if(is_file('/path/to/folder/last_success.txt')) {
$last_success = intval(file_get_contents('/path/to/folder/last_success.txt'));
if(strtotime('-24 hours') > $last_success) {
//Our script hasn't run in 24 hours, let someone know.
}
} else {
//No successful run recorded. Might want to put code here if that's unexpected.
}
}
Notes: There are some caveats to the specific approach displayed above. A script can fail in a non-fatal way and if you're not checking for it this example could record that as a successful run. For example, permissions errors causing warnings but the script still runs it's full course and exits normally without hitting an exit call with a specific return code. Our example invoker here would log that as a successful run - even though it isn't.
Another option is to log success from your script and only check for error exits from the invoker.

execute a PHP method every X seconds?

Context :
I'm making a PHP websocket server (here) running as a DAEMON in which there is obviously a main loop listening for sockets connections and incoming data so i can't just create an other loop with a sleep(x_number_of_seconds); in it because it'll freeze my whole server.
I can't execute an external script with a CRON job or fork a new process too (i guess) because I have to be in the scope of my server class to send data to connected client sockets.
Does anyone knows a magic trick to achieve this in PHP ? :/
Some crazy ideas :
Keeping track of the last loop execution time with microtime(true), and compare it with the current time on each loop, if it's about my desired X seconds interval, execute the method... which would result in a very drunk and inconsistent interval loop.
Run a JavaScript setInterval() in a browser that will communicate with my server trough a websocket and tell it to execute my method... i said they where crazy ideas !
Additional infos about what i'm trying to achieve :
I'm making a little online game (RPG like) in which I would like to add some NPCs that updates their behaviours every X seconds.
Is there an other ways of achieving this ? Am I missing something ? Should I rewrite my server in Node.js ??
Thanks a lot for the help !

A perfect alternative doesn't seams to exists so I'll use my crazy solution #1 :
$this->last_tick_time = microtime(true);
$this->tick_interval = 1;
$this->tick_counter = 0;
while(true)
{
//loop code here...
$t= microtime(true) - $this->last_tick_time;
if($t>= $this->tick_interval)
{
$this->on_server_tick(++$this->tick_counter);
$this->last_tick_time = microtime(true) - ($t- $this->tick_interval);
}
}
Basically, if the time elapsed since the last server tick is greater or equal to my desired tick interval, execute on_server_tick() method. And most importantly : we subtract the time overflow to make the next tick happen faster if this one happened too late. This way we fill the gaps and at the end, if the socket_select timeout is set to 1 second, we will never have a gap greater than 1.99999999+ second.
I also keep track of the tick counter, this way I can use modulo (%) to execute code on multiple intervals like this :
protected function on_server_tick($counter)
{
if($counter%5 == 0)
{
// 5 seconds interval
}
if($counter%10 == 0)
{
// 10 seconds interval
}
}
which covers all my needs ! :D
Don't worry PHP, I won't replace you with Node.js, you still my friend.

It looks to me like the websocket-framework you are using is too primitive to allow your server to do other useful things while waiting for connections from clients. The only call to PHP's socket_select() function is hard-coded to a one second timeout, and it does nothing when the time runs out. It really ought to allow a callback or an outside loop.
Look at the http://php.net/manual/en/function.socket-select.php manual page. The last parameter is a timeout time. socket_select() waits for incoming data on a socket or until the timeout time is up, which sounds like what you want to do, but the library has no provision for it. Then look at how the library uses it in core/classes/SocketServer.php.
I'm assuming you call run() and then it just never returns to your calling code until it gets a message on the socket, which prevents you from doing anything.

Can a PHP script be scheduled to run at a specific time or after a specific amount of time has expired?

I am writing a social cloud game for Android and using PHP on the server. Almost all aspects of the game will be user or user-device driven, so most of the time the device will send a request to the server and the server will, in turn, send a response to the device. Sometimes the server will also send out push messages to the devices, but generally in response to one user's device contacting the server.
There is one special case, however, where a user can set a "timer" and, after the given time has elapsed, the server needs to send the push messages to all of the devices. One way to do this would be to keep the timer local to the user's device and, once it goes off, send the signal to the server to send the push messages. However, there were several reasons why I did not want to do it this way. For instance, if the user decides not to play anymore or loses the game, the timer should technically remain in play.
I looked around for a method in PHP that would allow me to do something like this, but all I came up with were alarms, which are not what I need. I also thought of cron jobs and, indeed, they have been recommended for similar situations on this and other forums, but since this is not a recurring event but, rather, a one time event to take place at an arbitrary point in time, I did not know that a cron job is what I want either.
My current best solution involves a cron job that runs once a second and checks to see if one of these events is to occur in the next second and, if so, sends out the push messages. Is this the proper way to handle this situation, or is there a better tool out there that I just haven't found yet?

cron is great for scripts run on a regular basis, but if you want a one-off (or two-off) script to run at a particular time you would use the unix 'at' command, and you can do it directly from php using code like this:
/****
* Schedule a command using the AT command
*
* To do this you need to ensure that the www-data user is allowed to
* use the 'at' command - check this in /etc/at.deny
*
*
* EXAMPLE USAGE ::
*
* scriptat( '/usr/bin/command-to-execute', 'time-to-run');
* The time-to-run shoud be in this format: strftime("%Y%m%d%H%M", $unixtime)
*
**/
function scriptat( $cmd = null, $time = null ) {
// Both parameters are required
if (!$cmd) {
error_log("******* ScriptAt: cmd not specified");
return false;
}
if (!$time) {
error_log("******* ScriptAt: time not specified");
return false;
}
// We need to locate php (executable)
if (!file_exists("/usr/bin/php")) {
error_log("~ ScriptAt: Could not locate /usr/bin/php");
return false;
}
$fullcmd = "/usr/bin/php -f $cmd";
$r = popen("/usr/bin/at $time", "w");
if (!$r) {
error_log("~ ScriptAt: unable to open pipe for AT command");
return false;
}
fwrite($r, $fullcmd);
pclose($r);
error_log("~ ScriptAt: cmd=${cmd} time=${time}");
return true;
}

soloution 1 :
your php file can include a ultimate loop
$con = true;
while($con)
{
//do sample operation
if($end)
$con = false;
else
sleep(5); // 5 seconds for example
}
soloution 2 :
use cron jobs -- Depend on yout CP you can follow the instruction and call your php program at the specific times
limit : in cron job the minimum time between two calling is 1 minute
soloution 3 :
use a shell script and call your php program when ever you want

You can make PHP sleep for a certain amount of time - it will then resume the code afterwards but this is seriously not recommended because when a script sleeps it still uses up processor resources, and if you had multiple scripts sleeping for long periods of time it would put impossible load on your server.
The only other option that I know of is Cron. As #Pete says, you can manage Cron jobs from within PHP, e.g.:
http://net.tutsplus.com/tutorials/php/managing-cron-jobs-with-php-2/
This is going to involve a fair bit of coding, but I think it is your best options.
Another option is to have your user's browser call a PHP function using an Ajax request and JavaScript's setTimeout as suggested by the accepted answer in this question:
how to call a function in PHP after 10 seconds of the page load (Not using HTML)

Executing functions parallelly in PHP

Can PHP call a function and don't wait for it to return? So something like this:
function callback($pause, $arg) {
sleep($pause);
echo $arg, "\n";
}
header('Content-Type: text/plain');
fast_call_user_func_array('callback', array(3, 'three'));
fast_call_user_func_array('callback', array(2, 'two'));
fast_call_user_func_array('callback', array(1, 'one'));
would output
one (after 1 second)
two (after 2 seconds)
three (after 3 seconds)
rather than
three (after 3 seconds)
two (after 3 + 2 = 5 seconds)
one (after 3 + 2 + 1 = 6 seconds)
Main script is intended to be run as a permanent process (TCP server). callback() function would receive data from client, execute external PHP script and then do something based on other arguments that are passed to callback(). The problem is that main script must not wait for external PHP script to finish. Result of external script is important, so exec('php -f file.php &') is not an option.
Edit:
Many have recommended to take a look at PCNTL, so it seems that such functionality can be achieved. PCNTL is not available in Windows, and I don't have an access to a Linux machine right now, so I can't test it, but if so many people have advised it, then it should do the trick :)
Thanks, everyone!

On Unix platforms you can enable the PCNTL functions, and use pcntl_fork to fork the process and run your jobs in child processes.
Something like:
function fast_call_user_func_array($func, $args) {
if (pcntl_fork() == 0) {
call_user_func_array($func, $args);
}
}
Once you call pcntl_fork, two processes will execute your code from the same position. The parent process will get a PID returned from pcntl_fork, while the child process will get 0. (If there's an error the parent process will return -1, which is worth checking for in production code).

You can check out PHP Process Control:
http://us.php.net/manual/en/intro.pcntl.php
Note: This is not threading, but the handling of separate processes. There is more overhead attached.

Wouldn't it solve your problem to fork, keeping the parent process free for other connections & actions? See http://www.php.net/pcntl_fork. If you need an answer back you could possibly listen to a socket in the parent, and write with the child. A simple while(true) loop with a read could possibly do, and probably you already have that basic functionality if you run a permanent TCP server. Another option would be to keep track of your childprocess-ids, keep a accessable store somewhere (file/database/memcached etc), with a pcnt_wait in the main process with a WNOHANG to check which process has exited, and retrieve the data from the store.

You can do some threading in PHP if you use the method pcntl_fork.
http://ca.php.net/manual/en/function.pcntl-fork.php
I have never use this myself, but the are some good example of how to use it on php.net.

PHP doesn't have this functionality as far as I know
You can emulate the function using a different technique, like this one:
Parallel functions in PHP

PHP does not support multi-threading, so there's no other option than taking advantage of the OS or the web server multi processing capabilities. Note that actually you can fetch both the result and output of exec:
string exec ( string $command [,
array &$output [, int &$return_var
]] )

You can, at least, prevent the parent process from hanging until the child process is done by ignoring the child signals using pcntl_signal(SIGCHLD, SIG_IGN).
So, let's say you want to fork a process and execute another PHP function that takes a while without making the parent wait for it to finish (since you want the main process to finish in a timely manner):
pcntl_signal(SIGCHLD, SIG_IGN);
$pid = pcntl_fork();
if ($pid < 0) {
exit(0);
} elseif (!$pid) {
my_slow_function();
exit(0);
}
// Parent keeps executing and finishes before the child does
If you want to execute a slow external script as the child process, pcntl_exec is handy:
$script = array('/path/to/my/script'); // E.g. /home/my_user/my_script.php
pcntl_exec('/path/to/program/executable',$script); // E.g. /usr/bin/php

Speeding up a PHP App

I have a list of data that needs to be processed. The way it works right now is this:
A user clicks a process button.
The PHP code takes the first item that needs to be processed, takes 15-25 secs to process it, moves on to the next item, and so on.
This takes way too long. What I'd like instead is that:
The user clicks the process button.
A PHP script takes the first item and starts to process it.
Simultaneously another instance of the script takes the next item and processes it.
And so on, so around 5-6 of the items are being process simultaneously and we get 6 items processed in 15-25 secs instead of just one.
Is something like this possible?
I was thinking that I use CRON to launch an instance of the script every second. All items that need to be processed will be flagged as such in the MySQL database, so whenever an instance is launched through CRON, it will simply take the next item flagged to be processed and remove the flag.
Thoughts?
Edit: To clarify something, each 'item' is stored in a mysql database table as seperate rows. Whenever processing starts on an item, it is flagged as being processed in the db, hence each new instance will simply grab the next row which is not being processed and process it. Hence I don't have to supply the items as command line arguments.

Here's one solution, not the greatest, but will work fine on Linux:
Split the processing PHP into a separate CLI scripts in which:
The command line inputs include `$id` and `$item`
The script writes its PID to a file in `/tmp/$id.$item.pid`
The script echos results as XML or something that can be read into PHP to stdout
When finished the script deletes the `/tmp/$id.$item.pid` file
Your master script (presumably on your webserver) would do:
`exec("nohup php myprocessing.php $id $item > /tmp/$id.$item.xml");` for each item
Poll the `/tmp/$id.$item.pid` files until all are deleted (sleep/check poll is enough)
If they are never deleted kill all the processing scripts and report failure
If successful read the from `/tmp/$id.$item.xml` for format/output to user
Delete the XML files if you don't want to cache for later use
A backgrounded nohup started application will run independent of the script that started it.
This interested me sufficiently that I decided to write a POC.
test.php
<?php
$dir = realpath(dirname(__FILE__));
$start = time();
// Time in seconds after which we give up and kill everything
$timeout = 25;
// The unique identifier for the request
$id = uniqid();
// Our "items" which would be supplied by the user
$items = array("foo", "bar", "0xdeadbeef");
// We exec a nohup command that is backgrounded which returns immediately
foreach ($items as $item) {
exec("nohup php proc.php $id $item > $dir/proc.$id.$item.out &");
}
echo "<pre>";
// Run until timeout or all processing has finished
while(time() - $start < $timeout)
{
echo (time() - $start), " seconds\n";
clearstatcache(); // Required since PHP will cache for file_exists
$running = array();
foreach($items as $item)
{
// If the pid file still exists the process is still running
if (file_exists("$dir/proc.$id.$item.pid")) {
$running[] = $item;
}
}
if (empty($running)) break;
echo implode($running, ','), " running\n";
flush();
sleep(1);
}
// Clean up if we timeout out
if (!empty($running)) {
clearstatcache();
foreach ($items as $item) {
// Kill process of anything still running (i.e. that has a pid file)
if(file_exists("$dir/proc.$id.$item.pid")
&& $pid = file_get_contents("$dir/proc.$id.$item.pid")) {
posix_kill($pid, 9);
unlink("$dir/proc.$id.$item.pid");
// Would want to log this in the real world
echo "Failed to process: ", $item, " pid ", $pid, "\n";
}
// delete the useless data
unlink("$dir/proc.$id.$item.out");
}
} else {
echo "Successfully processed all items in ", time() - $start, " seconds.\n";
foreach ($items as $item) {
// Grab the processed data and delete the file
echo(file_get_contents("$dir/proc.$id.$item.out"));
unlink("$dir/proc.$id.$item.out");
}
}
echo "</pre>";
?>
proc.php
<?php
$dir = realpath(dirname(__FILE__));
$id = $argv[1];
$item = $argv[2];
// Write out our pid file
file_put_contents("$dir/proc.$id.$item.pid", posix_getpid());
for($i=0;$i<80;++$i)
{
echo $item,':', $i, "\n";
usleep(250000);
}
// Remove our pid file to say we're done processing
unlink("proc.$id.$item.pid");
?>
Put test.php and proc.php in the same folder of your server, load test.php and enjoy.
You will of course need nohup (unix) and PHP cli to get this to work.
Lots of fun, I may find a use for it later.

Use an external workqueue like Beanstalkd which your PHP script writes a bunch of jobs too. You have as many worker processes pulling jobs from beanstalkd and processing them as fast as possible. You can spin up as many workers as you have memory / CPU. Your job body should contain as little information as possible, maybe just some IDs which you hit the DB with. beanstalkd has a slew of client APIs and itself has a very basic API, think memcached.
We use beanstalkd to process all of our background jobs, I love it. Easy to use, its very fast.

There is no multithreading in PHP, however you can use fork.
php.net:pcntl-fork
Or you could execute a system() command and start another process which is multithreaded.

can you implementing threading in javascript on the client side? seems to me i've seen a javascript library (from google perhaps?) that implements it. google it and i'm sure you'll find something. i've never done it, but i know its possible. anyway, your client-side javascript could activate (ajax) a php script once for each item in separate threads. that might be easier than trying to do it all on the server side.
-don

If you are running a high traffic PHP server you are INSANE if you do not use Alternative PHP Cache: http://php.net/manual/en/book.apc.php . You do not have to make code modifications to run APC.
Another useful technique that can work along with APC is using the Smarty template system which allows you to cache output so that pages do not have to be rebuilt.

To solve this problem, I've used two different products; Gearman and RabbitMQ.
The benefit of putting your jobs into some sort of queuing software like Gearman or Rabbit is that you have multiple machines, they can all participate in processing items off the queue(s).
Gearman is easier to setup, so I'd suggest poking around with it a bit first. If you find you need something more heavy duty with queue robustness; Look into RabbitMQ
http://www.danga.com/gearman/
http://pear.php.net/package/Net_Gearman (PEAR library)

You can use pcntl_fork() and family to fork a process - however you may need something like IPC to communicate back to the parent process that the child process (the one you fork'd) is finished.
You could have them write to shared memory, like via memcache or a DB.
You could also have the child process write the completed data to a file, that the parent process keeps checking - as each child process completes the file is created/written to/updated, and parent process can grab it, one at a time, and them throw them back to the callee/client.
The parent's job is to control the queue, to make sure the same data isn't processed twice and also to sanity check the children (better kill that runaway process and start over...etc)
Something else to keep in mind - on windows platforms you are going to be severely limited - I dont even think you have access to pcntl_ unless you compiled PHP with support for it.
Also, can you cache the data once its been processed, or is it unique data every time? that would surely speed things up..?

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.