PHP fork process - getting child output in parent

I want to achieve the following:
Initialize an array. Child process adds some elements to the array. Parent process adds some elements to the array. Finally before exiting, print all elements.
Following is the code that I wrote:
<?php
$values = array();
$pid = pcntl_fork();
if (!$pid) {
    // Child process
    sleep(2);
    $values[] = "Put by child";
    exit(0);
}
// Parent process
$values[] = "Put by parent";
pcntl_waitpid($pid, $status);
print_r($values);
?>
However, it only prints one value - Put by parent. Can someone please explain the behavior and suggest the right code?
Regards,
JP

(sorry for crossposting)
I suggest a look at socket_create_pair().
The PHP manual has a very short and easy example of interprocess communication (IPC) between a fork()-parent and the child.
And using serialize() and unserialize(), you could even transfer complex data types like arrays...
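For illustration, here is a minimal sketch of that approach applied to the original question. The names come from the asker's code; the single 4096-byte read is an assumption that the child's payload is small:
<?php
$sockets = array();
if (socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $sockets) === false) {
    die(socket_strerror(socket_last_error()));
}
$values = array();
$pid = pcntl_fork();
if ($pid == -1) {
    die('could not fork');
} elseif ($pid == 0) {
    // Child: serialize its additions and send them to the parent.
    socket_close($sockets[0]);
    $payload = serialize(array("Put by child"));
    socket_write($sockets[1], $payload, strlen($payload));
    socket_close($sockets[1]);
    exit(0);
}
// Parent: add its own element, wait for the child, then read and merge.
socket_close($sockets[1]);
$values[] = "Put by parent";
pcntl_waitpid($pid, $status);
$values = array_merge($values, unserialize(socket_read($sockets[0], 4096)));
socket_close($sockets[0]);
print_r($values);
?>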

Forked children will gain their own dedicated copy of their memory space as soon as they write anywhere to it - this is "copy-on-write". While shmop does provide access to a common memory location, the actual PHP variables and whatnot defined in the script are NOT shared between the children.
Doing $x = 7; in one child will not make the $x in the other children also become 7. Each child will have its own dedicated $x that is completely independent of everyone else's copy.
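A minimal demonstration of that isolation (a sketch; run from the CLI with pcntl enabled):
<?php
$x = 1;
$pid = pcntl_fork();
if ($pid == 0) {
    $x = 7;              // modifies the child's private copy only
    echo "child:  $x\n"; // prints 7
    exit(0);
}
pcntl_waitpid($pid, $status);
echo "parent: $x\n";     // still prints 1
?>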
A local domain socket is easiest. Have the parent open one with fsockopen for each child immediately before the fork; that way you get one communication channel per child: http://php.net/manual/en/transports.unix.php
You could also use shared memory, or open a bi-directional communication channel between the two processes and build a little API to send data back and forth.
As long as the parent and children know the key/keys of the shared memory segment, it is OK to do a shmop_open before pcntl_fork. But remember that pcntl_fork returns 0 in the child process and -1 on failure to create the child (check your code near the comment /confusion/). The parent will have the PID of the child process just created in $pid.
Check it here:
http://php.net/manual/es/function.pcntl-fork.php

The child's code is missing the print_r() statement.
The parent won't print what the child added to $values, because the addition was done after the child process had been fork()ed off, at which point the child had its own copy of the process's memory.
From the fork-tag's excerpt (emphasis by me):
The fork() function is the Unix/Linux/POSIX way of creating a new process by duplicating the calling process.
This behaviour of forking is different from threading where all threads share the same address space.

Related

What's wrong with my concurrent programming logic?

I wrote a web spider to spider pages concurrently. For each link that the spider finds, I want to fork off a new child that starts the process all over again.
I don't want to overload the target server so I created a static array that all objects can access. Each child can add their PID to the array, and either parent or child should check the array to see if $maxChildren have been met, and if so, patiently wait until any child finishes.
As you see, I have $maxChildren set to 3. I am expecting to see 3 simultaneous processes at any given time. However, that's not the case. The linux top command shows 12 to 30 processes at any given time. In concurrent programming, how can I regulate the number of simultaneous processes? My logic is currently inspired by how Apache handles its max children, but I'm not exactly sure how that works.
As pointed out in one of the answers, globally accessing the static variable brings up issues with race conditions. To deal with this, the $children array takes the unique PID of the process as both the key and its value, thereby creating a unique entry. My thinking is that since any object can only deal with one $children[$pid] value, locking is not necessary. Is this not true? Is there a chance that two processes could try to unset or add the same value at some point?
private static $children = array();
private $maxChildren = 3;

public function concurrentSpider($url) {
    // STEP 1:
    // Download the $url
    $pageData = http_get($url, $ref = '');
    if (!$this->checkIfSaved($url)) {
        $this->save_link_to_db($url, $pageData);
    }

    // STEP 2:
    // extract all hyperlinks from this url's page data
    $linksOnThisPage = $this->harvest_links($url, $pageData);

    // STEP 3:
    // Check the links array from STEP 2 to see if this page has
    // already been saved or is excluded because of any other
    // logic from the excluded_link() function
    $filteredLinks = $this->filterLinks($linksOnThisPage);
    shuffle($filteredLinks);

    // STEP 4: loop through each of the links and
    // repeat the process
    foreach ($filteredLinks as $filteredLink) {
        $pid = pcntl_fork();
        switch ($pid) {
            case -1:
                print "Could not fork!\n";
                exit(1);
            case 0:
                if ($this->checkIfSaved($filteredLink)) {
                    exit();
                }
                //$pid = getmypid();
                print "In child with PID: " . getmypid() . " processing $filteredLink \n";
                $var[$pid]->concurrentSpider($filteredLink);
                sleep(2);
                exit(1);
            default:
                // Add an element to the children array
                self::$children[$pid] = $pid;
                // If the maximum number of children has been
                // achieved, wait until one or more return
                // before continuing.
                while (count(self::$children) >= $this->maxChildren) {
                    //print count(self::$children) . " children \n";
                    $pid = pcntl_waitpid(-1, $status);
                    unset(self::$children[$pid]);
                }
        }
    }
}
This is written in PHP. I know that the pcntl_waitpid function with an argument of -1 waits for any child of the calling process to complete (http://php.net/manual/en/function.pcntl-waitpid.php).
What's wrong with my logic and how can I correct it so that only $maxChildren processes are running simultaneously? I'm also open to improving the logic in general if you have suggestions.
First thing to note: if this is truly a global being shared among multiple threads, it's possible that multiple threads are adding to it at once and you're running afoul of a race condition. You need some sort of concurrency control to ensure that only one process is accessing your global array at once.
Also, try the simple debugging trick of having each process write out (to the console or to a file) its PID and the full contents of the global array each time a new spider is forked. It will help you to check your assumptions (which are plainly wrong at some point) and figure out what's going wrong.
EDIT: (In response to the comments)
I'm not a PHP developer, but if I had to guess, based on the fact that you're using an OS tool that counts OS-level processes, I'd guess that your fork is spawning multiple processes, but your static array is global within the current process. Implementing system-wide shared memory is a lot more complicated!
If you just want to count something and ensure that instances of a shared resource don't grow out of control, look into semaphores, and see if you can find a way in PHP to create a named semaphore object that can be shared between multiple instances of your spider.
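On Unix, PHP's System V semaphore functions can play that role, since the semaphore is identified by an integer key that every process can compute. A sketch, assuming the sysvsem extension is available; the key derivation and the limit of 3 are illustrative:
// At most 3 processes may hold the semaphore at once.
$sem = sem_get(ftok(__FILE__, 's'), 3);
$pid = pcntl_fork();
if ($pid == 0) {
    sem_acquire($sem); // blocks while 3 other children hold it
    // ... do the child's work here ...
    sem_release($sem);
    exit(0);
}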
Use a real programming language ;)
Step 1 is kind of bad: why are you downloading if it might already be in the db? Put that inside the if, and see if you can put a mutex around it. Maybe do something in SQL to imitate one.
I hope harvest_links uses a proper HTML processor with CSS selector support (I like Fizzler for .NET). I guess a regular expression would be fine if it's just to get links, but it is possible to mess up.
I see step 4, and I don't think it's bad, but personally I'd do it a different way.
I'd have something like step one insert url, page, and a flag into a db. Then I'd have another process (or the same one) ask the db for unprocessed pages, and set the flag to one value on error and another on success. This way, if something fails or the process exits (shutdown, crash, power outage, etc.), it can pick up easily and doesn't need to scan every page to find where it left off. It just asks the database for the next link and redoes what it didn't finish.
PHP doesn't support multithreading, therefore it doesn't support mutexes or any other synchronization methods. As others have said in their answers, this will lead to a race condition.
You'll have to write a wrapper in C or bash. That way, the PHP script can submit targets to the wrapper, and the wrapper will handle scheduling.
Another approach is to rewrite your spider in Python or Ruby, both of which support multithreading. That will eliminate the need for interprocess communication.
Edit: On second thought, the best way is to write the wrapper in Python or Ruby and reuse your existing PHP code as a black box. That's a compromise of the solutions above.
If the spider is for practical purposes, you might want to google "curl multithread"
cURL Multi Threading with PHP

Storing data in /tmp from a forked process in php

For a while now, I've been storing serialized objects from forked processes in /tmp with file_put_contents.
Once all child processes wrap up, I'm simply using file_get_contents and unserializing the data to rebuild my object for processing.
So my question is: is there a better way of storing my data without writing to /tmp?
Outside of storing the data in a file, the only other native solutions that come to mind are shm (http://www.php.net/manual/en/function.shm-attach.php) and socket stream pairs (http://www.php.net/manual/en/function.stream-socket-pair.php).
Either of these should be doable if the data collected is unimportant after the script has run. The idea behind both of them is just to open a communication channel between your parent and child processes. I will say that my personal opinion is that, unless the file system itself is causing some sort of issue, a file is by far the least complicated solution.
SHM
The idea with shm is that instead of storing the serialized objects in a file, you would store them in an shm segment protected for concurrency by a semaphore. Forgive the code, it is rough but should be enough to give you the general idea.
/*** Configurations ***/
$blockSize = 1024; // Size of the segment in bytes
$shmVarKey = 1;    // An integer specifying the var key in the shm segment

/*** In the children processes ***/
// First you need to get a semaphore; this is important to help make sure you
// don't have multiple child processes accessing the shm segment at the same
// time. Note: the ftok() key must be derived from the same existing path in
// every process, so use a fixed file rather than a fresh tempnam() result.
$sem = sem_get(ftok(__FILE__, 'a'));
// Then you need your shm segment
$shm = shm_attach(ftok(__FILE__, 'b'), $blockSize);
if (!$sem || !$shm) {
    // error handling goes here
}
// If multiple forks hit this line at roughly the same time, the first one
// gets the lock; everyone else waits until the lock is released before
// trying again.
sem_acquire($sem);
$data = shm_has_var($shm, $shmVarKey) ? shm_get_var($shm, $shmVarKey) : array();
// Here you could key the data array by whatever you are currently using to
// determine file names.
$data['child specific id'] = 'my data'; // can be an object, array, anything that is php serializable, though resources are wonky
shm_put_var($shm, $shmVarKey, $data); // important to note that php handles the serialization for you
sem_release($sem);

/*** In the parent process ***/
$shm = shm_attach(ftok(__FILE__, 'b'), $blockSize);
$data = shm_get_var($shm, $shmVarKey);
foreach ($data as $key => $value)
{
    // process your data
}
Stream Socket Pair
I personally love using these for inter-process communication. The idea is that prior to forking, you create a stream socket pair. This results in two read/write sockets being created that are connected to each other. One of them should be used by the parent, and one by the child. You would have to create a separate pair for each child, and it will change your parent's model a little bit in that it will need to manage the communication a bit more in real time.
Fortunately the PHP docs for this function has a great example: http://us2.php.net/manual/en/function.stream-socket-pair.php
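For reference, a condensed sketch of that pattern (one pair per child; error handling omitted):
$pair = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
$pid = pcntl_fork();
if ($pid == 0) {
    fclose($pair[0]); // the child uses $pair[1]
    fwrite($pair[1], serialize(array('child data')));
    fclose($pair[1]);
    exit(0);
}
fclose($pair[1]);     // the parent uses $pair[0]
pcntl_waitpid($pid, $status);
$data = unserialize(stream_get_contents($pair[0]));
fclose($pair[0]);
print_r($data);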
You could use a shared memory cache such as memcached which would be faster, but depending on what you're doing and how sensitive/important the data is, a file-based solution may be your best option.

Terminating zombie child processes forked from socket server

Disclaimer
I am well aware that PHP might not have been the best choice in this case for a socket server. Please refrain from suggesting
different languages/platforms - believe me - I've heard it from all
directions.
Working in a Unix environment and using PHP 5.2.17, my situation is as follows - I have constructed a socket server in PHP that communicates with flash clients. My first hurdle was that each incoming connection blocked subsequent connections until it had finished being processed. I solved this by utilizing PHP's pcntl_fork(). I was successfully able to spawn numerous child processes (saving their PIDs in the parent) that took care of broadcasting messages to the other clients, thereby "releasing" the parent process and allowing it to continue to process the next connection[s].
My main issue right now is handling the collection of these dead/zombie child processes and terminating them. I have read (over and over) the relevant PHP manual pages for pcntl_fork() and realize that the parent process is in charge of cleaning up its children. The parent process receives a signal (SIGCHLD) from its child when the child executes an exit(0). I am able to "catch" that signal using the pcntl_signal() function to set up a signal handler.
My signal_handler looks like this :
declare(ticks = 1);

function sig_handler($signo){
    global $forks; // this is an array that holds all the child PID's
    foreach ($forks as $key => $childPid) {
        echo "has my child {$childPid} gone away?".PHP_EOL;
        if (posix_kill($childPid, 9)) {
            echo "Child {$childPid} has tragically died!".PHP_EOL;
            unset($forks[$key]);
        }
    }
}
I am indeed seeing both echoes, including the relevant and correct child PID that needs to be removed, but it seems that
posix_kill($childPid, 9)
which I understand to be synonymous with kill -9 $childPid, is returning TRUE although it is in fact NOT removing the process...
Taken from the man pages of posix_kill :
Returns TRUE on success or FALSE on failure.
I am monitoring the child processes with the ps command. They appear like this on the system :
web5 5296 5234 0 14:51 ? 00:00:00 [php] <defunct>
web5 5321 5234 0 14:51 ? 00:00:00 [php] <defunct>
web5 5466 5234 0 14:52 ? 00:00:00 [php] <defunct>
As you can see all these processes are child processes of the parent which has the PID of 5234
Am I missing something in my understanding? I seem to have managed to get everything to work (and it does) but I am left with countless zombie processes on the system!
My plans for a zombie apocalypse are rock solid -
but what on earth can I do when even sudo kill -9 does not kill the zombie child processes?
Update 10 Days later
I've answered this question myself after some additional research, if you are still able to stand my ramblings proceed at will.
I promise there is a solution at the end :P
Alright... so here we are, 10 days later and I believe that I have solved this issue. I didn't want to add onto an already longish post so I'll include in this answer some of the things that I tried.
Taking #sym's advice, and reading more into the documentation and the comments on the documentation, the pcntl_waitpid() description states :
If a child as requested by pid has already exited by the time of the call (a so-called
"zombie" process), the function returns immediately. Any system resources used by the child
are freed...
So I set up my pcntl_signal() handler like this -
function sig_handler($signo){
    global $childProcesses;
    $pid = pcntl_waitpid(-1, $status, WNOHANG);
    echo "Sound the alarm! ";
    if ($pid != 0){
        if (posix_kill($pid, 9)){
            echo "Child {$pid} has tragically died!".PHP_EOL;
            unset($childProcesses[$pid]);
        }
    }
}

// These define the signal handling
// pcntl_signal(SIGTERM, "sig_handler");
// pcntl_signal(SIGHUP, "sig_handler");
// pcntl_signal(SIGINT, "sig_handler");
pcntl_signal(SIGCHLD, "sig_handler");
For completion, I'll include the actual code I'm using for forking a child process -
function broadcastData($socketArray, $data){
    global $db, $childProcesses;
    $pid = pcntl_fork();
    if ($pid == -1) {
        // Something went wrong (handle errors here)
        // Log error, email the admin, pull emergency stop, etc...
        echo "Could not fork()!!";
    } elseif ($pid == 0) {
        // This part is only executed in the child
        foreach ($socketArray as $socket) {
            // There's more happening here but the essence is this
            socket_write($socket, $msg, strlen($msg));
            // TODO : Consider additional forking here for each client.
        }
        // This is where the signal is fired
        exit(0);
    }
    // If the child process did not exit above, then this code would be
    // executed by both parent and child. In my case, the child will
    // never reach these commands.
    $childProcesses[] = $pid;
    // The child process is now occupying the same database
    // connection as its parent (in my case mysql). We have to
    // reinitialize the parent's DB connection in order to continue using it.
    $db = dbEngine::factory(_dbEngine);
}
Yea... That's a ratio of 1:1 comments to code :P
So this was looking great and I saw the echo of :
Sound the alarm! Child 12345 has tragically died!
However, when the socket server loop did its next iteration, the socket_select() function failed, throwing this error:
PHP Warning: socket_select(): unable to select [4]: Interrupted system call...
The server would now hang and not respond to any requests other than manual kill commands from a root terminal.
I'm not going to get into why this was happening or what I did after that to debug it... lets just say it was a frustrating week...
much coffee, sore eyes and 10 days later...
Drum roll please
TL;DR - The Solution:
Mentioned here in a comment from 2007 in the php sockets documentation and in this tutorial on stuporglue (search for "good parenting"), one can simply "ignore" signals coming in from the child processes (SIGCHLD) by passing SIG_IGN to the pcntl_signal() function -
pcntl_signal(SIGCHLD, SIG_IGN);
Quoting from that linked blog post :
If we are ignoring SIGCHLD, the child processes will be reaped automatically upon completion.
Believe it or not - I included that pcntl_signal() line, deleted all the other handlers and things dealing with the children and it worked! There were no more <defunct> processes left hanging around!
In my case, it really did not interest me to know exactly when a child process died, or who it was, I wasn't interested in them at all - just that they didn't hang around and crash my entire server :P
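For illustration, a minimal sketch of the effect; with the SIG_IGN line in place, ps shows no <defunct> entries while the parent sleeps:
pcntl_signal(SIGCHLD, SIG_IGN); // children are reaped automatically
for ($i = 0; $i < 5; $i++) {
    if (pcntl_fork() == 0) {
        exit(0);                // child exits immediately
    }
}
sleep(10);                      // run `ps` now: no zombie php processes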
Regarding your disclaimer - PHP is no better or worse than many other languages for writing a server in. There are some things which are not possible to do (lightweight processes, asynchronous I/O), but these do not really apply to a forking server. If you're using OO code, then do ensure that you've got the circular-reference-checking garbage collector enabled.
Once a child process exits, it becomes a zombie until the parent process cleans it up. Your code sends a KILL signal to every child on receipt of any signal; that won't clean up the process table entries, it will only terminate children which have not yet called exit. To get the child processes reaped correctly you should call waitpid (see also this example on the pcntl_wait manual page).
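As a sketch, the questioner's handler could be reduced to a waitpid() loop instead of the posix_kill() loop (assuming, unlike the question's code, that the global $forks array is keyed by PID):
declare(ticks = 1);

function reap_children($signo) {
    global $forks;
    // Loop: a single SIGCHLD can stand in for several exited children.
    while (($pid = pcntl_waitpid(-1, $status, WNOHANG)) > 0) {
        unset($forks[$pid]);
    }
}
pcntl_signal(SIGCHLD, 'reap_children');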
http://www.linuxsa.org.au/tips/zombies.html
Zombies are dead processes. You cannot kill the dead. All processes
eventually die, and when they do they become zombies. They consume
almost no resources, which is to be expected because they are dead!
The reason for zombies is so the zombie's parent (process) can
retrieve the zombie's exit status and resource usage statistics. The
parent signals the operating system that it no longer needs the zombie
by using one of the wait() system calls.
When a process dies, its child processes all become children of
process number 1, which is the init process. Init is "always"
waiting for children to die, so that they don't remain as zombies.
If you have zombie processes it means those zombies have not been
waited for by their parent (look at PPID displayed by ps -l). You
have three choices: Fix the parent process (make it wait); kill the
parent; or live with it. Remember that living with it is not so hard
because zombies take up little more than one extra line in the output
of ps.
I know only too well how hard you have to search for a solution to the problem of zombie processes. My concern with potentially having hundreds or thousands of them was running out of inodes (rightly or wrongly, as I don't know if this would actually be a problem), as all hell can break loose when that happens.
If only the pcntl_fork() manual page linked to posix_setsid(), many of us would have discovered the solution was so simple years ago.
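The pattern hinted at there is the classic double fork: the first child calls posix_setsid(), forks again and exits immediately, so the grandchild is adopted by init, which reaps it when it dies. A sketch (error handling omitted):
$pid = pcntl_fork();
if ($pid == 0) {
    posix_setsid();  // new session, detached from the controlling terminal
    if (pcntl_fork() > 0) {
        exit(0);     // the intermediate child exits at once
    }
    // ... long-running grandchild work here; init will reap it ...
    exit(0);
}
pcntl_waitpid($pid, $status); // reap the short-lived intermediate child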

Executing functions in parallel in PHP

Can PHP call a function and not wait for it to return? So something like this:
function callback($pause, $arg) {
    sleep($pause);
    echo $arg, "\n";
}
header('Content-Type: text/plain');
fast_call_user_func_array('callback', array(3, 'three'));
fast_call_user_func_array('callback', array(2, 'two'));
fast_call_user_func_array('callback', array(1, 'one'));
would output
one (after 1 second)
two (after 2 seconds)
three (after 3 seconds)
rather than
three (after 3 seconds)
two (after 3 + 2 = 5 seconds)
one (after 3 + 2 + 1 = 6 seconds)
The main script is intended to run as a permanent process (a TCP server). The callback() function would receive data from a client, execute an external PHP script and then do something based on the other arguments that are passed to callback(). The problem is that the main script must not wait for the external PHP script to finish. The result of the external script is important, so exec('php -f file.php &') is not an option.
Edit:
Many have recommended to take a look at PCNTL, so it seems that such functionality can be achieved. PCNTL is not available in Windows, and I don't have an access to a Linux machine right now, so I can't test it, but if so many people have advised it, then it should do the trick :)
Thanks, everyone!
On Unix platforms you can enable the PCNTL functions, and use pcntl_fork to fork the process and run your jobs in child processes.
Something like:
function fast_call_user_func_array($func, $args) {
    if (pcntl_fork() == 0) {
        // Child: run the job, then exit so it never falls back
        // into the main script's control flow.
        call_user_func_array($func, $args);
        exit(0);
    }
}
Once you call pcntl_fork, two processes will execute your code from the same position. The parent process will get the child's PID returned from pcntl_fork, while the child process will get 0. (If there's an error, pcntl_fork will return -1 to the parent, which is worth checking for in production code.)
You can check out PHP Process Control:
http://us.php.net/manual/en/intro.pcntl.php
Note: This is not threading, but the handling of separate processes. There is more overhead attached.
Wouldn't it solve your problem to fork, keeping the parent process free for other connections and actions? See http://www.php.net/pcntl_fork. If you need an answer back, you could listen on a socket in the parent and write to it from the child. A simple while(true) loop with a read could do, and you probably already have that basic functionality if you run a permanent TCP server. Another option would be to keep track of your child process IDs in an accessible store somewhere (file/database/memcached, etc.), with a pcntl_wait in the main process using WNOHANG to check which process has exited, and then retrieve the data from the store.
You can do a kind of "threading" in PHP if you use the function pcntl_fork:
http://ca.php.net/manual/en/function.pcntl-fork.php
I have never used this myself, but there are some good examples of how to use it on php.net.
PHP doesn't have this functionality as far as I know
You can emulate the function using a different technique, like this one:
Parallel functions in PHP
PHP does not support multi-threading, so there's no other option than taking advantage of the OS or the web server's multi-processing capabilities. Note that you can actually fetch both the result and the output of exec:
string exec ( string $command [, array &$output [, int &$return_var ]] )
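For example (a sketch; the script path is a placeholder):
exec('php -f file.php', $output, $return_var);
echo "exit code: {$return_var}\n";
print_r($output); // one array element per line of the script's output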
You can, at least, prevent the parent process from hanging until the child process is done by ignoring the child signals using pcntl_signal(SIGCHLD, SIG_IGN).
So, let's say you want to fork a process and execute another PHP function that takes a while without making the parent wait for it to finish (since you want the main process to finish in a timely manner):
pcntl_signal(SIGCHLD, SIG_IGN);
$pid = pcntl_fork();
if ($pid < 0) {
    exit(0);
} elseif (!$pid) {
    my_slow_function();
    exit(0);
}
// Parent keeps executing and finishes before the child does
If you want to execute a slow external script as the child process, pcntl_exec is handy:
$script = array('/path/to/my/script'); // E.g. /home/my_user/my_script.php
pcntl_exec('/path/to/program/executable',$script); // E.g. /usr/bin/php

Patterns for PHP multi processes?

Which design patterns exist to realize the execution of several PHP processes and the collection of the results in one PHP process?
Background:
I have many large trees (> 10000 entries) in PHP and have to run recursive checks on them. I want to reduce the elapsed execution time.
If your goal is minimal time - the solution is simple to describe, but not that simple to implement.
You need to find a pattern to divide the work (You don't provide much information in the question in this regard).
Then use one master process that forks children to do the work. As a rule the total number of processes you use should be between n and 2n, where n is the number of cores the machine has.
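As a sketch of that master/worker shape (the division of the tree into $workUnits is hypothetical and depends on your data; fork error handling omitted):
$workUnits = range(1, 20); // hypothetical pre-divided chunks of the tree
$n = 4;                    // e.g. the number of cores
$running = 0;

while ($workUnits || $running > 0) {
    while ($workUnits && $running < $n) {
        $unit = array_shift($workUnits);
        if (pcntl_fork() == 0) {
            // check_subtree($unit); // hypothetical worker function
            exit(0);
        }
        $running++;
    }
    pcntl_wait($status); // block until any worker exits
    $running--;
}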
Assuming this data will be stored in files, you might consider using non-blocking IO to maximize the throughput. Not doing so will make most of your processes spend their time waiting for the disk. PHP has stream_select(), which might help you; note that using it is not trivial, but the sketch below shows the general shape.
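A sketch of that select-based shape; the $pipes array of per-worker read streams (e.g. from proc_open()) is assumed:
$pipes = array(/* ... one read stream per worker ... */);
while ($pipes) {
    $read = $pipes;
    $write = $except = null;
    if (stream_select($read, $write, $except, 5) === false) {
        break;
    }
    foreach ($read as $key => $stream) {
        $chunk = fread($stream, 8192);
        if ($chunk === '' || $chunk === false) { // EOF: that worker is done
            fclose($stream);
            unset($pipes[$key]); // stream_select() preserves array keys
        } else {
            // ... accumulate $chunk per worker ...
        }
    }
}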
If you decide not to use select, increasing the number of processes might help.
In regards to the pcntl functions: I've written a daemon with them (a proper one with forking, changing the session id, the running user, etc...) and it's one of the most reliable pieces of software I've written. Because it spawns workers for every task, even if there is a bug in one of the tasks, it does not affect the others.
From your php script, you could launch another script (using exec) to do the processing. Save status updates in a text file, which could then be read periodically by the parent thread.
Note: to avoid php waiting for the exec'd script to complete, redirect the output to a file and background the command:
exec('/path/to/file.php > output.log 2>&1 &');
Alternatively, you can fork a script using the PCNTL functions. This uses one php script, which when forked can detect whether it is the parent or the child and operate accordingly. There are functions to send/receive signals for the purpose of communicating between parent/child, or you have the child log to a file and the parent read from that file.
From the pcntl_fork manual page:
$pid = pcntl_fork();
if ($pid == -1) {
    die('could not fork');
} else if ($pid) {
    // we are the parent
    pcntl_wait($status); // Protect against Zombie children
} else {
    // we are the child
}
This might be a good time to consider using a message queue, even if you run it all on one machine.
The question seems to be a bit confused.
I want to reduce the absolute execution time.
Do you mean elapsed time? Certainly use of the right data structure will improve throughput, but for a given data structure, the minimum order of the algorithm is absolute, and nothing to do with how you implement the algorithm.
Which design pattern exist to realize....?
Design Patterns are something which code is, not a template for writing programs; they are a useful tool for curriculum design. To start with a pattern and make your code fit it is in itself an anti-pattern.
Nobody can answer this question without knowing a lot more about your data and how it's structured; however, the key driver for efficiency will be the data structure you use to implement your tree. If elapsed time is important then certainly look at parallel execution, however it may also be worth considering performing the operation in a different tool - databases are highly optimized for dealing with large sets of data, though note that the obvious method for describing a tree in a relational database is very inefficient when it comes to isolating sub-trees and walking the tree.
In response to Adam's suggesting of forking you replied:
I "heard" that pcntl isnt a good solution. Any experiences?
Where did you hear that? Certainly forking from a CGI or mod_php invoked script is a bad idea, but nothing wrong with doing it from the command line. Do have a google for long running PHP processes (be warned there is a lot of bad information out there). What code you write will vary depending on the underlying OS - which you've not stated.
I suspect that you could solve a large part of your performance issues by identifying which parts of the tree need to be checked and only checking those parts AND triggering the checks when the tree is updated, or at least marking the nodes as 'dirty'.
You might find these helpful:
http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
http://en.wikipedia.org/wiki/Threaded_binary_tree
C.
You could use a more efficient data structure, such as a B-tree. I used one once in Java, but not in PHP. You can try this script: http://www.phpclasses.org/browse/file/708.html; it is an implementation of a btree.
If that is not enough, you can use Hadoop to implement a Map/Reduce pattern, as Michael said. I would not fork PHP processes; it does not seem to help for performance.
Personally, I would use PHP as the client and put everything in Hadoop. This tutorial might help: http://www.lunchpauze.com/2007/10/writing-hadoop-mapreduce-program-in-php.html.
Another solution can be to use a Java implementation of a B-tree: http://jdbm.sourceforge.net/. JDBM is an object database using B+tree data structures. You can then search from PHP by exposing the data with a web service or by accessing it directly with Quercus.
Using web or CLI?
If you use the web, you could integrate that part in Quercus, and then you could use the advantages of Java multithreading.
I don't actually know how reliable Quercus is, though. I'd also suggest using a kind of message queue and refactoring the code so it doesn't need the scope.
Maybe you could rebuild the code to a Map/Reduce pattern. You could then run the PHP code in Hadoop and cluster the processing across a couple of machines.
I don't know if it's useful, but I came across another project, called Gearman. It's also used to cluster PHP processes. I guess you can combine that with a reduce script as well, if Hadoop is not the way you want to go.
pthreads
There is a rather new (since 2012) PHP extension available: pthreads. It can be installed via PECL.
Simple implementation in PHP code: extend the Thread class, add a run() method, and execute the start() method.
<?php
// Example from http://www.phpgangsta.de/richtige-threads-in-php-einfach-erstellen-mit-pthreads
class AsyncOperation extends Thread
{
    public function __construct($threadId)
    {
        $this->threadId = $threadId;
    }

    public function run()
    {
        printf("T %s: Sleeping 3sec\n", $this->threadId);
        sleep(3);
        printf("T %s: Hello World\n", $this->threadId);
    }
}

$start = microtime(true);
for ($i = 1; $i <= 5; $i++) {
    $t[$i] = new AsyncOperation($i);
    $t[$i]->start();
}
echo microtime(true) - $start . "\n";
echo "end\n";
Outputs
>php pthreads.php
0.041301012039185
end
T 1: Sleeping 3sec
T 2: Sleeping 3sec
T 3: Sleeping 3sec
T 4: Sleeping 3sec
T 5: Sleeping 3sec
T 1: Hello World
T 2: Hello World
T 3: Hello World
T 4: Hello World
T 5: Hello World
Try this: PHPThreads
Code Example:
function threadproc($thread, $param) {
    echo "\tI'm a PHPThread. In this example, I was given only one parameter: \"". print_r($param, true) ."\" to work with, but I can accept as many as you'd like!\n";
    for ($i = 0; $i < 10; $i++) {
        usleep(1000000);
        echo "\tPHPThread working, very busy...\n";
    }
    return "I'm a return value!";
}

$thread_id = phpthread_create($thread, array(), "threadproc", null, array("123456"));
echo "I'm the main thread doing very important work!\n";
for ($n = 0; $n < 5; $n++) {
    usleep(1000000);
    echo "Main thread...working!\n";
}
echo "\nMain thread done working. Waiting on our PHPThread...\n";
phpthread_join($thread_id, $retval);
echo "\n\nOur PHPThread returned: " . print_r($retval, true) . "!\n";
Requires PHP extensions:
posix
pcntl
sockets
