I'm wondering if there's a way to run code in a loop as a process and interact with it from a different script. I know sockets listen for incoming requests, but I'm referring to internal usage, without requests.
Standard approach:
Use the pcntl_signal() and posix_kill() functions to interact via standard or user-defined signals.
Pros:
PHP built-in functions, ready to use. No need to reinvent the wheel.
POSIX compatibility.
Cons:
You can only send predefined signals to a script, not values.
One-way interaction.
Example of a listening script:
<?php
pcntl_signal(SIGTERM, 'sig_handler');
pcntl_signal(SIGUSR1, 'sig_handler');

echo 'Run... PID: ' . getmypid() . PHP_EOL;

$finish = false;
while (!$finish) {
    pcntl_signal_dispatch();
    usleep(100000); // sleep briefly so the loop doesn't spin at 100% CPU
}
echo 'Shutdown...' . PHP_EOL;

function sig_handler($signal) {
    global $finish;
    echo 'Received signal: ' . $signal . PHP_EOL;
    switch ($signal) {
        case SIGTERM:
            $finish = true;
            break;
        case SIGUSR1:
            echo 'Processing SIGUSR1 signal...' . PHP_EOL;
            break;
    }
}
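For completeness, the sending side is only a few lines. A minimal sketch, assuming the listener's PID (printed at startup) is passed as a command-line argument:

<?php
// Hypothetical sender script, e.g.: php sender.php <pid>
$pid = (int)$argv[1];
posix_kill($pid, SIGUSR1); // triggers the SIGUSR1 branch of sig_handler()
posix_kill($pid, SIGTERM); // sets $finish so the listener's loop shuts down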
Non-standard approach:
You can implement interaction with the script using tools like databases, files, sockets, or pipes.
Pros:
Functionality is limited only by your implementation.
Cons:
You need to design a protocol for the interaction and support it in your script.
I would say the first step is to set up your script to run in the background. You can implement this yourself (using fork) or use an existing library.
https://www.google.co.jp/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=php%20daemon
Then you define a protocol to communicate. There are many ways to implement that, from simple to complex, depending on your needs.
A simple way, for example, would be to designate a folder somewhere on the server that your script reads on a regular basis (loop + sleep). When a file is added, the script reads it, executes the instructions in it, and deletes it.
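A minimal sketch of such a watcher loop (the directory name and the one-instruction-per-file convention are assumptions, not a fixed protocol):

<?php
// Hypothetical watcher: polls a command directory once per second.
$commandDir = '/var/run/myscript/commands';
while (true) {
    foreach (glob($commandDir . '/*.cmd') as $file) {
        $instruction = trim(file_get_contents($file));
        echo 'Executing: ' . $instruction . PHP_EOL;
        // ... dispatch $instruction to the appropriate handler here ...
        unlink($file); // remove the file once it has been processed
    }
    sleep(1);
}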
No, you cannot directly access another running script, as each script runs in its own resource sandbox (memory, file descriptors). You'd also need to know its PID in order to access it, and even then you'd need hacking tools to get into that process's address space.
This may not fulfill your requirement, but it works:
Try logging the status to a text file from the first script and reading it frequently from the second script to find out the current status of the first.
I have a situation that uses a similar approach.
I start independent processes by forking and don't wait for them; I only start them. But I've created a control structure in a database. Each child process stores its state there and also observes a control flag; when this flag is raised, the child stops immediately. The child processes also store time-stamped records in another table, which lets me check where each process is with its task.
The control structure also stores the processes' PIDs, so it is possible to send signals to them.
I think this can be useful in your situation too.
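A sketch of what the control structure and the child's check could look like (table and column names are made up for illustration; $pdo is assumed to be an open PDO connection):

// Hypothetical schema:
// CREATE TABLE worker_control (
//     pid INT PRIMARY KEY,
//     state VARCHAR(32),
//     stop_flag TINYINT DEFAULT 0,
//     updated_at TIMESTAMP
// );

// Inside the child's main loop: store the current state, then obey the flag.
$pdo->prepare('UPDATE worker_control SET state = ?, updated_at = NOW() WHERE pid = ?')
    ->execute([$state, getmypid()]);
$stmt = $pdo->prepare('SELECT stop_flag FROM worker_control WHERE pid = ?');
$stmt->execute([getmypid()]);
if ((int)$stmt->fetchColumn() === 1) {
    exit; // the control flag is up: stop immediately
}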
When a PHP process starts, it takes all its requirements (variables, linked pages, and so on) and shows the result after it finishes execution; once execution is complete, you can't do anything but re-execute it.
It's like going on a picnic on the moon and coming back home after it's over: no one can disturb you while you are on your picnic. :D
Related
I wrote a web spider to spider pages concurrently. For each link that the spider finds, I want to fork off a new child that starts the process all over again.
I don't want to overload the target server so I created a static array that all objects can access. Each child can add their PID to the array, and either parent or child should check the array to see if $maxChildren have been met, and if so, patiently wait until any child finishes.
As you see, I have $maxChildren set to 3. I am expecting to see 3 simultaneous processes at any given time. However, that's not the case: the Linux top command shows 12 to 30 processes at any given time. In concurrent programming, how can I regulate the number of simultaneous processes? My logic is currently inspired by how Apache handles its max children, but I'm not exactly sure how that works.
As pointed out in one of the answers, globally accessing the static variable brings up issues with race conditions. To deal with this, the $children array takes the unique $PID of the process as both the key and its value, thereby creating a unique entry. My thinking is that since any object can only deal with one $children[$pid] value, locking is not necessary. Is this not true? Is there a chance that two processes could try to unset or add the same value at some point?
private static $children = array();
private $maxChildren = 3;

public function concurrentSpider($url) {
    // STEP 1:
    // Download the $url
    $pageData = http_get($url, $ref = '');

    if (!$this->checkIfSaved($url)) {
        $this->save_link_to_db($url, $pageData);
    }

    // STEP 2:
    // extract all hyperlinks from this url's page data
    $linksOnThisPage = $this->harvest_links($url, $pageData);

    // STEP 3:
    // Check the links array from STEP 2 to see if this page has
    // already been saved or is excluded because of any other
    // logic from the excluded_link() function
    $filteredLinks = $this->filterLinks($linksOnThisPage);

    shuffle($filteredLinks);

    // STEP 4: loop through each of the links and
    // repeat the process
    foreach ($filteredLinks as $filteredLink) {
        $pid = pcntl_fork();
        switch ($pid) {
            case -1:
                print "Could not fork!\n";
                exit(1);
            case 0:
                if ($this->checkIfSaved($filteredLink)) {
                    exit();
                }
                //$pid = getmypid();
                print "In child with PID: " . getmypid() . " processing $filteredLink \n";
                $var[$pid]->concurrentSpider($filteredLink);
                sleep(2);
                exit(1);
            default:
                // Add an element to the children array
                self::$children[$pid] = $pid;

                // If the maximum number of children has been
                // achieved, wait until one or more return
                // before continuing.
                while (count(self::$children) >= $this->maxChildren) {
                    //print count(self::$children) . " children \n";
                    $pid = pcntl_waitpid(-1, $status);
                    unset(self::$children[$pid]);
                }
        }
    }
}
This is written in PHP. I know that the pcntl_waitpid function with an argument of -1 waits for any child process to complete (http://php.net/manual/en/function.pcntl-waitpid.php).
What's wrong with my logic and how can I correct it so that only $maxChildren processes are running simultaneously? I'm also open to improving the logic in general if you have suggestions.
First thing to note: if this is truly a global being shared among multiple threads, it's possible that multiple threads are adding to it at once and you're running afoul of a race condition. You need some sort of concurrency control to ensure that only one process is accessing your global array at once.
Also, try the simple debugging trick of having each process write out (to the console or to a file) its PID and the full contents of the global array each time a new spider is forked. It will help you to check your assumptions (which are plainly wrong at some point) and figure out what's going wrong.
EDIT: (In response to the comments)
I'm not a PHP developer, but if I had to guess, based on the fact that you're using an OS tool that counts OS-level processes, I'd guess that your fork is spawning multiple processes, but your static array is global within the current process. Implementing system-wide shared memory is a lot more complicated!
If you just want to count something and ensure that instances of a shared resource don't grow out of control, look into semaphores, and see if you can find a way in PHP to create a named semaphore object that can be shared between multiple instances of your spider.
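PHP's sysvsem extension does provide named (System V) semaphores on POSIX systems. A minimal sketch, assuming that extension is available and using this file's path as the key source:

// Derive a system-wide key; any process using the same file gets the same semaphore.
$key = ftok(__FILE__, 's');
$sem = sem_get($key, 3);   // at most 3 processes may hold the semaphore at once
sem_acquire($sem);         // blocks until one of the 3 slots is free
try {
    // ... spider one page ...
} finally {
    sem_release($sem);     // free the slot for another process
}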
Use a real programming language ;)
Step 1 is kind of bad: why are you downloading the page if it might already be in the DB? Put the download inside the if, and see if you can put a mutex around it, or maybe do something in SQL to imitate one.
I hope harvest_links uses a proper HTML parser with CSS selector support (I like Fizzler for .NET). I guess a regular expression would be fine if it's just to get links, but it is possible to mess that up.
I see step 4, and I don't think it's bad, but personally I'd do it a different way.
I'd have something like step one insert the url, the page, and a flag into a DB. Then I'd have another process (or the same one) ask the DB for unprocessed pages, setting the flag to one value if it errors and another if it succeeds. That way, if something fails or the process exits (shutdown, crash, power outage, etc.), it can pick up easily and doesn't need to scan every page to find where it left off. It just asks the database for the next link and redoes what it didn't finish.
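A sketch of that flag-based flow (the schema and status values are illustrative, and $pdo is assumed to be an open PDO connection to an InnoDB/MySQL database):

// Hypothetical table: pages(url, page_data, status),
// with status in ('pending', 'processing', 'done', 'error').
$pdo->beginTransaction();
// Claim one unprocessed page; FOR UPDATE stops two workers taking the same row.
$row = $pdo->query("SELECT url FROM pages WHERE status = 'pending' LIMIT 1 FOR UPDATE")
           ->fetch();
if ($row) {
    $pdo->prepare("UPDATE pages SET status = 'processing' WHERE url = ?")
        ->execute([$row['url']]);
}
$pdo->commit();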
PHP doesn't support multithreading, so it doesn't support mutexes or any other synchronization methods. As others have said in their answers, this will lead to a race condition.
You'll have to write a wrapper in C or bash. That way, the PHP script can submit targets to the wrapper, and the wrapper will handle scheduling.
Another approach is to rewrite your spider in Python or Ruby, both of which support multithreading. That will eliminate the need for interprocess communication.
Edit: On second thought, the best way is to write the wrapper in Python or Ruby and reuse your existing PHP code as a black box. That's a compromise of the solutions above.
If the spider is for practical purposes, you might want to google "curl multithread"
cURL Multi Threading with PHP
I want to write an Ajax web application, a game to be specific. Two web clients have to communicate with each other via the PHP server. My approach is to use Ajax between client and server, in both directions. Each client creates a separate server process via Ajax. I want these two server processes to communicate via MySQL and via named pipes. I need the named pipes to get an immediate response for the whole application.
I cannot use one server process that first creates a pipe and then forks into two processes that use it. Web applications create server processes when the web browser sends a request. So I need named pipes, where each process knows no more than the file name of the named pipe. They cannot exchange file handles (at least, I don't know how).
My problem is that named pipes in the PHP way indeed work, as long as they are used within the same function:
public function writeAndReadPipe_test() {
    $pipeA = fopen("testpipe", 'r+');
    fwrite($pipeA, 'ABCD');

    $pipeB = fopen("testpipe", 'r+');
    $content = fread($pipeB, 4);
    echo "[" . $content . "]<br>\n";
}

public function testingPipes_same_function() {
    posix_mkfifo("testpipe", 0777);
    $this->writeAndReadPipe_test();
}
But when I use different functions, the fread($pipeB, 4) call blocks the whole application:
public function writePipe_test() {
    $pipeA = fopen("testpipe", 'r+');
    fwrite($pipeA, 'ABCD');
}

public function readPipe_test() {
    $pipeB = fopen("testpipe", 'r+');
    $content = fread($pipeB, 4);
    echo "[" . $content . "]<br>\n";
}

public function testingPipes_different_functions() {
    posix_mkfifo("testpipe", 0777);
    $this->writePipe_test();
    $this->readPipe_test();
}
Does anybody know why, and what I can do to make it work between different functions as a first step? As a second step, it should work even between different processes! I also found that I get a problem when the writer closes the pipe before the reader reads from it. I suppose the function closes it automatically when it ends, but this is only a guess.
If the PHP way does not work, I plan to let PHP open a command line, generate Bash commands, and execute them. This should work in any case, as long as my web server runs in a LAMP environment. The disadvantage is that it will not work in WAMP environments.
So, does anybody have some ideas about this?
P.S.:
I need blocking pipes to let the reader wait until an event is sent. I know that pipes can work in non-blocking mode using stream_set_blocking($pipe, false); but the whole idea is to do it without polling: just a pipe that wakes the reader up as soon as an event is fired.
Just a short update: I actually found a nice solution using named pipes:
public function createPipe_test() {
    posix_mkfifo("testpipe", 0777);
}

public function writePipe_test($value) {
    $pipeA = fopen("testpipe", 'w');
    $valueString = (string)$value;
    // Fixed-width length header so the reader knows how many bytes to expect.
    $valueLen = sprintf('%08d', strlen($valueString));
    fwrite($pipeA, $valueLen);
    fwrite($pipeA, $valueString);
}

public function readPipe_test() {
    $pipeB = fopen("testpipe", 'r');
    $valueLen = (int)fread($pipeB, 8);
    return fread($pipeB, $valueLen);
}
I have two processes.
If process 1 calls writePipe_test(), then it waits until process 2 calls
readPipe_test() to read the data out of the pipe.
If process 1 calls readPipe_test(), then it waits until process 2 calls
writePipe_test() to write something into the pipe.
The trick is using 'w' and 'r' instead of 'r+'.
When you use the pipes in separate functions like this, write pipe A seems to be closed/discarded again (local scope of $pipeA). The assumption is that the pipe must be held open for reading and/or writing in order to retain any data, which makes sense really, though I don't know the inner magic.
You can also observe that your blocking read call succeeds when you feed the pipe from another process (like echo magic >> testpipe). So you'd already have step 2 done, but you need some pipe-handle management.
If you change it as follows, it'd work:

private $pipeA;

public function writePipe_test() {
    $this->pipeA = fopen("testpipe", 'r+');
    fwrite($this->pipeA, 'ABCD');
}
Edit: or set $pipeA to have global scope, for that matter.
I'm not sure whether I understand your second-to-last post, but to comment on the last one: if I don't misunderstand, TCP might be even more complex, because you have to establish a connection before you can either read or write, so you've got different overhead.
As for the pipe handle closing at the function's end: I assume you'll face the same problem with sockets, but the pipe file remains!
Persistent storage (files, DB) would make the clients independent timing-wise; if you want to use blocking calls, then files might actually be a way to go.
The question sort of says it all - is there a function that does the same as the JavaScript function setTimeout() for PHP? I've searched php.net, and I can't seem to find one...
There is no way to delay execution of part of the code in the current script. It wouldn't make much sense either, as the processing of a PHP script takes place entirely on the server side and you would just delay the overall execution of the script. There is sleep(), but that will simply halt the process for a certain time.
You can, of course, schedule a PHP script to run at a specific time using cron jobs and the like.
There's the sleep function, which pauses the script for a determined amount of time.
See also usleep, time_nanosleep and time_sleep_until.
PHP isn't event driven, so a setTimeout doesn't make much sense. You can certainly mimic it and in fact, someone has written a Timer class you could use. But I would be careful before you start programming in this way on the server side in PHP.
A few things I'd like to note about timers in PHP:
1) Timers in PHP make sense in long-running scripts (daemons and, maybe, CLI scripts). So if you're not developing that kind of application, you don't need timers.
2) Timers can be blocking and non-blocking. If you're using sleep(), then it's a blocking timer, because your script just freezes for a specified amount of time.
For many tasks blocking timers are fine. For example, sending statistics every 10 seconds. It's ok to block the script:
while (true) {
    sendStat();
    sleep(10);
}
3) Non-blocking timers make sense only in event-driven apps, like a websocket server. In such applications an event can occur at any time (e.g. an incoming connection), so you must not block your app with sleep() (obviously).
For these purposes there are event-loop libraries, like reactphp/event-loop, which allow you to handle multiple streams in a non-blocking fashion and also have a timer/interval feature.
4) Non-blocking timeouts in PHP are possible. They can be implemented by means of the stream_select() function with its timeout parameter (see how it's done in reactphp/event-loop's StreamSelectLoop::run()); a sketch follows after this list.
5) There are PHP extensions like libevent, libev, and event which allow timer implementations (if you want to go hardcore).
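As a small illustration of point 4, a sketch of a non-blocking timeout with stream_select() ($stream is assumed to be an already-opened socket or pipe):

// Wait for data on $stream, but give up after 5 seconds.
$read = [$stream];
$write = null;
$except = null;
$ready = stream_select($read, $write, $except, 5); // last argument: timeout in seconds
if ($ready === 0) {
    echo 'Timed out, do something else' . PHP_EOL; // the "timer" fired
} elseif ($ready > 0) {
    $data = fread($stream, 8192); // data arrived before the timeout
}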
Not really, but you could try the tick functions (register_tick_function()).
http://php.net/manual/en/class.evtimer.php is probably what you are looking for; you can have a function called at set intervals, similar to setInterval in JavaScript. It is a PECL extension; if you have WHM/cPanel you can easily install it through the PECL software/extension installer page.
I hadn't noticed this question is from 2010, and the EvTimer class only started to be coded in 2012-2013. So, as an update to an old question: there is now a class that can do this, similar to JavaScript's setTimeout/setInterval.
Warning: you should note that while the sleep command can make a PHP process hang, or "sleep", for a given amount of time, you'd generally implement visual delays within the user interface.
Since PHP is a server-side language that merely writes its execution output (generally in the form of HTML) to a web server response, using sleep in this fashion will generally just stall or delay the response.
With that being said, sleep does have practical purposes. Delaying execution can be used to implement back-off schemes, such as retrying a request after a failed connection. Generally speaking, if you need a setTimeout in PHP, you're probably doing something wrong.
Solution: if you still want to implement setTimeout in PHP, to answer your question explicitly: consider that setTimeout takes two parameters, one representing the function to run, and the other representing the delay (in milliseconds). The following code meets those requirements:
<?php
// Build the setTimeout function.
// This is the important part.
function setTimeout($fn, $timeout) {
    // usleep() takes microseconds; $timeout is in milliseconds,
    // so sub-second timeouts are honored rather than truncated.
    usleep($timeout * 1000);
    $fn();
}

// Some example function we want to run.
$someFunctionToExecute = function() {
    echo 'The function executed!';
};

// This will run the function after a 3 second sleep.
// We're using the functional property of first-class functions
// to pass the function that we wish to execute.
setTimeout($someFunctionToExecute, 3000);
?>
The output of the above code will be three seconds of delay, followed by the following output:
The function executed!
If you need to trigger an action after executing some PHP code, you can do it with an echo:
echo "Success.... <script>setTimeout(function(){alert('Hello')}, 3000);</script>";
so after a delay on the client (browser) you can do something else, like redirect to another PHP script, or echo an alert.
Generators, available since PHP 5.5, provide the yield keyword, which lets you pause a function and continue it later.
generator-example.php
<?php
function myGeneratorFunction()
{
    echo "One", "\n";
    yield;
    echo "Two", "\n";
    yield;
    echo "Three", "\n";
    yield;
}
// get our Generator object (remember, all generator functions return
// a generator object, and a generator function is any function that
// uses the yield keyword)
$iterator = myGeneratorFunction();
Note that creating the generator object does not run the function body yet; nothing is printed until iteration starts.
To execute the code up to the first yield and then past it, add these lines:
// runs the body up to the first yield (prints "One") and returns its value
$value = $iterator->current();
// resumes the body until the next yield (prints "Two")
$iterator->next();
// and the same again for the yield after that
// $iterator->next();
Now you will get output
One
Two
If you look closely, setTimeout() is built on top of an event loop.
In PHP there are many libraries for this; e.g. amphp is a popular one that provides an event loop to execute code asynchronously.
JavaScript snippet:
setTimeout(function () {
    console.log('After timeout');
}, 1000);

console.log('Before timeout');
Converting the above JavaScript snippet to PHP using Amphp:

use Amp\Loop;

Loop::run(function () {
    Loop::delay(1000, function () {
        echo date('H:i:s') . ' After timeout' . PHP_EOL;
    });

    echo date('H:i:s') . ' Before timeout' . PHP_EOL;
});
Check this Out!
<?php
set_time_limit(20);
$i = 0; // initialize the counter
while ($i <= 10)
{
    echo "i=$i ";
    sleep(100); // note: on Unix, time spent in sleep() does not count toward set_time_limit()
    $i++;
}
?>
Output:
i=0 i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8 i=9 i=10
What design patterns exist for executing several PHP processes and collecting the results in one PHP process?
Background:
I have many large trees (> 10000 entries) in PHP and have to run recursive checks on them. I want to reduce the elapsed execution time.
If your goal is minimal elapsed time, the solution is simple to describe, but not that simple to implement.
You need to find a pattern to divide the work (you don't provide much information in the question in this regard).
Then use one master process that forks children to do the work. As a rule, the total number of processes you use should be between n and 2n, where n is the number of cores the machine has.
Assuming this data will be stored in files, you might consider using non-blocking IO to maximize throughput. Not doing so will make most of your processes spend their time waiting for the disk. PHP has stream_select(), which might help you. Note that using it is not trivial.
If you decide not to use select, increasing the number of processes might help.
In regards to the pcntl functions: I've written a daemon with them (a proper one with forking, changing the session id, the running user, etc.) and it's one of the most reliable pieces of software I've written. Because it spawns workers for every task, even if there is a bug in one of the tasks, it does not affect the others.
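A minimal sketch of the master/worker forking described above (the worker count, $tree, and checkSubtree() are placeholders you would replace with your own data and check):

$numWorkers = 4; // rule of thumb: between n and 2n, where n = number of cores
$chunks = array_chunk($tree, max(1, (int)ceil(count($tree) / $numWorkers)));
foreach ($chunks as $chunk) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die('could not fork');
    } elseif ($pid === 0) {
        checkSubtree($chunk); // hypothetical worker function
        exit(0);
    }
}
// The master waits until every child has finished.
while (pcntl_waitpid(-1, $status) > 0);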
From your PHP script, you could launch another script (using exec) to do the processing. Save status updates in a text file, which can then be read periodically by the parent thread.
Note: to avoid PHP waiting for the exec'd script to complete, redirect the output to a file and background the command:
exec('php /path/to/file.php > output.log 2>&1 &');
Alternatively, you can fork a script using the PCNTL functions. This uses one PHP script which, when forked, can detect whether it is the parent or the child and operate accordingly. There are functions to send/receive signals for the purpose of communicating between parent and child, or you can have the child log to a file and the parent read from that file.
From the pcntl_fork manual page:
$pid = pcntl_fork();
if ($pid == -1) {
    die('could not fork');
} else if ($pid) {
    // we are the parent
    pcntl_wait($status); // Protect against Zombie children
} else {
    // we are the child
}
This might be a good time to consider using a message queue, even if you run it all on one machine.
The question seems to be a bit confused.
I want to reduce the absolute execution time.
Do you mean elapsed time? Certainly use of the right data structure will improve throughput, but for a given data structure, the minimum order of the algorithm is absolute, and has nothing to do with how you implement the algorithm.
What design patterns exist to realize...?
Design patterns are something that code is, not a template for writing programs, and a useful tool for curriculum design. To start with a pattern and make your code fit it is in itself an anti-pattern.
Nobody can answer this question without knowing a lot more about your data and how it's structured; however, the key driver for efficiency will be the data structure you use to implement your tree. If elapsed time is important, then certainly look at parallel execution; it may also be worth considering performing the operation in a different tool. Databases are highly optimized for dealing with large sets of data, but note that the obvious method for describing a tree in a relational database is very inefficient when it comes to isolating sub-trees and walking the tree.
In response to Adam's suggestion of forking, you replied:
I "heard" that pcntl isnt a good solution. Any experiences?
Where did you hear that? Certainly forking from a CGI- or mod_php-invoked script is a bad idea, but there's nothing wrong with doing it from the command line. Do have a Google for long-running PHP processes (but be warned: there is a lot of bad information out there). What code you write will vary depending on the underlying OS, which you've not stated.
I suspect that you could solve a large part of your performance issues by identifying which parts of the tree need to be checked and only checking those parts AND triggering the checks when the tree is updated, or at least marking the nodes as 'dirty'.
You might find these helpful:
http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
http://en.wikipedia.org/wiki/Threaded_binary_tree
C.
You could use a more efficient data structure, such as a B-tree. I used one once in Java, but not in PHP. You can try this script: http://www.phpclasses.org/browse/file/708.html; it is an implementation of a B-tree.
If that is not enough, you can use Hadoop to implement a Map/Reduce pattern, as Michael said. I would not fork the PHP process; it does not seem to help for performance.
Personally, I would use PHP as the client and put everything in Hadoop. This tutorial might help: http://www.lunchpauze.com/2007/10/writing-hadoop-mapreduce-program-in-php.html.
Another solution can be to use a Java implementation of a B-tree: http://jdbm.sourceforge.net/. JDBM is an object database using a B+tree data structure. You can then search from PHP by exposing the data with a web service or by accessing it directly with Quercus.
Are you using the web or the CLI?
If you use the web, you could integrate that part in Quercus. Then you could use the advantages of Java multithreading.
I don't actually know how reliable Quercus is, though. I'd also suggest using a kind of message queue and refactoring the code so it doesn't need the scope.
Maybe you could rebuild the code to a Map/Reduce pattern. You could then run the PHP code in Hadoop, and cluster the processing across a couple of machines.
I don't know if it's useful, but I came across another project called Gearman. It's also used to cluster PHP processes. I guess you can combine that with a reduce script as well, if Hadoop is not the way you want to go.
pthreads
There is a rather new (since 2012) PHP extension available: pthreads. It can be installed via PECL.
Simple implementation in PHP code: extend the Thread class, add a run() method, and call the start() method.
<?php
// Example from http://www.phpgangsta.de/richtige-threads-in-php-einfach-erstellen-mit-pthreads
class AsyncOperation extends Thread
{
    public function __construct($threadId)
    {
        $this->threadId = $threadId;
    }

    public function run()
    {
        printf("T %s: Sleeping 3sec\n", $this->threadId);
        sleep(3);
        printf("T %s: Hello World\n", $this->threadId);
    }
}

$start = microtime(true);
for ($i = 1; $i <= 5; $i++) {
    $t[$i] = new AsyncOperation($i);
    $t[$i]->start();
}
echo microtime(true) - $start . "\n";
echo "end\n";
Outputs
>php pthreads.php
0.041301012039185
end
T 1: Sleeping 3sec
T 2: Sleeping 3sec
T 3: Sleeping 3sec
T 4: Sleeping 3sec
T 5: Sleeping 3sec
T 1: Hello World
T 2: Hello World
T 3: Hello World
T 4: Hello World
T 5: Hello World
Try this: PHPThreads
Code Example:
function threadproc($thread, $param) {
    echo "\tI'm a PHPThread. In this example, I was given only one parameter: \"" . print_r($param, true) . "\" to work with, but I can accept as many as you'd like!\n";
    for ($i = 0; $i < 10; $i++) {
        usleep(1000000);
        echo "\tPHPThread working, very busy...\n";
    }
    return "I'm a return value!";
}

$thread_id = phpthread_create($thread, array(), "threadproc", null, array("123456"));

echo "I'm the main thread doing very important work!\n";

for ($n = 0; $n < 5; $n++) {
    usleep(1000000);
    echo "Main thread...working!\n";
}

echo "\nMain thread done working. Waiting on our PHPThread...\n";

phpthread_join($thread_id, $retval);

echo "\n\nOur PHPThread returned: " . print_r($retval, true) . "!\n";
Requires PHP extensions:
posix
pcntl
sockets
I have a list of data that needs to be processed. The way it works right now is this:
A user clicks a process button.
The PHP code takes the first item that needs to be processed, takes 15-25 secs to process it, moves on to the next item, and so on.
This takes way too long. What I'd like instead is that:
The user clicks the process button.
A PHP script takes the first item and starts to process it.
Simultaneously another instance of the script takes the next item and processes it.
And so on, so that around 5-6 items are being processed simultaneously and we get 6 items processed in 15-25 secs instead of just one.
Is something like this possible?
I was thinking of using CRON to launch an instance of the script every second. All items that need to be processed will be flagged as such in the MySQL database, so whenever an instance is launched through CRON, it will simply take the next flagged item and remove the flag.
Thoughts?
Edit: to clarify, each 'item' is stored in a MySQL database table as a separate row. Whenever processing starts on an item, it is flagged as being processed in the DB, so each new instance will simply grab the next row that is not being processed and process it. Hence I don't have to supply the items as command-line arguments.
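A sketch of how each CRON-launched instance could claim the next row atomically (table and column names are illustrative; $pdo is assumed to be an open PDO connection):

// The UPDATE is atomic, so two instances can never claim the same row.
$workerId = getmypid();
$pdo->prepare("UPDATE items SET status = 'processing', worker = ?
               WHERE status = 'pending' LIMIT 1")
    ->execute([$workerId]);
$stmt = $pdo->prepare("SELECT * FROM items WHERE status = 'processing' AND worker = ?");
$stmt->execute([$workerId]);
if ($item = $stmt->fetch()) {
    processItem($item); // hypothetical processing function
}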
Here's one solution; it's not the greatest, but it will work fine on Linux:
Split the processing into a separate CLI PHP script in which:
The command line inputs include `$id` and `$item`
The script writes its PID to a file in `/tmp/$id.$item.pid`
The script echoes results to stdout as XML, or something else that can be read into PHP
When finished the script deletes the `/tmp/$id.$item.pid` file
Your master script (presumably on your webserver) would do:
`exec("nohup php myprocessing.php $id $item > /tmp/$id.$item.xml");` for each item
Poll the `/tmp/$id.$item.pid` files until all are deleted (sleep/check poll is enough)
If they are never deleted, kill all the processing scripts and report failure
If successful, read the results from `/tmp/$id.$item.xml` to format/output to the user
Delete the XML files if you don't want to cache them for later use
An application started with a backgrounded nohup will run independently of the script that started it.
This interested me sufficiently that I decided to write a POC.
test.php
<?php
$dir = realpath(dirname(__FILE__));
$start = time();

// Time in seconds after which we give up and kill everything
$timeout = 25;

// The unique identifier for the request
$id = uniqid();

// Our "items" which would be supplied by the user
$items = array("foo", "bar", "0xdeadbeef");

// We exec a nohup command that is backgrounded which returns immediately
foreach ($items as $item) {
    exec("nohup php proc.php $id $item > $dir/proc.$id.$item.out &");
}

echo "<pre>";

// Run until timeout or all processing has finished
while (time() - $start < $timeout) {
    echo (time() - $start), " seconds\n";
    clearstatcache(); // Required since PHP will cache for file_exists

    $running = array();
    foreach ($items as $item) {
        // If the pid file still exists the process is still running
        if (file_exists("$dir/proc.$id.$item.pid")) {
            $running[] = $item;
        }
    }

    if (empty($running)) break;

    echo implode(',', $running), " running\n";
    flush();
    sleep(1);
}

// Clean up if we timed out
if (!empty($running)) {
    clearstatcache();
    foreach ($items as $item) {
        // Kill process of anything still running (i.e. that has a pid file)
        if (file_exists("$dir/proc.$id.$item.pid")
            && $pid = file_get_contents("$dir/proc.$id.$item.pid")) {
            posix_kill((int)$pid, 9);
            unlink("$dir/proc.$id.$item.pid");

            // Would want to log this in the real world
            echo "Failed to process: ", $item, " pid ", $pid, "\n";
        }
        // delete the useless data
        unlink("$dir/proc.$id.$item.out");
    }
} else {
    echo "Successfully processed all items in ", time() - $start, " seconds.\n";
    foreach ($items as $item) {
        // Grab the processed data and delete the file
        echo(file_get_contents("$dir/proc.$id.$item.out"));
        unlink("$dir/proc.$id.$item.out");
    }
}
echo "</pre>";
?>
proc.php
<?php
$dir = realpath(dirname(__FILE__));
$id = $argv[1];
$item = $argv[2];

// Write out our pid file
file_put_contents("$dir/proc.$id.$item.pid", posix_getpid());

for ($i = 0; $i < 80; ++$i) {
    echo $item, ':', $i, "\n";
    usleep(250000);
}

// Remove our pid file (at its full path) to say we're done processing
unlink("$dir/proc.$id.$item.pid");
?>
Put test.php and proc.php in the same folder on your server, load test.php, and enjoy.
You will of course need nohup (Unix) and the PHP CLI to get this to work.
Lots of fun, I may find a use for it later.
Use an external work queue like Beanstalkd, which your PHP script writes a bunch of jobs to. You can have as many worker processes pulling jobs from beanstalkd and processing them as fast as possible, and you can spin up as many workers as you have memory/CPU for. Your job body should contain as little information as possible, maybe just some IDs which you hit the DB with. Beanstalkd has a slew of client APIs and itself has a very basic API (think memcached).
We use beanstalkd to process all of our background jobs, and I love it. It's easy to use and very fast.
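For illustration, here is roughly what a producer and a worker look like with the Pheanstalk client (the tube name and payload are made up, and the API details vary between Pheanstalk versions, so treat this as a sketch):

use Pheanstalk\Pheanstalk;

// Producer: push a job containing only an item ID.
$pheanstalk = Pheanstalk::create('127.0.0.1');
$pheanstalk->useTube('process-items');
$pheanstalk->put(json_encode(['item_id' => 42]));

// Worker: block until a job is ready, process it, then delete it.
$worker = Pheanstalk::create('127.0.0.1');
$worker->watch('process-items');
while (true) {
    $job = $worker->reserve(); // blocks until a job arrives
    $data = json_decode($job->getData(), true);
    // ... load item $data['item_id'] from the DB and process it here ...
    $worker->delete($job);
}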
There is no multithreading in PHP; however, you can use fork:
php.net: pcntl_fork
Or you could execute a system() command and start another process which is multithreaded.
Can you implement threading in JavaScript on the client side? It seems to me I've seen a JavaScript library (from Google perhaps?) that implements it. Google it and I'm sure you'll find something. I've never done it, but I know it's possible. Anyway, your client-side JavaScript could activate (via Ajax) a PHP script once for each item in separate threads. That might be easier than trying to do it all on the server side.
-don
If you are running a high-traffic PHP server you are INSANE if you do not use the Alternative PHP Cache: http://php.net/manual/en/book.apc.php. You do not have to make code modifications to run APC.
Another useful technique that can work alongside APC is the Smarty template system, which allows you to cache output so that pages do not have to be rebuilt.
To solve this problem, I've used two different products: Gearman and RabbitMQ.
The benefit of putting your jobs into some sort of queuing software like Gearman or Rabbit is that, if you have multiple machines, they can all participate in processing items off the queue(s).
Gearman is easier to set up, so I'd suggest poking around with it a bit first. If you find you need something more heavy-duty with queue robustness, look into RabbitMQ.
http://www.danga.com/gearman/
http://pear.php.net/package/Net_Gearman (PEAR library)
You can use pcntl_fork() and family to fork a process; however, you may need something like IPC to communicate back to the parent process that the child process (the one you forked) is finished.
You could have them write to shared memory, like via memcache or a DB.
You could also have the child process write the completed data to a file that the parent process keeps checking: as each child process completes, the file is created/written to/updated, and the parent process can grab the files, one at a time, and throw them back to the callee/client.
The parent's job is to control the queue, to make sure the same data isn't processed twice, and also to sanity-check the children (better to kill that runaway process and start over, etc.).
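A sketch of the fork-plus-IPC idea using a socket pair, which is simpler than full shared memory for a one-off result channel (the payload is a placeholder):

// Parent and child communicate over a socket pair created before the fork.
$pair = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
$pid = pcntl_fork();
if ($pid === 0) {
    // Child: do the work, report back over its end of the pair, exit.
    fclose($pair[0]);
    fwrite($pair[1], "item processed\n");
    exit(0);
}
// Parent: read the child's result, then reap it.
fclose($pair[1]);
echo fgets($pair[0]);
pcntl_waitpid($pid, $status);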
Something else to keep in mind: on Windows platforms you are going to be severely limited; I don't even think you have access to the pcntl_ functions unless you compiled PHP with support for them.
Also, can you cache the data once it's been processed, or is it unique data every time? That would surely speed things up.