Waiting for all PIDs to exit in PHP

My issue is this: I am forking a process so that I can speed up access to files on disk. I store any data from these files in a tmp file on local disk. Ideally, after all child processes have finished, I need to read that tmp file and get the data into an array, and then unlink the tmp file as it is no longer needed. My problem is that pcntl_wait() does not seem to actually wait until all child processes are done before moving on to the final set of operations, so I end up unlinking the file before some random process can finish up.
I can't seem to find a solid way to wait for all processes to exit cleanly and then access my data.
$numChild = 0;
$maxChild = 20; // max number of forked processes

// get a list of "availableCabs"
foreach ($availableCabs as $cab) {
    // fork the process
    $pids[$numChild] = pcntl_fork();
    if (!$pids[$numChild]) {
        // do some work
        exit(0);
    } else {
        $numChild++;
        if ($numChild == $maxChild) {
            pcntl_wait($status);
            $numChild--;
        }
    } // end fork
}
// Below is where things fall apart: I need to be able to print the complete
// serialized data, but several child processes don't actually exit before I unlink the file.
$dataFile = fopen($file, 'r');
while (($values = fgetcsv($dataFile, 0, ',')) !== FALSE) {
    $fvalues[] = $values;
}
print serialize($fvalues);
fclose($dataFile);
unlink($file);
Please note that I'm leaving a lot of code out regarding what I'm actually doing; if that needs to be posted, that's not an issue.

Try restructuring your code so that you have two loops: one that spawns processes and one that waits for them to finish. You should also use pcntl_waitpid() to check for specific process IDs, rather than the simple child-counting approach you are currently using.
Something like this:
<?php
$maxChildren = 20; // max number of forked processes
$pids = array();   // child process tracking array

// Get a list of "availableCabs"
foreach ($availableCabs as $cab) {
    // Limit the number of child processes:
    // if $maxChildren or more processes exist, wait until one exits
    if (count($pids) >= $maxChildren) {
        $pid = pcntl_waitpid(-1, $status);
        unset($pids[$pid]); // remove the PID that exited from the list
    }
    // Fork the process
    $pid = pcntl_fork();
    if ($pid) { // Parent
        if ($pid < 0) {
            // Unable to fork the process, handle the error here
            continue;
        } else {
            // Add the child PID to the tracker array,
            // using the PID as the key for easy use of unset()
            $pids[$pid] = $pid;
        }
    } else { // Child
        // If you aren't doing this already, consider using include() here - it
        // will keep the code in the parent script more readable and separate
        // the logic for the parent and children
        exit(0);
    }
}

// Now wait for the child processes to exit. This approach may seem overly
// simple, but because of the way it works it will have the effect of
// waiting until the last process exits, and pretty much no longer than that
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);
    unset($pids[$pid]);
}

// Now the parent process can do its cleanup of the results
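To make the cleanup step concrete, here is a minimal sketch of what could follow the wait loop, reusing the temp-file handling from the question ($file and the CSV format come from the question's code, not from this answer):

<?php
// All children have been reaped at this point, so nothing is still
// writing to the temp file and it is finally safe to consume and unlink.
$fvalues = array();
$dataFile = fopen($file, 'r');
while (($values = fgetcsv($dataFile, 0, ',')) !== FALSE) {
    $fvalues[] = $values;
}
fclose($dataFile);
unlink($file); // safe now - every writer has exited
print serialize($fvalues);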

Related

PHP exec() never ends

I'm using PHP's exec() to run an executable file, and it seems to never end, although running the same executable from a shell works fine.
Here are the main things the executable does:
fork();
The child process does some time-wasting work, and I setrlimit() a CPU time limit for it.
The parent process listens for signals and kills the child process when the calculated used_time exceeds the limit.
What can I do to make php exec() work?
Update:
Because the code is too long, I've selected just the relevant parts.
Main function:
child_pid = fork();
if (child_pid == 0)
{
    compile();
    exit(0);
}
else
{
    int res = watch();
    if (res)
        puts("YES");
    else
        puts("NO");
}
Child process:
LIM.rlim_cur = LIM.rlim_max = COMPILE_TIME;
setrlimit(RLIMIT_CPU, &LIM);
alarm(0);
alarm(LIM.rlim_cur * 10);
switch (language)
{
    // ... here execl() invokes a compiler such as gcc, g++ or javac
}
Parent process:
int status = 0;
int used_time = 0;
struct timeval case_startv, case_nowv;
struct timezone case_startz, case_nowz;

gettimeofday(&case_startv, &case_startz);
while (1)
{
    usleep(50000);
    gettimeofday(&case_nowv, &case_nowz);
    used_time = case_nowv.tv_sec - case_startv.tv_sec;
    if (waitpid(child_pid, &status, WNOHANG) == 0) // still running
    {
        if (used_time > COMPILE_TIME)
        {
            report_log("Compile time limit exceeded");
            kill(child_pid, SIGKILL);
            return 0;
        }
    }
    else
    {
        // handle signals
    }
}
For the test, the PHP file contains just the exec() call.
The situation I described only occurs when php exec() runs the executable to compile user code such as:
#include "/dev/random"
//....
PHP scripts on a server have a limited execution time. It is generally not a good idea to run long jobs this way; it is recommended that they be run as background jobs.
This limit is defined in php.ini, and it differs between the Apache and CLI configurations.
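For reference, the limit can be inspected and lifted at runtime; a quick sketch:

<?php
// 0 means no limit; the CLI SAPI defaults to 0, Apache configs often to 30
echo ini_get('max_execution_time'), "\n";
set_time_limit(0); // lift the limit for a long-running job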
At last, I found out why this happened:
I killed child_pid, but not the other processes spawned by child_pid,
so php exec() would keep running forever.
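The general fix is to put the child in its own process group and signal the whole group, so that orphaned grandchildren die with it. A minimal PHP-side sketch of that idea (my own illustration, assuming the pcntl and posix extensions; the executable path is a placeholder):

<?php
$pid = pcntl_fork();
if ($pid === 0) {
    posix_setsid();               // the child becomes a session/process-group leader
    pcntl_exec('/path/to/judge'); // placeholder for the real executable
    exit(1);                      // only reached if exec fails
}
// ... later, in the parent, on timeout:
posix_kill(-$pid, SIGKILL);   // a negative PID signals the whole process group
pcntl_waitpid($pid, $status); // reap the group leader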

PHP counter issue using flock()

I have a problem with a counter. I need to count two values, separated by a |, but sometimes the counter fails to increment one of them.
numeri.txt (the counter):
5240|593389
This is the PHP script:
$filename="numeri.txt";
$fp=fopen($filename,"r");
if(!flock($fp,LOCK_SH))
{
while(true)
{
usleep(100000);
if(flock($fp,LOCK_SH))
{
break;
}
}
}
$contents=fread($fp,filesize($filename));
flock($fp,LOCK_UN);
fclose($fp);
$fp=fopen($filename,'a');
if(!flock($fp,LOCK_EX))
{
while(true)
{
usleep(100000);
if(flock($fp,LOCK_EX))
{
break;
}
}
}
ftruncate($fp,0);
$contents=explode("|",$contents);
$clicks=$contents[0];
$impressions=$contents[1]+1;
fwrite($fp,$clicks."|".$impressions);
flock($fp,LOCK_UN);
fclose($fp);
I set the counter to the right value, but after 3-4 days it has missed about 50 impressions (the number after the "|").
How can I fix the code?
Two problems:
1) There is still a window for this process to read, another process to write the file, and then this process to overwrite it, losing the other's update. It requires millisecond timing, but it's possible.
2) You don't actually verify that the fopen() works - it can fail.
The solution is firstly to check for open failure and retry, and secondly to take a single exclusive lock around the whole read-modify-write cycle. Paraphrased code:
while (!$fh = fopen($file, 'c+')) { // read/write, do not truncate, pointer at start
    usleep(100000);
}
while (!flock($fh, LOCK_EX)) {
    usleep(100000);
}
$content = stream_get_contents($fh);
// Process content...
ftruncate($fh, 0);
rewind($fh); // back to the start before writing
fwrite($fh, $content);
fclose($fh); // also releases the lock
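The same idea wrapped in a reusable helper and applied to the counter file from the question (the helper name and callback shape are my own sketch, not a drop-in fix):

<?php
// Hold one exclusive lock across the whole read-modify-write cycle.
function update_locked($file, $fn) {
    while (!$fh = fopen($file, 'c+')) { // retry if the open fails
        usleep(100000);
    }
    flock($fh, LOCK_EX); // blocks until the lock is granted
    $old = stream_get_contents($fh);
    ftruncate($fh, 0);
    rewind($fh);
    fwrite($fh, $fn($old));
    fclose($fh); // releases the lock
}

// Count one impression (the value after the "|") in numeri.txt:
update_locked('numeri.txt', function ($c) {
    list($clicks, $impressions) = explode('|', $c);
    return $clicks . '|' . ($impressions + 1);
});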

Creating a PHP Online Grading System on Linux: exec Behavior, Process IDs, and grep

Background
I am writing a simple online judge (a code grading system) using PHP and MySQL. It takes code submitted in C++ and Java, compiles it, and tests it.
This is Apache running PHP 5.2 on an old version of Ubuntu.
What I am currently doing
I have a PHP script that loops infinitely, calling another PHP script via
//for(infinity)
exec("php -f grade.php");
//...
every tenth of a second. Let's call the first one looper.php and the second one grade.php. (Checkpoint: grade.php should completely finish running before the "for" loop continues, correct?)
grade.php pulls the earliest submitted code that needs to be graded from the MySQL database, puts that code in a file (test.[cpp/java]), and calls two other PHP scripts in succession, named compile.php and test.php, like so:
//...
exec("php -f compile.php");
//...
//for([all tests])
exec("php -f test.php");
//...
(Checkpoint: compile.php should completely finish running before the "for" loop calling test.php even starts, correct?)
compile.php then compiles the program in test.[cpp/java] as a background process. For now, let's assume that it's compiling a Java program and that test.java is located in a subdirectory. I now have
//...
// $dir = "./sub/" or some other subdirectory; this may be an absolute path
$start_time = microtime(true); // to get elapsed compilation time later
exec("javac " . $dir . "test.java -d " . $dir . " 2> " . $dir
    . "compileError.txt 1> " . $dir . "compileText.txt & echo $!", $out);
//...
in compile.php. It redirects the output from javac, so javac should be running as a background process... and it seems to work. $out[0] should hold the process ID of javac.
The real problem
I want to stop the compilation if for some reason it takes more than 10 seconds, and I want compile.php to end as soon as the compilation finishes if that takes less than 10 seconds. Since the exec("javac... I called above is a background process (or is it?), I have no way of knowing when it has completed without looking at the process ID, which should have been stored in $out earlier. Right after that, in compile.php, I run a 10-second loop that calls exec("ps ax | grep [pid].*javac"); and checks whether the PID still exists:
//...
$pid = (int)$out[0];
$done_compile = false;
while ((microtime(true) - $start_time < 10) && !$done_compile) {
    usleep(20000); // only sleep 0.02 seconds between checks
    unset($grep);
    exec("ps ax | grep " . $pid . ".*javac", $grep);
    $found_process = false;
    // loop through the results from grep
    while (!$found_process && list(, $proc) = each($grep)) {
        $boom = explode(" ", $proc);
        $npid = (int)$boom[0];
        if ($npid == $pid)
            $found_process = true;
    }
    $done_compile = !$found_process;
}
if (!$done_compile)
    exec("kill -9 " . $pid);
//...
... which doesn't seem to be working - at least not all of the time. Often, what happens is that test.php starts running before javac has even stopped, so test.php can't find the main class when it tries to run the Java program. I think the loop is being bypassed for some reason, though this may not be the case. At other times, the entire grading system works as intended.
Meanwhile, test.php also uses the same strategy (with the X-second loop and the grep) in running a program in a certain time limit, and it has a similar bug.
I think the bug lies in the grep not finding javac's PID even when javac is still running, causing the 10-second loop to break early. Can you spot an obvious bug? A more subtle bug? Is there a problem with my usage of exec? Is there a problem with $out? Or is something entirely different happening?
Thank you for reading my long question. All help is appreciated.
I just came up with this code, which runs a process and terminates it if it runs longer than $timeout seconds. If it terminates before the timeout, the program output will be in $output and the exit status in $return_value.
I have tested it and it seems to work well. Hopefully you can adapt it to your needs.
<?php
$command = 'echo Hello; sleep 30'; // the command to execute
$timeout = 5;   // terminate the process if it runs longer than this many seconds
$cwd = '/tmp';  // working directory of the executing process
$env = null;    // environment variables to set, null to use the same as PHP

$descriptorspec = array(
    0 => array("pipe", "r"), // stdin is a pipe that the child will read from
    1 => array("pipe", "w"), // stdout is a pipe that the child will write to
    2 => array("file", "/tmp/error-output.txt", "a") // stderr is a file to write to
);

// start the process
$process = proc_open($command, $descriptorspec, $pipes, $cwd, $env);
$startTime = time();
$terminated = false;
$output = '';

if (is_resource($process)) {
    // process was started
    // $pipes now looks like this:
    // 0 => writable handle connected to child stdin
    // 1 => readable handle connected to child stdout
    // Any error output will be appended to /tmp/error-output.txt

    // loop until timeout, or until the process finishes
    for (;;) {
        usleep(100000); // don't consume too many resources

        $stat = proc_get_status($process); // get info on the process

        if ($stat['running']) { // still running
            if (time() - $startTime > $timeout) { // check for timeout
                // close descriptors
                fclose($pipes[1]);
                fclose($pipes[0]);

                proc_terminate($process); // terminate the process
                $return_value = proc_close($process); // get the return value
                $terminated = true;
                break;
            }
        } else {
            // process finished before the timeout
            $output = stream_get_contents($pipes[1]); // get the output of the command

            // close descriptors
            fclose($pipes[1]);
            fclose($pipes[0]);

            proc_close($process); // close the process
            $return_value = $stat['exitcode']; // set the exit code
            break;
        }
    }

    if (!$terminated) {
        echo $output;
    }

    echo "command returned $return_value\n";
    if ($terminated) echo "Process was terminated due to long execution\n";
} else {
    echo "Failed to start process!\n";
}
References: proc_open(), proc_close(), proc_get_status(), proc_terminate()
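As an aside on the original ps ax | grep approach: if the posix extension is available, a cheaper liveness check is to send signal 0, which performs error checking only and delivers no signal. A sketch (it shares grep's PID-reuse caveat):

<?php
// True while a process with this PID exists and we are allowed to signal it.
function is_running($pid) {
    return posix_kill($pid, 0); // signal 0 = existence/permission check only
}
// e.g. in the compile loop: $done_compile = !is_running($pid);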

PHP shared memory block and fork

I'm trying to create a counter that uses a shared memory block; here is the code:
$i = 0;
$counter = new counter('g');
while ($i < 3) {
    $pid = pcntl_fork();
    echo $counter->get() . "\t" . $i . "\t" . $pid . "\n";
    $i++;
}
class counter {
    protected static $projID = array();
    protected $t_key;
    protected $shmid;
    protected $length;

    function __construct($projID) {
        !in_array($projID, self::$projID) or die('Using duplicate project identifier "' . $projID . '" for creating counter');
        self::$projID[] = $projID;
        $this->t_key = ftok(__FILE__, $projID);
        $this->shmid = shmop_open($this->t_key, 'c', 0755, 64);
        $this->length = shmop_write($this->shmid, 0, 0);
        shmop_close($this->shmid);
    }

    function get() {
        $sem = sem_get($this->t_key, 1);
        sem_acquire($sem);
        $shmid = shmop_open($this->t_key, 'c', 0755, 64);
        $inc = shmop_read($shmid, 0, $this->length);
        $this->length = shmop_write($shmid, $inc + 1, 0);
        shmop_close($shmid);
        sem_release($sem);
        return $inc;
    }
}
But I get strange results:
7 0 2567
8 1 2568
9 0 0
1 1 0
2 2 2569
40 1 2570
4 2 2572
3 2 0
51 2 2571
52 1 0
63 2 0
5 2 0
64 2 2573
65 2 0
I want to use this class to read and write strings in a file from multiple processes.
You're not ending the child processes at all, so they never finish. You're also not checking whether the fork succeeded, and there's no control over what has finished processing and in what order. Forking a process isn't really the multithreading that other languages provide; all that happens is that the current process is copied, and from then on each copy runs independently with its own variables. Your $i won't stop at 3 the way you expect, nor is there any guarantee of which process finishes first or last.
Try with:
while ($i < 3) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        // some sort of message that the process wasn't forked
        exit(1);
    } else {
        if ($pid) {
            pcntl_wait($status); // refer to the PHP manual to check what this function does
        } else {
            // enter your code here, for whatever you want to be done in parallel;
            // bear in mind that some processes can finish sooner and some can finish later.
            // A good use case is when you have tasks dependent on network latency and you
            // want them executed asynchronously (such as uploading multiple files to an FTP
            // or synchronizing something over the network).
            // After you're done, kill the process so it doesn't become a zombie:
            posix_kill(getmypid(), 9); // not the most elegant solution, and can fail
        }
    }
}
You aren't dealing with the PID after your call to pcntl_fork, so your forks are forking: the loop keeps executing in every child, and each child forks again.
Unless you're trying to create a localized fork bomb, you probably don't want your forks to fork.
I did some work locally to try and figure out if that alone would solve the problem, but it didn't. It almost looks like the shared memory segment isn't being written to correctly, as if one of the digits on either side of the string is being repeated, which corrupts all of it and forces things to start over.
Complete speculation.
You might want to consider a different way of performing parallel processing with PHP. Using Gearman as a multi-process work queue is a favorite solution of mine.
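For the record, a minimal Gearman sketch (this assumes the PECL gearman extension and a gearmand server on localhost; the function name and payload are made up):

<?php
// worker.php - registers a function and processes jobs as they arrive
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('increment', function (GearmanJob $job) {
    return (string)((int)$job->workload() + 1);
});
while ($worker->work());

<?php
// client.php - submits a job; doBackground() returns without waiting
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('increment', '41');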

How can I enforce a maximum number of forked children?

EDIT: I've tagged this C in the hope of getting more responses. It's the theory I'm interested in more than a specific language implementation, so if you're a C coder, please treat the following PHP as pseudo-code and feel free to respond with an answer written in C.
I am trying to speed up a PHP CLI script by having it execute its tasks in parallel instead of serial. The tasks are completely independent of each other so it doesn't matter which order they start/finish in.
Here's the original script (note all these examples are stripped-back for clarity):
<?php
$items = range(0, 100);
function do_stuff_with($item) { echo "$item\n"; }
foreach ($items as $item) {
    do_stuff_with($item);
}
I've managed to make it work on the $items in parallel with pcntl_fork() as shown below:
<?php
ini_set('max_execution_time', 0);
ini_set('max_input_time', 0);
set_time_limit(0);

$items = range(0, 100);
function do_stuff_with($item) { echo "$item\n"; }

$pids = array();
foreach ($items as $item) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("couldn't fork()");
    } elseif ($pid > 0) {
        // parent
        $pids[] = $pid;
    } else {
        // child
        do_stuff_with($item);
        exit(0);
    }
}
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);
}
Now I want to extend this so there's a maximum of, say, 10 children active at once. What's the best way of handling this? I've tried a few things but haven't had much luck.
There is no syscall to get a list of child PIDs, but ps can do it for you: its --ppid switch lists all children of your process, so you just need to count the lines ps outputs.
Alternatively, you can maintain your own counter that you increment on fork() and decrement on the SIGCHLD signal, assuming the ppid stays unchanged for forked processes.
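A minimal sketch of that counter variant (assuming the pcntl extension; the limit of 10 and the $items/do_stuff_with() names are carried over from the question):

<?php
$items = range(0, 100);
function do_stuff_with($item) { echo "$item\n"; }

$children = 0;
// Reap and count down every child that has exited so far
pcntl_signal(SIGCHLD, function () use (&$children) {
    while (pcntl_waitpid(-1, $status, WNOHANG) > 0) {
        $children--;
    }
});

foreach ($items as $item) {
    while ($children >= 10) {    // at the limit: wait for a free slot
        pcntl_signal_dispatch(); // deliver any pending SIGCHLD
        usleep(10000);
    }
    $pid = pcntl_fork();
    if ($pid === 0) {
        do_stuff_with($item);
        exit(0);
    }
    $children++;
}
while ($children > 0) { // wait for the stragglers
    pcntl_signal_dispatch();
    usleep(10000);
}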
The best thing I can come up with is to add all the tasks to a queue, launch the maximum number of threads you want, and then have each thread request a task from the queue, execute it, and request the next one. Don't forget to have the threads terminate when there are no more tasks to do.
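That queue-plus-workers idea maps onto a SysV message queue in PHP (sysvmsg extension). A rough sketch, again borrowing $items and do_stuff_with() from the question, with a null message as an invented end-of-work sentinel:

<?php
$items = range(0, 100);
function do_stuff_with($item) { echo "$item\n"; }

$queue = msg_get_queue(ftok(__FILE__, 'q'));
$workers = 10;
for ($w = 0; $w < $workers; $w++) {
    if (pcntl_fork() === 0) { // worker process
        while (true) {
            msg_receive($queue, 1, $type, 1024, $task);
            if ($task === null) { // sentinel: no more work
                exit(0);
            }
            do_stuff_with($task);
        }
    }
}
foreach ($items as $item) {
    msg_send($queue, 1, $item); // enqueue one task per item
}
for ($w = 0; $w < $workers; $w++) {
    msg_send($queue, 1, null); // one sentinel per worker
}
while (pcntl_waitpid(-1, $status) > 0); // reap all the workers
msg_remove_queue($queue);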
Forking is an expensive operation. From the looks of it, what you really want is multithreading, not multiprocessing. The difference is that threads are much lighter weight than processes, since threads share a virtual address space but processes have separate virtual address spaces.
I'm not a PHP developer, but a quick Google search reveals that PHP does not support multithreading natively, but there are libraries to do the job.
Anyway, once you figure out how to spawn threads, you should figure out how many threads to spawn. In order to do this, you need to know what the bottleneck of your application is. Is the bottleneck CPU, memory, or I/O? You've indicated in your comments that you are network-bound, and network is a type of I/O.
If you were CPU bound, you're only going to get as much parallelism as you have CPU cores; any more threads and you're just wasting time doing context switches. Assuming you can figure out how many total threads to spawn, you should divide your work into that many units, and have each thread process one unit independently.
If you were memory bound, then multithreading would not help.
Since you're I/O bound, figuring out how many threads to spawn is a little trickier. If all work items take approximately the same time to process with very low variance, you can estimate how many threads to spawn by measuring how long one work item takes. However, since network packets tend to have highly variable latencies, this is unlikely to be the case.
One option is to use thread pools - you create a whole bunch of threads, and then for each item to process, you see if there is a free thread in the pool. If there is, you have that thread perform the work, and you move onto the next item. Otherwise, you wait for a thread to become available. Choosing the size of the thread pool is important - too big, and you're wasting time doing unnecessary context switches. Too few, and you're waiting for threads too often.
Yet another option is to abandon multithreading/multiprocessing and just do asynchronous I/O instead. Since you mentioned you're working on a single-core processor, this will probably be the fastest option. You can use functions like socket_select() to test if a socket has data available. If it does, you can read the data, otherwise you move onto a different socket. This requires doing a lot more bookkeeping, but you avoid waiting for data to come in on one socket when data is available on a different socket.
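A bare-bones version of that select() loop using PHP streams (stream_select() is the stream-level sibling of socket_select(); the hosts are placeholders):

<?php
$hosts = array('example.com', 'example.org'); // placeholder hosts
$streams = array();
foreach ($hosts as $host) {
    $s = stream_socket_client("tcp://$host:80", $errno, $errstr, 5);
    stream_set_blocking($s, false);
    fwrite($s, "GET / HTTP/1.0\r\nHost: $host\r\n\r\n");
    $streams[(int)$s] = $s;
}
while ($streams) {
    $read = $streams;
    $write = $except = null;
    if (stream_select($read, $write, $except, 5) === false) {
        break; // select error
    }
    foreach ($read as $s) {
        $data = fread($s, 8192);
        if ($data === '' || $data === false) { // peer closed: this one is done
            fclose($s);
            unset($streams[(int)$s]);
        }
        // otherwise: append $data to a per-host buffer and carry on
    }
}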
If you want to eschew threads and asynchronous I/O and stick with multiprocessing, it can still be worthwhile if the per-item processing is expensive enough. You might then do the work division like so:
$my_process_index = 0;
$pids = array();
// Fork off $max_procs - 1 children; the parent keeps index 0
for ($i = 0; $i < $max_procs - 1; $i++) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("couldn't fork()");
    } elseif ($pid > 0) {
        // parent
        $pids[] = $pid;
    } else {
        // child: take a unique index in [1, $max_procs)
        $my_process_index = $i + 1;
        break;
    }
}
// $my_process_index is now an integer in the range [0, $max_procs), unique among all the processes
// Each process will now process 1/$max_procs of the items
for ($i = $my_process_index; $i < count($items); $i += $max_procs) {
    do_stuff_with($items[$i]);
}
if ($my_process_index != 0) {
    exit(0); // children are done
}
// parent: reap the children
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);
}
man 2 setrlimit
That limit is per-user, which may be what you want anyway.
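If the limit should live inside the script rather than in the shell, PHP 7's posix extension exposes the same call; note that RLIMIT_NPROC counts all of the invoking user's processes, not just this script's children. A sketch:

<?php
// Requires PHP >= 7.0 with the posix extension, on Linux/BSD.
posix_setrlimit(POSIX_RLIMIT_NPROC, 10, 10);
$pid = pcntl_fork();
if ($pid === -1) {
    // fork was refused - the per-user process limit (or another error) was hit;
    // wait for a child to exit and retry rather than dying
}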
<?php
ini_set('max_execution_time', 0);
ini_set('max_input_time', 0);
set_time_limit(0);

$items = range(0, 100);
function do_stuff_with($item) { echo "$item\n"; }

$pids = array();
while (count($items)) {
    $item = array_pop($items);
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("couldn't fork()");
    } elseif ($pid > 0) {
        // parent
        $pids[] = $pid;
    } else {
        // child
        do_stuff_with($item);
        exit(0);
    }
    while (count($pids) >= 10) { // limit reached: block until one child exits
        if (($wait_pid = pcntl_waitpid(0, $status)) > 0) {
            $exit_status = pcntl_wexitstatus($status);
            array_pop($pids); // only the count matters, not which PID exited
            echo "$wait_pid $exit_status" . PHP_EOL;
        }
    }
}
// reap the remaining children
while (count($pids)) {
    if (($wait_pid = pcntl_waitpid(0, $status)) > 0) {
        $exit_status = pcntl_wexitstatus($status);
        array_pop($pids);
        echo "CHILD: child $exit_status completed $wait_pid" . PHP_EOL;
    }
}
