From the get-go, let me say that I'm trying to avoid using pcntl_fork().
For this example, let's imagine that I'm trying to fork many instances of the 'dig' command line application. In reality, the same script will be used for different command line apps.
Right now I'm using PHP's exec() and appending & to the command so that bash runs it in the background.
E.g.
exec("dig google.com &");
exec("dig yahoo.com &");
and so on...
This successfully creates multiple dig processes running in parallel.
The problem I'm having is that the number of processes is rising steadily until the system crashes. Essentially it's a fork bomb.
I tried to combat this by checking the number of running processes using ps ax | wc -l
and only launching more if it's below X.
E.g. (running in a loop):
if 80 processes are running, I'll launch another 20;
if 70 processes are running, I'll launch another 30.
The problem is that even with this check in place, the number of processes continues to rise until the system crashes or hits the operating system's max user processes limit.
Can anyone give me some hints on how I can fork effectively (en masse) without exhausting all the system resources? I can't see why this current method isn't working, to be honest.
Since you have a management process, I suggest you watch over the created subprocesses. Save the PID of every subscript you start:
$running[] = exec("process www.google.com & echo $!");
Where $! will return the PID of the backgrounded process, adding it to a list in PHP. Then in your management loop, just recheck if the processes are still active:
do {
    foreach ($running as $i => $pid) {
        if (!posix_getpgid($pid)) {
            unset($running[$i]);
            // restart
        }
    }
} while (count($running) > 0);
I don't think it's very elegant or reliable. pcntl_fork() is often the better approach, but you didn't elaborate on your actual scripts. Maybe this works in your case.
You may also want to use uptime to check system load.
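As a sketch of that load check: sys_getloadavg() reads the same 1/5/15-minute averages that uptime reports, so you can gate new launches on it (the threshold of 4.0 here is an arbitrary assumption; tune it for your machine):

```php
<?php
// Throttle launches based on the 1-minute load average.
// The default limit of 4.0 is an arbitrary assumption.
function can_launch_more(float $maxLoad = 4.0): bool
{
    $load = sys_getloadavg();      // [1min, 5min, 15min] averages
    return $load !== false && $load[0] < $maxLoad;
}

if (can_launch_more()) {
    // exec("dig example.com &"); // safe to start another worker
}
```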
Related
I want to get some data from an API and save it for that user in the database. This action takes a variable amount of time depending on the load, and sometimes it takes even 4 hours.
I am executing the script using exec with & so it runs in the background in PHP.
My question is: is exec safe for long-running jobs? I don't know much about fork, Linux processes, etc., so I don't know what happens internally on the CPU cores.
Here is something I found that confused me:
http://symcbean.blogspot.com/2010/02/php-and-long-running-processes.html
Can somebody tell me if I am going in the right direction with exec?
Will the process be killed by itself after the script completes?
Thanks
Well, that article is talking about process "trees" and how a child process depends on its spawning parent.
The PHP instance starts a child process (through exec or similar). If it doesn't wait for the process output, the PHP script ends (and the response is sent to the browser, for instance), but the process will sit idling, waiting for its child process to finish.
The problem with this is that the child process (the long-running 4-hour process) is not guaranteed to finish its job before Apache decides to kill its parent process (because you have too many idle processes), effectively killing its children.
The article's author then gives the suggestion of using a daemon and separate the child process from the parent process.
Edit:
Answering the question you left in the comments, here's a quick explanation of the command he uses in the article
echo /usr/bin/php -q longThing.php | at now
Starting from left to right.
echo prints to standard output (STDOUT) whatever you put after it, so...
echo /usr/bin/php -q longThing.php will print /usr/bin/php -q longThing.php to the shell.
| (pipe) feeds the STDOUT of the previous command directly into the standard input (STDIN) of the next command.
at reads commands from STDIN and executes them at a specified time. at now means the command will be executed immediately.
So basically this is the same thing as running the following sequence in the shell:
at now - Opens the at prompt
/usr/bin/php -q longThing.php - The command we want to run
^D (by pressing Control+D) - To save the job
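From PHP, the same one-liner can be fired off with exec(). A minimal sketch (the script path is just an example; escapeshellarg() keeps an untrusted path from breaking the shell command):

```php
<?php
// Build the "echo ... | at now" one-liner the article uses.
function at_now_command(string $script): string
{
    return "echo /usr/bin/php -q " . escapeshellarg($script) . " | at now";
}

// exec(at_now_command("longThing.php")); // queues the job and returns immediately
```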
So, regarding your questions:
Will the child process be immediately killed after the PARENT PHP script ends?
No.
Will the child process be killed at all, in some future moment?
Yes. Apache takes care of that for you.
Will the child process finish its job before being killed?
Maybe. Maybe not. Apache might kill it before it's done. The odds of that happening increase with the number of idle processes and with the time the process takes to finish.
Sidenote:
I think this article does point in the right direction, but I dislike the idea of spawning processes directly from PHP. In fact, PHP does not have the appropriate tools for running (long and/or intensive) background work. With PHP alone, you have little to no control over it.
I can, however, give you the solution we found for a similar problem I faced a while ago. We created a small program that would accept and queue data processing requests (about 5 mins long) and report back when the request was finished. That way we could control how many processes could be running at the same time, memory usage, number of requests by the same user, etc...
The program was actually hosted in another LAN server, which prevented memory usage spikes slowing down the webserver.
At the front-end, the user would be informed when the request was completed through long polling.
I have a textarea on my webpage into which the user pastes a C program. On the server side, I save this code to a file appropriately. I use the shell_exec() function to call gcc to compile the C program. This works fine, and so does the execution part.
But what if the user (un)intentionally submits an infinite loop? When I use the function -
shell_exec("./a.out")
the program goes into an infinite loop. How do I break out of such a loop from the php script itself? Is there a way?
Use ulimit to limit the CPU usage? Note that this is per process, so if the user continually "forks" the process, you may be in trouble.
Another method would be to have a wrapper process that monitors and kills all its child processes, and let it start the a.out. It depends on whether you can trust your "clients" (e.g. are they your good friends, or is this a school project or a public website) - your paranoia level should increase with the threat level.
If you want more refined security, run the process via ssh in a virtual machine. Then just kill the virtual machine after X seconds and start a fresh one from a saved snapshot. (You could have a pool of VMs ready to run, so the user doesn't have to wait for the VM to load.)
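One concrete way to cap runtime (wall-clock rather than CPU time) is GNU coreutils' timeout(1). A sketch assuming a Linux host; the 5-second limit in the comment is an arbitrary choice:

```php
<?php
// Run a command but kill it after $seconds wall-clock seconds.
// GNU timeout exits with status 124 when it had to kill the command.
function run_with_timeout(string $cmd, int $seconds): int
{
    exec("timeout " . (int)$seconds . " " . $cmd . " 2>/dev/null", $output, $status);
    return $status;
}

// run_with_timeout("./a.out", 5) === 124 would mean the program was cut off
```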
I am developing a video upload site and I have run into a dilemma: videos uploaded need to be converted into the FLV format in order to be displayed to a visitor, but if I execute the command within the script, the script will hang for about 10-15 minutes while FFMPEG converts the video.
I had an idea: insert a record into the database indicating the file needs to be processed, then use a cron job set to every 5 minutes to select the records from the database which need to be processed, process them, then update the database to show they have been processed. My worry is executing too many processes and the server crashing under the strain, so does anyone have any solutions, or a way to improve the process I have in mind?
Okay, this is now what I have in mind. The user uploads a video, and a row is inserted into the database indicating the video needs to be processed. A cron job set to every 5 minutes checks what needs to be processed and what is being processed. Say I allow a maximum of five processes at one time: the script checks whether any video needs to be processed and how many videos are currently being processed; if fewer than five, it updates the record to indicate it is being processed. Once the video has been processed, it updates the record to indicate it has been processed, and the cron job starts again. Any thoughts?
Gearman is a good solution for this kind of problem. It lets you instantly dispatch a job and have any number of workers (which may be on different servers) available to fulfill it.
To start with you can run a few workers on the same server, but if you start to run into load issues then you can just fire up another server with some more workers, so it's horizontally scalable.
If you're using PHP-FPM then you can make use of fastcgi_finish_request(), as documented on PHP.net under FastCGI Process Manager (FPM):
fastcgi_finish_request() - special function to finish request and flush all data while continuing to do something time-consuming (video converting, stats processing etc.);
If you're not using PHP-FPM or want something more advanced then you might consider using a queue manager like Gearman which is perfectly suited to the scenario you're describing. The advantage of using Gearman over running a process with shell_exec is you can take a look at how many jobs are running / how many are left and check their statuses. You also make scaling much easier as it's now trivial to add job servers:
$worker->addServer("10.0.0.1");
I love this class (see the specific comment) in the PHP manual: http://www.php.net/manual/en/function.exec.php#88704
Basically, it lets you spin off a background process on *nix systems. It returns a PID, which you can store in the session. When you reload the page to check on it, you simply recreate the ForkedProcess class with the saved PID, and you can check on its status. If it's complete, the process should be done.
It doesn't allow for much error checking, but it's incredibly lightweight.
If you expect a lot of traffic you should seriously consider a dedicated server.
On a single server, you can use shell_exec along with the UNIX nohup command to get the PID of the process.
function run_in_background($Command, $Priority = 0)
{
    if ($Priority) {
        $PID = shell_exec("nohup nice -n $Priority $Command 2> /dev/null & echo $!");
    } else {
        $PID = shell_exec("nohup $Command 2> /dev/null & echo $!");
    }
    return $PID;
}
function is_process_running($PID)
{
    exec("ps $PID", $ProcessState);
    return count($ProcessState) >= 2;
}
A full description of this technique is here: http://nsaunders.wordpress.com/2007/01/12/running-a-background-process-in-php/
You could perhaps put the list of PIDs in a MySQL table and then use your cron job every 5 mins to detect when a video is complete and update the relevant values in the database.
You can call ffmpeg using system() and send the output to /dev/null; this will make the call return right away, effectively running it in the background.
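A sketch of that call (the ffmpeg invocation and filenames here are placeholders; real conversions will need codec flags):

```php
<?php
// Build a fire-and-forget conversion command: output and errors are
// discarded, and the trailing & detaches it so the call returns immediately.
function background_convert_command(string $in, string $out): string
{
    return "ffmpeg -i " . escapeshellarg($in) . " " . escapeshellarg($out)
        . " > /dev/null 2>&1 &";
}

// exec(background_convert_command("upload.avi", "upload.flv"));
```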
Spawn a couple of worker processes that consume messages from a message queue such as beanstalkd. This way you can control the number of concurrent tasks (conversions) and also don't have to pay the price of spawning processes (because the processes keep running in the background).
I think it would be even faster if you coded it in C and used Redis as your message queue. Redis has a very good C client library named Hiredis. I don't think this would be insanely difficult to accomplish.
I'm trying to figure out the most efficient way to running a pretty hefty PHP task thousands of times a day. It needs to make an IMAP connection to Gmail, loop over the emails, save this info to the database and save images locally.
Running this task every so often using cron isn't that big of a deal, but I need to run it every minute, and I know eventually the cron runs will start stacking on top of each other and cause memory issues.
What is the next step up when you need to efficiently run a task multiple times a minute? I've been reading about beanstalkd and pheanstalk, and I'm not entirely sure if that will do what I need. Thoughts?
I'm not a PHP guy, but... what prevents you from running your script as a daemon? I've written many a Perl script that does just that.
One option is to create a locking mechanism so the scripts won't overlap. This is quite simple: since the scripts only run every minute, a simple .lock file would suffice:
<?php
if (file_exists("foo.lock")) exit(0);
file_put_contents("foo.lock", getmypid());
do_stuff_here();
unlink("foo.lock");
?>
This will make sure the scripts don't run in parallel; you just have to make sure the .lock file is deleted when the program exits, so you should have a single point of exit (except for the exit at the beginning).
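A variation worth considering (my suggestion, not the original answer's): flock() avoids the stale-lock problem, because the OS releases the lock automatically if the script crashes before it can unlink the file:

```php
<?php
// flock()-based single-instance guard: the lock dies with the process,
// so a crashed run never leaves a stale lock file behind.
$fp = fopen("/tmp/foo.lock", "c");
if (!flock($fp, LOCK_EX | LOCK_NB)) {
    exit(0); // another instance is already running
}

// do_stuff_here();

flock($fp, LOCK_UN);
fclose($fp);
```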
A good alternative - as Brian Roach suggested - is a dedicated server process that runs all the time and keeps the connection to the IMAP server up. This reduces overhead a lot and is not much harder than writing a normal PHP script:
<?php
connect();
while (is_world_not_invaded_by_aliens())
{
get_mails();
get_images();
sleep(time_to_next_check());
}
disconnect();
?>
I've got a number of scripts like these, where I don't want to run them from cron in case they stack up.
#!/bin/sh
php -f fetchFromImap.php
sleep 60
exec $0
The exec $0 part starts the script running again, replacing itself in memory, so it will run forever without issues. Any memory the PHP script uses is cleaned up whenever it exits, so that's not a problem either.
A simple line will start it, and put it into the background:
cd /x/y/z ; nohup ./loopToFetchMail.sh &
or it can be similarly started when the machine starts with various means (such as Cron's '#reboot ....')
fcron (http://fcron.free.fr/) will not start a new job if the old one is still running, so you could use a simple every-minute entry and not worry about race conditions.
I'm writing a script that builds a queue of other scripts and is supposed to manage their launch. The manager script should know which child process has finished, so it can launch the other scripts waiting in the queue.
I added "& echo $!" to get the process ID of each child process. So I have my child processes' PIDs, and for now I am calling the system "ps" program to find out whether the child processes are still running.
The thing is that my script currently runs only on Unix-like systems. I don't know how to fetch my children's PIDs on Windows, and my script does not parse the "tasklist" command's output on Windows yet.
Is there any other way to achieve this? Any in-PHP solution to find whether a child process is still running? A solution to start other processes (non-blocking) and check if they are still running or not?
You may find the Process Control (pcntl) extension interesting for Unix environments. You may also find an example of executing programs on Windows as a comment in the manual, which points me toward COM objects.
What you could do is create a database table or file that will hold your process IDs. Every process writes its PID to the file or DB.
Use this method to acquire your PHP script's PID:
getmypid();
Your supervising process will check every now and then whether the process is still running with the following:
function is_process_running($PID) {
    exec("ps $PID", $ProcessState);
    return count($ProcessState) >= 2;
}
When a process has stopped, you can execute the next one.
For Windows, check the comment in the manual: http://no2.php.net/manual/en/book.exec.php#87943
Have you tried proc_get_status()? In that case you may want to spawn your child processes using proc_open(). I'm not sure if this is what you're looking for.
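A minimal sketch of that combination (the "sleep 1" command is just a stand-in for one of your child scripts):

```php
<?php
// Start a child non-blockingly and poll it without shelling out to ps.
$process = proc_open("sleep 1", [], $pipes);

$status = proc_get_status($process);
// $status['pid'] is the child's PID; $status['running'] tells you if it's alive
echo $status['running'] ? "still running\n" : "finished\n";

proc_close($process); // waits for the child and reaps it
```

Unlike parsing ps output, this works on both Unix and Windows, since proc_open() and proc_get_status() are part of PHP's standard API on both platforms.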