Using exec for long-running scripts - PHP

I want to get some data from an API and save it for that user in the database. This action takes a random amount of time depending on the data, and sometimes it takes as long as 4 hours.
I am executing the script using exec with & so it runs in the background of PHP.
My question is: is exec safe for long-running jobs? I don't know much about fork, Linux processes, etc., so I don't know what happens internally on the CPU cores.
Here is something I found that confused me,
http://symcbean.blogspot.com/2010/02/php-and-long-running-processes.html
Can somebody tell me if I am going in the right direction with exec?
Will the process be killed by itself after the script completes?
Thanks

Well, that article is talking about process "trees" and how a child process depends on its spawning parent.
The PHP instance starts a child process (through exec or similar). If it doesn't wait for the process output, the PHP script ends (and the response is sent to the browser, for instance), but the parent process sits idle, waiting for its child process to finish.
The problem with this is that the child process (the long-running, 4-hour process) is not guaranteed to finish its job before Apache decides to kill its parent process (because you have too many idle processes), effectively killing its children as well.
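For reference, the exec-and-& pattern under discussion looks something like this (a sketch using the article's longThing.php placeholder):
// Backgrounding the command and redirecting its output lets exec() return
// immediately instead of waiting, but the child still hangs off the
// Apache/PHP process tree described above.
exec('/usr/bin/php -q longThing.php > /dev/null 2>&1 &');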
The article's author then suggests using a daemon to separate the child process from the parent process.
Edit:
Answering the question you left in the comments, here's a quick explanation of the command he uses in the article
echo /usr/bin/php -q longThing.php | at now
Starting from left to right.
echo prints to standard output (STDOUT) whatever you put after it, so...
echo /usr/bin/php -q longThing.php will print /usr/bin/php -q longThing.php to the shell.
| (the pipe) feeds the STDOUT of the previous command directly into the standard input (STDIN) of the next command.
at reads commands from STDIN and executes them at a specified time. at now means the command will be executed immediately.
So basically this is the same thing as running the following sequence in the shell:
at now - Opens the at prompt
/usr/bin/php -q longThing.php - The command we want to run
^D (by pressing Control+D) - To save the job
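From PHP itself, the same hand-off is a single exec() call (assuming at is installed and the atd daemon is running):
// Queue the long job with at; exec() returns as soon as at accepts it,
// and the job then runs outside Apache's process tree entirely.
exec('echo /usr/bin/php -q longThing.php | at now');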
So, regarding your questions:
Will the child process be immediately killed after the PARENT PHP script ends?
No.
Will the child process be killed at all, in some future moment?
Yes. Apache takes care of that for you.
Will the child process finish its job before being killed?
Maybe. Maybe not. Apache might kill it before it's done. The odds of that happening increase with the number of idle processes and with the time the process takes to finish.
Sidenote:
I think this article does point in the right direction, but I dislike the idea of spawning processes directly from PHP. PHP simply does not have the appropriate tools for running (long and/or intensive) background work; with PHP alone, you have little to no control over it.
I can, however, give you the solution we found for a similar problem I faced a while ago. We created a small program that would accept and queue data processing requests (about 5 mins long) and report back when the request was finished. That way we could control how many processes could be running at the same time, memory usage, number of requests by the same user, etc...
The program was actually hosted in another LAN server, which prevented memory usage spikes slowing down the webserver.
At the front-end, the user would be informed, through long polling, when the request was completed.
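For illustration, a minimal long-polling endpoint could look like the sketch below (a hypothetical status.php; the job-completion check is a stand-in, not code from that project):
// The browser requests status.php?id=N and the server holds the request
// until the job finishes or ~25 seconds pass; the client then re-polls.
function jobIsFinished($id) {
    // hypothetical lookup against wherever the worker records completion
    return file_exists("/tmp/job-$id.done");
}

$id = (int) $_GET['id'];
$deadline = time() + 25; // stay below typical 30-second request timeouts

while (time() < $deadline) {
    if (jobIsFinished($id)) {
        echo json_encode(array('done' => true));
        exit;
    }
    sleep(1);
}
echo json_encode(array('done' => false)); // not ready yet; client re-polls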

Related

Server-side scheduled tasks: need to schedule a task that happens with a frequency of 5 seconds

I need to write a server-side program that lives on the server, and is checking a database consistently for new entries.
When a new entry shows up in the database, the program should process the data and put the results somewhere else.
It is important to highlight that the process isn't instigated by new entries showing up, but by the program checking for new entries on its own.
Some people I've spoken to brought up cron jobs, and I was curious whether that is the solution for me. I see that it has limitations: it won't run more often than once a minute. I was hoping for the program to run every 5 seconds. Would I be better off writing a shell script, or is that a bootleg fix?
I'm not sure if this is conventional (?) but...
Use a database trigger on INSERT that runs an external program (PHP, Python, .. whatever). Which database are you using? I think this post is old but might be of help: http://crazytechthoughts.blogspot.co.uk/2011/12/call-external-program-from-mysql.html
There is a technique I've frequently used when dealing with queues that I've been processing.
#!/bin/sh
php -f checkDBAndAct.php   # do one pass over any new DB entries
sleep 5                    # wait five seconds between passes
exec $0                    # replace this shell with a fresh copy of itself
The exec $0 part starts the script running again, replacing itself in memory, so it will run forever without issues. Any memory the PHP script uses is cleaned up whenever it exits, so that's not a problem either.
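The PHP side can then be a plain single-pass script. A sketch, assuming new rows are flagged with a status column (the DSN and table/column names are illustrative):
// checkDBAndAct.php - one pass over the new entries, then exit;
// the shell loop above provides the "every 5 seconds" part.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

foreach ($pdo->query("SELECT id, payload FROM entries WHERE status = 'new'") as $row) {
    // ... process $row['payload'] and put the results somewhere else ...
    $upd = $pdo->prepare("UPDATE entries SET status = 'done' WHERE id = ?");
    $upd->execute(array($row['id'])); // mark it so it isn't picked up again
}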
A simple line will start it, and put it into the background:
cd /x/y/z ; nohup ./loopToProcessDB.sh &
or it can similarly be started when the machine boots, by various means (such as cron's '@reboot ...')
-- from https://stackoverflow.com/a/2686100/6216
An extended version is on http://PHPscaling.com and https://gist.github.com/alister/1386212
Though I'd use an actual queue system, rather than a DB, as there are a number of downsides to bending a database to this task.

Running Cron jobs in parallel (PHP)

In the past, I ran a bunch of scripts each as a separate cron job. Now I'd like to run a controller script with one cron job, then have that call the scripts separately (and in parallel, all at the same time), so I don't have to create a new cron job every time I add another script.
I looked up pcntl_fork() but we don't have that installed. Can fsockopen() do this as well?
A few questions:
I saw this example, http://phplens.com/phpeverywhere/?q=node/view/254, that uses fsockopen(). Will this allow me to run PHP scripts in parallel? Note, the scripts don't interact, but I would still like to know if any of them exited prematurely with an error.
Secondly, the scripts I'm running aren't externally accessible; they are internal only. The script was previously run like so: php -f /path/to/my/script1.php. It's not a web-accessible path. Would the example in #1 work with this, or only with web-accessible paths?
Thanks for any advice you can offer.
You can use proc_open to run multiple processes without waiting for each process to finish.
You will have a process handle, you can terminate each process at any time and you can read the standard output of each process.
You can also communicate via pipes, which is optional.
Passing php /your/path/to/script.php param1 "param2 x" as the first parameter starts a separate PHP process.
proc_open (see Example #1)
Ultimately you will want an infinite while loop + usleep (or sleep) to avoid maxing out the CPU. Break when all processes finish, or after you have killed them.
Edit: you can tell whether a process has exited prematurely.
Edit2: a simpler way of doing the above is popen
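To make the proc_open approach concrete, here is a minimal sketch (the script paths are the question's examples; descriptor handling is simplified, and heavy output should be redirected to files so the pipes don't fill up):
$scripts = array('/path/to/my/script1.php', '/path/to/my/script2.php');
$procs = array();

foreach ($scripts as $script) {
    $pipes = array();
    $proc = proc_open(
        'php -f ' . escapeshellarg($script),
        array(1 => array('pipe', 'w'), 2 => array('pipe', 'w')), // capture STDOUT/STDERR
        $pipes
    );
    $procs[$script] = array('proc' => $proc, 'pipes' => $pipes);
}

// Poll in a sleep loop until every child has exited.
while ($procs) {
    foreach ($procs as $script => $p) {
        $status = proc_get_status($p['proc']);
        if ($status['running']) {
            continue;
        }
        // A non-zero exit code means the script exited prematurely with an error.
        echo $script, ' exited with code ', $status['exitcode'], "\n";
        echo stream_get_contents($p['pipes'][1]); // whatever the script printed
        fclose($p['pipes'][1]);
        fclose($p['pipes'][2]);
        proc_close($p['proc']);
        unset($procs[$script]);
    }
    usleep(200000); // 0.2 s pause so the loop doesn't max out the CPU
}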
Please correct me if I'm wrong, but if I understand things correctly, the solution Tiberiu-Ionut Stan proposed implies that starting the processes with proc_open and waiting for them to finish would not be run as a cron script, but as part of a long-running program/service, right?
As far as I understand cron jobs, the controller script user920050 was thinking of using would be started by cron on a schedule, and each new instance would launch the processes all over again, wait for them to finish, and probably run in parallel with other cron-launched instances of the controller script.

How to set up Beanstalkd with PHP

Recently I've been researching the use of Beanstalkd with PHP. I've learned quite a bit but have a few questions about the setup on a server, etc.
Here is how I see it working:
1) I install Beanstalkd and any dependencies (such as libevent) on my Ubuntu server. I then start the Beanstalkd daemon (which should basically run at all times).
2) Somewhere on my website (such as when a user performs some action), tasks get added to various tubes within the Beanstalkd queue.
3) I have a bash script (such as the following one) that is run as a daemon and basically executes a PHP script:
#!/bin/sh
php worker.php
4) The worker script would have something like this to execute the queued up tasks:
while (1) {
    $job = $this->pheanstalk->watch('test')->ignore('default')->reserve();
    $job_encoded = json_decode($job->getData(), false);
    $done_jobs[] = $job_encoded;
    $this->log('job:' . print_r($job_encoded, 1));
    $this->pheanstalk->delete($job);
}
Now here are my questions based on the above setup (which correct me if I'm wrong about that):
Say I have the task of importing an RSS feed into a database or something. If 10 users do this at once, they'll all be queued up in the "test" tube. However, they'd then only be executed one at a time. Would it be better to have 10 different tubes all executing at the same time?
If I do need more tubes, does that then also mean that I'd need 10 worker scripts? One for each tube, all running concurrently with basically the same code except for the string literal in the watch() function.
If I run that script as a daemon, how does that work? Will it constantly be executing the worker.php script? Theoretically, that script loops until the queue is empty, so shouldn't it only be kicked off once? How does the daemon decide how often to execute worker.php? Is that just a setting?
Thanks!
If the worker isn't taking too long to fetch the feed, it will be fine. You can run multiple workers if required to process more than one at a time. I've got a system (currently using Amazon SQS, but I've done similar with BeanstalkD before), with up to 200 (or more) workers pulling from the queue.
A single worker script (the same script running multiple times) should be fine - the script can watch multiple tubes at the same time, and the first available job will be reserved. You can also use the stats-job command to see which tube a particular $job came from, or put some meta-information into the message if you need to tell one type from another.
A good example of running a worker is described here. I've also added supervisord (also, a useful post to get started) to easily start and keep a number of workers running per machine (I run shell scripts, as in the first link). I would limit the number of times the worker loops, and also pass a number into reserve() so it waits a few seconds (or more) for the next job to become available, rather than spinning out of control in a tight loop that never pauses, even when there is nothing to do.
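Putting that advice together, a worker along these lines is one way to do it (a sketch assuming the same older Pheanstalk API as the question, where reserve() accepts a timeout; the tube names and loop limit are made up for the example):
require 'vendor/autoload.php'; // or however your project loads Pheanstalk

$pheanstalk = new Pheanstalk('127.0.0.1');
$pheanstalk->watch('rss')->watch('email')->ignore('default'); // several tubes, one worker

for ($i = 0; $i < 500; $i++) {      // bounded loop: exit and let the shell script restart us
    $job = $pheanstalk->reserve(5); // block for up to 5 seconds instead of spinning
    if (!$job) {
        continue;                   // timed out, nothing to do yet
    }
    $data = json_decode($job->getData(), false);
    // ... process $data ...
    $pheanstalk->delete($job);
}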
Addendum:
The shell script would be run as many times as you need (the link shows how to have it re-run as required with exec $0). Whenever the PHP script exits, the shell script simply starts it again.
Apparently there's a Django app to show some stats, but it's trivial enough to connect to the daemon, get a list of tubes, and then get the stats for each tube - or just the counts.

Fork safely from PHP

From the get-go, let me say that I'm trying to avoid using pcntl_fork().
For this example, let's imagine that I'm trying to fork many instances of the 'dig' command-line application. In reality, the same script will be used for different command-line apps.
Right now I'm using php exec and appending & to the command so that bash runs it in the background.
E.g.:
exec("dig google.com &");
exec("dig yahoo.com &");
and so on...
This successfully creates multiple processes of dig running parallel.
The problem I'm having is that the number of processes is rising steadily until the system crashes. Essentially it's a fork bomb.
I tried to combat this by checking the number of running processes using ps ax | wc -l
and only launching more if it's below X.
E.g. (running in a loop):
If 80 processes are running, I'll launch another 20.
If 70 processes are running, I'll launch another 30.
The problem is that, even with this check in place, the number of processes continues to rise until the system crashes or it hits the operating system's maximum number of user processes.
Can anyone give me some hints on how I can fork effectively (en masse) without exhausting all the system resources? I can't see why this current method isn't working, to be honest.
Since you have a management process, I suggest you watch over the created subprocesses. Save the PID of every subscript you start:
$running[] = exec("process www.google.com & echo $!");
Here $! returns the PID of the backgrounded process, which gets added to a list in PHP. Then, in your management loop, just recheck whether the processes are still active:
do {
    foreach ($running as $i => $pid) {
        if (!posix_getpgid($pid)) { // no process group: the child has exited
            unset($running[$i]);
            // restart a replacement process here
        }
    }
    usleep(250000); // brief pause so the check doesn't spin at full speed
} while (count($running) > 0);
I don't think it's very elegant or reliable; pcntl_fork is often the better approach, but you didn't elaborate on your actual scripts. Maybe this works in your case, though.
You may also want to use uptime to check system load.
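PHP can also read the load average directly instead of shelling out to uptime; a small sketch (the threshold is arbitrary):
// sys_getloadavg() returns the 1, 5 and 15 minute load averages.
$load = sys_getloadavg();
while ($load[0] > 4.0) {      // 1-minute load above our (arbitrary) limit
    sleep(10);                // let the system settle before spawning more
    $load = sys_getloadavg();
}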

Have a PHP script run forever, access a queue

See also Having a PHP script loop forever doing computing jobs from a queue system, but that doesn't answer all my questions.
If I want to run a PHP script forever, accessing a queue and doing jobs:
What is the potential for memory problems? How to avoid them? (any flush functions or something I should use?)
What if the script dies for some reason? What would be a good method to automatically start it up again?
What would be the best basic approach to start the script. Since it runs forever, I don't need cron. But how do I start it up? (See also 2.)
Set the queue up as a cron script. Have it execute every 10 seconds (cron itself only resolves to one minute, so in practice that means a wrapper that sleeps, or several offset entries). When the script fires up, check if there's a lock file present (something like .lock). If there is, exit immediately. If not, create the .lock and start processing. If any errors occur, email/log them, delete the .lock, and exit. If there are no tasks, then exit.
I think this approach is ideal, since PHP isn't really designed to run a script for extended periods of time like you're asking. To avoid potential memory leaks, crashes, etc., repeatedly executing the script is a better approach.
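A sketch of that lock-file guard (the file name and the processQueue() helper are illustrative, not a fixed convention):
$lock = __DIR__ . '/.lock';

if (file_exists($lock)) {
    exit; // a previous run is still processing, so bail out immediately
}
touch($lock);

try {
    processQueue(); // hypothetical: pull pending tasks from the queue and run them
} catch (Exception $e) {
    error_log($e->getMessage()); // the "email/log these errors" step
}

unlink($lock); // release the lock so the next cron run can proceed

function processQueue() {
    // ... fetch and execute jobs; return when the queue is empty ...
}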
While PHP can access (publish to and consume from) MQs, if at all possible try to use a fully functional MQ application to do this.
A fully functional MQ application (in ruby, perl, .NET, java etc) will handle all of the concurrency, error logging, state management and scalability issues that you discuss.
Without going too far into state machines, it's at least a good idea to introduce states both for 'jobs' (example: an flv2avi conversion) and 'tasks' (flv2avi 1.flv).
In my script (Perl), zombie processes sometimes start to degrade the whole script's performance. It's a rare case, but it's native to the source, so the script should be able to stop reading the queue and let a new instance carry on with its tasks and jobs, while keeping as much of the running tasks' data as possible. Once the first instance is down to 1-2 tasks, it gets killed.
On start :
check for common errors (due to shutdown)
check for known errors (out of space, can't read input)
kill whatever may be killed and set its status to 'waiting'
start everything that is 'waiting'.
If you run piped jobs (vlc | ffmpeg, tail -f | grep), you can try to avoid doing too much I/O inside your program and instead fork() (a bad idea for PHP?) or just call /bin/bash -c "prog1 | prog2"; this saves a lot of CPU load.
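In PHP, that hand-off to the shell can be a one-liner (prog1 and prog2 stand in for the real pipeline, as above):
// Let bash run the whole pipeline so the data never streams through PHP;
// backgrounded and detached from this script's own output.
exec('/bin/bash -c ' . escapeshellarg('prog1 | prog2') . ' > /dev/null 2>&1 &');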
Start points: both /etc/rc.d and cron (check processes; run the first instance || run a second with a 'debug' argument).
