Recently I've been researching the use of Beanstalkd with PHP. I've learned quite a bit but have a few questions about the setup on a server, etc.
Here is how I see it working:
I install Beanstalkd and any dependencies (such as libevent) on my Ubuntu server. I then start the Beanstalkd daemon (which should basically run at all times).
Somewhere in my website (such as when a user performs some action), tasks get added to various tubes within the Beanstalkd queue (a rough sketch of what that producing code might look like is shown after the worker loop below).
I have a bash script (such as the following one) that runs as a daemon and basically executes a PHP script:
#!/bin/sh
php worker.php
The worker script would have something like this to execute the queued-up tasks:
while (1) {
    $job = $this->pheanstalk->watch('test')->ignore('default')->reserve();
    $job_encoded = json_decode($job->getData(), false);
    $done_jobs[] = $job_encoded;
    $this->log('job:' . print_r($job_encoded, 1));
    $this->pheanstalk->delete($job);
}
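For illustration, the producing side described above might look roughly like this. This is only a sketch, assuming a Pheanstalk 3.x-style API; the tube name and payload fields are placeholders:

<?php
// Producer sketch: queue an RSS-import task when a user performs an action.
// Assumes the pheanstalk/pheanstalk library (3.x) is installed via Composer;
// the tube name and payload fields are placeholders.
require 'vendor/autoload.php';

use Pheanstalk\Pheanstalk;

$pheanstalk = new Pheanstalk('127.0.0.1');

$payload = json_encode(array(
    'task'     => 'import_rss',
    'feed_url' => 'http://example.com/feed.xml',
    'user_id'  => 42,
));

$pheanstalk->useTube('test')->put($payload);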
Now here are my questions based on the above setup (which correct me if I'm wrong about that):
Say I have the task of importing an RSS feed into a database or something. If 10 users do this at once, they'll all be queued up in the "test" tube. However, they'd then only be executed one at a time. Would it be better to have 10 different tubes all executing at the same time?
If I do need more tubes, does that then also mean I'd need 10 worker scripts? One for each tube, all running concurrently with basically the same code except for the string literal in the watch() function.
If I run that script as a daemon, how does that work? Will it constantly be executing the worker.php script? That script theoretically loops until the queue is empty, so shouldn't it only be kicked off once? How does the daemon decide how often to execute worker.php? Is that just a setting?
Thanks!
If the worker isn't taking too long to fetch the feed, it will be fine. You can run multiple workers if required to process more than one at a time. I've got a system (currently using Amazon SQS, but I've done similar with BeanstalkD before), with up to 200 (or more) workers pulling from the queue.
A single worker script (the same script running multiple times) should be fine - the script can watch multiple tubes at the same time, and the first one available will be reserved. You can also use the job-stat command to see where a particular $job came from (which tube), or put some meta-information into the message if you need to tell each type from another.
A good example of running a worker is described here. I've also added supervisord (also, a useful post to get started) to easily start and keep running a number of workers per machine (I run shell scripts, as in the first link). I would limit the number of times it loops, and also put a number into the reserve() so it waits a few seconds, or more, for the next job to become available, rather than spinning out of control in a tight loop that never pauses even when there is nothing to do.
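For example, a bounded worker loop along those lines could look like the following. This is only a sketch, assuming Pheanstalk 3.x, where reserve($timeout) returns false when no job arrives in time; the tube names and loop limit are arbitrary:

<?php
// Worker sketch: watch several tubes, process a limited number of jobs,
// then exit and let the wrapping shell script (or supervisord) restart it.
require 'vendor/autoload.php';

use Pheanstalk\Pheanstalk;

$pheanstalk = new Pheanstalk('127.0.0.1');
$pheanstalk->watch('rss-import')->watch('emails')->ignore('default');

$maxJobs = 100;                          // arbitrary cap before the script exits
for ($i = 0; $i < $maxJobs; $i++) {
    $job = $pheanstalk->reserve(10);     // wait up to 10 seconds for a job
    if ($job === false) {
        continue;                        // nothing to do; no tight spinning
    }

    $stats = $pheanstalk->statsJob($job);      // includes which tube it came from
    $data  = json_decode($job->getData(), true);

    // ... process $data according to $stats['tube'] ...

    $pheanstalk->delete($job);
}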
Addendum:
The shell script would be run as many times as you need (the link shows how to have it re-run as required with exec $0). Whenever the PHP script exits, the shell script simply starts it again.
Apparently there's a Django app to show some stats, but it's trivial enough to connect to the daemon, get a list of tubes, and then get the stats for each tube - or just counts.
Related
I need to write a server-side program that lives on the server, and is checking a database consistently for new entries.
When a new entry shows up in the database, the program should process the data and put the results somewhere else.
It is important to highlight that the process isn't instigated by new entries showing up, but by the program checking for new entries on its own.
Some people I've spoken to brought up cron jobs; is that the right solution for me? I see that it has a limitation: it won't run more often than once a minute. I was hoping for the program to run every 5 seconds. Would I be better off writing a shell script, or is that a bootleg fix?
I'm not sure if this is conventional (?) but...
Use a database trigger on INSERT that runs an external program (PHP, Python, .. whatever). Which database are you using? I think this post is old but might be of help: http://crazytechthoughts.blogspot.co.uk/2011/12/call-external-program-from-mysql.html
There is a technique I've frequently used when dealing with queues that I've had to keep processing:
#!/bin/sh
php -f checkDBAndAct.php
sleep 5
exec $0
The exec $0 part starts the script running again, replacing itself in memory, so it will run forever without issues. Any memory the PHP script uses is cleaned up whenever it exits, so that's not a problem either.
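The checkDBAndAct.php that the loop polls could be as simple as the following sketch; the PDO connection details, table name, and columns are all made up for illustration:

<?php
// checkDBAndAct.php - polled every 5 seconds by the shell loop above.
// The 'entries' table and its columns are hypothetical placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$rows = $pdo->query('SELECT id, payload FROM entries WHERE processed = 0')
            ->fetchAll(PDO::FETCH_ASSOC);

foreach ($rows as $row) {
    // ... process $row['payload'] and put the results somewhere else ...

    $mark = $pdo->prepare('UPDATE entries SET processed = 1 WHERE id = ?');
    $mark->execute(array($row['id']));
}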
A simple line will start the shell script and put it into the background:
cd /x/y/z ; nohup ./loopToProcessDB.sh &
or it can be similarly started when the machine boots by various means (such as cron's '@reboot ...').
-- from https://stackoverflow.com/a/2686100/6216
An extended version is on http://PHPscaling.com and https://gist.github.com/alister/1386212
Though I'd use an actual queue system, rather than a DB, as there are a number of downsides to bending a database to this task.
In the past, I ran a bunch of scripts each as a separate cron job. Now I'd like to run a controller script with one cron job, then have that call the scripts separately (and in parallel, all at the same time), so I don't have to create a new cron job every time I add another script.
I looked up pcntl_fork() but we don't have that installed. Can fsockopen() do this as well?
A few questions:
I saw this example, http://phplens.com/phpeverywhere/?q=node/view/254, that uses fsockopen(). Will this allow me to run PHP scripts in parallel? Note, the scripts don't interact, but I would still like to know if any of them exited prematurely with an error.
Secondly, the scripts I'm running aren't externally accessible; they are internal only. The script was previously run like so: php -f /path/to/my/script1.php. It's not a web-accessible path. Would the example in #1 work with this, or only with web-accessible paths?
Thanks for any advice you can offer.
You can use proc_open to run multiple processes without waiting for each process to finish.
You will have a process handle, you can terminate each process at any time and you can read the standard output of each process.
You can also communicate via pipes, which is optional.
Passing something like php /your/path/to/script.php param1 "param2 x" as the first parameter starts a separate PHP process.
proc_open (see Example #1)
Ultimately you will want to use an infinite while loop + usleep (or sleep) to avoid maxing out the CPU. Break when all processes finish, or after you have killed them.
Edit: you can tell whether a process has exited prematurely.
Edit 2: a simpler way of doing the above is popen.
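Here is a rough sketch of that approach under the assumptions above; the script paths are placeholders, and note that proc_get_status() only reports the real exit code the first time it sees a stopped process, so the value is cached:

<?php
// Run several internal PHP scripts in parallel with proc_open and report
// any that exit with a non-zero status. The script paths are placeholders.
$scripts = array('/path/to/my/script1.php', '/path/to/my/script2.php');

$procs = array();
foreach ($scripts as $script) {
    $spec  = array(1 => array('pipe', 'w'), 2 => array('pipe', 'w'));
    $pipes = array();
    $proc  = proc_open('php -f ' . escapeshellarg($script), $spec, $pipes);
    stream_set_blocking($pipes[1], false);
    stream_set_blocking($pipes[2], false);
    $procs[$script] = array('proc' => $proc, 'pipes' => $pipes, 'exit' => null, 'out' => '');
}

do {
    $running = 0;
    foreach ($procs as $script => &$p) {
        if ($p['exit'] !== null) {
            continue;                                       // already finished
        }
        $p['out'] .= stream_get_contents($p['pipes'][1]);   // drain stdout so the child never blocks
        $status = proc_get_status($p['proc']);
        if ($status['running']) {
            $running++;
        } else {
            $p['exit'] = $status['exitcode'];               // cache it; later calls return -1
            if ($p['exit'] !== 0) {
                echo "$script exited prematurely with code {$p['exit']}\n";
            }
        }
    }
    unset($p);
    usleep(200000);                                         // don't max out the CPU while polling
} while ($running > 0);

foreach ($procs as $p) {
    fclose($p['pipes'][1]);
    fclose($p['pipes'][2]);
    proc_close($p['proc']);
}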
Please correct me if I'm wrong, but if I understand things correctly, the solution Tiberiu-Ionut Stan proposed implies that starting the processes with proc_open and waiting for them to finish will not be run as a cron script, but is part of a running program/service, right?
As far as I understand the cron jobs, the controller script user920050 was thinking of using would be started by cron on a schedule and each new instance would launch the processes all over again, do the waiting for them to finish and probably run in parallel with other cron-launched instances of the controller script.
I am developing a video upload site and I have run into a dilemma: uploaded videos need to be converted into the FLV format in order to be displayed to a visitor, but if I execute the command within the script, the script will hang for about 10-15 minutes while FFmpeg converts the video.
I had an idea to insert a record into the database indicating the file needs to be processed, then use a cron job set to every 5 minutes to select the records that need processing, process them, and then update the database to show they have been processed. My worry is executing too many processes and the server crashing under the strain, so has anyone got a solution to this, or a way to improve the process I have in mind?
Okay, this is now what I have in mind: the user uploads a video and a row is inserted into the database indicating the video needs to be processed. A cron job set to every 5 minutes checks what needs to be processed and what is currently being processed. Say I allow a maximum of five processes at one time: the script checks how many videos need processing and how many are being processed, and if that is fewer than five, it updates a record to indicate that video is being processed. Once the video has been processed, it updates the record to indicate that, and the cron job starts again. Any thoughts?
Gearman is a good solution for this kind of problem: it lets you instantly dispatch a job and have any number of workers (which may be on different servers) available to fulfill it.
To start with you can run a few workers on the same server, but if you start to run into load issues then you can just fire up another server with some more workers, so it's horizontally scalable.
If you're using PHP-FPM then you can make use of fastcgi_finish_request(), as documented in the FastCGI Process Manager (FPM) section on PHP.net:
fastcgi_finish_request() - special function to finish request and flush all data while continuing to do something time-consuming (video converting, stats processing etc.);
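For example, in the upload handler it could be used along these lines (the ffmpeg command and paths are placeholders):

<?php
// Respond to the user immediately, then keep working after the response is sent.
echo 'Upload received; your video is being converted.';
fastcgi_finish_request();      // flushes the response to the browser (PHP-FPM only)

// This part runs after the user already has their page.
shell_exec('ffmpeg -i ' . escapeshellarg('/uploads/raw/video123.avi')
         . ' ' . escapeshellarg('/uploads/flv/video123.flv'));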
If you're not using PHP-FPM or want something more advanced then you might consider using a queue manager like Gearman which is perfectly suited to the scenario you're describing. The advantage of using Gearman over running a process with shell_exec is you can take a look at how many jobs are running / how many are left and check their statuses. You also make scaling much easier as it's now trivial to add job servers:
$worker->addServer("10.0.0.1");
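For illustration, the dispatching and worker sides might look roughly like this with the PECL gearman extension; the function name, paths, and ffmpeg command are placeholders:

<?php
// dispatch.php - runs inside the web request; fire and forget.
$client = new GearmanClient();
$client->addServer('127.0.0.1');                 // add more job servers to scale out
$client->doBackground('convert_video', json_encode(array(
    'src' => '/uploads/raw/video123.avi',        // placeholder paths
    'dst' => '/uploads/flv/video123.flv',
)));

<?php
// worker.php - a long-running CLI process; run as many copies as the box can handle.
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1');
$worker->addFunction('convert_video', function (GearmanJob $job) {
    $args = json_decode($job->workload(), true);
    shell_exec('ffmpeg -i ' . escapeshellarg($args['src'])
             . ' ' . escapeshellarg($args['dst']));
});
while ($worker->work());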
I love this class (see the specific comment) in the PHP manual: http://www.php.net/manual/en/function.exec.php#88704
Basically, it lets you spin off a background process on *nix systems. It returns a PID, which you can store in the session. When you reload the page to check on it, you simply recreate the ForkedProcess class with the saved PID, and you can check its status. If it's complete, the process should be done.
It doesn't allow for much error checking, but it's incredibly lightweight.
If you expect a lot of traffic you should seriously consider a dedicated server.
On a single server, you can use shell_exec along with the UNIX nohup command to get the PID of the process.
function run_in_background($Command, $Priority = 0)
{
    // nohup detaches the command from the web request; "echo $!" captures
    // the PID of the backgrounded process so it can be tracked later.
    if ($Priority) {
        $PID = shell_exec("nohup nice -n $Priority $Command 2> /dev/null & echo $!");
    } else {
        $PID = shell_exec("nohup $Command 2> /dev/null & echo $!");
    }
    return $PID;
}

function is_process_running($PID)
{
    // "ps" prints a header line plus one line per matching process,
    // so two or more lines means the process is still alive.
    exec("ps $PID", $ProcessState);
    return count($ProcessState) >= 2;
}
A full description of this technique is here: http://nsaunders.wordpress.com/2007/01/12/running-a-background-process-in-php/
You could perhaps put the list of PIDs in a MySQL table and then use your cron job every 5 mins to detect when a video is complete and update the relevant values in the database.
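Tying those pieces together, the upload handler and the 5-minute cron script could look roughly like this; the videos table, its columns, and the paths are invented for illustration:

<?php
// Assumes the run_in_background() and is_process_running() helpers above are included,
// plus a hypothetical 'videos' table with id, pid and status columns.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

// In the upload handler: kick off the conversion and remember the PID.
$pid = (int) trim(run_in_background('ffmpeg -i /tmp/upload123.avi /var/www/flv/upload123.flv'));
$pdo->prepare("UPDATE videos SET status = 'converting', pid = ? WHERE id = ?")
    ->execute(array($pid, 123));

// In the cron script (every 5 minutes): mark finished conversions as complete.
foreach ($pdo->query("SELECT id, pid FROM videos WHERE status = 'converting'") as $video) {
    if (!is_process_running($video['pid'])) {
        $pdo->prepare("UPDATE videos SET status = 'done' WHERE id = ?")
            ->execute(array($video['id']));
    }
}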
You can call ffmpeg using system(), send the output to /dev/null, and background the command with &; this will make the call return right away, effectively running the conversion in the background.
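For example (the paths are placeholders):

<?php
// The output redirection plus the trailing & are what let system() return immediately.
system('ffmpeg -i ' . escapeshellarg('/tmp/in.avi') . ' '
     . escapeshellarg('/var/www/flv/out.flv') . ' > /dev/null 2>&1 &');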
Spawn a couple of worker processes which consume messages from a message queue such as beanstalkd. This way you can control the number of concurrent tasks (conversions) and also don't have to pay the price of spawning processes (because the processes keep running in the background).
I think it would be even faster if you coded the worker in C and used Redis as your message queue. Redis has a very good C client library named Hiredis. I don't think this would be insanely difficult to accomplish.
I'm building a system that watches a queue and activates a set of tasks on a regular interval.
I'm interested in running multiple instances of my processing "bots" based on how many items are in the queue. So if there are 5 items I'll run two bots, and if there are 10 I'll run four.
I know how to run multiple instances from CLI (manually), but how would I do this as a function of my application? And how would I properly track the creation and destruction of these bots?
It seems like cron (*nix) or task scheduler (windows) would be what you need.
http://en.wikipedia.org/wiki/Cron
http://msdn.microsoft.com/en-us/library/aa383614%28VS.85%29.aspx
These can run a PHP script that determines how many "bots" need to run, does the calculations, etc. Anything PHP is capable of.
Also, for running the multiple bots in the background (after the main controller script has finished executing) you may want to look at PHP process forking.
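As a rough sketch of those suggestions, the cron-launched controller could decide on a bot count from the queue size and fork that many children. The queue count, the 5-items-to-2-bots ratio, and bot.php are all made-up placeholders, and this assumes the pcntl extension is available:

<?php
// controller.php - run from cron; forks one child per bot it decides to run.
$items = 10;                             // however many items your queue reports (placeholder)
$bots  = (int) ceil($items * 2 / 5);     // e.g. 5 items -> 2 bots, 10 items -> 4 bots

$children = array();
for ($i = 0; $i < $bots; $i++) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("could not fork\n");
    } elseif ($pid === 0) {
        // Child: replace this process with a bot and never return.
        pcntl_exec(PHP_BINARY, array('/path/to/bot.php'));
        exit(1);
    }
    $children[] = $pid;                  // parent tracks the bots it created
}

// Parent: wait for each bot so their creation and destruction can be logged.
foreach ($children as $pid) {
    pcntl_waitpid($pid, $status);
}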
You might also want to look at gearman ( http://gearman.org/ )
I have a simple messaging queue set up and running using the Zend_Queue object hierarchy. I'm using a Zend_Queue_Adapter_Db back-end. I'm interested in using this as a job queue, to schedule things for processing at a later time. They're jobs that don't need to happen immediately, but should happen sooner rather than later.
Is there a best-practices/standard way to set up your infrastructure to run jobs? I understand the code for receiving a message from the queue, but what's not so clear to me is how to run the program that does the receiving. A cron job that receives n messages on the command line, run once a minute? A cron job that fires off multiple web requests, each web request running the receiver script? Something else?
Tangential bonus question. If I'm running other queries with Zend_Db, will the message queue queries be considered part of that transaction?
You can do it like a thread pool. Create a command line php script to handle the receiving. It should be started by a shell script that automatically restarts the process if it dies. The shell script should not start the process if it is already running (use a $pid.running file or similar). Have cron run several of these every 1-10 minutes. That should handle the receiving nicely.
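The command-line receiver itself can stay small; here is a sketch with Zend_Queue, where the queue name, connection details, and the work done per message are placeholders:

<?php
// receiver.php - started by the shell-script wrapper described above.
require_once 'Zend/Queue.php';

$queue = new Zend_Queue('Db', array(
    'name'          => 'myqueue',                // placeholder queue name
    'driverOptions' => array(
        'host'     => 'localhost',
        'dbname'   => 'mydb',
        'username' => 'user',
        'password' => 'pass',
        'type'     => 'pdo_mysql',
    ),
));

// Grab a batch of messages, process them, and acknowledge each one.
foreach ($queue->receive(10) as $message) {
    // ... do the deferred work described by $message->body ...
    $queue->deleteMessage($message);
}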
I wouldn't have the cron fire a web request unless your cron is on another server for some strange reason.
Another way to use this would be to have some background process creating data, and web users consume it as they naturally browse the site. A report generator might work this way. Company-wide reports are available to all users, but you don't want them all generating this db/time-intensive report. So you create a queue and process one at a time, possibly removing duplicates. All users can view the report(s) when ready.
According to the docs, it doesn't look like Zend_Queue even uses the same connection as your other Zend_Db queries. But of course the best way to find out is to make a simple test.
EDIT
The multiple lines in the cron are for concurrency: each line represents a worker for the pool. I was not clear; you don't want the PID as the identifier, you want to pass the identifier as a parameter.
/home/byron/run_queue.sh Process1
/home/byron/run_queue.sh Process2
/home/byron/run_queue.sh Process3
The bash script would check for the $process.running file; if it finds it, exit.
Otherwise:
Create the $process.running file.
Start the PHP process. Block/wait until it finishes.
Delete the $process.running file.
This allows the PHP script to die without the pool losing a worker.
If the queue is empty, the PHP script exits immediately and is started again by the next invocation of cron.