PHP - Activating Multiple Instances of a Process

I'm building a system that watches a queue and activates a set of tasks on a regular interval.
I'm interested in running multiple instances of my processing "bots" based on how many items are in the queue. So if there are 5 items I'll run two bots, and if there are 10 I'll run four.
I know how to run multiple instances from CLI (manually), but how would I do this as a function of my application? And how would I properly track the creation and destruction of these bots?

It seems like cron (*nix) or Task Scheduler (Windows) would be what you need.
http://en.wikipedia.org/wiki/Cron
http://msdn.microsoft.com/en-us/library/aa383614%28VS.85%29.aspx
These can run a PHP script that determines how many "bots" need to run, performs calculations, etc. Anything PHP is capable of.
Also, for running multiple bots in the background (after the main controller script has finished executing), you may want to look at PHP process forking.
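For illustration, a minimal sketch of such a cron-run controller; countQueueItems(), bot.php, the items-per-bot ratio, and the PID file are all hypothetical placeholders, not a definitive design:

$queued = countQueueItems();         // hypothetical: count pending queue items
$wanted = (int) ceil($queued / 2.5); // 5 items -> 2 bots, 10 items -> 4 bots
for ($i = 0; $i < $wanted; $i++) {
    $out = []; // reset so each exec() call yields only the new PID
    // Launch each bot detached from this controller; `echo $!` prints its PID.
    exec('nohup php bot.php >> /tmp/bots.log 2>&1 & echo $!', $out);
    // Record the PID so the controller can track creation/destruction later.
    file_put_contents('/tmp/bot.pids', $out[0] . "\n", FILE_APPEND);
}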

You might also want to look at Gearman (http://gearman.org/).

Related

Running an infinite loop in cron job

Suppose I have written a PHP script to run on my server using a cron job, and I want to use an infinite loop in that PHP script. Any ideas for running an infinite loop in a cron job?
Infinitely looping applications are usually called daemons. They are system services that offer some kind of constant processing and/or readiness to accept potential incoming processing activities.
Gearman is a system daemon you can install that can handle various tasks you give it. It's a complex tool that allows many things, but it could be used to implement what you need.
PHP::Gearman is a Gearman client that talks to the Gearman daemon and sends tasks to it, specifying the conditions under which the task must be executed.
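As a rough sketch of what that looks like with the pecl/gearman extension (the task name 'import_feed' and the payload are made up; assumes gearmand is listening on the default port 4730):

// Client side: submit a task for background execution.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('import_feed', json_encode(array('feed_id' => 42)));

// Worker side (a separate long-running CLI script): register a handler
// and block waiting for jobs from the daemon.
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('import_feed', function (GearmanJob $job) {
    $params = json_decode($job->workload(), true);
    // ... fetch and store the feed identified by $params['feed_id'] ...
});
while ($worker->work());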
The limitations that @Jeffrey emphasized about PHP are true because PHP was designed as a shared-nothing architecture (one page load equals one script execution - each page load works under its own data context).
Perhaps System Daemon (a PEAR package) may assist in overcoming some or all of the limitations mentioned above. I haven't used it, so I can't tell you much more about it, but it's as good a place to start as any.

How to set up Beanstalkd with PHP

Recently I've been researching the use of Beanstalkd with PHP. I've learned quite a bit but have a few questions about the setup on a server, etc.
Here is how I see it working:
1) I install Beanstalkd and any dependencies (such as libevent) on my Ubuntu server. I then start the Beanstalkd daemon (which should basically run at all times).
2) Somewhere in my website (such as when a user performs some action), tasks get added to various tubes within the Beanstalkd queue.
3) I have a bash script (such as the following one) that is run as a daemon and basically executes a PHP script:
#!/bin/sh
php worker.php
4) The worker script would have something like this to execute the queued-up tasks:
while (true) {
    // Block until a job is available on the "test" tube.
    $job = $this->pheanstalk->watch('test')->ignore('default')->reserve();
    $job_encoded = json_decode($job->getData(), false);
    $done_jobs[] = $job_encoded;
    $this->log('job: ' . print_r($job_encoded, true));
    // Remove the job from the queue once it has been processed.
    $this->pheanstalk->delete($job);
}
Now here are my questions based on the above setup (which correct me if I'm wrong about that):
Say I have the task of importing an RSS feed into a database or something. If 10 users do this at once, they'll all be queued up in the "test" tube. However, they'd then only be executed one at a time. Would it be better to have 10 different tubes all executing at the same time?
If I do need more tubes, does that then also mean that I'd need 10 worker scripts? One for each tube, all running concurrently with basically the same code except for the string literal in the watch() function.
If I run that script as a daemon, how does that work? Will it constantly be executing the worker.php script? That script loops until the queue is empty theoretically, so shouldn't it only be kicked off once? How does the daemon decide how often to execute worker.php? Is that just a setting?
Thanks!
If the worker isn't taking too long to fetch the feed, it will be fine. You can run multiple workers if required to process more than one at a time. I've got a system (currently using Amazon SQS, but I've done similar with BeanstalkD before), with up to 200 (or more) workers pulling from the queue.
A single worker script (the same script running multiple times) should be fine - the script can watch multiple tubes at the same time, and the first one available will be reserved. You can also use the job-stat command to see where a particular $job came from (which tube), or put some meta-information into the message if you need to tell each type from another.
A good example of running a worker is described here. I've also added supervisord (also, a useful post to get started) to easily start and keep running a number of workers per machine (I run shell scripts, as in the first link). I would limit the number of times it loops, and also pass a number into reserve() to have it wait a few seconds, or more, for the next job to become available, rather than spinning out of control in a tight loop that never pauses - even when there is nothing to do.
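A sketch of that loop-limit-plus-timeout pattern, assuming a Pheanstalk version where reserve() accepts a timeout in seconds and returns false when it expires (newer releases use reserveWithTimeout() instead); the limit of 100 is an arbitrary placeholder:

$maxJobs = 100; // arbitrary cap; exit afterwards and let the shell script restart us
for ($i = 0; $i < $maxJobs; $i++) {
    // Wait up to 10 seconds for a job instead of blocking forever.
    $job = $this->pheanstalk->watch('test')->ignore('default')->reserve(10);
    if ($job === false) {
        continue; // timed out with nothing to do; loop again without spinning
    }
    // ... process the job ...
    $this->pheanstalk->delete($job);
}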
Addendum:
The shell script would be run as many times as you need (the link shows how to have it re-run as required with exec $0). Whenever the PHP script exits, the shell script simply starts it again.
Apparently there's a Django app to show some stats, but it's trivial enough to connect to the daemon, get a list of tubes, and then get the stats for each tube - or just counts.

PHP Concurrency via Cron

I have a few scripts that need to run concurrently as separate processes. My plan is to have a cron job that executes multiple instances of these scripts at a set interval. Is this a good idea? What are the pros/cons to this approach? Are there any other options I need to consider?
Bottom line: I'm trying to mimic multithreading. Any race conditions will be handled in code (e.g. setting statuses in the DB, etc.). The scripts are supposed to do processing-intensive tasks (e.g. creating thumbnails, etc.).
You can use forking. The startup script would load all the default configurations and initializations, then fork child processes to do the processing. It could then monitor the processes to see if they are still running.
http://php.net/manual/en/function.pcntl-fork.php
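A minimal sketch of that fork-and-monitor pattern, assuming the pcntl extension and a CLI context; the child count and doWork() are hypothetical placeholders:

$children = array();
for ($i = 0; $i < 4; $i++) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork failed\n");
    } elseif ($pid === 0) {
        // Child: do its share of the processing, then exit.
        doWork($i); // hypothetical worker function
        exit(0);
    }
    $children[] = $pid; // parent: remember the child's PID
}
// Parent: monitor the children by waiting for each one to finish.
foreach ($children as $pid) {
    pcntl_waitpid($pid, $status);
}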
Well, if you need it as a cronjob, go ahead. If you want multiple processes, you most likely want to use pcntl_fork to create multiple instances of the same script.
Depending on how quickly you want to react to those jobs and if you're looking to do processor intensive tasks then you can also spread out that processing using a queuing system. Check out Gearman or beanstalkd with multiple workers per machine if you have multiple cores/processors.
Doesn't PHP have fork()? It's not really multithreading, but it is a basic way to get concurrent processes.
One con of using cron is that it will execute a copy of your script at the interval you set regardless of how many script processes are already running. This means the scripts need a way to communicate with each other so that a maximum of N scripts are kept running concurrently (excess scripts can just exit immediately).
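One common way to enforce that cap is a set of lock files: each cron-started copy tries to claim one of N slots with a non-blocking flock() and exits immediately if every slot is taken. A sketch, with the slot count and lock path as arbitrary placeholders:

$maxWorkers = 4; // arbitrary cap on concurrent copies
$lock = null;
for ($i = 0; $i < $maxWorkers; $i++) {
    $fh = fopen("/tmp/worker.$i.lock", 'c'); // create if missing, don't truncate
    if ($fh && flock($fh, LOCK_EX | LOCK_NB)) {
        $lock = $fh; // claimed slot $i; the lock is released when this process exits
        break;
    }
    if ($fh) {
        fclose($fh);
    }
}
if ($lock === null) {
    exit; // all slots busy; this excess copy exits immediately
}
// ... do the actual work while holding the lock ...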
An alternative to cron could be supervisord which will execute a configurable number of scripts and monitor each one so any that exit are respawned.

Cron jobs and php script question

I have 5-6 jobs to be done by cron, and I have separate PHP scripts for those jobs.
My question is: which is better, putting all the jobs in one PHP script, or keeping them in separate PHP files and adding them to the crontab separately?
Thanks!
I'd recommend separate scripts. Primarily it will be much easier to debug/diagnose issues when you get a "failed" email from cron if you know which script was running at the time.
It also allows you to run the other jobs even if one fails more easily.
It also gives you flexibility to change the timings of different jobs (e.g. suppose you suddenly need to run one every 15 mins, but all the others are hourly).
If they all have to run at the same time, then it may be better to write a wrapper script that invokes them all (or enclose them all in one file), and call the one script from cron.
This is especially true if there is an order dependency, such that one script must run before another. Separating these in cron is tricky at best.
If they have to run at different times, then obviously separate cron jobs are warranted.
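For the order-dependent case, the wrapper can be as simple as running each script in sequence and aborting on the first failure. A sketch (the job filenames are hypothetical):

$jobs = array('import.php', 'process.php', 'report.php'); // must run in this order
foreach ($jobs as $job) {
    passthru('php ' . escapeshellarg(__DIR__ . '/' . $job), $exitCode);
    if ($exitCode !== 0) {
        fwrite(STDERR, "$job failed with code $exitCode; aborting.\n");
        exit($exitCode); // later jobs depend on this one, so stop here
    }
}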
I'd suggest you keep all these jobs separate, as this gives you more flexibility, for example when setting times.
I have multiple cron tasks which each run multiple shell scripts which contain multiple php/etc. scripts.
Works very nicely.

PHP: Multithreaded PHP / Web Services?

Greetings All!
I am having some trouble figuring out how to execute thousands upon thousands of requests to a web service (eBay). I have a limit of 5 million calls per day, so there are no problems on that end.
However, I'm trying to figure out how to process 1,000 - 10,000 requests every minute to every 5 minutes.
Basically the flow is:
1) Get list of items from database (1,000 to 10,000 items)
2) Make a API POST request for each item
3) Accept return data, process data, update database
Obviously a single PHP instance running this in a loop would be impossible.
I am aware that PHP is not a multithreaded language.
I tried the CURL solution, basically:
1) Get list of items from database
2) Initialize multi curl session
3) For each item add a curl session for the request
4) Execute the multi cURL session
So you can imagine 1,000-10,000 GET requests occurring...
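For reference, the multi-cURL flow described above looks roughly like this; the URL list and the response handling are placeholders:

$urls = buildItemUrls(); // hypothetical: one API URL per database item
$mh = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}
// Drive all transfers concurrently until every handle has finished.
do {
    $status = curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh); // wait for activity instead of busy-looping
    }
} while ($running && $status == CURLM_OK);
foreach ($handles as $ch) {
    $response = curl_multi_getcontent($ch);
    // ... process $response and update the database ...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);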
This was OK; around 100-200 requests were occurring in about a minute or two. However, only 100-200 of the 1,000 items actually processed, so I'm thinking that I'm hitting some sort of Apache or MySQL limit?
But this does add latency; it's almost like performing a DoS attack on myself.
I'm wondering how you would handle this problem? What if you had to make 10,000 web service requests and 10,000 MySQL updates from the return data from the web service... And this needs to be done in at least 5 minutes.
I am using PHP and MySQL with the Zend Framework.
Thanks!
I've had to do something similar, but with Facebook, updating 300,000+ profiles every hour. As suggested by grossvogel, you need to use many processes to speed things up, because the script spends most of its time waiting for a response.
You can do this with forking, if your PHP install has support for forking, or you can just execute another PHP script via the command line.
exec('nohup /path/to/script.php >> /tmp/logfile 2>&1 & echo $!', $output);
$processId = (int) $output[0]; // `echo $!` prints the PID of the backgrounded script
You can pass parameters (getopt) to the php script on the command line to tell it which "batch" to process. You can have the master script do a sleep/check cycle to see if the scripts are still running by checking for the process id's. I've tested up to 100 scripts running at once in this manner, at which point the CPU load can get quite high.
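The sleep/check cycle might look like this; it assumes the posix extension, where posix_kill() with signal 0 only tests whether a process exists, and $pids holds the IDs collected from exec():

while (!empty($pids)) {
    foreach ($pids as $i => $pid) {
        // Signal 0 delivers nothing; it just reports whether the PID is alive.
        if (!posix_kill($pid, 0)) {
            unset($pids[$i]); // that batch script has finished
        }
    }
    sleep(5); // check again in a few seconds
}
// All batch scripts are done; the master can aggregate results here.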
Combine multiple processes with multi-curl, and you should easily be able to do what you need.
My two suggestions are (a) do some benchmarking to find out where your real bottlenecks are and (b) use batching and caching wherever possible.
Mysqli allows multiple-statement queries, so you could definitely batch those database updates.
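For instance, the per-item UPDATE statements could be concatenated and sent in a single round trip with mysqli::multi_query; the connection details, table, and $results array here are hypothetical:

$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
$sql = '';
foreach ($results as $id => $data) {
    // Escape each value and append one UPDATE per processed item.
    $sql .= sprintf("UPDATE items SET payload = '%s' WHERE id = %d;",
        $mysqli->real_escape_string($data), (int) $id);
}
$mysqli->multi_query($sql);
// multi_query requires draining every result before issuing another query.
while ($mysqli->more_results() && $mysqli->next_result());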
The http requests to the web service are more likely the culprit, though. Check the API you're using to see if you can get more info from a single call, maybe? To break up the work, maybe you want a single master script to shell out to a bunch of individual processes, each of which makes an api call and stores the results in a file or memcached. The master can periodically read the results and update the db. (Careful to rotate the data store for safe reading and writing by multiple processes.)
To understand your requirements better: must you implement your solution only in PHP, or can you interface a PHP part with another part written in another language?
If you cannot go for another language, try to perform this update as a PHP script that runs in the background rather than through Apache.
You can follow Brent Baisley advice for a simple use case.
If you want to build a robust solution, then you need to:
set up a representation of the actions in a database table that will be your process queue;
set up a script that pops this queue and processes your actions (sketched at the end of this answer);
set up a cron daemon that runs this script every X minutes.
This way you can have 1,000 PHP scripts running, using your OS's parallelism capabilities and not hanging when eBay is taking too long to respond.
The real advantage of this system is that you can fully control the firepower you throw at your task by adjusting :
the number of requests one PHP script makes;
the order / number / type / priority of the actions in the queue;
the number of scripts the cron daemon runs.
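A sketch of the queue-popping script from that list, using PDO against MySQL; the table layout, batch size, and claim token are hypothetical choices, and the UPDATE-then-SELECT order is what keeps two concurrent scripts from processing the same rows:

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$batch = 50; // how many requests one script handles per run
$token = uniqid('worker_', true);
// Atomically claim up to $batch pending actions for this script.
$claim = $pdo->prepare(
    "UPDATE action_queue SET claimed_by = ?, status = 'working'
     WHERE status = 'pending' ORDER BY priority DESC LIMIT $batch");
$claim->execute(array($token));
// Fetch only the rows this script claimed, then work through them.
$rows = $pdo->prepare('SELECT * FROM action_queue WHERE claimed_by = ?');
$rows->execute(array($token));
foreach ($rows->fetchAll(PDO::FETCH_ASSOC) as $action) {
    // ... perform the eBay request for $action and store the result ...
}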
Thanks everyone for the awesome and quick answers!
The advice from Brent Baisley and e-satis works nicely. Rather than executing the sub-processes using cURL like I did before, forking takes a massive load off; it also nicely gets around the issue of maxing out my Apache connection limit.
Thanks again!
It is true that PHP is not multithreaded, but it can certainly be setup with multiple processes.
I have created a system that resembles the one you are describing. It's running in a loop and is basically a background process. It uses up to 8 processes for batch processing and a single control process.
It is somewhat simplified because I do not need any communication between the processes. Everything resides in a database, so each process is spawned with the full context taken from the database.
Here is a basic description of the system.
1. Start control process
2. Check database for new jobs
3. Spawn child process with the job data as a parameter
4. Keep a table of the child processes to be able to control the number of simultaneous processes.
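Step 4 of that description might look like the following; fetchNextJob() and processJob() are hypothetical stand-ins for the database lookup and the actual batch work:

$maxChildren = 8; // matches the 8 batch processes described above
$running = array();
while (true) {
    // Reap any children that have exited, without blocking (WNOHANG).
    while (($pid = pcntl_waitpid(-1, $status, WNOHANG)) > 0) {
        unset($running[$pid]);
    }
    // Spawn a new child only while under the concurrency cap.
    if (count($running) < $maxChildren && ($job = fetchNextJob())) {
        $pid = pcntl_fork();
        if ($pid === 0) {
            processJob($job); // child: run the job with context from the DB
            exit(0);
        }
        $running[$pid] = $job; // parent: track the child in its process table
    }
    sleep(1);
}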
Unfortunately it does not appear to be a widespread idea to use PHP for this type of application, and I really had to write wrappers for the low-level functions.
The manual has a whole section on these functions, and it appears that there are methods for allowing IPC as well.
PCNTL has the functions to control forking/child processes, and Semaphore covers IPC.
The interesting part of this is that I'm able to fork off actual PHP code, not execute other programs.
