Starting and stopping workers in Gearman PHP - php

I've successfully configured and used Gearman and its PECL PHP extension. I'm using it to execute a long-running process involving long SQL queries in the background. I'm using Yii, by the way, if that detail helps.
Here's how I use it:
public function actionProcessWithGearman()
{
    // Launch a visible worker terminal, detached so shell_exec doesn't block
    $output = shell_exec('gnome-terminal -e "php workers/worker.php" > /dev/null 2>/dev/null &');

    $client = new GearmanClient();
    $client->addServer();
    $result = $client->doBackground('executeJob', $params); // $params: job payload (placeholder)
}
Some details:
As you'll notice, I launch a gnome-terminal first so that I can see the process, rather than going directly with the php command. I also redirect to /dev/null so that shell_exec no longer waits for a response. The worker then wakes up and runs the job.
Problem:
My problem arises when this action is executed several times, or by several users on different clients: as a result, multiple terminals running worker.php are instantiated.
How do I keep just one worker? And even if I can have several workers for several users on different clients, how do I shut a worker down every time its task is finished?

You can try checking returnCode() and jobStatus().
A sample here - http://php.net/manual/en/gearmanclient.dobackground.php
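For illustration, a minimal sketch along those lines (the 'executeJob' function name and $params payload are placeholders from the question):

<?php
// Submit a background job, then poll the job server for its status.
$client = new GearmanClient();
$client->addServer('127.0.0.1');

$handle = $client->doBackground('executeJob', json_encode($params)); // $params: placeholder payload
if ($client->returnCode() !== GEARMAN_SUCCESS) {
    exit("Bad return code\n");
}

// jobStatus() returns array(known, running, numerator, denominator)
do {
    sleep(1);
    $status = $client->jobStatus($handle);
} while ($status[0] && $status[1]); // still known to the server and running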

If you have a queue system in place, the workers should always be running in the background, ready to catch new jobs.
I think there is a misunderstanding of the queue-system logic: you should just add a new job for the worker to process, not start a worker whenever you want a job done.
Gearman usually goes well with Supervisor. Supervisor will control the worker and make sure it is always available. If you need more than one job running at the same time, you can always add another worker.
Just search for gearman + supervisor + php; there are a lot of articles explaining how to set that up.

Letting supervisord take care of launching and monitoring the PHP processes is the way to go.
This approach has a few pluses: for instance, you can let supervisord start workers automatically, even multiple instances of them. You can even control supervisord processes through XML-RPC if you prefer to manage your workers from your own web interface (http://supervisord.org/api.html).
Sample supervisord conf:
[program:worker_development]
process_name=worker_%(process_num)s
command=php worker.php
directory=/var/www/myproject
autorestart=true
user=ubuntu
redirect_stderr=false
numprocs=2
startretries=5

Related

Laravel 5.6. Stop a worker AFTER job execution with supervisor

Is it possible to send a stop signal to the worker in such a way that it will stop only AFTER processing the current job?
Currently I have a job that takes some time AND can't be interrupted, because I have only one try/attempt.
Sometimes I need to stop workers to redeploy my code. Is there a way to stop Laravel's worker only after it finishes the current job and before it starts a new one?
I'm using supervisor for restarting the queue workers.
Because currently on each deploy I'm losing 1 job and my client loses money :(
P.S.
This is NOT a duplicate of Laravel Artisan CLI safely stop daemon queue workers, because that question was about the Artisan CLI and I'm using supervisor.
autorestart=true in supervisor + php artisan queue:restart solves the issue.
There is a built-in feature for this:
php artisan queue:restart
This command will instruct all queue workers to gracefully "die" after they finish processing their current job so that no existing jobs are lost. Since the queue workers will die when the queue:restart command is executed, you should be running a process manager such as Supervisor to automatically restart the queue workers.
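For reference, a Supervisor program entry for a Laravel worker typically looks something like the conf above; the paths, program name, and options here are placeholders:

[program:laravel_worker]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/myproject/artisan queue:work --tries=1
directory=/var/www/myproject
autostart=true
autorestart=true
user=www-data
numprocs=2

With autorestart=true, the workers that queue:restart gracefully kills are brought straight back up on the new code.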
Supervisord has an XML-RPC API which you could use from your PHP code. I suggest you use Zend's XML-RPC client.
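As a minimal dependency-free sketch, here is the same call made with the PECL xmlrpc extension instead of Zend's client (this assumes supervisord's [inet_http_server] is enabled on port 9001 and reuses the program name from the sample conf above):

<?php
// Stop one supervised worker via supervisord's XML-RPC API.
$request = xmlrpc_encode_request('supervisor.stopProcess', array('worker_development:worker_0'));

$context = stream_context_create(array('http' => array(
    'method'  => 'POST',
    'header'  => 'Content-Type: text/xml',
    'content' => $request,
)));

$response = file_get_contents('http://127.0.0.1:9001/RPC2', false, $context);
var_dump(xmlrpc_decode($response)); // bool(true) on success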

Execute a script for n minutes in a user-defined range of time

I want to execute a PHP or Python script on a server, with values from a database, for n minutes a day within a user-defined range of time, which I would like to store in a database too. So my question is: how can I achieve this in the most performant and scalable way?
A user should set a range of time in which the script should be executed; I couldn't figure out how to handle the case where multiple users select the same range of time.
Thanks for your ideas.
Check out cron and gearman.
Both are services that run on Linux and can execute either PHP or Python scripts in a predictable and scalable way.
The basic structure of your mechanism would be...
Job initiator - this script would be invoked automatically by cron (every minute max). It would execute very quickly and then exit. You would use this to check the DB for jobs that should be started during that minute, and pass them off to Gearman.
Job executor - this script executes a single job. Since Gearman handles concurrency for you, you only have to worry about doing a single long-running process here.
You can easily configure Gearman to handle many concurrent jobs. Also, there are good Gearman packages for both Python and PHP (I've used them both with great success).
Here is some pseudocode to help you understand how this would work:
initiate.php
<?php
// initiate.php - invoked by cron every minute; queues any workloads due now
$gearmanClient = new GearmanClient();
$gearmanClient->addServer();

$workloads = fetchWorkloads(time()); // pseudocode: fetch jobs due this minute from the DB

foreach ($workloads as $workload) {
    $gearmanClient->doBackground('execute', json_encode($workload));
}
execute.php
<?php
// execute.php - long-running worker; Gearman hands it one job at a time
$gearmanWorker = new GearmanWorker();
$gearmanWorker->addServer();
$gearmanWorker->addFunction('execute', 'executeMethod');

while ($gearmanWorker->work()) {
    // Handle any errors here
}

function executeMethod(GearmanJob $job)
{
    $workload = json_decode($job->workload());

    // Keep doing the task until the user-defined stop time passes
    while (time() < $workload->stopTime) {
        // Do your task for an amount of time here
    }
}
Again, this is just pseudocode. You will need to flesh it out based on your requirements. Obviously you'll also need to learn, install, and/or configure cron and Gearman.
Your cron entry might look like this:
* * * * * www-data php /var/www/my-app/initiate.php
For Gearman, I would recommend using supervisor to ensure that your workers are restarted, etc when they inevitably encounter problems. Your supervisor config might look like this:
[program:myProgram]
user=www-data
command=php /var/www/my-app/execute.php
numprocs=1
stdout_logfile=/var/log/supervisord.log
autostart=true
autorestart=true
stopsignal=KILL

Adding a new job with GearmanManager?

I'm still new to the whole Gearman and GearmanManager cycle. I have a working server and have verified my jobs run if they're already in my queue (MySQL table) when the server starts. However, I need to be able to add a new job to the queue from PHP, and from inside of the worker if possible.
Right now I have a job that I will be creating on deployment of our new codebase. This will be the first job to run, and its purpose is to gather some data for reports and store it.
This needs to run every hour on the hour, so I want to utilize the when_to_run column. I've been mulling over the documentation for Gearman, but I'm still confused about how I'm actually supposed to add a job to the queue.
I've tried running:
<?php
$gm = new GearmanClient;
$gm->addServer();
$gm->doBackground('Metadata_Ingest_Report', '', com_create_guid());
On a side note, yes, I do have php-pecl-uuid installed.
The above code just hangs and doesn't do anything. No job is added to the DB, nothing happens.
This is due to me not fully understanding how a job gets sent, and I'm doing my best to RTM, but I'm not having any luck.
So if there is something you can point me to, or if someone has some time to explain how I'm supposed to setup and add jobs to the MySQL queue so GearmanManager's workers pick them up that would be awesome.
Edit: So it appears that you have to call $gm->addServer('127.0.0.1'). According to the documentation 127.0.0.1 is supposed to be the default, but that does not appear to be the case running PHP 5.4.11. I can now get the tasks to run if I call $gm->runTasks() after $gm->addTask(). I would expect to just have to call $gm->addTask() and the task would be added to the DB and GearmanManager would see it and spool it up. Still digging...
Best regards,
Andrew
So it appears that the when_to_run functionality is not exposed in pecl-gearman. Because of this, we are unable to schedule jobs for the future using its built-in methods. It also appears that the library does not create the DB records like it should (I'd assume this may actually be Gearmand not offloading the jobs to the DB before they're run).
To get around this we have decided to do the following.
Scheduling Future Jobs
1. Manually INSERT the job into gearman_queue.
2. Run a cron every minute that pings the queue table and loads jobs with when_to_run <= time().
3. Fire those jobs via addTask($function, $payload) and runTasks(). The $payload contains the UUID from the DB as well.
4. GearmanManager picks up the job and hands the payload off to the respective worker.
5. The worker runs, then on completion removes the item from the DB with a DELETE.
Running a Job Immediately
1. Manually INSERT the job into gearman_queue with a when_to_run of NULL.
2. Run addTask($function, $payload) and runTasks(). The $payload contains the UUID from the DB as well.
3. GearmanManager picks up the job and hands the payload off to the respective worker.
4. The worker runs, then on completion removes the item from the DB with a DELETE.
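A minimal sketch of the cron-driven dispatcher from step 2 above (the gearman_queue column names here are assumptions for illustration):

<?php
// Runs from cron every minute: load due jobs and hand them to GearmanManager's workers.
$pdo = new PDO('mysql:host=127.0.0.1;dbname=myapp', 'db_user', 'db_pass');

$gm = new GearmanClient();
$gm->addServer('127.0.0.1');

$stmt = $pdo->prepare(
    'SELECT unique_key, function_name, data FROM gearman_queue
     WHERE when_to_run IS NOT NULL AND when_to_run <= :now'
);
$stmt->execute(array(':now' => time()));

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    // The payload carries the UUID so the worker can DELETE the row when done.
    $gm->addTask($row['function_name'], json_encode($row));
}

$gm->runTasks(); // blocks until all added tasks have been handled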
Conclusion
Gearmand Job Server, GearmanManager, and pecl-gearman all seem to be out of sync when it comes to what is supported and how it's done. For the most part, I think this issue lies within the core of pecl-gearman talking to Gearmand.
I have also opened a feature request on the pecl-gearman project for when_to_run: https://bugs.php.net/bug.php?id=64120
Before adding a task, you need to start the Gearman server.
For Linux:
/usr/bin/gearmand -d -L 127.0.0.1 -q libdrizzle \
    --libdrizzle-user=your_db_user --libdrizzle-password=your_db_pass \
    --libdrizzle-db=your_db_name --libdrizzle-mysql -vvvv
After adding the task, create a worker.
worker.php looks like this:
<?php
$worker = new GearmanWorker();
$worker->addServer();
// Register the function, otherwise the worker never receives the job
$worker->addFunction('Metadata_Ingest_Report', 'Metadata_Ingest_Report');

while ($worker->work());

function Metadata_Ingest_Report(GearmanJob $job)
{
    // do something
}
and start this worker:
/usr/bin/php worker.php

Monitoring php Scripts through Gearman

I am trying to run my PHP scripts from Gearman worker code, but I also want to monitor them: if they take more than the expected run time, I want to kill those scripts. Each script has to run on a schedule (say every 10 minutes), and the Gearman client picks the scripts that are ready to run and sends them to a Gearman worker.
I tried the following options:
1) Using an independent script, a normal PHP script which monitors the running processes.
But this normal script will not inform Gearman that the job got killed, so Gearman thinks the killed job is still running.
That made me think I have to synchronize the monitoring and the running of the PHP scripts in the same worker.
Also, these jobs need to be restarted, and the client takes care of that.
2) I am running my PHP scripts using the following command:
cd /home/amehrotra/include/core/background; php $workload;
(This is blocking; it does not go to the next line until the script finishes execution.)
I tried using exec, but exec does not execute the scripts:
exec ("/usr/bin/php /home/amehrotra/include/core/background/$workload >/dev/null &");
3) Running two workers, one to run the PHP script and another to monitor it, but the Gearman client does not connect to two workers.
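(For the kill-on-timeout part, one illustrative approach not from the original thread: proc_open lets the same worker poll the child process and terminate it after a time limit.)

<?php
// Run a script and kill it if it exceeds a time limit; the worker itself
// can then report the failure to Gearman. A sketch; paths are placeholders.
function runWithTimeout($cmd, $limitSeconds)
{
    $pipes = array();
    $proc = proc_open($cmd, array(1 => array('file', '/dev/null', 'w')), $pipes);

    $start = time();
    while (true) {
        $status = proc_get_status($proc);
        if (!$status['running']) {
            return $status['exitcode']; // script finished on its own
        }
        if (time() - $start > $limitSeconds) {
            proc_terminate($proc, 9);   // kill the overrunning script
            return -1;                  // tell the caller it was killed
        }
        usleep(200000); // poll five times per second
    }
}

// e.g. inside the worker callback:
// $code = runWithTimeout("php /home/amehrotra/include/core/background/$workload", 600);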
Not the coolest plan, but try using a database as the central place where everything is controlled.
It will take some resources and time from your workers, but that is the cost of making it manageable.
Each worker will need to check for commands (stop/restart) assigned to it via the DB, and it can also save some data into the DB so you can see what is happening.
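A minimal sketch of that idea (the worker_status and worker_commands tables and their columns are assumptions for illustration):

<?php
// Worker that reports status to the DB and checks for commands between jobs.
$pdo = new PDO('mysql:host=127.0.0.1;dbname=jobs', 'db_user', 'db_pass');

$worker = new GearmanWorker();
$worker->addServer();
$worker->addFunction('run_script', function (GearmanJob $job) use ($pdo) {
    // Record the start time so a monitor can detect overrunning scripts.
    $pdo->prepare('REPLACE INTO worker_status (pid, started_at) VALUES (?, ?)')
        ->execute(array(getmypid(), time()));
    // ... run the actual background script here ...
});

while ($worker->work()) {
    // Between jobs, check whether a stop/restart command was issued for us.
    $stmt = $pdo->prepare('SELECT command FROM worker_commands WHERE pid = ?');
    $stmt->execute(array(getmypid()));
    if ($stmt->fetchColumn() === 'stop') {
        break; // exit cleanly; a process manager can restart the worker
    }
}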

beanstalkd - what happens to reserved, but not completed jobs?

I've created a PHP script that reads from beanstalkd and processes the jobs. No problems there.
The last thing I've got to do is just write an init script for it, so it can run as a service.
However, this has now raised another question for me. When trying to stop the service, the one obvious way of doing it would be to kill the process. However, if I do that while the PHP script is halfway through processing a job, what happens? The job was reserved, but the script never succeeded or failed (never deleted or buried it, respectively), so what happens to it?
My guess is that the TTR will expire, and then it gets put back to the ready queue?
And a bonus second question: any hints on how to better manage stopping the PHP service?
When a worker process (a beanstalkd client) opens a connection with beanstalkd and reserves a job, the job stays in the "reserved" state until the client issues a delete or release command, or the job times out.
If the worker process terminates abruptly, its connection with beanstalkd is closed and the server immediately releases all the jobs that were reserved on that particular connection.
Ref: http://groups.google.com/group/beanstalk-talk/browse_thread/thread/232d0cac5bebe30f?hide_quotes=no#msg_efa0109e7af4672e
Any job that runs out of time (its TTR) and is not buried or touched goes back into the ready queue to be reserved.
I've posted elsewhere about using Supervisord and shell scripts to run workers. It has the advantage that, most of the time, you probably don't mind waiting a little while as jobs finish cleanly. You can have supervisord kill the bash script that runs a worker script; when the worker script itself has finished, it simply exits and isn't restarted.
Another way is to put a highest-priority (0) message into a tube that the workers listen to, which makes a worker first delete the message and then exit. I set up the shell scripts to check for a specific return value (from exit($val);) and exit their own loop as well.
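A minimal sketch of that poison-pill approach, assuming the pda/pheanstalk client (any beanstalkd client works the same way):

<?php
// Producer side: push a highest-priority (0) stop message into the tube.
$queue = Pheanstalk\Pheanstalk::create('127.0.0.1');
$queue->useTube('jobs')->put('STOP', 0);

// Worker side: a STOP payload means delete the pill and exit with a
// special code that the wrapping shell script checks before looping.
$queue->watch('jobs');
while ($job = $queue->reserve()) {
    if ($job->getData() === 'STOP') {
        $queue->delete($job); // remove the pill so only this worker exits
        exit(97);             // arbitrary code; the shell wrapper stops on it
    }
    // ... process the job ...
    $queue->delete($job);
}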
I've used these techniques for Beanstalkd and also AWS:SQS queue runners for some time, dealing with millions of jobs per day running through the system.
If your job is too valuable to lose, you can also use pcntl to wait until the job finishes and then restart/shut down your worker. I've managed to handle all suitable pcntl signals to release the job back to the tube.
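A sketch of that graceful pcntl shutdown (assumes the pcntl extension and the same pheanstalk client as above; processJob() is a hypothetical handler):

<?php
$queue = Pheanstalk\Pheanstalk::create('127.0.0.1');
$queue->watch('jobs');

$shutdown = false;
pcntl_signal(SIGTERM, function () use (&$shutdown) { $shutdown = true; });
pcntl_signal(SIGINT,  function () use (&$shutdown) { $shutdown = true; });

while (true) {
    pcntl_signal_dispatch(); // deliver any pending signals
    if ($shutdown) {
        break; // we only ever exit between jobs, so nothing reserved is abandoned
    }
    $job = $queue->reserveWithTimeout(5); // wake up regularly to re-check the flag
    if ($job === null) {
        continue;
    }
    processJob($job);     // hypothetical handler doing the actual work
    $queue->delete($job); // delete only after successful processing
}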
