I'm still new to the whole Gearman and GearmanManager cycle. I have a working server and have verified my jobs run if they're already in my queue (MySQL table) when the server starts. However, I need to be able to add a new job to the queue from PHP, and from inside of the worker if possible.
Right now I have a job that I will be creating on the deployment of our new codebase. This will be the first job to run, and its purpose is to gather some data for reports and store it.
This needs to be run every hour on the hour, so I want to utilize the when_to_run column. I've been mulling over the documentation for Gearman but I'm still confused on how I'm actually supposed to add a job to the queue.
I've tried running:
<?php
$gm = new GearmanClient;
$gm->addServer();
$gm->doBackground('Metadata_Ingest_Report', '', com_create_guid());
On a side note, yes, I do have php-pecl-uuid installed.
The above code just hangs and doesn't do anything. No job is added to the DB, nothing happens.
This is due to me not fully understanding how a job gets sent, and I'm doing my best to RTM, but I'm not having any luck.
So if there is something you can point me to, or if someone has time to explain how I'm supposed to set up and add jobs to the MySQL queue so that GearmanManager's workers pick them up, that would be awesome.
Edit: So it appears that you have to call $gm->addServer('127.0.0.1'). According to the documentation 127.0.0.1 is supposed to be the default, but that does not appear to be the case running PHP 5.4.11. I can now get the tasks to run if I call $gm->runTasks() after $gm->addTask(). I would expect to just have to call $gm->addTask() and the task would be added to the DB and GearmanManager would see it and spool it up. Still digging...
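For reference, the call sequence that now works for me looks like this (the explicit host is the key part):

<?php
// Pass the host explicitly; the documented 127.0.0.1 default did not
// apply for me on PHP 5.4.11.
$gm = new GearmanClient();
$gm->addServer('127.0.0.1');
$gm->addTask('Metadata_Ingest_Report', '', null, com_create_guid());
$gm->runTasks();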
Best regards,
Andrew
So it appears that the when_to_run functionality is not exposed in pecl-gearman. Because of this, we are unable to schedule jobs for the future using its built-in methods. It also appears that the library does not create the DB records like it should (I'd assume this may actually be Gearmand not offloading the jobs to the DB before they're run).
To get around this we have decided to do the following.
Scheduling Future Jobs
Manually INSERT the job into gearman_queue.
Run a CRON every minute to ping the queue table and load jobs that have a when_to_run <= time()
Fire those jobs via addTask($function, $payload) and runTasks(). The $payload contains the UUID from the DB as well (see the sketch after this list).
GearmanManager picks up the job and hands off the payload to their respective workers.
Worker runs, then on completion, removes the item from the DB with a DELETE.
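For illustration, the every-minute CRON script looks roughly like this (table and column names come from our own schema, so treat them as placeholders):

<?php
// Runs from CRON every minute: load due jobs and hand them to Gearman.
// Assumed schema: gearman_queue(uuid, function_name, payload, when_to_run).
$db = new PDO('mysql:host=127.0.0.1;dbname=gearman', 'user', 'pass');

$stmt = $db->prepare(
    'SELECT uuid, function_name, payload FROM gearman_queue
     WHERE when_to_run IS NOT NULL AND when_to_run <= :now'
);
$stmt->execute(array(':now' => time()));

$gm = new GearmanClient();
$gm->addServer('127.0.0.1');

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    // Include the UUID in the payload so the worker can DELETE the row when done.
    $payload = json_encode(array('uuid' => $row['uuid'], 'data' => $row['payload']));
    $gm->addTask($row['function_name'], $payload);
}
$gm->runTasks();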
Running Job Immediately
Manually INSERT the job into gearman_queue with a when_to_run of NULL.
Run addTask($function, $payload) and runTasks(). The $payload contains the UUID from the DB as well.
GearmanManager picks up the job and hands off the payload to their respective workers.
Worker runs, then on completion, removes the item from the DB with a DELETE.
Conclusion
Gearmand Job Server, GearmanManager, and pecl-gearman all seem to be out of sync when it comes to what is supported and how it's done. For the most part I think this issue lies within the core of pecl-gearman talking to Gearmand.
I have also opened a feature request on the pecl-gearman project for when_to_run: https://bugs.php.net/bug.php?id=64120
Before adding a task, you need to start the Gearman server.
For Linux:
/usr/bin/gearmand -d -L 127.0.0.1 -q libdrizzle \
    --libdrizzle-user=your_db_user --libdrizzle-password=your_db_pass \
    --libdrizzle-db=your_db_name --libdrizzle-mysql -vvvv
After adding the task, create a worker. worker.php looks like this:
<?php
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1');
// Register the callback for the function name the client submits.
$worker->addFunction('Metadata_Ingest_Report', 'Metadata_Ingest_Report');
while ($worker->work());

function Metadata_Ingest_Report(GearmanJob $job)
{
    // do something
}
and start this worker:
/usr/bin/php worker.php
I have a Product model with id,name,price.
The price value is stored in an external API and I need to fetch it every minute in order to update it in the database.
Looking through the Laravel documentation I found two ways to implement:
Create an artisan command (https://laravel.com/docs/8.x/artisan) and add it to task scheduling (https://laravel.com/docs/8.x/scheduling#scheduling-artisan-commands)
Create a job (https://laravel.com/docs/8.x/queues) and add it to task scheduling (https://laravel.com/docs/8.x/scheduling#scheduling-artisan-commands)
First of all, is there any other approach I should take into consideration?
If not, which one of the above would be the best approach and why is it correct for my use case?
As per my comments on one of your previous questions on this topic, whether you use a queue or not depends on your use case.
An Artisan command is a process that executes once and performs a task or tasks and then exits when that task is complete. It is generally run from the command line rather than through a user action. You can then use the task scheduling of your command's host operating system (e.g. a CRON job) to execute that command periodically. It will faithfully execute it when you schedule it to be done.
A Queued job will execute when the Job turns up next in the queue, in priority order. Let's say you send your API call (from your other post) to the queue to be processed. Another system then decides it needs to send out emails urgently (with a higher priority). Suddenly, your Job, which was next, is now waiting for 2000 other Jobs to finish (which might take a half hour). Then, you're no longer receiving new data until your Job executes.
With a scheduled job, you have a time critical system in place. With queues, you have a "when I get to it" approach.
Hope this makes the difference clearer.
With Laravel it is a lot easier to use the built-in scheduler. You only have to add one entry to the crontab, which runs the command php artisan schedule:run every minute on your project. After that you don't have to think about configuring the crontab on the server; you just add commands to the Laravel scheduler and they will work as expected.
You should probably use Cron Job Task Scheduling which would be the first approach you mentioned.
Commonly for this type of use-cases commands are the easiest and cleanest approach.
There are a few things to do in order to make it work as expected:
Create a new command that takes care of hitting the endpoint and storing the retrieved data in the database
In the Kernel.php file, register your command and its run frequency (every minute), as sketched below
Run php artisan schedule:run
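A minimal sketch of those pieces (the class name, command signature, API URL, and response shape are all made up for illustration):

<?php
// app/Console/Commands/FetchProductPrices.php (hypothetical name)
namespace App\Console\Commands;

use App\Models\Product;
use Illuminate\Console\Command;
use Illuminate\Support\Facades\Http;

class FetchProductPrices extends Command
{
    protected $signature = 'products:fetch-prices';
    protected $description = 'Fetch prices from the external API and update products';

    public function handle()
    {
        // Assumed response shape: [{"id": 1, "price": 9.99}, ...]
        $prices = Http::get('https://api.example.com/prices')->json();

        foreach ($prices as $item) {
            Product::where('id', $item['id'])->update(['price' => $item['price']]);
        }
    }
}

Then register it in app/Console/Kernel.php:

protected function schedule(Schedule $schedule)
{
    $schedule->command('products:fetch-prices')->everyMinute();
}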
You can read more about how to create it here:
Recently I've been researching the use of Beanstalkd with PHP. I've learned quite a bit but have a few questions about the setup on a server, etc.
Here is how I see it working:
1) I install Beanstalkd and any dependencies (such as libevent) on my Ubuntu server. I then start the Beanstalkd daemon (which should basically run at all times).
2) Somewhere in my website (such as when a user performs some actions, etc.) tasks get added to various tubes within the Beanstalkd queue.
3) I have a bash script (such as the following one) that is run as a daemon and basically executes a PHP script:
#!/bin/sh
php worker.php
4) The worker script would have something like this to execute the queued up tasks:
while (true) {
    // Reserve the next job from the 'test' tube.
    $job = $this->pheanstalk->watch('test')->ignore('default')->reserve();
    $job_encoded = json_decode($job->getData(), false);
    $done_jobs[] = $job_encoded;
    $this->log('job: ' . print_r($job_encoded, true));
    $this->pheanstalk->delete($job);
}
Now here are my questions based on the above setup (which correct me if I'm wrong about that):
Say I have the task of importing an RSS feed into a database or something. If 10 users do this at once, they'll all be queued up in the "test" tube. However, they'd then only be executed one at a time. Would it be better to have 10 different tubes all executing at the same time?
If I do need more tubes, does that then also mean that I'd need 10 worker scripts? One for each tube, all running concurrently with basically the same code except for the string literal in the watch() function.
If I run that script as a daemon, how does that work? Will it constantly be executing the worker.php script? That script loops until the queue is empty theoretically, so shouldn't it only be kicked off once? How does the daemon decide how often to execute worker.php? Is that just a setting?
Thanks!
If the worker isn't taking too long to fetch the feed, it will be fine. You can run multiple workers if required to process more than one at a time. I've got a system (currently using Amazon SQS, but I've done similar with BeanstalkD before), with up to 200 (or more) workers pulling from the queue.
A single worker script (the same script running multiple times) should be fine - the script can watch multiple tubes at the same time, and the first one available will be reserved. You can also use the job-stat command to see where a particular $job came from (which tube), or put some meta-information into the message if you need to tell each type from another.
A good example of running a worker is described here. I've also added supervisord (also, a useful post to get started) to easily start and keep running a number of workers per machine (I run shell scripts, as in the first link). I would limit the number of times it loops, and also put a number into the reserve() to have it wait a few seconds, or more, for the next job to become available, rather than spinning out of control in a tight loop that never pauses even when there is nothing to do.
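A bounded worker loop along those lines might look like this (assuming a Pheanstalk version whose reserve() accepts a timeout in seconds and returns false on timeout, as the older 2.x/3.x API did):

<?php
use Pheanstalk\Pheanstalk;

$pheanstalk = new Pheanstalk('127.0.0.1');

$maxJobs = 100; // exit after this many jobs; the wrapper/supervisord restarts us
for ($i = 0; $i < $maxJobs; $i++) {
    // Block for up to 10 seconds waiting for a job instead of busy-looping.
    $job = $pheanstalk->watch('test')->ignore('default')->reserve(10);
    if ($job === false) {
        continue; // timed out with nothing to do
    }
    $data = json_decode($job->getData(), false);
    // ... process $data ...
    $pheanstalk->delete($job);
}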
Addendum:
The shell script would be run as many times as you need (the link shows how to have it re-run as required with exec $#). Whenever the PHP script exits, the shell script starts it again.
Apparently there's a Django app to show some stats, but it's trivial enough to connect to the daemon, get a list of tubes, and then get the stats for each tube, or just counts.
I'm looking for better solution to handling our cron tasks in a load balanced environment.
Currently have:
PHP application running on 3 CentOS servers behind a load balancer.
Tasks that need to be run periodically but only on a single machine at a time.
Good old cron set up to run those tasks on the first server.
Problems if the first server is out of play for whatever reason.
Looking for:
Something more robust and de-centralized.
Load balancing the tasks, so each task runs only once, but on random/different servers to spread the load.
Making sure the tasks still run when the first server goes down.
Being able to manage tasks and see aggregate reports ideally using a web interface.
Notifications if anything goes wrong.
The solution doesn't need to be implemented in PHP but it would be nice as it would allow us to easily tweak it if needed.
I have found two projects that look promising: GNUBatch and Job Scheduler. I will most likely test both further, but I wonder if someone has a better solution for the above.
Thanks.
You can use this small library that uses redis to create a temporary timed lock:
https://github.com/AlexDisler/MutexLock
The servers should be identical and have the same cron configuration. The server that will be first to create the lock will also execute the task. The other servers will see the lock and exit without executing anything.
For example, in the PHP file that executes the scheduled task:

<?php
require 'vendor/autoload.php'; // assuming the library is installed via Composer

MutexLock\Lock::init([
    'host' => $redisHost,
    'port' => $redisPort
]);

// check if a lock was already created;
// if it was, another server is already executing this task
if (!MutexLock\Lock::set($lockKeyName, $lockTimeInSeconds)) {
    return;
}

// if no lock was created, execute the scheduled task
scheduledTaskThatRunsOnlyOnce();
To run the tasks in a de-centralized way and spread the load, take a look at: https://github.com/chrisboulton/php-resque
It's a PHP port of the Ruby version of Resque, and it stores the data in exactly the same format, so you can use https://github.com/resque/resque-web or http://resqueboard.kamisama.me/ to monitor the workers and see reports.
Assuming you have a database available that is not hosted on one of those 3 servers:
Write a "wrapper" script that goes in cron and takes the program you're running as its argument. The very first thing it does is connect to the remote database and check the last time an entry was inserted into a table (created for this wrapper). If more time has passed since the last insertion than the job's run interval (i.e. no server has claimed this run yet), insert a new record into the table with the current time, and execute the wrapper's argument (your cron job).
Cron up the wrapper on each server, each set X minutes behind the other (server A runs at the top of the hour, server B runs at 5 minutes, C at 10 minutes, etc).
The first server will always execute the cron first, so the other two servers never will. If the first server goes down, the second server will see that the job hasn't run, and will run it.
If you also record in the table which server it was that executed the job, you'll have a log of when/where the script was executed.
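A rough PHP sketch of that wrapper (the script name, table schema, and one-hour interval are all invented for the example):

<?php
// cron_wrapper.php <command>
// Hypothetical table: cron_runs(job_name VARCHAR, ran_at DATETIME, server VARCHAR)
$job = $argv[1];
$db  = new PDO('mysql:host=shared-db;dbname=ops', 'user', 'pass');

// When was this job last claimed?
$stmt = $db->prepare('SELECT MAX(ran_at) FROM cron_runs WHERE job_name = ?');
$stmt->execute(array($job));
$lastRun = $stmt->fetchColumn();

// If it already ran within this period (here: the last hour), another server beat us.
if ($lastRun !== false && $lastRun !== null && strtotime($lastRun) > time() - 3600) {
    exit(0);
}

// Claim the run, recording which server executed it, then run the real job.
$db->prepare('INSERT INTO cron_runs (job_name, ran_at, server) VALUES (?, NOW(), ?)')
   ->execute(array($job, gethostname()));
passthru($job);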
Wouldn't this be an ideal situation for using a message / task queue?
I ran into the same problem but came up with this little repository:
https://github.com/incapption/LoadBalancedCronTask
I am using MySQL as my database and PHP as my programming language. I wanted to run a cron job which would run until the current system date matches the "deadline" (date) column in my database table called "PROJECT". Once the dates are the same, an update query has to run which would change the status (a field of the project table) from "open" to "close".
I am not really sure if cron jobs are the best way, or if I could use triggers, or maybe something else. Also, I am using Apache as my web server and my OS is Windows Vista.
Which is the best way to do it? PHP scheduler or cron jobs or any other method? Can anybody enlighten me?
I think your concept needs to change.
PHP cannot schedule a job, and neither can MySQL. Triggers in MySQL execute when a MySQL query occurs, not at a specific time. Neither is built for time-based execution.
This limitation usually isn't a problem in web development, because your PHP application should control all data going in and out. Usually this means the HTML that displays that data, or other output formats for users or other programs.
In your case you can think about it this way: the deadline is a set date. You can treat it as data and save it to your database. When the deadline occurs is not important; what matters is that the data in your database is viewed correctly.
When a request is made to your application, check whether the deadline is in the past; if it is, display that the project is closed, or update the status to closed just before display.
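For example, a lazy update just before display (assumes PDO and the PROJECT table from the question, with a deadline DATE column and a status column holding 'open'/'close'):

<?php
$db = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

// Close any projects whose deadline has passed, just before display.
$db->exec("UPDATE project SET status = 'close'
           WHERE status = 'open' AND deadline < CURDATE()");

// ...then fetch and display projects as usual; the status is now accurate.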
There really is no reason to update data independently of your PHP application.
Usually, the only things you want to schedule are jobs that would affect your application in terms of load, or that need to be done only once, or where concurrency or time is an issue.
In your case none of those apply.
PS: I haven't tried PHPscheduler, but I can guess it isn't a true scheduler. Cron is a daemon that sleeps until a given task is due in its queue, executes the task, then sleeps till the next one is due (at least that's what it does in the current algorithm). PHP cannot do that without the sockets and fork extensions and special setup. So PHPscheduler is most likely just checking whether a task's date has expired on each load of a webpage (whenever PHP executes a page). That is no different from you simply checking whether the project's date has expired yourself, without the overhead of PHPscheduler.
I would always go for a cron job for anything scheduling related.
The big bonus point is that you can echo info out as well, and it gets emailed to you.
You'll find once you start using cronjobs, it's hard to stop.
Cron does not exist, per se, in Vista, but what does exist is the standard Windows task scheduler, which you can run with a command line like "php -q -f myfile.php" to execute the PHP file at the given time.
You can also use a port of the cron program; there are many out there.
If it is not critical to the second, any Windows scheduling application will do; just be sure to have your PHP bin path in your PATH variable, for simplicity.
For Windows CRON jobs I cannot recommend PyCron enough.
While CRON and Windows Scheduled Tasks are the tried and true ways of scheduling jobs/tasks to run on a regular basis, there are use cases where having a different scheduled task in CRON/Windows can become tedious. Namely when you want to let users schedule things to run, or for instances where you prefer simplicity/maintainability/portability/etc or all of the above.
In cases where I prefer to not use CRON/Windows for scheduled tasks, I build into the application a task scheduling system. This still requires 1 CRON job or Windows Task to be scheduled. The idea is to store Job details in the database (job name, job properties, last run time, run interval, anything else that is important for your implementation). You then schedule a "Master" job in CRON or Windows which handles running all of your other jobs for you. You'll need this master job to run at least as often as your shortest interval; if you want to be able to schedule jobs that run every minute the master job needs to run every minute.
You can then launch each scheduled job in the background from PHP with minimal effort (if you want). In memory-constrained systems you can monitor memory usage, or keep track of the PIDs (various methods) and limit to N jobs running at a given time.
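A skeleton of such a master job (the table schema and column names are illustrative only):

<?php
// "Master" job, run every minute from the single CRON/Task Scheduler entry.
// Hypothetical schema: jobs(id, name, command, last_run, run_interval_secs).
$db = new PDO('mysql:host=localhost;dbname=scheduler', 'user', 'pass');

$due = $db->query(
    'SELECT id, command FROM jobs
     WHERE last_run IS NULL
        OR last_run + INTERVAL run_interval_secs SECOND <= NOW()'
)->fetchAll(PDO::FETCH_ASSOC);

foreach ($due as $job) {
    // Mark it as run first, so a slow job is not picked up again next minute.
    $db->prepare('UPDATE jobs SET last_run = NOW() WHERE id = ?')
       ->execute(array($job['id']));

    // Launch in the background so the master job is not blocked (POSIX shells).
    exec(escapeshellcmd($job['command']) . ' > /dev/null 2>&1 &');
}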
I've had a great deal of success with this method, YMMV however based on your needs and your implementation.
How about PHPscheduler? Isn't it better than cron jobs? I think crons would be independent of the application, and hence would be difficult to move if one has to change the host. I am not really sure though; it would be great if anyone could comment on this! Thanks!
I have a simple messaging queue set up and running using the Zend_Queue object hierarchy. I'm using a Zend_Queue_Adapter_Db back-end. I'm interested in using this as a job queue, to schedule things for processing at a later time. They're jobs that don't need to happen immediately, but should happen sooner rather than later.
Is there a best-practices/standard way to set up your infrastructure to run jobs? I understand the code for receiving a message from the queue, but what's not so clear to me is how to run the program that does the receiving. A cron that receives n messages on the command line, run once a minute? A cron that fires off multiple web requests, each web request running the receiver script? Something else?
Tangential bonus question. If I'm running other queries with Zend_Db, will the message queue queries be considered part of that transaction?
You can do it like a thread pool. Create a command-line PHP script to handle the receiving (a sketch follows below). It should be started by a shell script that automatically restarts the process if it dies. The shell script should not start the process if it is already running (use a $pid.running file or similar). Have cron run several of these every 1-10 minutes. That should handle the receiving nicely.
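The receiving script itself might look something like this (ZF1-era API; the queue name and DB options are placeholders):

<?php
// receiver.php: command-line receiving process using the Zend_Queue DB adapter.
require_once 'Zend/Queue.php';

$queue = new Zend_Queue('Db', array(
    'name'          => 'myqueue', // placeholder queue name
    'driverOptions' => array(
        'host'     => '127.0.0.1',
        'dbname'   => 'queue_db',
        'username' => 'user',
        'password' => 'pass',
        'type'     => 'pdo_mysql',
    ),
));

// Pull up to 5 messages, process them, and exit; the shell wrapper restarts us.
foreach ($queue->receive(5) as $message) {
    // ... process $message->body ...
    $queue->deleteMessage($message);
}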
I wouldn't have the cron fire a web request unless your cron is on another server for some strange reason.
Another way to use this would be to have some background process creating data, and web user(s) consume it as they naturally browse the site. A report generator might work this way. Company-wide reports are available to all users, but you don't want them all generating this db/time-intensive report. So you create a queue and process one at a time, possibly removing duplicates. All users can view the report(s) when ready.
According to the docs it doesn't look like Zend_Queue is even using the same connection as your other Zend_Db queries. But of course the best way to find out is to make a simple test.
EDIT
The multiple lines in the cron are for concurrency; each line represents a worker for the pool. I was not clear earlier: you don't want the PID as the identifier, you want to pass the process name as a parameter.
/home/byron/run_queue.sh Process1
/home/byron/run_queue.sh Process2
/home/byron/run_queue.sh Process3
The bash script would check for the $process.running file; if it finds it, exit.
Otherwise:
Create the $process.running file.
Start the PHP process and block/wait until it finishes.
Delete the $process.running file.
This allows the PHP script to die without causing the pool to lose a worker.
If the queue is empty, the PHP script exits immediately and is started again by the next invocation of cron.
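If you'd rather keep the wrapper itself in PHP instead of bash, the same lock-file pattern looks roughly like this (the file name and the run_queue_receiver() helper are placeholders for your own receive loop):

<?php
// php pool_worker.php Process1
$name = $argv[1];
$lock = "/tmp/{$name}.running";

// Another invocation of this worker is still going; bail out.
if (file_exists($lock)) {
    exit(0);
}

file_put_contents($lock, getmypid());

// Remove the lock even if the receiver dies, so the pool keeps its worker.
register_shutdown_function(function () use ($lock) {
    @unlink($lock);
});

run_queue_receiver(); // placeholder: your Zend_Queue receive loop from above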