I'm looking for better solution to handling our cron tasks in a load balanced environment.
Currently have:
PHP application running on 3 CentOS servers behind a load balancer.
Tasks that need to be run periodically but only on a single machine at a time.
Good old cron set up to run those tasks on the first server.
Problems if the first server is out of play for whatever reason.
Looking for:
Something more robust and de-centralized.
Load balancing the tasks so multiple tasks would run only once but on random/different servers to spread the load.
Preventing not having the tasks run when the first server goes down.
Being able to manage tasks and see aggregate reports ideally using a web interface.
Notifications if anything goes wrong.
The solution doesn't need to be implemented in PHP but it would be nice as it would allow us to easily tweak it if needed.
I have found two projects that look promissing. GNUBatch and Job Scheduler. Will most likely further test both but I wonder if someone has better solution for the above.
Thanks.
You can use this small library that uses redis to create a temporary timed lock:
https://github.com/AlexDisler/MutexLock
The servers should be identical and have the same cron configuration. The server that will be first to create the lock will also execute the task. The other servers will see the lock and exit without executing anything.
For example, in the php file that executes the scheduled task:
MutexLock\Lock::init([
'host' => $redisHost,
'port' => $redisPort
]);
// check if a lock was already created,
// if it was, it means that another server is already executing this task
if (!MutexLock\Lock::set($lockKeyName, $lockTimeInSeconds)) {
return;
}
// if no lock was created, execute the scheduled task
scheduledTaskThatRunsOnlyOnce();
To run the tasks in a de-centralized way and spread the load, take a look at: https://github.com/chrisboulton/php-resque
It's a php port of the ruby version of resque and it stores the data in the same exact format so you can use https://github.com/resque/resque-web or http://resqueboard.kamisama.me/ to monitor the workers and see reports
Assuming you have a database available not hosted on one of those 3 servers;
Write a "wrapper" script that goes in cron, and takes the program you're running as its argument. The very first thing it does is connect to the remote database, and check when the last time an entry was inserted into a table (created for this wrapper). If the last insertion time is greater than when it was supposed to run, then insert a new record into the table with the current time, and execute the wrapper's argument (your cron job).
Cron up the wrapper on each server, each set X minutes behind the other (server A runs at the top of the hour, server B runs at 5 minutes, C at 10 minutes, etc).
The first server will always execute the cron first, so the other two servers never will. If the first server goes down, the second server will see it hasn't ran, and will run it.
If you also record in the table which server it was that executed the job, you'll have a log of when/where the script was executed.
Wouldn't this be an ideal situation for using a message / task queue?
I ran into the same problem but came up with this litte repository:
https://github.com/incapption/LoadBalancedCronTask
Related
Context
I'm currently implementing a feature to schedule notifications for a specific period through a web form using PHP and Firebase.
To send the notification I use Firebase and it sends notifications to Android/Ios.
To schedule the notification I use the AT linux service, as it seems to suit better than cron, as cron runs at certain frequencies and AT does not, it runs at a specific time.
man page about the AT: man page AT
Sample code
/usr/bin/php `send_notification.php` | at 2021-07-11 15:40
This will create a file on linux that will run in the period 2021-07-11 15:40 only once.
Problems
The AT service, like CRON, creates files inside a directory on the operating system that represent the jobs.
1 - If a machine on AWS is scaled, jobs would likely be duplicated and consequently send notifications more than once. (Note: I don't know much about machine scaling, but I believe it should happen)
2 - And if the machine is in downtime due to the inclusion of some functionality or something like that, I believe that the way it is currently the job would not be executed.
3 - Another problem, but not the main one, would be if I was using a docker container. As Ubuntu + PHP are inside the container, the job files would probably be lost if I restarted the container, so in this case I believe that a solution would be to use volume, but that would not be my problem now, as currently the application uses only one machine on AWS EB with the PHP image.
Doubts
Is there any solution I can apply to solve this duplicate job problem using PHP?
Is the approach using AT the most suitable? I see a lot of people talking to use CRON, but CRON will run the job several times and for me that's not what I'm looking for.
I think you need a place where scheduled and finished notifications will be persisted, independently on what you are using, cron or at.
If I had such a task, I would stay with a solution like this: run special script, "scheduler.php" each 1 (or more, e.g. 5) mins by cron, which will check some log file(or remote database in case of several machines) and look if there are any new lines. If new line present and it contains timestamp in the past and status "sceduled", than script will lock it and run your "sender.php". After that it will mark the line as "done". Each line in a storage should contain a timestamp to run and one of three statuses "scheduled", "running" and "done".
With such approach you could plan new notifications by adding a line with needed time and status "scheduled" to the storage. Note, that there can be a little delay between scheduled time and actual notification depending on the cron interval, but I suppose it is not critical.
This will allow you to run any number of crons on different machines and guarantee that each job will be done once.
Important: if you will adopt this scheme, be sure that your scheduler.php reads and updates a storage in a single atomic operation, to prevent race conditions between several crons. File locks, or "select for update" will do.
I'm using AWS Amazon Web Services and I have 4 identical Web Servers. I have a cron job setup to sent out emails that only needs to run from one server but exists on all 4.
Everything on these web servers is identical which suits me but currently the cron job runs on all 4. I therefore looking for a solution on how to have the job on all 4 servers but only run on one.
I do have a database and thought of using a tmp table to the first to write/register itself in that table would then be the only one to run (the others would read this and then not execute past this point). But I was concerned because they all use an identical system clock and therefore might read the table at the same time and even though one writes to it the others assume they also have access to run.
Would using a table be able to restrict 4 webservers to only 1 contined or would the identical clock factor get in the way?
Note: the server names have the potential of changing with time as they are more like resources then permanent systems. (eg: can spin new ones up when required)...
Is there a better way to do this? Note: I don't want a different server and I want them all identical.
thankyou
For me I would do:
1 - create a lock for each of the server
2 - define a list of server like ['1'=>'server ip', '2'=>'server ip',...]
3 - pass in server id in cron job as args
4 - modify the script if (lock not exists) do run then create a lock for this server, delete the lock for next server.
5 - delete the lock for the first server to start with.
The machines will each have different hostnames, so you could use that to only execute the job on the desired machine. For example if the machines are "server01", "server02", etc, then in PHP you might do:
if (gethostname() === 'server01') {
// do the task
}
Another way is, instead of having a single server image for all servers, have two slightly different images, one is for the "primary" server (which has the cron tasks, etc) and the other is for all other ("secondary") servers.
I have a Cron Job with PHP which I want to set up on my webhost, but at the moment the script takes about 20 seconds to run with only 3 users data being refreshed. If I get a 1000 users - gonna take ages. Is there an alternative to Cron Job? Will my web host let me run a cron job which takes, for example, 10 minutes to run?
Your cron job can be as long as you want.
The main problem for you is that you must ensure the next cron job execution is not occuring while the first one is still running. You have a lot of solutions to avoid it, basically use a semaphore.
It can be a lock file, a record in database. Your cron job should check if the previous one is finished or not. A good thing is maybe sending you an email if he cannot run because of a long previous job (this way you'll have some notice alerting you that something is maybe getting wrong) By default cron jobs with bad error dstatus on exit are outputing all the standard output to the email of the account running the job, depending on how is configured the platform you could use this behavior or build an smtp connexion on the job (or store the alert in a database table).
If you want some alternatives to cron jobs you should have a look at work queues. You can mix work queues with a cron job, or use work queue in apache-php envirronment, lot of solutions, but the main idea is to make on single queue of things that should be done, and execute them one after the other (but be careful, if you handle theses tasks very slowly you'll get a big fat waiting queue).
A cron job shouldn't have any bearing on how long it's 'job' takes to complete. If you're jobs are taking 20 seconds to complete, it's PHP's fault, not cronjob.
Will my web host let me run a cron job which takes, for example, 10 minutes to run?
Ask your webhost.
If you want to learn about optimizing php scripts, take a look at Profiling PHP Code.
I am using MYSQL as my database and PHP as my programming language.I wanted to run a cron job which would run until the current system date matches the "deadline(date)" column in my database table called "PROJECT".Once the dates are same an update query has to run which would change the status(field of project table) from "open" to "close".
I am not really sure if cron jobs are the best way or I could use triggers or may be something else.Also I am using Apache as my web server and my OS is windows vista.
Also which is the best way to do it? PHP scheduler or cron jobs or any other method? can anybody enlighten me?
I think your concept needs to change.
PHP cannot schedule a job, neither can MySQL. Triggers in MySQL execute when a mysql query occurs, not at a specific time. Neither
This limitation usually isn't a problem in web development. The reason is because your PHP application should control all data going in and out. Usually, this means just the HTML that displays that data, or other formats to users, or other programs.
In your case you can think about it this way. The deadline is a set date. You can treat it as data, and save it to your database. When the deadline occurs is not important, it is that the data you have sent in your database is viewed correctly.
When a request is made to your application, check if the date of the deadline is in the past, if it is, then display that the project is closed - or update that the project is closed, just before display.
There really is no reason to update data independantly of your PHP application.
Usually, the only things you want to schedule are jobs that would affect your application in terms of load, or that need to be done only once, or where concurrency or time is an issue.
In your case none of those apply.
PS: I haven't tried PHPscheduler but I can guess it isn't a true scheduler. Cron is a deamon that sleeps until a given task is due in its queue, executes the task, then sleeps till the next one is due (at least thats what it does in the current algorithm). PHP cannot do that without the sockets and fork extensions, as special setup. So PHPscheduler is most likely just checking if a date for a task has expired, on each load of a webpage (whenever PHP executes a page). This is no different then you just checking if the date on the project has expired, without the overhead of PHPScheduler.
I would always go for a cron job for anything scheduling related.
The big bonus point is that you can echo info out as well and it get's emailed to you.
You'll find once you start using cronjobs, it's hard to stop.
cron does not exist, per se, in vista, but what does exist is the standard windows scheduling manager which you can run with a command line like "php -q -f myfile.php" which will execute the php file at the given time.
you can also use a port of the cron program, there are many out there.
if it is not critical to the second, any windows scheduling application will do, just be sure to have you PHP bin path in your PATH variable for simplicity.
For Windows CRON jobs I cannot recommend PyCron enough.
While CRON and Windows Scheduled Tasks are the tried and true ways of scheduling jobs/tasks to run on a regular basis, there are use cases where having a different scheduled task in CRON/Windows can become tedious. Namely when you want to let users schedule things to run, or for instances where you prefer simplicity/maintainability/portability/etc or all of the above.
In cases where I prefer to not use CRON/Windows for scheduled tasks, I build into the application a task scheduling system. This still requires 1 CRON job or Windows Task to be scheduled. The idea is to store Job details in the database (job name, job properties, last run time, run interval, anything else that is important for your implementation). You then schedule a "Master" job in CRON or Windows which handles running all of your other jobs for you. You'll need this master job to run at least as often as your shortest interval; if you want to be able to schedule jobs that run every minute the master job needs to run every minute.
You can then launch each scheduled job in the background from PHP with minimal effort (if you want). In memory constrained systems you can monitor memory usage or keep track of the PIDs (various methods) and limit to N jobs running at a given time.
I've had a great deal of success with this method, YMMV however based on your needs and your implementation.
how about PHPscheduler..R they not better than cronjobs? I think crons would be independent of the application hence would be difficult if one has to change the host..i am not really sure though..It would be great if anyone can comment on this!! Thanks!
I have a simple messaging queue setup and running using the Zend_Queue object heirarchy. I'm using a Zend_Queue_Adapter_Db back-end. I'm interested in using this as a job queue, to schedule things for processing at a later time. They're jobs that don't need to happen immediately, but should happen sooner rather than later.
Is there a best-practices/standard way to setup your infrastructure to run jobs? I understand the code for receiving a message from the queue, but what's not so clear to me is how run the program that does that receiving. A cron that receives n messages on the command-line, run once a minute? A cron that fires off multiple web requests, each web request running the receiver script? Something else?
Tangential bonus question. If I'm running other queries with Zend_Db, will the message queue queries be considered part of that transaction?
You can do it like a thread pool. Create a command line php script to handle the receiving. It should be started by a shell script that automatically restarts the process if it dies. The shell script should not start the process if it is already running (use a $pid.running file or similar). Have cron run several of these every 1-10 minutes. That should handle the receiving nicely.
I wouldn't have the cron fire a web request unless your cron is on another server for some strange reason.
Another way to use this would be to have some backround process creating data, and a web user(s) consume it as they naturally browse the site. A report generator might work this way. Company-wide reports are available to all users but you don't want them all generating this db/time intensive report. So you create a queue and process one at a time possible removing duplicates. All users can view the report(s) when ready.
According to the docs it doens't look like the zend db is even using the same connection as your other zend_db queries. But of course the best way to find out is to make a simple test.
EDIT
The multiple lines in the cron are for concurrency. each line represents a worker for the pool. I was not clear, you don't want the pid as the identifier, you want to pass that as a parameter.
/home/byron/run_queue.sh Process1
/home/byron/run_queue.sh Process2
/home/byron/run_queue.sh Process3
The bash script would check for the $process.running file if it finds it exit.
otherwise:
Create the $process.running file.
start the php process. Block/wait until finished.
Delete the $process.running file.
This allows for the php script to die but not cause the pool to loose a worker.
If the queue is empty the php script exits immediately and is started again by the nex invocation of cron.