I want to set up a system in which a privileged user can create a task that runs from date/time X to date/time Y, saved in MySQL or SQLite. When the task starts, it sends a request to a remote server via SSH, and when the end date/time is reached another SSH request is sent.
What I'm not sure about is how to actually trigger the event at the start time, and how to trigger the other one at the end time.
Should I be polling the server somehow every minute (sounds like a performance hit), or set up jobs in Iron.io/Amazon SQS or something else?
I noticed Amazon SQS only allows messages to queue for up to 14 days. How would that work for events weeks or months in the future?
I'm not looking for code, just the idea of how it should work.
Basically there are two solutions, but maybe a hybrid version suits your problem best...
1. Use a queue (built into Laravel) and set up delayed jobs in the queue to be fired later on. As you already mention, this might not be the best solution when a task lies weeks or months in the future.
2. Use a cron job. The downside is that if you check once a day you can be up to 23h59m late, and if you check every minute you might run into performance issues (in most cases it works well enough, but it is definitely not perfect).
Combining 1 and 2 might be the best solution: at the beginning of each day, check whether any tasks are going to end in the coming day. If so, schedule a job in the queue to end the task at the exact time at which it should end. This gives you scalability and the possibility to create tasks that end a year after they were created.
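A rough sketch of that daily sweep, written in Python purely for brevity (the same logic would carry over to a Laravel command that calls the queue's delayed push). The task dictionaries, the `plan_delays` name and the 24-hour window are assumptions made for illustration; they are not part of any library.

```python
from datetime import datetime, timedelta

def plan_delays(tasks, now, window=timedelta(days=1)):
    """Return (task_id, delay_seconds) for every task ending before now + window.

    `tasks` is a hypothetical list of dicts with 'id' and 'end_at' keys, as they
    might be read from the MySQL/SQLite table. Tasks that were already ended are
    assumed to be filtered out before this function is called.
    """
    horizon = now + window
    planned = []
    for task in tasks:
        if task["end_at"] < horizon:
            # Overdue tasks (e.g. after a missed sweep) get a delay of 0.
            delay = max(0, int((task["end_at"] - now).total_seconds()))
            planned.append((task["id"], delay))
    return planned

now = datetime(2024, 1, 1, 0, 0)
tasks = [
    {"id": 1, "end_at": datetime(2024, 1, 1, 6, 30)},  # ends today
    {"id": 2, "end_at": datetime(2024, 3, 1, 12, 0)},  # months away, left in the DB
]
planned = plan_delays(tasks, now)
print(planned)
```

Each returned pair would then be handed to the queue as a delayed job, so only the coming day's tasks ever sit in the queue.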
On the surface, the problem I am facing looks very simple. In an enterprise web app (LAMP stack) we need to add some time-based and schedule-based tasks. Some examples are:
when a user logs in and has stayed active for more than 30 minutes, send them a lucky coupon.
send a newsletter to subscribers every Monday. [easily handled by a cron job]
If a user does not log in for 3 days, start stalking her. [doable by cron job but ...]
deduct phone bill amount from user account on 1st working day of every month at 9.
repeat failed deduction every subsequent work day at 9 for a max of 15 retries.
I hope that gives you an idea of what needs to be handled.
At the moment we have cron jobs for almost every possible situation and they are kind of working, but as you can see from the above scenarios, we are forced to run those crons almost every second (a bit of an exaggeration, but almost).
To handle the issue more elegantly and better implement DDD concepts, we are thinking of making the ticking clock a first-class citizen of the application.
We would like to make a simple central clock ticker class, that will emit ticks as time events every second.
The ticks will be published to the central event bus.
And all the classes that are interested to act on the tick, will subscribe to the event bus.
My concern is that this will cause a lot of subscriber/registrant code to run on every tick. As this is already the case with cron, could there be a better way to handle the subscription part, so that a specific subscriber is notified only when it needs to be notified?
And before we even get into solving this problem the way I am proposing, is there a better way to handle this kind of problem? The key point in this whole scenario seems to be how to trigger something X based on how much time has passed since something Y happened in the domain. I believe I am not the first one to face this issue and it must have been solved long ago, but I have not been able to find any signpost pointing me in the right direction.
The way I have handled this in the past is to queue commands as soon as I know something should happen and then the scheduler will fire off the commands when the time has come.
The scheduler is simply a process that runs as a service and wakes every N milliseconds to find any commands that have passed their ScheduledTime.
For example:
The user has logged in. Queue a command for 30 minutes hence to give them a coupon. After 30 minutes, the scheduler will send the command. If the session is still active, then the command is accepted and a coupon is presented. Otherwise, it simply does nothing.
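A minimal sketch of such a scheduler, in Python for brevity. The class name, the command dictionaries and the `handle_give_coupon` helper are hypothetical; a real service would persist the commands and call `pop_due` from a loop that wakes every N milliseconds.

```python
import heapq
from datetime import datetime, timedelta

class CommandScheduler:
    """Keeps commands ordered by their scheduled time."""

    def __init__(self):
        self._heap = []  # entries are (scheduled_time, sequence, command)
        self._seq = 0    # tie-breaker so commands themselves are never compared

    def queue(self, command, when):
        heapq.heappush(self._heap, (when, self._seq, command))
        self._seq += 1

    def pop_due(self, now):
        """Remove and return every command whose scheduled time has passed."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due

def handle_give_coupon(command, active_sessions):
    """Accept the command only if the user's session is still active."""
    return command["user"] in active_sessions

scheduler = CommandScheduler()
login_time = datetime(2024, 1, 1, 10, 0)
scheduler.queue({"type": "GiveCoupon", "user": "alice"}, login_time + timedelta(minutes=30))

too_early = scheduler.pop_due(login_time + timedelta(minutes=10))
on_time = scheduler.pop_due(login_time + timedelta(minutes=30))
```

Because the queue is ordered by time, each wake-up only looks at the head of the queue instead of running every subscriber on every tick.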
You also mention several examples that are best handled by a traditional scheduler (cron as you mentioned) and will fire off a batch command. Depending on how knowledgeable your domain is about things like newsletters, you would either issue individual commands to your domain objects or simply pull a report and run a job to send emails.
If you do handle these types of processes in your domain, then your domain should also queue the next command. A saga or process manager would be most suitable for this type of operation. E.g.
CreateNewsletter (This is the batch) -> NewsletterCreated
Accounts.Each(SendAccountNewsletter) -> AccountNewsletterSent
NewsletterCompleted (This is the batch) ->
Queue(command: CreateNewsletter, when: NextMondayAt9) (This is the next batch)
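The `NextMondayAt9` value in the flow above has to be computed somewhere. A small sketch of one way to do it, in Python for brevity; the function name is made up, and time zones and holidays are deliberately ignored.

```python
from datetime import datetime, timedelta

def next_monday_at_9(now):
    """Return the first Monday 09:00 that lies strictly after `now`."""
    days_ahead = (7 - now.weekday()) % 7  # Monday is weekday 0
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=9, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        # It is already Monday 09:00 or later, so move to next week.
        candidate += timedelta(days=7)
    return candidate

# 1 January 2024 was a Monday.
after_batch = next_monday_at_9(datetime(2024, 1, 1, 9, 5))
midweek = next_monday_at_9(datetime(2024, 1, 3, 12, 0))
print(after_batch, midweek)
```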
Hope that helps.
P.S. If you publish ticks on your bus, you will have a ton of noise to filter through.
I'm writing a PHP project in Laravel. The admin can queue email alerts to be sent at a specific date/time. The natural choice was to use Laravel's Queue class with Beanstalkd (Laravel uses Pheanstalk internally).
However, the admin can choose to reschedule or delete email alerts that have not yet been sent. I could not find a way yet to delete a specific task so that I can insert a new one with the new timing.
What is the usual technique to accomplish something like this? I'm open to other ideas as well. I don't want to use CRON since the volumes of these emails would be pretty high, and I'd rather reuse an already tested solution for managing a task queue.
I have scoured the internet for such an answer myself.
Here's what I've found:
Beanstalkd
I think the simplest way within Beanstalkd is to use its ability to delay a job (giving it a number of seconds to delay as an argument).
You can then do date math to capture the difference in time between now() and when the job ideally is run (in seconds) and delay the job that long.
Here's an example (see first answer to that SO question - it's in Python, but you can get the gist of what the person is saying/doing)
Note that this doesn't guarantee the task is run on time: the delay just makes the job available after X seconds. How far behind your queue is on processing tasks when the delayed job becomes available determines when the job actually runs. (If your queue is behind due to a heavy number of jobs, then it won't necessarily run exactly on time!)
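The date math itself is tiny. A sketch in Python (the `delay_seconds` name is made up; in PHP the equivalent would be a timestamp subtraction), clamped so that a time already in the past yields a delay of zero rather than a negative number:

```python
from datetime import datetime, timezone

def delay_seconds(run_at, now=None):
    """Seconds to wait from `now` until `run_at`, never negative."""
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0, int((run_at - now).total_seconds()))

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
four_hours = delay_seconds(datetime(2024, 1, 1, 16, 0, tzinfo=timezone.utc), now)
already_due = delay_seconds(datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc), now)
print(four_hours, already_due)
```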
Laravel
Laravel's Queue has a later method, which you can use instead of push.
Whereas with push, you'd do:
Queue::push('Some\Processing\Class', array('data' => $data));
with later(), you'd do:
$delay = 14400; // 4 hours in seconds
Queue::later($delay, 'Some\Processing\Class', array('data' => $data));
SaaS Options
Google App Engine can schedule tasks
iron.io can schedule tasks (works well with Laravel!)
Other languages
Python's Celery has similar options to delay jobs (with the same caveats)
Ruby's resque has a scheduler that also sort of fits this
I'm working on a service where users schedule their tweets; for example, I want to post a tweet tomorrow at 12:30 PM. How can that be done? Are cron jobs the best way to do so, e.g. running a cron job every 5 minutes to see if there are tweets to post in this interval? Are there any alternatives?
Running a cron job is definitely the easiest solution. However, there are other approaches available; one would be to use a queue like Amazon SQS.
This lets you simply throw things onto a queue to be processed later. By default, items are ready to be processed as soon as they are added, but you can also add items with a delay during which they remain dormant. The delay would be derived from your user's predefined tweet time. (Note that SQS caps the per-message delay at 15 minutes, so tweets scheduled further out have to be held elsewhere, e.g. in your database, and queued closer to their time.)
You could then have a script running that is constantly listening to the queue for any new tweets that need to be sent, as soon as a queue item becomes available, it will be processed.
The downside is of course that it's more effort. The upsides are that you can scale more easily, since you can have multiple machines processing tweets, and they won't ever send out the same tweet twice (whereas if two machines are running the same cron, there's a chance they'll both send out the tweet).
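If you do stay with cron on more than one machine, one common guard against the double send is to let each machine "claim" a tweet with a conditional update before sending it. This is a swapped-in technique, not something the queue approach needs. The sketch below is in Python with an in-memory SQLite database purely so it is self-contained; the table layout and the `claim_tweet` name are assumptions, and in practice the machines would share one database server.

```python
import sqlite3

def claim_tweet(conn, tweet_id, worker):
    """Mark the tweet as being sent; only one caller can succeed per tweet."""
    cur = conn.execute(
        "UPDATE tweets SET status = 'sending', claimed_by = ? "
        "WHERE id = ? AND status = 'pending'",
        (worker, tweet_id),
    )
    conn.commit()
    # rowcount is 1 only for the caller whose UPDATE actually changed the row.
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, status TEXT, claimed_by TEXT)")
conn.execute("INSERT INTO tweets VALUES (1, 'pending', NULL)")

first = claim_tweet(conn, 1, "machine-a")
second = claim_tweet(conn, 1, "machine-b")
print(first, second)
```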
We have a web app that uses IMAP to conditionally insert messages into users' mailboxes at user-defined times.
Each of these 'jobs' is stored in a MySQL DB with a timestamp for when the job should be run (which may be months in the future). Jobs can be cancelled at any time by the user.
The problem is that making IMAP connections is a slow process, and before we insert the message we often have to conditionally check whether there is a reply from someone in the inbox (or similar), which adds considerable processing overhead to each job.
We currently have a system where a cron script runs every minute or so and gets all the jobs from the DB that need delivering in the next X minutes. It then splits them up into batches of Z jobs, and for each batch performs an asynchronous POST request back to the same server with all the data for those Z jobs (in order to achieve 'fake' multithreading). The server then processes each batch of Z jobs that comes in via HTTP.
The reason we use an async HTTP POST for multithreading and not something like pcntl_fork is so that we can add other servers and POST the data to those instead, having them run the jobs rather than the current server.
So my question is - is there a better way to do this?
I appreciate work queues like beanstalkd are available to use, but do they fit with the model of having to run jobs at specific times?
Also, because we need to keep the jobs in the DB anyway (because we need to provide the users with a UI for managing the jobs), would adding a work queue in there somewhere actually be adding more overhead rather than reducing it?
I'm sure there are better ways to achieve what we need - any suggestions would be much appreciated!
We're using PHP for all this so a PHP-based/compatible solution is really what we are looking for.
Beanstalkd would be a reasonable way to do this. It has the concept of put-with-delay, so you can regularly fill the queue from your primary store with a message that will be able to be reserved, and run, in X seconds (time you want it to run - the time now).
The workers would then run as normal, connecting to the beanstalkd daemon and waiting for a new job to be reserved. It would also be a lot more efficient without the overhead of an HTTP connection. As an example, I used to post messages to Amazon SQS (over HTTP). This could barely do 20 QPS at the very most, but Beanstalkd accepted over a thousand per second with barely any effort.
Edited to add: You can't delete a job without knowing its ID, though you could store that outside. OTOH, do users have to be able to delete jobs at any time up to the last minute? You don't have to put a job into the queue weeks or months in advance, and so you would still only have one DB-reader that ran every, say, 1 to 5 minutes to put the next few jobs into the queue, and still have as many workers as you need, with the efficiencies they can bring.
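Since the jobs stay in the DB anyway, an alternative to deleting queued jobs by ID is to let the worker re-check the row just before it does the work, and skip the job if it was cancelled or rescheduled in the meantime. This is a different technique from storing the queue's job ID, shown here only as a sketch: Python with in-memory SQLite so it is self-contained, and with a made-up `jobs` table and `should_run` helper.

```python
import sqlite3

def should_run(conn, job_id, queued_run_at):
    """Worker-side check: run only if the row is still pending and unchanged."""
    row = conn.execute(
        "SELECT status, run_at FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    return row is not None and row[0] == "pending" and row[1] == queued_run_at

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, run_at TEXT, status TEXT)")
for job_id in (1, 2, 3):
    conn.execute("INSERT INTO jobs VALUES (?, '2024-05-01T09:00', 'pending')", (job_id,))

# After all three were queued, the user cancels job 2 and reschedules job 3.
conn.execute("UPDATE jobs SET status = 'cancelled' WHERE id = 2")
conn.execute("UPDATE jobs SET run_at = '2024-05-02T09:00' WHERE id = 3")

results = {job_id: should_run(conn, job_id, "2024-05-01T09:00") for job_id in (1, 2, 3)}
print(results)
```

A stale queue entry then costs one cheap lookup instead of an IMAP connection, and the rescheduled job is simply picked up again by the DB-reader at its new time.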
Ultimately, it depends on the number of DB read/writes that you are doing, and how the database server is able to handle them.
If what you are doing is not a problem now, and won't become so with additional load, then carry on.
I'm new to PHP, so I need some guidance as to which would be the simplest and/or elegant solution to the following problem:
I'm working on a project that has a table with as many as 500,000 records. At user-specified periods, a background task must be started that invokes a command-line application on the server, which does the magic. The problem is that every minute or so I need to check all 500,000 records (and counting) to see if something needs to be done.
As the title says, it is time-critical: a maximum delay of 1 minute is allowed between the time expected by the user and the time the task is executed. Of course, the less delay the better.
Thus far, I can only think of a very dirty option: have a simple utility app running on the server that, every minute, makes multiple requests to the server, for example:
check records between 1 and 100,000;
check records between 100,000 and 200,000;
etc. you get the point;
and the server basically starts a task for each batch of 100,000 records or fewer. But it seems to me that there must be a faster approach, something similar to Facebook's notifications.
Additional info:
server is Windows 2008
using apache + php
EDIT 1
users have an average of 3 tasks per day at about 6-8 hours interval
more than half of the tasks can be executed at the same time at least once per day [!]
Any suggestion is highly appreciated!
The easiest approach would be using a persistent task that runs the whole time and receives notification about records that need to be processed. Then it could process them immediately or, in case it needs to be processed at a certain time, it could sleep until either that time is reached or another notification arrives.
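A sketch of such a persistent task, in Python for brevity (the question is about PHP on Windows, so treat this only as an illustration of the idea). The `SleepingScheduler` class and its method names are made up. The worker sleeps on a condition variable until either the earliest due time is reached or a new notification arrives.

```python
import heapq
import threading
import time

class SleepingScheduler:
    def __init__(self):
        self._heap = []  # entries are (run_at, task), earliest first
        self._cond = threading.Condition()
        self._stopped = False

    def notify(self, run_at, task):
        """Register a task; wakes the worker so it can recompute its sleep."""
        with self._cond:
            heapq.heappush(self._heap, (run_at, task))
            self._cond.notify()

    def stop(self):
        with self._cond:
            self._stopped = True
            self._cond.notify()

    def run(self, handler):
        while True:
            with self._cond:
                while not self._stopped:
                    if self._heap:
                        wait = self._heap[0][0] - time.monotonic()
                        if wait <= 0:
                            break  # the earliest task is due
                        self._cond.wait(timeout=wait)
                    else:
                        self._cond.wait()  # nothing scheduled: sleep until notified
                if self._stopped:
                    return
                _, task = heapq.heappop(self._heap)
            handler(task)  # run outside the lock so notify() is never blocked

done = []
scheduler = SleepingScheduler()
worker = threading.Thread(target=scheduler.run, args=(done.append,))
worker.start()
start = time.monotonic()
scheduler.notify(start + 0.2, "later")
scheduler.notify(start + 0.05, "sooner")  # arrives second but is due first
time.sleep(0.5)
scheduler.stop()
worker.join()
print(done)
```

With this shape nothing scans all 500,000 records every minute; the process only ever looks at the next due entry.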
I think I gave this question more than enough time. I will stick with a utility application (sitting on the server) that makes requests to a URL accessible only from the server's IP; that URL will spawn a new thread for each task if multiple tasks need to be executed at the same time. It's not really scalable, but it will have to do for now.