I want to offload some of the time-consuming things into a queue. For this I found Gearman to be the most widely used option, but I don't know if it is the right thing for me.
One of the tasks we want to queue is sending emails, and we want to offer the ability to cancel the mail for up to 1 minute. So the worker should not pick up the job right away but execute it at now + 1 minute. That way I can cancel the job before then and it never gets sent.
Is there a way to do this?
It will run on Debian and should be usable from PHP. The only thing I found so far was "Schedule a job in Gearman for a specific date and time", but that relies on something that isn't widely available :(
There are two parts to your question: (1) scheduling in the future and (2) being able to cancel the job until that time.
For (1), "at" should work just fine as described in that question, and the author even posted his wrapper code. Have you tried it?
If you don't want to use that, consider this scenario:
insert an email record for the email to-be-sent in a database, including a "timeSent" column which you will set 1 minute in the future.
have a single Gearman worker (I'll explain why single) look at the database for emails that have not been sent yet (e.g. some status column = 0) and whose timeSent has already passed, and send those.
So, for (2), if you want to cancel an email before it's sent just update its status column to something else.
Your Gearman worker has to be a single one, because if you have multiple workers they might fetch and try to send the same email record. If you need multiple, make sure the one that gets the email record first locks it immediately, before any time-consuming operations like actually emailing it (say, by updating that status column to something else).
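A minimal sketch of that single polling worker, assuming a hypothetical "emails" table with the "status" and "timeSent" columns described above (0 = pending, 1 = sending, 2 = sent, 3 = cancelled) and PHP's built-in mail() for the actual send:

<?php
// Minimal sketch of the single polling worker; table name and mail() are assumptions.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

while (true) {
    $due = $pdo->query(
        "SELECT id, recipient, subject, body FROM emails
         WHERE status = 0 AND timeSent <= NOW()"
    )->fetchAll(PDO::FETCH_ASSOC);

    foreach ($due as $email) {
        // Lock the record first by flipping its status, as suggested above;
        // rowCount() tells us whether we actually got it.
        $claim = $pdo->prepare("UPDATE emails SET status = 1 WHERE id = ? AND status = 0");
        $claim->execute([$email['id']]);
        if ($claim->rowCount() === 0) {
            continue;   // cancelled (or grabbed by another worker) in the meantime
        }

        mail($email['recipient'], $email['subject'], $email['body']);
        $pdo->prepare("UPDATE emails SET status = 2 WHERE id = ?")->execute([$email['id']]);
    }

    sleep(5);   // nothing else due yet, poll again shortly
}

Cancelling within the minute is then just UPDATE emails SET status = 3 WHERE id = ? AND status = 0.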
I'm developing a tool for my company that, in very broad strokes, is intended to message users some information (a link to a picture) if they decide they want to be notified when it comes online.
If it were just alerting them when it's online, that would be easy, because you don't have to schedule anything: just check the list to see if anyone wants to be messaged about the picture when it comes online, and do it.
But we also have a "We've got your petition, the picture is not here yet but we'll message you when it is" kind of message, and a few "not here yet" messages that I have to send days later if the picture still isn't online. And all these scheduled jobs need to be cancelled if at any moment the picture comes online, at which point we send the message with its link.
I'll try to explain it in as much detail as I can:
1. The user asks to be notified.
2. Our web takes note of the petition (adds it to the DB).
3. Check if the file is already online. At this point in time the file might not have been uploaded yet, but it may be mere seconds away from it. So if the picture is not online yet when the petition is made, we want to wait 1-2 minutes before sending the "picture not yet here" message, just so we don't send a "not yet here" message and 5 seconds later a "here's your picture" one.
4. We want to wait a few hours (1-3) to send a new message asking them to be patient.
5. After another set amount of time (approx. 7 days) we want to send a last message, letting them know that the picture might never reach them because it's not being uploaded.
6. At any given time, even after point 5, if the picture comes online we want to cancel all these scheduled messages and send the message with the picture.
I've been trying to learn how to do this, and through my search I've learned of three possible ways to achieve this functionality:
Option A: A single Cronjob executing every minute, that sweeps the database table searching if it's time to send one of those messages.
This option is easy to understand, although I'm afraid it might tax the database too much. I can use the shifty control panel that 1and1 has to set up that single Cronjob and call it a day.
Option B: Programmatically write Cronjobs for every message that gets scheduled.
This sounds like it would be more "efficient", or at least less taxing on the DB, but I'm not sure cron is supposed to work like that. It's normally used to schedule tasks that repeat themselves, isn't it? This would need a whole lot of functions to work (read the crontab, add a line, search for a line, edit a line, delete a line). The problem here is that I don't know how to edit the crontab on 1and1's servers via PHP. I have tried to contact them, but their support has not been helpful at all.
Option C: The "at" function in linux.
This I just learned about. It looks like it would do what I want: schedule a task that happens only once, and its structure seems pretty easy to handle. The problem here is threefold: 1- I don't know if PHP can execute command lines, 2- I don't know if the server at 1and1 has the "at" program installed, 3- I don't know if I can get a command line to execute a PHP file with the arguments needed to make it work.
And if any of these can be done, I don't know how.
As you can see there are plenty of things I don't know about, but I've been trying to inform myself and learn. I only ask here because I'm at the end of my rope.
These options I listed are not an exhaustive list, they are just the methods I've found.
Which method would serve my purpose better? And how to do it?
Relevant facts:
Our host and database are located at 1and1, on a virtual server (meaning we don't have a whole server to ourselves, but share one with other clients).
Although we have "unlimited" database space and queries, there is still a hard limit on how many queries you can run in a given time frame.
I'm new-ish to using linux, and I have not worked with PHP for years (until I've got this job!), so it would be better if your explanation doesn't assume deep knowledge on my part.
Programmatically write Cronjobs for every message that gets scheduled.
God no, the biggest issue you'll have with that is that you will have to anticipate in advance what kinds of messages you'll have to send when, and that clashes with your ability to easily cancel messages. You will generally have to worry about a lot of state to manage (more on that later), which is a lot of hassle. You also need to ensure your scheduled jobs are cleaned up again afterwards, since cron can only set repeating tasks.
The "at" function in linux.
This is basically cron but for non-repeating tasks. That's better, but is still stateful. Especially with shared hosts it's also somewhat unpredictable whether your code will always execute on the same machine or when a machine might reboot. In those circumstances you may lose your scheduled jobs, so this is a no-go.
A single Cronjob executing every minute, that sweeps the database table searching if it's time to send one of those messages.
Yes, this is the way to do it. The biggest advantage here is that it's stateless, in the sense that it will always pick up on the exact current contents of your database, so it lets you easily manage what your job should be doing on its next run without having to anticipate that at the time you schedule an event.
I'm afraid it might tax the database too much.
It's one query per minute (if you write it well). Presumably every single page load of your website will incur one or multiple queries, and a properly built site should be able to handle hundreds to thousands of loads per second. One more query per minute isn't going to tank it. If it does, you have bigger issues.
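For illustration, that one well-written query might look like the sketch below; the "notifications" table and its columns are assumptions, and an index covering (status, send_at) is what keeps the sweep cheap however large the table grows.

<?php
// Sketch only: the "notifications" table and its columns are hypothetical.
// With an index on (status, send_at) this stays one cheap query per minute.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$due = $pdo->query(
    "SELECT id, user_id, kind FROM notifications
     WHERE status = 'pending' AND send_at <= NOW()"
)->fetchAll(PDO::FETCH_ASSOC);

foreach ($due as $row) {
    // send the message for $row, then flip its status so it is never sent twice
    $pdo->prepare("UPDATE notifications SET status = 'sent' WHERE id = ?")
        ->execute([$row['id']]);
}

Cancelling a scheduled message is then just another UPDATE that sets its status to 'cancelled' before the sweep picks it up.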
I would personally choose option A; I'm already using it on a project I worked on.
In your case, with the data on shared hosting, I would create a cron job that runs every minute (using an online cron service) and hits a PHP file somewhere in your folders, which checks a database table to see whether anything must be done.
You should write some code that handles all the notifications you want to send and when, creating, for each of them, a row in the DB table with the time of execution and all the details needed to build the notification and send it out.
The entire thing would work more or less as follows:
- Something happens that requires the creation of a notification to be sent out in 5 minutes: the row is created in the db table with the unix time or date of 5 minutes from now.
- A notification needs to be sent out 3 days from now, you use the same procedure as above.
The cron job runs every minute and checks for expired orders (anything with date <= now); if there are any, a script takes care of those rows and executes the orders (sending out only the notifications required).
The database wouldn't be bothered too much, having to perform only 1 query per minute (just checking for expired orders).
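For example, scheduling the rows described above might look like this sketch (the table, column names and the $userId variable are assumptions); cancelling is just a status flip before the per-minute check picks the rows up.

<?php
// Sketch: create the scheduled rows the per-minute cron will later pick up.
// Table and column names are assumptions.
$pdo    = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$userId = 123;   // hypothetical id of the user/request being handled

$insert = $pdo->prepare(
    "INSERT INTO notifications (user_id, kind, send_at, status)
     VALUES (?, ?, DATE_ADD(NOW(), INTERVAL ? MINUTE), 'pending')"
);
$insert->execute([$userId, 'not_here_yet', 5]);        // "picture not here yet", in 5 minutes
$insert->execute([$userId, 'be_patient',   3 * 60]);   // patience reminder, in 3 hours

// If the picture comes online, cancel everything still pending for this user:
$pdo->prepare("UPDATE notifications SET status = 'cancelled'
               WHERE user_id = ? AND status = 'pending'")
    ->execute([$userId]);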
I am developing a Web Application for businesses to track the status of their repairs & part orders that is running on LAMP (Linux Apache MySQL PHP). I just need some input as to how I should go about allowing users to customize the frequency of email notifications.
Currently, I just have a cron job running every Monday at 6:00 AM that runs a PHP script which sends each user an email listing their unprocessed jobs. But I would like to give users the flexibility of choosing not only the time the emails are sent at, but the days of the week as well.
One idea I had was to store their email notification preferences in a MySQL database, then write a PHP script that sends the notification email only if the current date/time fits the criteria they have set, with a check to prevent it from being sent twice within the same cycle. Then I could just run the cron job every minute, or every 5, or whatever.
Or would it be better to somehow create individual cron jobs for each user programmatically via PHP?
Any input would be greatly appreciated! :)
No, your first approach is the right one.
Individual crons would consume a lot of resources. Imagine 10k users each asking for mail at a different time ... that implies 10k tasks.
The best solution is to create one cron task that iterates over your users and takes the correct actions.
Iterate over your users, check the date/time they set up, detect whether a notification is due, and send the mail, setting a flag somewhere to say "it's done" (an attribute like last_cron_scandate or next_calculated_cron_scandate could be a good solution); see the sketch below.
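A rough sketch of that iteration, assuming a hypothetical users table with notify_day, notify_hour and last_cron_scandate columns:

<?php
// Rough sketch; the "users" table and its notify_day / notify_hour /
// last_cron_scandate columns are assumptions for illustration.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$now = new DateTime();

$users = $pdo->query(
    "SELECT id, email, notify_day, notify_hour, last_cron_scandate FROM users"
)->fetchAll(PDO::FETCH_ASSOC);

foreach ($users as $u) {
    $dueToday    = ((int) $now->format('N') === (int) $u['notify_day']);   // 1 = Monday ... 7 = Sunday
    $dueThisHour = ((int) $now->format('G') === (int) $u['notify_hour']);
    $notSentYet  = ($u['last_cron_scandate'] < $now->format('Y-m-d'));     // at most once per day

    if ($dueToday && $dueThisHour && $notSentYet) {
        // ... build and send the "unprocessed jobs" email for $u here ...
        $pdo->prepare("UPDATE users SET last_cron_scandate = ? WHERE id = ?")
            ->execute([$now->format('Y-m-d'), $u['id']]);
    }
}

Running this every minute (or every 5) keeps each user to one mail per configured day, since the date stored in last_cron_scandate blocks a second send within the same cycle.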
I have an application in Symfony that needs to send Emails/Notifications from the App.
Since the Email/Notification sending process takes time, I decided to put them in a Queue and process the Queue periodically. That way I can decrease the response time for the requests that involve Email/Notification dispatch.
The Cron Job (a PHP script / Symfony route) that processes the queue runs every 30 seconds and checks whether there are any unsent Emails/Notifications; if so, it gets all the data from the Queue table and starts sending them. When an Email/Notification is sent, the row's status flag in the Queue table is updated to show that it has been sent.
Now, when there are more Emails in the Queue than can be sent within 30 seconds, another Cron Job also starts running and begins sending emails from the same Queue, resulting in duplicate Emails/Notifications being dispatched.
My Table structure for Email Queue is as follows :
|-------------------------------------|
| id | email | body | status | sentat |
|-------------------------------------|
My Ideas to resolve this issue are as follows :
Set a flag in Database that a Cron Job is running, and no other Cron Jobs should proceed if found the flag set.
Update status as 'sent' for all records and then start sending Emails/Notifications.
So my question is: is there an efficient approach to processing such queues? Is there any Symfony bundle/feature for this specific task?
So my question is: is there an efficient approach to processing such queues? Is there any Symfony bundle/feature for this specific task?
You can take enqueue-bundle plus doctrine dbal transport.
It already takes care of race conditions and other stuff.
Regarding your suggestions:
What if the cronjob process dies (for whatever reason) and cannot clean up the flag? A flag is not a good idea, I think. If you would like to follow this approach, you should not use a boolean, but rather either a process ID or a timestamp, so that you can check if the process is still alive or if it started a suspiciously long time ago without cleaning up.
Same question: what if the process dies? You don’t want to mark the mails as sent before they are sent.
I guess I’d probably use two fields: one for marking a record as “sending in progress” (thus telling other processes to skip this record) and another one for marking it as “sending successfully completed”. I’d write a timestamp to both, so that I can (automatically or manually) find those records where the “sending in progress” is > X seconds in the past, which would be an indicator for a died process.
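A sketch of that two-field idea in PHP/PDO follows; the "email_queue" table, the claimed_by / sending_started_at / sent_at columns, the 300-second dead-process threshold and the batch size are all assumptions added for illustration.

<?php
// Sketch of the two-timestamp idea; table/column names are assumptions.
// Each run claims rows with a unique token so two overlapping cron runs
// never pick up the same row.
$pdo   = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$token = bin2hex(random_bytes(8));           // identifies this cron run

// Claim: mark rows as "sending in progress" by this run, skipping rows
// already claimed recently by another (presumably still alive) run.
$pdo->prepare(
    "UPDATE email_queue
     SET claimed_by = :token, sending_started_at = NOW()
     WHERE sent_at IS NULL
       AND (sending_started_at IS NULL
            OR sending_started_at < NOW() - INTERVAL 300 SECOND)
     LIMIT 20"
)->execute(['token' => $token]);

// Send only what this run claimed, then mark each row as done.
$rows = $pdo->prepare("SELECT id, email, body FROM email_queue WHERE claimed_by = ? AND sent_at IS NULL");
$rows->execute([$token]);
foreach ($rows->fetchAll(PDO::FETCH_ASSOC) as $row) {
    // ... actually send $row['email'] here ...
    $pdo->prepare("UPDATE email_queue SET sent_at = NOW() WHERE id = ?")
        ->execute([$row['id']]);
}

The timestamp check lets a later run take over rows whose previous attempt apparently died, which addresses the "what if the process dies?" concern above.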
You can use database transactions here; the rest will be handled by the database's locking mechanism and concurrency control. Generally, whatever DML/DCL/DDL commands you issue are treated as isolated transactions. In your question, if the 2nd cron job reads the rows before the 1st cron job has marked them as sent, it will find the emails unsent and try to send them again; and if the 3rd job finds them unsent before the 2nd has marked them sent, it will do the same. So this can cause a big problem for you.
Whatever approach you take, there will be a race condition, so let the database handle it. There are many concurrency control methods you can refer to.
START TRANSACTION;
-- Perform your actions here: any number of reads/writes,
-- e.g. a SELECT ... FOR UPDATE followed by the UPDATE that marks rows as sent.
COMMIT;
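In PHP with PDO, that pattern might look like the sketch below; the "email_queue" table name is an assumption, while the email, body, status and sentat columns follow the table structure in the question.

<?php
// Sketch: claim-and-mark inside one transaction, so a competing cron run
// blocks on the row locks instead of re-reading "unsent" rows.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->beginTransaction();
try {
    // FOR UPDATE locks the selected rows until COMMIT.
    $rows = $pdo->query(
        "SELECT id, email, body FROM email_queue
         WHERE status = 'unsent' LIMIT 20 FOR UPDATE"
    )->fetchAll(PDO::FETCH_ASSOC);

    foreach ($rows as $row) {
        // ... send the email for $row here ...
        $pdo->prepare("UPDATE email_queue SET status = 'sent', sentat = NOW() WHERE id = ?")
            ->execute([$row['id']]);
    }
    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollBack();
    throw $e;
}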
Still, there is one problem with this solution. You will find at some point that, as the number of read/write operations increases, some inconsistency still remains.
Here the database's isolation level comes in. It is the factor that defines how isolated two transactions are from each other, and how they are scheduled to run concurrently.
You can set the isolation level as per your requirements. Remember that concurrency is inversely proportional to the isolation level, so analyse your read/write statements and figure out which level you need; do not use a higher level than that. Here are some links which may help you:
http://www.ibm.com/developerworks/data/zones/informix/library/techarticle/db_isolevels.html
Difference between read commit and repeatable read
http://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-isolation-levels.htm
If you can post your database operations here, I can suggest a suitable isolation level.
Here is the scenario: I'm using a Beanstalkd queue to send email to a huge list of addresses (50,000+). Each email has to have some unique content, so the fired job loops over all the addresses, generates the content and sends the mail.
Sometimes the user may want to cancel the operation in the middle of sending. For example, while the job is running and after the mail has been sent to, say, 20,000 addresses, the user clicks "Stop", which should "delete" the job.
What I have done so far: I managed to get the running job instance. Queue::push returns the job ID, so I save this ID in the DB, and when I want to stop the job this is what I tried:
$phean = Queue::getPheanstalk();
$res = $phean->peek($Job_ID); // returns a Pheanstalk_Job
$job = new \Illuminate\Queue\Jobs\BeanstalkdJob(app(), $phean, $res, 'default');
$res = $job->delete(); // returns NOT_FOUND ??
$data = $job->getRawBody(); // returns correct data, so I'm sure this is the right job instance
so why do I get NOT_FOUND, although when I use
supervisorctl tail -f queuename
I can see that the job is still running and outputting content
Any help? If there is a better approach than getting the job and deleting it this way, I'm open to suggestions. I thought about saving the job ID in the database (ID, status), changing the status when I want to delete the job, and having the loop running inside the job check it on every iteration (or maybe every 10 iterations) and call $job->delete() if the status equals 1, for example; but that would be slow, as it would hit the DB on every loop.
So you have a main job, which you reserve() and hold open, and in that job you create many emails directly.
Since the job you are trying to delete is currently reserved, you can't delete it. Even if you could, how would the currently running job be informed by Beanstalkd?
Instead, I would have the main loop check for any jobs on some separate control tube (you could do a quick check every say 10 or 100 emails sent) - just to request a new job, but not waiting if there isn't anything there. If there is a job there, then the main process cleans up and exits.
Another idea is not to actually send email in the main loop, but instead put the details of what emails to send, one per address, into the queue. Other processes read that mass-queue and start sending emails, but again, also read a control tube (with a higher priority message that would be returned ahead of the lower-priority email/details message). If there's anything in the control tube, stop sending emails. You would need at least as many STOP messages in the control tube as you have workers.
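A rough sketch of that control-tube check inside the main sending loop, using the Pheanstalk client the question already obtains; the tube name, the check interval of 100 and the surrounding variables are assumptions, and the exact reserve method differs slightly between Pheanstalk versions.

<?php
// Sketch: every 100 emails, peek at a separate "email-control" tube; any job
// found there is treated as a stop signal. Names and numbers are assumptions.
$pheanstalk = Queue::getPheanstalk();                 // same instance as in the question
$pheanstalk->watch('email-control')->ignore('default');

$addresses = [];   // the 50,000+ recipient list the job already loads

foreach ($addresses as $i => $address) {
    // ... generate the unique content and send the email to $address ...

    if ($i % 100 === 0) {                             // quick, non-blocking check
        $stop = $pheanstalk->reserve(0);              // 0-second timeout; newer versions
                                                      // use reserveWithTimeout(0) instead
        if ($stop) {
            $pheanstalk->delete($stop);               // consume the stop signal
            // clean up (record progress in the DB, etc.) and end the job here
            break;
        }
    }
}

To stop the run, the "Stop" button then only has to put one message into email-control (one per worker, if several are sending).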
I'm developing a web game (JS, PHP, MySQL) in which you click a button to start an action that takes time to complete (let's say 10 hours); when it finishes, some points are added to that player's total. The problem is that I need those points to be added even if the player is not online at the time the action finishes: for example, I need the rankings updated, or an email sent to the player.
I thought about a cron job constantly checking for ending actions, but I think that would kill the resources (constantly checking against the actions of thousands of players...).
Is there a better solution to this problem?
Thanks for your attention!!
You can just write into your database when it is finished, and when the user logs in you add the earned points to his account. You can also check with a cron job. Even if you have millions of users, this will not kill your server.
Cron is perfect for this. You could write your tasks in stored procedures, then have cron run an SQL script to call the stored procedure that would update the records of your players.
Databases are designed to work with thousands and millions of pieces of information efficiently, so I don't think the idea that it will kill system resources is a valid one, unless your hosting system is really constrained already.
If you want to be safe against cheating you need to do the checking on the server anyway. If the "waiting" happens within JavaScript on the client, someone could easily decrease the remaining time.
So you need to send the job to the server (which is assumed to be safe against clock modifications) and the server will determine the end timestamp. You could store your jobs in a queue.
If you only need this information for the user himself you can just look at the queue when the user logs in. Otherwise run a cron job every minute (or so). This job will mark all jobs finished when their timestamp is in the past (and remove them from the database).
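A minimal sketch of that per-minute job, assuming a hypothetical actions table with an ends_at timestamp and a reward column, plus a players table holding the points:

<?php
// Sketch: credit points for all actions whose end time has passed, then
// remove them. Table and column names are assumptions for illustration.
$pdo = new PDO('mysql:host=localhost;dbname=game', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->beginTransaction();

// Fix a single cutoff so the UPDATE and DELETE see exactly the same rows.
$cutoff = $pdo->query("SELECT NOW()")->fetchColumn();

// Credit every finished action to its player in one statement.
$pdo->prepare(
    "UPDATE players p
     JOIN actions a ON a.player_id = p.id
     SET p.points = p.points + a.reward
     WHERE a.ends_at <= ?"
)->execute([$cutoff]);

// Remove the credited actions so they are never counted twice.
$pdo->prepare("DELETE FROM actions WHERE ends_at <= ?")->execute([$cutoff]);

$pdo->commit();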
If you need more precise checking you will need to come up with an alternative server side solution that is doing this more often (e.g. a simple program polling the database every few seconds).