I came up with a schema for a simple batch emailer I'm making. The emailer will send out X emails every 5 minutes by running a cron'd PHP script. The issue is that I don't think this is the best way to do it, and I was looking for an alternative, better way — or validation ;).
The (simplified) schema would look like:
EmailList | JobQue | Jobs
----------|--------|----------
email     | jobid  | id
          | email  | esubject
          |        | ebody
The idea is that when a new job is created, a row is added to the Jobs table, and a JobQue row is added for every email that needs to be sent.
Then the cron'd PHP script that actually sends the emails just loops through the next X items in the JobQue table, sends them, and deletes them.
Is this a good way of doing it? Will it buckle under moderate load (1000-5000 emails, 1-5 jobs a day)? Of course it would if more emails are being added than sent, but would there be other issues (like adding 1000 records to a table in one go, even if I insert them all with one MySQL query)?
Thanks,
Max
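A minimal sketch of the worker loop described in the question, using SQLite in place of MySQL and a stub in place of the actual mail call. The table and column names come from the schema above; BATCH_SIZE stands in for X:

```python
import sqlite3

# In-memory stand-in for the MySQL tables described in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Jobs (id INTEGER PRIMARY KEY, esubject TEXT, ebody TEXT)")
conn.execute("CREATE TABLE JobQue (jobid INTEGER, email TEXT)")

# A new job: one Jobs row plus one JobQue row per recipient.
conn.execute("INSERT INTO Jobs (id, esubject, ebody) VALUES (1, 'Hello', 'Body text')")
conn.executemany("INSERT INTO JobQue (jobid, email) VALUES (1, ?)",
                 [(f"user{i}@example.com",) for i in range(10)])

BATCH_SIZE = 3  # the "X emails per cron run" limit

def send_mail(to, subject, body):
    # Stub; the real script would call mail() or an SMTP client here.
    print(f"sent '{subject}' to {to}")

def process_batch():
    """One cron run: take the next BATCH_SIZE queue rows, send, delete."""
    rows = conn.execute(
        """SELECT q.rowid, q.email, j.esubject, j.ebody
           FROM JobQue q JOIN Jobs j ON j.id = q.jobid
           LIMIT ?""", (BATCH_SIZE,)).fetchall()
    for rowid, email, subject, body in rows:
        send_mail(email, subject, body)
        conn.execute("DELETE FROM JobQue WHERE rowid = ?", (rowid,))
    conn.commit()
    return len(rows)

sent = process_batch()
remaining = conn.execute("SELECT COUNT(*) FROM JobQue").fetchone()[0]
print(sent, remaining)  # 3 sent, 7 left for later runs
```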
I think this is a very effective way of doing it. The only issue could be if you want to send thousands of emails at once; that could lead to a timeout in PHP.
Adding thousands of records to MySQL with one query is not bad; it's the best way, in my opinion.
But note that this depends on the query itself: if the query string is too long, you can lose the connection to the server (MySQL caps the statement size via max_allowed_packet).
But I don't think you will have any problems with this schema at all.
At a bare minimum, you may want to consider a shorter cron interval than 5 minutes. If you limited X to 30 (a key number for avoiding being listed on a lot of blacklists), your script would take nearly 14 hours to complete 5000 emails.
I developed a web application for running email campaigns. A cron job runs every minute and sends multiple emails (up to 100 per run).
SELECT id,email,email_text FROM recipients
WHERE sent_status=0 LIMIT 100
This script takes approximately 70-100 seconds to send all the emails using PHP. After sending each email, I update sent_status=1.
Now the problem is that, due to shared hosting, the script cannot process more than 50-60 records in 60 seconds, so another run starts and also selects the 40 or so records that are still being processed by the first run and not yet updated. Because of this, some recipients receive duplicate emails.
Can this be prevented by using locking or some other solution?
UPDATE
However, my question is very similar to the linked duplicate question, except that I am actually SELECTing data from multiple tables, using GROUP BY, and using an ORDER BY clause on multiple columns including RAND().
My actual query is something like this:
SELECT s.sender_name,
s.sender_email,
r.recipient_name,
r.email,
c.campaign_id,
c.email_text
FROM users s, recipients r, campaigns c
WHERE c.sender_id=s.sender_id
AND c.recipient_id=r.recipient_id
AND sent_status=0
GROUP BY c.sender_id, r.recipient_id
ORDER BY DATE(previous_sent_time), RAND()
LIMIT 100
Thanks
You shouldn't try to fix this by using some database mechanics.
Instead, you should rethink your method of processing the "sending".
In your case, I would perform the following steps:
Create the emails you want to send and store them in the database. Even 100,000 records in 10 seconds is no issue.
Use a script that processes these records within your limitations (50-60 mails per minute) — a simple SELECT with proper limits, called every minute.
Voila, your mails are being sent. 100,000 mails at 60 mails per minute takes about 28 hours — you can't bypass hosting limitations by altering code.
Wrap the execution in a singleton, or some "locking" method, to make sure there is only one mail-queue processor active. Then you won't have any issues with two runs selecting the same queue entry.
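The "only one queue processor" idea in the last step can also be enforced per row: each run claims a batch with a single atomic UPDATE before sending, so an overlapping run cannot pick up the same entries. A sketch using SQLite; the mail_queue table, status values, and claimed_by column are assumptions, not from the original question:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE mail_queue (
    id INTEGER PRIMARY KEY,
    email TEXT,
    status TEXT DEFAULT 'pending',   -- pending | claimed
    claimed_by TEXT)""")
conn.executemany("INSERT INTO mail_queue (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(5)])
conn.commit()

def claim_batch(limit):
    """Atomically mark up to `limit` pending rows as ours, then fetch them."""
    worker = uuid.uuid4().hex  # unique id for this cron run
    conn.execute(
        """UPDATE mail_queue SET status = 'claimed', claimed_by = ?
           WHERE id IN (SELECT id FROM mail_queue
                        WHERE status = 'pending' LIMIT ?)""",
        (worker, limit))
    conn.commit()
    return conn.execute(
        "SELECT id, email FROM mail_queue WHERE claimed_by = ?",
        (worker,)).fetchall()

batch_a = claim_batch(3)   # first cron run claims 3 rows
batch_b = claim_batch(3)   # an overlapping run only gets the remaining 2
print(len(batch_a), len(batch_b))  # 3 2
```

The same pattern works in MySQL: because the claim is one UPDATE statement, two concurrent runs can never both claim the same row.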
I actually ran into this issue myself when developing a similar app. My solution was that, at the beginning of the cron run, I mark every task the run is going to process as in progress in the database.
Once the script is done, it marks it as done and moves on.
Using this method, if another script runs over the same item, it will automatically skip it.
I've got a website in which I want to send some followup emails to customers a certain number of days after they bought something. I now wonder how to do that. I think there are two options:
Create a table in my DB in which I store the emails I plan to send. I simply store the customer's email address and the date on which I want to send it, then run a cron every day, send the emails that are due, and set their status in the table to "sent". The advantage of this method is that I know which emails are going to be sent. The disadvantage is that I'm less flexible: it's not easy to change the number of days after which I send the emails, because the send dates are already stored in the DB.
I can also do it from the code, by simply running a cron that gets the list of customers who bought something x days ago, sends them the email, and only then stores the fact that I sent it in the database. The advantage of this method is that I'm more flexible: if I want to send the emails later, I can simply define that in the code (or in some variable). The disadvantage is that I don't have a list of emails which are going to be sent (although I don't really know what that would be useful for).
My question is actually; what is best practice in this case? How do most websites handle this and why?
I would choose method 2.
The disadvantage is not really a disadvantage: supposing you have an "orders" table, you can get the list of emails to send just by running a query quite similar to the one your cron uses.
But it is a personal choice; I don't know which method is normally used.
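A sketch of what method 2's query could look like, using SQLite; the orders and followups_sent tables and their column names are hypothetical. A LEFT JOIN against a log of already-sent followups keeps repeated runs from mailing anyone twice:

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, email TEXT, ordered_on TEXT)")
conn.execute("CREATE TABLE followups_sent (order_id INTEGER PRIMARY KEY)")

FOLLOWUP_DAYS = 7  # easy to change in code — the flexibility of method 2
today = date(2024, 5, 15)
conn.executemany("INSERT INTO orders (email, ordered_on) VALUES (?, ?)", [
    ("a@example.com", str(today - timedelta(days=7))),  # due today
    ("b@example.com", str(today - timedelta(days=7))),  # due, but already sent
    ("c@example.com", str(today - timedelta(days=3))),  # not due yet
])
conn.execute("INSERT INTO followups_sent VALUES (2)")  # order 2 already handled

# Orders old enough for a followup that have no sent-record yet.
due = conn.execute(
    """SELECT o.id, o.email FROM orders o
       LEFT JOIN followups_sent f ON f.order_id = o.id
       WHERE o.ordered_on <= ? AND f.order_id IS NULL""",
    (str(today - timedelta(days=FOLLOWUP_DAYS)),)).fetchall()
print(due)  # [(1, 'a@example.com')]
```

Using <= rather than = means a skipped cron run is caught up on the next day.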
I would go on a combination of both options, and that is the method I actually doing so in a system I'm currently developing.
Having a "ready to send" list is useful for logging and tracking your emails, for example, if you use a third party emailing solution, and you have a limited number of emails per month, you can track the amount you used from within your program, and maybe even trigger an automatic "upgrade" of the account if required because you need more emails.
The required flexibility can be achieved by designing a good schema for that table.
The solution you described would, I guess, have a schema like this:
|---------|---------|------|---------|-------|
| send_to | subject | body | send_at | sent |
|---------|---------|------|---------|-------|
That is really not flexible, because once a row is inserted into the database, changing the send_at column means going back to the orders data and recalculating the send_at value.
I propose a schema like so:
|---------|---------|------|-----------|---------|-------|
| send_to | subject | body | added_at | send_in | sent |
|---------|---------|------|-----------|---------|-------|
The change is that the send time is no longer fixed. When you run the cron, you retrieve only the emails that match the following query:
!sent && added_at + send_in <= now
This returns the same result as querying the first schema with:
!sent && send_at <= now
But now you can easily change the waiting time between when the email is added to the queue and when it is actually sent.
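The proposed schema and its cron query could be sketched like this (SQLite stand-in; column names from the proposed schema, timestamps as unix seconds):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE email_queue (
    send_to TEXT, subject TEXT, body TEXT,
    added_at INTEGER,   -- unix timestamp when the row was queued
    send_in INTEGER,    -- seconds to wait before sending
    sent INTEGER DEFAULT 0)""")

now = int(time.time())
conn.executemany(
    "INSERT INTO email_queue (send_to, added_at, send_in) VALUES (?, ?, ?)", [
    ("a@example.com", now - 7200, 3600),  # due: added 2h ago, wait 1h
    ("b@example.com", now - 1800, 3600),  # not due: added 30min ago, wait 1h
])
conn.commit()

# The cron query: unsent rows whose waiting period has elapsed.
due = conn.execute(
    "SELECT send_to FROM email_queue WHERE sent = 0 AND added_at + send_in <= ?",
    (now,)).fetchall()
print(due)  # [('a@example.com',)]
```

Changing the waiting period now only means changing send_in on the unsent rows, with no need to recompute anything from the orders.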
I have an application in Symfony that needs to send emails/notifications from the app.
Since the sending process takes time, I decided to put the messages in a queue and process the queue periodically. That way I can decrease the response time for requests that trigger an email/notification.
The cron job (a PHP script behind a Symfony route) that processes the queue runs every 30 seconds. It checks whether there are any unsent emails/notifications and, if so, fetches all the data from the queue table and starts sending. When a message is sent, the row's status flag is updated to mark it as sent.
Now, when there are enough emails in the queue that sending takes more than 30 seconds, another cron run starts and begins sending from the same queue, resulting in duplicate emails/notifications.
My Table structure for Email Queue is as follows :
|-------------------------------------|
| id | email | body | status | sentat |
|-------------------------------------|
My Ideas to resolve this issue are as follows :
Set a flag in the database indicating that a cron job is running; no other cron job proceeds if it finds the flag set.
Update the status to 'sent' for all records first, and only then start sending the emails/notifications.
So my question is: is there an efficient approach to processing such queues? Is there any Symfony bundle/feature for this specific task?
You can use enqueue-bundle plus its Doctrine DBAL transport.
It already takes care of race conditions and related issues.
Regarding your suggestions:
What if the cronjob process dies (for whatever reason) and cannot clean up the flag? A flag is not a good idea, I think. If you would like to follow this approach, you should not use a boolean, but rather either a process ID or a timestamp, so that you can check if the process is still alive or if it started a suspiciously long time ago without cleaning up.
Same question: what if the process dies? You don’t want to mark the mails as sent before they are sent.
I guess I'd probably use two fields: one for marking a record as "sending in progress" (thus telling other processes to skip this record) and another for marking it as "sending successfully completed". I'd write a timestamp to both, so that I can (automatically or manually) find records where the "sending in progress" timestamp is more than X seconds in the past, which would indicate a dead process.
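The two-timestamp idea might be sketched like this (SQLite stand-in; the table and column names are assumptions). A row is claimable when it was never claimed, or when its claim is older than a chosen staleness threshold:

```python
import sqlite3
import time

STALE_AFTER = 300  # seconds; a claim older than this is assumed dead

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE queue (
    id INTEGER PRIMARY KEY,
    email TEXT,
    sending_started_at INTEGER,  -- set when a worker claims the row
    sent_at INTEGER)             -- set only after a successful send""")

now = int(time.time())
conn.executemany(
    "INSERT INTO queue (email, sending_started_at, sent_at) VALUES (?, ?, ?)", [
    ("fresh@example.com", None, None),           # never claimed
    ("stale@example.com", now - 600, None),      # claimed 10 min ago, never finished
    ("done@example.com", now - 600, now - 590),  # completed normally
])

# Claimable: not sent, and either never claimed or claimed too long ago.
claimable = conn.execute(
    """SELECT email FROM queue
       WHERE sent_at IS NULL
         AND (sending_started_at IS NULL OR sending_started_at < ?)""",
    (now - STALE_AFTER,)).fetchall()
print(claimable)  # [('fresh@example.com',), ('stale@example.com',)]
```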
You can use database transactions here; the rest will be handled by the database's locking mechanism and concurrency control. Generally, each DML/DDL/DCL statement you issue is treated as an isolated transaction. In your question: if the 2nd cron job reads the rows before the 1st cron job has updated them as sent, it will find the emails unsent and try to send them again; and before the 2nd cron job updates them as sent, a 3rd job may find them unsent and do the same. That can cause a big problem for you.
Whatever approach you take, there will be a race condition, so let the database handle it. There are many concurrency-control methods you can refer to.
START TRANSACTION;
/* Perform your actions here: any number of reads/writes.
   A SELECT ... FOR UPDATE locks the selected rows against other transactions. */
COMMIT;
Still, there is one problem with this solution: as the number of read/write operations increases, some inconsistency may still remain.
Here the database's isolation level comes in. It is the factor that defines how strongly two transactions are isolated from each other and how they are scheduled to run concurrently.
You can set the isolation level as per your requirements. Remember that concurrency is inversely proportional to isolation level, so analyse your read/write statements and figure out which level you need; do not use a higher level than that. Here are some links which may help:
http://www.ibm.com/developerworks/data/zones/informix/library/techarticle/db_isolevels.html
Difference between read commit and repeatable read
http://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-isolation-levels.htm
If you post your database operations here, I can suggest a suitable isolation level.
I'm implementing an application that checks whether 4, 6, or 12 hours have passed since a row was inserted in the database.
So for example:
The data was inserted at 1pm, and it's now 5pm (so this is 4 hours after the data was inserted), so my application should process another task (no need for an explanation of this, I've got it covered :D ).
So this is a picture of my database fields:
id | name | time_added | time_to_check
id is of course of type INT.
name is a VARCHAR.
time_added is of type INT because I insert the value from PHP's time() function.
time_to_check is of type INT because this is the field where I'll store the 4, 6, or 12.
So my question is how would I implement this one? Is setting up a Cron Job a good idea to perform this task?
If Yes, every what time should I run the Cron Job (every 15mins,1Hour, Once a day)? I know there are lots of consideration in doing this task. So I need your ideas guys.
If you have an idea please share it to me or even code how you think to implement this one, it would be a great help!
Thank you very much! :)
ADDITIONAL INFO:
My concern about running it every minute is that not all of the data is inserted at the same time. For example: Data1 was added at 3:15pm with a time to check of 4 hours, and Data2 was added at 3:20pm with a time to check of 4 hours. What if the cron job doesn't execute at exactly the moment each row becomes due — what should I do?
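One way to address the "what if the cron run was missed" concern is to select on "due time has passed" rather than "due time equals now", so a late run still picks everything up. A sketch using the fields from the question (the processed flag is an assumed addition to mark handled rows):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    name TEXT,
    time_added INTEGER,     -- unix timestamp, as from PHP's time()
    time_to_check INTEGER,  -- hours to wait: 4, 6 or 12
    processed INTEGER DEFAULT 0)""")

now = int(time.time())
conn.executemany(
    "INSERT INTO items (name, time_added, time_to_check) VALUES (?, ?, ?)", [
    ("Data1", now - 5 * 3600, 4),  # 5h old, due at 4h -> overdue, still caught
    ("Data2", now - 2 * 3600, 4),  # 2h old -> not yet due
])

# <= rather than == means a late cron run still picks up everything due.
due = conn.execute(
    """SELECT name FROM items
       WHERE processed = 0 AND time_added + time_to_check * 3600 <= ?""",
    (now,)).fetchall()
print(due)  # [('Data1',)]
```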
If you need data to be routinely processed at 4, 6 or 12 hours after insert, you could also look at the Linux (not PHP!) 'at' command which allows you to queue processes to execute at a particular time. If you are expecting a lot of inserts then cron remains a better option though.
Cron is probably your best bet: have it check the database for records that are due and go through them.
As for how often you want it, it mostly depends on how accurate you need the time to be. For instance, in my game I have a cron running every minute to check for fleets arriving at their destinations, but only every hour for checking if any banned users have served their sentence and need unbanning.
Yes, a cron job would be good for this task. You would need to run it at an interval that divides all of the times you'd like to check — their greatest common divisor — so from what I can see you should run it every 2 hours. (There would be a slight margin of error, though, due to rounding.) If you need complete accuracy you should check every minute.
I would recommend doing this task with a cron job. The usage and some examples are explained here: http://en.wikipedia.org/wiki/Cron
The timeframe depends on your needs. If you check for 4, 6, or 12 hours, one cron run per hour seems to be OK.
This is more in search of advice/best practice. We have a site with many users ( > 200,000 ), and we need to send emails to all of them about events occurring in their areas. What would be the best way to stagger the adding of the jobs?
Things to note:
We store everything in a MySQL database
The emails go out on a queue-based system, with independent workers grabbing the tasks and sending them out.
We collect username, join date that we can use for grouping
Sending the emails is not the problem, the problem is getting the jobs added. I am afraid of a performance hit if we suddenly try to add that many jobs at once.
I assume your requirement is something like sending newsletters to groups and subscribers.
Do you already have groups, and is it possible to implement them?
Grouping would help you avoid filtering the entire 200,000 users; sending the emails per group should reduce the DB load. An active/inactive status per user in the DB can also help.
Running a cron job is the solution, but the interval should be based on the load the job puts on your server.
So if the DB design and the job intervals are good, performance will be better.
I assume your queue is a table in a database and you concern is that adding thousands of records to a table will thrash it because the index gets rebuilt each time?
If you add many entries within a single process (eg. a single http-request or a single cronjob script), you can start a transaction before inserting and commit when done. With all the inserts inside a transaction, the index will only be updated once.
If it's a more general problem, you might want to consider using a message queue instead of a database table.
Or am I completely off?
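The transaction-around-the-batch idea might be sketched like this (SQLite stand-in; the jobs table is hypothetical). All inserts happen inside one transaction, so the work is committed once rather than once per row:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, email TEXT)")

emails = [(f"user{i}@example.com",) for i in range(200_000)]

# One transaction around all inserts: commit (and durable index update)
# happens once instead of per row.
start = time.perf_counter()
with conn:  # opens a transaction, commits on success
    conn.executemany("INSERT INTO jobs (email) VALUES (?)", emails)
elapsed = time.perf_counter() - start

count = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
print(count, f"{elapsed:.2f}s")
```

The same applies to MySQL/InnoDB: wrapping a bulk load in START TRANSACTION ... COMMIT avoids a per-row commit, which is usually the dominant cost.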
Set a cron job for every 5 minutes. Have it check whether there are emails to send. If there are, and none are marked as "in progress" yet, pick the first one and mark it as in progress. Select the first n users ordered by id and send to them, keep track of the last id you reached, and on the next pass continue from there, repeating until you reach the end of the user list.
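The "keep track of the last id" loop could be sketched like this (SQLite stand-in; the users table and batch size are hypothetical). Each pass resumes from where the previous one stopped, which is cheap even on very large tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(10)])

BATCH = 4
last_id = 0   # persisted between cron runs in a real system
batches = []

while True:
    rows = conn.execute(
        "SELECT id, email FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, BATCH)).fetchall()
    if not rows:
        break
    batches.append([email for _, email in rows])  # "send" this batch
    last_id = rows[-1][0]  # remember where we stopped

print([len(b) for b in batches])  # [4, 4, 2]
```

Seeking by `id > last_id` (keyset pagination) stays fast as the table grows, unlike `OFFSET`, which rescans skipped rows on every pass.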