Prevent SELECTing the same rows in concurrent processes - MySQL / PHP

I developed a web application for running email campaigns. A cron job runs every minute to send multiple emails (up to 100 per request):
SELECT id,email,email_text FROM recipients
WHERE sent_status=0 LIMIT 100
This script takes approximately 70-100 seconds to send all the emails using PHP. After sending each email, I update sent_status=1.
The problem is that, due to shared hosting, the script cannot process more than 50-60 records in 60 seconds, so another request starts and also selects the 40 or so records that are still being processed by the first request and have not yet been updated. Because of this, some recipients receive duplicate emails.
Can this be prevented by using locking or some other solution?
UPDATE
However, my question is very similar to the linked duplicate question, except that I am actually SELECTing data from multiple tables, using GROUP BY, and using an ORDER BY clause on multiple columns including RAND().
My actual query is something like this:
SELECT s.sender_name,
s.sender_email,
r.recipient_name,
r.email,
c.campaign_id,
c.email_text
FROM users s, recipients r, campaigns c
WHERE c.sender_id=s.sender_id
AND c.recipient_id=r.recipient_id
AND sent_status=0
GROUP BY c.sender_id, r.recipient_id
ORDER BY DATE(previous_sent_time), RAND()
LIMIT 100
Thanks

You shouldn't try to fix this with database mechanics.
Instead, you should rethink your method of processing the sending.
In your case, I would perform the following steps:
Create the emails you want to send and store them in the database. Maybe 100,000 records in 10 seconds - that's no issue.
Use a script that processes these records according to your limitations (50-60 mails per minute) - that's a simple SELECT with proper limits, called every minute.
Voila, your mails are being sent. 100,000 mails at 60 mails per minute would take about 27 hours - but you can't bypass hosting limitations by altering code.
Wrap the execution in a singleton, or some "locking" method, to make sure there is only one mail-queue processor active. Then you won't have any issues with double selects of the same mail-queue entry.
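For that last point, here is a minimal sketch of one way to keep a single processor active, using MySQL's GET_LOCK(). The table and column names (email_queue, sent_status) and the PDO connection details are assumptions, not the asker's actual schema:
<?php
// Hedged sketch only: adjust table/column names to your schema.
$pdo = new PDO('mysql:host=localhost;dbname=mailer', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// Try to take a named lock; timeout 0 means give up immediately
// if a previous cron run is still sending.
$got = $pdo->query("SELECT GET_LOCK('mail_queue_processor', 0)")->fetchColumn();
if (!$got) {
    exit; // another instance is still running, skip this minute
}
$rows = $pdo->query(
    "SELECT id, email, email_text FROM email_queue WHERE sent_status = 0 LIMIT 100"
)->fetchAll(PDO::FETCH_ASSOC);
$mark = $pdo->prepare("UPDATE email_queue SET sent_status = 1 WHERE id = ?");
foreach ($rows as $row) {
    mail($row['email'], 'Campaign', $row['email_text']); // or your mail library
    $mark->execute([$row['id']]);
}
$pdo->query("SELECT RELEASE_LOCK('mail_queue_processor')");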

I actually ran into this issue myself when developing a similar app. My solution was that, at the beginning of the cron run, I mark every task that is about to be processed as being in process in the database.
Once the script is done with an item, it marks it as done and moves on.
Using this method, if another script run comes across the same item, it will automatically skip it.
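A rough sketch of that claim-first pattern, assuming a hypothetical extra column (claimed_by) on the recipients table - not the answerer's actual code:
<?php
// Hedged sketch: claimed_by is an assumed column; adjust names to your schema.
$pdo = new PDO('mysql:host=localhost;dbname=mailer', 'user', 'pass');
$token = uniqid('cron_', true);
// Atomically claim up to 100 unsent, unclaimed rows for this run.
$claim = $pdo->prepare("UPDATE recipients SET claimed_by = ? WHERE sent_status = 0 AND claimed_by IS NULL LIMIT 100");
$claim->execute([$token]);
// Only rows claimed by this run are selected; an overlapping run claims different rows.
$select = $pdo->prepare("SELECT id, email, email_text FROM recipients WHERE claimed_by = ?");
$select->execute([$token]);
$mark = $pdo->prepare("UPDATE recipients SET sent_status = 1 WHERE id = ?");
foreach ($select as $row) {
    mail($row['email'], 'Campaign', $row['email_text']);
    $mark->execute([$row['id']]);
}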

Related

Inserting large number (millions) of rows in a MySQL table using Eloquent in Laravel in a single HTTP Request

Please note that I don't have much experience.
The problem is well outlined in the question title and the logic follows below.
On my webapp there's a client who wants to engage with 'All' users of my webapp in the following ways -
In-App Notifications (Will be pushed to Android App using RESTful APIs and Services)
Emails - will be queued to a CRON job on my hosting.
SMS - a third party RESTful api will do this job. Each call will send SMS to a single user.
Now, assuming that my application grows, which it obviously should, I'll have, say, 50 million users at some point in the near future, and the client, at the press of a button, will request the application to send out notifications.
Considering that my client base grows too, say to around half a million, and there are at least 50 clients doing this job of 'customer engagement' per second, my server will have to send
50 * 50 million In-App notifications
50 * 50 million Emails
50 * 50 million SMS API calls
Since, API calls is out of context, let's take them off. We're now left with
50 * 50 million In-App notifications
50 * 50 million Emails
How do I send notifications? A notification is sent by creating a new row in the notifications table with, say, userId and NotificationText.
How do I send emails? Using Laravel queues with the database driver; the default Mail::queue function in Laravel does the job by creating a row in the 'jobs' table, which is serviced by a cron job.
Now, considering the above two cases, I'll have to issue commands for the creation of 50 * 50 * 2 million MySQL rows in my database per second. This will take a considerable amount of time, and each client will have to wait around x seconds before being redirected to a response page confirming a 'successful' request.
Is this approach relatively practical?
What value will x assume in this case?
The response will be delayed for some x seconds, is there a way to overcome this issue using Laravel Events?
During execution of the request (concurrently by many clients), is there a chance of increased resource usage leading to Denial of Service?
In case the above-mentioned approach is bad, what alternative approach(es) do I have?
For inserting data into the database you don't do 100M inserts, but one insert using a sub-query, something like INSERT INTO table (SELECT id, <message_id> FROM users); where message_id can be set from your app so MySQL sees it as a constant. If this still takes too much time, you could look into having one table for messages sent to everyone, as others have suggested, and insert into the notification table only when each user reads the message.
For sending emails, you could send them in bulk by setting a large number of addresses as BCC, or use a service to send the emails so you avoid getting them marked as spam.
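As a concrete (hypothetical) illustration of the INSERT ... SELECT idea, with the message id bound from the application as a constant - the notifications/users table names are assumptions:
<?php
// Hedged sketch: fan one message out to all users with a single INSERT ... SELECT.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$messageId = 123; // set from the application, so MySQL treats it as a constant
$stmt = $pdo->prepare("INSERT INTO notifications (user_id, message_id) SELECT id, ? FROM users");
$stmt->execute([$messageId]);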

Batch emailing mysql table setup?

I came up with a schema for a simple batch emailer I'm making. The emailer will send out X number of emails every 5 minutes via a cron-job'd PHP script. The issue is that I don't think this is the best way to do it, and I was looking for an alternative, better way; or validation ;).
The (simplified) schema would look like:
EmailList | JobQue | Jobs
----------|--------|----------
email     | jobid  | id
          | email  | esubject
          |        | ebody
The idea is that when a new job is created, it is added to the Jobs table, and every email that needs to be sent is added to the JobQue table.
Then the cron'd PHP script that actually sends the emails will just loop through the next X items in the JobQue table, send them, and delete them.
Is this a good way of doing it? Will it buckle under moderate load (1000-5000 emails, 1-5 jobs a day)? Of course it would if there are more emails being added than sent, but would there be other issues (like trying to add 1000 records to a table in one go, even if I'm inserting them all with one MySQL query)?
Thanks,
Max
I think this is a very effective way of doing it. The only issue could be if you want to send, say, thousands of emails at once; that could lead to a timeout in PHP.
Adding thousands of records to MySQL with one query is not bad; it's the best way, IMO.
But I must say, it depends on the query itself. If it is too long (I mean a very long string), then you can lose the connection to the server.
But I don't think you will have any problems with this schema at all.
At a bare minimum, you may want to consider a shorter cron interval than 5 minutes. If you limited X to 30 (a key number for avoiding being listed on a lot of blacklists), your script would take nearly 14 hours to complete 5000 emails.

techniques for bulk data processing

I'm looking for a technique to do the following and I need your advices.
I have a really huge table with registration IDs, and I need to send messages to the owners of these IDs. I can't send the message to many recipients at once; they need to be processed one by one. So I would like a PHP script that can run in many parallel instances (processes), each getting some batch of rows from the DB and processing it. In other words, every process needs to work with a particular range of data. I would also like to be able to stop each process and later continue sending from the user where it stopped, to the remaining users who didn't get the message yet.
Is this possible? Any tips and advice are welcome.
You may wish to set a cron job, typically one of the best approaches to run large batch operations with PHP scripts:
http://www.developertutorials.com/tutorials/php/running-php-cron-jobs-regular-scheduled-tasks-in-php-172/
Your cron job will need to point to a PHP script which does the following:
1. Selects a subset of recipients from your large DB table, based on a flag set at step 3 (below), identifying the next batch to process
2. Sends email to those selected recipients
3. Saves a note of the current job position and success/failure (i.e. you could set a flag next to each recipient in the DB who is successfully mailed; these rows are then not selected when the job is rerun)
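A minimal sketch of such a cron'd script, with assumed names (a recipients table with a sent flag) rather than the answerer's actual schema:
<?php
// Hedged sketch of the three steps above.
$pdo = new PDO('mysql:host=localhost;dbname=mailer', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// 1. Select the next batch of recipients not yet flagged as mailed.
$batch = $pdo->query("SELECT id, email FROM recipients WHERE sent = 0 LIMIT 50")->fetchAll(PDO::FETCH_ASSOC);
$flag = $pdo->prepare("UPDATE recipients SET sent = 1 WHERE id = ?");
foreach ($batch as $recipient) {
    // 2. Send the email to this recipient.
    $ok = mail($recipient['email'], 'Subject', 'Message body');
    // 3. Flag success so the row is skipped when the job is rerun.
    if ($ok) {
        $flag->execute([$recipient['id']]);
    }
}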
Parallel processing is possible only to the extent that your server's configuration allows. Many servers can serve pages in parallel, but even then it is limited to a few at a time. Instead, the rule of thumb is to be as fast as possible and move on to the next request.
Regarding your processing of a really large list of data in your database: you will first of all need a list of IDs for the mailing you are doing:
INSERT INTO `mymailinglisttable` (mailing_id, recipient_id, senton) SELECT 123 AS mailing_id, mycontacttable.recipient_id, NULL FROM mycontacttable WHERE [insert your criteria for your contacts]
Next you will need either InnoDB or some clever logic for your parallel processing:
With InnoDB, you can do row-level locking, but don't ask me how; search it yourself. I don't use InnoDB at all, but I know it is possible. So read the docs on that, select and lock some rows, send the emails, mark them as sent, and wash-rinse-repeat the operation by calling back your own script (either with AJAX or with a PHP socket).
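For what the InnoDB route could look like, here is a hedged sketch using SELECT ... FOR UPDATE inside a transaction. The join to mycontacttable for an email column is an assumption, and on MySQL 8+ you could add SKIP LOCKED so parallel workers take different rows instead of waiting:
<?php
// Hedged sketch: InnoDB row-level locking with SELECT ... FOR UPDATE.
$pdo = new PDO('mysql:host=localhost;dbname=mailer', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->beginTransaction();
// Rows read here stay locked until commit, so another worker running the
// same query waits (or skips them with FOR UPDATE SKIP LOCKED on MySQL 8+).
$rows = $pdo->query(
    "SELECT q.recipient_id, c.email
       FROM mymailinglisttable q
       JOIN mycontacttable c ON c.recipient_id = q.recipient_id
      WHERE q.senton IS NULL
      LIMIT 3 FOR UPDATE"
)->fetchAll(PDO::FETCH_ASSOC);
$mark = $pdo->prepare("UPDATE mymailinglisttable SET senton = NOW() WHERE recipient_id = ?");
foreach ($rows as $row) {
    mail($row['email'], 'Subject', 'Body');
    $mark->execute([$row['recipient_id']]);
}
$pdo->commit();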
Without InnoDB, you can simply add 2 fields to your table: a process id field (mypid below) and a lockedon field. When you want to lock some addresses for your processing, do:
$mypid = getmypid().rand(1111,9999); // token identifying this process/run
$now = date('Y-m-d G:i:s');
// claim up to 3 rows that no other process has locked yet
mysql_query('UPDATE mymailinglisttable SET mypid = "'.$mypid.'", lockedon = "'.$now.'" WHERE mypid IS NULL LIMIT 3');
This will lock 3 rows for your pid at the current time. Then select the rows that were locked using:
$result = mysql_query('SELECT * FROM mymailinglisttable WHERE mypid = "'.$mypid.'" AND lockedon = "'.$now.'"');
You will retrieve the 3 rows that you locked for processing. I tend to use this version more than the InnoDB version because I was raised on this method, not because it is more performant; actually, I'm sure the InnoDB version is much better, I've just never tried it.
If you're comfortable with using PEAR modules, I'd recommend having a look at the pear Mail_Queue module.
http://pear.php.net/package/Mail_Queue
Well documented and with a nice tutorial. I've used a modified version of this before to send out thousands of emails to customers and it hasn't given me a problem yet:
http://pear.php.net/manual/en/package.mail.mail-queue.mail-queue.tutorial.php

How would I organize data to send many emails to users?

This is more in search of advice/best practice. We have a site with many users ( > 200,000 ), and we need to send emails to all of them about events occurring in their areas. What would be the best way to stagger the adding of the jobs?
Things to note:
We store everything in a MySQL database
The emails go out on a queue-based system, with independent workers grabbing the tasks and sending them out.
We collect username and join date, which we can use for grouping
Sending the emails is not the problem, the problem is getting the jobs added. I am afraid of a performance hit if we suddenly try to add that many jobs at once.
I assume your requirement is something like sending newsletters to groups and subscribers.
Do you already have groups, and would it be possible to implement them?
That would help you filter and avoid going through the entire 200,000 users.
Sending the emails based on groups will reduce the DB load, I hope.
You could also keep an active/inactive status for each user in the DB.
Running a cron job is the solution, but the interval should be based on the load that the job puts on your server.
So if the DB design and the job intervals are good, performance will be better.
I assume your queue is a table in a database and your concern is that adding thousands of records to a table will thrash it because the index gets rebuilt each time?
If you add many entries within a single process (e.g. a single HTTP request or a single cron job script), you can start a transaction before inserting and commit when done. With all the inserts inside a transaction, the index will only be updated once.
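A hedged sketch of that transactional batch insert; the jobs table, its columns, and the $userIds list are illustrative assumptions:
<?php
// Hedged sketch: wrap many queue inserts in one transaction so they commit as a single step.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$userIds = [101, 102, 103]; // in practice, the ids of the users to notify
$insert = $pdo->prepare("INSERT INTO jobs (user_id, payload) VALUES (?, ?)");
$pdo->beginTransaction();
try {
    foreach ($userIds as $userId) {
        $insert->execute([$userId, json_encode(['event' => 'area_event'])]);
    }
    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollBack(); // nothing is partially queued on failure
    throw $e;
}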
If it's a more general problem, you might want to consider using a message queue instead of a database table.
Or am I completely off?
Set a cron job for every 5 minutes. Have it check if there are emails to send. If there are, and none are set as "in progress" yet, pick the first one and set it as being in progress. Select the first users with id < n and send it to them. Keep track of that last id, and repeat until you reach the end of the user list.
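A sketch of that "keep track of the last id" batching, with hypothetical table names (users, job_progress):
<?php
// Hedged sketch: send to users in id order and remember where the run stopped.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$batchSize = 500;
$lastId = (int) $pdo->query("SELECT last_id FROM job_progress WHERE job_id = 1")->fetchColumn();
$stmt = $pdo->prepare("SELECT id, email FROM users WHERE id > ? ORDER BY id LIMIT " . $batchSize);
$stmt->execute([$lastId]);
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $user) {
    mail($user['email'], 'Event in your area', '...');
    $lastId = $user['id'];
}
// Remember the position so the next cron run continues from here.
$pdo->prepare("UPDATE job_progress SET last_id = ? WHERE job_id = 1")->execute([$lastId]);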

Postpone Job Queue with Gearman

I want to extract some of the time-consuming things into a queue. For this I found Gearman to be the most widely used, but I don't know if it is the right thing for me.
One of the tasks we want to queue is sending emails, and we want to provide the ability to cancel sending for 1 minute. So it should not work on the job right away but execute it at now + 1 minute. That way I can cancel the job before then and it never gets sent.
Is there a way to do this?
It will run on Debian and should be usable from PHP. The only thing I found so far was Schedule a job in Gearman for a specific date and time, but that runs on something not widely spread :(
There are two parts to your question: (1) scheduling in the future and (2) being able to cancel the job until that time.
For (1) at should work just fine as specified in that question and the guy even posted his wrapper code. Have you tried it?
If you don't want to use that, consider this scenario:
Insert a record for the email to be sent into a database, including a "timeSent" column which you set 1 minute in the future.
Have a single Gearman worker (I'll explain why single) look in the database for emails that have not been sent (e.g. some status column = 0) and whose timeSent has already passed, and send those.
So, for (2), if you want to cancel an email before it's sent, just update its status column to something else.
Your Gearman worker has to be a single one, because if you have multiple, they might fetch and try to send the same email record. If you need multiple, make sure the one that gets the email record first locks it immediately, before any time-consuming operations like actually emailing it (say, by updating that status column to something else).
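A rough sketch of what the single worker's job body could do on each run. The emails table and its status values are assumptions (timeSent follows the answer's naming), and the Gearman wiring itself is omitted:
<?php
// Hedged sketch: send any email whose scheduled time has passed and that
// was not cancelled in the meantime (status: 0 = pending, 1 = sent, 2 = cancelled).
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$due = $pdo->query("SELECT id, address, subject, body FROM emails WHERE status = 0 AND timeSent <= NOW()")->fetchAll(PDO::FETCH_ASSOC);
$markSent = $pdo->prepare("UPDATE emails SET status = 1 WHERE id = ?");
foreach ($due as $email) {
    mail($email['address'], $email['subject'], $email['body']);
    $markSent->execute([$email['id']]);
}
// Cancelling within the 1-minute window is just:
//   UPDATE emails SET status = 2 WHERE id = ? AND status = 0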
