On a site I run we have a facility to send email alerts of updates. Due to the popularity of the site and the feature, in some cases a single update may require sending anywhere between 30 and 300 alerts. Each of these alerts has an individual 'reset' link to ensure recipients don't get spammed with updates (e.g. forum topic replies).
So needless to say, on the more popular updates the page will slow down.
How would you recommend handling large numbers of email alerts to ensure that they don't affect the page load when an update is posted?
I would schedule the updates from the foreground task ("when the page loads") into a file or database, and process them with a cron task to move this operation into the background (a sketch of the enqueue side follows below).
I doubt it's really important enough (or breaks often enough) that you would actually confirm that all updates were successfully sent, but it's possible to provide such a notification as well.
I would either send them in batches, or show a progress bar as I advised here:
Processing large amounts of data in PHP without a browser timeout
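To make that concrete, here is a minimal sketch of the enqueue side, assuming a hypothetical `alert_queue` table and a `$subscribers` array supplied by your own code; `buildAlertBody()` is a stand-in for your templating:

```php
<?php
// Foreground (page load): enqueue alerts instead of sending them.
// Assumed schema (illustrative):
//   CREATE TABLE alert_queue (
//     id INT AUTO_INCREMENT PRIMARY KEY,
//     recipient VARCHAR(255) NOT NULL,
//     subject VARCHAR(255) NOT NULL,
//     body TEXT NOT NULL,
//     sent TINYINT(1) NOT NULL DEFAULT 0,
//     created_at DATETIME NOT NULL
//   );
$pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');

$stmt = $pdo->prepare(
    'INSERT INTO alert_queue (recipient, subject, body, created_at)
     VALUES (:recipient, :subject, :body, NOW())'
);

foreach ($subscribers as $subscriber) { // $subscribers: your alert list
    $stmt->execute([
        ':recipient' => $subscriber['email'],
        ':subject'   => 'Topic updated: ' . $topicTitle,
        ':body'      => buildAlertBody($subscriber, $topicTitle), // your templating
    ]);
}
// The page returns immediately; a cron entry such as
//   * * * * * php /path/to/send_queued.php
// drains the queue in the background.
```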
Related
I'm creating a site that sends out a daily news email to about 800 users, at a time they can specify. My problem is that my script takes a long time to run and times out, so I'm looking for some advice on how I could be approaching this better.
Current approach:
Users are placed in a 'mailing queue' database table with their ID, receive time, and a sent flag.
I'm then running a CRON script every minute which does the following:
Grab all rows from the mailing queue with a 'receive time' less than or equal to now that haven't been sent
Loop through the users, joining a preferences table to get their chosen categories (up to about 30 per user).
For each category, find the latest 3 articles
Prepare an HTML email with this content using PHPMailer
PHPMailer is using Mailgun SMTP to avoid overloading my SMTP server
Send mail to user, mark as sent in database
My observations so far are:
When testing the script by running it in the browser, it runs incredibly slowly for a few minutes and then times out (without sending any emails).
When running every minute via CRON, it sends far more than the expected number of emails (about 1,400) over the course of 40 minutes, I guess because the script is overlapping itself and the sent flag is not reliably updated.
The majority of users are set to receive their email at the same time, so I'm doing 'worst case scenario' testing on this basis
Questions
Is my script far too heavy, by querying the database and generating the HTML email content for each user on the fly? I'm wondering if it would be better to generate the content ahead of time and store against the user in the mailing queue.
Would a queue manager like Beanstalkd help? I've had a look into it, but am struggling to see how to implement into my routine.
Ultimately I need the emails to be sent reliably to each user at the time they expect.
Any advice much appreciated!
You can do this in PHP, but you probably shouldn't. You're trying to build a mail server when there are much better ways of doing this, which mainly involve using a mail server.
Sending high volumes of email during page loads is not workable – it can be troublesome even for single messages, yet many still try. Approach it like this:
Store your list in a database.
When you want to send, generate a record representing each message to send (essentially a copy of the list).
Have a daemon (a long-running task) or cron job that sends the messages in chunks.
Create messages one at a time and submit them to a local mail server.
Use DKIM signatures.
As each message is sent, mark it as sent in the database. You need to be very aware of how database transactions and locks work for this to be safe and to avoid duplicates – get it right and overlapping processes work just fine (see the sketch after this list).
You can generate messages as fast as you like, and your mail server will deal with queuing, onward delivery, retries, bounces.
Use VERP addressing, feed bounces into a bounce handler (be warned, writing these is not fun!), and have it prevent sending to bouncing addresses in the future.
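On the transactions-and-locks point above, here is a hedged sketch of a worker that claims a chunk of queued messages atomically, so overlapping cron runs never send duplicates. The `message_queue` table, its `claimed_by` column, and `sendOneMessage()` are all illustrative names, not part of any real system:

```php
<?php
// Background worker (cron or daemon). Each run claims a batch of unsent
// rows with a single atomic UPDATE; concurrent workers get disjoint rows.
$pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$claim = bin2hex(random_bytes(8)); // unique token for this worker run

$stmt = $pdo->prepare(
    'UPDATE message_queue
        SET claimed_by = :claim
      WHERE sent = 0 AND claimed_by IS NULL
      ORDER BY id
      LIMIT :batch'
);
$stmt->bindValue(':claim', $claim);
$stmt->bindValue(':batch', 50, PDO::PARAM_INT);
$stmt->execute();

// Fetch only the rows this run claimed, send, and mark as sent.
$rows = $pdo->prepare(
    'SELECT id, recipient, subject, body FROM message_queue WHERE claimed_by = :claim'
);
$rows->execute([':claim' => $claim]);

$mark = $pdo->prepare('UPDATE message_queue SET sent = 1 WHERE id = :id');

foreach ($rows->fetchAll(PDO::FETCH_ASSOC) as $msg) {
    if (sendOneMessage($msg)) { // stand-in for your PHPMailer/MTA hand-off
        $mark->execute([':id' => $msg['id']]);
    }
}
```

Because the claiming UPDATE is atomic, a second worker that starts mid-run simply claims different rows, and the duplicate-send problem described in the question disappears.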
This approach works well – it is exactly how my own system ([Smartmessages.net](https://info.smartmessages.net/), which is built in PHP) works, and I can sustain over 200 messages per second using multiple message generators running in parallel (database transactions FTW!).
If you find all this a bit too much (it is very difficult), you're probably better off using a commercial sending service (like my own) or hosting a DIY solution, such as Mailcoach by my good friends at Spatie. Either of these would work well, and your list is pretty small – I'm often handling lists of over 100,000.
Now that all of the browsers I like have almost full support for Server Sent Events, I wanted to try implementing it on a site I've been putting off because I hate polling. But I have initial hesitation that I was hoping I could get some help on.
Here is my use case:
User goes to a form, something time-based and competitive, in this case class registration. All things being equal, they have a list of about 30 - 40 classes they are eligible for, and in order to minimize instances of "she logged in first but he hit save first but he didn't mean to hit save but she already chose another class" etc, I want to make the form real-time, so that when someone selects an option, it goes straight into the db and anyone else viewing the form sees that it is filling up. (I'll deal with the stress of people changing their minds later).
So, in a polling scenario, I had to deal with the AJAX calls having to check on the status of 40 spots and update them and setting an interval that could potentially still create collisions.
But with Server Sent Events, I can have the listener get just the spots that need updating, which seems better, but here's where I get stuck:
Is there any risk of the listener getting overloaded? Let's say the script sends 15 messages, back-to-back, about a status change. I see vague mentions of how user agents should handle queued tasks, but it's not clear whether that applies to establishing a connection or to handling server-sent messages.
Is this basically just passing the burden of polling from the browser to the server? Does the script have to check the DB every second for changes? Is there any way for the script to be aware or notified when change has occurred? Let's assume that seat requests are sent to requests.php via ajax and that updates.php pushes events back to the browser. Is there a standard and/or clever way for updates to idle until requests has made a commit?
The only solution I can think of is for requests.php to write the committed changes to a flat file (commits.xml perhaps) and updates.php just polls the file size every half-second, thereby keeping the workload to a minimum.
Any better/smarter/more obvious solutions out there?
Polling your database for changes is not a good idea. Instead, you should do inter-process PUB/SUB on the server. To do that, you can use a message queue like RabbitMQ, ZeroMQ or Redis PUB/SUB.
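For illustration, here is a minimal sketch of the SSE endpoint using the phpredis extension's blocking subscribe, matching the requests.php/updates.php split from the question; the channel name `seat-updates` is invented for this example:

```php
<?php
// updates.php – SSE endpoint. Blocks on Redis PUB/SUB instead of
// polling the database or a flat file.
header('Content-Type: text/event-stream');
header('Cache-Control: no-cache');
set_time_limit(0); // subscribe() blocks indefinitely

while (ob_get_level() > 0) {
    ob_end_flush(); // disable buffering so each event flushes immediately
}

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$redis->setOption(Redis::OPT_READ_TIMEOUT, -1); // don't time out while idle

// Called once per published message; no per-second polling anywhere.
$redis->subscribe(['seat-updates'], function ($redis, $channel, $message) {
    echo "data: {$message}\n\n";
    flush();
});

// Meanwhile, in requests.php, publish after committing a seat change:
//   $redis->publish('seat-updates', json_encode(['seat' => $seatId, 'status' => 'taken']));
```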
In my application, the user registers through a form, at which point I have to send three mails and do some other (huge) database checks. This takes a lot of time. Is it possible to run the whole task as a background process, or is there some other alternative?
If your database activities take too long, you need to rethink your design. However, if the delay is due to the emails, just store them in the DB or in files. Create a cron job that sends out these queued emails every 5/10/15 minutes (and then deletes them).
Maybe you could flag a user as pending in your database once they register.
Then you could defer the work to a Python or PHP routine running continuously in the background that looks for any pending request, does the checks, sends the emails, and finally updates the database accordingly.
During this time the user would be in a registered-but-pending status, but at least from a visitor's point of view they are not stuck waiting for everything to be processed.
You cannot turn a PHP script that was started by a webserver into a background process.
I would check whether I can optimize the database (you probably have insufficient indexes), and if that doesn't fly, build a second process that is started regularly (maybe once every five minutes or so) on the CLI side via a cron job, showing the user a "Thank you for your registration" page in the meantime...
As per my comment elsewhere, spawning a long-running process from PHP is a practical solution, bearing in mind a few caveats, if the performance problems are unavoidable.
However "send 3 mails" should not take an appreciable amount of time (I don't know what the database checks are). You need to spend some time looking at optimizing the existing process.
Other ways to solve the problem would be conventional batch processing, offloading the heavy lifting to a multi-process/multi-threaded daemon via a network call or asynchronous messaging system, or even a single threaded job processor using a message queue.
I have a web application where users can create topics and also comment on other topics (similar to what we have here on stackoverflow). I want to be able to send notifications to participating users of a discussion.
I know the easiest way to go about it is to hook the notification into the script executed when a user interacts with a discussion. As easy as that seems, I believe it's not the most appropriate way, since the user has to wait until all the email notifications are sent (the notification script finishes execution) before getting the status of his action.
Another alternative I know of is to schedule the execution of the notification script using a cron job. For the notifications to stay relevant, the script would be scheduled to execute every 3 to 7 minutes, to make sure users get notifications in a reasonable time.
Now my concern is: will setting a cron job to run a script every 3 minutes consume a reasonable amount of system resources, considering my application is still running on a shared hosting platform?
Also, I'm wondering whether it's possible for the comment script to trigger or notify a notification script that sends notifications to the specified email addresses, while the comment script continues its execution without having to wait for the notification script to complete. If that's achievable, I think it would be the best choice for me.
Thank you very much for your time.
Unless your notification script is enormously resource-intensive and sends dozens or hundreds of messages out on each run, I would not worry about scheduling it every 3-7min on a shared host. Indeed, if you scheduled it for 3 minutes and found performance sagging on your site, then increase it to 4min for a 25% reduction in resources. It's pretty unlikely to be a problem though.
As far as starting a background process, you can achieve that with a system call to exec(). I would direct you to this question for an excellent answer.
IMO adding a "hook" to each "discussion interaction" is by far the cleanest approach, and one trick to avoid making users wait is to send back a Content-Length header in the HTTP response. Well-behaved HTTP clients are supposed to read the specified number of octets and then close the connection, so if you send back your "status" response with the proper Content-Length HTTP header (and set ignore_user_abort) then the end user won't notice that your server-side script actually continues on its merry way, generating email notifications (perhaps even for several minutes) before exiting.
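A short sketch of that trick, assuming PHP behind Apache or PHP-FPM; `sendNotificationEmails()` and `$discussionId` are hypothetical stand-ins for your own notification routine:

```php
<?php
// Respond to the user immediately, then keep working server-side.
ignore_user_abort(true); // keep running after the client disconnects
set_time_limit(0);

$response = json_encode(['status' => 'ok', 'message' => 'Comment saved']);
header('Content-Length: ' . strlen($response));
header('Connection: close');
echo $response;

// Flush everything PHP and the web server have buffered.
while (ob_get_level() > 0) {
    ob_end_flush();
}
flush();
if (function_exists('fastcgi_finish_request')) {
    fastcgi_finish_request(); // PHP-FPM: ends the HTTP request outright
}

// The user already has their status; now do the slow part.
sendNotificationEmails($discussionId); // hypothetical helper
```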
First,
the set up:
I have a script that executes several tasks after a user hits the "upload" button that sends the script the data it needs. Now, this part is currently mandatory; we don't have the option at this point to cut out the upload and draw from a live source.
This section is intentionally long-winded to make a point. Skip ahead if you hate that
Right now the data is parsed from a really funky source using regex, then broken down into an array. It then checks the DB for any data already in the uploaded data's date range (there are also some security checks, data-source validation, and basic upload validation). If the date ranges don't already exist in the DB, it inserts the data and reports success to the user... If the data does exist, the script gets the data already in the DB, finds the differences between the two sets, deletes the old data that doesn't match, adds the new data, and then sends an email to each person affected by these changes (one email per person with all relevant changes in said email, which is a whole other step). The email addresses are pulled by means of an LDAP search: our DB has their work email but the LDAP has their personal email, which ensures they get the email before they come in the next day and get caught unaware. Finally, the data-uploader is told "Changes have been made, emails have been sent." which is really all they care about.
Now I may be adding a Google Calendar API that posts the data (when it's scheduling data) to the user's Google Calendar. I would do it via their work calendar, but I thought I'd get my toes wet with Google's API before dealing with setting up a WebDav system for Exchange.
</backstory>
Now!
The practical question
At this point, pre-Google integration, the script takes at most a second and a half to run. It's pretty impressive, at least I think so (the server, not my coding). But the Google bit, in tests, is SLOOOOW. We can probably fix that, but it raises the bigger question...
What is the best way to off-load some of the work after the user has gotten confirmation that the DB has been updated? This is the part he's most concerned with and the part most critical. Email notifications and Google Calendar updates are only there for the benefit of those affected by the upload, and if there is a problem with these notifications, he'll hear about it (and then I'll hear about it) regardless of the script telling him first.
So is there a way, for example, to run a cronjob that's triggered by a script's last execution? Can PHP create cronjobs with exec() ability? Is there some normalized way of handling post-execution work that needs getting done?
Any advice on this is really appreciated. I feel like the script's bloatedness reflects my stage of development and my need to finally learn how to do division of labor in web apps.
But I also worry that this isn't how it's done, since users need to know when all tasks are completed, etc. So this brings up:
The best-practices/more-subjective question
Basically, are progress bars, real-time offloading, and other ways of keeping the user tethered to the script --when combined with optimization of the code, of course-- the better, preferred approach, rather than simply saying "We're done with your part; if you need us, we'll be notifying users", etc.?
Are there any BIG things to avoid (other than obviously not giving the user any feedback at all)?
Thanks for reading. The coding part is crucial, so don't feel obliged to cover the second part or forget to cover the coding part!
A cron job is good for this. If all you want to do when a user uploads data is say "Hey user, thanks for the data!" then this will be fine.
If you prefer a more immediate approach, then you can use exec() to start a background process. In a Linux environment it would look something like this:
exec("php /path/to/your/worker/script.php >/dev/null &");
The & part says "run me in the backgound." The >/dev/null part redirects output to a black hole. As far as handling all errors and notifying appropriate parties--this is all down to the design of your worker script.
For a more flexible cross-platform approach, check out this PHP Manual post
There are a number of ways to go about this. You could exec(), as described above, but you could potentially run into a DoS situation if there are too many submit clicks. The pcntl extension is arguably better at managing processes like this; check out this post to see a discussion (there are 3 parts). A minimal fork sketch follows below.
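Here is that sketch, with the caveat that pcntl is normally available only in CLI builds of PHP (not under mod_php or FPM), so this belongs in a command-line dispatcher; `processUploadFollowUps()` is a hypothetical stand-in for the slow work:

```php
<?php
// CLI-only sketch: fork a child for the slow work; the parent returns.
pcntl_signal(SIGCHLD, SIG_IGN); // auto-reap children to avoid zombies

$pid = pcntl_fork();
if ($pid === -1) {
    fwrite(STDERR, "fork failed\n");
    exit(1);
} elseif ($pid === 0) {
    // Child process: emails, calendar pushes, etc.
    processUploadFollowUps(); // hypothetical
    exit(0);
}
// Parent process: free to respond immediately.
echo "work handed off to pid {$pid}\n";
```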
You could use Javascript to send a second, AJAX post that runs the appropriate worker script afterwards. By using ignore_user_abort() and sending a Content-Length, the browser can disconnect early, but your Apache process will continue to run and process your data. The upside is no forkbomb potential; the downside is that it ties up more Apache processes.
Yet another option is to use a cron in the background that looks at a process-queue table for things to do 'later' - you stick items into this table on the front end, remove them on the backend while processing (see Zend_Queue).
Yet another is to use a more distributed job framework like gearmand - which can process items on other machines.
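As a sketch of the gearmand route (using the pecl/gearman extension; the `send_alert` function name and payload are invented for this example):

```php
<?php
// Client side (inside the web request): fire and forget.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('send_alert', json_encode([
    'recipient' => 'user@example.com',
    'topic_id'  => 42,
]));

// Worker side (a long-running CLI process, possibly on another machine):
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('send_alert', function (GearmanJob $job) {
    $payload = json_decode($job->workload(), true);
    // ...build and send the email for $payload...
});
while ($worker->work()) {
    // process jobs as they arrive
}
```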
It all depends on your overall capabilities and requirements.