I'm running an Amazon EC2 'Large' instance - Ubuntu Natty x64 with PHP5 and MySQL. We execute a PHP script via CRON - this sends an email list (2000-4000 emails) using SMTP/PHPMailer.
The server runs very slowly (several of these CRON jobs run in parallel) and the CPU hits 100%. Memory usage is low (only ~600 MB of 8 GB used), but each CRON job takes a significant chunk of CPU, for example 20-30% each with 4-5 running in parallel.
Trying to pinpoint the issue, I enabled the slow query log in MySQL, but nothing caught my attention. How should I go about narrowing down the cause of this CPU usage? Is SMTP/email just that CPU intensive, or is it a sign that there is a programming or server issue? Thanks!
EDIT: The issue is resolved. There was a trivial (of course) bug that caused emails to 'grow' (some of the previous email's content was being injected into the next email), so the email pre-processing got more and more ridiculous with each subscriber. The resulting emails had hundreds or thousands of tracking images which all hit our server simultaneously when opened, i.e. when 'display images' was clicked in Gmail. After fending off the self-inflicted DDoS attack and two days of no sleep, I am going to enjoy a bottle of Captain Morgan while contemplating various choices I've made in life.
Things that can cause this (Non-exhaustive list):
Non-blocking I/O with the SMTP server.
The implementation of the SMTP library used in PHP, with long string manipulation or large files being encoded on every loop iteration (remember: the protocol must be correctly formatted, and this is checked/encoded by many other methods every time you call the send method).
One (or more) query per mail.
Try measuring the time spent on each operation performed inside the loop.
You can use a simple $start = microtime(true) followed by printf(__FILE__.':'.__LINE__.": here after %0.8f seconds\n", microtime(true) - $start); written to a debug file, or use another profiling tool (a minimal sketch follows this list).
Try to reduce protocol formatting/encoding time.
Don't allow more instances of the PHP script to run simultaneously than the number of cores in your machine.
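A minimal sketch of that timing approach, assuming the send loop looks roughly like this ($subscribers, buildEmailBody() and $mailer are placeholders for whatever your script actually uses):

<?php
// Hypothetical send loop instrumented with microtime(); every step is timed so
// the expensive operation stands out in the debug log.
$log = fopen('/tmp/mail-profile.log', 'a');

foreach ($subscribers as $subscriber) {          // $subscribers: placeholder list
    $start = microtime(true);

    $body = buildEmailBody($subscriber);         // hypothetical pre-processing step
    fprintf($log, "%s:%d built body after %0.8f seconds\n",
        __FILE__, __LINE__, microtime(true) - $start);

    $mailer->send($subscriber['email'], $body);  // hypothetical send call
    fprintf($log, "%s:%d sent after %0.8f seconds\n",
        __FILE__, __LINE__, microtime(true) - $start);
}
fclose($log);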
First, establish exactly which programs are taking 100% CPU.
If it's the PHP interpreter then there has to be something wrong in your code - an SMTP client should never manage to reach 100% utilisation, because a lot of the time its throughput will be limited by the SMTP server.
It may not be limited just to PHP... is the SMTP server you are connecting to on the local box? Are you running out of sockets? Are the requests blocking each other?
For things like what you are doing, a queue-based approach is usually best.
Have you thought about using a third-party service for sending mail, where all you do is send an HTTP API request? There are several benefits to this: most of these services have relationships set up with mail servers so that your emails actually get to inboxes, and your SMTP server does not get blacklisted as spammy. Amazon has a service that can do this, and so do others like Postmark.
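As a rough illustration of the idea (the endpoint, field names and API key below are placeholders, not any particular provider's real API):

<?php
// Hand a message to a transactional-mail provider over HTTP instead of SMTP.
$ch = curl_init('https://api.mailprovider.example/v1/send');   // placeholder URL
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER     => [
        'Authorization: Bearer YOUR_API_KEY',                   // placeholder key
        'Content-Type: application/json',
    ],
    CURLOPT_POSTFIELDS     => json_encode([
        'to'      => 'user@example.com',
        'subject' => 'Hello',
        'body'    => 'Sent via an HTTP API instead of SMTP.',
    ]),
]);
$response = curl_exec($ch);
curl_close($ch);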
Related
ReactPHP HTTP server for each user: is this a good idea?
In my application:
Each logged-on user sends and receives data from the server, on average one request per second.
After responding, the server has some extra work to do, which is related to the specific user.
I could simply build a new ReactPHP HTTP server for each user who logs in, and release the server after the user logs out.
Will this work? Am I missing something?
No, it's not a good idea. You need a separate port per user in that case to route the user to the right server. That'd quickly exhaust your ports.
If you have blocking tasks within the event loop and want to use multiple processes because of that, just stick to traditional PHP with mod_php or php-fpm and start a new event loop for each process, do your thing and then exit.
If you don't have any blocking operations and everything is non-blocking, you can just use a single server and it handles all the things.
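For reference, a minimal single-process setup might look like this (assuming react/http and react/socket are installed via Composer; the port is arbitrary):

<?php
// One event loop, one HTTP server, serving every user from a single process.
require __DIR__ . '/vendor/autoload.php';

use Psr\Http\Message\ServerRequestInterface;
use React\Http\HttpServer;
use React\Http\Message\Response;
use React\Socket\SocketServer;

$server = new HttpServer(function (ServerRequestInterface $request) {
    // Only non-blocking work belongs here; a blocking call stalls every connected user.
    return new Response(200, ['Content-Type' => 'text/plain'],
        "Handled " . $request->getUri()->getPath() . "\n");
});

$server->listen(new SocketServer('0.0.0.0:8080'));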
I'm not sure that exhausting ports would be the issue; other services, such as WebRTC SFUs, do just this. With 65,535 ports available, you're talking about 30,000+ concurrent TCP connections.
However, with that many users the first obvious problem would be memory. At 10 MB just to start up PHP, that would be 300+ GB of memory without including a single line of code or actually doing anything. With a seriously trimmed PHP binary you can get down to 4 or 5 MB, so at 5,000 concurrent users you would have around 25 GB.
But the real problem is that it would result in thousands of processes, which is impossible to work around. This would be entirely wasteful considering ReactPHP's event loop can handle 10k users within a single process. I'm not saying a single PHP process can do the work for that many users (except maybe the most basic chat), but ReactPHP can handle the I/O. Throwing them all into their own process, though, would be a nightmare.
The basic idea has been tried in other languages by giving each user their own thread, but even in C/C++ this quickly proved to be a bad design.
What is the best approach for sending the highest email rate possible with Swiftmailer?
We own an email automation tool and sometimes there are single sendouts of 40,000 emails. Our average rate with the spool:send command is ~50 emails/min. I've tried copying the same command into the cron five times and it worked (i.e. it was sending ~250 emails/min), but it looks like the SMTP server got dizzy, because some contacts were receiving emails with another contact's information (any idea what could be causing that?).
So now I was thinking about setting up 5 different mailers that spools the emails on different folders and running the 5 commands with a cron, each one for one of those mailers. Should it work? Any other recommended solution?
If you're sending 250 emails per minute, then you need something more resilient than cron and the Swiftmailer spool. The spool is great if you're sending no more than a couple of emails a minute, but anything bigger than that and it becomes hard to scale, inflexible, and a nightmare to debug.
Instead, use a job queue like PHP Resque or RabbitMQ (both are open source). You can replicate the 'spool' by having a queue of emails that need to be sent, and you can add multiple workers and queues. You could also have a second queue that actually adds the jobs to the first queue.
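A rough sketch of the producer side with php-amqplib (assuming Composer and a RabbitMQ broker on localhost with the default guest credentials; the queue name and payload fields are made up):

<?php
require __DIR__ . '/vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();
$channel->queue_declare('emails', false, true, false, false);   // durable queue

// One job per email instead of spooling to disk; a worker process consumes the
// queue and hands each message to Swiftmailer.
$job = json_encode(['to' => 'contact@example.com', 'campaign' => 42]);
$channel->basic_publish(
    new AMQPMessage($job, ['delivery_mode' => AMQPMessage::DELIVERY_MODE_PERSISTENT]),
    '',
    'emails'
);

$channel->close();
$connection->close();

Throughput then scales with the number of consumers you run rather than with how many copies of spool:send the cron starts.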
The advantage is that RabbitMQ comes with a management interface, so you can see things like how many emails are being sent, how many are failing, etc. It's also easier to scale up and down by adding and removing workers when you're under heavy load, for example.
Kacper from Sensio Labs actually gave a talk on Rabbit MQ with Symfony last year - http://www.slideshare.net/cakper/2014-0821-symfony-uk-meetup-scaling-symfony2-apps-with-rabbit-mq.
We are currently developing a mobile app for iOS and Android. For this, we need a stable web service.
Requirements: based on PHP and MySQL, must be blazing fast, must be scalable.
I've created a simple custom-coded web service with multiple endpoints to pass data from the app to our database, and vice versa.
My Question:
Our average response time with my custom-coded solution is below 100ms (measured using New Relic) for normal requests (say, updating a DB field or performing an INSERT). This is without any load, however (below 100 users daily). When we make outbound requests (specifically, sending e-mail using the SendGrid PHP framework) we see a response time of > 1000ms. It appears that the request is "waiting" for a response from SendGrid, which is not really ideal. Is it possible to tell the script not to "wait for a response"? My idea was to store all "pending" requests in a separate table, and then use a cron to run through all "pending" requests and mark them as "completed". Is this a viable solution? And will one cron each minute be enough to process the requests (a possible delay of 1 min for each e-mail)?
As always, any replies or suggestions are very appreciated. Thanks in advance!
To answer the first part of your question: Yes, you can make asynchronous requests with PHP, and even ignore the service's response. However, as you correctly say, it's not a super great solution.
Asynchronous Requests
This excellent blog post on PHP Asynchronous Requests by Segment.io comes to several conclusions:
You can open a socket and write to it, as described in this Stack Overflow topic - However, it seems that this is actually blocking and fairly slow (300ms in their tests). (A minimal sketch of this approach follows this list.)
You can write to a log file and then process it in another way (essentially a queue, like you describe) - However, this requires another process to read the log and process it. Using the file system can be slow, and shared files can cause all sorts of problems.
You can fork a cURL request - However, this means you aren't waiting for a response, so if SendGrid (or some other service) responds with an error, you can't catch it and react.
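For what it's worth, the "open a socket and write to it" approach looks roughly like this (host and path are placeholders); the catch, as noted above, is that fsockopen itself blocks while connecting, and errors from the service go unnoticed:

<?php
// Write the HTTP request and close without reading the response ("fire and forget").
$fp = fsockopen('api.example.com', 80, $errno, $errstr, 1);   // 1-second connect timeout
if ($fp) {
    $body = json_encode(['to' => 'user@example.com']);
    $request = "POST /send HTTP/1.1\r\n"
        . "Host: api.example.com\r\n"
        . "Content-Type: application/json\r\n"
        . "Content-Length: " . strlen($body) . "\r\n"
        . "Connection: Close\r\n\r\n"
        . $body;
    fwrite($fp, $request);
    fclose($fp);   // don't wait for the service's answer
}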
Opinion Land
We're now entering semi-opinion land, but queues as you describe (such as a MySQL one with a cron job, or a text file, or something else) tend to be very scalable, as you can throw workers at the queue if you need it to process faster. These can sit outside your user-facing system (and therefore not share resources).
Queues
With a queue, you'd have a separate service that would be responsible for sending an email with SendGrid (e.g.). It would pull tasks off a queue (e.g. "send an email to Nick") and then execute on them.
There are several ways to implement queues that you can process.
You can write your own - As you seem to want to stay on PHP/MySQL, if you do this you'll need to take into account a bunch of queueing problems and weird edge cases. However, you'll have absolute control, and for a simple application this may work (a minimal sketch follows this list).
You can implement a self-hosted task queue - Celery is meant to be a distributed task queue, and ZeroMQ (ØMQ) and RabbitMQ can also be used as task queues. These are meant to be fast and distributed and have had a lot of thought put into them. You'd need to benchmark them in your system to see if they speed it up. It'd also mean you have to host additional pieces yourself. This, however, is likely to be the fastest solution from a communication standpoint.
You can pass things off to a hosted task queue - IronMQ and Amazon SQS are both cool hosted solutions, which means you wouldn't need to dedicate resources to them; additionally, with IronWorker (e.g.) you could have the other service taken care of. However, since you're trying to optimize a request to an external service, this probably isn't the solution in this scenario.
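A minimal sketch of the "write your own" option mentioned above, assuming a hypothetical email_queue table with id, payload and status columns (run it from cron or a supervised loop):

<?php
// Claim one pending job, send it, and record the outcome.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$pdo->beginTransaction();
// FOR UPDATE keeps parallel workers from grabbing the same row.
$job = $pdo->query(
    "SELECT id, payload FROM email_queue
     WHERE status = 'pending' ORDER BY id LIMIT 1 FOR UPDATE"
)->fetch(PDO::FETCH_ASSOC);

if ($job) {
    $data = json_decode($job['payload'], true);
    $sent = mail($data['to'], $data['subject'], $data['body']);   // or a SendGrid API call
    $pdo->prepare("UPDATE email_queue SET status = ? WHERE id = ?")
        ->execute([$sent ? 'sent' : 'failed', $job['id']]);
}
$pdo->commit();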
Queueing Emails
On the topic of queueing emails specifically, this is something common to email senders. As with everything else, it means better reliability (because if a service down the line fails, you can keep the message in the queue and retry).
With email, however, there are some specific services out there for queueing messages: SMTP servers. Theoretically you can set up a server like sendmail, set SendGrid as your "smarthost" or relay, and have the server send to SendGrid. It then queues, deals with service interruptions, and sends mail with little additional code. However, SMTP servers are a pain to deal with, even if they're just forwarding messages. Additionally, SMTP is even slower than HTTP to establish a connection and is therefore probably not what you want, but it's good to know.
Another possible solution, if you control your own server environment, that will speed up your email sending and your application is to install a mail server such as Postfix locally. You then configure Postfix to use your SendGrid credentials, so any email sent will go from your server to SendGrid.
This is not a PHP solution, but it removes the need to write your own custom solution. If you set Postfix as the default mail server, you can then just use PHP's mail() function to send email.
https://sendgrid.com/docs/Integrate/Mail_Servers/postfix.html
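Once Postfix relays through SendGrid, the application code can stay as simple as plain mail() (the addresses below are placeholders):

<?php
// Postfix accepts the message locally and forwards it to SendGrid in the background,
// so the web request doesn't wait on the remote API.
$headers = "From: noreply@example.com\r\n"
    . "Content-Type: text/plain; charset=UTF-8\r\n";
mail('user@example.com', 'Welcome', 'Thanks for signing up!', $headers);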
I have a large board with 1 million+ members and I'm experiencing significant lag between the sending of emails to each member. At the current rate it would literally take me 3 months to send emails to all 1 million members.
My machine (dedicated):
Dual quad-core Xeon
32 GB of RAM
CentOS 5.4
vBulletin
I've tried configuring it a number of ways and it is still slow.
Hostname resolution is done locally, so I don't think that's the issue. Any suggestions?
vBulletin shows progress as it sends out the emails (500 at a time), so I know the script isn't timing out or hitting a memory issue. Completing a page of 500 takes 10 minutes. I am using PHP's mail() function, which is the only option I have other than SMTP. With previous servers that I did not configure myself, it had always been fast. Now, with sendmail (PHP's mail() function), it is very slow.
Check your /etc/hosts file.
If you have an entry for your external IP address that points to your local hostname for example:
75.23.123.21 my-server-hostname
Change it to:
127.0.0.1 my-server-hostname
Then try running the PHP mail() function again.
I'm going to say that if you have 1 million subscribers you need to reach, perhaps it's better that you not do it yourself. Instead, why not use a service like Mailchimp, whose primary focus is on delivering email.
Think about the advantages:
You don't worry about bandwidth, infrastructure and maintenance.
You get comprehensive analytics on how your email campaigns are performing and the health of your list - you say you have a million emails, but how many of them bounce back? How many are opened? What is the open rate per country? How many are marked as spam, etc.?
Depending on what your business is, you can A/B test your campaigns and optimize reads/clicks/conversions.
You will obviously pay extra for this service, which is separate from your current hosting costs, but with Mailchimp you pay for what you use. Also, if you can reach a million humans, you have probably figured out how to monetize it (if not, you really should). So using a third-party service might pay for itself.
Mailchimp is one of many services out there (I mention it because I use it and am very happy with it). You might want to check out SendGrid, Campaign Monitor and AWeber and weigh your pros and cons.
Probably not the answer you were expecting, but this is just my $0.02.
P.S: Mailchimp also gives you an API so you can seamlessly integrate your app with their services.
From the PHP Manual
It is worth noting that the mail() function is not suitable for larger volumes of email in a loop. This function opens and closes an SMTP socket for each email, which is not very efficient.
For sending large amounts of email, see the PEAR::Mail and PEAR::Mail_Queue packages.
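A hedged sketch with PEAR::Mail's SMTP backend (the host, addresses and the $recipients list are placeholders); the 'persist' option is what keeps a single SMTP connection open across the loop instead of reconnecting per message:

<?php
require_once 'Mail.php';   // PEAR::Mail

$smtp = Mail::factory('smtp', [
    'host'    => 'localhost',
    'port'    => 25,
    'persist' => true,       // reuse one SMTP connection for the whole batch
]);

$headers = ['From' => 'list@example.com', 'Subject' => 'Newsletter'];
$body = 'Hello from the mailing list.';

foreach ($recipients as $to) {                       // $recipients: placeholder array
    $smtp->send($to, $headers + ['To' => $to], $body);
}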
I'm far from an expert, but the mail() function uses a lot more CPU and memory than normal web functions, and having 1 million users may already put a significant load (CPU and I/O) on your server. This may impact the speed of sending out emails, especially if you're on an older Xeon.
From what I know, dual quad-core Xeons are relatively new, and sending those emails shouldn't take anywhere near as long as it does.
From what I've read, a lower-end single-CPU dedicated server should be able to send out about 500-700 emails per minute... but that is a system dedicated to only sending emails. On a mid-range server like I suspect you have, I'd expect it to be able to send the emails in hours, not months.
It may be a configuration or a load issue which could be on many different levels.
I have developed a web application where students across the country come and register for some academic purpose. The users are expected to be around 100k within the next year.
I need to send all of these people periodic mails. The web app is developed using CodeIgniter. The PHP script can run for 3000 seconds, but the app is still unable to send mails to more than 100 users.
The machine I run is in the cloud and has 256 MB of RAM. I used the free -m command to check the memory usage, but that doesn't seem to be a problem. Everything works fine for 10-20 mails.
What would be the best solution? Is there any way I can transfer this job to some other app/program/shell script?
If you cannot use some external service for your emails, I would just set up a cronjob that sends a couple of emails every n seconds. It's pretty cumbersome to send a lot of emails with PHP, as you have discovered, but the cronjob solution works every time as far as I know.
So you have a list of emails/addresses and a cronjob that iterates that list and sends the emails.
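A minimal sketch of that cronjob (table and column names are hypothetical); each run picks up a small batch and marks what it managed to send, so no single PHP execution has to survive the whole list:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Grab the next unsent batch.
$batch = $pdo->query(
    "SELECT id, email FROM recipients WHERE sent = 0 LIMIT 50"
)->fetchAll(PDO::FETCH_ASSOC);

$body = 'Periodic update for registered students.';

foreach ($batch as $row) {
    if (mail($row['email'], 'Periodic update', $body)) {
        $pdo->prepare("UPDATE recipients SET sent = 1 WHERE id = ?")
            ->execute([$row['id']]);
    }
}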
Sure you can send the emails yourself from a server, but that is only half the battle.
If you are sending bulk emails, as opposed to the transactional type, it's best to use a third party service that is already whitelisted on mail servers. The primary reason being, you might get blacklisted by the major mail servers as a spammer. If this happens, you will have to work with them individually to get removed from the blacklists.
Also if you are operating in the United States you should be familiar with CAN-SPAM: http://business.ftc.gov/documents/bus61-can-spam-act-Compliance-Guide-for-Business
MailChimp is a viable candidate for this. Serving mail is a time-consuming task, and sending it to up to 100k email addresses will be an arduous task for your server.
They provide a very capable PHP API.
https://developer.mailchimp.com/
It is very appropriate to get this out of your web server threads and into something that runs standalone. Typically for stuff like this, I have tables in the DB where the appropriate information is written to from the web site, so when I am ready to e-mail, something on the backend can assemble the e-mails and send them out. If you are sending out 100,000 e-mails, you are going to want something multithreaded.
It might be good in this case to use one of the many off-the-shelf tools for this, rather than reinventing the wheel. We use an old version of Campaign Enterprise here, and I am able to throw queries at it which I can use to pull data from my web DB directly, over ODBC. That may or may not work well for you, considering you are in the cloud.
Edit: You can also write a PHP script to do this and call PHP from the shell. Perhaps you can get around your timeout limit this way? (This is assuming you are referring to some service-level timeout. If you are talking about the regular PHP timeout, this can be worked around with set_time_limit().)
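The standalone version is just an ordinary PHP file run from cron or the shell rather than through the web server, for example php /path/to/send_emails.php (the script name is made up):

<?php
// send_emails.php - long-running CLI job, not a web request.
set_time_limit(0);        // lift PHP's execution time cap for this process
ignore_user_abort(true);  // keep going even if the launching terminal disconnects

// ...pull addresses from the DB and send them in batches, as in the other sketches...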
You might be able to do this using pcntl_fork or by creating a daemon process.
Fork: By using the fork approach you could batch the emails into groups and send them out. Each batch could be in its own forked child process.
Daemon: By using a daemon you could create a batch of emails and send them to be processed by the daemon. A daemon could run multiple batches at once.
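A hedged sketch of the fork approach (requires the pcntl extension and the CLI SAPI; the recipient list here is a placeholder):

<?php
// Split the list into batches and let each forked child send one batch in parallel.
$recipients = ['a@example.com', 'b@example.com', 'c@example.com'];   // placeholder list
$batches = array_chunk($recipients, 500);

foreach ($batches as $batch) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork failed\n");
    } elseif ($pid === 0) {
        // Child process: send its batch, then exit so it never re-enters the loop.
        foreach ($batch as $to) {
            mail($to, 'Newsletter', 'Hello!');
        }
        exit(0);
    }
    // Parent falls through and forks the next batch.
}

// Parent: wait for every child to finish before exiting.
while (pcntl_waitpid(0, $status) !== -1) {
    // no-op
}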