Using MX records to validate email addresses - php

Scenario:
I have a contact form on my web app, it gets alot of spam.
I am validating the format of email addresses loosely i.e. ^.+#.+\..+$
I am using a spam filtering service (defensio) but the spam scores returned are overlapping with valid messages. At a threshold of 0.4 some spam gets through and some customer's questions are wrongly thrown in a log and an error displayed.
All of the spam messages use fake email addresses e.g. zxmzxm#ywduasm.com
Dedicated PHP5 Linux server in US, mysql, logging spam only, emailing the non spam messages (not stored).
Proposal:
Use php's checkdnsrr(preg_replace(/^.+?#/, '', $_POST['email']), 'MX') to check the email domain resolves to a valid address, log to file, then redirect with an error for messages that don't resolve, proceed to the spam filter service as before for addresses that do resolve according to checkdnsrr().
I have read (and i am sceptical about this myself) that you should never leave this type of validation up to remote lookups, but why?
Aside from connectivity issues, where i will have bigger problems than a contact form anyway, is checkdnsrr going to encounter false positives/negatives?
Would there be some address types that wont resolve? gov addresses? ip email addresses?
Do i need to escape the hostname i pass to checkdnsrr()?
Solution:
A combination of all three answers (wish i could accept more than one as a compound answer).
I am using:
$email_domain = preg_replace('/^.+?#/', '', $email).'.';
if(!checkdnsrr($email_domain, 'MX') && !checkdnsrr($email_domain, 'A')){
//validation error
}
All spam is being logged and rotated.
With a view to upgrading to a job queue at a later date.
Some comments were made about asking the mail server for the user to verify, i felt this would be too much traffic and might get my server banned or into trouble in some way, and this is only to cut out most of the emails that were being bounced back due to invalid server addresses.
http://en.wikipedia.org/wiki/Fqdn
and
RFC2821
The lookup first attempts to locate an MX record associated with the name.
If a CNAME record is found instead, the resulting name is processed as if
it were the initial name.
If no MX records are found, but an A RR is found, the A RR is treated as
if it was associated with an implicit MX RR, with a preference of 0,
pointing to that host. If one or more MX RRs are found for a given
name, SMTP systems MUST NOT utilize any A RRs associated with that
name unless they are located using the MX RRs; the "implicit MX" rule
above applies only if there are no MX records present. If MX records
are present, but none of them are usable, this situation MUST be
reported as an error.
Many thanks to all (especially ZoogieZork for the A record fallback tip)

I see no harm doing a MX lookup with checkdnsrr() and I also don't see how false positives may appear. You don't need to escape the hostname, in fact you can use this technique and take it a little further by talking to the MTA and testing if the user exists at a given host (however this technique may and probably will get you some false positives in some hosts).

DNS lookups can be slow at times, depending on network traffic & congestion, so that's something to be aware of.
If I were in your shoes, I'd test it out and see how it goes. For a week or so, log all emails to a database or log file and include a field to indicate if it would be marked as spam or legitimate email. After the week is over, take a look at the results and see if it's performing as you would expect.
Taking this logging/testing approach gives you the flexibility to test it out and not worry about loosing customer emails.
I've gotten into the habit of adding an extra field to my forms that is hidden with CSS, if it's filled in I assume it's being submitted by a spam bot. I also make sure to use a name like "url" or "website_url" something that looks like a legitimate field name to a spam bot. Add a label for the field that says something like "Don't fill out this field" so if someone's browser doesn't render it correctly, they will know not to fill out the spam field. So far it's working very well for me.

function mxrecordValidate($email){
list($user, $domain) = explode('#', $email);
$arr= dns_get_record($domain,DNS_MX);
if($arr[0]['host']==$domain&&!empty($arr[0]['target'])){
return $arr[0]['target'];
}
}
$email= 'user#radiffmail.com';
if(mxrecordValidate($email)) {
echo('This MX records exists; I will accept this email as valid.');
}
else {
echo('No MX record exists; Invalid email.');
}

//The Code *https://davidwalsh.name/php-email-validator*
function domain_exists($email, $record = 'MX'){
list($user, $domain) = explode('#', $email);
return checkdnsrr($domain, $record);
}
if(domain_exists('user#davidwalsh.name')) {
echo('This MX records exists; I will accept this email as valid.');
} else {
echo('No MX record exists; Invalid email.');
}

An MX Lookup is only part of the picture, if you want to ensure the email address is itself valid, then you need to attempt to send an email to that account.
The other possible scenario is, someone can be simply using hijacked email accounts from a compromised machine anyway. Of course, that is probably a little bit less likely to occur, but it still does.
There are email address validation libraries out there that do this, simply search for email validation.
All of this can be done asynchronously. I have this setup on my site in which case the email is saved in the database (for auditing purposes), a job queued, then when the job comes time to execute, any additional validation is performed at that point in time. It offloads the heavy lifting to another thread.
To the user, it appears as if the email was sent already, it was (it's in the database), and can be viewed internally, but the actual email won't get mailed out until that job executes which can be immediately or some set amount of time depending on the server load.
Walter

Related

How reliable is Symfonys2 EmailValidator checkMX?

Like in title, how reliable is this check.
https://github.com/symfony/symfony/blob/3.0/src/Symfony/Component/Validator/Constraints/EmailValidator.php#L139-L142
Every single server in the world have their MX record exposed to the world?
Or is there a possibility that there exists a server that hides their MX record and even if email will be valid, check will fail.
UPDATE:
I have already checked Symfony documentation and source.
I know and I've tested that only emails domain is checked, not the user part.
I just don't know how reliable it is. Is it always possible to check servers MX records.
This validator only check if the DNS user in the email is valid for example if you set this email fail#google.com it will be validated even this email doesn't exit.
http://symfony.com/doc/current/reference/constraints/Email.html#checkmx
The checkMX option is done using PHP's checkdnsrr function, along with the checkHost option. You can also set a strict option and include the egulias/email-validator library for tighter restrictions. Using those options should be completely sufficient and reliable in determining if an email address is valid.
From experience, It seems like a good idea in theory and in practice will work most of the time but there will be occasions where the MX lookup fails due to network issues etc, and then the email will return as invalid, this then causes users who have inserted their correct email to become frustrated.
This in turn then causes developers to have to spend time looking at what went wrong.
Additionally, if a user is putting in a fake email, all they need do is use a real domain and fake user (like fbjdsbafjkbsdjafj #gmail.com) so it is of limited usefulness.
For 99.9% of cases it will be sufficient to check for email well-formedness without mx lookup using an established email checking library, (rather that roll-your-own)

How to correct email address domains which are misspelled?

Sometimes users misspelled their email domain and hence they enter wrong email address.
Eg. abc#gmial.com rather than abc#gmail.com
Has anybody thought about this before? Can anybody suggest how to handle this type of mistakes?
It didn't exist when this question was asked, but I recommend MailCheck which auto-suggests corrections to entered emails. It's used successfully by large companies.
Can anybody suggest how to handle this type of mistakes?
You would usually send a confirmation E-Mail to the address given, and proceed only if a link in that E-Mail has been clicked.
There is no other good way to deal with this - it's impossible to tell for sure whether gmial.com is a typo or not, seeing as it's a valid domain.
Create a list of common email domain names:
hotmail.com
gmail.com
googlemail.com
... etc
When a user enters an email address, take the domain name of the entered address and take the Levenstein distance between your list. If the distance is 1 (or maybe up to 2) then ask the user to confirm that's the email address they meant.
In my opinion it is bordering on impossible to come up with a generic solution for the generic case.
That being said, the most common typo is to interchange two adajcent letters.
So you might want to check for character content for the largest sites gmail, yahoo and what have you; Based on that suggest an alternative spelling if the original does not match gmail etc.
Do not assume the user is at fault, suggest alternatives if it looks suspicious compared to common names. A white-list was mentioned in another reply.
Use confirmation mails if you need to know you can get a reply from this address.
You cannot assume the spelling you find is in error, that is what confirmation mails are for.
Make it very non-obtrusive (ajax springs to mind).
In our forms we're using a combination of techniques. While bad data can still slip through, the chances are vastly reduced.
First is to do a simple formatting regex that is commonly available - just be sure it's RFC-compliant. If this fails, it's good to offer the user a confirmation form at this point, because they may catch other errors for you while fixing this problem.
The next part is to check the TLD part of the domain. Since all TLDs can be known, these are relatively easy to scan for misspellings using some regex tests. Just keep a list of all current TLDs in a table somewhere and update it form time to time as needed (mind you, this list can get complex when dealing with international TLDs. If you're only dealing with US traffic, the rules are much easier, and that's something else you can filter out. For example, if you're selling a service only available in the US, it would make sense to filter out international emails at form submission time. We are, so this works for us).
Third is to do something like what #npclaudiu suggested - scan for common misspellings of big-name mail hosts (gmail, hotmail, yahoo, etc) in the domain part and if a possible hit is detected, offer a confirmation form to the user. (You entered someone#hptmail.com, did you mean hotmail.com?)
If you get through those steps, then you can do the MX lookup suggested by #symcbean.
Finally, if all of that succeeds, there is a method (but I've not yet tested it) for communicating with the remote SMTP host to see if the mailbox exists. We're about to begin testing this ourselves. I found the how-to for such here:
http://www.webdigi.co.uk/blog/2009/how-to-check-if-an-email-address-exists-without-sending-an-email/
The funny thing is that the url does exist http://www.gmial.com
In fact it would be very difficult for you to know if it's a mistake or just a "strange" domain. Look at the Google API's because when you type something wrong in Google they propose you "did you mean...."
good luck
Arnaud
You can not provide this functionality in a way that you auto correct the misspelled email domain names, because the name which you are assuming to be invalid, would be valid. you should expect anything to be entered as a email address domain name.
I would suggest, if you are creating a signup form, you provide user with a dropdown having all possible domain names which you are aware of so that he can make a selection from that.
Hope this helps.
You could create a list of popular e-mail domains (gmail.com, yahoo.com, ymail.com, etc) in your db and validate the e-mail address that the user inputs against this list, and if the domain resembles with one of these domains, you should show a warning and allow the user to correct it if necessary, not auto correct it. And to compare the domain entered with the domains in your list, you might use an algorithm like the the one used in the soundex function in SQL Server, that matches words based on if one word sounds like the second.
Edit: you can find more details the SOUNDEX function here.
As mentioned before, it is not a good idea to automatically assume that someone has mistyped an email. A better approach would be to implement a little javascript function that checks if the domain of the email was possibly mistyped and alert the user instead of assuming they were wrong from the start.
Give me a minute to create a little mockup.
EDIT: OK, so maybe it was more than a minute. Take a look at http://jsbin.com/iyaxuq/8/edit and see for yourself how javascript can help prevent common typing errors. Try emails like: test#gmail.cmo, another#yhaoo.com, loser#htomali.ocm (typo of hotmail), and me#aol.com.
Note: I used a lazy regex to validate the email. Don't rely on it (or for that matter, most regexes) for a real app.
Trying to automate correction of bad data is a very dangerous practice. Ultimately, only the user can provide the correct data. However there are strict rules about formatting an email address - a regex check can be run in javascript (or using the preg functions with the same regex syntax) - but note that there are a lot of bad examples on the internet of regexes claiming to solve the problem.
This should be a fairly complete implementation of an RFC2822 ADDR_SPEC validator:
/[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/gi
However in practice I find this to be adequate:
/^[a-z0-9\._%+!$&*=^|~#%'`?{}/\-]+#([a-z0-9\-]+\.){1,}([a-z]{2,22})$/gi
Then, serverside, you can do an MX lookup to verify that the domain provided not only meets the formatting requirements but exists as an email receiving site.
This does not prove that the named mailbox exists at that site, nor that it is accepting emails - ultimately you'd need to send an email to that address including a click back link / password to establish whether the email address is valid.
Update
While, as the top voted answer here says, the best way to validate an ADDR_SPEC is to send a token to the address to be submitted back via the web, this is not of much help if the data is not coming from the person whom controls the mailbox, and the action is dissociated from the primary interaction even when they do. A further consideration is that an email address which is valid today might not be tomorrow.
Using a regex (and an MX lookup) is still a good idea to provide immediate feedback to the user, but for a complete solution you also need to monitor the bounces.

Checking email addresses for Spam Honeypots

Is there an API (preferably PHP based) or another means of checking a use-inputted email address against a list of known email honeypots or other email-address related spam stopping techniques?
Context: I'm working on a system to handle contacts for our clients. It'll eventually interface with Verticalresponse or similar. I want to check all incoming contact email addresses to be sure they're legit and not a purchased list.
You may not find such a database, or a PHP interface. Project Honeypot alone has 62,782,527 trap addresses on their monitor. That's 62 million addresses.
Anyone can make a spam trap. Check for example these references to get a picture of the futility of detecting honeypot addresses.
What is a honeypot
Project Honeypot home page
So what can you do to check that your customers' lists are legitimate? Use an check list for evaluating list contents. If you see anything like removethis, or any such common strings fooling spam harvesters, the list you're looking is probably not a good one. Also, a DNS check for the existence of any records for the domain is also a good way to see to which addresses it is possible to try to deliver mail. The DNS check won't tell you if your client has bought the list or not, but it will at least allow you to disregard those recipients.

Safe way to send mail via PHP to many users

Let me explain what I mean in my title. Let's say, that for example I'm creating a small e-commerce system for one web shop/catalog. There's a possibility for customers to choose, do they wish to receive newsletters or not. If they do, then logically thinking the newsletters should be send immediately as the newsletter is formed and ready.
Of course it can be done simple by fetching all specified user e-mails from database and using for cycle to send mail via mail function in loop, but problem is that I was told, that this is bad practice. The simple and not cheap way would be purchasing internet service for sending newsletters, but what for the php programmer is needed, then?
So I ask you humble comrades, what from your point of view might be a solution?
NB! You probably won't believe me, but it is not for spamming.
UPD: I might have explained myself wrong, but I would like to hear a solution not only about the correct way to send mail, but also about the correct delivery. Since not every mail send is always delivered.
Of course there are some reasons, which are unpredictable. For example someplace along the way something broke and mail was lost (if such thing is possible), but there are also other reasons which are influenced maybe from server or elsewhere. Maybe there is need to talk with hoster about it?
There is no reason why you couldn't write it in PHP, although I would not make it a part of a webrequest / HTTP process. I've successfully implemented for give or take 500,000 subscribers per mailing (depending on local data available, as this was a location-specific project). It was an in-house project, so unfortunately no code/package for you, but some pointers I came across:
Setting up delivery
Started out with phpmailer itself, to take care of formatting, encoding of contents and headers, adding of attachments etc. That portion of it works well, and I wouldn't want to write that from scratch.
The 'sending' of an email itself is just setting some flag in a database whether / how / what should be sent to (a portion of) the subscribers.
After this flag is set, it will automatically be picked up by a cronjob, no more webserver involved.
I started out with a heavily polluted database with millions of email addresses, of which a lot were obvious not valid, so first thing was to validate all email addresses for format, then for host:
filter_var($email, FILTER_VALIDATE_EMAIL); over the subscribers (and storing the result obviously) got rid of the first few hundred thousand invalid emails.
Splitting out the host (and storing the host name) from the emails, and validating that (does it have an MX or at least an A record in DNS, but keep in mind: you can send email to an IP-address foo#[255.255.255.255], so do keep those valid)) got rid of a good portion more. The emailaddresses here are not permanently disabled, but with a status flag that indicates they're disabled because of the domain name / ip.
Scripts were changed to require valid emailaddresses on subscription / before insertion, this nonsense of 'you aint gonna get it#anywhere' subscription-pollution in the database was just ridiculous.
Now I ended up with a list of email addresses that had the potential of being valid. There are in essence 3 ways to detect invalid addresses (keep in mind, the all can be temporary):
They are denied immediately by the server.
The earlier determined server just doesn't listen to traffic.
They are bounced long after you thought you delivered them.
Strange thing, the bounces, which every emailserver seems to have another format forand were a hell to parse at first, ended up actually pretty easy to capture using VERP. Rather then parsing whole emails, a dedicated email address (let's call it mailer#example.com) was configured to rather then deliver to mailbox, to pipe it through a command, and if we sent email to user#server.tld, the Return-Path was set for mailer+user=server.tld#example.com. Easily parsed on receipt, and after how many bounces (mailbox could not exist, mailbox may be full (yes, still!), etc.) you declare an emailaddress unusable is up to you.
Now, the direct denial by the server. Probably we could have gone with properly configuring some MTA and/or writing plugins for those, but as the emails were time-sensitive, and we had to have absolute configurable control per mailing over last usable delivery time (after which the email was not longer of interest to the user), throttling per receiving server, and generally everything, it would take about the same time writing a mailer in PHP which we knew better, that used the SMTP protocol directly to socket 25 on receiving servers. With a minimum amount of effort the possibility of another transport then the default choices in PHPMailer was built-in. The SMTP protocol is actually quite simple, but there are some caveats:
A lot of receiving server apply Grey Listing: most spambots will not really care if a specific mail arrives, they just churn them out. So, if an unknown / not yet trusted sender send mail, it will be temporarily rejected. Catch that (usually code 451), and place the email in the queue for later retry.
A mailserver, especially of the larger ISP's and free services (gmail, hotmail/msn/live, etc.) will not stand for a torrent of mail without fighting back: after the first couple of hundred / thousand, they start rejecting you. More about that later.
Getting speed
Now, we had a delivery system that worked, but it needed to be fast. Sending a 10,000 emails in an hour is all fine if you only have 10,000 addresses to send to, but the minimum we required was about 200,000 per hour. Start of it was a dedicated server (which can actually be pretty low powered, no matter what you do, most of time taken in delivery of email is in the network, not on your server).
Caching of IP's: remember all those IP's we requested from hostnames in email addresses? We stored those obviously, and looking up their IP's again and again causes considerable lag. However, IPs may change: a DNS record there, another MX at another place... the data gets stale fast. Most of the time the server isn't actually sending anything (subscription newslettters come in bursts obviously), a low-priority cronjob is running checking all hostnames with a stale IP (we chose older then 1 day as being stale) for an IP address, including those which previously had none (new domains get registered all the time, so why shouldn't a domain become available the day after someone already enthusiastically subscribed with his/her brand new email address? Or server problems with some domain are solved, etc.). Actually sending the emails now required no more domain lookups.
Reuse of the SMTP connection: setting up a connection to a server takes relatively a large portion of the time to deliver an email when you're talking directly to port 25. You don't have to setup a new connection for every email, you can just send the next over the same connection. A bit of trail-and-error has resulted in setting the default here to about 50 emails per connection (assuming you have that many or more for the domain). However, on failure of an emailaddress closing and reopening the connection for a retry sometimes helped. All in all, this really helped to speed things along.
Some obvious one, so obvious I almost forgot to mention it: it would be a waste to have to create the body of the email on the spot: if it's a general mail, have the body ready (I altered PHPMailer somewhat to be able to use a cached email), possibly days before (if you know you're going to send a mail on Friday, and your server is idling, why not prepare them on Wednesday already? If it's personalized, you could still prepare it beforehand given enough time, if not, at least have the non-personalized portions waiting to go.
Multiple processes. Did I mention much of the time it takes to deliver email is spend on the network? One mailing process is not nearly getting the most from your emailserver, barely noticable load and the mails are trickling out. Play around with a number of processes mailing different portions of the queue to see what's right for your server/connection, but remember 2 very important things:
Different processes make you very vulnerable to race conditions: be freaking absolutely sure you have a fullproof system that will never send the same mail twice (thrice, of even more). Not only does it seriously annoy users, your spamrating goes up a notch.
Keep domains together where possible: randomly picking from the queue you will lose the advantage of keeping an open connection to the server receiving email for the domain.
Avoiding rejects
You're going to send a lot of mail. That's exactly what spammers do. However, you don't want to be seen as a spammer (after all, you aren't, are you)? There are a number of mechanisms in place which will thoroughly increase your trustworthiness to receiving servers:
Have a proper reverse DNS: processes checking the DNS belonging to the IP that is sending the email like it very much if the second level domains match: are you sending mail on behalf of example.com? Make sure your server's reverse DNS is something like somename.example.com.
Publish SPF records for your domain: explicitly indicate the machine used to send your bulk email is allowed & expected to send mail with that From / Return-Path headers.
remember rejects: servers don't like it to tell you again and again that different email addresses don't exist. Either automated mechanisms, and even human admins, blocked our server while we worked through all non-validated email addresses that did (no longer) exist. We didn't employ a double opt-in until later, so the database was polluted with typos, people switching IPs and thereby email address, prank email addresses and so on. Be sure to capture those invalids, and given enough or sever enough failures, unsubscribe them. They're doing you no good, they're hogging resources, and if they really want you mail and the mailbox becomes available later, they'll just have to resubscribe.
DKIM is another mechanism that may increase your trustworthiness, but as we haven't implemented it (yet), I cannot tell you much about that.
MX records: some servers still like it if your sending server is also the receiving server for the domain. As it was at the time, we had only 1 MX, and as the mailing server was still not that very busy, we dubbed it the fallback MX server for the domain. The normal MX server was not the server sending the subscriptions, as it is very irritating to be temporarily blocked by a server you're trying to send an important email to (clients etc.) because you already sent a load of less important mail. It does have the highest preference as receiving MX, but in the event it would fail we had the nice bonus that our subscription sending server would still be fallback for delivery, so in crisis we could still get to it, preventing awkward bounces to customers trying to reach us.
Tell them about you. Seriously. A lot of major players in free email addresses like live.com offer you the opportunity to sign up in some way, or have some point of contact to go to for help & support if your emails get rejected. I you have a legitimate reason to send so many emails, and it is believable you have that many subscribers, chances are they seriously up the number of emails you can send to their server per hour. A meager 1,000 may become somewhere in the ten-thousands or even higher if you are persuasive and honest enough. There may be contracts, requirements you have to fulfill, and promises you must make (and keep) to be allowed this. ISP's are a brand apart, and every other player is different. Don't bother calling them usually, because 99% of the time the only numbers you can find will only have people willing to troubleshoot your internet connection, which understand (or are allowed) little else. An abuse# email address is a good place to start, but see if you can delve up a more to-the-point email address up from somewhere. Be precise, honest and complete: roughly how many subscribers of you have an emailaddress with that ISP, how often are you trying to mail them, what are the errors or denials you receive, how is the subscribe & unscubscribe process like, and what is the service you actually provide to their customers. Also, be nice: how vital sending those mails may be to your business, panicking about it and claiming terrible losses does not concern them. A polite statement of facts and wishes, and asking whether they can help rather then demanding a solution goes a very long way.
Throttling: as much as you tried, some server will accept only a certain amount of mail per hour and/or day from you. Learn those numbers (we're logging successes & failures anyway), set them to a reasonable default for the normal domains, set them to agreed upon limits for bigger players.
Avoiding being tagged as spam
First rule: don't spam!
Second rule: ever! Not a 'once off', not a 'they haven't subscribed but this may be the deal of a lifetime to them', not with the best intentions, people have had to ask for your emails.
Obviously set up a correct double opt-in subscription mechanism.
PHPMailer does set proper headers on its own,
Set up an easy unsubscribe mechanism, by web (include a link to it in every mail), possibly also email and customerservice if you have it. Make sure the customerservice can unsubscribe people directly.
As said earlier: unsubscribe (excessive) fails & bounces.
Avoid spammy 'deal of a lifetime' wordings.
Use url's in your emails sparingly.
Avoid adding links to domains outside your control, unless you are absolutely sure you can trust them not to spam, if even then...
Provide value to the user: being tagged as spam by user-interaction in google/yahoo/live webmail clients seriously hurts future successes (on a site note: if you sign up for it, live/msn/hotmail will forward all mail to you send by your domain which is tagged as spam by users. Learn to love it, and as always: unsubscribe them, they clearly don't want your mall and are hurting your spam rating).
Monitor blacklists for your IP. If you appear on one of those, it's bye bye, so immidiate action in both clearing your name and determining the case is required.
Measuring success rate
With the whole process under your control, you are reasonably sure the email ended up somewhere (although it could be the MX's bitbucket or a spam folder), or you have logged a failure & the reason why. That takes care of the 'actually delivered' numbers.
Some people will try to convince you to add links to online images to your emails (either real or the famous 1x1 transparent gif) to measure how many people actually read your email. As a high percentage blocks those images, these numbers are shaky at best, and our believe is we just shouldn't bother with them, their numbers are utterly unreliable.
Your best bet for measuring actual success rate are a lot easier if you want the users to do something. Add parameters to links in the mail, so you can measure how many users arrive at the site you linked, whether they performed desired actions (watched a video, left a comment, purchased goods).
All in all, with all the logging, the user-interface, configurable settings per domain / email / user etc. It took us about 1,5 man-month to build & iron out the quirks. That may be quite an investment compared to outsourcing the emails, it may be not, it all depends on the volume & business itself.
Now, let the flaming begin that I was a fool to write an MTA in PHP, I for one thoroughly enjoyed it (which is one reason I wrote this huge amount of text), and the extremely versatile logging & settings capabilities, per-host alerts based on failure percentage etc. are making live oh so easy ;)
Using something like Swiftmailer, PHPmailer or Zend_mail are much better alternatives to using the simple mail() function as it can be easily marked as spam. There are simply too many issues with mailing that need to be considered - most of these are solved by using pre-existing libraries.
Just a few problems that need to be addressed when sending mass emails manually:
Using incorrect headers.
Processing bounced messages
Timing out of script due to an influx of emails.
Edit:
Probably not the answer you're looking for. But, I would strongly suggest you invest in something like Campaign Monitor or Mail Chimp. Since this process is not for educational purposes, but commercial, I would strongly suggest the above services.
I got your question, but before replying, let's me go to usual considerations. First, I strongly recommend using a service like Mail Chimp. It's kinda free for small jobs, and has many cool features, like tracking back how many emails were open, how many were clicked, how many failed on deliver... Think about make a favor to yourself and don't reinvent the wheel.
Now, for getting to know purpose, let's go to the reply to your question.
First thing to have in mind is to enforce your list to be a good one. How to do that? Well, for a good list, I mean a valid email addresses list. Simple put a newsletter form on your page, with just one field (maybe a captcha, but I don't think it is necessary).
Save all input to a database table, with a field "isValid" set as false by default, and any kind of unique hash. Then you send a confirmation email, with a link (with the hash generated) for confimation which when clicked will make the "isValid" flag true, and a link for cancelation (ALWAYS send this cancelation link within all your emails).
This is what stores and serious sites do. Anything that force your customers/visitors to receive is a bad moral practice (ie, spam).
Second thing, use a good hosting service. Too cheap services usually are used by spammers, and major email services blacklist everything coming from those addresses.
I know, you should be asking yourself if I get your question wrong. No, I don't, technical stuff comes now.
Why is a bad practice put a mail function inside a for loop? Simple. Because function mail do several operations everytime it is called. PHP, will open a connection to mail server, send data to be parsed, ask for sending, register mail server status, close connection, bubble up status to finish the mail function you called and clean up the memory mess.
This connection overhead is the problem people state as bad practice from programming point of view. Using a SMTP/IMAP solution is better because it optimizes this process.
A little bit down on tech stuff I see your questions about delivery. Well, as I said, you have some ways to ensure you emails list is good enough. But what if another exception occurs, like having a blackout + no-break failures on customer server?
Well, PHP keeps the status of "asking mail server to send, mail server sent". If the mail server sent your message, PHP will return true. Period.
If client was unable to receive, or reject, you should check email headers and email status. These are on email server. Once again, these informations can be accessed with SMTP/POP/IMAP extensions, not with mail function.
If you wanna go further, read IMAP docs, search for email classes (phpclasses.org, pear and pecl are best places to look into).
Extra tip: RFCs can be useful, as you can understand better what email servers really talks to each other.
Extra tip 2: Access you gmail or ymail and check for your sent/received messages for their "full version" and read its headers. You can learn a lot with them.
Via SMTP Authentication: http://www.cyberciti.biz/tips/howto-php-send-email-via-smtp-authentication.html
Just use PHP Mail and study IMF and how to build custom headers you can attach the fourth parameter, exmaple follows
<?php
// multiple recipients
$to = 'aidan#example.com' . ', '; // note the comma
$to .= 'wez#example.com';
// subject
$subject = 'Birthday Reminders for August';
// message
$message = '
<html>
...
</html>
';
// To send HTML mail, the Content-type header must be set
$headers = 'MIME-Version: 1.0' . "\r\n";
$headers .= 'Content-type: text/html; charset=iso-8859-1' . "\r\n";
// Additional headers
$headers .= 'To: Mary <mary#example.com>, Kelly <kelly#example.com>' . "\r\n";
$headers .= 'From: Birthday Reminder <birthday#example.com>' . "\r\n";
$headers .= 'Cc: birthdayarchive#example.com' . "\r\n";
$headers .= 'Bcc: birthdaycheck#example.com' . "\r\n";
// Mail it
mail($to, $subject, $message, $headers);
?>
Source: http://php.net/manual/en/function.mail.php
create a mail queue subsystem which might include tables such as mail_queue, mail_status, mail_attachments, mail_recipients and mail_templates etc...
You might consider PHPMailer
http://phpmailer.worxware.com/index.php?pg=exampleasendmail
You can add multiple recipients and a special callback function for handling the returning messages for every single mail sent. (for an example visit the link)
I Don't think that catching a "mail delivery failed" error mail is possible via php except that you are using PHPMailer via SMTP and once in a while watch for any returning message form any recipient in from your outgoing email collection.

How to check if an email address exists without sending an email?

I have come across this PHP code to check email address using SMTP without sending an email.
Has anyone tried anything similar or does it work for you? Can you tell if an email customer / user enters is correct & exists?
There are two methods you can sometimes use to determine if a recipient actually exists:
You can connect to the server, and issue a VRFY command. Very few servers support this command, but it is intended for exactly this. If the server responds with a 2.0.0 DSN, the user exists.
VRFY user
You can issue a RCPT, and see if the mail is rejected.
MAIL FROM:<>
RCPT TO:<user#domain>
If the user doesn't exist, you'll get a 5.1.1 DSN. However, just because the email is not rejected, does not mean the user exists. Some server will silently discard requests like this to prevent enumeration of their users. Other servers cannot verify the user and have to accept the message regardless.
There is also an antispam technique called greylisting, which will cause the server to reject the address initially, expecting a real SMTP server would attempt a re-delivery some time later. This will mess up attempts to validate the address.
Honestly, if you're attempting to validate an address the best approach is to use a simple regex to block obviously invalid addresses, and then send an actual email with a link back to your system that will validate the email was received. This also ensures that they user entered their actual email, not a slight typo that happens to belong to somebody else.
Other answers here discuss the various problems with trying to do this. I thought I'd show how you might try this in case you wanted to learn by doing it yourself.
You can connect to an mail server via telnet to ask whether an email address exists. Here's an example of testing an email address for stackoverflow.com:
C:\>nslookup -q=mx stackoverflow.com
Non-authoritative answer:
stackoverflow.com MX preference = 40, mail exchanger = STACKOVERFLOW.COM.S9B2.PSMTP.com
stackoverflow.com MX preference = 10, mail exchanger = STACKOVERFLOW.COM.S9A1.PSMTP.com
stackoverflow.com MX preference = 20, mail exchanger = STACKOVERFLOW.COM.S9A2.PSMTP.com
stackoverflow.com MX preference = 30, mail exchanger = STACKOVERFLOW.COM.S9B1.PSMTP.com
C:\>telnet STACKOVERFLOW.COM.S9A1.PSMTP.com 25
220 Postini ESMTP 213 y6_35_0c4 ready. CA Business and Professions Code Section 17538.45 forbids use of this system for unsolicited electronic mail advertisements.
helo hi
250 Postini says hello back
mail from: <me#myhost.com>
250 Ok
rcpt to: <fake#stackoverflow.com>
550-5.1.1 The email account that you tried to reach does not exist. Please try
550-5.1.1 double-checking the recipient's email address for typos or
550-5.1.1 unnecessary spaces. Learn more at
550 5.1.1 http://mail.google.com/support/bin/answer.py?answer=6596 w41si3198459wfd.71
Lines prefixed with numeric codes are responses from the SMTP server. I added some blank lines to make it more readable.
Many mail servers will not return this information as a means to prevent against email address harvesting by spammers, so you cannot rely on this technique. However you may have some success at cleaning out some obviously bad email addresses by detecting invalid mail servers, or having recipient addresses rejected as above.
Note too that mail servers may blacklist you if you make too many requests of them.
In PHP I believe you can use fsockopen, fwrite and fread to perform the above steps programmatically:
$smtp_server = fsockopen("STACKOVERFLOW.COM.S9A1.PSMTP.com", 25, $errno, $errstr, 30);
fwrite($smtp_server, "helo hi\r\n");
fwrite($smtp_server, "mail from: <me#myhost.com>\r\n");
fwrite($smtp_server, "rcpt to: <fake#stackoverflow.com>\r\n");
The general answer is that you can not check if an email address exists event if you send an email to it: it could just go into a black hole.
That being said the method described there is quite effective. It is used in production code in ZoneCheck except that it uses RSET instead of QUIT.
Where user interaction with his mailbox is not overcostly many sites actually test that the mail arrive somewhere by sending a secret number that must be sent back to the emitter (either by going to a secret URL or sending back this secret number by email). Most mailing lists work like that.
This will fail (amongst other cases) when the target mailserver uses greylisting.
Greylisting: SMTP server refuses delivery the first time a previously unknown client connects, allows next time(s); this keeps some percentage of spambots out, while allowing legitimate use - as it is expected that a legitimate mail sender will retry, which is what normal mail transfer agents will do.
However, if your code only checks on the server once, a server with greylisting will deny delivery (as your client is connecting for the first time); unless you check again in a little while, you may be incorrectly rejecting valid e-mail addresses.
I can confirm Joseph's and Drew's answers to use RCPT TO: <address_to_check>. I would like to add some little addenda on top of those answers.
Catch-all providers
Some mail providers implement a catch-all policy, meaning that *#mydomain.com will return positive to the RCPT TO: command. But this doesn't necessarily mean that the mailbox "exists", as in "belongs to a human". Nothing much can be done here, just be aware.
IP Greylisting/Blacklisting
Greylisting: 1st connection from unknown IP is blocked. Solution: retry at least 2 times.
Blacklisting: if you send too many requests from the same IP, this IP is blocked. Solution: use IP rotation.
HTTP requests on sign-up forms
This is very provider-specific, but you sometimes can use well-crafted HTTP requests, and parse the responses of these requests to see if a username already signed up or not with this provider.
Here is the relevant function from an open-source library I wrote to check *#yahoo.com addresses using HTTP requests: check-if-email-exists. I know my code is Rust and this thread is tagged PHP, but the same ideas apply.
Full Inbox
This might be an edge case, but when the user has a full inbox, RCTP TO: will return a 5.1.1 DSN error message saying it's full. This means that the account actually exists!
Disclosure
I run [Reacher][1], a real-time email verification API. My code is written in Rust, and is 100% open-source. Check it out if you want a more robust solution:
Github: https://github.com/reacherhq/check-if-email-exists
With a combination of various techniques to jump through hoops, I manage to verify around 80% of the emails my customers check.
Not really.....Some server may not check the "rcpt to:"
http://www.freesoft.org/CIE/RFC/1123/92.htm
Doing so is security risk.....
If the server do, you can write a bot to discovery every address on the server....
Some issues:
I'm sure some SMTP servers will let you know immediately if an address you give them does not exist, but some won't as a privacy measure. They'll just accept whatever addresses you give them and silently ignore the ones that don't exist.
As the article says, if you do this too often with some servers, they will blacklist you.
For some SMTP servers (like gmail), you need to use SSL in order to do anything. This is only true when using gmail's SMTP server to send email.
About all you can do is search DNS and ensure the domain that is in the email address has an MX record, other than that there is no reliable way of dealing with this.
Some servers may work with the rcpt-to method where you talk to the SMTP server, but it depends entirely on the configuration of the server. Another issue may be an overloaded server may return a 550 code saying user is unknown, but this is a temporary error, there is a permanent error (451 i think?) that can be returned. This depends entirely on the configuration of the server.
I personally would check for the DNS MX record, then send an email verification if the MX record exists.
function EmailValidation($email)
{
$email = htmlspecialchars(stripslashes(strip_tags($email))); //parse unnecessary characters to prevent exploits
if (eregi('[a-z||0-9]#[a-z||0-9].[a-z]', $email)) {
//checks to make sure the email address is in a valid format
$domain = explode( "#", $email ); //get the domain name
if (#fsockopen ($domain[1],80,$errno,$errstr,3)) {
//if the connection can be established, the email address is probably valid
echo "Domain Name is valid ";
return true;
} else {
echo "Con not a email domian";
return false; //if a connection cannot be established return false
}
return false; //if email address is an invalid format return false
}
}
Although this question is a bit old, this service tip might help users searching for a similar solution checking email addresses beyond syntax validation prior to sending.
I have been using this open sourced service for a more in depth validating of emails (checking for mx records on the e-mail address domain etc.) for a few projects with good results. It also checks for common typos witch is quite useful. Demo here.
"Can you tell if an email customer / user enters is correct & exists?"
Actually these are two separate things. It might exist but might not be correct.
Sometimes you have to take the user inputs at the face value. There are many ways to defeat the system otherwise.
Assuming it's the user's address, some mail servers do allow the SMTP VRFY command to actually verify the email address against its mailboxes. Most of the major site won't give you much information; the gmail response is "if you try to mail it, we'll try to deliver it" or something clever like that.
I think you cannot, there are so many scenarios where even sending an e-mail can fail. Eg. mail server on the user side is temporarily down, mailbox exists but is full so message cannot be delivered, etc.
That's probably why so many sites validate a registration after the user confirmed they have received the confirmation e-mail.
You have many simple online tools like https://mail7.net
This service check the email address format, then make sure the domain name is valid and extracts the MX records.
So in 90% you can be sure if it valid. 90% because some mail servers are not involved in the process.
<?php
$email = "someone#exa mple.com";
if(!filter_var($email, FILTER_VALIDATE_EMAIL))
echo "E-mail is not valid";
else
echo "E-mail is valid";
?>

Categories