How to correct email address domains which are misspelled? - php

Sometimes users misspelled their email domain and hence they enter wrong email address.
Eg. abc#gmial.com rather than abc#gmail.com
Has anybody thought about this before? Can anybody suggest how to handle this type of mistakes?

It didn't exist when this question was asked, but I recommend MailCheck which auto-suggests corrections to entered emails. It's used successfully by large companies.

Can anybody suggest how to handle this type of mistakes?
You would usually send a confirmation E-Mail to the address given, and proceed only if a link in that E-Mail has been clicked.
There is no other good way to deal with this - it's impossible to tell for sure whether gmial.com is a typo or not, seeing as it's a valid domain.

Create a list of common email domain names:
hotmail.com
gmail.com
googlemail.com
... etc
When a user enters an email address, take the domain name of the entered address and take the Levenstein distance between your list. If the distance is 1 (or maybe up to 2) then ask the user to confirm that's the email address they meant.

In my opinion it is bordering on impossible to come up with a generic solution for the generic case.
That being said, the most common typo is to interchange two adajcent letters.
So you might want to check for character content for the largest sites gmail, yahoo and what have you; Based on that suggest an alternative spelling if the original does not match gmail etc.
Do not assume the user is at fault, suggest alternatives if it looks suspicious compared to common names. A white-list was mentioned in another reply.
Use confirmation mails if you need to know you can get a reply from this address.
You cannot assume the spelling you find is in error, that is what confirmation mails are for.
Make it very non-obtrusive (ajax springs to mind).

In our forms we're using a combination of techniques. While bad data can still slip through, the chances are vastly reduced.
First is to do a simple formatting regex that is commonly available - just be sure it's RFC-compliant. If this fails, it's good to offer the user a confirmation form at this point, because they may catch other errors for you while fixing this problem.
The next part is to check the TLD part of the domain. Since all TLDs can be known, these are relatively easy to scan for misspellings using some regex tests. Just keep a list of all current TLDs in a table somewhere and update it form time to time as needed (mind you, this list can get complex when dealing with international TLDs. If you're only dealing with US traffic, the rules are much easier, and that's something else you can filter out. For example, if you're selling a service only available in the US, it would make sense to filter out international emails at form submission time. We are, so this works for us).
Third is to do something like what #npclaudiu suggested - scan for common misspellings of big-name mail hosts (gmail, hotmail, yahoo, etc) in the domain part and if a possible hit is detected, offer a confirmation form to the user. (You entered someone#hptmail.com, did you mean hotmail.com?)
If you get through those steps, then you can do the MX lookup suggested by #symcbean.
Finally, if all of that succeeds, there is a method (but I've not yet tested it) for communicating with the remote SMTP host to see if the mailbox exists. We're about to begin testing this ourselves. I found the how-to for such here:
http://www.webdigi.co.uk/blog/2009/how-to-check-if-an-email-address-exists-without-sending-an-email/

The funny thing is that the url does exist http://www.gmial.com
In fact it would be very difficult for you to know if it's a mistake or just a "strange" domain. Look at the Google API's because when you type something wrong in Google they propose you "did you mean...."
good luck
Arnaud

You can not provide this functionality in a way that you auto correct the misspelled email domain names, because the name which you are assuming to be invalid, would be valid. you should expect anything to be entered as a email address domain name.
I would suggest, if you are creating a signup form, you provide user with a dropdown having all possible domain names which you are aware of so that he can make a selection from that.
Hope this helps.

You could create a list of popular e-mail domains (gmail.com, yahoo.com, ymail.com, etc) in your db and validate the e-mail address that the user inputs against this list, and if the domain resembles with one of these domains, you should show a warning and allow the user to correct it if necessary, not auto correct it. And to compare the domain entered with the domains in your list, you might use an algorithm like the the one used in the soundex function in SQL Server, that matches words based on if one word sounds like the second.
Edit: you can find more details the SOUNDEX function here.

As mentioned before, it is not a good idea to automatically assume that someone has mistyped an email. A better approach would be to implement a little javascript function that checks if the domain of the email was possibly mistyped and alert the user instead of assuming they were wrong from the start.
Give me a minute to create a little mockup.
EDIT: OK, so maybe it was more than a minute. Take a look at http://jsbin.com/iyaxuq/8/edit and see for yourself how javascript can help prevent common typing errors. Try emails like: test#gmail.cmo, another#yhaoo.com, loser#htomali.ocm (typo of hotmail), and me#aol.com.
Note: I used a lazy regex to validate the email. Don't rely on it (or for that matter, most regexes) for a real app.

Trying to automate correction of bad data is a very dangerous practice. Ultimately, only the user can provide the correct data. However there are strict rules about formatting an email address - a regex check can be run in javascript (or using the preg functions with the same regex syntax) - but note that there are a lot of bad examples on the internet of regexes claiming to solve the problem.
This should be a fairly complete implementation of an RFC2822 ADDR_SPEC validator:
/[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/gi
However in practice I find this to be adequate:
/^[a-z0-9\._%+!$&*=^|~#%'`?{}/\-]+#([a-z0-9\-]+\.){1,}([a-z]{2,22})$/gi
Then, serverside, you can do an MX lookup to verify that the domain provided not only meets the formatting requirements but exists as an email receiving site.
This does not prove that the named mailbox exists at that site, nor that it is accepting emails - ultimately you'd need to send an email to that address including a click back link / password to establish whether the email address is valid.
Update
While, as the top voted answer here says, the best way to validate an ADDR_SPEC is to send a token to the address to be submitted back via the web, this is not of much help if the data is not coming from the person whom controls the mailbox, and the action is dissociated from the primary interaction even when they do. A further consideration is that an email address which is valid today might not be tomorrow.
Using a regex (and an MX lookup) is still a good idea to provide immediate feedback to the user, but for a complete solution you also need to monitor the bounces.

Related

Detect keyboard mashed email addresses

We're trying to reduce our email bounce rate and often we get people mashing their keyboards. Here's a few example "email addresses" on our suppression list:
aaaaaaaaaaaaaaaa5#hotmail.com
aaaa_a#hotmail.com
991022865#gmail.com
725668844#gmail.com
82665#gmail.com
81c3988a#mailna.me
I wonder if it's possible to write some sort of php function to tell how likely the first part of an email is to be "mashed"?
Edit: We do send confirmation emails and run them through a 'validator'. Having a confirmed email is not necessary to use the site though and I'd rather not send them a confirmation email if it's super-likely to be mashed. Trying to reduce the bounce rate on confirmation emails, so we can reduce our costs of sending emails to those that are confirmed!
Rather than trying to fight against your users, ask yourself why users are mashing their keyboard rather than providing a real e-mail address. Are you making the e-mail address mandatory, even though the user gets no value from giving it to you? Do people think they're getting no value, because you haven't explained why you need it? Is it already optional, but it's not clear to the user that they can skip that field on the form?
Bear in mind that the users who are mashing the keyboard probably also have a negative impression of your site: "Why am I being asked for an e-mail address? Are they going to spam me? I'll just fill in junk!"
You say "having a confirmed email is not necessary to use the site", so maybe you just have a UX problem here: users think they must enter an e-mail address in order to proceed, so enter junk. Either your validation is wrong (marking the field mandatory when it should be optional) or your labelling is poor (not making clear which fields are mandatory and which are optional).
This sounds like a case for using a service such as this one if hard bounces are becoming a cost issue. Obviously you need to weigh up the costs to you to find out if it is worth it.

How reliable is Symfonys2 EmailValidator checkMX?

Like in title, how reliable is this check.
https://github.com/symfony/symfony/blob/3.0/src/Symfony/Component/Validator/Constraints/EmailValidator.php#L139-L142
Every single server in the world have their MX record exposed to the world?
Or is there a possibility that there exists a server that hides their MX record and even if email will be valid, check will fail.
UPDATE:
I have already checked Symfony documentation and source.
I know and I've tested that only emails domain is checked, not the user part.
I just don't know how reliable it is. Is it always possible to check servers MX records.
This validator only check if the DNS user in the email is valid for example if you set this email fail#google.com it will be validated even this email doesn't exit.
http://symfony.com/doc/current/reference/constraints/Email.html#checkmx
The checkMX option is done using PHP's checkdnsrr function, along with the checkHost option. You can also set a strict option and include the egulias/email-validator library for tighter restrictions. Using those options should be completely sufficient and reliable in determining if an email address is valid.
From experience, It seems like a good idea in theory and in practice will work most of the time but there will be occasions where the MX lookup fails due to network issues etc, and then the email will return as invalid, this then causes users who have inserted their correct email to become frustrated.
This in turn then causes developers to have to spend time looking at what went wrong.
Additionally, if a user is putting in a fake email, all they need do is use a real domain and fake user (like fbjdsbafjkbsdjafj #gmail.com) so it is of limited usefulness.
For 99.9% of cases it will be sufficient to check for email well-formedness without mx lookup using an established email checking library, (rather that roll-your-own)

Setting up an anonymous email system that logs IPs

I'm looking to set up a whistleblowing/anonymous tip website, but I've run into some problems. The basic idea is that you navigate to a splash page, fill in a few fields (name and location optionally, and then the message), then fire it off. At that point the message gets sent to a specific email inbox so that our team can look at it.
I've done a bit of research and PHP seems like my best bet, but I would also like to be able to log IP addresses for every message (or, more ideally, append them to the email before it is sent) so that I can be sure I'm not getting trolled or spammed. Can anyone point me in the right direction with this? I'm kind of a PHP noob, but willing to learn.
Thanks!
The remote IP address will be available within your php script using the super global $_SERVER['REMOTE_ADDR']. You can append that to your mail.
Just to mention: If you log the ip address of the sender, you kind of miss something important if you want the sender to be ANONYMOUS. Because if you log the ip, then this is not really the case anymore.
Problem
Spambots most of the times have a network of computers(hacked!) so blocking IP addresses most of the times does not work. Also I would like to point out the probably some legimate user who is not aware of the malware on his PC can't use your service because you are blocking his IP address. Otherwise CAPTCHA's were NOT necessary at all and Google, Yahoo! would not be using them at all because as you most likely know these images are hard to read sometimes.
Solution
You should just have a good spam filter(GMail's works very good) in place and use Akismet to detect spam-messages instead. They have very decent libraries in place so that you don't have to do any coding at all and it is going to work a lot better, then what you were about to implement.

Is there a way to check if the email is EXISTED using php?

I am getting more and more spam emails recently. I already validate my email using regular expression, all the emails must be something like this: xxx#xxxx.xxx
But the problem i have now is, there are alot spammers, type hsdjsdhgf#gmail.com, iluvhahahahah#yahoo.com, these emails are not existed, because i tried to send email to them.
How to avoid those email?
You're barking up the wrong tree. The better way to stop spam is by filtering them out in other ways from the form. If you are not a fan of CAPTCHAs like reCAPTCHA, you should look into what is known as "Honeypots". Essentially, add an extra field to your form with a common name like "email" and hide it with CSS. Mark it as "leave blank" for anyone browsing with styles disabled. If the field has a value in it, it is most likely a bot, so throw the submission out. Voila. They work really well for your average mid-size website that spammers don't really care to specifically set out to beat. Check out this related question.
All things considered, I like the honeypots because it is usually enough to deter 99% of your spamming while not making your average user have to do anything. This is important.
To quickly answer your original question: the only way to verify the email is valid is to actually send out an email to that address and see if it bounces. It is generally not worth the hassle, however.
If they are signing up for a newsletter or something that you will email them more than once I like to send a confirmation email to them which they must click a link to verify that it is a real email. If they dont confirm after a few days then you know you can delete it. If they do confirm then you know that it is a real person. You will still get bounce backs when you first email them but it will eliminate reoccurring bounce backs.
I also like CAPTCHAs or if you want something simpler ask them to enter the first letter of the title of your website (or some other word that will always remain the same on the page), it works for smaller to mid sized sites but is mostly effective.
Short answer:
Use a CAPTCHA.
Long answer:
Well, you could do an MX lookup, where you find the server responsible for delivering email for their given domain, then query the mail-server to see if the address is valid. In a perfect world, this would be the ideal way to validate email addresses.
Unfortunately, as an anti-spam measure, most mail servers these days will respond to such a query with either all positives or all negatives.
This leaves us with really only one practical solution: CAPTCHAs.

How do I protect my forum against spam?

I have a forum on a website I master, which gets a daily dose of pron spam. Currently I delete the spam and block the IP. But this does not work very well. The list of blocked IP's is growing quickly, but so is the number of spam posts in the forum.
The forum is entirely my own code. It is built in PHP and MySQL.
What are some concrete ways of stopping the spam?
Edit
The thing I forgot to mention is that the forum needs to be open for unregistered users to post. Kinda like a blog comment.
In a guestbook app I wrote, I implemented two features which prevent most of the spam:
Don't allow POST as the first request in a session
Require a valid HTTP Refer(r)er when posting
One way that I know which works is to use JavaScript before submitting the form. For example, to change the method from GET to POST. ;) Spambots are lousy at executing JavaScript. Of course, this also means that non-Javascript people will not be able to use your site... if you care about them that is. ;) (Note: I don't)
In my experience, the best easy defenses come from just doing something "non-standard". If you make your site non-standard, this makes it so that any automated spam would have to be coded specifically for your site, which (no offense) probably isn't worth the effort. Note that if the spam is coming from human spammers, there's not really anything you can do that won't also stop legitimate posters. So the goal is to find a solution that will throw away any "standard" posts - that is, "fill out the whole form and push submit".
A couple examples that come to mind of things that you could try:
Have a hidden form field with a name that sounds like something a spammer would want to fill out, like "website" or "homepage" or something like that. If the form field gets filled out, throw away the message instead of posting it, because it was a bot automatically filling in the whole form, even invisible fields.
You don't have to use a "real" captcha, but even something simple like "Enter the following word backwards: <random backwards word>" or "What is the domain name of this website?". Easy for a human to do, but it would require a fairly complex bot to figure out what to fill in.
You could use a captcha, there are some good scripts like PHPCaptcha or use a spam control service, like Akismet, they have a PHP API.
You might want to look at this question, which has several answers that describe how you could implement a non-intrusive captcha.
Another thing to consider is to require time between posts to prevent massive spamming.
Include a CAPTCHA that is always "orange".
The spams may be by bots or humans - bots are more likely.
To stop the bots, put in a hidden field populated by Javascript - there is a 99.5% chance that a standard, stupid bot that isn't customised to your site will fail to fill that in.
If they fail to fill it in correctly, give them a message that Javascript is required or something, and give them an opportunity to post some other way (e.g. with a captcha or registration). That way anonymous users who aren't spambots can (mostly) still post with no problems, and most spambots (which haven't been tailored for your specific site) won't.
Don't bother blacklisting IP addresses or using third party blacklists, that will just generate false positives. Almost all bots use the same IP addresses as (some) legitimate users.
Another trick is to put in a text field with a plausible sounding name, which is made difficult to see with CSS - anyone filling this field in with anything is considered to be a bot.
Advanced solutions:
Akismet
Defensio
Sblam! (open source clone of the above)
You can try your luck with non-standard form:
fields that must stay empty hidden with CSS
fields with misleading names, e.g. <input name=email> for something that is not an e-mail.
For me CAPTCHA is like giving up to spammers and letting them damage your forum anyway – except that instead of spam damage, you get usability and accessibility damage.
Something I've found to be surprisingly effective: disallow comments that contain too many URLs (more than, say, 5). Since doing that, I've had zero comment spam.
Edit: Since writing the above, I've had recurring comment spam with only one link. I have now added some honeypot fields and have had no commend spam for a few months now.
Don't let anybody post until they respond to an email sent to their registered email address. You'll see lots of forums and mailing lists generate a unique email address or web url that is sent to the new user's given email address, and they have to respond to the email or click on the link to finalize their registration.
Captcha is definitely the easiest method - try KittenAuth if you want something bot-proof (Although I got pandas this time)
Kitten Auth
There is no single answer since Spam is really a matter of economics: how much is it worth it to someone to put their stuff onto the web. There, however, some solutions that seem pretty good
Recaptcha
Use CCS to create an invisible
field that robots fill-in
Create a time-specific hidden field in your form so the
robot can't use the same form over and over again.
I want to say that in most time, a CAPTCHA is enough for you to prevent SPAMers.
But do use a strong one, like http://www.captcha.net/.
Remember that SPAMers do not want to spend much time to deal with a particular site(except heavy traffic sites), they use a tool to post AD on a lot of sites. So make your FORM a little unusual, (e.g. give the user a image says '1.5+2.4=?' and let users to answer, this will block most of the spam tools :) )
The easiest thing I've done to stop spammers with (so far) 100% consistency is to validate the text that was submitted. If you use the php function strstr() to check for "a href" or even a non-clickable http or www, you can then just reroute the spammer elsewhere. I actually have a script then write to my .htaccess file to deny the offending IP address. Not sure if there's any other kind of spam to be concerned about, but links are all I've seen so far.

Categories