Detect keyboard mashed email addresses

We're trying to reduce our email bounce rate, and we often get people mashing their keyboards. Here are a few example "email addresses" on our suppression list:
aaaaaaaaaaaaaaaa5@hotmail.com
aaaa_a@hotmail.com
991022865@gmail.com
725668844@gmail.com
82665@gmail.com
81c3988a@mailna.me
I wonder if it's possible to write some sort of PHP function to tell how likely the first part of an email address is to be "mashed"?
Edit: We do send confirmation emails and run them through a 'validator'. Having a confirmed email is not necessary to use the site though and I'd rather not send them a confirmation email if it's super-likely to be mashed. Trying to reduce the bounce rate on confirmation emails, so we can reduce our costs of sending emails to those that are confirmed!

Rather than trying to fight against your users, ask yourself why users are mashing their keyboard rather than providing a real e-mail address. Are you making the e-mail address mandatory, even though the user gets no value from giving it to you? Do people think they're getting no value, because you haven't explained why you need it? Is it already optional, but it's not clear to the user that they can skip that field on the form?
Bear in mind that the users who are mashing the keyboard probably also have a negative impression of your site: "Why am I being asked for an e-mail address? Are they going to spam me? I'll just fill in junk!"
You say "having a confirmed email is not necessary to use the site", so maybe you just have a UX problem here: users think they must enter an e-mail address in order to proceed, so enter junk. Either your validation is wrong (marking the field mandatory when it should be optional) or your labelling is poor (not making clear which fields are mandatory and which are optional).

This sounds like a case for using a service such as this one if hard bounces are becoming a cost issue. Obviously you need to weigh up the costs to you to find out if it is worth it.

Related

Letting users define sender of an email

A client asked me about a little form for his website, from which it would be possible to mail the URL to someone. Something like "Hey check this out".
Since he was not happy with mailto:, I want to use the PHP mail() function, but I wonder if it is smart to let users define the sender of the email. I am worried about the form being abused for spam/phishing.
Is that a reason to worry? Is it even legal?

It's legal to send e-mail. It's not legal (everywhere) to send spam. But you are just providing a share link, not a relay server, so I wouldn't worry about that. If you limit the amount of control over the content of the message, and limit the number of people to send it to, it won't be too interesting for spammers.
Letting the user choose a sender is not a very good idea. Some mail relay servers check if the originating server is allowed to send e-mails for the domain specified in the address, so the mails might never arrive. You can safely set the sender name, though.
Apart from that, if the receivers of the message consider it as spam and report it, your domain might become blacklisted, and your mails will be sent to junk mail in many cases, so you want to make sure no (or little) spam is sent through your form.
Those bots try every form automatically just to see what happens, so you'll need to make some effort. You could add a captcha, which is an obstacle for humans too, although Google is going to put an end to that. Or you could protect it through other means, like a honeypot. Maybe you can just generate the form through JavaScript, which is a big obstacle for most spam bots.

Setting the From on an email in php mail isn't the cause for concern. The problem is that you'll be sending emails from your server. The mail headers will have your server information embedded - so any issues will tie back to you.
As long as you can safeguard your own server from allowing these spam/phishing attacks, then there's nothing wrong with it.
Just limit the number of people this mail function can send to - and make sure it can't be called multiple times in succession -- like with a script.
This way, the spammers wouldn't benefit from using your page to try to send spam. They'll go elsewhere.
There's much more to do to work with sending email, but this will at least get you started.

"Is it legal" depends upon the country you are in.
I don't think you need to worry about spam if you set up a login.
Or you could limit the number of emails by IP address. This can be spoofed, however, so it may not be the best option.
There are other control options you could do; limit number of emails by User Agent/IP combination, etc.
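
To make the per-client limit above a bit more concrete, here is a minimal sketch of a sliding-window check. The function name, the default limit of 5 and the one-hour window are all made up for illustration, and it assumes you already store the send times per client key (IP address, User Agent/IP combination, login, whatever you choose) somewhere like a database or session.

```php
<?php
// Hypothetical helper: decides whether one client may send another email.
// $sentAt holds the Unix timestamps of that client's earlier sends.
function maySendAnother(array $sentAt, int $now, int $limit = 5, int $window = 3600): bool
{
    // Keep only the sends that fall inside the current window.
    $recent = array_filter($sentAt, fn (int $t): bool => $t > $now - $window);

    // Allow the send only while the client is still under the limit.
    return count($recent) < $limit;
}

// Usage idea: load the timestamps for $_SERVER['REMOTE_ADDR'], call
// maySendAnother($timestamps, time()) and refuse to call mail() on false.
```

This only covers the counting part; where the timestamps live and how the client is identified is up to you.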

Aside from the reasons pointed out by others who have answered this question, I would advise against doing this because these messages will likely be marked as spam by spam filters, due to SPF and DMARC records.
For example, if someone sends a message through your system from a yahoo.com address, most spam filters will treat the message as spam because of Yahoo's DMARC record, which basically says, 'any message sent from a yahoo.com email address that did not originate from a mail server on yahoo's network is spam'. See https://help.yahoo.com/kb/mail/SLN24016.html?impressions=true for more info.
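
One way to act on the advice in these answers (set the sender name, but not the sender address) is sketched below. Treat it as an assumption-laden example rather than a recipe: buildShareHeaders() and share@example.com are made-up names, and putting the visitor's address in Reply-To is my own addition, not something the answers above prescribe. It relies on mail() accepting an array of headers, which needs PHP 7.2 or later.

```php
<?php
// Hypothetical helper: the message is sent From the site's own address
// (so SPF/DMARC checks apply to your domain), while the visitor's name is
// shown and their address, if valid, goes into Reply-To.
function buildShareHeaders(string $senderName, string $senderEmail, string $siteAddress): array
{
    // Remove CR/LF and quotes so user input cannot inject extra headers.
    $name = trim(str_replace(["\r", "\n", '"'], '', $senderName));

    $headers = ['From' => sprintf('"%s" <%s>', $name, $siteAddress)];

    // filter_var() rejects addresses containing line breaks, so this also
    // guards the Reply-To header against injection.
    if (filter_var($senderEmail, FILTER_VALIDATE_EMAIL) !== false) {
        $headers['Reply-To'] = $senderEmail;
    }

    return $headers;
}

// Usage idea:
// mail($to, $subject, $body, buildShareHeaders($name, $email, 'share@example.com'));
```

Names with non-ASCII characters would need proper header encoding on top of this, which the sketch leaves out.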

Coding email addresses directly in HTML is bad, right? What is a better easy solution?

I have a site that will display a large number of members publicly with their contact info. It is a bad idea to simply spit their emails up on the page in HTML due to spamming, etc., right?
So I am sure there are a thousand ways to deal with this. I have seen recommendations such as using "name[at]blah[dot]com" (which just doesn't seem that much more secure). I guess the logical step would be to utilize phpmail or swiftmailer or something along those lines? Both look like they are right at the edge of my PHP skills and would cause me some headaches before I could get them to work right.
Is there an easier solution? Can anyone suggest one with the best simple-to-effective ratio?
I am grabbing the info from member-entered SQL data.

The better solution to displaying their emails or creating a mailto: link would be to add a "Click Here to Email" button that gives the user a simple form (From address, Subject, Body). When this form is posted, create the email server-side and send it to the recipient.
If you implement the above system, do not simply output the recipient's email address into a hidden input field. Instead, write out a unique identifier and then look up the email address on the backend. This way, the recipient's address is never available to be scraped by a bot or malicious user.
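
A rough sketch of the "identifier in the form, address on the backend" idea follows. The $members array stands in for your member table and recipientForMember() is a made-up name; in a real site the lookup would be a database query keyed by the member id.

```php
<?php
// Hypothetical lookup: the form only carries a member id, and the email
// address is resolved server-side, so it never appears in the HTML.
function recipientForMember(array $members, $memberId): ?string
{
    // Accept only plain non-negative integers as ids.
    if (!is_string($memberId) && !is_int($memberId)) {
        return null;
    }
    if (!ctype_digit((string) $memberId)) {
        return null;
    }

    return $members[(int) $memberId]['email'] ?? null;
}

// Usage idea: the form contains <input type="hidden" name="member" value="42">
// and the handler calls recipientForMember($members, $_POST['member'] ?? null).
```

Sequential ids can still be enumerated by someone who wants to send mail to every member, so this is worth combining with some form of rate limiting; the point here is only that the address itself cannot be scraped.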

What do you want to do: send mail or display the address? If you want to send, use PHP's mail function; it's easy to use and won't cause a headache. If you only want to display the address, first consider whether you really need to. If you do, either replace the @ and . with bracketed placeholders such as [at] and [dot] (I'm not sure whether robots understand this or not) or, even safer, display the address only on click via AJAX and PHP. Most spambots and mail-collecting bots do not click.

How to correct email address domains which are misspelled?

Sometimes users misspell their email domain and hence enter a wrong email address.
E.g. abc@gmial.com rather than abc@gmail.com
Has anybody thought about this before? Can anybody suggest how to handle this type of mistake?

It didn't exist when this question was asked, but I recommend MailCheck which auto-suggests corrections to entered emails. It's used successfully by large companies.

Can anybody suggest how to handle this type of mistake?
You would usually send a confirmation E-Mail to the address given, and proceed only if a link in that E-Mail has been clicked.
There is no other good way to deal with this - it's impossible to tell for sure whether gmial.com is a typo or not, seeing as it's a valid domain.
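
For the confirmation link itself, something along these lines is a common pattern. It is only a sketch: the function names are invented, and storing a SHA-256 hash of the token instead of the token itself is a design choice I am assuming here, not something the answer specifies.

```php
<?php
// Hypothetical helpers for a "click the link to confirm" flow.
function createConfirmation(): array
{
    // 16 random bytes, hex-encoded: this is what goes into the emailed link.
    $token = bin2hex(random_bytes(16));

    // Store only the hash alongside the unconfirmed address.
    return ['token' => $token, 'hash' => hash('sha256', $token)];
}

function confirmationMatches(string $storedHash, string $tokenFromUrl): bool
{
    // hash_equals() compares in constant time.
    return hash_equals($storedHash, hash('sha256', $tokenFromUrl));
}

// Usage idea: email a link such as https://example.com/confirm?t=<token>,
// then mark the address as confirmed when confirmationMatches() returns true.
```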

Create a list of common email domain names:
hotmail.com
gmail.com
googlemail.com
... etc
When a user enters an email address, take the domain name of the entered address and compute the Levenshtein distance between it and each domain in your list. If the distance is 1 (or maybe up to 2) then ask the user to confirm that's the email address they meant.
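
PHP has a built-in levenshtein() function, so the check described above could look roughly like this. suggestDomain() is a made-up name and the threshold of 2 is just the upper bound mentioned in the answer; a plain transposition such as gmial/gmail counts as distance 2 with this function, which is why 1 alone would miss it.

```php
<?php
// Hypothetical helper: returns the closest known domain when the entered
// domain is near, but not equal to, one of the domains in the list.
function suggestDomain(string $email, array $knownDomains, int $maxDistance = 2): ?string
{
    $at = strrpos($email, '@');
    if ($at === false) {
        return null;
    }

    $domain = strtolower(substr($email, $at + 1));
    if ($domain === '' || in_array($domain, $knownDomains, true)) {
        return null; // nothing to check, or already an exact match
    }

    $best = null;
    $bestDistance = $maxDistance + 1;
    foreach ($knownDomains as $known) {
        $distance = levenshtein($domain, $known);
        if ($distance < $bestDistance) {
            $best = $known;
            $bestDistance = $distance;
        }
    }

    return $best; // null when nothing is within $maxDistance
}

// Usage idea: if suggestDomain() returns a domain, show "Did you mean ...?"
// and let the user decide; never replace the address silently.
```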

In my opinion it is bordering on impossible to come up with a generic solution for the generic case.
That being said, the most common typo is to interchange two adjacent letters.
So you might want to compare the characters of the entered domain against the largest sites (gmail, yahoo and what have you) and, if it nearly but not exactly matches one of them, suggest that spelling as an alternative.
Do not assume the user is at fault, suggest alternatives if it looks suspicious compared to common names. A white-list was mentioned in another reply.
Use confirmation mails if you need to know you can get a reply from this address.
You cannot assume the spelling you find is in error, that is what confirmation mails are for.
Make it very non-obtrusive (ajax springs to mind).

In our forms we're using a combination of techniques. While bad data can still slip through, the chances are vastly reduced.
First is to do a simple formatting regex that is commonly available - just be sure it's RFC-compliant. If this fails, it's good to offer the user a confirmation form at this point, because they may catch other errors for you while fixing this problem.
The next part is to check the TLD part of the domain. Since all TLDs can be known, these are relatively easy to scan for misspellings using some regex tests. Just keep a list of all current TLDs in a table somewhere and update it from time to time as needed. Mind you, this list can get complex when dealing with international TLDs. If you're only dealing with US traffic, the rules are much easier, and that's something else you can filter out. For example, if you're selling a service only available in the US, it would make sense to filter out international emails at form submission time. We are, so this works for us.
Third is to do something like what @npclaudiu suggested - scan for common misspellings of big-name mail hosts (gmail, hotmail, yahoo, etc) in the domain part and if a possible hit is detected, offer a confirmation form to the user. (You entered someone@hptmail.com, did you mean hotmail.com?)
If you get through those steps, then you can do the MX lookup suggested by @symcbean.
Finally, if all of that succeeds, there is a method (but I've not yet tested it) for communicating with the remote SMTP host to see if the mailbox exists. We're about to begin testing this ourselves. I found the how-to for such here:
http://www.webdigi.co.uk/blog/2009/how-to-check-if-an-email-address-exists-without-sending-an-email/

The funny thing is that the URL http://www.gmial.com does exist.
In fact it would be very difficult for you to know whether it's a mistake or just a "strange" domain. Look at the Google APIs, because when you type something wrong in Google it suggests "did you mean....".
good luck
Arnaud

You cannot auto-correct misspelled email domain names, because a name that you assume to be invalid may in fact be valid. You should expect anything to be entered as an email address domain name.
I would suggest that, if you are creating a signup form, you provide the user with a dropdown of all the domain names you are aware of, so that they can make a selection from it.
Hope this helps.

You could create a list of popular e-mail domains (gmail.com, yahoo.com, ymail.com, etc) in your db and validate the e-mail address that the user inputs against this list. If the domain resembles one of these domains, you should show a warning and allow the user to correct it if necessary, not auto correct it. To compare the domain entered with the domains in your list, you might use an algorithm like the one used in the SOUNDEX function in SQL Server, which matches words based on whether one word sounds like the other.
Edit: you can find more details on the SOUNDEX function here.
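
If you would rather do the comparison in PHP than in SQL Server, PHP ships its own soundex() function. The sketch below swaps that in for the SQL Server function mentioned above; soundsLikeKnownDomain() is a made-up name, and it only compares the part of the domain before the first dot, which is a simplification of mine.

```php
<?php
// Hypothetical helper: returns a known domain whose first label "sounds like"
// the entered one according to PHP's soundex(), or null when none does.
function soundsLikeKnownDomain(string $domain, array $knownDomains): ?string
{
    $domain = strtolower($domain);
    $label = explode('.', $domain)[0];

    foreach ($knownDomains as $known) {
        if ($domain === $known) {
            continue; // exact matches need no warning
        }
        if (soundex($label) === soundex(explode('.', $known)[0])) {
            return $known;
        }
    }

    return null;
}

// Usage idea: warn the user when soundsLikeKnownDomain('gmial.com', $list)
// returns 'gmail.com', and let them correct the address themselves.
```

Soundex keeps the first letter and drops vowels, so it catches swapped vowels well but misses typos that change a consonant, and it can also flag unrelated domains that happen to share a code. It is probably best used next to other checks rather than on its own.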

As mentioned before, it is not a good idea to automatically assume that someone has mistyped an email. A better approach would be to implement a little javascript function that checks if the domain of the email was possibly mistyped and alert the user instead of assuming they were wrong from the start.
Give me a minute to create a little mockup.
EDIT: OK, so maybe it was more than a minute. Take a look at http://jsbin.com/iyaxuq/8/edit and see for yourself how JavaScript can help prevent common typing errors. Try emails like: test@gmail.cmo, another@yhaoo.com, loser@htomali.ocm (typo of hotmail), and me@aol.com.
Note: I used a lazy regex to validate the email. Don't rely on it (or for that matter, most regexes) for a real app.

Trying to automate correction of bad data is a very dangerous practice. Ultimately, only the user can provide the correct data. However there are strict rules about formatting an email address - a regex check can be run in javascript (or using the preg functions with the same regex syntax) - but note that there are a lot of bad examples on the internet of regexes claiming to solve the problem.
This should be a fairly complete implementation of an RFC2822 ADDR_SPEC validator:
/[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/gi
However in practice I find this to be adequate:
/^[a-z0-9\._%+!$&*=^|~#%'`?{}/\-]+@([a-z0-9\-]+\.){1,}([a-z]{2,22})$/gi
Then, serverside, you can do an MX lookup to verify that the domain provided not only meets the formatting requirements but exists as an email receiving site.
This does not prove that the named mailbox exists at that site, nor that it is accepting emails - ultimately you'd need to send an email to that address including a click back link / password to establish whether the email address is valid.
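
On the server side, the format check plus MX lookup could be sketched like this in PHP. Note the assumptions: I use filter_var() instead of the regexes above for the syntax check, domainAcceptsMail() is a made-up name, and the optional $hasMx parameter exists only so the DNS lookup can be replaced in tests. The real lookup uses checkdnsrr().

```php
<?php
// Hypothetical helper: syntax check first, then ask DNS whether the domain
// publishes an MX record.
function domainAcceptsMail(string $email, ?callable $hasMx = null): bool
{
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return false;
    }

    $domain = substr($email, strrpos($email, '@') + 1);

    // Default resolver: a real DNS query. Pass your own callable to avoid
    // network access, for example in unit tests.
    $hasMx = $hasMx ?? fn (string $d): bool => checkdnsrr($d, 'MX');

    return $hasMx($domain);
}

// Usage idea: if (!domainAcceptsMail($_POST['email'])) { ask the user to
// double-check the address before submitting. }
```

Some domains receive mail without an MX record by falling back to their address record, so a failed lookup is probably better treated as a reason to ask the user to check again than as proof the address is wrong.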
Update
While, as the top voted answer here says, the best way to validate an ADDR_SPEC is to send a token to the address to be submitted back via the web, this is not of much help if the data is not coming from the person who controls the mailbox, and the action is dissociated from the primary interaction even when it is. A further consideration is that an email address which is valid today might not be tomorrow.
Using a regex (and an MX lookup) is still a good idea to provide immediate feedback to the user, but for a complete solution you also need to monitor the bounces.

E-mail verification through e-mail & PHP?

I have seen on some sites where the user can simply send a blank e-mail to something like verify@domain.com to have their e-mail verified if they are having trouble getting the verification e-mail. I have a website with PHP/MySQL that I'd like to implement this same functionality, but I haven't done much with e-mail besides sending it so I don't even know where to start to set something like this up.

Basically, if your mailbox is accessible over IMAP, you could use these functions via PHP http://www.php.net/manual/en/ref.imap.php (if enabled, check your phpinfo()) and read that specific mailbox (http://www.php.net/manual/en/function.imap-open.php).
Run a cronjob every 10 minutes maybe (I say 10 minutes as I don't see many people doing this), loop through all the emails (if any), run your logic to verify that email account, send them an email to say it's been verified, then delete that email item from your account so you are not creating a massive backlog of emails.
It's a risky way of having someone verify, but this is probably one way of doing it.

If your host allows you to, you can pipe received email to a program (in your case, a PHP script), which could then parse the message and decide what to do.
However, I agree that this isn't very secure. It would be very easy to spoof the sender, unless you implement DomainKey checking or DNS lookups.
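
For the piped-script approach, the parsing step might look something like this. It is a simplified sketch: senderFromRawMessage() is a made-up name, it does not handle folded (multi-line) headers, and, as said above, the From header it reads can be spoofed, so on its own it proves nothing about who sent the message.

```php
<?php
// Hypothetical helper: pulls the sender address out of a raw message as it
// would arrive on standard input when mail is piped to a script.
function senderFromRawMessage(string $raw): ?string
{
    // The header section ends at the first blank line.
    $headers = preg_split('/\r?\n\r?\n/', $raw, 2)[0];

    if (!preg_match('/^From:\s*(.+)$/mi', $headers, $match)) {
        return null;
    }

    $value = trim($match[1]);

    // Accept both "Name <address>" and a bare address.
    if (preg_match('/<([^>]+)>/', $value, $angle)) {
        $value = $angle[1];
    }

    return filter_var($value, FILTER_VALIDATE_EMAIL) !== false
        ? strtolower($value)
        : null;
}

// Usage idea (CLI script that the mail server pipes messages to):
// $sender = senderFromRawMessage(stream_get_contents(STDIN));
```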

Is there a way to check if an email address exists using PHP?

I have been getting more and more spam emails recently. I already validate the email address with a regular expression; every address must look something like this: xxx@xxxx.xxx
But the problem I have now is that a lot of spammers type addresses like hsdjsdhgf@gmail.com or iluvhahahahah@yahoo.com. These addresses do not exist; I know because I tried to send email to them.
How can I avoid those addresses?

You're barking up the wrong tree. The better way to stop spam is by filtering them out in other ways from the form. If you are not a fan of CAPTCHAs like reCAPTCHA, you should look into what is known as "Honeypots". Essentially, add an extra field to your form with a common name like "email" and hide it with CSS. Mark it as "leave blank" for anyone browsing with styles disabled. If the field has a value in it, it is most likely a bot, so throw the submission out. Voila. They work really well for your average mid-size website that spammers don't really care to specifically set out to beat. Check out this related question.
All things considered, I like the honeypots because it is usually enough to deter 99% of your spamming while not making your average user have to do anything. This is important.
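
The server-side half of a honeypot is tiny. Here is a sketch, assuming the hidden trap field is the one named "email" as suggested above and the visitor's real address is collected in a differently named field; looksLikeBot() and the field names are made up.

```php
<?php
// Hypothetical check: the trap field is hidden with CSS, so a human leaves
// it empty while a bot that fills in every field gives itself away.
function looksLikeBot(array $post, string $trapField = 'email'): bool
{
    $value = $post[$trapField] ?? '';

    // Anything other than an empty (or whitespace-only) string is suspicious.
    return !is_string($value) || trim($value) !== '';
}

// Usage idea: if (looksLikeBot($_POST)) { discard the submission quietly,
// without telling the bot why. }
```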
To quickly answer your original question: the only way to verify the email is valid is to actually send out an email to that address and see if it bounces. It is generally not worth the hassle, however.

If they are signing up for a newsletter or something where you will email them more than once, I like to send them a confirmation email containing a link they must click to verify that it is a real address. If they don't confirm after a few days then you know you can delete it. If they do confirm then you know that it is a real person. You will still get bounce backs when you first email them but it will eliminate recurring bounce backs.
I also like CAPTCHAs or, if you want something simpler, ask them to enter the first letter of the title of your website (or some other word that will always remain the same on the page). It only suits small to mid-sized sites, but it is mostly effective.

Short answer:
Use a CAPTCHA.
Long answer:
Well, you could do an MX lookup, where you find the server responsible for delivering email for their given domain, then query the mail-server to see if the address is valid. In a perfect world, this would be the ideal way to validate email addresses.
Unfortunately, as an anti-spam measure, most mail servers these days will respond to such a query with either all positives or all negatives.
This leaves us with really only one practical solution: CAPTCHAs.
