Checking email addresses for Spam Honeypots

Is there an API (preferably PHP-based) or another means of checking a user-entered email address against a list of known email honeypots, or of applying other email-address-related spam-stopping techniques?
Context: I'm working on a system to handle contacts for our clients. It'll eventually interface with Verticalresponse or similar. I want to check all incoming contact email addresses to be sure they're legit and not a purchased list.

You may not find such a database, or a PHP interface. Project Honeypot alone has 62,782,527 trap addresses on their monitor. That's 62 million addresses.
Anyone can make a spam trap. Check for example these references to get a picture of the futility of detecting honeypot addresses.
What is a honeypot
Project Honeypot home page
So what can you do to check that your customers' lists are legitimate? Use a checklist for evaluating list contents. If you see anything like removethis, or any other common string meant to fool spam harvesters, the list you're looking at is probably not a good one. A DNS check for the existence of any records for the domain is also a good way to see which addresses it is even possible to deliver mail to. The DNS check won't tell you whether your client has bought the list, but it will at least allow you to disregard undeliverable recipients.
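As a rough illustration of the two checks described above, here is a minimal PHP sketch. The marker list and both function names are hypothetical (they are not from any library), and the DNS part assumes PHP's built-in checkdnsrr() is available on your platform.

```php
<?php
// Hypothetical list of strings people insert to fool harvesters; extend as needed.
const MUNGE_MARKERS = ['removethis', 'remove-this', 'nospam', 'no-spam'];

// True if the address contains one of the anti-harvester markers.
function looksMunged(string $email): bool
{
    $lower = strtolower($email);
    foreach (MUNGE_MARKERS as $marker) {
        if (strpos($lower, $marker) !== false) {
            return true;
        }
    }
    return false;
}

// True if the domain has an MX record, or an A record as a fallback.
// This says nothing about whether the mailbox itself exists.
function domainHasMailDns(string $email): bool
{
    $at = strrpos($email, '@');
    if ($at === false || $at === strlen($email) - 1) {
        return false; // no domain part to look up
    }
    // The trailing dot marks the name as fully qualified.
    $domain = substr($email, $at + 1) . '.';
    return checkdnsrr($domain, 'MX') || checkdnsrr($domain, 'A');
}
```

Addresses flagged by looksMunged() are a hint that the list was harvested; addresses failing domainHasMailDns() can simply be dropped as undeliverable.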

Related

How reliable is Symfony2's EmailValidator checkMX?

As in the title: how reliable is this check?
https://github.com/symfony/symfony/blob/3.0/src/Symfony/Component/Validator/Constraints/EmailValidator.php#L139-L142
Does every mail server in the world have its MX record exposed?
Or could there be a server that hides its MX record, so that the check fails even though the email address is valid?
UPDATE:
I have already checked the Symfony documentation and source.
I know, and I've tested, that only the email's domain is checked, not the user part.
I just don't know how reliable it is. Is it always possible to check a server's MX records?

This validator only checks whether the domain part of the email has valid DNS records. For example, fail@google.com will pass validation even though this mailbox doesn't exist.

http://symfony.com/doc/current/reference/constraints/Email.html#checkmx
The checkMX option is done using PHP's checkdnsrr function, along with the checkHost option. You can also set a strict option and include the egulias/email-validator library for tighter restrictions. Using those options should be completely sufficient and reliable in determining if an email address is valid.

From experience, it seems like a good idea in theory, and in practice it will work most of the time. But there will be occasions where the MX lookup fails due to network issues etc., and then the email will be reported as invalid, which frustrates users who have entered their correct email.
This in turn causes developers to have to spend time looking at what went wrong.
Additionally, if a user is putting in a fake email, all they need to do is use a real domain and a fake user (like fbjdsbafjkbsdjafj@gmail.com), so it is of limited usefulness.
For 99.9% of cases it will be sufficient to check for email well-formedness without an MX lookup, using an established email checking library rather than rolling your own.
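A minimal sketch of a well-formedness check with no DNS lookup is shown below. It swaps in PHP's built-in filter_var() with FILTER_VALIDATE_EMAIL rather than an external library such as egulias/email-validator; the wrapper name isWellFormedEmail is made up for this example.

```php
<?php
// Syntax-only check: no network access, so it cannot fail on DNS hiccups.
// It does not prove the domain or the mailbox exists.
function isWellFormedEmail(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}
```

A fake user at a real domain still passes, which is exactly the limitation described above; only a confirmation mail can settle that.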

Letting users define sender of an email

A client asked me about a little form for his website, from which it would be possible to mail the URL to someone. Something like "Hey, check this out".
Since he was not happy with mailto:, I want to use the PHP mail() function, but I wonder if it is smart to let users define the sender of the email. I am worried about the form being abused for spam/phishing.
Is that a reason to worry? Is it even legal?

It's legal to send e-mail. It's not legal (everywhere) to send spam. But you are just providing a share link, not a relay server, so I wouldn't worry about that. If you limit the amount of control over the content of the message, and limit the number of people to send it to, it won't be too interesting for spammers.
Letting the user choose a sender is not a very good idea. Some mail relay servers check if the originating server is allowed to send e-mails for the domain specified in the address, so the mails might never arrive. You can safely set the sender name, though.
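One way to follow this advice is to keep your own domain in the From address, show the user's name only as the display name, and put the user's address in Reply-To. The sketch below is an assumption about how that could look, not a complete mailer: buildShareHeaders and the noreply address are hypothetical, the Reply-To header is an addition not mentioned above, and non-ASCII names would need extra encoding.

```php
<?php
// Builds the additional headers for mail(); returns null if the input looks unsafe.
function buildShareHeaders(string $senderName, string $senderEmail, string $siteFrom): ?string
{
    // Line breaks in user input would allow header injection (e.g. extra Bcc lines).
    if (preg_match('/[\r\n]/', $senderName . $senderEmail)) {
        return null;
    }
    if (filter_var($senderEmail, FILTER_VALIDATE_EMAIL) === false) {
        return null;
    }
    // Strip characters that would break the quoted display name.
    $name = str_replace(['"', '\\'], '', $senderName);

    return 'From: "' . $name . '" <' . $siteFrom . ">\r\n"
         . 'Reply-To: ' . $senderEmail;
}
```

The result can be passed as the fourth argument of mail(), for example mail($to, $subject, $body, buildShareHeaders('Alice', 'alice@example.org', 'noreply@example.com')), after checking that it is not null.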
Apart from that, if the receivers of the message consider it as spam and report it, your domain might become blacklisted, and your mails will be sent to junk mail in many cases, so you want to make sure no (or little) spam is sent through your form.
Those bots try every form automatically just to see what happens, so you'll need to make some effort. You could add a captcha, which is an obstacle for humans too, although Google is going to put an end to that. Or you could protect it through other means, like a honeypot. Maybe you can just generate the form through JavaScript, which is a big obstacle for most spam bots.

Setting the From on an email in php mail isn't the cause for concern. The problem is that you'll be sending emails from your server. The mail headers will have your server information embedded - so any issues will tie back to you.
As long as you can safeguard your own server from allowing these spam/phishing attacks, then there's nothing wrong with it.
Just limit the number of people this mail function can send to - and make sure it can't be called multiple times in succession -- like with a script.
This way, the spammers wouldn't benefit from using your page to try to send spam. They'll go elsewhere.
There's much more to do to work with sending email, but this will at least get you started.

"Is it legal" depends upon the country you are in.
I don't think you need to worry about spam if you set up a login.
Or you could limit the number of emails by IP address. This can be spoofed, however, so it may not be the best option.
There are other control options you could do; limit number of emails by User Agent/IP combination, etc.
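Limiting sends per IP address can be sketched as a sliding-window counter. The function name allowSend, the default limits, and the in-memory array are all assumptions made for illustration; a real form would keep the log in a database or cache so that it survives between requests.

```php
<?php
// Returns true and records the attempt if $ip is still under the limit.
// $log maps an IP address to the timestamps of its recent sends.
function allowSend(array &$log, string $ip, int $now, int $maxPerWindow = 5, int $windowSeconds = 3600): bool
{
    // Keep only the timestamps that fall inside the current window.
    $recent = array_values(array_filter(
        $log[$ip] ?? [],
        function (int $t) use ($now, $windowSeconds): bool {
            return $t > $now - $windowSeconds;
        }
    ));

    if (count($recent) >= $maxPerWindow) {
        $log[$ip] = $recent;
        return false; // over the limit: reject without recording
    }

    $recent[] = $now;
    $log[$ip] = $recent;
    return true;
}
```

The key could just as well be a combination of IP address and User Agent, as suggested above; only the string passed as $ip changes.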

Aside from the reasons pointed out by others who have answered this question, I would advise against doing this because these messages will likely be marked as spam by spam filters, due to SPF and DMARC records.
For example, if someone sends a message through your system from a yahoo.com address, most spam filters will treat the message as spam, because of Yahoo's DMARC record, which basically says, 'any message sent from a yahoo.com email address that did not originate from a mail server on Yahoo's network is spam'. See https://help.yahoo.com/kb/mail/SLN24016.html?impressions=true for more info.

PHP mail send & blacklist [closed]

I want to make an app with PHP. The app has a cron job and sends mail daily.
Is there any possibility of the domain ending up on a blacklist?
PS: Mail is only sent to people who have confirmed in the app.
Sorry for my language; I hope I explained it correctly.

Yes, your server (IP address) and/or domain name can become blacklisted for many reasons.
If you automatically send lots of emails (for example a forum summary "What postings are new today?"), chances are high that one day some providers will block your mails or put them into the "spam folder".
A few ideas on reasons for mails being blocked / treated as spam:
Sending lots of mails at once => Providers can recognize lots of very similar mails incoming in a short amount of time. That can be interpreted as a mass bulk mailing.
No correctly configured reverse DNS record for the IP address of the mail sending server
E-mail script and/or MTA not following standards / rules (for example wrong HELO, mistakes with the mail headers, ...)
Receivers of your mails (Customers) can click a "Treat this as spam" button. Many email providers offer such a button to allow their users to flag spam mails.
No "Unsubscribe" link in your mass mail. If your newsletters / notifications don't contain an "Unsubscribe" link (for example in the footer), provider-side filter software might award a negative rating.
Wrong SPF record. If your domain has an SPF record in the DNS, many providers will treat mails as spam if they don't originate from an allowed server (named in the SPF record).
Bad text to URL ratio. If your mass mails mainly contain links but not much text, filter software might declare it as link spam.
Scripts or other users on your webserver (shared hosting environment) really send spam (evil users or software vulnerabilities exploited by hackers). => The entire server IP or even the entire IP range of your provider can become blacklisted in DNSBLs.
Attachments featuring dangerous file formats (EXE, COM, PIF, SCR, ...) will cause mails to be blocked in many cases.
Keyword filters can block certain words like "Casino", names of certain pharmaceuticals, ...
Embedded JavaScript, VBScript, images from remote servers, Flash or Java applets, ... can negatively influence your mail arrival rate.
One approach might be using a different server (different IP address) than the web app server for your mass mailings and/or marketing mails. If a provider blocks the mails from your mass mail server's IP address, at least the important mails from your app server (e.g. registration confirmation mails), won't be affected.

Sending huge amount of email will get you in trouble, no doubt.
Email marketing is not easy to handle. There are a lot of things to keep track of: you need multiple delivery servers in order to avoid blacklisting (and all these servers must be legit, don't fool around), all your email must be correctly formatted, and you must follow the CAN-SPAM Act, otherwise you are a spammer.
But that's not all. What happens with bounced emails? You can't just keep sending to invalid email addresses. Have you even taken this into consideration?
How about giving users the right to unsubscribe, so that they don't later mark you as a spammer?
All these are steps that you need to take BEFORE you even send a single email and sometimes even if you follow best practices you will still get blacklisted, that's the way things work for now since the amount of spam is too damn high.
You might want to take a look at a solution like MailWizz EMA, which has taken all of the above into consideration (disclaimer: I am the author), or any other solution that deals with email marketing, since the people behind these applications know a thing or two about the way things work in the email industry.

How to correct email address domains which are misspelled?

Sometimes users misspell their email domain and hence enter a wrong email address.
E.g. abc@gmial.com rather than abc@gmail.com
Has anybody thought about this before? Can anybody suggest how to handle this type of mistake?

It didn't exist when this question was asked, but I recommend MailCheck which auto-suggests corrections to entered emails. It's used successfully by large companies.

Can anybody suggest how to handle this type of mistake?
You would usually send a confirmation E-Mail to the address given, and proceed only if a link in that E-Mail has been clicked.
There is no other good way to deal with this - it's impossible to tell for sure whether gmial.com is a typo or not, seeing as it's a valid domain.

Create a list of common email domain names:
hotmail.com
gmail.com
googlemail.com
... etc
When a user enters an email address, take the domain name of the entered address and compute the Levenshtein distance between it and each entry in your list. If the distance is 1 (or maybe up to 2), ask the user to confirm that's the email address they meant.
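The list-plus-distance idea can be sketched with PHP's built-in levenshtein() function. The name suggestDomain and the default threshold are assumptions for this example. Note that swapping two adjacent letters (gmial vs. gmail) counts as two edits under plain Levenshtein distance, which is why a threshold of 2 is used here.

```php
<?php
// Returns the closest known domain if the entered one is a near miss, else null.
// The result is meant to be shown as a "did you mean ...?" hint, never applied silently.
function suggestDomain(string $email, array $knownDomains, int $maxDistance = 2): ?string
{
    $at = strrpos($email, '@');
    if ($at === false) {
        return null;
    }
    $domain = strtolower(substr($email, $at + 1));
    if ($domain === '' || in_array($domain, $knownDomains, true)) {
        return null; // nothing to check, or already an exact match
    }

    $best = null;
    $bestDistance = $maxDistance + 1;
    foreach ($knownDomains as $candidate) {
        $distance = levenshtein($domain, $candidate);
        if ($distance < $bestDistance) {
            $best = $candidate;
            $bestDistance = $distance;
        }
    }
    return $best;
}
```

With the list hotmail.com, gmail.com, googlemail.com, the input abc@gmial.com yields the suggestion gmail.com, while an unrelated domain yields no suggestion at all.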

In my opinion it is bordering on impossible to come up with a generic solution for the generic case.
That being said, the most common typo is to interchange two adjacent letters.
So you might want to compare the entered domain against the largest sites (gmail, yahoo and what have you) and, if it is close to one of them but does not match exactly, suggest the alternative spelling.
Do not assume the user is at fault, suggest alternatives if it looks suspicious compared to common names. A white-list was mentioned in another reply.
Use confirmation mails if you need to know you can get a reply from this address.
You cannot assume the spelling you find is in error, that is what confirmation mails are for.
Make it very non-obtrusive (ajax springs to mind).

In our forms we're using a combination of techniques. While bad data can still slip through, the chances are vastly reduced.
First is to do a simple formatting regex that is commonly available - just be sure it's RFC-compliant. If this fails, it's good to offer the user a confirmation form at this point, because they may catch other errors for you while fixing this problem.
The next part is to check the TLD part of the domain. Since all TLDs can be known, these are relatively easy to scan for misspellings using some regex tests. Just keep a list of all current TLDs in a table somewhere and update it from time to time as needed. Mind you, this list can get complex when dealing with international TLDs. If you're only dealing with US traffic, the rules are much easier, and that's something else you can filter on: if you're selling a service only available in the US, for example, it would make sense to filter out international emails at form submission time. We are, so this works for us.
Third is to do something like what @npclaudiu suggested - scan for common misspellings of big-name mail hosts (gmail, hotmail, yahoo, etc) in the domain part and if a possible hit is detected, offer a confirmation form to the user. (You entered someone@hptmail.com, did you mean hotmail.com?)
If you get through those steps, then you can do the MX lookup suggested by @symcbean.
Finally, if all of that succeeds, there is a method (but I've not yet tested it) for communicating with the remote SMTP host to see if the mailbox exists. We're about to begin testing this ourselves. I found the how-to for such here:
http://www.webdigi.co.uk/blog/2009/how-to-check-if-an-email-address-exists-without-sending-an-email/

The funny thing is that the url does exist http://www.gmial.com
In fact it would be very difficult for you to know if it's a mistake or just a "strange" domain. Look at the Google APIs, because when you type something wrong in Google they suggest "did you mean...".
good luck
Arnaud

You cannot provide this functionality by auto-correcting misspelled email domain names, because a name you assume to be invalid may in fact be valid. You should expect anything to be entered as an email address domain name.
I would suggest that, if you are creating a signup form, you provide the user with a dropdown of all the domain names you are aware of, so that they can select one from it.
Hope this helps.

You could create a list of popular e-mail domains (gmail.com, yahoo.com, ymail.com, etc) in your db and validate the e-mail address that the user inputs against this list. If the domain resembles one of these domains, you should show a warning and allow the user to correct it if necessary, not auto-correct it. To compare the domain entered with the domains in your list, you might use an algorithm like the one used in the SOUNDEX function in SQL Server, which matches words based on whether one word sounds like the other.
Edit: you can find more details on the SOUNDEX function here.

As mentioned before, it is not a good idea to automatically assume that someone has mistyped an email. A better approach would be to implement a little javascript function that checks if the domain of the email was possibly mistyped and alert the user instead of assuming they were wrong from the start.
Give me a minute to create a little mockup.
EDIT: OK, so maybe it was more than a minute. Take a look at http://jsbin.com/iyaxuq/8/edit and see for yourself how javascript can help prevent common typing errors. Try emails like: test@gmail.cmo, another@yhaoo.com, loser@htomali.ocm (typo of hotmail), and me@aol.com.
Note: I used a lazy regex to validate the email. Don't rely on it (or for that matter, most regexes) for a real app.

Trying to automate correction of bad data is a very dangerous practice. Ultimately, only the user can provide the correct data. However there are strict rules about formatting an email address - a regex check can be run in javascript (or using the preg functions with the same regex syntax) - but note that there are a lot of bad examples on the internet of regexes claiming to solve the problem.
This should be a fairly complete implementation of an RFC2822 ADDR_SPEC validator:
/[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/gi
However in practice I find this to be adequate:
/^[a-z0-9\._%+!$&*=^|~#%'`?{}/\-]+@([a-z0-9\-]+\.){1,}([a-z]{2,22})$/gi
Then, serverside, you can do an MX lookup to verify that the domain provided not only meets the formatting requirements but exists as an email receiving site.
This does not prove that the named mailbox exists at that site, nor that it is accepting emails - ultimately you'd need to send an email to that address including a click back link / password to establish whether the email address is valid.
Update
While, as the top voted answer here says, the best way to validate an ADDR_SPEC is to send a token to the address to be submitted back via the web, this is not of much help if the data is not coming from the person who controls the mailbox, and the action is dissociated from the primary interaction even when it is. A further consideration is that an email address which is valid today might not be tomorrow.
Using a regex (and an MX lookup) is still a good idea to provide immediate feedback to the user, but for a complete solution you also need to monitor the bounces.

Using MX records to validate email addresses

Scenario:
I have a contact form on my web app; it gets a lot of spam.
I am validating the format of email addresses loosely, i.e. ^.+@.+\..+$
I am using a spam filtering service (defensio) but the spam scores returned are overlapping with valid messages. At a threshold of 0.4 some spam gets through and some customers' questions are wrongly thrown in a log and an error displayed.
All of the spam messages use fake email addresses e.g. zxmzxm@ywduasm.com
Dedicated PHP5 Linux server in US, mysql, logging spam only, emailing the non spam messages (not stored).
Proposal:
Use PHP's checkdnsrr(preg_replace('/^.+?@/', '', $_POST['email']), 'MX') to check that the email domain resolves to a valid address, log to file, then redirect with an error for messages that don't resolve, and proceed to the spam filter service as before for addresses that do resolve according to checkdnsrr().
I have read (and I am sceptical about this myself) that you should never leave this type of validation up to remote lookups, but why?
Aside from connectivity issues, where I will have bigger problems than a contact form anyway, is checkdnsrr going to encounter false positives/negatives?
Would there be some address types that won't resolve? gov addresses? IP email addresses?
Do I need to escape the hostname I pass to checkdnsrr()?
Solution:
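A combination of all three answers (wish I could accept more than one as a compound answer).
I am using:
    // Strip everything up to the "@"; the trailing dot makes the name fully qualified.
    $email_domain = preg_replace('/^.+?@/', '', $email).'.';
    // Fall back to an A record when there is no MX record (the "implicit MX" rule).
    if (!checkdnsrr($email_domain, 'MX') && !checkdnsrr($email_domain, 'A')) {
        // validation error
    }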
All spam is being logged and rotated.
With a view to upgrading to a job queue at a later date.
Some comments were made about asking the mail server to verify the user. I felt this would be too much traffic and might get my server banned or into trouble in some way, and this is only meant to cut out most of the emails that were being bounced back due to invalid server addresses.
http://en.wikipedia.org/wiki/Fqdn
and
RFC2821
The lookup first attempts to locate an MX record associated with the name.
If a CNAME record is found instead, the resulting name is processed as if
it were the initial name.
If no MX records are found, but an A RR is found, the A RR is treated as
if it was associated with an implicit MX RR, with a preference of 0,
pointing to that host. If one or more MX RRs are found for a given
name, SMTP systems MUST NOT utilize any A RRs associated with that
name unless they are located using the MX RRs; the "implicit MX" rule
above applies only if there are no MX records present. If MX records
are present, but none of them are usable, this situation MUST be
reported as an error.
Many thanks to all (especially ZoogieZork for the A record fallback tip)

I see no harm in doing an MX lookup with checkdnsrr() and I also don't see how false positives may appear. You don't need to escape the hostname. In fact you can use this technique and take it a little further by talking to the MTA and testing if the user exists at a given host (however this technique may and probably will get you some false positives with some hosts).

DNS lookups can be slow at times, depending on network traffic & congestion, so that's something to be aware of.
If I were in your shoes, I'd test it out and see how it goes. For a week or so, log all emails to a database or log file and include a field to indicate if it would be marked as spam or legitimate email. After the week is over, take a look at the results and see if it's performing as you would expect.
Taking this logging/testing approach gives you the flexibility to test it out and not worry about losing customer emails.
I've gotten into the habit of adding an extra field to my forms that is hidden with CSS, if it's filled in I assume it's being submitted by a spam bot. I also make sure to use a name like "url" or "website_url" something that looks like a legitimate field name to a spam bot. Add a label for the field that says something like "Don't fill out this field" so if someone's browser doesn't render it correctly, they will know not to fill out the spam field. So far it's working very well for me.
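On the server side, the hidden-field trick comes down to one check on the submitted form data. The sketch below assumes the trap field is named website_url, as suggested above; the function name isLikelyBot is made up for this example.

```php
<?php
// True if the hidden trap field was filled in, which a human should never do.
function isLikelyBot(array $post, string $trapField = 'website_url'): bool
{
    $value = $post[$trapField] ?? '';
    // A non-string value (e.g. an array from website_url[]=x) is treated as suspicious too.
    return !is_string($value) || trim($value) !== '';
}
```

In a form handler this would typically be called as isLikelyBot($_POST) before any further validation, with flagged submissions logged rather than mailed.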

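// Returns the first MX target for the email's domain, or null if none is found.
function mxrecordValidate($email) {
    $parts = explode('@', $email);
    if (count($parts) !== 2) {
        return null; // not a single user@domain pair
    }
    $domain = $parts[1];
    // dns_get_record() returns false on failure and an empty array when nothing is found.
    $arr = dns_get_record($domain, DNS_MX);
    if (!empty($arr) && $arr[0]['host'] == $domain && !empty($arr[0]['target'])) {
        return $arr[0]['target'];
    }
    return null;
}
$email = 'user@radiffmail.com';
if (mxrecordValidate($email)) {
    echo('This MX record exists; I will accept this email as valid.');
} else {
    echo('No MX record exists; Invalid email.');
}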

// The code, from https://davidwalsh.name/php-email-validator
function domain_exists($email, $record = 'MX') {
    // Split the address into user and domain; only the domain is looked up.
    list($user, $domain) = explode('@', $email);
    return checkdnsrr($domain, $record);
}
if (domain_exists('user@davidwalsh.name')) {
    echo('This MX record exists; I will accept this email as valid.');
} else {
    echo('No MX record exists; Invalid email.');
}

An MX lookup is only part of the picture. If you want to ensure the email address itself is valid, then you need to attempt to send an email to that account.
The other possible scenario is, someone can be simply using hijacked email accounts from a compromised machine anyway. Of course, that is probably a little bit less likely to occur, but it still does.
There are email address validation libraries out there that do this, simply search for email validation.
All of this can be done asynchronously. I have this setup on my site in which case the email is saved in the database (for auditing purposes), a job queued, then when the job comes time to execute, any additional validation is performed at that point in time. It offloads the heavy lifting to another thread.
To the user, it appears as if the email was sent already, it was (it's in the database), and can be viewed internally, but the actual email won't get mailed out until that job executes which can be immediately or some set amount of time depending on the server load.
Walter
