Probably a stupid question, but I need to ask anyway.
I'm working on a research project that involves sending fake phishing emails to participants.
At the beginning, I would have a database of email addresses.
Because of ethical considerations, I would like to hash the email addresses in such a way that they cannot be recovered later, even if I wanted to.
For example:
I want to send an email to
john.doe#mail.com
The email would lead to a page where I would collect some data (when it was visited, what he did on the page), so basically I would store the email address and its actions in a database.
I could store the hash of the email address in this database, so in the end I wouldn't have his address, but the problem is at a later stage I will need to email him a second time, and record those actions as well...
Now the problem is:
If I hash his email address and store it this way in the database, a
simple re-hash of the original database would reveal the recipient.
If I hash his email with a random salt, I could not link his old and
new actions together.
I need to be able to tell honestly that there
is no way I can link real email addresses and real people to the
database entries. (I just need the results anyway)
No, you can't keep the email address usable and also make it impossible to recover it. If you need to be able to decode/recover the email address to send an email at a later date, then there's no way to make it unrecoverable. That's a contradiction in terms. You would need to do something like use a third party to create per-user tokens, but then the third party would need to store the token and the email. There's no avoiding it: someone has to store the email.
The best solution is just to encrypt any sensitive data, including personally identifiable information (PII). If you want to be hyper-paranoid about it, you could throw away the key at the end of your project. But you have to keep it in the meantime, if you really need to be able to use the encrypted information (like the email address).
Also, be aware that what you are doing may have legal implications (both the sending of bogus phishing emails and the storage of PII). You should speak to a lawyer in whatever jurisdiction(s) is/are relevant.
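The "per-user token" idea and the "throw away the key at the end" idea can be combined without any hashing. The sketch below, in Python, is only an illustration under stated assumptions: the `Study` class, its method names and the `?t=` link parameter are hypothetical, and in-memory dictionaries stand in for two separate database tables. Results are stored against a random token only; the token-to-address mapping exists only while mailings are still pending and is deleted after the last round. It does not cover copies of the addresses held elsewhere (mail server logs, backups).

```python
import secrets


class Study:
    """Keeps contact data and collected results in two separate stores."""

    def __init__(self, emails):
        # Mapping needed only while mailing rounds are still pending.
        self.contacts = {secrets.token_urlsafe(16): email for email in emails}
        # Results are keyed by token only and never contain an address.
        self.results = {token: [] for token in self.contacts}

    def tracking_links(self, base_url):
        """Yield (email, link) pairs for one mailing round."""
        for token, email in self.contacts.items():
            yield email, f"{base_url}?t={token}"

    def record(self, token, action):
        """Store an action against a token; unknown tokens are ignored."""
        if token in self.results:
            self.results[token].append(action)

    def close(self):
        """After the final round, drop the only link between tokens and people."""
        self.contacts.clear()
```

Because the same token is reused for the second mailing, actions from both rounds end up under one entry, and once `close()` has run the results can no longer be tied back to an address from this store.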
Related
I want to run this by the group to get some ideas on how to improve security.
Long story short, I have a web app that when you send an email to xxx#mydomain.com, using php and imap, the script checks the email account and then saves that email into mysql to be used for other parts of the application. We take all necessary steps to properly sanitize the data to prevent mysql injections etc.
However, in order for the incoming email to be saved into mysql, your email address has to be approved first, so that not just anyone can have their email saved into our database.
My question is, if a hacker wanted to, they could mask the "from" email address of an approved user and if they found out our secret email address to send to, they could then have their messages saved to our database, bypassing our security measures. Is there any way to prevent this?
For instance, let's say that an approved email is safe#approved.com. Is there a way to check with PHP whether the email sent to our mail server actually came from safe#approved.com, or whether the address was masked?
I have looked at gethostbyname(), but I'm not exactly sure how to implement it without creating a bunch of headaches for our legit users.
Any ideas would be much appreciated, thanks!
There is no simple way to verify that a From: header is legitimate. There are methods that can help increase confidence in it, though:
SPF records can be used to check that the originating server is authorized to send for that domain, though this won't help with the "local-part", or the individual sending.
DKIM signing can indicate that the actual address used is authorized by that server, something often included by default on most email platforms (e.g. Gmail).
Unless you do additional work to verify these headers you've got no way of knowing.
If you're expecting email from an unsigned source, with no SPF records, it's anyone's guess whether that's legitimate or not.
This is why you'll often see services with a "mail in" end point use obfuscated delivery addresses, that is a secret address of sorts that can be used to communicate with the app or service. For example, Evernote uses this approach, giving a unique destination email for each user.
This provides at least a layer of security in that unless that address is leaked out, it's highly unlikely that some attacker could exploit that address. Anything sent there is probably from the authorized individual.
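As for the "additional work to verify these headers": receiving mail servers that run SPF and DKIM checks usually record the outcome in an `Authentication-Results` header, so a script can read that result instead of redoing the checks. The Python sketch below is a rough illustration, not a complete validator. It assumes your own mail server adds this header under the identifier `mx.mydomain.example` (a made-up name) and strips forged copies carrying that identifier; the function name and the `approved` set are hypothetical. It accepts a message only if the From address is approved and a DKIM pass was recorded for that same domain, which still says nothing about the local-part.

```python
import re
from email import message_from_string
from email.utils import parseaddr

TRUSTED_AUTHSERV_ID = "mx.mydomain.example"  # hypothetical: your own MX's identifier

# Matches one "dkim=pass ... header.d=<domain>" result without crossing a ';'.
DKIM_PASS = re.compile(r"dkim\s*=\s*pass\b[^;]*?header\.d\s*=\s*([\w.-]+)", re.I)


def sender_is_authenticated(raw_message, approved):
    """True if From is approved and our server recorded a DKIM pass for its domain."""
    msg = message_from_string(raw_message)
    address = parseaddr(msg.get("From", ""))[1].lower()
    if address not in approved:
        return False
    domain = address.rpartition("@")[2]
    for header in msg.get_all("Authentication-Results", []):
        authserv_id, _, results = header.partition(";")
        # Ignore results that claim to come from some other server.
        if (authserv_id.split() or [""])[0].lower() != TRUSTED_AUTHSERV_ID:
            continue
        for match in DKIM_PASS.finditer(results):
            if match.group(1).lower() == domain:
                return True
    return False
```

Messages from approved senders whose domains do not sign with DKIM would be rejected by this check, which is where a secret per-user delivery address remains the more practical safeguard.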
I run a service where users can log in, but I will never have a need to send an email to them. I try to keep user data as anonymous as possible. I'm not interested in user tracking, selling data, etc. I know there will be simpler solutions to this question, such as "don't use email addresses in the first place", but they make a good login identifier because they are GUIDs. My service goes through the process of having the user verify the address; that's the only email I'll ever send.
So I had the idea of storing the addresses anonymously. My first thought was to simply store the SHA512 hash of each address, but in the event of a breach - which I believe my security would prevent - technically somebody could use rainbow tables to recover at least some of the addresses.
To use a salted hash, I need some way to narrow down the potential result list so I don't compute hashes for every user for every login. That won't scale. To achieve that, my idea was to store the first 5 characters of the SHA512 of the email. That wouldn't be a unique value of course, but it gives me a smaller pool of potential matches. Technically, this all works great.
My concern though is this is still vulnerable to rainbow tables. Those 5 characters are enough to look up possible inputs, and the attacker would already know that only inputs that look like email addresses would be valid. They'd still have enough to determine the email address given the first part of an unsalted hash and entire salted hash.
Am I overthinking this though? For the record, I'm using pgsql and php in this case, but that's really an implementation detail.
Update: I'm still not sure if I'm going to go ahead with this, but for anybody curious, the problem with rainbow tables here can be solved rather easily. Rather than hashing the whole email and taking the first few characters of the hash, use the first few characters of the email as the hash input and store the whole hash. It achieves the same effect, but at best the rainbow table will only reveal the first few characters.
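The scheme from the update can be sketched roughly as follows, in Python. This is an illustration under assumptions, not a vetted design: `PREFIX_LEN`, `ITERATIONS`, the `UserStore` class and the use of PBKDF2 for the salted hash are choices made for the example, and a dictionary stands in for the database table. The bucket is a SHA512 of only the first few characters of the address, and the full address is stored only as a salted hash.

```python
import hashlib
import hmac
import os

PREFIX_LEN = 3        # assumption: how many leading characters feed the bucket hash
ITERATIONS = 100_000  # assumption: PBKDF2 work factor, tune for your hardware


def bucket_of(email):
    """Unsalted hash of the first few characters; narrows the candidate rows."""
    return hashlib.sha512(email.lower()[:PREFIX_LEN].encode()).hexdigest()


def full_hash(email, salt):
    """Salted, deliberately slow hash of the whole address."""
    return hashlib.pbkdf2_hmac("sha512", email.lower().encode(), salt, ITERATIONS)


class UserStore:
    def __init__(self):
        self.rows = {}  # bucket -> list of (salt, hash, user_id)

    def add(self, email, user_id):
        salt = os.urandom(16)
        row = (salt, full_hash(email, salt), user_id)
        self.rows.setdefault(bucket_of(email), []).append(row)

    def find(self, email):
        # Only rows in the matching bucket are re-hashed, not every user.
        for salt, stored, user_id in self.rows.get(bucket_of(email), []):
            if hmac.compare_digest(stored, full_hash(email, salt)):
                return user_id
        return None
```

As the update says, a lookup table against the bucket column reveals at most the leading characters. The salted hash can still be attacked by guessing likely addresses one at a time, so this raises the cost of recovery rather than ruling it out.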
I think yes, you are overthinking this.
No matter how strong your setup is, there is always a small chance of a breach, as nobody is perfect, and neither is any human-made script.
I think you should go for the best option you think it is and then stick to it.
Some things are best left to fate.
Good Luck
I think you're overthinking this. You stated that you don't need to email the users down the road, so my question back to you is why do you need to store the email at all? You mention that it's a good GUID, but if you're that concerned about data security, would it not be easier to let users define a username upon email verification?
Basically, I picture an ephemeral usage of the email, where it's never stored in the database, and only used to send a validation email. This would allow you to send a custom one-time-use link to the email, which would allow your user the chance to create a custom login name, which you could validate against your database to make sure it is unique.
You could then safely store this unique identifier without the concern that it would lead to email insecurity.
All of that said, I don't think any of it is necessary. As you said, email is an excellent GUID. What makes it an excellent GUID is that it is so widely known and available. The risks associated with the release of a plaintext email are far fewer and less damaging than the risks of a plaintext password. I believe our time as developers is better left securing the private data, and not the public data.
With almost weekly news about databases being pilfered I am wondering why only passwords are hashed and not emails too? To be clear, I mean hashed with a static salt, which is stored somewhere other than the database.
Obviously, it's just one step among many. But as part of a multi-faceted security setup (ie - PDO, not rolling your own hasher, rate limiting, etc etc) why is it not more common to hash the email? Regarding logins (+ password reminder emails, etc) you could simply do a regular compare. Surely user emails should be treated more respectfully?
I have read a number of similar questions on SO / sister sites but am really unconvinced as to how this is not an idea that should be adopted more frequently?
Because you usually need to be able to read the email address at a later date. Not just verify its value.
Passwords are not used for anything but validation, so you don't need to know a password's actual value so long as you have a way to validate it. Comparing hashes allows you to do that.
Email addresses are actually used for something. Like, sending emails. You can't do that unless you can actually read the email address.
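To make the trade-off concrete, here is roughly what the "static salt stored outside the database" idea from the question looks like as a keyed hash, sketched in Python. The function name `email_key` and the `pepper` parameter are hypothetical; the pepper is assumed to be loaded from configuration or an environment variable rather than from the database.

```python
import hashlib
import hmac


def email_key(email, pepper):
    """Deterministic keyed hash of a normalised address, usable as a lookup key."""
    normalised = email.strip().lower().encode()
    return hmac.new(pepper, normalised, hashlib.sha256).hexdigest()
```

A login form can compute `email_key(submitted_address, pepper)` and look the result up in an indexed column, so the "regular compare" works. What no longer works is anything that starts from the database row: there is no way to get from the stored value back to an address, so a password reminder can only be sent to an address the user types in again.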
I have been running my website for a few months now, and occasionally I find my activation process isn't great. After the user signs up, they receive an email containing an activation link.
I have a few problems and want to improve this if possible.
Firstly, the email sometimes doesn't arrive. Any reason for this?
How can I stop it going into the junk mail?
Secondly, at the moment, the activation is their username and an md5 of their username.
Is there a better way to do activations?
I'm always looking to improve and find better ways of doing things!
Thanks for your time.
Email doesn't arrive
First of all, you cannot really rely on mail. Never. You can't even know whether it was received or read. A mail may be blocked as spam on the server side, filtered on the client side, or just lost or ignored.
There may be plenty of causes. For example, you can set up e-mail authentication mechanisms. You can also check whether reverse DNS is configured for your domain.
Further, you may want to read some documentation and books to know how spam filters work. It will show you some obvious methods to reduce filtering of your mails, like sending mails in plain text instead of full-HTML, but also less obvious stuff like the words to use, etc.
If you have no choice and you must send mail, probably the easiest way to prevent spam filtering would be to ask the users to add your domain to their list of safe senders. In practice, nobody will do it for you.
Activation through MD5
There is obviously a better way, since the one you implemented does not provide anything. If the activation is a hash from user name, you can as well just tell the users to calculate the hash themselves (thus avoiding all the problems with mails filtered as spam).
Normally, users must not be able to work out what their activation code will be. That means the activation code must be random or difficult to guess.
Generate a set of random characters, save them to database and send the code by mail. Then you would just need to validate the code against the one you keep in your database.
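A minimal sketch of that flow in Python, with a dictionary standing in for the database; the function names and the `pending` store are hypothetical. Storing only a hash of the code and comparing in constant time are small additions beyond what is described above, so that a leaked table does not hand out usable activation links.

```python
import hashlib
import hmac
import secrets

pending = {}  # username -> SHA-256 of the code (stand-in for a database column)


def issue_code(username):
    """Create a random code; the return value goes into the emailed link only."""
    code = secrets.token_urlsafe(32)
    pending[username] = hashlib.sha256(code.encode()).hexdigest()
    return code


def activate(username, code):
    """Validate the code from the link against the stored hash, once."""
    expected = pending.get(username)
    supplied = hashlib.sha256(code.encode()).hexdigest()
    if expected is None or not hmac.compare_digest(expected, supplied):
        return False
    del pending[username]  # single use
    return True
```

Unlike an MD5 of the username, the code cannot be computed by the user or by a script, so the only way to obtain it is to read the mail.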
Some emails will always end up in the trash folder. It's probably best to put up a notice so that people know to check there, and make it possible for the user to re-request the activation email.
Using the MD5 hash of the username is not a very good idea because anyone can automate that. At the very least add some salt before hashing it, or even better, use a completely unrelated random token saved in your database.
For your second question, you may want to generate a random activation code and store it in a database. When the user clicks the activation link you could verify the code in the database using their e-mail address. This way a malicious user will have a more difficult time automating registration on your site.
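// Random activation code. On PHP 7+, bin2hex(random_bytes(16)) is a stronger source of randomness.
$code = md5(uniqid(rand(), true));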
If you're on a shared server, services like Yahoo are apt to label you spam. They want you to have a dedicated IP. It's almost impossible to get users to check the 1000 messages in their spam folders for your one activation message.
The MD5 hash is fine if you're hashing with a timestamp.
Keep this implementation, but supplement it with OpenID. That will take care of your Gmail and Yahoo users.
Yes, that's wrong. You shouldn't use MD5 for that.
The most popular way of doing it is to generate a random code, save it in the users table in the DB, and send it by email as a GET parameter of the link.
As for the emails, I would tell users to look in their junk folders.
First problem: Make sure your mail isn't spammy. Follow the default guidelines for setting up mail... things like making sure you've got your SPF records configured, your mail is well-formatted, doesn't include spammy words. I generally test against Gmail, Hotmail and a server running SpamAssassin to check mails I send out; examine the headers to see if you're triggering any serious anti-spam rules.
Second problem: You'll want to make sure that the user cannot guess what his activation key is (thus removing the need for receiving the email). An MD5 of the username is insufficient for this. However, if you salt the MD5 you can easily prevent people from generating the MD5's in an automated way (that's an open invitation for automated signups). Adding Salt refers to adding a large amount of pregenerated random data to your input before hashing it. That way, the attacker can't lookup the hash in a 'rainbow table', as he no longer knows what the input for your hash was. Of course, you could just as well use a randomly generated string, which would probably be easier.
Another take on user registration: take inspiration from Stack Overflow and use OpenID, and you won't have to care about user registration at all.
Update
You don't need to validate an OpenID user via email. A user who signed up via a Google or MyOpenID account is valid.
You don't have to worry about whether the user is a bot either. Those servers have already taken care of it.
I have never received a verification email from Stack Overflow.
Mail arriving in the junk folder is a perpetual problem, and there are numerous 'not looking like spam' strategies. Beyond the junk folder, I think the overwhelming majority of reported 'not received' situations are actually just delays in delivering the email.
I'm currently implementing a resend for the activation email confirmation despite the fact that it should only actually be necessary in cases where the user has accidentally deleted the email and purged their trash or a transient error has discarded the mail. These cases are going to be rare but do exist so needed to be coded for.
I think the most important reason for implementing the resend of the activation confirm is customer service. It provides the user with an action that they can take while waiting for their mail and in the course of doing so and re-checking their email the activation email will eventually appear.
I wouldn't use the MD5, as it creates too predictable a result. You want something that has a random, or at least less predictable, element. Resending becomes problematic if a new mail invalidates the hash/token in the original email, so I would avoid overwriting the existing token and would instead reuse the same token, which you should have stored (or, better, store the values from which it can be validated). This does constrain how you create the token, as you need to be able to recreate it for later resends, or at least keep validating all the in-flight mails. I am using a session aging model to resend the same token while that token is still valid. There is no reason why the user shouldn't see the same token in every mail and hence understand that they are all valid. In the case of an expired session/token, a new one needs to be generated.
It's good practice to expire the activation mail token in case the mailbox falls into the wrong hands weeks or months later and the old mail is found, assuming the link could still have some undesirable effect on the state of the user's account at that later point.
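The resend and expiry behaviour described above could look something like this in Python. It is a sketch under assumptions: the one-day `TOKEN_TTL`, the function names and the in-memory `tokens` store are illustrative, and the `now` parameter exists only so the logic can be exercised without waiting.

```python
import secrets
import time

TOKEN_TTL = 24 * 3600  # assumption: tokens stay valid for one day

tokens = {}  # user_id -> (token, expires_at)


def token_for_mail(user_id, now=None):
    """Return the current token if still valid, otherwise create a new one."""
    now = time.time() if now is None else now
    token, expires_at = tokens.get(user_id, (None, 0))
    if token is None or now >= expires_at:
        token, expires_at = secrets.token_urlsafe(32), now + TOKEN_TTL
        tokens[user_id] = (token, expires_at)
    return token


def is_valid(user_id, token, now=None):
    """Check a token from a clicked link against the stored one and its expiry."""
    now = time.time() if now is None else now
    stored, expires_at = tokens.get(user_id, (None, 0))
    if stored is None or now >= expires_at:
        return False
    return secrets.compare_digest(stored.encode(), token.encode())
```

Every mail sent within the validity window carries the same token, so whichever copy the user eventually opens will work, and a resend does not extend the original expiry.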
I work at a hospital and have developed a way to estimate the total patient financial responsibility for services, after insurance has paid its obligation, and before any services are rendered. A lot of patients are calling for quotes, and I want to find a secure way to email those results to the patient at their request.
I'm considering removing all patient information from the generated quote, so there would not be any security concerns, but would like to find a way to encrypt the email, send it, and allow the patient's email client to decrypt the email.
I'm not sure how to use security certificates, though they might be the best option for me. I'd have to jump through corporate hoops to be granted access to internet-facing hosting for certificates, since all applications other than email are hospital-side only.
I'm also considering creating a PDF from the generated letter and encrypting it, using the last four digits of their Social Security number, or some other private info they've shared with us during the quote generation process, as the password.
You would be better off sending a link to an SSL encrypted site that has all the information. It would not require any additional software on the client side, and would allow you to have a bit more control and accounting of who is accessing it.
You must of course secure it with a username/password of some kind; you could even just use their Social Security number plus a generated hash sent in the email. The hash prevents someone from simply guessing random SSNs.
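A rough Python sketch of that link-plus-secret approach follows. All names here are hypothetical, a dictionary stands in for the database, and the page is assumed to be served over HTTPS. It uses a random token in the emailed link rather than a literal hash, checks the last four digits as the second factor, locks the link after a few wrong answers (four digits allow only 10,000 guesses), and records every attempt for accounting. Whether any of this satisfies the hospital's privacy rules is a question for the privacy officer, not for the code.

```python
import hmac
import secrets
from datetime import datetime, timezone

MAX_ATTEMPTS = 5  # assumption: lock the link after a few wrong answers

quotes = {}      # token -> {"last4": str, "quote": str, "failures": int}
access_log = []  # (timestamp, token prefix, outcome)


def create_quote_link(base_url, last4, quote_text):
    """Store the quote under a random token and return the link to email."""
    token = secrets.token_urlsafe(32)
    quotes[token] = {"last4": last4, "quote": quote_text, "failures": 0}
    return f"{base_url}/quote/{token}"


def open_quote(token, last4):
    """Return the quote only if both the link token and the digits match."""
    entry = quotes.get(token)
    outcome, result = "unknown-link", None
    if entry is not None:
        if entry["failures"] >= MAX_ATTEMPTS:
            outcome = "locked"
        elif hmac.compare_digest(entry["last4"].encode(), last4.encode()):
            outcome, result = "ok", entry["quote"]
        else:
            entry["failures"] += 1
            outcome = "wrong-answer"
    # Log every attempt, successful or not, without storing the full token.
    access_log.append((datetime.now(timezone.utc), token[:6], outcome))
    return result
```

The email itself then contains nothing but the link, and the quote never leaves the server unless both pieces are presented.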
If you're employed by a hospital in the USA, you had better not try to email protected health information. (Similar things are true in other countries.) Even if you scrub the patient's name out of the message, you'll definitely have the patient's email address in the message (duh!). You'll most likely have diagnoses, dates of birth, dates of proposed care, medical record numbers, or account numbers. That's all protected data. Bad. Bad. See here for the regulations, which are rigid.
http://www.hhs.gov/ocr/privacy/hipaa/understanding/summary/index.html
If you want to do this, you must use TLS (https) security, and you must go to some length both to ensure that the person logging in to your secure web site is who they claim to be, and you must log accesses.
Please, if you value your job and your savings account, check with your hospital's privacy officer before sending emails with PHI in them. The ARRA 2009 law makes individuals personally liable for breaches even if they work for corporations. Plus, your hospital does NOT want its name in lights here.
http://www.hhs.gov/ocr/privacy/hipaa/administrative/breachnotificationrule/postedbreaches.html
You could use encrypted email, as long as the unencrypted part (e.g. the subject line) only said "here's the information you requested" or something like that. But, you know, many persons seeking medical care won't be able to cope with a complex addin to their mail client software.
The PGP company offers an encrypted email gateway system that some people with PHI use.
http://www.pgp.com/products/universal_gateway_email/index.html
But you should still check with your privacy officer.
I accomplished this about 10 years ago using PGP. GPG is a similar library.
These options may be way too involved for an older user though, as I believe they both involve the recipient installing a certificate of sorts on their end.
Might be a good place to start looking...
From what I know, this is essentially impossible unless the recipient is also using the same e-mail client. The problem is that even if you encrypt on your end, the recipient will receive a garbage message simply because they don't have the functionality to decrypt it.
While I was typing this, TomWilsonFL posted information on a possible encryption method, but you will still need to provide the recipient an application to decrypt the data.