Storing email addresses anonymously

I run a service where users can log in, but I will never need to send them an email. I try to keep user data as anonymous as possible. I'm not interested in user tracking, selling data, etc. I know there are simpler solutions to this question, such as "don't use email addresses in the first place", but they make a good login identifier because they are GUIDs. My service goes through the process of having the user verify the address; that's the only email I'll ever send.
So I had the idea of storing the addresses anonymously. My first thought was to simply store the SHA512 hash of each address, but in the event of a breach - which I believe my security would prevent - technically somebody could use rainbow tables to recover at least some of the addresses.
To use a salted hash, I need some way to narrow down the potential result list so I don't compute hashes for every user for every login. That won't scale. To achieve that, my idea was to store the first 5 characters of the SHA512 of the email. That wouldn't be a unique value of course, but it gives me a smaller pool of potential matches. Technically, this all works great.
My concern though is this is still vulnerable to rainbow tables. Those 5 characters are enough to look up possible inputs, and the attacker would already know that only inputs that look like email addresses would be valid. They'd still have enough to determine the email address given the first part of an unsalted hash and entire salted hash.
Am I overthinking this though? For the record, I'm using pgsql and php in this case, but that's really an implementation detail.
Update: I'm still not sure if I'm going to go ahead with this, but for anybody curious, the problem with rainbow tables here can be solved rather easily. Rather than hashing the whole email and taking the first few characters of the hash, use the first few characters of the email as the hash input and store the whole hash. It achieves the same effect, but at best the rainbow table will only reveal the first few characters.
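The scheme from the update can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the 3-character prefix length and the in-memory `$rows` array (standing in for a database table) are all hypothetical, and a fast salted SHA-512 still lets someone who has the table test guessed addresses one by one.

```php
<?php
// Hypothetical helpers; the 3-character prefix length is an assumption.
function normalizeEmail(string $email): string
{
    return strtolower(trim($email));
}

// Bucket key: a hash of only the first few characters of the address.
function emailBucket(string $email, int $prefixLength = 3): string
{
    return hash('sha512', substr(normalizeEmail($email), 0, $prefixLength));
}

// Build the record that would be stored instead of the address itself.
function makeEmailRecord(string $email): array
{
    $salt = bin2hex(random_bytes(16)); // per-user random salt
    return [
        'bucket' => emailBucket($email),
        'salt'   => $salt,
        'hash'   => hash('sha512', $salt . normalizeEmail($email)),
    ];
}

// Look up an address: only rows in the same bucket are re-hashed.
function findEmailRecord(array $rows, string $email): ?array
{
    $bucket = emailBucket($email);
    foreach ($rows as $row) {
        if ($row['bucket'] !== $bucket) {
            continue; // in SQL this would be a WHERE bucket = ? filter
        }
        $candidate = hash('sha512', $row['salt'] . normalizeEmail($email));
        if (hash_equals($row['hash'], $candidate)) {
            return $row;
        }
    }
    return null;
}
```

A precomputed table over the bucket column can reveal at most the first few characters of an address, which is the point the update makes.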

To me, I think yes. You are overthinking.
No matter how strong your setup is, there is always a small chance of a breach, since nobody is perfect and neither is any human-written script.
I think you should go for the best option you think it is and then stick to it.
Some things are best left to fate.
Good Luck

I think you're overthinking this. You stated that you don't need to email the users down the road, so my question back to you is why do you need to store the email at all? You mention that it's a good GUID, but if you're that concerned about data security, would it not be easier to let users define a username upon email verification?
Basically, I picture an ephemeral usage of the email, where it's never stored in the database, and only used to send a validation email. This would allow you to send a custom one-time-use link to the email, which would allow your user the chance to create a custom login name, which you could validate against your database to make sure it is unique.
You could then safely store this unique identifier without the concern that it would lead to email insecurity.
All of that said, I don't think any of it is necessary. As you said, email is an excellent GUID. What makes it an excellent GUID is that it is so widely known and available. The risks associated with the release of a plaintext email are far fewer and less damaging than the risks of a plaintext password. I believe our time as developers is better left securing the private data, and not the public data.

Related

Why are emails not commonly hashed when stored in db?

With almost weekly news about databases being pilfered I am wondering why only passwords are hashed and not emails too? To be clear, I mean hashed with a static salt, which is stored somewhere other than the database.
Obviously, it's just one step among many. But as part of a multi-faceted security setup (ie - PDO, not rolling your own hasher, rate limiting, etc etc) why is it not more common to hash the email? Regarding logins (+ password reminder emails, etc) you could simply do a regular compare. Surely user emails should be treated more respectfully?
I have read a number of similar questions on SO / sister sites but am really unconvinced as to how this is not an idea that should be adopted more frequently?
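As a rough sketch of what the question describes, a keyed hash with the static salt kept outside the database might look like this. The environment variable name `EMAIL_PEPPER` and the function name are assumptions made for illustration, and HMAC-SHA256 is one possible choice rather than something the question specifies.

```php
<?php
// Assumption: the static salt ("pepper") is loaded from outside the database,
// here from a hypothetical environment variable.
function emailLookupKey(string $email, string $pepper): string
{
    $normalized = strtolower(trim($email));
    // HMAC-SHA256 is deterministic, so equal addresses give equal keys.
    return hash_hmac('sha256', $normalized, $pepper);
}

$pepper = getenv('EMAIL_PEPPER') ?: 'dev-only-placeholder';
$stored = emailLookupKey('User@Example.com', $pepper);   // saved at signup
$login  = emailLookupKey('user@example.com ', $pepper);  // computed at login
var_dump(hash_equals($stored, $login)); // bool(true)
```

Because the output is deterministic, a login lookup is a plain equality comparison, but the address itself can no longer be read back.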

Because you usually need to be able to read the email address at a later date, not just verify its value.
Passwords are not used for anything but validation, so you don't need to know the actual value so long as you have a way to validate it. Comparing hashes allows you to do that.
Email addresses are actually used for something, like sending emails. You can't do that unless you can actually read the email address.

Secure way to allow users to log in with a single unique code (no username)?

I am creating a website whereby users are given an account by invitation only, and are sent a unique code by post. Users can then log in (at least the first time) by entering the code only.
The goal of this is for it to be extremely easy to understand and use by non tech-savvy people.
User accounts will contain name, email, maybe address if the user wants to add it. No other sensitive information.
The site itself would not be of interest to anyone other than those invited, and will not be indexed by search engines.
If you imagine the users are receiving a piece of mail in the post which says something along the lines of:
Please visit www.example.com
Log in with your unique code:
A6XH3
As for the code, it must be extremely easy to remember and enter.
I was planning four or five upper case alphanumeric characters - e.g. A6XH3 - because I don't want anyone to have to enter a long hash or complicated string. I think 6 characters is the limit that I would deem acceptable for people to enter in this format.
An alternative idea I had was to use two/three easy to spell words, such as [adjective] [noun] which would be more fun and seem less "techy" to the users - e.g. pretty blue flower - which would be more in keeping with the spirit of the site.
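Either style of code could be generated along these lines. This is a minimal sketch: the function names and word lists are placeholders, and a real system would still need to check each new code for uniqueness against the database.

```php
<?php
// Sketch: draw each character with random_int(), which uses a CSPRNG.
function generateLoginCode(int $length = 5): string
{
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        $code .= $alphabet[random_int(0, strlen($alphabet) - 1)];
    }
    return $code;
}

// Word-based variant; both arrays must be plain lists of easy-to-spell words.
function generateWordCode(array $adjectives, array $nouns): string
{
    return $adjectives[random_int(0, count($adjectives) - 1)]
        . ' ' . $nouns[random_int(0, count($nouns) - 1)];
}

echo generateLoginCode(), PHP_EOL;
echo generateWordCode(['pretty', 'quiet'], ['flower', 'river']), PHP_EOL;
```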
Caveat
Website administrators must be able to see all the users' codes in plain text, so they can mail them out in the first place and/or offer support to anyone unable to log in. They may also need to generate a new code for some reason, and tell the person directly.
Is there any alternative to storing the codes as plain text in the database?
Questions
Is this secure enough for the context? i.e. The only people who know about the site are those invited, and there is no real motive for anyone else to try to force their way in.
Would you use either of my methods of unique code generation, and if not what would you suggest as a better solution?
Is there another way I could allow a simple login without compromising security or simplicity of use without a username?
Reminder
There is NO registration process and users don't choose their own code. Their account is created by the website administrator, and the site randomly generates a unique code for them.

Is this secure enough for the context? i.e. The only people who know about the site are those invited, and there is no real motive for anyone else to try to force their way in.
Not really, as it would allow an attacker (disregard the notion of 'no motive to force their way in') to brute force a login - just like any other login system, apart from in this instance you'd only have to try four or five upper case alphanumeric characters and not an e-mail and a password that adheres to various character sets.
Of course, you could do the following to help prevent a brute-force;
Add a captcha to fill on every login request
Two-factor authentication via SMS or E-Mail.
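To put a number on the brute-force concern, here is a small back-of-the-envelope sketch. The figure of 1000 active codes is purely an assumption for illustration.

```php
<?php
// Number of possible codes for a given alphabet size and code length.
function keyspace(int $alphabetSize, int $length): int
{
    return $alphabetSize ** $length;
}

// Rough average number of random guesses before hitting any valid code.
function expectedGuesses(int $keyspace, int $activeCodes): int
{
    return intdiv($keyspace, $activeCodes);
}

$space = keyspace(36, 5);                     // 26 letters + 10 digits, 5 characters
echo $space, PHP_EOL;                         // 60466176
echo expectedGuesses($space, 1000), PHP_EOL;  // 60466
```

With no rate limiting or captcha, tens of thousands of requests is well within reach of a simple script.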
Would you use either of my methods of unique code generation, and if not what would you suggest as a better solution?
Both methods are fine in my opinion, as it's just like a site not enforcing "strong" password character sets. However, the reason for the code to be in plain text is no different than storing passwords (in the conventional sense) in plain text - you just don't.
Generate the random code
Send it to the user (securely)
Encrypt the code and store in the database
Is there another way I could allow a simple login without compromising security or simplicity of use without a username?
Simplicity is a relative term as it depends on your users. I would strongly recommend adding two-factor authentication via SMS or e-mail, as outlined in my answer to your first question.
You could also use social media APIs to login. You'd then be giving the security to the social media platform and the user (without holding all the security concerns on your end, to some degree).
To address points in your question that weren't explicitly posed as questions:
Website administrators must be able to see all the users' codes in plain text, so they can mail them out in the first place and/or offer support to anyone unable to log in
No. I see no reason why you'd need any human interaction, nor have a site administrator to see the passcodes in plain text - anything your administrators can see, a hacker can see.
When a user is unable to login, they should verify their identity via e-mail or SMS or security questions (or all three?), and have a new code generated for them via the system and sent to the user. The new passcode should be immediately encrypted and saved into your database.
User accounts will contain name, email, maybe address if the user wants to add it. No other sensitive information.
Any data that can be used to identify someone (for example their name, email and address) is considered sensitive.

Ultimately, no. Authenticating with a single piece of information is dangerous. I touched on this subject when I covered securely implementing "remember me" checkboxes. Your database lookups are going to leak timing information and allow attackers to trivially guess a valid code. (And implementing constant-time search algorithms is not a good idea.)
Having an authentication mechanism based solely on one value is a very bad idea. Always have two inputs: one for database lookups, the other for constant-time validation.
In most authentication systems, the username is used for the database lookup:
$userData = $pgsql->dbQuery("SELECT * FROM accounts WHERE username = ?", array(
    $_POST['username']
));
...and the password is, ideally, compared outside of the DB query:
if (\password_verify($_POST['password'], $userData[0]['passwordhash'])) {
    /* good password */
}
Aside from timing leaks (which may lead to timing attacks), having only one factor means that you can't benefit from a per-user salt without evaluating every single user in your database (which would be an enormous performance drag with a sufficiently large number of users).
With these requirements, you have to do something like:
$result = $pgsql->dbQuery("SELECT * FROM accounts WHERE password = ?", array(
    hash($algo, $_POST['password'])
));
...which goes completely against best practices.
My advice: Bite the bullet and either use two pieces of information (an identifier and an authenticator), or eschew authentication completely and work with OAuth, OpenID, SQRL, Mozilla Persona, etc. Feel free to implement this if you really want to, but it will not be secure.
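One way to get "two inputs" out of a single code is to split it into a lookup part and a checked part. The sketch below is an illustration only: the names `selector` and `verifier`, the lengths, and the in-memory `$accounts` array (standing in for a table) are assumptions, not something this answer prescribes.

```php
<?php
// Issue a code made of two parts: a selector for lookup, a verifier to check.
function issueSplitCode(array &$accounts, string $name): string
{
    $selector = bin2hex(random_bytes(4));  // 8 hex chars, stored as-is
    $verifier = bin2hex(random_bytes(8));  // 16 hex chars, only its hash is stored
    $accounts[$selector] = [
        'name'          => $name,
        'verifier_hash' => hash('sha256', $verifier),
    ];
    return $selector . '-' . $verifier;
}

function loginWithSplitCode(array $accounts, string $code): ?string
{
    $parts = explode('-', $code, 2);
    if (count($parts) !== 2 || !isset($accounts[$parts[0]])) {
        return null;
    }
    $row = $accounts[$parts[0]];           // lookup by selector (WHERE selector = ?)
    $candidate = hash('sha256', $parts[1]);
    // hash_equals() compares the two hashes in constant time.
    return hash_equals($row['verifier_hash'], $candidate) ? $row['name'] : null;
}
```

The trade-off is a noticeably longer code, which works against the "easy to type" requirement in the question.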

There are many good ways of doing this, if you don't want to change the unique code. You could use APIs for IP location and build up a pattern of the places where the user has logged in from. Once you have enough data (location, IP, maybe user agent, or even ISP), you could use an algorithm to detect any deviation from the pattern you collected, and then block that user account temporarily until he/she confirms it was them using the account.
This is just an idea; it's kind of complex and probably too extreme for some people, but that's what I would try to do if I just wanted a login system based on a unique key.

You can do it the same way as you would with a password-reset page:
Let the user register with his email.
Send a link with a token to the user, a hash of the token is stored in the database.
If the user clicks the link and if the token is valid, welcome him and let him enter his own password.
If you send the user a link with a token, (s)he can simply click this link and does not have to enter a code anywhere. The token can then be a strong token like:
http://www.example.com/register/8eM2WwsuR59MnmyswYoQ
In the database you should store only a hash of this token, though if the token is strong, the hash can be unsalted and the algorithm can be fast like SHA256. When you implement it this way, you also have the password-reset for free.
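A minimal sketch of that token flow, assuming a 24-byte random token and hypothetical function names:

```php
<?php
// Create a random URL-safe token; return it together with the hash to store.
function createRegistrationToken(): array
{
    // 24 random bytes encode to 32 base64 characters with no padding.
    $token = rtrim(strtr(base64_encode(random_bytes(24)), '+/', '-_'), '=');
    return ['token' => $token, 'hash' => hash('sha256', $token)];
}

// Check a token taken from the clicked link against the stored hash.
function tokenMatches(string $submittedToken, string $storedHash): bool
{
    return hash_equals($storedHash, hash('sha256', $submittedToken));
}

$t = createRegistrationToken();
$link = 'http://www.example.com/register/' . $t['token']; // goes in the email
// Only $t['hash'] would be written to the database.
```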

How to prevent users from posting their password?

Does anyone have any ideas on how you could prevent a user from posting their password on a site using php?

You could entirely forbid (for passwords) using dictionary words, names, dates or any other sequence of characters that people might use in a conversation. Then, for every message, loop over every word in the message, hash it, then compare it to your store of hashed passwords.
This would require a lot of CPU, and be easy to bypass though.
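To make the cost and the weakness concrete, here is a rough sketch. It assumes the check is made only against the posting user's own hash created with `password_hash()`, which is an assumption on my part; the function name is hypothetical.

```php
<?php
// Returns true if any whitespace-separated word verifies against the hash.
function messageContainsPassword(string $message, string $passwordHash): bool
{
    $words = preg_split('/\s+/', $message, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        // One full password_verify() call per word: this is the CPU cost.
        if (password_verify($word, $passwordHash)) {
            return true;
        }
    }
    return false;
}
```

Trailing punctuation is already enough to slip past it: "Tr0ub4dor," is a different word from "Tr0ub4dor".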
If people want to be idiots and tell other people their account details, you can't stop them.
To save them from phishing, you can only educate them.
To save yourself from multiple people sharing a single account, you can only look for patterns which suggest the account is being shared (such as being logged in from many different IP addresses at once — but be careful as people may access a service from a computer and phone at the same time, or use an ISP that slaps a rotating proxy in front of its users).

You can store the password in the session or wherever you want and try to match the password when the user posts a comment, but I wouldn't do that because:
you will end up storing the clear password somewhere, which is really bad
you will clutter your code with hundreds of useless checks
I think in this case prevention is the way to go: just put a nice blurb on your registration and login pages saying that users shouldn't give out their password or post it on the site.
The other way to go is, like BeemerGuy mentioned, to just hire some humans to moderate the comments on your website.

Did you ever see it happen? Probably not!
Why:
Either you need to save the password as plain text (lucky hackers!),
or you need to hash each word to compare it to the hashed password, which is very expensive.
So you just can't do it properly!
Don't even try it; explain to the users why it is bad and just hope they don't post it ...

Secure login with proper authentication in PHP

How do I write/put together a secure login in PHP? The website developer guide said I shouldn't roll my own, so referring to samples available via Google is useless.
How do you pros do it? Lets say you're building a world-class app in rails, would the same libraries / techniques be usable here?
Thanks

In Rails, one would generally use a pre-existing library. Authentication is easy to do wrong, and the problem's been solved so many times that it's rarely worth the effort to solve it again. If you are interested in writing your own implementation, then I'll describe how modern authentication works.
The naive method of authenticating a user is to store their password in a database and compare it to the password the user submits. This is simple but unbelievably insecure. Anyone who can read your database can view anyone's password. Even if you put in database access controls, you (and your users) are vulnerable to anyone who hacks around them.
Proper form is to use a cryptographic hash function to process the password when it is chosen and then every time it is submitted. A good hash function is practically irreversible -- you can't take a hash and turn it back into a password. So when the user logs in, you take the submitted password, hash it, and compare it to the hash in the database. This way, you never store the password itself. On the downside, if the user forgets their password, you have to reset it rather than send it to them.
Even this, however, is vulnerable to certain attacks. If an attacker gets hold of your password hashes, and knows how you hash your passwords, then he can make a dictionary attack: he simply takes every word in the dictionary and hashes that word, keeping it with the original. A precomputed table of this kind is loosely called a rainbow table (strictly speaking, a rainbow table is a more compact variant that trades lookup time for storage). Then, if any of the dictionary word hashes match a password hash, the attacker can conclude that the password is the dictionary word that hashes to that value. In short, an attacker who can read your database can still log in to accounts with weak passwords.
The solution is that before a password is hashed, it is combined (usually concatenated or xor'd) with a value called the salt which is unique to each user. It may be randomly generated, or it may be an account creation timestamp or some such. Then, an attacker cannot use a rainbow table because every password is essentially hashed slightly differently; he would have to create a separate rainbow table for every single distinct salt (practically for each account), which would be prohibitively computationally expensive.
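In PHP the effect of a per-user salt is easy to see with the built-in `password_hash()` and `password_verify()` functions, which generate a random salt and embed it in the result. A short sketch, with an example password chosen only for illustration:

```php
<?php
$password = 'correct horse battery staple';

// Each call generates a fresh random salt and embeds it in the result.
$hashA = password_hash($password, PASSWORD_DEFAULT);
$hashB = password_hash($password, PASSWORD_DEFAULT);

var_dump($hashA === $hashB);                       // bool(false)
var_dump(password_verify($password, $hashA));      // bool(true)
var_dump(password_verify($password, $hashB));      // bool(true)
var_dump(password_verify('wrong guess', $hashA));  // bool(false)
```

Two users with the same password end up with different stored values, so one precomputed table cannot be reused across accounts.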
I will echo the advice of the other answerers: this is not simple stuff, and you don't need to do it because it's been done before, and if you do it yourself you stand a very good chance of making a mistake and inadvertently compromising your system's security. But if, for whatever reason, you really, really want to write one yourself, I hope that I have provided an (incomplete!) outline of how it's done.

The Zend Framework has an 'Auth' module which would be a good place to start. Or, if your site will be hosting an install of WordPress or PHPBB, there are ways of leveraging those technologies' authentication modules to sign in to other pages of a site.

One thing to look at when you are trying to authenticate is what is your real goal.
For example, on SO I use my Google login, and that works, as they just need to know who I am, and they can trust that Google has an idea. So, if that model will work for you, then look at using OpenID, as there are various tools for that.
If you must do your own, then there will be various tests to ensure that it is secure, again, depending on how paranoid you want to be.
Never trust anything from the user, unless you have used some strict verification.
Use https to help protect the password of the user, you owe them that much.
I will end my response here as Thom did a fantastic response.

by Soulmerge:
I think the accepted answer in your other question states it pretty well. Hash the passwords with a salt. Other than that, there are some security ideas on the transport layer:
Use https when sending passwords. This makes sure nobody can catch them on the wire (man-in-the-middle attack or the client uses an evil proxy)
An alternative is to hash the password using javascript when the login form is submitted. This makes sure that the password is never transported in plaintext. You should hash the hashed value again with a salt on the server. (md5($_POST['postedPwHash'] . $salt))

A good method to somewhat secure the client-server transaction (if no SSL is available) is to use a one-time random key to create a unique hash from the credentials, then only send that unique hash to the server. The server then compares this hash to its own generated hash instead of comparing it to the real credentials. This would provide a good defense against the man-in-the-middle attack. The downside is that to do this the user must have JS enabled (at least I don't know of a good method to encrypt client-side data without it). This means that you will need a sufficient fallback when it isn't on. You can even create the form in JS to make sure it's enabled.
This library is a simple library I wrote once that does the procedure I described, though it probably needs some improvements.
Note that this is in addition to using "salting" methods and other server-side security measures. It is also quite vulnerable to dictionary attacks, as the entire hashing process is by definition procedural, predictable and visible to the user (as JS always is).

My answer is "Don't do it"
This is a very complex area, full of potential security gotchas. If you are not an expert in this field, then you are really just asking for trouble and problems down the road.
I would recommend looking at getting an existing solution to do it. Sadly I don't know any that I would be happy to recommend, other than OpenID. I'm sure you will get some good suggestions here though...

How to best store user information and user login and password

I'm using MySQL and I was assuming it was better to separate out a user's personal information and their login and password into two different tables and then just reference them between the two.
Note: To clarify my post, I understand the techniques of securing the password (hash, salt, etc). I just know that if I'm following practices from other parts of my life (investing, data backup, even personal storage), then in the worst-case scenario (compromised table or fire) having information split among tables provides the potential to protect your additional data.

Don't store passwords. If it's ever sitting on a disk, it can be stolen. Instead, store password hashes. Use the right hashing algorithm, like bcrypt (which includes a salt).
EDIT: The OP has responded that he understands the above issue.
There's no need to store the password in a physically different table from the login. If one database table is compromised, it's not a large leap to access another table in that same database.
If you're sufficiently concerned about security and security-in-depth, you might consider storing the user credentials in a completely separate data store from your domain data. One approach, commonly done, is to store credentials in an LDAP directory server. This might also help with any single-sign-on work you do later.

The passwords should be stored as a cryptographic hash, which is a non-reversible operation that prevents reading the plain text. When authenticating users, the password input is subjected to the same hashing process and the hashes compared.
Avoid the use of a fast and cheap hash such as MD5 or SHA1; the objective is to make it expensive for an attacker to brute-force passwords or precompute rainbow tables, and a fast hash counteracts this. Use of an expensive hash is not a problem for authentication scenarios, since the cost of a single run of the hash is negligible for a legitimate login.
In addition to hashing, salt the hash with a randomly generated value (a nonce), which is stored in the database and concatenated with the data prior to hashing. A table precomputed for one salt is useless for any other, so this greatly increases the overall cost of generating rainbow tables.
Your password hash column can be a fixed length; your cryptographic hash should output values which can be encoded into a fixed length, which will be the same for all hashes.
Wherever possible, avoid rolling your own password authentication mechanism; use an existing solution, such as bcrypt.
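With PHP's built-in bcrypt support this might look like the sketch below. The wrapper function names and the cost value of 12 are assumptions; the cost should be tuned to your own hardware.

```php
<?php
// The cost value 12 is an assumption; higher is slower and harder to attack.
function hashPassword(string $password): string
{
    return password_hash($password, PASSWORD_BCRYPT, ['cost' => 12]);
}

function checkPassword(string $password, string $storedHash): bool
{
    return password_verify($password, $storedHash);
}

$short = hashPassword('a');
$long  = hashPassword(str_repeat('long passphrase ', 4));
echo strlen($short), ' ', strlen($long), PHP_EOL; // 60 60
```

The output length does not depend on the input, which is why a fixed-length column works for the hash.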
An excellent explanation of how to handle passwords, and what you need to concern yourself with, can be found at http://www.matasano.com/log/958/enough-with-the-rainbow-tables-what-you-need-to-know-about-secure-password-schemes.
As a final note, please remember that if an attacker obtains access to your database, then your immediate concern should probably be with any sensitive or personally-identifying information they may have access to, and any damage they may have done.

There's nothing wrong with putting them in the same table. In fact, it would be much faster, so I'd highly recommend it. I don't know why you'd want to split it up.

I'll attempt to answer your original question. Having it all in one table is fine unless you just have a lot of personal information to gather. In that case it may make sense to split it up. That decision should be made based on the amount of personal information you're dealing with and how often it needs to be accessed.
I'd say most of the time I'd do something like this in a single table:
UserID, FirstName, LastName, Email, Password, TempPassword
But... if you're gathering much more than that. Say you're gathering phone, fax, birth date, biography, etc, etc. And if most of that information is rarely accessed then I'd probably put that in its own table and connect it with a one-to-one relationship. After all, the fewer columns you have on a table, the faster your queries against that table will be. And sometimes it makes sense to simplify the tables that are most accessed. There is a performance hit with the JOIN though whenever you do need to access that personal information, so that's something you'll have to consider.
EDIT -- You know what, I just thought of something. If you create an index on the username or email field (whichever you prefer), it'll almost completely eliminate the performance drawback of creating so many columns in a user table. I say that because whenever you login the WHERE clause will actually be extremely quick to find the username if it has an index and it won't matter if you have 100 columns in that table. So I've changed my opinion. I'd put it all in one table. ;)
In either case, since security seems to be a popular topic, the password should be a hash value. I'd suggest SHA1 (or SHA256 if you're really concerned about it). TempPassword should also use a hash and it's only there for the forgot password functionality. Obviously with a hash you can't decrypt and send the user their original password. So instead you generate a temporary password they can login with, and then force them to change their password again after login.

Will all of this data always have a 1:1 relationship with the user? If you can foresee allowing users to have multiple addresses, phone numbers, etc, then you may want to break out the personal info into a separate table.

First, to state the (hopefully) obvious, if you can in any way at all avoid storing usernames and passwords do so; it's a big responsibility and if your credential store is breached it may provide access to many other places for the same users (due to password sharing).
If you must store credentials:
Don't store a reversible form; store a hash using a recognized algorithm like SHA-256. Use cryptographic software from a reputable trustworthy source - DO NOT ATTEMPT TO ROLL YOUR OWN, YOU WILL LIKELY GET IT WRONG.
For each credential set, store a salt along with the hashed data; this is used to "prime" the hash such that two identical passwords do not produce the same hash - since that gives away that the passwords are the same.
Use a secure random generator. Weak randomness is the number one cause of encryption related security failures, not cipher algorithms.
If you must store reversible credentials:
Choose a good encryption algorithm - AES-256, 3DES (dated), or a public key cipher. Use cryptographic software from a reputable trustworthy source - DO NOT ATTEMPT TO ROLL YOUR OWN, YOU WILL LIKELY GET IT WRONG.
For each credential set, store a salt (unencrypted) along with the encrypted data; this is used to "prime" the encryption cipher such that two identical passwords do not produce the same cipher text - since that gives away that the passwords are the same.
Use a secure random generator. Weak randomness is the number one cause of encryption related security failures, not cipher algorithms.
Store the encryption/decryption key(s) separately from your database, in an O/S secured file, accessible only to your application's runtime profile. That way, if your DB is breached (e.g. through SQL injection) your key is not automatically vulnerable, since that would require access to the HDD in general. If your O/S supports file encryption tied to a profile, use it - it can only help and it's generally transparent (e.g. NTFS encryption).
If practical, store the keys themselves encrypted with a primary password. This usually means your app will need that password keyed in at startup - it does no good to supply it in a parameter from a script, since if your HDD is breached you must assume that both the key file and the script can be viewed.
If the username is not necessary to locate the account record, encrypt both the username and password.
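For the reversible case, a minimal sketch using PHP's OpenSSL extension with AES-256 in GCM mode could look like this. The function names and the storage layout (nonce, then tag, then ciphertext) are assumptions for illustration; the random per-record nonce plays the role of the unencrypted "salt" described above.

```php
<?php
// Requires the OpenSSL extension. The key must come from outside the database.
function encryptCredential(string $plaintext, string $key): string
{
    $iv = random_bytes(12); // fresh nonce per record, stored unencrypted
    $tag = '';
    $ciphertext = openssl_encrypt($plaintext, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag);
    if ($ciphertext === false) {
        throw new RuntimeException('Encryption failed');
    }
    return base64_encode($iv . $tag . $ciphertext);
}

function decryptCredential(string $stored, string $key): ?string
{
    $raw = base64_decode($stored, true);
    if ($raw === false || strlen($raw) < 28) {
        return null; // too short to hold a 12-byte nonce and 16-byte tag
    }
    $iv = substr($raw, 0, 12);
    $tag = substr($raw, 12, 16);
    $ciphertext = substr($raw, 28);
    $plaintext = openssl_decrypt($ciphertext, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag);
    return $plaintext === false ? null : $plaintext;
}

$key = random_bytes(32); // in practice: read from an OS-protected key file
$stored = encryptCredential('s3cret-api-password', $key);
```

Because the nonce is random, encrypting the same value twice gives different stored strings, and a wrong key or tampered value makes decryption return null.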

In my personal experience, storing the personal information and the login information in individual databases is the best practice in this case. The reason being should an SQL injection take place, it is limited (unless the infiltrator knows the inner layout of your database(s)) to the table that the data pertains to, as opposed to providing access to the whole conglomerate of data.
However, do note that this may come at the expense of needing to perform more queries, hence a performance hit.

You ought to store them in the same table, and use one-way encryption. MD5 will work, but is weak, so you might consider something like SHA1 or another method. There's no benefit to storing the 2 items in separate tables.
