What is the best algorithm to create a hash (as a verification id) for new users to verify their accounts?
MD5 seems to be the answer everywhere I look, but all the sources are at least 3 or 4 years old. I just want to make sure that MD5 is still the best option today...
I use the password_hash() function for passwords, which I believe uses the Blowfish-based bcrypt algorithm and adds a random salt, but is that necessary for a verification ID?
For this purpose, MD5 is just an 'encoding'. As long as the source value that you run the MD5 on is properly random, it can be safely used.
Any properly random 128-bit value will do fine: either a GUID (as long as it is v4), a base64-encoded crypto-random byte array, or an MD5 of the same array.
You just have to make sure that it cannot be guessed. So you would usually want to add some sort of invalid-token counter by IP address that blocks access after a certain number of failed attempts.
Also you would probably want to add some sort of expiration (like valid for 24 hours) for the code for the same reason.
MD5 is fine to generate a hash, if you just want to create verification tokens you can use something like:
$token = md5(uniqid(mt_rand(), true));
Don't use MD5 for passwords unless you are using a salt, and even then you should prefer a stronger algorithm such as bcrypt.
Also see Generating cryptographically secure tokens
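As a rough sketch of the advice above (a properly random 128-bit value plus an expiry), assuming PHP 7 or later where random_bytes() is available. The function names make_verification_token and token_is_valid are hypothetical, and how you store the token is up to you:

```php
<?php
// Hypothetical helper: builds a 32-character hex token from 128 random bits.
function make_verification_token(): array
{
    $token = bin2hex(random_bytes(16));   // 16 bytes from the OS CSPRNG (PHP 7+)
    $expiresAt = time() + 24 * 3600;      // valid for 24 hours, as suggested above
    return ['token' => $token, 'expires_at' => $expiresAt];
}

// Hypothetical check: the token must be unexpired and match the stored value.
function token_is_valid(string $supplied, string $stored, int $expiresAt): bool
{
    // hash_equals() compares in constant time, which avoids timing leaks.
    return time() <= $expiresAt && hash_equals($stored, $supplied);
}

$t = make_verification_token();
```

Unlike md5(uniqid(mt_rand(), true)), this does not depend on the clock or on a non-cryptographic generator, so the token should not be guessable from the signup time.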
Currently I'm just fooling around with PHP, but I came across an idea I want to expand on and need to know how secure it is in your opinion, and how I can improve it to make it acceptable for practical use.
This is how I store the passwords in the database:
plain text password -> hash password (I use whirlpool, but any method will practically work)->
shuffle/scramble the hashed password (with the str_shuffle() function).
I store the user's password in the database like so, to make sure that if the database is compromised, it would be impossible for the attacker to reverse the broken password hash inside the database. (Because how can you reverse, in a sense, random text that used to be a hash? - Although I'm sure you can create a list of possibilities by comparing a list of hashes that share the same chars.)
The way I check if the password the user entered on the login form is correct (compared to the broken hash in the database) is by counting the individual letters+numbers (a-f & 0-9) in both strings/passwords, and seeing if they match up, and if they do, I assume they're correctly logged in.
And again, I want to know how secure this is in your opinion, and how I can improve it to make it acceptable for practical use. (If possible.)
& I would also like to try my best to avoid a "reversible" hash (i.e. the idea of creating my own way of ensuring the passwords match; I want to make it more of a best-guess assumption, to help ensure it will be impossible for an attacker to reverse the passwords in the database).
& Yes I know this is stupid because it most likely causes more security flaws rather than helps fix them. But this is just something I'm fooling around with, and maybe hope to make it practical.
OTHER INFO:
1) Passwords are stored with unique salts (so no 2 accounts share the same salt)
2) Password salts are always changing (Each time a successful login happens with a user's account, it will change the user's salt in the database. I do this to change the hash in the database, causing a password collision to be less frequent (hopefully) & also to prevent unwanted users from using the same incorrect password multiple times to login (if they manage to come across one; the only way to achieve this is by bruteforce or 'guessing', which any login system is vulnerable to).
When I say password collision, I mean the slightest chance that the words "hello" & "blue" share the same exact char count (as I explained, I count the individual chars + numbers, and compare them, to ASSUME it's the correct password.)
3) I will also MAYBE keep the first 3 chars/numbers of the hashed password unaffected by the str_shuffle, to also help ensure the passwords are correct. (By creating 2 checks: 1) check if both strings share the same FIRST 3 CHARS/Numbers & 2) then compare the count of chars in each string. Hoping to make password collisions, again, less frequent.)
4) Obviously other security measures will be added (i.e. max login attempts, captcha, etc.) to help protect against automated bruteforcing, to make it harder for a hacker to find a possible password or the real password.
I have made a successful PoC of this, and it works like a charm, although I have yet to test the PoC against a Dictionary Attack / Brute Force Attack, to see the chances of password collisions & how frequent they are.
If I stated a lot of 'useless' information, ignore it. I'm just trying my best to explain this reasonably.
This seems terribly ineffective and insecure to me.
Most notably: Collisions. You mentioned that already in Other Info.
Just checking the count of characters in the hashed & scrambled string lets the collision probability go through the roof. You enable one password to also be valid for all permutations of its hash. Considering the length of 128 characters in a whirlpool hash, this is a veeery large number.
So, basically, by allowing this, you allow a would-be bruteforcer to check many many thousand passwords at once, by entering a single one.
They will not gain permanent access to the system, since you said you alter the hash after each login, but the probability that they gain access ONCE is increased substantially.
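A small sketch of why that is, assuming the check in the question amounts to comparing character counts. The function name same_char_counts is made up for illustration and is not the asker's actual code:

```php
<?php
// Compare two strings only by how often each byte occurs,
// which is the comparison described in the question.
function same_char_counts(string $a, string $b): bool
{
    // count_chars(..., 1) returns [byte value => count], ordered by byte value.
    return count_chars($a, 1) === count_chars($b, 1);
}

$hash = hash('whirlpool', 'correct horse');  // 128 hex characters
$stored = str_shuffle($hash);                 // what the scheme would store
$reversed = strrev($hash);                    // a different string, same characters

$realMatches = same_char_counts($hash, $stored);
$permutationMatches = same_char_counts($reversed, $stored);
```

Both comparisons succeed, so every password whose hash happens to be a rearrangement of the stored characters would be accepted as well.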
Concerning the altered salt... how do you do that? I can't think of a way unless you apply the salt after hashing instead of before, which is not how a Salt works in hashing.
If you want to make it more secure then just use multiple hash iterations. Store the hashed password and the number of hash iterations. Every time the user logs in hash the hash again, store it, and increase the iteration count. This will change the stored hash sufficiently without introducing too many cryptographic weaknesses.
Your shuffling scheme will make the password less secure. Comparing the number of instances of letters and numbers after a shuffle increases the chance of two people having the same password value (collision, as you said).
The re-salting is something you could use. Each time the user successfully logs in, you can re-salt the password and save it again. This could be even better if you modified the PHP password procedure to use a hi-res time value, increasing the uniqueness. Essentially you're continuously rotating the salt of the password. At login you would have to hold on to the clear password, compare its hash to the saved one, then re-salt and hash the clear password and save the new hash.
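One way the re-salting step could look, as a sketch only. It assumes PHP 7.1+ and the built-in password_hash()/password_verify() functions rather than a modified procedure, and verify_and_resalt is a hypothetical name:

```php
<?php
// Hypothetical login step: verify the clear password, then re-hash it
// so the stored value gets a fresh random salt.
// Returns the new hash to store, or null when the password is wrong.
function verify_and_resalt(string $password, string $storedHash): ?string
{
    if (!password_verify($password, $storedHash)) {
        return null;
    }
    // password_hash() picks a new random salt on every call.
    return password_hash($password, PASSWORD_DEFAULT);
}

$first = password_hash('example password', PASSWORD_DEFAULT);
$second = verify_and_resalt('example password', $first);
```

The clear password only exists for the duration of the request here; it is never written to the database.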
The output of a cryptographically strong hashing function is for all intents and purposes already pseudo-random. Attempting to add entropy by scrambling it does nothing. It does nothing to make the hash less "reversible", since the only way to "reverse" a hash is by choosing an input, hashing it, comparing it with the hash; that's the same thing you have to do when logging the user in, it's the same thing an attacker has to do, changing the comparison algorithm does not change this basic operation. (As others have pointed out, your weakened comparison algorithm actually aids an attacker.)
The accepted way to deal with this problem is already sufficient:
Make sure your input is unique by salting it with (pseudo) random noise, this forces an attacker to do actual brute force hashing.
Choose a hash that is slow (preferably bcrypt or scrypt, with a high enough cost factor that makes it feasible for you to do once, but infeasible for an attacker to do billions of times), this makes it computationally infeasible for an attacker to brute force a hash in his life time.
If both steps are done correctly, it's already infeasible to "reverse" a hash. No additional mind games needed.
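In PHP 5.5+ both steps are covered by password_hash(), which generates the salt and applies bcrypt. A minimal sketch, where the cost of 12 is an assumed value that you would tune to your own hardware:

```php
<?php
// Assumed cost of 12; raise or lower it until one hash takes an
// acceptable amount of time on your server.
$hash = password_hash('s3cret-passphrase', PASSWORD_BCRYPT, ['cost' => 12]);

// password_verify() reads the algorithm, cost and salt back out of $hash.
$ok = password_verify('s3cret-passphrase', $hash);
$bad = password_verify('wrong guess', $hash);
```

The salt and cost are embedded in the returned string, so only that one value needs to be stored.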
Don't fiddle around with your idea any longer. It is insecure.
There are only about two ways for password security that provide a sufficient level of resistance against tampering:
Use a hardware security module executing something like HMAC-SHA1. The module is external hardware, the outside world does not know the internal secret inside the module (only available by physical access to the module), and without that module the generated hashes will never be reconstructed. Being dedicated hardware with a "fast" hashing algorithm makes this a viable solution for lots of password checks. See http://en.wikipedia.org/wiki/Hash-based_message_authentication_code for details.
Use a very slow hashing algorithm. Things like "scrypt" or "bcrypt" execute very slowly, which hinders a fast bruteforce scan of a list of passwords against a list of known hashes. At the time of writing, PHP only had built-in support for "bcrypt".
You may wonder why you should use external hardware encapsulating a secret. Simple: Anything that is accessible from the machine that is doing the hash can be stolen. Stealing the secret is like using the same salt (or none at all) for all keys: You end up "only" having a very fast hash algorithm with every other component known, and can start bruteforcing passwords right away.
So if there is no dedicated hardware, the only other option is a slow password hash algorithm.
There is a solution for PHP: password_compat is a library that reimplements the PHP password hash API for versions before PHP 5.5. If you are already using 5.5, you simply use these functions.
Given that a one time pad is unbreakable (to the best of my knowledge, please feel free to correct me), if I were to generate a pad, and use this same exact pad to encrypt passwords for a website when a user is created and store the encrypted password in my database, is this a safe method? In other words, is it ok to keep this same pad forever as long as no one ever sees what the pad is?
Should I instead use something like mcrypt?
What you would be doing with the one-time pad is encrypting the password. Encrypting passwords is not optimal, because however you do it, you will be able to decrypt the password. Your application itself must have access to the key (or the keys, since every one-time pad can only encrypt a single password), and so does an attacker who gains enough privileges.
That's why we use hash functions to store passwords, they are one-way, you can check if an entered password results in the same hash, but you cannot get the original password back. PHP offers the function password_hash() to generate such hash-values, it handles all the pitfalls with generating random salts and uses the slow BCrypt to hash passwords.
The "one time" in one-time pad means that a given key is only used to encrypt a single plaintext. In other words, you have a separate pad for each item you need to encrypt. That's the thing that makes them unbreakable. Since those separate pads have to be stored somewhere, you are vulnerable. Instead, use a widely used and tested library (such as mcrypt) and encrypt your passwords using a salt.
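To illustrate why reusing the same pad loses the "unbreakable" property, here is a small sketch with two made-up passwords of equal length. It relies on PHP's ^ operator, which XORs two strings byte by byte:

```php
<?php
$pad = random_bytes(8);        // one pad, reused for two passwords
$c1 = 'hunter22' ^ $pad;       // ciphertext of the first password
$c2 = 'letmein!' ^ $pad;       // ciphertext of the second password

// The pad cancels out: XOR of the ciphertexts equals XOR of the plaintexts.
$leak = $c1 ^ $c2;

// Knowing (or guessing) one password then reveals the other without the pad.
$recovered = $leak ^ 'hunter22';
```

An attacker who steals the encrypted column and knows a single password (their own account, for example) can recover the pad bytes and apply them to every other row.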
Suppose a hash collision occurs while I am using the sha1() function in PHP.
Will this code avoid it permanently, or do I have to use some other way?
$filename=sha1($filename.'|'.microtime());
OR
$filename=sha1($filename.'|'.rand());
If this code doesn't provide protection from hash collisions,
what should I do to avoid any type of hash collision, assuming there can be more than 100,000 entries in the DB?
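It's very unlikely that a hash collision will happen for SHA-1.
The probability of an accidental SHA-1 collision is negligible.
An accidental collision is not a practical risk. A deliberately constructed SHA-1 collision was demonstrated in 2017, but that took an enormous targeted computation and does not happen by chance. So you are safe to use it for this.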
Using a salt like microtime() or a random number changes the input, but you simply can't rule collisions out.
What you are computing is still sha1(string), whether the string is a mixed value or a single string, so using microtime() or rand() won't affect the probability of a hash collision.
The chance of a collision for sha1(mixedvalue) is therefore no lower than for sha1(filename), so the extra value is of no use here.
So don't worry; use this or a simpler way if you like. It won't create problems in the future. Worrying about hash collisions is a waste of time when the chances are so very small.
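To put a rough number on "very small", here is a sketch using the standard birthday approximation p ≈ n(n-1)/2 / 2^bits, assuming SHA-1 outputs behave like random 160-bit values. The function name collision_probability is made up for this example:

```php
<?php
// Birthday approximation for n random values drawn from 2^bits possibilities.
function collision_probability(int $n, int $bits): float
{
    return ($n * ($n - 1) / 2) / pow(2, $bits);
}

// 100,000 entries, 160-bit SHA-1 digests.
$p = collision_probability(100000, 160);
printf("%.2e\n", $p);
```

For 100,000 entries the result is on the order of 10^-39, far below the chance of a hardware fault corrupting the file name.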
Just to be clear, you can't completely avoid hash collisions. It's an infinite number of inputs to a finite number of outputs, but you can take into account things like the file's size, the current system time and other data to use as a salt which will increase the entropy of your message digests.
Just sha1() the entire file path, not only the file name.
Filename xy.png can be only one in a directory, therefore your hash will be unique for that filename.
Also, this has the advantage that you will not have duplicate files (while with rand()/microtime() you can get same file 10 times in same dir, and if it's a 1GB file can cause problems)
Neither of these avoids hash collisions.
Hash collisions happen when you have an algorithm that generates a hash of a specific size, regardless of the starting value.
A hash collision is when two different values, like "mypassword" and "dsjakfuiUIs2kh-1jlks" end up generating the same hash because of the mathematical operations performed on them.
You can't write code to prevent hash collisions, how often that happens is dependent on the hashing algorithm you are using.
I'm looking at some code that I have not written myself. The code tries to hash a password with SHA512 and uses just time() as the salt. Is time() too simple a salt for this or is this code safe?
Thanks for the answers and comments. I will sum it up here for the new readers:
salt should be different for each user, so if 2 users register at the same time, their salts won't be unique. This is a problem, but not a big one.
but salt shouldn't be in any way related to the user, so time() is not a good salt.
"Use a random, evenly distributed, high entropy salt." -- That's a mouthful, so what code could possibly generate a random, evenly distributed, high entropy salt?
Ok, so how about I replace time() with a random string 32 char long. The random string could be generated from looping 32 times over a set of alphabet chars. Does that sound good?
Short answer:
No, time() is not a good salt.
Long answer:
copied from my answer to Salt Generation and open source software
What is a salt?
A salt is a random set of bytes of a fixed length that is added to the input of a hash algorithm.
Why is salting (or seeding) a hash useful?
Adding a random salt to a hash ensures that the same password will produce many different hashes. The salt is usually stored in the database, together with the result of the hash function.
Salting a hash is good for a number of reasons:
Salting greatly increases the difficulty/cost of precomputed attacks (including rainbow tables)
Salting makes sure that the same password does not result in the same hash.
This makes sure you cannot determine if two users have the same password. And, even more important, you cannot determine if the same person uses the same password across different systems.
Salting increases the complexity of passwords, thereby greatly decreasing the effectiveness of both Dictionary- and Birthday attacks. (This is only true if the salt is stored separate from the hash).
Proper salting greatly increases the storage need for precomputation attacks, up to the point where they are no longer practical. (8 character case-sensitive alpha-numeric passwords with 16 bit salt, hashed to a 128 bit value, would take up just under 200 exabytes without rainbow reduction).
There is no need for the salt to be secret.
A salt is not a secret key; instead a salt 'works' by making the hash function specific to each instance. With a salted hash, there is not one hash function, but one for every possible salt value. This prevents the attacker from attacking N hashed passwords for less than N times the cost of attacking one password. This is the point of the salt.
A "secret salt" is not a salt, it is called a "key", and it means that you are no longer computing a hash, but a Message Authentication Code (MAC). Computing MAC is tricky business (much trickier than simply slapping together a key and a value into a hash function) and it is a very different subject altogether.
The salt must be random for every instance in which it is used. This ensures that an attacker has to attack every salted hash separately.
If you rely on your salt (or salting algorithm) being secret, you enter the realms of Security Through Obscurity (won't work). Most probably, you do not get additional security from the salt secrecy; you just get the warm fuzzy feeling of security. So instead of making your system more secure, it just distracts you from reality.
So, why does the salt have to be random?
Technically, the salt should be unique. The point of the salt is to be distinct for each hashed password. This is meant worldwide. Since there is no central organization which distributes unique salts on demand, we have to rely on the next best thing, which is random selection with an unpredictable random generator, preferably within a salt space large enough to make collisions improbable (two instances using the same salt value).
It is tempting to try to derive a salt from some data which is "presumably unique", such as the user ID, but such schemes often fail due to some nasty details:
If you use for example the user ID, some bad guys, attacking distinct systems, may just pool their resources and create precomputed tables for user IDs 1 to 50. A user ID is unique system-wide but not worldwide.
The same applies to the username: there is one "root" per Unix system, but there are many roots in the world. A rainbow table for "root" would be worth the effort, since it could be applied to millions of systems. Worse yet, there are also many "bob" out there, and many do not have sysadmin training: their passwords could be quite weak.
Uniqueness is also temporal. Sometimes, users change their password. For each new password, a new salt must be selected. Otherwise, an attacker who obtained the hash of the old password and the hash of the new one could try to attack both simultaneously.
Using a random salt obtained from a cryptographically secure, unpredictable PRNG may be some kind of overkill, but at least it provably protects you against all those hazards. It's not about preventing the attacker from knowing what an individual salt is, it's about not giving them the big, fat target that will be used on a substantial number of potential targets. Random selection makes the targets as thin as is practical.
In conclusion:
Use a random, evenly distributed, high entropy salt. Use a new salt whenever you create a new password or change a password. Store the salt along with the hashed password. Favor big salts (at least 10 bytes, preferably 16 or more).
A salt does not turn a bad password into a good password. It just makes sure that the attacker will at least pay the dictionary attack price for each bad password he breaks.
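A minimal sketch of that conclusion, assuming PHP 7+ for random_bytes() and using SHA-512 only because that is what the question's code uses. The helper name salted_sha512 is hypothetical:

```php
<?php
// Hypothetical helper: hashes a password with a fresh 16-byte random salt.
// Returns [hex salt, hex digest]; both are stored next to each other.
function salted_sha512(string $password): array
{
    $salt = random_bytes(16);                     // new salt for every password
    $digest = hash('sha512', $salt . $password);
    return [bin2hex($salt), $digest];
}

// The same password hashed twice gives two unrelated records.
[$saltA, $hashA] = salted_sha512('hello');
[$saltB, $hashB] = salted_sha512('hello');
```

To check a login, read the stored salt back, recompute hash('sha512', hex2bin($salt) . $password) and compare the result with the stored digest.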
Useful sources:
stackoverflow.com: Non-random salt for password hashes
Bruce Schneier: Practical Cryptography (book)
Matasano Security: Enough with the Rainbow Tables
usenix.org: Unix crypt used salt since 1976
owasp.org: Why add salt
openwall.com: Salts
Disclaimer:
I'm not a security expert. (Although this answer was reviewed by Thomas Pornin)
If any of the security professionals out there find something wrong, please do comment or edit this wiki answer.
As for what seems to be a good source for your random salt
Also read: What is the most secure seed for random number generation?
In the absence of dedicated, hardware based, random generators, the best way of obtaining random data is to ask the operating system (on Linux, this is called /dev/random or /dev/urandom [both have advantages and problems, choose your poison]; on Windows, call CryptGenRandom())
If for some reason you do not have access to the above mentioned sources of random, in PHP you could use the following function:
From the source of phpass v0.3
<?php
/**
 * Generate pseudo random bits
 * @copyright: public domain
 * @link http://www.openwall.com/phpass/
 * @param int $entropy number of bits to generate
 * @return string A string with the hexadecimal number
 * @note don't try to improve this, you will likely just ruin it
 */
function random_bits($entropy) {
    $entropy /= 8; // bits to bytes
    $state = uniqid();
    $str = '';
    // each round appends 16 raw bytes (one binary MD5 digest)
    for ($i = 0; $i < $entropy; $i += 16) {
        $state = md5(microtime().$state);
        $str .= md5($state, true);
    }
    $str = unpack('H*', substr($str, 0, $entropy));
    // for some weird reason, on some machines 32 bits binary data comes out as 65! hex characters!?
    // so, added the substr
    return substr(str_pad($str[1], $entropy*2, '0'), 0, $entropy*2);
}
?>
Updated
It's not a really good salt, but probably good enough to defeat all but the most determined and resourceful attackers. The requirements for a good salt are:
Different for each user
long enough (at the very least alphanumeric 8 characters) to make the concatenation of salt and (potentially weak) password too long for a brute force attack.
time() values are not really long enough, since they have 10 characters, but only digits.
Also, sometimes two users may get the same value when they are created within the same second. But that's only a problem if you have situations where many users are automatically created within the same second.
In any case, far more important than a perfect salt is using a good hash function, and SHA512 is one of the best we have available right now.
This post may veer a little too far away from your original question, but I hope you find it useful;
Security is about raising barriers and hurdles; defence in depth. There is no truly secure hashing solution, just ones that are hard to break. It's like putting in a burglar alarm and window locks in your house - make your site less attractive to break into than someone else's.
Salt for a crypt algorithm is only a small part of the security problem. A single salt simply means that there is one less thing to figure out when trying to break the password for multiple users. A low-entropy salt (such as the server's time) makes it a little bit harder, and a high-entropy salt makes it harder still. Which of these to use, and whether it's something you need to worry about primarily depends upon both the sensitivity of the data you're protecting, but also what other security measures you have in place. A site that just gives a personalised weather forecast for a selected city obviously has less sensitive data than one which has your home address, mother's maiden name, date of birth and other info which could be used for identification purposes.
So here's the rub; a high entropy salt is still a bad salt if it's easily obtainable.
In the real world, storing a salt in the database (random or not) is probably less secure than using a constant salt and burying it away from prying eyes in a file inaccessible via the web browser. Whilst a unique and high entropy salt is harder to guess, if you've allowed root login from any server on MySQL and set the password to 'password' it doesn't really matter! Contrast how easy it is to crack the database versus getting a valid login to your server - which is possibly more difficult to do discreetly, as you can put fail2ban and a plethora of other attack vector watchers in place depending upon your setup.
You can combine the two approaches by storing the location of a file containing a user-specific salt in the database, rather than the salt itself. Whether having to crack both the file system and the database is warranted depends whether the sensitivity of the data you are trying to protect warrants this overhead.
Another, alternative, recommendation from security experts is to store the username in a separate database (and ideally different technology) to the password, and reference between the two using a UUID. E.g. use both MySQL and SQLite. This means that both databases have to be cracked (and is also why, to go down a separate rabbit hole for the sake of an example, you should not store user details and credit card numbers in the same database since one is of no use without the other).
Note that algorithms like SHA-512 and Blowfish can return the salt as part of their hash. Be careful with these, as if you store the complete hash you give away the algorithm, which means there are two fewer things for the hackers to figure out (the salt also gives away the algorithm).
Make sure you enforce strong passwords and usernames, so dictionary attacks will fail; I know of dictionaries for all 6-alphanumeric combinations of username/password entries for MD5 and I suspect that there are more than this available for all sorts of algorithms. With the explosion of low-cost cloud and GPGPU computing, the size and complexity of available dictionaries is going to explode.
Ultimately, the most secure way is never to programmatically generate a salt but to require a user to enter it along with their username and password over an SSL link (so it can't be snooped), and never store it. This is the approach taken by credit card companies; i.e. the 3-digit CSV security key on your credit card which you have to enter each and every time you buy online, since it should never be stored in any database. If you really want to generate the salt, send it to them separately (e.g. via SMS message or email) and still make them enter it manually each time. With this approach, although more secure, you need to weigh the complexity against whether users will just stop using the site because you've made it too difficult for them to be bothered with it.
All of the above still relies on the fact that you also have protection in place against session hijacking, cross-site scripting, etc., etc. The world's strongest password algorithm is irrelevant if all I need to do is to calculate a valid PHPSESSID for a logged-in user and hijack it!
I am not a security expert, but have read up on this as much as I reasonably can do. The fact that there are so many books on the subject indicates how big the answer to your question really is.
A couple of really great books you might like to try which I've found invaluable are;
Web Application Vulnerabilities Detect, Exploit, Prevent - ISBN-13: 978-1-59749-209-6
Preventing Web Attacks with Apache - ISBN-13: 978-0-321-32128-2
No, time() is not a good salt
It's best not to reinvent the wheel when it comes to authentication, but to answer your question, no. The problem with time():
It's predictable and it correlates to potentially discoverable things. These issues make it easier to cross-match different hashed results.
There aren't very many possible values. Since the high-order bits don't change, it's an even narrower salt than it first appears.
Using it repeats previous mistakes. If this app were the first one to use time() as a salt, at least it would require a new attack.
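A back-of-the-envelope sketch of the "not very many possible values" point. It assumes the attacker can narrow the signup time down to a single year, and compares that with the 16 random bytes recommended elsewhere in this thread:

```php
<?php
// Number of distinct time() values in one (non-leap) year.
$secondsPerYear = 365 * 24 * 3600;

// Bits of uncertainty if only the signup year is known.
$timeBits = log($secondsPerYear, 2);

// Bits of uncertainty in a 16-byte salt from random_bytes(16).
$randomBits = 16 * 8;

printf("time(): about %.1f bits, random salt: %d bits\n", $timeBits, $randomBits);
```

Roughly 25 bits is small enough to enumerate every candidate salt, and a public "member since" date shrinks it further.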
Yes.
It seems that a Unix timestamp, stored in the user database as a "Member since" field, is going to be a decent salt.
However, the salt question is the most negligible one.
There are much more important things you have to pay attention to:
Most likely neither the password, nor the salt, nor the hashing algorithm is going to be the weakest part of your site. Some lame file injection or XSS or CSRF surely is. So, don't make too big a deal of it.
Speaking of a true random string of 32 char long in the typical web-application is like speaking about 32-inch armored door in the wooden barn.
Speaking of passwords, the most important thing is password complexity. With a weak password, no salt or hashing algorithm, even a super-ingenious-incredibly-hard one, can help. It's a pain to ask users to use complex passwords, but without that everything else becomes a piece of crap.
So, your first concern should be password complexity. 12-16 characters of different case, including numbers and punctuation, is a requirement.
As for the salt, I see no benefit in using time, as you have to store it along with other user data. Better use an email - it's random enough and you have it already anyway. Don't forget to rehash a password if the user changes their email. It seems that a Unix timestamp is going to be a decent salt, no need to use email or anything else.
Update
As I can see, many people are still unable to get the point.
Like that guy from the comments, saying
Many users use weak passwords (we should educate them, or at least keep trying), but that is no excuse; they still deserve good security
They deserve, no doubt. But with weak password the mission. is. impossible.
If your password is weak, then no salt will protect it.
And salt is not important enough to spend a 10-kilobyte text on the topic.
Salt is used to prevent rainbow table attacks by breaking the match between the password and a precomputed hash. So the main task for a salt is to be different for each user/password record. The quality of randomization of the salt doesn't matter much as long as the salt is different for different users.
The date when a member joins a forum/website is generally openly accessible, and it would be the same as time(), hence making your salt useless.
No! Never use the current time as the salt. You can use something like 'SecureRandom' in Java to generate a random salt that is secure. Always use an unpredictable random number as the salt. Using time as the salt will help you to avoid identical hashes only up to a certain extent (because two users can supply the same password at the same time), and it still leaves the passwords recoverable.
The user name should be sufficient, perhaps together with the registration time stamp, but you should store it somewhere in the database. In any case, every value you use to salt your password hash has to be stored in some way, so you can recalculate the hash.
Is salting with user name + a time stamp secure enough? It should be. Rainbow tables are normally used for cracking SHA512 hashes. A user name + a time stamp should be a salt that is unique enough that no rainbow table on the net will contain precalculated hashes of passwords salted this way.