I am making a classified ads site with Zend Framework (for portfolio purposes, yes I know the world doesn't have room for "yet another Craigslist clone"). I am trying to implement the ability to post/edit/delete without ever needing an account.
To do this, I feel like I need to have a Nonce generated upon post submission and stored in the database. Then email a link to the user which makes a GET request for the delete, like this:
http://www.somesite.com/post/delete/?id=123&nonce=2JDXS93JFKS8204HJTHSLDH230945HSLDF
Only the user has this unique key or nonce, and upon submission I check the database under the post's ID and ensure the nonce matches prior to deleting.
My issue is how secure the nonce actually is. If I use Zend Framework's Zend_Form_Element_Hash, it creates the hash like this:
protected function _generateHash()
{
    $this->_hash = md5(
        mt_rand(1, 1000000)
        . $this->getSalt()
        . $this->getName()
        . mt_rand(1, 1000000)
    );
    $this->setValue($this->_hash);
}
In reading about mt_rand(), one commenter said: "This function has limited entropy. So, if you want to create a random string, it will produce only about 2 billion different strings, no matter the length of the string. This can be a serious security issue if you are using such strings for session identifiers, passwords, etc."
Because the nonce/token lives for as long as the post does, which could be days or weeks before the user chooses to delete it, an attacker would have plenty of time to work on it.
I realize mt_rand() is a huge upgrade over rand(), as seen in this visual comparison mapping pixels with rand() on the left and mt_rand() on the right. But is it enough? What makes "2 billion different strings" a security issue?
And ultimately, how can I increase the entropy of a nonce/token/hash?
For security purposes it's not only how long your output is that matters; it's how much randomness went into creating it.
For mt_rand() the source of randomness is its seed and state (the number of times you've used it since it was seeded). More mt_rand() calls will just give you more rehashing of the same randomness source (no new entropy).
mt_rand()'s seed is only 32 bits (anything less than 128 bits makes cryptographers suspicious ;)
The strength of a key with 32 bits of entropy is 4 billion divided by (roughly) the number of keys you'll generate (e.g. after 100K uses there will be a ~1:43,000 chance of guessing any valid key, which approaches practical brute-forcing).
You're adding salt to this, which makes it much stronger, because in addition to guessing the seed an attacker would have to know the salt as well; so if the salt is long, the overall key may be quite strong despite the "low" entropy.
To increase entropy you need to add more random material (even slightly random is OK, it just gives fewer bits) from sources other than mt_rand: microtime(), the amount of memory used, the process ID... or just use /dev/random, which collects all the entropy it can get.
(edit: uniqid() has weak entropy, so it won't help here)
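For illustration, here is a rough sketch (not the Zend code; the function name is made up) of pooling several such sources and hashing them down to one token:
// Hypothetical helper: pool several entropy sources and hash them down to a token.
function generate_token()
{
    $pool = '';

    // Strongest source first: the OS CSPRNG, if the OpenSSL extension is loaded.
    if (function_exists('openssl_random_pseudo_bytes')) {
        $pool .= openssl_random_pseudo_bytes(32);
    }

    // Weaker sources; each contributes only a few bits, but from different places.
    $pool .= mt_rand();          // the usual PRNG
    $pool .= microtime();        // current time with microseconds
    $pool .= memory_get_usage(); // current memory footprint
    $pool .= getmypid();         // process ID

    // Hash the pool down to a fixed-length hex string suitable for a URL.
    return hash('sha256', $pool);
}

echo generate_token(); // 64 hex characters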
The input to the md5() call in the Zend hash-generating code above has 1,000,000 × 1,000,000 = 10^12 different possibilities (ignoring the salt and element name). md5() itself has 2^128 possible outputs no matter what the input is, but the effective keyspace here is limited by that input. On average, an attacker would need to send about 500,000,000,000 requests to your server in order to guess the right nonce.
At 100 requests per minute, that's about 3,472,222 days (roughly 9,500 years) to hack.
Related
I need to know if there is any way to get a unique hash from GIF images. I tried the SHA1 file function
sha1_file
but I don't know whether two different GIF images can ever end up with the same hash.
Can that happen with SHA1? Would SHA2 or MD5 be better in that case? Or any other hash already implemented in PHP?
I know it also depends on file size, but the GIF images never exceed 10 MB in any case.
I need recommendations for this problem. Best regards.
There is no hash function that creates different values for each and every set of images you provide. This should be obvious as your hash values are much shorter than the files themselves and therefore they are bound to drop some information on the way. Given a fixed set of images it is rather simple to produce a perfect hash function (e.g. by numbering them), but this is probably not the answer you are looking for.
On the other hand you could use "perfect hashing", a two-step hashing scheme that guarantees amortized O(1) access, but as you are asking for a unique 'hash' that may also not be what you are looking for. Could you be a bit more specific about why you insist on the hash value being unique, and under what circumstances?
sha1_file is fine.
In theory you can run into two files that hash to the same value, but in practice it is so stupendously unlikely that you should not worry about it.
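For completeness, a minimal usage sketch (file names invented):
// Compare two GIFs by content: identical hashes mean, in practice, identical files.
$hashA = sha1_file('banner_a.gif');
$hashB = sha1_file('banner_b.gif');

if ($hashA !== false && $hashA === $hashB) {
    echo "Same image content\n";
} else {
    echo "Different images (or a file could not be read)\n";
}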
Hash functions don't provide any guarantees about uniqueness. Patru explains why, very well - this is the pigeonhole principle, if you'd like to read up.
I'd like to talk about another aspect, though. While you won't get any theoretical guarantees, you get a practical guarantee. Consider this: SHA-256 generates hashes that are 256 bits long. That means there are 2^256 possible hashes it can generate. Assume further that the hashes it generates are distributed almost purely randomly (true for SHA-256). That means that if you generate a billion hashes a second, 24 hours a day, you'll have generated 31,536,000,000,000,000 hashes a year. A lot, right?
Now divide 2^256 by that. That's ~10^60. If you walked linearly through all possible hashes, that's how many years it would take you to generate them all (pack a lunch). Divide that by two; that's... still ~10^60. That's how many years you'd have to work to have a greater than 50% chance of generating the same hash twice.
To put it another way, if you generate a billion hashes a second for a century, you'd have about a 1 in 10^58 chance of generating the same hash twice. Until the sun burns out, 1 in 10^50.
Those are damn fine chances.
I need to create a truly unique token when inserting records in CakePHP. The table can contain millions of rows, so I can't just rely on some randomly generated strings. I don't want to use microtime() either, because there is a probability, however small, that two records could be submitted at exactly the same moment.
Of course, the best solution would be to use String::uuid(), but according to the CakePHP documentation:
The uuid method is used to generate unique identifiers as per RFC 4122. The uuid is a 128bit string in the format of 485fc381-e790-47a3-9794-1337c0a8fe68.
So, as far as I understand, it does not use Cake's security salt for its generation. I therefore decided to hash it with the Security component's hash function (or the Auth password function), because I need it to be unique and very, really very secure at the same time. But then I found a question saying that this is not a good idea, although it was about PHP's uniqid() and md5():
Why is MD5'ing a UUID not a good idea?
Also, I think a string hashed by the Security component is much harder to guess, because, for example, String::uuid() in a for loop has output like this:
for ($i = 0; $i < 30; $i++) {
    echo String::uuid() . "<br>";
}
die;
// outputs
51f3dcda-c4fc-4141-aaaf-1378654d2d93
51f3dcda-d9b0-4c20-8d03-1378654d2d93
51f3dcda-e7c0-4ddf-b808-1378654d2d93
51f3dcda-f508-4482-852d-1378654d2d93
51f3dcda-01ec-4f24-83b1-1378654d2d93
51f3dcda-1060-49d2-adc0-1378654d2d93
51f3dcda-1da8-4cfe-abe4-1378654d2d93
51f3dcda-2af0-42f7-81a0-1378654d2d93
51f3dcda-3838-4879-b2c9-1378654d2d93
51f3dcda-451c-465a-a644-1378654d2d93
51f3dcda-5264-44b0-a883-1378654d2d93
So some parts of the strings are similar, but when using the hash function the results are quite different:
echo Security::hash('stackoverflow1');
echo "<br>";
echo Security::hash('stackoverflow2');
die;
// outputs
e9a3fcb74b9a03c7a7ab8731053ab9fe5d2fe6bd
b1f95bdbef28db16f8d4f912391c22310ba3c2c2
So the question is: can I hash the uuid() in Cake after all? Or what is the most secure way to get a truly unique and hashed (ideally using my security salt) token?
UPDATE
By "secure token" I mean how difficult it is to guess. A UUID really is unique, but as the example above shows, some of the generated values share common parts, whereas the hashed results do not.
Thanks !!
I don't think you need to worry about the UUIDs overlapping.
To put these numbers into perspective, the annual risk of someone being hit by a meteorite is estimated to be one chance in 17 billion,[38] which means the probability is about 0.00000000006 (6 × 10^−11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. Or, to put it another way, the probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs.
http://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates
Continue to use String::uuid() and rest easy :)
A UUID is unique
I need to create a truly unique token when inserting records in CakePHP
That is exactly what a UUID is. It is normally used in distributed systems to prevent collisions (multiple sources inserting data, possibly out of sync, into a datasource).
A UUID is not a security measure
I need it to be unique and very, really very secure at the same time
Not sure in what way hashing a uuid is supposed to enhance security - it won't. Relying on security by obscurity is more or less guaranteed to fail.
If you need random tokens of some form, use a hash function over a good random source (hashing a UUID is simply hashing a random-ish seed); if you need guaranteed-unique identifiers, use UUIDs. They aren't the same thing, and a UUID is a very poor mechanism for generating random, non-sequential, "un-guessable" (or whatever the purpose is) strings. A rough sketch of the two cases follows below.
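To make the distinction concrete, here is a small sketch using the CakePHP 2.x utility classes already mentioned in the question (assuming the usual App::uses() imports; the exact algorithm and the use of the salt are illustrative choices, not a prescription):
App::uses('String', 'Utility');
App::uses('Security', 'Utility');

// Guaranteed-unique identifier: use this as the record's ID.
$id = String::uuid(); // e.g. 51f3dcda-c4fc-4141-aaaf-1378654d2d93

// Hard-to-guess random token: hash a fresh random-ish seed.
// The third argument tells Security::hash() to mix in the configured 'Security.salt'.
$token = Security::hash(String::uuid() . mt_rand(), 'sha256', true);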
Generating a random string suitable for cryptographic purposes was answered well here:
Secure random number generation in PHP
The code sample fills the string $pr_bits with random binary data, so the characters are unprintable. To use this in a URL, you can convert the binary data to printable characters in a couple of ways. None of them adds security; they just make the value URL-ready.
convert the bytes to hex: bin2hex($pr_bits)
convert the bytes to base64: base64_encode($pr_bits)
hash the bytes (because the output is conveniently in hex, not for added security): hash('md5', $pr_bits)
I include the last one because you will see people use hash functions for other reasons, such as guaranteeing the output is 16 bytes / 128 bits for md5. In PHP, people use it to convert a value into hex.
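To tie that together, a minimal sketch, assuming a CSPRNG such as random_bytes() (PHP 7+) or openssl_random_pseudo_bytes() is available:
// 16 random bytes from a cryptographically secure source.
$pr_bits = function_exists('random_bytes')
    ? random_bytes(16)                 // PHP 7+
    : openssl_random_pseudo_bytes(16); // older PHP with the OpenSSL extension

echo bin2hex($pr_bits), "\n";       // 32 hex characters, URL-safe
echo base64_encode($pr_bits), "\n"; // shorter, but may contain '+', '/' and '='
echo hash('md5', $pr_bits), "\n";   // also 32 hex characters; formatting only, no extra security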
I have come up with the following solution
use a string built by concatenating the current time in microseconds with the hash of a random string:
$timeStr = str_replace("0.", "", microtime());
$timeStr = str_replace(" ", "", $timeStr);
echo Security::hash('random string').'_'.$timeStr;
// 5ffd3b852ccdd448809abb172e19bbb9c01a43a4_796473001379403705
So the first part (the hash) contributes the unguessability of the token, and the second part guarantees its uniqueness.
Hope, this will help someone.
I want to use a unique ID generated by PHP in a database table that will likely never have more than 10,000 records. I don't want the time of creation to be visible or use a purely numeric value so I am using:
sha1(uniqid(mt_rand(), true))
Is it wrong to use a hash for a unique ID? Don't all hashes lead to collisions or are the chances so remote that they should not be considered in this case?
A further point: if the number of characters to be hashed is less than the number of characters in a sha1 hash, won't it always be unique?
If you have 2 keys you have a theoretical best-case probability of a collision of 1 in 2^X, where X is the number of bits in your hashing algorithm. 'Best case' because the input will usually be ASCII, which doesn't use the full charset, and because hashing functions do not distribute perfectly, so in real life they will collide more often than the theoretical maximum suggests.
To answer your final question:
A further point: if the number of characters to be hashed is less than
the number of characters in a sha1 hash, won't it always be unique?
Yeah, that's true - sort of. But then you would have a different problem: generating unique keys of that size in the first place. The easiest way is usually a checksum, so just choose a digest large enough that the collision space is small enough for your comfort.
As #wayne suggests, a popular approach is to concatenate microtime() to your random salt (and base64_encode the result).
How horrible would it be if two ended up the same? Murphy's Law applies - if a million to one, or even a 100,000:1 chance is acceptable, then go right ahead! The real chance is much, much smaller - but if your system will explode if it happens then your design flaw must be addressed first. Then proceed with confidence.
Here is a question/answer of what the probabilities really are: Probability of SHA1 Collisions
Use sha1(time()) instead; then you remove the random possibility of a repeating hash for as long as time() can be represented in fewer characters than the sha1 hash (likely longer than you will find a working PHP parser ;)).
Computer random isn't actually random, you know?
The only true randomness you can obtain from a computer, assuming you are in a Unix environment, is from /dev/random, but reading it is a blocking operation that depends on user interactions like moving the mouse or typing on the keyboard. Reading from /dev/urandom is less safe, but it is probably better than using just ASCII characters, and it gives you an instantaneous response.
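For example, a minimal sketch of reading a few bytes straight from /dev/urandom on a Unix system (error handling omitted):
// Read 16 bytes from the kernel's non-blocking entropy pool and hex-encode them.
$fp = fopen('/dev/urandom', 'rb');
$bytes = fread($fp, 16);
fclose($fp);

echo bin2hex($bytes); // 32 hex characters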
sha1($ipAddress.time())
Because it's impossible for anyone else to use the same IP address at the same time.
I saw a recommendation that the number of rounds be set to ($currentYear - 2000) to account for Moore's law, so that 2013 would be 13 rounds and therefore 2^13 total iterations. Of course, you need to take into account your own hardware to ensure it doesn't take too long (I saw 1 second recommended as "safe" for checking passwords/hashes, and 13 rounds falls around that mark on my current hardware).
Does that sound reasonable for a social networking type of site? Or would I be setting myself up for very slow password checking in the future by using ($currentYear - 2000)?
Also, how do you deal with changing the number of rounds from one year to the next? Won't changing the number of rounds change the hashes, therefore not allowing you to check hashes from 2013 in 2014 since the check would use an extra round? Would you have to re-calculate every single hash each year, or how would it work exactly?
First off, I question that recommendation (adjusting cost based on year). The cost should be based on how fast your hardware is, not the current date. If you don't upgrade your server between now and 2015, there's no reason to increase the cost. All you do is slow an already slow process.
With that said, I also question the recommendation of 1 second for most usages. If you're dealing with highly sensitive information, 1 second (or perhaps longer) is ok. But for the average website, I typically recommend between 0.25 and 0.5 seconds. In some cases you can go lower, but I wouldn't without strong justification.
Now, to the question itself. When you use crypt() or password_hash(), the iteration count is stored in the return hash format. In fact, the salt is as well. So all information needed to compute the hash is included in it!
And if you're not using either of those API's (or the polyfill that I maintain: password-compat), then I really have to wonder why you aren't. Don't invent your own password crypto. Don't use libraries that use native hashes (like phpass) unless you have a strong reason to (for certain governmental compliance reasons, or compatibility with PHP <= 5.2).
It is generally considered that bcrypt is the strongest hash format today. SCrypt is stronger, but there are some issues with it, and it is still very new (and it's not available in PHP core yet). So just use bcrypt...
The password_hash() api has a mechanism for you to do what you're asking: password_needs_rehash(). Basically, you pass in the hash, and the options you use today, and it tells you if you need to rehash it:
if (password_verify($password, $hash)) {
    if (password_needs_rehash($hash, PASSWORD_BCRYPT, ['cost' => 14])) {
        $hash = password_hash($password, PASSWORD_BCRYPT, ['cost' => 14]);
        update_password_in_database($hash);
    }
    $loggedin = true;
}
Read the RFC for password_hash() for more information about it (I collected data from a large number of sources, and included references in the RFC).
Edit - Follow up to #AnotherParker's Comment:
Criminals don't stop upgrading their cracking boxes just because you didn't upgrade your server. You do need to increase the work parameter over time to thwart offline attacks.
Sort-of true. Well, true, but misses the point of what I was talking about above.
The cost parameter of the hash function is a time-effort tradeoff. You trade-off some time to add additional effort to each hash. On the same hardware, taking more time will yield more work. The other way to yield more work is to get faster hardware.
But the recommendation is to test the hash function on your current hardware, and make it as expensive as you can reasonably make it. If 0.5 seconds is the maximum that you can afford today, unless you upgrade your server hardware, how is increasing the cost going to help you? In short, it won't because you'll break the maximum time limit you've already determined is important.
So you can't increase the work parameter without also increasing the server's capabilities, unless you already were producing weak hashes.
Also, check out this answer on the subject
When you use bcrypt, the number of rounds is part of the hash generated:
crypt('Password', '$2a$04$thisshallbemysalt');
will result in something like
$2a$04$thisshallbemysalt.rAnd0ml0ok1ngch4rsh3re
The 2a after the first $ sign stands for the bcrypt algorithm, and the next 04 stands for the number of rounds (the cost) – so by looking at a hash you can see how many rounds were used to create it.
So when you decide it's time to up the number of rounds, you can check the number of rounds used to generate the stored hash when the user logs in – and if it's not your current number of rounds, you re-hash their password there and then, and save it as the new hash (after checking whether their password matched the existing hash, of course ;-))
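For PHP versions without password_hash(), a rough sketch of that check-and-rehash flow using crypt() directly (the persistence call and target cost are made up for illustration):
// $storedHash comes from the database; $password is what the user just typed.
$targetCost = 12; // the cost we want new hashes to use

if (crypt($password, $storedHash) === $storedHash) {
    // Password is correct. The cost sits in characters 4-5 of '$2a$NN$...'.
    $usedCost = (int) substr($storedHash, 4, 2);

    if ($usedCost < $targetCost) {
        // Build a fresh bcrypt salt (22 chars from the ./A-Za-z0-9 alphabet) and re-hash.
        $saltChars = substr(str_replace('+', '.', base64_encode(openssl_random_pseudo_bytes(16))), 0, 22);
        $newHash = crypt($password, sprintf('$2a$%02d$%s', $targetCost, $saltChars));
        // save_new_hash_for_user($newHash); // hypothetical persistence call
    }
}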
The idea of key stretching is to make brute-forcing unfeasible, because calculating a hash hundreds or thousands of times costs the attacker's system the same amount of extra time per guess.
It isn't really important whether it takes 1 second, 0.9 seconds, or 2.5 seconds. The idea is that it becomes unfeasible to brute-force millions of password hashes per second. It's the slowdown factor that counts, not the actual number of rounds.
If you use, for example, a SHA-256 hash, a system could do X (say 1,000,000) hashes per second. By key stretching (hashing, for example, 500 times) you bring that system down to 1,000,000 / 500 = 2,000 attempts per second for each password. You effectively slow down the attacker by a factor of 500. If you use 750 rounds you... exactly! Slow the attacker down by a factor of 750. But increasing the number of rounds affects your system/website/application as well, so you don't want to go too high "just to be sure"; users will complain about slow logins!
This stems from the fact that SHA-1/256/512, MD4/MD5, etc. are optimized for speed, and a speed-optimized algorithm is exactly what you don't want when trying to slow attackers down. So every few years you simply increase the number of rounds by a factor such that login time stays acceptable for your users but the attacker is slowed down enough to make brute-forcing the hashes not worthwhile (or at least to force them to focus on far fewer accounts rather than all of them).
When you up the number of rounds you rehash as CBroe explains.
I don't know who came up with the 2^($currentYear - 2000) recommendation (I'd love to see a source! Never mind, found it), but if you ask me this is total bollocks. I suggest you read the answers there more closely and also check this question/answer.
If your bcrypt takes 0.2 to 0.5 seconds (which is an acceptable 'delay' while logging in, if you ask me), that means an attacker can brute-force about 5 to 2 hashes per second on the same hardware, or maybe 5,000/2,000 or even 5,000,000/2,000,000 with heavy investment in hardware. That still isn't enough to brute-force the entire 160-bit (SHA-1), 256-bit (SHA-256) or even 448-bit (bcrypt) space in acceptable time (or even a lifetime).
I'm looking for a way, specifically in PHP, to be guaranteed to always get a unique key.
I have done the following:
strtolower(substr(crypt(time()), 0, 7));
But I have found that once in a while I end up with a duplicate key (rarely, but often enough).
I have also thought of doing:
strtolower(substr(crypt(uniqid(rand(), true)), 0, 7));
But according to the PHP website, uniqid() can generate the same key if it is called twice in the same microsecond. I'm thinking that with the addition of rand() it rarely would, but it is still possible.
After the lines mentioned above, I also remove characters such as L and O so the key is less confusing for the user. This may be part of the cause of the duplicates, but it is still necessary.
One option I have thought of is creating a service that generates the key, stores it in a database, and ensures it is completely unique.
Any other thoughts? Are there any websites out there that already do this, with some kind of API, or that simply return the key? I found http://userident.com but I'm not sure if the keys will be completely unique.
This needs to run in the background without any user input.
There are only 3 ways to generate unique values, whether they be passwords, user IDs, etc.:
Use an effective GUID generator - these are long and cannot be shrunk. If you only use part you FAIL.
At least part of the number is sequentially generated off of a single sequence. You can add fluff or encoding to make it look less sequential. The advantage is they start short - the disadvantage is they require a single source. The workaround for the single-source limitation is to have numbered sources, so you include the [source #] + [seq #] and then each source can generate its own sequence.
Generate them via some other means and then check them against the single history of previously generated values.
Any other method is not guaranteed. Keep in mind, fundamentally you are generating a binary number (it is a computer), but then you can encode it in Hexadecimal, Decimal, Base64, or a word list. Pick an encoding that fits your usage. Usually for user entered data you want some variation of Base32 (which you hinted at).
Note about GUIDS: They gain their strength of uniqueness from their length and the method used to generate them. Anything less than 128-bits is not secure. Beyond random number generation there are characteristics that go into a GUID to make it more unique. Keep in mind they are only practically unique, not completely unique. It is possible, although practically impossible to have a duplicate.
Updated Note about GUIDS: Since writing this I learned that many GUID generators use a cryptographically secure random number generator (difficult or impossible to predict the next number generated, and not likely to repeat). There are actually 5 different UUID algorithms. Algorithm 4 is what Microsoft currently uses for the Windows GUID generation API. A GUID is Microsoft's implementation of the UUID standard.
Update: If you want 7 to 16 characters then you need to use either method 2 or 3.
Bottom line: Frankly there is no such thing as completely unique. Even if you went with a sequential generator you would eventually run out of storage using all the atoms in the universe, thus looping back on yourself and repeating. Your only hope would be the heat death of the universe before reaching that point.
Even the best random number generator has a possibility of repeating equal to the total size of the random number you are generating. Take a quarter for example. It is a completely random bit generator, and its odds of repeating are 1 in 2.
So it all comes down to your threshold of uniqueness. You can have 100% uniqueness in 8 digits for 1,099,511,627,776 numbers by using a sequence and then base32 encoding it. Any other method that does not involve checking against a list of past numbers only has odds equal to n/1,099,511,627,776 (where n=number of previous numbers generated) of not being unique.
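As a rough illustration of the sequence-plus-encoding idea (method 2 above), assuming you already have a single incrementing counter such as an auto-increment ID; the alphabet is an arbitrary 32-character set that skips easily confused letters:
// Encode a sequential integer as a fixed-width, 8-character base32-style key.
// 32^8 = 1,099,511,627,776 distinct values before it would ever wrap.
function sequence_to_key($n)
{
    $alphabet = 'ABCDEFGHJKMNPQRSTVWXYZ0123456789'; // 32 chars, no I/L/O/U
    $key = '';
    for ($i = 0; $i < 8; $i++) {
        $key = $alphabet[$n % 32] . $key;
        $n = intdiv($n, 32);
    }
    return $key;
}

echo sequence_to_key(12345); // same input always gives the same key, so never a collision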
Any algorithm will result in duplicates.
Therefore, might I suggest that you use your existing algorithm* and simply check for duplicates?
*Slight addition: If uniqid() can be non-unique based on time, also include a global counter that you increment after every invocation. That way something is different even in the same microsecond.
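A tiny sketch of that addition (the function name is invented):
function unique_key()
{
    static $counter = 0; // differs even within the same microsecond
    return sha1(uniqid(rand(), true) . $counter++);
}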
Without writing the code, my logic would be:
Generate a random string from whatever acceptable characters you like.
Then add half the date stamp (partial seconds and all) to the front and the other half to the end (or somewhere in the middle if you prefer).
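To make that concrete, a quick sketch of the described logic (purely illustrative; the character set and lengths are arbitrary):
// Random chunk from an allowed character set, with the microsecond timestamp
// split across the front and back of the key.
function make_key($randomLength = 8)
{
    $chars = 'abcdefghjkmnpqrstuvwxyz23456789'; // skip look-alikes such as l/o/0/1
    $random = '';
    for ($i = 0; $i < $randomLength; $i++) {
        $random .= $chars[mt_rand(0, strlen($chars) - 1)];
    }

    // microtime() => "0.12345678 1379403705"; keep only the digits.
    $stamp = preg_replace('/\D/', '', microtime());
    $half  = (int) (strlen($stamp) / 2);

    return substr($stamp, 0, $half) . $random . substr($stamp, $half);
}

echo make_key();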
Stay JOLLY!
H
If you use your original method but add the username or email address in front of the password, it will always be unique if each user can only have one password.
You may be interested in this article which deals with the same issue: GUIDs are globally unique, but substrings of GUIDs aren't.
The goal of this algorithm is to use the combination of time and location ("space-time coordinates" for the relativity geeks out there) as the uniqueness key. However, timekeeping is not perfect, so there's a possibility that, for example, two GUIDs are generated in rapid succession from the same machine, so close to each other in time that the timestamp would be the same. That's where the uniquifier comes in.
I usually do it like this:
$this->password = '';
for ($i = 0; $i < 10; $i++)
{
    if ($i % 2 == 0)
        $this->password .= chr(rand(65, 90));  // A-Z
    if ($i % 3 == 0)
        $this->password .= chr(rand(97, 122)); // a-z
    if ($i % 4 == 0)
        $this->password .= chr(rand(48, 57));  // 0-9
}
I suppose there are some theoretical holes but I've never had an issue with duplication. I usually use it for temporary passwords (like after a password reset) and it works well enough for that.
As Frank Kreuger commented, go with a GUID generator.
Like this one
I'm still not seeing why the passwords have to be unique? What's the downside if 2 of your users have the same password?
This is assuming we're talking about passwords that are tied to userids, and not just unique identifiers. If that's what you're looking for, why not use GUIDs?
You might be interested in Steve Gibson's over-the-top-secure implementation of a password generator (no source, but he has a detailed description of how it works) at https://www.grc.com/passwords.htm.
The site creates huge 64-character passwords but, since they're completely random, you could easily take the first 8 (or however many) characters for a less secure but "as random as possible" password.
EDIT: from your later answers I see you need something more like a GUID than a password, so this probably isn't what you want...
I do believe that part of your issue is that you are trying to use a single function for two separate purposes... passwords and transaction IDs.
These really are two different problem areas, and it is not best to try to address them together.
I recently wanted a quick and simple random unique key so I did the following:
$ukey = dechex(time()) . crypt(time() . md5(microtime() . mt_rand(0, 100000)));
So, basically, I get the unix time in seconds and add a random md5 string generated from time + random number. It's not the best, but for low frequency requests it is pretty good. It's fast and works.
I did a test where I generated thousands of keys and then looked for repeats; at about 800 keys per second there were no repetitions, so not bad. I guess it ultimately depends on mt_rand().
I use it for a survey tracker where we get a submission rate of about 1000 surveys per minute... so for now (crosses fingers) there are no duplicates. Of course, the rate is not constant (we get the submissions at certain times of the day), so this is not fail-proof nor the best solution... the tip is to use an incremental value as part of the key (in my case, I used time(), but it could be better).
Ignoring the crypt part, which does not have much to do with creating a unique value, I usually use this one:
function GetUniqueValue()
{
    static $counter = 0; // initialized only the first time the function is called
    return strtr(microtime(), array('.' => '', ' ' => '')) . $counter++;
}
When called in the same process, $counter is increased, so the value is always unique within that process.
When called in different processes, you would have to be really unlucky to get two microtime() calls with the same value; consider that microtime() calls usually return different values even when called within the same script.
I usually do a random substring (randomizing how many chars, between 8 and 32, or fewer for user convenience) of the MD5 of some value I have received, or of the time, or of some combination. For more randomness I take the MD5 of some value (say, the last name), concatenate it with the time, MD5 it again, and then take the random substring. Yes, you could get equal passwords, but it's not very likely at all.
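For what it's worth, a rough sketch of that recipe (the seed value and length range are arbitrary):
// MD5 a seed value, append the time, MD5 again, then take a random-length substring.
function quick_password($seed)
{
    $first  = md5($seed);
    $second = md5($first . microtime());

    $length = mt_rand(8, 32);           // between 8 and 32 characters
    $start  = mt_rand(0, 32 - $length); // random window inside the 32-char hash
    return substr($second, $start, $length);
}

echo quick_password('Smith');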