Why is MD5'ing a UUID not a good idea?

Why is MD5'ing a UUID not a good idea? - php

PHP has a uniqid() function which generates a UUID of sorts.
In the usage examples, it shows the following:
$token = md5(uniqid());
But in the comments, someone says this:
Generating an MD5 from a unique ID is
naive and reduces much of the value of
unique IDs, as well as providing
significant (attackable) stricture on
the MD5 domain. That's a deeply
broken thing to do. The correct
approach is to use the unique ID on
its own; it's already geared for
non-collision.
Why is this true, if so? If an MD5 hash is (almost) unique for a unique ID, then what is wrong from md5'ing a uniqid?

A UUID is 128 bits wide and has uniqueness inherent to the way it is generated. A MD5 hash is 128 bits wide and doesn't guarantee uniquess, only a low probablity of collision. The MD5 hash is no smaller than the UUID so it doesn't help with storage.
If you know the hash is from a UUID it is much easier to attack because the domain of valid UUIDs is actually fairly predictable if you know anything about the machine geneerating them.
If you needed to provide a secure token then you would need to use a cryptographically secure random number generator.(1) UUIDs are not designed to be cryptographically secure, only guaranteed unique. A monotonically increasing sequence bounded by unique machine identifiers (typically a MAC) and time is still a perfectly valid UUID but highly predictable if you can reverse engineer a single UUID from the sequence of tokens.
The defining characteristic of a cryptographically secure PRNG is that the result of a given iteration does not contain enough information to infer the value of the next iteration - i.e. there is some hidden state in the generator that is not revealed in the number and cannot be inferred by examining a sequence of numbers from the PRNG. If you get into number theory you can find ways to guess the internal state of some PRNGs from a sequence of generated values. Mersenne Twister is an example of such a generator. It has hidden state that it used to get its long period but it is not cryptographically secure - you can take a fairly small sequence of numbers and use that to infer the internal state. Once you've done this you can use it to attack a cryptographic mechanism that depends on keeping that sequence a secret.

Note that uniqid() does not return a UUID, but a "unique" string based on the current time:
$ php -r 'echo uniqid("prefix_", true);'
prefix_4a8aaada61b0f0.86531181
If you do that multiple times, you will get very similar output strings and everyone who is familiar with uniqid() will recognize the source algorithm. That way it is pretty easy to predict the next IDs that will be generated.
The advantage of md5()-ing the output, along with an application-specific salt string or random number, is a way harder to guess string:
$ php -r 'echo md5(uniqid("prefix_", true));'
3dbb5221b203888fc0f41f5ef960f51b
Unlike plain uniqid(), this produces very different outputs every microsecond. Furthermore it does not reveil your "prefix salt" string, nor that you are using uniqid() under the hood. Without knowing the salt, it is very hard (consider it impossible) to guess the next ID.
In summary, I would disagree with the commentor's opinion and would always prefer the md5()-ed output over plain uniqid().

MD5ing a UUID is pointless because UUIDs are already unique and fixed length (short), properties that are some of the reasons that people often use MD5 to begin with. So I suppose it depends on what you plan on doing with the UUID, but in general a UUID has the same properties as some data that has been MD5'd, so why do both?

UUIDs are already unique, so there is no point in MD5'ing them anyway.
Regarding the security question, in general you can be attacked if the attacker can predict what the next unique ID will be you are about to generate. If it is known that you generate your unique IDs from UUIDs, the set of potential next unique IDs is much smaller, giving a better chance for a brute force attack.
This is especially true if the attacker can get a whole bunch of unique IDs from you, and that way guess your scheme of generating UUIDs.

Version 3 of UUIDs are already MD5'd, so there's no point in doing it again. However, I'm not sure what UUID version PHP uses.

As an aside, MD5 is actually obsolete and is not to be used in anything worth protecting - PHI, PII or PCI - from 2010 onwards. The US Feds have ennforced this and any entity non-compliant would be paying lots of $$$ in penalty.

Related

How easily will uniqid() with more entropy create a duplicate?

This might be an off topic question but i hope someone can answer this question.
Per how many nanoseconds, mili seconds or seconds does uniqid() with more entropy run the risk of creating a duplicate?
With reference to link below, uniqid will collide if two id are created in one milisecond. What about with more entropy?
(My goal is to use a small indexable alphanumeric string as document id at creation that can be created fast with minimum processor power without db interference.)
Answers here dont seem to provide any exact number:
How unique is uniqid?

From the source code, more_entropy adds nine random decimal digits, so you can expect a collision after 37,000 or so calls. (For how a billion turned into 37,000, see the birthday attack.) That of course ignores the fact that these digits are not actually random but generated by an LCG, and the same LCG is probably used in other places in the code, so the actual chance of collision is probably higher (by how much exactly, I have no idea).
Also worth noting that uniqid does not actually guarantee microsecond resolution as some PHP implementations (Windows, specifically) don't have access to a microsecond-precision clock.
In short, if you need a unique ID for anything security-sensitive, or collisions are costly, avoid uniqid. Otherwise, using it with more_entropy is probably fine (although the common pattern is to use uniqid(mt_rand(), true) to add even more extra entropy).

Unique token in CakePHP

I need to create truly unique token when inserting records in CakePHP. The table can contain millions of rows so I cant just base on some randomly generated strings. I do not want to use a microtime() as well, because there is, though very small probability that two records can be submitted exactly at the same moment.
Of course the best solution would be to use String::uuid(), but as from cakephp documentation
The uuid method is used to generate unique identifiers as per RFC 4122. The uuid is a 128bit string in the format of 485fc381-e790-47a3-9794-1337c0a8fe68.
So, as far as I understood it does not use cake's security salt for its generation. So, I decided to hash it by security component's hash function (or Auth Password function), because I need it to be unique and very, really very secure at the same time. But then I found the question, saying that it is not a good idea, but for php uniqid and md5.
Why is MD5'ing a UUID not a good idea?
And, also I think the string hashed by security component is much harder to guess - because, for example String::uuid() in for loop has an output like this
for ($i = 0; $i < 30; $i++) {
echo String::uuid()."<br>";
}
die;
// outputs
51f3dcda-c4fc-4141-aaaf-1378654d2d93
51f3dcda-d9b0-4c20-8d03-1378654d2d93
51f3dcda-e7c0-4ddf-b808-1378654d2d93
51f3dcda-f508-4482-852d-1378654d2d93
51f3dcda-01ec-4f24-83b1-1378654d2d93
51f3dcda-1060-49d2-adc0-1378654d2d93
51f3dcda-1da8-4cfe-abe4-1378654d2d93
51f3dcda-2af0-42f7-81a0-1378654d2d93
51f3dcda-3838-4879-b2c9-1378654d2d93
51f3dcda-451c-465a-a644-1378654d2d93
51f3dcda-5264-44b0-a883-1378654d2d93
So, after all the some part of the string is similar, but in case of using hash function the results are pretty different
echo Security::hash('stackoverflow1');
echo "<br>";
echo Security::hash('stackoverflow2');
die;
// outputs
e9a3fcb74b9a03c7a7ab8731053ab9fe5d2fe6bd
b1f95bdbef28db16f8d4f912391c22310ba3c2c2
So, the question is, can I after all hash the uuid() in Cake? Or what is the best secure way to get truly unique and hashed (better according to my security salt) secure token.
UPDATE
Saying secure token, I mean how difficult it is for guessing. UUID is really unique, but from the example above, some records have some similarity. But hashed results do not.
Thanks !!

I don't think you need to worry about the UUIDs overlapping.
To put these numbers into perspective, the annual risk of someone being hit by a meteorite is estimated to be one chance in 17 billion,[38] which means the probability is about 0.00000000006 (6 × 10−11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. Or, to put it another way, the probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs.
http://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates
Continue to use String::uuid() and rest easy :)

A UUID is unique
I need to create truly unique token when inserting records in cakphp
That is exactly what a UUID is. It is normally used in distributed systems to prevent collisions (multiple sources inserting data, possibly out of sync, into a datasource).
A UUID is not a security measure
I need it to be unique and very, really very secure at the same time
Not sure in what way hashing a uuid is supposed to enhance security - it won't. Relying on security by obscurity is more or less guaranteed to fail.
If your need is random tokens of some form - use a hash function (Hashing a uuid is simply hashing a random seed), if you need guaranteed-unique identifiers use UUIDs. They aren't the same thing and a UUID is a very poor mechanism of generating random, non-sequential "un-guessable" (or whatever the purpose is) strings.

Generating a random string suitable for cryptographic purposes was answered well here:
Secure random number generation in PHP
The code sample fills the string $pr_bits with random binary data, so the characters are unprintable. To use this in a URL, you could convert the binary data to printable characters a couple ways. None of them enhance the security but make them ready for URLs.
convert bytes to hex: bin2hex($pr_bits)
convert bytes to base64: base64_encode($pr_bits)
hash the bytes (because the output is conveniently in hex, not for added security): string hash ('md5' , $pr_bits)
I include the last one because you will see people use hash functions for other reasons, like to guarantee the output is 16bytes/128bits for md5. In PHP people use it to convert a value into HEX.

I have come up with the following solution
to use a string as a result of concatenating current time in microseconds and random string's hash
$timeStr = str_replace("0.", "", microtime());
$timeStr = str_replace(" ", "", $timeStr);
echo Security::hash('random string').'_'.$timeStr;
// 5ffd3b852ccdd448809abb172e19bbb9c01a43a4_796473001379403705
So, the first part(hash) of the string will contribute for the unguessability of the token, and the second part will guarantee its uniquenes.
Hope, this will help someone.

Is it wrong to use a hash for a unique ID?

I want to use a unique ID generated by PHP in a database table that will likely never have more than 10,000 records. I don't want the time of creation to be visible or use a purely numeric value so I am using:
sha1(uniqid(mt_rand(), true))
Is it wrong to use a hash for a unique ID? Don't all hashes lead to collisions or are the chances so remote that they should not be considered in this case?
A further point: if the number of characters to be hashed is less than the number of characters in a sha1 hash, won't it always be unique?

If you have 2 keys you will have a theoretical best case scenario of 1 in 2 ^ X probability of a collision, where X is the number of bits in your hashing algorithm. 'Best case' because the input usually will be ASCII which doesn't utilize the full charset, plus the hashing functions do not distribute perfectly, so they will collide more often than the theoretical max in real life.
To answer your final question:
A further point: if the number of characters to be hashed is less than
the number of characters in a sha1 hash, won't it always be unique?
Yeah that's true-sorta. But you would have another problem of generating unique keys of that size. The easiest way is usually a checksum, so just choose a large enough digest that the collision space will be small enough for your comfort.
As #wayne suggests, a popular approach is to concatenate microtime() to your random salt (and base64_encode to raise the entropy).

How horrible would it be if two ended up the same? Murphy's Law applies - if a million to one, or even a 100,000:1 chance is acceptable, then go right ahead! The real chance is much, much smaller - but if your system will explode if it happens then your design flaw must be addressed first. Then proceed with confidence.
Here is a question/answer of what the probabilities really are: Probability of SHA1 Collisions

Use sha1(time()) in stead, then you remove the random possibility of a repeating hash for as long as time can be represented shorter than the sha1 hash. (likely longer than you fill find a working php parser ;))

Computer random isn't actually random, you know?
The only true random that you can obtain from a computer, supposing you are on a Unix environment is from /dev/random, but this is a blocking operation that depends on user interactions like moving a mouse or typing on keyboard. Reading from /dev/urandom is less safe, but it's probably better thang using just ASCII characters and gives you instantaneous response.

sha1($ipAddress.time())
Causes it's impossible for anyone to use same IP address same time

Entropy in CSRF Tokens and Nonces

I am making a classified ads site with Zend Framework (for portfolio purposes, yes I know the world doesn't have room for "yet another Craigslist clone"). I am trying to implement the ability to post/edit/delete without ever needing an account.
To do this, I feel like I need to have a Nonce generated upon post submission and stored in the database. Then email a link to the user which makes a GET request for the delete, like this:
http://www.somesite.com/post/delete/?id=123&nonce=2JDXS93JFKS8204HJTHSLDH230945HSLDF
Only the user has this unique key or nonce, and upon submission I check the database under the post's ID and ensure the nonce matches prior to deleting.
My issue is how secure the nonce actually is. If I use Zend Framework's Zend_Form_Element_Hash, it creates the hash like this:
protected function _generateHash()
{
$this->_hash = md5(
mt_rand(1,1000000)
. $this->getSalt()
. $this->getName()
. mt_rand(1,1000000)
);
$this->setValue($this->_hash);
}
In reading about mt_rand(), one commenter said "This function has limited entrophy. So, if you want to create random string, it will produce only about 2 billion different strings, no matter the length of the string. This can be serous security issue if you are using such strings for session indentifiers, passwords etc."
Due to the lifetime of the nonce/token in the application, which could be days or weeks before user chooses to delete post, I think more than enough time would be given for a potential hack.
I realize mt_rand() is a huge upgrade from rand() as seen in this visual mapping pixels with rand on the left, and mt_rand on the right. But is it enough? What makes "2 billion different strings" a security issue?
And ultimately, how can I increase the entropy of a nonce/token/hash?

For such security it's not only important how long your output is. It counts how much randomness you've used to create it.
For mt_rand() the source of randomness is its seed and state (number of times you've used it since it was seeded). More mt_rand() calls will just give you more rehasing of the same randomness source (no new entropy).
mt_rand()'s seed is only 32-bit (anything less than 128bit makes cryptographers suspicious ;)
Strength of a keys with 32-bits of entropy is 4 billion divided by (roughly) number of keys you'll generate (e.g. after 100K uses there will be ~1:43000 chance to guess any valid key, which approaches practical brute-forcing).
You're adding salt to this, which makes it much stronger, because in addition to guessing the seed attacker would have to know the salt as well, so if the salt is long, then overall the key may be quite strong despite "low" entropy.
To increase entropy you need to add more random stuff (even slightly random is OK too, just gives less bits) from different sources than mt_rand: microtime(), amount of memory used, process ID... or just use /dev/random, which collects all entropy it can get.
(edit: uniqid() has weak entropy, so it won't help here)

The Zend hash generating code above's input for the md5() hashing function has 1,000,000 X 1,000,000 different possibilities. md5() has 32^16 (1208925819614629174706176) possible outcomes no matter what the input is. On average, the hacker would need to send 500,000,000,000 requests to your server in order to guess the right nonce.
At 100 requests per minute, that's about 3472222 days to hack.

How unique is uniqid?

This question isn't really a problem looking for a solution, it's more just a matter of simple curiosity. The PHP uniqid function has a more entropy flag, to make the output "more unique". This got me wondering, just how likely is it for this function to produce the same result more than once when more_entropy is true, versus when it isn't. In other words, how unique is uniqid when more_entropy is enabled, versus when it is disabled? Are there any drawbacks to having more_entropy enabled all the time?

Update, March 2014:
Firstly, it is important to note that uniqid is a bit of a misnomer as it doesnt guarantee a unique ID.
Per the PHP documentation:
WARNING!
This function does not create random nor unpredictable string. This
function must not be used for security purposes. Use cryptographically
secure random function/generator and cryptographically secure hash
functions to create unpredictable secure ID.
And
This function does not generate cryptographically secure tokens, in
fact without being passed any additional parameters the return value
is little different from microtime(). If you need to generate
cryptographically secure tokens use openssl_random_pseudo_bytes().
Setting more-entropy to true generates a more unique value, however the execution time is longer (though to a tiny degree), according to the docs:
If set to TRUE, uniqid() will add additional entropy (using the
combined linear congruential generator) at the end of the return
value, which increases the likelihood that the result will be unique.
Note the line increases the likelihood that the result will be unique and not that is will guarantee uniqueness.
You can 'endlessly' strive for uniqueness, up to a point, and enhance using any number of encryption routines, adding salts and the like- it depends on the purpose.
I'd recommend looking at the comments on the main PHP topic, notably:
http://www.php.net/manual/en/function.uniqid.php#96898
http://www.php.net/manual/en/function.uniqid.php#96549
http://www.php.net/manual/en/function.uniqid.php#95001
What I'd recommend is working out why you need uniqueness, is it for security (i.e. to add to an encryption/scrambling routine)? Also, How unique does it need to be? Finally, look at the speed consideration. Suitability will change with the underlying considerations.

Things are only unique if you check that they don't exist already. It doesn't matter what function you use to generate a 'random' string, or ID - if you don't double check that it isn't a duplicate, then there's always that chance.. ;)
While uniqid is based on the current time, the cautionary note above still applies - it just depends on where you will be using these "unique IDs". The clue to all this is where it says "more unique". Unique is unique is unique. How you can have something which is more or less unique, is a bit confusing to me!
Checking as above, and combining all this stuff will let you end up with something approaching uniqueness, but its all relative to where the keys will be used and the context. Hope that helps!

From the discussions about the function on the PHP manual site:
As others below note, without prefix
and without "added entropy", this
function simply returns the UNIX
timestamp with added microsecond
counter as a hex number; it's more or
less just microtime(), in hexit form.
[...]
Also worth to note is that since microtime() only works on systems that have gettimeofday() > present, which Windows natively DOES NOT, uniqid() might yield just the single-second-resolution UNIX timestamp in a Windows environment.
In other words without "more_entropy", the function is absolutely horrible and should never be used, period. Accoding to the documentation, the flag will use a "combined linear congruential generator" to "add entropy". Well, that's a pretty weak RNG. So I'd skip this function entirely and use something based on mt_rand with a good seed for things that are not security-relevant, and SHA-256 for things that are.

Without the more_unique flag, it returns the unix timestamp with a microsecond counter, therefore if two calls get made at the same microsecond then they will return the same 'unique' id.
From there it is a question of how likely that is. The answer is, not very, but not to a discountable degree. If you need a unique id and you generate them often (or work with data generated elsewhere), don't count on it to be absolutely unique.

The relevant bit from the source code is
if (more_entropy) {
uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec, usec, php_combined_lcg() * 10);
} else {
uniqid = strpprintf(0, "%s%08x%05x", prefix, sec, usec);
}
So more_entropy adds nine somewhat random decimal digits (php_combined_lcg() returns a value in (0,1)) - that's 29.9 bits of entropy, tops (in reality probably less as LCG is not a cryptographically secure pseudorandom number generator).

After reading the source code of uniqueId, it's clear the way it works is to convert the time in microseconds from 1970-01-01 00:00:00 into an ID. It also waits until a microsecond has passed.
That means in the following code:
$uniqueId = uniqid();
$uniqueId1 = uniqid();
You can be certain that $uniqueId != $uniqueId1, even without the more_entropy flag, as each ID will always be generated from a different microsecond.
If the ID's are generated on a different server or possibly even the same server but a different thread, then there is a chance the time in microseconds is the same, therefore uniqueid may not be unique. If this is the case, then you use the more_entropy flag for an extra 29.9 bits more of entropy. The chance of a collision would now be so highly improbably, that its probably not worth even checking to make sure the ID already exists.
If you are generating the ID's only on a single server without multithreading php, then there's no point using the more_entropy flag, otherwise use it. If you need a cryptographically secure ID, then you should use a decent 256 bit RNG instead.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.