How easily will uniqid() with more entropy create a duplicate?

How easily will uniqid() with more entropy create a duplicate? - php

This might be an off topic question but i hope someone can answer this question.
Per how many nanoseconds, mili seconds or seconds does uniqid() with more entropy run the risk of creating a duplicate?
With reference to link below, uniqid will collide if two id are created in one milisecond. What about with more entropy?
(My goal is to use a small indexable alphanumeric string as document id at creation that can be created fast with minimum processor power without db interference.)
Answers here dont seem to provide any exact number:
How unique is uniqid?

From the source code, more_entropy adds nine random decimal digits, so you can expect a collision after 37,000 or so calls. (For how a billion turned into 37,000, see the birthday attack.) That of course ignores the fact that these digits are not actually random but generated by an LCG, and the same LCG is probably used in other places in the code, so the actual chance of collision is probably higher (by how much exactly, I have no idea).
Also worth noting that uniqid does not actually guarantee microsecond resolution as some PHP implementations (Windows, specifically) don't have access to a microsecond-precision clock.
In short, if you need a unique ID for anything security-sensitive, or collisions are costly, avoid uniqid. Otherwise, using it with more_entropy is probably fine (although the common pattern is to use uniqid(mt_rand(), true) to add even more extra entropy).

Related

Is it wrong to use a hash for a unique ID?

I want to use a unique ID generated by PHP in a database table that will likely never have more than 10,000 records. I don't want the time of creation to be visible or use a purely numeric value so I am using:
sha1(uniqid(mt_rand(), true))
Is it wrong to use a hash for a unique ID? Don't all hashes lead to collisions or are the chances so remote that they should not be considered in this case?
A further point: if the number of characters to be hashed is less than the number of characters in a sha1 hash, won't it always be unique?

If you have 2 keys you will have a theoretical best case scenario of 1 in 2 ^ X probability of a collision, where X is the number of bits in your hashing algorithm. 'Best case' because the input usually will be ASCII which doesn't utilize the full charset, plus the hashing functions do not distribute perfectly, so they will collide more often than the theoretical max in real life.
To answer your final question:
A further point: if the number of characters to be hashed is less than
the number of characters in a sha1 hash, won't it always be unique?
Yeah that's true-sorta. But you would have another problem of generating unique keys of that size. The easiest way is usually a checksum, so just choose a large enough digest that the collision space will be small enough for your comfort.
As #wayne suggests, a popular approach is to concatenate microtime() to your random salt (and base64_encode to raise the entropy).

How horrible would it be if two ended up the same? Murphy's Law applies - if a million to one, or even a 100,000:1 chance is acceptable, then go right ahead! The real chance is much, much smaller - but if your system will explode if it happens then your design flaw must be addressed first. Then proceed with confidence.
Here is a question/answer of what the probabilities really are: Probability of SHA1 Collisions

Use sha1(time()) in stead, then you remove the random possibility of a repeating hash for as long as time can be represented shorter than the sha1 hash. (likely longer than you fill find a working php parser ;))

Computer random isn't actually random, you know?
The only true random that you can obtain from a computer, supposing you are on a Unix environment is from /dev/random, but this is a blocking operation that depends on user interactions like moving a mouse or typing on keyboard. Reading from /dev/urandom is less safe, but it's probably better thang using just ASCII characters and gives you instantaneous response.

sha1($ipAddress.time())
Causes it's impossible for anyone to use same IP address same time

Entropy in CSRF Tokens and Nonces

I am making a classified ads site with Zend Framework (for portfolio purposes, yes I know the world doesn't have room for "yet another Craigslist clone"). I am trying to implement the ability to post/edit/delete without ever needing an account.
To do this, I feel like I need to have a Nonce generated upon post submission and stored in the database. Then email a link to the user which makes a GET request for the delete, like this:
http://www.somesite.com/post/delete/?id=123&nonce=2JDXS93JFKS8204HJTHSLDH230945HSLDF
Only the user has this unique key or nonce, and upon submission I check the database under the post's ID and ensure the nonce matches prior to deleting.
My issue is how secure the nonce actually is. If I use Zend Framework's Zend_Form_Element_Hash, it creates the hash like this:
protected function _generateHash()
{
$this->_hash = md5(
mt_rand(1,1000000)
. $this->getSalt()
. $this->getName()
. mt_rand(1,1000000)
);
$this->setValue($this->_hash);
}
In reading about mt_rand(), one commenter said "This function has limited entrophy. So, if you want to create random string, it will produce only about 2 billion different strings, no matter the length of the string. This can be serous security issue if you are using such strings for session indentifiers, passwords etc."
Due to the lifetime of the nonce/token in the application, which could be days or weeks before user chooses to delete post, I think more than enough time would be given for a potential hack.
I realize mt_rand() is a huge upgrade from rand() as seen in this visual mapping pixels with rand on the left, and mt_rand on the right. But is it enough? What makes "2 billion different strings" a security issue?
And ultimately, how can I increase the entropy of a nonce/token/hash?

For such security it's not only important how long your output is. It counts how much randomness you've used to create it.
For mt_rand() the source of randomness is its seed and state (number of times you've used it since it was seeded). More mt_rand() calls will just give you more rehasing of the same randomness source (no new entropy).
mt_rand()'s seed is only 32-bit (anything less than 128bit makes cryptographers suspicious ;)
Strength of a keys with 32-bits of entropy is 4 billion divided by (roughly) number of keys you'll generate (e.g. after 100K uses there will be ~1:43000 chance to guess any valid key, which approaches practical brute-forcing).
You're adding salt to this, which makes it much stronger, because in addition to guessing the seed attacker would have to know the salt as well, so if the salt is long, then overall the key may be quite strong despite "low" entropy.
To increase entropy you need to add more random stuff (even slightly random is OK too, just gives less bits) from different sources than mt_rand: microtime(), amount of memory used, process ID... or just use /dev/random, which collects all entropy it can get.
(edit: uniqid() has weak entropy, so it won't help here)

The Zend hash generating code above's input for the md5() hashing function has 1,000,000 X 1,000,000 different possibilities. md5() has 32^16 (1208925819614629174706176) possible outcomes no matter what the input is. On average, the hacker would need to send 500,000,000,000 requests to your server in order to guess the right nonce.
At 100 requests per minute, that's about 3472222 days to hack.

How unique is uniqid?

This question isn't really a problem looking for a solution, it's more just a matter of simple curiosity. The PHP uniqid function has a more entropy flag, to make the output "more unique". This got me wondering, just how likely is it for this function to produce the same result more than once when more_entropy is true, versus when it isn't. In other words, how unique is uniqid when more_entropy is enabled, versus when it is disabled? Are there any drawbacks to having more_entropy enabled all the time?

Update, March 2014:
Firstly, it is important to note that uniqid is a bit of a misnomer as it doesnt guarantee a unique ID.
Per the PHP documentation:
WARNING!
This function does not create random nor unpredictable string. This
function must not be used for security purposes. Use cryptographically
secure random function/generator and cryptographically secure hash
functions to create unpredictable secure ID.
And
This function does not generate cryptographically secure tokens, in
fact without being passed any additional parameters the return value
is little different from microtime(). If you need to generate
cryptographically secure tokens use openssl_random_pseudo_bytes().
Setting more-entropy to true generates a more unique value, however the execution time is longer (though to a tiny degree), according to the docs:
If set to TRUE, uniqid() will add additional entropy (using the
combined linear congruential generator) at the end of the return
value, which increases the likelihood that the result will be unique.
Note the line increases the likelihood that the result will be unique and not that is will guarantee uniqueness.
You can 'endlessly' strive for uniqueness, up to a point, and enhance using any number of encryption routines, adding salts and the like- it depends on the purpose.
I'd recommend looking at the comments on the main PHP topic, notably:
http://www.php.net/manual/en/function.uniqid.php#96898
http://www.php.net/manual/en/function.uniqid.php#96549
http://www.php.net/manual/en/function.uniqid.php#95001
What I'd recommend is working out why you need uniqueness, is it for security (i.e. to add to an encryption/scrambling routine)? Also, How unique does it need to be? Finally, look at the speed consideration. Suitability will change with the underlying considerations.

Things are only unique if you check that they don't exist already. It doesn't matter what function you use to generate a 'random' string, or ID - if you don't double check that it isn't a duplicate, then there's always that chance.. ;)
While uniqid is based on the current time, the cautionary note above still applies - it just depends on where you will be using these "unique IDs". The clue to all this is where it says "more unique". Unique is unique is unique. How you can have something which is more or less unique, is a bit confusing to me!
Checking as above, and combining all this stuff will let you end up with something approaching uniqueness, but its all relative to where the keys will be used and the context. Hope that helps!

From the discussions about the function on the PHP manual site:
As others below note, without prefix
and without "added entropy", this
function simply returns the UNIX
timestamp with added microsecond
counter as a hex number; it's more or
less just microtime(), in hexit form.
[...]
Also worth to note is that since microtime() only works on systems that have gettimeofday() > present, which Windows natively DOES NOT, uniqid() might yield just the single-second-resolution UNIX timestamp in a Windows environment.
In other words without "more_entropy", the function is absolutely horrible and should never be used, period. Accoding to the documentation, the flag will use a "combined linear congruential generator" to "add entropy". Well, that's a pretty weak RNG. So I'd skip this function entirely and use something based on mt_rand with a good seed for things that are not security-relevant, and SHA-256 for things that are.

Without the more_unique flag, it returns the unix timestamp with a microsecond counter, therefore if two calls get made at the same microsecond then they will return the same 'unique' id.
From there it is a question of how likely that is. The answer is, not very, but not to a discountable degree. If you need a unique id and you generate them often (or work with data generated elsewhere), don't count on it to be absolutely unique.

The relevant bit from the source code is
if (more_entropy) {
uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec, usec, php_combined_lcg() * 10);
} else {
uniqid = strpprintf(0, "%s%08x%05x", prefix, sec, usec);
}
So more_entropy adds nine somewhat random decimal digits (php_combined_lcg() returns a value in (0,1)) - that's 29.9 bits of entropy, tops (in reality probably less as LCG is not a cryptographically secure pseudorandom number generator).

After reading the source code of uniqueId, it's clear the way it works is to convert the time in microseconds from 1970-01-01 00:00:00 into an ID. It also waits until a microsecond has passed.
That means in the following code:
$uniqueId = uniqid();
$uniqueId1 = uniqid();
You can be certain that $uniqueId != $uniqueId1, even without the more_entropy flag, as each ID will always be generated from a different microsecond.
If the ID's are generated on a different server or possibly even the same server but a different thread, then there is a chance the time in microseconds is the same, therefore uniqueid may not be unique. If this is the case, then you use the more_entropy flag for an extra 29.9 bits more of entropy. The chance of a collision would now be so highly improbably, that its probably not worth even checking to make sure the ID already exists.
If you are generating the ID's only on a single server without multithreading php, then there's no point using the more_entropy flag, otherwise use it. If you need a cryptographically secure ID, then you should use a decent 256 bit RNG instead.

Why is MD5'ing a UUID not a good idea?

PHP has a uniqid() function which generates a UUID of sorts.
In the usage examples, it shows the following:
$token = md5(uniqid());
But in the comments, someone says this:
Generating an MD5 from a unique ID is
naive and reduces much of the value of
unique IDs, as well as providing
significant (attackable) stricture on
the MD5 domain. That's a deeply
broken thing to do. The correct
approach is to use the unique ID on
its own; it's already geared for
non-collision.
Why is this true, if so? If an MD5 hash is (almost) unique for a unique ID, then what is wrong from md5'ing a uniqid?

A UUID is 128 bits wide and has uniqueness inherent to the way it is generated. A MD5 hash is 128 bits wide and doesn't guarantee uniquess, only a low probablity of collision. The MD5 hash is no smaller than the UUID so it doesn't help with storage.
If you know the hash is from a UUID it is much easier to attack because the domain of valid UUIDs is actually fairly predictable if you know anything about the machine geneerating them.
If you needed to provide a secure token then you would need to use a cryptographically secure random number generator.(1) UUIDs are not designed to be cryptographically secure, only guaranteed unique. A monotonically increasing sequence bounded by unique machine identifiers (typically a MAC) and time is still a perfectly valid UUID but highly predictable if you can reverse engineer a single UUID from the sequence of tokens.
The defining characteristic of a cryptographically secure PRNG is that the result of a given iteration does not contain enough information to infer the value of the next iteration - i.e. there is some hidden state in the generator that is not revealed in the number and cannot be inferred by examining a sequence of numbers from the PRNG. If you get into number theory you can find ways to guess the internal state of some PRNGs from a sequence of generated values. Mersenne Twister is an example of such a generator. It has hidden state that it used to get its long period but it is not cryptographically secure - you can take a fairly small sequence of numbers and use that to infer the internal state. Once you've done this you can use it to attack a cryptographic mechanism that depends on keeping that sequence a secret.

Note that uniqid() does not return a UUID, but a "unique" string based on the current time:
$ php -r 'echo uniqid("prefix_", true);'
prefix_4a8aaada61b0f0.86531181
If you do that multiple times, you will get very similar output strings and everyone who is familiar with uniqid() will recognize the source algorithm. That way it is pretty easy to predict the next IDs that will be generated.
The advantage of md5()-ing the output, along with an application-specific salt string or random number, is a way harder to guess string:
$ php -r 'echo md5(uniqid("prefix_", true));'
3dbb5221b203888fc0f41f5ef960f51b
Unlike plain uniqid(), this produces very different outputs every microsecond. Furthermore it does not reveil your "prefix salt" string, nor that you are using uniqid() under the hood. Without knowing the salt, it is very hard (consider it impossible) to guess the next ID.
In summary, I would disagree with the commentor's opinion and would always prefer the md5()-ed output over plain uniqid().

MD5ing a UUID is pointless because UUIDs are already unique and fixed length (short), properties that are some of the reasons that people often use MD5 to begin with. So I suppose it depends on what you plan on doing with the UUID, but in general a UUID has the same properties as some data that has been MD5'd, so why do both?

UUIDs are already unique, so there is no point in MD5'ing them anyway.
Regarding the security question, in general you can be attacked if the attacker can predict what the next unique ID will be you are about to generate. If it is known that you generate your unique IDs from UUIDs, the set of potential next unique IDs is much smaller, giving a better chance for a brute force attack.
This is especially true if the attacker can get a whole bunch of unique IDs from you, and that way guess your scheme of generating UUIDs.

Version 3 of UUIDs are already MD5'd, so there's no point in doing it again. However, I'm not sure what UUID version PHP uses.

As an aside, MD5 is actually obsolete and is not to be used in anything worth protecting - PHI, PII or PCI - from 2010 onwards. The US Feds have ennforced this and any entity non-compliant would be paying lots of $$$ in penalty.

Unique key generation

I looking for a way, specifically in PHP that I will be guaranteed to always get a unique key.
I have done the following:
strtolower(substr(crypt(time()), 0, 7));
But I have found that once in a while I end up with a duplicate key (rarely, but often enough).
I have also thought of doing:
strtolower(substr(crypt(uniqid(rand(), true)), 0, 7));
But according to the PHP website, uniqid() could, if uniqid() is called twice in the same microsecond, it could generate the same key. I'm thinking that the addition of rand() that it rarely would, but still possible.
After the lines mentioned above I am also remove characters such as L and O so it's less confusing for the user. This maybe part of the cause for the duplicates, but still necessary.
One option I have a thought of is creating a website that will generate the key, storing it in a database, ensuring it's completely unique.
Any other thoughts? Are there any websites out there that already do this that have some kind of API or just return the key. I found http://userident.com but I'm not sure if the keys will be completely unique.
This needs to run in the background without any user input.

There are only 3 ways to generate unique values, rather they be passwords, user IDs, etc.:
Use an effective GUID generator - these are long and cannot be shrunk. If you only use part you FAIL.
At least part of the number is sequentially generated off of a single sequence. You can add fluff or encoding to make it look less sequential. Advantage is they start short - disadvantage is they require a single source. The work around for the single source limitation is to have numbered sources, so you include the [source #] + [seq #] and then each source can generate its own sequence.
Generate them via some other means and then check them against the single history of previously generated values.
Any other method is not guaranteed. Keep in mind, fundamentally you are generating a binary number (it is a computer), but then you can encode it in Hexadecimal, Decimal, Base64, or a word list. Pick an encoding that fits your usage. Usually for user entered data you want some variation of Base32 (which you hinted at).
Note about GUIDS: They gain their strength of uniqueness from their length and the method used to generate them. Anything less than 128-bits is not secure. Beyond random number generation there are characteristics that go into a GUID to make it more unique. Keep in mind they are only practically unique, not completely unique. It is possible, although practically impossible to have a duplicate.
Updated Note about GUIDS: Since writing this I learned that many GUID generators use a cryptographically secure random number generator (difficult or impossible to predict the next number generated, and a not likely to repeat). There are actually 5 different UUID algorithms. Algorithm 4 is what Microsoft currently uses for the Windows GUID generation API. A GUID is Microsoft's implementation of the UUID standard.
Update: If you want 7 to 16 characters then you need to use either method 2 or 3.
Bottom line: Frankly there is no such thing as completely unique. Even if you went with a sequential generator you would eventually run out of storage using all the atoms in the universe, thus looping back on yourself and repeating. Your only hope would be the heat death of the universe before reaching that point.
Even the best random number generator has a possibility of repeating equal to the total size of the random number you are generating. Take a quarter for example. It is a completely random bit generator, and its odds of repeating are 1 in 2.
So it all comes down to your threshold of uniqueness. You can have 100% uniqueness in 8 digits for 1,099,511,627,776 numbers by using a sequence and then base32 encoding it. Any other method that does not involve checking against a list of past numbers only has odds equal to n/1,099,511,627,776 (where n=number of previous numbers generated) of not being unique.

Any algorithm will result in duplicates.
Therefore, might I suggest that you use your existing algorithm* and simply check for duplicates?
*Slight addition: If uniqid() can be non-unique based on time, also include a global counter that you increment after every invocation. That way something is different even in the same microsecond.

Without writing the code, my logic would be:
Generate a random string from whatever acceptable characters you like.
Then add half the date stamp (partial seconds and all) to the front and the other half to the end (or somewhere in the middle if you prefer).
Stay JOLLY!
H

If you use your original method, but add the username or emailaddress in front of the password, it will always be unique if each user only can have 1 password.

You may be interested in this article which deals with the same issue: GUIDs are globally unique, but substrings of GUIDs aren't.
The goal of this algorithm is to use the combination of time and location ("space-time coordinates" for the relativity geeks out there) as the uniqueness key. However, timekeeping is not perfect, so there's a possibility that, for example, two GUIDs are generated in rapid succession from the same machine, so close to each other in time that the timestamp would be the same. That's where the uniquifier comes in.

I usually do it like this:
$this->password = '';
for($i=0; $i<10; $i++)
{
if($i%2 == 0)
$this->password .= chr(rand(65,90));
if($i%3 == 0)
$this->password .= chr(rand(97,122));
if($i%4 == 0)
$this->password .= chr(rand(48,57));
}
I suppose there are some theoretical holes but I've never had an issue with duplication. I usually use it for temporary passwords (like after a password reset) and it works well enough for that.

As Frank Kreuger commented, go with a GUID generator.
Like this one

I'm still not seeing why the passwords have to be unique? What's the downside if 2 of your users have the same password?
This is assuming we're talking about passwords that are tied to userids, and not just unique identifiers. If that's what you're looking for, why not use GUIDs?

You might be interested in Steve Gibson's over-the-top-secure implementation of a password generator (no source, but he has a detailed description of how it works) at https://www.grc.com/passwords.htm.
The site creates huge 64-character passwords but, since they're completely random, you could easily take the first 8 (or however many) characters for a less secure but "as random as possible" password.
EDIT: from your later answers I see you need something more like a GUID than a password, so this probably isn't what you want...

I do believe that part of your issue is that you are trying to us a singular function for two separate uses... passwords and transaction_id
these really are two different problem areas and it really is not best to try to address them together.

I recently wanted a quick and simple random unique key so I did the following:
$ukey = dechex(time()) . crypt( time() . md5(microtime() + mt_rand(0, 100000)) );
So, basically, I get the unix time in seconds and add a random md5 string generated from time + random number. It's not the best, but for low frequency requests it is pretty good. It's fast and works.
I did a test where I'd generate thousands of keys and then look for repeats, and having about 800 keys per second there were no repetitions, so not bad. I guess it totally depends on mt_rand()
I use it for a survey tracker where we get a submission rate of about 1000 surveys per minute... so for now (crosses fingers) there are no duplicates. Of course, the rate is not constant (we get the submissions at certain times of the day) so this is not fail proof nor the best solution... the tip is using an incremental value as part of the key (in my case, I used time(), but could be better).

Ingoring the crypting part that does not have much to do with creating a unique value I usually use this one:
function GetUniqueValue()
{
static $counter = 0; //initalized only 1st time function is called
return strtr(microtime(), array('.' => '', ' ' => '')) . $counter++;
}
When called in same process $counter is increased so value is always unique in same process.
When called in different processes you must be really unlucky to get 2 microtime() call with the same values, think that microtime() calls usually have different values also when called in same script.

I usually do a random substring (randomize how many chars between 8 an 32, or less for user convenience) or the MD5 of some value I have gotten in, or the time, or some combination. For more randomness I do MD5 of come value (say last name) concatenate that with the time, MD5 it again, then take the random substring. Yes, you could get equal passwords, but its not very likely at all.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.