Unique key generation

Unique key generation - php

I am looking for a way, specifically in PHP, to be guaranteed that I always get a unique key.
I have done the following:
strtolower(substr(crypt(time()), 0, 7));
But I have found that once in a while I end up with a duplicate key (rarely, but often enough).
I have also thought of doing:
strtolower(substr(crypt(uniqid(rand(), true)), 0, 7));
But according to the PHP website, uniqid() could generate the same key if it is called twice in the same microsecond. I'm thinking that with the addition of rand() it rarely would, but it is still possible.
After the lines mentioned above I also remove characters such as L and O so it's less confusing for the user. This may be part of the cause of the duplicates, but it is still necessary.
One option I have thought of is creating a website that will generate the key and store it in a database, ensuring it's completely unique.
Any other thoughts? Are there any websites out there that already do this and have some kind of API or just return the key? I found http://userident.com but I'm not sure if the keys will be completely unique.
This needs to run in the background without any user input.

There are only 3 ways to generate unique values, whether they be passwords, user IDs, etc.:
1. Use an effective GUID generator. These are long and cannot be shrunk. If you only use part you FAIL.
2. At least part of the number is sequentially generated off of a single sequence. You can add fluff or encoding to make it look less sequential. The advantage is that they start short; the disadvantage is that they require a single source. The workaround for the single source limitation is to have numbered sources, so you include the [source #] + [seq #] and then each source can generate its own sequence.
3. Generate them via some other means and then check them against the single history of previously generated values.
Any other method is not guaranteed. Keep in mind, fundamentally you are generating a binary number (it is a computer), but then you can encode it in Hexadecimal, Decimal, Base64, or a word list. Pick an encoding that fits your usage. Usually for user entered data you want some variation of Base32 (which you hinted at).
Note about GUIDS: They gain their strength of uniqueness from their length and the method used to generate them. Anything less than 128-bits is not secure. Beyond random number generation there are characteristics that go into a GUID to make it more unique. Keep in mind they are only practically unique, not completely unique. It is possible, although practically impossible to have a duplicate.
Updated Note about GUIDS: Since writing this I learned that many GUID generators use a cryptographically secure random number generator (difficult or impossible to predict the next number generated, and not likely to repeat). There are actually 5 different UUID algorithms. Algorithm 4 is what Microsoft currently uses for the Windows GUID generation API. A GUID is Microsoft's implementation of the UUID standard.
Update: If you want 7 to 16 characters then you need to use either method 2 or 3.
Bottom line: Frankly there is no such thing as completely unique. Even if you went with a sequential generator you would eventually run out of storage using all the atoms in the universe, thus looping back on yourself and repeating. Your only hope would be the heat death of the universe before reaching that point.
Even the best random number generator has a possibility of repeating equal to the total size of the random number you are generating. Take a quarter for example. It is a completely random bit generator, and its odds of repeating are 1 in 2.
So it all comes down to your threshold of uniqueness. You can have 100% uniqueness in 8 digits for 1,099,511,627,776 numbers by using a sequence and then base32 encoding it. Any other method that does not involve checking against a list of past numbers only has odds equal to n/1,099,511,627,776 (where n=number of previous numbers generated) of not being unique.
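The "sequence, then base32" option can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the function name `encodeSequence` and the alphabet (32 symbols with the look-alikes I, L, O and U left out) are assumptions, and the sequence number is assumed to come from a single source such as an auto-increment column.

```php
<?php
// Hypothetical alphabet: 32 symbols, look-alikes I, L, O and U left out.
const KEY_ALPHABET = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';

/**
 * Encode a sequence number as a fixed-width base32 key.
 * Distinct inputs below 32^$length always give distinct keys.
 */
function encodeSequence(int $n, int $length = 8): string
{
    if ($n < 0 || $n >= 32 ** $length) {
        throw new RangeException('sequence number does not fit in the key length');
    }
    $key = '';
    for ($i = 0; $i < $length; $i++) {
        $key = KEY_ALPHABET[$n % 32] . $key; // prepend the lowest base32 digit
        $n = intdiv($n, 32);
    }
    return $key;
}

echo encodeSequence(0), "\n";  // 00000000
echo encodeSequence(33), "\n"; // 00000011
```

Because the mapping is one-to-one, uniqueness comes entirely from the sequence; the encoding only changes how the number looks.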

Any algorithm will result in duplicates.
Therefore, might I suggest that you use your existing algorithm* and simply check for duplicates?
*Slight addition: If uniqid() can be non-unique based on time, also include a global counter that you increment after every invocation. That way something is different even in the same microsecond.
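A generate-and-check loop might look something like the sketch below. The `$seen` array is a hypothetical stand-in for the table of keys already issued, and `generateUniqueKey` and its alphabet are illustrative names, not part of the original code. `random_int()` requires PHP 7 or later.

```php
<?php
/**
 * Draw random keys until one is not already in $seen.
 * $seen is a hypothetical stand-in for the store of issued keys.
 */
function generateUniqueKey(array &$seen, int $length = 7): string
{
    $alphabet = 'abcdefghijkmnpqrstuvwxyz23456789'; // no l, o, 0 or 1
    $max = strlen($alphabet) - 1;
    do {
        $key = '';
        for ($i = 0; $i < $length; $i++) {
            $key .= $alphabet[random_int(0, $max)];
        }
    } while (isset($seen[$key])); // collision: draw again
    $seen[$key] = true;           // record the key as issued
    return $key;
}

$issued = [];
$a = generateUniqueKey($issued);
$b = generateUniqueKey($issued);
```

With a real database, a unique index on the key column plus a retry when the insert fails would play the role of the `isset()` check and also cover two processes generating at once.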

Without writing the code, my logic would be:
Generate a random string from whatever acceptable characters you like.
Then add half the date stamp (partial seconds and all) to the front and the other half to the end (or somewhere in the middle if you prefer).

If you use your original method, but add the username or email address in front of the password, it will always be unique as long as each user can have only one password.

You may be interested in this article which deals with the same issue: GUIDs are globally unique, but substrings of GUIDs aren't.
The goal of this algorithm is to use the combination of time and location ("space-time coordinates" for the relativity geeks out there) as the uniqueness key. However, timekeeping is not perfect, so there's a possibility that, for example, two GUIDs are generated in rapid succession from the same machine, so close to each other in time that the timestamp would be the same. That's where the uniquifier comes in.

I usually do it like this:
$this->password = '';
for ($i = 0; $i < 10; $i++)
{
    if ($i % 2 == 0)
        $this->password .= chr(rand(65, 90));   // uppercase letter
    if ($i % 3 == 0)
        $this->password .= chr(rand(97, 122));  // lowercase letter
    if ($i % 4 == 0)
        $this->password .= chr(rand(48, 57));   // digit
}
I suppose there are some theoretical holes but I've never had an issue with duplication. I usually use it for temporary passwords (like after a password reset) and it works well enough for that.

As Frank Kreuger commented, go with a GUID generator.
Like this one

I'm still not seeing why the passwords have to be unique? What's the downside if 2 of your users have the same password?
This is assuming we're talking about passwords that are tied to userids, and not just unique identifiers. If that's what you're looking for, why not use GUIDs?

You might be interested in Steve Gibson's over-the-top-secure implementation of a password generator (no source, but he has a detailed description of how it works) at https://www.grc.com/passwords.htm.
The site creates huge 64-character passwords but, since they're completely random, you could easily take the first 8 (or however many) characters for a less secure but "as random as possible" password.
EDIT: from your later answers I see you need something more like a GUID than a password, so this probably isn't what you want...

I do believe that part of your issue is that you are trying to use a single function for two separate uses: passwords and transaction IDs.
These really are two different problem areas, and it is best not to try to address them together.

I recently wanted a quick and simple random unique key so I did the following:
$ukey = dechex(time()) . crypt( time() . md5(microtime() . mt_rand(0, 100000)) );
So, basically, I get the unix time in seconds and add a random md5 string generated from time + random number. It's not the best, but for low frequency requests it is pretty good. It's fast and works.
I did a test where I'd generate thousands of keys and then look for repeats, and having about 800 keys per second there were no repetitions, so not bad. I guess it totally depends on mt_rand()
I use it for a survey tracker where we get a submission rate of about 1000 surveys per minute... so for now (crosses fingers) there are no duplicates. Of course, the rate is not constant (we get the submissions at certain times of the day) so this is not fail proof nor the best solution... the tip is using an incremental value as part of the key (in my case, I used time(), but could be better).

Ignoring the crypt() part, which does not have much to do with creating a unique value, I usually use this one:
function GetUniqueValue()
{
    static $counter = 0; // initialized only the first time the function is called
    return strtr(microtime(), array('.' => '', ' ' => '')) . $counter++;
}
When called within the same process, $counter is increased, so the value is always unique within that process.
When called in different processes, you must be really unlucky to get two microtime() calls with the same value; microtime() calls usually return different values even when called in the same script.

I usually do a random substring (randomize how many chars between 8 and 32, or fewer for user convenience) of the MD5 of some value I have gotten in, or the time, or some combination. For more randomness I take the MD5 of some value (say last name), concatenate that with the time, MD5 it again, then take the random substring. Yes, you could get equal passwords, but it's not very likely at all.

Related

How easily will uniqid() with more entropy create a duplicate?

This might be an off-topic question, but I hope someone can answer it.
Per how many nanoseconds, milliseconds or seconds does uniqid() with more entropy run the risk of creating a duplicate?
With reference to the link below, uniqid will collide if two IDs are created in one millisecond. What about with more entropy?
(My goal is to use a small indexable alphanumeric string as the document ID at creation, one that can be created fast with minimum processor power and without db interference.)
The answers here don't seem to provide any exact number:
How unique is uniqid?

From the source code, more_entropy adds nine random decimal digits, so you can expect a collision after 37,000 or so calls. (For how a billion turned into 37,000, see the birthday attack.) That of course ignores the fact that these digits are not actually random but generated by an LCG, and the same LCG is probably used in other places in the code, so the actual chance of collision is probably higher (by how much exactly, I have no idea).
Also worth noting that uniqid does not actually guarantee microsecond resolution as some PHP implementations (Windows, specifically) don't have access to a microsecond-precision clock.
In short, if you need a unique ID for anything security-sensitive, or collisions are costly, avoid uniqid. Otherwise, using it with more_entropy is probably fine (although the common pattern is to use uniqid(mt_rand(), true) to add even more extra entropy).
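The step from a billion to roughly 37,000 is the birthday bound: with N equally likely values, the chance of a repeat reaches p after about sqrt(2 · N · ln(1 / (1 − p))) draws. A small sketch of that arithmetic, assuming the nine digits are uniformly random (which, as noted above, they are not quite); `drawsForCollision` is a name made up for this example:

```php
<?php
/**
 * Approximate number of random draws from $space equally likely values
 * after which the chance of at least one repeat reaches $p.
 */
function drawsForCollision(float $space, float $p = 0.5): int
{
    return (int) ceil(sqrt(2 * $space * log(1 / (1 - $p))));
}

// Nine random decimal digits give 10^9 possible values.
echo drawsForCollision(1e9), "\n"; // roughly 37,000
```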

Unique token in CakePHP

I need to create a truly unique token when inserting records in CakePHP. The table can contain millions of rows, so I can't just rely on randomly generated strings. I do not want to use microtime() either, because there is a probability, though very small, that two records are submitted at exactly the same moment.
Of course the best solution would be to use String::uuid(), but according to the CakePHP documentation:
The uuid method is used to generate unique identifiers as per RFC 4122. The uuid is a 128bit string in the format of 485fc381-e790-47a3-9794-1337c0a8fe68.
So, as far as I understand, it does not use Cake's security salt for its generation. I therefore decided to hash it with the Security component's hash function (or the Auth password function), because I need it to be unique and very, really very secure at the same time. But then I found the following question, which says that this is not a good idea, although it is about PHP's uniqid and md5:
Why is MD5'ing a UUID not a good idea?
And, also I think the string hashed by security component is much harder to guess - because, for example String::uuid() in for loop has an output like this
for ($i = 0; $i < 30; $i++) {
echo String::uuid()."<br>";
}
die;
// outputs
51f3dcda-c4fc-4141-aaaf-1378654d2d93
51f3dcda-d9b0-4c20-8d03-1378654d2d93
51f3dcda-e7c0-4ddf-b808-1378654d2d93
51f3dcda-f508-4482-852d-1378654d2d93
51f3dcda-01ec-4f24-83b1-1378654d2d93
51f3dcda-1060-49d2-adc0-1378654d2d93
51f3dcda-1da8-4cfe-abe4-1378654d2d93
51f3dcda-2af0-42f7-81a0-1378654d2d93
51f3dcda-3838-4879-b2c9-1378654d2d93
51f3dcda-451c-465a-a644-1378654d2d93
51f3dcda-5264-44b0-a883-1378654d2d93
So part of each string is the same, whereas with a hash function the results are quite different:
echo Security::hash('stackoverflow1');
echo "<br>";
echo Security::hash('stackoverflow2');
die;
// outputs
e9a3fcb74b9a03c7a7ab8731053ab9fe5d2fe6bd
b1f95bdbef28db16f8d4f912391c22310ba3c2c2
So, the question is: can I hash the uuid() in Cake after all? Or what is the most secure way to get a truly unique and hashed (preferably using my security salt) secure token?
UPDATE
By secure token, I mean how difficult it is to guess. A UUID is really unique, but in the example above the records share some parts. The hashed results do not.
Thanks !!

I don't think you need to worry about the UUIDs overlapping.
To put these numbers into perspective, the annual risk of someone being hit by a meteorite is estimated to be one chance in 17 billion, which means the probability is about 0.00000000006 (6 × 10^-11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. Or, to put it another way, the probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs.
http://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates
Continue to use String::uuid() and rest easy :)

A UUID is unique
I need to create truly unique token when inserting records in CakePHP
That is exactly what a UUID is. It is normally used in distributed systems to prevent collisions (multiple sources inserting data, possibly out of sync, into a datasource).
A UUID is not a security measure
I need it to be unique and very, really very secure at the same time
Not sure in what way hashing a uuid is supposed to enhance security - it won't. Relying on security by obscurity is more or less guaranteed to fail.
If your need is random tokens of some form - use a hash function (Hashing a uuid is simply hashing a random seed), if you need guaranteed-unique identifiers use UUIDs. They aren't the same thing and a UUID is a very poor mechanism of generating random, non-sequential "un-guessable" (or whatever the purpose is) strings.

Generating a random string suitable for cryptographic purposes was answered well here:
Secure random number generation in PHP
The code sample fills the string $pr_bits with random binary data, so the characters are unprintable. To use this in a URL, you could convert the binary data to printable characters in a couple of ways. None of them enhances the security; they only make the value ready for URLs.
convert bytes to hex: bin2hex($pr_bits)
convert bytes to base64: base64_encode($pr_bits)
hash the bytes (because the output is conveniently in hex, not for added security): hash('md5', $pr_bits)
I include the last one because you will see people use hash functions for other reasons, like guaranteeing that the output is 16 bytes (128 bits) for md5. In PHP, people also use it to convert a value into hex.
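As a rough illustration of the first two conversions, the sketch below fills `$pr_bits` with `random_bytes()` (PHP 7+). That is an assumption standing in for the code of the linked answer, which is not reproduced here. The URL-safe base64 step is likewise an addition of this example, since plain base64 output can contain `+`, `/` and `=`.

```php
<?php
// random_bytes() is an assumed stand-in for however $pr_bits was filled
// in the linked answer.
$pr_bits = random_bytes(16); // 128 bits of binary data

// Hex: 32 characters drawn from 0-9 and a-f.
$hex = bin2hex($pr_bits);

// Base64 may contain '+', '/' and '=', which need escaping in URLs,
// so swap to the URL-safe alphabet and drop the padding.
$b64url = rtrim(strtr(base64_encode($pr_bits), '+/', '-_'), '=');

echo $hex, "\n", $b64url, "\n";
```

Both strings carry the same 128 bits; only the representation differs.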

I have come up with the following solution:
use a string that results from concatenating a random string's hash and the current time in microseconds.
$timeStr = str_replace("0.", "", microtime());
$timeStr = str_replace(" ", "", $timeStr);
echo Security::hash('random string').'_'.$timeStr;
// 5ffd3b852ccdd448809abb172e19bbb9c01a43a4_796473001379403705
So the first part (the hash) of the string contributes to the unguessability of the token, and the second part is meant to guarantee its uniqueness.
I hope this will help someone.

Is it wrong to use a hash for a unique ID?

I want to use a unique ID generated by PHP in a database table that will likely never have more than 10,000 records. I don't want the time of creation to be visible or use a purely numeric value so I am using:
sha1(uniqid(mt_rand(), true))
Is it wrong to use a hash for a unique ID? Don't all hashes lead to collisions or are the chances so remote that they should not be considered in this case?
A further point: if the number of characters to be hashed is less than the number of characters in a sha1 hash, won't it always be unique?

If you have 2 keys you will have a theoretical best case scenario of 1 in 2 ^ X probability of a collision, where X is the number of bits in your hashing algorithm. 'Best case' because the input usually will be ASCII which doesn't utilize the full charset, plus the hashing functions do not distribute perfectly, so they will collide more often than the theoretical max in real life.
To answer your final question:
A further point: if the number of characters to be hashed is less than
the number of characters in a sha1 hash, won't it always be unique?
Yeah that's true-sorta. But you would have another problem of generating unique keys of that size. The easiest way is usually a checksum, so just choose a large enough digest that the collision space will be small enough for your comfort.
As @wayne suggests, a popular approach is to concatenate microtime() to your random salt (and base64_encode the result to make it printable; the encoding itself does not add entropy).

How horrible would it be if two ended up the same? Murphy's Law applies - if a million to one, or even a 100,000:1 chance is acceptable, then go right ahead! The real chance is much, much smaller - but if your system will explode if it happens then your design flaw must be addressed first. Then proceed with confidence.
Here is a question/answer of what the probabilities really are: Probability of SHA1 Collisions

Use sha1(time()) instead; then you remove the random possibility of a repeating hash for as long as time can be represented in fewer characters than the sha1 hash (likely longer than you will find a working PHP parser ;))

Computer random isn't actually random, you know?
The only true randomness that you can obtain from a computer, supposing you are in a Unix environment, is from /dev/random, but this is a blocking operation that depends on user interactions like moving a mouse or typing on a keyboard. Reading from /dev/urandom is less safe, but it's probably better than using just ASCII characters and gives you an instantaneous response.

sha1($ipAddress.time())
Because it's impossible for anyone else to use the same IP address at the same time.

Entropy in CSRF Tokens and Nonces

I am making a classified ads site with Zend Framework (for portfolio purposes, yes I know the world doesn't have room for "yet another Craigslist clone"). I am trying to implement the ability to post/edit/delete without ever needing an account.
To do this, I feel like I need to have a Nonce generated upon post submission and stored in the database. Then email a link to the user which makes a GET request for the delete, like this:
http://www.somesite.com/post/delete/?id=123&nonce=2JDXS93JFKS8204HJTHSLDH230945HSLDF
Only the user has this unique key or nonce, and upon submission I check the database under the post's ID and ensure the nonce matches prior to deleting.
My issue is how secure the nonce actually is. If I use Zend Framework's Zend_Form_Element_Hash, it creates the hash like this:
protected function _generateHash()
{
$this->_hash = md5(
mt_rand(1,1000000)
. $this->getSalt()
. $this->getName()
. mt_rand(1,1000000)
);
$this->setValue($this->_hash);
}
In reading about mt_rand(), one commenter said "This function has limited entropy. So, if you want to create random string, it will produce only about 2 billion different strings, no matter the length of the string. This can be serious security issue if you are using such strings for session identifiers, passwords etc."
Due to the lifetime of the nonce/token in the application, which could be days or weeks before user chooses to delete post, I think more than enough time would be given for a potential hack.
I realize mt_rand() is a huge upgrade from rand() as seen in this visual mapping pixels with rand on the left, and mt_rand on the right. But is it enough? What makes "2 billion different strings" a security issue?
And ultimately, how can I increase the entropy of a nonce/token/hash?

For this kind of security, it is not only the length of your output that matters. What counts is how much randomness you used to create it.
For mt_rand() the source of randomness is its seed and state (the number of times you've used it since it was seeded). More mt_rand() calls will just give you more rehashing of the same randomness source (no new entropy).
mt_rand()'s seed is only 32-bit (anything less than 128-bit makes cryptographers suspicious ;)
The strength of a key with 32 bits of entropy is 4 billion divided by (roughly) the number of keys you'll generate (e.g. after 100K uses there will be a ~1:43000 chance of guessing any valid key, which approaches practical brute-forcing).
You're adding salt to this, which makes it much stronger, because in addition to guessing the seed attacker would have to know the salt as well, so if the salt is long, then overall the key may be quite strong despite "low" entropy.
To increase entropy you need to add more random stuff (even slightly random is OK too, just gives less bits) from different sources than mt_rand: microtime(), amount of memory used, process ID... or just use /dev/random, which collects all entropy it can get.
(edit: uniqid() has weak entropy, so it won't help here)
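On PHP 7 and later, one way to follow the "just use the system's entropy" advice is `random_bytes()`, which draws from the operating system's CSPRNG. The sketch below is only an illustration under that assumption. `createNonce` and `nonceMatches` are hypothetical helper names, and storing the nonce alongside the post is assumed rather than shown.

```php
<?php
/** Create a 128-bit nonce from the operating system's CSPRNG. */
function createNonce(): string
{
    return bin2hex(random_bytes(16)); // 32 hex characters
}

/**
 * Compare the stored nonce with the one supplied in the request.
 * hash_equals() compares in constant time, so response timing does
 * not reveal how many leading characters matched.
 */
function nonceMatches(string $stored, string $supplied): bool
{
    return hash_equals($stored, $supplied);
}

$stored = createNonce(); // hypothetical: saved with the post
var_dump(nonceMatches($stored, $stored));   // bool(true)
var_dump(nonceMatches($stored, 'guessed')); // bool(false)
```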

The input to the md5() hashing function in the Zend hash-generating code above has 1,000,000 × 1,000,000 different possibilities. md5() has 16^32 (about 3.4 × 10^38) possible outcomes no matter what the input is, so the input is the limiting factor. On average, the hacker would need to send 500,000,000,000 requests to your server in order to guess the right nonce.
At 100 requests per minute, that's about 3472222 days to hack.

Sha1 substring question

I am making a pastebin type site and am trying to make the id be a random string like paste.com/4RT65L
I am getting the sha1 of the id before I add it to the database, but I only keep a substring of the first 8 characters of the sha1. Is there a possibility of the same substring occurring twice? I don't want there to accidentally be a second paste with an id that has already been used.

Well the odds of having a collision in the 8 characters is significantly higher than having a collision with two Sha1 keys, but that doesn't mean it is likely that it will happen.
I would recommend that you do some testing on it. Generate random input and see how long it takes before you have a collision. If you like the results, then go with it. Otherwise, you'll need a longer string.
EDIT: You can also calculate the odds of a collision by looking at the Birthday Paradox.
Basically, if you are taking the first 8 hex digits from the SHA-1, then you have 16**8 (4,294,967,296) different available combinations.
Using an online Birthday Paradox calculator, after about 9200 hashes, you will have a 1% chance of a collision. It will take about 30,000 hashes before you have a 10% chance, and 77,000 before you have a 50% chance.
It's important to point out that as long as your hash function does a decent job of being pseudo-random, it doesn't matter which one you use (whether it is SHA1, MD5, or any form of checksum). These numbers assume completely random outputs, and thus you can only approach these values by using increasingly better hash functions.
So in the end, it depends on how much traffic you are expecting. If this is a small site, you can probably get away with it. If it is a large amount of traffic, then your odds of a collision are very high.
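The figures above can be reproduced approximately without an online calculator, using the usual birthday approximation p ≈ 1 − exp(−n(n − 1) / (2N)). This is a sketch under the same assumption of uniformly random 8-hex-digit IDs; `collisionProbability` is a name chosen for this example.

```php
<?php
/**
 * Birthday approximation: chance that $n random IDs drawn from
 * $space equally likely values contain at least one duplicate.
 */
function collisionProbability(int $n, float $space): float
{
    return 1 - exp(-$n * ($n - 1) / (2 * $space));
}

$space = 16 ** 8; // 8 hex digits: 4,294,967,296 combinations
foreach ([9200, 30000, 77000] as $n) {
    printf("%6d IDs: %.1f%%\n", $n, 100 * collisionProbability($n, $space));
}
```

The three lines printed should land close to the 1%, 10% and 50% quoted above.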

Before assigning the id, you could always check that it isn't taken... or even better, put a unique index on the database field... problem solved. :)
Wait, you say SHA1 of the id. You don't mean the autoinc id do you? My first guesses would be:
356a192b
da4b9237
77de68da
If you are using a random id, why run sha1 on it?

I figured it out, my code is:
strtoupper(substr(sha1($token_start . $id . $token_end), 0, 8))
where $id is the id obtained by counting the total number of ids in the database and adding 1, which gives the next id since the column is auto-increment.
Then, when it inserts the entry, it inserts the hashed value.
$token_start and $token_end are both random strings you can choose to make the new id unique.
I made a loop which inserted them 32,000 times into a database (just the auto-increment id along with the new id), then did a search with DISTINCT and didn't get any duplicates. That's more than enough for me. Any comments would be helpful. I don't know how long it would take until it gave me a collision. If anybody knows when the first one would be, that would be awesome.
