Unique token in CakePHP - php

I need to create truly unique token when inserting records in CakePHP. The table can contain millions of rows so I cant just base on some randomly generated strings. I do not want to use a microtime() as well, because there is, though very small probability that two records can be submitted exactly at the same moment.
Of course the best solution would be to use String::uuid(), but as from cakephp documentation
The uuid method is used to generate unique identifiers as per RFC 4122. The uuid is a 128bit string in the format of 485fc381-e790-47a3-9794-1337c0a8fe68.
So, as far as I understood it does not use cake's security salt for its generation. So, I decided to hash it by security component's hash function (or Auth Password function), because I need it to be unique and very, really very secure at the same time. But then I found the question, saying that it is not a good idea, but for php uniqid and md5.
Why is MD5'ing a UUID not a good idea?
And, also I think the string hashed by security component is much harder to guess - because, for example String::uuid() in for loop has an output like this
for ($i = 0; $i < 30; $i++) {
echo String::uuid()."<br>";
}
die;
// outputs
51f3dcda-c4fc-4141-aaaf-1378654d2d93
51f3dcda-d9b0-4c20-8d03-1378654d2d93
51f3dcda-e7c0-4ddf-b808-1378654d2d93
51f3dcda-f508-4482-852d-1378654d2d93
51f3dcda-01ec-4f24-83b1-1378654d2d93
51f3dcda-1060-49d2-adc0-1378654d2d93
51f3dcda-1da8-4cfe-abe4-1378654d2d93
51f3dcda-2af0-42f7-81a0-1378654d2d93
51f3dcda-3838-4879-b2c9-1378654d2d93
51f3dcda-451c-465a-a644-1378654d2d93
51f3dcda-5264-44b0-a883-1378654d2d93
So, after all the some part of the string is similar, but in case of using hash function the results are pretty different
echo Security::hash('stackoverflow1');
echo "<br>";
echo Security::hash('stackoverflow2');
die;
// outputs
e9a3fcb74b9a03c7a7ab8731053ab9fe5d2fe6bd
b1f95bdbef28db16f8d4f912391c22310ba3c2c2
So, the question is, can I after all hash the uuid() in Cake? Or what is the best secure way to get truly unique and hashed (better according to my security salt) secure token.
UPDATE
Saying secure token, I mean how difficult it is for guessing. UUID is really unique, but from the example above, some records have some similarity. But hashed results do not.
Thanks !!

I don't think you need to worry about the UUIDs overlapping.
To put these numbers into perspective, the annual risk of someone being hit by a meteorite is estimated to be one chance in 17 billion,[38] which means the probability is about 0.00000000006 (6 × 10−11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. Or, to put it another way, the probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs.
http://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates
Continue to use String::uuid() and rest easy :)

A UUID is unique
I need to create truly unique token when inserting records in cakphp
That is exactly what a UUID is. It is normally used in distributed systems to prevent collisions (multiple sources inserting data, possibly out of sync, into a datasource).
A UUID is not a security measure
I need it to be unique and very, really very secure at the same time
Not sure in what way hashing a uuid is supposed to enhance security - it won't. Relying on security by obscurity is more or less guaranteed to fail.
If your need is random tokens of some form - use a hash function (Hashing a uuid is simply hashing a random seed), if you need guaranteed-unique identifiers use UUIDs. They aren't the same thing and a UUID is a very poor mechanism of generating random, non-sequential "un-guessable" (or whatever the purpose is) strings.

Generating a random string suitable for cryptographic purposes was answered well here:
Secure random number generation in PHP
The code sample fills the string $pr_bits with random binary data, so the characters are unprintable. To use this in a URL, you could convert the binary data to printable characters a couple ways. None of them enhance the security but make them ready for URLs.
convert bytes to hex: bin2hex($pr_bits)
convert bytes to base64: base64_encode($pr_bits)
hash the bytes (because the output is conveniently in hex, not for added security): string hash ('md5' , $pr_bits)
I include the last one because you will see people use hash functions for other reasons, like to guarantee the output is 16bytes/128bits for md5. In PHP people use it to convert a value into HEX.

I have come up with the following solution
to use a string as a result of concatenating current time in microseconds and random string's hash
$timeStr = str_replace("0.", "", microtime());
$timeStr = str_replace(" ", "", $timeStr);
echo Security::hash('random string').'_'.$timeStr;
// 5ffd3b852ccdd448809abb172e19bbb9c01a43a4_796473001379403705
So, the first part(hash) of the string will contribute for the unguessability of the token, and the second part will guarantee its uniquenes.
Hope, this will help someone.

Related

Is it wrong to use a hash for a unique ID?

I want to use a unique ID generated by PHP in a database table that will likely never have more than 10,000 records. I don't want the time of creation to be visible or use a purely numeric value so I am using:
sha1(uniqid(mt_rand(), true))
Is it wrong to use a hash for a unique ID? Don't all hashes lead to collisions or are the chances so remote that they should not be considered in this case?
A further point: if the number of characters to be hashed is less than the number of characters in a sha1 hash, won't it always be unique?
If you have 2 keys you will have a theoretical best case scenario of 1 in 2 ^ X probability of a collision, where X is the number of bits in your hashing algorithm. 'Best case' because the input usually will be ASCII which doesn't utilize the full charset, plus the hashing functions do not distribute perfectly, so they will collide more often than the theoretical max in real life.
To answer your final question:
A further point: if the number of characters to be hashed is less than
the number of characters in a sha1 hash, won't it always be unique?
Yeah that's true-sorta. But you would have another problem of generating unique keys of that size. The easiest way is usually a checksum, so just choose a large enough digest that the collision space will be small enough for your comfort.
As #wayne suggests, a popular approach is to concatenate microtime() to your random salt (and base64_encode to raise the entropy).
How horrible would it be if two ended up the same? Murphy's Law applies - if a million to one, or even a 100,000:1 chance is acceptable, then go right ahead! The real chance is much, much smaller - but if your system will explode if it happens then your design flaw must be addressed first. Then proceed with confidence.
Here is a question/answer of what the probabilities really are: Probability of SHA1 Collisions
Use sha1(time()) in stead, then you remove the random possibility of a repeating hash for as long as time can be represented shorter than the sha1 hash. (likely longer than you fill find a working php parser ;))
Computer random isn't actually random, you know?
The only true random that you can obtain from a computer, supposing you are on a Unix environment is from /dev/random, but this is a blocking operation that depends on user interactions like moving a mouse or typing on keyboard. Reading from /dev/urandom is less safe, but it's probably better thang using just ASCII characters and gives you instantaneous response.
sha1($ipAddress.time())
Causes it's impossible for anyone to use same IP address same time

Entropy in CSRF Tokens and Nonces

I am making a classified ads site with Zend Framework (for portfolio purposes, yes I know the world doesn't have room for "yet another Craigslist clone"). I am trying to implement the ability to post/edit/delete without ever needing an account.
To do this, I feel like I need to have a Nonce generated upon post submission and stored in the database. Then email a link to the user which makes a GET request for the delete, like this:
http://www.somesite.com/post/delete/?id=123&nonce=2JDXS93JFKS8204HJTHSLDH230945HSLDF
Only the user has this unique key or nonce, and upon submission I check the database under the post's ID and ensure the nonce matches prior to deleting.
My issue is how secure the nonce actually is. If I use Zend Framework's Zend_Form_Element_Hash, it creates the hash like this:
protected function _generateHash()
{
$this->_hash = md5(
mt_rand(1,1000000)
. $this->getSalt()
. $this->getName()
. mt_rand(1,1000000)
);
$this->setValue($this->_hash);
}
In reading about mt_rand(), one commenter said "This function has limited entrophy. So, if you want to create random string, it will produce only about 2 billion different strings, no matter the length of the string. This can be serous security issue if you are using such strings for session indentifiers, passwords etc."
Due to the lifetime of the nonce/token in the application, which could be days or weeks before user chooses to delete post, I think more than enough time would be given for a potential hack.
I realize mt_rand() is a huge upgrade from rand() as seen in this visual mapping pixels with rand on the left, and mt_rand on the right. But is it enough? What makes "2 billion different strings" a security issue?
And ultimately, how can I increase the entropy of a nonce/token/hash?
For such security it's not only important how long your output is. It counts how much randomness you've used to create it.
For mt_rand() the source of randomness is its seed and state (number of times you've used it since it was seeded). More mt_rand() calls will just give you more rehasing of the same randomness source (no new entropy).
mt_rand()'s seed is only 32-bit (anything less than 128bit makes cryptographers suspicious ;)
Strength of a keys with 32-bits of entropy is 4 billion divided by (roughly) number of keys you'll generate (e.g. after 100K uses there will be ~1:43000 chance to guess any valid key, which approaches practical brute-forcing).
You're adding salt to this, which makes it much stronger, because in addition to guessing the seed attacker would have to know the salt as well, so if the salt is long, then overall the key may be quite strong despite "low" entropy.
To increase entropy you need to add more random stuff (even slightly random is OK too, just gives less bits) from different sources than mt_rand: microtime(), amount of memory used, process ID... or just use /dev/random, which collects all entropy it can get.
(edit: uniqid() has weak entropy, so it won't help here)
The Zend hash generating code above's input for the md5() hashing function has 1,000,000 X 1,000,000 different possibilities. md5() has 32^16 (1208925819614629174706176) possible outcomes no matter what the input is. On average, the hacker would need to send 500,000,000,000 requests to your server in order to guess the right nonce.
At 100 requests per minute, that's about 3472222 days to hack.

Easy Encryption and Decryption with PHP

My PHP Application uses URLs like these:
http://domain.com/userid/120
http://domain.com/userid/121
The keys and the end of the URL are basically the primary key of the MySQL database table.
I don't want this increasing number to be public and I also don't want that someone will be able to crawl the user profiles just by interating the Id.
So I want to encrypt this Id for display in a way I can easily decrypt it again. The string shouldn't get much longer.
What's the best encryption method for this?
Simple Obscuring: Base64 encode them using base64_encode.
Now, your http://domain.com/userid/121 becomes: http://domain.com/userid/MTIx
Want more, do it again, add some letters around it.
Tough Obscuring: Use any encryption method using MCrypt library.
A better approach (from a usability and SEO perspective) would be to use a unique phrase rather than an obscured ID. In this instance the user's user name would seem an ideal solution, and would also be un-guessable.
That said, if you don't want to use this approach you could just use a hash (perhaps md5) of the user's user name which you'd store in the database along with their other details. As such, you can just do a direct lookup on that field. (i.e.: Having encrypt and decrypt part of the URL is probably overkill.)
You have a variety of choices here:
Generate and store an identifier in the database. It's good because you can then have readable keys that are guaranteed to be unique. It's bad because it causes a database schema change, and you have to actually query that table every time you want to generate a link.
Run an actual key-based encryption, for instance based on PHP's MCrypt. You have access to powerful cryptographic algorithms, but most secure algorithms tend to output strings that are much longer than what you expect. XOR does what you want, but it does not prevent accessing sequential values (and the key is pretty simple to determine, given the a priori knowledge about the numbers).
Run a hash-based verification: instead of using 121 as your identifier, use 121-a34df6 where a34df6 are the first six characters of the md5 (or other HMAC) of 121 and a secret key. Instead of decoding, you extract the 121 and recompute the six characters, to see if they match what the user sent. This does not hide the 121 (it's still right there before the hyphen) but without knowing the secret key, the visitor will not be able to generate the six characters to actually view the document numbered 121.
Use XOR with shuffling: shuffle the bits in the 30-bit identifier, then apply the XOR. This makes the XOR harder to identify because the shuffle pattern is also hidden.
Use XOR with on-demand keys: use fb37cde4-37b3 as your key, where the first part is the XOR of 121 and md5('37b3'.SECRET) (or another way of generating an XOR key based on 37b3 and a secret).
Don't use base64, it's easy to reverse engineer: if MTIx is 121, then MTIy is 122 ...
Ultimately, you will have to accept that your solution will not be secure: not only is it possible for users to leak valid urls (through their browser history, HTTP referer, or posting them on Twitter), but your requirement that the identifier fits in a small number of characters means a brute-force attack is possible (and becomes easier as you start having more documents).
Simplest but powerful encryption method: XOR with a secret Key. http://en.wikipedia.org/wiki/XOR_cipher
No practical performance degradation.
Base64 representation is not an encryption! It's another way to say the same.
Hope this helps.
Obscuring the URL will never secure it. It makes it harder to read, but not much harder to manipulate. You could use a hexadecimal number representation or something like that to obscure it. Those who can read hex can change your URL in a few seconds, anyway:
$hexId = dechex($id); // to hex
$id = hexdec($hexId); // from hex
I'd probably say it's better indeed to just create a random string for each user and store that in your database than to get one using hash. If you use a common hash, it's still very easy to iterate over all pages ;-)
I would write this in comments, but don't have the rep for it (yet?).
When user click on a link you should not use primary key, You can use the pkey in a session and get it from that session. Please do not use query string....
generate an unique string for each user and use it in your urls
http://domain.com/user/ofisdoifsdlfkjsdlfkj instead of http://domain.com/userid/121
you can use base64_encode and base64_decode function for encrypt and decrypt your URLS

Why is MD5'ing a UUID not a good idea?

PHP has a uniqid() function which generates a UUID of sorts.
In the usage examples, it shows the following:
$token = md5(uniqid());
But in the comments, someone says this:
Generating an MD5 from a unique ID is
naive and reduces much of the value of
unique IDs, as well as providing
significant (attackable) stricture on
the MD5 domain. That's a deeply
broken thing to do. The correct
approach is to use the unique ID on
its own; it's already geared for
non-collision.
Why is this true, if so? If an MD5 hash is (almost) unique for a unique ID, then what is wrong from md5'ing a uniqid?
A UUID is 128 bits wide and has uniqueness inherent to the way it is generated. A MD5 hash is 128 bits wide and doesn't guarantee uniquess, only a low probablity of collision. The MD5 hash is no smaller than the UUID so it doesn't help with storage.
If you know the hash is from a UUID it is much easier to attack because the domain of valid UUIDs is actually fairly predictable if you know anything about the machine geneerating them.
If you needed to provide a secure token then you would need to use a cryptographically secure random number generator.(1) UUIDs are not designed to be cryptographically secure, only guaranteed unique. A monotonically increasing sequence bounded by unique machine identifiers (typically a MAC) and time is still a perfectly valid UUID but highly predictable if you can reverse engineer a single UUID from the sequence of tokens.
The defining characteristic of a cryptographically secure PRNG is that the result of a given iteration does not contain enough information to infer the value of the next iteration - i.e. there is some hidden state in the generator that is not revealed in the number and cannot be inferred by examining a sequence of numbers from the PRNG. If you get into number theory you can find ways to guess the internal state of some PRNGs from a sequence of generated values. Mersenne Twister is an example of such a generator. It has hidden state that it used to get its long period but it is not cryptographically secure - you can take a fairly small sequence of numbers and use that to infer the internal state. Once you've done this you can use it to attack a cryptographic mechanism that depends on keeping that sequence a secret.
Note that uniqid() does not return a UUID, but a "unique" string based on the current time:
$ php -r 'echo uniqid("prefix_", true);'
prefix_4a8aaada61b0f0.86531181
If you do that multiple times, you will get very similar output strings and everyone who is familiar with uniqid() will recognize the source algorithm. That way it is pretty easy to predict the next IDs that will be generated.
The advantage of md5()-ing the output, along with an application-specific salt string or random number, is a way harder to guess string:
$ php -r 'echo md5(uniqid("prefix_", true));'
3dbb5221b203888fc0f41f5ef960f51b
Unlike plain uniqid(), this produces very different outputs every microsecond. Furthermore it does not reveil your "prefix salt" string, nor that you are using uniqid() under the hood. Without knowing the salt, it is very hard (consider it impossible) to guess the next ID.
In summary, I would disagree with the commentor's opinion and would always prefer the md5()-ed output over plain uniqid().
MD5ing a UUID is pointless because UUIDs are already unique and fixed length (short), properties that are some of the reasons that people often use MD5 to begin with. So I suppose it depends on what you plan on doing with the UUID, but in general a UUID has the same properties as some data that has been MD5'd, so why do both?
UUIDs are already unique, so there is no point in MD5'ing them anyway.
Regarding the security question, in general you can be attacked if the attacker can predict what the next unique ID will be you are about to generate. If it is known that you generate your unique IDs from UUIDs, the set of potential next unique IDs is much smaller, giving a better chance for a brute force attack.
This is especially true if the attacker can get a whole bunch of unique IDs from you, and that way guess your scheme of generating UUIDs.
Version 3 of UUIDs are already MD5'd, so there's no point in doing it again. However, I'm not sure what UUID version PHP uses.
As an aside, MD5 is actually obsolete and is not to be used in anything worth protecting - PHI, PII or PCI - from 2010 onwards. The US Feds have ennforced this and any entity non-compliant would be paying lots of $$$ in penalty.

Unique key generation

I looking for a way, specifically in PHP that I will be guaranteed to always get a unique key.
I have done the following:
strtolower(substr(crypt(time()), 0, 7));
But I have found that once in a while I end up with a duplicate key (rarely, but often enough).
I have also thought of doing:
strtolower(substr(crypt(uniqid(rand(), true)), 0, 7));
But according to the PHP website, uniqid() could, if uniqid() is called twice in the same microsecond, it could generate the same key. I'm thinking that the addition of rand() that it rarely would, but still possible.
After the lines mentioned above I am also remove characters such as L and O so it's less confusing for the user. This maybe part of the cause for the duplicates, but still necessary.
One option I have a thought of is creating a website that will generate the key, storing it in a database, ensuring it's completely unique.
Any other thoughts? Are there any websites out there that already do this that have some kind of API or just return the key. I found http://userident.com but I'm not sure if the keys will be completely unique.
This needs to run in the background without any user input.
There are only 3 ways to generate unique values, rather they be passwords, user IDs, etc.:
Use an effective GUID generator - these are long and cannot be shrunk. If you only use part you FAIL.
At least part of the number is sequentially generated off of a single sequence. You can add fluff or encoding to make it look less sequential. Advantage is they start short - disadvantage is they require a single source. The work around for the single source limitation is to have numbered sources, so you include the [source #] + [seq #] and then each source can generate its own sequence.
Generate them via some other means and then check them against the single history of previously generated values.
Any other method is not guaranteed. Keep in mind, fundamentally you are generating a binary number (it is a computer), but then you can encode it in Hexadecimal, Decimal, Base64, or a word list. Pick an encoding that fits your usage. Usually for user entered data you want some variation of Base32 (which you hinted at).
Note about GUIDS: They gain their strength of uniqueness from their length and the method used to generate them. Anything less than 128-bits is not secure. Beyond random number generation there are characteristics that go into a GUID to make it more unique. Keep in mind they are only practically unique, not completely unique. It is possible, although practically impossible to have a duplicate.
Updated Note about GUIDS: Since writing this I learned that many GUID generators use a cryptographically secure random number generator (difficult or impossible to predict the next number generated, and a not likely to repeat). There are actually 5 different UUID algorithms. Algorithm 4 is what Microsoft currently uses for the Windows GUID generation API. A GUID is Microsoft's implementation of the UUID standard.
Update: If you want 7 to 16 characters then you need to use either method 2 or 3.
Bottom line: Frankly there is no such thing as completely unique. Even if you went with a sequential generator you would eventually run out of storage using all the atoms in the universe, thus looping back on yourself and repeating. Your only hope would be the heat death of the universe before reaching that point.
Even the best random number generator has a possibility of repeating equal to the total size of the random number you are generating. Take a quarter for example. It is a completely random bit generator, and its odds of repeating are 1 in 2.
So it all comes down to your threshold of uniqueness. You can have 100% uniqueness in 8 digits for 1,099,511,627,776 numbers by using a sequence and then base32 encoding it. Any other method that does not involve checking against a list of past numbers only has odds equal to n/1,099,511,627,776 (where n=number of previous numbers generated) of not being unique.
Any algorithm will result in duplicates.
Therefore, might I suggest that you use your existing algorithm* and simply check for duplicates?
*Slight addition: If uniqid() can be non-unique based on time, also include a global counter that you increment after every invocation. That way something is different even in the same microsecond.
Without writing the code, my logic would be:
Generate a random string from whatever acceptable characters you like.
Then add half the date stamp (partial seconds and all) to the front and the other half to the end (or somewhere in the middle if you prefer).
Stay JOLLY!
H
If you use your original method, but add the username or emailaddress in front of the password, it will always be unique if each user only can have 1 password.
You may be interested in this article which deals with the same issue: GUIDs are globally unique, but substrings of GUIDs aren't.
The goal of this algorithm is to use the combination of time and location ("space-time coordinates" for the relativity geeks out there) as the uniqueness key. However, timekeeping is not perfect, so there's a possibility that, for example, two GUIDs are generated in rapid succession from the same machine, so close to each other in time that the timestamp would be the same. That's where the uniquifier comes in.
I usually do it like this:
$this->password = '';
for($i=0; $i<10; $i++)
{
if($i%2 == 0)
$this->password .= chr(rand(65,90));
if($i%3 == 0)
$this->password .= chr(rand(97,122));
if($i%4 == 0)
$this->password .= chr(rand(48,57));
}
I suppose there are some theoretical holes but I've never had an issue with duplication. I usually use it for temporary passwords (like after a password reset) and it works well enough for that.
As Frank Kreuger commented, go with a GUID generator.
Like this one
I'm still not seeing why the passwords have to be unique? What's the downside if 2 of your users have the same password?
This is assuming we're talking about passwords that are tied to userids, and not just unique identifiers. If that's what you're looking for, why not use GUIDs?
You might be interested in Steve Gibson's over-the-top-secure implementation of a password generator (no source, but he has a detailed description of how it works) at https://www.grc.com/passwords.htm.
The site creates huge 64-character passwords but, since they're completely random, you could easily take the first 8 (or however many) characters for a less secure but "as random as possible" password.
EDIT: from your later answers I see you need something more like a GUID than a password, so this probably isn't what you want...
I do believe that part of your issue is that you are trying to us a singular function for two separate uses... passwords and transaction_id
these really are two different problem areas and it really is not best to try to address them together.
I recently wanted a quick and simple random unique key so I did the following:
$ukey = dechex(time()) . crypt( time() . md5(microtime() + mt_rand(0, 100000)) );
So, basically, I get the unix time in seconds and add a random md5 string generated from time + random number. It's not the best, but for low frequency requests it is pretty good. It's fast and works.
I did a test where I'd generate thousands of keys and then look for repeats, and having about 800 keys per second there were no repetitions, so not bad. I guess it totally depends on mt_rand()
I use it for a survey tracker where we get a submission rate of about 1000 surveys per minute... so for now (crosses fingers) there are no duplicates. Of course, the rate is not constant (we get the submissions at certain times of the day) so this is not fail proof nor the best solution... the tip is using an incremental value as part of the key (in my case, I used time(), but could be better).
Ingoring the crypting part that does not have much to do with creating a unique value I usually use this one:
function GetUniqueValue()
{
static $counter = 0; //initalized only 1st time function is called
return strtr(microtime(), array('.' => '', ' ' => '')) . $counter++;
}
When called in same process $counter is increased so value is always unique in same process.
When called in different processes you must be really unlucky to get 2 microtime() call with the same values, think that microtime() calls usually have different values also when called in same script.
I usually do a random substring (randomize how many chars between 8 an 32, or less for user convenience) or the MD5 of some value I have gotten in, or the time, or some combination. For more randomness I do MD5 of come value (say last name) concatenate that with the time, MD5 it again, then take the random substring. Yes, you could get equal passwords, but its not very likely at all.

Categories