Secure string compare function - php

I just came across this code in the HTTP Auth library of the Zend Framework. It seems to be using a special string compare function to make it more secure. However, I don't quite understand the comments. Could anybody explain why this function is more secure than doing $a == $b?
/**
* Securely compare two strings for equality while avoiding C level memcmp()
* optimisations capable of leaking timing information useful to an attacker
* attempting to iteratively guess the unknown string (e.g. password) being
* compared against.
*
* @param string $a
* @param string $b
* @return bool
*/
protected function _secureStringCompare($a, $b)
{
if (strlen($a) !== strlen($b)) {
return false;
}
$result = 0;
for ($i = 0; $i < strlen($a); $i++) {
$result |= ord($a[$i]) ^ ord($b[$i]);
}
return $result == 0;
}

It looks like they're trying to prevent timing attacks.
In cryptography, a timing attack is a side channel attack in which the attacker attempts to compromise a cryptosystem by analyzing the time taken to execute cryptographic algorithms. Every logical operation in a computer takes time to execute, and the time can differ based on the input; with precise measurements of the time for each operation, an attacker can work backwards to the input.
Basically, if it takes a different amount of time to compare a correct password and an incorrect password, then you can use the timing to figure out how many characters of the password you've guessed correctly.
Consider an extremely flawed string comparison (this is basically the normal string equality function, with an obvious wait added):
function compare(a, b) {
if(len(a) !== len(b)) {
return false;
}
for(i = 0; i < len(a); ++i) {
if(a[i] !== b[i]) {
return false;
}
wait(10); // wait 10 ms
}
return true;
}
Say you give a password and it (consistently) takes some amount of time for one password, and about 10 ms longer for another. What does this tell you? It means the second password has one more character correct than the first one.
This lets you do movie hacking -- where you guess a password one character at a time (which is much easier than guessing every single possible password).
In the real world, there are other factors involved, so you have to try each password many, many times to average out the noise. But you can still try every one-character password until one obviously takes longer, then move on to two-character passwords, and so on.
This function still has a minor problem here:
if(strlen($a) !== strlen($b)) {
return false;
}
It lets you use timing attacks to figure out the correct length of the password, which lets you not bother guessing any shorter or longer passwords. In general, you want to hash your passwords first (which will create equal-length strings), so I'm guessing they didn't consider it to be a problem.
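A minimal sketch of the idea in the last paragraph, assuming PHP 7 or later. Both inputs are passed through an HMAC with a throwaway random key, so the values handed to the comparison always have the same length, and the comparison itself uses PHP's built-in hash_equals(). The function name secureCompareViaHmac is made up for this illustration and is not part of Zend Framework.

```php
<?php
// Hypothetical helper: compares two secrets without exposing their raw lengths
// or contents to the final comparison.
function secureCompareViaHmac(string $known, string $user): bool
{
    // A fresh random key per call; it never leaves this function.
    $key = random_bytes(32);

    // Both MACs are 32 raw bytes, whatever the input lengths were.
    $knownMac = hash_hmac('sha256', $known, $key, true);
    $userMac  = hash_hmac('sha256', $user, $key, true);

    // hash_equals() is PHP's timing-safe string comparison.
    return hash_equals($knownMac, $userMac);
}

var_dump(secureCompareViaHmac('s3cret', 's3cret')); // bool(true)
var_dump(secureCompareViaHmac('s3cret', 's3creT')); // bool(false)
```

For stored user passwords, password_hash() and password_verify() are the usual tools. This sketch only covers comparing two secret strings.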

Related

How to decrease runtime for generating permutations of a string?

I have written a function that takes in a MD5 hashvalue and finds its input/original value by permuting all possible combinations of a string. As per BIT_CHEETAH's answer on a SO question:
... you cannot decrypt MD5 without attempting something like brute force hacking which is extremely resource intensive, not practical, and unethical.
(Source: encrypt and decrypt md5)
I'm well aware of this; however, I am using this scenario to implement a string permutation function. I would also like to stick to the recursive methodology as opposed to others. The approach is probably best summarised by Mark Byers' post:
- Try each of the letters in turn as the first letter and then find all
the permutations of the remaining letters using a recursive call.
- The base case is when the input is an empty string the only permutation is the empty string.
(Generating all permutations of a given string)
Anyway, so I implemented this and got the following:
function matchMD5($possibleChars, $md5, $concat, $length) {
for($i = 0; $i < strlen($possibleChars); $i++) {
$ch = $possibleChars[$i];
$concatSubstr = $concat.$ch;
if(strlen($concatSubstr) != $length) {
matchMD5($possibleChars, $md5, $concatSubstr, $length);
}
else if(strlen($concatSubstr) == $length) {
$tryHash = hash('md5', $concatSubstr);
if ($tryHash == $md5) {
echo "Match! $concatSubstr ";
return $concatSubstr;
}
}
}
}
It works 100%; however, when I search for a four-character string, my server runs for 10.7 seconds before finding a match that lies approximately 1/10th of the way through all possible permutations. The set of valid characters that the function permutes, called $possibleChars, contains all alphanumeric characters plus a few selected punctuation marks:
0123456789.,;:abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
Question: Can the above code be written to run faster somehow?
When doing brute force, you have to run through all the possibilities; there is no way of cutting a corner there. So you are left with profiling your code to find out what the application spends the most time doing, and then trying to optimize that.
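One thing profiling might point at is the per-candidate overhead: the recursive calls, the repeated strlen() calls, and the fact that the recursion keeps going after a match has been echoed. The following is only a sketch of an iterative variant, assuming PHP 7.1 or later. The name matchMd5Iterative is invented here, and no speedup has been measured. The search space is still strlen($chars) ** $length candidates.

```php
<?php
// Hypothetical iterative search: walks all candidates of a fixed length like an
// odometer and stops at the first match.
function matchMd5Iterative(string $chars, string $md5, int $length): ?string
{
    $n = strlen($chars);
    if ($n === 0 || $length < 1) {
        return null;
    }

    $idx = array_fill(0, $length, 0);            // one "wheel" per position
    $candidate = str_repeat($chars[0], $length); // first candidate, e.g. "0000"

    while (true) {
        if (md5($candidate) === $md5) {
            return $candidate;
        }
        // Advance the rightmost wheel; carry to the left on wrap-around.
        $pos = $length - 1;
        while ($pos >= 0) {
            $idx[$pos]++;
            if ($idx[$pos] < $n) {
                $candidate[$pos] = $chars[$idx[$pos]];
                break;
            }
            $idx[$pos] = 0;
            $candidate[$pos] = $chars[0];
            $pos--;
        }
        if ($pos < 0) {
            return null; // every wheel wrapped: search space exhausted
        }
    }
}

var_dump(matchMd5Iterative('abc', md5('cab'), 3)); // string(3) "cab"
```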

PHP Seeded, Deterministic, Cryptographically Secure PRNG (PseudoRandom Number Generator). Is it possible?

I'm required to create a provably-fair (deterministic & seeded) cryptographically secure (CS) random number generator in PHP. We are running PHP 5 and PHP 7 isn't really an option right now. However, I found a polyfill for PHP 7's new CS functions so I've implemented that solution (https://github.com/paragonie/random_compat).
I thought that srand() could be used to seed random_int(), but now I'm not certain if that is the case. Can a CSPRNG even be seeded? If it can be seeded, will the output be deterministic (same random result, given same seed)?
Here is my code:
require_once($_SERVER['DOCUMENT_ROOT']."/lib/assets/random_compat/lib/random.php");
$seed_a = 8138707157292429635;
$seed_b = 'JuxJ1XLnBKk7gPASR80hJfq5Ey8QWEIc8Bt';
class CSPRNG{
private static $RNGseed = 0;
public function generate_seed_a(){
return random_int(0, PHP_INT_MAX);
}
public function generate_seed_b($length = 35){
$characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
$randomString = '';
for($i = 0; $i < $length; $i++){
$randomString .= $characters[random_int(0, strlen($characters) - 1)];
}
return $randomString;
}
public function seed($s = 0) {
if($s == 0){
$this->RNGseed = $this->generate_seed_a();
}else{
$this->RNGseed = $s;
}
srand($this->RNGseed);
}
public function generate_random_integer($min=0, $max=PHP_INT_MAX, $pad_zeros = true){
if($this->RNGseed == 0){
$this->seed();
}
$rnd_num = random_int($min, $max);
if($pad_zeros == true){
$num_digits = strlen((string)$max);
$format_str = "%0".$num_digits."d";
return sprintf($format_str, $rnd_num);
}else{
return $rnd_num;
}
}
public function drawing_numbers($seed_a, $num_of_balls = 6){
$this->seed($seed_a);
$draw_numbers = array();
for($i = 0; $i < $num_of_balls; $i++) {
$number = ($this->generate_random_integer(1, 49));
if(in_array($number, $draw_numbers)){
$i = $i-1;
}else{
array_push($draw_numbers, $number);
}
}
sort($draw_numbers);
return $draw_numbers;
}
}
$CSPRNG= new CSPRNG();
echo '<p>Seed A: '.$seed_a.'</p>';
echo '<p>Seed B: '.$seed_b.'</p>';
$hash = hash('sha1', $seed_a.$seed_b);
echo '<p>Hash: '.$hash.'</p>';
$drawNumbers = $CSPRNG->drawing_numbers($seed_a);
$draw_str = implode("-", $drawNumbers);
echo "<br>Drawing: $draw_str<br>";
When this code is run, the Drawing ($draw_str) should be the same on each run, but it is not.
To prove that the drawing is fair, a seed (Seed A) is chosen before the winning number is picked and shown. Another random number is generated as well (Seed B). Seed B is used as a salt and combined with Seed A and the result is hashed. This hash is shown to the user prior to the drawing. They would also be provided with the source code so that when the winning number is picked, both seeds are revealed. They can verify that the hash matches and everything was done fairly.
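The commit-and-reveal step described above could be sketched roughly like this. The function names are invented for illustration, and SHA-256 is an assumed default (the code above uses SHA-1). The seeds are passed as strings so that they are concatenated exactly as published.

```php
<?php
// Hypothetical helpers for the "publish hash first, reveal seeds later" scheme.
function commitToSeeds(string $seedA, string $seedB, string $algo = 'sha256'): string
{
    // Seed B acts as a salt; the hash is what gets shown before the drawing.
    return hash($algo, $seedA . $seedB);
}

function verifyCommitment(string $published, string $seedA, string $seedB, string $algo = 'sha256'): bool
{
    // After the drawing, anyone can recompute the hash from the revealed seeds.
    return hash_equals($published, commitToSeeds($seedA, $seedB, $algo));
}

$seedA = '8138707157292429635';
$seedB = 'JuxJ1XLnBKk7gPASR80hJfq5Ey8QWEIc8Bt';
$published = commitToSeeds($seedA, $seedB);

var_dump(verifyCommitment($published, $seedA, $seedB));                // bool(true)
var_dump(verifyCommitment($published, '8138707157292429636', $seedB)); // bool(false)
```

This only shows that the seeds were fixed before the drawing. It says nothing about how they were chosen.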
Duskwuff asks:
How do you intend to prove that the seed was chosen fairly? A suspicious user can easily claim that you picked a seed that would result in a favorable outcome for specific users, or that you revealed the seed to specific users ahead of time.
Before you investigate solutions, what exactly is the problem you are trying to solve? What is your threat model?
It sounds like you want SeedSpring (version 0.3.0 supports PHP 5.6).
$prng = new \ParagonIE\SeedSpring\SeedSpring('JuxJ1XLnBKk7gPAS');
$byte = $prng->getBytes(16);
\var_dump(bin2hex($byte));
This should always return:
string(32) "76482c186f7c5d1cb3f895e044e3c649"
The numbers should be unbiased, but since it's based off a pre-shared seed, it is not, by strict definition, cryptographically secure.
Keep in mind that SeedSpring was created as a toy implementation/proof of concept rather than an official Paragon Initiative Enterprises open source security solution, so feel free to fork it and tweak it to suit your purposes. (I doubt our branch will ever reach a "stable 1.0.0 release").
(Also, if you're going to accept/award the bounty to any of these answers, Aaron Toponce's answer is more correct. Encrypting the nonce with ECB mode is more performant than encrypting a long stream of NUL bytes with AES-CTR, for approximately the same security benefit. This is one of the extremely rare occasions that ECB mode is okay.)
First, you shouldn't be implementing your own userspace CSPRNG. The operating system you have PHP 5 installed on already ships a CSPRNG, and you should be using that for all your randomness, unless you know you can't use it, or performance is a concern. You should be using random_int(), random_bytes(), or openssl_random_pseudo_bytes().
However, if you must implement a userspace CSPRNG, then this can be done by simply using an AES library (e.g. libsodium), and encrypting a counter. Pseudocode would be:
Uint-128 n = 0;
while true:
output = AES-ECB(key, n);
n++;
The AES key, in this case, needs sufficient entropy to withstand a sophisticated attack, or the security of your userspace CSPRNG falls apart, of course. The key could be the bcrypt() of a user-supplied password.
Provided your counter represented as a 128-bit unsigned integer is always unique, you will always get a unique output every time the generator is "seeded" with a new counter. If it's seeded with a previously used counter, but a different key, then the output will also be different. The best case scenario, would be a changing key and a changing counter every time the generator is called.
You may be tempted to use high precision timestamp, such as using microsecond accuracy, in your counter. This is fine, except you run the risk of someone or something manipulating the system clock. As such, if the clock can be manipulated, then the CSPRNG generator can be compromised. You're best off providing a new key every time you call the generator, and start encrypting with a 128-bit zero.
Also, notice that we're using ECB mode with AES. Don't freak out. ECB has problems with maintaining structure in the ciphertext that the plaintext provides. In general terms, you should not use ECB mode. However, with 128-bits of data, you will only be encrypting a single ECB block, so there will be no leak of structured data. ECB is preferred over CTR for a userspace CSPRNG, as you don't have to keep track of a key, a counter object, and the data to be encrypted. Only a key and the data are needed. Just make sure you are never encrypting more than 128-bits of data, and you'll never need more than 1 block.
Can a CSPRNG even be seeded?
Yes, and it should always be seeded. If you look at your GNU/Linux operating system, you'll likely notice a file in /var/lib/urandom/random-seed. When the operating system shuts down, it creates that file from the CSPRNG. On next boot, this file is used to seed the kernelspace CSPRNG to prevent reusing previous state of the generator. On every shutdown, that file should change.
If it can be seeded, will the output be deterministic (same random result, given same seed)?
Yes. Provided the same seed, key, etc., the output is deterministic, so the output will be the same. If one of your variables changes, then the output will be different. This is why the generator should be rekeyed on each call.
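To make the determinism concrete, here is a rough PHP sketch of the "encrypt a counter" pseudocode, assuming the openssl extension is available and a 64-bit PHP build. The class name CounterPrng is invented, a 64-bit counter padded to one 128-bit block stands in for the Uint-128, and this has not been reviewed for production use.

```php
<?php
// Hypothetical userspace generator: each output block is AES-128-ECB(key, counter).
final class CounterPrng
{
    private $key;
    private $counter = 0;

    public function __construct(string $key)
    {
        if (strlen($key) !== 16) {
            throw new InvalidArgumentException('AES-128 needs a 16-byte key');
        }
        $this->key = $key;
    }

    public function nextBlock(): string
    {
        // Exactly one 16-byte block: 8 zero bytes + big-endian 64-bit counter.
        $block = str_repeat("\0", 8) . pack('J', $this->counter);
        $this->counter++;

        // ZERO_PADDING disables PKCS#7 padding, so the output is also 16 bytes.
        $out = openssl_encrypt($block, 'aes-128-ecb', $this->key,
            OPENSSL_RAW_DATA | OPENSSL_ZERO_PADDING);
        if ($out === false) {
            throw new RuntimeException('openssl_encrypt failed');
        }
        return $out;
    }
}

$a = new CounterPrng('JuxJ1XLnBKk7gPAS');
$b = new CounterPrng('JuxJ1XLnBKk7gPAS');
var_dump($a->nextBlock() === $b->nextBlock()); // bool(true): same key, same counter
```

Two generators with the same key and counter produce the same stream, which is the property a verifiable drawing needs. It is also why the key has to stay secret until the reveal.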

Could a random sleep prevent timing attacks?

From Wikipedia
In cryptography, a timing attack is a side channel attack in which the
attacker attempts to compromise a cryptosystem by analyzing the time
taken to execute cryptographic algorithms.
Actually, to prevent timing attacks, I'm using the following function taken from this answer:
function timingSafeCompare($safe, $user) {
// Prevent issues if string length is 0
$safe .= chr(0);
$user .= chr(0);
$safeLen = strlen($safe);
$userLen = strlen($user);
// Set the result to the difference between the lengths
$result = $safeLen - $userLen;
// Note that we ALWAYS iterate over the user-supplied length
// This is to prevent leaking length information
for ($i = 0; $i < $userLen; $i++) {
// Using % here is a trick to prevent notices
// It's safe, since if the lengths are different
// $result is already non-0
$result |= (ord($safe[$i % $safeLen]) ^ ord($user[$i]));
}
// They are only identical strings if $result is exactly 0...
return $result === 0;
}
But I was wondering whether it is possible to prevent this kind of attack using a random sleep, like
function timingSafeCompare($a,$b) {
sleep(rand(0,100));
if ($a === $b) {
return true;
} else {
return false;
}
}
Or maybe augmenting the randomness of sleep
sleep(rand(1,10)+rand(1,10)+rand(1,10)+rand(1,10));
This kind of approach can totally prevent timing attacks? Or just make the work harder?
This kind of approach can totally prevent timing attacks? Or just make the work harder?
Neither. It doesn't prevent timing attacks, nor does it make them any more difficult at all.
To understand why, look at the docs for sleep. Specifically, the meaning of the first parameter:
Halt time in seconds.
So your app takes 0.3 seconds to respond without sleep. With sleep it takes either 0.3, 1.3, 2.3, etc...
So really, to get the part we care about (the timing difference), we just need to chop off the integer part:
$real_time = $time - floor($time);
But let's go a step further. Let's say that you randomly sleep using usleep. That's a lot more granular. That's sleeping in microseconds.
Well, the measurements are being made in the 15-50 nanosecond scale. So that sleep is still about 100 times less granular than the measurements being made. So we can average off to the single microsecond:
$microseconds = $time * 1000000;
$real_microseconds = $microseconds - floor($microseconds);
And still have meaningful data.
You could go further and use time_nanosleep which can sleep to nanosecond scale precision.
Then you could start fuddling with the numbers.
But the data is still there. The beauty of randomness is that you can just average it out:
$x = 15 + rand(1, 10000);
Run that enough times and you'll get a nice pretty graph. You'll tell that there are about 10000 different numbers, so you can then average away the randomness and deduce the "private" 15.
Because well-behaved randomness is unbiased, it's pretty easy to detect statistically over a large enough sample.
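A small simulation of that averaging argument, with made-up numbers: two code paths cost 15 and 25 time units, and each measurement gets a uniform random delay of up to 1000 units added on top. The function name and all constants are assumptions chosen for illustration. mt_rand() with a fixed seed is used only so the run is repeatable.

```php
<?php
// Average many noisy "timings"; the uniform random delay has the same mean for
// both paths, so it cancels out when the two averages are subtracted.
function meanObservedTime(int $secretCost, int $maxNoise, int $samples): float
{
    $sum = 0;
    for ($i = 0; $i < $samples; $i++) {
        $sum += $secretCost + mt_rand(1, $maxNoise);
    }
    return $sum / $samples;
}

mt_srand(42); // fixed seed: repeatable illustration, not a security measure
$samples = 200000;

$slowPath = meanObservedTime(25, 1000, $samples);
$fastPath = meanObservedTime(15, 1000, $samples);
$difference = $slowPath - $fastPath;

// The noise is up to 100x larger than the signal, yet the difference of the
// averages lands close to the true gap of 10 units.
printf("estimated gap: %.2f units\n", $difference);
```

More noise only means more samples are needed before the gap becomes visible.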
So the question I would ask is:
Why bother with sleep-like hacks when you can fix the problem correctly?
Anthony Ferrara answered this question in his blog post, It's All About Time. I highly recommend this article.
Many people, when they hear about timing attacks, think "Well, I'll just add a random delay! That'll work!". And it doesn't.
This is fine for a single request if the only side channel observable by the attacker is the response time.
However, if an attacker makes enough requests this random delay could average out as noted in @Scott's answer citing ircmaxell's blog post:
So if we needed to run 49,000 tests to get an accuracy of 15ns [without a random delay], then we would need perhaps 100,000 or 1,000,000 tests for the same accuracy with a random delay. Or perhaps 100,000,000. But the data is still there.
As an example, let's estimate the number of requests a timing attack would need to get a valid 160 bit Session ID like PHP at 6 bits per character which gives a length of 27 characters. Assume, like the linked answer that an attack can only be done on one user at once (as they are storing the user to lookup in the cookie).
Taking the very best case from the blog post, 100,000, the number of permutations would be 100,000 * 2^6 * 27.
On average, the attacker will find the value halfway through the number of permutations.
This gives the number of requests needed to discover the Session ID from a timing attack to be 86,400,000. This is compared to 42,336,000 requests without your proposed timing protection (assuming 15ns accuracy like the blog post).
In the blog post, taking the longest length tested, 14, took 0.01171 seconds on average, which means 86,400,000 would take 1,011,744 seconds which equates to 11 days 17 hours 2 minutes 24 seconds.
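The arithmetic behind those figures can be checked with a few lines of PHP (intdiv() needs PHP 7). The variable names are just labels for the numbers quoted above.

```php
<?php
$alphabetSize = 2 ** 6; // 6 bits per character
$length       = 27;     // characters in the session ID

// On average the attacker finds each value halfway through, hence the / 2.
$withDelay    = intdiv(100000 * $alphabetSize * $length, 2);
$withoutDelay = intdiv(49000 * $alphabetSize * $length, 2);

// 0.01171 s per request, the average quoted from the blog post.
$seconds = (int) round($withDelay * 0.01171);

$days    = intdiv($seconds, 86400);
$hours   = intdiv($seconds % 86400, 3600);
$minutes = intdiv($seconds % 3600, 60);
$rest    = $seconds % 60;

printf("%d requests with delay, %d without\n", $withDelay, $withoutDelay);
// 86400000 requests with delay, 42336000 without
printf("%d s = %d d %d h %d min %d s\n", $seconds, $days, $hours, $minutes, $rest);
// 1011744 s = 11 d 17 h 2 min 24 s
```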
Could a random sleep prevent timing attacks?
This depends on the context in which your random sleep is used, and the bit strength of the string that it is protecting. If it is for "keep me logged in" functionality, which is the context in the linked question, then it could be worth an attacker spending 11 days to use the timing attack to brute force a value. However, this is assuming perfect conditions (i.e. fairly consistent response times from your application for each string position tested and no resetting or rollover of IDs). Also, this type of activity from an attacker will create a lot of noise, and it is likely they will be spotted via IDS and IPS.
It can't entirely prevent them, but it can make them more difficult for an attacker to execute. It would be much easier and better to use something like hash-equals which would prevent timing attacks entirely assuming the string lengths are equal.
Your proposed code
function timingSafeCompare($a,$b) {
sleep(rand(0,100));
if ($a === $b) {
return true;
} else {
return false;
}
}
Note that the PHP rand function is not cryptographically secure:
Caution
This function does not generate cryptographically secure values, and should not be used for cryptographic purposes. If you need a cryptographically secure value, consider using openssl_random_pseudo_bytes() instead.
This means that in theory an attacker could predict what rand was going to generate and then use this information to determine whether the response time delay from your application was due to random sleep or not.
The best way to approach security is to assume that the attacker knows your source code - the only things secret from the attacker should be things like keys and passwords - assume that they know the algorithms and function used. If you can still say your system is secure even though an attacker knows exactly how it works, you will be most of the way there. Functions like rand are usually set to seed with the current time of day, so an attacker can just make sure their system clock is set to the same as your server and then make requests to validate that their generator is matching yours.
Due to this, it is best to avoid insecure random functions like rand and change your implementation to use openssl_random_pseudo_bytes which will be unpredictable.
Also, as per ircmaxell's comment, sleep is not granular enough as it only accepts an integer to represent the number of seconds. If you are going to try this approach look into time_nanosleep with a random number of nanoseconds.
These pointers should help secure your implementation against this type of timing attack.
This kind of approach can totally prevent timing attacks? Or just make the work harder?
ircmaxell has already answered why this only makes the work harder,
but a solution to prevent timing attacks in PHP in general would be
/**
* execute callback function in constant-time,
* or throw an exception if callback was too slow
*
* @param callable $cb
* @param float $target_time_seconds
* @throws \LogicException if the callback was too slow
* @return whatever $cb returns.
*/
function execute_in_constant_time(callable $cb, float $target_time_seconds = 0.01)
{
$start_time = microtime(true);
$ret = ($cb)();
$success = time_sleep_until($start_time + $target_time_seconds);
if ($success) {
return $ret;
}
// dammit!
$time_used = microtime(true) - $start_time;
throw new \LogicException("callback function was too slow! time expired! target_time_seconds: {$target_time_seconds} actual time used: {$time_used}");
}
using that approach, your code could be
function timingSafeCompare($a,$b, float $target_time_seconds = 0.01) {
return execute_in_constant_time(fn() => $a === $b, $target_time_seconds);
}
downside is that you should pick a number with a large margin, meaning relatively much time is lost sleeping.. fwiw on my laptop i had to use 0.2 (200 milliseconds) to compare 2x exactly-1-GiB strings, with a Core i7-8565U (a weird 2018 mid-range laptop cpu i've never heard of)
and this loop:
ini_set("memory_limit", "-1");
$s1 = "a";
$s2 = "a";
$append = str_repeat("a",100*1024);
try {
for (;;) {
$res = timingSafeCompare($s1, $s2, 0.01);
$s1 .= $append;
$s2 .= $append;
}
} catch (\Throwable $e) {
var_dump(strlen($s1));
}
craps out at about 65 megabytes/int(65126401)
(but how often do you need to constant-time-compare strings above 65MB? i imagine it's not often)
you might think "then the attacker could send a HUGE string to compare, and check how long it takes for the exception to be thrown" but i don't think that would work, === starts by checking if both strings have the same length, and short-circuits if they have different lengths, such an attack should only work if the attacker can set the length for both strings to be large enough to timeout
today we have the native hash_equals() function to compare strings that have exactly the same length, but hash_equals() will not protect you against strings of different length, while the function above will.

Explanation about constant-time algorithm and string comparison

I'm having trouble understanding the difference between two ways of comparing strings. Given is the following function, which compares two strings.
This function is used in the Symfony-Framework security component to compare passwords in the user-login process.
/**
* Compares two strings.
*
* This method implements a constant-time algorithm to compare strings.
*
* @param string $knownString The string of known length to compare against
* @param string $userInput The string that the user can control
*
* @return Boolean true if the two strings are the same, false otherwise
*/
function equals($knownString, $userInput)
{
// Prevent issues if string length is 0
$knownString .= chr(0);
$userInput .= chr(0);
$knownLen = strlen($knownString);
$userLen = strlen($userInput);
$result = $knownLen - $userLen;
// Note that we ALWAYS iterate over the user-supplied length
// This is to prevent leaking length information
for ($i = 0; $i < $userLen; $i++) {
// Using % here is a trick to prevent notices
// It's safe, since if the lengths are different
// $result is already non-0
$result |= (ord($knownString[$i % $knownLen]) ^ ord($userInput[$i]));
}
// They are only identical strings if $result is exactly 0...
return 0 === $result;
}
origin: origin snippet
I have trouble understanding the difference between the equals() function and a simple comparison with ===. I wrote a simple working example to explain my problem.
Given strings:
$password1 = 'Uif4yQZUqmCWRbWFQtdizZ9/qwPDyVHSLiR19gc6oO7QjAK6PlT/rrylpJDkZaEUOSI5c85xNEVA6JnuBrhWJw==';
$password2 = 'Uif4yQZUqmCWRbWFQtdizZ9/qwPDyVHSLiR19gc6oO7QjAK6PlT/rrylpJDkZaEUOSI5c85xNEVA6JnuBrhWJw==';
$password3 = 'iV3pT5/JpPhIXKmzTe3EOxSfZSukpYK0UC55aKUQgVaCgPXYN2SQ5FMUK/hxuj6qZoyhihz2p+M2M65Oblg1jg==';
Example 1 (act as expected)
echo $password1 === $password2 ? 'True' : 'False'; // Output: True
echo equals($password1, $password2) ? 'True' : 'False'; // Output: True
Example 2 (act as expected)
echo $password1 === $password3 ? 'True' : 'False'; // Output: False
echo equals($password1, $password3) ? 'True' : 'False'; // Output: False
I read about the Rabin-Karp algorithm, but I'm not sure whether the equals() function implements
it, and in general I didn't understand the Wikipedia article.
On the other hand, I read that the equals() function will prevent brute force attacks. Is that right? Can someone explain what the advantage of equals() is?
Or can someone give me an example where === will fail and equals() does the correct work, so I can understand the advantage?
And what does constant-time algorithm mean? I think constant-time has nothing to do with real time, or am I wrong?
This function is just a normal string comparison function. It is not Rabin Karp. It is NOT constant time, it's linear time, regardless of what the comment says. It also does not prevent brute force attacks.
How it works:
1. If the correct and user-provided passwords are of different length, make $result != 0.
2. Iterate over the user-provided password, xor each of its characters with the corresponding character of the correct password (if the correct password is shorter, keep going through it in a circle), and bitwise or each result with $result.
Since only bitwise or is used, if any of the characters are different, $result will be != 0. Step 1 is needed because otherwise, user input "abca" would be accepted if the real password was "abc".
Why such string comparison functions are sometimes used
Let's assume we compare strings the usual way, and the correct password is "bac". Let's also assume I can precisely measure how long it takes for the password check to complete.
I (the user) try a, b, c... They don't work.
Then, I try aa. The algorithm compares the first 2 letters - b vs a, sees it's wrong, and returns false.
I now try with bb. The algorithm compares b vs b, they match, so it goes on to letter #2, compares a vs b, sees it's wrong, returns false. Now, since I am able to time the execution of the algorithm precisely, I know the password starts with "b", because the second pass took more time than the first one - I know the first letter matched.
So I try ba, bb, bc... They fail.
Now I check baa, bbb, see baa runs slower so second letter is a. This way, letter by letter, I can determine the password in an O(cN) number of attempts instead of O(c^N) that brute force would take.
It usually isn't as much of a concern as this explanation might make it sound, because it is unlikely an attacker will time a string comparison to such a degree of accuracy. But sometimes it can be.
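To make the walkthrough concrete, this sketch counts how many character comparisons an early-exit comparison performs for each guess against "bac". The step count stands in for the time an attacker would measure. The function name is invented for illustration.

```php
<?php
// Counts the character comparisons an early-exit string comparison performs.
function naiveCompareSteps(string $known, string $guess): int
{
    $steps = 0;
    $len = min(strlen($known), strlen($guess));
    for ($i = 0; $i < $len; $i++) {
        $steps++;
        if ($known[$i] !== $guess[$i]) {
            break; // early exit: this is what leaks the position of the mismatch
        }
    }
    return $steps;
}

echo naiveCompareSteps('bac', 'aa'), "\n";  // 1 (first letter wrong)
echo naiveCompareSteps('bac', 'bb'), "\n";  // 2 (first letter right)
echo naiveCompareSteps('bac', 'baa'), "\n"; // 3 (first two letters right)

// Guess counts for a 26-letter alphabet and a 3-letter password:
$c = 26;
$n = 3;
echo $c * $n, " vs ", $c ** $n, "\n";       // 78 vs 17576
```

A comparison that always touches every character, such as equals() above, performs the same number of steps for every guess of a given length, so this signal disappears.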

md5(uniqid) makes sense for random unique tokens?

I want to create a token generator that generates tokens that cannot be guessed by the user and that are still unique (to be used for password resets and confirmation codes).
I often see this code; does it make sense?
md5(uniqid(rand(), true));
According to a comment uniqid($prefix, $moreEntropy = true) yields
first 8 hex chars = Unixtime, last 5 hex chars = microseconds.
I don't know how the $prefix-parameter is handled..
So if you don't set the $moreEntropy flag to true, it gives a predictable outcome.
QUESTION: But if we use uniqid with $moreEntropy, what does hashing it with md5 buy us? Is it better than:
md5(mt_rand())
edit1: I will store this token in a database column with a unique index, so I will detect collisions. Might be of interest.
rand() is a security hazard and should never be used to generate a security token: rand() vs mt_rand() (Look at the "static" like images). But neither of these methods of generating random numbers is cryptographically secure. To generate secure secrets, an application needs access to a CSPRNG provided by the platform, operating system or hardware module.
In a web application a good source for secure secrets is non-blocking access to an entropy pool such as /dev/urandom. As of PHP 5.3, PHP applications can use openssl_random_pseudo_bytes(), and the Openssl library will choose the best entropy source based on your operating system, under Linux this means the application will use /dev/urandom. This code snip from Scott is pretty good:
function crypto_rand_secure($min, $max) {
$range = $max - $min;
if ($range < 0) return $min; // not so random...
$log = log($range, 2);
$bytes = (int) ($log / 8) + 1; // length in bytes
$bits = (int) $log + 1; // length in bits
$filter = (int) (1 << $bits) - 1; // set all lower bits to 1
do {
$rnd = hexdec(bin2hex(openssl_random_pseudo_bytes($bytes)));
$rnd = $rnd & $filter; // discard irrelevant bits
} while ($rnd >= $range);
return $min + $rnd;
}
function getToken($length=32){
$token = "";
$codeAlphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
$codeAlphabet.= "abcdefghijklmnopqrstuvwxyz";
$codeAlphabet.= "0123456789";
for($i=0;$i<$length;$i++){
$token .= $codeAlphabet[crypto_rand_secure(0,strlen($codeAlphabet))];
}
return $token;
}
This is a copy of another question I found that was asked a few months before this one. Here is a link to the question and my answer: https://stackoverflow.com/a/13733588/1698153.
I do not agree with the accepted answer. According to PHP's own website "[uniqid] does not generate cryptographically secure tokens, in fact without being passed any additional parameters the return value is little different from microtime(). If you need to generate cryptographically secure tokens use openssl_random_pseudo_bytes()."
I do not think the answer could be clearer than this, uniqid is not secure.
I know the question is old, but it shows up in Google, so...
As others said, rand(), mt_rand() or uniqid() will not guarantee you uniqueness... even openssl_random_pseudo_bytes() should not be used, since it uses deprecated features of OpenSSL.
What you should use to generate random hash (same as md5) is random_bytes() (introduced in PHP7). To generate hash with same length as MD5:
bin2hex(random_bytes(16));
If you are using PHP 5.x you can get this function by including random_compat library.
Define "unique". If you mean that two tokens cannot have the same value, then hashing isn't enough - it should be backed with a uniqueness test. The fact that you supply the hash algorithm with unique inputs does not guarantee unique outputs.
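A sketch of what "random token backed by a uniqueness test" might look like, assuming PHP 7 (or the random_compat polyfill) for random_bytes(). The function name and the $isTaken callback are hypothetical. In a real application the callback would be a lookup against the table with the unique index, or the insert itself would be retried on a duplicate-key error.

```php
<?php
// Hypothetical helper: draws random tokens until one passes the uniqueness test.
function generateUniqueToken(callable $isTaken, int $bytes = 16, int $maxAttempts = 5): string
{
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        // 16 random bytes -> 32 hex characters, the same length as an MD5 hash.
        $token = bin2hex(random_bytes($bytes));
        if (!$isTaken($token)) {
            return $token;
        }
    }
    throw new RuntimeException('Could not find an unused token');
}

// In-memory stand-in for the database lookup.
$issued = [];
$token = generateUniqueToken(function ($candidate) use (&$issued) {
    return isset($issued[$candidate]);
});
$issued[$token] = true;

echo strlen($token), "\n"; // 32
```

With 128 random bits a collision is extremely unlikely, so the retry loop is mostly a safety net.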
To answer your question, the problem is you can't have a generator that is guaranteed random and unique as random by itself, i.e., md5(mt_rand()) can lead to duplicates. What you want is "random appearing" unique values. uniqid gives the unique id, rand() affixes a random number making it even harder to guess, md5 masks the result to make it yet even harder to guess. Nothing is unguessable. We just need to make it so hard that they wouldn't even want to try.
I ran into an interesting idea a couple of years ago.
Storing two hash values in the database, one generated with md5($a) and the other with sha($a). Then check if both the values are correct. Point is, if the attacker broke your md5(), he cannot break your md5 AND sha in the near future.
Problem is: how can that concept be used with the token generating needed for your problem?
First, the scope of this kind of procedure is to create a key/hash/code, that will be unique for one given database. It is impossible to create something unique for the whole world at a given moment.
That being said, you should create a plain, visible string, using a custom alphabet, and checking the created code against your database (table).
If that string is unique, then you apply a md5() to it and that can't be guessed by anyone or any script.
I know that if you dig deep into the theory of cryptographic generation you can find a lot of explanation about this kind of code generation, but when you put it to real usage it's really not that complicated.
Here's the code I use to generate a simple 10 digit unique code.
$alphabet = "aA1!bB2#cC3#dD5%eE6^fF7&gG8*hH9(iI0)jJ4-kK=+lL[mM]nN{oO}pP\qQ/rR,sS.tT?uUvV>xX~yY|zZ`wW$";
$code = '';
$lastIndex = strlen($alphabet) - 1;
for ($i = 1; $i <= 10; $i++) {
$n = rand(0, $lastIndex); // start at 0 so the first character can be picked too
$code .= $alphabet[$n];
}
And here are some generated codes, although you can run it yourself to see it work:
SpQ0T0tyO%
Uwn[MU][.
D|[ROt+Cd#
O6I|w38TRe
Of course, there can be a lot of "improvements" that can be applied to it, to make it more "complicated", but if you apply a md5() to this, it'll become, let's say "unguessable" . :)
MD5 is a decent algorithm for producing data dependent IDs. But in case you have more than one item which has the same bitstream (content), you will be producing two similar MD5 "ids".
So if you are just applying it to a rand() function, which is guaranteed not to create the same number twice, you are quite safe.
But for a stronger distribution of keys, I'd personally use SHA1 or SHAx etc'... but you will still have the problem of similar data leads to similar keys.
