Does mt_rand() generate same number twice?


I am trying to generate a unique token ID. Can I use mt_rand()?
Will mt_rand() generate the same number twice?

This is the first time ever I'm going to answer a question with just a comic, just because it's the right answer:
The original on Dilbert.com.

mt_rand will generate the same number twice, yes. Every random number generator will probably do so eventually. (Theoretically) every number has the same chance of being generated every time the generator is run. It could randomly generate the same number many times in a row. It's random.
To use a random number generator for unique ids, the probability of generating the same number twice must be so low as to be irrelevant in practice. In this regard, mt_rand is probably not sufficient. The concept of randomly generated unique ids has been formalised into UUIDs, which you should use for exactly the purpose of a universally unique id.
Take this quote:
...only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%.
Called without arguments, mt_rand returns a value between 0 and mt_getrandmax(), which is 2^31 - 1, so it can only return 2^31 unique values, which is a mere 2,147,483,648 unique values. If you were generating a billion mt_rand values every second, you'd be guaranteed a duplicate after a little over 2 seconds, and by the birthday paradox you should expect one after only some tens of thousands of values. That hopefully illustrates the difference of scale between UUIDs and mt_rand and why that matters. Even if you're generating far fewer than 1 billion ids every second, you'll still need to choose an algorithm which makes collisions practically impossible, not just unlikely.
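As a rough illustration of the UUID approach, here is a minimal sketch of a version-4 (random) UUID built on random_bytes(), which requires PHP 7 or later. The helper name uuid_v4() is made up for this example and is not a PHP built-in.

```php
<?php
// Sketch of a version-4 (random) UUID built from a CSPRNG.
// uuid_v4() is a hypothetical helper name, not a PHP built-in.
function uuid_v4(): string
{
    $bytes = random_bytes(16);                       // 128 random bits (PHP 7+)
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // set the version field to 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // set the RFC 4122 variant bits
    $hex = bin2hex($bytes);
    return sprintf(
        '%s-%s-%s-%s-%s',
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12)
    );
}

echo uuid_v4(), PHP_EOL;
```

Six of the 128 bits are fixed by the format, which leaves 122 random bits, compared with 31 for a bare mt_rand() call.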

mt_rand() will return a random number on each call.
But eventually, it will return numbers that have already been returned to you. That is because, along with being consistent with desired statistical properties of randomness, the generator has a finite (although for Mersenne Twister, a very large) periodicity. If this behaviour is undesirable then your best bet is to shuffle a unique set using that generator.
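The "shuffle a unique set" idea could look something like the sketch below. The pool size of 1000 is an arbitrary choice for illustration, and the approach only suits cases where the set of possible values is small enough to hold in memory.

```php
<?php
// Pre-generate a pool of distinct values, shuffle it once, then hand the
// values out one at a time; nothing can repeat until the pool is empty.
$pool = range(1, 1000);   // pool size is an arbitrary choice for illustration
shuffle($pool);           // shuffle() is backed by the Mersenne Twister since PHP 7.1

$first  = array_pop($pool);
$second = array_pop($pool);
echo "$first $second", PHP_EOL;
```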

mt_rand() has the following caveats:
Caution: This function does not generate cryptographically secure
values, and should not be used for cryptographic purposes. If you need
a cryptographically secure value, consider using
openssl_random_pseudo_bytes() instead.
And
Caution: The distribution of mt_rand() return values is biased towards
even numbers on 64-bit builds of PHP when max is beyond 2^32.
If you're fine with it not being completely unbiased, you should be ok.
But, if that's the case, why aren't you using uniqid?

Related

How easily will uniqid() with more entropy create a duplicate?

This might be an off-topic question, but I hope someone can answer it.
Within how many nanoseconds, milliseconds or seconds does uniqid() with more entropy run the risk of creating a duplicate?
With reference to the link below, uniqid will collide if two IDs are created in one millisecond. What about with more entropy?
(My goal is to use a small, indexable alphanumeric string as a document ID at creation, one that can be created quickly with minimal processor power and without touching the database.)
The answers here don't seem to provide an exact number:
How unique is uniqid?

From the source code, more_entropy adds nine random decimal digits, so you can expect a collision after 37,000 or so calls. (For how a billion turned into 37,000, see the birthday attack.) That of course ignores the fact that these digits are not actually random but generated by an LCG, and the same LCG is probably used in other places in the code, so the actual chance of collision is probably higher (by how much exactly, I have no idea).
Also worth noting that uniqid does not actually guarantee microsecond resolution as some PHP implementations (Windows, specifically) don't have access to a microsecond-precision clock.
In short, if you need a unique ID for anything security-sensitive, or collisions are costly, avoid uniqid. Otherwise, using it with more_entropy is probably fine (although the common pattern is to use uniqid(mt_rand(), true) to add even more extra entropy).
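To show how a billion turns into roughly 37,000, here is a small sketch of the usual birthday approximation. The function name draws_for_collision() is made up for this example, and the formula assumes all values are equally likely, which (as noted above) an LCG does not quite deliver.

```php
<?php
// Approximate number of draws from $space equally likely values at which
// the chance of at least one collision reaches $p (birthday approximation):
// n is approximately sqrt(2 * N * ln(1 / (1 - p))).
function draws_for_collision(float $space, float $p = 0.5): float
{
    return sqrt(2.0 * $space * log(1.0 / (1.0 - $p)));
}

// Nine decimal digits give 10^9 possible values.
echo round(draws_for_collision(1e9)), PHP_EOL;
```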

Is it wrong to use a hash for a unique ID?

I want to use a unique ID generated by PHP in a database table that will likely never have more than 10,000 records. I don't want the time of creation to be visible or use a purely numeric value so I am using:
sha1(uniqid(mt_rand(), true))
Is it wrong to use a hash for a unique ID? Don't all hashes lead to collisions or are the chances so remote that they should not be considered in this case?
A further point: if the number of characters to be hashed is less than the number of characters in a sha1 hash, won't it always be unique?

If you have 2 keys you will have a theoretical best case scenario of 1 in 2 ^ X probability of a collision, where X is the number of bits in your hashing algorithm. 'Best case' because the input usually will be ASCII which doesn't utilize the full charset, plus the hashing functions do not distribute perfectly, so they will collide more often than the theoretical max in real life.
To answer your final question:
A further point: if the number of characters to be hashed is less than
the number of characters in a sha1 hash, won't it always be unique?
Yeah that's true-sorta. But you would have another problem of generating unique keys of that size. The easiest way is usually a checksum, so just choose a large enough digest that the collision space will be small enough for your comfort.
As @wayne suggests, a popular approach is to concatenate microtime() to your random salt (and base64_encode the result to get a compact printable string; encoding by itself does not add entropy).
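For a sense of scale, the following sketch applies the standard birthday bound to the numbers in the question (10,000 records, a 160-bit SHA-1 digest). It assumes an ideal hash with uniformly distributed output, so real-world figures will be somewhat worse; collision_probability() is a name invented for this example.

```php
<?php
// Birthday bound: with $n keys and a $bits-bit ideal hash, the chance of
// any collision is roughly n(n-1)/2 divided by 2^bits (valid while small).
function collision_probability(int $n, int $bits): float
{
    return ($n * ($n - 1) / 2) / pow(2, $bits);
}

// 10,000 records hashed with SHA-1 (160 bits), printed in scientific notation.
printf("%.3e\n", collision_probability(10000, 160));
```

With two keys the bound reduces to the 1 in 2^X figure given above.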

How horrible would it be if two ended up the same? Murphy's Law applies - if a million to one, or even a 100,000:1 chance is acceptable, then go right ahead! The real chance is much, much smaller - but if your system will explode if it happens then your design flaw must be addressed first. Then proceed with confidence.
Here is a question/answer of what the probabilities really are: Probability of SHA1 Collisions

Use sha1(time()) instead, then you remove the random possibility of a repeating hash for as long as time can be represented shorter than the sha1 hash (likely longer than you will find a working PHP parser ;)), provided you never generate two IDs within the same second.

Computer random isn't actually random, you know?
The only true random that you can obtain from a computer, supposing you are on a Unix environment, is from /dev/random, but this is a blocking operation that depends on user interactions like moving a mouse or typing on the keyboard. Reading from /dev/urandom is less safe, but it's probably better than using just ASCII characters and gives you an instantaneous response.

sha1($ipAddress.time())
Because it's impossible for anyone to use the same IP address at the same time

How unique is uniqid?

This question isn't really a problem looking for a solution, it's more just a matter of simple curiosity. The PHP uniqid function has a more entropy flag, to make the output "more unique". This got me wondering, just how likely is it for this function to produce the same result more than once when more_entropy is true, versus when it isn't. In other words, how unique is uniqid when more_entropy is enabled, versus when it is disabled? Are there any drawbacks to having more_entropy enabled all the time?

Update, March 2014:
Firstly, it is important to note that uniqid is a bit of a misnomer as it doesn't guarantee a unique ID.
Per the PHP documentation:
WARNING!
This function does not create random nor unpredictable string. This
function must not be used for security purposes. Use cryptographically
secure random function/generator and cryptographically secure hash
functions to create unpredictable secure ID.
And
This function does not generate cryptographically secure tokens, in
fact without being passed any additional parameters the return value
is little different from microtime(). If you need to generate
cryptographically secure tokens use openssl_random_pseudo_bytes().
Setting more_entropy to true generates a more unique value, however the execution time is longer (though only by a tiny amount). According to the docs:
If set to TRUE, uniqid() will add additional entropy (using the
combined linear congruential generator) at the end of the return
value, which increases the likelihood that the result will be unique.
Note that the line says it increases the likelihood that the result will be unique, not that it will guarantee uniqueness.
You can 'endlessly' strive for uniqueness, up to a point, and enhance using any number of encryption routines, adding salts and the like- it depends on the purpose.
I'd recommend looking at the comments on the main PHP topic, notably:
http://www.php.net/manual/en/function.uniqid.php#96898
http://www.php.net/manual/en/function.uniqid.php#96549
http://www.php.net/manual/en/function.uniqid.php#95001
What I'd recommend is working out why you need uniqueness, is it for security (i.e. to add to an encryption/scrambling routine)? Also, How unique does it need to be? Finally, look at the speed consideration. Suitability will change with the underlying considerations.

Things are only unique if you check that they don't exist already. It doesn't matter what function you use to generate a 'random' string, or ID - if you don't double check that it isn't a duplicate, then there's always that chance.. ;)
While uniqid is based on the current time, the cautionary note above still applies - it just depends on where you will be using these "unique IDs". The clue to all this is where it says "more unique". Unique is unique is unique. How you can have something which is more or less unique, is a bit confusing to me!
Checking as above, and combining all this stuff will let you end up with something approaching uniqueness, but it's all relative to where the keys will be used and the context. Hope that helps!
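The "check that it doesn't exist already" idea can be sketched as a generate-and-retry loop. Here $existing is an in-memory array standing in for whatever store actually holds the issued IDs (for example a database column with a UNIQUE index), and new_unique_id() is a hypothetical helper name.

```php
<?php
// $existing stands in for the real store of issued IDs; in this sketch it
// is an in-memory array keyed by ID.
function new_unique_id(array &$existing, int $bytes = 8): string
{
    do {
        $id = bin2hex(random_bytes($bytes));   // 2 hex characters per byte
    } while (isset($existing[$id]));           // retry on the rare duplicate
    $existing[$id] = true;
    return $id;
}

$issued = [];
$a = new_unique_id($issued);
$b = new_unique_id($issued);
echo $a, ' ', $b, PHP_EOL;
```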

From the discussions about the function on the PHP manual site:
As others below note, without prefix
and without "added entropy", this
function simply returns the UNIX
timestamp with added microsecond
counter as a hex number; it's more or
less just microtime(), in hexit form.
[...]
Also worth noting is that since microtime() only works on systems that have gettimeofday() present, which Windows natively DOES NOT, uniqid() might yield just the single-second-resolution UNIX timestamp in a Windows environment.
In other words without "more_entropy", the function is absolutely horrible and should never be used, period. According to the documentation, the flag will use a "combined linear congruential generator" to "add entropy". Well, that's a pretty weak RNG. So I'd skip this function entirely and use something based on mt_rand with a good seed for things that are not security-relevant, and SHA-256 for things that are.

Without the more_entropy flag, it returns the unix timestamp with a microsecond counter, therefore if two calls get made at the same microsecond then they will return the same 'unique' id.
From there it is a question of how likely that is. The answer is, not very, but not to a discountable degree. If you need a unique id and you generate them often (or work with data generated elsewhere), don't count on it to be absolutely unique.

The relevant bit from the source code is
if (more_entropy) {
    uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec, usec, php_combined_lcg() * 10);
} else {
    uniqid = strpprintf(0, "%s%08x%05x", prefix, sec, usec);
}
So more_entropy adds nine somewhat random decimal digits (php_combined_lcg() returns a value in (0,1)) - that's 29.9 bits of entropy, tops (in reality probably less as LCG is not a cryptographically secure pseudorandom number generator).
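To make the format string concrete, here is a short sketch that takes a plain uniqid() apart again and reproduces the 29.9-bit figure. It assumes no prefix is used, so the ID is exactly 8 hex digits of seconds followed by 5 hex digits of microseconds.

```php
<?php
// Without a prefix, the first 8 hex digits are the seconds and the next 5
// the microseconds, matching the "%08x%05x" format.
$id   = uniqid();
$sec  = hexdec(substr($id, 0, 8));
$usec = hexdec(substr($id, 8, 5));
echo date('c', $sec), " +{$usec}us", PHP_EOL;

// Nine decimal digits carry at most log2(10^9) bits of entropy.
echo round(log(1e9, 2), 1), PHP_EOL;
```

This also shows why the ID reveals its creation time to anyone who sees it.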

After reading the source code of uniqid, it's clear that it works by converting the time in microseconds since 1970-01-01 00:00:00 into an ID. It also waits until a microsecond has passed.
That means in the following code:
$uniqueId = uniqid();
$uniqueId1 = uniqid();
You can be certain that $uniqueId != $uniqueId1, even without the more_entropy flag, as each ID will always be generated from a different microsecond.
If the IDs are generated on different servers, or possibly even on the same server but in different threads, then there is a chance the time in microseconds is the same, so uniqid may not be unique. If this is the case, use the more_entropy flag for an extra 29.9 bits of entropy. A collision would then be so improbable that it's probably not worth even checking whether the ID already exists.
If you are generating the IDs only on a single server without multithreaded PHP, then there's no point using the more_entropy flag; otherwise use it. If you need a cryptographically secure ID, then you should use a decent 256-bit RNG instead.

Generate cryptographically secure random numbers in php

PHP's rand() function doesn't give good random numbers. So I started to use mt_rand() which is said to give better results. But how good are these results? Are there any methods to improve them again?
My idea:
function rand_best($min, $max) {
    $generated = array();
    for ($i = 0; $i < 100; $i++) {
        $generated[] = mt_rand($min, $max);
    }
    shuffle($generated);
    $position = mt_rand(0, 99);
    return $generated[$position];
}
This should give you "perfect" random numbers, shouldn't it?

Pseudorandom number generators (PRNGs) are very complex beasts.
There are no real "perfect" random number generators -- in fact the best that can be done from mathematical functions are pseudorandom -- they seem random enough for most intents and purposes.
In fact, performing any additional actions from a number returned by a PRNG doesn't really increase its randomness, and in fact, the number can become less random.
So, my best advice is, don't mess around with values returned from a PRNG. Use a PRNG that is good enough for the intended use, and if it isn't, then find a PRNG that can produce better results, if necessary.
And frankly, it appears that the mt_rand function uses the Mersenne twister, which is a pretty good PRNG as it is, so it's probably going to be good enough for most casual use.
However, Mersenne Twister is not designed to be used in any security contexts. See this answer for a solution to use when you need randomness to ensure security.
Edit
There was a question in the comments why performing operations on a random number can make it less random. For example, some PRNGs can return more consistent, less random numbers in different parts of the bits -- the high-end can be more random than the low-end.
Therefore, in operations where the high-end is discarded, and the low end is returned, the value can become less random than the original value returned from the PRNG.
I can't find a good explanation at the moment, but I based that from the Java documentation for the Random.nextInt(int) method, which is designed to create a fairly random value in a specified range. That method takes into account the difference in randomness of the parts of the value, so it can return a better random number compared to more naive implementations such as rand() % range.
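A related weakness of the naive rand() % range approach is easy to show exhaustively. The sketch below uses a toy 3-bit source (values 0 to 7), chosen only to keep the numbers small, and reduces it to a range of 3 with the modulo operator.

```php
<?php
// Exhaustive illustration with a toy 3-bit source (values 0..7) reduced to
// a range of 3 with the modulo operator: 8 is not a multiple of 3, so the
// outcomes cannot be equally likely.
$counts = [0, 0, 0];
for ($raw = 0; $raw < 8; $raw++) {
    $counts[$raw % 3]++;
}
print_r($counts); // outcomes 0 and 1 occur three times each, outcome 2 twice
```

The same effect exists with a real generator whenever the size of its output range is not a multiple of the requested range; it is just much smaller.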

Quick answer:
PHP 7 finally adds support for cryptographically secure pseudo-random integers.
int random_int ( int $min , int $max )
There is also a polyfill for PHP 5.x.
Longer answer
There is no perfect random number generator, and computers use pseudorandom number generators to create sequences that look random. The sequences look random (and pass some randomness tests), but because an algorithm generates them, you can rerun the algorithm from exactly the same state and get the same result.
The advice from cryptography, "do not invent your own cipher", carries over to random number generators: you cannot just combine a lot of random number generators and expect to get a better generator.
One of the subsets of random number generators is cryptographically secure random number generators:
The requirements of an ordinary PRNG are also satisfied by a
cryptographically secure PRNG, but the reverse is not true. CSPRNG
requirements fall into two groups: first, that they pass statistical
randomness tests; and secondly, that they hold up well under serious
attack, even when part of their initial or running state becomes
available to an attacker
So this is pretty close to your definition of "perfect". One more time: under no condition (except to learn how cryptography works) should you try to implement one of those algorithms and use it in your system.
But luckily PHP7 has it implemented,
int random_int ( int $min , int $max )
Generates cryptographic random integers that are suitable for use
where unbiased results are critical (i.e. shuffling a Poker deck).
The sources of random are as follows:
On Windows CryptGenRandom() is used exclusively
arc4random_buf() is used if it is available (generally BSD specific)
/dev/arandom is used where available
The getrandom(2) syscall (on newer Linux kernels)
/dev/urandom is used where none of the above is available
This makes all the previous answers obsolete (and some deprecated).
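A minimal usage sketch of random_int() follows. The die roll, the alphabet and the token length of 16 are arbitrary choices for illustration.

```php
<?php
// Roll a die and build a short token from a fixed alphabet with random_int().
$roll = random_int(1, 6);   // both bounds are inclusive

$alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789';
$token = '';
for ($i = 0; $i < 16; $i++) {
    $token .= $alphabet[random_int(0, strlen($alphabet) - 1)];
}
echo $roll, ' ', $token, PHP_EOL;
```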

I'm not sure that what you've done "improves" the randomness. From what I can understand you generate 100 random numbers and then randomly pick one of them.
From what I can remember from my probability course, this probably doesn't increase the randomness, as if there is an underlying bias in the generator function (mt_rand()), then it will still be reflected somehow in the output.

In what way is mt_rand() "bad"?
For example: If it favors a certain number. Lets say mt_rand(1, 10) favours low numbers in the range, ie "1" and "2" occurs on average more than 10% each. Then your "improvement" would still suffer from the same problem.
Selecting a random number out of a faulty sequence will still be faulty.
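This can be checked with a quick experiment. In the sketch below, biased_rand() is a deliberately faulty generator invented for the demonstration, and pick_from_hundred() mirrors the structure of rand_best() from the question.

```php
<?php
// biased_rand() is a made-up generator that returns 1 half of the time and
// a uniform value in 1..10 otherwise, so 1 is heavily favoured.
function biased_rand(): int
{
    return mt_rand(0, 1) === 0 ? 1 : mt_rand(1, 10);
}

// Same idea as rand_best(): draw 100 values, then pick one at random.
function pick_from_hundred(): int
{
    $generated = [];
    for ($i = 0; $i < 100; $i++) {
        $generated[] = biased_rand();
    }
    return $generated[mt_rand(0, 99)];
}

$ones = 0;
$trials = 5000;
for ($t = 0; $t < $trials; $t++) {
    if (pick_from_hundred() === 1) {
        $ones++;
    }
}
// An unbiased source over 1..10 would give a share of about 0.10.
printf("share of 1s: %.2f\n", $ones / $trials);
```

The share of 1s stays far above 10%, because picking one element out of 100 biased draws is statistically the same as taking a single biased draw.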

<?php
function random_number(){
return 4; // return generated number
// guaranteed to be random
}
?>
All joking aside, you're getting into a philosophical question of what is "random" or what is "best". Ideally you'd want your random numbers to have few patterns in them over the course of your procedure. Generally system time is used as the seed, but I've also used the previous random number as the seed, the previous random numberth ago as the seed. The problem is, with a powerful enough computer and full knowledge of the hardware running, and generator function, you would be able to predict the entire set of numbers generated. Thus if you had a powerful enough computer (some people put God into this category) that knew all possible variables and functions of the universe you would then be able to predict every event that happened or will happen. Most random number generators are fine on their own but if you know someone who can see the patterns, more likely they are like the guy in Beautiful Mind and you should get them checked into a clinic.
By popular demand :D

I wrote a cronjob that periodically (say, once an hour) gets 1000 numbers from random.org and adds them to a PHP array. Whenever I want random numbers in my script, I use mt_rand(0, 999) to pick an index into that array. A few extra microseconds of overhead, but I get truly random numbers based on natural atmospheric noise.

It all depends on what you need that random number for :)
For me ShuffleBag is the best one :)

Edit: My comment is no longer valid. Please see the following answer: https://stackoverflow.com/a/31443898/109561
I'm guessing you're worried about the distribution of mt_rand(). I have tested it and it is very level and both bounds are inclusive.
I added my test to the comments of the documentation for mt_rand() on the php manual, but it was removed by a silly moderator due to politics that are too long winded to go into here.

If you don't like PHP's built in rand(), you probably shouldn't use their built-in shuffle() either, since it seems to be built on their rand().
I am halfway sure the "industry standard" shuffle now is the Fisher-Yates shuffle.
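For reference, a Fisher-Yates shuffle is only a few lines. The sketch below drives it with random_int() (PHP 7+) instead of rand(); fisher_yates() is an illustrative helper name, not a PHP built-in.

```php
<?php
// Fisher-Yates shuffle driven by random_int(); fisher_yates() is an
// illustrative helper name, not a PHP built-in.
function fisher_yates(array $items): array
{
    $items = array_values($items);       // work on a 0-indexed copy
    for ($i = count($items) - 1; $i > 0; $i--) {
        $j = random_int(0, $i);          // 0 <= j <= i, both ends inclusive
        [$items[$i], $items[$j]] = [$items[$j], $items[$i]];
    }
    return $items;
}

print_r(fisher_yates([1, 2, 3, 4, 5]));
```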

There is no such thing as a "perfect" random number. No matter what subjective definition of "perfect" you have. You can only achieve pseudo-random.
I was simply trying to point you in the right direction. You asked a question about perfect random numbers, even if perfect was in quotes. And yes, you can improve randomness. You can even implement heuristic or "natural" algorithms, such ideas like "atmospheric noise" -- but still, you're not perfect, not by any means.

Use /dev/random (the Linux true random number device) to seed mt_rand
<?php
// Requires the mcrypt extension ("apt-get install php5-mcrypt"), which was removed from core in PHP 7.2.
$rnd_dev = mcrypt_create_iv(4, MCRYPT_DEV_RANDOM); // read 4 bytes from /dev/random
// Pack the 4 bytes into one 32-bit integer seed.
$seed = ord(substr($rnd_dev, 0, 1)) << 24 |
    ord(substr($rnd_dev, 1, 1)) << 16 |
    ord(substr($rnd_dev, 2, 1)) << 8 |
    ord(substr($rnd_dev, 3, 1));
mt_srand($seed);
echo mt_rand();
?>
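On PHP 7 and later, where the mcrypt extension may not be available, the same seeding idea can be sketched with random_bytes(). This is a substitute technique, not what the answer above uses, it assumes a 64-bit build of PHP, and the result is still a Mersenne Twister stream that is unsuitable for security purposes.

```php
<?php
// Seed the Mersenne Twister from the operating system's CSPRNG.
// unpack('N', ...) reads 4 bytes as an unsigned 32-bit big-endian integer.
$seed = unpack('N', random_bytes(4))[1];
mt_srand($seed);
echo mt_rand(), PHP_EOL;
```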

I made a PHP class for generating random numbers and strings PHPRandomValue
It uses "mcrypt_create_iv(4, MCRYPT_DEV_URANDOM)" to generate random numbers and values. I made it while working on a crypto project because I needed a safe random value generator. Here's an example usage
$randomValue = new RandomValue;
$randomValue->randomNumber(): = -3880998
$randomValue->randomNumberBetween(1,10): = 2
$randomValue->randomTextString(): = CfCkKDHRgUULdGWcSqP4
$randomValue->randomTextString(10): = LorPIxaeEY
$randomValue->randomKey(): = C7al8tX9.gqYLf2ImVt/!$NOY79T5sNCT/6Q.$!.6Gf/Q5zpa3
$randomValue->randomKey(10): = RDV.dc6Ai/

It is not possible to generate true random numbers. The best you can hope for is pseudo-random, which is what rand() provides; your function is no closer to random than rand(). Take a look at this: http://en.wikipedia.org/wiki/Random_number_generator

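True random numbers
<?php
// Start at 1: openssl_random_pseudo_bytes() rejects lengths below 1.
for ($i = 1; $i <= 4; $i++) {
    $bytes = openssl_random_pseudo_bytes($i, $cstrong);
    $hex = bin2hex($bytes);
    echo "Lengths: Bytes: $i and Hex: " . strlen($hex) . PHP_EOL;
    var_dump($hex);
    var_dump($cstrong); // true if a cryptographically strong algorithm was used
    echo PHP_EOL;
}
?>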
and also crypto secure ;)

Although the answer was accepted years ago, I'll re-reopen it.
Since all this randomness depends on the system time, let's mess with the system time too! The amount of time an operation takes on the computer is actually rather variable (especially if other stuff is happening on that server), so if we take advantage of that with microtime... (couldn't find any portable nanotime commands)
$a = '';
for ($i = 0; $i < 9001; $i++)
{
    usleep(mt_rand(1000, 10000)); // Also eliminates timing attacks... possibly?
    $a = hash('sha512', $a . uniqid(mt_rand() . microtime(), true));
}
echo $a;
Nominally this has 207023 bits of entropy, since you're adding another 23 bits every iteration, but there's a lot of interdependencies and the final SHA-512 digest can hold at most 512 bits, so it's probably a few orders of magnitude less. Still pretty good.
Do you know of any operations on PHP that take a really random amount of time? Like... HTTP-requesting some website (other than RANDOM.org) and measuring the time it takes?

Using random.org, you can use this:
function getToken($length, $min, $max){
    // rnd=new asks random.org for fresh randomness
    $url = 'http://www.random.org/integers/?num='.$length.'&min='.$min.'&max='.$max.'&col=1&base=10&format=plain&rnd=new';
    // The plain format returns one integer per line
    $r = explode("\n", trim(file_get_contents($url)));
    $string = '';
    foreach ( $r as $char ) $string.=$char;
    return $string;
}
This should give real random numbers.

Better Random Generating PHP

I know that just using rand() is predictable, if you know what you're doing, and have access to the server.
I have a project that is highly dependent upon choosing a random number that is as unpredictable as possible. So I'm looking for suggestions, either other built-in functions or user functions, that can generate a better random number.
I used this to do a little test:
$array = array();
$i = 0;
while($i < 10000){
    $rand = rand(0, 100);
    if(!isset($array[$rand])){
        $array[$rand] = 1;
    } else {
        $array[$rand]++;
    }
    sort($array); // note: sort() reindexes the array, so counts no longer stay keyed by value
    $i++;
}
I found the results to be evenly distributed, and there is an odd pattern to the number of times each number is generated.

Adding, multiplying, or truncating a poor random source will give you a poor random result. See Introduction to Randomness and Random Numbers for an explanation.
You're right about PHP rand() function. See the second figure on Statistical Analysis for a striking illustration. (The first figure is striking, but it's been drawn by Scott Adams, not plotted with rand()).
One solution is to use a true random generator such as random.org. Another, if you're on Linux/BSD/etc. is to use /dev/random. If the randomness is mission critical, you will have to use a hardware random generator.

random.org has an API you can access via HTTP.
RANDOM.ORG is a true random number service that generates randomness
via atmospheric noise.

I would be wary of the impression of randomness: there have been many experiments where people would choose the less random distribution. It seems the mind is not very good at producing or estimating randomness.
There are good articles on randomness at Fourmilab, including another true random generator. Maybe you could get random data from both sites so if one is down you still have the other.
Fourmilab also provides a test program to check randomness. You could use it to check your various myRand() programs.
As for your last program, if you generate 10000 values, why don't you choose the final value amongst the 10 thousand? You restrict yourself to a subset. Also, it won't work if your $min and $max are greater than 10000.
Anyway, the randomness you need depends on your application. rand() will be OK for an online game, but not OK for cryptography (anything not thoroughly tested with statistical programs will not be suitable for cryptography anyway). You be the judge!
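As a very small stand-in for a proper statistical test program, a chi-square statistic against a uniform expectation can be computed in a few lines. This is only a sketch: chi_square_uniform() is a name made up here, and a real test would compare the result with the chi-square distribution for the right number of degrees of freedom.

```php
<?php
// Pearson chi-square statistic for observed counts against a uniform
// expectation; larger values mean a worse fit.
function chi_square_uniform(array $counts): float
{
    $expected = array_sum($counts) / count($counts);
    $chi = 0.0;
    foreach ($counts as $observed) {
        $chi += ($observed - $expected) ** 2 / $expected;
    }
    return $chi;
}

$counts = array_fill(0, 101, 0);   // one bucket per value in 0..100
for ($i = 0; $i < 10000; $i++) {
    $counts[mt_rand(0, 100)]++;
}
echo chi_square_uniform($counts), PHP_EOL;
```

With 101 buckets there are 100 degrees of freedom, so a statistic somewhere around 100 is what a well-behaved generator would typically produce.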

Another way of getting random numbers, similar in concept to getting UUID
PHP Version 5.3 and above
openssl_random_pseudo_bytes(...)
Or you can try the following library using RFC4122

Variation on @KG: use the milliseconds since the epoch as the seed for rand?

In PHP 7 there is a function that does exactly what you need: it generates cryptographically secure pseudo-random integers.
int random_int ( int $min , int $max )
Generates cryptographic random integers that are suitable for use
where unbiased results are critical (i.e. shuffling a Poker deck).
For a more detailed explanation of PRNGs and CSPRNGs (and their differences), as well as why your original approach is actually a bad idea, please read my other, very similar answer.
