Better Random Generating PHP - php

I know that just using rand() is predictable, if you know what you're doing, and have access to the server.
I have a project that is highly dependent upon choosing a random number that is as unpredictable as possible. So I'm looking for suggestions, either other built-in functions or user functions, that can generate a better random number.
I used this to do a little test:
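$array = array();
$i = 0;
while ($i < 10000) {
    $rand = rand(0, 100);
    if (!isset($array[$rand])) {
        $array[$rand] = 1;
    } else {
        $array[$rand]++;
    }
    $i++;
}
// Order by generated value once, after counting. sort() would renumber
// the keys and lose which value each count belongs to.
ksort($array);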
I found the results to be evenly distributed, and there is an odd pattern to the number of times each number is generated.

Adding, multiplying, or truncating a poor random source will give you a poor random result. See Introduction to Randomness and Random Numbers for an explanation.
You're right about PHP rand() function. See the second figure on Statistical Analysis for a striking illustration. (The first figure is striking, but it's been drawn by Scott Adams, not plotted with rand()).
One solution is to use a true random generator such as random.org. Another, if you're on Linux/BSD/etc. is to use /dev/random. If the randomness is mission critical, you will have to use a hardware random generator.

random.org has an API you can access via HTTP.
RANDOM.ORG is a true random number service that generates randomness
via atmospheric noise.

I would be wary of the impression of randomness: there have been many experiments where people would choose the less random distribution. It seems the mind is not very good at producing or estimating randomness.
There are good articles on randomness at Fourmilab, including another true random generator. Maybe you could get random data from both sites so if one is down you still have the other.
Fourmilab also provides a test program to check randomness. You could use it to check your various myRand() programs.
As for your last program, if you generate 10000 values, why don't you choose the final value amongst the 10 thousand? You restrict yourself to a subset. Also, it won't work if your $min and $max are greater than 10000.
Anyway, the randomness you need depends on your application. rand() will be OK for an online game, but not OK for cryptography (anything not thoroughly tested with statistical programs will not be suitable for cryptography anyway). You be the judge!

Another way of getting random numbers, similar in concept to getting UUID
PHP Version 5.3 and above
openssl_random_pseudo_bytes(...)
Or you can try the following library using RFC4122
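For readers who want to see what the RFC 4122 approach amounts to, here is a minimal sketch of a version-4 (random) UUID. It assumes PHP 7+ and uses random_bytes() rather than the openssl_random_pseudo_bytes() call named above; uuid_v4() is a hypothetical helper name, not part of any library mentioned in this answer.

```php
<?php
// Sketch of a version-4 (random) UUID in the RFC 4122 layout.
// uuid_v4() is a hypothetical helper; random_bytes() requires PHP 7+.
function uuid_v4(): string
{
    $bytes = random_bytes(16);                        // 128 bits from the OS CSPRNG
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);  // set the version nibble to 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);  // set the RFC 4122 variant bits
    // 32 hex characters split into 8 groups of 4, joined as 8-4-4-4-12
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo uuid_v4(), PHP_EOL;
```

On PHP 5.3 to 5.6 the random_bytes() call could probably be swapped for openssl_random_pseudo_bytes(16), provided its strength flag is checked.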

Variation on @KG's answer: use the milliseconds since the epoch as the seed for rand()?

As of PHP 7 there is a function that does exactly what you need: it generates cryptographically secure pseudo-random integers.
int random_int ( int $min , int $max )
Generates cryptographic random integers that are suitable for use
where unbiased results are critical (i.e. shuffling a Poker deck).
For a more detailed explanation of PRNGs and CSPRNGs (and the difference between them), as well as why your original approach is actually a bad idea, please read my other, very similar answer.
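A small usage sketch of the function described above, assuming PHP 7 or later. roll_die() is a hypothetical wrapper added only for illustration; the exception handling reflects that random_int() throws when no secure source of randomness can be found.

```php
<?php
// Roll a six-sided die with the CSPRNG-backed random_int() (PHP 7+).
// roll_die() is a hypothetical helper used only for illustration.
function roll_die(): int
{
    try {
        return random_int(1, 6);   // both bounds are inclusive
    } catch (Exception $e) {
        // random_int() throws if no secure randomness source is available.
        throw new RuntimeException('No secure random source', 0, $e);
    }
}

echo roll_die(), PHP_EOL;
```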

Related

Does mt_rand() generate same number twice?

I am trying to generate unique token id, can i use mt_rand()?
will mt_rand() generate same number twice?
This is the first time ever I'm going to answer a question with just a comic, just because it's the right answer:
The original on Dilbert.com.
mt_rand will generate the same number twice, yes. Every random number generator will probably do so eventually. (Theoretically) every number has the same chance of being generated every time the generator is run. It could randomly generate the same number many times in a row. It's random.
To use a random number generator for unique ids, the probability of generating the same number twice must be so low as to be irrelevant in practice. In this regard, mt_rand is probably not sufficient. The concept of randomly generated unique ids has been formalised into UUIDs, which you should use for exactly the purpose of a universally unique id.
Take this quote:
...only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%.
Since mt_rand returns a 32-bit integer on 32-bit systems, it can only return 2^32 unique values, which is a mere 4,294,967,296 unique values. If you were generating a billion mt_rand values every second, you're basically guaranteed a duplicate after 4 seconds. That hopefully illustrates the difference of scale between UUIDs and mt_rand and why that matters. Even if you're generating much fewer than 1 billion ids every second, you'll still need to choose an algorithm which makes collisions practically impossible, not just unlikely.
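The four-second figure is the point where a duplicate becomes unavoidable; by the birthday bound a duplicate becomes likely far sooner. The sketch below uses the standard approximation 1 - exp(-n(n-1)/(2N)) for n draws from N equally likely values; collision_probability() is a hypothetical helper name.

```php
<?php
// Birthday-bound approximation: chance of at least one duplicate among
// $n values drawn uniformly from $space possible values.
// collision_probability() is a hypothetical helper for illustration.
function collision_probability(float $n, float $space): float
{
    return 1.0 - exp(-$n * ($n - 1) / (2.0 * $space));
}

$space32 = 2 ** 32;   // number of distinct 32-bit values

// Roughly one half already after about 77,000 draws.
printf("%.3f\n", collision_probability(77164, $space32));
```

In other words, with 32-bit ids the "practically impossible" requirement fails after tens of thousands of ids, not billions.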
mt_rand() will return a random number on each call.
But eventually, it will return numbers that have already been returned to you. That is because, along with being consistent with desired statistical properties of randomness, the generator has a finite (although for Mersenne Twister, a very large) periodicity. If this behaviour is undesirable then your best bet is to shuffle a unique set using that generator.
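A minimal sketch of the "shuffle a unique set" idea, assuming the range is small enough to hold in memory. unique_random_ints() is a hypothetical name; shuffle() uses PHP's ordinary, non-cryptographic generator.

```php
<?php
// Draw $count distinct integers from [$min, $max] by shuffling the whole
// range and taking a prefix. unique_random_ints() is a hypothetical name.
function unique_random_ints(int $min, int $max, int $count): array
{
    $pool = range($min, $max);
    if ($count > count($pool)) {
        throw new InvalidArgumentException('Range too small for requested count');
    }
    shuffle($pool);                        // non-cryptographic shuffle
    return array_slice($pool, 0, $count);  // no value can appear twice
}

print_r(unique_random_ints(1, 49, 6));
```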
mt_rand() has the following caveats:
Caution: This function does not generate cryptographically secure
values, and should not be used for cryptographic purposes. If you need
a cryptographically secure value, consider using
openssl_random_pseudo_bytes() instead.
And
Caution: The distribution of mt_rand() return values is biased towards
even numbers on 64-bit builds of PHP when max is beyond 2^32.
If you're fine with it not being completely unbiased, you should be ok.
But, if that's the case, why aren't you using uniqid?

PHP shuffle vs rand - which produces better randomness

Does any one know which of the two choices for random number generation is going to produce a better randomness:
<?php
$array = array(1,2,3,4,5,6);
shuffle($array);
echo $array[0];
//// OR ////
echo rand(1,6);
?>
Or perhaps theres an even better option I am unaware of?
Both use the same PRNG so their randomness is equally good/bad. Obviously a plain rand(1,6) is faster since it's just the math and not both the math and the array stuff. Besides that, you'd use array_rand() if you wanted a random element from an array.
PHP also has the mersenne twister (mt_rand()) which is "better" but unsuitable for cryptography (probably not relevant in your case).
If you need true randomness you could consider reading from /dev/random - but that read may block if there's no randomness available. You could also use a hardware device that gives you even better randomness based on some physical effect.
On most systems, seed the random number generator with a random value. In most applications, using the low order parts of the current time (like seconds and microseconds) is a pretty good strategy. Whether that works for you depends on the application.
On a Linux system, the program can also read a few bytes from /dev/random or /dev/urandom to seed the random number generator. It is a concentration of random entropy from device drivers and other hard-to-predict phenomena.

If PHP's mt_rand() uses a faster algorithm than rand(), why not just change rand() to use the newer implementation?

The purpose of a random number function is to get -- you guessed it -- a random number, something you cannot predict (or be very hard to predict with certainty). If the mt_rand() function is faster and less predictable (more "random") than the old rand(), why not just switch the underlying implementation to the new method?
To put it another way, what kind of program that uses rand() would break in a later version of PHP if/because the underlying implementation changed?
Mainly because that's the PHP way. Just like they added mysql_real_escape_string instead of replacing mysql_escape_string with it.
However, it might also be related to the disadvantages the mersenne-twister algorithm has (I have no clue if they are also present in the rand() algorithm though):
The algorithm in its native form is not suitable for cryptography (unlike Blum Blum Shub). Observing a sufficient number of iterates (624 in the case of MT19937, since this figure is the size of the state vector from which future iterates are produced) allows one to predict all future iterates. A pair of cryptographic stream ciphers based on output from Mersenne twister has been proposed by Makoto Matsumoto et al. The authors claim speeds 1.5 to 2 times faster than Advanced Encryption Standard in counter mode. wikipedia
Another issue is that it can take a long time to turn a non-random initial state (notably the presence of many zeros) into output that passes randomness tests. A small lagged Fibonacci generator or linear congruential generator gets started much more quickly and usually is used to seed the Mersenne Twister with random initial values. wikipedia
Both algorithms are pseudo-random. That implies that knowing the initial conditions makes it possible to know all future iterations. It is impossible to know if someone relies on such implementation details (i.e. relying on the implementation of the function instead of on the intent of the function), and it is therefore safer to create a new function.
Finally, as of PHP 7.1, rand() is an alias of mt_rand(), so both use the same generator.
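The point about initial conditions can be checked directly: seeding the generator with the same value reproduces the same sequence within one PHP build. sequence_from_seed() and the seed 12345 are arbitrary choices for this sketch.

```php
<?php
// Same seed, same sequence: a PRNG is fully determined by its state.
// sequence_from_seed() is a hypothetical helper for illustration.
function sequence_from_seed(int $seed, int $length): array
{
    mt_srand($seed);                  // reset the generator state
    $out = [];
    for ($i = 0; $i < $length; $i++) {
        $out[] = mt_rand(0, 99);
    }
    return $out;
}

$first  = sequence_from_seed(12345, 5);
$second = sequence_from_seed(12345, 5);
var_dump($first === $second);         // the two runs are identical
```

Code that relies on this reproducibility (for example, tests with a fixed seed) is exactly the kind of code that can change behaviour when the underlying algorithm is swapped.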

What's the disadvantage of mt_rand?

What's the definition of bias in:
The distribution of mt_rand() return values is biased towards even numbers on 64-bit builds of PHP when max is beyond 2^32.
If it's the kind of bias stated in alternate tie-breaking rules for rounding, I don't think it really matters (since the bias is not really visible).
Besides mt_rand() is claimed to be four times faster than rand(), just by adding three chars in front!
Assuming mt_rand is available, what's the disadvantage of using it?
mt_rand uses the Mersenne Twister algorithm, which is far better than the LCG typically used by rand. For example, the period of an LCG is a measly 2^32, whereas the period of mt_rand is 2^19937 − 1. Also, all the values generated by an LCG will lie on lines or planes when plotted into a multidimensional space. Also, it is not only practically feasible, but relatively easy to determine the parameters of an LCG. The only advantage LCGs have is being potentially slightly faster, but on a scale that is completely irrelevant when coding in php.
However, mt_rand is not suitable for cryptographic purposes (generation of tokens, passwords or cryptographic keys) either.
If you need cryptographic randomness, use random_int in php 7. On older php versions, read from /dev/urandom or /dev/random on a POSIX-conforming operating system.
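For the token case specifically, a minimal sketch assuming PHP 7+: random_bytes() draws from the same kind of operating-system source as random_int(). random_token() and its default length are illustrative choices, not something prescribed by the answer above.

```php
<?php
// Hex token from the OS CSPRNG (PHP 7+).
// random_token() is a hypothetical helper; 32 bytes is an arbitrary default.
function random_token(int $bytes = 32): string
{
    return bin2hex(random_bytes($bytes));   // two hex characters per byte
}

echo random_token(), PHP_EOL;
```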
The distribution quirk that you quoted is only relevant when the random number range you're generating is larger than 2^32. That is 4294967296.
If you're working with numbers that big, and you need them to be randomised, then perhaps this is a reason to reconsider using mt_rand(). However, if you're working with numbers smaller than this, then it is irrelevant.
The reason it happens is due to the precision of the random number generator not being good enough in those high ranges.
I've never worked with random numbers that large, so I've never needed to worry about it.
The difference between rand() and mt_rand() is a lot more than "just three extra characters". They are entirely different function calls, and work in completely different ways, just as you don't expect print() and print_r() to be similar.
mt_rand() gets its name from the "Mersenne Twister" algorithm it uses to generate the random numbers. This algorithm is known to be a quick, efficient and high quality random number generator, which is why it is available in PHP.
The older rand() function uses the random number generator provided by the underlying system's C library. This means that it uses whatever random number generator happens to be the default on the operating system you're using. In general, the default random number generator uses a much slower and older algorithm, hence the claim that mt_rand() is quicker, but it will vary from system to system.
Therefore, for virtually all uses, mt_rand() is a better function to use than rand().
You say "assuming mt_rand() is available", but it always will be since it was introduced way back in PHP4.

Generate cryptographically secure random numbers in php

PHP's rand() function doesn't give good random numbers. So I started to use mt_rand() which is said to give better results. But how good are these results? Are there any methods to improve them again?
My idea:
function rand_best($min, $max) {
$generated = array();
for ($i = 0; $i < 100; $i++) {
$generated[] = mt_rand($min, $max);
}
shuffle($generated);
$position = mt_rand(0, 99);
return $generated[$position];
}
This should give you "perfect" random numbers, shouldn't it?
Pseudorandom number generators (PRNGs) are very complex beasts.
There are no real "perfect" random number generators -- in fact the best that can be done from mathematical functions are pseudorandom -- they seem random enough for most intents and purposes.
In fact, performing any additional actions from a number returned by a PRNG doesn't really increase its randomness, and in fact, the number can become less random.
So, my best advice is, don't mess around with values returned from a PRNG. Use a PRNG that is good enough for the intended use, and if it isn't, then find a PRNG that can produce better results, if necessary.
And frankly, it appears that the mt_rand function uses the Mersenne twister, which is a pretty good PRNG as it is, so it's probably going to be good enough for most casual use.
However, Mersenne Twister is not designed to be used in any security contexts. See this answer for a solution to use when you need randomness to ensure security.
Edit
There was a question in the comments why performing operations on a random number can make it less random. For example, some PRNGs can return more consistent, less random numbers in different parts of the bits -- the high-end can be more random than the low-end.
Therefore, in operations where the high-end is discarded, and the low end is returned, the value can become less random than the original value returned from the PRNG.
I can't find a good explanation at the moment, but I based that from the Java documentation for the Random.nextInt(int) method, which is designed to create a fairly random value in a specified range. That method takes into account the difference in randomness of the parts of the value, so it can return a better random number compared to more naive implementations such as rand() % range.
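A separate and easier-to-demonstrate problem with the naive rand() % range approach is modulo bias, which appears whenever the size of the source is not a multiple of the range. The sketch below counts outcomes exactly for a toy source with 8 equally likely values reduced to 6 results, then shows how rejection sampling removes the skew. Both function names are hypothetical, and this is an illustration of the general idea, not of how Java or PHP implement it.

```php
<?php
// Exact outcome counts when every source value 0..$sourceSize-1 is equally
// likely and is reduced to 0..$range-1 with the % operator.
function modulo_counts(int $sourceSize, int $range): array
{
    $counts = array_fill(0, $range, 0);
    for ($v = 0; $v < $sourceSize; $v++) {
        $counts[$v % $range]++;
    }
    return $counts;
}

// Rejection sampling: source values at or above the largest multiple of
// $range are discarded (a real generator would simply draw again).
function rejection_counts(int $sourceSize, int $range): array
{
    $limit = $sourceSize - ($sourceSize % $range);
    $counts = array_fill(0, $range, 0);
    for ($v = 0; $v < $limit; $v++) {
        $counts[$v % $range]++;
    }
    return $counts;
}

print_r(modulo_counts(8, 6));      // 0 and 1 are hit twice, 2..5 once
print_r(rejection_counts(8, 6));   // every result is hit exactly once
```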
Quick answer:
PHP 7 finally has built-in support for cryptographically secure pseudo-random integers.
int random_int ( int $min , int $max )
There is also a polyfill for PHP5x.
Longer answer
There is no perfect random number generator, and computers use pseudorandom number generators to create sequences that look random. The sequences look random (and pass some randomness tests), but because an algorithm generates them, you can rerun the algorithm from exactly the same state and get the same result.
The same advice as in cryptography, "do not invent your own cipher", applies to random number generators: you cannot just combine a lot of random number generators and expect to get a better generator.
One of the subsets of random number generators is cryptographically secure random number generators:
The requirements of an ordinary PRNG are also satisfied by a
cryptographically secure PRNG, but the reverse is not true. CSPRNG
requirements fall into two groups: first, that they pass statistical
randomness tests; and secondly, that they hold up well under serious
attack, even when part of their initial or running state becomes
available to an attacker
So this is pretty close to your definition of "perfect". Once more: under no circumstances (except to learn how cryptography works) should you try to implement one of those algorithms yourself and use it in your system.
But luckily PHP7 has it implemented,
int random_int ( int $min , int $max )
Generates cryptographic random integers that are suitable for use
where unbiased results are critical (i.e. shuffling a Poker deck).
The sources of random are as follows:
On Windows CryptGenRandom() is used exclusively
arc4random_buf() is used if it is available (generally BSD specific)
/dev/arandom is used where available
The getrandom(2) syscall (on newer Linux kernels)
/dev/urandom is used where none of the above is available
This makes all the previous answers obsolete (and some deprecated).
I'm not sure that what you've done "improves" the randomness. From what I can understand you generate 100 random numbers and then randomly pick one of them.
From what I can remember from my probability course, this probably doesn't increase the randomness, as if there is an underlying bias in the generator function (mt_rand()), then it will still be reflected somehow in the output.
In what way is mt_rand() "bad"?
For example, suppose it favours certain numbers. Let's say mt_rand(1, 10) favours low numbers in the range, i.e. "1" and "2" each occur more than 10% of the time on average. Then your "improvement" would still suffer from the same problem.
Selecting a random number out of a faulty sequence will still be faulty.
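This can be checked exactly on a toy example. The sketch below enumerates every sequence a biased generator could produce, picks one position uniformly, and reports the resulting distribution. The generator (values 1, 2, 3 with weights 3, 2, 1) and the function name pick_one_distribution() are made up for illustration.

```php
<?php
// Exact distribution of "draw $n values, then pick one position uniformly"
// for a biased toy generator given as integer weights (value => weight).
// pick_one_distribution() is a hypothetical helper for illustration.
function pick_one_distribution(array $weights, int $n): array
{
    $sequences = [[[], 1]];                    // pairs of [values so far, weight]
    for ($i = 0; $i < $n; $i++) {
        $next = [];
        foreach ($sequences as [$values, $w]) {
            foreach ($weights as $value => $weight) {
                $next[] = [array_merge($values, [$value]), $w * $weight];
            }
        }
        $sequences = $next;
    }
    $totals = array_fill_keys(array_keys($weights), 0);
    foreach ($sequences as [$values, $w]) {
        foreach ($values as $picked) {         // each position is equally likely
            $totals[$picked] += $w;
        }
    }
    $sum = array_sum($totals);
    return array_map(function ($t) use ($sum) { return $t / $sum; }, $totals);
}

// The generator itself yields 1, 2, 3 with probability 1/2, 1/3, 1/6.
print_r(pick_one_distribution([1 => 3, 2 => 2, 3 => 1], 3));
```

The output matches the generator's own probabilities, so the extra drawing and picking step leaves the bias exactly where it was.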
<?php
function random_number(){
return 4; // return generated number
// guaranteed to be random
}
?>
All joking aside, you're getting into a philosophical question of what is "random" or what is "best". Ideally you'd want your random numbers to have few patterns in them over the course of your procedure. Generally the system time is used as the seed, but I've also used the previous random number as the seed, or the number generated that many draws earlier. The problem is that with a powerful enough computer and full knowledge of the hardware and the generator function, you would be able to predict the entire set of numbers generated. Thus if you had a powerful enough computer (some people put God into this category) that knew all possible variables and functions of the universe, you would be able to predict every event that has happened or will happen. Most random number generators are fine on their own, but if you know someone who can see the patterns, more likely they are like the guy in A Beautiful Mind and you should get them checked into a clinic.
By popular demand :D
I wrote a cronjob that gets 1000 numbers from random.org periodically (say, once an hour) and adds them to a PHP array. Whenever I want a random number in my script, I use mt_rand(0, 999) to pick an index into that array. A few extra microseconds of overhead, but I get truly random numbers based on natural atmospheric noise.
It all depends on what you need the random number for :)
For me ShuffleBag is the best one :)
Edit: My comment is no longer valid. Please see the following answer: https://stackoverflow.com/a/31443898/109561
I'm guessing you're worried about the distribution of mt_rand(). I have tested it and it is very level and both bounds are inclusive.
I added my test to the comments of the documentation for mt_rand() on the php manual, but it was removed by a silly moderator due to politics that are too long winded to go into here.
If you don't like PHP's built in rand(), you probably shouldn't use their built-in shuffle() either, since it seems to be built on their rand().
I am halfway sure the "industry standard" shuffle now is the Fisher-Yates shuffle.
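For reference, a minimal sketch of a Fisher-Yates shuffle that does not depend on rand() at all. It assumes PHP 7.1+ (for random_int() and the array swap syntax); secure_shuffle() is a hypothetical helper name.

```php
<?php
// Fisher-Yates shuffle driven by random_int() instead of rand().
// secure_shuffle() is a hypothetical helper; requires PHP 7.1+.
function secure_shuffle(array $items): array
{
    $items = array_values($items);               // work on a 0-based list
    for ($i = count($items) - 1; $i > 0; $i--) {
        $j = random_int(0, $i);                  // 0 <= $j <= $i, inclusive
        [$items[$i], $items[$j]] = [$items[$j], $items[$i]];   // swap
    }
    return $items;
}

print_r(secure_shuffle(range(1, 6)));
```

Each position is swapped with a uniformly chosen position at or below it, which is what makes every permutation equally likely when the index source is unbiased.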
There is no such thing as a "perfect" random number. No matter what subjective definition of "perfect" you have. You can only achieve pseudo-random.
I was simply trying to point you in the right direction. You asked a question about perfect random numbers, even if perfect was in quotes. And yes, you can improve randomness. You can even implement heuristic or "natural" algorithms, such ideas like "atmospheric noise" -- but still, you're not perfect, not by any means.
Use /dev/random (the Linux true random number generator device) to seed mt_rand():
<?php
$rnd_dev=mcrypt_create_iv(4, MCRYPT_DEV_RANDOM); // needs the mcrypt extension ("apt-get install php5-mcrypt"); mcrypt was removed from PHP in 7.2
$seed=ord(substr($rnd_dev, 0, 1))<<24 |
ord(substr($rnd_dev, 1, 1))<<16 |
ord(substr($rnd_dev, 2, 1))<<8 |
ord(substr($rnd_dev, 3, 1));
mt_srand($seed);
echo mt_rand();
?>
I made a PHP class for generating random numbers and strings PHPRandomValue
It uses "mcrypt_create_iv(4, MCRYPT_DEV_URANDOM)" to generate random numbers and values. I made it while working on a crypto project because I needed a safe random value generator. Here's an example usage
$randomValue = new RandomValue;
$randomValue->randomNumber(): = -3880998
$randomValue->randomNumberBetween(1,10): = 2
$randomValue->randomTextString(): = CfCkKDHRgUULdGWcSqP4
$randomValue->randomTextString(10): = LorPIxaeEY
$randomValue->randomKey(): = C7al8tX9.gqYLf2ImVt/!$NOY79T5sNCT/6Q.$!.6Gf/Q5zpa3
$randomValue->randomKey(10): = RDV.dc6Ai/
It is not possible to generate true random numbers in software. The best you can hope for is pseudo-random, which is what rand() provides, and your function is no closer to random than rand(). Take a look at this: http://en.wikipedia.org/wiki/Random_number_generator
True random numbers
<?php
for ($i = 1; $i <= 4; $i++) { // a length below 1 is rejected by openssl_random_pseudo_bytes()
$bytes = openssl_random_pseudo_bytes($i, $cstrong);
$hex = bin2hex($bytes);
echo "Lengths: Bytes: $i and Hex: " . strlen($hex) . PHP_EOL;
var_dump($hex);
var_dump($cstrong);
echo PHP_EOL;
}
?>
and also crypto secure ;)
Although the answer was accepted years ago, I'll reopen it.
Since all this randomness depends on the system time, let's mess with the system time too! The amount of time an operation takes on the computer is actually rather variable (especially if other stuff is happening on that server), so if we take advantage of that with microtime... (couldn't find any portable nanotime commands)
$a='';
for ($i=0; $i<9001; $i++)
{
usleep(mt_rand(1000,10000));//Also eliminates timing attacks... possibly?
$a=hash('SHA512',$a.uniqid(mt_rand().microtime(),true));
}
echo $a;
Nominally this has 207023 bits of entropy, since you're adding another 23 bits every iteration, but there's a lot of interdependencies, so it's probably a few orders of magnitude less. Still pretty good.
Do you know of any operations on PHP that take a really random amount of time? Like... HTTP-requesting some website (other than RANDOM.org) and measuring the time it takes?
Using random.org, you can use this:
function getToken($length, $min, $max){
    // Fetch $length integers from random.org, one per line, as plain text.
    $r = explode("\n", file_get_contents('http://www.random.org/integers/?num='.$length.'&min='.$min.'&max='.$max.'&col=1&base=10&format=plain'));
    $string = '';
    foreach ( $r as $char ) $string.=$char;
    return $string;
}
This should give real random numbers.
