Generate cryptographically secure random numbers in php - php

PHP's rand() function doesn't give good random numbers. So I started to use mt_rand() which is said to give better results. But how good are these results? Are there any methods to improve them again?
My idea:
function rand_best($min, $max) {
$generated = array();
for ($i = 0; $i < 100; $i++) {
$generated[] = mt_rand($min, $max);
}
shuffle($generated);
$position = mt_rand(0, 99);
return $generated[$position];
}
This should give you "perfect" random numbers, shouldn't it?

Pseudorandom number generators (PRNG) are very complex beast.
There are no real "perfect" random number generators -- in fact the best that can be done from mathematical functions are pseudorandom -- they seem random enough for most intents and purposes.
In fact, performing any additional actions from a number returned by a PRNG doesn't really increase its randomness, and in fact, the number can become less random.
So, my best advice is, don't mess around with values returned from a PRNG. Use a PRNG that is good enough for the intended use, and if it isn't, then find a PRNG that can produce better results, if necessary.
And frankly, it appears that the mt_rand function uses the Mersenne twister, which is a pretty good PRNG as it is, so it's probably going to be good enough for most casual use.
However, Mersenne Twister is not designed to be used in any security contexts. See this answer for a solution to use when you need randomness to ensure security.
Edit
There was a question in the comments why performing operations on a random number can make it less random. For example, some PRNGs can return more consistent, less random numbers in different parts of the bits -- the high-end can be more random than the low-end.
Therefore, in operations where the high-end is discarded, and the low end is returned, the value can become less random than the original value returned from the PRNG.
I can't find a good explanation at the moment, but I based that from the Java documentation for the Random.nextInt(int) method, which is designed to create a fairly random value in a specified range. That method takes into account the difference in randomness of the parts of the value, so it can return a better random number compared to more naive implementations such as rand() % range.

Quick answer:
In a new PHP7 there is a finally a support for a cryptographically secure pseudo-random integers.
int random_int ( int $min , int $max )
There is also a polyfill for PHP5x.
Longer answer
There is no perfect random number generator, and computers use pseudorandom number generator to create sequences that looks random. The sequences look random (and pass some randomness tests) but because there is some algorithm to generate it, you can repeat algorithm with absolutely the same states and get the same result.
The same advice as with cryptography "do not invent your own cypher" can be translated to random number generators and mean that you can not just get a lot of random number generators combined together and get expect to get a better generator.
One of the subsets of random number generators is cryptographically secure random number generators:
The requirements of an ordinary PRNG are also satisfied by a
cryptographically secure PRNG, but the reverse is not true. CSPRNG
requirements fall into two groups: first, that they pass statistical
randomness tests; and secondly, that they hold up well under serious
attack, even when part of their initial or running state becomes
available to an attacker
So this is pretty close to your definition of "perfect". One more time under no condition (except of learning how to do cryptography) you should try to implement one of that algorithms and use it in your system.
But luckily PHP7 has it implemented,
int random_int ( int $min , int $max )
Generates cryptographic random integers that are suitable for use
where unbiased results are critical (i.e. shuffling a Poker deck).
The sources of random are as follows:
On Windows CryptGenRandom() is used exclusively
arc4random_buf() is used if it is available (generally BSD specific)
/dev/arandom is used where available
The getrandom(2) syscall (on newer Linux kernels)
/dev/urandom is used where none of the above is available
This makes all the previous answers obsolete (and some deprecated).

I'm not sure that what you've done "improves" the randomness. From what I can understand you generate 100 random numbers and then randomly pick one of them.
From what I can remember from my probability course, this probably doesn't increase the randomness, as if there is an underlying bias in the generator function (mt_rand()), then it will still be reflected somehow in the output.

In what way is mt_rand() "bad"?
For example: If it favors a certain number. Lets say mt_rand(1, 10) favours low numbers in the range, ie "1" and "2" occurs on average more than 10% each. Then your "improvement" would still suffer from the same problem.
Selecting a random number out of a faulty sequence will still be faulty.

<?php
function random_number(){
return 4; // return generated number
// guaranteed to be random
}
?>
All joking aside, you're getting into a philosophical question of what is "random" or what is "best". Ideally you'd want your random numbers to have few patterns in them over the course of your procedure. Generally system time is used as the seed, but I've also used the previous random number as the seed, the previous random numberth ago as the seed. The problem is, with a powerful enough computer and full knowledge of the hardware running, and generator function, you would be able to predict the entire set of numbers generated. Thus if you had a powerful enough computer (some people put God into this category) that knew all possible variables and functions of the universe you would then be able to predict every event that happened or will happen. Most random number generators are fine on their own but if you know someone who can see the patterns, more likely they are like the guy in Beautiful Mind and you should get them checked into a clinic.
By popular demand :D

I wrote a cronjob that gets 1000 numbers from random.org periodically (say, once an hour) and added them into a PHP array. Whenever I want random numbers in my script, I use mt_rand(0,1000) to call a number from that. A few extra microseconds of overhead, but I get truly random numbers based on natural atmospheric noise.

It all depends what for you need that random number :)
For me ShuffleBag is the best one :)

Edit: My comment is no longer valid. Please see the following answer: https://stackoverflow.com/a/31443898/109561
I'm guessing you're worried about the distribution of mt_rand(). I have tested it and it is very level and both bounds are inclusive.
I added my test to the comments of the documentation for mt_rand() on the php manual, but it was removed by a silly moderator due to politics that are too long winded to go into here.

If you don't like PHP's built in rand(), you probably shouldn't use their built-in shuffle() either, since it seems to be built on their rand().
I am halfway sure the "industry standard" shuffle now is the Fisher-Yates shuffle.

There is no such thing as a "perfect" random number. No matter what subjective definition of "perfect" you have. You can only achieve pseudo-random.
I was simply trying to point you in the right direction. You asked a question about perfect random numbers, even if perfect was in quotes. And yes, you can improve randomness. You can even implement heuristic or "natural" algorithms, such ideas like "atmospheric noise" -- but still, you're not perfect, not by any means.

use /dev/ramdom (linux device true random number generator) to seed mt_rand
<?
$rnd_dev=mcrypt_create_iv(4, MCRYPT_DEV_RANDOM); //need "apt-get install php5-mcrypt"
$seed=ord(substr($rnd_dev, 0, 1))<<24 |
ord(substr($rnd_dev, 1, 1))<<16 |
ord(substr($rnd_dev, 2, 1))<<8 |
ord(substr($rnd_dev, 3, 1));
mt_srand($seed);
echo mt_rand();
?>

I made a PHP class for generating random numbers and strings PHPRandomValue
It uses "mcrypt_create_iv(4, MCRYPT_DEV_URANDOM)" to generate random numbers and values. I made it while working on a crypto project because I needed a safe random value generator. Here's an example usage
$randomValue = new RandomValue;
$randomValue->randomNumber(): = -3880998
$randomValue->randomNumberBetween(1,10): = 2
$randomValue->randomTextString(): = CfCkKDHRgUULdGWcSqP4
$randomValue->randomTextString(10): = LorPIxaeEY
$randomValue->randomKey(): = C7al8tX9.gqYLf2ImVt/!$NOY79T5sNCT/6Q.$!.6Gf/Q5zpa3
$randomValue->randomKey(10): = RDV.dc6Ai/

It is not possible to generate true random numbers, the best you can hope for is pseudo-random which is what rand() provides, your function is no closer to random then rand(). Take a look at this http://en.wikipedia.org/wiki/Random_number_generator

Tru Random numbers
<?php
for ($i = -1; $i <= 4; $i++) {
$bytes = openssl_random_pseudo_bytes($i, $cstrong);
$hex = bin2hex($bytes);
echo "Lengths: Bytes: $i and Hex: " . strlen($hex) . PHP_EOL;
var_dump($hex);
var_dump($cstrong);
echo PHP_EOL;
}
?>
and also crypto secure ;)

Although the answer was accepted years ago, I'll re-reopen it.
Since all this randomness depends on the system time, let's mess with the system time too! The amount of time an operation takes on the computer is actually rather variable (especially if other stuff is happening on that server), so if we take advantage of that with microtime... (couldn't find any portable nanotime commands)
$a='';
for (int $i=0; $i<9001; $i++)
{
usleep(mt_rand(1000,10000));//Also eliminates timing attacks... possibly?
$a=hash('SHA512',$a.uniqid(mt_rand().microtime(),true));
}
echo $a;
Nominally this has 207023 bits of entropy, since you're adding another 23 bits every iteration, but there's a lot of interdependencies, so it's probably a few orders of magnitude less. Still pretty good.
Do you know of any operations on PHP that take a really random amount of time? Like... HTTP-requesting some website (other than RANDOM.org) and measuring the time it takes?

Using random.org, you can use this:
function getToken($length, $min, $max){
$r = explode('
',file_get_contents('http://www.random.org/integers/num='.$length.'&min='.$min.'&max='.$max.'&col=1&base=10&format=plain'));
$string = '';
foreach ( $r as $char ) $string.=$char;
return $string;
}
this should give real random numbers

Related

Does mt_rand() generate same number twice?

I am trying to generate unique token id, can i use mt_rand()?
will mt_rand() generate same number twice?
This is the first time ever I'm going to answer a question with just a comic, just because it's the right answer:
The original on Dilbert.com.
mt_rand will generate the same number twice, yes. Every random number generator will probably do so eventually. (Theoretically) every number has the same chance of being generated every time the generator is run. It could randomly generate the same number many times in a row. It's random.
To use a random number generator for unique ids, the probability of generating the same number twice must be so low as to be irrelevant in practice. In this regard, mt_rand is probably not sufficient. The concept of randomly generated unique ids has been formalised into UUIDs, which you should use for exactly the purpose of a universally unique id.
Take this quote:
...only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%.
Since mt_rand returns a 32-bit integer on 32-bit systems, it can only return 2^32 unique values, which is a mere 4,294,967,296 unique values. If you were generating a billion mt_rand values every second, you're basically guaranteed a duplicate after 4 seconds. That hopefully illustrates the difference of scale between UUIDs and mt_rand and why that matters. Even if you're generating much fewer than 1 billion ids every second, you'll still need to choose an algorithm which makes collisions practically impossible, not just unlikely.
mt_rand() will return a random number on each call.
But eventually, it will return numbers that have already been returned to you. That is because, along with being consistent with desired statistical properties of randomness, the generator has a finite (although for Mersenne Twister, a very large) periodicity. If this behaviour is undesirable then your best bet is to shuffle a unique set using that generator.
mt_rand() has the following caveats:
Caution: This function does not generate cryptographically secure
values, and should not be used for cryptographic purposes. If you need
a cryptographically secure value, consider using
openssl_random_pseudo_bytes() instead.
And
Caution: The distribution of mt_rand() return values is biased towards
even numbers on 64-bit builds of PHP when max is beyond 2^32.
If you're fine with it not being completely unbiased, you should be ok.
But, if that's the case, why aren't you using uniqid?

PHP shuffle vs rand - which produces better randomness

Does any one know which of the two choices for random number generation is going to produce a better randomness:
<?php
$array = array(1,2,3,4,5,6);
shuffle($array);
echo $array[0];
//// OR ////
echo rand(1,6);
?>
Or perhaps theres an even better option I am unaware of?
Both use the same PRNG so their randomness is equally good/bad. Obviously a plain rand(1,6) is faster since it's just the math and not both the math and the array stuff. Besides that, you'd use array_rand() if you wanted a random element from an array.
PHP also has the mersenne twister (mt_rand()) which is "better" but unsuitable for cryptography (probably not relevant in your case).
If you need true randomness you could consider reading from /dev/random - but that read may block if there's no randomness available. You could also use a hardware device that gives you even better randomness based on some physical effect.
On most systems, seed the random number generator with a random value. In most applications, using the low order parts of the current time (like seconds and microseconds) is a pretty good strategy. Whether that works for you depends on the application.
On a Linux system, the program can also read a few bytes from /dev/random or /dev/urandom to seed the random number generator. It is a concentration of random entropy from device drivers and other hard-to-predict phenomena.

What's the disadvantage of mt_rand?

What's the definition of bias in:
The distribution of mt_rand() return values is biased towards even numbers on 64-bit builds of PHP when max is beyond 2^32.
If it's the kind of bias stated in alternate tie-breaking rules for rounding, I don't think it really matters (since the bias is not really visible).
Besides mt_rand() is claimed to be four times faster than rand(), just by adding three chars in front!
Assuming mt_rand is available, what's the disadvantage of using it?
mt_rand uses the Mersenne Twister algorithm, which is far better than the LCG typically used by rand. For example, the period of an LCG is a measly 232, whereas the period of mt_rand is 219937 − 1. Also, all the values generated by an LCG will lie on lines or planes when plotted into a multidimensional space. Also, it is not only practically feasible, but relatively easy to determine the parameters of an LCG. The only advantage LCGs have is being potentially slightly faster, but on a scale that is completely irrelevant when coding in php.
However, mt_rand is not suitable for cryptographic purposes (generation of tokens, passwords or cryptographic keys) either.
If you need cryptographic randomness, use random_int in php 7. On older php versions, read from /dev/urandom or /dev/random on a POSIX-conforming operating system.
The distribution quirk that you quoted is only relevant when the random number range you're generating is larger than 2^32. That is 4294967296.
If you're working with numbers that big, and you need them to be randomised, then perhaps this is a reason to reconsider using mt_rand(). However if your working with numbers smaller than this, then it is irrelevant.
The reason it happens is due to the precision of the random number generator not being good enough in those high ranges.
I've never worked with random numbers that large, so I've never needed to worry about it.
The difference between rand() and mt_rand() is a lot more than "just three extra characters". They are entirely different function calls, and work in completly different ways. Just the same as you don't expect print() and print_r() to be similar.
mt_rand() gets it's name from the "Mersene Twister" algorithm it uses to generate the random numbers. This algorithm is known to be a quick, efficient and high quality random number generator, which is why it is available in PHP.
The older rand() function makes use of the operating system's random number generator by making a system call. This means that it uses whatever random number generator happens to be the default on the operating system you're using. In general, the default random number generator uses a much slower and older algorithm, hence the claim that my_rand() is quicker, but it will vary from system to system.
Therefore, for virtually all uses, mt_rand() is a better function to use than rand().
You say "assuming mt_rand() is available", but it always will be since it was introduced way back in PHP4.

Unique key generation

I looking for a way, specifically in PHP that I will be guaranteed to always get a unique key.
I have done the following:
strtolower(substr(crypt(time()), 0, 7));
But I have found that once in a while I end up with a duplicate key (rarely, but often enough).
I have also thought of doing:
strtolower(substr(crypt(uniqid(rand(), true)), 0, 7));
But according to the PHP website, uniqid() could, if uniqid() is called twice in the same microsecond, it could generate the same key. I'm thinking that the addition of rand() that it rarely would, but still possible.
After the lines mentioned above I am also remove characters such as L and O so it's less confusing for the user. This maybe part of the cause for the duplicates, but still necessary.
One option I have a thought of is creating a website that will generate the key, storing it in a database, ensuring it's completely unique.
Any other thoughts? Are there any websites out there that already do this that have some kind of API or just return the key. I found http://userident.com but I'm not sure if the keys will be completely unique.
This needs to run in the background without any user input.
There are only 3 ways to generate unique values, rather they be passwords, user IDs, etc.:
Use an effective GUID generator - these are long and cannot be shrunk. If you only use part you FAIL.
At least part of the number is sequentially generated off of a single sequence. You can add fluff or encoding to make it look less sequential. Advantage is they start short - disadvantage is they require a single source. The work around for the single source limitation is to have numbered sources, so you include the [source #] + [seq #] and then each source can generate its own sequence.
Generate them via some other means and then check them against the single history of previously generated values.
Any other method is not guaranteed. Keep in mind, fundamentally you are generating a binary number (it is a computer), but then you can encode it in Hexadecimal, Decimal, Base64, or a word list. Pick an encoding that fits your usage. Usually for user entered data you want some variation of Base32 (which you hinted at).
Note about GUIDS: They gain their strength of uniqueness from their length and the method used to generate them. Anything less than 128-bits is not secure. Beyond random number generation there are characteristics that go into a GUID to make it more unique. Keep in mind they are only practically unique, not completely unique. It is possible, although practically impossible to have a duplicate.
Updated Note about GUIDS: Since writing this I learned that many GUID generators use a cryptographically secure random number generator (difficult or impossible to predict the next number generated, and a not likely to repeat). There are actually 5 different UUID algorithms. Algorithm 4 is what Microsoft currently uses for the Windows GUID generation API. A GUID is Microsoft's implementation of the UUID standard.
Update: If you want 7 to 16 characters then you need to use either method 2 or 3.
Bottom line: Frankly there is no such thing as completely unique. Even if you went with a sequential generator you would eventually run out of storage using all the atoms in the universe, thus looping back on yourself and repeating. Your only hope would be the heat death of the universe before reaching that point.
Even the best random number generator has a possibility of repeating equal to the total size of the random number you are generating. Take a quarter for example. It is a completely random bit generator, and its odds of repeating are 1 in 2.
So it all comes down to your threshold of uniqueness. You can have 100% uniqueness in 8 digits for 1,099,511,627,776 numbers by using a sequence and then base32 encoding it. Any other method that does not involve checking against a list of past numbers only has odds equal to n/1,099,511,627,776 (where n=number of previous numbers generated) of not being unique.
Any algorithm will result in duplicates.
Therefore, might I suggest that you use your existing algorithm* and simply check for duplicates?
*Slight addition: If uniqid() can be non-unique based on time, also include a global counter that you increment after every invocation. That way something is different even in the same microsecond.
Without writing the code, my logic would be:
Generate a random string from whatever acceptable characters you like.
Then add half the date stamp (partial seconds and all) to the front and the other half to the end (or somewhere in the middle if you prefer).
Stay JOLLY!
H
If you use your original method, but add the username or emailaddress in front of the password, it will always be unique if each user only can have 1 password.
You may be interested in this article which deals with the same issue: GUIDs are globally unique, but substrings of GUIDs aren't.
The goal of this algorithm is to use the combination of time and location ("space-time coordinates" for the relativity geeks out there) as the uniqueness key. However, timekeeping is not perfect, so there's a possibility that, for example, two GUIDs are generated in rapid succession from the same machine, so close to each other in time that the timestamp would be the same. That's where the uniquifier comes in.
I usually do it like this:
$this->password = '';
for($i=0; $i<10; $i++)
{
if($i%2 == 0)
$this->password .= chr(rand(65,90));
if($i%3 == 0)
$this->password .= chr(rand(97,122));
if($i%4 == 0)
$this->password .= chr(rand(48,57));
}
I suppose there are some theoretical holes but I've never had an issue with duplication. I usually use it for temporary passwords (like after a password reset) and it works well enough for that.
As Frank Kreuger commented, go with a GUID generator.
Like this one
I'm still not seeing why the passwords have to be unique? What's the downside if 2 of your users have the same password?
This is assuming we're talking about passwords that are tied to userids, and not just unique identifiers. If that's what you're looking for, why not use GUIDs?
You might be interested in Steve Gibson's over-the-top-secure implementation of a password generator (no source, but he has a detailed description of how it works) at https://www.grc.com/passwords.htm.
The site creates huge 64-character passwords but, since they're completely random, you could easily take the first 8 (or however many) characters for a less secure but "as random as possible" password.
EDIT: from your later answers I see you need something more like a GUID than a password, so this probably isn't what you want...
I do believe that part of your issue is that you are trying to us a singular function for two separate uses... passwords and transaction_id
these really are two different problem areas and it really is not best to try to address them together.
I recently wanted a quick and simple random unique key so I did the following:
$ukey = dechex(time()) . crypt( time() . md5(microtime() + mt_rand(0, 100000)) );
So, basically, I get the unix time in seconds and add a random md5 string generated from time + random number. It's not the best, but for low frequency requests it is pretty good. It's fast and works.
I did a test where I'd generate thousands of keys and then look for repeats, and having about 800 keys per second there were no repetitions, so not bad. I guess it totally depends on mt_rand()
I use it for a survey tracker where we get a submission rate of about 1000 surveys per minute... so for now (crosses fingers) there are no duplicates. Of course, the rate is not constant (we get the submissions at certain times of the day) so this is not fail proof nor the best solution... the tip is using an incremental value as part of the key (in my case, I used time(), but could be better).
Ingoring the crypting part that does not have much to do with creating a unique value I usually use this one:
function GetUniqueValue()
{
static $counter = 0; //initalized only 1st time function is called
return strtr(microtime(), array('.' => '', ' ' => '')) . $counter++;
}
When called in same process $counter is increased so value is always unique in same process.
When called in different processes you must be really unlucky to get 2 microtime() call with the same values, think that microtime() calls usually have different values also when called in same script.
I usually do a random substring (randomize how many chars between 8 an 32, or less for user convenience) or the MD5 of some value I have gotten in, or the time, or some combination. For more randomness I do MD5 of come value (say last name) concatenate that with the time, MD5 it again, then take the random substring. Yes, you could get equal passwords, but its not very likely at all.

Better Random Generating PHP

I know that just using rand() is predictable, if you know what you're doing, and have access to the server.
I have a project that is highly dependent upon choosing a random number that is as unpredictable as possible. So I'm looking for suggestions, either other built-in functions or user functions, that can generate a better random number.
I used this to do a little test:
$i = 0;
while($i < 10000){
$rand = rand(0, 100);
if(!isset($array[$rand])){
$array[$rand] = 1;
} else {
$array[$rand]++;
}
sort($array);
$i++;
}
I found the results to be evenly distributed, and there is an odd pattern to the number of times each number is generated.
Adding, multiplying, or truncating a poor random source will give you a poor random result. See Introduction to Randomness and Random Numbers for an explanation.
You're right about PHP rand() function. See the second figure on Statistical Analysis for a striking illustration. (The first figure is striking, but it's been drawn by Scott Adams, not plotted with rand()).
One solution is to use a true random generator such as random.org. Another, if you're on Linux/BSD/etc. is to use /dev/random. If the randomness is mission critical, you will have to use a hardware random generator.
random.org has an API you can access via HTTP.
RANDOM.ORG is a true random number service that generates randomness
via atmospheric noise.
I would be wary of the impression of randomness: there have been many experiments where people would choose the less random distribution. It seems the mind is not very good at producing or estimating randomness.
There are good articles on randomness at Fourmilab, including another true random generator. Maybe you could get random data from both sites so if one is down you still have the other.
Fourmilab also provides a test program to check randomness. You could use it to check your various myRand() programs.
As for your last program, if you generate 10000 values, why don't you choose the final value amongst the 10 thousand? You restrict yourself to a subset. Also, it won't work if your $min and $max are greater than 10000.
Anyway, the randomness you need depends on your application. rand() will be OK for an online game, but not OK for cryptography (anything not thoroughly tested with statistical programs will not be suitable for cryptography anyway). You be the judge!
Another way of getting random numbers, similar in concept to getting UUID
PHP Version 5.3 and above
openssl_random_pseudo_bytes(...)
Or you can try the following library using RFC4122
Variation on #KG, using the milliseconds since EPOCH as the seed for rand?
A new PHP7 there is a function that does exactly what you needed: it generates cryptographically secure pseudo-random integers.
int random_int ( int $min , int $max )
Generates cryptographic random integers that are suitable for use
where unbiased results are critical (i.e. shuffling a Poker deck).
For a more detailed explanation about PRNG and CSPRNG (and their difference) as well as why your original approach is actually a bad idea, please read my another highly similar answer.

Categories