I am writing a raffle program where people have some tickets, which are marked by natural numbers in the range of 1 to 100 inclusive.
I use mt_rand(1,100) to generate the number of the winning ticket, and then this is outputted to the site, so everyone can see it.
Now I did a little research and found out from the Mersenne Twister wiki article that:
Observing a sufficient number of iterations (624 in the case of MT19937, since this is the size of the state vector from which future iterations are produced) allows one to predict all future iterations.
Is the current version used by mt_rand() MT19937?
If so, what can I do to make my generated numbers more cryptographically secure?
Thanks in advance :-)
The short answer:
If so, what can I do to make my generated numbers more cryptographically secure?
You can simply use a random number generator suited for this task instead of mt_rand().
When PHP 7 comes out, you can use random_int() in your projects when a cryptographically secure random number generator is needed.
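For the raffle in the question, a minimal sketch might look like the following. It assumes PHP 7 or later (or a polyfill that provides random_int()); drawWinningTicket() is a hypothetical helper name, not part of PHP.

```php
<?php
// Hypothetical helper: draw the winning ticket (1 to 100 inclusive).
// random_int() pulls from the operating system's CSPRNG and throws an
// exception if no secure source is available, so it does not silently
// fall back to a weak generator.
function drawWinningTicket($min = 1, $max = 100)
{
    return random_int($min, $max);
}

$winner = drawWinningTicket();
echo "Winning ticket: {$winner}\n";
```

Because each draw is independent of the previous ones, publishing the winning numbers does not help anyone predict the next one.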
"Okay, great, but PHP 7 isn't out yet. What do I do today?"
Well, you're in luck, you have two good options available to you.
Use RandomLib. OR
I've been working on backporting PHP 7's CSPRNG functions into PHP 5 projects. It lives on Github under paragonie/random_compat.
"I don't want to use a library; how do I safely roll my own?"
When it comes to cryptography, rolling your own implementation is usually a poor decision. In this context, "not invented here" is usually a good thing. However, if you're dead set on writing your own PHP library to securely generate random integers or strings, there are a few things to keep in mind:
Use a reliable source of randomness. In order of preference, reading from /dev/urandom should be your first choice, followed by mcrypt_create_iv() with MCRYPT_DEV_URANDOM, followed by reading from CAPICOM (Windows only), and lastly openssl_random_pseudo_bytes().
When reading from /dev/urandom, cache your file descriptors to reduce the overhead of each function invocation.
When reading from /dev/urandom, PHP will always buffer 8192 bytes of data (which, likely, you will not use). Be sure to turn read buffering off (i.e. stream_set_read_buffer($fileHandle, 0);).
Avoid any functions or operations that can leak timing information. This means, generally, you want to use bitwise operators instead of math functions (e.g. log()) or anything involving floats.
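Don't use the modulo operator to reduce a random integer to a range. This will result in a biased probability distribution. For example, reducing a random byte (0 to 255) modulo 100 yields each of 0 through 55 in three ways but each of 56 through 99 in only two.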
A good CSPRNG will not fallback to insecure results. Don't silently just use mt_rand() if no suitable CSPRNG is available; instead, throw an uncaught exception or issue a fatal error. Get the developer's attention immediately.
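The points above can be put together roughly as follows. This is only a sketch, not a reviewed implementation: secureRandomInt() is a hypothetical name, it assumes a Unix-like system where /dev/urandom is readable, and it only handles ranges that fit in 31 bits. It caches the file handle, turns read buffering off, uses a bit mask plus rejection sampling instead of the modulo operator, and throws rather than falling back to mt_rand().

```php
<?php
// Hypothetical helper: uniform integer in [$min, $max] read from /dev/urandom.
function secureRandomInt($min, $max)
{
    static $fh = null; // cached file handle, reused across calls

    if (!is_int($min) || !is_int($max) || $min > $max) {
        throw new InvalidArgumentException('min and max must be integers with min <= max');
    }
    $range = $max - $min;
    if (!is_int($range) || $range > 0x7FFFFFFF) {
        throw new InvalidArgumentException('range too large for this sketch');
    }
    if ($range === 0) {
        return $min;
    }

    if ($fh === null) {
        $handle = @fopen('/dev/urandom', 'rb');
        if ($handle === false) {
            // No insecure fallback: fail loudly instead.
            throw new RuntimeException('No secure random source available');
        }
        stream_set_read_buffer($handle, 0); // avoid buffering 8192 unused bytes
        $fh = $handle;
    }

    // Smallest all-ones bit mask that covers $range (bitwise operations only).
    $mask = $range;
    $mask |= $mask >> 1;
    $mask |= $mask >> 2;
    $mask |= $mask >> 4;
    $mask |= $mask >> 8;
    $mask |= $mask >> 16;

    // Rejection sampling: discard candidates above $range instead of
    // folding them back with %, which would bias the distribution.
    do {
        $bytes = fread($fh, 4);
        if ($bytes === false || strlen($bytes) !== 4) {
            throw new RuntimeException('Short read from /dev/urandom');
        }
        $parts = unpack('N', $bytes);
        $value = $parts[1] & $mask;
    } while ($value > $range);

    return $min + $value;
}

echo secureRandomInt(1, 100), "\n";
```

Since the mask is the next power of two minus one, fewer than half of the candidates are rejected on average, so the loop normally finishes after one or two reads.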
Sorry, but Mersenne Twister was not designed to meet cryptographic requirements. No, you cannot and should not try to fix it, because usually when non-experts try to improve cryptographic functionality, they just end up making things worse.
PHP has a long history of problems with its randomness for cryptographic purposes. I'll point out a few references for light reading:
I forgot your password: Randomness attacks against PHP applications
Cracking PHP's lcg_value()
phpwn: Attack on PHP sessions and random numbers
To my knowledge, the best option for secure (pseudo) random number generation in PHP applications is to use openssl_random_pseudo_bytes.
mt_rand, as its name suggests, is the Mersenne Twister, a non-secure random number generator. Furthermore, it is often just seeded with a specific time in ms, something that an attacker can simply guess or aim for.
You cannot make the Mersenne Twister secure. So wherever possible you should use a secure random number generator seeded by an entropy source. This entropy source is usually obtained from the operating system. An OpenSSL-based one should be preferred.
There is absolutely no reason why you would be stuck with MT. PRNGs are just algorithms. There are plenty of libraries that contain secure PRNGs.
Related
While doing some research into openssl_random_pseudo_bytes() in PHP, I noticed that in the implementation of the openssl_random_pseudo_bytes() function in PHP's source, OpenSSL's RAND_pseudo_bytes function is used to generate the return value, as opposed to RAND_bytes, which is also available in OpenSSL.
OpenSSL's documentation of these two functions is as follows:
RAND_pseudo_bytes() puts num pseudo-random bytes into buf. Pseudo-random byte sequences generated by RAND_pseudo_bytes() will be unique if they are of sufficient length, but are not necessarily unpredictable. They can be used for non-cryptographic purposes and for certain purposes in cryptographic protocols, but usually not for key generation etc.
RAND_bytes() puts num cryptographically strong pseudo-random bytes into buf. An error occurs if the PRNG has not been seeded with enough randomness to ensure an unpredictable byte sequence.
I guess my question is: why wasn't RAND_bytes used, or why isn't there also an openssl_rand_bytes() function in PHP, if it is, according to OpenSSL, more random?
Just curious. Was it a speed concern? Not reliable enough? Or was the PRNG the issue (i.e. too hard to implement when the pseudo version works fine for most purposes)?
Thanks
The choice was probably made based on practicality rather than the soundness of the crypto. If RAND_bytes() is used, the function may fail due to insufficient randomness being available. The author of the PHP code no doubt wanted to avoid the PHP function failing.
I notice though that the openssl_random_pseudo_bytes() function does have an optional crypto_strong parameter, which lets the caller know whether the returned bytes really were cryptographically strong, in the opinion of OpenSSL.
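As an illustration of that parameter, a caller can refuse any output that OpenSSL does not report as strong. This is a sketch that assumes the openssl extension is loaded; strongRandomBytes() is a hypothetical wrapper name.

```php
<?php
// Hypothetical wrapper: return $length random bytes, or fail loudly if
// OpenSSL does not report them as cryptographically strong.
function strongRandomBytes($length)
{
    $strong = false;
    // The second argument is passed by reference and set by the function.
    $bytes = openssl_random_pseudo_bytes($length, $strong);
    if ($bytes === false || $strong !== true) {
        throw new RuntimeException('OpenSSL could not supply cryptographically strong bytes');
    }
    return $bytes;
}

$salt = strongRandomBytes(16);
echo bin2hex($salt), "\n";
```

Checking the flag turns the "may silently be weak" behaviour into an explicit failure, which is closer to what RAND_bytes() would have given.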
As an aside, it is possible to configure OpenSSL with external engines and some of them (such as CHIL) use a hardware-based random source for both RAND_pseudo_bytes() and RAND_bytes() if that's what you need.
Also, on Windows the PHP code is using CryptGenRandom.
I've always been told that I should use openssl_random_pseudo_bytes when giving passwords a salt.
But what I would really love to know, is what makes it cryptographically secure. What is the internal difference between rand, mt_rand and openssl_random_pseudo_bytes?
Thanks in advance.
The differences are in short:
rand uses the libc random number generator (source), which depends on the system and is usually not cryptographically secure
mt_rand uses a known algorithm, the Mersenne Twister, hence the name; this is a fast algorithm that produces well distributed but not cryptographically-secure randoms
openssl_random_pseudo_bytes directly calls the OpenSSL system for cryptographically-secure randoms (but see the warning in the full description)
The properties of each function are described in more detail below:
rand
For rand it is stated in mt_rand:
Many random number generators of older libcs have dubious or unknown characteristics and are slow.
So for rand you'll have to take a look at your libc to figure out which random number generator is actually used. It's stated on the Mersenne Twister site that it should have comparable speed nowadays, but its characteristics depend on the system. It doesn't state how it is seeded either, meaning that you could use it for a game or such, but not for much else.
mt_rand
The Mersenne Twister is a well-known algorithm that produces rather well-distributed random numbers. It has a very long period, which means that it takes a long time before a previous state is encountered (if this happens, the generator stays in a loop; the size of the loop is called the period). MT is not secure because it is possible to reconstruct its internal state given enough output. This means that if you first generate a key and then use the algorithm for something else, an attacker may recreate the key given enough output. Furthermore, a non-secure seed such as the system time is used upon creation.
openssl_random_pseudo_bytes
OpenSSL's random number generator is usually cryptographically secure (see note below); this means that it is not possible to re-calculate the internal state given the output of the generator.
OpenSSL's pseudo random number generator is constructed using a hash function, currently MD5, which should still be secure for generating random numbers. It is well distributed and - like the MT algorithm - has a high period. OpenSSL's rand is much slower than MT, but it should still get a rather good speed.
It has the advantage over OS random number generators that it does not need additional threads or system calls. OpenSSL uses the operating system random number generator (+ possible other sources) to create the initial seed. The OS random generators are normally the best possible random number generators available, as the OS has access to sources of entropy not directly available to libraries and applications.
Warning: on the Wiki of OpenSSL it is stated that:
RAND_pseudo_bytes returns pseudo-random bytes which can be cryptographically strong. The function returns 1 if the bytes are cryptographically strong, and 0 otherwise. If your application has high integrity requirements, it should not use RAND_pseudo_bytes.
Which is reflected by the PHP function:
If passed into the function, this will hold a boolean value that determines if the algorithm used was "cryptographically strong", e.g., safe for usage with GPG, passwords, etc. TRUE if it did, otherwise FALSE
This means it may still be insecure for e.g. long term keys.
Warning #2: additional insight shows that the PRNG of OpenSSL may not always be secure regardless of the return value. So additional care should be taken before choosing OpenSSL.
I know PHP's mt_rand() should not be used for security purposes as its results are not cryptographically strong. Yet a lot of PHP code does just that, or uses it as a fallback if better sources of randomness are not available.
So how bad is it? What sources of randomness does mt_rand use for seeding? And are there other security problems with mt_rand for cryptographic applications?
In PHP 5.4, mt_rand is automatically seeded the first time it's used (PHP source). The seed value is a function of the current timestamp, the PHP process PID and a value produced by PHP's internal LCG. I didn't check the source for previous versions of PHP, but the documentation implies that this seeding algorithm has been in use starting from PHP 5.2.1.
The RNG algorithm behind mt_rand is the Mersenne Twister. It doesn't really make sense to talk about "how bad" it is, because it's clearly documented (not on the PHP docs page, unfortunately) that it is entirely unsuitable for cryptographic applications. If you want crypto-strength randomness, use a documented crypto-strength generator.
Update: You might also want to look at this question from crypto.SE.
The purpose of a random number function is to get -- you guessed it -- a random number, something you cannot predict (or that is very hard to predict with certainty). If the mt_rand() function is faster and less predictable (more "random") than the old rand(), why not just switch the underlying implementation to the new method?
To put it another way, what kind of program that uses rand() would break in a later version of PHP if/because the underlying implementation changed?
Mainly because that's the PHP way. Just like they added mysql_real_escape_string instead of replacing mysql_escape_string with it.
However, it might also be related to the disadvantages of the Mersenne Twister algorithm (I have no clue whether they are also present in the rand() algorithm, though):
The algorithm in its native form is not suitable for cryptography (unlike Blum Blum Shub). Observing a sufficient number of iterates (624 in the case of MT19937, since this figure is the size of the state vector from which future iterates are produced) allows one to predict all future iterates. A pair of cryptographic stream ciphers based on output from Mersenne twister has been proposed by Makoto Matsumoto et al. The authors claim speeds 1.5 to 2 times faster than Advanced Encryption Standard in counter mode. wikipedia
Another issue is that it can take a long time to turn a non-random initial state (notably the presence of many zeros) into output that passes randomness tests. A small lagged Fibonacci generator or linear congruential generator gets started much more quickly and usually is used to seed the Mersenne Twister with random initial values. wikipedia
Both algorithms are pseudo-random. That implies that knowing the initial conditions makes it possible to know all future iterations. It is impossible to know if someone relies on such implementation details (i.e. relying on the implementation of the function instead of on the intent of the function), and it is therefore safer to create a new function.
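The point about initial conditions is easy to demonstrate: seeding mt_rand() with the same value reproduces the same "random" sequence. A small sketch (the seed 12345 and the helper name raffleSequence() are arbitrary choices for illustration):

```php
<?php
// Hypothetical helper: the first $count outputs of mt_rand(1, 100)
// after seeding the generator with $seed.
function raffleSequence($seed, $count)
{
    mt_srand($seed);
    $out = array();
    for ($i = 0; $i < $count; $i++) {
        $out[] = mt_rand(1, 100);
    }
    return $out;
}

$first  = raffleSequence(12345, 5);
$second = raffleSequence(12345, 5);

// Same seed, same sequence: anyone who learns or guesses the seed
// can reproduce every number that follows.
var_dump($first === $second); // bool(true)
```

Code that relies on this reproducibility (for example, replaying a seeded simulation) is exactly the kind of code that could break if the implementation behind a function were swapped.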
Finally, as of PHP 7.1, rand() and mt_rand() are effectively the same function: rand() was made an alias of mt_rand().
In my application, I require a function that generates unpredictable random values that differ on every call, for example inside a fast loop.
On Linux, which is the platform I will release my script on (it will run under SSL in PHP), I plan to combine multiple facilities to ensure that a seed or hash is completely random: querying /dev/random, possibly combined with OpenSSL's facilities, and including system-specific values such as the script's last-modified and creation times.
I am using these specific values because, even if person A had the script and knew the methods, they would not be able to guess them (the /dev/random contents, the memory usage at that moment, probably the modification time, etc.) and so could not realistically reduce the security of user B running the same script.
On Windows, which unfortunately I must develop on for the moment (I still test on Linux, but less often), I need random values like those described above, just to provide at least limited protection against someone predicting the seeds or keys.
As a first attempt I tried memory_get_usage() (with and without the optional true parameter for PHP's 'true' memory usage), and the values seem to remain very static even when each iteration performs a fair amount of memory-heavy computation.
Would it be wise to use this (somewhat dynamic) memory usage as a seed for a PRNG, to generate more random numbers quickly? Or, since memory usage covers such a limited range, could an attacker just try 2^xx seeds and roughly guess it? I am starting to blur the line of what is realistically random, and whether my operations could be guessed even if they are not really that random.
The standard equivalent of the /dev/random (or the generally recommended /dev/urandom) Unix device on Windows is the CryptGenRandom function from CryptoAPI.
In PHP, you should be able to use mcrypt_create_iv() with MCRYPT_DEV_URANDOM, which uses /dev/urandom on Unix and (apparently) CryptGenRandom on Windows.
A Mersenne Twister (what mt_rand uses) is a good algorithm for non-security purposes but it shouldn't be used for security. Wikipedia: Mersenne Twister:
"The algorithm in its native form is not suitable for cryptography... Observing a sufficient number of iterates (624 in the case of MT19937) allows one to predict all future iterates."
Instead it's just as simple to just take the output of a counter, concatenate (or XOR) it with some salt, and hash it with a cryptographically secure hash algorithm like SHA-2. If no one knows your salt, it will be absolutely secure. The salt is then equivalent to Mersenne's seed.
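A rough sketch of that counter-plus-salt idea is shown below. It is an illustration of the approach described here, not a vetted CSPRNG design: the class name HashCounterGenerator, the 16-byte minimum and the 64-bit counter encoding are all assumptions made for the example, and everything depends on the salt being secret and unpredictable (here it is taken from random_bytes(), which assumes PHP 7 or later).

```php
<?php
// Illustrative only: block i = SHA-256(salt || counter_i).
final class HashCounterGenerator
{
    private $salt;
    private $counter = 0;

    public function __construct($salt)
    {
        if (!is_string($salt) || strlen($salt) < 16) {
            throw new InvalidArgumentException('salt must be at least 16 bytes');
        }
        $this->salt = $salt;
    }

    // Returns 32 raw bytes and advances the counter.
    public function nextBlock()
    {
        // 'J' packs the counter as a fixed-width 64-bit big-endian integer,
        // so the salt/counter boundary is never ambiguous.
        $block = hash('sha256', $this->salt . pack('J', $this->counter), true);
        $this->counter++;
        return $block;
    }
}

// Assumption: the secret salt comes from the operating system's CSPRNG.
$gen = new HashCounterGenerator(random_bytes(32));
echo bin2hex($gen->nextBlock()), "\n";
echo bin2hex($gen->nextBlock()), "\n";
```

Anyone who learns the salt can regenerate every block, so in practice the hard part is obtaining the salt, which is why reading the bytes directly from the operating system is usually the simpler choice.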
I'm no expert on where to get good random salt on Windows, but you can always concatenate (or XOR) things like system time, memory usage, etc, and hash that with SHA-2. You can even reach outside to a place like Random.org for some true random numbers (if you don't call it too often). The best part about combining sources of randomness with SHA-2 is that every additional source can only add randomness, not subtract it.
Why not just use something like?
mt_rand({min}, {max});
More info here: http://php.net/manual/en/function.mt-rand.php