What makes openssl_random_pseudo_bytes "cryptographically secure"?

What makes openssl_random_pseudo_bytes "cryptographically secure"? - php

I've always been told that I should use openssl_random_pseudo_bytes when giving passwords a salt.
But what I would really love to know, is what makes it cryptographically secure. What is the internal difference between rand, mt_rand and openssl_random_pseudo_bytes?
Thanks in advance.

The differences are in short:
rand uses the libc random number generator (source), which depends on the system and is usually not cryptographically secure
mt_rand uses a known algorithm, the Mersenne Twister, hence the name; this is a fast algorithm that produces well distributed but not cryptographically-secure randoms
openssl_random_pseudo_bytes directly calls the OpenSSL system for cryptographically-secure randoms (but see the warning in the full description)
The properties are also listed in the table below:
rand
For rand it is stated in mt_rand:
Many random number generators of older libcs have dubious or unknown characteristics and are slow.
So for rand you'll have to take a look at your libc to figure out which random is actually used. It's stated on the Mersenne Twister site that it should have comparable speed nowadays, but it's characteristics depends on the system. It doesn't state how it is seeded either, meaning that you could use it for a game or such, but not for much else.
mt_rand
The Mersenne Twister is a well known algorithm that produces rather well distributed random numbers. It has a very long period, which means that it takes a long time before a previous state is encountered (if this happens it stays in loop, the size of the loop is called the period). MT is not secure because it is possible to reconstruct its secure state given enough data. This means that if you first generate a key, and then use the algorithm for something else, then an attacker may recreate the key given enough output. Furthermore, a non-secure seed as the system time is used upon creation.
openssl_random_pseudo_bytes
OpenSSL's random number generator is usually cryptographically secure (see note below); this means that it is not possible to re-calculate the internal state given the output of the generator.
OpenSSL's pseudo random number generator is constructed using a hash function, currently MD5, which should still be secure for generating random numbers. It is well distributed and - like the MT algorithm - has a high period. OpenSSL's rand is much slower than MT, but it should still get a rather good speed.
It has the advantage over OS random number generators that it does not need additional threads or system calls. OpenSSL uses the operating system random number generator (+ possible other sources) to create the initial seed. The OS random generators are normally the best possible random number generators available, as the OS has access to sources of entropy not directly available to libraries and applications.
Warning: on the Wiki of OpenSSL it is stated that:
RAND_pseudo_bytes returns pseudo-random bytes which can be cryptographically strong. The function returns 1 if the bytes are cryptographically strong, and 0 otherwise. If your application has high integrity requirements, it should not use RAND_pseudo_bytes.
Which is reflected by the PHP function:
If passed into the function, this will hold a boolean value that determines if the algorithm used was "cryptographically strong", e.g., safe for usage with GPG, passwords, etc. TRUE if it did, otherwise FALSE
This means it may still be insecure for e.g. long term keys.
Warning #2: additional insight shows that the PRNG of OpenSSL may not always be secure regardless of the return value. So additional care should be taken before choosing OpenSSL.

Related

Making mt_rand() as secure as possible

I am writing a raffle program where people have some tickets, which are marked by natural numbers in the range of 1 to 100 inclusive.
I use mt_rand(1,100) to generate the number of the winning ticket, and then this is outputted to the site, so everyone can see it.
Now I did a little research and found out from the Merseene wiki article that:
Observing a sufficient number of iterations (624 in the case of MT19937, since this is the size of the state vector from which future iterations are produced) allows one to predict all future iterations.
Is the current version used by mt_rand() MT19937?
If so, what can I do to make my generated numbers more cryptographically secure?
Thanks in advance :-)

The short answer:
If so, what can I do to make my generated numbers more cryptographically secure?
You can simply use a random number generator suited for this task instead of mt_rand().
When PHP 7 comes out, you can use random_int() in your projects when a cryptographically secure random number generator is needed.
"Okay, great, but PHP 7 isn't out yet. What do I do today?"
Well, you're in luck, you have two good options available to you.
Use RandomLib. OR
I've been working on backporting PHP 7's CSPRNG functions into PHP 5 projects. It lives on Github under paragonie/random_compat.
"I don't want to use a library; how do I safely roll my own?"
When it comes to cryptography, rolling your own implementation is usually a poor decision. "Not invented here," is usually a good thing. However, if you're dead set on writing your own PHP library to securely generate random integers or strings, there are a few things to keep in mind:
Use a reliable source of randomness. In order of preference, reading from /dev/urandom should be your first choice, followed by mcrypt_create_iv() with MCRYPT_DEV_URANDOM, followed by reading from CAPICOM (Windows only), and lastly openssl_random_pseudo_bytes().
When reading from /dev/urandom, cache your file descriptors to reduce the overhead of each function invocation.
When reading from /dev/urandom, PHP will always buffer 8192 bytes of data (which, likely, you will not use). Be sure to turn read buffering off (i.e. stream_set_read_buffer($fileHandle, 0);).
Avoid any functions or operations that can leak timing information. This means, generally, you want to use bitwise operators instead of math functions (e.g. log()) or anything involving floats.
Don't use the modulo operator to reduce a random integer to a range. This will result in a biased probability distribution:
A good CSPRNG will not fallback to insecure results. Don't silently just use mt_rand() if no suitable CSPRNG is available; instead, throw an uncaught exception or issue a fatal error. Get the developer's attention immediately.

Sorry, but Mersenne Twister was not designed to meet cryptographic requirements. No, you cannot and should not try to fix it, because usually when non-experts try to improve cryptographic functionality, they just end up making things worse.
Php has a long history of problems with its randomness for cryptographic purposes. I'll point out a few references for light reading:
I forgot your password: Randomness attacks against PHP applications
Cracking PHP's lcg_value()
phpwn: Attack on PHP sessions and random numbers
To my knowledge, the best option for secure (pseudo) random number generation in PhP applications is to use openssl_random_pseudo_bytes.

mt_rand by its very name is the Mersenne Twister, a non secure random number generator. Furthermore it is often just seeded with a specific time in ms, something that an attacker can simply guess or aim for.
You cannot make the Mersenne Twister secure. So if anywhere possible you should use a secure random number generator seeded by an entropy source. This entropy source is usually obtained from the operating system. An OpenSSL based one should be preferred.
There is absolutely no reason why you would be stuck with MT. PRNG's are just algorithms. There are plenty of libraries that contain secure PRNG's.

Is openssl_random_pseudo_bytes truly random?

I'm wondering about why this function has an "pseudo" on it's name. Can I trust that those bytes are really random? I didn't find any explanation about this in php's manual.
openssl_random_pseudo_bytes

There are two kinds of interpretation of the word "Pseudo" in Pseudo Random Number Generator (PRNG). It can either mean that the results may not be fully random. It may also mean that the random number generator itself is deterministic. This is what meant by pseudo in the OpenSSL sense.
Good Deterministic Random Bit Generators (DRBG) - just another term for PRNG - have a very high cycle time. That means that the output should be indistinguishable from random as long as there is enough entropy in the given seed data - it will take forever before it starts to repeat itself. Most DRBG's of libraries / systems are seeded when they are initialized, again this is true for the OpenSSL one as well.
Often "true random" is meant for random values directly retrieved from sources of entropy. Quite often these sources do provide enough entropy, but that does not necessarily mean that the entropy is well distributed. Hence it is often better and faster to use the entropy of the true random source to seed a DRBG.
Now the random sources are often only available to the operating system (as they are often linked to low level I/O operations). Hence the random seed is first used for random pools in the OS (e.g. /dev/random) which can in turn be used for seeding the DRBG's. These pools are often already whitened to create a better distribution.
So you would get: entropy -> os entropy pool -> openssl random_pseudo_bytes -> PHP wrapper -> your application.
When on embedded systems you may want to heed the following warning:
It also indicates if a cryptographically strong algorithm was used to produce the pseudo-random bytes, and does this via the optional crypto_strong parameter. It's rare for this to be FALSE, but some systems may be broken or old.
Otherwise it is probably the most reliable source for random numbers.

PHP: openssl_random_pseudo_bytes() and Crypto Security vs Extreme Randomness

Doing some research into openssl_random_pseudo_bytes() in PHP and I noticed that in the implementation of the of the openssl_random_pseudo_bytes()function in PHP's source. OpenSSL's RAND_pseudo_bytes function is used to generate the return value as opposed to RAND_bytes also available in OpenSSL.
OpenSSL's documentation of these two functions are as follows:
RAND_pseudo_bytes() puts num pseudo-random bytes into buf.
Pseudo-random byte sequences generated by RAND_pseudo_bytes() will be
unique if they are of sufficient length, but are not necessarily
unpredictable. They can be used for non-cryptographic purposes and for
certain purposes in cryptographic protocols, but usually not for key
generation etc.
RAND_bytes() puts num cryptographically strong pseudo-random bytes
into buf. An error occurs if the PRNG has not been seeded with enough
randomness to ensure an unpredictable byte sequence.
I guess my question is why wasn't RAND_bytes used or why isn't there also a openssl_rand_bytes() function in PHP if it is, according to OpenSSL, more random.
Just curious. Was it a speed concern? Not reliable enough? Or was the PRNG the issue (ie: to hard to implement when the pseudo works fine for most purposes) ?
Thanks

The choice was probably made based on practicality rather than the soundness of the crypto. If RAND_bytes() is used, the function may fail due to insufficient randomness being available. The author of the PHP code no doubt wanted to avoid the PHP function failing.
I notice though that the openssl_random_pseudo_bytes() function does have an optional crypto_strong parameter, which lets the caller know whether the returned bytes really were cryptographically strong, in the opinion of OpenSSL.
As an aside, it is possible to configure OpenSSL with external engines and some of them (such as CHIL) use a hardware-based random source for both RAND_pseudo_bytes() and RAND_bytes() if that's what you need.
Also, on Windows the PHP code is using CryptGenRandom.

What's the disadvantage of mt_rand?

What's the definition of bias in:
The distribution of mt_rand() return values is biased towards even numbers on 64-bit builds of PHP when max is beyond 2^32.
If it's the kind of bias stated in alternate tie-breaking rules for rounding, I don't think it really matters (since the bias is not really visible).
Besides mt_rand() is claimed to be four times faster than rand(), just by adding three chars in front!
Assuming mt_rand is available, what's the disadvantage of using it?

mt_rand uses the Mersenne Twister algorithm, which is far better than the LCG typically used by rand. For example, the period of an LCG is a measly 232, whereas the period of mt_rand is 219937 − 1. Also, all the values generated by an LCG will lie on lines or planes when plotted into a multidimensional space. Also, it is not only practically feasible, but relatively easy to determine the parameters of an LCG. The only advantage LCGs have is being potentially slightly faster, but on a scale that is completely irrelevant when coding in php.
However, mt_rand is not suitable for cryptographic purposes (generation of tokens, passwords or cryptographic keys) either.
If you need cryptographic randomness, use random_int in php 7. On older php versions, read from /dev/urandom or /dev/random on a POSIX-conforming operating system.

The distribution quirk that you quoted is only relevant when the random number range you're generating is larger than 2^32. That is 4294967296.
If you're working with numbers that big, and you need them to be randomised, then perhaps this is a reason to reconsider using mt_rand(). However if your working with numbers smaller than this, then it is irrelevant.
The reason it happens is due to the precision of the random number generator not being good enough in those high ranges.
I've never worked with random numbers that large, so I've never needed to worry about it.
The difference between rand() and mt_rand() is a lot more than "just three extra characters". They are entirely different function calls, and work in completly different ways. Just the same as you don't expect print() and print_r() to be similar.
mt_rand() gets it's name from the "Mersene Twister" algorithm it uses to generate the random numbers. This algorithm is known to be a quick, efficient and high quality random number generator, which is why it is available in PHP.
The older rand() function makes use of the operating system's random number generator by making a system call. This means that it uses whatever random number generator happens to be the default on the operating system you're using. In general, the default random number generator uses a much slower and older algorithm, hence the claim that my_rand() is quicker, but it will vary from system to system.
Therefore, for virtually all uses, mt_rand() is a better function to use than rand().
You say "assuming mt_rand() is available", but it always will be since it was introduced way back in PHP4.

A safe random number, such as an integer from /dev/random, for Windows platform?

In my application, I require a function to generate (unpredictably) random values that differ each time when called such as inside a fast loop.
On Linux platforms which is the platform I will release my script (of which shall be run under SSL in PHP) I will combine possibly multiple facilities to ensure a seed or hash is completely random, by querying /dev/random, possibly combined with OpenSSL's facilities and including system-specific values such as script last modified and creation time.
I am using these specific values, as even if person A had the script and knows the methods, they would not be able to guess the (/dev/random contents, memory usage at the moment, modification time likely, etc) and will not realistically be able to reduce the security of user B running the same script.
On the Windows platform which unfortunately I must develop on for the moment (I still test on Linux, but less often) I require random values of which I described above, just to provided at least limited protection from predicting the seeds or keys.
I had tried as a first attempt using memory_get_usage() (with or without available true parameter for 'true' memory usage for PHP) and it seems that the values remain very static even when each iteration performs a fair amount of memory heavy computation.
Would it maybe be wise to use this (somewhat dynamic) memory usage as a seed, for a PRNG to generate more (quickly) random numbers? Or would the fact that memory is such a limited range they could just create 2^xx seeds and roughly guess it.. I am starting to blur the line of what is realistically random, if it is even possible to guess my operations even if they are 'not' really that random.

The standard equivalent of the /dev/random (or the generally recommended /dev/urandom) Unix device on Windows is the CryptGenRandom function from CryptoAPI.
In PHP, you should be able to use mcrypt_create_iv() with MCRYPT_DEV_URANDOM, which uses /dev/urandom on Unix and (apparently) CryptGenRandom on Windows.

A Mersenne Twister (what mt_rand uses) is a good algorithm for non-security purposes but it shouldn't be used for security. Wikipedia: Mersenne Twister:
"The algorithm in its native form is not suitable for cryptography... Observing a sufficient number of iterates (624 in the case of MT19937) allows one to predict all future iterates."
Instead it's just as simple to just take the output of a counter, concatenate (or XOR) it with some salt, and hash it with a cryptographically secure hash algorithm like SHA-2. If no one knows your salt, it will be absolutely secure. The salt is then equivalent to Mersenne's seed.
I'm no expert on where to get good random salt on Windows, but you can always concatenate (or XOR) things like system time, memory usage, etc, and hash that with SHA-2. You can even reach outside to a place like Random.org for some true random numbers (if you don't call it too often). The best part about combining sources of randomness with SHA-2 is that every additional source can only add randomness, not subtract it.

Why not just use something like?
mt_rand({min}, {max});
More info here: http://php.net/manual/en/function.mt-rand.php

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.