Gathering entropy in web apps to create (more) secure random numbers - php

After several days of research and discussion I came up with this method to gather entropy from visitors (you can see the history of my research here).
When a user visits, I run this code:
$entropy=sha1(microtime().$pepper.$_SERVER['REMOTE_ADDR'].$_SERVER['REMOTE_PORT'].
$_SERVER['HTTP_USER_AGENT'].serialize($_POST).serialize($_GET).serialize($_COOKIE));
Note: $pepper is a per-site/setup random string set by hand.
Then I execute the following (My)SQL query:
$query="update `crypto` set `value`=sha1(concat(`value`, '$entropy')) where name='entropy'";
That means we combine the entropy of the visitor's request with the entropy already gathered from earlier requests.
That's all.
Then, when we want to generate random numbers, we combine the gathered entropy with the output:
$query="select `value` from `crypto` where `name`='entropy'";
//...
extract(unpack('Nrandom', pack('H*', sha1(mt_rand(0, 0x7FFFFFFF).$entropy.microtime()))));
Note: the last line is part of a modified version of phpseclib's crypt_random function.
Please tell me your opinion about the scheme, and share other ideas/info regarding entropy gathering/random number generation.
PS: I know about randomness sources like /dev/urandom.
This system is just an auxiliary system or (when we don't have (access to) these sources) a fallback scheme.

In the best scenario, your biggest danger is a local user exploiting an information disclosure. In the worst scenario, the whole world can predict your data. Anyone with access to the same resources you have (the same log files, the same network devices, the same border gateway, or the same line that runs between you and your remote connections) is in a position to sniff your traffic by unwinding your random number generator.
How would they do it? Why, basic application of information theory and a bit of knowledge of cryptography, of course!
You don't have a wrong idea, though! Seeding your PRNG with real sources of randomness is generally quite useful to prevent the above attacks from happening. For example, this same level of attack can be exploited by someone that understands how /dev/random gets populated on a per-system basis if the system has low entropy or its sources of randomness are reproducible.
If you can sufficiently secure the processes that seed your pool of entropy (for example, by gathering data from multiple sources over secure lines), the likelihood that someone is able to listen in becomes smaller and smaller as you get closer and closer to the desirable cryptographic qualities of a one-time pad.
In other words, don't do this in PHP, using a single source of randomness fed into a single Mersenne twister. Do it properly, by reading from your best, system-specific alternative to /dev/random, seeding its entropy pool from as many secure, distinct sources of "true" randomness as possible. I understand you've stated that these sources of randomness are inaccessible, but this notion is strange when similar functions are afforded to all major operating systems. So, I suppose I find the concept of an "auxiliary system" in this context to be dubious.
This will still be vulnerable to an attack by a local user cognizant of your sources of entropy, but securing the machine and increasing the true entropy within /dev/random will make it far more difficult for them to do their dirty work short of a man-in-the-middle attack.
As for cases where /dev/random is indeed accessible, you can seed it fairly easily:
Look at what options exist on your system for using /dev/hw_random
Embrace rngd (or a good alternative) for defining your sources of randomness
Use rng-tools for inspecting and improving your randomness profile
And finally, if you need a good, strong source of randomness, consider investing in more specialized hardware.
Best of luck in securing your application.
PS: You may want to give questions like this a spin at Security.SE and Cryptography.SE in the future!

Use Random.Org
If you need truly random numbers, use random.org. These numbers are generated via atmospheric noise. Besides a library for PHP, it also has an HTTP interface which allows you to get truly random numbers with simple requests:
https://www.random.org/integers/?num=10&min=1&max=6&col=1&base=10&format=plain&rnd=new
This means that you can simply retrieve the real random numbers in PHP without any additional PECL extension on the server.
If you don't like other users being able to "steal" your random numbers (as MrGomez argues), just use HTTPS with certificate checking. Here follows an example with HTTPS certificate checking:
$url = "https://www.random.org/integers/?num=10&min=1&max=6&col=1&base=10&format=plain&rnd=new";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
$response = curl_exec($ch);
if ($response === FALSE)
echo "http request failed: " . curl_error($ch);
else
echo $response;
curl_close($ch);
If you need more information on how to create https requests:
Make a HTTPS request through PHP and get response
http://unitstep.net/blog/2009/05/05/using-curl-in-php-to-access-https-ssltls-protected-sites/
More on security
Again, some might argue that if the attacker queries random.org at the same time as you, he might get the same numbers and predict yours. I don't know if random.org would even work this way, but if you are really concerned, you may lessen the chance by fooling the attacker with dummy requests which you throw away, or by using only a certain part of the random numbers you get.
As MrGomez notes in his comment, this should not be considered an ultimate solution to security, but only one of several possible sources of entropy.
Performance
Of course, if you need very low latency, then doing one random.org request per client request might not be the best idea... but what about doing one bigger request to pre-cache the random numbers, say every 5 minutes?

To come to the point, as far as I know there is no way to generate entropy inside a PHP script; sorry for this non-answer. Even if you look at well-established scripts like phpass, you will see that their fallback system cannot do magic.
The question is whether you should try it anyway or not. Since you want to publish your system under the GPL, you probably don't know in what scenario it will be used. In my opinion it's best, then, to require a random source or to fail fast (die with an appropriate error message), so a developer who wants to use your system knows immediately that there is a problem.
To read from the random source, you could call the mcrypt_create_iv() function...
$randomBinaryString = mcrypt_create_iv($length, MCRYPT_DEV_URANDOM);
...this function reads from the random pool of the operating system. Since PHP 5.3 it does it on Windows servers as well, so you can leave it to PHP to handle the random source.
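On PHP 7 and later, the same fail-fast idea can be sketched with the built-in random_bytes() and random_int(), which draw from the operating system's random source and throw when none is usable (the mcrypt extension itself was deprecated in PHP 7.1). This is a minimal sketch, not the answer's code; the helper name requireRandomBytes() is hypothetical:

```php
<?php
// Hypothetical helper: return $length random bytes or fail fast with a clear error.
function requireRandomBytes(int $length): string
{
    try {
        return random_bytes($length);
    } catch (\Exception $e) {
        // random_bytes() throws an Exception when no suitable source exists.
        throw new \RuntimeException('No secure random source available: ' . $e->getMessage(), 0, $e);
    }
}

$token = bin2hex(requireRandomBytes(16)); // 16 bytes rendered as 32 hex characters
$roll  = random_int(1, 6);                // uniformly distributed integer in [1, 6]
```

A caller that cannot continue without randomness simply lets the RuntimeException propagate, which gives the developer the immediate error described above.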

If you have access to /dev/urandom you can use this:
function getRandData($length = 1024) {
$randf = fopen('/dev/urandom', 'r');
$data = fread($randf, $length);
fclose($randf);
return $data;
}
UPDATE:
of course you should have some backup in case opening the device fails
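One way the backup could look, as a sketch only: check every step of the device read, and fall back to random_bytes() (an assumption: PHP 7 or later) instead of returning short or empty data. The function name getRandDataSafe() is hypothetical:

```php
<?php
// Sketch: read from /dev/urandom, fall back to random_bytes() (PHP 7+),
// and never return fewer bytes than requested.
function getRandDataSafe(int $length = 32): string
{
    // @ suppresses the warning if the device is missing or blocked by open_basedir.
    $randf = @fopen('/dev/urandom', 'rb');
    if ($randf !== false) {
        // Turn off PHP's read buffering so only $length bytes are consumed.
        stream_set_read_buffer($randf, 0);
        $data = fread($randf, $length);
        fclose($randf);
        if ($data !== false && strlen($data) === $length) {
            return $data;
        }
    }
    // Backup path: PHP's own CSPRNG wrapper; it throws if no source is available.
    return random_bytes($length);
}
```

If both paths fail, the exception from random_bytes() surfaces to the caller, which is usually preferable to silently continuing with weak data.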

Should you have access to the client side, you can enable mouse movement tracking; this is what TrueCrypt uses for an extra level of entropy.

As I have said before, my rand function is a modified version of phpseclib's crypt_random function.
You can see it in the link given in my first post. At least the author of the phpseclib cryptographic library confirmed it; is that not enough for ordinary apps? I'm not speaking of extreme/theoretical security, just of practical security to the extent really needed and at the same time 'easily'/'sufficiently low cost' available for almost all ordinary applications on the web.
phpseclib's crypt_random effectively and silently falls back to mt_rand (which, as you should know, is really weak) in the worst case (no openssl_random_pseudo_bytes or urandom available), but my function uses a much more secure scheme in such cases. It is just a fallback to a scheme whose output is much harder to brute-force or predict, and which (should be) in practice sufficient for all ordinary apps/sites. It uses possible (in practice very likely and hard to predict/circumvent) extra entropy that is gathered over time and quickly becomes almost impossible for outsiders to know. It adds this possible entropy to mt_rand's output (and also to the output of the other sources: urandom, openssl_random_pseudo_bytes, mcrypt_create_iv). If you are informed, you should know that this entropy can be added but not subtracted. In the (almost surely really rare) worst case, that extra entropy would be 0 or some tiny amount. In the mediocre case, which I think covers almost all cases, it would be even more than practically necessary, I think. (I have studied cryptography extensively, so when I say "I think", it is based on a much more informed and scientific analysis than that of ordinary programmers.)
See the full code of my modified crypt_random:
function crypt_random($min = 0, $max = 0x7FFFFFFF)
{
if ($min == $max) {
return $min;
}
global $entropy;
if (function_exists('openssl_random_pseudo_bytes')) {
// openssl_random_pseudo_bytes() is slow on windows per the following:
// http://stackoverflow.com/questions/1940168/openssl-random-pseudo-bytes-is-slow-php
if ((PHP_OS & "\xDF\xDF\xDF") !== 'WIN') { // PHP_OS & "\xDF\xDF\xDF" == strtoupper(substr(PHP_OS, 0, 3)), but a lot faster
extract(unpack('Nrandom', pack('H*', sha1(openssl_random_pseudo_bytes(4).$entropy.microtime()))));
return abs($random) % ($max - $min) + $min;
}
}
// see http://en.wikipedia.org/wiki//dev/random
static $urandom = true;
if ($urandom === true) {
// Warnings will be output unless the error suppression operator is used. Errors such as
// "open_basedir restriction in effect", "Permission denied", "No such file or directory", etc.
$urandom = @fopen('/dev/urandom', 'rb');
}
if (!is_bool($urandom)) {
extract(unpack('Nrandom', pack('H*', sha1(fread($urandom, 4).$entropy.microtime()))));
// say $min = 0 and $max = 3. if we didn't do abs() then we could have stuff like this:
// -4 % 3 + 0 = -1, even though -1 < $min
return abs($random) % ($max - $min) + $min;
}
if(function_exists('mcrypt_create_iv') and version_compare(PHP_VERSION, '5.3.0', '>=')) {
@$tmp16=mcrypt_create_iv(4, MCRYPT_DEV_URANDOM);
if($tmp16!==false) {
extract(unpack('Nrandom', pack('H*', sha1($tmp16.$entropy.microtime()))));
return abs($random) % ($max - $min) + $min;
}
}
/* Prior to PHP 4.2.0, mt_srand() had to be called before mt_rand() could be called.
Prior to PHP 5.2.6, mt_rand()'s automatic seeding was subpar, as elaborated here:
http://www.suspekt.org/2008/08/17/mt_srand-and-not-so-random-numbers/
The seeding routine is pretty much ripped from PHP's own internal GENERATE_SEED() macro:
http://svn.php.net/viewvc/php/php-src/tags/php_5_3_2/ext/standard/php_rand.h?view=markup */
static $seeded;
if (!isset($seeded) and version_compare(PHP_VERSION, '5.2.5', '<=')) {
$seeded = true;
mt_srand(fmod(time() * getmypid(), 0x7FFFFFFF) ^ fmod(1000000 * lcg_value(), 0x7FFFFFFF));
}
extract(unpack('Nrandom', pack('H*', sha1(mt_rand(0, 0x7FFFFFFF).$entropy.microtime()))));
return abs($random) % ($max - $min) + $min;
}
$entropy contains my extra entropy which comes from all requests parameters' entropy combined till now + current request's parameters entropy + the entropy of a random string (*) set by hand at the installation time.
*: length: 22, composed of lower and uppercase letters + numbers (more than 128 bits of entropy)
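The "more than 128 bits" figure can be checked with a short calculation. It only holds under the assumption that each of the 22 characters is chosen uniformly at random from the 62 possible symbols; a string typed by hand may carry considerably less:

```php
<?php
// Entropy of a uniformly random string: length * log2(alphabet size).
$alphabetSize = 26 + 26 + 10;   // lowercase + uppercase + digits = 62 symbols
$length       = 22;
$bits         = $length * log($alphabetSize, 2);

printf("%.2f bits\n", $bits);   // roughly 131 bits, i.e. just above 128
```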

Update 2: Code review warning to everyone: don't use the code in the original question. It's a security liability. If this code is online anywhere, remove it, as it opens the whole system, network and database to a malevolent user. You're exposing not only your code but all of your users' data.
Do not ever serialize user inputs. If your code is already doing it, stop your server and change your code. This is a great example of why you should not do crypto by yourself.
Update 1: For real security you need unguessable randomness in your entropy. A suitable option for adding entropy, as your question refers to, is to use the delta of your script's execution time, not microtime() by itself, because the delta depends on the load of your server. It is therefore a combination of the hardware environment, temperature, network load, power load, disk access, CPU usage and voltage fluctuation, which together are unpredictable.
Using time(), a timestamp or microtime() is a flaw in your implementation.
Script execution delta example code coming:
@martinstoeckli stated correctly that suitable random generation for crypto comes from
mcrypt_create_iv($lengthinbytes, MCRYPT_DEV_URANDOM);
but that is outside the requirement of not having a crypto module.
In SQL, use RAND() in conjunction with your generated number.
http://www.tutorialspoint.com/mysql/mysql-rand-function.htm
PHP offers the rand() function as well:
http://php.net/manual/en/function.rand.php
They won't give you the same number, so you could use both.

mt_rand() should be used, not rand().

Related

Could a random sleep prevent timing attacks?

From Wikipedia
In cryptography, a timing attack is a side channel attack in which the
attacker attempts to compromise a cryptosystem by analyzing the time
taken to execute cryptographic algorithms.
Actually, to prevent timing attacks, I'm using the following function taken from this answer:
function timingSafeCompare($safe, $user) {
// Prevent issues if string length is 0
$safe .= chr(0);
$user .= chr(0);
$safeLen = strlen($safe);
$userLen = strlen($user);
// Set the result to the difference between the lengths
$result = $safeLen - $userLen;
// Note that we ALWAYS iterate over the user-supplied length
// This is to prevent leaking length information
for ($i = 0; $i < $userLen; $i++) {
// Using % here is a trick to prevent notices
// It's safe, since if the lengths are different
// $result is already non-0
$result |= (ord($safe[$i % $safeLen]) ^ ord($user[$i]));
}
// They are only identical strings if $result is exactly 0...
return $result === 0;
}
But I was wondering whether it is possible to prevent this kind of attack using a random sleep, like
function timingSafeCompare($a,$b) {
sleep(rand(0,100));
if ($a === $b) {
return true;
} else {
return false;
}
}
Or maybe augmenting the randomness of sleep
sleep(rand(1,10)+rand(1,10)+rand(1,10)+rand(1,10));
This kind of approach can totally prevent timing attacks? Or just make the work harder?
This kind of approach can totally prevent timing attacks? Or just make the work harder?
Neither. It doesn't prevent timing attacks, nor does it make them any more difficult at all.
To understand why, look at the docs for sleep. Specifically, the meaning of the first parameter:
Halt time in seconds.
So your app takes 0.3 seconds to respond without sleep. With sleep it takes either 0.3, 1.3, 2.3, etc...
So really, to get the part we care about (the timing difference), we just need to chop off the integer part:
$real_time = $time - floor($time);
But let's go a step further. Let's say that you randomly sleep using usleep. That's a lot more granular. That's sleeping in microseconds.
Well, the measurements are being made in the 15-50 nanosecond scale. So that sleep is still about 100 times less granular than the measurements being made. So we can average off to the single microsecond:
$microseconds = $time * 1000000;
$real_microseconds = $microseconds - floor($microseconds);
And still have meaningful data.
You could go further and use time_nanosleep which can sleep to nanosecond scale precision.
Then you could start fuddling with the numbers.
But the data is still there. The beauty of randomness is that you can just average it out:
$x = 15 + rand(1, 10000);
Run that enough times and you'll get a nice pretty graph. You'll tell that there are about 10000 different numbers, so you can then average away the randomness and deduce the "private" 15.
Because well-behaved randomness is unbiased, it's pretty easy to detect statistically over a large enough sample.
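The averaging argument can be tried out with a small simulation. This is only a sketch under stated assumptions: the noise range is 1..1000 rather than the 1..10000 above so that a modest sample count is enough, the generator is seeded only to make the run repeatable, and the function name estimateSecret() is hypothetical:

```php
<?php
// Simulate a secret timing value hidden behind a uniform random delay,
// then recover it by averaging many observations.
function estimateSecret(int $secret, int $samples, int $noiseMax): float
{
    $sum = 0;
    for ($i = 0; $i < $samples; $i++) {
        // What the attacker observes: the secret plus random noise.
        $sum += $secret + mt_rand(1, $noiseMax);
    }
    // The mean of uniform noise on 1..$noiseMax is ($noiseMax + 1) / 2,
    // so subtracting it from the observed mean leaves the secret.
    return $sum / $samples - ($noiseMax + 1) / 2;
}

mt_srand(42); // fixed seed, only so the sketch is repeatable
$estimate = estimateSecret(15, 200000, 1000);
printf("estimated secret: %.1f\n", $estimate);
```

The estimate tightens as the sample count grows: the error of the mean shrinks with the square root of the number of samples, which is why a larger delay range only raises the number of requests needed.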
So the question I would ask is:
Why bother with sleep-like hacks when you can fix the problem correctly?
Anthony Ferrara answered this question in his blog post, It's All About Time. I highly recommend this article.
Many people, when they hear about timing attacks, think "Well, I'll just add a random delay! That'll work!". And it doesn't.
This is fine for a single request if the only side channel observable by the attacker is the response time.
However, if an attacker makes enough requests this random delay could average out, as noted in @Scott's answer citing ircmaxell's blog post:
So if we needed to run 49,000 tests to get an accuracy of 15ns [without a random delay], then we would need perhaps 100,000 or 1,000,000 tests for the same accuracy with a random delay. Or perhaps 100,000,000. But the data is still there.
As an example, let's estimate the number of requests a timing attack would need to obtain a valid 160-bit session ID like PHP's, at 6 bits per character, which gives a length of 27 characters. Assume, like the linked answer, that the attack can only be done on one user at a time (as they are storing the user to look up in the cookie).
Taking the very best case from the blog post, 100,000, the number of permutations would be 100,000 * 2^6 * 27.
On average, the attacker will find the value halfway through the number of permutations.
This gives the number of requests needed to discover the Session ID from a timing attack to be 86,400,000. This is compared to 42,336,000 requests without your proposed timing protection (assuming 15ns accuracy like the blog post).
In the blog post, taking the longest length tested, 14, took 0.01171 seconds on average, which means 86,400,000 would take 1,011,744 seconds which equates to 11 days 17 hours 2 minutes 24 seconds.
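The arithmetic above can be reproduced in a few lines. The inputs (100,000 and 49,000 tests per guess, 0.01171 seconds per request) are the figures quoted from the blog post; the variable names are only illustrative:

```php
<?php
$charset       = 2 ** 6;   // 6 bits per character = 64 possible values
$length        = 27;       // characters in a 160-bit ID (160 / 6, rounded up)
$secondsPerReq = 0.01171;  // average request time quoted from the blog post

// Expected requests: tests per guess * guesses per position * positions,
// halved because on average the value is found halfway through.
$withDelay    = intdiv(100000 * $charset * $length, 2);
$withoutDelay = intdiv(49000 * $charset * $length, 2);

$seconds = (int) round($withDelay * $secondsPerReq);
printf("%d requests with delay, %d without\n", $withDelay, $withoutDelay);
printf("%dd %dh %dm %ds\n",
    intdiv($seconds, 86400),
    intdiv($seconds % 86400, 3600),
    intdiv($seconds % 3600, 60),
    $seconds % 60
);
```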
Could a random sleep prevent timing attacks?
This depends on the context in which your random sleep is used, and the bit strength of the string that it is protecting. If it is for "keep me logged in" functionality, which is the context in the linked question, then it could be worth an attacker spending 11 days using the timing attack to brute-force a value. However, this assumes perfect conditions (i.e. fairly consistent response times from your application for each string position tested and no resetting or rollover of IDs). Also, this type of activity from an attacker will create a lot of noise, and it is likely to be spotted via IDS and IPS.
It can't entirely prevent them, but it can make them more difficult for an attacker to execute. It would be much easier and better to use something like hash_equals(), which would prevent timing attacks entirely, assuming the string lengths are equal.
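A minimal sketch of that alternative, assuming PHP 5.6 or later for hash_equals(). Hashing both values first is an addition of this sketch, not something the answer prescribes: it makes the compared strings the same length, so the equal-length assumption holds. The function name tokenMatches() is hypothetical:

```php
<?php
// hash_equals(known_string, user_string) compares in time that does not
// depend on where the first differing byte is.
function tokenMatches(string $stored, string $supplied): bool
{
    // SHA-256 digests are always 64 hex characters, so both arguments
    // have equal length regardless of what the user sent.
    return hash_equals(hash('sha256', $stored), hash('sha256', $supplied));
}

var_dump(tokenMatches('s3cr3t-token', 's3cr3t-token')); // bool(true)
var_dump(tokenMatches('s3cr3t-token', 's3cr3t-tokeX')); // bool(false)
```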
Your proposed code
function timingSafeCompare($a,$b) {
sleep(rand(0,100));
if ($a === $b) {
return true;
} else {
return false;
}
}
Note that the PHP rand function is not cryptographically secure:
Caution
This function does not generate cryptographically secure values, and should not be used for cryptographic purposes. If you need a cryptographically secure value, consider using openssl_random_pseudo_bytes() instead.
This means that in theory an attacker could predict what rand was going to generate and then use this information to determine whether the response time delay from your application was due to random sleep or not.
The best way to approach security is to assume that the attacker knows your source code - the only things secret from the attacker should be things like keys and passwords - assume that they know the algorithms and function used. If you can still say your system is secure even though an attacker knows exactly how it works, you will be most of the way there. Functions like rand are usually set to seed with the current time of day, so an attacker can just make sure their system clock is set to the same as your server and then make requests to validate that their generator is matching yours.
Due to this, it is best to avoid insecure random functions like rand and change your implementation to use openssl_random_pseudo_bytes which will be unpredictable.
Also, as per ircmaxell's comment, sleep is not granular enough as it only accepts an integer to represent the number of seconds. If you are going to try this approach look into time_nanosleep with a random number of nanoseconds.
These pointers should help secure your implementation against this type of timing attack.
This kind of approach can totally prevent timing attacks? Or just make the work harder?
ircmaxell has already answered why this only makes the work harder,
but a solution to prevent timing attacks in PHP in general would be
/**
* execute callback function in constant-time,
* or throw an exception if callback was too slow
*
* @param callable $cb
* @param float $target_time_seconds
* @throws \LogicException if the callback was too slow
* @return whatever $cb returns.
*/
function execute_in_constant_time(callable $cb, float $target_time_seconds = 0.01)
{
$start_time = microtime(true);
$ret = ($cb)();
$success = time_sleep_until($start_time + $target_time_seconds);
if ($success) {
return $ret;
}
// dammit!
$time_used = microtime(true) - $start_time;
throw new \LogicException("callback function was too slow! time expired! target_time_seconds: {$target_time_seconds} actual time used: {$time_used}");
}
using that approach, your code could be
function timingSafeCompare($a,$b, float $target_time_seconds = 0.01) {
return execute_in_constant_time(fn() => $a === $b, $target_time_seconds);
}
The downside is that you should pick a number with a large margin, meaning relatively much time is lost sleeping. FWIW, on my laptop I had to use 0.2 (200 milliseconds) to compare 2x exactly-1-GiB strings, with a Core i7-8565U (a weird 2018 mid-range laptop CPU I've never heard of)
and this loop:
ini_set("memory_limit", "-1");
$s1 = "a";
$s2 = "a";
$append = str_repeat("a",100*1024);
try {
for (;;) {
$res = timingSafeCompare($s1, $s2, 0.01);
$s1 .= $append;
$s2 .= $append;
}
} catch (\Throwable $e) {
var_dump(strlen($s1));
}
craps out at about 65 megabytes/int(65126401)
(But how often do you need to constant-time-compare strings above 65MB? I imagine it's not often.)
You might think "then the attacker could send a HUGE string to compare, and check how long it takes for the exception to be thrown", but I don't think that would work: === starts by checking whether both strings have the same length and short-circuits if they differ, so such an attack should only work if the attacker can set the length of both strings to be large enough to time out.
Today we have the native hash_equals() function to compare strings that have exactly the same length, but hash_equals() will not protect you against strings of different length, while the function above will.

What is the fastest combination of compression + encoding + checking + serialization of an array?

I need a combination of functions that does:
array serialization(no object, small - 3-7 key-value pairs of strings, no references)
data validity check of above(Is it better for the hash to be inside the array?)
encryption of above(is there any encryption method that validates decrypted information?)
compression of above(I am not sure if the cost worth: bandwidth / CPU time)
...of an array.
Everything should be optimized for speed.
For serializing the array I was thinking about using json_encode() rather than serialize() because it's faster. See Preferred method to store PHP arrays (json_encode vs serialize).
For data validity check I was thinking about using sha1(), but I am considering crc32 because it's faster and I don't think collisions are close. See Fastest hash for non-cryptographic uses?.
For encryption i made:
<?php
function encode($pass, $data) {
return mcrypt_encrypt(MCRYPT_RIJNDAEL_256, $pass, $data, MCRYPT_MODE_ECB);
}
function decode($pass, $data) {
return mcrypt_decrypt(MCRYPT_RIJNDAEL_256, $pass, $data, MCRYPT_MODE_ECB);
}
$rand = str_repeat(rand(0, 1000), 5);
$start = microtime(true);
for($i = 0; $i <= 10000; $i++){
encode('pass', $rand);
}
echo 'Script took ' . (microtime(true) - $start) . ' seconds for encryption<br/>';
$encrypted = encode('pass', $rand);
$start = microtime(true);
for($i = 0; $i <= 10000; $i++){
decode('pass', $encrypted);
}
echo 'Script took ' . (microtime(true) - $start) . ' seconds for decryption';
Results are:
Script took 1.8680129051208 seconds for encryption
Script took 1.8597548007965 seconds for decryption
I would rather avoid any randomness. I know that CBC mode is more secure, but it is also slower.
For compression I have no idea what is better to use given the fact that the resulting string is binary and short.
Is there any compression that doesn't require encoding in order to set the resulting string as a cookie? I know that sha1(), for example, returns only digits and letters.
It is a complex question. So feel free to point anything wrong or not accurate.
It contains many topics but basically the short question is how to safely and rapidly encrypt/decrypt an array while having a small representation of it.
Is this the right order?
Is data validation required given that there is a high probability that the resulting JSON
won't be valid in case data is altered?
Is there a function that already combines those or some of those functions?
I know that CBC mode is more secure, but it is also slower
Than ECB? Only if the data is more than a couple of blocks.
If you want the fastest encryption algorithm then there's no substitute for testing it yourself - somewhat strangely, PHP's sha1() implementation is significantly faster than its md5() (I know these are hashes - this is to illustrate that performance depends on implementation as much as algorithm).
Why are you trying to validate it? If it's an encrypted datagram then the contents are opaque to the user. If they try tampering with it, it will most likely fail to decompress; in the unlikely event that it still decompresses, the decode will fail; and in the remote case that neither of these happens, it should be very easy to check for other modifications. Even an embedded CRC32 seems overkill.
in order to set the resulting string as a cookie
Sounds like you're using lots of fancy encryption to cover up a basic insecurity of your application: it's likely to be open to replay attacks. And you've got the added complication of ensuring that your data fits in a cookie. Why not just use a server-side session with a random value sent client-side? (You don't have to use the PHP session handler if you want to implement a "remember me" type function and still have a conventional session.)
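If the array does have to travel in the cookie, the validity check asked about in the question is usually done with a keyed hash (HMAC) rather than a bare sha1() or crc32, since anyone can recompute an unkeyed checksum. The following is a sketch under assumptions, not a complete design: it signs but does not encrypt, it does nothing against the replay problem mentioned above, and the helper names packSigned() and unpackSigned() are hypothetical:

```php
<?php
// Pack an array into a string with an HMAC-SHA256 tag appended.
function packSigned(array $data, string $key): string
{
    $payload = base64_encode(json_encode($data));
    // Base64 never contains '.', so it is safe to use as the separator.
    return $payload . '.' . hash_hmac('sha256', $payload, $key);
}

// Return the array, or null if the tag does not match or the data is malformed.
function unpackSigned(string $cookie, string $key): ?array
{
    $parts = explode('.', $cookie, 2);
    if (count($parts) !== 2) {
        return null;
    }
    [$payload, $tag] = $parts;
    if (!hash_equals(hash_hmac('sha256', $payload, $key), $tag)) {
        return null; // tampered with, or signed with a different key
    }
    $data = json_decode(base64_decode($payload), true);
    return is_array($data) ? $data : null;
}

$cookie = packSigned(['user' => '42', 'theme' => 'dark'], 'server-side-secret');
var_dump(unpackSigned($cookie, 'server-side-secret'));
```

For 3-7 short key-value pairs the tag and Base64 overhead dominate the size, so compression is unlikely to pay off.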
In my opinion it would be sufficient to use only compression. Reverse engineering the compression would take a long time. I can recommend Huffman compression.

What is the most ideal, cross-language method of executing an A/B split?

I'm on a project where I have to implement an A/B split in 15 or so views, in this case for PHP - we'd like to use the same math if possible for our JavaScript projects.
What is the most ideal, least verbose, least CPU-intensive way of doing this? For this project, I just need to set a variable: something like:
// In the main controller
if(rand(1, 2) == 2)
{
$recipe = 'program';
}
else
{
$recipe = 'standard';
}
define('RECIPE',$recipe);
// In the view
$program = (RECIPE == 'program') ? '&ProgramOfInterest=' . $program_id : '';
We have 20 or so devs here and we all have our ways - what is the best, benchmark-proven way?
least cpu-intensive way:
use an image sensor (ideally a CMOS) to take a very long exposure of black.
You'll get lots of truly random noise due to light interference and sensor heat
the bits in the uncompressed image will be completely random
A team got something like 200Gb/sec of random data like this :)
Then simply:
var counter = 0;
if(imageBit[counter++]){
:D
I assume that the A/B split needs to be consistent across all users, so a user should consistently fall in the A or the B bucket (if not, your analysis of the A/B buckets will not reveal any info related to page navigation).
Hence using a rand function is probably not what you want.
Instead use a session identifier, session cookie or persistent cookie, and simply use the last 3 bytes of that cookie instead of your random value. You can add the bytes or multiply their ASCII values to generate a number which you can then use as your cut-off.
This would be very portable across PHP and JS, and it is cheap in CPU and easy to verify correctness in a unit test.
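A minimal sketch of that idea, summing the byte values of the last three characters and using the parity as the cut-off. The function name abBucket() is hypothetical, and the recipe names are taken from the question:

```php
<?php
// Derive a stable A/B bucket from the last three bytes of a session or
// cookie identifier, so the same visitor always gets the same recipe.
function abBucket(string $identifier): string
{
    $sum = 0;
    foreach (str_split(substr($identifier, -3)) as $char) {
        $sum += ord($char);
    }
    // Odd sums go to 'program', even sums to 'standard'.
    return ($sum % 2 === 1) ? 'program' : 'standard';
}

echo abBucket('9f2c1e7abc'), "\n"; // 'a' + 'b' + 'c' = 294, even: standard
```

The same sum can be computed in JavaScript with charCodeAt() on the last three characters, which keeps the math identical across both code bases. How evenly the split comes out depends on how uniformly the identifier's trailing bytes are distributed.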
You should use mt_rand() over rand(). It's 4x faster than rand() because mt_rand uses a Mersenne Twister over the libc random number generator which rand() uses (see php.net).
You can then get an equivalent to mt_rand() for javascript from the php.js library.

Session hash does size matter?

Does size matter when choosing the right algorithm to use for a session hash?
I recently read this article and it suggested using whirlpool to create a hash for the session ID. Whirlpool generates a 128-character hash string; is this too large?
The plan is to store the session hash in a db. Is there much of a difference between using a 64-character field (sha256), a 96-character field (sha384) or a 128-character field (whirlpool)? One of the initial arguments made for whirlpool was its speed vs other algorithms, but looking at the speed results, sha384 doesn't fare too badly.
There is the option to truncate the hash to make it smaller than 128 characters.
I did modify the original code snippet to allow changing the algorithm based on need.
Update: There was some discussion about string being hashed, so I've included the code.
function generateUniqueId($maxLength = null) {
$entropy = '';
// try ssl first
if (function_exists('openssl_random_pseudo_bytes')) {
$entropy = openssl_random_pseudo_bytes(64, $strong);
// skip ssl since it wasn't using the strong algo
if($strong !== true) {
$entropy = '';
}
}
// add some basic mt_rand/uniqid combo
$entropy .= uniqid(mt_rand(), true);
// try to read from the windows RNG
if (class_exists('COM')) {
try {
$com = new COM('CAPICOM.Utilities.1');
$entropy .= base64_decode($com->GetRandom(64, 0));
} catch (Exception $ex) {
}
}
// try to read from the unix RNG
if (is_readable('/dev/urandom')) {
$h = fopen('/dev/urandom', 'rb');
$entropy .= fread($h, 64);
fclose($h);
}
// create hash
$hash = hash('whirlpool', $entropy);
// truncate hash if max length imposed
if ($maxLength) {
return substr($hash, 0, $maxLength);
}
return $hash;
}
The time taken to create the hash is not important, and as long as your database is properly indexed, the storage method should not be a major factor either.
However, the hash has to be transmitted with the client's request every time, frequently as a cookie. Large cookies can add a small amount of additional time to each request. See Yahoo!'s page performance best practices for more information. Smaller cookies, thus a smaller hash, have benefits.
Overall, large hash functions are probably not justified. For their limited scope, good old md5 and sha1 are probably just fine as the source behind a session token.
Yes, size matters.
If it's too short, you run the risk of collisions. You also make it practical for an attacker to find someone else's session by brute-force attack.
Being too long matters less, but every byte of the session ID has to be transferred from the browser to the server with every request, so if you're really optimising things, you may not want an ID that's too long.
You don't have to use all the bits of a hash algorithm, though - there's nothing stopping you from using something like Whirlpool, then only taking the first 128 bits (32 characters in hex). Practically speaking, 128 bits is a good lower bound on length, too.
As erickson points out, though, using a hash is a bit odd. Unless you have at least as much entropy as input as the length of the ID you're using, you're vulnerable to attacks that guess the input to your hash.
The article times out when I try to read it, but I can't think of a good reason to use a hash as a session identifier. Session identifiers should be unpredictable; given the title of the article, it sounds like the authors acknowledge that principle. Then, why not use a cryptographic random number generator to produce session identifiers?
A hash takes input, and if that input is predictable, so is the hash, and that's bad.
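For comparison, here is a minimal sketch of drawing the session identifier straight from a cryptographic random number generator, with no hash involved. It assumes Node.js and its built-in `crypto` module; `newSessionId` is a hypothetical name.

```javascript
const crypto = require('crypto');

// Draw `bytes` random bytes from the OS-backed CSPRNG and hex-encode them.
// 16 bytes gives a 128-bit identifier (32 hex characters).
function newSessionId(bytes = 16) {
  return crypto.randomBytes(bytes).toString('hex');
}

console.log(newSessionId().length); // 32
```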
SHA1 or MD5 is probably enough for your needs. In practice, the probability of a collision is so small that it will likely never happen.
Ultimately, though, it all depends upon your required level of security. Do also keep in mind that longer hashes are both more expensive to compute and require more storage space.

How do I measure the strength of a password?

I was looking for an effective algorithm that can give me an accurate idea of how strong a password is.
I found that several different websites use several different algorithms as I get different password strength ratings on different websites.
This has grown to my general brain dump of best practices for working with passwords in PHP/MySQL.
The ideas presented here are generally not my own, but the best of what I've found to date.
Ensure you are using SSL for all operations involving user information. All pages that involve these forms should check they are being called via HTTPS, and refuse to work otherwise.
You can eliminate most attacks by simply limiting the number of failed logins allowed.
Allow for relatively weak passwords, but store the number of failed logins per user and require a captcha or password verification by email if you exceed it. I set my max failures to 5.
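A minimal sketch of that per-user failure counter in JavaScript. The in-memory `Map` and the function names are assumptions for illustration only; a real site would keep the counter next to the user record in the database.

```javascript
const MAX_FAILURES = 5; // the limit mentioned above

// Illustration only: failed-login counts keyed by username.
const failures = new Map();

function recordFailure(user) {
  failures.set(user, (failures.get(user) || 0) + 1);
}

function recordSuccess(user) {
  failures.delete(user); // a good login resets the counter
}

// True once the user has exceeded the allowed number of failed logins,
// i.e. when a captcha or email verification should be required.
function needsExtraVerification(user) {
  return (failures.get(user) || 0) > MAX_FAILURES;
}
```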
Presenting login failures to the user needs to be carefully thought out so as not to provide information to attackers.
A failed login due to a non existent user should return the same message as a failed login due to a bad password. Providing a different message will allow attackers to determine valid user logins.
Also make sure you return exactly the same message in the event of a failure for too many logins with a valid password, and a failure with too many logins and a bad password. Providing a different message will allow attackers to determine valid user passwords. A fair number of users when forced to reset their password will simply put it back to what it was.
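Roughly, the rule is that every failure path produces the same response. A small sketch, assuming the lookup, password check and lockout check are done elsewhere and passed in as flags (all names here are hypothetical):

```javascript
const GENERIC_FAILURE = 'Login failed.';

// `account` is undefined for an unknown user; `passwordOk` and `lockedOut`
// are assumed to be computed by the caller.
function loginResponse(account, passwordOk, lockedOut) {
  if (account && passwordOk && !lockedOut) {
    return { ok: true, message: 'Welcome back.' };
  }
  // Unknown user, wrong password, and too many attempts (with or without
  // the right password) are indistinguishable to the client.
  return { ok: false, message: GENERIC_FAILURE };
}
```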
Unfortunately limiting the number of logins allowed per IP address is not practical. Several providers such as AOL and most companies proxy their web requests. Imposing this limit will effectively eliminate these users.
I've found checking for dictionary words before submit to be inefficient as either you have to send a dictionary to the client in javascript, or send an ajax request per field change. I did this for a while and it worked ok, but didn't like the traffic it generated.
Checking for inherently weak passwords minus dictionary words IS practical client side with some simple javascript.
After submit, I check server side for dictionary words, and for the username containing the password and vice versa. Very good dictionaries are readily downloadable and testing against them is simple. One gotcha here is that to test for a dictionary word, you need to send a query to the database, and that query contains the password. The way I got around this was to encrypt my dictionary beforehand with a simple encryption and an end-positioned SALT, and then test for the encrypted password. Not ideal, but better than plain text, and it is only on the wire for people on your physical machines and subnet.
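The server-side checks described here could look something like this in JavaScript. The tiny word list is a stand-in for a real downloaded dictionary, and `isWeakChoice` is a made-up name; this sketch also compares in memory rather than querying a database.

```javascript
// Stand-in word list; a real check would load a full dictionary.
const dictionary = new Set(['password', 'letmein', 'dragon', 'monkey']);

function isWeakChoice(username, password) {
  const u = username.toLowerCase();
  const p = password.toLowerCase();
  if (dictionary.has(p)) return true;             // plain dictionary word
  if (u.length > 0 && p.includes(u)) return true; // password contains the username
  if (p.length > 0 && u.includes(p)) return true; // username contains the password
  return false;
}
```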
Once you are happy with the password they have picked, hash it in PHP first, then store it. The following password hashing function is not my idea either, but solves a number of problems. Hashing within PHP prevents people on a shared server from intercepting your plain-text passwords. Mixing in something per user that won't change (I use email as this is the username for my sites) and a salt (SALT is a short constant string I change per site) increases resistance to attacks. Because the SALT is located within the password, and the password can be any length, it becomes almost impossible to attack this with a rainbow table.
On the other hand, it also means that people can't change their email and you can't change the SALT without invalidating everyone's password.
EDIT: I would now recommend using PhPass instead of my roll your own function here, or just forget user logins altogether and use OpenID instead.
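function password_crypt($email, $toHash) {
    // Split the password in two so the site-wide SALT ends up inside it.
    // intdiv() keeps the chunk length an integer for odd-length passwords.
    $password = str_split($toHash, intdiv(strlen($toHash), 2) + 1);
    // Passwords of 1 or 2 characters produce a single chunk, so default the second half.
    return hash('sha256', $email . $password[0] . SALT . ($password[1] ?? ''));
}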
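In the spirit of the EDIT above (use a maintained scheme rather than a hand-rolled hash), here is a sketch of a different technique: a deliberately slow key-derivation function with a random per-user salt. It uses Node's built-in scrypt, not PhPass and not the function above; `hashPassword`, `verifyPassword` and the `salt:key` storage format are my own assumptions.

```javascript
const crypto = require('crypto');

// Derive a 64-byte key from the password with scrypt and a fresh random salt.
// Stored form (assumed for this sketch): "<salt hex>:<key hex>".
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const key = crypto.scryptSync(password, salt, 64);
  return salt.toString('hex') + ':' + key.toString('hex');
}

// Re-derive the key with the stored salt and compare in constant time.
function verifyPassword(password, stored) {
  const [saltHex, keyHex] = stored.split(':');
  const expected = Buffer.from(keyHex, 'hex');
  const actual = crypto.scryptSync(password, Buffer.from(saltHex, 'hex'), expected.length);
  return crypto.timingSafeEqual(actual, expected);
}
```

Because the salt is random per user and stored with the hash, changing an email address does not invalidate the password.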
My jQuery-ish client-side password meter. Target should be a div. Its width will change between 0 and 100 and its background color will change based on the classes denoted in the script. Again, mostly stolen from other things I've found:
$.updatePasswordMeter = function(password,username,target) {
$.updatePasswordMeter._checkRepetition = function(pLen,str) {
var res = "", i, j, repeated; // declare locals so they don't leak as globals
for ( i=0; i<str.length ; i++ ) {
repeated=true;
for (j=0;j < pLen && (j+i+pLen) < str.length;j++)
repeated=repeated && (str.charAt(j+i)==str.charAt(j+i+pLen));
if (j<pLen) repeated=false;
if (repeated) {
i+=pLen-1;
repeated=false;
}
else {
res+=str.charAt(i);
};
};
return res;
};
var score = 0;
var r_class = 'weak-password';
//password < 4
if (password.length < 4 || password.toLowerCase()==username.toLowerCase()) {
target.width(score + '%').removeClass("weak-password okay-password good-password strong-password"
).addClass(r_class);
return true;
}
//password length
score += password.length * 4;
score += ( $.updatePasswordMeter._checkRepetition(1,password).length - password.length ) * 1;
score += ( $.updatePasswordMeter._checkRepetition(2,password).length - password.length ) * 1;
score += ( $.updatePasswordMeter._checkRepetition(3,password).length - password.length ) * 1;
score += ( $.updatePasswordMeter._checkRepetition(4,password).length - password.length ) * 1;
//password has 3 numbers
if (password.match(/(.*[0-9].*[0-9].*[0-9])/)) score += 5;
//password has 2 symbols
if (password.match(/(.*[!,@,#,$,%,^,&,*,?,_,~].*[!,@,#,$,%,^,&,*,?,_,~])/)) score += 5;
//password has upper and lower case chars
if (password.match(/([a-z].*[A-Z])|([A-Z].*[a-z])/)) score += 10;
//password has number and chars
if (password.match(/([a-zA-Z])/) && password.match(/([0-9])/)) score += 15;
//password has number and symbol
if (password.match(/([!,@,#,$,%,^,&,*,?,_,~])/) && password.match(/([0-9])/)) score += 15;
//password has char and symbol
if (password.match(/([!,@,#,$,%,^,&,*,?,_,~])/) && password.match(/([a-zA-Z])/)) score += 15;
//password is just numbers or chars
if (password.match(/^\w+$/) || password.match(/^\d+$/) ) score -= 10;
//double the score, then clamp it to the range 0..100
score = score * 2;
if ( score < 0 ) score = 0;
if ( score > 100 ) score = 100;
if (score > 25 ) r_class = 'okay-password';
if (score > 50 ) r_class = 'good-password';
if (score > 75 ) r_class = 'strong-password';
target.width(score + '%').removeClass("weak-password okay-password good-password strong-password"
).addClass(r_class);
return true;
};
Fundamentally, you want to prevent two major types of attacks:
Dictionary attacks
Brute force attacks
To prevent the first, you want to consider passwords containing common words weak. To prevent the second, you want to encourage passwords of reasonable length (8+ characters is common) and with a reasonably large character set (include letters, numbers, and special characters). If you consider lower case and upper case letters to be different, that increases the character set substantially. However, this creates a usability issue for some user communities so you need to balance that consideration.
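To put rough numbers on the brute-force side: for a password chosen uniformly at random, the search space is charset size to the power of the length, which is easier to compare in bits. A small sketch (the function name is made up, and real user-chosen passwords are far less random than this assumes):

```javascript
// Bits of brute-force work for a uniformly random password:
// log2(charsetSize ^ length) = length * log2(charsetSize).
function bruteForceBits(charsetSize, length) {
  return length * Math.log2(charsetSize);
}

console.log(bruteForceBits(26, 8).toFixed(1)); // lowercase only: 37.6
console.log(bruteForceBits(94, 8).toFixed(1)); // printable ASCII without space: 52.4
```

Going from lowercase-only to the full printable set at the same length adds about 15 bits, which is why the character set matters as much as the length.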
A quick Google search turned up solutions that account for brute force attacks (complex password) but not for dictionary attacks. PHP Password Strength Meter from this list of strength checkers runs the check server-side, so it could be extended to check a dictionary.
EDIT:
By the way... you should also limit the number of login attempts per user. This will make both types of attacks less likely. Effective but not-user-friendly is to lock an account after X bad attempts and require a password reset. More user friendly but more effort is to throttle time between login attempts. You can also require CAPTCHA after the first few login attempts (which is something that Stack Overflow requires after too many edits, or for very new users).
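The throttling option could be sketched like this: a few free attempts, then a delay that doubles with each further failure up to a cap. The three default values are assumptions picked for illustration, not recommendations.

```javascript
// Seconds the user must wait before the next login attempt.
// Assumed defaults: 3 free attempts, 1 second base delay, 15 minute cap.
function throttleDelaySeconds(failedAttempts, freeAttempts = 3, baseSeconds = 1, maxSeconds = 900) {
  if (failedAttempts <= freeAttempts) return 0;
  const delay = baseSeconds * 2 ** (failedAttempts - freeAttempts - 1);
  return Math.min(delay, maxSeconds);
}

console.log(throttleDelaySeconds(3)); // 0
console.log(throttleDelaySeconds(6)); // 4
```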
Basically you probably want to use Regular Expressions to validate the length and complexity of the password.
A good example doing this using javascript can be found here:
http://marketingtechblog.com/programming/javascript-password-strength/
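I can't reproduce the linked script here, but a minimal regular-expression check along those lines might look like this. The minimum length of 8 and the four character classes are assumptions, and `meetsComplexity` is a hypothetical name.

```javascript
// Returns true when the password is long enough and contains at least one
// lowercase letter, uppercase letter, digit, and non-alphanumeric character.
function meetsComplexity(password) {
  return password.length >= 8 &&
    /[a-z]/.test(password) &&
    /[A-Z]/.test(password) &&
    /[0-9]/.test(password) &&
    /[^a-zA-Z0-9]/.test(password);
}
```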
As Daren Schwenke pointed out, you'd better work on the security yourself and not put this in the user's hands.
But it's good to give the user some hints about how strong his password is, because the best way to get a password is still social engineering.
So you can hack together a little client-side script that checks the user's password strength as a courtesy indicator, in real time. It blocks nothing, but gives him a good warm feeling when it turns green :-)
Basically what you must check is common sense: check that the password contains letters, numbers and non-alphabetical characters, in a reasonable quantity.
You can hack together your own algorithm very easily: just give a mark out of 10:
0 for a zero-length password;
+2 for every 8 characters in the password (15 is supposed to be a safe length);
+1 for the use of a letter, +2 for the use of 2 letters;
+1 for the use of a number, +2 for the use of 2 numbers;
+1 for the use of a non-alphabetical character, +2 for 2.
You don't need to check for godlike passwords (are there capitalized letters, where the special characters are positioned, etc.); your users are not in the bank / military / secret service / Monty Python movie industry, are they?
You can code that in an hour without crazy JavaScript skills.
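Something like the following would do it. I'm reading "+1 for one, +2 for two" as capped at 2 per category, and treating "non-alphabetical" as anything that is neither a letter nor a digit; both are my interpretation, and the function names are made up.

```javascript
// Count how many characters in the password match a global regex.
function countMatches(password, regex) {
  return (password.match(regex) || []).length;
}

// Mark out of 10, following the list above.
function scoreOutOfTen(password) {
  if (password.length === 0) return 0;
  let score = 2 * Math.floor(password.length / 8);                // +2 per 8 characters
  score += Math.min(countMatches(password, /[a-zA-Z]/g), 2);      // letters
  score += Math.min(countMatches(password, /[0-9]/g), 2);         // numbers
  score += Math.min(countMatches(password, /[^a-zA-Z0-9]/g), 2);  // other characters
  return Math.min(score, 10);
}

console.log(scoreOutOfTen('ab12!?xy')); // 8
```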
And anyway, validate the password and keep all the security code on the server side. If you can delegate authentication (e.g. OpenID), even better.
Don't Roll-Your-Own!
Cryptography experts discourage roll-your-own cryptography for reasons that should be obvious.
For the very same reasons, one should not attempt to roll his own solution to the problem of measuring a password's strength; it is very much a cryptographic problem.
Don't get into the ugly business of authoring some massive regular expression for this purpose; you will likely fail to account for several factors that influence a password's overall strength.
It's a Difficult Problem
There is considerable difficulty inherent to the problem of measuring a password's strength. The more research I perform on this subject, the more I realize that this is a "unidirectional" problem; that is, one cannot measure the "difficulty" (computational cost) of cracking a password efficiently. Rather, it is more efficient to provide complexity requirements and measure the password's ability to meet them.
When we consider the problem logically, a "crackability index" doesn't make much sense, as convenient as it sounds. There are so many factors that drive the calculation, most of which relate to the computational resources devoted to the cracking process, so as to be impractical.
Imagine pitting John the Ripper (or a similar tool) against the password in question; it might take days to crack a decent password, months to crack a good password, and until the sun burns out to crack an exceptional password. This is not a practical means by which to measure password strength.
Approaching the problem from the other direction is far more manageable: if we supply a set of complexity requirements, it's possible to judge the relative strength of a password very quickly. Obviously, the supplied complexity requirements must evolve over time, but there are far fewer variables for which to account if we approach the problem in this way.
A Viable Solution
There is a standalone utility available from Openwall entitled passwdqc (presumably, standing for Password Quality Checker). Openwall developer, Solar Designer, does appear to be a bona fide cryptography expert (his works speak for themselves), and so is qualified to author such a tool.
For my particular use-case, this is a far more attractive solution than using an ill-conceived JavaScript snippet living in some dark corner of the Web.
Establishing parameters for your particular needs is the hardest part. The implementation is the easy part.
A Practical Example
I offer a simple implementation in PHP to provide a jump-start. Standard disclaimers apply.
This example assumes that we're feeding an entire list of passwords to the PHP script. It goes without saying that if you are doing this with real passwords (e.g., those dumped out of a password manager), extreme caution should be exercised with regard to password-handling. Simply writing the unencrypted password dump to disk jeopardizes the security of your passwords!
passwords.csv:
"Title","Password"
"My Test Password","password123"
"Your Test Password","123456!!!"
"A strong password","NFYbCoHC5S7dngitqCD53tvQkAu3dais"
password-check.php:
<?php
//A few handy examples from other users:
//http://php.net/manual/en/function.str-getcsv.php#117692
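//Parse each line with the default comma delimiter; the flags drop line endings and blank lines.
$csv = array_map('str_getcsv', file('passwords.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES));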
array_walk($csv, function(&$a) use ($csv) {
$a = array_combine($csv[0], $a);
});
//Remove column header.
array_shift($csv);
//Define report column headers.
$results[] = [
'Title',
'Result',
'Exit Code',
];
$i = 1;
foreach ($csv as $p) {
$row['title'] = $p['Title'];
//If the value contains a space, it's considered a passphrase.
$isPassphrase = stristr($p['Password'], ' ') !== false ? true : false;
$cmd = 'echo ' . escapeshellarg($p['Password']) . ' | pwqcheck -1 min=32,24,22,20,16 max=128';
if ($isPassphrase) {
$cmd .= ' passphrase=3';
}
else {
$cmd .= ' passphrase=0';
}
$output = null;
$exitCode = null;
$stdOut = exec($cmd, $output, $exitCode);
//Exit code 0 represents an acceptable password (it meets the criteria).
//Exit code 1 represents an unacceptable password (not an error), as in the report below.
if ($exitCode === 0 || $exitCode === 1) {
$row['result'] = trim($stdOut);
$row['exitCode'] = $exitCode;
}
else {
$row['result'] = 'An error occurred while calling pwqcheck';
$row['exitCode'] = null;
}
$results[$i] = $row;
$i++;
}
$reportFile = 'report.csv';
$fp = @fopen($reportFile, 'w');
if ($fp !== false) {
foreach ($results as $p) {
fputcsv($fp, $p);
}
fclose($fp);
}
else {
die($reportFile . ' could not be opened for writing (destination is not writable or file is in use)');
}
exit;
Resultant report.csv:
Title,Result,"Exit Code"
"My Test Password","Bad passphrase (too short)",1
"Your Test Password","Bad passphrase (too short)",1
"A strong password",OK,0
Wrapping-Up
I have yet to find a more thorough solution on the Web; needless to say, I welcome any other recommendations.
Obviously, this approach is not ideal for certain use-cases (e.g., a "password strength meter" implemented "client-side"). Even so, it would be trivial to make an AJAX call to a server-side resource that returns a pass/fail response using the approach outlined above, but such an approach should assume the potential for abuse (e.g., DoS attacks) and would require secure communication between client and server, as well as acceptance of the risks associated with transmitting the un-hashed password.
I can't think of a specific algorithm to check the strength of a password. What we do is define several criteria, and when the password meets a criterion, we add 1 to its score. When the score reaches a threshold, the password is strong. Otherwise it is weak.
You can define many different levels of strength with different thresholds, or you can assign different values to a specific criterion. For example, if a password has 5 characters, we add 1, but if it has 10, then we add 2.
Here is a list of criteria to check for:
Length (8 to 12 is OK, more is better)
Contains a lowercase letter
Contains an uppercase letter
The uppercase letter is NOT the first character
Contains a number
Contains symbols
The last character is NOT a human-like symbol (e.g. . or !)
Does not look like a dictionary word. Some smart password crackers include libraries of words and letter substitutes (like Library --> L1br#ry)
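A sketch of that scoring in JavaScript, one point per criterion. The word list, the substitution table and the threshold of 6 are all assumptions made up for the example.

```javascript
// Stand-in word list; a real check would use a full dictionary.
const words = new Set(['library', 'password']);

// Undo a few common letter substitutions (an assumed, incomplete mapping).
function normalize(password) {
  const map = { '1': 'i', '0': 'o', '3': 'e', '4': 'a', '#': 'a', '@': 'a', '$': 's' };
  return password.toLowerCase().replace(/[1034#@$]/g, c => map[c]);
}

// One point for each criterion the password meets (0 to 8).
function criteriaScore(password) {
  const checks = [
    password.length >= 8,
    /[a-z]/.test(password),
    /[A-Z]/.test(password),
    /[A-Z]/.test(password.slice(1)),  // an uppercase letter that is not the first character
    /[0-9]/.test(password),
    /[^a-zA-Z0-9]/.test(password),
    !/[.!?]$/.test(password),         // does not end in sentence punctuation
    !words.has(normalize(password)),  // not a dictionary word after undoing substitutions
  ];
  return checks.filter(Boolean).length;
}

function isStrong(password, threshold = 6) {
  return criteriaScore(password) >= threshold;
}
```

With these assumptions, 'L1br#ry' scores 5 and is reported as weak, while 'xK7#mQ2vLp9w' meets all 8 criteria.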
Hope that helps.
