Explanation about constant-time algorithm and string comparison


I'm having trouble understanding two different ways of comparing strings. The following function compares two strings.
This function is used in the Symfony-Framework security component to compare passwords in the user-login process.
/**
 * Compares two strings.
 *
 * This method implements a constant-time algorithm to compare strings.
 *
 * @param string $knownString The string of known length to compare against
 * @param string $userInput The string that the user can control
 *
 * @return Boolean true if the two strings are the same, false otherwise
 */
function equals($knownString, $userInput)
{
    // Prevent issues if string length is 0
    $knownString .= chr(0);
    $userInput .= chr(0);
    $knownLen = strlen($knownString);
    $userLen = strlen($userInput);
    $result = $knownLen - $userLen;
    // Note that we ALWAYS iterate over the user-supplied length
    // This is to prevent leaking length information
    for ($i = 0; $i < $userLen; $i++) {
        // Using % here is a trick to prevent notices
        // It's safe, since if the lengths are different
        // $result is already non-0
        $result |= (ord($knownString[$i % $knownLen]) ^ ord($userInput[$i]));
    }
    // They are only identical strings if $result is exactly 0...
    return 0 === $result;
}
origin: origin snippet
I have trouble understanding the difference between the equals() function and a simple === comparison. I wrote a simple working example to explain my problem.
Given strings:
$password1 = 'Uif4yQZUqmCWRbWFQtdizZ9/qwPDyVHSLiR19gc6oO7QjAK6PlT/rrylpJDkZaEUOSI5c85xNEVA6JnuBrhWJw==';
$password2 = 'Uif4yQZUqmCWRbWFQtdizZ9/qwPDyVHSLiR19gc6oO7QjAK6PlT/rrylpJDkZaEUOSI5c85xNEVA6JnuBrhWJw==';
$password3 = 'iV3pT5/JpPhIXKmzTe3EOxSfZSukpYK0UC55aKUQgVaCgPXYN2SQ5FMUK/hxuj6qZoyhihz2p+M2M65Oblg1jg==';
Example 1 (works as expected)
echo $password1 === $password2 ? 'True' : 'False'; // Output: True
echo equals($password1, $password2) ? 'True' : 'False'; // Output: True
Example 2 (works as expected)
echo $password1 === $password3 ? 'True' : 'False'; // Output: False
echo equals($password1, $password3) ? 'True' : 'False'; // Output: False
I read about the Rabin-Karp algorithm, but I'm not sure whether the equals() function implements it, and in general I didn't understand the Wikipedia article.
On the other hand, I read that the equals() function prevents brute-force attacks. Is that right? Can someone explain what the advantage of equals() is?
Or can someone give me an example where === fails and equals() works correctly, so I can understand the advantage?
And what does "constant-time algorithm" mean? I think constant-time has nothing to do with real time, or am I wrong?

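This function is just a normal string comparison function. It is not Rabin-Karp. In the complexity sense it is NOT constant time, it's linear time in the length of the user input, regardless of what the comment says; what the comment means by "constant-time" is that the running time does not depend on how many leading characters match. It also does not prevent brute-force attacks.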
How it works:
1. If the correct and user-provided passwords are of different length, make $result != 0.
2. Iterate over the user-provided password, xor each of its characters with the corresponding character of the correct password (if the correct password is shorter, keep going through it in a circle), and bitwise or each result with $result.
Since only bitwise or is used, if any of the characters are different, $result will be != 0. Step 1 is needed because otherwise, user input "abca" would be accepted if the real password was "abc".
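The two steps can be tried out with a stripped-down sketch. This is not the Symfony code: xorCompare() and its $checkLength flag are hypothetical names, the chr(0) padding of the original is left out, and the known string is assumed to be non-empty. The flag only exists to show what happens when step 1 is skipped.

```php
<?php
// Simplified illustration of the xor / bitwise-or accumulation described above.
// xorCompare() and $checkLength are made up for this example.
function xorCompare(string $known, string $user, bool $checkLength = true): bool
{
    $knownLen = strlen($known);
    if ($knownLen === 0) {
        throw new InvalidArgumentException('known string must not be empty');
    }
    // Step 1: a length difference makes $result non-zero from the start.
    $result = $checkLength ? $knownLen - strlen($user) : 0;
    // Step 2: xor each pair of characters and accumulate with bitwise or.
    for ($i = 0; $i < strlen($user); $i++) {
        $result |= ord($known[$i % $knownLen]) ^ ord($user[$i]);
    }
    return $result === 0;
}

var_dump(xorCompare('abc', 'abc'));         // identical strings
var_dump(xorCompare('abc', 'abd'));         // one character differs
var_dump(xorCompare('abc', 'abca'));        // length check catches the extra "a"
var_dump(xorCompare('abc', 'abca', false)); // without step 1 the wrap-around matches
```

Note that the loop never exits early: every character of the user input is visited no matter where the first difference is.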
Why such string comparison functions are sometimes used
Let's assume we compare strings the usual way, and the correct password is "bac". Let's also assume I can precisely measure how long it takes for the password check to complete.
I (the user) try a, b, c... They don't work.
Then, I try aa. The algorithm compares the first letters, b vs a, sees they differ, and returns false.
I now try with bb. The algorithm compares b vs b, they match, so it goes on to letter #2, compares a vs b, sees it's wrong, returns false. Now, since I am able to time the execution of the algorithm precisely, I know the password starts with "b", because the second pass took more time than the first one - I know the first letter matched.
So I try ba, bb, bc... They fail.
Now I check baa, bbb, see baa runs slower so second letter is a. This way, letter by letter, I can determine the password in an O(cN) number of attempts instead of the O(c^N) that brute force would take, where c is the size of the alphabet and N is the length of the password.
It usually isn't as much of a concern as this explanation might make it sound, because it is unlikely an attacker will time a string comparison to such a degree of accuracy. But sometimes it can be.
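The letter-by-letter attack can be simulated without a stopwatch. In the sketch below, which is only an illustration, a counter of character comparisons stands in for the measured execution time; leakyEquals() and recoverSecret() are hypothetical names, and the attacker is assumed to know the alphabet.

```php
<?php
// Early-exit comparison. $steps counts character comparisons and stands in
// for the execution time an attacker would measure.
function leakyEquals(string $known, string $guess, int &$steps): bool
{
    $steps = 0;
    $len = min(strlen($known), strlen($guess));
    for ($i = 0; $i < $len; $i++) {
        $steps++;
        if ($known[$i] !== $guess[$i]) {
            return false; // stops at the first mismatch: this is the leak
        }
    }
    return strlen($known) === strlen($guess);
}

// Recovers $secret one character at a time, using only the step counts.
function recoverSecret(string $secret, string $alphabet, int $maxLen, int &$attempts): ?string
{
    $attempts = 0;
    $prefix = '';
    while (strlen($prefix) < $maxLen) {
        $bestChar = '';
        $bestSteps = -1;
        foreach (str_split($alphabet) as $ch) {
            $attempts++;
            if (leakyEquals($secret, $prefix . $ch, $steps)) {
                return $prefix . $ch; // full match
            }
            // Double the candidate (aa, bb, ...) so a correct one costs an extra step.
            $attempts++;
            leakyEquals($secret, $prefix . $ch . $ch, $steps);
            if ($steps > $bestSteps) {
                $bestSteps = $steps;
                $bestChar = $ch;
            }
        }
        $prefix .= $bestChar;
    }
    return null; // nothing found within $maxLen characters
}

$found = recoverSecret('bac', 'abc', 8, $attempts);
echo "$found after $attempts attempts\n";
```

The number of attempts grows with the alphabet size times the password length, which is the O(cN) behaviour described above. Against the xor-based comparison the step count would be the same for every guess of a given length, so this strategy would have nothing to go on.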

Related

CTF Type Juggling with ripemd160 hash

I am trying to solve a CTF in which type juggling should be used. The code is:
if ($_GET["hash"] == hash("ripemd160", $_GET["hash"]))
{
    echo $flag;
}
else
{
    echo "<h1>Bad Hash</h1>";
}
I made a script in Python which checks random hashes in ripemd160 that begin with "0e" and end with only numbers. The code is:
import hashlib
import random
import string

def id_generator(size, chars=string.digits):
    return ''.join(random.choice(chars) for _ in range(size))

param = "0e"
results = []
while True:
    h = hashlib.new('ripemd160')
    h.update("{0}".format(str(param)).encode('utf-8'))
    hashed = h.hexdigest()
    if param not in results:
        print(param)
        if hashed.startswith("0e") and hashed[2:].isdigit():
            print(param)
            print(hashed)
            break
        results.append(param)
    else:
        print("CHECKED")
    param = "0e" + str(id_generator(size=10))
Any suggestions on how to solve it? Thank you!

There seems to be a bit of misunderstanding in the comments, so I'll start by explaining the problem a little more:
Type juggling refers to the behaviour of PHP whereby variables are implicitly cast to different data types under certain conditions. For example, all the following logical expressions will evaluate to true in PHP:
0 == 0 // int vs. int
"0" == 0 // str -> int
"abc" == 0 // any non-numerical string -> 0 (PHP 7 and earlier only; this is false since PHP 8)
"1.234E+03" == "0.1234E+04" // string that looks like a float -> float
"0e215962017" == 0 // another string that looks like a float
The last of these examples is interesting because its MD5 hash value is another string consisting of 0e followed by a bunch of decimal digits (0e291242476940776845150308577824). So here's another logical expression in PHP that will evaluate to true:
"0e215962017" == md5("0e215962017")
To solve this CTF challenge, you have to find a string that is "equal" to its own hash value, but using the RIPEMD160 algorithm instead of MD5. When this is provided as a query string variable (e.g., ?hash=0e215962017), then the PHP script will disclose the value of a flag.
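The MD5 example above is easy to check for yourself. The short sketch below only uses the input string quoted in this answer and compares it to its own hash three ways; the variable names are arbitrary.

```php
<?php
$input = '0e215962017';
$hash  = md5($input); // the answer above quotes 0e291242476940776845150308577824

// Loose comparison: both operands are numeric strings, so they are compared as floats (0 == 0).
var_dump($input == $hash);

// Strict comparison: the strings are compared as strings, no numeric conversion.
var_dump($input === $hash);

// hash_equals(): byte-wise string comparison that is also timing-safe.
var_dump(hash_equals($hash, $input));
```

Only the first comparison succeeds, which is exactly the behaviour the challenge script relies on.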
Fake hash collisions like this aren't difficult to find. Roughly 1 in every 256 MD5 hashes will start with '0e', and the probability that the remaining 30 characters are all digits is (10/16)^30. If you do the maths, you'll find that the probability of an MD5 hash equating to zero in PHP is approximately one in 340 million. It took me about a minute (almost 216 million attempts) to find the above example.
Exactly the same method can be used to find similar values that work with RIPEMD160. You just need to test more hashes, since the extra hash digits mean that the probability of a "collision" will be approximately one in 14.6 billion. Quite a lot, but still tractable (in fact, I found a solution to this challenge in about 15 minutes, but I'm not posting it here).
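The two estimates can be reproduced with a few lines of arithmetic. This is just a sketch of the calculation described above; oneInN() is a made-up helper, and it assumes every hex digit of a hash is uniformly distributed.

```php
<?php
// Expected number of hashes to test before one looks like "0e" + digits only.
// $hexDigits is the length of the hex digest: 32 for MD5, 40 for RIPEMD160.
function oneInN(int $hexDigits): float
{
    $startsWith0e = (1 / 16) ** 2;                 // first two digits are "0" and "e"
    $restDigits   = (10 / 16) ** ($hexDigits - 2); // remaining digits are all 0-9
    return 1 / ($startsWith0e * $restDigits);
}

printf("MD5:       about one in %s\n", number_format(oneInN(32)));
printf("RIPEMD160: about one in %s\n", number_format(oneInN(40)));
```

The first figure comes out at roughly 340 million and the second at roughly 14.6 billion, matching the numbers quoted above.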
Your code, on the other hand, will take much, much longer to find a solution. First of all, there is absolutely no point in generating random inputs. Sequential values will work just as well, and will be much faster to generate.
If you use sequential input values, then you also won't need to worry about repeating the same hash calculations. Your code uses a list structure to store previously hashed values. This is a terrible idea. Searching for an item in a list is an O(n) operation, so once your code has (unsuccessfully) tested a billion inputs, it will have to compare every new input against each of these billion inputs at each iteration, causing your code to grind to a complete standstill. Your code would actually run a lot faster if you didn't bother checking for duplicates. When you have time, I suggest you learn when to use lists, dicts and sets in Python.
Another problem is that your code only tests 10-digit numbers, which means it can only test a maximum of 10 billion possible inputs. Based on the numbers given above, are you sure this is a sensible limit?
Finally, your code is printing every single input string before you calculate its hash. Before your program outputs a solution, you can expect it to print out somewhere in the order of a billion screenfuls of incorrect guesses. Is there any point in doing this? No.
Here's the code I used to find the MD5 collision I mentioned earlier. You can easily adapt it to work with RIPEMD160, and you can convert it to Python if you like (although the PHP code is much simpler):
$n = 0;
while (1) {
    $s = "0e$n";
    $h = md5($s);
    if ($s == $h) break;
    $n++;
}
echo "$s : $h\n";
Note: Use PHP's hash_equals() function and strict comparison operators to avoid this sort of vulnerability in your own code.
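One possible way to adapt the search loop to other algorithms is sketched below. It is an illustration rather than the code used for the challenge: isMagicHash() and findSelfMagic() are hypothetical helpers, the algorithm name is passed to PHP's hash() function, and the loop is bounded so that it always terminates.

```php
<?php
// True if $s is "0e" followed only by decimal digits, i.e. a string that
// PHP's loose comparison treats as the number 0.
function isMagicHash(string $s): bool
{
    return preg_match('/^0e[0-9]+$/D', $s) === 1;
}

// Tests the inputs "0e$start", "0e($start + 1)", ... sequentially and returns
// the first one whose hash also looks like 0e<digits>, or null if none is
// found within $maxAttempts.
function findSelfMagic(string $algo, int $start, int $maxAttempts): ?string
{
    for ($n = $start; $n < $start + $maxAttempts; $n++) {
        $s = "0e$n";
        if (isMagicHash(hash($algo, $s))) {
            return $s;
        }
    }
    return null;
}

// Re-checks the MD5 example from this answer in a single attempt.
var_dump(findSelfMagic('md5', 215962017, 1));
```

Calling findSelfMagic('ripemd160', 0, PHP_INT_MAX) would run the full search; expect it to need billions of iterations, as estimated above.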

how to create a row of digits based on a string in php [duplicate]

In PHP, is there a way to generate a unique hash from a string, where the hash is made up of digits only?
example:
return md5(234); // returns 098f6bcd4621d373cade4e832627b4f6
but I need
return numhash(234); // returns 00978902923102372190
(20 numbers only)
The problem here is that I want the hash to be short.
Edit:
OK, let me explain the back story here.
I have a site that has an ID for every registered person. I also need an ID for the person to use and exchange (hence it can't be too long). So far the ID numbering has been 00001, 00002, 00003, etc.
1. This makes some people look more important.
2. This reveals application info that I don't want to reveal.
To fix points 1 and 2, I need to "hide" the number while keeping it unique.
Edit + SOLUTION:
Numeric hash function based on the code by https://stackoverflow.com/a/23679870/175071
/**
 * Return a number only hash
 * https://stackoverflow.com/a/23679870/175071
 * @param $str
 * @param null $len
 * @return number
 */
public function numHash($str, $len=null)
{
    $binhash = md5($str, true);
    $numhash = unpack('N2', $binhash);
    $hash = $numhash[1] . $numhash[2];
    if($len && is_int($len)) {
        $hash = substr($hash, 0, $len);
    }
    return $hash;
}
// Usage
numHash(234, 20); // always returns 6814430791721596451

An MD5 or SHA1 hash in PHP returns a hexadecimal number, so all you need to do is convert bases. PHP has a function that can do this for you:
$bignum = hexdec( md5("test") );
or
$bignum = hexdec( sha1("test") );
PHP Manual for hexdec
Since you want a limited size number, you could then use modular division to put it in a range you want.
$smallnum = $bignum % [put your upper bound here]
EDIT
As noted by Artefacto in the comments, using this approach will result in a number beyond the maximum size of an integer in PHP, and the result after modular division will always be 0. However, taking only the first 15 characters of the hash doesn't have this problem, because 15 hex digits are 60 bits and fit into a 64-bit integer. Revised version for calculating the initial large number:
$bignum = hexdec( substr(sha1("test"), 0, 15) );
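Putting the two steps together might look like the following sketch. numericId() and its $upperBound parameter are names chosen for this example, and it assumes a 64-bit PHP build so that hexdec() returns an integer rather than a float.

```php
<?php
// Maps a string to a number in the range 0 .. $upperBound - 1.
function numericId(string $input, int $upperBound): int
{
    // 15 hex digits are 60 bits, so hexdec() returns an int on 64-bit PHP.
    $bignum = hexdec(substr(sha1($input), 0, 15));
    return $bignum % $upperBound;
}

echo numericId('test', 100000), "\n";
```

The result is deterministic, but different inputs can map to the same number, and the smaller the upper bound, the more often that happens.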

You can try crc32(). See the documentation at: http://php.net/manual/en/function.crc32.php
$checksum = crc32("The quick brown fox jumped over the lazy dog.");
printf("%u\n", $checksum); // prints 2191738434
With that said, crc should only be used to validate the integrity of data.

There are some good answers, but to me the approaches seem silly.
They first force PHP to create a hex number, then convert it back (hexdec) into a big integer, and then cut it down to a number of digits... that is a lot of work!
Instead, why not
Read the hash as binary:
$binhash = md5('[input value]', true);
then using
$numhash = unpack('N2', $binhash); //- or 'V2' for little endian
to cast this as two INTs ($numhash is an array of two elements). Now you can reduce the number of bits in the number simply using an AND operation. e.g:
$result = $numhash[1] & 0x000FFFFF; //- to get numbers between 0 and 1048575
But beware of collisions! Reducing the number of bits increases the probability that two different [input value]s produce the same output.
I think a much better way would be "ID encryption" with a bijective function, so that no collisions can happen. For the simplest kind, just use an affine cipher.
Example with an input value range from 0 to 25:
function numcrypt($a)
{
    return ($a * 15) % 26;
}
function unnumcrypt($a)
{
    return ($a * 7) % 26;
}
Output:
numcrypt(1) : 15
numcrypt(2) : 4
numcrypt(3) : 19
unnumcrypt(15) : 1
unnumcrypt(4) : 2
unnumcrypt(19) : 3
e.g.
$id = unnumcrypt($_GET['userid']);
... do something with the ID ...
echo ' go ';
Of course this is not secure, but if no one knows the method used for your encryption and there are no security requirements, this way is faster and collision-free.
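The same idea scales to a realistic ID range. The sketch below assumes 5-digit IDs (0 to 99999); the constants and the names hideId(), revealId() and modInverse() are made up for illustration. The multiplier must share no factor with the modulus, otherwise the mapping is not reversible, and the inverse is computed with the extended Euclidean algorithm instead of being hard-coded.

```php
<?php
const ID_MODULUS    = 100000; // assumed ID range: 0 .. 99999
const ID_MULTIPLIER = 7919;   // any number sharing no factor with the modulus
const ID_OFFSET     = 12345;  // arbitrary shift

// Modular inverse of $a modulo $m via the extended Euclidean algorithm.
function modInverse(int $a, int $m): int
{
    [$oldR, $r] = [$a, $m];
    [$oldS, $s] = [1, 0];
    while ($r !== 0) {
        $q = intdiv($oldR, $r);
        [$oldR, $r] = [$r, $oldR - $q * $r];
        [$oldS, $s] = [$s, $oldS - $q * $s];
    }
    if ($oldR !== 1) {
        throw new InvalidArgumentException('multiplier and modulus are not coprime');
    }
    return (($oldS % $m) + $m) % $m;
}

function hideId(int $id): int
{
    return ($id * ID_MULTIPLIER + ID_OFFSET) % ID_MODULUS;
}

function revealId(int $hidden): int
{
    $inverse = modInverse(ID_MULTIPLIER, ID_MODULUS);
    // Undo the offset (keeping the value non-negative), then undo the multiplication.
    $shifted = (($hidden - ID_OFFSET) % ID_MODULUS + ID_MODULUS) % ID_MODULUS;
    return ($shifted * $inverse) % ID_MODULUS;
}

$hidden = hideId(3);
printf("%05d -> %05d -> %05d\n", 3, $hidden, revealId($hidden));
```

As with the 26-element example, this only obscures the sequence; anyone who sees a few consecutive IDs can work out the constants.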

The problem with cutting off the hash is the increased risk of collisions. As an alternative, try:
return crc32("Hello World");
The crc32() documentation says:
Generates the cyclic redundancy checksum polynomial of 32-bit lengths
of the str. This is usually used to validate the integrity of data
being transmitted.
That gives us a 32-bit integer, which can be negative on 32-bit installations and is positive on 64-bit ones. This integer can be stored as an ID in a database, since it fits into a 32-bit column. crc32() already returns an integer, so no hexdec() conversion is needed. Note that different inputs can still produce the same checksum, so check for uniqueness before using it as an ID.

First of all, md5 is basically compromised, so you shouldn't be using it for anything but non-critical hashing.
PHP5 has the hash() function, see http://www.php.net/manual/en/function.hash.php.
Setting the last parameter to true will give you a string of binary data. Alternatively, you could split the resulting hexadecimal hash into pieces of 2 characters and convert them to integers individually, but I'd expect that to be much slower.

Try Hashids.
It encodes a number into a format you can define, including the number of characters and the set of characters used.
Example:
$hashids->encode(1);
This might return "28630", depending on your format.

Just use my manual hash method below:
Divide the number (e.g. a 6-digit one) by the primes 3, 5 and 7 in turn.
Take the first 6 digits after the decimal point as the ID. Check for uniqueness before actually creating the ID; if there is a collision, increase the last digit by 1 until there is none.
E.g. 123456 gives you 771428
123457 gives you 780952
123458 gives you 790476.
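Dividing by 3, 5 and 7 is the same as dividing by 105, so the method can be sketched with integer arithmetic only. fractionDigits() is a hypothetical name, and the input is assumed to be a non-negative integer.

```php
<?php
// First six digits after the decimal point of $n / (3 * 5 * 7).
function fractionDigits(int $n): string
{
    $divisor = 3 * 5 * 7; // 105
    // Only the remainder contributes to the fractional part.
    $digits = intdiv(($n % $divisor) * 1000000, $divisor);
    return str_pad((string) $digits, 6, '0', STR_PAD_LEFT);
}

foreach ([123456, 123457, 123458] as $n) {
    echo $n, ' gives you ', fractionDigits($n), "\n";
}
```

Because the result depends only on the remainder of the number modulo 105, there are at most 105 distinct outputs before the uniqueness check has to step in, so collisions will be frequent on any site with more than a handful of users.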

strcmp vs. == vs. === in PHP for checking hash equality

I'm using crypt() to hash passwords in PHP, and am trying to work out the safest way of testing equality of the resulting hash when performing password checks.
There are three options that I can see:
Option 1 - Double Equals
function checkPassword($hash, $password)
{
    return crypt($password, $hash) == $hash;
}
Option 2 - Triple Equals
function checkPassword($hash, $password)
{
    return crypt($password, $hash) === $hash;
}
Option 3 - strcmp()
function checkPassword($hash, $password)
{
    return strcmp(crypt($password, $hash), $hash) === 0;
}
My intuition tells me that option 1 is a bad idea, due to the lack of type checking, and that options 2 or 3 are likely to be better. However, I can't work out if there's a specific case that === or strcmp would fail under. Which is safest for this purpose?

When it comes to security I prefer to use the === operator. === ensures the two operands are exactly the same, without trying to accommodate some casting in order to "help" the comparison reach a successful match, as a loosely typed language like PHP does to help during development.
Of course, one of the operands is to be trusted. A hash from the database is trustworthy, while the user input is not.
One can always dither for a while and conclude that there is no risk in using == in a specific case. Maybe. But, for instance, in PHP 7 and earlier
"0afd9f7b678fdefca" == 0 is true
"aafd9f7b678fdefca" == 0 is also true
as PHP tries to convert the "hash" into a number (probably using atoi), which gives 0. (Since PHP 8, a non-numeric string is compared to a number as a string, so both are false.) While it is unlikely crypt returns 0, I'd prefer to maximize the cases where the passwords don't match (and answer a support call) by using ===, than allow a rare case that I didn't think about by using ==.
As for strcmp, the function returns <0 or >0 if different, and 0 if equal. But
strcmp("3", 0003) returns 0
strcmp("0003", 0003) returns -3 (-1 since PHP 8.2)
which is not surprising after all. A literal 0003 is actually an integer, 3, and since strcmp expects a string, the 3 will be converted to "3". But that shows there is some conversion that may happen in this case, since strcmp is a function, while === is part of the language.
So my preference in that case goes to === (which is faster than == anyway).

You should be using the hash_equals() function that is built into PHP. There would be no need to make your own function. The hash_equals() will return a boolean value.
In my opinion it is usually NOT a good idea to use == or === for comparing strings let alone hashed strings.
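Applied to the checkPassword() function from the question, that could look like the sketch below. It keeps the question's parameter order; the sample password and the use of password_hash() to create a test hash are assumptions made for the example.

```php
<?php
// Same signature as in the question, but with a timing-safe comparison.
function checkPassword(string $hash, string $password): bool
{
    // hash_equals() takes the known (trusted) string first, the user-derived one second.
    return hash_equals($hash, crypt($password, $hash));
}

// Example hash for demonstration; password_hash() produces a crypt()-compatible bcrypt hash.
$hash = password_hash('correct horse', PASSWORD_BCRYPT);

var_dump(checkPassword($hash, 'correct horse'));
var_dump(checkPassword($hash, 'wrong guess'));
```

For hashes created with password_hash(), the built-in password_verify($password, $hash) does the same job in one call and is also documented as safe against timing attacks.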

That is incorrect, please look at the definition of the function.
According to PHP:
Returns < 0 if str1 is less than str2;
> 0 if str1 is greater than str2,
and 0 if they are equal
It returns less than 0 if str1 is less than str2. Note the phrase "less than": it does not return just -1, but any negative value. The same happens when str1 is greater than str2, but it returns a positive, non-zero value. It returns a positive value that can be 1, or any number thereafter.
In PHP versions before 8.2, strcmp() typically returns the difference between the byte values of the first pair of characters that differ.
Here is an example:
$output = strcmp("red", "blue");
The variable $output will contain a value of 16 ('r' is byte 114 and 'b' is byte 98). Since PHP 8.2 the result is normalized to -1, 0, or 1, so it will contain 1.

I think that using == would be sufficient in your case.
== checks for equality regardless of type, whereas === checks for equality as well as type.
1 == "1" = True
1 === "1" = False
Since we're not too concerned with type, I'd keep it simple and go with ==.

Comparing two strings, limited via bitmask in PHP

In some PHP I need to compare two strings, but only on the bits that are set as one in the bitmask. How would I implement such a behavior?
I've tried:
$string1='aaabbb';
$string2='ababbb';
$bitmask='101101';
function compare($string1, $string2, $bitmask){
$resultBitmask=(~($string1 ^ $string2)|~$bitmask);
}
For clarity's sake, I've written ff bytes as 1 in the bitmask for illustrative purposes. They would actually be ff in hex when a bitmask is generated. Same goes for 0 being null bytes.
The strings and the bitmask have different lengths each time the function is called. I've managed to get a set of bits for comparison, but am unable to check whether they are all set since the lengths differ. At this time, I've been using preg_match with a regex that matches any number of ff bytes, but is there a more elegant solution?
Edit: Since the strings are any length up to 4096 bits long, they cannot be converted to numbers.

It's not the flashest way of doing it, but:
$stillTheSame = true;
// Use < rather than <= so that $i never runs past the end of the mask
for ($i = 0; $i < strlen($bitmask); $i++)
{
    if ($bitmask[$i] == 1)
    {
        if ($string1[$i] != $string2[$i])
        {
            $stillTheSame = false;
            break;
        }
    }
}
Not sure of your actual checking logic, but hopefully this helps.

Self-solved:
Since this will repeat with many strings of the same length during a run, but have different lengths between runs, I need to check that the resulting string after the bitwise operations is all ones and the correct length. I realized that this string full of ones can be generated when needed, which is quite rarely, once every 1000 or so string comparisons. I can generate the string before runs as follows:
$ones=str_repeat(chr(255), $byte_length);
and then define the compare() function a bit differently:
function compare($string1, $string2, $bitmask){
    global $ones;
    $resultBitmask = (~($string1 ^ $string2) | ~$bitmask);
    // Compare with ===; a single = would assign $ones and always be truthy
    if ($resultBitmask === $ones) {
        return 1;
    } else {
        return 0;
    }
}
The trick was the str_repeat which I was not aware of before.
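An equivalent check that avoids the global variable is sketched below. It relies on PHP applying ^ and & byte by byte when both operands are strings; maskedEquals() is a hypothetical name, and all three arguments are assumed to be binary strings of the same length, with mask bytes of ff marking the positions to compare.

```php
<?php
// True if $a and $b agree on every bit that is set in $mask.
function maskedEquals(string $a, string $b, string $mask): bool
{
    if (strlen($a) !== strlen($b) || strlen($a) !== strlen($mask)) {
        throw new InvalidArgumentException('all arguments must have the same length');
    }
    // xor leaves a 1 wherever the strings differ; the mask keeps only the bits of interest.
    $diff = ($a ^ $b) & $mask;
    return $diff === str_repeat("\0", strlen($mask));
}

$mask = "\xff\x00\xff\xff\x00\xff"; // the "101101" mask from the question
var_dump(maskedEquals('aaabbb', 'ababbb', $mask));              // differing byte is masked out
var_dump(maskedEquals('aaabbb', 'ababbb', str_repeat("\xff", 6))); // differing byte is compared
```

Since the all-zero string is built from the mask length on each call, nothing has to be prepared before a run.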

Secure string compare function

I just came across this code in the HTTP Auth library of the Zend Framework. It seems to be using a special string compare function to make it more secure. However, I don't quite understand the comments. Could anybody explain why this function is more secure than doing $a == $b?
/**
 * Securely compare two strings for equality while avoided C level memcmp()
 * optimisations capable of leaking timing information useful to an attacker
 * attempting to iteratively guess the unknown string (e.g. password) being
 * compared against.
 *
 * @param string $a
 * @param string $b
 * @return bool
 */
protected function _secureStringCompare($a, $b)
{
    if (strlen($a) !== strlen($b)) {
        return false;
    }
    $result = 0;
    for ($i = 0; $i < strlen($a); $i++) {
        $result |= ord($a[$i]) ^ ord($b[$i]);
    }
    return $result == 0;
}

It looks like they're trying to prevent timing attacks.
In cryptography, a timing attack is a side channel attack in which the attacker attempts to compromise a cryptosystem by analyzing the time taken to execute cryptographic algorithms. Every logical operation in a computer takes time to execute, and the time can differ based on the input; with precise measurements of the time for each operation, an attacker can work backwards to the input.
Basically, if it takes a different amount of time to compare a correct password and an incorrect password, then you can use the timing to figure out how many characters of the password you've guessed correctly.
Consider an extremely flawed string comparison (this is basically the normal string equality function, with an obvious wait added):
function compare($a, $b) {
    if (strlen($a) !== strlen($b)) {
        return false;
    }
    for ($i = 0; $i < strlen($a); ++$i) {
        if ($a[$i] !== $b[$i]) {
            return false;
        }
        usleep(10000); // wait 10 ms
    }
    return true;
}
Say you give a password and it (consistently) takes some amount of time for one password, and about 10 ms longer for another. What does this tell you? It means the second password has one more character correct than the first one.
This lets you do movie hacking -- where you guess a password one character at a time (which is much easier than guessing every single possible password).
In the real world, there are other factors involved, so you have to try a password many, many times to average out the randomness, but you can still try every one-character password until one is obviously taking longer, then start on two-character passwords, and so on.
This function still has a minor problem here:
if (strlen($a) !== strlen($b)) {
    return false;
}
It lets you use timing attacks to figure out the correct length of the password, which lets you not bother guessing any shorter or longer passwords. In general, you want to hash your passwords first (which will create equal-length strings), so I'm guessing they didn't consider it to be a problem.
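One way to get equal-length strings before comparing, sketched here as an illustration and not as what the Zend code does, is to run both inputs through a keyed hash and compare the digests with hash_equals(). The function name lengthHidingEquals() and the choice of HMAC-SHA256 with a throwaway random key are assumptions made for this example.

```php
<?php
// Compares two strings via their HMAC digests, which always have the same length.
function lengthHidingEquals(string $known, string $user): bool
{
    // A fresh random key per call; it never needs to be stored.
    $key = random_bytes(32);
    return hash_equals(
        hash_hmac('sha256', $known, $key),
        hash_hmac('sha256', $user, $key)
    );
}

var_dump(lengthHidingEquals('secret', 'secret'));
var_dump(lengthHidingEquals('secret', 'secret-but-longer'));
```

The comparison itself now always runs over two 64-character digests, so the early length check from the original function is no longer needed.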
