PHP: which is the fastest string-hashing method for the filesystem?

I want to cache stuff to disk, and of course what I put on the disk must be "safe". I took some measurements:
foreach (['md5', 'crc32', 'base64_encode'] as $item)
{
    $m = microtime(true);
    for ($i = 1; $i < 1000000; $i++)
    {
        $a = 'adat'.str_repeat('x', mt_rand(10, 1000));
        $a = $item($a);
    }
    echo $item.'<br>';
    echo microtime(true)-$m;
    echo '<hr>';
}
the results:
md5
1.9821128845215
crc32
1.8771071434021
base64_encode
1.110063791275
So base64_encode won, but it generates a long string, so it's easy to exceed the 256-character limit. Is there a faster encoding method that I'm not aware of? It doesn't have to be reversible.

Try this:
$hashValue = hash('tiger160,3', $string);
Tiger has a reasonable output length and good speed.
The comments in the PHP manual also describe the characteristics of the different hash algorithms.
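As a rough sketch of how that could be used for cache files: the helper name cacheFileName, the '/tmp/cache' directory and the '.cache' suffix below are made up for illustration. The point is only that the digest has a fixed length no matter how long the key is.

```php
<?php
// Hypothetical helper: map an arbitrary cache key to a fixed-length file name.
function cacheFileName(string $key, string $dir = '/tmp/cache'): string
{
    // 'tiger160,3' yields a 160-bit digest, i.e. 40 hexadecimal characters.
    return $dir . '/' . hash('tiger160,3', $key) . '.cache';
}

echo cacheFileName('adat' . str_repeat('x', 1000)), "\n";
```

Because the file name is always 40 hex characters plus the suffix, it stays well under the file name length limit mentioned in the question.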

Related

Timing attack with PHP

I'm trying to produce a timing attack in PHP and am using PHP 7.1 with the following script:
<?php
$find = "hello";
$length = array_combine(range(1, 10), array_fill(1, 10, 0));
for ($i = 0; $i < 1000000; $i++) {
    for ($j = 1; $j <= 10; $j++) {
        $testValue = str_repeat('a', $j);
        $start = microtime(true);
        if ($find === $testValue) {
            // Do nothing
        }
        $end = microtime(true);
        $length[$j] += $end - $start;
    }
}
arsort($length);
$length = key($length);
var_dump($length . " found");
$found = '';
$alphabet = array_combine(range('a', 'z'), array_fill(1, 26, 0));
for ($len = 0; $len < $length; $len++) {
    $currentIteration = $alphabet;
    $filler = str_repeat('a', $length - $len - 1);
    for ($i = 0; $i < 1000000; $i++) {
        foreach ($currentIteration as $letter => $time) {
            $testValue = $found . $letter . $filler;
            $start = microtime(true);
            if ($find === $testValue) {
                // Do nothing
            }
            $end = microtime(true);
            $currentIteration[$letter] += $end - $start;
        }
    }
    arsort($currentIteration);
    $found .= key($currentIteration);
}
var_dump($found);
This is searching for a word with the following constraints
a-z only
up to 10 characters
The script finds the length of the word without any issue, but the value of the word never comes back as expected with a timing attack.
Is there something I am doing wrong?
The script loops through the lengths and correctly identifies the length. It then loops through each letter (a-z) and checks the speed of each. In theory, 'haaaa' should be slightly slower than 'aaaaa' because the first letter is an h. It then carries on for each of the five letters.
Running gives something like 'brhas' which is clearly wrong (it's different each time, but always wrong).
Is there something I am doing wrong?
I don't think so. I tried your code and, like you and the other people who tried it in the comments, I get completely random results for the second loop. The first one (the length) is mostly reliable, though not 100% of the time. By the way, the $argv[1] trick suggested didn't really improve the consistency of the results, and honestly I don't see why it should.
Since I was curious I had a look at the PHP 7.1 source code. The string identity function (zend_is_identical) looks like this:
case IS_STRING:
return (Z_STR_P(op1) == Z_STR_P(op2) ||
(Z_STRLEN_P(op1) == Z_STRLEN_P(op2) &&
memcmp(Z_STRVAL_P(op1), Z_STRVAL_P(op2), Z_STRLEN_P(op1)) == 0));
Now it's easy to see why the first timing attack on the length works great. If the length is different then memcmp is never called and therefore it returns a lot faster. The difference is easily noticeable, even without too many iterations.
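To make the two stages visible, here is a rough PHP restatement of that C logic. It is only an illustration, not what the engine actually executes: identicalStrings and its $bytesCompared counter are made up for this sketch, and a real memcmp may compare several bytes at once.

```php
<?php
// Illustrative restatement of the quoted C logic; not the engine's real code path.
function identicalStrings(string $a, string $b, int &$bytesCompared = 0): bool
{
    $bytesCompared = 0;
    if (strlen($a) !== strlen($b)) {
        return false; // lengths differ: the byte comparison never starts
    }
    for ($i = 0, $n = strlen($a); $i < $n; $i++) {
        $bytesCompared++;
        if ($a[$i] !== $b[$i]) {
            return false; // first mismatch ends the comparison early
        }
    }
    return true;
}

identicalStrings('hello', 'aaa', $n);
echo "different length: $n bytes compared\n";
identicalStrings('hello', 'aaaaa', $n);
echo "same length, wrong first letter: $n bytes compared\n";
identicalStrings('hello', 'haaaa', $n);
echo "same length, right first letter: $n bytes compared\n";
```

The length check skips the whole comparison, while a correct first letter only adds one more byte of work, which is why the second stage is so much harder to measure.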
Once you have the length figured out, in your second loop you are basically trying to attack the underlying memcmp. The problem is that the difference in timing highly depends on:
the implementation of memcmp
the current load and interfering processes
the architecture of the machine.
I recommend this article titled "Benchmarking memcmp for timing attacks" for more detailed explanations. They did a much more precise benchmark and still were not able to get a clear noticeable difference in timing. I'm simply going to quote the conclusion of the article:
In conclusion, it highly depends on the circumstances if a memcmp() is subject to a timing attack.

php memory limit test

It seems this is a perennially unsolved question. I ran a simple test of the memory limit on my local machine (from the command line):
<?php
for ($i = 0; $i < 4000*4000; $i ++) {
    $R[$i] = 1.00001;
}
?>
and I have the memory limit set to 128M. But PHP still reports "Allowed memory exhausted". Why?
Well, I wouldn't call it a perennially unsolved question. There are a few reasons for it. PHP is a very inefficient language in terms of memory management; that's no secret. The code you provided could be optimized a little, but not enough to make a difference. For example, move the multiplication out of the for loop condition and store the value in a variable; otherwise you perform that operation on every iteration. But that will not make any significant difference: 2310451248 bytes as it is, and 2310451144 bytes if you do it as I proposed. The point remains: PHP is not a low-level language, so you can't expect it to have the same efficiency as C, for example. In your particular case, the memory required is a little over 2 GB (2.15 GB), far more than the 128M limit:
<?php
ini_set('memory_limit', '4096M');
$ii = 4000*4000;
//$R = new SplFixedArray($ii);
$R = array();
for ($i = 0; $i < $ii; $i ++) {
    $R[$i] = 1.00001;
}
echo humanize(memory_get_usage())."\n";
function humanize($size)
{
    $unit=array('b','kb','mb','gb','tb','pb');
    return round($size/pow(1024,($i=floor(log($size,1024)))),2).' '.$unit[$i];
}
?>
But using SplFixedArray things change a lot:
<?php
ini_set('memory_limit', '4096M');
$ii = 4000*4000;
$R = new SplFixedArray($ii);
for ($i = 0; $i < $ii; $i ++) {
    $R[$i] = 1.00001;
}
echo humanize(memory_get_usage())."\n";
function humanize($size)
{
    $unit=array('b','kb','mb','gb','tb','pb');
    return round($size/pow(1024,($i=floor(log($size,1024)))),2).' '.$unit[$i];
}
?>
Which requires "only" 854.72 mb.
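To put those two figures in perspective, here is a small back-of-the-envelope calculation. It uses only the numbers quoted in this answer; the variable names are made up, and the per-element costs will differ between PHP versions.

```php
<?php
// Figures quoted in this answer; the per-element costs are derived from them.
$elements        = 4000 * 4000;                  // 16,000,000 values
$arrayBytes      = 2310451248;                   // plain array, as reported above
$fixedArrayBytes = (int) (854.72 * 1024 * 1024); // SplFixedArray, from "854.72 mb"
$limitBytes      = 128 * 1024 * 1024;            // memory_limit = 128M

$perElementArray = $arrayBytes / $elements;
$perElementFixed = $fixedArrayBytes / $elements;
$fitInLimit      = (int) floor($limitBytes / $perElementArray);

printf("array:         %.1f bytes per element\n", $perElementArray);
printf("SplFixedArray: %.1f bytes per element\n", $perElementFixed);
printf("elements that fit in 128M at the array rate: %d\n", $fitInLimit);
```

At roughly 144 bytes per element for the plain array, a 128M limit is used up long before the loop reaches 16 million elements.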
This is one of the main reasons why companies that deal with larger amounts of data generally avoid PHP and go for languages such as Python instead. There is a great article describing the problems and causes around this topic, found here. Hope that helps.

Fastest way of getting a character inside a string given the index (PHP)

I know of several ways to get a character from a string given its index.
<?php
$string = 'abcd';
echo $string[2];
echo $string{2};
echo substr($string, 2, 1);
?>
I don't know if there are any more ways; if you know of any, please don't hesitate to add them. The question is: if I were to pick one of the methods above and repeat it a couple of million times, possibly using mt_rand to get the index value, which would be the most efficient in terms of lowest memory consumption and fastest speed?
To arrive at an answer, you'll need to set up a benchmark test rig. Compare all methods over several hundred thousand or several million iterations on an idle box. Use the built-in microtime function to measure the difference between start and finish. That's your elapsed time.
The test should take you all of 2 minutes to write.
To save you some effort, I wrote a test. It shows that the function call (substr) is MUCH slower, as expected. The curly-brace syntax ({}) is as fast as the square-bracket syntax ([]); on the PHP versions that accept both, they are interchangeable. [] is preferred, because that is the direction PHP is going for string offsets: the {} form was deprecated in PHP 7.4 and removed in PHP 8.0.
<?php
$string = 'abcd';
$limit = 1000000;
$r = array(); // results
// Curly-brace offset syntax: deprecated in PHP 7.4 and removed in PHP 8.0,
// so this script only parses on PHP 7.x or older
$s = microtime(true);
for ($i = 0; $i < $limit; ++$i) {
    $c = $string{2};
}
$r[] = microtime(true) - $s;
echo "\n";
// PHP functional solution
$s = microtime(true);
for ($i = 0; $i < $limit; ++$i) {
    $c = substr($string, 2, 1);
}
$r[] = microtime(true) - $s;
echo "\n";
// index method
$s = microtime(true);
for ($i = 0; $i < $limit; ++$i) {
    $c = $string[2];
}
$r[] = microtime(true) - $s;
echo "\n";
// RESULTS
foreach ($r as $i => $v) {
    echo "RESULT ($i): $v \n";
}
?>
Results:
RESULT (PHP4 & 5 idiomatic braces syntax): 0.19106006622314
RESULT (string slice function): 0.50699090957642
RESULT (*index syntax, the future as the braces are being deprecated *): 0.19102001190186
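For anyone re-running this on a current PHP version, where the braces syntax no longer parses, a minimal sketch of the same comparison could look like this. benchSeconds is a made-up helper, hrtime() needs PHP 7.3 or newer, and the closure call adds the same overhead to both measurements, so only the relative difference is meaningful.

```php
<?php
// Hypothetical helper: time a callable over many iterations using hrtime().
function benchSeconds(callable $fn, int $iterations = 1000000): float
{
    $start = hrtime(true); // monotonic clock in nanoseconds (PHP 7.3+)
    for ($i = 0; $i < $iterations; ++$i) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9;
}

$string = 'abcd';
$results = [
    'index []' => benchSeconds(function () use ($string) { return $string[2]; }),
    'substr()' => benchSeconds(function () use ($string) { return substr($string, 2, 1); }),
];
foreach ($results as $label => $seconds) {
    printf("%-10s %.6f s\n", $label, $seconds);
}
```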

Turning an integer into random string and back again

What I want is to convert an integer into a string. For example, 123456789 may become 8GFsah93r, like YouTube, Pastebin and so on. I then want to convert it back.
I'm working with large integers, for example: 131569877435989900
Take a look at this link: http://codepad.viper-7.com/wHKOMi
This is my attempt using a function I found on the web. Obviously, it's not converting back to the integer correctly. I need something that does this reliably.
Thanks
OK, one idea is to use a character array as the digits of a numeral system. Then you can convert from base 10 to base x and vice versa. The value will be shorter and less readable (although you should encrypt it with a two-way cipher if it must be secure).
A solution:
final class UrlShortener {
    private static $charfeed = Array(
        'a','A','b','B','c','C','d','D','e','E','f','F','g','G','h','H','i','I','j','J','k','K','l','L','m',
        'M','n','N','o','O','p','P','q','Q','r','R','s','S','t','T','u','U','v','V','w','W','x','X','y','Y',
        'z','Z','0','1','2','3','4','5','6','7','8','9');
    public static function intToShort($number) {
        $need = count(self::$charfeed);
        $s = '';
        do {
            $s .= self::$charfeed[$number%$need];
            $number = floor($number/$need);
        } while($number > 0);
        return $s;
    }
    public static function shortToInt($string) {
        $num = 0;
        $need = count(self::$charfeed);
        $length = strlen($string);
        for($x = 0; $x < $length; $x++) {
            $key = array_search($string[$x], self::$charfeed);
            $value = $key * pow($need, $x);
            $num += $value;
        }
        return $num;
    }
}
Then you can use:
UrlShortener::intToShort(2);
UrlShortener::shortToInt("b");
EDIT
With large numbers, the version above does not work. For very large numbers you should use this version, which relies on bcmath (http://www.php.net/manual/en/book.bc.php):
final class UrlShortener {
    private static $charfeed = Array(
        'a','A','b','B','c','C','d','D','e','E','f','F','g','G','h','H','i','I','j','J','k','K','l','L','m',
        'M','n','N','o','O','p','P','q','Q','r','R','s','S','t','T','u','U','v','V','w','W','x','X','y','Y',
        'z','Z','0','1','2','3','4','5','6','7','8','9');
    public static function intToShort($number) {
        $number = (string) $number; // bcmath works on numeric strings
        $need = (string) count(self::$charfeed);
        $s = '';
        do {
            $s .= self::$charfeed[(int) bcmod($number, $need)];
            $number = bcdiv($number, $need, 0); // integer division without float rounding
        } while (bccomp($number, '0') > 0);
        return $s;
    }
    public static function shortToInt($string) {
        $num = '0';
        $need = (string) count(self::$charfeed);
        $length = strlen($string);
        for ($x = 0; $x < $length; $x++) {
            $key = array_search($string[$x], self::$charfeed, true);
            // add digit * base^position, keeping everything as bcmath strings
            $num = bcadd($num, bcmul((string) $key, bcpow($need, (string) $x)));
        }
        return $num;
    }
}
$original = '131569877435989900'; // pass large values as strings so they never become floats
$short = UrlShortener::intToShort($original);
echo $short;
echo '<br/>';
$result = UrlShortener::shortToInt($short);
echo $result;
echo '<br/>';
echo bccomp($original, $result); // 0 means the two values are equal
If something is missing here, please let me know, because it's only a snippet from my library (I don't want to insert the whole thing here).
Check base64 encoding: http://php.net/manual/en/function.base64-encode.php http://php.net/manual/en/function.base64-decode.php
If you want a shorter string, first encode the number into an 8-bit string, then Base64-encode that. You can do this with % 256 and / 256.
Or you could manually do what base64 does: take 6 bits at a time and encode each group as a character.
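A minimal sketch of the "pack into bytes first" idea, using pack() instead of doing the % 256 and / 256 steps by hand. The function names idToToken and tokenToId are made up, and the code assumes a 64-bit PHP build and a non-negative ID.

```php
<?php
// Hypothetical helpers; assume a 64-bit PHP build and a non-negative integer ID.
function idToToken(int $id): string
{
    $bytes = pack('J', $id); // 8 bytes, unsigned 64-bit, big-endian
    // URL-safe Base64 variant with the trailing "=" padding removed.
    return rtrim(strtr(base64_encode($bytes), '+/', '-_'), '=');
}

function tokenToId(string $token): int
{
    $bytes = base64_decode(strtr($token, '-_', '+/'));
    return unpack('J', $bytes)[1]; // unpack() returns a 1-indexed array
}

$token = idToToken(131569877435989900);
echo $token, "\n";
echo tokenToId($token), "\n";
```

Every ID becomes an 11-character token this way, compared with 24 characters when the decimal digits themselves are Base64-encoded.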
Why not use something like this? Do you need it heavily encrypted?
$num = 131569877435989900;
echo $str = base64_encode($num);
echo base64_decode($str);
I think what you want is to encode the ids using Base32. The resulting string contains only the 26 letters of the alphabet and the digits 2-7, making it very human readable.
The simplest would be to use something like base_convert -- unfortunately, it won't work for such large integers correctly.
However, you can use the same idea by copying base_convert_arbitrary from my answer here and doing:
$id = '131569877435989900';
$encoded = base_convert_arbitrary($id, 10, 36);
$decoded = base_convert_arbitrary($encoded, 36, 10);
print_r($encoded);
print_r($decoded);
See it in action.
The nice thing about this approach is that you can tweak the first line inside the function, which reads:
$digits = '0123456789abcdefghijklmnopqrstuvwxyz'; // 36 "digits"
Add any other "digits" you find acceptable (e.g. capital letters or other symbols you don't mind having in your URL). You can then replace the base 36 in the above example with a larger one (you can go as high as there are defined digits), and it will work just like you want it to.
See it here working with 62 digits.
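base_convert_arbitrary itself is not reproduced in this thread. As a rough stand-in, here is a sketch of the same idea done with plain long division on digit lists, so it needs neither floats nor bcmath. convertBaseSketch and its signature are made up for this example and are not the function from the linked answer.

```php
<?php
// Hypothetical stand-in; NOT the base_convert_arbitrary function referenced above.
function convertBaseSketch(string $number, int $fromBase, int $toBase,
    string $digits = '0123456789abcdefghijklmnopqrstuvwxyz'): string
{
    if ($number === '') {
        throw new InvalidArgumentException('Empty number');
    }
    // Map each input character to its numeric digit value.
    $values = [];
    foreach (str_split($number) as $char) {
        $pos = strpos($digits, $char);
        if ($pos === false || $pos >= $fromBase) {
            throw new InvalidArgumentException("Invalid digit: $char");
        }
        $values[] = $pos;
    }
    $result = '';
    // Repeated long division by $toBase; each pass yields one output digit.
    do {
        $remainder = 0;
        $quotient = [];
        foreach ($values as $value) {
            $acc = $remainder * $fromBase + $value;
            $q = intdiv($acc, $toBase);
            $remainder = $acc % $toBase;
            if ($quotient !== [] || $q > 0) {
                $quotient[] = $q; // leading zeros of the quotient are skipped
            }
        }
        $result = $digits[$remainder] . $result;
        $values = $quotient;
    } while ($values !== []);
    return $result;
}

$id = '131569877435989900';
$encoded = convertBaseSketch($id, 10, 36);
$decoded = convertBaseSketch($encoded, 36, 10);
var_dump($decoded === $id);
```

Passing a longer digit string as the fourth argument (for example one that also contains capital letters) allows bases above 36 in the same way.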
I am surprised no one is mentioning base64_encode() and its partner base64_decode().
If length is not a concern, this is perfect:
$before = '131569877435989900'; // keep the ID as a string; as a number it may be converted to a float such as 1.3156987743599E+17
$encoded = base64_encode($before);
$decoded = base64_decode($encoded);
var_dump($decoded === $before); // bool(true)
I always go for the simplest solutions, as long as they don't compromise my security.
The easiest way to get a random-looking string is to use a hash function like md5() or sha1(). For example:
<?php
$bigInt = '131569877435989900';
$hash = md5($bigInt);
$hashed=substr($hash,0,-20);
echo $hashed;
?>
These hash functions are irreversible: you can't get the original value back from the hash. If you need it, you can save the original big integer in an array or a database, but recovering it from the hash would be impossible.

Random Code Overkill?

I have some code I am using
function genCode ($entropy=1) {
    $truCde = "";
    $indx = 0;
    $leng = 30*$entropy;
    while ($indx < $leng) {
        $code = "";
        $length = 100*$entropy;
        $index = 0;
        while ($index < $length) {
            $code .= rand();
            $index++;
        }
        $index = 0;
        while ($index < $length) {
            $code = sha1($code);
            $index++;
        }
        $truCde .= $code;
        $indx++;
    }
    $finalCode = sha1(rand()) . hash("sha256",$truCde . md5($entropy*rand()));
    $finalCode .= sha1(md5(strlen($finalCode)*$entropy));
    return hash (
        "sha256",
        sha1($finalCode) . sha1(md5($finalCode)) . sha1(sha1($finalCode))
    );
}
to generate a random code for e-mail verification. Is there code that takes less time to generate random codes? It takes about 1-2 seconds to run this code, but I am looking to shave 0.7 seconds off this because the rest of the script will take longer.
That's massive overkill. Calling rand() repeatedly isn't going to make the code "more random", nor will using random combinations of SHA and MD5 hashes. None of that complexity improves the verification codes.
An improvement that would make a difference would be to use mt_rand() in preference to rand(). The Mersenne Twister pseudo RNG is much stronger than most default rand() implementations. The PHP documentation hints that rand() may max out at 2^15, meaning you can only generate 32,768 unique verification codes.
Other than that, a single hash call will do.
sha1(mt_rand())
(You don't even really need to call a hash function as the unpredictability of your codes will come from the random number generator, not the hash function. But hash functions have the nice side effect of creating long hex strings which "look" better.)
If you just want to generate random strings to test that someone has access to an email address, or something like that, I would throw out that code and use something a lot more straightforward. Something like the following would likely do.
function genCode () {
    $chars = 'abcdefghijklmnopqrstuvwxyz0123456789';
    $returnValue = '';
    for ($i = 0; $i < 20; $i++) {
        $returnValue .= $chars[mt_rand(0, 35)];
    }
    return $returnValue;
}
You can hash the return value if you want, but I don't know what the point would be other than to obfuscate the scheme used to come up with the random strings.
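On PHP 7 or newer, the same loop can be written with random_int(), which draws from the operating system's cryptographically secure random source. This is a swapped-in variant and not the code from the answer above; genCodeSecure is a made-up name.

```php
<?php
// Variant of genCode() using random_int() (PHP 7+); the function name is hypothetical.
function genCodeSecure(int $length = 20): string
{
    $chars = 'abcdefghijklmnopqrstuvwxyz0123456789';
    $max = strlen($chars) - 1;
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        $code .= $chars[random_int(0, $max)]; // uniformly chosen index into $chars
    }
    return $code;
}

echo genCodeSecure(), "\n";
```

Since these codes guard access to an account, an unpredictable source seems a better fit than mt_rand(), and the cost is still far below the 1-2 seconds mentioned in the question.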
