strcmp - what does "binary safe string comparison" mean? Is this comparison safe against timing attacks?
If not, how can I compare two strings in a way that prevents a timing attack? Is comparing hashes of the strings enough, or do I have to use a library (or my own code) that compares in constant time?
I have read that timing attacks can be used on the web. But does this type of attack exist in the real world, or is it only feasible for a small class of attackers (like governments), so that protecting against it over the web is excessive?
"binary safe" means that any bytes can be safely compared with strcmp, not just valid characters in some character set. A quick test confirms that strcmp is not safe against timing attacks:
$nchars = 1000;
// Three strings: $s2 differs from $s1 only at the last byte,
// $s3 differs at the first byte.
$s1 = str_repeat('a', $nchars + 1);
$s2 = str_repeat('a', $nchars) . 'b';
$s3 = 'b' . str_repeat('a', $nchars);

$times = 100000;

$start = microtime(true);
for ($i = 0; $i < $times; $i++) {
    strcmp($s1, $s2);
}
$timeForSameAtStart = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $times; $i++) {
    strcmp($s1, $s3);
}
$timeForSameAtEnd = microtime(true) - $start;

printf("'b' at the end: %.4f\n'b' at the front: %.4f\n", $timeForSameAtStart, $timeForSameAtEnd);
For me this prints something like 'b' at the end: 0.0634 'b' at the front: 0.0287.
Many other string-based functions in PHP likely suffer from similar issues. Working around this is tricky, especially in PHP where you don't actually know what a lot of functions are really doing at the physical level.
One possible tactic is just sticking a random wait time in your code before you return the answer to the caller/potential attacker. Even better, measure how long it took to check the input data (e.g., with microtime), and then wait a random time minus that amount of time. This is not 100% secure, but it makes attacking the system MUCH harder because, at a minimum, an attacker will have to try each input many times in order to filter out the randomness.
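A minimal sketch of that tactic (my illustration, not code from the original answer), assuming hash_equals() for the core comparison and arbitrary 5-15 ms padding bounds:

function checkWithRandomizedDelay($secret, $user) {
    $start = microtime(true);
    $ok = hash_equals($secret, $user); // constant-time core comparison
    $elapsed = microtime(true) - $start;
    // Pad the total time to a random target between 5 and 15 ms,
    // minus the time the check itself already took.
    $remaining = random_int(5000, 15000) / 1000000 - $elapsed;
    if ($remaining > 0) {
        usleep((int)round($remaining * 1000000));
    }
    return $ok;
}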
The problem with strcmp is that its behaviour depends on the implementation. If it compares the strings byte by byte until it reaches a difference or the end of either string, then it is vulnerable to a timing attack.
Now how about hashing?
I have found this Security.SE question, and I believe it has the correct answer for you:
https://security.stackexchange.com/a/46215
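For what it's worth, the "compare hashes" idea usually takes the form of hashing both strings with a fresh random key before comparing, so the attacker no longer controls the bytes being compared. A minimal sketch (my paraphrase, not code from the linked answer):

function hashThenCompare($known, $user) {
    $key = random_bytes(32); // ephemeral key, new for every comparison
    // Even a non-constant-time === leaks nothing useful here, because
    // the attacker cannot predict the HMAC outputs.
    return hash_hmac('sha256', $known, $key) === hash_hmac('sha256', $user, $key);
}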
The timing attack is a myth. Let me explain.
The difference in time between validating a nearly matching text and a completely different one is a fraction of a second, let's say +/- 0.1 seconds (exaggerated!).
However, the time an attacker actually measures is:
network delay + 0.1 seconds + system delay (the system may be busy with some other task) + other delays.
So no, it's not possible; even on a local system (zero lag), the measured interval is always unclear.
In a test, let's say the difference between one method and another is 1 µs. If we test it and the difference is 1 µs, then we could guess part of the secret. But what if there is another factor, for example the network, the CPU load at that moment, the CPU's clock cycle, and so on?
Even if we exclude the network, most operating systems are multitasking, so the test would have to be done on a single-tasking operating system or a system running a single task, and that is not something you see in the wild. Even embedded systems run multiple threads at the same time.
But let's say we run locally (no network) and do a dry run on a computer that runs only a single task, ours. We still have another problem: modern CPUs don't run at a constant clock; the clock varies depending on usage, temperature, and other factors.
So, it is only possible if:
it is executed locally and there is no other factor;
it runs as a single task and no other task is running on the server;
the CPU runs at a constant clock.
In other words, it is ABSURD.
Here is the test.
<?php
$text     = '123456789012345678901234567890123456789012345678901234567890123456789012345678901234';
$compare1 = '12345678901234567890123456789012345678901234567890123456789012345678901234567890123x';
$compare2 = '2222222222222222222222222222222222222222222222222222222222222222222222222222222222222';

$a1 = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    if ($compare1 === $text) {
        // do something
    }
}
$a2 = microtime(true);
var_dump($a2 - $a1);

$a1 = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    if ($compare2 === $text) {
        // do something
    }
}
$a2 = microtime(true);
var_dump($a2 - $a1);
It took me 5 minutes to invalidate this hypothesis.
What is tested: a 512-bit text is compared against two candidate strings, and the times are compared.
The test is set up to favour the hypothesis: it forces an unrealistic situation where the first candidate is almost identical to the target text (differing only in the last character).
It also excludes latencies and other operations.
(Why 512 bits? Most passwords are hashed to 128 or 256 bits; 512 bits is what we can call safe.)
And here are the results.
one round:
0.021588087081909
0.021672010421753 (long time)
another run:
0.021767854690552
0.022729873657227 (long time)
and another run:
0.021697998046875 (long time)
0.021611213684082
and again
0.021565914154053 (long time)
0.020948171615601
and again
0.021995067596436
0.0224769115448 (long time)
So, even when the test is rigged to prove the point, it fails.
That is: you can't find a trend when one of the variables is unknown, and that factor compromises the whole test. I could test it a million times and the result would be the same. And this test, in particular, avoids variables such as latency, other processes, database access, etc.
Related, from Wikipedia:
In cryptography, a timing attack is a side channel attack in which the
attacker attempts to compromise a cryptosystem by analyzing the time
taken to execute cryptographic algorithms.
Actually, to prevent timing attacks, I'm using the following function, taken from this answer:
function timingSafeCompare($safe, $user) {
    // Prevent issues if string length is 0
    $safe .= chr(0);
    $user .= chr(0);
    $safeLen = strlen($safe);
    $userLen = strlen($user);
    // Set the result to the difference between the lengths
    $result = $safeLen - $userLen;
    // Note that we ALWAYS iterate over the user-supplied length
    // This is to prevent leaking length information
    for ($i = 0; $i < $userLen; $i++) {
        // Using % here is a trick to prevent notices
        // It's safe, since if the lengths are different
        // $result is already non-0
        $result |= (ord($safe[$i % $safeLen]) ^ ord($user[$i]));
    }
    // They are only identical strings if $result is exactly 0...
    return $result === 0;
}
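(Note: since PHP 5.6 the same behaviour is available natively as hash_equals(), so the modern equivalent is simply:

if (hash_equals($knownString, $userString)) {
    // match, compared in time independent of where the strings differ
}

)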
But I was wondering whether it is possible to prevent this kind of attack with a random sleep, like:
function timingSafeCompare($a, $b) {
    sleep(rand(0, 100));
    if ($a === $b) {
        return true;
    } else {
        return false;
    }
}
Or maybe by augmenting the randomness of the sleep:
sleep(rand(1,10)+rand(1,10)+rand(1,10)+rand(1,10));
This kind of approach can totally prevent timing attacks? Or just make the work harder?
This kind of approach can totally prevent timing attacks? Or just make the work harder?
Neither. It doesn't prevent timing attacks, nor does it make them any more difficult at all.
To understand why, look at the docs for sleep. Specifically, the meaning of the first parameter:
Halt time in seconds.
So your app takes 0.3 seconds to respond without sleep. With sleep it takes either 0.3, 1.3, 2.3, etc...
So really, to get the part we care about (the timing difference), we just need to chop off the integer part:
$real_time = $time - floor($time);
But let's go a step further. Let's say that you randomly sleep using usleep. That's a lot more granular. That's sleeping in microseconds.
Well, the measurements are being made in the 15-50 nanosecond scale. So that sleep is still about 100 times less granular than the measurements being made. So we can average off to the single microsecond:
$microseconds = $time * 1000000;
$real_microseconds = $microseconds - floor($microseconds);
And still have meaningful data.
You could go further and use time_nanosleep which can sleep to nanosecond scale precision.
Then you could start fuddling with the numbers.
But the data is still there. The beauty of randomness is that you can just average it out:
$x = 15 + rand(1, 10000);
Run that enough times and you'll get a nice pretty graph. You'll be able to tell that there are about 10,000 different numbers, so you can average away the randomness and deduce the "private" 15.
Because well-behaved randomness is unbiased, it's pretty easy to detect statistically over a large enough sample.
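You can simulate that averaging in a few lines (illustrative numbers, matching the snippet above):

$sum = 0;
$n = 1000000;
for ($i = 0; $i < $n; $i++) {
    $sum += 15 + rand(1, 10000); // hidden constant plus uniform noise
}
// rand(1, 10000) has a mean of 5000.5, so subtracting it recovers ~15
echo ($sum / $n) - 5000.5, "\n";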
So the question I would ask is:
Why bother with sleep-like hacks when you can fix the problem correctly?
Anthony Ferrara answered this question in his blog post, It's All About Time. I highly recommend this article.
Many people, when they hear about timing attacks, think "Well, I'll just add a random delay! That'll work!". And it doesn't.
This is fine for a single request if the only side channel observable by the attacker is the response time.
However, if an attacker makes enough requests, this random delay could average out, as noted in @Scott's answer citing ircmaxell's blog post:
So if we needed to run 49,000 tests to get an accuracy of 15ns [without a random delay], then we would need perhaps 100,000 or 1,000,000 tests for the same accuracy with a random delay. Or perhaps 100,000,000. But the data is still there.
As an example, let's estimate the number of requests a timing attack would need to recover a valid 160-bit session ID like PHP's, at 6 bits per character, which gives a length of 27 characters. Assume, like the linked answer, that an attack can only be run against one user at a time (as they store the user to look up in the cookie).
Taking the very best case from the blog post, 100,000, the number of permutations would be 100,000 * 2^6 * 27.
On average, the attacker will find the value halfway through the number of permutations.
This gives the number of requests needed to discover the Session ID from a timing attack to be 86,400,000. This is compared to 42,336,000 requests without your proposed timing protection (assuming 15ns accuracy like the blog post).
In the blog post, taking the longest length tested, 14, took 0.01171 seconds on average, which means 86,400,000 would take 1,011,744 seconds which equates to 11 days 17 hours 2 minutes 24 seconds.
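For reference, here is the arithmetic spelled out (all inputs come from this answer and the cited blog post):

$testsPerGuess = 100000;                         // best case from the blog post
$permutations  = $testsPerGuess * (2 ** 6) * 27; // 172,800,000
$avgRequests   = $permutations / 2;              // 86,400,000
$totalSeconds  = $avgRequests * 0.01171;         // 1,011,744 seconds
echo $totalSeconds / 86400, "\n";                // ≈ 11.7 days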
Could a random sleep prevent timing attacks?
That depends on the context in which your random sleep is used and on the bit strength of the string it is protecting. If it protects "keep me logged in" functionality, which is the context of the linked question, then it could be worth an attacker spending 11 days on a timing attack to brute-force the value. However, this assumes perfect conditions (i.e. fairly consistent response times from your application for each string position tested, and no resetting or rollover of IDs). Also, this type of activity from an attacker creates a lot of noise, and they would likely be spotted by an IDS or IPS.
It can't entirely prevent them, but it can make them more difficult to execute. It would be much easier and better to use something like hash_equals(), which prevents timing attacks entirely, assuming the string lengths are equal.
Your proposed code
function timingSafeCompare($a, $b) {
    sleep(rand(0, 100));
    if ($a === $b) {
        return true;
    } else {
        return false;
    }
}
Note that the PHP rand function is not cryptographically secure:
Caution
This function does not generate cryptographically secure values, and should not be used for cryptographic purposes. If you need a cryptographically secure value, consider using openssl_random_pseudo_bytes() instead.
This means that in theory an attacker could predict what rand was going to generate and then use this information to determine whether the response time delay from your application was due to random sleep or not.
The best way to approach security is to assume that the attacker knows your source code - the only things secret from the attacker should be things like keys and passwords - assume that they know the algorithms and function used. If you can still say your system is secure even though an attacker knows exactly how it works, you will be most of the way there. Functions like rand are usually set to seed with the current time of day, so an attacker can just make sure their system clock is set to the same as your server and then make requests to validate that their generator is matching yours.
Due to this, it is best to avoid insecure random functions like rand and change your implementation to use openssl_random_pseudo_bytes which will be unpredictable.
Also, as per ircmaxell's comment, sleep is not granular enough as it only accepts an integer to represent the number of seconds. If you are going to try this approach look into time_nanosleep with a random number of nanoseconds.
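Putting those two pointers together, a sketch of an unpredictable delay might look like this (my illustration; modulo bias is ignored for simplicity):

$bytes = openssl_random_pseudo_bytes(4);      // CSPRNG, not rand()
$nanos = unpack('N', $bytes)[1] % 1000000000; // 0 .. 999,999,999 ns
time_nanosleep(0, $nanos);                    // nanosecond-scale sleep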
These pointers should help secure your implementation against this type of timing attack.
This kind of approach can totally prevent timing attacks? Or just make the work harder?
ircmaxell has already answered why this only makes the work harder, but a solution to prevent timing attacks in PHP in general would be:
/**
 * Execute a callback function in constant time,
 * or throw an exception if the callback was too slow.
 *
 * @param callable $cb
 * @param float $target_time_seconds
 * @throws \LogicException if the callback was too slow
 * @return mixed whatever $cb returns
 */
function execute_in_constant_time(callable $cb, float $target_time_seconds = 0.01)
{
    $start_time = microtime(true);
    $ret = ($cb)();
    $success = time_sleep_until($start_time + $target_time_seconds);
    if ($success) {
        return $ret;
    }
    // dammit!
    $time_used = microtime(true) - $start_time;
    throw new \LogicException("callback function was too slow! time expired! target_time_seconds: {$target_time_seconds} actual time used: {$time_used}");
}
Using that approach, your code could be:
function timingSafeCompare($a, $b, float $target_time_seconds = 0.01) {
    return execute_in_constant_time(fn() => $a === $b, $target_time_seconds);
}
The downside is that you have to pick a number with a large margin, meaning relatively much time is lost sleeping. FWIW, on my laptop I had to use 0.2 (200 milliseconds) to compare two exactly-1-GiB strings, with a Core i7-8565U (a weird 2018 mid-range laptop CPU I'd never heard of),
and this loop:
ini_set("memory_limit", "-1");
$s1 = "a";
$s2 = "a";
$append = str_repeat("a", 100 * 1024);
try {
    for (;;) {
        $res = timingSafeCompare($s1, $s2, 0.01);
        $s1 .= $append;
        $s2 .= $append;
    }
} catch (\Throwable $e) {
    var_dump(strlen($s1));
}
It craps out at about 65 megabytes: int(65126401).
(But how often do you need to constant-time-compare strings above 65 MB? I imagine it's not often.)
You might think "then the attacker could send a HUGE string to compare, and check how long it takes for the exception to be thrown", but I don't think that would work: === starts by checking whether both strings have the same length and short-circuits if they don't, so such an attack should only work if the attacker can make both strings large enough to time out.
Today we have the native hash_equals() function to compare strings that have exactly the same length, but hash_equals() will not protect you against strings of different lengths, while the function above will.
I have this code that works well; I'm just inquiring to see if there is a better, shorter way to do it:
$sString = $_GET["s"];
if (ctype_digit($sString) == true) {
    if (strlen($sString) == 5) {
        $sString = ltrim($sString, '0');
    }
}
The purpose of this code is to remove the leading zeros from a search query before passing it to a search engine, if the query is 5 digits. The exact scenario is A) all digits, B) length equals 5, and C) it has leading zeros.
Better/Shorter means in execution time...
$sString = $_GET["s"];
if (preg_match('/^\d{5}$/', $sString)) {
    $sString = ltrim($sString, '0');
}
This is shorter (and makes some assumptions you're not clear about in your question). However, shorter is not always better. Readability is sacrificed here, as well as some performance, I guess (negligible in a single execution, but possibly "costly" when run in tight loops). You ask for a "better, shorter way to do it"; then you need to define "better" and "shorter". Shorter in terms of lines/characters of code? Shorter in execution time? Better as in more readable? Better as in "most best practices applied"?
Edit:
So you've added the following:
The purpose of this code is to remove the leading zeros from a search query before passing it to a search engine, if the query is 5 digits. The exact scenario is A) all digits, B) length equals 5, and C) it has leading zeros.
A) OK, that clears things up. (Are you sure characters like '+' or '-' won't be required? I assume these inputs will be things like 5-digit invoice numbers, product numbers, etc.?)
B) What to do with inputs of length 1, 2, 3, 4, 6, 7, ... 28? 97? Nothing? Keep the input intact?
Better/Shorter means in execution time...
Could you explain why you'd want to "optimize" this tiny piece of code? Unless you run it thousands of times in a tight loop, the effects of optimizing it will be negligible. (Mumbles something about premature optimization being the root of all evil.) What is it that you're hoping to gain by "optimizing" this piece of code?
I haven't profiled/measured it yet, but my (educated) guess is that my preg_match is more 'expensive' in terms of execution time than your original code. It is "shorter" in terms of code though (but see above about sacrificing readability etc.).
Long story short: this code is not worth optimizing*; any I/O, database queries, etc. will "cost" hundreds, maybe even thousands of times more. Go optimize that first.
* Unless you have "optimized" everything else as far as possible (and even then, a database query might be more efficient when written in another way so that the execution plan is more efficient, or I/O might be saved using (more) caching, etc.). The fraction of a millisecond (at best) you're about to save by optimizing this code had better be worth it! And even then you'd probably need to consider switching to better hardware, another platform, or another programming language, for example.
Edit 2:
I have quick'n'dirty "profiled" your code:
$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    $sString = '01234';
    if (ctype_digit($sString) == true) {
        if (strlen($sString) == 5) {
            $sString = ltrim($sString, '0');
        }
    }
}
echo microtime(true) - $start;
Output: 0.806390047073
So, that's 0.8 seconds for 1 million(!) iterations. My code, "profiled" the same way:
$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    $sString = '01234';
    if (preg_match('/^\d{5}$/', $sString)) {
        $sString = ltrim($sString, '0');
    }
}
echo microtime(true) - $start;
Output: 1.09024000168
So, it's slower. As I guessed/predicted.
Note my explicitly mentioning "quick'n'dirty": for good/accurate profiling you need better tools and if you do use a poor-man's profiling method like I demonstrated above then at least make sure you average your numbers over a few dozen runs etc. to make sure your numbers are consistent and reliable.
But either solution takes, worst case, less than 0.0000011 seconds to run. If this code runs "tens of thousands of times a day" (let's assume 100,000 times), you'll save exactly 0.11 seconds over an entire day IF you got it down to zero! If that 0.11 seconds is a problem, you have a bigger problem at hand than these few lines of code. It's just not worth "optimizing" (hell, it's not even worth discussing; the time you and I have spent going back and forth over this will not be "earned back" in at least 50 years).
Lesson learned here:
Measure / Profile before optimizing!
A little shorter:
if (ctype_digit($sString) && strlen($sString) == 5) {
    $sString = ltrim($sString, '0');
}
It also matters whether you want "digits" 0-9 or a "numeric value", which may be -1, +0123.45e6, 0xf4c3b00c, 0b10100111001, etc., in which case use is_numeric.
I guess you could also just do this to remove 0s:
$sString = (int)$sString;
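Note that the cast also changes the type, so you would still want the validation first; a sketch:

$sString = '01234';
if (ctype_digit($sString) && strlen($sString) == 5) {
    $sString = (string)(int)$sString; // "1234": the cast drops leading zeros
}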
While logging some data using microtime() (using PHP 5), I encountered some values that seemed slightly out of phase in respect to the timestamp of my log file, so I just tried to compare the output of time() and microtime() with a simple script (usleep is just here in order to limit the data output):
<?php
for ($i = 0; $i < 500; $i++) {
    $microtime = microtime();
    $time = time();
    list($usec, $sec) = explode(" ", $microtime);
    if ((int)$sec > $time) {
        echo $time . ' : ' . $microtime . '<br>';
    }
    usleep(50000);
}
?>
Now, as $microtime is declared before $time, I expect it to be smaller, and nothing should ever be output; however, this obviously is not the case, and every now and then, $time is smaller than the seconds returned from microtime(), as in this example (truncated) output:
1344536674 : 0.15545100 1344536675
1344536675 : 0.15553900 1344536676
1344536676 : 0.15961000 1344536677
1344536677 : 0.16758900 1344536678
Now, this is just a small gap; however, I have observed some series where the difference is (quite) more than a second... so, how is this possible?
If you look at the implementations of time and microtime, you see they're radically different:
time just calls the C time function.
microtime has two implementations: if the C function gettimeofday is available (which it should be on a Linux system), it is called straightforwardly. Otherwise PHP pulls off some acrobatics, using rusage to calculate the time.
Since the C time call is only precise up to a second, it may also intentionally use a low-fidelity time source.
Furthermore, on modern x86_64 systems, both C functions can be implemented without a system call by reading certain CPU registers. On a multi-core system, these registers may not exactly match across cores, and that could be one reason.
Another potential reason for the discrepancies is that NTPd (a time-keeping daemon) or some other user-space process is changing the clock. Normally, these kinds of effects should be avoided by adjtime.
All of these rationales are pretty esoteric. To further debug the problem, you should:
Determine OS and CPU architecture (hint: and tell us!)
Try to force the process on one core
Stop any programs that are (or may be) adjusting the system time.
This could be due to the fact that microtime() uses floating point numbers, and therefore rounding errors may occur.
You can specify floating point numbers' precision in php.ini
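To see the difference between the two return formats (and the effect of the precision setting), you can run something like:

var_dump(microtime());     // string, e.g. "0.15545100 1344536674"
var_dump(microtime(true)); // float, subject to float precision limits
ini_set('precision', '17'); // show more significant digits
var_dump(microtime(true));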
After several days of research and discussion, I came up with this method to gather entropy from visitors (you can see the history of my research here).
When a user visits, I run this code:
$entropy=sha1(microtime().$pepper.$_SERVER['REMOTE_ADDR'].$_SERVER['REMOTE_PORT'].
$_SERVER['HTTP_USER_AGENT'].serialize($_POST).serialize($_GET).serialize($_COOKIE));
Note: $pepper is a per-site/setup random string set by hand.
Then I execute the following (My)SQL query:
$query = "update `crypto` set `value`=sha1(concat(`value`, '$entropy')) where name='entropy'";
That means we combine the entropy of the visitor's request with the entropy gathered already.
That's all.
Then, when we want to generate random numbers, we combine the gathered entropy with the output:
$query = "select `value` from `crypto` where `name`='entropy'";
//...
extract(unpack('Nrandom', pack('H*', sha1(mt_rand(0, 0x7FFFFFFF).$entropy.microtime()))));
Note: the last line is part of a modified version of the crypt_rand function of phpseclib.
Please tell me your opinion about the scheme, and share other ideas/info regarding entropy gathering and random number generation.
PS: I know about randomness sources like /dev/urandom. This system is just an auxiliary system, or (when we don't have access to those sources) a fallback scheme.
In the best scenario, your biggest danger is a local-user information-disclosure exploit. In the worst scenario, the whole world can predict your data. Any user with access to the same resources you have (the same log files, the same network devices, the same border gateway, or the same line that runs between you and your remote connections) can sniff your traffic by unwinding your random number generator.
How would they do it? Why, basic application of information theory and a bit of knowledge of cryptography, of course!
You don't have a wrong idea, though! Seeding your PRNG with real sources of randomness is generally quite useful to prevent the above attacks from happening. For example, this same level of attack can be exploited by someone that understands how /dev/random gets populated on a per-system basis if the system has low entropy or its sources of randomness are reproducible.
If you can sufficiently secure the processes that seed your pool of entropy (for example, by gathering data from multiple sources over secure lines), the likelihood that someone is able to listen in becomes smaller and smaller as you get closer and closer to the desirable cryptographic qualities of a one-time pad.
In other words, don't do this in PHP, using a single source of randomness fed into a single Mersenne twister. Do it properly, by reading from your best, system-specific alternative to /dev/random, seeding its entropy pool from as many secure, distinct sources of "true" randomness as possible. I understand you've stated that these sources of randomness are inaccessible, but this notion is strange when similar functions are afforded to all major operating systems. So, I suppose I find the concept of an "auxiliary system" in this context to be dubious.
This will still be vulnerable to an attack by a local user cognizant of your sources of entropy, but securing the machine and increasing the true entropy within /dev/random will make it far more difficult for them to do their dirty work short of a man-in-the-middle attack.
As for cases where /dev/random is indeed accessible, you can seed it fairly easily:
Look at what options exist on your system for using /dev/hw_random
Embrace rngd (or a good alternative) for defining your sources of randomness
Use rng-tools for inspecting and improving your randomness profile
And finally, if you need a good, strong source of randomness, consider investing in more specialized hardware.
Best of luck in securing your application.
PS: You may want to give questions like this a spin at Security.SE and Cryptography.SE in the future!
Use Random.org
If you need truly random numbers, use random.org. These numbers are generated from atmospheric noise. Besides a library for PHP, it also has an HTTP interface which lets you get truly random numbers with simple requests:
https://www.random.org/integers/?num=10&min=1&max=6&col=1&base=10&format=plain&rnd=new
This means you can retrieve real random numbers in PHP without any additional PECL extension on the server.
If you don't want other users to be able to "steal" your random numbers (as MrGomez argues they could), use HTTPS with certificate checking. Here is an example with HTTPS certificate checking:
$url = "https://www.random.org/integers/?num=10&min=1&max=6&col=1&base=10&format=plain&rnd=new";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
$response = curl_exec($ch);
if ($response === FALSE) {
    echo "http request failed: " . curl_error($ch);
} else {
    echo $response;
}
curl_close($ch);
If you need more information on how to create https requests:
Make a HTTPS request through PHP and get response
http://unitstep.net/blog/2009/05/05/using-curl-in-php-to-access-https-ssltls-protected-sites/
More on security
Again, some might argue that if the attacker queries random.org at the same time as you, they might get the same numbers and predict yours. I don't know whether random.org even works that way, but if you are really concerned, you can reduce the chance by fooling the attacker with dummy requests that you throw away, or by using only part of the random numbers you get.
As MrGomez notes in his comment, this should not be considered an ultimate security solution, but only one possible source of entropy.
Performance
Of course, if you need very low latency, then doing one random.org request per client request might not be the best idea... but what about doing one bigger request to pre-cache random numbers, say, every 5 minutes?
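A sketch of such pre-caching (hypothetical helper; the file path and refill interval are illustrative, and fetching over HTTPS this way assumes allow_url_fopen plus the openssl extension):

function getCachedRandomNumbers($cacheFile = '/tmp/randomorg.cache', $maxAge = 300) {
    // Serve from the cache while it is fresh enough.
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $maxAge) {
        return json_decode(file_get_contents($cacheFile), true);
    }
    $url = 'https://www.random.org/integers/?num=100&min=1&max=1000000'
         . '&col=1&base=10&format=plain&rnd=new';
    $raw = file_get_contents($url);
    $numbers = array_map('intval', array_filter(explode("\n", trim($raw))));
    file_put_contents($cacheFile, json_encode($numbers));
    return $numbers;
}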
To come to the point: as far as I know, there is no way to generate entropy inside a PHP script, sorry for this non-answer. Even if you look at well-established scripts like phpass, you will see that their fallback system cannot do any magic.
The question is whether you should try anyway. Since you want to publish your system under the GPL, you probably don't know in what scenarios it will be used. In my opinion it's best, then, to require a random source or to fail fast (die with an appropriate error message), so that a developer who wants to use your system knows immediately that there is a problem.
To read from the random source, you can call the mcrypt_create_iv() function...
$randomBinaryString = mcrypt_create_iv($length, MCRYPT_DEV_URANDOM);
...this function reads from the random pool of the operating system. Since PHP 5.3 it does so on Windows servers as well, so you can leave it to PHP to handle the random source.
If you have access to /dev/urandom you can use this:
function getRandData($length = 1024) {
    $randf = @fopen('/dev/urandom', 'rb');
    if ($randf === false) {
        return false; // device unavailable; the caller must fall back
    }
    $data = fread($randf, $length);
    fclose($randf);
    return $data;
}
UPDATE:
Of course, you should have some backup in case opening the device fails (the version above returns false in that case).
If you have access to the client side, you can enable mouse-movement tracking; this is what TrueCrypt uses for an extra level of entropy.
As I have said before, my rand function is a modified version of phpseclib's crypt_random function.
You can see it in the link given in my first post; at least the author of the phpseclib cryptographic library confirmed it. Not enough for ordinary apps? I'm not speaking of extreme/theoretical security, just of practical security to the extent really needed, available "easily" and at sufficiently low cost for almost all ordinary applications on the web.
phpseclib's crypt_random effectively and silently falls back to mt_rand (which, you should know, is really weak) in the worst case (no openssl_random_pseudo_bytes or urandom available), but my function uses a much more secure scheme in such cases. It just falls back to a scheme whose output is much harder to brute-force or predict, which should in practice be sufficient for all ordinary apps/sites. It uses possible (in practice very likely, and hard to predict or circumvent) extra entropy that is gathered over time and quickly becomes almost impossible for outsiders to know. It adds this possible entropy to mt_rand's output (and also to the output of the other sources: urandom, openssl_random_pseudo_bytes, mcrypt_create_iv). If you are informed, you should know that this entropy can be added but not subtracted. In the (almost surely rare) worst case, the extra entropy would be zero or some tiny amount; in the mediocre case, which I think covers almost all cases, it would be even more than practically necessary. (I have had extensive cryptography studies, so when I say "I think", it is based on a much more informed analysis than an ordinary programmer's.)
See the full code of my modified crypt_random:
function crypt_random($min = 0, $max = 0x7FFFFFFF)
{
    if ($min == $max) {
        return $min;
    }

    global $entropy;

    if (function_exists('openssl_random_pseudo_bytes')) {
        // openssl_random_pseudo_bytes() is slow on windows per the following:
        // http://stackoverflow.com/questions/1940168/openssl-random-pseudo-bytes-is-slow-php
        if ((PHP_OS & "\xDF\xDF\xDF") !== 'WIN') { // PHP_OS & "\xDF\xDF\xDF" == strtoupper(substr(PHP_OS, 0, 3)), but a lot faster
            extract(unpack('Nrandom', pack('H*', sha1(openssl_random_pseudo_bytes(4).$entropy.microtime()))));
            return abs($random) % ($max - $min) + $min;
        }
    }

    // see http://en.wikipedia.org/wiki//dev/random
    static $urandom = true;
    if ($urandom === true) {
        // Warnings will be output unless the error suppression operator is used. Errors such as
        // "open_basedir restriction in effect", "Permission denied", "No such file or directory", etc.
        $urandom = @fopen('/dev/urandom', 'rb');
    }
    if (!is_bool($urandom)) {
        extract(unpack('Nrandom', pack('H*', sha1(fread($urandom, 4).$entropy.microtime()))));
        // say $min = 0 and $max = 3. if we didn't do abs() then we could have stuff like this:
        // -4 % 3 + 0 = -1, even though -1 < $min
        return abs($random) % ($max - $min) + $min;
    }

    if (function_exists('mcrypt_create_iv') and version_compare(PHP_VERSION, '5.3.0', '>=')) {
        $tmp16 = @mcrypt_create_iv(4, MCRYPT_DEV_URANDOM);
        if ($tmp16 !== false) {
            extract(unpack('Nrandom', pack('H*', sha1($tmp16.$entropy.microtime()))));
            return abs($random) % ($max - $min) + $min;
        }
    }

    /* Prior to PHP 4.2.0, mt_srand() had to be called before mt_rand() could be called.
       Prior to PHP 5.2.6, mt_rand()'s automatic seeding was subpar, as elaborated here:
       http://www.suspekt.org/2008/08/17/mt_srand-and-not-so-random-numbers/
       The seeding routine is pretty much ripped from PHP's own internal GENERATE_SEED() macro:
       http://svn.php.net/viewvc/php/php-src/tags/php_5_3_2/ext/standard/php_rand.h?view=markup */
    static $seeded;
    if (!isset($seeded) and version_compare(PHP_VERSION, '5.2.5', '<=')) {
        $seeded = true;
        mt_srand(fmod(time() * getmypid(), 0x7FFFFFFF) ^ fmod(1000000 * lcg_value(), 0x7FFFFFFF));
    }
    extract(unpack('Nrandom', pack('H*', sha1(mt_rand(0, 0x7FFFFFFF).$entropy.microtime()))));
    return abs($random) % ($max - $min) + $min;
}
$entropy contains my extra entropy, which comes from the combined entropy of all request parameters gathered so far + the current request's parameters + the entropy of a random string (*) set by hand at installation time.
*: length 22, composed of lower- and uppercase letters plus digits (more than 128 bits of entropy)
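That entropy estimate can be checked quickly:

// 26 lowercase + 26 uppercase + 10 digits = 62 symbols per position
echo 22 * log(62, 2), "\n"; // ≈ 131.1 bits, indeed above 128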
Update 2: Code review warning to everyone: don't use the code in the original question; it's a security liability. If this code is online anywhere, remove it, as it opens the whole system, network, and database to a malevolent user. You're not only exposing your code, but all of your users' data.
Do not ever serialize user input. If your code is already doing it, stop your server and change the code. This is a great example of why you should not roll your own crypto.
Update 1: For real security you need unguessable randomness in your entropy. A suitable option to add entropy, which your question refers to, is to use the delta of your script's execution time, not microtime() by itself, because the delta depends on the load of your server, and so is a combination of the hardware environment, temperature, network load, power load, disk access, CPU usage, and voltage fluctuation, which together are unpredictable.
Using time(), a timestamp, or microtime() is a flaw in your implementation.
Script-execution delta example code coming:
@martinstoeckli stated correctly that suitable random generation for crypto comes from
mcrypt_create_iv($lengthinbytes, MCRYPT_DEV_URANDOM);
but that falls outside the stated requirement of not having a crypto module.
In SQL, use the RAND() function in conjunction with your generated number:
http://www.tutorialspoint.com/mysql/mysql-rand-function.htm
PHP offers the rand() function as well:
http://php.net/manual/en/function.rand.php
They won't give you the same number, so you could use both.
mt_rand() should be used, not rand().
Or: should I optimize my string operations in PHP? I tried to find an answer in PHP's manual, but I didn't get any hints.
PHP already optimises it - variables are assigned using copy-on-write, and objects are passed by reference. In PHP 4 it doesn't, but nobody should be using PHP 4 for new code anyway.
One of the most essential speed-optimization techniques in many languages is instance reuse. In that case the speed increase comes from at least two factors:
1. Fewer instantiations means less time spent on construction.
2. The less memory the application uses, the fewer CPU cache misses there probably are.
For applications where speed is the #1 priority, there is a truly tight bottleneck between the CPU and the RAM. One of the reasons for the bottleneck is the latency of the RAM.
PHP, Ruby, Python, etc. are affected by cache misses because even they store at least some (probably all) of the run-time data of the interpreted program in RAM.
String instantiation is an operation that is performed pretty often, in relatively huge quantities, and it can have a noticeable impact on speed.
Here's run_test.bash, the driver script of a measurement experiment:
#!/bin/bash
for i in `seq 1 200`;
do
/usr/bin/time -p -a -o ./measuring_data.rb php5 ./string_instantiation_speedtest.php
done
Here are the ./string_instantiation_speedtest.php and the measurement results:
<?php
// The comments on the
// next 2 lines show arithmetic mean of (user time + sys time) for 200 runs.
$b_instantiate=False; // 0.1624 seconds
$b_instantiate=True; // 0.1676 seconds
// The time consumed by the reference version is about 97% of the
// time consumed by the instantiation version, but a thing to notice is
// that the loop contains at least 1, probably 2, possibly 4,
// string instantiations at the array_push line.
$ar = array();
$s = 'This is a string.';
$n = 10000;
$s_1 = NULL;
for ($i = 0; $i < $n; $i++) {
    if ($b_instantiate) {
        $s_1 = '' . $s;
    } else {
        $s_1 = &$s;
    }
    // The rand is for avoiding optimization at storage.
    array_push($ar, '' . rand(0, 9) . $s_1);
} // for
echo($ar[rand(0, $n - 1)] . "\n");
?>
My conclusion from this experiment, and from another experiment I did with Ruby 1.8, is that it makes sense to pass string values around by reference.
One possible way to apply "pass strings by reference" at whole-application scope is to consistently create a new string instance only when one needs a modified version of a string.
To increase locality, and therefore speed, one may want to decrease the amount of memory that each operand consumes. The following experiment demonstrates the case for string concatenation:
<?php
// The comments on the
// next 2 lines show arithmetic mean of (user time + sys time) for 200 runs.
$b_suboptimal=False; // 0.0611 seconds
$b_suboptimal=True; // 0.0785 seconds
// The time consumed by the optimal version is about 78% of the
// time consumed by the suboptimal version.
//
// The number of concatenations is the same and the resultant
// string is the same, but what differs is the "average" and maximum
// lengths of the tokens that are used for assembling the $s_whole.
$n = 1000;
$s_token = "This is a string with a Linux line break.\n";
$s_whole = '';
if ($b_suboptimal) {
    for ($i = 0; $i < $n; $i++) {
        $s_whole = $s_whole . $i . $s_token;
    } // for
} else {
    $i_watershed = (int)round((($n * 1.0) / 2), 0);
    $s_part_1 = '';
    $s_part_2 = '';
    for ($i = 0; $i < $i_watershed; $i++) {
        $s_part_1 = $s_part_1 . $i . $s_token;
    } // for
    for ($i = $i_watershed; $i < $n; $i++) {
        $s_part_2 = $s_part_2 . $i . $s_token;
    } // for
    $s_whole = $s_part_1 . $s_part_2;
} // else
// To circumvent possible optimization one actually "uses" the
// value of the $s_whole.
$file_handle=fopen('./it_might_have_been_a_served_HTML_page.txt','w');
fwrite($file_handle, $s_whole);
fclose($file_handle);
?>
For example, if one assembles HTML pages that contain a considerable amount of text, one might want to think about the order in which the different parts of the generated HTML are concatenated.
A BSD-licensed PHP implementation and a Ruby implementation of the watershed string concatenation algorithm are available. The same algorithm can be (and has been, by me) generalized to speed up multiplication of arbitrary-precision integers.
Arrays and strings have copy-on-write behaviour. They are mutable, but when you initially assign one to a variable, that variable contains the exact same instance of the string or array. Only when you modify the array or string is a copy made.
Example:
$a = array_fill(0, 10000, 42); //Consumes 545744 bytes
$b = $a; // " 48 "
$b[0] = 42; // " 545656 "
$s = str_repeat(' ', 10000); // " 10096 "
$t = $s; // " 48 "
$t[0] = '!'; // " 10048 "
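The byte figures in the comments can presumably be reproduced with memory_get_usage(), e.g. (exact numbers will vary by PHP version and platform):

$before = memory_get_usage();
$a = array_fill(0, 10000, 42);
echo memory_get_usage() - $before, "\n"; // cost of the array itself
$before = memory_get_usage();
$b = $a;                                 // copy-on-write: tiny cost
echo memory_get_usage() - $before, "\n";
$before = memory_get_usage();
$b[0] = 42;                              // the first write triggers the copy
echo memory_get_usage() - $before, "\n";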
A quick Google search suggests that they are mutable, but the preferred practice is to treat them as immutable.
PHP strings are mutable via offset assignment (tested on PHP 7.4):
<?php
$str = "Hello\n";
echo $str;
$str[2] = 'y';
echo $str;
Output:
Hello
Heylo
Test: PHP Sandbox
PHP strings are immutable.
Try this:
$a = "string";
echo "<br>$a<br>";
echo str_replace('str', 'b', $a);
echo "<br>$a";
It echoes:
string
bing
string
If the string were mutable, it would have continued to show "bing".