I'm trying to produce a timing attack in PHP and am using PHP 7.1 with the following script:
<?php
$find = "hello";
$length = array_combine(range(1, 10), array_fill(1, 10, 0));
for ($i = 0; $i < 1000000; $i++) {
    for ($j = 1; $j <= 10; $j++) {
        $testValue = str_repeat('a', $j);
        $start = microtime(true);
        if ($find === $testValue) {
            // Do nothing
        }
        $end = microtime(true);
        $length[$j] += $end - $start;
    }
}
arsort($length);
$length = key($length);
var_dump($length . " found");
$found = '';
$alphabet = array_combine(range('a', 'z'), array_fill(1, 26, 0));
for ($len = 0; $len < $length; $len++) {
    $currentIteration = $alphabet;
    $filler = str_repeat('a', $length - $len - 1);
    for ($i = 0; $i < 1000000; $i++) {
        foreach ($currentIteration as $letter => $time) {
            $testValue = $found . $letter . $filler;
            $start = microtime(true);
            if ($find === $testValue) {
                // Do nothing
            }
            $end = microtime(true);
            $currentIteration[$letter] += $end - $start;
        }
    }
    arsort($currentIteration);
    $found .= key($currentIteration);
}
var_dump($found);
This is searching for a word with the following constraints:
a-z only
up to 10 characters
The script finds the length of the word without any issue, but the value of the word never comes back as expected with a timing attack.
Is there something I am doing wrong?
The script loops through the lengths and correctly identifies the length. It then loops through each letter (a-z) and checks the speed of these comparisons. In theory, 'haaaa' should be slightly slower than 'aaaaa' because the first letter is an 'h'. It then carries on for each of the five letters.
Running it gives something like 'brhas', which is clearly wrong (it's different each time, but always wrong).
Is there something I am doing wrong?
I don't think so. I tried your code and I too, like you and the other people who tried in the comments, get completely random results for the second loop. The first one (the length) is mostly reliable, though not 100% of the time. By the way, the $argv[1] trick suggested in the comments didn't really improve the consistency of the results, and honestly I don't see why it should.
Since I was curious I had a look at the PHP 7.1 source code. The string identity function (zend_is_identical) looks like this:
case IS_STRING:
    return (Z_STR_P(op1) == Z_STR_P(op2) ||
        (Z_STRLEN_P(op1) == Z_STRLEN_P(op2) &&
         memcmp(Z_STRVAL_P(op1), Z_STRVAL_P(op2), Z_STRLEN_P(op1)) == 0));
Now it's easy to see why the first timing attack on the length works great. If the length is different then memcmp is never called and therefore it returns a lot faster. The difference is easily noticeable, even without too many iterations.
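To make that concrete, here is a stripped-down sketch (reusing the timing approach from your script) that compares a wrong-length candidate with a right-length one; the wrong-length case should come back noticeably faster because memcmp is skipped entirely:
<?php
$find = "hello";

// Time one candidate over many iterations, exactly like the original loop.
$timeCandidate = function (string $candidate) use ($find): float {
    $start = microtime(true);
    for ($i = 0; $i < 1000000; $i++) {
        if ($find === $candidate) {
            // Do nothing
        }
    }
    return microtime(true) - $start;
};

echo 'wrong length: ', $timeCandidate('aaaaaaaaaa'), PHP_EOL; // length differs, memcmp never runs
echo 'right length: ', $timeCandidate('aaaaa'), PHP_EOL;      // same length, memcmp runs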
Once you have the length figured out, in your second loop you are basically trying to attack the underlying memcmp. The problem is that the difference in timing highly depends on:
the implementation of memcmp
the current load and interfering processes
the architecture of the machine.
I recommend this article titled "Benchmarking memcmp for timing attacks" for more detailed explanations. They did a much more precise benchmark and still were not able to get a clear noticeable difference in timing. I'm simply going to quote the conclusion of the article:
In conclusion, it highly depends on the circumstances if a memcmp() is subject to a timing attack.
Related
I am trying to create a random string which will be used as a short reference number. I have spent the last couple of days trying to get this to work but it seems to get to around 32766 records and then it continues with endless duplicates. I need at minimum 200,000 variations.
The code below is a very simple mockup to show what happens. The references should follow the format 1a-x1y2z (example), which should allow far more results than 32k.
I have a feeling it may be related to memory, but I am not sure. Any ideas?
<?php
function createReference() {
    $num = rand(1, 9);
    $alpha = substr(str_shuffle("abcdefghijklmnopqrstuvwxyz"), 0, 1);
    $char = '0123456789abcdefghijklmnopqrstuvwxyz';
    $charLength = strlen($char);
    $rand = '';
    for ($i = 0; $i < 6; $i++) {
        $rand .= $char[rand(0, $charLength - 1)];
    }
    return $num . $alpha . "-" . $rand;
}

$codes = [];
for ($i = 1; $i <= 200000; $i++) {
    $code = createReference();
    while (in_array($code, $codes) == true) {
        echo 'Duplicate: ' . $code . '<br />';
        $code = createReference();
    }
    $codes[] = $code;
    echo $i . ": " . $code . "<br />";
}
exit;
?>
UPDATE
So I am beginning to wonder if this is not something with our WAMP setup (Bitnami), as our local machine gets to exactly 1024 records before it starts duplicating. By removing one character from the string above (making the for loop run 5 times instead of 6) it gets to exactly 32768 records.
I uploaded the script to our CentOS server and had no duplicates.
What in our environment could cause such behaviour?
The code looks overly complex to me. Let's assume for the moment you really want to create n unique strings, each based on a single random value (rand/mt_rand/something between INT_MIN and INT_MAX).
You can start by decoupling the generation of the random values from the encoding (there seems to be nothing in the code that makes a string dependent on any previous state, except for the uniqueness). Comparing integers is quite a bit faster than comparing arbitrary strings.
mt_rand() returns anything between INT_MIN and INT_MAX; with 32-bit integers (could be 64-bit as well, depending on how PHP has been compiled) that gives ~2^32 possible values. You want to pick 200k, let's make it 400k; that's roughly 1/10000 of the value range. It's therefore reasonable to assume everything goes well with the uniqueness, then check at a later time and add more values if a collision occurred. That is again much faster than checking with in_array in each iteration of the loop.
Once you have enough values, you can encode/convert them to whatever format you wish. I don't know whether the <digit><character>-<something> format is mandatory, but assuming it is not -> base_convert()
<?php
function uniqueRandomValues($n) {
    $values = array();
    while (count($values) < $n) {
        for ($i = count($values); $i < $n; $i++) {
            $values[] = mt_rand();
        }
        $values = array_unique($values);
    }
    return $values;
}

function createReferences($n) {
    return array_map(
        function ($e) {
            return base_convert($e, 10, 36);
        },
        uniqueRandomValues($n)
    );
}

$start = microtime(true);
$references = createReferences(400000);
$end = microtime(true);
echo count($references), ' ', count(array_unique($references)), ' ', $end - $start, ' ', $references[0];
prints e.g. 400000 400000 3.3981630802155 f3plox on my i7-4770. (The $end-$start part is constantly between 3.2 and 3.4)
With base_convert() you can get strings like li10, which can be quite annoying to decipher if you have to type the string in manually.
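If that is a concern, one option (a sketch on top of the code above, not something the approach requires) is to drop codes containing the easily confused characters and generate a few extra values to compensate:
// Sketch: reject codes with characters that are easy to misread when typed
// by hand ('l', 'o', '0', '1'). Distinct integers still map to distinct
// base-36 strings, so uniqueness is unaffected.
$references = array_filter($references, function ($code) {
    return !preg_match('/[lo01]/', $code);
});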
Suppose the string is:
$a = "abc-def";
if (preg_match("/[^a-z0-9]/i", $a, $m)) {
    $i = "I stopped scanning '$a' because I found a violation in it while
          scanning it from left to right. The violation was: $m[0]";
}
echo $i;
The example above should indicate that "-" was the violation.
I would like to know if there is a non-preg_match way of doing this.
I will likely run benchmarks if there is a non-preg_match way of doing this, perhaps 1000 or 1 million runs, to see which is faster and more efficient.
In the benchmarks "$a" will be much longer.
The goal is to ensure it is not scanning the entire "$a" and that it stops as soon as it detects a violation within "$a".
Based on what I have seen on the internet, preg_match stops when the first match is found.
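As a side note, preg_match can also report where that first match was found, which is another way to get at the violating character (a variation on my snippet above, using the PREG_OFFSET_CAPTURE flag):
$a = "abc-def";
if (preg_match("/[^a-z0-9]/i", $a, $m, PREG_OFFSET_CAPTURE)) {
    // $m[0][0] is the violating character, $m[0][1] its byte offset
    echo "violation '{$m[0][0]}' at offset {$m[0][1]}";
}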
UPDATE:
This is based on the answer that was given by "bishop" and will likely be chosen as the accepted answer soon (shortly).
I modified it a little bit because I only want it to report the violating character, but I also commented that line out so the benchmark can run without entanglements.
Let's run a 1 million run based on that answer.
$start_time = microtime(TRUE);
$count = 0;
while ($count < 1000000) {
    $allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
    $input = 'abc-def';
    $validLen = strspn($input, $allowed);
    if ($validLen < strlen($input)) {
        #echo "violation at: " . substr($input, $validLen, 1);
    }
    $count = $count + 1;
}
$end_time = microtime(TRUE);
$dif = $end_time - $start_time;
echo $dif;
The result is: 0.606614112854
(about 60 percent of a second)
Let's do it with the preg_match method.
I hope everything is the same (and fair).
(I say this because there is the ^ character in the preg_match pattern.)
$start_time = microtime(TRUE);
$count = 0;
while ($count < 1000000) {
    $input = 'abc-def';
    preg_match("/[^a-z0-9]/i", $input, $m);
    #echo "violation at:" . $m[0];
    $count = $count + 1;
}
$end_time = microtime(TRUE);
$dif = $end_time - $start_time;
echo $dif;
i use "dif" in reference to the terminology "difference".
the "dif" was.. 1.1145210266113
( took 11 percent more than a whole second )
( if it was 1.2 that would mean it is 2x slower than the php way )
You want to find the location of the first character not in the given range, without using regular expressions? You might want strspn or its complement strcspn:
$allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
$input = 'abc-def';
$validLen = strspn($input, $allowed);
if (strlen($input) !== $validLen) {
    printf('Input invalid, starting at %s', substr($input, $validLen));
} else {
    echo 'Input is valid';
}
Outputs Input invalid, starting at -def. See it live.
strspn (and its complement) is very old and very well specified (POSIX, even). The standard C library implementations are optimized for this task, and PHP just leverages the platform implementation, so PHP should be fast, too.
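If it is more natural to list the disallowed characters instead of the allowed ones, strcspn does the inverse job; a quick sketch of the same check, just inverted:
$input = 'abc-def';
// strcspn counts the initial run of characters *not* in the given set, so
// passing the disallowed characters yields the offset of the first violation.
$firstBad = strcspn($input, '-_');
if ($firstBad < strlen($input)) {
    printf('Input invalid, starting at %s', substr($input, $firstBad));
}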
I have numbers coming out of a database (very controlled input) that will have underscores before and after them. They are stored like this:
_51_ _356_
They will not be stored in any other format, but there will be times where I need to get just the numbers out of them. I have chosen to use either
$x = filter_var($myNumber, FILTER_SANITIZE_NUMBER_INT);
or
$y = preg_replace("/[^0-9]/","",$myNumber);
I am not sure of the nuances between the two under the hood, but they both produce exactly what I need (I think so, anyway), so it doesn't matter to me which I use. What are the pros and cons of using each of these options? (For example, does one use an array or some other weird thing that I might need to know about? Does one use way too many resources?)
Well, there isn't a big difference in your case. I think preg_replace is more expensive in terms of resources, since it has to parse the regex pattern.
Alternatively you can use trim:
echo trim('_12_', '_');
It will remove the '_' on both sides; I think this is the most readable way to do it.
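If you then need the value as an actual number, a cast on top of trim is enough; a tiny sketch with your sample values:
$x = (int) trim('_51_', '_');  // 51
$y = (int) trim('_356_', '_'); // 356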
Filters don't use regular expressions, but work in a similar way: iterate a string char-by-char and remove characters that don't match the pattern:
for (i = 0; i < Z_STRLEN_P(value); i++) {
    if ((*map)[str[i]]) {
        buf[c] = str[i];
        ++c;
    }
}
#http://lxr.php.net/xref/PHP_5_6/ext/filter/sanitizing_filters.c#filter_map_apply
and the FILTER_SANITIZE_NUMBER_INT is defined as [^0-9+-]:
/* strip everything [^0-9+-] */
const unsigned char allowed_list[] = "+-" DIGIT;
filter_map map;
filter_map_init(&map);
filter_map_update(&map, 1, allowed_list);
filter_map_apply(value, &map);
#http://lxr.php.net/xref/PHP_5_6/ext/filter/sanitizing_filters.c#php_filter_number_int
Of course, [^0-9+-] is not the right expression to filter integer numbers, so be prepared for surprises:
$x = filter_var("+++123---", FILTER_SANITIZE_NUMBER_INT);
var_dump($x); // string(9) "+++123---" (WTF?)
My suggestion is to stick to regular expressions: they are explicit and far less buggy than filters.
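If you want to be stricter than just stripping characters, a pattern that matches the expected format directly is another option; a small sketch, assuming the values always look like _51_:
if (preg_match('/^_(\d+)_$/', $myNumber, $m)) {
    $value = $m[1]; // just the digits, e.g. "51"
}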
I wanted to try various methods for this, so I set up the following benchmark. It looks like for your case trim is definitely the best option, as it only has to look at the beginning and end of the string instead of at each character. Here are my test results on 10,000,000 random integers surrounded by underscores, running PHP 7.0.18.
preg_replace: 1.9469740390778 seconds.
filter_var: 1.6922700405121 seconds.
str_replace: 0.72129797935486 seconds.
trim: 0.37275195121765 seconds.
And here is my code if anyone wants to run similar tests:
<?php
$ints = array(); //array_fill(0, 10000000, '_1029384756_');
for ($i = 0; $i < 10000000; $i++) {
    $ints[] = '_' . mt_rand() . '_';
}

$start = microtime(true);
foreach ($ints as $v) {
    preg_replace('/[^0-9]/', '', $v);
}
$end = microtime(true);
echo 'preg_replace in ' . ($end - $start) . ' seconds.', PHP_EOL;

$start = microtime(true);
foreach ($ints as $v) {
    filter_var($v, FILTER_SANITIZE_NUMBER_INT);
}
$end = microtime(true);
echo 'filter_var in ' . ($end - $start) . ' seconds.', PHP_EOL;

$start = microtime(true);
foreach ($ints as $v) {
    str_replace('_', '', $v);
}
$end = microtime(true);
echo 'str_replace in ' . ($end - $start) . ' seconds.', PHP_EOL;

$start = microtime(true);
foreach ($ints as $v) {
    trim($v, '_');
}
$end = microtime(true);
echo 'trim in ' . ($end - $start) . ' seconds.', PHP_EOL;
If I have
return recordsAffected > 0;
which would return either true or false, do I need to put return recordsAffected > 0 ? true : false?
No, you do not have to, as your code works just fine. You may find that some developers recommend doing it because they find it clearer to read and understand, but that's a matter of personal opinion.
Always code as if the person maintaining your code is a violent psychopath who knows where you live - Martin Golding
You do not need that. The main reason is that it would be two operations instead of one: first the comparison, then choosing the value. I also want to mention that every conditional operator (even the ternary) affects performance.
Short test:
<?php
header('Content-Type: text/plain; charset=utf-8');

$start = microtime(true);
for ($i = 1; $j = 1, $i <= 10000000; $i++) {
    ($i == $j);
}
$end = microtime(true);
echo 'Not ternary: ', $end - $start, PHP_EOL;

$start = microtime(true);
for ($i = 1; $j = 1, $i <= 10000000; $i++) {
    ($i == $j ? true : false);
}
$end = microtime(true);
echo 'Ternary: ', $end - $start, PHP_EOL;
?>
Run it and compare the results.
You certainly do not need to. The intention and semantics of
return recordsAffected > 0;
are perfectly clear. This should hold true for every decent programmer reading your code.
return recordsAffected > 0 ? true : false;
is redundant at best, but I'd go further and call it detrimental. The second snippet does not add anything to the statement but complexity. I'll bet that if you did not write code like this all the time (and I believe most decent programmers don't), the second statement would take you at least two passes to grasp, if not more. When there are two semantically equal solutions, you should always stick to the clearest one, which is not necessarily the most explicit one. Nobody would ever write something like
if(recordsAffected > 0 ? true : false)
{
}
I am not sure which one to use:
foreach () {
    // .....
    if (!in_array($view, $this->_views[$condition]))
        array_push($this->_views[$condition], $view);
    // ....
}
OR
foreach () {
    // .....
    array_push($this->_views[$condition], $view);
    // ....
}
$this->_views[$condition] = array_unique($this->_views[$condition]);
UPDATE
The goal is to get an array of unique values. This can be done either by checking each time whether the value already exists with in_array, or by adding all values and using array_unique at the end. So is there any major difference between these two approaches?
I think the second approach would be more efficient. In fact, array_unique sorts the array and then scans it.
Sorting takes N log N steps, then scanning takes N steps.
The first approach takes N^2 steps (for each element it scans all N previous elements). On big arrays, there is a very big difference.
Honestly, if you're using a small dataset it does not matter which one you use. If your dataset is in the 10,000s you'll most definitely want to use a hash map for this sort of thing.
This is assuming the views are strings or something similar, which it looks like they are.
This is typically O(n) and possibly the fastest way to deal with tracking unique values.
foreach ($views as $view) {
    if (!array_key_exists($view, $unique_views[$condition])) {
        $unique_views[$condition][$view] = true;
    }
}
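If you need a plain list of the unique views again afterwards, the keys can be pulled back out; a one-line sketch assuming the structure above:
$uniqueList = array_keys($unique_views[$condition]);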
TL;DR: foreach combined with if (!in_array()) is better.
Truthfully, you should not really worry about which performs better; in most cases the difference is so small it's negligible (unless you're really doing some big-data work). I would suggest going with whatever seems more readable.
If you're interested, check out this script I wrote. It loops each case 100,000 times, and both take between 50 and 200 ms.
https://3v4l.org/lkTCF
Note that array_unique() keeps the original keys so to counter that we also have to wrap the result with array_values().
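A quick illustration of that key behaviour (my own example, not from the linked script):
$a = [1, 2, 2, 3];
var_dump(array_unique($a));                // keys 0, 1, 3: a gap is left where the duplicate was
var_dump(array_values(array_unique($a)));  // keys 0, 1, 2: reindexed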
In case the link ever dies:
<?php
$loops = 100000;

$start = microtime(true);
for ($l = 0; $l < $loops; $l++) {
    $x = [1,2,3,4,6,7,8,9];
    for ($i = 0; $i <= 10; $i++) {
        if (!in_array($i, $x)) {
            $x[] = $i;
        }
    }
}
$duration = microtime(true) - $start;
echo "in_array took $duration<br>".PHP_EOL;

$start = microtime(true);
for ($l = 0; $l < $loops; $l++) {
    $x = [1,2,3,4,6,7,8,9];
    $x = array_values(array_unique(array_merge($x, [0,1,2,3,4,5,6,7,8,9,10])));
}
$duration = microtime(true) - $start;
echo "array_unique took $duration<br>".PHP_EOL;