I did a test and was really bummed to find that a standard foreach loop performed significantly faster than using array methods.
Using foreach:
$std_dev = 0;
$mean = self::calc_stat_mean($array);
$start = microtime(true);
// Accumulate the sum of squared deviations from the mean
foreach ($array as $value) {
    $std_dev += pow(($value - $mean), 2);
}
echo microtime(true) - $start;
Using array methods:
$mean = self::calc_stat_mean($array);
$start = microtime(true);
$std_dev = array_sum(array_map(function ($value) use ($mean) {
    return pow(($value - $mean), 2);
}, $array));
echo microtime(true) - $start;
Can someone tell me why this is? The latter method seems better written and cleaner than the former, but the hit in speed isn't worth it.
The difference is so small that it isn't even worth worrying about.
Just pick something that matches your programming style, that you like better personally, and that will work better for your app.
Find other places to optimize... Don't stress over for, foreach, and while!
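For reference, here is a minimal, self-contained version of the comparison (an assumption on my part: the mean is computed inline instead of via the asker's self::calc_stat_mean, and the input is a plain array of random integers):
<?php
// Build a test array of random integers
$array = [];
for ($i = 0; $i < 100000; $i++) {
    $array[] = mt_rand();
}
$mean = array_sum($array) / count($array);

// foreach version: accumulate the sum of squared deviations
$start = microtime(true);
$sum_sq = 0;
foreach ($array as $value) {
    $sum_sq += pow($value - $mean, 2);
}
printf("foreach:             %.6f s\n", microtime(true) - $start);

// array_map + array_sum version: same computation, functional style
$start = microtime(true);
$sum_sq = array_sum(array_map(function ($value) use ($mean) {
    return pow($value - $mean, 2);
}, $array));
printf("array_map/array_sum: %.6f s\n", microtime(true) - $start);
The foreach version typically wins because the closure adds a function call per element, but as the answer above says, the absolute difference is tiny.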
Related
I'm trying to produce a timing attack in PHP and am using PHP 7.1 with the following script:
<?php
$find = "hello";
$length = array_combine(range(1, 10), array_fill(1, 10, 0));
for ($i = 0; $i < 1000000; $i++) {
    for ($j = 1; $j <= 10; $j++) {
        $testValue = str_repeat('a', $j);
        $start = microtime(true);
        if ($find === $testValue) {
            // Do nothing
        }
        $end = microtime(true);
        $length[$j] += $end - $start;
    }
}
arsort($length);
$length = key($length);
var_dump($length . " found");

$found = '';
$alphabet = array_combine(range('a', 'z'), array_fill(1, 26, 0));
for ($len = 0; $len < $length; $len++) {
    $currentIteration = $alphabet;
    $filler = str_repeat('a', $length - $len - 1);
    for ($i = 0; $i < 1000000; $i++) {
        foreach ($currentIteration as $letter => $time) {
            $testValue = $found . $letter . $filler;
            $start = microtime(true);
            if ($find === $testValue) {
                // Do nothing
            }
            $end = microtime(true);
            $currentIteration[$letter] += $end - $start;
        }
    }
    arsort($currentIteration);
    $found .= key($currentIteration);
}
var_dump($found);
This is searching for a word with the following constraints:
a-z only
up to 10 characters
The script finds the length of the word without any issue, but the value of the word never comes back as expected with a timing attack.
Is there something I am doing wrong?
The script loops through the lengths and correctly identifies the length. It then loops through each letter (a-z) and checks the speed of the comparison. In theory, 'haaaa' should be slightly slower than 'aaaaa' due to the first letter being an h. It then carries on for each of the five letters.
Running it gives something like 'brhas', which is clearly wrong (it's different each time, but always wrong).
Is there something I am doing wrong?
I don't think so. I tried your code and I too, like you and the other people who tried in the comments, get completely random results for the second loop. The first one (the length) is mostly reliable, though not 100% of the time. By the way, the suggested $argv[1] trick didn't really improve the consistency of the results, and honestly I don't see why it should.
Since I was curious I had a look at the PHP 7.1 source code. The string identity function (zend_is_identical) looks like this:
case IS_STRING:
return (Z_STR_P(op1) == Z_STR_P(op2) ||
(Z_STRLEN_P(op1) == Z_STRLEN_P(op2) &&
memcmp(Z_STRVAL_P(op1), Z_STRVAL_P(op2), Z_STRLEN_P(op1)) == 0));
Now it's easy to see why the first timing attack, on the length, works great: if the lengths differ, memcmp is never called, so the comparison returns a lot faster. The difference is easily noticeable, even without too many iterations.
Once you have the length figured out, in your second loop you are basically trying to attack the underlying memcmp. The problem is that the difference in timing highly depends on:
the implementation of memcmp
the current load and interfering processes
the architecture of the machine.
I recommend the article "Benchmarking memcmp for timing attacks" for a more detailed explanation. They did a much more precise benchmark and still were not able to get a clearly noticeable difference in timing. I'm simply going to quote the conclusion of the article:
In conclusion, it highly depends on the circumstances if a memcmp() is subject to a timing attack.
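If you still want to experiment, one small improvement against scheduler noise is to aggregate with a median rather than a running sum. This is only a sketch under that assumption (medianCompareTime is a hypothetical helper, and as the article shows, the signal may simply not be there):
<?php
// Hypothetical helper: time a single === comparison many times and
// take the median, which is less sensitive to outliers (context
// switches, GC pauses) than summing individual microtime() deltas.
function medianCompareTime(string $secret, string $guess, int $trials = 100000): float {
    $samples = [];
    for ($i = 0; $i < $trials; $i++) {
        $start = microtime(true);
        $dummy = ($secret === $guess);
        $samples[] = microtime(true) - $start;
    }
    sort($samples);
    return $samples[intdiv($trials, 2)];
}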
I have numbers coming out of a database (very controlled input) that will have underscores before and after them. They are stored like this:
_51_ _356_
They will not be stored in any other format, but there will be times when I need to get just the numbers out of them. I have chosen to use either
$x = filter_var($myNumber, FILTER_SANITIZE_NUMBER_INT);
or
$y = preg_replace("/[^0-9]/","",$myNumber);
I am not sure of the nuances between the two under the hood, but they both produce exactly what I need (I think so, anyway), so it doesn't matter to me which I use. What are the pros and cons of using each of these options? (For example, does one use an array or other weird thing that I might need to know about? Does one use way too many resources?)
Well, there isn't a big difference in your case. preg_replace is probably more expensive, since it has to parse the regex pattern.
Alternatively you can use trim:
echo trim('_12_', '_'); // prints "12"
It will remove the '_' on both sides; I think this is the most readable way to do it.
Filters don't use regular expressions, but they work in a similar way: they iterate the string char by char and remove characters that aren't in the allowed map:
for (i = 0; i < Z_STRLEN_P(value); i++) {
    if ((*map)[str[i]]) {
        buf[c] = str[i];
        ++c;
    }
}
#http://lxr.php.net/xref/PHP_5_6/ext/filter/sanitizing_filters.c#filter_map_apply
and FILTER_SANITIZE_NUMBER_INT is defined to strip everything matching [^0-9+-]:
/* strip everything [^0-9+-] */
const unsigned char allowed_list[] = "+-" DIGIT;
filter_map map;
filter_map_init(&map);
filter_map_update(&map, 1, allowed_list);
filter_map_apply(value, &map);
#http://lxr.php.net/xref/PHP_5_6/ext/filter/sanitizing_filters.c#php_filter_number_int
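In other words, the sanitizer behaves roughly like this one-liner (my paraphrase of the C above, not PHP's actual implementation):
// Roughly what FILTER_SANITIZE_NUMBER_INT does: keep digits, '+' and '-'
function sanitize_number_int_like(string $s): string {
    return preg_replace('/[^0-9+-]/', '', $s);
}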
Of course, keeping every + and - is not the right way to filter integer numbers, so be prepared for surprises:
$x = filter_var("+++123---", FILTER_SANITIZE_NUMBER_INT);
var_dump($x); // string(9) "+++123---" - WTF?
My suggestion is to stick to regular expressions: they are explicit and far less buggy than filters.
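If what you actually want is validation rather than sanitization, an explicit pattern makes the intent unambiguous (a hypothetical example, not from the original answer):
// Accept an optional sign followed by one or more digits, nothing else
if (preg_match('/^[+-]?[0-9]+$/', $input)) {
    // $input is a well-formed integer string
}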
I wanted to try various methods for this, so I set up the following benchmark. It looks like for your case trim is definitely the best option, as it only has to look at the beginning and end of the string instead of each character. Here are my test results on 10,000,000 random integers surrounded by underscores, running PHP 7.0.18.
preg_replace: 1.9469740390778 seconds.
filter_var: 1.6922700405121 seconds.
str_replace: 0.72129797935486 seconds.
trim: 0.37275195121765 seconds.
And here is my code if anyone wants to run similar tests:
<?php
$ints = array(); // alternatively: array_fill(0, 10000000, '_1029384756_')
for ($i = 0; $i < 10000000; $i++) {
    $ints[] = '_'.mt_rand().'_';
}

$start = microtime(true);
foreach ($ints as $v) {
    preg_replace('/[^0-9]/', '', $v);
}
$end = microtime(true);
echo 'preg_replace in '.($end-$start).' seconds.',PHP_EOL;

$start = microtime(true);
foreach ($ints as $v) {
    filter_var($v, FILTER_SANITIZE_NUMBER_INT);
}
$end = microtime(true);
echo 'filter_var in '.($end-$start).' seconds.',PHP_EOL;

$start = microtime(true);
foreach ($ints as $v) {
    str_replace('_', '', $v);
}
$end = microtime(true);
echo 'str_replace in '.($end-$start).' seconds.',PHP_EOL;

$start = microtime(true);
foreach ($ints as $v) {
    trim($v, '_');
}
$end = microtime(true);
echo 'trim in '.($end-$start).' seconds.',PHP_EOL;
Intro
If I loop in PHP, and know how many times I want to iterate, I usually use the for-loop like this:
for ($y = 0; $y < 10; $y++) {
    // ...
}
But lately I have seen someone use the foreach-loop:
foreach (range(1, 10) as $y) {
    // ...
}
Now, I find the foreach loop much more readable and have thought about adopting this foreach construct. But on the other hand, the for loop is faster, as you can see below.
Speed Test
I then ran some speed tests, with the following results.
Foreach:
$latimes = [];
for ($x = 0; $x < 100; $x++) {
    $start = microtime(true);
    $lcc = 0;
    foreach (range(1, 10) as $y) {
        $lcc++;
    }
    $latimes[$x] = microtime(true) - $start;
}
echo "Test 'foreach':\n";
echo (float) array_sum($latimes)/count($latimes);
Results after running it five times:
Test 'foreach': 2.2873878479004E-5
Test 'foreach': 2.2327899932861E-5
Test 'foreach': 2.9709339141846E-5
Test 'foreach': 2.5603771209717E-5
Test 'foreach': 2.2120475769043E-5
For:
$latimes = [];
for ($x = 0; $x < 100; $x++) {
    $start = microtime(true);
    $lcc = 0;
    for ($y = 0; $y < 10; $y++) {
        $lcc++;
    }
    $latimes[$x] = microtime(true) - $start;
}
echo "Test 'for':\n";
echo (float) array_sum($latimes)/count($latimes);
Results after running it five times:
Test 'for': 1.3396739959717E-5
Test 'for': 1.0268688201904E-5
Test 'for': 1.0945796966553E-5
Test 'for': 1.3313293457031E-5
Test 'for': 1.9807815551758E-5
Question
What I'd like to know is which one you would prefer, and why. Which is more readable for you, and would you prefer readability over speed?
The following code samples are written in PHP using the CodeIgniter framework's Benchmark library (just to save time, since I'm currently using it). If you are using another language, treat this as pseudocode and implement it your language's way; there shouldn't be any problem porting it to any programming language. If you have experience with PHP/CodeIgniter, you're in luck: just copy-paste this code and test it.
$data = array();
for ($i = 0; $i < 500000; $i++) {
    $data[$i] = rand();
}

$this->benchmark->mark('code_start');
for ($i = 0; $i < 500000; $i++) {
    ;
}
$this->benchmark->mark('code_end');
echo $this->benchmark->elapsed_time('code_start', 'code_end');
echo "<br/>";

// Re-marking the same names overwrites the earlier marks
$this->benchmark->mark('code_start');
foreach ($data as $row) {
    ;
}
$this->benchmark->mark('code_end');
echo $this->benchmark->elapsed_time('code_start', 'code_end');
I got about a 2-second difference between these two loops (the latter ran in around 3 seconds while the first ran in around 5 seconds). So the foreach loop won the 'battle of for vs foreach' here. You might be thinking we'll never need such a big loop; maybe not in every case, but there are cases where long loops are needed, like updating a big product database from a web service/XML/CSV, etc. This example is just to make you aware of the performance difference between the two.
That said, each has uses where it clearly wins on ease or flexibility. If you are working in a loop that can be terminated early on a certain condition, a for loop gives you the most flexibility. On the other hand, if you are taking every item from an array and processing it, foreach will serve you best.
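For anyone not on CodeIgniter, here is a plain-PHP sketch of the same measurement using microtime() (my adaptation, not part of the original answer):
<?php
$data = array();
for ($i = 0; $i < 500000; $i++) {
    $data[$i] = rand();
}

// Empty for loop over the same number of iterations
$start = microtime(true);
for ($i = 0; $i < 500000; $i++) {
    ;
}
echo 'for:     ' . (microtime(true) - $start) . ' seconds' . PHP_EOL;

// Empty foreach over the prepared array
$start = microtime(true);
foreach ($data as $row) {
    ;
}
echo 'foreach: ' . (microtime(true) - $start) . ' seconds' . PHP_EOL;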
I am in doubt what to use:
foreach ($views as $view) {
    // .....
    if (!in_array($view, $this->_views[$condition]))
        array_push($this->_views[$condition], $view);
    // ....
}
OR
foreach ($views as $view) {
    // .....
    array_push($this->_views[$condition], $view);
    // ....
}
$this->_views[$condition] = array_unique($this->_views[$condition]);
UPDATE
The goal is to get an array of unique values. This can be done either by checking each time whether the value already exists using in_array, or by adding all values and then using array_unique at the end. Is there any major difference between these two approaches?
I think the second approach would be more efficient: array_unique sorts the array and then scans it.
Sorting takes N log N steps, and scanning takes N steps.
The first approach takes N^2 steps (for each element it scans all N previous elements). On big arrays, there is a very big difference.
Honestly, if you're using a small dataset it does not matter which one you use. If your dataset is in the 10,000s you'll most definitely want to use a hash map for this sort of thing.
This assumes each view is a string or similar, which it looks like it is.
This is typically O(n) and possibly the fastest way to deal with tracking unique values.
$unique_views[$condition] = array();
foreach ($views as $view) {
    if (!array_key_exists($view, $unique_views[$condition])) {
        $unique_views[$condition][$view] = true;
    }
}
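To turn that hash map back into a plain list afterwards, the keys are the unique values (using the same hypothetical $unique_views structure as above):
$views_list = array_keys($unique_views[$condition]);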
TL;DR: foreach combined with if (!in_array()) is better.
Truthfully, you should not really worry about what performs better; in most cases the difference is so small, it's negligible (unless you're really doing some big data stuff). I would suggest going with whatever seems more readable.
If you're interested, check out this script I wrote. It loops each case 100,000 times, and both take between 50 and 200 ms.
https://3v4l.org/lkTCF
Note that array_unique() keeps the original keys, so to counter that we also have to wrap the result in array_values().
In case the link ever dies:
<?php
$loops = 100000;

$start = microtime(true);
for ($l = 0; $l < $loops; $l++) {
    $x = [1,2,3,4,6,7,8,9];
    for ($i = 0; $i <= 10; $i++) {
        if (!in_array($i, $x)) {
            $x[] = $i;
        }
    }
}
$duration = microtime(true) - $start;
echo "in_array took $duration<br>".PHP_EOL;

$start = microtime(true);
for ($l = 0; $l < $loops; $l++) {
    $x = [1,2,3,4,6,7,8,9];
    $x = array_values(array_unique(array_merge($x, [0,1,2,3,4,5,6,7,8,9,10])));
}
$duration = microtime(true) - $start;
echo "array_unique took $duration<br>".PHP_EOL;
Hey there. Today I wrote a small benchmark script to compare the performance of copying variables vs. creating references to them. I was expecting that creating references to large arrays, for example, would be significantly faster than copying the whole array. Here is my benchmark code:
<?php
$array = array();
for ($i = 0; $i < 100000; $i++) {
    $array[] = mt_rand();
}

function recursiveCopy($array, $count) {
    if ($count === 1000)
        return;
    $foo = $array;
    recursiveCopy($array, $count + 1);
}

function recursiveReference($array, $count) {
    if ($count === 1000)
        return;
    $foo = &$array;
    recursiveReference($array, $count + 1);
}

$time = microtime(1);
recursiveCopy($array, 0);
$copyTime = (microtime(1) - $time);
echo "Took " . $copyTime . "s \n";

$time = microtime(1);
recursiveReference($array, 0);
$referenceTime = (microtime(1) - $time);
echo "Took " . $referenceTime . "s \n";

echo "Reference / Copy: " . ($referenceTime / $copyTime);
The actual result I got was that recursiveReference took about 20 times (!) as long as recursiveCopy.
Can somebody explain this PHP behaviour?
PHP will very likely implement copy-on-write for its arrays, meaning when you "copy" an array, PHP doesn't do all the work of physically copying the memory until you modify one of the copies and your variables can no longer reference the same internal representation.
Your benchmarking is therefore fundamentally flawed, as your recursiveCopy function doesn't actually copy the object; if it did, you would run out of memory very quickly.
Try this: By assigning to an element of the array you force PHP to actually make a copy. You'll find you run out of memory pretty quickly as none of the copies go out of scope (and aren't garbage collected) until the recursive function reaches its maximum depth.
function recursiveCopy($array, $count) {
    if ($count === 1000)
        return;
    $foo = $array;
    $foo[9492] = 3; // Force PHP to copy the array
    recursiveCopy($array, $count + 1);
}
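To see copy-on-write directly, you can also watch memory usage around an assignment and a subsequent write (a rough sketch; the exact byte counts vary by PHP version and platform):
<?php
$array = range(1, 100000);

$before = memory_get_usage();
$copy = $array;               // no physical copy yet (copy-on-write)
$afterAssign = memory_get_usage();

$copy[0] = -1;                // the first write triggers the real copy
$afterWrite = memory_get_usage();

printf("assignment: +%d bytes, first write: +%d bytes\n",
    $afterAssign - $before,
    $afterWrite - $afterAssign);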
In recursiveReference you're calling recursiveCopy... this doesn't make any sense; in this case you're calling recursiveReference just once. Correct your code, run the benchmark again, and come back with your new results.
In addition, I don't think it's useful for a benchmark to do this recursively. A better solution would be to call a function 1000 times in a loop - once with the array directly and once with a reference to that array.
You don't need to (and thus shouldn't) assign or pass variables by reference just for performance reasons. PHP does such optimizations automatically.
The test you ran is flawed because of these automatic optimizations. I ran the following test instead:
<?php
for ($i = 0; $i < 100000; $i++) {
    $array[] = mt_rand();
}

$time = microtime(1);
for ($i = 0; $i < 1000; $i++) {
    $copy = $array;
    unset($copy);
}
$duration = microtime(1) - $time;
echo "Normal Assignment and don't write: $duration<br />\n";

$time = microtime(1);
for ($i = 0; $i < 1000; $i++) {
    $copy =& $array;
    unset($copy);
}
$duration = microtime(1) - $time;
echo "Assignment by Reference and don't write: $duration<br />\n";

$time = microtime(1);
for ($i = 0; $i < 1000; $i++) {
    $copy = $array;
    $copy[0] = 0;
    unset($copy);
}
$duration = microtime(1) - $time;
echo "Normal Assignment and write: $duration<br />\n";

$time = microtime(1);
for ($i = 0; $i < 1000; $i++) {
    $copy =& $array;
    $copy[0] = 0;
    unset($copy);
}
$duration = microtime(1) - $time;
echo "Assignment by Reference and write: $duration<br />\n";
?>
This was the output:
Normal Assignment and don't write: 0.00023698806762695
Assignment by Reference and don't write: 0.00023508071899414
Normal Assignment and write: 21.302103042603
Assignment by Reference and write: 0.00030708312988281
As you can see there is no significant performance difference in assigning by reference until you actually write to the copy, i.e. when there is also a functional difference.
Generally speaking in PHP, calling by reference is not something you'd do for performance reasons; it's something you'd do for functional reasons - ie because you actually want the referenced variable to be updated.
If you don't have a functional reason for calling by reference then you should stick with regular parameter passing, because PHP handles things perfectly efficiently that way.
(that said, as others have pointed out, your example code isn't exactly doing what you think it is anyway ;))
In the recursiveReference() function you call the recursiveCopy() function. Is that what you really intended to do?
You do nothing with the $foo variable - presumably it was supposed to be used in a further method call?
Passing a variable by reference should generally save stack memory when passing large objects.
recursiveReference is calling recursiveCopy.
Not that that would necessarily harm performance, but that's probably not what you're trying to do.
Not sure why performance is slower, but it doesn't reflect the measurement you're trying to make.