Time it takes to assign a variable vs. assign + add - php

<?php
$a = microtime(true);
$num = 0;
for ($i = 0; $i < 10000000; $i++)
{
    $num = $i;
}
$b = microtime(true);
echo $b - $a;
?>
I run this on Ubuntu 12.10 and Apache 2, and it gives me approximately 0.50 seconds when I run the assignment ten million times. BUT BUT...
with the same code, if instead of $num = $i I write...
$num = $i + 10; it now takes almost 1.5 times less time to execute, around 0.36 seconds consistently.
How come the simple assignment takes more time, while an assignment plus adding 10 on top of it takes less?

I am by no means an expert, but here are my findings:
$s = microtime(true);
for ($i = 0; $i < 100000000; $i++) $tmp = $i;
$t = microtime(true);
for ($i = 0; $i < 100000000; $i++) $tmp = $i + 10;
$u = microtime(true);
echo ($t - $s) . chr(10) . ($u - $t);
Results in:
9.9528648853302
9.0821340084076
On the other hand, using a constant value for the assignment test:
$x = 0;
$s = microtime(true);
for ($i = 0; $i < 100000000; $i++) $tmp = $x;
$t = microtime(true);
for ($i = 0; $i < 100000000; $i++) $tmp = $x + 10;
$u = microtime(true);
echo ($t - $s) . chr(10) . ($u - $t);
Results in:
6.1365358829498
9.3231790065765
This leads me to believe that the answer has something to do with opcode caching. I honestly couldn't tell you what about it makes the difference, but as you can see, using a constant value for the assignment makes a huge difference.

This is just an educated guess, based on looking at the latest PHP source on GitHub, but I'd say this difference is due to function call overhead in the interpreter source.
$tmp = $i;
compiles to a single opcode ASSIGN !2, !1;, which copies one named variable's value to another named variable. In the source, the key part looks like this:
if (EXPECTED(Z_TYPE_P(variable_ptr) <= IS_BOOL)) {
    /* nothing to destroy */
    ZVAL_COPY_VALUE(variable_ptr, value);
    zendi_zval_copy_ctor(*variable_ptr);
}
$tmp = $i + 10;
compiles to two opcodes ADD ~8 !1, 10; ASSIGN !2, ~8;, which creates a temporary variable ~8 and assigns its value to a named variable. In the source, the key part looks like this:
if (EXPECTED(Z_TYPE_P(variable_ptr) <= IS_BOOL)) {
    /* nothing to destroy */
    ZVAL_COPY_VALUE(variable_ptr, value);
}
Notice that there's an extra function call to zendi_zval_copy_ctor() in the first case. That function performs some bookkeeping as needed (e.g. if the original variable is a resource, it needs to make sure that resource is not freed until this new variable is gone). For a primitive type such as a number there's nothing to do, but the function call itself introduces some overhead, which accumulates over 10 million iterations of your test. Note that this overhead is normally negligible: even in 10 million iterations it only added up to 0.14 seconds.
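If you want to verify the generated opcodes yourself, one way (a sketch, assuming the vld extension is installed) is to dump them from the command line:
php -d vld.active=1 -d vld.execute=0 -r '$i = 1; $tmp = $i;'
php -d vld.active=1 -d vld.execute=0 -r '$i = 1; $tmp = $i + 10;'
The first dump should show a single ASSIGN, the second an ADD into a temporary followed by an ASSIGN.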
@Kolink's observation about a constant being faster can also be answered in the same function. It includes a check to avoid redundant copying if the new value is the same as the old one:
if (EXPECTED(variable_ptr != value)) {
copy_value:
    // the same code that handles `$tmp = $i` above
    if (EXPECTED(Z_TYPE_P(variable_ptr) <= IS_BOOL)) {
        /* nothing to destroy */
        ZVAL_COPY_VALUE(variable_ptr, value);
        zendi_zval_copy_ctor(*variable_ptr);
    } else {
        /* irrelevant to the question */
    }
}
So only the first assignment of $tmp = $x copies the value of $x, the following ones see that the value of $tmp would not change and skip the copying, making it faster.

Related

Why is a sorted array slower than a non-sorted array in PHP

I have the following script, and I know about the principle of branch prediction, but it seems that's not the case here:
Why is it faster to process a sorted array than an unsorted array?
It seems to work the other way around.
When I run the following script without the sort($data) call, it takes 193.23883700371 seconds to complete.
When I enable the sort($data) line, the script takes 300.26129794121 seconds to complete.
Why is it so much slower in PHP? I used PHP 5.5 and 5.6.
In PHP 7 the script is faster when the sort() is not commented out.
<?php
$size = 32768;
$data = array_fill(0, $size, null);
for ($i = 0; $i < $size; $i++) {
    $data[$i] = rand(0, 255);
}
// Improved performance when disabled
//sort($data);
$total = 0;
$start = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    for ($x = 0; $x < $size; $x++) {
        if ($data[$x] >= 127) {
            $total += $data[$x];
        }
    }
}
$end = microtime(true);
echo($end - $start);
Based on my comments above, the solution is either to find or implement a sort function that moves the values so that memory remains contiguous (which gives you the speedup), or to push the values from the sorted array into a second array so that the new array has contiguous memory, as sketched below.
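A minimal sketch of that second option (assuming the goal is just to rebuild the array so its internal buckets are allocated in iteration order):
sort($data);
$packed = array();
foreach ($data as $value) {
    $packed[] = $value; // buckets of $packed are allocated in order
}
$data = $packed;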
Assuming you meant not to time the actual sort (your code doesn't time that action), it's difficult to assess any true performance difference, because you've filled the array with random data. This means that one pass might have many more values greater than or equal to 127 (and thus run an additional command) than another pass. To really compare the two, fill your array with an identical set of fixed data. Otherwise, you'll never know if the random fill is causing the time differences you're seeing.

Which is better to use: in_array or array_unique?

I am in doubt what to use:
foreach () {
    // .....
    if (!in_array($view, $this->_views[$condition]))
        array_push($this->_views[$condition], $view);
    // ....
}
OR
foreach () {
    // .....
    array_push($this->_views[$condition], $view);
    // ....
}
$this->_views[$condition] = array_unique($this->_views[$condition]);
UPDATE
The goal is to get an array of unique values. This can be done either by checking each time whether the value already exists with in_array, or by adding all values every time and calling array_unique at the end. So is there any major difference between these two approaches?
I think the second approach would be more efficient. In fact, array_unique sorts the array and then scans it.
Sorting takes N log N steps, then scanning takes N steps.
The first approach takes N^2 steps (for each element it scans all N previous elements). On big arrays there is a very big difference: for N = 10,000, that's on the order of 130,000 steps versus 100,000,000.
Honestly, if you're using a small dataset it does not matter which one you use. If your dataset is in the 10,000s, you'll most definitely want to use a hash map for this sort of thing.
This is assuming the views are strings or something similar, which it looks like they are.
This is typically O(n) and possibly the fastest way to deal with tracking unique values.
foreach ($views as $view)
{
    // using the value itself as the key makes the lookup O(1)
    if (!array_key_exists($view, $unique_views[$condition]))
    {
        $unique_views[$condition][$view] = true;
    }
}
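To get the unique values back out as a plain list afterwards, array_keys() does it in one call (assuming, as above, that the views themselves are the keys):
$views = array_keys($unique_views[$condition]);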
TL;DR: foreach combined with if (!in_array()) is better.
Truthfully, you should not really worry about which performs better; in most cases the difference is so small, it's negligible (unless you're really doing some big-data stuff). I would suggest going with whatever seems more readable.
If you're interested, check out this script I wrote. It loops each case 100,000 times, and both take between 50 and 200 ms.
https://3v4l.org/lkTCF
Note that array_unique() keeps the original keys, so to counter that we also have to wrap the result with array_values().
In case the link ever dies:
<?php
$loops = 100000;

$start = microtime(true);
for ($l = 0; $l < $loops; $l++) {
    $x = [1,2,3,4,6,7,8,9];
    for ($i = 0; $i <= 10; $i++) {
        if (!in_array($i, $x)) {
            $x[] = $i;
        }
    }
}
$duration = microtime(true) - $start;
echo "in_array took $duration<br>".PHP_EOL;

$start = microtime(true);
for ($l = 0; $l < $loops; $l++) {
    $x = [1,2,3,4,6,7,8,9];
    $x = array_values(array_unique(array_merge($x, [0,1,2,3,4,5,6,7,8,9,10])));
}
$duration = microtime(true) - $start;
echo "array_unique took $duration<br>".PHP_EOL;

A* implementation in PHP validation

This is code that I got from the site here, and I'd like to know whether this implementation of A* is correct. I have looked at it and compared it with the Wikipedia page, and it seems valid. The reason I ask is that the site says there is still a bug in this code; I tried finding it but can't. I also want to change it so that it takes an origin and a destination as input parameters.
<?php
class AStarSolver
{
    function solve(&$s)
    {
        include_once('PQueue.class.php');
        $o = new PQueue();
        $l = array();
        $c = array();
        $p = array();
        $a = $s->getStartIndex();
        $z = $s->getGoalIndex();
        $d = $s->goalDistance($a);
        $n0 = array('g'=>0, 'h'=>$d, 'i'=>$a, 'p'=>NULL, 'f'=>$d);
        $o->push($n0, -$d);
        $l[$a] = TRUE;
        while (! $o->isEmpty())
        {
            $n = $o->pop();
            if ($n['i'] == $z)
            {
                while ($n)
                {
                    $p[] = $n['i'];
                    $n = $n['p'];
                }
                break;
            }
            foreach ($s->getNeighbors($n['i']) as $j => $w)
            {
                if ((isset($l[$j]) || isset($c[$j])) && isset($m) && $m['g'] <= $n['g']+$w)
                    continue;
                $d = $s->goalDistance($j);
                $m = array('g'=>$n['g']+$w, 'h'=>$d, 'i'=>$j, 'p'=>$n, 'f'=>$n['g']+$w+$d);
                if (isset($c[$j]))
                    unset($c[$j]);
                if (! isset($l[$j]))
                {
                    $o->push($m, -$m['f']);
                    $l[$j] = TRUE;
                }
            }
            $c[$n['i']] = $n;
        }
        return $p;
    }
}
?>
The code for the PQueue can be found here.
The site suggests that the bug might be in the PQueue class.
In PQueue::pop this
$j+1 < $m
is a test of whether the heap node at $i has one child (at $j) or two (at $j and $j+1).
But $m here is count($h) only on the first iteration through the loop, since the --$m in the loop condition is evaluated every time.
Move that --$m next to the array_pop where it belongs, and that will be one less bug.
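To illustrate the fix, here is a minimal sketch of a max-heap pop written that way (hypothetical code; the real PQueue class differs in its details). The point is that the heap size is computed once, right after the array_pop, rather than being decremented on every iteration of the sift-down loop:
function heapPop(array &$h) {
    $top = $h[0];
    $last = array_pop($h);
    $m = count($h); // heap size after the pop, computed exactly once
    $i = 0;
    while (($j = 2 * $i + 1) < $m) {
        // $j+1 < $m tests whether the node at $i has a second child
        if ($j + 1 < $m && $h[$j + 1] > $h[$j])
            $j++;
        if ($last >= $h[$j])
            break;
        $h[$i] = $h[$j]; // move the larger child up
        $i = $j;
    }
    if ($m > 0)
        $h[$i] = $last;
    return $top;
}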
Now for AStarSolver.
The variables are (relative to Wikipedia pseudocode):
$o – open set as priority queue;
$l – open set as map keyed by index;
$c – closed set as map keyed by index;
$n – current node (x);
$m – neighbour node (y) ?;
$j – neighbour node index.
Problems that I see:
$n = $o->pop() isn't followed by unset($l[$n['i']]). Since both $o and $l represent the same set, they should be kept in sync.
According to Wikipedia the closed set is only used if the heuristic is monotone (and I think a distance heuristic is monotone), and in that case, once a node is added to the closed set, it is never visited again. This code seems to implement some other pseudocode, which does remove nodes from the closed set. I think this defeats the purpose of the closed set, and the first condition in the inner loop should be
if (isset($c[$j]) || isset($l[$j]) && isset($m) && $m['g'] <= $n['g']+$w)
Then we can remove the unset($c[$j]).
$m['g'] in this condition should be the g-score of the current neighbour indexed by $j. But $m has whatever value is left over from the previous loop: the node corresponding to $j on a previous iteration.
What we need is a way to find a node and its g-score by node index. We can store the node in the $l array: instead of $l[$j] = TRUE we do $l[$j] = $m and the above condition becomes
if (isset($c[$j]) || isset($l[$j]) && $l[$j]['g'] <= $n['g']+$w)
Now the tricky bit. If the node we just found is not in the open set, we add it there (that's the $o->push and $l[$j] =).
However, if it is in the open set we just found a better path to it, so we must update it. The code doesn't do this and it's tricky because the priority queue doesn't provide a routine for increasing the priority of an element. However, we can rebuild the priority queue completely and the last bit of code in the inner loop becomes
if (! isset($l[$j])) {
   $o->push($m, -$m['f']);
   $l[$j] = $m; // add a new element
} else {
   $l[$j] = $m; // replace existing element
   $o = new PQueue();
   foreach ($l as $m)
      $o->push($m, -$m['f']);
}
This is not terribly efficient, but it's a starting point. Changing an element in a priority queue isn't efficient anyway, because you first have to find it.
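An alternative worth mentioning (not part of the original code) is lazy deletion: instead of rebuilding the queue, always push the improved node, and when popping, discard entries that have become stale, i.e. whose g-score is worse than the best one recorded in $l for that index. A sketch:
// on finding a better path, just queue the improved node again
$l[$j] = $m;
$o->push($m, -$m['f']);
// when popping, skip stale entries
$n = $o->pop();
while (! $o->isEmpty() && isset($l[$n['i']]) && $l[$n['i']]['g'] < $n['g']) {
    $n = $o->pop();
}
This trades some extra queue entries for never having to search or rebuild the heap.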
Even without these changes the algorithm does find a path, just not the best path. You can see it in the mentioned mazes:
In the crazy maze, on the third inner circuit: the upper path taken around it is slightly longer than the lower path would have been, because of the obstacles on the left.
In the big maze, in the upper-right part of the path, there's an unnecessary loop upward.
Since this was on my mind, I implemented my own version of the algorithm and posted it in an answer to your previous question.

Project Euler Question 10

I'm attempting to solve Project Euler problem 10 in PHP and running into a problem with my for-loop conditions inside the while loop. Could someone point me in the right direction? Am I on the right track here?
The problem, by the way, is to find the sum of all prime numbers below 2,000,000.
Other note: the problem I'm encountering is that the script seems to be a memory hog, and besides implementing the sieve, I'm not sure how else to approach this. So I'm wondering if I did something wrong in the implementation.
<?php
// The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.
// Additional information:
// Sum below 100: 1060
// 1000: 76127
// (for testing)
// Find the sum of all the primes below 2,000,000.

// First, let's set n = 2 million, the number we wish to find
// the primes under.
$n = 2000000;
// Then, let's set p = 2, the first prime number.
$p = 2;
// Now, let's create a list of all numbers from p to n.
$list = range($p, $n);
// Now the loop for the Sieve of Eratosthenes.
// Also, let $i = 0 for a counter.
$i = 0;
while ($p * $p < $n)
{
    // Strike off all multiples of p less than or equal to n
    for ($k = 0; $k < $n; $k++)
    {
        if ($list[$k] % $p == 0)
        {
            unset($list[$k]);
        }
    }
    // Re-initialize array
    sort($list);
    // Find first number on list after p. Let that equal p.
    $i = $i + 1;
    $p = $list[$i];
}
echo array_sum($list);
?>
You can make a major optimization to your middle loop.
for ($k = 0; $k < $n; $k++)
{
    if ($list[$k] % $p == 0)
    {
        unset($list[$k]);
    }
}
Begin with 2*$p and increment by $p instead of by 1. This eliminates the need for the divisibility check and reduces the total number of iterations.
for ($k = 2 * $p; $k < $n; $k += $p)
{
    if (isset($list[$k])) unset($list[$k]); // thanks matchu!
}
The suggestion above to check only odd numbers to begin with (other than 2) is a good idea as well, although since the inner loop never gets off the ground in those cases, I don't think it's that critical. I also can't help thinking the unsets are inefficient, though I'm not 100% sure about that.
Here's my solution, using a 'boolean' array for the primes rather than actually removing the elements. I like using map, filter, reduce and the like, but I figured I'd stick close to what you've done, and this might be more efficient (although longer) anyway.
$top = 2000000; // below two million, as the problem states
$plist = array_fill(2, $top - 2, 1); // indices 2 .. $top-1, all marked prime
for ($a = 2; $a <= sqrt($top) + 1; $a++)
{
    if ($plist[$a] == 1)
        for ($b = $a + $a; $b < $top; $b += $a)
        {
            $plist[$b] = 0; // strike off multiples of $a
        }
}
$sum = 0;
foreach ($plist as $k => $v)
{
    $sum += $k * $v;
}
echo $sum;
When I did this for Project Euler I used Python, as I did for most problems, but someone who used PHP along the same lines as mine claimed it ran in 7 seconds (page 2's SekaiAi, for those who can look). I don't really care for his form (putting the body of a for loop into its increment clause!), or the use of globals and the function he has, but the main points are all there. My convenient means of testing PHP runs through a server on a VMware Fusion local machine, so it's well slower; I can't really comment from experience.
I've got the code to the point where it runs, and passes on small examples (17, for instance). However, it's been 8 or so minutes, and it's still running on my machine. I suspect that this algorithm, though simple, may not be the most effective, since it has to run through a lot of numbers a lot of times. (2 million tests on your first run, 1 million on your next, and they start removing less and less at a time as you go.) It also uses a lot of memory since you're, ya know, storing a list of millions of integers.
Regardless, here's my final copy of your code, with a list of the changes I made and why. I'm not sure that it works for 2,000,000 yet, but we'll see.
EDIT: It hit the right answer! Yay!
Set memory_limit to -1 to allow PHP to take as much memory as it wants for this very special case (very, very bad idea in production scripts!)
In PHP, use % instead of mod
The inner and outer loops can't use the same variable; PHP considers them to have the same scope. Use, maybe, $j for the inner loop.
To avoid having the prime strike itself off in the inner loop, start $j at $i + 1
On the unset, you used $arr instead of $list ;)
You missed a $ on the unset, so PHP interprets $list[j] as $list['j']. Just a typo.
I think that's all I did. I ran it with some progress output, and the highest prime it's reached by now is 599, so I'll let you know how it goes :)
My strategy in Ruby on this problem was just to check whether each number under n was prime, testing divisors from 2 up to floor(sqrt(n)). It's also probably not an optimal solution and takes a while to execute, but only about a minute or two. That could be the algorithm, or it could just be Ruby being better at this sort of job than PHP :/
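For reference, here is a minimal PHP sketch of that trial-division strategy (my own translation, not the poster's Ruby code; far slower than a sieve, but simple and memory-light):
function isPrime($n) {
    if ($n < 2) return false;
    // a number is composite iff it has a divisor <= sqrt(n)
    for ($d = 2; $d * $d <= $n; $d++) {
        if ($n % $d == 0) return false;
    }
    return true;
}
$sum = 0;
for ($n = 2; $n < 2000000; $n++) {
    if (isPrime($n)) $sum += $n;
}
echo $sum;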
Final code (your script with the fixes listed above):
<?php
ini_set('memory_limit', -1);
// The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.
// Additional information:
// Sum below 100: 1060
// 1000: 76127
// (for testing)
// Find the sum of all the primes below 2,000,000.

// First, let's set n = 2 million, the number we wish to find
// the primes under.
$n = 2000000;
// Then, let's set p = 2, the first prime number.
$p = 2;
// Now, let's create a list of all numbers from p to n.
$list = range($p, $n);
// Now the loop for the Sieve of Eratosthenes.
// Also, let $i = 0 for a counter.
$i = 0;
while ($p * $p < $n)
{
    // Strike off all multiples of p less than or equal to n
    for ($j = $i + 1; $j < $n; $j++)
    {
        if ($list[$j] % $p == 0)
        {
            unset($list[$j]);
        }
    }
    // Re-initialize array
    sort($list);
    // Find first number on list after p. Let that equal p.
    $i = $i + 1;
    $p = $list[$i];
    echo "$i: $p\n";
}
echo array_sum($list);
?>

PHP Performance: Copy vs. Reference

Hey there. Today I wrote a small benchmark script to compare the performance of copying variables vs. creating references to them. I was expecting that creating references to large arrays, for example, would be significantly slower than copying the whole array. Here is my benchmark code:
<?php
$array = array();
for ($i = 0; $i < 100000; $i++) {
    $array[] = mt_rand();
}

function recursiveCopy($array, $count) {
    if ($count === 1000)
        return;
    $foo = $array;
    recursiveCopy($array, $count + 1);
}

function recursiveReference($array, $count) {
    if ($count === 1000)
        return;
    $foo = &$array;
    recursiveReference($array, $count + 1);
}

$time = microtime(1);
recursiveCopy($array, 0);
$copyTime = (microtime(1) - $time);
echo "Took " . $copyTime . "s \n";

$time = microtime(1);
recursiveReference($array, 0);
$referenceTime = (microtime(1) - $time);
echo "Took " . $referenceTime . "s \n";

echo "Reference / Copy: " . ($referenceTime / $copyTime);
The actual result I got was that recursiveReference took about 20 times (!) as long as recursiveCopy.
Can somebody explain this PHP behaviour?
PHP very likely implements copy-on-write for its arrays, meaning that when you "copy" an array, PHP doesn't do all the work of physically copying the memory until you modify one of the copies and the variables can no longer reference the same internal representation.
Your benchmarking is therefore fundamentally flawed, as your recursiveCopy function doesn't actually copy the object; if it did, you would run out of memory very quickly.
Try this: By assigning to an element of the array you force PHP to actually make a copy. You'll find you run out of memory pretty quickly as none of the copies go out of scope (and aren't garbage collected) until the recursive function reaches its maximum depth.
function recursiveCopy($array, $count) {
    if ($count === 1000)
        return;
    $foo = $array;
    $foo[9492] = 3; // Force PHP to copy the array
    recursiveCopy($array, $count + 1);
}
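To see copy-on-write in action directly, here is a small sketch using memory_get_usage() (the exact numbers vary by PHP version):
$array = range(1, 100000);
echo memory_get_usage() . "\n";
$copy = $array;                 // no physical copy yet
echo memory_get_usage() . "\n"; // roughly unchanged
$copy[0] = -1;                  // the write triggers the real copy
echo memory_get_usage() . "\n"; // jumps by roughly the size of the array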
In recursiveReference you're calling recursiveCopy... this doesn't make any sense; in this case you're calling recursiveReference just once. Correct your code, run the benchmark again, and come back with your new results.
In addition, I don't think it's useful for a benchmark to do this recursively. A better solution would be to call a function 1000 times in a loop: once with the array directly and once with a reference to that array.
You don't need to (and thus shouldn't) assign or pass variables by reference just for performance reasons. PHP does such optimizations automatically.
The test you ran is flawed because of these automatic optimizations. I ran the following test instead:
<?php
for ($i = 0; $i < 100000; $i++) {
    $array[] = mt_rand();
}

$time = microtime(1);
for ($i = 0; $i < 1000; $i++) {
    $copy = $array;
    unset($copy);
}
$duration = microtime(1) - $time;
echo "Normal Assignment and don't write: $duration<br />\n";

$time = microtime(1);
for ($i = 0; $i < 1000; $i++) {
    $copy =& $array;
    unset($copy);
}
$duration = microtime(1) - $time;
echo "Assignment by Reference and don't write: $duration<br />\n";

$time = microtime(1);
for ($i = 0; $i < 1000; $i++) {
    $copy = $array;
    $copy[0] = 0;
    unset($copy);
}
$duration = microtime(1) - $time;
echo "Normal Assignment and write: $duration<br />\n";

$time = microtime(1);
for ($i = 0; $i < 1000; $i++) {
    $copy =& $array;
    $copy[0] = 0;
    unset($copy);
}
$duration = microtime(1) - $time;
echo "Assignment by Reference and write: $duration<br />\n";
?>
This was the output:
//Normal Assignment without write: 0.00023698806762695
//Assignment by Reference without write: 0.00023508071899414
//Normal Assignment with write: 21.302103042603
//Assignment by Reference with write: 0.00030708312988281
As you can see there is no significant performance difference in assigning by reference until you actually write to the copy, i.e. when there is also a functional difference.
Generally speaking, in PHP, passing by reference is not something you'd do for performance reasons; it's something you'd do for functional reasons, i.e. because you actually want the referenced variable to be updated.
If you don't have a functional reason for passing by reference, then you should stick with regular parameter passing, because PHP handles things perfectly efficiently that way.
(That said, as others have pointed out, your example code isn't exactly doing what you think it is anyway ;))
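For completeness, a minimal sketch of the functional reason to pass by reference (the caller's variable is actually updated):
function appendBang(array &$rows) {
    foreach ($rows as &$row) {
        $row .= '!';
    }
    unset($row); // break the foreach reference, good hygiene
}
$rows = array('a', 'b');
appendBang($rows);
print_r($rows); // now contains 'a!' and 'b!'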
In the recursiveReference() function you call the recursiveCopy() function. Is that what you really intended to do?
You do nothing with the $foo variable; it was probably supposed to be used in a further method call?
Passing a variable by reference should generally save stack memory in the case of passing large objects.
recursiveReference is calling recursiveCopy.
Not that that would necessarily harm performance, but it's probably not what you're trying to do.
I'm not sure why performance is slower, but it doesn't reflect the measurement you're trying to make.
