A* implementation in PHP validation - php

This is code that I got from the site here, and I'd like to know whether this implementation of A* is correct. I have looked at it and compared it with the Wikipedia page, and it seems valid. The reason I ask is that the site says there is still a bug in this code; I tried to find it but couldn't. I would also like to change it so that it takes an origin and a destination as input parameters.
<?php
class AStarSolver
{
function solve(&$s)
{
include_once('PQueue.class.php');
$o = new PQueue();
$l = array();
$c = array();
$p = array();
$a = $s->getStartIndex();
$z = $s->getGoalIndex();
$d = $s->goalDistance($a);
$n0 = array('g'=>0, 'h'=>$d, 'i'=>$a, 'p'=>NULL, 'f'=>$d);
$o->push($n0, -$d);
$l[$a] = TRUE;
while (! $o->isEmpty())
{
$n = $o->pop();
if ($n['i'] == $z)
{
while ($n)
{
$p[] = $n['i'];
$n = $n['p'];
}
break;
}
foreach ($s->getNeighbors($n['i']) as $j => $w)
{
if ((isset($l[$j]) || isset($c[$j])) && isset($m) && $m['g'] <= $n['g']+$w)
continue;
$d = $s->goalDistance($j);
$m = array('g'=>$n['g']+$w, 'h'=>$d, 'i'=>$j, 'p'=>$n, 'f'=>$n['g']+$w+$d);
if (isset($c[$j]))
unset($c[$j]);
if (! isset($l[$j]))
{
$o->push($m, -$m['f']);
$l[$j] = TRUE;
}
}
$c[$n['i']] = $n;
}
return $p;
}
}
?>
The code for the PQueue class can be found here.

The site suggests that the bug might be in the PQueue class.
In PQueue::pop this
$j+1 < $m
is a test whether the heap node at $i has one child (at $j) or two (at $j and $j+1).
But $m here is count($h) only on the first iteration through the loop, since the --$m in the loop condition is evaluated every time.
Move that --$m next to the array_pop where it belongs, and that will be one less bug.
Now for AStarSolver.
The variables are (relative to Wikipedia pseudocode):
$o – open set as priority queue;
$l – open set as map keyed by index;
$c – closed set as map keyed by index;
$n – current node (x);
$m – neighbour node (y) ?;
$j – neighbour node index.
Problems that I see:
$n = $o->pop() isn't followed by unset($l[$n['i']]). Since both $o and $l represent the same set, they should be kept in sync.
According to Wikipedia the closed set is only used if the heuristic is monotone (and I think a distance heuristic is monotone), and in that case, once a node is added to the closed set, it is never visited again. This code seems to implement some other pseudocode, which does remove nodes from the closed set. I think this defeats the purpose of the closed set, and the first condition in the inner loop should be
if (isset($c[$j]) || isset($l[$j]) && isset($m) && $m['g'] <= $n['g']+$w)
Then we can remove the unset($c[$j]).
$m['g'] in this condition should be the g-score of the current neighbour indexed by $j. But $m has whatever value is left over from the previous loop: the node corresponding to $j on a previous iteration.
What we need is a way to find a node and its g-score by node index. We can store the node in the $l array: instead of $l[$j] = TRUE we do $l[$j] = $m and the above condition becomes
if (isset($c[$j]) || isset($l[$j]) && $l[$j]['g'] <= $n['g']+$w)
Now the tricky bit. If the node we just found is not in the open set, we add it there (that's the $o->push and $l[$j] =).
However, if it is in the open set we just found a better path to it, so we must update it. The code doesn't do this and it's tricky because the priority queue doesn't provide a routine for increasing the priority of an element. However, we can rebuild the priority queue completely and the last bit of code in the inner loop becomes
if (! isset($l[$j])) {
   $o->push($m, -$m['f']);
   $l[$j] = $m; // add a new element
} else {
   $l[$j] = $m; // replace existing element
   $o = new PQueue();
   foreach ($l as $m)
      $o->push($m, -$m['f']);
}
This is not terribly efficient, but it's a starting point. Changing an element in a priority queue isn't efficient anyway, because you first have to find it.
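As a rough illustration of the points above, here is a minimal sketch that avoids the rebuild by using PHP's built-in SplPriorityQueue with lazy deletion: a better path simply pushes a second entry for the node, and stale entries are skipped when they are popped. This is a swapped-in technique, not the linked PQueue class or the original solver. The function name aStar(), the adjacency-array graph format and the zero heuristic are assumptions made for the example, and the closed set is only safe here when the heuristic is monotone.

```php
<?php
// Sketch only: aStar() and the graph format (node => [neighbour => weight])
// are hypothetical, chosen for illustration.
function aStar(array $graph, callable $heuristic, $start, $goal): array
{
    $open = new SplPriorityQueue();   // max-heap, so f-scores are pushed negated
    $open->insert($start, -$heuristic($start));
    $g = [$start => 0];               // best known cost from $start to each node
    $parent = [$start => null];       // back-pointers for path reconstruction
    $closed = [];                     // nodes whose best cost is final

    while (!$open->isEmpty()) {
        $current = $open->extract();
        if (isset($closed[$current])) {
            continue;                 // stale duplicate left by a later, better push
        }
        if ($current === $goal) {
            $path = [];
            for ($node = $goal; $node !== null; $node = $parent[$node]) {
                $path[] = $node;
            }
            return array_reverse($path);
        }
        $closed[$current] = true;
        foreach ($graph[$current] ?? [] as $next => $weight) {
            $tentative = $g[$current] + $weight;
            if (isset($closed[$next]) || (isset($g[$next]) && $g[$next] <= $tentative)) {
                continue;             // no improvement over what we already know
            }
            $g[$next] = $tentative;
            $parent[$next] = $current;
            $open->insert($next, -($tentative + $heuristic($next)));
        }
    }
    return [];                        // goal not reachable
}

// Small made-up graph; the cheapest route from A to D goes through B and C.
$graph = [
    'A' => ['B' => 1, 'C' => 4],
    'B' => ['C' => 1, 'D' => 5],
    'C' => ['D' => 1],
    'D' => [],
];
$zero = function ($node) { return 0; };   // zero heuristic: behaves like Dijkstra
echo implode(' -> ', aStar($graph, $zero, 'A', 'D')), "\n";
```

Since origin and destination are plain parameters here, this shape also covers the change asked for in the question. The price of lazy deletion is that the queue may hold several entries per node.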
Even without these changes the algorithm does find a path, just not the best path. You can see it in the mentioned mazes:
In the crazy maze in the third inner circuit: the taken upper path around is slightly longer than the lower path would have been because of the obstacles on the left.
In the big maze in the upper-right part of the path there's an unnecessary loop up.
Since this was on my mind, I implemented my own version of the algorithm and posted it in an answer to your previous question.

Related

How can I make a loop faster?

I have an array of values that represent points on a line chart:
$temperatures = [23, 24, null, '', 25, '', '', null];
I'm using PHP4, but I think it can be answered in any language.
Array contains only numbers, nulls and empty strings.
Numbers represent temperatures, nulls mean that the instruments weren't working and empty strings represent neither (instruments are working, just not measuring anything).
Points must (in most cases) be connected, since it's a line chart.
I have a variable $gap that corresponds to each point and tells whether this point is connected to the next point. If it is set to true, then the points are not connected (false otherwise). For example, $gap for temperatures[0] must be set to false, since the line is drawn between temperatures[0] and temperatures[1] (they are both valid temperatures). $gap for temperatures[1] and temperatures[2] must be true, since a null follows. And so on.
When there is a null, $gap is always true. For numbers and empty strings it depends on what follows: if a null follows, gap is true; if a number follows, gap is false. If an empty string follows, we must check whether a null or a number comes after it and apply the previous rule accordingly. If only empty strings follow, gap is true. Here is my code, which produces correct results but is too slow:
$limit = count($temperatures);
for ($i = 0; $i < $limit; $i++) {
    $next_is_number = false;
    if (is_null($temperatures[$i])) {
        $gap = true;
    } else {
        // Look ahead for the next entry that is not an empty string
        for ($y = $i + 1; $y < $limit; $y++) {
            if (is_null($temperatures[$y])) {
                break;
            } elseif (is_numeric($temperatures[$y])) {
                $next_is_number = true;
                break;
            }
        }
        if ($next_is_number) {
            $gap = false;
        } else {
            $gap = true;
        }
    }
}
How can I speed it up?
Your code checks whether there is a gap somewhere in your line chart or not.
So once a gap is found, there is no reason to continue in the outer for-loop. Think of a chart of 1000 values: if there is a gap between the first two values, it makes no sense to continue checking the other 998 values.
Thus, the first thing I would recommend is to set $gap = false at the beginning and to leave the loop once $gap is true. You could do that either with
1.) break (not so elegant),
2.) extract your code to a method and add a return-statement or
3.) adding a condition in the for-loop. I am not familiar with php but in most languages it is possible to do it like this:
$gap = false;
$limit = count($temperatures);
for ($i = 0; $i < $limit && !$gap; $i++) {
[...]
So once $gap is true, the outer for-loop is left.
Iterate through backwards, remembering the last valid value and putting that in when you see an empty string. Then it's O(n) worst case, not O(n^2).
Alternatively, after the inner loop you can work from $y - 1 back to $i (or vice versa), setting the values of your gaps array or outputting values, and then skip past all the ones you've just done ($i = $y). This is also O(n).
Then, once you've got the algorithm as fast as you can, you can ditch PHP and write it in something like Rust or C. (I don't recall any true arrays in the language, so they're always going to be slow.)
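The backward pass described above could look roughly like the sketch below. It assumes the gaps are collected into an array with one entry per point, which the question does not show, and the function name computeGaps() is made up for the example. It is written for current PHP rather than PHP 4.

```php
<?php
// Sketch only: computeGaps() is a hypothetical helper, not code from the question.
function computeGaps(array $temperatures): array
{
    $gaps = [];
    // Is the nearest entry to the right that is not an empty string a number?
    $nextIsNumber = false;
    for ($i = count($temperatures) - 1; $i >= 0; $i--) {
        $value = $temperatures[$i];
        // A null is always a gap; otherwise it depends on what follows.
        $gaps[$i] = is_null($value) || !$nextIsNumber;
        if (is_null($value)) {
            $nextIsNumber = false;
        } elseif (is_numeric($value)) {
            $nextIsNumber = true;
        }
        // Empty string: keep the state, so the answer carries over from the right.
    }
    ksort($gaps);   // the loop filled the keys in descending order
    return $gaps;
}

$temperatures = [23, 24, null, '', 25, '', '', null];
var_dump(computeGaps($temperatures));
```

Each element is visited once, so the work is linear in the number of points regardless of how long the runs of empty strings are.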

Time it takes to assign a variable vs. assign + add

<?php
$a = microtime(true);
$num = 0;
for($i=0;$i<10000000;$i++)
{
$num = $i;
}
$b= microtime(true);
echo $b-$a;
?>
I ran this on Ubuntu 12.10 with Apache 2.
It takes approximately 0.50 seconds to run the assignment ten million times. But...
with the same code, if I replace $num = $i with
$num = $i + 10; it takes almost 1.5 times less time to execute, around 0.36 seconds consistently.
How come the simple assignment takes longer, while an assignment plus an addition of 10 takes less time?
I am by no means an expert, but here are my findings:
$s = microtime(true);
for($i=0;$i<100000000;$i++) $tmp = $i;
$t = microtime(true);
for($i=0;$i<100000000;$i++) $tmp = $i+10;
$u = microtime(true);
echo ($t-$s).chr(10).($u-$t);
Results in:
9.9528648853302
9.0821340084076
On the other hand, using a constant value for the assignment test:
$x = 0;
$s = microtime(true);
for($i=0;$i<100000000;$i++) $tmp = $x;
$t = microtime(true);
for($i=0;$i<100000000;$i++) $tmp = $x+10;
$u = microtime(true);
echo ($t-$s).chr(10).($u-$t);
Results in:
6.1365358829498
9.3231790065765
This leads me to believe that the answer has something to do with opcode caching. I honestly couldn't tell you what about it makes the difference, but as you can see, using a constant value for the assignment makes a huge difference.
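Single-run timings like the ones above are noisy, so before reading much into them it may help to repeat each measurement. The following is a small sketch of such a harness; the helper name bestTime() and the choice of taking the minimum over five runs are assumptions for illustration, and the numbers it prints will differ between machines and PHP versions.

```php
<?php
// Sketch only: bestTime() is a hypothetical helper.
// It runs $body several times and returns the fastest wall-clock time in seconds.
function bestTime(callable $body, int $runs = 5): float
{
    $best = INF;
    for ($r = 0; $r < $runs; $r++) {
        $start = microtime(true);
        $body();
        $best = min($best, microtime(true) - $start);
    }
    return $best;
}

$iterations = 1000000;

$assign = bestTime(function () use ($iterations) {
    for ($i = 0; $i < $iterations; $i++) {
        $num = $i;        // plain assignment
    }
});

$assignAdd = bestTime(function () use ($iterations) {
    for ($i = 0; $i < $iterations; $i++) {
        $num = $i + 10;   // addition followed by assignment
    }
});

printf("assign:       %.4f s\nassign + add: %.4f s\n", $assign, $assignAdd);
```

Taking the minimum rather than the mean is one common way to reduce the influence of other processes on the machine; it does not explain the difference, it only makes it easier to see whether the difference is real.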
This is just an educated guess, based on looking at the latest php source on Github, but I'd say this difference is due to function call overhead in the interpreter source.
$tmp = $i;
compiles to a single opcode ASSIGN !2, !1;, which copies one named variable's value to another named variable. In the source, the key part looks like this:
if (EXPECTED(Z_TYPE_P(variable_ptr) <= IS_BOOL)) {
/* nothing to destroy */
ZVAL_COPY_VALUE(variable_ptr, value);
zendi_zval_copy_ctor(*variable_ptr);
}
$tmp = $i + 10;
compiles to two opcodes ADD ~8 !1, 10; ASSIGN !2, ~8;, which creates a temporary variable ~8 and assigns its value to a named variable. In the source, the key part looks like this:
if (EXPECTED(Z_TYPE_P(variable_ptr) <= IS_BOOL)) {
/* nothing to destroy */
ZVAL_COPY_VALUE(variable_ptr, value);
}
Notice that there's an extra function call to zendi_zval_copy_ctor() in the first case. That function performs some bookkeeping as needed (e.g. if the original variable is a resource, it needs to make sure that resource is not freed until this new variable is gone, etc.). For a primitive type such as a number, there's nothing to do, but the function call itself introduces some overhead, which accumulates over 10 million iterations of your test. You should note that this overhead is normally negligible, because even in 10 million iterations it only accumulated to .14 seconds.
@Kolink's observation about a constant being faster can also be explained by the same function. It includes a check to avoid redundant copying if the new value is the same as the old one:
if (EXPECTED(variable_ptr != value)) {
copy_value:
// the same code that handles `$tmp = $i` above
if (EXPECTED(Z_TYPE_P(variable_ptr) <= IS_BOOL)) {
/* nothing to destroy */
ZVAL_COPY_VALUE(variable_ptr, value);
zendi_zval_copy_ctor(*variable_ptr);
} else {
/* irrelevant to the question */
}
}
So only the first assignment of $tmp = $x copies the value of $x, the following ones see that the value of $tmp would not change and skip the copying, making it faster.

What is better to use: in_array or array_unique?

I am in doubt what to use:
foreach(){
// .....
if(!in_array($view, $this->_views[$condition]))
array_push($this->_views[$condition], $view);
// ....
}
OR
foreach(){
// .....
array_push($this->_views[$condition], $view);
// ....
}
$this->_views[$condition] = array_unique($this->_views[$condition]);
UPDATE
The goal is to get an array of unique values. This can be done either by checking with in_array each time whether the value already exists, or by adding all values and calling array_unique at the end. So is there any major difference between these two approaches?
I think the second approach would be more efficient. In fact, array_unique sorts the array then scans it.
Sorting is done in N log N steps, then scanning takes N steps.
The first approach takes N^2 steps (foreach element scans all N previous elements). On big arrays, there is a very big difference.
Honestly if you're using a small dataset it does not matter which one you use. If your dataset is in the 10000s you'll most definitely want to use a hash map for this sort of thing.
This is assuming the views are a string or something, which it looks like it is.
This is typically O(n) and possibly the fastest way to deal with tracking unique values.
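$unique_views = array();
foreach ($views as $view)
{
    // Array keys are unique, so the lookup is a hash check rather than a scan
    if (!isset($unique_views[$condition][$view]))
    {
        $unique_views[$condition][$view] = true;
    }
}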
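To make the hash-map idea concrete, here is a self-contained sketch that collects values as array keys and reads them back with array_keys(). The function name uniqueViews() and the sample view names are made up for the example. It assumes the values are strings or integers, since those are the only valid PHP array keys, and note that numeric-looking strings such as '10' would come back as integers.

```php
<?php
// Sketch only: uniqueViews() is a hypothetical helper.
function uniqueViews(array $views): array
{
    $seen = [];
    foreach ($views as $view) {
        // Assigning an already existing key leaves the set of keys unchanged.
        $seen[$view] = true;
    }
    // Keys come back in first-seen order, re-indexed from 0.
    return array_keys($seen);
}

$views = ['home', 'list', 'home', 'detail', 'list'];
print_r(uniqueViews($views));
```

Unlike array_unique(), the result is already re-indexed, so no array_values() call is needed afterwards.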
TL;DR: foreach combined with if (!in_array()) is better.
Truthfully, you should not really worry about which performs better; in most cases the difference is so small that it's negligible (unless you're really doing some big-data stuff). I would suggest going with whatever seems more readable.
If you're interested, check out this script I wrote. It loops each case 100,000 times, and both take between 50 and 200 ms.
https://3v4l.org/lkTCF
Note that array_unique() keeps the original keys so to counter that we also have to wrap the result with array_values().
In case the link ever dies:
<?php
$loops = 100000;
$start = microtime(true);
for ($l = 0; $l < $loops; $l++) {
$x = [1,2,3,4,6,7,8,9];
for ($i = 0; $i <= 10; $i++) {
if (!in_array($i, $x)) {
$x[] = $i;
}
}
}
$duration = microtime(true) - $start;
echo "in_array took $duration<br>".PHP_EOL;
$start = microtime(true);
for ($l = 0; $l < $loops; $l++) {
$x = [1,2,3,4,6,7,8,9];
$x = array_values(array_unique(array_merge($x, [0,1,2,3,4,5,6,7,8,9,10])));
}
$duration = microtime(true) - $start;
echo "array_unique took $duration<br>".PHP_EOL;

Is PHP capable of caching count call inside loop?

I know the more efficient way to loop over an array is a foreach, or to store the count in a variable to avoid calling it multiple times.
But I am curious whether PHP has some kind of "caching" for something like:
for ($i=0; $i<count($myarray); $i++) { /* ... */ }
Does it have something like that and I am missing it, or does it have nothing of the sort, so that you should write:
$count=count($myarray);
for ($i=0; $i<$count; $i++) { /* ... */ }
PHP does exactly what you tell it to. The length of the array may change inside the loop, so it may be on purpose that you're calling count on each iteration. PHP doesn't try to infer what you mean here, and neither should it. Therefore the standard way to do this is:
for ($i = 0, $length = count($myarray); $i < $length; $i++)
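A small sketch of why the two forms are not interchangeable: if the array grows inside the loop, the version that calls count() in the condition sees the new length, while the version that stores the length up front does not. The function names and the appended value 99 are made up for the example.

```php
<?php
// Sketch only: both function names are hypothetical.

// count() is evaluated on every iteration, so the appended element is visited too.
function visitWithLiveCount(array $items): array
{
    $seen = [];
    for ($i = 0; $i < count($items); $i++) {
        $seen[] = $items[$i];
        if ($i === 0) {
            $items[] = 99;   // grow the array while looping
        }
    }
    return $seen;
}

// The length is read once, so the appended element is never reached.
function visitWithCachedCount(array $items): array
{
    $seen = [];
    for ($i = 0, $length = count($items); $i < $length; $i++) {
        $seen[] = $items[$i];
        if ($i === 0) {
            $items[] = 99;
        }
    }
    return $seen;
}

echo implode(',', visitWithLiveCount([1, 2, 3])), "\n";    // prints 1,2,3,99
echo implode(',', visitWithCachedCount([1, 2, 3])), "\n";  // prints 1,2,3
```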
PHP will execute the count each time the loop iterates. However, PHP does keep internal track of the array's size, so count is a relatively cheap operation. It's not as if PHP is literally counting each element in the array. But it's still not free.
Using a very simple 10 million item array doing a simple variable increment, I get 2.5 seconds for the in-loop count version, and 0.9 seconds for the count-before-loop. A fairly large difference, but not 'massive'.
edit: the code:
$x = range(1, 10000000);
$z = 0;
$start = microtime(true);
for ($i = 0; $i < count($x); $i++) {
$z++;
}
$end = microtime(true); // $end - $start = 2.5047581195831
Switching to do
$count = count($x);
for ($i = 0; $i < $count; $i++) {
and otherwise everything else the same, the time is 0.96466398239136
PHP is an imperative language, and that means it is not supposed to optimize away anything that can possibly have any effect. Given that it's also an interpreted language, it couldn't be done safely even if someone really wanted.
Plus, if you simply want to iterate over the array, you really want to use foreach. In that case, not only the count, but the whole array will be copied (and you can modify the original one as you wish). Or you can modify it in place using foreach ($arr as &$el) { $el = ... }; unset($el);. What I mean to say is that PHP (as any other language) often provides better solutions to your original problem (if you have any).

Project Euler || Question 10

I'm attempting to solve Project Euler in PHP and running into a problem with my for loop conditions inside the while loop. Could someone point me in the right direction? Am I on the right track here?
The problem, by the way, is to find the sum of all prime numbers below 2,000,000.
Other note: the problem I'm encountering is that it seems to be a memory hog, and besides implementing the sieve I'm not sure how else to approach this. So I'm wondering whether I did something wrong in the implementation.
<?php
// The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.
// Additional information:
// Sum below 100: 1060
// 1000: 76127
// (for testing)
// Find the sum of all the primes below 2,000,000.
// First, let's set n = 2 mill or the number we wish to find
// the primes under.
$n = 2000000;
// Then, let's set p = 2, the first prime number.
$p = 2;
// Now, let's create a list of all numbers from p to n.
$list = range($p, $n);
// Now the loop for Sieve of Eratosthenes.
// Also, let $i = 0 for a counter.
$i = 0;
while($p*$p < $n)
{
// Strike off all multiples of p less than or equal to n
for($k=0; $k < $n; $k++)
{
if($list[$k] % $p == 0)
{
unset($list[$k]);
}
}
// Re-initialize array
sort ($list);
// Find first number on list after p. Let that equal p.
$i = $i + 1;
$p = $list[$i];
}
echo array_sum($list);
?>
You can make a major optimization to your middle loop.
for($k=0; $k < $n; $k++)
{
if($list[$k] % $p == 0)
{
unset($list[$k]);
}
}
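Begin at 2*$p and increment by $p instead of by 1. This eliminates the divisibility check and reduces the total number of iterations. Note that this only works if each number is stored under its own value as the key (so that $list[$k] == $k) and the array is not re-indexed with sort() between passes:
for($k=2*$p; $k < $n; $k += $p)
{
    if (isset($list[$k])) unset($list[$k]); //thanks matchu!
}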
The suggestion above to check only odds to begin with (other than 2) is a good idea as well, although since the inner loop never gets off the ground for those cases I don't think it's that critical. I also can't help thinking that the unsets are inefficient, though I'm not 100% sure about that.
Here's my solution, using a 'boolean' array for the primes rather than actually removing the elements. I like using map, filter, reduce and so on, but I figured I'd stick close to what you've done, and this might be more efficient (although longer) anyway.
$top = 2000000;
// Keys 2..$top, each initially marked as prime (1)
$plist = array_fill(2, $top - 1, 1);
for ($a = 2; $a * $a <= $top; $a++)
{
    if ($plist[$a] == 1)
    {
        // Strike off every multiple of $a, starting at 2*$a
        for ($b = $a + $a; $b <= $top; $b += $a)
        {
            $plist[$b] = 0;
        }
    }
}
$sum = 0;
foreach ($plist as $k => $v)
{
    // Composites contribute 0, primes contribute their own value
    $sum += $k * $v;
}
echo $sum;
When I did this for Project Euler I used Python, as I did for most problems, but someone who used PHP along the same lines as I did claimed it ran in 7 seconds (SekaiAi on page 2 of the thread, for those who can look). I don't really care for his form (putting the body of a for loop into its increment clause!), or for the use of globals and the function he has, but the main points are all there. My convenient means of testing PHP runs through a server on a local VMware Fusion machine, so it's much slower, and I can't really comment from experience.
I've got the code to the point where it runs, and passes on small examples (17, for instance). However, it's been 8 or so minutes, and it's still running on my machine. I suspect that this algorithm, though simple, may not be the most effective, since it has to run through a lot of numbers a lot of times. (2 million tests on your first run, 1 million on your next, and they start removing less and less at a time as you go.) It also uses a lot of memory since you're, ya know, storing a list of millions of integers.
Regardless, here's my final copy of your code, with a list of the changes I made and why. I'm not sure that it works for 2,000,000 yet, but we'll see.
EDIT: It hit the right answer! Yay!
Set memory_limit to -1 to allow PHP to take as much memory as it wants for this very special case (very, very bad idea in production scripts!)
In PHP, use % instead of mod
The inner and outer loops can't use the same variable; PHP considers them to have the same scope. Use, maybe, $j for the inner loop.
To avoid having the prime strike itself off in the inner loop, start $j at $i + 1
On the unset, you used $arr instead of $list ;)
You missed a $ on the unset, so PHP interprets $list[j] as $list['j']. Just a typo.
I think that's all I did. I ran it with some progress output, and the highest prime it's reached by now is 599, so I'll let you know how it goes :)
My strategy in Ruby on this problem was just to check whether every number under n was prime, trying divisors from 2 to floor(sqrt(n)). It's probably not an optimal solution either, and it takes a while to execute, but only about a minute or two. That could be the algorithm, or that could just be Ruby being better at this sort of job than PHP :/
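For comparison, the trial-division strategy just described could be sketched in PHP as follows. This is not the Ruby code referred to above, which is not shown; the function names isPrime() and sumPrimesBelow() are made up for the example. It uses almost no memory, at the cost of more divisions than a sieve.

```php
<?php
// Sketch only: isPrime() and sumPrimesBelow() are hypothetical helpers.

// Trial division: try every divisor d with d*d <= n.
function isPrime(int $n): bool
{
    if ($n < 2) {
        return false;
    }
    for ($d = 2; $d * $d <= $n; $d++) {
        if ($n % $d === 0) {
            return false;
        }
    }
    return true;
}

// Sum of all primes strictly below $limit.
function sumPrimesBelow(int $limit): int
{
    $sum = 0;
    for ($n = 2; $n < $limit; $n++) {
        if (isPrime($n)) {
            $sum += $n;
        }
    }
    return $sum;
}

// The three test values given in the question's comments.
echo sumPrimesBelow(10), "\n";    // prints 17
echo sumPrimesBelow(100), "\n";   // prints 1060
echo sumPrimesBelow(1000), "\n";  // prints 76127
```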
Final code:
<?php
ini_set('memory_limit', -1);
// The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.
// Additional information:
// Sum below 100: 1060
// 1000: 76127
// (for testing)
// Find the sum of all the primes below 2,000,000.
// First, let's set n = 2 mill or the number we wish to find
// the primes under.
$n = 2000000;
// Then, let's set p = 2, the first prime number.
$p = 2;
// Now, let's create a list of all numbers from p to n.
$list = range($p, $n);
// Now the loop for Sieve of Eratosthenes.
// Also, let $i = 0 for a counter.
$i = 0;
while($p*$p < $n)
{
// Strike off all multiples of p less than or equal to n
for($j=$i+1; $j < $n; $j++)
{
if($list[$j] % $p == 0)
{
unset($list[$j]);
}
}
// Re-initialize array
sort ($list);
// Find first number on list after p. Let that equal p.
$i = $i + 1;
$p = $list[$i];
echo "$i: $p\n";
}
echo array_sum($list);
?>
