I'm attempting to solve a Project Euler problem in PHP and am running into a problem with my for loop conditions inside the while loop. Could someone point me in the right direction? Am I on the right track here?
The problem, by the way, is to find the sum of all prime numbers below 2,000,000.
Another note: the script seems to be a memory hog, and apart from implementing the sieve, I'm not sure how else to approach this. So I'm wondering whether I did something wrong in the implementation.
<?php
// The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.
// Additional information:
// Sum below 100: 1060
// 1000: 76127
// (for testing)
// Find the sum of all the primes below 2,000,000.
// First, let's set n = 2 mill or the number we wish to find
// the primes under.
$n = 2000000;
// Then, let's set p = 2, the first prime number.
$p = 2;
// Now, let's create a list of all numbers from p to n.
$list = range($p, $n);
// Now the loop for Sieve of Eratosthenes.
// Also, let $i = 0 for a counter.
$i = 0;
while($p*$p < $n)
{
// Strike off all multiples of p less than or equal to n
for($k=0; $k < $n; $k++)
{
if($list[$k] % $p == 0)
{
unset($list[$k]);
}
}
// Re-initialize array
sort ($list);
// Find first number on list after p. Let that equal p.
$i = $i + 1;
$p = $list[$i];
}
echo array_sum($list);
?>
You can make a major optimization to your middle loop.
for($k=0; $k < $n; $k++)
{
if($list[$k] % $p == 0)
{
unset($list[$k]);
}
}
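Instead, begin at 2*$p and increment by $p rather than by 1. This eliminates the need for the divisibility check and reduces the total number of iterations. Note that this assumes the array is keyed by the numbers themselves (so that $list[$k] holds $k), which is not the case for range(2, $n), whose keys start at 0.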
for($k=2*$p; $k < $n; $k += $p)
{
if (isset($list[$k])) unset($list[$k]); //thanks matchu!
}
The suggestion above to check only odd numbers to begin with (other than 2) is a good idea as well, although since the inner loop never gets off the ground for those cases, I don't think it's that critical. I also can't help thinking the unsets are inefficient, though I'm not 100% sure about that.
Here's my solution, using a 'boolean' array for the primes rather than actually removing the elements. I like using map, filter, reduce and the like, but I figured I'd stick close to what you've done, and this might be more efficient (although longer) anyway.
$top = 2000000;
// Keys 2..$top, each initially marked as prime (1).
$plist = array_fill(2, $top - 1, 1);
for ($a = 2 ; $a <= sqrt($top)+1; $a++)
{
if ($plist[$a] == 1)
for ($b = ($a+$a) ; $b <= $top; $b+=$a)
{
$plist[$b] = 0;
}
}
$sum = 0;
foreach ($plist as $k=>$v)
{
$sum += $k*$v;
}
echo $sum;
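As a minimal sketch of the same flag-array idea, wrapped in a function so it can be checked against the small test sums from the question (17, 1060, 76127): the function name `sumPrimesBelow` is hypothetical, and starting the strike-out at `$p * $p` rather than `2 * $p` is an extra shortcut not used in the code above.

```php
<?php
// Sketch only: sum of all primes strictly below $limit, using a flag array
// keyed by the number itself. sumPrimesBelow is a hypothetical name.
function sumPrimesBelow(int $limit): int
{
    if ($limit < 3) {
        return 0; // there are no primes below 2
    }
    // Keys 2..$limit-1, each initially flagged as prime.
    $isPrime = array_fill(2, $limit - 2, true);
    for ($p = 2; $p * $p < $limit; $p++) {
        if ($isPrime[$p]) {
            // Smaller multiples of $p were already struck by smaller primes.
            for ($m = $p * $p; $m < $limit; $m += $p) {
                $isPrime[$m] = false;
            }
        }
    }
    $sum = 0;
    foreach ($isPrime as $number => $flag) {
        if ($flag) {
            $sum += $number;
        }
    }
    return $sum;
}

echo sumPrimesBelow(10), "\n";   // 17
echo sumPrimesBelow(100), "\n";  // 1060
echo sumPrimesBelow(1000), "\n"; // 76127
```

Nothing is ever removed from the array, so there is no need to re-sort or re-index it between passes.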
When I did this for Project Euler I used Python, as I did for most problems. But someone who used PHP along the same lines as I did claimed it ran in 7 seconds (SekaiAi's post on page 2, for those who can look). I don't really care for his form (putting the body of a for loop into its increment clause!), or for the use of globals and the function he has, but the main points are all there. My convenient means of testing PHP runs through a server on a local VMware Fusion machine, so it's considerably slower, and I can't really comment from experience.
I've got the code to the point where it runs, and passes on small examples (17, for instance). However, it's been 8 or so minutes, and it's still running on my machine. I suspect that this algorithm, though simple, may not be the most effective, since it has to run through a lot of numbers a lot of times. (2 million tests on your first run, 1 million on your next, and they start removing less and less at a time as you go.) It also uses a lot of memory since you're, ya know, storing a list of millions of integers.
Regardless, here's my final copy of your code, with a list of the changes I made and why. I'm not sure that it works for 2,000,000 yet, but we'll see.
EDIT: It hit the right answer! Yay!
Set memory_limit to -1 to allow PHP to take as much memory as it wants for this very special case (very, very bad idea in production scripts!)
In PHP, use % instead of mod
The inner and outer loops can't use the same variable; PHP considers them to have the same scope. Use, maybe, $j for the inner loop.
To avoid having the prime strike itself off in the inner loop, start $j at $i + 1
On the unset, you used $arr instead of $list ;)
You missed a $ on the unset, so PHP interprets $list[j] as $list['j']. Just a typo.
I think that's all I did. I ran it with some progress output, and the highest prime it's reached by now is 599, so I'll let you know how it goes :)
My strategy in Ruby on this problem was just to check whether every number under n was prime, testing divisors from 2 up to floor(sqrt(n)). It's probably not an optimal solution either, and it takes a while to execute, but only about a minute or two. That could be the algorithm, or that could just be Ruby being better at this sort of job than PHP :/
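For comparison, here is a rough PHP sketch of that trial-division strategy. It is a translation of the idea only (the original was written in Ruby), and the helper name `isPrimeByTrialDivision` is made up for illustration.

```php
<?php
// Hypothetical helper: trial division by every d with 2 <= d <= sqrt($n).
function isPrimeByTrialDivision(int $n): bool
{
    if ($n < 2) {
        return false;
    }
    for ($d = 2; $d * $d <= $n; $d++) {
        if ($n % $d === 0) {
            return false; // found a divisor, so $n is composite
        }
    }
    return true;
}

// Sum the primes below 1000, one of the test values from the question.
$sum = 0;
for ($n = 2; $n < 1000; $n++) {
    if (isPrimeByTrialDivision($n)) {
        $sum += $n;
    }
}
echo $sum, "\n"; // 76127
```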
Final code:
<?php
ini_set('memory_limit', -1);
// The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.
// Additional information:
// Sum below 100: 1060
// 1000: 76127
// (for testing)
// Find the sum of all the primes below 2,000,000.
// First, let's set n = 2 mill or the number we wish to find
// the primes under.
$n = 2000000;
// Then, let's set p = 2, the first prime number.
$p = 2;
// Now, let's create a list of all numbers from p to n.
$list = range($p, $n);
// Now the loop for Sieve of Eratosthenes.
// Also, let $i = 0 for a counter.
$i = 0;
while($p*$p < $n)
{
// Strike off all multiples of p less than or equal to n
for($j=$i+1; $j < $n; $j++)
{
if($list[$j] % $p == 0)
{
unset($list[$j]);
}
}
// Re-initialize array
sort ($list);
// Find first number on list after p. Let that equal p.
$i = $i + 1;
$p = $list[$i];
echo "$i: $p\n";
}
echo array_sum($list);
?>
I'm reading the chapter "Shortest path using the Floyd-Warshall algorithm" in "PHP 7 Data Structures and Algorithms".
The author generates a graph with this code:
$totalVertices = 5;
$graph = [];
for ($i = 0; $i < $totalVertices; $i++) {
for ($j = 0; $j < $totalVertices; $j++) {
$graph[$i][$j] = $i == $j ? 0 : PHP_INT_MAX;
}
}
I don't understand this line:
$graph[$i][$j] = $i == $j ? 0 : PHP_INT_MAX;
It looks like a one-line if statement.
Is it the same as this?
if ($i == $j) {
$graph[$i][$j] = 0;
} else {
$graph[$i][$j] = PHP_INT_MAX;
}
What is the point of using PHP_INT_MAX?
In the end, what does the graph look like?
You've correctly understood the ternary (? :) operator.
To answer the other part of your question, see whether the following makes sense to you.
First:
The author initializes the $graph array using the following code:
<?php
$totalVertices = 5; // total nodes (use 0, 1, 2, 3, and 4 instead of A, B, C, D, and E, respectively)
$graph = [];
for ($i = 0; $i < $totalVertices; $i++) {
for ($j = 0; $j < $totalVertices; $j++) {
$graph[$i][$j] = $i == $j ? 0 : PHP_INT_MAX;
}
}
This results in a 5 × 5 matrix (shown as a figure in the original answer).
All the entries on the main diagonal (grey in the figure) are set to 0, as a node's distance to itself equals 0.
All the remaining entries in the 'matrix' are set to PHP_INT_MAX (the largest integer supported); we'll see why in a minute.
Second:
The author then sets the distances between the nodes that have a direct connection(edges), writing them manually to the $graph array, as follows:
$graph[0][1] = $graph[1][0] = 10;
$graph[2][1] = $graph[1][2] = 5;
$graph[0][3] = $graph[3][0] = 5;
$graph[3][1] = $graph[1][3] = 5;
$graph[4][1] = $graph[1][4] = 10;
$graph[3][4] = $graph[4][3] = 20;
This results in the following 'matrix' stored in array $graph (green: edge distances):
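A small sketch that rebuilds the matrix from the two snippets above and prints it, with PHP_INT_MAX shown as INF for readability. The `$lines` variable and the INF label are illustrative choices, not part of the book's code.

```php
<?php
$totalVertices = 5; // nodes 0..4 stand for A..E
$graph = [];
for ($i = 0; $i < $totalVertices; $i++) {
    for ($j = 0; $j < $totalVertices; $j++) {
        $graph[$i][$j] = $i == $j ? 0 : PHP_INT_MAX;
    }
}
// Direct connections (edges), entered symmetrically.
$graph[0][1] = $graph[1][0] = 10;
$graph[2][1] = $graph[1][2] = 5;
$graph[0][3] = $graph[3][0] = 5;
$graph[3][1] = $graph[1][3] = 5;
$graph[4][1] = $graph[1][4] = 10;
$graph[3][4] = $graph[4][3] = 20;

// Render one text line per row, printing PHP_INT_MAX as "INF".
$lines = [];
foreach ($graph as $row) {
    $cells = [];
    foreach ($row as $distance) {
        $cells[] = $distance === PHP_INT_MAX ? 'INF' : (string) $distance;
    }
    $lines[] = implode(' ', $cells);
}
echo implode("\n", $lines), "\n";
// 0 10 INF 5 INF
// 10 0 5 5 10
// INF 5 0 INF INF
// 5 5 INF 0 20
// INF 10 INF 20 0
```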
So why does the author use PHP_INT_MAX for the nodes that are not directly connected (the non-edges)?
The reason is that it allows the algorithm to work with edge distances up to and including PHP_INT_MAX.
In this particular example, any number smaller than 20 instead of PHP_INT_MAX in the ternary would distort the outcome of the algorithm: it would produce wrong results.
Or, to look at it another way, in this particular example the author could have used any number bigger than 20 instead of PHP_INT_MAX and still gotten correct results, because no shortest path between two nodes in this graph is longer than 20 (which is also the biggest distance between two directly connected nodes).
You can give it a try, and test:
$graph[$i][$j] = $i == $j ? 0 : 19;
The algorithm will now tell us that the shortest distance from A to E, i.e. $graph[0][4], equals 19, which is wrong: the correct value is 20.
So using PHP_INT_MAX here gives 'leeway': it allows the algorithm to work correctly with edge distances smaller than or equal to 9223372036854775807 (the largest int that can be stored on a 64-bit system), or 2147483647 (on a 32-bit system).
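To try the experiment, here is a hedged sketch that runs a plain triple-loop Floyd-Warshall over the example graph with a configurable "no edge" value. The function name `shortestDistances` and the `$edges` list are hypothetical; the book's own implementation may be structured differently.

```php
<?php
// Hypothetical helper: all-pairs shortest distances for the example graph,
// using $noEdge as the initial distance between unconnected nodes.
function shortestDistances(int $noEdge): array
{
    $totalVertices = 5;
    $graph = [];
    for ($i = 0; $i < $totalVertices; $i++) {
        for ($j = 0; $j < $totalVertices; $j++) {
            $graph[$i][$j] = $i == $j ? 0 : $noEdge;
        }
    }
    // Same edges as above, as [from, to, distance].
    $edges = [[0, 1, 10], [1, 2, 5], [0, 3, 5], [1, 3, 5], [1, 4, 10], [3, 4, 20]];
    foreach ($edges as [$u, $v, $w]) {
        $graph[$u][$v] = $graph[$v][$u] = $w;
    }
    // Floyd-Warshall: allow each node $k in turn as an intermediate stop.
    for ($k = 0; $k < $totalVertices; $k++) {
        for ($i = 0; $i < $totalVertices; $i++) {
            for ($j = 0; $j < $totalVertices; $j++) {
                // PHP turns an overflowing int sum into a float, so a sum
                // involving PHP_INT_MAX never compares as smaller here.
                $viaK = $graph[$i][$k] + $graph[$k][$j];
                if ($viaK < $graph[$i][$j]) {
                    $graph[$i][$j] = $viaK;
                }
            }
        }
    }
    return $graph;
}

echo shortestDistances(PHP_INT_MAX)[0][4], "\n"; // 20
echo shortestDistances(19)[0][4], "\n";          // 19 (wrong)
```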
You have two questions here.
The first is regarding the syntax condition ? val_if_true : val_if_false. This is called the "ternary operator". Your assessment regarding the behavior is correct.
The second is regarding the use of PHP_INT_MAX. All distances between two nodes are being initialized to one of two values: 0 if i and j are the same node, and PHP_INT_MAX if they are different nodes. That is, a node's distance to itself is 0 and a node's distance to any other node is the largest integer value PHP recognizes. The reason for this is that the Floyd-Warshall algorithm uses "infinity" to represent minimum distances that have not yet been calculated; since PHP integers have no infinite value (the INF constant is a float), PHP_INT_MAX is used as a stand-in for it.
I have a for loop in my code. I haven't changed anything in this part of the code for about 5-6 days, and I never had problems with it.
Since yesterday, whenever I reload my code, it always gives me this error:
Maximum execution time of 30 seconds exceeded - in LogController.php line 270
I can't explain why, but maybe one of you could look over it.
This is my code around line 270:
$topten_sites = [];
for ($i = 0; $i <= count($sites_array); $i++) {
if ($i < 10) { // this is 270
$topten_sites[] = $sites_array[$i];
}
}
$topten_sites = collect($topten_sites)->sortByDesc('number')->all();
As I said, it worked perfectly, so why does it give me an error? If I comment out these lines and every other line that uses the $topten_sites array, the code works again.
This looks wrong:
for ($i = 0; $i <= $sites_array; $i++) {
if ($i < 10) { // this is 270
$topten_sites[] = $sites_array[$i];
}
}
If $sites_array is an array, it makes no sense to compare it to an integer so you probably have a never-ending loop.
If you just need the first 10 elements in another array, you can replace your loop with:
$topten_sites = array_slice($sites_array, 0, 10);
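A quick sketch of how array_slice behaves here, using made-up sample arrays (`$many` and `$few` are illustrative names). Unlike an index loop that always runs to 10, it simply returns fewer elements when the source array is shorter.

```php
<?php
// Illustrative data only: one array longer than 10 items, one shorter.
$many = range(1, 25);
$few  = [1, 2, 3];

$firstTenOfMany = array_slice($many, 0, 10); // elements 1..10
$firstTenOfFew  = array_slice($few, 0, 10);  // just the 3 available elements

echo count($firstTenOfMany), "\n"; // 10
echo count($firstTenOfFew), "\n";  // 3
```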
Why would you iterate over the entire array if you only want the first 10 results?
for ($i = 0; $i < 10; $i++) {
$topten_sites[] = $sites_array[$i];
}
To answer the actual question: code never stops working "for no reason". Code works or it doesn't, and both happen for a reason. If it stops working, something changed compared to your previous tests.
"Sometimes it works, sometimes it doesn't" falls under the same logic. Code behaves exactly the same every time; it is some of the parameters that have changed, and you have to find out which one.
In your case, I'm guessing the number of entries in your array has increased. PHP and arrays aren't best friends when it comes to speed. It could very well be that your array was smaller when you tested it (and probably wasn't the fastest to begin with), but with the current number of entries it now hits the threshold of 30 seconds.
It could also be that a part of the code before this bit takes a lot of time (say, suddenly 28 seconds instead of 20), so your loop (which never changed), doing its job in the usual 3 seconds, now runs into problems.
Use it like this:
$topten_sites = [];
for ($i = 0; $i < 10; $i++) {
$topten_sites[] = $sites_array[$i];
}
$topten_sites = collect($topten_sites)->sortByDesc('number')->all();
I found this perfect answer for Codility's PermMissingElem question.
function solution($A) {
$N = count($A);
$sum = ($N + 2) * ($N + 1) / 2;
for($i = 0; $i < $N; $i++){
$sum -= $A[$i];
}
return intval($sum);
}
However, I'm puzzled by the formula used for $sum. What kind of formula is this? It's amazingly correct, yet how could someone come up with it? Can anyone reverse-engineer the thinking process?
I really want to know how it came about.
Thank you!
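The sum of the integers from 1 to N can be calculated with this formula:
N(N+1)/2
Basically, you take the first number and the last number and add them together, then the second number and the second-to-last number, and so on. Every pair adds up to N + 1, and there are N/2 such pairs (for even N; the formula also holds for odd N).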
For example:
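The sum of 1 to 100:
(1+100) + (2+99) + (3+98) + (4+97) ...
= (100/2)(101)
= 50 x 101
= 5050
In your function, the array holds N of the numbers from 1 to N+1, so the formula is applied with N+1 in place of N: (N+1)(N+2)/2. Subtracting every element of the array from that total leaves the missing number.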
Here's a good explanation:
http://www.wikihow.com/Sum-the-Integers-from-1-to-N
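To make the connection to the function in the question concrete, here is a small sketch of the same idea with the intermediate value named. The variable name `$expected`, and the use of intdiv and array_sum, are my own choices rather than part of the original solution.

```php
<?php
// Sketch: $A holds N distinct integers taken from 1..N+1, one value missing.
function solution(array $A): int
{
    $N = count($A);
    // Sum of 1..(N+1): the formula n(n+1)/2 with n = N + 1.
    $expected = intdiv(($N + 1) * ($N + 2), 2);
    // Whatever is missing from the actual sum is the missing element.
    return $expected - array_sum($A);
}

echo solution([2, 3, 1, 5]), "\n"; // 4  (expected 15, actual 11)
echo solution([]), "\n";           // 1  (only 1 could be missing)
```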
I have this relatively complex combinations and permutations code that I have to execute from the CLI. The code takes a number as a command-line parameter and then outputs a list of all permutations of all unique combinations. The arrays are stored as strings of numbers separated by spaces.
The code works fine if N is 6 or less, and even with 7. However, when I pass N = 8 the code simply freezes: it stops and doesn't move on.
How can I fix this so that N can be 8?
N will never be larger than 8, but the code must be able to execute with 8.
Here is the code:
for ($i=0; count($list) < $nop; $i++) {
shuffle($array);
$tmp = implode(' ', $array);
if (!isset($list[$tmp])) {
$list[$tmp] = 1;
}
}
Thanks for all advice in advance.
for ($i=0; count($list) < $nop; $i++) {
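can cause an infinite loop. $nop never changes inside the loop, and $list only grows when shuffle() happens to produce an ordering that has not been stored yet. So if $nop is larger than the number of distinct orderings of $array, count($list) < $nop stays true forever and the loop never ends.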
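One way to avoid depending on random shuffles is to enumerate the orderings deterministically. The following is only a sketch of that alternative, not the asker's code: the generator `permutations` is a hypothetical helper, and it assumes a zero-indexed list as input.

```php
<?php
// Hypothetical helper: yields every ordering of $items exactly once.
// Assumes $items is a plain list (keys 0..n-1).
function permutations(array $items): Generator
{
    if (count($items) <= 1) {
        yield $items;
        return;
    }
    foreach ($items as $index => $item) {
        // Fix $item in front, then permute the remaining elements.
        $rest = $items;
        unset($rest[$index]);
        foreach (permutations(array_values($rest)) as $tail) {
            yield array_merge([$item], $tail);
        }
    }
}

// Build the same kind of keyed list as in the question.
$list = [];
foreach (permutations([1, 2, 3]) as $perm) {
    $list[implode(' ', $perm)] = 1;
}
echo count($list), "\n"; // 6
```

For 8 elements this visits 8! = 40320 orderings once each and then stops, regardless of the target count.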
I have to compare two very large sets of values. I put them in arrays, but it didn't work. Below is the code I use. Is this the most efficient way? I have set the time and memory limits to unlimited as well. Chrome shows this error: Error 101 (connection reset), unknown error.
for ($k = 0; $k < sizeof($pid); $k++) {
$out = 0;
for ($m = 0; $m < sizeof($oid); $m++) {
if ($pid[$k] == $oid[$m]) // $pid have 300000 indexes
//and $oid have about 500000 indexes
{
$out++;
}
}
if ($out) {
echo "OID for ID ".$pid[$k]." = ".$out;
echo "<br>";
}
}
Doesn't work how? Won't give you an answer? You're comparing every possible pair. How many combinations is that? 300,000 × 500,000 = 1.5 × 10^11. That could take something like an hour on a modern machine, if you don't run out of memory first.
A more efficient way would be to sort them first: N log N + M log M + N + M time, instead of N*M time.
Sorting a list of size x with a comparison sort takes x*log(x) time. Then you can walk from the front of each list once, confident that if there are any matches you will find them. This takes linear time.
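A minimal sketch of the sort-then-walk approach, assuming the IDs are plain integers. The function name `countMatches` is made up, and unlike the original loop it reports each distinct value of $pid once rather than once per duplicate.

```php
<?php
// Hypothetical helper: for each distinct value in $pid, count how often
// it occurs in $oid. Assumes integer IDs.
function countMatches(array $pid, array $oid): array
{
    $pid = array_values(array_unique($pid));
    sort($pid);
    sort($oid);

    $counts = [];
    $m = 0;
    $oidCount = count($oid);
    foreach ($pid as $id) {
        // Skip $oid entries smaller than the current ID.
        while ($m < $oidCount && $oid[$m] < $id) {
            $m++;
        }
        // Count the run of $oid entries equal to the current ID.
        $out = 0;
        while ($m < $oidCount && $oid[$m] == $id) {
            $out++;
            $m++;
        }
        if ($out > 0) {
            $counts[$id] = $out;
        }
    }
    return $counts;
}

foreach (countMatches([3, 1, 7], [7, 3, 3, 9, 1, 3]) as $id => $out) {
    echo "OID for ID $id = $out\n";
}
// OID for ID 1 = 1
// OID for ID 3 = 3
// OID for ID 7 = 1
```

Each position in $oid is visited at most once after sorting, which is where the linear N + M term comes from.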