prime generator optimization - php

I'm starting my expedition into Project Euler, and like many others I've figured I need a prime number generator. The problem is that PHP doesn't like big numbers: if I use the standard Sieve of Eratosthenes function and set the limit to 2 million, it crashes. It doesn't like creating arrays of that size. Understandable.
So now I'm trying to optimize it. One way I found is that instead of creating an array with 2 million entries, I only need 1 million (apart from 2, only odd numbers can be prime). But now it's failing because it exceeds the maximum execution time...
function getPrimes($limit) {
    $count = 0;
    for ($i = 3; $i < $limit; $i += 2) {
        $primes[$count++] = $i;
    }
    for ($n = 3; $n < $limit; $n += 2) {
        //array will be half the size of $limit
        for ($i = 1; $i < $limit/2; $i++) {
            if ($primes[$i] % $n === 0 && $primes[$i] !== $n) {
                $primes[$i] = 0;
            }
        }
    }
    return $primes;
}
The function works, but as I said, it's a bit slow... any suggestions?
One thing I've found to make it a bit faster is to switch the loop around.
$count = 0;
foreach ($primes as $value) {
    //$limitSq is the sqrt of the limit, as that is as high as I have to go
    for ($n = 3; $n <= $limitSq; $n += 2) {
        if ($value !== $n && $value % $n === 0) {
            $primes[$count] = 0;
            $n = $limitSq; //breaking the inner loop
        }
    }
    $count++;
}
In addition, by raising the time and memory limits (thanks Greg), I've finally managed to get an answer. Phew.

Without knowing much about the algorithm:
You're recalculating $limit/2 each time around the $i loop
The conditions in your if statement are evaluated left to right, so think about (or test) whether it would be faster to test $primes[$i] !== $n first.
Side note: you can use set_time_limit() to give it longer to run, and give it more memory using
ini_set('memory_limit', '128M');
Assuming your setup allows this, of course - on a shared host you may be restricted.
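For instance, a minimal sketch of the first two suggestions combined (hoisting the division out of the loop and reordering the test; the variable names mirror the question's code):

$half = $limit / 2;                       // computed once, not on every iteration
for ($i = 1; $i < $half; $i++) {
    // cheap inequality first, so the modulo is usually skipped
    if ($primes[$i] !== $n && $primes[$i] % $n === 0) {
        $primes[$i] = 0;
    }
}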

From Algorithmist's proposed solution
This is a modification of the standard Sieve of Eratosthenes. It would be highly inefficient, using up far too much memory and time, to run the standard sieve all the way up to n. However, no composite number less than or equal to n will have a factor greater than sqrt(n), so we only need to know all primes up to this limit, which is no greater than 31622 (square root of 10^9). This is accomplished with a sieve. Then, for each query, we sieve through only the range given, using our pre-computed table of primes to eliminate composite numbers.
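A rough PHP sketch of that approach (a segmented sieve; the function name, parameters and limits here are illustrative, not taken from the quoted source):

function segmentedSieve($low, $high) {
    // 1. Plain sieve for the small primes up to sqrt($high).
    $root = (int) sqrt($high);
    $isComposite = array_fill(0, $root + 1, false);
    $smallPrimes = array();
    for ($i = 2; $i <= $root; $i++) {
        if ($isComposite[$i]) continue;
        $smallPrimes[] = $i;
        for ($j = $i * $i; $j <= $root; $j += $i) {
            $isComposite[$j] = true;
        }
    }
    // 2. Sieve only the requested range using the pre-computed primes.
    $segment = array_fill(0, $high - $low + 1, true);
    foreach ($smallPrimes as $p) {
        $start = max($p * $p, (int) ceil($low / $p) * $p);   // first multiple of $p in range
        for ($j = $start; $j <= $high; $j += $p) {
            $segment[$j - $low] = false;
        }
    }
    $primes = array();
    for ($n = max($low, 2); $n <= $high; $n++) {
        if ($segment[$n - $low]) $primes[] = $n;
    }
    return $primes;
}

For example, segmentedSieve(999900, 1000000) returns the primes in that window without ever allocating an array of a million entries.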
This problem has also appeared on UVA's and Sphere's online judges. Here's how it's enunciated on Sphere.

You can use a bit field to store your sieve. That is, it's roughly identical to an array of booleans, except you pack your booleans into a large integer. For instance if you had 8-bit integers you would store 8 bits (booleans) per integer which would further reduce your space requirements.
Additionally, using a bit field allows the possibility of using bit masks to perform your sieve operation. For example, if your sieve kept all numbers (not just odd ones), you could construct a bit mask of b01010101 which you could then AND against every element in your array. For threes you could use three integers as the mask: b00100100 b10010010 b01001001.
Finally, when sieving with $n you do not need to check numbers lower than $n; in fact, you don't need to check numbers less than $n*$n-1.
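As a rough illustration of the bit-field idea in PHP (this sketch packs eight flags per byte into a string; the helper name and layout are made up for the example, and the per-prime bit masks are left out for simplicity):

function bitFieldSieve($limit) {
    // one bit per number; 0 = possibly prime, 1 = known composite
    $bits = str_repeat("\0", ($limit >> 3) + 1);
    for ($n = 2; $n * $n <= $limit; $n++) {
        if (ord($bits[$n >> 3]) & (1 << ($n & 7))) continue;  // $n already marked composite
        for ($m = $n * $n; $m <= $limit; $m += $n) {          // start at $n*$n, as noted above
            $bits[$m >> 3] = chr(ord($bits[$m >> 3]) | (1 << ($m & 7)));
        }
    }
    return $bits;  // caller must still treat 0 and 1 as non-prime
}

For $limit = 2,000,000 this needs roughly 250 KB of string storage instead of a two-million-element array.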

Once you know the number is not a prime, I would exit the inner loop. I don't know PHP, but you need a statement like break in C or last in Perl.
If that is not available, I would set a flag and test it in the inner loop's condition so the loop stops early. This should speed up execution, as you are no longer checking all $limit/2 items when the number is not a prime.
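For reference, PHP does have break; applied to the loop from the question, the inner loop would read roughly:

foreach ($primes as $value) {
    for ($n = 3; $n <= $limitSq; $n += 2) {
        if ($value !== $n && $value % $n === 0) {
            $primes[$count] = 0;
            break;  // stop testing divisors once $value is known to be composite
        }
    }
    $count++;
}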

If you want speed, don’t use PHP on this one :P
No, seriously, I really like PHP and it’s a cool language, but it’s not at all suited to such algorithms.

Related

How can I overcome the discrete nature of a numerical algorithm that is currently skipping over a certain real number inside a loop?

I have a fairly complex algorithm that performs a search where I use a $search variable in the range [0.25, 1.75].
Based on the algorithm, an "interesting" thing happens when $search is exactly 1: it hits a configuration of variables that is sometimes (but not always) the most favorable. Some of the code depends on $search being exactly 1 to produce that most favorable outcome.
More specifically, there is usually some specific value within the search range that produces the most favorable outcome, but the way my algorithm is laid out, that value is most often skipped over. Below I lay out an example where that specific value (based on other inputs and configuration) happens to be exactly 1.
The Problem
Mathematically speaking, if $search were continuous rather than discrete, I wouldn't have this problem. My problem is trying to converge on the most favorable variable configuration using a discrete search; the primary issue here is the algorithm. A secondary issue to watch out for is floating-point arithmetic, but I do not believe that is the problem just yet.
Basic Loop:
$maxPowerOut = 0;
for ($increment = 0; $increment <= 500; $increment++)
{
    //vars computed elsewhere, i.e.:
    //MIN = 0.24651533;
    //STEP = 0.00196969
    $search = MIN + STEP * $increment;
    //compute several coefficients (returns an array)
    $coeff = $this->coefficient($search);
    //design is a complex library function
    list($a, $b) = $this->design($coeff);
    $powerOut = $a * $b;
    //keep track of max power (and other params, not shown)
    if ($powerOut > $maxPowerOut)
        $maxPowerOut = $powerOut;
}
//currently prints 899.993 instead of 900 as should be expected
print "Max Power is $maxPowerOut";
Naturally, $search is almost never 1 exactly. It goes like this:
0.99569478115682
0.99866447159913
1.0016341620414
1.0046038524837
1.0075735429261
...
Note how 1 is skipped over in the above loop. For the sake of argument, let's say the most favorable position happens at 1.003000. That value (1.003000) would be skipped over as well.
Question
How can I improve, restructure, rethink, reorganize, rewrite my loop to avoid this type of problem?
A simple improvement might be to use an iterative approach:
In your current loop you search say 500 values in the interval [0.25, 1.75]. Let's say you can narrow down the optimum to the much smaller interval [0.995, 1.007] in this way. Then again divide this interval into say 500 values and repeat your loop. Repeat until you reach the desired precision.
Mathematically, you want to find the maximum within a given interval of a function f: search -> power that computes some power value for a given search parameter. Note that this is generally easier the smoother your function f is. To get a feeling for what f might look like, you can plot the function based on the values you computed in your loop.
If your function is well-behaved and is say unimodal (has only one "hump"), then for instance a simple golden section search would be effective.
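For instance, a minimal golden-section search sketch in PHP (the callable $f stands in for the coefficient/design computation and is assumed to be unimodal on the interval; the names are illustrative):

function goldenSectionMax($f, $low, $high, $tol = 1e-6) {
    $phi = (sqrt(5) - 1) / 2;   // ~0.618, the golden ratio conjugate
    while (($high - $low) > $tol) {
        $a = $high - $phi * ($high - $low);
        $b = $low  + $phi * ($high - $low);
        if ($f($a) < $f($b)) {
            $low = $a;          // the maximum lies in [$a, $high]
        } else {
            $high = $b;         // the maximum lies in [$low, $b]
        }
    }
    return ($low + $high) / 2;  // $search value at (approximately) the maximum
}

// Example with a toy unimodal function peaking at 1.0:
$best = goldenSectionMax(function ($s) { return cos($s - 1); }, 0.25, 1.75);

Unlike the fixed 500-step grid, this converges on the peak to whatever tolerance you ask for, as long as the function really has a single hump on the interval.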
Here's a quick JavaScript snippet / pseudo code, to help solve your problem. Basically your function should recursively call itself if you find that the deltas / slope have toggled from positive to negative.
function findMax(low, high) {
    var maxOut = Number.MIN_VALUE;
    // Calculate a step based on the low and high
    // Using a power of 2 since the floating point numbers are represented by binary
    var step = Math.abs((high - low) / 128);
    // we'll be tracking the deltas of two test values
    var prevDelta;
    var delta;
    // loop and check two values at a time
    for (var i = low; i <= (high - step); i += step) {
        // coef ...
        // design ...
        // for testing
        var out1 = Math.cos(i);
        var out2 = Math.cos(i + step);
        // update the max
        if (out1 > maxOut) maxOut = out1;
        if (out2 > maxOut) maxOut = out2;
        // calc delta
        delta = out2 - out1;
        if (prevDelta !== undefined) {
            // If one delta is going up and
            // another is going down...
            // Recursively call the function
            if (prevDelta > 0 && delta < 0) {
                var out3 = findMax(i - step, i + step);
                // update the max
                if (out3 > maxOut) maxOut = out3;
            }
        }
        prevDelta = delta;
    }
    return maxOut;
}
alert(findMax(-0.5, 0.5)); // returns 1
Here's the JSFiddle http://jsfiddle.net/hw5f2o1s/
The above approach won't work if the maximum lies between your initial low and low + step, because the recursion is triggered by reaching a peak then going down from it. If this happens you may have to make the step variable smaller by increasing the power of two dividing (high - low).
Floating point numbers have limited precision (they're discrete), so expect deviations.
See: http://php.net/manual/en/language.types.float.php
You can try the arbitrary precision extension
Current direction
The number 1.0 seems to be of importance, perhaps representing a default. Rework the code to include 1.0 among the $search values, either by injecting it into the same loop or as a separate iteration.
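A minimal sketch of that idea, reusing the names from the loop above (it assumes the special value lies inside the search range; everything else stays as in the original code):

$searchValues = array();
for ($increment = 0; $increment <= 500; $increment++) {
    $searchValues[] = MIN + STEP * $increment;
}
$searchValues[] = 1.0;  // inject the value of interest so it is never skipped

$maxPowerOut = 0;
foreach ($searchValues as $search) {
    $coeff = $this->coefficient($search);
    list($a, $b) = $this->design($coeff);
    $powerOut = $a * $b;
    if ($powerOut > $maxPowerOut)
        $maxPowerOut = $powerOut;
}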

PHP looping context query

A question that has always puzzled me is why people write it like the first version when the second version is smaller and easier to read. I thought it might be because PHP calculates the strlen each time it iterates. Any ideas?
FIRST VERSION
for ($i = 0, $len = strlen($key); $i < $len; $i++) {}
You can obviously use $len inside the loop and further on in the code, but what are the benefits over the following version?
SECOND VERSION
for ($i = 0; $i < strlen($key); $i++) {}
It's a matter of performance.
The second version of the loop recalculates strlen($key) on every iteration, which can slow things down; the first version caches the length in $len once, before the loop starts.
Even though it may not seem significant, you'd be surprised how much this can add up inside a tight loop.
See here for some performance benchmarks with loops.
The first version is best used if the loop is expected to have many iterations and $key won't change in the process.
The second one is best used if the loop is updating $key and you need to recalculate it, or, when recalculating it doesn't affect your performance.
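A rough way to see the difference for yourself (a micro-benchmark sketch; absolute numbers depend on your PHP build and hardware):

$key = str_repeat('x', 100000);

$t = microtime(true);
for ($i = 0, $len = strlen($key); $i < $len; $i++) {}
echo 'cached length:   ', microtime(true) - $t, "s\n";

$t = microtime(true);
for ($i = 0; $i < strlen($key); $i++) {}
echo 'strlen per pass: ', microtime(true) - $t, "s\n";

Note that strlen() itself is cheap in PHP, since strings store their length, so the gap is mostly repeated function-call overhead rather than re-scanning the string.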

Fastest way to compare numbers in php

I have a loop that needs to run a few million times; 10,967,700 to be precise. Within it, I am doing some checks including:
Number 1 is less than Number 2
Number 1 is less than or equal to Number 3
Number 4 is greater than Number 5
I'm wondering if there are any optimizations/tweaks I can perform to have these checks run faster. Or is this a ridiculous question?
Based on your snippet, I suggest the following changes:
Use a for loop instead of a foreach, as in this example:
$key = array_keys($aHash);
$size = sizeOf($key);
for ($i=0; $i<$size; $i++) $aHash[$key[$i]] .= "a";
This foreach-loop is 4.7x slower. (see the example at the end - http://www.phpbench.com/)
foreach($aHash as $key=>$val) $aHash[$key] .= "a";
For checking whether a value is set, empty() is slightly faster than isset().
Comparing with === in if/elseif statements is also faster than comparing with ==.
I hope I could help you.
(Performance Source: http://www.phpbench.com/)
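One detail worth adding about the === suggestion: besides being marginally faster, === skips PHP's type juggling entirely, which also makes the comparison more predictable (a small demonstration, not from the answer above):

var_dump(1.0 == 1);    // true: the int is converted to float before comparing
var_dump(1.0 === 1);   // false: different types, no conversion attempted
var_dump(0 == 'a');    // true on PHP 7 and earlier, false on PHP 8+
var_dump(0 === 'a');   // false on every version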

Amount of data stored in a PHP array

I'm looking for a way to measure the amount of data stored in a PHP array. I'm not talking about the number of elements in the array (which you can figure out with count($array, COUNT_RECURSIVE)), but the cumulative amount of data from all the keys and their corresponding values. For instance:
array('abc'=>123); // size = 6
array('a'=>1,'b'=>2); // size = 4
As I'm interested in the order of magnitude rather than the exact amount (I want to compare processing memory and time usage against the size of the arrays), I thought about using the following trick:
strlen(print_r($array,true));
However the amount of overhead coming from print_r varies depending on the structure of the array which doesn't give me consistent results:
echo strlen(print_r(array('abc'=>123),true)); // 27
echo strlen(print_r(array('a'=>1,'b'=>2),true)); // 35
Is there a way (ideally in a one-liner and without impacting too much performance as I need to execute this at run-time on production) to measure the amount of data stored in an array in PHP?
Does this do the trick:
<?php
$arr = array('abc'=>123);
echo strlen(implode('',array_keys($arr)).implode('',$arr));
?>
Short answer: mission impossible
You could try something like:
strlen(serialize($myArray)) // either this
strlen(json_encode($myArray)) // or this
But to approximate the true memory footprint of an array, you will have to do a lot more than that. If you're looking for a ballpark estimate, arrays take 3-8x more than their serialized version, depending on what you store in them and how many elements you have. It increases gradually, in bigger and bigger chunks as your array grows. To give you an idea of what's happening, here's an array estimation function I came up with, after many hours of trying, for one-level arrays only:
function estimateArrayFootprint($a) { // copied from one of my failed quests :(
    $size = 0;
    foreach ($a as $k => $v) {
        foreach ([$k, $v] as $x) {
            $n = strlen($x);
            do {
                if ($n > 8192) { $n = (1 + ($n >> 12)) << 12; break; }
                if ($n > 1024) { $n = (1 + ($n >>  9)) <<  9; break; }
                if ($n >  512) { $n = (1 + ($n >>  8)) <<  8; break; }
                if ($n >    0) { $n = (1 + ($n >>  5)) <<  5; break; }
            } while (0);
            $size += $n + 96;
        }
    }
    return $size;
}
So that's how easy it is... not. And again, it's not a reliable estimate; it probably depends on the PHP memory limit, the architecture, the PHP version, and a lot more. The question is how accurate you need this value to be.
Also, let's not forget that these values came from memory_get_usage(1), which is not very exact either. PHP allocates memory in big blocks in order to avoid noticeable overhead as your string/array/whatever else grows, like in a for(...) $x .= "yada" situation.
I wish I could say anything more useful.
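If a rough, runtime-measured number is enough, one option is to skip estimation and snapshot memory_get_usage() around the array's construction (a sketch; the sample data is purely illustrative):

$before = memory_get_usage();
$array = array();
for ($i = 0; $i < 10000; $i++) {
    $array['key' . $i] = str_repeat('x', 32);
}
echo memory_get_usage() - $before, " bytes\n";  // approximate footprint of $array

This only works when you control the moment the array is built, but it reflects what your PHP version actually allocates rather than a formula.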

RNG gaussian distribution

What I'm trying to do isn't exactly a Gaussian distribution, since it has a finite minimum and maximum. The idea is closer to rolling X dice and counting the total.
I currently have the following function:
function bellcurve($min = 0, $max = 100, $entropy = -1) {
    $sum = 0;
    if ($entropy < 0) $entropy = ($max - $min) / 15;
    for ($i = 0; $i < $entropy; $i++) $sum += rand(0, 15);
    return floor($sum / (15 * $entropy) * ($max - $min) + $min);
}
The idea behind the $entropy variable is to try and roll enough dice to get a more even distribution of fractional results (so that flooring it won't cause problems).
It doesn't need to be a perfect RNG, it's just for a game feature and nothing like gambling or cryptography.
However, I ran a test over 65,536 iterations of bellcurve() with no arguments, and the following graph emerged:
[Graph of the results over 65,536 iterations of bellcurve(); source: adamhaskell.net]
As you can see, there are a couple of values that are "offset", and drastically so. While overall it doesn't really affect that much (at worst it's offset by 2, and ignoring that the probability is still more or less where I want it), I'm just wondering where I went wrong.
Any additional advice on this function would be appreciated too.
UPDATE: I fixed the problem above just by using round instead of floor, but I'm still having trouble getting a good function for this. I've tried pretty much every function I can think of, including gaussian, exponential, logistic, and so on, but to no avail. The only method that has worked so far is this approximation of rolling dice, which is almost certainly not what I need...
If you are looking for a bell curve distribution, generate multiple random numbers and add them together. If you are looking for more modifiers, simply multiply them into the end result.
Generate a random bell curve number, with a bonus of 50% - 150%.
Sum(rand(0,15), rand(0,15) , rand(0,15))*(rand(2,6)/2)
Though if you're concerned about rand() not providing random enough numbers, you can use mt_rand(), which has a much better distribution (it uses the Mersenne Twister).
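A direct PHP rendering of that pseudo-code, using mt_rand() as suggested (a sketch; note that rand(2,6)/2 gives a multiplier between 1.0 and 3.0 in 0.5 steps, so adjust the bounds to the bonus range you actually want):

$roll = (mt_rand(0, 15) + mt_rand(0, 15) + mt_rand(0, 15)) * (mt_rand(2, 6) / 2);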
The main issue turned out to be that I was trying to generate a continuous bell curve based on a discrete variable. That's what caused holes and offsets when scaling the result.
The fix I used for this was: +rand(0,1000000)/1000000 - it essentially takes the whole number discrete variable and adds a random fraction to it, more or less making it continuous.
The function is now:
function bellcurve() {
    $sum = 0;
    $entropy = 6;
    for ($i = 0; $i < $entropy; $i++) $sum += rand(0, 15);
    return ($sum + rand(0, 1000000) / 1000000) / (15 * $entropy);
}
It returns a float between 0 and 1 inclusive (although those exact values are extremely unlikely), which can then be scaled and rounded as needed.
Example usage:
$damage *= bellcurve() + 0.5; // adjusts $damage by a random amount
                              // between 50% and 150%, weighted in favour of 100%
