Weighted Random Choice In PHP

I need help getting my observed probabilities closer to the configured odds at low percentages. What I have seems to work for percentages of 1% or higher, but I need it to work with very low percentages such as 0.02% (down to 4 decimals). Anything below 1% ends up with a probability of around 1%; the results are similar whether I run 1,000 or 100,000 tests at once.
Example Results
ID | Odds    | Test Total | Test Odds
1  | 60.0000 | 301773     | 60.3546%
2  | 30.0000 | 148360     | 29.672%
3  | 9.9800  | 44897      | 8.9794%
4  | 0.0200  | 4970       | 0.994%
Function
// $values = [1,2,3,4]
// $weights = [60.0000,30.0000,9.9800,.0200]
private function getRandom($values, $weights)
{
    $count = count($values);
    $i = 0;
    $n = 0;
    $num = mt_rand(0, array_sum($weights));
    while($i < $count)
    {
        $n += $weights[$i];
        if($n >= $num)
            break;
        $i++;
    }
    return $values[$i];
}

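mt_rand returns an integer, so a weight of 0.02 only ever gets compared against whole numbers and is effectively treated the same as a weight of 1. With integer draws from 0 to 100, the last item is selected only when $num is exactly 100, which is 1 of 101 outcomes, or about 0.99%. Hence you always get around 1% for the weights which are less than 1%. Try computing $num at a finer resolution instead:
$num = mt_rand(0, array_sum($weights) * 100) / 100;
Multiplying by 100 resolves weights down to two decimals, which is enough for 0.02; for weights that use all four decimals, scale by 10000 instead.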
Demo on 3v4l.org
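A minimal sketch of the same idea taken one step further, assuming the weights carry at most four decimal places: scaling every weight to an integer first keeps the whole draw in integer arithmetic. The function name weightedChoice and the $scale parameter are illustrative and not part of the original code.

```php
<?php
// Hypothetical helper: pick one of $values with probability proportional to $weights.
// Assumes both arrays are zero-indexed lists of equal length and that
// the weights have at most log10($scale) decimal places.
function weightedChoice(array $values, array $weights, int $scale = 10000)
{
    // Convert each weight to an integer number of "tickets".
    $scaled = [];
    foreach ($weights as $weight) {
        $scaled[] = (int) round($weight * $scale);
    }
    $total = array_sum($scaled);

    // Draw a ticket number in 1..$total (both ends inclusive).
    $roll = mt_rand(1, $total);

    // Walk the cumulative totals until the drawn ticket is covered.
    $cumulative = 0;
    foreach ($scaled as $i => $tickets) {
        $cumulative += $tickets;
        if ($roll <= $cumulative) {
            return $values[$i];
        }
    }
    // Not reached when $total > 0; kept as a defensive fallback.
    return $values[count($values) - 1];
}
```

Called as weightedChoice([1, 2, 3, 4], [60.0000, 30.0000, 9.9800, 0.0200]), the last value owns 200 of 1,000,000 tickets, that is 0.02%. Drawing from 1 rather than 0 also avoids giving the first item one extra outcome.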

Related

Execution time exceeded

I am trying to find the sum of all primes below 2000000 and here is my code:
$set = 0;
for($i = 1; $i < 2000000; $i++){
if(is_prime($i)){
$set += $i;
}
}
echo $set;
is_prime is a custom function I created to check whether a number is prime. The problem is that the script takes too long to execute. Is there any way to optimize it?

Tell PHP not to time out by using set_time_limit, which takes a limit in seconds (0 means no limit):
set_time_limit(0);
Also, your loop is not efficient. A prime other than 2 cannot be even, so you should step by 2 over the odd numbers only and start $set at 2:
$set = 2;
for($i = 3; $i < 2000000; $i += 2)
Code:
<?php
set_time_limit(0);
$set = 2; // 2 is the only even prime, so it is added up front
for($i = 3; $i < 2000000; $i += 2){ // odd candidates only
if(is_prime($i)){
$set += $i;
}
}
echo $set;
?>

I think the is_prime($i)-method is the bottleneck.
You can calculate all prime numbers (offline) up to 2000000 by using a Sieve of Eratosthenes to store all primes in 2000000 bits (or if you only store the odd numbers: 1000000) that fits in the RAM and makes is_prime($i) O(1) time.
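The sieve idea can be sketched as follows. This is an illustrative version rather than the answerer's code: the name sumPrimesBelow is made up here, a PHP string serves as a compact byte array (one byte per number instead of one bit), and a 64-bit PHP build is assumed so that the sum fits in an integer.

```php
<?php
// Hypothetical helper: sum of all primes strictly below $limit,
// using a Sieve of Eratosthenes stored in a string (one byte per number).
function sumPrimesBelow(int $limit): int
{
    if ($limit <= 2) {
        return 0; // there are no primes below 2
    }

    // "\1" marks "still considered prime", "\0" marks "composite".
    $sieve = str_repeat("\1", $limit);
    $sieve[0] = "\0";
    $sieve[1] = "\0";

    // Cross out multiples of each prime, starting at its square.
    for ($i = 2; $i * $i < $limit; $i++) {
        if ($sieve[$i] === "\1") {
            for ($j = $i * $i; $j < $limit; $j += $i) {
                $sieve[$j] = "\0";
            }
        }
    }

    // Add up every index that survived the sieve.
    $sum = 0;
    for ($i = 2; $i < $limit; $i++) {
        if ($sieve[$i] === "\1") {
            $sum += $i;
        }
    }
    return $sum;
}
```

A call such as sumPrimesBelow(2000000) replaces the loop over is_prime() entirely, because the sieve has already decided primality for every number in the range.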

Defining percentage for random number

My rand(0,1) call in PHP returns 0 or 1 at random.
Can I set things up so that 30% of the calls return 0 and 70% return 1? Does PHP have a built-in function for this?

Sure.
$rand = (float)rand()/(float)getrandmax();
if ($rand < 0.3)
$result = 0;
else
$result = 1;
You can deal with arbitrary results and weights, too.
$weights = array(0 => 0.3, 1 => 0.2, 2 => 0.5);
$rand = (float)rand()/(float)getrandmax();
foreach ($weights as $value => $weight) {
if ($rand < $weight) {
$result = $value;
break;
}
$rand -= $weight;
}

You can do something like this:
$rand = (rand(0,9) > 2 ? 1 : 0);
rand(0,9) produces a random integer between 0 and 9 inclusive. Whenever that number is greater than 2 (seven of the ten possible values, so 70% of the time), you get 1, otherwise 0.
This seems the easiest solution to me. Each call yields 1 with 70% probability, but a finite series of calls will only come close to 70% ones, not hit it exactly.
That is true of any solution based on rand.

Generate a new random value between 1 and 100. If the value is 30 or less, use 0, and 1 otherwise:
$probability = rand(1, 100);
if ($probability <= 30) {
echo 0;
} else {
echo 1;
}
To test this, consider the following loop:
$arr = array();
for ($i=0; $i < 10000; $i++) {
$probability = rand(1, 100);
if ($probability <= 30) {
$arr[] = 0;
} else {
$arr[] = 1;
}
}
$c = array_count_values($arr);
echo "0 = " . $c['0'] / 10000 * 100 . "\n";
echo "1 = " . $c['1'] / 10000 * 100 . "\n";
Sample output (figures vary from run to run):
0 = 29.33
1 = 70.67

Create an array with 70% 1s and 30% 0s, then shuffle it, then pick numbers from the beginning of the array to the end :)
$num_array = array();
for($i = 0; $i < 3; $i++) $num_array[] = 0; // append three 0s
for($i = 0; $i < 7; $i++) $num_array[] = 1; // append seven 1s
shuffle($num_array);
Pros: You'll get exactly 30% 0s and 70% 1s for any such array.
Cons: Might take longer to compute than a rand()-only solution, because the initial array has to be created.

I searched for an answer to my question and this was the topic I found.
But it didn't answer my question, so I had to figure it out myself, and I did :).
I figured out that maybe this will help someone else as well.
It's regarding what you asked, but for more usage.
Basically, I use it as a "power" calculator for a random generated item (let's say a weapon). The item has a "min power" and a "max power" value in the db. And I wanted to have 80% chances to have the "power" value closer to the lower 80% of the max possible power for the item, and 20% for the highest 20% possible max power (that are stored in the db).
So, to do this I did the following:
$min = 1; // this value is normally taken from the db
$max = 30; // this value is normally taken from the db
$total_possibilities = ($max - $min) + 1;
$rand = random_int(1, 100);
if ($rand <= 80) { // 80% chance
$new_max = $max - (int) ceil($total_possibilities * 0.20); // remove 20% from the max value, so you can get a number only from the lowest 80%
$new_rand = random_int($min, $new_max);
} else { // 20% chance
$new_min = $min + (int) floor($total_possibilities * 0.80); // add 80% to the min value, so you can get a number only from the highest 20%
$new_rand = random_int($new_min, $max);
}
echo $new_rand; // this will be the final item power
The casts to int matter because random_int expects integer arguments. The only problem you can have is when the initial $min and $max are the same (or, obviously, when $max is smaller than $min): the first branch then calls random_int with a minimum greater than its maximum, which throws an error, because the arguments must be given as ($min, $max), not the other way around.
This code can be very easily changed to have more percentages for different purposes, instead of 80% and 20% to put 40%, 40% and 20% (or whatever you need). I think the code is pretty much easy to read and understand.
Sorry if this is not helpful, but I hope it is :).
It can't do any harm either way ;).

Generating random values and keeping track of their sum

I have more than 200 entries in a database table and I would like to generate a random value for each entry, but in the end, the sum of entries values must equal 100. Is it possible to do this using a for loop and rand() in PHP?

You could simply normalize a set of numbers, like:
$numbers = array();
for ($i = 0; $i < 200; $i += 1) {
$numbers[] = rand();
}
$sum = array_sum($numbers);
// divide $sum by the target sum, to have an instant result, e.g.:
// $sum = array_sum($numbers) / 100;
// $sum = array_sum($numbers) / 42;
// ...
$numbers = array_map(function ($n) use($sum) {
return $n / $sum;
}, $numbers);
print_r($numbers);
print_r(array_sum($numbers)); // ~ 1
demo: http://codepad.viper-7.com/RDOIvX

The solution to your problem is to generate a random number between 0 and $max for each record and put them in an array, then sum the values and divide that sum by $max. Finally, loop through the elements and divide every element by the result of that division; the elements then add up to $max.
$sum = 0;
$max = 100; // value the records should sum to
$nr_of_records = 200; // number of records that should sum to $max
$arr = array();
for($i=0;$i<$nr_of_records;++$i)
{
$arr[$i] = rand(0,$max);
}
$div = array_sum($arr) / $max;
for($i=0;$i<$nr_of_records;++$i)
{
$arr[$i] /= $div;
echo $arr[$i].'<br>';
}
echo array_sum($arr);
Created living example

How exact does the 100 have to be? Just curious, because all the hints end up using floating point values, which tend to be inaccurate.
I'd propose using fractions... let's say 10000 units, each counting as 1/100 of a point (10000 * 1/100 = 100 points). Distribute the 10000 units among the 200 elements using integers, and you can be absolutely sure that the sum of all integers divided by 100 is 100. There is no need for floats, just think around the corner...
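One way to sketch this integer approach, under the assumption that zero-valued entries are acceptable: draw random cut points between 0 and the number of units, sort them, and use the gaps between neighbouring cuts as the values. The function name randomPartition is hypothetical.

```php
<?php
// Hypothetical helper: split $totalUnits into $parts non-negative integers
// whose sum is exactly $totalUnits. Assumes $parts >= 1.
function randomPartition(int $parts, int $totalUnits): array
{
    // Draw $parts - 1 random cut points in 0..$totalUnits.
    $cuts = [];
    for ($i = 0; $i < $parts - 1; $i++) {
        $cuts[] = random_int(0, $totalUnits);
    }
    sort($cuts);
    $cuts[] = $totalUnits; // closing cut so the gaps cover the whole range

    // Each value is the gap between two neighbouring cuts.
    $values = [];
    $previous = 0;
    foreach ($cuts as $cut) {
        $values[] = $cut - $previous;
        $previous = $cut;
    }
    return $values;
}

// 200 entries in hundredths of a point: the integers sum to exactly 10000,
// so the displayed values (unit / 100) sum to exactly 100.
$units = randomPartition(200, 10000);
```

Only the final display step divides by 100, so no rounding error accumulates in the stored values.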

Do a little over/under:
$size = 200;
$sum = 100;
$places = 3;
$base = round($sum/$size, $places);
$values = array_fill(0, $size, $base);
for($i=0; $i<$size; $i+=2) {
$diff = round((rand()/getrandmax()) * $base, $places);
$values[$i] += $diff;
$values[$i+1] -= $diff;
}
//optional: shuffle($values);
$sum = 0;
foreach($values as $item) {
printf("%0.3f ", $item);
$sum += $item;
}
echo $sum;
Output:
0.650 0.350 0.649 0.351 0.911 0.089 0.678 0.322 0.566 0.434 0.563 0.437 0.933 0.067 0.505 0.495 0.503 0.497 0.752 0.248 0.957 0.043 0.856 0.144 0.977 0.023 0.863 0.137 0.766 0.234 0.653 0.347 0.770 0.230 0.888 0.112 0.637 0.363 0.716 0.284 0.891 0.109 0.549 0.451 0.629 0.371 0.501 0.499 0.652 0.348 0.729 0.271 0.957 0.043 0.769 0.231 0.767 0.233 0.513 0.487 0.647 0.353 0.612 0.388 0.509 0.491 0.925 0.075 0.797 0.203 0.799 0.201 0.588 0.412 0.788 0.212 0.693 0.307 0.688 0.312 0.847 0.153 0.903 0.097 0.843 0.157 0.801 0.199 0.538 0.462 0.954 0.046 0.541 0.459 0.893 0.107 0.592 0.408 0.913 0.087 0.711 0.289 0.679 0.321 0.816 0.184 0.781 0.219 0.632 0.368 0.839 0.161 0.568 0.432 0.914 0.086 0.991 0.009 0.979 0.021 0.666 0.334 0.678 0.322 0.705 0.295 0.683 0.317 0.869 0.131 0.837 0.163 0.792 0.208 0.618 0.382 0.606 0.394 0.574 0.426 0.927 0.073 0.661 0.339 0.986 0.014 0.759 0.241 0.547 0.453 0.804 0.196 0.681 0.319 0.960 0.040 0.708 0.292 0.558 0.442 0.605 0.395 0.986 0.014 0.621 0.379 0.992 0.008 0.622 0.378 0.937 0.063 0.884 0.116 0.840 0.160 0.607 0.393 0.765 0.235 0.632 0.368 0.898 0.102 0.946 0.054 0.794 0.206 0.561 0.439 0.801 0.199 0.770 0.230 0.843 0.157 0.681 0.319 0.794 0.206 100
The rounding gets a bit squiffy if you're not using nice numbers like 100 and 200, but never more than 0.1 off.

Original question yesterday had exactly 200 entries and the sum "not greater than 100".
My original answer from yesterday:
Use random numbers not greater than 0.5 to be sure.
Alternatively, depending on how "random" those numbers need to be (how
much correlation is allowed), you could keep a running total, and if
it gets disproportionately high, you can mix in a bunch of smaller
values.
Edit:
Way to go changing the question, making me look stupid and get downvoted.
To get the exact sum you have to normalize, and better use exact fractions instead of floats to avoid rounding errors.

displaying axis from min to max value - calculating scale and labels

Writing a routine to display data on a horizontal axis (using PHP gd2, but that's not the point here).
The axis starts at $min to $max and displays a diamond at $result, such an image will be around 300px wide and 30px high, like this:
(source: testwolke.de)
In the example above, $min=0, $max=3, $result=0.6.
Now, I need to calculate a scale and labels that make sense, in the above example e.g. dotted lines at 0 .25 .50 .75 1 1.25 ... up to 3, with number-labels at 0 1 2 3.
If $min=-200 and $max=600, dotted lines should be at -200 -150 -100 -50 0 50 100 ... up to 600, with number-labels at -200 -100 0 100 ... up to 600.
With $min=.02 and $max=5.80, dotted lines at .02 .5 1 1.5 2 2.5 ... 5.5 5.8 and numbers at .02 1 2 3 4 5 5.8.
I tried explicitly telling the function where to put dotted lines and numbers by arrays, but hey, it's the computer who's supposed to work, not me, right?!
So, how to calculate???

An algorithm (example values $min=-186 and $max=+153 as limits):
1. Take the two limits $min and $max and mark them if you wish.
2. Calculate the difference between $max and $min: $diff = $max - $min
153 - (-186) = 339
3. Calculate the base-10 logarithm of the difference: $base10 = log($diff, 10) = 2.5302
4. Round down: $power = floor($base10) = 2.
This is the power of ten used as the base unit.
5. To calculate $step:
$base_unit = pow(10, $power) = 100;
$step = $base_unit / 2; (if you want 2 ticks per $base_unit).
6. Check whether $min is divisible by $step; if not, take the nearest multiple of $step above it
(in the case of $step = 50 it is $loop_start = -150)
7. for ($i = $loop_start; $i <= $max; $i += $step) { // the $i values are your ticks
}
I tested it in Excel and it gives quite nice results. You may want to extend it,
for example (in point 5) by calculating $step first from $diff,
say $step = $diff / 4, and rounding $step in such a way that $base_unit is divisible by $step;
this avoids situations where you get four ticks with $step=25 between 101 and 201 but 39 ticks with $step=25 between 0 and 999.
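The steps above can be sketched in PHP roughly like this. The function name axisTicks and the $ticksPerUnit parameter are illustrative, and the tick positions are computed from an index rather than by repeated addition so that floating point error does not accumulate.

```php
<?php
// Hypothetical helper following the steps above: returns the tick positions
// between $min and $max. Assumes $max > $min and $ticksPerUnit >= 1.
function axisTicks(float $min, float $max, int $ticksPerUnit = 2): array
{
    $diff = $max - $min;
    if ($diff <= 0) {
        return []; // nothing sensible to draw
    }

    // Power of ten just below the range, e.g. 339 -> 2 -> base unit 100.
    $power = (int) floor(log10($diff));
    $step = pow(10, $power) / $ticksPerUnit;

    // First multiple of $step that is not below $min.
    $start = ceil($min / $step) * $step;

    // Number of whole steps that fit between $start and $max
    // (the tiny epsilon guards against 5.999999 instead of 6).
    $count = (int) floor(($max - $start) / $step + 1e-9);

    $ticks = [];
    for ($k = 0; $k <= $count; $k++) {
        $ticks[] = $start + $k * $step;
    }
    return $ticks;
}
```

For the example limits, axisTicks(-186, 153) starts at -150 and steps by 50. The end points $min and $max themselves can be drawn separately, as step 1 suggests.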

ACM Algorithm 463 provides three simple functions to produce good axis scales with outputs xminp, xmaxp and dist for the minimum and maximum values on the scale and the distance between tick marks on the scale, given a request for n intervals that include the data points xmin and xmax:
Scale1() gives a linear scale with approximately n intervals and dist being an integer power of 10 times 1, 2 or 5.
Scale2() gives a linear scale with exactly n intervals (the gap between xminp and xmaxp tends to be larger than the gap produced by Scale1()).
Scale3() gives a logarithmic scale.
The original 1973 paper is online here, which provides more explanation than the code linked to above.
The code is in Fortran but it is just a set of arithmetical calculations so it is very straightforward to interpret and convert into other languages. I haven't written any PHP myself, but it looks a lot like C so you might want to start by running the code through f2c which should give you something close to runnable in PHP.
There are more complicated functions that give prettier scales (e.g. the ones in gnuplot), but Scale1() would likely do the job for you with minimal code.
(This answer builds on my answer to a previous question Graph axis calibration in C++)
(EDIT -- I've found an implementation of Scale1() that I did in Perl):
use strict;
use warnings;
use POSIX qw(log10);

sub scale1 {
    # from TOMS 463
    # returns a suitable scale ($xMinp, $xMaxp, $dist), when called with
    # the minimum and maximum x values, and an approximate number of intervals
    # to divide into. $dist is the size of each interval that results.
    # @vInt is an array of acceptable values for $dist.
    # @sqr is an array of geometric means of adjacent values of @vInt, which
    # is used as break points to determine which @vInt value to use.
    #
    my ($xMin, $xMax, $n) = @_;
    my @vInt = (1, 2, 5, 10);
    my @sqr = (1.414214, 3.162278, 7.071068);
    if ($xMin > $xMax) {
        ($xMin, $xMax) = ($xMax, $xMin);
    }
    my $del = 0.0002; # accounts for computer round-off
    my $fn = $n;
    # find approximate interval size $size
    my $size = ($xMax - $xMin) / $fn;
    my $al = log10($size);
    my $nal = int($al);
    $nal-- if $size < 1;
    # $size is scaled into a variable named $mant, between 1 and 10
    my $mant = $size / 10**$nal;
    # the closest permissible value for $mant is found
    my $i;
    for ($i = 0; $i < @sqr; $i++) {
        last if $mant < $sqr[$i];
    }
    # the interval size is computed
    my $dist = $vInt[$i] * 10**$nal;
    my $fm1 = $xMin / $dist;
    my $m1 = int($fm1);
    $m1-- if $fm1 < 0;
    $m1++ if abs(($m1 + 1.0) - $fm1) < $del;
    # the new minimum and maximum limits are found
    my $xMinp = $dist * $m1;
    my $fm2 = $xMax / $dist;
    my $m2 = int($fm2 + 1);
    $m2-- if $fm2 < -1;
    $m2-- if abs($fm2 + 1 - $m2) < $del;
    my $xMaxp = $dist * $m2;
    # adjust limits to account for round-off if necessary
    $xMinp = $xMin if $xMinp > $xMin;
    $xMaxp = $xMax if $xMaxp < $xMax;
    return ($xMinp, $xMaxp, $dist);
}

sub scale1_Test {
    my @par = (-3.1, 11.1, 5,
               5.2, 10.1, 5,
               -12000, -100, 9);
    print "xMin\txMax\tn\txMinp\txMaxp,dist\n";
    for (my $i = 0; $i < @par / 3; $i++) {
        my ($xMinp, $xMaxp, $dist) = scale1($par[3*$i+0],
            $par[3*$i+1], $par[3*$i+2]);
        print "$par[3*$i+0]\t$par[3*$i+1]\t$par[3*$i+2]\t$xMinp\t$xMaxp,$dist\n";
    }
}
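Since the question is about PHP, here is a rough PHP port of the Perl routine above. It is a sketch of Scale1 as transcribed in this answer, not a verified reproduction of the published Fortran; variable names follow the Perl version where possible, and $size and $mant stand in for the interval size and its value scaled to between 1 and 10.

```php
<?php
// Sketch of Scale1 (TOMS 463) ported from the Perl code above.
// Returns [$xMinp, $xMaxp, $dist]; assumes $xMin != $xMax and $n >= 1.
function scale1(float $xMin, float $xMax, int $n): array
{
    $vInt = [1, 2, 5, 10];                    // acceptable multipliers for $dist
    $sqr  = [1.414214, 3.162278, 7.071068];   // geometric means used as break points
    if ($xMin > $xMax) {
        [$xMin, $xMax] = [$xMax, $xMin];
    }
    $del = 0.0002; // tolerance for round-off

    // Approximate interval size and its power of ten.
    $size = ($xMax - $xMin) / $n;
    $nal = (int) log10($size);
    if ($size < 1) {
        $nal--;
    }

    // Scale the interval size to between 1 and 10 and pick the closest multiplier.
    $mant = $size / pow(10, $nal);
    $i = 0;
    while ($i < count($sqr) && $mant >= $sqr[$i]) {
        $i++;
    }
    $dist = $vInt[$i] * pow(10, $nal);

    // New lower limit: a multiple of $dist at or below $xMin.
    $fm1 = $xMin / $dist;
    $m1 = (int) $fm1;
    if ($fm1 < 0) {
        $m1--;
    }
    if (abs(($m1 + 1.0) - $fm1) < $del) {
        $m1++;
    }
    $xMinp = $dist * $m1;

    // New upper limit: a multiple of $dist at or above $xMax.
    $fm2 = $xMax / $dist;
    $m2 = (int) ($fm2 + 1);
    if ($fm2 < -1) {
        $m2--;
    }
    if (abs($fm2 + 1 - $m2) < $del) {
        $m2--;
    }
    $xMaxp = $dist * $m2;

    // Make sure round-off never pushes the limits inside the data range.
    if ($xMinp > $xMin) {
        $xMinp = $xMin;
    }
    if ($xMaxp < $xMax) {
        $xMaxp = $xMax;
    }
    return [$xMinp, $xMaxp, $dist];
}
```

With the first test triple from the Perl code, scale1(-3.1, 11.1, 5) yields a scale from -4 to 12 in steps of 2.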

I know that this isn't exactly what you are looking for, but hopefully it will get you started in the right direction.
$min = -200;
$max = 600;
$difference = $max - $min;
$labels = 10;
$picture_width = 300;
/* Get units per label */
$difference_between = $difference / ($labels - 1);
$width_between = $picture_width / $labels;
/* Make the label array */
$label_arr = array();
$label_arr[] = array('label' => $min, 'x_pos' => 0);
/* Loop through the number of labels */
for($i = 1, $l = $labels; $i < $l; $i++) {
$label = $min + ($difference_between * $i);
$label_arr[] = array('label' => $label, 'x_pos' => $width_between * $i);
}

A quick example would be something along the lines of $increment = ($max-$min)/$scale, where you can tweak $scale to control the size of the increment. Since you divide by it, the increment changes proportionately as your max and min values change. After that you will have a loop like:
$last_value = $min;
$end = false;
while($end==false){
$breakpoint = $last_value + $increment; // that's your current breakpoint
if($breakpoint > $max){
$end = true;
}
$last_value = $breakpoint; // advance, otherwise the loop never ends
}
At least that's the concept... Let me know if you have trouble with it.

Project Euler #23: Non-abundant sums

I'm struggling with Project Euler problem 23: Non-abundant sums.
I have a script, that calculates abundant numbers:
function getSummOfDivisors( $number )
{
$divisors = array ();
for( $i = 1; $i < $number; $i ++ ) {
if ( $number % $i == 0 ) {
$divisors[] = $i;
}
}
return array_sum( $divisors );
}
$limit = 28123;
//$limit = 1000;
$matches = array();
$k = 0;
while( $k <= ( $limit/2 ) ) {
if ( $k < getSummOfDivisors( $k ) ) {
$matches[] = $k;
}
$k++;
}
echo '<pre>'; print_r( $matches );
I checked those numbers against the ones available on the internet, and they are correct. I can multiply those by 2 and get a number that is the sum of two abundant numbers.
But since I need to find all numbers that cannot be written like that, I just reverse the if statement like this:
if ( $k >= getSummOfDivisors( $k ) )
This should now store all numbers that cannot be created as the sum of two abundant numbers, but something is not quite right here. When I sum them up I get a number that is not even close to the right answer.
I don't want to see the answer, but I need some guidelines / tips on what I am doing wrong (or what I am missing or misunderstanding).
EDIT: I also tried in the reverse order, meaning, starting from the top, dividing by 2 and checking if those are abundant. It still comes out wrong.

An error in your logic lies in the line:
"I can multiply those by 2 and get the number that is the sum of two abundant numbers"
You first determine all the abundant numbers [n1, n2, n3....] below the analytically proven limit. It is then true to state that all integers [2*n1, 2*n2,....] are the sum of two abundant numbers but n1+n2, and n2+n3 are also the sum of two abundant numbers. Therein lies your error. You have to calculate all possible integers that are the sum of any two numbers from [n1, n2, n3....] and then take the inverse to find the integers that are not.

"I checked those numbers with the available on the internet already, and they are correct. I can multiply those by 2 and get the number that is the sum of two abundant numbers."
No, that's not right. There is only one abundant number <= 16, but the numbers <= 32 that can be written as the sum of abundant numbers are 24 (= 12 + 12), 30 (= 12 + 18), 32 (= 12 + 20).
If you have k numbers, there are k*(k+1)/2 ways to choose two (not necessarily different) of them. Often, a lot of these pairs will have the same sum, so in general there are much fewer than k*(k+1)/2 numbers that can be written as the sum of two of the given k numbers, but usually, there are more than 2*k.
Also, there are many numbers <= 28123 that can be written as the sum of abundant numbers only with one of the two abundant numbers larger than 28123/2.
"This should now store all, that cannot be created as the sum of two abundant numbers,"
No, that would store the non-abundant numbers, those may or may not be the sum of abundant numbers, e.g. 32 is a deficient number (sum of all divisors except 32 is 31), but can be written as the sum of two abundant numbers (see above).
You need to find the abundant numbers, but not only to half the given limit, and you need to check which numbers can be written as the sum of two abundant numbers. You can do that by taking all pairs of two abundant numbers (<= $limit) and mark the sum, or by checking $number - $abundant until you either find a pair of abundant numbers or determine that none sums to $number.
There are a few number theoretic properties that can speed it up greatly.
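The "mark every pair sum" approach described above might look like the following sketch. The function name nonAbundantSum is made up for illustration, and the divisor sums are computed with a sieve-style loop, which is one possible speed-up and not necessarily the one the answer has in mind.

```php
<?php
// Hypothetical helper: sum of all positive integers up to $limit that
// cannot be written as the sum of two abundant numbers.
function nonAbundantSum(int $limit = 28123): int
{
    // Sum of proper divisors for every n up to $limit, sieve style:
    // each divisor $d is added to all of its multiples above itself.
    $divisorSum = array_fill(0, $limit + 1, 0);
    for ($d = 1; $d <= intdiv($limit, 2); $d++) {
        for ($m = 2 * $d; $m <= $limit; $m += $d) {
            $divisorSum[$m] += $d;
        }
    }

    // Collect ALL abundant numbers up to the limit, not just up to half of it.
    $abundant = [];
    for ($n = 1; $n <= $limit; $n++) {
        if ($divisorSum[$n] > $n) {
            $abundant[] = $n;
        }
    }

    // Mark every sum of two (not necessarily different) abundant numbers.
    $isPairSum = array_fill(0, $limit + 1, false);
    $count = count($abundant);
    for ($i = 0; $i < $count; $i++) {
        for ($j = $i; $j < $count; $j++) {
            $sum = $abundant[$i] + $abundant[$j];
            if ($sum > $limit) {
                break; // the list is ascending, so later sums are larger still
            }
            $isPairSum[$sum] = true;
        }
    }

    // Add up everything that was never marked.
    $total = 0;
    for ($n = 1; $n <= $limit; $n++) {
        if (!$isPairSum[$n]) {
            $total += $n;
        }
    }
    return $total;
}
```

Using a boolean array indexed by the sum avoids building a huge list of sums and comparing it against another list, which is typically where most of the time goes in slower versions.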

The PHP code below took about 320 seconds to run:
<?php
set_time_limit(0);
ini_set('memory_limit', '2G');
$time_start = microtime(true);
$abundantNumbers = array();
$sumOfTwoAbundantNumbers = array();
$totalNumbers = array();
$limit = 28123;
for ($i = 1; $i <= $limit; $i++) {
$totalNumbers[] = $i; // 1..23 cannot be sums of two abundant numbers either, so they must be counted
if (isAbundant($i)) {
$abundantNumbers[] = $i;
}
}
$countOfAbundantNumbers = count($abundantNumbers);
for ($j = 0; $j < $countOfAbundantNumbers; $j++) {
if (($abundantNumbers[$j] * 2) > $limit)
break; // even the smallest sum involving this number exceeds the limit
for ($k = $j; $k < $countOfAbundantNumbers; $k++) { // start $k at $j to avoid double addition like 1+2, 2+1
$l = $abundantNumbers[$j] + $abundantNumbers[$k];
$sumOfTwoAbundantNumbers[] = $l;
}
}
$numbers = array_diff($totalNumbers, $sumOfTwoAbundantNumbers);
echo '<pre>';print_r(array_sum($numbers));
$time_end = microtime(true);
$execution_time = ($time_end - $time_start);
//execution time of the script
echo '<br /><b>Total Execution Time:</b> ' . $execution_time . 'seconds';
exit;
function isAbundant($n) {
if ($n % 12 == 0 || $n % 945 == 0) { //first even and odd abundant number. a multiple of abundant number is also abundant
return true;
}
$k = round(sqrt($n));
$sum = 1;
if ($n >= 1 && $n <= 28123) {
for ($i = 2; $i <= $k; $i++) {
if ($n % $i == 0)
$sum+= $i + ( $n / $i);
if ($n / $i == $i) {
$sum = $sum - $i;
}
}
}
return $sum > $n;
}
