Weighted random pick - php

I have a set of items. I need to randomly pick one. The problem is that they each have a weight of 1-10. A weight of 2 means that the item is twice as likely to be picked as an item with a weight of 1. A weight of 3 is three times as likely.
I currently fill an array with each item. If the weight is 3, I put three copies of the item in the array. Then, I pick a random item.
My method is fast, but uses a lot of memory. I am trying to think of a faster method, but nothing comes to mind. Anyone have a trick for this problem?
EDIT: My Code...
Apparently, I wasn't clear. I do not want to use (or improve) my code. This is what I did.
//Given an array $a where $a[0] is an item name and $a[1] is the weight from 1 to 100.
$b = array();
foreach($a as $t)
$b = array_merge($b, array_fill(0,$t[1],$t));
$item = $b[array_rand($b)];
This required me to check every item in $a and uses max_weight/2*size of $a memory for the array. I wanted a COMPLETELY DIFFERENT algorithm.
Further, I asked this question in the middle of the night using a phone. Typing code on a phone is nearly impossible because those silly virtual keyboards simply suck. It auto-corrects everything, ruining any code I type.
And yet further, I woke up this morning with an entirely new algorithm that uses virtually no extra memory at all and does not require checking every item in the array. I posted it as an answer below.

This one's your huckleberry.
$arr = array(
array("val" => "one", "weight" => 1),
array("val" => "two", "weight" => 2),
array("val" => "three", "weight" => 3),
array("val" => "four", "weight" => 4)
);
$weight_sum = 0;
foreach($arr as $val)
{
$weight_sum += $val['weight'];
}
$r = rand(1, $weight_sum);
print "random value is $r\n";
for($i = 0; $i < count($arr); $i++)
{
if($r <= $arr[$i]['weight'])
{
print "$r <= {$arr[$i]['weight']}, this is our match\n";
print $arr[$i]['val'] . "\n";
break;
}
else
{
print "$r > {$arr[$i]['weight']}, subtracting weight\n";
$r -= $arr[$i]['weight'];
print "new \$r is $r\n";
}
}
No need to generate arrays containing an item for every weight, and no need to fill an array with n elements for a weight of n. Just generate a random number between 1 and the total weight, then loop through the array until you find a weight that is greater than or equal to your random number. If the weight is smaller than the number, subtract that weight from the random number and continue.
Sample output:
# php wr.php
random value is 8
8 > 1, subtracting weight
new $r is 7
7 > 2, subtracting weight
new $r is 5
5 > 3, subtracting weight
new $r is 2
2 <= 4, this is our match
four
This should also support fractional weights.
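The code above draws with rand(1, $weight_sum), which returns integers, so fractional weights would need a floating-point draw instead. Here is a minimal sketch of that variation, assuming the same $arr layout as above; pickFractional is a hypothetical name, not something from the original answer.

```php
<?php
// Hypothetical helper: weighted pick that also accepts fractional weights.
// Assumes each item is an array with 'val' and a positive 'weight'.
function pickFractional(array $items): string
{
    $total = 0.0;
    foreach ($items as $item) {
        $total += $item['weight'];
    }
    // Uniform float in [0, $total)
    $r = mt_rand() / (mt_getrandmax() + 1) * $total;
    foreach ($items as $item) {
        if ($r < $item['weight']) {
            return $item['val'];
        }
        $r -= $item['weight'];
    }
    // Floating-point rounding can leave a tiny remainder; fall back to the last item.
    $last = end($items);
    return $last['val'];
}

$arr = array(
    array("val" => "half", "weight" => 0.5),
    array("val" => "two and a half", "weight" => 2.5),
);
print pickFractional($arr) . "\n";
```

The subtraction loop is the same as in the integer version; only the random draw changes.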
Modified version that uses an array keyed by weight, rather than by item:
$arr2 = array(
);
for($i = 0; $i <= 500000; $i++)
{
$weight = rand(1, 10);
$num = rand(1, 1000);
$arr2[$weight][] = $num;
}
$start = microtime(true);
$weight_sum = 0;
foreach($arr2 as $weight => $vals) {
$weight_sum += $weight * count($vals);
}
print "weighted sum is $weight_sum\n";
$r = rand(1, $weight_sum);
print "random value is $r\n";
$found = false;
$elem = null;
foreach($arr2 as $weight => $vals)
{
if($found) break;
for($j = 0; $j < count($vals); $j ++)
{
if($r <= $weight) // <= so that $r == $weight_sum still selects the last element
{
$elem = $vals[$j];
$found = true;
break;
}
else
{
$r -= $weight;
}
}
}
$end = microtime(true);
print "random element is: $elem\n";
print "total time is " . ($end - $start) . "\n";
With sample output:
# php wr2.php
weighted sum is 2751550
random value is 345713
random element is: 681
total time is 0.017189025878906
The measurement is hardly scientific, and it fluctuates depending on where in the array the element falls (obviously), but it seems fast enough for huge datasets.

This way requires two random calculations, but they should be faster and require about 1/4 of the memory, at the cost of some accuracy if the weights have disproportionate item counts. (See the Update below for increased accuracy at the cost of some memory and processing.)
Store a multidimensional array where each item is stored in an array based on its weight:
$array[$weight][] = $item;
// example: Item with a weight of 5 would be $array[5][] = 'Item'
Generate a new array with the weights (1-10) appearing n times for n weight:
foreach($array as $n=>$null) {
for ($i=1;$i<=$n;$i++) {
$weights[] = $n;
}
}
The above array would be something like: [ 1, 2, 2, 3, 3, 3, 4, 4, 4, 4 ... ]
First calculation: Get a random weight from the weighted array we just created
$weight = $weights[mt_rand(0, count($weights)-1)];
Second calculation: Get a random key from that weight array
$value = $array[$weight][mt_rand(0, count($array[$weight])-1)];
Why this works: You solve the weighted issue by using the weighted array of integers we created. Then you select randomly from that weighted group.
Update: Because of the possibility of disproportionate counts of items per weight, you could add another loop and array for the counts to increase accuracy.
foreach($array as $n=>$null) {
$counts[$n] = count($array[$n]);
}
foreach($array as $n=>$null) {
// Calculate proportionate weight (number of items in this weight opposed to minimum counted weight)
$proportion = $n * ($counts[$n] / min($counts));
for ($i=1; $i<=$proportion; $i++) {
$weights[] = $n;
}
}
What this does: if you have 2000 10's and 100 1's, it adds 200 10's (20 * 10: 20 because that weight has 20x the count, and 10 because it is weighted 10) instead of 10 10's, which makes the weights array proportionate to how many items each weight has relative to the minimum count. So, to be accurate, instead of adding one entry for EVERY possible key, you are just being proportionate based on the MINIMUM count of weights.
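The same two-step idea can also be made exact without building a $weights array at all, by giving each weight group a share of weight * count. This is a swapped-in variation rather than the method above; pickFromGroups is a hypothetical name and $groups is assumed to have the $array[$weight][] = $item layout.

```php
<?php
// Hypothetical helper: $groups maps weight => list of items with that weight.
// Assumes positive integer weights.
function pickFromGroups(array $groups)
{
    // Each group's share of the total is weight * number of items in it.
    $total = 0;
    foreach ($groups as $weight => $items) {
        $total += $weight * count($items);
    }
    if ($total < 1) {
        return null; // nothing to pick from
    }
    // First calculation: choose a group in proportion to its share.
    $r = mt_rand(1, $total);
    foreach ($groups as $weight => $items) {
        $share = $weight * count($items);
        if ($r <= $share) {
            // Second calculation: uniform choice inside the chosen group.
            return $items[mt_rand(0, count($items) - 1)];
        }
        $r -= $share;
    }
    return null; // not reached, because $r never exceeds $total
}

$array = [1 => ['Rare item'], 5 => ['Item A', 'Item B']];
print pickFromGroups($array) . "\n";
```

The loop runs over at most one entry per distinct weight (10 here), so the extra memory stays constant.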

I greatly appreciate the answers above. Please consider this answer, which does not require checking every item in the original array.
// Given $a as an array of items
// where $a[0] is the item name and $a[1] is the item weight.
// It is known that weights are integers from 1 to 100.
for($i=0; $i<sizeof($a); $i++) // Safeguard described below
{
$item = $a[array_rand($a)];
if(rand(1,100)<=$item[1]) break;
}
This algorithm only requires storage for two variables ($i and $item) as $a was already created before the algorithm kicked in. It does not require a massive array of duplicate items or an array of intervals.
In a best-case scenario, this algorithm will touch one item in the original array and be done. In a worst-case scenario, it will touch n items in an array of n items (not necessarily every item in the array as some may be touched more than once).
If there was no safeguard, this could run forever. The safeguard is there to stop the algorithm if it simply never picks an item. When the safeguard is triggered, the last item touched is the one selected. However, in millions of tests using random data sets of 100,000 items with random weights of 1 to 10 (changing rand(1,100) to rand(1,10) in my code), the safeguard was never hit.
I made histograms comparing the frequency of items selected among my original algorithm, the ones from answers above, and the one in this answer. The differences in frequencies are trivial - easy to attribute to variances in the random numbers.
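For anyone who wants to repeat that kind of comparison, here is a rough sketch of a tally harness. It wraps the loop above in a function; rejectionPick and tally are hypothetical names, and $maxWeight stands in for the hard-coded 100.

```php
<?php
// The loop above wrapped in a function; $maxWeight is the largest possible weight.
function rejectionPick(array $a, int $maxWeight)
{
    $item = null;
    for ($i = 0; $i < count($a); $i++) { // safeguard: at most count($a) attempts
        $item = $a[array_rand($a)];
        if (mt_rand(1, $maxWeight) <= $item[1]) {
            break;
        }
    }
    return $item;
}

// Count how often each item name is selected over $trials picks.
function tally(array $a, int $maxWeight, int $trials): array
{
    $counts = [];
    foreach ($a as $item) {
        $counts[$item[0]] = 0;
    }
    for ($t = 0; $t < $trials; $t++) {
        $picked = rejectionPick($a, $maxWeight);
        $counts[$picked[0]]++;
    }
    return $counts;
}
```

Dividing each count by its item's weight should give roughly equal numbers if the selection is proportional. Note that with very small arrays the safeguard is hit often enough to skew the result.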
EDIT... It is apparent to me that my algorithm may be combined with the algorithm pala_ posted, removing the need for a safeguard.
In pala_'s algorithm, a list is required, which I call an interval list. To simplify, you begin with a random_weight that is rather high. You step down the list of items and subtract the weight of each one until your random_weight falls to zero (or less). Then, the item you ended on is your item to return. There are variations on this interval algorithm that I've tested and pala_'s is a very good one. But, I wanted to avoid making a list. I wanted to use only the given weighted list and never touch all the items. The following algorithm merges my use of random jumping with pala_'s interval list. Instead of a list, I randomly jump around the list. I am guaranteed to get to zero eventually, so no safeguard is needed.
// Given $a as the weighted array (described above)
$weight = rand(1,100); // The bigger this is, the slower the algorithm runs.
while($weight>0)
{
$item = $a[array_rand($a)];
$weight-= $item[1];
}
// $item is the random item you want.
I wish I could select both pala_ and this answer as the correct answers.

I'm not sure if this is "faster", but I think it may be more balanced between memory usage and speed.
The idea is to transform your current implementation (an array of roughly 500000 copies) into an array the same length as the item set (100000 entries), where each key is the lowest position the item occupied in the original array and each value is the item's index in the set:
<?php
$set=[["a",3],["b",5]];
$current_implementation=["a","a","a","b","b","b","b","b"];
// 0=>0 means the lowest "position" 0
// points to 0 in the set;
// 3=>1 means the lowest "position" 3
// points to 1 in the set;
$my_implementation=[0=>0,3=>1];
Then randomly pick a number between 0 and the highest position (the lowest position of the last element plus its weight, minus 1):
// 3 is the lowest position of the last element ("b")
// and 5 the weight of that last element
$my_implementation_pick=mt_rand(0,3+5-1);
Full code:
<?php
function randomPickByWeight(array $set)
{
$low=0;
$high=0;
$candidates=[];
foreach($set as $key=>$item)
{
$candidates[$high]=$key;
$high+=$item["weight"];
}
$pick=mt_rand($low,$high-1);
while(!array_key_exists($pick,$candidates))
{
$pick--;
}
return $set[$candidates[$pick]];
}
$cache=[];
for($i=0;$i<100000;$i++)
{
$cache[]=["item"=>"item {$i}","weight"=>mt_rand(1,10)];
}
$time=time();
for($i=0;$i<100;$i++)
{
print_r(randomPickByWeight($cache));
}
$time=time()-$time;
var_dump($time);
3v4l.org demo
3v4l.org limits execution time, so the demo didn't finish there. On my laptop (i7-4700HQ) the above demo finished in 10 seconds.
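If many picks are made from the same set, one possible refinement is to build the running totals once and binary-search them, instead of rebuilding $candidates and stepping $pick down on every call. This is a sketch of a different lookup, not the code above; buildCumulative and pickByBinarySearch are hypothetical names, and $set is assumed to be a list indexed 0..n-1 with positive integer weights.

```php
<?php
// Hypothetical helper: precompute running totals once, reuse them for many picks.
function buildCumulative(array $set): array
{
    $cumulative = [];
    $sum = 0;
    foreach ($set as $item) {
        $sum += $item['weight'];
        $cumulative[] = $sum; // inclusive upper bound of this item's range
    }
    return $cumulative;
}

// Hypothetical helper: O(log n) lookup per pick.
function pickByBinarySearch(array $set, array $cumulative)
{
    $pick = mt_rand(1, $cumulative[count($cumulative) - 1]);
    $lo = 0;
    $hi = count($cumulative) - 1;
    // Find the first index whose running total is >= $pick.
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($cumulative[$mid] < $pick) {
            $lo = $mid + 1;
        } else {
            $hi = $mid;
        }
    }
    return $set[$lo];
}

$set = [["item" => "a", "weight" => 3], ["item" => "b", "weight" => 5]];
$cumulative = buildCumulative($set);
print_r(pickByBinarySearch($set, $cumulative));
```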

Here is my suggestion, in case I've understood you correctly. Take a look, and if you have any questions I'll explain.
A few words in advance:
- My sample uses only 3 weight levels, to keep it clear.
- The outer while loop simulates your main loop; I count only to 100.
- The array must be initialised with one set of initial numbers, as shown in my sample.
- On every pass of the main loop I get only one random value, and the weights are still respected.
<?php
$array=array(
0=>array('item' => 'A', 'weight' => 1),
1=>array('item' => 'B', 'weight' => 2),
2=>array('item' => 'C', 'weight' => 3),
);
$etalon_weights=array(1,2,3);
$current_weights=array(0,0,0);
$ii=0;
while($ii<100){ // Simulates your main loop
// Randomisation cycle
if($current_weights==$etalon_weights){
$current_weights=array(0,0,0);
}
$ft=true;
while($ft){
$curindex=rand(0,(count($array)-1));
$cur=$array[$curindex];
if($current_weights[$cur['weight']-1]<$etalon_weights[$cur['weight']-1]){
echo $cur['item'];
$array[]=$cur;
$current_weights[$cur['weight']-1]++;
$ft=false;
}
}
$ii++;
}
?>

I'll use this input array for my explanation:
$values_and_weights=array(
"one"=>1,
"two"=>8,
"three"=>10,
"four"=>4,
"five"=>3,
"six"=>10
);
The simple version isn't going to work for you because your array is so large. It requires no array modification but may need to iterate the entire array, and that's a deal breaker.
/*$pick=mt_rand(1,array_sum($values_and_weights));
$x=0;
foreach($values_and_weights as $val=>$wgt){
if(($x+=$wgt)>=$pick){
echo "$val";
break;
}
}*/
For your case, re-structuring the array will offer great benefits.
The cost in memory for generating a new array will be increasingly justified as:
array size increases and
number of selections increases.
The new array requires the replacement of "weight" with a "limit" for each value by adding the previous element's weight to the current element's weight.
Then flip the array so that the limits are the array keys and the values are the array values.
The selection logic is: the selected value will have the lowest limit that is >= $pick.
// Declare new array using array_walk one-liner:
array_walk($values_and_weights,function($v,$k)use(&$limits_and_values,&$x){$limits_and_values[$x+=$v]=$k;});
//Alternative declaration method - 4-liner, foreach() loop:
/*$x=0;
foreach($values_and_weights as $val=>$wgt){
$limits_and_values[$x+=$wgt]=$val;
}*/
var_export($limits_and_values);
$limits_and_values looks like this:
array (
1 => 'one',
9 => 'two',
19 => 'three',
23 => 'four',
26 => 'five',
36 => 'six',
)
Now to generate the random $pick and select the value:
// $x (from walk/loop) is the same as writing: end($limits_and_values); $x=key($limits_and_values);
$pick=mt_rand(1,$x); // pull random integer between 1 and highest limit/key
while(!isset($limits_and_values[$pick])){++$pick;} // smallest possible loop to find key
echo $limits_and_values[$pick]; // this is your random (weighted) value
This approach is brilliant because isset() is very fast and the maximum number of isset() calls in the while loop can only be as many as the largest weight (not to be confused with limit) in the array.
FOR YOUR CASE, THIS APPROACH WILL FIND THE VALUE IN 10 ITERATIONS OR LESS!
Here is my Demo that will accept a weighted array (like $values_and_weights), then in just four lines:
Restructure the array,
Generate a random number,
Find the correct value, and
Display it.
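Since the demo itself is not reproduced here, the four steps can be sketched as a single function. This is only an illustration, assuming positive integer weights; pickByLimits is a hypothetical name, and it returns the value instead of displaying it.

```php
<?php
// Sketch of the four steps; assumes an array of value => positive integer weight.
function pickByLimits(array $valuesAndWeights)
{
    // 1. Restructure the array: running total (limit) => value
    $limits = [];
    $x = 0;
    foreach ($valuesAndWeights as $val => $wgt) {
        $limits[$x += $wgt] = $val;
    }
    // 2. Generate a random number between 1 and the highest limit
    $pick = mt_rand(1, $x);
    // 3. Find the correct value: the lowest limit that is >= $pick
    while (!isset($limits[$pick])) {
        ++$pick;
    }
    // 4. Hand it back so the caller can display it
    return $limits[$pick];
}

echo pickByLimits(["one" => 1, "two" => 8, "three" => 10, "four" => 4, "five" => 3, "six" => 10]), "\n";
```

When making many selections, the restructuring step would be done once outside the function rather than on every call.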

Related

Cartesian product with specific criteria

I am attempting to find the cartesian product and append specific criteria.
I have four pools of 25 people each. Each person has a score and a price. Each person in each pool looks as such.
[0] => array(
"name" => "jacob",
"price" => 15,
"score" => 100
),
[1] => array(
"name" => "daniel",
"price" => 22,
"score" => 200
)
I want to find the best combination of people, with one person being picked from each pool. However, there is a ceiling price where no grouping can exceed a certain price.
I have been messing with cartesians and permutation functions and cannot seem to figure out how to do this. The only way I know how to code it is to have nested foreach loops, but that is incredibly taxing.
This code below, as you can see, is incredibly inefficient. Especially if the pools increase!
foreach($poolA as $vA) {
foreach($poolB as $vB) {
foreach($poolC as $vC) {
foreach($poolD as $vD) {
// calculate total price and check if valid
// calculate total score and check if greatest
// if so, add to $greatest array
}
}
}
}
I also thought I could find a way to calculate the total price/score ratio and use that to my advantage, but I don't know what I'm missing.
As pointed out by Barmar, sorting the people in each pool allows you to halt the loops early when the total price exceeds the limit, and hence reduces the number of cases you need to check. However, the asymptotic complexity with this improvement is still O(n^4) (where n is the number of people in a pool).
I will outline an alternative approach with better asymptotic complexity as follows:
1. Construct a pool X that contains all pairs of people with one from pool A and the other from pool B.
2. Construct a pool Y that contains all pairs of people with one from pool C and the other from pool D.
3. Sort the pairs in pool X by total price. Then for any pairs with the same price, retain the one with the highest score and discard the remaining pairs.
4. Sort the pairs in pool Y by total price. Then for any pairs with the same price, retain the one with the highest score and discard the remaining pairs.
5. Do a loop with two pointers to check over all possible combinations that satisfy the price constraint, where the head pointer starts at the first item in pool X, and the tail pointer starts at the last item in pool Y. Sample code is given below to illustrate how this loop works:
==========================================================================
$head = 0;
$tail = sizeof($poolY) - 1;
while ($head < sizeof($poolX) && $tail >= 0) {
$total_price = $poolX[$head]['price'] + $poolY[$tail]['price'];
// Your logic goes here...
if ($total_price > $price_limit) {
$tail--;
} else if ($total_price < $price_limit) {
$head++;
} else {
$head++;
$tail--;
}
}
for ($i = $head; $i < sizeof($poolX); $i++) {
// Your logic goes here...
}
for ($i = $tail; $i >= 0; $i--) {
// Your logic goes here...
}
==========================================================================
The complexity of steps 1 and 2 is O(n^2), and steps 3 and 4 can be done in O(n^2 log(n)) using a balanced binary tree. Step 5 is essentially a linear scan over n^2 items, so its complexity is also O(n^2). Therefore the overall complexity of this approach is O(n^2 log(n)).
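To make the outline concrete, here is one possible end-to-end sketch. It is a variant of the two-pointer scan, not the exact loop above: it walks pool X from the most expensive pair down while the affordable prefix of pool Y grows. The names bestPairsByPrice and bestGroup are hypothetical, and prices are assumed to be integers because they are used as array keys.

```php
<?php
// Steps 1-4: combine two pools into pairs, keep the best score per total price,
// and return the pairs sorted by ascending price.
function bestPairsByPrice(array $poolA, array $poolB): array
{
    $best = [];
    foreach ($poolA as $a) {
        foreach ($poolB as $b) {
            $price = $a['price'] + $b['price'];
            $score = $a['score'] + $b['score'];
            if (!isset($best[$price]) || $score > $best[$price]['score']) {
                $best[$price] = [
                    'price' => $price,
                    'score' => $score,
                    'names' => [$a['name'], $b['name']],
                ];
            }
        }
    }
    ksort($best);
    return array_values($best);
}

// Step 5: scan X from expensive to cheap; the pairs of Y that still fit
// the budget form a prefix that only grows, so each pointer moves one way.
function bestGroup(array $poolA, array $poolB, array $poolC, array $poolD, int $priceLimit): ?array
{
    $x = bestPairsByPrice($poolA, $poolB);
    $y = bestPairsByPrice($poolC, $poolD);
    $best = null;
    $bestY = null; // best-scoring pair among the affordable prefix of $y
    $j = 0;
    for ($i = count($x) - 1; $i >= 0; $i--) {
        while ($j < count($y) && $x[$i]['price'] + $y[$j]['price'] <= $priceLimit) {
            if ($bestY === null || $y[$j]['score'] > $bestY['score']) {
                $bestY = $y[$j];
            }
            $j++;
        }
        if ($bestY !== null) {
            $score = $x[$i]['score'] + $bestY['score'];
            if ($best === null || $score > $best['score']) {
                $best = [
                    'price' => $x[$i]['price'] + $bestY['price'],
                    'score' => $score,
                    'names' => array_merge($x[$i]['names'], $bestY['names']),
                ];
            }
        }
    }
    return $best; // null when no combination fits the price limit
}
```

Here the sorting is done with ksort on the price keys rather than a balanced binary tree, which keeps the same O(n^2 log(n)) bound.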
A couple of things to note about your approach here. Speaking strictly from a mathematics perspective, you're calculating way more permutations than is actually necessary to arrive at a definitive answer.
In combinatorics, there are two important questions to ask in order to arrive at the exact number of permutations necessary to yield all possible combinations.
Does order matter? (for your case, it does not)
Is repetition allowed? (for your case, it is not necessary to repeat)
Since the answer to both of these questions is no, you need only a fraction of the iterations you're currently doing with your nested loop. Currently you are doing pow(25, 4) permutations, which is 390625. You only actually need n! / (r! (n-r)!), or gmp_fact(25) / (gmp_fact(4) * gmp_fact(25 - 4)), which is only 12650 total permutations.
Here's a simple example of a function that produces combinations without repetition (and where order does not matter), using a generator in PHP (taken from this SO answer).
function comb($m, $a) {
if (!$m) {
yield [];
return;
}
if (!$a) {
return;
}
$h = $a[0];
$t = array_slice($a, 1);
foreach(comb($m - 1, $t) as $c)
yield array_merge([$h], $c);
foreach(comb($m, $t) as $c)
yield $c;
}
$a = range(1,25); // 25 people in each pool
$n = 4; // 4 pools
foreach(comb($n, $a) as $i => $c) {
echo $i, ": ", array_sum($c), "\n";
}
It would be pretty easy to modify the generator function to check whether the sum of prices meets/exceeds the desired threshold and only return valid results from there (i.e. abandoning early where needed).
The reason repetition and order are not important here for your use case, is because it doesn't matter whether you add $price1 + $price2 or $price2 + $price1, the result will undoubtedly be the same in both permutations. So you only need to add up each unique set once to ascertain all possible sums.
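The early-abandon modification mentioned above could look roughly like this. It is only a sketch: combUnderBudget is a hypothetical name, and each element of $people is assumed to be an array with a 'price' key.

```php
<?php
// Variation of comb(): yields only combinations of $m people whose total
// price stays within $budget, abandoning a branch as soon as it cannot fit.
function combUnderBudget(int $m, array $people, int $budget): Generator
{
    if ($m === 0) {
        yield [];
        return;
    }
    if (!$people) {
        return;
    }
    $head = $people[0];
    $tail = array_slice($people, 1);
    // Only descend with $head if it still fits in the remaining budget.
    if ($head['price'] <= $budget) {
        foreach (combUnderBudget($m - 1, $tail, $budget - $head['price']) as $c) {
            yield array_merge([$head], $c);
        }
    }
    // Combinations that skip $head.
    foreach (combUnderBudget($m, $tail, $budget) as $c) {
        yield $c;
    }
}

$people = [
    ['name' => 'jacob', 'price' => 15],
    ['name' => 'daniel', 'price' => 22],
    ['name' => 'maria', 'price' => 30],
];
foreach (combUnderBudget(2, $people, 40) as $c) {
    echo implode(' + ', array_column($c, 'name')), "\n";
}
```

Because the recursive generators reuse integer keys, pass false as the second argument when collecting results with iterator_to_array().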
Similar to chiwang's solution, you may eliminate up front every group member for whom another member of the same group exists with the same or a higher score for a lower price.
Maybe you can eliminate many members in each group with this approach.
You may then either use this technique to build two sets of pairs and repeat the filtering (eliminate pairs where another pair exists with a higher score for the same or lower cost) and then combine the pairs the same way, or add a member step by step (a pair, a triple, a quartet).
If there exists some member, who exceed the allowed sum price on their own, they can be eliminated up front.
If you order the 4 groups by score descending, and you find a solution abcd, where the sum price is legal, you found the optimal solution for a given set of abc.
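The up-front elimination can be sketched as a single sort followed by one pass. This is an illustration under assumptions: removeDominated is a hypothetical name, each member has 'price' and 'score' keys, and exact duplicates (same price and score) are collapsed to one.

```php
<?php
// Hypothetical helper: drop every member for whom someone else in the same
// pool costs no more and scores at least as much.
function removeDominated(array $pool): array
{
    // Cheapest first; on equal price, highest score first.
    usort($pool, function ($a, $b) {
        return [$a['price'], $b['score']] <=> [$b['price'], $a['score']];
    });
    $kept = [];
    $bestScore = null;
    foreach ($pool as $person) {
        // Everyone seen so far is no more expensive, so this person
        // survives only by beating the best score seen so far.
        if ($bestScore === null || $person['score'] > $bestScore) {
            $kept[] = $person;
            $bestScore = $person['score'];
        }
    }
    return $kept;
}

$pool = [
    ['name' => 'jacob', 'price' => 15, 'score' => 100],
    ['name' => 'daniel', 'price' => 22, 'score' => 200],
    ['name' => 'pricey', 'price' => 30, 'score' => 150],
];
print_r(array_column(removeDominated($pool), 'name'));
```

Members whose price alone exceeds the ceiling can be filtered out with array_filter before or after this step.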
The responses here helped me figure out the best way for me to do this.
I haven't optimized the function yet, but essentially I looped through the pools two at a time to find the combined salaries / scores for each combination in the two pools.
I stored the combined salary -> score combination in a new array, and if the salary already existed, I'd compare scores and remove the lower one.
$results = array();
foreach($poolA as $A) {
    foreach($poolB as $B) {
        $total_salary = $A['Salary'] + $B['Salary'];
        $total_score = $A['Score'] + $B['Score'];
        $pids = array($A['pid'], $B['pid']);
        // Store the pair if this salary is new, or if it beats the stored score
        if(!isset($results[$total_salary]) || $total_score > $results[$total_salary]['Score']) {
            $results[$total_salary]['Score'] = $total_score;
            $results[$total_salary]['pid'] = $pids;
        }
    }
}
After this loop, I have another one that is identical, except my foreach loops are between $results and $poolC.
foreach($results as $R) {
foreach($poolC as $C) {
and finally, I do it one last time for $poolD.
I am working on optimizing the code by putting all four foreach loops into one.
Thank you everyone for your help, I was able to loop through 9 lists with 25+ people in each and find the best result in an incredibly quick processing time!

array picking by percent [duplicate]

I know how to generate a random number in PHP, but let's say I want a random number between 1-10 but I want more 3,4,5's than 8,9,10's. How is this possible? I would post what I have tried but honestly, I don't even know where to start.
Based on @Allain's answer/link, I worked up this quick function in PHP. You will have to modify it if you want to use non-integer weighting.
/**
* getRandomWeightedElement()
* Utility function for getting random values with weighting.
* Pass in an associative array, such as array('A'=>5, 'B'=>45, 'C'=>50)
* An array like this means that "A" has a 5% chance of being selected, "B" 45%, and "C" 50%.
* The return value is the array key, A, B, or C in this case. Note that the values assigned
* do not have to be percentages. The values are simply relative to each other. If one value
* weight was 2, and the other weight of 1, the value with the weight of 2 has about a 66%
* chance of being selected. Also note that weights should be integers.
*
* @param array $weightedValues
*/
function getRandomWeightedElement(array $weightedValues) {
$rand = mt_rand(1, (int) array_sum($weightedValues));
foreach ($weightedValues as $key => $value) {
$rand -= $value;
if ($rand <= 0) {
return $key;
}
}
}
For an efficient random number skewed consistently towards one end of the scale:
Choose a continuous random number between 0..1
Raise to a power γ, to bias it. 1 is unweighted, lower gives more of the higher numbers and vice versa
Scale to desired range and round to integer
eg. in PHP (untested):
function weightedrand($min, $max, $gamma) {
$offset= $max-$min+1;
return floor($min+pow(lcg_value(), $gamma)*$offset);
}
echo(weightedrand(1, 10, 1.5));
There's a pretty good tutorial for you.
Basically:
Sum the weights of all the numbers.
Pick a random number less than that
subtract the weights in order until the result is negative and return that number if it is.
This tutorial walks you through it, in PHP, with multiple cut and paste solutions. Note that this routine is slightly modified from what you'll find on that page, as a result of the comment below.
A function taken from the post:
/**
* weighted_random_simple()
* Pick a random item based on weights.
*
* @param array $values Array of elements to choose from
* @param array $weights An array of weights. Weight must be a positive number.
* @return mixed Selected element.
*/
function weighted_random_simple($values, $weights){
$count = count($values);
$i = 0;
$n = 0;
$num = mt_rand(1, array_sum($weights));
while($i < $count){
$n += $weights[$i];
if($n >= $num){
break;
}
$i++;
}
return $values[$i];
}
/**
* @param array $weightedValues
* @return string
*/
function getRandomWeightedElement(array $weightedValues)
{
$array = array();
foreach ($weightedValues as $key => $weight) {
$array = array_merge(array_fill(0, $weight, $key), $array);
}
return $array[array_rand($array)];
}
getRandomWeightedElement(array('A'=>10, 'B'=>90));
This is a very easy method for getting a random weighted element. I fill an array with each $key repeated $weight times. After that, I use array_rand on that array, and I have my random value ;).
Plain and fair.
Just copy/paste and test it.
/**
* Return weighted probability
* @param (array) prob=>item
* @return key
*/
function weightedRand($stream) {
$pos = mt_rand(1,array_sum(array_keys($stream)));
$em = 0;
foreach ($stream as $k => $v) {
$em += $k;
if ($em >= $pos)
return $v;
}
}
$item['30'] = 'I have more chances than everybody :]';
$item['10'] = 'I have good chances';
$item['1'] = 'I\'m difficult to appear...';
for ($i = 1; $i <= 10; $i++) {
echo weightedRand($item).'<br />';
}
Edit: Added missing bracket at the end.
You can use weightedChoice from Non-standard PHP library. It accepts a list of pairs (item, weight) to have the possibility to work with items that can't be array keys. You can use pairs function to convert array(item => weight) to the needed format.
use function \nspl\a\pairs;
use function \nspl\rnd\weightedChoice;
$weights = pairs(array(
1 => 10,
2 => 15,
3 => 15,
4 => 15,
5 => 15,
6 => 10,
7 => 5,
8 => 5,
9 => 5,
10 => 5
));
$number = weightedChoice($weights);
In this example, 2-5 will appear 3 times more often than 7-10.
I used Brad's answer and changed it a little to fit my situation and add more flexibility.
I have an array whose values are arrays:
$products = [
['id'=>1,'name'=> 'product1' , 'chance'=>2] ,
['id'=>2,'name'=> 'product2' , 'chance'=>7]
];
First I shuffle the products array:
shuffle($products );
then you can pass it to the function
function getRandomWeightedElement(array $products) {
$chancesSum = 0;
foreach ($products as $product){
$chancesSum += (int) $product['chance'];
}
$rand = mt_rand(1, $chancesSum);
$range = 0;
foreach ($products as $product) {
$range += (int) $product['chance'];
$compare = $rand - $range;
if ($compare <= 0){
return (int) $product['id'];
}
}
}
Since I used IainMH's solution, I may as well share my PHP code:
<pre><?php
// Set total number of iterations
$total = 1716;
// Set array of random number
$arr = array(1, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5);
$arr2 = array(0, 0, 1, 1, 2, 2, 2, 3, 3, 4, 5);
// Print out random numbers
for ($i=0; $i<$total; $i++){
// Pick random array index
$rand = array_rand($arr);
$rand2 = array_rand($arr2);
// Print array values
print $arr[$rand] . "\t" . $arr2[$rand2] . "\r\n";
}
?></pre>
I just released a class to perform weighted sorting easily.
It's based on the same algorithm mentioned in Brad's and Allain's answers, and is optimized for speed, unit-tested for uniform distribution, and supports elements of any PHP type.
Using it is simple. Instantiate it:
$picker = new Brick\Random\RandomPicker();
Then add elements as an array of weighted values (only if your elements are strings or integers):
$picker->addElements([
'foo' => 25,
'bar' => 50,
'baz' => 100
]);
Or use individual calls to addElement(). This method supports any kind of PHP values as elements (strings, numbers, objects, ...), as opposed to the array approach:
$picker->addElement($object1, $weight1);
$picker->addElement($object2, $weight2);
Then get a random element:
$element = $picker->getRandomElement();
The probability of getting one of the elements depends on its associated weight. The only restriction is that weights must be integers.
Many of the answers on this page seem to use array bloating, excessive iteration, a library, or a hard-to-read process. Of course, everyone thinks their own baby is the cutest, but I honestly think my approach is lean, simple and easy to read/modify...
Per the OP, I will create an array of values (declared as keys) from 1 to 10, with 3, 4, and 5 having double the weight of the other values (declared as values).
$values_and_weights=array(
1=>1,
2=>1,
3=>2,
4=>2,
5=>2,
6=>1,
7=>1,
8=>1,
9=>1,
10=>1
);
If you are only going to make one random selection and/or your array is relatively small* (do your own benchmarking to be sure), this is probably your best bet:
$pick=mt_rand(1,array_sum($values_and_weights));
$x=0;
foreach($values_and_weights as $val=>$wgt){
if(($x+=$wgt)>=$pick){
echo "$val";
break;
}
}
This approach involves no array modification and probably won't need to iterate the entire array (but may).
On the other hand, if you are going to make more than one random selection on the array and/or your array is sufficiently large* (do your own benchmarking to be sure), restructuring the array may be better.
The cost in memory for generating a new array will be increasingly justified as:
array size increases and
number of random selections increases.
The new array requires the replacement of "weight" with a "limit" for each value by adding the previous element's weight to the current element's weight.
Then flip the array so that the limits are the array keys and the values are the array values.
The logic is: the selected value will have the lowest limit that is >= $pick.
// Declare new array using array_walk one-liner:
array_walk($values_and_weights,function($v,$k)use(&$limits_and_values,&$x){$limits_and_values[$x+=$v]=$k;});
//Alternative declaration method - 4-liner, foreach() loop:
/*$x=0;
foreach($values_and_weights as $val=>$wgt){
$limits_and_values[$x+=$wgt]=$val;
}*/
var_export($limits_and_values);
Creates this array:
array (
1 => 1,
2 => 2,
4 => 3,
6 => 4,
8 => 5,
9 => 6,
10 => 7,
11 => 8,
12 => 9,
13 => 10,
)
Now to generate the random $pick and select the value:
// $x (from walk/loop) is the same as writing: end($limits_and_values); $x=key($limits_and_values);
$pick=mt_rand(1,$x); // pull random integer between 1 and highest limit/key
while(!isset($limits_and_values[$pick])){++$pick;} // smallest possible loop to find key
echo $limits_and_values[$pick]; // this is your random (weighted) value
This approach is brilliant because isset() is very fast and the maximum number of isset() calls in the while loop can only be as many as the largest weight (not to be confused with limit) in the array. For this case, maximum iterations = 2!
THIS APPROACH NEVER NEEDS TO ITERATE THE ENTIRE ARRAY
I used this:
mt_rand($min, mt_rand($min, $max));
It gives more low values and fewer high values, because the higher a value is, the more often it is cut off by the inner mt_rand.
The probability is highest for the lowest values and falls off as the values get higher (see the maths below).
PRO: easy and straightforward
CON: maybe too simple, so not weightable or balanceable enough for some use cases
Maths:
let i be the index of the i-th value from min to max (i = 1..N),
let P(i) be the probability of obtaining the i-th value,
let N = max-min+1 be the number of possible values:
The inner mt_rand picks an upper bound m uniformly from 1..N, and the outer mt_rand then picks uniformly from 1..m, so:
P(i) = (1/N) * (1/i + 1/(i+1) + ... + 1/N)
Since 1/N is the same for all terms:
P(i) is proportional to 1/i + 1/(i+1) + ... + 1/N
so the probability decreases steadily as i grows, steeply at first and then more gently (roughly like ln(N/i)), rather than in a straight line.
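One way to check the shape of this distribution is to compute the exact probabilities by enumerating both mt_rand calls. A minimal sketch; nestedRandDistribution is a hypothetical helper name, not part of PHP.

```php
<?php
// Exact probability of each value returned by mt_rand($min, mt_rand($min, $max)).
function nestedRandDistribution(int $min, int $max): array
{
    $n = $max - $min + 1;
    $p = [];
    for ($v = $min; $v <= $max; $v++) {
        $p[$v] = 0.0;
    }
    // The inner call returns each upper bound $m with probability 1/$n;
    // the outer call then returns each value in $min..$m with
    // probability 1/($m - $min + 1).
    for ($m = $min; $m <= $max; $m++) {
        for ($v = $min; $v <= $m; $v++) {
            $p[$v] += 1 / $n / ($m - $min + 1);
        }
    }
    return $p;
}

foreach (nestedRandDistribution(1, 10) as $value => $prob) {
    printf("%2d: %.4f\n", $value, $prob);
}
```

For $min = 1 and $max = 3 the probabilities are 11/18, 5/18 and 2/18.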
Variants:
you can write variants:
mt_rand($min, mt_rand($min, mt_rand($min, $max))); //values even more concentrated in the low part
mt_rand(mt_rand($min, $max), $max); //mirrored, more upper values than lower
...
function getBucketFromWeights($values) {
$total = $currentTotal = $bucket = 0;
foreach ($values as $amount) {
$total += $amount;
}
$rand = mt_rand(0, $total-1);
foreach ($values as $amount) {
$currentTotal += $amount;
if ($rand >= $currentTotal) {
$bucket++;
}
else {
break;
}
}
return $bucket;
}
I, ugh, modified this from an answer to "Picking random element by user defined weights".
After I wrote this I saw someone else had an even more elegant answer. He he he he.

How can I gradually make an array sparser?

I have a fully-populated array of values, and I would like to arbitrarily remove elements from this array with more removed towards the far end.
For example, given input ( where a . signifies a populated index )
............................................
I would like something like
....... . ... .. . . .. . .
My first thought was to count the elements, then iterate over the array generating a random number somewhere between the current index and the total size of the array, eg:
if ( mt_rand( 0, $total ) > $total - $current_index )
//remove this element
however, as this entails making a random number each time the loop goes round it becomes very arduous.
Is there a better way of doing this?
One easy way is to flip a weighted coin for each entry with coin flips more weighted towards the end. For example, if the array is size n, for each entry you could choose a random number from 0 to n-1 and only keep the value if the index is less than or equal to the random number. (That is, keep each entry with probability 1 - index/total.) This has the nice advantage that if you're going to be compacting your array anyways, and you're using a good enough but efficient random number generator (could be a simple integer hash over a nonce), it's going to be rather fast for memory access.
On the other hand, if you're only blanking out a few items and aren't rearranging the array, you can go with some sort of weighted random number generator that more often chooses numbers toward the end of the index range. For example, if you have a random number generator that generates floats in [0,1] (closed or open bounds likely not mattering much), consider obtaining such a random float r and squaring it. This will tend to prefer lower values. You can fix this by flipping it around: 1-r^2. Of course, you need this to be in your index range of 0 to n - 1, so take floor(n * (1 - r^2)) and clamp a result of n down to n - 1 (which happens only when r is exactly 0).
There's practically an infinite number of variations on both of these techniques.
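A minimal sketch of the second technique, assuming n is at least 1. The name biasedIndex and the choice of blanking 10 entries are only for illustration.

```php
<?php
// Hypothetical helper: returns an index in 0..n-1, biased towards n-1.
// Assumes $n >= 1.
function biasedIndex(int $n): int
{
    $r = mt_rand() / mt_getrandmax();          // uniform float in [0, 1]
    $index = (int) floor($n * (1 - $r * $r));  // 1 - r^2 favours values near 1
    return min($index, $n - 1);                // r = 0 yields n, so clamp to n-1
}

// Usage: blank out up to 10 entries of a 44-element array.
$array = array_fill(0, 44, '.');
for ($k = 0; $k < 10; $k++) {
    unset($array[biasedIndex(44)]); // the same index may come up twice
}
```

Because the same index can be drawn more than once, fewer than 10 entries may actually be removed.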
This is quite probably not the best/most efficient way to do this, but it is the best I can come up with and it does work.
N.B. the codepad example takes a long time to execute, but this is because of the pretty-print loop I added to the end so you can see it visibly working. If you remove the inner loop, execution time drops to acceptable levels.
<?php
$array = range(0, 99);
for ($i = 0, $count = count($array); $i < $count; $i++) {
// Get array keys
$keys = array_keys($array);
// Get a random number between 0 and count($keys) - 1
$rand = mt_rand(0, count($keys) - 1);
// Cut $rand elements off the beginning of the keys
$keys = array_slice($keys, $rand);
// Unset a random key from the remaining keys
unset($array[$keys[array_rand($keys)]]);
}
This method isn't random: it works by defining a function and its inverse. Different functions, with different constant coefficients, will have different distribution characteristics.
The results are very pattern-like, as expected when mapping a continuous function to a discrete structure like an array.
Here's an example using a quadratic function. You could try varying the constant.
demo: http://codepad.org/ojU3s9xM
#as in y = x^2 / 7;
function y($x) {
return $x * $x / 7;
}
function x($y) {
//inverse of y(): if y = x^2 / 7 then x = sqrt(7 * y)
return sqrt(7 * $y);
}
$theArray = range(0,100);
$size = count($theArray);
//use func inverse to find the max value we can input to $y() without going out of array bounds
$maximumX = x($size);
for ($i=0; $i<$maximumX; $i++) {
$index = (int) y($i);
//unset the index if it still exists, else, the next greatest index
while (!isset($theArray[$index]) && $index < $size) {
$index++;
}
unset($theArray[$index]);
}
for ($i=0; $i<$size; $i++) {
printf("[%-3s]", isset($theArray[$i]) ? $theArray[$i] : '');
}

Fixed Proportionate Selection

I have a set of elements and I need to choose one element from it. Each element is associated with a percentage chance. The percentages add up to 100.
I need to choose one of those elements so that the chance of an element being chosen is equal to its percentage value. So if an element has a 25% chance, it should be chosen 25% of the time. In other words, if we choose elements 1 million times, that element should be chosen close to 250k times.
What you describe is a multinomial process.
http://en.wikipedia.org/wiki/Multinomial_distribution#Sampling_from_a_multinomial_distribution
The way to generate such a random process is as follows.
(I'll use pseudocode, but it should be easy to turn it into real code.)
1. Sort the 'boxes' in decreasing order of their probability (not needed, it's just an optimization), so that you have for example values=[0.45,0.3,0.15,0.1]
2. Create the 'cumulative' distribution, which is the sum of all elements with index <=i.
pseudocode:
cumulant=[0,0,0,0] // initialize it
s=0
for i=0 to size()-1 {
s=s+values[i] ;
cumulant[i]=s
}
in our case cumulant=[0.45,0.75,0.90,1]
3. Make a uniform random number x between 0 and 1.
For php: http://php.net/manual/en/function.rand.php
4. The resulting random box index i is the lowest i for which x <= cumulant[i].
pseudocode:
for i=0 to size()-1 {
if !(cumulant[i]<x){
print "your index is ",i
break;
}
}
That is it. Get another random index i by going back to step 3.
If you sort as suggested above, the final search will be faster. For example, if you have this vector of probabilities: 0.001 0.001 0.001 0.001 0.996 then, when you sort it, you will almost always have to look only at index i=0, since the random number x will almost always be lower than 0.996. Whether the sort pays off depends on whether you repeatedly use the same 'boxes'. So yes, with 250k tries it will help a lot. Just remember that the box index i you get is for the sorted vector.
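The procedure above can be sketched in PHP roughly as follows. The function names buildCumulant and pickIndex are invented for this example, the probabilities are assumed to be floats that sum to 1, and the list is assumed to be non-empty.

```php
<?php
// Hypothetical helpers; the names are illustrative only.

// Running sums of the probabilities (the 'cumulant').
function buildCumulant(array $values): array
{
    $cumulant = [];
    $s = 0.0;
    foreach (array_values($values) as $p) {
        $s += $p;
        $cumulant[] = $s;
    }
    return $cumulant;
}

// Lowest index i with $x <= $cumulant[i]; $x is a uniform number in [0, 1].
function pickIndex(array $cumulant, float $x): int
{
    foreach ($cumulant as $i => $limit) {
        if ($x <= $limit) {
            return $i;
        }
    }
    // Guard against floating-point rounding in the last running sum.
    return count($cumulant) - 1;
}

$cumulant = buildCumulant([0.45, 0.3, 0.15, 0.1]);
$x = mt_rand() / mt_getrandmax(); // uniform float in [0, 1]
echo "your index is ", pickIndex($cumulant, $x), "\n";
```

Passing x in as a parameter keeps the lookup deterministic, which makes it easy to check by hand: x = 0.5 falls in the second box (index 1).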
I guess it was faster for me to write it than it was for you to show us what you did so far.
Probably not the best solution, but as it stands, it looks like it's the only one you've got.
Here you go:
$elements = array(
'This' => 25,
'is' => 15,
'a' => 15,
'crappy' => 20,
'list' => 25
);
// Sort by chance, highest first (optional, it only speeds up the lookup)
arsort($elements);
// Precalc cumulative value
$cumulant = 0;
foreach ($elements as $key => &$value) {
$cumulant += $value;
$value = $cumulant;
}
unset($value); // break the reference left over from the loop
function pickAnElement($elements) {
$random = rand(1, 100);
foreach ($elements as $key => $value) {
if ($random <= $value) {
return $key;
}
}
}
$picks = array();
for ($i = 0; $i < 10000; $i++) {
$element = pickAnElement($elements);
if (!array_key_exists($element, $picks)) {
$picks[$element] = 0;
}
$picks[$element]++;
}
var_dump($picks);
Inspired by Johan's answer, I added a loop to sort and pre-calculate the cumulant.

Generating random results by weight in PHP?

I know how to generate a random number in PHP, but let's say I want a random number between 1-10 but I want more 3,4,5's than 8,9,10's. How is this possible? I would post what I have tried but honestly, I don't even know where to start.
Based on @Allain's answer/link, I worked up this quick function in PHP. You will have to modify it if you want to use non-integer weighting.
/**
* getRandomWeightedElement()
* Utility function for getting random values with weighting.
* Pass in an associative array, such as array('A'=>5, 'B'=>45, 'C'=>50)
* An array like this means that "A" has a 5% chance of being selected, "B" 45%, and "C" 50%.
* The return value is the array key, A, B, or C in this case. Note that the values assigned
* do not have to be percentages. The values are simply relative to each other. If one value
* weight was 2, and the other weight of 1, the value with the weight of 2 has about a 66%
* chance of being selected. Also note that weights should be integers.
*
* @param array $weightedValues
*/
function getRandomWeightedElement(array $weightedValues) {
$rand = mt_rand(1, (int) array_sum($weightedValues));
foreach ($weightedValues as $key => $value) {
$rand -= $value;
if ($rand <= 0) {
return $key;
}
}
}
For an efficient random number skewed consistently towards one end of the scale:
Choose a continuous random number between 0..1
Raise to a power γ, to bias it. 1 is unweighted, lower gives more of the higher numbers and vice versa
Scale to desired range and round to integer
eg. in PHP (untested):
function weightedrand($min, $max, $gamma) {
$offset= $max-$min+1;
return floor($min+pow(lcg_value(), $gamma)*$offset);
}
echo(weightedrand(1, 10, 1.5));
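To see what the exponent does without any randomness involved, here is a small sketch that applies the same mapping to a fixed uniform number. The function weightedFromUniform is a hypothetical variant of weightedrand() above, with the uniform number passed in as a parameter and a clamp added in case it is exactly 1.

```php
<?php
// Hypothetical variant of weightedrand(): the uniform number $u (0..1)
// is supplied by the caller so the effect of gamma can be inspected.
function weightedFromUniform(float $u, int $min, int $max, float $gamma): int
{
    $offset = $max - $min + 1;
    // Clamp to $max in case $u is exactly 1.0.
    return (int) min($max, floor($min + pow($u, $gamma) * $offset));
}

// The midpoint u = 0.5 moves up for gamma < 1 and down for gamma > 1.
foreach ([0.5, 1.0, 2.0] as $gamma) {
    echo "gamma=$gamma: ", weightedFromUniform(0.5, 1, 10, $gamma), "\n";
}
```

For the range 1 to 10, u = 0.5 maps to 8 with gamma 0.5, to 6 with gamma 1, and to 3 with gamma 2.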
There's a pretty good tutorial for you.
Basically:
Sum the weights of all the numbers.
Pick a random number less than that sum.
Subtract the weights in order until the result is negative, and return the number whose weight made it negative.
This tutorial walks you through it, in PHP, with multiple cut and paste solutions. Note that this routine is slightly modified from what you'll find on that page, as a result of the comment below.
A function taken from the post:
/**
* weighted_random_simple()
* Pick a random item based on weights.
*
* @param array $values Array of elements to choose from
* @param array $weights An array of weights. Weight must be a positive number.
* @return mixed Selected element.
*/
function weighted_random_simple($values, $weights){
$count = count($values);
$i = 0;
$n = 0;
$num = mt_rand(1, array_sum($weights));
while($i < $count){
$n += $weights[$i];
if($n >= $num){
break;
}
$i++;
}
return $values[$i];
}
/**
* @param array $weightedValues
* @return string
*/
function getRandomWeightedElement(array $weightedValues)
{
$array = array();
foreach ($weightedValues as $key => $weight) {
$array = array_merge(array_fill(0, $weight, $key), $array);
}
return $array[array_rand($array)];
}
getRandomWeightedElement(array('A'=>10, 'B'=>90));
This is a very easy method for getting a random weighted element. I fill an array with $weight copies of each $key. After that, I use array_rand on that array, and I have my random value ;).
Plain and fair.
Just copy/paste and test it.
/**
* Return weighted probability
* @param (array) prob=>item
* @return key
*/
function weightedRand($stream) {
$pos = mt_rand(1,array_sum(array_keys($stream)));
$em = 0;
foreach ($stream as $k => $v) {
$em += $k;
if ($em >= $pos)
return $v;
}
}
$item['30'] = 'I have more chances than everybody :]';
$item['10'] = 'I have good chances';
$item['1'] = 'I\'m difficult to appear...';
for ($i = 1; $i <= 10; $i++) {
echo weightedRand($item).'<br />';
}
Edit: Added missing bracket at the end.
You can use weightedChoice from Non-standard PHP library. It accepts a list of pairs (item, weight) to have the possibility to work with items that can't be array keys. You can use pairs function to convert array(item => weight) to the needed format.
use function \nspl\a\pairs;
use function \nspl\rnd\weightedChoice;
$weights = pairs(array(
1 => 10,
2 => 15,
3 => 15,
4 => 15,
5 => 15,
6 => 10,
7 => 5,
8 => 5,
9 => 5,
10 => 5
));
$number = weightedChoice($weights);
In this example, 2-5 will appear 3 times more often than 7-10.
I used Brad's answer and changed it a little to fit my situation and add more flexibility.
I have an array whose values are arrays:
$products = [
['id'=>1,'name'=> 'product1' , 'chance'=>2] ,
['id'=>2,'name'=> 'product2' , 'chance'=>7]
];
First I shuffle the products array:
shuffle($products);
Then you can pass it to the function:
function getRandomWeightedElement(array $products) {
$chancesSum = 0;
foreach ($products as $product){
$chancesSum += (int) $product['chance'];
}
$rand = mt_rand(1, $chancesSum);
$range = 0;
foreach ($products as $product) {
$range += (int) $product['chance'];
$compare = $rand - $range;
if ($compare <= 0){
return (int) $product['id'];
}
}
}
Since I used IainMH's solution, I may as well share my PHP code:
<pre><?php
// Set total number of iterations
$total = 1716;
// Set array of random number
$arr = array(1, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5);
$arr2 = array(0, 0, 1, 1, 2, 2, 2, 3, 3, 4, 5);
// Print out random numbers
for ($i=0; $i<$total; $i++){
// Pick random array index
$rand = array_rand($arr);
$rand2 = array_rand($arr2);
// Print array values
print $arr[$rand] . "\t" . $arr2[$rand2] . "\r\n";
}
?></pre>
I just released a class to perform weighted sorting easily.
It's based on the same algorithm mentioned in Brad's and Allain's answers, and is optimized for speed, unit-tested for uniform distribution, and supports elements of any PHP type.
Using it is simple. Instantiate it:
$picker = new Brick\Random\RandomPicker();
Then add elements as an array of weighted values (only if your elements are strings or integers):
$picker->addElements([
'foo' => 25,
'bar' => 50,
'baz' => 100
]);
Or use individual calls to addElement(). This method supports any kind of PHP values as elements (strings, numbers, objects, ...), as opposed to the array approach:
$picker->addElement($object1, $weight1);
$picker->addElement($object2, $weight2);
Then get a random element:
$element = $picker->getRandomElement();
The probability of getting one of the elements depends on its associated weight. The only restriction is that weights must be integers.
Many of the answers on this page seem to use array bloating, excessive iteration, a library, or a hard-to-read process. Of course, everyone thinks their own baby is the cutest, but I honestly think my approach is lean, simple and easy to read/modify...
Per the OP, I will create an array of values (declared as keys) from 1 to 10, with 3, 4, and 5 having double the weight of the other values (declared as values).
$values_and_weights=array(
1=>1,
2=>1,
3=>2,
4=>2,
5=>2,
6=>1,
7=>1,
8=>1,
9=>1,
10=>1
);
If you are only going to make one random selection and/or your array is relatively small* (do your own benchmarking to be sure), this is probably your best bet:
$pick=mt_rand(1,array_sum($values_and_weights));
$x=0;
foreach($values_and_weights as $val=>$wgt){
if(($x+=$wgt)>=$pick){
echo "$val";
break;
}
}
This approach involves no array modification and probably won't need to iterate the entire array (but may).
On the other hand, if you are going to make more than one random selection on the array and/or your array is sufficiently large* (do your own benchmarking to be sure), restructuring the array may be better.
The cost in memory for generating a new array will be increasingly justified as:
array size increases and
number of random selections increases.
The new array requires the replacement of "weight" with a "limit" for each value by adding the previous element's limit to the current element's weight.
Then flip the array so that the limits are the array keys and the values are the array values.
The logic is: the selected value will have the lowest limit that is >= $pick.
// Declare new array using array_walk one-liner:
array_walk($values_and_weights,function($v,$k)use(&$limits_and_values,&$x){$limits_and_values[$x+=$v]=$k;});
//Alternative declaration method - 4-liner, foreach() loop:
/*$x=0;
foreach($values_and_weights as $val=>$wgt){
$limits_and_values[$x+=$wgt]=$val;
}*/
var_export($limits_and_values);
Creates this array:
array (
1 => 1,
2 => 2,
4 => 3,
6 => 4,
8 => 5,
9 => 6,
10 => 7,
11 => 8,
12 => 9,
13 => 10,
)
Now to generate the random $pick and select the value:
// $x (from walk/loop) is the same as writing: end($limits_and_values); $x=key($limits_and_values);
$pick=mt_rand(1,$x); // pull random integer between 1 and highest limit/key
while(!isset($limits_and_values[$pick])){++$pick;} // smallest possible loop to find key
echo $limits_and_values[$pick]; // this is your random (weighted) value
This approach is brilliant because isset() is very fast and the maximum number of isset() calls in the while loop can only be as many as the largest weight (not to be confused with limit) in the array. For this case, maximum iterations = 2!
THIS APPROACH NEVER NEEDS TO ITERATE THE ENTIRE ARRAY
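I used this:
mt_rand($min, mt_rand($min, $max));
It gives more low values and fewer high values, since the higher a value is, the more often it is cut off by the inner mt_rand.
The probability decreases steadily from the lowest value to the highest (see the maths below).
PRO: easy and straightforward
CON: maybe too simple, so not weightable or balanceable enough for some use cases
Maths:
let N = max-min,
let i be the offset of a value from min (i = 0 for min, i = N for max),
let P(i) be the probability of obtaining the value min+i:
the inner call returns each offset k from 0 to N with probability 1/(N+1), and the outer call then returns each offset from 0 to k with probability 1/(k+1), so
P(i) = (1/(N+1)) * (1/(i+1) + 1/(i+2) + ... + 1/(N+1))
Each step up in i drops one term from the sum, so P(i) is strictly decreasing in i.
Note that the decrease is not linear: the largest terms are dropped first, so the probability falls fastest among the lowest values. For example, with min=1 and max=3 the probabilities are 11/18, 5/18 and 2/18.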
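To check the shape of this distribution rather than take it on trust, the exact probabilities can be computed by enumerating every outcome of the inner call. This is only a sketch, and the function name nestedRandDistribution is made up for it.

```php
<?php
// Hypothetical helper: exact probabilities for
// mt_rand($min, mt_rand($min, $max)), keyed by the returned value.
function nestedRandDistribution(int $min, int $max): array
{
    $count = $max - $min + 1; // number of possible values
    $probs = [];
    for ($v = $min; $v <= $max; $v++) {
        $probs[$v] = 0.0;
    }
    for ($inner = $min; $inner <= $max; $inner++) {
        // The inner call returns $inner with probability 1/$count; the
        // outer call is then uniform over $min..$inner.
        $share = (1 / $count) / ($inner - $min + 1);
        for ($v = $min; $v <= $inner; $v++) {
            $probs[$v] += $share;
        }
    }
    return $probs;
}

foreach (nestedRandDistribution(1, 3) as $value => $p) {
    printf("%d: %.4f\n", $value, $p);
}
```

For the range 1 to 3 this gives 11/18, 5/18 and 2/18, so the gap between neighbouring values shrinks as the values get higher.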
Variants:
mt_rand($min, mt_rand($min, mt_rand($min, $max))); //even more weight on the low values
mt_rand(mt_rand($min, $max), $max); //mirrored, more high values than low
...
function getBucketFromWeights($values) {
$total = $currentTotal = $bucket = 0;
foreach ($values as $amount) {
$total += $amount;
}
$rand = mt_rand(0, $total-1);
foreach ($values as $amount) {
$currentTotal += $amount;
if ($rand >= $currentTotal) {
$bucket++;
}
else {
break;
}
}
return $bucket;
}
I, ugh, modified this from an answer here: Picking random element by user defined weights
After I wrote this I saw someone else had an even more elegant answer. He he he he.
