I have some elements that I'm trying to randomize at a 50% chance of output, so I wrote a quick if statement like this:
$rand = mt_rand(1, 2);
if ( $rand == 1 ) {
    echo "hello";
} else {
    echo "goodbye";
}
I notice that when using mt_rand, "goodbye" is output many times in a row, whereas if I just use rand, I get a more even distribution.
Is there something about mt_rand that makes it worse at handling a simple 1-2 randomization like this? Or is my dataset so small that these results are just anecdotal?
To get the same value "many times in a row" is a possible outcome of a randomly generated series. It would not be completely random if such a pattern were not allowed to occur. If you would continue taking samples, you would also find that the opposite value will sometimes occur several times in a row, provided you keep going long enough.
One way to test that the generated values are indeed quite random and uniformly distributed, is to count how many times the same value is generated as the one generated before, and how many times the opposite value is generated.
Note that the strings "hello" and "goodbye" don't add much useful information; we can just look at the values 1 and 2.
Here is how you could do such a test:
// $countAfter[$i][$j] will contain the number of occurrences of
// a pair $i, $j in the randomly generated sequence.
// So there is an entry for [1][1], [1][2], [2][1] and [2][2]:
$countAfter = [1 => [1 => 0, 2 => 0],
               2 => [1 => 0, 2 => 0]];
$prev = 1; // We assume for simplicity that the "previously" generated value was 1
for ($i = 0; $i < 10000; $i++) { // Produce a large enough sample
    $n = mt_rand(1, 2);
    $countAfter[$prev][$n]++; // Increase the counter that corresponds to the generated pair
    $prev = $n;
}
print_r($countAfter);
You can see in this demo that the 4 numbers that are output do not differ that much. Output is something like:
Array
(
    [1] => Array
        (
            [1] => 2464
            [2] => 2558
        )
    [2] => Array
        (
            [1] => 2558
            [2] => 2420
        )
)
This means that 1 and 2 are generated about an equal number of times and that a repetition of a value happens just as often as a toggle in the series.
Obviously these numbers are rarely exactly the same, since that would mean the last couple of generated values would not be random at all, as they would need to bring those counts to the desired value.
The important thing is that your sample needs to be large enough to see the pattern of a uniform distribution confirmed.
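To see that convergence in action, here is a small illustrative sketch (my addition, not part of the original test) that prints the share of 1s for growing sample sizes; the proportion should approach 0.5:

foreach ([100, 10000, 1000000] as $n) {
    $ones = 0;
    for ($i = 0; $i < $n; $i++) {
        if (mt_rand(1, 2) == 1) $ones++;
    }
    printf("n=%7d share of 1s: %.4f\n", $n, $ones / $n);
}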
I have a set of items. I need to randomly pick one. The problem is that they each have a weight of 1-10. A weight of 2 means that the item is twice as likely to be picked than a weight of 1. A weight of 3 is three times as likely.
I currently fill an array with each item. If the weight is 3, I put three copies of the item in the array. Then, I pick a random item.
My method is fast, but uses a lot of memory. I am trying to think of a faster method, but nothing comes to mind. Anyone have a trick for this problem?
EDIT: My Code...
Apparently, I wasn't clear. I do not want to use (or improve) my code. This is what I did.
// Given an array $a where $a[0] is an item name and $a[1] is the weight from 1 to 100.
$b = array();
foreach ($a as $t) {
    $b = array_merge($b, array_fill(0, $t[1], $t));
}
$item = $b[array_rand($b)];
This required me to check every item in $a, and the expanded array uses on the order of (max_weight / 2) × count($a) entries of memory. I wanted a COMPLETELY DIFFERENT algorithm.
Further, I asked this question in the middle of the night using a phone. Typing code on a phone is nearly impossible because those silly virtual keyboards simply suck. It auto-corrects everything, ruining any code I type.
And yet further, I woke up this morning with an entirely new algorithm that uses virtually no extra memory at all and does not require checking every item in the array. I posted it as an answer below.
This one's your huckleberry.
$arr = array(
    array("val" => "one", "weight" => 1),
    array("val" => "two", "weight" => 2),
    array("val" => "three", "weight" => 3),
    array("val" => "four", "weight" => 4)
);

$weight_sum = 0;
foreach($arr as $val)
{
    $weight_sum += $val['weight'];
}

$r = rand(1, $weight_sum);
print "random value is $r\n";

for($i = 0; $i < count($arr); $i++)
{
    if($r <= $arr[$i]['weight'])
    {
        print "$r <= {$arr[$i]['weight']}, this is our match\n";
        print $arr[$i]['val'] . "\n";
        break;
    }
    else
    {
        print "$r > {$arr[$i]['weight']}, subtracting weight\n";
        $r -= $arr[$i]['weight'];
        print "new \$r is $r\n";
    }
}
No need to generate arrays containing an item for every weight, and no need to fill an array with n elements for a weight of n. Just generate a random number between 1 and the total weight, then loop through the array: if the random number is less than or equal to the current item's weight, that item is your match; otherwise, subtract the item's weight from the random number and continue to the next item.
Sample output:
# php wr.php
random value is 8
8 > 1, subtracting weight
new $r is 7
7 > 2, subtracting weight
new $r is 5
5 > 3, subtracting weight
new $r is 2
2 <= 4, this is our match
four
This should also support fractional weights.
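One way fractional weights could be handled (a sketch of my own, not from the answer) is to draw a float in [0, total weight] instead of an integer:

// Sketch only: same walk as above, but with a float draw so fractional
// weights like 0.5 work. $arr and $weight_sum are as defined earlier.
$r = mt_rand() / mt_getrandmax() * $weight_sum;
foreach ($arr as $val) {
    if ($r <= $val['weight']) {
        print $val['val'] . "\n";
        break;
    }
    $r -= $val['weight'];
}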
Modified version that keys the array by weight, rather than by item:

$arr2 = array();
for($i = 0; $i <= 500000; $i++)
{
    $weight = rand(1, 10);
    $num = rand(1, 1000);
    $arr2[$weight][] = $num;
}

$start = microtime(true);

$weight_sum = 0;
foreach($arr2 as $weight => $vals) {
    $weight_sum += $weight * count($vals);
}
print "weighted sum is $weight_sum\n";

$r = rand(1, $weight_sum);
print "random value is $r\n";

$found = false;
$elem = null;
foreach($arr2 as $weight => $vals)
{
    if($found) break;
    for($j = 0; $j < count($vals); $j++)
    {
        if($r <= $weight) // note: <=, since a strict < can fall off the end when $r equals the remaining weight
        {
            $elem = $vals[$j];
            $found = true;
            break;
        }
        else
        {
            $r -= $weight;
        }
    }
}
$end = microtime(true);

print "random element is: $elem\n";
print "total time is " . ($end - $start) . "\n";
With sample output:
# php wr2.php
weighted sum is 2751550
random value is 345713
random element is: 681
total time is 0.017189025878906
The measurement is hardly scientific, and it fluctuates depending on where in the array the element falls (obviously), but it seems fast enough for huge datasets.
This way requires two random calculations, but they should be faster and need about 1/4 of the memory, at the cost of some reduced accuracy if the weights have disproportionate item counts. (See the update below for increased accuracy at the cost of some memory and processing.)
Store a multidimensional array where each item is stored in an array based on its weight:
$array[$weight][] = $item;
// example: Item with a weight of 5 would be $array[5][] = 'Item'
Generate a new array with the weights (1-10) appearing n times for n weight:
foreach($array as $n=>$null) {
    for ($i=1;$i<=$n;$i++) {
        $weights[] = $n;
    }
}
The above array would be something like: [ 1, 2, 2, 3, 3, 3, 4, 4, 4, 4 ... ]
First calculation: Get a random weight from the weighted array we just created
$weight = $weights[mt_rand(0, count($weights)-1)];
Second calculation: Get a random key from that weight array
$value = $array[$weight][mt_rand(0, count($array[$weight])-1)];
Why this works: You solve the weighted issue by using the weighted array of integers we created. Then you select randomly from that weighted group.
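Putting the fragments together, a minimal runnable sketch of the two-draw approach might look like this (the item data is hypothetical):

// Hypothetical data: items bucketed by weight.
$array = [];
$array[1][] = 'rare item';
$array[5][] = 'common item A';
$array[5][] = 'common item B';

// Weight n appears n times in $weights.
$weights = [];
foreach ($array as $n => $null) {
    for ($i = 1; $i <= $n; $i++) {
        $weights[] = $n;
    }
}

// First draw: a weight. Second draw: an item within that weight bucket.
$weight = $weights[mt_rand(0, count($weights) - 1)];
$value  = $array[$weight][mt_rand(0, count($array[$weight]) - 1)];
echo $value, "\n";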
Update: Because of the possibility of disproportionate counts of items per weight, you could add another loop and array for the counts to increase accuracy.
foreach($array as $n=>$null) {
    $counts[$n] = count($array[$n]);
}

foreach($array as $n=>$null) {
    // Calculate proportionate weight (number of items in this weight opposed to minimum counted weight)
    $proportion = $n * ($counts[$n] / min($counts));
    for ($i=1; $i<=$proportion; $i++) {
        $weights[] = $n;
    }
}
What this does: if you have 2000 10's and 100 1's, it'll add 200 10's (20 × 10; 20 because it has 20x the count, and 10 because it is weighted 10) instead of 10 10's, making it proportionate to how many items are in there as opposed to the minimum weight count. So, to be accurate, instead of adding one entry for EVERY possible key, you are just being proportionate based on the MINIMUM count of weights.
I greatly appreciate the answers above. Please consider this answer, which does not require checking every item in the original array.
// Given $a as an array of items
// where $a[0] is the item name and $a[1] is the item weight.
// It is known that weights are integers from 1 to 100.
for($i = 0; $i < sizeof($a); $i++) // Safeguard described below
{
    $item = $a[array_rand($a)];
    if(rand(1,100) <= $item[1]) break;
}
This algorithm only requires storage for two variables ($i and $item) as $a was already created before the algorithm kicked in. It does not require a massive array of duplicate items or an array of intervals.
In a best-case scenario, this algorithm will touch one item in the original array and be done. In a worst-case scenario, it will touch n items in an array of n items (not necessarily every item in the array as some may be touched more than once).
If there was no safeguard, this could run forever. The safeguard is there to stop the algorithm if it simply never picks an item. When the safeguard is triggered, the last item touched is the one selected. However, in millions of tests using random data sets of 100,000 items with random weights of 1 to 10 (changing rand(1,100) to rand(1,10) in my code), the safeguard was never hit.
I made histograms comparing the frequency of items selected among my original algorithm, the ones from answers above, and the one in this answer. The differences in frequencies are trivial - easy to attribute to variances in the random numbers.
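For reference, a frequency check along these lines (illustrative only, not the author's original harness) is enough to reproduce that comparison:

// $a is the weighted array described above: $a[$k] = [name, weight 1..100].
$freq = array_fill(0, count($a), 0);
for ($t = 0; $t < 100000; $t++) {
    $k = 0;
    for ($i = 0; $i < sizeof($a); $i++) { // the same safeguarded loop as above
        $k = array_rand($a);
        if (rand(1, 100) <= $a[$k][1]) break;
    }
    $freq[$k]++; // if the safeguard runs out, the last item touched counts
}
print_r($freq); // counts should roughly track the weights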
EDIT... It is apparent to me that my algorithm may be combined with the algorithm pala_ posted, removing the need for a safeguard.
In pala_'s algorithm, a list is required, which I call an interval list. To simplify, you begin with a random_weight that is rather high. You step down the list of items and subtract the weight of each one until your random_weight falls to zero (or less). Then, the item you ended on is your item to return. There are variations on this interval algorithm that I've tested and pala_'s is a very good one. But, I wanted to avoid making a list. I wanted to use only the given weighted list and never touch all the items. The following algorithm merges my use of random jumping with pala_'s interval list. Instead of a list, I randomly jump around the list. I am guaranteed to get to zero eventually, so no safeguard is needed.
// Given $a as the weighted array (described above)
$weight = rand(1,100); // The bigger this is, the slower the algorithm runs.
while($weight > 0)
{
    $item = $a[array_rand($a)];
    $weight -= $item[1];
}
// $item is the random item you want.
I wish I could select both pala_ and this answer as the correct answers.
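If you want to sanity-check the merged version yourself, a quick tally like this (hypothetical data; my sketch) should show proportions close to the weights:

$a = [["A", 10], ["B", 30], ["C", 60]]; // hypothetical weighted list
$tally = [];
for ($t = 0; $t < 100000; $t++) {
    $weight = rand(1, 100);
    while ($weight > 0) {
        $item = $a[array_rand($a)];
        $weight -= $item[1];
    }
    $tally[$item[0]] = ($tally[$item[0]] ?? 0) + 1;
}
print_r($tally); // expect roughly 10/30/60 proportions if the claim holds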
I'm not sure if this is "faster", but I think it may be more "balanced" between memory usage and speed.
The idea is to transform your current implementation (a 500,000-entry expanded array) into an array the same length as your original set (100,000 items), with the lowest "origin" position as key and the origin index as value:
<?php
$set=[["a",3],["b",5]];
$current_implementation=["a","a","a","b","b","b","b","b"];
// 0=>0 means the lowest "position" 0
// points to 0 in the set;
// 3=>1 means the lowest "position" 3
// points to 1 in the set;
$my_implementation=[0=>0,3=>1];
And then randomly pick a number between 0 and the highest "origin" position plus that last element's weight, minus 1:
// 3 is the lowest position of the last element ("b")
// and 5 the weight of that last element
$my_implementation_pick = mt_rand(0, 3+5-1);
Full code:
<?php
function randomPickByWeight(array $set)
{
    $low=0;
    $high=0;
    $candidates=[];
    foreach($set as $key=>$item)
    {
        $candidates[$high]=$key;
        $high+=$item["weight"];
    }
    $pick=mt_rand($low,$high-1);
    while(!array_key_exists($pick,$candidates))
    {
        $pick--;
    }
    return $set[$candidates[$pick]];
}

$cache=[];
for($i=0;$i<100000;$i++)
{
    $cache[]=["item"=>"item {$i}","weight"=>mt_rand(1,10)];
}

$time=time();
for($i=0;$i<100;$i++)
{
    print_r(randomPickByWeight($cache));
}
$time=time()-$time;
var_dump($time);
3v4l.org demo
3v4l.org has a time limit on code, so the demo didn't finish there. On my laptop the above demo finished in 10 seconds (i7-4700HQ).
Here is my suggestion, in case I've understood you correctly. Take a look, and if anything is unclear I'll explain.
Some words in advance:
- My sample uses only 3 stages of weight, to keep it clear.
- The outer while simulates your main loop; I only count to 100.
- The array must be initialized with one set of initial numbers, as shown in my sample.
- In every pass of the main loop I get only one random value, and the weighting is maintained throughout.
<?php
$array=array(
    0=>array('item' => 'A', 'weight' => 1),
    1=>array('item' => 'B', 'weight' => 2),
    2=>array('item' => 'C', 'weight' => 3),
);

$etalon_weights=array(1,2,3);
$current_weights=array(0,0,0);
$ii=0;

while($ii<100){ // Simulates your main loop
    // Randomisation cycle
    if($current_weights==$etalon_weights){
        $current_weights=array(0,0,0);
    }
    $ft=true;
    while($ft){
        $curindex=rand(0,(count($array)-1));
        $cur=$array[$curindex];
        if($current_weights[$cur['weight']-1]<$etalon_weights[$cur['weight']-1]){
            echo $cur['item'];
            $array[]=$cur;
            $current_weights[$cur['weight']-1]++;
            $ft=false;
        }
    }
    $ii++;
}
?>
I'll use this input array for my explanation:
$values_and_weights=array(
    "one"=>1,
    "two"=>8,
    "three"=>10,
    "four"=>4,
    "five"=>3,
    "six"=>10
);
The simple version isn't going to work for you because your array is so large. It requires no array modification but may need to iterate the entire array, and that's a deal breaker.
/*
$pick=mt_rand(1,array_sum($values_and_weights));
$x=0;
foreach($values_and_weights as $val=>$wgt){
    if(($x+=$wgt)>=$pick){
        echo "$val";
        break;
    }
}
*/
For your case, re-structuring the array will offer great benefits.
The cost in memory for generating a new array will be increasingly justified as:
- array size increases, and
- the number of selections increases.
The new array requires the replacement of "weight" with a "limit" for each value by adding the previous element's weight to the current element's weight.
Then flip the array so that the limits are the array keys and the values are the array values.
The selection logic is: the selected value will have the lowest limit that is >= $pick.
// Declare new array using array_walk one-liner:
array_walk($values_and_weights,function($v,$k)use(&$limits_and_values,&$x){$limits_and_values[$x+=$v]=$k;});
// Alternative declaration method - 4-liner, foreach() loop:
/*
$x=0;
foreach($values_and_weights as $val=>$wgt){
    $limits_and_values[$x+=$wgt]=$val;
}
*/
var_export($limits_and_values);
$limits_and_values looks like this:
array (
    1 => 'one',
    9 => 'two',
    19 => 'three',
    23 => 'four',
    26 => 'five',
    36 => 'six',
)
Now to generate the random $pick and select the value:
// $x (from walk/loop) is the same as writing: end($limits_and_values); $x=key($limits_and_values);
$pick=mt_rand(1,$x); // pull random integer between 1 and highest limit/key
while(!isset($limits_and_values[$pick])){++$pick;} // smallest possible loop to find key
echo $limits_and_values[$pick]; // this is your random (weighted) value
This approach is brilliant because isset() is very fast and the maximum number of isset() calls in the while loop can only be as many as the largest weight (not to be confused with limit) in the array.
FOR YOUR CASE, THIS APPROACH WILL FIND THE VALUE IN 10 ITERATIONS OR LESS!
Here is my Demo that will accept a weighted array (like $values_and_weights), then in just four lines:
- Restructure the array,
- Generate a random number,
- Find the correct value, and
- Display it.
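Consolidated, those four lines are (the same logic as the snippets above):

array_walk($values_and_weights,function($v,$k)use(&$limits_and_values,&$x){$limits_and_values[$x+=$v]=$k;});
$pick=mt_rand(1,$x);
while(!isset($limits_and_values[$pick])){++$pick;}
echo $limits_and_values[$pick];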
EDIT 1 - Since posting, I have learnt that the underlying question is about how to find the CARTESIAN PRODUCT (now go google). But not only that: I don't want every perm, I want to find the cartesian products that use the same subarray key never more than once per permutation. And my 'extra' question is then more about how to minimise the workload that a cartesian product would require (accepting a small error rate, I have to say).
Imagine... I have four cooks and four recipes, each cook has a score for each recipe and today I'd like each cook to make one dish (but no dish should be made twice) and the decision should be based on the best (highest total scores) permutation for all four (so maybe a cook won't make his personal best).
I have put the data into a multi-dimensional array as such
array(
    array (1,2,3,4),
    array (35,0,0,0),
    array (36,33,1,1),
    array (20,20,5,3)
)
it has the same number of value pairs in each sub-array as the number of sub-arrays (if that helps any)
in reality the number of sub-arrays would reach a maximum of 8 (max perms therefore = 8!, approx 40,000, not 8^8, because many combinations are not allowed)
the choice of having the data in this format is flexible if that helps
I am trying to create a second array that would output the best (i.e. HIGHEST value) possible combination of the sub-arrays, where only ONE element of each subarray can be used:
- so here each subarray [0][1][2][3] would be used once per permutation,
- and each subarray key [0][1][2][3] would be used once per permutation. (In my actual problem I'm using associative arrays, but that is extra to this issue.)
So the example would create an array as such
newArray (35,33,5,4) // note that [2][0] was not used
IDEALLY I would prefer to not produce ALL the perms but rather, SOMEHOW, discard many combinations that would clearly not be the best fit.
Any ideas for how to start? I would accept pseudo code.
For an example on SO about Cartesian Product, see PHP 2D Array output all combinations
EDIT 2
For more on making cartesian products more efficient, and maybe why it has to be case-specific if you want to see whether you can cut corners (with risk), see Efficient Cartesian Product algorithm.
Apologies, but this is going to be more of a logic layout than code...
It's not quite clear to me whether the array(1,2,3,4) are the scores for the first dish or for the first cook, but I would probably use an array such that
$array[$cook_id][$dish_number] = $score;
arsort() each array so that $array[$cook_id] = array($highest_scored_dish, ..., $lowest); (the worked examples below are shown in this order).
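As a sketch, the sorting step could be as simple as:

// Sort each cook's scores so the preferred dish comes first (keys preserved).
foreach ($array as $cook_id => &$dishes) {
    arsort($dishes);
}
unset($dishes); // break the loop reference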
Consider a weighted preference for a particular cook to make a dish to be the difference between the score of the best dish and another.
As a very simple example, cooks a,b,c and dishes 0,1,2
$array['a'] = array(0=>100, 1=>50, 2=>0); // cook a prefers 0 over 1 with weight 50, over 2 with weight 100
$array['b'] = array(0=>100, 1=>100, 2=>50); // cook b prefers 0,1 over 2 with weight 50
$array['c'] = array(0=>50, 1=>50, 2=>100); // cook c prefers 2 with weight 50
After arsort():
$array['a'] = array(0=>100, 1=>50, 2=>0);
$array['b'] = array(0=>100, 1=>100, 2=>50);
$array['c'] = array(2=>100, 0=>50, 1=>50);
Start with cook 'a', who prefers dish 0 over his next best dish by 50 points (weight). Cook 'b' also prefers dish 0, but with a weight of 0 over the next dish. Therefore it's likely (though not yet certain) that cook 'a' should make dish 0.
Consider dish 0 to be reserved and move on to cook 'b'. Excluding dish 0, cook 'b' prefers dish 1. No other cook prefers dish 1, so cook 'b' is assigned dish 1.
Cook 'c' gets dish 2 by default.
This is a VERY convenient example where each cook gets to cook something that's a personal max, but I hope it's illustrative of some logic that would work out.
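Here is a hedged PHP sketch of that greedy logic (the helper name greedyGuess is mine, not from the answer; it implements the weighted-preference idea described above):

// Sketch (assumption): greedy assignment by strongest weighted preference.
// $array[$cook][$dish] = score; assumes equal numbers of cooks and dishes.
function greedyGuess(array $array): array {
    $picks = [];                          // dish => cook
    $cooks = array_keys($array);
    while ($cooks) {
        $bestCook = null; $bestDish = null; $bestWeight = -1;
        foreach ($cooks as $cook) {
            $scores = $array[$cook];
            foreach (array_keys($picks) as $taken) {
                unset($scores[$taken]);   // dishes already reserved
            }
            arsort($scores);              // best remaining dish first
            $vals = array_values($scores);
            $weight = $vals[0] - ($vals[1] ?? 0); // preference over next best
            if ($weight > $bestWeight) {
                $bestWeight = $weight;
                $bestCook = $cook;
                $bestDish = array_key_first($scores); // PHP 7.3+
            }
        }
        $picks[$bestDish] = $bestCook;
        $cooks = array_diff($cooks, [$bestCook]);
    }
    return $picks;
}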
Let's make it less convenient:
$array['a'] = array(0=>75, 1=>50, 2=>0);
$array['b'] = array(0=>100, 1=>50, 2=>50);
$array['c'] = array(0=>100, 1=>25, 2=>25);
Start again with cook 'a' and see that 0 is preferred, but this time with weight 25. Cook 'b' prefers with a weight of 50 and cook 'c' prefers with a weight of 75. Cook 'c' wins dish 0.
Going back to the list of available cooks, 'a' prefers 1 with a weight of 50, but 'b' prefers it with weight 0. 'a' gets dish 1 and 'b' gets dish 2.
This still doesn't take care of all complexities, but it's a step in the right direction. Sometimes the assumption made for the first cook/dish combination will be wrong.
WAY less convenient:
$array['a'] = array(0=>200, 1=>148, 2=>148, 3=>0);
$array['b'] = array(0=>200, 1=>149, 2=>0, 3=>0);
$array['c'] = array(0=>200, 1=>150, 2=>147, 3=>147);
$array['d'] = array(0=>69, 1=>18, 2=>16, 3=>15);
'a' gets 0 since that's the max and no one else who prefers 0 has a higher weight
'b' wins 1 with a weight of 149
'd' wins 2 since 'c' doesn't have a preference from the available options
'c' gets 3
score: 200+149+147+16 = 512
While that's a good guess that's gathered without checking every permutation, it may be wrong. From here, ask, "If one cook traded with any one other cook, would the total increase?"
The answer is YES, a(0)+d(2) = 200+16 = 216, but a(2)+d(0) = 148+69 = 217.
I'll leave it to you to write the code for the "best guess" using the weighted approach, but after that, here's a good start for you:
// a totally uneducated guess...
$picks = array(0=>'a', 1=>'b', 2=>'c', 3=>'d');

do {
    $best_change = false;
    $best_change_weight = 0;
    foreach ($picks as $dish1 => $cook1) {
        foreach ($picks as $dish2 => $cook2) {
            if (($array[$cook1][$dish1] + $array[$cook2][$dish2]) <
                ($array[$cook1][$dish2] + $array[$cook2][$dish1]))
            {
                $old_score = $array[$cook1][$dish1] + $array[$cook2][$dish2];
                $new_score = $array[$cook1][$dish2] + $array[$cook2][$dish1];
                if (($new_score - $old_score) > $best_change_weight) {
                    $best_change_weight = $new_score - $old_score;
                    $best_change = $dish2;
                }
            }
        }
        if ($best_change !== false) {
            // swap the current cook with the cook holding the best improving dish
            $cook2 = $picks[$best_change];
            $picks[$dish1] = $cook2;
            $picks[$best_change] = $cook1; // was $picks[$dish2], a stale inner-loop variable
            break;
        }
    }
} while ($best_change !== false);
I can't find a counter example to show that this doesn't work, but I'm suspicious of the case where
($array[$cook1][$dish1] + $array[$cook2][$dish2]) == ($array[$cook1][$dish2] + $array[$cook2][$dish1])
Maybe someone else will follow up with an answer to this "What if?"
Given this matrix, where the items in brackets are the "picks"
[a1] a2 a3
b1 [b2] b3
c1 c2 [c3]
If a1 + b2 == a2 + b1, then 'a' and 'b' will not switch dishes. The case I'm not 100% sure about is if there exists a matrix such that this is a better choice:
a1 [a2] a3
b1 b2 [b3]
[c1] c2 c3
Getting from the first state to the second requires two switches, the first of which seems arbitrary since it doesn't change the total. But, only by going through this arbitrary change can the last switch be made.
I tried to find an example 3x3 such that based on the "weighted preference" model I wrote about above, the first would be selected, but also such that the real optimum selection is given by the second. I wasn't able to find an example, but that doesn't mean that it doesn't exist. I don't feel like doing more matrix algebra right now, but maybe someone will pick up where I left off. Heck, maybe the case doesn't exist, but I thought I should point out the concern.
If it does work and you start with the correct pick, the above code will only loop through 64 times (8x8) for 8 cooks/dishes. If the pick is not correct and the first cook has a change, then it will go up to 72. If the 8th cook has a change, it's up to 128. It's possible that the do-while will loop several times, but I doubt it will get up near the CPU cycles required to sum all of the 40k combinations.
I may have a starting point for you with this algorithm, which tries to choose cooks based on their ratio of max score to sum of scores (thus trying to make chefs who are really good at one recipe but bad at the rest of the recipes do that recipe):
$cooks = array(
    array(1,2,3,4),
    array(35,0,0,0),
    array(36,33,1,1),
    array(20,20,5,3)
);

$results = array();
while (count($cooks)) {
    $curResult = array(
        'cookId' => -1,
        'recipe' => -1,
        'score' => -1,
        'ratio' => -1
    );
    foreach ($cooks as $cookId => $scores) {
        $max = max($scores);
        $ratio = $max / array_sum($scores);
        if ($ratio > $curResult['ratio']) {
            $curResult['cookId'] = $cookId;
            $curResult['ratio'] = $ratio;
            foreach ($scores as $recipe => $score) {
                if ($score == $max) {
                    $curResult['recipe'] = $recipe;
                    $curResult['score'] = $score;
                }
            }
        }
    }
    $results[$curResult['recipe']] = $curResult['score'];
    unset($cooks[$curResult['cookId']]);
    foreach ($cooks as &$cook) {
        unset($cook[$curResult['recipe']]);
    }
}
For the dataset provided, it does find what seems to be the optimum answer (35,33,5,4). However, it is still not perfect; for example, with the array:
$cooks = array(
    array(1,2,3,4),
    array(35,0,33,0),
    array(36,33,1,1),
    array(20,20,5,3)
);
The ideal answer would be (20,33,33,4); however, this algorithm would return (35,33,5,4).
But since the question was asking for ideas of where to start, I guess this at least might suffice as something to start from :P
Try this
$mainArr = array(
    array (1,2,3,4),
    array (35,0,0,0),
    array (36,33,1,1),
    array (20,20,5,3)
);

$i = 0;
foreach( $mainArr as $subArray )
{
    foreach( $subArray as $key => $value)
    {
        $newArr[$key][$i]=$value;
        $i++;
    }
}

$finalArr = array();
foreach( $newArr as $newSubArray )
{
    $finalArr[] = max($newSubArray);
}
print_r( $finalArr );
OK, here is a solution that allows you to find the best permutation: one cook to one recipe, no cook works twice, and no recipe is made twice.
Thanks for the code to calculate permutations of arrays, which goes to O'Reilly:
http://docstore.mik.ua/orelly/webprog/pcook/ch04_26.htm
CONSIDERATIONS:
- The number of cooks and the number of recipes are the same.
- Going above a 5 by 5 matrix as here will get very big very fast (see part 2, to be posted shortly).
The logic:
A permutation of an array assigns a place as well as just being included (which is all a combination does). So assign each key of such an array to a recipe: the permutation guarantees no cook is repeated, and the keys guarantee no recipe is repeated.
Please let me know if there are improvements or errors in my thinking or my code but here it is!
<?php
function pc_next_permutation($p, $size) {
    // this is from http://docstore.mik.ua/orelly/webprog/pcook/ch04_26.htm
    // slide down the array looking for where we're smaller than the next guy
    for ($i = $size - 1; $p[$i] >= $p[$i+1]; --$i) { }

    // if this doesn't occur, we've finished our permutations
    // the array is reversed: (1, 2, 3, 4) => (4, 3, 2, 1)
    if ($i == -1) { return false; }

    // slide down the array looking for a bigger number than what we found before
    for ($j = $size; $p[$j] <= $p[$i]; --$j) { }

    // swap them
    $tmp = $p[$i]; $p[$i] = $p[$j]; $p[$j] = $tmp;

    // now reverse the elements in between by swapping the ends
    for (++$i, $j = $size; $i < $j; ++$i, --$j) {
        $tmp = $p[$i]; $p[$i] = $p[$j]; $p[$j] = $tmp;
    }
    return $p;
}
$cooks[441] = array(340=>5,342=>43,343=>50,344=>9,345=>0);
$cooks[442] = array(340=>5,342=>-33,343=>-30,344=>29,345=>0);
$cooks[443] = array(340=>5,342=>3,343=>0,344=>9,345=>10);
$cooks[444] = array(340=>25,342=>23,343=>20,344=>19,345=>20);
$cooks[445] = array(340=>27,342=>27,343=>26,344=>39,345=>50);

// a consideration: this solution requires that the number of cooks equal the number of recipes
foreach ($cooks as $cooksCode => $cooksProfile){
    $arrayOfCooks[] = $cooksCode;
    $arrayOfRecipes = array_keys($cooksProfile);
}

echo "<br/> here is the array of the different cooks<br/>";
print_r($arrayOfCooks);
echo "<br/> here is the array of the different recipes<br/>";
print_r($arrayOfRecipes);

$set = $arrayOfCooks;
$size = count($set) - 1;
$perm = range(0, $size);
$j = 0;

do {
    foreach ($perm as $i) { $perms[$j][] = $set[$i]; }
} while ($perm = pc_next_permutation($perm, $size) and ++$j);

echo "<br/> here are all the permutations of the cooks<br/>";
print_r($perms);

$bestCombo = 0;
foreach($perms as $perm){
    $thisScore = 0;
    foreach($perm as $key => $cook){
        $recipe = $arrayOfRecipes[$key];
        $cookScore = $cooks[$cook][$recipe];
        $thisScore = $thisScore + $cookScore;
    }
    if ($thisScore > $bestCombo){
        $bestCombo = $thisScore;
        $bestArray = $perm;
    }
}

echo "<br/> here is the very best array<br/>";
print_r($bestArray);
echo "<br/> best recipe assignment value is:".$bestCombo."<br/><br/>";
?>
The problem is pretty straightforward, I think, by looking at the code. I have a randomized array (the array must be randomized; some code has been excluded because it doesn't pertain to the actual problem, but it does require randomization). Each element in the array has a "probability" index (described here as the value itself, in $rules) giving the probability that, if other conditions are met (removed here for the sake of relevancy), the array element will be "triggered" (in this case, that the array element's score will increment by 1).
Consider the code:
<?php
// Taken from php.net/shuffle user notes
// Shuffles an array order for the sake of foreach while maintaining
// key => value associations
function shuffle_assoc(&$array) {
    $keys = array_keys($array);
    shuffle($keys);
    foreach($keys as $key) {
        $new[$key] = $array[$key];
    }
    return $new;
}

$i = 1000000; // How many tests to perform

// This is my rule list. Each key is a simple color
// and each value is a probability represented as a percent
$rules = array(
    'black' => 20,
    'white' => 10,
    'red' => 40,
    'green' => 5,
    'blue' => 25,
);

// Initialize the scores array with all 0's
// The "outs" will be used when the probability does not
// occur in any of the rules
$scores = array('outs' => 0);
foreach($rules as $k => $v) {
    $scores[$k] = 0;
}

$count = count($rules);

for($x = 0; $x < $i; $x++) {
    $rules = shuffle_assoc($rules);
    foreach($rules as $k => $probability) {
        $rand = mt_rand(1,100);
        //$probability = ??; I've tried applying many different operations here to "correct" the probability
        if($rand > $probability) {
            continue;
        } else {
            $scores[$k]++;
            continue 2;
        }
    }
    $scores['outs']++;
}

foreach($scores as $k => $v) {
    echo "$k: " . (($v/$i)*100) . "% ($v/$i)\n";
}
?>
Expected output (pseudo). Note the percentages correspond with the values of $rules
outs: less than 1% (.../1000000)
black: 20% (.../1000000)
white: 10% (.../1000000)
red: 40% (.../1000000)
green: 5% (.../1000000)
blue: 25% (.../1000000)
Example output:
outs: 30.7128% (307128/1000000)
black: 13.2114% (132114/1000000)
white: 6.3381% (63381/1000000)
red: 29.5247% (295247/1000000)
green: 3.1585% (31585/1000000)
blue: 17.0545% (170545/1000000)
Things I've tried & Considerations:
As you can see, within the loop I have a commented-out section, $probability = ??, where I've tried various obvious-to-me methods of calculating the actual probability to use for each element, including playing with $count (the count of rules), which is why that variable exists and isn't used.
It doesn't have to be exact, obviously, but it should preferably give stable results over a smaller set of numbers (e.g. 1,000 iterations).
It can be pretty fuzzy. A variance of +/- 5% wouldn't hurt my feelings, especially in smaller numbers of iterations; I understand big-number theory comes into play here.
The number of outs isn't a big deal as long as they're less than 1%-2%. I also tried eliminating outs using various methods to see if the outs alone were skewing the results, and interestingly enough, when I did that on one occasion, I got a 20% split all around (i.e. even).
Furthermore, on "outs", I was able to get pretty close to the proper split with very little outs by basically brute-forcing the probability "numbers" (that is, the values of $rules) starting from 100 backwards, but I was never able to find out a precise, optimal method. Each time, I would get closer to the result for one color, that would skew the other colors on a small but noticeable scale. There was no easy-for-me-to-grasp correlation in these numbers and were seemingly random although it is obvious that the results played well with probability vs big numbers.
Tell me there is a precise way to calculate this. It's driving me nuts.
Edit: I have a finalized version of my code, with help from the two answers below, that does this without needing to know the probability percentages before the loop begins, and with no additional or nested loops (which is what I specifically needed; I guess I should have been more direct about that). In the sense that, on each iteration, you could be pulling the probability dynamically based on that specific iteration's properties. All answers here were invaluable; here is my version of the final code: http://pastebin.com/eB3TVP1E
Just normalize the results, accumulate them and then you are done.
What I mean is:
sum all probabilities given for every item of the array to get the total (which is 100 in your case but it's easily generalizable)
divide every probability by the total
So for example:
$rules = array(
    'black' => 20,
    'white' => 10,
    'red' => 40,
    'green' => 5,
    'blue' => 25,
);
will be normalized to:
$rules_norm = array(
    'black' => 0.2,
    'white' => 0.1,
    'red' => 0.4,
    'green' => 0.05,
    'blue' => 0.25,
);
now accumulate the result so that for every element in $rules_norm you calculate the sum of all previous elements plus the current one.
So:
$rules_norm = array(
    'black' => 0.2,
    'white' => 0.3,
    'red' => 0.7,
    'green' => 0.75,
    'blue' => 1.0,
);
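A minimal sketch of the normalize-and-accumulate steps that produce this array (assuming $rules holds the percentages shown earlier):

$total = array_sum($rules); // 100 here, but any total works
$acc = 0;
$rules_norm = array();
foreach ($rules as $k => $v) {
    $acc += $v / $total;    // normalize, then accumulate
    $rules_norm[$k] = $acc;
}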
Now with this you can just extract a random float number in the range [0,1) and choose which element is incremented according to the result: walk the array from the first entry and increment the first element k such that $rand < $rules_norm[k].
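The selection step could then look like this (a sketch; mt_rand()/mt_getrandmax() stands in for a uniform float in [0,1]):

$rand = mt_rand() / mt_getrandmax(); // uniform float in [0, 1]
foreach ($rules_norm as $k => $limit) {
    if ($rand < $limit) {            // first accumulated limit above the draw
        $scores[$k]++;
        break;
    }
}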
Jack's idea implemented in your code (if the sum of probabilities is >100 this won't work):
php fiddle
<?php
// Taken from php.net/shuffle user notes
// Shuffles an array order for the sake of foreach while maintaining
// key => value associations
function shuffle_assoc(&$array) {
    $keys = array_keys($array);
    shuffle($keys);
    foreach($keys as $key) {
        $new[$key] = $array[$key];
    }
    return $new;
}

$i = 1000000; // How many tests to perform

// This is my rule list. Each key is a simple color
// and each value is a probability represented as a percent
$rules = array(
    'black' => 20,
    'white' => 10,
    'red' => 40,
    'green' => 5,
    'blue' => 25,
);

// Initialize the scores array with all 0's
// The "outs" will be used when the probability does not
// occur in any of the rules
$scores = array('outs' => 0);
foreach($rules as $k => $v) {
    $scores[$k] = 0;
}

$count = count($rules);

// $limits is what Jack called $rules_norm
$limits = array();
$limit = 0;
foreach($rules as $k => $v)
{
    $limit += $v;
    $limits[$k] = $limit;
}

for($x = 0; $x < $i; $x++) {
    $rand = mt_rand(1,100);
    foreach($limits as $k => $v)
    {
        if($v >= $rand)
        {
            $scores[$k]++;
            continue 2;
        }
    }
    $scores['outs']++;
}

foreach($scores as $k => $v) {
    echo "$k: " . (($v/$i)*100) . "% ($v/$i)\n";
}
?>
I have a set of elements and I need to choose one element out of it. Each element is associated with a percentage chance, and the percentages add up to 100.
I need to choose one of those elements so that the chance of an element being chosen equals its percent value. So if an element has a 25% chance, it is supposed to have a 25% chance of getting chosen. In other words, if we choose elements 1 million times, that element should be chosen near 250k times.
What you describe is a multinomial process.
http://en.wikipedia.org/wiki/Multinomial_distribution#Sampling_from_a_multinomial_distribution
The way to generate such a random process is like this (I'll use pseudocode, but it should be easy to turn into real code):
Sort the 'boxes' in decreasing order of their probability (not needed; it's just an optimization), so that you have, for example, values = [0.45, 0.3, 0.15, 0.1].
then create the 'cumulative' distribution, which is the sum of all elements with index <=i.
pseudocode:
cumulant = [0,0,0,0] // initialize it
s = 0
for i = 0 to size()-1 {
    s = s + values[i]
    cumulant[i] = s
}
in our case cumulant = [0.45, 0.75, 0.90, 1]
make a uniform random number x between 0 and 1.
For php: http://php.net/manual/en/function.rand.php
the resulting random box index i is
the lowest i for which cumulant[i] >= x
pseudocode:
for i = 0 to size()-1 {
    if !(cumulant[i] < x) {
        print "your index is ", i
        break
    }
}
That is it. To get another random index i, just draw another x and repeat the search.
If you sort as suggested above, the final search will be faster. For example, if you have this vector of probabilities: 0.001 0.001 0.001 0.001 0.996, then after sorting you will almost always only have to look at index i=0, since the random number x will almost always be lower than 0.996. Whether the sort pays off depends on whether you repeatedly use the same 'boxes'; so yes, with 250k tries it will help a lot. Just remember that the box index i you get is for the sorted vector.
I guess it was faster for me to write it than it was for you to show us what you did so far.
Probably not the best solution, but as it stands, it looks like it's the only one you've got.
Here you go:
$elements = array(
    'This' => 25,
    'is' => 15,
    'a' => 15,
    'crappy' => 20,
    'list' => 25
);

asort($elements);
$elements = array_reverse($elements);

// Precalc cumulative value
$cumulant = 0;
foreach ($elements as $key => &$value) {
    $cumulant += $value;
    $value = $cumulant;
}

function pickAnElement($elements) {
    $random = rand(1, 100);
    foreach ($elements as $key => $value) {
        if ($random <= $value) {
            return $key;
        }
    }
}

$picks = array();
for ($i = 0; $i < 10000; $i++) {
    $element = pickAnElement($elements);
    if (!array_key_exists($element, $picks)) {
        $picks[$element] = 0;
    }
    $picks[$element]++;
}
var_dump($picks);
Inspired by Johan's answer, I added a loop to sort and pre-calculate the cumulant.