I've come across a mathematical problem, and I can't work out how to program the logic for it.
Let me explain it with an example:
Let's say I have 4 holes and 3 marbles. The holes are in order, and my marbles are A, B and C, also in order.
I need to get every possible ORDERED combination:
ABC4
AB3C
A2BC
1ABC
This is very simple, but what if the number of holes changes? Let's say now I have 5 holes.
ABC45
AB3C5
A2BC5
1ABC5
AB34C
A2B4C
1AB4C
A23BC
1A3BC
12ABC
Now let's say we have 5 holes and 4 marbles.
ABCD5
ABC4D
AB3CD
A2BCD
1ABCD
And this can be any number of holes and any number of marbles.
The number of combinations is given by:
$combinations = factorial($number_of_holes) / (factorial($number_of_marbles) * factorial($number_of_holes - $number_of_marbles));
(Here is the factorial function, in case you need it:)
function factorial($number) {
if ($number < 2) {
return 1;
} else {
return ($number * factorial($number-1));
}
}
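With the factorial formula above, the intermediate values get huge quickly: 21! is already larger than a 64-bit PHP integer, so factorial() silently switches to floats. A minimal sketch of computing the same count C(holes, marbles) multiplicatively, keeping every intermediate value an exact integer (binomial() is an illustrative name, not a built-in PHP function):

```php
<?php
// Number of ways to place $k ordered marbles into $n holes: C(n, k),
// computed without building the full factorials.
function binomial(int $n, int $k): int
{
    if ($k < 0 || $k > $n) {
        return 0;
    }
    $k = min($k, $n - $k);          // C(n, k) == C(n, n - k); fewer steps
    $result = 1;
    for ($i = 1; $i <= $k; $i++) {
        // After this step $result == C(n - k + i, i), always an integer,
        // so intdiv() never truncates anything.
        $result = intdiv($result * ($n - $k + $i), $i);
    }
    return $result;
}

echo binomial(4, 3), "\n";   // 4  (first example)
echo binomial(5, 3), "\n";   // 10 (second example)
echo binomial(5, 4), "\n";   // 5  (third example)
echo binomial(10, 5), "\n";  // 252
```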
What I need, and can't figure out how to program, is a function (or a loop, or something) that returns an array with the positions of the free holes, given X holes and Y marbles.
For the first example it would be [[4],[3],[2],[1]]; for the second, [[4,5],[3,5],[2,5],[1,5],[3,4],[2,4],[1,4],[2,3],[1,3],[1,2]]; for the third, [[5],[4],[3],[2],[1]].
It doesn't have to be returned in order, I just need all the elements.
As you can see, another way to look at it is the complement (or inverse, I don't know what to call it): the solution is every combination of free-hole positions. With X holes and Y marbles there are X - Y free holes, so I need every combination of X - Y positions taken from 1..X. For example, if I have 10 holes and 5 marbles, there are 5 free holes, and the returned array would be every combination of 5 that can be formed from (1,2,3,4,5,6,7,8,9,10). That is 252 combinations, and what I need is those 252 combinations.
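To check that the two views are equivalent, here is a small sketch that rebuilds a row such as AB3C5 from the number of holes and the positions of the free holes. renderRow() is a hypothetical helper, and the marbles are assumed to be named A, B, C, ... and kept in order:

```php
<?php
// Rebuild a row like "AB3C5" from the hole count and the 1-based free-hole positions.
function renderRow(int $holes, array $freeHoles): string
{
    $free = array_flip($freeHoles);      // position => index, for isset() lookups
    $nextMarble = 0;                     // 0 => 'A', 1 => 'B', ...
    $row = '';
    for ($pos = 1; $pos <= $holes; $pos++) {
        if (isset($free[$pos])) {
            $row .= $pos;                // empty hole: write its number
        } else {
            $row .= chr(ord('A') + $nextMarble);  // next marble, in order
            $nextMarble++;
        }
    }
    return $row;
}

echo renderRow(5, [3, 5]), "\n";   // AB3C5
echo renderRow(5, [1, 2]), "\n";   // 12ABC
```

So listing every combination of free-hole positions is enough to list every row.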
Examples for the 2nd approach:
Given an array=[1,2,3,4], return every combination for sets of 2 and 3.
Sets of 2
[[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]]
Sets of 3
[[1,2,3],[1,2,4],[1,3,4],[2,3,4]]
What I need is the logic to do this. I'm trying to do it in PHP, but I just can't figure out how.
The function would receive the array and the set size and would return the array of sets:
function getCombinations($array, $setsize){
// magic code which I can't figure out
return $sets; // array of sets
}
I hope this is clear enough and that someone can help me. I've been stuck for several days now, and it seems to be just too much for me to handle by myself.
This post, PHP algorithm to generate all combinations of a specific size from a single set, covers all possible combinations where elements can repeat and order doesn't matter. It's a good lead, and I did read it, but it doesn't solve my problem; it's very different. I need the combinations without repeated elements and ordered as explained above.
For example, if I already have the set [3,4] in my array, I don't want [4,3] as another set.
Here's a recursive solution in PHP:
function getCombinations($array, $setsize){
    // $array is expected to be a zero-indexed list, e.g. [1, 2, 3, 4]
    if($setsize == 0)
        return [[]];
    // not enough elements left to fill a combination of this size:
    if(count($array) < $setsize)
        return [];
    // generate combinations including the first element by generating combinations for
    // the remainder of the array with one less element and prepending the first element:
    $sets = getCombinations(array_slice($array, 1), $setsize - 1);
    foreach ($sets as &$combo) {
        array_unshift($combo, $array[0]);
    }
    unset($combo); // drop the reference left over from the by-reference foreach
    // generate combinations not including the first element and add them to the list:
    if(count($array) > $setsize)
        $sets = array_merge($sets, getCombinations(array_slice($array, 1), $setsize));
    return $sets;
}
// test:
print_r(getCombinations([1, 2, 3, 4], 3));
Algorithm works like this:
If setsize is 0 then you return a single, empty combination
Otherwise, generate all combinations that include the first element by recursively generating all combinations of setsize - 1 elements from the rest of the array (excluding the first element), and then prepending the first element to each of them.
Then, if the array size is greater than setsize (meaning including the first element is not compulsory), generate all the combinations for the rest of the list and add them to the ones we generated in the second step.
So basically at each step you need to consider whether an element will be included or excluded in the combination, and merge together the set of combinations representing both choices.
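Applied to the original hole problem, a sketch of the same include/exclude recursion: the free holes are every combination of (holes - marbles) positions taken from range(1, holes). combinationsOf() is an illustrative copy of the function above, written with explicit loops instead of a by-reference foreach:

```php
<?php
// Include/exclude recursion: every $size-element combination of a zero-indexed list.
function combinationsOf(array $items, int $size): array
{
    if ($size === 0) {
        return [[]];                     // exactly one empty combination
    }
    if (count($items) < $size) {
        return [];                       // not enough elements left
    }
    $first = $items[0];
    $rest = array_slice($items, 1);      // re-indexed from 0
    $result = [];
    // Combinations that include $first.
    foreach (combinationsOf($rest, $size - 1) as $combo) {
        array_unshift($combo, $first);
        $result[] = $combo;
    }
    // Combinations that exclude $first.
    foreach (combinationsOf($rest, $size) as $combo) {
        $result[] = $combo;
    }
    return $result;
}

// Hole problem: free holes = combinations of ($holes - $marbles) positions out of 1..$holes.
$holes = 5;
$marbles = 3;
$freeHoleSets = combinationsOf(range(1, $holes), $holes - $marbles);
echo count($freeHoleSets), "\n";   // 10
```

The count of 10 matches the ten rows listed for 5 holes and 3 marbles, and combinationsOf(range(1, 10), 5) yields the 252 combinations from the 10-holes example.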
I want to generate profile IDs in my software. The mt_rand function works well, but I need the IDs to be exactly 10 digits long. Currently I am looping through mt_rand outputs until I get a 10-digit number. The problem I'm facing now is that most of the profile IDs start with 1 and some with 2; none start with any other digit. I understand this happens because of mt_rand's range: by default it returns at most mt_getrandmax(), which is 2147483647, so it can't produce 10-digit numbers that start with 3 or higher.
This is what I am currently doing:
for($i = 0; $i < 200; $i++){
$num = mt_rand();
if(strlen($num) == 10) echo $num."<br>";
}
If you run the above code you will see that all the numbers start with either 1 or 2. Is there any way to fix this?
Edit: I guess I could just reverse the digits, but some numbers end with zero, and this seems like a bit of a hack anyway. Then again, random number generation is a hack in itself, I guess.
Just start your IDs at 1000000001, then ID 2 at 1000000002, ID 543 at 1000000543, and so on?
Alternatively, keep calling mt_rand(1000000001, 9999999999) until you get an ID that does not already exist in your database. This needs 64-bit PHP, since 9999999999 does not fit in a 32-bit integer; capping the max at mt_getrandmax() (2147483647) would bring back the 1-or-2 leading digit problem. This will get more and more CPU-intensive as your database grows larger; when it's almost full, I wouldn't be surprised if it took billions of iterations and several minutes.
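A minimal sketch of this retry loop, assuming a 64-bit PHP 7.1+ build (where mt_rand() accepts a max above mt_getrandmax()) and using a plain array as a hypothetical stand-in for the database lookup; newProfileId() and $existing are illustrative names:

```php
<?php
// Keep drawing until the ID is unused. $existing is a hypothetical stand-in for the
// database: an array whose keys are the IDs already issued.
function newProfileId(array &$existing): int
{
    do {
        $id = mt_rand(1000000001, 9999999999);  // any 10-digit value in the range
    } while (isset($existing[$id]));             // already taken: draw again
    $existing[$id] = true;                       // "insert" into the fake database
    return $id;
}

$existing = [];
for ($i = 0; $i < 5; $i++) {
    echo newProfileId($existing), "\n";
}
```

In a real application the isset() check would be a query against your table; the loop stays cheap only while most of the 10-digit space is still free.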
To elaborate on Rizier's suggestion: a reliable way to ensure a string (even a string of digits) fits a given mold for length and rules is to generate it one character at a time and then join the characters together:
$str = '';
for($loop = 0; $loop < 10; $loop++) {
$str .= mt_rand(0,9);
}
echo $str;
You can then add rules to this. Maybe you don't want a leading 0, so you can add a rule for that. Maybe you want letters too. This way you always get a random string that follows the rules you want.
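A sketch of the digit-by-digit approach with the "no leading 0" rule added: the first digit is drawn from 1-9 and the rest from 0-9, so every 10-digit number is equally likely. randomDigitString() is an illustrative name; it returns a string, which also sidesteps integer-size limits on 32-bit builds:

```php
<?php
// Build a numeric string one digit at a time; the first digit is never 0.
function randomDigitString(int $length): string
{
    $str = (string) mt_rand(1, 9);       // rule: no leading zero
    for ($i = 1; $i < $length; $i++) {
        $str .= mt_rand(0, 9);           // remaining digits: 0-9
    }
    return $str;
}

echo randomDigitString(10), "\n";
```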
You can see this in action here http://3v4l.org/kIRdV
I've been searching for a while to try and arrive at some sort of solution for a problem that is currently roadblocking a task I'm trying to complete.
I've come across a few solutions in other programming languages that I really can't understand despite my attempts at doing so. I've also seen a lot of terminology surrounding this problem such as permutations, refactoring, subset sums, coins in a dollar, etc.
If I'm going about this the wrong way, please do feel free to let me know.
Here's the problem in a nutshell:
Given a set (array) of numbers,
e.g. 2, 3, 7, 14,
how can I find which combinations of those numbers (each number may be used more than once) add up to a specific sum, e.g. 14?
An example of some of the potential combinations for the above example numbers:
3 + 3 + 3 + 3 + 2
7 + 3 + 2 + 2
7 + 7
14
Since the problem I'm trying to solve is in PHP, I'd love it if someone could offer a solution in that language. If not, even a better explanation of what the problem is, and of potential methods for solving it, would be greatly appreciated.
Or again if I might be going about this the wrong way, I'm all ears.
To generate ALL solutions you are going to need some kind of backtracking: "guess" whether the first number is in the solution or not, and recurse for each of the two possibilities (it is used in the sum, or it is not).
Something like the following pseudo-code:
genResults(array, sum, currentResult):
    if (sum == 0): //stop clause, found a series summing to the correct number
        print currentResult
    else if (sum < 0): //failing stop clause, passed the required number
        return
    else if (array.length == 0): //failing stop clause, exhausted the array
        return
    else:
        //find all solutions reachable while using the first number (can use it multiple times)
        currentResult.addLast(array[0])
        genResults(array, sum - array[0], currentResult)
        //clean up
        currentResult.removeLast()
        //find all solutions reachable while NOT using the first number
        genResults(array+1, sum, currentResult)
//in the above, array+1 means the subarray starting from the 2nd element
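A PHP sketch of the pseudocode above. Instead of slicing the array for array+1, it passes the index $start of the first number still allowed, and it assumes every number is positive (a 0 would make the "use it again" branch recurse forever). genResults() keeps the pseudocode's name; $out is an added by-reference parameter that collects the results instead of printing them:

```php
<?php
// Backtracking: every multiset of $nums (from index $start on) that sums to $sum.
function genResults(array $nums, int $start, int $sum, array $current, array &$out): void
{
    if ($sum === 0) {                    // found a series summing to the target
        $out[] = $current;
        return;
    }
    if ($sum < 0 || $start === count($nums)) {
        return;                          // overshot the target, or no numbers left
    }
    // Use $nums[$start]; $start stays the same so it can be used again.
    $current[] = $nums[$start];
    genResults($nums, $start, $sum - $nums[$start], $current, $out);
    array_pop($current);                 // clean up before the other branch
    // Skip $nums[$start] and move on to the next number.
    genResults($nums, $start + 1, $sum, $current, $out);
}

$solutions = [];
genResults([2, 3, 7, 14], 0, 14, [], $solutions);
foreach ($solutions as $s) {
    echo implode(' + ', $s), "\n";
}
```

For [2, 3, 7, 14] and 14 it finds six combinations, including 7 + 7, 14 and 2 + 2 + 3 + 7 (each listed in array order).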
Here's what I have managed to come up with thus far, based on amit's feedback and example, and some other examples.
So far it appears to be working - but I'm not 100% certain.
$totals = array();
$x=0;
function getAllCombinations($ind, $denom, $n, $vals=array()){
global $totals, $x;
if ($n == 0){
foreach ($vals as $key => $qty){
for(; $qty>0; $qty--){
$totals[$x][] = $denom[$key];
}
}
$x++;
return;
}
if ($ind == count($denom)) return;
$currdenom = $denom[$ind];
for ($i=0;$i<=($n/$currdenom);$i++){
$vals[$ind] = $i;
getAllCombinations($ind+1,$denom,$n-($i*$currdenom),$vals);
}
}
$array = array(3, 5, 7, 14);
$sum = 30;
getAllCombinations(0, $array, $sum);
var_dump($totals);
I am just wondering: how unique is an mt_rand() number if you draw 5-digit numbers?
In the example below, I tried to get a list of 500 random numbers with this function, and some of them are repeated.
http://www.php.net/manual/en/function.mt-rand.php
<?php
header('Content-Type: text/plain');
$errors = array();
$uniques = array();
for($i = 0; $i < 500; ++$i)
{
$random_code = mt_rand(10000, 99999);
if(!in_array($random_code, $uniques))
{
$uniques[] = $random_code;
}
else
{
$errors[] = $random_code;
}
}
/**
* If you get any data in this array, it is not exactly unique
* Run this script for few times and you may see some repeats
*/
print_r($errors);
?>
How many digits may be required to ensure that the first 500 random numbers drawn in a loop are unique?
If numbers are truly random, then there's a probability that numbers will be repeated. It doesn't matter how many digits there are -- adding more digits makes it much less likely there will be a repeat, but it's always a possibility.
You're better off checking if there's a conflict, then looping until there isn't, like so:
$uniques = array();
for($i = 0; $i < 500; $i++) {
    do {
        $code = mt_rand(10000, 99999);
    } while(in_array($code, $uniques));
    $uniques[] = $code;
}
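A variation on the same retry idea (not from the original answer): use each drawn code as an array key, so a repeated draw just overwrites an existing key and no in_array() scan over the list is needed:

```php
<?php
// Codes are stored as keys; a duplicate draw overwrites its key, so count() only
// grows when a new code appears.
$uniques = [];
while (count($uniques) < 500) {
    $uniques[mt_rand(10000, 99999)] = true;
}
$codes = array_keys($uniques);           // 500 distinct codes, in first-draw order
print_r(array_slice($codes, 0, 10));
```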
Why not use range, shuffle, and slice?
<?php
$uniques = range(10000, 99999);
shuffle($uniques);
$uniques = array_slice($uniques, 0, 500);
print_r($uniques);
Output:
Array
(
[0] => 91652
[1] => 87559
[2] => 68494
[3] => 70561
[4] => 16514
[5] => 71605
[6] => 96725
[7] => 15908
[8] => 14923
[9] => 10752
[10] => 13816
*** truncated ***
)
This method is less expensive, as it does not search the array each time to see if the item has already been added. That said, it does make this approach less "random". More information should be provided on where these numbers are going to be used. If this is an online gambling site, this would be the worst! However, if this was used for returning "lucky" numbers on a horoscope website, I think it would be fine.
Furthermore, this method could be extended by changing the shuffle step to use mt_rand (whereas the original method simply used rand). It could also use openssl_random_pseudo_bytes, but that might be overkill.
The birthday paradox is at play here. If you pick a random number from 10000-99999 500 times, there's a good chance of duplicates.
Intuitive idea with small numbers
If you flip a coin twice, you'll get a duplicate about half the time. If you roll a six-sided die twice, you'll get a duplicate 1/6 of the time. If you roll it 3 times, you'll get a duplicate 4/9 (44.4%) of the time. If you roll it 4 times, you'll get at least one duplicate 13/18 (72.2%) of the time. Roll it a fifth time and it's 49/54 (90.7%). Roll it a sixth time and it's 98.5%. Roll it a seventh time and it's 100%.
If you replace the six-sided die with a 20-sided die, the probabilities grow a bit more slowly, but grow they do. After 3 rolls you have a 14.5% chance of duplicates. After 7 rolls it's 69.5%. After 11 rolls it's 96.7%, near certainty.
The math
Let's define a function f(num_rolls, num_sides) to generalize this to any number of rolls of any random number generator that chooses out of a finite set of choices. We'll define f(num_rolls, num_sides) to be the probability of getting no duplicates in num_rolls rolls of a num_sides-sided die.
Now we can try to build a recursive definition for this. To get num_rolls unique numbers, you'll need to first roll num_rolls-1 unique numbers, then roll one more unique number, now that num_rolls-1 numbers have been taken. Therefore
f(num_rolls, num_sides) =
f(num_rolls-1, num_sides) * (num_sides - (num_rolls - 1)) / num_sides
Alternately,
f(num_rolls + 1, num_sides) =
f(num_rolls, num_sides) * (num_sides - num_rolls) / num_sides
with the base case f(0, num_sides) = 1 (with no rolls there can be no duplicates).
This function follows an S-shaped decay curve: it starts at 1 and moves very slowly while num_rolls is small (the change with each step is tiny), then picks up speed as num_rolls grows, and eventually tapers off as the function's value gets closer and closer to 0.
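The recurrence translates directly into a loop. A minimal sketch (noDuplicateProbability() is an illustrative name) that reproduces the die figures above and can also be applied to 500 draws from 90000 values:

```php
<?php
// f(num_rolls, num_sides): probability of NO duplicates in num_rolls rolls
// of a num_sides-sided die, built up one roll at a time.
function noDuplicateProbability(int $numRolls, int $numSides): float
{
    if ($numRolls > $numSides) {
        return 0.0;                      // pigeonhole: a repeat is certain
    }
    $p = 1.0;                            // base case: f(0, num_sides) = 1
    for ($k = 0; $k < $numRolls; $k++) {
        // Roll k+1 must avoid the k values already taken.
        $p *= ($numSides - $k) / $numSides;
    }
    return $p;
}

printf("%.3f\n", 1 - noDuplicateProbability(3, 6));        // 0.444, i.e. 4/9
printf("%.3f\n", 1 - noDuplicateProbability(500, 90000));  // the 5-digit mt_rand case
```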
I've created a Google Docs spreadsheet that has this function built in as a formula to let you play with this here: https://docs.google.com/spreadsheets/d/1bNJ5RFBsXrBr_1BEXgWGein4iXtobsNjw9dCCVeI2_8
Tying this back to your specific problem
You've rolled a 90000-sided die 500 times. The spreadsheet above suggests you'd expect at least one duplicate pair about 75% of the time, assuming a perfectly random mt_rand. Mathematically, the operation your code was performing is choosing N elements from a set with replacement. In other words, you pick a random number out of the bag of 90000 things, write it down, then put it back in the bag, then pick another random number, and repeat 500 times. It sounds like you wanted all of the numbers to be distinct; in other words, you wanted to choose N elements from a set without replacement. There are a few algorithms to do this. Dave Chen's suggestion of shuffle and then slice is a relatively straightforward one. Josh from Qaribou's suggestion of separately rejecting duplicates is another possibility.
Your question deals with a variation of the "Birthday Problem" which asks if there are N students in a class, what is the probability that at least two students have the same birthday? See Wikipedia: The "Birthday Problem".
You can easily modify the formula shown there to answer your problem. Instead of having 365 equally probable possibilities for the birthday of each student, you have 90000 (= 99999 - 10000 + 1) equally probable integers that can be generated between 10000 and 99999. The probability that at least two of n such generated numbers will be the same is:
P(n) = 1 - 90000! / ( 90000^n (90000 - n)! ), so P(500) ≈ 0.75
So there is a 75% chance that at least two of the 500 numbers that you generate will be the same or, in other words, only a 25% chance that you will be successful in getting 500 different numbers with the method you are currently using.
As others here have already suggested, I would suggest checking for repeated numbers in your algorithm rather than just blindly generating random numbers and hoping that you don't have a match between any pair of numbers.
I know similar questions come up a lot and there's probably no definitive answer, but I want to generate five unique random numbers from a range of numbers that is potentially very large (maybe 0-20, or 0-1,000,000).
The only catch is that I don't want to have to run while loops or fill an array.
My current method is to simply generate five random numbers from the range minus its last five numbers. If any of the numbers match each other, they go to their respective place at the end of the range. So if the fourth number matches any other number, it will be set to the 4th-from-last number.
Does anyone have a method that is "random enough" and doesn't involve costly loops or arrays?
Please keep in mind this a curiosity, not some mission-critical problem. I would appreciate it if everyone didn't post "why are you having this problem?" answers. I am just looking for ideas.
Thanks a lot!
One random number call is enough.
If you want to choose a subset of 5 unique numbers in the range 1-n, select a random number from 1 to (n choose 5).
Keep a one-to-one mapping from the numbers 1 to (n choose 5) to the possible 5-element subsets, and you are done. This mapping is standard and can be found on the web, for instance here: http://msdn.microsoft.com/en-us/library/aa289166%28VS.71%29.aspx
As an example:
Consider the problem of generating a subset of two numbers from five numbers:
The possible 2-element subsets of {1, ..., 5} are:
1. {1,2}
2. {1,3}
3. {1,4}
4. {1,5}
5. {2,3}
6. {2,4}
7. {2,5}
8. {3,4}
9. {3,5}
10. {4,5}
Now 5 choose 2 is 10.
So we select a random number from 1 to 10. Say we got 8. Now we generate the 8th element in the sequence above: which gives {3,4}, so the two numbers you want are 3 and 4.
The MSDN page I linked to shows you a method to generate the set, given the number, i.e. given 8, it gives back the set {3,4}.
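A sketch of such a mapping in PHP, assuming the lexicographic order listed above; choose() and unrankSubset() are illustrative names, and this is a generic unranking method, not necessarily the one on the linked page. It skips whole blocks of subsets that share the same next element until the remaining rank falls inside a block:

```php
<?php
// Binomial coefficient C(n, k), exact at every step.
function choose(int $n, int $k): int
{
    if ($k < 0 || $k > $n) {
        return 0;
    }
    $r = 1;
    for ($i = 1; $i <= $k; $i++) {
        $r = intdiv($r * ($n - $k + $i), $i);
    }
    return $r;
}

// The $rank-th (1-based) $k-element subset of {1..$n}, in lexicographic order.
function unrankSubset(int $n, int $k, int $rank): array
{
    $r = $rank - 1;                      // 0-based rank
    $subset = [];
    $x = 1;                              // smallest value still allowed
    for ($i = 0; $i < $k; $i++) {
        // C(n - x, k - i - 1) subsets have $x as their next element; skip that block
        // while the remaining rank lies beyond it.
        while (($block = choose($n - $x, $k - $i - 1)) <= $r) {
            $r -= $block;
            $x++;
        }
        $subset[] = $x;
        $x++;
    }
    return $subset;
}

print_r(unrankSubset(5, 2, 8));          // the subset {3, 4}

// Five distinct numbers from 1..20 with a single random call:
$n = 20;
print_r(unrankSubset($n, 5, mt_rand(1, choose($n, 5))));
```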
Your best option is a loop, as in:
$max = 20;
$numels = 5;
$vals = array();
while (count($vals) < $numels) {
$cur = rand(0, $max);
if (!in_array($cur, $vals))
$vals[] = $cur;
}
For small ranges, you can use array_rand (it returns keys, which here equal the values because range() builds a zero-indexed list):
$max = 20;
$numels = 5;
$range = range(0, $max);
$vals = array_rand($range, $numels);
You could also generate a number between 0 and max, another between 0 and max-1, ..., and finally one between 0 and max-4. Then the n-th result is x, calculated in this fashion (the previously chosen numbers must be compared in ascending order):
Take the number generated in the n-th iteration and assign it to x
if it's greater than or equal to the smallest of the previously chosen (and corrected) numbers, increment it
if this new number is greater than or equal to the second smallest of them, increment it
...
if this new number is greater than or equal to the largest of them, increment it
The mapping is like this:
1 2 3 4 5 6 7 8 9 (take 4)
1 2 3 4 5 6 7 8 9 (gives 4)
1 2 3 4 5 6 7 8 (take 5)
1 2 3 5 6 7 8 9 (gives 6)
1 2 3 4 5 6 7 (take 6)
1 2 3 5 7 8 9 (gives 8)
1 2 3 4 5 6 (take 5)
1 2 3 5 7 9 (gives 7)
example, last extraction:
x = 5
x >= 4? x == 6
x >= 6? x == 7
x >= 8? x == 7
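A PHP sketch of this method for the values 0..max (pickDistinct() is an illustrative name). It keeps the earlier picks sorted, because the shifts only work when the new draw is compared against them in ascending order, as in the 4, 6, 8 example above:

```php
<?php
// Draw from 0..max, then 0..max-1, ..., shifting each draw past the earlier picks.
function pickDistinct(int $max, int $count): array
{
    if ($count > $max + 1) {
        throw new InvalidArgumentException('Not enough distinct values in range.');
    }
    $sortedTaken = [];                    // earlier picks, ascending
    $picks = [];                          // picks in draw order
    for ($i = 0; $i < $count; $i++) {
        $x = mt_rand(0, $max - $i);       // one fewer candidate each round
        foreach ($sortedTaken as $t) {
            if ($x < $t) {
                break;                    // every remaining taken value is larger
            }
            $x++;                         // skip over an already-taken value
        }
        $sortedTaken[] = $x;
        sort($sortedTaken);
        $picks[] = $x;
    }
    return $picks;
}

print_r(pickDistinct(20, 5));
```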
The general form of this question is really interesting. Should one select from a pool of elements (and remove them from the pool) or should one loop "while hitting" an already taken element?
As far as I can tell, the Python library implementation of random.sample chooses at runtime between the two methods, depending on the size of the input list relative to the number of elements to select.
A comment from the source code:
# When the number of selections is small compared to the
# population, then tracking selections is efficient, requiring
# only a small set and an occasional reselection. For
# a larger number of selections, the pool tracking method is
# preferred since the list takes less space than the
# set and it doesn't suffer from frequent reselections.
In the specific instance that the OP mentions however (selecting 5 numbers), I think that looping "while hitting a taken number" is ok, unless the pseudo random generator is broken.
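A sketch of choosing between the two methods at runtime in PHP. The cut-off ($count * 4 > $size) is an arbitrary, hypothetical threshold picked for illustration, and sampleRange() is an illustrative name; Python computes its own threshold differently:

```php
<?php
// Sample $count distinct values from 0..$size-1, picking the strategy by proportion.
function sampleRange(int $size, int $count): array
{
    if ($count > $size) {
        throw new InvalidArgumentException('Sample larger than population.');
    }
    if ($count * 4 > $size) {
        // Large share of the population: pool method (shuffle, then take a prefix).
        $pool = range(0, $size - 1);
        shuffle($pool);
        return array_slice($pool, 0, $count);
    }
    // Small share: remember selections and simply re-draw on a collision.
    $seen = [];
    while (count($seen) < $count) {
        $seen[mt_rand(0, $size - 1)] = true;
    }
    return array_keys($seen);
}

print_r(sampleRange(1000000, 5));   // selection-tracking branch
print_r(sampleRange(10, 8));        // pool branch
```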
Since you are just looking for different ideas here's one:
Call out to Random.org to generate the set of random numbers you need.
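If you know the size N, walk through the items and keep each one with probability (items still needed) / (items still remaining): generate a random number between 0 and 1 and keep the item if it is less than that ratio. Stop when we have 5 items. (A fixed probability of 5/N would not guarantee that you end up with exactly 5 items.)
If we don't know N, use reservoir sampling.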
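Sketches of both single-pass approaches in PHP, under illustrative names: selectionSample() for a known N (Knuth's selection sampling, Algorithm S) and reservoirSample() for a stream of unknown length (Algorithm R). The iterable type hint assumes PHP 7.1 or later:

```php
<?php
// Known N: keep each item with probability (still needed) / (still remaining).
// Always yields exactly $k items when $k <= N.
function selectionSample(array $items, int $k): array
{
    $n = count($items);
    $chosen = [];
    foreach (array_values($items) as $i => $item) {
        $needed = $k - count($chosen);
        if ($needed === 0) {
            break;
        }
        $remaining = $n - $i;
        if (mt_rand(0, $remaining - 1) < $needed) {   // true with prob. needed/remaining
            $chosen[] = $item;
        }
    }
    return $chosen;
}

// Unknown N: fill the reservoir with the first $k items, then let item number $seen
// replace a random slot with probability $k / $seen.
function reservoirSample(iterable $stream, int $k): array
{
    $reservoir = [];
    $seen = 0;
    foreach ($stream as $item) {
        $seen++;
        if ($seen <= $k) {
            $reservoir[] = $item;
        } else {
            $j = mt_rand(0, $seen - 1);               // uniform over 0..seen-1
            if ($j < $k) {
                $reservoir[$j] = $item;
            }
        }
    }
    return $reservoir;
}

// A generator stands in for a source whose length is not known up front.
function numbersUpTo(int $max): Generator
{
    for ($i = 0; $i <= $max; $i++) {
        yield $i;
    }
}

print_r(selectionSample(range(1, 20), 5));
print_r(reservoirSample(numbersUpTo(100000), 5));
```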
An implementation of Artefacto's second solution above in C#, as a helper and an extension method on ICollection:
using System;
using System.Collections.Generic;
using System.Linq;
static class Program {
public static IEnumerable<int> Subset(int max) {
Random random = new Random();
List<int> selections = new List<int>();
for (int space = max; space > 0; space--) {
int selection = random.Next(space);
int offset = selections.TakeWhile((n, i) => n <= selection + i).Count();
selections.Insert(offset, selection + offset);
yield return selection + offset;
}
}
public static IEnumerable<T> Random<T>(this ICollection<T> collection) {
return Subset(collection.Count).Select(collection.ElementAt);
}
static void Main(string[] args) {
Subset(10000).Take(10).ToList().ForEach(Console.WriteLine);
"abcdefghijklmnopqrstuvwxyz".ToArray().Random().Take(5).ToList().ForEach(Console.WriteLine);
}
}
I know we are trying to avoid loops, but just in case this helps someone, you can use a HashSet instead of a List. This is very efficient on a sparse collection where collisions are somewhat rare.
var hs = new HashSet<int>();
var rand = new Random();
for(int i=0; i<10000; i++)
{
int n;
while(true)
{
n = rand.Next(0, 10000000);
if(!hs.Contains(n)) {break;}
}
hs.Add(n);
}