How unique is a 5-digit mt_rand() number? - php

I am just wondering how unique an mt_rand() number is if you draw 5-digit numbers.
In the example, I tried to get a list of 500 random numbers with this function and some of them are repeated.
http://www.php.net/manual/en/function.mt-rand.php
<?php
header('Content-Type: text/plain');
$errors = array();
$uniques = array();
for($i = 0; $i < 500; ++$i)
{
$random_code = mt_rand(10000, 99999);
if(!in_array($random_code, $uniques))
{
$uniques[] = $random_code;
}
else
{
$errors[] = $random_code;
}
}
/**
* If you get any data in this array, it is not exactly unique
* Run this script for few times and you may see some repeats
*/
print_r($errors);
?>
How many digits may be required to ensure that the first 500 random numbers drawn in a loop are unique?

If numbers are truly random, then there's a probability that numbers will be repeated. It doesn't matter how many digits there are -- adding more digits makes it much less likely there will be a repeat, but it's always a possibility.
You're better off checking if there's a conflict, then looping until there isn't like so:
$uniques = array();
for($i = 0; $i < 500; $i++) {
    do {
        $code = mt_rand(10000, 99999);
    } while(in_array($code, $uniques));
    $uniques[] = $code;
}

Why not use range, shuffle, and slice?
<?php
$uniques = range(10000, 99999);
shuffle($uniques);
$uniques = array_slice($uniques, 0, 500);
print_r($uniques);
Output:
Array
(
[0] => 91652
[1] => 87559
[2] => 68494
[3] => 70561
[4] => 16514
[5] => 71605
[6] => 96725
[7] => 15908
[8] => 14923
[9] => 10752
[10] => 13816
*** truncated ***
)
This method is less expensive as it does not search the array each time to see if the item is already added or not. That said, it does make this approach less "random". More information should be provided on where these numbers are going to be used. If this is an online gambling site, this would be the worst! However if this was used in returning "lucky" numbers for a horoscope website, I think it would be fine.
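Furthermore, this method could be extended by changing the shuffle step to use mt_rand (older PHP versions of shuffle used the rand generator; since PHP 7.1 shuffle already uses the Mersenne Twister internally). It could also use a cryptographically secure source such as random_int or openssl_random_pseudo_bytes, but that might be overkill.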
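As a rough sketch of that extension, here is a partial Fisher-Yates shuffle that swaps in random_int() as the source of randomness. The function name drawUnique and its parameters are made up for illustration; it is one possible way to do it, not the only one.

```php
<?php
// Draw $count distinct integers from [$min, $max] using a partial
// Fisher-Yates shuffle. drawUnique() is a hypothetical helper name.
function drawUnique(int $min, int $max, int $count): array
{
    $pool = range($min, $max);
    $last = count($pool) - 1;
    if ($count > $last + 1) {
        throw new InvalidArgumentException('Not enough values in the range.');
    }
    for ($i = 0; $i < $count; $i++) {
        // Pick a random position in the not-yet-fixed tail and swap it into slot $i.
        $j = random_int($i, $last);
        [$pool[$i], $pool[$j]] = [$pool[$j], $pool[$i]];
    }
    // Only the first $count slots were shuffled; they are the sample.
    return array_slice($pool, 0, $count);
}

$codes = drawUnique(10000, 99999, 500);
print_r(array_slice($codes, 0, 5));
```

Only the first 500 positions are shuffled, so the loop does 500 swaps instead of shuffling all 90000 values.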

The birthday paradox is at play here. If you pick a random number from 10000-99999 500 times, there's a good chance of duplicates.
Intuitive idea with small numbers
If you flip a coin twice, you'll get a duplicate about half the time. If you roll a six-sided die twice, you'll get a duplicate 1/6 of the time. If you roll it 3 times, you'll get a duplicate 4/9 (44%) of the time. If you roll it 4 times you'll get at least one duplicate 13/18 (72.2%) of the time. Roll it a fifth time and it's 49/54 (90.7%). Roll it a sixth time and it's 98.5%. Roll it a seventh time and it's 100%.
If you replace the six-sided die with a 20-sided die, the probabilities grow a bit more slowly, but grow they do. After 3 rolls you have a 14.5% chance of duplicates. After 7 rolls it's 69.5%. After 11 rolls it's 96.7%, near certainty.
The math
Let's define a function f(num_rolls, num_sides) to generalize this to any number of rolls of any random number generator that chooses out of a finite set of choices. We'll define f(num_rolls, num_sides) to be the probability of getting no duplicates in num_rolls of a num_sides-side die.
Now we can try to build a recursive definition for this. To get num_rolls unique numbers, you'll need to first roll num_rolls-1 unique numbers, then roll one more unique number, now that num_rolls-1 numbers have been taken. Therefore
f(num_rolls, num_sides) =
f(num_rolls-1, num_sides) * (num_sides - (num_rolls - 1)) / num_sides
Alternately,
f(num_rolls + 1, num_sides) =
f(num_rolls, num_sides) * (num_sides - num_rolls) / num_sides
This function follows an S-shaped decay curve (roughly exp(-num_rolls^2 / (2 * num_sides)) while num_rolls is small compared to num_sides), starting at 1 and moving very slowly (since num_rolls is very low, the change with each step is very small), then slowly picking up speed as num_rolls grows, then eventually tapering off as the function's value gets closer and closer to 0.
I've created a Google Docs spreadsheet that has this function built in as a formula to let you play with this here: https://docs.google.com/spreadsheets/d/1bNJ5RFBsXrBr_1BEXgWGein4iXtobsNjw9dCCVeI2_8
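If you'd rather not open a spreadsheet, the same recurrence can be sketched in a few lines of PHP. The function name duplicateProbability is my own; it returns 1 - f(num_rolls, num_sides), i.e. the chance of at least one duplicate.

```php
<?php
// Probability of at least one duplicate in $numRolls draws from $numSides
// equally likely values, built up step by step with the recurrence above.
function duplicateProbability(int $numRolls, int $numSides): float
{
    $noDuplicates = 1.0; // f(0, num_sides): zero rolls cannot contain a duplicate
    for ($rolls = 0; $rolls < $numRolls; $rolls++) {
        // f(rolls + 1) = f(rolls) * (num_sides - rolls) / num_sides
        $noDuplicates *= max(0, $numSides - $rolls) / $numSides;
    }
    return 1.0 - $noDuplicates;
}

printf("%.3f\n", duplicateProbability(4, 6));       // six-sided die, 4 rolls
printf("%.3f\n", duplicateProbability(500, 90000)); // 500 draws from 10000-99999
```

The first call should come out at about 72% and the second at about 75%.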
Tying this back to your specific problem
You've effectively rolled a 90000-sided die 500 times. The spreadsheet above suggests you'd expect at least one duplicate pair about 75% of the time, assuming a perfectly random mt_rand. Mathematically, the operation your code was performing is choosing N elements from a set with replacement. In other words, you pick a random number out of the bag of 90000 things, write it down, then put it back in the bag, then pick another random number, and repeat 500 times. It sounds like you wanted all of the numbers to be distinct; in other words, you wanted to choose N elements from a set without replacement. There are a few algorithms to do this. Dave Chen's suggestion of shuffle and then slice is a relatively straightforward one. Josh from Qaribou's suggestion of separately rejecting duplicates is another possibility.

Your question deals with a variation of the "Birthday Problem" which asks if there are N students in a class, what is the probability that at least two students have the same birthday? See Wikipedia: The "Birthday Problem".
You can easily modify the formula shown there to answer your problem. Instead of having 365 equally probable possibilities for the birthday of each student, you have 90000 (=99999-10000+1) equally probable integers that can be generated between 10000 and 99999. The probability that at least two of 500 such numbers will be the same is:
P(500) = 1 - 90000! / ( 90000^500 (90000 - 500)! ) ≈ 0.75
So there is a 75% chance that at least two of the 500 numbers that you generate will be the same or, in other words, only a 25% chance that you will be successful in getting 500 different numbers with the method you are currently using.
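For a quick estimate without factorials, the usual birthday-problem approximation can be written out as below. Here n is the number of draws and N the number of possible values; N = 90000 for the range 10000 to 99999. This is a sketch of the standard approximation, valid when n is small compared to N.

```latex
P(n) \;=\; 1 - \prod_{i=0}^{n-1}\left(1 - \frac{i}{N}\right)
     \;\approx\; 1 - e^{-n(n-1)/(2N)}

% n = 500, N = 90000 (5-digit numbers):
P(500) \approx 1 - e^{-249500/180000} = 1 - e^{-1.386} \approx 0.75

% n = 500, N = 90000000 (8-digit numbers):
P(500) \approx 1 - e^{-249500/180000000} \approx 0.0014
```

So going to 8 digits drops the chance of a repeat among 500 draws to roughly 0.14%, but no number of digits brings it to exactly zero.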
As others here have already suggested, I would suggest checking for repeated numbers in your algorithm rather than just blindly generating random numbers and hoping that you don't have a match between any pair of numbers.

Related

Making a coinflip in php (Provably fair)

So I'm trying to create a website with a coinflip system (it's just a small project I'm doing in my free time) but I don't really know where to begin. I need to make it in PHP (so it's in the backend) and I need it to be provably fair (so I can prove that it is legit). What I've found out is that I need to use something like SHA-256, but I also heard that it's pretty outdated and can be easily cracked. Also, if it matters, it's a site with a Steam login system, so I plan on being able to join 1v1s with other Steam users, not just a person sitting beside me or something (not just 1 button is what I mean hehe).
EDIT: I have googled it and tried asking people I know and etc if they knew anything but nothing was any good.
Thanks in advance
-Eiríkur
This is a simple way to get a random coin toss result:
$result = array("heads", "tails")[random_int(0,1)];
First, we make an array, which will be our choices. array("heads", "tails") means we will always get one of those 2 results. Next, in the same line, we select a single element from that array to assign to the $result variable. We can use random_int(min, max) to generate the index of that element.
Note: random_int() generates cryptographic random integers that are
suitable for use where unbiased results are critical, such as when
shuffling a deck of cards for a poker game.
http://php.net/manual/en/function.random-int.php
As a bonus, you could add more elements to this array, and then just increase the max value in random_int(), and it will work. You could make this more dynamic as-well by doing it like this:
$choices = ["heads", "tails", "Coin flew off the table"];
$result = $choices[random_int(0, count($choices)-1)];
With the above code, you can have as many choices as you'd like!
Testing
I ran this code 50,000 times, and these were my results.
Array
(
[heads] => 24923
[tails] => 25077
)
And I ran this code 100,000 times, these were my results:
Array
(
[tails] => 49960
[heads] => 50040
)
You can play around with this here, to check out results:
https://eval.in/894945
The answer above might be the best for most of the scenarios.
In commercial usage, you might want to make sure that the results can be recalculated to prove fairness.
In the following code, you need to calculate a seed for the server. Besides, you might also want to create a public seed that users can see. Those can be anything, but I do recommend using some kind of hash. Each time you need a new result just increase the round; it will generate a new result that looks random but can be recalculated from the seeds.
$server_seed = "96f3ea4d221ca1b2048cc3b3b844e479f2bd9c80a870628072ee98fd1aa83cd0";
$public_seed = "460679512935";
for($round = 0;$round < 10;$round++) {
$hash = hash('sha256', $server_seed . "-" . $public_seed . "-" . $round);
if (hexdec(substr($hash, 0, 8)) % 2) {
echo 'heads', PHP_EOL;
} else {
echo 'tails', PHP_EOL;
}
}
This code loops 10 times using a for loop, each time generating a new result. In the code, we assign a SHA256 hash to the $hash variable. Then we convert the first 8 hex characters of $hash to a decimal value using PHP's built-in function hexdec. We take the remainder of that value divided by 2 and pick the result based on whether it is 0 or not.
NOTE You can play around with the values. Changing the substring to substr($hash, 0, 14) reads a different part of the hash, so individual outcomes change. Keep in mind that this will not change the overall distribution of the results in any way.
Average results of 1 000 000 runs were the following:
Heads: 50.12%
Tails: 49.88%
You can experiment with the code above.
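To let players actually recalculate a result, one common arrangement (an assumption on my part, not something the code above does) is to publish a hash of the server seed before the game and reveal the seed afterwards. A minimal sketch, with made-up helper names flipResult, commitment and verifyFlip:

```php
<?php
// Hypothetical helper names; hash(), hash_equals(), hexdec(), substr(),
// bin2hex() and random_bytes() are the only PHP built-ins used.

// Same rule as the loop above: hash the seeds and the round, then use the parity.
function flipResult(string $serverSeed, string $publicSeed, int $round): string
{
    $hash = hash('sha256', $serverSeed . '-' . $publicSeed . '-' . $round);
    return hexdec(substr($hash, 0, 8)) % 2 ? 'heads' : 'tails';
}

// Published before the round, so the server cannot swap seeds afterwards.
function commitment(string $serverSeed): string
{
    return hash('sha256', $serverSeed);
}

// Run by a player once the server seed has been revealed.
function verifyFlip(string $commitment, string $serverSeed, string $publicSeed, int $round, string $claimed): bool
{
    return hash_equals($commitment, hash('sha256', $serverSeed))
        && flipResult($serverSeed, $publicSeed, $round) === $claimed;
}

$serverSeed = bin2hex(random_bytes(32)); // kept secret until the game ends
$publicSeed = '460679512935';
$published  = commitment($serverSeed);
$result     = flipResult($serverSeed, $publicSeed, 0);

var_dump(verifyFlip($published, $serverSeed, $publicSeed, 0, $result)); // bool(true)
```

Whether this is enough for a real-money site depends on details outside this sketch, such as who chooses the public seed and when.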

Getting every combination of X numbers given Y numbers?

I've come across a mathematical problem for which I can't program the logic.
Let me explain it with an example:
Let's say I have 4 holes and 3 marbles, the holes are in order and my marbles are A,B and C and also in order.
I need to get every possible ORDERED combination:
ABC4
AB3C
A2BC
1ABC
This is very simple, but what if the number of holes changes? Let's say now I have 5 holes.
ABC45
AB3C5
A2BC5
1ABC5
AB34C
A2B4C
1AB4C
A23BC
1A3BC
12ABC
Now let's say we have 5 holes and 4 marbles.
ABCD5
ABC4D
AB3CD
A2BCD
1ABCD
And this can be any number of holes and any number of marbles.
The number of combinations is given by:
$combinations = factorial($number_of_holes)/(factorial($number_of_marbles)*factorial($number_of_holes-$number_of_marbles));
(Here it is the factorial function in case you need it)
function factorial($number) {
if ($number < 2) {
return 1;
} else {
return ($number * factorial($number-1));
}
}
What I need and can't figure out how to program, is a function or a loop or something, that returns an array with the position of the holes, given X numbers of holes and Y number of marbles.
For the first example it would be: [[4],[3],[2],[1]], for the second: [[4,5],[3,5],[2,5],[1,5],[3,4],[2,4],[1,4],[2,3],[1,3],[1,2]], for the third: [[5],[4],[3],[2],[1]].
It doesn't have to be returned in order, I just need all the elements.
As you can see, another approach is the complementary one (or inverse, I don't know what to call it): the solution is every combination of X free holes given Y holes. So if I have 10 holes and 5 marbles, there would be 5 free holes, and the array returned would be every combination of 5 that can be formed with (1,2,3,4,5,6,7,8,9,10), which is 252 combinations, and what I need is those 252 combinations.
Examples for the 2nd approach:
Given an array=[1,2,3,4], return every combination for sets of 2 and 3.
Sets of 2
[[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]]
Sets of 3
[[1,2,3],[1,2,4],[1,3,4],[2,3,4]]
What I need is the logic to do this, I'm trying to do it in PHP, but I just can't figure out how to do it.
The function would receive the array and the set size and would return the array of sets:
function getCombinations($array, $setsize){
    //magic code which I can't figure out
    return $sets; // the array of sets
}
I hope this is clear enough and someone can help me, I've been stuck for several days now, but it seems to be just too much for me to handle by myself.
This post, PHP algorithm to generate all combinations of a specific size from a single set, is for all possible combinations, repeating the elements and order doesn't matter, its a good lead, I did read it, but it doesn't solve my problem, it's very different. I need them without repeating the elements and ordered as explained.
Let's say if I have already a set of [3,4] in my array, I don't want [4,3] as an other set.
Here's a recursive solution in PHP:
function getCombinations($array, $setsize){
    if($setsize == 0)
        return [[]];
    // generate combinations including the first element by generating combinations for
    // the remainder of the array with one less element and prepending the first element:
    $sets = getCombinations(array_slice($array, 1), $setsize - 1);
    foreach ($sets as &$combo) {
        array_unshift($combo, $array[0]);
    }
    unset($combo); // break the reference left over from the loop
    // generate combinations not including the first element and add them to the list:
    if(count($array) > $setsize)
        $sets = array_merge($sets, getCombinations(array_slice($array, 1), $setsize));
    return $sets;
}
// test:
print_r(getCombinations([1, 2, 3, 4], 3));
Algorithm works like this:
If setsize is 0 then you return a single, empty combination
Otherwise, generate all combinations that include the first element, by recursively generating all combinations off the array excluding the first element with setsize - 1 elements, and then prepending the first element to each of them.
Then, if the array size is greater than setsize (meaning including the first element is not compulsory), generate all the combinations for the rest of the list and add them to the ones we generated in the second step.
So basically at each step you need to consider whether an element will be included or excluded in the combination, and merge together the set of combinations representing both choices.

How to generate card number so the users cannot follow how much is sold?

I want some generator script to generate unique numbers but not in one order. We need to sell tickets.
For example currently ticket numbers are like this:
100000
100001
100002
...
So the users can see how many are sold.
How can I generate unique numbers?
for example:
151647
457561
752163
...
I could use a random number generator, but then I always have to check in the database whether such a number has already been generated.
Hmm, maybe with an index on that column the check would not take long.
Right now I still have to get the last card number if I want to add 1 to it, but getting the last one is fast enough.
And the more tickets are sold, the bigger the chance that the RNG will generate an existing number, so there might be more checks in the future. So the best would be to take the last number and generate the next one from it.
Here's a simple way to scramble ticket numbers (note: you need 64-bit PHP, or change the code to use the bcmath library):
function scramble($number) {
return (305914*($number-100000)+151647) % 999983;
}
Look, the output even looks like your example:
Input Output
------ ------
100000 151647
100001 457561
100002 763475
100003 069406
If you want to you can reverse it, so you can use these codes in URLs and then recover the original number:
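function unscramble($number) {
    // PHP's % can return a negative value, so normalize before adding the base
    $offset = (605673*($number-151647)) % 999983;
    return ($offset < 0 ? $offset + 999983 : $offset) + 100000;
}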
Is this safe? Someone with access to many sequential numbers can find the pattern so don't use this if the ticket numbers are extremely sensitive.
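In case you want to pick your own multiplier: the constant 605673 in unscramble is the modular inverse of 305914 modulo 999983. A small sketch of how such an inverse can be computed with the extended Euclidean algorithm (modularInverse is a name I made up):

```php
<?php
// Extended Euclid: returns x such that ($a * $x) % $modulus == 1.
// Only works when $a and $modulus share no common factor.
function modularInverse(int $a, int $modulus): int
{
    [$oldR, $r] = [$a, $modulus];
    [$oldS, $s] = [1, 0];
    while ($r !== 0) {
        $q = intdiv($oldR, $r);
        [$oldR, $r] = [$r, $oldR - $q * $r];
        [$oldS, $s] = [$s, $oldS - $q * $s];
    }
    if ($oldR !== 1) {
        throw new InvalidArgumentException('No inverse: the numbers are not coprime.');
    }
    // $oldS may be negative; bring it into the range 0 .. $modulus - 1.
    return (($oldS % $modulus) + $modulus) % $modulus;
}

echo modularInverse(305914, 999983), "\n"; // 605673
```

Any multiplier that is coprime with the modulus works the same way; the modulus also limits how many distinct ticket numbers you can scramble.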
Generate random numbers and put a unique index on the ticket number column. Insert the record with the new ticket; if the insert fails, you had a collision, so generate another ID and try again. With a large random space, say a 32-bit integer, the chance of a collision is minimal. The lookup behind it is lightning fast if the column is numeric and indexed.
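A minimal sketch of that insert-and-retry idea, assuming PDO with the pdo_sqlite extension and an in-memory database. The table name tickets, the column ticket_number and the function issueTicket are invented for the example.

```php
<?php
// Assumes the pdo_sqlite extension; table and column names are made up.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE tickets (ticket_number INTEGER NOT NULL UNIQUE)');

// Insert a random ticket number, retrying whenever the unique index rejects it.
function issueTicket(PDO $db, int $min = 100000, int $max = 999999): int
{
    $insert = $db->prepare('INSERT INTO tickets (ticket_number) VALUES (?)');
    while (true) {
        $number = random_int($min, $max);
        try {
            $insert->execute([$number]);
            return $number;
        } catch (PDOException $e) {
            // SQLSTATE 23000 is an integrity constraint violation: number already taken.
            if ((string) $e->getCode() !== '23000') {
                throw $e;
            }
        }
    }
}

$tickets = [];
for ($i = 0; $i < 1000; $i++) {
    $tickets[] = issueTicket($db);
}
echo count($tickets), " tickets issued\n";
```

Note that the loop never ends once every number in the range has been issued, so a real version would want a retry limit or a larger range.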
You can pre-generate your numbers and store them in a pool. When you need a new number, pick one at a random index of the pool, remove it from the pool, and return it.
If the pool is nearly exhausted, just generate another batch.
function generateCode(callable $codeExists) {
    $chars = '0123456789';
    do {
        $code = '';
        for ($x = 0; $x < 6; $x++) {
            $code .= $chars[ rand(0, strlen($chars)-1) ];
        }
        // $codeExists is a hypothetical callback that checks the database
        // and returns true if this code has been generated earlier
    } while ($codeExists($code));
    return $code;
}
The easy way: you can simply use the md5() function.
To get a 6-character string (hexadecimal, so it can contain the letters a-f, and not guaranteed to be unique), you can do
$x = md5(microtime());
echo substr($x, 0, 6);
Edit:
session_start();
$x = md5(microtime().session_id());
echo substr($x, 0, 6);

What is the best algorithm to see if my number is in an array of ranges?

I have a 2 dimensional arrays in php containing the Ranges. for example:
From.........To
---------------
125..........3957
4000.........5500
5217628......52198281
52272128.....52273151
523030528....523229183
and so on
and it is a very long list. now I want to see if a number given by user is in range.
for example numbers 130, 4200, 52272933 are in my range but numbers 1, 5600 are not.
of course I can count all indexes and see if my number is bigger than first and smaller than second item. but is there a faster algorithm or a more efficient way of doing it using php function?
added later
It is sorted. it is actually numbers created with ip2long() showing all IPs of a country.
I just wrote a code for it:
$ips[1] = array (2,20,100);   // range starts
$ips[2] = array (10,30,200);  // range ends
$n=11;// input ip
$count = count($ips[1]);
$found = false;
for ($i = 0; $i < $count; $i++) {
    if ($n < $ips[1][$i]) {
        break; // ranges are sorted, so no later range can match
    }
    if ($n <= $ips[2][$i]) {
        echo "$i found";
        $found = true;
        break;
    }
}
if (!$found) {
    echo "not found";
}
Put the ranges in a flat array, sorted from lower to higher, like this:
a[0] = 125
a[1] = 3957
a[2] = 4000
a[3] = 5500
a[4] = 5217628
a[5] = 52198281
a[6] = 52272128
a[7] = 52273151
a[8] = 523030528
a[9] = 523229183
Then do a binary search to determine at what index of this array the number in question should be inserted. If the insertion index is even then the number is not in any sub-range. If the insertion index is odd, then the number falls inside one of the ranges.
Examples:
n = 20 inserts at index 0 ==> not in a range
n = 126 inserts at index 1 ==> within a range
n = 523030529 inserts at index 9 ==> within a range
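Here is a rough PHP sketch of that lookup. The function name inAnyRange is mine, and I'm assuming the ranges are inclusive at both ends, so a number equal to a lower bound needs one extra check on top of the even/odd rule.

```php
<?php
// $bounds is the flat, sorted array [from0, to0, from1, to1, ...] of
// non-overlapping, inclusive ranges. inAnyRange() is a hypothetical name.
function inAnyRange(array $bounds, int $n): bool
{
    $low = 0;
    $high = count($bounds);
    // Lower-bound binary search: first index whose value is >= $n.
    while ($low < $high) {
        $mid = intdiv($low + $high, 2);
        if ($bounds[$mid] < $n) {
            $low = $mid + 1;
        } else {
            $high = $mid;
        }
    }
    // Odd insertion index: inside a range, or equal to its upper bound.
    if ($low % 2 === 1) {
        return true;
    }
    // Even insertion index: only a hit when $n equals the lower bound stored there.
    return $low < count($bounds) && $bounds[$low] === $n;
}

$bounds = [125, 3957, 4000, 5500, 5217628, 52198281,
           52272128, 52273151, 523030528, 523229183];

var_dump(inAnyRange($bounds, 4200)); // bool(true)
var_dump(inAnyRange($bounds, 5600)); // bool(false)
```

Each lookup takes about log2 of the array length comparisons, so even a list with a million ranges needs only around 21 steps.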
You can speed things up by implementing a binary search algorithm. Thus, you don't have to look at every range.
Then you can use in_array to check if the number is in the array.
I'm not sure if I got you right, do your arrays really look like this:
array(125, 126, 127, ..., 3957);
If so, what's the point? Why not just have?
array(125, 3957);
That contains all the information necessary.
The example you give suggests that the numbers may be large and the space sparse by comparison.
At that point, you don't have very many options. If the array is sorted, binary search is about all there is. If the array is not sorted, you're down to plain, old CS101 linear search.
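The general-purpose data structure for this problem is an interval tree, which also copes with overlapping ranges. For sorted, non-overlapping ranges like these, its lookup cost is logarithmic, the same order as a binary search.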
I am assuming that the ranges do not overlap.
If that is the case, you can maintain a map data structure that is keyed on the lower value of the range.
Now all you have to do (given the number N) is find the largest key in the map that is not greater than N (using binary search, logarithmic complexity) and then check whether N is less than or equal to that range's upper value.
Basically, it is a binary search (logarithmic) on the constructed map.
From a pragmatic point of view, a linear search may very well turn out to be the fastest lookup method. Think of page faults and hard disk seek time here.
If your array is large enough (whatever "enough" actually means), it may be wise to stuff your IPs in a SQL database and let the database figure out how to efficiently compute SELECT ID FROM ip_numbers WHERE x BETWEEN start AND end;.

random and unique subsets generation

Lets say we have numbers from 1 to 25 and we have to choose sets of 15 numbers.
The possible sets are, if i'm right 3268760.
Of those 3268760 options, you have to generate say 100000
What would be the best way to generate 100000 unique and random of that subsets?
Is there a way, an algorithm to do that?
If not, what would be the best option to detect duplicates?
I'm planning to do this on PHP but a general solution would be enough,
and any reference not to much 'academic' (more practical) would help me a lot.
There is a way to generate a sample of the subsets that is random, guaranteed not to have duplicates, uses O(1) storage, and can be re-generated at any time. First, write a function to generate a combination given its lexical index. Second, use a pseudorandom permutation of the first Combin(n, m) integers to step through those combinations in a random order. Simply feed the numbers 0...100000 into the permutation, use the output of the permutation as input to the combination generator, and process the resulting combination.
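A sketch of how that could look in PHP. The names binomial, combinationAt and gcd are mine, and the permutation here is a plain affine map (multiplier * i + offset) mod total. That is an assumption chosen for brevity: it is a valid permutation but not a strong source of randomness, so swap in something better if the pattern matters.

```php
<?php
// Binomial coefficient C(n, k) using exact integer arithmetic.
function binomial(int $n, int $k): int
{
    if ($k < 0 || $k > $n) {
        return 0;
    }
    $k = min($k, $n - $k);
    $result = 1;
    for ($i = 1; $i <= $k; $i++) {
        $result = intdiv($result * ($n - $k + $i), $i);
    }
    return $result;
}

// Returns the $k-element subset of 1..$n at position $index in lexicographic order.
function combinationAt(int $index, int $n, int $k): array
{
    $combo = [];
    $candidate = 1;
    for ($remaining = $k; $remaining > 0; $remaining--) {
        while (true) {
            // Number of combinations that use $candidate as the next element.
            $count = binomial($n - $candidate, $remaining - 1);
            if ($index < $count) {
                break;
            }
            $index -= $count;
            $candidate++;
        }
        $combo[] = $candidate++;
    }
    return $combo;
}

function gcd(int $a, int $b): int
{
    while ($b !== 0) {
        [$a, $b] = [$b, $a % $b];
    }
    return $a;
}

$n = 25;
$k = 15;
$total = binomial($n, $k);           // 3268760 possible subsets
$multiplier = 1000003;               // assumed constant; must be coprime with $total
$offset = random_int(0, $total - 1); // random starting point
if (gcd($multiplier, $total) !== 1) {
    throw new LogicException('Multiplier must be coprime with the number of subsets.');
}

$subsets = [];
for ($i = 0; $i < 1000; $i++) {
    // Distinct $i values map to distinct indexes, so no duplicate check is needed.
    $index = ($multiplier * $i + $offset) % $total;
    $subsets[] = combinationAt($index, $n, $k);
}
echo implode(',', $subsets[0]), "\n";
```

Raising the loop limit to 100000 gives the full sample; nothing has to be stored to guarantee uniqueness, and storing $offset lets you regenerate the same list later.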
Here's a solution in PHP based on mjv's answer, which is how I was thinking about it. If you run it for a full 100k sets, you do indeed see a lot of collisions. However, I'm hard pressed to devise a system to avoid them. Instead, we just check them fairly quickly.
I'll think about better solutions ... on this laptop, I can do 10k sets in 5 seconds, 20k sets in under 20 seconds. 100k takes several minutes.
The sets are represented as (32-bit) ints.
<?PHP
/* (c) 2009 tim - anyone who finds a use for this is very welcome to use it with no restrictions unless they're making a weapon */
//how many sets shall we generate?
$gNumSets = 1000;
//keep track of collisions, just for fun.
$gCollisions = 0;
$starttime = time();
/**
* Generate and return an integer with exactly 15 of the lower 25 bits set (1) and the other 10 unset (0)
*/
function genSetHash(){
$hash = pow(2,25)-1;
$used = array();
for($i=0;$i<10;){
//pick a bit to turn off
$bit = rand(0,24);
if (! in_array($bit,$used)){
$hash = ( $hash & ~pow(2,$bit) );
$i++;
$used[] = $bit;
}
}
return $hash;
}
//we store our solution hashes in here.
$solutions = array();
//generate a bunch of solutions.
for($i=0;$i<$gNumSets;){
$hash = genSetHash();
//ensure no collisions
if (! in_array($hash,$solutions)){
$solutions[] = $hash;
//brag a little.
echo("Generated $i random sets in " . (time()-$starttime) . " seconds.\n");
$i++;
}else {
//there was a collision. There will generally be more the longer the process runs.
echo "thud.\n";
$gCollisions++;
}
}
// okay, we're done with the hard work. $solutions contains a bunch of
// unique, random, ints in the right range. Everything from here on out
// is just output.
//takes an integer with 25 significant bits, and returns an array of 15 numbers between 1 and 25
function hash2set($hash){
    $set = array();
    for($i=0;$i<25;$i++){
        if ($hash & pow(2,$i)){
            $set[] = $i+1;
        }
    }
    return $set;
}
//pretty-print our sets.
function formatSet($set){
return "[ " . implode(',',$set) . ']';
}
//if we wanted to print them,
foreach($solutions as $hash){
echo formatSet(hash2set($hash)) . "\n";
}
echo("Generated $gNumSets unique random sets in " . (time()-$starttime) . " seconds.\n");
echo "\n\nDone. $gCollisions collisions.\n";
I think it's all correct, but it's late, and I've been enjoying several very nice bottles of beer.
Do they have to be truly random? Or seemingly random?
Selection: generate a set with all 25 - "shuffle" the first 15 elements using Fisher-Yates / the Knuth shuffle, and then check if you've seen that permutation of the first 15 elements before. If so, disregard, and retry.
Duplicates: You have 25 values that are there or not - this can be trivially hashed to an integer value (if the 1st element is present, add 2^0, if the second is, add 2^1, etc. - it can be directly represented as a 25 bit number), so you can check easily if you've seen it already.
You'll get a fair bit of collisions, but if it's not a performance critical snippet, it might be doable.
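A small sketch of the selection and duplicate check described above. The function names are mine. It uses shuffle() on the whole 1..25 range rather than a hand-written partial shuffle, and it keeps the 25-bit values as array keys so the "seen before?" test is a hash lookup instead of a scan.

```php
<?php
// Pick $k of the numbers 1..$n at random and encode them as a bitmask:
// bit ($number - 1) is set when $number is in the subset.
function randomSubsetMask(int $n, int $k): int
{
    $numbers = range(1, $n);
    shuffle($numbers);
    $mask = 0;
    foreach (array_slice($numbers, 0, $k) as $number) {
        $mask |= 1 << ($number - 1);
    }
    return $mask;
}

// Collect $count distinct subsets; duplicates simply overwrite an existing key.
function uniqueSubsetMasks(int $count, int $n, int $k): array
{
    $seen = [];
    while (count($seen) < $count) {
        $seen[randomSubsetMask($n, $k)] = true;
    }
    return array_keys($seen);
}

$masks = uniqueSubsetMasks(1000, 25, 15);
echo count($masks), " unique subsets\n";
```

Because the lookup is by array key, the cost per draw stays roughly constant as the list grows, which is where the in_array() versions slow down.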
The random number generator (RNG) of your environment will supply you with random numbers that are evenly distributed in a particular range. This type of distribution is often what is needed, say if your subsets simulate lottery drawings, but it is important to mention this fact in case you are modeling, say, the age of people found on the grounds of a middle school...
Given this RNG you can "draw" 10 (or 15, read below) numbers between 1 and 25. This may require that you multiply (and round) the random number produced by the generator, and that you ignore numbers that are above 25 (i.e. draw again), depending on the exact API associated with the RNG, but again getting a drawing in a given range is trivial. You will also need to re-draw when a number comes up again.
I suggest you get 10 numbers only, as these can be removed from the complete 1-25 sequence to produce a set of 15. In other words, drawing 15 to put in is the same as drawing 10 to take out...
Next you need to assert the uniqueness of the sets. Rather than storing the whole set, you can use a hash to identify each set uniquely. This should take no more than 25 bits, so it can be stored in a 32-bit integer. You then need efficient storage for up to 100,000 of these values, unless you want to store this in a database.
On this question of uniqueness of 100,000 sets taken out of all the possible sets, the probability of a collision seems relatively low. Edit: Oops... I was optimistic... This probability is not so low: each draw after the 50,000th has about a 1.5% chance of colliding, so there will be quite a few collisions, enough to warrant a system to exclude them...
