I want a random number generator with non-uniform distribution, ie:
// prints 0 with 0.1 probability, and 1 with 0.9 probability
echo probRandom(array(10, 90));
This is what I have right now:
/**
* Returns a *non-uniformly* distributed random index.
*
* @param array $probs int array with weights
* @return int a random index in $probs
*/
function probRandom($probs) {
$size = count($probs);
// construct probability vector
$prob_vector = array();
$ptr = 0;
for ($i=0; $i<$size; $i++) {
$ptr += $probs[$i];
$prob_vector[$i] = $ptr;
}
// get a random number in 1..$ptr (0..$ptr would give index 0 one extra outcome)
$rand = rand(1, $ptr);
for ($i=0, $ret = false; $ret === false; $i++) {
if ($rand <= $prob_vector[$i])
return $i;
}
}
Can anyone think of a better way? Possibly one that doesn't require me to do pre-processing?
If you know the sum of all elements in $probs, you can do this without preprocessing.
Like so:
$max = array_sum($probs); // total weight
$r = rand(0, $max - 1);
$tot = 0;
for ($i = 0; $i < count($probs); $i++) {
$tot += $probs[$i];
if ($r < $tot) {
return $i;
}
}
This will do what you want in O(N) time, where N is the length of the array. Without preprocessing you cannot do better than O(N), because each element of the input must be considered.
The probability a given index $i is selected is $probs[$i]/sum($probs), given that the rand function returns independent uniformly distributed integers in the given range.
In your solution you generate an accumulated probability vector, which is very useful.
I have two suggestions for improvement:
If $probs is static, i.e. it's the same vector every time you want to generate a random number, you can compute $prob_vector just once and keep it.
You can use binary search (bisection) to find $i in $prob_vector, which takes O(log N) per draw.
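As a minimal sketch of the two suggestions above (build the cumulative vector once, then bisect on every draw), assuming non-negative integer weights with a positive total. The function names buildCumulative and pickIndex are made up for illustration:

```php
<?php
// Build the cumulative weight vector once (hypothetical helper name).
function buildCumulative(array $probs): array
{
    $cumulative = [];
    $total = 0;
    foreach ($probs as $weight) {
        $total += $weight;
        $cumulative[] = $total;
    }
    return $cumulative;
}

// Binary search for the first index whose cumulative weight exceeds $r.
// Assumes 0 <= $r < last element of $cumulative.
function pickIndex(array $cumulative, int $r): int
{
    $lo = 0;
    $hi = count($cumulative) - 1;
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($r < $cumulative[$mid]) {
            $hi = $mid;      // answer is at $mid or to its left
        } else {
            $lo = $mid + 1;  // answer is strictly to the right
        }
    }
    return $lo;
}

$cumulative = buildCumulative([10, 90]);
$total = $cumulative[count($cumulative) - 1];
echo pickIndex($cumulative, mt_rand(0, $total - 1)), "\n";
```

Each draw then costs O(log N) instead of O(N), at the price of keeping $cumulative around between calls.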
EDIT: I now see that you ask for a solution without preprocessing.
Without preprocessing, you will end up with worst case linear runtime (i.e., double the length of the vector, and your running time will double as well).
Here is a method that doesn't require preprocessing. It does, however, require you to know a maximum limit of the elements in $probs:
Rejection method
Pick a random index, $i and a random number, X (uniformly) between 0 and max($probs)-1, inclusive.
If X is less than $probs[$i], you're done - $i is your random number
Otherwise reject $i (hence the name of the method) and restart.
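The three steps above could look roughly like this in PHP. This is a sketch, assuming integer weights, a known upper limit $maxWeight >= 1, and at least one positive weight (otherwise the loop never ends); the name rejectionSample is made up:

```php
<?php
// Rejection sampling: no cumulative vector needed, only an upper bound
// on the weights. $maxWeight must be >= every element of $probs.
function rejectionSample(array $probs, int $maxWeight): int
{
    $lastIndex = count($probs) - 1;
    while (true) {
        $i = mt_rand(0, $lastIndex);      // candidate index, uniform
        $x = mt_rand(0, $maxWeight - 1);  // uniform in 0..$maxWeight-1
        if ($x < $probs[$i]) {
            return $i;                    // accept
        }
        // otherwise reject $i and try again
    }
}

echo rejectionSample([10, 90], 90), "\n";
```

The expected number of retries grows when most weights are small compared to $maxWeight, so this works best when the weights are of similar size.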
I'm reading "PHP 7 Data Structures and Algorithms", chapter "Shortest path using the Floyd-Warshall algorithm".
The author generates a graph with this code:
$totalVertices = 5;
$graph = [];
for ($i = 0; $i < $totalVertices; $i++) {
for ($j = 0; $j < $totalVertices; $j++) {
$graph[$i][$j] = $i == $j ? 0 : PHP_INT_MAX;
}
}
I don't understand this line:
$graph[$i][$j] = $i == $j ? 0 : PHP_INT_MAX;
It looks like a one-line if statement.
Is it the same as this?
if ($i == $j) {
$graph[$i][$j] = 0;
} else {
$graph[$i][$j] = PHP_INT_MAX;
}
What is the point of using PHP_INT_MAX?
In the end, what does the graph look like?
You've correctly understood the ternary (? :) operator.
To answer the other part of your question, see whether the following makes sense to you.
First:
The author initializes the $graph array using the following code:
<?php
$totalVertices = 5; // total nodes (use 0, 1, 2, 3, and 4 instead of A, B, C, D, and E, respectively)
$graph = [];
for ($i = 0; $i < $totalVertices; $i++) {
for ($j = 0; $j < $totalVertices; $j++) {
$graph[$i][$j] = $i == $j ? 0 : PHP_INT_MAX;
}
}
which results in the following matrix (figure not included):
All the entries on the main diagonal (grey in the figure) are set to 0, as a node's distance to itself equals 0.
All the remaining entries in the 'matrix' are set to PHP_INT_MAX (the largest integer supported); we'll see why in a minute.
Second:
The author then sets the distances between the nodes that have a direct connection (edges), writing them manually to the $graph array, as follows:
$graph[0][1] = $graph[1][0] = 10;
$graph[2][1] = $graph[1][2] = 5;
$graph[0][3] = $graph[3][0] = 5;
$graph[3][1] = $graph[1][3] = 5;
$graph[4][1] = $graph[1][4] = 10;
$graph[3][4] = $graph[4][3] = 20;
This results in the following 'matrix' stored in array $graph (figure not included; edge distances were shown in green):
So why does the author use PHP_INT_MAX for the nodes that are not directly connected (the non-edges)?
The reason is that it allows the algorithm to work with node-connection (edge) distances up to and including PHP_INT_MAX.
In this particular example, any number smaller than 20 instead of PHP_INT_MAX in the ternary would distort the outcome of the algorithm: it would produce wrong results.
Put another way, in this particular example the author could have used any number bigger than 20 instead of PHP_INT_MAX and still have got correct results, because the biggest distance between two directly connected nodes in this case equals 20, and no shortest path in this graph is longer than that. Use any number smaller than 20 and the results will come out wrong.
You can give it a try, and test:
$graph[$i][$j] = $i == $j ? 0 : 19;
The algorithm will now tell us that the shortest distance from A to E, i.e. $graph[0][4], equals 19, which is WRONG (the correct answer is 20, for example via A-B-E).
So using PHP_INT_MAX here gives 'leeway': it allows the algorithm to work with edge distances smaller than or equal to 9223372036854775807 (the largest int that can be stored on a 64-bit system) or 2147483647 (on a 32-bit system).
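To try this out, here is a small self-contained sketch. The book's own algorithm code is not shown in this thread, so floydWarshall below is the textbook relaxation loop, not necessarily the author's version, and both function names are made up for illustration:

```php
<?php
// Hypothetical helper: the example graph, with $inf marking "no direct edge".
function buildGraph(int $inf): array
{
    $totalVertices = 5;
    $graph = [];
    for ($i = 0; $i < $totalVertices; $i++) {
        for ($j = 0; $j < $totalVertices; $j++) {
            $graph[$i][$j] = $i == $j ? 0 : $inf;
        }
    }
    $graph[0][1] = $graph[1][0] = 10;
    $graph[2][1] = $graph[1][2] = 5;
    $graph[0][3] = $graph[3][0] = 5;
    $graph[3][1] = $graph[1][3] = 5;
    $graph[4][1] = $graph[1][4] = 10;
    $graph[3][4] = $graph[4][3] = 20;
    return $graph;
}

// Textbook Floyd-Warshall: try every node $k as an intermediate stop.
function floydWarshall(array $dist, int $inf): array
{
    $n = count($dist);
    for ($k = 0; $k < $n; $k++) {
        for ($i = 0; $i < $n; $i++) {
            for ($j = 0; $j < $n; $j++) {
                // Skip legs with no known path, so $inf is never added to.
                if ($dist[$i][$k] === $inf || $dist[$k][$j] === $inf) {
                    continue;
                }
                $via = $dist[$i][$k] + $dist[$k][$j];
                if ($via < $dist[$i][$j]) {
                    $dist[$i][$j] = $via;
                }
            }
        }
    }
    return $dist;
}

$ok = floydWarshall(buildGraph(PHP_INT_MAX), PHP_INT_MAX);
$bad = floydWarshall(buildGraph(19), 19);
echo $ok[0][4], "\n";  // 20: A to E via B
echo $bad[0][4], "\n"; // 19: the stand-in value is mistaken for a real distance
```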
You have two questions here.
The first is regarding the syntax condition ? val_if_true : val_if_false. This is called the "ternary operator". Your assessment regarding the behavior is correct.
The second is regarding the use of PHP_INT_MAX. All distances between two nodes are being initialized to one of two values: 0 if nodes i and j are the same node, and PHP_INT_MAX if they are not. That is, a node's distance to itself is 0 and a node's distance to any other node is the largest integer value PHP recognizes. The reason for this is that the Floyd-Warshall algorithm uses the concept of "infinity" to represent minimum distances that have not yet been calculated, but as PHP integers have no "infinity" value, PHP_INT_MAX is being used as a stand-in for it.
Generating a random number in the range [M..N] is easy enough. I however would like to generate a series of random numbers in that range with mean X (M < X < N).
For example, assume the following:
M = 10000
N = 1000000
X = 20000
I would like to generate (a large amount of) random numbers such that the entire range [M..N] is covered, but in this case numbers closer to N should become exceedingly more rare. Numbers closer to M should be more common to ensure that the mean converges to X.
The intended target language is PHP, but this is not a language question per se.
There are many ways to accomplish this, and it would differ very much depending on your demands on precision. The following code uses the 68-95-99.7 rule, based on the normal distribution, with a standard deviation of 15% of the mean.
It does not:
ensure exact precision. If you need this, you have to calculate the real mean and compensate for the difference.
create a true normally distributed curve, as all values within each of the three bands (68-95-99.7) are treated as equally likely.
It does however give you a start:
<?php
$mean = (int)$_GET['mean']; // The mean you want
$amnt = (int)$_GET['amnt']; // The amount of integers to generate
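$sd = (int) round($mean * 0.15); // standard deviation: 15% of the mean, as an int for mt_rand()
$numbers = array();
for($i=0;$i<$amnt;$i++) // start at 0 so that exactly $amnt numbers are generated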
{
$n = mt_rand(($mean-$sd), ($mean+$sd));
$r = mt_rand(10,1000)/10; // For decimal counting
if($r>68)
{
if(2==mt_rand(1,2)) // Coin flip, should it add or subtract?
{
$n = $n+$sd;
}
else
{
$n = $n-$sd;
}
}
if($r>95)
{
if(2==mt_rand(1,2))
{
$n = $n+$sd;
}
else
{
$n = $n-$sd;
}
}
if($r>99.7)
{
if(2==mt_rand(1,2))
{
$n = $n+$sd;
}
else
{
$n = $n-$sd;
}
}
$numbers[] = $n;
}
arsort($numbers);
print_r($numbers);
// Echo real mean to see how far off you get. Typically within 1%
/*
$sum = 0;
foreach($numbers as $val)
{
$sum = $sum + $val;
}
echo $rmean = $sum/$amnt;
*/
?>
Hope it helps!
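If the values must stay inside [M..N] while the mean converges to X, one alternative is a mixture of two uniform bands. Note that this is a different technique from the 68-95-99.7 approach above, and the sketch rests on illustrative assumptions. Draw from [M, X] with probability p = (N - X) / (N - M) and from [X, N] otherwise, which makes the expected value exactly X. The name skewedRandom is made up, and the distribution is flat inside each band rather than smoothly decaying towards N:

```php
<?php
// Two-band uniform mixture with expected value $x, for $m < $x < $n.
function skewedRandom(int $m, int $n, int $x): int
{
    // p * ($m + $x) / 2 + (1 - p) * ($x + $n) / 2 == $x
    // holds for p = ($n - $x) / ($n - $m).
    $pLow = ($n - $x) / ($n - $m);
    $u = mt_rand() / mt_getrandmax(); // uniform float in [0, 1]
    return $u < $pLow ? mt_rand($m, $x) : mt_rand($x, $n);
}

$sum = 0;
$count = 200000;
for ($i = 0; $i < $count; $i++) {
    $sum += skewedRandom(10000, 1000000, 20000);
}
echo $sum / $count, "\n"; // sample mean, should be near 20000
```

With the numbers from the question, about 99% of the draws land in [10000, 20000] and the remaining 1% are spread over [20000, 1000000].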
I need to generate x amount of random odd numbers, within a given range.
I know this can be achieved with simple looping, but I'm unsure which approach would be the best, and is there a better mathematical way of solving this.
EDIT: Also I cannot have the same number more than once.
Generate x integer values over half the range, and for each value double it and add 1.
ANSWERING REVISED QUESTION: 1) Generate a list of candidates in range, shuffle them, and then take the first x. Or 2) generate values as per my original recommendation, and reject and retry if the generated value is in the list of already generated values.
The first will work better if x is a substantial fraction of the range, the latter if x is small relative to the range.
ADDENDUM: Should have thought of this approach earlier, it's based on conditional probability. I don't know php (I came at this from the "random" tag), so I'll express it as pseudo-code:
generate(x, upper_limit)
loop with index i from upper_limit downto 1 by 2
p_value = x / floor((i + 1) / 2)
if rand <= p_value
include i in selected set
decrement x
return/exit if x <= 0
end if
end loop
end generate
x is the desired number of values to generate, upper_limit is the largest odd number in the range, and rand generates a uniformly distributed random number between zero and one. Basically, it steps through the candidate set of odd numbers and accepts or rejects each one based on how many values you still need and how many candidates still remain.
I've tested this and it really works. It requires less intermediate storage than shuffling and fewer iterations than the original acceptance/rejection.
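A possible PHP translation of the pseudo-code above, as a sketch: it assumes the range starts at 1, that $upperLimit is odd, and that $x is at most ($upperLimit + 1) / 2. The function name pickOddNumbers is made up, and the test rand <= p_value is expressed with integers via mt_rand(1, $remaining) <= $x:

```php
<?php
// Selection sampling over the odd numbers $upperLimit, $upperLimit - 2, ..., 1.
function pickOddNumbers(int $x, int $upperLimit): array
{
    $selected = [];
    for ($i = $upperLimit; $i >= 1 && $x > 0; $i -= 2) {
        // Number of odd candidates left, including $i itself.
        $remaining = intdiv($i + 1, 2);
        // Accept $i with probability $x / $remaining.
        if (mt_rand(1, $remaining) <= $x) {
            $selected[] = $i;
            $x--;
        }
    }
    return $selected; // in descending order; shuffle() it if order matters
}

print_r(pickOddNumbers(5, 19));
```

Once the number of remaining candidates equals the number of values still needed, every candidate is accepted, so the function always returns exactly $x values.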
Generate a list of elements in the range, remove the element you want in your random series. Repeat x times.
Or you can generate an array with the odd numbers in the range, then do a shuffle
Generation is easy:
$range_array = array();
// $max_value is the number of odd candidates: 1, 3, ..., 2 * $max_value - 1
for( $i = 0; $i < $max_value; $i++){
$range_array[] = $i*2 + 1;
}
Shuffle
shuffle( $range_array );
Slice out the first x elements.
$result = array_slice( $range_array, 0, $x );
This is a complete solution.
function mt_rands($min_rand, $max_rand, $num_rand){
if(!is_integer($min_rand) or !is_integer($max_rand)){
return false;
}
if($min_rand >= $max_rand){
return false;
}
if(!is_integer($num_rand) or ($num_rand < 1)){
return false;
}
if($num_rand > ($max_rand - $min_rand + 1)){ // more unique values requested than the range holds
return false;
}
$rands = array();
while(count($rands) < $num_rand){
$loops = 0;
do{
++$loops; // loop limiter, use it if you want to
$rand = mt_rand($min_rand, $max_rand);
}while(in_array($rand, $rands, true));
$rands[] = $rand;
}
return $rands;
}
// let's see how it went
var_export($rands = mt_rands(0, 50, 5));
Code is not tested. Just wrote it. Can be improved a bit but it's up to you.
This code generates 5 odd unique numbers in the interval [1, 20]. Change $min, $max and $n = 5 according to your needs.
<?php
function odd_filter($x)
{
if (($x % 2) == 1)
{
return true;
}
return false;
}
// seed with microseconds
function make_seed()
{
list($usec, $sec) = explode(' ', microtime());
return (float) $sec + ((float) $usec * 100000);
}
srand(make_seed());
$min = 1;
$max = 20;
//number of random numbers
$n = 5;
if (($max - $min + 1)/2 < $n)
{
print "interval [$min, $max] is too short to generate $n odd numbers!\n";
exit(1);
}
$result = array();
for ($i = 0; $i < $n; ++$i)
{
$x = rand($min, $max);
//not exists in the hash and is odd
if(!isset($result[$x]) && odd_filter($x))
{
$result[$x] = 1;
}
else//new iteration needed
{
--$i;
}
}
$result = array_keys($result);
var_dump($result);
I'm generating an array of random numbers, between 0 and 2 with this code:
for ($j = 0; $j < 60; $j++) {
for ($i = 0; $i < 100; $i++) {
$value = rand(0,2);
$DBH->query("INSERT INTO map (x, y, value) VALUES($i, $j, $value);");
}
}
And I found an oddity: as you may see here, the rows are random, but they repeat:
22121000210211220022122200120200122000122121
22121000210211220022122200120200122000122121
22121000210211220022122200120200122000122121
22121000210211220022122200120200122000122121
22121000210211220022122200120200122000122121
How can I avoid that?
You might want to explicitly seed your generator using srand, e.g. srand(time()) (note that the srand link has a better example of seeding than just using time, depends on how random you need, I suppose).
Failing that
You could try using mt_rand with mt_srand
You could always use MySQL's rand function to generate the numbers as a workaround.
I want to calculate Frequency (Monobits) test in PHP:
Description: The focus of the test is
the proportion of zeroes and ones for
the entire sequence. The purpose of
this test is to determine whether that
number of ones and zeros in a sequence
are approximately the same as would be
expected for a truly random sequence.
The test assesses the closeness of the
fraction of ones to ½, that is, the
number of ones and zeroes in a
sequence should be about the same.
I am wondering whether I really need to count the 0's and 1's (the bits), or whether the following is adequate:
$value = 0;
// Loop through all the bytes and sum them up.
for ($a = 0, $length = strlen((binary) $data); $a < $length; $a++)
$value += ord($data[$a]);
// The average should be 127.5.
return (float) $value/$length;
If the above is not the same, then how do I exactly calculate the 0's and 1's?
No, you really need to check all zeroes and ones. For example, take the following binary input:
01111111 01111101 01111110 01111010
It is clearly (literally) one-sided (8 zeroes, 24 ones; the correct result is 24/32 = 3/4 = 0.75) and therefore not random. However, your test would compute 125.0 / 255, about 0.49, which is close to ½.
Instead, count like this:
function one_proportion($binary) {
$oneCount = 0;
$len = strlen($binary);
for ($i = 0;$i < $len;$i++) {
$intv = ord($binary[$i]); // square brackets: the {} offset syntax was removed in PHP 8
for ($bitp = 0;$bitp < 8;$bitp++) { // check all 8 bits of the byte
$oneCount += ($intv>>$bitp) & 0x1;
}
}
return $oneCount / (8 * $len);
}
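To see the difference on the four example bytes from above, this short check computes both measures side by side. It is only an illustration; counting bits via decbin() and substr_count() is a shortcut chosen here, not part of the function above:

```php
<?php
// The four example bytes: 01111111 01111101 01111110 01111010
$data = chr(0b01111111) . chr(0b01111101) . chr(0b01111110) . chr(0b01111010);

$byteSum = 0;
$ones = 0;
foreach (str_split($data) as $char) {
    $byte = ord($char);
    $byteSum += $byte;
    // decbin() drops leading zeroes, which does not affect the count of ones.
    $ones += substr_count(decbin($byte), '1');
}

$byteAverage = $byteSum / strlen($data);       // 125, looks "balanced"
$oneProportion = $ones / (8 * strlen($data));  // 0.75, clearly biased

echo $byteAverage, "\n";
echo $oneProportion, "\n";
```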