Pick a number based on probability in floating number PHP

I am trying to create a dice game in which you roll a six-sided die, but the probability of each side is predefined (the die is not fair).
Example:
Side Probability
1 3.63098
2 18.38200
3 10.59424
4 3.87055
5 13.66651
6 49.85572
Total: 100
So side 6 should come up more often than any other. I've tried the following approach:
3.63098 + 18.38200 + 10.59424 + 3.87055 + 13.66651 + 49.85572 = 100
then generate a random number with rand(0, 100) and pick the side based on where it lies.
However, it does not really work with floating-point numbers. Also, the total probability does not have to be
100; it can be 23.994, distributed arbitrarily across the 6 sides.
Please suggest an algorithm to use or where to look for algorithms. I'm not asking anyone to write the code for me; I just need to research this but don't know what to look for.

You can generate a random integer between 0 and RAND_MAX (in PHP, mt_rand() returns a value between 0 and mt_getrandmax()).
Then scale each probability to that range by dividing it by the total (100 here) and multiplying by RAND_MAX. This gives finer-grained outcomes than rand(0, 100).
OR
Just divide the generated random number by RAND_MAX and check which sub-range of [0,1] it falls in.

First normalize the probabilities so that they lie within [0,1]:
normalized prob(p) = value(p) / (sum of all values)
Then find the range of values for each side. For example, given
1 => 0.2
2 => 0.3
3 => 0.1
4 => 0.2
5 => 0.1
6 => 0.1
then ranges will be:-
1 =>[0,0.2]
2 =>(0.2,0.5]
3 =>(0.5,0.6]
4 =>(0.6,0.8]
5 =>(0.8,0.9]
6 =>(0.9,1.0]
Then, for a dice roll, generate a random float in the range [0,1].
Find the range that contains it.
E.g. if the random float is 0.37, the side on top will be 2.
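The steps above can be sketched in PHP roughly as follows. This is only an illustration: pickSide() and randomUnit() are hypothetical helper names, and the weights array is assumed to be non-empty with integer side numbers as keys. Because each weight is divided by the sum, the total does not have to be 100 (or 1).

```php
<?php
// Hypothetical helper: map a uniform draw $u in [0, 1) to a side.
// $weights maps side => weight; the weights need not sum to 100 or to 1.
function pickSide(array $weights, float $u): int
{
    $total = array_sum($weights);
    $cumulative = 0.0;
    $last = 0;
    foreach ($weights as $side => $weight) {
        $cumulative += $weight / $total; // normalized upper bound of this side's range
        $last = $side;
        if ($u < $cumulative) {
            return $side;
        }
    }
    // Rounding can leave the final bound a hair below 1.0, so fall back to the last side.
    return $last;
}

// Hypothetical helper: uniform float in [0, 1).
function randomUnit(): float
{
    return mt_rand() / (mt_getrandmax() + 1);
}

$weights = [
    1 => 3.63098, 2 => 18.38200, 3 => 10.59424,
    4 => 3.87055, 5 => 13.66651, 6 => 49.85572,
];
echo pickSide($weights, randomUnit()), "\n";
```

Passing the random draw in as a parameter keeps the range lookup deterministic, which makes it easy to check against the 0.37 example.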

Related

PHP: number_format float to string conversion unexpected behaviour searching for solution [duplicate]

How do you explain floating point inaccuracy to fresh programmers and laymen who still think computers are infinitely wise and accurate?
Do you have a favourite example or anecdote which seems to get the idea across much better than a precise, but dry, explanation?
How is this taught in Computer Science classes?

There are basically two major pitfalls people stumble into with floating-point numbers.
The problem of scale. Each FP number has an exponent which determines the overall “scale” of the number so you can represent either really small values or really large ones, though the number of digits you can devote to that is limited. Adding two numbers of different scale will sometimes result in the smaller one being “eaten” since there is no way to fit it into the larger scale.
PS> $a = 1; $b = 0.0000000000000000000000001
PS> Write-Host a=$a b=$b
a=1 b=1E-25
PS> $a + $b
1
As an analogy for this case you could picture a large swimming pool and a teaspoon of water. Both are of very different sizes, but individually you can easily grasp how much they roughly are. Pouring the teaspoon into the swimming pool, however, will leave you still with roughly a swimming pool full of water.
(If the people learning this have trouble with exponential notation, one can also use the values 1 and 100000000000000000000 or so.)
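The same effect can be reproduced in PHP. This is a small sketch that assumes PHP floats are IEEE 754 doubles, which is the case on common platforms.

```php
<?php
// Adding a tiny value to a much larger one: the small operand is lost.
$a = 1.0;
$b = 1e-25;
$sum = $a + $b;
var_dump($sum === $a); // bool(true): the teaspoon vanished into the pool

// The same happens at large magnitudes: 1e20 + 1 cannot be told apart from 1e20.
$big = 1e20;
var_dump($big + 1.0 === $big); // bool(true)
```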
Then there is the problem of binary vs. decimal representation. A number like 0.1 can't be represented exactly with a limited number of binary digits. Some languages mask this, though:
PS> "{0:N50}" -f 0.1
0.10000000000000000000000000000000000000000000000000
But you can “amplify” the representation error by repeatedly adding the numbers together:
PS> $sum = 0; for ($i = 0; $i -lt 100; $i++) { $sum += 0.1 }; $sum
9,99999999999998
I can't think of a nice analogy to properly explain this, though. It's basically the same problem why you can represent 1/3 only approximately in decimal because to get the exact value you need to repeat the 3 indefinitely at the end of the decimal fraction.
Similarly, binary fractions are good for representing halves, quarters, eighths, etc. but things like a tenth will yield an infinitely repeating stream of binary digits.
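For PHP readers, the accumulation of the representation error can be sketched like this (again assuming IEEE 754 doubles). The exact digits printed depend on PHP's precision settings, so the example only compares against 10.0 rather than relying on a particular output string.

```php
<?php
// Add 0.1 a hundred times; mathematically this is exactly 10.
$sum = 0.0;
for ($i = 0; $i < 100; $i++) {
    $sum += 0.1;
}

var_dump($sum === 10.0);             // bool(false): the tiny errors have added up
var_dump(abs($sum - 10.0) < 1e-12);  // bool(true): but the result is still very close
```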
Then there is another problem, though most people don't stumble into that, unless they're doing huge amounts of numerical stuff. But then, those already know about the problem. Since many floating-point numbers are merely approximations of the exact value this means that for a given approximation f of a real number r there can be infinitely many more real numbers r1, r2, ... which map to exactly the same approximation. Those numbers lie in a certain interval. Let's say that rmin is the minimum possible value of r that results in f and rmax the maximum possible value of r for which this holds, then you get an interval [rmin, rmax] where any number in that interval can be your actual number r.
Now, if you perform calculations on that number—adding, subtracting, multiplying, etc.—you lose precision. Every number is just an approximation, therefore you're actually performing calculations with intervals. The result is an interval too and the approximation error only ever gets larger, thereby widening the interval. You may get back a single number from that calculation. But that's merely one number from the interval of possible results, taking into account precision of your original operands and the precision loss due to the calculation.
That sort of thing is called Interval arithmetic and at least for me it was part of our math course at the university.

Show them that the base-10 system suffers from exactly the same problem.
Try to represent 1/3 as a decimal representation in base 10. You won't be able to do it exactly.
So if you write "0.3333", you will have a reasonably exact representation for many use cases.
But if you move that back to a fraction, you will get "3333/10000", which is not the same as "1/3".
Other fractions, such as 1/2 can easily be represented by a finite decimal representation in base-10: "0.5"
Now base-2 and base-10 suffer from essentially the same problem: both have some numbers that they can't represent exactly.
While base-10 has no problem representing 1/10 as "0.1", in base-2 you'd need an infinite representation starting with "0.000110011...".

How's this for an explanation to the layman? One way computers represent numbers is by counting discrete units. These are digital computers. For whole numbers, those without a fractional part, modern digital computers count powers of two: 1, 2, 4, 8, ... Place value, binary digits, blah, blah, blah. For fractions, digital computers count inverse powers of two: 1/2, 1/4, 1/8, ... The problem is that many numbers can't be represented by a sum of a finite number of those inverse powers. Using more place values (more bits) will increase the precision of the representation of those 'problem' numbers, but never get it exactly because it only has a limited number of bits. Some numbers could only be represented exactly with an infinite number of bits.
Snooze...
OK, you want to measure the volume of water in a container, and you only have 3 measuring cups: full cup, half cup, and quarter cup. After counting the last full cup, let's say there is one third of a cup remaining. Yet you can't measure that because it doesn't exactly fill any combination of available cups. It doesn't fill the half cup, and the overflow from the quarter cup is too small to fill anything. So you have an error - the difference between 1/3 and 1/4. This error is compounded when you combine it with errors from other measurements.

In Python:
>>> 1.0 / 10
0.10000000000000001
Explain how some fractions cannot be represented precisely in binary. Just like some fractions (like 1/3) cannot be represented precisely in base 10.
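A PHP counterpart that tends to get the point across is the classic 0.1 + 0.2 comparison. The nearlyEqual() helper below is a hypothetical name, and the tolerance of 1e-9 is an arbitrary assumption; a suitable tolerance depends on the application.

```php
<?php
// Neither 0.1, 0.2 nor 0.3 is exactly representable in binary,
// so the rounded sum differs from the rounded 0.3.
var_dump(0.1 + 0.2 === 0.3); // bool(false)

// Hypothetical helper: compare floats with a tolerance instead of ===.
function nearlyEqual(float $x, float $y, float $tolerance = 1e-9): bool
{
    return abs($x - $y) <= $tolerance;
}

var_dump(nearlyEqual(0.1 + 0.2, 0.3)); // bool(true)
```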

Another example, in C
printf (" %.20f \n", 3.6);
incredibly gives
3.60000000000000008882

Here is my simple understanding.
Problem:
The value 0.45 cannot be accurately represented by a float and is instead stored as approximately 0.449999988. Why is that?
Answer:
An int value of 45 is represented by the binary value 101101.
To make the value 0.45 exactly, you would take 45 x 10^-2 (= 45 / 10^2).
But that’s impossible because you must use base 2 instead of 10.
So the closest to 10^2 = 100 would be 128 = 2^7. The total number of bits you need is 9: 6 for the value 45 (101101) + 3 bits for the value 7 (111).
Then the value 45 x 2^-7 = 0.3515625. Now you have a serious inaccuracy problem. 0.3515625 is nowhere near 0.45.
How do we improve this inaccuracy? Well we could change the value 45 and 7 to something else.
How about 460 x 2^-10 = 0.44921875. You are now using 9 bits for 460 and 4 bits for 10. Then it’s a bit closer but still not that close. However if your initial desired value was 0.44921875 then you would get an exact match with no approximation.
So the formula for your value would be X = A x 2^B. Where A and B are integer values positive or negative.
Obviously, the larger A and B can be, the higher your accuracy becomes; however, as you know, the number of bits available to represent A and B is limited. A float has 32 bits in total, a double has 64 and a Decimal has 128.
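The arithmetic above can be checked with a few lines of PHP. The closestMantissa() helper is a hypothetical name introduced only for this sketch; it picks the integer A that brings A x 2^B closest to the target for a given B, which is a simplification of what real floating-point formats do.

```php
<?php
// Evaluate A x 2^B for the two pairs used in the text.
$coarse = 45 * 2 ** -7;    // 0.3515625
$finer  = 460 * 2 ** -10;  // 0.44921875

// Hypothetical helper: the integer A that makes A x 2^B closest to $target.
function closestMantissa(float $target, int $b): int
{
    return (int) round($target * 2 ** -$b);
}

// More bits for A (a more negative B) shrink the error, but it never reaches zero for 0.45.
foreach ([-7, -10, -20] as $b) {
    $a = closestMantissa(0.45, $b);
    printf("B = %d, A = %d, error = %.10f\n", $b, $a, abs(0.45 - $a * 2 ** $b));
}
```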

A cute piece of numerical weirdness may be observed if one converts 9999999.4999999999 to a float and back to a double. The result is reported as 10000000, even though that value is obviously closer to 9999999, and even though 9999999.499999999 correctly rounds to 9999999.

generating a pseudo unique number(code) based on a sequence of numbers with no repetition within 4 digits

I'm generating a somewhat unique code that I don't want to repeat within at least 1000 consecutive numbers.
This is my function: take a number, multiply it by another number, then take the last $length = 5 digits before the decimal point.
function createPseudoUniqueString($input, $length)
{
    return substr(intval($input * 738510.93067), -$length);
}
Is there a way to validate that the resulting numbers do not repeat, other than testing all the possibilities?
Is there an alternative that is known not to repeat?

You can design a custom Linear Congruential Generator that generates pseudo-random 5-digit numbers and is guaranteed not to repeat until it has generated all of them.
An LCG generates random numbers using the following formula:
X[n+1] = ((X[n] * a) + c) mod m
To generate 5-digit numbers m should be 100000 (range of 0-99999).
To guarantee no repeats (a "full period") you have to select values for a and c using the following criteria:
c and m are relatively prime
a - 1 is divisible by all prime factors of m
a - 1 is a multiple of 4 if m is a multiple of 4.
The prime factors of 100000 are 2 and 5, and it's also divisible by 4, so any value of the form 20k + 1 will work as a suitable value of a, being careful not to set it too large to avoid integer overflows. For c just choose a reasonably large prime number (any prime other than 2 or 5 is relatively prime to m).
e.g: m = 100000, a = 4781, c = 62873
Set an initial seed value for x and then generate each value from the previous one using $x = (($x*4781)+62873)%100000;
Note that you can't just use a random number generator with a larger period and then mod 100000 the generated values, because even though the raw generated numbers from the larger-period RNG don't repeat, that doesn't guarantee that the numbers mod 100K won't.
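A short PHP sketch of this generator, using the example parameters above, together with a brute-force check of the full period. The function name nextCode() and the seed 12345 are arbitrary choices made for this illustration.

```php
<?php
// One LCG step with the example parameters m = 100000, a = 4781, c = 62873.
function nextCode(int $x): int
{
    return ($x * 4781 + 62873) % 100000;
}

// Brute-force check: 100000 steps should visit every value 0..99999 exactly once.
$seed = 12345; // arbitrary starting value (an assumption)
$seen = [];
$x = $seed;
for ($i = 0; $i < 100000; $i++) {
    $x = nextCode($x);
    $seen[$x] = true;
}

echo count($seen), "\n"; // number of distinct codes produced
echo str_pad((string) nextCode($seed), 5, '0', STR_PAD_LEFT), "\n"; // zero-padded 5-digit code
```

The largest intermediate product is 99999 * 4781, which stays well below the 32-bit integer limit, so overflow is not a concern with these parameters.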

Convert rating percent (50%) to the equivalent 5-star rating in PHP

I'm sure this is possible but my math isn't that fantastic.
I'm showing latest movies on my page and my application uses a 5-star rating system, however, the data I receive from a Web Service arrives as a percentage e.g. 50%.
Is there any way I can convert this percentage to a star rating equivalent, which in this case would be 50% = 2.5, allowing me to show 2.5 stars?
It seems fairly simple when I have 50% but if I get 94%, it confuses my poor little pea for a brain! Please help.

If you want to convert the 0..100 scale to a 0..5 scale, just divide by 20.
If you want it on a half-star boundary, then divide it by 10 instead and that's the number of half-stars you need.
Keep in mind I'm talking about integer division here, where the value is truncated (rounded down).
You may also want to consider rounding it more intelligently during the division, rather than truncating, so that something like 99% is 5 stars (not 4.5). This can be done by simply adding half the amount you're dividing by before the division, something like (in C):
int percent = 94;
int halfstars = (percent + 5) / 10;
This would give the following results for input values between 0 and 100 inclusive:
percent halfstars
------- ---------
0- 4 0
5- 14 1
15- 24 2
25- 34 3
35- 44 4
45- 54 5
55- 64 6
65- 74 7
75- 84 8
85- 94 9
95-100 10
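Since the question is about PHP, the same rounding could be sketched there with intdiv() (available from PHP 7). The function name percentToStars() is hypothetical.

```php
<?php
// Hypothetical helper: convert 0..100 percent to a star rating in half-star steps.
function percentToStars(int $percent): float
{
    // Adding 5 before the integer division rounds to the nearest half-star
    // instead of truncating.
    $halfStars = intdiv($percent + 5, 10);
    return $halfStars / 2;
}

foreach ([50, 94, 99] as $percent) {
    echo $percent, '% => ', percentToStars($percent), " stars\n";
}
```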

The formula for finding the percentage of a number is fairly simple:
$percentInDecimalForm * $number
For example, a 94% rating would be:
.94 * 5 = 4.7

You just need to solve the following:
100% ---------- 5
94% ---------- x, so x = (94 * 5) / 100 = 4.7
Now it's necessary to know the granularity of your star scale (how many times you can divide the star).
Since you mentioned 0.5 stars, I'm gonna assume your star granularity is 1 / 0.5 = 2, so just solve:
round(4.7 * 2) / 2 = 9 / 2 = 4.5

standard deviation and mode

I have a system that monitors the performance of students. It tabulates the number of students who gained a score of 1,1.25,1.5,....5 (this is our grading system). For example:
grading system number of students
1 12
1.25 10
1.5 15
1.75 15
2 20
2.25 1
2.5 5
2.75 6
3 8
5 0
From this example, I need my system to determine which is the mode and then print it. I also need to get the standard deviation.
I need this in PHP. Can anyone help me with this?
Your ideas, comments, and suggestions are appreciated.
Update:
Here's what I've done so far:
I've finished the standard deviation, but there are still discrepancies I can't resolve: when I calculate the standard deviation manually, the answer is different from the output of my system.
For the mode I used an array. This is my code:
$sample = array($one[$ctr],$two[$ctr],$three[$ctr],$four[$ctr],$five[$ctr],$six[$ctr],$seven[$ctr],$eight[$ctr],$nine[$ctr],$ten[$ctr],$fda[$ctr]);
rsort($sample);
$holder = $sample[0];
//$holder = $mode;
The sorting is successful and I can get the highest number, but I need to print the value of $holder to a table using fpdf.
Any ideas why the value is not visible in the output?

Well, the mode is easy. Just find the grade (2) which has the highest number of students (20) and there you are.
If there's more than one, then it's multi-modal and you should probably allow for that.
For the standard deviation, the method can be found here. It's basically working out the mean of all those numbers (let's simplify this by using 1, 1, 2 and 7):
mean = (1 + 1 + 2 + 7) / 4 = 11 / 4 = 2.75
then calculating the square root of the variance of all those samples from that mean:
sd = sqrt(((1-2.75)^2 + (1-2.75)^2 + (2-2.75)^2 + (7-2.75)^2) / 4)
   = sqrt((3.0625 + 3.0625 + 0.5625 + 18.0625) / 4)
   = sqrt(24.75 / 4)
   = sqrt(6.1875)
   ≈ 2.487
If you're asking a beginner-level question like how best to do this in a specific language like PHP, you should investigate the use of arrays and loops.
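As a starting point, here is a rough PHP sketch that computes the mode(s) and the standard deviation directly from a frequency table like the one in the question. The function name gradeStats() and the pair layout are assumptions made for this example. It computes the population standard deviation (dividing by N); dividing by N - 1 instead gives the sample standard deviation, which is one possible source of a mismatch with a manual calculation.

```php
<?php
// $table is a list of [grade, numberOfStudents] pairs.
// Pairs are used because PHP would truncate float array keys such as 1.25 to integers.
function gradeStats(array $table): array
{
    $n = 0;
    $sum = 0.0;
    $maxCount = 0;
    $modes = [];
    foreach ($table as [$grade, $count]) {
        $n += $count;
        $sum += $grade * $count;
        if ($count > $maxCount) {
            $maxCount = $count;
            $modes = [$grade];          // new highest frequency
        } elseif ($count === $maxCount && $count > 0) {
            $modes[] = $grade;          // tie: the data is multi-modal
        }
    }
    $mean = $sum / $n;

    $squares = 0.0;
    foreach ($table as [$grade, $count]) {
        $squares += $count * ($grade - $mean) ** 2;
    }
    return ['modes' => $modes, 'mean' => $mean, 'sd' => sqrt($squares / $n)];
}

$table = [
    [1, 12], [1.25, 10], [1.5, 15], [1.75, 15], [2, 20],
    [2.25, 1], [2.5, 5], [2.75, 6], [3, 8], [5, 0],
];
$stats = gradeStats($table);
echo 'mode: ', implode(', ', $stats['modes']), "\n";
printf("sd: %.4f\n", $stats['sd']);
```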

generate a random number between 1 and x where a lower number is more likely than a higher one

This is more of a maths/general programming question, but I am programming with PHP if that makes a difference.
I think the easiest way to explain is with an example.
Say the range is between 1 and 10.
I want to generate a number that is between 1 and 10 but is more likely to be low than high.
The only way I can think is generate an array with 10 elements equal to 1, 9 elements equal to 2, 8 elements equal to 3.....1 element equal to 10. Then generate a random number based on the number of elements.
The trouble is I am potentially dealing with 1 - 100000 and that array would be ridiculously big.
So how best to do it?

Generate a random number between 0 and a random number!

Generate a number between 1 and foo(n), where foo runs an algorithm over n (e.g. a logarithmic function). Then reverse foo() on the result.

Generate a number n with 0 <= n < 1, multiply it by itself, then multiply by x, run floor on it and add 1. Sorry, I used PHP too long ago to write code in it.
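A possible PHP rendering of that recipe is sketched below. The names skewedFromUnit() and skewedRandom() are hypothetical, and dividing mt_rand() by mt_getrandmax() + 1 is one assumed way to obtain a float in [0, 1).

```php
<?php
// Deterministic core: map a uniform $u in [0, 1) to an integer in 1..$x.
// Squaring pushes values towards 0, so low results are more likely.
function skewedFromUnit(float $u, int $x): int
{
    return (int) floor($u * $u * $x) + 1;
}

// Hypothetical wrapper that supplies the random draw.
function skewedRandom(int $x): int
{
    $u = mt_rand() / (mt_getrandmax() + 1); // uniform float in [0, 1)
    return skewedFromUnit($u, $x);
}

echo skewedRandom(10), "\n";
```

With this mapping, half of all draws (those with u below about 0.707) land in the lower half of the range's squared scale, so the skew is fairly strong; a different exponent would change how strongly low numbers are favoured.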

You could do
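// rand(0, 1) only returns the integers 0 or 1, so build floats in [0, 1] instead
$rand = floor(100000 * (mt_rand() / mt_getrandmax()) * (mt_rand() / mt_getrandmax()));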
Or something along these lines

There are basically two (or more?) ways to map uniform density to any distribution function: Inverse transformation sampling and Rejection sampling. I think in your case you should use the former.

Quick and simple:
rand(1, rand(1, n))

What you need to do is generate a random number over a greater interval (preferably floating point), and map that into [1,10] in a nonuniform way. Exactly what way depends on how much more likely you want a 1 to be than a 9 or 10.
For C language solutions, see these libraries. You may find use for this in PHP.

Generally speaking, it looks like you want to draw a random number from a Poisson distribution rather than the [uniform distribution](http://en.wikipedia.org/wiki/Uniform_distribution_(continuous)). On the wiki page cited above there is a section which specifically states how you can use the continuous distribution to generate a pseudo-Poisson distribution... check it out. Note that you may want to test different values of λ to ensure the distribution works as you want it to.

It depends on what distribution you want to have exactly, i.e., what number should appear with what probability.
For instance, for even n you could do the following: generate one random integer x between 1 and n/2 and a second random integer y between 1 and n+1. If y > x, return x; otherwise return n-x+1. This should give you the distribution in your example.
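In PHP this could look roughly like the following sketch; skewFromDraws() and skewedEven() are hypothetical names. Splitting out the deterministic part makes it possible to enumerate all (x, y) pairs and confirm the weights.

```php
<?php
// Deterministic core: $x is in 1..n/2 and $y is in 1..n+1.
function skewFromDraws(int $n, int $x, int $y): int
{
    return $y > $x ? $x : $n - $x + 1;
}

// Hypothetical wrapper; $n is assumed to be even.
function skewedEven(int $n): int
{
    return skewFromDraws($n, mt_rand(1, intdiv($n, 2)), mt_rand(1, $n + 1));
}

// Enumerating every (x, y) pair for n = 4 yields the weights 4, 3, 2, 1.
$counts = [];
for ($x = 1; $x <= 2; $x++) {
    for ($y = 1; $y <= 5; $y++) {
        $value = skewFromDraws(4, $x, $y);
        $counts[$value] = ($counts[$value] ?? 0) + 1;
    }
}
ksort($counts);
print_r($counts);
```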

I think this should give the requested distribution:
Generate a random number in the range 1 .. x. Generate another one in the range 1 .. x+1.
Return the minimum of the two.
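A minimal PHP sketch of this two-draw approach follows, with hypothetical function names. The second function simply counts, over all equally likely pairs of draws, how often each minimum occurs, which shows the linearly decreasing weights.

```php
<?php
// Minimum of a draw from 1..$x and a draw from 1..$x+1.
function skewedByMinimum(int $x): int
{
    return min(mt_rand(1, $x), mt_rand(1, $x + 1));
}

// Count how often each minimum occurs over all equally likely pairs.
function minimumCounts(int $x): array
{
    $counts = array_fill(1, $x, 0);
    for ($a = 1; $a <= $x; $a++) {
        for ($b = 1; $b <= $x + 1; $b++) {
            $counts[min($a, $b)]++;
        }
    }
    return $counts;
}

print_r(minimumCounts(3)); // counts for 1, 2, 3 are in the ratio 3 : 2 : 1
echo skewedByMinimum(10), "\n";
```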

Let's think about how your array idea changes the probabilities. Normally every element from 1 to n has a probability of 1/n and is thus equally likely.
Since you have n entries for 1, n-1 entries for 2...1 entry for n, then the total number of entries you have is an arithmetic series. The sum of an arithmetic series counting from 1 to n is n(1+n)/2. So now we know every element's probability should use that as the denominator.
Element 1 has n entries, so its probability is n / (n(n+1)/2). Element 2 has probability (n-1) / (n(n+1)/2), and so on, down to 1 / (n(n+1)/2) for element n. In general the numerator is n+1-i, where i is the number you are checking, so the probability of any element i is (n-i+1) / (n(n+1)/2). All probabilities are between 0 and 1 and sum to 1 by definition, and that is key to the next step.
How can we use this function to skew the number of times an element appears? It's easier with continuous distributions (i.e. doubles instead of ints) but we can do it. First let's make an array of our probabilities, call it c, and make a running sum of them (cumsum) and store it back in c. If that doesn't make sense, it's just a loop like
for(j=1; j < n; j++)
c[j]+=c[j-1]
Now that we have this cumulative distribution, generate a number i from 0 to 1 (a double, not an int). If i is between 0 and c[0], return 1; if i is between c[0] and c[1], return 2; and so on, all the way up to n, e.g.
for(j=0; j < n; j++)
if(i <= c[j]) return j+1
This will distribute the integers according to the probabilities you have calculated.
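Here is one way this could be sketched in PHP. Note that it swaps in a variation: instead of storing float cumulative probabilities in an array, it draws an integer "ticket" between 1 and n(n+1)/2 and walks the integer weights n, n-1, ..., 1, which avoids both the large array and float rounding. The function names are hypothetical, and random_int() is used on the assumption that the total may exceed the range of mt_rand() for large n.

```php
<?php
// Deterministic core: $ticket is a uniform integer in 1..n(n+1)/2.
// Value i owns (n - i + 1) tickets, so lower values are more likely.
function pickFromTicket(int $n, int $ticket): int
{
    for ($i = 1; $i <= $n; $i++) {
        $weight = $n - $i + 1;
        if ($ticket <= $weight) {
            return $i;
        }
        $ticket -= $weight;
    }
    throw new InvalidArgumentException('ticket out of range');
}

// Hypothetical wrapper that supplies the random ticket.
function skewedLinear(int $n): int
{
    $total = intdiv($n * ($n + 1), 2);
    return pickFromTicket($n, random_int(1, $total));
}

echo skewedLinear(10), "\n";
```

The walk is linear in n; for very large ranges a binary search over the closed-form cumulative sum would be faster, but that is beyond this sketch.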

<?php
//get random number between 1 and 10,000
$random = mt_rand(1, 10000);
?>
