Probability of a random variable [closed] - php

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I really feel ashamed to ask this question however I don't have time for revision. Also not a native English speaker, so excuse my lack of math vocabulary.
I am writing a program that requires assigning probabilities to variables then selecting one randomly.
Example:
Imagine that I have a coin, and I would like to assign a probability of 70% to heads and 30% to tails. When I toss it, I would like a 70% chance that heads appears and a 30% chance of tails.
A dumb way to do it is to create an array of 100 cells, put heads in 70 cells and tails in 30, shuffle them, and select one randomly.
Edit 1: I would also like to point out that I am not limited to 2 variables. For example, let's say that I have 3 characters to select between (*,\$,#) and I want to assign the following probabilities to them: * = 30%, \$ = 30%, and # = 40%.
That's why I did not want to use the random function and wanted to see how it is done mathematically.

You want another way to do it? Most rand functions produce a decimal in [0, 1). For 30%, check whether the produced number is less than 0.3.
Though note, if you actually test the perceived "randomness", it's not really random.
In PHP, you can use rand(0, 99) (an integer instead of a double, so compare against 30 instead of 0.3). PHP's rand function uses a closed interval (both ends inclusive).
function randWithWeight($chanceToReturnTrue) { // chance in percent
return rand(0, 99) < $chanceToReturnTrue;
}
Edit: for the note about perceived randomness. Some math since you say you're coming from math... Generate numbers from 0-99, adding them to an array. Stop when the array contains a duplicate. It usually takes about ~20 passes (I'm getting 3-21 passes before duplicate, 10+ tries). So it's not what you'd expect as "random". Though, (I know I'm going off track), take a look at the birthday problem. It is "more random" than it seems.
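For what it's worth, the birthday-problem expectation can be computed directly rather than estimated from a handful of runs. This is only a sketch; the function name is made up for illustration. It sums the probability that the first k draws are all distinct, which gives the expected number of draws up to and including the first duplicate.

```php
<?php
// Expected number of draws from $n equally likely values until some value
// appears a second time (counting the draw that produces the duplicate).
// Hypothetical helper, not part of PHP.
function expectedDrawsUntilDuplicate(int $n): float
{
    $expected = 0.0;
    $allDistinct = 1.0; // P(first k draws are all distinct), starting with k = 0
    for ($k = 0; $k <= $n; $k++) {
        $expected += $allDistinct;
        // the next draw must avoid the k values already seen
        $allDistinct *= ($n - $k) / $n;
    }
    return $expected;
}

printf("%.2f\n", expectedDrawsUntilDuplicate(100));
```

For 100 possible values this comes out at roughly 13 draws, so seeing a duplicate somewhere between 3 and 21 draws is what a fair generator should do.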

Here is a simple function to calculate weighted rand:
<?php
function weightedRand($weights, $weight_sum = 100){
$r = rand(1,$weight_sum);
$n = count($weights);
$i = 0;
while($r > 0 && $i < $n){
$r -= $weights[$i];
$i++;
}
return $i - 1;
}
This function accepts an array of weights. For example, array(30,70) gives a 30% chance of returning 0 and a 70% chance of returning 1. It works for any number of weights, as long as they add up to $weight_sum.
Its principle is to subtract each weight in turn from the generated random number until the result is less than or equal to zero.
Demo with 30%:70%
Demo with 20%:30%:50%
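The same idea can be applied to the three-character example from the question. The sketch below is an assumption-laden variant, not the function above: weightedPick is a made-up name, it takes an associative array so the picked key is returned directly, and it computes the total with array_sum instead of taking it as a parameter.

```php
<?php
// Pick a key from an associative array of positive integer weights.
// Hypothetical helper for illustration only.
function weightedPick(array $weights)
{
    $r = mt_rand(1, array_sum($weights)); // 1..total, both ends inclusive
    foreach ($weights as $key => $weight) {
        $r -= $weight;
        if ($r <= 0) {
            return $key;
        }
    }
    return null; // not reached when all weights are positive integers
}

echo weightedPick(['*' => 30, '$' => 30, '#' => 40]), "\n";
```

Each key owns a slice of the range 1..total that is as wide as its weight, so the chance of landing in a slice is weight / total.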

If you want 30% probability just do
if(rand(1,100) <= 30){
// execute code
}

One way would be
$r=rand(1,100);
if($r<=70)
{
echo "Head";
}
else
{
echo "Tail";
}

Related

How can I overcome discrete nature of a numerical algorithm that is currently skipping over a certain real number inside a loop?

I have a fairly complex algorithm that performs a search where I use a $search variable in some range [0.25 to 1.75].
Based on the algorithm, something "interesting" happens when $search is exactly 1, because it hits a configuration of variables that is sometimes (but not always) the most favorable. Some of the code depends on $search being exactly 1 to produce that most favorable outcome.
More specifically, there is usually some specific value within the search range that produces the most favorable outcome, but the way my algorithm is laid out, that specific value is most often skipped over. Here I lay out an example where that specific value (based on other inputs and configuration) happens to be exactly 1.
The Problem
Mathematically speaking, if $search were continuous rather than discrete, I wouldn't have this problem. My problem is trying to converge on the most favorable variable configuration using discrete mathematics. The issue here is the algorithm. A secondary issue to watch out for is floating point arithmetic, but I do not believe that is the issue here just yet.
Basic Loop:
$maxPowerOut = 0 ;
for ($increment = 0; $increment <= 500; $increment ++)
{
//vars computed elsewhere, i.e:
//MIN = 0.24651533;
//STEP = 0.00196969
$search = MIN + STEP * $increment;
//compute several coefficients (returns an array)
$coeff = $this->coefficient($search);
//design is a complex library function
list($a, $b) = $this->design($coeff);
$powerOut = $a * $b;
//keep track of max power (and other params, not shown)
if ($powerOut > $maxPowerOut)
$maxPowerOut = $powerOut;
}
//currently prints 899.993 instead of 900 as should be expected
print "Max Power is $maxPowerOut";
Naturally, $search is almost never 1 exactly. It goes like this:
0.99569478115682
0.99866447159913
1.0016341620414
1.0046038524837
1.0075735429261
...
Note how 1 is skipped over in above loop. For the sake of argument let's say most favorable position happens at 1.003000. That value (1.003000) would be skipped over as well.
Question
How can I improve, restructure, rethink, reorganize, rewrite my loop to avoid this type of problem?
A simple improvement might be to use an iterative approach:
In your current loop you search say 500 values in the interval [0.25, 1.75]. Let's say you can narrow down the optimum to the much smaller interval [0.995, 1.007] in this way. Then again divide this interval into say 500 values and repeat your loop. Repeat until you reach the desired precision.
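A rough sketch of that repeated narrowing, assuming the power computation can be wrapped in a callable. The name refineMax, its parameters, and the toy function in the usage line are all made up for illustration; the real coefficient() and design() calls would go inside the callable.

```php
<?php
// Scan [low, high] on a grid, then zoom in around the best grid point and repeat.
// Assumes $f has a single "hump" inside the interval.
function refineMax(callable $f, float $low, float $high, int $steps = 500, int $rounds = 4): array
{
    $min = $low;
    $max = $high;
    $bestX = $low;
    $bestY = $f($low);
    for ($round = 0; $round < $rounds; $round++) {
        $step = ($high - $low) / $steps;
        for ($i = 0; $i <= $steps; $i++) {
            $x = $low + $step * $i;
            $y = $f($x);
            if ($y > $bestY) {
                $bestY = $y;
                $bestX = $x;
            }
        }
        // For a single hump, the true optimum is within one step of the best grid point.
        $low = max($min, $bestX - $step);
        $high = min($max, $bestX + $step);
    }
    return [$bestX, $bestY];
}

// Toy stand-in for the real power function, peaking at $search = 1.
list($search, $power) = refineMax(function ($s) { return 900 - 1000 * ($s - 1) ** 2; }, 0.25, 1.75);
printf("%.6f %.3f\n", $search, $power);
```

Each round costs the same number of evaluations as the original loop but shrinks the interval by a factor of about $steps / 2.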
Mathematically, you want to find the maximum within a given interval of a function f: search -> power that computes some power value for a given search parameter. Note that this is generally easier the smoother your function f is. To get a feeling for what f might look like, you can plot the function based on the values you computed in your loop.
If your function is well-behaved and is say unimodal (has only one "hump"), then for instance a simple golden section search would be effective.
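In case it helps, here is a minimal golden section search in PHP. It is a sketch under the assumption that the function is unimodal on the interval; goldenSectionMax and the toy function in the usage line are made-up names, and the real power computation would replace the closure.

```php
<?php
// Golden section search for the maximum of a unimodal function on [low, high].
// Returns the x position of the maximum to within roughly $tol.
function goldenSectionMax(callable $f, float $low, float $high, float $tol = 1e-7): float
{
    $invPhi = (sqrt(5) - 1) / 2; // about 0.618
    $c = $high - $invPhi * ($high - $low);
    $d = $low + $invPhi * ($high - $low);
    $fc = $f($c);
    $fd = $f($d);
    while (($high - $low) > $tol) {
        if ($fc > $fd) {
            // the maximum lies in [low, d]; the old c becomes the new d
            $high = $d;
            $d = $c;
            $fd = $fc;
            $c = $high - $invPhi * ($high - $low);
            $fc = $f($c);
        } else {
            // the maximum lies in [c, high]; the old d becomes the new c
            $low = $c;
            $c = $d;
            $fc = $fd;
            $d = $low + $invPhi * ($high - $low);
            $fd = $f($d);
        }
    }
    return ($low + $high) / 2;
}

printf("%.5f\n", goldenSectionMax(function ($s) { return 900 - 1000 * ($s - 1) ** 2; }, 0.25, 1.75));
```

Only one new function evaluation is needed per iteration, which matters when the design step is expensive.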
Here's a quick JavaScript snippet / pseudo code, to help solve your problem. Basically your function should recursively call itself if you find that the deltas / slope have toggled from positive to negative.
function findMax(low, high) {
var maxOut = -Infinity; // Number.MIN_VALUE is the smallest positive number, not the most negative
// Calculate a step based on the low and high
// Using a power of 2 since the floating point numbers are represented by binary
var step = Math.abs((high - low) / 128);
// we'll be tracking the deltas of two test values
var prevDelta;
var delta;
// loop and check two values at a time
for(var i=low; i<=(high - step); i+=step) {
// coef ...
// design ...
// for testing
var out1 = Math.cos(i);
var out2 = Math.cos(i + step);
// update the max
if(out1 > maxOut) maxOut = out1;
if(out2 > maxOut) maxOut = out2;
// calc delta
delta = out2 - out1;
if(prevDelta !== undefined) {
// If one delta is going up and
// another is going down...
// Recursively call the function
if(prevDelta > 0 && delta < 0) {
var out3 = findMax(i - step, i + step);
// update the max
if(out3 > maxOut) maxOut = out3;
}
}
prevDelta = delta;
}
return maxOut;
}
alert(findMax(-0.5, 0.5)); // returns 1
Here's the JSFiddle http://jsfiddle.net/hw5f2o1s/
The above approach won't work if the maximum lies between your initial low and low + step, because the recursion is triggered by reaching a peak then going down from it. If this happens you may have to make the step variable smaller by increasing the power of two dividing (high - low).
Floating point numbers have limited precision (they're discrete), so expect deviations.
See: http://php.net/manual/en/language.types.float.php
You can try the arbitrary precision extension
Current direction
Number 1.0 seems to be of importance, perhaps representing default. Rework the code to include 1.0 as part of the $search, either injecting it as part of the same loop or as a separate iteration.
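One way to read that suggestion: build the list of $search values first, add the special values to it, and then loop over the list. A small sketch follows; searchValues and its parameters are made up for illustration, and the MIN and STEP figures are the ones from the question.

```php
<?php
// Build the regular grid of search values and inject extra "must test" values.
// Hypothetical helper for illustration only.
function searchValues(float $min, float $step, int $count, array $extra = []): array
{
    $values = [];
    for ($i = 0; $i <= $count; $i++) {
        $values[] = $min + $step * $i;
    }
    $max = $min + $step * $count;
    foreach ($extra as $x) {
        // only keep extras that fall inside the searched range
        if ($x >= $min && $x <= $max) {
            $values[] = (float) $x;
        }
    }
    sort($values);
    return $values;
}

foreach (searchValues(0.24651533, 0.00196969, 500, [1.0]) as $search) {
    // compute coefficients, run the design, and track the max power here
}
```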

10 digit mt_rand() with unbiased first digit

I want to generate the profile ids in my software. The mt_rand function works well, but I need the ids to be exactly 10 digits long. Currently I am looping through mt_rand outputs until I get a 10-digit number. But the problem I am facing now is that most of the profile ids start with 1 and some with 2. None start with any other digit. I understand this happens because of mt_rand's range: it can't produce 10-digit numbers that start with 3 or more.
This is what I am currently doing
for($i = 0; $i < 200; $i++){
$num = mt_rand();
if(strlen($num) == 10) echo $num."<br>";
}
If you run the above code you will see that all numbers start with either 1 or 2. Is there any way to fix this?
Edit: I guess I can just flip the numbers but some numbers end with zero and this seems like a bit of a hack anyways. But then again, random number generation is a hack in itself I guess.
Just start your IDs at 1000000001, then ID 2 at 1000000002, ID 543 at 1000000543, and so on?
Alternatively, keep calling mt_rand(1000000001,min((PHP_INT_SIZE>4 ? intval("9999999999",10): PHP_INT_MAX),mt_getrandmax())) until you get an ID which does not already exist in your database? (This will become more and more CPU intensive as your db grows larger and larger. When it's almost full, I wouldn't be surprised if it took billions of iterations and several minutes.)
To elaborate on Rizier's suggestion, the only way to ensure any string (even a string of numbers) fits a given mold for length and rules is to generate it one character at a time and then fit them together
$str = '';
for($loop = 0; $loop < 10; $loop++) {
$str .= mt_rand(0,9);
}
echo $str;
You can then add rules to this. Maybe you don't want a leading 0 so you can add a rule for that. Maybe you want letters too. This will always give you a random string with the rules you want.
You can see this in action here http://3v4l.org/kIRdV
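As an example of adding a rule, here is a sketch with the "no leading 0" rule applied. The function name randomProfileId is made up; uniqueness is not handled here and would still need a check against the database, as mentioned above.

```php
<?php
// Build a fixed-length numeric id one digit at a time.
// The first digit is drawn from 1-9 so the id never starts with 0.
function randomProfileId(int $length = 10): string
{
    $id = (string) mt_rand(1, 9);
    for ($i = 1; $i < $length; $i++) {
        $id .= mt_rand(0, 9);
    }
    return $id;
}

echo randomProfileId(), "\n";
```

Because every digit is drawn separately, the first digit is spread evenly over 1 to 9 instead of being limited by mt_getrandmax().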

Integer string compression algorithm [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
Can someone please name an existing algorithm that is used for compressing numbers? The numbers are integers and totally random, with no spaces or decimals, e.g. 35637462736423478235687479567456....n
So far, all I have is this. It converts the integers into ASCII, reducing the size by approximately 40%:
function intergerToChar($v)
{
$buffer=array();
$charsLen=strlen($v);
for($i = 0; $i < $charsLen; $i++)
{
$asc=$v[$i];
if($asc==0){$buffer[]=0;}
elseif($asc==1){$buffer[]=$v[$i].$v[$i+1].$v[$i+2];$i=$i+2;}
elseif($asc==2)
{
if($v[$i+1]<5){$buffer[]=$v[$i].$v[$i+1].$v[$i+2];$i=$i+2;}
elseif($v[$i+1]==5 && $v[$i+2]<6){$buffer[]=$v[$i].$v[$i+1].$v[$i+2];$i=$i+2;}
else{$buffer[]=$v[$i].$v[$i+1];$i++;}
}
else{$buffer[]=$v[$i].$v[$i+1];$i++;}
}
return $buffer;
}
btw, i know PHP is not meant for building a compression tool. I'll be using C/C++
UPDATE: This is another piece of PHP code with a better compression result than the code above. It can compress up to 66% if the integers at the 1st, 6th, 12th, and so on positions have values of less than 256 and the 3 integers following them have values not more than 256 greater than the preceding 3 integers. E.g. 134298156286159.... can be compressed up to 66%. I know it's not optimal; please feel free to make suggestions/corrections.
function intergerToChar2($v)
{
$buffer=array();
$charsLen=strlen($v);
for($i = 0; $i < $charsLen; $i++)
{
if($v[$i].$v[$i+1].$v[$i+2]<256){$base=$v[$i].$v[$i+1].$v[$i+2];$i=$i+2;}
else{$base=$v[$i].$v[$i+1];$i=$i+1;}$i=$i+1;
if($v[$i].$v[$i+1].$v[$i+2]<256){$next=$v[$i].$v[$i+1].$v[$i+2];$i=$i+2;}
else{$next=$v[$i].$v[$i+1];$i=$i+1;}
if($next!=="")
{
$next=$next-$base;
if($next<0)$next=255+$next;
}
$buffer[]=$base;
$buffer[]=$next;
}
return $buffer;
}
btw, 10 bit encoding or 40 bit encoding can be easily done using base_convert() or 4th comment from http://php.net/manual/en/ref.bc.php page which always shows a compression of about 58.6%.
If the digits are random, then you cannot compress the sequence more than the information-theoretic limit, which is log2(10) bits/digit. (Actually, it's slightly more than that unless the precise length of the string is fixed.) You can achieve that limit by representing the digits as a (very long) binary number; however, that's awkward and time-consuming to compress and decompress.
A very near optimal solution results from the fact that 1000 is only slightly less than 2^10, so you can represent 3 digits using 10 bits. That's 3.33 bits/digit, compared with the theoretically optimal 3.32 bits/digit. (In other words, it's about 99.7% optimal.)
Since there are actually 1024 possible 10-bit codes, and you only need 1000 of them to represent 3 digits, you have some left over; one of them can be used to indicate the end of the stream, if necessary.
It's a little bit annoying to output 10-bit numbers. It's easier to output 40-bit numbers, since 40 bits is exactly five bytes. Fortunately, most languages these days support 40-bit arithmetic (actually 64-bit arithmetic).
(Note: This is not that different from your solution. But it's a bit easier and a bit more compressed.)
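A sketch of the 40-bit variant in PHP, under two loud assumptions: the digit string length is a multiple of 12 (padding and the end-of-stream code are left out), and PHP is a 64-bit build. The names packDigits and unpackDigits are made up for illustration.

```php
<?php
// Pack 12 decimal digits (4 groups of 3) into 40 bits = 5 bytes.
// Assumes strlen($digits) is a multiple of 12 and 64-bit integers.
function packDigits(string $digits): string
{
    $out = '';
    foreach (str_split($digits, 12) as $chunk) {
        $bits = 0;
        foreach (str_split($chunk, 3) as $group) {
            $bits = ($bits << 10) | (int) $group; // 000..999 fits in 10 bits
        }
        // emit the 40 bits as 5 bytes, most significant byte first
        for ($shift = 32; $shift >= 0; $shift -= 8) {
            $out .= chr(($bits >> $shift) & 0xFF);
        }
    }
    return $out;
}

// Reverse of packDigits: 5 bytes back into 12 digits.
function unpackDigits(string $bytes): string
{
    $out = '';
    foreach (str_split($bytes, 5) as $chunk) {
        $bits = 0;
        for ($i = 0; $i < 5; $i++) {
            $bits = ($bits << 8) | ord($chunk[$i]);
        }
        for ($shift = 30; $shift >= 0; $shift -= 10) {
            $out .= sprintf('%03d', ($bits >> $shift) & 0x3FF);
        }
    }
    return $out;
}

$packed = packDigits('356374627364234782356874');
echo strlen($packed), " bytes\n";
echo unpackDigits($packed), "\n";
```

Here 24 digits become 10 bytes, which is the 10 bits per 3 digits described above.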

RNG gaussian distribution

What I'm trying to do isn't exactly a Gaussian distribution, since it has a finite minimum and maximum. The idea is closer to rolling X dice and counting the total.
I currently have the following function:
function bellcurve($min=0,$max=100,$entropy=-1) {
$sum = 0;
if( $entropy < 0) $entropy = ($max-$min)/15;
for($i=0; $i<$entropy; $i++) $sum += rand(0,15);
return floor($sum/(15*$entropy)*($max-$min)+$min);
}
The idea behind the $entropy variable is to try and roll enough dice to get a more even distribution of fractional results (so that flooring it won't cause problems).
It doesn't need to be a perfect RNG, it's just for a game feature and nothing like gambling or cryptography.
However, I ran a test over 65,536 iterations of bellcurve() with no arguments, and the following graph emerged:
(source: adamhaskell.net)
As you can see, there are a couple of values that are "offset", and drastically so. While overall it doesn't really affect that much (at worst it's offset by 2, and ignoring that the probability is still more or less where I want it), I'm just wondering where I went wrong.
Any additional advice on this function would be appreciated too.
UPDATE: I fixed the problem above just by using round instead of floor, but I'm still having trouble getting a good function for this. I've tried pretty much every function I can think of, including gaussian, exponential, logistic, and so on, but to no avail. The only method that has worked so far is this approximation of rolling dice, which is almost certainly not what I need...
If you are looking for a bell curve distribution, generate multiple random numbers and add them together. If you are looking for more modifiers, simply multiply the end result by them.
Generate a random bell curve number, with a bonus of 50% - 150%:
array_sum(array(rand(0,15), rand(0,15), rand(0,15))) * (rand(2,6)/4)
Though if you're concerned about rand not providing random enough numbers, you can use mt_rand, which has a much better distribution (it uses the Mersenne Twister).
The main issue turned out to be that I was trying to generate a continuous bell curve based on a discrete variable. That's what caused holes and offsets when scaling the result.
The fix I used for this was: +rand(0,1000000)/1000000 - it essentially takes the whole number discrete variable and adds a random fraction to it, more or less making it continuous.
The function is now:
function bellcurve() {
$sum = 0;
$entropy = 6;
for($i=0; $i<$entropy; $i++) $sum += rand(0,15);
return ($sum+rand(0,1000000)/1000000)/(15*$entropy);
}
It returns a float between 0 and just over 1 (at most 91/90, since the added fraction can push the top value past 15*$entropy), although the extreme values are extremely unlikely. The result can then be scaled and rounded as needed.
Example usage:
$damage *= bellcurve()+0.5; // adjusts $damage by a random amount
// between 50% and 150%, weighted in favour of 100%
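For comparison, here is a variant sketch that stays strictly within 0 and 1 by dividing by the largest value the numerator can reach. This is an assumption on my part rather than the function above: bellcurveUnit is a made-up name, and it uses mt_rand instead of rand.

```php
<?php
// Sum of $dice rolls of mt_rand(0, 15) plus a random fraction,
// divided by the largest reachable total so the result stays in [0, 1].
function bellcurveUnit(int $dice = 6): float
{
    $sum = 0;
    for ($i = 0; $i < $dice; $i++) {
        $sum += mt_rand(0, 15);
    }
    $fraction = mt_rand(0, 1000000) / 1000000; // smooths the gaps between whole sums
    return ($sum + $fraction) / (15 * $dice + 1);
}

$damage = 100;
$damage *= bellcurveUnit() + 0.5; // scales by 50% to 150%, most often near 100%
echo $damage, "\n";
```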

Is any solution the correct solution?

I always think to myself after solving a programming challenge that I have been tied up with for some time, "It works, that's good enough".
I don't think this is really the correct mindset, and I think I should always be trying to code with the greatest performance.
Anyway, with this said, I just tried a ProjectEuler question. Specifically question #2.
How could I have improved this solution? I feel like it's really verbose. For example, I'm passing the previous number in the recursion.
<?php
/* Each new term in the Fibonacci sequence is generated by adding the previous two
terms. By starting with 1 and 2, the first 10 terms will be:
1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
Find the sum of all the even-valued terms in the sequence which do not exceed
four million.
*/
function fibonacci ( $number, $previous = 1 ) {
global $answer;
$fibonacci = $number + $previous;
if($fibonacci > 4000000) return;
if($fibonacci % 2 == 0) {
$answer = is_numeric($answer) ? $answer + $fibonacci : $fibonacci;
}
return fibonacci($fibonacci, $number);
}
fibonacci(1);
echo $answer;
?>
Note this isn't homework. I left school hundreds of years ago. I am just feeling bored and going through the Project Euler questions
I always think to myself after solving a programming challenge that I have been tied up with for some time, "It works, thats good enough".
I don't think this is really the correct mindset, in my opinion and I think I should always be trying to code with the greatest performance.
One of the classic things presented in Code Complete is that programmers, given a goal, can create an "optimum" computer program using one of many metrics, but it's impossible to optimize for all of the parameters at once. Parameters such as
Code Readability
Understandability of Code Output
Length of Code (lines)
Speed of Code Execution (performance)
Speed of writing code
Feel free to optimize for any one of these parameters, but keep in mind that optimizing for all of them at the same time can be an exercise in frustration, or result in an overdesigned system.
You should ask yourself: what are your goals? What is "good enough" in this situation? If you're just learning and want to make things more optimized, by all means go for it, just be aware that a perfect program takes infinite time to build, and time is valuable in and of itself.
You can avoid the mod 2 section by doing the operation three times (every third element is even), so that it reads:
$fibonacci = 3*$number + 2*$previous;
and the new input to fibonacci is ($fibonacci, 2*$number+$previous)
I'm not familiar with php, so this is just general algorithm advice, I don't know if it's the right syntax. It's practically the same operation, it just substitutes a few multiplications for moduluses and additions.
Also, make sure that you start with $number as even and the $previous as the odd one that precedes it in the sequence (you could start with $number as 2, $previous as 1, and have the sum also start at 2).
Forget about Fibonacci (Problem 2), i say just advance in Euler. Don't waste time finding the optimal code for every question.
If your answer achieves the One minute rule then you are good to try the next one. After few problems, things will get harder and you will be optimizing the code while you write to achieve that goal
Others on here have said it as well "This is part of the problem with example questions vs real business problems"
The answer to that question is very difficult to answer for a number of reasons:
Language plays a huge role. Some languages are much more suited to some problems, so if you are faced with a mismatch you are going to find your solution "less than eloquent".
It depends on how much time you have to solve the problem. The more time you have, the more likely it is you will come to a solution you like (though the reverse is occasionally true as well: too much time makes you overthink).
It depends on your level of satisfaction overall. I have worked on several projects where I thought parts were great and coded beautifully, and other parts were utter garbage, but they were outside of what I had time to address.
I guess the bottom line is: if you think it's a good solution, and your customer/purchaser/team/etc. agree, then it's a good solution for the time. You might change your mind in the future, but for now it's a good solution.
Use the guideline that the code to solve the problem shouldn't take more than about a minute to execute. That's the most important thing for Euler problems, IMO.
Beyond that, just make sure it's readable - make sure that you can easily see how the code works. This way, you can more easily see how things worked if you ever get a problem like one of the Euler problems you solved, which in turn lets you solve that problem more quickly - because you already know how you should solve it.
You can set other criteria for yourself, but I think that's going above and beyond the intention of Euler problems - to me, the context of the problems seem far more suitable for focusing on efficiency and readability than anything else
I didn't actually test this ... but there was something i personally would have attempted to solve in this solution before calling it "done".
Avoiding globals as much as possible by implementing recursion with a sum argument
EDIT: Update according to nnythm's algorithm recommendation (cool!)
function fibonacci ( $number, $previous, $sum ) {
// $number is an even term, $previous is the odd term just before it
$fibonacci = 3*$number + 2*$previous; // the next even term
if($fibonacci > 4000000) { return $sum; }
return fibonacci($fibonacci, 2*$number+$previous, $sum+$fibonacci);
}
echo fibonacci(2,1,2);
[shrug]
A solution should be evaluated by the requirements. If all requirements are satisfied, then everything else is moxy. If all requirements are met, and you are personally dissatisfied with the solution, then perhaps the requirements need re-evaluation. That's about as far as you can take this meta-physical question, because we start getting into things like project management and business :S
Ahem, regarding your Euler-Project question, just my two-pence:
Consider refactoring to iterative, as opposed to recursive
Notice every third term in the series is even? No need to modulo once you are given your starting term
For example
public const ulong TermLimit = 4000000;
public static ulong CalculateSumOfEvenTermsTo (ulong termLimit)
{
// sum!
ulong sum = 0;
// initial conditions
ulong prevTerm = 1;
ulong currTerm = 1;
ulong swapTerm = 0;
// unroll first even term, [odd + odd = even]
swapTerm = currTerm + prevTerm;
prevTerm = currTerm;
currTerm = swapTerm;
// begin iterative sum,
for (; currTerm < termLimit;)
{
// we have ensured currTerm is even,
// and loop condition ensures it is
// less than limit
sum += currTerm;
// next odd term, [odd + even = odd]
swapTerm = currTerm + prevTerm;
prevTerm = currTerm;
currTerm = swapTerm;
// next odd term, [even + odd = odd]
swapTerm = currTerm + prevTerm;
prevTerm = currTerm;
currTerm = swapTerm;
// next even term, [odd + odd = even]
swapTerm = currTerm + prevTerm;
prevTerm = currTerm;
currTerm = swapTerm;
}
return sum;
}
So, perhaps more lines of code, but [practically] guaranteed to be faster. An iterative approach is not as "elegant", but saves recursive method calls and saves stack space. Second, unrolling term generation [that is, explicitly expanding a loop] reduces the number of times you would have had to perform modulus operation and test "is even" conditional. Expanding also reduces the number of times your end conditional [if current term is less than limit] is evaluated.
Is it "better", no, it's just "another" solution.
Apologies for the C#, not familiar with php, but I am sure you could translate it fairly well.
Hope this helps, :)
Cheers
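Since the question is about PHP, the C# above might translate roughly as follows. This is a sketch, not a line-by-line port: sumEvenFibonacci is a made-up name, the three unrolled steps are written as a small inner loop, and the condition uses <= to match "do not exceed".

```php
<?php
// Sum the even Fibonacci terms up to $limit by stepping three terms at a time,
// since every third term is even (1, 2, 3, 5, 8, 13, 21, 34, ...).
function sumEvenFibonacci(int $limit): int
{
    $sum = 0;
    $prev = 1;
    $curr = 2; // first even term
    while ($curr <= $limit) {
        $sum += $curr;
        // advance odd, odd, even: three steps land on the next even term
        for ($i = 0; $i < 3; $i++) {
            list($prev, $curr) = [$curr, $prev + $curr];
        }
    }
    return $sum;
}

echo sumEvenFibonacci(4000000), "\n"; // prints 4613732
```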
It is completely your choice, whether you are happy with a solution or whether you want to improve it further. There are many project Euler problems where a brute force solution would take too long, and where you will have to look for a clever algorithm.
Problem 2 doesn't require any optimisation. Your solution is already more than fast enough.
Still let me explain what kind of optimisation is possible. Often it helps to do some research on the subject. E.g. the wiki page on Fibonacci numbers contains this formula
fib(n) = (phi^n - (1-phi)^n)/sqrt(5)
where phi is the golden ratio. I.e.
phi = (sqrt(5)+1)/2.
If you use that fib(n) is approximately phi^n/sqrt(5) then you can find the index of the largest Fibonacci number smaller than M by
n = floor(log(M * sqrt(5)) / log(phi)).
E.g. for M=4000000, we get n=33, hence fib(33) the largest Fibonacci number smaller than 4000000. It can be observed that fib(n) is even if n is a multiple of 3. Hence the sum of the even Fibonacci numbers is
fib(0) + fib(3) + fib(6) + ... + fib(3k)
To find a closed form we use the formula above from the wikipedia page and notice that
the sum is essentially just two geometric series. The math isn't completely trivial, but using these ideas it can be shown that
fib(0) + fib(3) + fib(6) + ... + fib(3k) = (fib(3k + 2) - 1) /2 .
Since fib(n) has O(n) digits, adding the terms up one by one in the straightforward solution has a complexity of O(n^2).
Using the closed formula above together with a fast method to evaluate Fibonacci numbers
has a complexity of O(n log(n)^(1+epsilon)). For small numbers either solution is of course fine.
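The formulas above can be checked with a few lines of PHP. This is a sketch with made-up function names; fib here is a plain loop rather than one of the fast methods, and the logarithm estimate of n is an approximation that could be off by one for limits very close to a Fibonacci number.

```php
<?php
// fib(0) = 0, fib(1) = 1, computed with a simple loop.
function fib(int $n): int
{
    $a = 0;
    $b = 1;
    for ($i = 0; $i < $n; $i++) {
        list($a, $b) = [$b, $a + $b];
    }
    return $a;
}

// Sum of even Fibonacci numbers below $limit via (fib(3k + 2) - 1) / 2.
function sumEvenFibClosedForm(int $limit): int
{
    $phi = (sqrt(5) + 1) / 2;
    // approximate index of the largest Fibonacci number below $limit
    $n = (int) floor(log($limit * sqrt(5)) / log($phi));
    $k = intdiv($n, 3); // the last even term is fib(3k)
    return intdiv(fib(3 * $k + 2) - 1, 2);
}

echo sumEvenFibClosedForm(4000000), "\n"; // prints 4613732
```

For M = 4000000 this gives n = 33 and k = 11, so the sum is (fib(35) - 1) / 2.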
