generating random numbers from skewed normal distribution - php

When you use the random(min, max) function in most languages, what is the distribution like?
What if I want to produce one range of numbers 20% of the time and another range of numbers 80% of the time? How can I generate a series of random numbers that follows that?
Example: I should get random output, but the frequency of "1" must be about 20% higher than the frequency of "0".

In most languages, the random number comes from an algorithm built into the language, seeded from factors such as the time, the processor, or an explicit seed number.
The distribution is not normal. In fact, if the function can return 5 different integers, all 5 have an equal chance of appearing on the next call. This is known as a uniform distribution.
So say if you wish to produce a number (say 7) for 20% of the time, and another number (say 13) for 80% of the time, you can do an array like this:
var arr = [7,13,13,13,13];
var picked = arr[Math.floor(Math.random()*arr.length)] ;
// Math.random() returns a float in [0.0, 1.0), so the index is always 0..4
Thus 7 has a 20% chance of appearing, and 13 has an 80% chance.

This is one possible method:
ranges = [(10..15), (20..30)]
selector = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1] # 20% first range, 80% second range
# return a random integer within the range, honoring an excluded end
def random_within_range(range)
  rand(range.last - range.begin + (range.exclude_end? ? 0 : 1)) + range.begin
end
# now select a range randomly, then a number within it
random_within_range(ranges[selector[rand(10)]])

Most pseudo-random generators built into programming languages produce a uniform distribution, i.e. each value within the range has the same probability of being produced as any other value in the range. Indeed, in some cases this requirement is part of the language standard. Some languages, such as Python or R, support several of the common distributions.
If the language doesn't support it, you either have to use mathematical tricks to produce other distributions such as a normal distribution from a uniform one, or you can look for third-party libraries which perform this function.
Your problem seems much simpler, however, since the random variable is discrete (and of the simplest type thereof, i.e. binary). The trick for these is to produce a random number from the uniform distribution in a given range, say 0 to 999, and to split this range in the proportions associated with each value. In the case at hand this would be something like:
If (RandomNumber) < 200 // 20%
RandomVariable = 0
Else // 80%
RandomVariable = 1
This logic can of course be applied to n discrete variables.
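For the n-value case, a minimal sketch in JavaScript (the language some other answers here use) could look like the following. The name weightedPick and the injectable rng parameter are made up for illustration; the idea is the same range-splitting trick, just with a running threshold instead of hard-coded cut-offs.

```javascript
// Pick one of `values` with probability proportional to the matching weight.
// `rng` is injectable so the function can be checked deterministically.
function weightedPick(values, weights, rng = Math.random) {
  const total = weights.reduce((sum, w) => sum + w, 0);
  let threshold = rng() * total; // uniform in [0, total)
  for (let i = 0; i < values.length; i++) {
    threshold -= weights[i];
    if (threshold < 0) return values[i];
  }
  return values[values.length - 1]; // guard against rounding error
}

// 0 about 20% of the time, 1 about 80% of the time
const sample = weightedPick([0, 1], [20, 80]);
```

The weights do not have to sum to 100; they are only compared to their own total.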

Your question differs from your example quite a bit. So I'll answer both and you can figure out whichever answers what you're really looking for.
1) Your example (I don't know ruby or java, so bear with me)
First generate a random number from a uniform distribution from 0 to 1, we'll call it X.
You can then set up an if/else (i.e. if (x < .2) {1} else {0})
2) Generating random numbers from a normal distribution with skew
You can look into skewed distributions such as a skewed Student's t-distribution with high degrees of freedom.
You can also use the normal CDF and just pick off numbers that way.
Here's a paper which discusses how to do it with multiple random numbers from a uniform distribution
Finally, you can use a non-parametric approach which would involve kernel density estimation (I suspect you aren't looking for anything this sophisticated, however).

As everybody says, the pseudo-random number generators in most languages implement the uniform distribution over (0,1).
If you have two response categories (0,1) with probability p for 1, you have a Bernoulli distribution, which can be emulated with
# returns 1 with p probability and 0 with (1-p) probability
def bernoulli(p)
rand()<p ? 1:0;
end
Simple as that.
The skewed normal distribution is an entirely different beast, made from the product of the pdf and cdf of a normal distribution to create the skew. You can read Azzalini's work here. Using the distribution gem, you can generate the probability density function with
# require 'distribution'
def sn_pdf(x,alpha)
sp = 2*Distribution::Normal.pdf(x)*Distribution::Normal.cdf(x*alpha)
end
Obtaining the cdf is difficult because there isn't an analytical solution, so you have to integrate numerically.
To obtain random numbers from a skewed normal, you could use the acceptance-rejection algorithm.
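A rough sketch of acceptance-rejection for this density, in JavaScript rather than Ruby: propose x from a standard normal and accept it with probability cdf(alpha * x). Since the cdf has no closed form, the sketch draws a second normal w and accepts when w <= alpha * x, which happens with exactly that probability. The function names and the rng parameter are invented for the example, and the normal generator is the plain Box-Muller transform, not anything from the distribution gem.

```javascript
// Standard normal deviate via the basic Box-Muller transform.
function standardNormal(rng = Math.random) {
  const u = 1 - rng(); // shift to (0, 1] so Math.log never sees 0
  const v = rng();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Acceptance-rejection for the density 2 * pdf(x) * cdf(alpha * x):
// propose x ~ N(0,1) and accept it with probability cdf(alpha * x).
// Testing a second normal draw w <= alpha * x has exactly that
// probability, so no numerical cdf is needed.
function skewNormal(alpha, rng = Math.random) {
  for (;;) {
    const x = standardNormal(rng);
    const w = standardNormal(rng);
    if (w <= alpha * x) return x;
  }
}
```

On average half of the proposals are rejected, whatever alpha is.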

Most computer languages have a uniform distribution to their (pseudo) random integer generators. So each integer is equally likely.
For your example, suppose you want "1" 55% of the time and "0" 45% of the time.
To get these unequal frequencies, try generating a random number between 1 and 100. If the number generated is from 1 to 55, output "1"; otherwise output "0".

How about
var oneFreq = 80.0/100.0;
var output = 0;
if (Math.random() < oneFreq) // true about 80% of the time
output = 1;
or, if you want 20% of the values to be between 0 and 100, and 80% to be between 100 and 200:
var oneFreq = 80.0/100.0;
var oneRange = 100;
var zeroRange = 100;
var output = Math.random();
if (output < oneFreq)
  // 80% of the time: rescale [0, oneFreq) onto 100..199
  output = zeroRange + Math.floor(oneRange * output / oneFreq);
else
  // 20% of the time: rescale [oneFreq, 1) onto 0..99
  output = Math.floor(zeroRange * (output - oneFreq) / (1 - oneFreq));

In ruby I would do it like this:
class DistributedRandom
  def initialize(left, right = nil)
    if right
      @distribution = [0] * left + [1] * right
    else
      @distribution = left
    end
  end
  def get
    @distribution[rand(@distribution.length)]
  end
end
Running a test with 80:20 distribution:
test = [0,0]
rnd = DistributedRandom.new 80, 20 # 80:20 distribution
10000.times { test[rnd.get] += 1 }; puts "Test 1", test
Running a test with 20% more distribution on the right side:
test = [0,0]
rnd = DistributedRandom.new 100, 120 # +20% distribution
10000.times { test[rnd.get] += 1 }; puts "Test 2", test
Running a test with a custom distribution built from a trigonometric function over 91 discrete values; the output, however, does not fit very well with the previous tests:
test = [0,0]
rnd = DistributedRandom.new((0..90).map {|x| Math.sin(Math::PI * x / 180.0)})
10000.times { test[rnd.get] += 1 }; puts "Test 3", test

Have a look at this lecture if you want a good mathematical understanding.

Related

Two-way hashing of fixed range numbers

I need to create a function which takes a single integer as argument in the range 0-N and returns a seemingly random number in the same range.
Each input number should always have exactly one output and it should always be the same.
Such a function would produce something like this:
f(1) = 4
f(2) = 1
f(3) = 5
f(4) = 2
f(5) = 3
I believe this could be accomplished by some kind of a hashing algorithm? I don't need anything complex, just not something too simple like f(1) = 2, f(2) = 3 etc.
The biggest issue is that I need this to be reversible. E.g. the above table should be true left-to-right as well as right-to-left, using a different function for the right-to-left conversion is fine.
I know the easiest way is to create an array, shuffle it and just store the relations in a db or something, but as I need N to be quite large I'd like to avoid this if possible.
Edit: For my particular case N is a specific number, it's exactly 16777216 (64^4).
If the range is always a power of two -- like [0,16777216) -- then you can use exclusive-or just as @MarkBaker suggested. It just doesn't work so easily if your range is not a power of two.
You can use addition and subtraction modulo N, although these alone are too obvious, so you have to combine it with something else.
You can also do multiplication modulo-N, but reversing that is complicated. To make it simpler, we can isolate the bottom eight bits and multiply those and add them in a way that doesn't interfere with those bits so we can use them again to reverse the operation.
I don't know PHP so I'm going to give an example in C, instead. Maybe it's the same.
int enc(int x) {
x = x + 4799 * 256 * (x % 256);
x = x + 8896843;
x = x ^ 4777277;
return (x + 1073741824) % 16777216;
}
And to decode, play the operations back in reverse order:
int dec(int x) {
x = x + 1073741824;
x = x ^ 4777277;
x = x - 8896843;
x = x - 4799 * 256 * (x % 256);
return x % 16777216;
}
That 1073741824 must be a multiple of N, and 256 must be a factor of N, and if N is not a power of two then you can't (necessarily) use exclusive-or (^ is exclusive-or in C and I assume in PHP too). The other numbers you can fiddle with, and add and remove stages, at your leisure.
The addition of 1073741824 in both functions is to ensure that x stays positive; this is so that the modulo operation doesn't ever give a negative result, even after we've subtracted values from x which might have made it go negative in the interim.
I offered to describe how I "randomly" scramble up 9-digit SSNs when producing research data sets. This does not replace or hash an SSN. It re-orders the digits. It is difficult to put the digits back in the correct order if you don't know the order in which they were scrambled. I have a gut feeling that this is not what the questioner really wants. So, I am happy to delete this answer if it is deemed off-topic.
I know that I have 9 digits. So, I start with an array that has 9 index values in order:
$a = array(0,1,2,3,4,5,6,7,8);
Now, I need to turn a key that I can remember into a way to shuffle the array. The shuffling has to be the same order for the same key every time. I use a couple tricks. I use crc32 to turn a word into a number. I use srand/rand to get a predictable order of random values. Note: mt_rand no longer produces the same sequence of random digits with the same seed, so I have to use rand.
srand(crc32("My secret key"));
usort($a, function($a, $b) { return rand(-1,1); });
The array $a still has the digits 0 through 8, but they are shuffled. If I use the same keyword I will get the same shuffled order every time. That lets me repeat this every month and get the same result. Then, with a shuffled array, I can pick the digits off the SSN. First, I ensure it has 9 characters (some SSNs are sent as integers and a leading 0 is omitted). Then, I build a masked SSN by picking the digits using $a.
$ssn = str_pad($ssn, 9, '0', STR_PAD_LEFT);
$masked_ssn = '';
foreach ($a as $i) $masked_ssn .= $ssn[$i]; // square brackets: the {} offset syntax was removed in PHP 8
$masked_ssn will now have all the digits in $ssn, but in a different order. Technically, there are keywords that make $a become the original ordered array after shuffling, but that is very very rare.
Hopefully this makes sense. If so, you can do it all much faster. If you turn the original string into an array of characters, you can shuffle the array of characters. You just need to reseed rand every time.
$ssn = "111223333"; // Assume I'm using a proper 9-digit SSN
$a = str_split($ssn);
srand(crc32("My secret key"));
usort($a, function($a, $b) { return rand(-1,1); });
$masked_ssn = implode('', $a);
This is not really faster in a runtime way because rand is a rather expensive function and you run rand a hell of a lot more here. If you are masking thousands of values as I do, you will want to use an index array that is shuffled just once, not a shuffling for every value.
Now, how do I undo it? Assume I'm using the first method with the index array. It will be something like $a = [5, 3, 6, 1, 0, 2, 7, 8, 4]. Those are the indexes for the original SSN in the masked order. So, I can easily build the original SSN.
$ssn = '000000000'; // I like to define all 9 characters before I start
foreach ($a as $i => $j) $ssn[$j] = $masked_ssn[$i];
As you can see, $i counts from 0 to 8 across the masked SSN. $j counts 5, 3, 6... and puts each value from the masked SSN in the correct place in the original SSN.
Looks like you've got a good answer, but there is still an alternative. A Linear Congruential Generator (LCG) can provide a 1-to-1 mapping, and it is known to be reversible using Euclid's algorithm. For 24 bits:
X_i = (A * X_{i-1} + C) mod M
where M = 2^24 = 16,777,216
A = 16,598,013
C = 12,820,163
For LCG reversibility, take a look at Reversible pseudo-random sequence generator
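To make the LCG idea concrete, here is a hedged JavaScript sketch using the constants quoted above. The function names are invented for the example. The inverse step relies on the modular inverse of A modulo M, found with the extended Euclidean algorithm; that inverse exists because A is odd and M is a power of two.

```javascript
const M = 2 ** 24; // 16,777,216
const A = 16598013;
const C = 12820163;

// Modular inverse of a mod m via the extended Euclidean algorithm.
// Assumes gcd(a, m) === 1.
function modInverse(a, m) {
  let [oldR, r] = [a, m];
  let [oldS, s] = [1, 0];
  while (r !== 0) {
    const q = Math.floor(oldR / r);
    [oldR, r] = [r, oldR - q * r];
    [oldS, s] = [s, oldS - q * s];
  }
  return ((oldS % m) + m) % m;
}

const A_INV = modInverse(A, M);

// Forward step: A * x stays below 2^48, so plain doubles are exact.
function lcgNext(x) {
  return (A * x + C) % M;
}

// Inverse step: undo the addition, then multiply by A^-1 mod M.
function lcgPrev(y) {
  return (A_INV * ((y - C + M) % M)) % M;
}
```

Here lcgNext plays the role of the encoder and lcgPrev the decoder. A single LCG step is easy to spot as a pattern, so in practice it might be combined with other stages such as the exclusive-or from the earlier answer.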

How can I overcome discrete nature of a numerical algorithm that is currently skipping over a certain real number inside a loop?

I have a fairly complex algorithm that performs a search where I use a $search variable in some range [0.25 to 1.75].
An "interesting" thing happens in the algorithm when $search is exactly 1, because it hits a configuration of variables that is sometimes (but not always) most favorable. Some of the code depends on $search being exactly 1 to produce that most favorable outcome.
More specifically, there is usually some specific value within the search range which produces the most favorable outcome, but the way my algorithm is laid out, that specific value is most often skipped over. Here I lay out an example where that specific value (based on other inputs and configuration) happens to be exactly 1.
The Problem
Mathematically speaking, if $search were continuous rather than discrete, I wouldn't have this problem. My problem is trying to converge on the most favorable variable configuration using discrete mathematics. The issue here is the algorithm. A secondary issue to watch out for is floating point arithmetic, but I do not believe that is the issue here just yet.
Basic Loop:
$maxPowerOut = 0 ;
for ($increment = 0; $increment <= 500; $increment ++)
{
//vars computed elsewhere, i.e:
//MIN = 0.24651533;
//STEP = 0.00196969
$search = MIN + STEP * $increment;
//compute several coefficients (returns an array)
$coeff = $this->coefficient($search);
//design is a complex library function
list($a, $b) = $this->design($coeff);
$powerOut = $a * $b;
//keep track of max power (and other params, not shown)
if ($powerOut > $maxPowerOut)
$maxPowerOut = $powerOut; // PHP variable names are case-sensitive
}
//currently prints 899.993 instead of 900 as should be expected
print "Max Power is $maxPowerOut";
Naturally, $search is almost never 1 exactly. It goes like this:
0.99569478115682
0.99866447159913
1.0016341620414
1.0046038524837
1.0075735429261
...
Note how 1 is skipped over in above loop. For the sake of argument let's say most favorable position happens at 1.003000. That value (1.003000) would be skipped over as well.
Question
How can I improve, restructure, rethink, reorganize, rewrite my loop to avoid this type of problem?
A simple improvement might be to use an iterative approach:
In your current loop you search say 500 values in the interval [0.25, 1.75]. Let's say you can narrow down the optimum to the much smaller interval [0.995, 1.007] in this way. Then again divide this interval into say 500 values and repeat your loop. Repeat until you reach the desired precision.
Mathematically, you want to find the maximum within a given interval of a function f: search -> power that computes some power value for a given search parameter. Note that this is generally easier the smoother your function f is. To get a feeling for what f might look like, you can plot the function based on the values you computed in your loop.
If your function is well-behaved and is say unimodal (has only one "hump"), then for instance a simple golden section search would be effective.
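As a sketch of what a golden section search could look like, here it is in JavaScript. The power function below is only a stand-in that peaks at 1; in the real code it would wrap the coefficient and design calls. The function name and the tolerance default are assumptions made for the example, and the method only works if the function really is unimodal on the interval.

```javascript
// Golden-section search for the maximum of a unimodal function f on [lo, hi].
function goldenSectionMax(f, lo, hi, tol = 1e-9) {
  const invPhi = (Math.sqrt(5) - 1) / 2; // 1/phi, about 0.618
  let a = lo, b = hi;
  let c = b - invPhi * (b - a);
  let d = a + invPhi * (b - a);
  let fc = f(c), fd = f(d);
  while (b - a > tol) {
    if (fc > fd) {
      // maximum lies in [a, d]; the old c becomes the new d
      b = d; d = c; fd = fc;
      c = b - invPhi * (b - a);
      fc = f(c);
    } else {
      // maximum lies in [c, b]; the old d becomes the new c
      a = c; c = d; fc = fd;
      d = a + invPhi * (b - a);
      fd = f(d);
    }
  }
  return (a + b) / 2;
}

// Stand-in for the real power computation, peaking at search = 1.
const power = (search) => 900 - (search - 1) ** 2;
const best = goldenSectionMax(power, 0.25, 1.75);
```

Each iteration needs only one new function evaluation, so about 45 evaluations narrow [0.25, 1.75] down to a width of 1e-9, compared with 500 evaluations per pass for the fixed grid.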
Here's a quick JavaScript snippet / pseudo code, to help solve your problem. Basically your function should recursively call itself if you find that the deltas / slope have toggled from positive to negative.
function findMax(low, high) {
var maxOut = -Infinity; // lower than any real output
// Calculate a step based on the low and high
// Using a power of 2 since the floating point numbers are represented by binary
var step = Math.abs((high - low) / 128);
// we'll be tracking the deltas of two test values
var prevDelta;
var delta;
// loop and check two values at a time
for(var i=low; i<=(high - step); i+=step) {
// coef ...
// design ...
// for testing
var out1 = Math.cos(i);
var out2 = Math.cos(i + step);
// update the max
if(out1 > maxOut) maxOut = out1;
if(out2 > maxOut) maxOut = out2;
// calc delta
delta = out2 - out1;
if(prevDelta !== undefined) {
// If one delta is going up and
// another is going down...
// Recursively call the function
if(prevDelta > 0 && delta < 0) {
var out3 = findMax(i - step, i + step);
// update the max
if(out3 > maxOut) maxOut = out3;
}
}
prevDelta = delta;
}
return maxOut;
}
alert(findMax(-0.5, 0.5)); // returns 1
Here's the JSFiddle http://jsfiddle.net/hw5f2o1s/
The above approach won't work if the maximum lies between your initial low and low + step, because the recursion is triggered by reaching a peak then going down from it. If this happens you may have to make the step variable smaller by increasing the power of two dividing (high - low).
Floating point numbers have limited precision (they're discrete), so expect deviations.
See: http://php.net/manual/en/language.types.float.php
You can try the arbitrary precision extension
Current direction
Number 1.0 seems to be of importance, perhaps representing default. Rework the code to include 1.0 as part of the $search, either injecting it as part of the same loop or as a separate iteration.

How to calculate in PHP, Normal Distribution that exactly matches Excel Result using NORMDIST(x, mean, standard_dev, accumulative) [duplicate]

How can I convert a uniform distribution (as most random number generators produce, e.g. between 0.0 and 1.0) into a normal distribution? What if I want a mean and standard deviation of my choosing?
There are plenty of methods:
Do not use Box Muller. Especially if you draw many gaussian numbers. Box Muller yields a result which is clamped between -6 and 6 (assuming double precision. Things worsen with floats.). And it is really less efficient than other available methods.
Ziggurat is fine, but needs a table lookup (and some platform-specific tweaking due to cache size issues)
Ratio-of-uniforms is my favorite, only a few addition/multiplications and a log 1/50th of the time (eg. look there).
Inverting the CDF is efficient (and overlooked, why ?), you have fast implementations of it available if you search google. It is mandatory for Quasi-Random numbers.
The Ziggurat algorithm is pretty efficient for this, although the Box-Muller transform is easier to implement from scratch (and not crazy slow).
Changing the distribution of a random function to another involves using the inverse of the function you want.
In other words, if you aim for a specific probability density function p(x), you get its cumulative distribution by integrating it, d(x) = integral(p(x)), and then use the inverse, Inv(d). Now take the random function (which has a uniform distribution) and pass its result through Inv(d). You get random values distributed according to the function you chose.
This is the generic math approach - by using it you can choose any probability or distribution function you like, as long as it has an inverse or a good inverse approximation.
Hope this helped and thanks for the small remark about using the distribution and not the probability itself.
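The normal distribution has no closed-form inverse CDF, so as an illustration of the general recipe here is a small JavaScript sketch for the exponential distribution, where the inverse can be written down directly. The function names and the rng parameter are invented for the example.

```javascript
// Inverse CDF of the exponential distribution with rate lambda:
// F(x) = 1 - exp(-lambda * x)  =>  F^-1(u) = -ln(1 - u) / lambda
function exponentialInverseCdf(u, lambda) {
  return -Math.log(1 - u) / lambda;
}

// Push a uniform draw in [0, 1) through the inverse CDF.
function sampleExponential(lambda, rng = Math.random) {
  return exponentialInverseCdf(rng(), lambda);
}
```

For the normal distribution the same structure applies, with exponentialInverseCdf swapped for a numerical approximation of the inverse normal CDF.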
Here is a javascript implementation using the polar form of the Box-Muller transformation.
/*
* Returns member of set with a given mean and standard deviation
* mean: mean
* standard deviation: std_dev
*/
function createMemberInNormalDistribution(mean,std_dev){
return mean + (gaussRandom()*std_dev);
}
/*
* Returns random number in normal distribution centering on 0.
* ~95% of numbers returned should fall between -2 and 2
* ie within two standard deviations
*/
function gaussRandom() {
var u = 2*Math.random()-1;
var v = 2*Math.random()-1;
var r = u*u + v*v;
/*if outside interval [0,1] start over*/
if(r == 0 || r >= 1) return gaussRandom();
var c = Math.sqrt(-2*Math.log(r)/r);
return u*c;
/* todo: optimize this algorithm by caching (v*c)
* and returning next time gaussRandom() is called.
* left out for simplicity */
}
Where R1, R2 are random uniform numbers:
NORMAL DISTRIBUTION, with SD of 1:
sqrt(-2*log(R1))*cos(2*pi*R2)
This is exact... no need to do all those slow loops!
Reference: dspguide.com/ch2/6.htm
Use the central limit theorem wikipedia entry mathworld entry to your advantage.
Generate n uniformly distributed numbers, sum them, and subtract n*0.5; you then have the output of an approximately normal distribution with mean 0 and variance n/12, since each uniform number on [0,1] has variance 1/12 (see wikipedia on uniform distributions for that last one)
n=10 gives you something half decent fast. If you want something more than half decent go for tylers solution (as noted in the wikipedia entry on normal distributions)
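A minimal JavaScript sketch of this trick, choosing n = 12 so that the variance n/12 comes out as exactly 1 and no rescaling is needed. The function name and the rng parameter are made up for the example.

```javascript
// Approximate N(0, 1): the sum of 12 uniforms has mean 6 and variance 12/12 = 1.
function approxStandardNormal(rng = Math.random) {
  let sum = 0;
  for (let i = 0; i < 12; i++) sum += rng();
  return sum - 6;
}
```

Note that the result can never fall outside [-6, 6], so the tails are cut off compared with a true normal distribution.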
I would use Box-Muller. Two things about this:
You end up with two values per iteration
Typically, you cache one value and return the other. On the next call for a sample, you return the cached value.
Box-Muller gives a Z-score
You have to then scale the Z-score by the standard deviation and add the mean to get the full value in the normal distribution.
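Both points can be sketched in a few lines of JavaScript. This uses the basic (trigonometric) form of Box-Muller; makeNormalSampler and its rng parameter are names invented for the example.

```javascript
// Returns a function producing N(mean, stdDev) samples. Each Box-Muller
// round yields two independent z-scores; the second is cached for the next call.
function makeNormalSampler(mean, stdDev, rng = Math.random) {
  let cached = null;
  return function sample() {
    let z;
    if (cached !== null) {
      z = cached;
      cached = null;
    } else {
      const r = Math.sqrt(-2 * Math.log(1 - rng())); // 1 - rng() is in (0, 1]
      const theta = 2 * Math.PI * rng();
      z = r * Math.cos(theta);
      cached = r * Math.sin(theta);
    }
    return mean + stdDev * z; // scale the z-score, then shift by the mean
  };
}

const nextHeight = makeNormalSampler(170, 10); // e.g. mean 170, std dev 10
const h = nextHeight();
```

Every second call costs no logarithm, square root, or uniform draws at all.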
It seems incredible that I could add something to this after eight years, but for the case of Java I would like to point readers to the Random.nextGaussian() method, which generates a Gaussian distribution with mean 0.0 and standard deviation 1.0 for you.
A simple addition and/or multiplication will change the mean and standard deviation to your needs.
The standard Python library module random has what you want:
normalvariate(mu, sigma)
Normal distribution. mu is the mean, and sigma is the standard deviation.
For the algorithm itself, take a look at the function in random.py in the Python library.
The manual entry is here
This is a Matlab implementation using the polar form of the Box-Muller transformation:
Function randn_box_muller.m:
function [values] = randn_box_muller(n, mean, std_dev)
if nargin == 1
mean = 0;
std_dev = 1;
end
r = gaussRandomN(n);
values = r.*std_dev + mean;
end
function [values] = gaussRandomN(n)
[u, v, r] = gaussRandomNValid(n);
c = sqrt(-2*log(r)./r);
values = u.*c;
end
function [u, v, r] = gaussRandomNValid(n)
r = zeros(n, 1);
u = zeros(n, 1);
v = zeros(n, 1);
filter = r==0 | r>=1;
% if outside interval [0,1] start over
while n ~= 0
u(filter) = 2*rand(n, 1)-1;
v(filter) = 2*rand(n, 1)-1;
r(filter) = u(filter).*u(filter) + v(filter).*v(filter);
filter = r==0 | r>=1;
n = size(r(filter),1);
end
end
And invoking histfit(randn_box_muller(10000000),100); this is the result:
Obviously it is really inefficient compared with the Matlab built-in randn.
This is my JavaScript implementation of Algorithm P (Polar method for normal deviates) from Section 3.4.1 of Donald Knuth's book The Art of Computer Programming:
function normal_random(mean,stddev)
{
var V1
var V2
var S
do{
var U1 = Math.random() // return uniform distributed in [0,1[
var U2 = Math.random()
V1 = 2*U1-1
V2 = 2*U2-1
S = V1*V1+V2*V2
}while(S >= 1 || S === 0) // retry outside the unit circle or exactly at the origin
return mean+stddev*(V1*Math.sqrt(-2*Math.log(S)/S))
}
I think you should try this in EXCEL: =norminv(rand();0;1). This will produce random numbers which should be normally distributed with zero mean and unit variance. "0" can be replaced with any value, so that the numbers will have the desired mean, and by changing "1" you will get a variance equal to the square of your input.
For example: =norminv(rand();50;3) will yield normally distributed numbers with MEAN = 50 and VARIANCE = 9.
Q How can I convert a uniform distribution (as most random number generators produce, e.g. between 0.0 and 1.0) into a normal distribution?
For software implementation I know a couple of random generators which give you a pseudo-random uniform sequence in [0,1] (Mersenne Twister, Linear Congruential Generator). Let's call it U(x).
There is a mathematical field called probability theory.
First thing: if you want to model a random variable with cumulative distribution F, you can simply evaluate F^-1(U(x)). Probability theory proves that such a random variable has cumulative distribution F.
This step can be used to generate a random variable with distribution F without any iterative methods whenever F^-1 can be derived analytically without problems (e.g. the exponential distribution).
To model the normal distribution you can calculate y2*cos(y1), where y1 is uniform on [0, 2pi] and y2 follows the Rayleigh distribution.
Q: What if I want a mean and standard deviation of my choosing?
You can calculate sigma*N(0,1)+m.
It can be shown that such shifting and scaling lead to N(m,sigma)
I have the following code which may help:
set.seed(123)
n <- 1000
u <- runif(n) #creates U
x <- -log(u)
y <- runif(n, max=u*sqrt((2*exp(1))/pi)) #create Y
z <- ifelse (y < dnorm(x)/2, -x, NA)
z <- ifelse ((y > dnorm(x)/2) & (y < dnorm(x)), x, z)
z <- z[!is.na(z)]
It is also easier to use the built-in function rnorm(), since it is faster than writing your own random number generator for the normal distribution. See the following code as proof:
n <- length(z)
t0 <- Sys.time()
z <- rnorm(n)
t1 <- Sys.time()
t1-t0
function distRandom(){
do{
x=random(DISTRIBUTION_DOMAIN);
}while(random(DISTRIBUTION_RANGE)>=distributionFunction(x));
return x;
}

Looping through a formula that describes a spiral to generate XY coordinates

I'm trying to generate a spiral galaxy in the form of xy (2D) coordinates -- but math is not my strong suit.
I've gleaned the following from an excellent source on spirals:
The radius r(t) and the angle t are proportional for the simplest spiral, the spiral of Archimedes. Therefore the equation is:
(3) Polar equation: r(t) = at [a is constant].
From this follows
(2) Parameter form: x(t) = at cos(t), y(t) = at sin(t),
(1) Central equation: x²+y² = a²[arc tan (y/x)]².
This question sort of touched upon galaxy generation, but the responses were scattered and still overly complex for what I need (aka, my math-dumb mind can't understand them).
Essentially, what I need to do is loop through a spiral formula in PHP ~5000 times to generate points on a 513x513 XY grid. The size of the grid and the number of points needed may change in the future. Even better would be to weigh those points towards the origin of the spirals both in frequency and how far they can stray from the exact mathematical formula, similarly to how a galaxy actually looks.
This mathematical paper talks about a formula that describes the structure of spiral galaxies.
What completely loses me is how to translate a mathematical formula to something I can loop through in PHP!
// a is 5 here
function x($t){ return 5 * $t * cos($t); }
function y($t){ return 5 * $t * sin($t); }
for ($t = 0; $t < 50; $t += 0.01) {
$xyPoint = array(x($t), y($t));
// draw it
}
When you encounter parametric equations like this, it's common for the parameter variable to be t, which means time. So you can think of plugging increasing values of t into the functions and getting coordinates which gradually change as the elapsed time increases.
You'll need to choose your own values for a, the range of t, and the increment step size of t. It just depends on your requirements. Both cos() and sin() have a max value of 1, if that helps you figure out suitable values for a and t depending on your canvas size.
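One way to pick those values for the 513x513 grid from the question is sketched below, in JavaScript rather than PHP (the Math calls map one-to-one onto PHP's cos(), sin(), round(), min() and max()). Everything numeric here is an arbitrary assumption to be tuned: two opposite arms, two full turns per arm, squaring the random draw to crowd points toward the center, and a fixed jitter of 5% of the half-width. The function name and rng parameter are invented for the example.

```javascript
// Generate `count` integer grid points scattered along a two-arm
// Archimedean spiral r = a * t, centered in a size x size grid.
function spiralPoints(count, size, rng = Math.random) {
  const center = (size - 1) / 2;
  const tMax = 4 * Math.PI;   // two full turns per arm
  const a = center / tMax;    // the arm reaches the grid edge at t = tMax
  const jitter = 0.05 * center; // maximum stray distance per axis, in cells
  // shift to grid coordinates, round to a cell, and clamp to the grid
  const toCell = (v) => Math.min(size - 1, Math.max(0, Math.round(center + v)));
  const points = [];
  for (let i = 0; i < count; i++) {
    const t = tMax * rng() ** 2;            // squaring biases t toward 0
    const arm = rng() < 0.5 ? 0 : Math.PI;  // pick one of two opposite arms
    const x = a * t * Math.cos(t + arm) + (rng() - 0.5) * 2 * jitter;
    const y = a * t * Math.sin(t + arm) + (rng() - 0.5) * 2 * jitter;
    points.push([toCell(x), toCell(y)]);
  }
  return points;
}

const galaxy = spiralPoints(5000, 513);
```

Because the points are drawn at random values of t rather than in fixed steps, changing the grid size or the number of points only means changing the two arguments.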

How to compare two 64 bit numbers

In PHP I have a 64 bit number which represents tasks that must be completed. A second 64 bit number represents the tasks which have been completed:
$pack_code = 1001111100100000000000000011111101001111100100000000000000011111
$veri_code = 0000000000000000000000000001110000000000000000000000000000111110
I need to compare the two and provide a percentage of tasks completed figure. I could loop through both and find how many bits are set, but I don't know if this is the fastest way?
Assuming that these are actually strings, perhaps something like:
$pack_code = '1001111100100000000000000011111101001111100100000000000000011111';
$veri_code = '0000000000000000000000000001110000000000000000000000000000111110';
$matches = array_intersect_assoc(str_split($pack_code),str_split($veri_code));
$finished_matches = array_intersect($matches,array(1));
$percentage = (count($finished_matches) / 64) * 100;
Because you're getting the numbers as hex strings instead of ones and zeros, you'll need to do a bit of extra work.
PHP does not reliably support numbers over 32 bits as integers. 64-bit support requires being compiled and running on a 64-bit machine. This means that attempts to represent a 64-bit integer may fail depending on your environment. For this reason, it will be important to ensure that PHP only ever deals with these numbers as strings. This won't be hard, as hex strings coming out of the database will be, well, strings, not ints.
There are a few options here. The first would be using the GMP extension's gmp_xor function, which performs a bitwise-XOR operation on two numbers. The resulting number will have bits turned on when the two numbers have opposing bits in that location, and off when the two numbers have identical bits in that location. Then it's just a matter of counting the bits to get the remaining task count.
Another option would be transforming the number-as-a-string into a string of ones and zeros, as you've represented in your question. If you have GMP, you can use gmp_init to read it as a base-16 number, and use gmp_strval to return it as a base-2 number.
If you don't have GMP, this function provided in another answer (scroll to "Step 2") can accurately transform a string-as-number into anything between base-2 and 36. It will be slower than using GMP.
In both of these cases, you'd end up with a string of ones and zeros and can use code like that posted by @Mark Baker to get the difference.
Optimization in this case is not worth considering. I'm 100% sure that you don't really care whether your script will be generated 0.00000014 sec. faster, am I right?
Just loop through each bit of that number, compare it with another and you're done.
Remember words of Donald Knuth:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
This code utilizes the GNU Multi Precision library, which is supported by PHP, and since it is implemented in C, should be fast enough, and supports arbitrary precision.
$pack_code = gmp_init("1001111100100000000000000011111101001111100100000000000000011111", 2);
$veri_code = gmp_init("0000000000000000000000000001110000000000000000000000000000111110", 2);
$number_of_different_bits = gmp_popcount(gmp_xor($pack_code, $veri_code));
$a = 11111;
echo sprintf('%032b',$a)."\n";
$b = 12345;
echo sprintf('%032b',$b)."\n";
$c = $a & $b;
echo sprintf('%032b',$c)."\n";
$n=0;
while($c)
{
$n += $c & 1;
$c = $c >> 1;
}
echo $n."\n";
Output:
00000000000000000010101101100111
00000000000000000011000000111001
00000000000000000010000000100001
3
Given that your PHP setup can handle 64 bits, this can be easily extended.
If not, you can sidestep this restriction using GNU Multiple Precision.
You could also split up the hex representation and then operate on the corresponding parts instead, as you only need the local fact of 1 or 0 and not which number is actually represented! I think that would solve your problem best.
For example:
0xF1A35C and 0xD546C1
you just compare the binary version of F and D, 1 and 5, A and 4, ...
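A small sketch of that digit-by-digit idea, written in JavaScript (the same loop carries over to PHP with hexdec() and string offsets). It assumes both codes arrive as hex strings of equal length, and that "percentage complete" means the share of bits set in the first code that are also set in the second; the function names are invented for the example.

```javascript
// Number of set bits for each possible hex digit value 0..15.
const BITS_PER_NIBBLE = [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4];

// Percentage of required tasks (bits set in packHex) that are also set
// in veriHex. Both are hex strings of equal length without a "0x" prefix.
function percentComplete(packHex, veriHex) {
  let required = 0;
  let done = 0;
  for (let i = 0; i < packHex.length; i++) {
    const p = parseInt(packHex[i], 16);
    const v = parseInt(veriHex[i], 16);
    required += BITS_PER_NIBBLE[p];
    done += BITS_PER_NIBBLE[p & v];
  }
  if (required === 0) return 100; // nothing to do counts as complete
  return (100 * done) / required;
}

const pct = percentComplete("F1A35C", "D546C1");
```

No value ever exceeds 15, so the 32-bit versus 64-bit integer question never comes up.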
