Store many numbers as a single unique number - php

I have the necessity to store many numbers (i can decide which numbers) as a single unique number from which i should be able to retrieve the original number.
I already know 2 ways to do this:
1) Fundamental theorem of arithmetic (Prime Numbers)
Say i have 5 values, i assign a prime number other than 1 to each value
a = 2
b = 3
c = 5
d = 7
e = 13
If i want to store a, b and c i can multiply them 2*3*5=30 and i know no other product of primes can be 30. Then to check if a value contains, for example, b, all i need to do is 30 % b == 0
2) Bitmask
Just like Linux permissions, use powers of 2 and sum each value
But these 2 methods grow up fast (1st way faster than 2nd), and using prime numbers requires me to have a lot of primes.
Is there any other method to do this efficiently when you have, for example, a thousand values?

If you are storing, say, base 10 numbers, then do a conversion through base 11 numbers. With the increased base, you have an extra 'digit'. Use that digit as a separator. So, three base 10 numbers "10, 42, 457" become "10A42A457": a single base 11 number (with 'A' as the additional digit).
Whatever base your original numbers are in, increase the base by 1 and concatenate, using the extra digit as a separator. That will give you a single number in the increased base.
That single number can be stored in whatever number base you find convenient: binary, denary or hex for example.
To retrieve your original numbers just convert to base 11 (or whatever) and replace the extra digit with separators.
ETA: You don't have to use base 11. The single number "10A42A457" is also a valid hexadecimal number, so any base of 11 or above could be used. Hex may be easier to work with than base 11.

Is there any other method to do this efficiently when you have, for example, a thousand values?
I an not a mathematician but it's basic math, all depends on range
Range 0-1: You want to store 4 numbers 0-1 - it's basically binary system
Number1 + Number2 * 2^1 + Number3 * 2^2 + Number4 * 2^3
Range 0-50 You want to store 4 numbers 0-49
Number1 + Number2 * 50^1 + Number3 * 50^2 + Number4 * 50^3
Range 0-X You want to store N numbers 0-X
Number1 + Number2 * (X+1)^1 + Number3 * (X+1)^2 + ... + NumberN * (X+1)^(N-1)
If you have no pattern for your numbers (so it can get compressed in some way) there is really no other way.
It's also super easy for computer to resolve the number unlike the prime numbers
Predetermined values
#FlorainK comment pointed me to fact I missed
(i can decide which numbers)
The only logical solution is give your numbers references
0 is 15342
1 is 6547
2 is 76234
3 is "i like stack overflow"
4 is 42141
so you'll work range 0-4 (5 options) and whatever combination length. Use reference when "encoding" and "decoding" the number
a thousand values?
so you'll work with Range 0-999
0 is 62342
1 is 7456345653
2 is 45656234532
...
998 is 7623452
999 is 4324234326453
Let's say you use 64-bit system and programming/db language that works with 64-bit integers
2^64 = 18446744073709551616
your max range is 1000^X < 18446744073709551616 where X is number of numbers you can store in one single 64-bit integer number
Which is only 6.
You can store only 6 separate numbers 0-999 that will fit one 64-bit integer number.
0,0,0,0,0,0 is 0
1,0,0,0,0,0 is 1
0,1,0,0,0,0 is 1000
999,999,999,999,999,999 is ~1e+18

Ok so you want to store "a,b,c" or "a,b" or "a,b,c,d" or "a" etc. (thanks #FlorianK)
in such case just could use bitwise operators and powers of two
$a = 1 << 0; // 1
$b = 1 << 1; // 2
$c = 1 << 2; // 4
$d = 1 << 3; // 8
.. etc
let's say $flag has $a and $c
$flag = $a | $c; // $flag is integer here
now check it
$ok = ($flag & $a) && ($flag & $c); // true
$ok = ($flag & $a) && ($flag & $b); // false
so in 64 bit system/language/os you can use up to 64 flags which gives you a 2^64 combinations
there is no really other option. prime numbers are much worse for this as you skip many numbers in-between while binary system uses every single number.
I see you are using database and you want to store this in DB.
I really think we are dealing here with XY Problem and you should reconsider your application instead of making such workarounds.

Related

Dealing with big size binary numbers in php

Algorithm:
Given a number n, list(L(n)) of all binary numbers of size n can be calculated from L(n-1) in the following way:
Suppose array L(n-1) contains all the binary numbers of length n-1.
Reverse L(n-1) and call RL(n-1).
Append '0' to all binary numbers in L(n-1). Append '1' to all binary numbers in RL(n-1).
Merge new appended arrays L(n-1) and RL(n-1) to get all the binary numbers of size n.
Base Case,
if n=1, output = [0,1].
Example, if n=2, We can get list of all binary numbers of size 2 in the following way:
Let a = [0,1] be list of binary numbers of size (2-1) = 1.
Let b = reverse of a = [1,0].
Append 0 to all elements in a. New a = [00,11].
Append 1 to all elements in b. New b = [11,10].
Merge new a and b. [00,11,11,10].
Problem Statement: Given a number n, find list of n binary numbers.
Solution: A simple recursive or non-recursive solution works if n is less than 20.
Question: My code fails if a bigger number is passed lets say 40 and exceeds memory limits.
Why? - 'Coz For a number lets say 40, Total number of binary numbers will be power(2,40) which is huge(1048576 * 1048576).
So, Is there any better algorithm or way to solve the above problem?
What you can do is store only first n numbers. It seems that in every case you only need first n-1 numbers to only reverse and append with '1's for the correct output.

Two-way hashing of fixed range numbers

I need to create a function which takes a single integer as argument in the range 0-N and returns a seemingly random number in the same range.
Each input number should always have exactly one output and it should always be the same.
Such a function would produce something like this:
f(1) = 4
f(2) = 1
f(3) = 5
f(4) = 2
f(5) = 3
I believe this could be accomplished by some kind of a hashing algorithm? I don't need anything complex, just not something too simple like f(1) = 2, f(2) = 3 etc.
The biggest issue is that I need this to be reversible. E.g. the above table should be true left-to-right as well as right-to-left, using a different function for the right-to-left conversion is fine.
I know the easiest way is to create an array, shuffle it and just store the relations in a db or something, but as I need N to be quite large I'd like to avoid this if possible.
Edit: For my particular case N is a specific number, it's exactly 16777216 (64^4).
If the range is always a power of two -- like [0,16777216) -- then you can use exclusive-or just as #MarkBaker suggested. It just doesn't work so easily if your range is not a power of two.
You can use addition and subtraction modulo N, although these alone are too obvious, so you have to combine it with something else.
You can also do multiplication modulo-N, but reversing that is complicated. To make it simpler, we can isolate the bottom eight bits and multiply those and add them in a way that doesn't interfere with those bits so we can use them again to reverse the operation.
I don't know PHP so I'm going to give an example in C, instead. Maybe it's the same.
int enc(int x) {
x = x + 4799 * 256 * (x % 256);
x = x + 8896843;
x = x ^ 4777277;
return (x + 1073741824) % 16777216;
}
And to decode, play the operations back in reverse order:
int dec(int x) {
x = x + 1073741824;
x = x ^ 4777277;
x = x - 8896843;
x = x - 4799 * 256 * (x % 256);
return x % 16777216;
}
That 1073741824 must be a multiple of N, and 256 must be a factor of N, and if N is not a power of two then you can't (necessarily) use exclusive-or (^ is exclusive-or in C and I assume in PHP too). The other numbers you can fiddle with, and add and remove stages, at your leisure.
The addition of 1073741824 in both functions is to ensure that x stays positive; this is so that the modulo operation doesn't ever give a negative result, even after we've subtracted values from x which might have made it go negative in the interim.
I offered to describe how I "randomly" scramble up 9-digit SSNs when producing research data sets. This does not replace or hash an SSN. It re-orders the digits. It is difficult to put the digits back in the correct order if you don't know the order in which they were scrambled. I have a gut feeling that this is not what the questioner really wants. So, I am happy to delete this answer if it is deemed off-topic.
I know that I have 9 digits. So, I start with an array that has 9 index values in order:
$a = array(0,1,2,3,4,5,6,7,8);
Now, I need to turn a key that I can remember into a way to shuffle the array. The shuffling has to be the same order for the same key every time. I use a couple tricks. I use crc32 to turn a word into a number. I use srand/rand to get a predictable order of random values. Note: mt_rand no longer produces the same sequence of random digits with the same seed, so I have to use rand.
srand(crc32("My secret key"));
usort($a, function($a, $b) { return rand(-1,1); });
The array $a still has the digits 0 through 8, but they are shuffled. If I use the same keyword I will get the same shuffled order every time. That lets me repeat this every month and get the same result. Then, with a shuffled array, I can pick the digits off the SSN. First, I ensure it has 9 characters (some SSNs are sent as integers and a leading 0 is omitted). Then, I build a masked SSN by picking the digits using $a.
$ssn = str_pad($ssn, 9, '0', STR_PAD_LEFT);
$masked_ssn = '';
foreach($a as $i) $masked_ssn.= $ssn{$i};
$masked_ssn will now have all the digits in $ssn, but in a different order. Technically, there are keywords that make $a become the original ordered array after shuffling, but that is very very rare.
Hopefully this makes sense. If so, you can do it all much faster. If you turn the original string into an array of characters, you can shuffle the array of characters. You just need to reseed rand every time.
$ssn = "111223333"; // Assume I'm using a proper 9-digit SSN
$a = str_split($ssn);
srand(crc32("My secret key"));
usort($a, function($a, $b) { return rand(-1,1); });
$masked_ssn = implode('', $a);
This is not really faster in a runtime way because rand is a rather expensive function and you run rand a hell of lot more here. If you are masking thousands of values as I do, you will want to use an index array that is shuffled just once, not a shuffling for every value.
Now, how do I undo it? Assume I'm using the first method with the index array. It will be something like $a = {5, 3, 6, 1, 0, 2, 7, 8, 4}. Those are the indexes for the original SSN in the masked order. So, I can easily build the original SSN.
$ssn = '000000000'; // I like to define all 9 characters before I start
foreach($a as $i=>$j) $ssn[$j] = $masked_ssn{$i};
As you can see, $i counts from 0 to 8 across the masked SSN. $j counts 5, 3, 6... and puts each value from the masked SSN in the correct place in the original SSN.
Looks like you've got good answer, but still there is an alternative. Linear Congruential Generator (LCG) could provide 1-to-1 mapping and it is known to be a reversible using Euclid's algorithm. For 24bit
Xi = [(A * Xi-1) + C] Mod M
where M = 2^24 = 16,777,216
A = 16,598,013
C = 12,820,163
For LCG reversability take a look at Reversible pseudo-random sequence generator

Which will be more unique?

I need a 16 digit unique number. I can use the following 2 examples, Which will generate more unique numbers?
<?php>
$a=rand(1000000000000000,9999999999999999);
$b=rand(1000,9999).rand(1000,9999).rand(1000,9999).rand(1000,9999);
echo($a);
echo($b);
?>
The first will generate more random numbers simply because it will allow "0" in 3 additional spots that the second won't.
$a is more random for it can have all zeros except at the placement of preceding 1 - whereas $b can't.
But that's not my point
Your $b statement indicates that your random number can well be a string.
So why not expanding your range by padding zeros to make it still 16 digits?
$a = str_pad(rand(0,9999999999999999), 16, "0", STR_PAD_LEFT);
There are additional 1000000000000000 possibilities here.
$a=rand(1000000000000000,9999999999999999);
Creates one number in the range from 1000000000000000 to 9999999999999999 which makes a total of 8999999999999999 possible numbers.
$b=rand(1000,9999).rand(1000,9999).rand(1000,9999).rand(1000,9999);
Creates 4 numbers from 1000 to 9999 which makes a total of 8999 * 8999 * 8999 * 8999 = 6558084485964001 numbers.
First variation will produce about 37% more possible numbers compared with second solution.
This will create one random 16-digit number from 0000000000000000 to 9999999999999999:
$c = sprintf('%016d',rand(0,9999999999999999));

PHP: precision in mathematical operations [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 5 years ago.
$a = '35';
$b = '-34.99';
echo ($a + $b);
Results in 0.009999999999998
What is up with that? I wondered why my program kept reporting odd results.
Why doesn't PHP return the expected 0.01?
Because floating point arithmetic != real number arithmetic. An illustration of the difference due to imprecision is, for some floats a and b, (a+b)-b != a. This applies to any language using floats.
Since floating point are binary numbers with finite precision, there's a finite amount of representable numbers, which leads accuracy problems and surprises like this. Here's another interesting read: What Every Computer Scientist Should Know About Floating-Point Arithmetic.
Back to your problem, basically there is no way to accurately represent 34.99 or 0.01 in binary (just like in decimal, 1/3 = 0.3333...), so approximations are used instead. To get around the problem, you can:
Use round($result, 2) on the result to round it to 2 decimal places.
Use integers. If that's currency, say US dollars, then store $35.00 as 3500 and $34.99 as 3499, then divide the result by 100.
It's a pity that PHP doesn't have a decimal datatype like other languages do.
Floating point numbers, like all numbers, must be stored in memory as a string of 0's and 1's. It's all bits to the computer. How floating point differs from integer is in how we interpret the 0's and 1's when we want to look at them.
One bit is the "sign" (0 = positive, 1 = negative), 8 bits are the exponent (ranging from -128 to +127), 23 bits are the number known as the "mantissa" (fraction). So the binary representation of (S1)(P8)(M23) has the value (-1^S)M*2^P
The "mantissa" takes on a special form. In normal scientific notation we display the "one's place" along with the fraction. For instance:
4.39 x 10^2 = 439
In binary the "one's place" is a single bit. Since we ignore all the left-most 0's in scientific notation (we ignore any insignificant figures) the first bit is guaranteed to be a 1
1.101 x 2^3 = 1101 = 13
Since we are guaranteed that the first bit will be a 1, we remove this bit when storing the number to save space. So the above number is stored as just 101 (for the mantissa). The leading 1 is assumed
As an example, let's take the binary string
00000010010110000000000000000000
Breaking it into it's components:
Sign Power Mantissa
0 00000100 10110000000000000000000
+ +4 1.1011
+ +4 1 + .5 + .125 + .0625
+ +4 1.6875
Applying our simple formula:
(-1^S)M*2^P
(-1^0)(1.6875)*2^(+4)
(1)(1.6875)*(16)
27
In other words, 00000010010110000000000000000000 is 27 in floating point (according to IEEE-754 standards).
For many numbers there is no exact binary representation, however. Much like how 1/3 = 0.333.... repeating forever, 1/100 is 0.00000010100011110101110000..... with a repeating "10100011110101110000". A 32-bit computer can't store the entire number in floating point, however. So it makes its best guess.
0.0000001010001111010111000010100011110101110000
Sign Power Mantissa
+ -7 1.01000111101011100001010
0 -00000111 01000111101011100001010
0 11111001 01000111101011100001010
01111100101000111101011100001010
(note that negative 7 is produced using 2's complement)
It should be immediately clear that 01111100101000111101011100001010 looks nothing like 0.01
More importantly, however, this contains a truncated version of a repeating decimal. The original decimal contained a repeating "10100011110101110000". We've simplified this to 01000111101011100001010
Translating this floating point number back into decimal via our formula we get 0.0099999979 (note that this is for a 32-bit computer. A 64-bit computer would have much more accuracy)
A Decimal Equivalent
If it helps to understand the problem better, let's look decimal scientific notation when dealing with repeating decimals.
Let's assume that we have 10 "boxes" to store digits. Therefore if we wanted to store a number like 1/16 we would write:
+---+---+---+---+---+---+---+---+---+---+
| + | 6 | . | 2 | 5 | 0 | 0 | e | - | 2 |
+---+---+---+---+---+---+---+---+---+---+
Which is clearly just 6.25 e -2, where e is shorthand for *10^(. We've allocated 4 boxes for the decimal even though we only needed 2 (padding with zeroes), and we've allocated 2 boxes for signs (one for the sign of the number, one of the sign of the exponent)
Using 10 boxes like this we can display numbers ranging from -9.9999 e -9 to +9.9999 e +9
This works fine for anything with 4 or fewer decimal places, but what happens when we try to store a number like 2/3?
+---+---+---+---+---+---+---+---+---+---+
| + | 6 | . | 6 | 6 | 6 | 7 | e | - | 1 |
+---+---+---+---+---+---+---+---+---+---+
This new number 0.66667 does not exactly equal 2/3. In fact, it's off by 0.000003333.... If we were to try and write 0.66667 in base 3, we would get 0.2000000000012... instead of 0.2
This problem may become more apparent if we take something with a larger repeating decimal, like 1/7. This has 6 repeating digits: 0.142857142857...
Storing this into our decimal computer we can only show 5 of these digits:
+---+---+---+---+---+---+---+---+---+---+
| + | 1 | . | 4 | 2 | 8 | 6 | e | - | 1 |
+---+---+---+---+---+---+---+---+---+---+
This number, 0.14286, is off by .000002857...
It's "close to correct", but it's not exactly correct, and so if we tried to write this number in base 7 we would get some hideous number instead of 0.1. In fact, plugging this into Wolfram Alpha we get: .10000022320335...
These minor fractional differences should look familiar to your 0.0099999979 (as opposed to 0.01)
There's plenty of answers here about why floating point numbers work the way they do...
But there's little talk of arbitrary precision (Pickle mentioned it). If you want (or need) exact precision, the only way to do it (for rational numbers at least) is to use the BC Math extension (which is really just a BigNum, Arbitrary Precision implementation...
To add two numbers:
$number = '12345678901234.1234567890';
$number2 = '1';
echo bcadd($number, $number2);
will result in 12345678901235.1234567890...
This is called arbitrary precision math. Basically all numbers are strings which are parsed for every operation and operations are performed on a digit by digit basis (think long division, but done by the library). So that means it's quite slow (in comparison to regular math constructs). But it's very powerful. You can multiply, add, subtract, divide, find modulo and exponentiate any number that has an exact string representation.
So you can't do 1/3 with 100% accuracy, since it has a repeating decimal (and hence isn't rational).
But, if you want to know what 1500.0015 squared is:
Using 32 bit floats (double precision) gives the estimated result of:
2250004.5000023
But bcmath gives the exact answer of:
2250004.50000225
It all depends on the precision you need.
Also, something else to note here. PHP can only represent either 32 bit or 64 bit integers (depending on your install). So if an integer exceeds the size of the native int type (2.1 billion for 32bit, 9.2 x10^18, or 9.2 billion billion for signed ints), PHP will convert the int into a float. While that's not immediately a problem (Since all ints smaller than the precision of the system's float are by definition directly representable as floats), if you try multiplying two together, it'll lose significant precision.
For example, given $n = '40000000002':
As a number, $n will be float(40000000002), which is fine since it's exactly represented. But if we square it, we get: float(1.60000000016E+21)
As a string (using BC math), $n will be exactly '40000000002'. And if we square it, we get: string(22) "1600000000160000000004"...
So if you need the precision with large numbers, or rational decimal points, you might want to look into bcmath...
bcadd() might be useful here.
<?PHP
$a = '35';
$b = '-34.99';
echo $a + $b;
echo '<br />';
echo bcadd($a,$b,2);
?>
(inefficient output for clarity)
First line gives me 0.009999999999998.
Second gives me 0.01
Because 0.01 can't be represented exactly as sum of series of binary fractions. And that is how floats are stored in memory.
I guess it is not what you want to hear, but it is answer to question. For how to fix see other answers.
Every number will be save in computer by binary value such as 0, 1. In Single-precision numbers occupy 32 bits.
The floating point number can be presented by: 1 bit for sign, 8 bit for exponent and 23 bit called mantissa (fraction).
Look the example below:
0.15625 = 0.00101 = 1.01*2^(-3)
sign: 0 mean positive number, 1 mean negative number, in this case it is 0.
exponent: 01111100 = 127 - 3 = 124.
Note: the bias = 127 so biased exponent = −3 + the "bias". In single precision, the bias is ,127, so in this example the biased exponent is 124;
At fraction part, we have: 1.01 mean: 0*2^-1 + 1*2^-2
Number 1 (first position of 1.01) do not need to save because when present the floating number in this way the first number always be 1.
For example convert: 0.11 => 1.1*2^(-1), 0.01 => 1*2^(-2).
Another example show always remove the first zero: 0.1 will be presented 1*2^(-1). So the first alwasy be 1.
The present number of 1*2^(-1) will be:
0: positive number
127-1 = 126 = 01111110
fraction: 00000000000000000000000 (23 number)
Finally: The raw binary is:
0 01111110 00000000000000000000000
Check it here: http://www.binaryconvert.com/result_float.html?decimal=048046053
Now if you already understand how a floating point number are saved. What happen if the number cannot save in 32 bit (simple precision).
For example: in decimal. 1/3 = 0.3333333333333333333333 and because it is infinite I suppose we have 5 bit to save data. Repeat again this is not real. just suppose. So the data saved in computer will be:
0.33333.
Now when the number loaded the computer calculate again:
0.33333 = 3*10^-1 + 3*10^-2 + 3*10^-3 + 3*10^-4 + 3*10^-5.
About this:
$a = '35';
$b = '-34.99';
echo ($a + $b);
The result is 0.01 ( decimal). Now let show this number in binary.
0.01 (decimal) = 0 10001111 01011100001010001111 (01011100001010001111)*(binary)
Check here: http://www.binaryconvert.com/result_double.html?decimal=048046048049
Because (01011100001010001111) is repeat just like 1/3. So computer cannot save this number in their memory. It must sacrifice. This lead not accuracy in computer.
Advanced
( You must have knowledge about mathematics )
So why we can easily show 0.01 in decimal but not in binary.
Suppose the fraction in binary of 0.01 (decimal) is finite.
So 0.01 = 2^x + 2^y... 2^-z
0.01 * (2^(x+y+...z)) = (2^x + 2^y... 2^z)*(2^(x+y+...z)). This expression is true when (2^(x+y+...z)) = 100*x1. There are not integer n = x+y+...+z exists.
=> So 0.01 (decimal) must be infine in binary.
Use PHP's round() function: http://php.net/manual/en/function.round.php
This answer solves problem, but not explains why. I thought that it is obvious [I am also programming in C++, so it IS obvious for me ;]], but if not, let's say that PHP has it's own calculating precision and in that particular situation it returned most complying information regarding that calculation.
wouldn't it be easier to use number_format(0.009999999999998, 2) or $res = $a+$b; -> number_format($res, 2);?

Five unique, random numbers from a subset

I know similar questions come up a lot and there's probably no definitive answer, but I want to generate five unique random numbers from a subset of numbers that is potentially infinite (maybe 0-20, or 0-1,000,000).
The only catch is that I don't want to have to run while loops or fill an array.
My current method is to simply generate five random numbers from a subset minus the last five numbers. If any of the numbers match each other, then they go to their respective place at the end of the subset. So if the fourth number matches any other number, it will bet set to the 4th from the last number.
Does anyone have a method that is "random enough" and doesn't involve costly loops or arrays?
Please keep in mind this a curiosity, not some mission-critical problem. I would appreciate it if everyone didn't post "why are you having this problem?" answers. I am just looking for ideas.
Thanks a lot!
One random number call is enough.
If you want to choose a subset of 5 unique numbers in range 1-n, then select a random number in 1 to (n choose r).
Keep a 1-1 mapping from 1 to (n choose r) to the set of possible 5 element subsets, and you are done. This mapping is standard and can be found on the web, for instance here: http://msdn.microsoft.com/en-us/library/aa289166%28VS.71%29.aspx
As an example:
Consider the problem of generating a subset of two numbers from five numbers:
The possible 2 element subset of {1,..., 5} are
1. {1,2}
2. {1,3}
3. {1,4}
4. {1,5}
5. {2,3}
6. {2,4}
7. {2,5}
8. {3,4}
9. {3,5}
10. {4,5}
Now 5 choose 2 is 10.
So we select a random number from 1 to 10. Say we got 8. Now we generate the 8th element in the sequence above: which gives {3,4}, so the two numbers you want are 3 and 4.
The msdn page I linked to, shows you a method to generate the set, given the number. i.e. given 8, it gives back the set {3,4}.
Your best option is a loop, as in:
$max = 20;
$numels = 5;
$vals = array();
while (count($vals) < $numels) {
$cur = rand(0, $max);
if (!in_array($cur, $vals))
$vals[] = $cur;
}
For small ranges, you can use array_rand:
$max = 20;
$numels = 5;
$range = range(0, $max);
$vals = array_rand($range, $numels);
You could also generate a number between 0 and max, another between 0 and max-1, ... between 0 and max-4. Then you would sum x to the n-th generated number where x is the number calculated in this fashion:
Take the number generated in the n-th iteration and assign it to x
if it's larger or equal to that generated in the first iteration, increment it
if this new number is larger or equal to that generated (and corrected) in the second iteration, increment it
...
if this new number is larger or equal to that generated (and corrected) in the (n-1)-th iteration increment it
The mapping is like this:
1 2 3 4 5 6 7 8 9 (take 4)
1 2 3 4 5 6 7 8 9 (gives 4)
1 2 3 4 5 6 7 8 (take 5)
1 2 3 5 6 7 8 9 (gives 6)
1 2 3 4 5 6 7 (take 6)
1 2 3 5 7 8 9 (gives 8)
1 2 3 4 5 6 (take 5)
1 2 3 5 7 9 (gives 7)
example, last extraction:
x = 5
x >= 4? x == 6
x >= 6? x == 7
x >= 8? x == 7
The general form of this question is really interesting. Should one select from a pool of elements (and remove them from the pool) or should one loop "while hitting" an already taken element?
As far as I can tell, the python library implementation for random.sample chooses at runtime between the two methods depending on the proportion of the size of the input list and the number of elements to select.
A comment from the source code:
# When the number of selections is small compared to the
# population, then tracking selections is efficient, requiring
# only a small set and an occasional reselection. For
# a larger number of selections, the pool tracking method is
# preferred since the list takes less space than the
# set and it doesn't suffer from frequent reselections.
In the specific instance that the OP mentions however (selecting 5 numbers), I think that looping "while hitting a taken number" is ok, unless the pseudo random generator is broken.
Since you are just looking for different ideas here's one:
Call out to Random.org to generate the set of random numbers you need.
If you know the size N then keep each number with probability 5/N generate a random number between 0 and 1 and if it is less than 5/N keep the item. Stop when we have 5 items.
If we don't know N use resorvoir sampling.
An implementation of Artefacto's second solution above in C#, as a helper and an extension method on ICollection:
static class Program {
public static IEnumerable<int> Subset(int max) {
Random random = new Random();
List<int> selections = new List<int>();
for (int space = max; space > 0; space--) {
int selection = random.Next(space);
int offset = selections.TakeWhile((n, i) => n <= selection + i).Count();
selections.Insert(offset, selection + offset);
yield return selection + offset;
}
}
public static IEnumerable<T> Random<T>(this ICollection<T> collection) {
return Subset(collection.Count).Select(collection.ElementAt);
}
static void Main(string[] args) {
Subset(10000).Take(10).ToList().ForEach(Console.WriteLine);
"abcdefghijklmnopqrstuvwxyz".ToArray().Random().Take(5).ToList().ForEach(Console.WriteLine);
}
}
I know we are trying to avoid loops, but just in case this helps someone, you can use a HashSet instead of a List. This is very efficient on a sparse collection where collisions are somewhat rare.
var hs = new HashSet<int>();
var rand = new Random();
for(int i=0; i<10000; i++)
{
int n;
while(true)
{
n = rand.Next(0, 10000000);
if(!hs.Contains(n)) {break;}
}
hs.Add(n);
}

Categories