Further to my question here, I'll be using the random_compat polyfill (which uses /dev/urandom) to generate random numbers in the 1 to 10,000,000 range.
I do realise that, all things being correct with how I code my project, the above tools should produce good (as in random/secure etc.) data. However, I'd like to add extra sources of randomness into the mix, just in case six months down the line I read there is a patch available for my specific OS version to fix a major bug in /dev/urandom (or some other issue).
So I was thinking I could get numbers from random.org and fourmilab.ch/hotbits.
An alternative source would be logs from a web site I operate, timed to the microsecond. If I ignore the date/time part and just take the microseconds, the data has in effect been generated by when humans decide to click on a link. I know this may be classed as haphazard rather than random, but would it be good for my use?
Edit re timestamp logs - will use PHP microtime() which will create a log like:
0.**832742**00 1438282477
0.**57241**000 1438282483
0.**437752**00 1438282538
0.**622097**00 1438282572
I will just use the bolded portion.
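Something like this would pull out just that portion (a rough sketch):
<?php
// Take microtime() output like "0.83274200 1438282477" and keep
// only the sub-second digits highlighted above.
list($usec, $sec) = explode(' ', microtime());
$entropy = substr($usec, 2, 6); // e.g. "832742"
?>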
So let's say I take two sources of extra random numbers, A and B, and the output of /dev/urandom, call that U and set ranges as follows:
A and B are 1 - 500,000
U is 1 - 9,000,000
Final random number is A+B+U
I will be needing several million final numbers between 1 and 10,000,000
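In code, the combination I have in mind would look something like this ($poolA and $poolB are placeholders for the pre-built pools of extra-source numbers):
<?php
// Sketch of the proposed scheme: A and B come from the extra pools
// (1 - 500,000 each), U comes from random_int (1 - 9,000,000).
$A = $poolA[array_rand($poolA)];
$B = $poolB[array_rand($poolB)];
$U = random_int(1, 9000000); // provided by random_compat on PHP 5
$final = $A + $B + $U;       // note: the minimum possible sum is 3, not 1
?>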
The pools of A and B numbers will each only contain a few thousand entries, but I think that by using prime-numbered pool sizes I can stretch that into millions of A&B combinations, like so:
// This pool will be integers from two sources and contain a larger prime
// number of members instead of the 7 & 11 here - this sequence repeats at 7 * 11 = 77
$numbers = array("One","Two","Three","Four","Five","Six","Seven");
$colors = array("Silver","Gray","Black","Red","Maroon","Yellow","Olive","Lime","Green","Aqua","Orange");
$ni = 0;
$ci = 0;
for ($i = 0; $i < $num_numbers_required; $i++) {
    // with integer pools this would be addition ($numbers[$ni] + $colors[$ci]);
    // the string placeholders are concatenated here just so the example runs
    $offset = $numbers[$ni] . $colors[$ci];
    $ni = ($ni + 1) % 7;  // wrap at prime 7
    $ci = ($ci + 1) % 11; // wrap at prime 11
}
Does this plan make sense - is there any possibility I can actually make my end result less secure by doing all this? And what of my idea to use timestamp data?
Thanks in advance.
I would suggest reading RFC4086, section 5. Basically it talks about how to "mix" different entropy sources without compromising security or introducing bias.
In short, you need a "mixing function". You can do this with xor, where you simply set the result to the xor of the inputs: result = A xor B.
The problem with xor is that if the numbers are correlated in any way, it can introduce strong bias into the result. For example, if bits 1-4 of A and B are the current timestamp, then the result's first 4 bits will always be 0.
Instead, you can use a stronger mixing function based on a cryptographic hash function. So instead of A xor B you can do HMAC-SHA256(A, B). This is slower, but also prevents any correlation from biasing the result.
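As a minimal sketch (this is not RandomLib itself), a mixer along those lines might look like:
<?php
// Mix two entropy inputs with HMAC-SHA256 so that correlation
// between them cannot bias the output. $a and $b are assumed to
// be raw binary strings from independent sources.
function mix_entropy($a, $b) {
    // one input is the HMAC key, the other the message
    return hash_hmac('sha256', $b, $a, true); // 32 raw bytes
}
?>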
This is the strategy that I used in RandomLib. I did this because not every system has every method of generation. So I pull as many methods as I can, and mix them strongly. That way the result is never weaker than the strongest method.
HOWEVER, I would ask why. If /dev/urandom is available, you're not going to get better than it. The reason is simple: even if you call random.org for more entropy, your call is encrypted using random keys generated from /dev/urandom. That means that if an attacker can compromise /dev/urandom, your server is toast, and you will be spinning your wheels trying to make it better.
Instead, simply use /dev/urandom and keep your OS updated...
I'm building a system where I need to assign users access to a specific (individual) number of assets. These assets potentially number in the tens of thousands.
I thought to do it with bitwise comparisons, so something like storing the value of 3 if a user has access to assets 1 and 2, a value of 7 for access to 1, 2 and 3, etc. etc.
The access is not necessarily sequential, so a user could easily have access to assets 10, 12 and 24324.
I quickly ran into a problem using bits where the server wouldn't pick up assets beyond the 63rd bit, so obviously I've either misunderstood something, or bits is a dumb way to store this kind of info.
My code, running on a 64-bit Linux system, is this (just for testing purposes obviously, to discover just such limitations as this):
<?php
if (isset($_GET['bitwise'])) { // check before reading to avoid an undefined-index notice
    $bitwise = $_GET['bitwise'];
    echo "<br/>bitwise input: ";
    echo $bitwise;
    $bitcount = 0;
    for ($i = 1; $i <= $bitwise; $i *= 2) {
        if (($i & $bitwise) > 0) {
            $bitcount++;
            echo "<br/>{$bitcount}: " . $i . " is in " . $bitwise;
        }
    }
}
?>
And I input test values via the querystring. However, no matter what value I input, the maximum count I can get to is 63.
So, my question is: Is this simply because I'm using bitwise comparisons for something they're not ideal for (my theory), or is my implementation of it just wrong?
My next go-to solution would be to store the "bits" in arrays, so if someone has access to assets 1, 2 and 3 I'll store their list as [1,2,3]. It's unlikely that someone has access to more than, say, a hundred specific assets. Is this a reasonable way to do it? I realize this puts the question somewhat into discussion-worthy territory, but hopefully it's still specific enough.
The paramount concern is, of course, performance if the server has to serve a large number of clients at the same time.
(please excuse wrong terminology where applicable, hopefully my meaning is clear).
This is standard behavior: on 64-bit builds of PHP, integers are 64 bits, and one of those is the sign bit, leaving 63 usable flags. While it warms my secret grey-beard heart, if you have more than 63 distinct assets, a bitwise solution is the wrong one for access control.
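A quick way to see the limit for yourself:
<?php
// On a 64-bit build, PHP_INT_MAX is 2^63 - 1: bit 63 is the sign
// bit, so only bits 0-62 are usable as positive flags.
echo PHP_INT_SIZE . "\n";   // 8 (bytes)
echo PHP_INT_MAX . "\n";    // 9223372036854775807
echo (1 << 62) . "\n";      // still a positive integer
var_dump((1 << 62) * 2);    // overflows to float
?>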
Two other things worth mentioning.
First, doing this for performance reasons is probably a premature optimization for a web application. ACL lookups aren't going to be the bottleneck in your system for a long time, if at all. Also, it's not clear if bitwise operators offer that much PHP performance benefit, given the language's dynamically typed nature.
Second, the reason you're limited to 63 bits is that PHP uses two's complement for its signed integers: the most significant bit represents the sign, so it is not available as a flag. I asked this question about the bitwise NOT a while back, which is why this question caught my eye.
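If you go the array route from the question, a minimal sketch might look like this (where the data comes from is up to you; everything here is hypothetical):
<?php
// Store asset IDs as a plain set instead of a bitmask; this works
// for arbitrarily large asset IDs, unlike a 64-bit integer.
$userAssets = array(10, 12, 24324); // e.g. loaded from the user's row

// O(n) check; fine for lists of ~100 assets
$canSee = in_array(24324, $userAssets, true);

// or flip once for O(1) membership tests on bigger lists
$lookup = array_flip($userAssets);
$canSee = isset($lookup[24324]);
?>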
I'm making a little web application which needs to randomize stuff.
Just a little example of what it's going to have: it returns a random number between 1 and 10 to the user.
I'm planning to do it using JavaScript and jQuery (the calculator itself).
My question is: how can I make its functions truly random and not pseudo-random? Are the PHP functions perhaps more random than the JavaScript ones?
For the sake of the question, let's say what I want is a true random number between X and Y.
No function that you call will be "truly random". They are all PRNGs; most PRNGs are, in fact, quite good, and I'd be surprised if they were inadequate for your application. Also, while a few PRNGs are notorious for their small period (Java's Random comes to mind), every modern language that I know of (including JavaScript, PHP, and, with its crypto packages, Java) has very good PRNGs.
The best way to collect "more random" data is to obtain it from an external random source. One possibility is to ask the user to move the mouse around in a window for a certain period of time, collecting the mouse coordinates at regular intervals. Some military, banking, and other high-security systems use hardware like thermal noise sensors to obtain data that is as close to random as one can hope; however, such hardware support is not going to be available to a web app.
Note that hacks like using the system clock are no better than PRNGs; most PRNGs initialize their seed with a combination of such data.
You have not understood the Matrix movies. ;) A function in one language is not "more random" than one in another; all of these functions are equally pseudo-random. "Pseudo" means that the computer cannot, by definition, pull a random number out of thin air. It just can't. A computer computes strictly, accurately, by rules. There's no randomness anywhere in the system. For all its supposed power, randomness is the one thing a computer simply cannot do (that, and making coffee).
For true randomness, you need an outside, natural source. Like measuring atomic decay, or some such thing which is not predictable and truly random. Anything else is just pseudo-randomness, which may be of varying quality.
Good PRNGs try to collect "outside interference" in an entropy pool; e.g. Linux's /dev/random takes into account system driver "noise", which may come from "random" packets hitting the ethernet port or from the user's mouse movements. How truly random that is is debatable, but at the very least it is extremely hard to predict, and suitably random for most purposes.
I don't think there's any way to remove the deterministic aspect of randomness from a program completely. You can do all you want to minimize, mitigate, and obfuscate whatever process you're using to "pull numbers from a hat", but you can never truly make it perfectly random.
You can fashion out a process with sufficient detail to make it practically random, but true random may be impossible.
While you can't achieve true randomness with your code in PHP, you can use the random.org API. You could connect through cURL in PHP or through AJAX in JavaScript.
They use atmospheric noise as the randomness source, as far as I know.
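A rough sketch of that approach with cURL (the URL parameters follow random.org's documented integers API, but double-check them before relying on this):
<?php
// Fetch 10 integers in [1, 100] from random.org in plain-text format.
$url = 'https://www.random.org/integers/?num=10&min=1&max=100'
     . '&col=1&base=10&format=plain&rnd=new';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

$numbers = array_map('intval', explode("\n", trim($response)));
print_r($numbers);
?>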
It is not possible to generate truly random variables on a computer. However, you can improve standard generators. Suppose you have two basic generators. You create a table and fill it with values from the first generator. Then, when you want a number, the second generator produces an index and you return the corresponding value from the table. That value is then replaced with a new one. I forget what this generator is called (I believe it's the Bays-Durham shuffle). Hope it helps.
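A minimal PHP sketch of that table idea, assuming two rand/mt_rand-style generators as the sources:
<?php
// One generator fills a table; the other picks which slot to read.
// Each value read is replaced with a fresh one.
const TABLE_SIZE = 97;

$table = array();
for ($i = 0; $i < TABLE_SIZE; $i++) {
    $table[$i] = mt_rand(); // first generator fills the table
}

function shuffled_rand(array &$table) {
    $j = rand(0, TABLE_SIZE - 1); // second generator picks the index
    $value = $table[$j];
    $table[$j] = mt_rand();       // replace the value just used
    return $value;
}

echo shuffled_rand($table);
?>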
My suggestion is that you generate a binary random string by encrypting the local time and date with an encryption algorithm. In this case, try to gather all possible sources of "random" data and load them both as the input message and as the input key.
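For illustration only, a sketch of that idea (treat it as a curiosity, not vetted cryptography; the data sources chosen here are arbitrary):
<?php
// Gather "random-ish" local data, then encrypt it under a key
// derived the same way, yielding a binary string.
$message = microtime() . getmypid() . memory_get_usage();
$key     = hash('sha256', uniqid('', true), true);        // 32-byte key
$iv      = substr(hash('sha256', $message, true), 0, 16); // 16-byte IV

$blob = openssl_encrypt($message, 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
?>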
As you have seen from the previous answers above, your use of the random data, and your requirements for it, are what matter. A number is "random" if it is difficult or impossible for your application to guess its value in advance. Note that the same source may be considered random for some applications and not random for others. Evidently, you will run into serious problems if you need high-quality random numbers for a demanding application.
In an attempt to prevent memory corruption in a long-running pre-rendering script, I want to be able to say to my program "okay, render the first 1,000 steps". Then I can take a look at the output, inspect it, etc. Then I want to say "now generate steps 1,001 through 10,000".
I have it working almost perfectly. There's just one thing I'm struggling with.
The rendering script uses rand() to add entropy to the generated output, with srand() at the start to ensure it remains constant in re-renders. Currently I "solve" this issue by counting how many times rand() is called, and then calling it that many times before starting the actual generation. The problem with this is that it can be very slow, especially when I've generated a few million random numbers.
Is there any way to determine what value I need to pass to srand() to continue generating the sequence? Is this even possible?
If not, is there any way to find out what exact algorithm rand() is using? I really like the map I'm getting from srand(43) and would like to keep it if possible!
EDIT: Using Patashu's answer, here's what I've come up with:
function rnd() {
    static $seed = 42;
    $seed = $seed * 214013 + 2531011;
    $seed = fmod($seed, pow(2, 32));     // reduce mod 2^32; fmod is float-safe and fast
    $rs = floor($seed / 65536) & 0x7fff; // keep bits 16..30, as MSVC's rand() does
    return floor(2 * $rs / 0x8000);
}
It relies on the use of floats because, as far as I can tell, the 53 bits of precision in the mantissa are easily enough to store these numbers exactly, whereas integers get truncated or wrap around when bit operators are used.
This does not directly answer your question, but since it looks like you don't need particularly "good" random numbers, why not look into writing your own pseudo-random-number generator? This way you can easily serialize and deserialize its state whenever you need.
Something like the algorithm at http://en.wikipedia.org/wiki/Random_number_generation#Computational_methods may even do the trick. You can seed it with the current time whenever you start a new sequence.
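For instance, a minimal LCG with explicit state you can save and restore between batches (the constants are the common MSVC ones, purely as an illustration):
<?php
class Lcg {
    private $state;
    public function __construct($seed) { $this->state = $seed; }
    public function next() {
        // classic LCG step, reduced mod 2^32 via float-safe fmod
        $this->state = fmod($this->state * 214013 + 2531011, pow(2, 32));
        return ((int) floor($this->state / 65536)) & 0x7fff;
    }
    public function getState() { return $this->state; } // persist this
    public function setState($s) { $this->state = $s; } // resume later
}

$rng = new Lcg(43);
$rng->next();
$saved = $rng->getState(); // store between render batches
?>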
This page here http://cod.ifies.com/2008/05/php-rand01-on-windows-openssl-rand-on.html explains the source code for PHP's rand()
Be warned, it's not very pretty and it's dependent on quirks of PHP and your OS implementation of rand() :)
In nearly any programming language, if I do $number = rand(1,100) then I have created a flat probability, in which each number has a 1% chance of coming up.
What if I'm trying to model something weird, like launching rockets into space, so I want a curved (or angled) probability chart? I don't want a "stepped" chart, though. (Important: I'm not a math nerd, so there are probably terms or concepts that I'm completely skipping or ignorant of!) An angled chart is fine, though.
So, say I wanted a probability that gave results of 1 through 100, where 1 would be the most common result, 2 the next most common, and so on in a straight line until a certain point - let's say 50. There the chart angles, so the probability of rolling 51 is less than that of rolling 49. Then it angles again at 75, so the probability of getting a result above 75 is not simply 25%, but some considerably smaller number - depending on the chart, perhaps only 10% or 5%.
Does this question make any sense? I'd specifically like to see how this can be done in PHP, but I wager the required logic will be rather portable.
The short answers to your questions are, yes this makes sense, and yes it is possible.
The technical term for what you're talking about is a probability density function. Intuitively, it's just what it sounds like: It is a function that tells you, if you draw random samples, how densely those samples will cluster (and what those clusters look like.) What you identify as a "flat" function is also called a uniform density; another very common one often built into standard libraries is a "normal" or Gaussian distribution. You've seen it, it's also called a bell curve distribution.
But subject to some limitations, you can have any distribution you like, and it's relatively straightforward to build one from the other.
That's the good news. The bad news is that it's math nerd territory. The ideas behind probability density functions are pretty intuitive and easy to understand, but the full power of working with them is only unlocked with a little bit of calculus. For instance, one of the limitations on your function is that the total probability has to be unity, which is the same as saying that the area under your curve needs to be exactly one. In the exact case you describe, the function is all straight lines, so you don't strictly need calculus to help you with that constraint... but in the general case, you really do.
Two good terms to look for are "Transformation methods" (there are several) and "rejection sampling." The basic idea behind rejection sampling is that you have a function you can use (in this case, your uniform distribution) and a function you want. You use the uniform distribution to make a bunch of points (x,y), and then use your desired function as a test vs the y coordinate to accept or reject the x coordinates.
That makes almost no sense without pictures, though, and unfortunately, all the best ways to talk about this are calculus based. The link below has a pretty good description and pretty good illustrations.
http://www.stats.bris.ac.uk/~manpw/teaching/folien2.pdf
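For a concrete feel, here is a small rejection sampler in PHP for a piecewise-linear density with kinks at 50 and 75, like the one described (the slopes are made up):
<?php
// Unnormalized target density: highest at 1, kink at 50, steeper
// drop after 75. Only the ratios matter for rejection sampling.
function density($x) {
    if ($x <= 50) return 100 - $x;
    if ($x <= 75) return 50 - 1.6 * ($x - 50);
    return 10 - 0.3 * ($x - 75);
}

function sample() {
    $max = density(1); // upper bound over 1..100 for the accept test
    while (true) {
        $x = mt_rand(1, 100);                    // uniform proposal
        $y = mt_rand() / mt_getrandmax() * $max; // uniform height
        if ($y <= density($x)) return $x;        // accept, else retry
    }
}

echo sample();
?>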
Essentially you need only to pick a random number and then feed it into a function, probably exponential, to shape the result.
How weighted you want the results to be determines the exact formula you use.
PHP has no built-in random-double function, but mt_rand() / mt_getrandmax() gives a float between 0 and 1:
$num = 100 * pow(mt_rand() / mt_getrandmax(), 2);
This squares a number between 0 and 1, which makes it smaller and therefore increases the chance of a low result. To get the exact ratio you want, adjust the exponent.
To me it seems like you need a logarithmic function (which is curved). You'd still pull a random number, but the value you get would be closer to 1 than to 100 most of the time. So I guess something like this could work:
function random_value($min = 1, $max = 100) {
    // log10 maps 1..100 onto 0..2; the *10 spreads that over 0..20
    // ($min defaults to 1 because log(0) is undefined)
    return log10(rand($min, $max)) * 10;
}
However, you may want to look into it yourself to make sure.
The easiest way to achieve a curved probability is to think about how you want to distribute, for example, a prize in a game across many winners and losers. To simplify your example, take 16 players and 4 prizes. Make an array with a symbol for each prize - (1,2,2,2,3,3,3,3,3,4,4,4,4,4,4,4) - and pick a number out of this array at random. Mathematically, the probability is 1/16 for prize 1, 3/16 for prize 2, 5/16 for prize 3, and 7/16 for prize 4.
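In PHP that is just a random pick from the weighted array, for example:
<?php
// Prize 1 appears once, prize 2 three times, prize 3 five times,
// prize 4 seven times: probabilities 1/16, 3/16, 5/16, 7/16.
$pool = array(1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4);
$prize = $pool[array_rand($pool)];
echo $prize;
?>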
Previously I was using the class found here to convert a userID to a random-looking string.
From his blog:
Running:
alphaID(9007199254740989);
will return 'PpQXn7COf' and:
alphaID('PpQXn7COf', true);
will return '9007199254740989'
So the idea was that users could visit www.mysite.com/user/PpQXn7COf and I would convert that back to a normal integer, so that in MySQL I could do:
"Select * from Users where userID=".alphaID('PpQXn7COf', true)
Now I've just started working with Cassandra and I'm looking for a replacement.
I want URLs like www.mysite.com/user/PpQXn7COf, not like www.mysite.com/user/username1.
The "PpQXn7COf" UUID must be as short as possible.
In the Twissandra example explained here: http://www.rackspace.com/cloud/blog/2010/05/12/cassandra-by-example/
they create a long UUID (I guess it is so long because that makes it almost 100 percent certain to be unique).
In MySQL I just had a userID column with auto-increment, so when I used the alphaID() function I always got a very short random-looking string.
Does anyone have an idea how to solve this as cleanly as possible?
Edit:
It is used for a social media site so it must be persistent.
That's also why I don't want to use usernames/real names in URLs: users couldn't remain undetected on Google if they needed to.
I just had a simple idea, though I don't know how scalable it is:
<?php
// createUUID() makes a +-14 char string of A-Z a-z 0-9 based on micro/milli/nanoseconds
do {
    $uuid = createUUID();
} while (get_count($uuid) > 0); // regenerate until the uuid is unused

// ... insert username, pass, uuid etc. into Cassandra, setting $result ...
if ($result == "1") {
    header('Location: http://www.mysite.com/usercenter');
} else {
    echo "error";
}
?>
When this gets to the size of, let's say, Twitter/Facebook:
Will it execute in acceptable time?
Will it still generate unique UUIDs fast enough that 10,000 registrations per second don't clutter it up?
Auto-increments are not suitable for a robust distributed system: you can only assign a guaranteed-unique ID if every node in your system is available to check it.
You can, of course, invent your own unique-ID generator, but you must then ensure that it generates unique IDs anywhere in your infrastructure.
For example, each node can have a file which it (with suitable locking etc.) increments, but you will also need to ensure the counters don't clash - for instance, by including the server ID in the generation algorithm.
This may be operationally nontrivial - your ops engineers will need to ensure that all the servers in the infrastructure are configured correctly with their own ID generators set up so that they don't generate the same ID. However, it's possible.
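A minimal sketch of such a file-backed counter (the path and the server-ID scheme are assumptions; your ops setup will differ):
<?php
// Per-server incrementing counter with file locking. SERVER_ID
// must be configured uniquely per machine by your ops engineers.
define('SERVER_ID', 3);

function next_id($file = '/var/lib/myapp/id_counter') {
    $fp = fopen($file, 'c+'); // create if missing, don't truncate
    flock($fp, LOCK_EX);      // serialize access on this server
    $n = (int) stream_get_contents($fp) + 1;
    ftruncate($fp, 0);
    rewind($fp);
    fwrite($fp, (string) $n);
    flock($fp, LOCK_UN);
    fclose($fp);
    return sprintf('%d-%d', SERVER_ID, $n); // server ID prevents cross-server clashes
}
?>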
UUIDs are the reasonable alternative, because they will definitely be unique.
A UUID is 128 bits; if we store 6 bits per character (i.e. base64) then that takes 22 characters, which is quite a long URI. If you want it shorter, you will need to generate unique IDs a different way.
Plus it all depends on "how unique" you actually need your IDs to be. If your IDs can safely be reused after a few months, you can probably do it in < 60 bits (depending also on the number of servers in your infrastructure, and how frequently you need to generate them).
We use:
- Server ID
- Time (granularity = 2 seconds), which wraps after a few months
- A per-server counter (which wraps frequently, but not within 2 seconds)
and stick all the bits together. This generates an ID which is < 64 bits long, but is guaranteed to be unique for the length of time it needs to be (which in our case is only a couple of months).
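A sketch of sticking those bits together (the field widths here are illustrative, not our exact ones):
<?php
// Compose a < 64-bit ID from server ID, coarse time, and a
// per-server counter. Layout: [serverId:10][time:23][counter:20].
function make_id($serverId, &$counter) {
    $time  = ((int) floor(time() / 2)) & ((1 << 23) - 1); // 2 s steps, wraps in ~6 months
    $count = $counter++ & ((1 << 20) - 1);                // must not wrap within one 2 s window
    return ($serverId << 43) | ($time << 20) | $count;
}

$counter = 0;
echo make_id(5, $counter);
?>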
Our algorithm will malfunction and generate a duplicate ID if:
- the system clock on one of our nodes goes backwards by the same amount of time in which the counter wraps;
- our operations engineers make a mistake and assign the same server ID to two servers;
- eventually, after about 9 months.