Save/restore the state of PHP's rand() - php

In an attempt to prevent memory corruption in a long-running pre-rendering script, I want to be able to say to my program "okay, render the first 1,000 steps". Then I can take a look at the output, inspect it, etc. Then I want to say "now generate steps 1,001 through 10,000".
I have it working almost perfectly. There's just one thing I'm struggling with.
The rendering script uses rand() to add entropy to the generated output, with srand() at the start to ensure it remains constant in re-renders. Currently I "solve" this issue by counting how many times rand() is called, and then calling it that many times before starting the actual generation. The problem with this is that it can be very slow, especially when I've generated a few million random numbers.
Is there any way to determine what value I need to pass to srand() to continue generating the sequence? Is this even possible?
If not, is there any way to find out what exact algorithm rand() is using? I really like the map I'm getting from srand(43) and would like to keep it if possible!
EDIT: Using Patashu's answer, here's what I've come up with:
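function rnd() {
    static $seed = 42;                      // plays the role of srand(42)
    // MSVC-style LCG step: seed = (seed * 214013 + 2531011) mod 2^32.
    // The seed is below 2^32 and the multiplier below 2^18, so the product stays
    // under 2^50 and is exact even as a float; fmod() is exact as well, and unlike
    // a subtraction loop it takes constant time.
    $seed = fmod($seed * 214013 + 2531011, 4294967296.0);
    $rs = floor($seed / 65536) & 0x7fff;    // bits 16-30, the value MSVC's rand() returns
    return floor(2 * $rs / 0x8000);         // scale 0..32767 down to 0 or 1
}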
It relies on floats because, as far as I can tell, the 53 bits of precision in a double are easily enough to store the numbers exactly (the seed stays below 2^32 and the multiplier is below 2^18, so every intermediate value is below 2^50), whereas integers get truncated or wrap around if bit operators are used.
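Because the generator in the edit is a plain linear congruential generator (LCG), catching up after N calls does not actually require N calls: N applications of x → (a·x + c) mod 2^32 compose into one affine map, which can be built by repeated squaring in O(log N) steps. The sketch below assumes a 64-bit PHP build and the constants from the edit (214013, 2531011); the function names are hypothetical. It works on the raw 32-bit seed, so rnd() would have to expose its seed (for example as a parameter or property instead of a static variable) for this to plug in.

```php
<?php
// Hypothetical helpers; they assume a 64-bit PHP build (PHP_INT_SIZE === 8).
const LCG_A = 214013;
const LCG_C = 2531011;
const MASK32 = 0xFFFFFFFF;

// (x * y) mod 2^32 without overflowing a 64-bit int: split y into 16-bit halves.
function mulmod32(int $x, int $y): int
{
    $lo = $x * ($y & 0xFFFF);                 // below 2^48
    $hi = (($x * ($y >> 16)) & 0xFFFF) << 16; // only its low 16 bits survive mod 2^32
    return ($lo + $hi) & MASK32;
}

// One LCG step: seed = (seed * a + c) mod 2^32.
function lcgStep(int $seed): int
{
    return (mulmod32($seed, LCG_A) + LCG_C) & MASK32;
}

// Advance the seed by $n steps in O(log n) by squaring the map x -> mul*x + add.
function lcgSkip(int $seed, int $n): int
{
    $accMul = 1;     $accAdd = 0;      // identity map (zero steps)
    $curMul = LCG_A; $curAdd = LCG_C;  // map for 2^k steps, starting with one step
    while ($n > 0) {
        if ($n & 1) {
            // acc = cur applied after acc (powers of one map commute, so order is irrelevant)
            $accMul = mulmod32($curMul, $accMul);
            $accAdd = (mulmod32($curMul, $accAdd) + $curAdd) & MASK32;
        }
        // cur = cur applied after cur, doubling the number of steps it represents
        $curAdd = (mulmod32($curMul, $curAdd) + $curAdd) & MASK32;
        $curMul = mulmod32($curMul, $curMul);
        $n >>= 1;
    }
    return (mulmod32($accMul, $seed) + $accAdd) & MASK32;
}
```

For example, if the first run made 1,000,000 calls starting from seed 42, the second run could start from lcgSkip(42, 1000000) instead of replaying them; each output is then derived from the seed exactly as in rnd().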

This does not directly answer your question, but since it looks like you don't need particularly "good" random numbers, why not look into writing your own pseudo-random-number generator? This way you can easily serialize and deserialize its state whenever you need.
Something like the algorithm at http://en.wikipedia.org/wiki/Random_number_generation#Computational_methods may even do the trick. You can seed it with the current time whenever you start a new sequence.
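A minimal sketch of that idea, assuming a plain LCG with the same MSVC-style constants as the edit above is good enough for the renderer: the generator's whole state is one integer, so saving and resuming is just storing that integer (here as JSON). The class and method names are hypothetical, and the sketch assumes a 64-bit PHP build.

```php
<?php
// Hypothetical resumable PRNG: its whole state is the integer $seed.
final class ResumableRng
{
    private int $seed;

    public function __construct(int $seed)
    {
        $this->seed = $seed & 0xFFFFFFFF;
    }

    // Returns an int in [0, 32767], like MSVC's rand().
    public function next(): int
    {
        // seed < 2^32 and 214013 < 2^18, so the product fits in a 64-bit int.
        $this->seed = ($this->seed * 214013 + 2531011) & 0xFFFFFFFF;
        return ($this->seed >> 16) & 0x7FFF;
    }

    // Serialize the state as a small JSON string, e.g. to store between runs.
    public function saveState(): string
    {
        return json_encode(['seed' => $this->seed]);
    }

    // Rebuild a generator that continues exactly where the saved one stopped.
    public static function fromState(string $json): self
    {
        $data = json_decode($json, true);
        return new self((int) $data['seed']);
    }
}
```

Usage: after step 1,000, store $rng->saveState() (for instance with file_put_contents() to a path of your choosing), and before step 1,001 recreate the generator with ResumableRng::fromState() on the stored string.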

This page explains the source code behind PHP's rand(): http://cod.ifies.com/2008/05/php-rand01-on-windows-openssl-rand-on.html
Be warned, it's not very pretty, and it depends on quirks of PHP and of your OS's implementation of rand() :)

Related

Mathematical formula string to php variables and operators

I have the following problem. I find it difficult to explain, as I am a hobby coder, so please forgive me if I commit any major faux pas:
I am working on a baseball database that deals with baseball-specific and non-specific metrics/stats. Most of the data output is pretty simple when the metrics are cumulative, or when I only want to display the dataset of one day that has been entered manually or imported from a CSV. (All percentages are calculated and fed into the DB as numbers.)
For example, if I put in the stats
{ Hits: 1, Walks: 2, AB: 2, Bavg: 0.500 }
for day one, and
{ Hits: 2, Walks: 2, AB: 6, Bavg: 0.333 }
for day two, and then try to get the totals, Hits, Walks and ABs are simple: SUM. But Bavg has to be a formula (Hits/AB): the total is 3/8 = 0.375, not a sum or average of the daily values. Other (non-baseball-specific) metrics, like vertical jump or 60-yard times, are pretty straightforward too: MAX or MIN.
The user should be able to add his own metrics. So he has to be able to input a formula for the calculation. This calculation is stored in the database table as a column next to the metric name, id, type (cumulative, max, min, calculated).
The PHP script that produces the HTML table is set up to adapt dynamically to whatever metrics, and however many, the query returns (the metrics can be part of several categories).
In the end, I want to replace the values of all calculated-type metrics with the result of their formula.
My approach is to get the formula from the MySQL table as a string. Then, in PHP, convert a string such as Strikes/Pitches*100 into $strikes/$pitches*100, assuming that is something I could put into a PHP SQL query. However, before it is put into the $strikes/$pitches*100 format, I need to have those variables available to define them. That I'm sure I can do, but I'll cross that bridge when I get there.
Could you point me in the right direction of either how to accomplish that or tell where or what to search for? I'm sure this has been done before somewhere...
I highly appreciate any help!
Clemens
The correct solution has already been given by Vilx-. So I will give you a not-so-correct, dirty solution.
As the correct solution states, eval is evil. But, it is also easy and powerful (as evil often is -- but I'll spare you my "hhh join the Dark Side, Luke hhh" spiel).
And since what you need to do is a very small and simple subset of SQL, you actually can use eval() - or even better its SQL equivalent, plugging user supplied code into a SQL query - as long as you do it safely; and with small requirements, this is possible.
(In the general case it absolutely is not. So keep it in mind - this solution is easy, quick, but does not scale. If the program grows beyond a certain complexity, you'll have to adopt Vilx-'s solution anyway).
You can verify the user-supplied string to ensure that, while it might not be syntactically or logically correct, at least it won't execute arbitrary code.
This is okay:
SELECT SUM(pitch)+AVG(runs)-5*(MIN(balls)) /* or whatever */
and this, while wrong, is harmless too:
SELECT SUM(pitch +
but this absolutely is not (mandatory XKCD reference):
SELECT "Robert'); DROP TABLE Students;--
and this is even worse, since the above would not work on a standard MySQL (that doesn't allow multiple statements by default), while this would:
SELECT SLEEP(3600)
So how do we tell the harmless from the harmful? We start by defining placeholder variables that you can use in your formula. Let us say they will always be in the form {name}. So, get those - which we know to be safe - out of the formula:
$verify = preg_replace('#{[a-z]+}#', '', $formula);
Then, arithmetic operators are also removed; they are safe too.
$verify = preg_replace('#[+*/-]#', '', $verify);
Then numbers and things that look like numbers:
$verify = preg_replace('#[0-9.]+#', '', $verify);
Finally, there are the functions you trust. Their arguments may have been variables or combinations of variables, which have already been deleted, so each function is now left with no arguments, say SUM(), or you have nested functions or nested parentheses, like SUM (SUM( ()())).
You keep replacing () (with any spaces inside) with a single space until the replacement no longer finds anything:
for ($old = ''; $old !== $verify; $verify = preg_replace('#\\s*\\(\\s*\\)\\s*#', ' ', $verify)) {
$old = $verify;
}
Now you remove from the result the occurrences of any function you trust, as an entire word:
for ($old = ''; $old !== $verify; $verify = preg_replace('#\\b(SUM|AVG|MIN|MAX)\\b#', ' ', $verify)) {
$old = $verify;
}
The last two steps have to be merged because you might have both nested parentheses and functions, interfering with one another:
for ($old = ''; $old !== $verify; $verify = preg_replace('#\\s*(\\b(SUM|AVG|MIN|MAX)\\b|\\(\\s*\\))\\s*#', ' ', $verify)) {
$old = $verify;
}
And at this point, if you are left with nothing, it means the original string was harmless (at worst it could have triggered a division by 0, or a SQL exception if it was syntactically wrong). If instead you're left with something, the formula is rejected and never saved in the database.
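Put together, the verification steps above might look like the following sketch. It keeps the answer's assumptions (lowercase {name} placeholders; SUM, AVG, MIN and MAX as the only trusted functions), and the function name is hypothetical. Two details are additions to the answer's steps: the leftover is trimmed before testing for emptiness, because the merged loop replaces each removed piece with a space, and formulas containing the SQL comment markers "--" or "/*" are rejected first, since those are built only from "safe" operator characters yet could comment out the rest of the query.

```php
<?php
// Hypothetical helper: true if $formula only contains {placeholders}, arithmetic
// operators, numbers, parentheses and the trusted aggregate functions.
function isSafeFormula(string $formula): bool
{
    // Comment openers consist of "safe" characters but could hide the rest of the query.
    if (strpos($formula, '--') !== false || strpos($formula, '/*') !== false) {
        return false;
    }
    $verify = preg_replace('#{[a-z]+}#', '', $formula);  // placeholders
    $verify = preg_replace('#[+*/-]#', '', $verify);      // arithmetic operators
    $verify = preg_replace('#[0-9.]+#', '', $verify);     // numbers
    // Strip trusted function names and empty parentheses until nothing changes.
    for ($old = ''; $old !== $verify; $verify = preg_replace('#\\s*(\\b(SUM|AVG|MIN|MAX)\\b|\\(\\s*\\))\\s*#', ' ', $verify)) {
        $old = $verify;
    }
    // Each replacement leaves a space behind, so only whitespace may remain.
    return trim($verify) === '';
}
```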
When you have a valid formula, you can replace variables using preg_replace_callback() so that they become numbers (or names of columns). You're left with what is either valid, innocuous SQL code, or incorrect SQL code. You can plug this directly into the query, after wrapping it in try/catch to intercept any PDOException or division by zero.
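A sketch of that substitution step, assuming the formula has already passed the verification described above and that every placeholder maps to a known column name. The whitelist array and the function name are hypothetical; unknown placeholders raise an exception instead of being passed through, and column names are quoted as MySQL identifiers.

```php
<?php
// Hypothetical helper: replace each {placeholder} with a whitelisted column name.
function formulaToSql(string $formula, array $columns): string
{
    return preg_replace_callback(
        '#{([a-z]+)}#',
        function (array $m) use ($columns): string {
            if (!isset($columns[$m[1]])) {
                throw new InvalidArgumentException("Unknown metric: {$m[1]}");
            }
            // Backtick-quote the identifier; any backtick inside the name is doubled.
            return '`' . str_replace('`', '``', $columns[$m[1]]) . '`';
        },
        $formula
    );
}
```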
I'll assume that the requirement is indeed to allow the user to enter arbitrary formulas. As noted in the comments, this is indeed no small task, so if you can settle for something less, I'd advise doing so. But, assuming that nothing less will do, let's see what can be done.
The simplest idea is, of course, to use PHP's eval() function. You can execute arbitrary PHP code from a string. All you need to do is to set up all necessary variables beforehand and grab the return value.
It does have drawbacks however. The biggest one is security. You're essentially executing arbitrary user-supplied code on your server. And it can do ANYTHING that your own code can. Access files, use your database connections, change variables, whatever. Unless you completely trust your users, this is a security disaster.
Also syntax or runtime errors can throw off the rest of your script. And eval() is pretty slow too, since it has to parse the code every time. Maybe not a big deal in your particular case, but worth keeping an eye on.
All in all, in every language that has an eval() function, it is almost universally considered evil and to be avoided at all costs.
So what's the alternative? Well, a dedicated formula parser/executor would be nice. I've written one a few times, but it's far from trivial. The job is easier if the formula is written in Polish notation or Reverse Polish Notation, but those are a pain to write unless you've practiced. For normal formulas, take a look at the Shunting Yard Algorithm. It's straightforward enough and can be easily adapted to functions and whatnot. But it's still fairly tedious.
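To give an idea of the effort involved, here is a minimal sketch of the Shunting Yard approach: it converts an infix formula such as Strikes/Pitches*100 into Reverse Polish Notation and then evaluates it against a map of variable values. Assumptions: only numbers, variable names, + - * / and parentheses are supported (no unary minus, no functions), and the function names are hypothetical. Nothing in the formula is ever executed as code.

```php
<?php
// Split a formula into numbers, names, operators and parentheses.
function tokenize(string $formula): array
{
    $tokens = [];
    $pos = 0;
    while ($pos < strlen($formula)) {
        // \G anchors the match at $pos, so any unexpected character is an error.
        if (!preg_match('/\G\s*(\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*\/()])\s*/', $formula, $m, 0, $pos)) {
            throw new InvalidArgumentException("Unexpected input at offset $pos");
        }
        $tokens[] = $m[1];
        $pos += strlen($m[0]);
    }
    return $tokens;
}

// Shunting Yard: infix tokens to RPN, all operators left-associative.
function toRpn(array $tokens): array
{
    $prec = ['+' => 1, '-' => 1, '*' => 2, '/' => 2];
    $output = [];
    $ops = [];
    foreach ($tokens as $t) {
        if (isset($prec[$t])) {
            // Pop operators of higher or equal precedence before pushing this one.
            while ($ops && end($ops) !== '(' && $prec[end($ops)] >= $prec[$t]) {
                $output[] = array_pop($ops);
            }
            $ops[] = $t;
        } elseif ($t === '(') {
            $ops[] = $t;
        } elseif ($t === ')') {
            while ($ops && end($ops) !== '(') {
                $output[] = array_pop($ops);
            }
            if (!$ops) {
                throw new InvalidArgumentException('Mismatched parenthesis');
            }
            array_pop($ops); // discard the '('
        } else {
            $output[] = $t;  // number or variable name
        }
    }
    while ($ops) {
        $op = array_pop($ops);
        if ($op === '(') {
            throw new InvalidArgumentException('Mismatched parenthesis');
        }
        $output[] = $op;
    }
    return $output;
}

// Evaluate RPN with a stack; variables are looked up in $vars.
function evaluateRpn(array $rpn, array $vars): float
{
    $stack = [];
    foreach ($rpn as $t) {
        if (in_array($t, ['+', '-', '*', '/'], true)) {
            if (count($stack) < 2) {
                throw new InvalidArgumentException('Missing operand');
            }
            $b = array_pop($stack);
            $a = array_pop($stack);
            switch ($t) {
                case '+': $stack[] = $a + $b; break;
                case '-': $stack[] = $a - $b; break;
                case '*': $stack[] = $a * $b; break;
                default:
                    if ($b == 0.0) {
                        throw new DivisionByZeroError('Division by zero in formula');
                    }
                    $stack[] = $a / $b;
            }
        } elseif (is_numeric($t)) {
            $stack[] = (float) $t;
        } elseif (array_key_exists($t, $vars)) {
            $stack[] = (float) $vars[$t];
        } else {
            throw new InvalidArgumentException("Unknown variable: $t");
        }
    }
    if (count($stack) !== 1) {
        throw new InvalidArgumentException('Malformed formula');
    }
    return $stack[0];
}
```

Usage: evaluateRpn(toRpn(tokenize('Hits/AB')), ['Hits' => 3, 'AB' => 8]) computes the combined batting average from the question's two days.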
So, unless you want to do it as a fun challenge, look for a library that has already done it. There seem to be a bunch of them out there. Search for something along the lines of "arithmetic expression parser library php".

Mixing other sources of random numbers with ones generated by /dev/urandom

Further to my question here, I'll be using the random_compat polyfill (which uses /dev/urandom) to generate random numbers in the 1 to 10,000,000 range.
I do realise that, as long as I code my project correctly, the above tools should produce good (as in random/secure etc.) data. However, I'd like to add extra sources of randomness into the mix, just in case 6 months down the line I read that there is a patch available for my specific OS version to fix a major bug in /dev/urandom (or any other issue).
So, I was thinking I can get numbers from random.org and fourmilab.ch/hotbits
An alternative source would be some logs from a web site I operate - timed to the microsecond, if I ignore the date/time part and just take the microseconds - this has in effect been generated by when humans decide to click on a link. I know this may be classed as haphazard rather than random, but would it be good for my use?
Edit re timestamp logs: I will use PHP's microtime(), which will create a log like:
0.**832742**00 1438282477
0.**57241**000 1438282483
0.**437752**00 1438282538
0.**622097**00 1438282572
I will just use the bolded portion.
So let's say I take two sources of extra random numbers, A and B, and the output of /dev/urandom, call that U and set ranges as follows:
A and B are 1 - 500,000
U is 1 - 9,000,000
Final random number is A+B+U
I will be needing several million final numbers between 1 and 10,000,000
The pool of A and B numbers will only contain a few thousand values, but I think that by using prime-sized pools I can stretch that into millions of A&B combinations, like so:
// The real pools will hold integers from the two extra sources, with a larger
// prime number of members instead of the 7 and 11 used here. Because the pool
// sizes are coprime, the (A, B) pairing only repeats every 7 * 11 = 77 draws.
$poolA = array(1, 2, 3, 4, 5, 6, 7);                           // placeholder values
$poolB = array(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110);  // placeholder values
$num_numbers_required = 77;                                     // placeholder
$offsets = array();
for ($i = 0; $i < $num_numbers_required; $i++) {
    // Cycle through each pool independently; the modulo replaces the manual resets.
    $offsets[] = $poolA[$i % count($poolA)] + $poolB[$i % count($poolB)];
}
Does this plan make sense - is there any possibility I can actually make my end result less secure by doing all this? And what of my idea to use timestamp data?
Thanks in advance.
I would suggest reading RFC 4086, section 5. Basically, it talks about how to "mix" different entropy sources without compromising security or introducing bias.
In short, you need a "mixing function". You can do this with xor, where you simply set the result to the xor of the inputs: result = A xor B.
The problem with xor is that if the numbers are correlated in any way, it can introduce strong bias into the result. For example, if bits 1-4 of A and B are the current timestamp, then the result's first 4 bits will always be 0.
Instead, you can use a stronger mixing function based on a cryptographic hash function. So instead of A xor B you can do HMAC-SHA256(A, B). This is slower, but also prevents any correlation from biasing the result.
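In PHP terms, the two mixing functions might look like this sketch (the function names are hypothetical). PHP's ^ operator on strings XORs them byte by byte (the result is as long as the shorter string), and hash_hmac() with its last argument set to true returns the 32 raw bytes of HMAC-SHA256. Feeding in identical inputs shows the difference: XOR cancels them out completely, while the HMAC output does not.

```php
<?php
// XOR mixing: fast, but correlated inputs cancel each other out.
function mixXor(string $a, string $b): string
{
    return $a ^ $b;
}

// HMAC-SHA256 mixing: $a is the message, $b is the key; returns 32 raw bytes.
function mixHmac(string $a, string $b): string
{
    return hash_hmac('sha256', $a, $b, true);
}
```

For example, mixHmac(random_bytes(32), $extraEntropy) would combine the OS generator's output with some other collected bytes ($extraEntropy is a placeholder).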
This is the strategy that I used in RandomLib. I did this because not every system has every method of generation. So I pull as many methods as I can, and mix them strongly. That way the result is never weaker than the strongest method.
HOWEVER, I would ask why. If /dev/urandom is available, you're not going to get better than it. The reason is simple: even if you call random.org for more entropy, your call is encrypted using random keys generated from /dev/urandom. That means if an attacker can compromise /dev/urandom, your server is toast, and you will be spinning your wheels trying to make it better.
Instead, simply use /dev/urandom and keep your OS updated...
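With random_compat (or PHP 7+, where it is built in), that advice comes down to one call per number: random_int() draws from the operating system's CSPRNG (on Linux, getrandom() or /dev/urandom) and returns a uniformly distributed integer in the inclusive range. A minimal sketch for the range in the question; the function name is hypothetical.

```php
<?php
// Draw $count integers uniformly from [1, 10,000,000] using the OS CSPRNG.
function drawNumbers(int $count): array
{
    $numbers = [];
    for ($i = 0; $i < $count; $i++) {
        // Throws an Exception if no secure randomness source is available.
        $numbers[] = random_int(1, 10000000);
    }
    return $numbers;
}
```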

Making a true random number generator?

I'm making a little web application which needs to randomize stuff.
Just a little example of what it's going to have: it returns a random number between and 10 to the user.
I'm planning to do it using Javascript and jQuery (the calculator itself).
My question is: How can I make its functions truly random and not pseudo-random? Is the PHP function perhaps more random than the Javascript ones?
For the sake of the question, let's say what I want is a true random number between X and Y.
No function that you call will be "truly random". They are all PRNGs; most PRNGs are, in fact, quite good, and I'd be surprised if they were inadequate for your application. Also, while a few PRNGs are notorious for their small period (Java's Random comes to mind), every modern language that I know of (including JavaScript, PHP, and, with its crypto packages, Java) has very good PRNGs.
The best way to collect "more random" data is to obtain it from an external random source. One possibility is to ask the user to move the mouse around in a window for a certain period of time, collecting the mouse coordinates at regular intervals. Some military, banking, and other high-security systems use hardware like thermal noise sensors to obtain data that is as close to random as one can hope; however, such hardware support is not going to be available to a web app.
Note that hacks like using the system clock are no better than PRNGs; most PRNGs initialize their seed with a combination of such data.
You have not understood the Matrix movies. ;) One function is not "more random" than one in another language. All functions are equally pseudo random. "Pseudo" means that the computer cannot, by definition, pull a random number out of thin air. It just can't. A computer computes, strictly, based on rules, accurately. There's no randomness anywhere in the system. For all its supposed power, randomness is the one thing a computer simply cannot do (that, and making coffee).
For true randomness, you need an outside, natural source. Like measuring atomic decay, or some such thing which is not predictable and truly random. Anything else is just pseudo-randomness, which may be of varying quality.
Good PRNGs try to collect "outside interference" in an entropy pool; e.g. Linux' /dev/random takes into account system driver "noise" which may be based on "random" packets hitting the ethernet port, or the user's mouse movements. How truly random that is is debatable, but it's very very hard to predict at least and suitably random for most purposes.
I don't think there's any way to remove the deterministic aspect of randomness from a program completely. You can do all you want to minimize, mitigate, and obfuscate whatever process you're using to "pull numbers from a hat", but you can never truly make it perfectly random.
You can fashion out a process with sufficient detail to make it practically random, but true random may be impossible.
While you can't achieve true randomness with your own PHP code, you can use the random.org API. You could connect to it through cURL in PHP or through Ajax in JavaScript.
As far as I know, they use atmospheric noise as a random seed.
It is not possible to generate truly random variables on a computer. However, you may improve standard generators. Suppose you have two basic generators. You create a table and fill it with values from the first generator. Then, when you want a number, the second one generates an index and you return the corresponding value from the table. That value is then replaced with a new one from the first generator... I forget what this generator is called... Hope it helps.
P.S. Sorry for my English.
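The table-shuffling scheme described above appears to be what Knuth's The Art of Computer Programming (Vol. 2) presents as Algorithm M, attributed to MacLaren and Marsaglia. Below is a minimal sketch; it takes the two generators as callables so any PRNGs can be plugged in, and the class name and default table size are hypothetical choices. As the other answers point out, shuffling improves the statistical quality of a PRNG's output but does not make it truly random.

```php
<?php
// Hypothetical class: values from generator X, reordered through a table indexed by generator Y.
final class ShuffledGenerator
{
    private array $table = [];
    /** @var callable */
    private $x;
    /** @var callable */
    private $y;
    private int $yMax;

    // $x produces the output values; $y produces ints in [0, $yMax] used to pick a slot.
    public function __construct(callable $x, callable $y, int $yMax, int $size = 64)
    {
        $this->x = $x;
        $this->y = $y;
        $this->yMax = $yMax;
        for ($i = 0; $i < $size; $i++) {
            $this->table[] = ($this->x)();   // fill the table from X
        }
    }

    public function next(): int
    {
        // Scale Y's value to a slot index, using its high-order part.
        $j = intdiv(($this->y)() * count($this->table), $this->yMax + 1);
        $value = $this->table[$j];
        $this->table[$j] = ($this->x)();     // refill the used slot from X
        return $value;
    }
}
```

Ideally X and Y are independent generators; note that since PHP 7.1 rand() is an alias of mt_rand(), so that pair would not count as two generators.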
My suggestion is that you generate a binary random string by encrypting the local time and date by an encryption algorithm. In this case try to gather all possible sources of "random" data, and load these both as input message and input key.
As you have seen from the previous answers, your use of and your requirements for your random data are important. A number is "random" if it is difficult or impossible for your application to guess its value in advance. Note that the same number source may be considered random for some applications but not for others. Evidently, you will have serious problems if you need high-quality random numbers for a demanding application.
TRNG98 True Random Numbers

Countdown from 100 to 0 with no assignments in PHP

I'm having a problem figuring out an exercise my boss gave me, as a sort of personal knowledge enrichment. Unfortunately, I have been looking everywhere on the web for even a glimpse of an answer with no success, and I'm now turning to you fellow programmers.
What he asked me to do is make a simple countdown from 100 to 0 (it can be displayed all at once). Easy enough so far, eh? Just make a simple for loop, or even a while loop. The problem here is that he asks that there be no assignments in the code, e.g. $[var]=[value].
How can one even make a loop with no assignments? Since $i-- is equivalent to $i = $i - 1, how can we count down?
I'm baffled by this problem which I can't resolve, I really want to find the answer as I am very curious on how this can be done.
Help is kindly appreciated.
Edit
Note that this problem has two parts: the first is to make it count from 100 to 0, and the second from x to 0, where x is input by the user.
I will give you some hints, writing the exact code is an exercise for you.
No assignments, so you must have some type that can hold multiple values. Primitive types like integers cannot be used directly here, think of arrays.
Now, you cannot use variables, so using a loop and then printing the values is disallowed. Use one of the many ways to print an array (or the result of operating on an array).
You don't need a negative step for range(): when the start is greater than the end, it already returns the numbers in descending order (or you can reverse an ascending range).
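Following those hints, one assignment-free version builds the whole sequence with range() and prints it with implode(). This is a sketch, and the function name is hypothetical; range($x, 0) counts down because the start is larger than the end, and array_reverse(range(0, $x)) would produce the same array.

```php
<?php
// Countdown without any "$var = value" statement: build the array, then join it.
function countdownList(int $x): string
{
    return implode("\n", range($x, 0));
}

echo countdownList(100), "\n"; // part one: 100 to 0
```

For the second part, pass the user's number instead of 100, cast to int.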
My guess would be that your boss is trying to encourage you towards recursion rather than iteration. Recursion, as a technique, works extremely well for certain problems, and is an excellent tool to have in your armoury as a programmer.
Without giving everything away, try experimenting with defining a function that takes an argument - the "countdown" number -- and then calls itself in some way. You'll also need to separately kick it off by calling it once you've defined it.
Bear in mind that recursion must have some kind of termination defined, otherwise things can go very wrong. Here's an example of something going very wrong to get you started: :D
<?php
function infinity() {
print "Whoah.";
infinity();
}
infinity();
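For comparison, a version with a termination condition might look like this sketch (the function name is hypothetical): the function prints its argument and calls itself with the argument minus one until it reaches zero, so no variable is ever reassigned. Each call adds a stack frame, which is harmless for 100 but worth keeping in mind for very large inputs.

```php
<?php
function countdownRecursive(int $n): void
{
    echo $n, "\n";
    if ($n > 0) {              // termination: stop once 0 has been printed
        countdownRecursive($n - 1);
    }
}

countdownRecursive(100);
```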

Speeding up levenshtein / similar_text in PHP

I am currently using similar_text to compare a string against a list of ~50,000 which works although due to the number of comparisons it's very slow. It takes around 11 minutes to compare ~500 unique strings.
Before running this, I check the databases to see whether a string has been processed in the past, so every time after the initial run it's close to instant.
I'm sure using levenshtein would be slightly faster and the LevenshteinDistance function someone posted in the manual looks interesting. Am I missing something that could make this significantly faster?
In the end, both levenshtein and similar_text were too slow for the number of strings they had to go through, even with lots of checks and only using one of them as a last resort.
As an experiment, I ported some of the code to C# to see how much faster it would be than interpreted code. It ran in about 3 minutes on the same dataset.
Next, I added an extra field to the table and used the double metaphone PECL extension to generate keys for each row. The results were good, although some strings included numbers, which caused duplicates. I guess I could then have run each one through the above functions, but decided not to.
In the end, I opted for the simplest approach, MySQL's full-text search, which worked very well. Occasionally there are mistakes, although they are easy to detect and correct. It also runs very fast, in around 3-4 seconds.
Perhaps you could 'short-circuit' some checks by first comparing your string for an exact match (and by first comparing if length identical), and if it is skip the more expensive similar_text call.
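A sketch of that short-circuiting, assuming the goal is to find candidates within a maximum Levenshtein distance: an exact match is checked first with a plain string comparison, and any candidate whose length differs from the needle's by more than the threshold is skipped, because the Levenshtein distance is never smaller than the difference in lengths. The function name and threshold parameter are hypothetical, and the length filter is specific to levenshtein(), not similar_text().

```php
<?php
// Hypothetical helper: candidates within $maxDistance edits of $needle.
function closeMatches(string $needle, array $candidates, int $maxDistance): array
{
    $needleLength = strlen($needle);
    $matches = [];
    foreach ($candidates as $candidate) {
        if ($candidate === $needle) {
            return [$candidate];   // exact match: no expensive comparison needed
        }
        // The edit distance is at least the length difference, so skip hopeless candidates.
        if (abs(strlen($candidate) - $needleLength) > $maxDistance) {
            continue;
        }
        if (levenshtein($needle, $candidate) <= $maxDistance) {
            $matches[] = $candidate;
        }
    }
    return $matches;
}
```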
As @jason noted, an O(N^3) algorithm is never going to be a good choice.
When using a Levenshtein automaton (an automaton that matches strings within distance k of a given string), you can check for a match in O(n), where n is the length of the string you are checking. Constructing the automaton takes O(kn), where k is the maximum distance and n is the length of the base string.
