PHP rand vs mt_rand vs openssl_random_pseudo_bytes - php

I want to generate a random string and was doing some research and found the following link:
http://golearnphp.com/php-rand-vs-mt_rand-and-openssl_random_pseudo_bytes/
function generateRandom($length) {
$validCharacters = 'abcdefghijklmnopqrstuvwxyz0123456789';
$myKeeper = '';
for ($n = 1; $n < $length; $n++) {
$whichCharacter = rand(0, strlen($validCharacters) - 1);
$myKeeper .= $validCharacters[$whichCharacter];
}
return $myKeeper;
}
function generateRandomdMT($length) {
$validCharacters = 'abcdefghijklmnopqrstuvwxyz0123456789';
$myKeeper = '';
for ($n = 1; $n < $length; $n++) {
$whichCharacter = mt_rand(0, strlen($validCharacters) - 1);
$myKeeper .= $validCharacters[$whichCharacter];
}
return $myKeeper;
}
$start = microtime(true);
echo htmlentities(generateRandom(100000));
var_dump(microtime(true) - $start);
$start = microtime(true);
echo htmlentities(generateRandomdMT(100000));
var_dump(microtime(true) - $start);
$start = microtime(true);
echo htmlentities(substr(base64_encode(openssl_random_pseudo_bytes(100000)), 0, 100000));
var_dump(microtime(true) - $start);
In the post the writer says that openssl_random_pseudo_bytes is significantly faster than the other two. Is this true? Is openssl_random_pseudo_bytes really that much faster? Is that the correct way to test the speed of functions?

openssl_random_pseudo_bytes was created to be cryptographically strong (check the second parameter). rand is the old random function with a short repetition period. mt_rand is better than rand, but it is not meant to be used in cryptographic systems.
I bet the difference in execution time will not affect your application.
Also, these functions return different results. The first two return a string drawn from 36 possible characters, while the third returns a string drawn from 64 possible symbols. The result of the first two functions is also shorter than that of the third, because their loops start at 1 and so produce $length - 1 characters.
If you are optimizing to speed up your application, the first thing you should know is how to profile your code.

In the post the writer says that openssl_random_pseudo_bytes is significantly faster than the other two. Is this true?
In normal situations mt_rand() is significantly faster than openssl_random_pseudo_bytes().
It's only slower in the test code you've posted because you are comparing apples and oranges. For rand() and mt_rand() you are using complex functions which build up a string one byte at a time, whereas for openssl_random_pseudo_bytes() you're using the raw binary stream it produces with base64_encode() which is going to be much faster.
If you could get a raw binary stream out of mt_rand() or rand(), or a sequence of numbers 0 to 63 from openssl_random_pseudo_bytes(), you could do an apples to apples comparison.
In my testing, I found mt_rand() about 4 times as fast as openssl_random_pseudo_bytes(4) when I used unpack('V', openssl_random_pseudo_bytes(4) & "\xff\xff\xff\x7f") in order to get an equivalent output to mt_rand(). However this is still technically an apples to oranges situation because I'm doing additional processing on one in order to match it to the other, just in the opposite direction to you.
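A closer-to-equal comparison along those lines could be sketched as follows. The helper names openssl_int31 and time_per_call are made up for this illustration, the openssl extension is assumed to be loaded, and the timings will vary by machine and PHP version, so treat the output as indicative only.

```php
<?php
// Hypothetical helper (not from the post): one 31-bit integer from OpenSSL,
// matching the 0..2^31-1 range of mt_rand() called without arguments.
function openssl_int31(): int
{
    // Clearing the top bit of the little-endian 32-bit value keeps it non-negative.
    $bytes = openssl_random_pseudo_bytes(4) & "\xff\xff\xff\x7f";
    return unpack('V', $bytes)[1];
}

// Hypothetical helper: average wall-clock seconds per call of $fn.
function time_per_call(callable $fn, int $iterations = 100000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $iterations;
}

printf("mt_rand:       %.3e s/call\n", time_per_call('mt_rand'));
printf("openssl_int31: %.3e s/call\n", time_per_call('openssl_int31'));
```

Both sides now produce one integer per call, although the OpenSSL side still pays for the mask and unpack() step, as noted above.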

At the time you asked this question, there was an open bug report: https://bugs.php.net/bug.php?id=70014 (PHP 5.6.10). It seems to be fixed in newer versions of PHP.
In my experience openssl_random_pseudo_bytes() has never been necessary; I prefer mt_rand(). But if you are generating random values for encryption purposes, as I am, do not use mt_rand(); you should use random_bytes() instead. Ref: https://www.php.net/manual/en/function.random-bytes.php
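For the random-string use case from the question, a minimal sketch built on random_int() (the integer counterpart of random_bytes(), available since PHP 7) might look like this. The function name random_string and its default alphabet are assumptions made for the example.

```php
<?php
// Hypothetical helper: a random string drawn from $alphabet using random_int(),
// which draws from the operating system's CSPRNG.
function random_string(int $length, string $alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789'): string
{
    $max = strlen($alphabet) - 1;
    if ($length < 0 || $max < 0) {
        throw new InvalidArgumentException('length must be >= 0 and alphabet non-empty');
    }
    $out = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() is uniform over the inclusive range, so no modulo bias.
        $out .= $alphabet[random_int(0, $max)];
    }
    return $out;
}

echo random_string(16), "\n";
```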

Related

PHP: How to raise number to (tiny) fractional exponent?

I'm doing a calculation in PHP using bcmath, and need to raise e by a fractional exponent. Unfortunately, bcpow() only accepts integer exponents. The exponent is typically higher precision than a float will allow, so normal arithmetic functions won't cut it.
For example:
$e = exp(1);
$pow = "0.000000000000000000108420217248550443400745280086994171142578125";
$result = bcpow($e, $pow);
Result is "1" with the error, "bc math warning: non-zero scale in exponent".
Is there another function I can use instead of bcpow()?
Your best bet is probably to use the Taylor series expansion. As you noted, PHP's bcpow only accepts integer exponents.
So what you can do is roll your own bc factorial function and use the wiki page to implement a Taylor series expansion of the exponential function.
function bcfac($num) {
if ($num==0) return 1;
$result = '1';
for ( ; $num > 0; $num--)
$result = bcmul($result,$num);
return $result;
}
$mysum = '0';
for ($i=0; $i<300; $i++) {
$mysum = bcadd($mysum, bcdiv(bcpow($pow,$i), bcfac($i)) );
}
print $mysum;
Obviously, the $i<300 is an approximation for infinity... You can change it to suit your performance needs.
With $i=20, I got
1.00000000000000000010842021724855044340662275184110560868263421994092888869270293594926619547803962155136242752708629105688492780863293090291376157887898519458498571566021915144483905034693109606778068801680332504212458366799913406541920812216634834265692913062346724688397654924947370526356787052264726969653983148004800229537555582281617497990286595977830803702329470381960270717424849203303593850108090101578510305396615293917807977774686848422213799049363135722460179809890014584148659937665374616
This is comforting since that small of an exponent should yield something really close to 1.0.
Old question, but people might still be interested nonetheless.
So Kevin got the right idea with the Taylor polynomial, but when you derive your algorithm from it directly you can get into trouble: your code gets slow for long input strings when using large cut-off values for $i.
Here is why:
At every step, by which I mean with each new $i, the code calls bcfac($i). Every time bcfac is called it performs $i-1 calculations. And $i goes all the way up to 299... that's almost 45000 operations! Not your quick'n'easy floating point operations, but slow BC string operations - if you set bcscale(100) your bcmul has to handle up to 10000 pairs of chars!
Also bcpow slows down with increasing $i, too. Not as much as bcfac, because it probably uses something akin to the square-and-multiply method, but it still adds something.
Overall the time required grows quadratically with the number of polynomial terms computed.
So... what to do?
Here's a tip:
Whenever you handle polynomials, especially Taylor-polynomials, use the Horner method.
It converts this: exp(x) = x^0/0! + x^1/1! + x^2/2! + x^3/3! + ...
...into that: exp(x) = ((( ... )*x/3+1 )*x/2+1 )*x/1+1
And suddenly you don't need any powers or factorials at all!
function bc_exp($number) {
$result = 1;
for ($i=299; $i>0; $i--)
$result = bcadd(bcmul(bcdiv($result, $i), $number), 1);
return $result;
}
This needs only 3 bc-operations for each step, no matter what $i is.
With a starting value of $i=299 (to calculate exp with the same precision as kevin's code does) we now only need 897 bc-operations, compared to more than 45000.
Even using 30 as cut-off instead of 300, we now only need 87 bc-operations while the other code still needs 822 for the factorials alone.
Horner's Method saving the day again!
Some other thoughts:
1) Kevin's code would probably crash with input="0", depending on how bcmath handles errors, because the code tries bcpow(0,0) at the first step ($i=0).
2) Larger exponents require longer polynomials and therefore more iterations, e.g. bc_exp(300) will give a wrong answer, even with $i=299, while something like bc_exp(3) will work fine and dandy.
Each term adds x^n/n! to the result, so this term has to get small before the polynomial can start to converge. Now compare two consecutive terms:
( x^(n+1)/(n+1)! ) / ( x^n/n! ) = x/n
Each summand is larger than the one before by a factor of x/n (which we used via the Horner method), so in order for x^(n+1)/(n+1)! to get small x/n has to get small as well, which is only the case when n>x.
In conclusion: as long as the number of iterations is smaller than the input value, the result will diverge. Only once the number of iterations exceeds the input does the algorithm slowly start to converge.
In order to reach results that can satisfy someone who is willing to use bcmath, your $i needs to be significantly larger than your $number. And that's a huge problem when you try to calculate stuff like e^346674567801
A solution is to divide the input into its integer part and its fraction part.
Then use bcpow on the integer part and bc_exp on the fraction part, which now converges from the get-go since the fraction part is smaller than 1. In the end multiply the results.
e^x = e^(intpart+fracpart) = e^intpart * e^fracpart = bcpow(e,intpart) * bc_exp(fracpart)
You could even implement it directly into the code above:
function bc_exp2($number) {
$parts = explode (".", $number);
$fracpart = "0.".$parts[1];
$result = 1;
for ($i=299; $i>0; $i--)
$result = bcadd(bcmul(bcdiv($result, $i), $fracpart), 1);
$result = bcmul(bcpow(exp(1), $parts[0]), $result);
return $result;
}
Note that exp(1) gives you a floating-point number which probably won't satisfy your needs as a bcmath user. You might want to use a value for e that is more accurate, in accordance with your bcscale setting.
3) Talking about numbers of iterations: 300 will be overkill in most situations while in some others it might not even be enough. An algorithm that takes your bcscale and $number and calculates the number of required iterations would be nice. I already have some ideas involving log(n!), but nothing concrete yet.
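One possible way to estimate the iteration count, offered only as a rough heuristic and not as the author's method: accumulate log10 of the n-th Taylor term x^n/n! and stop once it drops below 10^(-scale). The function name exp_terms_needed is made up for this sketch, and it ignores the (small) sum of the remaining tail, so adding a few extra terms as a safety margin seems prudent.

```php
<?php
// Hypothetical helper: smallest n with |x|^n / n! < 10^(-scale).
// Works on log10 of the term so nothing overflows or underflows.
function exp_terms_needed(float $x, int $scale): int
{
    $x = abs($x);
    if ($x == 0.0) {
        return 1; // every term after the constant 1 is zero
    }
    $logTerm = 0.0; // log10(x^0 / 0!) = 0
    for ($n = 1; ; $n++) {
        // term_n = term_(n-1) * x / n, expressed in log10
        $logTerm += log10($x) - log10($n);
        // Require n > x so the terms are already shrinking (see the x/n argument above).
        if ($n > $x && $logTerm < -$scale) {
            return $n;
        }
    }
}

echo exp_terms_needed(0.5, 100), "\n"; // terms for a fractional part of 0.5 at bcscale(100)
```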
4) To use this method with an arbitrary base you can use a^x = e^(x*ln(a)).
You might want to divide x into its intpart and fracpart before using bc_exp (instead of doing that within bc_exp2) to avoid unnecessary function calls.
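function bc_pow2($base,$exponent) {
$parts = explode (".", $exponent);
// No fractional part (or a zero one): plain bcpow is enough
if (!isset($parts[1]) || $parts[1] == 0) {
$result = bcpow($base,$parts[0]);
} else {
// a^x = a^intpart * e^(fracpart * ln(a))
$result = bcmul(bc_exp(bcmul(bc_ln($base), "0.".$parts[1])), bcpow($base,$parts[0]));
}
return $result;
}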
Now we only need to program bc_ln. We can use the same strategy as above:
Take the Taylor polynomial of the natural logarithm function. (Since ln(0) isn't defined, take 1 as the development point instead.)
Use Horner's method to drastically improve performance.
Turn the result into a loop of bc-operations.
Also make use of ln(x) = -ln(1/x) when handling x > 1, to guarantee convergence.
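The steps above could be sketched roughly as follows. This is an illustration under assumptions, not the author's code: the name bc_ln_sketch, the default of 300 terms and the scale of 50 are invented here, and the bcmath extension must be loaded. The series converges slowly when x is close to 0 (or, via the reflection, very large), so such inputs would need more terms or an extra range reduction.

```php
<?php
// Hypothetical helper: ln(x) from the Taylor series of ln(1+u) around 1,
// evaluated with Horner's method using bcmath string arithmetic.
function bc_ln_sketch(string $x, int $terms = 300, int $scale = 50): string
{
    if (bccomp($x, '0', $scale) <= 0) {
        throw new InvalidArgumentException('ln(x) is only defined for x > 0');
    }
    // ln(x) = -ln(1/x) maps x > 1 into (0, 1), where the series converges.
    if (bccomp($x, '1', $scale) > 0) {
        $inner = bc_ln_sketch(bcdiv('1', $x, $scale), $terms, $scale);
        return bcsub('0', $inner, $scale);
    }
    $u = bcsub($x, '1', $scale); // u lies in (-1, 0]
    // ln(1+u) = u*(1 - u*(1/2 - u*(1/3 - ...))): start from the innermost term.
    $acc = bcdiv('1', (string) $terms, $scale);
    for ($k = $terms - 1; $k >= 1; $k--) {
        $acc = bcsub(bcdiv('1', (string) $k, $scale), bcmul($u, $acc, $scale), $scale);
    }
    return bcmul($u, $acc, $scale);
}

echo bc_ln_sketch('2'), "\n"; // natural log of 2 to roughly 50 decimals
```

Each loop step costs three bc operations, the same budget as the Horner version of bc_exp.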
Useful functions (don't forget to set bcscale() before using them):
function bc_fact($f){return $f==1?1:bcmul($f,bc_fact(bcsub($f, '1')));}
function bc_exp($x,$L=50){$r=bcadd('1.0',$x);for($i=0;$i<$L;$i++){$r=bcadd($r,bcdiv(bcpow($x,$i+2),bc_fact($i+2)));}return $r;}#e^x
function bc_ln($x,$L=50){$r=0;for($i=0;$i<$L;$i++){$p=1+$i*2;$r = bcadd(bcmul(bcdiv("1.0",$p),bcpow(bcdiv(bcsub($x,"1.0"),bcadd($x,"1.0")),$p)),$r);}return bcmul("2.0", $r);}#2*Sum((1/(2i+1))*(((x-1)/(x+1))^(2i+1)))
function bc_pow($x,$p){return bc_exp(bcmul((bc_ln(($x))), $p));}

How to get log() of a very big number (PHP)?

I've looked at php-big numbers, BC Math, and GMP for dealing with very big numbers in PHP, but none seems to have a function equivalent to PHP's log(). For example, I want to do this:
$result = log($bigNumber, 2);
Would anyone know of an alternative way to get the log base 2 of an arbitrary-precision number in PHP? Maybe I've missed a function, library, or formula.
edit: php-bignumbers seems to have a log base 10 function only log10()
In general, if you want to implement your own high-precision log calculation, I'd suggest first using the basic properties of logarithms:
log_a(x) = log_b(x) / log_b(a) |=> thus you can recalculate a logarithm to any base
log(x*y) = log(x) + log(y)
log(a**n) = n*log(a)
where log_a(x) - meaning logarithm to the base a of x; log means natural logarithm
So log(1000000000000000000000.123) = 21*log(10) + log(1.000000000000000000000123)
and for a high-precision value of log(1+x),
use the algorithm referenced at
http://en.wikipedia.org/wiki/Natural_logarithm#High_precision
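As a small illustration of these identities, the sketch below splits a big decimal integer (given as a string) into a mantissa in [1, 10) and a power of ten, then changes base at the end. The function name big_log is made up for this example, and the result is only float precision (about 15 significant digits), so it shows the decomposition rather than a high-precision method.

```php
<?php
// Hypothetical helper: approximate log of a big positive integer given as a
// decimal string, in an arbitrary base (natural log by default).
function big_log(string $digits, float $base = M_E): float
{
    $digits = ltrim($digits, '0');
    if ($digits === '' || !ctype_digit($digits)) {
        throw new InvalidArgumentException('expected a positive decimal integer string');
    }
    $exponent = strlen($digits) - 1;
    // The leading 17 digits are more than a float can hold anyway.
    $mantissa = (float) (substr($digits, 0, 1) . '.' . substr($digits, 1, 16));
    // log(m * 10^e) = log(m) + e * log(10)
    $ln = log($mantissa) + $exponent * log(10.0);
    // log_a(x) = log_b(x) / log_b(a)
    return $ln / log($base);
}

echo big_log('11002930366353704069', 2), "\n";
```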
One solution combining the suggestions so far would be to use this formula:
log2($num) = log10($num) / log10(2)
in conjunction with php-big numbers since it has a pre-made log10 function.
eg, after installing the php-big numbers library, use:
$log2 = log10($bigNum) / log10(2);
Personally I've decided to use different math/logic so as to not need the log function, and just using bcmath for the big numbers.
One of the great things about base 2 is that counting and shifting become part of the tool set.
So one way to get a 'log2' of a number is to convert it to a binary string and count the bits.
You can accomplish this equivalently by dividing by 2 in a loop. But it seems to me that counting would be more efficient.
gmp_scan0 and gmp_scan1 can be used if you are counting from the right. But you'd have to somehow convert the mixed bits to all ones and zeroes.
But using gmp_strval(num, 2), you can produce a string and do a strpos on it.
If the whole value is being converted, you can do a (strlen - 1) on it.
Obviously this only works when you want an integer log.
I've had a very similar problem just recently, so I just scaled the number down considerably in order to use the built-in log to find the fractional part. (I prefer log10 for some reason... don't ask... people are strange, me too.)
I hope this is self-explanatory enough.
It returns a float value (since that's what I needed):
function gmp_log($num, $base=10, $full=true)
{
if($base == 10)
$string = gmp_strval($num);
else
$string = gmp_strval($num,$base);
$intpart = strlen($string)-1;
if(!$full)
return $intpart;
if($base ==10)
{
$string = substr_replace($string, ".", 1, 0);
$number = floatval($string);
$lg = $intpart + log10($number);
return $lg;
}
else
{
$string = gmp_strval($num);
$intpart = strlen($string)-1;
$string = substr_replace($string, ".", 1, 0);
$number = floatval($string);
$lg = $intpart + log10($number);
$lb = $lg / log10($base);
return $lb;
}
}
it's quick, it's dirty... but it works well enough to get the log of some RSA sized integers ;)
usage is straight forward as well
$N = gmp_init("11002930366353704069");
echo gmp_log($N,10)."\n";
echo gmp_log($N,10, false)."\n";
echo gmp_log($N,2)."\n";
echo gmp_log($N,16)."\n";
returns
19.041508364472
19
63.254521604973
15.813630401243

Generating unique 6 digit code

I'm generating a 6 digit code from the following characters. These will be used to stamp on stickers.
They will be generated in batches of 10k or less (before printing) and I don't envisage there will ever be more than 1-2 million total (probably much less).
After I generate the batches of codes, I'll check the MySQL database of existing codes to ensure there are no duplicates.
// exclude problem chars: B8G6I1l0OQDS5Z2
$characters = 'ACEFHJKMNPRTUVWXY4937';
$string = '';
for ($i = 0; $i < 6; $i++) {
$string .= $characters[rand(0, strlen($characters) - 1)];
}
return $string;
Is this a solid approach to generating the code?
How many possible permutations would there be? (6-digit code from a pool of 21 characters.) Sorry, math isn't my strong point.
21^6 = 85766121 possibilities.
Using a DB and storing used values is bad. If you want to fake randomness you can use the following:
Reduce to 19 possible characters and make use of the fact that the integers modulo p^k under addition form a cyclic group, generated by any element co-prime to p.
Take the group of integers modulo 7^19, using a generator co-prime to 7^19 (I'll pick 13^11, you can choose anything not divisible by 7).
Then the following works:
$previous = 0;
function generator($previous)
{
$generator = pow(13,11);
$modulus = pow(7,19); //int might be too small
$possibleChars = "ACEFHJKMNPRTUVWXY49";
$previous = ($previous + $generator) % $modulus;
$output='';
$temp = $previous;
for($i = 0; $i < 6; $i++) {
$output .= $possibleChars[$temp % 19];
$temp = intdiv($temp, 19);
}
return $output;
}
It will cycle through all possible values and look a little random unless they go digging. An even safer alternative would be multiplicative groups but I forget my math already :(
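A variant of the same stepping idea that is guaranteed to visit every code exactly once can be sketched by working modulo the number of codes itself (21^6) rather than a separate modulus. This is a swapped-in simplification, not the answer's construction; the function name code_at and the step value 48271 are assumptions for the example, and 64-bit integers are assumed so the multiplication cannot overflow.

```php
<?php
// Hypothetical helper: the $index-th code of a sequence that visits every
// possible code exactly once before repeating.
function code_at(int $index, string $alphabet = 'ACEFHJKMNPRTUVWXY4937', int $length = 6, int $step = 48271): string
{
    $base = strlen($alphabet);
    $total = $base ** $length; // 21^6 = 85,766,121 with the defaults
    if ($index < 0) {
        throw new InvalidArgumentException('index must be >= 0');
    }
    // The step must share no factor with $total, otherwise codes repeat early.
    $a = $total;
    $b = $step % $total;
    while ($b !== 0) {
        [$a, $b] = [$b, $a % $b];
    }
    if ($a !== 1) {
        throw new InvalidArgumentException('step must be co-prime to the number of codes');
    }
    // index -> index * step (mod total) is a bijection on 0..total-1.
    $value = ($index % $total) * ($step % $total) % $total;
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        $code .= $alphabet[$value % $base]; // write $value in base 21
        $value = intdiv($value, $base);
    }
    return $code;
}

echo code_at(1), " ", code_at(2), "\n";
```

Only the last index used needs to be stored, so no duplicate check against the database is required. As with the original, the pattern is easy to spot for anyone who looks closely.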
There are a lot of possible combinations, with or without repetition, so your logic would be sufficient.
Collisions would be frequent because you are using rand; see str_shuffle and randomness.
Change rand to mt_rand.
Use fast storage like memcached or redis, not MySQL, when checking.
Total possibilities
21 ^ 6 = 85,766,121
85,766,121 should be OK. To add a stored-codes check to this generation, try:
Example
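$prefix = "stamp.";
$cache = new Memcache();
$cache->addserver("127.0.0.1");
$stamp = myRand(6);
// Regenerate until the code is not already recorded in the cache
while($cache->get($prefix . $stamp)) {
$stamp = myRand(6);
}
// Record the new code so later runs can detect it as a duplicate
$cache->set($prefix . $stamp, 1);
echo $stamp;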
Function Used
function myRand($no, $str = "", $chr = 'ACEFHJKMNPRTUVWXY4937') {
$length = strlen($chr);
while($no --) {
$str .= $chr[mt_rand(0, $length - 1)];
}
return $str;
}
As Baba said, generating a string on the fly will result in tons of collisions. The closer you get to 80 million already generated codes, the harder it becomes to find an available string.
Another solution could be to generate all possible combinations once and store each of them in the database, with a boolean column that marks whether a row/token has already been used.
Then, to get one of them:
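To put rough numbers on that, here is a back-of-the-envelope sketch (the function name expected_attempts is made up). It assumes codes are drawn uniformly at random: a fresh draw collides with probability used/total, so the expected number of draws per new code is total / (total - used).

```php
<?php
// Hypothetical helper: expected random draws needed to hit an unused code
// when $used codes out of alphabetSize^length are already taken.
function expected_attempts(int $used, int $alphabetSize = 21, int $length = 6): float
{
    $total = $alphabetSize ** $length; // 21^6 = 85,766,121 with the defaults
    if ($used < 0 || $used >= $total) {
        throw new InvalidArgumentException('used must be in [0, total)');
    }
    // Geometric distribution: success probability is (total - used) / total.
    return (float) $total / ($total - $used);
}

printf("%.4f draws per code at 2 million used\n", expected_attempts(2000000));
printf("%.1f draws per code at 80 million used\n", expected_attempts(80000000));
```

Under this assumption, retries stay rare at the 1-2 million codes mentioned in the question and only become a real cost as the pool nears exhaustion.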
SELECT * FROM tokens WHERE tokenIsUsed = 0 ORDER BY RAND() LIMIT 0,1
and then mark it as already used
UPDATE tokens SET tokenIsUsed = 1 WHERE token = ...
You would have 21 ^ 6 codes = 85 766 121 ~ 85.8 million codes!
To generate them all (which would take some time), look at the selected answer to this question: algorithm that will take numbers or words and find all possible combinations.
I had the same problem, and I found very impressive open source solution:
http://www.hashids.org/php/
You can take and use it; it's also worth looking at its source code to understand what's happening under the hood.
Or... you can hash username+datetime with md5 and save it to the database; this will almost certainly generate a unique code ;)

openssl random pseudo bytes - repeatable?

I found this function, which uses openssl_random_pseudo_bytes, on php.net.
function generate_password($length = 24) {
if(function_exists('openssl_random_pseudo_bytes')) {
$password = base64_encode(openssl_random_pseudo_bytes($length, $strong));
if($strong == TRUE)
return substr($password, 0, $length); //base64 is about 33% longer, so we need to truncate the result
}
# fallback to mt_rand if php < 5.3 or no openssl available
$characters = '0123456789';
$characters .= 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz/+';
$charactersLength = strlen($characters)-1;
$password = '';
# select some random characters
for ($i = 0; $i < $length; $i++) {
$password .= $characters[mt_rand(0, $charactersLength)];
}
return $password;
}
But I want to be sure whether the value it generates is repeatable or not. I am looking for a result that is not repeatable.
I have been told that the value from mt_rand() is not repeatable, but shouldn't a random number be repeatable, as the name mt_rand itself suggests?
EDIT:
For instance, I just tested it and it generated W8hhkS+ngIl7DxxFDxEx6gSn. But if it generates the same value again in the future, then it is repeatable.
openssl_random_pseudo_bytes is not repeatable. It is true that any output of finite length must eventually repeat: for example, if you specify a length of 1 byte, then there are only 256 possible strings, so you must get a string you had before by the 257th attempt at the latest (and likely quite a bit sooner).
But you're talking practically and not mathematically, and have a default length of 24.
Yes, picking random 24-byte strings will eventually give you a string that you had before, but that's only true in the mathematical universe. In the real physical universe, there are 6277101735386680763835789423207666416102355444464034512896 possible such strings, which means that even if you generated billions of passwords this way every second, you'd still not be likely to get the same string twice in a million years.
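That claim can be sanity-checked with the usual birthday approximation, p ≈ 1 - exp(-n(n-1)/(2N)) for n draws from N equally likely values. The sketch below uses floats, so it gives orders of magnitude only; the function name collision_probability and the "one billion per second" figure are assumptions chosen to mirror the paragraph above.

```php
<?php
// Hypothetical helper: approximate probability of at least one repeat among
// $n uniform random draws from $space possible values (birthday bound).
function collision_probability(float $n, float $space): float
{
    // -expm1(-t) equals 1 - exp(-t) but stays accurate when t is tiny.
    return -expm1(-$n * ($n - 1) / (2 * $space));
}

$space = 2.0 ** 192;                        // 24 random bytes = 192 bits
$draws = 1e9 * 60 * 60 * 24 * 365 * 1e6;    // a billion per second for a million years
printf("%.3e\n", collision_probability($draws, $space));
```

Note that the function in the question returns 24 base64 characters rather than 24 raw bytes, which is 144 bits of output; the same formula applies with 2.0 ** 144 as the space.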
Any random function, including the one above, is repeatable.
Perhaps you're looking for the PHP function uniqid?

Random number/letter value

So I was wondering what some good/preferred methods are for generating a 'hex-like' value in PHP. Preferably, I would want to restrict it to 5 characters long, like this: 1e1f7
Currently this is what I am doing:
echo dechex(mt_rand(10000, 99999));
however this gives me values anywhere from 4-5 characters long, and I want to keep it at a consistent 4 or 5.
What are some ways to better generate something like this in PHP? Is there even a built in function?
Note: When I say 'hex-like' I really just mean a random combination of letters and numbers. There does not have to be a restriction on available letters.
Something simple like:
$length = 5;
$string = "";
while ($length > 0) {
$string .= dechex(mt_rand(0,15));
$length -= 1;
}
return $string;
(untested)
Or fix your mt_rand range to: mt_rand(65536, 1048575) (10000-fffff in hex) or if you like tinfoil hats: mt_rand(hexdec("10000"), hexdec("fffff"))
The advantage of the while-loop approach is that it works for arbitrarily long strings. If you'd want 32 random characters you're well over the integer limit and a single mt_rand will not work.
If you really just want random stuff, I'd propose:
$length = 5;
$string = "";
$characters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-=+!##$%^&*()[]"; // change to whatever characters you want
while ($length > 0) {
$string .= $characters[mt_rand(0,strlen($characters)-1)];
$length -= 1;
}
return $string;
(untested)
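If real hex digits at a fixed length are what you want, another option (a sketch, with random_hex as a made-up name) is to convert random bytes with bin2hex() and trim. It assumes PHP 7+ for random_bytes(); unlike the dechex(mt_rand(...)) range trick, it also allows leading zeros, so every 5-digit value is possible.

```php
<?php
// Hypothetical helper: exactly $length random hexadecimal digits.
function random_hex(int $length): string
{
    if ($length < 1) {
        throw new InvalidArgumentException('length must be >= 1');
    }
    // Each byte yields two hex digits, so round up and trim the surplus digit.
    $bytes = random_bytes(intdiv($length + 1, 2));
    return substr(bin2hex($bytes), 0, $length);
}

echo random_hex(5), "\n";
```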
echo substr(base64_encode(mt_rand(1000, mt_getrandmax())), 0, 5);
This uses more of the alphabet due to the base64, but remember that it will include upper and lower case letters along with numbers.
Why all the work? sha1 is tested and evenly distributed:
substr(sha1(uniqid('moreentropyhere')),0,5);
I have used this to generate millions and millions of unique IDs for sharding tables, with no collisions and remarkably even distribution regardless of the length you use...
You can even use the binary form of the sha1 hash for base64:
base64_encode(sha1(uniqid('moreentropyhere'), true))
To limit the characters, you can use a regex:
substr(preg_replace('~[^a-km-np-z2-9]~','',strtolower(base64_encode(sha1(uniqid(),true)))),0,6)
Here we removed 0, 1, l (letter), and o (letter) from the string, trading a little entropy to prevent confusion (and service tickets) during entry for all ages...
