What is the minimum value allowed for mt_rand()? Is it the same value for 32 bit and 64 bit machines? How could I generate a 32 bit integer using mt_rand() (note that it doesn't need to be highly random)?
BACKGROUND WHY I AM ASKING: I have a 64-bit physical development server and a 32-bit production VPS. I just realized the production server was not generating PKs spanning the full range. To figure out what was going on, I ran the following script. The 64-bit machine never matches (or at least I've never witnessed it), but the 32-bit machine matches about 50% of the time.
<?php
date_default_timezone_set('America/Los_Angeles');
ini_set('display_errors', 1);
error_reporting(E_ALL);
$count = 0;
for ($i = 0; $i <= 10000; $i++) {
    $rand = 2147483648 + mt_rand(-2147483647, 2147483647); // spans 1 to 4294967295, where 0 is reserved
    if ($rand == 2147483649) { $count++; }
}
echo('mt_getrandmax()='.mt_getrandmax().' count='.$count);
Output:
mt_getrandmax()=2147483647 count=5034
TL;DR: To get a random integer in the full range of possible integers, use:
function random_integer() {
    $min = defined('PHP_INT_MIN') ? PHP_INT_MIN : (-PHP_INT_MAX - 1);
    return mt_rand($min, -1) + mt_rand(0, PHP_INT_MAX);
}
For PHP 7, you can use random_int().
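For example, a minimal sketch (PHP 7+; random_int() accepts the full platform range and throws on failure rather than silently misbehaving):

$rand = random_int(PHP_INT_MIN, PHP_INT_MAX); // uniform over the full integer range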
Under the hood, PHP is doing this:
$number = random_number_between_0_and_0x7FFFFFFF_using_Mersenne_Twister;
$number = $min + (($max - $min + 1.0) * ($number / (0x7FFFFFFF + 1.0)));
Notice $max - $min. When max is set to the top end and min is anything negative, an overflow occurs. Therefore, the maximum range is PHP_INT_MAX. If your maximum value is PHP_INT_MAX, then your minimum is necessarily 0.
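You can see the problem directly. A quick sketch (assuming a 32-bit build, where PHP_INT_MAX is 2147483647):

var_dump(2147483647 - (-2147483647) + 1); // float(4294967295): the requested range doesn't fit in an int

PHP promotes the overflowing result to a float here; the C implementation behind mt_rand() isn't so lucky and simply overflows.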
Now for the back story. PHP implements the 32-bit Mersenne Twister algorithm, which produces random integers in the range [0, 2^31 - 1]. If you ask for any other range, PHP scales that number using a simple binning function. That binning function includes a subtraction that can overflow, which causes this problem.
Thus if you want to get a range larger than could be represented by an integer in PHP, you have to add intervals together, like so:
mt_rand(PHP_INT_MIN, -1) + mt_rand(0, PHP_INT_MAX);
Note that PHP_INT_MIN is available since PHP 7, so you'll need to calculate a suitable minimum for your environment before then.
As an aside, notice that 2^31-1 is what mt_getrandmax() returns. People mistakenly believe that on a 64-bit machine mt_getrandmax() will return 2^63-1. That's not true. mt_getrandmax() returns the maximum integer the algorithm can produce, which is always 2^31-1.
You can generate a 32 bit integer like this:
$rand = unpack("l", openssl_random_pseudo_bytes(4));
This problem is noted in the PHP docs.
This works fine on 64 bit Linux:
<?php
printf ("%08x\n", mt_rand (0, 0xFFFFFFFF));
?>
but on our 32 bit Linux development server, it's always yielding 00000000.
On that same machine, this:
<?php
printf ("%08x\n", mt_rand (0, 0xFFFFFFF0));
?>
seems to always yield either 00000000 or a number in the range fffffff2 to ffffffff. This:
<?php
printf ("%08x\n", mt_rand (0, 0xFFFFFF00));
?>
gives numbers where the last two digits vary, and so on through at least 0xF0000000.
However, this:
<?php
printf ("%08x\n", mt_rand (0, 0x7FFFFFFF));
?>
works fine.
A bug report has been filed for this.
There has been no word yet on whether PHP will fix it.
In the meantime, you can keep mt_rand()'s arguments within mt_getrandmax() and you should be fine.
Example Usage
$rand = mt_rand(1, 2147483647) + mt_rand(0, 2147483647); // spans 1 to 4294967294
I am trying to get the sum of 1 + 2 + ... + 1000000000, but I'm getting funny results in PHP and Node.js.
PHP
$sum = 0;
for ($i = 0; $i <= 1000000000; $i++) {
    $sum += $i;
}
printf("%s", number_format($sum, 0, "", "")); // 500000000067108992
Node.js
var sum = 0;
for (i = 0; i <= 1000000000; i++) {
    sum += i;
}
console.log(sum); // 500000000067109000
The correct answer can be calculated using
1 + 2 + ... + n = n(n+1)/2
Correct answer = 500000000500000000, so I decided to try another language.
GO
var sum, i int64
for i = 0; i <= 1000000000; i++ {
    sum += i
}
fmt.Println(sum) // 500000000500000000
But it works fine! So what is wrong with my PHP and Node.js code?
Perhaps this a problem of interpreted languages, and that's why it works in a compiled language like Go? If so, would other interpreted languages such as Python and Perl have the same problem?
Python works:
>>> sum(x for x in xrange(1000000000 + 1))
500000000500000000
Or:
>>> sum(xrange(1000000000+1))
500000000500000000
Python's int auto-promotes to a Python long, which supports arbitrary precision. It will produce the correct answer on 32-bit or 64-bit platforms.
This can be seen by raising 2 to a power far greater than the bit width of the platform:
>>> 2**99
633825300114114700748351602688L
You can demonstrate (with Python) that the erroneous values you are getting in PHP are there because PHP promotes the sum to a float once it exceeds PHP_INT_MAX:
>>> int(sum(float(x) for x in xrange(1000000000+1)))
500000000067108992
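The same demonstration works in PHP. A sketch (forcing float accumulation reproduces the 32-bit behaviour even on a 64-bit build):

$sum = 0.0;
for ($i = 0; $i <= 1000000000; $i++) {
    $sum += (float) $i; // accumulate in a double, as 32-bit PHP is forced to do
}
printf("%.0f\n", $sum); // 500000000067108992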
Your Go code uses integer arithmetic with enough bits to give an exact answer. I've never touched PHP or Node.js, but from the results I suspect the math is done using floating point numbers and should thus be expected not to be exact for numbers of this magnitude.
The reason is that the value of your integer variable sum exceeds the maximum integer value, and the sum you get is the result of floating-point arithmetic, which involves rounding. Since the other answers did not mention the exact limits, I decided to post them.
The max integer value for PHP for:
32-bit version is 2147483647
64-bit version is 9223372036854775807
So it means you are using either a 32-bit CPU, a 32-bit OS, or a 32-bit compiled version of PHP. It can be checked using PHP_INT_MAX. The sum would be calculated correctly if you ran it on a 64-bit machine.
The max exact integer value in JavaScript is 9007199254740992, i.e. 2^53 (taken from this question). The sum exceeds this limit.
If the integer value does not exceed these limits, then you are good. Otherwise you will have to look for arbitrary precision integer libraries.
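In PHP you can watch the promotion happen. A quick sketch:

var_dump(PHP_INT_MAX + 1);           // a float on both 32-bit and 64-bit builds
var_dump(is_float(PHP_INT_MAX + 1)); // bool(true): PHP promotes instead of wrapping around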
Here is the answer in C, for completeness:
#include <stdio.h>
int main(void)
{
unsigned long long sum = 0, i;
for (i = 0; i <= 1000000000; i++) //one billion
sum += i;
printf("%llu\n", sum); //500000000500000000
return 0;
}
The key in this case is using C99's long long data type. It provides the biggest primitive storage C can manage and it runs really, really fast. The long long type will also work on almost any 32- or 64-bit machine.
There is one caveat: compilers provided by Microsoft explicitly do not support the 14-year-old C99 standard, so getting this to run in Visual Studio is a crapshoot.
My guess is that when the sum exceeds the capacity of a native int (2^31 - 1 = 2,147,483,647), Node.js and PHP switch to a floating point representation and you start getting round-off errors. A language like Go will probably try to stick with an integer form (e.g., 64-bit integers) as long as possible (if, indeed, it didn't start with that). Since the answer fits in a 64-bit integer, the computation is exact.
This Perl script gives us the expected result:
use warnings;
use strict;
my $sum = 0;
for (my $i = 0; $i <= 1_000_000_000; $i++) {
    $sum += $i;
}
print $sum, "\n"; #<-- prints: 500000000500000000
The answer to this is "surprisingly" simple:
First - as most of you might know - a 32-bit integer ranges from −2,147,483,648 to 2,147,483,647. So what happens if PHP gets a result that is LARGER than this?
Usually, one would expect an immediate "overflow", causing 2,147,483,647 + 1 to turn into −2,147,483,648. However, that is NOT the case. If PHP encounters a larger number, it returns a FLOAT instead of an INT.
If PHP encounters a number beyond the bounds of the integer type, it will be interpreted as a float instead. Also, an operation which results in a number beyond the bounds of the integer type will return a float instead.
http://php.net/manual/en/language.types.integer.php
That said, and knowing that PHP's FLOAT implementation follows the IEEE 754 double precision format, it means that PHP can handle integers of up to 53 bits (a 52-bit mantissa plus the implicit leading bit) without losing precision - even on a 32-bit system.
So, at the point where your sum hits 9,007,199,254,740,992 (which is 2^53), the float value returned by PHP's math will no longer be precise enough.
E:\PHP>php -r "$x=bindec(\"100000000000000000000000000000000000000000000000000000\"); echo number_format($x,0);"
9,007,199,254,740,992
E:\PHP>php -r "$x=bindec(\"100000000000000000000000000000000000000000000000000001\"); echo number_format($x,0);"
9,007,199,254,740,992
E:\PHP>php -r "$x=bindec(\"100000000000000000000000000000000000000000000000000010\"); echo number_format($x,0);"
9,007,199,254,740,994
This example shows the point where PHP loses precision. First, the least significant bit gets dropped, causing the first two expressions to produce an equal number - which they aren't.
From that point on, all the math goes wrong when working with the default data types.
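A one-line check makes the same point. A sketch (2^53 is the first integer a double cannot distinguish from its successor):

var_dump(9007199254740992.0 + 1 === 9007199254740992.0); // bool(true): 2^53 + 1 rounds back down to 2^53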
•Is it the same problem for other interpreted language such as Python or Perl?
I don't think so. I think this is a problem of languages that have no type safety. While an integer overflow as mentioned above WILL happen in every language that uses fixed data types, languages without type safety might try to catch it with other data types. However, once they hit their "natural" (system-given) border, they might return anything but the right result.
Each language may also treat such a scenario differently.
The other answers already explained what is happening here (floating point precision as usual).
One solution is to use an integer type big enough, or to hope the language will choose one if needed.
The other solution is to use a summation algorithm that knows about the precision problem and works around it. Below you find the same summation, first with a 64-bit integer, then with 64-bit floating point, and then with floating point again, but using the Kahan summation algorithm.
Written in C#, but the same holds for other languages, too.
long sum1 = 0;
for (int i = 0; i <= 1000000000; i++)
{
    sum1 += i;
}
Console.WriteLine(sum1.ToString("N0"));
// 500.000.000.500.000.000

double sum2 = 0;
for (int i = 0; i <= 1000000000; i++)
{
    sum2 += i;
}
Console.WriteLine(sum2.ToString("N0"));
// 500.000.000.067.109.000

double sum3 = 0;
double error = 0;
for (int i = 0; i <= 1000000000; i++)
{
    double corrected = i - error;
    double temp = sum3 + corrected;
    error = (temp - sum3) - corrected;
    sum3 = temp;
}
Console.WriteLine(sum3.ToString("N0"));
// 500.000.000.500.000.000
The Kahan summation gives a beautiful result. It does of course take a lot longer to compute. Whether you want to use it depends a) on your performance vs. precision needs, and b) how your language handles integer vs. floating point data types.
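For reference, here is a direct PHP port of the Kahan loop above. A sketch (it forces float accumulation; the compensated result lands on the exact sum, which happens to be representable as a double):

$sum = 0.0;
$error = 0.0;
for ($i = 0; $i <= 1000000000; $i++) {
    $corrected = $i - $error;             // compensate with the error from the previous step
    $temp = $sum + $corrected;            // add; low-order bits may be lost here
    $error = ($temp - $sum) - $corrected; // recover exactly what was lost
    $sum = $temp;
}
printf("%.0f\n", $sum); // 500000000500000000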
If you have 32-bit PHP, you can calculate it with bc:
<?php
$value = 1000000000;
echo bcdiv( bcmul( $value, $value + 1 ), 2 );
//500000000500000000
In JavaScript you have to use an arbitrary precision number library, for example BigInteger:
var value = new BigInteger(1000000000);
console.log( value.multiply(value.add(1)).divide(2).toString());
//500000000500000000
Even with languages like Go and Java you will eventually have to use an arbitrary precision number library; your number just happened to be small enough for 64 bits but too big for 32 bits.
In Ruby:
sum = 0
1.upto(1000000000).each{|i|
sum += i
}
puts sum
Prints 500000000500000000, but takes a good 4 minutes on my 2.6 GHz Intel i7.
Magnuss and Jaunty have a much more Ruby-like solution:
1.upto(1000000000).inject(:+)
To run a benchmark:
$ time ruby -e "puts 1.upto(1000000000).inject(:+)"
ruby -e "1.upto(1000000000).inject(:+)" 128.75s user 0.07s system 99% cpu 2:08.84 total
I use node-bigint for big integer stuff:
https://github.com/substack/node-bigint
var bigint = require('bigint');
var sum = bigint(0);
for(var i = 0; i <= 1000000000; i++) {
sum = sum.add(i);
}
console.log(sum);
It's not as quick as something that can use native 64-bit stuff for this exact test, but if you get into bigger numbers than 64-bit, it uses libgmp under the hood, which is one of the faster arbitrary precision libraries out there.
Took ages in Ruby, but it gives the correct answer:
(1..1000000000).reduce(:+)
=> 500000000500000000
To get the correct result in PHP, I think you'd need to use the BC math functions: http://php.net/manual/en/ref.bc.php
Here is the correct answer in Scala. You have to use Longs otherwise you overflow the number:
println((1L to 1000000000L).reduce(_ + _)) // prints 500000000500000000
There's actually a cool trick to this problem.
Assume it was 1-100 instead.
1 + 2 + 3 + 4 + ... + 50 +
100 + 99 + 98 + 97 + ... + 51
= (101 + 101 + 101 + 101 + ... + 101) = 101*50
Formula:
For N= 100:
Output = N/2*(N+1)
For N = 1e9:
Output = N/2*(N+1)
This is much faster than looping through all of that data. Your processor will thank you for it. And here is an interesting story regarding this very problem:
http://www.jimloy.com/algebra/gauss.htm
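In PHP, that formula looks like this. A sketch (fine on 64-bit builds, where N*(N+1) is about 10^18 and still fits in a 64-bit integer; on 32-bit builds use the bc version shown earlier):

$n = 1000000000;
echo $n * ($n + 1) / 2; // 500000000500000000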
This gives the proper result in PHP by forcing the integer cast (note that this only helps on 64-bit PHP, where the integer type is wide enough to hold the sum):
$sum = (int) $sum + $i;
Common Lisp is one of the fastest interpreted* languages and handles arbitrarily large integers correctly by default. This takes about 3 seconds with SBCL:
* (time (let ((sum 0)) (loop :for x :from 1 :to 1000000000 :do (incf sum x)) sum))
Evaluation took:
3.068 seconds of real time
3.064000 seconds of total run time (3.044000 user, 0.020000 system)
99.87% CPU
8,572,036,182 processor cycles
0 bytes consed
500000000500000000
By interpreted, I mean, I ran this code from the REPL, SBCL may have done some JITing internally to make it run fast, but the dynamic experience of running code immediately is the same.
I don't have enough reputation to comment on @postfuturist's Common Lisp answer, but it can be optimized to complete in ~500 ms with SBCL 1.1.8 on my machine:
CL-USER> (compile nil '(lambda ()
(declare (optimize (speed 3) (space 0) (safety 0) (debug 0) (compilation-speed 0)))
(let ((sum 0))
(declare (type fixnum sum))
(loop for i from 1 to 1000000000 do (incf sum i))
sum)))
#<FUNCTION (LAMBDA ()) {1004B93CCB}>
NIL
NIL
CL-USER> (time (funcall *))
Evaluation took:
0.531 seconds of real time
0.531250 seconds of total run time (0.531250 user, 0.000000 system)
100.00% CPU
1,912,655,483 processor cycles
0 bytes consed
500000000500000000
Racket v 5.3.4 (MBP; time in ms):
> (time (for/sum ([x (in-range 1000000001)]) x))
cpu time: 2943 real time: 2954 gc time: 0
500000000500000000
Works fine in Rebol:
>> sum: 0
== 0
>> repeat i 1000000000 [sum: sum + i]
== 500000000500000000
>> type? sum
== integer!
This was using Rebol 3, which, despite being compiled as 32-bit, uses 64-bit integers (unlike Rebol 2, which used 32-bit integers).
I wanted to see what happened in CF Script
<cfscript>
ttl = 0;
for (i=0;i LTE 1000000000 ;i=i+1) {
ttl += i;
}
writeDump(ttl);
abort;
</cfscript>
I got 5.00000000067E+017
This was a pretty neat experiment. I'm fairly sure I could have coded this a bit better with more effort.
ActivePerl v5.10.1 on 32bit windows, intel core2duo 2.6:
$sum = 0;
for ($i = 0; $i <= 1000000000; $i++) {
    $sum += $i;
}
print $sum."\n";
result: 5.00000000067109e+017 in 5 minutes.
With "use bigint" script worked for two hours, and would worked more, but I stopped it. Too slow.
For the sake of completeness, in Clojure (beautiful but not very efficient):
(reduce + (take 1000000000 (iterate inc 1))) ; => 500000000500000000
AWK:
BEGIN { s = 0; for (i = 1; i <= 1000000000; i++) s += i; print s }
produces the same wrong result as PHP:
500000000067108992
It seems AWK uses floating point when the numbers are really big, so at least the answer is the right order-of-magnitude.
Test runs:
$ awk 'BEGIN { s = 0; for (i = 1; i <= 100000000; i++) s += i; print s }'
5000000050000000
$ awk 'BEGIN { s = 0; for (i = 1; i <= 1000000000; i++) s += i; print s }'
500000000067108992
Category: other interpreted languages.
Tcl:
With Tcl 8.4 or older, it depends on whether Tcl was compiled as 32-bit or 64-bit (8.4 is end of life).
With Tcl 8.5 or newer, which has arbitrary precision integers, it will display the correct result:
proc test limit {
    set result 0
    for {set i 1} {$i <= $limit} {incr i} {
        incr result $i
    }
    return $result
}
test 1000000000
I put the test inside a proc to get it byte-compiled.
For the PHP code, the answer is here:
The size of an integer is platform-dependent, although a maximum value of about two billion is the usual value (that's 32 bits signed). 64-bit platforms usually have a maximum value of about 9E18. PHP does not support unsigned integers. Integer size can be determined using the constant PHP_INT_SIZE, and maximum value using the constant PHP_INT_MAX since PHP 4.4.0 and PHP 5.0.5.
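A quick probe to see which case applies to your build. A sketch:

echo PHP_INT_SIZE; // 4 on 32-bit builds, 8 on 64-bit builds
echo PHP_INT_MAX;  // 2147483647 or 9223372036854775807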
Harbour:
proc Main()
    local sum := 0, i
    for i := 0 to 1000000000
        sum += i
    next
    ? sum
    return
Results in 500000000500000000.
(on both windows/mingw/x86 and osx/clang/x64)
Erlang works:
from_sum(From, Max) ->
    from_sum(From, Max, Max).

from_sum(From, Max, Sum) when From =:= Max ->
    Sum;
from_sum(From, Max, Sum) when From =/= Max ->
    from_sum(From+1, Max, Sum+From).
Results: 41> useless:from_sum(1,1000000000).
500000000500000000
Funny thing, PHP 5.5.1 gives 499999999500000000 (in ~ 30s), while Dart2Js gives 500000000067109000 (which is to be expected, since it's JS that gets executed). CLI Dart gives the right answer ... instantly.
Erlang gives the expected result too.
sum.erl:
-module(sum).
-export([iter_sum/2]).
iter_sum(Begin, End) -> iter_sum(Begin,End,0).
iter_sum(Current, End, Sum) when Current > End -> Sum;
iter_sum(Current, End, Sum) -> iter_sum(Current+1,End,Sum+Current).
And using it:
1> c(sum).
{ok,sum}
2> sum:iter_sum(1,1000000000).
500000000500000000
Smalltalk:
(1 to: 1000000000) inject: 0 into: [:subTotal :next | subTotal + next ].
"500000000500000000"
I have a class for computing the Luhn checksum for a number. It takes an integer as input and returns true or false to indicate validity, or it throws an exception if an inappropriate data type is given as input.
The code is as follows (The full source is on GitHub):
class Luhn extends abstr\Prop implements iface\Prop
{
    /**
     * Test that the given data passes a Luhn check.
     *
     * @return bool True if the data passes the Luhn check
     * @throws \InvalidArgumentException
     * @see http://en.wikipedia.org/wiki/Luhn_algorithm
     */
    public function isValid ()
    {
        $data = $this -> getData ();
        $valid = false;
        switch (gettype ($data))
        {
            case 'NULL':
                $valid = true;
                break;
            case 'integer':
                // Get the sequence of digits that make up the number under test
                $digits = array_reverse (array_map ('intval', str_split ((string) $data)));
                // Walk the array, doubling the value of every second digit
                for ($i = 0, $count = count ($digits); $i < $count; $i++)
                {
                    if ($i % 2)
                    {
                        // Double the digit
                        if (($digits [$i] *= 2) > 9)
                        {
                            // Handle the case where the doubled digit is over 9
                            $digits [$i] -= 10;
                            $digits [] = 1;
                        }
                    }
                }
                // The Luhn is valid if the sum of the digits ends in a 0
                $valid = ((array_sum ($digits) % 10) === 0);
                break;
            default:
                // An attempt was made to apply the check to an invalid data type
                throw new \InvalidArgumentException (__CLASS__ . ': This property cannot be applied to data of type ' . gettype ($data));
                break;
        }
        return ($valid);
    }
}
I also built a full unit test to exercise the class.
My main development environment is a workstation running 64-bit builds of PHP 5.3 and Apache under OSX Lion. I also use a laptop running 64-bit builds of Apache and PHP 5.4, as well as a Ubuntu Linux virtual machine running 64-bit Apache and PHP 5.3. The unit test was fine on all of these, as expected.
I thought I could use some spare time during lunch at work (Windows 7, XAMPP, 32-bit PHP 5.3) to work on the project this class is a part of, but the first thing I ran into was a failing unit test.
The problem is that on a 32 bit build of PHP the number gets silently cast to float if it exceeds the limits of a 32 bit integer. My proposed solution is to have a special case for float. If the input type is float, and its value is outside the range that can be expressed in int (PHP_INT_MIN .. PHP_INT_MAX) then I'll number_format() it to get it back into a string of digits. If it's within the range of an integer then I'll throw an exception.
However, this leads to its own problem. I know that the further away you get from 0 with a floating point number, the less resolution it has (the smaller the increment between a given number and the next representable number gets). How far away from 0 do you have to get before you can't reliably represent the integer part any more? (I'm not sure if that's really clear, so for example, say the limit is 1000 before the resolution drops below the difference between one int and the next. I could enter a number bigger than 1000, say 1001, but the limitations of floating point numbers mean it ends up being 1001.9, and rounding it yields 1002, meaning I've lost the value I was interested in.)
Is it possible to detect when the loss in resolution will become an issue for a floating point number?
EDIT TO ADD: I suppose I could modify the extension to accept a string instead of a numeric type and then verify that it contains only digits with a regex or some other similar technique, but as Luhn-checkable data is a string of digits that doesn't feel right to me, somehow. There are extensions for PHP that can handle bignums, but as they're extensions and this is meant to be a piece of framework code that could potentially be deployed over a wide range of configurations, I'd rather not rely on the presence of such extensions if at all possible. Besides, none of the above addresses the issue that if you give PHP a big int it silently converts it to float. I need a way of detecting that this has happened.
If you need precision, you should not use floats.
Instead, especially as you want to work with integers (if I understand correctly), you could try working with the bc* functions: BCMath Arbitrary Precision Mathematics.
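For the detection part of the question, one approach is to reject anything that has already been promoted to float before it reaches the Luhn check. A sketch (luhn_input_ok() is a hypothetical helper, not part of the class above):

function luhn_input_ok($data) {
    if (is_int($data)) {
        return true; // never silently promoted, so the digits are intact
    }
    if (is_float($data)) {
        return false; // a literal too big for int arrived here; digits may already be lost
    }
    return is_string($data) && ctype_digit($data); // safest: pass the digits as a string
}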
If you need precision, you should not use floats.
Instead, especially as you want to work with integers (if I understand correctly), you could try working with the gmp* functions: GMP - GNU Multiple Precision
If you cannot work with that extension you might get some additional ideas from
PEAR Big Integer - Pure-PHP arbitrary precision integer arithmetic library
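A minimal sketch with ext/gmp, if it is available (the number stays a string and never passes through PHP's int or float types):

$n = gmp_init('123456789012345678901234567890'); // digit string of arbitrary length
echo gmp_strval(gmp_mod($n, 10)); // 0: the last digit, extracted without precision loss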