This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 6 years ago.
In this PHP arithmetic operation:
<?php
$lower = floor(63.1);
$value = 63.1;
echo $value - $lower;die; ?>
I get the answer as 0.1, but when I do the same for
64.1 - 64 I get the result as
0.099999999999994
Why is it so?
Many languages, not just PHP, have this problem when running on hardware (like most hardware) that supports floating point in binary, or base-2 (with base-16 as a shorthand), but not in the base-10 that we humans use (and often assume is the only way to represent numbers).
0.1 to us humans is base-10 notation, which means 1 x 10**-1. This cannot be represented exactly in base-16. The closest is 1 x 16**-1 + 9 x 16**-2 + 9 x 16**-3 ...
Base-10 has this problem in reverse. Base-10 represents the fraction 1/3 as 0.333333... But if we were using base-3, it would be a succinct 0.1 = 1 x 3**-1.
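The expansions above can be checked with a small Python sketch. The helper name `expand` is made up for illustration; it peels off the digits of a fraction in a given base by repeated multiplication, and stops early if the expansion terminates.

```python
from fractions import Fraction

def expand(value, base, max_digits=8):
    """Return the first fractional digits of value (0 <= value < 1) in base."""
    digits = []
    frac = Fraction(value)
    for _ in range(max_digits):
        if frac == 0:          # the expansion terminated
            break
        frac *= base
        digit = int(frac)      # the integer part is the next digit
        digits.append(digit)
        frac -= digit
    return digits

print(expand(Fraction(1, 10), 16))  # 0.1 in base 16: [1, 9, 9, 9, 9, 9, 9, 9]
print(expand(Fraction(1, 3), 10))   # 1/3 in base 10: [3, 3, 3, 3, 3, 3, 3, 3]
print(expand(Fraction(1, 3), 3))    # 1/3 in base 3:  [1]
```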
In PHP, if you need better base-10 precision, use the BC Math functions.
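BC Math is PHP-specific. As a rough illustration of the same idea in another language, here is a sketch using Python's standard `decimal` module (a swapped-in analogue, not BC Math itself); the variable names are only for illustration.

```python
from decimal import Decimal

# Binary floating point: 64.1 is already an approximation before subtracting.
float_diff = 64.1 - 64
print(float_diff)            # slightly below 0.1

# Decimal values built from strings are held exactly in base 10.
decimal_diff = Decimal("64.1") - Decimal("64")
print(decimal_diff)          # 0.1
```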
(Note some hardware can represent numbers in base-10. Mainframes, for example, for 50+ years have "packed decimal" and the corresponding native opcodes to process this. Not surprising, as they were built as business machines for accurately handling base-10 things like money!).
This question already has answers here:
When should I use double instead of decimal?
(12 answers)
Closed 9 years ago.
I keep seeing people using doubles in C#. I know I read somewhere that doubles sometimes lose precision.
My question is when should I use a double and when should I use a decimal type?
Which type is suitable for money computations? (ie. greater than $100 million)
For money, always decimal. It's why it was created.
If numbers must add up correctly or balance, use decimal. This includes any financial storage or calculations, scores, or other numbers that people might do by hand.
If the exact value of numbers is not important, use double for speed. This includes graphics, physics or other physical sciences computations where there is already a "number of significant digits".
My question is when should I use a
double and when should I use a decimal
type?
decimal for when you work with values in the range of 10^(+/-28) and where you have expectations about the behaviour based on base 10 representations - basically money.
double for when you need relative accuracy (i.e. losing precision in the trailing digits on large values is not a problem) across wildly different magnitudes - double covers more than 10^(+/-300). Scientific calculations are the best example here.
which type is suitable for money
computations?
decimal, decimal, decimal
Accept no substitutes.
The most important factor is that double, being implemented as a binary fraction, cannot accurately represent many decimal fractions (like 0.1) at all and its overall number of digits is smaller since it is 64-bit wide vs. 128-bit for decimal. Finally, financial applications often have to follow specific rounding modes (sometimes mandated by law). decimal supports these; double does not.
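The rounding-mode point can be sketched outside C# as well. The example below uses Python's `decimal` module as a stand-in for a decimal type (an assumption made for illustration, not the .NET API), and shows that a binary float gives no such control.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

cent = Decimal("0.01")
amount = Decimal("2.665")

# The same exact decimal value, rounded to cents under two different rules.
half_up = amount.quantize(cent, rounding=ROUND_HALF_UP)
half_even = amount.quantize(cent, rounding=ROUND_HALF_EVEN)
print(half_up)      # 2.67
print(half_even)    # 2.66 (banker's rounding)

# With a binary float the stored value is not exactly 2.675 to begin with.
float_rounded = round(2.675, 2)
print(float_rounded)   # 2.67, not 2.68
```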
According to Characteristics of the floating-point types:
.NET Type        C# Keyword   Precision
System.Single    float        ~6-9 digits
System.Double    double       ~15-17 digits
System.Decimal   decimal      28-29 digits
The way I've been stung by using the wrong type (a good few years ago) is with large amounts:
£520,532.52 - 8 digits
£1,323,523.12 - 9 digits
You run out at 1 million for a float.
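To see the two amounts above lose their pennies, here is a sketch in Python that squeezes each value through a 32-bit float and back. `to_float32` is a helper made up for this illustration.

```python
import struct

def to_float32(x):
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

for amount in (520532.52, 1323523.12):
    print(amount, "->", to_float32(amount))
# 520532.52 -> 520532.53125
# 1323523.12 -> 1323523.125
```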
A 15 digit monetary value:
£1,234,567,890,123.45
You run out at around 9 trillion with a double. But with division and comparisons it's more complicated (I'm definitely no expert in floating point and irrational numbers - see Marc's point). Mixing decimals and doubles causes issues:
A mathematical or comparison operation
that uses a floating-point number
might not yield the same result if a
decimal number is used because the
floating-point number might not
exactly approximate the decimal
number.
When should I use double instead of decimal? has some similar and more in depth answers.
Using double instead of decimal for monetary applications is a micro-optimization - that's the simplest way I look at it.
Decimal is for exact values. Double is for approximate values.
USD: $12,345.67 USD (Decimal)
CAD: $13,617.27 (Decimal)
Exchange Rate: 1.102932 (Double)
For money: decimal. It costs a little more memory, but doesn't have rounding troubles like double sometimes has.
Definitely use integer types for your money computations.
This cannot be emphasized enough since at first glance it might seem that a floating point type is adequate.
Here is an example in Python code:
>>> amount = float(100.00) # one hundred dollars
>>> print(amount)
100.0
>>> new_amount = amount + 1
>>> print(new_amount)
101.0
>>> print(new_amount - amount)
1.0
looks pretty normal.
Now try this again with 10^20 Zimbabwe dollars:
>>> amount = float(1e20)
>>> print(amount)
1e+20
>>> new_amount = amount + 1
>>> print(new_amount)
1e+20
>>> print(new_amount - amount)
0.0
As you can see, the dollar disappeared.
If you use the integer type, it works fine:
>>> amount = int(1e20)
>>> print(amount)
100000000000000000000
>>> new_amount = amount + 1
>>> print(new_amount)
100000000000000000001
>>> print(new_amount - amount)
1
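The reason the dollar vanished in the float version is the gap between neighbouring doubles, which grows with magnitude. A short sketch, assuming Python 3.9 or later for `math.ulp`:

```python
import math

small_gap = math.ulp(100.0)   # distance from 100.0 to the next double
large_gap = math.ulp(1e20)    # distance from 1e20 to the next double

print(small_gap)               # roughly 1.4e-14
print(large_gap)               # 16384.0
print(1e20 + 1 == 1e20)        # True: 1 is far less than half the gap
print(2.0**53 + 1 == 2.0**53)  # True: above 2**53 not every integer fits
```

Python's `int` has arbitrary precision, which is why the integer version keeps every unit.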
I think that the main difference beside bit width is that decimal has exponent base 10 and double has 2
http://software-product-development.blogspot.com/2008/07/net-double-vs-decimal.html
Why do some numbers lose accuracy when stored as floating point numbers?
For example, the decimal number 9.2 can be expressed exactly as a ratio of two decimal integers (92/10), both of which can be expressed exactly in binary (0b1011100/0b1010). However, the same ratio stored as a floating point number is never exactly equal to 9.2:
32-bit "single precision" float: 9.19999980926513671875
64-bit "double precision" float: 9.199999999999999289457264239899814128875732421875
How can such an apparently simple number be "too big" to express in 64 bits of memory?
In most programming languages, floating point numbers are represented a lot like scientific notation: with an exponent and a mantissa (also called the significand). A very simple number, say 9.2, is actually this fraction:
5179139571476070 * 2^-49
Where the exponent is -49 and the mantissa is 5179139571476070. The reason it is impossible to represent some decimal numbers this way is that both the exponent and the mantissa must be integers. In other words, all floats must be an integer multiplied by an integer power of 2.
9.2 may be simply 92/10, but 10 cannot be expressed as 2^n if n is limited to integer values.
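The claim that the stored value is an integer times a power of two can be verified directly. This is a small check using Python's `fractions` and `decimal` modules, both of which convert a float to its exact stored value:

```python
from decimal import Decimal
from fractions import Fraction

stored = Fraction(9.2)   # exact value of the double nearest to 9.2

print(stored == Fraction(5179139571476070, 2**49))  # True
print(stored == Fraction(92, 10))                   # False
print(Decimal(9.2))  # 9.199999999999999289457264239899814128875732421875
```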
Seeing the Data
First, a few functions to see the components that make a 32- and 64-bit float. Gloss over these if you only care about the output (example in Python):
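import struct
from itertools import islice

def float_to_bin_parts(number, bits=64):
    """Split a float into its sign, exponent and mantissa bit strings."""
    if bits == 32:  # single precision
        int_pack = 'I'
        float_pack = 'f'
        exponent_bits = 8
        mantissa_bits = 23
        exponent_bias = 127
    elif bits == 64:  # double precision. all python floats are this
        int_pack = 'Q'
        float_pack = 'd'
        exponent_bits = 11
        mantissa_bits = 52
        exponent_bias = 1023
    else:
        raise ValueError('bits argument must be 32 or 64')
    # Reinterpret the float's bytes as an unsigned integer, then render that as a zero-padded bit string.
    bin_iter = iter(bin(struct.unpack(int_pack, struct.pack(float_pack, number))[0])[2:].rjust(bits, '0'))
    # Slice off 1 sign bit, then the exponent bits, then the mantissa bits.
    return [''.join(islice(bin_iter, x)) for x in (1, exponent_bits, mantissa_bits)]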
There's a lot of complexity behind that function, and it'd be quite the tangent to explain, but if you're interested, the important resource for our purposes is the struct module.
Python's float is a 64-bit, double-precision number. In other languages such as C, C++, Java and C#, double-precision has a separate type double, which is often implemented as 64 bits.
When we call that function with our example, 9.2, here's what we get:
>>> float_to_bin_parts(9.2)
['0', '10000000010', '0010011001100110011001100110011001100110011001100110']
Interpreting the Data
You'll see I've split the return value into three components. These components are:
Sign
Exponent
Mantissa (also called Significand, or Fraction)
Sign
The sign is stored in the first component as a single bit. It's easy to explain: 0 means the float is a positive number; 1 means it's negative. Because 9.2 is positive, our sign value is 0.
Exponent
The exponent is stored in the middle component as 11 bits. In our case, 0b10000000010. In decimal, that represents the value 1026. A quirk of this component is that you must subtract a number equal to 2^((# of bits) - 1) - 1 to get the true exponent; in our case, that means subtracting 0b1111111111 (decimal number 1023) to get the true exponent, 0b00000000011 (decimal number 3).
Mantissa
The mantissa is stored in the third component as 52 bits. However, there's a quirk to this component as well. To understand this quirk, consider a number in scientific notation, like this:
6.0221413 x 10^23
The mantissa would be the 6.0221413. Recall that the mantissa in scientific notation always begins with a single non-zero digit. The same holds true for binary, except that binary only has two digits: 0 and 1. So the binary mantissa always starts with 1! When a float is stored, the 1 at the front of the binary mantissa is omitted to save space; we have to place it back at the front of our third element to get the true mantissa:
1.0010011001100110011001100110011001100110011001100110
This involves more than just a simple addition, because the bits stored in our third component actually represent the fractional part of the mantissa, to the right of the radix point.
When dealing with decimal numbers, we "move the decimal point" by multiplying or dividing by powers of 10. In binary, we can do the same thing by multiplying or dividing by powers of 2. Since our third element has 52 bits, we divide it by 2^52 to move it 52 places to the right:
0.0010011001100110011001100110011001100110011001100110
In decimal notation, that's the same as dividing 675539944105574 by 4503599627370496 to get 0.1499999999999999. (This is one example of a ratio that can be expressed exactly in binary, but only approximately in decimal; for more detail, see: 675539944105574 / 4503599627370496.)
Now that we've transformed the third component into a fractional number, adding 1 gives the true mantissa.
Recapping the Components
Sign (first component): 0 for positive, 1 for negative
Exponent (middle component): Subtract 2^((# of bits) - 1) - 1 to get the true exponent
Mantissa (last component): Divide by 2^(# of bits) and add 1 to get the true mantissa
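The recap can be turned into a short decoder. This is a sketch only: `parts_to_value` is a name invented here, it takes the three bit strings described above, and it handles only normal numbers (not zero, subnormals, infinities or NaN).

```python
from fractions import Fraction

def parts_to_value(sign, exponent, mantissa):
    """Rebuild the exact value of a normal float from its three bit strings."""
    bias = 2 ** (len(exponent) - 1) - 1            # 1023 for 11 exponent bits
    true_exponent = int(exponent, 2) - bias
    # Divide the stored bits by 2^(# of bits), then restore the leading 1.
    true_mantissa = 1 + Fraction(int(mantissa, 2), 2 ** len(mantissa))
    value = true_mantissa * Fraction(2) ** true_exponent
    return -value if sign == "1" else value

# The three components of 9.2 as a 64-bit float.
exact = parts_to_value("0", "10000000010", "0010" + "0110" * 12)
print(exact)                # 2589569785738035/281474976710656
print(float(exact) == 9.2)  # True
```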
Calculating the Number
Putting all three parts together, we're given this binary number:
1.0010011001100110011001100110011001100110011001100110 x 10^11
Which we can then convert from binary to decimal:
1.1499999999999999 x 2^3 (inexact!)
And multiply to reveal the final representation of the number we started with (9.2) after being stored as a floating point value:
9.1999999999999993
Representing as a Fraction
9.2
Now that we've built the number, it's possible to reconstruct it into a simple fraction:
1.0010011001100110011001100110011001100110011001100110 x 10^11
Shift mantissa to a whole number:
10010011001100110011001100110011001100110011001100110 x 10^(11-110100)
Convert to decimal:
5179139571476070 x 2^(3-52)
Subtract the exponent:
5179139571476070 x 2^-49
Turn negative exponent into division:
5179139571476070 / 2^49
Multiply exponent:
5179139571476070 / 562949953421312
Which equals:
9.1999999999999993
9.5
>>> float_to_bin_parts(9.5)
['0', '10000000010', '0011000000000000000000000000000000000000000000000000']
Already you can see the mantissa is only 4 digits followed by a whole lot of zeroes. But let's go through the paces.
Assemble the binary scientific notation:
1.0011 x 10^11
Shift the decimal point:
10011 x 10^(11-100)
Subtract the exponent:
10011 x 10^-1
Binary to decimal:
19 x 2^-1
Negative exponent to division:
19 / 2^1
Multiply exponent:
19 / 2
Equals:
9.5
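Python can show the same contrast without any manual bit-slicing. `float.hex` prints the mantissa in hexadecimal and the exponent as a power of two, so an exact value such as 9.5 ends in zeros while 9.2 shows the repeating pattern cut off at 52 bits:

```python
print((9.5).hex())                # 0x1.3000000000000p+3
print((9.2).hex())                # 0x1.2666666666666p+3
print((9.5).as_integer_ratio())   # (19, 2)
print(float.fromhex("0x1.3p+3"))  # 9.5
```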
Further reading
The Floating-Point Guide: What Every Programmer Should Know About Floating-Point Arithmetic, or, Why don’t my numbers add up? (floating-point-gui.de)
What Every Computer Scientist Should Know About Floating-Point Arithmetic (Goldberg 1991)
IEEE Double-precision floating-point format (Wikipedia)
Floating Point Arithmetic: Issues and Limitations (docs.python.org)
Floating Point Binary
This isn't a full answer (mhlester already covered a lot of good ground I won't duplicate), but I would like to stress how much the representation of a number depends on the base you are working in.
Consider the fraction 2/3
In good-ol' base 10, we typically write it out as something like
0.666...
0.666
0.667
When we look at those representations, we tend to associate each of them with the fraction 2/3, even though only the first representation is mathematically equal to the fraction. The second and third representations/approximations have an error on the order of 0.001, which is actually much worse than the error between 9.2 and 9.1999999999999993. In fact, the second representation isn't even rounded correctly! Nevertheless, we don't have a problem with 0.666 as an approximation of the number 2/3, so we shouldn't really have a problem with how 9.2 is approximated in most programs. (Yes, in some programs it matters.)
Number bases
So here's where number bases are crucial. If we were trying to represent 2/3 in base 3, then
(2/3)_10 = 0.2_3
In other words, we have an exact, finite representation for the same number by switching bases! The take-away is that even though you can convert any number to any base, all rational numbers have exact finite representations in some bases but not in others.
To drive this point home, let's look at 1/2. It might surprise you that even though this perfectly simple number has an exact representation in base 10 and 2, it requires a repeating representation in base 3.
(1/2)_10 = 0.5_10 = 0.1_2 = 0.1111..._3
Why are floating point numbers inaccurate?
Because often-times, they are approximating rationals that cannot be represented finitely in base 2 (the digits repeat), and in general they are approximating real (possibly irrational) numbers which may not be representable in finitely many digits in any base.
While all of the other answers are good there is still one thing missing:
It is impossible to represent irrational numbers (e.g. π, sqrt(2), log(3), etc.) precisely!
By definition they cannot be written as a ratio of two integers, so their digits never terminate or repeat in any base. No amount of bit storage in the world would be enough to hold even one of them digit by digit. Only symbolic arithmetic is able to preserve their precision.
Although if you would limit your math needs to rational numbers only the problem of precision becomes manageable. You would need to store a pair of (possibly very big) integers a and b to hold the number represented by the fraction a/b. All your arithmetic would have to be done on fractions just like in highschool math (e.g. a/b * c/d = ac/bd).
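Python's standard `fractions` module is one implementation of this pair-of-integers idea, shown here as an illustration:

```python
from fractions import Fraction

a, b = Fraction(1, 10), Fraction(2, 10)
exact_sum = a + b
product = Fraction(1, 3) * Fraction(3, 4)

print(exact_sum == Fraction(3, 10))  # True: exact rational arithmetic
print(0.1 + 0.2 == 0.3)              # False: each float is an approximation
print(product)                       # 1/4, i.e. ac/bd reduced to lowest terms
```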
But of course you would still run into the same kind of trouble when pi, sqrt, log, sin, etc. are involved.
TL;DR
For hardware-accelerated arithmetic, only a limited number of rational numbers can be represented. Every number that is not representable is approximated. Some numbers (i.e. the irrationals) can never be represented exactly, no matter the system.
There are infinitely many real numbers (so many that you can't enumerate them), and there are infinitely many rational numbers (it is possible to enumerate them).
The floating-point representation is a finite one (like anything in a computer), so unavoidably many, many numbers are impossible to represent. In particular, 64 bits allow you to distinguish among only 18,446,744,073,709,551,616 different values (which is nothing compared to infinity). With the standard convention, 9.2 is not one of them. The values that can be represented are of the form m x 2^e for some integers m and e.
You might come up with a different numeration system, 10 based for instance, where 9.2 would have an exact representation. But other numbers, say 1/3, would still be impossible to represent.
Also note that double-precision floating-point numbers are extremely accurate. They can represent any number in a very wide range with as many as 15 exact digits. For daily life computations, 4 or 5 digits are more than enough. You will never really need those 15, unless you want to count every millisecond of your lifetime.
Why can we not represent 9.2 in binary floating point?
Floating point numbers are (simplifying slightly) a positional numbering system with a restricted number of digits and a movable radix point.
A fraction can only be expressed exactly using a finite number of digits in a positional numbering system if the prime factors of the denominator (when the fraction is expressed in its lowest terms) are factors of the base.
The prime factors of 10 are 5 and 2, so in base 10 we can represent any fraction of the form a/(2^b x 5^c).
On the other hand, the only prime factor of 2 is 2, so in base 2 we can only represent fractions of the form a/(2^b).
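The prime-factor rule above can be written as a short test. This is a sketch; `terminates` is a name made up for the example, and the value is passed as a `Fraction` so that it is exact.

```python
from fractions import Fraction
from math import gcd

def terminates(value, base):
    """True if value has a finite expansion in the given base."""
    denominator = Fraction(value).denominator   # Fraction keeps lowest terms
    # Strip every prime factor that the denominator shares with the base.
    common = gcd(denominator, base)
    while common > 1:
        while denominator % common == 0:
            denominator //= common
        common = gcd(denominator, base)
    return denominator == 1

print(terminates(Fraction("9.2"), 10))  # True:  46/5, and 5 divides 10
print(terminates(Fraction("9.2"), 2))   # False: 5 is not a factor of 2
print(terminates(Fraction("9.5"), 2))   # True:  19/2
```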
Why do computers use this representation?
Because it's a simple format to work with and it is sufficiently accurate for most purposes. Basically the same reason scientists use "scientific notation" and round their results to a reasonable number of digits at each step.
It would certainly be possible to define a fraction format, with (for example) a 32-bit numerator and a 32-bit denominator. It would be able to represent numbers that IEEE double precision floating point could not, but equally there would be many numbers that can be represented in double precision floating point that could not be represented in such a fixed-size fraction format.
However the big problem is that such a format is a pain to do calculations on. For two reasons.
If you want to have exactly one representation of each number, then after each calculation you need to reduce the fraction to its lowest terms. That means that for every operation you basically need to do a greatest common divisor calculation.
If after your calculation you end up with an unrepresentable result because the numerator or denominator is too large, you need to find the closest representable result. This is non-trivial.
Some languages do offer fraction types, but usually in combination with arbitrary precision. This avoids needing to worry about approximating fractions, but it creates its own problem: when a number passes through a large number of calculation steps, the size of the denominator, and hence the storage needed for the fraction, can explode.
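A small illustration of that growth, using Python's arbitrary-precision `Fraction`. The iteration x -> x*x + 1/3 is an arbitrary example chosen here only because its denominator squares on every step.

```python
from fractions import Fraction

x = Fraction(1, 2)
for step in range(1, 7):
    x = x * x + Fraction(1, 3)
    # Report how many bits are needed just to store the denominator.
    print(step, x.denominator.bit_length(), "bits in the denominator")
```

After only six steps the denominator is 12^32, which already needs more than 100 bits.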
Some languages also offer decimal floating point types. These are mainly used in scenarios where it is important that the results the computer gets match pre-existing rounding rules that were written with humans in mind (chiefly financial calculations). They are slightly more difficult to work with than binary floating point, but the biggest problem is that most computers don't offer hardware support for them.
Everywhere within my code, when I interact with a float value in my database, I always round() it to 2 decimals in PHP. How come I sometimes get values like 0.00000305176 within my database?
As far as I know, most, if not all, programming languages suffer from this particular issue. Put non-technically, the reason is that the machine works through operations in a binary system, from what I understand, while we expect a result in a decimal system.
If I understand it correctly, the language has no way to precisely represent values such as 0.1 or 0.2, because it doesn't understand decimals like we do.
In decimal, the places go up by 10s, 100s and 1000s, which as powers are 10^1, 10^2, 10^3; the fractional places go the same way, with 10^-1, 10^-2, 10^-3 representing 0.1, 0.01 and 0.001.
In binary, however, the base is 2, not 10. The same exponents give 2, 4, 8 for the whole places and 0.5, 0.25, 0.125, 0.0625 for the fractional places. So you see, it is hard for a machine to get an exact 0.1; it can get close, though.
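To make "it can get close" concrete, here is a sketch in Python that greedily adds up binary fractions without overshooting 0.1, and then prints the exact value a 64-bit float ends up storing:

```python
from decimal import Decimal
from fractions import Fraction

# Greedily pick negative powers of two that still fit under 1/10.
target, total, used = Fraction(1, 10), Fraction(0), []
for k in range(1, 13):
    term = Fraction(1, 2**k)
    if total + term <= target:
        total += term
        used.append(k)

print(used)          # [4, 5, 8, 9, 12]: 1/16 + 1/32 + 1/256 + 1/512 + 1/4096
print(float(total))  # 0.099853515625, close to 0.1 but still short of it
print(Decimal(0.1))  # 0.1000000000000000055511151231257827021181583404541015625
```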
More about this in this article
This question already has answers here:
PHP - Floating Number Precision [duplicate]
(8 answers)
Closed 9 years ago.
I just hit an odd issue. I have the following tables in MySQL:
table:deposits
user_id amount
1 0.50
table:withdrawls
user_id amount
1 0.01
1 0.01
1 0.01
To get the balance, I run this:
SELECT (IFNULL((SELECT sum(amount) FROM deposits WHERE user_id = 1),0) - IFNULL((SELECT sum(amount) FROM withdrawls WHERE user_id = 1),0) ) as balance
I then return it as
return (float) $row['balance']
For some weird reason, the result is float(0.47000000067055). Does anyone know why there is bizarre rounding?
Floating-point arithmetic does not represent all real numbers exactly. 0.01 is converted to a representable value, which is, in essence, a binary numeral with a limited number of significant bits. Since 1/100 cannot be represented exactly in binary with a finite number of digits (the same way 1/3 in decimal requires an infinite number of digits: .3333…), the conversion to floating-point rounds the value.
That said, I do not see exactly how 0.47000000067055 is produced. Converting 0.01 to IEEE-754 64-bit binary floating-point with correct rounding to nearest produces 0.01000000000000000020816681711721685132943093776702880859375. Subtracting that three times from .5, rounding each time, produces 0.4699999999999999733546474089962430298328399658203125. Subtracting three times the value from .5 produces the same value.
I suspect that your PHP implementation converted 0.01 to floating-point in a less than optimal way.
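One way to arrive at the exact figure in the question, offered as an assumption since the column types are not shown: if amount were stored as a single-precision (32-bit) FLOAT, each 0.01 would be rounded to 32 bits before being summed in double precision. The sketch below imitates that in Python; `to_float32` is a helper made up for the illustration.

```python
import struct

def to_float32(x):
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Assumption: both tables hold their amounts as 32-bit floats.
deposits = [to_float32(0.50)]
withdrawals = [to_float32(0.01)] * 3

balance = sum(deposits) - sum(withdrawals)
print("%.14g" % balance)   # 0.47000000067055
```

The 14 significant digits mirror PHP's default display precision.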
Apologies for my poor maths skills, I've tried to understand this to answer my own query but I'm not convinced.
We all know that PHP doesn't store Floats in base 10 but base 2.
I have a series of calculations that are using 0.5 as the only float, and in trying to understand if they will be stored as 0.500001 or 0.4999999 (for rounding purposes there is a big difference!!!) I have come to understand that 0.5 will be stored precisely in base2.
My queries are
A Have I understood this correctly?
B What other floats are stored precisely in base2? eg 0.25?
Any multiple of 1/pow(2, x) can be precisely represented as a float, as long as it fits within the float's precision and range.
That means x/2, x/4, x/8, x/16, etc. can be accurately represented.
For more information on how floating point numbers are stored see http://kipirvine.com/asm/workbook/floating_tut.htm
GMP is a good library for high-precision math.
PHP is not required to use binary floating-point. It depends on the system.
Many systems use IEEE-754 binary floating-point (sometimes incompletely or with modifications, such as flushing subnormal numbers to zero).
In IEEE-754 64-bit binary floating point, a number is exactly representable if and only if it is representable as an integer F times a power of two, 2^E, such that:
The magnitude of F is less than 2^53.
–1074 ≤ E < 972.
For example, ½ equals 1•2^–1. 1 is an integer under the integer limit, and –1 is an exponent within the exponent limits. So ½ is representable.
2^53+1 is not representable. As it is, it is an integer outside the integer limit. If you try to scale it by a power of two to bring it within the limit, you get a number that is not an integer. So there is no way to represent this value exactly in IEEE-754 64-bit binary floating-point.
1/3 and 1/10 are also not representable because no matter what power of two you scale them by, you will not produce an integer.
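A quick way to test the examples above, sketched in Python on the assumption that its float is an IEEE-754 64-bit binary value (true on common platforms). `is_exact_double` is a name invented here; it checks whether a rational value survives a round trip through a float.

```python
from fractions import Fraction

def is_exact_double(value):
    """True if the rational value is exactly representable as a 64-bit float."""
    q = Fraction(value)
    try:
        return Fraction(float(q)) == q
    except OverflowError:   # magnitude too large for a double
        return False

print(is_exact_double(Fraction(1, 2)))   # True
print(is_exact_double(2**53))            # True
print(is_exact_double(2**53 + 1))        # False
print(is_exact_double(Fraction(1, 3)))   # False
print(is_exact_double(Fraction(1, 10)))  # False
```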