I am trying very hard to develop a much deeper understanding of programming as a whole. I understand the textbook definition of "binary", but what I don't understand is exactly how it applies to my day-to-day programming.
The concept of "binary numbers" vs .. well... "regular" numbers, is completely lost on me despite my best attempts to research and understand the concept.
I am someone who originally taught myself to program by building stupid little adventure games in early DOS Basic and C, and now currently does most (er, all) of my work in PHP, JavaScript, Rails, and other "web" languages. I find that so much of this logic is abstracted out in these higher level languages that I ultimately feel I am missing many of the tools I need to continue progressing and writing better code.
If anyone could point me in the direction of a good, solid practical learning resource, or explain it here, it would be massively appreciated.
I'm not so much looking for the 'definition' (I've read the wikipedia page a few times now), but more some direction on how I can incorporate this new-found knowledge of exactly what binary numbers are into my day to day programming, if at all. I'm primarily writing in PHP these days, so references to that language specifically would be very helpful.
Edit: As pointed out, binary is a representation of a number, not a different system altogether. So to revise my question: what are the benefits (if any) of using a binary representation of numbers rather than just... numbers?
Binary trees (one of your tags), particularly binary search trees, are practical for some everyday programming scenarios (e.g. sorting).
Binary numbers are essential to computing fundamentals, but are rarely used directly in higher-level languages.
Binary numbers are useful in understanding bounds, such as the largest unsigned number of various widths (e.g. 2^32 - 1 for 32-bit), or the largest and smallest signed numbers for two's complement (the system normally used). For example, why is the smallest signed two's complement 32-bit number -2^31 but the largest 2^31 - 1? Even odder at first glance, -(-2^31) (negating the smallest number), yields itself. (Hint, try it with 2-bit numbers, since the analysis is the same).
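To see that wrap-around concretely, here is a minimal PHP sketch (the toSigned and negate helpers are my own illustration, not built-ins) that simulates an N-bit two's complement machine by masking:
function toSigned($x, $bits) {
    $mask = (1 << $bits) - 1;
    $x &= $mask;
    // If the sign bit is set, subtract 2^bits to get the negative value.
    return ($x & (1 << ($bits - 1))) ? $x - (1 << $bits) : $x;
}
function negate($x, $bits) {
    // Two's complement negation: invert the bits and add one, within the width.
    return toSigned((~$x + 1) & ((1 << $bits) - 1), $bits);
}
echo toSigned(0b10, 2), "\n"; // -2, the smallest 2-bit signed value
echo toSigned(0b01, 2), "\n"; //  1, the largest 2-bit signed value
echo negate(-2, 2), "\n";     // -2 again: negating the smallest value wraps back to itself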
Another is basic information theory. How many bits do I need to represent 10000 possibilities (log2 10000, rounded up)? It's also applicable to cryptography, but you're probably not getting into that much yet.
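For example, a one-liner you can check in PHP (the math itself is language-independent):
echo ceil(log(10000, 2)), "\n"; // 14 bits, since 2^13 = 8192 < 10000 <= 16384 = 2^14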
Don't expect to use binary everyday, but do develop a basic understanding for these and other reasons.
If you explore pack and bitwise operators, you may find other uses. In particular, many programmers don't know when they can use XOR (which can be understood by looking at a truth table involving the two binary digits).
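For instance, here is a small PHP sketch of the XOR truth table and one typical use, toggling a single flag bit:
foreach (array(array(0, 0), array(0, 1), array(1, 0), array(1, 1)) as $pair) {
    list($a, $b) = $pair;
    echo "$a XOR $b = ", $a ^ $b, "\n"; // 0, 1, 1, 0
}
$flags = 0b0101;
$flags ^= 0b0100;           // XOR with a mask toggles exactly that bit
echo decbin($flags), "\n";  // 1 (the third bit was switched off)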
Here is a brief history to help your understanding and I will get to your question at the end.
Binary is a little weird because we are so used to using a base 10 number system. This is because humans have 10 fingers; when they ran out, they had to use a stick, toe or something else to represent 10 fingers. This is not true for all cultures, though: some hunter-gatherer populations (such as the Australian Aboriginals) used a base 5 number system (one hand), as producing large numbers was not necessary.
Anyway, the reason base 2 is important in computing is because a circuit can have two states, low voltage and high voltage; think of this like a switch (on and off). Place 8 of these switches together and you have 1 byte (8 bits). The best way to think of a bit is 1 = on and 0 = off, which is exactly how it is represented in binary. You might then have something like 10011100, where the 1s are high volts and the 0s are low volts. In early computers, physical switches were used, which the operator could turn on and off to create a program.
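You can play with that representation directly in PHP; a tiny sketch:
echo bindec('10011100'), "\n"; // 156: the decimal value of those eight switches
echo decbin(156), "\n";        // 10011100: and back again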
Nowadays, you will rarely need to use binary numbers in modern programming. The only exceptions I can think of are bitwise operations, which are very fast and efficient ways of solving certain problems, or maybe some form of low-level hacking. All I can suggest is to learn the basics of it, but don't worry about actually using it in everyday programming.
There are two possible meanings of "binary (versus regular) numbers".
Given your use of the word "regular", probably not this one:
Binary as a storage format: a number stored as compact bytes, say 4 bytes for an integer or 8 bytes for a double (SQL INT or DOUBLE), versus a "regular" number stored as text, one byte per digit (SQL VARCHAR).
But in our case:
Representation in different numbering base: 101 binary = 1*4 + 0*2 + 1*1 = 5.
This lends itself to compact encodings of yes/no states:
Given 1 | x = 1 and 0 | x = x (bitwise OR, which acts like a binary "plus"), and 0 & x = 0 and 1 & x = x (bitwise AND, like a binary "times"):
$sex_male = 0;        // bit 0 clear
$sex_female = 1;      // bit 0 set
$employee_no = 0*2;   // bit 1 clear
$employee_yes = 1*2;  // bit 1 set
$has_no_email = 0*4;  // bit 2 clear
$has_email = 1*4;     // bit 2 set
$code = $sex_female | $employee_no | $has_email;  // pack three answers into one integer
if (($code & $sex_female) != 0) print "female";   // test a single flag with AND
To me, one of the biggest impacts of a binary representation of numbers is the difference between floating point values and our "ordinary" (base-10 or decimal) notion of fractions, decimals, and real numbers.
The vast majority of fractions cannot be exactly represented in binary. Something like 0.4 seems like it's not a hard number to represent; it's only got one place after the decimal, it's the same as two fifths or 40%, what's so tough? But most programming environments use binary floating point, and cannot represent this number exactly! Even if the computer displays 0.4, the actual value used by the computer is not exactly 0.4. So you get all kinds of unintuitive behavior when it comes to rounding and arithmetic.
Note that this "problem" is not unique to binary. For example, using our own base-10 decimal notation, how do we represent one third? Well, we can't do it exactly. 0.333 is not exactly the same as one third. 0.333333333333 is not exactly one third either. We can get pretty close, and the more digits you let us use, the closer we can get. But we can never, ever be exactly right, because it would require an infinite number of digits. This is fundamentally what's happening when binary floating point does something we don't expect: The computer doesn't have an infinite number of binary digits (bits) to represent our number, and so it can't get it exactly right, but gives us the closest thing it can.
More of an observation from experience than a solid answer:
Actually, you don't need binary much, because it's pretty well abstracted away in programming nowadays (depending on what you program). Binary has more use in systems design and networking.
Some things my colleagues at school do in their majors:
processor instruction sets and operations (op codes)
networking and data transmission
hacking (especially memory "tampering"; mostly hex, but still related)
memory allocation (in assembly, we use hex but sometimes binary)
You need to know how these "regular numbers" are represented and understood by the machine - hence all those "conversion lessons" like hex to binary, binary to octal, etc. Machines only read binary.
With Python you can explore bitwise operations and manipulations with the command line. Personally I've used bit operations to examine an obscure compression algorithm used in packet radio.
Bitwise Operators
Bit Manipulations
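The same kind of exploration works in PHP. Here is a hedged sketch (the bit layout is invented for illustration) of unpacking bit fields from a single byte, the sort of thing a packed protocol header needs:
$byte  = 0b10110110;            // one received byte
$high3 = ($byte >> 5) & 0b111;  // top three bits:   0b101   = 5
$low5  = $byte & 0b11111;       // bottom five bits: 0b10110 = 22
echo $high3, " ", $low5, "\n";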
Interesting question. Although you are a "lowly web guy", it's great that you are curious about how binary affects you. To help, I would suggest picking up a low-level language and playing around with it - something along the lines of C and/or assembly. As far as PHP goes, try looking through the PHP source code to see how it's implemented.
Here's a quality link on binary/hexadecimal: http://maven.smith.edu/~thiebaut/ArtOfAssembly/artofasm.html
Good luck and happy learning :)
As a web guy, you no doubt understand the importance of Unicode. Unicode is represented in hexadecimal format when viewing character sets not supported by your system. Hexadecimal also appears in RGB values and memory addresses. Hexadecimal is, among other things, a shorthand for writing out long binary strings.
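A small PHP sketch of why the shorthand works: each hex digit corresponds to exactly four bits, which is also why RGB channels unpack so cleanly:
$color = 0xFF8800;          // an orange-ish RGB value
echo decbin($color), "\n";  // 111111111000100000000000 (24 bits, 8 per channel)
printf("R=%d G=%d B=%d\n",
    ($color >> 16) & 0xFF,  // 255
    ($color >> 8) & 0xFF,   // 136
    $color & 0xFF);         // 0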
Finally, binary numbers work as the basis of truthiness: 1 is true, while 0 is always false.
Go check out a book on digital fundamentals, and try your hand at boolean logic. You'll never look at if a and not b or c the same way again!
Related
I am generating the Fibonacci series for large numbers in PHP. For example, with n = 100, past the 92nd term it starts giving values like 1.2200160415122E+19.
Please help me to understand how to handle such big numbers in PHP.
At first sight I'd say this has nothing to do with the PHP language. It is a general issue with floating point representation: you simply do not have the exactness of fixed point (integer) arithmetic. For a task like Fibonacci you need exact integers, so floating point is unsuitable. No way around that.
However, there are a number of classes and extensions for PHP that allow arithmetic with large integers. I suggest you take a look at those:
BC Math
GMP
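For example, a minimal sketch using BC Math (assuming the extension is enabled); the GMP version would look much the same with gmp_add:
function fib_bc($n) {
    $a = '0';               // F(0)
    $b = '1';               // F(1)
    for ($i = 0; $i < $n; $i++) {
        $next = bcadd($a, $b);
        $a = $b;
        $b = $next;
    }
    return $a;              // F(n) as an exact decimal string
}
echo fib_bc(100), "\n";     // 354224848179261915075, no float rounding involved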
When developing a calculator in PHP 5+, can the programmer get results without thinking about rounding errors and the imprecise representation of decimal numbers?
I mean built-in ways to get, for example, results with a given accuracy.
Or does every operation like $x = $y * $z + 0.77 require additional checks in the algorithm, such as rounding, to avoid errors?
So are errors and imprecision handled by the PHP core, or is that up to the developer?
PHP 7 has 64-bit support; does that improve anything in this direction (accuracy of calculations)?
When developing a calculator in PHP 5+, can the programmer get results without thinking about rounding errors and the imprecise representation of decimal numbers?
Simple answer is No.
Doing decimal arithmetic on a binary machine like a computer can and will always produce some kind of error.
Many details are explained in this article, it's quite a complex subject.
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
Simple example from the decimal world:
The fraction 1/3 cannot be expressed as a finite decimal. We often approximate it by writing 0.333333333..., but that's not 100% accurate.
Same applies to binary number arithmetics.
[Edit]
If you need a very high degree of precision then you should have a look at PHP's BC Math functions: http://php.net/manual/en/ref.bc.php
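For example, a hedged sketch of the $x = $y * $z + 0.77 operation from the question, done on BC Math strings instead of floats (the operand values are made up):
bcscale(2);                          // work with two decimal places
$y = '1.10';
$z = '3.00';
$x = bcadd(bcmul($y, $z), '0.77');   // "4.07", exact decimal arithmetic
echo $x, "\n";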
I don't think there's anything in PHP 7 that will change this. As @maxhb points out, floating point arithmetic is inherently imprecise, so you must mind rounding errors.
The 64-bit capability does not fundamentally change this.
It's kind of a common knowledge that (most) floating point numbers are not stored precisely (when IEEE-754 format is used). So one shouldn't do this:
0.3 - 0.2 === 0.1; // very wrong
... as it will result in false, unless some specific arbitrary-precision type/class was used (BigDecimal in Java/Ruby, BCMath in PHP, Math::BigInt/Math::BigFloat in Perl, to name a few) instead.
Yet I wonder why, when one tries to print the result of this expression, 0.3 - 0.2, scripting languages (Perl and PHP) give 0.1, but "virtual-machine" ones (Java, JavaScript and Erlang) give something more like 0.09999999999999998 instead?
And why is it also inconsistent in Ruby? version 1.8.6 (codepad) gives 0.1, version 1.9.3 (ideone) gives 0.0999...
As for PHP, the output is related to the precision ini setting:
ini_set('precision', 15);
print 0.3 - 0.2; // 0.1
ini_set('precision', 17);
print 0.3 - 0.2; //0.099999999999999978
This may also be the cause in other languages.
Floating-point numbers are printed differently because printing is done for different purposes, so different choices are made about how to do it.
Printing a floating-point number is a conversion operation: A value encoded in an internal format is converted to a decimal numeral. However, there are choices about the details of the conversion.
(A) If you are doing precise mathematics and want to see the actual value represented by the internal format, then the conversion must be exact: It must produce a decimal numeral that has exactly the same value as the input. (Each floating-point number represents exactly one number. A floating-point number, as defined in the IEEE 754 standard, does not represent an interval.) At times, this may require producing a very large number of digits.
(B) If you do not need the exact value but do need to convert back and forth between the internal format and decimal, then you need to convert it to a decimal numeral precisely (and accurately) enough to distinguish it from any other result. That is, you must produce enough digits that the result is different from what you would get by converting numbers that are adjacent in the internal format. This may require producing a large number of digits, but not so many as to be unmanageable.
(C) If you only want to give the reader a sense of the number, and do not need to produce the exact value in order for your application to function as desired, then you only need to produce as many digits as are needed for your particular application.
Which of these should a conversion do?
Different languages have different defaults because they were developed for different purposes, or because it was not expedient during development to do all the work necessary to produce exact results, or for various other reasons.
(A) requires careful code, and some languages or implementations of them do not provide, or do not guarantee to provide, this behavior.
(B) is required by Java, I believe. However, as we saw in a recent question, it can have some unexpected behavior. (65.12 is printed as “65.12” because the latter has enough digits to distinguish it from nearby values, but 65.12-2 is printed as “63.120000000000005” because there is another floating-point value between it and 63.12, so you need the extra digits to distinguish them.)
(C) is what some languages use by default. It is, in essence, wrong, since no single value for how many digits to print can be suitable for all applications. Indeed, we have seen over decades that it fosters continuing misconceptions about floating-point, largely by concealing the true values involved. It is, however, easy to implement, and hence is attractive to some implementors. Ideally, a language should by default print the correct value of a floating-point number. If fewer digits are to be displayed, the number of digits should be selected only by the application implementor, hopefully including consideration of the appropriate number of digits to produce the desired results.
Worse, some languages, in addition to not displaying the actual value or enough digits to distinguish it, do not even guarantee that the digits produced are correct in some sense (such as being the value you would get by rounding the exact value to the number of digits shown). When programming in an implementation that does not provide a guarantee about this behavior, you are not doing engineering.
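A rough PHP illustration of the three kinds of conversion (my own sketch, not part of the answer above):
$x = 0.1;
printf("%.60f\n", $x);  // (A) the exact stored value: 0.1000000000000000055511...
printf("%.17g\n", $x);  // (B) enough digits to round-trip: 0.10000000000000001
echo $x, "\n";          // (C) PHP's default precision just shows 0.1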
PHP automatically rounds the displayed number to the configured precision (the precision ini setting).
Floating-point numbers in general aren't exact (as you noted), and you should use the language-specific round() function if you need a comparison with only a few decimal places. Otherwise, take the absolute value of the difference and test that it is within a given range.
PHP Example from php.net:
$a = 1.23456789;
$b = 1.23456780;
$epsilon = 0.00001;
if (abs($a - $b) < $epsilon) {
    echo "true";
}
As for the Ruby issue, they appear to be using different versions. Codepad uses 1.8.6, while Ideone uses 1.9.3, but it's more likely related to a config somewhere.
If we want this property:
every two different floats have different printed representations
Or an even stronger one, useful for a REPL:
the printed representation shall be re-interpreted (read back) unchanged
then I see 3 solutions for printing a float/double with a base-2 internal representation in base 10:
print the EXACT representation.
print enough decimal digits (with proper rounding)
print the shortest decimal representation that can be reinterpreted unchanged
Since in base two, the float number is an_integer * 2^an_exponent, its base 10 exact representation has a finite number of digits.
Unfortunately, this can result in very long strings...
For example 1.0e-10 is represented exactly as 1.0000000000000000364321973154977415791655470655996396089904010295867919921875e-10
Solution 2 is easy: you use printf with 17 digits for an IEEE-754 double...
Drawback: it's not exact, nor the shortest! If you enter 0.1, you get
0.100000000000000006
Solution 3 is the best one for REPL languages: if you enter 0.1, it prints 0.1.
Unfortunately it is not found in standard libraries (a shame).
At least, Scheme, Python and recent Squeak/Pharo Smalltalk do it right, I think Java too.
As for JavaScript, base 2 is used internally for calculations.
> 0.2 + 0.4
0.6000000000000001
Because of that, JavaScript can only print back the exact decimal you typed when the underlying base-2 representation is finite (not periodic).
0.6 is 0.1001 1001 1001 ... in base 2 (periodic), whereas 0.5 is 0.1 exactly, and is therefore printed correctly.
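The same distinction can be seen from PHP (a quick sketch): 0.5 is a finite binary fraction, 0.6 is not, so only the former survives a high-precision print unchanged.
printf("%.17g\n", 0.5);  // 0.5
printf("%.17g\n", 0.6);  // 0.59999999999999998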
Possible Duplicates:
Why does 99.99 / 100 = 0.9998999999999999
Dealing with accuracy problems in floating-point numbers
I've seen this issue in PHP and JavaScript. I have this number: float 0.699
if I do this:
0.699 x 100 = 69.89999999999999
why?
edit
round(0.699 x 100, 2): float 69.90000000000001
Floating point arithmetic is not exact.
See Floating point on Wikipedia for a deeper discussion of the problem.
This is what has helped me in the past. It has a lot to do with how things are represented in binary. Long story short: in binary floating point there isn't an exact representation for every real number, nor for very large integers.
The link below will describe that in more detail for you.
What Every Computer Scientist Should Know About Floating-Point Arithmetic
This will happen in any language. Floats, like everything else on a computer, are stored as binary. The number 0.699, while representable exactly in decimal, is a repeating fraction in binary, so it can't be stored to exact precision.
Check out the wikipedia entry for how floats are stored, and why this happens.
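A quick PHP check (just a sketch) makes the point: 0.699 is already stored inexactly, and multiplying by 100 merely exposes that error.
printf("%.20f\n", 0.699);        // 0.69899999999999995... not exactly 0.699
printf("%.17g\n", 0.699 * 100);  // 69.899999999999991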
Javascript numbers are floating point.
Take a look at The complete javascript number reference. Excerpt:
All numbers in Javascript are 64bit (8 bytes) floating point numbers which yields an effective range of 5e-324 (negative) to 1.7976931348623157e+308 (positive) at the time this article was written (this may eventually change to 128 bits in the future as 64 bit processors become commonplace and the ECMA standards evolve).
Integers are considered reliable (numbers without a period or exponent notation) to 15 digits (9e15). Floating point numbers are considered only as reliable as possible and no more! This is an especially important concept to understand for currency manipulation as 0.06 + 0.01 resolves to 0.06999999999999999 instead of 0.07.
Take a look at Floating Point, specifically the section on IEEE 754 and representable numbers.
This behavior can be reproduced in many programming languages, including C++ and assembly. The reason is the floating point format used by the FPU. You can read details here:
http://www.arl.wustl.edu/~lockwood/class/cs306/books/artofasm/Chapter_14/CH14-1.html#HEADING1-19
General rule: never expect exact results from floating-point operations, and never compare two floating point numbers for equality. Use an interval instead: for example, instead of testing f1 == f2, use f1 > (f2 - e) and f1 < (f2 + e), where e is some small value.