I have read some previous questions on this, but I still could not work out the steps I have to follow.
Bitwise NOT operation in PHP (or probably any other language)
What is “2's Complement”?
$a = -9;
echo ~$a; // prints 8
If I understand correctly, the first step is to convert to binary. In binary, 9 is 1001. The NOT operator then converts 1001 to 0110.
What do I have to do after this? Do I add a 1? I am lost in all the explanations given.
Or is it simpler to make an educated guess (under exam conditions)? If I take -9, (step 1) convert it to 9 and (step 2) then take away 1, that gives me 8. Is this a correct pattern for working out the answer?
Well, it's very easy, but first you must note that if your system is 32-bit, you cannot use all 32 bits to represent the magnitude of a number, only 31; the same goes for 16-bit and 8-bit systems. The leftmost (most significant) bit is never used to denote the magnitude of the number, only its sign.
There are some cases where you can tell the system to use all the bits for magnitude, so 2's complement isn't used as such and all numbers are assumed positive, but normally, for most "regular" business, we use signed numbers.
Here's an example for an 8-bit system.
Step 1. Convert it to binary (you did that already): 9 is 00001001.
You can see that since it's binary, going right to left each place value is double the previous one, in contrast to decimal, where each place value is 10 times its predecessor.
Step 2. Invert all the bits of the result from the previous step: 00001001 becomes 11110110.
Step 3. Now add 1 to the result from the previous step: 11110110 + 1 = 11110111. This is -9 in two's complement.
Now applying ~ to the above result gives 00001000, i.e. 8.
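If you want to watch PHP do this, here is a minimal sketch (the & 0xFF mask just limits the output to 8 bits to match the example; PHP itself uses wider integers):

$a = -9;
printf("%08b\n", $a & 0xFF);  // 11110111, i.e. -9 in 8-bit two's complement
printf("%08b\n", ~$a & 0xFF); // 00001000, every bit inverted
echo ~$a;                     // 8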
Why so?
Because you are using a BITWISE operator, it knows nothing about the underlying number; it just does what it should: inverting bits. How to make sense of the resulting 1s and 0s is up to the system, which normally reverses the two's complement. But since the result here is a positive number (the MSB is 0), the system knows there is nothing to undo and simply converts it to decimal. Had it been negative, it would have done the computation to reverse the two's complement, because the two's complement of a positive number is the number itself.
Now note that with signed numbers as above, 8 bits can represent numbers from -128 to 127, so to represent something like 256 you must move up to a wider integer type.
A fun thing to note: in two's complement form, for positive numbers more 1s mean greater magnitude, whereas for negative numbers more 0s mean greater magnitude.
It is hard at first to interpret two's complement directly, so start by converting every two's complement value to binary and then interpreting it as decimal; once you are proficient, you will be able to tell directly what a two's complement form amounts to in decimal.
Ref.
http://sandbox.mc.edu/~bennet/cs110/tc/tctod.html
https://www.rapidtables.com/convert/number/binary-to-decimal.html
http://onlinecalculators.brainmeasures.com/Numbers/TwosComplimentCalculations.aspx
https://www.quora.com/Why-computer-use-2s-complement-to-store-negative-number-instead-of-1s-complement
Update:
To answer your comment: the above isn't applicable to positive numbers. The two's complement of a positive number is that same number itself. You might ask why. Well, two's complement was invented to mitigate the problem of subtraction, so computers don't have to implement a separate SUB circuit (in fact, there isn't any SUB logic gate at all; all we have are AND, OR, and NOT, which is why it's hard to implement a SUB circuit).
So the same circuit can be used for addition as well as subtraction; it uses the two's complement "hack" to perform subtraction by doing addition!
So 1 is stored as 00000001. Now when you NOT it, i.e. ~00000001, you get 11111110, which is -2 in two's complement form.
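Here is a small sketch of that "hack" in PHP, masked to 8 bits like the examples above (the function name sub8 is just for illustration):

// Subtract $b from $a using only NOT and addition, in 8-bit arithmetic.
function sub8($a, $b) {
    $negB = (~$b + 1) & 0xFF;   // two's complement of $b, i.e. -$b
    return ($a + $negB) & 0xFF; // adding -$b performs the subtraction
}
printf("%08b\n", sub8(9, 2));   // 00000111, i.e. 7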
How do you explain floating point inaccuracy to fresh programmers and laymen who still think computers are infinitely wise and accurate?
Do you have a favourite example or anecdote which seems to get the idea across much better than a precise, but dry, explanation?
How is this taught in Computer Science classes?
There are basically two major pitfalls people stumble into with floating-point numbers.
The problem of scale. Each FP number has an exponent which determines the overall “scale” of the number, so you can represent either really small values or really large ones, though the number of digits you can devote to that is limited. Adding two numbers of different scale will sometimes result in the smaller one being “eaten”, since there is no way to fit it into the larger scale.
PS> $a = 1; $b = 0.0000000000000000000000001
PS> Write-Host a=$a b=$b
a=1 b=1E-25
PS> $a + $b
1
As an analogy for this case you could picture a large swimming pool and a teaspoon of water. Both are of very different sizes, but individually you can easily grasp roughly how much each holds. Pouring the teaspoon into the swimming pool, however, will still leave you with roughly a swimming pool full of water.
(If the people learning this have trouble with exponential notation, one can also use the values 1 and 100000000000000000000 or so.)
Then there is the problem of binary vs. decimal representation. A number like 0.1 can't be represented exactly with a limited number of binary digits. Some languages mask this, though:
PS> "{0:N50}" -f 0.1
0.10000000000000000000000000000000000000000000000000
But you can “amplify” the representation error by repeatedly adding the numbers together:
PS> $sum = 0; for ($i = 0; $i -lt 100; $i++) { $sum += 0.1 }; $sum
9,99999999999998
I can't think of a nice analogy to properly explain this, though. It's basically the same reason you can only represent 1/3 approximately in decimal: to get the exact value you would need to repeat the 3 indefinitely at the end of the decimal fraction.
Similarly, binary fractions are good for representing halves, quarters, eighths, etc. but things like a tenth will yield an infinitely repeating stream of binary digits.
Then there is another problem, though most people don't stumble into it unless they're doing huge amounts of numerical stuff. But then, those people already know about the problem. Since many floating-point numbers are merely approximations of the exact value, this means that for a given approximation f of a real number r there can be infinitely many more real numbers r1, r2, ... which map to exactly the same approximation. Those numbers lie in a certain interval. Let's say rmin is the minimum possible value of r that results in f and rmax the maximum possible value of r for which this holds; then you have an interval [rmin, rmax] in which any number could be your actual number r.
Now, if you perform calculations on that number (adding, subtracting, multiplying, etc.), you lose precision. Every number is just an approximation; therefore you're actually performing calculations with intervals. The result is an interval too, and the approximation error only ever gets larger, thereby widening the interval. You may get back a single number from that calculation, but it is merely one number from the interval of possible results, taking into account the precision of your original operands and the precision loss due to the calculation.
That sort of thing is called interval arithmetic, and at least for me it was part of our math course at university.
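As a toy illustration (not a real interval library, and it ignores the outward rounding a proper implementation would need), here is roughly what carrying a [min, max] pair through a calculation looks like in PHP:

// Each value is an interval [min, max]; operations combine the bounds.
function intervalAdd(array $a, array $b) {
    return [$a[0] + $b[0], $a[1] + $b[1]];
}
// "0.1" carried together with a made-up representation error of 1e-7:
$x = [0.0999999, 0.1000001];
$sum = [0.0, 0.0];
for ($i = 0; $i < 100; $i++) {
    $sum = intervalAdd($sum, $x);
}
print_r($sum); // roughly [9.99999, 10.00001]: the uncertainty grew 100-fold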
Show them that the base-10 system suffers from exactly the same problem.
Try to represent 1/3 as a decimal in base 10. You won't be able to do it exactly.
So if you write "0.3333", you will have a reasonably exact representation for many use cases.
But if you move that back to a fraction, you will get "3333/10000", which is not the same as "1/3".
Other fractions, such as 1/2, can easily be represented by a finite decimal representation in base-10: "0.5".
Now base-2 and base-10 suffer from essentially the same problem: both have some numbers that they can't represent exactly.
While base-10 has no problem representing 1/10 as "0.1", in base-2 you'd need an infinite representation starting with "0.000110011...".
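You can generate those repeating binary digits yourself. A small sketch in PHP, using exact integer arithmetic on the fraction 1/10 (doubling the numerator and emitting one binary digit per round):

$num = 1;   // numerator of 1/10
$den = 10;  // denominator
$bits = '0.';
for ($i = 0; $i < 20; $i++) {
    $num *= 2;          // shift the fraction left by one binary digit
    if ($num >= $den) {
        $bits .= '1';
        $num -= $den;
    } else {
        $bits .= '0';
    }
}
echo $bits; // 0.00011001100110011001 ("0011" repeats forever)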
How's this for an explanation for the layman? One way computers represent numbers is by counting discrete units. These are digital computers. For whole numbers, those without a fractional part, modern digital computers count powers of two: 1, 2, 4, 8, ... (place value, binary digits, blah, blah, blah). For fractions, digital computers count inverse powers of two: 1/2, 1/4, 1/8, ... The problem is that many numbers can't be represented by a sum of a finite number of those inverse powers. Using more place values (more bits) increases the precision of the representation of those "problem" numbers, but never gets it exact, because a computer only has a limited number of bits. Some numbers would need an infinite number of bits.
Snooze...
OK, you want to measure the volume of water in a container, and you only have 3 measuring cups: full cup, half cup, and quarter cup. After counting the last full cup, let's say there is one third of a cup remaining. Yet you can't measure that because it doesn't exactly fill any combination of available cups. It doesn't fill the half cup, and the overflow from the quarter cup is too small to fill anything. So you have an error - the difference between 1/3 and 1/4. This error is compounded when you combine it with errors from other measurements.
In python:
>>> 1.0 / 10
0.10000000000000001
Explain how some fractions cannot be represented precisely in binary. Just like some fractions (like 1/3) cannot be represented precisely in base 10.
Another example, in C
printf("%.20f\n", 3.6);
incredibly gives
3.60000000000000008882
Here is my simple understanding.
Problem:
The value 0.45 cannot be accurately represented by a float and is rounded up to 0.450000018. Why is that?
Answer:
An int value of 45 is represented by the binary value 101101.
In order to make the value 0.45, it would be accurate if you could take 45 x 10^-2 (= 45 / 10^2).
But that's impossible, because you must use base 2 instead of 10.
So the closest to 10^2 = 100 would be 128 = 2^7. The total number of bits you need is 9: 6 for the value 45 (101101) plus 3 bits for the value 7 (111).
Then the value 45 x 2^-7 = 0.3515625. Now you have a serious inaccuracy problem: 0.3515625 is nowhere near 0.45.
How do we improve this inaccuracy? Well, we could change the values 45 and 7 to something else.
How about 460 x 2^-10 = 0.44921875? You are now using 9 bits for 460 and 4 bits for 10. It's a bit closer, but still not that close. However, if your initial desired value had been 0.44921875, you would get an exact match with no approximation.
So the formula for your value would be X = A x 2^B, where A and B are integer values, positive or negative.
Obviously, the higher those numbers can be, the higher your accuracy becomes; however, as you know, the number of bits available to represent the values A and B is limited. For float you have a total of 32 bits; double has 64 and decimal has 128.
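To see the A x 2^B machinery directly, here is a hedged sketch in PHP that reinterprets the bits of 0.45 as a 32-bit float (it assumes a little-endian platform for the 'f'/'V' pack codes, and a normalized number):

$raw = unpack('V', pack('f', 0.45))[1];   // the raw IEEE-754 bits as a uint32
$sign     = ($raw >> 31) & 1;
$exponent = (($raw >> 23) & 0xFF) - 127;  // remove the exponent bias
$mantissa = $raw & 0x7FFFFF;              // the 23 stored fraction bits
$value    = (1 + $mantissa / (1 << 23)) * pow(2, $exponent);
printf("sign=%d exponent=%d mantissa=%023b\n", $sign, $exponent, $mantissa);
printf("value=%.9f\n", $value);           // close to, but not exactly, 0.45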
A cute piece of numerical weirdness may be observed if one converts 9999999.4999999999 to a float and back to a double. The result is reported as 10000000, even though that value is obviously closer to 9999999, and even though 9999999.499999999 correctly rounds to 9999999.
In my PHP script I do a calculation over entries from a MySQL db. The relevant fields in the db are defined as decimal(10,3). It's an accounting platform where I have to check that debit = credit for every entry.
I do this with the following operation:
$sumupNet = 0;
$sumup = 0;
foreach ($val['Record'] as $subkey => $subval)
{
    $sumupNet = $sumupNet + $subval['lc_amount_net'];
    $sumup = $sumup + $subval['lc_amount_debit'] - $subval['lc_amount_credit'];
}
Now, if every entry is correct, $sumupNet and $sumup both come out as 0. In most cases, this works. But in some cases the result is something like -1.4432899320127E-15 or -8.8817841970013E-15. If I calculate these values manually, the result is 0. I guess (though I'm not sure) that the above results are numbers near 0, output in exponential notation.
So I think I have to convert something, or my calculation is wrong. But what? I tried floatval() at some points, but it didn't work. If anybody has a hint, thank you very much.
You're getting this because you are doing math with floating-point values. Read some theory about it.
You really don't want to calculate money like that, as you might get weird rounding problems that you can't really do anything to fix.
For PHP, there are plenty of libraries that help you evade the problem, such as BC Math or GMP.
Another solution would be to calculate all of the values using the smallest monetary unit that the currency has (like cents), so you are always using integers.
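For example, the summing loop from the question could be rewritten with BC Math, keeping every decimal(10,3) value as a string with a scale of 3 (a sketch; field names as in the question):

$sumupNet = '0';
$sumup = '0';
foreach ($val['Record'] as $subval) {
    $sumupNet = bcadd($sumupNet, $subval['lc_amount_net'], 3);
    $diff = bcsub($subval['lc_amount_debit'], $subval['lc_amount_credit'], 3);
    $sumup = bcadd($sumup, $diff, 3);
}
// Both sums are now exact strings, e.g. "0.000" when everything balances.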
These are rounding problems. They are perfectly normal when we are talking about floats. To give you an everyday example:
1/3 = 0.333333333333333...
Reason: 10 is relatively prime to 3. You might wonder where the 10 is coming from: we use base 10 for numbers, that is, a number's digits represent base-10 exponential values. The computer works with binary numbers, that is, base-2 numbers. This means that division with such numbers often results in endless sequences of digits. For instance, 1/3 as a binary number looks like this:
0.010101010101010101010101010101010101010101010101010101...
Decimal types represent decimal numbers, that is, base-10 numbers. You use three digits for the part after the decimal point. Let's suppose your number ends like this:
.xyz
this means:
xyz / 1000
However, 1000 has only the following prime factors: 2 and 5.
Since 5 is relatively prime to 2, whenever you represent the result of a division by 5 as a binary number, there is a potential that the result will be an endless cycle of digits. 1/5 as a binary number looks like this:
0.0011001100110011001100110011001100110011001100110011...
Since a computer cannot store endless digits, it has to round the number, that is, find a nearby number which can be represented more easily. If the number a is rounded to b and the two numbers are not equal, then a certain amount of precision is lost, and this is the reason for the bug you have mentioned.
You can solve the problem as follows: when you select the values from the database, multiply them by 1000 (thus converting them into integers) and then do the checks. At the end, divide by 1000.
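A sketch of that approach applied to the loop from the question (the round() guards against values that arrive as floats like 123.45699999...):

$sumupNet = 0; // integer thousandths
foreach ($val['Record'] as $subval) {
    $sumupNet += (int) round($subval['lc_amount_net'] * 1000);
}
// Compare against 0 exactly; divide by 1000 only when displaying.
echo $sumupNet / 1000;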
The operation 1539 | 0xfffff800 returns -509 in JavaScript and Python 2.7.
In PHP I get 4294966787.
Does anybody know why, and could you explain it to me? I would also love to know how to get the expected result in PHP.
1539 | 0xfffff800 = 4294966787 (= 0xFFFFFE03)
This is perfectly right. So, PHP is right.
If you would like to have both positive and negative integers, you need some mechanism to determine whether a number is negative. This is usually done using the two's complement of the number: you can negate a number by inverting all its bits and then adding 1. To avoid ambiguities, you cannot use all the bits of your integer variable for the magnitude; the highest bit is reserved as a sign bit. (If it were not, you would never know whether a value is a big positive number or a negative number.)
For example, with an 8-bit integer variable you can represent numbers from 0 to 255. If you need signed values, you can represent numbers from -128 (1000 0000 binary) to +127 (0111 1111).
In your example, you have a 32-bit number which has its highest bit set. Python and JavaScript apparently operate on 32-bit values here, and since the highest bit is set, they interpret the number as negative, so the result of your calculation is also negative.
In the PHP version you are using, integer variables seem to be 64 bits long, and only the lower 32 bits are used here. The highest bit (bit 63) is not set, so PHP interprets this number as positive. Depending on what you want to achieve, you may want to fill all the bits from bit 32 to bit 63 with 1s, which will create the corresponding negative number...
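If you want PHP to match the 32-bit behaviour, here is a small sketch (the helper name toInt32 is just for illustration):

function toInt32($n) {
    $n &= 0xFFFFFFFF;       // keep only the low 32 bits
    if ($n & 0x80000000) {  // bit 31 set: negative in 32-bit terms
        $n -= 0x100000000;  // sign-extend by subtracting 2^32
    }
    return $n;
}
echo toInt32(1539 | 0xfffff800); // -509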
A web application compares pairs of sets of positive integers. Each set has only unique values, none greater than 210 000 000 (which fits into 28 bits). There are up to 5 000 000 values in each set.
Comparing sets A & B, I need three result sets: "unique to A", "unique to B", and "common to A & B". A particular task is to answer the question "is number N present in set S?"
So far the project runs within the limited resources of shared hosting, under a LAMP stack. The quick'n'dirty solution I came up with was to outsource the job to the hosting's MySQL, which has more resources: a temporary table for each set, whose only column holds the numbers and is the primary index. Only rarely are sets small enough to fit into engine=Memory, which is fast. It works, but it is too slow.
I am looking for a way to keep a set like this in memory, efficient for the task of searching for a particular number within it, while keeping the memory footprint as low as possible.
I came up with the idea of coding each set as a bit mask of 2^28 bits (32 MB): a number present in the set = the corresponding bit set to 1. 5 million numbers = 5 million bits set out of 210 million. Lots of zeroes == it can be compressed effectively?
It seems like I'm reinventing the wheel, so please direct me to a "well-known" solution for this particular case of binary compression. I have read about Huffman coding, which does not seem to be the right solution, as its focus is size reduction, while my task requires many searches over a compressed set.
Update: I just found an article on Golomb coding and an example of its application to run-length encoding.
There is a standard compression technique available for representing large sets of integers in a range, which allows for efficient iteration (so it can easily do intersection, union, set difference, etc.) but does not allow random access (so it's no good for "is N in S"). For this particular problem, it will reduce the dataset to around seven bits per element, or roughly 4.6 MB for sets of size 5,000,000. In case it's useful, I'll describe it below.
Bit-vectors of 210,000,000 bits (roughly 26 MB each) are computationally efficient, both for answering the "is N in S" query and for bitwise operations, since you can do them rapidly with vectorized instructions on modern processors; it's probably as fast as you're going to get for a 5,000,000-element intersection computation. It consumes a lot of memory, but if you've got that much memory, go for it.
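For reference, a minimal sketch of such a bit-vector in PHP, stored as a byte string (it needs a memory_limit that allows the ~26 MB allocation; the helper names are made up):

$bitmap = str_repeat("\0", 210000000 >> 3); // one bit per possible value, ~26 MB

function bitSet(&$bitmap, $n) {
    $bitmap[$n >> 3] = chr(ord($bitmap[$n >> 3]) | (1 << ($n & 7)));
}
function bitTest($bitmap, $n) {
    return (ord($bitmap[$n >> 3]) >> ($n & 7)) & 1;
}

bitSet($bitmap, 123456789);
echo bitTest($bitmap, 123456789); // 1: "is N in S" is a single byte lookup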
The compression technique, which is simple and just about optimal if the sets are uniformly distributed random samples of the specified size, is as follows (a sketch of the encoder appears after the justification below):
Sort the set (or ensure that it is sorted).
Set the "current value" to 0.
For each element in the set, in order:
a. subtract the "current value" from the element;
b. while that difference is at least 32, output a single 1 bit and subtract 32 from the difference;
c. output a single 0 bit, followed by the difference encoded in five bits.
d. set the "current value" to one more than the element.
To justify my claim that the compression will result in around seven bits per element:
It's clear that every element will occupy six bits (a 0 plus a five-bit delta); in addition, we have to account for the 1 bits in step 3b. Note, however, that the sum of all the deltas is at most the largest element in the set, which cannot be more than 210,000,000; consequently, we cannot execute step 3b more than 210,000,000/32 times. So step 3b will account for fewer than seven million bits, while step 3c will account for 6 x 5,000,000 bits, for a total of 37 million bits, or 7.4 bits per element (in practice, it will usually be a bit less than this).
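A sketch of the encoder in PHP, emitting bits as '0'/'1' characters for readability (a real implementation would pack them into bytes):

function encodeSet(array $sortedSet) {
    $bits = '';
    $current = 0;
    foreach ($sortedSet as $element) {
        $diff = $element - $current;           // step 3a
        while ($diff >= 32) {                  // step 3b
            $bits .= '1';
            $diff -= 32;
        }
        $bits .= '0' . sprintf('%05b', $diff); // step 3c: five-bit delta
        $current = $element + 1;               // step 3d
    }
    return $bits;
}
echo encodeSet([3, 10, 50]); // "000011" . "000110" . "1000111"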
24151.40 - 31891.10 = -7739.699999999997
I grab these two numbers from a MySQL table where the column type is decimal(14,2):
24151.40
31891.10
They are saved exactly as stated above, and they echo exactly like that in PHP. But the minute I subtract the second value from the first, I get -7739.699999999997 instead of -7739.7. Why the extra precision? And where is it coming from?
From an article I wrote for Authorize.Net:
One plus one equals two, right? How about .2 plus 1.4, times 10? That equals 16, right? Not if you're doing the math with PHP (or most other programming languages):
echo floor((0.2 + 1.4) * 10); // Should be 16. But it's 15!
This is due to how floating-point numbers are handled internally. They are stored in binary with a limited number of bits, which can result in numbers that do not add up quite like you expect. Internally, our .2 plus 1.4 times 10 example computes to roughly 15.9999999998 or so. This kind of math is fine when working with numbers that do not have to be precise, like percentages. But when working with money, precision matters, as a penny or a dollar missing here or there adds up quickly, and no one likes being on the short end of any missing money.
The BC Math Solution
Fortunately, PHP offers the BC Math extension: "For arbitrary precision mathematics PHP offers the Binary Calculator which supports numbers of any size and precision, represented as strings." In other words, you can do precise math with monetary values using this extension. The BC Math extension contains functions that allow you to perform the most common operations with precision, including addition, subtraction, multiplication, and division.
A Better Example
Here's the same example as above, but using the bcadd() function to do the math for us. It takes three parameters: the first two are the values we wish to add, and the third is the number of decimal places we wish to be precise to. Since we're working with money, we'll set the precision to two decimal places.
echo floor(bcadd('0.2', '1.4', 2) * 10); // It's 16 like we would expect it to be.
PHP doesn't have a decimal type like MySQL does; it uses floats, and floats are notorious for being inaccurate.
To cure this, look into number_format, e.g.:
echo number_format(24151.40 - 31891.10, 2, '.', '');
For more accurate number manipulation, you could also look at the math extensions of PHP:
http://www.php.net/manual/en/refs.math.php
This has to do with float/double precision in general, where a value is stored scientifically as 1.FRACTION x 2^exponent. Since there is an implicit leading 1, there's technically no such thing as zero in this normalized form, and the closest value you can obtain to 0 is 1.0 x 2^-127, which in binary is .000000[127 0s]00001.
By rounding your answer off to a certain precision, the rounding will give you the result you expect:
http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_round