Solr float (TrieFloatField) storage limits - php

I'm trying to understand how floats are stored in Solr.
There is a difference between the float value in PHP (32-bit) and the one stored in Solr.
I've searched the documentation, "Field Types Included with Solr":
https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr
and found this for TrieFloatField:
Floating point field (32-bit IEEE floating point). precisionStep="0"
enables efficient numeric sorting and minimizes index size;
precisionStep="8" (the default) enables efficient range queries.
But I don't know how to predict what the stored value will be.
Here are some tests I've run, showing the value I tried to insert into the float field and the result:
ok: 2097151.1
ko: 2097152.1 -> 2097152
ko: 20971521 -> 20971520
ok: 16777216
ko: 16777217 -> 16777216
ko: 4294967296 -> 4294967300
ok: 4294967300
ko: 4294967301 -> 4294967300
I don't understand which constraint is applied; it isn't simple decimal rounding.
Maybe it is a binary constraint, because it looks like it is rounded to fit powers of 2.
https://en.wikipedia.org/wiki/Power_of_two#The_first_96_powers_of_two
2^21 = 2,097,152
2^24 = 16,777,216
2^32 = 4,294,967,296
As you can see, these values are close to the ones stored by Solr.
Does anyone know how Solr stores floats?
And how can I compute the stored value in PHP?
Thanks.

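As you've mentioned, it's a 32-bit floating point number. A 32-bit float has a 24-bit significand, so it can represent every integer exactly only up to 2^24 = 16,777,216. The gap between adjacent representable values doubles at each power of two (0.25 between 2^21 and 2^22, 2 between 2^24 and 2^25, 512 between 2^32 and 2^33), and any value in between is rounded to the nearest representable one. So a 32-bit float can't represent all the values between 0 and 2^32 exactly, and some numbers simply can't be represented with those bits.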
You can use a converter like IEEE754 Floating Point Conversion to test the values you've included, and they all convert to what you're getting back from Solr.
Floating point numbers are not exact, and they aren't magic: there are still only 2^32 distinct bit patterns available, so when you try to store values that don't map exactly onto one of the values a 32-bit float can represent, you'll get inaccuracies.
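If Solr rounds each value to the nearest 32-bit IEEE 754 float, as this answer describes, the stored value can be predicted in PHP by narrowing a double with pack('f') and widening it again with unpack('f'). A minimal sketch, assuming PHP's 'f' format is a 32-bit IEEE 754 float (true on common platforms); toFloat32() is a hypothetical helper name. Because PHP first parses the literal as a 64-bit double, the result could in rare cases differ from rounding the decimal string directly to a float.

```php
<?php
// Round a PHP float (a 64-bit double) to the nearest 32-bit float and back.
// pack('f') narrows the value; unpack('f') widens it again to a double.
function toFloat32(float $value): float
{
    return unpack('f', pack('f', $value))[1];
}

// Values from the question, inserted as PHP floats.
$inputs = [2097151.1, 2097152.1, 20971521.0, 16777216.0, 16777217.0, 4294967296.0, 4294967301.0];

foreach ($inputs as $input) {
    printf("%.3f -> %.3f\n", $input, toFloat32($input));
}
```

Under this model, 2097152.1 becomes 2097152.0 (floats between 2^21 and 2^22 are 0.25 apart), 16777217 and 20971521 fall exactly halfway between two floats and go to the neighbour with an even significand, and 4294967301 becomes exactly 2^32. The remaining differences in the question look like display: 2097151.1 is stored as 2097151.125 and 2^32 as 4294967296, and Solr (Java) presumably prints the shortest decimal that still identifies the float, giving 2097151.1 and 4294967300.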
Doubles were introduced to have more accuracy (64-bit vs 32-bit), and you can use doubles in Solr by using a TrieDoubleField instead.
Another option, depending on what you need, is to use a long field instead, multiplying by 10 or 100 when storing a value and dividing on the way out. Multiplying by 100 lets you represent a decimal number with two digits after the point exactly.
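A minimal sketch of the long-field approach in PHP, using a scale of 100 for two decimal places. toScaledLong() and fromScaledLong() are hypothetical helper names, and the Solr side is assumed to be an ordinary long field. Rounding before the cast matters because the multiplication itself is still done in floating point.

```php
<?php
// Hypothetical helpers for keeping two-decimal values in an integer (long) field:
// scale by 100 on the way in, divide by 100 on the way out.
const SCALE = 100;

function toScaledLong(float $value): int
{
    // round() first, so that e.g. 0.29 * 100 = 28.999999999999996 becomes 29, not 28.
    return (int) round($value * SCALE);
}

function fromScaledLong(int $stored): float
{
    return $stored / SCALE;
}

$stored = toScaledLong(2097152.1);  // 209715210, fits easily in a 64-bit long
echo fromScaledLong($stored), "\n";
```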

Apparently, the most reliable way to compare floats is to use pack().
Pack data into binary string to securely compare two floats.
http://php.net/manual/en/language.types.float.php#119860
So, as an alternative to using
$float1 === $float2
one could use
pack('f', $float1) === pack('f', $float2)
with the big caveat that you are reducing the accuracy of the comparison ('f' packs each value as a C float, normally 32-bit). AFAIK this is the only way (apart from epsilon methods) to reliably compare two floats.
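A minimal sketch of both approaches mentioned here. equalAsFloat32() wraps the pack() comparison, and nearlyEqual() is a hypothetical relative-epsilon check; its tolerance of one PHP_FLOAT_EPSILON (available from PHP 7.2) is an assumption you would tune to your data.

```php
<?php
// Compare two floats after narrowing both to single precision.
function equalAsFloat32(float $a, float $b): bool
{
    return pack('f', $a) === pack('f', $b);
}

// Hypothetical epsilon alternative: relative tolerance in double precision.
function nearlyEqual(float $a, float $b, float $relTol = PHP_FLOAT_EPSILON): bool
{
    return abs($a - $b) <= $relTol * max(abs($a), abs($b));
}

$x = 0.1 + 0.2;
var_dump($x === 0.3);               // bool(false)
var_dump(equalAsFloat32($x, 0.3));  // bool(true)
var_dump(nearlyEqual($x, 0.3));     // bool(true)
```

Note that the pack() version also treats 16777216 and 16777217 as equal, and two values that differ by a tiny amount can still pack to different floats if they straddle a rounding boundary.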

Related

Bitwise operator with large numbers

Is there a way to do bitwise checks on values greater than "2147483648"?
I have to work with numbers up to "6.73297395398192e212" (2^707).
The data is stored in a mysql-database as float.
Maybe I'm just using the wrong search terms or there is not a good way.
A double-precision value uses 8 bytes, and you obviously cannot store 707 bits in those (which I assume you are trying to do). It can store a value as large as 1e308, but only as an approximation that loses precision in the lower digits, which makes it a bad choice for data you want to do bitwise operations on. For bitwise operations on 8 bytes, you can use BIGINT.
Since version 8, MySQL supports bitwise operations on binary strings of arbitrary length, so you should store your value that way; a bit array is basically a binary string anyway. You cannot treat them as numbers, though (e.g. add or multiply them like integers).
In earlier MySQL versions, bit operations on binary strings were limited to 8 bytes. You should still store your bits as a binary string (which allows for an easy upgrade), and write a small function that performs the operation, e.g. byte by byte.
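On the PHP side, the &, | and ^ operators already work byte by byte when both operands are strings, so a bit array stored as a binary string can be combined directly. A minimal sketch; setBit() and isBitSet() are hypothetical helpers, and the layout (bit n in byte n / 8, least significant bit first) is an assumption that must match how your data was written.

```php
<?php
// Hypothetical helpers: bit $n lives in byte intdiv($n, 8), at position $n % 8.
function setBit(string $bits, int $n): string
{
    $byte = intdiv($n, 8);
    if (strlen($bits) <= $byte) {
        $bits = str_pad($bits, $byte + 1, "\0"); // grow with zero bytes
    }
    $bits[$byte] = chr(ord($bits[$byte]) | (1 << ($n % 8)));
    return $bits;
}

function isBitSet(string $bits, int $n): bool
{
    $byte = intdiv($n, 8);
    return $byte < strlen($bits) && (ord($bits[$byte]) & (1 << ($n % 8))) !== 0;
}

// Bit 707 (i.e. 2^707) needs 89 bytes.
$a = setBit(setBit('', 707), 3);
$b = setBit(str_repeat("\0", 89), 707);

// String & string is applied byte by byte; the result is as long as the shorter operand.
$and = $a & $b;
var_dump(isBitSet($and, 707)); // bool(true)
var_dump(isBitSet($and, 3));   // bool(false)
```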

phpMyAdmin Float Disable Round Number

I have been working on a project with a lot of numbers inserted into a database table. Now that I've finished the code, I was checking the values for errors and noticed that my value 3075277 turns into 3075280 when inserted into the DB, and 3075255 becomes 3075260.
The column type is FLOAT. What should I change to disable the rounding? This one doesn't even have decimal places, so why would it round like that? I use the default options; I only changed the collation to utf8_general_ci and changed the type to VARCHAR (with a length) for some columns and FLOAT for others.
This issue is with MySQL, not phpMyAdmin.
FLOAT has 6-7 significant digits of precision, as you are seeing with the mangled values. By "significant digits", I mean counting from the first nonzero digit, wherever it falls:
1234567xxxx.
12345.67xxx
1.234567xxx
0.0000001234567xxx
That is, the xxx part is likely to be zeros or some kind of 'noise', not the original value you put into the column.
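To see the "significant digits, starting anywhere" point in PHP, this minimal sketch emulates a 4-byte FLOAT by rounding each double to single precision with pack('f')/unpack('f'). It assumes PHP's 'f' format is a 32-bit IEEE 754 float (true on common platforms) and only models the stored value, not how MySQL chooses to display it. The relative error stays within 2^-24 (about 6e-8) whatever the magnitude, which is where the 6-7 significant digits come from.

```php
<?php
// Magnitudes like those in the list above: the leading digits survive, the tail does not.
$values = [12345678901.0, 12345.678901, 1.2345678901, 0.00000012345678901];

foreach ($values as $original) {
    // pack('f')/unpack('f') round the double to the nearest 4-byte float.
    $asFloat = unpack('f', pack('f', $original))[1];
    $relativeError = abs($asFloat - $original) / $original;
    printf("%.10e -> %.10e (relative error %.1e)\n", $original, $asFloat, $relativeError);
}
```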
DOUBLE gives you about 16 significant digits.
DECIMAL(9,0) gives you 9 digits to the left of the decimal point, none afterwards. Sort of like INT.
DECIMAL(9,4) gives you 5 (9-4) digits to the left of the point; 4 afterwards.
What kinds of numbers are you storing? Money? Scientific measurements? Astronomical distances? DT's wealth?
You are using the FLOAT type but getting these errors because you are saving large numbers in the database. You should use DOUBLE instead.
FLOAT and DOUBLE are similar in that both store approximate values, but DOUBLE uses 8 bytes and FLOAT uses 4.
A FLOAT is for single-precision, while a DOUBLE is for double-precision numbers.
MySQL uses four bytes for single-precision values and eight bytes for double-precision values.
There is a big difference between floating-point numbers and decimal (numeric) numbers, which you can store with the DECIMAL data type. DECIMAL stores exact numeric values, unlike floating-point types, and is used where it is important to preserve exact precision, for example with monetary data.
So as in your case, for larger numbers you would want DOUBLE instead of FLOAT.

Convert IEEE 754 to decimal floating point

I have what I think is an IEEE 754 value with single or double precision (not sure), and I'd like to convert it to decimal in PHP.
Given 4 hex values (which might be in little-endian format, i.e. reversed order) 4A,5B,1B,05, I need to convert them to a decimal value which I know will be very close to 4724.50073.
I've tried some online converters but they are far from the expected result so I'm clearly missing something.
If I echo 0x4A; I get 74 and the others are 91, 27 and 5. Not sure where to take it from here...
To convert it to float, use unpack. If the byte order is incorrect, you'll have to reverse it yourself before unpacking. 4 bytes (32 bits) usually means it's a float, 8 for double.
$bin = "\x4A\x5B\x1B\x05";
$a = unpack('f', strrev($bin));
echo $a[1]; // 3589825.25 on a little-endian machine ('f' uses machine byte order)
I don't see how this maps to 4724.50073 directly, though. Without more test data or the manufacturer's manual, this question can't be fully answered.
Speculation: judging from the size of the coordinate, it's probably some sort of projection (XYZ or Mercator) which can then be converted to WGS84 or whatever you need. Unfortunately there's no way to check, since you haven't provided both latitude and longitude.
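A minimal sketch that decodes the same four bytes with an explicit byte order instead of relying on strrev() and the machine's byte order. It assumes PHP 7.0.15+ or 7.1.1+, where unpack() gained the 'g' (little-endian float) and 'G' (big-endian float) codes. Neither reading gives 4724.50073, which fits the conclusion that some further conversion is involved.

```php
<?php
$bin = "\x4A\x5B\x1B\x05";

// 'G' = 32-bit float, big-endian; 'g' = 32-bit float, little-endian.
// Unlike 'f', these do not depend on the byte order of the machine running PHP.
$bigEndian    = unpack('G', $bin)[1];
$littleEndian = unpack('g', $bin)[1];

printf("big-endian:    %.6f\n", $bigEndian);   // 3589825.250000
printf("little-endian: %.6e\n", $littleEndian);
```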

PHP & Base 2. Which Floats give a precise Value?

Apologies for my poor maths skills, I've tried to understand this to answer my own query but I'm not convinced.
We all know that PHP doesn't store floats in base 10 but in base 2.
I have a series of calculations that are using 0.5 as the only float, and in trying to understand if they will be stored as 0.500001 or 0.4999999 (for rounding purposes there is a big difference!!!) I have come to understand that 0.5 will be stored precisely in base2.
My questions are:
A. Have I understood this correctly?
B. What other floats are stored precisely in base 2? E.g. 0.25?
Any integer multiple of 1/pow(2, n) can be represented exactly as a float, as long as the integer and the exponent stay within the format's limits.
That means x/2, x/4, x/8, x/16, etc., for integer x, can be represented exactly.
For more information on how floating point numbers are stored, see http://kipirvine.com/asm/workbook/floating_tut.htm
GMP is a good library for high-precision math.
PHP is not required to use binary floating-point. It depends on the system.
Many systems use IEEE-754 binary floating-point (sometimes incompletely or with modifications, such as flushing subnormal numbers to zero).
In IEEE-754 64-bit binary floating point, a number is exactly representable if and only if it is representable as an integer F times a power of two, 2^E, such that:
The magnitude of F is less than 2^53.
–1074 ≤ E < 972.
For example, ½ equals 1•2^–1. 1 is an integer under the integer limit, and –1 is an exponent within the exponent limits. So ½ is representable.
2^53+1 is not representable. As it is, it is an integer outside the integer limit. If you try to scale it by a power of two to bring it within the limit, you get a number that is not an integer. So there is no way to represent this value exactly in IEEE-754 64-bit binary floating-point.
1/3 and 1/10 are also not representable because no matter what power of two you scale them by, you will not produce an integer.
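A quick PHP check of these rules, assuming PHP floats are IEEE 754 doubles (the usual case, although as noted PHP does not guarantee it): sums of powers of two compare exactly, 0.1 and 0.2 do not, and 2^53 + 1 rounds back to 2^53.

```php
<?php
// 0.5 = 1 * 2^-1 and 0.25 = 1 * 2^-2 are exact, so their sum is exact too.
var_dump(0.5 + 0.25 === 0.75);            // bool(true)
var_dump(0.5 === 1 * 2.0 ** -1);          // bool(true)

// 0.1, 0.2 and 0.3 are only approximations, so the rounding errors show.
var_dump(0.1 + 0.2 === 0.3);              // bool(false)

// 2^53 + 1 needs 54 significant bits and rounds back to 2^53.
var_dump(2.0 ** 53 + 1.0 === 2.0 ** 53);  // bool(true)

// Printing more digits than echo shows by default reveals the stored values.
printf("%.20f\n%.20f\n", 0.5, 0.1);
```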

MySQL & PHP decimal precision wrong

24151.40 - 31891.10 = -7739.699999999997
I grab these two numbers from a MySQL table with the type as decimal(14,2)
24151.40
31891.10
It is saved exactly as stated above and it echos exactly like that in PHP. But the minute I subtract the second value from the first value, I get a number -7739.699999999997 instead of -7,739.7. Why the extra precision? And where is it coming from?
From an article I wrote for Authorize.Net:
One plus one equals two, right? How about .2 plus 1.4 times 10? That equals 16, right? Not if you're doing the math with PHP (or most other programming languages):
echo floor((0.2 + 1.4) * 10); // Should be 16. But it's 15!
This is due to how floating point numbers are handled internally. They are stored in binary with a fixed number of significant bits, so values such as 0.2 and 1.4 cannot be represented exactly, and results may not add up quite the way you expect. Internally, our .2 plus 1.4 times 10 example computes to roughly 15.999999999999998. This kind of math is fine when working with numbers that do not have to be precise, like percentages. But when working with money, precision matters: a penny or a dollar missing here or there adds up quickly, and no one likes being on the short end of any missing money.
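A small PHP sketch that prints the stored result of this example with more digits than echo uses. Under the default precision setting of 14 significant digits, echo shows 16 and hides the problem that floor() then exposes.

```php
<?php
$x = (0.2 + 1.4) * 10;

echo $x, "\n";          // 16 under the default precision=14
printf("%.15f\n", $x);  // shows the digits that echo hides
var_dump($x < 16.0);    // bool(true)
var_dump(floor($x));    // float(15)
```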
The BC Math Solution
Fortunately PHP offers the BC Math extension which is "for arbitrary precision mathematics PHP offers the Binary Calculator which supports numbers of any size and precision, represented as strings." In other words, you can do precise math with monetary values using this extension. The BC Math extension contains functions that allow you to perform the most common operations with precision including addition, subtraction, multiplication, and division.
A Better Example
Here's the same example as above but using the bcadd() function to do the math for us. We pass it three parameters: the first two are the values we wish to add, and the third is the number of decimal places we wish to be precise to. Since we're working with money, we'll set the precision to two decimal places.
echo floor(bcadd('0.2', '1.4', 2) * 10); // It's 16 like we would expect it to be.
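Applied to the subtraction from the question, a minimal sketch under the assumption that the DECIMAL(14,2) values are kept as strings rather than converted to floats. bcsub() is only called if the BC Math extension is installed; the integer-cents fallback uses hypothetical helpers toCents() and formatCents() and assumes every value has exactly two decimal places.

```php
<?php
$a = '24151.40';
$b = '31891.10';

if (function_exists('bcsub')) {
    // BC Math works on decimal strings; scale 2 keeps two decimal places.
    echo bcsub($a, $b, 2), "\n";
}

// Hypothetical fallback: do the arithmetic in integer cents.
function toCents(string $amount): int
{
    return (int) str_replace('.', '', $amount); // '24151.40' -> 2415140
}

function formatCents(int $cents): string
{
    $sign = $cents < 0 ? '-' : '';
    $abs = abs($cents);
    return $sign . intdiv($abs, 100) . '.' . str_pad((string) ($abs % 100), 2, '0', STR_PAD_LEFT);
}

echo formatCents(toCents($a) - toCents($b)), "\n"; // -7739.70
```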
PHP doesn't have a decimal type like MySQL does; it uses floats, and floats are notorious for being inaccurate.
To cure this, look into number_format, e.g.:
echo number_format(24151.40 - 31891.10, 2, '.', '');
For more accurate number manipulation, you could also look at the math extensions of PHP:
http://www.php.net/manual/en/refs.math.php
This has to do with general float/double precision: a value is stored as 1.fraction * 2^exponent in binary. Because of that implicit leading 1, zero cannot be written in this form and needs a special encoding, and the smallest positive double is a subnormal value of 2^-1074 (about 4.9e-324).
Rounding your result to the precision you need (for example, two decimal places) will give you the answer you expect:
http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_round
