Coping with big ints in 32 bit PHP - php

I have a class for computing the Luhn checksum for a number. It takes integer as an input and returns true or false to indicate validity or otherwise, or it throws an exception if an inappropriate data type is given as input.
The code is as follows (The full source is on GitHub):
class Luhn extends abstr\Prop implements iface\Prop
{
/**
* Test that the given data passes a Luhn check.
*
* #return bool True if the data passes the Luhn check
* #throws \InvalidArgumentException
* #see http://en.wikipedia.org/wiki/Luhn_algorithm
*/
public function isValid ()
{
$data = $this -> getData ();
$valid = false;
switch (gettype ($data))
{
case 'NULL' :
$valid = true;
break;
case 'integer' :
// Get the sequence of digits that make up the number under test
$digits = array_reverse (array_map ('intval', str_split ((string) $data)));
// Walk the array, doubling the value of every second digit
for ($i = 0, $count = count ($digits); $i < $count; $i++)
{
if ($i % 2)
{
// Double the digit
if (($digits [$i] *= 2) > 9)
{
// Handle the case where the doubled digit is over 9
$digits [$i] -= 10;
$digits [] = 1;
}
}
}
// The Luhn is valid if the sum of the digits ends in a 0
$valid = ((array_sum ($digits) % 10) === 0);
break;
default :
// An attempt was made to apply the check to an invalid data type
throw new \InvalidArgumentException (__CLASS__ . ': This property cannot be applied to data of type ' . gettype ($data));
break;
}
return ($valid);
}
}
I also built a full unit test to exercise the class.
My main development environment is a workstation running 64 bit builds PHP 5.3 and Apache under OSX Lion. I also use a laptop running a 64 bit build of Apache and PHP 5.4 also under Apache. As well as this I have a Ubuntu Linux virtual machine running 64 bit Apache and PHP 5.3. The unit test was fine for all of these, as expected.
I thought I could some spare time during lunch at work (Windows 7, XAMPP, 32 bit PHP 5.3) for working on the project that this class is a part of, but the first thing I ran into was failure of the unit test.
The problem is that on a 32 bit build of PHP the number gets silently cast to float if it exceeds the limits of a 32 bit integer. My proposed solution is to have a special case for float. If the input type is float, and its value is outside the range that can be expressed in int (PHP_INT_MIN .. PHP_INT_MAX) then I'll number_format() it to get it back into a string of digits. If it's within the range of an integer then I'll throw an exception.
However, this leads to its own problem. I know that the further away you get from 0 with a floating point number, the less resolution the number has (the smaller the increment between a given number and the next representable number gets). How far away from 0 do you have to get before it becomes impossible to represent the integer part of the number before you can't reliably represent the integer part any more? (I'm not sure if that's really clear, so for example, say the limit is 1000 before the resolution drops below the difference between one int and the next. I could enter a digit bigger than 1000, say 1001, but the limitations of floating point numbers means it ends up being 1001.9 and rounding it yields 1002, meaning I've lost the value I was interested in).
Is it possible to detect when the loss in resolution will become an issue for a floating point number?
EDIT TO ADD: I suppose I could modify the extension to accept a string instead of a numeric type and then verify that it contains only digits with a regex or some other similar technique, but as Luhn-checkable data is a string of digits that doesn't feel right to me, somehow. There are extensions for PHP that can handle bignums, but as they're extensions and this is meant to be a piece of framework code that could potentially be deployed over a wide range of configurations, I'd rather not rely on the presence of such extensions if at all possible. Besides, none of the above addresses the issue that if you give PHP a big int it silently converts it to float. I need a way of detecting that this has happened.

If you need precision, you should not use floats.
Instead, especially as you want to work with integers (if I understand correctly), you could try working with the bc* functions : BCMath Arbitrary Precision Mathematics

If you need precision, you should not use floats.
Instead, especially as you want to work with integers (if I understand correctly), you could try working with the gmp* functions: GMP - GNU Multiple Precision
If you cannot work with that extension you might get some additional ideas from
PEAR Big Integer - Pure-PHP arbitrary precision integer arithmetic library

Related

how to create a row of digits based on a string in php [duplicate]

In php is there a way to give a unique hash from a string, but that the hash was made up from numbers only?
example:
return md5(234); // returns 098f6bcd4621d373cade4e832627b4f6
but I need
return numhash(234); // returns 00978902923102372190
(20 numbers only)
the problem here is that I want the hashing to be short.
edit:
OK let me explain the back story here.
I have a site that has a ID for every registered person, also I need a ID for the person to use and exchange (hence it can't be too long), so far the ID numbering has been 00001, 00002, 00003 etc...
this makes some people look more important
this reveals application info that I don't want to reveal.
To fix point 1 and 2 I need to "hide" the number while keeping it unique.
Edit + SOLUTION:
Numeric hash function based on the code by https://stackoverflow.com/a/23679870/175071
/**
* Return a number only hash
* https://stackoverflow.com/a/23679870/175071
* #param $str
* #param null $len
* #return number
*/
public function numHash($str, $len=null)
{
$binhash = md5($str, true);
$numhash = unpack('N2', $binhash);
$hash = $numhash[1] . $numhash[2];
if($len && is_int($len)) {
$hash = substr($hash, 0, $len);
}
return $hash;
}
// Usage
numHash(234, 20); // always returns 6814430791721596451
An MD5 or SHA1 hash in PHP returns a hexadecimal number, so all you need to do is convert bases. PHP has a function that can do this for you:
$bignum = hexdec( md5("test") );
or
$bignum = hexdec( sha1("test") );
PHP Manual for hexdec
Since you want a limited size number, you could then use modular division to put it in a range you want.
$smallnum = $bignum % [put your upper bound here]
EDIT
As noted by Artefacto in the comments, using this approach will result in a number beyond the maximum size of an Integer in PHP, and the result after modular division will always be 0. However, taking a substring of the hash that contains the first 16 characters doesn't have this problem. Revised version for calculating the initial large number:
$bignum = hexdec( substr(sha1("test"), 0, 15) );
You can try crc32(). See the documentation at: http://php.net/manual/en/function.crc32.php
$checksum = crc32("The quick brown fox jumped over the lazy dog.");
printf("%u\n", $checksum); // prints 2191738434
With that said, crc should only be used to validate the integrity of data.
There are some good answers but for me the approaches seem silly.
They first force php to create a Hex number, then convert this back (hexdec) in a BigInteger and then cut it down to a number of letters... this is much work!
Instead why not
Read the hash as binary:
$binhash = md5('[input value]', true);
then using
$numhash = unpack('N2', $binhash); //- or 'V2' for little endian
to cast this as two INTs ($numhash is an array of two elements). Now you can reduce the number of bits in the number simply using an AND operation. e.g:
$result = $numhash[1] & 0x000FFFFF; //- to get numbers between 0 and 1048575
But be warned of collisions! Reducing the number means increasing the probability of two different [input value] with the same output.
I think that the much better way would be the use of "ID-Crypting" with a Bijectiv function. So no collisions could happen! For the simplest kind just use an Affine_cipher
Example with max input value range from 0 to 25:
function numcrypt($a)
{
return ($a * 15) % 26;
}
function unnumcrypt($a)
{
return ($a * 7) % 26;
}
Output:
numcrypt(1) : 15
numcrypt(2) : 4
numcrypt(3) : 19
unnumcrypt(15) : 1
unnumcrypt(4) : 2
unnumcrypt(19) : 3
e.g.
$id = unnumcrypt($_GET('userid'));
... do something with the ID ...
echo ' go ';
of course this is not secure, but if no one knows the method used for your encryption then there are no security reasons then this way is faster and collision safe.
The problem of cut off the hash are the collisions, to avoid it try:
return hexdec(crc32("Hello World"));
The crc32():
Generates the cyclic redundancy checksum polynomial of 32-bit lengths
of the str. This is usually used to validate the integrity of data
being transmitted.
That give us an integer of 32 bit, negative in 32 bits installation, or positive in the 64 bits. This integer could be store like an ID in a database. This don´t have collision problems, because it fits into 32bits variable, once you convert it to decimal with the hexdec() function.
First of all, md5 is basically compromised, so you shouldn't be using it for anything but non-critical hashing.
PHP5 has the hash() function, see http://www.php.net/manual/en/function.hash.php.
Setting the last parameter to true will give you a string of binary data. Alternatively, you could split the resulting hexadecimal hash into pieces of 2 characters and convert them to integers individually, but I'd expect that to be much slower.
Try hashid.
It hash a number into format you can define. The formats include how many character, and what character included.
Example:
$hashids->encode(1);
Will return "28630" depends on your format,
Just use my manual hash method below:
Divide the number (e.g. 6 digit) by prime values, 3,5,7.
And get the first 6 values that are in the decimal places as the ID to be used. Do a check on uniqueness before actual creation of the ID, if a collision exists, increase the last digit by +1 until a non collision.
E.g. 123456 gives you 771428
123457 gives you 780952
123458 gives you 790476.

Reliable Margin of Error for Float -> String -> Float Conversion?

I have a float value that I need to store as a string in PHP and then compare later after casting back into a float.
Due to the conversion I know that relying on equality would be a mistake, as there's potential for a loss of precision, so I'm doing something like the following:
if (abs((float)$string_value - $float_value) < 0.001) { echo "Values are close enough\n"; }
Now, while a margin for error of 0.001 should be fine for my immediate purposes, it got me wondering; what is the smallest margin of error that I can reliably/safely use?
I realise that the safe margin of error will change with the size of the float (i.e- larger values have less or even no fractional precision), so an answer should probably account for this.
So to put it another way; given a float value that I want to store in base 10 and read back, how can I reliably decide what my margin of error should be such that I can reasonably confirm that the two values are the same?
Unfortunately the values I'm handling must be stored in plain decimal form, so my usual go-to of packing them as a network order 64-bit integer is not an option here ☹️
EDIT: To clarify; please assume that my question is about handling arbitrarily sized floats; the example code I've given is for a recent case where I'm handling floats within a limited range, so setting the margin of error manually is fine, but I'd like to be able to handle floats of any magnitude in future.
As mentioned in Mark Dickinson's comment, it is possible to convert a floating-point number to a string and back without losing precision. This only works if
you use enough significant decimal digits (17 for IEEE doubles)
the conversions are accurate (i.e. they're guaranteed to convert to the nearest number)
From a quick look, it seems that casting a double $f to a string in PHP, either implicitly or with (string) $f, only uses 14 significant digits, so this method isn't accurate enough. But you can use sprintf with a %.16e conversion specifier to get 17 significant digits. So after the following roundtrip
$s = sprintf("%.16e", $f);
$f2 = (double) $s;
$f2 should equal $f exactly unless PHP uses suboptimal algorithms internally.
Note that the %e conversion specifier uses scientific (exponential) notation. If you need plain decimal strings, you can use the %f specifier and calculate the required number of digits after the decimal point using log10:
if ($f != 0) {
$prec = 16 - floor(log10(abs($f)));
if ($prec < 0) $prec = 0;
}
else {
$prec = 0;
}
$s = sprintf("%.${prec}f", $f);
This can produce extremely long strings for very small or large numbers, though.
It would probably require a huge amount of research to tell the whether these methods are completely reliable, and if not what the maximum error is. It all depends on several implementation details like PHP version, underlying C library, etc.
Another idea is to compare the string representations instead of floating-point values:
# Assuming $string_value was also converted with float_to_string
if ($string_value == float_to_string($float_value)) {
echo "Values are close enough\n";
}
This should be reliable as long as you stick to the same PHP version.
If you must compare floating-point numbers, it often makes more sense to compare the relative error. See Bruce Dawson's excellent blog for more details.

accurate and reliable Base36 in PHP?

Here is my code so far:
function base36($value, $return_size)
{
$base36 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
$buffer = str_pad("", $return_size);
$offset = $return_size;
do {
$buffer[--$offset] = $base36[$value % 36];
} while ($value /= 36);
return $buffer;
}
$value: 64bit integer
$return_size: the expected size in bytes the function should return
It doesn't work correctly, because the $value is 64bit integer and because PHP forces double divisions. PHP seems pretty limited when it's about 64bit integers. How to make the above code work like the exact C version would?
Native:
base_convert is a function which happens to do exactly what you want to do.
string base_convert ( string $number , int $frombase , int $tobase )
Its input structure is limited between base 2 and base 36 so it covers what you need. It is most likely (like many other PHP functions) just a light wrapper over the C library originals.
GMP:
gmp_strval is another function which happens to do exactly what you want to do -- it also has better precision (because GMP is a multiprecision arithmetic library).
GMP values need to be initialized from strings using gmp_init and the resulting value (is a resource) is used in all subsequent GMP arithmetic function calls.
It has a higher number of available bases (from 2 to 62) but it is a bit less fun to work with because of the requirement to init the values and use them as resources.
The less fun part isn't true if you're running PHP 5.6 because GMP overloads the arithmetic operators in that version allowing GMP objects (resources?) to be added substracted etc. by using the operators.
Here's a simple GMP conversion function example that doesn't require that you initialize values with gmp_init:
function gmp_convert($num, $base_a, $base_b)
{
return gmp_strval ( gmp_init($num, $base_a), $base_b );
}

What are the maximum limit of bitwise operation on large integers in PHP?

I just realized something useful and hope to share:
I am running PHP x86 on a 64bit Windows 7 machine. I was trying to do a permission system where I have a line like this:
// value of $role came from database i.e. 0xffffffff
// value of $function_ACL is hardcoded in PHP file using 32bit hex notation i.e. 0x80000000
// true if access is allowed
return ($function_ACL & $role) != 0
Somehow, the value of $role is converted using the rules of intval(), thereby reaching the integer limit, and the result is wrongly 0.
To get around this problem, I noticed we can do this
$function_ACL += 0;
$role += 0;
return ($function_ACL & $role) != 0 // works! which is odd, because type conversion don't follow the same routine
This leads me to wonder what really are the limits, then trialed a few very large numbers
// these numbers get converted to scientific notation
echo 0xffffffffffffffffffffffffffffffff;
echo 9999999999999999999999999999999999
// 52 bits (13 f's) is the max limit for a correct bitwise operation
echo (0xfffffffffffff & 0x0000000000001);
Anyone has more to contribute?
Take a look on gmp. It allows bitwise operations on large numbers.
$role = gmp_init("0xffffffff");
$function_ACL = gmp_init("0x80000000");
if (gmp_and($role, $function_ACL))
echo "yes!";

How to compare two 64 bit numbers

In PHP I have a 64 bit number which represents tasks that must be completed. A second 64 bit number represents the tasks which have been completed:
$pack_code = 1001111100100000000000000011111101001111100100000000000000011111
$veri_code = 0000000000000000000000000001110000000000000000000000000000111110
I need to compare the two and provide a percentage of tasks completed figure. I could loop through both and find how many bits are set, but I don't know if this is the fastest way?
Assuming that these are actually strings, perhaps something like:
$pack_code = '1001111100100000000000000011111101001111100100000000000000011111';
$veri_code = '0000000000000000000000000001110000000000000000000000000000111110';
$matches = array_intersect_assoc(str_split($pack_code),str_split($veri_code));
$finished_matches = array_intersect($matches,array(1));
$percentage = (count($finished_matches) / 64) * 100
Because you're getting the numbers as hex strings instead of ones and zeros, you'll need to do a bit of extra work.
PHP does not reliably support numbers over 32 bits as integers. 64-bit support requires being compiled and running on a 64-bit machine. This means that attempts to represent a 64-bit integer may fail depending on your environment. For this reason, it will be important to ensure that PHP only ever deals with these numbers as strings. This won't be hard, as hex strings coming out of the database will be, well, strings, not ints.
There are a few options here. The first would be using the GMP extension's gmp_xor function, which performs a bitwise-XOR operation on two numbers. The resulting number will have bits turned on when the two numbers have opposing bits in that location, and off when the two numbers have identical bits in that location. Then it's just a matter of counting the bits to get the remaining task count.
Another option would be transforming the number-as-a-string into a string of ones and zeros, as you've represented in your question. If you have GMP, you can use gmp_init to read it as a base-16 number, and use gmp_strval to return it as a base-2 number.
If you don't have GMP, this function provided in another answer (scroll to "Step 2") can accurately transform a string-as-number into anything between base-2 and 36. It will be slower than using GMP.
In both of these cases, you'd end up with a string of ones and zeros and can use code like that posted by #Mark Baker to get the difference.
Optimization in this case is not worth of considering. I'm 100% sure that you don't really care whether your scrip will be generated 0.00000014 sec. faster, am I right?
Just loop through each bit of that number, compare it with another and you're done.
Remember words of Donald Knuth:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
This code utilizes the GNU Multi Precision library, which is supported by PHP, and since it is implemented in C, should be fast enough, and supports arbitrary precision.
$pack_code = gmp_init("1001111100100000000000000011111101001111100100000000000000011111", 2);
$veri_code = gmp_init("0000000000000000000000000001110000000000000000000000000000111110", 2);
$number_of_different_bits = gmp_popcount(gmp_xor($pack_code, $veri_code));
$a = 11111;
echo sprintf('%032b',$a)."\n";
$b = 12345;
echo sprintf('%032b',$b)."\n";
$c = $a & $b;
echo sprintf('%032b',$c)."\n";
$n=0;
while($c)
{
$n += $c & 1;
$c = $c >> 1;
}
echo $n."\n";
Output:
00000000000000000010101101100111
00000000000000000011000000111001
00000000000000000010000000100001
3
Given your PHP-setuo can handle 64bit, this can be easily extended.
If not you can sidestep this restriction using GNU Multiple Precision
You could also split up the HEx-Representation and then operate on those coresponding parts parts instead. As you need just the local fact of 1 or 0 and not which number actually is represented! I think that would solve your problem best.
For example:
0xF1A35C and 0xD546C1
you just compare the binary version of F and D, 1 and 5, A and 4, ...

Categories