Accurate and reliable Base36 in PHP?

Here is my code so far:
function base36($value, $return_size)
{
$base36 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
$buffer = str_pad("", $return_size);
$offset = $return_size;
do {
$buffer[--$offset] = $base36[$value % 36];
} while ($value /= 36);
return $buffer;
}
$value: 64bit integer
$return_size: the expected size in bytes the function should return
It doesn't work correctly, because $value is a 64-bit integer and PHP's division operator produces a float. PHP seems pretty limited when it comes to 64-bit integers. How can I make the above code behave exactly like the C version would?

Native:
base_convert is a built-in function that performs this kind of conversion.
string base_convert ( string $number , int $frombase , int $tobase )
It accepts bases from 2 to 36, so it covers what you need. Be aware, though, that the manual warns it may lose precision on large numbers because it relies on PHP's float type internally, so it is not dependable across the full 64-bit range.
GMP:
gmp_strval is another function that does what you want, and it has better precision because GMP is a multiple-precision arithmetic library.
GMP values need to be initialized from strings using gmp_init, and the resulting value (a resource before PHP 5.6, a GMP object from 5.6 on) is used in all subsequent GMP arithmetic function calls.
It supports more bases (from 2 to 62), but it is a bit less fun to work with because you have to initialize the values and pass them around.
The less fun part no longer applies if you're running PHP 5.6 or later, because from that version on GMP overloads the arithmetic operators, so GMP objects can be added, subtracted, etc. with the ordinary operators.
Here's a simple GMP conversion function that calls gmp_init for you, so the caller doesn't have to initialize values:
function gmp_convert($num, $base_a, $base_b)
{
    // $num is the number as a string written in base $base_a
    return gmp_strval(gmp_init($num, $base_a), $base_b);
}
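If GMP is not available and the value fits in a native integer on a 64-bit build, the loop from the question can also be kept, with the float division swapped for integer division. The following is a minimal sketch rather than a drop-in replacement: it assumes PHP 7 or later (for intdiv), a non-negative input, and zero padding instead of the space padding used in the question. The name base36_fixed is made up for illustration.

```php
<?php
// Hypothetical helper: fixed-width Base36 encoding using integer-only arithmetic.
// Assumes PHP 7+ (intdiv) and a non-negative value that fits in a native int.
function base36_fixed(int $value, int $return_size): string
{
    if ($value < 0 || $return_size < 1) {
        throw new InvalidArgumentException('Expected a non-negative value and a positive size.');
    }
    $alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    // Pad with '0' (an assumption; the question's code pads with spaces).
    $buffer = str_repeat('0', $return_size);
    $offset = $return_size;
    do {
        if ($offset === 0) {
            throw new OverflowException('Value does not fit in the requested size.');
        }
        $buffer[--$offset] = $alphabet[$value % 36];
        // intdiv() keeps the quotient an integer; the / operator would yield a float.
        $value = intdiv($value, 36);
    } while ($value > 0);
    return $buffer;
}

echo base36_fixed(36, 4), "\n"; // prints "0010"
```

On a 32-bit build a 64-bit value never arrives as an integer in the first place, so this sketch only helps where PHP_INT_SIZE is 8; otherwise the string-based GMP route above is the safer choice.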

Related

Coping with big ints in 32 bit PHP

I have a class for computing the Luhn checksum for a number. It takes an integer as input and returns true or false to indicate whether the number is valid, or throws an exception if an inappropriate data type is given as input.
The code is as follows (The full source is on GitHub):
class Luhn extends abstr\Prop implements iface\Prop
{
/**
* Test that the given data passes a Luhn check.
*
* @return bool True if the data passes the Luhn check
* @throws \InvalidArgumentException
* @see http://en.wikipedia.org/wiki/Luhn_algorithm
*/
public function isValid ()
{
$data = $this -> getData ();
$valid = false;
switch (gettype ($data))
{
case 'NULL' :
$valid = true;
break;
case 'integer' :
// Get the sequence of digits that make up the number under test
$digits = array_reverse (array_map ('intval', str_split ((string) $data)));
// Walk the array, doubling the value of every second digit
for ($i = 0, $count = count ($digits); $i < $count; $i++)
{
if ($i % 2)
{
// Double the digit
if (($digits [$i] *= 2) > 9)
{
// Handle the case where the doubled digit is over 9
$digits [$i] -= 10;
$digits [] = 1;
}
}
}
// The Luhn is valid if the sum of the digits ends in a 0
$valid = ((array_sum ($digits) % 10) === 0);
break;
default :
// An attempt was made to apply the check to an invalid data type
throw new \InvalidArgumentException (__CLASS__ . ': This property cannot be applied to data of type ' . gettype ($data));
break;
}
return ($valid);
}
}
I also built a full unit test to exercise the class.
My main development environment is a workstation running 64-bit builds of PHP 5.3 and Apache under OS X Lion. I also use a laptop running 64-bit builds of Apache and PHP 5.4, and an Ubuntu Linux virtual machine running 64-bit Apache and PHP 5.3. The unit test passed on all of these, as expected.
I thought I could use some spare time during lunch at work (Windows 7, XAMPP, 32-bit PHP 5.3) to work on the project this class is part of, but the first thing I ran into was a failing unit test.
The problem is that on a 32 bit build of PHP the number gets silently cast to float if it exceeds the limits of a 32 bit integer. My proposed solution is to have a special case for float. If the input type is float, and its value is outside the range that can be expressed in int (PHP_INT_MIN .. PHP_INT_MAX) then I'll number_format() it to get it back into a string of digits. If it's within the range of an integer then I'll throw an exception.
However, this leads to its own problem. I know that the further away you get from 0 with a floating point number, the less resolution the number has (the larger the gap between a given number and the next representable number gets). How far away from 0 do you have to get before you can no longer reliably represent the integer part of the number? (I'm not sure if that's really clear, so for example, say the limit is 1000 before the resolution drops below the difference between one int and the next. I could enter a number bigger than 1000, say 1001, but the limitations of floating point numbers mean it ends up being 1001.9 and rounding it yields 1002, meaning I've lost the value I was interested in.)
Is it possible to detect when the loss in resolution will become an issue for a floating point number?
EDIT TO ADD: I suppose I could modify the extension to accept a string instead of a numeric type and then verify that it contains only digits with a regex or some other similar technique, but as Luhn-checkable data is a string of digits that doesn't feel right to me, somehow. There are extensions for PHP that can handle bignums, but as they're extensions and this is meant to be a piece of framework code that could potentially be deployed over a wide range of configurations, I'd rather not rely on the presence of such extensions if at all possible. Besides, none of the above addresses the issue that if you give PHP a big int it silently converts it to float. I need a way of detecting that this has happened.
If you need precision, you should not use floats.
Instead, especially as you want to work with integers (if I understand correctly), you could try working with the bc* functions : BCMath Arbitrary Precision Mathematics
If you need precision, you should not use floats.
Instead, especially as you want to work with integers (if I understand correctly), you could try working with the gmp* functions: GMP - GNU Multiple Precision
If you cannot work with that extension you might get some additional ideas from
PEAR Big Integer - Pure-PHP arbitrary precision integer arithmetic library
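Where none of these libraries can be assumed, one extension-free route is the one the question's edit hints at: accept the number as a string of digits, so that nothing is ever converted to float. A minimal sketch follows; the function names are hypothetical and not part of the asker's framework. The second helper relies on PHP floats being IEEE 754 doubles, which holds on common platforms but is an assumption here: a double represents every integer exactly only below 2^53 in magnitude.

```php
<?php
// Hypothetical helper: Luhn check on a string of decimal digits (no extensions needed).
function luhn_is_valid_string(string $digits): bool
{
    if ($digits === '' || !ctype_digit($digits)) {
        throw new InvalidArgumentException('Expected a non-empty string of decimal digits.');
    }
    $sum = 0;
    $length = strlen($digits);
    for ($i = 0; $i < $length; $i++) {
        // Walk from the rightmost digit towards the left.
        $digit = (int) $digits[$length - 1 - $i];
        if ($i % 2 === 1) {
            // Double every second digit; subtracting 9 equals summing the two digits.
            $digit *= 2;
            if ($digit > 9) {
                $digit -= 9;
            }
        }
        $sum += $digit;
    }
    return $sum % 10 === 0;
}

// Hypothetical helper: true when a float is a whole number small enough that it
// cannot be the rounded form of some other integer (assumes IEEE 754 doubles).
function float_is_safe_integer(float $value): bool
{
    return is_finite($value) && floor($value) === $value && abs($value) < 2 ** 53;
}

var_dump(luhn_is_valid_string('79927398713')); // bool(true)
```

A float that fails float_is_safe_integer may already have lost digits by the time the check runs, so the check can only reject the input; it cannot recover the original value.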

php5 pack is broken on x86_64 env

pack('H*', dechex(12345678900)) /* on 32bit */
!= pack('H*', dechex(12345678900)) /* on 64bit */
why ?
I don't know how to fix it, but I think I know why this is happening. No bug here; this is straight out of the manual: http://php.net/manual/en/function.dechex.php
The largest number that can be converted is 4294967295 in decimal resulting to "ffffffff"
I do not know exactly what is happening inside PHP, but you are probably causing a 32-bit unsigned integer to overflow (12,345,678,900 > 4,294,967,295). Since on 64-bit this limit should be 18,446,744,073,709,551,615, dechex is returning "correct" values there (the 32- vs 64-bit difference doesn't seem to be documented, and I might be wrong since I don't have a 64-bit system for testing).
//Edit:
As a last resort you could use the GMP extension to write your own dechex function for 32-bit systems, but that is going to add a lot of overhead and will probably be very slow.
//Edit2:
I wrote a function using BCMath instead; I'm on Windows at the moment and was struggling to find the correct DLL for GMP.
function dechex32($i) {
    // Cast to string so BCMath sees every digit
    $i = (string)$i;
    // Zero never enters the loop below, so handle it up front
    if (bccomp($i, '0') === 0) {
        return '0';
    }
    // Initialize result string
    $r = '';
    // Map hex values 0-9, a-f to array keys
    $hex = array_merge(range(0, 9), range('a', 'f'));
    // While input is larger than 0
    while (bccomp($i, '0') > 0) {
        // Modulo 16 and append hex char to result
        $r .= $hex[$mod = bcmod($i, '16')];
        // i = (i - mod) / 16
        $i = bcdiv(bcsub($i, $mod), '16');
    }
    // Reverse result and return
    return strrev($r);
}
var_dump(dechex32('12345678900'));
/*string(9) "2dfdc1c34"*/
I didn't test it thoroughly, but it seems to work. Use it as a last resort: rough benchmarking with 100,000 iterations showed that it's ~40 times slower than the native implementation.
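One detail to watch when feeding such a string back into pack('H*', ...): the result above has nine hex digits, and with an odd digit count pack fills the missing low nibble of the last byte with zero, which shifts every byte. A small sketch of left-padding to an even length first (the helper name pack_hex is an assumption, not something from the answer):

```php
<?php
// Hypothetical helper: pack a hex string of any length into bytes.
function pack_hex(string $hex): string
{
    // Left-pad to an even digit count so that each byte gets two nibbles.
    if (strlen($hex) % 2 === 1) {
        $hex = '0' . $hex;
    }
    return pack('H*', $hex);
}

echo bin2hex(pack_hex('2dfdc1c34')), "\n"; // prints "02dfdc1c34"
```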

How to compare two 64 bit numbers

In PHP I have a 64 bit number which represents tasks that must be completed. A second 64 bit number represents the tasks which have been completed:
$pack_code = 1001111100100000000000000011111101001111100100000000000000011111
$veri_code = 0000000000000000000000000001110000000000000000000000000000111110
I need to compare the two and provide a percentage of tasks completed figure. I could loop through both and find how many bits are set, but I don't know if this is the fastest way?
Assuming that these are actually strings, perhaps something like:
$pack_code = '1001111100100000000000000011111101001111100100000000000000011111';
$veri_code = '0000000000000000000000000001110000000000000000000000000000111110';
$matches = array_intersect_assoc(str_split($pack_code),str_split($veri_code));
$finished_matches = array_intersect($matches,array(1));
$percentage = (count($finished_matches) / 64) * 100;
Because you're getting the numbers as hex strings instead of ones and zeros, you'll need to do a bit of extra work.
PHP does not reliably support numbers over 32 bits as integers. 64-bit support requires being compiled and running on a 64-bit machine. This means that attempts to represent a 64-bit integer may fail depending on your environment. For this reason, it will be important to ensure that PHP only ever deals with these numbers as strings. This won't be hard, as hex strings coming out of the database will be, well, strings, not ints.
There are a few options here. The first would be using the GMP extension's gmp_xor function, which performs a bitwise-XOR operation on two numbers. The resulting number will have bits turned on when the two numbers have opposing bits in that location, and off when the two numbers have identical bits in that location. Then it's just a matter of counting the bits to get the remaining task count.
Another option would be transforming the number-as-a-string into a string of ones and zeros, as you've represented in your question. If you have GMP, you can use gmp_init to read it as a base-16 number, and use gmp_strval to return it as a base-2 number.
If you don't have GMP, this function provided in another answer (scroll to "Step 2") can accurately transform a string-as-number into anything between base-2 and 36. It will be slower than using GMP.
In both of these cases, you'd end up with a string of ones and zeros and can use code like that posted by @Mark Baker to get the difference.
Optimization in this case is not worth considering. I'm 100% sure that you don't really care whether your script finishes 0.00000014 sec. faster, am I right?
Just loop through each bit of that number, compare it with another and you're done.
Remember words of Donald Knuth:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
This code uses the GNU Multiple Precision (GMP) library, which PHP supports through an extension. Since it is implemented in C it should be fast enough, and it supports arbitrary precision.
$pack_code = gmp_init("1001111100100000000000000011111101001111100100000000000000011111", 2);
$veri_code = gmp_init("0000000000000000000000000001110000000000000000000000000000111110", 2);
$number_of_different_bits = gmp_popcount(gmp_xor($pack_code, $veri_code));
$a = 11111;
echo sprintf('%032b',$a)."\n";
$b = 12345;
echo sprintf('%032b',$b)."\n";
$c = $a & $b;
echo sprintf('%032b',$c)."\n";
$n=0;
while($c)
{
$n += $c & 1;
$c = $c >> 1;
}
echo $n."\n";
Output:
00000000000000000010101101100111
00000000000000000011000000111001
00000000000000000010000000100001
3
Given that your PHP setup can handle 64-bit integers, this can easily be extended.
If not, you can sidestep this restriction using GNU Multiple Precision.
You could also split up the hex representation and operate on the corresponding parts instead, since you only need to know whether each bit is 1 or 0, not which number is actually represented. I think that would solve your problem best.
For example:
0xF1A35C and 0xD546C1
you just compare the binary version of F and D, 1 and 5, A and 4, ...
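The digit-by-digit idea can be sketched without any extension. The helper names below are hypothetical, and the sketch makes one assumption about what "percentage" means: it counts the bits set in both numbers and divides by the number of bits set in the required mask, rather than by 64.

```php
<?php
// Hypothetical helpers; pure PHP, no GMP or BCMath required.

// Expand a hex string into a string of '0'/'1' characters, four per hex digit.
function hex_to_bits(string $hex): string
{
    if ($hex === '' || !ctype_xdigit($hex)) {
        throw new InvalidArgumentException('Expected a non-empty hex string.');
    }
    $bits = '';
    foreach (str_split($hex) as $nibble) {
        $bits .= str_pad(decbin(hexdec($nibble)), 4, '0', STR_PAD_LEFT);
    }
    return $bits;
}

// Percentage of required tasks (bits set in $requiredHex) that are also set in $doneHex.
function percent_complete(string $requiredHex, string $doneHex): float
{
    // Left-pad the shorter input so both bit strings line up.
    $width = max(strlen($requiredHex), strlen($doneHex));
    $required = hex_to_bits(str_pad($requiredHex, $width, '0', STR_PAD_LEFT));
    $done = hex_to_bits(str_pad($doneHex, $width, '0', STR_PAD_LEFT));

    $requiredCount = substr_count($required, '1');
    if ($requiredCount === 0) {
        return 100.0; // nothing was required
    }
    // & on two strings works byte by byte: '1' & '1' is '1', anything else here is '0'.
    $completed = substr_count($required & $done, '1');
    return 100.0 * $completed / $requiredCount;
}

// The two values from the question, written in hex.
echo percent_complete('9F20003F4F90001F', '0000001C0000003E'), "\n"; // prints 28
```

Because the numbers stay strings from start to finish, the result is the same on 32-bit and 64-bit builds.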

Using bit operations on 64 bits integers in 32 bit systems (no php_gmp extension)

I found a solution, "Efficient way of doing 64 bit rotate using 32 bit values", but it's not in PHP.
The biggest problem is that I get from remote server big integer 9223372036854775808(10) as hexadecimal 8000000000000000(16).
There is no chance to enable php_gmp (extension) on production server but I have to check selected bits in received value. Both, production and development server are 32bits machines.
You can accomplish this using BC Math (Arbitrary Precision Mathematics):
BC Math allows you to perform mathematical operations on numbers of arbitrary size. The difference between using arithmetic operators and using BC Math is that instead of storing the number as an integer or a float, BC Math takes and returns numbers as strings.
http://php.net/manual/en/ref.bc.php
PHP has to be compiled with BC Math; however most PHP installs should have this.
Unfortunately, PHP's bitwise operators won't treat these numeric strings as 64-bit numbers, and BC Math doesn't have any built-in bitwise functions. However, after a bit of Googling, I found the following code sample, which I've copied here:
function bitValue($no) { return bcpow(2, $no); }
function bitSet($no, $value) {
$tmp = bcmod($value, bitValue($no+1));
return bccomp(bcsub($tmp, bitValue($no)), 0)>= 0;
}
echo bitSet(49, bitValue(48)) ."\n";
echo bitSet(48, bitValue(48)) ."\n";
echo bitSet(47, bitValue(48)) ."\n";
(Credits to hernst42)
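Since the value in the question arrives as a hexadecimal string, a single bit can also be tested straight from that string without BC Math or GMP: each hex digit holds four bits, so bit n lives in the digit floor(n / 4) positions from the right. A minimal sketch under that reading (the function name is hypothetical, bits are numbered from 0 at the least significant end, and intdiv needs PHP 7 or later):

```php
<?php
// Hypothetical helper: test one bit of a number given as a hex string.
function hex_bit_is_set(string $hex, int $bit): bool
{
    if ($hex === '' || !ctype_xdigit($hex) || $bit < 0) {
        throw new InvalidArgumentException('Expected a hex string and a non-negative bit index.');
    }
    // Locate the hex digit that contains the requested bit.
    $index = strlen($hex) - 1 - intdiv($bit, 4);
    if ($index < 0) {
        return false; // bit lies beyond the most significant digit
    }
    // A single hex digit is at most 15, so ordinary integer operators are safe here.
    return ((hexdec($hex[$index]) >> ($bit % 4)) & 1) === 1;
}

var_dump(hex_bit_is_set('8000000000000000', 63)); // bool(true)
```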

PHP 64 bit numbers?

I have a simple function that I'm using but for some reason the number doesn't calculate correctly as it would in a calculator. I think it has something to do with the numbers being too large, or something to do with 64 bit. Is there any way I can convert them so that they would work correctly?
$sSteamComID = 76561197990369545;
$steamBase = 76561197960265728;
function convertToSteamID($sSteamComID) {
$sServer = bcmod($sSteamComID, '2') == '0' ? '0' : '1';
$sCommID = bcsub($sSteamComID, $sServer);
$sCommID = bcsub($sCommID, $steamBase);
$sAuth = bcdiv($sCommID, '2');
echo "$sServer:$sAuth";
}
convertToSteamID($sSteamComID);
This function outputs 0:15051912 on a server when it should be printing 1:15051908
The missing global $steamBase was the problem, as already mentioned in a comment. (Tip: turn on E_NOTICE during development.) However, I'd like to address your question:
I think it has something to do with the numbers being too large, or something to do with 64 bit. Is there any way I can convert them so that they would work correctly?
PHP integers are signed and platform-dependent. Using 64-bit numbers will not work if you are on a 32-bit host.
So your concern is valid. But even on a 64-bit system:
$x = 9223372036854775808; // highest bit (64th) set
var_dump($x);
--> float(9.2233720368548E+18)
Note that PHP's BC Math routines operate on strings, not integers. Thus, you should be storing your big numbers as strings.
This will work around the potential problem of integers being converted to floats, which will happen even on your 64-bit environment if you are using large, unsigned integers.
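Putting both points together, a corrected version might look like the sketch below. It is an illustration, not the asker's final code: the function name and the default-argument approach are assumptions, it requires the BCMath extension, and the explicit scale argument to bcmod needs PHP 7.2 or later.

```php
<?php
// Hypothetical rewrite: all big numbers are passed as strings, and the base is a
// parameter instead of an out-of-scope global. Requires the BCMath extension.
function convert_to_steam_id(string $communityId, string $steamBase = '76561197960265728'): string
{
    // The lowest bit selects the server; scale 0 keeps every result a whole number.
    $server = bcmod($communityId, '2', 0) === '0' ? '0' : '1';
    $offset = bcsub(bcsub($communityId, $server, 0), $steamBase, 0);
    $auth = bcdiv($offset, '2', 0);
    return "$server:$auth";
}

// With BCMath loaded, convert_to_steam_id('76561197990369545') returns "1:15051908".
```

Writing the IDs as quoted strings matters as much as the scope fix: an unquoted literal beyond PHP_INT_MAX becomes a float before BC Math ever sees it.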
