Ok, so PHP isn't the best language to be dealing with arbitrarily large integers in, considering that it only natively supports 32-bit signed integers (64-bit on 64-bit builds). What I'm trying to do, though, is create a class that can represent an arbitrarily large binary number and perform simple arithmetic operations on two of them (add/subtract/multiply/divide).
My target is dealing with 128-bit integers.
There are a couple of approaches I'm looking at, and problems I see with each. Any input or commentary on what you would choose and how you might go about it would be greatly appreciated.
Approach #1: Create a 128-bit integer class that stores its integer internally as four 32-bit integers. The only problem with this approach is that I'm not sure how to go about handling overflow/underflow issues when manipulating individual chunks of the two operands.
Approach #2: Use the bcmath extension, as this looks like something it was designed to tackle. My only worry in taking this approach is the scale setting of the bcmath extension, because there can't be any rounding errors in my 128-bit integers; they must be precise. I'm also worried about being able to eventually convert the result of the bcmath functions into a binary string (which I'll later need to shove into some mcrypt encryption functions).
Approach #3: Store the numbers as binary strings (probably LSB first). Theoretically I should be able to store integers of any size this way. All I would have to do is write the four basic arithmetic functions to perform add/sub/mult/div on two binary strings and produce a binary string result. This is exactly the format I need to hand over to mcrypt as well, so that's an added plus. This is the approach I think has the most promise at the moment, but the one sticking point is that PHP doesn't offer me any way (that I know of) to manipulate individual bits. I believe I'd have to break it up into byte-sized chunks (no pun intended), at which point my questions about handling overflow/underflow from Approach #1 apply.
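The overflow/underflow question in Approach #1 (and in the byte-chunk version of Approach #3) usually comes down to propagating a carry or borrow from one chunk into the next. A minimal sketch, assuming a 64-bit PHP build so that the sum of two 32-bit chunks cannot overflow a native int; the function names and the four-limb array layout (least significant limb first) are illustrative choices, not an existing API:

```php
<?php
// Assumes PHP_INT_SIZE === 8: a 128-bit unsigned value is an array of four
// 32-bit limbs, least significant limb first.
const LIMB_BITS = 32;
const LIMB_MASK = 0xFFFFFFFF;

// Add two 128-bit values; a carry out of the top limb is dropped (wraps mod 2^128).
function add128(array $a, array $b): array
{
    $result = [];
    $carry = 0;
    for ($i = 0; $i < 4; $i++) {
        $sum = $a[$i] + $b[$i] + $carry; // at most 2^33 - 1, fits in a 64-bit int
        $result[$i] = $sum & LIMB_MASK;  // keep the low 32 bits in this limb
        $carry = $sum >> LIMB_BITS;      // 0 or 1, carried into the next limb
    }
    return $result;
}

// Subtract $b from $a; a borrow out of the top limb wraps mod 2^128.
function sub128(array $a, array $b): array
{
    $result = [];
    $borrow = 0;
    for ($i = 0; $i < 4; $i++) {
        $diff = $a[$i] - $b[$i] - $borrow;
        if ($diff < 0) {
            $diff += 1 << LIMB_BITS; // borrow 2^32 from the next limb
            $borrow = 1;
        } else {
            $borrow = 0;
        }
        $result[$i] = $diff;
    }
    return $result;
}
```

Multiplication follows the same pattern, but the product of two 32-bit limbs needs up to 64 bits, which overflows PHP's signed int, so 16-bit limbs are the safer choice there.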
The PHP GMP extension will be better for this. As an added bonus, you can use it to do your decimal-to-binary conversion, like so:
gmp_strval(gmp_init($n, 10), 2);
There are already various classes available for this so you may wish to look at them before writing your own solution (if indeed writing your own solution is still needed).
As far as I can tell, the bcmath extension is the one you'll want. The data in the PHP manual is a little sparse, but you ought to be able to set the precision to exactly what you need by using the bcscale() function, or the optional third parameter in most of the other bcmath functions. I'm not too sure about the binary strings thing, but a bit of googling tells me you ought to be able to do this by making use of the pack() function.
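bcmath results are plain decimal strings, and pack() only packs native-size integers, so turning a 128-bit result into a binary string for mcrypt is really a small base conversion. A minimal pure-PHP sketch (both helper names are hypothetical) that repeatedly divides the decimal string by 256 and collects the remainders as bytes; with bcmath you could do the same division with bcdiv() and bcmod() at scale 0:

```php
<?php
// Divide a non-negative decimal string by a small integer with schoolbook
// long division; returns [quotient as a decimal string, integer remainder].
function divmodSmall(string $dec, int $divisor): array
{
    $quotient = '';
    $remainder = 0;
    $len = strlen($dec);
    for ($i = 0; $i < $len; $i++) {
        $remainder = $remainder * 10 + (int)$dec[$i];
        $digit = intdiv($remainder, $divisor);
        $remainder %= $divisor;
        if ($quotient !== '' || $digit !== 0) {
            $quotient .= (string)$digit; // skip leading zeros
        }
    }
    return [$quotient === '' ? '0' : $quotient, $remainder];
}

// Convert a non-negative decimal string to a big-endian binary string.
function decimalToBytes(string $dec): string
{
    $dec = ltrim($dec, '0');
    $bytes = '';
    while ($dec !== '' && $dec !== '0') {
        [$dec, $byte] = divmodSmall($dec, 256);
        $bytes = chr($byte) . $bytes; // remainders arrive least significant first
    }
    return $bytes === '' ? "\x00" : $bytes;
}
```

The result may be shorter than 16 bytes; str_pad($bytes, 16, "\x00", STR_PAD_LEFT) pads it to a fixed 128-bit width.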
I implemented the following PEMDAS-compliant BC evaluator, which may be useful to you.
function BC($string, $precision = 32)
{
if (extension_loaded('bcmath') === true)
{
if (is_array($string) === true)
{
if ((count($string = array_slice($string, 1)) == 3) && (bcscale($precision) !== false)) // since PHP 7.3, bcscale() returns the previous scale (an int), not true
{
$callback = array('^' => 'pow', '*' => 'mul', '/' => 'div', '%' => 'mod', '+' => 'add', '-' => 'sub');
if (array_key_exists($operator = current(array_splice($string, 1, 1)), $callback) === true)
{
$x = 1;
$result = @call_user_func_array('bc' . $callback[$operator], $string);
if ((strcmp('^', $operator) === 0) && (($i = fmod(array_pop($string), 1)) > 0))
{
$y = BC(sprintf('((%1$s * %2$s ^ (1 - %3$s)) / %3$s) - (%2$s / %3$s) + %2$s', $string = array_shift($string), $x, $i = pow($i, -1)));
do
{
$x = $y;
$y = BC(sprintf('((%1$s * %2$s ^ (1 - %3$s)) / %3$s) - (%2$s / %3$s) + %2$s', $string, $x, $i));
}
while (BC(sprintf('%s > %s', $x, $y)));
}
if (strpos($result = bcmul($x, $result), '.') !== false)
{
$result = rtrim(rtrim($result, '0'), '.');
if (preg_match(sprintf('~[.][9]{%u}$~', $precision), $result) > 0)
{
$result = bcadd($result, (strncmp('-', $result, 1) === 0) ? -1 : 1, 0);
}
else if (preg_match(sprintf('~[.][0]{%u}[1]$~', $precision - 1), $result) > 0)
{
$result = bcmul($result, 1, 0);
}
}
return $result;
}
return intval(version_compare(call_user_func_array('bccomp', $string), 0, $operator));
}
$string = array_shift($string);
}
$string = str_replace(' ', '', str_ireplace('e', ' * 10 ^ ', $string));
while (preg_match('~[(]([^()]++)[)]~', $string) > 0)
{
$string = preg_replace_callback('~[(]([^()]++)[)]~', __FUNCTION__, $string);
}
foreach (array('\^', '[\*/%]', '[\+-]', '[<>]=?|={1,2}') as $operator)
{
while (preg_match(sprintf('~(?<![0-9])(%1$s)(%2$s)(%1$s)~', '[+-]?(?:[0-9]++(?:[.][0-9]*+)?|[.][0-9]++)', $operator), $string) > 0)
{
$string = preg_replace_callback(sprintf('~(?<![0-9])(%1$s)(%2$s)(%1$s)~', '[+-]?(?:[0-9]++(?:[.][0-9]*+)?|[.][0-9]++)', $operator), __FUNCTION__, $string, 1);
}
}
}
return (preg_match('~^[+-]?[0-9]++(?:[.][0-9]++)?$~', $string) > 0) ? $string : false;
}
It automatically deals with rounding errors; just set the precision to however many digits you need.
As generic as this question seems, I'm having a really hard time learning specifically how to base-convert large, high-precision float values in PHP using BCMath.
I'm trying to base-convert something like
1234.5678900000
to
4D2.91613D31B
How can I do this?
I just want base-10 → base-16, but a conversion for arbitrary-base floats would probably make the most useful answer for others as well.
"How to convert a huge integer to hex in php?" involves BC, but only for integers.
https://www.exploringbinary.com/base-conversion-in-php-using-bcmath/ explores floats, but only in the context of decimal<->binary. It says extending the code to other bases is easy, and it probably is (using the code from the previous point), but I have no idea how to reason about the correctness of the result I'd get.
"Fast arbitrary-precision logarithms with bcmath" is also float-based, but in the context of reimplementing a high-precision log(). (There is a mention of converting bases in there, though, along with notes about how BC dumbly uses PHP's own pow() and loses precision.)
The other results I've found are just talking about PHP's own float coercion, and don't relate to BC at all.
Up to base 36 conversions with high precision
I think this question is just a bit too difficult for Stack Overflow. Not only do you want to base-convert floating-points, which is a bit unusual by itself, but it has to be done at high precision. This is certainly possible, but not many people will have a solution for this lying around and making one takes time. The math of base conversion is not very complex, and once you understand it you can work it out yourself.
Oh, well, to make a long story short, I couldn't resist this, and gave it a try.
<?php
function splitNo($operant)
// get whole and fractional parts of operant
{
if (strpos($operant, '.') !== false) {
$sides = explode('.',$operant);
return [$sides[0], '.' . $sides[1]];
}
return [$operant, ''];
}
function wholeNo($operant)
// get the whole part of an operant
{
return explode('.', $operant)[0];
}
function toDigits($number, $base, $scale = 0)
// convert a positive number n to its digit representation in base b
{
$symbols = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$digits = '';
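list($whole, $fraction) = splitNo($number);
if ($fraction === '') {
$fraction = '0'; // avoid handing an empty string to bcmul() below
}
while (bccomp($whole, '0.0', $scale) > 0) {
// string offsets need [] (the {} offset syntax was removed in PHP 8)
$digits = $symbols[(int)bcmod($whole, $base, $scale)] . $digits;
$whole = wholeNo(bcdiv($whole, $base, $scale));
}
if ($digits === '') {
$digits = '0'; // numbers below 1 still get a leading zero
}
if ($scale > 0) {
$digits .= '.';
for ($i = 1; $i <= $scale; $i++) {
$fraction = bcmul($fraction, $base, $scale);
$whole = wholeNo($fraction);
$fraction = bcsub($fraction, $whole, $scale);
$digits .= $symbols[(int)$whole];
}
}
return $digits;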
}
function toNumber($digits, $base, $scale = 0)
// compute the number given by digits in base b
{
$symbols = str_split('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ');
$number = '0';
list($whole, $fraction) = splitNo($digits);
foreach (str_split($whole) as $digit) {
$shiftUp = bcmul($base, $number, $scale);
$number = bcadd($shiftUp, array_search($digit, $symbols));
}
if ($fraction != '') {
$shiftDown = bcdiv('1', $base, $scale);
foreach (str_split(substr($fraction, 1)) as $symbol) {
$index = array_search($symbol, $symbols);
$number = bcadd($number, bcmul($index, $shiftDown, $scale), $scale);
$shiftDown = bcdiv($shiftDown, $base, $scale);
}
}
return $number;
}
function baseConv($operant, $fromBase, $toBase, $scale = 0)
// convert the digit representation of a number from base $fromBase to base $toBase
{
return toDigits(toNumber($operant, $fromBase, $scale), $toBase, $scale);
}
echo '<pre>';
print_r(baseConv('1234.5678900000', 10, 16, 60));
echo '</pre>';
The output is:
4D2.91613D31B9B66F9335D249E44FA05143BF727136A400FBA8826AA8EB4634
It looks a bit complicated, but isn't really. It just takes time. I started with converting whole numbers, then added fractions, and when that all worked I put in all the BC Math functions.
The $scale argument is the number of digits wanted after the radix point; it is also passed to the BC Math functions as their working scale.
It may look a bit strange that I use three functions for the conversion: toDigits(), toNumber() and baseConv(). The reason is that the BC Math functions work in base 10. So toDigits() converts from base 10 to another base, and toNumber() does the opposite. To convert between two arbitrary-base operands we need both functions, which results in the third: baseConv().
This could possibly be optimized further, if needed, but you haven't told us what you need it for, so optimization wasn't a priority for me. I just tried to make it work.
You can get higher-base conversions by simply adding more symbols. However, in the current implementation each symbol needs to be one character. With UTF-8 that doesn't really limit you, but make sure everything is multibyte-compatible (which it isn't at the moment).
NOTE: It seems to work, but I don't give any guarantees. Test thoroughly before use!
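One way to test the fractional digits independently of bcmath is to treat the fraction as an exact ratio of integers: .56789 is 56789/100000, and each multiplication by the target base pushes exactly one digit in front of the point. This is a different technique from the answer's code, shown as a minimal sketch: the function name is hypothetical, and it assumes a 64-bit build, a non-negative input whose integer part fits in a PHP int, and at most 15 fractional digits. Digits are truncated, not rounded:

```php
<?php
// Exact decimal -> base-b conversion with native ints (base 2..36).
function toBaseExact(string $decimal, int $base, int $fracDigits): string
{
    $symbols = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $parts = explode('.', $decimal, 2);
    $whole = (int)$parts[0];
    $fraction = rtrim($parts[1] ?? '', '0'); // trailing zeros don't change the value

    // Whole part: repeated division by the base yields digits right to left.
    $out = '';
    do {
        $out = $symbols[$whole % $base] . $out;
        $whole = intdiv($whole, $base);
    } while ($whole > 0);

    if ($fracDigits <= 0) {
        return $out;
    }

    // Fraction as the exact ratio $num / $den, e.g. .56789 = 56789 / 100000.
    $num = (int)($fraction === '' ? '0' : $fraction);
    $den = 10 ** strlen($fraction);
    $out .= '.';
    for ($i = 0; $i < $fracDigits; $i++) {
        $num *= $base;                        // shift one base-b digit past the point
        $out .= $symbols[intdiv($num, $den)]; // that digit is the integer part
        $num %= $den;                         // keep only the remaining fraction
    }
    return $out;
}

echo toBaseExact('1234.5678900000', 16, 9), "\n"; // 4D2.91613D31B
```

For the question's input, the first twelve hexadecimal fraction digits come out as 91613D31B9B6, matching the start of the bcmath output above.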
I've looked at php-bignumbers, BC Math, and GMP for dealing with very big numbers in PHP, but none of them seems to have a function equivalent to PHP's log(). For example, I want to do this:
$result = log($bigNumber, 2);
Would anyone know of an alternative way to get the log base 2 of an arbitrary-precision number in PHP? Maybe I've missed a function, library, or formula.
Edit: php-bignumbers seems to have only a base-10 log function, log10().
In general, if you want to implement your own high-precision log calculation, I'd suggest first using the basic properties of logarithms:
log_a(x) = log_b(x) / log_b(a), so you can recalculate a logarithm to any base
log(x*y) = log(x) + log(y)
log(a**n) = n*log(a)
where log_a(x) means the logarithm of x to the base a, and log means the natural logarithm.
So log(1000000000000000000000.123) = 21*log(10) + log(1.000000000000000000000123),
and for a high-precision log(1+x), use the algorithm referenced at
http://en.wikipedia.org/wiki/Natural_logarithm#High_precision
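Why the tiny term needs its own algorithm shows up even in ordinary floating point, for log(1000000000000000000000.123) = 21*log(10) + log(1 + x) with x = 1.23e-22: forming 1 + x first throws x away, while a function that evaluates log(1 + x) directly from x keeps it. A float-precision illustration only (not an arbitrary-precision implementation), using PHP's built-in log1p(); in a bcmath version, the log1p() step would be replaced by the high-precision algorithm from the linked article:

```php
<?php
// Float-precision illustration only (not arbitrary precision):
// log(1000000000000000000000.123) = 21*log(10) + log(1 + x), with x = 1.23e-22.
$x = 1.23e-22;

// Forming 1 + x in a double rounds it to exactly 1.0, so the small term is lost.
$naive = log(1.0 + $x);

// log1p() computes log(1 + x) from x itself and keeps the small term.
$small = log1p($x);

// The large part is computed separately, as the identity suggests.
$large = 21 * log(10);

printf("log(1 + x) computed naively: %g\n", $naive);
printf("log1p(x):                    %g\n", $small);
printf("log(N) = %.12f + %g\n", $large, $small);
```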
One solution combining the suggestions so far would be to use this formula:
log2($num) = log10($num) / log10(2)
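in conjunction with php-bignumbers, since it has a pre-made log10 function.
E.g., after installing the php-bignumbers library, use:
$log2 = log10($bigNum) / log10(2);
where the first log10() has to be the library's big-number log10 rather than PHP's built-in log10(), which only works on floats.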
Personally, I've decided to use different math/logic so that I don't need the log function, and just use bcmath for the big numbers.
One of the great things about base 2 is that counting and shifting become part of the tool set.
So one way to get a 'log2' of a number is to convert it to a binary string and count the bits.
You can accomplish this equivalently by dividing by 2 in a loop. But it seems to me that counting would be more efficient.
gmp_scan0 and gmp_scan1 can be used if you are counting from the right. But you'd have to somehow convert the mixed bits to all ones and zeroes.
But using gmp_strval(num, 2), you can produce a string and do a strpos on it.
If the whole value is being converted, you can simply take strlen() - 1 of the string.
Obviously this only works when you want an integer log.
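Without GMP, the same integer log2 can be had in pure PHP through the divide-by-2 loop mentioned above: halve a decimal string repeatedly and count the halvings. A minimal sketch; both function names are hypothetical, and the input is assumed to be a positive decimal integer string:

```php
<?php
// Halve a positive decimal integer string (rounding down) by long division by 2.
function halveDecimal(string $dec): string
{
    $out = '';
    $carry = 0;
    $len = strlen($dec);
    for ($i = 0; $i < $len; $i++) {
        $current = $carry * 10 + (int)$dec[$i];
        $out .= (string)intdiv($current, 2);
        $carry = $current % 2;
    }
    $out = ltrim($out, '0');
    return $out === '' ? '0' : $out;
}

// floor(log2(N)) for a positive decimal integer string: count the halvings
// needed to reach 1 (the same as the bit length minus one).
function floorLog2(string $dec): int
{
    $dec = ltrim($dec, '0');
    if ($dec === '') {
        throw new InvalidArgumentException('log2(0) is undefined');
    }
    $halvings = 0;
    while ($dec !== '1') {
        $dec = halveDecimal($dec);
        $halvings++;
    }
    return $halvings;
}

echo floorLog2('11002930366353704069'), "\n"; // 63
```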
I had a very similar problem just recently, so I scaled the number down considerably in order to use the built-in log to find the fractional part. (I prefer log10 for some reason. Don't ask... people are strange, me too.)
I hope this is self-explanatory enough.
It returns a float value (since that's what I needed).
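function gmp_log($num, $base=10, $full=true)
{
// the integer part of log_base($num) is (number of base-$base digits) - 1
if($base == 10)
$string = gmp_strval($num);
else
$string = gmp_strval($num,$base);
$intpart = strlen($string)-1;
if(!$full)
return $intpart;
if($base ==10)
{
// insert a decimal point after the first digit (i.e. divide by 10^$intpart)
// so the value fits in a float and the built-in log10() supplies the fraction
$string = substr_replace($string, ".", 1, 0);
$number = floatval($string);
$lg = $intpart + log10($number);
return $lg;
}
else
{
// compute log10 the same way from the decimal digits, then change base:
// log_b(N) = log10(N) / log10(b)
$string = gmp_strval($num);
$intpart = strlen($string)-1;
$string = substr_replace($string, ".", 1, 0);
$number = floatval($string);
$lg = $intpart + log10($number);
$lb = $lg / log10($base);
return $lb;
}
}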
It's quick, it's dirty... but it works well enough to get the log of some RSA-sized integers ;)
Usage is straightforward as well:
$N = gmp_init("11002930366353704069");
echo gmp_log($N,10)."\n";
echo gmp_log($N,10, false)."\n";
echo gmp_log($N,2)."\n";
echo gmp_log($N,16)."\n";
returns
19.041508364472
19
63.254521604973
15.813630401243
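The same scaling trick works directly on a decimal string, so GMP is only needed to produce the digits, not for the log itself. A minimal sketch (the function name is hypothetical) that assumes an unsigned decimal integer string and, like gmp_log(), gives a result with ordinary float accuracy:

```php
<?php
// Float-accuracy log of a huge unsigned decimal integer string, using the same
// scaling as gmp_log(): N = m * 10^k with 1 <= m < 10, so log10(N) = k + log10(m).
function decimalStringLog(string $digits, float $base = 10.0): float
{
    $digits = ltrim($digits, '0');
    if ($digits === '') {
        throw new InvalidArgumentException('log(0) is undefined');
    }
    $k = strlen($digits) - 1;
    // A double holds only about 17 significant digits, so the rest can be dropped.
    $mantissa = (float)($digits[0] . '.' . substr($digits, 1, 16));
    $log10 = $k + log10($mantissa);
    return $log10 / log10($base); // change of base: log_b(N) = log10(N) / log10(b)
}

printf("%.12f\n", decimalStringLog('11002930366353704069', 2));
```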
I have sets of 5, 6 and 7 digit numbers. I need them to be displayed in the 000/000/000 format. So for example:
12345 would be displayed as 000/012/345
and
9876543 would be displayed as 009/876/543
I know how to do this in a messy way, involving a series of if/else statements and strlen functions, but there has to be a cleaner way involving regex that I'm not seeing.
sprintf and modulo is one option:
function formatMyNumber($num)
{
// intdiv() keeps the arithmetic in integers (no float-to-int conversion)
return sprintf('%03d/%03d/%03d',
intdiv($num, 1000000),
intdiv($num, 1000) % 1000,
$num % 1000);
}
$padded = str_pad($number, 9, '0', STR_PAD_LEFT);
$split = str_split($padded, 3);
$formatted = implode('/', $split);
You asked for a regex solution, and I love playing with them, so here is a regex solution!
I show it for educational (and fun) purposes only; just use Adam's solution, which is clean, readable and fast.
function FormatWithSlashes($number)
{
return substr(preg_replace('/(\d{3})?(\d{3})?(\d{3})$/', '$1/$2/$3',
'0000' . $number),
-11, 11);
}
$numbers = Array(12345, 345678, 9876543);
foreach ($numbers as $val)
{
$r = FormatWithSlashes($val);
echo "<p>$r</p>";
}
OK, people are throwing stuff out, so I will too!
number_format would be great, because it accepts a thousands separator, but it doesn't pad with zeros the way sprintf and the like do. So here's what I came up with for a one-liner:
function fmt($x) {
return substr(number_format($x+1000000000, 0, ".", "/"), 2);
}
Minor improvement to PhiLho's suggestion:
You can avoid the substr by changing the regex to:
function FormatWithSlashes($number)
{
return preg_replace('/^0*(\d{3})(\d{3})(\d{3})$/', '$1/$2/$3',
'0000' . $number);
}
I also removed the ? after each of the first two capture groups because, when given a 5, 6, or 7 digit number (as specified in the question), this will always have at least 9 digits to work with. If you want to guard against the possibility of receiving a smaller input number, run the regex against '000000000' . $number instead.
Alternatively, you could use
substr('0000' . $number, -9, 9);
and then splice the slashes in at the appropriate places with substr_replace, which I suspect may be the fastest way to do this (no need to run regexes or do division). But that's really just getting into pointless optimization, as any of the solutions presented will still be much faster than establishing a network connection to the server.
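The substr()/substr_replace() variant just described might look like this. A minimal sketch; the function name is hypothetical, and it pads with nine zeros rather than four so that any input of up to nine digits works. The second slash goes in at offset 7 because the first one has already shifted the string by one character:

```php
<?php
// Pad to exactly nine digits, then splice in slashes at offsets 3 and 7.
// Assumes a non-negative integer with at most nine digits.
function formatWithSubstr(int $number): string
{
    $padded = substr('000000000' . $number, -9, 9);  // e.g. "000012345"
    $withFirst = substr_replace($padded, '/', 3, 0); // "000/012345"
    return substr_replace($withFirst, '/', 7, 0);    // "000/012/345"
}

echo formatWithSubstr(9876543), "\n"; // 009/876/543
```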
This is how I would write it in Perl 5.10:
use 5.010;
sub myformat(_;$){
# prepend with zeros
my $_ = 0 x ( 9-length($_[0]) ) . $_[0];
my $join = $_[1] // '/'; # using the 'defined or' operator `//`
# m// in a list context returns ($1,$2,$3,...)
join $join, m/ ^ (\d{3}) (\d{3}) (\d{3}) $ /x;
}
Tested with:
$_ = 11111;
say myformat;
say myformat(2222);
say myformat(33333,';');
say $_;
returns:
000/011/111
000/002/222
000;033;333
11111
Back-ported to Perl 5.8:
sub myformat(;$$){
local $_ = @_ ? $_[0] : $_;
# prepend with zeros
$_ = 0 x ( 9-length($_) ) . $_;
my $join = defined($_[1]) ? $_[1] :'/';
# m// in a list context returns ($1,$2,$3,...)
join $join, m/ ^ (\d{3}) (\d{3}) (\d{3}) $ /x;
}
Here's how I'd do it in Python (sorry, I don't know PHP as well). I'm sure you can convert it.
def convert(num): #num is an integer
a = str(num)
s = "0"*(9-len(a)) + a
return "%s/%s/%s" % (s[:3], s[3:6], s[6:9])
This just pads the number to have length 9, then splits the substrings.
That being said, it seems the modulo answer is a bit better.
I want to split an arithmetic expression into tokens, to convert it into RPN.
Java has the StringTokenizer, which can optionally keep the delimiters. That way, I could use the operators as delimiters. Unfortunately, I need to do this in PHP, which has strtok, but that throws away the delimiters, so I need to brew something myself.
This sounds like a classic textbook example for Compiler Design 101, but I'm afraid I'm lacking some formal education here. Is there a standard algorithm you can point me to?
My other options are to read up on Lexical Analysis or to roll up something quick and dirty with the available string functions.
This might help.
Practical Uses of Tokenizer
As usual, I would just use a regular expression to do this:
$expr = '(5*(7 + 2 * -9.3) - 8 )/ 11';
$tokens = preg_split('/([*\/^+-]+)\s*|([\d.]+)\s*/', $expr, -1,
PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
$tts = print_r($tokens, true);
echo "<pre>x=$tts</pre>";
It needs a little more work to accept numbers with exponents (like -9.2e-8).
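One way to do that extra work, as a sketch: strip whitespace first, then let a single capturing pattern match either a number (with optional fraction and exponent) or a one-character operator or parenthesis. The pattern and function name are assumptions of this sketch, not PhiLho's code; a unary minus still comes back as a separate '-' token, and any character the pattern doesn't recognize is returned unsplit as a token of its own:

```php
<?php
// Split an arithmetic expression into numbers, operators and parentheses.
// Numbers may carry a fraction and an exponent, e.g. 9.2e-8 or .5E+3.
function tokenizeExpression(string $expr): array
{
    $expr = preg_replace('/\s+/', '', $expr);
    $pattern = '/(\d+(?:\.\d*)?(?:[eE][+-]?\d+)?|\.\d+(?:[eE][+-]?\d+)?|[-+*\/^()])/';
    // Everything is captured as a delimiter, so the empty pieces between them are dropped.
    return preg_split($pattern, $expr, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
}

print_r(tokenizeExpression('(5*(7 + 2 * -9.2e-8) - 8 )/ 11'));
```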
OK, thanks to PhiLho, my final code is this, should anyone need it. It's not even really dirty. :-)
static function rgTokenize($s)
{
$rg = array();
// remove whitespace
$s = preg_replace("/\s+/", '', $s);
// split at numbers, identifiers, function names and operators
$rg = preg_split('/([*\/^+\(\)-])|(#\d+)|([\d.]+)|(\w+)/', $s, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
// find a unary '-' (one not preceded by an operand) and attach it as a sign to the following number
for ($ix = 0, $ixMax = count($rg); $ix < $ixMax; $ix++) {
if ('-' == $rg[$ix]) {
if (isset($rg[$ix - 1]) && self::fIsOperand($rg[$ix - 1])) {
continue;
} else if (isset($rg[$ix + 1]) && self::fIsOperand($rg[$ix + 1])) {
$rg[$ix + 1] = $rg[$ix].$rg[$ix + 1];
unset($rg[$ix]);
} else {
throw new Exception("Syntax error: Found right-associative '-' without operand");
}
}
}
$rg = array_values($rg);
echo join(" ", $rg)."\n";
return $rg;
}
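Since the goal is RPN, the usual next step after tokenizing is Dijkstra's shunting-yard algorithm. A minimal sketch that takes a flat token array like the ones above; the function name is hypothetical, and it assumes numbers (with any unary minus already folded into the number, as rgTokenize() does), the binary operators + - * / ^ with ^ right-associative, and parentheses. Function names and identifiers are not handled:

```php
<?php
// Shunting-yard: reorder infix tokens into RPN (postfix).
function toRpn(array $tokens): array
{
    $precedence = ['+' => 1, '-' => 1, '*' => 2, '/' => 2, '^' => 3];
    $rightAssoc = ['^' => true];
    $output = [];
    $stack = []; // operator stack; the top is the last element

    foreach ($tokens as $token) {
        if (isset($precedence[$token])) {
            // Pop operators that bind tighter, or equally tight when this
            // operator is left-associative, before pushing this one.
            while ($stack !== [] && isset($precedence[end($stack)])) {
                $top = end($stack);
                $tighter = $precedence[$top] > $precedence[$token];
                $sameLeft = $precedence[$top] === $precedence[$token] && !isset($rightAssoc[$token]);
                if (!$tighter && !$sameLeft) {
                    break;
                }
                $output[] = array_pop($stack);
            }
            $stack[] = $token;
        } elseif ($token === '(') {
            $stack[] = $token;
        } elseif ($token === ')') {
            // Move operators to the output until the matching '('.
            while ($stack !== [] && end($stack) !== '(') {
                $output[] = array_pop($stack);
            }
            if ($stack === []) {
                throw new Exception('Mismatched parentheses');
            }
            array_pop($stack); // discard the '('
        } else {
            $output[] = $token; // operands go straight to the output
        }
    }
    // Flush the remaining operators; a leftover '(' means unbalanced input.
    while ($stack !== []) {
        $top = array_pop($stack);
        if ($top === '(') {
            throw new Exception('Mismatched parentheses');
        }
        $output[] = $top;
    }
    return $output;
}

echo implode(' ', toRpn(['3', '+', '4', '*', '2'])), "\n"; // 3 4 2 * +
```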