Store a playing cards deck in MySQL (single column) - php

I'm doing a game with playing cards and have to store shuffled decks in MySQL.
What is the most efficient way to store a deck of 52 cards in a single column? And save/retrieve those using PHP.
I need 6 bits to represent a number from 0 to 52 and thus thought of saving the deck as binary data but I've tried using PHP's pack function without much luck. My best shot is saving a string of 104 characters (52 zero-padded integers) but that's far from optimal.
Thanks.

I do agree that it's not necessary or rather impractical to do any of this, but for fun, why not ;)
From your question I'm not sure if you aim to have all cards encoded as one value and stored accordingly or whether you want to encode cards individually. So I assume the former.
Further I assume you have a set of 52 cards (or items) that you represent with an integer value between 1 and 52.
I see some approaches, outlined as follows, not all actually for the better of using less space, but included for the sake of being complete:
using a comma separated list (CSV), total length of 9 + 2*43 + 51 = 146 characters (9 one-digit ids, 43 two-digit ids, 51 commas)
turning each card into a character ie a card is represented with 8 bits, total length of 52 characters
encoding each card with those necessary 6 bits and concatenating just the bits without the otherwise lost 2 bits (as in the 2nd approach), total length of 39 characters
treat the card-ids as coefficients in a polynomial of form p(cards) = cards(1)*52^51 + cards(2)*52^50 + cards(3)*52^49 + ... + cards(52)*52^0 which we use to identify the card-set. Roughly speaking p(cards) must lie in the value range of [0,52^52], which means that the value can be represented with log(52^52)/log(2) = 296.422865343.. bits or with a byte sequence of length 37.052 respectively 38.
There naturally are further approaches, taking into account mere practical, technical or theoretical aspects, as is also visible through the listed approaches.
For the theoretical approaches (which I consider the most interesting) it is helpful to know a bit about information theory and entropy. Essentially, depending on what is known about a problem, no further information is required, respectively only the information to clarify all remaining uncertainty is needed.
As we are working with bits and bytes, it mostly is interesting for us in terms of memory usage which practically speaking is bit- or byte-based (if you consider only bit-sequences without the underlying technologies and hardware); that is, bits represent one of two states, ergo allow the differentiation of two states. This is trivial but important, actually :/
Then, if you want to represent N states in a base B, you will need log(N) / log(B) digits in that base, or in your example log(52) / log(2) = 5.70.. -> 6 bits. You will notice that only 5.70.. bits would actually be required, which means with 6 bits we have a loss.
Which is the moment the problem transformation comes in: instead of representing 52 states individually, the card set as a whole can be represented. The polynomial approach is a way to do this. Essentially it works as it assumes a base of 52, ie where the card set is represented as 1'4'29'31.... or mathematically speaking: 52 + 1 = 1'1 == 53 decimal, 1'1 + 1 = 1'2 == 54 decimal, 1'52'52 + 1 = 2'1'1 == 5461 decimal.
But if you further look at the polynomial-approach you will notice that there is a total of 52^52 possible values whereas we would only ever use 52! = 1*2*3*...*52 because once a card is fixed the remaining possibilities decrease, respectively the uncertainty or entropy decreases. (please note that 52! / 52^52 = 4.7257911e-22 ! which means the polynomial is a total waste of space).
If we now were to use a value in [1,52!] which is pretty much the theoretical minimum, we could represent the card set with log(52!) / log(2) = 225.581003124.. bits = 28.1976.. bytes. Problem with that is, that any of the values represented as such does not contain any structure from which we can derive its semantics, which means that for each of the 52! possible values (well 52! - 1, if you consider the principle of exclusion) we need a reference of its meaning, ie a lookup table of 52! values and that would certainly be a memory overkill.
Although we can make a compromise with the knowledge of the decreasing entropy of an encoded ordered set. As an example: we sequentially encode each card with the minimum number of bits required at that point in the sequence. So assume N<=52 cards remain, then in each step a card can be represented in log(N)/log(2) bits, meaning that the number of required bits decreases, until for the last card, you don't need a bit in the first place. This would give about (please correct)..
20 * 6 bits + 16 * 5 bits + 8 * 4 bits + 4 * 3 bits + 2 * 2 bits + 1 bit = 249 bits = 31.125.. bytes
But still there would be a loss because of the partial bits used unnecessarily, but the structure in the data totally makes up for that.
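To double-check the 249-bit figure, the sum can be computed rather than counted by hand. This is only a small sketch; the function names bitsForStates and sequentialDeckBits are made up for illustration. It uses the bit length of N - 1 instead of ceil(log(N)/log(2)) to stay clear of floating point rounding.

```php
<?php
// Bits needed to distinguish $n states: the bit length of ($n - 1).
function bitsForStates(int $n): int
{
    $bits = 0;
    for ($m = $n - 1; $m > 0; $m >>= 1) {
        $bits++;
    }
    return $bits;
}

// Sum the per-card cost while the number of remaining cards shrinks from $deckSize to 1.
function sequentialDeckBits(int $deckSize): int
{
    $total = 0;
    for ($remaining = $deckSize; $remaining >= 1; $remaining--) {
        $total += bitsForStates($remaining);
    }
    return $total;
}

echo sequentialDeckBits(52), " bits\n"; // 249 bits
```

Stored in whole bytes that comes to 32 bytes, compared with 39 bytes for the fixed 6-bit encoding.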
So a question might be, hej can we combine the polynomial with this??!?11?! Actually, I have to think about that, I'm getting tired.
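One possible way to combine the two ideas is a polynomial whose base shrinks with every card (a mixed-radix or "factorial base" number): the first card is a digit in base N, the second in base N-1, and so on. The resulting value lies in [0, N!-1] and decodes without any lookup table. The sketch below is my own illustration, not something from the question; the names permToIndex and indexToPerm are hypothetical, and it uses native integers, so it only works for decks of up to 20 cards. A full 52-card deck would need a bigint extension (GMP or BCMath, assuming one is installed).

```php
<?php
// Encode a permutation of 1..n as one integer in [0, n!-1].
function permToIndex(array $cards): int
{
    $remaining = range(1, count($cards)); // cards not yet placed, ascending
    $index = 0;
    foreach ($cards as $c) {
        $pos = array_search($c, $remaining, true); // digit in base count($remaining)
        $index = $index * count($remaining) + $pos;
        array_splice($remaining, $pos, 1);         // card is used up
    }
    return $index;
}

// Decode the integer back into the permutation of 1..$n.
function indexToPerm(int $index, int $n): array
{
    // Peel off the digits, least significant first (bases 1, 2, ..., n).
    $digits = [];
    for ($radix = 1; $radix <= $n; $radix++) {
        $digits[] = $index % $radix;
        $index = intdiv($index, $radix);
    }
    $digits = array_reverse($digits); // now in card order (bases n, n-1, ..., 1)

    $remaining = range(1, $n);
    $cards = [];
    foreach ($digits as $pos) {
        $cards[] = $remaining[$pos];
        array_splice($remaining, $pos, 1);
    }
    return $cards;
}

$hand = [3, 1, 5, 2, 4];
$index = permToIndex($hand);
print_r(indexToPerm($index, 5));       // the same five cards in the same order
echo permToIndex([5, 4, 3, 2, 1]), "\n"; // 119, i.e. 5! - 1
```

For 52 cards the value would need log(52!) / log(2) = 225.58.. bits, so 29 bytes when stored.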
Generally speaking, knowing about the structure of a problem drastically helps decreasing the necessary memory space. Practically speaking, in this day and age, for your average high-level developer such low level considerations are not so important (hej, 100kByte of wasted space, so what!) and other considerations are weighted higher; also because the underlying technologies are often reducing memory usage by themselves, be it your filesystem or gzip-ed web-server responses, etc. The general knowledge of these kind of things though is still helpful in creating your services and datastructures.
But these latter approaches are very problem-specific "compression procedures"; general compression works differently, where as an example approach the procedures sequentially run through the bytes of the data and for any unseen bit sequences add these to a lookup table and represent the actual sequence with an index (as a sketch).
Well enough of funny talk, let's get technical!
1st approach "csv"
// your unpacked card set
$cards = range(1,52);
$coded = implode(',',$cards);
$decoded = explode(',',$coded);
2nd approach: 1 card = 1 character
// just a helper
// (not really necessary, but using this we can pretty print the resulting string)
function chr2($i)
{
return chr($i + 32);
}
function myPack($cards)
{
$ar = array_map('chr2',$cards);
return implode('',$ar);
}
function myUnpack($str)
{
$set = array();
$len = strlen($str);
for($i=0; $i<$len; $i++)
$set[] = ord($str[$i]) - 32; // adjust this shift along with the helper
return $set;
}
$str = myPack($cards);
$other_cards = myUnpack($str);
3rd approach, 1 card = 6 bits
$set = ''; // target string
$offset = 0;
$carry = 0;
for($i=0; $i < 52; $i++)
{
$c = $cards[$i];
switch($offset)
{
case 0:
$carry = ($c << 2);
$next = null;
break;
case 2:
$next = $carry + $c;
$carry = 0;
break;
case 4:
$next = $carry + ($c>>2);
$carry = ($c << 6) & 0xff;
break;
case 6:
$next = $carry + ($c>>4);
$carry = ($c << 4) & 0xff;
break;
}
if ($next !== null)
{
$set .= chr($next);
}
$offset = ($offset + 6) % 8;
}
// and $set it is!
$new_cards = array(); // the target array for cards to be unpacked
$offset = 0;
$carry = 0;
for($i=0; $i < 39; $i++)
{
$o = ord(substr($set,$i,1));
$new = array();
switch($offset)
{
case 0:
$new[] = ($o >> 2) & 0x3f;
$carry = ($o << 4) & 0x3f;
break;
case 4:
$new[] = (($o >> 6) & 3) + $carry;
$new[] = $o & 0x3f;
$carry = 0;
$offset += 6;
break;
case 6:
$new[] = (($o >> 4) & 0xf) + $carry;
$carry = ($o & 0xf) << 2;
break;
}
$new_cards = array_merge($new_cards,$new);
$offset = ($offset + 6) % 8;
}
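The same 6-bit packing can also be written with a small bit buffer instead of the explicit offset cases. This is just an alternative sketch under the same assumptions (card ids fit in 6 bits, most significant bits first); pack6 and unpack6 are made-up names.

```php
<?php
// Pack 6-bit values into a byte string, most significant bits first.
function pack6(array $cards): string
{
    $buffer = 0; // pending bits that have not been written yet
    $bits = 0;   // how many pending bits $buffer holds
    $out = '';
    foreach ($cards as $c) {
        $buffer = ($buffer << 6) | ($c & 0x3f);
        $bits += 6;
        while ($bits >= 8) {
            $bits -= 8;
            $out .= chr(($buffer >> $bits) & 0xff);
        }
        $buffer &= (1 << $bits) - 1; // keep only the unwritten bits
    }
    if ($bits > 0) {
        $out .= chr(($buffer << (8 - $bits)) & 0xff); // pad the last byte with zero bits
    }
    return $out;
}

// Unpack $count 6-bit values from a byte string produced by pack6().
function unpack6(string $str, int $count): array
{
    $buffer = 0;
    $bits = 0;
    $cards = [];
    $len = strlen($str);
    for ($i = 0; $i < $len && count($cards) < $count; $i++) {
        $buffer = ($buffer << 8) | ord($str[$i]);
        $bits += 8;
        while ($bits >= 6 && count($cards) < $count) {
            $bits -= 6;
            $cards[] = ($buffer >> $bits) & 0x3f;
        }
        $buffer &= (1 << $bits) - 1;
    }
    return $cards;
}

$deck = range(1, 52);
shuffle($deck);
$packed = pack6($deck);
echo strlen($packed), "\n";                          // 39
var_dump(unpack6($packed, 52) === $deck);            // bool(true)
```

The 39-byte result is binary, so the column would have to be a binary type rather than a text type with a character set.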
4th approach, the polynomial, just outlined (please consider using bigints because of the integer overflow)
$encoded = 0;
$base = 52;
foreach($cards as $c)
{
$encoded = $encoded*$base + ($c - 1); // map cards 1..52 to base-52 digits 0..51
}
// and now save the binary representation
$decoded = array();
for($i=0; $i < 52; $i++)
{
$v = $encoded % $base;
$encoded = ($encoded - $v) / $base;
array_unshift($decoded, $v + 1); // digits come out last card first, so prepend
}

Related

What's the most efficient way of randomly picking a floating number within a specific range? [duplicate]

How does one generate a random float between 0 and 1 in PHP?
I'm looking for the PHP's equivalent to Java's Math.random().
You may use the standard function: lcg_value().
Here's another function given on the rand() docs:
// auxiliary function
// returns random number with flat distribution from 0 to 1
function random_0_1()
{
return (float)rand() / (float)getrandmax();
}
Example from documentation :
function random_float ($min,$max) {
return ($min+lcg_value()*(abs($max-$min)));
}
rand(0,1000)/1000 returns:
0.348 0.716 0.251 0.459 0.893 0.867 0.058 0.955 0.644 0.246 0.292
or use a bigger number if you want more digits after decimal point
class SomeHelper
{
/**
* Generate random float number.
*
* @param float|int $min
* @param float|int $max
* @return float
*/
public static function rand($min = 0, $max = 1)
{
return ($min + ($max - $min) * (mt_rand() / mt_getrandmax()));
}
}
update:
Forget this answer, it doesn't work with PHP versions > 5.3.
What about
floatVal('0.'.rand(1, 9));
?
This works perfectly for me, and it's not only for 0 - 1; for example, between 1.0 - 15.0:
floatVal(rand(1, 15).'.'.rand(1, 9));
function mt_rand_float($min, $max, $countZero = '0') {
$countZero = +('1'.$countZero);
$min = floor($min*$countZero);
$max = floor($max*$countZero);
$rand = mt_rand($min, $max) / $countZero;
return $rand;
}
example:
echo mt_rand_float(0, 1);
result: 0.2
echo mt_rand_float(3.2, 3.23, '000');
result: 3.219
echo mt_rand_float(1, 5, '00');
result: 4.52
echo mt_rand_float(0.56789, 1, '00');
result: 0.69
$random_number = rand(1,10).".".rand(1,9);
function frand($min, $max, $decimals = 0) {
$scale = pow(10, $decimals);
return mt_rand($min * $scale, $max * $scale) / $scale;
}
echo "frand(0, 10, 2) = " . frand(0, 10, 2) . "\n";
This question asks for a value from 0 to 1. For most mathematical purposes this is usually invalid albeit to the smallest possible degree. The standard distribution by convention is 0 <= N < 1. You should consider if you really want something inclusive of 1.
Many things that do this absent minded have a one in a couple billion result of an anomalous result. This becomes obvious if you think about performing the operation backwards.
(int)(random_float() * 10) would return a value from 0 to 9 with an equal chance of each value. If in one in a billion times it can return 1 then very rarely it will return 10 instead.
Some people would fix this after the fact (to decide that 10 should be 9). Multiplying it by 2 should give around a ~50% chance of 0 or 1 but will also have a ~0.0000000466% chance (about one in 2^31) of returning a 2 like in Bender's dream.
Saying 0 to 1 as a float might be a bit like mistakenly saying 0 to 10 instead of 0 to 9 as ints when you want ten values starting at zero. In this case because of the broad range of possible float values then it's more like accidentally saying 0 to 1000000000 instead of 0 to 999999999.
With 64bit it's exceedingly rare to overflow but in this case some random functions are 32bit internally so it's not so implausible for that one in two and a half billion chance to occur.
The standard solutions would instead want to be like this:
mt_rand() / (mt_getrandmax() + 1)
There can also be small usually insignificant differences in distribution, for example between 0 to 9 then you might find 0 is slightly more likely than 9 due to precision but this will typically be in the billionth or so and is not as severe as the above issue because the above issue can produce an invalid unexpected out of bounds figure for a calculation that would otherwise be flawless.
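To make the out-of-bounds case concrete without waiting for a one-in-two-billion draw, the worst case can be fed in directly. A minimal sketch; unitClosed and unitHalfOpen are hypothetical helper names that take the raw value mt_rand() would have returned.

```php
<?php
// Closed range [0, 1]: the largest raw value maps to exactly 1.0.
function unitClosed(int $raw): float
{
    return $raw / mt_getrandmax();
}

// Half-open range [0, 1): dividing by max + 1 keeps every result below 1.0.
function unitHalfOpen(int $raw): float
{
    return $raw / (mt_getrandmax() + 1);
}

$worstCase = mt_getrandmax(); // the largest value mt_rand() can return

var_dump((int) (unitClosed($worstCase) * 10));   // int(10): outside the expected 0..9
var_dump((int) (unitHalfOpen($worstCase) * 10)); // int(9)
```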
Java's Math.random will also never produce a value of 1. Some of this comes from that it is a mouthful to explain specifically what it does. It returns a value from 0 to less than one. It's Zeno's arrow, it never reaches 1. This isn't something someone would conventionally say. Instead people tend to say between 0 and 1 or from 0 to 1 but those are false.
This is somewhat a source of amusement in bug reports. For example, any PHP code using lcg_value without consideration for this may glitch approximately one in a couple billion times if it holds true to its documentation but that makes it painfully difficult to faithfully reproduce.
This kind of off by one error is one of the common sources of "Just turn it off and on again." issues typically encountered in embedded devices.
Solution for PHP 7. Generates random number in [0,1). i.e. includes 0 and excludes 1.
function random_float() {
return random_int(0, 2**53-1) / (2**53);
}
Thanks to Nommyde in the comments for pointing out my bug.
>>> number_format((2**53-1)/2**53,100)
=> "0.9999999999999998889776975374843459576368331909179687500000000000000000000000000000000000000000000000"
>>> number_format((2**53)/(2**53+1),100)
=> "1.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
Most answers are using mt_rand. However, mt_getrandmax() usually returns only 2147483647. That means you only have 31 bits of information, while a double has a mantissa with 52 bits, which means there is a density of at least 2^53 for the numbers between 0 and 1.
This more complicated approach will get you a finer distribution:
function rand_754_01() {
// Generate 64 random bits (8 bytes)
$entropy = openssl_random_pseudo_bytes(8);
// Create a string of 12 '0' bits and 52 '1' bits.
$x = 0x000FFFFFFFFFFFFF;
$first12 = pack("Q", $x);
// Set the first 12 bits to 0 in the random string.
$y = $entropy & $first12;
// Now set the first 12 bits to be 0[exponent], where exponent is randomly chosen between 1 and 1022.
// Here $e has a probability of 0.5 to be 1022, 0.25 to be 1021, etc.
$e = 1022;
while($e > 1) {
if(mt_rand(0,1) == 0) {
break;
} else {
--$e;
}
}
// Pack the exponent properly (add four '0' bits behind it and 49 more in front)
$z = "\0\0\0\0\0\0" . pack("S", $e << 4);
// Now convert to a double.
return unpack("d", $y | $z)[1];
}
Please note that the above code only works on 64-bit machines with a Little-Endian byte order and Intel-style IEEE754 representation. (x64-compatible computers will have this). Unfortunately PHP does not allow bit-shifting past int32-sized boundaries, so you have to write a separate function for Big-Endian.
You should replace this line:
$z = "\0\0\0\0\0\0" . pack("S", $e << 4);
with its big-endian counterpart:
$z = pack("S", $e << 4) . "\0\0\0\0\0\0";
The difference is only notable when the function is called a large amount of times: 10^9 or more.
Testing if this works
It should be obvious that the mantissa follows a nice uniform distribution approximation, but it's less obvious that a sum of a large amount of such distributions (each with cumulatively halved chance and amplitude) is uniform.
Running:
function randomNumbers() {
$f = 0.0;
for($i = 0; $i < 1000000; ++$i) {
$f += \math::rand_754_01();
}
echo $f / 1000000;
}
Produces an output of 0.49999928273099 (or a similar number close to 0.5).
I found the answer on PHP.net
<?php
function randomFloat($min = 0, $max = 1) {
return $min + mt_rand() / mt_getrandmax() * ($max - $min);
}
var_dump(randomFloat());
var_dump(randomFloat(2, 20));
?>
float(0.91601131712832)
float(16.511210331931)
So you could do
randomFloat(0,1);
or simple
mt_rand() / mt_getrandmax() * 1;
what about:
echo (float)('0.' . rand(0,99999));
would probably work fine... hope it helps you.

How to optimise an Exponential Moving Average algorithm in PHP?

I'm trying to retrieve the last EMA of a large dataset (15000+ values). It is a very resource-hungry algorithm since each value depends on the previous one. Here is my code :
$k = 2/($range+1);
for ($i; $i<$size_data; ++$i) {
$lastEMA = $lastEMA + $k * ($data[$i]-$lastEMA);
}
What I already did:
Isolate $k so it is not computed 10000+ times
Keep only the latest computed EMA, and not keep all of them in an array
use for() instead of foreach()
the $data[] array doesn't have keys; it's a basic array
This allowed me to reduced execution time from 2000ms to about 500ms for 15000 values!
What didn't work:
Use SplFixedArray(), this shaved only ~10ms executing 1,000,000 values
Use PHP_Trader extension, this returns an array containing all the EMAs instead of just the latest, and it's slower
Writing and running the same algorithm in C# and running it over 2,000,000 values takes only 13ms! So obviously, using a compiled, lower-level language seems to help ;P
Where should I go from here? The code will ultimately run on Ubuntu, so which language should I choose? Will PHP be able to call and pass such a huge argument to the script?
Clearly implementing with an extension gives you a significant boost.
Additionally, the calculation itself can be improved, and that gain carries over to whichever language you choose.
It is easy to see that lastEMA can be calculated as follows:
$lastEMA = 0;
$k = 2/($range+1);
for ($i = 0; $i<$size_data; ++$i) {
$lastEMA = (1-$k) * $lastEMA + $k * $data[$i];
}
This can be rewritten as follows in order to take as much as possible out of the loop:
$lastEMA = 0;
$k = 2/($range+1);
$k1m = 1 - $k;
for ($i = 0; $i<$size_data; ++$i) {
$lastEMA = $k1m * $lastEMA + $data[$i];
}
$lastEMA = $lastEMA * $k;
To explain the extraction of $k: in the previous formulation every raw data value is effectively multiplied by $k, so you can multiply the end result once instead.
Note that, rewritten this way, you have 2 operations inside the loop instead of 3 (to be precise, the loop also contains the $i increment, the comparison of $i with $size_data and the assignment to $lastEMA), so you can expect an additional speedup somewhere between 16% and 33%.
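As a quick sanity check that both formulations give the same value, here is a small sketch that runs them side by side on made-up data. The function names emaDirect and emaFactored are mine, and both start from $lastEMA = 0 as in the snippets above.

```php
<?php
// Original form: multiply by $k inside the loop.
function emaDirect(array $data, int $range): float
{
    $k = 2 / ($range + 1);
    $size = count($data);
    $ema = 0.0;
    for ($i = 0; $i < $size; ++$i) {
        $ema = (1 - $k) * $ema + $k * $data[$i];
    }
    return $ema;
}

// Factored form: $k is applied once, after the loop.
function emaFactored(array $data, int $range): float
{
    $k = 2 / ($range + 1);
    $k1m = 1 - $k;
    $size = count($data);
    $sum = 0.0;
    for ($i = 0; $i < $size; ++$i) {
        $sum = $k1m * $sum + $data[$i];
    }
    return $sum * $k;
}

// Made-up sample data, only for the comparison.
$data = [];
for ($i = 0; $i < 1000; ++$i) {
    $data[] = 100 + 10 * sin($i / 7);
}

var_dump(abs(emaDirect($data, 20) - emaFactored($data, 20)) < 1e-9); // bool(true)
```

The two results agree up to floating point rounding, not necessarily bit for bit.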
Further there are other improvements that can be considered at least in some circumstances:
Consider only last values
The first values are multiplied many times by $k1m = 1 - $k, so their contribution may be small or even fall below the floating point precision (or the acceptable error).
This idea is particularly helpful if you can assume that older data are of the same order of magnitude as the newer, because if you consider only the last $n values the error that you make is
$err = $EMA_of_discarded_data * (1-$k) ^ $n.
So if order of magnitude is broadly the same we can tell that the relative error done is
$rel_err = $err / $lastEMA = $EMA_of_discarded_data * (1-$k) ^ $n / $lastEMA
that is almost equal to simply (1-$k) ^ $n.
Under the assumption that "$lastEMA almost equal to $EMA_of_discarded_data":
Let's say that you can accept a relative error $rel_err
you can safely consider only the last $n values where (1 - $k)^$n < $rel_err.
Means that you can pre-calculate (before the loop) $n = log($rel_err) / log (1-$k) and compute all only considering the last $n values.
If the dataset is very big this can give a noticeable speedup.
Consider that for 64 bit floating point numbers you have a relative precision (related to the mantissa) that is 2^-53 (about 1.1e-16 and only 2^-24 = 5.96e-8 for 32 bit floating point numbers) so you cannot obtain better than this relative error
so basically you should never have an advantage in calculating more than $n = log(1.1e-16) / log(1-$k) values.
to give an example if $range = 2000 then $n = log(1.1e-16) / log(1-2/2001) = 36'746.
I think it is interesting to know that any extra calculations would be lost in rounding, so they are useless and better skipped.
Now an example for the case where you can accept a relative error larger than the floating point precision: with $rel_err = 1ppm = 1e-6 = 0.0001% (6 significant decimal digits) you have $n = log(1e-6) / log(1-2/2001) = 13'815.
I think that is quite a small number compared to your sample counts, so in such cases the speedup could be evident (I'm assuming that $range = 2000 is meaningful or high for your application, but this I cannot know).
Just a few more numbers, because I do not know what your typical figures are:
$rel_err = 1e-3; $range = 2000 => $n = 6'907
$rel_err = 1e-3; $range = 200 => $n = 691
$rel_err = 1e-3; $range = 20 => $n = 69
$rel_err = 1e-6; $range = 2000 => $n = 13'815
$rel_err = 1e-6; $range = 200 => $n = 1'381
$rel_err = 1e-6; $range = 20 => $n = 138
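The cut-off can be computed once before the loop. Below is a rough sketch of that idea; emaTailLength and emaOfTail are hypothetical names, the data is made up, and the result is rounded up so that (1 - $k)^$n really stays below $rel_err (which is why it can be one higher than the truncated figures in the list above).

```php
<?php
// Number of trailing values needed so that (1 - $k)^$n < $relErr.
function emaTailLength(float $relErr, int $range): int
{
    $k = 2 / ($range + 1);
    return (int) ceil(log($relErr) / log(1 - $k));
}

// EMA computed over the last $n values only, starting from 0.
function emaOfTail(array $data, int $range, int $n): float
{
    $k = 2 / ($range + 1);
    $k1m = 1 - $k;
    $size = count($data);
    $sum = 0.0;
    for ($i = max(0, $size - $n); $i < $size; ++$i) {
        $sum = $k1m * $sum + $data[$i];
    }
    return $sum * $k;
}

// Made-up data where old and new values have the same order of magnitude.
$data = [];
for ($i = 0; $i < 5000; ++$i) {
    $data[] = 100 + 10 * sin($i / 7);
}

$n = emaTailLength(1e-6, 20);
$full = emaOfTail($data, 20, count($data));
$tail = emaOfTail($data, 20, $n);
echo $n, "\n";                          // 139
echo abs($full - $tail) / $full, "\n";  // relative error, on the order of 1e-6
```

The relative error is only approximately bounded by $rel_err, because it also depends on the ratio between the discarded EMA and the final EMA.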
If the assumption "$lastEMA almost equal to $EMA_of_discarded_data" cannot be taken things are less easy but since the advantage can be significant it can be meaningful to go on:
we need to re-consider the full formula: $rel_err = $EMA_of_discarded_data * (1-$k) ^ $n / $lastEMA
so $n = log($rel_err * $lastEMA / $EMA_of_discarded_data) / log (1-$k) = (log($rel_err) + log($lastEMA / $EMA_of_discarded_data)) / log (1-$k)
the central point is to calculate $lastEMA / $EMA_of_discarded_data (without actually calculating $lastEMA nor $EMA_of_discarded_data of course)
one case is when we know a-priori that for example $EMA_of_discarded_data / $lastEMA < M (for example M = 1000 or M = 1e6)
in that case $n < (log($rel_err/M)) / log (1-$k)
if you cannot give any M number
you have to find a good idea to over-estimate $EMA_of_discarded_data / $lastEMA
one quick way could be to take M = max(data) / min(data)
Parallelization
The calculation can be re-written in a form where it is a simple addition of independent terms:
$lastEMA = 0;
$k = 2/($range+1);
$k1m = 1 - $k;
for ($i = 0; $i<$size_data; ++$i) {
$lastEMA += pow($k1m, $size_data - 1 - $i) * $data[$i]; // ^ is XOR in PHP, so use pow()
}
$lastEMA = $lastEMA * $k;
So if the implementing language supports parallelization the dataset can be divided in 4 (or 8 or n ...basically the number of CPU cores available) chunks and it can be computed the sum of terms on each chunk in parallel summing up the individual results at the end.
I do not go in detail with this since this reply is already terribly long and I think the concept is already expressed.
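For what it's worth, here is a sketch of how the chunk results combine. It runs the chunks one after another (plain PHP has no built-in threads), so it only demonstrates that the partial sums are independent; chunkSum and emaFromChunks are made-up names.

```php
<?php
// Partial sum over one chunk: sum of $k1m^(last - i) * $chunk[i], computed Horner-style.
function chunkSum(array $chunk, float $k1m): float
{
    $sum = 0.0;
    foreach ($chunk as $value) {
        $sum = $k1m * $sum + $value;
    }
    return $sum;
}

// Split the data into chunks, sum each chunk on its own, then combine.
function emaFromChunks(array $data, int $range, int $chunkCount): float
{
    $k = 2 / ($range + 1);
    $k1m = 1 - $k;
    $chunkSize = max(1, (int) ceil(count($data) / $chunkCount));
    $chunks = array_chunk($data, $chunkSize);

    $total = 0.0;
    $after = count($data); // number of values that come after the current chunk
    foreach ($chunks as $chunk) {
        $after -= count($chunk);
        // Each chunkSum() call is independent, so these could run on separate cores.
        $total += chunkSum($chunk, $k1m) * pow($k1m, $after);
    }
    return $total * $k;
}

$data = [];
for ($i = 0; $i < 1000; ++$i) {
    $data[] = 100 + 10 * sin($i / 7); // made-up sample data
}
echo emaFromChunks($data, 20, 4), "\n";
```

Each chunk's partial sum is scaled by $k1m raised to the number of values that follow the chunk, which is what makes the final addition order-independent.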
Building your own extension definitely improves performance. Here's a good tutorial from the Zend website.
Some performance figures: Hardware: Ubuntu 14.04, PHP 5.5.9, 1-core Intel CPU@3.3GHz, 128MB RAM (it's a VPS).
Before (PHP only, 16,000 values) : 500ms
C Extension, 16,000 values : 0.3ms
C Extension (100,000 values) : 3.7ms
C Extension (500,000 values) : 28.0ms
But I'm memory limited at this point, using 70MB. I will fix that and update the numbers accordingly.

Is there a clever way to do this with pure math

I've got this spot of code that seems like it could be done more cleanly with pure math (perhaps logarithms?). Can you help me out?
The code finds the first power of 2 greater than a given input. For example, if you give it 500, it returns 9, because 2^9 = 512 > 500. 2^8 = 256, would be too small because it's less than 500.
function getFactor($iMaxElementsPerDir)
{
$aFactors = range(128, 1);
foreach($aFactors as $i => $iFactor)
if($iMaxElementsPerDir > pow(2, $iFactor) - 1)
break;
if($i == 0)
return false;
return $aFactors[$i - 1];
}
The following holds true
getFactor(500) = 9
getFactor(1000) = 10
getFactor(2500) = 12
getFactor(5000) = 13
You can get the same effect by shifting the bits in the input to the right and checking against 0. Something like this.
i = 1
while((input >> i) != 0)
i++
return i
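In PHP the shifting loop could look like the sketch below. getFactorShift is a made-up name, and the guard against negative input is my addition (a negative number never shifts down to 0).

```php
<?php
// Smallest i such that 2^i is strictly greater than $input.
function getFactorShift(int $input): int
{
    if ($input < 0) {
        throw new InvalidArgumentException('input must be non-negative');
    }
    $i = 1;
    while (($input >> $i) !== 0) {
        $i++;
    }
    return $i;
}

echo getFactorShift(500), "\n";  // 9
echo getFactorShift(1000), "\n"; // 10
echo getFactorShift(2500), "\n"; // 12
echo getFactorShift(5000), "\n"; // 13
```

Like the original getFactor(), this returns 10 for an input of exactly 512, because 2^9 is not strictly greater than 512.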
The same as jack but shorter. Log with base 2 is the reverse function of 2^x.
echo ceil(log(500, 2));
If you're looking for a "math only" solution (that is a single expression or formula), you can use log() and then take the ceiling value of its result:
$factors = ceil(log(500) / log(2)); // 9
$factors = ceil(log(5000) / log(2)); // 13
I seem to have not noticed that this function accepts a second argument (since PHP 4.3) with which you can specify the base; though internally the same operation is performed, it does indeed make the code shorter:
$factors = ceil(log(500, 2)); // 9
To factor in some inaccuracies, you may need some tweaking:
$factors = floor(log($nr - 1, 2)) + 1;
There are a few ways to do this.
1. Zero all but the most significant bit of the number, maybe like this:
while (x & x-1) x &= x-1;
and look the answer up in a table. Use a table of length 67 and mod your power of two by 67.
2. Binary search for the high bit.
3. If you're working with a floating-point number, inspect the exponent field. This field contains 1023 plus your answer, except in the case where the number is a perfect power of two. You can detect the perfect power case by checking whether the significand field is exactly zero.
4. If you aren't working with a floating-point number, convert it to floating-point and look at the exponent like in 3. Check for a power of two by testing (x & x-1) == 0 instead of looking at the significand; this is true exactly when x is a power of two.
Note that log(2^100) is the same double as log(nextafter(2^100, 1.0/0.0)), so any solution based on floating-point natural logarithms will fail.
Here's (nonconformant C++, not PHP) code for 4:
int ceillog2(unsigned long long x) {
if (x < 2) return x-1;
double d = x-1;
int ans = (long long &)d >> 52;
return ans - 1022;
}

How to get number of digits in both right, left sides of a decimal number

I wonder if is there a good way to get the number of digits in right/left side of a decimal number PHP. For example:
12345.789 -> RIGHT SIDE LENGTH IS 3 / LEFT SIDE LENGTH IS 5
I know it is readily attainable by helping string functions and exploding the number. I mean is there a mathematically or programmatically way to perform it better than string manipulations.
Your answers would be greatly appreciated.
Update
The best solution for left side till now was:
$left = floor(log10($x))+1;
but still nothing sufficient for the right side.
Still waiting ...
To get the digits on the left side you can do this:
$left = floor(log10($x))+1;
This uses the base 10 logarithm to get the number of digits.
The right side is harder. A simple approach would look like this, but due to floating point numbers, it would often fail:
$decimal = $x - floor($x);
$right = 0;
while (floor($decimal) != $decimal) {
$right++;
$decimal *= 10; //will bring in floating point 'noise' over time
}
This will loop through multiplying by 10 until there are no digits past the decimal. That is tested with floor($decimal) != $decimal.
However, as Ali points out, giving it the number 155.11 (a number that is hard to represent in binary) results in an answer of 14. This is because the number is stored as something like 155.11000000000001 with the 64 bits of floating point precision we have.
So instead, a more robust solution is needed. (PoPoFibo's solution is particularly elegant, and uses PHP's inherent float comparison functions well).
The fact is, we can never distinguish between input of 155.11 and 155.11000000000001. We will never know which number was originally given. They will both be represented the same. However, if we define the number of zeroes that we can see in a row before we just decide the decimal is 'done' then we can come up with a solution:
$x = 155.11; //the number we are testing
$LIMIT = 10; //number of zeroes in a row until we say 'enough'
$right = 0; //number of digits we've checked
$empty = 0; //number of zeroes we've seen in a row
while (floor($x) != $x) {
$right++;
$base = floor($x); //so we can see what the next digit is;
$x *= 10;
$base *= 10;
$digit = floor($x) - $base; //the digit we are dealing with
if ($digit == 0) {
$empty += 1;
if ($empty == $LIMIT) {
$right -= $empty; //don't count all those zeroes
break; // exit the loop, we're done
}
} else {
$empty = 0; // a non-zero digit ends the current run of zeroes
}
}
This should find the solution given the reasonable assumption that 10 zeroes in a row means any other digits just don't matter.
However, I still like PopoFibo's solution better, as without any multiplication, PHP's default comparison functions effectively do the same thing, without the messiness.
I am lost on PHP semantics big time but I guess the following would serve your purpose without the String usage (that is at least how I would do in Java but hopefully cleaner):
Working code here: http://ideone.com/7BnsR3
Non-string solution (only Math)
Left side is resolved hence taking the cue from your question update:
$value = 12343525.34541;
$left = floor(log10($value))+1;
echo($left);
$num = floatval($value);
$right = 0;
while($num != round($num, $right)) {
$right++;
}
echo($right);
Prints
85
8 for the LHS and 5 for the RHS.
Since I'm taking a floatval, 155.0 yields 0 digits on the RHS, which I think is valid; if you need it counted, that case can be resolved with string functions.
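Wrapped into functions, the same idea might look like the sketch below. The names countFractionDigits and countIntegerDigits are made up, the $maxDigits cap is my addition so the loop always terminates, and treating values below 1 as having one integer digit (the leading 0) is also my assumption.

```php
<?php
// Count digits after the decimal point by rounding to more and more places
// until the rounded value compares equal to the original float.
function countFractionDigits(float $value, int $maxDigits = 15): int
{
    $digits = 0;
    while ($digits < $maxDigits && $value != round($value, $digits)) {
        $digits++;
    }
    return $digits;
}

// Count digits before the decimal point using the base 10 logarithm.
function countIntegerDigits(float $value): int
{
    $abs = abs($value);
    return $abs < 1 ? 1 : (int) floor(log10($abs)) + 1;
}

echo countIntegerDigits(12345.789), " / ", countFractionDigits(12345.789), "\n"; // 5 / 3
echo countFractionDigits(155.11), "\n"; // 2
```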
php > $num = 12345.789;
php > $left = strlen(floor($num));
php > $right = strlen($num - floor($num));
php > echo "$left / $right\n";
5 / 16 <--- 16 digits, huh?
php > $parts = explode('.', $num);
php > var_dump($parts);
array(2) {
[0]=>
string(5) "12345"
[1]=>
string(3) "789"
}
As you can see, floats aren't the easiest to deal with... Doing it "mathematically" leads to bad results. Doing it by strings works, but makes you feel dirty.
$number = 12345.789;
list($whole, $fraction) = sscanf($number, "%d.%d");
This will always work, even if $number is an integer and you’ll get two real integers returned. Length is best done with strlen() even for integer values. The proposed log10() approach won't work for 10, 100, 1000, … as you might expect.
// 5 - 3
echo strlen($whole) , " - " , strlen($fraction);
If you really, really want to get the length without calling any string function here you go. But it's totally not efficient at all compared to strlen().
/**
* Get integer length.
*
* @param integer $integer
* The integer to count.
* @param boolean $count_zero [optional]
* Whether 0 is to be counted or not, defaults to FALSE.
* @return integer
* The integer's length.
*/
function get_int_length($integer, $count_zero = false) {
// 0 would be 1 in string mode! Highly depends on use case.
if ($count_zero === false && $integer === 0) {
return 0;
}
return floor(log10(abs($integer))) + 1;
}
// 5 - 3
echo get_int_length($whole) , " - " , get_int_length($fraction);
The above will correctly count the result of 1 / 3, but be aware that the precision is important.
$number = 1 / 3;
// Above code outputs
// string : 1 - 10
// math : 0 - 10
$number = bcdiv(1, 3);
// Above code outputs
// string : 1 - 0 <-- oops
// math : 0 - INF <-- 8-)
No problem there.
I would like to apply a simple logic.
<?php
$num=12345.789;
$num_str="".$num; // Converting number to string
$array=explode('.',$num_str); //Explode number (String) with .
echo "Left side length : ".intval(strlen($array[0])); // $array[0] contains left hand side then check the string length
echo "<br>";
if(sizeof($array)>1)
{
echo "Right side length : ".intval(strlen($array[1])); // $array[1] contains the right hand side, then check the string length
}
?>

Base conversion of arbitrary sized numbers (PHP)

I have a long "binary string" like the output of PHPs pack function.
How can I convert this value to base62 (0-9a-zA-Z)?
The built in maths functions overflow with such long inputs, and BCmath doesn't have a base_convert function, or anything that specific. I would also need a matching "pack base62" function.
I think there is a misunderstanding behind this question. Base conversion and encoding/decoding are different. The output of base64_encode(...) is not a large base64-number. It's a series of discrete base64 values, corresponding to the encoding function. That is why BC Math does not work, because BC Math is concerned with single large numbers, not strings that are in reality groups of small numbers that represent binary data.
Here's an example to illustrate the difference:
base64_encode(1234) = "MTIzNA=="
base64_convert(1234) = "TS" //if the base64_convert function existed
base64 encoding breaks the input up into groups of 3 bytes (3*8 = 24 bits), then converts each sub-segment of 6 bits (2^6 = 64, hence "base64") to the corresponding base64 character (values are "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", where A = 0, / = 63).
In our example, base64_encode() treats "1234" as a string of 4 characters, not an integer (because base64_encode() does not operate on integers). Therefore it outputs "MTIzNA==", because (in US-ASCII/UTF-8/ISO-8859-1) "1234" is 00110001 00110010 00110011 00110100 in binary. This gets broken into 001100 (12 in decimal, character "M") 010011 (19 in decimal, character "T") 001000 ("I") 110011 ("z") 001101 ("N") 00. Since the last group isn't complete, it gets padded with 0's and the value is 000000 ("A"). Because everything is done by groups of 3 input characters, there are 2 groups: "123" and "4". The last group is padded with ='s to make it 3 chars long, so the whole output becomes "MTIzNA==".
Converting to base64, on the other hand, takes a single integer value and converts it into a single base64 value. For our example, 1234 (decimal) is "TS" (base64), if we use the same string of base64 values as above. Working backward, and left-to-right: T = 19 (column 1), S = 18 (column 0), so (19 * 64^1) + (18 * 64^0) = 19 * 64 + 18 = 1234 (decimal). The same number can be represented as "4D2" in hexadecimal (base16): (4 * 16^2) + (D * 16^1) + (2 * 16^0) = (4 * 256) + (13 * 16) + (2 * 1) = 1234 (decimal).
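As a sketch of what such a conversion could look like for ordinary integers: PHP has no base64_convert(), so the function below is hypothetical and only handles values that fit into a native int. It repeatedly divides by 64 and uses the remainders as digits.

```php
<?php
// Hypothetical function: not part of PHP, defined here only for illustration.
function base64_convert(int $number): string
{
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
    if ($number < 0) {
        throw new InvalidArgumentException('Only non-negative integers are supported.');
    }

    $digits = '';
    do {
        // The remainder is the next digit, starting from the least significant one.
        $digits = $alphabet[$number % 64] . $digits;
        $number = intdiv($number, 64);
    } while ($number > 0);

    return $digits;
}

echo base64_convert(1234), "\n"; // TS
echo dechex(1234), "\n";         // 4d2
```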
Unlike encoding, which takes a string of characters and changes it, base conversion does not alter the actual number, just changes its presentation. The hexadecimal (base16) "FF" is the same number as decimal (base10) "255", which is the same number as "11111111" in binary (base2). Think of it like currency exchange, if the exchange rate never changed: $1 USD has the same value as £0.79 GBP (exchange rate as of today, but pretend it never changes).
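A quick way to see this with built-in functions only (a small sketch; keep in mind that, as far as I know, base_convert() goes through floats internally and loses precision on large numbers, which is exactly the problem in the question):

```php
<?php
// The same number, written in three different bases.
var_dump(hexdec('FF') === 255);         // bool(true)
var_dump(bindec('11111111') === 255);   // bool(true)
echo base_convert('ff', 16, 2), "\n";   // 11111111
echo base_convert('255', 10, 16), "\n"; // ff

// Encoding, by contrast, works on the characters '2', '5', '5', not on the number 255.
echo base64_encode('255'), "\n";        // MjU1
```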
In computing, integers are typically operated on as binary values (because it's easy to build 1-bit arithmetic units and then stack them together to make 32-bit/etc. arithmetic units). To do something as simple as "255 + 255" (decimal), the computer needs to first convert the numbers to binary ("11111111" + "11111111") and then perform the operation in the Arithmetic Logic Unit (ALU).
Almost all other uses of bases are purely for the convenience of humans (presentational) - computers display their internal value 11111111 (binary) as 255 (decimal) because humans are trained to operate on decimal numbers. The function base64_convert() doesn't exist as part of the standard PHP repertoire because it's not often useful to anyone: not many humans read base64 numbers natively. By contrast, binary 1's and 0's are sometimes useful for programmers (we can use them like on/off switches!), and hexadecimal is convenient for humans editing binary data because an entire 8-bit byte can be represented unambiguously as 00 through FF, without wasting too much space.
You may ask, "if base conversion is just for presentation, why does BC Math exist?" That's a fair question, and also exactly why I said "almost" purely for presentation: typical computers are limited to 32-bit or 64-bit wide numbers, which are usually plenty big enough. Sometimes you need to operate on really, really big numbers (RSA moduli for example), which don't fit in those registers. BC Math solves this problem by acting as an abstraction layer: it converts huge numbers into long strings of text. When it's time to do some operation, BC Math painstakingly breaks the long strings of text up into small chunks which the computer can handle. It's much, much slower than native operations, but it can handle arbitrary-sized numbers.
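To give a feeling for the "small chunks" idea, here is a sketch of schoolbook long division on a decimal string, one digit at a time. This is not how BC Math is implemented internally (I have not checked its source); str_divmod() is a made-up helper that only illustrates the principle of working on a number that does not fit into a native int.

```php
<?php
// Hypothetical helper: divides a non-negative decimal string by a small positive int.
// Returns [quotient as string, remainder as int].
function str_divmod(string $decimal, int $divisor): array
{
    $quotient = '';
    $remainder = 0;
    for ($i = 0, $n = strlen($decimal); $i < $n; $i++) {
        // Bring down the next digit, exactly like long division on paper.
        $current = $remainder * 10 + (int) $decimal[$i];
        $quotient .= intdiv($current, $divisor);
        $remainder = $current % $divisor;
    }
    $quotient = ltrim($quotient, '0');
    return [$quotient === '' ? '0' : $quotient, $remainder];
}

[$q, $r] = str_divmod('1234', 64);
echo "$q remainder $r\n"; // 19 remainder 18

// 2^64 does not fit into a 64-bit signed int, but the string version handles it.
[$q, $r] = str_divmod('18446744073709551616', 2);
echo "$q remainder $r\n"; // 9223372036854775808 remainder 0
```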
Here is a function base_conv() that can convert between completely arbitrary bases, expressed as arrays of strings. Each array element represents a single "digit" in that base, thus also allowing multi-character values (it is your responsibility to avoid ambiguity).
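// Converts $val (written with the digits of $baseFrom) into the digits of $baseTo.
function base_conv($val, &$baseTo, &$baseFrom)
{
    return base_arr_to_str(base_conv_arr(base_str_to_arr((string) $val, $baseFrom), count($baseTo), count($baseFrom)), $baseTo);
}

// Converts an array of digit values by repeated long division:
// each pass divides the whole number by the target base and yields one result digit.
function base_conv_arr($val, $baseToDigits, $baseFromDigits)
{
    $valCount = count($val);
    $result = array();
    do
    {
        $divide = 0;
        $newlen = 0;
        for ($i = 0; $i < $valCount; ++$i)
        {
            $divide = $divide * $baseFromDigits + $val[$i];
            if ($divide >= $baseToDigits)
            {
                $val[$newlen ++] = (int) ($divide / $baseToDigits);
                $divide = $divide % $baseToDigits;
            }
            else if ($newlen > 0)
            {
                $val[$newlen ++] = 0;
            }
        }
        $valCount = $newlen;
        // The remainder of this pass is the next (more significant) digit of the result.
        array_unshift($result, $divide);
    }
    while ($newlen != 0);
    return $result;
}

// Maps digit values back to their string representation in $base.
function base_arr_to_str($arr, &$base)
{
    $str = '';
    foreach ($arr as $digit)
    {
        $str .= $base[$digit];
    }
    return $str;
}

// Splits $str into digit values by matching the digits of $base from the left.
function base_str_to_arr($str, &$base)
{
    $arr = array();
    // empty('0') is true in PHP, hence the explicit check for '0'.
    while ($str === '0' || !empty($str))
    {
        foreach ($base as $index => $digit)
        {
            if (mb_substr($str, 0, $digitLen = mb_strlen($digit)) === $digit)
            {
                $arr[] = $index;
                $str = mb_substr($str, $digitLen);
                continue 2;
            }
        }
        throw new Exception('Input contains a digit that is not part of the source base.');
    }
    return $arr;
}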
Examples:
$baseDec = str_split('0123456789');
$baseHex = str_split('0123456789abcdef');
echo base_conv(255, $baseHex, $baseDec); // ff
echo base_conv('ff', $baseDec, $baseHex); // 255
// multi-character base:
$baseHelloworld = array('hello ', 'world ');
echo base_conv(37, $baseHelloworld, $baseDec); // world hello hello world hello world
echo base_conv('world hello hello world hello world ', $baseDec, $baseHelloworld); // 37
// ambiguous base:
// don't do this! base_str_to_arr() won't know how to decode e.g. '11111'
// (well it does, but the result might not be what you'd expect;
// It matches digits sequentially so '11111' would be array(0, 0, 1)
// here (matched as '11', '11', '1' since they come first in the array))
$baseAmbiguous = array('11', '1', '111');
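For the original question (a binary string from pack() to base62 and back), the same repeated-division idea can be applied directly to the bytes, treating them as digits in base 256. The following is only a sketch: bin_to_base62(), base62_to_bin() and convert_digits() are names I made up, and the 0-9a-zA-Z digit order is taken from the question. One caveat: the binary string is treated as a number, so leading zero bytes are lost on the round trip; if that matters, store the length separately or prepend a non-zero byte.

```php
<?php
// Hypothetical helpers, for illustration only.
const BASE62 = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Converts an array of digit values from $fromBase to $toBase by repeated long division.
function convert_digits(array $digits, int $fromBase, int $toBase): array
{
    $result = [];
    while (count($digits) > 0) {
        $quotient = [];
        $remainder = 0;
        foreach ($digits as $digit) {
            $current = $remainder * $fromBase + $digit;
            $q = intdiv($current, $toBase);
            $remainder = $current % $toBase;
            if ($q > 0 || count($quotient) > 0) { // skip leading zeros of the quotient
                $quotient[] = $q;
            }
        }
        array_unshift($result, $remainder); // remainders come out least significant first
        $digits = $quotient;
    }
    return $result;
}

function bin_to_base62(string $binary): string
{
    if ($binary === '') {
        return '';
    }
    $alphabet = BASE62;
    $bytes = array_values(unpack('C*', $binary)); // bytes as integers 0..255
    $out = '';
    foreach (convert_digits($bytes, 256, 62) as $digit) {
        $out .= $alphabet[$digit];
    }
    return $out;
}

function base62_to_bin(string $text): string
{
    if ($text === '') {
        return '';
    }
    $digits = [];
    foreach (str_split($text) as $char) {
        $pos = strpos(BASE62, $char);
        if ($pos === false) {
            throw new InvalidArgumentException("Invalid base62 character: $char");
        }
        $digits[] = $pos;
    }
    return pack('C*', ...convert_digits($digits, 62, 256));
}

echo bin_to_base62("\xff"), "\n";                              // 47
$deck = pack('C*', 17, 3, 52, 1, 40);                          // some example bytes
var_dump(base62_to_bin(bin_to_base62($deck)) === $deck);       // bool(true)
```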
