Given a string of bytes I want their binary representation such that the bits of each byte are ordered from least to most significant (as produced by Perl's unpack "b*").
For example,
"\x28\x9b"
should return
"0001010011011001"
In this post Ilmari Karonen described how to achieve pack "b*" in PHP. So I thought all I have to do is split the hex string into bytes and run them through base_convert.
function unpack_B($data) {
$unpacked = unpack("H*", $data)[1];
$nibbles = str_split($unpacked, 2);
foreach ($nibbles as $i => $nibble) {
$nibbles[$i] = base_convert($nibble, 16, 2);
}
return implode("", $nibbles);
}
However, It's returning something different.
What am I missing here?
Looking at the docs for perl's pack() it seems like B is the usual "big endian" [I know I'm abusing this term] "descending" bit order, and b is "little endian"/"ascending".
I honestly cannot parse what on earth that the code/answer you've linked is supposed to do, so I've written it all from scratch based on what the perl docs say the pack arguments do.
function bin_to_litbin($input, $be=true) {
return implode(
'',
array_map(
function($a)use($be){
$ret = str_pad(decbin(ord($a)), 8, '0', STR_PAD_LEFT);
if(!$be) {
$ret = strrev($ret);
}
return $ret;
},
str_split($input, 1)
)
);
}
function litbin_to_bin($input, $be=true) {
return implode(
'',
array_map(
function($a)use($be){
if(!$be) {
$a=strrev($a);
}
return chr(bindec($a));
},
str_split($input, 8)
)
);
}
$hex = '00289b150065b302a06c560094cd0a80';
$bin = hex2bin($hex);
var_dump(
$hex,
$cur = bin_to_litbin($bin, false),
bin2hex(litbin_to_bin($cur, false))
);
where $be=true is B/"big endian" and $be=false is b/"little endian".
Output:
string(32) "00289b150065b302a06c560094cd0a80"
string(128) "00000000000101001101100110101000000000001010011011001101010000000000010100110110011010100000000000101001101100110101000000000001"
string(32) "00289b150065b302a06c560094cd0a80"
Though truth be told I cannot think of any practical reason to ever encode data as literal zero and one characters. It is wildly unnecessary and wasteful compared to literally any other encoding. I would wager that that is why B and b were never implemented in PHP.
Base64 is 1.33x the length its input, hex is 2x, and literal binary is 8x.
Related
pack doesn't support them, so how to read and write 64-bit unsigned little-endian encoded integers?
You could consider 64 bit numbers as an opaque chunk of 8 bytes of binary data and use the usual string manipulation functions to handle them (i.e. substr and the dot operator above all).
Whenever (and if) you need to perform arithmetics on them, you can use a couple of wrapper functions to encode/decode that chunk in its hexadecimal representation (in my implementation I call this intermediate type hex64) and use the external library you prefer to do the real work:
<?php
/* Save as test.php */
function chunk_to_hex64($chunk) {
assert(strlen($chunk) == 8);
return strrev(bin2hex($chunk));
}
function hex64_to_chunk($hex64) {
assert(strlen($hex64) == 16);
return strrev(pack('h*', $hex64));
}
// Test code:
function to_hex64($number) {
$hex64 = base_convert($number, 10, 16);
// Ensure the hex64 is left padded with '0'
return str_pad($hex64, 16, '0');
}
for ($number = 0.0; $number < 18446744073709551615.0; $number += 2932031007403.0) {
$hex64 = to_hex64($number);
$hex64_reencoded = chunk_to_hex64(hex64_to_chunk($hex64));
assert($hex64_reencoded == $hex64, "Result is $hex64_reencoded, expected $hex64");
}
$data = file_get_contents('test.php');
// Skip the last element because it is not 8 bytes
$chunks = array_slice(str_split($data, 8), 0, -1);
foreach ($chunks as $chunk) {
$hex64 = to_hex64($number);
$chunk_reencoded = hex64_to_chunk(chunk_to_hex64($chunk));
assert($chunk_reencoded == $chunk, "Result is $chunk_reencoded, expected $chunk");
}
I wrote a helper class for packing/unpacking 64-bit unsigned ints.
The relevant bit is just two lines:
$ints = unpack("#$offset/Vlo/Vhi", $data);
$sum = Math::add($ints['lo'], Math::mul($ints['hi'], '4294967296'));
(Math::* is a simple wrapper around bcmath)
And packing:
$out .= pack('VV', (int)bcmod($args[$idx],'4294967296'), (int)bcdiv($args[$idx],'4294967296'));
It splits the 64-bit ints into two 32-bit ints, which 64-bit PHP should support :-)
Ok, so PHP isn't the best language to be dealing with arbitrarily large integers in, considering that it only natively supports 32-bit signed integers. What I'm trying to do though is create a class that could represent an arbitrarily large binary number and be able to perform simple arithmetic operations on two of them (add/subtract/multiply/divide).
My target is dealing with 128-bit integers.
There's a couple of approaches I'm looking at, and problems I see with them. Any input or commentary on what you would choose and how you might go about it would be greatly appreciated.
Approach #1: Create a 128-bit integer class that stores its integer internally as four 32-bit integers. The only problem with this approach is that I'm not sure how to go about handling overflow/underflow issues when manipulating individual chunks of the two operands.
Approach #2: Use the bcmath extension, as this looks like something it was designed to tackle. My only worry in taking this approach is the scale setting of the bcmath extension, because there can't be any rounding errors in my 128-bit integers; they must be precise. I'm also worried about being able to eventually convert the result of the bcmath functions into a binary string (which I'll later need to shove into some mcrypt encryption functions).
Approach #3: Store the numbers as binary strings (probably LSB first). Theoretically I should be able to store integers of any arbitrary size this way. All I would have to do is write the four basic arithmetic functions to perform add/sub/mult/div on two binary strings and produce a binary string result. This is exactly the format I need to hand over to mcrypt as well, so that's an added plus. This is the approach I think has the most promise at the moment, but the one sticking point I've got is that PHP doesn't offer me any way to manipulate the individual bits (that I know of). I believe I'd have to break it up into byte-sized chunks (no pun intended), at which point my questions about handling overflow/underflow from Approach #1 apply.
The PHP GMP extension will be better for this. As an added bonus, you can use it to do your decimal-to-binary conversion, like so:
gmp_strval(gmp_init($n, 10), 2);
There are already various classes available for this so you may wish to look at them before writing your own solution (if indeed writing your own solution is still needed).
As far as I can tell, the bcmath extension is the one you'll want. The data in the PHP manual is a little sparse, but you out to be able to set the precision to be exactly what you need by using the bcscale() function, or the optional third parameter in most of the other bcmath functions. Not too sure on the binary strings thing, but a bit of googling tells me you ought to be able to do with by making use of the pack() function.
I implemented the following PEMDAS complaint BC evaluator which may be useful to you.
function BC($string, $precision = 32)
{
if (extension_loaded('bcmath') === true)
{
if (is_array($string) === true)
{
if ((count($string = array_slice($string, 1)) == 3) && (bcscale($precision) === true))
{
$callback = array('^' => 'pow', '*' => 'mul', '/' => 'div', '%' => 'mod', '+' => 'add', '-' => 'sub');
if (array_key_exists($operator = current(array_splice($string, 1, 1)), $callback) === true)
{
$x = 1;
$result = #call_user_func_array('bc' . $callback[$operator], $string);
if ((strcmp('^', $operator) === 0) && (($i = fmod(array_pop($string), 1)) > 0))
{
$y = BC(sprintf('((%1$s * %2$s ^ (1 - %3$s)) / %3$s) - (%2$s / %3$s) + %2$s', $string = array_shift($string), $x, $i = pow($i, -1)));
do
{
$x = $y;
$y = BC(sprintf('((%1$s * %2$s ^ (1 - %3$s)) / %3$s) - (%2$s / %3$s) + %2$s', $string, $x, $i));
}
while (BC(sprintf('%s > %s', $x, $y)));
}
if (strpos($result = bcmul($x, $result), '.') !== false)
{
$result = rtrim(rtrim($result, '0'), '.');
if (preg_match(sprintf('~[.][9]{%u}$~', $precision), $result) > 0)
{
$result = bcadd($result, (strncmp('-', $result, 1) === 0) ? -1 : 1, 0);
}
else if (preg_match(sprintf('~[.][0]{%u}[1]$~', $precision - 1), $result) > 0)
{
$result = bcmul($result, 1, 0);
}
}
return $result;
}
return intval(version_compare(call_user_func_array('bccomp', $string), 0, $operator));
}
$string = array_shift($string);
}
$string = str_replace(' ', '', str_ireplace('e', ' * 10 ^ ', $string));
while (preg_match('~[(]([^()]++)[)]~', $string) > 0)
{
$string = preg_replace_callback('~[(]([^()]++)[)]~', __FUNCTION__, $string);
}
foreach (array('\^', '[\*/%]', '[\+-]', '[<>]=?|={1,2}') as $operator)
{
while (preg_match(sprintf('~(?<![0-9])(%1$s)(%2$s)(%1$s)~', '[+-]?(?:[0-9]++(?:[.][0-9]*+)?|[.][0-9]++)', $operator), $string) > 0)
{
$string = preg_replace_callback(sprintf('~(?<![0-9])(%1$s)(%2$s)(%1$s)~', '[+-]?(?:[0-9]++(?:[.][0-9]*+)?|[.][0-9]++)', $operator), __FUNCTION__, $string, 1);
}
}
}
return (preg_match('~^[+-]?[0-9]++(?:[.][0-9]++)?$~', $string) > 0) ? $string : false;
}
It automatically deals with rounding errors, just set the precision to whatever digits you need.
I'm trying to encode a string using the Crockford Base32 Algorithm.
Unfortunately, my current code only accepts numeric values as input. I thought of converting the ASCII characters to Decimal or Octal, but then the concatenation of 010 and 100 results in 10100 which makes it impossible to decode this. Is there some way to do this I am not aware of?
I believe this should be a more efficient implementation of Crockford Base32 encoding:
function crockford_encode( $base10 ) {
return strtr( base_convert( $base10, 10, 32 ),
"abcdefghijklmnopqrstuv",
"ABCDEFGHJKMNPQRSTVWXYZ" );
}
function crockford_decode( $base32 ) {
$base32 = strtr( strtoupper( $base32 ),
"ABCDEFGHJKMNPQRSTVWXYZILO",
"abcdefghijklmnopqrstuv110" );
return base_convert( $base32, 32, 10 );
}
(demo on codepad.org)
Note that, due to known limitations (or, arguably, bugs) in PHP's base_convert() function, these functions will only return correct results for values that can be accurately represented by PHP's internal numeric type (probably double). We can hope that this will be fixed in some future PHP version, but in the mean time, you could always use this drop-in replacement for base_convert().
Edit: The easiest way to compute the optional check digit is probably simply like this:
function crockford_check( $base10 ) {
return substr( "0123456789ABCDEFGHJKMNPQRSTVWXYZ*~$=U", $base10 % 37, 1 );
}
or, for large numbers:
function crockford_check( $base10 ) {
return substr( "0123456789ABCDEFGHJKMNPQRSTVWXYZ*~$=U", bcmod( $base10, 37 ), 1 );
}
We can then use it like this:
function crockford_encode_check( $base10 ) {
return crockford_encode( $base10 ) . crockford_check( $base10 );
}
function crockford_decode_check( $base32 ) {
$base10 = crockford_decode( substr( $base32, 0, -1 ) );
if ( strtoupper( substr( $base32, -1 ) ) != crockford_check( $base10 ) ) {
return null; // wrong checksum
}
return $base10;
}
(demo on codepad.org)
Note: (July 18, 2014) The original version of the code above had a bug in the Crockford alphabet strings, such that they read ...WZYZ instead of ...WXYZ, causing some numbers to be encoded and decoded incorrectly. This bug has now been fixed, and the codepad.org versions now include a basic self-test routine to verify this. Thanks to James Firth for spotting the bug and fixing it.
I need PHP function that will create 8 chars long [a-z] hash from any input string.
So e.g. when I'll submit "Stack Overflow" it will return e.g. "gdqreaxc" (8 chars [a-z] no numbers allowed)
Perhaps something like:
$hash = substr(strtolower(preg_replace('/[0-9_\/]+/','',base64_encode(sha1($input)))),0,8);
This produces a SHA1 hash, base-64 encodes it (giving us the full alphabet), removes non-alpha chars, lowercases it, and truncates it.
For $input = 'yar!';:
mwinzewn
For $input = 'yar!!';:
yzzhzwjj
So the spread seems pretty good.
This function will generate a hash containing evenly distributed characters [a-z]:
function my_hash($string, $length = 8) {
// Convert to a string which may contain only characters [0-9a-p]
$hash = base_convert(md5($string), 16, 26);
// Get part of the string
$hash = substr($hash, -$length);
// In rare cases it will be too short, add zeroes
$hash = str_pad($hash, $length, '0', STR_PAD_LEFT);
// Convert character set from [0-9a-p] to [a-z]
$hash = strtr($hash, '0123456789', 'qrstuvwxyz');
return $hash;
}
By the way, if this is important for you, for 100,000 different strings you'll have ~2% chance of hash collision (for a 8 chars long hash), and for a million of strings this chance rises up to ~90%, if my math is correct.
function md5toabc($myMD5)
{
$newString = "";
for ($i = 0; $i < 16; $i+=2)
{
//add the first val of 0-15 to the second val of 0-15 for a range of 0-30
$myintval = hexdec(substr($myMD5, $i, $i +1) ) +
hexdec(substr($myMD5, $i+1, $i +2) );
// mod by 26 and add 97 to get to the lowercase ascii range
$newString .= chr(($myintval%26) + 97);
}
return $newString;
}
Note this introduces bias to various characters, but do with it what you will.
(Like when you roll two dice, the most common value is a 7 combined...) plus the modulo, etc...
one can give you a good a-p{8} (but not a-z) by using and modifying (the output of) a well known algo:
function mini_hash( $string )
{
$h = hash( 'crc32' , $string );
for($i=0;$i<8;$i++) {
$h{$i} = chr(96+hexdec($h{$i}));
}
return $h;
}
interesting set of constraints you posted there
how about
substr (preg_replace(md5($mystring), "/[1-9]/", ""), 0, 8 );
you could add a bit more entorpy by doing a
preg_replace($myString, "1", "g");
preg_replace($myString, "2", "h");
preg_replace($myString, "3", "i");
etc instead of stripping the digits.
Ok, so PHP isn't the best language to be dealing with arbitrarily large integers in, considering that it only natively supports 32-bit signed integers. What I'm trying to do though is create a class that could represent an arbitrarily large binary number and be able to perform simple arithmetic operations on two of them (add/subtract/multiply/divide).
My target is dealing with 128-bit integers.
There's a couple of approaches I'm looking at, and problems I see with them. Any input or commentary on what you would choose and how you might go about it would be greatly appreciated.
Approach #1: Create a 128-bit integer class that stores its integer internally as four 32-bit integers. The only problem with this approach is that I'm not sure how to go about handling overflow/underflow issues when manipulating individual chunks of the two operands.
Approach #2: Use the bcmath extension, as this looks like something it was designed to tackle. My only worry in taking this approach is the scale setting of the bcmath extension, because there can't be any rounding errors in my 128-bit integers; they must be precise. I'm also worried about being able to eventually convert the result of the bcmath functions into a binary string (which I'll later need to shove into some mcrypt encryption functions).
Approach #3: Store the numbers as binary strings (probably LSB first). Theoretically I should be able to store integers of any arbitrary size this way. All I would have to do is write the four basic arithmetic functions to perform add/sub/mult/div on two binary strings and produce a binary string result. This is exactly the format I need to hand over to mcrypt as well, so that's an added plus. This is the approach I think has the most promise at the moment, but the one sticking point I've got is that PHP doesn't offer me any way to manipulate the individual bits (that I know of). I believe I'd have to break it up into byte-sized chunks (no pun intended), at which point my questions about handling overflow/underflow from Approach #1 apply.
The PHP GMP extension will be better for this. As an added bonus, you can use it to do your decimal-to-binary conversion, like so:
gmp_strval(gmp_init($n, 10), 2);
There are already various classes available for this so you may wish to look at them before writing your own solution (if indeed writing your own solution is still needed).
As far as I can tell, the bcmath extension is the one you'll want. The data in the PHP manual is a little sparse, but you out to be able to set the precision to be exactly what you need by using the bcscale() function, or the optional third parameter in most of the other bcmath functions. Not too sure on the binary strings thing, but a bit of googling tells me you ought to be able to do with by making use of the pack() function.
I implemented the following PEMDAS complaint BC evaluator which may be useful to you.
function BC($string, $precision = 32)
{
if (extension_loaded('bcmath') === true)
{
if (is_array($string) === true)
{
if ((count($string = array_slice($string, 1)) == 3) && (bcscale($precision) === true))
{
$callback = array('^' => 'pow', '*' => 'mul', '/' => 'div', '%' => 'mod', '+' => 'add', '-' => 'sub');
if (array_key_exists($operator = current(array_splice($string, 1, 1)), $callback) === true)
{
$x = 1;
$result = #call_user_func_array('bc' . $callback[$operator], $string);
if ((strcmp('^', $operator) === 0) && (($i = fmod(array_pop($string), 1)) > 0))
{
$y = BC(sprintf('((%1$s * %2$s ^ (1 - %3$s)) / %3$s) - (%2$s / %3$s) + %2$s', $string = array_shift($string), $x, $i = pow($i, -1)));
do
{
$x = $y;
$y = BC(sprintf('((%1$s * %2$s ^ (1 - %3$s)) / %3$s) - (%2$s / %3$s) + %2$s', $string, $x, $i));
}
while (BC(sprintf('%s > %s', $x, $y)));
}
if (strpos($result = bcmul($x, $result), '.') !== false)
{
$result = rtrim(rtrim($result, '0'), '.');
if (preg_match(sprintf('~[.][9]{%u}$~', $precision), $result) > 0)
{
$result = bcadd($result, (strncmp('-', $result, 1) === 0) ? -1 : 1, 0);
}
else if (preg_match(sprintf('~[.][0]{%u}[1]$~', $precision - 1), $result) > 0)
{
$result = bcmul($result, 1, 0);
}
}
return $result;
}
return intval(version_compare(call_user_func_array('bccomp', $string), 0, $operator));
}
$string = array_shift($string);
}
$string = str_replace(' ', '', str_ireplace('e', ' * 10 ^ ', $string));
while (preg_match('~[(]([^()]++)[)]~', $string) > 0)
{
$string = preg_replace_callback('~[(]([^()]++)[)]~', __FUNCTION__, $string);
}
foreach (array('\^', '[\*/%]', '[\+-]', '[<>]=?|={1,2}') as $operator)
{
while (preg_match(sprintf('~(?<![0-9])(%1$s)(%2$s)(%1$s)~', '[+-]?(?:[0-9]++(?:[.][0-9]*+)?|[.][0-9]++)', $operator), $string) > 0)
{
$string = preg_replace_callback(sprintf('~(?<![0-9])(%1$s)(%2$s)(%1$s)~', '[+-]?(?:[0-9]++(?:[.][0-9]*+)?|[.][0-9]++)', $operator), __FUNCTION__, $string, 1);
}
}
}
return (preg_match('~^[+-]?[0-9]++(?:[.][0-9]++)?$~', $string) > 0) ? $string : false;
}
It automatically deals with rounding errors, just set the precision to whatever digits you need.