Implementing the Crockford Base32 encoding in PHP


I'm trying to encode a string using the Crockford Base32 Algorithm.
Unfortunately, my current code only accepts numeric values as input. I thought of converting the ASCII characters to decimal or octal, but then concatenating codes such as 010 and 100 gives 010100, which as a number is just 10100; the leading zero is lost, so the result cannot be decoded unambiguously. Is there some way to do this that I am not aware of?

I believe this should be a more efficient implementation of Crockford Base32 encoding:
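function crockford_encode( $base10 ) {
    // base_convert() emits lowercase digits 0-9a-v; map a-v onto Crockford's
    // letters (the Crockford alphabet skips I, L, O and U).
    return strtr( base_convert( $base10, 10, 32 ),
        "abcdefghijklmnopqrstuv",
        "ABCDEFGHJKMNPQRSTVWXYZ" );
}
function crockford_decode( $base32 ) {
    // Accept lowercase input and the confusable I/L (read as 1) and O (read as 0),
    // then map back to base_convert()'s 0-9a-v digits.
    $base32 = strtr( strtoupper( $base32 ),
        "ABCDEFGHJKMNPQRSTVWXYZILO",
        "abcdefghijklmnopqrstuv110" );
    return base_convert( $base32, 32, 10 );
}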
(demo on codepad.org)
Note that, due to known limitations (or, arguably, bugs) in PHP's base_convert() function, these functions will only return correct results for values that can be accurately represented by PHP's internal numeric type (probably double). We can hope that this will be fixed in some future PHP version, but in the meantime, you could always use this drop-in replacement for base_convert().
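As an alternative to a base_convert() replacement, the conversion can be done directly on decimal strings, one digit at a time, so there is no precision limit. This is a minimal sketch rather than the answer's code: decstr_divmod(), decstr_muladd(), crockford_encode_big() and crockford_decode_big() are hypothetical names, and the input is assumed to be a non-negative decimal string without sign or whitespace.

```php
<?php
// Hypothetical helper: divide a non-negative decimal string by a small int,
// returning [quotient as decimal string, remainder as int].
function decstr_divmod(string $dec, int $divisor): array
{
    $quotient = '';
    $remainder = 0;
    for ($i = 0, $n = strlen($dec); $i < $n; $i++) {
        $remainder = $remainder * 10 + (int) $dec[$i];
        $quotient .= intdiv($remainder, $divisor);
        $remainder %= $divisor;
    }
    $quotient = ltrim($quotient, '0');
    return [$quotient === '' ? '0' : $quotient, $remainder];
}

// Hypothetical helper: compute $dec * $mul + $add on a decimal string.
function decstr_muladd(string $dec, int $mul, int $add): string
{
    $carry = $add;
    $out = '';
    for ($i = strlen($dec) - 1; $i >= 0; $i--) {
        $carry += (int) $dec[$i] * $mul;
        $out = ($carry % 10) . $out;
        $carry = intdiv($carry, 10);
    }
    for (; $carry > 0; $carry = intdiv($carry, 10)) {
        $out = ($carry % 10) . $out;
    }
    $out = ltrim($out, '0');
    return $out === '' ? '0' : $out;
}

function crockford_encode_big(string $base10): string
{
    $alphabet = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';
    $out = '';
    do {
        // Each division by 32 yields the next symbol, least significant first.
        [$base10, $digit] = decstr_divmod($base10, 32);
        $out = $alphabet[$digit] . $out;
    } while ($base10 !== '0');
    return $out;
}

function crockford_decode_big(string $base32): string
{
    $alphabet = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';
    // Same normalisation as crockford_decode(): uppercase, I/L -> 1, O -> 0.
    $base32 = strtr(strtoupper($base32), 'ILO', '110');
    $dec = '0';
    for ($i = 0, $n = strlen($base32); $i < $n; $i++) {
        $digit = strpos($alphabet, $base32[$i]);
        if ($digit === false) {
            throw new InvalidArgumentException("Invalid symbol: {$base32[$i]}");
        }
        $dec = decstr_muladd($dec, 32, $digit);
    }
    return $dec;
}
```

Each pass over the digits is linear, so the whole conversion is quadratic in the number of digits; that is fine for identifiers of a few dozen digits.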
Edit: The easiest way to compute the optional check digit is probably simply like this:
function crockford_check( $base10 ) {
    return substr( "0123456789ABCDEFGHJKMNPQRSTVWXYZ*~$=U", $base10 % 37, 1 );
}
or, for large numbers:
function crockford_check( $base10 ) {
    return substr( "0123456789ABCDEFGHJKMNPQRSTVWXYZ*~$=U", bcmod( $base10, 37 ), 1 );
}
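Without bcmath, the remainder modulo 37 can also be computed digit by digit on the decimal string, because (10 * r + d) mod 37 depends only on r mod 37. A sketch under that assumption; decstr_mod() and crockford_check_str() are hypothetical names, and the input must be a non-negative decimal string:

```php
<?php
// Remainder of a non-negative decimal string modulo a small int, one digit at a time.
function decstr_mod(string $dec, int $m): int
{
    $r = 0;
    for ($i = 0, $n = strlen($dec); $i < $n; $i++) {
        $r = ($r * 10 + (int) $dec[$i]) % $m;
    }
    return $r;
}

// Crockford check symbol: the 32 encoding symbols followed by * ~ $ = U (values 32-36).
function crockford_check_str(string $base10): string
{
    return substr('0123456789ABCDEFGHJKMNPQRSTVWXYZ*~$=U', decstr_mod($base10, 37), 1);
}
```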
We can then use it like this:
function crockford_encode_check( $base10 ) {
    return crockford_encode( $base10 ) . crockford_check( $base10 );
}
function crockford_decode_check( $base32 ) {
    $base10 = crockford_decode( substr( $base32, 0, -1 ) );
    if ( strtoupper( substr( $base32, -1 ) ) != crockford_check( $base10 ) ) {
        return null; // wrong checksum
    }
    return $base10;
}
(demo on codepad.org)
Note: (July 18, 2014) The original version of the code above had a bug in the Crockford alphabet strings, such that they read ...WZYZ instead of ...WXYZ, causing some numbers to be encoded and decoded incorrectly. This bug has now been fixed, and the codepad.org versions now include a basic self-test routine to verify this. Thanks to James Firth for spotting the bug and fixing it.
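The functions above encode numbers; the question itself asks about encoding an arbitrary string. One common approach, swapped in here and not part of the answer, is the bit grouping used by RFC 4648 Base32 with Crockford's alphabet and no "=" padding: read the bytes as one bit stream and emit a symbol for every 5 bits. Because each byte always contributes exactly 8 bits, the decoder knows where every byte ends, which avoids the ambiguity described in the question. A sketch; crockford_encode_bytes() and crockford_decode_bytes() are hypothetical names, and the decoder assumes its input came from this encoder (trailing pad bits are discarded):

```php
<?php
const CROCKFORD_ALPHABET = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';

function crockford_encode_bytes(string $data): string
{
    $out = '';
    $buffer = 0; // pending bits, most significant first
    $bits = 0;   // number of pending bits in $buffer
    for ($i = 0, $n = strlen($data); $i < $n; $i++) {
        $buffer = ($buffer << 8) | ord($data[$i]);
        $bits += 8;
        while ($bits >= 5) {
            $bits -= 5;
            $out .= CROCKFORD_ALPHABET[($buffer >> $bits) & 0x1F];
        }
        $buffer &= (1 << $bits) - 1; // drop the bits already emitted
    }
    if ($bits > 0) {
        // Pad the final partial group with zero bits on the right.
        $out .= CROCKFORD_ALPHABET[($buffer << (5 - $bits)) & 0x1F];
    }
    return $out;
}

function crockford_decode_bytes(string $text): string
{
    // Same normalisation as crockford_decode(): uppercase, I/L -> 1, O -> 0.
    $text = strtr(strtoupper($text), 'ILO', '110');
    $out = '';
    $buffer = 0;
    $bits = 0;
    for ($i = 0, $n = strlen($text); $i < $n; $i++) {
        $value = strpos(CROCKFORD_ALPHABET, $text[$i]);
        if ($value === false) {
            throw new InvalidArgumentException("Invalid Crockford symbol: {$text[$i]}");
        }
        $buffer = ($buffer << 5) | $value;
        $bits += 5;
        if ($bits >= 8) {
            $bits -= 8;
            $out .= chr(($buffer >> $bits) & 0xFF);
            $buffer &= (1 << $bits) - 1;
        }
    }
    // Any remaining bits (fewer than 8) are the zero padding added by the encoder.
    return $out;
}
```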

Related

Implement unpack b from Perl in PHP

Given a string of bytes I want their binary representation such that the bits of each byte are ordered from least to most significant (as produced by Perl's unpack "b*").
For example,
"\x28\x9b"
should return
"0001010011011001"
In this post Ilmari Karonen described how to achieve pack "b*" in PHP. So I thought all I have to do is split the hex string into bytes and run them through base_convert.
function unpack_B($data) {
    $unpacked = unpack("H*", $data)[1];
    $nibbles = str_split($unpacked, 2);
    foreach ($nibbles as $i => $nibble) {
        $nibbles[$i] = base_convert($nibble, 16, 2);
    }
    return implode("", $nibbles);
}
However, it's returning something different.
What am I missing here?

Looking at the docs for Perl's pack(), it seems that B is the usual "big endian" [I know I'm abusing this term], "descending" bit order, and b is "little endian"/"ascending".
I honestly cannot parse what on earth the code/answer you've linked is supposed to do, so I've written it all from scratch based on what the Perl docs say the pack arguments do.
function bin_to_litbin($input, $be=true) {
return implode(
'',
array_map(
function($a)use($be){
$ret = str_pad(decbin(ord($a)), 8, '0', STR_PAD_LEFT);
if(!$be) {
$ret = strrev($ret);
}
return $ret;
},
str_split($input, 1)
)
);
}
function litbin_to_bin($input, $be=true) {
return implode(
'',
array_map(
function($a)use($be){
if(!$be) {
$a=strrev($a);
}
return chr(bindec($a));
},
str_split($input, 8)
)
);
}
$hex = '00289b150065b302a06c560094cd0a80';
$bin = hex2bin($hex);
var_dump(
$hex,
$cur = bin_to_litbin($bin, false),
bin2hex(litbin_to_bin($cur, false))
);
where $be=true is B/"big endian" and $be=false is b/"little endian".
Output:
string(32) "00289b150065b302a06c560094cd0a80"
string(128) "00000000000101001101100110101000000000001010011011001101010000000000010100110110011010100000000000101001101100110101000000000001"
string(32) "00289b150065b302a06c560094cd0a80"
Though truth be told I cannot think of any practical reason to ever encode data as literal zero and one characters. It is wildly unnecessary and wasteful compared to literally any other encoding. I would wager that that is why B and b were never implemented in PHP.
Base64 is 1.33x the length of its input, hex is 2x, and literal binary is 8x.
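For the exact conversion the question asks for (Perl's unpack "b*"), the original attempt misses two things: base_convert() drops leading zeros, and it returns each byte's bits most significant first. A compact sketch that zero-pads with sprintf('%08b') and reverses each byte; unpack_b() and pack_b() are hypothetical names, and pack_b() assumes its input length is a multiple of 8:

```php
<?php
// Like Perl's unpack "b*": each byte becomes 8 characters, least significant bit first.
function unpack_b(string $data): string
{
    $bits = '';
    for ($i = 0, $n = strlen($data); $i < $n; $i++) {
        // %08b zero-pads to 8 digits (MSB first); strrev() puts the LSB first.
        $bits .= strrev(sprintf('%08b', ord($data[$i])));
    }
    return $bits;
}

// Like Perl's pack "b*", for input whose length is a multiple of 8.
function pack_b(string $bits): string
{
    $data = '';
    for ($i = 0, $n = strlen($bits); $i < $n; $i += 8) {
        $data .= chr(bindec(strrev(substr($bits, $i, 8))));
    }
    return $data;
}
```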

Converting binary string to hex in PHP

I have a string of binary text which is a multiple of 8 characters long. Take the following for example.
$fullNameBin = "01010001000010110110010000010011000110000111100011001011111110110100111100111100";
I wish to convert this to hex. Note that the bits of each byte are in least significant to most significant order, so the above should result in
$fullNameCoded = "8AD026C8181ED3DFF23C";
In Perl, this can be achieved using
my $fullNameCoded = "";
for ( unpack( '(A8)*', $fullNameBin ) ) {
    $fullNameCoded .= sprintf( "%02X", oct( "0b" . reverse( $_ ) ) );
}
or
my $fullNameCoded = uc unpack 'H*', pack 'b*', $fullNameBin;
PHP's pack/unpack is much more limited than Perl's, and a naive translation was unfruitful.
foreach ( unpack( "A8*", $fullNameBin ) as $item ) {
    $fullNameCoded .= sprintf( "%02X", octdec( "0b" . strrev( $item ) ) );
}

What I did was split the binary string representation into 8-character chunks, reverse each chunk, convert it to hex, and uppercase the result.
$fullNameBin = "01010001000010110110010000010011000110000111100011001011111110110100111100111100";
$fullNameCoded = '';
foreach (str_split($fullNameBin, 8) as $char) {
    // %02X zero-pads, so bytes below 0x10 still produce two hex digits
    $fullNameCoded .= sprintf('%02X', bindec(strrev($char)));
}
gives
8AD026C8181ED3DFF23C
I may be totally wrong, so please correct me if I am:
To me it looks like unpack with "A8*" can get 8 characters from the string, but cannot repeat until the end once a count is given.
"A*" will give the whole string.
"A8" will give 8 chars.
"A8*" will also give only 8 chars and does not repeat.
Conclusion
I think it does not work with unpack, because $fullNameBin is not binary, but a string representation of the binary data.
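A closer match to the Perl one-liner (pack 'b*' followed by unpack 'H*') is to build the raw bytes first and hex-encode them with bin2hex(), which always emits two digits per byte. PHP's pack() has no 'b' format, so the bytes are assembled by hand. A sketch; bin_lsb_first_to_hex() is a hypothetical name and the input length is assumed to be a multiple of 8:

```php
<?php
// Each 8-character group is LSB first: reverse it, turn it into a byte,
// then hex-encode all bytes at once.
function bin_lsb_first_to_hex(string $bits): string
{
    $bytes = '';
    foreach (str_split($bits, 8) as $group) {
        $bytes .= chr(bindec(strrev($group)));
    }
    return strtoupper(bin2hex($bytes));
}

$fullNameBin = "01010001000010110110010000010011000110000111100011001011111110110100111100111100";
echo bin_lsb_first_to_hex($fullNameBin), "\n"; // 8AD026C8181ED3DFF23C
```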

Validate TelephoneNumber PHP

I was given a task to validate a telephone number (stored in the variable $number) entered by a user on my website:
$number = $_POST["telephone"];
The thing is, this validation is quite complex, as I must check whether the number is from Portugal. I thought about validating it using all the indicators from Portugal, of which there are 52 (50 indicators are 3 digits long and 2 indicators are 2 digits long). Example of a number:
254872272 (254 is the indicator)
I also thought about making an array with all the indicators and then, in a loop, somehow checking whether the first 2 or 3 digits are equal to one of the entries in the array.
What do you think? How should I solve this problem?

One way is to use regular expressions with named subpatterns:
$number = 254872272;
$ind = array( 251, 252, 254 );
preg_match( '/^(?<ind>\d{3})(?<rest>\d{6})$/', $number, $match );
if ( isset($match['ind']) && in_array( (int) $match['ind'], $ind, true ) ) {
    print_r( $match );
    /*
    Array
    (
        [0] => 254872272
        [ind] => 254
        [1] => 254
        [rest] => 872272
        [2] => 872272
    )
    */
}
Or you can insert indicators directly into regular expression:
preg_match( '/^(?<ind>251|252|254)(?<rest>\d{6})$/', $number, $match );

There are potential regex ways of "solving" this, but really, all you need is in_array() with your indicators in an array. For example:
$indicators = array('254', '072', '345');
$numbers = array(
    '254872272',
    '225872272',
    '054872272',
    '072872272',
    '294872272',
    '974872272',
    '345872272'
);
while ($number = array_shift($numbers)) {
    $indicator = substr($number, 0, 3);
    if (in_array($indicator, $indicators)) {
        echo "$number is indicated ($indicator).\n";
    } else {
        echo "$number is NOT indicated ($indicator).\n";
    }
}
http://codepad.org/zesUaxF7
This gives:
254872272 is indicated (254).
225872272 is NOT indicated (225).
054872272 is NOT indicated (054).
072872272 is indicated (072).
294872272 is NOT indicated (294).
974872272 is NOT indicated (974).
345872272 is indicated (345).
Also, I use strings instead of integers on purpose: an integer cannot keep a leading zero (and in PHP source, an integer literal such as 0724445555 is even read as octal), so you need to use strings to make sure numbers beginning with 0 work correctly.

Perhaps with a regular expression?
I have not tested the following, but it should check for one of the matching indicators, followed by any 6 digits, something like:
$indicators = array('123', '456', '78'); // etc...
$regex = '/^(' . implode('|', $indicators) . ')[0-9]{6}$/';
if (preg_match($regex, 'your test number')) {
    // Run further code...
}
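The regex above appends 6 digits to every indicator, so 2-digit indicators would only match 8-digit numbers. If every number is 9 digits long (an assumption here, consistent with the example 254872272), a lookahead can enforce the total length while the alternation checks the prefix. A sketch; has_valid_indicator() is a hypothetical name and the indicator list is an illustrative placeholder, not the real list of 52:

```php
<?php
// ASSUMPTION: valid numbers are exactly 9 digits; the indicators below are placeholders.
function has_valid_indicator(string $number, array $indicators): bool
{
    // Escape each indicator so it is matched literally inside the alternation.
    $alternation = implode('|', array_map(fn($i) => preg_quote($i, '/'), $indicators));
    // The lookahead enforces the total length; the group checks the 2- or 3-digit prefix.
    $regex = '/^(?=[0-9]{9}$)(?:' . $alternation . ')/';
    return preg_match($regex, $number) === 1;
}

$indicators = ['21', '22', '251', '252', '254']; // illustrative subset only
```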

There are a couple of libraries around that aim to validate as many telephone number formats as possible against the actual numbering formats defined by the relevant authorities.
They are usually based on a library by Google (libphonenumber), and there are versions for PHP.

PHP Sum of two numbers resulting in a large number with a + symbol [duplicate]

OK, so PHP isn't the best language for dealing with arbitrarily large integers, considering that it natively supports only signed integers of the platform's word size (32 bits on some builds, 64 bits on most modern ones). What I'm trying to do, though, is create a class that can represent an arbitrarily large binary number and perform simple arithmetic operations (add/subtract/multiply/divide) on two of them.
My target is dealing with 128-bit integers.
There's a couple of approaches I'm looking at, and problems I see with them. Any input or commentary on what you would choose and how you might go about it would be greatly appreciated.
Approach #1: Create a 128-bit integer class that stores its integer internally as four 32-bit integers. The only problem with this approach is that I'm not sure how to go about handling overflow/underflow issues when manipulating individual chunks of the two operands.
Approach #2: Use the bcmath extension, as this looks like something it was designed to tackle. My only worry in taking this approach is the scale setting of the bcmath extension, because there can't be any rounding errors in my 128-bit integers; they must be precise. I'm also worried about being able to eventually convert the result of the bcmath functions into a binary string (which I'll later need to shove into some mcrypt encryption functions).
Approach #3: Store the numbers as binary strings (probably LSB first). Theoretically I should be able to store integers of any arbitrary size this way. All I would have to do is write the four basic arithmetic functions to perform add/sub/mult/div on two binary strings and produce a binary string result. This is exactly the format I need to hand over to mcrypt as well, so that's an added plus. This is the approach I think has the most promise at the moment, but the one sticking point I've got is that PHP doesn't offer me any way to manipulate the individual bits (that I know of). I believe I'd have to break it up into byte-sized chunks (no pun intended), at which point my questions about handling overflow/underflow from Approach #1 apply.
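The overflow worry in Approach #1 is manageable on a 64-bit PHP build: adding two 32-bit limbs plus a carry never exceeds 2^33 - 1, so each step can keep the low 32 bits and carry the rest into the next limb. A minimal sketch of addition only, assuming PHP_INT_SIZE === 8; add128() and to_binary128() are hypothetical names:

```php
<?php
// Sketch of Approach #1: an unsigned 128-bit value as four 32-bit limbs,
// least significant limb first. Requires 64-bit PHP integers.
if (PHP_INT_SIZE !== 8) {
    throw new RuntimeException('This sketch assumes a 64-bit PHP build');
}

const LIMB_MASK = 0xFFFFFFFF;

function add128(array $a, array $b): array
{
    $result = [];
    $carry = 0;
    for ($i = 0; $i < 4; $i++) {
        $sum = $a[$i] + $b[$i] + $carry; // at most 2^33 - 1, fits in 64 bits
        $result[$i] = $sum & LIMB_MASK;  // keep the low 32 bits in this limb
        $carry = $sum >> 32;             // 0 or 1, carried into the next limb
    }
    // A carry left over here means the sum needed 129 bits; it wraps modulo 2^128.
    return $result;
}

// Serialize to a 16-byte big-endian binary string ('N' = 32-bit big-endian).
function to_binary128(array $limbs): string
{
    return pack('N4', $limbs[3], $limbs[2], $limbs[1], $limbs[0]);
}
```

Subtraction works the same way with a borrow; multiplication and division need considerably more work, which is where an extension such as GMP or bcmath pays off.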

The PHP GMP extension will be better for this. As an added bonus, you can use it to do your decimal-to-binary conversion, like so:
gmp_strval(gmp_init($n, 10), 2);

There are already various classes available for this so you may wish to look at them before writing your own solution (if indeed writing your own solution is still needed).

As far as I can tell, the bcmath extension is the one you'll want. The documentation in the PHP manual is a little sparse, but you ought to be able to set the precision to exactly what you need by using the bcscale() function, or the optional third parameter in most of the other bcmath functions. I'm not too sure about the binary strings thing, but a bit of googling tells me you ought to be able to do it by making use of the pack() function.

I implemented the following PEMDAS-compliant BC evaluator, which may be useful to you.
function BC($string, $precision = 32)
{
    if (extension_loaded('bcmath') === true)
    {
        if (is_array($string) === true)
        {
            // bcscale() returns true before PHP 7.3 and the previous scale (an int) since then
            if ((count($string = array_slice($string, 1)) == 3) && (bcscale($precision) !== false))
            {
                $callback = array('^' => 'pow', '*' => 'mul', '/' => 'div', '%' => 'mod', '+' => 'add', '-' => 'sub');
                if (array_key_exists($operator = current(array_splice($string, 1, 1)), $callback) === true)
                {
                    $x = 1;
                    $result = @call_user_func_array('bc' . $callback[$operator], $string);
                    if ((strcmp('^', $operator) === 0) && (($i = fmod(array_pop($string), 1)) > 0))
                    {
                        $y = BC(sprintf('((%1$s * %2$s ^ (1 - %3$s)) / %3$s) - (%2$s / %3$s) + %2$s', $string = array_shift($string), $x, $i = pow($i, -1)));
                        do
                        {
                            $x = $y;
                            $y = BC(sprintf('((%1$s * %2$s ^ (1 - %3$s)) / %3$s) - (%2$s / %3$s) + %2$s', $string, $x, $i));
                        }
                        while (BC(sprintf('%s > %s', $x, $y)));
                    }
                    if (strpos($result = bcmul($x, $result), '.') !== false)
                    {
                        $result = rtrim(rtrim($result, '0'), '.');
                        if (preg_match(sprintf('~[.][9]{%u}$~', $precision), $result) > 0)
                        {
                            $result = bcadd($result, (strncmp('-', $result, 1) === 0) ? -1 : 1, 0);
                        }
                        else if (preg_match(sprintf('~[.][0]{%u}[1]$~', $precision - 1), $result) > 0)
                        {
                            $result = bcmul($result, 1, 0);
                        }
                    }
                    return $result;
                }
                return intval(version_compare(call_user_func_array('bccomp', $string), 0, $operator));
            }
            $string = array_shift($string);
        }
        $string = str_replace(' ', '', str_ireplace('e', ' * 10 ^ ', $string));
        while (preg_match('~[(]([^()]++)[)]~', $string) > 0)
        {
            $string = preg_replace_callback('~[(]([^()]++)[)]~', __FUNCTION__, $string);
        }
        foreach (array('\^', '[\*/%]', '[\+-]', '[<>]=?|={1,2}') as $operator)
        {
            while (preg_match(sprintf('~(?<![0-9])(%1$s)(%2$s)(%1$s)~', '[+-]?(?:[0-9]++(?:[.][0-9]*+)?|[.][0-9]++)', $operator), $string) > 0)
            {
                $string = preg_replace_callback(sprintf('~(?<![0-9])(%1$s)(%2$s)(%1$s)~', '[+-]?(?:[0-9]++(?:[.][0-9]*+)?|[.][0-9]++)', $operator), __FUNCTION__, $string, 1);
            }
        }
    }
    return (preg_match('~^[+-]?[0-9]++(?:[.][0-9]++)?$~', $string) > 0) ? $string : false;
}
It automatically deals with rounding errors; just set the precision to however many digits you need.

PHP unpack question

list(,$nfields) = unpack ( "N*", substr ( $response, $p, 4 ) ); $p += 4;
The question is: why "N*", if substr() should return 4 bytes, which will be unpacked as a single N? And why the double assignment?
Update: This code is part of the Sphinx native PHP connector. After some code hacking it became clear that this code extracts a 4-byte integer. But the logic behind the double assignment and substr / N* is still unclear to me. I'm offering a bounty to finally understand it.

We'd need to see the revision history of the file, but some possibilities are:
These are the remains of a previous algorithm that was progressively stripped of functionality but never cleaned up.
It's the typical spaghetti code we all produce after a bad night.
It's an optimization that speeds up the code for large input strings.
These are all equivalent:
<?php
$packed = pack('N*', 100, 200, 300);
// 1
var_dump( unpack('N*', $packed) );
// 2
var_dump( unpack('N*', substr($packed, 0, 4)) );
var_dump( unpack('N*', substr($packed, 4, 4)) );
var_dump( unpack('N*', substr($packed, 8, 4)) );
// 3
var_dump( unpack('N', substr($packed, 0, 4)) );
var_dump( unpack('N', substr($packed, 4, 4)) );
var_dump( unpack('N', substr($packed, 8, 4)) );
?>
I did the typical repeat-a-thousand-times benchmark with three integers and 1 is way faster. However, a similar test with 10,000 integers shows that 1 is the slowest :-!
1: 0.82868695259094 seconds
2: 0.0046610832214355 seconds
3: 0.0029149055480957 seconds
Since this is a full-text engine where performance is a must, I'd dare say it's an optimization.

The code is probably a bug. This kind of loop is precisely the reason why * exists...

unpack ( "N*", substr ( $response, $p, 4 ) );
Specifies the format to use when unpacking the data from substr():
N - unsigned long, always 32 bit, big endian byte order
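On the "double assignment": unpack() numbers unnamed results starting at 1, not 0, so list(, $nfields) leaves slot 0 empty and assigns element 1. Since substr() limits the input to 4 bytes, "N*" and "N" produce the same one-element array here. A short illustration with a hypothetical $response buffer (the offset argument to unpack() requires PHP 7.1 or later):

```php
<?php
// Hypothetical buffer: a 4-byte big-endian count followed by other data.
$response = pack('N', 42) . 'rest of the payload';
$p = 0;

// With unnamed formats, unpack() returns keys starting at 1: here [1 => 42].
$parts = unpack('N*', substr($response, $p, 4));

// list(, $x) skips slot 0 and assigns element 1 to $nfields.
list(, $nfields) = $parts;
$p += 4;

// Equivalent, more explicit forms:
$viaIndex = unpack('N', substr($response, 0, 4))[1];
$viaName = unpack('Ncount', $response, 0)['count']; // offset argument: PHP 7.1+
```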
