list(,$nfields) = unpack ( "N*", substr ( $response, $p, 4 ) ); $p += 4;
The question is, why "N*" if substr should return 4 bytes, and they will be unpacked as N? And why double assignment?
UPD: This code is part of the Sphinx native PHP connector. After some code hacking it became clear that this code extracts a 4-byte integer. But the logic behind the double assignment and the substr / N* combination is still unclear to me. I'm offering a bounty to finally understand it.
We'd need to see the revision history of the file but some possibilities are:
These are the remains of a previous algorithm that was progressively stripped of functionality but never cleaned up.
It's the typical spaghetti code we all produce after a bad night.
It's an optimization that speeds up the code for large input strings.
These are all synonyms:
<?php
$packed = pack('N*', 100, 200, 300);
// 1
var_dump( unpack('N*', $packed) );
// 2
var_dump( unpack('N*', substr($packed, 0, 4)) );
var_dump( unpack('N*', substr($packed, 4, 4)) );
var_dump( unpack('N*', substr($packed, 8, 4)) );
// 3
var_dump( unpack('N', substr($packed, 0, 4)) );
var_dump( unpack('N', substr($packed, 4, 4)) );
var_dump( unpack('N', substr($packed, 8, 4)) );
?>
I did the typical repeat-a-thousand-times benchmark with three integers and 1 is way faster. However, a similar test with 10,000 integers shows that 1 is the slowest :-!
0.82868695259094 seconds
0.0046610832214355 seconds
0.0029149055480957 seconds
Being a full-text engine where performance is a must, I'd dare say it's an optimization.
The code is probably a bug. This kind of loop is precisely the reason why * exists...
unpack ( "N*", substr ( $response, $p, 4 ) );
Specifies the format to use when unpacking the data from substr()
N - unsigned long, always 32 bit, big endian byte order
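As for the "double assignment", here is a minimal sketch of what `list(, $nfields)` does with the array that `unpack()` returns. The contents of `$response` are made up for illustration; the real connector fills that buffer from the network.

```php
<?php
// Hypothetical stand-in for the connector's buffer: two big-endian 32-bit integers.
$response = pack('N2', 7, 42);
$p = 0;

// unpack() keys unnamed elements starting at 1, not 0.
$fields = unpack('N*', substr($response, $p, 4));
// $fields is now [1 => 7]

// list(, $nfields) skips offset 0 and assigns offset 1 to $nfields.
list(, $nfields) = unpack('N*', substr($response, $p, 4));
$p += 4;

// The next read picks up where the cursor $p was left.
list(, $second) = unpack('N*', substr($response, $p, 4));
$p += 4;

echo $nfields, ' ', $second, "\n"; // 7 42
```

Since `substr()` hands over exactly 4 bytes, `N*` and `N` both yield a single element here, so the `*` makes no difference to the result.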
Given a string of bytes I want their binary representation such that the bits of each byte are ordered from least to most significant (as produced by Perl's unpack "b*").
For example,
"\x28\x9b"
should return
"0001010011011001"
In this post Ilmari Karonen described how to achieve pack "b*" in PHP. So I thought all I have to do is split the hex string into bytes and run them through base_convert.
function unpack_B($data) {
$unpacked = unpack("H*", $data)[1];
$nibbles = str_split($unpacked, 2);
foreach ($nibbles as $i => $nibble) {
$nibbles[$i] = base_convert($nibble, 16, 2);
}
return implode("", $nibbles);
}
However, it's returning something different.
What am I missing here?
Looking at the docs for perl's pack() it seems like B is the usual "big endian" [I know I'm abusing this term] "descending" bit order, and b is "little endian"/"ascending".
I honestly cannot parse what on earth the code/answer you've linked is supposed to do, so I've written it all from scratch based on what the perl docs say the pack arguments do.
function bin_to_litbin($input, $be=true) {
return implode(
'',
array_map(
function($a)use($be){
$ret = str_pad(decbin(ord($a)), 8, '0', STR_PAD_LEFT);
if(!$be) {
$ret = strrev($ret);
}
return $ret;
},
str_split($input, 1)
)
);
}
function litbin_to_bin($input, $be=true) {
return implode(
'',
array_map(
function($a)use($be){
if(!$be) {
$a=strrev($a);
}
return chr(bindec($a));
},
str_split($input, 8)
)
);
}
$hex = '00289b150065b302a06c560094cd0a80';
$bin = hex2bin($hex);
var_dump(
$hex,
$cur = bin_to_litbin($bin, false),
bin2hex(litbin_to_bin($cur, false))
);
where $be=true is B/"big endian" and $be=false is b/"little endian".
Output:
string(32) "00289b150065b302a06c560094cd0a80"
string(128) "00000000000101001101100110101000000000001010011011001101010000000000010100110110011010100000000000101001101100110101000000000001"
string(32) "00289b150065b302a06c560094cd0a80"
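For the two-byte example from the question, a compact sketch of the same idea could look like this. The function name `unpack_b_star` is made up here; it only handles the "b*" (least significant bit first) case.

```php
<?php
// Hypothetical helper emulating Perl's unpack("b*", $data).
function unpack_b_star(string $data): string
{
    $bits = '';
    foreach (str_split($data) as $byte) {
        // decbin() drops leading zeros, so pad to 8 bits first,
        // then reverse to get least-significant-bit-first order.
        $bits .= strrev(str_pad(decbin(ord($byte)), 8, '0', STR_PAD_LEFT));
    }
    return $bits;
}

echo unpack_b_star("\x28\x9b"), "\n"; // 0001010011011001
```

The padding step is what the base_convert() attempt in the question lacks: without it, 0x28 becomes "101000" instead of "00101000" and the byte boundaries are lost.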
Though truth be told I cannot think of any practical reason to ever encode data as literal zero and one characters. It is wildly unnecessary and wasteful compared to literally any other encoding. I would wager that that is why B and b were never implemented in PHP.
Base64 is 1.33x the length of its input, hex is 2x, and literal binary is 8x.
Old question name: How to effectively split a binary string in a groups of 10, 0, 11?
I have some strings as an input, which are binary representation of a number.
For example:
10011
100111
0111111
11111011101
I need to split these strings (or arrays) into groups of 10, 0, and 11 in order to replace them.
10 => 11
0 => 0
11 => 10
How to do it? I have tried these options but don't work.
preg_match('/([10]{2})(0{1})([11]{2})/', $S, $matches);
It should be [10] [0], [11] for 10011 input.
And it should be 11010 when replaced.
UPD1.
Actually, I'm trying to do a negation algorithm for converting a positive number in a base -2 to a negative one in a base -2.
It could be done with the loop-based algorithm from Wikipedia, but replacing groups of digits is much faster. I have implemented it already and am just trying to optimize it.
For this case 0111111 it's possible to add 0 in the end. Then rules will be applied. And we could remove leading zeros in a result. The output will be 101010.
UPD2.
@Wiktor Stribiżew proposed a way to do the replacement immediately, without splitting the digits into groups first.
But I have a faster solution already.
$S = strtr($S, $rules);
The point of this question isn't to do a replacement but to get an array of the desired groups [11] [0] [10].
UPD3.
This is a solution which I reached with an idea of converting binary groups. It's faster than one with a loop.
function solution2($A)
{
$S = implode('', $A);
//we could add leading 0
if (substr($S, strlen($S) - 1, 1) == 1) {
$S .= '0';
}
$rules = [
'10' => '11',
'0' => '0',
'11' => '10',
];
$S = strtr($S, $rules);
$arr = str_split($S);
//remove leading 0
while ($arr[count($arr) - 1] == 0) {
array_pop($arr);
}
return $arr;
}
But the solution in @Alex Blex's answer is faster.
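A small sanity check of the replacement rules, assuming the digits are written least significant first (digit i has weight (-2)^i). That reading is an assumption, but it is the one under which the 10011 → 11010 example is a negation. The helper name is made up for illustration.

```php
<?php
// Assumption: least significant digit first, so digit i weighs (-2)^i.
function lsb_negabinary_value(string $digits): int
{
    $value = 0;
    $weight = 1;
    foreach (str_split($digits) as $digit) {
        $value += (int) $digit * $weight;
        $weight *= -2;
    }
    return $value;
}

$rules = ['10' => '11', '0' => '0', '11' => '10'];
$input = '10011';
$output = strtr($input, $rules); // strtr() tries the longest key first at each position

echo lsb_negabinary_value($input), "\n";  // 9
echo $output, "\n";                        // 11010
echo lsb_negabinary_value($output), "\n"; // -9
```

The rules work at any position because a group "10" starting at weight w is worth w, while "11" is worth w - 2w = -w, so swapping the two flips the sign of that group.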
You may use a simple /11|10/ regex with a preg_replace_callback:
$s = '10011';
echo preg_replace_callback("/11|10/", function($m) {
return $m[0] == "11" ? "10" : "11"; // if 11 is matched, replace with 10 or vice versa
}, $s);
// => 11010
See the online PHP demo.
Answering the question
algorithm for converting a positive number in a base -2 to a negative one in a base -2
I believe the following function is more efficient than a regex:
function negate($negabin)
{
$mask = 0xAAAAAAAAAAAAAAA;
return decbin((($mask<<1)-($mask^bindec($negabin)))^$mask);
}
The parameter is a positive 60-bit integer in base -2 notation, e.g. 11111011101.
The function converts the parameter to base 10, negates it, and converts it back to base -2 as described on Wikipedia: https://en.wikipedia.org/wiki/Negative_base#To_negabinary
It works on a 64-bit system but can easily be adapted to work on 32-bit.
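To see the three steps separately, here is a sketch that splits the one-liner into two helpers. The helper and constant names are illustrative, not part of the answer, and the code assumes 64-bit PHP integers and most-significant-digit-first strings, as bindec() reads them.

```php
<?php
// Same 60-bit mask as above: binary 1010...1010, one bit per negative-weight digit.
const NEGABINARY_MASK = 0xAAAAAAAAAAAAAAA;

function negabinary_to_int(string $bits): int
{
    // XOR with the mask, then subtract it, to get the ordinary integer value.
    return (bindec($bits) ^ NEGABINARY_MASK) - NEGABINARY_MASK;
}

function int_to_negabinary(int $value): string
{
    // The inverse: add the mask, then XOR with it.
    return decbin(($value + NEGABINARY_MASK) ^ NEGABINARY_MASK);
}

$bits = '10011';                       // 16 - 2 + 1
$value = negabinary_to_int($bits);     // 15
$negated = int_to_negabinary(-$value); // -32 + 16 + 1

echo $value, ' ', $negated, "\n"; // 15 110001
```

Substituting the first helper into the second gives (2 * mask - (mask ^ n)) ^ mask, which is the expression inside negate().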
I am trying to interpret a binary string as an unsigned big endian integer, as per the instructions here: http://mimesniff.spec.whatwg.org/#matches-the-signature-for-mp4 (point 4)
I'm not quite sure what I need to do here, but here are my attempts:
// ONE
$box_size = substr( $sequence, 0, 4 );
$box_size = pack( 'C*', $box_size[0], $box_size[1], $box_size[2], $box_size[3] );
$box_size = unpack( 'N*', $box_size );
// TWO
$box_size = substr( $sequence, 0, 4 );
$box_size = array_map( 'ord', str_split( $box_size ) );
// THREE
$box_size = substr( $sequence, 0, 4 );
$box_size = bindec( $box_size );
// FOUR
$box_size = substr( $sequence, 0, 4);
$box_size = (int) $box_size;
I have had no luck, and honestly am not sure what the result should even be.. Does anyone understand this? I think I might be on the right track with pack and unpack.
I'll go ahead and post the comment as an answer then...
To unpack a "compact" bit representation of something into a native type, just use unpack with the right parameters to denote the type of bits you're unpacking. In your case:
$unpacked = unpack('N', $unsignedBigEndianInteger);
$int = $unpacked[1];
This makes PHP read the byte stream assuming it represents an unsigned big endian long and convert it into a PHP native integer.
Remember:
pack: "large native type" → squeeze into byte representation
unpack: compact byte representation → "large native type"
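A short illustration with made-up input: the 8 bytes below are a hypothetical start of an MP4 file (a 4-byte box size followed by "ftyp"), not data from the question.

```php
<?php
// Hypothetical sample: box size 0x00000018, then the box type "ftyp".
$sequence = "\x00\x00\x00\x18ftyp";

$unpacked = unpack('N', substr($sequence, 0, 4));
$box_size = $unpacked[1];

// The same value computed by hand: the first byte is the most significant.
$by_hand = (ord($sequence[0]) << 24) | (ord($sequence[1]) << 16)
         | (ord($sequence[2]) << 8)  |  ord($sequence[3]);

echo $box_size, ' ', $by_hand, "\n"; // 24 24
```

Attempts THREE and FOUR in the question cannot work because bindec() and the (int) cast parse text made of digit characters, whereas the box size is stored as raw bytes.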
I'm trying to encode a string using the Crockford Base32 Algorithm.
Unfortunately, my current code only accepts numeric values as input. I thought of converting the ASCII characters to Decimal or Octal, but then the concatenation of 010 and 100 results in 10100 which makes it impossible to decode this. Is there some way to do this I am not aware of?
I believe this should be a more efficient implementation of Crockford Base32 encoding:
function crockford_encode( $base10 ) {
return strtr( base_convert( $base10, 10, 32 ),
"abcdefghijklmnopqrstuv",
"ABCDEFGHJKMNPQRSTVWXYZ" );
}
function crockford_decode( $base32 ) {
$base32 = strtr( strtoupper( $base32 ),
"ABCDEFGHJKMNPQRSTVWXYZILO",
"abcdefghijklmnopqrstuv110" );
return base_convert( $base32, 32, 10 );
}
(demo on codepad.org)
Note that, due to known limitations (or, arguably, bugs) in PHP's base_convert() function, these functions will only return correct results for values that can be accurately represented by PHP's internal numeric type (probably double). We can hope that this will be fixed in some future PHP version, but in the meantime, you could always use this drop-in replacement for base_convert().
Edit: The easiest way to compute the optional check digit is probably simply like this:
function crockford_check( $base10 ) {
return substr( "0123456789ABCDEFGHJKMNPQRSTVWXYZ*~$=U", $base10 % 37, 1 );
}
or, for large numbers:
function crockford_check( $base10 ) {
return substr( "0123456789ABCDEFGHJKMNPQRSTVWXYZ*~$=U", bcmod( $base10, 37 ), 1 );
}
We can then use it like this:
function crockford_encode_check( $base10 ) {
return crockford_encode( $base10 ) . crockford_check( $base10 );
}
function crockford_decode_check( $base32 ) {
$base10 = crockford_decode( substr( $base32, 0, -1 ) );
if ( strtoupper( substr( $base32, -1 ) ) != crockford_check( $base10 ) ) {
return null; // wrong checksum
}
return $base10;
}
(demo on codepad.org)
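As a worked example of the steps inside those functions, here is the encoding of an arbitrary value (1234) with the intermediate results spelled out. The snippet repeats the alphabet strings inline so it runs by itself.

```php
<?php
$n = 1234;

// 1234 = 1*1024 + 6*32 + 18, so base_convert() yields the digits 1, 6 and "i" (18).
$base32 = base_convert((string) $n, 10, 32);

// Map PHP's 0-9a-v digits onto Crockford's alphabet, which skips I, L, O and U.
$encoded = strtr($base32, 'abcdefghijklmnopqrstuv', 'ABCDEFGHJKMNPQRSTVWXYZ');

// Check symbol: 1234 % 37 = 13, and position 13 of the 37-symbol alphabet is "D".
$check = substr('0123456789ABCDEFGHJKMNPQRSTVWXYZ*~$=U', $n % 37, 1);

echo $encoded . $check, "\n"; // 16JD

// Decoding is case-insensitive and reads I and L as 1, O as 0.
$decoded = base_convert(
    strtr(strtoupper('l6j'), 'ABCDEFGHJKMNPQRSTVWXYZILO', 'abcdefghijklmnopqrstuv110'),
    32,
    10
);
echo $decoded, "\n"; // 1234
```

The check alphabet is single-quoted here only so that the `$` in it can never be mistaken for the start of a variable.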
Note: (July 18, 2014) The original version of the code above had a bug in the Crockford alphabet strings, such that they read ...WZYZ instead of ...WXYZ, causing some numbers to be encoded and decoded incorrectly. This bug has now been fixed, and the codepad.org versions now include a basic self-test routine to verify this. Thanks to James Firth for spotting the bug and fixing it.
I have sets of 5, 6 and 7 digit numbers. I need them to be displayed in the 000/000/000 format. So for example:
12345 would be displayed as 000/012/345
and
9876543 would be displayed as 009/876/543
I know how to do this in a messy way, involving a series of if/else statements and strlen calls, but there has to be a cleaner way involving regex that I'm not seeing.
sprintf and modulo is one option
function formatMyNumber($num)
{
    // intdiv() (PHP 7+) keeps every operand an integer, so % never receives a float
    return sprintf('%03d/%03d/%03d',
        intdiv($num, 1000000),
        intdiv($num, 1000) % 1000,
        $num % 1000);
}
$padded = str_pad($number, 9, '0', STR_PAD_LEFT);
$split = str_split($padded, 3);
$formatted = implode('/', $split);
You asked for a regex solution, and I love playing with them, so here is a regex solution!
I show it for educational (and fun) purpose only, just use Adam's solution, clean, readable and fast.
function FormatWithSlashes($number)
{
return substr(preg_replace('/(\d{3})?(\d{3})?(\d{3})$/', '$1/$2/$3',
'0000' . $number),
-11, 11);
}
$numbers = Array(12345, 345678, 9876543);
foreach ($numbers as $val)
{
$r = FormatWithSlashes($val);
echo "<p>$r</p>";
}
OK, people are throwing stuff out, so I will too!
number_format would be great, because it accepts a thousands separator, but it doesn't do padding zeroes like sprintf and the like. So here's what I came up with for a one-liner:
function fmt($x) {
return substr(number_format($x+1000000000, 0, ".", "/"), 2);
}
Minor improvement to PhiLho's suggestion:
You can avoid the substr by changing the regex to:
function FormatWithSlashes($number)
{
return preg_replace('/^0*(\d{3})(\d{3})(\d{3})$/', '$1/$2/$3',
'0000' . $number);
}
I also removed the ? after each of the first two capture groups because, when given a 5, 6, or 7 digit number (as specified in the question), this will always have at least 9 digits to work with. If you want to guard against the possibility of receiving a smaller input number, run the regex against '000000000' . $number instead.
Alternately, you could use
substr('0000' . $number, -9, 9);
and then splice the slashes in at the appropriate places with substr_replace, which I suspect may be the fastest way to do this (no need to run regexes or do division), but that's really just getting into pointless optimization, as any of the solutions presented will still be much faster than establishing a network connection to the server.
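A sketch of that substr_replace variant, assuming the 5 to 7 digit inputs from the question. The function name is made up for illustration.

```php
<?php
// Hypothetical helper: pad to 9 digits, then splice in the two slashes.
function format_with_substr_replace($number): string
{
    $padded = substr('0000' . $number, -9, 9);

    // Insert the right-hand slash first so the left-hand offset is unaffected.
    $padded = substr_replace($padded, '/', 6, 0);
    return substr_replace($padded, '/', 3, 0);
}

echo format_with_substr_replace(12345), "\n";   // 000/012/345
echo format_with_substr_replace(9876543), "\n"; // 009/876/543
```

A length argument of 0 makes substr_replace() insert rather than overwrite.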
This would be how I would write it if using Perl 5.10 .
use 5.010;
sub myformat(_;$){
# prepend with zeros
my $_ = 0 x ( 9-length($_[0]) ) . $_[0];
my $join = $_[1] // '/'; # using the 'defined or' operator `//`
# m// in a list context returns ($1,$2,$3,...)
join $join, m/ ^ (\d{3}) (\d{3}) (\d{3}) $ /x;
}
Tested with:
$_ = 11111;
say myformat;
say myformat(2222);
say myformat(33333,';');
say $_;
returns:
000/011/111
000/002/222
000;033;333
11111
Back-ported to Perl 5.8 :
sub myformat(;$$){
local $_ = @_ ? $_[0] : $_;
# prepend with zeros
$_ = 0 x ( 9-length($_) ) . $_;
my $join = defined($_[1]) ? $_[1] :'/';
# m// in a list context returns ($1,$2,$3,...)
join $join, m/ ^ (\d{3}) (\d{3}) (\d{3}) $ /x;
}
Here's how I'd do it in python (sorry I don't know PHP as well). I'm sure you can convert it.
def convert(num): #num is an integer
a = str(num)
s = "0"*(9-len(a)) + a
return "%s/%s/%s" % (s[:3], s[3:6], s[6:9])
This just pads the number to have length 9, then splits the substrings.
That being said, it seems the modulo answer is a bit better.