using binary operator in Python... translating if (c1 >= "\xc0" & c1 <= "\xdf") - php

I am converting an external class from PHP to Python. It does some tricks like:
if ($c1 >= "\xc0" & $c1 <= "\xdf")
[...]
$cc1 = (chr(ord($c1) / 64) | "\xc0");
[...]
$cc2 = ($c1 & "\x3f") | "\x80";
where $c1, $cc1, $cc2 are characters
and I just realized that I cannot use it as such in Python, because characters are strings and are not also treated as the "binary representation of a character", where the operators & and | make sense...
Please, how would you translate any of these in a Pythonic way?
>>> c1 = "a"
>>> (c1 & "\x3f") | "\x80"
Traceback (most recent call last):
File "<pyshell#202>", line 1, in <module>
(c1 & "\x3f") | "\x80"
TypeError: unsupported operand type(s) for &: 'str' and 'str'
EDIT: actually, it seems that this PHP class does not work, so it will not fit my needs either. Many thanks for your help.

Use the ord function to get the value and then use actual numbers to do the masking.
>>> c1 = "a"
>>> (ord(c1) & 0x3f) | 0x80
161
>>> hex((ord(c1) & 0x3f) | 0x80)
'0xa1'

That PHP code is a primitive UTF-8 encoding function. In Python, use the built-in encoder instead:
c1.encode('utf-8')
Note that unless you use unicode strings natively (and why aren't you?) you'll need to decode from 'latin-1' first.
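As a minimal sketch, assuming Python 3 (the question does not state a version), the two-byte formula from the PHP class can be checked against the built-in encoder. The helper name latin1_byte_to_utf8 is made up for illustration and is not part of the original class.

```python
def latin1_byte_to_utf8(b: int) -> bytes:
    """Encode one Latin-1 byte value (0..255) as UTF-8 by hand."""
    if b < 0x80:
        # ASCII range: a single byte, unchanged
        return bytes([b])
    # Two-byte form: 110xxxxx 10xxxxxx
    lead = 0xC0 | (b >> 6)           # top bits of the code point
    continuation = 0x80 | (b & 0x3F)  # low six bits of the code point
    return bytes([lead, continuation])


# The hand-rolled version agrees with decode/encode for every Latin-1 byte
for value in range(256):
    expected = bytes([value]).decode("latin-1").encode("utf-8")
    assert latin1_byte_to_utf8(value) == expected
```

In real code the decode/encode pair is the better choice; the manual version is only there to show what the masks do.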

Python 2.7.3 (default, Sep 26 2012, 21:51:14)
>>> c1 = 'd'
>>> # if ($c1 >= "\xc0" & $c1 <= "\xdf")
...
>>> ord(c1) >= 0xc0 and ord(c1) <= 0xdf
False
>>> # $cc1 = (chr(ord($c1) / 64) | "\xc0");
...
>>> chr(ord(c1) / 64 | 0xc0)
'\xc1'
>>> # $cc2 = ($c1 & "\x3f") | "\x80";
...
>>> ord(c1) & 0x3f | 0x80
164
>>>
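The session above is Python 2, where / on two integers truncates. A rough Python 3 equivalent, assuming the same c1, needs // for the integer division (the variable names in_range and code are mine, not from the PHP class):

```python
c1 = "d"
code = ord(c1)  # work on the integer code point, not the string

# if ($c1 >= "\xc0" & $c1 <= "\xdf")
in_range = 0xC0 <= code <= 0xDF

# $cc1 = (chr(ord($c1) / 64) | "\xc0");
cc1 = chr((code // 64) | 0xC0)  # // keeps the result an int in Python 3

# $cc2 = ($c1 & "\x3f") | "\x80";
cc2 = chr((code & 0x3F) | 0x80)

print(in_range, hex(ord(cc1)), hex(ord(cc2)))
```

Note that chr() in Python 3 returns a one-character str, not a byte; use bytes([...]) if raw bytes are what you need.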

Related

How this works: chr(($number >>6 )+192).chr(($number & 63)+128);

Can you please explain how this line of code is equivalent to the next code:
<?php
$string = chr( ( $number >> 6 ) + 192 ).chr( ( $number & 63 ) + 128 );
?>
It's equivalent to:
if ( $number >=128 && $number <=2047 ){
$byte1 = 192 + (int)($number / 64); //= 192 + ( $number >> 6 )
$byte2 = 128 + ($number % 64); //= 128 + ( $number & 63 )
$utf = chr($byte1).chr($byte2);
}
For example, entering the number 1989 produces ߅ with both.
This code is used for converting Unicode entities back to the original UTF-8 characters.
The code on top uses binary operators.
>> is the right shift operator. It shifts the bits in the number to the right (towards the less significant bits).
So 11110000 >> 2 = 00111100
It's equivalent to integer division by a power of 2:
for non-negative $number, $number >> $n is the same as (int)($number / pow(2,$n)).
The & is the "bitwise and" operator. It compares respective bits on both numbers, and sets in result those, that are 1 in both numbers.
11110000 & 01010101 = 01010000
By and'ing $number with 63 (00111111) you get the remainder of dividing $number by 64 (aka the modulus), which is written $number % 64.
$number >> 6 is a binary shift-right operation, ie: 11000000 >> 6 == 00000011 equivalent to $number / pow(2,6) aka $number / 64
$number & 63 is a binary AND with 00111111
Both are typically faster as binary operations, since they deal with powers of two.
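The equivalences described above can be checked with a short Python sketch (Python is my choice here, the thread itself is about PHP), using the 1989 example from the question:

```python
number = 1989

# Shifting right by 6 is integer division by 64;
# masking with 63 is the remainder modulo 64.
assert (number >> 6) == (number // 64) == 31
assert (number & 63) == (number % 64) == 5

byte1 = 192 + (number >> 6)   # 110xxxxx lead byte
byte2 = 128 + (number & 63)   # 10xxxxxx continuation byte
encoded = bytes([byte1, byte2])

# Same bytes as the built-in UTF-8 encoder produces for U+07C5
assert encoded == chr(number).encode("utf-8")
print(encoded.hex())
```

This only holds for code points from 128 to 2047, the range that needs exactly two bytes.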
Adding to @Mchl's answer: the reason for adding 192 in a UTF-8 sequence is to mark the first byte of a 2-byte sequence. The high bits of the first byte signal how many bytes the sequence has:
192 - 11000000 - Start of 2 Byte sequence ( 128 + 64)
224 - 11100000 - Start of 3 Byte sequence ( 128 + 64 + 32)
240 - 11110000 - Start of 4 Byte sequence ( 128 + 64 + 32 + 16)
248 - 11111000 - Start of 5 Byte sequence (Restricted) (... + 8)
252 - 11111100 - Start of 6 Byte sequence (Restricted) (... + 4)
254 - 11111110 - Invalid
Table Reference : https://en.wikipedia.org/w/index.php?title=UTF-8&oldid=388157043
UTF-8 byte range table
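A rough Python sketch of how a decoder might read the table above: the sequence length is found from the range the lead byte falls in. The function name utf8_sequence_length is hypothetical, and the sketch follows the table only up to 4 bytes, treating the restricted 5 and 6 byte forms as errors. It does not reject the overlong lead bytes 0xC0 and 0xC1, which a strict decoder would.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the sequence length announced by a UTF-8 lead byte."""
    if lead < 0x80:
        return 1                      # 0xxxxxxx: plain ASCII
    if 0xC0 <= lead <= 0xDF:
        return 2                      # 110xxxxx
    if 0xE0 <= lead <= 0xEF:
        return 3                      # 1110xxxx
    if 0xF0 <= lead <= 0xF7:
        return 4                      # 11110xxx
    # 10xxxxxx is a continuation byte; 0xF8 and above are restricted/invalid
    raise ValueError(f"not a valid lead byte: {lead:#04x}")


for text in ("a", "\u07c5", "\u20ac", "\U0001F600"):
    data = text.encode("utf-8")
    assert utf8_sequence_length(data[0]) == len(data)
```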

Weird behaviour in PHP and Apache2: different output in different servers

I'm experiencing different outputs in PHP code running in Mac and Linux.
I have 2 servers running the following code:
$ltt = ((ord($str[7]) << 24) | (ord($str[8]) << 16) | (ord($str[9]) << 8) | (ord($str[10]))) / 1000000;
Even the ord(str[ ]) outputs are the same:
[7] = 254
[8] = 26
[9] = 22
[10] = 216
But, on the MAMP stack (Mac) running php 5.3.6, if $ltt is originally supposed to be a negative number, it returns 4263.12265 (incorrect).
On the LAMP stack (Ubuntu) running same php version, it will return the exact negative value -31.84465.
This happens only with negative numbers..
Update Addl. Info:
A var dump gives þØçï_Kstring(25) "þØçï_K"
bin2hex gives 000e1b00000000fe1a16d806e707ef0000045f0000004b0000
Simplifying the function to include only numeric inputs, the output still differs:
$ltt = (254 << 24 | 26 << 16 | 22 << 8 | 216)/ 1000000;
4263.12265 on MAMP and -31.84465 on LAMP
This is a 32 vs 64 bit problem.
Because your most significant byte is > 127, on a 32 bit platform this is interpreted as a negative value because of integer overflow - the most significant bit is set. On a 64-bit platform it is not.
The solution is to use pack() and unpack() so you can specify that the integer should be signed. (EDIT: fixed this code sample; see EDIT 2.)
$packed = pack('C*', ord($str[7]), ord($str[8]), ord($str[9]), ord($str[10]));
$unpacked = unpack('l', $packed);
$lat = current($unpacked);
...however you should also be conscious that this will not work on a little-endian architecture, because the byte ordering will be wrong. You can simply reverse the order of the packed bytes to work around this, but I am just trying to wrap my head around a works-everywhere solution.
EDIT 2
OK, it took me a while to wrap my head around this but we got there in the end:
What you need to do is, if the most significant bit is set, OR the result with a number where the least significant 32 bits are not set but the rest are. So the following works on both 32 and 64 bit:
<?php
// The input bytes as ints
$bytes = array(254, 26, 22, 216);
// The operand to OR with at the end
$op = $bytes[0] & 0x80 ? (~0 << 16) << 16 : 0;
// Do the bitwise thang
$lat = (($bytes[0] << 24) | ($bytes[1] << 16) | ($bytes[2] << 8) | $bytes[3]) | $op;
// Convert to float for latitude
$lat /= 1000000;
echo $lat;
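For comparison, here is a sketch of the same signed 32-bit interpretation in Python, where integers have no fixed width and the sign has to be requested explicitly. The byte values are the ones from the question; to_signed_32 is a hypothetical helper that mirrors the manual sign extension, not code from the answer.

```python
import struct

raw = bytes([254, 26, 22, 216])  # the four bytes from the question, big-endian

unsigned = int.from_bytes(raw, byteorder="big", signed=False)
signed = int.from_bytes(raw, byteorder="big", signed=True)

# struct gives the same signed result with an explicit big-endian format
assert struct.unpack(">i", raw)[0] == signed


def to_signed_32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a two's complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


assert to_signed_32(unsigned) == signed

print(unsigned / 1000000)  # what the 64-bit platform computed
print(signed / 1000000)    # the intended negative latitude
```

Stating the byte order and signedness explicitly sidesteps both the 32 vs 64 bit difference and the endianness concern mentioned above.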

Swapping bytes of text in PHP

I basically need to port this piece of code to php
for (i = 0; i < 128/4; i++)
data32[i] = bswap_32(data32[i]);
But, there is no bswap function in php.
Would someone be kind enough to provide me with something that could solve the problem?
This should do it (untested):
function bswap_32($j)
{
return (($j & 255) << 24) | (($j & 0xff00) << 8) |
(($j & 0xff0000) >> 8) | (($j & 0xff000000) >> 24);
}
Or, if there is a sign extension problem, this should resolve it:
function bswap_32($j)
{
return (($j & 255) << 24) | (($j & 0xff00) << 8) |
(($j & 0xff0000) >> 8) | (255 & (($j & 0xff000000) >> 24));
}
It sounds like bswap_32 is swapping endianness of your 32-bit quantities.
I could just give you some code, but I'd prefer not to do people's work for them, so I'll explain the principle instead:
You can achieve that with bit-shifts and masks (so for instance, you need to mask out the 8 lowest bits, and shift them into the highest 8 bit positions of the result).
Shifting can be done with the << and >> operators. Masking can be done with the & operator. See the PHP manual page on operators for more details.
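The mask-and-shift principle can be sketched in Python as follows (the thread is about PHP; Python is used here only to illustrate, and this bswap_32 is my own sketch rather than a port of any library function). Python integers are unbounded, so the sign extension issue does not arise once the input is masked to 32 bits.

```python
import struct


def bswap_32(value: int) -> int:
    """Reverse the byte order of a 32-bit unsigned quantity."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    return (
        ((value & 0x000000FF) << 24)    # lowest byte moves to the top
        | ((value & 0x0000FF00) << 8)
        | ((value & 0x00FF0000) >> 8)
        | ((value & 0xFF000000) >> 24)  # highest byte moves to the bottom
    )


# Cross-check against packing big-endian and unpacking little-endian
for sample in (0x12345678, 0xFF000000, 0x00000001):
    via_struct = struct.unpack("<I", struct.pack(">I", sample))[0]
    assert bswap_32(sample) == via_struct
```

Swapping twice returns the original value, which is a cheap sanity check for any implementation.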

Easily calculating and listing binary combinations

I have 5 bits and so 32 different combinations (of them).
Starting from
00000
and ending with
11111
Is there some way of quickly listing all possibilities? I could do it by hand, but I worry that I might miss one. I'm guessing some clever chap has written some algorithm and/or made a website, that can do this very easily. At least I hope so.
Thanks a lot.
This will put them all on the command line on Linux.
echo {0..1}{0..1}{0..1}{0..1}{0..1}
In Ruby (this example uses 4 bits; for 5 bits use 0b11111 and rjust(5,"0")):
0b0000.upto(0b1111) {|n| puts n.to_s(2).rjust(4,"0")}
0000
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111
Write a column with integer from 0 to 31, then write a second column with the binary equivalent of each integer side-by-side.
That way you will increase your chance not to miss a combination.
Just count from 0 to 31 and output each number in its binary form.
Something like this should do:
public static String zeroPad(String str) {
return "00000".substring(str.length()) + str;
}
public static void main(String[] args) {
for (int i = 0; i < 32; i++)
System.out.printf("%s%n", zeroPad(Integer.toBinaryString(i)));
}
Output:
00000
00001
00010
00011
...
11110
11111
for (int i = 0; i < 32; i++)
cout << ((i & 16) >> 4) << ((i & 8) >> 3) << ((i & 4) >> 2)
<< ((i & 2) >> 1) << (i & 1) << endl;
Unix:
echo {0..1}{0..1}{0..1}{0..1}{0..1} | xargs -n 1
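The counting approach from the answers above can also be sketched in Python. Both variants below produce the 32 five-bit strings in order; the variable names are only illustrative.

```python
from itertools import product

# Count from 0 to 31 and format each number as 5 binary digits
by_counting = [format(i, "05b") for i in range(32)]

# Or build every combination of five 0/1 digits directly
by_product = ["".join(bits) for bits in product("01", repeat=5)]

assert by_counting == by_product

for combination in by_counting:
    print(combination)
```

Changing 5 and 32 to n and 2**n generalizes this to any number of bits.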

PHP Bytes 2 DWord

I have an array:
$arr[0] = 95
$arr[1] = 8
$arr[2] = 0
$arr[3] = 0
That are bytes. I need a DWORD.
I tried:
$dword = $arr[0]+$arr[1]*265+$arr[2]*265*265+$arr[3]*265*265*265;
Is that right or am I doing it wrong?
Try:
$dword = (($arr[3] & 0xFF) << 24) | (($arr[2] & 0xFF) << 16) | (($arr[1] & 0xFF) << 8) | ($arr[0] & 0xFF);
It can also be done your way with some corrections:
$dword = $arr[0] + $arr[1]*0x100 + $arr[2]*0x10000 + $arr[3]*0x1000000;
Or using pack/unpack:
$dword = array_shift(unpack("L", pack("CCCC", $arr[0], $arr[1], $arr[2], $arr[3])));
Or try:
<?php
$arr = array(95,8,0,0);
$bindata = join('', array_map('chr', $arr));
var_dump(unpack('L', $bindata));
Both (Emil H's and my code) give you 2143 as the result.
Or at the very least use 256 rather than 265.
Your approach is correct, but you should multiply by 256, not 265 (in 8 bits, there are 2^8 = 256 unique values). It works because multiplying by 256 is the same as shifting the bits 8 places to the left.
Perhaps you should consider using the bitwise operators instead, to better convey the intent. See http://theopensourcery.com/phplogic.htm
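As a cross-check of the answers above, here is a Python sketch (not PHP, and the variable names are mine) that computes the DWORD four ways from the array in the question, treating $arr[0] as the least significant byte:

```python
import struct

arr = [95, 8, 0, 0]  # bytes from the question, least significant first

by_shift = arr[0] | (arr[1] << 8) | (arr[2] << 16) | (arr[3] << 24)
by_multiply = arr[0] + arr[1] * 256 + arr[2] * 256**2 + arr[3] * 256**3
by_from_bytes = int.from_bytes(bytes(arr), byteorder="little")
by_struct = struct.unpack("<I", bytes(arr))[0]  # "<I": little-endian unsigned 32-bit

assert by_shift == by_multiply == by_from_bytes == by_struct
print(by_shift)
```

The "<" in the struct format pins the byte order, whereas PHP's "L" format uses the machine's byte order.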
