I'm trying to learn binary and create a simple WebM parser in PHP based on Matroska.
I read TimecodeScale, MuxingApp, WritingApp, etc. with unpack(format, data). My problem is that when I reach Duration (0x4489) in Segment Information (0x1549a966), I must read a float and convert it to seconds based on TimecodeScale (261.564s -> 00:04:21.564), and I don't know how.
This is a sample sequence:
`2A D7 B1 83 0F 42 40 4D 80 86 67 6F 6F 67 6C 65 57 41 86 67 6F 6F 67 6C 65 44 89 88 41 0F ED E0 00 00 00 00 16 54 AE 6B`
TimecodeScale := 2ad7b1 uint [ def:1000000; ]
MuxingApp := 4d80 string; ("google")
WritingApp := 5741 string; ("google")
Duration := 4489 float [ range:>0.0; ]
Tracks := 1654ae6b container [ card:*; ]{...}
I must read a float after (0x4489) and return 261.564s.
The duration is a double-precision floating point value (64 bits) in IEEE 754 format, stored big-endian. In the sample, the size byte 0x88 after the ID 0x4489 says the payload is 8 bytes long. If you want to see how the conversion is done check this.
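To make the IEEE 754 layout concrete, here is a minimal sketch that decodes the 8 sample bytes by hand into sign, exponent and fraction. It assumes a 64-bit PHP build (so the integer arithmetic does not overflow) and handles normal numbers only; the variable names are illustrative.

```php
<?php
// Manual IEEE 754 double decode of the Duration payload (big-endian hex).
// Assumption: 64-bit PHP; zero, subnormals, Inf and NaN are not handled.
$hex = '410fede000000000';
$hi = hexdec(substr($hex, 0, 8)); // upper 32 bits
$lo = hexdec(substr($hex, 8, 8)); // lower 32 bits

$sign     = ($hi >> 31) & 0x1;    // 1 sign bit
$exponent = ($hi >> 20) & 0x7FF;  // 11-bit biased exponent
// 52-bit fraction (20 bits from $hi, 32 bits from $lo) scaled into [0, 1)
$fraction = (($hi & 0xFFFFF) * 4294967296 + $lo) / (2 ** 52);

$value = (-1) ** $sign * (1 + $fraction) * 2 ** ($exponent - 1023);

printf("sign=%d exponent=%d value=%.1f\n", $sign, $exponent, $value);
// sign=0 exponent=1040 value=261564.0
```

The result, 261564, is the duration in TimecodeScale units (milliseconds with the default scale).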
The TimecodeScale is the timestamp scale in nanoseconds.
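In PHP you can do:
$bin = hex2bin('410fede000000000');
$timecode_scale = 1e6;
// The file stores the double big-endian, but unpack('d') uses machine byte order,
// so reverse the bytes when running on a little-endian machine.
if (unpack('S', "\xff\x00")[1] === 0xff) {
    $bytes = unpack('C8', $bin);
    $bytes = array_reverse($bytes);
    $bin = implode('', array_map('chr', $bytes));
}
$duration = unpack('d', $bin)[1]; // duration in TimecodeScale units
$duration_s = $duration * $timecode_scale / 1e9; // TimecodeScale is in nanoseconds
echo "duration={$duration_s}s\n";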
Result:
duration=261.564s
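On PHP 7.0.15 / 7.1.1 or later, unpack() also accepts the 'E' format code (double, big-endian byte order), which should make the manual byte swap unnecessary. The sketch below relies on that and also produces the 00:04:21.564 form asked for in the question; formatDuration() is a hypothetical helper, not a library function.

```php
<?php
// Read the big-endian double directly and format it as HH:MM:SS.mmm.
$bin = hex2bin('410fede000000000');
$timecodeScale = 1000000; // nanoseconds per tick (TimecodeScale default)

$ticks = unpack('E', $bin)[1];                         // 'E' = big-endian double
$totalMs = (int) round($ticks * $timecodeScale / 1e6); // nanoseconds -> milliseconds

// Hypothetical helper: split milliseconds into hours, minutes, seconds.
function formatDuration(int $ms): string
{
    $h = intdiv($ms, 3600000);
    $m = intdiv($ms % 3600000, 60000);
    $s = intdiv($ms % 60000, 1000);
    return sprintf('%02d:%02d:%02d.%03d', $h, $m, $s, $ms % 1000);
}

echo formatDuration($totalMs), "\n"; // 00:04:21.564
```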
PHP has a method hash_hmac that computes the HMAC signature of a given string using a given key and algorithm. But HMAC technically operates on binary data, and PHP takes all its params here as strings. How does it convert those strings to binary data?
Short answer: String encoding is just metadata attached to a lump of binary data. PHP strings are just the lump, you have to keep track of the rest.
Long answer:
PHP takes the Honey Badger approach to native string encodings, in other words, "PHP don't care". You give it a sequence of bytes, it stores them. It has no concept of encoding until you want to use a function that cares about it. Even then you need to explicitly declare the input and output encodings, otherwise PHP will go with its configured default which is usually not what anyone actually wants.
function nice_hex($in) {
    return implode(' ', str_split(bin2hex($in), 2));
}
$utf8 = "You owe me €5.";
$utf16le = mb_convert_encoding($utf8, 'utf-16le', 'utf-8');
$utf16be = mb_convert_encoding($utf8, 'utf-16be', 'utf-8');
$iso88591 = mb_convert_encoding($utf8, 'iso-8859-1', 'utf-8');
$cp1252 = mb_convert_encoding($utf8, 'cp1252', 'utf-8');
var_dump(
$utf8,
nice_hex($utf8),
hash_hmac('md5', $utf8, 'foo'),
$utf16le,
nice_hex($utf16le),
hash_hmac('md5', $utf16le, 'foo'),
$utf16be,
nice_hex($utf16be),
hash_hmac('md5', $utf16be, 'foo'),
$iso88591,
nice_hex($iso88591),
hash_hmac('md5', $iso88591, 'foo'),
$cp1252,
nice_hex($cp1252),
hash_hmac('md5', $cp1252, 'foo')
);
Output:
string(16) "You owe me €5."
string(47) "59 6f 75 20 6f 77 65 20 6d 65 20 e2 82 ac 35 2e"
string(32) "7724135d91c43906f8730a26dcd76ffb"
string(28) "You owe me � 5."
string(83) "59 00 6f 00 75 00 20 00 6f 00 77 00 65 00 20 00 6d 00 65 00 20 00 ac 20 35 00 2e 00"
string(32) "f4a2347b4a1336dae1db21554c54b9e2"
string(28) "You owe me �5."
string(83) "00 59 00 6f 00 75 00 20 00 6f 00 77 00 65 00 20 00 6d 00 65 00 20 20 ac 00 35 00 2e"
string(32) "b0c1a98d8b853e6568bae513d764a029"
string(14) "You owe me ?5."
string(41) "59 6f 75 20 6f 77 65 20 6d 65 20 3f 35 2e"
string(32) "301a0fb55e23285904413323d10cc774"
string(14) "You owe me �5."
string(41) "59 6f 75 20 6f 77 65 20 6d 65 20 80 35 2e"
string(32) "fa1ee73d39e1a70fe2cde7a8c5bbf0ba"
And the reason why that all looks like it does is because:
StackOverflow uses UTF-8.
My editor uses UTF-8.
My console uses UTF-8.
The fact that PHP doesn't care about string encoding lets me produce arbitrarily-encoded trash output like the above quite easily.
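A small sketch of what "PHP don't care" means in practice: strlen() counts bytes, while the mbstring functions only know about characters because you tell them the encoding. It assumes the mbstring extension is loaded; the euro sign is written as a \u{20AC} escape so the example does not depend on how the source file is saved.

```php
<?php
// Same text, two encodings: PHP only sees the bytes.
$utf8 = "You owe me \u{20AC}5.";                          // \u{...} escapes emit UTF-8
$cp1252 = mb_convert_encoding($utf8, 'CP1252', 'UTF-8');

var_dump(strlen($utf8));                        // int(16)  bytes
var_dump(mb_strlen($utf8, 'UTF-8'));            // int(14)  characters
var_dump(strlen($cp1252));                      // int(14)  one byte per character
var_dump(mb_check_encoding($cp1252, 'UTF-8'));  // bool(false): a lone 0x80 is not valid UTF-8
```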
Additional recommended reading: UTF-8 all the way through
Fun Fact: One of the reasons why PHP 6 never ended up happening was that it was meant to bring native Unicode strings (using UTF-16 internally), and that effort proved too costly in complexity and performance. Eventually the whole thing was scrapped and string encoding was left up to us, the same as it was in PHP 5.
For string literals, it's simply whatever encoding the source file is saved in (commonly UTF-8).
You can put whatever encoding you want in a string. hash_hmac() doesn't use any specific encoding; it operates on the bytes of the string exactly as they are.
Here's an example from Wikipedia using UTF-8 encoding and running a HMAC algorithm over the binary:
HMAC_MD5("key", "The quick brown fox jumps over the lazy dog") = 80070713463e7749b90c2dc24911e275
And here's the result of the equivalent PHP code, which gets the same response:
php > echo hash_hmac('md5', "The quick brown fox jumps over the lazy dog", "key");
80070713463e7749b90c2dc24911e275
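To show that nothing encoding-related happens inside, here is a minimal sketch that rebuilds HMAC-MD5 from its definition using only byte-level string operations and compares it with hash_hmac(). hmacMd5Manual() is a hypothetical name used for illustration.

```php
<?php
// HMAC(K, m) = H((K' xor opad) . H((K' xor ipad) . m)), with MD5's 64-byte block size.
function hmacMd5Manual(string $key, string $message): string
{
    $blockSize = 64;
    if (strlen($key) > $blockSize) {
        $key = md5($key, true);                    // long keys are hashed first
    }
    $key = str_pad($key, $blockSize, "\0");        // then zero-padded to the block size
    $ipad = $key ^ str_repeat("\x36", $blockSize); // PHP's ^ on two strings XORs byte by byte
    $opad = $key ^ str_repeat("\x5c", $blockSize);
    return md5($opad . md5($ipad . $message, true));
}

$message = "The quick brown fox jumps over the lazy dog";
echo hmacMd5Manual("key", $message), "\n";     // 80070713463e7749b90c2dc24911e275
var_dump(hmacMd5Manual("key", $message) === hash_hmac('md5', $message, "key")); // bool(true)
```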
Some time ago I was base64-encoding random bytes from an OpenSSL SHA-256 in C (as uint8_t), feeding them into a shell script and using the output.
What I can recreate from my data now is:
Content of file.txt:
uvjWEHTUk1LnzVZul9ynRpezWfKYN3bvlx103wxACxo
test@test:~# base64 -d file.txt | od -t x1
0000000 ba f8 d6 10 74 d4 93 52 e7 cd 56 6e 97 dc a7 46
0000020 97 b3 59 f2 98 37 76 ef 97 1d 74 df 0c 40 0b 1a
The output is the same as calling in PHP:
echo bin2hex(base64_decode("uvjWEHTUk1LnzVZul9ynRpezWfKYN3bvlx103wxACxo="));
baf8d61074d49352e7cd566e97dca74697b359f2983776ef971d74df0c400b1a
What I did all the time in shell and need to do now in PHP is the following:
Again, same content of file.txt:
uvjWEHTUk1LnzVZul9ynRpezWfKYN3bvlx103wxACxo
test@test:~# base64 -d file.txt | od -t x8
0000000 5293d47410d6f8ba 46a7dc976e56cde7
0000020 ef763798f259b397 1a0b400cdf741d97
My problem here: what is the equivalent procedure in PHP (to od -t x8 in shell)?
I tried pack / unpack / bin2hex / ... and can't get the same result.
I'm trying to get a string with this content:
"5293d47410d6f8ba46a7dc976e56cde7ef763798f259b3971a0b400cdf741d97"
from a starting point of base64_decode("uvjWEHTUk1LnzVZul9ynRpezWfKYN3bvlx103wxACxo="). Any ideas?
If x8 is what you really need, which is 8 bytes, then the implementation would be as simple as
<?php
$str = 'uvjWEHTUk1LnzVZul9ynRpezWfKYN3bvlx103wxACxo';
$bin = base64_decode($str);
if (strlen($bin) % 8 !== 0) {
    throw new \RuntimeException('data length should be divisible by 8');
}
$result = '';
for ($i = 0; $i < strlen($bin); $i += 8) {
    for ($j = $i + 7; $j >= $i; --$j) {
        $result .= bin2hex($bin[$j]);
    }
}
echo $result;
It iterates over blocks of 8 bytes and dumps each block in reverse byte order, which matches what od -t x8 prints on a little-endian machine.
Ideone: https://ideone.com/hBanqi
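The same block-wise reversal can also be sketched with built-in string functions instead of nested loops. This is just an alternative spelling of the idea above, assuming PHP 7.4+ for the arrow function.

```php
<?php
// Split into 8-byte words, reverse each word's bytes, hex-encode, and join.
$bin = base64_decode('uvjWEHTUk1LnzVZul9ynRpezWfKYN3bvlx103wxACxo=');
if (strlen($bin) % 8 !== 0) {
    throw new RuntimeException('data length should be divisible by 8');
}

$result = implode('', array_map(
    fn (string $word): string => bin2hex(strrev($word)),
    str_split($bin, 8)
));

echo $result, "\n";
// 5293d47410d6f8ba46a7dc976e56cde7ef763798f259b3971a0b400cdf741d97
```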
I am trying to convert little endian hex to big endian hex.
Example:
Little endian:
E1 31 01 00 00 9D
Big endian:
9D 00 00 01 31 E1
If the numbers are in the format described, then you can convert them using standard array functions.
function littleToBigEndian($little) {
    return implode(' ', array_reverse(explode(' ', $little)));
}
echo littleToBigEndian('E1 31 3C 01 00 00 9B');
// Output: 9B 00 00 01 3C 31 E1
If there are no spaces for separation of numbers you need to str_split() the string instead.
function littleToBigEndian($little) {
    return implode('', array_reverse(str_split($little, 2)));
}
echo littleToBigEndian('E1313C0100009B');
// Output: 9B0000013C31E1
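If you would rather work on the bytes than on the text, a sketch along these lines handles both the spaced and the unspaced form and rejects malformed input. swapEndianHex() is a hypothetical helper name; it always returns the unspaced, upper-case form.

```php
<?php
// Reverse the byte order of a hex string; spaces between bytes are tolerated.
function swapEndianHex(string $hex): string
{
    $clean = str_replace(' ', '', $hex);
    if (strlen($clean) % 2 !== 0 || !ctype_xdigit($clean)) {
        throw new InvalidArgumentException('expected an even number of hex digits');
    }
    // hex -> raw bytes -> reversed bytes -> hex
    return strtoupper(bin2hex(strrev(hex2bin($clean))));
}

echo swapEndianHex('E1 31 3C 01 00 00 9B'), "\n"; // 9B0000013C31E1
echo swapEndianHex('E1313C0100009B'), "\n";       // 9B0000013C31E1
```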
<?php
$file = 'file.dat';
$file_contents = file_get_contents($file);
for ($i = 0x000481; $i <= 0x00048B; $i++) {
print $i;
}
?>
I am creating an online file analyzer but I have a small problem. It outputs (which is the actual position the hex is in)
1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163
when it should be
44 72 48 79 64 72 61 6C 69 73 6B
which is hex for DrHydralisk (me). Can anyone help me output the latter, or just have it output ASCII straight away (but hex is fine, I can just convert it)?
edit
Here is an image of what I am trying to do that I think will help.
http://imgur.com/nwenA.png
Here is the file I am trying to read; it's a Starcraft replay (file.SC2Replay). Just search for DrHydralisk in a hex editor and that is where I am trying to read from.
http://www.mediafire.com/?6w8wi35q3o6ix8q
It should be (if clear text is in the file):
for( $i=0x481; $i<0x48D; $i++ ) {
    printf("%X ", ord($file_contents[$i]));
}
Note the loop boundaries: 0x481 .. 0x48D
Result:
44 72 20 48 79 64 72 61 6C 69 73 6B
If the file instead stores the values as hexadecimal text, this won't work, because each character then takes two bytes (two hex digits) in the file. So what is really in the file?
Edit
After reading your file, I did:
...
$file = 'file.SC2Replay';
$file_contents = file_get_contents($file);
for( $i=0x438; $i<0x443; $i++) {
    printf("%X ", ord($file_contents[$i]));
}
for( $i=0x438; $i<0x443; $i++) {
    printf("%s ", $file_contents[$i]);
}
...
And it says:
44 72 48 79 64 72 61 6C 69 73 6B
and
D r H y d r a l i s k
You messed up the file position ;-)
Regards
rbo
EDIT:
Thanks for providing the file, it helped a lot! I believe I got it working too:
//Do binary safe file read
$filename = 'file.SC2Replay';
$file = fopen($filename, "rb");
$contents = fread($file, filesize($filename));
fclose($file);
// positions 1080 - 1090 (0x438 up to, but not including, 0x443)
for ($i = 0x438; $i < 0x443; $i++) {
    echo $contents[$i];
}
The reason you were probably having problems is the offset: the name starts at byte 1080 (0x438), not 1153 (0x481). Opening the file in binary mode ("rb") returns the bytes unchanged, so $contents[$i] is the raw byte at position $i, and echoing it prints the corresponding ASCII character.
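To avoid guessing the offset by hand, one option is to search for the name in the raw bytes and then dump that slice. The sketch below uses an in-memory stand-in for the replay (0x438 filler bytes followed by the name) rather than the real file, so the data layout is an assumption for illustration only.

```php
<?php
// Stand-in for file_get_contents('file.SC2Replay'); the real file is not reproduced here.
$contents = str_repeat("\0", 0x438) . 'DrHydralisk' . "\0\0";

$needle = 'DrHydralisk';
$offset = strpos($contents, $needle); // byte offset of the first match, or false
if ($offset === false) {
    throw new RuntimeException('name not found');
}

$slice = substr($contents, $offset, strlen($needle));
$hex = strtoupper(implode(' ', str_split(bin2hex($slice), 2)));

printf("offset=0x%X (%d)\n", $offset, $offset); // offset=0x438 (1080)
echo $hex, "\n";                                 // 44 72 48 79 64 72 61 6C 69 73 6B
echo $slice, "\n";                               // DrHydralisk
```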
Could you explain how you are using the file you read in? Because the hex equivalent of:
11531154115511561157115811591160116111621163
is:
481 482 483 484 485 486 487 488 489 48a 48b
Also, there are two php functions you may find helpful
chr(int): returns the ascii character associated with the integer provided - http://php.net/manual/en/function.chr.php
dechex(int): returns the hex value of the integer provided - http://php.net/manual/en/function.dechex.php
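A quick sketch of both functions applied to the values from the question: dechex() turns the printed positions back into hex, and chr() turns the expected byte values into the name.

```php
<?php
// The positions the question's loop printed, converted to hex.
$positions = range(1153, 1163);
$asHex = implode(' ', array_map('dechex', $positions));
echo $asHex, "\n"; // 481 482 483 484 485 486 487 488 489 48a 48b

// The byte values the question expected, converted to characters.
$bytes = [0x44, 0x72, 0x48, 0x79, 0x64, 0x72, 0x61, 0x6C, 0x69, 0x73, 0x6B];
$name = implode('', array_map('chr', $bytes));
echo $name, "\n"; // DrHydralisk
```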
How can I encode strings in UTF-16BE format in PHP? For "Demo Message!!!" the encoded string should be '00440065006D006F0020004D00650073007300610067006'. Also, I need to encode Arabic characters to this format.
First of all, this is absolutely not UTF-8, which is just a character encoding (i.e. a way to store strings as bytes).
What you have here looks like a dump of the bytes that make up each character.
If so, you could get those bytes this way:
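// utf8_encode() is deprecated as of PHP 8.2; this is the equivalent conversion
$str = mb_convert_encoding("Demo Message!!!", 'UTF-8', 'ISO-8859-1');
for ($i = 0; $i < strlen($str); $i++) {
    $byte = $str[$i];
    $char = ord($byte);
    printf('%02x ', $char);
}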
And you'd get the following output :
44 65 6d 6f 20 4d 65 73 73 61 67 65 21 21 21
But, once again, this is not UTF-8: in UTF-8, as you can see in the example I've given, D is stored in only one byte: 0x44
In what you posted, it's stored using two Bytes : 0x00 0x44.
Maybe you're using some kind of UTF-16 ?
EDIT after a bit more testing and @aSeptik's comment: this is indeed UTF-16.
To get the kind of dump you're getting, you'll have to make sure your string is encoded in UTF-16, which could be done this way, using, for example, the mb_convert_encoding function :
$str = mb_convert_encoding("Demo Message!!!", 'UTF-16', 'UTF-8');
Then, it's just a matter of iterating over the bytes that make this string, and dumping their values, like I did before :
for ($i = 0; $i < strlen($str); $i++) {
    $byte = $str[$i];
    $char = ord($byte);
    printf('%02x ', $char);
}
And you'll get the following output :
00 44 00 65 00 6d 00 6f 00 20 00 4d 00 65 00 73 00 73 00 61 00 67 00 65 00 21 00 21 00 21
Which kind of looks like what you posted :-)
(you just have to remove the space in the call to printf; I left it there to get easier-to-read output)
E.g. by using the mbstring extension and its mb_convert_encoding() function.
$in = 'Demo Message!!!';
$out = mb_convert_encoding($in, 'UTF-16BE');
for($i=0; $i<strlen($out); $i++) {
    printf("%02X ", ord($out[$i]));
}
prints
00 44 00 65 00 6D 00 6F 00 20 00 4D 00 65 00 73 00 73 00 61 00 67 00 65 00 21 00 21 00 21
Or by using iconv()
$in = 'Demo Message!!!';
$out = iconv('iso-8859-1', 'UTF-16BE', $in);
for($i=0; $i<strlen($out); $i++) {
    printf("%02X ", ord($out[$i]));
}
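Since the question also asks about Arabic text, here is a compact sketch that produces the unspaced upper-case hex form in one step. It assumes the input is UTF-8 and that mbstring is available; toUtf16BeHex() is a hypothetical helper name, and the Arabic sample is written with \u{...} escapes so it does not depend on the source file's encoding.

```php
<?php
// Convert UTF-8 text to UTF-16BE and return the bytes as upper-case hex.
function toUtf16BeHex(string $utf8): string
{
    return strtoupper(bin2hex(mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8')));
}

echo toUtf16BeHex('Demo Message!!!'), "\n";
// 00440065006D006F0020004D006500730073006100670065002100210021

// Arabic letters seen, lam, alef, meem (U+0633 U+0644 U+0627 U+0645)
echo toUtf16BeHex("\u{0633}\u{0644}\u{0627}\u{0645}"), "\n";
// 0633064406270645
```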