Strange behavior reading DBF file in PHP using fread()

Strange behavior reading DBF file in PHP using fread() - php

I have a DBF file created as part of a shapefile with rgdal library's writeOGR function (in R).
When I ask to see its first bytes with Linux od command, I get the following.
od -x -c -N 32 BRA.dbf
0000000 7703 1e07 001b 0000 00a1 00d1 0000 0000
0000020 0000 0000 0000 0000 0000 0000 5700 0000
0000040
My PHP code goes like this.
$dbf = fopen('BRA.dbf','rb');
fread($dbf,10); // jumps over the first 10 bytes
$dbfRecSize = unpack('v',fread($dbf,2))[1]; // 'v' = little endian 16 bits: 00d1 = d1(16) = 209
fread($dbf,17); // jumps over a few more bytes
$dbfLangID = ord(fread($dbf,1)); // language driver ID
if ($dbfLangID == 0x57) {
echo "Language: 0x57 (ISO-8859-1)\n";
} else {
echo "Language: $dbfLangID;\n";
}
The code above outputs "Language: 0x57 (ISO-8859-1)", which means the "57" close to the end of the od output is being read with the ord(fread($dbf,1)); command.
Strange thing is that I've read 10+2+17 = 29 bytes from the file, so the next byte should be "00", or not (right after the 0x57)? $dbfRecSize is 209, which means my logic is correct in the first two reads. Why isn't it in the following reads?
What am I misunderstanding here?

The error is that I was confusing od command with debug from DOS...
od -x prints bytes with the order reversed every two bytes (too confusing to me).
0000000 7703 1e07 001b 0000 00a1 00d1 0000 0000
0000020 0000 0000 0000 0000 0000 0000 5700 0000
od -t x1 prints each byte once and separated (harder to count/read in the middle of the line).
0000000 03 77 07 1e 1b 00 00 00 a1 00 d1 00 00 00 00 00
0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 57 00 00
Wonder if is there an option to print bytes two by two (in hexadecimal), without reversing their orders?

Related

What string encoding is used by PHP's hash_hmac?

PHP has a method hash_hmac that computes the HMAC signature of a given string using a given key and algorithm. But HMAC technically operates on binary data, and PHP takes all its params here as strings. How does it convert those strings to binary data?

Short answer: String encoding is just metadata attached to a lump of binary data. PHP strings are just the lump, you have to keep track of the rest.
Long answer:
PHP takes the Honey Badger approach to native string encodings, in other words, "PHP don't care". You give it a sequence of bytes, it stores them. It has no concept of encoding until you want to use a function that cares about it. Even then you need to explicitly declare the input and output encodings, otherwise PHP will go with its configured default which is usually not what anyone actually wants.
function nice_hex($in) {
return implode(' ', str_split(bin2hex($in), 2));
}
$utf8 = "You owe me €5.";
$utf16le = mb_convert_encoding($utf8, 'utf-16le', 'utf-8');
$utf16be = mb_convert_encoding($utf8, 'utf-16be', 'utf-8');
$iso88591 = mb_convert_encoding($utf8, 'iso-8859-1', 'utf-8');
$cp1252 = mb_convert_encoding($utf8, 'cp1252', 'utf-8');
var_dump(
$utf8,
nice_hex($utf8),
hash_hmac('md5', $utf8, 'foo'),
$utf16le,
nice_hex($utf16le),
hash_hmac('md5', $utf16le, 'foo'),
$utf16be,
nice_hex($utf16be),
hash_hmac('md5', $utf16be, 'foo'),
$iso88591,
nice_hex($iso88591),
hash_hmac('md5', $iso88591, 'foo'),
$cp1252,
nice_hex($cp1252),
hash_hmac('md5', $cp1252, 'foo')
);
Output:
string(16) "You owe me €5."
string(47) "59 6f 75 20 6f 77 65 20 6d 65 20 e2 82 ac 35 2e"
string(32) "7724135d91c43906f8730a26dcd76ffb"
string(28) "You owe me � 5."
string(83) "59 00 6f 00 75 00 20 00 6f 00 77 00 65 00 20 00 6d 00 65 00 20 00 ac 20 35 00 2e 00"
string(32) "f4a2347b4a1336dae1db21554c54b9e2"
string(28) "You owe me �5."
string(83) "00 59 00 6f 00 75 00 20 00 6f 00 77 00 65 00 20 00 6d 00 65 00 20 20 ac 00 35 00 2e"
string(32) "b0c1a98d8b853e6568bae513d764a029"
string(14) "You owe me ?5."
string(41) "59 6f 75 20 6f 77 65 20 6d 65 20 3f 35 2e"
string(32) "301a0fb55e23285904413323d10cc774"
string(14) "You owe me �5."
string(41) "59 6f 75 20 6f 77 65 20 6d 65 20 80 35 2e"
string(32) "fa1ee73d39e1a70fe2cde7a8c5bbf0ba"
And the reason why that all looks like it does is because:
StackOverflow uses UTF-8.
My editor uses UTF-8.
My console uses UTF-8.
The fact that PHP doesn't care about string encoding lets me produce arbitrarily-encoded trash output like the above quite easily.
Additional recommended reading: UTF-8 all the way through
Fun Fact: One of the reasons why PHP6 never ended up happening was because they wanted to include native multibyte string encoding but no one could agree on what flavor it should be. Eventually they just scrapped the whole thing and left it up to us the same as it was in PHP5.

It's just UTF-8 (for string literals).
You can put whatever encoding you want in a string, hash_hmac() doesn't use any specific encoding, just whatever encoding your string has.
Here's an example from Wikipedia using UTF-8 encoding and running a HMAC algorithm over the binary:
HMAC_MD5("key", "The quick brown fox jumps over the lazy dog") = 80070713463e7749b90c2dc24911e275
And here's the result of the equivalent PHP code, which gets the same response:
php > echo hash_hmac('md5', "The quick brown fox jumps over the lazy dog", "key");
80070713463e7749b90c2dc24911e275

Extract images from access with php

I'm trying to extract images from an Access database with php.
I can read the data, it gives me an hexadecimal string with a "\0" after every 254 chars so after a
$pic = str_replace("\0", '', $pic);
$pic = hex2bin($pic);
I get this:
00000000: 151c 1c00 0300 0000 0700 0100 1400 1b00 ................
00000010: ffff ffff 496d 6167 656d 0000 0105 0000 ....Imagem......
00000020: 0300 0000 0400 0000 4449 4200 5a17 0000 ........DIB.Z...
00000030: f5e8 ffff ac13 0300 2800 0000 e200 0000 ........(.......
00000040: df00 0000 0100 2000 0300 0000 7813 0300 ...... .....x...
00000050: c40e 0000 c40e 0000 0000 0000 0000 0000 ................
00000060: 0000 ff00 00ff 0000 ff00 0000 c2d6 dbff ................
00000070: c9e0 e2ff c7db e0ff c3da dcff cbe2 e4ff ................
00000080: c3dd ddff bbd5 d5ff bedb d8ff c2e2 ddff ................
00000090: c0e2 dbff bee0 d9ff b7dc d2ff b1d6 ccff ................
000000a0: a9ce c4ff a1c6 bcff 9dc3 b7ff 88af a0ff ................
000000b0: 86ad 9eff 7fa8 99ff 7aa3 94ff 739c 8dff ........z...s...
000000c0: 6e97 88ff 6792 83ff 6590 81ff 5986 76ff n...g...e...Y.v.
000000d0: 5986 76ff 5986 76ff 5885 75ff 5685 75ff Y.v.Y.v.X.u.V.u.
I think this is a bitmap with an OLE header, but I couldn't find what to do next. How can I save these, ideally as jpeg?
Edit: when I save the data to a file, no program I tried can view/identify the image. Also, none of the imagecreatefrom*() worked. I think I need to handle the header somehow.

How to read a float from binary WebM file?

I'm trying to learn binary and create a simple WebM parser in PHP based on Matroska.
I read TimecodeScale, MuxingAppm WritingApp, etc. with unpack(format, data). My problem is when I reach Duration (0x4489) in Segment Information (0x1549a966) I must read a float and based on TimecodeScale convert it to seconds: 261.564s->00:04:21.564 and I don't know how.
This is a sample sequence:
`2A D7 B1 83 0F 42 40 4D 80 86 67 6F 6F 67 6C 65 57 41 86 67 6F 6F 67 6C 65 44 89 88 41 0F ED E0 00 00 00 00 16 54 AE 6B`
TimecodeScale := 2ad7b1 uint [ def:1000000; ]
MuxingApp := 4d80 string; ("google")
WritingApp := 5741 string; ("google")
Duration := 4489 float [ range:>0.0; ]
Tracks := 1654ae6b container [ card:*; ]{...}
I must read a float after (0x4489) and return 261.564s.

The duration is a double precision floating point value (64-bits) represented in the IEEE 754 format. If you want to see how the conversion is done check this.
The TimecodeScale is the timestamp scale in nanoseconds.
In php you can do:
$bin = hex2bin('410fede000000000');
$timecode_scale = 1e6;
// endianness
if (unpack('S', "\xff\x00")[1] === 0xff) {
$bytes = unpack('C8', $bin);
$bytes = array_reverse($bytes);
$bin = implode('', array_map('chr', $bytes));
}
$duration = unpack('d', $bin)[1];
$duration_s = $duration * $timecode_scale / 1e9;
echo "duration=${duration_s}s\n";
Result:
duration=261.564s

PHP - convert little endian hex to big endian hex

I am trying to convert little endian hex to big endian hex.
Example:
Little endian:
E1 31 01 00 00 9D
Big endian:
9D 00 00 01 31 E1

If numbers are in the format described than you can convert by using standard array functions.
function littleToBigEndian($little) {
return implode(' ',array_reverse(explode(' ', $little)));
}
echo littleToBigEndian('E1 31 3C 01 00 00 9B');
// Output: 9B 00 00 01 3C 31 E1
If there are no spaces for separation of numbers you need to str_split() the string instead.
function littleToBigEndian($little) {
return implode('',array_reverse(str_split($little,2)));
}
echo littleToBigEndian('E1313C0100009B');
// Output: 9B0000013C31E1

PHP utf encoding problem

How can I encode strings on UTF-16BE format in PHP? For "Demo Message!!!" the encoded string should be '00440065006D006F0020004D00650073007300610067006'. Also, I need to encode Arabic characters to this format.

First of all, this is absolutly not UTF-8, which is just a charset (i.e. a way to store strings in memory / display them).
WHat you have here looks like a dump of the bytes that are used to build each characters.
If so, you could get those bytes this way :
$str = utf8_encode("Demo Message!!!");
for ($i=0 ; $i<strlen($str) ; $i++) {
$byte = $str[$i];
$char = ord($byte);
printf('%02x ', $char);
}
And you'd get the following output :
44 65 6d 6f 20 4d 65 73 73 61 67 65 21 21 21
But, once again, this is not UTF-8 : in UTF-8, like you can see in the example I've give, D is stored on only one byte : 0x44
In what you posted, it's stored using two Bytes : 0x00 0x44.
Maybe you're using some kind of UTF-16 ?
EDIT after a bit more testing and #aSeptik's comment : this is indeed UTF-16.
To get the kind of dump you're getting, you'll have to make sure your string is encoded in UTF-16, which could be done this way, using, for example, the mb_convert_encoding function :
$str = mb_convert_encoding("Demo Message!!!", 'UTF-16', 'UTF-8');
Then, it's just a matter of iterating over the bytes that make this string, and dumping their values, like I did before :
for ($i=0 ; $i<strlen($str) ; $i++) {
$byte = $str[$i];
$char = ord($byte);
printf('%02x ', $char);
}
And you'll get the following output :
00 44 00 65 00 6d 00 6f 00 20 00 4d 00 65 00 73 00 73 00 61 00 67 00 65 00 21 00 21 00 21
Which kind of looks like what youy posted :-)
(you just have to remove the space in the call to printf -- I let it there to get an easier to read output=)

E.g. by using the mbstring extension and its mb_convert_encoding() function.
$in = 'Demo Message!!!';
$out = mb_convert_encoding($in, 'UTF-16BE');
for($i=0; $i<strlen($out); $i++) {
printf("%02X ", ord($out[$i]));
}
prints
00 44 00 65 00 6D 00 6F 00 20 00 4D 00 65 00 73 00 73 00 61 00 67 00 65 00 21 00 21 00 21
Or by using iconv()
$in = 'Demo Message!!!';
$out = iconv('iso-8859-1', 'UTF-16BE', $in);
for($i=0; $i<strlen($out); $i++) {
printf("%02X ", ord($out[$i]));
}

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Strange behavior reading DBF file in PHP using fread() - php

Related

What string encoding is used by PHP's hash_hmac?

Extract images from access with php

How to read a float from binary WebM file?

PHP - convert little endian hex to big endian hex

PHP utf encoding problem

Categories

Resources