Read C++ QString binary data with PHP

I have to read a binary file written by a C++ app using Qt framework. Data is structured from a C struct as described below. Chars are written from a QString pointer in the C++ app.
struct panelSystem {
    char ipAddress[16];
    char netMask[16];
    char gateway[16];
    char paddingBytes[128];
};
I tried using the following PHP code to read the multibyte char values:
// Where: $length is a defined number (16 or 128 in this case)
//        $data is a binary string read from the binary file
$var = substr($data, $currentOffset, $length);
$currentOffset += $length; // Increment offset by X bytes
$var = trim(str_replace("\0", "\n", $var));
$var = unpack("C*", $var);
$char = '';
foreach ($var as $letter) {
    $char .= chr($letter);
}
$var = $char;
Unfortunately the result includes null (\0) and/or irrelevant characters before and after the desired char.
Is there a way to interpret or convert those char from QString multibyte array to PHP standard string (without modifying the original input) ?
Thank you.

A QString will be written to the binary with (at least) a length value and the character data. You will also need to take into account how the string is encoded. It may be UTF-16, in which case plain ASCII characters are each padded with a zero byte. Where the zeros go depends on the endianness of the file; IIRC, QDataStream writes big-endian by default regardless of the platform, so that files are portable across platforms.
Have a look in the Qt source code to see how strings are written. Maybe also take a look at the binary file in a hex editor.
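If the file turns out to contain a QString serialized through QDataStream with its default settings, the layout is a 32-bit big-endian byte count followed by UTF-16 data. The sketch below decodes one such string under that assumption; readQString is a hypothetical helper name, it needs the mbstring extension and a 64-bit PHP build, and the hex dump remains the authority on what the file really contains.

```php
<?php
// Hypothetical helper: decode one QString written by QDataStream's operator<<,
// ASSUMING the default big-endian byte order and a 64-bit PHP build.
function readQString(string $data, int &$offset): ?string
{
    // 32-bit big-endian length prefix, counted in bytes (not characters).
    $byteLength = unpack('N', $data, $offset)[1];
    $offset += 4;

    // 0xFFFFFFFF marks a null QString; no character data follows.
    if ($byteLength === 0xFFFFFFFF) {
        return null;
    }

    $utf16 = substr($data, $offset, $byteLength);
    $offset += $byteLength;

    // Convert the UTF-16BE payload into a regular UTF-8 PHP string.
    return mb_convert_encoding($utf16, 'UTF-8', 'UTF-16BE');
}
```

Passing the offset by reference lets several consecutive strings be read with repeated calls.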
If your binary file only ever contains the struct, and you don't have a problem with the binary format being fixed, then you may find things far easier to write the file using raw IO:
struct panelSystem {
    char ipAddress[16];
    char netMask[16];
    char gateway[16];
    char paddingBytes[128];

    void write(QDataStream& ds) {
        ds.writeRawData(ipAddress, sizeof(ipAddress));
        ds.writeRawData(netMask, sizeof(netMask));
        ds.writeRawData(gateway, sizeof(gateway));
        ds.writeRawData(paddingBytes, sizeof(paddingBytes));
    }
};
//...
panelSystem p;
GetPanelSystemData(&p);
QFile file("c:\\work\\test\\testbin.bin");
if (file.open(QIODevice::Truncate | QIODevice::WriteOnly)) {
    QDataStream ds(&file);
    p.write(ds);
}
file.close();
I would recommend at least adding a version number/header to the start of the binary file to prevent painting yourself into a corner.
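On the PHP side, a file written with raw IO like this could be read with a single unpack() call. This is a minimal sketch, assuming the file holds exactly the 176-byte struct with no header and that each field is a NUL-terminated single-byte string; readPanelSystem is a hypothetical name.

```php
<?php
// Hypothetical reader for the fixed layout written by writeRawData():
// 16 + 16 + 16 + 128 = 176 bytes, with no header (an assumption).
function readPanelSystem(string $data): array
{
    if (strlen($data) < 176) {
        throw new InvalidArgumentException('Expected at least 176 bytes');
    }

    // Z16 reads 16 bytes and keeps only what precedes the first NUL byte;
    // a128 reads the padding verbatim, NUL bytes included.
    return unpack('Z16ipAddress/Z16netMask/Z16gateway/a128paddingBytes', $data);
}
```

Because the Z code cuts each field at its first NUL byte, any leftover bytes after the terminator are discarded without touching the original input.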

Related

Check if string is in the BMP range

So I was searching for a proper way in PHP to detect if a string is in the BMP range (Basic Multilingual Plane) but I found nothing. Even mb_check_encoding and mb_detect_encoding do not offer any help in this particular case.
So I wrote my own code
<?php
function is_bmp($string) {
    $str_ar = mb_str_split($string);
    foreach ($str_ar as $char) {
        /* Check if there's any character's code point outside the BMP range */
        if (mb_ord($char) > 0xFFFF)
            return false;
    }
    return true;
}
/* String containing non-BMP Unicode characters */
$string = '😈blah blah';
var_dump(is_bmp($string));
?>
Outputs:
bool(false)
Now my question is:
Is there a better approach? and are there any flaws in it?
If you have a correctly UTF-8 encoded input string, you can just check its bytes to figure out whether it has symbols outside the BMP.
Literally, you need to detect whether the input string contains any symbol whose code point is greater than 0xFFFF (i.e. longer than 16 bits).
Note on how UTF-8 encoding works:
Code points 0 through 0x7F are encoded as is, in one byte.
All other code points have a first byte in the range 0xC0...0xFF, which also encodes how many additional bytes follow, and additional bytes in the range 0x80...0xBF.
To encode code points 0x10000 and greater, UTF-8 requires a sequence of 4 bytes, and the first byte of that sequence will be 0xF0 or greater. In all other cases the whole string will contain only bytes less than 0xF0.
In short, your task is just to find out: does the binary representation of the string contain any byte in the range 0xF0...0xFF?
function is_bmp($string) {
    // BMP-only when no byte in the range 0xF0...0xFF is present
    return preg_match('#[\xF0-\xFF]#', $string) === 0;
}
OR
even simpler (but probably slower), you can use the ability of PCRE to work with UTF-8 sequences (see option PCRE_UTF8):
function is_bmp($string) {
    // BMP-only when no code point above U+FFFF is present
    return preg_match('#[^\x00-\x{FFFF}]#u', $string) === 0;
}
var_dump(
!preg_match('/[^\x0-\x{ffff}]/u', '😈blah blah')
);
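To see the byte ranges from the UTF-8 note in practice, the following sketch prints the code point and encoded bytes of a few sample characters. The samples are arbitrary choices for illustration, and mb_ord needs the mbstring extension.

```php
<?php
// Sample characters written as escapes so the file encoding does not matter:
// A (1 byte), e-acute (2 bytes), euro sign (3 bytes), U+1F608 (4 bytes).
$samples = ["A", "\u{E9}", "\u{20AC}", "\u{1F608}"];

foreach ($samples as $char) {
    // Print the code point next to its UTF-8 bytes in hex.
    printf("U+%04X => %s\n", mb_ord($char, 'UTF-8'), bin2hex($char));
}
// U+0041 => 41
// U+00E9 => c3a9
// U+20AC => e282ac
// U+1F608 => f09f9888
```

Only the last sample, the one outside the BMP, starts with a byte of 0xF0 or above.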

Javascript hexadecimal to binary using UTF8

I have data stored in an SQLite database as BINARY(16), the value of which is determined by PHP's hex2bin function on a 32-character hexadecimal string.
As an example, the string 434e405b823445c09cb6c359fb1b7918 returns CN#[4EÀ¶ÃYûy.
The data stored in this database needs to be manipulated by JavaScript, and to do so I've used the following function (adapted from Andris's answer here):
// Convert hexadecimal to binary string
String.prototype.hex2bin = function ()
{
    // Define the variables
    var i = 0, l = this.length - 1, bytes = []
    // Iterate over the nibbles and convert to binary string
    for (i; i < l; i += 2)
    {
        bytes.push(parseInt(this.substr(i, 2), 16))
    }
    // Return the binary string
    return String.fromCharCode.apply(String, bytes)
}
This works as expected, returning CN#[4EÀ¶ÃYûy from 434e405b823445c09cb6c359fb1b7918.
The problem I have, however, is that when dealing directly with the data returned by PHP's hex2bin function I am given the string CN#[�4E����Y�y rather than CN#[4EÀ¶ÃYûy. This is making it impossible for me to work between the two (for context, JavaScript is being used to power an offline iPad app that works with data retrieved from a PHP web app) as I need to be able to use JavaScript to generate a 32-character hexadecimal string, convert it to a binary string, and have it work with PHP's hex2bin function (and SQLite's HEX function).
The issue, I believe, is that JavaScript uses UTF-16 whereas the binary string is stored as utf8_unicode_ci. My initial thought, then, was that I need to convert the string to UTF-8. Using a Google search led me to here and searching StackOverflow led me to bobince's answer here, both of which recommend using unescape(encodeURIComponent(str)). However, this does not return what I need (CN#[�4E����Y�y):
// CN#[Â4EöÃYûy
unescape(encodeURIComponent('434e405b823445c09cb6c359fb1b7918'.hex2bin()))
My question, then, is:
How can I use JavaScript to convert a hexadecimal string into a UTF-8 binary string?
Given a hex-encoded UTF-8 string, hex,
hex.replace(/../g, '%$&')
will produce a URI-encoded UTF-8 string.
decodeURIComponent converts URI-encoded UTF-8 sequences into JavaScript UTF-16 encoded strings, so
decodeURIComponent(hex.replace(/../g, '%$&'))
should decode a properly hex-encoded UTF-8 string.
You can see that it works by applying it to the example from the hex2bin documentation.
alert(decodeURIComponent('6578616d706c65206865782064617461'.replace(/../g, '%$&')));
// alerts "example hex data"
The string you gave is not UTF-8 encoded though. Specifically,
434e405b823445c09cb6c359fb1b7918
        ^
The byte 82 is a continuation byte (10xxxxxx), so it must follow a lead byte with at least the first two bits set (or another continuation byte), and 5b is not such a byte.
RFC 2279 explains:
The table below summarizes the format of these different octet types.
The letter x indicates bits available for encoding bits of the UCS-4
character value.

UCS-4 range (hex.)     UTF-8 octet sequence (binary)
0000 0000-0000 007F    0xxxxxxx
0000 0080-0000 07FF    110xxxxx 10xxxxxx
0000 0800-0000 FFFF    1110xxxx 10xxxxxx 10xxxxxx
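The same conclusion can be cross-checked on the PHP side. This is a small sketch, assuming the mbstring extension is available, that validates the bytes produced by hex2bin for both hex strings mentioned above.

```php
<?php
// Bytes from the question's hex string and from the hex2bin manual example.
$fromQuestion = hex2bin('434e405b823445c09cb6c359fb1b7918');
$fromManual   = hex2bin('6578616d706c65206865782064617461');

// The first contains 0x82 right after 0x5b, which is not valid UTF-8.
var_dump(mb_check_encoding($fromQuestion, 'UTF-8')); // bool(false)

// The second is plain ASCII ("example hex data"), which is valid UTF-8.
var_dump(mb_check_encoding($fromManual, 'UTF-8'));   // bool(true)
```

So the stored value is arbitrary binary data, and it should not be expected to survive a round trip through any UTF-8 text decoder.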
Your application doesn't have to handle binary at any point. Insertion is the latest possible point, so that is where you finally convert to binary. Selection is the earliest possible point, so that is where you convert back to hex, and you use hex strings throughout the application.
When inserting, you can replace UNHEX with a blob literal:
INSERT INTO table (id)
VALUES (X'434e405b823445c09cb6c359fb1b7918')
When selecting, you can use HEX:
SELECT HEX(id) FROM table
Expanding on Mike's answer, here's some code for encoding and decoding.
Note that the escape/unescape() functions are deprecated. If you need polyfills for them, you can check out the more comprehensive UTF-8 encoding example found here: http://jsfiddle.net/47zwb41o
// UTF-8 to hex
var utf8ToHex = function( s ){
    s = unescape( encodeURIComponent( s ) );
    var chr, i = 0, l = s.length, out = '';
    for( ; i < l; i++ ){
        chr = s.charCodeAt( i ).toString( 16 );
        out += ( chr.length % 2 == 0 ) ? chr : '0' + chr;
    }
    return out;
};
// Hex to UTF-8
var hexToUtf8 = function( s ){
    return decodeURIComponent( s.replace( /../g, '%$&' ) );
};

How to write NON ASCII data in one file with php

I must write a binary file with php, but I think that I don't use the correct method.
I use this function:
$ptr = fopen("file.txt", 'wb');
fwrite($ptr, $Str);
fclose($ptr);
I write this string (chaotic representation of 0 and 1):
$Str="00000001001110000000000100000100000000000000001000000010000000010000001100000101000000000000010000000001000000000000010100000";
I expected that, when opening the file with OpenOffice, I would not see the text of zeros and ones but rather a chaotic sequence of characters.
Why do I see the zeros and ones in OpenOffice? How can I write the raw data with PHP?
If you write text to a file, even when it is opened in binary mode, you will get text in the file. Your 0 is not stored as a zero bit, but as the ASCII representation of the 0 character.
Use numeric notation for the values in PHP instead, for instance:
$var = 0xFF; // hexadecimal literal, equal to 1111 1111 in binary
More in the manual
You can write any character with the chr() function.
Alternatively, you can do something like "\x0A" (that's a newline character).
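Putting these answers together, a string of zeros and ones can be converted to raw bytes before writing it. The following is a minimal sketch; bitsToBytes is a hypothetical helper, and it assumes the bit string's length is a multiple of 8.

```php
<?php
// Hypothetical helper: turn a string of '0'/'1' characters into raw bytes.
// ASSUMES the input length is a multiple of 8; anything else is rejected.
function bitsToBytes(string $bits): string
{
    $length = strlen($bits);
    if ($length === 0 || $length % 8 !== 0 || strspn($bits, '01') !== $length) {
        throw new InvalidArgumentException('Expected a multiple of 8 binary digits');
    }

    $bytes = '';
    foreach (str_split($bits, 8) as $octet) {
        // bindec() parses the 8 digits as a number, chr() turns it into one byte.
        $bytes .= chr(bindec($octet));
    }
    return $bytes;
}

$raw = bitsToBytes('0000000100111000'); // two bytes: 0x01 0x38
echo bin2hex($raw), "\n";               // 0138
```

The result can then be passed to fwrite() on a handle opened with 'wb', as in the question.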

PHP encoding to only have letters and numbers

Is there a encoding function in PHP which will encode strings and the resulting output will only contain letters and numbers? I would use base64 but that still has some stuff which is not numeric/alphanumeric
You could use base32 (code easy to google), which is sort of a standard alternative to base64. Or resort to bin2hex() and pack("H*",$hex) to reverse. Hex encoding however leads to size doubling.
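The hex round trip mentioned above can be sketched as follows. The sample string is an arbitrary choice that includes a NUL byte and a non-ASCII byte to show that the encoding is binary-safe.

```php
<?php
// Arbitrary sample input, including a NUL byte and a non-ASCII byte.
$original = "Hello, world! \x00\xFF";

$encoded = bin2hex($original);    // only the characters 0-9 and a-f
$decoded = pack('H*', $encoded);  // reverses bin2hex()

var_dump(preg_match('/^[0-9a-f]+$/', $encoded) === 1);  // bool(true)
var_dump(strlen($encoded) === 2 * strlen($original));   // bool(true)
var_dump($decoded === $original);                       // bool(true)
```

The second check shows the size doubling: every input byte becomes two hex digits.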
Short answer is no, base64 uses a reduced set of output chars compared with uuencode and was intended to solve most character conversion issues - but still isn't url-safe (IIRC).
But the mechanism is trivial and easily adapted - I'd suggest having a look at base32 encoding - same as base64 but using one fewer bit per output char (5 instead of 6), and hence a 32 char alphabet is all that's required, but using something different for the padding char ('=' is not url safe).
A quick google found this
Any of the hash functions (md5, sha1, etc.) output will only consist of hexadecimal digits but that's not exactly 'encoding'.
You could write your own base-62 encoder/decoder using a-z/A-Z/0-9. With a fixed width per character you'd need 2 digits for every byte though (62 values are not enough, 62 * 62 = 3844 are), so it is no more compact than hex.
I wrote this to use letters, numbers and dashes.
I'm sure you can improve it to take out the dashes:
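function pj_code($str) {
    $enc = ''; // initialise to avoid an undefined variable warning
    $len = strlen($str);
    // Walk the string from the end; each byte becomes a base-36 number
    while ($len--) {
        $enc .= base_convert(ord(substr($str, $len, 1)), 10, 36) . '-';
    }
    return $enc;
}
function pj_decode($str) {
    $dec = '';
    // Drop the trailing dash, otherwise an empty chunk decodes to a NUL byte
    $ords = explode('-', rtrim($str, '-'));
    $c = count($ords);
    while ($c--) {
        $dec .= chr(base_convert($ords[$c], 36, 10));
    }
    return $dec;
}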
You can use the basic md5 hash function, which outputs only alphanumeric characters, but note that a hash is one-way: the original string cannot be recovered from it.

In PHP what does it mean by a function being binary-safe?

In PHP what does it mean by a function being binary-safe ?
What makes them special and where are they typically used ?
It means the function will work correctly when you pass it arbitrary binary data (i.e. strings containing non-ASCII bytes and/or null bytes).
For example, a non-binary-safe function might be based on a C function which expects null-terminated strings, so if the string contains a null character, the function would ignore anything after it.
This is relevant because PHP does not cleanly separate string and binary data.
The other users already mentioned what binary safe means in general.
In PHP, the meaning is more specific, referring only to what Michael gives as an example.
All strings in PHP have an associated length, which is the number of bytes that compose them. When a function manipulates a string, it can either:
1. Rely on that length metadata.
2. Rely on the string being null-terminated, i.e., that after the data that is actually part of the string, a byte with value 0 will appear.
It's also true that all string PHP variables manipulated by the engine are also null-terminated. The problem with functions that rely on 2., is that, if the string itself contains a byte with value 0, the function that's manipulating it will think the string has ended at that point and will ignore everything after that.
For instance, if PHP's strlen function worked like C standard library strlen, the result here would be wrong:
$str = "abc\x00abc";
echo strlen($str); //gives 7, not 3!
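To make the contrast concrete, the sketch below emulates how a C-style, null-terminated length would be computed and compares it with PHP's strlen. cStyleLength is a hypothetical function written for illustration only; it is not part of PHP.

```php
<?php
// Hypothetical emulation of C's strlen(): stop counting at the first NUL byte.
function cStyleLength(string $s): int
{
    $nul = strpos($s, "\x00");
    return $nul === false ? strlen($s) : $nul;
}

$str = "abc\x00abc";
var_dump(strlen($str));       // int(7): PHP relies on the stored length
var_dump(cStyleLength($str)); // int(3): everything after the NUL is ignored
```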
More examples:
<?php
$string1 = "Hello";
$string2 = "Hello\x00World";
// This function is NOT binary safe
echo strcoll($string1, $string2); // gives 0, strings are equal.
// This function is binary safe
echo strcmp($string1, $string2); // gives <0, $string1 is less than $string2.
?>
\x indicates hexadecimal notation. See: PHP strings
0x00 = NUL (null byte)
0x04 = EOT (End of transmission)
ASCII table to see ASCII char list
