I'm trying to parse a binary file in PHP that is an attachment of a document in a NoSQL DB. However, in my tests, unpacking a 1 MB file takes around 12-15 seconds. The file contains speed readings from a sensor.
The binary file, shown as hexadecimal, is structured as follows:
BB22 1100 0015 XXXX ...
BB22 1300 0400 20FB 5900 25FB 5910 ... 20FB 5910
BB22 1100 0015 ...
BB22 1300 0400 20FB 5700 25FB 5810 ... 20FB 5912
BB22 1300 0400 20FB 5700 25FB 5810 ... 20FB 5912
...
The marker BB22 1100 introduces the sensor specification, and the following 0015 is the size of that information.
The marker BB22 1300 introduces other data plus the actual speed from the sensor. The next two bytes, 0400, give the length of that chunk (0x0400 = 1024 bytes).
I'm only interested in the speed, i.e. the values 5900 5910 5910 5700 5810 ...
My approach is as follows:
$file = fopen($url, 'r', false, authenticationContext($url));
$result = stream_get_contents($file, -1);
fclose($file);
$hex_result = bin2hex($result);
$markerData = 'bb2213';
$sensorDataUnpack= "sspeed"; // signed int16
while (($pos = strpos($hex_result, $markerData, $pos)) !== FALSE) {
    $pos = $pos + 4;
    for ($j = 4; $j < 1028; $j = $j + 4) {
        $d = unpack($sensorDataUnpack, substr($result, $pos / 2 + $j + 2));
        $sensorData[] = $d;
    }
}
I converted the result from binary to hexadecimal because I couldn't get the positions right otherwise. Anyway, I believe this code can be improved a lot. Any ideas?
This should be fast, but without test data I wasn't able to test it.
The key points are these:
Open the URL in binary mode and use fread() to step through the stream and slice the data into parts.
Use unpack() to parse both the headers and the bodies of the entries.
Use the asterisk (*) repeater to parse the large bodies of signed shorts in a single call.
Use array_values() to convert the associative array to a simple array with numeric keys (like: 0, 1, 2, ...).
Update: I solved the endianness and bitness problem around the marker comparison by using the "H8" unpack format, which returns the 4-byte marker as a lowercase hex string in big-endian order.
$sensorData = array();
$file = fopen($url, 'rb', false, authenticationContext($url));
// fread() returns an empty string (not false) at end of stream, so check the length too.
while (($header = fread($file, 6)) !== false && strlen($header) === 6) {
    // 4-byte marker as hex, then the 2-byte body length in bytes
    // (big-endian, going by the question: 0400 = 1024 bytes).
    $fields = unpack("H8marker/nsize", $header);
    $body = fread($file, $fields["size"]);
    if ($body === false || strlen($body) < $fields["size"]) {
        throw new Exception("import: data stream unexpectedly ended.");
    }
    if ($fields["marker"] === "bb221300") {
        $data = array_values(unpack("s*", $body));
        // Store only every second value.
        for ($i = 1; $i < count($data); $i += 2) {
            $sensorData[] = $data[$i];
        }
    }
}
fclose($file);
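Since there was no test data, here is a minimal sketch of the same chunk-walking idea applied to an in-memory string, so it can be tried without the remote stream. The helper name extractSpeeds and the sample bytes are made up for illustration. The layout (4-byte marker, big-endian 16-bit length in bytes, then the body) is an assumption based on the description in the question.

```php
<?php
// Hypothetical helper: collects every second 16-bit value from each BB22 1300 chunk.
// Assumed layout: 4-byte marker, big-endian 16-bit body length in bytes, then the body.
function extractSpeeds(string $bin): array
{
    $speeds = [];
    $offset = 0;
    $total = strlen($bin);
    while ($offset + 6 <= $total) {
        // The third argument of unpack() (PHP 7.1+) is the byte offset to start at.
        $h = unpack('H8marker/nsize', $bin, $offset);
        $body = substr($bin, $offset + 6, $h['size']);
        if (strlen($body) < $h['size']) {
            throw new RuntimeException("Truncated chunk at offset $offset");
        }
        if ($h['marker'] === 'bb221300') {
            // 's*' = signed 16-bit values in machine byte order, as in the question.
            $words = array_values(unpack('s*', $body));
            for ($i = 1; $i < count($words); $i += 2) {
                $speeds[] = $words[$i];
            }
        }
        $offset += 6 + $h['size'];
    }
    return $speeds;
}

// Synthetic sample: one 2-byte "specification" chunk, then one 8-byte data chunk.
$sample = hex2bin('bb2211000002aaaa') . hex2bin('bb221300000820fb590025fb5910');
print_r(extractSpeeds($sample));
```

Working on a string with an explicit offset avoids the bin2hex() round trip and the repeated substr() calls on the whole buffer, which is likely where most of the time went in the original loop.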
Related
I have a 1024-bit binary stream of data which I'd like to convert to an array of 32-bit integers (i.e. 32 numbers).
From this question I used this code:
$filename = "myFile.sav";
$handle = fopen($filename, "rb");
$fsize = filesize($filename);
$contents = fread($handle, $fsize);
$byteArray = unpack("N*",$contents);
print_r($byteArray);
And even though the format is "N", var_dump prints out an array of 256 8-bit integers. (I want 32 32-bit numbers.) What am I doing wrong?
EDIT: it's not actually 256 8-bit numbers, but 256 gibberish values.
It seems that your file contains a textual representation of the bits (the characters 0 and 1) rather than raw bytes. Given such a string of bits, the bindec function converts each 32-character group to an integer:
$content = "1000110010110110100100101011000100100000101011010101101110101101011101110011001110010010111101001011111000111000101110011110011100110110000111001110000011001101011100111011110000001110111100100110110001000111111001010101100011100101010111000011010101010010001101101100011001101110001001000101110111011001001101111000110101010001101010000101110000100010000110110111000110001110000010000111001100100111110011000101000110100011111100100011110100101010101101011011101100111000101011110111111010001110000011101001011101111010101010011010011010011101100011111111000110000110000000000101110110010100010011010001110101101100110110001011010010010000011000111101110000100100001101100010101011000110010110110111100111010010101110000101011101010010101110100111100111110011000100001010111110010001010100001010001011101101110010011001010000101011110101100001100001111011101010100001001101100100110001101000000110111000111001100001000110000011011100000100011100100110101101000101111011110001100110010001111111010101110111111010110010111001";
$parts = str_split($content, 32);
for ($i = 0; $i < count($parts); ++$i) {
    $parts[$i] = bindec($parts[$i]);
}
print_r($parts);
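This would also explain the 256 values: a text file of 1024 characters is 1024 bytes long, and "N*" consumes 4 bytes per integer. For comparison, here is a small sketch of what unpack("N*") does when the data really is 128 raw bytes (1024 bits). The sample data is made up for illustration.

```php
<?php
// Build 128 raw bytes (1024 bits): the integers 1..32 as big-endian unsigned 32-bit values.
$contents = '';
for ($i = 1; $i <= 32; $i++) {
    $contents .= pack('N', $i);
}

// unpack() returns an array indexed from 1; array_values() renumbers it from 0.
$ints = array_values(unpack('N*', $contents));
echo strlen($contents), " bytes -> ", count($ints), " integers\n";
```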
I am writing an application that can stream videos. It requires the filesize of the video, so I use this code:
$filesize = sprintf("%u", filesize($file));
However, when streaming a 6 GB movie, it fails.
Is it possible to get a bigger integer value in PHP? I don't care if I have to use third-party libraries or if it is slow; all I care about is getting the file size right.
FYI, $filesize is currently 3017575487, which is far from the roughly correct 6000000000.
I am running PHP on a 64 bit operating system.
Thanks for any suggestions!
The issue here is two-fold.
Problem 1
The filesize function returns a signed integer, with a maximum value of PHP_INT_MAX. On 32-bit PHP, this value is 2147483647, or about 2 GB. On 64-bit PHP you can go higher, up to 9223372036854775807. Based on the comments on the PHP filesize page, I created a function that uses an fseek loop to find the size of the file and returns it as a float, which can count higher than a 32-bit unsigned integer.
function filesize_float($filename)
{
    $f = fopen($filename, 'r');
    $p = 0;
    $b = 1073741824;
    fseek($f, 0, SEEK_SET);
    while($b > 1)
    {
        fseek($f, $b, SEEK_CUR);
        if(fgetc($f) === false)
        {
            fseek($f, -$b, SEEK_CUR);
            $b = (int)($b / 2);
        }
        else
        {
            fseek($f, -1, SEEK_CUR);
            $p += $b;
        }
    }
    while(fgetc($f) !== false)
    {
        ++$p;
    }
    fclose($f);
    return $p;
}
To get the file size of the file as a float using the above function, you would call it like this.
$filesize = filesize_float($file);
Problem 2
Using %u in the sprintf function will cause it to interpret the argument as an unsigned integer, thus limiting the maximum possible value to 4294967295 on 32-bit PHP, before overflowing. Therefore, if we were to do the following, it would return the wrong number.
sprintf("%u", filesize_float($file));
You could instead format the value as a float using %F, as follows, but the result will have trailing decimals.
sprintf("%F", filesize_float($file));
For example, the above will return something like 6442450944.000000, rather than 6442450944.
A workaround would be to have sprintf interpret the float as a string, and let PHP cast the float to a string.
$filesize = sprintf("%s", filesize_float($file));
This will set $filesize to the value of something like 6442450944, without trailing decimals.
The Final Solution
If you add the filesize_float function above to your code, you can simply use the following line of code to read the actual file size into the sprintf statement.
$filesize = sprintf("%s", filesize_float($file));
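As a possible alternative to "%s", here is a small sketch that formats the float with an explicit precision of zero decimals. The value below is a made-up file size. The motivation is that PHP's float-to-string cast follows the precision setting and can switch to exponent notation for very large values, while "%.0f" should not.

```php
<?php
// Hypothetical file size in bytes, held as a float (as filesize_float() would return it).
$size = 6442450944.0;

// "%.0f" formats the float with zero decimal places.
$formatted = sprintf('%.0f', $size);
echo $formatted, "\n";

// number_format() with an empty thousands separator gives the same digits.
echo number_format($size, 0, '.', ''), "\n";
```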
As per the PHP documentation for 64-bit platforms, this seems quite reliable for getting the size of files larger than 4 GB:
<?php
$a = fopen($filename, 'r');
fseek($a, 0, SEEK_END);
$filesize = ftell($a);
fclose($a);
?>
I have some text files that are very large - 100MB each that contain a single-line string (just 1 line). I want to extract the last xx bytes / characters from each of them. I know how to do this by reading them in a string and then searching by strpos() or substr() but that would require a large chunk of the RAM which isn't desirable for such a small action.
Is there any other way I can just extract, say, the last 50 bytes / characters of the text file in PHP before executing the search?
Thank you!
You can use fseek:
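$fp = fopen('somefile.txt', 'rb');
fseek($fp, -50, SEEK_END); // The offset must be negative to count back from the end
$data = fread($fp, 50); // fgets($fp, 50) would return at most 49 bytes and stop at a newline
fclose($fp);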
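A minimal sketch of the same fseek idea wrapped in a function, with a fallback for files shorter than the requested tail. The name readTail and the temporary file are assumptions for illustration only.

```php
<?php
// Hypothetical helper: returns the last $n bytes of a file without loading all of it.
function readTail(string $path, int $n): string
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // fseek() returns -1 when the target lies before the start of the file
    // (file shorter than $n bytes); in that case read from the beginning.
    if (fseek($fp, -$n, SEEK_END) === -1) {
        rewind($fp);
    }
    // Read everything from the current position to the end of the file.
    $data = stream_get_contents($fp);
    fclose($fp);
    return $data;
}

// Usage with a throwaway file.
$tmp = tempnam(sys_get_temp_dir(), 'tail');
file_put_contents($tmp, str_repeat('a', 1000) . 'END-OF-FILE');
echo readTail($tmp, 11), "\n";
unlink($tmp);
```

Only the tail is read into memory, so the 100 MB body of the file is never loaded.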
You can do this with file_get_contents by playing with the fourth parameter offset.
PHP 7.1.0 onward:
In PHP 7.1.0 the fourth parameter offset can be negative.
// only negative seek if it "lands" inside the file or false will be returned
if (filesize($filename) > 50) {
    $data = file_get_contents($filename, false, null, -50);
}
else {
    $data = file_get_contents($filename);
}
Pre PHP 7.1.0:
$fsz = filesize($filename);
// only negative seek if it "lands" inside the file or false will be returned
if ($fsz > 50) {
    $data = file_get_contents($filename, false, null, $fsz - 50);
}
else {
    $data = file_get_contents($filename);
}
I have a device which uses a binary-format config, and I have to generate those files on the fly.
The file must consist of a number of configuration settings (one per parameter), each of the form:
Type
Length
Value
where:
Type: is a single-octet identifier which defines the parameter
Length: is a single octet containing the length of the value field in octets (not including type and length fields)
Value: is from one to 254 octets containing the specific value for the parameter
I have a corresponding table
Type_code[int] => { Type_length[int] => Value[int/string/hex/etc.] }
How can I convert that table to the binary format?
And, the other way around, how can I parse such a binary file back into a PHP array?
There's the pack/unpack functions that can translate between various binary/hex/octal/string formats. Read a chunk of the file, extract necessary bits with unpack, and work from there.
$fh = fopen('data.txt', 'rb'); // b for binary-safe
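// read 2 bytes, extract code/length, then read $length more bytes to get $value
while (($data = fread($fh, 2)) !== false && strlen($data) === 2) {
    // named keys are required: with a plain 'CC' the second value overwrites the first
    $header = unpack('Ccode/Clength', $data);
    $value = fread($fh, $header['length']);
    // do stuff with $header['code'] and $value
}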
fclose($fh);
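For the other direction (generating the file), here is a minimal sketch that encodes a list of type/value pairs with pack() and decodes it again. The function names, the sample type codes, and the assumption that each value is already a byte string are illustrative, not part of the device's specification.

```php
<?php
// Hypothetical helper: encodes [[type, valueBytes], ...] as Type-Length-Value records.
function tlvEncode(array $settings): string
{
    $out = '';
    foreach ($settings as [$type, $value]) {
        $len = strlen($value);
        // The value field is one to 254 octets long, per the format description.
        if ($len < 1 || $len > 254) {
            throw new InvalidArgumentException("Value length $len out of range for type $type");
        }
        // 'CC' packs two unsigned 8-bit values: type and length.
        $out .= pack('CC', $type, $len) . $value;
    }
    return $out;
}

// Hypothetical helper: decodes the binary string back into [[type, valueBytes], ...].
function tlvDecode(string $bin): array
{
    $settings = [];
    $offset = 0;
    $total = strlen($bin);
    while ($offset + 2 <= $total) {
        $h = unpack('Ctype/Clength', $bin, $offset);
        $value = substr($bin, $offset + 2, $h['length']);
        if (strlen($value) < $h['length']) {
            throw new RuntimeException("Truncated value at offset $offset");
        }
        $settings[] = [$h['type'], $value];
        $offset += 2 + $h['length'];
    }
    return $settings;
}

// Sample table: type 1 holds a 32-bit big-endian integer, type 2 a string.
$config = [
    [1, pack('N', 3600)],
    [2, 'device-01'],
];
$bin = tlvEncode($config);
echo bin2hex($bin), "\n";
```

Integer values are converted to bytes with pack() before encoding, so the length octet always reflects the number of bytes actually written.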
How can I read the binary code of a file (to get the 1s and 0s)?
$filename = "something.mp3";
$handle = fopen($filename, "rb");
$contents = fread($handle, filesize($filename));
fclose($handle);
I tried this but it shows some strange characters... I presume that this is the formatted binary? I was hoping to get the 1s and 0s instead.
Also, I am not looking only at .mp3 files; it could be anything, e.g. .txt, .doc, .mp4, .php, .jpg, .png etc.
Files are indeed stored on the computer in binary form, but the 1s and 0s are stored together in groups of 8 (called bytes). A byte can hold 256 possible values. ASCII itself defines only 128 characters, but 8-bit character sets that extend it use the remaining values as well, so each byte is traditionally displayed as a single character. Bytes that do not correspond to printable characters are what show up as "strange characters".
That being said, what you are getting back from the fread function is what you're supposed to get: i.e. the contents of the file.
If you want to see the 1s and 0s, you will need to print each byte that you receive in its base 2 representation. You can achieve that using a function such as base_convert or by writing your own.
$filename = "something.mp3";
$handle = fopen($filename, "rb");
$fsize = filesize($filename);
$contents = fread($handle, $fsize);
fclose($handle);
// iterate through each byte in the contents
for ($i = 0; $i < $fsize; $i++)
{
    // get the character representation of the current byte
    $asciiCharacter = $contents[$i];
    // get the base 10 value of the current character
    $base10value = ord($asciiCharacter);
    // now convert that byte from base 10 to base 2 (i.e 01001010...)
    // base_convert() drops leading zeros, so pad the result to a full 8 bits
    $base2representation = str_pad(base_convert($base10value, 10, 2), 8, '0', STR_PAD_LEFT);
    // print the 0s and 1s
    echo($base2representation);
}
NOTE
If you have a string of 1s and 0s (the base 2 representation of a character) you can convert it back to the character like so:
$base2string = '01011010';
$base10value = base_convert($base2string, 2, 10); // => 90
$ASCIICharacter = chr($base10value); // => 'Z'
echo($ASCIICharacter); // will print Z
Here you go, the 1s and the 0s:
$filename = "something.mp3";
$handle = fopen($filename, "rb");
$contents = fread($handle, filesize($filename));
for ($i = 0; $i < strlen($contents); $i++) {
    $binary = sprintf("%08d", base_convert(ord($contents[$i]), 10, 2));
    echo $binary . " ";
}
fclose($handle);
Why not use the PHP function decbin?
for($i = 0; $i < $fsize; $i++){
    $base10value = ord($contents[$i]);
    echo decbin($base10value);
}
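Note that decbin(), like base_convert(), does not pad with leading zeros, so the bytes run together with uneven widths. A small sketch that keeps every byte at exactly 8 digits by using the %b conversion of sprintf(). The helper name toBitString is made up for illustration.

```php
<?php
// Hypothetical helper: converts a byte string to space-separated 8-bit groups.
function toBitString(string $bytes): string
{
    if ($bytes === '') {
        return '';
    }
    $groups = [];
    foreach (str_split($bytes) as $byte) {
        // %08b prints the integer in binary, left-padded with zeros to 8 digits.
        $groups[] = sprintf('%08b', ord($byte));
    }
    return implode(' ', $groups);
}

echo toBitString('Hi'), "\n";
```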