Trouble with zlib compressed File - php

I have inherited a zlib compressed file and long story short, I need to UN-zlib-compress this puppy back to its original content.
I have been racking my brain trying to figure out what in the world is happening, but I am hitting a wall and I am hoping you good people will help me out to figure out what's going on.
I have done a lot of things so far. I won't bore you with every single one, but this is what I landed on last, and all I get is garbled output. I don't know what in the heck is wrong, especially since the last decode step complains about the data, saying:
Warning: gzuncompress(): data error in C:\xampp\htdocs\test-box\index.php on line 6
Warning: zlib_decode(): data error in C:\xampp\htdocs\test-box\index.php on line 8
and this is the code - nothing fancy, I am trying to get it to work before going too crazy with it yet and so the simplicity should allow us to better analyze it.
<?php
$filename = 'c5ytvbg4y.x'; // this is the zlib compressed file
$file = filesize($filename); // using this for the length
$zd = gzopen($filename, "r"); // create valid pointer
$contents = gzread($zd, $file); // binary safe read the content
$decoded = gzuncompress($contents); // using gzdecode produces the same issue
gzclose($zd); // close the pointer
zlib_decode($decoded); // decode it but I get nothing but garble
?>
Any assistance would be appreciated. Ideally I want to be able to open it, uncompress it back to normal and save it to a new file. But at the moment I would be happy just to find out why in the heck I get nothing but garbled text back. Also keep in mind that I know the $file above is not ideal; I will put in a while !feof($zd) or something to that effect later. I wanted to keep it simple for now while trying to get the larger issue figured out.
Any thoughts, recommendations, suggestions, code assistance, or whatnot would be greatly appreciated, TIA.
Additions
#Mark's Request:
0A 12 0F 04 04 D8 44 DA BF 63 C4 93 93 3B 49 51 17 A2 6F E3 0C 12 4D E4 24 F6 C8 BA D0 60 76 81

It is definitely not a "zlib compressed file", at least not the first 32 bytes, nor is it any format that uses the deflate compression method (e.g. gzip, zip, png, etc.), because there is no valid deflate compressed data in the provided bytes.

The zlib header typically starts with hexadecimal 78. Your data starts with 0A, which isn't valid as part of a zlib header. (Technically it is sort of valid, but it implies a compression format that isn't supported by any version of zlib.)
The gzip header starts with hexadecimal 1F 8B. That isn't present in your data either.
So, I'm not sure what this data is, but it's neither gzip nor zlib data. You'll need to do some more research to figure out what it is.
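The header checks described above can be sketched in PHP roughly as follows. This is only an illustration: the function name `sniff_deflate_container` is made up for this example, and it tests nothing beyond the first two bytes (the gzip magic 1F 8B, and the zlib header rules from RFC 1950), so a match does not prove the rest of the file is valid.

```php
<?php
// Hypothetical helper: classify leading bytes as gzip, zlib, or unknown.
function sniff_deflate_container(string $head): string
{
    if (strlen($head) < 2) {
        return 'unknown';
    }
    // gzip members start with the magic bytes 1F 8B.
    if (substr($head, 0, 2) === "\x1f\x8b") {
        return 'gzip';
    }
    $cmf = ord($head[0]);
    $flg = ord($head[1]);
    // zlib (RFC 1950): the low nibble of the first byte is 8 (deflate),
    // the high nibble (window size) is at most 7, and the two header bytes
    // read as one big-endian number are a multiple of 31.
    if (($cmf & 0x0f) === 8 && ($cmf >> 4) <= 7 && (($cmf << 8) | $flg) % 31 === 0) {
        return 'zlib';
    }
    return 'unknown';
}

// First bytes of the dump posted in the question.
$questionBytes = hex2bin('0a120f0404d844da');
$kind = sniff_deflate_container($questionBytes);
```

For a file on disk, the first two bytes could be read with fread() after fopen($filename, 'rb') and passed to the same function.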

Related

Complicated: Error uncompressing data from a packet stream zlib

I have an incoming stream that is compressed using the zlib functions, but I cannot tell where the compressed data ends, so I am having a lot of trouble getting the data out.
I also have a snippet of the source code where it is being uncompressed in AS3 flash, which should have been enough for me to figure it out, but I am at a loss.
Two files included:
http://falazar.com/projects/irnfll/version4/test//stackoverflow/as3_code.as.txt
http://falazar.com/projects/irnfll/version4/test//stackoverflow/bin_data_file
Snippet of the binary data, and what I know:
00 00 02 34 2c 02 31 78 5e ed dc cd 6e da 40 a5
21 19 40 f5 f2 c4 b7 e9 18 85 e1 5b 89 66 3d 42
31 95 90 cd 15 74 99 55 37 51 14 59 c9 a8 8c 54
0234 appears to be a size marker - 564
2c = 44, the code to match as3 COMMAND_WORLD_DATA, that is ok
0231 another size marker, always 3 smaller than above
785e - the zlib header marker for a low compression level (ZLIB_ENCODING_DEFLATE, level 4)
Later there is also a 789c which is another larger compressed block
I need to uncompress the two of these to move forward in project, thank you for your help.
There is also mention in the script of big-endian conversion, and I am not sure if I need to handle that.
I have written a couple of scripts to try and solve this, including a PHP snippet that chops bytes off the end in a loop and tries to uncompress each time, with no luck.
falazar.com/projects/irnfll/version4/test//stackoverflow/php_test.php.txt
Ideal solution in php or c#, but anything I can see that works will translate into another language easy enough.
(Using Free hex editor nero to view the binary)
You mean zlib.
Use PHP's gzuncompress() starting at each zlib header (e.g. 789c).
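A minimal sketch of that suggestion, under some assumptions: the function name `find_zlib_streams` is invented here, candidate headers are found with the RFC 1950 two-byte check rather than by searching for 789c only, and it assumes gzuncompress() tolerates bytes that follow the end of a stream, which may vary by PHP version. Offsets that merely look like a header but do not decode are skipped.

```php
<?php
// Hypothetical helper: try gzuncompress() at every offset that looks like a zlib header.
// Returns an array of offset => decompressed string.
function find_zlib_streams(string $data): array
{
    $found = [];
    $last = strlen($data) - 2;
    for ($offset = 0; $offset <= $last; $offset++) {
        $cmf = ord($data[$offset]);
        $flg = ord($data[$offset + 1]);
        // Same header rule as RFC 1950: deflate method, valid window size, multiple of 31.
        if (($cmf & 0x0f) !== 8 || ($cmf >> 4) > 7 || (($cmf << 8) | $flg) % 31 !== 0) {
            continue;
        }
        // '@' silences the "data error" warning for false-positive headers.
        $decoded = @gzuncompress(substr($data, $offset));
        if ($decoded !== false) {
            $found[$offset] = $decoded;
        }
    }
    return $found;
}

// Made-up packet: the 7 leading bytes from the question, then a level-4 zlib stream.
$packet = "\x00\x00\x02\x34\x2c\x02\x31" . gzcompress('world data', 4);
$streams = find_zlib_streams($packet);
```

Here the size and command bytes are simply stepped over by the scan; a real parser would more likely use the size markers to cut out each block before decompressing it.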

Parsing GIF application's extension blocks- how to find block size?

I am parsing a GIF 89a (yes, I need to) file and I am stuck on Application Extension blocks.
They have a 13-byte header (including the beginning 21 FF 0B bytes) and then there is some data. How much data is there? How do I know how much to read?
You can skip the section below if you know the answer and just tell me :)
This page says:
ApplicationData contains the information that is used by the software application. This field is structured in a series of sub-blocks identical to the data found in a Plain Text Extension block.
Each sub-block begins with a byte that indicates the number of data bytes that follow. From 1 to 255 data bytes may follow this byte. There may be any number of sub-blocks in this field.
This way I can parse NETSCAPE 2.0 blocks which are:
03 01 00 00 00
so I have a loop in PHP:
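for (;;)
{
    $size = ord(fread($handle, 1));
    if ($size == 0) break;
    // SEEK_CUR skips forward from the current position; the default (SEEK_SET) would jump to absolute offset $size
    fseek($handle, $size, SEEK_CUR);
}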
or the same in Delphi, if you prefer:
while F.Position < F.Size do begin
F.Read(Size, 1); // F is TFileStream
if Size = 0 then break;
F.Position := F.Position + Size;
end;
The iteration goes:
size = read 1 byte; //size = 3;
read 3 byte;
size = read 1 byte;
size = 0 so break
So far, so good, here comes the problem: the XMP Data
So the bytes in this block go like this (ASCII below):
21 FF 0B 58 4D 50 20 44 61 74 61 58 4D 50
!`.XMP DataXMP
and then goes ASCII XML dump:
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
for about 500 bytes.
I obviously can't read it the same way I read NETSCAPE 2.0 blocks.
It seems to be terminated with 00 byte.
Should it just always read until a 00 byte? Then it would fail on NETSCAPE 2.0 blocks!
How should a GIF decoder behave on Application Extension blocks? How much data is in them?
Problematic XMP Data image
OK, the NETSCAPE 2.0 block approach might be fine after all; it may have been failing on the XML because my file was not being read correctly.
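For reference, the sub-block walk described in the quoted text can be sketched like this. The function name `gif_skip_sub_blocks` is invented for the example, and it assumes the stream is already positioned just after the extension's header, at the first size byte.

```php
<?php
// Hypothetical helper: skip a chain of GIF data sub-blocks.
// Each sub-block is one size byte (1..255) followed by that many data bytes;
// a size byte of 0 terminates the chain. Returns the number of data bytes skipped.
function gif_skip_sub_blocks($handle): int
{
    $total = 0;
    while (true) {
        $byte = fread($handle, 1);
        if ($byte === false || $byte === '') {
            throw new RuntimeException('Unexpected end of file inside sub-blocks');
        }
        $size = ord($byte);
        if ($size === 0) {
            return $total; // block terminator reached
        }
        fseek($handle, $size, SEEK_CUR); // move forward relative to the current position
        $total += $size;
    }
}

// The NETSCAPE 2.0 payload from the question: one 3-byte sub-block, then the terminator.
$h = fopen('php://memory', 'w+b');
fwrite($h, "\x03\x01\x00\x00\x00");
rewind($h);
$skipped = gif_skip_sub_blocks($h);
$position = ftell($h);
fclose($h);
```

Throwing on a short read, instead of treating it as a terminator, makes a truncated or misread file show up as an error rather than as a silently shortened block.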

Check if file is JPEG, PDF or TIFF

How would I check that a file is either JPEG, PDF or TIFF? And I mean actually checking, not just going by MIME type and file extension.
I have access to the raw file data (this check is part of an uploader) and I need to verify that the files are either JPEG, PDF or TIFF. I assume I would have to check for some sort of headers in the files but I have no idea what to look for and where to start.
Exif_imagetype is very useful for this: http://us2.php.net/manual/en/function.exif-imagetype.php
It scans the initial bytes of the file to determine the graphic type. It supports a large number of graphic formats (and returns false if it doesn't recognize the format).
You need to implement byte sequence tests.
Here is a guide to checking byte sequences for the most common image formats.
This can be tricky, since it relies on the format defining a "magic number": a fixed byte sequence at the start of the file that basically serves as a "header" for the format.
I found this wiki-page about different signatures: http://en.wikipedia.org/wiki/List_of_file_signatures
So in the best case scenario you just need to validate these first bytes.
If you have access to the raw file, you can check the file header for its magic number. This number defines the type of file.
To check for image types you can use the exif_imagetype function.
For PDF: you have to open the file, read the first bytes and check whether it starts with '%PDF'
$fp = fopen($pdf, 'rb');
// fread() returns the first 4 bytes; fgets($fp, 4) would return at most 3 and never match
if (fread($fp, 4) === '%PDF')
{
    // ... is pdf
}
fclose($fp);
There is no sure-fire way to be certain, but the first few bytes of a file are its signature/fingerprint for the file handlers to test. See https://en.wikipedia.org/wiki/List_of_file_signatures
Every file type can vary considerably and some allow for variable / shifting headers, but with a degree of uncertainty (At one time PDF did not mandate the 40 bit signature to be first) we can assume the following hex values sometimes erroneously called "Magic Numbers" as representing the start of each bit stream.
So in general to answer the requested types
FF D8 (ÿØ) would be a Jpeg (EXCEPT JP2000=FF 4F or 00 00) in raw binary or /9j/4 in Base64 format
25 50 44 46 2d (%PDF-) would be the 40 bit signature of a PDF or JVBER in Base64 format
89 50 4E 47 (‰PNG) would be PNG in raw binary or iVBOR in Base64 format
just for good measure here is related older GIF sequence
47 49 46 38 (GIF8) and that's R0lGO as Base64 also we can see the first 8 bits are 01000111 for G
Thus in ALL the above cases just the first "8 bit / byte" would be a very good indicator, no need for Magic strings, but with Zip/###X such as docX pptX cbzX xlsX they ALL have the same Magic Number
50 4B (PK) base64 = UEsDB
Finally the last requested above was Tif(f) which can be two types, Intel or Motorola thus you need to test for
49 49 2A 00 (II* ) base64 = SUkqA
4D 4D 00 2A (MM *) base64 = TU0AK
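Putting the three requested types together, a minimal PHP sketch of the signature test could look like the following. The function name `detect_upload_type` is made up for this example, it uses only the byte sequences listed above, and a matching prefix says nothing about whether the rest of the file is well formed.

```php
<?php
// Hypothetical helper: map the leading bytes of an upload to 'jpeg', 'pdf', 'tiff' or null.
function detect_upload_type(string $head): ?string
{
    $signatures = [
        'jpeg' => ["\xFF\xD8"],           // FF D8
        'pdf'  => ['%PDF-'],              // 25 50 44 46 2D
        'tiff' => ["II*\x00", "MM\x00*"], // Intel and Motorola byte order
    ];
    foreach ($signatures as $type => $prefixes) {
        foreach ($prefixes as $prefix) {
            // strncmp() is binary safe and compares only the prefix length.
            if (strncmp($head, $prefix, strlen($prefix)) === 0) {
                return $type;
            }
        }
    }
    return null;
}

$pdfType  = detect_upload_type("%PDF-1.4\n");
$tiffType = detect_upload_type("MM\x00*\x00\x00\x00\x08");
$pngType  = detect_upload_type("\x89PNG\r\n\x1a\n"); // not one of the accepted types
```

In an uploader, the first 8 bytes or so of the temporary file would be read in binary mode and passed in as `$head`.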

Convert HEX to ASCII, data from GPS tracker

I have just bought a GPS tracker; it can send SMS to a cellphone just fine. It also supports reporting to a server via GPRS.
I have set up the device to contact my own server on port 8123. It's a FreeBSD server and I have checked that I receive packets on that port.
I have successfully set up a listener server written in PHP, and I can receive data from the device. But how do I convert the partial HEX data to something useful (ASCII)?
Example data string:
$$^#T^#E Y'^WÿU210104.000,A,5534.4079,N,01146.2510,E,0.00,,170411,,*10|1.0|72|0000á
Unfortunately I don't know how I can copy-paste the HEX parts
Now how do I get the ID part out? I have tried echo hexdec(mb_substr($data, 4, 7));
The data is following this protocol
From the document:
Command format of GPRS packets are as follows:
From server to tracker:
##<L><ID><command><data><checksum>\r\n
From tracker to server:
$$<L><ID><command><data><checksum>\r\n
Note:
Do NOT input ‘<’ and ‘>’ when writing a command.
All multi-byte data complies with the following sequence: High byte prior to low byte.
The size of a GPRS packet (including data) is about 100 bytes
Item      Specification
##        2 bytes. It means the header of packet from server to tracker.
          It is in ASCII code (Hex code: 0x40)
$$        2 bytes. It is the header of packet from tracker to server.
          It is in ASCII code (Hex code: 0x24)
L         2 bytes. It means the length of the whole packet including
          the header and ending character and it is in hex code
ID        7 bytes, ID must be digit and not over 14 digits, the unused byte
          will be stuffed by ‘f’ or ‘0xff’. It is in the format of hex code.
          For example, if ID is 13612345678, then it will be shown as
          follows: 0x13, 0x61, 0x23, 0x45, 0x67, 0x8f, 0xff.
          If all 7 bytes are 0xff, it is a broadcasting command. ID is in hex code
command   2 bytes. The command code is in hex code. Please refer to the
          command list below.
data      Min 0 byte and max 100 bytes. See Annex 1 for description of ‘data’.
checksum  2 bytes. It indicates CRC-CCITT (default is 0xffff) checksum of
          all data (not including CRC itself and the ending character).
          It is in hex code.
          For example: 24 24 00 11 13 61 23 45 67 8f ff 50 00 05 d8 0d 0a
          0x05d8 = CRC-CCITT (24 24 00 11 13 61 23 45 67 8f ff 50 00)
\r\n      2 bytes. It is the ending character and in hex code
          (0x0d,0x0a in hex code)
Update
With the answer from Anomie, I was able to piece this together
// substr() counts bytes; mb_substr() counts characters and can miscount on binary data
$arr = unpack('H4length/H14id/H4cmd/H4crc/H4end', substr($data, 2, 11) . substr($data, -4));
var_dump($arr);
This will output something like
array(5) {
["length"]=>
string(4) "0054"
["id"]=>
string(14) "004512345678ff"
["cmd"]=>
string(4) "9955"
["crc"]=>
string(4) "c97e"
["end"]=>
string(4) "0d0a"
}
It sounds like you are needing to convert binary data to integers or strings. The most straightforward way is to use unpack.
For example, to extract the length you already know you can use
$length_bin = substr($string, 2, 2);
To convert it to an integer, you can use something like
$length = unpack('n', $length_bin); $length = $length[1];
The 'n' code (unsigned 16-bit, big-endian) will work for the length and the checksum, because the protocol sends the high byte first; if you have a number stored as 4 bytes use 'N', and for the ID you can use 'H*' to get it as a string of hex digits. Other codes are listed in the documentation.
A somewhat less straightforward way is to do the bit manipulation manually, after using unpack with C* to get an array of all the byte values. For example,
$bytes = unpack('C*', $length_bin); // note: unpack() returns an array indexed from 1
$length = ($bytes[1] << 8) | $bytes[2];
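As a worked example, here is a rough sketch that splits the sample packet from the protocol document into its fields with unpack(). The function name `parse_tracker_packet` is invented, the field layout is taken from the table quoted in the question, and the CRC is only extracted, not verified.

```php
<?php
// Hypothetical helper: split one tracker-to-server packet into its fields.
// Layout assumed from the quoted spec: $$ | L (2) | ID (7) | command (2) | data | CRC (2) | \r\n
function parse_tracker_packet(string $packet): array
{
    if (strlen($packet) < 17 || substr($packet, 0, 2) !== '$$') {
        throw new InvalidArgumentException('Not a tracker packet');
    }
    // 'n' = unsigned 16-bit big-endian, 'H14' = 14 hex digits (7 bytes), high nibble first.
    $head = unpack('nlength/H14id/ncommand', substr($packet, 2, 11));
    $tail = unpack('ncrc', substr($packet, -4, 2));
    return [
        'length'  => $head['length'],
        'id'      => rtrim($head['id'], 'f'), // drop the 'f' padding nibbles
        'command' => $head['command'],
        // Everything between the command and the CRC; cast guards older PHP returning false.
        'data'    => (string) substr($packet, 13, -4),
        'crc'     => $tail['crc'],
    ];
}

// Example packet from the protocol document.
$example = hex2bin('2424001113612345678fff500005d80d0a');
$fields = parse_tracker_packet($example);
```

A listener would normally also compare `$fields['length']` with strlen() of the received packet and recompute the CRC-CCITT before trusting the contents.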
You need to know the format of the messages you are going to receive from the device. You can get this info from the manufacturer. Then, depending on that, you have to create a proper listener in the server side.
I've been working with several devices like that and normally you have to create a process in the server listening to the port with a Socket (or similar). You may have an authentication process also to differentiate between devices (you can have more than one). After that, you simply get the data from the device, you parse it and you store it. Depending on the device you can also send requests or configurations.
Hope this helps
*Edit 26 April:* I have changed the question a bit, thus this seems out of place. Initial question was more on how to read the data from TCP.
I found some great articles on writing a TCP/socket server in PHP (/me slaps PHP around a bit with a large trout)
http://devzone.zend.com/article/1086
http://kevin.vanzonneveld.net/techblog/article/create_daemons_in_php/
Can't wait to get this going :)

Converting a PDF to JPG with ImageMagick in PHP Gives Odd Letter Spacing

I am trying to convert a PDF to a JPG with a PHP exec() call, which looks like this:
convert page.pdf -resize 716x716 page.jpg
For some reason, the JPG comes out with janky text, despite the PDF looking just fine in Acrobat and Mac Preview. Here is the original PDF:
http://whit.info/dev/conversion/page.pdf
and here is the janktastic output:
http://whit.info/dev/conversion/page.jpg
The server is a LAMP stack with PHP 5 and ImageMagick 6.2.8.
Can you help this stumped Geek?
Thanks in advance,
Whit
ImageMagick is just going to call out to Ghostscript to convert this PDF to an image. If you run gs on the pdf, you get the same badly-spaced output.
I suspect Ghostscript isn't handling the PDF's embedded TrueType fonts very well. If you could change your output to either embed Type 1 fonts or use a "core" PostScript font, you'd get better results.
I suspect it's an encoding/widths issue. Both are a tad off, though I can't put my finger on why.
Here are some suspects:
First
The text stream is defined in UTF-16 LE. charNULLcharNULL, using the normal string drawing command syntax:
(some text) Tj
There's a way to escape any old character value into a () string. You can also define strings in hex thusly:
<203245> Tj
Neither method is used, just the questionable inline nulls. That could cause an issue in GS if it's trying to work with pointers to char without lengths associated with them.
Second
The widths array is dumb. You can define widths in groups thusly:
[ 32 [450 525 500] 37 [600 250] 40 [0] ]
This defines
32: 450
33: 525
34: 500
37: 600
38: 250
40: 0
These fonts define their consecutive widths in individual arrays. Not illegal, but definitely wasteful/stupid, and if GS were coded to EXPECT gaps between the arrays, it could induce a bug.
There are also some extremely fishy values in the array. 32 through 126 are defined consecutively, but then it starts jumping all over: ...126 [600] 8364 [500] 8216 [222] 402 [500] 8222 [389]. 8230 [1000] 8224 [444]... and then goes back to being consecutive from 160 to 255.
Just weird.
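To make the grouping concrete, here is a small PHP sketch that expands a widths array of the form shown above into a code => width map. It is an illustration only: the function name `expand_widths` is invented, the PDF array is represented as a nested PHP array, and only the "start code followed by an array of widths" form used in this answer is handled.

```php
<?php
// Hypothetical helper: expand [start [w0 w1 ...] start [w0 ...] ...] into code => width.
function expand_widths(array $w): array
{
    $widths = [];
    for ($i = 0; $i + 1 < count($w); $i += 2) {
        $first = $w[$i];
        // The k-th width in the group belongs to character code $first + k.
        foreach ($w[$i + 1] as $k => $width) {
            $widths[$first + $k] = $width;
        }
    }
    return $widths;
}

// The grouped example from above.
$grouped = expand_widths([32, [450, 525, 500], 37, [600, 250], 40, [0]]);
// The same first three widths written the way this PDF does it: one array per code.
$oneEach = expand_widths([32, [450], 33, [525], 34, [500]]);
```

Both spellings expand to the same widths for codes 32 to 34, which is why the one-array-per-code style is legal but wasteful.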
Third
I'm not even remotely sure, but the CIDToGIDMap stream contains an AWFUL lot of nulls.
Bottom line
Those fonts are fishy. And I've never heard of "Bellflower Books" or "UFPDF 0.1"
That version number makes me cringe. It should make you cringe too.
Googling for "UFPDF" I found this note from the author:
Note: I wrote UFPDF as an experiment, not as a finished product. If you have problems using it, don't bug me for support. Patches are welcome though, but I don't have much time to maintain this.
UFPDF is a PHP library that sits on top of FPDF. 0.1. Just run away.
