Which function to deflate a http request? - php

I make an HTTP POST request to a remote service which requires the POST body to be "deflated" (and Content-Encoding: deflate should be sent in the headers). From my understanding, this is covered in RFC 1950. Which PHP function should I use to be compatible?
gzencode
gzdeflate
gzcompress

Content-Encoding: deflate requires data to be presented using the zlib structure (defined in RFC 1950), with the deflate compression algorithm (defined in RFC 1951).
Consider
<?php
$str = 'test';
$defl = gzdeflate($str);
echo bin2hex($defl), "\n";
$comp = gzcompress($str);
echo bin2hex($comp), "\n";
?>
This gives us:
2b492d2e0100
789c2b492d2e0100045d01c1
So the gzcompress result is the gzdeflate'd buffer preceded by 789c, which is a valid zlib header:
0x78 = 0111 | 1000 : CINFO | CM : 7 = 32 KB window | 8 = deflate
0x9c = 10 | 0 | 11100 : FLEVEL | FDICT | FCHECK : 2 = default compression level | no preset dictionary | check bits (make 0x789c a multiple of 31)
and it is followed by a 4-byte Adler-32 checksum of the uncompressed input (045d01c1). This is what we're looking for.
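The byte layout can be checked directly by rebuilding the gzcompress() output from its parts. This is a sketch that assumes both functions run at their default level (so the header is 78 9c, as above) and relies on hash('adler32', $str) returning the checksum as big-endian hex, which is the byte order the zlib trailer uses. The helper name zlib_wrap() is made up for illustration.

```php
<?php
// Rebuild gzcompress() output from gzdeflate() output:
// 2-byte zlib header + raw deflate data + 4-byte big-endian Adler-32 trailer.
function zlib_wrap(string $str): string {
    $header  = "\x78\x9c";                     // CMF/FLG produced at the default level
    $body    = gzdeflate($str);                // raw deflate data (RFC 1951)
    $trailer = hex2bin(hash('adler32', $str)); // Adler-32 of the uncompressed input
    return $header . $body . $trailer;
}

var_dump(zlib_wrap('test') === gzcompress('test')); // bool(true)
echo hash('adler32', 'test'), "\n";                 // 045d01c1
```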
To sum it up,
gzdeflate returns a raw deflated buffer (RFC 1951)
gzcompress returns a deflated buffer wrapped in zlib stuff (RFC 1950)
Content-Encoding: deflate requires a wrapped buffer, that is, use gzcompress when sending deflated data.
Note the confusing naming: gzdeflate is not for Content-Encoding: deflate and gzcompress is not for Content-Encoding: compress. Go figure!
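Putting that conclusion into practice, here is a minimal sketch of a POST body compressed with gzcompress() and sent through PHP's HTTP stream wrapper. The helper name, the Content-Type, the JSON payload and the endpoint URL are placeholders; whether a given remote service accepts compressed request bodies is up to that service.

```php
<?php
// Build stream-context options for a POST whose body is sent with
// Content-Encoding: deflate (zlib-wrapped deflate, i.e. gzcompress()).
function deflated_post_options(string $body, string $contentType): array {
    $payload = gzcompress($body); // zlib wrapper (RFC 1950) around deflate data (RFC 1951)
    return [
        'http' => [
            'method'  => 'POST',
            'header'  => implode("\r\n", [
                'Content-Type: ' . $contentType,
                'Content-Encoding: deflate',
                'Content-Length: ' . strlen($payload), // length of the compressed bytes
            ]),
            'content' => $payload,
        ],
    ];
}

$options = deflated_post_options('{"hello":"world"}', 'application/json');
$context = stream_context_create($options);
// Hypothetical endpoint; uncomment to actually send the request:
// $response = file_get_contents('https://example.com/endpoint', false, $context);
```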

Related

Use zlib inflate() in c++ to decompress via PHP ZLIB_ENCODING_RAW encoded data

When I compress a string in PHP with encoding ZLIB_ENCODING_DEFLATE and output the hex data, I can convert it back to the original string using zlib's inflate() in a C++ project.
Per the example here ( https://www.php.net/manual/en/function.zlib-encode.php ) :
<?php
$str = 'hello world';
$enc = zlib_encode($str, ZLIB_ENCODING_DEFLATE);
echo bin2hex($enc);
?>
In C++, after first converting the hex string to binary data (simplified code):
z_stream d_stream;
d_stream.zalloc = (alloc_func)0 ;
d_stream.zfree = (free_func)0 ;
d_stream.opaque = (voidpf)0 ;
d_stream.next_in = InBuffer ;
d_stream.avail_in = InBufferLen ;
d_stream.next_out = OutBuffer ;
d_stream.avail_out = OutBufferLen ;
int err = inflateInit(&d_stream) ;
while (err == Z_OK)
err = inflate(&d_stream, Z_NO_FLUSH);
err = inflateEnd(&d_stream);
OutBuffer then contains "hello world" again.
I was wondering whether zlib's inflate() can also decompress the raw data generated in PHP by zlib_encode($str, ZLIB_ENCODING_RAW)?
From the zlib documentation I think not:
The deflate compression method (the only one supported in this
version).
#define Z_DEFLATED 8
But PHP's function name zlib_encode() and the constant ZLIB_ENCODING_RAW seem to suggest that zlib does support it. If so, which function and/or parameters do I use?
The PHP designations are (as usual) confusing. I will assume that ZLIB_ENCODING_RAW means raw deflate data (per RFC 1951), and it appears that ZLIB_ENCODING_DEFLATE actually means zlib-wrapped deflate data (per RFC 1950).
If that's correct, they should have called them ZLIB_ENCODING_DEFLATE and ZLIB_ENCODING_ZLIB, respectively. But I digress.
You can decode raw deflate data with the zlib library by using inflateInit2() instead of inflateInit(), and giving -15 as the second argument.
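On the PHP side, the answer's reading of the constants can be checked, and the incremental decoder can be pointed at raw deflate data in the same way. This sketch assumes PHP 7.0+ (for inflate_init()/inflate_add()); the constants carry zlib's windowBits values, so ZLIB_ENCODING_RAW corresponds to the -15 passed to inflateInit2() in C.

```php
<?php
$str = 'hello world';

$raw  = zlib_encode($str, ZLIB_ENCODING_RAW);     // raw deflate stream (RFC 1951), no header/trailer
$zlib = zlib_encode($str, ZLIB_ENCODING_DEFLATE); // zlib-wrapped deflate (RFC 1950)

var_dump(ZLIB_ENCODING_RAW, ZLIB_ENCODING_DEFLATE); // prints int(-15) and int(15)
var_dump($raw === gzdeflate($str));   // bool(true)
var_dump($zlib === gzcompress($str)); // bool(true)

// Incremental decoder configured for raw deflate, the PHP counterpart of
// calling inflateInit2(&strm, -15) instead of inflateInit(&strm) in C.
$ctx = inflate_init(ZLIB_ENCODING_RAW);
$out = inflate_add($ctx, $raw, ZLIB_FINISH);
var_dump($out === $str); // bool(true)
```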

GuzzleHttp request sends garbled characters

I use GuzzleHTTP 6.0 to get data from the API server. For some reason, the requests the API server receives are not UTF-8 encoded: the characters ü, ö, ä, ß arrive garbled.
My system and database default to UTF-8.
I set debug to true in the RequestOptions; this is the output:
User-Agent: GuzzleHttp/6.2.1 curl/7.47.0 PHP/7.0.22-0ubunut0.16.04.1
Content-type: text/xml;charset="UTF-8"
Accept: text/xml
Cache-Control: no-cache
Content-Length: 2175
* upload completely sent off: 2175 out of 2175 bytes
< HTTP/1.1 200 OK
< Server: Apache-Coyote/1.1
< Content-Type: text/xml; charset=utf-8
< Transfer-Encoding: chunked
< Date: Thu, 23 Nov 2017 9:34:12 GMT
* Connection #5 to host www.abcdef.com left intact
I have explicitly set the header contents to UTF-8:
$headers = array(
'Content-type' => 'text/xml;charset="utf-8"',
'Accept' => 'text/xml',
'Content-length' => strlen($requestBody),
);
I also tested the body with mb_detect_encoding():
mb_detect_encoding($requestBody,'UTF-8',true); // returns UTF-8
Any further ideas on how to debug this issue?
Content-Length must contain the number of bytes, not the number of characters. That could be the reason if you use mbstring.func_overload, which can make strlen() count characters instead of bytes. Try omitting the manually set header; Guzzle will then set it correctly for you.
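To see the difference the answer describes, compare byte and character counts of a string containing an umlaut. This sketch assumes the mbstring extension is loaded and the file is saved as UTF-8. Note that mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0, so on current PHP strlen() always counts bytes.

```php
<?php
$body = '<name>Müller</name>'; // assumes this source file is saved as UTF-8

$bytes = strlen($body);             // byte count, which is what Content-Length needs
$chars = mb_strlen($body, 'UTF-8'); // character count ("ü" is two bytes in UTF-8)
$safe  = mb_strlen($body, '8bit');  // byte count even if func_overload rewrites strlen()

printf("bytes=%d chars=%d 8bit=%d\n", $bytes, $chars, $safe); // bytes=20 chars=19 8bit=20
```

If you do set Content-Length yourself on an installation with func_overload enabled, mb_strlen($body, '8bit') is one way to get the byte count.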

How output data with gzip header and not deflate header

I'm testing compression of html files.
I have 2 HTML files:
Not compressed HTML file ( content will change )
Compressed HTML file .gz ( content won't change )
Using PHP, I'm trying to output the compressed file data, and here the trouble begins.
Test with the already compressed HTML file:
//header gzip
$data = getfile($name); // custom function packed with fopen fread
header('Content-Encoding: gzip'); // header works perfectly
echo $data; // output OK
//header deflate
$data = getfile($name); // custom function packed with fopen fread
header('Content-Encoding: deflate'); // file was gzip compressed, so an error is expected
echo $data; // Firefox: Content Encoding Error
Test with the uncompressed HTML file:
//header gzip using gzcompress();
$data = gzcompress(getfile($name), 9);
header('Content-Encoding: gzip'); // somehow the header is bad
echo $data; // Firefox: Content Encoding Error, but IE 9 output OK
but here we got magic
//header deflate using gzcompress();
$data = gzcompress(getfile($name), 9);
header('Content-Encoding: deflate'); // header works perfectly
echo $data; // Firefox output OK, but IE output ERROR
How can I fix this crazy thing and send all data as gzip with a gzip header, not deflate? Does anyone have an idea what is wrong?
Thank you
The HTTP spec. (RFC2616) says:
gzip
An encoding format produced by the file compression program
"gzip" (GNU zip) as described in RFC 1952 [25].
compress
The encoding format produced by the common UNIX file compression
program "compress".
deflate
The "zlib" format defined in RFC 1950 [31] in combination with
the "deflate" compression mechanism described in RFC 1951 [29].
The PHP docs say:
gzcompress
For details on the ZLIB compression algorithm see the document
"ZLIB Compressed Data Format Specification version 3.3"
(RFC 1950).
gzdeflate
For details on the DEFLATE compression algorithm see the document
"DEFLATE Compressed Data Format Specification version 1.3"
(RFC 1951).
gzencode
For more information on the GZIP file format, see the document:
GZIP file format specification version 4.3 (RFC 1952).
From this, one can come to the conclusion that gzencode() must be used with gzip, and gzcompress() (with the DEFLATE encoding) must be used with deflate.
The first combination works for me. I haven't tried the second, and I don't know why it wouldn't work with IE. A URL might help to troubleshoot that problem.
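The mapping from the RFC 2616 codings to PHP functions can be captured in a small helper. This is a hypothetical sketch, not part of either post: the function name is made up, and its Accept-Encoding parsing is deliberately naive (q-values are ignored and the first supported token wins).

```php
<?php
// Pick the PHP encoder that matches an Accept-Encoding token:
// "gzip" -> gzencode() (RFC 1952), "deflate" -> gzcompress() (RFC 1950 + RFC 1951).
function encode_for_client(string $body, string $acceptEncoding): array {
    foreach (explode(',', strtolower($acceptEncoding)) as $token) {
        $name = trim(explode(';', $token)[0]); // drop any ";q=..." parameter
        if ($name === 'gzip') {
            return ['gzip', gzencode($body, 9)];
        }
        if ($name === 'deflate') {
            return ['deflate', gzcompress($body, 9)];
        }
    }
    return [null, $body]; // no supported coding: send the body uncompressed
}

[$encoding, $payload] = encode_for_client('<p>Hello</p>', 'gzip, deflate'); // $encoding is 'gzip'
```

In a page script you would then call header('Content-Encoding: ' . $encoding) only when $encoding is not null, and echo $payload.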

How to determine if a string was compressed?

How can I determine whether a string was compressed with gzcompress (apart from comparing the sizes of the string before and after calling gzuncompress, or would that be the proper way of doing it)?
PRE: I guess, if you send a request, you can immediately look into $http_response_header to see if one of the items in the array is a variation of Content-Encoding: gzip. But this is not ideal!
There is a far better method.
Here is HOW TO...
Check if it's GZIP. Like a BOSS!
according to GZIP RFC:
The header of gzip content looks like this
+---+---+---+---+---+---+---+---+---+---+
|ID1|ID2|CM |FLG| MTIME |XFL|OS | (more-->)
+---+---+---+---+---+---+---+---+---+---+
ID1 and ID2 identify the content as GZIP, and CM states the compression method: 8 means deflate (RFC 1951), which is what GZIP customarily uses with all web servers.
oh! and they have fixed values:
The value of ID1 is "\x1f"
The value of ID2 is "\x8b"
The value of CM is "\x08" (or just 8...)
almost there:
`$is_gzip = 0 === mb_strpos($mystery_string , "\x1f" . "\x8b" . "\x08");`
Working example
<?php
/** @link https://gist.github.com/eladkarako/d8f3addf4e3be92bae96#file-checking_gzip_like_a_boss-php */
date_default_timezone_set("Asia/Jerusalem");
while (ob_get_level() > 0) ob_end_flush();
mb_language("uni");
#mb_internal_encoding('UTF-8');
setlocale(LC_ALL, 'en_US.UTF-8');
header('Time-Zone: Asia/Jerusalem');
header('Charset: UTF-8');
header('Content-Type: text/plain; charset=UTF-8');
header('Access-Control-Allow-Origin: *');
function get($url, $cookie = '') {
$html = @file_get_contents($url, false, stream_context_create([
'http' => [
'method' => "GET",
'header' => implode("\r\n", [''
, 'Pragma: no-cache'
, 'Cache-Control: no-cache'
, 'User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2310.0 Safari/537.36'
, 'DNT: 1'
, 'Accept-Language: en-US,en;q=0.8'
, 'Accept: text/plain'
, 'X-Forwarded-For: ' . implode(', ', array_unique(array_filter(array_map(function ($item) { return filter_input(INPUT_SERVER, $item, FILTER_SANITIZE_SPECIAL_CHARS); }, ['HTTP_X_FORWARDED_FOR', 'REMOTE_ADDR', 'HTTP_CLIENT_IP', 'SERVER_ADDR', 'REMOTE_ADDR']), function ($item) { return null !== $item; })))
, 'Referer: http://eladkarako.com'
, 'Connection: close'
, 'Cookie: ' . $cookie
, 'Accept-Encoding: gzip'
])
]]));
$is_gzip = 0 === mb_strpos($html, "\x1f" . "\x8b" . "\x08", 0, "US-ASCII");
return $is_gzip ? zlib_decode($html) : $html; // zlib_decode() auto-detects gzip; its 2nd argument is a max length, not an encoding
}
$html = get('http://www.pogdesign.co.uk/cat/');
echo $html;
What do we see here that is worth mentioning?
Start by initializing the PHP engine to use UTF-8 (since we don't know in advance whether the web server will return GZIP content).
Providing the header Accept-Encoding: gzip tells the web server that it may send GZIP content.
Detect GZIP content by its signature bytes (use the multi-byte functions with an explicit ASCII encoding, as in the example).
Finally, returning the plain output is easy with the zlib functions.
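The same signature check can also be done on raw bytes without mbstring at all. A minimal sketch, with a made-up helper name: strncmp() compares bytes directly, so no encoding argument is involved.

```php
<?php
// Byte-level check for the gzip signature: ID1 = 0x1f, ID2 = 0x8b, CM = 0x08 (deflate).
function has_gzip_magic(string $data): bool {
    return strncmp($data, "\x1f\x8b\x08", 3) === 0;
}

var_dump(has_gzip_magic(gzencode('hello')));   // bool(true)
var_dump(has_gzip_magic(gzcompress('hello'))); // bool(false): a zlib stream starts with 0x78
```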
A string and a compressed string are both simply sequences of bytes. You cannot really distinguish one sequence of bytes from another sequence of bytes. You should know whether a blob of bytes represents a compressed format or not from accompanying metadata.
If you really need to guess programmatically, you have several things you can try:
Try to uncompress the string and see if the uncompress operation succeeds. If it fails, the bytes probably did not represent a compressed string.
Try to check for obvious "weird" bytes, like anything below 0x20. Those bytes aren't typically used in regular text. There's no real guarantee that they occur in a compressed string, though.
Use mb_check_encoding to see whether a string is valid in the encoding you suspect it to be in. If it isn't, it's probably compressed (or you checked for the wrong encoding). With the caveat that virtually any byte sequence is valid in virtually every single-byte encoding, so this'll only work for multi-byte encodings.
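The "try to uncompress" approach from the first point can be extended to all three zlib-based formats PHP produces. This is a hypothetical sketch: it tries the decoders in order from most to least self-identifying. A raw deflate stream has no header or checksum, so a gzinflate() success is much weaker evidence than the other two.

```php
<?php
// Try each PHP decoder in turn; the first that succeeds names the likely format.
function guess_compression(string $data): ?string {
    if (@gzdecode($data) !== false) {
        return 'gzip';    // gzencode() output, RFC 1952
    }
    if (@gzuncompress($data) !== false) {
        return 'zlib';    // gzcompress() output, RFC 1950
    }
    if (@gzinflate($data) !== false) {
        return 'deflate'; // gzdeflate() output, RFC 1951
    }
    return null;          // probably not compressed
}

var_dump(guess_compression(gzcompress('hello world'))); // string(4) "zlib"
```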
This works fine for me:
if (@gzuncompress($_xml) !== false) {
// zlib-compressed string (gzcompress() output)
}
You can simply try gzuncompress() on the data as noted by @DiDiegodaFonseca. If it fails, it was not made by gzcompress(), or it was not faithfully transmitted.
If you really want to, you can check the first two bytes for a zlib header (not a gzip header, as incorrectly suggested in the accepted answer). gzcompress() produces a zlib stream, not a gzip stream. gzencode() is what produces a gzip stream. gzdeflate() produces a raw deflate stream.
RFC 1950 describes the zlib header. It is two bytes, where the two bytes taken as a big-endian 16-bit unsigned integer must be a multiple of 31. In addition to checking that, you can check that the low four bits of the first byte are 8 (1000), and that the high bit of the first byte is zero.
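Those three header conditions translate directly into a few bit operations. A minimal sketch with a made-up function name; since it looks at only two bytes, a positive result means "worth trying gzuncompress()", not proof.

```php
<?php
// RFC 1950 header check: CM (low nibble of CMF) must be 8, the high bit of CMF
// must be clear (CINFO <= 7), and CMF * 256 + FLG must be a multiple of 31.
function looks_like_zlib(string $data): bool {
    if (strlen($data) < 2) {
        return false;
    }
    $cmf = ord($data[0]);
    $flg = ord($data[1]);
    return ($cmf & 0x0f) === 8
        && ($cmf & 0x80) === 0
        && (($cmf << 8) | $flg) % 31 === 0;
}

var_dump(looks_like_zlib(gzcompress('hello'))); // bool(true)
var_dump(looks_like_zlib(gzencode('hello')));   // bool(false): gzip header starts with 0x1f
var_dump(looks_like_zlib('hello'));             // bool(false)
```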

How does gzcompress work?

I'm wondering why I need to cut off the last 4 characters after using gzcompress().
Here is my code:
header("Content-Encoding: gzip");
echo "\x1f\x8b\x08\x00\x00\x00\x00\x00";
$index = $smarty->fetch("design/templates/main.htm") ."\n<!-- Compressed by gzip -->";
$this->content_size = strlen($index);
$this->content_crc = crc32($index);
$index = gzcompress($index, 9);
$index = substr($index, 0, strlen($index) - 4); // Why cut off ??
echo $index;
echo pack('V', $this->content_crc) . pack('V', $this->content_size);
When I don't cut off the last 4 chars, the source ends like this:
[...]
<!-- Compressed by gzip -->N
When I cut them off it reads:
[...]
<!-- Compressed by gzip -->
I could see the additional N only in Chrome's code inspector (not in Firefox and not in IE's source). But there seem to be four additional characters at the end of the code.
Can anyone explain to me why I need to cut off 4 chars?
gzcompress implements the ZLIB compressed data format that has the following structure:
0 1
+---+---+
|CMF|FLG| (more-->)
+---+---+
(if FLG.FDICT set)
0 1 2 3
+---+---+---+---+
| DICTID | (more-->)
+---+---+---+---+
+=====================+---+---+---+---+
|...compressed data...| ADLER32 |
+=====================+---+---+---+---+
Here you can see that the last four bytes are an Adler-32 checksum.
In contrast, the GZIP file format is a series of so-called members with the following structure:
+---+---+---+---+---+---+---+---+---+---+
|ID1|ID2|CM |FLG| MTIME |XFL|OS | (more-->)
+---+---+---+---+---+---+---+---+---+---+
(if FLG.FEXTRA set)
+---+---+=================================+
| XLEN |...XLEN bytes of "extra field"...| (more-->)
+---+---+=================================+
(if FLG.FNAME set)
+=========================================+
|...original file name, zero-terminated...| (more-->)
+=========================================+
(if FLG.FCOMMENT set)
+===================================+
|...file comment, zero-terminated...| (more-->)
+===================================+
(if FLG.FHCRC set)
+---+---+
| CRC16 |
+---+---+
+=======================+
|...compressed blocks...| (more-->)
+=======================+
0 1 2 3 4 5 6 7
+---+---+---+---+---+---+---+---+
| CRC32 | ISIZE |
+---+---+---+---+---+---+---+---+
As you can see, GZIP uses a CRC-32 checksum for the integrity check.
So to analyze your code:
echo "\x1f\x8b\x08\x00\x00\x00\x00\x00"; – puts out the following header fields:
0x1f 0x8b – ID1 and ID2, identifiers to identify the data format (these are fixed values)
0x08 – CM, compression method that is used; 8 denotes the use of the DEFLATE data compression format (RFC 1951)
0x00 – FLG, flags
0x00000000 – MTIME, modification time
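the fields XFL (extra flags) and OS (operating system) are not echoed explicitly; the first two bytes of the gzcompress() output, the zlib header (CMF and FLG, 0x78 0xda at level 9), end up in those two positions
echo $index; – puts out, after those two header bytes, the compressed data according to the DEFLATE data compression format, with the trailing Adler-32 checksum cut off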
echo pack('V', $this->content_crc) . pack('V', $this->content_size); – puts out the CRC-32 checksum and the size of the uncompressed input data in binary
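If the goal is a real gzip response, the fields in the diagram can be filled in explicitly instead of reusing the zlib header bytes. This is an illustrative sketch with a made-up function name: gzdeflate() returns raw deflate data with no zlib header or Adler-32 trailer, so nothing needs trimming. gzencode() builds an equivalent structure in one call, with its own choice of header fields.

```php
<?php
// Assemble a complete gzip member (RFC 1952) by hand around raw deflate data.
function gzip_by_hand(string $data): string {
    $header  = "\x1f\x8b"       // ID1, ID2
             . "\x08"           // CM = 8 (deflate)
             . "\x00"           // FLG: no optional fields
             . pack('V', 0)     // MTIME: 0 = not available
             . "\x00"           // XFL
             . "\x03";          // OS = 3 (Unix)
    $body    = gzdeflate($data, 9);       // raw deflate, no zlib header or Adler-32
    $trailer = pack('V', crc32($data))    // CRC-32 of the uncompressed data, little-endian
             . pack('V', strlen($data));  // ISIZE: input length mod 2^32, little-endian
    return $header . $body . $trailer;
}

$html = "<p>Hello</p>\n<!-- Compressed by gzip -->";
var_dump(gzdecode(gzip_by_hand($html)) === $html); // bool(true)
```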
gzcompress produces output as described in RFC 1950; the last 4 bytes you're chopping off are the Adler-32 checksum. This is the "deflate" encoding, so you should just set "Content-Encoding: deflate" and not manipulate anything.
If you want to use gzip, use gzencode() , which uses the gzip format.
