PHP - Detect gzip server response

I'm using curl to fetch a webpage, I need to detect if the response is gzip or not.
This works perfectly fine if Content-Encoding is specified in the response headers, but some servers instead return Transfer-Encoding: chunked and no Content-Encoding header.
Is there any way to detect gzip or get the raw (encoded) server response?
I tried looking at curl_getinfo but the content_encoding isn't specified either.
Thanks.

You can check whether the response starts with the gzip magic number, the two bytes 1f 8b.
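A minimal sketch of that check; the URL is a placeholder, and the fetch deliberately leaves CURLOPT_ENCODING unset so cURL hands back the raw (still-encoded) body:

```php
<?php
// Return true when $raw begins with the gzip magic bytes 0x1f 0x8b.
function looksGzipped(string $raw): bool
{
    return strncmp($raw, "\x1f\x8b", 2) === 0;
}

// Fetch the raw body. CURLOPT_ENCODING is intentionally not set,
// so cURL does not decode the response for us.
function fetchRaw(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $raw = curl_exec($ch);
    curl_close($ch);
    return $raw;
}

// Usage (URL is illustrative):
// $raw  = fetchRaw('http://example.com/');
// $html = looksGzipped($raw) ? gzdecode($raw) : $raw;  // gzdecode: PHP >= 5.4
```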

Is there any way to detect gzip
Yes. You can use cURL's header handling. For example, you can define a callback function that receives each response header line: pass it to curl_setopt() with the CURLOPT_HEADERFUNCTION option. Alternatively, write the headers to a file (opened with fopen()) using the CURLOPT_WRITEHEADER option.
There may be more options you could use; see the curl_setopt() manual for the full list. The header you are looking for is named Content-Encoding.
If you have the output in a file, you could also use PHP's finfo with some of its predefined constants, or mime_content_type() (DEPRECATED!) if finfo is not available to you.
[...] or get the raw (encoded) server response?
Yes. You can specify the Accept-Encoding request header. The value you are looking for is identity.
So you can send:
Accept-Encoding: identity
Have a look at the HTTP/1.1 RFC for details.
To get unencoded/uncompressed output (for example, to write it directly into a file), use CURLOPT_ENCODING, which you can also set with curl_setopt().
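A sketch of the CURLOPT_HEADERFUNCTION approach; the header-line parser, function names, and URL are my own illustration:

```php
<?php
// Parse one raw header line ("Name: value\r\n") into [name, value],
// or null for the status line and blank lines (no colon present).
function parseHeaderLine(string $line): ?array
{
    if (strpos($line, ':') === false) {
        return null;
    }
    [$name, $value] = explode(':', $line, 2);
    return [strtolower(trim($name)), trim($value)];
}

// Wire the parser into cURL (URL is a placeholder):
function fetchWithHeaders(string $url): array
{
    $headers = [];
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADERFUNCTION,
        function ($ch, $line) use (&$headers) {
            if (($pair = parseHeaderLine($line)) !== null) {
                $headers[$pair[0]] = $pair[1];
            }
            return strlen($line);  // cURL expects the consumed byte count
        });
    $body = curl_exec($ch);
    curl_close($ch);
    return [$body, $headers];
}

// $isGzip = (($headers['content-encoding'] ?? '') === 'gzip');
```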

You can either issue a separate HEAD request:
CURLOPT_HEADER => true
CURLOPT_NOBODY => true
Or request the header to be prefixed to your original request:
CURLOPT_HEADER => true
But, if you just want to get the (decoded) HTML, you can use:
CURLOPT_ENCODING => ''
and cURL will automatically negotiate the encoding with the server and decode the response for you.
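A minimal sketch of that last option; the function name and URL are placeholders:

```php
<?php
// With CURLOPT_ENCODING set to '', cURL advertises every encoding it
// supports and transparently decompresses the response body.
function fetchDecoded(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_ENCODING       => '',  // negotiate gzip/deflate, auto-decode
    ]);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}
```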

Related

Can I add a custom header to simplexml_load_file

I'd like to download a remote page only when it differs from a version I have already. There's no "Last-Modified" or "Expires" (the server sends Cache-Control: max-age=0, private, must-revalidate) but there's the ETag: field.
So, I can send If-None-Match: header with last ETag value and on any error (including 304 Not Modified) retry after a delay.
Currently I'm using simplexml_load_file to grab the URL, and I wonder if I can just call it in some way adding the extra header, or do I need to roll out more heavyweight solutions (curl, file_get_contents etc)?
You can use cURL with a custom header, then pass the content returned by the cURL request to simplexml_load_string() to get a SimpleXMLElement object:
curl_setopt($ch, CURLOPT_HTTPHEADER, array('If-None-Match: XXX'));
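Putting it together, a sketch of a conditional fetch; the function name, URL, and ETag value are illustrative:

```php
<?php
// Fetch $url with If-None-Match; return the parsed document, or null
// when the server answers 304 Not Modified (or the fetch/parse fails).
function fetchIfChanged(string $url, string $etag): ?SimpleXMLElement
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => ['If-None-Match: ' . $etag],
    ]);
    $body = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($code === 304 || $body === false) {
        return null;  // not modified (or failed): keep the cached copy
    }
    return simplexml_load_string($body) ?: null;
}
```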

How to check whether a page has been updated without downloading the entire webpage?

How can I check whether a page has been updated without downloading the entire webpage in PHP? Do I need to look at the headers?
One possibility is to check the Last-Modified header. You can download just the headers by issuing a HEAD request; the server responds with the HTTP headers only, and you can inspect the Last-Modified header and/or the Content-Length header to detect changes.
Last-Modified: Mon, 03 Jan 2011 13:02:54 GMT
One thing to note is that an HTTP server is not required to send this header, so this will not work in all cases. The PHP function get_headers() will fetch these for you.
// By default get_headers() uses a GET request to fetch the headers. If you
// want to send a HEAD request instead, you can do so using a stream context:
stream_context_set_default(
    array(
        'http' => array(
            'method' => 'HEAD'
        )
    )
);
$headers = get_headers('http://example.com');
You can add an If-Modified-Since: <datetime> header to your request, and the server should return 304 Not Modified if the document hasn't changed since then. But if the document is generated dynamically (PHP, Perl, etc.), the generator may not bother to check this header and always return the full document.
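A sketch using cURL's built-in time-condition options; the function name and parameters are illustrative:

```php
<?php
// HEAD-request $url with If-Modified-Since; return false when the
// server answers 304 Not Modified (i.e. the page is unchanged).
function hasChangedSince(string $url, int $since): bool
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,   // HEAD request: headers only
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMECONDITION  => CURL_TIMECOND_IFMODSINCE,
        CURLOPT_TIMEVALUE      => $since, // Unix timestamp
    ]);
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $code !== 304;  // 304 Not Modified => unchanged
}
```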

PHP cookie handling

A certain web client that I need to support is sending the Cookie header to my application twice in the HTTP headers; this in turn makes PHP unable to read the correct value for the cookie, thus ignoring the session.
Here is the relevant part of the request I am seeing:
GET / HTTP/1.1
Cache-Control: max-age=0
Accept-Language: en-US
Cookie: PHPSESSID=49af82ddf12740e6a35b15985e93d91a
Connection: Keep-Alive
Cookie: PHPSESSID=49af82ddf12740e6a35b15985e93d91a
[...] Other irrelevant headers
I have two questions:
Is that a PHP bug? or is the behavior undefined when the client sends that same header twice?
Is there a quick workaround to make things work without manually parsing the HTTP headers, so I can read the right value of the cookie (and session) in my application? Or should I parse the HTTP headers manually to set the session to its correct value?
According to the HTTP spec, a double header simply concatenates the values together with a comma, making it:
Cookie: PHPSESSID=49af82ddf12740e6a35b15985e93d91a, PHPSESSID=49af82ddf12740e6a35b15985e93d91a
PHP should be able to parse the cookies, but the behavior of sessions is undefined when there are two session IDs.
I strongly recommend fixing the client. If that's not an option, you'll have to parse the headers manually.
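If manual parsing is unavoidable, here is a sketch of pulling the first session ID out of the raw Cookie header value; the helper name and regex are illustrative and assume session IDs contain no ';' or ',':

```php
<?php
// Extract the first PHPSESSID from a (possibly duplicated/concatenated)
// Cookie header value, or null when none is present.
function firstSessionId(string $cookieHeader): ?string
{
    if (preg_match('/PHPSESSID=([^;,\s]+)/', $cookieHeader, $m)) {
        return $m[1];
    }
    return null;
}

// Usage: force the session to the first ID, before session_start().
// $sid = firstSessionId($_SERVER['HTTP_COOKIE'] ?? '');
// if ($sid !== null) { session_id($sid); }
// session_start();
```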

How to set correct Content-Encoding

I have some serious (or better say: strange) issues with the HTTP-header: Content-Encoding.
I want to gzip my content before sending it to the client's browser. To do this I check whether the client's browser accepts gzip, and if so I use ob_start("ob_gzhandler") and set the Content-Encoding header: $response->addHeader("Content-Encoding", "gzip");
I think my problem is the manual setting of the Content-Encoding header.
If I use $response->addHeader("Content-Encoding", "gzip"); the content is only shown in Opera.
If I use $response->addHeader("Content-Encoding", "'gzip'"); the content is shown correctly in all browsers, but gzip compression checks say that it is not compressed, and the W3C HTML Validation service cannot decode the page:
The error was: Don't know how to decode Content-Encoding ''gzip''
If I leave out that line and only use ob_start("ob_gzhandler"), the page can again only be shown in Opera.
The complete code that produces correct output in the browser is the following:
$accEncoding = $request->getHeader("http_accept_encoding");
if ($accEncoding !== NULL && substr_count($accEncoding, 'gzip')) {
    ob_start("ob_gzhandler");
    $response->addHeader("Content-Encoding", "'gzip'");
    $response->addHeader("Vary", "Accept-Encoding");
} else {
    ob_start();
}
Am I using the ob_gzhandler wrong or am I doing any other mistake here? I am very confused about the correct handling of gzip output.
ob_gzhandler already verifies that the browser supports gzip compression:
Before ob_gzhandler() actually sends compressed data, it determines what type of content encoding the browser will accept ("gzip", "deflate" or none at all) and will return its output accordingly.
It also sets the Content-Encoding header accordingly.
Also note that using zlib.output_compression is preferred over ob_gzhandler().
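A minimal sketch of the correct usage, with no manually set Content-Encoding header at all:

```php
<?php
// Let ob_gzhandler negotiate the encoding and set the header itself.
ob_start('ob_gzhandler');

// Your page output; ob_gzhandler compresses it on flush only when the
// client sent Accept-Encoding: gzip or deflate (placeholder content).
echo '<html>...</html>';
```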

http post headers

When I post some headers in a request and view them on the receiving page, most of them are prefixed with "HTTP_", except for a few like [CONTENT_TYPE] => text/xml and [CONTENT_LENGTH] => 8647.
When I post my own headers (which are required by an external server) they also get prefixed, e.g. my header BATCH_TYPE shows up as HTTP_BATCH_TYPE.
I'm having some problems with the headers: I have to include ones like BATCH_COUNT and VENDOR_ID for an external server, and when I test them internally I see them as HTTP_BATCH_COUNT and HTTP_VENDOR_ID.
Is the "HTTP_" prefix normal or is there any way to remove it?
Thanks,
If you're using a CGI script to test, then it's the web server that's adding the HTTP_ prefix. Don't worry - that prefix is almost certainly not present on the network. You could use http://www.xhaus.com/headers to check.
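For illustration, a sketch of recovering the custom names on the receiving side by stripping the prefix from $_SERVER; the helper name is my own:

```php
<?php
// Collect all HTTP_* entries from a $_SERVER-style array and strip
// the "HTTP_" prefix the server added.
function customHeaders(array $server): array
{
    $out = [];
    foreach ($server as $key => $value) {
        if (strpos($key, 'HTTP_') === 0) {
            $out[substr($key, 5)] = $value;
        }
    }
    return $out;
}

// Usage: customHeaders($_SERVER)['BATCH_TYPE'] etc.
```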
