How to set correct Content-Encoding

I have some serious (or rather: strange) issues with the HTTP header Content-Encoding.
I want to gzip my content before sending it to the client's browser. For this I check whether the client's browser accepts gzip, and if so I use ob_start("ob_gzhandler") and set the Content-Encoding: $response->addHeader("Content-Encoding", "gzip");
I think my problem is the manual setting of the Content-Encoding header.
If I use $response->addHeader("Content-Encoding", "gzip"); the content is only shown in Opera.
If I use $response->addHeader("Content-Encoding", "'gzip'"); the content is shown correctly in all browsers, but gzip compression checks say that it is not compressed, and the W3C HTML Validation service cannot decode the page:
The error was: Don't know how to decode Content-Encoding ''gzip''
If I don't use that line and only use ob_start("ob_gzhandler"), the page can only be shown in Opera.
My complete code, which produces correct output in the browser, is the following:
$accEncoding = $request->getHeader("http_accept_encoding");
if($accEncoding !== NULL && substr_count($accEncoding, 'gzip')) {
ob_start("ob_gzhandler");
$response->addHeader("Content-Encoding", "'gzip'");
$response->addHeader("Vary", "Accept-Encoding");
} else {
ob_start();
}
Am I using the ob_gzhandler wrong or am I doing any other mistake here? I am very confused about the correct handling of gzip output.

ob_gzhandler already verifies that the browser supports gzip compression:
Before ob_gzhandler() actually sends compressed data, it determines what type of content encoding the browser will accept ("gzip", "deflate" or none at all) and will return its output accordingly.
It also sets the Content-Encoding header accordingly.
Also note that using zlib.output_compression is preferred over ob_gzhandler().
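Applied to the question's snippet, this means dropping both the manual Accept-Encoding check and the hand-written Content-Encoding header, and letting ob_gzhandler negotiate on its own. A minimal sketch in plain PHP, assuming no framework response object, the zlib extension being loaded, and zlib.output_compression left off (the manual states the two cannot be used together):

```php
<?php
// Let ob_gzhandler do the negotiation: it reads the client's Accept-Encoding
// header, compresses only if gzip or deflate is accepted, and sets the
// matching Content-Encoding header itself. No manual header() call is needed.
ob_start('ob_gzhandler');

echo '<!DOCTYPE html><p>Hello, compressed world</p>';

// Flushing the buffer hands its contents to ob_gzhandler, which emits them
// compressed or, for clients without gzip/deflate support, unchanged.
ob_end_flush();
```

If zlib.output_compression can be enabled in php.ini instead, the ob_start()/ob_end_flush() pair can be dropped entirely.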

Related

Guzzle response with content-encoding: gzip comes back with incorrect content-length header

I have a Symfony2 application where I am using the Guzzle HTTP client to send a GET request to a server in order to retrieve the contents of a JSON file. The Guzzle response gets transformed into a Symfony2 response to the browser.
The Guzzle response comes back with the following headers:
Content-Encoding: gzip
Content-Length: 2255
Content-Type: application/json
When outputting the data to the UI/browser, I notice that it gets cut off because the Content-Length is incorrect. The size of the file is closer to 4905 bytes, not 2255; 2255 is the exact length of the data up to the cut-off point. I suspect that 2255 is the size of the gzipped data and that it gets uncompressed at some point without the Content-Length being updated. I did verify that I get all of the data back; however, the Content-Length header is honored, which is why the data gets cut off when I forward it to the browser. Interestingly, hitting the URL of the JSON file directly yields the full contents even though the Content-Length is 2255, which means it gets ignored by Chrome when the file is requested directly. The same happens if I use the Postman REST client to make the GET request: the full contents get displayed.
By default, Guzzle has a request option decode_content = true for how the responses should be handled. I set it to false when submitting the request but that didn't seem to resolve the issue.
Before converting the Guzzle response to a Symfony response, I removed the Content-Length header, and that seems to solve the problem. However, I am not sure that's the best approach, since the RFC states that a Content-Length header should be present unless a Transfer-Encoding header is present, which it isn't. https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html
Another alternative, since this is a streamed response, is to get the size of the stream and correct the Content-Length; however, the Guzzle implementation uses strlen() for this, which has the undesirable effect of reading the whole stream.
What possible issues might I run into if I choose to omit the content-length header? And alternatively, is there a way to get the TRUE length of the contents without reading the whole stream and simply update the content-length header with the correct amount?
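The suspicion that 2255 is the compressed size can be checked in isolation. The sketch below uses plain PHP rather than Guzzle or Symfony, assumes the zlib extension (for gzencode() and gzdecode()), and uses made-up repetitive JSON in place of the real file. It shows how far a gzip Content-Length drifts from the decoded body length, and that a forwarded header set has to be adjusted once the body has been decoded:

```php
<?php
// Hypothetical stand-in for the JSON file, deliberately repetitive so that
// gzip shrinks it noticeably.
$json = json_encode(array_fill(0, 100, ['id' => 1, 'name' => 'example']));

// What the upstream server sends: gzip bytes plus a Content-Length that
// counts those compressed bytes.
$compressed = gzencode($json);
$upstreamHeaders = [
    'Content-Encoding' => 'gzip',
    'Content-Length'   => (string) strlen($compressed),
    'Content-Type'     => 'application/json',
];

// What the HTTP client hands back after decoding.
$decoded = gzdecode($compressed);

// Forwarding the upstream headers unchanged would advertise the compressed
// length for the decoded body. Drop Content-Encoding and recompute
// Content-Length from the decoded body (which means reading all of it).
$forwardHeaders = $upstreamHeaders;
unset($forwardHeaders['Content-Encoding']);
$forwardHeaders['Content-Length'] = (string) strlen($decoded);

printf("compressed: %d bytes, decoded: %d bytes\n", strlen($compressed), strlen($decoded));
```

Recomputing the length still requires the full decoded body, which is exactly the cost the question wants to avoid; dropping the header instead leaves the length for the server to determine.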

How to fix incorrect mime-type (atom-feed) showed in chrome-devtools?

I'm not sure if the problem occurs because of wrong PHP-code or maybe a wrong configuration of nginx.
I'd like to generate a feed in Atom format. The XML of the feed is valid. I set the content type via
header("Content-type: application/atom+xml");
before I output the XML. Nonetheless, I get different information from the Chromium developer tools.
The table view in the Network tab shows text/plain as the type.
However, the response header itself seems okay, as it states application/atom+xml.
This mime-type is correctly set inside nginx-configuration:
types {
[...]
application/atom+xml atom;
[...]
}
What could be missing/wrong that chromium does not recognize the correct mime-type of my feed and states it as text/plain?

The problem seems to be Chrome not recognizing the application/*+xml content type. It looks like you need to use plain old application/xml to get XSLT processing and correct Content-Type display to work in dev tools.
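Following that suggestion, the feed script would send the generic XML type instead of application/atom+xml. A minimal sketch; the feed elements below are placeholder values, not taken from the question:

```php
<?php
// Placeholder Atom document; only the Content-Type line below is the point.
$feed = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <id>urn:example:feed</id>
  <updated>2024-01-01T00:00:00Z</updated>
</feed>
XML;

// Send the generic XML type, which the answer above reports Chrome's dev
// tools display correctly. header() must be called before any output.
header('Content-Type: application/xml; charset=utf-8');
echo $feed;
```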

PHP - Detect gzip server response

I'm using curl to fetch a webpage, I need to detect if the response is gzip or not.
This works perfectly fine if Content-Encoding is specified in the response headers, but some servers instead return "Transfer-Encoding": "Chunked" and no Content-Encoding header.
Is there any way to detect gzip or get the raw (encoded) server response?
I tried looking at curl_getinfo but the content_encoding isn't specified either.
Thanks.

You can check if response starts with gzip magic numbers, specifically 1f 8b.
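A minimal sketch of that check; the function name is hypothetical. It only works on the raw body, so curl must not be decoding the response for you (CURLOPT_ENCODING left unset). Chunked transfer encoding is not a problem, because curl always removes it before handing over the body. A match means the body is very likely, though not provably, gzip data:

```php
<?php
// gzip streams always begin with the two ID bytes 0x1f 0x8b (RFC 1952).
function looksGzipped(string $body): bool
{
    // strncmp() on the first two bytes; shorter strings simply do not match.
    return strncmp($body, "\x1f\x8b", 2) === 0;
}
```

Usage: `looksGzipped($body)` on the string returned by curl_exec() with CURLOPT_RETURNTRANSFER set.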

Is there any way to detect gzip
Yes. You can use cURL's header functions. For example, you can define a function that handles the response headers and register it with curl_setopt() using the CURLOPT_HEADERFUNCTION option. Or write the headers to a file (which you have created with fopen()) with the CURLOPT_WRITEHEADER option.
There may be more options you could use; look at the possibilities in the curl_setopt() manual. The header you are looking for is named Content-Encoding.
If you have the output in a file, you could also use PHP's finfo with some of its predefined constants, or mime_content_type(), which was once marked deprecated but is now provided by the same fileinfo extension.
[...] or get the raw (encoded) server response?
Yes. You can specify the Accept-Encoding header. The value you are looking for is identity.
So you can send:
Accept-Encoding: identity
Have a look at the HTTP/1.1 RFC.
To get unencoded/uncompressed output (for example, to write it directly into a file), use CURLOPT_ENCODING, which you can also set with curl_setopt().
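A sketch combining the header callback with the identity request; both function names are hypothetical. A well-behaved server should then answer uncompressed, and the collected headers show what it actually sent:

```php
<?php
// Returns a CURLOPT_HEADERFUNCTION callback that stores each response header
// in $headers, keyed by its lower-cased name.
function makeHeaderCollector(array &$headers): callable
{
    return function ($ch, string $line) use (&$headers): int {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
        // curl aborts the transfer unless the callback returns the byte count.
        return strlen($line);
    };
}

// Fetches $url while asking the server not to compress the body.
function fetchIdentity(string $url, array &$headers): string|false
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => ['Accept-Encoding: identity'],
        CURLOPT_HEADERFUNCTION => makeHeaderCollector($headers),
    ]);
    return curl_exec($ch);
}
```

After `$h = []; $body = fetchIdentity($url, $h);`, `$h['content-encoding'] ?? null` tells you whether the server compressed anyway.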

You can either issue a separate HEAD request:
CURLOPT_HEADER => true
CURLOPT_NOBODY => true
Or request the header to be prefixed to your original request:
CURLOPT_HEADER => true
But, if you just want to get the (decoded) HTML, you can use:
CURLOPT_ENCODING => ''
And cURL will automatically negotiate with the server and decode it for you.
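A sketch of the "prefix the headers" variant together with automatic decoding; the function names are hypothetical. curl_getinfo() with CURLINFO_HEADER_SIZE reports how many bytes of the output belong to the header block, which makes splitting straightforward. The header block should still show the server's original Content-Encoding even though the body has already been decoded:

```php
<?php
// Splits curl output fetched with CURLOPT_HEADER into [headers, body].
function splitResponse(string $raw, int $headerSize): array
{
    return [substr($raw, 0, $headerSize), substr($raw, $headerSize)];
}

function fetchDecoded(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER         => true, // prefix the response headers to the output
        CURLOPT_ENCODING       => '',   // offer every supported encoding and decode automatically
    ]);
    $raw = curl_exec($ch);
    if ($raw === false) {
        throw new RuntimeException(curl_error($ch));
    }
    return splitResponse($raw, curl_getinfo($ch, CURLINFO_HEADER_SIZE));
}
```

For the separate HEAD request, adding CURLOPT_NOBODY => true to the same options array should return only the header block.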

Send HTTP headers before or after a cookie header?

I was wondering if there are any problems or differences between sending normal headers before or after sending cookie headers. Do some browsers prefer a certain order of headers? If the cookie header is too large, would subsequent headers never be parsed?
setcookie("TestCookie", $value);
header("Content-type: text/javascript");
or
header('Location: http://www.example.com/');
setcookie("TestCookie", $value);
or
setcookie("SuperLargeCookie", $massive_value);
setcookie("TinyCookie", $small_value);
header("Status: 404 Not Found");

There is no difference. The HTTP protocol does not specify that headers have to be in a certain order. Browsers do not differentiate based on the order of headers either.
The total length of HTTP headers does have a limit. This limit is imposed by the server and not the browser; it is typically between 8K and 16K, but it is configurable.

It really doesn't matter as long as the other HTTP headers have not been sent. setcookie() actually writes a header itself:
Set-Cookie: SuperLargeCookie=whatever; Max-Age=3600; Version=1
similar to a header() call:
Location: http://www.example.com/redirect

HTTP messages span packets all the time, so you'd be hard-pressed to overfill one unless you're jamming tons of kilobytes in there. If you need to do that, consider a better design. Browsers don't care about the order of headers, since different servers (and applications) append headers all the time. Cookies are implemented as HTTP headers, so setcookie() should produce something like this in the HTTP response:
Set-Cookie: TestCookie=value\r\n
Content-type: text/javascript\r\n
\r\n
I'm not sure what the Status header is supposed to do in your example, but I don't think it's right, since the web server will set a 200 OK response code if the code executes correctly. The header() function's manual page has this example:
<?php
header("HTTP/1.0 404 Not Found");
?>
With the PHP header function, just make sure you're not writing any text out before issuing it. Otherwise, you could mess everything up.
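A minimal sketch of that rule; the function name is hypothetical. setcookie(), header() and http_response_code() (PHP 5.4+, an alternative to the header("HTTP/1.0 404 Not Found") line) all just queue response headers, so their order among themselves does not matter, while headers_sent() reports whether output has already made it too late:

```php
<?php
function sendNotFoundWithCookie(string $value): bool
{
    if (headers_sent()) {
        return false; // body output already started; headers can no longer change
    }
    // These three calls can be reordered freely; each queues its own header.
    setcookie('TestCookie', $value);          // queues a Set-Cookie header
    http_response_code(404);                  // sets the status line
    header('Content-Type: text/javascript');
    return true;
}

$sent = sendNotFoundWithCookie('abc');
```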

Is header('Content-Type:text/plain'); necessary at all?

I didn't see any difference with or without this head information yet.

Define "necessary".
It is necessary if you want the browser to know what the type of the file is. PHP automatically sets the Content-Type header to text/html if you don't override it so your browser is treating it as an HTML file that doesn't contain any HTML. If your output contained any HTML you'd see very different outcomes. If you were to send:
<b><i>test</i></b>
Content-Type: text/html; charset=UTF-8 would display the text in bold and italics in the browser:
test
whereas Content-Type: text/plain; charset=UTF-8 would display it in the browser like this:
<b><i>test</i></b>
TL;DR version: If you really are only outputting plain text with no special characters like < or >, then it doesn't really matter, but it IS wrong.

PHP uses Content-Type text/html as default, which is pretty similar to text/plain and this explains why you don't see any differences.
text/plain content-type is necessary if you want to output text as is (including < and > symbols).
Examples:
header("Content-Type: text/plain");
echo "<b>hello world</b>";
// Displays in the browser: <b>hello world</b>
header("Content-Type: text/html");
echo "<b>hello world</b>";
// Displays in the browser with bold font: hello world

It is very important that you tell the browser what type of data you are sending it. The difference should be obvious. Try viewing the output of the following PHP file in your browser;
<?php
header('Content-Type:text/html; charset=UTF-8');
?>
<p>Hello</p>
You will see:
Hello
(Note that you will get the same result if you leave out the header line in this case; text/html is PHP's default.)
Change it to text/plain
<?php
header('Content-Type:text/plain; charset=UTF-8');
?>
<p>Hello</p>
You will see:
<p>Hello</p>
Why does this matter? If you have something like the following in a php script that, for example, is used by an ajax request:
<?php
header('Content-Type:text/html; charset=UTF-8');
print "Your name is " . $_GET['name'];
Someone can put a link to a URL like http://example.com/test.php?name=%3Cscript%20src=%22http://example.com/eviljs%22%3E%3C/script%3E on their site, and if a user clicks it, they have exposed all their information on your site to whoever put up the link. If you serve the file as text/plain, you are safe.
Note that this is a silly example, it's more likely that the bad script tag would be added by the attacker to a field in the database or by using a form submission.
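When the response has to stay text/html, the usual alternative to switching the type is escaping the input before echoing it. A sketch of the same script with htmlspecialchars(); the function name is hypothetical:

```php
<?php
// Escape user input so that a value like <script ...> is displayed as text
// instead of being parsed as markup.
function greeting(string $name): string
{
    return 'Your name is ' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
}

header('Content-Type: text/html; charset=UTF-8');
echo greeting($_GET['name'] ?? '');
```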

Setting the Content-Type header will affect how a web browser treats your content. When most mainstream web browsers encounter a Content-Type of text/plain, they'll render the raw text source in the browser window (as opposed to the source rendered as HTML). It's the difference between seeing
<b>foo</b>
or
foo
Additionally, when using the XMLHttpRequest object, your Content-Type header will affect how the browser parses the returned results. Prior to the takeover of AJAX frameworks like jQuery and Prototype, a common problem with AJAX responses was a Content-Type set to text/html instead of text/xml. Similar problems would likely occur if the Content-Type was text/plain.

Say you want to answer a request with a 204: No Content HTTP status.
Firefox will complain with "no element found" in the console of the browser.
This is a bug in Firefox that has been reported, but never fixed, for several years.
By sending a "Content-type: text/plain" header, you can prevent this error in Firefox.
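A minimal sketch of that workaround; the function name is hypothetical. A 204 response must not carry a body, so the function stops the script right after queuing the headers:

```php
<?php
// Reply with 204 No Content plus an explicit text/plain type, which the
// answer above reports keeps Firefox from logging "no element found".
function respondNoContent(): void
{
    http_response_code(204);
    header('Content-Type: text/plain');
    exit; // send nothing after the headers
}
```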

No, it's not like that. Here is an example to support my answer: the difference is clearly visible when you use HTTP compression, which compresses the data while it travels from the server to the client. The encoding of this data (the Content-Encoding header) then becomes gzip, which tells the browser that it has received zipped data and has to unzip it. This is an example where the type really matters to the browser.
