Caching HTTP responses when they are dynamically created by PHP - php

I think my question seems pretty casual but bear with me as it gets interesting (at least for me :)).
Consider a PHP page whose purpose is to read a requested file from the filesystem and echo it as the response. Now the question is: how do I enable caching for this page? The point is that the files can be pretty huge, and enabling caching would save the client from downloading the same content again and again.
The ideal strategy would be to use the "If-None-Match" request header and the "ETag" response header to implement a cache validation system. Even though I know this much, I'm not sure whether this is possible or what I should return as the response in order to implement this technique!

Serving huge or many auxiliary files is not exactly what PHP is made for.
Instead, look at X-Accel-Redirect for nginx, X-Sendfile for lighttpd, or mod_xsendfile for Apache.
The initial request is handled by PHP, but once the download file has been determined, PHP sets a few headers to indicate that the server should handle the file transfer itself, after which the PHP process is freed up to serve something else.
You can then use the web server to configure the caching for you.
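As a minimal sketch (assuming Apache with mod_xsendfile enabled; the file path and download name here are hypothetical), the PHP side only needs to emit the right headers:

```php
<?php
// Sketch assuming Apache + mod_xsendfile; path and filename are hypothetical.
// Returns the headers that delegate the actual file transfer to the server.
function xsendfile_headers(string $path, string $downloadName): array {
    return [
        'Content-Type: application/octet-stream',
        'Content-Disposition: attachment; filename="' . $downloadName . '"',
        // mod_xsendfile intercepts this header and streams the file itself:
        'X-Sendfile: ' . $path,
    ];
}

foreach (xsendfile_headers('/var/files/archive.zip', 'archive.zip') as $h) {
    header($h);
}
// With nginx you would instead send an X-Accel-Redirect header pointing at an
// internal location, e.g. header('X-Accel-Redirect: /protected/archive.zip');
```

The PHP process exits immediately after emitting the headers; the server does the heavy lifting.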
Static generated content
If your content is generated from PHP and particularly expensive to create, you could write the output to a local file and apply the above method again.
If you can't write to a local file or don't want to, you can use HTTP response headers to control caching:
Expires: <absolute date in the future>
Cache-Control: public, max-age=<relative time in seconds since request>
This will cause clients to cache the page contents until they expire or until a user forces a page reload (e.g. by pressing F5).
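A minimal sketch of those two headers in PHP (the one-hour max-age is just an example value):

```php
<?php
// Sketch: build the Expires / Cache-Control headers for client-side caching.
// $maxAge is the cache lifetime in seconds; $now is the request time.
function cache_headers(int $maxAge, int $now): array {
    return [
        'Cache-Control: public, max-age=' . $maxAge,
        'Expires: ' . gmdate('D, d M Y H:i:s', $now + $maxAge) . ' GMT',
    ];
}

// Cache for one hour:
foreach (cache_headers(3600, time()) as $h) {
    header($h);
}
```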
Dynamic generated content
For dynamic content you want the browser to ping you every time, but only send the page contents if there's something new. You can accomplish this by setting a few other response headers:
ETag: <hash of the contents>
Last-Modified: <absolute date of last contents change>
When the browser pings your script again, it will add the following request headers respectively:
If-None-Match: <hash of the contents that you sent last time>
If-Modified-Since: <absolute date of last contents change>
The ETag mostly reduces network traffic rather than server load: in many cases, to know the content's hash, you first have to generate the content and calculate it.
The Last-Modified is the easiest to apply if you have local file caches (files have a modification date). A simple condition makes it work:
if (!file_exists('cache.txt') ||
    !isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) ||
    filemtime('cache.txt') > strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
    // update cache file and send back contents as usual (+ cache headers)
} else {
    header('HTTP/1.1 304 Not Modified');
}
If you can't do file caches, you can still use ETag to determine whether the contents have changed meanwhile.
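A sketch of that ETag revalidation (generate_page() below is a hypothetical stand-in for your content build; note the content still has to be generated to compute its hash, which is why this saves bandwidth rather than server work):

```php
<?php
// Sketch: ETag-based revalidation. generate_page() is a hypothetical
// stand-in for your expensive content build.
function generate_page(): string {
    return '<p>expensive content</p>';
}

function etag_for(string $content): string {
    return '"' . md5($content) . '"';
}

$content = generate_page();
$etag = etag_for($content);

header('ETag: ' . $etag);
if (isset($_SERVER['HTTP_IF_NONE_MATCH']) && $_SERVER['HTTP_IF_NONE_MATCH'] === $etag) {
    header('HTTP/1.1 304 Not Modified');  // nothing changed: send no body
    exit;
}
echo $content;
```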

Related

How to prevent Apache from sending a Connection: close header on 304 answers

I have the following setup:
Some files are dynamically generated depending on a few session parameters. Since they do not have great diversity, I allow caching in proxies/browsers. The files get an ETag on their way out, and at first glance the reaction of the whole web application seems correct: files are served in correct dependence on the session situation, and traffic is saved.
And then this erroneous behavior:
But on closer inspection, I found that in the case of a 304 for those dynamically generated files, Apache wrongly sends a "Connection: close" header instead of the normally sent "Connection: Keep-Alive". What it should do is simply not manipulate anything concerning "Connection".
I cannot pinpoint the cause of this behavior: nowhere in the Apache config files is anything written except one single line in one single file where Apache is instructed to send a keepalive - which it does, as long as it does not send a 304 response for a dynamically generated file. Nowhere in PHP do I instruct it to send anything other than keepalives (and the latter only to try to counter the Connection: close).
Apache does not do this when it serves "normal" (non-dynamic) files with 304 answers. So I assume that maybe the PHP kernel is the one that interferes here without permission or being asked. But then, an added "Header set Connection 'Keep-Alive'" in the Apache config, which I also added to counter the closing of the connection, does not work either. Normally, when you put such a header-set rule (not of the "early" type) in the Apache config, the rule takes effect AFTER any subordinate work on the requested document is finished (thus AFTER the PHP output is finalized). But in my case nothing happens - in the case of a 304 response, that is. In all other cases, everything works normally and correctly.
Since some other files also go over the line at a page request, I would appreciate ridding Apache of those connection closures.
Is there anybody who has an idea what to do with this behavior?
P.S.: One day (and a good sleep) later, things are clearing up:
The culprit in this case was a short-sightedly (on my behalf) copied example snippet, which had "HTTP/1.>>>0<<< 304" (the zero!) in it.
This protocol version number is (correctly) post-processed by Apache (after everything else, including any Apache module's work, has been finalized): Apache decides not to send "Connection: Keep-Alive" over the wire, since that feature didn't exist in HTTP/1.0.
The hard part was realizing that everything inside PHP and the Apache modules worked correctly, so something in their outer environment had to be wrong - and then shifting the view to anything in the code that could possibly influence that outer environment (e.g. the protocol version).
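The fix can be sketched in PHP: build the status line from the protocol version the client actually used instead of hardcoding HTTP/1.0:

```php
<?php
// Sketch: build the 304 status line from the request's protocol version
// instead of hardcoding 'HTTP/1.0', so keep-alive survives on HTTP/1.1.
function not_modified_line(?string $serverProtocol): string {
    return ($serverProtocol ?? 'HTTP/1.1') . ' 304 Not Modified';
}

header(not_modified_line($_SERVER['SERVER_PROTOCOL'] ?? null));
// ...followed by exit; in a real script, so no body is sent after the 304.
```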

Download counter function inaccurate

We're using a normal PHP download script (with headers etc) to serve files to users.
The issue, however, is that with some browsers and large downloads the download script is requested multiple times. The NGINX logs show the requests with a 206 status code (partial content, i.e. range requests), which is strange because we don't serve any streamable content.
Regardless, this means the download script is requested multiple times, and thus the MySQL query that increments the download counter for the file runs multiple times per download.
We tried using sessions, but since the download is served from an external server + domain, we have no way to clear said sessions after they're set.
We're using Laravel with NGINX + MySQL, any help would be appreciated. Thanks!
Looking at the spec and the headers of a request that would ultimately result in a 206 response, there was one header that stood out and looks perfect for this.
The header in question is the Content-Range header which could look like the following:
Content-Range: bytes 21010-47021/47022
What this is saying is that the client wants bytes 21010-47021 out of 47022 bytes. All you need to worry about here is the first number and whether it is 0. If the header is absent entirely, or it is set and the first number is 0, you can assume the download is just beginning and you should increment the counter.
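A sketch of that logic in PHP. Note that browsers send the requested byte range in the `Range` request header, and the server answers with the `Content-Range` header shown above; the counter update below is a hypothetical stand-in for the MySQL query:

```php
<?php
// Sketch: count a download only on the initial request, not on the
// follow-up range requests that produce the extra 206 log entries.
function is_initial_request(?string $rangeHeader): bool {
    if ($rangeHeader === null) {
        return true;  // no Range header: the whole file was requested
    }
    // A Range header looks like "bytes=21010-47021"; only a range that
    // starts at byte 0 marks the beginning of a download.
    return (bool) preg_match('/^bytes=0-/', $rangeHeader);
}

if (is_initial_request($_SERVER['HTTP_RANGE'] ?? null)) {
    // increment_download_counter($fileId);  // hypothetical MySQL update
}
```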

How can I stop X-Sendfile from serving the full video file when IE9 makes the request?

I ran into an issue where regardless of the preload attribute setting, when IE9 makes a request for a video, and the video is served by x-sendfile, the request is listed as pending and keeps the connection open.
Consequently, if you have 10 videos trying to load, IE9 will quickly eat up all of its available connections and the browser will not be able to make further requests.
When telling IE9 to request the same video from Apache without X-Sendfile, Apache serves a small portion of the file with a 200 response. The browser then makes another request for a range of the file when the play button is pressed.
It looks like X-Sendfile is causing Apache to serve the entire file initially, instead of serving just a part of it.
How can I make X-Sendfile requests via Apache function the same as a regular request to Apache?
Setting the "Accept-Ranges" header, e.g. header("Accept-Ranges: bytes");, tells IE9 to attempt to stream the file by default instead of serving it in one chunk.
It's recommended to check that the HTTP request is version 1.1 before setting it, though, since 1.0 doesn't support the header.
if (isset($_SERVER['SERVER_PROTOCOL']) && $_SERVER['SERVER_PROTOCOL'] === 'HTTP/1.1') {
header("Accept-Ranges: bytes");
}
I wasn't able to find any documentation on this anywhere, so I'm posting my solution here.

Varnish doesn't cache without expire header

On my server I have Varnish (caching) running on port 80 with Apache on 8080.
Varnish caches very well when I set the headers like below:
$this->getResponse()->setHeader('Expires', '', true);
$this->getResponse()->setHeader('Cache-Control', 'public', true);
$this->getResponse()->setHeader('Cache-Control', 'max-age=2592000');
$this->getResponse()->setHeader('Pragma', '', true);
But this means people cache my website without ever retrieving a new version when it's available.
When I remove the headers, people retrieve a new version on every page reload (so Varnish never caches).
I cannot figure out what goes wrong here.
My ideal situation is people don't cache the html on the client side but leave that up to Varnish.
"My ideal situation is people don't cache the html on the client side but leave that up to Varnish."
What you want is for Varnish to cache the resource and serve it to clients, and only generate a new version if something changed. The easiest way to do this is to have Varnish cache it for a long time, and invalidate the entry in Varnish (with a PURGE command) when that something changes.
By default, Varnish bases its cache rules on the headers the back-end supplies. So, if your PHP code generates the headers you described, the default Varnish VCL will adjust its caching strategy accordingly. However, it can only do this in a generalized, safe way (e.g. if you use a cookie, it will never cache). You know how your back-end works, so you should change Varnish's cache behavior not by sending different headers from the back-end, but by writing a Varnish .vcl file. Tell Varnish to cache the resource for a long time even though the Cache-Control max-age or Expires headers are missing (set the time-to-live, ttl, in your .vcl file). Varnish will then serve the generated entry until the TTL has passed or you purge the entry.
Once you've got this working, there's a more advanced option: cache the resource on the client, but have the client 'revalidate' it every time it wants to use it. A browser does this with an HTTP GET plus an If-Modified-Since header (your response should include a Last-Modified header to provoke this behavior) or an If-None-Match header (your response should include an ETag header to provoke this behavior). This saves bandwidth because Varnish can respond with a 304 Not Modified response, without sending the whole resource again.
Simplest approach is just to turn down the max-age to something more reasonable. Currently, you have it set to 30 days. Try setting it to 15 minutes:
$this->getResponse()->setHeader('Cache-Control', 'max-age=900');
Web caching is a somewhat complicated topic, exacerbated by some very different client interpretations. But in general this will lighten the load on your web server while ensuring that new content is available in a reasonable timeframe.
Set your standard HTTP headers for the client cache to whatever you want. Set a custom header that only Varnish will see, such as X-Varnish-TTL Then in your VCL, incorporate the following code in your vcl_fetch sub:
if (beresp.http.X-Varnish-TTL) {
    C{
        char *ttl;
        /* first char in third param is length of header plus colon in octal */
        ttl = VRT_GetHdr(sp, HDR_BERESP, "\016X-Varnish-TTL:");
        VRT_l_beresp_ttl(sp, atoi(ttl));
    }C
    unset beresp.http.X-Varnish-TTL; // Remove so client never sees this
}

flash xml won't cache

I have a flash app that requests xml generated by a php script. The data doesn't change much, and I would like flash to cache the xml instead of loading it every time. I've been checking my access logs, and every single time i reload a page with the flash app on it, the php file is accessed and the xml downloaded.
I've read that Flash doesn't control what is cached, as it just requests things through the browser - yet everything else that Flash downloads (e.g. the mp3 files that are supplied by the XML) does get cached. So I'm not really sure what that means.
I've googled the heck out of this, but everything I find is telling me how to keep flash from caching stuff.
Here's the code I used (AS3):
xmlLoader.load(new URLRequest("info.php"));
It's not a huge deal but sometimes it takes 2-3 seconds to load if my host decides to respond slowly.
edit: I got the headers:
HEAD /beatinfo.php HTTP/1.1[CRLF]
Host: spoonhands.com[CRLF]
Connection: close[CRLF]
User-Agent: Web-sniffer/1.0.37 (+http://web-sniffer.net/)[CRLF]
Accept-Encoding: gzip[CRLF]
Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7[CRLF]
Cache-Control: no-cache[CRLF]
Accept-Language: de,en;q=0.7,en-us;q=0.3[CRLF]
Referer: http://web-sniffer.net/[CRLF]
Try looking at the header function (http://php.net/manual/en/function.header.php).
That's the one I always use to send HTTP headers so that a page will not be cached. I think you can send headers so that it will be cached instead.
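For example, a sketch of what info.php could send before echoing the XML (the one-day max-age is an arbitrary example value):

```php
<?php
// Sketch: headers that allow the browser to cache the generated XML.
// The 86400-second (one day) lifetime is an arbitrary example value.
function xml_cache_headers(int $maxAge): array {
    return [
        'Content-Type: text/xml; charset=utf-8',
        'Cache-Control: public, max-age=' . $maxAge,
    ];
}

foreach (xml_cache_headers(86400) as $h) {
    header($h);
}
// ...then echo the generated XML as before.
```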
