readfile/fpassthru web caching - php

I have noticed that files delivered by PHP through readfile or fpassthru techniques are never cached by the browser.
How can I "encourage" browsers to cache items delivered via these methods?

Whether your content is cached has nothing to do with readfile() and its relatives; more likely, the default caching headers issued by the server (the ones that would activate caching for HTML pages and image resources) simply don't apply when you use PHP to pass files through.
You will have to send the appropriate headers along with your content, telling the browser that caching for this resource is all right.
See for example
Caching tutorial for Web Authors and Webmasters
How to use HTTP cache headers with PHP
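As a rough sketch of what those headers might look like around readfile() - the path, MIME type, and lifetime below are placeholders, not anything from the linked articles:

<?php
// sketch: pass a file through PHP while still letting the browser cache it
$path  = '/var/files/report.pdf';   // hypothetical file location
$mtime = filemtime($path);

header('Cache-Control: public, max-age=86400');   // allow caching for one day
header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT');

// answer conditional requests so repeat visits get a 304 instead of the full body
if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) &&
    strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $mtime) {
    header('HTTP/1.1 304 Not Modified');
    exit;
}

header('Content-Type: application/pdf');
header('Content-Length: ' . filesize($path));
readfile($path);

Note that if a PHP session is active, PHP emits its own anti-caching headers by default; calling session_cache_limiter('public') before session_start(), or simply overriding the headers as above, works around that.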

I ended up finding this page and using it as a starting point for my own implementation. The example code on this page, along with some of the reading Pekka pointed to, was a great springboard for me.

Does forcing no-cache on html pages also force no-cache on images?

I have a very simple question, which I have been unable to find a clear answer to. I have a web page which is generated dynamically (so I absolutely do not want it cached), which loads in several thousand images. Since there are so many images, and they never change, I very definitely want the user's browser to cache these images.
I'm applying headers to the HTML page to prevent caching, by following the advice in this solution: How to control web page caching, across all browsers?
My question is: Will this cause the user's browser to also not cache any images this page contains, or will it cache them? Thank you.
TL;DR the answer is not clear because it is complicated.
There is an ongoing struggle between a drive to do the "right" thing (i.e., follow the standards... which themselves have changed) and a drive to "improve" the standards to achieve better performance or a smoother navigation experience for users. So from the application point of view you need to properly use headers such as ETag, If-Modified-Since and Expires, together with cache-hinting pragmas, but the browser - or something in the middle, such as a proxy - might still decide to override what would seem to be the "clear thing to do".
On the latest Firefox, talking directly to Apache 2.4 on a virtual Ubuntu machine, I have experimented with a page (test.html) that references an image (test.jpg).
When the page is cached, on the server side I see a single request for the HTML and nothing for the image. What is probably happening is that the "rendering" part of Firefox does request the image (it has to!), but the request is satisfied entirely from the local cache. This makes sense: if the page has not changed, its referenced content presumably hasn't changed either.
When the page is not cached, I see two requests, one for the page and one for the image, to which the server responds with a 304, but that is because I also send the image's Last-Modified header. This also makes sense - if the page has changed, the images might have changed too, so the browser has to know whether this is the case, and can only do so by asking the server (unless the Expires header is used to "assure" the client that the image will not change).
I have not yet tried the case where the page request itself is answered with a 304. I expect it to generate a single request (no image request to the server), for the same reasons.
What you might want to consider is that, with your approach, you will not cache the HTML page but might still trigger a thousand image requests (each of which will yield a 304, but still). Performance in this scenario depends on whether the requests are sent independently or back-to-back over a persistent connection (the HTTP/1.1 Keep-Alive extension, which has to be enabled and advertised server side).
You should then use the Expires header on the images to tell the client that those resources will not go stale anytime soon.
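A hedged PHP sketch of that, in case the images themselves are also served through a script (the one-year lifetime and the path are arbitrary illustrations):

<?php
// sketch: declare the image fresh for a year, so the browser
// will not even send a revalidation request for it
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 31536000) . ' GMT');
header('Cache-Control: public, max-age=31536000');
header('Content-Type: image/jpeg');
readfile('/var/images/photo.jpg');   // hypothetical path

Unlike the Last-Modified/304 dance, this avoids the thousand requests entirely for as long as the cached copies stay fresh.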
You might perhaps also want to explore a different approach:
the HTML is cached
images are cached too
the HTML also references a (cached?) Javascript
variable content is loaded by the Javascript via AJAX. That request can be made cache-unfriendly by including a timestamp, without involving the server at all.
This way you can configure the server for caching everything everywhere, except where you make sure it can't via a single crafted request.

Browser cache downloadable file

Say I store PDF files in the database (not important). The users of the application visit a page periodically to grab a stored PDF and print it - for adhesive labels, by the way.
I am annoyed at the thought of their downloads directory filling up with duplicates of the same document over time since they will download the PDF every time they need to print the labels.
Is there a way to instruct the browser to cache this file? Or any method of relative linking to the user's file system, possibly? All users will be on Chrome/Firefox using Windows 7 Pro.
ETags will help you do this. If the file hasn't been updated since the client last downloaded it, the server will send a 304 "Not Modified" response instead of the file.
If your files are dynamically generated, you will need to implement ETag generation manually in PHP rather than relying on the web server.
http://www.php.net/manual/en/function.http-cache-etag.php
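In plain PHP (without the pecl_http extension shown on that manual page), a hand-rolled version might look roughly like this; the generator function and hashing scheme are assumptions for illustration:

<?php
// sketch: manual ETag handling for a dynamically generated PDF
$pdf  = generate_label_pdf();        // hypothetical function producing the PDF bytes
$etag = '"' . md5($pdf) . '"';       // ETags are quoted strings

header('ETag: ' . $etag);
header('Cache-Control: private, max-age=0, must-revalidate');

// if the client already has this exact version, skip the body
if (isset($_SERVER['HTTP_IF_NONE_MATCH']) &&
    trim($_SERVER['HTTP_IF_NONE_MATCH']) === $etag) {
    header('HTTP/1.1 304 Not Modified');
    exit;
}

header('Content-Type: application/pdf');
header('Content-Disposition: inline; filename=labels.pdf');
echo $pdf;

Hashing the full output means you still generate it on every request; if the database row carries a last-modified column, deriving the ETag from that instead avoids the generation cost.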
I've found a useful solution to my problem.
From the comments on my question, we concluded it would work best to utilize the browser's built-in PDF/DOC renderer and download anything else that isn't recognized.
I read this standard: https://www.rfc-editor.org/rfc/rfc6266
This is the solution (header):
Content-Disposition: inline; filename=something.pdf
Instead of attachment, I've used "inline" in order to utilize the browser when necessary.
Most browsers will do this automatically based on the URL. If the URL for a particular PDF blob is constant, the browser will not re-download it unless the server responds that it has changed (by way of HTTP fields).
You should therefore design your site to have "permalinks" for each resource. This could be achieved by having a resource-ID of some sort in the URL string.
As others have said in comments, a server cannot guarantee that a client does ANYTHING in particular; all you can offer are suggestions that you hope most browsers will treat similarly.

What kind of caching mechanism is used over at Wikipedia?

If you open up Mozilla Firefox and turn on Firebug to inspect incoming and outgoing network traffic, you can see that, when you look at Wikipedia articles, the amount of cached content is very large.
Unless the article in question has many pictures, most of the content comes from the cache.
I'd like to know whether that is done by the browser itself or by some underlying PHP caching mechanism. (Is that what they call memcache? APC?) It works very well, so I'd like to know how they do it.
Memcached, APC, etc. are server-side data stores. You basically use them as key-value stores so you don't have to hit your database all the time.
However, what you're actually seeing is a site being loaded with a primed cache: the web server tells the browser that commonly used resources haven't changed since the last visit. This effect is achieved by setting far-future expiry headers so that the browser doesn't keep requesting the resources. A lot of sites use this technique, including SO.
Here's a great source to read up on, if you want more info : http://developer.yahoo.com/performance/rules.html
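For static assets, such far-future headers are usually configured in the web server rather than in application code; a hedged Apache example using mod_expires (the lifetimes are illustrative):

# .htaccess sketch: far-future expiry for static assets (requires mod_expires)
ExpiresActive On
ExpiresByType image/png "access plus 1 year"
ExpiresByType image/jpeg "access plus 1 year"
ExpiresByType text/css "access plus 1 month"
ExpiresByType application/javascript "access plus 1 month"

The usual companion trick, which the Yahoo rules also describe, is to change the asset's URL (a version suffix, for example) whenever its content actually changes.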

Non-dynamic custom HTTP headers

According to this Mozilla article on Ogg media, media works more seamlessly in the browser with an X-Content-Duration header, giving the length in seconds of the piece.
Assuming that I have that length stored somewhere (certainly in a database, perhaps also in the filename itself (video-file-name.XXX.ogv, where XXX is the time in seconds)), is there any way to form this extra header using only Apache .htaccess settings? I ask because loading the file into a PHP script seems clumsy, especially when PHP will by default add other headers which disable caching, and won't respond properly to range (partial content) requests. Yes, a lot of code could be written in PHP to support ETags and range requests, but it seems like overkill to do all that just to add one header, when Apache has all that functionality built in.
This is the domain of mod_cern_meta. It allows statically assigning extra HTTP headers to files.
You could use a cron job to generate a *.meta file for every video.
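Roughly, the setup might look like this; the directive names are mod_cern_meta's, while the paths and duration value are invented for illustration:

# .htaccess sketch: enable CERN-style meta files
# (meta files live in a .web subdirectory, one per served file)
MetaFiles on
MetaDir .web
MetaSuffix .meta

# contents of .web/video-file-name.123.ogv.meta, as written by the cron job:
# X-Content-Duration: 123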
I don't have examples at hand, but you should be able to use mod_headers to specify HTTP response headers at the .htaccess level.
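If the duration really is embedded in the filename (video-file-name.XXX.ogv), something along these lines might work, combining mod_setenvif with mod_headers; this is an untested sketch:

# .htaccess sketch: pull the duration out of the filename
# and emit it as a response header
SetEnvIf Request_URI "\.([0-9]+)\.ogv$" X_DURATION=$1
Header set X-Content-Duration "%{X_DURATION}e" env=X_DURATION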
Of course, the question of where to add the header really depends on how you are accessing the file. If you are just hitting a static resource for download, adding it via Apache makes sense. However, you mention a DB. If you decide to store those files in a database, then you have some API providing the file, in which case that API implementation should append the header rather than offloading the job to Apache.
Also, if the value you want ever requires processing to determine (it's not in the filename, etc.), then you're already using some code engine to produce it; just let that add the header.
This is the kind of thing you do with a mod_perl extension, to intercept these requests and add additional headers before allowing Apache to continue handling them.
One purely PHP approach that might work is to route the requests through PHP using mod_rewrite, add the additional header, but then let Apache handle the rest by using the virtual() function.
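A hedged sketch of that idea; the rewrite rule and filename convention are assumptions, and virtual() only exists when PHP runs as an Apache module:

<?php
// sketch: serve.php, reached via a rewrite rule such as
//   RewriteRule ^videos/(.+\.ogv)$ /serve.php?file=$1 [L]
$file = basename($_GET['file']);   // hypothetical query parameter

// pull the duration out of a name like video-file-name.123.ogv
if (preg_match('/\.(\d+)\.ogv$/', $file, $m)) {
    header('X-Content-Duration: ' . $m[1]);
}

// hand the actual delivery back to Apache as a sub-request,
// so range requests and ETags keep working as usual
virtual('/videos/' . $file);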
Alternatively, you could use your database of durations to construct a static .htaccess file which uses mod_headers to insert the correct duration header for each requested file.

php http headers

Was wondering a couple of things.
Do HTTP headers cache everything on the page? And if I have some JavaScript files, will it cache them as well for subsequent pages, or is it more complicated than that? Example: if I cache all JavaScript files on page1.php, will the files still be cached on page2.php, or does caching files for page1.php only apply to page1.php?
The other question is...
Should I scrap HTTP headers and just use APC, and if so, how complicated is it? Or is it in fact possible to use both? (Asking because YSlow says to use HTTP headers.) Thanks for any info; I've been reading, but these questions weren't really answered in the text.
Your web server will take care of caching for you if you're just serving up regular .js files. The .js files will be downloaded the first time they are linked from one of your pages. When the user re-loads that page, or goes to another page entirely that uses the same .js file, the browser will use the cached copy. This applies when you load scripts via <script src="code.js"></script> tags.
That's if you have standalone, separate .js files. If, on the other hand, you have JavaScript code buried in the HTML your PHP scripts generate, for example:
<script type="text/javascript">
alert("Hello world!");
</script>
...these scripts will be re-generated each time your .php file is loaded. If you're looking to cache the output of your PHP scripts then you will need to manage caching yourself by setting the appropriate HTTP headers from your PHP scripts, be that via the Cache-Control family of headers or the If-Modified-Since and ETag style of headers.
Caching and PHP files don't generally go together, though, since you're usually generating dynamic content that changes based on user input, the time of day, cookies, etc. As caching is purely an optimization the general programming warning against premature optimization applies. If you mess up your HTTP headers you can cause yourself a lot of headaches (believe me on that!). As a rule of thumb, you can probably just let Apache or IIS take care of advanced HTTP things like this and only muck around with HTTP headers if you have a specific need to do so.
I think you're confusing the different types of caching. You've talked about 3 or 4 very different things here.
browser caching -- any normal browser will cache images, JS files, and CSS files between pages. Meaning, the second time a browser wants to display any particular image from your site, it will load it from its local disk cache instead of going back to your server for it. All this stuff just happens -- don't mess around with it, and it just works.
(Exceptions: the user has turned off caching, you've changed headers to avoid caching, or your mime.types aren't set up correctly so the browser doesn't treat these files correctly.)
server-side content caching -- if your pages are rendering slowly ON THE SERVER, you can use various disk-and-RAM caching schemes to keep the output around, and prevent the server from having to render each page each time. This only works for fairly static sites or static parts of pages.
APC content caching -- APC has commands that let you stuff arbitrary content into a server-side RAM cache. If a piece of your system takes a long time to render, but can be reused by many server hits, this is a good choice (see the sketch after this list).
APC code caching -- Your PHP source is "pseudo-compiled", then handed to the PHP runtime for execution. This "pseudo-compile" stage can be very slow and is redundant, so APC caches the pseudo-compiled form in RAM. It can speed up a whole website quite handily.
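As promised above, a hedged sketch of APC content caching; the key name, TTL, and the expensive function are invented for illustration:

<?php
// sketch: keep an expensive-to-render fragment in APC's shared RAM cache
$html = apc_fetch('sidebar_html', $hit);
if (!$hit) {
    $html = render_sidebar();              // hypothetical slow rendering step
    apc_store('sidebar_html', $html, 300); // keep it for 300 seconds
}
echo $html;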
Sorry if this is TMI.
