Preventing the browser caching a linked file - php

I've been tasked to maintain a PHP website with a function which automatically generates RTF files and provides a link to download. Each time the previously generated file is overwritten by the new one.
However, it seems that upon attempting to download the generated file, the browser will sometimes retrieve a cached version which is normally different from the latest version (so you get the same document as last time rather than the one you requested).
I managed to work around this by giving each generated file a unique name based on the current timestamp but this generates lots of clutter in the directory which will need to be cleaned out periodically. Ideally I would like to tag this file such that the browser won't cache it and will get the latest version every time. How might I achieve this?

In addition to the possibility of adding a random GET string to the URL (often the easiest way), it would also be possible to solve this by sending the right headers.
Because you are generating static files, this would require a setting in a .htaccess file. It would have to look like this:
<FilesMatch "\.(rtf)$">
Header set Cache-Control "no-store"
</FilesMatch>

Easiest way? Instead of linking to http://yoursite.com/file.rtf, link to http://yoursite.com/file.rtf?<?=time()?> . This will append a query string parameter which will vary each time the client requests it, and it won't therefore be cached.

You could tag the current time value to the file you serve
.../file.rtf?15072010141000
That way you don't have to generate unique names, but you can ensure future requests are not cached.
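A variation on the query-string approach, as a minimal sketch: instead of time(), which changes on every page view, the link can carry the file's modification time. The URL then changes only when the file is regenerated, so the browser may still reuse its cached copy in between. The helper name versioned_url and the paths in the usage note are hypothetical.

```php
<?php
// Hypothetical helper: append the file's modification time as a query string.
// $url is the public URL, $fsPath the file's location on disk (both assumed).
function versioned_url(string $url, string $fsPath): string
{
    clearstatcache(true, $fsPath);   // avoid a stale mtime if the file was just rewritten
    $mtime = @filemtime($fsPath);    // false if the file does not exist
    if ($mtime === false) {
        return $url;                 // fall back to the plain URL
    }
    return $url . '?v=' . $mtime;
}
```

In a template this might be used as href="<?= htmlspecialchars(versioned_url('/files/report.rtf', __DIR__ . '/files/report.rtf')) ?>".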

Although the simple solution of sending a no-store header, as suggested by Pekka, will work, you lose the benefit of caching when the same file is downloaded several times.
If the RTF file is big and you would like your users to benefit from caching when the file has not actually changed, you might want to check this answer:
How to use HTTP cache headers with PHP

Related

Overwriting files properly

I am trying to manage caching on a heavily used web page written in PHP. I have marked some cacheable sections of PHP code that I want to execute only once, ahead of time, when an administrator makes changes in the CMS. For this, I use the following method:
I have a file (for example "index-source.php") with some marked areas of PHP code that can be interpreted on their own. When the admin changes some settings, these marked parts are executed and replaced with their result (for example, MySQL queries that read menu items from the DB are replaced with the generated HTML menu). The resulting file is saved as a new "index.php", which still contains some PHP code that can't be optimized by caching.
Now to my problem.
Assume that this server is heavily loaded, with for example 100 requests per second that require index.php. If I use file_put_contents() to overwrite index.php with the new pre-cached version, is there any risk that some requests will be interrupted because the file is locked or not fully written? Basically, I want to update my PHP file and be sure that PHP will include either the complete old version or the complete new version, or wait a few milliseconds until the file is overwritten. I don't want PHP to fail the require or load a partially overwritten file.
Is that possible? Thanks
file_put_contents is not what you want.
Have a look at this project, and dive into the source to get a feel for what challenges you may have to face as well as the solution chosen.
https://github.com/PHPSocialNetwork/phpfastcache
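One common technique for this kind of update (not necessarily the one phpfastcache uses) is to write the new content to a temporary file in the same directory and then rename() it over the target. On POSIX systems a rename within one filesystem is atomic, so a concurrent require should see either the complete old file or the complete new one. A minimal sketch under those assumptions; the function name atomic_write is hypothetical, and behaviour on Windows or with an opcode cache in front may differ.

```php
<?php
// Hypothetical helper: replace $target with $contents without exposing a half-written file.
function atomic_write(string $target, string $contents): bool
{
    // Create the temp file next to the target so both are on the same filesystem.
    $tmp = tempnam(dirname($target), 'tmp');
    if ($tmp === false) {
        return false;
    }
    // file_put_contents() returns the number of bytes written, or false on failure.
    if (file_put_contents($tmp, $contents) !== strlen($contents)) {
        unlink($tmp);
        return false;
    }
    // tempnam() creates the file with mode 0600; relax it so the web server can read it.
    chmod($tmp, 0644);
    // The swap itself: readers see the old file until this call completes.
    if (!rename($tmp, $target)) {
        unlink($tmp);
        return false;
    }
    return true;
}
```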

Which is more efficient: redirecting to an image, or pushing the image through PHP?

An external party has to use the dynamically generated images that are used on our site. For that, I created a function that serves the image through a URL, e.g. http://test.com/image/$code/$width/$height. It finds the image with code $code, resizes it to $width and $height, and then serves the image itself (not the URL). The external party can now use <img src="http://test.com/image/$code/$width/$height" />
This is working fine, but of course it is quite a hit on the server every time it is used, especially if the image is used in newsletters that are sent to thousands of people.
I can make it a little more efficient by checking if the image is already existing and then returning it without generating it first, of course. But I was also looking at redirection.
So, basically, my question is if it is more efficient to generate/load the image and then serve it, or doing a 301 redirect to the actual image. I know that this also has some drawbacks, most notably needing two requests per image, but I am wondering how that compares to pushing an entire image through php and the image generation process.
Update:
Maybe I should clarify things a bit.
I am interested in server load, not so much in UX. Most probably the latter is worse off with redirecting, as it doubles the number of server requests.
The difference in the two situations is as follows:
Image generation situation:
- Check if image exists. If not, generate.
- Then do this
$path = BASE_PATH."/".$image->Filename;
$mimetype = image_type_to_mime_type(exif_imagetype($path));
header("Content-type: ".$mimetype);
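readfile($path); // readfile() writes the file to the output itself; echoing its return value would append the byte count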
die;
Image redirect situation:
- Check if image exists. If not, generate.
- Then do this
$location = BASE_HREF."/".$image->Filename;
// No MIME type lookup is needed here: the web server sets it when it serves the redirected file.
header('Location: '.$location,true,301); //or maybe a 303 (other)
die;
Obviously, in the second situation PHP has to do less and Apache more (handle two requests instead of one). In the first situation PHP has to do more and Apache less. So the question is: is the extra work that PHP has to do more or less than the extra work that Apache has to do?
I don't know, but my gut feeling is that if you're already running a PHP script, then the additional cost of writing some headers and calling readfile() will be trivial.
More importantly, is the file going to be used more than once by the same user?
If so, you could benefit significantly by making the file cacheable. If you redirect to a static file, the web server will automatically take care of the caching. If you serve the file through PHP, you will have to cache it yourself.
To do this, you need to:
1. Compute a Last-Modified date or an ETag (unique ID).
2. Check the request headers for an If-Modified-Since: or If-None-Match: header.
3. Compare the header values against the computed date or ETag.
4. If your file is newer or doesn't match the ETag (or the request headers don't exist), send a 200 response including the file.
5. Otherwise, send a 304 response without the file.
In both cases, send the current Last-Modified: or ETag header, and maybe a sensible Expires: header.
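The steps above could be sketched roughly as follows. This is an illustration under simplifying assumptions, not a complete implementation: the function names are hypothetical, the ETag is derived from the path, modification time and size, and If-None-Match is compared as a single value (a real request may carry a list or weak validators).

```php
<?php
// Decide between 200 and 304 from the file's validators and the request headers.
// $ifNoneMatch and $ifModifiedSince are the raw header values, or null if absent.
function conditional_status(string $etag, int $mtime, ?string $ifNoneMatch, ?string $ifModifiedSince): int
{
    if ($ifNoneMatch !== null) {
        // If-None-Match takes precedence when both headers are sent.
        return trim($ifNoneMatch) === $etag ? 304 : 200;
    }
    if ($ifModifiedSince !== null) {
        $since = strtotime($ifModifiedSince);
        return ($since !== false && $mtime <= $since) ? 304 : 200;
    }
    return 200; // no validators in the request: send the file
}

// Hypothetical wrapper that serves an image file with cache validators.
function serve_cacheable(string $path): void
{
    $mtime = filemtime($path);
    $etag  = '"' . md5($path . $mtime . filesize($path)) . '"';
    $status = conditional_status(
        $etag,
        $mtime,
        $_SERVER['HTTP_IF_NONE_MATCH'] ?? null,
        $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null
    );
    // The validators are sent in both cases.
    header('ETag: ' . $etag);
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT');
    if ($status === 304) {
        http_response_code(304); // no body
        return;
    }
    header('Content-Type: ' . image_type_to_mime_type(exif_imagetype($path)));
    header('Content-Length: ' . filesize($path));
    readfile($path);
}
```

Keeping the decision in a separate function that takes plain values makes it possible to check the logic without a web server.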
First, it is critical that you check to see if the image already exists before you generate the thing. It isn't going to help "a little." Creating a .jpg is a lot more expensive than checking for a filename.
The rest of your question isn't completely clear to me, but I'll try for an answer anyway.
301 redirects aren't generally used for what you're talking about. They're to tell a search spider that a page has moved permanently. Functionally, it doesn't work any differently than a regular redirect.
Moving on, I'm even more confused. "and then serves the image itself (not the url)"
Servers pretty much always serve both. The url tells it which file to send and then it sends the binary data it finds at that URL. And files are always saved somewhere, even if it's just a tmp folder where it will be deleted. Unless you've done something exotic, in which case, ask yourself why you did that.
If the image will never be used again (not even if the same user revisits the same page), simply send the remote server a link to where the image lives in the temporary folder on your server after it was created. If the image might be reused, save it somewhere. The local server would be easier, and the extra request by the remote won't slow things down a noticeable amount.

Why querystrings after images & css files?

I have seen on various sites a query string followed by a number on the URLs of images and CSS files. When I look at the source code (via Chrome Developer Tools), the cached CSS files and images do not have the number from the query string in their names. I have also seen sites where the number in the query string changes when I refresh the page.
As example:
myimage.jpg?num=12345
myStyles.css?num=82943
After refresh:
myimage.jpg?num=67948
myStyles.css?num=62972
Can anyone explain to me what could possibly be the purpose of these querystrings short of tracking?
Developers often use these query strings with random numbers (or version numbers) to force the browser to request a fresh copy instead of using its cache, since the URL is different each time.
So if you have a file /image.png that is generated dynamically, such as a captcha, you could append a random number as a query string, /image.png?399532. The browser then will not pull image.png from its cache, but will download a fresh copy from the server.
1. Prevent caching (the query string can provide a unique URL each time the file is updated, causing the browser to download a new copy and not load a stale one from its cache)
2. Versioning (similar to #1 but with a more specific purpose)
The query string is for version control: it forces the browser to reload the CSS and the image instead of using the cache.

Secure documents with PHP

I have a simple login / access control system to secure some restricted pages, but within these pages there are links to resources that also need to be secured, e.g. Word documents. If I keep these resources within the webroot, they could be accessed directly via a URL. What is the best method to secure the resources linked from the restricted page? I know I could password protect the folder, but the user would then be challenged twice, once for the restricted page and then again for the resource link. Any advice?
You have a few options here, depending on your use-case.
1. Use PHP to serve the file. Basically, either intercept all attempts to read the file by PHP (using a mod_rewrite rule), or link directly to PHP and put the file(s) below the document root. Then use something like fpassthru to send the file to the browser. Note that you must properly set the content-type headers. Also note that this will eat up a lot of server resources since the server needs to read the entire file in PHP and send it, so it's easy, but not light.
$f = fopen('file.doc', 'rb'); // binary mode, so the document is not altered on Windows
if (!$f) {
//Tell User Can't Open File!
die(); // stop here, otherwise fpassthru() below would be called on an invalid handle
}
header('Content-Type: ...');
header('Content-Length: '.filesize('file.doc'));
fpassthru($f);
die();
The main benefit to doing it this way is that it's easy and portable (will work on all servers). But you're trading off valuable server resources (since while PHP is serving the file, it can't be serving another page) for that benefit...
2. Use the web server to send the file using something like X-SendFile (Lighttpd), X-SendFile (Apache2/2.2) or X-Accel-Redirect (NginX). So you'd redirect all requests to the file to PHP (either manually or rewrite). In PHP you'd do your authentication. You'd send the Content-Type headers, and then send a header like X-SendFile: /foo/file.doc. The server will actually send the file, so you don't have to (it's far more efficient than sending from PHP natively).
header('Content-Type: ...');
header('X-SendFile: /foo/file.doc');
die();
The main benefit here is that you don't need to serve the file from PHP. You can still do all of your authentication and logging that you'd like, but free up PHP as soon as you start transferring the file.
3. Use something like mod_secdownload (lighttpd) or mod_auth_token (Apache). Basically, you create a token in PHP when you generate the link to the file. This token is an MD5 of a secret password combined with the file name and the current timestamp. The benefit here is that the URL is only valid for as long as you specify in the configuration (60 seconds by default). So that means that the link you give out will only be active for 60 seconds, and then any further attempts to see the content will generate a 400 series error (I'm not positive which off the top of my head).
$filename = '/file.doc';
$secret = 'your-configured-secret-string';
$time = dechex(time());
$token = md5($secret . $filename . $time);
$url = "/downloads/$token/$time$filename";
echo "<a href=\"$url\">Click Here To Download</a>";
The main benefit to doing it this way is that there is very little overhead associated with the implementation. But you have to be comfortable with having URLs being valid for a set time only (60 seconds by default)...
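To make the idea of an expiring token concrete, here is a sketch of how such a token could be generated and checked. It only illustrates the concept: the real modules do the check inside the web server, and their exact rules are not reproduced here. The function names, the 60-second default and the fixed timestamps in the checks are assumptions for illustration; the token formula is the one from the snippet above.

```php
<?php
// Build a time-limited token: md5(secret . filename . hex timestamp).
// Returns the token and the hex timestamp that must both go into the URL.
function make_token(string $secret, string $filename, int $now): array
{
    $time = dechex($now);
    return [md5($secret . $filename . $time), $time];
}

// Hypothetical check of a token taken from a requested URL.
function token_is_valid(string $secret, string $filename, string $token, string $time, int $now, int $ttl = 60): bool
{
    if (!ctype_xdigit($time)) {
        return false;                      // timestamp is not valid hex
    }
    $issued = hexdec($time);
    if ($now < $issued || $now - $issued > $ttl) {
        return false;                      // issued in the future, or expired
    }
    // hash_equals() compares in constant time to avoid leaking the expected token.
    return hash_equals(md5($secret . $filename . $time), $token);
}
```

Because the file name is part of the hashed string, a token issued for one file does not validate for another.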
4. Push it off onto a CDN to handle. This is like option #3 (the one above), but uses a CDN to handle the file serving instead of your local server. Some CDNs such as EdgeCast provide a similar functionality where you set a token which expires after a set amount of time. This case will work nicely if you have a lot of traffic and can justify the expense of a CDN. (Note: no affiliation with the linked CDN, only linked because I know they offer the functionality).
As far as how I'd personally do it, I've done all of the above. It really matters what your use-case is. If you're building a system that's going to be installed on shared hosts or multiple different servers which you don't control, stick to the first option. If you have full control and need to save server resources, do one of the other two.
Note: there are other options than these three. These are just the easiest to implement, and most of the other options are similar enough to these to fit into the category...
I haven't tried it with Word documents (only with images), but I would try to serve the document directly from PHP, see my answer about images.
It would be something like an a tag linking to a php page that serves a Word document as its content type.

Can PHP be used inside an XML file?

I am trying to generate an RSS feed from a MySQL database I already have. Can I use PHP in the XML file that is to be sent to the user so that it generates the content upon request? Or should I use cron on the PHP file and generate an XML file? Or should I add the execution of the PHP file that generates the XML upon submitting the content that is to be used in the RSS? What do you think is the best practice?
All three approaches are technically possible. However, I would not use cron, because it delays the update of your XML files after the database content has changed.
You can easily embed PHP code in your XML files; you just have to make sure that the files are interpreted as PHP on the server side, either by renaming them with a *.php extension or by changing the server directives in the .htaccess file.
But I think that the best practice here is to generate new XML files upon updating the database contents. I guess that the XML files are viewed more often than the database content changes, so this approach reduces the server load.
Use a cron to automate a PHP script that builds the XML file. You can even automate the mail part as well in your PHP.
The third method you mentioned. I don't understand how cron can be used here if the data arrives with users' requests. The first method cannot be implemented.
Set the Content-type header to text/xml and have your PHP script generate XML just as it would generate any other content. You may want to consider using caching though, so you don't overwhelm the server by accident.
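A minimal sketch of that last suggestion, assuming the feed items have already been read from the database into an array. The function name build_rss and the array keys title and link are hypothetical, and caching is left out.

```php
<?php
// Hypothetical helper: turn database rows into an RSS 2.0 document.
// Each item is assumed to be an array with 'title' and 'link' keys.
function build_rss(string $title, string $link, array $items): string
{
    // Escape text for XML so characters like & and < cannot break the feed.
    $esc = fn(string $s): string => htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<rss version="2.0"><channel>';
    $xml .= '<title>' . $esc($title) . '</title>';
    $xml .= '<link>' . $esc($link) . '</link>';
    $xml .= '<description>' . $esc($title) . '</description>';
    foreach ($items as $item) {
        $xml .= '<item><title>' . $esc($item['title']) . '</title>'
              . '<link>' . $esc($item['link']) . '</link></item>';
    }
    return $xml . '</channel></rss>';
}
```

The script answering the feed request would then call header('Content-Type: text/xml; charset=UTF-8') and echo the result of build_rss(). The same function could also be called when content is submitted, writing the result to a static file with file_put_contents().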
