Say I store PDF files in the database (not important). The users of the application visit a page periodically to grab a stored PDF and print it - for adhesive labels, by the way.
I am annoyed at the thought of their Downloads directory filling up with duplicates of the same document over time, since they will download the PDF every time they need to print the labels.
Is there a way to instruct the browser to cache this file? Or is there any method of relative linking to the user's file system? All users will be on Chrome/Firefox on Windows 7 Pro.
ETags will help you do this. If the file hasn't been updated since the client last downloaded it, the server will send a 304 "Not Modified" response instead of the file.
If your files are dynamically generated, you will need to implement ETag generation manually in PHP rather than relying on the web server.
http://www.php.net/manual/en/function.http-cache-etag.php
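A minimal sketch of manual ETag handling in PHP, assuming the PDF bytes have already been loaded into a variable ($pdfData is illustrative):

<?php
// Assume $pdfData holds the PDF bytes fetched from the database.
// A hash of the content makes a stable ETag: same bytes, same tag.
$etag = '"' . md5($pdfData) . '"';
header('ETag: ' . $etag);

// If the client sent the same tag back, the copy it has is current:
// reply 304 and skip the body entirely.
if (isset($_SERVER['HTTP_IF_NONE_MATCH'])
        && trim($_SERVER['HTTP_IF_NONE_MATCH']) === $etag) {
    header('HTTP/1.1 304 Not Modified');
    exit;
}

header('Content-Type: application/pdf');
header('Content-Length: ' . strlen($pdfData));
echo $pdfData;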
I've found a useful solution to my problem.
From the comments on my question, we concluded it would work best to utilize the browser's built-in PDF/DOC renderer and download anything else that isn't recognized.
I read this standard: https://www.rfc-editor.org/rfc/rfc6266
This is the solution (header):
Content-Disposition: inline; filename=something.pdf
Instead of "attachment", I've used "inline" so that the browser's built-in viewer is used where available.
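In PHP, sending that header for a stored PDF might look like this (a minimal sketch; the filename and path are illustrative):

<?php
// Let the browser render the PDF with its built-in viewer instead of
// forcing a download; unrecognized types still fall back to downloading.
header('Content-Type: application/pdf');
header('Content-Disposition: inline; filename=labels.pdf');
readfile('/path/to/labels.pdf'); // or echo the bytes from the database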
Most browsers will do this automatically based on the URL. If the URL for a particular PDF blob is constant, the browser will not re-download it unless the server indicates that it has changed (by way of HTTP header fields).
You should therefore design your site to have "permalinks" for each resource. This could be achieved by having a resource-ID of some sort in the URL string.
As others have said in comments, a server cannot guarantee that a client does ANYTHING in particular; all you can offer are suggestions that you hope most browsers will treat similarly.
A little help: can a website admin see the location from which I upload pictures to his server? By location I mean the path on my computer: C:\user\ ...
Thank you in advance.
You can inspect the exact information that's sent to the server during a regular HTTP file upload with the Network pane in your browser's developer console.
Apart from the file contents themselves, it only includes the following items:
File name (without path)
File type (as detected by the browser)
Additionally, JavaScript implements the File interface, which allows retrieving file information that might be sent to the server. You can read the API documentation, but in general the API is designed with security in mind.
No. By default only the filename will be visible. If the admin wants to get this data on purpose, he may be able to fetch it before upload in some browsers, though modern browsers do not make that easy. (See this question)
Note that some image formats store metadata you may not want to share. You can check most of this data here: http://regex.info/exif.cgi
This question can't be answered in general, as it depends on the website and on the system the website is running.
Many systems won't track where the images come from, both for legal reasons and to save space.
An extreme case where the answer would be yes is a worm or virus, for which that information might be very valuable.
On intranets the information might be both interesting and legal to collect, so tracking might be reasonable there too.
Edit:
Even though HTML and JavaScript are supposed to protect the local location of uploads, that protection is useless if the upload is done through an application embedded in the website, e.g. one based on Java. PDF viewers, Flash, etc. might also expose this information, though that depends on the version of each of those extensions or plugins.
I have a very simple question which I have been unable to find a clear answer to. I have a web page which is generated dynamically (so I absolutely do not want it cached) and which loads several thousand images. Since there are so many images, and they never change, I very definitely want the user's browser to cache them.
I'm applying headers to the HTML page to prevent caching, following the advice in this solution: How to control web page caching, across all browsers?
My question is: will this cause the user's browser to also not cache any images this page contains, or will it cache them? Thank you.
TL;DR: the answer is not clear, because it is complicated.
There is an ongoing struggle between a drive to do the "right" thing (i.e., follow the standards, which themselves have changed) and a drive to "improve" the standards to achieve better performance or a smoother navigation experience for users. So from the application point of view you need to properly use headers such as ETag, If-Modified-Since and Expires, together with cache-hinting pragmas, but the browser - or something in the middle, such as a proxy - might still decide to override what would seem the clear thing to do.
On the latest Firefox, directly attached to Apache 2.4 on a virtual Ubuntu machine, I have experimented with a page (test.html) referring to an image (test.jpg).
When the page is cached, on the server side I see a single request for the HTML and nothing for the image. What is probably happening is that the rendering part of Firefox does request the image (it has to!), but the request is satisfied entirely by the local cache. This makes sense; if the page has not changed, its content hasn't changed.
When the page is not cached, I see two requests, one for the page and one for the image; the server answers the image request with a 304, but that is because I also send the image's Last-Modified header. This also makes sense: if the page has changed, the images might have changed too, so the browser has to know whether that is the case, and can only find out by asking the server (unless the Expires header is used to "assure" the client that the image will not change).
I have not yet tried with an uncached page that responds with a 304. I expect it to generate a single request (no image request to the server), for the same reasons.
What you might want to consider is that this way you will not cache the HTML page, but might still incur a thousand image requests (which will yield a thousand 304s, but still). Performance in this scenario depends on whether the requests are sent independently or back-to-back using the Keep-Alive HTTP/1.1 extension (it has to be enabled and advertised server side).
You should then use the Expires header on the images to tell the client that those resources will not go stale anytime soon.
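As a sketch, serving one of those images through PHP with a long lifetime might look like this (the path and lifetime are illustrative):

<?php
// Declare the image fresh for 30 days: until then the browser will
// reuse its cached copy without even sending a conditional request.
$maxAge = 30 * 24 * 60 * 60; // 30 days in seconds
header('Content-Type: image/jpeg');
header('Cache-Control: public, max-age=' . $maxAge);
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $maxAge) . ' GMT');
readfile('/var/www/images/test.jpg'); // illustrative path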
You might perhaps also want to explore a different approach:
the HTML is cached
images are cached too
the HTML also references a (cached?) JavaScript
variable content is loaded by the JavaScript via AJAX. That request can be made cache-unfriendly by including a timestamp, without involving the server at all.
This way you can configure the server to cache everything everywhere, except for the one crafted request that you make sure can't be cached.
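On the server side, the endpoint the JavaScript calls would simply mark its own response as uncacheable; a rough sketch (the script name and payload are made up):

<?php
// variable-content.php - fetched via AJAX with a timestamp query string
// (e.g. ?ts=1398821444) so no cache ever considers two requests equal.
header('Content-Type: application/json');
header('Cache-Control: no-store, no-cache, must-revalidate');
header('Expires: 0');
echo json_encode(array('rendered_at' => time())); // illustrative payload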
I have a PHP-based website and would like the browser to cache its images for 30 days. I am using a shared hosting solution where I do not have access to the Apache config to enable mod_headers or other modules, so I cannot use .htaccess mechanisms for this.
My site is a regular PHP app with both HTML content and images. I would like the browser to cache the images only. I've seen PHP's "header" function, but couldn't find a way to force caching of images alone. How do I go about it?
Thanks
As far as I know, if you can't get access to Apache to set the headers, your only other option is to serve the images from a PHP script, so that you can use PHP's header function to set the headers.
In this case, you'd need to write a PHP image handler and replace all your image tags with calls to this handler (e.g. http://mysite.com/imagehandler.php?image=logo.png). Your imagehandler.php script would then retrieve the image from the file system, set the MIME type and cache-control headers, and stream the image back to the client.
You could write your own, or, if you google, you will find image handler PHP scripts. Either way, make sure you focus on security: don't allow the client to retrieve arbitrary files from your web server, because that would be a fairly major security hole.
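A minimal sketch of such a handler, assuming all images live in one directory (paths and names are illustrative):

<?php
// imagehandler.php?image=logo.png
$name = basename(isset($_GET['image']) ? $_GET['image'] : ''); // strip any path components
$path = '/var/www/site/images/' . $name; // illustrative image directory

// Only serve known image types that actually exist in that directory.
$types = array('png' => 'image/png', 'jpg' => 'image/jpeg', 'gif' => 'image/gif');
$ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
if (!isset($types[$ext]) || !is_file($path)) {
    header('HTTP/1.1 404 Not Found');
    exit;
}

header('Content-Type: ' . $types[$ext]);
header('Content-Length: ' . filesize($path));
header('Cache-Control: public, max-age=' . (30 * 24 * 60 * 60)); // 30 days
readfile($path);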
I have noticed that files delivered by PHP through readfile or fpassthru techniques are never cached by the browser.
How can I "encourage" browsers to cache items delivered via these methods?
Whether your content is cached or not has nothing to do with readfile() and its relatives; more likely, the default caching headers issued by the server (which would activate caching for HTML pages and image resources) don't apply when you use PHP to pass files through.
You will have to send the appropriate headers along with your content, telling the browser that caching for this resource is all right.
See for example
Caching tutorial for Web Authors and Webmasters
How to use HTTP cache headers with PHP
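For instance, a pass-through that sends such headers and honours If-Modified-Since might look roughly like this (a sketch; the file path is illustrative):

<?php
$path = '/var/www/files/report.pdf'; // illustrative
$mtime = filemtime($path);

header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT');
header('Cache-Control: public, max-age=3600'); // allow reuse for an hour

// If the browser's cached copy is still current, reply 304 with no body.
if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
        && strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $mtime) {
    header('HTTP/1.1 304 Not Modified');
    exit;
}

header('Content-Type: application/pdf');
header('Content-Length: ' . filesize($path));
readfile($path);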
I ended up finding this page and using it as a starting point for my own implementation. The example code on this page, along with some of the reading Pekka pointed to, was a great springboard for me.
I am using PHP's header function to send the file to the browser with some small code. It works well, and I have it set up so that if anyone requests the file with a Referer other than my site, they are redirected to a page first.
Unfortunately it's not working with Internet Download Manager.
What I want to know is how sites like RapidShare and 4shared do this.
You could use sessions to make sure the download is being requested by a valid user.
Not all browsers / pieces of software that can see web pages will send a Referer to your server. Some sites build a browser "fingerprint", usually hashed, which might be the Referer, User-Agent and a couple of other headers strung together to make a unique identifier for that user, and thus restrict access as you describe.
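For illustration, a rough sketch combining both ideas, sessions plus a header fingerprint (which headers to include is your choice):

<?php
session_start();

// Hash a few request headers into a fingerprint for this visitor.
$ua   = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$lang = isset($_SERVER['HTTP_ACCEPT_LANGUAGE']) ? $_SERVER['HTTP_ACCEPT_LANGUAGE'] : '';
$fingerprint = sha1($ua . '|' . $lang);

// Remember it when the download page is first visited ...
if (!isset($_SESSION['fingerprint'])) {
    $_SESSION['fingerprint'] = $fingerprint;
}

// ... and refuse the file request itself if it doesn't match.
if ($_SESSION['fingerprint'] !== $fingerprint) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}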
Of course, I may have completely missed the point of your post!
A typical design pattern is using a front controller to have a single entry point for all requests. By having a front controller, you can control exactly what the client sees.
You can configure this in Apache so that all requests go through a single file (it's been a while since I've done this, because I now concentrate on Java). I think you would need to look at the path info documentation for Apache.
This might require a significant change in the rest of your application code. But, the code will be more secure and maintainable in the long run.
I've served images and other binary files through this pattern. It allowed me to easily verify that users were authenticated before actually sending them the file. Obfuscation is not security, so if you rely on obfuscating your URL, an attacker may be delayed in getting in, but it is only a matter of time.
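A bare-bones sketch of the idea (the file layout, session key and storage location are all made up):

<?php
// index.php - the single entry point; Apache rewrites every request here.
session_start();

// Authenticate before anything is served.
if (empty($_SESSION['user_id'])) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}

// Map the request to a file kept outside the web root,
// so it can never be fetched directly.
$file = basename(isset($_GET['file']) ? $_GET['file'] : '');
$path = '/srv/private-files/' . $file; // illustrative storage location
if ($file === '' || !is_file($path)) {
    header('HTTP/1.1 404 Not Found');
    exit;
}

header('Content-Type: application/octet-stream');
header('Content-Length: ' . filesize($path));
readfile($path);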
Walter
The problem probably is that sending the file through a PHP script (with the headers you mentioned) doesn't support starting the download at an arbitrary position. Download managers use this feature to fetch a file over several simultaneous connections (useful when the server limits each connection to a certain speed).
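Supporting that in PHP means honouring the Range header yourself; a simplified sketch that handles a single range (the path is illustrative):

<?php
$path = '/var/www/files/big.zip'; // illustrative
$size = filesize($path);
$start = 0;
$end = $size - 1;

header('Accept-Ranges: bytes');

// Parse a simple "bytes=start-end" range, if the client sent one.
if (isset($_SERVER['HTTP_RANGE'])
        && preg_match('/bytes=(\d+)-(\d*)/', $_SERVER['HTTP_RANGE'], $m)) {
    $start = (int)$m[1];
    if ($m[2] !== '') {
        $end = (int)$m[2];
    }
    header('HTTP/1.1 206 Partial Content');
    header("Content-Range: bytes $start-$end/$size");
}

header('Content-Type: application/octet-stream');
header('Content-Length: ' . ($end - $start + 1));

// Stream just the requested slice in small chunks.
$fp = fopen($path, 'rb');
fseek($fp, $start);
$remaining = $end - $start + 1;
while ($remaining > 0 && !feof($fp)) {
    $chunk = fread($fp, min(8192, $remaining));
    echo $chunk;
    $remaining -= strlen($chunk);
}
fclose($fp);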
For a small project I would recommend making a copy of the file with a unique filename just for the duration of the download, and redirecting the user to that copy. This way he gets the server's full download features, and it doesn't load the processor the way PHP does. Disadvantages: more disk space is required, and the download directory needs periodic cleanup.