Simple question.
Do browsers cache PHP-generated CSS and script files automatically, just like static CSS/JS files?
Sure, barring explicit acts to prevent caching. The browser has no way of knowing if the file was a static or dynamically generated resource.
If the URL remains the same, and there aren't hints in the HTTP responses to tell the browser otherwise, they can be cached.
If the URL includes dynamic information, the browser probably won't be able to take advantage of caching.
Changing the URL by adding a timestamp as a dummy parameter (e.g. http://host/myfile.php?t=17279273) is one of the ways you can prevent caching since the browser sees the slight change as a new resource.
Jonathon's answer suggesting the addition of a timestamp to prevent caching is a good one.
A useful tip along these lines is to append the creation/last-modified date of the file instead. While the file is unchanged, the URL stays the same and the browser uses its cached copy; when you update the file, the URL changes and the new version is forced on your users.
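A minimal sketch of that tip, assuming the asset lives on the local filesystem under the document root. The function name versioned_url and the v parameter are made up for illustration; only filemtime() and is_file() are real PHP calls.

```php
<?php
// Hypothetical helper: appends the file's last-modified time as a query
// parameter, so the URL only changes when the file itself changes.
function versioned_url(string $urlPath, string $docRoot): string
{
    $file = rtrim($docRoot, '/') . '/' . ltrim($urlPath, '/');
    // If the file is missing, fall back to the plain URL instead of failing.
    $mtime = is_file($file) ? filemtime($file) : false;
    return $mtime === false ? $urlPath : $urlPath . '?v=' . $mtime;
}

// Usage in a template (the path is a placeholder):
// $href = versioned_url('/css/site.css', $_SERVER['DOCUMENT_ROOT']);
// echo '<link rel="stylesheet" href="' . htmlspecialchars($href) . '">';
```

The file is looked up on every page render, which is cheap, but if that bothers you the version string could be computed once at deploy time instead.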
It's not always the best option, but worth noting.
Related
Consider this minimal example: using PHP, I have a form where you enter text and it produces an image of the text. When I then change the text and update, I don't see the new image, I assume because it is being cached. Is there some way to automatically remove this one image file from the cache when I update it?
This is frequently handled by adding a random string or timestamp to the query.
i.e. <img src="/images/image.jpg?timestamp=1357571065" />
The typical solution is what ceejazoz gave in this answer: an additional timestamp added as a request parameter. That way the URL is different each time, so no cache or proxy will deliver a cached version.
Although that works, it is an ugly workaround.
The clean solution is to specify headers when delivering the image. Those headers make sure that the image is not cached. That is what headers are there for: defining how resources are meant to be used. The drawback: the out-of-the-box configuration of today's HTTP servers used to deliver static images does not set such headers, because in 99.99% of all cases it makes no sense. So you will have to write your own mechanism. Not really difficult, but effort nonetheless. Using the above workaround certainly is easier and less hassle.
And to give a precise answer to your actual question:
Clearing a single cached object from 'the' cache usually is not possible, though that depends on which cache you are talking about. If it is just the browser's cache whilst you are developing (testing), then just do a 'deep reload' (something like CTRL-SHIFT-R or CTRL-F5, depending on your browser). But this clears all cached objects of the current page. There is no easy way to clear a server-side cache or a proxy in between.
Add a query string to the image, perhaps containing the julian time plus a random number, so that for example your image URL becomes:
//.../myimage.jpg?112233445566-954967254
You can make the browser revalidate its cached copy on every request:
header("Cache-Control: no-cache, must-revalidate");
Why don't people make .php files for their CSS and JavaScript files?
Adding <?php header("Content-Type: text/javascript; charset=UTF-8"); ?> to the file makes it readable by browsers, and you can do the same thing for CSS files by setting the Content-Type to text/css.
It lets you use PHP variables and functions inside the other languages, letting you, for example, change the theme's main colors in CSS depending on user preferences, or preload data that your JavaScript can use on document load.
Are there downsides to using this technique?
People do it more often than you think. You just don't get to see it, because usually this technique is used in combination with URL rewriting, which means the browser can't tell the difference between a statically-served .css file and a dynamic stylesheet generated by a PHP script.
However, there are a few strong reasons not to do it:
In a default configuration, PHP script output is treated as 'subject to change at any given time': it is sent without the Last-Modified and ETag headers that static files get, and once a session is started PHP adds headers that prevent caching outright (otherwise, dynamic content wouldn't really work). Unless you set caching headers yourself, this means the browser has little basis for caching your CSS and JavaScript, which is bad: they may be reloaded over the network for every single page load. If you have a few hundred page loads per second, this stuff absolutely matters, and even if you don't, the page's responsiveness suffers considerably.
CSS and JavaScript, once deployed, rarely change, and reasons to make them dynamic are really rare.
Running a PHP script (even if it's just to start up the interpreter) is more expensive than just serving a static file, so you should avoid it unless absolutely necessary.
It's pretty damn hard to make sure the Javascript you output is correct and secure; escaping dynamic values for Javascript isn't as trivial as you'd think, and if those values are user-supplied, you are asking for trouble.
And there are a few alternatives that are easier to set up:
Write a few stylesheets and select the right one dynamically.
Make stylesheet rules based on class names, and set those dynamically in your HTML.
For javascript, define the dynamic parts inside the parent document before including the static script. The most typical scenario is setting a few global variables inside the document and referencing them in the static script.
Compile dynamic scripts into static files as part of the build / deployment process. This way, you get the comfort of PHP inside your CSS, but you still get to serve static files.
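A rough sketch of the "set a few global variables in the parent document" alternative, which also sidesteps most of the escaping trouble mentioned above by letting json_encode() do the work. The function name js_config_tag and the variable name APP_CONFIG are invented for illustration, and JSON_THROW_ON_ERROR assumes PHP 7.3 or later.

```php
<?php
// Hypothetical helper: renders an inline <script> element that defines one
// global variable holding PHP data, for use by a static script loaded later.
function js_config_tag(string $varName, array $data): string
{
    // Only accept a plain identifier, so the name itself cannot inject code.
    if (!preg_match('/^[A-Za-z_$][A-Za-z0-9_$]*$/', $varName)) {
        throw new InvalidArgumentException('Invalid JavaScript identifier');
    }
    // The JSON_HEX_* flags encode <, >, &, ' and " inside strings as \uXXXX
    // escapes, so user-supplied values cannot close the script element.
    $json = json_encode(
        $data,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
    return '<script>window.' . $varName . ' = ' . $json . ';</script>';
}

// Usage in the page, before the static script is included:
// echo js_config_tag('APP_CONFIG', ['theme' => $userTheme, 'userId' => $userId]);
// echo '<script src="/js/app.js"></script>';
```

The static script then reads window.APP_CONFIG and stays fully cacheable.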
If you want to use PHP to generate CSS dynamically after all:
Override the caching headers to allow browsers and proxies to cache them. You can even set the cache expiration to 'never', and add a bogus query string parameter (e.g. <link rel="stylesheet" type="text/css" href="http://example.com/stylesheet.css?dummy=121748283923">) and change it whenever the script changes: browsers will interpret this as a different URL and skip the cached version.
Set up URL rewriting so that the script's URL has a .css extension: some browsers (IE) are notorious for getting the MIME type wrong under some circumstances when the extension doesn't match, despite correct Content-Type headers.
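To illustrate overriding the caching headers, here is a small sketch that builds long-lived cache headers for a generated stylesheet. The function name cacheable_css_headers and the one-year lifetime are assumptions, not part of the answer above; header() and gmdate() are standard PHP.

```php
<?php
// Hypothetical helper: returns the header lines that allow browsers and
// proxies to cache a generated stylesheet for $maxAgeSeconds.
function cacheable_css_headers(int $maxAgeSeconds, int $now): array
{
    return [
        'Content-Type: text/css; charset=utf-8',
        'Cache-Control: public, max-age=' . $maxAgeSeconds,
        // Expires is the older equivalent of max-age, as an HTTP date in GMT.
        'Expires: ' . gmdate('D, d M Y H:i:s', $now + $maxAgeSeconds) . ' GMT',
    ];
}

// Usage at the top of the script that prints the CSS:
// foreach (cacheable_css_headers(31536000, time()) as $line) {
//     header($line);
// }
// echo $css;
```

With a lifetime this long, the bogus query string parameter is the only way to get a changed stylesheet to users, so it must change whenever the output does.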
Some do. The better approach is to generate your JS/CSS in PHP and cache the result to a file.
If you serve all of your CSS/JS files using PHP, then you have to invoke PHP more often, which incurs overhead (CPU and memory) that is unnecessary when serving static files. Better to just let the web server (Apache/nginx/lighttpd/IIS etc.) do its job and serve those files for you without the need for PHP.
Running the PHP engine does not have a zero cost, in either time or CPU. And since CSS and JavaScript files rarely change, having them run through the engine to do absolutely nothing is pointless; better to let the browser cache them when appropriate instead.
Here’s one method I’ve used: The HTML page contains a reference to /path/12345.stylesheet.css. That file does not exist. So .htaccess routes the request to /path/index.php. That file (a) does a database request, (b) creates the CSS, (c) saves the file for next time, (d) serves the CSS to the browser. That means that the very next time there’s a request for /path/12345.stylesheet.css, there actually is a physical static file there to be served by Apache as normal.
Oh, and whenever the style rules are edited (a) the static file is deleted, and (b) the reference ID is changed, so that the HTML page will in future contain a reference to /path/10995.stylesheet.css, or whatever. (Actually, I use a UNIX timestamp.)
I use a similar method to create image thumbnails: create the file on first request, and save a static file in the same place for future requests. I’ve never had occasion to do the same for javascript, but there’s no fundamental reason why not.
This also means that I don’t need to worry about caching headers in PHP: only the first invocation of each CSS file (or image thumbnail) goes through PHP, and if that is served with anti-caching headers, that’s no great problem.
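The "create on first request, serve statically afterwards" method might look roughly like this on the PHP side. It assumes a rewrite rule already sends requests for missing files to the script; the function name build_and_store_css and the generator callback are placeholders for the database lookup and CSS construction described above.

```php
<?php
// Hypothetical front-controller step: build the stylesheet once, save it where
// the web server will find it next time, and return it for this response.
function build_and_store_css(string $targetFile, callable $generateCss): string
{
    $css = $generateCss();
    // Write to a temporary name first and rename afterwards, so a concurrent
    // request never gets served a half-written file.
    $tmp = $targetFile . '.' . uniqid('tmp', true);
    file_put_contents($tmp, $css);
    rename($tmp, $targetFile);
    return $css;
}

// Usage inside /path/index.php (names are illustrative):
// $css = build_and_store_css(__DIR__ . '/12345.stylesheet.css', 'load_css_from_database');
// header('Content-Type: text/css; charset=utf-8');
// echo $css;
```

In real use the target file name should be derived from a validated ID, never taken straight from the request path.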
Sometimes you might have to dynamically create JavaScript or styles.
The issue is that web servers are optimized to serve static content. Dynamically generating content with PHP can be a huge performance hit because it needs to be generated on each request.
It's not a bad idea, or all that uncommon, but there are disadvantages. Caching is an important consideration - you need to let browsers cache when the content is the same, but refresh when it will vary (e.g. when someone else logs in). Any query string will immediately stop some browsers caching, so you'll need some rewrite rules as well as HTTP headers.
Any processing that takes noticeable time, or requires a lock on something (e.g. session_start) will hold up the browser while it waits for the asset.
Finally, and quite importantly, mixing languages can make editing code harder - syntax highlighting and structure browsers may not cope, and overlapping syntax can lead to ugly things like multiple backslash escapes.
In JavaScript, it can be useful to convert some PHP data into (JSON) variables, and then proceed with static JS code. There is also a performance benefit to concatenating multiple JS files so the browser downloads them all in one go.
For CSS, there are specific languages such as Less which are more suited to the purpose. Using LessPHP (http://leafo.net/lessphp/) you can easily initialize a Less template with variables and callbacks from your PHP script.
PHP is often used as a processor to generate dynamic content. It takes time to process a page and then send it. For the sake of efficiency (both for the server and for time spent programming), dynamic JS or CSS files should only be created if there is no possible way for a static file to accomplish its intended goal.
I recommend only doing this if you absolutely require the assistance of a dynamic, database-driven processor.
The bad sides: plenty, but to name just a few:
It'll be dead slow: constructing custom stylesheets for each request puts a huge load on the server, not something you want.
Designers create CSS files, programmers shouldn't (in some cases shouldn't be allowed to). It's not their job/their speciality.
Mixing JS and PHP is, IMHO, one of the greatest mistakes one can make. With jQuery being a very popular lib that uses the $ sign, it might be a huge source of bugs and syntax errors. Besides that: JS is a completely different language from virtually any other programming language. Very few people know how to get the most out of it, and letting PHP developers write vast JS scripts often ends in tears. JavaScript is a functional OO (prototypal) language. People who don't fully understand these crucial differences write bad code as a result. I know, because I've written tons of terrible JS code.
Why would you want to do this, actually? PHP allows you to change any element's classes while generating the page. Just make sure the classes have corresponding style rules in your CSS files and the colours will change as you want them, without having to send various files, mess with headers, and deal with all the headaches that come with this practice.
If you want more reasons why you shouldn't do this, I can think of at least another few dozens. That said: I can only think of 1 reason why you would think of doing this: it makes issues caused by client-side cached scripts less of an issue. Not that it should be an issue in the first place, but hey...
Say I for some reason want to serve my CSS through PHP (because of pre-processing, merging, etc). What do I need to do in my PHP to make this work well? Other than the most obvious:
header('content-type: text/css; charset=utf-8');
What about headers related to caching, modification times, etags, etc? Which ones should I use, why and how? How would I parse incoming headers and respond appropriately (304 Not Modified for example)?
Note: I know this can be tricky and that it would be a lot easier to just do what I want to do with the CSS before I deploy it as a regular CSS file. If I wanted to do it that way, I wouldn't have asked this question. I'm curious how to do this properly and would like to know. What I do or could do beforehand with the CSS is irrelevant; I just want to know how to serve it properly :)
Note 2: I really would like to know how to do this properly. I feel most of the activity on this question has turned into me defending why I would want to do this, rather than getting answers on how to do this. Would very much appreciate it if someone could answer my question rather than just suggesting things like SASS. I'm sure it's awesome, and I might try it out sometime, but that's not what I'm asking about now. I want to know how to serve CSS through PHP and learn how to deal with the caching and things like that properly.
A commendable effort. Caching gets way too little good will. Please enjoy my short prose attempting to help you on your way.
The summary
Sending an ETag and a Last-Modified header will enable the browser to send an If-Modified-Since and an If-None-Match header back to your server on subsequent requests. You may then, when applicable, respond with a 304 Not Modified HTTP status code and an empty body. Including an Expires header tells the browser how long it may reuse its copy without asking; once that time has passed it checks back, so you can serve fresh content when the content has indeed changed.
The apprentice
Sounds simple enough, but it can be a bit tricky to get just right. Luckily for us all, there is really good guidance available.
Once you get it up and running, please turn to REDbot to help you smooth out any rough corners you may have left in.
The expert
For the value of the ETag, you will want to have something you can reproduce, but will still change whenever the content does. Otherwise you will not be able to tell whether the incoming value matches or not. A good candidate for a reproducible value which still changes when the content does, is an MD5 hash of the mtime of the file being served through the cache. In your case, it would probably be a sum for all the files being merged.
For Last-Modified the logical answer is the actual mtime of the file being served. Why neglect the obvious? Or for a group of files, as in your case, use the most recent mtime in the bunch.
For Expires, simply choose an appropriate TTL, or time-to-live, for the asset. Add this number to the asset's mtime, or the value you chose for Last-Modified, and you have your answer.
You may also want to include Cache-Control headers to let possible proxies on the way know how to properly serve their clients.
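Put together, the advice above could be sketched like this. It is one possible reading, not a complete HTTP implementation: the function names asset_validators and is_not_modified are invented, the ETag is an MD5 over the file paths and mtimes, and the file list is assumed to be non-empty.

```php
<?php
// Builds the validators for a group of merged files: the newest mtime becomes
// Last-Modified, and an MD5 over all paths and mtimes becomes the ETag.
function asset_validators(array $files): array
{
    $mtimes = array_map('filemtime', $files);
    return [
        'etag' => '"' . md5(implode('|', $files) . '|' . implode('|', $mtimes)) . '"',
        'last_modified' => max($mtimes),
    ];
}

// Decides whether a 304 Not Modified is appropriate. When If-None-Match is
// present it takes precedence and If-Modified-Since is ignored.
function is_not_modified(array $validators, ?string $ifNoneMatch, ?string $ifModifiedSince): bool
{
    if ($ifNoneMatch !== null) {
        foreach (array_map('trim', explode(',', $ifNoneMatch)) as $candidate) {
            // Ignore a weak-validator prefix (W/) when comparing.
            if ($candidate === '*' || preg_replace('/^W\//', '', $candidate) === $validators['etag']) {
                return true;
            }
        }
        return false;
    }
    if ($ifModifiedSince !== null) {
        $since = strtotime($ifModifiedSince);
        return $since !== false && $validators['last_modified'] <= $since;
    }
    return false;
}

// Usage at the top of the script that outputs the CSS (file names are placeholders):
// $v = asset_validators(['reset.css', 'site.css']);
// header('ETag: ' . $v['etag']);
// header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $v['last_modified']) . ' GMT');
// $inm = $_SERVER['HTTP_IF_NONE_MATCH'] ?? null;
// $ims = $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null;
// if (is_not_modified($v, $inm, $ims)) { http_response_code(304); exit; }
```

Sending the validators before the 304 check means they are included on both the 200 and the 304 response.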
The scholar
For a more concrete response to your question, please refer to these questions predating yours:
What headers do I want to send together with a 304 response?
Get Browser to send both If-None-Match and If-Modified-Since
HTTP if-none-match and if-modified-since and 304 clarification in PHP
Is my implementation of HTTP Conditional Get answers in PHP is OK?
The easiest way to serve CSS (or JavaScript) through PHP would be to use Assetic, a super-useful PHP asset manager similar to Django's contrib.staticfiles or Ruby's Jammit. It handles caching and cache invalidation, dynamic minification, compression, and all the "tricky bits" that were mentioned in other answers.
To understand how to write your own asset server properly, I strongly recommend you read Assetic's source code. It's very commented and readable, and you'll learn a lot about best practices regarding caching, minification, and everything else that Assetic does so well.
One common pattern is to include a meaningless GET parameter. In fact, Stack Exchange sites do exactly this:
<link ... href="http://cdn.sstatic.net/stackoverflow/all.css?v=0285b0392b5c">
The v (version) is presumably a hash of some kind, probably of the CSS file itself. They do not store the old sheets; it's just a way to force the browser to download the new file and not use the cached one.
With this setup, it is safe to set Cache-Control:max-age to a large value.
An ETag lets the server reply 304 when the file is not modified. You might as well use the same hash for it, computed from the file's contents rather than from its path:
header('ETag: "' . md5_file($pathToCssFile) . '"');
I just finished explaining here why I don't think PHP-processed CSS is a good idea; I believe most people who implement it would be better served by another application structure. Take a look.
If you must do it, making caching work will require keeping track of each variant independently and having the client send a parameter which uniquely identifies that variant (so you can say "not modified").
The Content-Type header is a good start, but not the tricky bit...
You have to add a query string to the end of the CSS or JavaScript file URL. That is a good way to tell the browser it is a new file; without it, browsers assume it is still the same CSS file.
www.example.com/css/tooltip.css?version1.0
or
www.example.com/css/tooltip.css?12-01-2012
The browser then treats this as a new file and reloads it, and keeps it in the cache until the next release. It is also easy to maintain if you append the date automatically to the query string using PHP.
Does going from
<script type="text/javascript" src="jquery.js"></script>
to
<script type="text/javascript">
<?php echo file_get_contents('jquery.js'); ?>
</script>
really speed things up?
I am thinking it does because php can fetch and embed a file's contents faster than the client's browser can make a full request for the file, because php isn't going over the network.
Is the main difference that the traditional method can be cached?
It may be faster on the first page load, but on every subsequent load it will be much slower. In the first example, the client browser would cache the result. In the second, it cannot.
If you only ever serve one single website in your client's life, then yes, because you only have one HTTP request instead of two.
If you are going to serve multiple sites which all link to the same javascript source file, then you're duplicating all this redundant data and not giving the client a chance to cache the file.
You need to transfer the bytes to the browser in both cases. The only difference is that you save an HTTP request in the latter case.
Also make sure to escape the javascript with CDATA or using htmlspecialchars.
If you include your JS lib in your HTML page, it cannot be cached by the browser. It's generally a good idea to keep the JS separate from the normal HTML code because the browser can cache it and does not need to fetch it on subsequent requests.
So to make it short, it's an optimization that works only if the page is called once by the user and jquery is not used on other pages.
Alternatively, you may want to use the jquery from google apis - with the effect that they are often in the browser's cache anyway, so there is no need to transfer the lib at all.
It does so for that ONE PAGE.
All subsequent pages using the same library (jquery.js downloaded from the same URL) SUFFER. If you reference the external file, yes, it has to be downloaded in an extra request (which is relatively cheap with HTTP/1.1 and pipelining), BUT provided your webserver serves it with useful headers (an Expires header far in the future), the browser caches that download, whereas with the "optimization" it has to retrieve the script with every single content page.
Also see pages like this one:
http://www.stevesouders.com/blog/2008/08/23/revving-filenames-dont-use-querystring/
(the keyword here is "revving" in connection with those far-future expiration dates)
The first one is better since the browser can cache the script. With the second version it will have to re-download the script every time it loads the page even if the script didn't change.
The only time the second version is an improvement is for scripts that cannot be cached by the browser.
It depends on how many files use the same file. However, in most situations this will be slower than your first piece of code, mostly because jquery.js can be cached.
Yes, that would initially be a performance optimization regarding the number of HTTP requests needed to serve the page. However, your page will become a bit bigger on every page load, whereas an external jquery.js would be cached in the browser after the first download.
It does if your page is static.
But if it's not static, your browser will download the page every time, including the embedded jQuery even though it hasn't changed. If you use src="jquery.js" and the page changes, the browser will load jQuery from cache and not download it again, so using src="jquery.js" is actually faster.
I need to confirm something before I go accuse someone of ... well I'd rather not say.
The problem:
We allow users to upload images and embed them within text on our site. In the past we allowed users to hotlink to our images as well, but due to server load we unfortunately had to stop this.
Current "solution":
The method the programmer used to solve our "too many connections" issue was to rename the file that receives and processes image requests (image_request.php) to image_request2.php, and replace the contents of the original with
<?php
header("HTTP/1.1 500 Internal Server Error") ;
?>
Obviously this has caused all images with their src attribute pointing to the original image_request.php to be broken, and is also the wrong code to be sending in this case.
Proposed solution:
I feel a more elegant solution would be:
In .htaccess
If the request is for image_request.php
Check referrer
If referrer is not our site, send the appropriate header
If referrer is our site, proceed to image_request.php and process image request
What I would like to know is:
Compared to simply returning a 500 for each request to image_request.php:
How much more load would be incurred if we were to use my proposed alternative solution outlined above?
Is there a better way to do this?
Our main concern is that the site stays up. I am not willing to agree that breaking all internally linked images is the best / only way to solve this. I refuse to tell our users that because of something WE changed they must now manually change the embed code in all their previously uploaded content.
Ok, then you can use mod_rewrite capability of Apache to prevent hot-linking:
http://www.cyberciti.biz/faq/apache-mod_rewrite-hot-linking-images-leeching-howto/
Using mod_rewrite will probably give you less load than running a PHP script. I think your solution would be lighter.
Make sure that you only block access in step 3 if the referer header is not empty. Some browsers and firewalls block the referer header completely and you wouldn't want to block those.
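The decision described in the proposed solution, including the empty-referer exception, can be sketched as follows. This shows the logic in PHP for readability; doing the same check in the web server avoids starting PHP at all. The function name referer_allowed and the host list are hypothetical.

```php
<?php
// Returns true when the image request should be served.
function referer_allowed(?string $referer, array $allowedHosts): bool
{
    // A missing or empty Referer is allowed through: some browsers and
    // firewalls strip the header, and those users should not be blocked.
    if ($referer === null || $referer === '') {
        return true;
    }
    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // present but unparseable: treat as a foreign site
    }
    return in_array(strtolower($host), $allowedHosts, true);
}

// Usage at the top of image_request.php (host names are placeholders):
// if (!referer_allowed($_SERVER['HTTP_REFERER'] ?? null, ['example.com', 'www.example.com'])) {
//     http_response_code(404);
//     exit;
// }
```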
I assume you store image paths in database with ids of images, right?
And then you query database for image path giving it image id.
I suggest you install Memcached on the server and cache the results of user requests. It's easy to do in PHP. After that you can look at the server load and decide whether you need to stop this hotlinking thing at all.
Your increased load is equal to that of a string comparison in PHP (zilch).
The obfuscation solution doesn't even solve the problem to begin with, as it doesn't stop future hotlinking from happening. If you do check the referrer header, make absolutely certain that all major mainstream browsers will set the header as you expect. It's an optional header, and the behavior might vary from browser to browser for images embedded in an HTML document.
You likely have sessions enabled for all requests (whether they're authenticated or not) -- as a backup plan, you can also rename your session cookie name to something obscure (edit: obscurity here actually doesn't matter as long as the cookie is set for your host only (and it is)) and check that a cookie by that name is set in image_request.php (no cookie set would indicate that it's a first-request to your site). Only use that as a fallback or redundancy check. It's worse than checking the referrer.
If you were generating the IMG HTML on the fly from markdown or something else, you could use a private-key hash strategy with a short-lived expiry time attached to the query string. Completely airtight, but it seems way over the top for what you're doing.
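For the curious, the private-key hash strategy could look something like this. The parameter names expires and sig, the function names, and the choice of HMAC-SHA256 are assumptions for the sketch; hash_hmac() and hash_equals() are standard PHP.

```php
<?php
// Appends an expiry time and an HMAC signature to an image URL.
function sign_image_url(string $path, int $expires, string $secretKey): string
{
    $sig = hash_hmac('sha256', $path . '|' . $expires, $secretKey);
    // Use & if the path already carries a query string.
    $sep = strpos($path, '?') === false ? '?' : '&';
    return $path . $sep . 'expires=' . $expires . '&sig=' . $sig;
}

// Checks that the link has not expired and that the signature matches.
function verify_image_request(string $path, int $expires, string $sig, string $secretKey, int $now): bool
{
    if ($expires < $now) {
        return false;
    }
    $expected = hash_hmac('sha256', $path . '|' . $expires, $secretKey);
    // hash_equals() compares in constant time to avoid timing leaks.
    return hash_equals($expected, $sig);
}

// Usage when rendering the page (key and lifetime are placeholders):
// $src = sign_image_url('/image_request.php?id=5', time() + 300, $secretKey);
// echo '<img src="' . htmlspecialchars($src) . '">';
```

A hotlinker who copies the URL gets a link that stops working after the expiry time, and cannot mint a new one without the key.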
Also, there is no "appropriate header" for lying to a client about the availability of a resource ;) Just send a 404.