Say I for some reason want to serve my CSS through PHP (because of pre-processing, merging, etc). What do I need to do in my PHP to make this work well? Other than the most obvious:
header('content-type: text/css; charset=utf-8');
What about headers related to caching, modification times, etags, etc? Which ones should I use, why and how? How would I parse incoming headers and respond appropriately (304 Not Modified for example)?
Note: I know this can be tricky and that it would be a lot easier to just do what I want to do with the CSS before I deploy it as a regular CSS file. If I wanted to do it that way, I wouldn't have asked this question. I'm curious how to do this properly and would like to know. What I do or could do beforehand with the CSS is irrelevant; I just want to know how to serve it properly :)
Note 2: I really would like to know how to do this properly. I feel most of the activity on this question has turned into me defending why I would want to do this, rather than getting answers on how to do this. Would very much appreciate it if someone could answer my question rather than just suggesting things like SASS. I'm sure it's awesome, and I might try it out sometime, but that's not what I'm asking about now. I want to know how to serve CSS through PHP and learn how to deal with the caching and things like that properly.
A commendable effort. Caching gets way too little good will. Please enjoy my short prose attempting to help you on your way.
The summary
Sending an ETag and a Last-Modified header will enable the browser to send an If-None-Match and an If-Modified-Since header back to your server on subsequent requests. You may then, when applicable, respond with a 304 Not Modified HTTP status code and an empty body. Including an Expires header will help you serve fresh content once the content has indeed changed.
The apprentice
Sounds simple enough, but it can be a bit tricky to get just right. Luckily for us all, there is really good guidance available.
Once you get it up and running, please turn to REDbot to help you smooth out any rough corners you may have left in.
The expert
For the value of the ETag, you will want something you can reproduce, but which still changes whenever the content does. Otherwise you will not be able to tell whether the incoming value matches or not. A good candidate is an MD5 hash of the mtime of the file being served through the cache. In your case, it would probably be a hash over the mtimes of all the files being merged.
For Last-Modified the logical answer is the actual mtime of the file being served. Why neglect the obvious? Or for a group of files, as in your case, use the most recent mtime in the bunch.
For Expires, simply choose an appropriate TTL, or time-to-live, for the asset. Add this number to the asset's mtime, or the value you chose for Last-Modified, and you have your answer.
You may also want to include Cache-Control headers to let possible proxies on the way know how to properly serve their clients.
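To make the above more concrete, here is a minimal sketch of a conditional GET for a set of merged CSS files. The function names (cssValidators, isNotModified, serveCss) and the one-day TTL are my own assumptions, not anything PHP provides. Note that I compute Expires from the current time rather than from the mtime, so that an old file does not expire immediately; that is a deliberate deviation from the description above. It also assumes the file list is non-empty and every file exists.

```php
<?php
// Sketch only: all function names and the default TTL are hypothetical.

/** Returns [ETag, last-modified timestamp] for a non-empty list of existing files. */
function cssValidators(array $files): array
{
    $mtimes = array_map('filemtime', $files);
    $lastModified = max($mtimes); // most recent mtime in the bunch
    // Hash over every path and mtime, so the tag changes when any file changes.
    $etag = '"' . md5(implode('|', $files) . '|' . implode('|', $mtimes)) . '"';
    return [$etag, $lastModified];
}

/** True when the validators sent by the client still match, i.e. a 304 is appropriate. */
function isNotModified(string $etag, int $lastModified, ?string $ifNoneMatch, ?string $ifModifiedSince): bool
{
    if ($ifNoneMatch !== null) {
        // If-None-Match takes precedence over If-Modified-Since.
        foreach (array_map('trim', explode(',', $ifNoneMatch)) as $candidate) {
            // Ignore the weak-validator prefix when comparing.
            if ($candidate === '*' || preg_replace('#^W/#', '', $candidate) === $etag) {
                return true;
            }
        }
        return false;
    }
    if ($ifModifiedSince !== null) {
        $since = strtotime($ifModifiedSince);
        return $since !== false && $lastModified <= $since;
    }
    return false;
}

/** Sends the headers and either a 304 or the merged CSS. */
function serveCss(array $files, int $ttl = 86400): void
{
    [$etag, $lastModified] = cssValidators($files);
    header('Content-Type: text/css; charset=utf-8');
    header('ETag: ' . $etag);
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
    header('Cache-Control: public, max-age=' . $ttl);
    header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $ttl) . ' GMT');

    $match = $_SERVER['HTTP_IF_NONE_MATCH'] ?? null;
    $since = $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null;
    if (isNotModified($etag, $lastModified, $match, $since)) {
        http_response_code(304); // no body on a 304
        return;
    }
    foreach ($files as $file) {
        readfile($file);
    }
}
```

Calling serveCss(['reset.css', 'layout.css']) at the top of your script, before any other output, would be the intended usage. Keeping the comparison in isNotModified separate from the header calls makes it easy to test without a web server.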
The scholar
For a more concrete response to your question, please refer to these questions predating yours:
What headers do I want to send together with a 304 response?
Get Browser to send both If-None-Match and If-Modified-Since
HTTP if-none-match and if-modified-since and 304 clarification in PHP
Is my implementation of HTTP Conditional Get answers in PHP is OK?
The easiest way to serve CSS (or JavaScript) through PHP would be to use Assetic, a super-useful PHP asset manager similar to Django's contrib.staticfiles or Ruby's Jammit. It handles caching and cache invalidation, dynamic minification, compression, and all the "tricky bits" that were mentioned in other answers.
To understand how to write your own asset server properly, I strongly recommend you read Assetic's source code. It's very commented and readable, and you'll learn a lot about best practices regarding caching, minification, and everything else that Assetic does so well.
One common pattern is to include a meaningless GET parameter. In fact, Stack Exchange sites do exactly this:
<link ... href="http://cdn.sstatic.net/stackoverflow/all.css?v=0285b0392b5c">
The v (version) is presumably a hash of some kind, probably of the css file itself. They do not store the old sheets, it's just a way to force the browser to download the new file and not use the cached one.
With this setup, it is safe to set Cache-Control:max-age to a large value.
An ETag lets the server reply with 304 when the file has not been modified; you might as well use the same hash of the file's contents:
header('ETag: "' . md5_file("path to css file") . '"');
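As a rough sketch of how such a version parameter could be generated, assuming the hash is taken from the file's contents (the function name versionedUrl and the 12-character length are my own choices, not what Stack Exchange actually does):

```php
<?php
// Sketch only: versionedUrl() is a hypothetical helper.

/** Appends a short content hash to an asset URL so that a changed file gets a new URL. */
function versionedUrl(string $url, string $file): string
{
    $version = substr(md5_file($file), 0, 12);
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . 'v=' . $version;
}

// In a template (paths are placeholders):
//   <link rel="stylesheet" href="<?= htmlspecialchars(versionedUrl('/css/all.css', __DIR__ . '/css/all.css')) ?>">
// The response for that URL can then carry a long lifetime, for example:
//   header('Cache-Control: public, max-age=31536000');
```

Hashing the file on every page view has a cost of its own, so in practice you would probably compute the value once at deploy time or cache it.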
I just finished explaining here why I don't think PHP-processed CSS is a good idea; I believe most people who implement it would be better served by another application structure. Take a look.
If you must do it, making caching work will require keeping track of each variant independently and having the client send a parameter which uniquely identifies that variant (so you can say "not modified").
The Content-Type header is a good start, but not the tricky bit...
You have to add a query string to the end of the CSS (or JavaScript) file's URL. That is a good way to signal that the file is new; without it, browsers assume it is still the same CSS file and keep using the cached copy.
www.example.com/css/tooltip.css?version1.0
or
www.example.com/css/tooltip.css?12-01-2012
The browser then treats this as a new file and downloads it again, keeping it in the cache until the next release. It is also easy to maintain if you use PHP to append the date to the query string automatically.
Related
Consider the minimal example: Using php I have a form that you enter text and it produces an image of the text. When I then change the text and update, I don't see the new image because I assume, it is being cached. Is there some way to automatically remove this one image file from the cache when I update it?
This is frequently handled by adding a random string or timestamp to the query.
i.e. <img src="/images/image.jpg?timestamp=1357571065" />
The typical solution is what ceejazoz gave in this answer: an additional timestamp added as a request parameter. That way the url is different each time, so no cache or proxy will deliver a cached version.
However although that works it is an ugly workaround.
The clean solution is to specify headers when delivering the image. Those headers make sure that the image is not cached. That is what headers are there for: defining how resources are meant to be used. The drawback: the out-of-the-box configuration of today's HTTP servers used to deliver static images does not offer a way to specify such headers, because in 99.99% of all cases it makes no sense. So you will have to write your own mechanism. Not really difficult, but effort nonetheless. Using the above workaround certainly is easier and less hassle.
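A minimal sketch of such a mechanism, assuming the image is delivered through a small PHP script instead of directly by the web server (the names noCacheHeaders and sendUncachedImage are made up for this example):

```php
<?php
// Sketch only: both function names are hypothetical.

/** Header lines that tell browsers and proxies not to keep a copy. */
function noCacheHeaders(): array
{
    return [
        'Cache-Control: no-store, no-cache, must-revalidate, max-age=0',
        'Pragma: no-cache', // for old HTTP/1.0 caches
        'Expires: 0',       // an invalid date is treated as already expired
    ];
}

/** Streams an image file with the anti-caching headers applied. */
function sendUncachedImage(string $file, string $mimeType = 'image/jpeg'): void
{
    foreach (noCacheHeaders() as $line) {
        header($line);
    }
    header('Content-Type: ' . $mimeType);
    header('Content-Length: ' . filesize($file));
    readfile($file);
}
```

The img tag would then point at the script rather than at the static file, which is exactly the extra effort mentioned above.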
And to give a precise answer to your actual question:
Clearing a single cached object from 'the' cache usually is not possible, though that actually depends on which cache you are talking about. If it is just the browser's cache whilst you are developing (testing), then just do a 'deep reload' (something like CTRL-SHIFT-R or CTRL-F5, depending on your browser). But this clears all cached objects of the current page. There is no easy way to clear a server-side cache or even a proxy in between.
Add a query string to the image perhaps containing the julian time + a random number so for example your image URL becomes:
//.../myimage.jpg?112233445566-954967254
You can force the browser to revalidate its cached copy:
header("Cache-Control: no-cache, must-revalidate");
I had a thought...
Dunno if it's a good one or a bad one.
I am working on an image-less/responsive theme for an SMF fork. I was thinking, since it's written in PHP, would it be valid to PHP-include a "style.php" in the header, containing all the styles for the pages?
I was thinking this would give me two major benefits. One would be that I could start adding variables to the CSS file. Two, it would be one less HTTP request. I know that PageSpeed and YSlow would bitch about the CSS being included inside the page between <style> tags, but it is none the wiser, correct?
As far as I can tell, I see a lot of benefits in doing it this way regardless of what PageSpeed/YSlow thinks. I could even do this with JavaScript, maybe...
I wonder if the IE maximum 4096 CSS rules would still apply?
I am a PHP Ultra Noob, but have a good amount of experience in web design. I can't seem to find a reason "not" to do it. Any experts willing to share their thoughts on this idea?
I don't think it's a good idea. If you want to use variables in your CSS, look at SASS or LESS. Regarding the additional request, CSS is static, so if you do your job on the server side, the browser will retrieve the CSS only once, and subsequent requests will use the cached copy.
I don't think this can be harmful, however it's quite a divergence from standard development, so it's not a good idea just for this. Also, since nobody does it, it must not be such a smart invention.
A CSS file is generally better for speed, because it is requested only once and then cached for a long time. It costs 1 extra request for the whole browsing session if the file isn't already in the visitor's cache, compared to sending the same CSS over and over again in the head of every page, which makes your actual pages load slower. All in all, after a few requests a separate (cacheable) file usually already wins out, provided you set it to be cacheable for a long time. Don't worry about people not seeing CSS changes: if you change your CSS, just add a query parameter like /styles.css?rev=1. You don't use that parameter, you just increase it whenever your CSS changes, thus making the client request a fresh copy.
That doesn't mean you can't use PHP (or nodejs/less for that matter) to create or serve your CSS file, variables are indeed nice to have. If going the less route, DO convert it to css once on your own server instead of bothering clients with heavy javascript to convert it again and again.
You can actually include anything as a CSS file if it's valid CSS (and actually even if it's not, I suppose):
<link rel="stylesheet" type="text/css" href="/style.php">
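<?php
// style.php
// The header must be sent before any output, so nothing may precede the opening tag.
header('Content-Type: text/css; charset=utf-8');
$style = 'bold';
?>
strong {
    font-weight: <?php echo $style; ?>;
}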
Why don't people make .php files for their CSS and JavaScript files?
Adding <?php header("Content-Type: text/javascript; charset=UTF-8"); ?> to the top of the file makes it readable by browsers, and you can do the same thing for CSS files by setting the Content-Type header to text/css.
It lets you use PHP's variables and functions inside the other languages, letting you, for example, change the theme's main colors in CSS depending on user preferences, or preload data that your JavaScript can use on document load.
Are there bad sides of using this technique?
People do it more often than you think. You just don't get to see it, because usually this technique is used in combination with URL rewriting, which means the browser can't tell the difference between a statically-served .css file and a dynamic stylesheet generated by a PHP script.
However, there are a few strong reasons not to do it:
In a default configuration, Apache treats PHP script output as 'subject to change at any given time', and sets appropriate headers to prevent caching (otherwise, dynamic content wouldn't really work). This, however, means that the browser won't cache your CSS and JavaScript, which is bad - they'll be reloaded over the network for every single page load. If you have a few hundred page loads per second, this stuff absolutely matters, and even if you don't, the page's responsiveness suffers considerably.
CSS and JavaScript, once deployed, rarely change, and reasons to make them dynamic are really rare.
Running a PHP script (even if it's just to start up the interpreter) is more expensive than just serving a static file, so you should avoid it unless absolutely necessary.
It's pretty damn hard to make sure the Javascript you output is correct and secure; escaping dynamic values for Javascript isn't as trivial as you'd think, and if those values are user-supplied, you are asking for trouble.
And there are a few alternatives that are easier to set up:
Write a few stylesheets and select the right one dynamically.
Make stylesheet rules based on class names, and set those dynamically in your HTML.
For javascript, define the dynamic parts inside the parent document before including the static script. The most typical scenario is setting a few global variables inside the document and referencing them in the static script.
Compile dynamic scripts into static files as part of the build / deployment process. This way, you get the comfort of PHP inside your CSS, but you still get to serve static files.
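For the third alternative (globals defined in the parent document, static script afterwards), a small sketch of how the dynamic values could be emitted safely. The helper name inlineScriptData and the APP_CONFIG global are purely illustrative; the json_encode flags are the part that matters, since they keep characters like < and > from ending the script element early.

```php
<?php
// Sketch only: inlineScriptData() is a hypothetical helper.
// $globalName must be a trusted, hard-coded identifier, never user input.

/** Renders a <script> element that assigns $data to window.<globalName>. */
function inlineScriptData(string $globalName, array $data): string
{
    // The HEX flags escape <, >, &, ' and " inside string values as \uXXXX sequences.
    $flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR;
    return '<script>window.' . $globalName . ' = ' . json_encode($data, $flags) . ';</script>';
}

// In the page template, before the static script is included:
//   echo inlineScriptData('APP_CONFIG', ['theme' => $userTheme]);
//   <script src="/js/app.js"></script>   (app.js then reads window.APP_CONFIG)
```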
If you want to use PHP to generate CSS dynamically after all:
Override the caching headers to allow browsers and proxies to cache them. You can even set the cache expiration to 'never', and add a bogus query string parameter (e.g. <link rel="stylesheet" type="text/css" href="http://example.com/stylesheet.css?dummy=121748283923">) and change it whenever the script changes: browsers will interpret this as a different URL and skip the cached version.
Set up URL rewriting so that the script's URL has a .css extension: some browsers (IE) are notorious for getting the MIME type wrong under some circumstances when the extension doesn't match, despite correct Content-Type headers.
Some do. The better approach is to generate your JS/CSS in PHP and cache the result to a file.
If you serve all of your CSS/JS files using PHP, then you have to invoke PHP more which incurs more overhead (cpu and memory) which is unnecessary when serving static files. Better to just let the web server (Apache/nginx/lighttpd/iis etc) do their job and serve those files for you without the need for PHP.
Running the PHP engine does not have a zero cost, in either time or CPU. And since CSS and JavaScript files rarely change, having them run through the engine to do absolutely nothing is pointless; better to let the browser cache them when appropriate instead.
Here’s one method I’ve used: The HTML page contains a reference to /path/12345.stylesheet.css. That file does not exist. So .htaccess routes the request to /path/index.php. That file (a) does a database request, (b) creates the CSS, (c) saves the file for next time, (d) serves the CSS to the browser. That means that the very next time there’s a request for /path/12345.stylesheet.css, there actually is a physical static file there to be served by Apache as normal.
Oh, and whenever the styles rules are edited (a) the static file is deleted, and (b) the reference ID is changed, so that the HTML page will in future contain a reference to /path/10995.stylesheet.css, or whatever. (Actually, I use a UNIX timestamp.)
I use a similar method to create image thumbnails: create the file on first request, and save a static file in the same place for future requests. I’ve never had occasion to do the same for javascript, but there’s no fundamental reason why not.
This also means that I don’t need to worry about caching headers in PHP: only the first invocation of each CSS file (or image thumbnail) goes through PHP, and if that is served with anti-caching headers, that’s no great problem.
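Roughly, the index.php side of that could look like the sketch below. The function name buildAndCacheCss and the generator callback are placeholders for this illustration; the database query and the .htaccess rule are left out, and it assumes the target directory is writable by PHP.

```php
<?php
// Sketch only: buildAndCacheCss() is a hypothetical helper; $generate stands in
// for whatever builds the CSS (database lookup, templates, ...).

/** Returns the CSS for $targetPath, generating and saving it on the first call. */
function buildAndCacheCss(string $targetPath, callable $generate): string
{
    if (is_file($targetPath)) {
        // Normally the web server serves this file itself and PHP is never reached.
        return file_get_contents($targetPath);
    }
    $css = $generate();
    // Write to a temporary name first, then rename, so a concurrent request
    // never sees a half-written file.
    $tmp = $targetPath . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $css);
    rename($tmp, $targetPath);
    return $css;
}

// In index.php (the path and the callback are placeholders):
//   header('Content-Type: text/css; charset=utf-8');
//   echo buildAndCacheCss(__DIR__ . '/12345.stylesheet.css', 'generateCssFromDatabase');
```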
Sometimes you might have to dynamically create javascript or styles.
The issue is that web servers are optimized to serve static content. Dynamically generating content with PHP can be a huge performance hit because it needs to be generated on each request.
It's not a bad idea, or all that uncommon, but there are disadvantages. Caching is an important consideration - you need to let browsers cache when the content is the same, but refresh when it will vary (e.g. when someone else logs in). Any query string will immediately stop some browsers caching, so you'll need some rewrite rules as well as HTTP headers.
Any processing that takes noticeable time, or requires a lock on something (e.g. session_start) will hold up the browser while it waits for the asset.
Finally, and quite importantly, mixing languages can make editing code harder - syntax highlighting and structure browsers may not cope, and overlapping syntax can lead to ugly things like multiple backslash escapes.
In JavaScript, it can be useful to convert some PHP data into (JSON) variables, and then proceed with static JS code. There is also a performance benefit to concatenating multiple JS files so the browser downloads them all in one go.
For CSS, there are specific languages such as Less which are more suited to the purpose. Using LessPHP (http://leafo.net/lessphp/) you can easily initialize a Less template with variables and callbacks from your PHP script.
PHP is often used as a processor to generate dynamic content. It takes time to process a page and then send it. For the sake of efficiency (both for the server and time spent in programming) dynamic JS or CSS files are only created if there isn't a possible way for the static file to successfully accomplish its intended goal.
I recommend only doing this if you absolutely require the assistance of a dynamic, database-driven processor.
The bad sides: plenty, but to name just a few:
It'll be dead slow: constructing custom stylesheets for each request puts a huge load on the server, not something you want.
Designers create CSS files, programmers shouldn't (in some cases shouldn't be allowed to). It's not their job/their speciality.
Mixing JS and PHP is, IMHO, one of the greatest mistakes one can make. With jQuery being a very popular lib, using the $ sign, it might be a huge source of bugs and syntax errors. Besides that: JS is a completely different language than virtually any other programming language. Very few people know how to get the most out of it, and letting PHP developers write vast JS scripts often ends in tears. JavaScript is a functional OO (prototypal) language. People who don't fully understand these crucial differences write bad code as a result. I know, because I've written tons of terrible JS code.
Why would you want to do this, actually? PHP allows you to change all elements' classes while generating the page; just make sure the classes have corresponding style rules in your CSS files and the colours will change as you want them, without having to send various files, mess with headers and deal with all the headaches that come with this practice.
If you want more reasons why you shouldn't do this, I can think of at least another few dozens. That said: I can only think of 1 reason why you would think of doing this: it makes issues caused by client-side cached scripts less of an issue. Not that it should be an issue in the first place, but hey...
I am considering building a website using php to deliver different html depending on browser and version. A question that came to mind was, which version would crawlers see? What would happen if the content was made different for each version, how would this be indexed?
The crawlers see the page you show them.
See this answer for info on how Googlebot identifies itself. Also remember that if you show different content to the bot than what the users see, your page might be excluded from Google's search results.
As a sidenote, in most cases it's really not necessary to build separate HTML for different browsers, so it might be best to rethink that strategy altogether which will solve the search engine indexing issue as well.
The crawlers would see the page that you have specified for them to see via your user-agent handling.
Your idea seems to suggest trying to trick the indexer somehow. Don't do that.
You'd use the User-Agent HTTP Header, which is often sent by the browsers, to identify the browsers/versions that interest you, and send a content that would be different in some cases.
So, the crawlers would receive the content you'd send for their specific User-Agent string -- or, if you don't code a specific case for those, your default content.
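To illustrate the fallback behaviour just described, here is a tiny sketch. The function name pickVariant, the variant labels and the single MSIE check are assumptions made up for this example; real User-Agent detection is considerably messier.

```php
<?php
// Sketch only: pickVariant() and its variant names are hypothetical.

/** Chooses which markup variant to send for a given User-Agent string. */
function pickVariant(string $userAgent): string
{
    if (stripos($userAgent, 'MSIE') !== false) {
        return 'legacy-ie';
    }
    // Anything not explicitly handled, crawlers included, gets the default content.
    return 'default';
}

// Typical call: pickVariant($_SERVER['HTTP_USER_AGENT'] ?? '');
```

A crawler that identifies itself with something like "Googlebot/2.1" matches no special case here, so it would be sent (and would index) the default variant.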
Still, note that Google doesn't really appreciate it if you send it content that is not the same as what real users get (and if someone using a given browser sends a link to a friend who, using another browser, doesn't see the same thing, this will not feel "right").
Basically: sending content that differs depending on the browser is not really a good practice, and should in most/all cases be avoided.
That depends on what content you'll serve to bots. Crawlers usually identify themselves as some bot or other in the user agent header, not as a regular browser. Whatever you serve these clients is what they'll index.
The crawler obviously only sees the version your server hands to it.
If you create a designated version for the search engine, this version would be indexed (and may eventually get you banned from the index).
If you have a version for the default/undetected browser - this one.
If you have no default version - nothing would be indexed.
Sincerely yours, colonel Obvious.
PS. Assuming you are talking of contents, not markup. Search engines do not index markup.
I need to confirm something before I go accuse someone of ... well I'd rather not say.
The problem:
We allow users to upload images and embed them within text on our site. In the past we allowed users to hotlink to our images as well, but due to server load we unfortunately had to stop this.
Current "solution":
The method the programmer used to solve our "too many connections" issue was to rename the file that receives and processes image requests (image_request.php) to image_request2.php, and replace the contents of the original with
<?php
header("HTTP/1.1 500 Internal Server Error") ;
?>
Obviously this has caused all images with their src attribute pointing to the original image_request.php to be broken, and is also the wrong code to be sending in this case.
Proposed solution:
I feel a more elegant solution would be:
In .htaccess:
1. If the request is for image_request.php
2. Check referrer
3. If referrer is not our site, send the appropriate header
4. If referrer is our site, proceed to image_request.php and process image request
What I would like to know is:
Compared to simply returning a 500 for each request to image_request.php:
How much more load would be incurred if we were to use my proposed alternative solution outlined above?
Is there a better way to do this?
Our main concern is that the site stays up. I am not willing to agree that breaking all internally linked images is the best / only way to solve this. I refuse to tell our users that because of something WE changed they must now manually change the embed code in all their previously uploaded content.
Ok, then you can use mod_rewrite capability of Apache to prevent hot-linking:
http://www.cyberciti.biz/faq/apache-mod_rewrite-hot-linking-images-leeching-howto/
Using mod_rewrite will probably give you less load than running a PHP script. I think your solution would be lighter.
Make sure that you only block access in step 3 if the referer header is not empty. Some browsers and firewalls block the referer header completely and you wouldn't want to block those.
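If you end up doing the check in PHP rather than in .htaccess, the logic could be sketched like this. The function name isHotlink and the host list are assumptions for illustration; the point is that an empty referer is let through.

```php
<?php
// Sketch only: isHotlink() is a hypothetical helper.

/** True when a referer is present and its host is not one of ours. */
function isHotlink(?string $referer, array $allowedHosts): bool
{
    if ($referer === null || $referer === '') {
        // Some browsers and firewalls strip the header; do not block those.
        return false;
    }
    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host)) {
        return true; // a referer we cannot parse is treated as foreign
    }
    return !in_array(strtolower($host), $allowedHosts, true);
}

// At the top of image_request.php (the host names are placeholders):
//   if (isHotlink($_SERVER['HTTP_REFERER'] ?? null, ['example.com', 'www.example.com'])) {
//       http_response_code(403);
//       exit;
//   }
```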
I assume you store image paths in database with ids of images, right?
And then you query database for image path giving it image id.
I suggest you install MemCached to the server and do caching of user requests. It's easy to do in PHP. After that you will see server load and decide if you should stop this hotlinking thing at all.
Your increased load is equal to that of a string comparison in PHP (zilch).
The obfuscation solution doesn't even solve the problem to begin with, as it doesn't stop future hotlinking from happening. If you do check the referrer header, make absolutely certain that all major mainstream browsers will set the header as you expect. It's an optional header, and the behavior might vary from browser to browser for images embedded in an HTML document.
You likely have sessions enabled for all requests (whether they're authenticated or not) -- as a backup plan, you can also rename your session cookie name to something obscure (edit: obscurity here actually doesn't matter as long as the cookie is set for your host only (and it is)) and check that a cookie by that name is set in image_request.php (no cookie set would indicate that it's a first-request to your site). Only use that as a fallback or redundancy check. It's worse than checking the referrer.
If you were generating the IMG HTML on the fly from markdown or something else, you could use a private key hash strategy with a short-lived expiry time attached to the query string. Completely airtight, but it seems way over the top for what you're doing.
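For what it's worth, the private key hash idea could be sketched as below. The function names, the parameter names (expires, sig) and the choice of HMAC-SHA256 are my own assumptions; it also assumes the path has no query string of its own.

```php
<?php
// Sketch only: signImageUrl() and verifyImageRequest() are hypothetical helpers.
// $now is injectable purely to make the functions testable.

/** Builds an image URL that is only valid for $ttl seconds. */
function signImageUrl(string $path, string $secret, int $ttl, ?int $now = null): string
{
    $expires = ($now ?? time()) + $ttl;
    $sig = hash_hmac('sha256', $path . '|' . $expires, $secret);
    return $path . '?expires=' . $expires . '&sig=' . $sig;
}

/** Checks the expiry time and the signature of an incoming request. */
function verifyImageRequest(string $path, int $expires, string $sig, string $secret, ?int $now = null): bool
{
    if (($now ?? time()) > $expires) {
        return false; // link has expired
    }
    $expected = hash_hmac('sha256', $path . '|' . $expires, $secret);
    return hash_equals($expected, $sig); // constant-time comparison
}
```

Someone copying the generated URL to another site would get a working image only until the expiry time passes, which is why the pages embedding the images have to be generated on the fly.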
Also, there is no "appropriate header" for lying to a client about the availability of a resource ;) Just send a 404.