Does enabling cache affect dynamic content?
For example, on one of my PHP sites, the cache header is set to:
Cache-Control: public, max-age=21600
Does that affect dynamic content as well?
If so, then what is the standard way to cache a dynamic PHP site? Presumably, you would cache static content (e.g. images), but not dynamic content (e.g. html, text, etc). How and where would you accomplish that?
Yes. The answer is yes. This controls browser and some ISP caching. It will cache dynamic content for the time you specify in many locations. Not all. Not all browsers will follow the rules, not all ISPs will follow the rules. Famously, AOL used to disregard people's cache rules and cache everything for strange times leading to broken pages on the early web.
On a dynamic page that isn't otherwise cached, you can use this value to cache pages that change relatively infrequently for 10 minutes to an hour. Images, CSS, JS files and the like can be cached for longer. For pages, caching for 8 hours is probably too much: it is much longer than 1 hour, but compared with a 1-hour cache it only saves about 7 extra requests per visitor over that period.
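As a rough sketch of that split between pages and static files, the header value could be chosen per resource type. The function name, the extension list and the two lifetimes are assumptions made for illustration, not values prescribed by the answer above.

```php
<?php
// Hypothetical helper: pick a Cache-Control value from the file extension.
// The extension list and lifetimes are illustrative assumptions.
function cacheControlFor(string $path): string
{
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    $static = ['png', 'jpg', 'jpeg', 'gif', 'css', 'js'];

    if (in_array($ext, $static, true)) {
        return 'public, max-age=21600';   // 6 hours for static assets
    }
    return 'public, max-age=600';         // 10 minutes for generated pages
}

// In a real request the value would be sent with header(), for example:
// header('Cache-Control: ' . cacheControlFor(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH)));
echo cacheControlFor('/img/logo.png'), "\n";   // public, max-age=21600
echo cacheControlFor('/events.php'), "\n";     // public, max-age=600
```

Pages that differ per logged-in user would normally need `private` (or no header caching at all) rather than `public`.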
Other Caching
There are other, probably more reliable, ways to cache content. You can look into query caching, file caching, memcached. All of these options can also be used to cache things other than content as well. They will all help you speed up repetitive actions.
Query Caching
Many databases, MySQL being the de facto standard, offer query caching. This will cache the results of queries on tables that haven't been updated since the last time the query was run. Perfect for normalized look-up tables. Ideal for tables that are updated only once in a blue moon. Works well for tables that are updated about once an hour to even every 10 minutes. For tables that change more often than that, it will produce limited time savings.
File Caching and Memcached
These can be used to cache key/value pairs of information. They can be page_url/page_content or page_list/array_of_pages_in_site or any other key/value pairs you need. This is how most people go about caching pages for 10-60 minutes these days. They are reliable, controlled on the server, and can be flushed instantly if needed. They don't need to be time based: if your logic is written correctly, you can treat these like the query cache. You flush a key from the cache only when its information is updated, so the page updates instantly after a change and otherwise sits in cache.
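A minimal sketch of the file-cache variant, assuming a hypothetical `FileCache` class and a temp directory as storage. The same get/set/delete shape applies to memcached. Note that a stored `null` is indistinguishable from a miss in this simplified version.

```php
<?php
// Hypothetical file-backed key/value cache; class and method names are illustrative.
final class FileCache
{
    private string $dir;

    public function __construct(string $dir)
    {
        $this->dir = rtrim($dir, '/');
        if (!is_dir($this->dir)) {
            mkdir($this->dir, 0777, true);
        }
    }

    // One file per key; hashing keeps arbitrary keys filesystem-safe.
    private function path(string $key): string
    {
        return $this->dir . '/' . sha1($key) . '.cache';
    }

    public function get(string $key)
    {
        $file = $this->path($key);
        if (!is_file($file)) {
            return null;                       // cache miss
        }
        return unserialize(file_get_contents($file));
    }

    public function set(string $key, $value): void
    {
        file_put_contents($this->path($key), serialize($value), LOCK_EX);
    }

    // Flush a single key, e.g. right after the underlying data was updated.
    public function delete(string $key): void
    {
        $file = $this->path($key);
        if (is_file($file)) {
            unlink($file);
        }
    }
}

$cache = new FileCache(sys_get_temp_dir() . '/demo_cache');
$cache->set('page_list', ['home', 'about']);
// ... a page is added, so the key is flushed and the next read rebuilds it
$cache->delete('page_list');
var_dump($cache->get('page_list'));   // NULL
```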
Header Cache
Which brings us back to header cache. It is still smart to cache here for about 10 minutes even with the other caches in place. The other caches still require requests to the server, which can slow it down. While this won't reduce that pressure by much, it will reduce it. And it doesn't take much effort to set up.
The idea is that your website is made up of resources [pages, images, even scripts], and you provide an expiry limit for every one of them, or invalidate the cache for elements you modified [e.g. you added a new post on the homepage, or you edited an entry].
A common solution is to use a reverse proxy like Varnish, which serves cached content to the client very quickly and fetches a newer version when the cache headers of your content say it has changed.
The caching header generator process is up to you - you can find some ideas here.
What are the advantages and disadvantages of caching in web development? In PHP, how does it affect the database?
Caching works in many different ways, but for PHP specifically I can think of a few ways;
Database calls; they are slow, require computation, and can be quite intensive. If you've got repeated calls, caching the query is golden. There are two levels: the PHP side, where you control the cache, and the database side, where the database does.
Running PHP code means the webserver calls the PHP interpreter, which parses the code and then runs it. A PHP cacher can cache the parsing part and go straight to the running part. Then there's the next generation of compiling PHP code directly to C++ and running it from there (as Facebook did with HipHop).
Computations; if you're doing math or heavy lifting in a repeated operation, you can cache the result instead of calculating it every time.
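The last point, caching the result of a repeated computation on the PHP side, can be sketched with a small memoizing wrapper. `memoize` and `$slowSum` are hypothetical names. The cache here lives only for the current request, and a shared store would be needed to keep results across requests.

```php
<?php
// Hypothetical helper: wrap a pure function so each distinct argument list is computed once.
function memoize(callable $fn): callable
{
    $cache = [];
    return function (...$args) use ($fn, &$cache) {
        $key = serialize($args);              // arguments identify the cached result
        if (!array_key_exists($key, $cache)) {
            $cache[$key] = $fn(...$args);     // computed only on a miss
        }
        return $cache[$key];
    };
}

$calls = 0;
$slowSum = function (int $n) use (&$calls): int {
    $calls++;                                 // counts how often the real work runs
    return array_sum(range(1, $n));
};

$fastSum = memoize($slowSum);
echo $fastSum(1000), "\n";   // 500500
echo $fastSum(1000), "\n";   // 500500, served from the cache
echo $calls, "\n";           // 1
```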
Advantages;
speed
less resources used
reuse
being smart
Disadvantages;
stale data
overhead
complexity
I'll only deal with the disadvantages here;
First, stale data; this means that when you use cached content/data you are at risk of presenting old data that's no longer relevant to the new situation. If you've cached a query of products, but in the meantime the product manager has deleted four products, the users will get listings for products that don't exist. There's a great deal of complexity in figuring out how to deal with this, but mostly it's about creating hashes/identifiers for caches that mean something to the state of the data in the cache, or business logic that resets the cache (or updates, or appends) with the new data bits. This is a complicated field, and depends very much on your requirements.
Then overhead is all the business logic you use to make sure your data sits somewhere between being fast and being stale, which leads to complexity, and complexity leads to more code that you need to maintain and understand. You'll easily lose track of where data lives in the caching layers, at what level, and how to fix stale data when you get it. It can easily get out of hand, so instead of caching based on complex logic you fall back to simple timestamps: you just say that a query is cached for a minute or so and hope for the best (which, admittedly, can be quite effective and not too crazy). You could give your cache entries lifetimes (say, an entry lives X minutes in the cache) vs. access limits (it lives for 10 requests) vs. fixed deadlines (it lives until 10pm), and variations thereof. The more variation, the more complexity, of course.
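The simple lifetime approach could look roughly like this. `TtlCache` is a hypothetical in-memory class, and the optional `$now` parameter is an assumption added only so the expiry logic can be exercised without waiting.

```php
<?php
// Hypothetical in-memory cache where every entry carries an expiry timestamp.
final class TtlCache
{
    private array $items = [];

    public function set(string $key, $value, int $ttlSeconds, ?int $now = null): void
    {
        $now = $now ?? time();
        $this->items[$key] = ['value' => $value, 'expires' => $now + $ttlSeconds];
    }

    public function get(string $key, ?int $now = null)
    {
        $now = $now ?? time();
        if (!isset($this->items[$key]) || $this->items[$key]['expires'] <= $now) {
            unset($this->items[$key]);        // drop expired entries on read
            return null;                      // miss: caller re-runs the query
        }
        return $this->items[$key]['value'];
    }
}

$cache = new TtlCache();
$cache->set('products', ['chair', 'table'], 60, 1000);  // cached at t=1000 for 60 seconds
var_dump($cache->get('products', 1030) !== null);       // bool(true), still fresh
var_dump($cache->get('products', 1060));                // NULL, lifetime is over
```

Access-count or fixed-deadline expiry would follow the same shape with a different condition in `get`.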
However, having said that, caching can turn a bog of a system into quite a snappy little vixen without too much effort or complexity. A little can get you a long way, and writing systems that use caching as a core component is something I'd recommend.
The main advantage, and also the goal, of caching is speeding up loading and minimizing system resources needed to load a page.
The main disadvantage lies in the implementation: the developers have to build the caching correctly and then maintain it, and the site admin needs to be able to manage it properly.
The above statements are purely said in general terms.
Caching is used to reduce hefty/slow operations (heavy calculations/parsing/database operations) which will consistently produce the same result. Caching this result will reduce the server load and speed up the application (because the hefty/slow operation does not need executing).
The disadvantage is that it'll often increase complexity of the application, because the cache should be purged/altered when the result of the operation will no longer be the result cached.
Simple example: a website whose navigation is stored in the database could cache the navigation once the navigation has been fetched from the database, thus reducing the total amount of db-calls, because we no longer need to execute a query to retrieve the navigation.
When the navigation changes (e.g. a page had been added), the cached value for the navigation should be rebuilt, because the navigation that has been cached does not yet reflect the latest change: the new page is not present there.
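The navigation example might be sketched as follows. `NavigationCache` and its loader callback are hypothetical, and the callback stands in for the real database query.

```php
<?php
// Hypothetical cache-aside wrapper for the navigation; the loader stands in for a DB query.
final class NavigationCache
{
    private ?array $nav = null;
    private $loader;

    public function __construct(callable $loader)
    {
        $this->loader = $loader;
    }

    public function get(): array
    {
        if ($this->nav === null) {
            $this->nav = ($this->loader)();   // hit the database only on a miss
        }
        return $this->nav;
    }

    // Call this whenever a page is added, renamed or removed.
    public function invalidate(): void
    {
        $this->nav = null;
    }
}

$pages = ['Home', 'About'];
$queries = 0;
$nav = new NavigationCache(function () use (&$pages, &$queries): array {
    $queries++;                               // counts simulated database calls
    return $pages;
});

$nav->get();
$nav->get();                  // second call is served from the cache
$pages[] = 'Contact';         // a page is added...
$nav->invalidate();           // ...so the cached navigation is dropped
echo implode(', ', $nav->get()), "\n";   // Home, About, Contact
echo $queries, "\n";                     // 2
```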
When a page is cached, instead of regenerating the page every time, the server stores a copy of what it sends to your browser. The next time a visitor requests the same page, the script knows it already generated one recently and simply sends that to the browser without all the hassle of re-running database queries or searches.
Advantage of Caching:
Reduce load on Web Servers and Database
Page downloads faster
Disadvantage:
As information is stored in the cache, it makes the page heavier.
Sometimes updated information does not show because the cache has not been updated.
The advantages and disadvantages of caching in web development depend entirely on the context!
The main advantage is reduced data retrieval time, either from the database or at page load.
The disadvantage is the separate maintenance, or the need to use third-party services or tools for it.
Hi this is more of an information request really.
I'm currently working on a pretty large event listing website and have started thinking about some caching for the data sets being used.
I have been messing with APC this week and have seen some real improvements during testing however what I'm struggling to get my head around is best practices and techniques required when trying to cache data that changes frequently.
Say for example the user hits the home page, this by default displays the latest 10 events happening and if that user is logged in those events are location specific. Is it possible to deploy some kind of caching system when dealing with logged in states and data that changes frequently? The system currently allows the user to "show more events", which is an ajax request to pull extra results from the db.
I haven't really found anything on this as I'm not sure what to search for but I'm really interested to know the techniques used for advanced caching systems that deal especially with data that changes and data specific to users?
I mean is it even worth it? Are there other performance boosters when dealing with this sort of criteria?
Any articles or tips and info on this will be greatly appreciated!! Please let me know if any other info is required!!
Your basic solutions are:
file cache
memcached/redis
APC
Each is used for a slightly different goal.
File cache is usually something that you utilize when you can pre-render files or parts of them. It is used in templating solutions, partial views (mvc), css frameworks. That sort of stuff.
Memcached and Redis are more or less equivalent, except that Redis is more of a NoSQL-oriented store. They are used for distributed caching (multiple servers, same cached data) and for storing sessions if you have a cluster of web servers.
APC is good for two things: opcode caching and data caching. It is faster than memcached, but works on each server separately.
Bottom line is : in a huge project you will use all of them. Each for a different task.
So you have opcode caching, which speeds things up by saving already compiled PHP files in cache.
Then you have data caching, where you save variables or objects that take time to get like data built from SQL queries.
Then you have output caching, which is where you save entire blocks of your webpages in files, and output those files instead of building that block of your webpage on each request.
I once wrote a blog post about how to do output caching:
http://www.spotlesswebdesign.com/blog.php?id=17
If it's location specific, and there are a billion locations, your best bet is probably output caching assuming you have a lot of disc space, but you will have to use your head for what is best, as each situation is very different when it comes to how best to apply caching.
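A minimal sketch of file-based output caching, using PHP's output buffering to capture a rendered block. `cachedBlock`, the file location and the 10-minute lifetime are assumptions for illustration, not the approach from the linked blog post.

```php
<?php
// Hypothetical helper: render a block once, then serve the saved copy until it is older than $ttl.
function cachedBlock(string $file, int $ttl, callable $render): string
{
    // A fresh enough file on disk is returned as-is, skipping the expensive render.
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return file_get_contents($file);
    }

    ob_start();                // capture everything the renderer echoes
    $render();
    $html = ob_get_clean();

    file_put_contents($file, $html, LOCK_EX);
    return $html;
}

// One cache file per block (or per location, for location-specific listings).
$file = sys_get_temp_dir() . '/events_block.html';
echo cachedBlock($file, 600, function () {
    echo '<ul><li>Generated at ' . date('H:i:s') . '</li></ul>';
}), "\n";
```

With many locations this produces one file per location, which is where the disk space mentioned above goes.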
If done correctly, using memcached or similar solutions can give huge boosts to site performance. By altering the cached data directly instead of rehydrating it from the database you can bypass the database entirely for data that either doesn't need to be saved or can be trivially rebuilt. Since the database is often the most critical component in web applications, any load you can take off it is a bonus.
On the other hand, making sure your database queries are as light and efficient as possible will have a much larger impact on performance than most cache tweaks.
I need help finding the right caching solution for a client's site. The current site is CentOS, PHP, MySQL, Apache using Smarty templates (I know they suck but it was built by someone else). The current models/methods use a fairly good OO structure but there are WAY too many queries being done for some of the simple page functions. I'm trying to find some sort of caching solution but I'm a noob when it comes to this and don't know what is available that would fit the current site setup.
It is an auction type site with say 10 auctions displayed on one page at one time -- the time and current bid on each auction being updated via an ajax call returning json every 1 second (it's a penny auction site like beezid.com so updates every second are necessary). As you can see, if the site gets any sort of traffic the number of simultaneous requests could be huge. Obviously this data changes every second because the json data returned has the updated time left in the auction, and possibly updated bid amounts and bid users for each auction.
What I want is the ability to cache certain pages for a given amount of time or based on some other changed variable. For example, memory caching the page that displays 10 auctions and only updating that cached copy when one of the auctions ends. Or even the script above that returns JSON string data every second. If I were able to cache the first request to this page in memory, serve the following requests from memory and then re-cache it again after 1 second, that could potentially reduce the server load a lot. But I don't know if this is even possible or if the overhead of doing something like this outweighs any request load savings.
I looked into xcache some but I couldn't find a way to set a particular cache time on a specific page or based on other variables. Maybe I missed something, but does anyone have a recommendation on a caching scheme that would work for these requirements?
Mucho thanks for any input you might have...
Caching can be done using many methods. Memcached springs to mind as being suited to your task, but if the site is ultra busy you may run out of RAM.
When I do caching I often use a simple file cache. While it does involve at least one stat call to determine the freshness of the cached content, it is still fast and marginally better than calling a SQL server.
If you must call a SQL server then it may pay to use a memory (heap) table to store much of the precomputed data. This technique is no more efficient than memcached, probably less so, but it saves you installing memcached.
DC
Zend_Cache can do what you want, and a lot more. It supports a lot of backends, including xcache and memcache, and allows you to cache data, full pages, partial pages, and well, just about anything you can imagine :p.
And in case you are wondering : you can use the Zend_Cache component by itself, you don't have to use the complete Zend framework for your application.
I have a personal caching class, which can be seen here ( based off WordPress' ):
http://pastie.org/988427
I recently learned about memcache and it said to memcache EVERYTHING:
http://highscalability.com/blog/2010/5/17/7-lessons-learned-while-building-reddit-to-270-million-page.html
My first thought was just to keep my class with the current functions and make it use memcache instead -- is there any downside to doing this?
The main difference I see is that memcache stays on with the server from page to page, while mine is for 1 page load. The problem I see arising, and this is with any system, is that they're dynamic. They change all the time. Whether it's search results, visible products, etc. If it's all cached, won't that create a problem?
Is there a way to handle this? Obviously if something is bringing back the same results everytime it would be cached, but that's why I was doing it on a per page load basis. I'm sure there is a way to handle this, or is the cache time usually set between 5 minutes and an hour?
You certainly need a good caching strategy to avoid problems with stale data. With dynamic data and using memcached, you would have to delete cache entries on certain data updates. You can't just rely on cache entries to time out. With memcached you can cache just parts of your dynamic content for a specific page generation. If you want to cache complete html documents, I would recommend using a reverse proxy like varnish (http://varnish-cache.org/).
We are using Smarty Templates on our LAMP site but my question would also apply to a site running Memcached (which we are planning to also bring online). Many of the pages of our user generated site have different views depending on who is looking at them. For instance, a list of comments where your own comments are highlighted. There would need to be a unique cache-id for each logged in user for this specific view. My question is, in this scenario, would you not even cache these views? Or is the overhead in creating/using the cache (either for smarty or memcached), low enough that you still would see some benefit to the cache?
Unless individual users are requesting the pages over and over again, there's no point caching this sort of thing, and I expect the overhead of caching will vastly exceed the performance benefits, simply since the cache hit ratio will be poor.
You may be better off looking into caching fragments of your site that do not depend on the individual user, or fragments that will be the same for a large number of page impressions (e.g. content that is the same for a large subset of your users).
For example - on this page you might want to cache the list of related questions, or the tag information, but there's probably little point caching the top-bar with reputation info too aggressively, since it will be requested relatively infrequently.
If the view code isn't too complicated just cache the data and generate the view each time.
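For the highlighted-comments case, that could mean caching one shared comment list and applying the per-user highlight at render time. This is only a sketch: `renderComments`, the array shape and the CSS class names are made up for illustration, and the hard-coded array stands in for data fetched from the cache.

```php
<?php
// Hypothetical renderer: the comment data is shared by all users, the highlight is per viewer.
function renderComments(array $comments, int $viewerId): string
{
    $html = '';
    foreach ($comments as $c) {
        // Only this cheap comparison depends on who is looking at the page.
        $class = $c['user_id'] === $viewerId ? 'comment own' : 'comment';
        $html .= '<li class="' . $class . '">' . htmlspecialchars($c['text']) . "</li>\n";
    }
    return $html;
}

// Stand-in for a list that would come from memcached or another shared cache.
$cachedComments = [
    ['user_id' => 1, 'text' => 'First!'],
    ['user_id' => 2, 'text' => 'Nice post'],
];

echo renderComments($cachedComments, 2);
// <li class="comment">First!</li>
// <li class="comment own">Nice post</li>
```

This keeps a single cache entry per comment list instead of one per logged-in user, so the hit ratio stays high.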