We are using Smarty templates on our LAMP site, but my question would also apply to a site running Memcached (which we are planning to bring online as well). Many of the pages of our user-generated site have different views depending on who is looking at them. For instance, in a list of comments, your own comments are highlighted. There would need to be a unique cache ID for each logged-in user for this specific view. My question is: in this scenario, would you simply not cache these views? Or is the overhead of creating/using the cache (either for Smarty or Memcached) low enough that you would still see some benefit from the cache?
Unless individual users are requesting the pages over and over again, there's no point caching this sort of thing. I expect the overhead of caching will vastly exceed the performance benefits, simply because the cache hit ratio will be poor.
You may be better off looking into caching fragments of your site that do not depend on the individual user, or fragments that will be the same for a large number of page impressions (e.g. content that is the same for a large subset of your users).
For example, on this page you might want to cache the list of related questions or the tag information, but there's probably little point caching the top bar with reputation info too aggressively, since each user's version of it will be requested relatively infrequently.
If the view code isn't too complicated, just cache the data and generate the view each time.
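A minimal sketch of that split, assuming a hypothetical `$loadComments` query and using a plain array as a stand-in for Smarty's or Memcached's cache: the comment data is cached once for everyone, and the per-user highlighting is applied only when the view is rendered.

```php
<?php
// A plain array stands in for the shared cache (Memcached, Smarty's cache, ...).
$cache = [];
$queryCount = 0;

// Hypothetical expensive query; $queryCount shows how often it really runs.
$loadComments = function (int $pageId) use (&$queryCount): array {
    $queryCount++;
    return [
        ['author_id' => 7, 'text' => 'First!'],
        ['author_id' => 9, 'text' => 'Nice post.'],
    ];
};

// The cached data is user-independent, so one entry serves every visitor.
function getComments(array &$cache, int $pageId, callable $load): array
{
    $key = "comments:$pageId";
    if (!isset($cache[$key])) {
        $cache[$key] = $load($pageId);
    }
    return $cache[$key];
}

// The per-user part (highlighting your own comments) is cheap, so it is
// rendered fresh on every request instead of being cached per user.
function renderComments(array $comments, ?int $viewerId): string
{
    $items = '';
    foreach ($comments as $c) {
        $class = ($c['author_id'] === $viewerId) ? 'comment own' : 'comment';
        $items .= sprintf('<li class="%s">%s</li>', $class, htmlspecialchars($c['text']));
    }
    return "<ul>$items</ul>";
}

$forUser7 = renderComments(getComments($cache, 42, $loadComments), 7);
$forUser9 = renderComments(getComments($cache, 42, $loadComments), 9);
```

Both users are served from one cache entry, so the query runs once, while each of them still sees their own comments highlighted.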
Just looking for a piece of advice. On one of our websites we have a debate/forum. Every time a user requests the debate page, he/she gets a list of all topics (and their answer counts, etc.).
Also, when the user requests a specific topic/thread, all answers to the thread are shown along with the username, user picture, age, and total number of forum posts of each answer's poster.
All content is currently retrieved with a MySQL query every time the page is accessed. However, this is starting to get painfully slow (especially with large threads of 3000+ answers).
I would like to cache the debate entries somehow to speed up this process. The problem, however, is that if I cache the entries themselves, the number of posts etc. (which is dynamic, of course) will not always be up to date.
Is there any smart way of caching the pages/recaching them when stuff like this is updated? :)
Thanks in advance,
fischer
You should create a tag or a name for the cache based on its data.
For example, for the post named "Jake's Post" you could create an MD5 hash of the name; this would give you the tag 49fec15add24931728652baacc08b8ee.
Now cache the contents and everything to do with this post against the tag 49fec15add24931728652baacc08b8ee. When the post is updated or a comment is added, go to the cache and delete everything associated with 49fec15add24931728652baacc08b8ee.
Now there is no cache, and it will be rebuilt when the next visitor arrives to view the post.
You could break this down further by having multiple tags per post. For example, you could have one tag for comments and another for answers; when a comment is added, delete the comments tag but not the answers tag. This reduces the work the server has to do when rebuilding the cache, as only the comments are now missing.
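A minimal in-memory sketch of this tagging scheme, assuming a hypothetical `TaggedCache` class (it is not the API of any particular library, although some caching libraries offer tags natively): each entry is stored with a list of tags, and deleting a tag drops every entry carrying it.

```php
<?php
// Minimal tag-aware cache: each entry can carry several tags,
// and invalidating a tag removes every entry stored under it.
class TaggedCache
{
    private $entries = [];  // key => cached value
    private $tagIndex = []; // tag => [key => true]

    public function set(string $key, $value, array $tags): void
    {
        $this->entries[$key] = $value;
        foreach ($tags as $tag) {
            $this->tagIndex[$tag][$key] = true;
        }
    }

    public function get(string $key)
    {
        return $this->entries[$key] ?? null;
    }

    public function invalidateTag(string $tag): void
    {
        foreach (array_keys($this->tagIndex[$tag] ?? []) as $key) {
            unset($this->entries[$key]);
        }
        unset($this->tagIndex[$tag]);
    }
}

// One base tag per post (MD5 of its name), plus narrower tags per part.
$postTag = md5("Jake's Post");
$cache = new TaggedCache();
$cache->set('post:body', '<p>Post body</p>', [$postTag]);
$cache->set('post:comments', '<ul>comments</ul>', [$postTag, "$postTag:comments"]);
$cache->set('post:answers', '<ul>answers</ul>', [$postTag, "$postTag:answers"]);

// A new comment only drops the comments fragment; body and answers stay cached.
$cache->invalidateTag("$postTag:comments");
```

Editing the post itself would call `invalidateTag($postTag)`, which clears every fragment of that post at once.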
There are a number of libraries and frameworks that can help you do this.
Jake
EDIT
I'd use files to store the data, more specifically the HTML output of the page. You can then do something like:
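$cacheFile = '/path/to/cache/' . $tag . '.html'; // hypothetical cache directory
if (file_exists($cacheFile)) {
    // Cache hit: send the stored HTML straight to the browser
    readfile($cacheFile);
} else {
    // Cache miss: do the complex database lookup and build the page,
    // capturing the output so it can be saved for later requests
    ob_start();
    // ... render the page here ...
    $html = ob_get_clean();
    file_put_contents($cacheFile, $html);
    echo $html;
}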
Remember that frameworks like Zend have this sort of thing built in. I would seriously consider using a framework.
Interesting topic!
The first thing I'd look at is optimizing your database - even if you have to spend money upgrading the hardware, it will be significantly easier and cheaper than introducing a cache - fewer moving parts, fewer things that can go wrong...
If you can't squeeze more performance out of your database, the next thing I'd consider is de-normalizing the data a little. For instance, maintain a "reply_count" column, rather than counting the replies against each topic. This is ugly, but introduces fewer opportunities for things to go wrong - with a bit of luck, you can localize all the logic in your data access layer.
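A minimal sketch of keeping the denormalized counter inside the data access layer, assuming a hypothetical `TopicStore` class with arrays standing in for the `topics` and `replies` tables: the only code path that adds a reply also bumps `reply_count`, so listing topics never has to count replies.

```php
<?php
// Arrays stand in for the `topics` and `replies` tables.
class TopicStore
{
    private $topics = [];  // id => ['title' => string, 'reply_count' => int]
    private $replies = []; // list of ['topic_id' => int, 'body' => string]

    public function addTopic(int $id, string $title): void
    {
        $this->topics[$id] = ['title' => $title, 'reply_count' => 0];
    }

    // The single place replies are written: store the reply and bump the
    // counter together. In SQL this would be one transaction doing the INSERT
    // plus "UPDATE topics SET reply_count = reply_count + 1 WHERE id = ?".
    public function addReply(int $topicId, string $body): void
    {
        if (!isset($this->topics[$topicId])) {
            throw new InvalidArgumentException("Unknown topic $topicId");
        }
        $this->replies[] = ['topic_id' => $topicId, 'body' => $body];
        $this->topics[$topicId]['reply_count']++;
    }

    // The topic list reads the stored counter instead of counting replies.
    public function listTopics(): array
    {
        return $this->topics;
    }
}

$store = new TopicStore();
$store->addTopic(1, 'Caching strategies');
$store->addReply(1, 'Use a TTL.');
$store->addReply(1, 'Denormalize the count.');
```

Keeping both writes behind one method is what localizes the "ugly" part: no other code should touch `reply_count` directly.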
The next option I'd consider is to cache pages. For instance, just caching the "debate page" for 30 seconds should dramatically reduce the load on your database if you've got reasonable levels of traffic, and even if it all goes wrong, because you're caching the entire page, it will sort itself out the next time the page goes stale. In most situations, caching an entire page is okay - it's not the end of the world if a new post has appeared in the last 30 seconds and you don't see it on your page.
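A minimal sketch of whole-page caching with a 30-second lifetime, assuming the rendered HTML is stored as a file in the system temp directory and the page is produced by a hypothetical `$renderDebatePage` callback: a fresh file is served as-is, and a stale or missing one is rebuilt with output buffering.

```php
<?php
// Serve a cached copy of a whole page if it is younger than $ttl seconds;
// otherwise render it again (capturing the output) and store the result.
function cachedPage(string $cacheFile, int $ttl, callable $render): string
{
    clearstatcache(true, $cacheFile); // avoid a stale filemtime() result
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $ttl) {
        return file_get_contents($cacheFile);
    }

    ob_start();
    $render();                 // the expensive page build echoes its HTML
    $html = ob_get_clean();

    // Write to a temp file and rename it, so readers never see a half-written page.
    $tmp = $cacheFile . '.' . getmypid() . '.tmp';
    file_put_contents($tmp, $html);
    rename($tmp, $cacheFile);
    return $html;
}

// Hypothetical page builder; counts how many times it actually runs.
$renders = 0;
$renderDebatePage = function () use (&$renders) {
    $renders++;
    echo '<h1>Debates</h1><p>Topic list goes here</p>';
};

$file = sys_get_temp_dir() . '/debate-page.html';
if (is_file($file)) {
    unlink($file); // start the demo from an empty cache
}
$first = cachedPage($file, 30, $renderDebatePage);
$second = cachedPage($file, 30, $renderDebatePage); // served from the file
```

In a real request you would simply `echo cachedPage(...)`; if anything goes wrong, the file expires after 30 seconds and the page is rebuilt.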
If you really have to provide more "up to date" content on the page, you might introduce caching at the database access level. I have, in the past, built a database access layer which cached the results of SQL queries based on hard-wired logic about how long to cache the results. In our case, we built a function to call the database which allowed you to specify the query (e.g. get posts for user), an array of parameters (e.g. username, date-from), and the cache duration. The database access function would cache results for the cache duration based on the query and the parameters; if the cache duration had expired, it would refresh the cache.
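A rough sketch of such a database access function, assuming a callable `$runQuery` stands in for the real database call and an in-memory array stands in for the cache store (in production, Memcached, APC or similar). The clock is injectable only so the expiry can be demonstrated; `CachingDb` and all other names here are hypothetical.

```php
<?php
// Cache query results keyed on the query name plus its parameters,
// with each caller choosing how long its results may be reused.
class CachingDb
{
    private $cache = []; // key => ['expires' => int, 'rows' => array]
    private $runQuery;   // callable(string $query, array $params): array
    private $clock;      // callable(): int, current time in seconds

    public function __construct(callable $runQuery, ?callable $clock = null)
    {
        $this->runQuery = $runQuery;
        $this->clock = $clock ?? 'time';
    }

    public function query(string $query, array $params, int $cacheSeconds): array
    {
        $key = md5($query . '|' . serialize($params));
        $now = ($this->clock)();
        if (isset($this->cache[$key]) && $this->cache[$key]['expires'] > $now) {
            return $this->cache[$key]['rows'];     // still fresh
        }
        $rows = ($this->runQuery)($query, $params); // expired or missing: refresh
        $this->cache[$key] = ['expires' => $now + $cacheSeconds, 'rows' => $rows];
        return $rows;
    }
}

$calls = 0;
$now = 1000;
$db = new CachingDb(
    function (string $query, array $params) use (&$calls): array {
        $calls++;
        return [['user' => $params['username'], 'posts' => 3]];
    },
    function () use (&$now): int { return $now; }
);

$db->query('get posts for user', ['username' => 'fischer'], 60); // hits the database
$db->query('get posts for user', ['username' => 'fischer'], 60); // served from cache
$db->query('get posts for user', ['username' => 'jake'], 60);    // new parameters: new entry
$now += 61;
$db->query('get posts for user', ['username' => 'fischer'], 60); // expired: refreshed
```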
This scheme was fairly bug-proof - as an end user, you'd rarely notice weirdness due to caching, and because we kept the cache period fairly short, it all sorted itself out very quickly.
Building up your page by caching snippets of content is possible, but very quickly becomes horribly complex. It's very easy to create a page that makes no sense to the end user due to the different caching policies - "unread posts" doesn't add up to the number of posts in the breakdown because of different caching policies between "summary" and "detail".
I need help finding the right caching solution for a client's site. The current site is CentOS, PHP, MySQL and Apache, using Smarty templates (I know they suck, but it was built by someone else). The current models/methods use a fairly good OO structure, but there are WAY too many queries being done for some of the simple page functions. I'm trying to find some sort of caching solution, but I'm a noob when it comes to this and don't know what is available that would fit the current site setup.
It is an auction-type site with, say, 10 auctions displayed on one page at a time; the time and current bid on each auction are updated via an AJAX call returning JSON every second (it's a penny auction site like beezid.com, so updates every second are necessary). As you can see, if the site gets any sort of traffic, the number of simultaneous requests could be huge. Obviously this data changes every second, because the JSON returned has the updated time left in each auction, and possibly updated bid amounts and bidding users for each auction.
What I want is the ability to cache certain pages for a given amount of time, or based on other changing variables. For example, memory-caching the page that displays 10 auctions and only updating that cached copy when one of the auctions ends. Or even the script above that returns JSON data every second: if I were able to cache the first request to this page in memory, serve the following requests from memory and then re-cache it after 1 second, that could potentially reduce the server load a lot. But I don't know if this is even possible, or if the overhead of doing something like this outweighs the savings in request load.
I looked into XCache a bit, but I couldn't find a way to set a particular cache time on a specific page, or to base it on other variables. Maybe I missed something, but does anyone have a recommendation for a caching scheme that would work for these requirements?
Mucho thanks for any input you might have...
Caching can be done using many methods. Memcached springs to mind as being suited to your task, but if the site is ultra busy you may run out of RAM.
When I do caching I often use a simple file cache. While it does involve at least one stat call to determine the freshness of the cached content, it is still fast, and marginally better than calling an SQL server.
If you must call an SQL server, then it may pay to use a MEMORY (HEAP) table to store much of the precomputed data. This technique is no more efficient than Memcached, probably less so, but it saves you installing Memcached.
DC
Zend_Cache can do what you want, and a lot more. It supports a lot of backends, including xcache and memcache, and allows you to cache data, full pages, partial pages, and well, just about anything you can imagine :p.
And in case you are wondering: you can use the Zend_Cache component by itself; you don't have to use the complete Zend Framework for your application.
So I'm looking to do caching for a forum I'm building and I want to understand the best method. I've been doing some reading and the way that the Zend Framework handles caching (here) explains the idea well, but there are a few things I'm not sure about.
Let's say that I want to cache posts. Should I simply "dump" the contents of the query into a file and then retrieve from that, or should I build the layout around the data and then simply return the contents of the file? And how would I handle user information? Historically, the standard forum display includes a user's total post count next to each post; this can change very often (assuming 30 posts per page), which would mean I'd have to constantly clear the cache, and that seems pretty redundant.
I can't find any articles about how I should approach this and I'd be interested to learn more. Does anyone have any insight or relevant articles to help?
There's always a trade-off between how often you will hit the cache (and hence how useful the cache is), how much you want to cache, and how long the lifetime should be.
You should identify the bottlenecks in your application. If it's the query that's holding the performance back, by all means cache the query. If it's building some parts of the page, cache those instead.
As to retrieving the users' post counts: if you want them to be as live as possible, then you can't cache them (or if you do, you'll have to invalidate all the cached threads where that user has ever posted...). Retrieving post counts from the database (if done right) shouldn't be too taxing. You can just cache a template where the post count is left blank to be filled in later, or you can do some tricks with JavaScript.
We have a PHP website like reddit, where users can vote on stories.
We tried to use APC, memcached etc. for the website, but we gave up. The problem is that we want to use a caching mechanism, but users can vote at any time, so the cached data may be stale and confusing for other visitors.
Let me explain with an example. We have an array of 100 stories stored in the cache for 5 minutes. A user votes on some stories, so the ratings of those stories change. When another user visits the website, he/she will see the cached data, and therefore the old data. (The same happens if the voter refreshes the page; he'll also see the old vote counts for the stories.)
We cannot figure it out; any help will be highly appreciated.
This is a matter of finding a balance between low-latency updates, and overall system/network load (aka, performance vs. cost).
If you have capacity to spare, the simplest solution is to keep your votes in a database, and always look them up during a page load. Of course, there's no caching here.
Another low-latency (but high-cost) solution is to have a pub-sub type system that publishes votes to all other caches on the fly. In addition to the high cost, there are various synchronization issues you'll need to deal with here.
The next alternative is to have a shared cache (e.g., memcached, but shared across different machines). Updates to the database will always update the cache. This reduces the load on the database and would get you lower latency responses (since cache lookups are usually cheaper than queries to a relational database). But if you do this, you'll need to size the cache carefully, and have enough redundancy such that the shared cache isn't a single point of failure.
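A minimal sketch of that write-through arrangement, assuming a hypothetical `VoteStore` class with arrays standing in for the database and for the shared cache (in production, a memcached pool reachable from every web server): every vote is written to the database and then to the cached score in the same code path, and reads fall back to the database on a miss.

```php
<?php
class VoteStore
{
    private $db = [];    // stands in for the votes table: storyId => score
    private $cache = []; // stands in for the shared cache: "score:$id" => score
    public $dbReads = 0;

    public function vote(int $storyId, int $delta): void
    {
        // 1. The database stays the source of truth.
        $this->db[$storyId] = ($this->db[$storyId] ?? 0) + $delta;
        // 2. Write-through: update the shared cache entry immediately,
        //    so every web server sees the new score on its next read.
        $this->cache["score:$storyId"] = $this->db[$storyId];
    }

    public function score(int $storyId): int
    {
        $key = "score:$storyId";
        if (!array_key_exists($key, $this->cache)) {
            // Cache miss (evicted, or a cache node restarted): reload from the database.
            $this->dbReads++;
            $this->cache[$key] = $this->db[$storyId] ?? 0;
        }
        return $this->cache[$key];
    }

    // Simulates losing the cache contents.
    public function evictAll(): void
    {
        $this->cache = [];
    }
}

$store = new VoteStore();
$store->vote(5, 1);
$store->vote(5, 1);
$store->vote(5, -1);
$before = $store->score(5); // served from the cache, no database read
$store->evictAll();
$after = $store->score(5);  // reloaded from the database
```

This sketch ignores concurrent writers; with a real shared cache you would also have to decide how racing updates from different servers are reconciled.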
Another, more commonly used, alternative is to have some kind of background vote aggregation, where votes are only stored as transactions on each of the front-end servers, and you have a background process that continuously (e.g., every five seconds) aggregates the votes and populates all the caches.
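A simplified, single-process sketch of background aggregation, with hypothetical names throughout: each front-end server only appends votes to its own local buffer, and an aggregator (run, say, every five seconds from cron or a worker loop) folds those buffers into the totals and republishes the scores to the cache that page views read.

```php
<?php
// Each front-end server appends raw votes to its own local buffer; nothing else.
class FrontEnd
{
    private $pending = []; // list of [storyId, delta]

    public function recordVote(int $storyId, int $delta): void
    {
        $this->pending[] = [$storyId, $delta];
    }

    // Hand over the buffered votes and start a fresh buffer.
    public function drain(): array
    {
        $votes = $this->pending;
        $this->pending = [];
        return $votes;
    }
}

// Background job: sums all drained votes into the totals and publishes
// the new scores to the cache that page views read from.
function aggregate(array $frontEnds, array &$totals, array &$cache): void
{
    foreach ($frontEnds as $server) {
        foreach ($server->drain() as [$storyId, $delta]) {
            $totals[$storyId] = ($totals[$storyId] ?? 0) + $delta;
        }
    }
    foreach ($totals as $storyId => $score) {
        $cache["score:$storyId"] = $score;
    }
}

$web1 = new FrontEnd();
$web2 = new FrontEnd();
$totals = [];
$cache = [];

$web1->recordVote(10, 1);
$web2->recordVote(10, 1);
$web2->recordVote(11, -1);
$beforeRun = $cache['score:10'] ?? null; // votes are not visible yet
aggregate([$web1, $web2], $totals, $cache);
```

The trade-off is visible in the demo: until the aggregator runs, readers still see the previous scores.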
AFAIK, reddit does not do live, low-latency vote propagation. If you vote something up, it isn't immediately reflected across other clients. My guess is that they're doing some kind of aggregation (as in the background aggregation approach above), but that's just me speculating.
Perhaps this is a solution you've already considered, but why not just cache everything but the ratings? Instead, just update a single array, where the ith position contains the rating for the ith top story. Keep this in memory all the time, and flush ratings back to the database as it's available.
If you only care about the top N stories being the most up-to-date, then the array only needs to be as large as the number of stories on the front page, which is presumably a very small number like 50 or so.
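A minimal sketch of that split, with hypothetical names: the expensive story list is cached for minutes, while a small array of ratings indexed by front-page position is kept current and merged in at render time. Flushing ratings to the database is only indicated by a comment.

```php
<?php
// Cached for minutes: the expensive, slow-changing part of the front page.
$cachedStories = [
    ['title' => 'Story A', 'url' => 'https://example.com/a'],
    ['title' => 'Story B', 'url' => 'https://example.com/b'],
];

// Kept current: $ratings[$i] is the rating of the i-th top story.
$ratings = [12, 7];

function recordVote(array &$ratings, int $position, int $delta): void
{
    $ratings[$position] += $delta;
    // A real system would also queue this change to be flushed to the database.
}

// Merge the cached list with the live ratings at render time.
function renderFrontPage(array $stories, array $ratings): array
{
    $lines = [];
    foreach ($stories as $i => $story) {
        $lines[] = sprintf('%d %s', $ratings[$i], $story['title']);
    }
    return $lines;
}

recordVote($ratings, 1, 1);
$page = renderFrontPage($cachedStories, $ratings);
```

Indexing by position follows the answer; keying the ratings array by story ID instead would avoid mismatches if the cached order changes before the list is refreshed.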
I have a page with a post and multiple comments; by using PHP's ob_start() I am able to cache it successfully.
Next to each comment I show the username and that user's current post count and reputation. I keep the cached page until someone adds a new comment; only then do I update the cache file.
The problem is that a user's post count and reputation will increase as he posts/comments on other topics, but these numbers will not change on older cached pages.
What would be the best practice to tackle this issue?
If you are at all concerned with your site's performance, you should switch to APC, as it provides both opcode caching and a key/value store for caching data.
You can store entire blocks of content, arrays, objects, you name it:
// you must supply:
// 1. a key you will later use to retrieve your content
// 2. the data you wish to cache
// 3. how long the cache should remain valid
apc_store($key, $data, $ttl);
As far as retrieval goes, you simply make a call like:
$data = apc_fetch($key);
I sort of hope to be proven wrong, but I don't think there's currently any easy way around this other than limiting the duration of the cache.
You could of course update the relevant reputations, etc. via AJAX but it's quite possible that the connections & bandwidth that this consumes would ultimately outweigh the benefit of caching the page in the first place.
If one of the main goals of caching is to reduce processing overhead (as opposed to bandwidth consumption) you could of course simply flatten out the non-dynamic parts of the page (each post as a static text file or similar - hence reducing the need to re-generate the HTML if you're using Markdown or BBCode, etc.) and include these as required/update them if they're edited.
Some of my thoughts:
You could choose to keep the post pages cached for a certain period of time, like one hour or 15 minutes. This period depends on the number of visitors you get on the page, how frequently the details change, and your personal preference, because it does not really matter whether a user's post count is slightly outdated. After this period, remove the cached version (which also saves resources); if the page is visited again, it will be re-cached with the updated details.
By cleverly (re-)using ob_start() you can buffer multiple parts of the page, like the post part and the comments part. Store these parts separately and you only need to regenerate one part instead of the complete page. Most of the time, the post part does not change very often.
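A rough sketch of caching the post and the comments as separate fragments with output buffering, assuming hypothetical render callbacks and an array standing in for the file cache: when a comment is added, only the comments fragment is discarded and rebuilt.

```php
<?php
// Capture whatever $render echoes, caching it under $key until it is invalidated.
function fragment(array &$store, string $key, callable $render): string
{
    if (!isset($store[$key])) {
        ob_start();
        $render();
        $store[$key] = ob_get_clean();
    }
    return $store[$key];
}

function renderPage(array &$store, callable $postFn, callable $commentsFn): string
{
    return fragment($store, 'post:1', $postFn) . fragment($store, 'post:1:comments', $commentsFn);
}

$store = [];
$builds = ['post' => 0, 'comments' => 0];
$comments = ['Great post'];

$renderPost = function () use (&$builds) {
    $builds['post']++;
    echo '<article>Post body</article>';
};
$renderComments = function () use (&$builds, &$comments) {
    $builds['comments']++;
    foreach ($comments as $c) {
        echo '<p>' . htmlspecialchars($c) . '</p>';
    }
};

renderPage($store, $renderPost, $renderComments);

// A new comment arrives: drop only the comments fragment.
$comments[] = 'Thanks!';
unset($store['post:1:comments']);
$page = renderPage($store, $renderPost, $renderComments);
```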
Keep track of the pages where a certain user posted comments (or the page itself, if he created it). Upon changes in the user details (new post/comment added), mark these pages obsolete (i.e. remove the cached version). If you have a lot of changes in a short period of time, you could use a background process to re-cache the pages and keep your web server responsive.
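A minimal sketch of that bookkeeping, assuming a hypothetical `PageCache` class with arrays in place of a real cache: whenever a page is cached, the IDs of the users shown on it are recorded, so a change to one user's post count or reputation drops exactly the cached pages that display it. The background re-caching step is not shown.

```php
<?php
class PageCache
{
    private $pages = [];       // pageId => cached HTML
    private $pagesByUser = []; // userId => [pageId => true]

    // Store a page and remember every user whose details appear on it.
    public function store(int $pageId, string $html, array $userIds): void
    {
        $this->pages[$pageId] = $html;
        foreach ($userIds as $userId) {
            $this->pagesByUser[$userId][$pageId] = true;
        }
    }

    public function get(int $pageId): ?string
    {
        return $this->pages[$pageId] ?? null;
    }

    // Called when a user's post count or reputation changes.
    public function invalidateUser(int $userId): void
    {
        foreach (array_keys($this->pagesByUser[$userId] ?? []) as $pageId) {
            unset($this->pages[$pageId]);
        }
        unset($this->pagesByUser[$userId]);
    }
}

$cache = new PageCache();
$cache->store(100, '<html>thread 100</html>', [1, 2]); // users 1 and 2 appear here
$cache->store(200, '<html>thread 200</html>', [2, 3]);
$cache->invalidateUser(1); // user 1 posted elsewhere: only thread 100 is stale
```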
Inserting tokens (unique pieces of text, like %user:123,postcount%) for frequently changing details is another possibility. Store this version in your cache, and upon a page request replace the tokens with the current details. This could also be combined with other caching techniques if the number of page views per period of time is very high (or at least much higher than the frequency of the detail changes).
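A minimal sketch of the token approach, using the `%user:123,postcount%` format from the answer; `renderWithTokens()` and the `$userStats` lookup are hypothetical, standing in for one cheap query (or cache read) per request. The cached HTML keeps the tokens, and `preg_replace_callback()` fills them in on every request.

```php
<?php
// Cached once: the page with placeholders instead of volatile user details.
$cachedHtml = '<div class="post">Hello<span>%user:123,postcount% posts, '
    . '%user:123,reputation% rep</span></div>';

// Fetched fresh per request (one small query or cache lookup).
$userStats = [123 => ['postcount' => 58, 'reputation' => 1024]];

// Replace each %user:ID,field% token with the current value (empty if unknown).
function renderWithTokens(string $html, array $userStats): string
{
    return preg_replace_callback(
        '/%user:(\d+),(\w+)%/',
        function (array $m) use ($userStats): string {
            $value = $userStats[(int) $m[1]][$m[2]] ?? '';
            return htmlspecialchars((string) $value);
        },
        $html
    );
}

$page = renderWithTokens($cachedHtml, $userStats);
```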