I have a page with a post and multiple comments; by using PHP's ob_start() I am able to cache it successfully.
Next to each comment I show the username along with that user's current post count and reputation. I keep the cached page until someone adds a new comment; only then do I update the cache file.
The problem is that a user's post count and reputation increase as they post or comment on other topics, but the values shown on older cached pages do not change.
What would be the best practice to tackle this issue?
If you are by any means concerned with your site's performance you should switch to APC as it provides both opcode caching as well as means for caching as a key/value store.
You can store entire blocks of content, arrays, objects, you name it:
// you must supply:
// 1. a key you will later use to retrieve your content
// 2. the data you wish to cache
// 3. how long the cache should remain valid
apc_store($key, $data, $ttl);
As far as retrieval goes, you simply make a call like:
$data = apc_fetch($key);
I sort of hope to be proven wrong, but I don't think there's currently any easy way around this other than limiting the duration of the cache.
You could of course update the relevant reputations, etc. via AJAX but it's quite possible that the connections & bandwidth that this consumes would ultimately outweigh the benefit of caching the page in the first place.
If one of the main goals of caching is to reduce processing overhead (as opposed to bandwidth consumption) you could of course simply flatten out the non-dynamic parts of the page (each post as a static text file or similar - hence reducing the need to re-generate the HTML if you're using Markdown or BBCode, etc.) and include these as required/update them if they're edited.
Some of my thoughts:
You could choose to keep the post pages cached for a certain period of time, like one hour or 15 minutes. The right period depends on the number of visitors the page gets, how often the details change, and your personal preference, because it does not really matter whether a user's post count is slightly outdated. After this period, remove the cached version (which also saves resources); if the page is visited again, it will be re-cached with the updated details.
By clever (re-)use of ob_start() you can buffer multiple parts of the page, like the post part and the comments part. Store these parts separately and you only need to regenerate one part instead of the complete page. Most of the time, the post part does not change very often.
Keep track of the pages where a certain user posted comments (or the page itself, if he created it). Upon changes in the user details (new post/comment added), make these pages obsolete (i.e. remove the cached version). If you have a lot of changes in a small period of time, you could use a background process to re-cache the pages and keep your web server responsive.
Inserting tokens (unique pieces of text, like %user:123,postcount%) in place of frequently changing details is another possibility. Store this version in your cache and, upon a page request, replace the tokens with the current details. This could also be combined with other caching techniques if the number of page views per period of time is very high (or at least much higher than the frequency of the detail changes).
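The token idea could look roughly like the sketch below. It assumes the token format %user:<id>,<field>% from the example above; fetchUserDetail() and its hard-coded data are hypothetical stand-ins for a real user lookup.

```php
<?php
// Hypothetical lookup: a real site would query the users table here.
function fetchUserDetail(int $userId, string $field): string
{
    $users = [
        123 => ['postcount' => '42', 'reputation' => '1337'],
    ];
    return $users[$userId][$field] ?? '?';
}

// Replace every %user:<id>,<field>% token in the cached HTML with live data.
function fillUserTokens(string $cachedHtml, callable $lookup): string
{
    return preg_replace_callback(
        '/%user:(\d+),(\w+)%/',
        function (array $m) use ($lookup): string {
            // Escape the value because it is inserted into HTML.
            return htmlspecialchars($lookup((int) $m[1], $m[2]));
        },
        $cachedHtml
    );
}

// The cached page keeps the tokens; they are filled in on every request.
$cached = '<span class="posts">%user:123,postcount%</span> posts';
echo fillUserTokens($cached, 'fetchUserDetail'), "\n";
```

The expensive part (rendering the post and comments) stays cached, while the per-request work shrinks to one cheap lookup per token.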
Related
Well this is kind of a question of how to design a website which uses less resources than normal websites. Mobile optimized as well.
Here it goes: I was about to display a specific overview of e.g. 5 posts (from e.g. a blog). Then if I'd click for example on the first post, I'd load this post in a new window. But instead of connecting to the Database again and getting this specific post with the specific id, I'd just look up that post (in PHP) in my array of 5 posts, that I've created earlier, when I fetched the website for the first time.
Would it save data to download? Because PHP works server-side as well, so that's why I'm not sure.
Ok, I'll explain again:
Method 1:
User connects to my website
5 posts are displayed and saved to an array (with all their data)
User clicks on the first Post and expects more Information about this post.
My program looks up the post in my array and displays it.
Method 2:
User connects to my website
5 posts are displayed
User clicks on the first Post and expects more Information about this post.
My program connects to MySQL again and fetches the post from the server.
First off, this sounds like a case of premature optimization. I would not start caching anything outside of the database until measurements prove that it's a wise thing to do. Caching takes your focus away from the core task at hand, and introduces complexity.
If you do want to keep DB results in memory, just using an array allocated in a PHP-processed HTTP request will not be sufficient. Once the page is processed, memory allocated at that scope is no longer available.
You could certainly put the results in SESSION scope. The advantage of saving some DB results in the SESSION is that you avoid DB round trips. Disadvantages include the increased complexity of programming the solution, use of memory in the web server for data that may never be accessed, and increased initial load on the DB to retrieve the extra pages that may or may not ever be requested by the user.
If DB performance, after measurement, really is causing you to miss your performance objectives you can use a well-proven caching system such as memcached to keep frequently accessed data in the web server's (or dedicated cache server's) memory.
Final note: You say
PHP works server-side as well
That's not accurate. PHP works server-side only.
Have you thought about saving the posts in divs and only making them visible when the user clicks somewhere? Here is how to do that.
Put some sort of cache between your code and the database.
So your code will look like
if (isPostInCache()) {
    loadPostFromCache();
} else {
    loadPostFromDatabase();
}
Go for some caching system; the web is full of them. You can use memcached or a static cache you build yourself (e.g. saving posts in txt files on the server).
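As a minimal sketch of the home-made variant, the check above could be backed by plain text files. The function name loadPost(), the file naming scheme, and the $loadFromDatabase callable are all assumptions made for illustration, not part of any library.

```php
<?php
// Return a post, reading it from a text-file cache when possible.
// $loadFromDatabase is a hypothetical callable standing in for the real query.
function loadPost(int $postId, string $cacheDir, callable $loadFromDatabase): string
{
    $cacheFile = $cacheDir . '/post-' . $postId . '.txt';

    if (is_file($cacheFile)) {
        // Cache hit: no database round trip.
        return file_get_contents($cacheFile);
    }

    // Cache miss: load from the database and store the result for next time.
    $post = $loadFromDatabase($postId);
    file_put_contents($cacheFile, $post, LOCK_EX);
    return $post;
}

$cacheDir = sys_get_temp_dir() . '/post-cache-demo';
if (!is_dir($cacheDir)) {
    mkdir($cacheDir, 0777, true);
}

// Fake database that counts how often it is asked.
$dbCalls = 0;
$fakeDb = function (int $id) use (&$dbCalls): string {
    $dbCalls++;
    return "Content of post $id";
};

@unlink($cacheDir . '/post-1.txt');         // start from an empty cache for the demo
echo loadPost(1, $cacheDir, $fakeDb), "\n"; // miss: calls the fake database
echo loadPost(1, $cacheDir, $fakeDb), "\n"; // hit: served from the file
```

This sketch never expires or invalidates the file, so an edited post would need its cache file deleted explicitly.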
To me, this is a little more inefficient than making a 2nd call to the database and here is why.
The first query should only pull the fields you need, such as title, author, and date. Fetching the content of the post may make the query heavy, so I'd exclude it (you can pull a teaser if you'd like).
Then, if the user wants the details of the post, I would query for the content using an indexed key column.
That way you're not pulling content for 5 posts that may never be seen.
If your PHP code is constantly re-connecting to the database you've configured it wrong and aren't using connection pooling properly. The execution time of a query should be a few milliseconds at most if you've got your stack properly tuned. Do not cache unless you absolutely have to.
What you're advocating here is side-stepping a serious problem. Database queries should be effortless provided your database is properly configured. Fix that issue and you won't need to go down the caching road.
Saving data from one request to the other is a broken design and if not done perfectly could lead to embarrassing data bleed situations where one user is seeing content intended for another. This is why caching is an option usually pursued after all other avenues have been exhausted.
Assume we have an application which presents continuous data to the user. E.g. a blog: we present a list of blog entries, and this list is divided into pages, so we end up with /page1, /page2 etc.
The first page is obviously requested most often; the higher the page number, the less often the page is requested.
If we implement cache for our app we have two choices:
update cache of every page after every new entry
when a page is requested, PHP looks for a cached version; if it exists it is returned, otherwise the cache is created with an expiration time of, let's say, an hour
The first solution seems like a real waste of resources to me. The second one creates the possibility of a dangerous scenario:
What happens if a user requests page x and then page (x+1), where page x is cached and page (x+1) is not? If the cache for page x is outdated, then on page (x+1) the user will see the same content. Or worse, what if the user goes from page x to page (x-1)? He'll miss some entries!
How to implement caching to avoid this problem?
It's usually best to cache on demand, not cache eagerly unless you can be assured that the work done there won't be wasted.
Typically you use a backing store like Memcached to hold your transient data. Entries can be given a "time-to-live" (TTL) that will automatically expire anything that becomes stale or hasn't been used.
Generally you cache a good chunk of the page into a string, then save that using an identifying key of some sort. In your case the page URL or some subset of the parameters might serve as sufficiently unique. Remember that if the user session has an impact on the contents of this section, then something relating to that, such as user_id must be part of the cache key as well.
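A rough sketch of that pattern follows. To stay self-contained it stores fragments in temp files and checks their age instead of using Memcached; fragmentKey() and cachedFragment() are hypothetical helper names, and the choice of URL plus user id as the key is an assumption about what makes the fragment unique.

```php
<?php
// Build a cache key from the URL plus, when the output depends on the viewer,
// the user id.
function fragmentKey(string $url, ?int $userId = null): string
{
    return md5($url . '|' . ($userId ?? 'anon'));
}

// Return the cached fragment if it is younger than $ttl seconds; otherwise
// render it with output buffering, store it, and return it.
function cachedFragment(string $key, int $ttl, callable $render, string $dir): string
{
    $file = $dir . '/' . $key . '.html';

    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return file_get_contents($file);
    }

    ob_start();
    $render();              // anything echoed here is captured
    $html = ob_get_clean();

    file_put_contents($file, $html, LOCK_EX);
    return $html;
}

$dir = sys_get_temp_dir();
$key = fragmentKey('/page1', 42);
@unlink($dir . '/' . $key . '.html'); // empty cache for the demo

$renders = 0;
$render = function () use (&$renders): void {
    $renders++;
    echo '<ul><li>First entry</li></ul>';
};

echo cachedFragment($key, 3600, $render, $dir), "\n"; // rendered and stored
echo cachedFragment($key, 3600, $render, $dir), "\n"; // served from the cache
```

With a store such as Memcached the age check would disappear, since the store expires entries itself once the TTL passes.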
Just looking for a piece of advice. On one of our web pages we have a debate/forum site. Every time a user requests the debate page, he/she will get a list of all topics (and their count of answers etc.).
Likewise, when the user requests a specific topic/thread, all answers to the thread are shown along with the username, user picture, age, and total number of forum posts of each answer's author.
All content is currently retrieved using a MySQL query every time the page is accessed. This is starting to get painfully slow (especially with large threads, +3000 answers).
I would like to cache the debate entries somehow to speed up this process. The problem is that if I cache the entries themselves, the post counts etc. (which are dynamic, of course) will not always be up to date.
Is there any smart way of caching the pages/recaching them when stuff like this is updated? :)
Thanks in advance,
fischer
You should create a tag or a name for the cache based on its data.
For example for the post named Jake's Post you could create an md5 of the name, this would give you the tag 49fec15add24931728652baacc08b8ee.
Now cache the contents and everything to do with this post against the tag 49fec15add24931728652baacc08b8ee. When the post is updated or a comment is added go to the cache and delete everything associated with 49fec15add24931728652baacc08b8ee.
Now there is no cache, and it will be rebuilt when the next visitor arrives to view the post.
You could break this down further by having multiple tags per post. E.g you could have a tag for comments and answers, when a comment is added delete the comments tag, but not the answers tag. This reduces the work the server has to do when rebuilding the cache as only the comments are now missing.
There are a number of libraries and frameworks that can aid you in doing this.
Jake
EDIT
I'd use files to store the data, more specifically the HTML output of the page. You can then do something like:
if(file_exists($tag))
{
// Load the contents of the cache file here and output it
}
else
{
// Do complex database look up and cache the file for later
}
Remember that frameworks like Zend have this sort of stuff built in. I would seriously consider using a framework.
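Without a framework, the tag scheme described above could be sketched as follows. This is an illustration under assumptions: the helper names (cacheTag, getTagged, invalidateTag) are made up, each tag maps to one file in the temp directory, and $build stands in for the complex database lookup.

```php
<?php
// Tags are derived from the data: md5 of the post name, suffixed with the
// part of the page ("comments", "answers") so parts can be dropped separately.
function cacheTag(string $postName, string $part): string
{
    return md5($postName) . '-' . $part;
}

function cachePath(string $dir, string $tag): string
{
    return $dir . '/' . $tag . '.html';
}

// Serve the cached HTML for a tag, or build and store it on a miss.
function getTagged(string $dir, string $tag, callable $build): string
{
    $file = cachePath($dir, $tag);
    if (is_file($file)) {
        return file_get_contents($file);
    }
    $html = $build(); // stands in for the complex database lookup
    file_put_contents($file, $html, LOCK_EX);
    return $html;
}

// Delete whatever is cached under a tag so the next visitor rebuilds it.
function invalidateTag(string $dir, string $tag): void
{
    $file = cachePath($dir, $tag);
    if (is_file($file)) {
        unlink($file);
    }
}

$dir = sys_get_temp_dir();
$commentsTag = cacheTag("Jake's Post", 'comments');
$answersTag  = cacheTag("Jake's Post", 'answers');
invalidateTag($dir, $commentsTag); // empty cache for the demo
invalidateTag($dir, $answersTag);

$builds = 0;
$build = function () use (&$builds): string {
    $builds++;
    return "<p>built $builds time(s)</p>";
};

getTagged($dir, $commentsTag, $build);
getTagged($dir, $answersTag, $build);
invalidateTag($dir, $commentsTag);                // a comment was added
echo getTagged($dir, $commentsTag, $build), "\n"; // rebuilt
echo getTagged($dir, $answersTag, $build), "\n";  // still cached
```

Only the comments fragment is rebuilt after the new comment; the answers fragment is still served from its file.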
Interesting topic!
The first thing I'd look at is optimizing your database - even if you have to spend money upgrading the hardware, it will be significantly easier and cheaper than introducing a cache - fewer moving parts, fewer things that can go wrong...
If you can't squeeze more performance out of your database, the next thing I'd consider is de-normalizing the data a little. For instance, maintain a "reply_count" column, rather than counting the replies against each topic. This is ugly, but introduces fewer opportunities for things to go wrong - with a bit of luck, you can localize all the logic in your data access layer.
The next option I'd consider is to cache pages. For instance, just caching the "debate page" for 30 seconds should dramatically reduce the load on your database if you've got reasonable levels of traffic, and even if it all goes wrong, because you're caching the entire page, it will sort itself out the next time the page goes stale. In most situations, caching an entire page is okay - it's not the end of the world if a new post has appeared in the last 30 seconds and you don't see it on your page.
If you really have to provide more "up to date" content on the page, you might introduce caching at the database access level. I have, in the past, built a database access layer which cached the results of SQL queries based on hard-wired logic about how long to cache the results. In our case, we built a function to call the database which allowed you to specify the query (e.g. get posts for user), an array of parameters (e.g. username, date-from), and the cache duration. The database access function would cache results for the cache duration based on the query and the parameters; if the cache duration had expired, it would refresh the cache.
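A function along those lines might be sketched like this. It is not the original implementation: the QueryCache class, its fetch() signature, and the $runQuery callable are hypothetical, and the cache is a plain in-process array, so it only illustrates the keying and expiry logic. A real version would keep the entries in a shared store so they survive between requests.

```php
<?php
// Caches query results keyed by the query text and its parameters.
final class QueryCache
{
    /** @var array<string, array{expires: int, rows: array}> */
    private array $entries = [];

    /** @var callable hypothetical function that actually runs the query */
    private $runQuery;

    public function __construct(callable $runQuery)
    {
        $this->runQuery = $runQuery;
    }

    // $now is injectable only so the demo can simulate the passage of time.
    public function fetch(string $query, array $params, int $ttl, ?int $now = null): array
    {
        $now = $now ?? time();
        $key = md5($query . '|' . serialize($params));

        if (isset($this->entries[$key]) && $this->entries[$key]['expires'] > $now) {
            return $this->entries[$key]['rows']; // still fresh
        }

        // Missing or expired: hit the database and refresh the entry.
        $rows = ($this->runQuery)($query, $params);
        $this->entries[$key] = ['expires' => $now + $ttl, 'rows' => $rows];
        return $rows;
    }
}

$calls = 0;
$db = new QueryCache(function (string $query, array $params) use (&$calls): array {
    $calls++;
    return [['user' => $params['username'], 'call' => $calls]];
});

$sql = 'SELECT * FROM posts WHERE username = :username';
$db->fetch($sql, ['username' => 'fischer'], 30, 1000); // miss
$db->fetch($sql, ['username' => 'fischer'], 30, 1010); // hit
$db->fetch($sql, ['username' => 'fischer'], 30, 1031); // expired, refreshed
echo $calls, "\n";
```

Because the parameters are part of the key, the same query for a different user gets its own cache entry.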
This scheme was fairly bug-proof - as an end user, you'd rarely notice weirdness due to caching, and because we kept the cache period fairly short, it all sorted itself out very quickly.
Building up your page by caching snippets of content is possible, but very quickly becomes horribly complex. It's very easy to create a page that makes no sense to the end user due to the different caching policies - "unread posts" doesn't add up to the number of posts in the breakdown because of different caching policies between "summary" and "detail".
So I'm looking to do caching for a forum I'm building and I want to understand the best method. I've been doing some reading and the way that the Zend Framework handles caching (here) explains the idea well, but there are a few things I'm not sure about.
Let's say that I want to cache posts: should I simply "dump" the contents of the query into a file and then retrieve from that, or should I be building the layout around the data and then simply returning the contents of the file? And how would I handle user information? Historically the standard forum display includes a user's total post count next to a post. This can change very often (assuming 30 posts per page) and would mean I'd have to constantly clear the cache, which would seem to defeat the purpose.
I can't find any articles about how I should approach this and I'd be interested to learn more, does anyone have any insight or relevant articles to help?
There's always a trade-off between how often you will hit the cache (and hence how useful the cache is), how much you want to cache, and how long the cache lifetime should be.
You should identify the bottlenecks in your application. If it's the query that's holding the performance back, by all means cache the query. If it's building some parts of the page, cache those instead.
As to retrieving the user post counts, if you want them to be as live as possible, then you can't cache them (or if you do, you'll have to invalidate all the cached threads where that user has ever posted...). Retrieving post counts from the database (if done right) shouldn't be too taxing. You can cache a template where the post count is left blank to be filled in later, or you can do some tricks with JavaScript.
We are using Smarty Templates on our LAMP site but my question would also apply to a site running Memcached (which we are planning to also bring online). Many of the pages of our user generated site have different views depending on who is looking at them. For instance, a list of comments where your own comments are highlighted. There would need to be a unique cache-id for each logged in user for this specific view. My question is, in this scenario, would you not even cache these views? Or is the overhead in creating/using the cache (either for smarty or memcached), low enough that you still would see some benefit to the cache?
Unless individual users are requesting the pages over and over again, there's no point caching this sort of thing, and I expect the overhead of caching will vastly exceed the performance benefits, simply since the cache hit ratio will be poor.
You may be better off looking into caching fragments of your site that do not depend on the individual user, or fragments that will be the same for a large number of page impressions (e.g. content that is the same for a large subset of your users).
For example - on this page you might want to cache the list of related questions, or the tag information, but there's probably little point caching the top-bar with reputation info too aggressively, since it will be requested relatively infrequently.
If the view code isn't too complicated just cache the data and generate the view each time.