PHP Memcache potential problems? - php

I'll most probably be using MemCache for caching some database results.
As I haven't ever written and done caching I thought it would be a good idea to ask those of you who have already done it. The system I'm writing may have concurrency running scripts at some point of time. This is what I'm planning on doing:
I'm writing a banner exchange system.
The information about banners are stored in the database.
There are different sites, with different traffic, loading a php script that would generate code for those banners. (so that the banners are displayed on the client's site)
When a banner is being displayed for the first time - it get's cached with memcache.
The banner has a cache life time for example 1 hour.
Every hour the cache is renewed.
The potential problem I see in this task is at step 4 and 6.
If we have for example 100 sites with big traffic it may happen that the script has a several instances running simultaneously. How could I guarantee that when the cache expires it'll get regenerated once and the data will be intact?

How could I guarantee that when the cache expires it'll get regenerated once and the data will be intact?
The approach to caching I take is, for lack of a better word, a "lazy" implementation. That is, you don't cache something until you retrieve it once, with the hope that someone will need it again. Here's the pseudo code of what that algorithm would look like:
// returns false if there is no value or the value is expired
result = cache_check(key)
if (!result)
{
result = fetch_from_db()
// set it for next time, until it expires anyway
cache_set(key, result, expiry)
}
This works pretty well for what we want to use it for, as long as you use the cache intelligently and understand that not all information is the same. For example, in a hypothetical user comment system, you don't need an expiry time because you can simply invalidate the cache whenever a new user posts a comment on an article, so the next time comments are loaded, they're recached. Some information however (weather data comes to mind) should get a manual expiry time since you're not relying on user input to update your data.
For what its worth, memcache works well in a clustered environment and you should find that setting something like that up isn't hard to do, so this should scale pretty easily to whatever you need it to be.

Related

Is it possible at the same time to give all users of website the same session?

Is it possible at the same time to give all users of website the same $_SESSION?
I have interpreted your question in the following way:
Is it possible at the same time to give all users of a website the same [state]?
Yes, shared state is usually stored in a database, although state may also be stored in the local file system.
Using $_SESSION is meant to save state for a single user. If you abuse that tool, you will create an insecure system.
I want to do following
if($_SESSION['new_posts'] + 60 >
time()) echo "there are new posts in
the forum!"; I don't want to use mysql
for that.
An easy and fast way
on every new post you do:
//will write an empty file or just update modification time
touch ('new_posts.txt');
And then:
if(filemtime('new_posts.txt') + 60 > time()) { ... }
Global sessions are not possible without opening your site up for a gigantic security hole.
Instead, let's look at what you want to do, which is grabbing data while avoiding a query every page.
1) Writing the value out to a file and then reading it every request is an option. Sessions are stored in a file on the server by default, so it would have the same speed as a session.
2) Store it in a cache such as APC, Memcache, Redis.
Keep in mind these are cached values - you'll still have to update them regularly. You have to use either a cron job or have the client update them. But what do you do when the cache expires and tons of clients are trying to update the cache at once? This is called the dogpile effect and it's something to think about.
Or you could just simply write a SQL query and execute it every page and keep it simple. Why don't you want to do this? Have you profiled the code and determined that it's an actual issue? Worrying about this before it's a known problem is a waste of time.

Caching only frequently used data in PHP

I have a news site which receives around 58,000 hits a day for 36,000 articles. Of this 36000 unique stories, 30000 get only 1 hit (majority of which are search engine crawlers) and only 250 stories get over 20 impressions. It is a wastage of memory to cache anything, but these 250 articles.
Currently I am using MySQL Query Cache and xcache for data caching. The table is updated every 5-10 mins, hence Query Cache alone is not much useful. How can I detect frequently visited pages alone and cache the data?
I think you can have two options to start with:
You don't cache anything by default.
You can implement with an Observer/Observable pattern a way to trigger an event when the article's view reaches a threshold, and start caching the page.
You cache every article at creation
In both case, you can use a cron to purge articles which don't reaches your defined threshold.
In any case, you'll probably need to use any heuristic method to determine enough early that your article will need to be cached, and as in any heuristic method, you'll have false-positive and vice-versa.
It'll depend on how your content is read, if articles are realtime news, it'll probably be efficient as it'll quickly generate high traffic.
The main problem with those method is you'll need to store extra information like the last access datetime and its current page views which could result in extra queries.
You can cache only new articles (let's say the ones which have been added recently). I'd suggest having a look at memcached and Redis - they are both very useful, simple and at the same time powerful caching engines.

Generating scoreboards on large traffic sites

Bit of an odd question but I'm hoping someone can point me in the right direction. Basically I have two scenarios and I'd like to know which one is the best for my situation (a user checking a scoreboard on a high traffic site).
Top 10 is regenerated every time a user hits the page - increase in load on the server, especially in high traffic, user will see his/her correct standing asap.
Top 10 is regenerated at a set interval e.g. every 10 minutes. - only generates one set of results causing one spike every 10 minutes rather than potentially once every x seconds, if a user hits in between the refresh they won't see their updated score.
Each one has it's pros and cons, in your experience which one would be best to use or are there any magical alternatives?
EDIT - An update, after taking on board what everyone has said I've decided to rebuild this part of the application. Rather than dealing with the individual scores I'm dealing with the totals, this is then saved out to a separate table which sort of acts like a cached data source.
Thank you all for the great input.
Adding to Marcel's answer, I would suggest only updating the scoreboards upon write events (like new score or deleted score). This way you can keep static answers for popular queries like Top 10, etc. Use something like MemCache to keep data cached up for requests, or if you don't/can't install something like MemCache on your server serialize common requests and write them to flat files, and then delete/update them upon write events. Have your code look for the cached result (or file) first, and then iff it's missing, do the query and create the data
Nothing is never needed real time when it comes to the web. I would go with option 2 users will not notice that there score is not changing. You can use some JS to refresh the top 10 every time the cache has cleared
To add to Jordan's suggestion: I'd put the scorecards in a separate (HTML formatted) file, that is produced every time when new data arrives and only then. You can include this file in the PHP page containing the scorecard or even let a visitor's browser fetch it periodically using XMLHttpRequests (to save bandwidth). Users with JavaScript disabled or using a browser that doesn't support XMLHttpRequests (rare these days, but possible) will just see a static page.
The Drupal voting module will handle this for you, giving you an option of when to recalculate. If you're implementing it yourself, then caching the top 10 somewhere is a good idea - you can either regenerate it at regular intervals or you can invalidate the cache at certain points. You'd need to look at how often people are voting, how often that will cause the top 10 to change, how often the top 10 page is being viewed and the performance hit that regenerating it involves.
If you're not set on Drupal/MySQL then CouchDB would be useful here. You can create a view which calculates the top 10 data and it'll be cached until something happens which causes a recalculation to be necessary. You can also put in an http caching proxy inline to cache results for a set number of minutes.

How to live update browser game attributes like the 4 resources in Travian game?

I would like to make a web-based game which is Travian-like (or Ikariam-like). The game will be in PHP & MySQL-based. I wonder how can I achieve the live updating of game attributes.
For frontend, I can achieve by using AJAX calls (fetch the latest values from database), or even fake update of values (not communicated with server).
For backend, is this done by a PHP cron job (which runs every few seconds)? If so, can anyone provide me some sample codes?
by the way, I know it would be a trouble if I use IIS + FastCGI.
=== Version Information ===
PHP : 5.2.3
IIS : 6.0 with FastCGI
OS : Windows Server 2003 Standard R2
The correct answer depends on your exact needs.
Does everyone always get resources at the same rate? If so, a simple solution is to track how long their user has existed, calculate the amount of resources based on the rate they're getting, and subtract the number of resources they've spent in total. That's going to be a bit of a problem if the rate can ever change, though, so if you use this solution, you're pretty much stuck with the rate you pick unless you rewrite the handling entirely (for example to the one below).
If it varies how quickly people can get resources, you'll need to update the data periodically. A cronjob/scheduled task would work well to make sure everyone is updated, but in some situations, it might be better to simply measure how long it's been since you've updated each user's resources, and then update them on every page load they make while logged in by multiplying the time they've been away by the rate at which they gain resources - that way, you avoid updating until you actually need the new value.
For a Travian like resource management you need to keep track when you updated the users resources for the last time. If you read the resource values (for a page refresh or something), you need to add the amount of resources gained since the 'last update time' (depending on the amount of resources fields and boni the user gets) and send that value to the browser. You could also the let browser script calculate these amounts.
You might to consider caching all resource amounts somehow, since these values are required a lot, improving the communication with your database.
If a user finishes building a resource field, uses the market, builds a structure, etc you need to update the amount of resources (and the 'last update time'), because you cannot keep track on these kind of events simply.
By calculating the resources the database load is reduced, since you do not need to write the new values every time when the user refreshes the browser page. It is also more accurate since you have less rounding errors.
To keep the resources increasing between page refreshes you need a method as Frank Farmer described. Just embed the resource amount and the 'gain frequency' in some javascript and increase the resource amount every 'gain frequency' by one.
You can also calculate the ressources each time a page or the javascript asks. You'd need to store the last updated time.
It may be an old post but it comes up right away in Google so here's another option which is how the game I've been developing does it.
I use a client side JavaScript that uses a flash socket to get live updates from a dedicated game server running on the host.
I use the xmlsocket kit from http://devpro.it/xmlsocket/

Zend_Cache vs cronjob

I am working on my bachelor's project and I'm trying to figure out a simple dilemma.
It's a website of a football club. There is some information that will be fetched from the website of national football association (basically league table and matches history). I'm trying to decide the best way to store this fetched data. I'm thinking about two possibilities:
1) I will set up a cron job that will run let's say every hour. It will call a script that will fetch the league table and all other data from the website and store them in a flat file.
2) I will use Zend_Cache object to do the same, except the data will be stored in cached files. The cache will get updated about every hour as well.
Which is the better approach?
I think the answer can be found in why you want to cache the file. Is it to place minimal load on the external server by only updating the cache every so often, or is it to keep pages loading fast because the file takes long to download or process?
If it's only to respect the other server, and fetching/processing the page takes little noticable time to the end user, I'd just implement Zend_Cache. It's simple, you don't have to worry about one script downloading the page, then another script loading the downloaded data (plus the cron job).
If the cache is also needed because fetching/processing the page is significant, I'd still use Zend_Cache; however, I'd set the cache to expire every 2 hours, and setup a cron job (or something similar) to manually update the cache every hour. Sure, this adds back the complexity of two scripts (or at least adding a request flag to manually refresh the cache), but should the cron job fail, you're still fine.
Well if you choose 1 it somewhat adds complexity because you have to use cron as well (not that cron is overly complex) and then you have to test that the data file is complete before using it or deall with moving files from a temp location after they have downloaded and been parsed to the proper format.
If you use two it eliminates much of 1, except now on the request where the cache is dead you have to wait for the download/parse.
I would say 1 is the better option, but 2 is going to be easier to implement and less prone to error. That said its fairly trivial to implement things in the cron script to prevent the negatives i describe. So i would probably go with 1.

Categories