Generating scoreboards on large traffic sites - php

Bit of an odd question but I'm hoping someone can point me in the right direction. Basically I have two scenarios and I'd like to know which one is the best for my situation (a user checking a scoreboard on a high traffic site).
Option 1: the top 10 is regenerated every time a user hits the page. This increases load on the server, especially under high traffic, but users see their correct standing as soon as possible.
Option 2: the top 10 is regenerated at a set interval, e.g. every 10 minutes. Only one set of results is generated, causing one spike every 10 minutes rather than potentially one every x seconds; however, a user who hits the page between refreshes won't see their updated score.
Each one has its pros and cons; in your experience, which one would be best to use, or are there any magical alternatives?
EDIT - An update, after taking on board what everyone has said I've decided to rebuild this part of the application. Rather than dealing with the individual scores I'm dealing with the totals, this is then saved out to a separate table which sort of acts like a cached data source.
Thank you all for the great input.

Adding to Marcel's answer, I would suggest only updating the scoreboards on write events (like a new score or a deleted score). This way you can keep static answers for popular queries like the Top 10, etc. Use something like memcache to keep the data cached for requests, or if you don't/can't install something like memcache on your server, serialize common requests and write them to flat files, then delete/update those files on write events. Have your code look for the cached result (or file) first, and only if it's missing, run the query and create the data.
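A minimal sketch of the flat-file variant described above, with write-event invalidation; the file location, key name, and the fetch callback are illustrative placeholders, not from the original post:

```php
<?php
// Flat-file cache for the Top 10, invalidated on write events.

function cache_path(string $key): string {
    return sys_get_temp_dir() . '/scoreboard_' . md5($key) . '.cache';
}

// Read-through: serve the cached file if present, otherwise run the
// expensive query (passed in as a callable) and store the result.
function get_top10(callable $fetchFromDb): array {
    $file = cache_path('top10');
    if (is_file($file)) {
        return unserialize(file_get_contents($file));
    }
    $rows = $fetchFromDb();
    file_put_contents($file, serialize($rows), LOCK_EX);
    return $rows;
}

// Call this from every write event (new score, deleted score).
function invalidate_top10(): void {
    $file = cache_path('top10');
    if (is_file($file)) {
        unlink($file);
    }
}
```

On a score insert you would run the INSERT first and then call invalidate_top10(); the next page view rebuilds the cache file.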

Nothing ever truly needs to be real time when it comes to the web. I would go with option 2; users will not notice that their score is not changing instantly. You can use some JS to refresh the top 10 every time the cache has cleared.

To add to Jordan's suggestion: I'd put the scoreboard in a separate (HTML-formatted) file that is produced every time new data arrives, and only then. You can include this file in the PHP page containing the scoreboard, or even let a visitor's browser fetch it periodically using XMLHttpRequest (to save bandwidth). Users with JavaScript disabled, or using a browser that doesn't support XMLHttpRequest (rare these days, but possible), will just see a static page.

The Drupal voting module will handle this for you, giving you an option of when to recalculate. If you're implementing it yourself, then caching the top 10 somewhere is a good idea - you can either regenerate it at regular intervals or you can invalidate the cache at certain points. You'd need to look at how often people are voting, how often that will cause the top 10 to change, how often the top 10 page is being viewed and the performance hit that regenerating it involves.
If you're not set on Drupal/MySQL then CouchDB would be useful here. You can create a view which calculates the top 10 data and it'll be cached until something happens which causes a recalculation to be necessary. You can also put in an http caching proxy inline to cache results for a set number of minutes.


Server optimization for ajax-heavy Apache+Mysql+PHP site

I've looked around and haven't found a pre-existing answer to this question.
Info
My site relies on Ajax, Apache, Mysql, and PHP.
I have already built my site and it works well, however as soon as too many users connect (roughly 200+ requests per second) the server performs very poorly.
This site is very reliant on ajax. The main page of the site performs an ajax request every second so if 100 people are online, I'm receiving at least 100 requests per second.
These ajax queries invoke mysql queries on the server-side. These queries return small datasets. The returned datasets will change very often so I'd imagine caching would be ineffective.
Questions
1) What configuration practices would be best to help me increase the maximum number of requests per second? This applies to Ajax, Mysql, PHP, and Apache.
2) For Apache, do I want persistent connections (the KeepAlive directive) to be "On" or "Off"? As I understand, Off is useful if you are expecting many users, but On is useful for ajax and I need both of these things.
3) When I test the server's performance on serving a plain, short html page with no ajax (and involving only 1 minor mysql query) it still performs very poorly when this page gets 200+ requests per second. I'd imagine this must be due to apache configuration / server resources. What options do I have to improve this?
Thanks for any help!
Depending on actual user needs, the caching can be implemented in different patterns. In many cases users don't really need updates every second, and/or the data can be cached for longer periods of time and just made to look like it updates a lot. It depends...
Just to give some ideas:
Does every user need a really unique, user-specific response from the ajax requests, or is the response the same (or similar) for all users, or for sub-groups of users?
Does it make sense to deliver updates every second to every user?
Can the users notice the difference if the data is cached for, let's say, 10 seconds?
If the data really is unique for every user, but doesn't change for every user every second, couldn't you use data refreshing (invalidate the cached data only when the data actually changes)?
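The 10-second idea above can be sketched as a tiny shared micro-cache in front of the MySQL query; the file-based store and function names are assumptions, not from the original answer:

```php
<?php
// Serve an AJAX response from a short-lived shared cache instead of
// hitting MySQL on every poll: with a 10-second TTL, all polling
// clients together cause at most one query per key per 10 seconds.

function cached_json(string $key, int $ttl, callable $build): string {
    $file = sys_get_temp_dir() . '/ajax_' . md5($key) . '.json';
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return file_get_contents($file);   // still fresh, no query
    }
    $json = json_encode($build());         // one MySQL hit, shared by all
    file_put_contents($file, $json, LOCK_EX);
    return $json;
}
```

A hypothetical endpoint would then do something like: echo cached_json('latest_scores', 10, 'query_latest_scores');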
I used requirejs to lazy-load the js, html and css files. For the server to serve loads of assets you need to set KeepAliveTimeout to something like 15.

PHP Memcache potential problems?

I'll most probably be using MemCache for caching some database results.
As I haven't ever written or done caching, I thought it would be a good idea to ask those of you who have already done it. The system I'm writing may have concurrent scripts running at some point in time. This is what I'm planning on doing:
1. I'm writing a banner exchange system.
2. The information about the banners is stored in the database.
3. There are different sites, with different traffic, loading a php script that generates the code for those banners (so that the banners are displayed on the client's site).
4. When a banner is displayed for the first time, it gets cached with memcache.
5. The banner's cache has a lifetime of, for example, 1 hour.
6. Every hour the cache is renewed.
The potential problem I see in this task is at steps 4 and 6.
If we have, for example, 100 sites with big traffic, the script may have several instances running simultaneously. How can I guarantee that when the cache expires it will be regenerated only once and the data will stay intact?
How could I guarantee that when the cache expires it'll get regenerated once and the data will be intact?
The approach to caching I take is, for lack of a better word, a "lazy" implementation. That is, you don't cache something until you retrieve it once, in the hope that someone will need it again. Here's what that algorithm looks like in PHP (assuming $cache is a connected Memcached instance):
// get() returns false if there is no value or the value has expired
$result = $cache->get($key);
if ($result === false)
{
    $result = fetch_from_db();
    // set it for next time, until it expires anyway
    $cache->set($key, $result, $expiry);
}
This works pretty well for what we want to use it for, as long as you use the cache intelligently and understand that not all information is the same. For example, in a hypothetical user comment system, you don't need an expiry time because you can simply invalidate the cache whenever a new user posts a comment on an article, so the next time comments are loaded, they're recached. Some information however (weather data comes to mind) should get a manual expiry time since you're not relying on user input to update your data.
For what it's worth, memcache works well in a clustered environment, and you should find that setting something like that up isn't hard to do, so this should scale pretty easily to whatever you need it to be.
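On the "regenerated once" concern from the question, one common trick is an atomic add-style lock around the rebuild. The sketch below uses an exclusive file create so it is self-contained; with memcache you would use Memcached::add(), which likewise succeeds only for the first caller:

```php
<?php
// Ensure only one script instance regenerates an expired cache entry;
// the losers serve the stale copy instead of piling onto the database.

function regenerate_once(string $key, callable $rebuild, callable $readStale) {
    $lock = sys_get_temp_dir() . '/lock_' . md5($key);
    $fh = @fopen($lock, 'x');   // atomic create: only one caller succeeds
    if ($fh === false) {
        return $readStale();    // someone else is already rebuilding
    }
    try {
        return $rebuild();      // the winner regenerates the cache
    } finally {
        fclose($fh);
        unlink($lock);
    }
}
```

Serving slightly stale banner data to the losing instances is usually acceptable here, since the cache is refreshed only hourly anyway.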

Caching only frequently used data in PHP

I have a news site which receives around 58,000 hits a day for 36,000 articles. Of these 36,000 unique stories, 30,000 get only 1 hit (the majority from search engine crawlers) and only 250 stories get over 20 impressions. It would be a waste of memory to cache anything but these 250 articles.
Currently I am using the MySQL query cache and xcache for data caching. The table is updated every 5-10 minutes, hence the query cache alone is not much use. How can I detect the frequently visited pages alone and cache their data?
I think you have two options to start with:
1) You don't cache anything by default. Using an Observer/Observable pattern, trigger an event when an article's view count reaches a threshold, and start caching the page at that point.
2) You cache every article at creation.
In both cases, you can use a cron job to purge articles which don't reach your defined threshold.
Either way, you'll probably need some heuristic to determine early enough that an article will need to be cached, and as with any heuristic you'll get false positives and false negatives.
How well it works will depend on how your content is read; if the articles are real-time news, it will probably be efficient, since they'll quickly generate high traffic.
The main problem with these methods is that you need to store extra information, like the last access datetime and the current page views, which could mean extra queries.
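A minimal sketch of option 1), with a file-based hit counter standing in for whatever store you would actually use (xcache, a counter column in MySQL, etc.); the function names and the threshold of 20 are illustrative:

```php
<?php
// Start caching an article only once it crosses a view threshold,
// so the ~30,000 one-hit stories never occupy cache memory.

const CACHE_THRESHOLD = 20;

function views_file(int $articleId): string {
    return sys_get_temp_dir() . "/views_$articleId";
}

// Record one page view and return the running total.
function record_view(int $articleId): int {
    $file = views_file($articleId);
    $count = (is_file($file) ? (int) file_get_contents($file) : 0) + 1;
    file_put_contents($file, (string) $count, LOCK_EX);
    return $count;
}

// The page controller checks this before deciding whether to cache.
function should_cache(int $articleId): bool {
    $file = views_file($articleId);
    return is_file($file) && (int) file_get_contents($file) >= CACHE_THRESHOLD;
}
```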
You can cache only new articles (let's say the ones which have been added recently). I'd suggest having a look at memcached and Redis - they are both very useful, simple and at the same time powerful caching engines.

Opinions/expertise/suggestions wanted - provide feedback from php performing a lengthy task

I am thinking about converting a visual basic application (that takes pipe delimited files and imports them into a microsoft sql database) into a php page. One of these files is on average about 9 megabytes in size. (I couldn't be accurate about the number of lines involved but I'd say it's about 20 thousand)
One of the advantages being that any changes made to the page would be automatically 'deployed' to the intended user (currently when I make changes to the visual basic app, which was originally created by someone else, I have to put the latest version on all the PCs of the people that use it).
Problem is these imports can take like two minutes to complete. Two minutes is a long time in website-time so I would like to provide feedback to the user to indicate that the page hasn't failed/timed out and is definitely doing something.
The best idea I can think of so far is to use ajax to do it incrementally: say, import 1000 records at a time, then feed back, import the next 1000, feed back, and so on.
Are there better ways of doing this sort of thing that wouldn't require me to learn new programming languages or download apps or libraries?
You don't have to make the Visual Basic -> PHP switch. You can stick with VB syntax in ASP or ASP.NET applications. With an ASP based solution, you can reuse plenty of the existing code so it won't be learning a new language / starting from scratch.
As for how to present a long running process to the user, you're looking for "Asynchronous Handlers" - the basic premise being the user visits a web page (A) which starts the process page (B).
(A) initiates (B), reports starting to the user and sets the page to reload in n seconds.
(B) does all the heavy lifting - just like your existing VB app. Progress is stored in some shared space (a flat file, a database, a memory cache, etc)
Upon reload, (A) reports current progress of (B) by read-only accessing the shared space (B) is keeping progress in.
Scope of (A):
Look for a running (B) process and report its status if found, or initiate a fresh (B) process. Since (B) appears to be based on the existence of files (from your description), you might give (A) the ability to determine whether there's any point in calling (B) at all (i.e. if files exist, call (B); else report: nothing to do), or you may wish to keep the scopes entirely separate and always call (B).
Report progress of (B).
Should take very little time to execute; you may want to include an HTTP refresh header so the user automatically gets updates.
Scope of (B):
Same as existing VB script – look for files, load… yada yada yada.
Should take similar time to execute as existing VB script (2 minutes)
Potential Improvements:
(A) could use an AJAX interface, so instead of a page-reload (HTTP refresh), an AJAX call is made every n seconds and simply the status box is updated. Some sort of animated icon (swirling wheel) will give the user the impression something is going on between refreshes.
It sounds like (B) could benefit from a multi-threaded approach (loading multiple files at once) depending on whether the files are related. As pointed out by Ponies, there may be a better strategy to such a load, but that's a different topic all together :)
Some sort of semaphore/flag approach may be required if page (A) could be hit simultaneously by multiple users while (B) takes a few seconds to start up and report a status.
Both (A) and (B) can be developed in PHP or ASP technology.
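One possible shape for the shared space mentioned above: (B) writes a small JSON progress file after each batch of rows, and (A) simply reads it on every refresh. The file name and field names are illustrative:

```php
<?php
// Shared-space progress reporting for the (A)/(B) pattern.

function progress_file(): string {
    return sys_get_temp_dir() . '/import_progress.json';
}

// Called by (B) after each batch of imported rows.
function report_progress(int $done, int $total): void {
    $data = ['done' => $done, 'total' => $total, 'updated' => time()];
    file_put_contents(progress_file(), json_encode($data), LOCK_EX);
}

// Called by (A) on each page reload or AJAX poll.
function read_progress(): array {
    $file = progress_file();
    if (!is_file($file)) {
        return ['done' => 0, 'total' => 0, 'updated' => 0];   // not started
    }
    return json_decode(file_get_contents($file), true);
}
```

(A) can compare 'updated' against the current time to detect a stalled (B).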
How are you importing the data into the database? Ideally, you should be using SQL Server's BULK INSERT which likely would speed up things. But it's still a matter of uploading the file for parsing...
I don't think it's worth the effort to get status of insertions - most sites only display an animated gif/etc (like the hourglass, etc) to indicate that the system is processing things but no real details.

How to live update browser game attributes like the 4 resources in Travian game?

I would like to make a web-based game which is Travian-like (or Ikariam-like). The game will be PHP and MySQL based. I wonder how I can achieve live updating of the game attributes.
For the frontend, I can use AJAX calls (fetching the latest values from the database), or even fake updates of the values (not communicated to the server).
For backend, is this done by a PHP cron job (which runs every few seconds)? If so, can anyone provide me some sample codes?
By the way, I know it could be trouble that I use IIS + FastCGI.
=== Version Information ===
PHP : 5.2.3
IIS : 6.0 with FastCGI
OS : Windows Server 2003 Standard R2
The correct answer depends on your exact needs.
Does everyone always get resources at the same rate? If so, a simple solution is to track how long their user has existed, calculate the amount of resources based on the rate they're getting, and subtract the number of resources they've spent in total. That's going to be a bit of a problem if the rate can ever change, though, so if you use this solution, you're pretty much stuck with the rate you pick unless you rewrite the handling entirely (for example to the one below).
If it varies how quickly people can get resources, you'll need to update the data periodically. A cronjob/scheduled task would work well to make sure everyone is updated, but in some situations, it might be better to simply measure how long it's been since you've updated each user's resources, and then update them on every page load they make while logged in by multiplying the time they've been away by the rate at which they gain resources - that way, you avoid updating until you actually need the new value.
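The "update on page load" approach in the previous paragraph boils down to one small calculation; a sketch, with the parameter names being assumptions:

```php
<?php
// Derive a user's current resources from the last stored value instead
// of updating every user every second with a cron job.

function current_resources(int $stored, float $ratePerHour,
                           int $lastUpdate, int $now): int {
    $elapsed = max(0, $now - $lastUpdate);        // seconds since last write
    return $stored + (int) floor($elapsed * $ratePerHour / 3600);
}
```

On each page load you would compute this, and write the new amount and timestamp back only when the value is actually consumed (spending, building), since those events can't be derived from elapsed time.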
For Travian-like resource management you need to keep track of when you last updated each user's resources. When you read the resource values (for a page refresh or similar), you add the amount of resources gained since the 'last update time' (depending on the number of resource fields and bonuses the user has) and send that value to the browser. You could also let the browser script calculate these amounts.
You might consider caching all resource amounts somehow, since these values are needed a lot; that reduces the load on your database.
If a user finishes building a resource field, uses the market, builds a structure, etc., you need to update the stored amount of resources (and the 'last update time'), because these kinds of events cannot be derived from elapsed time alone.
By calculating the resources on demand, the database load is reduced, since you do not need to write new values every time the user refreshes the page. It is also more accurate, since you have fewer rounding errors.
To keep the resources increasing between page refreshes you need a method like the one Frank Farmer described: just embed the resource amount and the 'gain frequency' in some JavaScript and increase the resource amount by one every 'gain frequency'.
You can also calculate the resources each time a page or the JavaScript asks for them. You'd need to store the last updated time.
It may be an old post, but it comes up right away in Google, so here's another option, which is how the game I've been developing does it.
I use a client side JavaScript that uses a flash socket to get live updates from a dedicated game server running on the host.
I use the xmlsocket kit from http://devpro.it/xmlsocket/
