MySQL query cache vs caching result-sets in the application layer - php

I'm running a php/mysql-driven website with a lot of visits and I'm considering the possibility of caching result-sets in shared memory in order to reduce database load.
However, right now MySQL's query cache is enabled and it seems to be doing a pretty good job since if I disable query caching, the use of CPU jumps to 100% immediately.
Given that situation, I dont know if caching result-sets (or even the generated HTML code) locally in shared memory with PHP will result in any noticeable performace improvement.
Does anyone out there have any experience on this matter?
PS: Please avoid suggesting heavy-artillery solutions like memcached. Right now I'm looking for simple solutions that dont require too much time to implement, deploy and maintain.
Edit:
I see my comment about memcached deviated answers from the actual point, which is whether caching DB queries in the application layer would result in a noticeable performace impact considering that the result of those queries are already being cached at the DB level.

I know you didn't want to hear about memcached, but it is one of the best solutions for what you're trying to do. Depending on your site usage, there can be massive improvements in performance. By simply using memcached's session handler over my database session handler, I was able to cut the load in half and cut back on request serving times by over 30%.
Realistically, memcached is a simple solution. It's already integrated with PHP (if you have the extension loaded), and it requires virtually no configuration (I simply had to add memcached as a service on my linux box, which is done in one or two shell commands).
I would suggest storing session data (and anything that lends itself to caching) in memcache. For dynamic pages (such as stack overflow homepage), I would recommend caching output for a couple of seconds to prevent flooding.

A decent single box solution is file-based caching, but you have to sweep them out manually. Other than that, you could use APC, which is very fast and in-memory (still have to expire them yourself though).
As soon as you scale past one web server, though, you're going to need a shared cache, which is memcached. Why are you so adamant about not deploying this? It's not hard, and it's just going to save you time down the road. You can either start using memcache now and be done with it, or you could use one of the above methods for now and then end up switching to memcache later anyways, resulting in even more work. Plus too, you don't have to deal with running a cronjob or some other ugly hack to get cache expiration features: it does that for you.
The mysql query cache is nice, but it's not without issues. One of the big ones is it expires automatically every time the source data is changed, which you probably don't want.

Related

Is APC better cache option than MySQL?

Whenever I needed to cache some information I relied on timestamps and MySQL, storing the data into a database and fetching it that way. I just read about APC.
APC is so much easier but is it worth converting my previous cache methods to switch to APC besides just less SQL's going through and cleaner code?
If you already have a database running and doing most of your things the first step to improve your performance is to peroperly tune the database. MySQL, properly configured, is very fast.
Obviously at some point in time it isn't fast enough anymore and one needs further caches. When caching one thing to consider is that your data might not be consistent anymore. Meaning that you might update data in your primary store (the database) but others stll read an outdated cache entry
Now you've mentoned APC as a possible solution: APC is two related but different things:
An opcode cache for the PHP scrip
A shared memorz cache for PHP user data
An opcode cache works by storing the compiled PHP script in memory. So when requesting a site the PHP interpreter doesn't have to read the file from disk and analyze the code but can directly execute it. This gives a major boost and is always a good thing.
A shared memory cache takes any PHP variable (well, there are a few exceptions ...) and stores it in shared memory in the system, so all PHP processes on the same machine might read it. So if you store the result of a database query inside APC you save time as access to shared memory is very fast compared to querying a database (sending the query to a different machine, parsing it, executing it, sending the result back ...) but as said in the begginning you have to mind that the data might be outdated. And also mind that all data is stored in memory. So depending on the amount of avilable RAM there are limitations in what can be stored. Another big downside of this is that the data is stored in memory only. This means whenerver the system goes down the cache will be empty and everything in there will be lost.
To answer literally to the question, yes. Mysql is not a cache, APC is, and thus, is better.
Mysql is an storage option to implement a cache on top of it, but you are implementing the cache with those timestamps you mention and whatever logic you are doing with them. APC is a complete implementation of a cache, both for data and for code.
Performance wise, accessing the local APC cache will always be infinitely faster than accessing a mysql database. Keyword there is local, APC is not distributed (as far as I know), so if you want to share your cache, you'll need an external cache system, such as memcached.
Generally, APC will be much, much faster than MySQL, so it's well worth the time to look into it and consider switching from one system to the other. And, as you mention, you will be firing less SQL queries to the database.
More information can be found via Google, I came across the following:
http://www.mysqlperformanceblog.com/2006/08/09/cache-performance-comparison/

Is it worth it to use staged caching in PHP (APC/memcached)?

I'm primarily wondering what the speed difference is in accessing the object cache of APC v. memcached (NOT op-code cache). The primary advantage of memcached is that it is distributed and not restricted to the local machine. However, since it is over the network, there's is some sort of latency involved.
I was wondering whether the speed difference between accessing APC (on the machine) and memcached (on another server) is big enough to warrant having a staged caching scheme, where the program first tries APC, then memcached, and finally the database if all else fails.
Like most everything else: it depends.
If you have a lot of calculations and can store the results then caching will speed things up. If you're just basically storing rows from the database then in memory caching will help but memcached may not add a huge amount of difference vs. a database (assuming the db queries are all simple). On the other hand if you're doing complex queries, or a lot of programmatic work to create something, then caching makes much more sense.
To give you an example, I recently worked on a site that was written by a 3rd party contractor who did not do any performance work during design. It was slow as an ox because it had a lot of unoptimized includes and such. Adding APC basically improved the performance by 10x. Adding memached decreased load times by 10 - 20 ms.
If you're far enough along then do some performance testing (look up xdebug, or another tool) and see where your bottlenecks are, then plan accordingly.
Keep in mind that if you fill up your APC cache with other things then APC will have to re-calculate the op-code for your pages again. This can cause problems if the pages keep removing objects, then once the page runs the objects keep removing pages. Not fun.
Just be safe and don't be tempted to use APC for anything but config values which won't cause your pages to be removed to make space.
TL;DR Once APC gets full your site will slow down and your server will work much harder.

What are the different Cache Engines best used for?

Trying to get to grips with the different types of cache engines File, APC, Xcache, Memcache. Anybody know of any good resources/links?
Note I am using Linux, PHP and mysql
There are 2 types of caching terminology thrown around in PHP.
First is an optcode cache:
http://en.wikipedia.org/wiki/PHP_accelerator
Second is a data cache:
http://simas.posterous.com/php-data-caching-techniques
A few of the technologies can cross boundaries into both realms, but the basics behind them are simple. The idea is: Keep as much data in ram and precompiled because compiling and HD seeks are very expensive processes. HD Seeks can be done to find a file to compile / query the DB to get data / looking for a temp file, and every time that happens it slows down the user experience.
Memcached is generally the way to go, but it has some "features" such as once you save some data to t cache, it doesn't necessarily guarantee that it will be available later as it dynamically removes old caches to make way for new ones. It's also fairly basic, you'll need to roll your own system for handling timeouts and preventing cascading but it's all fairly simple. There's tons of info in the Memcached FAQ, or feel free to ask and I'll post some code examples. Memcached can also act as a session handler which is great if you have lots of users or more than one server.
Otherwise disc caching is good if you only have one server or don't mind generating separate caches of each server. Generally faster than memcached as it doesn't have the network overhead (unless you have memcached on the same server). There are plenty of good disc caching frameworks but probably the best are Pear Cache_Lite and APC.
APC also has the added advantage that it can cache your compiled PHP code which may help on high-performance websites.

About general logging, caching and performance in new framework

I am creating a new PHP framework depending on Zend Framework.
It will be a general purpose MVC framework for web development.
I am worried about 2 aspects:
Logging:
Should I use logging? Is there any substantial performance problems when using logging?
Caching database queries:
I am caching some queries from database.
I am concerned about caching user related information. Suppose there are some information related to users. Like their personal info, etc.
If I cache such data, for every user a cache file will be generated in my data folder. Now suppose there are 10,000 - 20,000 users online in 2 hours span of time. These means that there will be 20000 files on my folder.
My question is that, will it affect the performance of my server. Is there any upper limit on how many files a folder can have on server.
Do not use a file based cache. File system operations are exceptionally slow: http://imgur.com/X1Hi1.gif . Use memcached, you don't need a lot of memory contrary to what the above post says, the amount of memory you need for it is totally proportional to how much stuff you want to store, plus memcached can cull data based on access frequency.
1) You definitely want logging, I'd recommend xdebug available at http://www.xdebug.org/. You can read further about the performance overheads at their site. (plus it integrates nicely with Eclipse's PHP version.)
2) I'm not really sure I'd want to cache much user information, but memcache is probably one of the better choices for caching in php (http://se2.php.net/memcache). And yeah, there's no limit on file number, and you'll probably not be going over the 32-bit filesize limit either =)
Caching is a real problem it's almost impossible to get it right from a user/programmer perspective. I wouldn't cache things as simple as user data. This is already cached in the database. Focus more on complex queries and complete webpages (or parts of it).
Unless you have a page like stackoverflow where i see really few ways to cache anything you have to search hard and check your logfiles about what users do on your site and you will see some hotspots soon.
Memcache is not recommended by me unless you have a lot of memory (> 8GB) on your machine. Memcache works best if you throw in Memcache servers with 16 GB doing nothing else them caching things.
For smaller sites, hardware and requirements you should consider APC as this is a very low overhead cache for data and it speeds up the execution of php at the same time (you don't want to run a production server without a bytecode cache).

Optimizing Kohana-based Websites for Speed and Scalability

A site I built with Kohana was slammed with an enormous amount of traffic yesterday, causing me to take a step back and evaluate some of the design. I'm curious what are some standard techniques for optimizing Kohana-based applications?
I'm interested in benchmarking as well. Do I need to setup Benchmark::start() and Benchmark::stop() for each controller-method in order to see execution times for all pages, or am I able to apply benchmarking globally and quickly?
I will be using the Cache-library more in time to come, but I am open to more suggestions as I'm sure there's a lot I can do that I'm simply not aware of at the moment.
What I will say in this answer is not specific to Kohana, and can probably apply to lots of PHP projects.
Here are some points that come to my mind when talking about performance, scalability, PHP, ...
I've used many of those ideas while working on several projects -- and they helped; so they could probably help here too.
First of all, when it comes to performances, there are many aspects/questions that are to consider:
configuration of the server (both Apache, PHP, MySQL, other possible daemons, and system); you might get more help about that on ServerFault, I suppose,
PHP code,
Database queries,
Using or not your webserver?
Can you use any kind of caching mechanism? Or do you need always more that up to date data on the website?
Using a reverse proxy
The first thing that could be really useful is using a reverse proxy, like varnish, in front of your webserver: let it cache as many things as possible, so only requests that really need PHP/MySQL calculations (and, of course, some other requests, when they are not in the cache of the proxy) make it to Apache/PHP/MySQL.
First of all, your CSS/Javascript/Images -- well, everything that is static -- probably don't need to be always served by Apache
So, you can have the reverse proxy cache all those.
Serving those static files is no big deal for Apache, but the less it has to work for those, the more it will be able to do with PHP.
Remember: Apache can only server a finite, limited, number of requests at a time.
Then, have the reverse proxy serve as many PHP-pages as possible from cache: there are probably some pages that don't change that often, and could be served from cache. Instead of using some PHP-based cache, why not let another, lighter, server serve those (and fetch them from the PHP server from time to time, so they are always almost up to date)?
For instance, if you have some RSS feeds (We generally tend to forget those, when trying to optimize for performances) that are requested very often, having them in cache for a couple of minutes could save hundreds/thousands of request to Apache+PHP+MySQL!
Same for the most visited pages of your site, if they don't change for at least a couple of minutes (example: homepage?), then, no need to waste CPU re-generating them each time a user requests them.
Maybe there is a difference between pages served for anonymous users (the same page for all anonymous users) and pages served for identified users ("Hello Mr X, you have new messages", for instance)?
If so, you can probably configure the reverse proxy to cache the page that is served for anonymous users (based on a cookie, like the session cookie, typically)
It'll mean that Apache+PHP has less to deal with: only identified users -- which might be only a small part of your users.
About using a reverse-proxy as cache, for a PHP application, you can, for instance, take a look at Benchmark Results Show 400%-700% Increase In Server Capabilities with APC and Squid Cache.
(Yep, they are using Squid, and I was talking about varnish -- that's just another possibility ^^ Varnish being more recent, but more dedicated to caching)
If you do that well enough, and manage to stop re-generating too many pages again and again, maybe you won't even have to optimize any of your code ;-)
At least, maybe not in any kind of rush... And it's always better to perform optimizations when you are not under too much presure...
As a sidenote: you are saying in the OP:
A site I built with Kohana was slammed with
an enormous amount of traffic yesterday,
This is the kind of sudden situation where a reverse-proxy can literally save the day, if your website can deal with not being up to date by the second:
install it, configure it, let it always -- every normal day -- run:
Configure it to not keep PHP pages in cache; or only for a short duration; this way, you always have up to date data displayed
And, the day you take a slashdot or digg effect:
Configure the reverse proxy to keep PHP pages in cache; or for a longer period of time; maybe your pages will not be up to date by the second, but it will allow your website to survive the digg-effect!
About that, How can I detect and survive being “Slashdotted”? might be an interesting read.
On the PHP side of things:
First of all: are you using a recent version of PHP? There are regularly improvements in speed, with new versions ;-)
For instance, take a look at Benchmark of PHP Branches 3.0 through 5.3-CVS.
Note that performances is quite a good reason to use PHP 5.3 (I've made some benchmarks (in French), and results are great)...
Another pretty good reason being, of course, that PHP 5.2 has reached its end of life, and is not maintained anymore!
Are you using any opcode cache?
I'm thinking about APC - Alternative PHP Cache, for instance (pecl, manual), which is the solution I've seen used the most -- and that is used on all servers on which I've worked.
See also: Slides APC Facebook,
Or Benchmark Results Show 400%-700% Increase In Server Capabilities with APC and Squid Cache.
It can really lower the CPU-load of a server a lot, in some cases (I've seen CPU-load on some servers go from 80% to 40%, just by installing APC and activating it's opcode-cache functionality!)
Basically, execution of a PHP script goes in two steps:
Compilation of the PHP source-code to opcodes (kind of an equivalent of JAVA's bytecode)
Execution of those opcodes
APC keeps those in memory, so there is less work to be done each time a PHP script/file is executed: only fetch the opcodes from RAM, and execute them.
You might need to take a look at APC's configuration options, by the way
there are quite a few of those, and some can have a great impact on both speed / CPU-load / ease of use for you
For instance, disabling [apc.stat](https://php.net/manual/en/apc.configuration.php#ini.apc.stat) can be good for system-load; but it means modifications made to PHP files won't be take into account unless you flush the whole opcode-cache; about that, for more details, see for instance To stat() Or Not To stat()?
Using cache for data
As much as possible, it is better to avoid doing the same thing over and over again.
The main thing I'm thinking about is, of course, SQL Queries: many of your pages probably do the same queries, and the results of some of those is probably almost always the same... Which means lots of "useless" queries made to the database, which has to spend time serving the same data over and over again.
Of course, this is true for other stuff, like Web Services calls, fetching information from other websites, heavy calculations, ...
It might be very interesting for you to identify:
Which queries are run lots of times, always returning the same data
Which other (heavy) calculations are done lots of time, always returning the same result
And store these data/results in some kind of cache, so they are easier to get -- faster -- and you don't have to go to your SQL server for "nothing".
Great caching mechanisms are, for instance:
APC: in addition to the opcode-cache I talked about earlier, it allows you to store data in memory,
And/or memcached (see also), which is very useful if you literally have lots of data and/or are using multiple servers, as it is distributed.
of course, you can think about files; and probably many other ideas.
I'm pretty sure your framework comes with some cache-related stuff; you probably already know that, as you said "I will be using the Cache-library more in time to come" in the OP ;-)
Profiling
Now, a nice thing to do would be to use the Xdebug extension to profile your application: it often allows to find a couple of weak-spots quite easily -- at least, if there is any function that takes lots of time.
Configured properly, it will generate profiling files that can be analysed with some graphic tools, such as:
KCachegrind: my favorite, but works only on Linux/KDE
Wincachegrind for windows; it does a bit less stuff than KCacheGrind, unfortunately -- it doesn't display callgraphs, typically.
Webgrind which runs on a PHP webserver, so works anywhere -- but probably has less features.
For instance, here are a couple screenshots of KCacheGrind:
(source: pascal-martin.fr)
(source: pascal-martin.fr)
(BTW, the callgraph presented on the second screenshot is typically something neither WinCacheGrind nor Webgrind can do, if I remember correctly ^^ )
(Thanks #Mikushi for the comment) Another possibility that I haven't used much is the the xhprof extension : it also helps with profiling, can generate callgraphs -- but is lighter than Xdebug, which mean you should be able to install it on a production server.
You should be able to use it alonside XHGui, which will help for the visualisation of data.
On the SQL side of things:
Now that we've spoken a bit about PHP, note that it is more than possible that your bottleneck isn't the PHP-side of things, but the database one...
At least two or three things, here:
You should determine:
What are the most frequent queries your application is doing
Whether those are optimized (using the right indexes, mainly?), using the EXPLAIN instruction, if you are using MySQL
See also: Optimizing SELECT and Other Statements
You can, for instance, activate log_slow_queries to get a list of the requests that take "too much" time, and start your optimization by those.
whether you could cache some of these queries (see what I said earlier)
Is your MySQL well configured? I don't know much about that, but there are some configuration options that might have some impact.
Optimizing the MySQL Server might give you some interesting informations about that.
Still, the two most important things are:
Don't go to the DB if you don't need to: cache as much as you can!
When you have to go to the DB, use efficient queries: use indexes; and profile!
And what now?
If you are still reading, what else could be optimized?
Well, there is still room for improvements... A couple of architecture-oriented ideas might be:
Switch to an n-tier architecture:
Put MySQL on another server (2-tier: one for PHP; the other for MySQL)
Use several PHP servers (and load-balance the users between those)
Use another machines for static files, with a lighter webserver, like:
lighttpd
or nginx -- this one is becoming more and more popular, btw.
Use several servers for MySQL, several servers for PHP, and several reverse-proxies in front of those
Of course: install memcached daemons on whatever server has any amount of free RAM, and use them to cache as much as you can / makes sense.
Use something "more efficient" that Apache?
I hear more and more often about nginx, which is supposed to be great when it comes to PHP and high-volume websites; I've never used it myself, but you might find some interesting articles about it on the net;
for instance, PHP performance III -- Running nginx.
See also: PHP-FPM - FastCGI Process Manager, which is bundled with PHP >= 5.3.3, and does wonders with nginx.
Well, maybe some of those ideas are a bit overkill in your situation ^^
But, still... Why not study them a bit, just in case ? ;-)
And what about Kohana?
Your initial question was about optimizing an application that uses Kohana... Well, I've posted some ideas that are true for any PHP application... Which means they are true for Kohana too ;-)
(Even if not specific to it ^^)
I said: use cache; Kohana seems to support some caching stuff (You talked about it yourself, so nothing new here...)
If there is anything that can be done quickly, try it ;-)
I also said you shouldn't do anything that's not necessary; is there anything enabled by default in Kohana that you don't need?
Browsing the net, it seems there is at least something about XSS filtering; do you need that?
Still, here's a couple of links that might be useful:
Kohana General Discussion: Caching?
Community Support: Web Site Optimization: Maximum Website Performance using Kohana
Conclusion?
And, to conclude, a simple thought:
How much will it cost your company to pay you 5 days? -- considering it is a reasonable amount of time to do some great optimizations
How much will it cost your company to buy (pay for?) a second server, and its maintenance?
What if you have to scale larger?
How much will it cost to spend 10 days? more? optimizing every possible bit of your application?
And how much for a couple more servers?
I'm not saying you shouldn't optimize: you definitely should!
But go for "quick" optimizations that will get you big rewards first: using some opcode cache might help you get between 10 and 50 percent off your server's CPU-load... And it takes only a couple of minutes to set up ;-) On the other side, spending 3 days for 2 percent...
Oh, and, btw: before doing anything: put some monitoring stuff in place, so you know what improvements have been made, and how!
Without monitoring, you will have no idea of the effect of what you did... Not even if it's a real optimization or not!
For instance, you could use something like RRDtool + cacti.
And showing your boss some nice graphics with a 40% CPU-load drop is always great ;-)
Anyway, and to really conclude: have fun!
(Yes, optimizing is fun!)
(Ergh, I didn't think I would write that much... Hope at least some parts of this are useful... And I should remember this answer: might be useful some other times...)
Use XDebug and WinCacheGrind or WebCacheGrind to profile and analyze slow code execution.
(source: jokke.dk)
Profile code with XDebug.
Use a lot of caching. If your pages are relatively static, then reverse proxy might be the best way to do it.
Kohana is out of the box very very fast, except for the use of database objects. To quote Zombor "You can reduce memory usage by ensuring you are using the database result object instead of result arrays." This makes a HUGEE performance difference on a site that is being slammed. Not only does it use more memory, it slows down execution of scripts.
Also - you must use caching. I prefer memcache and use it in my models like this:
public function get($e_id)
{
$event_data = $this->cache->get('event_get_'.$e_id.Kohana::config('config.site_domain'));
if ($event_data === NULL)
{
$this->db_slave
->select('e_id,e_name')
->from('Events')
->where('e_id', $e_id);
$result = $this->db_slave->get();
$event_data = ($result->count() ==1)? $result->current() : FALSE;
$this->cache->set('event_get_'.$e_id.Kohana::config('config.site_domain'), $event_data, NULL, 300); // 5 minutes
}
return $event_data;
}
This will also dramatically increase performance. The above two techniques improved a sites performance by 80%.
If you gave some more information about where you think the bottleneck is, I'm sure we could give some better ideas.
Also check out yslow (google it) for some other performance tips.
Strictly related to Kohana (you probably already have done this, or not):
In production mode:
Enable internal caching (this will only cache the Kohana::find_file results, but this actually can help a lot.
Disable profiler
Just my 2 cents :)
I totally agree with the XDebug and caching answers. Don't look into the Kohana layer for optimization until you've identified your biggest speed and scale bottlenecks.
XDebug will tell you were you spend the most of your time and identify 'hotspots' in your code. Keep this profiling information so you can baseline and measure performance improvements.
Example problem and solution:
If you find that you're building up expensive objects from the database each time, that don't really change often, then you can look at caching them with memcached or another mechanism. All of these performance fixes take time and add complexity to your system, so be sure of your bottlenecks before you start fixing them.

Categories