I'm working on some old(ish) software in PHP that maintains a $cache array to reduce the number of SQL queries. I was thinking of just putting memcached in its place and I'm wondering whether or not to get rid of the internal caching. Would there still be a worthwhile performance increase if I keep the internal caching, or would memcached suffice?
According to this blog post, the PHP internal array is way faster than any other method:
Cache Type                          Cache Gets/sec
Array Cache                                365,000
APC Cache                                   98,000
File Cache                                  27,000
Memcached Cache (TCP/IP)                    12,200
MySQL Query Cache (TCP/IP)                   9,900
MySQL Query Cache (Unix Socket)             13,500
Selecting from table (TCP/IP)                5,100
Selecting from table (Unix Socket)           7,400
You might expect memcached (a compiled C daemon, close to the metal) to beat a caching scheme written in interpreted PHP, but the numbers above show the in-process array winning, since it avoids serialization and any network round trip.
However: if it's not broken, don't fix it.
If you remove the custom caching code, you might have to deal with other code that depends on the cache. I can't speak for the quality of the code you have to maintain but it seems like one of those "probably not worth it" things.
Let me put it this way: Do you trust the original developer(s) to have written code that will still work if you rip out the caching? (I probably wouldn't)
So unless the existing caching is giving you problems I would recommend against taking it out.
There's an advantage in using memcache vs local caching if:
1) you have multiple web servers running off the same database, and have memcache set up to run across multiple nodes
2) the database does not implement query result caching or is very slow to access
Otherwise, unless the caching code is very poor, you shouldn't expect to see much performance benefit.
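If you do end up layering memcached behind the existing array cache rather than replacing it, the combination can be sketched roughly like this. The `Memcached` class and PDO calls are the real APIs; the key scheme, server address, and TTL are invented for the example:

```php
<?php
// Two-tier lookup: keep the per-request PHP array in front of memcached.
// Assumes the pecl/memcached extension and a server on localhost:11211.
function cached_query(PDO $db, Memcached $mc, array &$cache, string $sql)
{
    $key = 'q:' . md5($sql);
    if (isset($cache[$key])) {          // tier 1: free, lives for this request only
        return $cache[$key];
    }
    $rows = $mc->get($key);             // tier 2: shared across web servers
    if ($mc->getResultCode() !== Memcached::RES_SUCCESS) {
        $rows = $db->query($sql)->fetchAll(PDO::FETCH_ASSOC);
        $mc->set($key, $rows, 300);     // 5-minute TTL, an arbitrary choice
    }
    return $cache[$key] = $rows;
}
```

That way the array keeps absorbing repeat lookups within one request, and memcached only pays off across requests and across servers.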
HTH
C.
Related
Whenever I needed to cache some information I relied on timestamps and MySQL, storing the data into a database and fetching it that way. I just read about APC.
APC is so much easier but is it worth converting my previous cache methods to switch to APC besides just less SQL's going through and cleaner code?
If you already have a database running and doing most of your work, the first step to improve your performance is to properly tune the database. MySQL, properly configured, is very fast.
Obviously at some point it isn't fast enough anymore and one needs further caches. When caching, one thing to consider is that your data might no longer be consistent: you might update data in your primary store (the database) while others still read an outdated cache entry.
Now, you've mentioned APC as a possible solution. APC is two related but different things:
An opcode cache for PHP scripts
A shared memory cache for PHP user data
An opcode cache works by storing the compiled PHP script in memory. So when requesting a site the PHP interpreter doesn't have to read the file from disk and analyze the code but can directly execute it. This gives a major boost and is always a good thing.
A shared memory cache takes any PHP variable (well, there are a few exceptions ...) and stores it in shared memory on the system, so all PHP processes on the same machine can read it. If you store the result of a database query inside APC you save time, as access to shared memory is very fast compared to querying a database (sending the query to a different machine, parsing it, executing it, sending the result back ...). But, as said in the beginning, you have to mind that the data might be outdated. Also mind that all data is stored in memory, so depending on the amount of available RAM there are limits on what can be stored. Another big downside is that the data lives in memory only: whenever the system goes down the cache will be empty and everything in it will be lost.
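A rough sketch of that user-data side, assuming a PDO handle in `$db` (the `apc_fetch`/`apc_store` functions are the real API; the key name, query, and TTL are made up):

```php
<?php
// Cache a query result in APC's shared memory for 60 seconds.
$users = apc_fetch('user_list', $hit);
if (!$hit) {
    $users = $db->query('SELECT id, name FROM users')->fetchAll();
    apc_store('user_list', $users, 60);  // the TTL limits how stale the data can get
}
```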
To answer the question literally: yes. MySQL is not a cache, APC is, and thus is better.
MySQL is a storage option you can implement a cache on top of, but you are the one implementing the cache, with those timestamps you mention and whatever logic you apply to them. APC is a complete implementation of a cache, both for data and for code.
Performance-wise, accessing the local APC cache will always be vastly faster than accessing a MySQL database. The keyword there is local: APC is not distributed (as far as I know), so if you want to share your cache, you'll need an external cache system such as memcached.
Generally, APC will be much, much faster than MySQL, so it's well worth the time to look into it and consider switching from one system to the other. And, as you mention, you will be firing less SQL queries to the database.
More information can be found via Google, I came across the following:
http://www.mysqlperformanceblog.com/2006/08/09/cache-performance-comparison/
I'm primarily wondering what the speed difference is in accessing the object cache of APC vs. memcached (NOT the opcode cache). The primary advantage of memcached is that it is distributed and not restricted to the local machine. However, since it goes over the network, there is some latency involved.
I was wondering whether the speed difference between accessing APC (on the machine) and memcached (on another server) is big enough to warrant having a staged caching scheme, where the program first tries APC, then memcached, and finally the database if all else fails.
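The staged lookup I have in mind would be roughly this (the `apc_*` functions and the `Memcached` class are real APIs; the TTLs and the loader callback are placeholders):

```php
<?php
// Staged lookup: APC first, then memcached, then the database loader.
function staged_get(string $key, Memcached $mc, callable $load)
{
    $val = apc_fetch($key, $hit);                       // stage 1: local shared memory
    if ($hit) {
        return $val;
    }
    $val = $mc->get($key);                              // stage 2: over the network
    if ($mc->getResultCode() === Memcached::RES_SUCCESS) {
        apc_store($key, $val, 30);                      // backfill the faster tier
        return $val;
    }
    $val = $load();                                     // stage 3: hit the database
    $mc->set($key, $val, 300);
    apc_store($key, $val, 30);
    return $val;
}
```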
Like most everything else: it depends.
If you have a lot of calculations and can store the results then caching will speed things up. If you're just basically storing rows from the database then in memory caching will help but memcached may not add a huge amount of difference vs. a database (assuming the db queries are all simple). On the other hand if you're doing complex queries, or a lot of programmatic work to create something, then caching makes much more sense.
To give you an example, I recently worked on a site that was written by a 3rd party contractor who did not do any performance work during design. It was slow as an ox because it had a lot of unoptimized includes and such. Adding APC improved the performance by roughly 10x. Adding memcached decreased load times by a further 10-20 ms.
If you're far enough along then do some performance testing (look up xdebug, or another tool) and see where your bottlenecks are, then plan accordingly.
Keep in mind that if you fill up your APC cache with other things, APC will have to re-compile the opcode for your pages again. This can cause thrashing: pages keep evicting objects, then once a page runs, its objects evict pages again. Not fun.
Just be safe and don't be tempted to use APC's user cache for anything but config values, which won't cause your pages to be evicted to make space.
TL;DR Once APC gets full your site will slow down and your server will work much harder.
I use MySQL for my primary database, where I keep the actual objects. When an object is rendered using a template, rendering takes a lot of time.
Because of that I've decided to cache the produced HTML. Right now I store the cache in files, named appropriately, and it does work significantly faster. I am however aware that it is not the best way to do so.
I need a (preferably key-value) database to store my cache in. I cannot use a caching proxy because I still need to process the cached HTML. Is there such a database with a PHP front end?
Edit: If I use memcached, and I cache about a million pages, won't I run out of RAM?
Edit 2: And again, I have a lot of HTML to cache (gigabytes of it).
If I use memcached, and I cache about a million pages, won't I run out of RAM?
Memcached
memcached is also a rock-solid product (I like Redis more, though) used at all the big sites to keep them up and running. Almost all active tweets (the ones users fetch) are stored in memcached for insane performance.
If you want to be fast, your active dataset should be in memory. But if the dataset is bigger than your available memory, you should store the data in a persistent datastore such as MySQL anyway, because memcached is volatile. When an item is not available in memory, you fetch it from the datastore and cache it in memcached for future reference (with an expiry time).
Redis
I really like Redis because it is an advanced key-value store with insane performance:
Redis is an advanced key-value store. It is similar to memcached but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists, sets, and ordered sets. All these data types can be manipulated with atomic operations to push/pop elements, add/remove elements, perform server side union, intersection, difference between sets, and so forth. Redis supports different kinds of sorting abilities.
Redis has virtual memory, so you don't need a separate persistent datastore. I really like Redis because of all the available commands (power :)?). This tutorial by Simon Willison shows (a lot of) the raw power Redis has.
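A short sketch of those richer data types through the phpredis extension (the `Redis` class methods are real; the keys and values are invented):

```php
<?php
// Assumes the phpredis extension and a server on localhost:6379.
$r = new Redis();
$r->connect('127.0.0.1', 6379);

$r->setex('page:home', 300, '<html>...</html>');    // plain key-value with a 5-minute expiry

$r->lPush('recent', 'item42');                      // lists: atomic push/pop
$r->lTrim('recent', 0, 99);                         // keep only the newest 100 entries

$r->sAdd('tagged:php', 'post1', 'post2');           // sets: server-side intersection
$r->sAdd('tagged:cache', 'post2', 'post3');
$common = $r->sInter('tagged:php', 'tagged:cache'); // just 'post2'
```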
Speed
Redis is pretty fast: 110,000 SETs/second and 81,000 GETs/second on an entry-level Linux box. Check the benchmarks.
Commits
Redis is more actively developed: antirez (Redis) committed something 8 hours ago, versus memcached's latest commit on 12 November.
Install Redis
Redis is insanely easy to install. It has no dependencies. To compile and start Redis you only have to run:
make
./redis-server redis.conf  # start redis
Awesome :)?
Install Memcached
Memcached has a dependency (libevent), which makes it more difficult to install.
wget http://memcached.org/latest
tar -zxvf memcached-1.x.x.tar.gz
cd memcached-1.x.x
./configure
make && make test
sudo make install
Not totally true, because memcached has the libevent dependency and ./configure will fail if libevent is missing. But then again there are packages, which are cool, but require root to install.
Redis is pretty fast: 110,000 SETs/second
If speed is a concern, why use the network layer?
According to: http://tokutek.com/downloads/mysqluc-2010-fractal-trees.pdf
InnoDB inserts ....................43,000 records per second AT ITS PEAK*;
TokuDB inserts ....................34,000 records per second AT ITS PEAK*;
G-WAN KV inserts ....100,000,000 records per second
(*) after a few thousand inserts, performance degrades severely for InnoDB and TokuDB, which end up writing to disk when their cache, the system cache, and the disk controller cache are full. See the PDF for an interesting discussion of the problems caused by the topology of the InnoDB database index (which severely breaks locality, while the Fractal tree topology scales much better... but still not linearly).
To clarify the answers into logical views:
Flat files are as fast as the storage medium being used (disk or RAM)
An environment which caches the MRU (most recently used) items in RAM
A solution with a smart/fast hash index to all locations (what SQL systems rely on)
That combination will get you the best solution you are looking for.
For argument's sake, flat file or not - excluding a memory-only solution - all engines use some form of flat file. The magic is knowing where your data is and tuning reads to pull the data back most optimally. In the '80s at IBM we used a fixed-record-length flat-file design, which wasn't optimized for disk space; it was optimized for I/O. Indexes then were based on record length * ROWID.
Now to your need: your ultimate performance for scale comes from introducing a smart combination. We host over 1 million companies, with over 10 pages per company - 10 million files, plus JS, CSS and images.
Theory 1) - You know your limitation is RAM - spool dynamic content to disk when feasible and drop such features as hit counters. Leverage NGINX or HIGHLY tune APACHE (or as we did, wrote our own web servers since 2001) - the whole concept is leverage RAM for the MOST USED, and have a very intelligent lookup for disk based content - normally the URI is fine.
Theory 2) - Trend Analysis and User Anticipation - I have spent years researching and developing systems that track trends. If I know a user will go path A, B, C, D - then when he hits B, I have already prefetched C and D. If I know a user will go A, B but may go E then D. You have the choice to pre-cache C and E, or for RAM sake prefetch D. and manually fetch C or E when the user picks that.
The web server we have developed, along with some accounting systems I have developed over the years, integrates Theory 2 to prefetch, with combinations of smart caching. We also store the content on disk deflated, so the transport layer simply pumps the content onto the stack, as 99% of browsers support deflated streams. (It's faster to inflate before sending for that 1% than to deflate on the fly 99% of the time.)
Per the thought of MEMCACHED and SWAP: disk speed is your enemy; however, tying up the kernel to manage that enemy is an epic fail! If you want to beat MEMCACHED performance, learn how to set up a RAM DISK and keep your deflated HOT requested items there!
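Setting up such a RAM disk on Linux is basically one tmpfs mount (the mount point, size, and cache path below are arbitrary; requires root):

```shell
# Mount a 512 MB RAM-backed filesystem; files under it live entirely in memory
mkdir -p /mnt/ramcache
mount -t tmpfs -o size=512m tmpfs /mnt/ramcache
# keep the pre-deflated hot pages there, e.g.:
cp /var/cache/site/*.html.gz /mnt/ramcache/
```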
** DISCLAIMER: This all assumes that you have enough bandwidth that your Infrastructure/Users bandwidth is not your bottleneck, but your servers are. #3FINC
http://memcached.org/ + http://php.net/manual/en/book.memcache.php
Flat files are "technically" the fastest - but if you're looking for something with a PHP front end and just screams - take a look at postgres.
http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL#Raw_Speed
For memory caching look at memcached
http://memcached.org/
*Edit: from your edit ... (redundant yes) ... if you cache that volume in memory you will have issues. Look into postgres columnar table queries or a quasi-custom flat file solution.
As far as I know, using the file system is actually the fastest way to cache rendered templates without resorting to storing them in memory. Any database would simply add overhead and would make the whole thing slower by comparison.
I would use memcached or APC, depending on whether you need caching shared between servers. Memcached is a daemon you connect to, whereas APC lives inside the PHP instance (a little faster). Both store the cache in memory, so they're blazing fast.
In fact, storing the cache in files is really the fastest way to do this. But if you're really interested in putting the pages into a database, check out MongoDB. MongoDB is a document-oriented database, so there are no server-side joins, which is why it's faster than MySQL (1. with PHP; 2. there are a lot of benchmarks on the internet).
I'm running a php/mysql-driven website with a lot of visits and I'm considering the possibility of caching result-sets in shared memory in order to reduce database load.
However, right now MySQL's query cache is enabled, and it seems to be doing a pretty good job, since if I disable query caching, CPU usage jumps to 100% immediately.
Given that situation, I don't know if caching result-sets (or even the generated HTML code) locally in shared memory with PHP will result in any noticeable performance improvement.
Does anyone out there have any experience on this matter?
PS: Please avoid suggesting heavy-artillery solutions like memcached. Right now I'm looking for simple solutions that don't require too much time to implement, deploy and maintain.
Edit:
I see my comment about memcached deviated answers from the actual point, which is whether caching DB queries in the application layer would result in a noticeable performance impact, considering that the results of those queries are already being cached at the DB level.
I know you didn't want to hear about memcached, but it is one of the best solutions for what you're trying to do. Depending on your site usage, there can be massive improvements in performance. By simply using memcached's session handler over my database session handler, I was able to cut the load in half and cut back on request serving times by over 30%.
Realistically, memcached is a simple solution. It's already integrated with PHP (if you have the extension loaded), and it requires virtually no configuration (I simply had to add memcached as a service on my linux box, which is done in one or two shell commands).
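The session-handler switch mentioned above really is just configuration once the extension is loaded (the server address is an assumption):

```ini
; php.ini - the pecl/memcached extension registers this save handler
session.save_handler = memcached
session.save_path = "127.0.0.1:11211"
```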
I would suggest storing session data (and anything that lends itself to caching) in memcache. For dynamic pages (such as stack overflow homepage), I would recommend caching output for a couple of seconds to prevent flooding.
A decent single box solution is file-based caching, but you have to sweep them out manually. Other than that, you could use APC, which is very fast and in-memory (still have to expire them yourself though).
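A minimal version of that file-based cache with the manual sweep (the directory layout, hashed filenames, and TTL are arbitrary choices for the sketch):

```php
<?php
// Tiny file cache: store under a hashed name, expire by file mtime.
function fcache_set(string $dir, string $key, string $html): void
{
    file_put_contents($dir . '/' . md5($key) . '.cache', $html);
}

function fcache_get(string $dir, string $key, int $ttl)
{
    $f = $dir . '/' . md5($key) . '.cache';
    if (!is_file($f) || filemtime($f) < time() - $ttl) {
        return false;                       // missing or expired
    }
    return file_get_contents($f);
}

function fcache_sweep(string $dir, int $ttl): void
{
    // the manual sweep, e.g. run from cron
    foreach (glob($dir . '/*.cache') as $f) {
        if (filemtime($f) < time() - $ttl) {
            unlink($f);
        }
    }
}
```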
As soon as you scale past one web server, though, you're going to need a shared cache, which is memcached. Why are you so adamant about not deploying this? It's not hard, and it's just going to save you time down the road. You can either start using memcache now and be done with it, or you could use one of the above methods for now and then end up switching to memcache later anyways, resulting in even more work. Plus too, you don't have to deal with running a cronjob or some other ugly hack to get cache expiration features: it does that for you.
The mysql query cache is nice, but it's not without issues. One of the big ones is it expires automatically every time the source data is changed, which you probably don't want.
I am using memcache for caching objects, but would like to add an opcode accelerator like APC on top. Since they both involve caching, I am not sure if they will be "stepping on each other's toes", i.e. I am not sure if memcache is already an opcode accelerator.
Can someone clarify? I would like to use them both, but for different things: memcache for caching my objects and APC for code acceleration.
Memcache is more along the lines of a distributed object cache, versus something like APC or XCache, which store PHP bytecode in memory so you avoid having to parse it each time. Their main purposes are different.
For example, if you had a very CPU intensive database query that people often requested, you could cache the resulting object in memcache and then refer to it instead of re-running that query all the time.
APC & XCache do have similar object caching features, but you are limited to the host machine. What if you wanted 10 different servers to all have access to that one object without having to re-do the query for each server? You'd just direct them to your memcache server and away you go. You still get a benefit if you only have a single server because using memcache will help you scale in the future if you need to branch out to more boxes.
The main thing to consider is if you think your app is going to need to scale. Memcache has more overhead since you have to use a TCP connection to access it, versus just a function call for APC/Xcache shared objects.
However, Memcache has the following benefits:
Faster than the disk or re-running query.
Scales to multiple servers.
Works with many different languages, your objects are not locked into PHP + APC/Xcache only.
All processes/languages have access to the same objects, so you don't have to worry if your PHP child processes have an empty object cache or not. This may not be as big a deal if you're running PHP-FPM though.
In most cases, I would recommend caching your objects in memcache as it's not much harder & is more flexible for the future.
Keep in mind that this is only regarding caching objects. Memcache does NOT have any bytecode or PHP acceleration features, which is why I would run it side-by-side with APC or XCache.
Yes, you can use them both together at the same time.