We currently run a site that makes a lot of API calls to our parent site for user details and other data. We are planning to cache those details on our side, and I am considering memcache for this. This is a live site and we expect heavier traffic in the coming days (not Facebook-level, but then my server isn't like theirs either ;) ), so I'd like your opinion on what issues we could face with memcache, and counter-arguments for why we shouldn't go with it. Suggestions for alternatives are also welcome.
https://github.com/steveyen/community-site/blob/master/db_doc/main/WhyNotMemcached.wiki
Memcached is terrific! But not for every situation...
You have objects larger than 1MB.
Memcached is not for large media and streaming huge blobs.
Consider other solutions like: http://www.danga.com/mogilefs
You have keys larger than 250 chars.
If so, perhaps you're doing something wrong?
And, see this mailing list conversation on key size for suggestions.
Your hosting provider won't let you run memcached.
If you're on a low-end virtual private server (a slice of a machine), virtualization tech like vmware or xen might not be a great place to run memcached. Memcached really wants to take over and control a hunk of memory -- if that memory gets swapped out by the OS or hypervisor, performance goes away. Using virtualization, though, just to ease deployment across dedicated boxes is fine.
You're running in an insecure environment.
Remember, anyone can just telnet to any memcached server. If you're on a shared system, watch out!
You want persistence. Or, a database.
If you really just wish that memcached had a SQL interface, then you probably need to rethink your understanding of caching and memcached.
You should implement a generic caching layer for the API calls first. Inside that layer you can then change which backend you want to use. If you later find that memcache is not a good fit, you can actually switch (and/or monitor in a test how it performs compared with other backends).
Even better, you can initially build this on top of the filesystem quite easily, without the hurdle of relying on another daemon, and get started with caching right away -- quite possibly the filesystem is already enough for your caching needs?
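Something along these lines, for example -- a minimal sketch of such a swappable layer (the interface/class names, cache directory and parent-site URL below are all made up for illustration):

```php
<?php
// Hypothetical sketch of a swappable caching layer. Calling code only talks
// to CacheBackend, so the storage engine can be replaced later.
interface CacheBackend {
    public function get($key);                      // returns false on a miss
    public function set($key, $value, $ttl = 300);
}

class FileCache implements CacheBackend {
    private $dir;
    public function __construct($dir) {
        $this->dir = rtrim($dir, '/');
        if (!is_dir($this->dir)) {
            mkdir($this->dir, 0777, true);
        }
    }
    private function path($key) {
        return $this->dir . '/' . md5($key) . '.cache';
    }
    public function get($key) {
        $file = $this->path($key);
        if (!is_file($file)) {
            return false;
        }
        $entry = unserialize(file_get_contents($file));
        if (!is_array($entry) || $entry['expires'] < time()) {
            return false;
        }
        return $entry['value'];
    }
    public function set($key, $value, $ttl = 300) {
        $entry = array('expires' => time() + $ttl, 'value' => $value);
        file_put_contents($this->path($key), serialize($entry), LOCK_EX);
    }
}

class MemcachedCache implements CacheBackend {
    private $mc;
    public function __construct(Memcached $mc) { $this->mc = $mc; }
    public function get($key) { return $this->mc->get($key); }
    public function set($key, $value, $ttl = 300) { $this->mc->set($key, $value, $ttl); }
}

// Usage: cache one parent-site API call for five minutes.
$cache = new FileCache('/tmp/api-cache');           // later: new MemcachedCache($memcached)
$user = $cache->get('user:123');
if ($user === false) {
    $user = json_decode(file_get_contents('https://parent.example.com/api/user/123'), true);
    $cache->set('user:123', $user, 300);
}
```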
Memcache is fast, but it can also use a lot of memory if you want to get the most out of it. Whenever you hit the disk for I/O, you're increasing the latency of your application, so pull the items that are frequently accessed and put them in memcache. For my large-scale deployments, we cache sessions there, because both the database and filesystem session storage are slow.
A recommendation to add to your stack is APC. It caches compiled PHP opcodes, which reduces the CPU and memory overhead per page.
Alternative: Redis
Memcached is, obviously, limited by your available memory and will start to jettison data when memory thresholds are reached. You may want to look at Redis, which is as fast as memcached (faster in some benchmarks) but allows both volatile and non-volatile keys, more complex data structures, and the option of using virtual memory to push least-recently-used (LRU) key values to disk.
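For illustration, a rough sketch of those features using the phpredis extension (assumes a Redis server on localhost; keys and values are invented):

```php
<?php
// phpredis client talking to a local Redis server.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// Volatile key: a cached API response that expires after five minutes.
$profile = array('name' => 'Alice', 'plan' => 'pro');
$redis->setex('user:123:profile', 300, json_encode($profile));

// Non-volatile key with a richer data structure: a persistent "recently viewed" list.
$redis->lPush('user:123:recent_pages', '/dashboard');
$redis->lTrim('user:123:recent_pages', 0, 49);     // keep only the 50 most recent entries
```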
I have been using PHP APC on production servers of a web service with tens of millions of hits/day, successfully, for a long time.
I'm considering offloading much more data to the APC local cache.
Theoretically, it looks to me like an APC call is mainly a local memory access, so it should not become an issue to call it tens of thousands of times per second. As far as I can tell, its limit is memory size, but as long as the server has free CPU it should not have performance or corruption issues at high rates.
Is there any limit I'm not aware of that might prevent me from using APC's local object cache at a very high rate on an app server (Ubuntu)?
Update:
Apparently, judging by the answers below, my question wasn't clear. I'm not looking for alternative caching options (memcache, Redis, etc.); my question is whether there is any concern or limit in using local APC at very high rates and with high read concurrency.
I'm personally a big fan of using memcached for this kind of storage. It has several advantages:
It's a program that focuses entirely on the storage, and development of memcached will always focus on that. APC is primarily a cache for code, which just happens to offer some access to user storage.
When you reload or restart Apache (or whatever webserver you use), APC's cache gets emptied. When you use a standalone solution such as memcached, you can control when the cache gets emptied. This is really something that was very important in my case, as I sometimes have to make changes to Apache's configuration and really don't want to clear the cache when I do, as it creates a large CPU spike (loading data into the cache again).
It has the possibility to create a distributed cache, making it more scalable. When you have to add a second server because your website becomes large, you don't want two caches which cache the same stuff. memcached scales well, while APC's cache doesn't.
There are many other advantages of using memcached over APC's user cache, but for me these were the primary three reasons to not use APC's user cache. I do use APC, of course, just not the user cache.
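To make the first two points concrete, here is a rough sketch (the key, values and host name are made up): the APC entry lives only in the current web server's memory and is emptied on an Apache reload, while the memcached entry is shared by the whole pool and survives web-server restarts.

```php
<?php
// APC user cache: local to this one web server, emptied whenever Apache reloads.
if (($settings = apc_fetch('site_settings')) === false) {
    $settings = array('theme' => 'dark', 'per_page' => 25);   // stand-in for an expensive lookup
    apc_store('site_settings', $settings, 600);               // cache for 10 minutes
}

// memcached: one cache shared by every web server, unaffected by Apache restarts.
$mc = new Memcached();
$mc->addServer('cache1.internal', 11211);                     // placeholder host
if (($settings = $mc->get('site_settings')) === false) {
    $settings = array('theme' => 'dark', 'per_page' => 25);
    $mc->set('site_settings', $settings, 600);
}
```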
Here's a little background. Currently I have:
3 web servers
one db server which also hosts memcache for PHP sessions for the 3 web servers.
I have the PHP configs on the 3 servers pointing to the memcache server for sessions. It was working fine until a lot of connections were being produced for reads etc., which then caused connection timeouts.
So I'm currently looking at clustering the memcache on each web server for sessions, my only concern is how to go about making sure that memcache on all the servers have the same information for sessions.
Someone pointed me to http://github.com/trs21219/Memcached-Library because I am using CodeIgniter, but how do I move my PHP sessions onto this, since memcache seems to be just a key-value store? Thanks in advance.
Has anyone checked out http://repcached.sourceforge.net/ and does it work?
I'm not sure you have the same expectations of memcache that its designers had.
First, however, memcache distribution works differently than you expect: there is no mechanism to replicate stored information. Each memcache instance is a simple key-value store, as you've noticed. The distribution is done by the client code which has a list of all configured memcache instances and does a hash of the key to direct it to one of the instances. It is possible for the client to store it everywhere and retrieve it locally, or for it to hash it multiple times for redundancy, but these are not straightforward exercises.
But the other issue is that memcache is designed for reasonably short-lived data that memcache is allowed to throw away at any time. This makes it really good for caching frequently accessed data that can be a little stale (say up to a few minutes old) but might be expensive to retrieve (such as almost a minute to generate from a query).
PHP sessions don't really qualify for this, in my experience. A database can easily support many thousands of PHP sessions with barely visible traffic, but you need a lot of memcache storage to support the same number: 50 KB per session and 5,000 sessions means close to 256 MB, and then there is all the other data you want to put in there. Not enough storage and you get lots of unexplained logouts (as memcache discards session data when under memory pressure) and thus lots of annoyed users who have to keep logging in again.
We've found GREAT advantage applying MongoDB instead of MySQL for most things, including session handling. It's far faster, far smaller, far easier. We keep MySQL around for transactional needs, but everything else goes into Mongo now. We've relegated memcache to simply caching pages and other data that isn't critical if it's there or not, something like smarty does.
There is no need to use any third-party libraries to organize a memcached "cluster".
http://ru.php.net/manual/en/memcached.addserver.php
Just use this function to add several servers to the pool; after that, data will be stored and distributed across those servers. The server used for storing/retrieving the data for a specific key is selected according to the consistent key-distribution option.
So in this case you don't need to worry about "how to go about making sure that memcache on all the servers have the same information for sessions" -- see the sketch below.
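A minimal sketch of what that looks like, assuming the PECL memcached extension (host names are placeholders). The first part builds a consistent-hashing pool for general data; the second points PHP's session handler at the same servers -- as long as every web server lists the same servers, a given session ID always hashes to the same memcached instance, so there is nothing to keep in sync.

```php
<?php
// One client-side pool, configured identically on every web server.
$mc = new Memcached();
$mc->setOption(Memcached::OPT_DISTRIBUTION, Memcached::DISTRIBUTION_CONSISTENT);
$mc->setOption(Memcached::OPT_LIBKETAMA_COMPATIBLE, true);
$mc->addServers(array(
    array('web1.internal', 11211),
    array('web2.internal', 11211),
    array('web3.internal', 11211),
));

// Store sessions through the extension's session handler, using the same server list.
ini_set('session.save_handler', 'memcached');
ini_set('session.save_path', 'web1.internal:11211,web2.internal:11211,web3.internal:11211');
session_start();
$_SESSION['last_seen'] = time();
```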
My setup:
4 webservers
Static content server (NFS mount)
2 db servers
2 "do magic" servers
An additional 8 machines designated multi-purpose.
I'm writing a wrapper for three caching mechanisms so that they can be used in a somewhat normalized manner: Filesystem, Memcached and APC. I'm trying to come up with examples for use (and what to actually put in each cache).
File System
Handles content that we generate and then serve statically: RSS feeds, old report data, user-specific pages, etc. This is all cached to the static server.
Memcached
PHP session data, MySQL query results, generally things that need to be available across our systems. We have 8 machines that can be included in the server pool.
APC
I have no idea. The two "do magic" servers are not part of any distributed system, so it seems likely that they could cache query results in APC and work from there. Past that, I can't think of anything.
Query Caching
Given the nature of our SQL use, query caching reduces performance. I've disabled this.
In general, what types of data should be stored where? Does this setup even make sense?
Is there any use for an APC data cache in a distributed system (I can't think of one)?
Is there something that I'm missing that would make things easier or more efficient?
Edit: I've figured out what Pascal was saying, finally. I had it stuck in my head that I would only be moving a portion of my config / whatever to APC, and still loading the rest of the file from disk. Any other suggestions?
I'm using the same kind of caching mechanism for some projects, and we are using APC + memcached as caching systems.
There are two or three main differences between APC and memcached when it comes to caching data:
APC access is a bit faster (something like 5 times faster than memcached, if I remember correctly), as it's only local -- i.e. no network involved.
Using APC, your cache is duplicated on each server; using memcached, there is no duplication across servers,
which means that memcached ensures all servers have the same version of the data, while data stored in APC can be different on each server.
We generally use:
APC for data that has to be accessed very often, is quick to generate, and:
either is not modified often,
or doesn't have to be identical on all servers;
memcached for data that takes more time to generate and/or is used less often,
or for data whose modifications must be visible immediately (i.e. when there is a write to the DB, the cached entry is regenerated too).
For instance, we could (see the sketch after this list):
Use APC to store configuration variables, which:
don't change often,
are accessed very often,
are small.
Use memcached for the content of articles (in a CMS application, for example), which are:
not that small, and there are a lot of them, which means they might need more memory than we have on any one server alone;
pretty hard/heavy to generate.
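As a concrete (and entirely hypothetical) sketch of that split -- the config path, key names and the DB helpers below are made up:

```php
<?php
// Hypothetical stand-ins for the real, expensive operations.
function render_article_from_db($id) { return "<h1>Article $id</h1> ..."; }
function write_article_to_db($id, $data) { /* e.g. UPDATE articles SET ... */ }

// Configuration: small, rarely changed, read constantly -> APC, local to each server.
if (($config = apc_fetch('app_config')) === false) {
    $config = parse_ini_file('/etc/myapp/config.ini', true);   // made-up path
    apc_store('app_config', $config, 3600);
}

// Article content: big, heavy to generate, must be consistent -> memcached, shared.
function get_article(Memcached $mc, $id) {
    $article = $mc->get('article:' . $id);
    if ($article === false) {
        $article = render_article_from_db($id);
        $mc->set('article:' . $id, $article, 0);    // 0 = no expiration (still evictable)
    }
    return $article;
}

// On write: update the DB, then regenerate the cached entry right away,
// so every web server sees the new version immediately.
function save_article(Memcached $mc, $id, $data) {
    write_article_to_db($id, $data);
    $mc->set('article:' . $id, render_article_from_db($id), 0);
}
```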
A couple of sidenotes:
If several servers try to write to the same file that's shared via NFS, there can be problems, as there is no locking mechanism on NFS (as far as I remember).
APC can be used to cache data, yes -- but the most important reason to use it is its opcode caching feature (which can save a large amount of CPU on the PHP servers).
Entries in memcached are limited in size: you cannot store an entry bigger than 1 MB (I've sometimes run into that problem -- rarely, but it's not good when it happens ^^ ). A small guard for this follows below.
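For that last sidenote, a tiny defensive sketch (host, key and content are invented): check the serialized size before writing and fall back to another store when the entry is too big.

```php
<?php
// Guard against memcached's (default) 1 MB item limit before trying to store.
$mc = new Memcached();
$mc->addServer('cache1.internal', 11211);            // placeholder host
$articleId = 42;
$article   = str_repeat('lorem ipsum ', 100000);     // ~1.2 MB of pre-rendered content

$payload = serialize($article);
if (strlen($payload) < 1000000) {
    $mc->set('article:' . $articleId, $article, 3600);
} else {
    // Too big for memcached: fall back to a filesystem cache instead (made-up path).
    file_put_contents('/var/cache/articles/' . $articleId . '.ser', $payload, LOCK_EX);
}
```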
Consider a web app in which a call to the app consists of a PHP script running several MySQL queries, some of them memcached.
The PHP does not do a very complex job; it mainly serves the MySQL data with some formatting.
In the past it used to be recommended to put MySQL and the app engine (PHP/Apache) on separate boxes.
However, when the data can be divided horizontally (for example, when there are ten different customers using the service and it is possible to divide the data per customer) and when Nginx + FastCGI is used instead of the heavier Apache, doesn't it make sense to put Nginx, memcache and MySQL on the same box? Then, when more customers come, add similar boxes?
Background: We are moving to Amazon Ec2. And a separate box for MySQL and app server means double EBS volumes (needed on app servers to keep the code persistent as it changes often). Also if something happens to the database box, more customers will fail.
Clarification: Currently the app is running with LAMP on a single server (before moving to EC2).
If your application architecture is already designed to support Nginx and MySQL on separate instances, you may want to host all your services on the same instance until you receive enough traffic that justifies the separation.
In general, creating new identical instances with the full stack (Nginx + Your Application + MySQL) will make your setup much more difficult to maintain. Think about taking backups, releasing application updates, patching the database engine, updating the database schema, generating reports on all your clients, etc. If you opt for this method, you would really need to find some big advantages in order to offset all the disadvantages.
You need to measure carefully how much memory overhead everything has - I can't see nginx vs Apache making much difference; it's PHP which will use all the RAM (this in turn depends on how many processes the web server chooses to run, but that's more of a tuning issue).
Personally, I'd stay away from nginx on the grounds that it is too risky to run such a weird server in production.
Databases always need lots of ram, and the only way you can sensibly tune the memory buffers is to have them on dedicated servers. This is assuming you have big data.
If you have very small data, you could keep it on the same box.
Likewise, memcached makes almost no sense if you're not running it on dedicated boxes. Taking memory from MySQL to give to memcached is really robbing Peter to pay Paul. MySQL can cache data in its innodb_buffer_pool quite efficiently (this saves I/O, but may end up using more CPU, since you won't be caching presentation logic etc., which is possible with memcached).
Memcached is only sensible if you're running it on dedicated boxes with lots of ram; it is also only sensible if you don't have enough grunt in your db servers to serve the read-workload of your app. Think about this before deploying it.
If your application is able to work with PHP and MySQL on different servers (I don't see why this wouldn't work, actually), then, it'll also work with PHP and MySQL on the same server.
The real question is: will your servers be able to handle the load of Apache/nginx/PHP, MySQL and memcached all together?
And there is only one way to answer that question: you have to test in a "real" "production" configuration to determine how loaded your servers are -- or use some tool like ab, siege, or OpenSTA to "simulate" that load.
If there is not too much load with everything on the same server... well, go with it, if it makes the hosting of your application cheaper ;-)
Am I right in thinking that until I am able to afford dedicated servers or have any spare servers, I could successfully run a small number of memcached servers through EC2?
With the announcement of the new auto-scaling and load balancing by Amazon today, do you guys think this would be a viable option?
And what would be the basic technical steps you'd recommend me taking?
Thanks
Currently, I have one dedicated server and no memcached servers. I want to use the power of EC2 to setup a few instances and run such memcached servers. That's my current setup.
Load balancing has nothing to do with memcached -- the client uses a hash algorithm to decide which server to connect to.
I highly recommend not using autoscaling with Memcached -- adding servers breaks the hashing algorithm and invalidates your cache. Data will go missing and you'll have to recache.
You'll want to check the latency from your servers to EC2 -- if it's more than 50ms, you'll be hurting your performance significantly. Well, I'd assume anyway.
You can pull multiple keys with one request (see here for how; a short sketch follows below) to reduce the latency effect, but you'll still take the initial hit. It also means you need to know all the keys you're going to fetch before you make the call; otherwise each request adds 50 ms (or more) to the execution time of your script.
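For the multi-key fetch, the Memcached extension's getMulti() turns N round trips into one; a rough sketch (the host and keys are placeholders):

```php
<?php
// One round trip for several keys instead of one network hit per key.
$mc = new Memcached();
$mc->addServer('cache.example.com', 11211);        // e.g. your EC2 memcached host

$keys   = array('user:123:profile', 'user:123:prefs', 'site:nav_menu');
$values = $mc->getMulti($keys);                    // key => value for every hit

foreach ($keys as $key) {
    if (!is_array($values) || !isset($values[$key])) {
        // Miss for this key: regenerate it here and set() it for next time.
    }
}
```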
Consider the data you're trying to cache. Is a 64 MB memcached instance large enough to help you? You can probably run it on your main servers.
To really take advantage of memcached you need to have your memcache communicating with your code as quickly as possible. You may want to investigate how much latency you'd have between the EC2 servers and your own.
Ultimately, you might be better served upping the ram on your current box to something like 4 gigs (should run you about 50 bucks) and putting memcached on the main server. The documentation actually recommends that you install memcached on the same server that is serving out requests. Depending on the size of your application and what it does, a memcached instance with a gig or two may be way more than what you need.
Also, if you're not already using a PHP opcode cache like APC or eAccelerator, adding one will also help.
Recently, AWS released a new web service, Amazon ElastiCache. This service is protocol-compliant with memcached.
For more details refer to : http://aws.amazon.com/elasticache/
How much free memory do you normally have on your current box? Could you not just set up a memcached instance there? I'm thinking that it's possible the latency/overhead/etc. from having remote caches is such that you'd negate any benefits, but perhaps that's not the case.
More generally:
If you want to use any type of caching mechanism, it makes sense to have your servers VERY CLOSE to your cache servers. Example: database servers and memcached servers should be in the same colocation facility, or the same AWS region.
If you use a caching system, it's because you want to improve performance. If you put the caching system far away from your servers, you're basically throwing those benefits away.
Best,