Memcached on EC2

Memcached on EC2 - php

Am I right in thinking that until I am able to afford dedicated servers or have any spare servers, I could successfully run a small number of memcached servers through EC2?
With the annoucement of the new auto-scaling and load balancing by Amazon today, do you guys think this would be a viable option?
And what would be the basic technical steps you'd recommend me taking?
Thanks
Currently, I have one dedicated server and no memcached servers. I want to use the power of EC2 to setup a few instances and run such memcached servers. That's my current setup.

Load balancing has nothing to do with Memcached -- it uses a hash algorithm for connecting to servers
I highly recommend not using autoscaling with Memcached -- adding servers breaks the hashing algorithm and invalidates your cache. Data will go missing and you'll have to recache.
You'll want to check the latency from your servers to EC2 -- if it's more than 50ms, you'll be hurting your performance significantly. Well, I'd assume anyway.
You can pull multiple keys (see here for how) with one request to reduce the latency effect, but you'll still take the initial hit. And it also means you need to know all the keys your going to get before you make the call. Otherwise each request adds 50ms (or more) to the execution time of your script.
Consider the data your trying to cache. Is a 64mb slab large enough to help you? You can probably run it on your main servers.

To really take advantage of memcached you need to have your memcache communicating with your code as quickly as possible. You may want to investigate how much latency you'd have between the EC2 servers and your own.
Ultimately, you might be better served upping the ram on your current box to something like 4 gigs (should run you about 50 bucks) and putting memcached on the main server. The documentation actually recommends that you install memcached on the same server that is serving out requests. Depending on the size of your application and what it does, a memcached instance with a gig or two may be way more than what you need.
Also, if you're not using a php object caching engine like APC or Eaccelerator, that will also help.

Recently AWS has released a new web service - Amazon ElasticCache. This service is protocol-complaint with Memcached.
For more details refer to : http://aws.amazon.com/elasticache/

How much free memory do you normally have on your current box? Could you not just set up a memcached instance there? I'm thinking that it's possible the latency/overhead/etc. from having remote caches is such that you'd negate any benefits, but perhaps that's not the case.

More in general:
If you want to use any type of caching mechanism, it makes sense to have your servers VERY CLOSE to your cache servers. Example: Database servers and Memcached servers, they should be in the same colocation, or same AWS "Region".
If you try to use a caching system, is because you want to improve performance. If you put the caching system away from your servers, you're basically wasting all the benefits.
Best,

Related

Memcached and PHP Sessions in multiple servers

I will like to know how memcached manage cache for php sessions i mean. I would like to design a php app that scale out and in each http-PHP server include a memcached layer for (db,app cache and session caching), but if memcached dont replicate de data when a user come to webserver1 dont see the same session in webserver2.
memcached1 and memcached2 need to be replicated to handle php sessions
thanks in advance.
regards.

While I agree there is no question here we could try to help the OP understand how memcache works.
When you use memcache which is an in-memory cache how you set it up is determined upon your current infrastructure.
For instance if you only have 1 web server you could install memcache on that same machine along with the database layer being on that machine as well. This works for increasing performance of the site because the site can get data from memcache (in memory) rather than from the database (on disk, and slower to read). Using it in this manner is good but as your site requires better performance or scalability you would probably start up a cluster of web servers behind a load balancer.
This is when things can get a bit tricky. You have all these machines and you are thinking that you need to have memcache on every machine so how do we replicate these instances? The simple answer is you don't. If you have multiple web servers the best method is to put memcache on it's own server (or cluster behind a load balancer), this way every web server is hitting the same IP address for the memcache server(s).
You do not need to worry about keeping anything in sync because the way memcache works is it creates a hash that specifies which server the key has been assigned to (when you have a cluster of memcache servers).
Based on this question it would appear that you would need to do one of the following:
1.) Read up on system architecture
2.) Hire someone to architect your systems layer.
My best suggestion would be to use a single server for your memcache instance and set the web servers to use that for memcache rather than trying to run memcache on each of the web servers.

Joseph.
I undertand your point, I already test the architecture with a separate memcached server (and redis too). My intentions is to "pack" the application server in a unit (docker) and the measure the load parameters to deploy a new instance, to scale out the infraestructure.
I found this.
https://www.digitalocean.com/community/tutorials/how-to-share-php-sessions-on-multiple-memcached-servers-on-ubuntu-14-04
thanks for your reply!
regards.

How much overhead does the NewRelic PHP agent add?

By no means, NewRelic is taking the world by storm with many successful deployments.
But what are the cons of using it in production?
PHP monitoring agent works as a .so extension. If I understand correctly, it connects to another system aggregation service, which filters data out and pushes them into the NewRelic cloud.
This simply means that it works transparently under the hood. However, is this actually true?
Any monitoring, profiling or api service adds some overhead to the entire stack.
The extension itself is 0.6 MB, which adds up to each php process, this isn't much so my concern is rather CPU and IO.
The image shows CPU Utilization on a production EC2 t1.micro instances with NewRelic agent (top blue one) and w/o the agent (other lines)
What does NewRelic really do what cause the additional overhead?
What are other negative sides when using it?

Your mileage may vary based on the settings, your particular site's code base, etc...
The additional overhead you're seeing is less the memory used, but the tracing and profiling of your php code and gathering analytic data on it as well as DB request profiling. Basically some additional overhead hooked into every php function call. You see similar overhead if you left Xdebug or ZendDebugger running on a machine or profiling. Any module will use some resources, ones that hook deep in for profiling can be the costliest, but I've seen new relic has config settings to dial back how intensively it profiles, so you might be able to lighten it's hit more than say Xdebug.
All that being said, with the newrelic shared PHP module loaded with the default setup and config from their site my company's website overall server response latency went up about 15-20% across the board when we turned it on for all our production machines. I'm only talking about the time it takes for php-fpm to generate an initial response. Our site is http://www.nara.me. The newrelic-daemon and newrelic-sysmon services running as well, but I doubt they have any impact on response time.
Don't get me wrong, I love new relic, but the perfomance hit in my specific situation hit doesn't make me want to keep the PHP module running on all our live load balanced machines. We'll probably keep it running on one machine all the time. We do plan to keep the sysmon stuff going 100% and keep the module disabled in case we need it for troubleshooting.
My advice is this:
Wrap any calls to new relic functions in if(function_exists ( $function_name )) blocks so your code can run without error if the new relic module isn't loaded
If you've multiple identical servers behind a loadbalancer sharing the same code, only enable the php module on one image to save performance. You can keep the sysmon stuff running if you use new relic for this.
If you've just one server, only enable the shared php module when you need it--when you're actually profiling your code or mysql unless a 10-20% performance hit isn't a problem.
One other thing to remember if your main source of info is the new relic website: they get paid by the number of machines you're monitoring, so don't expect them to convince you to not use it on anything less than 100% of your machines even if it not needed. I think one of their FAQ's or blogs state basically you should expect some performance impact, but if you use it as intended and fix the issues you see from it, you should recoup the latency lost. I agree, but I think once you fix the issues, limit the exposure to the smallest needed number of servers.

The agent shouldn't be adding much overhead the way it is designed. Because of the level of detail required to adequately troubleshoot the problem, this seems like a good question to ask at https://support.newrelic.com

When not to use memcache

Currently we are having a site which do a lot of api calls from our parent site for user details and other data. We are planning to cache all the details on our side. I am planning to use memcache for this. as this is a live site and so we are expecting heavier traffic in coming days(not that like FB but again my server is also not like them ;) ) so I need your opinion what issues we can face if we are going for memcache and cross opinions of yours why shouldn't we go for it. Any other alternative will also help.

https://github.com/steveyen/community-site/blob/master/db_doc/main/WhyNotMemcached.wiki
Memcached is terrific! But not for every situation...
You have objects larger than 1MB.
Memcached is not for large media and streaming huge blobs.
Consider other solutions like: http://www.danga.com/mogilefs
You have keys larger than 250 chars.
If so, perhaps you're doing something wrong?
And, see this mailing list conversation on key size for suggestions.
Your hosting provider won't let you run memcached.
If you're on a low-end virtual private server (a slice of a machine), virtualization tech like vmware or xen might not be a great place to run memcached. Memcached really wants to take over and control a hunk of memory -- if that memory gets swapped out by the OS or hypervisor, performance goes away. Using virtualization, though, just to ease deployment across dedicated boxes is fine.
You're running in an insecure environment.
Remember, anyone can just telnet to any memcached server. If you're on a shared system, watch out!
You want persistence. Or, a database.
If you really just wish that memcached had a SQL interface, then you probably need to rethink your understanding of caching and memcached.

You should implement a generic caching layer for the API calls first. Within the domain of the caching layer you can then change the strategy which backend you want to use. If you then see that memcache is not fitting you can actually switch (and/or testwise monitor how it works compared with other backends).
Even better, you can first code this build upon the filesystem quite easily (which has multiple backends, too) without the hurdle to rely on another daemon, so already get started with caching - probably file system is already enough for your caching needs?

Memcache is fast, but it also can use a lot of memory if you want to get the most out of it. Whenever you hit the disk for I/O, you're increasing the latency of your application. Pull items that are frequently accessed and put them on memcache. For my large scale deployments, we cache sessions there because DB is slow as well as filesystem session storage.
A recommendation to add to your stack is APC. It caches PHP files and lessens the overall memory usage per page.

Alternative: Redis
Memcached is, obviously, limited by your available memory and will start to jettison data when memory thresholds are reached. You may want to look redis which is as fast (faster in some benchmarks) as memcached but allows the use of both volatile and non-volatile keys, more complex data structures, and the option of using virtual memory to put Least Recently Used (LRU) key values to disk.

Does a separate MySQL server make sense when using Nginx instead of Apache?

Consider a web app in which a call to the app consists of PHP script running several MySQL queries, some of them memcached.
The PHP does not do very complex job. It is mainly serving the MySQL data with some formatting.
In the past it used to be recommended to put MySQL and the app engine (PHP/Apache) on separate boxes.
However, when the data can be divided horizontally (for example when there are ten different customers using the service and it is possible to divide the data per customer) and when Nginx +FastCGI is used instead of heavier Apache, doesn't it make sense to put Nginx Memcache and MySQL on the same box? Then when more customers come, add similar boxes?
Background: We are moving to Amazon Ec2. And a separate box for MySQL and app server means double EBS volumes (needed on app servers to keep the code persistent as it changes often). Also if something happens to the database box, more customers will fail.
Clarification: Currently the app is running with LAMP on a single server (before moving to EC2).

If your application architecture is already designed to support Nginx and MySQL on separate instances, you may want to host all your services on the same instance until you receive enough traffic that justifies the separation.
In general, creating new identical instances with the full stack (Nginx + Your Application + MySQL) will make your setup much more difficult to maintain. Think about taking backups, releasing application updates, patching the database engine, updating the database schema, generating reports on all your clients, etc. If you opt for this method, you would really need to find some big advantages in order to offset all the disadvantages.

You need to measure carefully how much memory overhead everything has - I can't see enginex vs Apache making much difference, it's PHP which will use all the RAM (this in turn depends on how many processes the web server chooses to run, but that's more of a tuning issue).
Personally I'd stay away from enginex on the grounds that it is too risky to run such a weird server in production.
Databases always need lots of ram, and the only way you can sensibly tune the memory buffers is to have them on dedicated servers. This is assuming you have big data.
If you have very small data, you could keep it on the same box.
Likewise, memcached makes almost no sense if you're not running it on dedicated boxes. Taking memory from MySQL to give to memcached is really robbing Peter to pay Paul. MySQL can cache stuff in its innodb_buffer_pool quite efficiently (This saves IO, but may end up using more CPU as you won't cache presentation logic etc, which may be possible with memcached).
Memcached is only sensible if you're running it on dedicated boxes with lots of ram; it is also only sensible if you don't have enough grunt in your db servers to serve the read-workload of your app. Think about this before deploying it.

If your application is able to work with PHP and MySQL on different servers (I don't see why this wouldn't work, actually), then, it'll also work with PHP and MySQL on the same server.
The real question is : will your servers be able to handle the load of both Apache/nginx/PHP, MySQL, and memcached ?
And there is only one way to answer that question : you have to test in a "real" "production" configuration, to determine own loaded your servers are -- or use some tool like ab, siege, or OpenSTA to "simulate" that load.
If there is not too much load with everything on the same server... Well, go with it, if it makes the hosting of your application cheapier ;-)

Architecture of a PHP app on Amazon EC2

I recently experienced a flood of traffic on a Facebook app I created (mostly for the sake of education, not with any intention of marketing)
Needless to say, I did not think about scalability when I created the app. I'm now in a position where my meager virtual server hosted by MediaTemple isn't cutting it at all, and it's really coming down to raw I/O of the machine. Since this project has been so educating to me so far, I figured I'd take this as an opportunity to understand the Amazon EC2 platform.
The app itself is created in PHP (using Zend Framework) with a MySQL backend. I use application caching wherever possible with memcached. I've spent the weekend playing around with EC2, spinning up instances, installing the packages I want, and mounting an EBS volume to an instance.
But what's the next logical step that is going to yield good results for scalability? Do I fire up an AMI instance for the MySQL and one for the Apache service? Or do I just replicate the instances out as many times as I need them and then do some sort of load balancing on the front end? Ideally, I'd like to have a centralized database because I do aggregate statistics across all database rows, however, this is not a hard requirement (there are probably some application specific solutions I could come up with to work around this)
I know this is probably not a straight forward answer, so opinions and suggestions are welcome.

So many questions - all of them good though.
In terms of scaling, you've a few options.
The first is to start with a single box. You can scale upwards - with a more powerful box. EC2 have various sized instances. This involves a server migration each time you want a bigger box.
Easier is to add servers. You can start with a single instance for Apache & MySQL. Then when traffic increases, create a separate instance for MySQL and point your application to this new instance. This creates a nice layer between application and database. It sounds like this is a good starting point based on your traffic.
Next you'll probably need more application power (web servers) or more database power (MySQL cluster etc.). You can have your DNS records pointing to a couple of front boxes running some load balancing software (try Pound). These load balancing servers distribute requests to your webservers. EC2 has Elastic Load Balancing which is an alternative to managing this yourself, and is probably easier - I haven't used it personally.
Something else to be aware of - EC2 has no persistent storage. You have to manage persistent data yourself using the Elastic Block Store. This guide is an excellent tutorial on how to do this, with automated backups.
I recommend that you purchase some reserved instances if you decide EC2 is the way forward. You'll save yourself about 50% over 3 years!
Finally, you may be interested in services like RightScale which offer management services at a cost. There are other providers available.

First step is to separate concerns. I'd split off with a separate MySQL server and possibly a dedicated memcached box, depending on how high your load is there. Then I'd monitor memory and CPU usage on each box and see where you can optimize where possible. This can be done with spinning off new Media Temple boxes. I'd also suggest Slicehost for a cheaper, more developer-friendly alternative.
Some more low-budget PHP deployment optimizations:
Using a more efficient web server like nginx to handle static file serving and then reverse proxy app requests to a separate Apache instance
Implement PHP with FastCGI on top of nginx using something like PHP-FPM, getting rid of Apache entirely. This may be a great alternative if your Apache needs don't extend far beyond mod_rewrite and simpler Apache modules.
If you prefer a more high-level, do-it-yourself approach, you may want to check out Scalr (code at Google Code). It's worth watching the video on their web site. It facilities a scalable hosting environment using Amazon EC2. The technology is open source, so you can download it and implement it yourself on your own management server. (Your Media Temple box, perhaps?) Scalr has pre-built AMIs (EC2 appliances) available for some common use cases.
web: Utilizes nginx and its many capabilities: software load balancing, static file serving, etc. You'd probably only have one of these, and it would probably implement some sort of connection to Amazon's EBS, or persistent storage solution, as mentioned by dcaunt.
app: An application server with Apache and PHP. You'd probably have many of these, and they'd get created automatically if more load needed to be handled. This type of server would hold copies of your ZF app.
db: A database server with MySQL. Again, you'd probably have many of these, and more slave instances would get created automatically if more load needed to be handled.
memcached: A dedicated memcached server you can use to have centralized caching, session management, et cetera across all your app instances.
The Scalr option will probably take some more configuration changes, but if you feel your scaling needs accelerating quickly it may be worth the time and effort.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Memcached on EC2 - php

Recently AWS has released a new web service - Amazon ElasticCache. This service is protocol-complaint with Memcached. For more details refer to : http://aws.amazon.com/elasticache/

How much free memory do you normally have on your current box? Could you not just set up a memcached instance there? I'm thinking that it's possible the latency/overhead/etc. from having remote caches is such that you'd negate any benefits, but perhaps that's not the case.