I am using Moodle 2.7.2 for our application in a load-balanced environment, with an AWS ElastiCache memcached cluster that has multiple nodes.
Whenever I make any configuration changes or database updates, the front end sometimes shows the new changes but sometimes displays old data.
I researched this issue and found that I should set
memcached.sess_consistent_hash=On
I changed this and restarted the server, but I am still getting inconsistent data.
I guess the problem you have to solve is how the cache and the permanent storage get updated when you have dirty data.
The consistent-hash parameter only controls how the data is distributed across the cluster.
For your problem, there are various strategies, such as write-back, write-through, and write-around. Typically, if consistency and durability are important, you would choose write-through. Write-through is also a good fit for workloads with many reads and few writes.
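To make the write-through idea concrete, here is a minimal PHP sketch (the host names, table, and key scheme are assumptions for illustration, not Moodle's actual cache API): the write path updates the database and the cache together, so readers never see stale cached data.

<?php
// Minimal write-through sketch: every write goes to the database and the
// cache in the same operation, so readers never see stale cached data.
// Host names, table name and key scheme are placeholders for illustration.
$cache = new Memcached();
$cache->addServer('my-elasticache-node.example.com', 11211);

$db = new PDO('mysql:host=db.example.com;dbname=app', 'user', 'pass');

function saveSetting(PDO $db, Memcached $cache, string $name, string $value): void
{
    // 1. Write to permanent storage first.
    $stmt = $db->prepare(
        'INSERT INTO settings (name, value) VALUES (?, ?)
         ON DUPLICATE KEY UPDATE value = VALUES(value)'
    );
    $stmt->execute([$name, $value]);

    // 2. Update the cache with the same value (write-through),
    //    rather than leaving the old cached value in place.
    $cache->set('setting:' . $name, $value, 300); // 5-minute TTL
}

function readSetting(PDO $db, Memcached $cache, string $name): ?string
{
    $cached = $cache->get('setting:' . $name);
    if ($cached !== false) {
        return $cached;
    }
    $stmt = $db->prepare('SELECT value FROM settings WHERE name = ?');
    $stmt->execute([$name]);
    $value = $stmt->fetchColumn();
    if ($value !== false) {
        $cache->set('setting:' . $name, $value, 300);
    }
    return $value === false ? null : $value;
}

With write-around you would instead skip (or simply delete) the cache entry on write and let the next read repopulate it.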
Hope it helps!
Related
I would like to know how memcached manages the cache for PHP sessions. I want to design a PHP app that scales out, where each HTTP/PHP server includes a memcached layer (for DB caching, app caching, and session caching), but if memcached doesn't replicate the data, a user who lands on webserver1 won't see the same session on webserver2.
Do memcached1 and memcached2 need to be replicated to handle PHP sessions?
Thanks in advance.
Regards.
While I agree there is no explicit question here, we could try to help the OP understand how memcache works.
Memcache is an in-memory cache, and how you set it up depends on your current infrastructure.
For instance, if you only have one web server, you could install memcache on that same machine, along with the database layer. This increases the site's performance because the site can get data from memcache (in memory) rather than from the database (on disk, and slower to read). Using it in this manner is fine, but as your site requires better performance or scalability you would probably start up a cluster of web servers behind a load balancer.
This is when things can get a bit tricky. You have all these machines, and you are thinking that you need memcache on every machine, so how do you replicate these instances? The simple answer is: you don't. If you have multiple web servers, the best method is to put memcache on its own server (or cluster behind a load balancer); this way every web server is hitting the same IP address for the memcache server(s).
You do not need to worry about keeping anything in sync, because memcache works by hashing each key to determine which server in the pool that key is assigned to (when you have a cluster of memcache servers).
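For example, with PHP's Memcached extension every web server configures the same pool and the client library decides which node owns each key (the host names here are placeholders):

<?php
// Every web server defines the same pool; the client hashes each key to pick
// the node that owns it, so no replication between memcached servers is needed.
$mc = new Memcached();
$mc->addServers([
    ['memcache1.internal', 11211],
    ['memcache2.internal', 11211],
]);
// Consistent hashing minimises key movement when a node is added or removed.
$mc->setOption(Memcached::OPT_DISTRIBUTION, Memcached::DISTRIBUTION_CONSISTENT);

$mc->set('user:42:profile', ['name' => 'Alice'], 300); // lands on whichever node the key hashes to
$profile = $mc->get('user:42:profile');                // any web server with the same pool finds it there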
Based on this question it would appear that you would need to do one of the following:
1.) Read up on system architecture
2.) Hire someone to architect your systems layer.
My best suggestion would be to use a single server for your memcache instance and set the web servers to use that for memcache rather than trying to run memcache on each of the web servers.
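For sessions specifically, each web server's PHP can point its session handler at that one memcached host. A minimal sketch follows; the host name is a placeholder, and the same two settings can live in php.ini instead:

<?php
// All web servers share one memcached host for sessions, so a user can hit any
// server behind the load balancer and still find their session.
ini_set('session.save_handler', 'memcached');            // PECL memcached extension
ini_set('session.save_path', 'memcache.internal:11211'); // placeholder host:port
session_start();
$_SESSION['user_id'] = 42;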
Joseph.
I understand your point; I have already tested the architecture with a separate memcached server (and Redis too). My intention is to "pack" the application server into a unit (Docker) and then measure the load parameters to deploy a new instance and scale out the infrastructure.
I found this.
https://www.digitalocean.com/community/tutorials/how-to-share-php-sessions-on-multiple-memcached-servers-on-ubuntu-14-04
Thanks for your reply!
Regards.
I am in the final stages of configuring a service that is accessible from four global locations (with plans to add more later). I will be running the servers on an Ubuntu 12.04 box with MariaDB. My initial thought was to create servers that run independently of each other with 4 distinct databases and live with the constraint that users would only be able to login to the server where they were initially registered.
However, I have just run into this article that has got me thinking...
From my reading of things, if I set up a Galera cluster with master-master replication as suggested in the article, I can have the luxury of one large database that is consistently available across all four servers. I have gathered (and am hoping) that with the cluster set up correctly and functioning well I need do pretty much nothing in my PHP code (the four MariaDB instances will have the same user to access the database) - not even alter the PDO connection string.
However, this sounds almost too good to be true. My questions are:
Are there other issues involved here that make for complications?
Do the PHP PDO connection strings need to be altered in any way?
Does the fact that my application is already structured to ensure that there is absolutely zero chance of two servers attempting to simultaneously write the same row help?
And finally, am I reading the MariaDB docs correctly that this will not work with the TokuDB storage engine?
Is there a way to specifically stop the replication of a selected table? Could I in fact exploit the "only InnoDB/XtraDB" constraint and use another storage engine on the table I do not want to have replicated?
Are there other issues involved here that make for complications?
There are some Known Limitations that you should be aware of. Generally, with clusters, you should ideally have an odd number of nodes to prevent split-brain conditions, but an even number will usually work just as well.
Do the PHP PDO connection strings need to be altered in any way?
No. Your existing connection strings should work.
Does the fact that my application is already structured to ensure that there is absolutely zero chance of two servers attempting to simultaneously write the same row help?
Look at the known limitations and make sure your application will still do that. If you're using named locks (GET_LOCK()), you'll need to change your application.
And finally, am I reading the MariaDB docs correctly that this will not work with the TokuDB storage engine?
TokuDB support was added in the recent Galera cluster distribution. I have used it a bit, and it does replicate just like InnoDB, but I wouldn't rely on it since it's new in the Galera cluster build.
Is there a way to specifically stop the replication of a selected table? Could I in fact exploit the "only InnoDB/XtraDB" constraint and use another storage engine on the table I do not want to have replicated?
I've heard a lot of people ask if they can omit tables or databases from replication but I still haven't heard a good reason why. Galera replication provides HA and is cheap and easy so even if some tables aren't important I can't find any realistic reason to not replicate the data. That being said, you could have data not replicated by using MyISAM/Aria.
I've been using MariaDB with Galera in multiple moderately sized projects, and it is the best solution I've found for HA; it also provides performance benefits. Other solutions are generally expensive or not mature. One thing you should consider is setting up a proxy for connecting to the database servers, such as HAProxy, mysql-proxy, or glbd (which I use), to provide better redundancy and connection balancing for performance.
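To illustrate the proxy idea, the application then connects to a single address and lets the proxy pick a healthy node. A rough sketch, with the proxy host, database name, and credentials as placeholders:

<?php
// The app only ever sees the proxy's address; HAProxy/glbd/mysql-proxy forwards
// the connection to a healthy Galera node and handles failover for us.
$pdo = new PDO(
    'mysql:host=db-proxy.internal;port=3306;dbname=myapp',
    'app_user',
    'app_password',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
);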
In response to DroidOS's comment below:
Every write in the cluster needs to be agreed upon by every node, so any latency between nodes is added to every write. Basically, every write has the greatest round-trip time between the writing server and the other nodes added to it.
No. Galera replication is all or nothing across the entire cluster. If any node has a problem writing the data, which can happen if a table doesn't have a primary key, the node will gracefully kill itself since it can't guarantee its data is consistent with the rest of the cluster. If that happens, the rest of the cluster will continue to operate normally. If there is a network issue, if one of the segments has quorum, it will continue to operate normally. Any segments without quorum will wait for more nodes to get quorum but will not accept queries. With this behavior, you can be sure that any node that you are able to query is consistent with the rest of the cluster.
Given that this has turned out to be such a popular question I thought I should add an extra answer by way of comment for anyone who runs into it.
The big issue with synchronous replication is the latency introduced by the process. There will certainly be times when synchronous replication is required, and the latency has to be managed and lived with. However, you might, on reflection (as I did), realize that you can live with lazy replication. There are commercial solutions that deliver this, albeit at a hefty fee. You also have the possibility of rolling your own solution, which is easier than you might think.
I am using Amazon's RDS. I have a single database, and we are getting fairly heavy traffic. I already scaled our EC2 instances without any issues and it's been working great, but now I want to lighten the database load by creating:
1 - Write database
2 - Read databases
Obviously, I will have to have multiple connections going on in my script, and reading from one and writing to one is easy enough, but what is the logic for load balancing multiple read databases?
Is there something in Amazon I can setup to do this? Like the load balancing for EC2? Or is this something I have to setup within my scripts automatically?
Technically, I may NOT need 2 read db instances at this time, but surely this is a common thing, right? I would assume this would need to be done, and I was curious about the architecture.
Unfortunately there is no easy way of doing this. Due to the automagically managed nature of RDS, you are at the mercy of Amazon and the services they provide. You have a few options though.
1. You stick with RDS and set up a round robin DNS.
This is achieved most easily through Route 53, by creating multiple CNAME records, one for each of your read replicas' endpoints, e.g. db.mydomain.com -> somename.23ui23asdad4r.region.rds.amazonaws.com
Make sure to turn on the weighted routing policy, give the records equal weights, and give each one its own "set ID".
Rinse and repeat for each read replica.
http://note.io/1agsSMB
Caveat 1: this is not a true load balancer. This is simply rolling a die and pointing each request to one of your RDS instances.
Caveat 2: There is no way to health-check your RDS instances, and there is no way to auto-scale the instances either, unless you do some crazy things with CloudWatch-triggered scripts that manually add and remove RDS read replicas and update Route 53.
2. Use a die roll in your application itself.
A really cheap and nasty approach you could try is to create a config for each of your read replicas in CodeIgniter and when you connect to the database you randomly choose one.
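A minimal sketch of that die roll in plain PHP/PDO (the host names are placeholders; in CodeIgniter the same idea would live in its database config):

<?php
// Writes always go to the master; each request picks one read replica at random.
// Endpoint names below are placeholders for your actual RDS endpoints.
$readEndpoints = [
    'myapp-read1.abc123.us-east-1.rds.amazonaws.com',
    'myapp-read2.abc123.us-east-1.rds.amazonaws.com',
];
$readHost = $readEndpoints[array_rand($readEndpoints)];

$read  = new PDO("mysql:host={$readHost};dbname=myapp", 'db_user', 'db_pass');
$write = new PDO('mysql:host=myapp-master.abc123.us-east-1.rds.amazonaws.com;dbname=myapp', 'db_user', 'db_pass');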
Caveats: Same as above, but even worse, as you will need to update your CodeIgniter config each time you add or remove a read replica.
3. Spend hours and hours porting your RDS to EC2 instances.
You move your database to EC2 instances. This is perhaps the most difficult solution, as you will need to manage ALL of your database tweaking and tuning yourself. On the plus side, you will be able to put them in an auto-scaling group and behind an internal load balancer in your VPC.
An RDS cluster provides you with two endpoints, a reader and a writer. If you send the read traffic to the reader endpoint, AWS will manage load balancing across all read replicas. You can also apply a scaling policy to the read replicas.
These options are available for AWS Aurora clusters.
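For example, with an Aurora cluster the application just keeps two connections; the reader endpoint transparently balances across the replicas (the endpoint names below are placeholders that Aurora would generate for your cluster):

<?php
// One connection to the writer endpoint, one to the reader endpoint; Aurora
// spreads the reader connections across however many replicas exist.
$write = new PDO('mysql:host=myapp.cluster-abc123.us-east-1.rds.amazonaws.com;dbname=myapp', 'db_user', 'db_pass');
$read  = new PDO('mysql:host=myapp.cluster-ro-abc123.us-east-1.rds.amazonaws.com;dbname=myapp', 'db_user', 'db_pass');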
I am building a web-application and have a couple of quick questions. From what I learnt, one should not worry about scalability when initially building the app and should only start worrying when the traffic increases. However, this being my first web-application, I am not quite sure if I should take an approach where I design things in an ad-hoc manner and later "fix" them. I have been reading stories about how people start off with an app that gets millions of users in a week or two. Not that I will face the same situation but I can't help but wonder, how do these people do it?
Currently, I bought a shared hosting account on Lunarpages and that got me started in building and testing the application. However, I am interested in learning how to build the same application in a scalable-manner using the cloud, for instance, Amazon's EC2. From my understanding, I can see a couple of components:
There is a load balancer that first receives requests and then decides where to route each request
This request is then handled by a server replica that then processes the request and updates (if required) the database and sends back the response to the client
If a similar request comes in, then a caching mechanism like memcached kicks into picture and returns objects from the cache
A blackbox that handles database replication
Specifically, I am trying to do the following:
Setting up a load balancer (my homework revealed that HAProxy is one such load balancer)
Setting up replication so that databases can be synchronized
Using memcached
Configuring Apache to work with multiple web servers
Partitioning the application to use Amazon EC2 and Amazon S3 (my application is something that will need a great deal of storage)
Finally, how can I avoid burning myself when using Amazon services? Because this is just a learning phase, I can probably do with 2-3 servers, a simple load balancer, and replication, but I want to avoid accidentally paying loads of money.
I am able to find resources on individual topics but am unable to find something that starts off from the big picture. Can someone please help me get started?
Personally, I think you should be considering how your app will scale initially - as otherwise you'll run into problems down the line.
I'm not saying you need to build it initially as a multi-server system, but if you think you'll need to do it later, be mindful of the concerns now.
In my experience, this includes things like:
Sessions. Unless you use 'sticky' load balancing, you will have to have some way of sharing session state between servers. This probably means storing session data on either shared storage or in a DB (a minimal sketch of the DB approach follows this list).
File uploads and replication. If you allow users to upload files, or you have a CMS that allows you to upload images/documents, it needs to cater for the fact that these files will also need to find their way onto other nodes in your cluster. However, if you've gone down the shared storage route mentioned above, this should cover it.
DB scalability. If you're using traditional DB servers, you might want to think about how you'll implement scalability at that level. This may mean coding your app so you use one connection string for reads, and another for writes. Then, you are free to implement replication with one master node handling the inserts/updates cascading the changes to read only nodes that handle the bulk of the work.
Middleware. You might even want to go down the route of implementing some kind of message-oriented middleware solution to completely hand off business logic functions - this will give you a great deal of flexibility in how you wish to scale this business logic layer in the future, although initially it will be a lot of complication and work for not a great deal of payoff.
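As a concrete example of the shared-session point above, here is a minimal DB-backed session handler. The table name and schema are assumptions; a real deployment would add locking and error handling.

<?php
// Minimal sketch of DB-backed sessions so any web server can read any session.
// Assumes a table: sessions(id VARCHAR(128) PRIMARY KEY, data BLOB, updated_at INT).
class DbSessionHandler implements SessionHandlerInterface
{
    public function __construct(private PDO $db) {}

    public function open(string $path, string $name): bool { return true; }
    public function close(): bool { return true; }

    public function read(string $id): string|false
    {
        $stmt = $this->db->prepare('SELECT data FROM sessions WHERE id = ?');
        $stmt->execute([$id]);
        $data = $stmt->fetchColumn();
        return $data === false ? '' : $data;
    }

    public function write(string $id, string $data): bool
    {
        $stmt = $this->db->prepare('REPLACE INTO sessions (id, data, updated_at) VALUES (?, ?, ?)');
        return $stmt->execute([$id, $data, time()]);
    }

    public function destroy(string $id): bool
    {
        return $this->db->prepare('DELETE FROM sessions WHERE id = ?')->execute([$id]);
    }

    public function gc(int $max_lifetime): int|false
    {
        $stmt = $this->db->prepare('DELETE FROM sessions WHERE updated_at < ?');
        $stmt->execute([time() - $max_lifetime]);
        return $stmt->rowCount();
    }
}

// Usage: register the handler before session_start() on every web server.
// $db = new PDO('mysql:host=shared-db.internal;dbname=myapp', 'user', 'pass');
// session_set_save_handler(new DbSessionHandler($db), true);
// session_start();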
Have you considered playing around with VMs first? You can run 2-3 VMs on your local machine and set them up like you would actual servers, they just won't be able to handle real traffic levels. If all you're looking for is the learning experience, it might be an ideal way to go about it.
Here's a little background. Currently I have:
3 web servers
one DB server, which also hosts memcache for PHP sessions for the 3 web servers.
The PHP configs on the 3 web servers point to the memcache server for sessions. It was working fine until a lot of connections were being produced for reads etc., which then caused connection timeouts.
So I'm currently looking at clustering memcache on each web server for sessions; my only concern is how to make sure that memcache on all the servers has the same session information.
Someone pointed me to http://github.com/trs21219/Memcached-Library because I am using CodeIgniter, but how do I move my PHP sessions onto this, given that memcache seems to be a key-value store? Thanks in advance.
Has anyone checked out http://repcached.sourceforge.net/ and does it work?
I'm not sure you have the same expectations of memcache that its designers had.
First, however, memcache distribution works differently than you expect: there is no mechanism to replicate stored information. Each memcache instance is a simple key-value store, as you've noticed. The distribution is done by the client code which has a list of all configured memcache instances and does a hash of the key to direct it to one of the instances. It is possible for the client to store it everywhere and retrieve it locally, or for it to hash it multiple times for redundancy, but these are not straightforward exercises.
But the other issue is that memcache is designed for reasonably short-lived data that memcache is allowed to throw away at any time. This makes it really good for caching frequently accessed data that can be a little stale (say up to a few minutes old) but might be expensive to retrieve (such as almost a minute to generate from a query).
PHP sessions don't really qualify for this, in my experience. A database can easily support many thousands of PHP sessions with barely visible traffic, but you need a lot of memcache storage to support the same number: 50 KB per session and 5,000 sessions means close to 250 MB, and then there is all the other data you want to put in there. Not enough storage and you get lots of unexplained logouts (as memcache discards session data when under memory pressure) and thus lots of annoyed users who have to keep logging in again.
We've found GREAT advantage in applying MongoDB instead of MySQL for most things, including session handling. It's far faster, far smaller, far easier. We keep MySQL around for transactional needs, but everything else goes into Mongo now. We've relegated memcache to simply caching pages and other data that isn't critical whether it's there or not, much like Smarty does.
There is no need to use any third-party libraries to organize a memcached "cluster".
http://ru.php.net/manual/en/memcached.addserver.php
Just use this function to add several servers to the pool; after that, data will be stored and distributed across those servers. The server for storing/retrieving the data for a specific key will be selected according to the consistent key distribution option.
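For sessions in particular, the same pool can be handed straight to the session handler. A small sketch, with placeholder host names (the same values can be set in php.ini instead):

<?php
// Both memcached servers form one session pool shared by every web server.
ini_set('session.save_handler', 'memcached');
ini_set('session.save_path', 'memcache1.internal:11211,memcache2.internal:11211');
// In php.ini, memcached.sess_consistent_hash = On keeps every web server mapping
// session IDs to the same node, with minimal remapping if a node drops out.
session_start();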
So in this case you don't need to worry about "how to go about making sure that memcache on all the servers have the same information for sessions"