I was thinking about using memcached instead of MySQL to store sessions, which seemed like a good idea at first.
What worries me is the failover part: if a memcached server goes offline, the sessions stored on it will stop working, and that will certainly affect my users.
We already use a few techniques to reduce the impact of a failure, including keeping a pool of servers available to compensate for downtime and using sharding/consistent hashing across the server pool. We would also degrade gracefully: users who get kicked out because of a memcached server failover would be told that something has gone wrong and that they are welcome to log in again.
So how do people generally deal with these issues when storing sessions on memcached servers?
First, if you put something in memcache only, you should be OK losing it. For everything else, there's persistent storage.
Second, memcached simply doesn't fail very often. There aren't any moving parts like disk platters. The only times I've ever lost sessions were due to reboots for kernel upgrades. But losing those sessions wasn't a big deal, because of the first point.
So to answer your question directly: if a datum is OK to lose, storing it only in a memcache-backed session is OK. If it's not OK to lose, store it in persistent storage, and maybe cache it in memcache for speed.
You could create a fail-safe by using both the DB and memcached. Check whether the session object is in memcached; if it isn't, store the session in the DB and then create the memcached entry. Just make sure that logging out / signing out removes the memcached entry...
So check memcached first; if that fails, check the DB... :)
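A minimal sketch of that "memcached first, DB second" lookup, in Python. The two dicts are only stand-ins for a memcached client and a sessions table, and the class and method names are made up for illustration, so treat this as one possible shape rather than a finished handler.

```python
class FallbackSessionStore:
    """Read-through session store: cache first, database on a miss."""

    def __init__(self, cache, db):
        # Plain dicts stand in for memcached and a DB table (assumption).
        self.cache = cache
        self.db = db

    def save(self, session_id, data):
        self.db[session_id] = dict(data)     # durable copy
        self.cache[session_id] = dict(data)  # fast copy

    def load(self, session_id):
        data = self.cache.get(session_id)
        if data is not None:
            return data                      # cache hit
        data = self.db.get(session_id)       # cache miss: ask the database
        if data is not None:
            self.cache[session_id] = dict(data)  # repopulate the cache
        return data

    def logout(self, session_id):
        # Remove the session everywhere so a stale copy cannot come back.
        self.cache.pop(session_id, None)
        self.db.pop(session_id, None)


store = FallbackSessionStore(cache={}, db={})
store.save("abc123", {"user_id": 42})
store.cache.clear()            # simulate a memcached restart
print(store.load("abc123"))    # still served, this time from the DB copy
```

The trade-off is that every save also hits the database, so this protects against lost sessions but does not reduce write load.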
Related
My Silex app has always had the session data stored on the server, but I want to move to the mysql database so that I'm not so tied to a single webserver. I'm wondering about performance, though. I plan to use the PdoSessionHandler. My question is this: currently I have about 177K stored sessions. Will the garbage collection be slow? Will I be taking a performance hit by moving to the database from the filesystem?
Are you going to have an index on the session expiry? If there is no index, then yes, it will be slow. OTOH, how fast do you think searching 177,000 files on disk is? Probably a lot slower than using a database to do the thing it is expressly designed to do.
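To make the index point concrete, here is a rough sketch using Python's built-in SQLite as a stand-in for MySQL. The table and column names (`sessions`, `sess_id`, `sess_data`, `sess_expiry`) are hypothetical and not necessarily what PdoSessionHandler creates; the idea is only that garbage collection becomes a range delete on an indexed expiry column.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions ("
    " sess_id TEXT PRIMARY KEY,"
    " sess_data BLOB NOT NULL,"
    " sess_expiry INTEGER NOT NULL)"
)
# The index that lets the GC query avoid scanning every row.
conn.execute("CREATE INDEX idx_sessions_expiry ON sessions (sess_expiry)")

now = int(time.time())
# 1000 sample sessions: even-numbered ones expired an hour ago.
rows = [(f"s{i}", b"", now + (3600 if i % 2 else -3600)) for i in range(1000)]
conn.executemany("INSERT INTO sessions VALUES (?, ?, ?)", rows)


def collect_garbage(conn, now):
    """Delete expired sessions and return how many rows were removed."""
    cur = conn.execute("DELETE FROM sessions WHERE sess_expiry < ?", (now,))
    conn.commit()
    return cur.rowcount


# Ask the database how it intends to run the delete.
plan = conn.execute(
    "EXPLAIN QUERY PLAN DELETE FROM sessions WHERE sess_expiry < ?", (now,)
).fetchall()
for row in plan:
    print(row)

removed = collect_garbage(conn, now)
print("removed", removed, "expired sessions")
```

On MySQL the equivalent check would be `EXPLAIN` on the same `DELETE`; the query plan is worth looking at before and after adding the index.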
Will you take a performance hit? Probably. Will it be significant? Depends what else the system is doing with the database, the configuration of the DB, and the server it runs on.
In short - yes, there will be an inevitable cost to use the database as a session store, but it could be worth it for the abilities it gives you.
I'd suggest using Redis, backed to disk though.
Honestly, using a MySQL database as the de facto session storage in the name of scaling is about one of the worst mistakes you can make in distributed session storage.
Let me explain why...
Your MySQL database is likely already your biggest bottleneck, in that PHP probably connects to it for just about everything else persistent anyway. However, there are probably a handful of request URIs where PHP relies on cache and doesn't hit your DB. If you're using sessions on those pages, well, there goes your connection overhead again.
Deleting rows from a large table (in your case, for GC) can be extremely expensive in MySQL at scale. In MyISAM the entire table is locked (worst outcome: the entire site blocks during a large GC cycle). With InnoDB the DBMS has to write all of your undo information to a large commit log, taking up added I/O and sometimes causing sluggishness depending on fragmentation issues. This could prove especially problematic if you have re-indexing issues too.
There are already better alternatives and they require you to write less code!
My recommendation is to just use something like memcached instead: the connection overhead can be significantly lower, there are no DB schemas to write, and the drivers for the session handler already exist in PHP by default. Throw something like igbinary on top of memcached and you have blazing fast serialization coupled with cheaper in-memory session handling that can easily be scaled up and distributed with minimal effort and side effects. For example, AWS offers ElastiCache as a memcached/Redis load-balancing and replication solution in their PaaS. There's also Twemproxy if you're not on AWS.
You should probably pivot to storing session data in Redis. It serves blazing fast queries via memory, but it can also recover and repopulate the memory after a crash from a static log.
Besides the drawback that restarting memcached loses all sessions and logs users out, are there any other drawbacks to using memcached instead of files for storing PHP session data? Any security concerns? Is performance better with memcached than with standard files on disk?
Although many have been able to optimize database performance through the use of Memcached, it may not be the best solution for every situation.
Some of the drawbacks of Memcached:
Size Requirement
Not much documentation support
Volatility (If a Memcached server instance crashes, any object data stored within the session is gone)
Security (There is no authentication built into Memcached).
But Memcached is still a good choice in many apps, for the following reasons:
Memcached can compensate for insufficient ACID properties and it never blocks.
Memcached is cross-platform
Cross-DBMS
It's cheap
Let's look at the brighter side!
Not a security concern specific to using memcached for sessions, but rather something I often come across: you absolutely must make sure that your memcached instances either use Unix sockets or, if they're bound to a port, that the port is blocked. Otherwise, people can just telnet in and view, modify and delete (session) data.
Also, as the name implies, it is a caching solution, not a storage solution. As such, if you decide to use memcached for session storage, you ought to have it either database backed or file-storage backed, so if there is a cache miss (entry deleted due to time out, manual removal, flush or because the assigned memory was full and it got pruned), it can check a more persistent type of storage before saying "nope, it isn't there".
For a high traffic web site we are planning to scale up to use 2 web servers in a HA setup.
One issue we will need to tackle is the management of PHP sessions.
The obvious answer is to move session handling to the DB, which is easy, and example code is widely available on the internet.
On the other hand we are aware of the benefits of memcached but once a memcached node fails, users on that node will lose their session.
So we are thinking of implementing a setup where sessions are handled in memcached by default but also written in the DB. When we get a memcached MISS we would try to also retrieve it from the DB.
Does the above make sense and are there any implementation examples you are aware of?
thanks in advance
I refer you to Dormando's oft-cited explanation of how to store sessions in MySQL with memcached caching. The original LiveJournal post is more wordy but more thoroughly explains why storing sessions in memcached only is a bad idea.
In short:
Read session data from memcached first, look in MySQL on a cache miss.
Write session data to memcached on every update.
Only write to MySQL if cache data hasn't been synced for 120 seconds or so.
Run a periodic script that checks MySQL for expired sessions. For every expired session, update from memcached and only expire the ones that are truly expired.
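The four steps above can be sketched roughly as follows. This is my own reading of the summary, not Dormando's code: the dicts stand in for memcached and a MySQL table, and the class, field names, and injectable `clock` are hypothetical.

```python
import time

SYNC_INTERVAL = 120  # seconds between database writes, per the summary above


class WriteBehindSessions:
    def __init__(self, cache, db, clock=time.time):
        self.cache = cache  # dict standing in for memcached (assumption)
        self.db = db        # dict standing in for a MySQL sessions table
        self.clock = clock  # injectable so the timing can be tested

    def read(self, sid):
        # Step 1: memcached first, MySQL on a cache miss.
        entry = self.cache.get(sid)
        if entry is None:
            entry = self.db.get(sid)
            if entry is not None:
                self.cache[sid] = dict(entry)
        return entry

    def write(self, sid, data, ttl):
        now = self.clock()
        previous = self.cache.get(sid) or self.db.get(sid) or {}
        entry = {
            "data": data,
            "expires": now + ttl,
            "synced_at": previous.get("synced_at", 0),
        }
        # Step 3: only touch the database if the last sync is old enough.
        if now - entry["synced_at"] >= SYNC_INTERVAL:
            entry["synced_at"] = now
            self.db[sid] = dict(entry)
        # Step 2: the cache is updated on every write.
        self.cache[sid] = entry

    def sweep(self):
        # Step 4: periodic job over rows that look expired in the database.
        now = self.clock()
        for sid, row in list(self.db.items()):
            if row["expires"] > now:
                continue
            fresh = self.cache.get(sid)
            if fresh is not None and fresh["expires"] > now:
                fresh["synced_at"] = now
                self.db[sid] = dict(fresh)  # still alive: sync, don't expire
            else:
                del self.db[sid]            # truly expired
                self.cache.pop(sid, None)
```

The point of the 120-second window is that a busy session generates at most one database write every two minutes, while a memcached failure loses at most two minutes of session changes.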
Sessions are a temporary thing; there is nothing to worry about if the memcache server fails once a month and truncates the sessions. I'm sure you can use just memcache for sessions, without replicating them to the DB.
But if you still want to dump sessions to disk, you can use an existing solution such as Redis:
Redis works with an in-memory dataset. Depending on your use case, you can persist it either by dumping the dataset to disk
...
Redis also supports trivial-to-setup master-slave replication, with very fast non-blocking first synchronization, auto-reconnection on net split and so forth.
Here's a little background. Currently I have:
3 web servers
one DB server, which also hosts memcache for the PHP sessions of the 3 web servers.
I have the PHP configs on the 3 servers pointing to the memcache server for sessions. It was working fine until a lot of connections were being opened for reads etc., which then caused connection timeouts.
So I'm currently looking at clustering the memcache on each web server for sessions, my only concern is how to go about making sure that memcache on all the servers have the same information for sessions.
Someone pointed me to http://github.com/trs21219/Memcached-Library because I am using CodeIgniter, but how do I move my PHP sessions onto this, since memcache seems to be just a key-value store? Thanks in advance.
Has anyone checked out http://repcached.sourceforge.net/ and does it work?
I'm not sure you have the same expectations of memcache that its designers had.
First, however, memcache distribution works differently than you expect: there is no mechanism to replicate stored information. Each memcache instance is a simple key-value store, as you've noticed. The distribution is done by the client code which has a list of all configured memcache instances and does a hash of the key to direct it to one of the instances. It is possible for the client to store it everywhere and retrieve it locally, or for it to hash it multiple times for redundancy, but these are not straightforward exercises.
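As a rough illustration of that client-side distribution (not the exact algorithm of any particular memcache client), the client hashes the key and uses the result to pick one server from its configured list. The addresses below are hypothetical, and MD5 is just a convenient stable hash for the sketch.

```python
import hashlib

# Hypothetical pool; a real client gets this list from its configuration.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]


def server_for(key, servers=SERVERS):
    """Pick the one server responsible for a key by hashing the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


for session_id in ("sess-alice", "sess-bob", "sess-carol"):
    print(session_id, "->", server_for(session_id))
```

Every client with the same server list sends a given key to the same instance, which is why no instance ever needs to know about the others, and also why nothing is replicated.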
But the other issue is that memcache is designed for reasonably short-lived data that memcache is allowed to throw away at any time. This makes it really good for caching frequently accessed data that can be a little stale (say up to a few minutes old) but might be expensive to retrieve (such as almost a minute to generate from a query).
PHP sessions don't really qualify for this, in my experience. A database can easily support many thousands of PHP sessions with barely visible traffic, but you need a lot of memcache storage to support the same number: 50 KB per session and 5,000 sessions means close to 256 MB, and then there is all the other data you want to put in there. Without enough storage you get lots of unexplained logouts (as memcache discards session data when under memory pressure) and thus lots of annoyed users who have to keep logging in again.
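The arithmetic behind that figure, as a quick sketch. It counts only the session payload; memcached's own per-item overhead is ignored here, which is an assumption of the sketch and means real usage would be somewhat higher.

```python
SESSION_BYTES = 50 * 1024   # 50 KB per session, the figure used above
SESSION_COUNT = 5000

total_bytes = SESSION_BYTES * SESSION_COUNT
total_mb = total_bytes / 1_000_000        # decimal megabytes
total_mib = total_bytes / (1024 * 1024)   # binary mebibytes

print(f"{total_bytes} bytes = {total_mb:.0f} MB = {total_mib:.1f} MiB")
```

So the pool needs roughly a quarter of a gigabyte for session payloads alone before any other cached data goes in.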
We've found GREAT advantage applying MongoDB instead of MySQL for most things, including session handling. It's far faster, far smaller, far easier. We keep MySQL around for transactional needs, but everything else goes into Mongo now. We've relegated memcache to simply caching pages and other data that isn't critical if it's there or not, something like smarty does.
There is no need for any third-party library to organize a memcached "cluster".
http://ru.php.net/manual/en/memcached.addserver.php
Just use this function to add several servers to the pool; after that, data will be stored and distributed across those servers. The server used to store and retrieve the data for a specific key is selected according to the consistent key distribution option.
So in this case you don't need to worry about "how to go about making sure that memcache on all the servers have the same information for sessions"
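For what consistent key distribution buys you, here is a small hash ring sketch in Python. It is a generic illustration, not the algorithm the PHP extension uses internally, and the server names and the `replicas` count are made up. Note that each key still lives on exactly one server; the ring only limits how many keys move when the pool changes.

```python
import bisect
import hashlib


def _point(value):
    """Map a string to a position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, replicas=100):
        # Each server gets several points on the ring to even out the load.
        self._ring = sorted(
            (_point(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._points = [p for p, _ in self._ring]

    def server_for(self, key):
        # The first ring point at or after the key's position owns the key.
        idx = bisect.bisect(self._points, _point(key)) % len(self._ring)
        return self._ring[idx][1]


servers = ["web1:11211", "web2:11211", "web3:11211"]  # hypothetical pool
keys = [f"session-{i}" for i in range(1000)]

before = HashRing(servers)
after = HashRing(servers[:2])  # web3 drops out of the pool

moved = sum(before.server_for(k) != after.server_for(k) for k in keys)
print(f"{moved} of {len(keys)} keys changed server")
```

Only the keys that lived on the removed server get reassigned; with a plain modulo over the server count, most keys would move instead.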
I had some thoughts a while back about using memcached for session storage, but came to the conclusion that it wouldn't be sufficient if one or more of the servers in the memcached pool went down.
A hybrid approach, which would spare the main database (MySQL) the load caused by reads, would be a function that tries to fetch the data from the cache pool and, if that fails, gets it from the database.
After putting some more thought into it, I started to think about using the APC cache for session-related data. If our web server went down, sessions would be lost either way, so maybe storing them in a local APC cache or a localhost memcached server isn't that bad?
What are your experiences?
Generally, session data is something which should be treated as volatile in any situation. The user can always choose to eliminate the cookie themselves at any point (if you are using cookies, of course). For this reason, I see no problem with using memcached for session data.
For me, I'd just keep it simple - no need for a DB fallback unless you absolutely must never lose the user's session in the event of a memcached server failure. As I said at the beginning, I always treat sessions as purely volatile in any case and don't really store anything of any significance in them.
That's my two cents anyways.