Which would be the best way to achieve a fast hash / session storage with one of these three ways?
Way 1:
Create a memory table in MySQL that stores a hash and a timestamp when the entry was created. A MySQL event automatically deletes all entries older than 20 minutes. This should be pretty fast because all data is stored in memory, but the overhead of connecting to the database server might destroy this benefit.
Way 2:
I create an empty file with the hash as its filename and create a cronjob that automatically deletes all files older than 20 minutes. This can become slow because of all the read operations on the HDD.
Way 3:
Since this is going to be PHP related and we use the Zend Framework I could use the Zend_Cache and store the hash with a time-to-live of 20 minutes.
I don't want to use Memcached or APC for this, because I think that is a big overhead for just some small hashes.
Do you have any experience with similar scenarios? I would appreciate your experience and solutions for this.
If performance is an issue i would defenately go with memcached. All big internet sites rely on memcached for either caching otherwise server intensive tasks or even session storage and locking mechanisms.
Memcached is the way to go if you ask me
Don't reinvent the wheel - use memcache. Either that, or measure your MySQL performance vs. Memcache. Remember that db access is usually always a bottleneck in high-perf environments anyway.
Do you consider also scaling issues with all these approaches? How many hashes do you talk about? How many megabytes of data you talk about?
Hold it in memory of your computer if you can (your variant 3)
Put it on the disk, but hold it in memory if you can (your variant 2, if you use file-based sessions, you can use $session as someone pointed out)
Use database
Do you have a lot of data? Use memcached.
As others have said, use memcached. If PHP supported database connection pools, like Java for instance, I would recommend MySQL, but PHP being what it is, memcached is the only real answer.
Related
My Silex app has always had the session data stored on the server, but I want to move to the mysql database so that I'm not so tied to a single webserver. I'm wondering about performance, though. I plan to use the PdoSessionHandler. My question is this: currently I have about 177K stored sessions. Will the garbage collection be slow? Will I be taking a performance hit by moving to the database from the filesystem?
Are you going to have an index on the session expiry? If there is no index, then yes, it will be slow. OTOH, how fast do you think searching 177,000 files on disk is? Probably a lot slower than using a database to do the thing it is expressly designed to do.
Will you take a performance hit? Probably. Will it be significant? Depends what else the system is doing with the database, the configuration of the DB, and the server it runs on.
In short - yes, there will be an inevitable cost to use the database as a session store, but it could be worth it for the abilities it gives you.
I'd suggest using Redis, backed to disk though.
Honestly, using a MySQL database as the defacto session storage in the name of scaling is about one of the worst mistakes you can make in distributed session storage.
Let me explain why...
Your MySQL database is likely already your biggest bottleneck in that PHP probably connects to it for just about everything else persistent anyway. However, there are probably a handful of request URIs where PHP might be relying on cache and not hitting your db. In the case that you're using sessions on those pages (well, there goes your connection overhead again).
The cost of deleting rows from a large table (in your case for GC) in MySQL can be extremely expensive at scale. In MyISAM the entire table is locked (worst outcome the entire site blocks during a large GC cycle). With InnoDB the DBMS has to write all of your undo information to a large commit log taking up added I/O and sometimes causing sluggishness depending on fragmentation issues. This could especially prove problematic if you have re-indexing issues too.
There are already better alternatives and they require you to write less code!
My recommendation is to just use something like memcached instead. Where the connection overhead can be significantly lower, there are no db schemas to write, and the drivers for the session handler already exist in PHP by default. Throw something like igbinary on top of memcached and you have blazing fast serialization coupled with cheaper in-memory session handling that can easily be scaled up and distributed with minimal effort and side effects. For example, AWS offers you Elasticache for memcached/redis load-balancing and replication solution in their PaS. There's also Twem Prox if you're not on AWS.
You should probably pivot to storing session data in Redis. It serves blazing fast queries via memory, but it can also recover and repopulate the memory after a crash from a static log.
Besides the drawback of when you restart memcached all sessions are lost and users logged out, what are any other drawbacks for using memcached for storing PHP sessions data instead of files. Any security concerns? Is performance better using memcached instead of standard files on disk?
Although, many have been able to optimize database performance through the use of Memcached it may not be the best solution for every situation.
Some of the drawbacks of Memcached:
Size Requirement
Not much Documentation support
Volatility (If a Memcached server instance crashes, any object data stored within the session is gone)
Security (There is no authentication built into Memcached).
But still Memcached is a good choice in many apps because of following reasons:
Memcached can compensate for insufficient ACID properties and it never blocks.
Memcached is cross-platform
Cross-DBMS
Its Cheap
Lets look at the brighter side!
Not a security concern specific to using memcached for sessions, but rather something I often come along: You absolutely must make sure that your memcached instances are either using unix sockets, or - if they're bound to a part - their port is blocked. Otherwise, people can just telnet in and view, modify and delete (session) data.
Also, as the name implies, it is a caching solution, not a storage solution. As such, if you decide to use memcached for session storage, you ought to have it either database backed or file-storage backed, so if there is a cache miss (entry deleted due to time out, manual removal, flush or because the assigned memory was full and it got pruned), it can check a more persistent type of storage before saying "nope, it isn't there".
Could anyone explain disadvantages of storing large amounts of data within the session or point me to some reading ?
I would also be interested if there is any difference between storing data in a session and reading data from datafiles ?
If you store a large amount of data within a session, you'll have input/output performance drops since there's going to be a lot of reading/writing.
Sessions in PHP, by default, are stored in /tmp directory in a flat file. Therefore, your session data is written in a data-file of some sort.
PHP allows you to override its default session handler by session_set_save_handler() function where you can redefine the way sessions are read / written / maintained.
You can also override this via php.ini file where you specify it via session.save_handler directive.
Now, the implication of having a large number of sessions storing large data is that a lot of files will be created and it will take some time to find them due to the ways hard drives operate (mechanical ones of course, which are the common ones still).
The more you have, the longer it takes to find it. The larger they are, longer it takes to read it. If you have a lot of them and they are large - double the trouble, a change in approach is needed.
So what's the solution?
Usually, when met with performance drop - people load balance their websites. That doesn't work with sessions unfortunately because load balancing is choosing which computer to use that will serve the current request. That means that different computers will serve pages you browse at some website. Which means, if those computers use default mechanism of session storage (/tmp directory), the sessions will not be preserved across the servers since they cannot access each other's /tmp directory.
You can solve this by mounting a NAS and making it globally visible to all of the computers in the cluster, but that's both expensive and difficult to maintain.
The other option is to store the sessions in a database. A database is accessible from any of the computers in our fictional cluster. Usually, there are specialised databases used for handling sessions, specialised in sense of being separate from the database storing your website content or whatever.
In the time of NoSQL popularity - in my opinion, NoSQL is great for handling sessions. They scale easily, they are faster in writing the data to storage devices than RDBMSs are.
Third option is to boost all of this, ditch hard drives as permanent storage solution and just use your server's memory for session storage.
What you get is incredible performance, however all your RAM might be quickly gone.
You can also create a cluster of computers that store sessions in their RAM.
Redis and Memcache are great for this task, googling a bit will give you good resources that explain how to use Redis or Memcache to store sessions.
Bottom line of all this is: don't store too much data in your sessions.
According to your needs and budget - there are 3 options available how to store and work with sessions.
This is a good link: http://tuxradar.com/practicalphp/10/1/0
Session data is very expensive workload too. The best way to do it is to store a cookie, or session_id and use that to look up what you need from a dbfile/rdbms. This also allows your site to run across a multi-server environment where as session data is limited to a single.
For a high traffic web site we are planning to scale up to use 2 web servers in a HA setup.
One issue we will need to tackle is the management of PHP sessions.
The obvious answer is to move session handling to the DB which is easy and example code is widely available ton the internet.
On the other hand we are aware of the benefits of memcached but once a memcached node fails, users on that node will lose their session.
So we are thinking of implementing a setup where sessions are handled in memcached by default but also written in the DB. When we get a memcached MISS we would try to also retrieve it from the DB.
Does the above make sense and are there any implementation examples you are aware of?
thanks in advance
I refer you to Dormando's oft-cited explanation of how to store sessions in MySQL with memcached caching. The original LiveJournal post is more wordy but more thoroughly explains why storing sessions in memcached only is a bad idea.
In short:
Read session data from memcached first, look in MySQL on a cache miss.
Write session data to memcached on every update.
Only write to MySQL if cache data hasn't been synced for 120 seconds or so.
Run a periodic script that checks MySQL for expired sessions. For every expired session, update from memcached and only expire the ones that are truly expired.
Sessions it's a temporary thing, there is nothing to worry about if once per month memcache-server will fail and truncate sessions. I'm sure you can use just memcache for sessions, without replication in DB.
But if you still want to dump sessions to disk, as existing solution you can use Redis:
Redis works with an in-memory dataset.
Depending on your use case, you can
persist it either by dumping the
dataset to disk
...
Redis also supports trivial-to-setup master-slave replication, with very
fast non-blocking first
synchronization, auto-reconnection on
net split and so forth.
Currently we are having a site which do a lot of api calls from our parent site for user details and other data. We are planning to cache all the details on our side. I am planning to use memcache for this. as this is a live site and so we are expecting heavier traffic in coming days(not that like FB but again my server is also not like them ;) ) so I need your opinion what issues we can face if we are going for memcache and cross opinions of yours why shouldn't we go for it. Any other alternative will also help.
https://github.com/steveyen/community-site/blob/master/db_doc/main/WhyNotMemcached.wiki
Memcached is terrific! But not for every situation...
You have objects larger than 1MB.
Memcached is not for large media and streaming huge blobs.
Consider other solutions like: http://www.danga.com/mogilefs
You have keys larger than 250 chars.
If so, perhaps you're doing something wrong?
And, see this mailing list conversation on key size for suggestions.
Your hosting provider won't let you run memcached.
If you're on a low-end virtual private server (a slice of a machine), virtualization tech like vmware or xen might not be a great place to run memcached. Memcached really wants to take over and control a hunk of memory -- if that memory gets swapped out by the OS or hypervisor, performance goes away. Using virtualization, though, just to ease deployment across dedicated boxes is fine.
You're running in an insecure environment.
Remember, anyone can just telnet to any memcached server. If you're on a shared system, watch out!
You want persistence. Or, a database.
If you really just wish that memcached had a SQL interface, then you probably need to rethink your understanding of caching and memcached.
You should implement a generic caching layer for the API calls first. Within the domain of the caching layer you can then change the strategy which backend you want to use. If you then see that memcache is not fitting you can actually switch (and/or testwise monitor how it works compared with other backends).
Even better, you can first code this build upon the filesystem quite easily (which has multiple backends, too) without the hurdle to rely on another daemon, so already get started with caching - probably file system is already enough for your caching needs?
Memcache is fast, but it also can use a lot of memory if you want to get the most out of it. Whenever you hit the disk for I/O, you're increasing the latency of your application. Pull items that are frequently accessed and put them on memcache. For my large scale deployments, we cache sessions there because DB is slow as well as filesystem session storage.
A recommendation to add to your stack is APC. It caches PHP files and lessens the overall memory usage per page.
Alternative: Redis
Memcached is, obviously, limited by your available memory and will start to jettison data when memory thresholds are reached. You may want to look redis which is as fast (faster in some benchmarks) as memcached but allows the use of both volatile and non-volatile keys, more complex data structures, and the option of using virtual memory to put Least Recently Used (LRU) key values to disk.