During my research I have found opinions pointing in both directions on this issue. A discussion would be appreciated to clarify this issue.
We are aware of the obvious security advantages of storing the session data in the database.
Option 1:
Using the default session storage on the filesystem (defaults to /tmp)
Option 2:
Using session_set_save_handler() to store the session data in a database.
My questions are:
On a high-traffic site, which option would give the best performance?
Is this a matter of system hardware and where the current bottlenecks on this specific site are? In this case, the site is heavily pointed toward displaying specific user data from the database. Possibly this would be a question in need of serverfault input.
The site will probably have to propagate on to multiple servers soon, to deal with load and accessibility from other parts of the world. Think CDN. Does this affect my decision? I'm thinking it would be much easier to manage session information between the different computers if it is stored in the database.
The reason storing sessions in /tmp isn't favored on high-traffic sites is that they use load balancing. Load balancing effectively swaps which machine serves each request. If the session is in /tmp, the machines don't share the same /tmp directory, which means your users might appear logged in or logged out for no apparent reason.
That's why some sites store session data in databases. However, that's inefficient, as every request to the site means pulling the session from the database, which means connecting constantly, transforming data from text to an array and so on.
So, there's the third option - store the session data with Memcache. It's really easy, and if you google-fu a bit about this, you'll find answers and you can set the whole thing up in less than 5 minutes.
Related
I've been trying to understand where PHP (or other languages, really, I suppose the principle is the same) keeps session data server-side.
I read this question and this question, which both seem to indicate that it defaults to just creating ordinary files in the /tmp folder, files whose names match the session IDs stored in client-side cookies.
But that seemed ever so slightly odd to me. What about high-traffic sites with millions of concurrent users? Do they really just have a giant folder full of session files? There's no database involved?
Even on smaller sites, I don't really know that many people putting files in a directory for session storage. It's slow, and session data can be something you need access to quickly and frequently, depending on the site.
Very often, an in-memory data store like Redis will be used. Many of these types of databases enable basic sharding across multiple hosts, and simple forms of replication, so that you can scale your session storage.
When you get to the scale of millions of concurrent users, your specific needs become much more amplified. How much data do you need to store in the session? Can that data be replicated to other hosts on a best-effort basis, or must writes be atomic? At this point, everyone does it a bit differently, but the principle is the same: fast data, accessible from everywhere it needs to be. Store as little as you can.
By default the session is stored on the filesystem, but you can configure your app to store it in a memory cache like Memcached or Redis. Then you can have many web servers and many cache servers for your sessions.
Could anyone explain the disadvantages of storing large amounts of data within the session, or point me to some reading?
I would also be interested to know whether there is any difference between storing data in a session and reading data from data files.
If you store a large amount of data within a session, you'll have input/output performance drops since there's going to be a lot of reading/writing.
Sessions in PHP, by default, are stored in /tmp directory in a flat file. Therefore, your session data is written in a data-file of some sort.
PHP allows you to override its default session handler with the session_set_save_handler() function, where you can redefine the way sessions are read, written and maintained.
You can also override it in the php.ini file via the session.save_handler directive.
Now, the implication of having a large number of sessions storing large data is that a lot of files will be created and it will take some time to find them due to the ways hard drives operate (mechanical ones of course, which are the common ones still).
The more you have, the longer it takes to find it. The larger they are, longer it takes to read it. If you have a lot of them and they are large - double the trouble, a change in approach is needed.
So what's the solution?
Usually, when met with performance drop - people load balance their websites. That doesn't work with sessions unfortunately because load balancing is choosing which computer to use that will serve the current request. That means that different computers will serve pages you browse at some website. Which means, if those computers use default mechanism of session storage (/tmp directory), the sessions will not be preserved across the servers since they cannot access each other's /tmp directory.
You can solve this by mounting a NAS and making it globally visible to all of the computers in the cluster, but that's both expensive and difficult to maintain.
The other option is to store the sessions in a database. A database is accessible from any of the computers in our fictional cluster. Usually, there are specialised databases used for handling sessions, specialised in sense of being separate from the database storing your website content or whatever.
In this time of NoSQL popularity, my opinion is that NoSQL stores are great for handling sessions. They scale easily, and they are faster at writing data to storage devices than RDBMSs are.
Third option is to boost all of this, ditch hard drives as permanent storage solution and just use your server's memory for session storage.
What you get is incredible performance, however all your RAM might be quickly gone.
You can also create a cluster of computers that store sessions in their RAM.
Redis and Memcache are great for this task, googling a bit will give you good resources that explain how to use Redis or Memcache to store sessions.
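As a starting point, here is a minimal php.ini sketch for both stores. It assumes the memcached or phpredis extension is installed, and the host names are placeholders rather than anything from the discussion above.

```ini
; Option A: memcached extension (host:port pairs, no tcp:// prefix)
session.save_handler = memcached
session.save_path = "cache1.example:11211,cache2.example:11211"

; Option B: phpredis extension (uncomment to use instead of option A)
;session.save_handler = redis
;session.save_path = "tcp://cache1.example:6379"
```

Either way the application code keeps calling session_start() as before; only the storage backend changes.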
Bottom line of all this is: don't store too much data in your sessions.
Depending on your needs and budget, there are three options for storing and working with sessions.
This is a good link: http://tuxradar.com/practicalphp/10/1/0
Session data is a very expensive workload too. The best way to do it is to store a cookie or session_id and use that to look up what you need from a dbfile/RDBMS. This also allows your site to run across a multi-server environment, whereas file-based session data is limited to a single server.
Here's a little background. Currently I have:
3 web servers
one db server, which also hosts memcache for PHP sessions for the 3 web servers.
I have the PHP configs on the 3 servers pointing to the memcache server for sessions. It was working fine until a lot of connections were being opened for reads etc., which then caused connection timeouts.
So I'm currently looking at clustering the memcache on each web server for sessions, my only concern is how to go about making sure that memcache on all the servers have the same information for sessions.
Someone guided me to http://github.com/trs21219/Memcached-Library because I am using CodeIgniter, but how do I move my PHP sessions onto this, since memcache is just a key-value store? Thanks in advance.
Has anyone checked out http://repcached.sourceforge.net/ and does it work?
I'm not sure you have the same expectations of memcache that its designers had.
First, however, memcache distribution works differently than you expect: there is no mechanism to replicate stored information. Each memcache instance is a simple key-value store, as you've noticed. The distribution is done by the client code which has a list of all configured memcache instances and does a hash of the key to direct it to one of the instances. It is possible for the client to store it everywhere and retrieve it locally, or for it to hash it multiple times for redundancy, but these are not straightforward exercises.
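To make the client-side distribution concrete, here is a rough sketch of the idea. The function name, the host names and the modulo scheme are assumptions for illustration; real client libraries use their own hashing algorithms.

```php
<?php
declare(strict_types=1);

/**
 * Pick the memcache instance responsible for a key.
 * Plain modulo hashing, chosen for illustration only; it is not the
 * exact algorithm of any particular memcache client.
 */
function pick_server(string $key, array $servers): string
{
    if ($servers === []) {
        throw new InvalidArgumentException('At least one server is required.');
    }
    // Mask the checksum so it is non-negative on 32-bit builds as well.
    $index = (crc32($key) & 0x7fffffff) % count($servers);
    return $servers[$index];
}

// Hypothetical pool: one memcache instance per web server.
$servers = ['web1:11211', 'web2:11211', 'web3:11211'];

// The same session ID always maps to the same instance.
echo pick_server('sess_4f2a9c', $servers), PHP_EOL;
```

Each key lives on exactly one instance, so nothing is replicated: if that instance goes away, so do the sessions stored on it.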
But the other issue is that memcache is designed for reasonably short-lived data that memcache is allowed to throw away at any time. This makes it really good for caching frequently accessed data that can be a little stale (say up to a few minutes old) but might be expensive to retrieve (such as almost a minute to generate from a query).
PHP sessions don't really qualify for this, in my experience. A database can easily support many thousands of PHP sessions with barely visible traffic, but you need a lot of memcache storage to support the same number: 50 KB per session and 5000 sessions means close to 256 MB, and then there is all the other data you want to put in there. Without enough storage you get lots of unexplained logouts (as memcache discards session data when under memory pressure) and thus lots of annoyed users who have to keep logging in again.
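The arithmetic behind that estimate can be checked with a few lines. This is only a back-of-the-envelope sketch using the per-session size and session count quoted above; the helper name is made up for the example.

```php
<?php
declare(strict_types=1);

/**
 * Estimate the memory needed for session payloads, in MiB.
 * Ignores memcache's own per-item overhead, so treat it as a lower bound.
 */
function session_memory_mib(int $kbPerSession, int $sessions): float
{
    $totalBytes = $kbPerSession * 1024 * $sessions;
    return $totalBytes / (1024 * 1024);
}

// Figures from the text: 50 KB per session, 5000 sessions.
printf("%.1f MiB\n", session_memory_mib(50, 5000)); // prints "244.1 MiB"
```

That is 256,000,000 bytes before any other cached data is counted.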
We've found GREAT advantage applying MongoDB instead of MySQL for most things, including session handling. It's far faster, far smaller, far easier. We keep MySQL around for transactional needs, but everything else goes into Mongo now. We've relegated memcache to simply caching pages and other data that isn't critical if it's there or not, something like smarty does.
There is no need to use a third-party library to set up a memcached "cluster".
http://ru.php.net/manual/en/memcached.addserver.php
Just use this function to add several servers into the pool and after that data will be stored and distributed over those servers. The server for storing/retrieving the data for the specific key will be selected according to consistent key distribution option.
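A minimal sketch of such a pool, assuming the memcached extension is installed; the host names and the helper function are placeholders. It uses addServers(), the batch form of addServer().

```php
<?php
declare(strict_types=1);

/**
 * Build a client for a pool of memcached servers.
 * Returns null when the memcached extension is not available.
 */
function build_pool(array $servers): ?Memcached
{
    if (!extension_loaded('memcached')) {
        return null;
    }
    $pool = new Memcached();
    // Consistent hashing limits how many keys move when a server is added or removed.
    $pool->setOption(Memcached::OPT_DISTRIBUTION, Memcached::DISTRIBUTION_CONSISTENT);
    $pool->setOption(Memcached::OPT_LIBKETAMA_COMPATIBLE, true);
    $pool->addServers($servers);
    return $pool;
}

// Hypothetical hosts: one memcached instance per web server.
$pool = build_pool([
    ['web1.example', 11211],
    ['web2.example', 11211],
    ['web3.example', 11211],
]);
```

Note that this spreads keys over the servers; it does not copy each session to every server.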
So in this case you don't need to worry about "how to go about making sure that memcache on all the servers have the same information for sessions"
That's about all that I need to ask
I am dealing with a site right now and I can't see a really significant difference between storing my sessions in a database table and not doing so.
There are a couple reasons why I sometimes store session data in a DB. Here are the biggest two:
Security concerns on a shared server: If you're running on a shared server, chances are that it's easy for other users of the server to meddle their way into your temp directory and get access to the session data you have stored there. This isn't too common, but it can happen.
Using multiple servers: If you're scaling up and using more than one server, it's best to store the session data in a database. That way the data is easily available throughout your entire server stack (or farm, depending on how big you're going). This is also attainable through a flat file system, but using a database is usually a more elegant, easy solution.
The only thing I can think of for not using a database is simply the number of queries you'll be running. For each page load, you'll have an extra query to gather the session data. However, one small extra query shouldn't make that much difference. The two points I outlined above outweigh this small cost.
Hope that helped a bit.
On a shared host, you have no control over who can access the directory where session files are stored. In this case storing sessions in the DB can offer better security.
And one scenario with which I have no experience myself, but I believe is a realistic scenario:
On a loadbalanced server farm where subsequent requests of one user can be dispatched over multiple servers. In this case you could choose to have one central DB server. In such a scenario, if you wouldn't have such a centralized session repository, session data of users would get lost because they could switch servers per request.
There is a huge difference when you are using several servers with a load-balancing mechanism that doesn't guarantee that a given user will always be sent to the same server:
with file-based sessions, if the user is load-balanced to a server that is not the one which served the previous page, the file containing their session will not be found (as it's on another server), and they will not have their session data
with database-based or memcached-based sessions, the session data will be available from whatever server, which is quite nice in this kind of situation.
There's also a difference when you are using shared hosting: with file-based sessions, if those are placed in the temporary directory of the server (like /tmp), anyone might read your session files, depending on the configuration of the server. With DB-based sessions, this problem doesn't exist, as each user will have a different DB and DB user.
In addition to the above posts:
Database sessions (when session table is of Memory type) are faster.
When using file-based sessions, the session file is locked until the script ends. So a user cannot have two scripts running on the server at the same time. This matters, for example, when you write a download server: the user downloads a file, the script sends the file to them while leaving the session file locked, and the user cannot browse the contents of the file archive at the same time.
OK, so I've got this totally rare and unique scenario of a load balanced PHP website. The bummer is - it didn't used to be load balanced. Now we're starting to get issues...
Currently the only issue is with PHP sessions. Naturally nobody thought of this issue at first so the PHP session configuration was left at its defaults. Thus both servers have their own little stash of session files, and woe is the user who gets the next request thrown to the other server, because that doesn't have the session he created on the first one.
Now, I've been reading PHP manual on how to solve this situation. There I found the nice function of session_set_save_handler(). (And, coincidentally, this topic on SO) Neat. Except I'll have to call this function in all the pages of the website. And developers of future pages would have to remember to call it all the time as well. Feels kinda clumsy, not to mention probably violating a dozen best coding practices. It would be much nicer if I could just flip some global configuration option and Voilà - the sessions all get magically stored in a DB or a memory cache or something.
Any ideas on how to do this?
Added: To clarify - I expect this to be a standard situation with a standard solution. FYI - I have a MySQL DB available. Surely there must be some ready-to-use code out there that solves this? I can, of course, write my own session saving stuff and auto_prepend option pointed out by Greg seems promising - but that would feel like reinventing the wheel. :P
Added 2: The load balancing is DNS based. I'm not sure how this works, but I guess it should be something like this.
Added 3: OK, I see that one solution is to use auto_prepend option to insert a call to session_set_save_handler() in every script and write my own DB persister, perhaps throwing in calls to memcached for better performance. Fair enough.
Is there also some way that I could avoid coding all this myself? Like some famous and well-tested PHP plugin?
Added much, much later: This is the way I went in the end: How to properly implement a custom session persister in PHP + MySQL?
Also, I simply included the session handler manually in all pages.
You could set PHP to handle the sessions in the database, so all your servers share same session information as all servers use the same database for that.
A good tutorial for that can be found here.
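As a rough illustration of what a database-backed handler might look like, here is a minimal sketch for PHP 8 using PDO. The class name, the `sessions` table and its columns are assumptions for the example, not taken from the tutorial, and it does no session locking, so concurrent requests for the same session can overwrite each other.

```php
<?php
declare(strict_types=1);

// Assumed schema (hypothetical names):
//   CREATE TABLE sessions (
//       id         VARCHAR(128) PRIMARY KEY,
//       data       TEXT NOT NULL,
//       updated_at INTEGER NOT NULL
//   );

final class PdoSessionHandler implements SessionHandlerInterface
{
    public function __construct(private PDO $pdo)
    {
    }

    public function open(string $path, string $name): bool
    {
        return true; // The PDO connection is already open.
    }

    public function close(): bool
    {
        return true;
    }

    public function read(string $id): string|false
    {
        $stmt = $this->pdo->prepare('SELECT data FROM sessions WHERE id = ?');
        $stmt->execute([$id]);
        $data = $stmt->fetchColumn();
        // An unknown session ID must yield an empty string, not an error.
        return $data === false ? '' : (string) $data;
    }

    public function write(string $id, string $data): bool
    {
        // REPLACE INTO is understood by MySQL and SQLite; other databases need an upsert.
        $stmt = $this->pdo->prepare(
            'REPLACE INTO sessions (id, data, updated_at) VALUES (?, ?, ?)'
        );
        return $stmt->execute([$id, $data, time()]);
    }

    public function destroy(string $id): bool
    {
        return $this->pdo->prepare('DELETE FROM sessions WHERE id = ?')->execute([$id]);
    }

    public function gc(int $max_lifetime): int|false
    {
        $stmt = $this->pdo->prepare('DELETE FROM sessions WHERE updated_at < ?');
        $stmt->execute([time() - $max_lifetime]);
        return $stmt->rowCount();
    }
}
```

It would be registered once, before session_start(), with session_set_save_handler(new PdoSessionHandler($pdo), true), where $pdo is a PDO connection to the database that all web servers share.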
The way we handle this is through memcached. All it takes is changing the php.ini similar to the following:
session.save_handler = memcache
session.save_path = "tcp://path.to.memcached.server:11211"
We use AWS ElastiCache, so the server path is a domain, but I'm sure it'd be similar for local memcached as well.
This method doesn't require any application code changes.
You don't mention what technology you are using for load balancing (software, hardware etc.), but in any case, the solution to your problem is to employ "sticky sessions" on the load balancer.
In summary, this means that when the first request from a "new" visitor comes in, they are assigned a specific server from the cluster: all future requests for the lifetime of their session are then directed to that server. In practice this means that applications written to work on a single server can be up-scaled to a balanced environment with zero/few code changes.
If you are using a hardware balancer, such as a Radware device, then the sticky sessions is configured as part of the cluster setup. Hardware devices usually give you more fine-grained control: such as which server a new user is assigned to (they can check for health status etc. and pick the most healthy / least utilised server), and more control of what happens when a server fails and drops out of the cluster. The drawback of hardware balancers is the cost - but they are worth it imho.
As for software balancers, it comes down to what you are using. For Apache there is the stickysession property on mod_proxy, and there are plenty of articles via Google on getting this working with the PHP session (for example).
Edit:
From other comments posted after the original question, it sounds like your "balancing" is done via Round Robin DNS, so the above probably won't apply. I'll refrain from commenting further and starting a flame against round robin dns.
The easiest thing to do is configure your load balancer to always send the same session to the same server.
If you still want to use session_set_save_handler then maybe take a look at auto_prepend.
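The directive in question is auto_prepend_file. A minimal sketch of the setting, assuming a hypothetical bootstrap script that calls session_set_save_handler() before any page code runs:

```ini
; php.ini (or a per-directory .user.ini)
; The path is a placeholder; point it at your own bootstrap script.
auto_prepend_file = "/var/www/shared/session_bootstrap.php"
```

PHP then includes that file at the start of every request, so individual pages do not have to remember to register the handler.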
If you have time and you still want to check more solutions, take a look at
http://redis4you.com/articles.php?id=01..
Using redis you are fault tolerant. From my point of view, it could be better than memcache solutions because of this robustness.
If you are using php sessions you could share with NFS the /tmp directory, where I think the sessions are stored, between all the servers in the cluster. That way you don't need database.
Edited: You can also use an external service like memcachedb (persistent and fast), store the session info in the memcachedb index, and identify it with a hash of the content or even the session ID.
When we had this situation we implemented some code that lives in a common header.
Essentially, for each page we check whether we know the session ID. If we don't, we check whether we're in the situation you describe, by checking whether we have stored session data in the DB. Otherwise we just start a new session.
Obviously this requires all relevant data to be copied to the DB, but if you encapsulate your session data in a separate class then it works OK.
you could also try using memcache as session handler
Might be too late, but check this out: http://www.pureftpd.org/project/sharedance
Sharedance is a high-performance server to centralize ephemeral key/data
pairs on remote hosts, without the overhead and the complexity of an SQL
database.
It was mainly designed to share caches and sessions between a pool of web
servers. Access to a sharedance server is trivial through a simple PHP API and
it is compatible with the expectations of PHP 4 and PHP 5 session handlers.
When it comes to PHP session handling in a load-balanced cluster, it's best to have sticky sessions. Ask whoever maintains the load balancer in your datacenter's network team to enable them. Once that is enabled, you won't need to worry about sessions on the PHP side.