I'm looking for some advice on whether or not I should use a separate database to handle my sessions.
We are writing a web app for multiple users to login and check/update their account specific information. We didn't want to use the file storage method on the webserver for storing session information, so we decided to use a database (MySQL). It's working fine, but I'm wondering about performance when this gets into production.
Currently, we have two databases (rst_sessions, and rst). The "RST" database is where all the tables are stored for the webapp...they are all MYSQL InnoDB using Referential Integrity/foreign keys to link the tables. The "RST_SESSIONS" database simply has one table and all the session information gets stored there.
Here's one of my concerns. In the PHP code if I want to run a query against "RST" then I have to select that database as such inside php ( $db->select("RST") )...when I'm done with the query I have to re-select the "RST_SESSIONS" ( $db->select("RST_SESSIONS") ) or else the session specific information doesn't get set. So, throught the webapp the code is doing a lot of selecting and reselecting of the two databases. Is this likely to cause performance issues with user base of say (10,000 - 15,000)? Would we be better off moving the RST_SESSIONS table into the RST database to avoid all the selecting?
One reason we initially set things up this way was to be able to store the sessions information on a separate database server so it didn't interfere with the operations of the webapp database.
What are some of the pro's and con's of both methods and what would you suggest we do for performance? Thanks in advance.
If you're worrying about performances, another alternate solution would be to not store your sessions in database, but to use something like memcached -- the PHP library to dialog with memcached already provides a handler for sessions.
A couple of advantages of using memcached :
No hit to the disk : everything is in RAM
Of course, this means sessions will be lost if your server crashes ; but if a crash happens, you'll probably have other troubles than jsut losing sessions, and this is not likely to happen often
Used in production by many websites, and works well (I'm using it for a couple of websites)
Better scalability : if you need more RAM or more CPU-power for your memcached cluster, just add a couple of servers
And I would add : once you've started using memcached, you can also use it as a caching mecanism ;-)
Now, to answer to your specific questions :
Instead of selecting the DB, I would use two distinct connections :
One for the DB that's use for the application,
And one other for the DB that's used for the sessions.
Of course, this means a bit more load on the server (it doubles the number of opened connections), but it make sure that, the day it becomes needed, you'll be able to move the "session" database to another server : you'll just have to re-configure a connection string ; and as the application already uses two separate connections, it'll still work fine.
If you can live with it, just open a second connection to the database. That way you won't have to switch between databases at all. Of course, now you consume twice as many connections, and may need to bump the limit.
Unless there's some overriding reason to put your auth information in a separate database, why not put it with the rest of your data? You may find it convenient to have everything in one place.
Notice also that you can qualify your table names in your sql queries with a schema (database) name e.g.
SELECT ACTIVE
FROM RST_SESSIONS.SESSION
WHERE SID=*whatever*
This may get you out of the need to switch dbs explicitly, if they're both on the same server.
Related
We are using Codeigniter in an app that is being sharded. This involves splitting the database across user ids. There are two kinds of tables in the sharded database - one is sharded by user id, so that the data is evenly split across multiple shards, and global, where the data is replicated across all shards. We are also of course load balanced, so using the default php sessions doesn't work.
We like the CI sessions database for it's security, but we can't shard it, we are going to be hitting that table pretty heavily, and it's going to be replicated madly back and forth across all shards. This is not a good situation. Our load testing has indicated that the the ci_sessions table is already a pain point.
We have a couple of strategies for dealing with it. One, because we are using propel for an ORM, we can easily split ci_sessions, which uses CI's database access layer, off into a DB of it's own. We could even shard that one, if necessary.
I'm wondering whether there are other alternatives that people would suggest?
what about NOT using db session, or html5 session/local Storage? :)
or using cookies to set custom vars or ids or domain paths?
anyway it depends on how much session/cookie size you need for each user, sure db will guarantees more space and better security.
I love mongodb anyway :D https://github.com/sepehr/ci-mongodb-session
I'm designing my own session handler for my web app, the PHP sessions are too limited when trying to control the time the session should last.
Anyway, my first tests were like this: a session_id stored on a mysql row and also on a cookie, on the same mysql row the rest of my session vars.
On every request to the server I make a query, get these vars an put them on an array to use the necesary ones on runtime.
Last night I was thinking if I could write the vars on a server file once, on the login stage, and later just include that file instead of making a mysql query on every request.
So, my question is: which is less resource consuming? doing this on mysql or on a file?
I know, I know, I already read several threads on stackoverflow about this issue, but I have something different from all those cases (I hope I didn't miss something):
I need to keep track of the time that has passed since the last time the user used the app, so, in every call to the server not only I request the entire database row, I also update a timestamp on that same row.
So, on both cases I need to write to the session on every request...
FYI: the entire app runs on one server so the several servers scenario when using files does not apply..
It's easier to work with when it's done in a database and I've been using sessions in database mostly for scalability.
You may use MySQL since it can store sessions in it's temporary memory with well-configured MySQL servers, you can even use memory tables to fasten the thing if you can store all the sessions within memory. If you get near your memory limit it's easy to switch to a normal table.
I'd say MySQL wins over files for performance for medium to large sites and also for customization/options. For smaller websites I think that it doesn't make that much of a difference, but you will use more of the hard drive when using files.
I'm writing a program that runs (24/7) on a Linux server and adds entries to a MySQL database.
The contents of the database are presented on a web interface with PHP and the user should be able to delete entries using the web interface.
Is it possible to access the database from multiple processes at the same time?
Yes, databases are designed for this purpose quite well. You'll want to keep a few things in mind in your designs:
Concurrency and race conditions on database writes.
Performance.
Separate database permissions for separate applications.
Unless you're doing something like accessing the DB using a singleton, the max number of simultaneous mysql connections php will use is limited in your php.ini. I believe it defaults to 100.
Yes multiple users can access the database at the same time.
You should however take care that the data is consistent.
If you create/edit entry with many small sql statements and in the meantime someone useses the web interface this may lead to some errors.
If you have a simple db this should not be a problem, else you should consider using transactions.
http://dev.mysql.com/doc/refman/5.0/en/ansi-diff-transactions.html
Yes and there will not be any problems while trying to delete records in the presence of that automated program which runs 24/7 if you are using the InnoDb engine. This is because transactions happen one at a time, one starts after another has finished and the database is consistent everytime.
This answer How to implement the ACID model for a database has many relevant points.
Read about the ACID Properties of a database. A Mysql database with InnoDb engine will take care of all these things for you and you need not worry about that.
Many database libraries come setup for multiple database connections - but I've never actually known of an scripting application that needed to connect to two databases during it's run. (compiled, daemon-running languages are a different matter).
I understand having database slaves so that you can spread the load out - but usually on startup only one of them is chosen to handle that scripts needs.
So why would a PHP or Ruby application need to connect to more than one database? Or rather, why would you split your data up among several databases?
The only thing I can think of is bad design from a slowly evolving system that started off in multiple separate parts.
Are you talking about different physical database servers or different databases in the "schema" sense?
Regarding physical servers, If you're using MySQL replication you might write to a master and always read from a slave. This helps split the load among each database.
The simple answer is "scalability".
The ready availability of replication and clustering in a number of database products makes multiple database use a definite 'this must be possible'. Any decent ORM should know how to connect to multiple databases as required.
But even when the main application doesn't connect to more than one, there will often be other needs that do. Report generation, either scripted or ad-hoc, often involve queries that run for a long time. These are best run on database replicants dedicated (and configured) for these queries so they don't disrupt the main application.
Another good use is a type of scripted processing. Many apps will have a regular process that needs to rummage through a large part of the database. Whislt updates obviously have to go to the master, the big read queries can be run off a replicant.
Of course, the obvious need is simple performance. I oversaw a webapp and database that grew from surviving comfortably on one MySQL databse on a 32-bit dual-core machine with 3Gb to needing two 8-core 64-bit servers with 8Gb. Once it reached this stage, it relied on the database handler directing traffic to both servers. We had a window of about 50 minutes in a day where it could survive on just one database.
I have a Ruby application that connects to multiple databases. One database contains user login credentials (which is shared between several other projects). Another database contains archived data that my application tracks and compares (that only my application accesses). Another database contains data regarding physical machine resources which my application uses to generate new data (these resources are used by several different applications). By splitting the data into multiple databases, different applications only access the data that they need to be accessing.
It is all too frequently the case that some of the data you need is stored in The Wrong Database. Sometimes it's personnel records in a PeopleSoft (Oracle) database. Maybe it's Enterprise CRM data on Informix. Or some departmental database stored in MS SQL Server. Whatever it is, it's in a different database, but you still need access (hopefully read-only).
Unless your primary database is magic-based, it isn't going to be able to provide you with remote table access for every other database out there. (Most will only provide remote access to other databases of the same type, eg: MySQL->MySQL.) When that all too frequent situation occurs, you'll have no other option but to have multiple database connections, and be glad that your framework supports it.
I have a site that connects with two databases. One powers the website content (CMS DB) the other drives a web application that runs within the site (large amounts of non-CMS data) In fact, the latter uses replication.
I don't feel that's bad design. If one set of data has no relation to the other, then it makes sense even from a pure organization perspective to house it in a separate DB. Otherwise, people would just put all their tables in one DB.
For added security, I always create two accounts for every database: a read-only account (good for SELECT) and a read-write account (for SELECT, UPDATE, INSERT, DELETE and whatever else I might need). On some pages, I may need to use both accounts, thus I will consume two connections for only one database.
Well, reading from one and writing to another is a very common use case. It's easy and fun to write a data access layer that reads from one connection (reading from the slave), and writes to another (the master). A single script might make multiple reads before writing -- perhaps some lookups are necessary for validation, for instance.
Scripting languages are also frequently used for integration. You might have two off-the-shelf codebases, both of which want to maintain their own database. Your integration code might want to talk to both of them.
In general, you can usually design out of using more than one connection, but in general, I don't see anything fundamentally wrong with using connections to more than one database.
Other reasons to have multiple databases. We have one application that everyone can access. We also have client database that are very differnt from client to client. It is easier to maintain the application that all clients use (and which is maintained by a differnte team) if the client_specific data is separated out to their own databases. It is also easier to move the client to a new server when they become a large enterprise client rather than the smaller clietns who run on a server with many other clients.
Further there are types of data that are transactional and need to be in databases that are set to full recovery mode with full transaction logging. Other data is only populated from imports and does not need transactional logging and which might slow down the system as the log grew enough to handle the 10,000,000 record import. These are often split out to a separate databse so they can be in simple recovery mode as it si not necessary to recover data from the transaction log if there is a problem, it can be easily recoverd by re-running the import.
Then data is split out into datawarehouses which are optimized for data reporting not transactions. Again these reporting databases are usually separate databases (often on separate servers).
Then you have the databases for multiple different COTS applications (we have accounting databases, Credit Card transaction porcessing databases, HR databases, our project management database). A particular website might need to access more than one of these or transfer information from one to the other. Believe me vendors won't let you copy their database structure into one database to rule them all.
We have several hundred databases here on many differnt servers.
Is it "better" (more efficient, faster, more secure, etc) to (A) cache data that is used on every page load in the $_SESSION array (but still querying a table for a flag to reload the data fresh), or (B) to load it from the database each time?
I'm using the cache method (A), but I'm worried that with hundreds of users, memory could become an issue? It's just simple data, like firstname, lastname, birthday, etc.
With either method, there's still a query being run. Thoughts?
If your data is used on every pages, and is the same for all users, I wouldn't cache it in $_SESSION (which means having a different copy of that data for each user), but with another mecanism, like :
file
In memory, with APC for instance (if only 1 server)
In memory, with memcached, for instance (if you have several servers)
If your data requires long calculations or several DB queries to be obtained, caching it in database could be another possibility (would mean only 1 query to fetch back, and less calculations)
If your data is not the same for each user (which seems to be the case in your situation, as you are caching names, birthdates, ...) :
I would make sure I only cache what is necessary
Once you only have a few data to cache, putting it in session should be quite OK
If you really have that many users, you'll probably have some other scalability problems, and will most likely come to use something like memcached anyway ; which means you'll have some other way of caching ;-)
As a sidenote : if you are doing the same query over and over again, you DB server should cache it by itself (for MySQL, it would go into the "query cache") ; so, it would not be as bad as you think, I suppose -- even if not that much optimized ^^
It depends on what you're session handler is. Your session handler could be MySQL, and thus the question would not be which is better, but how to optimize your session handling.
The default PHP session handler is files, but it can be changed to mysql quite easily.
If you're talking about non-user specific data, then just save it to the DB. Worry about optimizing if you run into problems later. It is usually much more beneficial to use a better design pattern then thinking about optimizing before hand. Design your code so you can easily use a different handler for storage, and you won't have optimizing problems later.
If it is user specific, use the session, but use an appropriate session handler if necessary.