We are using CodeIgniter in an app that is being sharded, which involves splitting the database across user ids. There are two kinds of tables in the sharded database: sharded tables, split by user id so the data is spread evenly across multiple shards, and global tables, whose data is replicated across all shards. We are also, of course, load balanced, so the default PHP file-based sessions don't work.
We like CI's database sessions for their security, but we can't shard that table, we are going to be hitting it pretty heavily, and it would be replicated madly back and forth across all shards. This is not a good situation. Our load testing has already flagged the ci_sessions table as a pain point.
We have a couple of strategies for dealing with it. One: because we are using Propel as an ORM, we can easily split ci_sessions, which uses CI's database access layer, off into a DB of its own. We could even shard that one, if necessary.
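For instance, a second connection group just for sessions in application/config/database.php might look like this (a sketch; the group name and credentials are placeholders, and the stock Session library would still need a small MY_Session override to use this group instead of the default one):
$db['sessions']['hostname'] = 'sessions-db.internal'; // placeholder host
$db['sessions']['username'] = 'ci_app';
$db['sessions']['password'] = 'secret';
$db['sessions']['database'] = 'ci_sessions_db';
$db['sessions']['dbdriver'] = 'mysql';
// elsewhere, loaded as a second connection alongside the default one:
$sess_db = $this->load->database('sessions', TRUE);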
I'm wondering whether there are other alternatives that people would suggest?
What about NOT using DB sessions, or HTML5 sessionStorage/localStorage? :)
Or using cookies to set custom vars, ids, or domain paths?
Anyway, it depends on how much session/cookie storage you need for each user; a DB will certainly guarantee more space and better security.
I love MongoDB anyway :D https://github.com/sepehr/ci-mongodb-session
I am working on a project with a custom HTML5 front end and a backend I've designed from experience. The backend is composed of a message queue and a cache; currently I've chosen Beanstalk and Memcache because I'm familiar with them, but I am open to suggestions.
My question, though, comes from how my coder is interfacing with the MySQL DB we are using to store the data. The idea is to pre-cache most or all of the DB so the site runs really fast. It's not a huge DB, so RAM for Memcache shouldn't be an issue. However, my coder is using CodeIgniter with GreenBean. I've never heard of GreenBean before, and when I google it I get almost nothing that isn't related to green beans the food. What little I could find suggested it was an ORM, which fits with what my coder has told me.
The problem is this. With raw PDO my pre-caching scheme is simple: I would grab each row from each table and store it in the cache under a key. Then every time I needed that data I would check the cache first and fall back to the DB. If something is changed on the backend, I only need to update that row in the DB and the associated key in the cache.
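For instance, a minimal read-through sketch of what I mean, assuming PDO plus the pecl/memcached extension (the table and key names are made up for illustration):
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret'); // placeholder DSN
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

function get_employee(PDO $pdo, Memcached $mc, $id) {
    $key = "employee:$id";
    $row = $mc->get($key);
    if ($row === false) { // cache miss: fall back to the DB and fill the cache
        $stmt = $pdo->prepare('SELECT * FROM employees WHERE id = ?');
        $stmt->execute(array($id));
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        $mc->set($key, $row, 3600);
    }
    return $row;
}

function update_employee_name(PDO $pdo, Memcached $mc, $id, $name) {
    $pdo->prepare('UPDATE employees SET name = ? WHERE id = ?')
        ->execute(array($name, $id));
    $mc->delete("employee:$id"); // only this row's key needs flushing
}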
With an ORM, if I store the entire ORM object serialized into the cache then it holds a bunch of related data. Data that could be incorrect if something were changed. For example, you have a DB of employees that is linked to the office they work in and the dept they work in. The ORM grabs the office and the dept and we store all of that in the cache. But if the office address changes the ORM object for every employee in that office is now stale/incorrect.
In that example, just letting the cache expire probably isn't an issue most of the time. But in my application, that data should really get updated immediately. So in a simple PDO scheme you flush the cache keys related to the data that changed and every future page call gets the updated data. But with an ORM you have lots and lots of cached object instances that might be incorrect and no good way of finding them. So it seems to me you are now left with some form of indexing of your cached objects and when you change something simple you could be flushing and refilling a big chunk of the cache. The site gets really slow then.
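The only workable pattern I can see is to index cached keys by the entities they embed, so a change to an office can find every cached employee object that contains it. A rough sketch, reusing the Memcached handle from above (key names are made up, and race conditions are ignored):
function cache_with_tags(Memcached $mc, $key, $value, array $tags) {
    $mc->set($key, $value, 3600);
    foreach ($tags as $tag) { // remember that $key depends on each tagged entity
        $keys = $mc->get("tag:$tag");
        if (!is_array($keys)) { $keys = array(); }
        $keys[] = $key;
        $mc->set("tag:$tag", array_unique($keys));
    }
}

function invalidate_tag(Memcached $mc, $tag) {
    $keys = $mc->get("tag:$tag");
    if (is_array($keys)) {
        foreach ($keys as $key) { $mc->delete($key); } // flush every dependent object
    }
    $mc->delete("tag:$tag");
}

// e.g. cache_with_tags($mc, 'employee:42', $employeeObj, array('office:7', 'dept:3'));
// then invalidate_tag($mc, 'office:7') when that office's address changes.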
Typically I would just cache a DB result after the first time I needed it, but in this case I think that could end up being really slow for the users who make the first requests for a particular set of data. Additionally, there are some search features that could require a lot of data from the DB. Hence my desire to pre-cache.
So in this case I'm thinking an ORM would hurt the site's performance. I'm thinking I'm not the first person to have this issue though. Is there an ORM out there that would handle this scenario well? Is there a better backend architecture I'm missing?
Thanks
I've recently taken over a project linking to a large MySQL DB that was originally designed many years ago and need some help.
Currently the DB has 5 tables per client that store their users' information, transaction history, logs, etc. However, we currently have ~900 clients that have applied to use our services, with an average of 5 new clients applying weekly. So the DB has grown to nearly 5000 tables and keeps growing. Many of our clients never end up using our services, so their tables are all empty but still in the DB.
The original DB designer says it was created this way so if a table was ever compromised it would not reveal information on any other client.
As I'm redesigning the project in PHP, I'm thinking of redesigning the DB to have overall user, transaction history, log, etc. tables, using each client's unique id to reference their rows.
Would this approach be correct or should the DB stay as is?
Could you see any possible security/performance concerns?
Thanks for all your help
You should redesign the system to have just five tables, with a separate column identifying which client each row pertains to. SQL handles large tables well, so you shouldn't have to worry about performance. In fact, having many, many tables can be a hindrance to performance in many cases.
This has many advantages. You will be able to optimize the table structures for all clients at once. No more trying to add an index to 300 tables to meet some performance objective. Managing the database, managing the tables, backing things up -- all of these should be easier with a single set of tables.
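For illustration, one of the five consolidated tables might look like this (a sketch; the table and column names are assumptions):
$pdo->exec('
    CREATE TABLE transactions (
        id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        client_id INT UNSIGNED NOT NULL,  -- which client the row pertains to
        amount    DECIMAL(10,2) NOT NULL,
        created   DATETIME NOT NULL,
        INDEX idx_client (client_id)      -- one index, tuned once, for every client
    ) ENGINE=InnoDB
');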
You may find that the database even gets smaller in size. This is because, on average, each of those thousands of tables has a half-filled page at the end. Those thousands of half-pages will shrink to just a handful.
The one downside is security. It is easier to put security on tables than on rows within tables. If this is a concern, you may need to think through these requirements.
This may just be a matter of taste, but I would find it far more natural - and thus maintainable - to store this information in as few tables as possible. Also most if not all database ORMs will be expecting a structure like this, and there is no reason to reinvent that wheel.
From the perspective of security, it sounds like this project could be described as a web app. Obviously I don't know the realities of the business logic you're dealing with, but it seems like regardless of the table permissions all access to the database would be via the code base, in which case the app itself needs full permissions for all tables - nullifying any advantage of keeping the tables separated.
If there is a compelling reason for the security measures - say, different services that feed data into the DB independently of the web app - I would still explore ways to handle that authentication at the application layer instead of at the database layer. It will be much easier to handle your security rules that way. Instead of having rules set in 5000+ different places, a single security rule of 'only let a user view a row of data if their user id equals the user_id column' is far simpler, easier to understand, and therefore far more maintainable (and possibly more secure).
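In code, that one rule is just a single parameterized query instead of 5000+ table-level grants (a sketch; table and variable names are assumptions):
$currentClientId = 42; // id of the authenticated client, however your app derives it
$stmt = $pdo->prepare('SELECT * FROM transactions WHERE client_id = ?');
$stmt->execute(array($currentClientId)); // a client can only ever see its own rows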
Different people approach databases in different ways. I am a web developer, so I view databases as the place to store my data and nothing more, as it's always a dedicated and generally single-purpose DB installation, and I handle all other logic at the application level. There are people who view databases as the application itself, who make far more extensive use of built-in security features for their massive, distributed, multi-user systems - but I honestly don't know enough about those scenarios to comment on exactly where that line should be drawn.
From looking at the way some forum software stores data in a database (e.g. phpBB uses MySQL for storing just about everything) I started to wonder why they do it that way. Couldn't it be just as fast and efficient to use, say, XML with XSLT to store forum topics and posts? Or at least to store the posts within a topic?
There are loads of reasons why they use databases and not flat files. Here are a few off the top of my head.
Referential integrity
Indexes and efficient searching
SQL Joins
Here are a few more posts you can look at for more information:
If i can store my data in text files and easily can deal with these files, why should i use database like Mysql, oracle etc
Why use MySQL over flatfiles?
Why use SQL database?
But this is exactly what databases have been designed and optimized for: storage and retrieval of data. Using a database lets the forum designer focus on their own problem instead of implementing storage as well. It wouldn't make sense to ignore all the work that has been done in the database world and implement your own solution instead; it would take more time, be more buggy, and not run as quickly.
Database engines handle all the problems of concurrency. Imagine that two users try to post to your forum at the same time. If you store posts in files, the first write will lock the file, so the second has to wait for the first to finish.
And if you want to search, it's much faster to do it in a database than by scanning all the files.
So basically, flat files are a poor place for data that can be modified by users simultaneously, and searching is much more efficient in a database.
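To see why, here is roughly what the flat-file version has to do by hand just to stay safe (a sketch; posts.txt is a hypothetical store):
$newPost = 'Hello world'; // the post body being appended
$fp = fopen('posts.txt', 'a');
if (flock($fp, LOCK_EX)) {        // a second concurrent writer blocks here
    fwrite($fp, $newPost . PHP_EOL);
    fflush($fp);
    flock($fp, LOCK_UN);
}
fclose($fp);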
Simply, easy access to data. It's a lot easier to find posts between two dates, created by a given user, or containing certain keywords. You could do all of the above with flat-file storage, but it would be IO-intensive and slow. And if you stored each post in its own file, you'd eventually have the problem of running out of disk space, not for lack of capacity, but because you'd have consumed all the available inodes.
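For example, all three of those lookups collapse into one parameterized query against an indexed table (a sketch; table and column names are assumptions):
$userId = 42; // hypothetical author id
$stmt = $pdo->prepare(
    'SELECT * FROM posts
     WHERE author_id = ?
       AND created BETWEEN ? AND ?
       AND body LIKE ?'
);
$stmt->execute(array($userId, '2013-01-01', '2013-12-31', '%keyword%'));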
Software such as this usually has a static caching feature - pages that don't change are written out to static HTML files, and those are served instead of hitting the database.
Mixing static caching with relational DB storage provides the best of both worlds.
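A minimal sketch of that idea, assuming a hypothetical render_topic() helper that does the actual database work (the path and the 5-minute TTL are made up):
$topicId = isset($_GET['topic']) ? (int) $_GET['topic'] : 0;
$cacheFile = 'cache/topic_' . $topicId . '.html';
if (is_file($cacheFile) && filemtime($cacheFile) > time() - 300) {
    readfile($cacheFile);            // cache hit: no database work at all
} else {
    $html = render_topic($topicId);  // hypothetical function that queries the DB
    file_put_contents($cacheFile, $html, LOCK_EX);
    echo $html;
}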
I'm looking for some advice on whether or not I should use a separate database to handle my sessions.
We are writing a web app for multiple users to login and check/update their account specific information. We didn't want to use the file storage method on the webserver for storing session information, so we decided to use a database (MySQL). It's working fine, but I'm wondering about performance when this gets into production.
Currently, we have two databases (rst_sessions, and rst). The "RST" database is where all the tables are stored for the webapp...they are all MYSQL InnoDB using Referential Integrity/foreign keys to link the tables. The "RST_SESSIONS" database simply has one table and all the session information gets stored there.
Here's one of my concerns. In the PHP code, if I want to run a query against "RST" then I have to select that database inside PHP ( $db->select("RST") ); when I'm done with the query I have to re-select "RST_SESSIONS" ( $db->select("RST_SESSIONS") ) or else the session-specific information doesn't get set. So, throughout the webapp, the code does a lot of selecting and re-selecting of the two databases. Is this likely to cause performance issues with a user base of, say, 10,000-15,000? Would we be better off moving the RST_SESSIONS table into the RST database to avoid all the selecting?
One reason we initially set things up this way was to be able to store the sessions information on a separate database server so it didn't interfere with the operations of the webapp database.
What are some of the pros and cons of both methods, and what would you suggest we do for performance? Thanks in advance.
If you're worried about performance, an alternative solution would be to not store your sessions in the database, but to use something like memcached; the PHP extension for talking to memcached already provides a handler for sessions.
Some advantages of using memcached:
No hit to the disk: everything is in RAM.
Of course, this means sessions will be lost if your server crashes; but if a crash happens, you'll probably have bigger troubles than just losing sessions, and it's not likely to happen often.
Used in production by many websites, and works well (I'm using it for a couple of websites)
Better scalability: if you need more RAM or CPU power for your memcached cluster, just add a couple of servers.
And I would add: once you've started using memcached, you can also use it as a caching mechanism ;-)
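For reference, pointing PHP's native sessions at memcached is just configuration once the pecl/memcached extension is installed (the host and port here are assumptions):
ini_set('session.save_handler', 'memcached');
ini_set('session.save_path', '127.0.0.1:11211');
session_start();
$_SESSION['user_id'] = 42; // now stored in RAM on the memcached server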
Now, to answer your specific questions:
Instead of selecting the DB, I would use two distinct connections:
One for the DB that's used by the application,
And another for the DB that's used for the sessions.
Of course, this means a bit more load on the server (it doubles the number of open connections), but it makes sure that, the day it becomes necessary, you'll be able to move the "session" database to another server: you'll just have to reconfigure a connection string, and as the application already uses two separate connections, it'll keep working fine.
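A sketch of that two-connection setup with plain PDO (your own wrapper's API will differ; DSNs and credentials are placeholders):
$appDb  = new PDO('mysql:host=db1;dbname=RST', 'rst_user', 'secret');
$sessDb = new PDO('mysql:host=db1;dbname=RST_SESSIONS', 'rst_user', 'secret');
// the day sessions move to their own server, only the DSN changes:
// $sessDb = new PDO('mysql:host=db2;dbname=RST_SESSIONS', 'rst_user', 'secret');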
If you can live with it, just open a second connection to the database. That way you won't have to switch between databases at all. Of course, now you consume twice as many connections, and may need to bump the limit.
Unless there's some overriding reason to put your auth information in a separate database, why not put it with the rest of your data? You may find it convenient to have everything in one place.
Notice also that you can qualify your table names in your SQL queries with a schema (database) name, e.g.
SELECT ACTIVE
FROM RST_SESSIONS.SESSION
WHERE SID = 'whatever'
This may get you out of the need to switch dbs explicitly, if they're both on the same server.
I am wondering if it is viable to store cached items in session variables rather than building a file-based caching solution? Because the cache would be per user, it could save some extra calls to the database when a user visits more than one page. But is it worth the effort?
If the data you are caching (or are willing to cache) does not depend on the user, why would you store it in the session, which is attached to a single user?
Considering sessions are generally stored in files anyway, it will not optimize anything compared to using files yourself.
And if you have 10 users on the site, you will have the same data cached 10 times. I do not think this is the best way to cache things ;-)
For data that is the same for all users, I would really go with another solution, be it file-based or not (even for data specific to one user, or a group of users, I would probably not store it in the session -- except perhaps if it's very small).
Some things you can look at:
Almost every framework provides some kind of caching mechanism. For instance:
PEAR::Cache_Lite
Zend_Cache
You can store cached data using lots of backends; for instance:
files
shared memory (using something like APC, for example; see the sketch after this list)
If you have several servers and loads of data, memcached
(some frameworks provide classes to work with those; switching from one to the other can even be as simple as changing a couple of lines in a config file ^^ )
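For instance, with APC the cached data lives in shared memory and is shared by every request and every user; a minimal sketch (the key, TTL, and loader function are made up):
$settings = apc_fetch('site_settings', $hit);
if (!$hit) {
    $settings = load_settings_from_db(); // hypothetical expensive query
    apc_store('site_settings', $settings, 600); // shared by all users for 10 minutes
}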
The next question is: what do you need to cache, and for how long? But that's another problem, and only you can answer that ;-)
It can be, but it depends largely on what you're trying to cache, as well as some other circumstances.
Is the information likely to change?
Is it a problem if slightly outdated information is shown?
How heavy is the load the query imposes on the database?
What is the latency to the database server? (shouldn't be an issue on a local network)
Should the information be cached on a per user basis, or globally for the entire application?
Amount of data involved
etc.
Performance gain can be significant in some cases. On a particular ASP.NET / SQL Server site I've worked on, adding a simple caching mechanism (at application level) reduced the CPU load on the web server by a factor of 3 (!) and at the same time prevented a whole bunch of database timeout issues when accessing a certain table.
It's been a while since I've done anything serious in PHP, but I think your only option there is to do this at the session level. Most of my considerations above are still valid however. As for effort; it should take very little effort to implement, assuming your code is sufficiently structured.
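If you do go the session route, a tiny per-user cache with its own expiry could look like this (a sketch; the function and key names are made up):
function session_cache_get($key) {
    if (isset($_SESSION['cache'][$key]) && $_SESSION['cache'][$key]['expires'] > time()) {
        return $_SESSION['cache'][$key]['value'];
    }
    return null; // missing or expired
}

function session_cache_set($key, $value, $ttl) {
    $_SESSION['cache'][$key] = array('value' => $value, 'expires' => time() + $ttl);
}

// usage: cache this user's account row for 60 seconds
$account = session_cache_get('account');
if ($account === null) {
    $account = load_account($userId); // hypothetical DB call
    session_cache_set('account', $account, 60);
}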
Sessions should really only be used strictly for user-specific data. If you're using them to cache things that should be common across multiple sessions, you're duplicating a lot of data needlessly. Why not just use the Cache that comes with ASP.NET? (You can use in-process caching rather than SQL if your concern is DB round-trips, since you'll be storing cached data in memory.)