I have a web application where companies can register and use a set of features. Let's say company 1 and company 2 have registered. They are still accessing the same website, but each of these companies is 100% independent of the others when it comes to sharing information. The only thing they might share is the users/employees.
Now my question really is: what is the best practice if each of these companies will insert, select, update and delete about 10K rows a day, each?
It can be everything from project handling to hour lists, all of which is split across different tables.
Would it be best practice to have independent databases, or use the same database for all the companies, and identify them by company_id?
Also keep in mind that the web application has to adapt easily to 10+ companies.
You could go one of two ways:
Add a companyId column to your tables,
Create a separate database for each company.
Option 1:
This option is the most dynamic one. You keep the data separated by adding the correct companyId to the WHERE clause of every query (a sketch follows the list below).
This method is good when:
You expect a large number of customers,
You expect your number of customers to increase and decrease on a regular basis,
You do not need to share your database access with your customers (they only access it through your API/GUI).
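As a minimal sketch of option 1 (all table and column names here are assumptions, not taken from the question), every tenant-owned table carries the company id and every query filters on it:

    -- Option 1 sketch: shared tables, one companyId column per tenant-owned table.
    CREATE TABLE projects (
        id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
        companyId INT UNSIGNED NOT NULL,
        name      VARCHAR(255) NOT NULL,
        PRIMARY KEY (id),
        KEY idx_company (companyId)   -- keeps per-company lookups fast
    );

    -- Every query is scoped to the tenant; the application binds the id:
    SELECT id, name FROM projects WHERE companyId = ?;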
Option 2:
This option gives better separation of data. You keep each customer's data in its own dedicated instance of the database schema. It also allows you to offload the access-control burden to the database server, instead of having to enforce it in your application logic (which is more error-prone).
However, there are some downsides: whenever a new customer shows up, you need to create a new database instance for them (see the sketch at the end of this option), which implies having a user with CREATE DATABASE and GRANT privileges - something not every system administrator will be overly happy about.
The other issue is that whenever something changes in the database structure, you need to apply the change to each instance of the database.
The good thing about this option is that you can give backup copies of your database to your customers, give them direct access to the database server if need be, or, in a more limited form, give them a copy of the database structure without having to filter out the customerId columns (as would be the case with option 1 above).
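To illustrate the provisioning step mentioned above, here is a rough sketch of what onboarding a new customer could look like in MySQL (all names and the host are placeholders):

    -- Option 2 sketch: one database per customer.
    CREATE DATABASE customer_acme;
    CREATE USER 'acme_app'@'localhost' IDENTIFIED BY 'change-me';
    GRANT SELECT, INSERT, UPDATE, DELETE ON customer_acme.* TO 'acme_app'@'localhost';
    -- ...then run your schema scripts against customer_acme.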
In summary:
There is no silver bullet; it all depends on your use case. Option 1 is more flexible; option 2 offers better separation of data and easier access management.
1. Keep separate databases, since you will have a high volume of DML operations against your database.
2. Keep a very good database maintenance plan covering statistics management, index maintenance and backup/recovery; otherwise you will have performance issues, or more downtime in case of a database crash.
Related
I have a question regarding databases and performances, so let me explain the situation.
The application - to be built - has the following set-up:
A group, with under that group, users.
Data/file locations (which are used to search through); one group can easily reach an estimated one million "search" terms.
Now, groups can never look at each other's data, and users can only look at the data which belongs to their group.
The only thing they should have in common is, some place to send error logs to (maybe, not even necessary).
Now, in this situation, would you create a new database per group, or always limit your search results with a query that takes someone's user-group id into account?
Now my idea was to just create a new database per group, because then you do not need to limit every single query, and the result set to search through stays smaller. But is that really necessary? Even on over a million records, is a "where groupid = 1" fast enough that you would not notice a decrease in performance?
This is the regular multi-tenant SaaS architecture problem, which has been discussed at length, and the solution always varies according to your own situation. Here is one example of this discussion that I will just link to instead of copy-pasting, since all of it is worth a read: Multi-tenant PHP SaaS - Separate DB's for each client, or group them?
In addition to that I would like to add some more high level considerations:
Are there any legal requirements regarding the storage of your user's data? Some businesses operate in a regulatory environment where they are not allowed to store their data in a shared environment, quite common in the financial and medical industries.
Will you offer the same security (login method, data storage encryption), backup/restore service, geolocation redundancy and up-time guarantee to all users?
Are there any users who are willing to pay extra to have their data stored in a separate environment?
Are there any users who will potentially have requirements that are not compatible with the standard product you will be offering? If so, will you try to accommodate them? Note that occasionally some big customer comes along and offers a lot of cash for special treatment.
What is a separate environment? Is it a separate database, a separate virtual machine, a separate physical machine, a machine managed by the customer?
What parts of your application are part of each environment (hardware configuration, network config, database, source code, binaries, encryption certificates, etc.)?
Will there be some heavy users that may produce loads on your application that will negatively impact the performance for the smaller users?
If you go for all users in one environment, is there a possibility that you will create a separate environment for some customer in the future? If so, this will impact where you put shared data, e.g. configuration data like tax rates, exchange rate data, etc.
I hope this helps.
Performance isn't really your problem; maintenance and data security are. If you have a lot of databases, you will have more to maintain: not only backups but connection strings, patches, schema updates on release and so on. Multiple databases also suggest that you will have multiple PHP sites too. That will gradually get more expensive as the number of groups grows.
If you have one database then you need to ensure that every query contains the group id before it can run.
Database tables can be very, very large if you choose your indexes and constraints carefully. Joins against very large tables will be slow, but a simple lookup where you have an index on the group column should be fast enough.
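For example (the table and column names are assumptions for illustration), an index that leads with the group column lets MySQL seek straight to one group's rows, so the lookup stays fast even at millions of records:

    -- Hypothetical index: lead with the group column so tenant-scoped
    -- lookups become index seeks rather than full scans.
    CREATE INDEX idx_group_term ON search_terms (group_id, term);

    SELECT * FROM search_terms WHERE group_id = ? AND term = ?;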
If you were to share a single database, would you ever move a group out of it? If that's a possibility then split the databases now. If you are going to have one PHP site then I would recommend a single database with a group column.
I've recently taken over a project linking to a large MySQL DB that was originally designed many years ago and need some help.
Currently the DB has 5 tables per client that store their users' information, transaction history, logs, etc. However, we currently have ~900 clients that have applied to use our services, with an average of 5 new clients applying weekly. So the DB has grown to nearly 5,000 tables and is ever increasing. Many of our clients never end up using our services, so their tables are all empty but still in the DB.
The original DB designer says it was created this way so if a table was ever compromised it would not reveal information on any other client.
As I'm redesigning the project in PHP, I'm thinking of redesigning the DB to have overall user, transaction history, log, etc. tables, using each client's unique id to reference them.
Would this approach be correct or should the DB stay as is?
Could you see any possible security/performance concerns?
Thanks for all your help
You should redesign the system to have just five tables, with a separate column identifying which client each row pertains to. SQL handles large tables well, so you shouldn't have to worry about performance. In fact, having many, many tables can be a hindrance to performance in many cases.
This has many advantages. You will be able to optimize the table structures for all clients at once: no more trying to add an index to 300 tables to meet some performance objective. Managing the database, managing the tables, backing things up - all of these should be easier with a single table per entity.
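A sketch of what one of the five consolidated tables might look like (the column names are assumptions); the client column plus a composite index replaces the ~900 per-client copies of each table:

    -- Consolidated design sketch: one table for all clients.
    CREATE TABLE transaction_history (
        id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        client_id INT UNSIGNED    NOT NULL,
        amount    DECIMAL(10,2)   NOT NULL,
        created   DATETIME        NOT NULL,
        PRIMARY KEY (id),
        KEY idx_client_created (client_id, created)  -- one index serves every client
    );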
You may find that the database even gets smaller in size. This is because, on average, each of those thousands of tables has a half-filled page at the end. You will go from thousands of half-pages to just one per table.
The one downside is security. It is easier to put security on tables than on rows within tables. If this is a concern, you may need to think through those requirements.
This may just be a matter of taste, but I would find it far more natural - and thus more maintainable - to store this information in as few tables as possible. Also, most if not all database ORMs expect a structure like this, and there is no reason to reinvent that wheel.
From the perspective of security, it sounds like this project could be described as a web app. Obviously I don't know the realities of the business logic you're dealing with, but it seems like regardless of the table permissions all access to the database would be via the code base, in which case the app itself needs full permissions for all tables - nullifying any advantage of keeping the tables separated.
If there is a compelling reason for the security measures - say, different services that feed data into the DB independently of the web app - I would still explore ways to handle that authentication at the application layer instead of at the database layer. It will be much easier to handle your security rules that way. Instead of having rules set in 5000+ different places, a single security rule of "only let a user view a row of data if their user id equals the user_id column" is far simpler, easier to understand, and therefore far more maintainable (and possibly more secure).
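In query terms, that one rule could be as simple as the following (the table name is hypothetical):

    -- The single, centrally enforced ownership rule:
    SELECT * FROM transactions
    WHERE id = ?          -- the requested row
      AND user_id = ?;    -- always bound to the logged-in user's id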
Different people approach databases in different ways. I am a web developer, so I view databases as the place to store my data and nothing more, as it's always a dedicated and generally single-purpose DB installation, and I handle all other logic at the application level. There are people who view databases as the application itself, who make far more extensive use of built-in security features for their massive, distributed, multi-user systems - but I honestly don't know enough about those scenarios to comment on exactly where that line should be drawn.
You'll have to bear with me here, as I may be getting some of the terminology slightly wrong; I wasn't even aware that this fell into the whole 'multi-tenant' 'software as a service' category, but it does.
I've developed a membership system (in PHP) for a client. We're now looking at offering it as a completely hosted solution for our other clients, providing a subdomain (or even their own domain).
The options I seem to have on the table, as far as data storage goes are:
Option 1 - Store everything in 1 big database, and have a 'client_id' field on the tables that need it (there would be around 30 tables that it would apply to), and have a 'clients' table storing their main settings, details, etc and the domain to map to them. This then just sets a globally accessible variable containing their individual client id - I'd obviously have to modify every single query to check for the client_id column.
Option 2 - Have a master database with the 'shared reference' tables and the 'clients' table. Then have 'blocks' of other databases, each containing, say, 10 clients. Each client would get their own database tables, prefixed with their client ID. This adds a little bit of security against seeing other clients' data if something went really wrong.
Option 3 - Exactly the same as option 2, except you have 1 database for each and every client, completely isolating them from other clients and theoretically providing a bit more protection: if 1 client's tables were hacked or otherwise damaged, it wouldn't affect anyone else. The biggest downside is that when deploying a new client, an entire database, user and password need setting up, etc. Could this also cause a fair amount of overhead, or would it be pretty much the same as having everyone in one database?
A few points as well - some of these clients will have 5000+ 'customers', along with all the details for those customers. This is why option 1 may be a bit of an issue: if I've got 100 clients, that could mean over half a million rows in 1 table.
Am I correct in thinking option 3 would be the best way to go in a situation where security of customer data (and payment information) is key? From the recommendations I've had, a few people have said to go with option 1 because 'it's easier', however I really don't see it that way. I see it as a potential bottleneck down the line, as surely I can move clients around much more easily if they each have their own database.
(FYI The system is PHP based with MySQL)
Option 3 is the most scalable. While at first it may seem more complicated, it can be completely automated and will save you headaches in the future. You can also scale more efficiently by putting client databases on multiple servers for increased performance.
I agree with Ozzy - I did this for an online database product. We had one master database that basically had a glorified user table. Each customer had their own database. What was great about this is that I could easily move one customer's database from server A to server B [MySQL], and could do so with command-line tools in a pinch. Also, doing maintenance on large tables and dropping/adding indexes can really screw up your application, especially if, say, adding an index locks the table [MySQL]. It affects everyone. With, presumably, smaller databases you are more immune to this and have more options when you need to roll out schema-level changes. I just like the flexibility.
When, many years ago, I designed a platform for building SaaS applications in PHP, I opted for the third option: multi-tenant code and single-tenant databases.
In my experience, that is the most scalable option, but it also needs a set of scripts to propagate changes when updating code or DB schemas, enabling applications for a tenant, etc.
So a lot of my effort went into building a component-based, extensible engine to fully automate all those tasks and minimize system administration work. I strongly advise building such an architecture if you want to adopt the third option.
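As an illustration of what one such propagation script might look like in MySQL (the tenant_ naming convention and the example ALTER are assumptions, not part of the platform described above):

    -- Roll one schema change out to every tenant database named tenant_*.
    DELIMITER //
    CREATE PROCEDURE apply_to_all_tenants()
    BEGIN
      DECLARE done INT DEFAULT 0;
      DECLARE db_name VARCHAR(64);
      DECLARE cur CURSOR FOR
        SELECT schema_name FROM information_schema.schemata
        WHERE schema_name LIKE 'tenant\_%';
      DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

      OPEN cur;
      tenant_loop: LOOP
        FETCH cur INTO db_name;
        IF done THEN LEAVE tenant_loop; END IF;
        -- the change being rolled out; an example ALTER
        SET @ddl = CONCAT('ALTER TABLE `', db_name,
                          '`.projects ADD COLUMN archived TINYINT(1) NOT NULL DEFAULT 0');
        PREPARE stmt FROM @ddl;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
      END LOOP;
      CLOSE cur;
    END //
    DELIMITER ;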
I have built a web application for one user, but now I would like to offer it to many users (it's an application for photographer(s)).
Multiple databases problems
I first did this by creating an application for each user, but this has many problems, like:
Giving access to a new user can't be automated (or is very difficult) since I have to create a subdomain, a database, initial tables, copy code to a new location, etc. This is tedious to do by hand!
I can't as easily create reports and statistics of usage, like how many projects do my users have, how many photos, etc.
Single database problems
But having just one database for all users creates its own problems in code:
Now I have to change the DB schema to accommodate extra users, like the projects table having a user_id column (the same goes for some other tables like settings, etc.).
I have to look at almost every line of code that accesses the database and edit the SQL for selecting and inserting, so that I save data for that specific user, at the same time doing joins to check permissions (select ... from projects inner join project_users ... where user_id = ?).
If I forget to do that at one spot in the code, it means a security breach or another unpleasant thing (consider showing a user's projects by just doing select * from projects like I used to - it would show all users' projects).
Backup: backup is harder because there's more data in the whole database, and if a user says "hey, I made a mistake today, can you revert the DB to yesterday?", I can't as easily do that.
A solution?
I have read multiple questions on stackoverflow and have decided that I should go the "single database" route. But I'd like to get rid of the problems, if it's possible.
So I was thinking if there was a way to segment my database somehow so that I don't get these nasty (sometimes invisible) bugs?
I can reprogram the DB access layer if needed, but I'm using plain SQL rather than OO getter and setter methods.
Any help would be greatly appreciated.
I don't think there's a silver bullet on this one - though there are some things you can do.
Firstly, you could have your new design use a different MySQL user, and deny that user "select" rights on tables that should only be accessed through joins with the "users" table. You then create a view which joins the two tables together, and use that whenever you run "select" queries. This way, if you forget to update a query, it will fail spectacularly instead of silently. You can of course also limit insert, update and delete in this way - though that's a lot harder with a view.
Edit
So, if your application currently connects as "web_user", you could revoke select access on the projects table from that user. Instead, you'd create a view "projects_for_users", and grant "select" permissions on that view to a new user - "photographer", perhaps. The new user should also not have select access to "projects".
You could then re-write the application's data access step by step, and you'd be sure that you'd caught every instance where your app selects projects, because it would explode when trying to retrieve data - neither of your users would have "select" permissions on the projects table.
As a little side bonus - the select permission is also required for updates with a where clause, so you'd also be able to find instances where the application updates the project table without having been rewritten.
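A minimal sketch of that setup, reusing the example names from above ("projects", "project_users", "photographer"); the column list is illustrative:

    -- The view joins projects to their users; MySQL views run with the
    -- definer's rights by default, so 'photographer' needs no grant on
    -- the base tables.
    CREATE VIEW projects_for_users AS
    SELECT p.id, p.name, pu.user_id
    FROM projects p
    INNER JOIN project_users pu ON pu.project_id = p.id;

    -- Grant select on the view only; any leftover query that still hits
    -- the projects table directly now fails loudly.
    GRANT SELECT ON projects_for_users TO 'photographer'@'localhost';

    -- The application then always binds the logged-in user:
    -- SELECT ... FROM projects_for_users WHERE user_id = ?;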
Secondly, you want to think about the provisioning process - how will you grant access to the system to new users? Who does this? Again, by separating out the database user who can insert records into "users", you can avoid stupid bugs where a page in your system does more than you think it does. With this kind of system, there are usually several steps that make up the provisioning process. Make sure you separate out the privileges for those tasks from the regular user privileges.
Edit
Provisioning is the word for setting up a service for a new user (I think it comes from the telephony world, where phone companies will talk about provisioning a new service on an existing phone line). It usually includes a whole bunch of business processes - and each step in the process must succeed for the next one to start. So, in your app, you may need to set up a new user account, validate their email address, set up storage space etc. Each of those steps needs to be considered as a step in the process, not just a single task.
Finally, while you're doing this, you may as well think about different levels of privilege. Will your system merit different types of user? Photographers, who can upload work, reviewers who can't? If that's a possible feature extension, you may want to build support for that now, even if the only type of user you support on go-live is photographer.
Well, time to face some hard facts, I think. The "single database problem" that you describe is not a problem but a normal (usual) design. Quite often, one is simply a special case of many.
For some reason you have designed a web-app for one user -- not many of those around.
So, time to re-design.
So, first things first: I'm a student. I'm developing an application where other students can have access to a MySQL database. Basically, I wanted to spare the students the need to search for hosting or even install MySQL on their computers. Another plus is that they can present their work to the class just by browsing a website. So, my idea was to use the same database for everyone and add a login system for the students. This way, I can associate a prefix with every student, and they can execute any type of query without worrying whether it will clash with someone else's table, because the system would prefix the tables in their queries automatically.

My idea was also to limit how many tables and rows each user can have, which shouldn't be hard with a parser. It doesn't necessarily need to be a parser in PHP; it could be in Perl or Python. PHP is just more convenient. .NET would be more troublesome because of Windows.
By the way, each class of "introduction to database systems" has around 50 students and there are 3 classes, so it could reach about 150 students...
For example, SELECT * FROM employees
has to become
SELECT * FROM prefix_employees
I do not know what the queries will look like; they could get fairly complex, so I'd probably need a well-written parser, which I haven't found yet for PHP.
Thanks guys, I hope I have made myself clear
Unfortunately, MySQL does not (AFAIK) have schemas the way some other databases (e.g. PostgreSQL) have them (for separating content - tables, etc. - logically within one database).
But I would definitely go for the separate-databases scenario.
Your parser (with the 'prefixing scheme') will be broken (unwillingly, and also possibly willingly) unless you are willing to put an extreme amount of time into making this work.
I'd rather go with the "one database per user" approach. This solution requires some administration (you can either create the users/databases manually using a tool like phpMyAdmin, or simply create your own little administration panel through which you allow the students to register), but will require far less work from you than filtering all requests.
This way, each student has their own login/password, preferably with a database of the same name on which they have all rights (this can be done automatically with phpMyAdmin), and they are able to work without interfering with other students. You can be sure that some will try to break your security, no matter how hard you try and how well-intentioned you are. Separating them into different databases leaves them no choice but to try to gain admin access to your DB, which will be pretty hard if you maintain an up-to-date server and complex enough passwords (and you don't store them in the clear in a "readable by all" .txt file on your university server).
Plus, you will be able to monitor the disk space, usage, etc... of each database individually, which is easier than having to look at tables separately.
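A sketch of that per-student setup (the names and password are placeholders): each student gets a database matching their login, with all rights on that database only:

    CREATE DATABASE alice;
    CREATE USER 'alice'@'%' IDENTIFIED BY 'a-generated-password';
    GRANT ALL PRIVILEGES ON alice.* TO 'alice'@'%';  -- full rights, but only on her own database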
Depending on your exact requirements, you may be able to use table permissions to prevent one student from modifying (or viewing) data from another student. You would still need a process to allow students to create a new table with their assigned prefix (and create an appropriate permissions entry), but once created, the DB would control access through all queries so you would not have to (just don't allow student accounts to directly create/alter tables).
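For example (the student and table names are hypothetical), a table-level grant created by your provisioning step would let each student touch only their own prefixed tables; with no grant there is no access, so cross-student queries fail at the server:

    -- Issued by the provisioning process, not by the student:
    GRANT SELECT, INSERT, UPDATE, DELETE ON classdb.alice_employees TO 'alice'@'%';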
As for quota: I'm not aware of MySQL directly supporting a quota system, but you could place the files that back each user's tables in a separate directory and use OS-level quota systems to limit disk space usage.