I'm currently developping an application which allows doctors to dinamically generate invoices. The fact is, each doctors requires 6 differents database tables, and there could be like 50 doctors connected at the same time and working with the database (writing and reading) at the same time.
What I wanted to know is if the construction of my application fits. For each doctors, I create a personnal Sqlite3 database (all database are secure) which only him can connect to. I'll have like 200 Sqlite database, but is there any problems ? I thought it could be better than using a big MySQL database for everyone.
Is this solution viable ? Will I have problems to deal with ? I never did such an application with so many users, but I thought it could be the best solution
Firstly, to answer your question: no, you probably will not have any significant problems if a single sqlite database is used only by one person (user) at a time. If you highly value certain edge cases, like the ability to move some users/databases to another server, this might be a very good solution.
But it is not a terribly good design. The usual way is to have all data in the same database, and tables having a field which identifies which rows belong to which users. The application code is responsible for maintaining security (i.e. not to let users see data which doesn't belong to them), and indexes in the database (which you should use in all cases, even in your own design) are responsible for making it fast.
There are a large number of tutorials which could help you to make a better database design; a random google result is http://www.profsr.com/sql/sqless02.htm .
Related
I've recently taken over a project linking to a large MySQL DB that was originally designed many years ago and need some help.
Currently the DB has 5 tables per client that store their users information, transaction history, logs etc. However we currently have ~900 clients that have applied to use our services, with an average of 5 new clients applying weekly. So the DB has grown to nearly 5000 tables and ever increasing. Many of our clients do not end up using our services so their tables are all empty but still in the DB.
The original DB designer says it was created this way so if a table was ever compromised it would not reveal information on any other client.
As I'm redesigning the project in PHP I'm thinking of redesigning the DB to have an overall user, transaction history, log etc tables using the clients unique id to reference them.
Would this approach be correct or should the DB stay as is?
Could you see any possible security / performance concerns
Thanks for all your help
You should redesign the system to have just five tables, with a separate column identifying which client the row pertains to. SQL handles large tables well, so you shouldn't have to worry about performance. In fact, having many, many tables can be a hinderance to performance in many cases.
This has many advantages. You will be able to optimize the table structures for all clients at once. No more trying to add an index to 300 tables to meet some performance objective. Managing the database, managing the tables, backing things up -- all of these should be easier with a single table.
You may find that the database even gets smaller in size. This is because, on average, each of those thousands of tables has a half-paged filled at the end. This will go from thousands of half-pages to just one.
The one downside is security. It is easier to put security on tables than one rows in tables. If this is a concern, you may need to think about these requirements.
This may just be a matter of taste, but I would find it far more natural - and thus maintainable - to store this information in as few tables as possible. Also most if not all database ORMs will be expecting a structure like this, and there is no reason to reinvent that wheel.
From the perspective of security, it sounds like this project could be described as a web app. Obviously I don't know the realities of the business logic you're dealing with, but it seems like regardless of the table permissions all access to the database would be via the code base, in which case the app itself needs full permissions for all tables - nullifying any advantage of keeping the tables separated.
If there is a compelling reason for the security measures - say, different services that feed data into the DB independently of the web app, I would still explore ways to handle that authentication at the application layer instead of at the database layer. It will be much easier to handle your security rules in that way. Instead of having rules set in 5000+ different places, a single security rule of 'only let a user view a row of data if their user id equals the user_id column" is far simpler, easier to understand, and therefore far more maintainable (and possibly more secure).
Different people approach databases in different ways. I am a web developer, so I view databases as the place to store my data and nothing more, as it's always a dedicated and generally single-purpose DB installation, and I handle all other logic at the application level. There are people who view databases as the application itself, who make far more extensive use of built-in security features for their massive, distributed, multi-user systems - but I honestly don't know enough about those scenarios to comment on exactly where that line should be drawn.
I´m new on php/mysql, and i´m codding a simple CMS. But in this case i will host multiple companies (each company with their multiple users), that pays a fee to use the system.
So... My question is about how to organize the Data Base... Talking about security, management and performance, i just want to know the opinion of ou guys of wich of these cases is the best:
Host all companies on a single DB and they get a company id to match with the users.
Each company have a separated DB that holds the users in there (and dont need the companies id anymore).
I would start the development following the first situation... But than i thought if i have some hacker attack / sql injection, every client would be harmed. Having separated DBs, the damage will get only one client. So maybe the 2nd situation could be better in terms of security. But could not say the same about management and performance.
So, based on your experience, any help or tip would be great!
Thanks in advance, and sorry about my poor english.
I would go for seperate DBs. But not only for hacking.
Scalability:
Lets say you have a server that handles 10 websites, but 1 of those websites in growing fast in requests, content, etc. Your server is having a hard time to host all of them.
With seperate DB's it is a piece of cake to spread over multiple servers. With a single one you would have to upgrade you current DB or cluster it, but that is sometimes not possible with the hosting company or very expensive.
Performance:
You they are all on 1 DB and data of multiple users is in 1 table, locks might slow down other users.
Large tables, mean large indices, large lookups, etc. So splitting to diffrent DB's would actualy speed that up.
You would have to deal with extra memory and CPU overhead per DB but they normaly do not have an amazingly large impact.
And yes, management for multiple DBs is more work, but having proper update scripts and keeping a good eye on the versions of the DB schema will reduce your management concerns a lot.
Update: also see this article.
http://msdn.microsoft.com/en-us/library/aa479086.aspx
Separate DBs has many advantages including performance, security, scalability, mobility, etc. There is more risk less reward trying to pack everything into 1 database especially when you are talking about separate companies data.
You haven't provided any details, but generally speaking, I would opt for separate databases.
Using an autonomous database for every client allows a finer degree of control, as it would be possible to manage/backup/trash/etc. them individually, without affecting the others. It would also require less grooming, as data is easier to be distinguished, and one database cannot break the others.
Not to mention it would make the development process easier -- note that separate databases mean that you don't have to always verify the "owner" of the rows.
If you plan to have this database hosted in a cloud environment such as Azure databases where resources are (relatively) cheap, clients are running the same code base, the database schema is the same (obviously), and there is the possibility of sharing some data between the companies then a multi-tenant database may be the way to go. For anything else you, you will probably be creating a lot of extra work going with a multi-tenant database.
Keep in mind that if you go the separate databases route, trying to migrate to a multi-tenant cloud solution later on is a HUGE task. I only mention this because all I've been hearing for the past few years around the IT water coolers is "Cloud! Cloud! Cloud!".
You'll have to bear with me here for possibly getting some of the terminology slightly wrong as I wasn't even aware that this fell into the whole 'multi-tenant' 'software as a service' category, but here it does.
I've developed a membership system (in PHP) for a client. We're now looking at offering it as a completely hosted solution for our other clients, providing a subdomain (or even their own domain).
The options I seem to have on the table, as far as data storage goes are:
Option 1 - Store everything in 1 big database, and have a 'client_id' field on the tables that need it (there would be around 30 tables that it would apply to), and have a 'clients' table storing their main settings, details, etc and the domain to map to them. This then just sets a globally accessible variable containing their individual client id - I'd obviously have to modify every single query to check for the client_id column.
Option 2 - Have a master table with the 'shared reference' tables, and the 'clients' table. Then have 'blocks' of other databases, which each contain, say 10 clients. The client would get their own database tables, prefixed with their client ID. This adds a little bit of security to protect against seeing other client data if something went really wrong.
Option 3 - Exactly the same as option 2, except you have 1 database for each and every client, completely isolating them from other clients, and theoretically providing a bit more protection that if 1 client's tables were hacked or otherwise damaged, it wouldn't affect anyone else. The biggest downside is that when deploying a new client, an entire database, user and password need setting up, etc. Could this possibly also cause a fair amount of overhead, or would it be pretty much the same as if you had everyone in one database?
A few points as well - some of these clients will have 5000+ 'customers' along with all the details for those customers - this is why option 1 may be a bit of an issue - if I've got 100 clients, that could equal over half a million rows in 1 table.
Am I correct in thinking Option 3 would be the best way to go in a situation where security of customer data (and payment information) is key. From recommendations I've had, a few people have said to go with option 1 because 'its easier' however I really don't see it that way. I see it as a potential bottleneck down the line, as surely I can move clients around much easier if they have their own database.
(FYI The system is PHP based with MySQL)
Option 3 is the most scalable. While at first it may seem more complicated, it can be completely automated and will save you headaches on the future. You can also scale more efficiently by having client databases on multiple servers for increased performance.
I agree with Ozzy - I did this for an online database product. We had one master database that basically had a glorified user table. Each customer had their own database. What was great about this is that I could move one customers database from server A to server B easily [mysql] and could do so with command line tools in a pinch. Also doing maintenance on large tables, dropping/adding indexes can really screw up your application, especially if say, adding an index locks the table [mysql]. It affects everyone. With, presumably, smaller databases you are more immune to this and have more options when you need to roll out schema level changes. I just like the flexibility.
When many years ago I designed a platform for building SaaS applications in PHP, I opted for the third option: multi tenant code and single tenant databases.
In my experience, that is the most scalable option, but it also needs a set of scripts to propagate changes when updating code, DB schemes, enabling applications to a tenant, etc.
So a lot of my effort went in building a component based, extensible engine to fully automate all those tasks and minimize system administration stuff. I strongly advise to build such an architecture if you want to adopt the third option.
I'm going to try to make this as brief as possible while covering all points - I work as a PHP/MySQL developer currently. I have a mobile app idea with a friend and we're going to start developing it.
I'm not saying it's going to be fantastic, but if it catches on, we're going to have a LOT of data.
For example, we'd have "clients," for lack of a better term, who would have anywhere from 100-250,000 "products" listed. Assuming the best, we could have hundreds of clients.
The client would edit data through a web interface, the mobile interface would just make calls to the web server and return JSON (probably).
I'm a lowly cms-developing kinda guy, so I'm not sure how to handle this. My question is more or less about performance; the most I've ever seen in a MySQL table was 340k, and it was already sort of slow (granted it wasn't the best server either).
I just can't fathom a table with 40 million rows (and potential to continually grow) running well.
My plan was to have a "core" database that held the name of the "real" database, so the user would come in and try to access a client's data, it would go to the core database and figure out which database to get the information from.
I'm not concerned with data separation or data security (it's not private information)
Yes, it's possible and my company does it. I'm certainly not going to say it's smart, though. We have a SAAS marketing automation system. Some client's databases have 1 million+ records. We deal with a second "common" database that has a "fulfillment" table tracking emails, letters, phone calls, etc with over 4 million records, plus numerous other very large shared tables. With proper indexing, optimizing, maintaining a separate DB-only server, and possibly clustering (which we don't yet have to do) you can handle a LOT of data......in many cases, those who think it can only handle a few hundred thousand records work on a competing product for a living. If you still doubt whether it's valid, consider that per MySQL's clustering metrics, an 8 server cluster can handle 2.5million updates PER SECOND. Not too shabby at all.....
The problem with using two databases is juggling multiple connections. Is it tough? No, not really. You create different objects and reference your connection classes based on which database you want. In our case, we hit the main database's company class to deduce the client db name and then build the second connection based on that. But, when you're juggling those connections back and forth you can run into errors that require extra debugging. It's not just "Is my query valid?" but "Am I actually getting the correct database connection?" In our case, a dropped session can cause all sorts of PDO errors to fire because the system no longer can keep track of which client database to access. Plus, from a maintainability standpoint, it's a scary process trying to push table structure updates to 100 different live database. Yes, it can be automated. But one slip up and you've knocked a LOT of people down and made a ton of extra work for yourself. Now, calculate the extra development and testing required to juggle connections and push updates....that will be your measure of whether it's worthwhile.
My recommendation? Find a host that allows you to put two machines on the same local network. We chose Linode, but who you use is irrelevant. Start out with your dedicated database server, plan ahead to do clustering when it's necessary. Keep all your content in one DB, index and optimize religiously. Finally, find a REALLY good DB guy and treat him well. With that much data, a great DBA would be a must.
I am currently in a debate with a coworker about the best practices concerning the database design of a PHP web application we're creating. The application is designed for businesses, and each company that signs up will have multiple users using the application.
My design methodology is to create a new database for every company that signs up. This way everything is sand-boxed, modular, and small. My coworkers philosophy is to put everyone into one database. His argument is that if we have 1000+ companies sign up, we wind up with 1000+ databases to deal with. Not to mention the mess that doing Business Intelligence becomes.
For the sake of example, assume that the application is an order entry system. With separate databases, table size can remain manageable even if each company is doing 100+ orders a day. In a single-bucket application, tables can get very big very quickly.
Is there a best practice for this? I tried hunting around the web, but haven't had much success. Links, whitepapers, and presentations welcome.
Thanks in advance,
The1Rob
I talked to the database architect from wordpress.com, the hosting service for WordPress. He said that they started out with one database, hosting all customers together. The content of a single blog site really isn't that much, after all. It stands to reason that a single database is more manageable.
This did work well for them until they got hundreds and thousands of customers, they realized that they needed to scale out, running multiple physical servers and hosting a subset of their customers on each server. When they add a server, it would be easy to migrate individual customers to the new server, but harder to separate data within a single database that belongs to an individual customer's blog.
As customers come and go, and some customers' blogs have high-volume activity while others go stale, the rebalancing over multiple servers becomes an even more complex maintenance job. Monitoring size and activity per individual database is easier too.
Likewise doing a database backup or restore of a single database containing terrabytes of data, versus individual database backups and restores of a few megabytes each, is an important factor. Consider: a customer calls and says their data got SNAFU'd due to some bad data entry, and could you please restore the data from yesterday's backup? How would you restore one customer's data if all your customers share a single database?
Eventually they decided that splitting into a separate database per customer, though complex to manage, offered them greater flexibility and they re-architected their hosting service to this model.
So, while from a data modeling perspective it seems like the right thing to do to keep everything in a single database, some database administration tasks become easier as you pass a certain breakpoint of data volume.
I would never create a new database for each company. If you want a modular design, you can create this using tables and properly connected primary and secondary keys. This is where i learned about database normalization and I'm sure it will help you out here.
This is the method I would use. SQL Article
I'd have to agree with your co-worker. Relational databases are designed to handle large amounts of data, and the numbers you're talking about (1000+ companies, multiple users per company, 100+ orders/day) are well within the expected bounds. Separate databases means:
multiple database connections in each script (memory and speed penalty)
maintenance is harder (DB systems generally do not provide tools for acting on databases as a group) so schema changes, backups, and similar tasks will be more difficult
harder to run queries on data from multiple companies
If your site becomes huge, you may eventually need to distribute your data across multiple servers. Deal with that when it happens. To start out that way for performance reasons sounds like premature optimization.
I haven't personally dealt with this situation, but I would think that if you want to do business intelligence, you should aggregate the data into an offline database that you can then run any analysis you want on.
Also, keeping them in separate databases makes it easier to partition across servers (which you will likely have to do if you have 1000+ customers) without resorting to messy replication technologies.
I had a similar question a while back and came to the conclusion that a single database is drastically more manageable. Right now, we have multiple databases (around 10) and it is already becoming a pain to manage especially when we upgrade the code. We have to migrate every single database.
The upside is that the data is segregated cleanly. Due to the sensitivity of our data, this is a good thing, but it does make it quite a bit more difficult to keep up with.
The separate database methodology has a very big advance over the other:
+ You could broke it up into smaller groups, this architecture scales much better.
+ You could make stand alone servers in an easy way.
That depends on how likely your schemas are to change. If they ever have to change, will you be able to safely make those changes to 1000 separate databases? If a scalability problem is found with your design, how are you going to fix it for 1000 databases?
We run a SaaS (Software-as-a-Service) business with a large number of customers and have elected to keep all customers in the same database. Managing 1000's of separate databases is an operational nightmare.
You do have to be very diligent creating your data model and the business objects / reporting queries that access them. One approach you may want to consider is to carry the company ID in every table and ensure that every WHERE clause includes the company ID for the currently logged-in user. If you use a data access layer, you can enforce that condition there.
As you grow large, you can still vertically partition by placing groups of companies on each physical server, e.g. the first 100 companies on Server A, the next 100 companies on Server B.