I've recently taken over a project linking to a large MySQL DB that was originally designed many years ago and need some help.
Currently the DB has 5 tables per client that store their users' information, transaction history, logs, etc. However, we currently have ~900 clients that have applied to use our services, with an average of 5 new clients applying weekly, so the DB has grown to nearly 5000 tables and keeps growing. Many of our clients do not end up using our services, so their tables are all empty but still in the DB.
The original DB designer says it was created this way so if a table was ever compromised it would not reveal information on any other client.
As I'm redesigning the project in PHP, I'm thinking of redesigning the DB to have shared user, transaction history, log, etc. tables, using each client's unique id to reference their rows.
Would this approach be correct or should the DB stay as is?
Could you see any possible security / performance concerns?
Thanks for all your help
You should redesign the system to have just five tables, each with a separate column identifying which client the row pertains to. SQL handles large tables well, so you shouldn't have to worry about performance. In fact, having many, many tables can be a hindrance to performance in many cases.
This has many advantages. You will be able to optimize the table structures for all clients at once. No more trying to add an index to 300 tables to meet some performance objective. Managing the database, managing the tables, backing things up -- all of these should be easier with a single table.
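As a rough illustration of the consolidated layout, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for MySQL. The table name `transactions`, its columns, and the helper `transactions_for` are assumptions made up for this example, not the asker's actual schema.

```python
import sqlite3

# Hypothetical schema: one shared table instead of one table per client.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    id        INTEGER PRIMARY KEY,
    client_id INTEGER NOT NULL,   -- identifies which client owns the row
    amount    REAL    NOT NULL,
    created   TEXT    NOT NULL
);
-- A single index serves every client; nothing to repeat per client.
CREATE INDEX idx_transactions_client ON transactions (client_id, created);
""")

conn.executemany(
    "INSERT INTO transactions (client_id, amount, created) VALUES (?, ?, ?)",
    [
        (1, 19.99, "2024-01-05"),
        (1, 5.00, "2024-01-06"),
        (2, 42.50, "2024-01-05"),
    ],
)


def transactions_for(client_id):
    """Return (amount, created) rows for one client, oldest first."""
    cur = conn.execute(
        "SELECT amount, created FROM transactions "
        "WHERE client_id = ? ORDER BY created",
        (client_id,),
    )
    return cur.fetchall()


print(transactions_for(1))
```

A client that applied but never used the service simply has no rows, so it costs nothing in the schema.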
You may find that the database even gets smaller in size. This is because, on average, each of those thousands of tables has a half-filled page at the end. This will go from thousands of half-pages to just a handful.
The one downside is security. It is easier to put security on tables than on rows in tables. If this is a concern, you may need to think about these requirements.
This may just be a matter of taste, but I would find it far more natural - and thus maintainable - to store this information in as few tables as possible. Also most if not all database ORMs will be expecting a structure like this, and there is no reason to reinvent that wheel.
From the perspective of security, it sounds like this project could be described as a web app. Obviously I don't know the realities of the business logic you're dealing with, but it seems like regardless of the table permissions all access to the database would be via the code base, in which case the app itself needs full permissions for all tables - nullifying any advantage of keeping the tables separated.
If there is a compelling reason for the security measures, say, different services that feed data into the DB independently of the web app, I would still explore ways to handle that authentication at the application layer instead of at the database layer. It will be much easier to handle your security rules in that way. Instead of having rules set in 5000+ different places, a single security rule of "only let a user view a row of data if their user id equals the user_id column" is far simpler, easier to understand, and therefore far more maintainable (and possibly more secure).
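The single rule described above could look roughly like this at the application layer. This is only a sketch in Python with `sqlite3` standing in for MySQL; the `logs` table and the `fetch_log` helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " message TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO logs (user_id, message) VALUES (?, ?)",
    [(7, "login ok"), (8, "password changed")],
)


def fetch_log(conn, current_user_id, log_id):
    """Return the message only if the row belongs to the current user."""
    row = conn.execute(
        # The ownership check lives in the query itself, and both values
        # are bound as parameters rather than concatenated into the SQL.
        "SELECT message FROM logs WHERE id = ? AND user_id = ?",
        (log_id, current_user_id),
    ).fetchone()
    return row[0] if row else None


print(fetch_log(conn, 7, 1))  # own row
print(fetch_log(conn, 7, 2))  # someone else's row
```

Because a row owned by another user is indistinguishable from a missing row, the caller cannot even learn that the other row exists.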
Different people approach databases in different ways. I am a web developer, so I view databases as the place to store my data and nothing more, as it's always a dedicated and generally single-purpose DB installation, and I handle all other logic at the application level. There are people who view databases as the application itself, who make far more extensive use of built-in security features for their massive, distributed, multi-user systems - but I honestly don't know enough about those scenarios to comment on exactly where that line should be drawn.
Related
I'm currently developing an application which allows doctors to dynamically generate invoices. Each doctor requires 6 different database tables, and there could be around 50 doctors connected and working with the database (writing and reading) at the same time.
What I wanted to know is whether the design of my application is sound. For each doctor, I create a personal SQLite3 database (all databases are secured) which only he can connect to. I'll have around 200 SQLite databases; are there any problems with that? I thought it could be better than using one big MySQL database for everyone.
Is this solution viable? Will I run into problems? I have never built an application with so many users, but I thought it could be the best solution.
Firstly, to answer your question: no, you probably will not have any significant problems if a single sqlite database is used only by one person (user) at a time. If you highly value certain edge cases, like the ability to move some users/databases to another server, this might be a very good solution.
But it is not a terribly good design. The usual way is to have all data in the same database, and tables having a field which identifies which rows belong to which users. The application code is responsible for maintaining security (i.e. not to let users see data which doesn't belong to them), and indexes in the database (which you should use in all cases, even in your own design) are responsible for making it fast.
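To see the index doing its job in the shared-table design, one option is to ask the database for its query plan. The sketch below uses Python's `sqlite3`; the `invoices` table, the `doctor_id` column and the index name are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (
    id        INTEGER PRIMARY KEY,
    doctor_id INTEGER NOT NULL,   -- which doctor owns the invoice
    total     REAL    NOT NULL
);
CREATE INDEX idx_invoices_doctor ON invoices (doctor_id);
""")

# EXPLAIN QUERY PLAN reports how SQLite intends to run the query;
# the human-readable description is the fourth column of each row.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id, total FROM invoices WHERE doctor_id = ?",
    (1,),
).fetchall()
plan_text = " ".join(row[3] for row in plan)
print(plan_text)
```

If the printed plan mentions `idx_invoices_doctor`, the lookup for one doctor goes through the index instead of scanning every doctor's rows.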
There are a large number of tutorials which could help you to make a better database design; a random google result is http://www.profsr.com/sql/sqless02.htm .
I'm new to PHP/MySQL, and I'm coding a simple CMS. In this case I will host multiple companies (each company with its own multiple users) that pay a fee to use the system.
So my question is about how to organize the database. In terms of security, management and performance, I just want to know your opinion on which of these cases is best:
Host all companies on a single DB and they get a company id to match with the users.
Each company has a separate DB that holds its users (and doesn't need the company id anymore).
I would start the development following the first option. But then I thought that if I suffer a hacker attack / SQL injection, every client would be harmed. With separate DBs, the damage would be limited to one client. So maybe the second option is better in terms of security, but I could not say the same about management and performance.
So, based on your experience, any help or tip would be great!
Thanks in advance, and sorry about my poor English.
I would go for separate DBs, and not only because of hacking.
Scalability:
Let's say you have a server that handles 10 websites, but 1 of those websites is growing fast in requests, content, etc. Your server is having a hard time hosting all of them.
With separate DBs it is a piece of cake to spread them over multiple servers. With a single one you would have to upgrade your current DB server or cluster it, which is sometimes not possible with the hosting company, or very expensive.
Performance:
If they are all on 1 DB and the data of multiple users is in 1 table, locks might slow down other users.
Large tables mean large indices, large lookups, etc., so splitting into different DBs can actually speed that up.
You would have to deal with extra memory and CPU overhead per DB, but that normally does not have a large impact.
And yes, management for multiple DBs is more work, but having proper update scripts and keeping a good eye on the versions of the DB schema will reduce your management concerns a lot.
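One way to read "proper update scripts" plus "a good eye on the versions of the DB schema" is a small migration runner that records a schema version inside each database. The sketch below is an assumption-laden illustration in Python with SQLite files standing in for per-company MySQL databases; the `MIGRATIONS` list and the `migrate` helper are hypothetical.

```python
import os
import sqlite3
import tempfile

# Ordered schema changes; the position in the list is the schema version.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE users ADD COLUMN email TEXT",
]


def migrate(path):
    """Apply any pending migrations to one database; return its version."""
    conn = sqlite3.connect(path)
    try:
        # SQLite keeps a free-form integer per database file for this purpose.
        version = conn.execute("PRAGMA user_version").fetchone()[0]
        for step in MIGRATIONS[version:]:
            conn.execute(step)
            version += 1
            # PRAGMA values cannot be bound as parameters; version is an int.
            conn.execute(f"PRAGMA user_version = {version}")
        conn.commit()
        return version
    finally:
        conn.close()


# Two throwaway per-company databases for the demonstration.
workdir = tempfile.mkdtemp()
paths = [os.path.join(workdir, name) for name in ("company_a.db", "company_b.db")]
versions = [migrate(p) for p in paths]
print(versions)
```

Running the same script again is harmless because each database remembers how far it got, which is what makes looping over hundreds of databases tolerable.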
Update: also see this article.
http://msdn.microsoft.com/en-us/library/aa479086.aspx
Separate DBs have many advantages, including performance, security, scalability, mobility, etc. There is more risk and less reward in trying to pack everything into 1 database, especially when you are talking about separate companies' data.
You haven't provided any details, but generally speaking, I would opt for separate databases.
Using an autonomous database for every client allows a finer degree of control, as it would be possible to manage/backup/trash/etc. them individually, without affecting the others. It would also require less grooming, as data is easier to be distinguished, and one database cannot break the others.
Not to mention it would make the development process easier -- note that separate databases mean that you don't have to always verify the "owner" of the rows.
If you plan to have this database hosted in a cloud environment such as Azure databases where resources are (relatively) cheap, clients are running the same code base, the database schema is the same (obviously), and there is the possibility of sharing some data between the companies, then a multi-tenant database may be the way to go. For anything else, you will probably be creating a lot of extra work going with a multi-tenant database.
Keep in mind that if you go the separate databases route, trying to migrate to a multi-tenant cloud solution later on is a HUGE task. I only mention this because all I've been hearing for the past few years around the IT water coolers is "Cloud! Cloud! Cloud!".
I would like to know what do you think about storing chat messages in a database?
I need to be able to bind other stuff to them (like files, or contacts) and using a database is the best way I see for now.
The same question comes for files, because they can be bound to chat messages, I have to store them in the database too..
With thousands of messages and files I wonder about performance drops and database size.
What do you think considering I'm using PHP with MySQL/Doctrine?
I think that it would be OK to store any textual information in the database (names, message history, etc.) provided that you structure your database properly. I have worked for big websites (thousands of visits a day) and telecom companies that store information about their users (including their traffic statistics) in databases that have grown to hundreds of gigabytes, and the applications were working fine.
But regarding binary information like images and files, it would be better to store them on the file system and store only their paths in the database, because it is cheaper to read them off the disk than to tie up a database process reading a multi-megabyte file.
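A minimal sketch of the "file on disk, path in the database" idea follows, written in Python with `sqlite3` rather than PHP with MySQL/Doctrine. The `messages` and `attachments` tables and the helpers `attach_file` and `read_attachments` are hypothetical names for this example.

```python
import sqlite3
import tempfile
from pathlib import Path

storage_dir = Path(tempfile.mkdtemp())  # stand-in for an uploads directory

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    id   INTEGER PRIMARY KEY,
    body TEXT NOT NULL
);
CREATE TABLE attachments (
    id         INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL REFERENCES messages (id),
    path       TEXT    NOT NULL   -- location on disk, not the bytes
);
""")


def attach_file(message_id, filename, data):
    """Write the bytes to disk and record only the path in the database."""
    # Assumption: filename is already sanitized by the caller.
    target = storage_dir / f"{message_id}_{filename}"
    target.write_bytes(data)
    conn.execute(
        "INSERT INTO attachments (message_id, path) VALUES (?, ?)",
        (message_id, str(target)),
    )
    return target


def read_attachments(message_id):
    """Load the contents of every file bound to one chat message."""
    rows = conn.execute(
        "SELECT path FROM attachments WHERE message_id = ?", (message_id,)
    ).fetchall()
    return [Path(p).read_bytes() for (p,) in rows]


msg_id = conn.execute(
    "INSERT INTO messages (body) VALUES (?)", ("see attached",)
).lastrowid
attach_file(msg_id, "report.bin", b"\x00\x01binary")
print(len(read_attachments(msg_id)))
```

The message rows stay small, and the binding between a message and its files is still an ordinary join.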
As I said, it is important that you do several things:
Structure your information properly - it is very important to properly design your database, properly divide it into tables and tables into fields with your performance goals in mind, because this will form the basis for your application and queries. Get that wrong and your queries will be slow.
Make proper decisions on the table engine for every table. This is an important step because it will greatly affect the performance of your queries. For example, MyISAM uses table-level locking, so reads of a table are blocked while it is being updated. That will be a problem for a web application like a social network or a news site, because in many situations your users will basically have to wait for an update to complete before they see a generated page.
Create proper indexes - very important for performance, especially for applications with rapidly growing big databases.
Measure performance of your queries as data grows and look for the ways to improve it - you will always find bottlenecks that have to be removed, this is an ongoing non-stop process. Every popular web application has to do it.
I think a NoSQL database like CouchDB or MongoDB is an option. You can also store the files separately and link them via a known filename, but it depends on your system architecture.
I am writing a PHP application in ZF. Customers will use it to sell their products to final customers. Customers will host their application on my server or they could use their own. Most of them will host this application on my server.
I could design one database for all customers at once, so every customer will use the same database, but of course products etc. will be assigned to a particular customer. Trivial.
I could use a separate database for every customer, so the database structure will be simpler. I will then probably use separate subdomains and maybe even separate file locations, but that is just a detail.
Which solution will have better performance and how big will be the difference? Which one would you choose?
I would use a separate database for each customer. It makes backup and scaling easier. If you ever get a large customer that needs some custom changes to the schema, you can do it easily.
If one customer needs you to restore their data, it is trivial when that customer has their own database. On a shared db, it is much harder.
And if that large customer ever gets a lot of traffic, you can easily put them on another server with minimal changes.
If one site gets compromised, you don't have all of the data for everyone in one place; the damage is limited to just the site that was hacked.
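To make the per-customer restore concrete, here is a small sketch using the `Connection.backup` method from Python's `sqlite3` module, with in-memory SQLite databases standing in for per-customer MySQL databases. The `orders` table and the `snapshot` helper are assumptions for the example; a MySQL setup would rely on its own dump and restore tooling instead.

```python
import sqlite3


def snapshot(source):
    """Copy one customer's whole database into a fresh connection."""
    copy = sqlite3.connect(":memory:")
    source.backup(copy)
    return copy


# One database per customer; only customer A is shown here.
customer_a = sqlite3.connect(":memory:")
customer_a.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
customer_a.execute("INSERT INTO orders (item) VALUES ('widget')")
customer_a.commit()

yesterday = snapshot(customer_a)

# Some bad data entry wipes the customer's orders.
customer_a.execute("DELETE FROM orders")
customer_a.commit()

# Restoring is a whole-database copy in the other direction;
# no other customer's data is touched or even read.
yesterday.backup(customer_a)
restored = customer_a.execute("SELECT item FROM orders").fetchall()
print(restored)
```

With a shared database the equivalent operation would need to pick one customer's rows out of every table in the backup, which is where the extra difficulty comes from.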
I'd definitely recommend going with 1 db per customer if possible.
Personally, I would go with multiple databases - i.e. a database for each client.
As I understand it all your clients will be using just an instance of your application so these instances should have their own databases.
If you go with a single database, you are creating a great potential security risk. One client compromising the login details to the db server would automatically compromise data of all your clients.
Also a single security vulnerability (a SQL injection attack) could destroy data of all clients (with multiple dbs you could still have time to fix the security hole and release a patch before all other sites are attacked).
You don't want to have an army of 1000000 mad clients instead of just 1 angry client.
Multiple databases also give you a greater possibility of load balancing (you can have the dbs spread across more servers).
Performance-wise, you basically start with a 'sharding' approach. Because of this, applying a sharding performance strategy later will be a piece of cake.
The downside is that you could argue you're losing some (undefined) bit of overhead in the duplication.
One pitfall is that you might not notice performance issues in major components as quickly. This is because they are so scattered, so they might not be visible on your radar. Load testing is the way to get ahead of this.
To some extent this is a question of personal opinion. There are pros and cons of both models.
Personally, and because of the "they could use their own" comment, I would go with a separate database per customer. This gives you:
The ability to move customer data around when necessary. For example, moving a single customer onto a different server/setup depending on things like load.
If something goes wrong you only impact one customer and not everybody.
You can spread DB load across multiple DB servers if necessary.
If a customer comes to you with a specific requirement, you can more easily cater for this without impacting other customers.
From a performance perspective, to be honest I don't think there is any real performance gain in either model. That said, this does of course depend on the structure of your DB and the hardware it runs on.
Don't choose a multiple-database solution if your needs can be fulfilled with one database. Multiple databases will become a big burden in the long run, and your system will become highly complicated and unmanageable as you grow.
Using proper relationships you can go a long way
A Client model can have many Products // why multiple databases?
Performance can be achieved either way; just going with multiple dbs will NOT benefit you in that direction
I am currently in a debate with a coworker about the best practices concerning the database design of a PHP web application we're creating. The application is designed for businesses, and each company that signs up will have multiple users using the application.
My design methodology is to create a new database for every company that signs up. This way everything is sand-boxed, modular, and small. My coworker's philosophy is to put everyone into one database. His argument is that if we have 1000+ companies sign up, we wind up with 1000+ databases to deal with. Not to mention the mess that doing Business Intelligence becomes.
For the sake of example, assume that the application is an order entry system. With separate databases, table size can remain manageable even if each company is doing 100+ orders a day. In a single-bucket application, tables can get very big very quickly.
Is there a best practice for this? I tried hunting around the web, but haven't had much success. Links, whitepapers, and presentations welcome.
Thanks in advance,
The1Rob
I talked to the database architect from wordpress.com, the hosting service for WordPress. He said that they started out with one database, hosting all customers together. The content of a single blog site really isn't that much, after all. It stands to reason that a single database is more manageable.
This did work well for them until they got hundreds and thousands of customers. Then they realized that they needed to scale out, running multiple physical servers and hosting a subset of their customers on each server. With a database per customer, it would be easy to migrate individual customers to a new server when one is added, but it is harder to separate out the data that belongs to an individual customer's blog from a single shared database.
As customers come and go, and some customers' blogs have high-volume activity while others go stale, the rebalancing over multiple servers becomes an even more complex maintenance job. Monitoring size and activity per individual database is easier too.
Likewise, doing a database backup or restore of a single database containing terabytes of data, versus individual database backups and restores of a few megabytes each, is an important factor. Consider: a customer calls and says their data got SNAFU'd due to some bad data entry, and could you please restore the data from yesterday's backup? How would you restore one customer's data if all your customers share a single database?
Eventually they decided that splitting into a separate database per customer, though complex to manage, offered them greater flexibility and they re-architected their hosting service to this model.
So, while from a data modeling perspective it seems like the right thing to do to keep everything in a single database, some database administration tasks become easier as you pass a certain breakpoint of data volume.
I would never create a new database for each company. If you want a modular design, you can create this using tables and properly connected primary and foreign keys. This is where I learned about database normalization and I'm sure it will help you out here.
This is the method I would use. SQL Article
I'd have to agree with your co-worker. Relational databases are designed to handle large amounts of data, and the numbers you're talking about (1000+ companies, multiple users per company, 100+ orders/day) are well within the expected bounds. Separate databases means:
multiple database connections in each script (memory and speed penalty)
maintenance is harder (DB systems generally do not provide tools for acting on databases as a group) so schema changes, backups, and similar tasks will be more difficult
harder to run queries on data from multiple companies
If your site becomes huge, you may eventually need to distribute your data across multiple servers. Deal with that when it happens. To start out that way for performance reasons sounds like premature optimization.
I haven't personally dealt with this situation, but I would think that if you want to do business intelligence, you should aggregate the data into an offline database that you can then run any analysis you want on.
Also, keeping them in separate databases makes it easier to partition across servers (which you will likely have to do if you have 1000+ customers) without resorting to messy replication technologies.
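The "aggregate into an offline database" idea for business intelligence might be sketched as follows. This uses Python's `sqlite3` with in-memory databases in place of per-company MySQL databases; the company names, the `orders` and `all_orders` tables, and the `make_company_db` helper are all invented for the illustration.

```python
import sqlite3


def make_company_db(order_totals):
    """Build a throwaway per-company database holding a few orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO orders (total) VALUES (?)", [(t,) for t in order_totals]
    )
    return conn


company_dbs = {
    "acme": make_company_db([10.0, 15.0]),
    "globex": make_company_db([7.5]),
}

# The reporting database is separate from every live company database.
reporting = sqlite3.connect(":memory:")
reporting.execute(
    "CREATE TABLE all_orders ("
    " company TEXT NOT NULL,"
    " order_id INTEGER NOT NULL,"
    " total REAL NOT NULL)"
)

# Periodic job: copy each company's rows, tagging them with the company.
for company, db in company_dbs.items():
    rows = db.execute("SELECT id, total FROM orders").fetchall()
    reporting.executemany(
        "INSERT INTO all_orders VALUES (?, ?, ?)",
        [(company, order_id, total) for order_id, total in rows],
    )

# Cross-company analysis now runs against the reporting copy only.
revenue = dict(
    reporting.execute(
        "SELECT company, SUM(total) FROM all_orders GROUP BY company"
    )
)
print(revenue)
```

Analysis queries then never compete with the live per-company databases for locks or connections.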
I had a similar question a while back and came to the conclusion that a single database is drastically more manageable. Right now, we have multiple databases (around 10) and it is already becoming a pain to manage especially when we upgrade the code. We have to migrate every single database.
The upside is that the data is segregated cleanly. Due to the sensitivity of our data, this is a good thing, but it does make it quite a bit more difficult to keep up with.
The separate database methodology has a very big advantage over the other:
+ You can break it up into smaller groups; this architecture scales much better.
+ You can easily set up stand-alone servers.
That depends on how likely your schemas are to change. If they ever have to change, will you be able to safely make those changes to 1000 separate databases? If a scalability problem is found with your design, how are you going to fix it for 1000 databases?
We run a SaaS (Software-as-a-Service) business with a large number of customers and have elected to keep all customers in the same database. Managing 1000's of separate databases is an operational nightmare.
You do have to be very diligent creating your data model and the business objects / reporting queries that access them. One approach you may want to consider is to carry the company ID in every table and ensure that every WHERE clause includes the company ID for the currently logged-in user. If you use a data access layer, you can enforce that condition there.
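One possible shape for a data access layer that enforces the company ID condition is a thin wrapper that adds the filter to every query itself. This is a sketch in Python with `sqlite3`, not the answerer's actual implementation; the `CompanyScope` class and the `orders` table are hypothetical.

```python
import sqlite3


class CompanyScope:
    """Runs SELECTs that are always filtered by one company's ID."""

    def __init__(self, conn, company_id):
        self._conn = conn
        self._company_id = company_id

    def select(self, table, columns, where="1 = 1", params=()):
        # table, columns and where must be trusted constants written by
        # the application's developers, never raw user input; only the
        # values in params (and the company ID) are bound as parameters.
        sql = (
            f"SELECT {columns} FROM {table} "
            f"WHERE company_id = ? AND ({where})"
        )
        return self._conn.execute(sql, (self._company_id, *params)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " company_id INTEGER NOT NULL,"
    " item TEXT NOT NULL,"
    " quantity INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (company_id, item, quantity) VALUES (?, ?, ?)",
    [(1, "bolt", 5), (1, "nut", 1), (2, "gear", 9)],
)

# Created once per request from the logged-in user's company.
scope = CompanyScope(conn, company_id=1)
print(scope.select("orders", "item", "quantity > ?", (1,)))
```

Wrapping the caller's condition in parentheses keeps an `OR` in it from escaping the company filter.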
As you grow large, you can still horizontally partition (shard) by placing groups of companies on each physical server, e.g. the first 100 companies on Server A, the next 100 companies on Server B.
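The "first 100 companies on Server A, next 100 on Server B" scheme boils down to a lookup from company ID to server. Below is a minimal Python sketch; the host names in `SERVERS` and the `server_for` function are placeholders invented for the example, and it assumes company IDs are assigned sequentially starting at 1.

```python
# Hypothetical host names; one entry per physical database server.
SERVERS = [
    "db-a.example.internal",
    "db-b.example.internal",
    "db-c.example.internal",
]
COMPANIES_PER_SERVER = 100


def server_for(company_id):
    """Map a company ID to the server holding that company's rows."""
    if company_id < 1:
        raise ValueError("company IDs start at 1")
    index = (company_id - 1) // COMPANIES_PER_SERVER
    if index >= len(SERVERS):
        raise LookupError(f"no server provisioned for company {company_id}")
    return SERVERS[index]


print(server_for(1), server_for(100), server_for(101))
```

A fixed range rule like this is simple but cannot rebalance a single busy company; a lookup table from company to server would be the natural next step if that becomes necessary.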