I'm designing a back-end for an application that will keep track of several different restaurants and their order history, so I've been thinking about the most efficient way to do this. What I think I want to do is have one generic design for any restaurant and create a new database as we add each restaurant. (Please let me know if there is anything wrong with doing it that way.)
What I'm trying to figure out is how I'm going to store the specific order data for a restaurant's many orders. I was thinking of having one large table that keeps track of all the orders and then creating a separate mini table for each order detailing what was ordered, the prices, and whether any discounts/coupons were applied.
I imagine that in one month a restaurant can have thousands of orders, so that would leave me with several thousand mini tables holding separate orders. I was also considering having one table of every individual item across all orders and just attaching an order_id to each of them. But then I would have a table gaining up to tens of thousands of entries a month.
Which is the most efficient way to do this? Of course, both of these implementation ideas might be way off, so I'm open to hearing any other ideas or thoughts!
Try not to sacrifice design for performance when you don't actually need to; it just creates unnecessary complications. First, design your entities in a normal form such as 3NF or BCNF.
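For the order data in the question, a normalized design might look roughly like the sketch below: one orders table plus one order_items table with an order_id on every item row, instead of a separate mini table per order. The table and column names here are my own assumptions, not part of the original design.

-- One row per order.
CREATE TABLE orders (
  order_id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  restaurant_id INT UNSIGNED    NOT NULL,
  ordered_at    DATETIME        NOT NULL,
  discount      DECIMAL(10,2)   NOT NULL DEFAULT 0,  -- discount/coupon applied to the order
  PRIMARY KEY (order_id)
);

-- One row per item ordered, attached to its order via order_id.
CREATE TABLE order_items (
  order_item_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  order_id      BIGINT UNSIGNED NOT NULL,
  item_name     VARCHAR(100)    NOT NULL,
  price         DECIMAL(10,2)   NOT NULL,
  PRIMARY KEY (order_item_id),
  FOREIGN KEY (order_id) REFERENCES orders (order_id)
);

Tens of thousands of order_items rows per month is not a problem for MySQL, and the foreign key gives you an index on order_id for fast per-order lookups.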
Then, once your entities are designed and everything looks good, there are many solutions for performance tuning and scaling.
The first is indexing. Indexing can save a lot of computational power: querying a table without an index is O(n), while with an index a lookup can be O(log n) or even O(1), depending on the indexing algorithm used.
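As a small illustration (again with assumed names), an index turns a full-table scan into an index lookup:

-- Without an index on restaurant_id, this scans every row: O(n).
SELECT * FROM orders WHERE restaurant_id = 42;

-- With a B-tree index the same lookup becomes O(log n);
-- a hash index can approach O(1) for equality lookups.
CREATE INDEX idx_orders_restaurant ON orders (restaurant_id);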
The next solution is partitioning the tables. Think of it as dividing your table into many smaller tables, except that the database abstracts this away and you still see only one table.
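In MySQL, for example, a table can be declared with partitions while you keep querying it as a single table. This is only a sketch with assumed names:

-- One logical orders table, physically stored as monthly partitions.
CREATE TABLE orders_partitioned (
  order_id   BIGINT UNSIGNED NOT NULL,
  ordered_at DATE            NOT NULL,
  total      DECIMAL(10,2)   NOT NULL,
  PRIMARY KEY (order_id, ordered_at)  -- the partitioning column must be part of the key
)
PARTITION BY RANGE (TO_DAYS(ordered_at)) (
  PARTITION p2021_01 VALUES LESS THAN (TO_DAYS('2021-02-01')),
  PARTITION p2021_02 VALUES LESS THAN (TO_DAYS('2021-03-01')),
  PARTITION pmax     VALUES LESS THAN MAXVALUE
);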
These first two solutions help you tune performance on a single machine. Scaling up a machine can also improve performance, but there are hardware limits, so if you have to scale out there are replication and sharding.
Basically, replication helps you scale read queries. There are replication solutions that can scale write queries, but they aren't really effective, since every write still has to be applied on all the machines in the cluster. They are, however, excellent solutions for high availability.
If you reach a level where you have so many writes that replication no longer helps, you can move on to sharding. There are many aspects to sharding, such as whether it should be done at the application level or the database level, and how to divide the data between machines.
For myself, I prefer database-level sharding (and I actually use it in production), because application-level sharding can make the application server code complicated (you may need service discovery, etc.) and even messy if it isn't handled carefully. The abstraction also lets developers think of the cluster as one database instead of many divided databases.
As for dividing the data between shards, there are vertical sharding and horizontal sharding.
In vertical sharding, you divide the data by entity: customers in one database, orders in another, and so on.
This should be the first approach, since it's much easier to do than horizontal sharding.
In horizontal sharding, you divide one entity between many database servers. For example, you can use a formula to divide the rows evenly, such as rows with odd IDs going to instance A and rows with even IDs going to instance B.
Another way is to divide them by something common like dividing orders by restaurants.
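A rough sketch of that second approach, with assumed names: the same table is created on every shard, and the application (or a proxy layer) decides which shard an order belongs to based on its restaurant.

-- The identical table exists on every shard.
CREATE TABLE orders (
  order_id      BIGINT UNSIGNED NOT NULL,
  restaurant_id INT UNSIGNED    NOT NULL,
  total         DECIMAL(10,2)   NOT NULL,
  PRIMARY KEY (order_id)
);

-- Routing rule applied outside the database, for example:
--   shard_number = restaurant_id MOD number_of_shards
-- so all of a restaurant's orders land on the same shard and
-- queries for one restaurant only ever touch one machine.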
Hope it helps. If you have further questions, I will be happy to answer.
We have a large number of tables in our company's MySQL database, each representing different products (and/or history/transactions for those products) plus a "main" table for parent establishments. Almost all of these tables are using MyISAM (changing everything to InnoDB might help but it's not an option at the moment).
We have a "filter" tool in our backend for finding establishments that match certain criteria. The results are printed in tabular format with all data available for that establishment (ID, name, which products they do/don't have, how many transactions, etc. etc.) and currently this is achieved with a very large MySQL statement with many JOINs.
We had a situation last week where a particularly large filter was run during peak business hours, and the resulting READ LOCKs on dependent tables (via the aforementioned JOIN statements) caused the entire database to stop responding for almost 30 minutes, even though the filter in question only takes ~43s to run on its own (locally, anyway). Very bad.
While important, this filter tool is only used by a few people on the team and not by clients. The speed/performance of this filter tool is not critical nor the goal of this question. I would prefer for this tool to "yield" to other apps that need access to these tables rather than force them to wait until the entire filter has finished.
Which brings me to my question: will splitting one large query (with multiple JOINs) into multiple smaller queries help mitigate table locking and force a script to "yield" to other, higher-priority scripts that might need access to the same tables in between the smaller queries?
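For illustration only, here is a rough sketch of the idea being asked about, with made-up table names. The single statement at the top holds read locks on every joined table for its whole runtime; the smaller statements below each hold their locks only briefly, leaving gaps for other scripts.

-- One large statement: locks establishments and transactions until it finishes.
SELECT e.id, e.name, COUNT(t.id) AS transaction_count
FROM establishments e
LEFT JOIN transactions t ON t.establishment_id = e.id
GROUP BY e.id, e.name;

-- Split into smaller statements: locks are released between queries,
-- so other scripts can access the tables in the gaps.
SELECT id, name FROM establishments;
SELECT establishment_id, COUNT(*) AS transaction_count
FROM transactions
GROUP BY establishment_id;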
Disclaimer: I have reviewed so many other questions here on StackOverflow and on other sites via Google over the last week and they're all interested in speed. That is not what I am asking. If this is a duplicate I apologize and it can be locked, but please provide a link to it so that I may use it. Thank you!
EDIT: I appreciate the comments thus far and the additional information/ideas they provide, though none have answered the question unfortunately. I'm in a position at the company where I have control over the filter's code and that's it. I cannot change the database engine, I cannot initiate replication or create data warehouses, and I'm already aware that MyISAM is the inferior choice for tables, but I don't have control over that. Thank you.
I've built a school management system for my own needs. The size of tables ranges from ~200 rows to ~30,000 rows /at the end of school year/.
Some of my friends have seen the system and they urge me to make it available to other schools. I'd like to give it a try, at least with a few schools for now. Considering my current architecture and shared hosting, I'd have to store all schools in a single db, and so two questions bother me:
Can MySql easily handle tables with >300,000 rows?
The system is based on Yii2 at the moment, I've optimized it for max performance - do you think it's wise to try or better work towards solution with a dedicated server and separate db for each school?
I don't know if it's wise to store all students, attendance, payments, etc. from my school and 10 other schools across shared tables in a single db. I'd better ask than cause trouble for myself...
Any advice is more than welcome:)
premature optimization is the root of all evil (or at least most of it) in programming
You should not worry about this at the moment. Start running your application and as you scale, identify the bottlenecks and then try to figure out a solution for it.
Can MySql easily handle tables with >300,000 rows?
First things first, use sound normalization principles to structure your tables and relations. MySQL is pretty good at handling tables of up to around 10,000,000 rows, but it also depends on how you index and query the data. Use proper indexes on the columns you frequently use for lookups. Avoid LIKE queries with leading wildcards; if you really need that kind of search, use a search engine such as Elasticsearch or Solr.
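As a small, hedged example for a school system (the students table and its column are assumptions):

-- An index on a column frequently used for lookups.
CREATE INDEX idx_students_last_name ON students (last_name);

-- Can use the index (prefix match):
SELECT * FROM students WHERE last_name LIKE 'Smi%';

-- Cannot use the index (leading wildcard); this is the kind of LIKE query
-- better handled by a search engine such as Elasticsearch or Solr:
SELECT * FROM students WHERE last_name LIKE '%mit%';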
The system is based on Yii2 at the moment, I've optimized it for max performance - do you think it's wise to try or better work towards solution with a dedicated server and separate db for each school?
I have very little familiarity with Yii2, but there are certainly better frameworks available in PHP that you could give a try, e.g. Laravel (this will give you a better idea). Of course, it would be best to host this application on a dedicated server, but why waste money when you can get a private VPS for just $5 from DigitalOcean?
I don't know if it's wise to store all students, attendance, payments, etc. from my school and 10 other schools across shared tables in a single db. I'd better ask than cause trouble for myself...
There is absolutely no problem storing students, attendance, and payments info in the same database; just structure your tables properly.
I've recently taken over a project linking to a large MySQL DB that was originally designed many years ago and need some help.
Currently the DB has 5 tables per client that store their user information, transaction history, logs, etc. However, we currently have ~900 clients that have applied to use our services, with an average of 5 new clients applying weekly. So the DB has grown to nearly 5,000 tables and is ever increasing. Many of our clients never end up using our services, so their tables are all empty but remain in the DB.
The original DB designer says it was created this way so if a table was ever compromised it would not reveal information on any other client.
As I'm redesigning the project in PHP, I'm thinking of redesigning the DB to have overall user, transaction history, log, etc. tables, using each client's unique ID to reference their rows.
Would this approach be correct or should the DB stay as is?
Could you see any possible security/performance concerns?
Thanks for all your help
You should redesign the system to have just five tables, with a separate column identifying which client each row pertains to. SQL handles large tables well, so you shouldn't have to worry about performance. In fact, having many, many tables can be a hindrance to performance in many cases.
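As a minimal sketch (the real column names will differ in your project), each of the five consolidated tables simply gains a client identifier:

-- One shared transactions table for all clients instead of one per client.
CREATE TABLE transactions (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  client_id  INT UNSIGNED    NOT NULL,  -- which client this row belongs to
  amount     DECIMAL(10,2)   NOT NULL,
  created_at DATETIME        NOT NULL,
  PRIMARY KEY (id),
  KEY idx_transactions_client (client_id)
);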
This has many advantages. You will be able to optimize the table structures for all clients at once. No more trying to add an index to 300 tables to meet some performance objective. Managing the database, managing the tables, backing things up -- all of these should be easier with a single table.
You may find that the database even gets smaller in size. This is because, on average, each of those thousands of tables has a half-filled page at the end, and thousands of half-filled pages shrink to just one.
The one downside is security. It is easier to apply security to tables than to rows within tables. If this is a concern, you may need to think through those requirements.
This may just be a matter of taste, but I would find it far more natural - and thus maintainable - to store this information in as few tables as possible. Also most if not all database ORMs will be expecting a structure like this, and there is no reason to reinvent that wheel.
From the perspective of security, it sounds like this project could be described as a web app. Obviously I don't know the realities of the business logic you're dealing with, but it seems like regardless of the table permissions all access to the database would be via the code base, in which case the app itself needs full permissions for all tables - nullifying any advantage of keeping the tables separated.
If there is a compelling reason for the security measures - say, different services that feed data into the DB independently of the web app, I would still explore ways to handle that authentication at the application layer instead of at the database layer. It will be much easier to handle your security rules in that way. Instead of having rules set in 5000+ different places, a single security rule of 'only let a user view a row of data if their user id equals the user_id column" is far simpler, easier to understand, and therefore far more maintainable (and possibly more secure).
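That rule, expressed as a query (the table and column names are illustrative only):

-- Application-layer security: only return rows owned by the logged-in user.
SELECT * FROM orders WHERE user_id = ?;  -- bound to the current user's id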
Different people approach databases in different ways. I am a web developer, so I view databases as the place to store my data and nothing more, as it's always a dedicated and generally single-purpose DB installation, and I handle all other logic at the application level. There are people who view databases as the application itself, who make far more extensive use of built-in security features for their massive, distributed, multi-user systems - but I honestly don't know enough about those scenarios to comment on exactly where that line should be drawn.
I am currently in a debate with a coworker about the best practices concerning the database design of a PHP web application we're creating. The application is designed for businesses, and each company that signs up will have multiple users using the application.
My design methodology is to create a new database for every company that signs up. This way everything is sand-boxed, modular, and small. My coworker's philosophy is to put everyone into one database. His argument is that if we have 1000+ companies sign up, we wind up with 1000+ databases to deal with, not to mention the mess that doing business intelligence becomes.
For the sake of example, assume that the application is an order entry system. With separate databases, table size can remain manageable even if each company is doing 100+ orders a day. In a single-bucket application, tables can get very big very quickly.
Is there a best practice for this? I tried hunting around the web, but haven't had much success. Links, whitepapers, and presentations welcome.
Thanks in advance,
The1Rob
I talked to the database architect from wordpress.com, the hosting service for WordPress. He said that they started out with one database, hosting all customers together. The content of a single blog site really isn't that much, after all. It stands to reason that a single database is more manageable.
This did work well for them until they had hundreds and thousands of customers and realized that they needed to scale out, running multiple physical servers and hosting a subset of their customers on each server. When they add a server, it is easy to migrate individual customers to the new server, but harder to separate the data belonging to an individual customer's blog within a single database.
As customers come and go, and some customers' blogs have high-volume activity while others go stale, rebalancing across multiple servers becomes an even more complex maintenance job. Monitoring size and activity per individual database is easier, too.
Likewise, doing a backup or restore of a single database containing terabytes of data, versus individual database backups and restores of a few megabytes each, is an important factor. Consider: a customer calls and says their data got SNAFU'd due to some bad data entry, and could you please restore the data from yesterday's backup? How would you restore one customer's data if all your customers share a single database?
Eventually they decided that splitting into a separate database per customer, though complex to manage, offered them greater flexibility and they re-architected their hosting service to this model.
So, while from a data modeling perspective it seems like the right thing to do to keep everything in a single database, some database administration tasks become easier as you pass a certain breakpoint of data volume.
I would never create a new database for each company. If you want a modular design, you can achieve it using tables and properly connected primary and foreign keys. This is where I learned about database normalization, and I'm sure it will help you out here.
This is the method I would use. SQL Article
I'd have to agree with your co-worker. Relational databases are designed to handle large amounts of data, and the numbers you're talking about (1000+ companies, multiple users per company, 100+ orders/day) are well within the expected bounds. Separate databases means:
multiple database connections in each script (memory and speed penalty)
maintenance is harder (DB systems generally do not provide tools for acting on databases as a group) so schema changes, backups, and similar tasks will be more difficult
harder to run queries on data from multiple companies
If your site becomes huge, you may eventually need to distribute your data across multiple servers. Deal with that when it happens. To start out that way for performance reasons sounds like premature optimization.
I haven't personally dealt with this situation, but I would think that if you want to do business intelligence, you should aggregate the data into an offline database that you can then run any analysis you want on.
Also, keeping them in separate databases makes it easier to partition across servers (which you will likely have to do if you have 1000+ customers) without resorting to messy replication technologies.
I had a similar question a while back and came to the conclusion that a single database is drastically more manageable. Right now, we have multiple databases (around 10) and it is already becoming a pain to manage especially when we upgrade the code. We have to migrate every single database.
The upside is that the data is segregated cleanly. Due to the sensitivity of our data, this is a good thing, but it does make it quite a bit more difficult to keep up with.
The separate-database methodology has one very big advantage over the other:
+ You can break it up into smaller groups, so this architecture scales much better.
+ You can set up standalone servers easily.
That depends on how likely your schemas are to change. If they ever have to change, will you be able to safely make those changes to 1000 separate databases? If a scalability problem is found with your design, how are you going to fix it for 1000 databases?
We run a SaaS (Software-as-a-Service) business with a large number of customers and have elected to keep all customers in the same database. Managing 1000's of separate databases is an operational nightmare.
You do have to be very diligent creating your data model and the business objects / reporting queries that access them. One approach you may want to consider is to carry the company ID in every table and ensure that every WHERE clause includes the company ID for the currently logged-in user. If you use a data access layer, you can enforce that condition there.
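A rough sketch of that pattern (table and column names are assumptions, not from the original system):

-- Every table carries company_id, and indexes lead with it.
CREATE TABLE orders (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  company_id INT UNSIGNED    NOT NULL,
  status     VARCHAR(20)     NOT NULL,
  total      DECIMAL(10,2)   NOT NULL,
  PRIMARY KEY (id),
  KEY idx_orders_company_status (company_id, status)
);

-- Every WHERE clause includes the logged-in user's company,
-- which the data access layer can append automatically.
SELECT id, total FROM orders WHERE company_id = ? AND status = 'open';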
As you grow large, you can still vertically partition by placing groups of companies on each physical server, e.g. the first 100 companies on Server A, the next 100 companies on Server B.
First of all, I am an autodidact, so I don't have great know-how about optimization and such. I created a social networking website.
It contains 29 tables right now. I want to extend its functionality by adding things like yellow pages, events etc to make it more like a portal.
Now the question is should I simply add the tables in the same database or should I use a different database?
And in case I create a new database, I also want users to be able to comment on business listings, etc., just like reviews. So how will I be able to pull out entries, since the reviews will be in one database and the user details in the other?
Is it possible to join tables on 2 different databases ?
You can join tables in separate databases by fully qualifying the table names, but the real question is why you want the information in separate databases. If the information you are storing all relates together, it should go in one database unless there is a compelling (usually performance-related) reason against it.
The main reason I could see for separating your YellowPages out is if you wished to have one YellowPages accessible to several different, non-interacting websites. That said, presumably you wouldn't want cross-talk comments on the listings, so comments would need to be stored in the website databases rather than the YellowPages database. And that just sounds like a maintenance nightmare.
Don't Optimize until you need to.
If performance is ok, go for the easiest to maintain solution.
Monitor the performance of your site and if it starts to get slow, figure out exactly what is causing the slowdown and focus on performance on that section only.
You definitely can query and join tables from two different databases - you just need to specify the tables in a dbname.tablename format.
SELECT a.username, b.post_title
FROM dbOne.users a INNER JOIN dbTwo.posts b USING (user_id)
However, it might make management and maintenance a lot more complicated for you. For example, you'll have to keep track of which table belongs to which database, and you'll continually need to add the database names to all your queries. When it comes time to back up the data, your work increases there as well. MySQL databases can easily contain hundreds of tables, so I see no benefit in splitting it up; just stick with one.
You can prove that an algorithm is as fast as it can be; math.h and the C standard libraries have been heavily optimized for half a century, and Perl's data structures are another well-optimized example. Just avoid putting everything on one line, to make debugging easier. There are coding conventions; try to keep every programmer on the team following the same one, because which convention is "right" matters less than being consistent. Performance is the last thing you work on; security and intelligibility are the top priorities. Read about big-O notation: it describes the software alone, while in practice suboptimal software can be faster than optimal software on different hardware. Totally bug-infested spaghetti code with no structure can respond many times faster than the most provably optimal software, depending on the hardware.