I've built a school management system for my own needs. Table sizes range from ~200 rows to ~30,000 rows (at the end of the school year).
Some of my friends have seen the system and they urge me to make it available to other schools. I'd like to give it a try, at least with a few schools for now. Given my current architecture and shared hosting, I'd have to store all schools in a single database, so a few questions bother me:
Can MySQL easily handle tables with >300,000 rows?
The system is based on Yii2 at the moment and I've optimized it for maximum performance. Do you think it's wise to try this, or better to work towards a solution with a dedicated server and a separate database for each school?
I don't know if it's wise to store students, attendance, payments, etc. from my school and ten other schools in shared tables in a single database. Better to ask now than cause trouble for myself later.
Any advice is more than welcome:)
Premature optimization is the root of all evil (or at least most of it) in programming.
You should not worry about this at the moment. Start running your application, and as you scale, identify the bottlenecks and then work out solutions for them.
Can MySQL easily handle tables with >300,000 rows?
First things first: use sound normalization principles to structure your tables and relations. MySQL is pretty good at handling tables with up to around 10,000,000 rows, but it also depends on how you index and query the data. Put proper indexes on the columns you frequently use for lookups. Avoid relying on LIKE queries for searching; if you really need full-text search, use a search engine such as Elasticsearch or Solr.
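For example (a minimal sketch; the attendance table and its columns are made-up names, your schema will differ), a composite index on the columns you filter by is usually all a 300,000-row table needs:

    -- hypothetical attendance table: lookups are almost always by student and date
    CREATE INDEX idx_attendance_student_date ON attendance (student_id, lesson_date);

    -- this query can now use the index instead of scanning ~300,000 rows
    SELECT lesson_date, status
    FROM attendance
    WHERE student_id = 1042
      AND lesson_date BETWEEN '2023-09-01' AND '2023-09-30';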
The system is based on Yii2 at the moment and I've optimized it for maximum performance. Do you think it's wise to try this, or better to work towards a solution with a dedicated server and a separate database for each school?
I have very little experience with Yii2, but there are certainly other good PHP frameworks you could try, e.g. Laravel (that will give you a better idea). Of course, the best option would be to host this application on its own server; why waste money when you can get a private VPS for just $5 from DigitalOcean?
I don't know if it's wise to store students, attendance, payments, etc. from my school and ten other schools in shared tables in a single database. Better to ask now than cause trouble for myself later.
There is absolutely no problem storing students, attendance and payment info for multiple schools in the same database; just structure your tables properly.
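A common way to structure that (a rough sketch with assumed names) is to give every shared table a school_id column and scope every query and index by it, so one schema serves all schools:

    CREATE TABLE students (
        id         INT AUTO_INCREMENT PRIMARY KEY,
        school_id  INT NOT NULL,          -- which school this row belongs to
        first_name VARCHAR(100) NOT NULL,
        last_name  VARCHAR(100) NOT NULL,
        KEY idx_students_school (school_id)
    );

    -- every query is scoped to the current school
    SELECT id, first_name, last_name
    FROM students
    WHERE school_id = 3;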
Currently, I am working on a website and have just started studying back-end development. I wonder why nobody uses JSON as a database. Also, I don't quite get the utility of PHP and SQL: since I could easily get data from a JSON file and use it, why do I need PHP and SQL?
OK, let's assume you put the data in a JSON variable and store it in a file for all your projects.
Obviously, you need a subsystem for taking backups, so you will write it.
You must improve performance for handling very large amounts of data, with things like indexing and hash algorithms; assume you handle that too.
If you need APIs for working with a variety of programming languages, you need to write them.
What about functionality? What if you need triggers, stored procedures, views, full-text search and so on? OK, you will spend your time and add them.
Good job. But your system will grow and you will need to scale it. Can you? You will have to write support for clustering across servers, sharding, and so on.
Now you need to guarantee that your system complies with the ACID properties: Atomicity, Consistency, Isolation, and Durability.
Can you always handle all querying techniques (e.g. map/reduce) and respond quickly with a standard structure?
Now it's time to offer very fast write speeds, which brings serious issues of its own.
OK, now work out your solutions for race conditions, isolation levels, locking, relations, and so on.
After you do all this work, plus thousands of other things, you will probably end up with a DBMS a little bit like MongoDB or other relational and non-relational databases!
So it's better to use them. Obviously you can choose not to use them; I admit that sometimes saving data in a single file gives better performance, but only sometimes, in some cases, with some data, for some purpose. If you know exactly what you are doing, then it's OK to save data in a JSON file.
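As one small illustration of what a database engine already gives you, full-text search in MySQL is an index definition plus a query, instead of something you would have to build yourself on top of a JSON file (the articles table here is a made-up example):

    -- the FULLTEXT index is built and maintained by MySQL for you
    CREATE TABLE articles (
        id    INT AUTO_INCREMENT PRIMARY KEY,
        title VARCHAR(200),
        body  TEXT,
        FULLTEXT KEY ft_title_body (title, body)
    );

    -- search without scanning every record yourself
    SELECT id, title
    FROM articles
    WHERE MATCH(title, body) AGAINST('school payments' IN NATURAL LANGUAGE MODE);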
I'm designing a back-end for an application that will keep track of several different restaurants and their order history, and I've been thinking about the most efficient way to do this. What I think I want to do is have one generic design for any restaurant and create a new database as we add each restaurant. (Please let me know if there is anything wrong with doing it that way.)
What I'm trying to figure out is how I'm going to store the order data for the restaurants' many orders. I was thinking of having one large table that keeps track of all the orders, and then creating a separate mini table for each order detailing what was ordered, prices, and whether any discounts/coupons were applied.
I imagine a restaurant can have thousands of orders in one month, so that would leave me with several thousand mini tables holding the separate orders. I was also considering having one table of every individual item across all orders, with an order_id attached to each row. But then I would have a table growing by tens of thousands of entries a month.
Which is the most efficient way to do this? Of course, both of these implementation ideas might be way off, so I'm open to hearing any other ideas or thoughts!
Try not to sacrifice design for performance when you don't actually need to; it just creates unnecessary complications. Design your entities in normal forms such as BCNF or 3NF first, and make sure everything looks right before moving on.
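Applied to the question, a normalised design usually means one orders table plus one order_items table linked by a foreign key, rather than thousands of per-order mini tables. A rough sketch (all names here are assumptions, not your actual schema):

    CREATE TABLE orders (
        id            INT AUTO_INCREMENT PRIMARY KEY,
        restaurant_id INT NOT NULL,
        ordered_at    DATETIME NOT NULL,
        discount      DECIMAL(10,2) DEFAULT 0
    );

    CREATE TABLE order_items (
        id       INT AUTO_INCREMENT PRIMARY KEY,
        order_id INT NOT NULL,              -- ties each line item back to its order
        item     VARCHAR(200) NOT NULL,
        price    DECIMAL(10,2) NOT NULL,
        quantity INT NOT NULL DEFAULT 1,
        FOREIGN KEY (order_id) REFERENCES orders(id)
    );

One order_items row per line item keeps all the monthly growth in a single well-indexed table, which is far easier for MySQL to handle than thousands of tiny tables.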
Then there are many solutions for performance tuning and scaling.
The first is indexing. By indexing you can save a lot of computational power: querying a table without indexes is O(n), while with indexes you can get O(log n) or even O(1), depending on the indexing algorithm used.
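You can check which case a query falls into with EXPLAIN (the orders table and column below are placeholders):

    -- without an index on restaurant_id this shows type: ALL (a full scan, O(n));
    -- after CREATE INDEX idx_orders_restaurant ON orders (restaurant_id)
    -- it shows type: ref and only the matching rows are read
    EXPLAIN SELECT * FROM orders WHERE restaurant_id = 7;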
The next solution is partitioning tables. Think of it as dividing your table into many tables, except the database abstracts that away and you still see only one table.
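In MySQL, a partitioned table could look roughly like this (a sketch only; the ranges and names are illustrative):

    -- one logical orders table, physically split by year of the order date
    CREATE TABLE orders_partitioned (
        id         INT NOT NULL,
        ordered_at DATE NOT NULL,
        total      DECIMAL(10,2),
        PRIMARY KEY (id, ordered_at)        -- the partition key must be part of the primary key
    )
    PARTITION BY RANGE (YEAR(ordered_at)) (
        PARTITION p2022 VALUES LESS THAN (2023),
        PARTITION p2023 VALUES LESS THAN (2024),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    );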
These first two solutions help you tune performance on a single machine. Scaling up a machine can also improve performance, but there are hardware limits to that, so if you have to scale out, there are replication and sharding.
Basically, replication helps you scale read queries. There are replication solutions that can scale writes, but they aren't very effective, since every write has to be applied on every machine in the cluster; they are, however, excellent for high availability.
If you reach a level where you have so many writes that replication doesn't help anymore, you can move on to sharding. There are many aspects to sharding, such as whether it should be done at the application level or the database level, and how to divide the data between machines.
I prefer database-level sharding myself (and I actually use it in production), because application-level sharding can make the application server code complicated (you may need service discovery, etc.) and even messy if it isn't handled carefully. The abstraction also lets developers think of the cluster as one database instead of many divided databases.
As for dividing the data between shards, there are vertical sharding and horizontal sharding.
In vertical sharding, you divide the data by entity: for example, customers in one database and orders in another.
This should be the first approach, since it's much easier to do than horizontal sharding.
In horizontal sharding, you divide one entity between many database servers. For example, you can use a formula to divide the rows evenly, such as rows with odd IDs going to instance A and rows with even IDs going to instance B.
Another way is to divide them by something they have in common, such as splitting orders by restaurant.
Hope it helps. If you have further questions, I will be happy to answer.
I am about to create a PHP web project that will involve a large database. The database will be MySQL and will store more than 30,000 records per day. To optimize the DB I thought of using the Memcached library with it. Am I going the right way, or is there a better alternative for this optimization problem? I just want to provide fast retrieval and insertion. Can somebody advise which tool I should use and how, given that the data will gradually increase at a higher rate? Should I use object-relational mapping too?
You can use a master & slave setup for this purpose. Basically, it is a combination of two database servers: one dedicated to write operations (the master) and the other to read operations (the slave).
I'd side with @halfer and say he's right about the test data. At least you'll know that you're not trying to optimize something that doesn't need optimizing.
On top of test data you'll also need some test scenarios to mimic the traffic patterns of your production environment. That's the hard part, and it really depends on the exact application patterns: how many reads versus writes versus updates per second.
Given your number (30k records per day) you'd average well under one insert per second, which I'd assume even the cheapest machines could handle with ease. As for reads, a year's worth of data would be just under 11M records. You may want to partition the data (at the MySQL level or the application level) if lookups become slow, but I doubt you'd need to with such relatively small volumes. The real difference maker would be if the number of reads were 1000x the number of inserts; then you could look into what @ram sharma suggested and set up a replicated master-slave model where the master takes all the writes and the slaves are read-only.
Memcached is a powerful beast when used correctly and can turn a slow DB disk read into a blazing-fast memory read, but I'd still only suggest you look into it if the DB turns out to be too slow. Adding moving parts to any application also adds potential failure points and increases the overall complexity.
EDIT: as for using an ORM, that's your choice and it really won't change anything about the DB's speed, although it may add fractions of milliseconds for the end user... usually worth it, in my experience.
Cheers --
I need some info on this subject. I've searched around a bit but it seems that it really depends on your situation. My situation is explained below:
We have developed a system in which a company can keep track of its projects and financial situation. They can create orders, divide tasks between employees, send invoices, check whether they have been paid, etc.
Currently we have 1 domain with 1 database holding all the data for this company. We would like to use this system for other companies as well, but on 1 domain with the same files for every company, so we can maintain the files in 1 place and keep everything on our own server.
We want to use multiple databases for the following reasons:
We want the files to be in 1 place, easier to maintain and update
A client can't have access to another client's financial data by accident
We can make individual backups of clients' data
Downsides, in my opinion, are:
If something in a table needs to be updated, you have to do it manually in every database
Could MySQL get really slow with 100+ databases?
Am I correct, and are we doing the right thing by giving every company an individual database?
Thanks in advance!
There is technically no limit to the number of databases you can have. A brief search shows a few people have gone into the 1000+ database range, so I don't see a problem with 100+ databases.
We want the files to be in 1 place, easier to maintain and update
As you already mention under downsides, what if an update were to require a modification to the database schema? Having hundreds of databases would be far more problematic to maintain than a single database (with client indicator columns in the relevant tables).
A client can't have access to another client's financial data by accident
But clients can only access the data through your web app. If that becomes compromised, by accident or otherwise, what is to stop it accessing other databases any more than unintended records in the same database?
Views could provide similar security benefit (albeit currently with some performance cost). However, I tend to create stored procedures and force my apps to perform all database actions through them, wherein I can perform my own security checks whilst limiting all database access to only predefined operations.
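As a minimal sketch of that idea (the procedure, table and column names are made up; the real thing would mirror your own schema and authentication), a stored procedure can make it impossible for a query to cross client boundaries:

    DELIMITER //
    -- hypothetical procedure: the app passes the authenticated client's id,
    -- so the query can never reach another client's rows
    CREATE PROCEDURE get_client_invoices(IN p_client_id INT)
    BEGIN
        SELECT id, amount, paid, created_at
        FROM invoices
        WHERE client_id = p_client_id;
    END //
    DELIMITER ;

    -- the application user is granted EXECUTE on procedures only, not direct table access
    CALL get_client_invoices(42);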
We can make individual backups of clients' data
One could still make selective backups e.g. with SELECT ... INTO OUTFILE.
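For example (a sketch; the path, table and column names are placeholders):

    -- export one client's rows to a CSV file on the database server
    SELECT *
    FROM invoices
    WHERE client_id = 42
    INTO OUTFILE '/var/backups/client_42_invoices.csv'
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    LINES TERMINATED BY '\n';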
Sometimes it's better to use a separate database for each company: never in theory, but often in practice.
SQL commands are cleaner and easier to write.
It's more secure. The companies can't accidentally get access to each other's data if an SQL query or a script is faulty; you just have to select the database carefully.
Later, if the system gets too busy, it's very easy to move the databases onto multiple servers, whereas big databases or tables may be difficult to split.
The tables stay smaller, so queries are also faster.
I'm developing a system for bus ticket reservation. The provider has many routes and different trips. I've set up a rather comprehensive database that maps all this together, but I'm having trouble getting the pathing algorithm to work when it comes to cross-route reservations.
For example, if the user wants to go from Montreal to Sherbrooke, he'll only take what we call here Route #47. But if he goes to Sutton instead of Sherbrooke, he now has to transfer to Route #53 at some point.
Now, it isn't too hard to detect one and only one transfer. But when it comes to detecting what options he has for crossing multiple routes, I'm kinda scared. I've devised a cute and relatively efficient way to handle 1-3 hops using only SQL, but I'm wondering how I should organize all this on a much broader scale, as the client will probably not stay with 2 routes for the rest of his life.
Example of what I've thought of so far:
StartingStop
joins to Route
joins to StopsOfTheRoute
joins to TransfersOnThatStop
joins to TargetStopOfThatTransfer
joins to RouteOfThatStop
joins to StopsOfThatNewRoute
[wash rince repeat for more hops]
where StopsOfThatNewRoute = EndingStop
The problem is, if I have more than 3 hops, I'm sure my SQL server will choke rather quickly under the pressure. Even if I index my database correctly, I can easily predict a major failure eventually...
Thank you
My understanding of your problem: You are looking for an algorithm that will help you identify a suitable path (covering segments of one or more routes).
This is, as Jason correctly points out, a pathfinding problem. For this kind of problem, you should probably start by having a look at Wikipedia's article on pathfinding, and then dig into the details of Dijkstra's algorithm. That will probably get you started.
You will, however, most likely soon realise that your data model might pose a problem, both from a structure and a performance perspective. A typical example would be if you need to manage time constraints: which path has the shortest travel time, assuming you find several? One path might be shorter but only provide one ride per day, while another might be longer but provide several rides per day.
A possible way of handling this is to create a graph where each node corresponds to a particular stop at a particular time. An edge would connect this stop in space-time both to other geographical stops and to itself at the next point in time.
My suggestion would be to start by reading up on the pathfinding algorithms, then revisit your data model with regard to any constraints you might have. Then focus on the structure for storing the data and the structure for searching for paths.
A suggestion (not efficient, but it could work if you have enough RAM to spare): use the relational database server for storing the basics (stops, which routes are connected to which stops, and so on); it seems you have this covered already. Then build an in-memory representation of the graph, given the constraints that you have. You could probably build your own library for this fairly quickly (I am not aware of any such libraries for PHP).
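If you do want to keep the search inside MySQL for a while longer, recursive CTEs (available from MySQL 8.0) can express the repeated-join idea from the question with a variable number of hops instead of one join per hop. This is only a sketch over a hypothetical connections table (from_stop_id, to_stop_id, route_id) recording that a route links two stops; your actual model will differ:

    -- find which routes can chain from the starting stop to the destination, up to 4 hops
    WITH RECURSIVE paths (stop_id, hops, route_path) AS (
        SELECT 101, 0, CAST('' AS CHAR(200))      -- 101 = starting stop id (placeholder)
        UNION ALL
        SELECT c.to_stop_id,
               p.hops + 1,
               CONCAT(p.route_path, '->', c.route_id)
        FROM paths p
        JOIN connections c ON c.from_stop_id = p.stop_id
        WHERE p.hops < 4                           -- cap the number of hops
    )
    SELECT stop_id, hops, route_path
    FROM paths
    WHERE stop_id = 202;                           -- 202 = destination stop id (placeholder)

The hop cap keeps the recursion bounded; time constraints and shortest-path ranking are still easier to handle with a proper pathfinding algorithm as described above.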
Another alternative could be to use a graph database such as Neo4j and its REST interface. I guess this would require some significant redesign of your application.
Hope this gives you some helpful pointers.