I'm building a very large website. It currently uses around 13 tables, and by the time it's done it should have about 20.
I came up with an idea to change the preferences table to use ID, Key, Value columns instead of many columns. However, I have recently thought I could also store other data in that table.
Would it be efficient / smart to store almost everything in one table?
Edit: Here is some more information. I am building a social network that may end up with thousands of users. MySQL Cluster will be used when the site is launched. For now I am testing on a development VPS, but everything will be moved to a dedicated server before launch. I know barely anything about NDB, so this should be fun :)
This model is called EAV (entity-attribute-value).
It is usable for some scenarios; however, it is less efficient due to larger records, a larger number of joins, and the impossibility of creating composite indexes on multiple attributes.
Basically, it's used when entities have lots of attributes which are extremely sparse (rarely filled) and/or cannot be predicted at design time, like user tags, custom fields etc.
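To make the join cost concrete, here is a minimal sketch comparing the two layouts. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table and column names (prefs_wide, prefs_eav, theme, lang) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conventional layout: one column per preference.
    CREATE TABLE prefs_wide (user_id INTEGER PRIMARY KEY, theme TEXT, lang TEXT);

    -- EAV layout: one row per (entity, attribute, value).
    CREATE TABLE prefs_eav (
        user_id INTEGER,
        attr    TEXT,
        value   TEXT,
        PRIMARY KEY (user_id, attr)
    );

    INSERT INTO prefs_wide VALUES (1, 'dark', 'en'), (2, 'light', 'en'), (3, 'dark', 'de');
    INSERT INTO prefs_eav VALUES
        (1, 'theme', 'dark'),  (1, 'lang', 'en'),
        (2, 'theme', 'light'), (2, 'lang', 'en'),
        (3, 'theme', 'dark'),  (3, 'lang', 'de');
""")

# Wide table: both conditions hit one row, and a composite index on
# (theme, lang) could serve the whole WHERE clause.
wide = conn.execute(
    "SELECT user_id FROM prefs_wide WHERE theme = 'dark' AND lang = 'en'"
).fetchall()

# EAV table: every additional attribute in the filter needs another self-join.
eav = conn.execute("""
    SELECT t.user_id
    FROM prefs_eav AS t
    JOIN prefs_eav AS l ON l.user_id = t.user_id
    WHERE t.attr = 'theme' AND t.value = 'dark'
      AND l.attr = 'lang'  AND l.value = 'en'
""").fetchall()

print(wide, eav)
```

Both queries find the same user, but the EAV version grows by one join per attribute you filter on.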
Granted, I don't know too much about large database designs, but from what I've seen, even extremely large applications store their data in a very small number of tables (20GB per table).
For me, I would rather have more info in one table, as it means that data is not littered everywhere and that I don't have to perform operations on multiple tables. Though one table also means messy (usually, for me, each object would have its own table, where an object is something you have in your application logic, like a User class or a BlogPost class).
I guess what I'm trying to say is: do whatever makes sense. Don't put information about the same thing in two different tables, and don't put information about two things in one table. Stick with one table describing only one kind of object (this is very difficult to explain, but if you do object-oriented programming, you should understand).
Nope. Preferences should be stored as they are (in the users table).
For example, private messages can't be stored in the users table ...
You don't have to think about joining different tables ...
I would first say that 20 tables is not a lot.
In general (it's hard to say from the limited info you give) the key-value model is not as efficient speed wise, though it can be more efficient space wise.
I would definitely not do this. Basically, the reason being if you have a large set of data stored in a single table you will see performance issues pretty fast when constantly querying the same table. Then think about the joins and complexity of queries you're going to need (depending on your site)... not a task I would personally like to undertake.
Using multiple tables splits the data into smaller sets, so the resources required for each query are lower, and as an extra bonus it's easier to program!
There are some applications for doing this, but they are rare: more or less when you have a large table with a ton of columns and most of them aren't going to have a value.
I hope this helps :-)
I think 20 tables in a project is not a lot. I do see your point and interest in using EAV but I don't think it's necessary. I would stick to tables in 3NF with proper FK relationships etc and you should be OK :)
The simple answer is that 20 tables won't make it a big DB, and MySQL won't need any optimization for that. So focus on clean DB structures and normalization instead.
I have an activity records table named revisions (shown in the following image), built for a big learning management system, which mainly keeps a record of CRUD operations on tables (e.g. who has done what on which object at what time).
This table may contain up to 3M records of data. I want to build a search functionality for this on the front-end with PHP/Laravel.
Now my question is: what should I consider when building high-performance search functionality for tables with millions of records? What matters at the code level and at the database level, and are there third-party tools that help with this kind of problem?
I am experienced with building systems with PHP/Laravel, Python/Django, Ruby, etc. But I have never encountered a case like this, dealing with millions of records. So please keep my knowledge/experience level in mind. I have NO experience at this level.
Note: The search will be an advanced search, letting users search by different criteria and parameters: the object which was changed, who changed it, when it was changed, etc.
Let me know if my question still isn't clear.
I would recommend taking a look at https://www.elastic.co/products/elasticsearch and saving your activity records to its storage whenever you save to the main database. Then you can easily search any field. Elasticsearch can store schema-free JSON documents; if you prefer a more SQL-like way, there is another search engine: http://sphinxsearch.com/.
There is no problem inserting a zillion rows into a table. Performance problems come when you try to do non-trivial SELECTs on the table. You mentioned "search"; you will have to limit what the 'users' can search for. But at least make a stab at what they might want to search for.
You mentioned "searching for an object", but I don't see a column called object. How many rows might there be for a given object? Do you need all the rows? Or selected ones? (An INDEX on object is likely to make the query efficient, regardless of table size.)
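As a rough sketch of the index suggestion, the snippet below builds a small revisions table and checks that a lookup by object uses the index. The column names (object_type, object_id, created_at and so on) are assumptions, since the real schema is not shown, and SQLite's EXPLAIN QUERY PLAN stands in for MySQL's EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical shape of the revisions table.
    CREATE TABLE revisions (
        id          INTEGER PRIMARY KEY,
        user_id     INTEGER,
        object_type TEXT,
        object_id   INTEGER,
        action      TEXT,
        created_at  TEXT
    );
    -- Composite index matching the most likely search: "history of one object".
    CREATE INDEX idx_rev_object ON revisions (object_type, object_id, created_at);
""")

sample = [
    (i % 50, "course" if i % 2 else "quiz", i % 1000, "update", f"2020-01-{i % 28 + 1:02d}")
    for i in range(10_000)
]
conn.executemany(
    "INSERT INTO revisions (user_id, object_type, object_id, action, created_at) "
    "VALUES (?, ?, ?, ?, ?)",
    sample,
)

query = """
    SELECT id, user_id, action, created_at
    FROM revisions
    WHERE object_type = ? AND object_id = ?
    ORDER BY created_at DESC
    LIMIT 20
"""

# The last column of each plan row is the human-readable description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("course", 7)).fetchall()
plan_text = " ".join(row[3] for row in plan)
rows = conn.execute(query, ("course", 7)).fetchall()

print(plan_text)
print(len(rows), "matching rows")
```

The point is the shape of the check rather than the numbers: decide which searches you will allow, add an index per search pattern, and confirm with the query plan that each one is actually used.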
Third-party software sometimes gets in the way of dealing with really large tables. Beware.
So I'm working on a site that will replace an older site with a lot of traffic, and I will also have a lot of data in the DB. So my question to you guys is: what is the best way to design MySQL tables for growth?
I was thinking of splitting, let's say, a table with 5,000,000 rows into 5 tables with 1,000,000 rows each and creating a relationship between the tables, but I guess this isn't a good option since I will spend a lot of resources and time figuring out which table my data is in.
Or can you guys give me some tips, maybe some useful articles?
No, you're absolutely right on the relationships. This technique is called normalization: you define separate tables because the individual tables change over time independently of the other tables.
So if you have a hotel database that keeps a track of rooms and guests, then you know normalization is necessary because rooms and guests are independent of each other.
But you will have foreign keys/surrogate keys in each table (for instance, room_id) that relate a particular guest to a particular room.
Normalization, in your case, could help you optimize those 5,000,000 rows of yours, as it would not be optimal for a loop to go over 5,000,000 elements and retrieve all of the data.
Here is a strong example for why normalization is essential in database management.
Partitioning as mentioned in a comment is one way to go, but the first path to check out is even determining if you can break down the tables with the large amounts of data into workable chunks based on some internal data.
For instance, let's say you have a huge table of contacts. You can essentially break down the data into contacts whose names start with a-d, e-j, etc. Then when you go to add records, you just make sure you add them to the correct table (I'd suggest checking out stored procedures for handling this, so that the logic is regulated in the database). You'd probably also set up stored procedures to get data from the same tables. By doing this, however, you have to realize that auto-incrementing IDs won't work correctly, as you won't be able to maintain unique IDs across all of the tables without doing some work yourself.
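The manual split described above could look roughly like this. It is only a sketch: the routing is done in Python rather than in stored procedures, sqlite3 stands in for MySQL, and the letter ranges, table names, and the id_seq counter table are all invented for the example.

```python
import sqlite3

# Hypothetical letter ranges; each one gets its own contacts table.
RANGES = [("a", "d"), ("e", "j"), ("k", "r"), ("s", "z")]


def table_for(name: str) -> str:
    """Pick the table a contact belongs to from the first letter of the name."""
    first = name.strip().lower()[:1]
    for lo, hi in RANGES:
        if lo <= first <= hi:
            return f"contacts_{lo}_{hi}"
    return "contacts_other"  # digits, punctuation, empty or non-ASCII names


conn = sqlite3.connect(":memory:")
# A single counter table hands out IDs, so they stay unique across all
# contact tables. This is the "work yourself" part of the approach.
conn.execute("CREATE TABLE id_seq (id INTEGER PRIMARY KEY AUTOINCREMENT)")
for lo, hi in RANGES:
    conn.execute(f"CREATE TABLE contacts_{lo}_{hi} (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE contacts_other (id INTEGER PRIMARY KEY, name TEXT)")


def add_contact(name: str):
    """Insert a contact into the correct table and return (table, id)."""
    new_id = conn.execute("INSERT INTO id_seq DEFAULT VALUES").lastrowid
    table = table_for(name)  # comes from a fixed list, never from user input
    conn.execute(f"INSERT INTO {table} (id, name) VALUES (?, ?)", (new_id, name))
    return table, new_id


first_insert = add_contact("Bob")
second_insert = add_contact("Sam")
print(first_insert, second_insert)
```

Every read path has to go through the same routing function, which is exactly the extra complexity the question was worried about.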
These of course are the simple solutions. There are tons of solutions for large data sets which also includes looking at other storage solutions, clustering, partitioning, etc. Doing some of these things manually yourself can give you a little bit of an understanding on some of the possibly "manual solutions".
I have a classifieds website, and I am thinking about redesigning the database a bit.
Currently I have 7 tables in the db. One table for each "MAIN CATEGORY".
For example, I have a "VEHICLES" table which holds all information about the following categories of classifieds:
cars
mc
mopeds/scooters
trucks
boats
etc etc
However, users on the website usually search in specific categories. For example, the user chooses the "cars" category to search in, and enters a keyword.
My code today will search the entire VEHICLES table for all records with the field "category" equal to "cars", and then get their details:
"SELECT * FROM vehicles WHERE category='cars' AND a lot of other conditions" // just for example, not tested
I am thinking about making a table now, for each of these "sub-categories".
I.e., one for cars, one for mc, one for trucks, etc., so that the search doesn't go through information which isn't needed.
Will this increase search speed? I ask because I have calculated that I will need at least 30 or so tables for this.
Thanks
With a properly indexed table and a "reasonable" number of rows, you will not gain much speed from this approach. Anything you gain in speed of execution you will lose in time-to-market because your programming will become more complicated.
Do not perform this optimization unless and until you encounter a performance problem in testing with a representative set of data.
It will increase the speed of a search within the same category. It will potentially slow down queries where you need aggregate information from the different categories. You need to decide which is the best option for your site.
How many records do you have in total in the vehicles table? It's quite likely that adding proper indexes will greatly increase the speed of your searches.
Check out the 'EXPLAIN' query option in MySQL. Understanding this will help you optimize your database a lot with indices.
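Here is a small sketch of that workflow: look at the plan, add an index, look again. It uses sqlite3 and its EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN, so the output format differs from what MySQL prints, and the vehicles columns and index name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vehicles (id INTEGER PRIMARY KEY, category TEXT, title TEXT, price INTEGER)"
)
categories = ["cars", "mc", "mopeds", "trucks", "boats"]
conn.executemany(
    "INSERT INTO vehicles (category, title, price) VALUES (?, ?, ?)",
    [(categories[i % 5], f"ad {i}", 1000 + i) for i in range(5000)],
)

query = "SELECT id, title, price FROM vehicles WHERE category = 'cars' AND price < 2000"


def plan(sql: str) -> str:
    """Return the planner's description of how it would run the query."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


before = plan(query)  # no usable index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_vehicles_cat_price ON vehicles (category, price)")
after = plan(query)   # the composite index now serves both conditions

print("before:", before)
print("after: ", after)
```

In MySQL the equivalent check is to prefix the query with EXPLAIN and look at which key, if any, is reported as used.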
Performance optimization is as much art as science, and to really understand what's the best option requires that you do some benchmarking; anyone offering a definitive answer given the available information is just wrong. That said, a few thoughts on your situation:
You don't say what type your category column is now, but if it's a string type, it's probably using more space than other options, thus making the table larger. Proper indexing can help tremendously with speed, but a larger table with larger indexes will always work to do just the opposite.
As already mentioned by someone else, your queries within a category will be faster in the simple case of a category search. How much faster depends on how much data you have in your current table, and the increases may be negated if you have to join in other tables to satisfy the need for all the other conditions to which you alluded. OTOH, it may actually speed things up in certain join cases (e.g., if you were doing self-joins with your all-encompassing table).
If you're working with a lot of data, splitting into multiple tables can greatly ease backups.
Splitting into multiple tables may also make it easier to shard your data across multiple servers for performance reasons. Similarly, it may make replication setups easier to keep running.
If you're tracking data that's category-specific, separate tables enable you to better normalize your database and likely reap some nice performance gains as a result of using much smaller tables.
Splitting obviously means modifying your code. If your code is of the old, creaky type, you may very well achieve a performance gain from the clean-up. Of course, there's also the risk that you'll break something....
Check your indexes. Bad indexes are a very common cause of poor performance but are relatively easy to fix with a bit of quality time spent on self-education. MySQL's EXPLAIN can tell you whether your queries are using the indexes, and the index stats (look in the docs) can tell you how efficiently your indexes are working.
Finally, speaking of code, check yours. Try experimenting with a few approaches, regardless of how the database is set up. For example, it may be quicker to do a couple of separate queries and join the results in code than to do the join in the database. Likewise, it's often quicker to do things like sorts in code, particularly in cases where a join or something means the database would have to create a temporary file/table. Again, check the EXPLAIN output, and if you can't eliminate a problem area in your queries, see if it helps to simplify the queries and do more work in the code. This can be particularly beneficial in the common case where the web server has more resources to spare than the database server.
There are many more factors to consider. Ultimately, though, the best way to make these decisions is not to spend time pondering theories but to put both methods to the test. Create some test databases and benchmark the sort of queries you'd run most often, with and without simulated load. You'll get your answer.
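A benchmark along those lines can be quite small. The sketch below times the same search against one indexed vehicles table and against a dedicated cars table. It uses an in-memory SQLite database with synthetic data, so the absolute numbers say nothing about MySQL; the table names, index names, and row counts are assumptions, and the value is in the method.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicles (id INTEGER PRIMARY KEY, category TEXT, price INTEGER);
    CREATE INDEX idx_vehicles_cat_price ON vehicles (category, price);
    CREATE TABLE cars (id INTEGER PRIMARY KEY, price INTEGER);
    CREATE INDEX idx_cars_price ON cars (price);
""")

categories = ["cars", "mc", "mopeds", "trucks", "boats"]
rows = [(categories[i % 5], (i * 37) % 10_000) for i in range(50_000)]
conn.executemany("INSERT INTO vehicles (category, price) VALUES (?, ?)", rows)
conn.executemany(
    "INSERT INTO cars (price) VALUES (?)",
    [(price,) for category, price in rows if category == "cars"],
)


def bench(sql: str, repeat: int = 200):
    """Run the query `repeat` times; return (last result, elapsed seconds)."""
    start = time.perf_counter()
    for _ in range(repeat):
        result = conn.execute(sql).fetchone()[0]
    return result, time.perf_counter() - start


one_table = bench("SELECT COUNT(*) FROM vehicles WHERE category = 'cars' AND price < 500")
split = bench("SELECT COUNT(*) FROM cars WHERE price < 500")

print(f"single table: {one_table[1]:.4f}s, split table: {split[1]:.4f}s")
```

Run it against a copy of your real data and your real queries before drawing conclusions; with a suitable index the two timings may well turn out close.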
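If you are using PHP, try something like this (mysqli is shown here because the old mysql_* functions were removed in PHP 7; $link is assumed to be an open mysqli connection):
$result = mysqli_query($link, $sql);
$rows = [];
while ($row = mysqli_fetch_assoc($result)) {
    $rows[] = $row; // buffer every row in memory first
}
and then loop over the buffered rows with a foreach statement:
foreach ($rows as $key => $value) {
    // write the table row ...
}
Maybe MySQL isn't slow and the problem is in the code.
Testing doesn't kill anyone =)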
I have a pretty large social network type site that I have been working on for about 2 years (high traffic and 100s of files). I have been experimenting for the last couple of years with tweaking things for max performance for the traffic, and I have learned a lot. Now I have a huge task: I am planning to completely re-code my social network, so I am redesigning the MySQL DBs and everything.
Below is a picture I made of a couple of MySQL tables that I have a question about. I currently have the login table, which is used in the login process; once a user is logged into the site, they very rarely need to hit that table again unless they are editing an email or password. I then have a user table, which is basically the user's settings and profile data for the site. This is where I have questions: would it be better for performance to split the user table into smaller tables? For example, if you view the user table you will see several fields that I have marked as "setting_"; should I just create a separate setting table? I also have fields marked with "count", which could be the total count of comments, photos, friends, mail messages, etc. So should I create another table to store just the total count of things?
The reason I have them all in one table now is that I was thinking it might be better to cut down on MySQL queries: instead of hitting 3 tables to get information on every page load, I could hit 1.
Sorry if this is confusing, and thanks for any tips.
Image: http://img2.pict.com/b0/57/63/2281110/0/800/dbtable.jpg
As long as you don't SELECT * FROM your tables, having 2 or 100 fields won't affect performance.
Just SELECT only the fields you're going to use and you'll be fine with your current structure.
"Should I just create a separate setting table?"
"So should I create another table to store just the total count of things?"
There is not a single correct answer for this, it depends on how your application is doing.
What you can do is to measure and extrapolate the results in a dev environment.
On the one hand, using a separate table will save you some space and the code will be easier to modify.
On the other hand, you may lose some performance (as you already suspect) by having to join information from different tables.
As for the counts, I think it's fine to have them there. Although it is often said that it is better to calculate this kind of thing, I don't think it will hurt you at all in this situation.
But again, the only way to know what's better for you and your specific app is to measure, profile, and find out what the benefit of doing so is. You would probably only gain a 2% improvement.
You'll need to compare performance testing results between the following:
Leaving it alone
Breaking it up into two tables
Using different queries to retrieve the login data and profile data (if you're not doing this already) with all the data in the same table
Also, you could implement some kind of caching strategy on the profile data if the usage data suggests this would be advantageous.
You should consider putting the counter columns and frequently updated timestamps in their own table: every time you bump them, the entire row is written.
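A minimal sketch of that split, assuming hypothetical table and column names (users, user_counters, comment_count, last_seen) and using sqlite3 in place of MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Wide, rarely changing profile and settings data.
    CREATE TABLE users (
        id            INTEGER PRIMARY KEY,
        name          TEXT,
        bio           TEXT,
        setting_theme TEXT
    );
    -- Narrow, frequently updated counters and timestamps.
    CREATE TABLE user_counters (
        user_id       INTEGER PRIMARY KEY REFERENCES users(id),
        comment_count INTEGER NOT NULL DEFAULT 0,
        friend_count  INTEGER NOT NULL DEFAULT 0,
        last_seen     TEXT
    );
    INSERT INTO users VALUES (1, 'alice', 'a long profile text ...', 'dark');
    INSERT INTO user_counters (user_id) VALUES (1);
""")


def bump_comment_count(user_id: int) -> None:
    """Only the small counters row is rewritten; the profile row is untouched."""
    conn.execute(
        "UPDATE user_counters "
        "SET comment_count = comment_count + 1, last_seen = datetime('now') "
        "WHERE user_id = ?",
        (user_id,),
    )


bump_comment_count(1)
bump_comment_count(1)

# Reading both together costs one primary-key join.
profile = conn.execute(
    "SELECT u.name, c.comment_count "
    "FROM users AS u JOIN user_counters AS c ON c.user_id = u.id "
    "WHERE u.id = ?",
    (1,),
).fetchone()
print(profile)
```

Whether the cheaper writes outweigh the extra join on reads depends on how often the counters change, so it is worth measuring on your own workload.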
I wouldn't consider your user table terribly large in number of columns; just my opinion. I also wouldn't break that table into multiple tables unless you can find a case for removing redundancy. Perhaps you have a lot of users who have the same settings; that would be a case for breaking the table out.
You should take into account the average size of a single row to find out whether retrieval is expensive. Also, try to use indexes when looking up data...
The most important thing is to design properly, not just to split because "it looks large". Maybe the IP or IPs could go somewhere else... depends on the data saved there.
Also, as the social network site using this data also handles the auth and authorization processes (I guess so), the separation between the login and user tables should offer good performance, because the data in login is "short enough", while the profile could be accessed only once, immediately after a successful login. Just do the right tricks to improve DB performance and it's done.
(Remember to visualize tables as entities, name them as an entity, not as a collection of them)
Two things you will want to consider when deciding whether or not to break up a single table into multiple tables are:
MySQL likes small, consistent datasets. If you can structure your tables so that they have fixed row lengths that will help performance at the potential cost of disk space. One thing that from what I can tell is common is taking fixed length data and putting it in its own table while the variable length data will go somewhere else.
Joins are in most cases less performant than not joining. If the data currently in your table will normally be accessed all at the same time then it may not be worth splitting it up as you will be slowing down both inserts and quite potentially reads. However, if there is some data in that table that does not get accessed as often then that would be a good candidate for moving out of the table for performance reasons.
I can't find a resource online to substantiate this next statement but I do recall in a MySQL Performance talk given by Jay Pipes that he said the MySQL optimizer has issues once you get more than 8 joins in a single query (MySQL 5.0.*). I am not sure how accurate that magic number is but regardless joins will usually take longer than queries out of a single table.
I'm building a site using PHP and MySQL that needs to store a lot of properties about users (for example their DOB, height, weight etc.), which is fairly simple (a single table with lots of properties, almost all of them required).
However, the system also needs to store other information, such as their spoken languages, instrumental abilities, etc. All in all there are over a dozen such characteristics. By default I assumed I would create a separate table (called maybe languages) and then a link table with a composite ID (user_id, language_id).
The problem I foresee, though, is when visitors attempt to search for users using these criteria. The dataset we're looking to use will have over 15,000 users at time of launch, and the primary function will be searching and refining users. That means hundreds of queries daily, and the prospect of using queries with up to a dozen or more JOINs in them is not appealing.
So my question is, is there an alternative that's going to be more efficient? One way I was thinking is storing the M2M values as a CSV of IDs in the user table and then running a LIKE query against it. I know LIKE isn't the best, but is it better than a join?
Any possible solutions will be much appreciated.
Do it with joins. Then, if your performance goals are not met, try something else.
Start with a normalized database (e.g. a languages table, linked to the users table by a mapping table) to make sure your data is represented cleanly and logically.
If you have performance problems, examine your queries and make sure you have suitable indexes.
If you dislike repeatedly coding up queries with many joins, define some views.
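If views are very slow to query, consider materialized views (MySQL has no built-in materialized views, so in practice this means a summary table that you refresh yourself).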
If you have several thousand records and a few hundred queries per day (really, that's pretty small and low-usage), these techniques will allow your site to run at full speed, with no compromise on data integrity. If you need to scale to many millions of records and millions of queries per day, even these techniques may not be enough; in which case, investigate caching and denormalization.
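The normalized layout and the view can be sketched as follows. sqlite3 is used in place of MySQL, and apart from languages, user_id and language_id, which come from the question, the table, view, and index names are invented for the example. The last query shows one reason the CSV-plus-LIKE idea is fragile.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE languages (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    -- Mapping table with a composite primary key.
    CREATE TABLE user_languages (
        user_id     INTEGER REFERENCES users(id),
        language_id INTEGER REFERENCES languages(id),
        PRIMARY KEY (user_id, language_id)
    );
    -- Second index for searches that start from the language side.
    CREATE INDEX idx_ul_language ON user_languages (language_id, user_id);

    -- The view hides the joins from application code.
    CREATE VIEW user_language_names AS
        SELECT u.id AS user_id, u.name AS user_name, l.name AS language
        FROM users AS u
        JOIN user_languages AS ul ON ul.user_id = u.id
        JOIN languages AS l ON l.id = ul.language_id;

    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO languages VALUES (1, 'English'), (11, 'Welsh');
    INSERT INTO user_languages VALUES (1, 1), (2, 11), (3, 1), (3, 11);

    -- The CSV alternative from the question, for comparison.
    CREATE TABLE users_csv (name TEXT, language_ids TEXT);
    INSERT INTO users_csv VALUES ('ann', '1'), ('bob', '11'), ('cy', '1,11');
""")

english = [
    row[0]
    for row in conn.execute(
        "SELECT user_name FROM user_language_names "
        "WHERE language = 'English' ORDER BY user_name"
    )
]

# Searching the CSV column for language id 1 also matches id 11.
like_hits = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM users_csv WHERE language_ids LIKE '%1%' ORDER BY name"
    )
]

print(english)
print(like_hits)
```

The join version returns only the users who actually speak English, while the LIKE version also picks up bob, whose only language has id 11. A leading-wildcard LIKE also cannot use an ordinary index, so it tends to get slower as the table grows.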