I have a large list of exercises that I am currently storing in a SQL table with a unique index value (int) in one column and the exercise name in the other column. I am trying to determine an efficient way of storing sets of exercises as workouts.
As of now, I have two ideas in mind but am not sure which is the most efficient (from both a storage and a speed standpoint). These are the ideas I am considering:
Solution 1: Create a new table with a column for every exercise, holding a 1 or 0 depending on whether or not that exercise is within a workout. Each workout would be a row, identified by a unique int id in an index column.
Solution 2: Create a new table whose index column represents the unique workouts, with a second column holding an array of the numbers corresponding to the exercises in that workout. What I prefer about this option is that it would allow me to preserve the order of exercises within the workout.
I currently only have a list of ~800 exercises and ~400 workouts, making (1) an array of size 800 x 400 whereas (2) would be 400 x 2. However, with the option to create new workouts (let's just say that the exercise list will be fixed), table (1) would grow significantly faster.
Does anyone have suggestions as to how to minimize size and maintain speed in such a context?
I like #2 but you could also, if you wanted a more robust solution, have 3 tables:
exercises
workouts
workout_exercises
In addition, I generally would not store the exercises per workout as an array. It would be difficult to use SQL to search for a specific exercise within that field, e.g.:
What workouts contain exercise XYZ?
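A minimal sketch of that three-table design, assuming a position column to preserve exercise order (table and column names here are illustrative, not prescribed):

CREATE TABLE exercises (
    exercise_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name        VARCHAR(100) NOT NULL
);

CREATE TABLE workouts (
    workout_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name       VARCHAR(100) NOT NULL
);

-- One row per exercise per workout; position preserves ordering
CREATE TABLE workout_exercises (
    workout_id  INT NOT NULL,
    exercise_id INT NOT NULL,
    position    INT NOT NULL,
    PRIMARY KEY (workout_id, exercise_id),
    FOREIGN KEY (workout_id)  REFERENCES workouts (workout_id),
    FOREIGN KEY (exercise_id) REFERENCES exercises (exercise_id)
);

-- "What workouts contain exercise XYZ?" becomes a simple join:
SELECT w.name
FROM workouts w
JOIN workout_exercises we ON we.workout_id = w.workout_id
JOIN exercises e ON e.exercise_id = we.exercise_id
WHERE e.name = 'XYZ';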
Your table sizes are actually tiny. When you have 10,000 records or 100,000 records you can start to think about size issues.
I'm attempting a redesign of the database schema for a website where we have a dynamic form filled out by users based on the category they fit into. The end result of this dynamic form is that I have 100+ columns in the database to store their answers, but in any given case fewer than 10 will actually have a value.
I'm looking to tighten up the database as much as possible and remove NULL values where possible, and one solution to the above problem that I am considering is to use an EAV (entity-attribute-value) table to hold the answers to the variable questions.
So table 1 would hold their name and a few other bits, and table 2 would have a row holding the question and answer for each answered question, with a foreign key linking back to table 1.
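A rough sketch of that structure (the table and column names are just illustrative):

-- Table 1: one row per submitted form
CREATE TABLE form_submissions (
    submission_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name          VARCHAR(255) NOT NULL
    -- ...a few other fixed bits
);

-- Table 2 (EAV): one row per answered question
CREATE TABLE form_answers (
    answer_id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    submission_id INT UNSIGNED NOT NULL,
    question      VARCHAR(255) NOT NULL,
    answer        TEXT,
    FOREIGN KEY (submission_id) REFERENCES form_submissions (submission_id)
);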
I've done some slapdash calculations: if they were to get 1,000 forms filled per day (a high estimate, but not impossible in the longer term), I'm looking at up to 2,600,000 rows in table 2 per year (1000 * 10 * 260 working days). My primary key will go up as far as 4 billion, so that isn't a problem, but I'm concerned that after a few years, with 5-10 million records, performance will seriously degrade.
I'm mostly concerned with performance. As far as structure goes, EAV is very much preferable, as it will allow my client to add new questions without having to interact with the database, and it helps keep my database schema much tighter. I'm not concerned about how this complicates my code if it is the right data solution.
I've got a MySQL InnoDB table containing about 2,000,000 rows with 10 fields (table "cars"). It'll keep increasing progressively at a current rate of about 500,000 rows a year. It's a busy table getting different types of queries on average 2-3 times a second, 24/7.
The situation right now is that I need to expand the information to include an INT field ("country_id"). But this field will be the default "1" for at least 99% of all rows.
My question is: Would there be any specific reasons to do either of the following solutions:
Add the INT field to the table and index it ("cars"."country_id")
Add a relational table ("car_countries") which includes the fields "car_id" and "country_id"
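In SQL terms, the two options would look roughly like this (a sketch only; the index and table names are just illustrative):

-- Solution 1: add the column directly to cars and index it
ALTER TABLE cars
    ADD COLUMN country_id INT NOT NULL DEFAULT 1,
    ADD INDEX idx_country (country_id);

-- Solution 2: a separate relational table, with a row only where needed
CREATE TABLE car_countries (
    car_id     INT NOT NULL PRIMARY KEY,
    country_id INT NOT NULL
);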
I set up both solutions in a test environment and ran a few thousand iterations of queries against the tables to find this out:
Database/table size will, due to the index, increase by 19% (~21 MB)
Queries will take on average 16% longer (0.37717 secs vs. 0.32431 secs for 1,000 queries each)
I've previously tried to keep tables filled with appropriate information for all fields, adding relational tables where non-mandatory information was needed, but now I've read there's little gain in this as long as there's no need to store arrayed data in the table (which MySQL doesn't handle, though PostgreSQL does). In my example a specific car will never be sold to two countries, so there will never be a need to add more countries to a specific car.
Almost everything is easier with solution 1, and disk space doesn't really matter. Should I still consider solution 2 anyway? If so, why?
Best regards,
/Thomas
The theoretical answer is that option 1 reflects your underlying relationships - a car can be sold to only one country, and therefore a "many to many" relationship (which option 2 suggests) is not appropriate. It would confuse future developers and pollute the data model.
The pragmatic answer is that option 2 doesn't appear to have a dramatic performance improvement today, and - crucially - it's likely to introduce complexity into your code. If 99% of the queries don't need the country data, you either have to write the query to include it (thus negating the performance benefit), or build nasty "if I need country THEN query = xxx ELSE query = yyy" logic.
Finally, apropos the indexing question - MySQL generally uses only one index per table in a query, so unless you're writing a query where "country" is in the WHERE clause or being joined on, the index is unlikely to have an impact.
Thanks to bwoebi, Raphaël Althaus, AgRizzo, Alfons and Ed Gibbs for the input to the question!
Short summary:
Since there can't be two countries to a car and there's only one extra field needed:
Go with Solution 1
Also, an index is probably not needed; check your cardinality and performance in your specific scenario
/Thomas
I have a table which stores highscores for a game. This game has many levels, where scores are ordered by score DESC (which is an index) and the level is a level ID. Would partitioning on this level ID column produce the same result as creating many separate level tables (one for each level ID)? I need to separate out the level data somehow, as I'm expecting tens of millions of entries. I hear partitioning could speed this process up, while leaving my tables normalised.
Also, I have an unknown number of levels in my game (levels may be added or removed at any time). Can I specify to partition on this level ID column and have new partitions automatically get created when a new (distinct) level ID is added to the highscore table? I may start with 10 separate levels but end up with 50, with all my data still kept in one table, but in many partitions? Do I have to index the level ID to make this work?
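For reference, here's roughly the kind of thing I have in mind (a sketch only; the highscores table and column names are from my own schema, and I'm not sure MySQL can add partitions automatically):

CREATE TABLE highscores (
    score_id BIGINT NOT NULL AUTO_INCREMENT,
    level_id INT    NOT NULL,
    user_id  INT    NOT NULL,
    score    INT    NOT NULL,
    PRIMARY KEY (score_id, level_id),      -- the partition column must appear in every unique key
    KEY idx_level_score (level_id, score)
)
PARTITION BY LIST (level_id) (
    PARTITION p_level1 VALUES IN (1),
    PARTITION p_level2 VALUES IN (2)
);

-- Presumably each new level would then need something like:
ALTER TABLE highscores ADD PARTITION (PARTITION p_level3 VALUES IN (3));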
Thanks in advance for your advice!
Creating an index on a single column is good, but creating an index that contains two columns would be a better solution based on the information you have given. I would run:
ALTER TABLE highscores ADD INDEX (columnScore, columnLevel);
This will make performance much better. From a database point of view, no matter what highscores you are looking for, the database will know where to search for them.
On that note, if you can (and you are using MyISAM tables), you could also run:
ALTER TABLE highscores ORDER BY columnScore, columnLevel;
which will then group all your data together, so that even though the database KNOWS where each bit is, it can find all the records that belong to one another nearby - which means less hard drive work - and therefore quicker results.
That second operation, too, can make a HUGE difference. My PC at work (a horrible old machine that was top of the range in the nineties) has a database with several million records in it that I built - nothing huge, about 2.5 GB of data including indexes - and performance was dragging, but ordering the data for the indexes improved query time from about 1.5 minutes per query to around 8 seconds. That's JUST down to hard drive speed in being able to get to all the sectors that contain the data.
If you plan to store data for different users, what about having two tables - one with all the information about the different levels, and another with one row per user along with his scores in XML/JSON?
I pull a range (e.g. limit 72, 24) of games from a database according to which have been voted most popular. I have a separate table for tracking game data, and one for tracking individual votes for a game (rating from 1 to 5, one vote per user per game). A game is considered "most popular" or "more popular" when that game has the highest average rating of all the rating votes for said game. Games with less than 5 votes are not considered. Here is what the tables look like (two tables, "games" and "votes"):
games:
gameid(key)
gamename
thumburl
votes:
userid(key)
gameid(key)
rating
Now, I understand that there is something called an "index" which can speed up my queries by essentially pre-querying my tables and constructing a separate table of indices (I don't really know... that's just my impression).
I've also read that mysql operates fastest when multiple queries can be condensed into one longer query (containing joins and nested select statements, I presume).
However, I am currently NOT using an index, and I am making multiple queries to get my final result.
What changes should be made to my database (if any -- including constructing index tables, etc.)? And what should my query look like?
Thank you.
Your query that calculates the average for every game could look like:
SELECT gamename, AVG(rating)
FROM games INNER JOIN votes ON games.gameid = votes.gameid
GROUP BY games.gameid
HAVING COUNT(*)>=5
ORDER BY AVG(rating) DESC
LIMIT 0,25
You must have an index on gameid on both games and votes. (If you have defined gameid as a primary key on table games, that is OK.)
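If the votes table doesn't already have one, something like this would do (the index name is arbitrary):

-- Index the join column on votes so the join above can use it
ALTER TABLE votes ADD INDEX idx_votes_gameid (gameid);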
According to the MySQL documentation, an index is created when you designate a primary key at table creation. This is worth mentioning because not all RDBMSs function this way.
I think you have the right idea here, with your "votes" table acting as a bridge between "games" and "user" to handle the many-to-many relationship. Just make sure that "userid" and "gameid" are indexed on the "votes" table.
If you have access to use InnoDB storage for your tables, you can create foreign keys on gameid in the votes table which will use the index created for your primary key in the games table. When you then perform a query which joins these two tables (e.g. ... INNER JOIN votes ON games.gameid = votes.gameid) it will use that index to speed things up.
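A sketch of what that could look like (the constraint name is arbitrary):

-- InnoDB only: the foreign key also ensures an index exists on votes.gameid
ALTER TABLE votes
    ADD CONSTRAINT fk_votes_game
    FOREIGN KEY (gameid) REFERENCES games (gameid);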
Your understanding of an index is essentially correct — it basically creates a separate lookup table which it can use behind the scenes when the query is executed.
When using an index it is useful to use the EXPLAIN syntax (simply prepend your SELECT with EXPLAIN to try this out). The output shows you the list of possible keys available for the query as well as which key the query actually uses. This can be very helpful when optimising your query.
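For example, using the query suggested earlier:

EXPLAIN
SELECT gamename, AVG(rating)
FROM games INNER JOIN votes ON games.gameid = votes.gameid
GROUP BY games.gameid
HAVING COUNT(*) >= 5
ORDER BY AVG(rating) DESC
LIMIT 0, 25;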
An index is a PHYSICAL DATA STRUCTURE used to help speed up retrieval-type queries; it's not simply a table upon a table, though that's a good working concept. Another useful analogy is the index at the back of a textbook (the main difference is that a book's search key can point to multiple pages/matches, whereas a unique index's search key points to only one page/match). An index is backed by a data structure, so you could use a B+ tree index, and there are even hash indexes. This is database/query optimisation at the physical/internal level of the database - I'm assuming you normally work at the higher levels of the DBMS, which is easier. Indexing is rooted in the internal levels, and that makes DB query optimisation much more effective and interesting.
I've noticed from your question that you have not yet developed the query. Focus on the query first. Indexing comes after; as a matter of fact, in any graduate or postgraduate database course, indexing falls under the maintenance of a database, not its development.
Also, N.B.: I have seen quite a few people say, as a rule, to make all primary keys indexes. This is not true. There are many instances where a primary key index would slow down the database. In fact, if we were to go with only primary-key indexes, we should use hash indexes, since they work better than B+ trees for exact-match lookups!
In summary, it doesn't make sense to ask one question for both a query and an index. Ask for help with the query first. Then, given your tables (relational schema) and SQL query - and only then - could I advise you on the best index. Remember, it's maintenance: we can't do maintenance if there has been no development.
Kind Regards,
N.B. Most questions concerning indexes at the postgraduate level of many computing courses are as follows: we give the students a relational schema (i.e. your tables) and a query, and then ask them to critically suggest a suitable index for the given query on the tables ----> we can't ask a question like this if they don't have a query.
I'm looking to create an SQL query (in MySQL) that will display 6 random, yet popular entries in my web application.
My database has the following tables:
favorites
submissions
submissions_tags
tags
users
submissions_tags and tags are cross-referencing tables that give each submission a certain number of tags.
submissions contains boolean featured, int downloads, and int views, all three of which I'd like to use to weight this query with.
The favorites table is again a cross-reference table with the fields submission_id and user_id. Counting the number of times each submission has been favorited would be good to weigh the results with.
So basically I want to select 6 random rows weighted with these four variables - featured, downloads, views, and favorite count. Each time the user refreshes the page, I want a new random 6 to be selected. So maybe the query could limit it to the 12 most recent but only pluck 6 random results out to show. Is that a sensible idea in terms of processing etc.?
So my question is, how can I go about writing this query? Where should I begin? I am using PHP/CodeIgniter to drive this site with. Is it possible to get the entire lot in one query, or will I have to use multiple queries to do this? Or, do I need to simplify my ideas?
Thanks,
Jack
I've implemented something similar to this before. The route I took was to have a script run on the server every XX minutes to fill a table with a pool of items (say 20-30 items). The query used in your application would then randomly pick 5 or so from that table.
You just need to set up an algorithm to select those 20-30 items. @Emmerman's is similar to what I used before to calculate a popularity_number, where I took weights of multiple associations to the item (views, downloads, etc.) to get an overall number. We also used an age so the pool of items stayed up to date. You'll have to tinker with the algorithm over time to make sure the relevant items are being populated.
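A minimal sketch of that pool approach; the popular_pool table, the weights, and the submissions primary key (s.id) are all assumptions based on the tables described in the question:

-- Periodic job (e.g. cron every XX minutes): rebuild a small pool of popular items
TRUNCATE TABLE popular_pool;
INSERT INTO popular_pool (submission_id, popularity)
SELECT s.id,
       s.featured * 50 + s.downloads * 2 + s.views * 0.1 + COUNT(f.user_id) * 5
FROM submissions s
LEFT JOIN favorites f ON f.submission_id = s.id
GROUP BY s.id
ORDER BY 2 DESC        -- highest popularity first
LIMIT 30;

-- Per-request query: ORDER BY RAND() is cheap here because the pool is tiny
SELECT submission_id FROM popular_pool ORDER BY RAND() LIMIT 6;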
The idea is to calculate some popularity score, which could be, for example:
popularity = featured*W1 + downloads*W2 + views*W3 + fcount*W4
Where W1-W4 are constant weights.
Then add some random number to the popularity and sort by it.
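Untested, but a single query along those lines might look like this (the hard-coded weights, the RAND() scale, and the submissions primary key s.id are guesses based on the question):

SELECT s.id,
       (s.featured  * 50              -- W1
      + s.downloads * 2               -- W2
      + s.views     * 0.1             -- W3
      + COUNT(f.user_id) * 5          -- W4 (favorite count)
      + RAND() * 100) AS popularity   -- random jitter so each refresh differs
FROM submissions s
LEFT JOIN favorites f ON f.submission_id = s.id
GROUP BY s.id
ORDER BY popularity DESC
LIMIT 6;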