For an online game, I have a table that contains all the plays, and some information on those plays, such as the difficulty setting:
+---------+---------+------------+------------+
| play-id | user-id | difficulty | timestamp  |
+---------+---------+------------+------------+
| 1       | abc     | easy       | 1335939007 |
| 2       | def     | medium     | 1354833214 |
| 3       | abc     | easy       | 1354833875 |
| 4       | abc     | medium     | 1354833937 |
+---------+---------+------------+------------+
In another table, after the game has finished, I store some stats related to that specific game, such as the score:
+---------+----------------+--------+
| play-id | type           | value  |
+---------+----------------+--------+
| 1       | score          | 201487 |
| 1       | enemies_killed | 17     |
| 1       | gems_found     | 4      |
| 2       | score          | 110248 |
| 2       | enemies_killed | 12     |
| 2       | gems_found     | 7      |
+---------+----------------+--------+
Now, I want to make a distribution graph so users can see what score percentile they are in. So I basically want the boundaries of the percentiles.
If it were at the level of individual scores, I could rank the scores and start from there, but it needs to be at the level of highscores. So mathematically, I would need to sort all users' highscores and then find the percentiles.
I'm in doubt about the best approach here.
On one hand, constructing an array that holds all the highscores seems like a performance-heavy thing to do, because it needs to cycle through both tables and match the scores to the users (the first table holds around 10M rows).
On the other hand, keeping a separate table with each user's highscore would make things easier, but it feels like it goes against the rule of avoiding data redundancy.
Another approach that came to mind was doing the performance-heavy computation once a week and keeping the result in a separate table, or running it on only a (statistically relevant) subset of the data.
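To make the performance-heavy computation concrete, this is roughly the query I have in mind (a sketch; I'm assuming the two tables are called plays and play_stats, and that the score sits in the stats table under type = 'score'):

SELECT p.`user-id`, MAX(s.value) AS highscore
FROM plays p
JOIN play_stats s ON s.`play-id` = p.`play-id`  -- hyphenated names need backticks in MySQL
WHERE s.type = 'score'
GROUP BY p.`user-id`
ORDER BY highscore;

The percentile boundaries could then be read off the sorted result, and the whole thing could be materialized into a summary table on a schedule.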
Or maybe I'm completely missing the point here and should use a completely different database setup?
What's the best practice here?
I have a question about designing an effective database structure for accounting codes. The result I'm expecting is this:
| ID | Code | Name                    | Parent |
| 1  | 1    | Assets                  |        |
| 2  | 1    | Tangible Fixed Assets   | 1      |
| 3  | 1    | Building                | 2      |
| 4  | 2    | Intangible Fixed Assets | 1      |
| 5  | 1    | CopyRights              | 4      |
I've been thinking about making 3 tables, such as tbl_lvl1 for the main parent, tbl_lvl2 for the first child, and tbl_lvl3 for the second child. I found out about recursive queries, which need only 1 table, but it's kind of difficult to write a recursive query in MySQL.
And the result I want to view in PHP is something like this:
| Code | Name                    |
| 1    | Assets                  |
| 11   | Tangible Fixed Assets   |
| 111  | Building                |
| 12   | Intangible Fixed Assets |
| 121  | CopyRights              |
Which structure should I use: 3 tables or 1 table? Thank you.
You're looking for a search tree, and I'd especially suggest a B-tree.
A search tree, generally speaking, allows you to hierarchically search for all sub-nodes in a single query through nested intervals.
There are literally dozens of implementations, so you don't need to dig deep into the details, even though I would suggest it, as it's a major data structure that you should be familiar with.
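For what it's worth, MySQL 8.0 added recursive CTEs, so the single-table adjacency list from the question can be walked and numbered in one query. A sketch, assuming a table accounts(id, code, name, parent) with a NULL parent for the root:

WITH RECURSIVE tree AS (
    -- anchor: the root node(s)
    SELECT id, CAST(code AS CHAR(20)) AS path, name
    FROM accounts
    WHERE parent IS NULL
    UNION ALL
    -- recursive step: append each child's code to its parent's path
    SELECT a.id, CONCAT(t.path, a.code), a.name
    FROM accounts a
    JOIN tree t ON a.parent = t.id
)
SELECT path AS code, name
FROM tree
ORDER BY path;

Sorting by path lexicographically yields exactly the 1, 11, 111, 12, 121 listing shown in the question.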
I'm running into some issues implementing a product_description table with languages.
My process is that I have a default table product_description_en to store descriptions, and when a client installs a new language (Chinese), the PHP script creates a new table product_des_ch and copies all the default data (from the English table) into the newly created table; the client can then update it.
My problems are:
1. Is it a security issue that we create tables dynamically while installing a new language?
2. If we use the same table for all languages (the records will be around 500,000), are there any performance issues?
3. What is the best way to store a large number of records, i.e. the same table or separate tables?
Thanks,
Az
Updated:
Here is a sample product_description table structure for English and Japanese. We store all the records in the same table, and when the client inserts a record for a different language we only insert new rows. Any feedback, please?
+------------+------+------+-----------+-----------+-----------+----------+
| product_id | name | desc | meta_name | meta_desc | key_words | lan_code |
+------------+------+------+-----------+-----------+-----------+----------+
| 1          | A    | D    | m1        | m_d1      | k1        | en       |
| 1          | A    | D    | m2        | m_d2      | k2        | jp       |
+------------+------+------+-----------+-----------+-----------+----------+
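Looking up one product in one language would then be a single query, for example (note that desc is a reserved word in MySQL and needs backticks):

SELECT name, `desc`, meta_name, meta_desc, key_words
FROM product_description
WHERE product_id = 1
  AND lan_code = 'jp';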
Basic RDBMS design wisdom would put a huge red flag on anything that dynamically alters the table structure. Relational databases are more than flexible enough to handle pretty much any situation without requiring such measures.
My suggestion for the structure would be to create a single Languages table to store the available languages, and then a Phrases table to store all the available phrases. Then use a Translations table to provide the actual translations of those phrases into the available languages. Something that might look like this:
Language
+----+---------+
| id | name    |
+----+---------+
| 1  | English |
| 2  | Chinese |
+----+---------+
Phrase
+----+-------------+
| id | label       |
+----+-------------+
| 1  | header      |
| 2  | description |
+----+-------------+
Translations
+-------------+-----------+-------------+
| language_id | phrase_id | translation |
+-------------+-----------+-------------+
| 1           | 1         | Header      |
| 1           | 2         | Description |
| 2           | 1         | 头          |
| 2           | 2         | 描述        |
+-------------+-----------+-------------+
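Fetching every phrase in a given language is then a single join, for example:

SELECT p.label, t.translation
FROM Translations t
JOIN Phrase p ON p.id = t.phrase_id
JOIN Language l ON l.id = t.language_id
WHERE l.name = 'Chinese';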
For small to medium-sized databases, there should be no performance issues at all, even using the default database configuration. If you get to huge sizes (where you are counting the database size in terabytes), you can optimize the database in many ways to keep performance acceptable.
Recently I have been planning a system that allows a user to customize and add to a web interface. The app could be compared to a quiz-creation system. The problem I'm having is how to design a schema that will allow for a "variable" number of additions to be made to the application.
The first option I looked into was just creating an object for the additions, serializing it, and putting it in its own column. The content wouldn't be edited often, so writes would be minimal; reads, however, would be very frequent (caching could be used to cut them down).
The other option was using something other than MySQL or PostgreSQL, such as Cassandra. I've never used other databases before, but I would be interested in learning how to use them if they would improve the design of the system.
Any input on the subject would be appreciated.
Thank you.
Edit 29/3/14:
Some information on the data being changed: for my idea above of using a serialized object, say that in the table I would store the name of the quiz, the number of points the quiz is worth, and then a column called quiz_data that would store the serialized object containing the information on the questions. So overall the object could look like this:
{
    "questions": [
        {
            "field_type": 1,
            "field_title": "What's your gender?",
            "options": ["Female", "Male"]
        },
        {
            "field_type": 2,
            "field_title": "What's your name?"
        }
    ]
}
The structure could vary, of course, but generally I would be storing an integer to determine the type of each field in the quiz, a field to hold the label for the field, and the options (if there are any) for that field.
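For reference, the serialized layout could look like this in MySQL (the native JSON column type requires MySQL 5.7+; table and column names here are just placeholders):

CREATE TABLE quizzes (
    id        INT AUTO_INCREMENT PRIMARY KEY,
    name      VARCHAR(255) NOT NULL,
    points    INT NOT NULL,
    quiz_data JSON NOT NULL  -- the serialized questions structure above
);

-- e.g. pull the title of the first question straight out of the blob
SELECT name, quiz_data->'$.questions[0].field_title' AS first_question
FROM quizzes;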
In this scenario I would advise looking at MongoDB.
However, if you want to work with MySQL, you can consider the entity-attribute-value (EAV) model in your design. The EAV model allows you to design for entries that contain a variable number of attributes.
Edit:
Following your update on the datatypes you would like to store, you could map your design as follows:
+-------------------------------------+
| QuizQuestions |
+----+---------+----------------------+
| id | type_id | question_txt |
+----+---------+----------------------+
| 1 | 1 | What's your gender? |
| 2 | 2 | What's your name? |
+----+---------+----------------------+
+-----------------------------------+
| QuestionTypes |
+----+--------------+---------------+
| id | attribute_id | description |
+----+--------------+---------------+
| 1 | 1 | Single select |
| 2 | 2 | Free text |
+----+--------------+---------------+
+----------------------------+
| QuestionValues |
+----+--------------+--------+
| id | question_id | value |
+----+--------------+--------+
| 1 | 1 | Male |
| 2 | 1 | Female |
+----+--------------+--------+
+-------------------------------+
| QuestionResponses |
+----+--------------+-----------+
| id | question_id | response |
+----+--------------+-----------+
| 1 | 1 | 1 |
| 2 | 2 | Fred |
+----+--------------+-----------+
This would then allow you to dynamically add various different questions (QuizQuestions), of different types (QuestionTypes), and then restrict them with different options (QuestionValues) and store those responses (QuestionResponses).
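Reading a question back out, together with its type and any preset options, is then one query (using the column names sketched above; the LEFT JOIN keeps free-text questions that have no preset values):

SELECT q.question_txt,
       qt.description AS question_type,
       qv.value       AS option_value
FROM QuizQuestions q
JOIN QuestionTypes qt ON qt.id = q.type_id
LEFT JOIN QuestionValues qv ON qv.question_id = q.id
WHERE q.id = 1;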
Say I wanted to add the functionality of logging user actions within a web application. My table schema would look similar to the following:
tbl_history:
+----+---------+-----------+
| id | user_id | action_id |
+----+---------+-----------+
| 1  | 1       | 1         |
| 2  | 1       | 2         |
| 3  | 2       | 2         |
+----+---------+-----------+
A user can generate many actions, so I will need to paginate this history. In order to do this I will need to figure out the total number of rows for the user, then calculate how many pages of data there should be.
Which method would be the most efficient if I had hundreds of users generating thousands of rows of data each day?
A)
Using MySQL's COUNT() function to query the number of rows in the tbl_history table for a particular user.
B)
Having another table which keeps a count of each user's rows in the tbl_history table.
+---------+---------------+
| user_id | history_count |
+---------+---------------+
| 1       | 2             |
| 2       | 1             |
+---------+---------------+
This would allow me to instantly get the total count of rows with a simple query in less than 1 ms.
The tradeoff is that I would need to perform extra queries to keep the count updated for each user, and then query it again on page load.
Which method is more efficient? Or is there another, better method? Any technical explanation would be great.
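For concreteness, here is roughly what each option would look like (tbl_history_count is a made-up name for the counter table in option B):

-- Option A: count on demand; an index on user_id keeps this cheap
CREATE INDEX idx_history_user ON tbl_history (user_id);
SELECT COUNT(*) FROM tbl_history WHERE user_id = 1;

-- Option B: keep the counter in sync with a trigger
CREATE TRIGGER trg_history_count
AFTER INSERT ON tbl_history
FOR EACH ROW
    UPDATE tbl_history_count
    SET history_count = history_count + 1
    WHERE user_id = NEW.user_id;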
Thanks in advance.
For a table with 100% reading (no writing), which structure is better and why?
[My table has many columns, but I've made an example here with 4 columns for simplicity]
Option 1: One table with multiple columns
ID | Length | Width | Height
----------------------------
1  | 10     | 20    | 30
2  | 100    | 200   | 300
Option 2: Two tables; one storing column headers, and other storing values
Table 1:
ID | Object_ID | Attribute_ID | Attribute_Value
-----------------------------------------------
1  | 1         | 1            | 10
2  | 1         | 2            | 20
3  | 1         | 3            | 30
4  | 2         | 1            | 100
5  | 2         | 2            | 200
6  | 2         | 3            | 300
Table 2:
ID | Name
-----------
1  | Length
2  | Width
3  | Height
Your second option is an under-optimized implementation of the EAV anti-pattern:
Entity-Attribute-Value Model
Why it's bad has already been argued to death on this site and elsewhere.
You'll get much better results from the first.
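To see why, compare what it takes to read a single object back in each layout (the table names below are assumed, since the question doesn't give any):

-- Option 1: one row, one read
SELECT Length, Width, Height
FROM objects
WHERE ID = 1;

-- Option 2: a pivot is needed just to reconstruct the same row
SELECT av.Object_ID,
       MAX(CASE WHEN a.Name = 'Length' THEN av.Attribute_Value END) AS Length,
       MAX(CASE WHEN a.Name = 'Width'  THEN av.Attribute_Value END) AS Width,
       MAX(CASE WHEN a.Name = 'Height' THEN av.Attribute_Value END) AS Height
FROM attribute_values av
JOIN attributes a ON a.ID = av.Attribute_ID
WHERE av.Object_ID = 1
GROUP BY av.Object_ID;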
I will preface this by saying that I'm a relative novice to SQL and database tables; that, however, doesn't mean that I don't know my basics.
Unless your example is heavily oversimplified, you really should use the first example. Not only will it be faster and easier to query, but it simply makes more sense.
In this example, you don't need to split your tables at all; your 'Attribute IDs' are adequately represented by the table headers. Further, these values have no real meaning by themselves, so they really don't need to be in another table.
You would generally break out a new table and reference it as you have if you had another object, existing separately, relating to your object with a one-to-many relationship.
Here's an example (actually from my database on an O'Reilly server) using blog entries and comments on blog entries:
mysql> select * from blog_entries;
+----+--------------+-------------+---------------------+
| id | poster | post | timestamp |
+----+--------------+-------------+---------------------+
| 1 | lunchmeat317 | blah blah | 0000-00-00 00:00:00 |
| 2 | Yongho Shin | yadda yadda | 0000-00-00 00:00:00 |
+----+--------------+-------------+---------------------+
2 rows in set (0.00 sec)
mysql> select id, blog_id, poster, post, timestamp from blog_comments;
+----+---------+--------------+----------------+---------------------+
| id | blog_id | poster | post | timestamp |
+----+---------+--------------+----------------+---------------------+
| 1 | 1 | lunchmeat317 | humina humina | 0000-00-00 00:00:00 |
| 2 | 1 | Joe Blow | huh? | 0000-00-00 00:00:00 |
| 3 | 2 | lunchmeat317 | yakk yakk yakk | 0000-00-00 00:00:00 |
| 4 | 2 | Yongho Shin | lol | 0000-00-00 00:00:00 |
+----+---------+--------------+----------------+---------------------+
4 rows in set (0.00 sec)
mysql>
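The comments are then tied back to their entries with a plain join, e.g.:

SELECT e.poster AS entry_author,
       c.poster AS commenter,
       c.post   AS comment
FROM blog_entries e
JOIN blog_comments c ON c.blog_id = e.id
ORDER BY e.id, c.id;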
Think about it from a logical perspective: there's no reason to artificially inject complexity into this design when it doesn't need to be there. In your example, length, width, and height aren't really separate objects; they're all related to the dimensions of the object you're describing in the table row. Further, length, width, and height only have one value at a given time.
I hope that made some sense - if I was a bit pedantic in my pedagogy, I apologize. However, if someone else stumbles on this question, hopefully this example will help them.
Good luck.
Edit: I just realized that your question was specifically about performance. That's a little more in-depth, and perhaps depends on the db engine you use? Generally, though, I would imagine that querying a single table without doing any joins would be slightly faster, considering that denormalization is a commonly-cited method of improving performance.