For a table with 100% reading (no writing), which structure is better and why?
[My table has many columns, but I've made an example here with 4 columns for simplicity]
Option 1: One table with multiple columns
ID | Length | Width | Height
-----------------------------------------
1 | 10 | 20 | 30
2 | 100 | 200 | 300
Option 2: Two tables; one storing column headers and the other storing values
Table 1:
ID | Object_ID | Attribute_ID | Attribute_Value
------------------------------------------
1 | 1 | 1 | 10
2 | 1 | 2 | 20
3 | 1 | 3 | 30
4 | 2 | 1 | 100
5 | 2 | 2 | 200
6 | 2 | 3 | 300
Table 2:
ID | Name
-------------------
1 | Length
2 | Width
3 | Height
Your second option is an under-optimized implementation of the EAV anti-pattern:
Entity-Attribute-Value Model
Why it's bad has already been argued to death on this site and elsewhere.
You'll get much better results from the first.
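To make the cost concrete, here is a hypothetical sketch of what just reading one object back out of Option 2 looks like, versus Option 1 (I've renamed Table 1 and Table 2 to attribute_values and attributes, and assumed an objects table for Option 1, since the question doesn't name them):

-- EAV read: one conditional aggregate per attribute, plus a join
-- to resolve attribute names back into column headers.
SELECT v.Object_ID,
       MAX(CASE WHEN a.Name = 'Length' THEN v.Attribute_Value END) AS Length,
       MAX(CASE WHEN a.Name = 'Width'  THEN v.Attribute_Value END) AS Width,
       MAX(CASE WHEN a.Name = 'Height' THEN v.Attribute_Value END) AS Height
FROM attribute_values v
JOIN attributes a ON a.ID = v.Attribute_ID
GROUP BY v.Object_ID;

-- The equivalent read against Option 1 is a single trivial scan:
SELECT ID, Length, Width, Height FROM objects;

Every new attribute means editing every such query, and the optimizer can't use per-column types or indexes.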
I will preface this by saying that I'm a relative novice to SQL and database tables; that, however, doesn't mean that I don't know my basics.
Unless your example is heavily oversimplified, you really should use the first example. Not only will it be faster and easier to query, but it simply makes more sense.
In this example, you don't need to split your tables at all; your 'Attribute IDs' are adequately represented by the table headers. Further, these values have no real meaning by themselves, so they really don't need to be in another table.
You would generally break out a new table and reference it as you have if you had another object, existing separately, relating to your object with a one-to-many relationship.
Here's an example (actually from my database on an O'Reilly server) using blog entries and comments on blog entries:
mysql> select * from blog_entries;
+----+--------------+-------------+---------------------+
| id | poster | post | timestamp |
+----+--------------+-------------+---------------------+
| 1 | lunchmeat317 | blah blah | 0000-00-00 00:00:00 |
| 2 | Yongho Shin | yadda yadda | 0000-00-00 00:00:00 |
+----+--------------+-------------+---------------------+
2 rows in set (0.00 sec)
mysql> select id, blog_id, poster, post, timestamp from blog_comments;
+----+---------+--------------+----------------+---------------------+
| id | blog_id | poster | post | timestamp |
+----+---------+--------------+----------------+---------------------+
| 1 | 1 | lunchmeat317 | humina humina | 0000-00-00 00:00:00 |
| 2 | 1 | Joe Blow | huh? | 0000-00-00 00:00:00 |
| 3 | 2 | lunchmeat317 | yakk yakk yakk | 0000-00-00 00:00:00 |
| 4 | 2 | Yongho Shin | lol | 0000-00-00 00:00:00 |
+----+---------+--------------+----------------+---------------------+
4 rows in set (0.00 sec)
mysql>
Think about it from a logical perspective: there's no reason to artificially inject complexity into this design when it doesn't need to be there. In your example, length, width, and height aren't really separate objects; they're all related to the dimensions of the object you're describing in the table row. Further, length, width, and height each have only one value at a given time.
I hope that made some sense - if I was a bit pedantic in my pedagogy, I apologize. However, if someone else stumbles on this question, hopefully this example will help them.
Good luck.
Edit: I just realized that your question was specifically about performance. That's a little more in-depth and may depend on the DB engine you use. Generally, though, I would expect querying a single table without any joins to be slightly faster, considering that denormalization is a commonly cited method of improving read performance.
For an online game, I have a table that contains all the plays, and some information on those plays, like the difficulty setting etc.:
+---------+---------+------------+------------+
| play-id | user-id | difficulty | timestamp |
+---------+---------+------------+------------+
| 1 | abc | easy | 1335939007 |
| 2 | def | medium | 1354833214 |
| 3 | abc | easy | 1354833875 |
| 4 | abc | medium | 1354833937 |
+---------+---------+------------+------------+
In another table, after the game has finished, I store some stats related to that specific game, like the score etc:
+---------+----------------+--------+
| play-id | type | value |
+---------+----------------+--------+
| 1 | score | 201487 |
| 1 | enemies_killed | 17 |
| 1 | gems_found | 4 |
| 2 | score | 110248 |
| 2 | enemies_killed | 12 |
| 2 | gems_found | 7 |
+---------+----------------+--------+
Now, I want to make a distribution graph so users can see in what score percentile they are. So I basically want the boundaries of the percentiles.
If it were at the score level, I could rank the scores and start from there, but it needs to be at the highscore level (each user's best score). So mathematically, I would need to sort all users' highscores and then find the percentiles.
I'm in doubt about the best approach here.
On one hand, constructing an array that holds all the highscores seems like a performance-heavy thing to do, because it needs to cycle through both tables and match scores to users (the first table holds around 10M rows).
On the other hand, keeping a separate table with each user's highscore would make things easier, but it feels like it goes against the rules of avoiding data redundancy.
Another approach that came to mind was doing the performance-heavy computation once a week and keeping the result in a separate table, or doing it on only a (statistically relevant) subset of the data.
Or maybe I'm completely missing the point here and should use a completely different database setup?
What's the best practice here?
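For what it's worth, the "do the heavy work once and store it" idea from the question could be sketched like this (the table and column names plays, play_stats, and user_highscores are my own guesses, since the question doesn't name the tables; NTILE requires MySQL 8+):

-- Materialize each user's highscore once, e.g. from a weekly job.
CREATE TABLE user_highscores AS
SELECT p.`user-id` AS user_id, MAX(s.value) AS highscore
FROM plays p
JOIN play_stats s ON s.`play-id` = p.`play-id`
WHERE s.type = 'score'
GROUP BY p.`user-id`;

-- Percentile boundaries over the materialized highscores.
SELECT bucket,
       MIN(highscore) AS lower_bound,
       MAX(highscore) AS upper_bound
FROM (
  SELECT highscore, NTILE(100) OVER (ORDER BY highscore) AS bucket
  FROM user_highscores
) ranked
GROUP BY bucket;

The percentile query then runs against one row per user instead of 10M play rows, which is the usual trade: accept some redundancy and refresh it on a schedule.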
Hello, I'm having a hard time with this task. The problem is that I'm not sure how to proceed, and I couldn't find tutorials or information about this type of task.
The situation: I have two tables plus one connecting table between them. With a regular query, what gets displayed is a known table header followed by the data. In my case I have to build the result both horizontally and vertically, since the header values are unknown.
Here is an example of the DB:
Clients:
+----+--------+
| ID | client |
+----+--------+
| 1  | Sony   |
| 2  | Dell   |
+----+--------+
Users:
+----+---------+------------+
| ID | name    | department |
+----+---------+------------+
| 1  | John    | 1          |
| 2  | Dave    | 2          |
| 3  | Michael | 1          |
| 4  | Rich    | 3          |
+----+---------+------------+
Time:
+----+------+----------+----------+------------+
| ID | user | clientid | time     | date       |
+----+------+----------+----------+------------+
| 1  | 1    | 1        | 01:00:00 | 2017-01-02 |
| 2  | 2    | 2        | 02:00:00 | 2017-01-02 |
| 3  | 1    | 2        | 04:00:00 | 2017-02-02 |  -> not selected, since the date is different
| 4  | 4    | 1        | 02:00:00 | 2017-01-02 |
| 5  | 1    | 1        | 02:00:00 | 2017-01-02 |
+----+------+----------+----------+------------+
Result Table
+--------+---------+---------+---------+---------+
| Client | John    | Michael | Rich    | Dave    |
+--------+---------+---------+---------+---------+
| Sony   | 3:00:00 | 0       | 2:00:00 | 0       |
| Dell   | 0       | 0       | 0       | 2:00:00 |
+--------+---------+---------+---------+---------+
The first table, Clients, contains information about clients.
The second table, Users, contains information about users.
The third table, Time, contains rows of time that each user has dedicated to different clients from the Clients table.
So my goal is to write a SQL query that produces the Result table; in other words, it selects the sum of hours that every user has completed for a certain client. The number of clients and users is unknown. So the first thing that has to happen is selecting all users, whether or not they have completed any hours. After that, it has to select each client and, for that client, the sum of hours realized by each individual user.
The problem is that I don't know how to approach this. Do I first have to run one query selecting all the users and loop over them for the table header, and then run a second query selecting the hours and loop over the body content? Or can this be made into a single query that renders the whole table?
The filters for the SELECT command are:
WHERE MONTH(`date`) = '$month'
AND YEAR(`date`) ='$year'
AND u.department = '$department'
The query selecting a single row for the time SUM is:
(SELECT SUM( TIME_TO_SEC( `time` ) ) FROM Time tm
 WHERE tm.clientid = c.id AND MONTH(`date`) = '$month' AND YEAR(`date`) = '$year')
This is the query to select the times for a user. By my logic it might be transformed with GROUP BY c.id (the client id), but the problem is that it would also need another WHERE clause specifying the user, which is unknown. If the number of users were known, for example 5, there would be no problem making 5 subqueries, one per user: WHERE u.id = 1, 2, 3, etc.
So these are the two major problems: how to render the user header, and how, in the same query, to select the sum of hours for each client corresponding to each user.
Check out the Result table above; I hope it makes things clearer.
Any suggestion or answer that helps resolve this situation would be very welcome.
Thank you!
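One common way around the unknown-column problem is to build the pivot statement dynamically inside MySQL itself, so a single round trip renders the whole table. The following is only a sketch: it uses the table and column names from the question, hard-codes the month/year where the $month/$year placeholders would go, and omits the department filter.

-- Build one conditional aggregate per user, since the user list is unknown.
SELECT GROUP_CONCAT(
         CONCAT("SEC_TO_TIME(SUM(CASE WHEN u.name = '", name,
                "' THEN TIME_TO_SEC(t.`time`) ELSE 0 END)) AS `", name, "`")
       )
INTO @cols
FROM Users;

-- Assemble and execute the full pivot query.
SET @sql = CONCAT(
  'SELECT c.client, ', @cols,
  ' FROM `Time` t
    JOIN Users u   ON u.ID = t.user
    JOIN Clients c ON c.ID = t.clientid
   WHERE MONTH(t.`date`) = 1 AND YEAR(t.`date`) = 2017
   GROUP BY c.client');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

Users with no time rows still get a column (showing 00:00:00), because the column list is generated from the Users table rather than from the Time rows.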
I have the following table structure
+-------+------------+-----------+---------------+
| id |assigned_to | status | group_id |
+-------+------------+-----------+---------------+
| 1 | 1001 | 1 | 19 |
+-------+------------+-----------+---------------+
| 2 | 1001 | 2 | 19 |
+-------+------------+-----------+---------------+
| 3 | 1001 | 1 | 18 |
+-------+------------+-----------+---------------+
| 4 | 1002 | 2 | 19 |
+-------+------------+-----------+---------------+
| 5 | 1002 | 2 | 19 |
+-------+------------+-----------+---------------+
I would like to get the information in the following format
+-------+------------+-----------+
| | 1001 | 1002 |
+-------+------------+-----------+
| 1 | 1 | 0 |
+-------+------------+-----------+
| 2 | 1 | 2 |
+-------+------------+-----------+
So basically I am looking to use the assigned_to field for the column names, with the rows representing the status values. For example, the table has two rows where user 1002 has a status of 2, so the count of 2 is shown in that particular status row.
Please note that group_id must be 19, which is why I left out the row with id 3 (its group_id is 18).
Can someone point me in the right direction? I'm sure there is a name for this type of query, but I can't for the life of me put it into words. I have tried various other queries, but none of them even come close to this.
Marc B is right: there is no way to pivot a table (i.e., to convert the contents of a field into columns) unless you make some assumptions, like supposing that the values of assigned_to are somewhat fixed.
On the other hand, this is the kind of problem that can be solved by a program. It is not an easy program, but it can do the job.
I recently made a similar program in Java; if you are interested, I can post the core of it here.
You might want to read this article: http://www.artfulsoftware.com/infotree/qrytip.php?id=523
It'd be something like
SELECT
    status,
    COUNT( CASE assigned_to WHEN '1001' THEN 1 END ) AS '1001',
    COUNT( CASE assigned_to WHEN '1002' THEN 1 END ) AS '1002'
FROM your_table
WHERE group_id = 19
GROUP BY status;
or something like that (I haven't tested this code).
In the article he does it using SUM(); COUNT() also works as long as the CASE returns NULL for non-matching rows, since COUNT() ignores NULLs. You also need the WHERE constraint on the group_id.
Hope this helps
I'm displaying a record set using Datatables pulling records from two tables.
Table A
sno      | item_id | start_date | end_date   | created_on |
============================================================
10523563 | 2       | 2013-10-24 | 2013-10-27 | 2013-01-22 |
10535677 | 25      | 2013-11-18 | 2013-11-29 | 2013-01-22 |
10587723 | 11      | 2013-05-04 | 2013-05-24 | 2013-01-22 |
10598734 | 5       | 2013-06-14 | 2013-06-22 | 2013-01-22 |
Table B
id | item_name         |
========================
2  | Timesheet testing |
25 | Vigour            |
11 | Fabwash           |
5  | Cruise            |
Now, since the number of records returned is going to become large in the near future, I want the processing to be done server-side. I've managed to achieve that, but it came at a cost: I'm running into a problem with filters.
In the original screenshot (not reproduced here), (1) is the column whose value is an int (item_id); with some small modifications inside the while loop over the MySQL result, I'm displaying the corresponding string from Table B.
Now if I use filter (2), it works fine, since those values come from Table A.
The Problem
When I try to filter on field (3), entering a string value such as fab says no record found, but entering an int such as 11 returns a single row with Fabwash as the item name.
So while filtering, I'm forced to use the raw value stored in Table A rather than its corresponding string value from Table B. I hope the point I'm putting across is understandable; it is hard to explain in words.
I'm clueless on how to solve the issue.
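In case it helps anyone who lands here: the usual fix is to join Table B inside the server-side query, so the DataTables search term is matched against the human-readable name rather than the raw id. A minimal sketch, assuming the table names from the question (written as TableA/TableB) and a bound ? placeholder for the search term:

-- Filter on the joined item_name instead of the raw item_id.
SELECT a.sno, b.item_name, a.start_date, a.end_date, a.created_on
FROM TableA a
JOIN TableB b ON b.id = a.item_id
WHERE b.item_name LIKE CONCAT('%', ?, '%');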
I have to deal with queries that have lots of results, but I only show them in sets of 20-30 rows.
Then I use the SetLimits() method from the php API.
But I need to know the total number of results in order to calculate the number of pages (or sets of results).
The only way I can do this right now is to pull all the results by setting the limit to 10000000 and look at the 'total' key of the array returned by Sphinx, but this isn't good: I only need the count, and I don't want Sphinx to build a huge array with all the IDs.
Performing a SELECT COUNT() query in MySQL won't work, because the data indexed in Sphinx is always different.
Any ideas?
Doesn't SphinxClient::query return data about how many records matched your request?
As I understand it, "total" is the number of entries returned by this request (affected by SetLimits), while "total_found" is the total number of results matching the query (not affected by SetLimits).
According to the manual (SphinxClient::setLimits), this should do the trick:
$cl->SetLimits(0,0);
I'm not a Sphinx developer, so this is just a blind guess... it should avoid memory overflow with a large number of results.
Let me know whether it works, so I can remove this answer if it's not correct.
I've also found that SELECT COUNT() doesn't work in a Sphinx query, so you're right about that.
Also, according to the Sphinx documentation, you can retrieve the number of results using a SHOW META query.
SHOW META
SHOW META shows additional meta-information about the latest query such as query time and keyword statistics:
mysql> SELECT * FROM test1 WHERE MATCH('test|one|two');
+------+--------+----------+------------+
| id | weight | group_id | date_added |
+------+--------+----------+------------+
| 1 | 3563 | 456 | 1231721236 |
| 2 | 2563 | 123 | 1231721236 |
| 4 | 1480 | 2 | 1231721236 |
+------+--------+----------+------------+
3 rows in set (0.01 sec)
mysql> SHOW META;
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total | 3 |
| total_found | 3 |
| time | 0.005 |
| keyword[0] | test |
| docs[0] | 3 |
| hits[0] | 5 |
| keyword[1] | one |
| docs[1] | 1 |
| hits[1] | 2 |
| keyword[2] | two |
| docs[2] | 1 |
| hits[2] | 2 |
+---------------+-------+
12 rows in set (0.00 sec)
References:
Sphinx: SHOW META syntax
SphinxClient::setLimits
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM information_schema.GLOBAL_STATUS WHERE
VARIABLE_NAME LIKE 'SPHINX_TOTAL_FOUND';
For more info:
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM information_schema.GLOBAL_STATUS WHERE
VARIABLE_NAME LIKE 'SPHINX_%';