I have a table with all my invoice items as packages:
Table: invoice_items
invoice_item_id | package_id | addon_1 | addon_2 | addon_3 | ...
----------------|------------|---------|---------|---------|
1 | 6 | 2 | 5 | 3 |
Then my other table:
Table: addons
addon_id | addon_name | addon_desc |
----------|--------------|--------------------------|
1 | Dance Lights | Brighten up the party... |
2 | Fog Machine | Add some fog for an e... |
Instead of taking up space storing the addon name in my invoice_items table, I'd like to just include the addon_id in the addon_1, addon_2, etc columns.
How do I then get the name of the addon when doing a query for invoice_item rows?
Right now I just have it programmed into the page that if addon_id == 1, echo "Dance Lights", etc but I'd like to do it in the query. Here is my current query:
$invoice_items_SQL = "
SELECT invoice_items.*, packages.*
FROM `invoice_items`
INNER JOIN packages ON invoice_items.package_id = packages.package_id
WHERE `event_id` = \"$event_id\"
";
So I'm able to do this with packages, but only because there's just one package_id per row, but there are up to 9 addons :(
The most direct way of doing it is to join onto the table multiple times. That's a bit naff though because you'll write almost the same thing 9 times.
Another, better way would be to restructure your tables: you need another table with two data columns, invoice_item_id and addon_id. You then need either an auto-increment primary key column, or a composite primary key made of those two columns. This is a many-to-many junction table.
From there you can query without having 9 repetitive joins, but you will get one row per invoice item for every addon it has (so if it has three addons it will appear three times in the results). You can then use GROUP_CONCAT together with GROUP BY to concatenate the names of the addons into a single field, so that you only get one row per invoice item.
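A minimal sketch of the junction-table approach described above. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the table name invoice_item_addons and the sample rows are assumptions made for illustration. In MySQL the aggregate would be written GROUP_CONCAT(a.addon_name SEPARATOR ', ').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE addons (addon_id INTEGER PRIMARY KEY, addon_name TEXT);
CREATE TABLE invoice_items (invoice_item_id INTEGER PRIMARY KEY, package_id INTEGER);
-- Hypothetical junction table: one row per (invoice item, addon) pair.
CREATE TABLE invoice_item_addons (
    invoice_item_id INTEGER NOT NULL,
    addon_id INTEGER NOT NULL,
    PRIMARY KEY (invoice_item_id, addon_id)
);
INSERT INTO addons VALUES (1, 'Dance Lights'), (2, 'Fog Machine');
INSERT INTO invoice_items VALUES (1, 6), (2, 6);
INSERT INTO invoice_item_addons VALUES (1, 1), (1, 2), (2, 2);
""")

# One row per invoice item, with all of its addon names in a single column.
rows = conn.execute("""
SELECT ii.invoice_item_id, ii.package_id,
       GROUP_CONCAT(a.addon_name, ', ') AS addon_names
FROM invoice_items ii
LEFT JOIN invoice_item_addons iia ON iia.invoice_item_id = ii.invoice_item_id
LEFT JOIN addons a ON a.addon_id = iia.addon_id
GROUP BY ii.invoice_item_id
ORDER BY ii.invoice_item_id
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOINs keep invoice items that have no addons at all; their addon_names column is then NULL.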
When I started designing my application's database schema a few months ago, I was told not to store the same data, or calculated data, in more than one place in the database (normalization). If I did, I would open the door to bugs whenever I updated the data in one place and forgot to update the other. So I created an orders table and an orderDetail table, something like this:
-- orders table
+-----+---------+----------+
| ID | clintID | date |
+-----+---------+----------+
| 1 | 1 |2018-02-22|
| 2 | 1 |2018-02-23|
| 3 | 2 |2018-02-24|
+-----+---------+----------+
-- orderDetail table
+-----+---------+------------+----------+----------+
| ID | orderID | itemNumber | quantity | unitPrice|
+-----+---------+------------+----------+----------+
| 1 | 1 | 12345 | 3 | 100.75 |
| 2 | 1 | 12346 | 3 | 100.75 |
| 3 | 2 | 12347 | 3 | 100.75 |
| 4 | 2 | 12345 | 3 | 100.75 |
| 5 | 3 | 12347 | 3 | 100.75 |
| 6 | 3 | 12345 | 3 | 100.75 |
+-----+---------+------------+----------+----------+
And to make the queries easier for me I made a view "allOrdersSummary" like
-- allOrdersSummary
SELECT
orders.*, SUM(orderDetail.quantity * orderDetail.unitPrice) totalAmount
FROM orders INNER JOIN orderDetail ON orders.ID = orderDetail.orderID
GROUP BY orders.ID;
and I used this view later for my queries, but now I started to get the MAX_JOIN_SIZE error.
So I thought of storing the calculated total in the orders table itself (ID, clintID, date, totalAmount) and updating the totalAmount column whenever I change something in the orderDetail table. I don't know whether this is good or bad!
This situation (I don't know whether it counts as a problem) comes up many times. For example, to know the number of unread messages for the client making the request, I have to run something like SELECT COUNT(*) AS unread FROM messages WHERE `to` = ? AND isRead = 0
A) Should I add a column for the calculated totalAmount to the orders table, or is it normal in databases to calculate the totalAmount from the orderDetail table every time I need it?
B) If you recommend adding the column to the orders table, what is the best way to update it every time something changes in the orderDetail table? Should I update it in the PHP layer whenever I update the orderDetail table, or is this something that needs a stored procedure?
Yes, it is normal to store pre-calculated values, based on other data in the database, in a database. But not necessarily for the reason you mention. I never had a problem with MAX_JOIN_SIZE.
The main, and probably only, reason for storing calculated values is speed. So you do it for values that don't change that often and that may be used in queries that use a lot of data and may therefore be too slow if you didn't use them.
For instance: If you want to know the average value of all the orders in your database the query would be a lot faster if you already have the order totals.
Why, and how, you update the values is completely up to you. However you have got to be consistent about it. If you use the MVC pattern it would make sense to integrate it in the controller. Or in simple terms: Whenever a form is submitted that could change one of the values, out of which the pre-calculated value is computed, you need to recompute it.
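As a rough illustration of "recompute it whenever the underlying data changes", the sketch below changes a detail row and refreshes the stored total inside one transaction, so the two can never drift apart. It uses Python's sqlite3 module instead of PHP and MySQL; the helper name add_order_line and the totalAmount column definition are assumptions, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    ID INTEGER PRIMARY KEY, clintID INTEGER, date TEXT,
    totalAmount REAL NOT NULL DEFAULT 0  -- assumed pre-calculated column
);
CREATE TABLE orderDetail (
    ID INTEGER PRIMARY KEY, orderID INTEGER, itemNumber INTEGER,
    quantity INTEGER, unitPrice REAL
);
INSERT INTO orders (ID, clintID, date) VALUES (1, 1, '2018-02-22');
""")

def add_order_line(conn, order_id, item_number, quantity, unit_price):
    """Hypothetical helper: insert a detail row and refresh the stored total."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO orderDetail (orderID, itemNumber, quantity, unitPrice) "
            "VALUES (?, ?, ?, ?)",
            (order_id, item_number, quantity, unit_price),
        )
        # Recompute from the detail rows instead of adding a delta,
        # so the stored value always matches the source data.
        conn.execute(
            "UPDATE orders SET totalAmount = ("
            "  SELECT COALESCE(SUM(quantity * unitPrice), 0)"
            "  FROM orderDetail WHERE orderID = ?) "
            "WHERE ID = ?",
            (order_id, order_id),
        )

add_order_line(conn, 1, 12345, 3, 100.75)
add_order_line(conn, 1, 12346, 3, 100.75)
total = conn.execute("SELECT totalAmount FROM orders WHERE ID = 1").fetchone()[0]
print(total)
```

The same pairing of statements could live in a stored procedure or a trigger; the point is only that every write path goes through one place.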
This is a clear demonstration where 'normalization' is not entirely maintained. It's not really pretty, but sometimes worth it. You could, of course, argue, that the calculated value represents 'new' information, and therefore does not offend against 'normalization'.
You have an "inflate-deflate" problem:
First, the JOIN of the two tables makes a much larger temporary table.
Then the GROUP BY shrinks it back to one row per row of the original (orders) table.
A correlated subquery avoids the problem:
SELECT *,
( SELECT SUM(quantity * unitPrice)
FROM orderDetail WHERE orderID = orders.ID
) AS totalAmount
FROM orders;
Please let me know how your experience is with this one. It is one of the simplest examples of the inflate-deflate problem.
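For anyone who wants to try the subquery form quickly, here is a small runnable sketch that loads the sample rows from the question and runs it. It uses Python's sqlite3 module rather than MySQL, which is an assumption made purely so the example is self-contained; the SQL itself is the same shape as the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (ID INTEGER PRIMARY KEY, clintID INTEGER, date TEXT);
CREATE TABLE orderDetail (
    ID INTEGER PRIMARY KEY, orderID INTEGER, itemNumber INTEGER,
    quantity INTEGER, unitPrice REAL
);
INSERT INTO orders VALUES
    (1, 1, '2018-02-22'), (2, 1, '2018-02-23'), (3, 2, '2018-02-24');
INSERT INTO orderDetail VALUES
    (1, 1, 12345, 3, 100.75), (2, 1, 12346, 3, 100.75),
    (3, 2, 12347, 3, 100.75), (4, 2, 12345, 3, 100.75),
    (5, 3, 12347, 3, 100.75), (6, 3, 12345, 3, 100.75);
""")

# The subquery is evaluated once per orders row, so no large joined
# intermediate result is built and no GROUP BY is needed.
rows = conn.execute("""
SELECT orders.*,
       (SELECT SUM(quantity * unitPrice)
        FROM orderDetail
        WHERE orderID = orders.ID) AS totalAmount
FROM orders
ORDER BY orders.ID
""").fetchall()

for row in rows:
    print(row)
```

An index on orderDetail(orderID) is what keeps the per-row lookup cheap on a real table.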
I'm wondering which method below is faster?
Suppose:
Maximum 10,000 products, each product has 1 user id, 1 cat id, 3 extra fields, and 5 images.
90-99% users come to the website just for the information, not posting.
Method 1: get all data from a table from a query without "JOIN":
SELECT * FROM products WHERE ...
Table: products
id | name | poster_name | cat_name | code_1 | code_2 | content |
dimensions | contact | message | images |
Method 2: get all data from multiple tables with "JOIN":
SELECT ... FROM products
LEFT JOIN cats ON products.cat_id = cats.id
LEFT JOIN users ON ....
table: products
id | name | code_1 | code_2 | content | cat_id | poster_id |
table: cats
id | cat_name |
table: users
id | poster_name |
table: extra
id | product_id | extra_info | extra_data |
table: images
id | product_id | img_src |
The first method will usually be faster for reads, and the second one will help you maintain data integrity and usually will be faster for writes.
The transition from the latter form to the former is called denormalization and is usually used in data warehouses, while operational ("live") databases usually prefer the latter form (second method).
You have not finished asking the question. Method 2 has no WHERE, so it will deliver 10K rows, plus have to do 20K lookups into the other tables. That makes it the loser.
Since your real question is about performance, then let's discuss the WHERE clause. With that, we can optimize it so that the desired data tends to be in RAM.
Back to your question... JOIN is probably the 'right' way to do it. And it is not that much of a performance hit assuming you have the proper indexes. So provide SHOW CREATE TABLE (even if tentative) and complete WHERE clauses.
Don't over-normalize. For example, do not normalize datetime or any other 'continuous' values.
Normalization can save space, especially in huge tables (eg, millions or billions of rows, and large, frequently repeated, strings being normalized.) This is especially helpful when the table is too big to stay cached in RAM.
I have a table that contains information about an item, and another table that references the owner of that item, like so:
baseItem
--------
itemID | 1 | 2 | 3 | 4 |
itemSize | 5 | 1 | 5 | 3 |
itemCost | 100 | 50 | 1 | 99 |
itemOwner
--------
ownerID | 1 | 1 | 3
itemID | 1 | 4 | 2
What I'm after are the sums of itemSize and itemCost per owner. I've looked around, but none of the answers I've seen make sense to me. Here's the best I could come up with, which clearly isn't working:
SUM itemCost FROM baseItem.itemCost LEFT JOIN itemID ON itemOwner.itemid = baseItem.itemid
SELECT ownerId, sum(itemCost) as OwnerCost, sum(itemSize) as OwnerSize
FROM itemOwner
LEFT JOIN baseItem
ON itemOwner.itemid = baseItem.itemid
GROUP BY ownerId
A SELECT statement lists which fields you want to read from the table; in this case you want two values: the id of the owner, and the sum of the values of the items they own. However, since you're using sum (an aggregate function), you must GROUP your elements over some parameter. In this case, you want to group them by ownerId.
A FROM clause references a table; you can start with either baseItem or itemOwner, but with a LEFT JOIN the table you start from is the one whose rows are always kept. You can think of a LEFT JOIN as the cartesian product of both tables filtered by the ON clause, except that the result always contains every row of the left table (here itemOwner); where there is no matching row in baseItem, all the baseItem fields are NULL. SUM ignores NULLs, so an owner whose items have no match in baseItem gets a NULL sum rather than 0; wrap the sum in COALESCE(..., 0) if you want 0 instead.
Maybe it is not working because it is not a valid SQL statement. Try the following:
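How SUM treats unmatched rows after a LEFT JOIN is easy to check directly. The sketch below loads the sample data from the question into an in-memory SQLite database (a stand-in for MySQL) and adds one hypothetical owner, ownerID 7 with itemID 99, whose item does not exist in baseItem, so that the unmatched case is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE baseItem (itemID INTEGER PRIMARY KEY, itemSize INTEGER, itemCost INTEGER);
CREATE TABLE itemOwner (ownerID INTEGER, itemID INTEGER);
INSERT INTO baseItem VALUES (1, 5, 100), (2, 1, 50), (3, 5, 1), (4, 3, 99);
INSERT INTO itemOwner VALUES (1, 1), (1, 4), (3, 2);
-- Hypothetical extra row: this owner's item has no match in baseItem.
INSERT INTO itemOwner VALUES (7, 99);
""")

rows = conn.execute("""
SELECT o.ownerID,
       SUM(b.itemCost)              AS rawCost,    -- NULL when nothing matched
       COALESCE(SUM(b.itemCost), 0) AS ownerCost,  -- 0 when nothing matched
       COALESCE(SUM(b.itemSize), 0) AS ownerSize
FROM itemOwner o
LEFT JOIN baseItem b ON o.itemID = b.itemID
GROUP BY o.ownerID
ORDER BY o.ownerID
""").fetchall()

for row in rows:
    print(row)
```

Owner 1 owns items 1 and 4, so the cost is 100 + 99 and the size is 5 + 3; owner 7 shows a NULL raw sum that COALESCE turns into 0.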
SELECT SUM(baseItem.itemCost) FROM baseItem
LEFT JOIN itemOwner ON itemOwner.itemId = baseItem.itemId
I need to store and retrieve items of a course plan in sequence. I also need to be able to add or remove items at any point.
The data looks like this:
-- chapter 1
--- section 1
----- lesson a
----- lesson b
----- drill b
...
I need to be able to identify the sequence so that when the student completes lesson a, I know that he needs to move to lesson b. I also need to be able to insert items in the sequence, like say drill a, and of course now the student goes from lesson a to drill a instead of going to lesson b.
I understand relational databases are not intended for sequences. Originally, I thought about using a simple autoincrement column and use that to handle the sequence, but the insert requirement makes it unworkable.
I have seen this question and the first answer is interesting:
items table
item_id | item
1 | section 1
2 | lesson a
3 | lesson b
4 | drill a
sequence table
item_id | sequence
1 | 1
2 | 2
3 | 4
4 | 3
That way, I would keep adding items in the items table with whatever id and work out the sequence in the sequence table. The only problem with that system is that I need to change the sequence numbers for all items in the sequence table after an insertion. For instance, if I want to insert quiz a before drill a I need to update the sequence numbers.
Not a huge deal, but the solution seems a little overcomplicated. Is there an easier, smarter way to handle this?
Just relate records to the parent and use a sequence flag. You will still need to update all the records when you insert in the middle but I can't really think of a simple way around that without leaving yourself space to begin with.
items table:
id | name | parent_id | sequence
--------------------------------------
1 | chapter 1 | null | 1
2 | section 1 | 1 | 2
3 | lesson a | 2 | 3
4 | lesson b | 2 | 5
5 | drill a | 2 | 4
When you need to insert a record in the middle a query like this will work:
UPDATE items SET sequence=sequence+1 WHERE sequence > 3;
insert into items (name, parent_id, sequence) values('quiz a', 2, 4);
To select the data in order your query will look like:
select * from items order by sequence;
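The shift and the insert belong together, so it may be worth wrapping them in one transaction. A small sketch of that idea using Python's sqlite3 module as a stand-in for MySQL; the helper name insert_after is an assumption introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    parent_id INTEGER,
    sequence INTEGER NOT NULL
);
INSERT INTO items (name, parent_id, sequence) VALUES
    ('chapter 1', NULL, 1), ('section 1', 1, 2), ('lesson a', 2, 3),
    ('lesson b', 2, 5), ('drill a', 2, 4);
""")

def insert_after(conn, name, parent_id, after_sequence):
    """Hypothetical helper: open a gap after `after_sequence`, then fill it."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE items SET sequence = sequence + 1 WHERE sequence > ?",
            (after_sequence,),
        )
        conn.execute(
            "INSERT INTO items (name, parent_id, sequence) VALUES (?, ?, ?)",
            (name, parent_id, after_sequence + 1),
        )

# Put 'quiz a' right after 'lesson a' (sequence 3), ahead of 'drill a'.
insert_after(conn, "quiz a", 2, 3)
ordered_names = [row[0] for row in conn.execute("SELECT name FROM items ORDER BY sequence")]
print(ordered_names)
```

Without the transaction, a failure between the two statements would leave a gap in the numbering, which is harmless for ordering but confusing to debug.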
Purpose: I have been tasked with exporting a complex dataset from a PHP counseling appointment webapp and converting it into an Excel file containing student data sorted by STUDENT_ID.
I have 3 MySQL tables containing data. They all have a STUDENT_ID field.
I need to make a query which retrieves all the data from the 3 tables, grouped into a single row per STUDENT_ID.
Some of the tables contain multiple entries for the same STUDENT_ID. If possible I'd like these multiple entries combined into a single row (so that each unique STUDENT_ID is on one line).
This is what I have so far but it doesn't seem to work how I expect it to.
SELECT *
from ssp_student t1
INNER JOIN ssp_student_quarterly t2
ON t1.STUDENT_ID = t2.STUDENT_ID
INNER JOIN ssp_weekly_progress t3
ON t2.STUDENT_ID = t3.STUDENT_ID
GROUP BY t1.STUDENT_ID
Table Schema:
Table 1:
| STUDENT_ID | PEER_COACH_ID | ACTIVE | COHORT | WEEKLY_MEETING_TIME | FYE_ID | RC | AGREEMENT_SIGNED | RELEASE_SIGNED | NOTES | FACULTY_ADVISOR |
Table 2:
| STUDENT_ID | QUARTER | COUNSELLING_OFFICE | WRITING_CENTER | CASE_MANAGEMENT | SSP_SOCIAL_EVENTS | SSP_SUCCESS_SEMINAR | HOME_SUPPORT | ACCOMODATION_USED | DISCOVERY_PATHWAYS | PEER_COACHING |
Table 3:
| STUDENT_ID | QUARTER | WEEK | EMAIL_INTERACTION | PHONE_INTERACTION | TEXT_INTERACTION | INPERSON_INTERACTION | SOCIAL_INTERACTION | NUMBER_OF_SOCIAL_INTERACTIONS | CASE_MANAGEMENT_INTERACTIONS | NUMBER_OF_CASE_mANAGEMENT_INTERACTIONS | SUCCESS_SEMINAR_INTERACTION | NUMBER_OF_SUCCESS_SEMINAR_INTERACTIONS | OTHER_INTERACTION | THEMES | SURVEY_ID | NOTES |
What I need: I want 1 row for each STUDENT_ID, which contains columns for all the data associated with that STUDENT_ID in tables 1, 2 and 3.
If you use SELECT * and some of the tables contain more than one row for the same student, you will never get only one row per student. Instead, select only the fields related to the student ID that you want to display.
If any of the fields you want to display comes from a table with multiple entries per student, that will not be enough: you will still get one row per entry.
If you really want to concatenate the data from those rows into one field, your SELECT statement could look something like the following:
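SELECT t1.STUDENT_ID, GROUP_CONCAT(t2.Field1 SEPARATOR ', ') AS t2Field1Concat,
GROUP_CONCAT(t2.Field2 SEPARATOR ', ') AS t2Field2Concat,
GROUP_CONCAT(t3.Field1 SEPARATOR ', ') AS t3Field1Concat,
GROUP_CONCAT(t3.Field2 SEPARATOR ', ') AS t3Field2Concat
FROM ssp_student t1
INNER JOIN ssp_student_quarterly t2 ON t1.STUDENT_ID = t2.STUDENT_ID
INNER JOIN ssp_weekly_progress t3 ON t1.STUDENT_ID = t3.STUDENT_ID
GROUP BY t1.STUDENT_ID;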
In the above example you would have to do this for each field other than t1.STUDENT_ID.
You seem to be after several separate groups of data that have virtually nothing in common other than the student ID. You should perform a single query for each and then combine the data into a relevant format in PHP.
Joining all the tables together like this is going to end up with a potentially MASSIVE result set full of duplicate data.
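A sketch of the "one query per table, combine in application code" idea. The original context is PHP and MySQL; this version uses Python with sqlite3 only so that it is self-contained and runnable. The table names come from the question, but the column subset and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows behave like dicts keyed by column name
conn.executescript("""
-- Only a few of the real columns are reproduced here.
CREATE TABLE ssp_student (STUDENT_ID INTEGER PRIMARY KEY, COHORT TEXT);
CREATE TABLE ssp_student_quarterly (STUDENT_ID INTEGER, QUARTER TEXT, WRITING_CENTER INTEGER);
CREATE TABLE ssp_weekly_progress (STUDENT_ID INTEGER, QUARTER TEXT, WEEK INTEGER, NOTES TEXT);
INSERT INTO ssp_student VALUES (1, 'A'), (2, 'B');
INSERT INTO ssp_student_quarterly VALUES (1, 'Q1', 2), (1, 'Q2', 0), (2, 'Q1', 1);
INSERT INTO ssp_weekly_progress VALUES (1, 'Q1', 1, 'ok'), (1, 'Q1', 2, 'late'), (2, 'Q1', 1, 'ok');
""")

# Query 1: one entry per student.
students = {}
for row in conn.execute("SELECT * FROM ssp_student ORDER BY STUDENT_ID"):
    students[row["STUDENT_ID"]] = {"student": dict(row), "quarterly": [], "weekly": []}

# Queries 2 and 3: attach the multi-row data to the matching student.
for row in conn.execute("SELECT * FROM ssp_student_quarterly ORDER BY STUDENT_ID, QUARTER"):
    if row["STUDENT_ID"] in students:
        students[row["STUDENT_ID"]]["quarterly"].append(dict(row))

for row in conn.execute("SELECT * FROM ssp_weekly_progress ORDER BY STUDENT_ID, QUARTER, WEEK"):
    if row["STUDENT_ID"] in students:
        students[row["STUDENT_ID"]]["weekly"].append(dict(row))

for student_id, data in students.items():
    print(student_id, len(data["quarterly"]), len(data["weekly"]))
```

Each student ends up as one record with nested lists, which can then be flattened into whatever spreadsheet layout the export needs, without the row multiplication a three-way join would cause.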