I have a web application that uses two tables: one stores product information and the other stores the votes for each product.
Now I'd like to display the products ordered by the number of votes each one has received. Below is the table structure:
Products:
PRODUCT_ID TITLE
1 product1
2 product2
3 product3
4 product4
Votes:
PRODUCT_ID USER_ID
1 1
1 1
2 2
3 2
I expect the result to list the products in descending order of votes:
PRODUCT_ID TITLE VOTES
1 product1 2
2 product2 1
3 product3 1
Currently I am using this query:
SELECT p.product_id, p.title, count(*) AS total FROM products p
INNER JOIN votes v ON v.product_id = p.product_id GROUP BY p.product_id
ORDER BY count(*) DESC LIMIT 110
The products table has around 30,000 records and the votes table has around 90,000 records.
The problem is that the query takes a long time (anywhere between 18 and 30 seconds). Since the tables aren't that large, I wonder why it takes so long.
One thing I noticed is that when I run the query a second time, it returns the results in a few milliseconds, which is what I'd expect for a fairly simple query like this.
I am pretty new to the database side of programming.
I am not sure whether something is wrong with the query or whether the table structure is inefficient (at least for fetching the records quickly).
First, your query is fine, although I would be inclined to format it differently:
SELECT p.product_id, p.title, count(*) AS total
FROM products p INNER JOIN
votes v
ON v.product_id = p.product_id
GROUP BY p.product_id
ORDER BY count(*) DESC
LIMIT 110;
As mentioned in another answer, an index on votes(product_id) would definitely help the query, if you don't have one already. Even with the improved join performance, you still have the overhead of the aggregation, and in MySQL that overhead can be significant.
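A minimal sketch of adding that index and running the top-N query, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite's planner and EXPLAIN output differ from MySQL's, so the plan printout is only illustrative). The index name idx_votes_product_id is hypothetical, and the ORDER BY adds product_id as a tie-breaker so equal vote counts come back in a stable order.

```python
import sqlite3


def top_products(conn, limit=110):
    """Return (product_id, title, total) rows ordered by vote count."""
    sql = """
        SELECT p.product_id, p.title, COUNT(*) AS total
        FROM products p
        INNER JOIN votes v ON v.product_id = p.product_id
        GROUP BY p.product_id, p.title
        ORDER BY total DESC, p.product_id  -- tie-breaker for a stable order
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE votes (product_id INTEGER, user_id INTEGER);
    INSERT INTO products VALUES (1,'product1'),(2,'product2'),(3,'product3'),(4,'product4');
    INSERT INTO votes VALUES (1,1),(1,1),(2,2),(3,2);
    -- Hypothetical index name; what matters is the votes(product_id) column.
    CREATE INDEX idx_votes_product_id ON votes(product_id);
""")

# Show how SQLite plans the join (the output format varies by SQLite version).
for row in conn.execute("EXPLAIN QUERY PLAN SELECT p.product_id, COUNT(*) FROM products p "
                        "JOIN votes v ON v.product_id = p.product_id GROUP BY p.product_id"):
    print(row)

print(top_products(conn))  # [(1, 'product1', 2), (2, 'product2', 1), (3, 'product3', 1)]
```

In MySQL itself the equivalent steps are a CREATE INDEX on votes(product_id) followed by EXPLAIN on the original query.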
If you are expecting many more votes (getting into the millions), you may have to take another approach. One approach is to add a trigger on the votes table that keeps a running vote count somewhere (perhaps in a column of the products table) as votes come in. Then the query would fly. Another approach is to summarize the data periodically, similar to using a trigger but with a scheduled job instead.
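A minimal sketch of the trigger idea, with sqlite3 standing in for MySQL. It assumes a new, hypothetical products.vote_count column that AFTER INSERT and AFTER DELETE triggers on votes keep up to date, so the listing becomes a plain sort with no join or aggregation. MySQL's CREATE TRIGGER syntax differs slightly (it needs FOR EACH ROW, and multi-statement bodies need DELIMITER handling in the client).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        title TEXT,
        vote_count INTEGER NOT NULL DEFAULT 0  -- hypothetical denormalized counter
    );
    CREATE TABLE votes (product_id INTEGER, user_id INTEGER);
    CREATE INDEX idx_products_vote_count ON products(vote_count);

    -- Keep the counter in step with every new vote.
    CREATE TRIGGER votes_after_insert AFTER INSERT ON votes
    BEGIN
        UPDATE products SET vote_count = vote_count + 1
        WHERE product_id = NEW.product_id;
    END;

    -- Undo the increment if a vote is removed.
    CREATE TRIGGER votes_after_delete AFTER DELETE ON votes
    BEGIN
        UPDATE products SET vote_count = vote_count - 1
        WHERE product_id = OLD.product_id;
    END;
""")
conn.executemany("INSERT INTO products (product_id, title) VALUES (?, ?)",
                 [(1, "product1"), (2, "product2"), (3, "product3"), (4, "product4")])
conn.executemany("INSERT INTO votes VALUES (?, ?)", [(1, 1), (1, 1), (2, 2), (3, 2)])

# No join and no GROUP BY: just read the precomputed counts.
rows = conn.execute("""
    SELECT product_id, title, vote_count FROM products
    WHERE vote_count > 0
    ORDER BY vote_count DESC, product_id
    LIMIT 110
""").fetchall()
print(rows)  # [(1, 'product1', 2), (2, 'product2', 1), (3, 'product3', 1)]
```

The trade-off is a little extra work on every vote in exchange for a listing query that no longer scales with the size of the votes table.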
I have a query over two tables that have a one-to-many relationship.
Products table
product_id
1234
Products_destinations table
product_id destinations_id
1234 1
1234 2
I wrote a query to select the product ID and all the related destinations, which is pretty easy:
SELECT
p.product_id, GROUP_CONCAT(DISTINCT pd.destinations_id) as destinations_list
FROM
`products` p
INNER JOIN products_destinations pd ON pd.product_id = p.product_id
GROUP BY p.product_id
The result is:
product_id destinations_list
1234 1,2
Now a new table enters the query, which also has a one-to-many relationship with products. However, it has no relationship with the products_destinations table.
Products_prices table
product_id price
1234 200
The updated query looks like this:
SELECT
p.product_id, GROUP_CONCAT(DISTINCT pd.destinations_id) as destinations_list, SUM(pc.price) as all_prices
FROM
`products` p
INNER JOIN products_destinations pd ON pd.product_id = p.product_id
INNER JOIN products_prices pc ON pc.product_id = p.product_id
GROUP BY p.product_id
And now the end result looks like this:
product_id destinations_list all_prices
1234 1,2 400
As you can see, the price shows 400 instead of 200, because the product has two destinations and its price row is repeated once per destination before the grouping. Is it possible to get the correct SUM of the product's prices in this type of query? One solution is to use a subquery to compute the SUM of the prices, but this is just an example: the real tables are very large and full of data, and subqueries increase the query time drastically.
UPDATE:
"count the SUM" was bad English, sorry about that. The problem with this query is that the product price is wrong when the product has multiple destinations. A product can be related to multiple destinations and can have multiple prices. In the same query I need to select the list of destinations and the total price of the product. In this example the product has one price, 200, but because I also need to retrieve the list of destinations, I have to GROUP BY the product, which causes the price to be added once per destination. If the product had more prices, the results would be even worse. In the end the result should look like this:
product_id destinations_list all_prices
1234 1,2 200
instead of this:
product_id destinations_list all_prices
1234 1,2 400
The basic problem you're running into is a cartesian explosion between prices and destinations. Any time you join more than one table that has a 1:M relationship with product, you start to multiply the rows: two prices and two destinations become 4 rows, and three prices and four destinations become 12 rows.
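A small reproduction with Python's sqlite3 (standing in for MySQL; the second call's data is hypothetical) shows the multiplication directly: the joined row count is destinations times prices, and SUM(price) is inflated by the number of destinations.

```python
import sqlite3


def joined_stats(destinations, prices):
    """Join one product to both child tables; return (row count, SUM(price))."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (product_id INTEGER PRIMARY KEY);
        CREATE TABLE products_destinations (product_id INTEGER, destinations_id INTEGER);
        CREATE TABLE products_prices (product_id INTEGER, price INTEGER);
        INSERT INTO products VALUES (1234);
    """)
    conn.executemany("INSERT INTO products_destinations VALUES (1234, ?)",
                     [(d,) for d in destinations])
    conn.executemany("INSERT INTO products_prices VALUES (1234, ?)",
                     [(p,) for p in prices])
    row = conn.execute("""
        SELECT COUNT(*), SUM(pc.price)
        FROM products p
        INNER JOIN products_destinations pd ON pd.product_id = p.product_id
        INNER JOIN products_prices pc ON pc.product_id = p.product_id
    """).fetchone()
    conn.close()
    return row


print(joined_stats([1, 2], [200]))               # (2, 400): the question's data
print(joined_stats([1, 2, 3, 4], [10, 20, 30]))  # (12, 240): 3 prices x 4 destinations
```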
The simplest solution is to make sure you only join in rows that are 1:1:
SELECT
p.product_id, pd.destinations_list, pc.sumprices
FROM
products p
INNER JOIN (
SELECT product_id, GROUP_CONCAT(destinations_id) as destinations_list
FROM products_destinations
GROUP BY product_id
) pd ON pd.product_id = p.product_id
INNER JOIN (
SELECT product_id, SUM(price) as sumprices
FROM products_prices
GROUP BY product_id
) pc ON pc.product_id = p.product_id
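A quick check of this pre-aggregated version against the question's data, again with sqlite3 standing in for MySQL (the index names are hypothetical). GROUP_CONCAT does not guarantee an order unless you ask for one, so the destination list may come back in either order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY);
    CREATE TABLE products_destinations (product_id INTEGER, destinations_id INTEGER);
    CREATE TABLE products_prices (product_id INTEGER, price INTEGER);
    INSERT INTO products VALUES (1234);
    INSERT INTO products_destinations VALUES (1234, 1), (1234, 2);
    INSERT INTO products_prices VALUES (1234, 200);
    -- Indexes on the join columns (hypothetical names).
    CREATE INDEX idx_pd_product ON products_destinations(product_id);
    CREATE INDEX idx_pc_product ON products_prices(product_id);
""")

# Each derived table is aggregated to one row per product before the join,
# so the outer join is 1:1:1 and nothing gets multiplied.
rows = conn.execute("""
    SELECT p.product_id, pd.destinations_list, pc.sumprices
    FROM products p
    INNER JOIN (
        SELECT product_id, GROUP_CONCAT(destinations_id) AS destinations_list
        FROM products_destinations
        GROUP BY product_id
    ) pd ON pd.product_id = p.product_id
    INNER JOIN (
        SELECT product_id, SUM(price) AS sumprices
        FROM products_prices
        GROUP BY product_id
    ) pc ON pc.product_id = p.product_id
""").fetchall()
print(rows)
```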
You can get away with doing only one of these subqueries, but I don't see much point, because you'd then have to group the outer query and handle the repetition there. It's easier to handle aggregation on a per-table basis so that ultimately you're joining everything 1:1 in the outer query. Once you have this, run EXPLAIN and work out how to index wisely to improve things (you haven't posted any WHERE clauses). I appreciate that you said "without subquery", but you're causing a problem (cartesian explosion) that you then have to find hacky ways to solve, and that will only go so far; it's better not to cause the problem in the first place than to bodge around it afterwards. Don't fret too much about how the query looks; the optimizer will likely rewrite it considerably anyway, so looking at the plan of how it actually executes and tuning the setup for that would be more productive.
Another approach is to use correlated subqueries:
SELECT p.product_id,
(SELECT GROUP_CONCAT(DISTINCT pd.destinations_id)
FROM products_destinations pd
WHERE pd.product_id = p.product_id
) as destinations_list,
(SELECT SUM(pc.price)
FROM products_prices pc
WHERE pc.product_id = p.product_id
) as all_prices
FROM products p ;
Note: This keeps all products, even those that might be missing prices or destinations.
This version has the advantage that if you filter down the number of products with a WHERE clause, it should perform very well, assuming that product_id is indexed in the two junction tables.
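A sketch of the correlated-subquery version with sqlite3 as a stand-in for MySQL. Product 5678 is hypothetical, with no destinations or prices, and is included to show that it is kept with NULLs. The WHERE filter is the kind of restriction that makes this form cheap when the junction tables are indexed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY);
    CREATE TABLE products_destinations (product_id INTEGER, destinations_id INTEGER);
    CREATE TABLE products_prices (product_id INTEGER, price INTEGER);
    CREATE INDEX idx_pd_product ON products_destinations(product_id);
    CREATE INDEX idx_pc_product ON products_prices(product_id);
    INSERT INTO products VALUES (1234), (5678);  -- 5678: hypothetical, no child rows
    INSERT INTO products_destinations VALUES (1234, 1), (1234, 2);
    INSERT INTO products_prices VALUES (1234, 200);
""")

# Each subquery runs per outer product row and can use the product_id index.
rows = conn.execute("""
    SELECT p.product_id,
           (SELECT GROUP_CONCAT(DISTINCT pd.destinations_id)
            FROM products_destinations pd
            WHERE pd.product_id = p.product_id) AS destinations_list,
           (SELECT SUM(pc.price)
            FROM products_prices pc
            WHERE pc.product_id = p.product_id) AS all_prices
    FROM products p
    WHERE p.product_id IN (?, ?)  -- the kind of filter that keeps this cheap
    ORDER BY p.product_id
""", (1234, 5678)).fetchall()
print(rows)
```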
SELECT
p.product,
q.format,
p.title
FROM
product p
JOIN info q ON p.product = q.product
WHERE p.user='$user'
GROUP BY p.product,q.format
I want to group first by 'product' from the product table, but then also by format from the info table.
This is so that duplicate combinations of format and product are not shown. At the moment, only the grouping by product works.
Table - products
product | title
0 | one
1 | two
1 | two - a
2 | three
Table - product_details
product | title | format
0 | one | home
1 | two | home
1 | two - a | home
2 | three | work
So for this example I want a list like:
product | title | format
0 | one | home
2 | three | work
Instead of:
product | title | format
0 | one | home
1 | two | home
2 | three | work
Now that your table structures have been posted, I believe I can see your intent. It looks like you are trying to limit the result set to values of product.product that are never repeated, that is, values of product.product that have exactly one product.title.
For that, you can use a GROUP BY aggregation and keep only the groups with COUNT(*) = 1, using a HAVING clause.
In this case, since you only expect one row back per product.product anyway, you can do the aggregation at the top level without a subquery. If you had joined in other tables and ended up with multiple rows per product because of other one-to-many relationships, you would need to use the subquery method instead (at least to be portable; MySQL would probably still allow this).
SELECT
p.product,
q.format,
p.title
FROM
products p
JOIN product_details q ON p.product = q.product
GROUP BY
p.product,
q.format,
p.title
HAVING COUNT(*) = 1
Here is a demonstration: http://sqlfiddle.com/#!2/72eda/6
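The same query run on the question's sample data with sqlite3 (standing in for MySQL), showing why product 1 drops out: it appears twice in products and twice in product_details, so each of its groups has a count of 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product INTEGER, title TEXT);
    CREATE TABLE product_details (product INTEGER, title TEXT, format TEXT);
    INSERT INTO products VALUES (0,'one'), (1,'two'), (1,'two - a'), (2,'three');
    INSERT INTO product_details VALUES
        (0,'one','home'), (1,'two','home'), (1,'two - a','home'), (2,'three','work');
""")

rows = conn.execute("""
    SELECT p.product, q.format, p.title
    FROM products p
    JOIN product_details q ON p.product = q.product
    GROUP BY p.product, q.format, p.title
    HAVING COUNT(*) = 1   -- product 1 joins 2x2 = 4 rows, so each of its groups has 2
    ORDER BY p.product
""").fetchall()
print(rows)  # [(0, 'home', 'one'), (2, 'work', 'three')]
```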
If you did expect multiple rows back per p.product, such as if you joined in additional one-to-many related tables, an efficient way to handle that is to perform a JOIN against a subquery that imposes that limit in the HAVING clause. Those which don't meet the HAVING condition won't be returned in the subquery and therefore get discarded by the INNER JOIN.
SELECT
p.product,
q.format,
p.title
FROM
products p
INNER JOIN product_details q ON p.product = q.product
/* Subquery returns only product values having exactly 1 row */
INNER JOIN (
SELECT product
FROM products
GROUP BY product
HAVING COUNT(*) = 1
) pcount ON p.product = pcount.product
WHERE p.user = '$user'
http://sqlfiddle.com/#!2/72eda/2
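A sketch of the subquery-join variant with sqlite3 standing in for MySQL. The question never shows a user column, so the user column and the value 'alice' are hypothetical. The sketch also binds the user as a query parameter instead of interpolating '$user' into the SQL string, which avoids SQL injection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The user column and its values are hypothetical.
    CREATE TABLE products (product INTEGER, title TEXT, user TEXT);
    CREATE TABLE product_details (product INTEGER, title TEXT, format TEXT);
    INSERT INTO products VALUES
        (0,'one','alice'), (1,'two','alice'), (1,'two - a','alice'), (2,'three','alice');
    INSERT INTO product_details VALUES
        (0,'one','home'), (1,'two','home'), (1,'two - a','home'), (2,'three','work');
""")


def single_title_products(conn, user):
    # The derived table keeps only product values that occur exactly once;
    # the INNER JOIN then discards every other product.
    return conn.execute("""
        SELECT p.product, q.format, p.title
        FROM products p
        INNER JOIN product_details q ON p.product = q.product
        INNER JOIN (
            SELECT product FROM products
            GROUP BY product
            HAVING COUNT(*) = 1
        ) pcount ON p.product = pcount.product
        WHERE p.user = ?
        ORDER BY p.product
    """, (user,)).fetchall()


print(single_title_products(conn, "alice"))  # [(0, 'home', 'one'), (2, 'work', 'three')]
```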
I have a table USERS and a table ORDERS. In my back office I'm trying to output a table with all users (customers) and the SUM of each user's order totals. Simplified, my query is:
SELECT users.id, SUM(orders.total) as spent FROM users
JOIN orders ON users.id=orders.customer_id GROUP BY users.id
(Note: don't worry about the syntax; this is just to illustrate the point. The syntax is fine when I run it.)
Say I now have 4 users in total, and the ORDERS table looks something like this:
order_id customer_id total
1 1 25
2 2 10
3 1 5
Then my query outputs ONLY the users that can be found in the ORDERS table, and my back-office customer overview unfortunately looks like this:
Customer ID Spent in Total
1 30
2 10
It completely ignores the other 2 users, who have not placed any orders yet. What I want to see is this:
Customer ID Spent in Total
1 30
2 10
3 0
4 0
Is there a way to do this?
My guess is that it has something to do with the different kinds of joins, like inner and outer, but I don't know the difference between them.
I also thought about running two queries, selecting * from users and then running a foreach loop to sum up the order totals, but this seems inefficient.
You need a LEFT JOIN, plus IFNULL(): for users with no orders, SUM() has only NULLs to add up and returns NULL, which IFNULL() turns into 0.
SELECT
users.id,
IFNULL(SUM(orders.total),0) as spent
FROM users
LEFT JOIN orders ON users.id=orders.customer_id
GROUP BY users.id
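A sketch comparing the two join types on the question's data, with sqlite3 standing in for MySQL (SQLite also provides IFNULL()). The join keyword is spliced in from one of two fixed literals only to compare them side by side; the last query drops IFNULL() to show the NULL it guards against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
    INSERT INTO users VALUES (1), (2), (3), (4);
    INSERT INTO orders VALUES (1, 1, 25), (2, 2, 10), (3, 1, 5);
""")


def spending(conn, join):
    """Per-user totals; `join` is one of two fixed keyword literals, never user input."""
    return conn.execute(f"""
        SELECT users.id, IFNULL(SUM(orders.total), 0) AS spent
        FROM users
        {join} orders ON users.id = orders.customer_id
        GROUP BY users.id
        ORDER BY users.id
    """).fetchall()


print(spending(conn, "INNER JOIN"))  # [(1, 30), (2, 10)]
print(spending(conn, "LEFT JOIN"))   # [(1, 30), (2, 10), (3, 0), (4, 0)]

# Without IFNULL, users with no orders get NULL (None in Python), not 0.
no_ifnull = conn.execute("""
    SELECT users.id, SUM(orders.total)
    FROM users
    LEFT JOIN orders ON users.id = orders.customer_id
    GROUP BY users.id
    ORDER BY users.id
""").fetchall()
print(no_ifnull)  # [(1, 30), (2, 10), (3, None), (4, None)]
```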
select u.id, coalesce(sum(o.total), 0) as spent
from Users u
left outer join Orders o on u.id = o.customer_id
group by u.id
So, I have a table named clients, another named orders, and two more, orders_type_a and orders_type_b.
I'm trying to create a query that returns the list of all clients and, for each client, the number of orders placed under that client's id and the amount of money the client has spent so far.
And... I have no idea how to do that. I understand the logic behind it, but I can't figure out how to translate it into a MySQL query.
My knowledge of MySQL is somewhere between basic and think-I'm-good-but-I'm-not, and this situation has me really confused.
Useful extra information:
Each orders row has only one type (A or B)
Each orders row can have multiple orders_type_X rows (where X is A or B)
orders relate with client through the column client_id
orders_type_X relate with orders through the column order_id
Currently this is done by running a query to retrieve the clients; then, for each returned row, the code runs another query (in PHP) to retrieve the orders and yet another to retrieve the values. So for each row returned by the first query there are two more queries. Needless to say, this is a horrible approach, the performance is terrible, and that is why I want to change it.
UPDATE with table columns:
clients:
id | name | phone
orders:
id | client_id | date
orders_type_a:
id | order_id | number_of_items | price_of_single_item
orders_type_b:
id | order_id | number_of_shoes_11 | number_of_shoes_12 | number_of_shoes_13 | price_of_single_shoe
For any extra info needed, just ask.
If I understand you correctly, you are looking for something like this?
select c.*, SUM(oa.value) + SUM(ob.value) as total
from clients c
inner join orders o on c.order_id = o.id
inner join orders_type_a oa on oa.id = o.order_type_id AND o.type = 'A'
inner join orders_type_b ob on ob.id = o.order_type_id AND o.type = 'B'
group by c.id
I do not know your actual field names, but this returns the information on each customer plus a single field 'total' that contains the sum of the values of all the orders of both type A and type B. You might have to tweak the various names to get it to work, but does this get you in the right direction?
Erik's answer is on the right track. However, since there could be multiple orders_type_a and orders_type_b records for each order, it is a little more complex:
SELECT c.id, c.name, c.phone, COUNT(DISTINCT o.id) AS number_of_orders, SUM(x.total) as total
FROM clients c
INNER JOIN orders o
ON o.client_id = c.id
INNER JOIN (
SELECT order_id, SUM(number_of_items * price_of_single_item) as total
FROM orders_type_a
GROUP BY order_id
UNION ALL
SELECT order_id, SUM((number_of_shoes_11 + number_of_shoes_12 + number_of_shoes_13) * price_of_single_shoe) as total
FROM orders_type_b
GROUP BY order_id
) x
ON x.order_id = o.id
GROUP BY c.id
;
I'm making a few assumptions about how to calculate the total based on the columns in the orders_type_x tables.
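A sketch of the per-order subtotal idea on hypothetical sample data, with sqlite3 standing in for MySQL. Two deliberate variations, both assumptions about what the asker wants: LEFT JOINs plus COALESCE keep clients who have no orders, and COUNT(DISTINCT o.id) returns the number of orders the question also asks for. Each UNION ALL branch has its own GROUP BY order_id, so every order gets exactly one subtotal row and nothing is double-counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, client_id INTEGER, date TEXT);
    CREATE TABLE orders_type_a (id INTEGER PRIMARY KEY, order_id INTEGER,
                                number_of_items INTEGER, price_of_single_item INTEGER);
    CREATE TABLE orders_type_b (id INTEGER PRIMARY KEY, order_id INTEGER,
                                number_of_shoes_11 INTEGER, number_of_shoes_12 INTEGER,
                                number_of_shoes_13 INTEGER, price_of_single_shoe INTEGER);
    -- Hypothetical sample data: client 3 has no orders yet.
    INSERT INTO clients VALUES (1,'Ann','555-0001'), (2,'Bob','555-0002'), (3,'Cid','555-0003');
    INSERT INTO orders VALUES (10,1,'2014-01-01'), (11,1,'2014-01-02'), (12,2,'2014-01-03');
    INSERT INTO orders_type_a VALUES (1,10,2,5), (2,10,1,3), (3,12,4,2);
    INSERT INTO orders_type_b VALUES (1,11,1,1,0,20);
""")

# One round trip instead of 1 + 2N queries from PHP.
rows = conn.execute("""
    SELECT c.id, c.name, c.phone,
           COUNT(DISTINCT o.id) AS number_of_orders,
           COALESCE(SUM(x.total), 0) AS total
    FROM clients c
    LEFT JOIN orders o ON o.client_id = c.id
    LEFT JOIN (
        SELECT order_id, SUM(number_of_items * price_of_single_item) AS total
        FROM orders_type_a
        GROUP BY order_id
        UNION ALL
        SELECT order_id,
               SUM((number_of_shoes_11 + number_of_shoes_12 + number_of_shoes_13)
                   * price_of_single_shoe) AS total
        FROM orders_type_b
        GROUP BY order_id
    ) x ON x.order_id = o.id
    GROUP BY c.id, c.name, c.phone
    ORDER BY c.id
""").fetchall()
print(rows)
```

Order 10 is 2 x 5 + 1 x 3 = 13, order 11 is (1 + 1 + 0) x 20 = 40, and order 12 is 4 x 2 = 8, so Ann has 2 orders worth 53, Bob 1 order worth 8, and Cid 0 orders worth 0.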
In our project we have a user table that stores user data: a name and different kinds of scores (overall score, quest score, etc.). How the values are calculated doesn't matter; just treat them as separate values.
Let's say the table 'users' looks like this:
id name score_overall score_trade score_quest
1 one 40000 10000 20000
2 two 20000 15000 0
3 three 30000 1000 50000
4 four 80000 60000 3000
To display the scores, there is a dummy table plus one table per kind of score, each storing the username together with the score and a rank. All these tables have the same structure but different names:
id name score rank
They are separate so that users can search and filter the tables. Say there is a row for the player "playerX", who has rank 60. If I filter the scores for "playerX", I see only this row, but still with rank 60. That means the ranks are "hard stored" and not just displayed dynamically via a row number or something like that.
The score tables are filled by a cron job (using an additional dummy table) that does the following:
copies the userdata to a dummy table
alters the dummy table to sort it by score
copies the dummy table into the specific score table, so that the AUTO_INCREMENT primary key (rank) is automatically filled with the right values, representing each user's rank.
That means: when there are five specific scores, there are also five score tables plus the dummy table, six in total.
How to optimize?
I would like to optimize the whole thing by dropping the duplicate tables (and avoiding the dummy table if possible) and storing all the score data in one table with the following columns:
userid, overall_score, overall_rank, trade_score, trade_rank, quest_score, quest_rank
My question is: what is the best way to do this, and is there an alternative to the approach shown above (with all the different tables)? MySQL statements and/or PHP code are welcome.
Some time ago I tried using row numbers, but this doesn't work because a) they can't be used in INSERT statements and b) when filtering, every player (like 'playerX' in the example above) would be at rank 1, since theirs is the only row returned.
Well, you can try creating a table with the following configuration:
id | name | score_overall | score_trade | score_quest | overall_rank | trade_rank | quest_rank
If you do that, you can use the following query to populate the table:
SET @overall_rank:=-(SELECT COUNT(id) FROM users);
SET @trade_rank:=@overall_rank;
SET @quest_rank:=@overall_rank;
SELECT *
FROM users u
INNER JOIN (SELECT id,
@overall_rank:=@overall_rank+1 AS overall_rank
FROM users
ORDER BY score_overall DESC) ovr
ON u.id = ovr.id
INNER JOIN (SELECT id,
@trade_rank:=@trade_rank+1 AS trade_rank
FROM users
ORDER BY score_trade DESC) tr
ON u.id = tr.id
INNER JOIN (SELECT id,
@quest_rank:=@quest_rank+1 AS quest_rank
FROM users
ORDER BY score_quest DESC) qr
ON u.id = qr.id
ORDER BY u.id ASC
I've prepared an SQL-fiddle for you.
That said, I think performance will become an issue once you get a lot of records.
A bit of explanation: the @*_rank things are MySQL user-defined variables. Each one is incremented by 1 for every row.
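A different technique from the variable-based query above: window functions (available in MySQL 8.0+ and SQLite 3.25+) compute the ranks directly, so a single INSERT ... SELECT can fill the combined table the question describes. The sketch uses sqlite3 as a stand-in for MySQL; the table name user_scores is hypothetical, and RANK() gives tied scores the same rank (ROW_NUMBER() would break ties arbitrarily instead). Because the ranks are stored, filtering by name still shows each player's real rank, which addresses point b) in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT,
                        score_overall INTEGER, score_trade INTEGER, score_quest INTEGER);
    INSERT INTO users VALUES
        (1,'one',40000,10000,20000), (2,'two',20000,15000,0),
        (3,'three',30000,1000,50000), (4,'four',80000,60000,3000);

    -- Single combined table (hypothetical name) replacing the per-score tables.
    CREATE TABLE user_scores (
        userid INTEGER PRIMARY KEY,
        overall_score INTEGER, overall_rank INTEGER,
        trade_score INTEGER, trade_rank INTEGER,
        quest_score INTEGER, quest_rank INTEGER
    );
""")


def refresh_ranks(conn):
    """Recompute every rank in one statement (e.g. from the cron job)."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("DELETE FROM user_scores")
        conn.execute("""
            INSERT INTO user_scores
            SELECT id,
                   score_overall, RANK() OVER (ORDER BY score_overall DESC),
                   score_trade,   RANK() OVER (ORDER BY score_trade DESC),
                   score_quest,   RANK() OVER (ORDER BY score_quest DESC)
            FROM users
        """)


def ranks_for(conn, name):
    """Look up one player's stored ranks, as a name filter in the UI would."""
    return conn.execute("""
        SELECT s.overall_rank, s.trade_rank, s.quest_rank
        FROM user_scores s JOIN users u ON u.id = s.userid
        WHERE u.name = ?
    """, (name,)).fetchone()


refresh_ranks(conn)
print(ranks_for(conn, "two"))  # (4, 2, 4)
```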