I just want to ask: with the database design and query below, will this have a big effect on query performance, or should I break the query down instead of using multiple subqueries? The subquery works for me, but I want to make sure it will not hurt performance someday. My goal with this query is to generate dynamic queries for all those tables. An example is to view the list of participants who attended a specific commodity, with a start date between two given dates. Thank you in advance.
Below is my sample query, which lists participants between 1 and 3 years old.
SELECT TT.title, TTP.*, TS.*, TP.lastname, TP.firstname,
       (DATE_FORMAT(NOW(), '%Y') - DATE_FORMAT(TP.birthday, '%Y')
         - (DATE_FORMAT(NOW(), '00-%m-%d') < DATE_FORMAT(TP.birthday, '00-%m-%d'))
       ) AS age,
       TP.birthday
FROM tbl_trainingparticipant AS TTP
LEFT JOIN (
    SELECT t1.*
    FROM tbl_schedules t1
    WHERE t1.sched_id = (
        SELECT t2.sched_id
        FROM tbl_schedules t2
        WHERE t2.training_id = t1.training_id
        ORDER BY t2.sched_id DESC
        LIMIT 1
    )
) AS TS ON TTP.training_id = TS.training_id
LEFT JOIN tbl_participants AS TP
    ON TTP.participant_id = TP.participant_id
LEFT JOIN tbl_trainings AS TT
    ON TT.training_id = TS.training_id
HAVING age BETWEEN 1 AND 3
SQL has the ability to nest queries within one another. A subquery is a SELECT statement nested within another SELECT statement that returns intermediate results. SQL executes the innermost subquery first, then the next level out.
So if you use a nested select, you are handing MySQL two queries that are executed together in a single round trip. Usually it's going to be faster to do one trip than several.
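For example, the latest-schedule lookup in the derived table above can also be expressed as a single join against a grouped subquery, avoiding the row-by-row correlated lookup. A sketch, assuming a higher sched_id always means a newer schedule:

SELECT TS.*
FROM tbl_schedules AS TS
INNER JOIN (
    -- one row per training: its most recent schedule id
    SELECT training_id, MAX(sched_id) AS sched_id
    FROM tbl_schedules
    GROUP BY training_id
) AS latest
    ON TS.training_id = latest.training_id
   AND TS.sched_id = latest.sched_id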
Related
Here's a problem I'm facing: I need to list some items. Those items come from different sources (say table A, table B, table C), with different attributes and natures (although some are common).
How can I merge them together in a list that is paginated?
The options I've considered:
Get them all first, then sort and paginate them afterwards in the code. This doesn't work well because there are too many items (thousands) and performance is a mess.
Join them in a SQL view with their shared attributes, once the SQL query is done, reload only the paginated items to get the rest of their attributes. This works so far, but might become difficult to maintain if the sources change/increase.
Do you know any other option? Basically, what is the most used/recommended way to paginate items from two data sources (either in SQL or directly in the code)?
Thanks.
If UNION solves the problem, here are some syntax and optimization tips.
This will provide page 21 of 10-row pages:
(
( SELECT ... LIMIT 210 )
UNION [ALL|DISTINCT]
( SELECT ... LIMIT 210 )
) ORDER BY ... LIMIT 10 OFFSET 200
Note that 210 = 200+10. You can't trust using OFFSET in the inner SELECTs.
Use UNION ALL for speed, but if there could be repeated rows between the SELECTs, then explicitly say UNION DISTINCT.
If you take away too many parentheses, you will get either syntax errors or the 'wrong' results.
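To make the template concrete, here is a hypothetical instance with two tables, posts and comments, that share id and created_at columns (table and column names are assumptions, not from the original):

( SELECT id, created_at FROM posts    ORDER BY created_at DESC LIMIT 210 )
UNION ALL
( SELECT id, created_at FROM comments ORDER BY created_at DESC LIMIT 210 )
ORDER BY created_at DESC
LIMIT 10 OFFSET 200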
If you end up with a subquery, repeat the ORDER BY but not the LIMIT:
SELECT ...
FROM (
( SELECT ... LIMIT 210 )
UNION [ALL|DISTINCT]
( SELECT ... LIMIT 210 )
ORDER BY ... LIMIT 10 OFFSET 200
) AS u
JOIN something_else ON ...
ORDER BY ...
One reason you might include a JOIN like this is performance -- the subquery u has boiled the result set down to only 10 rows, so the JOIN has only 10 things to look up. Putting the JOIN inside would lead to lots of joining before whittling down to only 10.
I actually had to handle a similar situation very recently, specifically reporting across two large tables and paginating across both of them. The answer I arrived at was to use subqueries, like so:
SELECT
t1.id as 't1_id',
t1.name as 't1_name',
t1.attribute as 't1_attribute',
t2.id as 't2_id',
t2.name as 't2_name',
t2.attribute as 't2_attribute',
l.attribute as 'l_attribute'
FROM (
SELECT
id, name, attribute
FROM
table1
/* You can perform joins in here if you want, just make sure you're using your aliases right */
/* You can also put where statements here */
ORDER BY
name DESC, id ASC
LIMIT 0,50
) as t1
INNER JOIN (
SELECT
id,
name,
attribute
FROM
table2
ORDER BY
attribute ASC
LIMIT 250,50
) as t2
ON t2.id IS NOT NULL /* always true here: pairs each t1 page row with each t2 page row */
LEFT JOIN
linkingTable as l
ON l.t1Id = t1.id
AND l.t2Id = t2.id
/* Do your wheres and stuff here */
/* You shouldn't need to do any additional ordering or limiting */
The current implementation is a single complex query with multiple joins and temporary tables, but it is putting too much stress on my MySQL server and is taking upwards of 30 seconds to load the table. The data is retrieved by PHP via a JavaScript Ajax call and displayed on a webpage. Here are the tables involved:
Table: table_companies
Columns: company_id, ...
Table: table_manufacture_line
Columns: line_id, line_name, ...
Table: table_product_stereo
Columns: product_id, line_id, company_id, assembly_datetime, serial_number, ...
Table: table_product_television
Columns: product_id, line_id, company_id, assembly_datetime, serial_number, warranty_expiry, ...
A single company can have 100k+ items split between the two product tables. The product tables are unioned and filtered by the line_name, then ordered by assembly_datetime and limited depending on the paging. The datetime value is also reliant on timezone and this is applied as part of the query (another JOIN + temp table). line_name is also one of the returned columns.
I was thinking of splitting the line_name filter out from the product union query. Essentially I'd determine the ids of the lines matching the filter, then run a UNION query with a WHERE condition WHERE line_id IN (<results from previous query>). This would cut out the joins and temp tables, and I could map line_id back to line_name and apply the timezone modification in PHP, but I'm not sure this is the best way to go about things.
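A sketch of that two-step approach (the LIKE pattern and the literal ids are placeholders; # is the company id, as in the queries below):

-- Step 1: resolve the line_name filter to line ids
SELECT line_id
FROM table_manufacture_line
WHERE line_name LIKE '%test';

-- Step 2: union the product tables, filtering on those ids
(SELECT line_id, assembly_datetime FROM table_product_stereos
 WHERE company_id = # AND line_id IN (3, 7))   -- ids from step 1
UNION ALL
(SELECT line_id, assembly_datetime FROM table_product_televisions
 WHERE company_id = # AND line_id IN (3, 7))
ORDER BY assembly_datetime DESC
LIMIT 0, 100;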
I have also looked at potentially using Redis, but the large number of individual products is leading to a similarly long wait time when pushing all of the data to Redis via PHP (20-30 seconds), even if it is just pulled in directly from the product tables.
Is it possible to tweak the existing queries to increase the efficiency?
Can I push some of the handling to PHP to decrease the load on the SQL server? What about Redis?
Is there a way to architect the tables better?
What other solution(s) would you suggest?
I appreciate any input you can provide.
Edit:
Existing query:
SELECT line_name,CONVERT_TZ(datetime,'UTC',timezone) datetime,... FROM (SELECT line_name,datetime,... FROM ((SELECT line_id,assembly_datetime datetime,... FROM table_product_stereos WHERE company_id=# ) UNION (SELECT line_id,assembly_datetime datetime,... FROM table_product_televisions WHERE company_id=# )) AS union_products INNER JOIN table_manufacture_line USING (line_id)) AS products INNER JOIN (SELECT timezone FROM table_companies WHERE company_id=# ) AS tz ORDER BY datetime DESC LIMIT 0,100
Here it is formatted for some readability.
SELECT line_name,CONVERT_TZ(datetime,'UTC',tz.timezone) datetime,...
FROM (SELECT line_name,datetime,...
FROM (SELECT line_id,assembly_datetime datetime,...
FROM table_product_stereos WHERE company_id=#
UNION
SELECT line_id,assembly_datetime datetime,...
FROM table_product_televisions
WHERE company_id=#
) AS union_products
INNER JOIN table_manufacture_line USING (line_id)
) AS products
INNER JOIN (SELECT timezone
FROM table_companies
WHERE company_id=#
) AS tz
ORDER BY datetime DESC LIMIT 0,100
IDs are indexed; the primary key is the first column of each index.
Let's build this query up from its component parts to see what we can optimize.
Observation: you're fetching the 100 most recent rows from the union of two large product tables.
So, let's start by trying to optimize the subqueries fetching stuff from the product tables. Here is one of them.
SELECT line_id,assembly_datetime datetime,...
FROM table_product_stereos
WHERE company_id=#
But look, you only need the 100 newest entries here. So, let's add
ORDER BY assembly_datetime DESC
LIMIT 100
to this query. Also, you should put a compound index on this table as follows. This will allow both the WHERE and ORDER BY lookups to be satisfied by the index.
CREATE INDEX id_date ON table_product_stereos (company_id, assembly_datetime)
All the same considerations apply to the query from table_product_televisions. Order it by the time, limit it to 100, and index it.
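For completeness, the matching index on the television table, mirroring the one above:

CREATE INDEX id_date ON table_product_televisions (company_id, assembly_datetime)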
If you need to apply other selection criteria, you can put them in these inner queries. For example, in a comment you mentioned a selection based on a substring search. You could do this as follows
SELECT t.line_id,t.assembly_datetime datetime,...
FROM table_product_stereos AS t
JOIN table_manufacture_line AS m ON m.line_id = t.line_id
AND m.line_name LIKE '%test'
WHERE company_id=#
ORDER BY assembly_datetime DESC
LIMIT 100
Next, you are using UNION to combine those two query result sets into one. UNION has the function of eliminating duplicates, which is time-consuming. (You know you don't have duplicates, but MySQL doesn't.) Use UNION ALL instead.
Putting this all together, the innermost sub query becomes this. We have to wrap up the subqueries because SQL is confused by UNION and ORDER BY clauses at the same query level.
SELECT * FROM (
SELECT line_id,assembly_datetime datetime,...
FROM table_product_stereos
WHERE company_id=#
ORDER BY assembly_datetime DESC
LIMIT 100
) AS st
UNION ALL
SELECT * FROM (
SELECT line_id,assembly_datetime datetime,...
FROM table_product_televisions
WHERE company_id=#
ORDER BY assembly_datetime DESC
LIMIT 100
) AS tv
That gets you 200 rows. It should get those rows fairly quickly.
200 rows are guaranteed to be enough to give you the 100 most recent items later on after you do your outer ORDER BY ... LIMIT operation. But that operation only has to crunch 200 rows, not 100K+, so it will be far faster.
Finally wrap up this query in your outer query material. Join the table_manufacture_line information, and fix up the timezone.
If you do the indexing and the ORDER BY ... LIMIT operation earlier, this query should become very fast.
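Put together, the reworked query might look like this (a sketch; the elided column lists stay abbreviated as in the original):

SELECT ml.line_name,
       CONVERT_TZ(p.datetime, 'UTC', tz.timezone) AS datetime
       /* , ... other columns as in the original ... */
FROM (
    SELECT * FROM (
        SELECT line_id, assembly_datetime AS datetime /* , ... */
        FROM table_product_stereos
        WHERE company_id = #
        ORDER BY assembly_datetime DESC
        LIMIT 100
    ) AS st
    UNION ALL
    SELECT * FROM (
        SELECT line_id, assembly_datetime AS datetime /* , ... */
        FROM table_product_televisions
        WHERE company_id = #
        ORDER BY assembly_datetime DESC
        LIMIT 100
    ) AS tv
) AS p
INNER JOIN table_manufacture_line AS ml USING (line_id)
INNER JOIN (SELECT timezone FROM table_companies WHERE company_id = #) AS tz
ORDER BY p.datetime DESC
LIMIT 0, 100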
The comment dialog in your question indicates to me that you may have multiple product types, not just two, and that you have complex selection criteria for your paged display. Using UNION ALL on large numbers of rows slams performance: it converts multiple indexed tables into an internal list of rows that simply can't be searched efficiently.
You really should consider putting your two kinds of product data in a single table instead of having to UNION ALL multiple product tables. The setup you have now is inflexible and won't scale up easily. If you structure your schema with a master product table and perhaps some attribute tables for product-specific information, you will find yourself much happier two years from now. Seriously. Please consider making the change.
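A sketch of what such a consolidated schema might look like (all names here are illustrative assumptions, not from the original):

-- Master table holding the attributes common to every product type
CREATE TABLE table_products (
    product_id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
    product_type      ENUM('stereo', 'television') NOT NULL,
    line_id           INT UNSIGNED NOT NULL,
    company_id        INT UNSIGNED NOT NULL,
    assembly_datetime DATETIME NOT NULL,
    serial_number     VARCHAR(64) NOT NULL,
    PRIMARY KEY (product_id),
    KEY idx_company_date (company_id, assembly_datetime)
);

-- Attribute table for type-specific data, e.g. televisions
CREATE TABLE table_product_television_attrs (
    product_id      INT UNSIGNED NOT NULL,
    warranty_expiry DATE NOT NULL,
    PRIMARY KEY (product_id)
);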
Remember: index fast, data slow. Prefer joins over nested queries. Nested queries materialize all of the data fields, whereas joins only consider the filter columns (which should all be indexed; make sure there's a unique index on table_product_*.line_id). It's been a while, but I'm pretty sure you can join ON company_id=#, which should cut down the results early on.
In this case, all of the results refer to the same company (or a much smaller subset) so it makes sense to run that query separately (and it makes the query more maintainable).
So your data source would be:
(SELECT prod.product_id, prod.line_id, prod.assembly_datetime, ml.line_name
 FROM table_product_stereos AS prod
 INNER JOIN table_manufacture_line AS ml
     ON prod.line_id = ml.line_id AND prod.company_id = #)
UNION
(SELECT prod.product_id, prod.line_id, prod.assembly_datetime, ml.line_name
 FROM table_product_televisions AS prod
 INNER JOIN table_manufacture_line AS ml
     ON prod.line_id = ml.line_id AND prod.company_id = #)
Add whichever prod or ml fields you require to both branch select lists.
PHP is not a solution at all...
Redis can be a solution.
But the main thing I would change is the index creation for the tables (add the missing indexes). If your queries are producing temporary tables, the indexes were not designed well for those tables, and 100k rows is not much at all.
But I can't help you without the table creation statements as well as the queries you run.
Make sure the columns in your WHERE clause match your B-tree index from left to right.
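For example, if your queries filter on company_id and line_id and sort by assembly_datetime, a compound index matching that left to right could look like this (the index name is illustrative):

CREATE INDEX idx_company_line_date
    ON table_product_stereos (company_id, line_id, assembly_datetime);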
I want to query the database to fetch the last visit time of every user. Here is the query:
SELECT
u.user_id,
u.firstname,
u.lastname,
u.email,
pv.visit_time
FROM
users u
LEFT OUTER JOIN pageviews pv
ON u.user_id = pv.user_id
GROUP BY pv.user_id
LIMIT 0, 12
This query takes 30 to 40 seconds to execute on the live server; however, if I remove the GROUP BY clause it takes 3 to 6 seconds, but returns duplicate records. Any idea what's wrong with this query?
I have also tried DISTINCT and hit the same issue.
Thanks, any help would be appreciated.
What are your indexes?
Do you really want a LEFT JOIN here? With a LEFT OUTER JOIN, users without page views still get a row, but with NULLs in the pv columns, and since you group on pv.user_id they all collapse into a single NULL group.
Further, you are using GROUP BY to return a single row for each user. However, which row is returned is not defined, so any of a user's page view visit_time values could be brought back.
Also, you have only a single column in the GROUP BY clause but other non-aggregated columns in the select list. With default options in MySQL this works, but it will not work in most flavours of SQL, nor when MySQL performs the GROUP BY in strict mode (see this manual page).
Add an index on u.user_id and a compound index on pv.user_id and pv.visit_time. Then, assuming you want the latest visit time for each user, try the query as:
SELECT u.user_id,
u.firstname,
u.lastname,
u.email,
MAX(pv.visit_time)
FROM users u
INNER JOIN pageviews pv
ON u.user_id = pv.user_id
GROUP BY u.user_id, u.firstname, u.lastname, u.email
ORDER BY u.user_id
LIMIT 0, 12
(Strictly speaking, the ORDER BY clause used to be implied by the GROUP BY clause in older MySQL versions; MySQL 8.0 no longer sorts GROUP BY results implicitly, and either way the explicit ORDER BY makes it clear to anyone reading the code in future what is expected.)
The GROUP BY clause and DISTINCT both require a full scan of the table.
Maybe the query without the GROUP BY clause is only faster at returning the first rows; have you checked how long it takes to retrieve the whole result set?
If it really takes only 3-6 seconds, I would refresh the statistics; maybe the optimiser is not making the best choices for the join (I imagine the pageviews table is a large one).
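In MySQL, refreshing the statistics is a one-liner (assuming InnoDB tables):

-- Recompute index statistics so the optimiser has fresh cardinality estimates
ANALYZE TABLE users, pageviews;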
SELECT t1.x, t1.y, t1.z FROM table1 t1 GROUP BY t1.x, t1.y, t1.z ...
This will give better performance.
The GROUP BY fields (x, y, z) should be the same columns that appear in the SELECT list.
Try it (the GROUP BY operation then happens within the result set of the query above).
I have a problem with an SQL query. This is my first time using advanced SQL operations like this so it could be that I'm missing something basic. I am running this query:
SELECT countries.id,
countries.name,
AVG(users.points) AS average
FROM countries
LEFT JOIN users
ON countries.id = users.country
ORDER BY average DESC
This query is only returning 1 row, so the ORDER BY has no visible effect. My aim is to get all the records in the Countries table together with the average of the points awarded to the users from each country, and I want it to return the countries which have no users assigned to them as well. I have done this in 2 queries and it worked, but I thought maybe I could do it with only one query. What am I missing?
It is only returning one row because it is an aggregation query without a group by. Perhaps you mean:
SELECT c.id, c.name, AVG(u.points) AS average
FROM countries c LEFT JOIN
users u
ON c.id = u.country
GROUP BY c.id, c.name
ORDER BY average DESC;
The AVG() makes this an aggregation query. Without the GROUP BY, SQL interprets it as returning one row summarizing all the rows. MySQL supports an extension to the SQL standard where columns in the SELECT do not have to be in the GROUP BY; in most databases, your query would return an error.
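For what it's worth, you can check whether your MySQL session enforces the standard behaviour. If ONLY_FULL_GROUP_BY appears in the mode (the default since MySQL 5.7.5), non-aggregated columns missing from the GROUP BY cause an error rather than silently returning one row:

-- Inspect the current session SQL mode
SELECT @@SESSION.sql_mode;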
I've been racking my brain for hours trying to work out how to join these two queries.
My goal is to return multiple venue rows (from venues) based on certain criteria, which is what my current query does:
SELECT venues.id AS ven_id,
venues.venue_name,
venues.sub_category_id,
venues.score,
venues.lat,
venues.lng,
venues.short_description,
sub_categories.id,
sub_categories.sub_cat_name,
sub_categories.category_id,
categories.id,
categories.category_name,
((ACOS( SIN(51.44*PI()/180)*SIN(lat*PI()/180) + COS(51.44*PI()/180)*COS(lat*PI()/180)*COS((-2.60796 - lng)*PI()/180)) * 180/PI())*60 * 1.1515) AS dist
FROM venues,
sub_categories,
categories
WHERE
venues.sub_category_id = sub_categories.id
AND sub_categories.category_id = categories.id
HAVING
dist < 5
ORDER BY score DESC
LIMIT 0, 100
However, I need to include another field in this query (thumbnail), which comes from another table (venue_images). The idea is to extract one image row based on which venue it is related to and its order; only one image needs to be extracted, hence LIMIT 1.
I basically need to insert this query:
SELECT
venue_images.thumb_image_filename,
venue_images.image_venue_id,
venue_images.image_order
FROM venue_images
WHERE venue_images.image_venue_id = ven_id -- id from above query
ORDER BY venue_images.image_order
LIMIT 1
Into my first query, and label this new field as "thumbnail".
Any help would really be appreciated. Thanks!
First of all, you could write the first query using INNER JOIN:
SELECT
...
FROM
venues INNER JOIN sub_categories ON venues.sub_category_id = sub_categories.id
INNER JOIN categories ON sub_categories.category_id = categories.id
HAVING
...
the result should be identical, but I like this one more.
What I'd like to do next is to JOIN a subquery, something like this:
...
INNER JOIN (SELECT ... FROM venue_images
WHERE venue_images.image_venue_id = ven_id -- id from above query
ORDER BY venue_images.image_order
LIMIT 1) first_image
but unfortunately this subquery can't see ven_id, because it is evaluated first, before the outer query (a limitation of MySQL derived tables), so we can't use that and have to find another solution. And since you are using LIMIT 1, it's not easy to rewrite the condition you need using just JOINs.
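(As an aside: MySQL 8.0.14 and later support LATERAL derived tables, which lift exactly this restriction. A sketch, reusing the column names from the question:

SELECT v.id AS ven_id, fi.thumb_image_filename AS thumbnail
FROM venues AS v,
LATERAL (
    SELECT thumb_image_filename
    FROM venue_images
    WHERE image_venue_id = v.id   -- the lateral subquery may reference v
    ORDER BY image_order
    LIMIT 1
) AS fi;
-- Note: venues with no images drop out; use LEFT JOIN LATERAL ... ON TRUE to keep them.

If you are on an older MySQL, read on for a workaround.)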
It would be easier if MySQL provided a FIRST() aggregate function, but since it doesn't, we have to simulate it; see for example this question: How to fetch the first and last record of a grouped record in a MySQL query with aggregate functions?
So using this trick, you can write a query that extracts the first image_id for every image_venue_id:
SELECT
image_venue_id,
SUBSTRING_INDEX(
GROUP_CONCAT(image_id order by venue_images.image_order),',',1) as first_image_id
FROM venue_images
GROUP BY image_venue_id
and this query could be integrated in your query above:
SELECT
...
FROM
venues INNER JOIN sub_categories ON venues.sub_category_id = sub_categories.id
INNER JOIN categories ON sub_categories.category_id = categories.id
INNER JOIN (/* the query above */) AS first_image ON first_image.image_venue_id = venues.id
INNER JOIN venue_images on first_image.first_image_id = venue_images.image_id
HAVING
...
I also added one more JOIN, to join the first image id with the actual image. I couldn't test your query, but the idea is to proceed like this.
Since the query is now becoming more complicated and difficult to maintain, I think it would be better to create a view that extracts the first image for every venue, and then join just the view in your query. This is just an idea. Let me know if it works or if you need any help!
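A sketch of that view, built from the grouped query above (the view name is an assumption):

CREATE VIEW venue_first_image AS
SELECT image_venue_id,
       SUBSTRING_INDEX(
           GROUP_CONCAT(image_id ORDER BY image_order), ',', 1) AS first_image_id
FROM venue_images
GROUP BY image_venue_id;

Your main query could then simply INNER JOIN venue_first_image ON venue_first_image.image_venue_id = venues.id.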
I'm not too sure about your data, but a JOIN with the thumbnails table and a GROUP BY on your large query would probably work:
GROUP BY venues.id