How to optimize a SQL query using multiple tables - php

I have this SQL query here that grabs the 5 latest news posts. I want to make it so it also grabs the total likes and total news comments in the same query. But the query I made seems to be a little slow when working with large amounts of data, so I am trying to see if I can find a better solution. Here it is below:
SELECT *,
`id` as `newscode`,
(SELECT COUNT(*) FROM `likes` WHERE `type`="newspost" AND `code`=`newscode`) as `total_likes`,
(SELECT COUNT(*) FROM `news_comments` WHERE `post_id`=`newscode`) as `total_comments`
FROM `news` ORDER BY `id` DESC LIMIT 5
Here is a SQLFiddle as well: http://sqlfiddle.com/#!2/d3ecbf/1

I would recommend adding total_likes and total_comments columns to the news table, incremented/decremented whenever a like or comment is added or removed.
Your likes and news_comments tables should be used for historical purposes only.
This strenuous counting should not be performed every time a page is loaded, because that is a complete waste of resources. A sketch of the approach follows.
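A minimal sketch of the counter-column approach, assuming every like/comment insert and delete goes through code you control (the column names are illustrative):
ALTER TABLE `news`
  ADD COLUMN `total_likes` INT NOT NULL DEFAULT 0,
  ADD COLUMN `total_comments` INT NOT NULL DEFAULT 0;

-- when a like on a news post is added:
UPDATE `news` SET `total_likes` = `total_likes` + 1 WHERE `id` = ?;

-- when a like is removed:
UPDATE `news` SET `total_likes` = `total_likes` - 1 WHERE `id` = ?;
The front page then becomes a single read with no counting at all: SELECT * FROM `news` ORDER BY `id` DESC LIMIT 5.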

You could rewrite this using joins; MySQL has known performance issues with correlated subqueries, especially when dealing with large data sets:
SELECT n.*,
`id` as `newscode`,
COALESCE(l.TotalLikes, 0) AS `total_likes`,
COALESCE(c.TotalComments, 0) AS `total_comments`
FROM `news` n
LEFT JOIN
( SELECT Code, COUNT(*) AS TotalLikes
FROM `likes`
WHERE `type` = "newspost"
GROUP BY Code
) AS l
ON l.`code` = n.`id`
LEFT JOIN
( SELECT post_id, COUNT(*) AS TotalComments
FROM `news_comments`
GROUP BY post_id
) AS c
ON c.`post_id` = n.`id`
ORDER BY n.`id` DESC LIMIT 5;
The reason is that when you use a join as above, MySQL will materialise the results of the subquery when it is first needed. E.g. at the start of this query, MySQL will put the results of:
SELECT post_id, COUNT(*) AS TotalComments
FROM `news_comments`
GROUP BY post_id
into an in-memory table and hash post_id for faster lookups. Then, for each row in news, it only has to look up TotalComments in this hashed table. With a correlated subquery, by contrast, MySQL executes the subquery once for each row in news, which results in a large number of executions when news is large. If the initial result set is small, you may not see a performance benefit, and the join version may even be worse.
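You can verify which plan you are getting with EXPLAIN: the correlated form shows DEPENDENT SUBQUERY rows, while the join form shows DERIVED tables that are materialised once. For example:
EXPLAIN
SELECT n.*, COALESCE(c.TotalComments, 0) AS `total_comments`
FROM `news` n
LEFT JOIN ( SELECT post_id, COUNT(*) AS TotalComments
            FROM `news_comments`
            GROUP BY post_id ) AS c
ON c.`post_id` = n.`id`
ORDER BY n.`id` DESC LIMIT 5;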
Examples on SQL Fiddle
Finally, you may want to index the relevant fields in news_comments and likes. For this particular query I think the following indexes will help:
CREATE INDEX IX_Likes_Code_Type ON Likes (Code, Type);
CREATE INDEX IX_newcomments_post_id ON news_comments (post_id);
Although you may need to split the first index into two:
CREATE INDEX IX_Likes_Code ON Likes (Code);
CREATE INDEX IX_Likes_Type ON Likes (Type);

First, check for helpful indexes on the columns id and post_id, and a composite index on (type, code).

I assume this is T-SQL, as that is what I am most familiar with.
First I would check indexes. If those look good, then I'd check the statement itself: look at the execution plan to see how the engine is populating your result.
Logically, SQL evaluates the FROM and WHERE clauses before anything else, so every row that survives your WHERE conditions gets grouped by code and type before the count is produced.
Right now, you're grabbing everything with certain codes, regardless of date. Since you stated that you want the latest, I assume there is a date column somewhere.
In order to speed things up, add another AND to your WHERE clause and account for the date: either the last 24 hours, the last week, whatever fits. See the sketch below.
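For example, a sketch assuming news has a created_at datetime column (the question does not show one, so adjust the name to your schema):
SELECT *
FROM `news`
WHERE `created_at` >= NOW() - INTERVAL 7 DAY
ORDER BY `id` DESC
LIMIT 5;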

Related

Select additional rows depending on each row's value

What I'm trying to do is: I'm trying to build a comments and replies system for my website. I already have this working, but the way I'm doing is probably not the best for performance.
I want to select 10 rows from a table that contains the comments, then I want to select 2 additional rows from another table that contains the replies for each of these comments. I do that by having a loop on PHP to select 2 replies from another table for each comment. Something more or less like this:
$comments = $MySQL->fetchRows("SELECT id, text FROM comments LIMIT 10");
foreach ($comments as $i => $c) {
    // braces are required to interpolate $c['id'] inside a double-quoted string
    $comments[$i]["replies"] = $MySQL->fetchRows("SELECT id, text FROM replies WHERE comment_id = {$c['id']} LIMIT 2");
}
Like I said, I'm sure this isn't the most optimal way of doing it, since it requires multiple calls to the database. Is there a better way of doing this in a single query using MySQL?
I often have 40 well-tuned queries on a single web page. It is not bad.
On the other hand, JOINs, UNIONs, Stored Procedures, etc can cut down the number of roundtrips to the server.
Notes:
A LIMIT without an ORDER BY does not make much sense.
The two queries you have can be combined using a JOIN and a "derived table".
SELECT c.text, r.text
FROM ( SELECT id, text FROM comments
WHERE ...
ORDER BY ...
LIMIT 10 ) AS c
JOIN replies AS r
ON r.comment_id = c.id -- assuming replies.comment_id, as in your PHP loop
That will find all the replies from some 10 "comments".
To have limits on both gets trickier; see the sketch below.
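On MySQL 8.0+, one way to limit both sides in a single statement (a sketch, assuming replies has a comment_id column and that id is a reasonable ordering for both tables) is ROW_NUMBER():
SELECT c.id, c.text, r.id AS reply_id, r.text AS reply_text
FROM ( SELECT id, text
       FROM comments
       ORDER BY id DESC
       LIMIT 10 ) AS c
LEFT JOIN ( SELECT id, text, comment_id,
                   ROW_NUMBER() OVER (PARTITION BY comment_id ORDER BY id) AS rn
            FROM replies ) AS r
       ON r.comment_id = c.id AND r.rn <= 2;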

yii2 data provider query takes very long time

I am using a Yii2 data provider to extract data from the database. The raw query looks like this:
SELECT `client_money_operation`.* FROM `client_money_operation`
LEFT JOIN `user` ON `client_money_operation`.`user_id` = `user`.`id`
LEFT JOIN `client` ON `client_money_operation`.`client_id` = `client`.`id`
LEFT JOIN `client_bonus_operation` ON `client_money_operation`.`id` = `client_bonus_operation`.`money_operation_id`
WHERE (`client_money_operation`.`status`=0) AND (`client_money_operation`.`created_at` BETWEEN 1 AND 1539723600)
GROUP BY `operation_code` ORDER BY `created_at` DESC LIMIT 10
This query takes 107 seconds to execute.
The table client_money_operation contains 132,000 rows. What do I need to do to optimise this query, or to set up my database properly?
Try pagination. But if you really must show a large set of records in one go, remove as many LEFT JOINs as you can. You can duplicate some data in the client_money_operation table if it is genuinely required in the one-go result set.
SELECT mo.*
FROM `client_money_operation` AS mo
LEFT JOIN `user` AS u ON mo.`user_id` = u.`id`
LEFT JOIN `client` AS c ON mo.`client_id` = c.`id`
LEFT JOIN `client_bonus_operation` AS bo ON mo.`id` = bo.`money_operation_id`
WHERE (mo.`status`=0)
AND (mo.`created_at` BETWEEN 1 AND 1539723600)
GROUP BY `operation_code`
ORDER BY `created_at` DESC
LIMIT 10
is a rather confusing use of GROUP BY. First, it is improper to group by one column while having lots of non-aggregated columns in the SELECT list. And the use of created_at in the ORDER BY does not make sense since it is unclear which date will be associated with each operation_code. Perhaps you want MIN(created_at)?
Optimization...
There will be a full scan of mo and (hopefully) PRIMARY KEY lookups into the other tables. Please provide EXPLAIN SELECT ... so we can check this.
The only useful index on mo is INDEX(status, created_at), and it may or may not be useful, depending on how big that date range is.
bo needs some index starting with money_operation_id.
What table(s) are operation_code and created_at in? It makes a big difference to the Optimizer.
But there is a pattern that can probably be used to greatly speed up the query. (I can't give you details without knowing what table those columns are in, nor whether it can be made to work.)
SELECT mo.*
FROM ( SELECT mo.id FROM .. WHERE .. GROUP BY .. ORDER BY .. LIMIT .. ) AS x
JOIN mo ON x.id = mo.id
ORDER BY .. -- yes, repeated
That is, first do (in a derived table) the minimal work to find the ids of the 10 desired rows, then use JOIN(s) to fetch the other columns needed.
(If yii2 cannot be made to generate such, then it is in the way.)
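A concrete version of that pattern, assuming operation_code and created_at both live in client_money_operation (the GROUP BY suggests so, but the question does not confirm it):
SELECT mo.*
FROM ( SELECT MIN(`id`) AS id          -- one representative row per operation_code
       FROM `client_money_operation`
       WHERE `status` = 0
         AND `created_at` BETWEEN 1 AND 1539723600
       GROUP BY `operation_code`
       ORDER BY MIN(`created_at`) DESC
       LIMIT 10
     ) AS x
JOIN `client_money_operation` AS mo ON mo.`id` = x.`id`
ORDER BY mo.`created_at` DESC;    -- yes, repeated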

Displaying a large amount of data in paging table without heavily impacting DB

The current implementation is a single complex query with multiple joins and temporary tables, but it is putting too much stress on my MySQL server and is taking upwards of 30 seconds to load the table. The data is retrieved by PHP via a JavaScript Ajax call and displayed on a webpage. Here are the tables involved:
Table: table_companies
Columns: company_id, ...
Table: table_manufacture_line
Columns: line_id, line_name, ...
Table: table_product_stereo
Columns: product_id, line_id, company_id, assembly_datetime, serial_number, ...
Table: table_product_television
Columns: product_id, line_id, company_id, assembly_datetime, serial_number, warranty_expiry, ...
A single company can have 100k+ items split between the two product tables. The product tables are unioned and filtered by the line_name, then ordered by assembly_datetime and limited depending on the paging. The datetime value is also reliant on timezone and this is applied as part of the query (another JOIN + temp table). line_name is also one of the returned columns.
I was thinking of splitting the line_name filter out from the product union query. Essentially I'd determine the ids of the lines that correspond to the filter, then do a UNION query with a WHERE condition WHERE line_id IN (<results from previous query>). This would cut out the need for joins and temp tables, and I can apply the line_name to line_id and timezone modification in PHP, but I'm not sure this is the best way to go about things.
I have also looked at potentially using Redis, but the large number of individual products is leading to a similarly long wait time when pushing all of the data to Redis via PHP (20-30 seconds), even if it is just pulled in directly from the product tables.
Is it possible to tweak the existing queries to increase the efficiency?
Can I push some of the handling to PHP to decrease the load on the SQL server? What about Redis?
Is there a way to architect the tables better?
What other solution(s) would you suggest?
I appreciate any input you can provide.
Edit:
Existing query:
SELECT line_name,CONVERT_TZ(datetime,'UTC',timezone) datetime,... FROM (SELECT line_name,datetime,... FROM ((SELECT line_id,assembly_datetime datetime,... FROM table_product_stereos WHERE company_id=# ) UNION (SELECT line_id,assembly_datetime datetime,... FROM table_product_televisions WHERE company_id=# )) AS union_products INNER JOIN table_manufacture_line USING (line_id)) AS products INNER JOIN (SELECT timezone FROM table_companies WHERE company_id=# ) AS tz ORDER BY datetime DESC LIMIT 0,100
Here it is formatted for some readability.
SELECT line_name,CONVERT_TZ(datetime,'UTC',tz.timezone) datetime,...
FROM (SELECT line_name,datetime,...
FROM (SELECT line_id,assembly_datetime datetime,...
FROM table_product_stereos WHERE company_id=#
UNION
SELECT line_id,assembly_datetime datetime,...
FROM table_product_televisions
WHERE company_id=#
) AS union_products
INNER JOIN table_manufacture_line USING (line_id)
) AS products
INNER JOIN (SELECT timezone
FROM table_companies
WHERE company_id=#
) AS tz
ORDER BY datetime DESC LIMIT 0,100
IDs are indexed; the primary key is the first key on each table.
Let's build this query up from its component parts to see what we can optimize.
Observation: you're fetching the 100 most recent rows from the union of two large product tables.
So, let's start by trying to optimize the subqueries fetching stuff from the product tables. Here is one of them.
SELECT line_id,assembly_datetime datetime,...
FROM table_product_stereos
WHERE company_id=#
But look, you only need the 100 newest entries here. So, let's add
ORDER BY assembly_datetime DESC
LIMIT 100
to this query. Also, you should put a compound index on this table as follows. This will allow both the WHERE and ORDER BY lookups to be satisfied by the index.
CREATE INDEX id_date ON table_product_stereos (company_id, assembly_datetime)
All the same considerations apply to the query from table_product_televisions. Order it by the time, limit it to 100, and index it.
If you need to apply other selection criteria, you can put them in these inner queries. For example, in a comment you mentioned a selection based on a substring search. You could do this as follows
SELECT t.line_id,t.assembly_datetime datetime,...
FROM table_product_stereos AS t
JOIN table_manufacture_line AS m ON m.line_id = t.line_id
AND m.line_name LIKE '%test'
WHERE company_id=#
ORDER BY assembly_datetime DESC
LIMIT 100
Next, you are using UNION to combine those two query result sets into one. UNION has the function of eliminating duplicates, which is time-consuming. (You know you don't have duplicates, but MySQL doesn't.) Use UNION ALL instead.
Putting this all together, the innermost subquery becomes the following. We have to wrap the subqueries because SQL does not allow UNION and ORDER BY clauses at the same query level.
SELECT * FROM (
SELECT line_id,assembly_datetime datetime,...
FROM table_product_stereos
WHERE company_id=#
ORDER BY assembly_datetime DESC
LIMIT 100
) AS st
UNION ALL
SELECT * FROM (
SELECT line_id,assembly_datetime datetime,...
FROM table_product_televisions
WHERE company_id=#
ORDER BY assembly_datetime DESC
LIMIT 100
) AS tv
That gets you 200 rows. It should get those rows fairly quickly.
200 rows are guaranteed to be enough to give you the 100 most recent items later on after you do your outer ORDER BY ... LIMIT operation. But that operation only has to crunch 200 rows, not 100K+, so it will be far faster.
Finally, wrap this query up in your outer query material: join the table_manufacture_line information, and fix up the timezone.
If you do the indexing and the early ORDER BY ... LIMIT operations as shown, this query should become very fast.
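For reference, a sketch of the assembled query (the ... column lists are abbreviated exactly as in the question):
SELECT ml.line_name,
       CONVERT_TZ(u.datetime,'UTC',tz.timezone) AS datetime, ...
FROM ( SELECT * FROM ( SELECT line_id, assembly_datetime AS datetime, ...
                       FROM table_product_stereos
                       WHERE company_id = #
                       ORDER BY assembly_datetime DESC
                       LIMIT 100 ) AS st
       UNION ALL
       SELECT * FROM ( SELECT line_id, assembly_datetime AS datetime, ...
                       FROM table_product_televisions
                       WHERE company_id = #
                       ORDER BY assembly_datetime DESC
                       LIMIT 100 ) AS tv
     ) AS u
INNER JOIN table_manufacture_line AS ml USING (line_id)
INNER JOIN ( SELECT timezone FROM table_companies WHERE company_id = # ) AS tz
ORDER BY u.datetime DESC
LIMIT 0,100;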
The comment dialog in your question indicates to me that you may have multiple product types, not just two, and that you have complex selection criteria for your paged display. Using UNION ALL on large numbers of rows slams performance: it converts multiple indexed tables into an internal list of rows that simply can't be searched efficiently.
You really should consider putting your two kinds of product data in a single table instead of having to UNION ALL multiple product tables. The setup you have now is inflexible and won't scale up easily. If you structure your schema with a master product table and perhaps some attribute tables for product-specific information, you will find yourself much happier two years from now. Seriously. Please consider making the change.
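One possible shape for that schema, as a sketch only (every name here is illustrative, not taken from your system):
CREATE TABLE table_product (
  product_id        BIGINT       PRIMARY KEY,
  product_type      ENUM('stereo','television') NOT NULL,
  line_id           INT          NOT NULL,
  company_id        INT          NOT NULL,
  assembly_datetime DATETIME     NOT NULL,
  serial_number     VARCHAR(64)  NOT NULL,
  KEY idx_company_date (company_id, assembly_datetime)
);

-- product-specific attributes go in narrow side tables:
CREATE TABLE table_product_television_attr (
  product_id      BIGINT PRIMARY KEY,
  warranty_expiry DATE   NOT NULL,
  FOREIGN KEY (product_id) REFERENCES table_product (product_id)
);
With this structure the paged display is one indexed query against table_product, with no UNION at all.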
Remember: index fast, data slow. Use joins over nested queries: nested queries return all of the data fields, whereas joins only have to consider the filter columns (which should all be indexed; make sure there's an index on table_product_*.line_id). It's been a while, but I'm pretty sure you can put company_id=# in the JOIN condition, which should cut down the results early on.
In this case, all of the results refer to the same company (or a much smaller subset) so it makes sense to run that query separately (and it makes the query more maintainable).
So your data source (as a derived table) would be:
( SELECT prod.*, ml.line_name
  FROM table_product_stereos AS prod
  INNER JOIN table_manufacture_line AS ml
      ON prod.line_id = ml.line_id AND prod.company_id = #
  UNION
  SELECT prod.*, ml.line_name
  FROM table_product_televisions AS prod
  INNER JOIN table_manufacture_line AS ml
      ON prod.line_id = ml.line_id AND prod.company_id = # ) AS u
from which you can select the prod or ml fields as required.
PHP is not a solution at all...
Redis can be a solution.
But the main thing I would change is the index creation for the tables (add the missing indexes). If you're running into temporary tables, you haven't indexed the tables well. And 100k rows is not much at all.
But I can't help you further without the table creation statements and the queries you run.
Make sure your WHERE clause matches your B-tree index from left to right; an illustration follows.
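To illustrate the left-to-right rule with the tables from this question (the index name is just an example):
CREATE INDEX idx_company_date
    ON table_product_stereos (company_id, assembly_datetime);

-- can use the index: the leftmost column is constrained
SELECT serial_number FROM table_product_stereos
WHERE company_id = 42 AND assembly_datetime >= '2015-01-01';

-- cannot use it efficiently: company_id is not constrained
SELECT serial_number FROM table_product_stereos
WHERE assembly_datetime >= '2015-01-01';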

Include NULL as 0 in COUNT SQL Query

I know for a fact this has been asked a few times before, but none of the answers relating to this seem to work, or they are far too confusing for me.
I should probably explain.
I'm trying to create an AJAX script to run to order some results by the number of 'Likes' it has.
My current code is this:
SELECT COUNT(*) AS total, likes.palette_id, palette.*
FROM likes LEFT JOIN palette ON likes.palette_id = palette.palette_id
GROUP BY likes.palette_id
ORDER BY total DESC
This works fine; however, it doesn't list the results with 0 likes, for the obvious reason that they don't exist in the likes table.
I've attached images of the current tables:
Likes table:
http://imgur.com/EGeR3On
Palette table:
http://imgur.com/fKZmSve
There are no results in the likes table until the user clicks 'Like'. It is then that the database gets updated and the palette_id and user_id are inserted.
I'm trying to count how many times palette_id occurs in the likes table, but also display 0 for all palettes that don't appear in the likes table.
Is this possible? If so, can someone help me out at all?
Thank you
It might not be the exact MySQL syntax (I'm used to SQL Server), but it should be pretty straightforward to translate if needed.
SELECT p.*, IFNULL(l.total, 0) AS total
FROM palette p
LEFT JOIN (
SELECT palette_id, COUNT(*) AS total
FROM likes
GROUP BY palette_id
) l
ON l.palette_id = p.palette_id
ORDER BY total DESC
Try this:
SELECT COUNT(likes.palette_id) AS total, palette.palette_id, palette.*
FROM palette LEFT JOIN likes ON likes.palette_id = palette.palette_id
GROUP BY palette.palette_id
ORDER BY total DESC
EDIT:
In regards to the discussion about listing columns that are not in the GROUP BY, there's a good explanation in this MySql documentation page.
MySQL extends the use of GROUP BY so that the select list can refer
to nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
In this example, the palette information not added to the GROUP BY will be the same for each group because we are grouping by palette_id so there won't be any issue using palette.*
Your join is written backwards. It should be palette LEFT JOIN likes, because you want all rows in palette and rows in likes, if they exist. The "all rows in palette" will get you a palette_id for the entries there without any matching "likes."

mysql join not working

I have two tables: "users" and "posts." The posts table has a 'post' column and a 'poster_id' column. I'm working on a PHP page that shows the latest posts by everyone, like this:
SELECT * FROM posts WHERE id < '$whatever' LIMIT 10
This way, I can print each result like this:
id: 43, poster_id:'4', post: hello, world
id: 44, poster_id:'4', post: hello, ward
id: 45, poster_id:'5', post: oh hi!
etc...
Instead of the id, I would like to display the NAME of the poster (there's a column for it in the 'users' table)
I've tried the following:
SELECT *
FROM posts
WHERE id < '$whatever'
INNER JOIN users
ON posts.poster_id = users.id LIMIT 10
Is this the correct type of join for this task? Before learning about joins, I would query the users table for each post result. The result should end up looking similar to this:
id: 43, poster_id:'4', name:'foo', post: hello, world
id: 44, poster_id:'4', name:'foo', post: hello, ward
id: 45, poster_id:'5', name:'fee', post: oh hi!
etc...
Thanks for helping in advance.
The WHERE clause must come after the FROM clause and any JOINs.
SELECT posts.*, users.* -- select your desired columns
FROM posts
INNER JOIN users ON posts.poster_id = users.id
WHERE posts.id < '$whatever'
LIMIT 10
The logical SQL order of operations is as follows:
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
UPDATE 1
For those column names that exist on both tables, add an alias to each so it can be uniquely identified. For example:
SELECT post.colName as PostCol,
users.colName as UserCol, ....
FROM ....
In the example above, both tables have a column named colName. In order to get them both, you need to add aliases to them, so in your front end you use PostCol and UserCol to get their values.
Try:
SELECT *
FROM posts
INNER JOIN users ON posts.poster_id = users.id
WHERE posts.id < '$whatever'
LIMIT 10
Got the syntax a little incorrect.
Should be
SELECT * FROM posts
INNER JOIN users ON posts.poster_id = users.id
WHERE posts.id < '$whatever' LIMIT 10
The answers already given tell you the main reason for your query not working at all (i.e. the WHERE clause should come after the JOIN clauses); however, I'd like to make a couple of additional points:
I would suggest using a LEFT OUTER JOIN for this. It probably won't make much difference, but in the event of a post record having an invalid poster_id, an INNER JOIN means the record is dropped from the results, whereas a LEFT OUTER JOIN means the record is included but the values from the users table are null. I imagine you don't ever want an invalid poster_id on the posts table, but broken data happens even in the best-regulated systems, and in those cases it is helpful to still get the data back from the query.
I would strongly suggest not doing SELECT *, and instead itemising the fields you want back from the query. SELECT * has a number of problems, but it's particularly bad when you have multiple tables in the query, because if both tables have fields with the same name (e.g. id), it becomes very hard to distinguish which one you're working with, as your PHP recordset won't include the table reference. Itemising the fields may make your query string longer, but it won't make the query any slower (if anything it'll be quicker), and it will be easier to work with in the long run.
Neither of these points is essential; the query will work without them (as long as you move the WHERE clause after the JOIN), but they may improve your query and hopefully also improve your understanding of SQL.
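Putting both points together, the query might look like this (a sketch; I am assuming the users name column is literally called name, as in your desired output):
SELECT p.id, p.poster_id, p.post, u.name
FROM posts AS p
LEFT JOIN users AS u ON u.id = p.poster_id
WHERE p.id < '$whatever'
LIMIT 10;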
