Improving a query/table: Long "Copying to tmp table" - php

I have the following query in MySQL:
SELECT t.ID
FROM forum_categories c, forum_threads t
INNER JOIN forum_posts p ON p.ID = t.Last_post
WHERE t.ForumID=36 OR (c.Parent=36 AND t.ForumID=c.ID)
ORDER BY t.Last_post DESC LIMIT 1
The table forum_threads looks like this:
ID --- Title --- ForumID -- Last_post (ID of Last forum Post)
And the table forum_posts like this:
ID --- Content -- Author
And lastly the table forum_categories like this:
ID -- Name --- Parent (another forum_category)
(all simplified)
The table forum_posts currently contains ~200,000 rows and the table forum_threads ~5,000 rows.
Somehow this query sometimes takes about 1-2 seconds.
I already indexed "Last_post", but it doesn't help.
The "Copying to tmp table" step makes up ~99% of the query's total execution time.
I also increased tmp_table_size and sort_buffer_size, but it still makes no difference.
Any ideas?

The query should perform much better when you write it with explicit joins:
select t.id
from forum_threads t
inner join forum_posts p ON p.ID = t.Last_post
inner join forum_categories c on t.ForumID=c.ID
WHERE t.ForumID=36 OR c.Parent=36
ORDER BY t.Last_post DESC
LIMIT 1
For a small data set this will look very nice and the query time will be really good.
So the next question is how to improve it for a large data set, and the answer is an INDEX.
There are 2 joins happening
There is a WHERE clause as well
So you will need to index the tables properly to avoid full table scans.
You can run the following commands to see the current indexes on the tables:
show indexes from forum_threads;
show indexes from forum_posts;
show indexes from forum_categories;
The above commands will show you the indexes associated with the tables. Now suppose there is no index yet; then we will need to do the indexing as follows:
alter table forum_threads add index Last_post_idx (`Last_post`);
alter table forum_posts add index ID_idx (`ID`);
alter table forum_categories add index ID_idx (`ID`);
and finally
alter table forum_threads add index ForumID_idx (`ForumID`);
alter table forum_categories add index Parent_idx (`Parent`);
Now we have indexes on the tables and the query should be way faster.
NOTE: The joining keys between 2 tables should have identical data types and sizes so that the indexes work. For example, in
inner join forum_posts p ON p.ID = t.Last_post
the ID and Last_post columns should have the same data type and size in their respective tables.
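If in doubt, you can compare the two column definitions quickly:
show columns from forum_posts like 'ID';
show columns from forum_threads like 'Last_post';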
There is still an issue with the query: it uses an OR condition, and even with proper indexes the optimizer may still fall back to a full table scan in some cases.
WHERE t.ForumID=36 OR c.Parent=36
So how do we get rid of this? Well, sometimes a UNION works better in these cases. Meaning you run one query with the condition
WHERE t.ForumID=36
followed by a UNION with the same query using the other condition,
WHERE c.Parent=36
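Put together, a sketch of the UNION version, based on the simplified schema shown above:
(SELECT t.ID, t.Last_post
 FROM forum_threads t
 INNER JOIN forum_posts p ON p.ID = t.Last_post
 WHERE t.ForumID = 36)
UNION
(SELECT t.ID, t.Last_post
 FROM forum_threads t
 INNER JOIN forum_posts p ON p.ID = t.Last_post
 INNER JOIN forum_categories c ON t.ForumID = c.ID
 WHERE c.Parent = 36)
ORDER BY Last_post DESC
LIMIT 1;
Each branch can then use its own index (ForumID_idx and Parent_idx respectively) instead of forcing a scan because of the OR.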
But real optimization needs more insight into the tables and the queries that will be run against them.
The explanation above is just an idea of how we can improve the query's performance; in practice there are many possibilities, and these are best handled with the complete table structures and the actual queries that will be applied to them.

Related

yii2 data provider query takes very long time

I am using the yii2 data provider to extract data from the database. The raw query looks like this:
SELECT `client_money_operation`.* FROM `client_money_operation`
LEFT JOIN `user` ON `client_money_operation`.`user_id` = `user`.`id`
LEFT JOIN `client` ON `client_money_operation`.`client_id` = `client`.`id`
LEFT JOIN `client_bonus_operation` ON `client_money_operation`.`id` = `client_bonus_operation`.`money_operation_id`
WHERE (`client_money_operation`.`status`=0) AND (`client_money_operation`.`created_at` BETWEEN 1 AND 1539723600)
GROUP BY `operation_code` ORDER BY `created_at` DESC LIMIT 10
this query takes 107 seconds to execute.
The table client_money_operation contains 132,000 rows. What do I need to do to optimise this query, or to set up my database properly?
Try pagination. But if you really must show a large set of records in one go, remove as many LEFT JOINs as you can. You can duplicate some data in the client_money_operation table if it is genuinely required in the one-go result set.
SELECT mo.*
FROM `client_money_operation` AS mo
LEFT JOIN `user` AS u ON mo.`user_id` = u.`id`
LEFT JOIN `client` AS c ON mo.`client_id` = c.`id`
LEFT JOIN `client_bonus_operation` AS bo ON mo.`id` = bo.`money_operation_id`
WHERE (mo.`status`=0)
AND (mo.`created_at` BETWEEN 1 AND 1539723600)
GROUP BY `operation_code`
ORDER BY `created_at` DESC
LIMIT 10
That is a rather confusing use of GROUP BY. First, it is improper to group by one column while having lots of non-aggregated columns in the SELECT list. And the use of created_at in the ORDER BY does not make sense, since it is unclear which date will be associated with each operation_code. Perhaps you want MIN(created_at)?
Optimization...
There will be a full scan of mo and (hopefully) PRIMARY KEY lookups into the other tables. Please provide EXPLAIN SELECT ... so we can check this.
The only useful index on mo is INDEX(status, created_at), and it may or may not be useful, depending on how big that date range is.
bo needs some index starting with money_operation_id.
What table(s) are operation_code and created_at in? It makes a big difference to the Optimizer.
But there is a pattern that can probably be used to greatly speed up the query. (I can't give you details without knowing what table those columns are in, nor whether it can be made to work.)
SELECT mo.*
FROM ( SELECT mo.id FROM .. WHERE .. GROUP BY .. ORDER BY .. LIMIT .. ) AS x
JOIN mo ON x.id = mo.id
ORDER BY .. -- yes, repeated
That is, first do (in a derived table) the minimal work to find the ids of the 10 rows desired, then use JOIN(s) to fetch the other columns needed.
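For instance, if operation_code and created_at are both columns of client_money_operation (an assumption; the question does not say), the pattern could look like this:
SELECT mo.*
FROM ( SELECT mo2.id
       FROM client_money_operation AS mo2
       WHERE mo2.status = 0
         AND mo2.created_at BETWEEN 1 AND 1539723600
       GROUP BY mo2.operation_code  -- relies on MySQL's loose GROUP BY, as the original query does
       ORDER BY mo2.created_at DESC
       LIMIT 10
     ) AS x
JOIN client_money_operation AS mo ON mo.id = x.id
ORDER BY mo.created_at DESC;  -- yes, repeated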
(If yii2 cannot be made to generate such, then it is in the way.)

Pagination query mysql+php take 20-40 seconds

Hi, I have pagination with AngularJS in my app. I send the filters that the user set to my big query.
UPDATE:
SQL_CALC_FOUND_ROWS is my problem. How do I count the rows matching specific filters? It took 2 seconds for 100,000 rows, and I need that count as the total number for the pagination.
UPDATE:
I have the following inner query that I missed here:
(select count(*) from students as inner_st where st.name = inner_st.name) as names,
when I remove the above inner query, it is much faster
rows: 50,000
Users table: 4 rows
Classes table: 4 rows
indexes: only id as primary key
query time: 20-40 seconds
table: students
columns: id, date, class, name, image, status, user_id, active
table: users
columns: id, full_name, is_admin
query
SELECT SQL_CALC_FOUND_ROWS st.id,
st.date,
st.image,
st.user_id,
st.status,
st.name,
ck.name AS class_name,
users.full_name,
(select count(*) from students AS inner_st where st.name = inner_st.name) AS names
FROM students AS st
LEFT JOIN users ON st.user_id = users.user_id
LEFT JOIN classes AS ck ON st.class = ck.id
WHERE date BETWEEN '2018-01-17' AND DATE_ADD('2018-01-17', INTERVAL 1 DAY)
AND DATE_FORMAT(date,'%H:%i') >= '00:00'
AND DATE_FORMAT(date,'%H:%i') <= '23:59'
AND st.active=1
-- here I can concat filters from web like "and class= 1"
ORDER BY st.date DESC
LIMIT 0, 10
How can I make it faster? When I delete the ORDER BY and SQL_CALC_FOUND_ROWS it is faster, but I need them.
I have heard about indexes, but currently only the primary key is indexed.
A few comments before recommending a different approach to this query:
Did you consider removing SQL_CALC_FOUND_ROWS and instead running two queries (one that counts and one that selects the data)? In some cases it might be quicker than combining both into one query; see the sketch after these comments.
What is the goal of these conditions? What are you trying to achieve? Can we remove them (as it seems they might always return true?) - AND DATE_FORMAT(st.date, '%H:%i') >= '00:00' AND DATE_FORMAT(st.date, '%H:%i') <= '23:59'
You only need 10 results, but the database will have to run the "names" subquery for each of the results before the LIMIT (which might be a lot?). Therefore, I would recommend to extract the subquery from the SELECT clause to a temporary table, index it and join to it (see fixed query below).
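Regarding the first comment, a two-query alternative might look like this (filters abbreviated to the ones shown in the question; both queries must use the same WHERE clause):
SELECT COUNT(*)
FROM students AS st
WHERE st.date BETWEEN '2018-01-17' AND DATE_ADD('2018-01-17', INTERVAL 1 DAY)
  AND st.active = 1;

SELECT st.id, st.date, st.image, st.user_id, st.status
FROM students AS st
WHERE st.date BETWEEN '2018-01-17' AND DATE_ADD('2018-01-17', INTERVAL 1 DAY)
  AND st.active = 1
ORDER BY st.date DESC
LIMIT 0, 10;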
To optimize the query, let's begin with adding these indexes:
ALTER TABLE `classes` ADD INDEX `classes_index_1` (`id`, `name`);
ALTER TABLE `students` ADD INDEX `students_index_1` (`active`, `user_id`, `class`, `name`, `date`);
ALTER TABLE `users` ADD INDEX `users_index_1` (`user_id`, `full_name`);
Now create the temporary table (originally this was a subquery in the SELECT clause) and index it:
-- Transformed subquery to a temp table to improve performance
CREATE TEMPORARY TABLE IF NOT EXISTS temp1 AS SELECT
count(*) AS names,
name
FROM
students AS inner_st
WHERE
1 = 1
GROUP BY
name
ORDER BY
NULL;
-- This index is required for optimal temp tables performance
ALTER TABLE `temp1` ADD INDEX `temp1_index_1` (`name`, `names`);
And the modified query:
SELECT
SQL_CALC_FOUND_ROWS st.id,
st.date,
st.image,
st.user_id,
st.status,
ck.name AS class_name,
users.full_name,
temp1.names
FROM
students AS st
LEFT JOIN
users
ON st.user_id = users.user_id
LEFT JOIN
classes AS ck
ON st.class = ck.id
LEFT JOIN
temp1
ON st.name = temp1.name
WHERE
st.date BETWEEN '2018-01-17' AND DATE_ADD('2018-01-17', INTERVAL 1 DAY)
AND st.active = 1
ORDER BY st.date DESC
LIMIT 0, 10
Give this a try first:
INDEX(active, date)
Is user_id the PK for users? Is class_id the PK for classes? If not, then they should be INDEXed.
Why are you testing the times separately?
Fix the test so it is obvious which table each column is in.
Do you really need LEFT JOIN? Or would JOIN suffice? In the latter case, there are more optimization options.
Give some realistic examples of other SELECTs; different index(es) may be needed.
Is the "first" page slow? Or only later pages? See this for pagination optimization -- by not using OFFSET.

Query performance issue

I am working with MySQL, and with the query below I am seeing a performance issue:
SELECT COUNT(*)
FROM
(SELECT company.ID
FROM `company`
INNER JOIN `featured_company` ON (company.ID=featured_company.COMPANY_ID)
INNER JOIN `company_portal` ON (company.ID=company_portal.COMPANY_ID)
INNER JOIN `job` ON company.ID = job.COMPANY_ID
WHERE featured_company.DATE_START<='2016-09-21'
AND featured_company.DATE_END>='2016-09-21'
AND featured_company.PORTAL_ID=16
AND company_portal.PORTAL_ID=16
AND (company.IMAGE IS NOT NULL
AND company.IMAGE<>'')
AND job.IS_ACTIVE=1
AND job.IS_DELETED=0
AND job.EXPIRATION_DATE >= '2016-09-21'
AND job.ACTIVATION_DATE <= '2016-09-21'
GROUP BY company.ID) AS t
With this query I am getting the New Relic log below (query analysis:
Table - Hint):
featured_company
- The table was retrieved with this index: portal_date_start_end
- A temporary table was created to access this part of the query, which can cause poor performance. This typically happens if the query contains GROUP BY and ORDER BY clauses that list columns differently.
- MySQL had to do an extra pass to retrieve the rows in sorted order, which is a cause of poor performance but sometimes unavoidable.
- You can speed up this query by querying only fields that are within the index. Or you can create an index that includes every field in your query, including the primary key.
Approximately 89 rows of this table were scanned.
company_portal
- The table was retrieved with this index: PRIMARY
- Approximately 1 row of this table was scanned.
job
- The table was retrieved with this index: company_expiration_date
- You can speed up this query by querying only fields that are within the index. Or you can create an index that includes every field in your query, including the primary key.
- Approximately 37 rows of this table were scanned.
company
- The table was retrieved with this index: PRIMARY
- You can speed up this query by querying only fields that are within the index. Or you can create an index that includes every field in your query, including the primary key.
- Approximately 1 row of this table was scanned.
I have no idea what more I can do to optimize this query. Please share any ideas you have.
Be sure you have proper indexes on:
featured_company.DATE_START
featured_company.PORTAL_ID
job.IS_ACTIVE
job.IS_DELETED
job.EXPIRATION_DATE
job.ACTIVATION_DATE
and possibly
company.IMAGE
Assuming that the IDs are already indexed:
company.ID
featured_company.COMPANY_ID
job.COMPANY_ID
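Going a step further, composite indexes that match the WHERE clauses may serve this query better than single-column ones. A sketch (index names are illustrative, and column order matters):
ALTER TABLE featured_company
  ADD INDEX fc_portal_dates (PORTAL_ID, DATE_START, DATE_END);
ALTER TABLE job
  ADD INDEX job_company_active (COMPANY_ID, IS_ACTIVE, IS_DELETED, ACTIVATION_DATE);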
And a suggestion: based on the fact that you don't use any aggregation function, don't use GROUP BY; use DISTINCT instead:
SELECT COUNT(*) FROM (
SELECT DISTINCT company.ID
FROM `company`
INNER JOIN `featured_company` ON company.ID=featured_company.COMPANY_ID
INNER JOIN `company_portal` ON company.ID=company_portal.COMPANY_ID
INNER JOIN `job` ON company.ID = job.COMPANY_ID
WHERE featured_company.DATE_START<='2016-09-21'
AND featured_company.DATE_END>='2016-09-21'
AND featured_company.PORTAL_ID=16
AND company_portal.PORTAL_ID=16
AND (company.IMAGE IS NOT NULL AND company.IMAGE<>'')
AND job.IS_ACTIVE=1
AND job.IS_DELETED=0
AND job.EXPIRATION_DATE >= '2016-09-21'
AND job.ACTIVATION_DATE <= '2016-09-21'
) AS t

Displaying a large amount of data in paging table without heavily impacting DB

The current implementation is a single complex query with multiple joins and temporary tables, but it is putting too much stress on my MySQL server and is taking upwards of 30 seconds to load the table. The data is retrieved by PHP via a JavaScript Ajax call and displayed on a webpage. Here are the tables involved:
Table: table_companies
Columns: company_id, ...
Table: table_manufacture_line
Columns: line_id, line_name, ...
Table: table_product_stereo
Columns: product_id, line_id, company_id, assembly_datetime, serial_number, ...
Table: table_product_television
Columns: product_id, line_id, company_id, assembly_datetime, serial_number, warranty_expiry, ...
A single company can have 100k+ items split between the two product tables. The product tables are unioned and filtered by the line_name, then ordered by assembly_datetime and limited depending on the paging. The datetime value is also reliant on timezone and this is applied as part of the query (another JOIN + temp table). line_name is also one of the returned columns.
I was thinking of splitting the line_name filter out from the product union query. Essentially I'd determine the ids of the lines that correspond to the filter, then do a UNION query with a WHERE condition WHERE line_id IN (<results from previous query>). This would cut out the need for joins and temp tables, and I can apply the line_name to line_id and timezone modification in PHP, but I'm not sure this is the best way to go about things.
I have also looked at potentially using Redis, but the large number of individual products is leading to a similarly long wait time when pushing all of the data to Redis via PHP (20-30 seconds), even if it is just pulled in directly from the product tables.
Is it possible to tweak the existing queries to increase the efficiency?
Can I push some of the handling to PHP to decrease the load on the SQL server? What about Redis?
Is there a way to architect the tables better?
What other solution(s) would you suggest?
I appreciate any input you can provide.
Edit:
Existing query:
SELECT line_name,CONVERT_TZ(datetime,'UTC',timezone) datetime,... FROM (SELECT line_name,datetime,... FROM ((SELECT line_id,assembly_datetime datetime,... FROM table_product_stereos WHERE company_id=# ) UNION (SELECT line_id,assembly_datetime datetime,... FROM table_product_televisions WHERE company_id=# )) AS union_products INNER JOIN table_manufacture_line USING (line_id)) AS products INNER JOIN (SELECT timezone FROM table_companies WHERE company_id=# ) AS tz ORDER BY datetime DESC LIMIT 0,100
Here it is formatted for some readability.
SELECT line_name,CONVERT_TZ(datetime,'UTC',tz.timezone) datetime,...
FROM (SELECT line_name,datetime,...
FROM (SELECT line_id,assembly_datetime datetime,...
FROM table_product_stereos WHERE company_id=#
UNION
SELECT line_id,assembly_datetime datetime,...
FROM table_product_televisions
WHERE company_id=#
) AS union_products
INNER JOIN table_manufacture_line USING (line_id)
) AS products
INNER JOIN (SELECT timezone
FROM table_companies
WHERE company_id=#
) AS tz
ORDER BY datetime DESC LIMIT 0,100
IDs are indexed; primary keys are the first column of each index.
Let's build this query up from its component parts to see what we can optimize.
Observation: you're fetching the 100 most recent rows from the union of two large product tables.
So, let's start by trying to optimize the subqueries fetching stuff from the product tables. Here is one of them.
SELECT line_id,assembly_datetime datetime,...
FROM table_product_stereos
WHERE company_id=#
But look, you only need the 100 newest entries here. So, let's add
ORDER BY assembly_datetime DESC
LIMIT 100
to this query. Also, you should put a compound index on this table as follows. This will allow both the WHERE and ORDER BY lookups to be satisfied by the index.
CREATE INDEX id_date ON table_product_stereos (company_id, assembly_datetime)
All the same considerations apply to the query from table_product_televisions. Order it by the time, limit it to 100, and index it.
If you need to apply other selection criteria, you can put them in these inner queries. For example, in a comment you mentioned a selection based on a substring search. You could do this as follows
SELECT t.line_id,t.assembly_datetime datetime,...
FROM table_product_stereos AS t
JOIN table_manufacture_line AS m ON m.line_id = t.line_id
AND m.line_name LIKE '%test'
WHERE company_id=#
ORDER BY assembly_datetime DESC
LIMIT 100
Next, you are using UNION to combine those two query result sets into one. UNION has the function of eliminating duplicates, which is time-consuming. (You know you don't have duplicates, but MySQL doesn't.) Use UNION ALL instead.
Putting this all together, the innermost sub query becomes this. We have to wrap up the subqueries because SQL is confused by UNION and ORDER BY clauses at the same query level.
SELECT * FROM (
SELECT line_id,assembly_datetime datetime,...
FROM table_product_stereos
WHERE company_id=#
ORDER BY assembly_datetime DESC
LIMIT 100
) AS st
UNION ALL
SELECT * FROM (
SELECT line_id,assembly_datetime datetime,...
FROM table_product_televisions
WHERE company_id=#
ORDER BY assembly_datetime DESC
LIMIT 100
) AS tv
That gets you 200 rows. It should get those rows fairly quickly.
200 rows are guaranteed to be enough to give you the 100 most recent items later on after you do your outer ORDER BY ... LIMIT operation. But that operation only has to crunch 200 rows, not 100K+, so it will be far faster.
Finally wrap up this query in your outer query material. Join the table_manufacture_line information, and fix up the timezone.
If you do the indexing and the ORDER BY ... LIMIT operation earlier, this query should become very fast.
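Assembled, the whole thing might look like this (column lists abbreviated as in the question, and # standing for the company id):
SELECT ml.line_name,
       CONVERT_TZ(u.datetime,'UTC',tz.timezone) AS datetime
FROM (
    SELECT * FROM (
        SELECT line_id, assembly_datetime AS datetime
        FROM table_product_stereos
        WHERE company_id = #
        ORDER BY assembly_datetime DESC
        LIMIT 100
    ) AS st
    UNION ALL
    SELECT * FROM (
        SELECT line_id, assembly_datetime AS datetime
        FROM table_product_televisions
        WHERE company_id = #
        ORDER BY assembly_datetime DESC
        LIMIT 100
    ) AS tv
) AS u
INNER JOIN table_manufacture_line AS ml USING (line_id)
INNER JOIN (SELECT timezone FROM table_companies WHERE company_id = #) AS tz
ORDER BY datetime DESC
LIMIT 0, 100;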
The comment dialog in your question indicates to me that you may have multiple product types, not just two, and that you have complex selection criteria for your paged display. Using UNION ALL on large numbers of rows slams performance: it converts multiple indexed tables into an internal list of rows that simply can't be searched efficiently.
You really should consider putting your two kinds of product data in a single table instead of having to UNION ALL multiple product tables. The setup you have now is inflexible and won't scale up easily. If you structure your schema with a master product table and perhaps some attribute tables for product-specific information, you will find yourself much happier two years from now. Seriously. Please consider making the change.
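A sketch of that restructuring, with illustrative names and types (the television-specific warranty_expiry column moves to an attribute table):
CREATE TABLE table_product (
    product_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    product_type ENUM('stereo','television') NOT NULL,
    line_id INT UNSIGNED NOT NULL,
    company_id INT UNSIGNED NOT NULL,
    assembly_datetime DATETIME NOT NULL,
    serial_number VARCHAR(64) NOT NULL,
    KEY idx_company_date (company_id, assembly_datetime)
);

CREATE TABLE table_product_television_attr (
    product_id INT UNSIGNED NOT NULL PRIMARY KEY,
    warranty_expiry DATE NOT NULL
);
Paging then becomes a single indexed query over table_product, with no UNION at all.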
Remember: index fast, data slow. Use joins over nested queries. Nested queries return all of the data fields, whereas joins just consider the filters (which should all be indexed; make sure there's a unique index on table_product_*.line_id). It's been a while, but I'm pretty sure you can join ON company_id=#, which should cut down the results early on.
In this case, all of the results refer to the same company (or a much smaller subset) so it makes sense to run that query separately (and it makes the query more maintainable).
So your data source would be:
(table_product_stereos as prod
INNER JOIN table_manufacture_line AS ml ON prod.line_id = ml.line_id and prod.company_id=#
UNION
table_product_televisions as prod
INNER JOIN table_manufacture_line as ml on prod.line_id = ml.line_id and prod.company_id=#)
From which you can select prod. or ml. fields as required.
PHP is not a solution at all...
Redis can be a solution.
But the main thing I would change is the index creation for the tables (add the missing indexes)... If you're running into temp tables, you didn't index the tables well. And 100k rows is not much at all.
But I can't help you without the table creation statements as well as the queries you run.
Make sure your "where part" is covered by your B-tree index from left to right.

How to improve the performance of MYSQL query with large data?

I am using MySQL tables that have the following data:
users(ID, name, email, create_added) (about 10000 rows)
points(user_id, point) (about 15000 rows)
And my query:
SELECT u.*, SUM(p.point) point
FROM users u
LEFT JOIN points p ON p.user_id = u.ID
WHERE u.id > 0
GROUP BY u.id
ORDER BY point DESC
LIMIT 0, 10
I only need the top 10 users with the best point totals, but the query dies. How can I improve its performance?
Like #Grim said, you can use an INNER JOIN instead of a LEFT JOIN. However, if you are truly looking for optimization, I would suggest adding an extra field to the users table with a precalculated point total. This would beat any query optimization with your current database design.
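A sketch of the precalculated approach (the column and trigger names are illustrative assumptions):
-- add the denormalized column and backfill it once
ALTER TABLE users ADD COLUMN total_points INT NOT NULL DEFAULT 0;

UPDATE users u
LEFT JOIN (
    SELECT user_id, SUM(point) AS total
    FROM points
    GROUP BY user_id
) p ON p.user_id = u.ID
SET u.total_points = COALESCE(p.total, 0);

-- keep it current as new points arrive
CREATE TRIGGER points_after_insert
AFTER INSERT ON points
FOR EACH ROW
    UPDATE users
    SET total_points = total_points + NEW.point
    WHERE ID = NEW.user_id;
With an index on total_points, the top 10 becomes a simple ORDER BY total_points DESC LIMIT 10. Updates and deletes on points would need similar triggers.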
Swapping the LEFT JOIN for an INNER JOIN would help a lot. Make sure points.point and points.user_id are indexed. I assume you can get rid of the WHERE clause, as u.id will always be more than 0 (although MySQL probably does this for you at the query optimisation stage).
It doesn't really matter that you are getting only 10 rows. MySQL has to sum up the points for every user before it can sort them (a "Using filesort" operation). The LIMIT is applied last.
A covering index ON points(user_id,point) is going to be the best bet for optimum performance. (I'm really just guessing, without any EXPLAIN output or table definitions.)
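That index could be created like this (the name is illustrative):
CREATE INDEX points_user_point ON points (user_id, point);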
The id column in users is likely the primary key, or at least a unique index. So it's likely you already have an index with id as the leading column, or primary key cluster index if it's InnoDB.)
I'd be tempted to test a query like this:
SELECT u.*
, s.total_points
FROM ( SELECT p.user_id
, SUM(p.point) AS total_points
FROM points p
WHERE p.user_id > 0
GROUP BY p.user_id
ORDER BY total_points DESC
LIMIT 10
) s
JOIN users u
ON u.id = s.user_id
ORDER BY s.total_points DESC
That does have the overhead of creating a derived table, but with a suitable index on points, with a leading column of user_id and including the point column, it's likely that MySQL can optimize the GROUP BY using the index and avoid one "Using filesort" operation (for the GROUP BY).
There will likely be a "Using filesort" operation on that resultset, to get the rows ordered by total_points. Then get the first 10 rows from that.
With those 10 rows, we can join to the user table to get the corresponding rows.
BUT.. there is one slight difference with this result: if any of the user_id values in the top 10 aren't in the users table, this query will return fewer than 10 rows. (I'd expect there to be a foreign key defined, so that wouldn't happen, but I'm really just guessing without table definitions.)
An EXPLAIN would show the access plan being used by MySQL.
Have you ever thought about partitioning?
I'm currently working with a large database and have successfully improved my queries this way.
For example:
ALTER TABLE users  -- assuming the users table from the question
PARTITION BY RANGE (`ID`) (
PARTITION p1 VALUES LESS THAN (100) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (200) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (300) ENGINE = InnoDB,
... and so on..
);
It allows us to get better speed when scanning the table: MySQL will scan only partition p1, which contains IDs 1 to 99, even if there are millions of rows in the table.
Check out this http://dev.mysql.com/doc/refman/5.5/en/partitioning.html
