I am working with MySQL, and with the query below I am getting a performance issue:
SELECT COUNT(*)
FROM
(SELECT company.ID
FROM `company`
INNER JOIN `featured_company` ON (company.ID=featured_company.COMPANY_ID)
INNER JOIN `company_portal` ON (company.ID=company_portal.COMPANY_ID)
INNER JOIN `job` ON company.ID = job.COMPANY_ID
WHERE featured_company.DATE_START<='2016-09-21'
AND featured_company.DATE_END>='2016-09-21'
AND featured_company.PORTAL_ID=16
AND company_portal.PORTAL_ID=16
AND (company.IMAGE IS NOT NULL
AND company.IMAGE<>'')
AND job.IS_ACTIVE=1
AND job.IS_DELETED=0
AND job.EXPIRATION_DATE >= '2016-09-21'
AND job.ACTIVATION_DATE <= '2016-09-21'
GROUP BY company.ID) AS t
With this query I am getting the New Relic log below (Query analysis:
Table - Hint):
featured_company
- The table was retrieved with this index: portal_date_start_end
- A temporary table was created to access this part of the query, which can cause poor performance. This typically happens if the query contains GROUP BY and ORDER BY clauses that list columns differently.
- MySQL had to do an extra pass to retrieve the rows in sorted order, which is a cause of poor performance but sometimes unavoidable.
- You can speed up this query by querying only fields that are within the index. Or you can create an index that includes every field in your query, including the primary key.
Approximately 89 rows of this table were scanned.
company_portal
- The table was retrieved with this index: PRIMARY
- Approximately 1 row of this table was scanned.
job
- The table was retrieved with this index: company_expiration_date
- You can speed up this query by querying only fields that are within the index. Or you can create an index that includes every field in your query, including the primary key.
- Approximately 37 rows of this table were scanned.
company
- The table was retrieved with this index: PRIMARY
- You can speed up this query by querying only fields that are within the index. Or you can create an index that includes every field in your query, including the primary key.
- Approximately 1 row of this table was scanned.
I have no idea what more I can do to optimize this query; please share any ideas you have.
Be sure you have proper indexes on:
featured_company.DATE_START
featured_company.PORTAL_ID
job.IS_ACTIVE
job.IS_DELETED
job.EXPIRATION_DATE
job.ACTIVATION_DATE
and possibly
company.IMAGE
This assumes that the IDs are already indexed:
company.ID
featured_company.COMPANY_ID
job.COMPANY_ID
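For example, one way to add them, combining related columns into composite indexes (the index names and column order are assumptions based on the WHERE clause, not tested against your data):
ALTER TABLE featured_company ADD INDEX idx_fc_portal_dates (PORTAL_ID, DATE_START, DATE_END, COMPANY_ID);
ALTER TABLE job ADD INDEX idx_job_company_flags (COMPANY_ID, IS_ACTIVE, IS_DELETED, EXPIRATION_DATE);
ALTER TABLE company_portal ADD INDEX idx_cp_portal_company (PORTAL_ID, COMPANY_ID);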
And a suggestion: since you don't use an aggregate function, don't use GROUP BY; use DISTINCT instead:
SELECT COUNT(*) FROM (
SELECT DISTINCT company.ID
FROM `company`
INNER JOIN `featured_company` ON company.ID=featured_company.COMPANY_ID
INNER JOIN `company_portal` ON company.ID=company_portal.COMPANY_ID
INNER JOIN `job` ON company.ID = job.COMPANY_ID
WHERE featured_company.DATE_START<='2016-09-21'
AND featured_company.DATE_END>='2016-09-21'
AND featured_company.PORTAL_ID=16
AND company_portal.PORTAL_ID=16
AND (company.IMAGE IS NOT NULL AND company.IMAGE<>'')
AND job.IS_ACTIVE=1
AND job.IS_DELETED=0
AND job.EXPIRATION_DATE >= '2016-09-21'
AND job.ACTIVATION_DATE <= '2016-09-21'
) AS t
I have 3 tables in a MySQL database: courses, users and participants, which contain about 30 million, 30k and 3k entries respectively.
My goal is to (efficiently) figure out the number of users that have been assigned to courses matching our criteria. The criteria are a little more complex, but for this example we only care about users where deleted_at is null and courses where deleted_at is null and active is 1.
Simplified, these are the columns and sample rows:
users:
id | deleted_at
1  | null
2  | 2022-01-01

courses:
id | active | deleted_at
1  | 1      | null
1  | 1      | 2020-01-01
2  | 0      | 2020-01-01

participants:
id | participant_id | course_id
1  | 1              | 1
2  | 1              | 2
3  | 2              | 2
Based on the data above, the number we would get would be 1, as only user 1 is not deleted, and that user is assigned to some course (id 1) that is active and not deleted.
Here is a list of what I've tried.
Joining all the tables and doing simple WHEREs.
Joining using subqueries.
Pulling the correct courses and users out to the application layer (PHP), and querying participants using WHERE IN.
Pulling everything out and doing the filtering in the application layer.
Using EXPLAIN to add better indexes - I, admittedly, do not do this often and may not have done this well enough.
A combination of all the above.
An example of a query would be:
SELECT COUNT(DISTINCT participant_id)
FROM `participants`
INNER JOIN
(SELECT `courses`.`id`
FROM `courses`
WHERE (`active` = '1')
AND `deleted_at` IS NULL) AS `tempCourses` ON `tempCourses`.`id` = `participants`.`course_id`
WHERE `participant_type` = 'Eloomi\\Models\\User'
AND `participant_id` in
(SELECT `users`.`id`
FROM `users`
WHERE `users`.`deleted_at` IS NULL)
From what I can gather, doing this will create a massive intermediate table, and only then will the WHEREs start being applied. In my mind it should be possible to short-circuit a lot of that, because once we get a match for a user we can disregard that user going forward. That would be how I would handle it in the application layer.
We could do this on a per-user basis in the application layer, but the number of requests to the database would make this a bad solution.
I have tagged it as PHP as well as MySQL, not because it has to be PHP but because I do not mind offloading some parts to the application layer if that is required. It's my experience that joins do not always use indexes optimally.
Edit:
To specify my question: can someone help me find an efficient way to pull out the number of non-deleted users that have been assigned to active, non-deleted courses?
I would write it this way:
SELECT COUNT(DISTINCT p.participant_id)
FROM courses AS c
INNER JOIN participants AS p
ON c.id = p.course_id
INNER JOIN users AS u
ON p.participant_id = u.id
WHERE u.deleted_at IS NULL
AND c.active = 1 AND c.deleted_at IS NULL
AND p.participant_type = 'Eloomi\\Models\\User';
MySQL may join the tables in another order, not the order you list the tables in the query.
I hope that courses is the first table MySQL accesses, because it's probably the smallest table. Especially after filtering by active and deleted_at. The following index will help to narrow down that filtering, so only matching rows are examined:
ALTER TABLE courses ADD KEY (active, deleted_at);
Every index implicitly has the table's primary key (e.g. id) appended as the last column. Because that column is part of the index, it can be used in the join to participants. So you need an index on participants that the join can use to find the corresponding rows in that table. The order of columns in the index is important.
ALTER TABLE participants ADD KEY (course_id, participant_type, participant_id);
The participant_id is used to join to the users table. MySQL's optimizer will probably prefer to join to users by its primary key, but you also want to restrict that by deleted_at, so you might need this index:
ALTER TABLE users ADD KEY id_deleted_at (id, deleted_at);
And you might need to use an index hint to coax the optimizer to prefer this secondary index over the primary key index.
SELECT COUNT(DISTINCT p.participant_id)
FROM courses AS c
INNER JOIN participants AS p
ON c.id = p.course_id
INNER JOIN users AS u USE INDEX(id_deleted_at)
ON p.participant_id = u.id
WHERE u.deleted_at IS NULL
AND c.active = 1 AND c.deleted_at IS NULL
AND p.participant_type = 'Eloomi\\Models\\User';
MySQL knows how to use compound indexes even if some conditions are in join clauses and other conditions are in the WHERE clause.
Caveat: I have not tested this. Choosing indexes may take several tries, checking the EXPLAIN after each try.
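For example, you can check the plan after each index change by wrapping the query above in EXPLAIN (the key and rows columns of the output show which index was chosen and how many rows were examined):
EXPLAIN
SELECT COUNT(DISTINCT p.participant_id)
FROM courses AS c
INNER JOIN participants AS p
  ON c.id = p.course_id
INNER JOIN users AS u
  ON p.participant_id = u.id
WHERE u.deleted_at IS NULL
  AND c.active = 1 AND c.deleted_at IS NULL
  AND p.participant_type = 'Eloomi\\Models\\User';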
Hi, I have pagination in AngularJS in my app; I send the data to my big query, which includes the filters that the user set.
UPDATE:
SQL_CALC_FOUND_ROWS is my problem. How do I count the rows for specific filters? It takes 2 seconds for 100,000 rows, and I need that number as the total for the pagination.
UPDATE:
I have the following inner query that I missed here:
(select count(*) from students as inner_st where st.name = inner_st.name) as names,
When I remove the above inner query, it is much faster.
rows: 50,000
Users table: 4 rows
Classes table: 4 rows
indexes: only id as primary key
query time: 20-40 seconds
table: students
columns: id, date, class, name, image, status, user_id, active
table: users
columns: id, full_name, is_admin
query
SELECT SQL_CALC_FOUND_ROWS st.id,
st.date,
st.image,
st.user_id,
st.status,
st.name,
ck.name AS class_name,
users.full_name,
(select count(*) from students AS inner_st where st.name = inner_st.name) AS names
FROM students AS st
LEFT JOIN users ON st.user_id = users.user_id
LEFT JOIN classes AS ck ON st.class = ck.id
WHERE date BETWEEN '2018-01-17' AND DATE_ADD('2018-01-17', INTERVAL 1 DAY)
AND DATE_FORMAT(date,'%H:%i') >= '00:00'
AND DATE_FORMAT(date,'%H:%i') <= '23:59'
AND st.active=1
-- here I can concat filters from web like "and class= 1"
ORDER BY st.date DESC
LIMIT 0, 10
How can I make it faster? When I delete the ORDER BY and SQL_CALC_FOUND_ROWS it is faster, but I need them.
I have heard about indexes, but only the primary key is indexed at the moment.
A few comments before recommending a different approach to this query:
Did you consider removing SQL_CALC_FOUND_ROWS and instead running two queries (one that counts and one that selects the data; see the sketch after these comments)? In some cases this might be quicker than combining them both into one query.
What is the goal of these conditions? What are you trying to achieve? Can we remove them (as it seems they might always return true?) - AND DATE_FORMAT(st.date, '%H:%i') >= '00:00' AND DATE_FORMAT(st.date, '%H:%i') <= '23:59'
You only need 10 results, but the database will have to run the "names" subquery for each of the results before the LIMIT (which might be a lot?). Therefore, I would recommend to extract the subquery from the SELECT clause to a temporary table, index it and join to it (see fixed query below).
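For the first comment, a rough sketch of the two-query approach (filters abbreviated to the date and active conditions; untested):
-- Query 1: total row count for the pagination widget
SELECT COUNT(*)
FROM students AS st
WHERE st.date BETWEEN '2018-01-17' AND DATE_ADD('2018-01-17', INTERVAL 1 DAY)
  AND st.active = 1;

-- Query 2: the current page, without SQL_CALC_FOUND_ROWS
SELECT st.id, st.date, st.image, st.user_id, st.status
FROM students AS st
WHERE st.date BETWEEN '2018-01-17' AND DATE_ADD('2018-01-17', INTERVAL 1 DAY)
  AND st.active = 1
ORDER BY st.date DESC
LIMIT 0, 10;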
To optimize the query, let's begin with adding these indexes:
ALTER TABLE `classes` ADD INDEX `classes_index_1` (`id`, `name`);
ALTER TABLE `students` ADD INDEX `students_index_1` (`active`, `user_id`, `class`, `name`, `date`);
ALTER TABLE `users` ADD INDEX `users_index_1` (`user_id`, `full_name`);
Now create the temporary table (originally this was a subquery in the SELECT clause) and index it:
-- Transformed subquery to a temp table to improve performance
CREATE TEMPORARY TABLE IF NOT EXISTS temp1 AS
SELECT
    count(*) AS names,
    name
FROM students AS inner_st
WHERE 1 = 1
GROUP BY name
ORDER BY NULL;
-- This index is required for optimal temp table performance
ALTER TABLE `temp1` ADD INDEX `temp1_index_1` (`name`, `names`);
And the modified query:
SELECT
SQL_CALC_FOUND_ROWS st.id,
st.date,
st.image,
st.user_id,
st.status,
ck.name AS class_name,
users.full_name,
temp1.names
FROM
students AS st
LEFT JOIN
users
ON st.user_id = users.user_id
LEFT JOIN
classes AS ck
ON st.class = ck.id
LEFT JOIN
temp1
ON st.name = temp1.name
WHERE
st.date BETWEEN '2018-01-17' AND DATE_ADD('2018-01-17', INTERVAL 1 DAY)
AND st.active = 1
ORDER BY
st.date DESC
LIMIT 0, 10
Give this a try first:
INDEX(active, date)
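Spelled out as a statement (the index name is a placeholder):
ALTER TABLE students ADD INDEX idx_active_date (active, date);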
Is user_id the PK for users? Is class_id the PK for classes? If not, then they should be INDEXed.
Why are you testing the times separately?
Fix the query text so it is obvious which table each column is in.
Do you really need LEFT JOIN? Or would JOIN suffice? In the latter case, there are more optimization options.
Give some realistic examples of other SELECTs; different index(es) may be needed.
Is the "first" page slow? Or only later pages? See this for pagination optimization -- by not using OFFSET.
I need to select some data from a MySQL DB using PHP. It can be done within one single MySQL query, which takes 5 minutes to run on a good server (multiple JOINs on tables with more than 10 million rows).
I was wondering if it is better practice to split the query up in PHP and use some loops, rather than doing it all in MySQL. Also, would it be better to query all the emails from one table with 150,000 rows into an array and then check that array, instead of doing thousands of MySQL SELECTs?
Here is the Query:
SELECT count(contacted_emails.id), contacted_emails.email
FROM contacted_emails
LEFT OUTER JOIN blacklist ON contacted_emails.email = blacklist.email
LEFT OUTER JOIN submission_authors ON contacted_emails.email = submission_authors.email
LEFT OUTER JOIN users ON contacted_emails.email = users.email
GROUP BY contacted_emails.email
HAVING count(contacted_emails.id) > 3
The indexes in the 4 tables are:
contacted_emails: id, blacklist_section_id, journal_id and email
blacklist: id, email and name
submission_authors: id, hash_key and email
users: id, email, firstname, lastname, editor_id, title_id, country_id, workplace_id, jobtype_id
The table contacted_emails is created like:
CREATE TABLE contacted_emails (
id int(10) unsigned NOT NULL AUTO_INCREMENT,
email varchar(150) COLLATE utf8_unicode_ci NOT NULL,
contacted_at datetime NOT NULL,
created_at datetime NOT NULL,
blacklist_section_id int(11) unsigned NOT NULL,
journal_id int(10) DEFAULT NULL,
PRIMARY KEY (id),
KEY blacklist_section_id (blacklist_section_id),
KEY journal_id (journal_id),
KEY email (email) )
ENGINE=InnoDB AUTO_INCREMENT=4491706 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Your indexes look fine.
The performance problems seem to come from the fact that you're JOINing all rows, then filtering using HAVING.
This would probably work better instead:
SELECT *
FROM (
SELECT email, COUNT(id) AS number_of_contacts
FROM contacted_emails
GROUP BY email
HAVING COUNT(id) > 3
) AS ce
LEFT OUTER JOIN blacklist AS bl ON ce.email = bl.email
LEFT OUTER JOIN submission_authors AS sa ON ce.email = sa.email
LEFT OUTER JOIN users AS u ON ce.email = u.email
/* EDIT: Exclude-join clause added based on comments below */
WHERE bl.email IS NULL
AND sa.email IS NULL
AND u.email IS NULL
Here you're limiting your initial GROUPed data set before the JOINs, which is significantly more optimal.
Although, given the context of your original query, the LEFT OUTER JOINed tables don't seem to be used at all, so the query below would probably return the exact same results with even less overhead:
SELECT email, COUNT(id) AS number_of_contacts
FROM contacted_emails
GROUP BY email
HAVING count(id) > 3
What exactly is the point of those JOINed tables? The LEFT JOIN prevents them from reducing the data at all, and you're only looking at the aggregate data from contacted_emails. Did you mean to use INNER JOIN instead?
EDIT: You mentioned that the point of the joins is to exclude emails in your existing tables. I modified my first query to do a proper exclude join (this was a bug in your originally posted code).
Here's another possible option that may perform well for you:
SELECT contacted_emails.email, COUNT(contacted_emails.id) AS number_of_contacts
FROM contacted_emails
LEFT JOIN (
SELECT email FROM blacklist
UNION ALL SELECT email FROM submission_authors
UNION ALL SELECT email FROM users
) AS existing ON contacted_emails.email = existing.email
WHERE existing.email IS NULL
GROUP BY contacted_emails.email
HAVING COUNT(id) > 3
What I'm doing here is gathering the existing emails in a subquery and doing a single exclude join on that derived table.
Another way you may try to express this is as a non-correlated subquery in the WHERE clause:
SELECT email, COUNT(id) AS number_of_contacts
FROM contacted_emails
WHERE email NOT IN (
SELECT email FROM blacklist
UNION ALL SELECT email FROM submission_authors
UNION ALL SELECT email FROM users
)
GROUP BY email
HAVING COUNT(id) > 3
Try them all and see which gives the best execution plan in MySQL.
A couple of thoughts. In terms of the query, you may find it faster if you use
COUNT(*) AS row_count
and change the HAVING to
row_count > 3
as this can be satisfied from the contacted_emails.email index without having to access the row to get contacted_emails.id. As both fields are NOT NULL and contacted_emails is the base table, this should be the same logic.
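Applied to the standalone aggregate above, that suggestion reads:
SELECT email, COUNT(*) AS row_count
FROM contacted_emails
GROUP BY email
HAVING row_count > 3;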
As this query will only get slower as you collect more data, I would suggest a summary table where you store the counts (possibly per some time unit). This can either be updated periodically with a cron job or on the fly with triggers and/or application logic.
If you use the per-time-unit option on created_at and/or store the time of the last cron update, you should be able to get live results by pulling in and appending the latest data.
Any cache solution would have to be refreshed anyway to stay live, with the full query run every time the data is cleared/updated.
As suggested in the comments, the database is built for aggregating large amounts of data; PHP isn't.
You would probably be best served by a summary table that is updated via a trigger on every insert into your contacted_emails table. This summary table should have the email address and a count column. On every insert into the contacted table, update the count. Have an index on the count column in the summary table. Then you can query directly from THAT, get the email accounts in question, and THEN join to pull whatever other details are needed.
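A minimal sketch of such a summary table and trigger (the table, column, and trigger names here are hypothetical):
-- Hypothetical summary table: one row per email with a running count
CREATE TABLE email_contact_counts (
  email varchar(150) COLLATE utf8_unicode_ci NOT NULL PRIMARY KEY,
  contact_count int unsigned NOT NULL DEFAULT 0,
  KEY contact_count (contact_count)
);

-- Keep the count current on every insert into contacted_emails
CREATE TRIGGER contacted_emails_after_insert
AFTER INSERT ON contacted_emails
FOR EACH ROW
  INSERT INTO email_contact_counts (email, contact_count)
  VALUES (NEW.email, 1)
  ON DUPLICATE KEY UPDATE contact_count = contact_count + 1;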
Following your recommendations, I chose this solution:
SELECT ce.email, ce.number_of_contacts
FROM (
SELECT email, COUNT(id) AS number_of_contacts
FROM contacted_emails
GROUP BY email
HAVING number_of_contacts > 3
) AS ce
NATURAL LEFT JOIN blacklist AS bl
NATURAL LEFT JOIN submission_authors AS sa
NATURAL LEFT JOIN users AS u
WHERE bl.email IS NULL AND sa.email IS NULL AND u.email IS NULL
This takes 10 seconds to run, which is fine for the moment. Once I have more data in the database, I will need to think about another solution where I create a temporary table.
So, to conclude: loading an entire table into a PHP array does not perform as well as making MySQL queries.
I have written a query to fetch details from table1, which has this condition clause:
IN (number1, number2, ...)
with up to 323 entries so far. These numbers are primary keys of table1 that were extracted from table2 and passed into the IN clause.
Because of this my query slows down and takes 13 seconds to run. Is there any other way to overcome this? If I pass constant values (like PK ids) instead, the query runs in the usual time.
You can also do it using LEFT JOIN:
For example:
SELECT T1.*
FROM Table1 T1 LEFT JOIN
Table2 T2 ON T1.numberfield = T2.numberfield
WHERE T2.someotherfield IS NOT NULL
This does essentially the job of the query with IN (though note it can return duplicate rows from Table1 if Table2 contains multiple matching rows).
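If exact IN semantics matter, a semi-join with EXISTS avoids those duplicates (a sketch using the same hypothetical column names):
SELECT T1.*
FROM Table1 T1
WHERE EXISTS
  (SELECT 1
   FROM Table2 T2
   WHERE T2.numberfield = T1.numberfield);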
Try the query below:
select a.* from table1 a join table2 b
on a.parent_id=b.id;
Note: parent_id should be indexed in table1; assuming id is the primary key of table2, it is already indexed.
I have the following query in MySQL:
SELECT t.ID
FROM forum_categories c, forum_threads t
INNER JOIN forum_posts p ON p.ID = t.Last_post
WHERE t.ForumID=36 OR (c.Parent=36 AND t.ForumID=c.ID)
ORDER BY t.Last_post DESC LIMIT 1
The table forum_threads looks like this:
ID --- Title --- ForumID -- Last_post (ID of Last forum Post)
And the table forum_posts like this:
ID --- Content -- Author
And lastly the table forum_categories like this:
ID -- Name --- Parent (another forum_category)
(all simplified)
The table forum_posts currently contains ~200,000 rows and the table forum_threads ~5,000 rows.
Somehow this query sometimes takes about 1-2 seconds.
I already indexed "Last_post", but it doesn't help.
The "Copying to tmp table" duration makes ~ 99% of the whole execution time of this query
I also increased the tmp_table_size and the sort_buffer_size but it still makes no difference.
Any ideas?
The query should perform much better written like this:
select t.id
from forum_threads t
inner join forum_posts p ON p.ID = t.Last_post
inner join forum_categories c on t.ForumID=c.ID
WHERE t.ForumID=36 OR c.Parent=36
ORDER BY t.Last_post DESC
LIMIT 1
For a small set of data this will look very nice and the query time will be really good.
So the next thing is how to improve it for a large set of data, and the answer is: INDEX.
There are 2 joins happening, and there is a WHERE clause as well, so you will need to index the tables properly to avoid full table scans.
You can run the following commands to see the current indexes on the tables:
show indexes from forum_threads;
show indexes from forum_posts;
show indexes from forum_categories;
The above commands will show you the indexes associated with the tables. Now, assuming there are no indexes yet, we will need to add them:
alter table forum_threads add index Last_post_idx (`Last_post`);
alter table forum_posts add index ID_idx (`ID`);
alter table forum_categories add index ID_idx (`ID`);
and finally
alter table forum_threads add index ForumID_idx (`ForumID`);
alter table forum_categories add index Parent_idx (`Parent`);
Now we have indexes on the tables, and the query should be way faster.
NOTE: The joining keys between 2 tables should have identical data types and sizes so that the indexes work. For example, in
inner join forum_posts p ON p.ID = t.Last_post
the ID and Last_post columns should have the same data type and size in their respective tables.
Now, we still have an issue with the query: it uses an OR condition, and even with proper indexes the query may scan the full table in some cases.
WHERE t.ForumID=36 OR c.Parent=36
So how do we get rid of this? Well, sometimes UNION works better in these cases, meaning you run one query with the condition
WHERE t.ForumID=36
followed by a UNION of the same query with a different WHERE condition:
WHERE c.Parent=36
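Sketched in full, the UNION rewrite might look like this (untested; note that the final ORDER BY and LIMIT apply to the combined result, so Last_post is added to the select list):
SELECT t.ID, t.Last_post
FROM forum_threads t
INNER JOIN forum_posts p ON p.ID = t.Last_post
WHERE t.ForumID = 36
UNION
SELECT t.ID, t.Last_post
FROM forum_threads t
INNER JOIN forum_posts p ON p.ID = t.Last_post
INNER JOIN forum_categories c ON t.ForumID = c.ID
WHERE c.Parent = 36
ORDER BY Last_post DESC
LIMIT 1;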
But the optimization needs more insight into the tables and the actual queries that are going to be executed on them.
The explanation above is just an idea of how we can improve the performance of the query; there are many possibilities in real situations, and these could be handled with the complete table structures and the queries that are going to be applied to them.