I'm having real problems with my Mysql statement, I need to join a few tables together, query them and order by the average of values from another table. This is what I have...
SELECT
ROUND(avg(re.rating), 1)AS avg_rating,
s.staff_id, s.full_name, s.mobile, s.telephone, s.email, s.drive
FROM staff s
INNER JOIN staff_homes sh
ON s.staff_id=sh.staff_id
INNER JOIN staff_positions sp
ON s.staff_id=sp.staff_id
INNER JOIN reliability re
ON s.staff_id=re.staff_id
INNER JOIN availability ua
ON s.staff_id=ua.staff_id
GROUP BY staff_id
ORDER BY avg_rating DESC
Now I believe this to work although I am getting this error "The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET SQL_MAX_JOIN_SIZE=# if the SELECT is okay".
I think this means that I have too many joins and because it is shared hosting it won't allow large queries to run I don't know.
What I would like to know is exactly what the error means (I have googled it but I don't understand the answers) and how I can work round it by maybe making my query more efficient?
Any help would be appreciated. Thanks
EDIT:
The reason I need the joins is so I can query the tables based on a search function like so...
SELECT
ROUND(avg(re.rating), 1)AS avg_rating
, s.staff_id, s.full_name, s.mobile, s.telephone, s.email, s.drive
FROM staff s
INNER JOIN staff_homes sh
ON s.staff_id=sh.staff_id
INNER JOIN staff_positions sp
ON s.staff_id=sp.staff_id
INNER JOIN reliability re
ON s.staff_id=re.staff_id
INNER JOIN availability ua
ON s.staff_id=ua.staff_id
WHERE s.full_name LIKE '%'
AND s.drive = '1'
AND sh.home_id = '3'
AND sh.can_work = '1'
AND sp.position_id = '3'
AND sp.can_work = '1'
GROUP BY staff_id
ORDER BY avg_rating DESC
EDIT 2
This was the result of my explain. Also I'm not great with MYSQL how would I set up foreign keys?
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ua ALL NULL NULL NULL NULL 14 Using temporary; Using filesort
1 SIMPLE re ALL NULL NULL NULL NULL 50 Using where; Using join buffer
1 SIMPLE sp ALL NULL NULL NULL NULL 84 Using where; Using join buffer
1 SIMPLE sh ALL NULL NULL NULL NULL 126 Using where; Using join buffer
1 SIMPLE s eq_ref PRIMARY PRIMARY 4 web106-prestwick.ua.staff_id 1
EDIT 3: Thanks lc, it was my foreign keys, they were not set up correctly. Problem sorted
Maybe you should use more and/or better indexes on the tables.
According to the db you're using, the optimization may be faster or not with subqueries.
There may be 2 bottlenecks on your query:
try to remove the average function of your query. If your query speeds up, try to replace it with a subquery and see what happens.
The multiple joins often reduce performances, and there's nothing you can do except modifying your db schema. The simplest solution would be to have a table that precomputes data and reduces the work for your db engine. This table can be fulfilled with stored procedures triggered when you modify the data on the implied tables, or you can also modify the table values from your php application.
Related
I have 3 tables in a MySQL database: courses, users and participants, which contains about 30mil, 30k and 3k entries respectively.
My goal is to (efficiently) figure out the number of users that have been assigned to courses that matches our criteria. The criteria is a little more complex, but for this example we only care about users where deleted_at is null and courses where deleted_at is null and active is 1.
Simplified these are the columns:
users:
id
deleted_at
1
null
2
2022-01-01
courses:
id
active
deleted_at
1
1
null
1
1
2020-01-01
2
0
2020-01-01
participants:
id
participant_id
course_id
1
1
1
2
1
2
3
2
2
Based on the data above, the number we would get would be 1 as only user 1 is not deleted and that user assigned to some course (id 1) that is active and not deleted.
Here is a list of what I've tried.
Joining all the tables and do simple where's.
Joining using subqueries.
Pulling the correct courses and users out to the application layer (PHP), and querying participants using WHERE IN.
Pulling everything out and doing the filtering in the application layer.
Calling using EXPLAIN to add better indexes - I, admittedly, do not do this often and may not have done this well enough.
A combination of all the above.
An example of a query would be:
SELECT COUNT(DISTINCT participant_id)
FROM `participants`
INNER JOIN
(SELECT `courses`.`id`
FROM `courses`
WHERE (`active` = '1')
AND `deleted_at` IS NULL) AS `tempCourses` ON `tempCourses`.`id` = `participants`.`course_id`
WHERE `participant_type` = 'Eloomi\\Models\\User'
AND `participant_id` in
(SELECT `users`.`id`
FROM `users`
WHERE `users`.`deleted_at` IS NULL)
From what I can gather doing this will create a massive table, which only then will start applying where's. In my mind it should be possible to short circuit a lot of that because once we get a match for a user, we can disregard that going forward. That would be how to handle it, in my mind, in the application layer.
We could do this on a per-user basis in the application layer, but the number of requests to the database would make this a bad solution.
I have tagged it as PHP as well as MySQL, not because it has to be PHP but because I do not mind offloading some parts to the application layer if that is required. It's my experience that joins do not always use indexes optimally
Edit:
To specify my question: Can someone help me provide a efficient way to pull out the number of non-deleted users that have been assigned to to active non-deleted courses?
I would write it this way:
SELECT COUNT(DISTINCT p.participant_id)
FROM courses AS c
INNER JOIN participants AS p
ON c.id = p.course_id
INNER JOIN users AS u
ON p.participant_id = u.id
WHERE u.deleted_at IS NULL
AND c.active = 1 AND c.deleted_at IS NULL
AND p.participant_type = 'Eloomi\\Models\\User';
MySQL may join the tables in another order, not the order you list the tables in the query.
I hope that courses is the first table MySQL accesses, because it's probably the smallest table. Especially after filtering by active and deleted_at. The following index will help to narrow down that filtering, so only matching rows are examined:
ALTER TABLE courses ADD KEY (active, deleted_at);
Every index implicitly has the table's primary key (e.g. id) appended as the last column. That column being part of the index, it is used in the join to participants. So you need an index in participants that the join uses to find the corresponding rows in that table. The order of columns in the index is important.
ALTER TABLE participants ADD KEY (course_id, participant_type, participant_id);
The participant_id is used to join to the users table. MySQL's optimizer will probably prefer to join to users by its primary key, but you also want to restrict that by deleted_at, so you might need this index:
ALTER TABLE users ADD KEY (id, deleted_at);
And you might need to use an index hint to coax the optimizer to prefer this secondary index over the primary key index.
SELECT COUNT(DISTINCT p.participant_id)
FROM courses AS c
INNER JOIN participants AS p
ON c.id = p.course_id
INNER JOIN users AS u USE INDEX(deleted_at)
ON p.participant_id = u.id
WHERE u.deleted_at IS NULL
AND c.active = 1 AND c.deleted_at IS NULL
AND p.participant_type = 'Eloomi\\Models\\User';
MySQL knows how to use compound indexes even if some conditions are in join clauses and other conditions are in the WHERE clause.
Caveat: I have not tested this. Choosing indexes may take several tries, and testing the EXPLAIN after each try.
I have a fairly simple query which runs okay when I test it in phpMyAdmin:
SELECT
c.customers_id,
c.customers_cid,
c.customers_gender,
c.customers_firstname,
c.customers_lastname,
c.customers_email_address,
c.customers_telephone,
c.customers_date_added,
ab.entry_company,
ab.entry_street_address,
ab.entry_postcode,
ab.entry_city,
COUNT(o.customers_id) AS orders_number,
SUM(ot.value) AS totalvalue,
mb.bonus_points
FROM
orders AS o,
orders_total AS ot,
customers AS c,
address_book AS ab,
module_bonus AS mb
WHERE
c.customers_id = o.customers_id
AND c.customers_default_address_id = ab.address_book_id
AND c.customers_id = mb.customers_id
AND o.orders_id = ot.orders_id
AND ot.class = 'ot_subtotal'
** AND c.customers_gender = 'm' AND c.customers_lastname LIKE 'Famlex'
GROUP BY o.customers_id
The row marked with ** changes depending on filtering settings of the application making the query.
Now, when I test this in phpMyAdmin, the query takes a couple of seconds to run (which is fine, since there are thousands of entries and, as far as I know, when using COUNTs and SUMs indexes don't help) and the results are perfect, but when I run the exact same query in PHP (echoed before running), the MySQL thread loads a core to 100% and doesn't stop until I kill it.
If I strip the extra stuff to calculate the COUNT and SUM, the query finishes but the results are useless to me.
EXPLAIN:
1 SIMPLE mb ALL NULL NULL NULL NULL 48713 Using temporary; Using filesort
1 SIMPLE ot ALL idx_orders_total_orders_id NULL NULL NULL 811725 Using where
1 SIMPLE o eq_ref PRIMARY PRIMARY 4 db.ot.orders_id 1 Using where
1 SIMPLE c eq_ref PRIMARY PRIMARY 4 db.o.customers_id 1 Using where
1 SIMPLE ab eq_ref PRIMARY PRIMARY 4 db.c.customers_default_address_id 1
EXPLAIN after applying indexes and using joins:
1 SIMPLE c ref PRIMARY,search_str_idx search_str_idx 98 const 1 Using where; Using temporary; Using filesort
1 SIMPLE mb ALL NULL NULL NULL NULL 48713 Using where
1 SIMPLE ab eq_ref PRIMARY PRIMARY 4 db.c.customers_default_address_id 1
1 SIMPLE ot ref idx_orders_total_orders_id,class class 98 const 157004 Using where
1 SIMPLE o eq_ref PRIMARY PRIMARY 4 db.ot.orders_id 1 Using where
Use explicit join instead of implicit
SELECT
c.customers_id,
c.customers_cid,
c.customers_gender,
c.customers_firstname,
c.customers_lastname,
c.customers_email_address,
c.customers_telephone,
c.customers_date_added,
ab.entry_company,
ab.entry_street_address,
ab.entry_postcode,
ab.entry_city,
COUNT(o.customers_id) AS orders_number,
SUM(ot.value) AS totalvalue,
mb.bonus_points
FROM
orders o
join orders_total ot on o.orders_id = ot.orders_id
join customers c on c.customers_id = o.customers_id
join address_book ab on c.customers_default_address_id = ab.address_book_id
join module_bonus mb on c.customers_id = mb.customers_id
where
ot.class = 'ot_subtotal'
c.customers_gender = 'm'
AND c.customers_lastname = 'Famlex'
GROUP BY o.customers_id
Assuming all the joining keys are also primary key of those tables viz:
o.orders_id, c.customers_id, ab.address_book_id
You will need to add the following indexes if they are not added already
alter table orders add index customers_id_idx(customers_id);
alter table module_bonus add index customers_id_idx(customers_id);
alter table orders_total add index orders_id_idx(orders_id);
alter table orders_total add index orders_class_idx(class);
alter table customers add index search_str_idx(customers_gender,customers_lastname);
Make sure to take a backup of the tables before applying indexes.
Can you share SQL dump of your record so that I can take a look ?
phpMyAdmin automatically adds limit clause to the select queries, that is why I think you are getting the impression that in phpMyAdmin query is running fine but not via you PHP script.
Try to add explicit limit clause to the query say limit 0, 1000 in phpMyAdmin before running and see if that makes performance of phpMyAdmin slower.
I've seen a few questions dabbling with the inefficiency of "NOT IN" in MySQL queries, but I didn't manage to reproduce the proposed solutions.
So I've got some sort of search engine. It starts with very simple queries, and then tries more complicated ones if it doesn't find enough results. Here's how it works in pseudocode
list_of_ids = do_simple_search()
nb_results = size_of(list_of_ids)
if nb_results < max_nb_results :
list_of_ids .= do_search_where_id_not_in(list_of_ids)
if nb_results < max_nb_results :
list_of_ids .= do_complicated_search_where_id_not_in(list_of_ids)
Hope I'm clear.
Anyway here's the slow query, as shown by MySQL-slow :
SELECT DISTINCT c.id
FROM clients c LEFT JOIN communications co ON c.id = co.client_id
WHERE (co.titre LIKE 'S' OR co.contenu LIKE 'S') AND c.id NOT IN(N)
LIMIT N, N
And here's an EXPLAIN on that query :
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE c index PRIMARY PRIMARY 2 NULL 25250 Using where; Using index; Using temporary
1 SIMPLE co ref qui_com,id_client,titre id_client 2 klients.c.id 8 Using where; Distinct
MySQL version is 5.1.63-0ubuntu0.11.04.1-log
Maybe my approach is wrong here ? How would you do it ? thanks.
A couple of remarks:
1) Why do you do LEFT JOIN i/o (INNER) JOIN? LEFT JOIN means that you want to also get the records which are not matched agains clients, is that the intention? If not, then JOIN i/o LEFT JOIN is quicker.
2) Why do you need JOIN at all if you can simply do:
SELECT DISTINCT co.client_id from communications co
WHERE (co.titre LIKE 'S' OR co.contenu LIKE 'S') AND co.id!=N LIMIT N,N;
Also, if you do a JOIN, both joined fields must be indexes, otherwise it's slow too.
More importantly, you condition both client_id and id from communications table, but there is no common index for these both, which means more work to execute your query (hence using temporary which is in general not a good sign).
3) You do a complex condition on both co.titre and co.contenu, you seem to have indexes but they are not used. That means this part might be potentially quite slow.
so I have this query
SELECT a.`title`,a.`id`,a.`numvol`,a.`numepi`,a.`release_date`,
(SELECT COUNT(id) FROM `Release` r WHERE r.`article_id`=a.`id`) AS `num_rows`,
(SELECT COUNT(id) FROM `Article_views` av WHERE av.`article_id`=a.`id`) AS `num_rows2`
FROM `Article` a WHERE a.`type` = 'ani' ORDER BY a.`title` ASC
The first load takes up to 5 secs and if I do a refresh it will take about 0.001 sec, is there a way to uniform the loading time?
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY a ALL NULL NULL NULL NULL 567 Using where; Using filesort
3 DEPENDENT SUBQUERY av ALL NULL NULL NULL NULL 5301 Using where
2 DEPENDENT SUBQUERY r ALL NULL NULL NULL NULL 11717 Using where
I tried to do it with join but it didn't work at all so I gave up this way...
Solution
use barmar query. way better :) (and indexes -,-')
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 536 Using temporary; Using filesort
1 PRIMARY a eq_ref PRIMARY PRIMARY 4 r.art.. 1 Using where
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 574 Using where; Using join buffer
3 DERIVED Article_views index NULL article_id 4 NULL 5301 Using index
2 DERIVED Release index NULL article_id 4 NULL 11717 Using index
Thanks guys for your time and the solution :) I guess I need to redo a good part of this old project ahah :)
Try this query instead:
SELECT a.`title`,a.`id`,a.`numvol`,a.`numepi`,a.`release_date`, `num_rows`, `num_rows2`
FROM `Article` a
JOIN (SELECT article_id, COUNT(*) AS num_rows
FROM Release
GROUP BY article_id) r
ON r.article_id = a.id
JOIN (SELECT article_id, COUNT(*) AS num_rows2
FROM Article_views
GROUP BY article_id) av
ON av.article_id = a.id
WHERE a.`type` = 'ani'
ORDER BY a.`title` ASC
In my experience, JOINs are faster than correlated subqueries.
For performance, make sure you have indexes on Release.article_id and Article_views.article_id.
I guess, 2nd try is benefit from SQL QUERY CACHE. I wonder if adding SQL_NO_CACHE, every try took 5 secs?
SELECT SQL_NO_CACHE a.`title`,a.`id`,a.`numvol`,a.`numepi`,a.`release_date`,
....
INDEXES
Oops. you have no relevant INDEX. could you add following indexes?
ALTER TABLE Article ADD INDEX(type);
ALTER TABLE Release ADD INDEX(article_id);
ALTER TABLE Article_views ADD INDEX(article_id);
More Efficient Query
And your Query converted into JOIN. I guess this is much faster than yours. Assuming every Article has Release and Article_views
SELECT a.`title`,a.`id`,a.`numvol`,a.`numepi`,a.`release_date`,
COUNT(r.id) AS `num_rows`,
COUNT(av.id) AS `num_rows2`
FROM `Article` a JOIN Release r ON r.`article_id`=a.`id`
JOIN Article_views av ON av.`article_id`=a.`id`
WHERE a.`type` = 'ani'
GROUP BY a.title, a.id, a.numvol, a.numepi, a.release_date
ORDER BY a.`title` ASC;
That significant improvement on query latency is due to internal MySQL cache functioning.
After the first execution of the query the result set is cached in RAM, so the results second query which just matches the previous, are immediately taken from the RAM without HDD accesses.
There are different points of view about MySQL internal cache, and experts often recommend to disable it in highload production environments using memecached, Redis or some other caching layer instead.
But definitely you should try to optimize performance of your query with caching turned off - 5 seconds is extremely slow.
Try not to use subqueries, cause MySQL optimizer is not performant on them.
Store values of the counters (results of count()) in separate table and update them properly. Then you can use just the counters values in your query without performing heavy database request each time.
Create indexes, for example, for type field.
Use EXPLAIN for further optimizations
I need some help optimizing some queries for my database. I do understand the use of indexes in helping with joins and order by statements to help speed things up, but I was wondering if there were some techniques available to avoid using filesort and using temporary when I use the EXPLAIN command. Here's an example of what I am using.
SELECT a.id, DATE_FORMAT(a.submitted_at, '%d-%b-%Y') as submitted_at, a.user_id,
data1.*,
data2.name, data2.type,
u.first_name, u.last_name
FROM applications AS a
LEFT JOIN users AS u ON u.id = a.user_id
LEFT JOIN score_table AS data1 ON data1.applications_id = a.id
LEFT JOIN sections AS data2 ON data2.id = data1.section_id
WHERE category_id = [value] && submitted_at IS NOT NULL
ORDER BY data2.type
Again, indexes are being used properly in my queries like the one up above. If I take out the ORDER BY clause, then the query executes quickly from using the proper indexes. I understand that the order of the joins can affect the performance of the query. When I test using the ORDER BY on the users table, since it is the next table after the "const", it will only use "Using where, Using Filesort" on EXPLAIN. If I drop to any of the other tables, we get into the "Using Temporary" issue.
My question is: what would be the best way to optimize queries like this to run faster and in the best case scenario, avoid using filesort/temporary in EXPLAIN? I'm open to any possibilities :) I'm more or less interested in the theory on how to make queries like this perform better, than this exact query as I am having to perform more and more of these deep level ORDER BY queries in the database I'm working on.
--EDIT--
Here's the explain of the query above.....
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE a ref category_id,submitted_at category_id 4 const 49 Using where; Using temporary; Using filesort
1 SIMPLE u eq_ref PRIMARY PRIMARY 4 a.user_id 1
1 SIMPLE data1 ref app id app id 4 a.id 7
1 SIMPLE data2 eq_ref PRIMARY PRIMARY 4 data1.section_id 1
Couple of things.
Are you sure you need to use 'LEFT JOIN'? Looking at the query it looks like you can get away with 'INNER JOIN' which will reduce the number of potential rows.
You didn't post your schema, but I assume that users.id, applications.user_id, score_table.applications_id, applications.id, sections.id and score_table.section_id are all ints? If they are non-ints I would strongly urge you to convert them. And if not primary keys, be sure they are indexed.
I wouldn't run any mysql level data formatting (i.e. DATE_FORMAT), as it will create some overhead during the query, rather I would format data like this at the app layer.
The ORDER BY forces MySQL to create a temp table in order to sort correctly, so be sure you absolutely need this functionality. If so, be sure that sections.type is indexed.
I would consider using a different alias naming convention. data1 and data2 are so abstract it's difficult to discern what they are actually referring to. I would suggest you use an abbreviated construct of the table you are aliasing, for example; applications becomes app (instead of a), score_table becomes score (instead of data1), etc.