I am running this query on my website to build a ToDo list based on specific criteria. It runs too slowly, and there is probably a better way to write it.
SELECT * FROM lesson WHERE
id IN
(SELECT `lesson_id` FROM `localization_logging`
WHERE `language_id` = 2 AND `action_id` = 1)
AND `id` NOT IN
(SELECT `lesson_id` FROM `localization_logging`
WHERE `language_id` = 2 AND `part_id` = 1 AND `action_id` = 6)
The query looks in the lesson table to find all lessons and then checks whether a specific task is done. If a task is done in one todo list, the lesson should show up in the next one. In this case, action 1 is done but action 6 is not.
I hope I'm explaining this well enough. On my local machine the query takes 1.8 seconds. Sometimes I have to print multiple lists next to each other, and then it takes 1.8 seconds times the number of lists, which makes the page load very slowly.
Something like this, which marks each lesson as completed or not:
SELECT l.*, SUM(ll.action_id=6) completed FROM lesson l
INNER JOIN localization_logging ll ON ll.lesson_id = l.id
WHERE ll.language_id = 2 AND
(
ll.action_id = 1
OR
(ll.action_id = 6 AND ll.part_id = 1)
)
GROUP BY l.id
And now we can wrap it with:
SELECT t.* FROM (...) t WHERE t.completed = 0
You'll usually get faster queries filtering rows with INNER/LEFT JOIN, but you need to test it.
SELECT lesson.* FROM lesson
INNER JOIN localization_logging task1
ON lesson.id = task1.lesson_id
LEFT JOIN localization_logging task2
ON lesson.id = task2.lesson_id
AND task2.language_id = 2
AND task2.part_id = 1
AND task2.action_id = 6
WHERE task1.language_id = 2
AND task1.action_id = 1
AND task2.lesson_id IS NULL
The second table is joined on several conditions, and they all have to be listed in the ON clause rather than in WHERE. A LEFT JOIN keeps every row from the left side and fills the right side with NULLs when nothing matches, and those NULL-filled rows are exactly the ones we want to keep.
By the way, you'll get multiple rows per lesson if the task1 condition matches more than one row. In that case, add GROUP BY lesson.id.
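To check that the join version returns the same lessons as the original IN / NOT IN version, here is a minimal sketch that runs both on a few made-up rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the sample data and the in-memory database are assumptions, and timings from it say nothing about your server. The table and column names are the ones from the question.

```python
import sqlite3

# Hypothetical sample data; only the table/column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lesson (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE localization_logging (
    lesson_id INTEGER, language_id INTEGER, part_id INTEGER, action_id INTEGER
);
INSERT INTO lesson VALUES (1, 'Intro'), (2, 'Verbs'), (3, 'Nouns');
INSERT INTO localization_logging VALUES
    (1, 2, 1, 1),
    (1, 2, 1, 6),
    (2, 2, 1, 1),
    (2, 2, 2, 1);
""")
# Lesson 1 has action 1 and action 6 logged, so it should drop off the list.
# Lesson 2 has action 1 logged twice and no action 6, so it should stay.
# Lesson 3 has nothing logged, so it never enters the list.

subquery_sql = """
SELECT id FROM lesson
WHERE id IN (SELECT lesson_id FROM localization_logging
             WHERE language_id = 2 AND action_id = 1)
  AND id NOT IN (SELECT lesson_id FROM localization_logging
                 WHERE language_id = 2 AND part_id = 1 AND action_id = 6)
ORDER BY id
"""

join_sql = """
SELECT lesson.id FROM lesson
INNER JOIN localization_logging task1
        ON lesson.id = task1.lesson_id
LEFT JOIN localization_logging task2
       ON lesson.id = task2.lesson_id
      AND task2.language_id = 2 AND task2.part_id = 1 AND task2.action_id = 6
WHERE task1.language_id = 2 AND task1.action_id = 1
  AND task2.lesson_id IS NULL
GROUP BY lesson.id
ORDER BY lesson.id
"""

subquery_ids = [row[0] for row in conn.execute(subquery_sql)]
join_ids = [row[0] for row in conn.execute(join_sql)]
print(subquery_ids, join_ids)
```

Without the GROUP BY, lesson 2 would come back twice in the join version because it has two matching task1 rows.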
I programmed a filter which generates a Query to show special employees.
I have table employees and a lot of 1:1, 1:n and n:m relationships e.g. for skills and languages for the employees like this:
Employees
id name
1 John
2 Mike
Skills
id skill experience
1 PHP 3
2 SQL 1
Employee_Skills
eid sid
1 1
1 2
Now I want to filter employees which have at least 2 years experience in using PHP and 1 year SQL.
My filter always generates a correct working Query for every table, relationship and field.
My problem is that when I filter the same field in a related table multiple times, combined with AND, it does not work.
e.g.
John PHP 3
John SQL 1
PHP and SQL are different rows so AND can not work.
I tried using group_concat and find_in_set but I have the problem that I can not filter experience over 2 years with find_in_set and find_in_set does not know PHP is 3 and SQL is 1.
I also tried
WHERE emp.id IN (SELECT eid FROM Employee_Skills WHERE sid IN (SELECT id FROM Skills WHERE skill = 'PHP' AND experience > 1)) AND emp.id IN (SELECT eid FROM Employee_Skills WHERE sid IN (SELECT id FROM Skills WHERE skill = 'SQL' AND experience > 0))
which works for this example, but it only works for n:m relationships, and it is too complex to work out the relationship type.
I have the final Query with
ski.skill = 'PHP' AND ski.experience > 1 AND ski.skill = 'SQL' AND ski.experience > 0
and I would like to manipulate the Query to make it work.
What does a query have to look like to deal with relational division?
You can try the following approach:
select * from Employees
where id in (
select eid
from Employee_Skills as a
inner join
Skills as ski
on (a.sid = ski.id)
where
(ski.skill = 'PHP' AND ski.experience >= 2) OR
(ski.skill = 'SQL' AND ski.experience >= 1)
group by eid
having count(*) = 2
)
For every filter you add another OR condition. The HAVING clause then keeps only the employees that passed all of the filters, so set the count to the number of filters.
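Since the filter is generated by code anyway, here is a rough sketch of how the OR conditions and the HAVING count could be built from a list of filters. It is written in Python with the built-in sqlite3 module instead of PHP and MySQL, which is an assumption made only to keep it runnable. The `filters` list and the extra employee Mike with only SQL are made up. It uses `>=` for "at least" and COUNT(DISTINCT ...) so that a duplicate link row cannot inflate the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Skills (id INTEGER PRIMARY KEY, skill TEXT, experience INTEGER);
CREATE TABLE Employee_Skills (eid INTEGER, sid INTEGER);
INSERT INTO Employees VALUES (1, 'John'), (2, 'Mike');
INSERT INTO Skills VALUES (1, 'PHP', 3), (2, 'SQL', 1);
INSERT INTO Employee_Skills VALUES (1, 1), (1, 2), (2, 2);
""")

# Hypothetical filter list: (skill, minimum years of experience).
filters = [("PHP", 2), ("SQL", 1)]

# One OR branch per filter; values are passed as bound parameters.
conditions = " OR ".join(
    "(ski.skill = ? AND ski.experience >= ?)" for _ in filters
)
sql = f"""
SELECT name FROM Employees
WHERE id IN (
    SELECT a.eid
    FROM Employee_Skills AS a
    INNER JOIN Skills AS ski ON a.sid = ski.id
    WHERE {conditions}
    GROUP BY a.eid
    HAVING COUNT(DISTINCT ski.skill) = ?
)
ORDER BY name
"""
# Flatten the filter pairs, then append the number of filters for HAVING.
params = [value for pair in filters for value in pair] + [len(filters)]
matching = [row[0] for row in conn.execute(sql, params)]
print(matching)
```

Mike passes only one of the two filters, so the HAVING count drops him.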
You could make a kind of pivot query, where you put the experience in each of all of the known skills in columns. This could be a long query, but you could build it dynamically in php, so it would add all skills as columns to the final query, which would look like this:
SELECT e.*, php_exp, sql_exp
FROM Employees e
INNER JOIN (
SELECT es.eid,
SUM(CASE s.skill WHEN 'PHP' THEN s.experience END) php_exp,
SUM(CASE s.skill WHEN 'SQL' THEN s.experience END) sql_exp,
SUM(CASE s.skill WHEN 'JS' THEN s.experience END) js_exp
-- do the same for other skills here --
FROM Employee_Skills es
INNER JOIN Skills s ON es.sid = s.id
GROUP BY es.eid
) pivot ON pivot.eid = e.id
WHERE php_exp > 2 AND sql_exp > 0;
The WHERE clause is then very concise and intuitive: you use the logical operators like in other circumstances.
If the set of skills is rather static, you could even create a view for the sub-query. Then the final SQL is quite concise.
Here is a fiddle.
Alternative
Using the same principle, but using the SUM in the HAVING clause, you can avoid gathering all skill's experiences:
SELECT e.*
FROM Employees e
INNER JOIN (
SELECT es.eid
FROM Employee_Skills es
INNER JOIN Skills s ON es.sid = s.id
GROUP BY es.eid
HAVING SUM(CASE s.skill WHEN 'PHP' THEN s.experience END) > 2
AND SUM(CASE s.skill WHEN 'SQL' THEN s.experience END) > 0
) pivot ON pivot.eid = e.id;
Here is a fiddle.
You can also replace the CASE construct by the IF function, like this:
HAVING SUM(IF(s.skill='PHP', s.experience, 0)) > 2
... etc.
But it comes down to the same.
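A minimal sketch of the HAVING variant on the sample rows from the question, run through Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, as is the extra employee Mike who only has SQL). The helper `employees_having` is hypothetical and only exists to try different conditions. It also shows why this form is handy: switching from AND to OR is a one-word change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Skills (id INTEGER PRIMARY KEY, skill TEXT, experience INTEGER);
CREATE TABLE Employee_Skills (eid INTEGER, sid INTEGER);
INSERT INTO Employees VALUES (1, 'John'), (2, 'Mike');
INSERT INTO Skills VALUES (1, 'PHP', 3), (2, 'SQL', 1);
INSERT INTO Employee_Skills VALUES (1, 1), (1, 2), (2, 2);
""")

def employees_having(condition):
    """Return employee names whose grouped skills satisfy the HAVING condition.

    The condition is trusted SQL text built in this script, not user input.
    """
    sql = f"""
    SELECT e.name
    FROM Employees e
    INNER JOIN (
        SELECT es.eid
        FROM Employee_Skills es
        INNER JOIN Skills s ON es.sid = s.id
        GROUP BY es.eid
        HAVING {condition}
    ) pivot ON pivot.eid = e.id
    ORDER BY e.name
    """
    return [row[0] for row in conn.execute(sql)]

# Per-skill experience as conditional sums; a missing skill yields NULL.
php_exp = "SUM(CASE s.skill WHEN 'PHP' THEN s.experience END)"
sql_exp = "SUM(CASE s.skill WHEN 'SQL' THEN s.experience END)"

both = employees_having(f"{php_exp} >= 2 AND {sql_exp} >= 1")
either = employees_having(f"{php_exp} >= 2 OR {sql_exp} >= 1")
print(both, either)
```

For Mike the PHP sum is NULL, so the AND version rejects him while the OR version still accepts him on the SQL branch.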
The straightforward way would be to repeatedly JOIN the skills:
SELECT e.*
FROM Employees AS e
JOIN Employee_Skills AS j1 ON (e.id = j1.eid)
JOIN Skills AS s1 ON (j1.sid = s1.id AND s1.skill = 'PHP' AND s1.experience > 3)
JOIN Employee_Skills AS j2 ON (e.id = j2.eid)
JOIN Skills AS s2 ON (j2.sid = s2.id AND s2.skill = 'SQL' AND s2.experience > 1)
...
Since all the clauses are required, this translates to a plain (inner) JOIN.
You will need to add two JOINs for each clause, but they're quite fast joins.
A more hackish way would be to compress the skills into a code in a 1:1 relation with the employees. If experience never exceeds, say, 30, then you can multiply the first condition's experience by 1, the second by 30, the third by 30*30, the fourth by 30*30*30... and never get an overflow.
SELECT eid, SUM(CASE skill
WHEN 'PHP' THEN 30*experience
WHEN 'SQL' THEN 1*experience END) AS code
FROM Employee_Skills JOIN Skills ON (Skills.id = Employee_Skills.sid)
GROUP BY eid HAVING code > 0;
Actually since you want 3 years PHP, you can HAVE code > 91. If you had three conditions with experiences 2, 3 and 5, you would request more than x = 2*30*30 + 3*30 + 5. This only serves to whittle the results, since 3*30*30 + 2*30 + 4 still passes the filter but is of no use to you. But since you want a restriction on code, and "> x" costs the same as "> 0" and gives better results... (if you needed more complex filtering than a series of AND, > 0 is safer, though).
The table above you join with Employees, then on the result you perform the true filtering, requiring
((code DIV (30*30)) % 30) > 7 -- for instance :-)
AND
((code DIV 30) % 30) > 3 -- for PHP
AND
((code DIV 1) % 30) > 1 -- for SQL
(the *1 and DIV 1 are superfluous and only inserted for clarity; DIV is MySQL's integer division, which the digit extraction needs)
This solution requires a full table scan on Skills, with no real possibility of automatic optimizations. So it is slower than the other solution. On the other hand, its cost grows much more slowly, so if you have complex queries, or need OR operators or conditional expressions instead of ANDs, it may be more convenient to implement the "hackish" solution.
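The packing arithmetic is easier to follow outside SQL. Below is a small sketch in plain Python, assuming as above that no experience value reaches 30. The helper names `encode` and `decode` are made up for illustration. It reproduces the 91 for PHP 3 / SQL 1 and shows why comparing the whole code against a threshold only narrows the results and cannot replace the per-digit checks.

```python
BASE = 30  # assumes every experience value is below 30

def encode(experiences):
    """Pack per-skill experiences into one integer; the last item is the lowest digit."""
    code = 0
    for years in experiences:
        code = code * BASE + years
    return code

def decode(code, count):
    """Unpack `count` base-30 digits, highest digit first."""
    digits = []
    for _ in range(count):
        digits.append(code % BASE)   # lowest remaining digit
        code //= BASE                # integer division, like DIV in MySQL
    return digits[::-1]

# PHP 3 years and SQL 1 year: 3*30 + 1
code = encode([3, 1])
php_years = (code // BASE) % BASE
sql_years = code % BASE

# Three conditions with minimum experiences 2, 3 and 5.
threshold = encode([2, 3, 5])
# This candidate beats the threshold as a whole number...
candidate = encode([3, 2, 4])
passes_coarse_filter = candidate > threshold
# ...but fails the real per-skill check, because its second digit is below 3.
passes_real_filter = all(
    have >= need for have, need in zip(decode(candidate, 3), [2, 3, 5])
)
print(code, php_years, sql_years, passes_coarse_filter, passes_real_filter)
```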
I have the below two tables and I need to be able to search by items to find the shopping_list_id. Also, I want to limit the query so that it doesn't bring back other shopping lists with additional items on it. Essentially, I'm checking to see if this is a shopping list the user has saved before. The below query does NOT handle if there are shopping lists that match but with additional items, I'm stumped as to how to do that.
tables:
shopping_list
shopping_list_id
user
shopping_list_name
shopping_list_item
shopping_list_item_id
shopping_list_id
category_id
qty
qty_unit_id
This example has three items, but there could be any number. My PHP code dynamically generates the SQL joins and where clause based on the user's input.
Query that I have:
SELECT DISTINCT sli.shopping_list_id
FROM shopping_list_item sli
JOIN shopping_list sl ON sli.shopping_list_id=sl.shopping_list_id
JOIN shopping_list_item sli0 on sli.shopping_list_id=sli0.shopping_list_id
JOIN shopping_list_item sli1 on sli.shopping_list_id=sli1.shopping_list_id
JOIN shopping_list_item sli2 on sli.shopping_list_id=sli2.shopping_list_id
WHERE sl.user_id=:webuser_id
AND sli0.category_id=3 AND sli0.qty=1 AND sli0.qty_unit_id=3
AND sli1.category_id=683 AND sli1.qty=1 AND sli1.qty_unit_id=3
AND sli2.category_id=309 AND sli2.qty=1 AND sli2.qty_unit_id=7
You can do this pretty easily with the group by/having approach to this type of query:
select sli.shopping_list_id
from shopping_list_item sli
group by sli.shopping_list_id
having sum(sli.category_id = 3 AND sli.qty = 1 AND sli.qty_unit_id = 3) = 1 and
sum(sli.category_id = 683 AND sli.qty = 1 AND sli.qty_unit_id = 3) = 1 and
sum(sli.category_id = 309 AND sli.qty = 1 AND sli.qty_unit_id = 7) = 1 and
count(*) = 3;
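Because the item list is dynamic, the HAVING clause can be generated from it. Here is a hedged sketch using Python's built-in sqlite3 module in place of PHP and MySQL (an assumption, chosen to keep it runnable). The sample lists, the user id 7 and the `wanted` variable are made up. It also joins shopping_list to restrict the search to one user, as the query in the question does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shopping_list (shopping_list_id INTEGER PRIMARY KEY,
                            user_id INTEGER, shopping_list_name TEXT);
CREATE TABLE shopping_list_item (shopping_list_item_id INTEGER PRIMARY KEY,
                                 shopping_list_id INTEGER, category_id INTEGER,
                                 qty INTEGER, qty_unit_id INTEGER);
INSERT INTO shopping_list VALUES (1, 7, 'Weekly'), (2, 7, 'Weekly plus'), (3, 7, 'Short');
INSERT INTO shopping_list_item VALUES
    (1, 1, 3, 1, 3), (2, 1, 683, 1, 3), (3, 1, 309, 1, 7),
    (4, 2, 3, 1, 3), (5, 2, 683, 1, 3), (6, 2, 309, 1, 7), (7, 2, 5, 2, 1),
    (8, 3, 3, 1, 3), (9, 3, 683, 1, 3);
""")
# List 1 is the exact match, list 2 has one extra item, list 3 misses one item.

user_id = 7
# Items the user entered: (category_id, qty, qty_unit_id).
wanted = [(3, 1, 3), (683, 1, 3), (309, 1, 7)]

# One SUM(...) = 1 test per wanted item; each comparison evaluates to 0 or 1.
having = " AND ".join(
    "SUM(sli.category_id = ? AND sli.qty = ? AND sli.qty_unit_id = ?) = 1"
    for _ in wanted
)
sql = f"""
SELECT sli.shopping_list_id
FROM shopping_list_item sli
JOIN shopping_list sl ON sl.shopping_list_id = sli.shopping_list_id
WHERE sl.user_id = ?
GROUP BY sli.shopping_list_id
HAVING {having} AND COUNT(*) = ?
"""
# Parameter order follows the placeholders: user, item triples, item count.
params = [user_id] + [v for item in wanted for v in item] + [len(wanted)]
matches = [row[0] for row in conn.execute(sql, params)]
print(matches)
```

The COUNT(*) test is what rejects list 2: all three wanted items are there, but the list has four rows.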
I'm currently trying to join two tables with a left join:
--portal--
id_portal (index)
id_venue
name_portal
--access--
id_access (index)
id_event
id_portal
id_tickets
scan_access
'access' contains a number of ticket types per portal for each event. I need to combine these to get the sum total of the scan_access column for each portal but include the portals that have 'null' scan_access to come up with '0'. To achieve this I've used a left join:
SELECT portal.name_portal, SUM(access.scan_access) AS total_scan
FROM portal LEFT JOIN access ON portal.id_portal = access.id_portal
WHERE portal.id_venue = $venueId
GROUP BY portal.id_portal
ORDER BY portal.id_portal ASC
which means I get the following:
Portal 1 - Null
Portal 2 - 40
Portal 3 - 33
Portal 4 - Null
but I have an issue when I need to also get the above result when taking into account the event (id_event) because when I use the following:
SELECT portal.name_portal, SUM(access.scan_access) AS total_scan
FROM portal LEFT JOIN access ON portal.id_portal = access.id_portal
WHERE portal.id_venue = $venueId AND access.id_event = 20
GROUP BY portal.id_portal
ORDER BY portal.id_portal ASC
I get:
Portal 2 - 40
Portal 3 - 33
which makes sense, as those are the only two portals with rows for that id_event value. But how can I take this column into account without losing the other portals? Also, is there a way in SQL to turn the NULL into a zero when returning a result? (I can fix the NULL afterwards in PHP, but I wanted to see if it was possible.)
By putting access.id_event = 20 in your WHERE clause, you turn your LEFT JOIN into an INNER JOIN. Move access.id_event = 20 into your join criteria to preserve the LEFT JOIN. As @echo_me mentioned, you can use COALESCE() to turn the NULLs into zeroes. I'd put it around the SUM() instead of inside it.
SELECT portal.name_portal, COALESCE( SUM(access.scan_access), 0 ) AS total_scan
FROM portal LEFT JOIN access ON portal.id_portal = access.id_portal AND access.id_event = 20
WHERE portal.id_venue = $venueId
GROUP BY portal.id_portal
ORDER BY portal.id_portal ASC
To convert NULL to 0, use this:
COALESCE(col, 0)
In your example it would be:
SUM(COALESCE(access.scan_access, 0)) AS total_scan
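To see the difference between filtering in WHERE and filtering in ON, here is a small sketch with made-up rows, run through Python's built-in sqlite3 module as a stand-in for MySQL (an assumption). Portal 1 only has scans for another event (21) and Portal 4 has no scans at all. Both should still be listed with 0 for event 20.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE portal (id_portal INTEGER PRIMARY KEY, id_venue INTEGER, name_portal TEXT);
CREATE TABLE access (id_access INTEGER PRIMARY KEY, id_event INTEGER,
                     id_portal INTEGER, id_tickets INTEGER, scan_access INTEGER);
INSERT INTO portal VALUES (1, 1, 'Portal 1'), (2, 1, 'Portal 2'),
                          (3, 1, 'Portal 3'), (4, 1, 'Portal 4');
INSERT INTO access VALUES (1, 20, 2, 1, 25), (2, 20, 2, 2, 15),
                          (3, 20, 3, 1, 33), (4, 21, 1, 1, 10);
""")

# Event filter in WHERE: the NULL-extended rows fail the test and disappear.
filter_in_where = """
SELECT portal.name_portal, SUM(access.scan_access) AS total_scan
FROM portal LEFT JOIN access ON portal.id_portal = access.id_portal
WHERE portal.id_venue = ? AND access.id_event = ?
GROUP BY portal.id_portal
ORDER BY portal.id_portal
"""

# Event filter in ON: every portal survives, COALESCE turns the NULL sum into 0.
filter_in_on = """
SELECT portal.name_portal, COALESCE(SUM(access.scan_access), 0) AS total_scan
FROM portal LEFT JOIN access
  ON portal.id_portal = access.id_portal AND access.id_event = ?
WHERE portal.id_venue = ?
GROUP BY portal.id_portal
ORDER BY portal.id_portal
"""

where_rows = conn.execute(filter_in_where, (1, 20)).fetchall()
on_rows = conn.execute(filter_in_on, (20, 1)).fetchall()
print(where_rows)
print(on_rows)
```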
I need help with an advanced SQL-query (MSSQL 2000).
I have a table called Result that lists athletics 100 and 200 meter race-times. A runner can have several racetimes but I want to show only the best time from each runner at each event.
The Result-table contains five columns, Result_id, athlete_id, result_time, result_date, event_code. So athlete_id must be unique when I list the values and result_time must be the fastest (lowest) value. Also I want to be able to choose if event_code should be "= 1" or "= 2", since 100 and 200 meter resulttimes are mixed in the same table.
I asked a similar question a few days ago, but without the event_code condition.
This is the answer we came up with.
select r.*
from result r
inner join (
select athlete_id, min(result_time) as FastestTime
from result
group by athlete_id
) rm on r.athlete_id = rm.athlete_id and r.result_time = rm.FastestTime
Any ideas how I can add the event_code condition to this snippet?
Try this:
select r.*
from result r
inner join (
select athlete_id, min(result_time) as FastestTime
from result
where event_code = 1
group by athlete_id
) rm on r.athlete_id = rm.athlete_id and r.result_time = rm.FastestTime
where r.event_code = 1 -- keeps out rows from the other event that happen to have the same time
Include the event code in the output of the subquery. Then you can show all events at the same time or choose them in an outer where clause:
select r.*
from result r inner join
(select athlete_id, event_code, min(result_time) as FastestTime
from result
group by athlete_id, event_code
) rm
on r.athlete_id = rm.athlete_id and
r.result_time = rm.FastestTime and
r.event_code = rm.event_code
-- where r.event_code = xx
The last line is an optional WHERE clause in case you want just one event at a time.
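Here is a quick way to try the per-athlete, per-event version on a handful of rows. This is only a sketch: it uses Python's built-in sqlite3 module rather than MSSQL 2000, and the times and dates are made up. The column names are the ones listed in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE result (result_id INTEGER PRIMARY KEY, athlete_id INTEGER,
                     result_time REAL, result_date TEXT, event_code INTEGER);
INSERT INTO result VALUES
    (1, 1, 10.9, '2009-05-01', 1),
    (2, 1, 10.7, '2009-06-01', 1),
    (3, 1, 21.9, '2009-06-01', 2),
    (4, 2, 11.2, '2009-05-01', 1),
    (5, 2, 22.5, '2009-05-01', 2),
    (6, 2, 22.1, '2009-06-01', 2);
""")

# Best time per athlete and event, then filtered to one event in the outer WHERE.
best_sql = """
SELECT r.athlete_id, r.result_time
FROM result r
INNER JOIN (
    SELECT athlete_id, event_code, MIN(result_time) AS FastestTime
    FROM result
    GROUP BY athlete_id, event_code
) rm ON r.athlete_id = rm.athlete_id
    AND r.result_time = rm.FastestTime
    AND r.event_code = rm.event_code
WHERE r.event_code = ?
ORDER BY r.result_time
"""

best_100m = conn.execute(best_sql, (1,)).fetchall()
best_200m = conn.execute(best_sql, (2,)).fetchall()
print(best_100m, best_200m)
```

Note that an athlete who ran the same best time twice in one event would still show up twice, since both rows match the join.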
This is by far the slowest query in my web application.
SELECT prof.user_id AS userId,
prof.first_name AS first,
prof.last_name AS last,
prof.birthdate,
prof.class_string AS classes,
prof.city,
prof.country,
prof.state,
prof.images,
prof.videos,
u.username,
u.avatar,
(SELECT Count(*)
FROM company_member_sponsorship
WHERE member_id = prof.user_id
AND status = 'sponsored') AS sponsor_count,
(SELECT Count(*)
FROM member_schedules
WHERE user_id = prof.user_id) AS sched_count
FROM member_profiles prof
LEFT JOIN users u
ON u.id = prof.user_id
ORDER BY ( prof.images + prof.videos * 5 + (
CASE
WHEN prof.expire_date > :time THEN 50
ELSE 0
end ) + sponsor_count * 20 + sched_count * 4
) DESC,
prof.last_name ASC
LIMIT :start, :records
Everything else on the site takes less than a second to load even with lots of queries happening on all levels. This one takes about 3-4 seconds.
It's obviously the table scans that are causing the slowdown. I can understand why; the first table has 50,000+ rows, the second 160,000+ rows.
Is there any way I can optimize this query to make it go faster?
If worse comes to worst I can always go through my code and maintain a tally for sponsorships and events in the profile table like I do for images and videos though I'd like to avoid it.
EDIT: I added the results of an EXPLAIN on the query.
id  select_type         table                       type    possible_keys  key        key_len  ref                     rows   Extra
1   PRIMARY             prof                        ALL     NULL           NULL       NULL     NULL                    44377  Using temporary; Using filesort
1   PRIMARY             u                           eq_ref  PRIMARY        PRIMARY    3        mxsponsor.prof.user_id  1
3   DEPENDENT SUBQUERY  member_schedules            ref     user_id        user_id    3        mxsponsor.prof.user_id  6      Using index
2   DEPENDENT SUBQUERY  company_member_sponsorship  ref     member_id      member_id  3        mxsponsor.prof.user_id  2      Using where; Using index
EDIT2:
I ended up dealing with the problem by maintaining a count in the member profile. Wherever sponsorships/events are added/deleted I just invoke a function that scans the sponsorship/events table and updates the count for that member. There might still be a way to optimize a query like this, but we're publishing this site rather soon so I'm going with the quick and dirty solution for now.
Not guaranteed to work, but try using join and group by rather than inner selects:
SELECT prof.user_id AS userId,
prof.first_name AS first,
prof.last_name AS last,
prof.birthdate,
prof.class_string AS classes,
prof.city,
prof.country,
prof.state,
prof.images,
prof.videos,
u.username,
u.avatar,
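-- DISTINCT is needed because the two one-to-many joins multiply each other's rows
Count(DISTINCT cms.id) AS sponsor_count,
Count(DISTINCT ms.id) AS sched_count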
FROM member_profiles prof
LEFT JOIN users u
ON u.id = prof.user_id
LEFT JOIN company_member_sponsorship cms
ON cms.member_id = prof.user_id
AND cms.status = 'sponsored'
LEFT JOIN member_schedules ms
ON ms.user_id = prof.user_id
GROUP BY prof.user_id
ORDER BY ( prof.images + prof.videos * 5 + (
CASE
WHEN prof.expire_date > :time THEN 50
ELSE 0
end ) + sponsor_count * 20 + sched_count * 4
) DESC,
prof.last_name ASC
LIMIT :start, :records
If that's not any better, an EXPLAIN of that query would help.
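One thing to watch when two one-to-many tables are joined before grouping: each sponsorship row gets paired with each schedule row, so a plain COUNT can come out too high. The sketch below shows the effect on made-up rows using Python's built-in sqlite3 module as a stand-in for MySQL. The `id` columns on the two child tables are an assumption, since the question does not show them, and `counts` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member_profiles (user_id INTEGER PRIMARY KEY, last_name TEXT);
CREATE TABLE company_member_sponsorship (id INTEGER PRIMARY KEY,
                                         member_id INTEGER, status TEXT);
CREATE TABLE member_schedules (id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO member_profiles VALUES (1, 'Adams'), (2, 'Baker');
INSERT INTO company_member_sponsorship VALUES (1, 1, 'sponsored'), (2, 1, 'sponsored');
INSERT INTO member_schedules VALUES (1, 1), (2, 1), (3, 1);
""")
# Member 1 really has 2 sponsorships and 3 schedules; member 2 has none.

def counts(distinct):
    """Run the join-and-group query with or without DISTINCT inside COUNT."""
    keyword = "DISTINCT " if distinct else ""
    sql = f"""
    SELECT prof.user_id,
           COUNT({keyword}cms.id) AS sponsor_count,
           COUNT({keyword}ms.id) AS sched_count
    FROM member_profiles prof
    LEFT JOIN company_member_sponsorship cms
           ON cms.member_id = prof.user_id AND cms.status = 'sponsored'
    LEFT JOIN member_schedules ms ON ms.user_id = prof.user_id
    GROUP BY prof.user_id
    ORDER BY prof.user_id
    """
    return conn.execute(sql).fetchall()

plain = counts(distinct=False)        # 2 x 3 joined rows are counted for member 1
deduplicated = counts(distinct=True)  # each child row is counted once
print(plain)
print(deduplicated)
```

Whether the join form is actually faster than the correlated subqueries still has to be measured on the real tables.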