How to optimize query with table scans?

This is by far the slowest query in my web application.
SELECT prof.user_id AS userId,
prof.first_name AS first,
prof.last_name AS last,
prof.birthdate,
prof.class_string AS classes,
prof.city,
prof.country,
prof.state,
prof.images,
prof.videos,
u.username,
u.avatar,
(SELECT Count(*)
FROM company_member_sponsorship
WHERE member_id = prof.user_id
AND status = 'sponsored') AS sponsor_count,
(SELECT Count(*)
FROM member_schedules
WHERE user_id = prof.user_id) AS sched_count
FROM member_profiles prof
LEFT JOIN users u
ON u.id = prof.user_id
ORDER BY ( prof.images + prof.videos * 5 + (
CASE
WHEN prof.expire_date > :time THEN 50
ELSE 0
end ) + sponsor_count * 20 + sched_count * 4
) DESC,
prof.last_name ASC
LIMIT :start, :records
Everything else on the site takes less than a second to load even with lots of queries happening on all levels. This one takes about 3-4 seconds.
It's obviously the table scans that are causing the slowdown. I can understand why; the first table has 50,000+ rows, the second 160,000+ rows.
Is there any way I can optimize this query to make it go faster?
If worse comes to worst I can always go through my code and maintain a tally for sponsorships and events in the profile table like I do for images and videos though I'd like to avoid it.
EDIT: I added the results of an EXPLAIN on the query.
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | PRIMARY | prof | ALL | NULL | NULL | NULL | NULL | 44377 | Using temporary; Using filesort
1 | PRIMARY | u | eq_ref | PRIMARY | PRIMARY | 3 | mxsponsor.prof.user_id | 1 |
3 | DEPENDENT SUBQUERY | member_schedules | ref | user_id | user_id | 3 | mxsponsor.prof.user_id | 6 | Using index
2 | DEPENDENT SUBQUERY | company_member_sponsorship | ref | member_id | member_id | 3 | mxsponsor.prof.user_id | 2 | Using where; Using index
EDIT2:
I ended up dealing with the problem by maintaining a count in the member profile. Wherever sponsorships/events are added/deleted I just invoke a function that scans the sponsorship/events table and updates the count for that member. There might still be a way to optimize a query like this, but we're publishing this site rather soon so I'm going with the quick and dirty solution for now.
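A minimal sketch of the counter approach described in EDIT2, using Python's built-in sqlite3 module as a stand-in for MySQL. The `sponsor_count` column and the `refresh_sponsor_count` helper are hypothetical names chosen for illustration; the other table and column names follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member_profiles (
        user_id INTEGER PRIMARY KEY,
        sponsor_count INTEGER NOT NULL DEFAULT 0  -- hypothetical cached tally
    );
    CREATE TABLE company_member_sponsorship (
        id INTEGER PRIMARY KEY,
        member_id INTEGER NOT NULL,
        status TEXT NOT NULL
    );
    INSERT INTO member_profiles (user_id) VALUES (1), (2);
""")


def refresh_sponsor_count(conn, member_id):
    """Recount one member's sponsorships and store the result on the profile."""
    conn.execute(
        """
        UPDATE member_profiles
        SET sponsor_count = (
            SELECT COUNT(*) FROM company_member_sponsorship
            WHERE member_id = :member_id AND status = 'sponsored'
        )
        WHERE user_id = :member_id
        """,
        {"member_id": member_id},
    )


# Call the helper wherever a sponsorship is added, changed, or deleted.
conn.executemany(
    "INSERT INTO company_member_sponsorship (member_id, status) VALUES (?, ?)",
    [(1, "sponsored"), (1, "sponsored"), (1, "pending")],
)
refresh_sponsor_count(conn, 1)

counts = dict(conn.execute("SELECT user_id, sponsor_count FROM member_profiles"))
print(counts)
```

With the tally stored on the profile row, the ORDER BY expression only reads columns of member_profiles, so the two dependent subqueries are no longer needed for sorting.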

Not guaranteed to work, but try using join and group by rather than inner selects:
SELECT prof.user_id AS userId,
prof.first_name AS first,
prof.last_name AS last,
prof.birthdate,
prof.class_string AS classes,
prof.city,
prof.country,
prof.state,
prof.images,
prof.videos,
u.username,
u.avatar,
-- DISTINCT is needed because the two LEFT JOINs multiply each other's rows
Count(DISTINCT cms.id) AS sponsor_count,
Count(DISTINCT ms.id) AS sched_count
FROM member_profiles prof
LEFT JOIN users u
ON u.id = prof.user_id
LEFT JOIN company_member_sponsorship cms
ON cms.member_id = prof.user_id
AND cms.status = 'sponsored'
LEFT JOIN member_schedules ms
ON ms.user_id = prof.user_id
GROUP BY prof.user_id
ORDER BY ( prof.images + prof.videos * 5 + (
CASE
WHEN prof.expire_date > :time THEN 50
ELSE 0
end ) + sponsor_count * 20 + sched_count * 4
) DESC,
prof.last_name ASC
LIMIT :start, :records
If that's not any better, an EXPLAIN of that query would help.
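One thing to watch with the join approach: joining two one-to-many tables in the same query multiplies their rows together, so a plain COUNT() over each side can overstate both tallies. The sketch below (Python with sqlite3 standing in for MySQL; the tiny tables and their `id` columns are assumptions for illustration) compares COUNT(col) with COUNT(DISTINCT col) for one member who has 2 sponsorships and 3 schedule entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member_profiles (user_id INTEGER PRIMARY KEY);
    CREATE TABLE company_member_sponsorship (id INTEGER PRIMARY KEY, member_id INTEGER, status TEXT);
    CREATE TABLE member_schedules (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO member_profiles VALUES (1);
    -- member 1 has 2 sponsorships and 3 schedule entries
    INSERT INTO company_member_sponsorship (member_id, status) VALUES (1, 'sponsored'), (1, 'sponsored');
    INSERT INTO member_schedules (user_id) VALUES (1), (1), (1);
""")

row = conn.execute("""
    SELECT COUNT(cms.id), COUNT(ms.id), COUNT(DISTINCT cms.id), COUNT(DISTINCT ms.id)
    FROM member_profiles prof
    LEFT JOIN company_member_sponsorship cms
           ON cms.member_id = prof.user_id AND cms.status = 'sponsored'
    LEFT JOIN member_schedules ms
           ON ms.user_id = prof.user_id
    GROUP BY prof.user_id
""").fetchone()

naive_sponsors, naive_scheds, sponsors, scheds = row
print(naive_sponsors, naive_scheds)  # both inflated by the 2 x 3 row fan-out
print(sponsors, scheds)              # the real tallies
```

COUNT(DISTINCT ...) fixes the numbers, but the intermediate row set is still the product of both joins, so for members with many rows on each side it may be worth aggregating each table in its own derived table before joining.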

Related

Join a 3rd table when the result of the initial join has NULL values (that I want retained!)

I have 3 tables:
event_timestamps with columns Race_number, timestamp
event_entry with Race_number, User_id
user with user_id, Firstname, lastname
I want to find race_numbers that are not linked to a user_id, and count laps while I'm at it by Joining event_timestamps and event_entry.
select event_entry.user_id, max(timestamp), event_timestamps.race_number, count(event_timestamps.race_number)
from event_timestamps
left join event_entry on event_timestamps.race_number = event_entry.race_number and event_entry.event_id=430
where timestamp > '2022-05-28 11:50:00' and timestamp < '2022-05-29'
group by event_timestamps.race_number
order by count(event_timestamps.race_number) desc , max(timestamp);
Output
user_id | max(timestamp) | race_number | count(event...)
NULL | 2022-05-28 12:30:01 | 1000 | 5
14694 | 2022-05-28 12:30:02 | 32 | 5
37617 | 2022-05-28 12:30:17 | 44 | 5
16134 | 2022-05-28 12:34:37 | 24 | 5
But when I join tbl.user the Null value disappears. I want to display the NULL so I can see if we are missing user data.
This query sort of works but the NULL value is not displaying:
select user.firstname, user.lastname, max(timestamp), event_timestamps.race_number, count(event_timestamps.race_number)
from event_timestamps
left join event_entry on event_timestamps.race_number = event_entry.race_number
inner join user on user.user_id = event_entry.user_id
where timestamp > '2022-05-28 11:50:00' and timestamp < '2022-05-29' and event_entry.event_id=430
group by event_timestamps.race_number
order by count(event_timestamps.race_number) desc , max(timestamp);
firstname | lastname | max(timestamp) | race_number | count(event....)
Albert | Coles | 12:30:02 | 32 | 5
Vince | Butre | 12:30:17 | 44 | 5
John | Plessis | 12:34:37 | 24 | 5
So I want race_number 1000 (for example) to display as well with NULL values in firstname, lastname.
Any assistance would be much appreciated because this is breaking my novice brain!
The event_timestamps table has multiple occurrences of the same race_number as the user completes laps. We count the laps and create the ranking by using the last lap time (MAX timestamp) and sorting from there.

Try it like this:
SELECT user.firstname, user.lastname, s.timestamp, s.race_number, s.user_id, s.count
FROM (
SELECT event_entry.user_id as user_id, max(timestamp) as timestamp, event_timestamps.race_number as race_number, count(event_timestamps.race_number) as count
from event_timestamps
Left join event_entry on event_timestamps.race_number = event_entry.race_number and event_entry.event_id=430
Where timestamp > '2022-05-28 11:50:00' and timestamp < '2022-05-29'
group by event_timestamps.race_number, event_entry.user_id
) s
LEFT JOIN user on user.user_id = s.user_id
order by s.count desc , s.timestamp
As @Marshal_c said, the inner join doesn't work because it gets rid of the NULLs; see SQL JOIN and the different types of JOINs.
Also, as @Honk_der_Hase points out, your GROUP BY does not list all the non-aggregated columns. Since your SQL runs anyway, I assume you are working in MariaDB, which in this case returns an arbitrary value from each group for those columns.
In your scenario, you should also have some kind of integrity rule that prevents multiple users (user_id) from having the same race number (race_number). If you had such records in your database, the rows would start duplicating.
Also be aware that some databases (like MariaDB) can have problems with a table called user, because a table with that name is used for database authentication (in the mysql system schema).
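To see the difference between the two join types on a small scale, here is a sketch in Python with the built-in sqlite3 module (SQLite rather than MariaDB, and with made-up sample rows). It runs the same aggregate subquery twice and only swaps the final join to the user table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_timestamps (race_number INTEGER, timestamp TEXT);
    CREATE TABLE event_entry (race_number INTEGER, user_id INTEGER, event_id INTEGER);
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    INSERT INTO event_timestamps VALUES
        (1000, '2022-05-28 12:00:00'), (1000, '2022-05-28 12:30:01'),
        (32,   '2022-05-28 12:10:00'), (32,   '2022-05-28 12:30:02');
    -- race_number 1000 has no entry, so it is not linked to any user
    INSERT INTO event_entry VALUES (32, 14694, 430);
    INSERT INTO user VALUES (14694, 'Albert', 'Coles');
""")

QUERY = """
    SELECT user.firstname, s.race_number, s.laps
    FROM (
        SELECT ee.user_id AS user_id, et.race_number AS race_number, COUNT(*) AS laps
        FROM event_timestamps et
        LEFT JOIN event_entry ee
               ON ee.race_number = et.race_number AND ee.event_id = 430
        GROUP BY et.race_number, ee.user_id
    ) s
    {join_type} JOIN user ON user.user_id = s.user_id
    ORDER BY s.race_number
"""

inner_rows = conn.execute(QUERY.format(join_type="INNER")).fetchall()
left_rows = conn.execute(QUERY.format(join_type="LEFT")).fetchall()
print(inner_rows)  # the unmatched race number is gone
print(left_rows)   # the unmatched race number is kept, with None (NULL) for the name
```

Note that the event_id = 430 filter sits in the ON clause of the LEFT JOIN. Moving it to the WHERE clause would discard the NULL rows again, which is what happened in the second query of the question.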

Suggestion SQL query

Basically, what I am trying to do is suggest people based on common interests.
I have a table of Users.
I have a table of Interested_People where UserID + InterestID is stored.
I have a table of Contactlist that stores which people have added each other.
What I want is to output only people who are not already your friends.
I searched a lot on the internet but couldn't find anything like this.
I created a query, but it is very slow. Could you edit my query a bit to make it more bandwidth- and time-efficient?
SELECT *
FROM users
WHERE id IN(SELECT userid
FROM interested_people
WHERE interested_in IN(SELECT interested_in
FROM interested_people
WHERE userid = [userid])
AND id NOT IN(SELECT user1 AS my_friends_userid
FROM contactlist f
WHERE f.user2 = [userid]
AND accepted = 1
UNION
SELECT user2 AS my_friends_userid
FROM contactlist f
WHERE f.user1 = [userid]
AND accepted = 1))
AND id != [userid]
ORDER BY Rand ()
LIMIT 0, 10;
This query does the job, but it takes very long: about 16 sec on my local machine, and that's not what I want. I want a fast and reliable one.
Thanks in advance!

Subqueries in WHERE clauses are often slow in MySQL; at least slower than comparable JOINs.
SELECT others.*
FROM interested_people AS userI
INNER JOIN interested_people AS othersI
ON userI.interested_in = othersI.interested_in
AND userI.userid <> othersI.userid
INNER JOIN users AS others ON othersI.userid = others.id
LEFT JOIN contactlist AS cl
ON cl.accepted = 1
AND ((cl.user1 = userI.userid AND cl.user2 = others.id)
OR (cl.user2 = userI.userid AND cl.user1 = others.id))
WHERE userI.userid = [userid]
AND cl.accepted IS NULL
ORDER BY RAND()
LIMIT 0, 10;
Note: intuition makes me wonder if contactlist might be better as a where subquery.
The AND cl.accepted IS NULL ends up processed after the JOINs, resulting in allowing only results that did NOT have a match in contactlist.
If you want to enhance things a bit further:
SELECT others.*, COUNT(1) AS interestsCount
...
GROUP BY others.id
ORDER BY interestsCount DESC, RAND()
LIMIT 0,10;
This would give you a random selection of the people that share the most interests in common.
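A runnable sketch of this LEFT JOIN ... IS NULL pattern, written in Python with sqlite3 as a stand-in for MySQL. The column names follow the question (users.id, interested_people.userid / interested_in, contactlist.user1 / user2 / accepted); the sample rows are invented, and the RAND() tie-break is replaced by the user id so the output is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE interested_people (userid INTEGER, interested_in INTEGER);
    CREATE TABLE contactlist (user1 INTEGER, user2 INTEGER, accepted INTEGER);
    INSERT INTO users VALUES (1, 'me'), (2, 'friend'), (3, 'stranger_a'), (4, 'stranger_b');
    INSERT INTO interested_people VALUES
        (1, 5), (1, 7), (1, 8),
        (2, 5), (2, 7),
        (3, 5), (3, 7),
        (4, 8);
    -- user 2 added user 1, so the friendship is stored with user 1 in the user2 column
    INSERT INTO contactlist VALUES (2, 1, 1);
""")

SUGGEST = """
    SELECT others.id, COUNT(*) AS interestsCount
    FROM interested_people AS userI
    INNER JOIN interested_people AS othersI
            ON othersI.interested_in = userI.interested_in
           AND othersI.userid <> userI.userid
    INNER JOIN users AS others ON others.id = othersI.userid
    LEFT JOIN contactlist AS cl
           ON cl.accepted = 1
          AND ((cl.user1 = userI.userid AND cl.user2 = others.id)
            OR (cl.user2 = userI.userid AND cl.user1 = others.id))
    WHERE userI.userid = ?
      AND cl.accepted IS NULL      -- keep only rows with no matching contact
    GROUP BY others.id
    ORDER BY interestsCount DESC, others.id
    LIMIT 10
"""

suggestions = conn.execute(SUGGEST, (1,)).fetchall()
print(suggestions)  # (user id, number of shared interests)
```

The contact join checks both directions because a friendship row may have been created by either side; user 2 is excluded here even though the row is stored as (2, 1).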

First, look at your interested-in query, assuming the userID you are
testing with is 1. It sounds like you are trying to find the users one level
away: those who share an interest with user 1.
SELECT userid FROM interested_people
WHERE interested_in IN
( SELECT interested_in FROM interested_people
WHERE userid = [userid] )
Sample Data for Interested_People
userID Interested_In
1 5
1 7
1 8
2 3
2 5
2 7
7 1
7 2
7 5
8 3
In this case, the innermost query returns interested_in values of 5, 7, 8.
Then, getting all users who are interested in 5, 7, or 8 would return 2 and 7
(as well as user 1 itself). But since user 2 is interested in both 5 and 7,
ID 2 would be returned TWICE, which could cause duplicate rows in a later join,
so I would use DISTINCT. The same result can be obtained with the following
query, which you could time for comparison:
SELECT distinct ip2.userid
from
interested_people ip
join interested_people ip2
ON ip.interested_in = ip2.interested_in
where
ip.userid = [parmUserID]
Now, you need to exclude from this list all your contacts already accepted.
You could then left-join TWO TIMES for the from/to contact and ensure NULL
indicating not one of the contacts... Then join again to user table to
get the user details.
SELECT
u.*
from
users u
JOIN
( SELECT distinct
ip2.userid
from
interested_people ip
join interested_people ip2
ON ip.interested_in = ip2.interested_in
left join contactList cl1
ON ip2.userid = cl1.user1
AND cl1.user2 = ip.userid
AND cl1.accepted = 1
left join contactList cl2
ON ip2.userid = cl2.user2
AND cl2.user1 = ip.userid
AND cl2.accepted = 1
where
ip.userid = [parmUserID]
AND NOT ip2.userID = [parmUserID]
AND cl1.user1 IS NULL
AND cl2.user2 IS NULL ) PreQuery
ON u.id = PreQuery.userID
order by
RAND()
limit
0, 10
I would have two indexes on your contactList table to optimize both left-joins... with user1 and user2 in primary position... Similarly for the interested_people table.
table index
contactList ( user1, accepted )
contactList ( user2, accepted )
interested_people ( userid, interested_in )
interested_people ( interested_in, userid )
I would expect your user table is already indexed on the ID as primary key.
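The index list above could be created roughly as follows. This is a sketch in Python with sqlite3 (the CREATE INDEX statements themselves are plain SQL that MySQL also accepts); the index names are hypothetical and only the column lists come from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contactList (user1 INTEGER, user2 INTEGER, accepted INTEGER);
    CREATE TABLE interested_people (userid INTEGER, interested_in INTEGER);

    -- one index per join direction on the contact list
    CREATE INDEX idx_cl_user1_accepted ON contactList (user1, accepted);
    CREATE INDEX idx_cl_user2_accepted ON contactList (user2, accepted);

    -- one index per lookup direction on the interests
    CREATE INDEX idx_ip_user_interest ON interested_people (userid, interested_in);
    CREATE INDEX idx_ip_interest_user ON interested_people (interested_in, userid);
""")

index_names = [
    name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name"
    )
]
print(index_names)
```

After creating the indexes, run EXPLAIN on the real query in MySQL to confirm they are actually picked up.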

I think this will give you the same results but perform a lot better:
SELECT u.* FROM users u
INNER JOIN interested_people i
ON u.id = i.userid
WHERE NOT EXISTS
(SELECT * FROM contactlist c
WHERE c.accepted = 1
AND ((c.user1 = [userid] AND c.user2 = u.id)
OR (c.user2 = [userid] AND c.user1 = u.id)))
AND u.id != [userid]
ORDER BY Rand()
LIMIT 0, 10
Skip the ORDER BY clause if that is at all reasonable. That will be the most expensive part.
The select and join clauses give you the users who are interested in connecting and the WHERE NOT EXISTS is a performant way to exclude those contacts already listed.

MySQL time out issue

I am facing a serious issue with PHP and MySQL on a Linux server, while the same code with the same database works fine on localhost.
I have almost 30,000 records in the database table, and the query is:
SELECT * FROM tbl_movies where id not in (select movie_id from tbl_usermovieque where user_id='3' union
select movie_id from tbl_user_movie_response where user_id='3' union
select movie_id from tbl_user_movie_fav where user_id='3') and id < 220 order by rand() limit 0,20
It takes 0.0010 sec on my localhost and never finishes on our Linux server. I am unable to find the reason.
Thanks
Kamal

Can you confirm this returns the same result? It should be faster this way. UNIONs are sometimes useful but not really optimized.
SELECT * FROM tbl_movies where id not in (
select distinct m.id
from tbl_movies m
left join tbl_usermovieque um ON um.movie_id = m.id AND um.user_id = 3
left join tbl_user_movie_response umr ON umr.movie_id = m.id AND umr.user_id = 3
left join tbl_user_movie_fav umf ON umf.movie_id = m.id AND umf.user_id = 3
where um.movie_id is not null or umr.movie_id is not null or umf.movie_id is not null
) and id < 220 order by rand() limit 0,20;
PS: I assume you have indexes on user_id and movie_id.
EDIT: your problem may come from rand().
See the MySQL ORDER BY optimization page and look for RAND(): the comments there include some performance tests, and ORDER BY RAND() alone seems to be a bad solution.
Performance
Now let's see what happens to our performance. We have 3 different
queries for solving our problems.
Q1. ORDER BY RAND()
Q2. RAND() * MAX(ID)
Q3. RAND() * MAX(ID) + ORDER BY ID
Q1 is expected to cost N * log2(N), Q2 and Q3 are nearly constant.
To get real values we filled the table with N rows (one thousand to
one million) and executed each query 1000 times.
Rows | 100 | 1.000 | 10.000 | 100.000 | 1.000.000
Q1 | 0:00.718s | 0:02.092s | 0:18.684s | 2:59.081s | 58:20.000s
Q2 | 0:00.519s | 0:00.607s | 0:00.614s | 0:00.628s | 0:00.637s
Q3 | 0:00.570s | 0:00.607s | 0:00.614s | 0:00.628s | 0:00.637s
As you can see the plain ORDER BY RAND() is already behind the
optimized query at only 100 rows in the table.
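A rough sketch of the Q3 idea (a random id threshold followed by ORDER BY id LIMIT 1), written in Python with sqlite3 rather than MySQL. Here the random number is drawn in Python instead of with RAND(), and the `random_movie` helper and the sample table contents are assumptions for illustration.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_movies (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO tbl_movies (id, title) VALUES (?, ?)",
    [(i, f"movie {i}") for i in range(1, 1001) if i % 10 != 0],  # leave gaps in the ids
)


def random_movie(conn, rng=random):
    """Pick a random id threshold, then take the first row at or above it."""
    (max_id,) = conn.execute("SELECT MAX(id) FROM tbl_movies").fetchone()
    threshold = rng.randint(1, max_id)
    # The primary key index finds this row directly; no full sort is needed.
    return conn.execute(
        "SELECT id, title FROM tbl_movies WHERE id >= ? ORDER BY id LIMIT 1",
        (threshold,),
    ).fetchone()


picked = random_movie(conn, random.Random(42))
print(picked)
```

When the ids have gaps, a row that follows a gap is picked more often than its neighbours, so this trades uniform randomness for speed. It also returns one row per call; getting 20 rows means repeating the lookup and discarding duplicates.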

mysql query need to be optimized

I have a query which gives a result like:
id | productid | userid | coinsid
1 | 2 | 2 | 5
3 | 2 | 2 | 6
4 | 2 | 3 | 7
5 | 2 | 4 | 8
6 | 2 | 3 | 9
This is the result for a specific productid. Now I have to update the balance in the user table by adding $1 for each user in the above result, and if a userid appears twice, I need to add $1 twice to that user's balance. So in the above case, $1 is added twice to the balances of userid=2 and userid=3.
The simple way is to count the records for every distinct userid and run one query per user in a foreach loop. But I am looking for a more optimized way. Please suggest one. Thanks

One approach:
UPDATE user_table u
JOIN ( SELECT q.userid
, SUM(1.00) AS deposit
FROM (
-- original OP query goes here
) q
GROUP BY q.userid
) r
ON r.userid = u.userid
SET u.balance = u.balance + r.deposit
We use the original OP query that returns the resultset displayed, and make that an inline view (aliased in the query above as q).
From that, we get a distinct list of userid values and the number of times each userid appears in the resultset. That gives us the userid and a deposit amount (1 dollar for each time the userid appears). Some databases might want us to specify the value as 1.0 rather than 1, to make sure it is decimal. I think the SUM is more representative of what we are trying to accomplish than a COUNT.
We join that inline view (r) to the user table and add the deposit amount to the current balance for that user (assuming the balance is stored as decimal dollars, 1.00 = one dollar).
To test, convert the UPDATE into a SELECT statement:
remove the "SET" clause
add an "ORDER BY" clause (optional) to make the results determinate
remove the "UPDATE" keyword and replace it with:
SELECT r.userid
, r.deposit
, u.balance AS old_balance
, u.balance + r.deposit AS new_balance
, u.userid
FROM
Full select:
SELECT r.userid
, r.deposit
, u.balance AS old_balance
, u.balance + r.deposit AS new_balance
, u.userid
FROM user_table u
JOIN ( SELECT q.userid
, SUM(1.00) AS deposit
FROM (
-- original OP query goes here
) q
GROUP BY q.userid
) r
ON r.userid = u.userid
NOTE: There is no WHERE clause; the JOIN predicates (in the ON clause) are what determine which rows are selected/affected in the user table.
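The preview-then-update workflow can be tried end to end with the sketch below, in Python with sqlite3. Two things are assumptions here: `result_rows` is a hypothetical table standing in for the original OP query, and because SQLite does not accept MySQL's multi-table UPDATE ... JOIN syntax, the update step is written as a single UPDATE with a correlated subquery instead. The preview SELECT uses the same derived table as the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_table (userid INTEGER PRIMARY KEY, balance REAL NOT NULL);
    -- hypothetical stand-in for the rows returned by the original query
    CREATE TABLE result_rows (id INTEGER PRIMARY KEY, productid INTEGER, userid INTEGER, coinsid INTEGER);
    INSERT INTO user_table VALUES (2, 10.00), (3, 0.00), (4, 5.50), (9, 1.00);
    INSERT INTO result_rows VALUES (1, 2, 2, 5), (3, 2, 2, 6), (4, 2, 3, 7), (5, 2, 4, 8), (6, 2, 3, 9);
""")

# Preview: the same derived table as the UPDATE, joined to the user table.
preview = conn.execute("""
    SELECT r.userid, r.deposit, u.balance AS old_balance, u.balance + r.deposit AS new_balance
    FROM user_table u
    JOIN ( SELECT q.userid, SUM(1.00) AS deposit
           FROM ( SELECT * FROM result_rows WHERE productid = 2 ) q
           GROUP BY q.userid
         ) r
      ON r.userid = u.userid
    ORDER BY r.userid
""").fetchall()
print(preview)

# One statement updates every affected user; users not in the result are untouched.
conn.execute("""
    UPDATE user_table
    SET balance = balance + (
        SELECT COUNT(*) FROM result_rows q
        WHERE q.userid = user_table.userid AND q.productid = 2
    )
    WHERE userid IN (SELECT userid FROM result_rows WHERE productid = 2)
""")
balances = dict(conn.execute("SELECT userid, balance FROM user_table"))
print(balances)
```

Checking that the new_balance column of the preview matches the balances after the update is a cheap way to validate the statement before running it on real data.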

Assuming you have no duplicate user ids in your balance table, maybe something like this would work:
update balance_table set balance_table.balance = balance_table.balance + (select count(*) from users_table where users_table.user_id = balance_table.user_id) * 1;
I haven't tried this query against a MySQL database as I am more familiar with PL/SQL, but wouldn't something like this work?

The correlated subquery in the other answer will work, but an INNER JOIN will usually be more efficient. Try something like this; you'll of course need to supply the table and column names.
UPDATE myTable
INNER JOIN (
SELECT userid, count(*) AS AmountToAdd
FROM users
GROUP BY userid
) UserCounts ON myTable.userid = UserCounts.userid
SET balance = balance + UserCounts.AmountToAdd

select count(*), userid from yourTable group by userid
If I do understand your question.

PHP / MySQL Checking data between tables with long query

This is a more detailed question as my previous attempt wasn't clear enough. I'm new to MySQL and have no idea about the best way to do certain things. I'm building a voting application for images and am having trouble with some of the finer points of MySQL
My db
_votes
id
voter_id
image_id
_images
id
file_name
entrant_id
approved
_users
id
...
Basically I need to do the following:
tally up all votes that are approved
return the top 5 with the most votes
check if the user has voted on each of these 5 (return Boolean) from another table
I've tried variations of
SELECT i.id, i.file_name, i.total_votes
FROM _images i WHERE i.approved = 1
CASE WHEN (SELECT count(*) from _votes v WHERE v.image_id = i.id AND v.voter_id = ?) > 0 THEN '1' ELSE '0' END 'hasvoted'
ORDER BY i.total_votes DESC LIMIT ".($page*5).", 5
is that something I should try and do all in one query?
This query was working fine before I tried to add in the 'hasvoted' boolean:
SELECT id, file_name, total_votes FROM _images WHERE approved = 1 ORDER BY total_votes DESC LIMIT ".($page*5).", 5
At the moment I'm also storing the vote count in the _images table and I know this is wrong, but I have no idea about how to tally the votes by image_id and then order them.

Let me give this a shot to see if I understand your question:
SELECT i.*,
(SELECT COUNT(*) FROM _votes WHERE image_id = i.id) AS total_votes,
(SELECT COUNT(*) FROM _votes WHERE image_id = i.id AND voter_id = ?) AS voted
FROM _images AS i
WHERE i.approved = 1
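For the full requirement (tally, top 5, and a has-voted flag) a single query is enough. The sketch below uses Python with sqlite3 in place of PHP and MySQL; the `top_images` helper and the sample rows are hypothetical, and the table and column names come from the schema in the question. It counts votes with a LEFT JOIN and GROUP BY, so no stored total_votes column is needed, and it binds the paging values as parameters instead of concatenating them into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE _images (id INTEGER PRIMARY KEY, file_name TEXT, entrant_id INTEGER, approved INTEGER);
    CREATE TABLE _votes (id INTEGER PRIMARY KEY, voter_id INTEGER, image_id INTEGER);
    INSERT INTO _images VALUES (1, 'a.jpg', 10, 1), (2, 'b.jpg', 11, 1), (3, 'c.jpg', 12, 0);
    INSERT INTO _votes (voter_id, image_id) VALUES (100, 1), (101, 1), (100, 2), (101, 3), (102, 3);
""")


def top_images(conn, voter_id, page=0, per_page=5):
    """Return approved images ordered by vote count, flagged with this voter's vote."""
    return conn.execute(
        """
        SELECT i.id, i.file_name,
               COUNT(v.id) AS total_votes,
               MAX(CASE WHEN v.voter_id = :voter THEN 1 ELSE 0 END) AS hasvoted
        FROM _images i
        LEFT JOIN _votes v ON v.image_id = i.id
        WHERE i.approved = 1
        GROUP BY i.id, i.file_name
        ORDER BY total_votes DESC, i.id
        LIMIT :limit OFFSET :offset
        """,
        {"voter": voter_id, "limit": per_page, "offset": page * per_page},
    ).fetchall()


rows = top_images(conn, voter_id=101)
print(rows)  # (id, file_name, total_votes, hasvoted)
```

The LEFT JOIN keeps approved images that have no votes yet: for those, COUNT(v.id) is 0 and the CASE expression falls through to 0.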
