SQL Optimization WHERE vs JOIN - php

I am using mysql and this is a query that I quickly wrote, but I feel this can be optimized using JOINS. This is an example btw.
users table:
id user_name first_name last_name email password
1 bobyxl Bob Cox bob#gmail.com pass
player table
id role player_name user_id server_id racial_name
3 0 boby123 1 2 Klingon
1 1 example 2 2 Race
2 0 boby2 1 1 Klingon
SQL
SELECT `player`.`server_id`,`player`.`id`,`player`.`player_name`,`player`.`racial_name`
FROM `player`,`users`
WHERE `users`.`id` = 1
and `users`.`id` = `player`.`user_id`
I know I can use a left join but what are the benefits
SELECT `player`.`server_id`,`player`.`id`,`player`.`player_name`,`player`.`racial_name`
FROM `player`
LEFT JOIN `users`
ON `users`.`id` = `player`.`user_id`
WHERE `users`.`id` = 1
What are the benefits, I get the same results ether way.

Your query has a JOIN in it. It is the same as writing:
SELECT `player`.`server_id`,`player`.`id`,`player`.`player_name`,`player`.`racial_name`
FROM `player`
INNER JOIN `users` ON `users`.`id` = `player`.`user_id`
WHERE `users`.`id` = 1
The only reason for you to use left join is if you want to get data from player table even when you don't have matches in users table.

LEFT JOIN will get data from the left table even if there's no equal data from the right side table.
I guess at one point, that player table's data will not be equivalent to users table specially if the data on users table has not been inserted into player table.
Your first query might return null on cases that the 2nd table (player) has no equivalent data corresponding to users table.
Also, IMHO, setting up another table for servers is a good idea in terms of complying to the normalization rules in database structure. After all, what details of the server_id is the column on player table pointing to.

The first solution makes a direct product (gets and connects everything with everything) then drops away the bad results. If you have a lot of rows this will be very slow!
The left join gets first the left table then put only the matching rows from the right (or null).
In your example you don't even need join. :)
This'll give you the same result and it'll be good until you just check for user id:
SELECT `player`.`server_id`,`player`.`id`,`player`.`player_name`,`player`.`racial_name`
FROM `player`
WHERE `player`.`user_id` = 1
Another solution if you want more conditions, without join could be something like this:
SELECT * FROM player WHERE player.user_id IN (SELECT id FROM user WHERE ...... )

Related

Optimal joins in MySQL or offloading to application layer

I have 3 tables in a MySQL database: courses, users and participants, which contains about 30mil, 30k and 3k entries respectively.
My goal is to (efficiently) figure out the number of users that have been assigned to courses that matches our criteria. The criteria is a little more complex, but for this example we only care about users where deleted_at is null and courses where deleted_at is null and active is 1.
Simplified these are the columns:
users:
id
deleted_at
1
null
2
2022-01-01
courses:
id
active 
deleted_at
1
1
null
1
1
2020-01-01
2
0
2020-01-01
participants:
id
participant_id 
course_id
1
1
1
2
1
2
3
2
2
Based on the data above, the number we would get would be 1 as only user 1 is not deleted and that user assigned to some course (id 1) that is active and not deleted.
Here is a list of what I've tried.
Joining all the tables and do simple where's.
Joining using subqueries.
Pulling the correct courses and users out to the application layer (PHP), and querying participants using WHERE IN.
Pulling everything out and doing the filtering in the application layer.
Calling using EXPLAIN to add better indexes - I, admittedly, do not do this often and may not have done this well enough.
A combination of all the above.
An example of a query would be:
SELECT COUNT(DISTINCT participant_id)
FROM `participants`
INNER JOIN
(SELECT `courses`.`id`
FROM `courses`
WHERE (`active` = '1')
AND `deleted_at` IS NULL) AS `tempCourses` ON `tempCourses`.`id` = `participants`.`course_id`
WHERE `participant_type` = 'Eloomi\\Models\\User'
AND `participant_id` in
(SELECT `users`.`id`
FROM `users`
WHERE `users`.`deleted_at` IS NULL)
From what I can gather doing this will create a massive table, which only then will start applying where's. In my mind it should be possible to short circuit a lot of that because once we get a match for a user, we can disregard that going forward. That would be how to handle it, in my mind, in the application layer.
We could do this on a per-user basis in the application layer, but the number of requests to the database would make this a bad solution.
I have tagged it as PHP as well as MySQL, not because it has to be PHP but because I do not mind offloading some parts to the application layer if that is required. It's my experience that joins do not always use indexes optimally
Edit:
To specify my question: Can someone help me provide a efficient way to pull out the number of non-deleted users that have been assigned to to active non-deleted courses?
I would write it this way:
SELECT COUNT(DISTINCT p.participant_id)
FROM courses AS c
INNER JOIN participants AS p
ON c.id = p.course_id
INNER JOIN users AS u
ON p.participant_id = u.id
WHERE u.deleted_at IS NULL
AND c.active = 1 AND c.deleted_at IS NULL
AND p.participant_type = 'Eloomi\\Models\\User';
MySQL may join the tables in another order, not the order you list the tables in the query.
I hope that courses is the first table MySQL accesses, because it's probably the smallest table. Especially after filtering by active and deleted_at. The following index will help to narrow down that filtering, so only matching rows are examined:
ALTER TABLE courses ADD KEY (active, deleted_at);
Every index implicitly has the table's primary key (e.g. id) appended as the last column. That column being part of the index, it is used in the join to participants. So you need an index in participants that the join uses to find the corresponding rows in that table. The order of columns in the index is important.
ALTER TABLE participants ADD KEY (course_id, participant_type, participant_id);
The participant_id is used to join to the users table. MySQL's optimizer will probably prefer to join to users by its primary key, but you also want to restrict that by deleted_at, so you might need this index:
ALTER TABLE users ADD KEY (id, deleted_at);
And you might need to use an index hint to coax the optimizer to prefer this secondary index over the primary key index.
SELECT COUNT(DISTINCT p.participant_id)
FROM courses AS c
INNER JOIN participants AS p
ON c.id = p.course_id
INNER JOIN users AS u USE INDEX(deleted_at)
ON p.participant_id = u.id
WHERE u.deleted_at IS NULL
AND c.active = 1 AND c.deleted_at IS NULL
AND p.participant_type = 'Eloomi\\Models\\User';
MySQL knows how to use compound indexes even if some conditions are in join clauses and other conditions are in the WHERE clause.
Caveat: I have not tested this. Choosing indexes may take several tries, and testing the EXPLAIN after each try.

Getting a Triple Join To Work When Rows Are Missing?

I have 3 tables. 1 table is like the master table and I want all rows from this table where GameID = X. Then I have a guides table which will have a matching ID and finally i have a user table that defines whether the user has selected this row to be hidden. this is causing issues. This table may not have a row associated with it. This table is shared amongst ALL users. The primary key of this table is UserID+InfoID. The query below returns what I want provided there are no other rows in the table for other userIDs.
SELECT PS_Info.*, PS_Guides.Guide, PS_Userhidden.* FROM PS_Info
LEFT JOIN PS_Guides ON PS_Info.ID = PS_Guides.InfoID
LEFT JOIN PS_Userhidden ON PS_Info.ID = PS_Userhidden.InfoID
WHERE PS_Info.GameID = :ID AND (PS_Userhidden.UserID = :UserID)
OR (PS_Userhidden.UserID IS NULL AND PS_Userhidden.InfoID IS NULL)
So I will run the php script and have infoID =1 and userID=1. In the table there is infoID=1 and userid = 2, but nothing will be returned for this row. If I remove PS_Userhidden.UserID = :UserID I get multiple of the same row. The user table will grow to millions of rows. I need a way to make this query stick to the primary key of the users table so it will still return a row if no match exists in the user table and also return a row if there is a match in the users table for the specific user
I think you just need to move the condition on the hidden user to the ON clause:
SELECT i.*, g.Guide, h.*
FROM PS_Info i LEFT JOIN
PS_Guides g
ON i.ID = g.InfoID LEFT JOIN
PS_Userhidden h
ON i.ID = h.InfoID AND h.UserID = :UserID
WHERE i.GameID = :ID ;
Your description of the problem sounds like something that can happen when you start fiddling with conditions in the WHERE clause of a LEFT JOIN. It is a little hard to follow though. If this doesn't work, edit your question with sample data and desired results -- or, better yet, set up a SQL Fiddle.

MySQL - structuring query to discard common results

That title is really not useful, but its a complex question (in my head, maybe) ... anywho...
Say I have a MySQL table of Countries (A-Z all countries in the world) with id & name
Then I have a table where I am tracking which countries a user has been to: Like so:
Country Table
id name
1 india
2 luxembourg
3 usa
Visited Table
id user_id country_id
1 1 1
2 1 3
Now here's what I want to do, when I present the form to add to the list of visited countries I want country.id 1 & 3 to be excluded from the query result.
I know I can filter this using PHP ... which is something I have done in the past ... but surely there must be a way to structure a query in such a way that 1 & 3 are excluded from the returned results, like:
SELECT *
FROM `countries`
WHERE `id`!= "SELECT `country_id`
FROM `visited`
WHERE `user_id`='1'"
I suspect it has something to do with JOIN statements but I can't quite figure it out.
Bonus gratitude if someone can point me in the right direction with Laravel.
Thanks you all :)
Is this what you want?
select c.*
from countries c left join
visited v
on c.id = v.country_id and v.user_id = 1
where v.country_id is null;
You can also express this as a not in or not exists, but the left join method typically has pretty good performance.
The left outer join keeps all records in the first table regardless of whether or not the on clause evaluates to true. If there are no matches in the second table, then the columns are populated with NULL values. The where clause simply chooses these records -- the ones that do not match.
Here is another way of expressing this that you might find easier to follow:
select c.*
from countries c
where not exists (select 1 from visited where c.id = v.country_id and v.user_id = 1)
You can use your query like this.
SELECT *
FROM `countries` c LEFT JOIN `visited` v on c.id = v.country_id
WHERE v.`country_id` is null
AND v.`user_id` = 1
This is a operation of a LEFT JOIN. What is means is that I'm selecting all registries from the table countries that may or may not is on the table visited based on the ID of the country.
So it will bring you this group
from country from visited
1 1
2 no registry
3 3
So on the where condition (v.country_id is null) I'm saying: I only want the ones that on this left join operation is only on the country table but it is not on visited table so it brings me the id 2. Plus the condition that says that those registries on visited must be from the user_id=1
SELECT * FROM COUNTRIES LEFT JOIN
VISITED ON
countries.id = visited.country_id and visited.country_id NOT IN ( SELECT country_id FROM visited )
if i understand right maybe you need something like this ?

facebook type wall post script

I have two mysql tables members_tbl and post_tbl
members_tbl:
id|userName |fname |lname |friendArray
post_tbl:
postId| memId | thePost |postDate
now, I'm trying to display post from user id and from his friendArray.
please let me know how to do it (still new to php)
Since MySQL lacks an explode function, you either need to create a relation table and use joins, or use multiple queries with php processing inbetween. I strongly recommend the relational approach as it conforms to database standards (normalization) much more than the alternative and is easier to implement.
You need a third table, which describes the relation between two friends, I.E.
friends_tbl
user1_id | user2_id
With a primary key on user1_id and user2_id (thereby preventing duplicates). For every friend relationship, I.E. user 1 is friends with user 2, there is one row in this table. You can then get the listing you want with the following query.
SELECT p.*, u.*
FROM posts_tbl p
INNER JOIN members_tbl u
ON u.id = p.memId
WHERE u.id IN (
SELECT user2_id AS id
FROM friends_tbl
INNER JOIN members_tbl
ON (user1_id = id)
WHERE members_tbl.id = $id
UNION
SELECT user1_id AS id
FROM friends_tbl
INNER JOIN members_tbl
ON (user2_id = id)
WHERE members_tbl.id = $id
)
ORDER BY p.postDate
SQLFiddle of the above.
create a different table for your friend-relations and then join this table in your SQL.

MySQL join and exclude?

I have two tables, table A one with two columns: IP and ID, and table B with columns: ID and extra information. I want to extract the rows in table B for IPs that are not in table A. So if I have a rows in table A with
id = 1
ip = 000.000.00
id = 2
ip = 111.111.11
and I have rows in table B
id = 1
id = 2
then, given ip = 111.111.11, how can I return row 1 in table B?
select b.id, b.*
from b
left join a on a.id = b.id
where a.id is null
This'll pull all the rows in B that have no matching rows in A. You can add a specific IP into the where clause if you want to try for just that one ip.
The simplest and most easy-to-read way to spell what you're describing is:
SELECT * FROM `B` WHERE `ID` NOT IN (SELECT `ID` FROM `A`)
You should be aware, though, that using a subquery for something like this has historically been slower than doing the same thing with a self-join, because it is easier to optimise the latter, which might look like this:
SELECT
`B`.*
FROM
`B`
LEFT JOIN
`A` ON `A`.`ID` = `B`.`ID`
WHERE
`A`.`ID` IS NULL
However, technology is improving all the time, and the extent to which this is true (or even whether this is true) depends on the database software you're using.
You should test both approaches then settle on the best balance of readability and performance for your use case.

Categories