COUNT in LEFT JOIN returning duplicated value - php

I have the following tables (example):
users:
id | user | photo | joined | country
1 | Igor | abc.jpg | 2015 | Brazil
2 | John | cga.png | 2014 | USA
3 | Lucas | hes.jpg | 2016 | Japan
posts (note that there are two rows with author = Igor and ft = 2 and one row with author = Igor and ft = 3, so Igor has three posts):
id | author | content | date | ft (2 = photos and 3 = videos)
1 | Igor | hi | 2016 | 2
2 | Igor | hello | 2016 | 3
3 | John | hehehe | 2016 | 2
4 | Igor | huhuhuh | 2016 | 2
5 | Lucas | lol | 2016 | 3
friendship (status = 2 means that they are friends):
id | friend1 | friend2 | status
1 | Igor | Lucas | 2
2 | Lucas | John | 2
3 | John | Igor | 2
I want to COUNT the posts with ft = 2 and COUNT the friends (status = 2) of the currently logged-in user (Igor, in this case).
So I run (assuming the logged-in user is Igor):
SELECT photo, joined, country, sum(CASE WHEN ft = 2 THEN 1 ELSE 0 END) AS numPhotos, sum(CASE WHEN ft = 3 THEN 1 ELSE 0 END) AS numVideos
FROM users
LEFT JOIN posts
ON users.user = posts.author
WHERE users.user = 'Igor'
GROUP BY users.user
LIMIT 1
When I check the result in a foreach loop, the data is correct:
numPhotos = 2 and numVideos = 1.
But I also want to select the number of friends, so I run:
SELECT photo, joined, country, sum(CASE WHEN ft = 2 THEN 1 ELSE 0 END) AS numPhotos, sum(CASE WHEN ft = 3 THEN 1 ELSE 0 END) AS numVideos, count(friendship.status) AS numFriends
FROM users
LEFT JOIN posts
ON users.user = posts.author
LEFT JOIN friendship
ON (users.user = friend1 OR users.user = friend2) AND friendship.status = 2
WHERE users.user = 'Igor'
GROUP BY users.user
LIMIT 1
But the output is:
numPhotos = 4, numVideos = 2 and numFriends = 6.
In other words, it doubles every result, and for numFriends it takes Igor's total number of posts (3) and doubles that too. If I change count(friendship.status) to sum(friendship.status), the output is:
numPhotos = 4, numVideos = 2 and numFriends = 18 (numFriends is tripled).
I also tried count(distinct friendship.status), and the result is:
numPhotos = 4, numVideos = 2 and numFriends = 1 (the values are doubled again, and numFriends is 1, which is wrong: it should be 2, since he has two friends).
So, how can I do this? (I'm using MySQL.)
EDIT:
I changed count(distinct friendship.status) to count(distinct friendship.id), and that gives the correct number of friends. But the other values (numPhotos and numVideos) are still doubled.
I discovered that the problem is in ON (users.user = friend1 OR users.user = friend2), because if I keep only ON (users.user = friend1) or only ON (users.user = friend2), the output isn't duplicated. I also tried ON 'Igor' IN (friend1, friend2), but the result is the same (numPhotos and numVideos are still doubled).

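The problem is that both LEFT JOINs are one-to-many from users, so they multiply each other: each of Igor's 3 posts is paired with each of his 2 friendship rows, giving 6 joined rows before GROUP BY, which inflates every count.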
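The fan-out is easy to see with the question's own sample data. A minimal sketch, assuming an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for MySQL; the helper names joined_rows and inflated_counts are hypothetical. It counts the rows the two LEFT JOINs produce for Igor before aggregation, then recomputes the question's aggregates over them:

```python
import sqlite3

# Sample data from the question; SQLite stands in for MySQL in this sketch.
SETUP = """
CREATE TABLE users (id INTEGER, user TEXT, photo TEXT, joined INTEGER, country TEXT);
CREATE TABLE posts (id INTEGER, author TEXT, content TEXT, date INTEGER, ft INTEGER);
CREATE TABLE friendship (id INTEGER, friend1 TEXT, friend2 TEXT, status INTEGER);
INSERT INTO users VALUES (1, 'Igor', 'abc.jpg', 2015, 'Brazil'),
    (2, 'John', 'cga.png', 2014, 'USA'), (3, 'Lucas', 'hes.jpg', 2016, 'Japan');
INSERT INTO posts VALUES (1, 'Igor', 'hi', 2016, 2), (2, 'Igor', 'hello', 2016, 3),
    (3, 'John', 'hehehe', 2016, 2), (4, 'Igor', 'huhuhuh', 2016, 2),
    (5, 'Lucas', 'lol', 2016, 3);
INSERT INTO friendship VALUES (1, 'Igor', 'Lucas', 2), (2, 'Lucas', 'John', 2),
    (3, 'John', 'Igor', 2);
"""

# The FROM/WHERE part of the question's second query.
JOINS = """
FROM users
LEFT JOIN posts ON users.user = posts.author
LEFT JOIN friendship ON (users.user = friend1 OR users.user = friend2)
                    AND friendship.status = 2
WHERE users.user = ?
"""

con = sqlite3.connect(":memory:")
con.executescript(SETUP)


def joined_rows(user):
    """Rows produced before aggregation: each post is paired with each friendship."""
    return con.execute("SELECT COUNT(*) " + JOINS, (user,)).fetchone()[0]


def inflated_counts(user):
    """The question's aggregates, computed over those multiplied rows."""
    sql = ("SELECT SUM(CASE WHEN ft = 2 THEN 1 ELSE 0 END),"
           " SUM(CASE WHEN ft = 3 THEN 1 ELSE 0 END),"
           " COUNT(friendship.status) " + JOINS)
    return con.execute(sql, (user,)).fetchone()


print(joined_rows("Igor"))      # 6 (3 posts x 2 friendship rows)
print(inflated_counts("Igor"))  # (4, 2, 6)
```

This also explains the observation in the question's edit: keeping only ON (users.user = friend1) matches a single friendship row for Igor, so the posts are no longer multiplied.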
Since you are only retrieving the counts for 1 user, I suggest using a subquery to retrieve the friendship counts (for retrieving the counts for multiple users, a derived table may be faster than a subquery):
SELECT
sum(ft = 2) AS numPhotos,
sum(ft = 3) AS numVideos,
(select count(*) from friendship f
where (friend1 = users.user
or friend2 = users.user)
and status = 2) as friendship_count
FROM users
LEFT JOIN posts
ON users.user = posts.author
WHERE users.user = 'Igor'
Note that I removed the group by because users.user is already in the where clause, which means there is only 1 group.
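The answer mentions that a derived table may be faster when you need the counts for several users. A minimal sketch of that idea, again assuming SQLite as a stand-in for MySQL and the question's sample data; the function name counts_per_user and the UNION ALL over friend1/friend2 are my own choices, not part of the answer. Each child table is aggregated on its own first, so the posts and friendship rows never multiply each other:

```python
import sqlite3

# Sample data from the question; SQLite stands in for MySQL in this sketch.
SETUP = """
CREATE TABLE users (id INTEGER, user TEXT, photo TEXT, joined INTEGER, country TEXT);
CREATE TABLE posts (id INTEGER, author TEXT, content TEXT, date INTEGER, ft INTEGER);
CREATE TABLE friendship (id INTEGER, friend1 TEXT, friend2 TEXT, status INTEGER);
INSERT INTO users VALUES (1, 'Igor', 'abc.jpg', 2015, 'Brazil'),
    (2, 'John', 'cga.png', 2014, 'USA'), (3, 'Lucas', 'hes.jpg', 2016, 'Japan');
INSERT INTO posts VALUES (1, 'Igor', 'hi', 2016, 2), (2, 'Igor', 'hello', 2016, 3),
    (3, 'John', 'hehehe', 2016, 2), (4, 'Igor', 'huhuhuh', 2016, 2),
    (5, 'Lucas', 'lol', 2016, 3);
INSERT INTO friendship VALUES (1, 'Igor', 'Lucas', 2), (2, 'Lucas', 'John', 2),
    (3, 'John', 'Igor', 2);
"""

# One pre-aggregated row per author (p) and per friend name (f), then joined to users.
QUERY = """
SELECT u.user,
       COALESCE(p.numPhotos, 0)  AS numPhotos,
       COALESCE(p.numVideos, 0)  AS numVideos,
       COALESCE(f.numFriends, 0) AS numFriends
FROM users u
LEFT JOIN (SELECT author, SUM(ft = 2) AS numPhotos, SUM(ft = 3) AS numVideos
           FROM posts GROUP BY author) p ON p.author = u.user
LEFT JOIN (SELECT name, COUNT(*) AS numFriends
           FROM (SELECT friend1 AS name FROM friendship WHERE status = 2
                 UNION ALL
                 SELECT friend2 FROM friendship WHERE status = 2) AS sides
           GROUP BY name) f ON f.name = u.user
ORDER BY u.id
"""


def counts_per_user():
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    # Map each user to (numPhotos, numVideos, numFriends).
    return {row[0]: row[1:] for row in con.execute(QUERY)}


print(counts_per_user())
# {'Igor': (2, 1, 2), 'John': (1, 0, 2), 'Lucas': (0, 1, 2)}
```

The SQL string avoids SQLite-specific syntax, so it should carry over to MySQL largely unchanged.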

Instead of count(distinct friendship.status), try using count(distinct friendship.id). That should give you the number of unique friends. Counting distinct statuses doesn't work because all the statuses will be 2 by definition, so there is only one distinct value.
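A quick check of that reasoning, assuming SQLite as a stand-in for MySQL and the question's sample data: both distinct counts are computed over the same doubled join, and only the id-based one gives the number of friends.

```python
import sqlite3

# Sample data from the question; SQLite stands in for MySQL in this sketch.
SETUP = """
CREATE TABLE users (id INTEGER, user TEXT, photo TEXT, joined INTEGER, country TEXT);
CREATE TABLE posts (id INTEGER, author TEXT, content TEXT, date INTEGER, ft INTEGER);
CREATE TABLE friendship (id INTEGER, friend1 TEXT, friend2 TEXT, status INTEGER);
INSERT INTO users VALUES (1, 'Igor', 'abc.jpg', 2015, 'Brazil'),
    (2, 'John', 'cga.png', 2014, 'USA'), (3, 'Lucas', 'hes.jpg', 2016, 'Japan');
INSERT INTO posts VALUES (1, 'Igor', 'hi', 2016, 2), (2, 'Igor', 'hello', 2016, 3),
    (3, 'John', 'hehehe', 2016, 2), (4, 'Igor', 'huhuhuh', 2016, 2),
    (5, 'Lucas', 'lol', 2016, 3);
INSERT INTO friendship VALUES (1, 'Igor', 'Lucas', 2), (2, 'Lucas', 'John', 2),
    (3, 'John', 'Igor', 2);
"""

con = sqlite3.connect(":memory:")
con.executescript(SETUP)

# Both aggregates run over the same 6 joined rows for Igor.
DISTINCT_COUNTS = """
SELECT COUNT(DISTINCT friendship.status), COUNT(DISTINCT friendship.id)
FROM users
LEFT JOIN posts ON users.user = posts.author
LEFT JOIN friendship ON (users.user = friend1 OR users.user = friend2)
                    AND friendship.status = 2
WHERE users.user = ?
"""

by_status, by_id = con.execute(DISTINCT_COUNTS, ("Igor",)).fetchone()
print(by_status)  # 1: every matching row has status 2
print(by_id)      # 2: friendship rows 1 and 3
```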

Related

How can I put a condition on the way of join?

I have this table structure:
// QandA
+----+---------------------+----------------------------------------+------+---------+
| Id | title | content | type | related |
+----+---------------------+----------------------------------------+------+---------+
| 1 | title of question 1 | content of question1 | 0 | 1 |
| 2 | | content of first answer for question1 | 1 | 1 |
| 3 | title of question 2 | content of question2 | 0 | 3 |
| 4 | | content of second answer for question1 | 1 | 1 |
| 5 | | content of first answer for question2 | 1 | 3 |
+----+---------------------+----------------------------------------+------+---------+
type column: 0 means it is a question and 1 means it is an answer.
related column: for a question, this column contains its own id; for an answer, it contains the id of its question.
There are also other dependent tables:
// Votes
+----+---------+---------+-------+
| id | post_id | user_id | value |
+----+---------+---------+-------+
| 1 | 1 | 1234 | 1 |
| 2 | 2 | 1234 | -1 |
| 3 | 1 | 4321 | 1 |
+----+---------+---------+-------+
// Favorites
+----+---------+---------+
| id | post_id | user_id |
+----+---------+---------+
| 1 | 1 | 1234 |
| 2 | 1 | 4321 |
+----+---------+---------+
OK, this is the key point of my question: the Favorites table only applies to questions (not answers). Answers can never be favorited; only questions can.
Here is my query:
SELECT
p.title, p.content,
vv.value AS cuvv, -- cuvv stands for current_user_vote_value
CASE WHEN ff.id IS NOT NULL THEN '2' ELSE '3' END AS cuf, -- current_user_favorite
(SELECT SUM(v.value) FROM Votes v WHERE p.id = v.post_id) AS total_votes,
(SELECT COUNT(1) FROM Favorites f WHERE p.id = f.post_id) AS total_favorites
FROM QandA p
LEFT JOIN Votes vv ON p.id = vv.post_id AND vv.user_id = :user_id_1
LEFT JOIN Favorites ff ON p.id = ff.post_id AND ff.user_id = :user_id_2
WHERE p.related = :id
Note: for cuf, 2 means the current user has marked this question as a favorite and 3 means they haven't (in other words, 3 means this question isn't a favorite of the current user).
OK, let me pass some parameters to the query and execute it (as an example):
$user_id = 1234;
$id = 1;
$sth->bindValue(":user_id_1", $user_id, PDO::PARAM_INT);
$sth->bindValue(":user_id_2", $user_id, PDO::PARAM_INT);
$sth->bindValue(":id", $id, PDO::PARAM_INT);
$sth->execute();
And here is the output:
-- cuvv stands for current_user_vote_value
-- cuf stands for current_user_favorite
+--------------+----------------------+------+-----+-------------+-----------------+
| title | content | cuvv | cuf | total_votes | total_favorites |
+--------------+----------------------+------+-----+-------------+-----------------+
| title of ... | content of que ... | 1 | 2 | 2 | 2 |
| | content of fir ... | -1 | 3 | -1 | 0 |
| | content of sec ... | NULL | 3 | 0 | 0 |
+--------------+----------------------+------+-----+-------------+-----------------+
OK, so what's my question?
The two columns cuf and total_favorites only apply to questions (type = 0), but my query doesn't know that: it calculates the total number of favorites for every row. How can I tell it to calculate cuf and total_favorites only for questions, not for both questions and answers?
In other words, I need an IF condition: if p.type = 0, then execute these two lines:
(SELECT COUNT(1) FROM Favorites f WHERE p.id = f.post_id) AS total_favorites,
and
LEFT JOIN Favorites ff ON p.id = ff.post_id AND ff.user_id = :user_id_2
Otherwise, don't execute those two lines, because if p.type = 1 they are wasted work.
How can I implement that condition and improve that query?
One approach you may want to try is to query the Favorites and Votes tables only once each, in derived tables, and calculate both the current user's values and the overall values at the same time.
SELECT
q.title, q.content,
IFNULL(vv.user_val, 0) cuvv, IFNULL(vv.all_val, 0) total_votes,
IFNULL(ff.user_fav, 0) cuf, IFNULL(ff.all_fav, 0) total_favorites
FROM QandA q
LEFT JOIN (
SELECT post_id,
SUM(value) all_val, SUM(CASE WHEN user_id=1234 THEN value END) user_val
FROM votes GROUP BY post_id
) vv
ON vv.post_id = q.id
LEFT JOIN (
SELECT post_id,
COUNT(1) all_fav, COUNT(CASE WHEN user_id=1234 THEN 1 END) user_fav
FROM favorites GROUP BY post_id
) ff
ON q.type=0 AND ff.post_id = q.id
WHERE q.related = 1;
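A sketch that runs this answer's query against the question's sample data, assuming SQLite as a stand-in for MySQL; the hard-coded 1234 is replaced by a named parameter (:uid, a hypothetical name) and an ORDER BY is added so the row order is deterministic. Because ff only joins when q.type = 0, the answer rows get 0 for cuf and total_favorites:

```python
import sqlite3

# Sample data from the question; SQLite stands in for MySQL in this sketch.
SETUP = """
CREATE TABLE QandA (id INTEGER, title TEXT, content TEXT, type INTEGER, related INTEGER);
CREATE TABLE votes (id INTEGER, post_id INTEGER, user_id INTEGER, value INTEGER);
CREATE TABLE favorites (id INTEGER, post_id INTEGER, user_id INTEGER);
INSERT INTO QandA VALUES
    (1, 'title of question 1', 'content of question1', 0, 1),
    (2, '', 'content of first answer for question1', 1, 1),
    (3, 'title of question 2', 'content of question2', 0, 3),
    (4, '', 'content of second answer for question1', 1, 1),
    (5, '', 'content of first answer for question2', 1, 3);
INSERT INTO votes VALUES (1, 1, 1234, 1), (2, 2, 1234, -1), (3, 1, 4321, 1);
INSERT INTO favorites VALUES (1, 1, 1234), (2, 1, 4321);
"""

# The answer's query: each child table is scanned once, and favorites only
# join to rows where q.type = 0 (questions).
QUERY = """
SELECT q.id,
       IFNULL(vv.user_val, 0) AS cuvv, IFNULL(vv.all_val, 0) AS total_votes,
       IFNULL(ff.user_fav, 0) AS cuf,  IFNULL(ff.all_fav, 0) AS total_favorites
FROM QandA q
LEFT JOIN (SELECT post_id, SUM(value) AS all_val,
                  SUM(CASE WHEN user_id = :uid THEN value END) AS user_val
           FROM votes GROUP BY post_id) vv ON vv.post_id = q.id
LEFT JOIN (SELECT post_id, COUNT(1) AS all_fav,
                  COUNT(CASE WHEN user_id = :uid THEN 1 END) AS user_fav
           FROM favorites GROUP BY post_id) ff ON q.type = 0 AND ff.post_id = q.id
WHERE q.related = :related
ORDER BY q.id
"""


def question_page(uid, related):
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    return con.execute(QUERY, {"uid": uid, "related": related}).fetchall()


for row in question_page(1234, 1):
    print(row)
# (1, 1, 2, 1, 2)
# (2, -1, -1, 0, 0)
# (4, 0, 0, 0, 0)
```

Two differences from the question's output: cuf here is a count (1 or 0) rather than the '2'/'3' codes, and cuvv is 0 instead of NULL when the user hasn't voted. With PDO, reusing the same named placeholder twice is only allowed when emulated prepares are on, so binding two separate names (as the question does with :user_id_1 and :user_id_2) is the safer habit.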
An SQLfiddle to test with.
Try this:
SELECT
p.id, p.type,p.title, p.content,
vv.value AS cuvv,
CASE WHEN ff.id IS NOT NULL THEN '2' ELSE '3' END AS cuf,
(SELECT SUM(v.value) FROM Votes v WHERE p.id = v.post_id) AS total_votes,
(SELECT COUNT(1) FROM Favorites f WHERE p.id = f.post_id) AS total_favorites
FROM QandA p
LEFT JOIN Votes vv ON p.id = vv.post_id AND vv.user_id = '1234'
LEFT JOIN Favorites ff ON p.id = ff.post_id AND ff.user_id = '1234'
WHERE p.related = 1 and p.type=0
union all
SELECT
p.id, p.type,p.title, p.content,
vv.value AS cuvv,
'3' AS cuf,
(SELECT SUM(v.value) FROM Votes v WHERE p.id = v.post_id) AS total_votes,
NULL AS total_favorites
FROM QandA p
LEFT JOIN Votes vv ON p.id = vv.post_id AND vv.user_id = '1234'
WHERE p.related = 1 and p.type=1;
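A sketch running this UNION ALL answer on the same sample data, assuming SQLite as a stand-in for MySQL; '1234' becomes a bound parameter, and ORDER BY 1 is added only to make the order deterministic (UNION ALL alone does not guarantee one). The second branch never touches Favorites, which is the IF-like behavior the question asks for:

```python
import sqlite3

# Sample data from the question; SQLite stands in for MySQL in this sketch.
SETUP = """
CREATE TABLE QandA (id INTEGER, title TEXT, content TEXT, type INTEGER, related INTEGER);
CREATE TABLE votes (id INTEGER, post_id INTEGER, user_id INTEGER, value INTEGER);
CREATE TABLE favorites (id INTEGER, post_id INTEGER, user_id INTEGER);
INSERT INTO QandA VALUES
    (1, 'title of question 1', 'content of question1', 0, 1),
    (2, '', 'content of first answer for question1', 1, 1),
    (3, 'title of question 2', 'content of question2', 0, 3),
    (4, '', 'content of second answer for question1', 1, 1),
    (5, '', 'content of first answer for question2', 1, 3);
INSERT INTO votes VALUES (1, 1, 1234, 1), (2, 2, 1234, -1), (3, 1, 4321, 1);
INSERT INTO favorites VALUES (1, 1, 1234), (2, 1, 4321);
"""

# First branch: questions, with the favorites join and subquery.
# Second branch: answers, with constant cuf and NULL total_favorites.
QUERY = """
SELECT p.id, p.type, vv.value AS cuvv,
       CASE WHEN ff.id IS NOT NULL THEN '2' ELSE '3' END AS cuf,
       (SELECT SUM(v.value) FROM votes v WHERE p.id = v.post_id) AS total_votes,
       (SELECT COUNT(1) FROM favorites f WHERE p.id = f.post_id) AS total_favorites
FROM QandA p
LEFT JOIN votes vv ON p.id = vv.post_id AND vv.user_id = :uid
LEFT JOIN favorites ff ON p.id = ff.post_id AND ff.user_id = :uid
WHERE p.related = :related AND p.type = 0
UNION ALL
SELECT p.id, p.type, vv.value AS cuvv, '3' AS cuf,
       (SELECT SUM(v.value) FROM votes v WHERE p.id = v.post_id) AS total_votes,
       NULL AS total_favorites
FROM QandA p
LEFT JOIN votes vv ON p.id = vv.post_id AND vv.user_id = :uid
WHERE p.related = :related AND p.type = 1
ORDER BY 1
"""


def union_page(uid, related):
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    return con.execute(QUERY, {"uid": uid, "related": related}).fetchall()


for row in union_page(1234, 1):
    print(row)
# (1, 0, 1, '2', 2, 2)
# (2, 1, -1, '3', -1, None)
# (4, 1, None, '3', None, None)
```

Note that total_votes is NULL rather than 0 for an answer with no votes, because SUM over zero rows returns NULL; wrapping it in IFNULL(..., 0) would give 0.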

Select specific titles based on friends' posts

I have some tables.
titles
id| title
1 | Cars
2 | Computers
3 | Phones
4 | Tvs
entry
id | title_id | user_id | entry | time
1 | 1 | 12 | entry-01 | 1
2 | 2 | 11 | entry-02 | 2
3 | 3 | 12 | entry-03 | 3
4 | 2 | 11 | entry-04 | 4
5 | 3 | 11 | entry-05 | 5
6 | 4 | 12 | entry-06 | 6
7 | 4 | 13 | entry-07 | 7
8 | 4 | 11 | entry-08 | 8
9 | 1 | 10 | entry-09 | 9
10 | 2 | 12 | entry-10 | 10
users
id | username
10 | user-1
11 | user-2
12 | user-3
13 | user-4
friends
id | user_id | friend_id
1 | 10 | 12
2 | 11 | 12
3 | 12 | 10
4 | 10 | 11
I need to filter titles based on friends' entries and sort the results by entry.time DESC. I also need to show the friends' names and their count(entry) in the list.
Expected result filtered by user_id=10 is:
result
1 | Computers | user-3, user-2(2)
2 | Tvs | user-2, user-3
3 | Phones | user-2, user-3
4 | Cars | user-3
Any ideas?
This problem is complex, but if you get into the habit of breaking problems down into smaller pieces you will catch on pretty quickly. Why not start by getting all friends of user id 10? We can do so like this:
SELECT CASE WHEN user_id = 10 THEN friend_id
WHEN friend_id = 10 THEN user_id END AS userFriends
FROM friends
GROUP BY userFriends
HAVING userFriends IS NOT NULL;
Notice the use of a CASE expression: user_id 10 could be in either of the two columns. I use GROUP BY in case a user/friend pair appears more than once (like 10 and 12 in your example), and a HAVING ... IS NOT NULL check to remove the rows that matched neither branch of the CASE.
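Here is that first step in isolation, assuming SQLite as a stand-in for MySQL and the question's friends table; the :me parameter (a hypothetical name) replaces the literal 10. Row 2 (11, 12) matches neither branch of the CASE, produces NULL, and is dropped by the HAVING clause:

```python
import sqlite3


def friends_of(user_id):
    con = sqlite3.connect(":memory:")
    # The friends table from the question.
    con.executescript("""
        CREATE TABLE friends (id INTEGER, user_id INTEGER, friend_id INTEGER);
        INSERT INTO friends VALUES (1, 10, 12), (2, 11, 12), (3, 12, 10), (4, 10, 11);
    """)
    rows = con.execute("""
        SELECT CASE WHEN user_id = :me THEN friend_id
                    WHEN friend_id = :me THEN user_id END AS userFriends
        FROM friends
        GROUP BY userFriends
        HAVING userFriends IS NOT NULL""", {"me": user_id}).fetchall()
    # Sort so the result does not depend on the engine's grouping order.
    return sorted(r[0] for r in rows)


print(friends_of(10))  # [11, 12]
```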
Now that you have those, you can join them with the entry and titles tables to get the information you need. Just add some aggregation to get the number of entries each user has for a title:
SELECT t.title, u.userName, COUNT(*) AS numEntries
FROM titles t
LEFT JOIN entry e ON e.title_id = t.id
JOIN users u ON u.id = e.user_id
JOIN(
SELECT
CASE WHEN user_id = 10 THEN friend_id
WHEN friend_id = 10 THEN user_id END AS userFriends
FROM friends
GROUP BY userFriends
HAVING userFriends IS NOT NULL) f ON f.userFriends = u.id
GROUP BY t.title, u.userName;
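The aggregated join above can be checked the same way. A sketch assuming SQLite as a stand-in for MySQL, loading all four sample tables; the function name entries_per_title and the :me parameter are hypothetical. It returns one (title, friend, entry count) row per combination:

```python
import sqlite3

# Sample data from the question; SQLite stands in for MySQL in this sketch.
SETUP = """
CREATE TABLE titles (id INTEGER, title TEXT);
CREATE TABLE entry (id INTEGER, title_id INTEGER, user_id INTEGER, entry TEXT, time INTEGER);
CREATE TABLE users (id INTEGER, username TEXT);
CREATE TABLE friends (id INTEGER, user_id INTEGER, friend_id INTEGER);
INSERT INTO titles VALUES (1, 'Cars'), (2, 'Computers'), (3, 'Phones'), (4, 'Tvs');
INSERT INTO entry VALUES (1, 1, 12, 'entry-01', 1), (2, 2, 11, 'entry-02', 2),
    (3, 3, 12, 'entry-03', 3), (4, 2, 11, 'entry-04', 4), (5, 3, 11, 'entry-05', 5),
    (6, 4, 12, 'entry-06', 6), (7, 4, 13, 'entry-07', 7), (8, 4, 11, 'entry-08', 8),
    (9, 1, 10, 'entry-09', 9), (10, 2, 12, 'entry-10', 10);
INSERT INTO users VALUES (10, 'user-1'), (11, 'user-2'), (12, 'user-3'), (13, 'user-4');
INSERT INTO friends VALUES (1, 10, 12), (2, 11, 12), (3, 12, 10), (4, 10, 11);
"""

# The answer's second query, with :me in place of the literal 10.
QUERY = """
SELECT t.title, u.userName, COUNT(*) AS numEntries
FROM titles t
LEFT JOIN entry e ON e.title_id = t.id
JOIN users u ON u.id = e.user_id
JOIN (SELECT CASE WHEN user_id = :me THEN friend_id
                  WHEN friend_id = :me THEN user_id END AS userFriends
      FROM friends
      GROUP BY userFriends
      HAVING userFriends IS NOT NULL) f ON f.userFriends = u.id
GROUP BY t.title, u.userName
"""


def entries_per_title(user_id):
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    return set(con.execute(QUERY, {"me": user_id}).fetchall())


for row in sorted(entries_per_title(10)):
    print(row)
# ('Cars', 'user-3', 1)
# ('Computers', 'user-2', 2)
# ('Computers', 'user-3', 1)
# ('Phones', 'user-2', 1)
# ('Phones', 'user-3', 1)
# ('Tvs', 'user-2', 1)
# ('Tvs', 'user-3', 1)
```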
Matching your format is going to be tricky. Normally you could use GROUP_CONCAT() to get a comma-separated list, but you would get something like user-3, user-2, user-2 for your first list. To fix this, I recommend adding a CONCAT() to the SELECT of the query above so that each user's number of entries appears next to their name. In addition, use another CASE expression so that this only happens when COUNT(*) is greater than 1:
SELECT t.title,
CASE WHEN COUNT(*) > 1 THEN
CONCAT(u.userName, ' (', COUNT(*), ')')
ELSE
u.userName
END AS numEntries
FROM titles t
LEFT JOIN entry e ON e.title_id = t.id
JOIN users u ON u.id = e.user_id
JOIN(
SELECT
CASE WHEN user_id = 10 THEN friend_id
WHEN friend_id = 10 THEN user_id END AS userFriends
FROM friends
GROUP BY userFriends
HAVING userFriends IS NOT NULL) f ON f.userFriends = u.id
GROUP BY t.title, u.userName;
And now, I would perform a GROUP_CONCAT() on that query:
SELECT tmp.title, GROUP_CONCAT(tmp.userEntries) AS friendEntries
FROM(
SELECT t.title,
CASE WHEN COUNT(*) > 1 THEN
CONCAT(u.userName, ' (', COUNT(*), ')')
ELSE
u.userName
END AS userEntries
FROM titles t
LEFT JOIN entry e ON e.title_id = t.id
JOIN users u ON u.id = e.user_id
JOIN(
SELECT
CASE WHEN user_id = 10 THEN friend_id
WHEN friend_id = 10 THEN user_id END AS userFriends
FROM friends
GROUP BY userFriends
HAVING userFriends IS NOT NULL) f ON f.userFriends = u.id
GROUP BY t.title, u.userName) tmp
GROUP BY tmp.title;
I apologize for the lengthy response (though I wanted to be clear and cover it all). If you've made it to this point, you'll be happy to know that it works in SQL Fiddle.
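The final query above doesn't yet apply the requested sort by entry.time DESC. A minimal sketch of one way to add it, assuming SQLite as a stand-in for MySQL: the inner query also keeps MAX(e.time) for each (title, friend) pair, and the outer query orders titles by their most recent friend entry. Sorting by the latest friend entry is my interpretation of the requirement, though it does reproduce the order in the expected result. SQLite uses || for string concatenation where MySQL would use CONCAT(), and titles_for is a hypothetical name:

```python
import sqlite3

# Sample data from the question; SQLite stands in for MySQL in this sketch.
SETUP = """
CREATE TABLE titles (id INTEGER, title TEXT);
CREATE TABLE entry (id INTEGER, title_id INTEGER, user_id INTEGER, entry TEXT, time INTEGER);
CREATE TABLE users (id INTEGER, username TEXT);
CREATE TABLE friends (id INTEGER, user_id INTEGER, friend_id INTEGER);
INSERT INTO titles VALUES (1, 'Cars'), (2, 'Computers'), (3, 'Phones'), (4, 'Tvs');
INSERT INTO entry VALUES (1, 1, 12, 'entry-01', 1), (2, 2, 11, 'entry-02', 2),
    (3, 3, 12, 'entry-03', 3), (4, 2, 11, 'entry-04', 4), (5, 3, 11, 'entry-05', 5),
    (6, 4, 12, 'entry-06', 6), (7, 4, 13, 'entry-07', 7), (8, 4, 11, 'entry-08', 8),
    (9, 1, 10, 'entry-09', 9), (10, 2, 12, 'entry-10', 10);
INSERT INTO users VALUES (10, 'user-1'), (11, 'user-2'), (12, 'user-3'), (13, 'user-4');
INSERT INTO friends VALUES (1, 10, 12), (2, 11, 12), (3, 12, 10), (4, 10, 11);
"""

# The answer's final query plus lastTime, used to order the titles.
QUERY = """
SELECT tmp.title, GROUP_CONCAT(tmp.userEntries, ', ') AS friendEntries
FROM (
    SELECT t.title,
           CASE WHEN COUNT(*) > 1 THEN u.userName || ' (' || COUNT(*) || ')'
                ELSE u.userName END AS userEntries,
           MAX(e.time) AS lastTime
    FROM titles t
    LEFT JOIN entry e ON e.title_id = t.id
    JOIN users u ON u.id = e.user_id
    JOIN (SELECT CASE WHEN user_id = :me THEN friend_id
                      WHEN friend_id = :me THEN user_id END AS userFriends
          FROM friends
          GROUP BY userFriends
          HAVING userFriends IS NOT NULL) f ON f.userFriends = u.id
    GROUP BY t.title, u.userName
) tmp
GROUP BY tmp.title
ORDER BY MAX(tmp.lastTime) DESC
"""


def titles_for(user_id):
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(QUERY, {"me": user_id}).fetchall()
    # GROUP_CONCAT order is unspecified here, so sort each list for a stable result.
    return [(title, sorted(entries.split(", "))) for title, entries in rows]


for title, friends in titles_for(10):
    print(title, friends)
# Computers ['user-2 (2)', 'user-3']
# Tvs ['user-2', 'user-3']
# Phones ['user-2', 'user-3']
# Cars ['user-3']
```

In MySQL, GROUP_CONCAT(tmp.userEntries ORDER BY tmp.lastTime DESC SEPARATOR ', ') should also fix the order of names within each list (user-3 before user-2 (2) for Computers), which a plain GROUP_CONCAT leaves unspecified.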

Query based on individual COUNT(*) values

I have two tables ('posts' and 'votes') that allow users to vote thumbs-up (rating = 1 in 'votes') or thumbs-down (rating = 0 in 'votes') on posts.
I'm joining the two tables in a query and trying to filter the results so that a row only shows up if it has 2+ positive (rating = 1) ratings AND 2+ negative (rating = 0) ratings in the 'votes' table ('post_id' and 'rating' columns).
This is what I have, but it doesn't work as intended: it also returns posts that have only 1 positive and 1 negative vote (you can't see this in the output because the votes are aggregated per post), which isn't what I want. The HAVING line isn't working as intended:
SELECT *, COUNT(*)
FROM posts p
JOIN votes v ON p.id = v.post_id
WHERE rating = 1 OR rating = 0
GROUP BY p.id
HAVING COUNT(rating = 1) > 1 AND COUNT(rating = 0) > 1
+----+---------+----------+----------+
| id | post_id | rating | COUNT(*) |
+----+---------+----------+----------+
| 4 | 4 | 0 | 2 |
| 7 | 7 | 0 | 2 |
| 9 | 9 | 0 | 2 |
| 83 | 83 | 1 | 2 |
+----+---------+----------+----------+
I think you want this having clause:
SELECT p.*, COUNT(*)
FROM posts p JOIN
votes v
ON p.id = v.post_id
WHERE rating IN (1, 0)
GROUP BY p.id
HAVING ABS(SUM(rating = 1) - SUM(rating = 0)) > 1;
EDIT:
The above is doing the right thing. Here is a SQL Fiddle showing the results.
EDIT II:
I might have misinterpreted the question. I understood it as a net positive of two votes or a net negative of two votes. You seem to want just at least two votes either way. That HAVING clause is:
HAVING SUM(rating = 1) > 1 or SUM(rating = 0) > 1
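Why the original HAVING fails: COUNT(expr) counts non-NULL values, and rating = 1 evaluates to 0 or 1 (never NULL for a non-NULL rating), so COUNT(rating = 1) and COUNT(rating = 0) both equal the post's total number of votes. SUM adds up the 0/1 results instead, which counts only the matching votes. A sketch comparing the variants, assuming SQLite as a stand-in for MySQL and hypothetical vote data (the question shows none): post 1 has two up and two down votes, post 2 one of each, post 3 three up votes:

```python
import sqlite3

# Hypothetical sample data, not from the question.
SETUP = """
CREATE TABLE posts (id INTEGER);
CREATE TABLE votes (post_id INTEGER, rating INTEGER);
INSERT INTO posts VALUES (1), (2), (3);
INSERT INTO votes VALUES (1, 1), (1, 1), (1, 0), (1, 0),
                         (2, 1), (2, 0),
                         (3, 1), (3, 1), (3, 1);
"""


def matching_posts(having):
    """Return the post ids that pass the given HAVING condition."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    sql = ("SELECT p.id FROM posts p JOIN votes v ON p.id = v.post_id "
           "WHERE rating IN (1, 0) GROUP BY p.id HAVING " + having + " ORDER BY p.id")
    return [r[0] for r in con.execute(sql)]


# The question's clause: both COUNTs equal the total votes, so post 2 slips through.
print(matching_posts("COUNT(rating = 1) > 1 AND COUNT(rating = 0) > 1"))  # [1, 2, 3]
# 2+ up AND 2+ down, as the question states it.
print(matching_posts("SUM(rating = 1) > 1 AND SUM(rating = 0) > 1"))      # [1]
# The clause from EDIT II: 2+ votes in either direction.
print(matching_posts("SUM(rating = 1) > 1 OR SUM(rating = 0) > 1"))       # [1, 3]
```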

How To Search Two Mysql tables Using Ones Data

I have tables like this
mainTable
Id | name | country
1 | John | 5
2 | Bill | 7
categoriesTable
other_table_id | category
1 | 6
1 | 12
My question is how can I say
SELECT id FROM mainTable
WHERE country=5
AND WHERE categoriesTable other_table_id=[**THE ID I JUST GOT FROM THE FIRST TABLE**] && category=6 || category=12
and then return the number of records that match (so in this case 1)?
Thanks!
Doesn't anyone learn how to write JOINs when they learn SQL?
SELECT m.id
FROM mainTable AS m
JOIN categoriesTable AS c ON c.other_table_id = m.id
WHERE c.category IN (6, 12)
AND m.country = 5
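One caveat: the question asks for the number of matching records, but this query returns one row per matching category, so John (id 1) comes back twice because he has both category 6 and category 12. A sketch assuming SQLite as a stand-in for MySQL shows the duplicate and a COUNT(DISTINCT m.id) variant that returns 1:

```python
import sqlite3

# Sample data from the question; SQLite stands in for MySQL in this sketch.
SETUP = """
CREATE TABLE mainTable (Id INTEGER, name TEXT, country INTEGER);
CREATE TABLE categoriesTable (other_table_id INTEGER, category INTEGER);
INSERT INTO mainTable VALUES (1, 'John', 5), (2, 'Bill', 7);
INSERT INTO categoriesTable VALUES (1, 6), (1, 12);
"""

# FROM/WHERE part of the answer's query.
FROM_WHERE = """
FROM mainTable AS m
JOIN categoriesTable AS c ON c.other_table_id = m.id
WHERE c.category IN (6, 12)
  AND m.country = 5
"""

con = sqlite3.connect(":memory:")
con.executescript(SETUP)

# One row per matching category, so id 1 shows up twice.
rows = con.execute("SELECT m.id " + FROM_WHERE).fetchall()
# COUNT(DISTINCT ...) collapses that to the number of matching mainTable rows.
matches = con.execute("SELECT COUNT(DISTINCT m.id) " + FROM_WHERE).fetchone()[0]

print(rows)     # [(1,), (1,)]
print(matches)  # 1
```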

Select row in which the sum of the times the same value came up is less than X

This one has been puzzling me for a couple of hours of searching.
I have a campaign table and a vendor table. A vendor might have several campaigns.
I want to select all campaigns as long as the vendor has enough credits.
The problem is that I don't know how many campaigns will be selected for the same vendor, which means the vendor might still have credits for two campaigns but not for the rest of them.
Example
tblvendors
+---------+------------+---------------+
|vendorId | vendorName | vendorCredits |
+---------+------------+---------------+
| 1 | a | 5 |
| 2 | b | 100 |
+---------+------------+---------------+
tblproducts
+-----------+---------------+------------+
| productId | productName | vendorId |
+-----------+---------------+------------+
| 1 | c | 1 |
| 2 | e | 2 |
| 3 | f | 1 |
| 4 | g | 1 |
| 5 | h | 1 |
+-----------+---------------+------------+
tblcampaigns
+------------+---------------+------------+
| campaignId | productId | vendorId |
+------------+---------------+------------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 1 |
| 4 | 4 | 1 |
| 5 | 5 | 1 |
+------------+---------------+------------+
Now, considering that every time a row is selected the vendor loses 2 credits, and vendor 'a' only has 5 credits left, only campaigns 1, 2 and 3 should be returned.
My current Query is this:
SET @maxCampaignId = (SELECT MAX(campaignId) FROM tblCampaigns);
SELECT
@maxCampaignId,
t0.campaignId,
t0.productId,
productName,
productDescription,
productImage,
(CASE WHEN campaignId > (SELECT configValue FROM tblconfiguration WHERE configKey = 'lastHomeCampaignId')
THEN campaignId ELSE campaignId + @maxCampaignId END) AS orderField
FROM tblcampaigns AS t0
INNER JOIN tblproducts AS t1 ON t0.productId = t1.productId
INNER JOIN tblvendors AS t2 ON t1.vendorId = t2.vendorId
WHERE
campaignType = 'homeFeature' AND
t0.isActive = 1 AND
t2.vendorCredits > (SELECT configValue FROM tblconfiguration WHERE configKey = 'campaignHomeFeatureCost' LIMIT 1)
ORDER BY orderField ASC
LIMIT 4
The problem, as you can see, is in the line that compares vendorCredits. As written, the query obviously selects more campaigns than the vendor can afford.
I wanted to avoid doing this in PHP as I think it should be possible to do this straight out of the database.
Check this post, it may help with GROUP BY and HAVING clauses (I'll try to do some tests later):
Using COUNT(*) in the WHERE clause
UPDATE:
select t2.vendorId, vendorCredits
from tblcampaigns AS t0
JOIN tblproducts AS t1 ON t0.productId = t1.productId
JOIN tblvendors AS t2 ON t1.vendorId = t2.vendorId
group by t2.vendorId
having count(t0.campaignId) > t2.vendorCredits
If I understood the question correctly, this query selects all vendors that have more campaigns than credits.
OK, found it.
Thanks to this post: How do I limit the number of rows per field value in SQL?
What I did was select the rows I wanted, in the order I wanted, in a subquery along with their row numbers, so that I could restore that order at the end.
Then I wrapped that in a second subquery ordered by vendorId, so that I could count how many times each vendor has turned up so far and pass that running count (row_num) to the main query.
Finally, the main query restores the original order using the row number from the innermost subquery, and now I have the value I wanted to compare: the credit cost per row multiplied by the row's position among that vendor's campaigns.
Anyway, the code is probably clearer, so here it is:
SET @creditsCost = (SELECT configValue FROM tblconfiguration WHERE configKey = 'campaignHomeFeatureCost' LIMIT 1);
SET @maxCampaignId = (SELECT MAX(campaignId) FROM tblCampaigns);
SET @curRow = 0;
-- reset the per-vendor counters so repeated runs in the same session start clean
SET @num = 0;
SET @first_column = NULL;
SELECT * FROM
(
-- running count per vendor: restarts at 1 whenever vendorId changes
SELECT *,
@num := if(@first_column = vendorId, @num:= @num + 1, 1) as row_num,
@first_column:=vendorId as c
FROM
-- innermost: the eligible campaigns, each tagged with a row_number and orderField
(SELECT
@curRow := @curRow + 1 AS row_number,
@maxCampaignId,
t0.campaignId,
t0.productId,
t2.vendorId,
t2.vendorCredits,
productName,
productDescription,
productImage,
(CASE WHEN campaignId > (SELECT configValue FROM tblconfiguration WHERE configKey = 'lastHomeCampaignId')
THEN campaignId ELSE campaignId + @maxCampaignId END) AS orderField
FROM tblcampaigns AS t0
INNER JOIN tblproducts AS t1 ON t0.productId = t1.productId
INNER JOIN tblvendors AS t2 ON t1.vendorId = t2.vendorId
WHERE
campaignType = 'homeFeature' AND
t0.isActive = 1
ORDER BY orderField ASC) AS filteredCampaigns
ORDER BY vendorId
) AS creditAllowedCampaigns
-- keep a vendor's nth campaign only if n * cost fits within its credits
WHERE
row_num * @creditsCost <= vendorCredits
-- put the rows back in row_number order
ORDER BY row_number
Anyhow, I still appreciate everyone who took the time to answer and try to help, and I'll keep an eye on future comments, since I don't think this is the best approach performance-wise.
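On MySQL 8.0 or later, which supports window functions, the user-variable bookkeeping could be replaced by ROW_NUMBER(). This is a swapped-in technique, not the method above: a minimal sketch assuming SQLite 3.25+ as a stand-in for MySQL, with only the sample tables from the question. It orders each vendor's campaigns by campaignId instead of orderField and leaves out the campaignType, isActive and tblconfiguration filters, which aren't in the sample data; the per-campaign cost is passed in as the hypothetical :cost parameter:

```python
import sqlite3

# Sample data from the question; SQLite (3.25+ for window functions) stands in for MySQL.
SETUP = """
CREATE TABLE tblvendors (vendorId INTEGER, vendorName TEXT, vendorCredits INTEGER);
CREATE TABLE tblproducts (productId INTEGER, productName TEXT, vendorId INTEGER);
CREATE TABLE tblcampaigns (campaignId INTEGER, productId INTEGER, vendorId INTEGER);
INSERT INTO tblvendors VALUES (1, 'a', 5), (2, 'b', 100);
INSERT INTO tblproducts VALUES (1, 'c', 1), (2, 'e', 2), (3, 'f', 1), (4, 'g', 1), (5, 'h', 1);
INSERT INTO tblcampaigns VALUES (1, 1, 1), (2, 2, 2), (3, 3, 1), (4, 4, 1), (5, 5, 1);
"""

# ROW_NUMBER() numbers each vendor's campaigns 1, 2, 3, ... so the nth one
# needs n * cost credits; rows the vendor cannot afford are filtered out.
QUERY = """
SELECT campaignId
FROM (
    SELECT t0.campaignId, t2.vendorCredits,
           ROW_NUMBER() OVER (PARTITION BY t2.vendorId ORDER BY t0.campaignId) AS rn
    FROM tblcampaigns AS t0
    JOIN tblproducts AS t1 ON t0.productId = t1.productId
    JOIN tblvendors AS t2 ON t1.vendorId = t2.vendorId
) AS ranked
WHERE rn * :cost <= vendorCredits
ORDER BY campaignId
"""


def affordable_campaigns(cost):
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    return [r[0] for r in con.execute(QUERY, {"cost": cost})]


print(affordable_campaigns(2))  # [1, 2, 3]
```

With a cost of 2, vendor 'a' can afford its first two campaigns (2 and 4 credits out of 5), which gives campaigns 1, 2 and 3, as the question expects.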
