I am trying to get the latest one or two comments for each post I download, much as Instagram shows the latest three comments under each post. So far I am getting the posts and the like counts.
Now all I need is to figure out how to get the latest comments. I am not sure how to approach it, so I am hoping someone with more expertise can help!
This is my current query:
(SELECT
P.uuid,
P.caption,
P.imageHeight,
P.path,
P.date,
U.id,
U.fullname,
U.coverImage,
U.bio,
U.username,
U.profileImage,
coalesce(Activity.LikeCNT,0),
Activity.CurrentUserLiked
FROM USERS AS U
INNER JOIN Posts AS P
ON P.id = U.id
LEFT JOIN (SELECT COUNT(DISTINCT Activity.uuidPost) LikeCNT, Activity.uuidPost, Activity.id, sum(CASE WHEN Activity.id = $id then 1 else 0 end) as CurrentUserLiked
FROM Activity Activity
WHERE type = 'like'
GROUP BY Activity.uuidPost) Activity
ON Activity.uuidPost = P.uuid
AND Activity.id = U.id
WHERE U.id = $id)
UNION
(SELECT
P.uuid,
P.caption,
P.imageHeight,
P.path,
P.date,
U.id,
U.fullname,
U.coverImage,
U.bio,
U.username,
U.profileImage,
coalesce(Activity.LikeCNT,0),
Activity.CurrentUserLiked
FROM Activity AS A
INNER JOIN USERS AS U
ON A.IdOtherUser=U.id
INNER JOIN Posts AS P
ON P.id = U.id
LEFT JOIN (SELECT COUNT(DISTINCT Activity.uuidPost) LikeCNT, Activity.uuidPost, Activity.id, sum(CASE WHEN Activity.id = $id then 1 else 0 end) as CurrentUserLiked
FROM Activity Activity
WHERE type = 'like'
GROUP BY Activity.uuidPost) Activity
ON Activity.uuidPost = P.uuid
AND Activity.id = U.id
WHERE A.id = $id)
ORDER BY date DESC
LIMIT 0, 5
Basically, the comments are stored in the same table as the likes.
The table is Activity; it has a column comment that stores the comment text, and for those rows "type" is equal to "comment".
This is possibly not very well explained, but I am willing to give as much detail as needed!
If anyone can help, it's very much appreciated!!
UPDATE
On this query given by https://stackoverflow.com/users/1016435/xqbert I am currently getting this error:
Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8_unicode_ci,IMPLICIT) for operation '='
SELECT Posts.id,
Posts.uuid,
Posts.caption,
Posts.path,
Posts.date,
USERS.id,
USERS.username,
USERS.fullname,
USERS.profileImage,
coalesce(A.LikeCNT,0),
com.comment
FROM Posts
INNER JOIN USERS
ON Posts.id = 145
AND USERS.id = 145
LEFT JOIN (SELECT COUNT(A.uuidPost) LikeCNT, A.UUIDPost
FROM Activity A
WHERE type = 'like'
GROUP BY A.UUIDPOST) A
on A.UUIDPost=Posts.uuid
LEFT JOIN (SELECT comment, UUIDPOST, @row_num := IF(@prev_value=UUIDPOST,@row_num+1,1) as row_number,@prev_value := UUIDPOST
FROM Activity
CROSS JOIN (SELECT @row_num := 1) x
CROSS JOIN (SELECT @prev_value := '') y
WHERE type = 'comment'
ORDER BY UUIDPOST, date DESC) Com
ON Com.UUIDPOST = Posts.UUID
AND row_number <= 2
ORDER BY date DESC
LIMIT 0, 5
Latest Edit
Table structures:
Posts
----------------------------------------------------------
| id | int(11) | | not null |
| uuid | varchar(100) | utf8_unicode_ci | not null |
| imageLink | varchar(500) | utf8_unicode_ci | not null |
| date | timestamp | | not null |
----------------------------------------------------------
USERS
-------------------------------------------------------------
| id | int(11) | | not null |
| username | varchar(100) | utf8_unicode_ci | not null |
| profileImage | varchar(500) | utf8_unicode_ci | not null |
| date | timestamp | | not null |
-------------------------------------------------------------
Activity
----------------------------------------------------------
| id | int(11) | | not null |
| uuid | varchar(100) | utf8_unicode_ci | not null |
| uuidPost | varchar(100) | utf8_unicode_ci | not null |
| type | varchar(50) | utf8_unicode_ci | not null |
| commentText | varchar(500) | utf8_unicode_ci | not null |
| date | timestamp | | not null |
----------------------------------------------------------
Those are some examples; in the "Activity" table, "type" will always be equal to "comment" in this case.
Summary and desired result:
When I query a user's posts, I would like to go into the "Activity" table and get the latest 2 comments for every post they have. A post may have no comments, in which case none would be returned for it, or it may have 100 comments, but I only want the 2 most recent ones.
An example is how Instagram does it: for every post, it displays the 1, 2 or 3 most recent comments.
Hope this helps!
Fiddle link
This error message
Illegal mix of collations (utf8_general_ci,IMPLICIT) and
(utf8_unicode_ci,IMPLICIT) for operation '='
is typically due to the definition of your columns and tables. It usually means that on either side of an equal sign there are different collations. What you need to do is choose one and include that decision in your query.
The collation issue here was in the CROSS JOIN that initialises @prev_value, which needed an explicit collation.
I have also slightly changed the "row_number" logic to use a single cross join, and moved the variable expressions to the two ends of the select list.
Some sample data is displayed below. Sample data is needed to test queries, and anyone attempting to answer your question with working examples will need it. I am including it here for two reasons:
1. so that you will understand any result I present
2. so that in future, when you ask another SQL-related question, you understand the importance of supplying data. It is not only more convenient for us; if the asker provides the sample data, the asker will already understand it, rather than it being an invention of some stranger who has devoted some of their time to help out.
Sample Data
Please note some columns are missing from the tables, only the columns specified in the table details have been included.
This sample data has 5 comments against a single post (no likes are recorded)
CREATE TABLE Posts
(
`id` int,
`uuid` varchar(7) collate utf8_unicode_ci,
`imageLink` varchar(9) collate utf8_unicode_ci,
`date` datetime
);
INSERT INTO Posts(`id`, `uuid`, `imageLink`, `date`)
VALUES
(145, 'abcdefg', 'blah blah', '2016-10-10 00:00:00') ;
CREATE TABLE USERS
(
`id` int,
`username` varchar(15) collate utf8_unicode_ci,
`profileImage` varchar(12) collate utf8_unicode_ci,
`date` datetime
) ;
INSERT INTO USERS(`id`, `username`, `profileImage`, `date`)
VALUES
(145, 'used_by_already', 'blah de blah', '2014-01-03 00:00:00') ;
CREATE TABLE Activity
(
`id` int,
`uuid` varchar(4) collate utf8_unicode_ci,
`uuidPost` varchar(7) collate utf8_unicode_ci,
`type` varchar(40) collate utf8_unicode_ci,
`commentText` varchar(11) collate utf8_unicode_ci,
`date` datetime
) ;
INSERT INTO Activity (`id`, `uuid`, `uuidPost`, `type`, `commentText`, `date`)
VALUES
(345, 'a100', 'abcdefg', 'comment', 'lah lha ha', '2016-07-05 00:00:00'),
(456, 'a101', 'abcdefg', 'comment', 'lah lah lah', '2016-07-06 00:00:00'),
(567, 'a102', 'abcdefg', 'comment', 'lha lha ha', '2016-07-07 00:00:00'),
(678, 'a103', 'abcdefg', 'comment', 'ha lah lah', '2016-07-08 00:00:00'),
(789, 'a104', 'abcdefg', 'comment', 'hla lah lah', '2016-07-09 00:00:00') ;
[SQL Standard behaviour: 2 rows per Post query]
This was my initial query, with some corrections. I changed the column order of the select list so that you can easily see the comment-related data when I present the results. Please study those results; they are provided so that you can understand what the query does. Columns preceded by # do not exist in the sample data I am working with, for the reasons already noted.
SELECT
Posts.id
, Posts.uuid
, rcom.uuidPost
, rcom.commentText
, rcom.`date` commentDate
#, Posts.caption
#, Posts.path
, Posts.`date`
, USERS.id
, USERS.username
#, USERS.fullname
, USERS.profileImage
, COALESCE(A.LikeCNT, 0) num_likes
FROM Posts
INNER JOIN USERS ON Posts.id = 145
AND USERS.id = 145
LEFT JOIN (
SELECT
COUNT(A.uuidPost) LikeCNT
, A.UUIDPost
FROM Activity A
WHERE type = 'like'
GROUP BY
A.UUIDPOST
) A ON A.UUIDPost = Posts.uuid
LEFT JOIN (
SELECT
@row_num := IF(@prev_value=UUIDPOST,@row_num+1,1) as row_number
, commentText
, uuidPost
, `date`
, @prev_value := UUIDPOST
FROM Activity
CROSS JOIN ( SELECT @row_num := 1, @prev_value := '' collate utf8_unicode_ci ) xy
WHERE type = 'comment'
ORDER BY
uuidPost
, `date` DESC
) rcom ON rcom.uuidPost = Posts.UUID
AND rcom.row_number <= 2
ORDER BY
Posts.`date` DESC
;
See a working demonstration of this query at SQLFiddle
Results:
| id | uuid | uuidPost | commentText | date | date | id | username | profileImage | num_likes |
|-----|---------|----------|-------------|------------------------|---------------------------|-----|-----------------|--------------|-----------|
| 145 | abcdefg | abcdefg | hla lah lah | July, 09 2016 00:00:00 | October, 10 2016 00:00:00 | 145 | used_by_already | blah de blah | 0 |
| 145 | abcdefg | abcdefg | ha lah lah | July, 08 2016 00:00:00 | October, 10 2016 00:00:00 | 145 | used_by_already | blah de blah | 0 |
There are 2 ROWS, as expected: one row for the most recent comment, and another row for the next most recent comment. This is normal behaviour for SQL, and until a comment was added under this answer, readers of the question would have assumed this normal behaviour was acceptable.
The question lacks a clearly articulated "expected result".
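For readers who want to experiment without a MySQL server, here is a minimal sketch of the same "two rows per post" behaviour. It is not the query above: it assumes Python's bundled sqlite3 module with SQLite 3.25 or newer, and it swaps the user variables for the ROW_NUMBER() window function (which MySQL 8.0 also provides). Only a cut-down Activity table from the sample data is used.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Activity (id INT, uuidPost TEXT, type TEXT, commentText TEXT, date TEXT);
INSERT INTO Activity VALUES
  (345, 'abcdefg', 'comment', 'lah lha ha',  '2016-07-05 00:00:00'),
  (456, 'abcdefg', 'comment', 'lah lah lah', '2016-07-06 00:00:00'),
  (567, 'abcdefg', 'comment', 'lha lha ha',  '2016-07-07 00:00:00'),
  (678, 'abcdefg', 'comment', 'ha lah lah',  '2016-07-08 00:00:00'),
  (789, 'abcdefg', 'comment', 'hla lah lah', '2016-07-09 00:00:00');
""")

# Number the comments of each post from newest to oldest, then keep numbers 1 and 2.
latest_two = conn.execute("""
SELECT uuidPost, commentText, date
FROM (
    SELECT uuidPost, commentText, date,
           ROW_NUMBER() OVER (PARTITION BY uuidPost ORDER BY date DESC) AS rn
    FROM Activity
    WHERE type = 'comment'
) AS ranked
WHERE rn <= 2
ORDER BY uuidPost, date DESC
""").fetchall()

for row in latest_two:
    print(row)
```

As with the MySQL query, the result has one row per comment, so a post with two or more comments appears twice.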
[Option 1: One row per Post query, with UP TO 2 comments, added columns]
In a comment below it was revealed that you did not want 2 rows per post, and that this would be an easy fix. Well, it kind of is easy, BUT there are options, and the choice between them is dictated by the user's requirements. IF the question had an "expected result", we would know which option to choose. Nonetheless, here is one option:
SELECT
Posts.id
, Posts.uuid
, max(case when rcom.row_number = 1 then rcom.commentText end) Comment_one
, max(case when rcom.row_number = 2 then rcom.commentText end) Comment_two
#, Posts.caption
#, Posts.path
, Posts.`date`
, USERS.id
, USERS.username
#, USERS.fullname
, USERS.profileImage
, COALESCE(A.LikeCNT, 0) num_likes
FROM Posts
INNER JOIN USERS ON Posts.id = 145
AND USERS.id = 145
LEFT JOIN (
SELECT
COUNT(A.uuidPost) LikeCNT
, A.UUIDPost
FROM Activity A
WHERE type = 'like'
GROUP BY
A.UUIDPOST
) A ON A.UUIDPost = Posts.uuid
LEFT JOIN (
SELECT
@row_num := IF(@prev_value=UUIDPOST,@row_num+1,1) as row_number
, commentText
, uuidPost
, `date`
, @prev_value := UUIDPOST
FROM Activity
CROSS JOIN ( SELECT @row_num := 1, @prev_value := '' collate utf8_unicode_ci ) xy
WHERE type = 'comment'
ORDER BY
uuidPost
, `date` DESC
) rcom ON rcom.uuidPost = Posts.UUID
AND rcom.row_number <= 2
GROUP BY
Posts.id
, Posts.uuid
#, Posts.caption
#, Posts.path
, Posts.`date`
, USERS.id
, USERS.username
#, USERS.fullname
, USERS.profileImage
, COALESCE(A.LikeCNT, 0)
ORDER BY
Posts.`date` DESC
;
See the second query working at SQLFiddle
Results of query 2:
| id | uuid | Comment_one | Comment_two | date | id | username | profileImage | num_likes |
|-----|---------|-------------|-------------|---------------------------|-----|-----------------|--------------|-----------|
| 145 | abcdefg | hla lah lah | ha lah lah | October, 10 2016 00:00:00 | 145 | used_by_already | blah de blah | 0 |
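The "one row per post" pivot can be tried out in isolation. This is a sketch under assumptions rather than the query above: it uses Python's sqlite3 (SQLite 3.25 or newer) with ROW_NUMBER() instead of user variables, a cut-down schema, and a second, hypothetical post 'hijklmn' with no comments to show what the LEFT JOIN does for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (id INT, uuid TEXT);
CREATE TABLE Activity (uuidPost TEXT, type TEXT, commentText TEXT, date TEXT);
-- 'hijklmn' is a made-up post without comments
INSERT INTO Posts VALUES (145, 'abcdefg'), (145, 'hijklmn');
INSERT INTO Activity VALUES
  ('abcdefg', 'comment', 'ha lah lah',  '2016-07-08'),
  ('abcdefg', 'comment', 'hla lah lah', '2016-07-09'),
  ('abcdefg', 'comment', 'lha lha ha',  '2016-07-07');
""")

# MAX(CASE ...) turns row number 1 and row number 2 into two columns.
rows = conn.execute("""
WITH rcom AS (
    SELECT uuidPost, commentText,
           ROW_NUMBER() OVER (PARTITION BY uuidPost ORDER BY date DESC) AS rn
    FROM Activity
    WHERE type = 'comment'
)
SELECT p.uuid,
       MAX(CASE WHEN rcom.rn = 1 THEN rcom.commentText END) AS comment_one,
       MAX(CASE WHEN rcom.rn = 2 THEN rcom.commentText END) AS comment_two
FROM Posts p
LEFT JOIN rcom ON rcom.uuidPost = p.uuid AND rcom.rn <= 2
GROUP BY p.uuid
ORDER BY p.uuid
""").fetchall()

for row in rows:
    print(row)
```

A post without comments still appears, with NULL (None in Python) in both comment columns.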
[Option 2: concatenate the most recent comments into a single comma-separated list]
SELECT
Posts.id
, Posts.uuid
, group_concat(rcom.commentText) Comments_two_concatenated
#, Posts.caption
#, Posts.path
, Posts.`date`
, USERS.id
, USERS.username
#, USERS.fullname
, USERS.profileImage
, COALESCE(A.LikeCNT, 0) num_likes
FROM Posts
INNER JOIN USERS ON Posts.id = 145
AND USERS.id = 145
LEFT JOIN (
SELECT
COUNT(A.uuidPost) LikeCNT
, A.UUIDPost
FROM Activity A
WHERE type = 'like'
GROUP BY
A.UUIDPOST
) A ON A.UUIDPost = Posts.uuid
LEFT JOIN (
SELECT
@row_num := IF(@prev_value=UUIDPOST,@row_num+1,1) as row_number
, commentText
, uuidPost
, `date`
, @prev_value := UUIDPOST
FROM Activity
CROSS JOIN ( SELECT @row_num := 1, @prev_value := '' collate utf8_unicode_ci ) xy
WHERE type = 'comment'
ORDER BY
uuidPost
, `date` DESC
) rcom ON rcom.uuidPost = Posts.UUID
AND rcom.row_number <= 2
GROUP BY
Posts.id
, Posts.uuid
#, Posts.caption
#, Posts.path
, Posts.`date`
, USERS.id
, USERS.username
#, USERS.fullname
, USERS.profileImage
, COALESCE(A.LikeCNT, 0)
ORDER BY
Posts.`date` DESC
See this third query working at SQLFiddle
Results of query 3:
| id | uuid | Comments_two_concatenated | date | id | username | profileImage | num_likes |
|-----|---------|---------------------------|---------------------------|-----|-----------------|--------------|-----------|
| 145 | abcdefg | hla lah lah,ha lah lah | October, 10 2016 00:00:00 | 145 | used_by_already | blah de blah | 0 |
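A small sketch of the concatenation idea, again assuming Python's sqlite3 (SQLite 3.25 or newer) as a stand-in for MySQL. Two things differ from the query above and are my own choices: the separator is ' | ' rather than a comma, because comment text may itself contain commas, and the code does not rely on the order inside the concatenated string. In MySQL, GROUP_CONCAT accepts ORDER BY and SEPARATOR clauses if a fixed order is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Activity (uuidPost TEXT, type TEXT, commentText TEXT, date TEXT);
INSERT INTO Activity VALUES
  ('abcdefg', 'comment', 'lha lha ha',  '2016-07-07'),
  ('abcdefg', 'comment', 'ha lah lah',  '2016-07-08'),
  ('abcdefg', 'comment', 'hla lah lah', '2016-07-09');
""")

# Keep the two newest comments per post, then glue them into one string.
row = conn.execute("""
SELECT uuidPost, group_concat(commentText, ' | ') AS latest_comments
FROM (
    SELECT uuidPost, commentText,
           ROW_NUMBER() OVER (PARTITION BY uuidPost ORDER BY date DESC) AS rn
    FROM Activity
    WHERE type = 'comment'
) AS ranked
WHERE rn <= 2
GROUP BY uuidPost
""").fetchone()

# The order inside the concatenated string is not guaranteed in this sketch,
# so split it again and sort before comparing.
latest_comments = sorted(row[1].split(" | "))
print(row[0], latest_comments)
```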
[Summary]
I have presented 3 queries. Each shows only the 2 most recent comments, but each does so in a different way. The first query (default behaviour) displays 2 rows for each post. The second query (Option 1) adds a column but removes the second row. The third query (Option 2) concatenates the 2 most recent comments.
Please note that:
The question lacks table definitions covering all columns
The question lacks any sample data, which makes it harder for you to understand any results presented here, but also harder for us to prepare solutions
The question also lacks a definitive "expected result" (the wanted output) and this has led to further complexity in answering
I do hope the additional provided information will be of some use, and that by now you also know that it is normal for SQL to present data as multiple rows. If you do not want that normal behaviour please be specific about what you do really want in your question.
Postscript. To include yet another subquery for "follows" you may use a similar subquery to the one you already have. It may be added before or after that subquery. You may also see it in use at sqlfiddle here
LEFT JOIN (
SELECT
COUNT(*) FollowCNT
, IdOtherUser
FROM Activity
WHERE type = 'Follow'
GROUP BY
IdOtherUser
) F ON USERS.id = F.IdOtherUser
Whilst adding another subquery may resolve your desire for more information, the overall query may get slower in proportion to the growth of your data. Once you have settled on the functionality you really need it may be worthwhile considering what indexes you need on those tables. (I believe you would be advised to ask for that advice separately, and if you do make sure you include 1. the full DDL of your tables and 2. an explain plan of the query.)
I am a little bit lost in your query, but if you want to download data for multiple posts at once, it's not a good idea to include comment data in the first query, since you would repeat all the data about the post and the posting user for every comment. You should run another query that connects posts with comments. Something like:
SELECT
B.uuidPost,
C.username,
C.profileImage,
B.commentText,
B.`date`
FROM Posts A JOIN
Activity B ON A.uuid = B.uuidPost AND B.type = 'comment' JOIN
-- assumes Activity.id holds the id of the commenting user
USERS C ON B.id = C.id
and use that data to display your comments with commenting user id, name, image etc.
To get only 3 comments per post, you can look into this post:
Select top 3 values from each group in a table with SQL
if you are sure that there are going to be no duplicate rows in the comment table or this post:
How to select top 3 values from each group in a table with SQL which have duplicates
if you're not sure about that (although due to DateField in the table, it should not be possible).
UNTESTED: I would recommend putting together an SQL Fiddle with some sample data and your existing table structure showing the problem; that way we could play around with the responses and ensure they work with your schema.
So we use variables to simulate a window function (such as row_number),
in this case @row_num and @prev_value. @row_num keeps track of the current row for each post (since a single post could have lots of comments); when a new post ID (UUIDPOST?) is encountered, @row_num is reset to 1. When the current record's UUIDPOST matches @prev_value, we simply increment the row number by 1.
This technique allows us to assign a row number based on the date or activity ID in descending order. As each cross join only results in 1 record, we don't cause duplicate records to appear. And since we then limit by row_number <= 2, we only get the two most recent comments in our newly added left join.
This assumes the relation of posts to users is many-to-one, meaning a post can only have 1 user.
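To make the variable trick easier to follow, here is the same counting logic written out as a plain Python loop. It is only an illustration of the idea, not MySQL code: number_rows_per_group and the sample rows are made up for this sketch, with prev_value and row_num playing the roles of the two user variables.

```python
def number_rows_per_group(rows, key):
    """Yield (row_number, row) pairs; numbering restarts when key(row) changes.

    rows must already be sorted by group, and newest first within each group.
    """
    prev_value = None  # plays the role of the prev_value user variable
    row_num = 0        # plays the role of the row_num user variable
    for row in rows:
        current = key(row)
        # Same group as the previous row: count up. New group: restart at 1.
        row_num = row_num + 1 if current == prev_value else 1
        prev_value = current
        yield row_num, row


# Hypothetical (post, date, comment) rows in arbitrary order.
comments = [
    ("post-a", "2016-07-07", "a1"),
    ("post-b", "2016-07-08", "b1"),
    ("post-a", "2016-07-09", "a3"),
    ("post-a", "2016-07-08", "a2"),
]

# Equivalent of ORDER BY post, date DESC (two stable sorts, least significant first).
comments.sort(key=lambda r: r[1], reverse=True)
comments.sort(key=lambda r: r[0])

latest_two = [row for n, row in number_rows_per_group(comments, key=lambda r: r[0])
              if n <= 2]
print(latest_two)
```

The sort matters: if the rows are not grouped by post before numbering, the counter restarts in the wrong places, which is why the subquery needs its ORDER BY.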
Something like this, though I'm not sure about the final left join: I need to better understand the structure of the Activity table, hence my comment against the original question.
SELECT Posts.id,
Posts.uuid,
Posts.caption,
Posts.path,
Posts.date,
USERS.id,
USERS.username,
USERS.fullname,
USERS.profileImage,
coalesce(A.LikeCNT,0),
com.comment
FROM Posts
INNER JOIN USERS
ON Posts.id = 145
AND USERS.id = 145
LEFT JOIN (SELECT COUNT(A.uuidPost) LikeCNT, A.UUIDPost
FROM Activity A
WHERE type = 'like'
GROUP BY A.UUIDPOST) A
on A.UUIDPost=Posts.uuid
-- This join simulates row_number() over (partition by PostID order by activityID desc). (Nice article [here](http://preilly.me/2011/11/11/mysql-row_number/); several other examples exist on SO already.)
-- Meaning: generate a row number for each activity from 1 to X, restarting at 1 for each new post, and start numbering at the newest activity.
LEFT JOIN (SELECT comment, UUIDPOST, @row_num := IF(@prev_value=UUIDPOST,@row_num+1,1) as row_number,@prev_value := UUIDPOST
FROM Activity
CROSS JOIN (SELECT @row_num := 1) x
CROSS JOIN (SELECT @prev_value := '') y
WHERE type = 'comment'
-- Order by post, then by some date or ID descending; the date column is used here.
ORDER BY UUIDPOST, date DESC) Com
on Com.UUIDPOST = Posts.UUID
and row_number <= 2
-- Now, since we have a row_number restarting at 1 for each new post, simply return only the first two rows.
ORDER BY date DESC
LIMIT 0, 5
We had to put the condition row_number <= 2 on the join itself. If it were put in the WHERE clause, you would lose the posts without any comments, which I think you still want.
Additionally, we should probably check the "comment" field to make sure it's not blank or null, but let's make sure this works first.
This type of question has been posted many times, and getting the "latest-for-each" always appears to be a stumbling block and a join/subquery nightmare for most.
Especially for a web interface, you might be better off tacking a column (or 2 or 3) onto the one table that is your active "posts" table, such as Latest1, Latest2, Latest3.
Then put an insert trigger on your comment table that updates the main post with the newest ID, so you always have that ID on the post table without any sub-joins. Since, as you mentioned, you might want the last 2 or 3 IDs, add the 3 sample columns and have the insert trigger on the comment detail table update the primary post table with something like
update PrimaryPostTable
set Latest3 = Latest2,
Latest2 = Latest1,
Latest1 = NewDetailCommentID
where PostID = PostIDFromTheInsertedDetail
This would have to be formalized into a proper trigger under MySQL, but should be easy enough to implement. You could prime the list with the latest 1; then, as new comments arrive, the trigger would automatically roll the most recent ones into the 1st, 2nd and 3rd positions. Finally, your query could be simplified down to something like
Select
P.PostID,
P.TopicDescription,
PD1.WhateverDetail as LatestDetail1,
PD2.WhateverDetail as LatestDetail2,
PD3.WhateverDetail as LatestDetail3
from
Posts P
LEFT JOIN PostDetail PD1
on P.Latest1 = PD1.PostDetailID
LEFT JOIN PostDetail PD2
on P.Latest2 = PD2.PostDetailID
LEFT JOIN PostDetail PD3
on P.Latest3 = PD3.PostDetailID
where
whateverCondition
Denormalizing data is typically NOT desired. However, in cases such as this, it is a great simplifier for getting these "latest" entries in a For-Each type of query. Good luck.
Here is a fully working sample in MySQL so you can see the tables and the results of the sql-inserts and the automatic stamping via the trigger to update the main post table. Then querying the post table you can see how the most recent automatically rolls into first, second and third positions. Finally a join showing how to pull all the data from each "post activity"
CREATE TABLE Posts
( id int,
uuid varchar(7),
imageLink varchar(9),
`date` datetime,
ActivityID1 int null,
ActivityID2 int null,
ActivityID3 int null,
PRIMARY KEY (id)
);
CREATE TABLE Activity
( id int,
postid int,
`type` varchar(40) collate utf8_unicode_ci,
commentText varchar(20) collate utf8_unicode_ci,
`date` datetime,
PRIMARY KEY (id)
);
DELIMITER //
CREATE TRIGGER ActivityRecAdded
AFTER INSERT ON Activity FOR EACH ROW
BEGIN
Update Posts
set ActivityID3 = ActivityID2,
ActivityID2 = ActivityID1,
ActivityID1 = NEW.ID
where
ID = NEW.POSTID;
END; //
DELIMITER ;
INSERT INTO Posts
(id, uuid, imageLink, `date`)
VALUES
(123, 'test1', 'blah', '2016-10-26 00:00:00');
INSERT INTO Posts
(id, uuid, imageLink, `date`)
VALUES
(125, 'test2', 'blah 2', '2016-10-26 00:00:00');
INSERT INTO Activity
(id, postid, `type`, `commentText`, `date`)
VALUES
(789, 123, 'type1', 'any comment', '2016-10-26 00:00:00'),
(821, 125, 'type2', 'another comment', '2016-10-26 00:00:00'),
(824, 125, 'type3', 'third comment', '2016-10-27 00:00:00'),
(912, 123, 'typeAB', 'comment', '2016-10-27 00:00:00');
-- See the results after the insert and the triggers.
-- For post ID = 123 you will see that the Posts table has been updated with
-- the most recent activity:
-- activity ID=912 in position Posts.ActivityID1
-- activity ID=789 in position Posts.ActivityID2
-- no value in position Posts.ActivityID3
select * from Posts;
-- NOW, insert two more records for post ID = 123.
-- you will see the shift of ActivityIDs adjusted
INSERT INTO Activity
(id, postid, `type`, `commentText`, `date`)
VALUES
(931, 123, 'type1', 'any comment', '2016-10-28 00:00:00'),
(948, 123, 'newest', 'blah', '2016-10-29 00:00:00');
-- See the results after the insert and the triggers.
-- For post ID = 123 you will see that the Posts table has been updated with
-- the most recent activity:
-- activity ID=948 in position Posts.ActivityID1
-- activity ID=931 in position Posts.ActivityID2
-- activity ID=912 in position Posts.ActivityID3
-- Notice the FIRST activity, 789, is no longer there: once a
-- 4th entry arrived, it was pushed out.
select * from Posts;
-- Finally, query the data to get the most recent 3 items for each post.
select
p.id,
p.uuid,
p.imageLink,
p.`date`,
A1.id NewestActivityPostID,
A1.`type` NewestType,
A1.`date` NewestDate,
A2.id SecondActivityPostID,
A2.`type` SecondType,
A2.`date` SecondDate,
A3.id ThirdActivityPostID,
A3.`type` ThirdType,
A3.`date` ThirdDate
from
Posts p
left join Activity A1
on p.ActivityID1 = A1.ID
left join Activity A2
on p.ActivityID2 = A2.ID
left join Activity A3
on p.ActivityID3 = A3.ID;
You can create a test database to try this example, so as not to corrupt your own.
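If you would rather try the rolling-columns idea without touching MySQL at all, the following is a rough sketch using Python's sqlite3 module. It assumes SQLite's trigger syntax (no DELIMITER, no FOR EACH ROW needed) and a cut-down version of the two tables; the activity IDs are the ones from the example for post 123.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (
    id INTEGER PRIMARY KEY,
    ActivityID1 INT, ActivityID2 INT, ActivityID3 INT
);
CREATE TABLE Activity (id INTEGER PRIMARY KEY, postid INT, commentText TEXT);

-- Shift the stored IDs one slot down and put the new one in front.
CREATE TRIGGER ActivityRecAdded AFTER INSERT ON Activity
BEGIN
    UPDATE Posts
    SET ActivityID3 = ActivityID2,
        ActivityID2 = ActivityID1,
        ActivityID1 = NEW.id
    WHERE id = NEW.postid;
END;

INSERT INTO Posts (id) VALUES (123);
""")

# Insert four activities for post 123, oldest first.
for activity_id in (789, 912, 931, 948):
    conn.execute(
        "INSERT INTO Activity (id, postid, commentText) VALUES (?, 123, 'comment')",
        (activity_id,),
    )

latest = conn.execute(
    "SELECT ActivityID1, ActivityID2, ActivityID3 FROM Posts WHERE id = 123"
).fetchone()
print(latest)
```

Note that this trigger only reacts to inserts. If comments can be deleted, the three columns would need to be refreshed by some other means.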
This will probably get rid of the illegal mix of collations... Just after establishing the connection, perform this query:
SET NAMES utf8 COLLATE utf8_unicode_ci;
For the question about the 'latest 2', please use the mysql commandline tool and run SHOW CREATE TABLE Posts and provide the output. (Ditto for the other relevant tables.) Phpmyadmin (and other UIs) have a way to perform the query without getting to a command line.
You can get there with a pretty simple query by using sub-queries. First I specify the user in the WHERE clause and join the posts, because that seems more logical to me. Then I get all the likes for a post with a sub-query.
Now, instead of grouping and limiting the group size, we join only the rows we want by limiting the count of comments that are newer than the one we are currently looking at.
Use INNER JOIN Activity instead if you only want to show posts with at least one comment.
SELECT
u.id,
u.username,
u.fullname,
u.profileImage,
p.uuid,
p.caption,
p.path,
p.date,
(SELECT COUNT(*) FROM Activity v WHERE v.uuidPost = p.uuid AND v.type = 'like') likes,
a.commentText,
a.date
FROM
Users u INNER JOIN
Posts p ON p.id = u.id LEFT JOIN
Activity a ON a.uuidPost = p.uuid AND a.type = 'comment' AND 2 > (
SELECT COUNT(*) FROM Activity v
WHERE v.uuidPost = p.uuid AND v.type = 'comment' AND v.date > a.date)
WHERE
u.id = 145
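The "count the newer rows" condition is the core of this approach, so here it is on its own as a small sketch. It assumes Python's sqlite3 module as a stand-in for MySQL and a cut-down Activity table; the posts 'p1' and 'p2' and their comments are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Activity (uuidPost TEXT, type TEXT, commentText TEXT, date TEXT);
INSERT INTO Activity VALUES
  ('p1', 'comment', 'c1', '2016-07-05'),
  ('p1', 'comment', 'c2', '2016-07-06'),
  ('p1', 'comment', 'c3', '2016-07-07'),
  ('p1', 'like',    '',   '2016-07-08'),
  ('p2', 'comment', 'd1', '2016-07-01');
""")

# A comment is kept when fewer than 2 comments on the same post are newer than it.
latest_two = conn.execute("""
SELECT a.uuidPost, a.commentText
FROM Activity a
WHERE a.type = 'comment'
  AND 2 > (SELECT COUNT(*)
           FROM Activity v
           WHERE v.uuidPost = a.uuidPost
             AND v.type = 'comment'
             AND v.date > a.date)
ORDER BY a.uuidPost, a.date DESC
""").fetchall()

print(latest_two)
```

One caveat of this technique: if several comments on a post share exactly the same timestamp, more than two rows can come back for that post, because none of the tied rows counts as newer than the others.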
That said a redesign would probably be best, also performance-wise (Activity will soon contain a lot of entries and they always have to be filtered for the desired type). The user table is okay with the id auto-incremented and as primary key. For the posts I would also add an auto-incremented id as primary key and user_id as foreign key (you can also decide what to do on deletion, e.g. with cascade all his posts would also be deleted automatically).
For the comments and likes you can create separated tables with the two foreign keys user_id and post_id (simple example, like this you can only like posts and nothing else, but if there are not many different kind of likes it could still be good to create a post_likes and few other ..._likes tables, you have to think about how this data is usually queried, if those likes are mostly independent from each other it's probably a good choice).
So I am working with a client to implement a similar system as the "badges and privileges system" on StackExchange. Although in her system, she is looking to use points and rewards for her staff. It's the same basic principle. The users are rewarded points for good team work and gain rewards from these points. I thought it would be handy to add the same kind of feature which SE uses to display these in the top nav bar, where it shows your rep and badges in order of the date you have earned either of them. This is my issue, I have found help retrieving the data together from the two separate tables but am not sure how I would display these results in order of date earned? As an example:
User ID #1 has earned 50 points on 18/12/2015 would be in ap_user_points table
User ID #1 has earned 'The Gift Voucher' reward on '17/12/2015'
If I simply:
echo $row8['reward'] . $row8['points_added']
It would echo as:
The Gift Voucher 50
Where I need it in order by date as:
50
The Gift Voucher
If you look at your rep and badge icon in the nav bar you'll see what I'm getting at here, it's a similar system.
<?php
$user_id = $_SESSION['userid'];
$sql8 = "
SELECT r.reward_id,
r.user_id,
r.reward as reward,
r.date_earned as date_earned,
r.badge_desc,
NULL AS points_added,
NULL AS added_for,
NULL AS date_added
FROM ap_user_rewards as r
WHERE r.user_id = '$user_id'
UNION ALL
SELECT NULL,
NULL,
NULL,
NULL,
NULL,
p.points_added AS points_added,
p.added_for AS added_for,
p.date_added AS date_added
FROM ap_user_points as p
WHERE p.user_id = '$user_id' ORDER BY date_earned DESC, date_added DESC;";
$result8 = $conn->query($sql8);
if ($result8->num_rows > 0) {
// output data of each row
while($row8 = $result8->fetch_assoc()) {
////// NOT SURE WHAT TO ECHO HERE?
}
}
?>
Add another column to the result set, and populate it from both queries: it looks like it would be the date_earned expression in the first query and the date_added expression in the second query. When those are in the same column, ordering is easy. (This also assumes that these expressions are of the same or compatible datatypes, preferably DATE, DATETIME or TIMESTAMP.)
Then you can order by ordinal position, e.g. ORDER BY 2 to order by the second column in the resultset.
SELECT a1
, b1
, NULL
, NULL
, a1 AS sortexpr
FROM ...
UNION ALL
SELECT NULL
, NULL
, x2
, y2
, x2 AS sortexpr
FROM ...
ORDER BY 5 DESC
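Here is that pattern applied to the two tables from the question, as a sketch you can run locally. It assumes Python's sqlite3 module rather than MySQL and PHP, and the table definitions are cut down to the columns needed; the two sample rows are the ones described in the question. It also binds the user id as a parameter instead of interpolating it into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ap_user_rewards (user_id INT, reward TEXT, date_earned TEXT);
CREATE TABLE ap_user_points  (user_id INT, points_added INT, date_added TEXT);
INSERT INTO ap_user_rewards VALUES (1, 'The Gift Voucher', '2015-12-17');
INSERT INTO ap_user_points  VALUES (1, 50, '2015-12-18');
""")

# Both halves of the UNION ALL fill the same third column (sortexpr),
# so a single ORDER BY on that column sorts the combined rows.
rows = conn.execute("""
SELECT r.reward AS reward, NULL AS points_added, r.date_earned AS sortexpr
FROM ap_user_rewards AS r
WHERE r.user_id = ?
UNION ALL
SELECT NULL, p.points_added, p.date_added
FROM ap_user_points AS p
WHERE p.user_id = ?
ORDER BY 3 DESC
""", (1, 1)).fetchall()

for reward, points_added, sortexpr in rows:
    print(sortexpr, reward if reward is not None else points_added)
```

With the sample rows, the 50 points (18/12/2015) come out ahead of the gift voucher (17/12/2015), which is the order the question asks for.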
That's just one possibility. If you can't add an extra column, to line up the expressions from the two queries, then you need a way to discriminate which query is returning the row. I typically include a literal as a discriminator column.
Then you can use implicit-style UNION syntax, wrapping the queries in parens...
( SELECT 'q1' AS `source`
, a1
, b1 AS date_earned
, NULL
, NULL AS date_added
FROM ...
)
UNION ALL
( SELECT 'q2' AS `source`
, NULL
, NULL AS date_earned
, x2
, y2 AS date_added
FROM ...
)
ORDER BY IF(`source`='q1',date_earned,date_added) DESC
Followup
I may have misunderstood the question. I thought the question was how to get the rows from a UNION/UNION ALL returned in a particular order.
Personally, I would write the query to include a discriminator column, and then line up the columns as much as I could, so they would be processed the same.
As an example:
SELECT 'reward' AS `source`
, r.date_earned AS `seq`
, r.user_id AS `user_id`
, r.date_earned AS `date_earned`
, r.reward_id
, r.reward
, r.badge_desc
, NULL AS `points_added`
, NULL AS `added_for`
FROM r ...
UNION ALL
SELECT 'points' AS `source`
, p.date_added AS `seq`
, p.user_id AS `user_id`
, p.date_added AS `date_earned`
, NULL
, NULL
, NULL
, p.points_added AS `points_added`
, p.added_for AS `added_for`
FROM p ...
ORDER BY 2 DESC, 1 DESC
(It's probably not really necessary to return user_id, since we already know what the value will be. I've returned it here to demonstrate how the columns from the two resultsets can be "lined up".)
Then, when I fetched the rows...
if ( $row8['source'] == 'points' ) {
# process columns from a row of 'points' type
echo $row8['added_for'];
echo $row8['user_id'];
} elseif ( $row8['source'] == 'reward' ) {
# process columns from a row of 'reward' type
echo $row8['badge_desc'];
echo $row8['user_id'];
}
That's how I would do it.
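The same dispatch on the discriminator column, sketched in Python for anyone who wants to test the logic outside PHP. The describe function, the row dictionaries and the text values for added_for and badge_desc are hypothetical; only the column names and the two sample events come from the question.

```python
def describe(row):
    """Return the text to display for one row of the combined result."""
    if row["source"] == "points":
        return f"{row['points_added']} points ({row['added_for']})"
    if row["source"] == "reward":
        return f"{row['reward']} ({row['badge_desc']})"
    raise ValueError(f"unknown source: {row['source']!r}")


# Hypothetical rows, already ordered newest first by the query.
rows = [
    {"source": "points", "seq": "2015-12-18", "points_added": 50,
     "added_for": "team work", "reward": None, "badge_desc": None},
    {"source": "reward", "seq": "2015-12-17", "points_added": None,
     "added_for": None, "reward": "The Gift Voucher", "badge_desc": "voucher"},
]

lines = [describe(row) for row in rows]
for line in lines:
    print(line)
```

Raising on an unknown source is a deliberate choice here: if a third kind of row is ever added to the UNION, the display code fails loudly instead of silently printing nothing.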
Right now I'm creating an online game where I list the last transfers of players.
The table that handles the history of players, has the columns history_join_date and history_end_date.
When history_end_date is filled, it means the player left a club; when it still has the default value (0000-00-00 00:00:00) and history_join_date has a date, it means the player joined the club on that date.
Right now, I have the following query:
SELECT
player_id,
player_nickname,
team_id,
team_name,
history_join_date,
history_end_date
FROM
players
INNER JOIN history
ON history.history_user_id = players.player_id
INNER JOIN teams
ON history.history_team_id = teams.team_id
ORDER BY
history_end_date DESC,
history_join_date DESC
LIMIT 7
However, this query returns something like the following (after filtering with the PHP below):
(22-Aug-2012 23:05): Folha has left Portuguese Haxball Team.
(22-Aug-2012 00:25): mancini has left United.
(21-Aug-2012 01:29): PatoDaOldSchool has left Reign In Power.
(22-Aug-2012 23:37): Master has joined Born To Win.
(22-Aug-2012 23:28): AceR has joined Born To Win.
(22-Aug-2012 23:08): Nasri has joined Porto Club of Haxball.
(22-Aug-2012 18:53): Lloyd Banks has joined ARRIBA.
PHP Filter:
foreach ($transfers as $transfer) {
//has joined
if($transfer['history_end_date']<$transfer['history_join_date']) {
$type = ' has joined ';
$date = date("d-M-Y H:i", strtotime($transfer['history_join_date']));
} else {
$type = ' has left ';
$date = date("d-M-Y H:i", strtotime($transfer['history_end_date']));
}
As you can see, in the transfers order, the date is not being followed strictly (22-Aug => 21-Aug => 22-Aug).
What am I missing in the SQL?
Regards!
The issue is you are ordering based upon two different values. So your results are ordered first by history_end_date, and when the end dates are equal (i.e. when it is the default value), they are then ordered by history_join_date
(Note that your first results are all ends, and then your subsequent results are all joins, and each subset is properly ordered).
How much control do you have over this data structure? You might be able to restructure the history table such that there is only a single date, and a history type of JOINED or END... You might be able to make a view of joined_date and end_date and sort across that...
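To see why the two-column ordering produces that result, here is a small Python sketch that sorts a few of the transfers both ways. The leave and join times shown in the question are used as-is; the join dates of the players who left are made up, and event_date is a hypothetical helper that picks whichever date describes the event.

```python
DEFAULT_END = "0000-00-00 00:00:00"

# (player, history_join_date, history_end_date)
history = [
    ("Folha",           "2012-08-01 00:04:00", "2012-08-22 23:05:00"),
    ("PatoDaOldSchool", "2012-08-19 01:29:00", "2012-08-21 01:29:00"),
    ("Master",          "2012-08-22 23:37:00", DEFAULT_END),
    ("Lloyd Banks",     "2012-08-22 18:53:00", DEFAULT_END),
]

# ORDER BY history_end_date DESC, history_join_date DESC:
# every row with a real end date sorts ahead of every row with the default.
two_column_order = [h[0] for h in
                    sorted(history, key=lambda h: (h[2], h[1]), reverse=True)]


def event_date(row):
    """Pick the date of the event itself: the end date if set, else the join date."""
    _, joined, ended = row
    return ended if ended != DEFAULT_END else joined


# Ordering by the single event date gives a true timeline.
single_date_order = [h[0] for h in sorted(history, key=event_date, reverse=True)]

print(two_column_order)
print(single_date_order)
```

The first list puts both leavers ahead of both joiners, just like the output in the question; the second interleaves them by the time of the event.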
From what you have in the question I made up the following DDL & Data:
create table players (
player_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
player_nickname VARCHAR(255) NOT NULL UNIQUE
);
create table teams (
team_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
team_name VARCHAR(255) NOT NULL UNIQUE
);
create table history (
history_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
history_user_id INT NOT NULL,
history_team_id INT NOT NULL,
history_join_date DATETIME NOT NULL,
history_end_date DATETIME NOT NULL DEFAULT "0000-00-00 00:00:00"
);
insert into players VALUES
(1,'Folha'),
(2,'mancini'),
(3,'PatoDaOldSchool'),
(4,'Master'),
(5,'AceR'),
(6,'Nasri'),
(7,'Lloyd Banks');
insert into teams VALUES
(1,'Portuguese Haxball Team'),
(2,'United'),
(3,'Reign In Power'),
(4,'Born To Win'),
(5,'Porto Club of Haxball'),
(6,'ARRIBA');
insert into history VALUES
(DEFAULT,1,1,'2012-08-01 00:04','2012-08-22 23:05'),
(DEFAULT,2,2,'2012-08-21 19:04','2012-08-22 00:25'),
(DEFAULT,3,3,'2012-08-19 01:29','2012-08-21 01:29'),
(DEFAULT,4,4,'2012-08-22 23:37',DEFAULT),
(DEFAULT,5,4,'2012-08-22 23:28',DEFAULT),
(DEFAULT,6,5,'2012-08-22 23:08',DEFAULT),
(DEFAULT,7,6,'2012-08-22 18:53',DEFAULT);
SOLUTION ONE - History Event View
This is obviously not the only solution (and you'd have to evaluate the options as they suit your needs), but you could create a view in MySQL for your history events, join to it, and use it for ordering, similar to the following:
create view historyevent (
event_user_id,
event_team_id,
event_date,
event_type
) AS
SELECT
history_user_id,
history_team_id,
history_join_date,
'JOIN'
FROM history
UNION
SELECT
history_user_id,
history_team_id,
history_end_date,
'END'
FROM history
WHERE history_end_date <> "0000-00-00 00:00:00";
Your select then becomes:
SELECT
player_id,
player_nickname,
team_id,
team_name,
event_date,
event_type
FROM players
INNER JOIN historyevent
ON historyevent.event_user_id = players.player_id
INNER JOIN teams
ON historyevent.event_team_id = teams.team_id
ORDER BY
event_date DESC;
Benefit here is you can get both joins and leaves for the same player.
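To try the idea without a MySQL server, here is a rough sketch using Python's built-in sqlite3 module. It is an approximation only: SQLite stores the dates as text, the players and teams tables are left out, and the column aliases replace the column list of the MySQL view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE history (
    history_user_id   INTEGER NOT NULL,
    history_team_id   INTEGER NOT NULL,
    history_join_date TEXT NOT NULL,
    history_end_date  TEXT NOT NULL DEFAULT '0000-00-00 00:00'
);
INSERT INTO history VALUES
    (1, 1, '2012-08-01 00:04', '2012-08-22 23:05'),
    (2, 2, '2012-08-21 19:04', '2012-08-22 00:25'),
    (3, 3, '2012-08-19 01:29', '2012-08-21 01:29'),
    (4, 4, '2012-08-22 23:37', '0000-00-00 00:00'),
    (5, 4, '2012-08-22 23:28', '0000-00-00 00:00'),
    (6, 5, '2012-08-22 23:08', '0000-00-00 00:00'),
    (7, 6, '2012-08-22 18:53', '0000-00-00 00:00');

-- One row per event: every join, plus every end that is not the default.
CREATE VIEW historyevent AS
    SELECT history_user_id   AS event_user_id,
           history_team_id   AS event_team_id,
           history_join_date AS event_date,
           'JOIN'            AS event_type
    FROM history
    UNION
    SELECT history_user_id, history_team_id, history_end_date, 'END'
    FROM history
    WHERE history_end_date <> '0000-00-00 00:00';
""")

events = conn.execute(
    "SELECT event_user_id, event_type, event_date "
    "FROM historyevent ORDER BY event_date DESC"
).fetchall()

for event in events:
    print(event)
```

With the sample data this yields ten events (seven joins and three ends) in a single date order, so a player's join and later departure both appear.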
SOLUTION TWO - Pseudo column. Use the IF construct to pick one column or the other.
SELECT
player_id,
player_nickname,
team_id,
team_name,
history_join_date,
history_end_date,
IF(history_end_date>history_join_date,history_end_date,history_join_date) as order_date
FROM
players
INNER JOIN history
ON history.history_user_id = players.player_id
INNER JOIN teams
ON history.history_team_id = teams.team_id
ORDER BY
order_date DESC;
Building on @Barmar's answer, you can also use GREATEST() to pick the greatest of the arguments. (MAX() is a grouping function, so it is not actually what you're looking for.)
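As a quick sanity check of the "latest of the two dates" idea, the following Python sketch sorts the sample rows by the larger of the join and end dates. It only mimics what IF(...) or GREATEST(...) would compute; the tuple layout and the NO_END constant are assumptions made for the example.

```python
# (nickname, history_join_date, history_end_date), dates as sortable strings.
NO_END = "0000-00-00 00:00"  # stand-in for the default end date

history = [
    ("Folha", "2012-08-01 00:04", "2012-08-22 23:05"),
    ("mancini", "2012-08-21 19:04", "2012-08-22 00:25"),
    ("PatoDaOldSchool", "2012-08-19 01:29", "2012-08-21 01:29"),
    ("Master", "2012-08-22 23:37", NO_END),
    ("AceR", "2012-08-22 23:28", NO_END),
    ("Nasri", "2012-08-22 23:08", NO_END),
    ("Lloyd Banks", "2012-08-22 18:53", NO_END),
]


def order_date(row):
    """Return the later of the join date and the end date for one row."""
    _, joined, ended = row
    return max(joined, ended)


# Mimics: ORDER BY GREATEST(history_join_date, history_end_date) DESC
by_latest_event = sorted(history, key=order_date, reverse=True)

for row in by_latest_event:
    print(order_date(row), row[0])
```

Because the default end date sorts below any real date, rows without an end fall back to their join date, and the list comes out in one consistent date order.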
I think what you want is:
ORDER BY GREATEST(history_join_date, history_end_date) DESC
This post is taking a substantial amount of time to type because I'm trying to be as clear as possible, so please bear with me if it is still unclear.
Basically, what I have are a table of posts in the database which users can add privacy settings to.
ID | owner_id | post | other_info | privacy_level (int value)
From there, users can set their privacy details, making the post viewable by all (privacy_level = 0), friends (privacy_level = 1), no one (privacy_level = 3), or specific people or filters (privacy_level = 4). For privacy level 4, the query references the table "post_privacy_includes_for" in a subquery to see whether the user (or a filter the user belongs to) exists in a row in the table.
ID | post_id | user_id | list_id
Also, the user can prevent some people within a larger group from viewing their post by excluding them (e.g., having it set for everyone to view but hiding it from a stalker). For this, another reference table is added, "post_privacy_exclude_from"; its setup is identical to that of "post_privacy_includes_for".
My problem is that this does not scale. At all. At the moment, there are about 1-2 million posts, the majority of them set to be viewable by everyone. For each post on the page it must check whether there is a row that excludes the post from being shown to the user. This is really slow on a page that can be filled with 100-200 posts. It can take up to 2-4 seconds, especially when additional constraints are added to the query.
This also creates extremely large and complex queries that are just... awkward.
SELECT t.*
FROM posts t
WHERE ( (t.privacy_level = 3
AND t.owner_id = ?)
OR (t.privacy_level = 4
AND EXISTS
( SELECT i.id
FROM PostPrivacyIncludeFor i
WHERE i.user_id = ?
AND i.thought_id = t.id)
OR t.privacy_level = 4
AND t.owner_id = ?)
OR (t.privacy_level = 4
AND EXISTS
(SELECT i2.id
FROM PostPrivacyIncludeFor i2
WHERE i2.thought_id = t.id
AND EXISTS
(SELECT r.id
FROM FriendFilterIds r
WHERE r.list_id = i2.list_id
AND r.friend_id = ?))
OR t.privacy_level = 4
AND t.owner_id = ?)
OR (t.privacy_level = 1
AND EXISTS
(SELECT G.id
FROM Following G
WHERE follower_id = t.owner_id
AND following_id = ?
AND friend = 1)
OR t.privacy_level = 1
AND t.owner_id = ?)
OR (NOT EXISTS
(SELECT e.id
FROM PostPrivacyExcludeFrom e
WHERE e.thought_id = t.id
AND e.user_id = ?
AND NOT EXISTS
(SELECT e2.id
FROM PostPrivacyExcludeFrom e2
WHERE e2.thought_id = t.id
AND EXISTS
(SELECT l.id
FROM FriendFilterIds l
WHERE l.list_id = e2.list_id
AND l.friend_id = ?)))
AND t.privacy_level IN (0, 1, 4))
AND t.owner_id = ?
ORDER BY t.created_at LIMIT 100
(mock up query, similar to the query I use now in Doctrine ORM. It's a mess, but you get what I am saying.)
I guess my question is, how would you approach this situation to optimize it? Is there a better way to set up my database? I'm willing to completely scrap the method I have currently built up, but I wouldn't know what to move onto.
Thanks guys.
Updated: Fix the query to reflect the values I defined for privacy level above (I forgot to update it because I simplified the values)
Your query is too long to give a definitive solution for, but the approach I would follow is to simplify the data lookups by converting the sub-queries into joins, and then build the logic into the where clause and column list of the select statement:
select t.*, i.*, r.*, G.*, e.* from posts t
left join PostPrivacyIncludeFor i on i.user_id = ? and i.thought_id = t.id
left join FriendFilterIds r on r.list_id = i.list_id and r.friend_id = ?
left join Following G on follower_id = t.owner_id and G.following_id = ? and G.friend=1
left join PostPrivacyExcludeFrom e on e.thought_id = t.id and e.user_id = ?
(This might need expanding: I couldn't follow the logic of the final clause.)
If you can get the simple select working fast AND including all the information needed, then all you need to do is build up the logic in the select list and where clause.
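As a rough illustration of building the logic into the where clause, here is a small sketch using Python's sqlite3 module. The table layouts are cut down to the columns the joins need, the list-based filter (FriendFilterIds) is left out, and the WHERE conditions are one reading of the privacy rules in the question, not a verified rewrite of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, owner_id INTEGER, privacy_level INTEGER);
CREATE TABLE PostPrivacyIncludeFor (id INTEGER PRIMARY KEY, thought_id INTEGER, user_id INTEGER);
CREATE TABLE PostPrivacyExcludeFrom (id INTEGER PRIMARY KEY, thought_id INTEGER, user_id INTEGER);
CREATE TABLE Following (id INTEGER PRIMARY KEY, follower_id INTEGER, following_id INTEGER, friend INTEGER);

-- Made-up sample data; the viewer is user 10.
INSERT INTO posts VALUES
    (1, 20, 0), (2, 20, 1), (3, 30, 1), (4, 30, 4),
    (5, 30, 4), (6, 20, 0), (7, 10, 3), (8, 20, 3);
INSERT INTO Following VALUES (1, 20, 10, 1);            -- owner 20 is friends with 10
INSERT INTO PostPrivacyIncludeFor VALUES (1, 4, 10);    -- post 4 is shared with 10
INSERT INTO PostPrivacyExcludeFrom VALUES (1, 6, 10);   -- post 6 is hidden from 10
""")

VISIBLE_POSTS = """
SELECT DISTINCT t.id
FROM posts t
LEFT JOIN PostPrivacyIncludeFor i
       ON i.thought_id = t.id AND i.user_id = :viewer
LEFT JOIN Following g
       ON g.follower_id = t.owner_id AND g.following_id = :viewer AND g.friend = 1
LEFT JOIN PostPrivacyExcludeFrom e
       ON e.thought_id = t.id AND e.user_id = :viewer
WHERE e.id IS NULL                                      -- not excluded
  AND (t.owner_id = :viewer                             -- own post
       OR t.privacy_level = 0                           -- public
       OR (t.privacy_level = 1 AND g.id IS NOT NULL)    -- friends only
       OR (t.privacy_level = 4 AND i.id IS NOT NULL))   -- explicitly included
ORDER BY t.id
"""

visible = [row[0] for row in conn.execute(VISIBLE_POSTS, {"viewer": 10})]
print(visible)
```

Each left join either finds a matching row or yields NULLs, so the privacy rules reduce to plain IS NULL / IS NOT NULL tests. DISTINCT guards against duplicate rows if a join ever matches more than once.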
Had a quick stab at simplifying this without re-working your original design too much.
Using this solution your web page can now simply call the following stored procedure to get a list of filtered posts for a given user within a specified period.
call list_user_filtered_posts( <user_id>, <day_interval> );
The whole script can be found here : http://pastie.org/1212812
I haven't fully tested all of this and you may find this solution isn't performant enough for your needs but it may help you in fine tuning/modifying your existing design.
Tables
Dropped your post_privacy_exclude_from table and added a user_stalkers table, which works pretty much like the inverse of user_friends. Kept the original post_privacy_includes_for table as per your design, as this allows a user to restrict a specific post to a subset of people.
drop table if exists users;
create table users
(
user_id int unsigned not null auto_increment primary key,
username varbinary(32) unique not null
)
engine=innodb;
drop table if exists user_friends;
create table user_friends
(
user_id int unsigned not null,
friend_user_id int unsigned not null,
primary key (user_id, friend_user_id)
)
engine=innodb;
drop table if exists user_stalkers;
create table user_stalkers
(
user_id int unsigned not null,
stalker_user_id int unsigned not null,
primary key (user_id, stalker_user_id)
)
engine=innodb;
drop table if exists posts;
create table posts
(
post_id int unsigned not null auto_increment primary key,
user_id int unsigned not null,
privacy_level tinyint unsigned not null default 0,
post_date datetime not null,
key user_idx(user_id),
key post_date_user_idx(post_date, user_id)
)
engine=innodb;
drop table if exists post_privacy_includes_for;
create table post_privacy_includes_for
(
post_id int unsigned not null,
user_id int unsigned not null,
primary key (post_id, user_id)
)
engine=innodb;
Stored Procedures
The stored procedure is relatively simple - it initially selects ALL posts within the specified period and then filters out posts as per your original requirements. I have not performance tested this sproc with large volumes but as the initial selection is relatively small it should be performant enough as well as simplifying your application/middle tier code.
drop procedure if exists list_user_filtered_posts;
delimiter #
create procedure list_user_filtered_posts
(
in p_user_id int unsigned,
in p_day_interval tinyint unsigned
)
proc_main:begin
drop temporary table if exists tmp_posts;
drop temporary table if exists tmp_priv_posts;
-- select ALL posts in the required date range (or whatever selection criteria you require)
create temporary table tmp_posts engine=memory
select
p.post_id, p.user_id, p.privacy_level, 0 as deleted
from
posts p
where
p.post_date between now() - interval p_day_interval day and now()
order by
p.user_id;
-- purge stalker posts (0,1,3,4)
update tmp_posts
inner join user_stalkers us on us.user_id = tmp_posts.user_id and us.stalker_user_id = p_user_id
set
tmp_posts.deleted = 1
where
tmp_posts.user_id != p_user_id;
-- purge other users private posts (3)
update tmp_posts set deleted = 1 where user_id != p_user_id and privacy_level = 3;
-- purge friend only posts (1) i.e where p_user_id is not a friend of the poster
/*
requires another temp table due to mysql temp table problem/bug
http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html
*/
-- the private posts (1) this user can see
create temporary table tmp_priv_posts engine=memory
select
tp.post_id
from
tmp_posts tp
inner join user_friends uf on uf.user_id = tp.user_id and uf.friend_user_id = p_user_id
where
tp.user_id != p_user_id and tp.privacy_level = 1;
-- remove friends-only posts this user can't see (the user's own posts are kept)
update tmp_posts
left outer join tmp_priv_posts tpp on tmp_posts.post_id = tpp.post_id
set
tmp_posts.deleted = 1
where
tpp.post_id is null and tmp_posts.privacy_level = 1 and tmp_posts.user_id != p_user_id;
-- purge filtered (4)
truncate table tmp_priv_posts; -- reuse tmp table
insert into tmp_priv_posts
select
tp.post_id
from
tmp_posts tp
inner join post_privacy_includes_for ppif on tp.post_id = ppif.post_id and ppif.user_id = p_user_id
where
tp.user_id != p_user_id and tp.privacy_level = 4;
-- remove filtered posts this user can't see (the user's own posts are kept)
update tmp_posts
left outer join tmp_priv_posts tpp on tmp_posts.post_id = tpp.post_id
set
tmp_posts.deleted = 1
where
tpp.post_id is null and tmp_posts.privacy_level = 4 and tmp_posts.user_id != p_user_id;
drop temporary table if exists tmp_priv_posts;
-- output filtered posts (display ALL of these on web page)
select
p.*
from
posts p
inner join tmp_posts tp on p.post_id = tp.post_id
where
tp.deleted = 0
order by
p.post_id desc;
-- clean up
drop temporary table if exists tmp_posts;
end proc_main #
delimiter ;
Test Data
Some basic test data.
insert into users (username) values ('f00'),('bar'),('alpha'),('beta'),('gamma'),('omega');
insert into user_friends values
(1,2),(1,3),(1,5),
(2,1),(2,3),(2,4),
(3,1),(3,2),
(4,5),
(5,1),(5,4);
insert into user_stalkers values (4,1);
insert into posts (user_id, privacy_level, post_date) values
-- public (0)
(1,0,now() - interval 8 day),
(1,0,now() - interval 8 day),
(2,0,now() - interval 7 day),
(2,0,now() - interval 7 day),
(3,0,now() - interval 6 day),
(4,0,now() - interval 6 day),
(5,0,now() - interval 5 day),
-- friends only (1)
(1,1,now() - interval 5 day),
(2,1,now() - interval 4 day),
(4,1,now() - interval 4 day),
(5,1,now() - interval 3 day),
-- private (3)
(1,3,now() - interval 3 day),
(2,3,now() - interval 2 day),
(4,3,now() - interval 2 day),
-- filtered (4)
(1,4,now() - interval 1 day),
(4,4,now() - interval 1 day),
(5,4,now());
insert into post_privacy_includes_for values (15,4), (16,1), (17,6);
Testing
As I mentioned before I've not fully tested this but on the surface it seems to be working.
select * from posts;
call list_user_filtered_posts(1,14);
call list_user_filtered_posts(6,14);
call list_user_filtered_posts(1,7);
call list_user_filtered_posts(6,7);
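If you want something to compare the procedure's output against, the filtering rules can be sketched in a few lines of Python over the same test data. This is only a cross-check under stated assumptions: the names Post and visible_post_ids are made up for the example, the date range is ignored (so it corresponds to an interval that covers every post), and the viewer's own posts are assumed to stay visible at every privacy level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: int
    user_id: int
    privacy_level: int  # 0 public, 1 friends only, 3 private, 4 filtered


def visible_post_ids(viewer_id, posts, friends, stalkers, includes):
    """Return the ids of the posts the viewer may see, newest id first."""
    visible = []
    for post in posts:
        if post.user_id == viewer_id:
            visible.append(post.post_id)  # assumption: own posts are always shown
            continue
        if (post.user_id, viewer_id) in stalkers:
            continue  # the poster has flagged the viewer as a stalker
        if post.privacy_level == 3:
            continue  # private to the poster
        if post.privacy_level == 1 and (post.user_id, viewer_id) not in friends:
            continue  # friends only, and the viewer is not listed as a friend
        if post.privacy_level == 4 and (post.post_id, viewer_id) not in includes:
            continue  # filtered, and the viewer is not on the include list
        visible.append(post.post_id)
    return sorted(visible, reverse=True)


# Same rows as the test data above; post ids follow the insert order (1..17).
raw_posts = [
    (1, 0), (1, 0), (2, 0), (2, 0), (3, 0), (4, 0), (5, 0),  # public
    (1, 1), (2, 1), (4, 1), (5, 1),                          # friends only
    (1, 3), (2, 3), (4, 3),                                  # private
    (1, 4), (4, 4), (5, 4),                                  # filtered
]
posts = [Post(i, user, level) for i, (user, level) in enumerate(raw_posts, start=1)]
friends = {(1, 2), (1, 3), (1, 5), (2, 1), (2, 3), (2, 4),
           (3, 1), (3, 2), (4, 5), (5, 1), (5, 4)}           # (user_id, friend_user_id)
stalkers = {(4, 1)}                                          # (user_id, stalker_user_id)
includes = {(15, 4), (16, 1), (17, 6)}                       # (post_id, user_id)

for viewer in (1, 6):
    print(viewer, visible_post_ids(viewer, posts, friends, stalkers, includes))
```

Note that post 16 is on user 1's include list but still hidden from user 1, because its poster (user 4) lists user 1 as a stalker and that check comes first.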
Hope you find some of this of use.