I am trying to get the latest 1 or 2 comments for each post I download, much like Instagram shows the latest 3 comments under each post. So far I am getting the posts and the like counts.
Now all I need to do is figure out how to get the latest comments. I am not sure how to approach it, which is why I am hoping someone with more expertise can help me!
This is my current query:
(SELECT
P.uuid,
P.caption,
P.imageHeight,
P.path,
P.date,
U.id,
U.fullname,
U.coverImage,
U.bio,
U.username,
U.profileImage,
coalesce(Activity.LikeCNT,0),
Activity.CurrentUserLiked
FROM USERS AS U
INNER JOIN Posts AS P
ON P.id = U.id
LEFT JOIN (SELECT COUNT(DISTINCT Activity.uuidPost) LikeCNT, Activity.uuidPost, Activity.id, sum(CASE WHEN Activity.id = $id then 1 else 0 end) as CurrentUserLiked
FROM Activity Activity
WHERE type = 'like'
GROUP BY Activity.uuidPost) Activity
ON Activity.uuidPost = P.uuid
AND Activity.id = U.id
WHERE U.id = $id)
UNION
(SELECT
P.uuid,
P.caption,
P.imageHeight,
P.path,
P.date,
U.id,
U.fullname,
U.coverImage,
U.bio,
U.username,
U.profileImage,
coalesce(Activity.LikeCNT,0),
Activity.CurrentUserLiked
FROM Activity AS A
INNER JOIN USERS AS U
ON A.IdOtherUser=U.id
INNER JOIN Posts AS P
ON P.id = U.id
LEFT JOIN (SELECT COUNT(DISTINCT Activity.uuidPost) LikeCNT, Activity.uuidPost, Activity.id, sum(CASE WHEN Activity.id = $id then 1 else 0 end) as CurrentUserLiked
FROM Activity Activity
WHERE type = 'like'
GROUP BY Activity.uuidPost) Activity
ON Activity.uuidPost = P.uuid
AND Activity.id = U.id
WHERE A.id = $id)
ORDER BY date DESC
LIMIT 0, 5
Basically, the comments are stored in the same table as the likes.
So the table is Activity; it has a column comment which stores the comment text, and for those rows the "type" is equal to "comment".
Possibly not very well explained but I am willing to try and give as much detail as possible!
If anyone can help it's very much appreciated!!
UPDATE
On this query given by https://stackoverflow.com/users/1016435/xqbert I am currently getting this error:
Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8_unicode_ci,IMPLICIT) for operation '='
SELECT Posts.id,
Posts.uuid,
Posts.caption,
Posts.path,
Posts.date,
USERS.id,
USERS.username,
USERS.fullname,
USERS.profileImage,
coalesce(A.LikeCNT,0),
com.comment
FROM Posts
INNER JOIN USERS
ON Posts.id = 145
AND USERS.id = 145
LEFT JOIN (SELECT COUNT(A.uuidPost) LikeCNT, A.UUIDPost
FROM Activity A
WHERE type = 'like'
GROUP BY A.UUIDPOST) A
on A.UUIDPost=Posts.uuid
LEFT JOIN (SELECT comment, UUIDPOST, @row_num := IF(@prev_value=UUIDPOST,@row_num+1,1) as row_number,@prev_value := UUIDPOST
FROM Activity
CROSS JOIN (SELECT @row_num := 1) x
CROSS JOIN (SELECT @prev_value := '') y
WHERE type = 'comment'
ORDER BY UUIDPOST, date DESC) Com
ON Com.UUIDPOST = Posts.UUID
AND row_number <= 2
ORDER BY date DESC
LIMIT 0, 5
Latest Edit
Table structures:
Posts
----------------------------------------------------------
| id | int(11) | | not null |
| uuid | varchar(100) | utf8_unicode_ci | not null |
| imageLink | varchar(500) | utf8_unicode_ci | not null |
| date | timestamp | | not null |
----------------------------------------------------------
USERS
-------------------------------------------------------------
| id | int(11) | | not null |
| username | varchar(100) | utf8_unicode_ci | not null |
| profileImage | varchar(500) | utf8_unicode_ci | not null |
| date | timestamp | | not null |
-------------------------------------------------------------
Activity
----------------------------------------------------------
| id | int(11) | | not null |
| uuid | varchar(100) | utf8_unicode_ci | not null |
| uuidPost | varchar(100) | utf8_unicode_ci | not null |
| type | varchar(50) | utf8_unicode_ci | not null |
| commentText | varchar(500) | utf8_unicode_ci | not null |
| date | timestamp | | not null |
----------------------------------------------------------
Those are some examples. In the "Activity" table, "type" will always be equal to "comment" in this case.
Summary of everything and desired result:
When I query a user's posts, I would like to go into the "Activity" table and get the latest 2 comments for each of their posts. A post may have no comments, in which case none would be returned for it, or it may have 100 comments. Either way, I only want the 2 most recent comments.
An example could be looking at how Instagram does it. For every post they display the most recent 1, 2 or 3 comments.
Hope this helps!
Fiddle link
This error message
Illegal mix of collations (utf8_general_ci,IMPLICIT) and
(utf8_unicode_ci,IMPLICIT) for operation '='
is typically due to the definition of your columns and tables. It usually means that on either side of an equal sign there are different collations. What you need to do is choose one and include that decision in your query.
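As a rough illustration of "choose one and include that decision in your query", here are three things one could try in MySQL. The table and column names come from the question. The choice of utf8_unicode_ci, and whether to override the collation per comparison or to convert a table permanently, are assumptions about what suits your schema.

```sql
-- 1. Inspect which collation each column actually uses
--    (see the Collation column of the output).
SHOW FULL COLUMNS FROM Activity;

-- 2. Override the collation for one comparison only.
--    Assumes both columns use the utf8 character set.
SELECT p.uuid, a.commentText
FROM Posts AS p
JOIN Activity AS a
  ON a.uuidPost = p.uuid COLLATE utf8_unicode_ci
WHERE a.type = 'comment';

-- 3. Or convert a whole table so that all of its character columns
--    share one collation. This rewrites the table, so try it on a copy first.
ALTER TABLE Activity CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
```

Option 2 is the least invasive; option 3 removes the mismatch at the source, if the mismatch is between columns rather than between a column and a literal or variable.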
The collation issue here was in the CROSS JOIN that initializes @prev_value, which needed an explicit collation.
I have also slightly changed the "row_number" logic to a single cross join and moved the if logic to the extremes of the select list.
Some sample data is displayed below. Sample data is needed to test queries with. Anyone attempting to answer your question with working examples will need data. The reason I am including it here is twofold:
1. so that you will understand any result I present
2. so that in future, when you ask another SQL-related question, you understand the importance of supplying data. It is not only more convenient for us that you do this. If the asker provides the sample data then the asker will already understand it; it won't be an invention of some stranger who has devoted some of their time to help out.
Sample Data
Please note some columns are missing from the tables, only the columns specified in the table details have been included.
This sample data has 5 comments against a single post (no likes are recorded)
CREATE TABLE Posts
(
`id` int,
`uuid` varchar(7) collate utf8_unicode_ci,
`imageLink` varchar(9) collate utf8_unicode_ci,
`date` datetime
);
INSERT INTO Posts(`id`, `uuid`, `imageLink`, `date`)
VALUES
(145, 'abcdefg', 'blah blah', '2016-10-10 00:00:00') ;
CREATE TABLE USERS
(
`id` int,
`username` varchar(15) collate utf8_unicode_ci,
`profileImage` varchar(12) collate utf8_unicode_ci,
`date` datetime
) ;
INSERT INTO USERS(`id`, `username`, `profileImage`, `date`)
VALUES
(145, 'used_by_already', 'blah de blah', '2014-01-03 00:00:00') ;
CREATE TABLE Activity
(
`id` int,
`uuid` varchar(4) collate utf8_unicode_ci,
`uuidPost` varchar(7) collate utf8_unicode_ci,
`type` varchar(40) collate utf8_unicode_ci,
`commentText` varchar(11) collate utf8_unicode_ci,
`date` datetime
) ;
INSERT INTO Activity (`id`, `uuid`, `uuidPost`, `type`, `commentText`, `date`)
VALUES
(345, 'a100', 'abcdefg', 'comment', 'lah lha ha', '2016-07-05 00:00:00'),
(456, 'a101', 'abcdefg', 'comment', 'lah lah lah', '2016-07-06 00:00:00'),
(567, 'a102', 'abcdefg', 'comment', 'lha lha ha', '2016-07-07 00:00:00'),
(678, 'a103', 'abcdefg', 'comment', 'ha lah lah', '2016-07-08 00:00:00'),
(789, 'a104', 'abcdefg', 'comment', 'hla lah lah', '2016-07-09 00:00:00') ;
[SQL Standard behaviour: 2 rows per Post query]
This was my initial query, with some corrections. I changed the column order of the select list so that you will easily see some comment-related data when I present the results. Please study those results; they are provided so that you understand what the query does. Columns preceded by # do not exist in the sample data I am working with, for reasons I have already noted.
SELECT
Posts.id
, Posts.uuid
, rcom.uuidPost
, rcom.commentText
, rcom.`date` commentDate
#, Posts.caption
#, Posts.path
, Posts.`date`
, USERS.id
, USERS.username
#, USERS.fullname
, USERS.profileImage
, COALESCE(A.LikeCNT, 0) num_likes
FROM Posts
INNER JOIN USERS ON Posts.id = 145
AND USERS.id = 145
LEFT JOIN (
SELECT
COUNT(A.uuidPost) LikeCNT
, A.UUIDPost
FROM Activity A
WHERE type = 'like'
GROUP BY
A.UUIDPOST
) A ON A.UUIDPost = Posts.uuid
LEFT JOIN (
SELECT
@row_num := IF(@prev_value=UUIDPOST,@row_num+1,1) as row_number
, commentText
, uuidPost
, `date`
, @prev_value := UUIDPOST
FROM Activity
CROSS JOIN ( SELECT @row_num := 1, @prev_value := '' collate utf8_unicode_ci ) xy
WHERE type = 'comment'
ORDER BY
uuidPost
, `date` DESC
) rcom ON rcom.uuidPost = Posts.UUID
AND rcom.row_number <= 2
ORDER BY
Posts.`date` DESC
;
See a working demonstration of this query at SQLFiddle
Results:
| id | uuid | uuidPost | commentText | date | date | id | username | profileImage | num_likes |
|-----|---------|----------|-------------|------------------------|---------------------------|-----|-----------------|--------------|-----------|
| 145 | abcdefg | abcdefg | hla lah lah | July, 09 2016 00:00:00 | October, 10 2016 00:00:00 | 145 | used_by_already | blah de blah | 0 |
| 145 | abcdefg | abcdefg | ha lah lah | July, 08 2016 00:00:00 | October, 10 2016 00:00:00 | 145 | used_by_already | blah de blah | 0 |
There are 2 rows, as expected: one row for the most recent comment, and another row for the next most recent comment. This is normal behaviour for SQL, and until a comment was added under this answer, readers of the question would have assumed this normal behaviour was acceptable.
The question lacks a clearly articulated "expected result".
[Option 1: One row per Post query, with UP TO 2 comments, added columns]
In a comment below it was revealed that you did not want 2 rows per post, and that this would be an easy fix. Well, it kind of is easy, BUT there are options, and which option fits is dictated by the user's requirements. If the question had an "expected result" then we would know which option to choose. Nonetheless, here is one option:
SELECT
Posts.id
, Posts.uuid
, max(case when rcom.row_number = 1 then rcom.commentText end) Comment_one
, max(case when rcom.row_number = 2 then rcom.commentText end) Comment_two
#, Posts.caption
#, Posts.path
, Posts.`date`
, USERS.id
, USERS.username
#, USERS.fullname
, USERS.profileImage
, COALESCE(A.LikeCNT, 0) num_likes
FROM Posts
INNER JOIN USERS ON Posts.id = 145
AND USERS.id = 145
LEFT JOIN (
SELECT
COUNT(A.uuidPost) LikeCNT
, A.UUIDPost
FROM Activity A
WHERE type = 'like'
GROUP BY
A.UUIDPOST
) A ON A.UUIDPost = Posts.uuid
LEFT JOIN (
SELECT
@row_num := IF(@prev_value=UUIDPOST,@row_num+1,1) as row_number
, commentText
, uuidPost
, `date`
, @prev_value := UUIDPOST
FROM Activity
CROSS JOIN ( SELECT @row_num := 1, @prev_value := '' collate utf8_unicode_ci ) xy
WHERE type = 'comment'
ORDER BY
uuidPost
, `date` DESC
) rcom ON rcom.uuidPost = Posts.UUID
AND rcom.row_number <= 2
GROUP BY
Posts.id
, Posts.uuid
#, Posts.caption
#, Posts.path
, Posts.`date`
, USERS.id
, USERS.username
#, USERS.fullname
, USERS.profileImage
, COALESCE(A.LikeCNT, 0)
ORDER BY
Posts.`date` DESC
;
See the second query working at SQLFiddle
Results of query 2:
| id | uuid | Comment_one | Comment_two | date | id | username | profileImage | num_likes |
|-----|---------|-------------|-------------|---------------------------|-----|-----------------|--------------|-----------|
| 145 | abcdefg | hla lah lah | ha lah lah | October, 10 2016 00:00:00 | 145 | used_by_already | blah de blah | 0 |
[Option 2: One row per Post query, with the 2 most recent comments concatenated into a single comma-separated list]
SELECT
Posts.id
, Posts.uuid
, group_concat(rcom.commentText) Comments_two_concatenated
#, Posts.caption
#, Posts.path
, Posts.`date`
, USERS.id
, USERS.username
#, USERS.fullname
, USERS.profileImage
, COALESCE(A.LikeCNT, 0) num_likes
FROM Posts
INNER JOIN USERS ON Posts.id = 145
AND USERS.id = 145
LEFT JOIN (
SELECT
COUNT(A.uuidPost) LikeCNT
, A.UUIDPost
FROM Activity A
WHERE type = 'like'
GROUP BY
A.UUIDPOST
) A ON A.UUIDPost = Posts.uuid
LEFT JOIN (
SELECT
@row_num := IF(@prev_value=UUIDPOST,@row_num+1,1) as row_number
, commentText
, uuidPost
, `date`
, @prev_value := UUIDPOST
FROM Activity
CROSS JOIN ( SELECT @row_num := 1, @prev_value := '' collate utf8_unicode_ci ) xy
WHERE type = 'comment'
ORDER BY
uuidPost
, `date` DESC
) rcom ON rcom.uuidPost = Posts.UUID
AND rcom.row_number <= 2
GROUP BY
Posts.id
, Posts.uuid
#, Posts.caption
#, Posts.path
, Posts.`date`
, USERS.id
, USERS.username
#, USERS.fullname
, USERS.profileImage
, COALESCE(A.LikeCNT, 0)
ORDER BY
Posts.`date` DESC
See this third query working at SQLFiddle
Results of query 3:
| id | uuid | Comments_two_concatenated | date | id | username | profileImage | num_likes |
|-----|---------|---------------------------|---------------------------|-----|-----------------|--------------|-----------|
| 145 | abcdefg | hla lah lah,ha lah lah | October, 10 2016 00:00:00 | 145 | used_by_already | blah de blah | 0 |
[Summary]
I have presented 3 queries. Each one shows only the 2 most recent comments, but each does so in a different way. The first query (default behaviour) displays 2 rows for each post. Option 1 adds a column but removes the second row. Option 2 concatenates the 2 most recent comments.
Please note that:
1. The question lacks table definitions covering all columns.
2. The question lacks any sample data, which makes it harder for you to understand any results presented here, and also harder for us to prepare solutions.
3. The question also lacks a definitive "expected result" (the wanted output), and this has led to further complexity in answering.
I do hope the additional provided information will be of some use, and that by now you also know that it is normal for SQL to present data as multiple rows. If you do not want that normal behaviour please be specific about what you do really want in your question.
Postscript. To include yet another subquery for "follows" you may use a similar subquery to the one you already have. It may be added before or after that subquery. You may also see it in use at sqlfiddle here
LEFT JOIN (
SELECT
COUNT(*) FollowCNT
, IdOtherUser
FROM Activity
WHERE type = 'Follow'
GROUP BY
IdOtherUser
) F ON USERS.id = F.IdOtherUser
Whilst adding another subquery may resolve your desire for more information, the overall query may get slower in proportion to the growth of your data. Once you have settled on the functionality you really need it may be worthwhile considering what indexes you need on those tables. (I believe you would be advised to ask for that advice separately, and if you do make sure you include 1. the full DDL of your tables and 2. an explain plan of the query.)
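As a starting point for that index discussion, here is a hedged sketch. The index names are hypothetical and the column choices are assumptions based on the subqueries shown in this answer; only an explain plan against your real data can confirm whether they help.

```sql
-- Hypothetical index: lets the like/comment subqueries filter by type
-- and then read each post's rows in date order.
CREATE INDEX idx_activity_type_post_date
    ON Activity (`type`, uuidPost, `date`);

-- Hypothetical index for matching Activity rows back to a post by uuid.
CREATE INDEX idx_posts_uuid ON Posts (uuid);

-- Check whether the optimizer actually uses the new index.
EXPLAIN
SELECT uuidPost, COUNT(*) AS LikeCNT
FROM Activity
WHERE `type` = 'like'
GROUP BY uuidPost;
```

If the `key` column of the EXPLAIN output stays NULL, the index is not being used for that query and a different column order may be needed.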
I am a little bit lost in your query, but if you want to download data for multiple posts at once, it's not a good idea to include comment data in the first query since you would include all the data about post and posting user multiple times. You should run another query that would connect posts with comments. Something like:
SELECT
A.UUIDPost,
C.username,
C.profileImage,
B.Comment,
B.[DateField]
FROM Posts A JOIN
Activities B ON A.uuid = B.UUIDPost JOIN
Users C ON B.[UserId] = C.id
and use that data to display your comments with commenting user id, name, image etc.
To get only 3 comments per post, you can look into this post:
Select top 3 values from each group in a table with SQL
if you are sure that there are going to be no duplicate rows in the comment table or this post:
How to select top 3 values from each group in a table with SQL which have duplicates
if you're not sure about that (although due to DateField in the table, it should not be possible).
UNTESTED: I would recommend putting together an SQL fiddle with some sample data and your existing table structure showing the problem; that way we could play around with the responses and ensure functionality with your schema.
So we use variables to simulate a window function (such as row_number),
in this case @row_num and @prev_value. @row_num keeps track of the current row for each post (since a single post could have lots of comments); when a new post ID (UUIDPOST?) is encountered, @row_num is reset to 1. When the current record's UUIDPOST matches @prev_value, we simply increment the row number by 1.
This technique allows us to assign a row number based on the date or activity ID in descending order. As each cross join only results in 1 record, we don't cause duplicate records to appear. And since we then limit by row_number <= 2, we only get the two most recent comments in our newly added left join.
This assumes the relation of posts to users is many-to-one, meaning a post can only have 1 user.
Something like this, though I'm not sure about the final left join; I need to better understand the structure of the Activity table, hence my comment against the original question.
SELECT Posts.id,
Posts.uuid,
Posts.caption,
Posts.path,
Posts.date,
USERS.id,
USERS.username,
USERS.fullname,
USERS.profileImage,
coalesce(A.LikeCNT,0),
com.comment
FROM Posts
INNER JOIN USERS
ON Posts.id = 145
AND USERS.id = 145
LEFT JOIN (SELECT COUNT(A.uuidPost) LikeCNT, A.UUIDPost
FROM Activity A
WHERE type = 'like'
GROUP BY A.UUIDPOST) A
on A.UUIDPost=Posts.uuid
-- This join simulates row_number() over (partition by PostID order by activityID desc). (Nice article [here](http://preilly.me/2011/11/11/mysql-row_number/); several other examples exist on SO already.)
-- Meaning: generate a row number for each activity from 1-X, restarting at 1 for each new post, but start numbering at the newest activityID.
LEFT JOIN (SELECT comment, UUIDPOST, @row_num := IF(@prev_value=UUIDPOST,@row_num+1,1) as row_number,@prev_value := UUIDPOST
FROM Activity
CROSS JOIN (SELECT @row_num := 1) x
CROSS JOIN (SELECT @prev_value := '') y
WHERE type = 'comment'
ORDER BY UUIDPOST, date DESC -- or some other date or ID, descending
) Com
on Com.UUIDPOST = Posts.UUID
and row_number <= 2
-- Now since we have a row_number restarting at 1 for each new post, simply return only the first two rows.
ORDER BY Posts.date DESC
LIMIT 0, 5
We had to put the "and row_number <= 2" condition on the join itself. If it were put in the WHERE clause, you would lose those posts without any comments, which I think you still want.
Additionally, we should probably look at the "comment" field to make sure it's not blank or null, but let's make sure this works first.
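On MySQL 8.0 or later, the user-variable technique described above could be replaced by a real window function. The following is a minimal sketch under assumptions, not a tested drop-in: it assumes the comment column is named commentText, as in the table structure posted in the question, and it uses the alias rn because ROW_NUMBER is a reserved word in MySQL 8.0.

```sql
-- Assumes MySQL 8.0+ and the Posts/Activity columns listed in the question.
SELECT p.id,
       p.uuid,
       p.`date`,
       c.commentText,
       c.`date` AS commentDate
FROM Posts AS p
LEFT JOIN (
    SELECT uuidPost,
           commentText,
           `date`,
           -- Number each post's comments from newest (1) to oldest.
           ROW_NUMBER() OVER (PARTITION BY uuidPost
                              ORDER BY `date` DESC) AS rn
    FROM Activity
    WHERE type = 'comment'
) AS c
       ON c.uuidPost = p.uuid
      AND c.rn <= 2      -- filter in the ON clause so posts without comments are kept
WHERE p.id = 145
ORDER BY p.`date` DESC, c.rn;
```

Because the numbering is computed by the window function itself, there is no string variable to initialize, so the collation of an empty-string literal no longer comes into play.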
This type of question has been posted many times, and getting the "latest-for-each" always appears to be a stumbling block and a join/subquery nightmare for most.
Especially for a web interface, you might be better to tack on a column (or 2 or 3) to the one table that is your active "posts" table such as Latest1, Latest2, Latest3.
Then, via an insert into your comment table, have an insert trigger on your table to update the main post with the newest ID. Then you always have that ID on the table without any sub-joins. Now, as you mentioned, you might want to have the last 2 or 3 IDs, then add the 3 sample columns and have your insert trigger to the post comment detail do an update to the primary post table something like
update PrimaryPostTable
set Latest3 = Latest2,
Latest2 = Latest1,
Latest1 = NewDetailCommentID
where PostID = PostIDFromTheInsertedDetail
This would have to be formalized into a proper trigger under MySQL, but should be easy enough to implement. You could prime the list with the latest 1, then as new posts go, it would automatically roll the most recent into their 1st, 2nd, 3rd positions. Finally your query could be simplified down to something like
Select
P.PostID,
P.TopicDescription,
PD1.WhateverDetail as LatestDetail1,
PD2.WhateverDetail as LatestDetail2,
PD3.WhateverDetail as LatestDetail3
from
Posts P
LEFT JOIN PostDetail PD1
on P.Latest1 = PD1.PostDetailID
LEFT JOIN PostDetail PD2
on P.Latest2 = PD2.PostDetailID
LEFT JOIN PostDetail PD3
on P.Latest3 = PD3.PostDetailID
where
whateverCondition
Denormalizing data is typically NOT desired. However, in cases such as this, it is a great simplifier for getting these "latest" entries in a For-Each type of query. Good luck.
Here is a fully working sample in MySQL so you can see the tables and the results of the sql-inserts and the automatic stamping via the trigger to update the main post table. Then querying the post table you can see how the most recent automatically rolls into first, second and third positions. Finally a join showing how to pull all the data from each "post activity"
CREATE TABLE Posts
( id int,
uuid varchar(7),
imageLink varchar(9),
`date` datetime,
ActivityID1 int null,
ActivityID2 int null,
ActivityID3 int null,
PRIMARY KEY (id)
);
CREATE TABLE Activity
( id int,
postid int,
`type` varchar(40) collate utf8_unicode_ci,
commentText varchar(20) collate utf8_unicode_ci,
`date` datetime,
PRIMARY KEY (id)
);
DELIMITER //
CREATE TRIGGER ActivityRecAdded
AFTER INSERT ON Activity FOR EACH ROW
BEGIN
Update Posts
set ActivityID3 = ActivityID2,
ActivityID2 = ActivityID1,
ActivityID1 = NEW.ID
where
ID = NEW.POSTID;
END; //
DELIMITER ;
INSERT INTO Posts
(id, uuid, imageLink, `date`)
VALUES
(123, 'test1', 'blah', '2016-10-26 00:00:00');
INSERT INTO Posts
(id, uuid, imageLink, `date`)
VALUES
(125, 'test2', 'blah 2', '2016-10-26 00:00:00');
INSERT INTO Activity
(id, postid, `type`, `commentText`, `date`)
VALUES
(789, 123, 'type1', 'any comment', '2016-10-26 00:00:00'),
(821, 125, 'type2', 'another comment', '2016-10-26 00:00:00'),
(824, 125, 'type3', 'third comment', '2016-10-27 00:00:00'),
(912, 123, 'typeAB', 'comment', '2016-10-27 00:00:00');
-- See the results after the insert and the triggers.
-- you will see that the post table has been updated with the
-- most recent
-- activity post ID=912 in position Posts.ActivityID1
-- activity post ID=789 in position Posts.ActivityID2
-- no value in position Posts.ActivityID3
select * from Posts;
-- NOW, insert two more records for post ID = 123.
-- you will see the shift of ActivityIDs adjusted
INSERT INTO Activity
(id, postid, `type`, `commentText`, `date`)
VALUES
(931, 123, 'type1', 'any comment', '2016-10-28 00:00:00'),
(948, 123, 'newest', 'blah', '2016-10-29 00:00:00');
-- See the results after the insert and the triggers.
-- you will see that the post table has been updated with the
-- most recent
-- activity post ID=948 in position Posts.ActivityID1
-- activity post ID=931 in position Posts.ActivityID2
-- activity post ID=912 in position Posts.ActivityID3
-- notice the FIRST activity post 789 is no longer there:
-- once a 4th entry arrived, it was pushed out.
select * from Posts;
-- Finally, query the data to get the most recent 3 items for each post.
select
p.id,
p.uuid,
p.imageLink,
p.`date`,
A1.id NewestActivityPostID,
A1.`type` NewestType,
A1.`date` NewestDate,
A2.id SecondActivityPostID,
A2.`type` SecondType,
A2.`date` SecondDate,
A3.id ThirdActivityPostID,
A3.`type` ThirdType,
A3.`date` ThirdDate
from
Posts p
left join Activity A1
on p.ActivityID1 = A1.ID
left join Activity A2
on p.ActivityID2 = A2.ID
left join Activity A3
on p.ActivityID3 = A3.ID;
You can create a test database to try this example, so that you do not corrupt your own.
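The earlier remark about priming the list could be handled with a one-off backfill, run once before the trigger takes over. This is a sketch against the sample schema in this answer (Posts.id, Activity.postid); ordering by date and then by id as a tie-breaker is my assumption about what "most recent" should mean.

```sql
-- One-off backfill: fill the three "latest" slots from existing Activity rows.
-- A scalar subquery that finds no row yields NULL, so posts with fewer
-- than three activities simply keep NULL in the remaining slots.
UPDATE Posts AS p
SET p.ActivityID1 = (SELECT a.id
                     FROM Activity AS a
                     WHERE a.postid = p.id
                     ORDER BY a.`date` DESC, a.id DESC
                     LIMIT 1),
    p.ActivityID2 = (SELECT a.id
                     FROM Activity AS a
                     WHERE a.postid = p.id
                     ORDER BY a.`date` DESC, a.id DESC
                     LIMIT 1 OFFSET 1),
    p.ActivityID3 = (SELECT a.id
                     FROM Activity AS a
                     WHERE a.postid = p.id
                     ORDER BY a.`date` DESC, a.id DESC
                     LIMIT 1 OFFSET 2);
```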
This will probably get rid of the illegal mix of collations... Just after establishing the connection, perform this query:
SET NAMES utf8 COLLATE utf8_unicode_ci;
For the question about the 'latest 2', please use the mysql commandline tool and run SHOW CREATE TABLE Posts and provide the output. (Ditto for the other relevant tables.) Phpmyadmin (and other UIs) have a way to perform the query without getting to a command line.
You can get there with a pretty simple query by using sub-queries. First I specify the user in the where-clause and join the posts, because that seems more logical to me. Then I get all the likes for a post with a sub-query.
Now, instead of grouping and limiting the group size, we join only the rows we want, by limiting the count of comments dated after the one we are currently looking at.
Use INNER JOIN Activity instead if you only want to show posts with at least one comment.
SELECT
u.id,
u.username,
u.fullname,
u.profileImage,
p.uuid,
p.caption,
p.path,
p.date,
(SELECT COUNT(*) FROM Activity v WHERE v.uuidPost = p.uuid AND v.type = 'like') likes,
a.commentText,
a.date
FROM
Users u INNER JOIN
Posts p ON p.id = u.id LEFT JOIN
Activity a ON a.uuidPost = p.uuid AND a.type = 'comment' AND 2 > (
SELECT COUNT(*) FROM Activity v
WHERE v.uuidPost = p.uuid AND v.type = 'comment' AND v.date > a.date)
WHERE
u.id = 145
That said, a redesign would probably be best, also performance-wise (Activity will soon contain a lot of entries, and they always have to be filtered for the desired type). The user table is fine with an auto-incremented id as primary key. For the posts I would also add an auto-incremented id as primary key and user_id as a foreign key (you can also decide what to do on deletion; e.g. with cascade, all of a user's posts would be deleted automatically).
For the comments and likes you can create separate tables with the two foreign keys user_id and post_id. This is a simple example: designed like this, you can only like posts and nothing else. But if there are not many different kinds of likes, it could still be good to create a post_likes table and a few other ..._likes tables. You have to think about how this data is usually queried; if those likes are mostly independent from each other, it's probably a good choice.
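A minimal sketch of what such a redesign could look like follows. All table and column names here are hypothetical and would need adapting to the real schema; it assumes InnoDB, which is required for the foreign keys to be enforced.

```sql
CREATE TABLE users (
    id       INT AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(100) NOT NULL
);

CREATE TABLE posts (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    user_id    INT NOT NULL,
    caption    VARCHAR(500),
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    -- Deleting a user also deletes that user's posts.
    FOREIGN KEY (user_id) REFERENCES users (id) ON DELETE CASCADE
);

CREATE TABLE post_comments (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    post_id    INT NOT NULL,
    user_id    INT NOT NULL,
    body       VARCHAR(500) NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (post_id) REFERENCES posts (id) ON DELETE CASCADE,
    FOREIGN KEY (user_id) REFERENCES users (id) ON DELETE CASCADE,
    -- Supports "latest comments for a post" lookups.
    INDEX idx_comments_post_date (post_id, created_at)
);

CREATE TABLE post_likes (
    post_id    INT NOT NULL,
    user_id    INT NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (post_id, user_id),   -- at most one like per user per post
    FOREIGN KEY (post_id) REFERENCES posts (id) ON DELETE CASCADE,
    FOREIGN KEY (user_id) REFERENCES users (id) ON DELETE CASCADE
);
```

With this layout the like count and the latest comments each come from a table that needs no filtering by type.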
Related
I have a table ("lms_attendance") of users' check-in and out times that looks like this:
id user time io (enum)
1 9 1370931202 out
2 9 1370931664 out
3 6 1370932128 out
4 12 1370932128 out
5 12 1370933037 in
I'm trying to create a view of this table that would output only the most recent record per user id, while giving me the "in" or "out" value, so something like:
id user time io
2 9 1370931664 out
3 6 1370932128 out
5 12 1370933037 in
I'm pretty close so far, but I realized that views won't accept subqueries, which is making it a lot harder. The closest query I got was:
select
`lms_attendance`.`id` AS `id`,
`lms_attendance`.`user` AS `user`,
max(`lms_attendance`.`time`) AS `time`,
`lms_attendance`.`io` AS `io`
from `lms_attendance`
group by
`lms_attendance`.`user`,
`lms_attendance`.`io`
But what I get is :
id user time io
3 6 1370932128 out
1 9 1370931664 out
5 12 1370933037 in
4 12 1370932128 out
Which is close, but not perfect. I know that last group by shouldn't be there, but without it, it returns the most recent time, but not with its corresponding IO value.
Any ideas?
Thanks!
Query:
SQLFIDDLEExample
SELECT t1.*
FROM lms_attendance t1
WHERE t1.time = (SELECT MAX(t2.time)
FROM lms_attendance t2
WHERE t2.user = t1.user)
Result:
| ID | USER | TIME | IO |
--------------------------------
| 2 | 9 | 1370931664 | out |
| 3 | 6 | 1370932128 | out |
| 5 | 12 | 1370933037 | in |
Note that if a user has multiple records with the same "maximum" time, the query above will return more than one record. If you only want 1 record per user, use the query below:
SQLFIDDLEExample
SELECT t1.*
FROM lms_attendance t1
WHERE t1.id = (SELECT t2.id
FROM lms_attendance t2
WHERE t2.user = t1.user
ORDER BY t2.id DESC
LIMIT 1)
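Since the question asks for a view, here is a hedged sketch of wrapping this kind of query in one. As far as I know, the restriction in older MySQL versions applies to subqueries in the FROM clause of a view, not to subqueries in the WHERE clause, so this form should be accepted. The view name is hypothetical, and ordering by time with id as a tie-breaker is my variation on the query above.

```sql
-- Hypothetical view name; returns exactly one row per user.
CREATE VIEW lms_attendance_latest AS
SELECT t1.*
FROM lms_attendance AS t1
WHERE t1.id = (SELECT t2.id
               FROM lms_attendance AS t2
               WHERE t2.`user` = t1.`user`
               -- newest time first; id breaks ties between equal times
               ORDER BY t2.`time` DESC, t2.id DESC
               LIMIT 1);

SELECT * FROM lms_attendance_latest;
```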
No need to reinvent the wheel, as this is the common greatest-n-per-group problem. A very nice solution has already been presented.
I prefer the simplest solution (see SQLFiddle, updated from Justin's) without subqueries (thus easy to use in views):
SELECT t1.*
FROM lms_attendance AS t1
LEFT OUTER JOIN lms_attendance AS t2
ON t1.user = t2.user
AND (t1.time < t2.time
OR (t1.time = t2.time AND t1.Id < t2.Id))
WHERE t2.user IS NULL
This also works when there are two different records with the same greatest value within the same group, thanks to the trick with (t1.time = t2.time AND t1.Id < t2.Id). All I am doing here is ensuring that when two records of the same user have the same time, only one is chosen. It doesn't actually matter whether the criterion is Id or something else; any criterion that is guaranteed to be unique would do the job here.
Based on @TMS's answer: I like it because there's no need for subqueries, but I think omitting the 'OR' part will be sufficient (as long as a user never has two records with the same time) and much simpler to understand and read.
SELECT t1.*
FROM lms_attendance AS t1
LEFT JOIN lms_attendance AS t2
ON t1.user = t2.user
AND t1.time < t2.time
WHERE t2.user IS NULL
If you are not interested in rows with null times, you can filter them in the WHERE clause:
SELECT t1.*
FROM lms_attendance AS t1
LEFT JOIN lms_attendance AS t2
ON t1.user = t2.user
AND t1.time < t2.time
WHERE t2.user IS NULL and t1.time IS NOT NULL
Already solved, but just for the record, another approach would be to create two views...
CREATE TABLE lms_attendance
(id int, user int, time int, io varchar(3));
CREATE VIEW latest_all AS
SELECT la.user, max(la.time) time
FROM lms_attendance la
GROUP BY la.user;
CREATE VIEW latest_io AS
SELECT la.*
FROM lms_attendance la
JOIN latest_all lall
ON lall.user = la.user
AND lall.time = la.time;
INSERT INTO lms_attendance
VALUES
(1, 9, 1370931202, 'out'),
(2, 9, 1370931664, 'out'),
(3, 6, 1370932128, 'out'),
(4, 12, 1370932128, 'out'),
(5, 12, 1370933037, 'in');
SELECT * FROM latest_io;
Click here to see it in action at SQL Fiddle
If you're on MySQL 8.0 or higher you can use window functions:
Query:
DBFiddleExample
SELECT DISTINCT
FIRST_VALUE(ID) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS ID,
FIRST_VALUE(USER) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS USER,
FIRST_VALUE(TIME) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS TIME,
FIRST_VALUE(IO) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS IO
FROM lms_attendance;
Result:
| ID | USER | TIME | IO |
--------------------------------
| 2 | 9 | 1370931664 | out |
| 3 | 6 | 1370932128 | out |
| 5 | 12 | 1370933037 | in |
The advantage I see over using the solution proposed by Justin is that it enables you to select the row with the most recent data per user (or per id, or per whatever) even from subqueries without the need for an intermediate view or table.
And in case you're running HANA it is also ~7 times faster :D
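A variation on the same idea, offered as a sketch rather than a benchmarked alternative: number the rows per user with ROW_NUMBER() and keep the first one. This avoids repeating the window for every column, and changing the filter to rn <= N would return the N most recent rows per user. The alias rn and the id tie-breaker are my assumptions.

```sql
-- Assumes MySQL 8.0+ (window functions).
SELECT id, `user`, `time`, io
FROM (
    SELECT la.*,
           -- 1 = most recent row for this user; id breaks ties on equal times
           ROW_NUMBER() OVER (PARTITION BY la.`user`
                              ORDER BY la.`time` DESC, la.id DESC) AS rn
    FROM lms_attendance AS la
) AS ranked
WHERE rn = 1;
```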
Ok, this might be either a hack or error-prone, but somehow this is working as well-
SELECT id, MAX(user) as user, MAX(time) as time, MAX(io) as io FROM lms_attendance GROUP BY id;
select b.* from
(select
`lms_attendance`.`user` AS `user`,
max(`lms_attendance`.`time`) AS `time`
from `lms_attendance`
group by
`lms_attendance`.`user`) a
join
(select *
from `lms_attendance` ) b
on a.user = b.user
and a.time = b.time
I have tried one solution which works for me
SELECT user, MAX(TIME) as time
FROM lms_attendance
GROUP by user
HAVING MAX(time)
I have a very large table and all of the other suggestions here were taking a very long time to execute. I came up with this hacky method that was much faster. The downside is, if the max(date) row has a duplicate date for that user, it will return both of them.
SELECT * FROM mb_web.devices_log WHERE CONCAT(dtime, '-', user_id) in (
SELECT concat(max(dtime), '-', user_id) FROM mb_web.devices_log GROUP BY user_id
)
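A possible alternative to concatenating strings is a row-constructor comparison, sketched below on the same table. Whether it is faster than the CONCAT version is an assumption that would need checking with EXPLAIN on the real data; the idea is only that comparing the raw columns leaves the optimizer free to use an index on (user_id, dtime), which a CONCAT expression cannot use. It has the same downside with duplicate maximum dates.

```sql
SELECT l.*
FROM mb_web.devices_log AS l
WHERE (l.user_id, l.dtime) IN (
    -- latest dtime per user
    SELECT user_id, MAX(dtime)
    FROM mb_web.devices_log
    GROUP BY user_id
);
```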
select result from (
select vorsteuerid as result, count(*) as anzahl from kreditorenrechnung where kundeid = 7148
group by vorsteuerid
) a order by anzahl desc limit 0,1
I have done the same thing as below:
SELECT t1.*
FROM lms_attendance t1
WHERE t1.id in (SELECT max(t2.id) as id
FROM lms_attendance t2
group BY t2.user)
This will also reduce memory utilization.
Thanks.
Possibly you can do group by user and then order by time desc. Something like as below
SELECT * FROM lms_attendance group by user order by time desc;
Try this query:
select id,user, max(time), io
FROM lms_attendance group by user;
This worked for me:
SELECT user, time FROM
(
SELECT user, time FROM lms_attendance -- where clause
) AS T
WHERE (SELECT COUNT(0) FROM lms_attendance WHERE user = T.user AND time > T.time) = 0
ORDER BY user ASC, time DESC
I would like to fetch one row in my MySQL database for every value in an array. What I'm trying to do is get the posts which were most recently voted on. The votes table has the following structure
| id | postid | voter | vote type | time |
|====|========|=======|===========|============|
| 1 | 1 | 1 | 1 | 1445389824 |
| 2 | 2 | 6 | 1 | 1445408529 |
| 3 | 1 | 5 | 2 | 1445435978 |
I would like to select the posts that were most recently voted on, in the order they were voted on. So, for example, because the ids of the votes ordered by time from greatest to lowest is 3, 2, 1, I would like to select the post ids 1, 2, 1. But, because 1 appears twice, I would only like to select the first one, so the final result would be 1, 2.
This table is going to be very, very large, so selecting every post id and then trimming it to the desirable array using php doesn't seem like a very good idea.
Also, only the posts that are in an array should be selected. For example, selecting all of the posts without omitting duplicates would be
SELECT `postid`
FROM `votes`
WHERE `postid` IN ($posts)
ORDER BY `time` DESC
But by using this method, I would have to get rid of the duplicate entries using php, which seems like it would be very intensive.
I would also like to select the number of votes on each post in the list. I could do this in a separate query, but doing it in one would probably be faster. So, for example
SELECT COUNT(`id`)
FROM `votes`
WHERE `postid` IN ($posts)
ORDER BY `time` DESC
Would select all of the votes on the posts given. Instead, I would like it to select an array of the votes for each post, or something that could be converted to that.
Is there any MySQL operator that would allow me to select the number of votes on each post included in the array, and order them by the time the most recent post was voted on? In the above table, because there are 2 votes on post 1, and 1 vote on post 2, the result would be
array("1" => 2, "2" => 1)
Here is a possible query to get both the time of the latest vote and vote count per post:
SELECT `postid`,
MAX(time) as time,
COUNT(*) as vote_count
FROM `votes`
WHERE `postid` IN ($posts)
GROUP BY `postid`
ORDER BY 2 DESC
If you want all the other fields of these latest votes records, then you could use the above as a sub-query of a larger one:
SELECT v.`id`, v.`postid`, v.`voter`,
v.`vote_type`, v.`time`, filter.vote_count
FROM `votes` v
INNER JOIN (
SELECT `postid`,
MAX(time) as time,
COUNT(*) as vote_count
FROM `votes`
WHERE `postid` IN ($posts)
GROUP BY `postid`) filter
ON v.`postid` = filter.`postid`
AND v.`time` = filter.time
ORDER BY v.`time` DESC
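To check the grouped query against the three sample rows from the question, here is a small runnable sketch. SQLite via Python's sqlite3 stands in for MySQL, and the `posts` list plays the role of `$posts`; both are assumptions made only so the example is self-contained. The IN list is built from placeholders rather than by interpolating the ids into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE votes ("
    "id INTEGER PRIMARY KEY, postid INTEGER, voter INTEGER, "
    "vote_type INTEGER, time INTEGER)"
)
# The three rows shown in the question.
conn.executemany(
    "INSERT INTO votes VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 1, 1445389824),
        (2, 2, 6, 1, 1445408529),
        (3, 1, 5, 2, 1445435978),
    ],
)

posts = [1, 2]  # stands in for $posts
placeholders = ", ".join("?" for _ in posts)

# One row per post: latest vote time and vote count, most recent first.
rows = conn.execute(
    f"""
    SELECT postid, MAX(time) AS latest, COUNT(*) AS vote_count
    FROM votes
    WHERE postid IN ({placeholders})
    GROUP BY postid
    ORDER BY latest DESC
    """,
    posts,
).fetchall()

# Equivalent of array("1" => 2, "2" => 1); dicts keep insertion order.
vote_counts = {postid: count for postid, _latest, count in rows}
print(rows)
print(vote_counts)
```

Post 1 comes first because its most recent vote (id 3) is newer than the only vote on post 2.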
So you want to get the latest vote for each post in an array. I think you can just add a GROUP BY clause.
SELECT `postid`, COUNT(postid) AS votecount
FROM `votes`
WHERE `postid` IN ($posts)
GROUP BY `postid`
ORDER BY MAX(`time`) DESC
I am currently working with MySQL creating a view that would return the following:
NAME | EMAIL | LAST_SEEN
abby | a#l.d | 2015-10-31 14:36:26
abby | a#l.d | 2015-11-28 13:30:37
I then apply the GROUP BY name to the select query and it returns the following
NAME | EMAIL | LAST_SEEN
abby | a#l.d | 2015-10-31 14:36:26
I want to know how I can fix this query so that it returns the following:
NAME | EMAIL | LAST_SEEN
abby | a#l.d | 2015-11-28 13:30:37
The actual code is as follows:
CREATE VIEW v_user_last_seen
AS
SELECT concat_ws(' ', u.first_name, u.middle_name, u.last_name) AS user_name
,c.email
,l.in_when AS last_seen
FROM user AS u
INNER JOIN check_here_first AS c ON c.email = u.email
INNER JOIN log AS l ON l.u_id = c.username
GROUP BY user_name
ORDER BY user_name ASC
Simply use max(last_seen):
select name, email, max(last_seen)
from yourtable
group by name, email;
Try sorting your table before grouping; this should do for simple cases such as this one:
SELECT *
FROM ( SELECT *
FROM `tablename`
ORDER BY `LAST_SEEN` DESC
) temp
GROUP BY `name`
If this fails, you can do something like:
SELECT tmp1.*
FROM `tablename` tmp1
LEFT JOIN `tablename` tmp2
ON (tmp1.`name` = tmp2.`name`
AND tmp1.`LAST_SEEN` < tmp2.`LAST_SEEN`)
WHERE tmp2.`name` IS NULL
The idea is to join the table to itself, matching each row to any row with the same name and a later LAST_SEEN. Rows that have no such match get NULLs from the LEFT JOIN, and those are the latest rows in each group.
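The self-join can be tried out on the two abby rows from the question. This is a minimal sketch, assuming a flat table called `user_log` (a made-up name) and using SQLite through Python's sqlite3 in place of MySQL; the "bob" row is also made up, only to show that each name keeps its own latest row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flat table mirroring the NAME | EMAIL | LAST_SEEN output.
conn.execute("CREATE TABLE user_log (name TEXT, email TEXT, last_seen TEXT)")
conn.executemany(
    "INSERT INTO user_log VALUES (?, ?, ?)",
    [
        ("abby", "a#l.d", "2015-10-31 14:36:26"),
        ("abby", "a#l.d", "2015-11-28 13:30:37"),
        ("bob", "b#l.d", "2015-11-01 09:00:00"),  # made-up extra user
    ],
)

# A row survives only if no row with the same name has a later last_seen.
latest = conn.execute(
    """
    SELECT t1.name, t1.email, t1.last_seen
    FROM user_log AS t1
    LEFT JOIN user_log AS t2
           ON t1.name = t2.name
          AND t1.last_seen < t2.last_seen
    WHERE t2.name IS NULL
    ORDER BY t1.name
    """
).fetchall()
print(latest)
```

If two rows for the same name share the exact same latest timestamp, both are returned, since neither has a strictly later match.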
On my website people can thumbs up or thumbs down a comment.
To do this I use two tables:
$sql = "CREATE TABLE content
(
id INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY(id),
content TEXT NOT NULL,
date date,
time time
)";
and
$sql2 = "CREATE TABLE ratings
(
rating_id INT AUTO_INCREMENT PRIMARY KEY NOT NULL ,
rating VARCHAR (10) NOT NULL ,
id INT NOT NULL ,
ip VARCHAR (50) NOT NULL
)";
The data stored in the ratings would be as follows:
Comment ID | like/dislike | user IP
1 | l | 86.42.173.83
1 | d | 86.42.173.43
2 | l | 86.42.173.79
2 | l | 86.42.173.34
2 | d | 86.42.173.22
The problem is that I'm finding it extremely difficult to write a SQL statement that orders the comments by the number of likes they have.
If anyone has any ideas on how to do this it would be greatly appreciated.
It would be easier if you stored likes as integers and not letters.
I added up the likes using a CASE expression and grouped by comment. Comments without any ratings count as 0, and the highest-rated comments come first.
SELECT C.content,
SUM(CASE R.rating WHEN 'l' THEN 1 WHEN 'd' THEN -1 ELSE 0 END) AS overallRating
FROM content C
LEFT JOIN ratings R ON R.id = C.id
GROUP BY C.id, C.content
ORDER BY overallRating DESC
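As a quick check of the CASE-and-GROUP-BY idea on the sample ratings from the question, the sketch below counts likes and dislikes separately and sorts by likes. It assumes SQLite (via Python's sqlite3) instead of MySQL, trims the `content` table to two columns, and uses made-up comment texts, including one comment with no ratings at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, content TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE ratings ("
    "rating_id INTEGER PRIMARY KEY, rating TEXT NOT NULL, "
    "id INTEGER NOT NULL, ip TEXT NOT NULL)"
)
# Comment texts are hypothetical; comment 3 has no ratings.
conn.executemany(
    "INSERT INTO content VALUES (?, ?)",
    [(1, "first comment"), (2, "second comment"), (3, "unrated comment")],
)
# Ratings copied from the question's sample data.
conn.executemany(
    "INSERT INTO ratings (id, rating, ip) VALUES (?, ?, ?)",
    [
        (1, "l", "86.42.173.83"),
        (1, "d", "86.42.173.43"),
        (2, "l", "86.42.173.79"),
        (2, "l", "86.42.173.34"),
        (2, "d", "86.42.173.22"),
    ],
)

# LEFT JOIN keeps unrated comments; their CASE branches all fall to 0.
ranked = conn.execute(
    """
    SELECT c.id,
           SUM(CASE WHEN r.rating = 'l' THEN 1 ELSE 0 END) AS likes,
           SUM(CASE WHEN r.rating = 'd' THEN 1 ELSE 0 END) AS dislikes
    FROM content AS c
    LEFT JOIN ratings AS r ON r.id = c.id
    GROUP BY c.id
    ORDER BY likes DESC, c.id
    """
).fetchall()
print(ranked)
```

Keeping likes and dislikes in separate columns makes it possible to sort by either one, or by their difference, without changing the join.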
Something like this will work:
select content.content, count(*) likes
from content join ratings on content.id = ratings.id
where ratings.rating = 'l'
group by content.id, content.content
order by likes desc
This post is taking a substantial amount of time to type because I'm trying to be as clear as possible, so please bear with me if it is still unclear.
Basically, what I have are a table of posts in the database which users can add privacy settings to.
ID | owner_id | post | other_info | privacy_level (int value)
From there, users can add their privacy details, making the post viewable by all (privacy_level = 0), friends (privacy_level = 1), no one (privacy_level = 3), or specific people or filters (privacy_level = 4). For posts restricted to specific people (4), the query references the table "post_privacy_includes_for" in a subquery to see whether the user (or a filter the user belongs to) exists in a row in the table.
ID | post_id | user_id | list_id
Also, the user can prevent some people within a larger group from viewing their post by excluding them (e.g., having it set for everyone to view but hiding it from a stalker user). For this, another reference table is added, "post_privacy_exclude_from", which has the same structure as "post_privacy_includes_for".
My problem is that this does not scale. At all. At the moment, there are about 1-2 million posts, the majority of them set to be viewable by everyone. For each post on the page, the query must check whether there is a row excluding the post from being shown to the user, which is really slow on a page that can be filled with 100-200 posts. It can take 2-4 seconds, especially when additional constraints are added to the query.
This also creates extremely large and complex queries that are just... awkward.
SELECT t.*
FROM posts t
WHERE ( (t.privacy_level = 3
AND t.owner_id = ?)
OR (t.privacy_level = 4
AND EXISTS
( SELECT i.id
FROM PostPrivacyIncludeFor i
WHERE i.user_id = ?
AND i.thought_id = t.id)
OR t.privacy_level = 4
AND t.owner_id = ?)
OR (t.privacy_level = 4
AND EXISTS
(SELECT i2.id
FROM PostPrivacyIncludeFor i2
WHERE i2.thought_id = t.id
AND EXISTS
(SELECT r.id
FROM FriendFilterIds r
WHERE r.list_id = i2.list_id
AND r.friend_id = ?))
OR t.privacy_level = 4
AND t.owner_id = ?)
OR (t.privacy_level = 1
AND EXISTS
(SELECT G.id
FROM Following G
WHERE follower_id = t.owner_id
AND following_id = ?
AND friend = 1)
OR t.privacy_level = 1
AND t.owner_id = ?)
OR (NOT EXISTS
(SELECT e.id
FROM PostPrivacyExcludeFrom e
WHERE e.thought_id = t.id
AND e.user_id = ?
AND NOT EXISTS
(SELECT e2.id
FROM PostPrivacyExcludeFrom e2
WHERE e2.thought_id = t.id
AND EXISTS
(SELECT l.id
FROM FriendFilterIds l
WHERE l.list_id = e2.list_id
AND l.friend_id = ?)))
AND t.privacy_level IN (0, 1, 4))
AND t.owner_id = ?
ORDER BY t.created_at LIMIT 100
(mock up query, similar to the query I use now in Doctrine ORM. It's a mess, but you get what I am saying.)
I guess my question is, how would you approach this situation to optimize it? Is there a better way to set up my database? I'm willing to completely scrap the method I have currently built up, but I wouldn't know what to move onto.
Thanks guys.
Updated: Fixed the query to reflect the values I defined for privacy level above (I forgot to update it because I simplified the values)
Your query is too long to give a definitive solution for, but the approach I would follow is to simplify the data lookups by converting the sub-queries into joins, and then build the logic into the where clause and column list of the select statement:
select t.*, i.*, r.*, G.*, e.* from posts t
left join PostPrivacyIncludeFor i on i.user_id = ? and i.thought_id = t.id
left join FriendFilterIds r on r.list_id = i.list_id and r.friend_id = ?
left join Following G on follower_id = t.owner_id and G.following_id = ? and G.friend=1
left join PostPrivacyExcludeFrom e on e.thought_id = t.id and e.user_id = ?
(This might need expanding: I couldn't follow the logic of the final clause.)
If you can get the simple select working fast AND including all the information needed, then all you need to do is build up the logic in the select list and where clause.
Had a quick stab at simplifying this without re-working your original design too much.
Using this solution your web page can now simply call the following stored procedure to get a list of filtered posts for a given user within a specified period.
call list_user_filtered_posts( <user_id>, <day_interval> );
The whole script can be found here : http://pastie.org/1212812
I haven't fully tested all of this and you may find this solution isn't performant enough for your needs but it may help you in fine tuning/modifying your existing design.
Tables
Dropped your post_privacy_exclude_from table and added a user_stalkers table which works pretty much like the inverse of user_friends. Kept the original post_privacy_includes_for table as per your design, as this allows a user to restrict a specific post to a subset of people.
drop table if exists users;
create table users
(
user_id int unsigned not null auto_increment primary key,
username varbinary(32) unique not null
)
engine=innodb;
drop table if exists user_friends;
create table user_friends
(
user_id int unsigned not null,
friend_user_id int unsigned not null,
primary key (user_id, friend_user_id)
)
engine=innodb;
drop table if exists user_stalkers;
create table user_stalkers
(
user_id int unsigned not null,
stalker_user_id int unsigned not null,
primary key (user_id, stalker_user_id)
)
engine=innodb;
drop table if exists posts;
create table posts
(
post_id int unsigned not null auto_increment primary key,
user_id int unsigned not null,
privacy_level tinyint unsigned not null default 0,
post_date datetime not null,
key user_idx(user_id),
key post_date_user_idx(post_date, user_id)
)
engine=innodb;
drop table if exists post_privacy_includes_for;
create table post_privacy_includes_for
(
post_id int unsigned not null,
user_id int unsigned not null,
primary key (post_id, user_id)
)
engine=innodb;
Stored Procedures
The stored procedure is relatively simple - it initially selects ALL posts within the specified period and then filters out posts as per your original requirements. I have not performance tested this sproc with large volumes but as the initial selection is relatively small it should be performant enough as well as simplifying your application/middle tier code.
drop procedure if exists list_user_filtered_posts;
delimiter #
create procedure list_user_filtered_posts
(
in p_user_id int unsigned,
in p_day_interval tinyint unsigned
)
proc_main:begin
drop temporary table if exists tmp_posts;
drop temporary table if exists tmp_priv_posts;
-- select ALL posts in the required date range (or whatever selection criteria you require)
create temporary table tmp_posts engine=memory
select
p.post_id, p.user_id, p.privacy_level, 0 as deleted
from
posts p
where
p.post_date between now() - interval p_day_interval day and now()
order by
p.user_id;
-- purge stalker posts (0,1,3,4)
update tmp_posts
inner join user_stalkers us on us.user_id = tmp_posts.user_id and us.stalker_user_id = p_user_id
set
tmp_posts.deleted = 1
where
tmp_posts.user_id != p_user_id;
-- purge other users private posts (3)
update tmp_posts set deleted = 1 where user_id != p_user_id and privacy_level = 3;
-- purge friend only posts (1) i.e where p_user_id is not a friend of the poster
/*
requires another temp table due to mysql temp table problem/bug
http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html
*/
-- the private posts (1) this user can see
create temporary table tmp_priv_posts engine=memory
select
tp.post_id
from
tmp_posts tp
inner join user_friends uf on uf.user_id = tp.user_id and uf.friend_user_id = p_user_id
where
tp.user_id != p_user_id and tp.privacy_level = 1;
-- remove private posts this user cant see
update tmp_posts
left outer join tmp_priv_posts tpp on tmp_posts.post_id = tpp.post_id
set
tmp_posts.deleted = 1
where
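-- the viewer's own posts are never in tmp_priv_posts, so exempt them here
tpp.post_id is null and tmp_posts.privacy_level = 1 and tmp_posts.user_id != p_user_id;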
-- purge filtered (4)
truncate table tmp_priv_posts; -- reuse tmp table
insert into tmp_priv_posts
select
tp.post_id
from
tmp_posts tp
inner join post_privacy_includes_for ppif on tp.post_id = ppif.post_id and ppif.user_id = p_user_id
where
tp.user_id != p_user_id and tp.privacy_level = 4;
-- remove private posts this user cant see
update tmp_posts
left outer join tmp_priv_posts tpp on tmp_posts.post_id = tpp.post_id
set
tmp_posts.deleted = 1
where
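-- the viewer's own posts are never in tmp_priv_posts, so exempt them here
tpp.post_id is null and tmp_posts.privacy_level = 4 and tmp_posts.user_id != p_user_id;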
drop temporary table if exists tmp_priv_posts;
-- output filtered posts (display ALL of these on web page)
select
p.*
from
posts p
inner join tmp_posts tp on p.post_id = tp.post_id
where
tp.deleted = 0
order by
p.post_id desc;
-- clean up
drop temporary table if exists tmp_posts;
end proc_main #
delimiter ;
Test Data
Some basic test data.
insert into users (username) values ('f00'),('bar'),('alpha'),('beta'),('gamma'),('omega');
insert into user_friends values
(1,2),(1,3),(1,5),
(2,1),(2,3),(2,4),
(3,1),(3,2),
(4,5),
(5,1),(5,4);
insert into user_stalkers values (4,1);
insert into posts (user_id, privacy_level, post_date) values
-- public (0)
(1,0,now() - interval 8 day),
(1,0,now() - interval 8 day),
(2,0,now() - interval 7 day),
(2,0,now() - interval 7 day),
(3,0,now() - interval 6 day),
(4,0,now() - interval 6 day),
(5,0,now() - interval 5 day),
-- friends only (1)
(1,1,now() - interval 5 day),
(2,1,now() - interval 4 day),
(4,1,now() - interval 4 day),
(5,1,now() - interval 3 day),
-- private (3)
(1,3,now() - interval 3 day),
(2,3,now() - interval 2 day),
(4,3,now() - interval 2 day),
-- filtered (4)
(1,4,now() - interval 1 day),
(4,4,now() - interval 1 day),
(5,4,now());
insert into post_privacy_includes_for values (15,4), (16,1), (17,6);
Testing
As I mentioned before I've not fully tested this but on the surface it seems to be working.
select * from posts;
call list_user_filtered_posts(1,14);
call list_user_filtered_posts(6,14);
call list_user_filtered_posts(1,7);
call list_user_filtered_posts(6,7);
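To sanity-check the filtering rules without a MySQL server, the sketch below expresses the same rules as a single SELECT and runs it against the test data above. It is not the stored procedure itself: SQLite (via Python's sqlite3) stands in for MySQL, the date-range filter is left out for brevity, and the helper name `visible_posts` is made up. It also assumes the intent is that a user always sees their own posts, whatever the privacy level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same tables and rows as the test data, minus users and post_date.
conn.executescript(
    """
    CREATE TABLE user_friends (user_id INTEGER, friend_user_id INTEGER);
    CREATE TABLE user_stalkers (user_id INTEGER, stalker_user_id INTEGER);
    CREATE TABLE posts (
        post_id INTEGER PRIMARY KEY, user_id INTEGER, privacy_level INTEGER);
    CREATE TABLE post_privacy_includes_for (post_id INTEGER, user_id INTEGER);

    INSERT INTO user_friends VALUES
        (1,2),(1,3),(1,5),(2,1),(2,3),(2,4),(3,1),(3,2),(4,5),(5,1),(5,4);
    INSERT INTO user_stalkers VALUES (4,1);
    INSERT INTO posts (user_id, privacy_level) VALUES
        (1,0),(1,0),(2,0),(2,0),(3,0),(4,0),(5,0),  -- public (0)
        (1,1),(2,1),(4,1),(5,1),                    -- friends only (1)
        (1,3),(2,3),(4,3),                          -- private (3)
        (1,4),(4,4),(5,4);                          -- filtered (4)
    INSERT INTO post_privacy_includes_for VALUES (15,4),(16,1),(17,6);
    """
)

# Own posts are always visible. Other users' posts are visible only if the
# viewer is not listed as that user's stalker AND the privacy level allows it.
VISIBLE_POSTS_SQL = """
    SELECT p.post_id
    FROM posts AS p
    WHERE p.user_id = :viewer
       OR (
            NOT EXISTS (SELECT 1 FROM user_stalkers AS s
                        WHERE s.user_id = p.user_id
                          AND s.stalker_user_id = :viewer)
            AND (
                 p.privacy_level = 0
              OR (p.privacy_level = 1 AND EXISTS (
                    SELECT 1 FROM user_friends AS f
                    WHERE f.user_id = p.user_id
                      AND f.friend_user_id = :viewer))
              OR (p.privacy_level = 4 AND EXISTS (
                    SELECT 1 FROM post_privacy_includes_for AS i
                    WHERE i.post_id = p.post_id
                      AND i.user_id = :viewer))
            )
          )
    ORDER BY p.post_id DESC
"""


def visible_posts(viewer_id):
    """Return the post ids the given user may see, newest id first."""
    rows = conn.execute(VISIBLE_POSTS_SQL, {"viewer": viewer_id})
    return [row[0] for row in rows]


print(visible_posts(1))
print(visible_posts(6))
```

User 1 sees none of user 4's posts, including post 16 which explicitly includes user 1, because the stalker rule is applied before any other check. User 6 has no friends in the test data, so only the public posts and post 17 are visible.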
Hope you find some of this of use.