MySQL inclusion/exclusion of posts - php

This post is taking a substantial amount of time to type because I'm trying to be as clear as possible, so please bear with me if it is still unclear.
Basically, what I have is a table of posts in the database to which users can add privacy settings.
ID | owner_id | post | other_info | privacy_level (int value)
From there, users can add their privacy details, allowing a post to be viewable by all (privacy_level = 0), friends (privacy_level = 1), no one (privacy_level = 3), or specific people or filters (privacy_level = 4). For privacy level 4 (specific people), the query references the table "post_privacy_includes_for" in a subquery to see whether the user (or a filter the user belongs to) exists in a row of that table.
ID | post_id | user_id | list_id
Also, the user can prevent some people from viewing their post within a larger group by excluding them (e.g., having it set for everyone to view but hiding it from a stalker user). For this, another reference table is added, "post_privacy_exclude_from", which has the same structure as "post_privacy_includes_for".
My problem is that this does not scale. At all. At the moment there are about 1-2 million posts, the majority of them viewable by everyone. For each post on the page, the query must check whether a row excludes that post from being shown to the user, which is really slow on a page that can show 100-200 posts. It can take up to 2-4 seconds, especially when additional constraints are added to the query.
This also creates extremely large and complex queries that are just... awkward.
SELECT t.*
FROM posts t
WHERE ( (t.privacy_level = 3
AND t.owner_id = ?)
OR (t.privacy_level = 4
AND EXISTS
( SELECT i.id
FROM PostPrivacyIncludeFor i
WHERE i.user_id = ?
AND i.thought_id = t.id)
OR t.privacy_level = 4
AND t.owner_id = ?)
OR (t.privacy_level = 4
AND EXISTS
(SELECT i2.id
FROM PostPrivacyIncludeFor i2
WHERE i2.thought_id = t.id
AND EXISTS
(SELECT r.id
FROM FriendFilterIds r
WHERE r.list_id = i2.list_id
AND r.friend_id = ?))
OR t.privacy_level = 4
AND t.owner_id = ?)
OR (t.privacy_level = 1
AND EXISTS
(SELECT G.id
FROM Following G
WHERE follower_id = t.owner_id
AND following_id = ?
AND friend = 1)
OR t.privacy_level = 1
AND t.owner_id = ?)
OR (NOT EXISTS
(SELECT e.id
FROM PostPrivacyExcludeFrom e
WHERE e.thought_id = t.id
AND e.user_id = ?
AND NOT EXISTS
(SELECT e2.id
FROM PostPrivacyExcludeFrom e2
WHERE e2.thought_id = t.id
AND EXISTS
(SELECT l.id
FROM FriendFilterIds l
WHERE l.list_id = e2.list_id
AND l.friend_id = ?)))
AND t.privacy_level IN (0, 1, 4)))
AND t.owner_id = ?
ORDER BY t.created_at LIMIT 100
(mock up query, similar to the query I use now in Doctrine ORM. It's a mess, but you get what I am saying.)
I guess my question is, how would you approach this situation to optimize it? Is there a better way to set up my database? I'm willing to completely scrap the method I have currently built up, but I wouldn't know what to move onto.
Thanks guys.
Update: fixed the query to use the privacy level values defined above (I forgot to update it when I simplified the values).

Your query is too long to give a definitive solution for, but the approach I would follow is to simplify the data lookups by converting the sub-queries into joins, and then build the logic into the WHERE clause and the column list of the SELECT statement:
select t.*, i.*, r.*, G.*, e.* from posts t
left join PostPrivacyIncludeFor i on i.user_id = ? and i.thought_id = t.id
left join FriendFilterIds r on r.list_id = i.list_id and r.friend_id = ?
left join Following G on follower_id = t.owner_id and G.following_id = ? and G.friend=1
left join PostPrivacyExcludeFrom e on e.thought_id = t.id and e.user_id = ?
(This might need expanding: I couldn't follow the logic of the final clause.)
If you can get the simple select working fast AND returning all the information needed, then all you need to do is build up the logic in the select list and the WHERE clause.
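As a rough illustration of the join-then-filter idea, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The tables (posts, friends, includes, excludes) are hypothetical, cut-down stand-ins for the question's tables, and list/filter membership is left out. Each lookup becomes a LEFT JOIN, the privacy rules become boolean tests in the WHERE clause, and the exclusion becomes an anti-join (e.post_id IS NULL) instead of a NOT EXISTS per post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified tables (not the question's exact schema).
    CREATE TABLE posts    (id INTEGER PRIMARY KEY, owner_id INTEGER NOT NULL,
                           privacy_level INTEGER NOT NULL);
    CREATE TABLE friends  (owner_id INTEGER, friend_id INTEGER,
                           PRIMARY KEY (owner_id, friend_id));
    CREATE TABLE includes (post_id INTEGER, user_id INTEGER, PRIMARY KEY (post_id, user_id));
    CREATE TABLE excludes (post_id INTEGER, user_id INTEGER, PRIMARY KEY (post_id, user_id));
    INSERT INTO posts VALUES (1, 10, 0), (2, 10, 1), (3, 10, 3),
                             (4, 10, 4), (5, 10, 0), (6, 20, 3);
    INSERT INTO friends  VALUES (10, 30);  -- user 30 is a friend of user 10
    INSERT INTO includes VALUES (4, 40);   -- post 4 is shown to user 40 only
    INSERT INTO excludes VALUES (5, 30);   -- public post 5 is hidden from user 30
""")

# Each LEFT JOIN matches at most one row (guaranteed by the primary keys), so no
# duplicates appear; the privacy rules are plain boolean tests in the WHERE clause.
VISIBLE = """
    SELECT t.id
    FROM posts t
    LEFT JOIN friends  f ON f.owner_id = t.owner_id AND f.friend_id = :viewer
    LEFT JOIN includes i ON i.post_id = t.id AND i.user_id = :viewer
    LEFT JOIN excludes e ON e.post_id = t.id AND e.user_id = :viewer
    WHERE t.owner_id = :viewer                 -- own posts are always visible
       OR (e.post_id IS NULL                   -- anti-join: no exclusion row
           AND (t.privacy_level = 0
                OR (t.privacy_level = 1 AND f.friend_id IS NOT NULL)
                OR (t.privacy_level = 4 AND i.user_id IS NOT NULL)))
    ORDER BY t.id
"""

def visible_posts(viewer_id):
    """Return the ids of the posts the given viewer may see."""
    return [row[0] for row in conn.execute(VISIBLE, {"viewer": viewer_id})]

print(visible_posts(30))  # [1, 2]
print(visible_posts(40))  # [1, 4, 5]
```

The anti-join only stays duplicate-free while each join can match at most one row; if a lookup can match several rows (for example a user in several filter lists), add DISTINCT or keep that particular test as EXISTS.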

Had a quick stab at simplifying this without re-working your original design too much.
Using this solution your web page can now simply call the following stored procedure to get a list of filtered posts for a given user within a specified period.
call list_user_filtered_posts( <user_id>, <day_interval> );
The whole script can be found here : http://pastie.org/1212812
I haven't fully tested all of this and you may find this solution isn't performant enough for your needs but it may help you in fine tuning/modifying your existing design.
Tables
Dropped your post_privacy_exclude_from table and added a user_stalkers table, which works pretty much like the inverse of user_friends. Kept the original post_privacy_includes_for table as per your design, as this allows a user to restrict a specific post to a subset of people.
drop table if exists users;
create table users
(
user_id int unsigned not null auto_increment primary key,
username varbinary(32) unique not null
)
engine=innodb;
drop table if exists user_friends;
create table user_friends
(
user_id int unsigned not null,
friend_user_id int unsigned not null,
primary key (user_id, friend_user_id)
)
engine=innodb;
drop table if exists user_stalkers;
create table user_stalkers
(
user_id int unsigned not null,
stalker_user_id int unsigned not null,
primary key (user_id, stalker_user_id)
)
engine=innodb;
drop table if exists posts;
create table posts
(
post_id int unsigned not null auto_increment primary key,
user_id int unsigned not null,
privacy_level tinyint unsigned not null default 0,
post_date datetime not null,
key user_idx(user_id),
key post_date_user_idx(post_date, user_id)
)
engine=innodb;
drop table if exists post_privacy_includes_for;
create table post_privacy_includes_for
(
post_id int unsigned not null,
user_id int unsigned not null,
primary key (post_id, user_id)
)
engine=innodb;
Stored Procedures
The stored procedure is relatively simple: it initially selects ALL posts within the specified period and then filters out posts as per your original requirements. I have not performance-tested this sproc with large volumes, but as the initial selection is relatively small it should be performant enough, as well as simplifying your application/middle-tier code.
drop procedure if exists list_user_filtered_posts;
delimiter #
create procedure list_user_filtered_posts
(
in p_user_id int unsigned,
in p_day_interval tinyint unsigned
)
proc_main:begin
drop temporary table if exists tmp_posts;
drop temporary table if exists tmp_priv_posts;
-- select ALL posts in the required date range (or whatever selection criteria you require)
create temporary table tmp_posts engine=memory
select
p.post_id, p.user_id, p.privacy_level, 0 as deleted
from
posts p
where
p.post_date between now() - interval p_day_interval day and now()
order by
p.user_id;
-- purge stalker posts (0,1,3,4)
update tmp_posts
inner join user_stalkers us on us.user_id = tmp_posts.user_id and us.stalker_user_id = p_user_id
set
tmp_posts.deleted = 1
where
tmp_posts.user_id != p_user_id;
-- purge other users private posts (3)
update tmp_posts set deleted = 1 where user_id != p_user_id and privacy_level = 3;
-- purge friend only posts (1) i.e where p_user_id is not a friend of the poster
/*
requires another temp table due to mysql temp table problem/bug
http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html
*/
-- the friends-only posts (1) this user can see
create temporary table tmp_priv_posts engine=memory
select
tp.post_id
from
tmp_posts tp
inner join user_friends uf on uf.user_id = tp.user_id and uf.friend_user_id = p_user_id
where
tp.user_id != p_user_id and tp.privacy_level = 1;
-- remove friends-only posts this user can't see (the user's own posts are kept)
update tmp_posts
left outer join tmp_priv_posts tpp on tmp_posts.post_id = tpp.post_id
set
tmp_posts.deleted = 1
where
tpp.post_id is null and tmp_posts.privacy_level = 1 and tmp_posts.user_id != p_user_id;
-- purge filtered (4)
truncate table tmp_priv_posts; -- reuse tmp table
insert into tmp_priv_posts
select
tp.post_id
from
tmp_posts tp
inner join post_privacy_includes_for ppif on tp.post_id = ppif.post_id and ppif.user_id = p_user_id
where
tp.user_id != p_user_id and tp.privacy_level = 4;
-- remove filtered posts this user can't see (the user's own posts are kept)
update tmp_posts
left outer join tmp_priv_posts tpp on tmp_posts.post_id = tpp.post_id
set
tmp_posts.deleted = 1
where
tpp.post_id is null and tmp_posts.privacy_level = 4 and tmp_posts.user_id != p_user_id;
drop temporary table if exists tmp_priv_posts;
-- output filtered posts (display ALL of these on web page)
select
p.*
from
posts p
inner join tmp_posts tp on p.post_id = tp.post_id
where
tp.deleted = 0
order by
p.post_id desc;
-- clean up
drop temporary table if exists tmp_posts;
end proc_main #
delimiter ;
Test Data
Some basic test data.
insert into users (username) values ('f00'),('bar'),('alpha'),('beta'),('gamma'),('omega');
insert into user_friends values
(1,2),(1,3),(1,5),
(2,1),(2,3),(2,4),
(3,1),(3,2),
(4,5),
(5,1),(5,4);
insert into user_stalkers values (4,1);
insert into posts (user_id, privacy_level, post_date) values
-- public (0)
(1,0,now() - interval 8 day),
(1,0,now() - interval 8 day),
(2,0,now() - interval 7 day),
(2,0,now() - interval 7 day),
(3,0,now() - interval 6 day),
(4,0,now() - interval 6 day),
(5,0,now() - interval 5 day),
-- friends only (1)
(1,1,now() - interval 5 day),
(2,1,now() - interval 4 day),
(4,1,now() - interval 4 day),
(5,1,now() - interval 3 day),
-- private (3)
(1,3,now() - interval 3 day),
(2,3,now() - interval 2 day),
(4,3,now() - interval 2 day),
-- filtered (4)
(1,4,now() - interval 1 day),
(4,4,now() - interval 1 day),
(5,4,now());
insert into post_privacy_includes_for values (15,4), (16,1), (17,6);
Testing
As I mentioned before, I've not fully tested this, but on the surface it seems to be working.
select * from posts;
call list_user_filtered_posts(1,14);
call list_user_filtered_posts(6,14);
call list_user_filtered_posts(1,7);
call list_user_filtered_posts(6,7);
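For comparison, the same rules can also be written as a single set-based statement instead of a sequence of UPDATEs on temporary tables. The sketch below uses Python's built-in sqlite3 module rather than MySQL, loads the test data above, and pins "now" to a hypothetical fixed AS_OF timestamp so the results are repeatable; list_user_filtered_posts here is a Python function that only mirrors the procedure's name. One assumption to note: the user's own posts are always returned, whatever their privacy level, as the procedure does for private (3) posts.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
AS_OF = datetime(2010, 10, 15, 12, 0, 0)  # hypothetical fixed "now" for repeatable results

SCHEMA = """
CREATE TABLE user_friends (user_id INTEGER, friend_user_id INTEGER,
                           PRIMARY KEY (user_id, friend_user_id));
CREATE TABLE user_stalkers (user_id INTEGER, stalker_user_id INTEGER,
                            PRIMARY KEY (user_id, stalker_user_id));
CREATE TABLE posts (post_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL,
                    privacy_level INTEGER NOT NULL DEFAULT 0, post_date TEXT NOT NULL);
CREATE TABLE post_privacy_includes_for (post_id INTEGER, user_id INTEGER,
                                        PRIMARY KEY (post_id, user_id));
"""

# (user_id, privacy_level, days_ago), mirroring the test data above
POST_ROWS = [(1, 0, 8), (1, 0, 8), (2, 0, 7), (2, 0, 7), (3, 0, 6), (4, 0, 6), (5, 0, 5),
             (1, 1, 5), (2, 1, 4), (4, 1, 4), (5, 1, 3),
             (1, 3, 3), (2, 3, 2), (4, 3, 2),
             (1, 4, 1), (4, 4, 1), (5, 4, 0)]

# Own posts are always visible; otherwise the poster must not list the viewer
# as a stalker, and the post must be public (0), friends-only with the viewer
# as a friend (1), or filtered with the viewer explicitly included (4).
FILTERED_POSTS = """
SELECT p.post_id
FROM posts p
WHERE p.post_date BETWEEN :start AND :end
  AND (p.user_id = :viewer
       OR (NOT EXISTS (SELECT 1 FROM user_stalkers us
                       WHERE us.user_id = p.user_id AND us.stalker_user_id = :viewer)
           AND (p.privacy_level = 0
                OR (p.privacy_level = 1 AND EXISTS (
                      SELECT 1 FROM user_friends uf
                      WHERE uf.user_id = p.user_id AND uf.friend_user_id = :viewer))
                OR (p.privacy_level = 4 AND EXISTS (
                      SELECT 1 FROM post_privacy_includes_for ppif
                      WHERE ppif.post_id = p.post_id AND ppif.user_id = :viewer)))))
ORDER BY p.post_id DESC
"""

def load_test_data(conn):
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO user_friends VALUES (?, ?)",
                     [(1, 2), (1, 3), (1, 5), (2, 1), (2, 3), (2, 4),
                      (3, 1), (3, 2), (4, 5), (5, 1), (5, 4)])
    conn.execute("INSERT INTO user_stalkers VALUES (4, 1)")
    conn.executemany("INSERT INTO posts (user_id, privacy_level, post_date) VALUES (?, ?, ?)",
                     [(u, lvl, (AS_OF - timedelta(days=d)).strftime(FMT))
                      for u, lvl, d in POST_ROWS])
    conn.executemany("INSERT INTO post_privacy_includes_for VALUES (?, ?)",
                     [(15, 4), (16, 1), (17, 6)])

def list_user_filtered_posts(conn, user_id, day_interval, as_of=AS_OF):
    params = {"viewer": user_id,
              "start": (as_of - timedelta(days=day_interval)).strftime(FMT),
              "end": as_of.strftime(FMT)}
    return [row[0] for row in conn.execute(FILTERED_POSTS, params)]

conn = sqlite3.connect(":memory:")
load_test_data(conn)
print(list_user_filtered_posts(conn, 1, 14))  # [15, 12, 11, 9, 8, 7, 5, 4, 3, 2, 1]
print(list_user_filtered_posts(conn, 6, 14))  # [17, 7, 6, 5, 4, 3, 2, 1]
```

Whether one statement beats the temporary-table approach depends on indexes (for example on post_date) and on data volumes; this sketch has not been benchmarked.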
Hope you find some of this of use.

Related

MYSQL Query - Get latest comment related to the post

I am trying to get the latest 1 or 2 comments related to each post I download, a bit like Instagram, which shows the latest 3 comments for each post. So far I am getting the posts and the like counts.
Now all I need to do is figure out how to get the latest comments. I'm not too sure how to approach it, which is why I'm hoping someone with a lot more expertise can help me!
This is my current query:
(SELECT
P.uuid,
P.caption,
P.imageHeight,
P.path,
P.date,
U.id,
U.fullname,
U.coverImage,
U.bio,
U.username,
U.profileImage,
coalesce(Activity.LikeCNT,0),
Activity.CurrentUserLiked
FROM USERS AS U
INNER JOIN Posts AS P
ON P.id = U.id
LEFT JOIN (SELECT COUNT(DISTINCT Activity.uuidPost) LikeCNT, Activity.uuidPost, Activity.id, sum(CASE WHEN Activity.id = $id then 1 else 0 end) as CurrentUserLiked
FROM Activity Activity
WHERE type = 'like'
GROUP BY Activity.uuidPost) Activity
ON Activity.uuidPost = P.uuid
AND Activity.id = U.id
WHERE U.id = $id)
UNION
(SELECT
P.uuid,
P.caption,
P.imageHeight,
P.path,
P.date,
U.id,
U.fullname,
U.coverImage,
U.bio,
U.username,
U.profileImage,
coalesce(Activity.LikeCNT,0),
Activity.CurrentUserLiked
FROM Activity AS A
INNER JOIN USERS AS U
ON A.IdOtherUser=U.id
INNER JOIN Posts AS P
ON P.id = U.id
LEFT JOIN (SELECT COUNT(DISTINCT Activity.uuidPost) LikeCNT, Activity.uuidPost, Activity.id, sum(CASE WHEN Activity.id = $id then 1 else 0 end) as CurrentUserLiked
FROM Activity Activity
WHERE type = 'like'
GROUP BY Activity.uuidPost) Activity
ON Activity.uuidPost = P.uuid
AND Activity.id = U.id
WHERE A.id = $id)
ORDER BY date DESC
LIMIT 0, 5
Basically, the comments are stored in the same table as the likes.
So the table is Activity; it has a column comment which stores the comment text, and the "type" is equal to "comment".
Possibly not very well explained but I am willing to try and give as much detail as possible!
If anyone can help it's very much appreciated!!
UPDATE
On this query given by https://stackoverflow.com/users/1016435/xqbert I am currently getting this error:
Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8_unicode_ci,IMPLICIT) for operation '='
SELECT Posts.id,
Posts.uuid,
Posts.caption,
Posts.path,
Posts.date,
USERS.id,
USERS.username,
USERS.fullname,
USERS.profileImage,
coalesce(A.LikeCNT,0),
com.comment
FROM Posts
INNER JOIN USERS
ON Posts.id = 145
AND USERS.id = 145
LEFT JOIN (SELECT COUNT(A.uuidPost) LikeCNT, A.UUIDPost
FROM Activity A
WHERE type = 'like'
GROUP BY A.UUIDPOST) A
on A.UUIDPost=Posts.uuid
LEFT JOIN (SELECT comment, UUIDPOST, @row_num := IF(@prev_value=UUIDPOST,@row_num+1,1) as row_number,@prev_value := UUIDPOST
FROM Activity
CROSS JOIN (SELECT @row_num := 1) x
CROSS JOIN (SELECT @prev_value := '') y
WHERE type = 'comment'
ORDER BY UUIDPOST, date DESC) Com
ON Com.UUIDPOST = Posts.UUID
AND row_number <= 2
ORDER BY date DESC
LIMIT 0, 5
Latest Edit
Table structures:
Posts
----------------------------------------------------------
| id | int(11) | | not null |
| uuid | varchar(100) | utf8_unicode_ci | not null |
| imageLink | varchar(500) | utf8_unicode_ci | not null |
| date | timestamp | | not null |
----------------------------------------------------------
USERS
-------------------------------------------------------------
| id | int(11) | | not null |
| username | varchar(100) | utf8_unicode_ci | not null |
| profileImage | varchar(500) | utf8_unicode_ci | not null |
| date | timestamp | | not null |
-------------------------------------------------------------
Activity
----------------------------------------------------------
| id | int(11) | | not null |
| uuid | varchar(100) | utf8_unicode_ci | not null |
| uuidPost | varchar(100) | utf8_unicode_ci | not null |
| type | varchar(50) | utf8_unicode_ci | not null |
| commentText | varchar(500) | utf8_unicode_ci | not null |
| date | timestamp | | not null |
----------------------------------------------------------
Those are some examples; in the "Activity" table, in this case "type" will always be equal to "comment".
Summary and desired result:
When I query a user's posts, I would like to go into the "Activity" table and get the latest 2 comments for every post they have. Maybe there will be no comments, in which case it would obviously return 0; maybe there could be 100 comments for that post. But I only want to get the latest/most recent 2 comments.
An example could be how Instagram does it: for every post they display the most recent 1, 2 or 3 comments.
Hope this helps!
Fiddle link
This error message
Illegal mix of collations (utf8_general_ci,IMPLICIT) and
(utf8_unicode_ci,IMPLICIT) for operation '='
is typically due to the definition of your columns and tables. It usually means that on either side of an equal sign there are different collations. What you need to do is choose one and include that decision in your query.
The collation issue here was in the CROSS JOIN of @prev_value, which needed an explicit collation.
I have also slightly changed the "row_number" logic to use a single cross join and moved the IF logic to the extremes of the select list.
Some sample data is displayed below. Sample data is needed to test queries with, and anyone attempting to answer your question with working examples will need data. The reason I am including it here is twofold:
1. so that you will understand any result I present
2. so that in future, when you ask another SQL-related question, you understand the importance of supplying data. It is not only more convenient for us when you do this: if the asker provides the sample data, the asker will already understand it, rather than it being an invention of some stranger who has devoted some of their time to help out.
Sample Data
Please note some columns are missing from the tables, only the columns specified in the table details have been included.
This sample data has 5 comments against a single post (no likes are recorded)
CREATE TABLE Posts
(
`id` int,
`uuid` varchar(7) collate utf8_unicode_ci,
`imageLink` varchar(9) collate utf8_unicode_ci,
`date` datetime
);
INSERT INTO Posts(`id`, `uuid`, `imageLink`, `date`)
VALUES
(145, 'abcdefg', 'blah blah', '2016-10-10 00:00:00') ;
CREATE TABLE USERS
(
`id` int,
`username` varchar(15) collate utf8_unicode_ci,
`profileImage` varchar(12) collate utf8_unicode_ci,
`date` datetime
) ;
INSERT INTO USERS(`id`, `username`, `profileImage`, `date`)
VALUES
(145, 'used_by_already', 'blah de blah', '2014-01-03 00:00:00') ;
CREATE TABLE Activity
(
`id` int,
`uuid` varchar(4) collate utf8_unicode_ci,
`uuidPost` varchar(7) collate utf8_unicode_ci,
`type` varchar(40) collate utf8_unicode_ci,
`commentText` varchar(11) collate utf8_unicode_ci, `date` datetime
) ;
INSERT INTO Activity (`id`, `uuid`, `uuidPost`, `type`, `commentText`, `date`)
VALUES
(345, 'a100', 'abcdefg', 'comment', 'lah lha ha', '2016-07-05 00:00:00'),
(456, 'a101', 'abcdefg', 'comment', 'lah lah lah', '2016-07-06 00:00:00'),
(567, 'a102', 'abcdefg', 'comment', 'lha lha ha', '2016-07-07 00:00:00'),
(678, 'a103', 'abcdefg', 'comment', 'ha lah lah', '2016-07-08 00:00:00'),
(789, 'a104', 'abcdefg', 'comment', 'hla lah lah', '2016-07-09 00:00:00') ;
[SQL Standard behaviour: 2 rows per Post query]
This was my initial query, with some corrections. I changed the column order of the select list so that you can easily see some comment-related data when I present the results. Please study those results; they are provided so that you may understand what the query will do. Columns preceded by # (i.e. commented out) do not exist in the sample data I am working with, for reasons I have already noted.
SELECT
Posts.id
, Posts.uuid
, rcom.uuidPost
, rcom.commentText
, rcom.`date` commentDate
#, Posts.caption
#, Posts.path
, Posts.`date`
, USERS.id
, USERS.username
#, USERS.fullname
, USERS.profileImage
, COALESCE(A.LikeCNT, 0) num_likes
FROM Posts
INNER JOIN USERS ON Posts.id = 145
AND USERS.id = 145
LEFT JOIN (
SELECT
COUNT(A.uuidPost) LikeCNT
, A.UUIDPost
FROM Activity A
WHERE type = 'like'
GROUP BY
A.UUIDPOST
) A ON A.UUIDPost = Posts.uuid
LEFT JOIN (
SELECT
@row_num := IF(@prev_value=UUIDPOST,@row_num+1,1) as row_number
, commentText
, uuidPost
, `date`
, @prev_value := UUIDPOST
FROM Activity
CROSS JOIN ( SELECT @row_num := 1, @prev_value := '' collate utf8_unicode_ci ) xy
WHERE type = 'comment'
ORDER BY
uuidPost
, `date` DESC
) rcom ON rcom.uuidPost = Posts.UUID
AND rcom.row_number <= 2
ORDER BY
Posts.`date` DESC
;
See a working demonstration of this query at SQLFiddle
Results:
| id | uuid | uuidPost | commentText | commentDate | date | id | username | profileImage | num_likes |
|-----|---------|----------|-------------|------------------------|---------------------------|-----|-----------------|--------------|-----------|
| 145 | abcdefg | abcdefg | hla lah lah | July, 09 2016 00:00:00 | October, 10 2016 00:00:00 | 145 | used_by_already | blah de blah | 0 |
| 145 | abcdefg | abcdefg | ha lah lah | July, 08 2016 00:00:00 | October, 10 2016 00:00:00 | 145 | used_by_already | blah de blah | 0 |
There are 2 ROWS, as expected: one row for the most recent comment, and another row for the next most recent comment. This is normal behaviour for SQL, and until a comment was added under this answer, readers of the question would have assumed this normal behaviour was acceptable.
The question lacks a clearly articulated "expected result".
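On MySQL 8.0 and later, the user-variable trick can be replaced by the standard ROW_NUMBER() window function, which numbers the comments per post without depending on the evaluation order of variable assignments. The sketch below is not the answer's query; it runs the same idea through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer) against the sample data above, trimmed to the columns it uses, and should reproduce the two result rows shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Posts (id INTEGER, uuid TEXT, imageLink TEXT, date TEXT);
    CREATE TABLE Activity (id INTEGER, uuid TEXT, uuidPost TEXT, type TEXT,
                           commentText TEXT, date TEXT);
    INSERT INTO Posts VALUES (145, 'abcdefg', 'blah blah', '2016-10-10 00:00:00');
    INSERT INTO Activity VALUES
        (345, 'a100', 'abcdefg', 'comment', 'lah lha ha',  '2016-07-05 00:00:00'),
        (456, 'a101', 'abcdefg', 'comment', 'lah lah lah', '2016-07-06 00:00:00'),
        (567, 'a102', 'abcdefg', 'comment', 'lha lha ha',  '2016-07-07 00:00:00'),
        (678, 'a103', 'abcdefg', 'comment', 'ha lah lah',  '2016-07-08 00:00:00'),
        (789, 'a104', 'abcdefg', 'comment', 'hla lah lah', '2016-07-09 00:00:00');
""")

# rn restarts at 1 for each uuidPost, numbering the newest comment first.
LATEST_TWO = """
SELECT p.id, p.uuid, rcom.commentText, rcom.date AS commentDate
FROM Posts p
LEFT JOIN (
    SELECT uuidPost, commentText, date,
           ROW_NUMBER() OVER (PARTITION BY uuidPost ORDER BY date DESC) AS rn
    FROM Activity
    WHERE type = 'comment'
) rcom ON rcom.uuidPost = p.uuid AND rcom.rn <= 2
ORDER BY p.date DESC, rcom.rn
"""

rows = conn.execute(LATEST_TWO).fetchall()
for row in rows:
    print(row)
```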
[Option 1: One row per Post query, with UP TO 2 comments, added columns]
In a comment below it was revealed that you did not want 2 rows per post, and that this would be an easy fix. Well, it kind of is easy, BUT there are options, and the options are dictated by the user in the form of requirements. IF the question had an "expected result", we would know which option to choose. Nonetheless, here is one option:
SELECT
Posts.id
, Posts.uuid
, max(case when rcom.row_number = 1 then rcom.commentText end) Comment_one
, max(case when rcom.row_number = 2 then rcom.commentText end) Comment_two
#, Posts.caption
#, Posts.path
, Posts.`date`
, USERS.id
, USERS.username
#, USERS.fullname
, USERS.profileImage
, COALESCE(A.LikeCNT, 0) num_likes
FROM Posts
INNER JOIN USERS ON Posts.id = 145
AND USERS.id = 145
LEFT JOIN (
SELECT
COUNT(A.uuidPost) LikeCNT
, A.UUIDPost
FROM Activity A
WHERE type = 'like'
GROUP BY
A.UUIDPOST
) A ON A.UUIDPost = Posts.uuid
LEFT JOIN (
SELECT
@row_num := IF(@prev_value=UUIDPOST,@row_num+1,1) as row_number
, commentText
, uuidPost
, `date`
, @prev_value := UUIDPOST
FROM Activity
CROSS JOIN ( SELECT @row_num := 1, @prev_value := '' collate utf8_unicode_ci ) xy
WHERE type = 'comment'
ORDER BY
uuidPost
, `date` DESC
) rcom ON rcom.uuidPost = Posts.UUID
AND rcom.row_number <= 2
GROUP BY
Posts.id
, Posts.uuid
#, Posts.caption
#, Posts.path
, Posts.`date`
, USERS.id
, USERS.username
#, USERS.fullname
, USERS.profileImage
, COALESCE(A.LikeCNT, 0)
ORDER BY
Posts.`date` DESC
;
See the second query working at SQLFiddle
Results of query 2:
| id | uuid | Comment_one | Comment_two | date | id | username | profileImage | num_likes |
|-----|---------|-------------|-------------|---------------------------|-----|-----------------|--------------|-----------|
| 145 | abcdefg | hla lah lah | ha lah lah | October, 10 2016 00:00:00 | 145 | used_by_already | blah de blah | 0 |
** Option 2, concatenate the most recent comments into a single comma separated list **
SELECT
Posts.id
, Posts.uuid
, group_concat(rcom.commentText ORDER BY rcom.`date` DESC) Comments_two_concatenated
#, Posts.caption
#, Posts.path
, Posts.`date`
, USERS.id
, USERS.username
#, USERS.fullname
, USERS.profileImage
, COALESCE(A.LikeCNT, 0) num_likes
FROM Posts
INNER JOIN USERS ON Posts.id = 145
AND USERS.id = 145
LEFT JOIN (
SELECT
COUNT(A.uuidPost) LikeCNT
, A.UUIDPost
FROM Activity A
WHERE type = 'like'
GROUP BY
A.UUIDPOST
) A ON A.UUIDPost = Posts.uuid
LEFT JOIN (
SELECT
@row_num := IF(@prev_value=UUIDPOST,@row_num+1,1) as row_number
, commentText
, uuidPost
, `date`
, @prev_value := UUIDPOST
FROM Activity
CROSS JOIN ( SELECT @row_num := 1, @prev_value := '' collate utf8_unicode_ci ) xy
WHERE type = 'comment'
ORDER BY
uuidPost
, `date` DESC
) rcom ON rcom.uuidPost = Posts.UUID
AND rcom.row_number <= 2
GROUP BY
Posts.id
, Posts.uuid
#, Posts.caption
#, Posts.path
, Posts.`date`
, USERS.id
, USERS.username
#, USERS.fullname
, USERS.profileImage
, COALESCE(A.LikeCNT, 0)
ORDER BY
Posts.`date` DESC
;
See this third query working at SQLFiddle
Results of query 3:
| id | uuid | Comments_two_concatenated | date | id | username | profileImage | num_likes |
|-----|---------|---------------------------|---------------------------|-----|-----------------|--------------|-----------|
| 145 | abcdefg | hla lah lah,ha lah lah | October, 10 2016 00:00:00 | 145 | used_by_already | blah de blah | 0 |
** Summary **
I have presented 3 queries; each one shows only the 2 most recent comments, but each does so in a different way. The first query (default behaviour) displays 2 rows for each post. Option 1 adds columns but removes the second row. Option 2 concatenates the 2 most recent comments.
Please note that:
The question lacks table definitions covering all columns
The question lacks any sample data, which makes it harder for you to understand any results presented here, but also harder for us to prepare solutions
The question also lacks a definitive "expected result" (the wanted output) and this has led to further complexity in answering
I do hope the additional provided information will be of some use, and that by now you also know that it is normal for SQL to present data as multiple rows. If you do not want that normal behaviour please be specific about what you do really want in your question.
Postscript. To include yet another subquery for "follows" you may use a similar subquery to the one you already have. It may be added before or after that subquery. You may also see it in use at sqlfiddle here
LEFT JOIN (
SELECT
COUNT(*) FollowCNT
, IdOtherUser
FROM Activity
WHERE type = 'Follow'
GROUP BY
IdOtherUser
) F ON USERS.id = F.IdOtherUser
Whilst adding another subquery may resolve your desire for more information, the overall query may get slower in proportion to the growth of your data. Once you have settled on the functionality you really need it may be worthwhile considering what indexes you need on those tables. (I believe you would be advised to ask for that advice separately, and if you do make sure you include 1. the full DDL of your tables and 2. an explain plan of the query.)
I am a little bit lost in your query, but if you want to download data for multiple posts at once, it's not a good idea to include comment data in the first query, since you would repeat all the data about the post and the posting user for every comment. You should run another query that connects posts with comments. Something like:
SELECT
A.uuid,
C.username,
C.profileImage,
B.commentText,
B.`date`
FROM Posts A
JOIN Activity B ON A.uuid = B.uuidPost AND B.type = 'comment'
JOIN USERS C ON B.`UserId` = C.id -- UserId: placeholder for the Activity column holding the commenter's user id
and use that data to display your comments with commenting user id, name, image etc.
To get only 3 comments per post, you can look into this post:
Select top 3 values from each group in a table with SQL
if you are sure that there are going to be no duplicate rows in the comment table or this post:
How to select top 3 values from each group in a table with SQL which have duplicates
if you're not sure about that (although due to DateField in the table, it should not be possible).
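A minimal sketch of the two-query approach, using Python's sqlite3 and made-up data (latest_comments and per_post are hypothetical names): the first query returns one page of posts, a second query fetches the comments for just those uuids, and the grouping into "latest N per post" happens in application code. Trimming in Python is reasonable for a page of posts with modest comment counts; for posts with many comments, apply the per-group limit in SQL as in the linked questions.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Activity (uuidPost TEXT, type TEXT, commentText TEXT, date TEXT);
    INSERT INTO Activity VALUES
        ('abcdefg', 'comment', 'lah lha ha',  '2016-07-05 00:00:00'),
        ('abcdefg', 'comment', 'hla lah lah', '2016-07-09 00:00:00'),
        ('abcdefg', 'like',    NULL,          '2016-07-10 00:00:00'),
        ('abcdefg', 'comment', 'ha lah lah',  '2016-07-08 00:00:00'),
        ('hijklmn', 'comment', 'only one',    '2016-07-01 00:00:00');
""")

def latest_comments(post_uuids, per_post=2):
    """Second query for one page of posts: newest comments first, trimmed in Python."""
    if not post_uuids:
        return {}
    placeholders = ", ".join("?" for _ in post_uuids)  # one "?" per uuid, never raw values
    sql = f"""SELECT uuidPost, commentText FROM Activity
              WHERE type = 'comment' AND uuidPost IN ({placeholders})
              ORDER BY uuidPost, date DESC"""
    grouped = defaultdict(list)
    for uuid_post, text in conn.execute(sql, list(post_uuids)):
        if len(grouped[uuid_post]) < per_post:  # keep only the newest per_post rows
            grouped[uuid_post].append(text)
    return dict(grouped)

page = ["abcdefg", "hijklmn", "nocomms"]  # uuids returned by the first (posts) query
comments = latest_comments(page)
for uuid in page:
    print(uuid, comments.get(uuid, []))  # posts without comments get an empty list
```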
UNTESTED: I would recommend putting together an SQL fiddle with some sample data and your existing table structure showing the problem; that way we could play around with the responses and ensure functionality with your schema.
So we use variables to simulate a window function (such as ROW_NUMBER), in this case @row_num and @prev_value. @row_num keeps track of the current row number within each post (since a single post could have lots of comments); when a new post ID (uuidPost?) is encountered, @row_num is reset to 1. When the current record's uuidPost matches @prev_value, we simply increment the row number by 1.
This technique allows us to assign a row number based on the date or activity ID in descending order. As each cross join returns only 1 record, we don't cause duplicate records to appear. Then, since we limit by row_number <= 2, we only get the two most recent comments in our newly added left join.
This assumes the relationship from posts to users is many-to-one, meaning a post can have only 1 user.
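To see what the two variables do row by row, here is a plain-Python emulation (not MySQL itself; the data and the number_rows name are made up for illustration). It walks rows already sorted by uuidPost, date DESC, exactly like the derived table's ORDER BY, and applies the same IF expression. Note that MySQL does not guarantee the evaluation order of expressions that both read and assign user variables, and setting user variables inside expressions is deprecated in MySQL 8.0, where ROW_NUMBER() is the safer choice.

```python
def number_rows(rows):
    """Emulate @row_num/@prev_value over rows sorted by (uuidPost, date DESC)."""
    row_num, prev_value = 1, ""  # CROSS JOIN (SELECT @row_num := 1, @prev_value := '')
    numbered = []
    for uuid_post, comment, date in rows:
        # @row_num := IF(@prev_value = UUIDPOST, @row_num + 1, 1)
        row_num = row_num + 1 if prev_value == uuid_post else 1
        prev_value = uuid_post  # @prev_value := UUIDPOST
        numbered.append((row_num, uuid_post, comment, date))
    return numbered

activity = [("abcdefg", "lah lha ha", "2016-07-05"),
            ("abcdefg", "hla lah lah", "2016-07-09"),
            ("hijklmn", "first!", "2016-07-01"),
            ("abcdefg", "ha lah lah", "2016-07-08")]

# ORDER BY uuidPost, date DESC: two stable sorts, secondary key first
ordered = sorted(sorted(activity, key=lambda r: r[2], reverse=True), key=lambda r: r[0])
latest_two = [(post, comment) for n, post, comment, _ in number_rows(ordered) if n <= 2]
print(latest_two)
# [('abcdefg', 'hla lah lah'), ('abcdefg', 'ha lah lah'), ('hijklmn', 'first!')]
```

The emulation only works because the input is sorted first; with unsorted input the counter would reset every time the post changes, which is why the derived table's ORDER BY matters.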
Something like this, though I'm not sure about the final left join: I need to better understand the structure of the Activity table, hence my comment against the original question.
SELECT Posts.id,
Posts.uuid,
Posts.caption,
Posts.path,
Posts.date,
USERS.id,
USERS.username,
USERS.fullname,
USERS.profileImage,
coalesce(A.LikeCNT,0),
com.comment
FROM Posts
INNER JOIN USERS
ON Posts.id = 145
AND USERS.id = 145
LEFT JOIN (SELECT COUNT(A.uuidPost) LikeCNT, A.UUIDPost
FROM Activity A
WHERE type = 'like'
GROUP BY A.UUIDPOST) A
on A.UUIDPost=Posts.uuid
-- This join simulates ROW_NUMBER() OVER (PARTITION BY PostID ORDER BY activityID DESC). Nice article: http://preilly.me/2011/11/11/mysql-row_number/ (several other examples exist on SO already).
-- Meaning: generate a row number for each activity from 1 to X, restarting at 1 for each new post, and numbering from the newest activity first.
LEFT JOIN (SELECT comment, UUIDPOST, @row_num := IF(@prev_value=UUIDPOST,@row_num+1,1) as row_number,@prev_value := UUIDPOST
FROM Activity
CROSS JOIN (SELECT @row_num := 1) x
CROSS JOIN (SELECT @prev_value := '') y
WHERE type = 'comment'
ORDER BY UUIDPOST, date DESC -- or some ID desc
) Com
on Com.UUIDPOST = Posts.UUID
and row_number <= 2
-- Now since we have a row_number restarting at 1 for each new post, simply return only the 1st two rows.
ORDER BY date DESC
LIMIT 0, 5
We had to put "and row_number <= 2" on the join itself. If it were put in the WHERE clause, you would lose the posts without any comments, which I think you still want.
Additionally, we should probably check the "comment" field to make sure it's not blank or NULL, but let's make sure this works first.
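The difference between filtering in ON and in WHERE is easy to check. A minimal sqlite3 sketch, with a hypothetical table Com standing in for the numbered derived table (column rn in place of row_number): the ON version keeps the post without comments as a row with NULL, while the WHERE version drops it, because for that post c.rn is NULL and NULL <= 2 is not true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Posts (uuid TEXT PRIMARY KEY);
    -- stand-in for the numbered derived table "Com"
    CREATE TABLE Com (uuidPost TEXT, comment TEXT, rn INTEGER);
    INSERT INTO Posts VALUES ('has_comments'), ('no_comments');
    INSERT INTO Com VALUES ('has_comments', 'newest', 1),
                           ('has_comments', 'second', 2),
                           ('has_comments', 'third',  3);
""")

IN_ON = """SELECT p.uuid, c.comment FROM Posts p
           LEFT JOIN Com c ON c.uuidPost = p.uuid AND c.rn <= 2
           ORDER BY p.uuid, c.rn"""
IN_WHERE = """SELECT p.uuid, c.comment FROM Posts p
              LEFT JOIN Com c ON c.uuidPost = p.uuid
              WHERE c.rn <= 2
              ORDER BY p.uuid, c.rn"""

on_rows = conn.execute(IN_ON).fetchall()
where_rows = conn.execute(IN_WHERE).fetchall()
print(on_rows)     # [('has_comments', 'newest'), ('has_comments', 'second'), ('no_comments', None)]
print(where_rows)  # [('has_comments', 'newest'), ('has_comments', 'second')]
```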
This type of question has been asked many times, and getting the "latest-for-each" always appears to be a stumbling block and a join/subquery nightmare for most.
Especially for a web interface, you might be better off tacking a column (or 2 or 3) onto the one table that is your active "posts" table, such as Latest1, Latest2, Latest3.
Then put an insert trigger on your comment table that updates the main post with the newest ID whenever a comment is inserted. That way you always have that ID on the post table without any sub-joins. Since, as you mentioned, you might want the last 2 or 3 IDs, add the 3 sample columns and have the insert trigger on the comment detail table run an update on the primary post table, something like
update PrimaryPostTable
set Latest3 = Latest2,
Latest2 = Latest1,
Latest1 = NewDetailCommentID
where PostID = PostIDFromTheInsertedDetail
This would have to be formalized into a proper trigger under MySQL, but should be easy enough to implement. You could prime the list with the latest comment; then, as new comments arrive, the trigger automatically rolls the most recent ones into the 1st, 2nd and 3rd positions. Finally, your query could be simplified down to something like
Select
P.PostID,
P.TopicDescription,
PD1.WhateverDetail as LatestDetail1,
PD2.WhateverDetail as LatestDetail2,
PD3.WhateverDetail as LatestDetail3
from
Posts P
LEFT JOIN PostDetail PD1
on P.Latest1 = PD1.PostDetailID
LEFT JOIN PostDetail PD2
on P.Latest2 = PD2.PostDetailID
LEFT JOIN PostDetail PD3
on P.Latest3 = PD3.PostDetailID
where
whateverCondition
Denormalizing data is typically NOT desired. However, in cases such as this, it is a great simplifier for getting these "latest" entries in a For-Each type of query. Good luck.
Here is a fully working sample in MySQL, so you can see the tables, the results of the inserts, and the automatic stamping via the trigger that updates the main post table. Querying the post table then shows how the most recent activity automatically rolls into the first, second and third positions. Finally, a join shows how to pull all the data for each "post activity".
CREATE TABLE Posts
( id int,
uuid varchar(7),
imageLink varchar(9),
`date` datetime,
ActivityID1 int null,
ActivityID2 int null,
ActivityID3 int null,
PRIMARY KEY (id)
);
CREATE TABLE Activity
( id int,
postid int,
`type` varchar(40) collate utf8_unicode_ci,
commentText varchar(20) collate utf8_unicode_ci,
`date` datetime,
PRIMARY KEY (id)
);
DELIMITER //
CREATE TRIGGER ActivityRecAdded
AFTER INSERT ON Activity FOR EACH ROW
BEGIN
Update Posts
set ActivityID3 = ActivityID2,
ActivityID2 = ActivityID1,
ActivityID1 = NEW.ID
where
ID = NEW.POSTID;
END; //
DELIMITER ;
INSERT INTO Posts
(id, uuid, imageLink, `date`)
VALUES
(123, 'test1', 'blah', '2016-10-26 00:00:00');
INSERT INTO Posts
(id, uuid, imageLink, `date`)
VALUES
(125, 'test2', 'blah 2', '2016-10-26 00:00:00');
INSERT INTO Activity
(id, postid, `type`, `commentText`, `date`)
VALUES
(789, 123, 'type1', 'any comment', '2016-10-26 00:00:00'),
(821, 125, 'type2', 'another comment', '2016-10-26 00:00:00'),
(824, 125, 'type3', 'third comment', '2016-10-27 00:00:00'),
(912, 123, 'typeAB', 'comment', '2016-10-27 00:00:00');
-- See the results after the insert and the triggers.
-- you will see that the post table has been updated with the
-- most recent
-- activity post ID=912 in position Posts.ActivityID1
-- activity post ID=789 in position Posts.ActivityID2
-- no value in position Posts.ActivityID3
select * from Posts;
-- NOW, insert two more records for post ID = 123.
-- you will see the shift of ActivityIDs adjusted
INSERT INTO Activity
(id, postid, `type`, `commentText`, `date`)
VALUES
(931, 123, 'type1', 'any comment', '2016-10-28 00:00:00'),
(948, 123, 'newest', 'blah', '2016-10-29 00:00:00');
-- See the results after the inserts and the trigger.
-- The Posts table has been updated with the most recent activity for post 123:
-- activity post ID=948 in position Posts.ActivityID1
-- activity post ID=931 in position Posts.ActivityID2
-- activity post ID=912 in position Posts.ActivityID3
-- Notice that the FIRST activity post (789) is gone: only the three
-- most recent IDs are kept, so the oldest one got pushed out.
select * from Posts;
-- Finally, query the data to get the most recent 3 items for each post.
select
p.id,
p.uuid,
p.imageLink,
p.`date`,
A1.id NewestActivityPostID,
A1.`type` NewestType,
A1.`date` NewestDate,
A2.id SecondActivityPostID,
A2.`type` SecondType,
A2.`date` SecondDate,
A3.id ThirdActivityPostID,
A3.`type` ThirdType,
A3.`date` ThirdDate
from
Posts p
left join Activity A1
on p.ActivityID1 = A1.ID
left join Activity A2
on p.ActivityID2 = A2.ID
left join Activity A3
on p.ActivityID3 = A3.ID;
You can create a test database so as not to corrupt yours while trying this example.
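If you don't have a spare MySQL server, here is a minimal sketch of the same trigger logic run against an in-memory SQLite database from Python, assuming SQLite's trigger syntax as a stand-in for MySQL's (no DELIMITER needed, column types simplified). It replays the inserts above and checks the slot contents described in the comments. Note that the SET order (ActivityID3, then ActivityID2, then ActivityID1) matters in MySQL, which evaluates single-table UPDATE assignments from left to right using already-updated values; SQLite evaluates every assignment against the old row, so there the order would not matter.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL example (assumption: SQLite syntax).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (
    id INTEGER PRIMARY KEY,
    uuid TEXT, imageLink TEXT, date TEXT,
    ActivityID1 INTEGER, ActivityID2 INTEGER, ActivityID3 INTEGER
);
CREATE TABLE Activity (
    id INTEGER PRIMARY KEY, postid INTEGER,
    type TEXT, commentText TEXT, date TEXT
);
-- Shift the three newest activity IDs down one slot on every insert.
CREATE TRIGGER ActivityRecAdded AFTER INSERT ON Activity
BEGIN
    UPDATE Posts
    SET ActivityID3 = ActivityID2,
        ActivityID2 = ActivityID1,
        ActivityID1 = NEW.id
    WHERE id = NEW.postid;
END;
""")

INSERT_ACTIVITY = ("INSERT INTO Activity (id, postid, type, commentText, date) "
                   "VALUES (?, ?, ?, ?, ?)")

def activity_slots(post_id):
    """Return (ActivityID1, ActivityID2, ActivityID3) for one post."""
    return conn.execute(
        "SELECT ActivityID1, ActivityID2, ActivityID3 FROM Posts WHERE id = ?",
        (post_id,)).fetchone()

conn.executemany("INSERT INTO Posts (id, uuid, imageLink, date) VALUES (?, ?, ?, ?)",
                 [(123, "test1", "blah", "2016-10-26 00:00:00"),
                  (125, "test2", "blah 2", "2016-10-26 00:00:00")])
conn.executemany(INSERT_ACTIVITY,
                 [(789, 123, "type1", "any comment", "2016-10-26 00:00:00"),
                  (821, 125, "type2", "another comment", "2016-10-26 00:00:00"),
                  (824, 125, "type3", "third comment", "2016-10-27 00:00:00"),
                  (912, 123, "typeAB", "comment", "2016-10-27 00:00:00")])
after_first = activity_slots(123)

conn.executemany(INSERT_ACTIVITY,
                 [(931, 123, "type1", "any comment", "2016-10-28 00:00:00"),
                  (948, 123, "newest", "blah", "2016-10-29 00:00:00")])
after_second = activity_slots(123)
print(after_first, after_second)
```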
This will probably get rid of the "Illegal mix of collations" error. Just after establishing the connection, run this query:
SET NAMES utf8 COLLATE utf8_unicode_ci;
For the question about the 'latest 2', please use the mysql command-line tool, run SHOW CREATE TABLE Posts and provide the output (ditto for the other relevant tables). phpMyAdmin (and other UIs) also let you run the query without going to a command line.
You can get there with a fairly simple query using subqueries. First I specify the user in the WHERE clause and join the posts, because that seems more logical to me. Then I get all the likes for a post with a subquery.
Instead of grouping and limiting the group size, we join only the comments we want, by requiring that fewer than two comments on the same post are newer than the one we are currently looking at.
Use INNER JOIN Activity instead of LEFT JOIN if you only want to show posts with at least one comment.
SELECT
u.id,
u.username,
u.fullname,
u.profileImage,
p.uuid,
p.caption,
p.path,
p.date,
(SELECT COUNT(*) FROM Activity v WHERE v.uuidPost = p.uuidPost AND v.type = 'like') AS likes,
a.commentText,
a.date AS commentDate
FROM
Users u INNER JOIN
Posts p ON p.id = u.id LEFT JOIN
Activity a ON a.uuid = p.uuid AND a.type = 'comment' AND 2 > (
SELECT COUNT(*) FROM Activity v
WHERE v.uuid = p.uuid AND v.type = 'comment' AND v.date > a.date)
WHERE
u.id = 145
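A minimal sketch of the "fewer than two newer comments" condition, run in SQLite from Python. The Activity table here is a hypothetical cut-down version (only uuid, type, commentText and date), and the condition sits in WHERE rather than in the ON of a LEFT JOIN to keep the example short; the correlated COUNT(*) is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Activity (uuid TEXT, type TEXT, commentText TEXT, date TEXT);
INSERT INTO Activity VALUES
  ('p1', 'comment', 'first',  '2016-10-01'),
  ('p1', 'comment', 'second', '2016-10-02'),
  ('p1', 'comment', 'third',  '2016-10-03'),
  ('p1', 'like',    NULL,     '2016-10-04'),
  ('p2', 'comment', 'only',   '2016-10-05');
""")

# Keep a comment only if fewer than 2 comments on the same post are newer.
LATEST_TWO = """
SELECT a.uuid, a.commentText
FROM Activity a
WHERE a.type = 'comment'
  AND 2 > (SELECT COUNT(*) FROM Activity v
           WHERE v.uuid = a.uuid AND v.type = 'comment' AND v.date > a.date)
ORDER BY a.uuid, a.date DESC
"""
rows = conn.execute(LATEST_TWO).fetchall()
print(rows)
```

Comments with identical dates tie (neither counts as newer than the other), so a post can return more than two rows in that case.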
That said, a redesign would probably be best, also performance-wise: Activity will soon contain a lot of entries, and they always have to be filtered by the desired type. The users table is fine, with an auto-incremented id as primary key. For the posts I would also add an auto-incremented id as primary key, plus user_id as a foreign key (you can also decide what happens on deletion; e.g. with ON DELETE CASCADE, all of a user's posts are deleted automatically along with the user).
For the comments and likes you can create separate tables with the two foreign keys user_id and post_id. In this simple form you can only like posts and nothing else, but if there are not many different kinds of likes, it can still be a good idea to create a post_likes table and a few other ..._likes tables. Think about how this data is usually queried: if those likes are mostly independent of each other, separate tables are probably a good choice.
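A rough sketch of the suggested redesign, using SQLite from Python as a stand-in for MySQL; the table and column names (posts.user_id, post_comments, post_likes, created_at) are hypothetical. It shows the cascade described above: deleting a user removes their posts, and with them the comments and likes attached to those posts. In MySQL you would declare the same FOREIGN KEY ... ON DELETE CASCADE clauses on InnoDB tables; SQLite only enforces them after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, username TEXT NOT NULL);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    caption TEXT
);
CREATE TABLE post_comments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    post_id INTEGER NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
    comment_text TEXT NOT NULL,
    created_at TEXT NOT NULL
);
CREATE TABLE post_likes (
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    post_id INTEGER NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
    PRIMARY KEY (user_id, post_id)   -- at most one like per user per post
);
""")

conn.execute("INSERT INTO users (username) VALUES ('alice'), ('bob')")
conn.execute("INSERT INTO posts (user_id, caption) VALUES (1, 'hello')")
conn.execute("INSERT INTO post_comments (user_id, post_id, comment_text, created_at) "
             "VALUES (2, 1, 'nice', '2016-10-26')")
conn.execute("INSERT INTO post_likes (user_id, post_id) VALUES (2, 1)")

def like_count(post_id):
    """Likes are a plain COUNT on their own table; no type filtering needed."""
    return conn.execute("SELECT COUNT(*) FROM post_likes WHERE post_id = ?",
                        (post_id,)).fetchone()[0]

likes_before = like_count(1)
# Deleting alice cascades to her post, then to that post's comments and likes.
conn.execute("DELETE FROM users WHERE id = 1")
remaining = {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
             for t in ("posts", "post_comments", "post_likes")}
print(likes_before, remaining)
```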

How to get 2 columns from one table and 2 rows as columns from other table in one row, in MySQL?

I know this is quite complicated, but I sincerely hope someone will look at it.
I made a short version (to make the problem easier to understand) and a full version (with the original SQL).
Short version:
TABLE A:
|1|a|b|
|2|c|d|
|3| | |
|5| | |
TABLE B:
|1|x|
|1|y|
|2|z|
|2|v|
|4|w|
How can I write a MySQL query that returns rows like these:
1|a|b|x|y
2|c|d|z|v
That is, 2 columns from A plus 2 rows from B turned into columns, only for keys 1 and 2, with no empty results.
Subquery?
Full version:
I'm trying to get the following from the Prestashop database in one row:
product id
ean13 code
upc code
feature with id 24
feature with id 25
It's easy to get id_product, ean13 and upc, as they are in one row of the ps_product table. To get the features I used subqueries (a JOIN didn't work out).
So I selected id_product, ean13, upc, (subquery1) AS code1, (subquery2) AS code2.
Then I needed to throw out empty rows, but I couldn't just use code1 or code2 in WHERE (MySQL doesn't allow select-list aliases in WHERE).
To make it work I had to put everything in subquery.
This code WORKS, but it is terribly ugly and I bet this should be done differently.
How can I make it BETTER?
SELECT * FROM(
SELECT
p.id_product as idp, p.ean13 as ean13, p.upc as upc, (
SELECT
fvl.value
FROM
`ps_feature_product` fp
LEFT JOIN
`ps_feature_value_lang` fvl ON (fp.id_feature_value = fvl.id_feature_value)
WHERE fp.id_feature = 24 AND fp.id_product = idp
) AS code1, (
SELECT
fvl.value
FROM
`ps_feature_product` fp
LEFT JOIN
`ps_feature_value_lang` fvl ON (fp.id_feature_value = fvl.id_feature_value)
WHERE fp.id_feature = 25 AND fp.id_product = idp
) AS code2,
m.name
FROM
`ps_product` p
LEFT JOIN
`ps_manufacturer` m ON (p.id_manufacturer = m.id_manufacturer)
) mainq
WHERE
ean13 != '' OR upc != '' OR code1 IS NOT NULL OR code2 IS NOT NULL
create table tablea
( id int,
col1 varchar(1),
col2 varchar(1));
create table tableb
( id int,
feature int,
cola varchar(1));
insert into tablea (id, col1, col2)
select 1,'a','b' union
select 2,'c','d' union
select 3,null,null union
select 5,null,null;
insert into tableb (id, feature, cola)
select 1,24,'x' union
select 1,25,'y' union
select 2,24,'z' union
select 2,25,'v' union
select 4,24,'w';
select a.id, a.col1, a.col2, b1.cola b1a, b2.cola b2a
from tablea a
inner join tableb b1 on (b1.id = a.id and b1.feature = 24)
inner join tableb b2 on (b2.id = a.id and b2.feature = 25);
What you want to do is called a pivot query. MySQL has no native support for pivot queries, though some other RDBMSs do.
You can simulate a pivot query with derived columns, but you must spell out each derived column. That is, MySQL itself cannot make the number of columns follow the number of rows in another table; the columns have to be known ahead of time.
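One way to build those derived columns, sketched here with the tablea/tableb sample data, is conditional aggregation: one MAX(CASE ...) per feature ID. This is a swapped-in technique rather than the double self-join shown earlier, and it runs in SQLite from Python as a stand-in for MySQL (the same SELECT is valid MySQL). Each feature ID still has to be written into the query by hand, which is exactly the limitation described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablea (id INTEGER, col1 TEXT, col2 TEXT);
CREATE TABLE tableb (id INTEGER, feature INTEGER, cola TEXT);
INSERT INTO tablea VALUES (1,'a','b'), (2,'c','d'), (3,NULL,NULL), (5,NULL,NULL);
INSERT INTO tableb VALUES (1,24,'x'), (1,25,'y'), (2,24,'z'), (2,25,'v'), (4,24,'w');
""")

# One derived column per feature ID; the feature list is fixed in the SQL text.
PIVOT = """
SELECT a.id, a.col1, a.col2,
       MAX(CASE WHEN b.feature = 24 THEN b.cola END) AS code1,
       MAX(CASE WHEN b.feature = 25 THEN b.cola END) AS code2
FROM tablea a
JOIN tableb b ON b.id = a.id
GROUP BY a.id, a.col1, a.col2
ORDER BY a.id
"""
rows = conn.execute(PIVOT).fetchall()
print(rows)
```

Unlike the two INNER JOINs, this version keeps an id that has only one of the two features (the missing code comes back as NULL).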
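It would be much easier to query the results as rows and then do the aggregation into columns in PHP. For example (assuming $result is a PDOStatement for a query that returns id and feature columns):
$table = array();
while ($row = $result->fetch(PDO::FETCH_OBJ)) {
    if (!isset($table[$row->id])) {
        $table[$row->id] = array();
    }
    // Collect every feature value for this id into one array (one "row" per id)
    $table[$row->id][] = $row->feature;
}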
This is not a simple question because it's not a standard query, but if you can use views, you can do the following. Assuming you're starting from these tables:
CREATE TABLE `A` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`firstA` char(1) NOT NULL DEFAULT '',
`secondA` char(1) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
);
CREATE TABLE `B` (
`id` int(11) unsigned NOT NULL,
`firstB` char(1) NOT NULL DEFAULT ''
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `A` (`id`, `firstA`, `secondA`)
VALUES (1, 'a', 'b'), (2, 'c', 'd');
INSERT INTO `B` (`id`, `firstB`)
VALUES (1, 'x'), (1, 'y'), (2, 'z'), (2, 'v'), (4, 'w');
First create a view that joins the two tables:
create or replace view c_join as
select A.firstA, A.secondA, B.firstB
from A
join B on B.id=A.id;
Create the view that groups the rows in table B:
create or replace view d_group_concat as
select firstA, secondA, group_concat(firstB) groupconcat
from c_join
group by firstA, secondA;
Create the view that does what you need:
create or replace view e_result as
select firstA, secondA, SUBSTRING_INDEX(groupconcat,',',1) firstB, SUBSTRING_INDEX(SUBSTRING_INDEX(groupconcat,',',2),',',-1) secondB
from d_group_concat;
And that's all. Hope this helps you.
If you can't create views, this could be the query:
select firstA, secondA, SUBSTRING_INDEX(groupconcat,',',1) firstB, SUBSTRING_INDEX(SUBSTRING_INDEX(groupconcat,',',2),',',-1) secondB
from (
select firstA, secondA, group_concat(firstB) groupconcat
from (
select A.firstA, A.secondA, B.firstB
from A
join B on B.id=A.id
) c_join
group by firstA, secondA
) d_group_concat
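To see what the SUBSTRING_INDEX calls do with each groupconcat value, here is a small Python re-implementation of MySQL's SUBSTRING_INDEX(str, delim, count) semantics; the function names are hypothetical and only a non-empty delimiter is handled. A positive count keeps everything left of the count-th delimiter, a negative count keeps everything right of the |count|-th delimiter counted from the end, and the whole string is returned when there are fewer delimiters.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Illustrative copy of MySQL SUBSTRING_INDEX for a non-empty delim."""
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        # Left of the count-th delimiter (whole string if there are fewer).
        return delim.join(parts[:count])
    # Right of the |count|-th delimiter, counting from the end.
    return delim.join(parts[count:])


def first_two(groupconcat: str) -> tuple:
    """Mirror the firstB / secondB expressions used in the query above."""
    first = substring_index(groupconcat, ",", 1)
    second = substring_index(substring_index(groupconcat, ",", 2), ",", -1)
    return first, second


print(first_two("x,y"), first_two("w"))
```

One consequence: a group with a single B row (e.g. 'w') yields the same value in firstB and secondB. Also, GROUP_CONCAT does not guarantee the order of the concatenated values unless you add an ORDER BY inside it.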
Big thanks to everyone for the answers. James's answer was first, simplest and works perfectly in my case. The query runs several times faster than mine, with subqueries. Thanks, James!
A few words on why I needed this:
It's part of an integration component between Prestashop and a wholesale exchange platform. Wholesalers on the platform use 4 product code systems (ean13, upc and 2 others). Those 2 other product codes are stored as product features in Prestashop. There are thousands of products in the shop and hundreds of thousands on the platform, which is why speed is crucial.
Here is the code for the full version of my question. Maybe someone will find it helpful.
Query to get Prestashop product codes and certain features in one row:
SELECT
p.id_product, p.ean13, p.upc, fvl1.value as code1, fvl2.value as code2
FROM `ps_product` p
LEFT JOIN
`ps_feature_product` fp1 ON (p.id_product = fp1.id_product and fp1.id_feature = 24)
LEFT JOIN
`ps_feature_value_lang` fvl1 ON (fvl1.id_feature_value = fp1.id_feature_value)
LEFT JOIN
`ps_feature_product` fp2 ON (p.id_product = fp2.id_product and fp2.id_feature = 25)
LEFT JOIN
`ps_feature_value_lang` fvl2 ON (fvl2.id_feature_value = fp2.id_feature_value)
WHERE
ean13 != '' OR upc != '' OR fvl1.value IS NOT NULL OR fvl2.value IS NOT NULL;

How to order comments by likes/dislikes in MySQL

On my website people can thumbs up or thumbs down a comment.
To do this I use two tables:
$sql = "CREATE TABLE content
(
id INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY(id),
content TEXT NOT NULL,
date date,
time time
)";
and
$sql2 = "CREATE TABLE ratings
(
rating_id INT AUTO_INCREMENT PRIMARY KEY NOT NULL ,
rating VARCHAR (10) NOT NULL ,
id INT NOT NULL ,
ip VARCHAR (50) NOT NULL
)";
The data stored in the ratings would be as follows:
Comment ID | like/dislike | user IP
1 | l | 86.42.173.83
1 | d | 86.42.173.43
2 | l | 86.42.173.79
2 | l | 86.42.173.34
2 | d | 86.42.173.22
The problem I'm having is that I'm finding it extremely difficult to write a SQL statement that orders the comments by the number of likes they have.
If anyone has any ideas on how to do this it would be greatly appreciated.
It would be easier if you stored likes as integers and not letters.
I computed a net rating with a CASE expression (+1 for a like, -1 for a dislike, 0 for a comment with no ratings) and grouped by comment:
SELECT C.content,
SUM(CASE WHEN R.rating = 'l' THEN 1 WHEN R.rating = 'd' THEN -1 ELSE 0 END) AS overallRating
FROM content C
LEFT JOIN ratings R ON R.id = C.id
GROUP BY C.id, C.content
ORDER BY overallRating DESC
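A minimal check of the net-rating idea with the sample ratings from the question, run in SQLite from Python (the comment texts are made up). It scores a comment with no ratings as 0: with the LEFT JOIN such a comment produces one row with a NULL rating, so a plain ELSE -1 would count it as a dislike. It also returns a plain likes count next to the net rating.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content (id INTEGER PRIMARY KEY, content TEXT NOT NULL);
CREATE TABLE ratings (rating_id INTEGER PRIMARY KEY, rating TEXT NOT NULL,
                      id INTEGER NOT NULL, ip TEXT NOT NULL);
INSERT INTO content (id, content) VALUES (1, 'first'), (2, 'second'), (3, 'unrated');
INSERT INTO ratings (rating, id, ip) VALUES
  ('l', 1, '86.42.173.83'), ('d', 1, '86.42.173.43'),
  ('l', 2, '86.42.173.79'), ('l', 2, '86.42.173.34'), ('d', 2, '86.42.173.22');
""")

BY_RATING = """
SELECT C.id, C.content,
       SUM(CASE WHEN R.rating = 'l' THEN 1
                WHEN R.rating = 'd' THEN -1
                ELSE 0 END) AS overallRating,          -- NULL (no rating) -> 0
       SUM(CASE WHEN R.rating = 'l' THEN 1 ELSE 0 END) AS likes
FROM content C
LEFT JOIN ratings R ON R.id = C.id
GROUP BY C.id, C.content
ORDER BY overallRating DESC, C.id
"""
rows = conn.execute(BY_RATING).fetchall()
print(rows)
```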
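Something like this will work (it counts only likes, and the inner join leaves out comments with no likes at all):
select content.id, content.content, count(*) likes
from content join ratings on content.id = ratings.id
where ratings.rating = 'l'
group by content.id, content.content
order by likes desc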

How to order this specific Inner Joins?

Right now I'm creating an online game where I list the latest player transfers.
The table that handles the player history has the columns history_join_date and history_end_date.
When history_end_date is filled in, the player left the club on that date; when it still holds the default (0000-00-00 00:00:00) and history_join_date has a date, the player joined the club on history_join_date.
Right now, I have the following query:
SELECT
player_id,
player_nickname,
team_id,
team_name,
history_join_date,
history_end_date
FROM
players
INNER JOIN history
ON history.history_user_id = players.player_id
INNER JOIN teams
ON history.history_team_id = teams.team_id
ORDER BY
history_end_date DESC,
history_join_date DESC
LIMIT 7
However, this query returns something like this (after the PHP filter below):
(22-Aug-2012 23:05): Folha has left Portuguese Haxball Team.
(22-Aug-2012 00:25): mancini has left United.
(21-Aug-2012 01:29): PatoDaOldSchool has left Reign In Power.
(22-Aug-2012 23:37): Master has joined Born To Win.
(22-Aug-2012 23:28): AceR has joined Born To Win.
(22-Aug-2012 23:08): Nasri has joined Porto Club of Haxball.
(22-Aug-2012 18:53): Lloyd Banks has joined ARRIBA.
PHP Filter:
foreach ($transfers as $transfer) {
    // has joined: the end date is still the zero default, which sorts below the join date
    if ($transfer['history_end_date'] < $transfer['history_join_date']) {
        $type = ' has joined ';
        $date = date("d-M-Y H:i", strtotime($transfer['history_join_date']));
    } else {
        $type = ' has left ';
        $date = date("d-M-Y H:i", strtotime($transfer['history_end_date']));
    }
    // ... output the line using $date and $type ...
}
As you can see, the transfers are not strictly in date order (22-Aug => 21-Aug => 22-Aug).
What am I missing in the SQL?
Regards!
The issue is that you are ordering by two different values. Your results are ordered first by history_end_date, and only when the end dates are equal (i.e. when they hold the default value) are they ordered by history_join_date.
(Note that your first results are all ends, and then your subsequent results are all joins, and each subset is properly ordered).
How much control do you have over this data structure? You might be able to restructure the history table such that there is only a single date, and a history type of JOINED or END... You might be able to make a view of joined_date and end_date and sort across that...
From what you have in the question I made up the following DDL & Data:
create table players (
player_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
player_nickname VARCHAR(255) NOT NULL UNIQUE
);
create table teams (
team_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
team_name VARCHAR(255) NOT NULL UNIQUE
);
create table history (
history_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
history_user_id INT NOT NULL, history_team_id INT NOT NULL,
history_join_date DATETIME NOT NULL,
history_end_date DATETIME NOT NULL DEFAULT "0000-00-00 00:00:00"
);
insert into players VALUES
(1,'Folha'),
(2,'mancini'),
(3,'PatoDaOldSchool'),
(4,'Master'),
(5,'AceR'),
(6,'Nasri'),
(7,'Lloyd Banks');
insert into teams VALUES
(1,'Portuguese Haxball Team'),
(2,'United'),
(3,'Reign In Power'),
(4,'Born To Win'),
(5,'Porto Club of Haxball'),
(6,'ARRIBA');
insert into history VALUES
(DEFAULT,1,1,'2012-08-01 00:04','2012-08-22 23:05'),
(DEFAULT,2,2,'2012-08-21 19:04','2012-08-22 00:25'),
(DEFAULT,3,3,'2012-08-19 01:29','2012-08-21 01:29'),
(DEFAULT,4,4,'2012-08-22 23:37',DEFAULT),
(DEFAULT,5,4,'2012-08-22 23:28',DEFAULT),
(DEFAULT,6,5,'2012-08-22 23:08',DEFAULT),
(DEFAULT,7,6,'2012-08-22 18:53',DEFAULT);
SOLUTION ONE - History Event View
This is obviously not the only solution (and you'd have to evaluate the options as they suit your needs), but you could create a view in MySQL for your history events, join to it and use it for ordering, similar to the following:
create view historyevent (
event_user_id,
event_team_id,
event_date,
event_type
) AS
SELECT
history_user_id,
history_team_id,
history_join_date,
'JOIN'
FROM history
UNION
SELECT
history_user_id,
history_team_id,
history_end_date,
'END'
FROM history
WHERE history_end_date <> "0000-00-00 00:00:00";
Your select then becomes:
SELECT
player_id,
player_nickname,
team_id,
team_name,
event_date,
event_type
FROM players
INNER JOIN historyevent
ON historyevent.event_user_id = players.player_id
INNER JOIN teams
ON historyevent.event_team_id = teams.team_id
ORDER BY
event_date DESC;
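A minimal sketch of the event view using the sample data above, in SQLite from Python as a stand-in for MySQL. Assumptions: dates are stored as 'YYYY-MM-DD HH:MM' text (so string order equals date order), the DEFAULT keyword in VALUES is replaced by omitting the column (SQLite does not accept DEFAULT there), and the view aliases its columns in the first SELECT instead of using a column list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
ZERO = "0000-00-00 00:00:00"  # the "still at the club" default from the question
conn.executescript(f"""
CREATE TABLE players (player_id INTEGER PRIMARY KEY, player_nickname TEXT);
CREATE TABLE teams (team_id INTEGER PRIMARY KEY, team_name TEXT);
CREATE TABLE history (
    history_id INTEGER PRIMARY KEY,
    history_user_id INTEGER NOT NULL, history_team_id INTEGER NOT NULL,
    history_join_date TEXT NOT NULL,
    history_end_date TEXT NOT NULL DEFAULT '{ZERO}'
);
INSERT INTO players VALUES (1,'Folha'),(2,'mancini'),(3,'PatoDaOldSchool'),
  (4,'Master'),(5,'AceR'),(6,'Nasri'),(7,'Lloyd Banks');
INSERT INTO teams VALUES (1,'Portuguese Haxball Team'),(2,'United'),
  (3,'Reign In Power'),(4,'Born To Win'),(5,'Porto Club of Haxball'),(6,'ARRIBA');
INSERT INTO history (history_user_id, history_team_id, history_join_date, history_end_date)
VALUES (1,1,'2012-08-01 00:04','2012-08-22 23:05'),
       (2,2,'2012-08-21 19:04','2012-08-22 00:25'),
       (3,3,'2012-08-19 01:29','2012-08-21 01:29');
INSERT INTO history (history_user_id, history_team_id, history_join_date)
VALUES (4,4,'2012-08-22 23:37'), (5,4,'2012-08-22 23:28'),
       (6,5,'2012-08-22 23:08'), (7,6,'2012-08-22 18:53');
-- One row per event: every join, plus every end that is not the zero default.
CREATE VIEW historyevent AS
  SELECT history_user_id AS event_user_id, history_team_id AS event_team_id,
         history_join_date AS event_date, 'JOIN' AS event_type
  FROM history
  UNION
  SELECT history_user_id, history_team_id, history_end_date, 'END'
  FROM history
  WHERE history_end_date <> '{ZERO}';
""")

EVENTS = """
SELECT p.player_nickname, t.team_name, e.event_date, e.event_type
FROM players p
JOIN historyevent e ON e.event_user_id = p.player_id
JOIN teams t ON e.event_team_id = t.team_id
ORDER BY e.event_date DESC
LIMIT 7
"""
events = conn.execute(EVENTS).fetchall()
for event in events:
    print(event)
```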
The benefit here is that you get both the join and the leave events for the same player.
SOLUTION TWO - Pseudo column. Use the IF() function to pick one column or the other:
SELECT
player_id,
player_nickname,
team_id,
team_name,
history_join_date,
history_end_date,
IF(history_end_date>history_join_date,history_end_date,history_join_date) as order_date
FROM
players
INNER JOIN history
ON history.history_user_id = players.player_id
INNER JOIN teams
ON history.history_team_id = teams.team_id
ORDER BY
order_date DESC;
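The pseudo-column ordering can be checked the same way. This sketch uses SQLite from Python, with a CASE expression standing in for MySQL's IF(); the teams join is left out for brevity, and dates are stored as text in one fixed format so that string comparison matches date order (the zero date sorts below every real date).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (player_id INTEGER PRIMARY KEY, player_nickname TEXT);
CREATE TABLE history (history_user_id INTEGER, history_team_id INTEGER,
                      history_join_date TEXT, history_end_date TEXT);
INSERT INTO players VALUES (1,'Folha'),(2,'mancini'),(3,'PatoDaOldSchool'),
  (4,'Master'),(5,'AceR'),(6,'Nasri'),(7,'Lloyd Banks');
INSERT INTO history VALUES
  (1,1,'2012-08-01 00:04','2012-08-22 23:05'),
  (2,2,'2012-08-21 19:04','2012-08-22 00:25'),
  (3,3,'2012-08-19 01:29','2012-08-21 01:29'),
  (4,4,'2012-08-22 23:37','0000-00-00 00:00:00'),
  (5,4,'2012-08-22 23:28','0000-00-00 00:00:00'),
  (6,5,'2012-08-22 23:08','0000-00-00 00:00:00'),
  (7,6,'2012-08-22 18:53','0000-00-00 00:00:00');
""")

# CASE plays the role of MySQL's IF(history_end_date > history_join_date, ...):
# each history row is sorted by whichever of its two dates is later.
ORDERED = """
SELECT p.player_nickname,
       CASE WHEN h.history_end_date > h.history_join_date
            THEN h.history_end_date ELSE h.history_join_date END AS order_date
FROM players p
JOIN history h ON h.history_user_id = p.player_id
ORDER BY order_date DESC
"""
rows = conn.execute(ORDERED).fetchall()
names = [name for name, _ in rows]
print(rows)
```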
Building on @Barmar's answer, you can also use GREATEST() to pick the greatest of its arguments. (MAX() is an aggregate function... not actually what you're looking for.)
I think what you want is:
ORDER BY MAX(history_join_date, history_end_date)

Need expert advice on complex nested queries

I have 3 queries. I was told that they were potentially inefficient so I was wondering if anyone who is experienced could suggest anything. The logic is somewhat complex so bear with me.
I have two tables: shoutbox and topic. Topic stores all information on topics that were created, while shoutbox stores all comments pertaining to each topic. Each comment belongs to a group identified by reply_chunk_id. The comment with the earliest timestamp is the first comment, while any following ones with the same reply_chunk_id and a later timestamp are replies. I would like to find the latest comment in each group that was started by the user (i.e. the user made the first comment) and, if that latest comment was made this month, display it.
What I have written achieves that, with one problem: all the latest comments are displayed in random order. I would like to sort these groups by their latest comment. I really appreciate any advice.
Shoutbox
Field Type
-------------------
id int(5)
timestamp int(11)
user varchar(25)
message varchar(2000)
topic_id varchar(35)
reply_chunk_id varchar(35)
Topic
id mediumint(8)
topic_id varchar(35)
subject_id mediumint(8)
file_name varchar(35)
topic_title varchar(255)
creator varchar(25)
topic_host varchar(255)
timestamp int(11)
color varchar(10)
mp3 varchar(75)
custom_background varchar(55)
description mediumtext
content_type tinyint(1)
Query
$sql="SELECT reply_chunk_id FROM shoutbox
GROUP BY reply_chunk_id
HAVING count(*) > 1
ORDER BY timestamp DESC ";
$stmt16 = $conn->prepare($sql);
$result=$stmt16->execute();
while($row = $stmt16->fetch(PDO::FETCH_ASSOC)){
$sql="SELECT user,reply_chunk_id, MIN(timestamp) AS grp_timestamp
FROM shoutbox WHERE reply_chunk_id=? AND user=?";
$stmt17 = $conn->prepare($sql);
$result=$stmt17->execute(array($row['reply_chunk_id'],$user));
while($row2 = $stmt17->fetch(PDO::FETCH_ASSOC)){
$sql="SELECT t.topic_title, t.content_type, t.subject_id,
t.creator, t.description, t.topic_host,
c1.message, c1.topic_id, c1.user, c1.timestamp AS max
FROM shoutbox c1
JOIN topic t ON (t.topic_id = c1.topic_id)
WHERE reply_chunk_id = ? AND c1.timestamp > ?
ORDER BY c1.timestamp DESC, c1.id
LIMIT 1";
$stmt18 = $conn->prepare($sql);
$result=$stmt18->execute(array($row2['reply_chunk_id'],$month));
while($row3 = $stmt18->fetch(PDO::FETCH_ASSOC)){
// ... display the latest comment from $row3 ...
}
}
}
Make the first query:
SELECT reply_chunk_id FROM shoutbox
GROUP BY reply_chunk_id
HAVING count(*) > 1
ORDER BY timestamp DESC
This does the same, but is faster.
Make sure you have an index on reply_chunk_id.
The second query:
SELECT user,reply_chunk_id, MIN(timestamp) AS grp_timestamp
FROM shoutbox WHERE reply_chunk_id=? AND user=?
No GROUP BY is needed: an aggregate such as MIN() without GROUP BY returns exactly one row, and the equality tests already restrict it to one chunk and one user.
The third query:
SELECT t.topic_title, t.content_type, t.subject_id,
t.creator, t.description, t.topic_host,
c1.message, c1.topic_id, c1.user, c1.timestamp AS max
FROM shoutbox c1
JOIN topic t ON (t.topic_id = c1.topic_id)
WHERE reply_chunk_id = ? AND c1.timestamp > ?
ORDER BY c1.timestamp DESC, c1.id
LIMIT 1
Doing it all in one query:
SELECT
t.user,t.reply_chunk_id, MIN(t.timestamp) AS grp_timestamp,
t.topic_title, t.content_type, t.subject_id,
t.creator, t.description, t.topic_host,
c1.message, c1.topic_id, c1.user, c1.timestamp AS max
FROM shoutbox c1
INNER JOIN topic t ON (t.topic_id = c1.topic_id)
LEFT JOIN shoutbox c2 ON (c1.id = c2.id and c1.timestamp < c2.timestamp)
WHERE c2.timestamp IS NULL AND t.user = ?
GROUP BY t.reply_chunk_id
HAVING count(*) > 1
ORDER BY t.reply_chunk_id
or the equivalent
SELECT
t.user,t.reply_chunk_id, MIN(t.timestamp) AS grp_timestamp,
t.topic_title, t.content_type, t.subject_id,
t.creator, t.description, t.topic_host,
c1.message, c1.topic_id, c1.user, c1.timestamp AS max
FROM shoutbox c1
INNER JOIN topic t ON (t.topic_id = c1.topic_id)
WHERE c1.timestamp = (SELECT max(timestamp) FROM shoutbox c2
WHERE c2.reply_chunk_id = c1.reply_chunk_id)
AND t.user = ?
GROUP BY t.reply_chunk_id
HAVING count(*) > 1
ORDER BY t.reply_chunk_id
How does this work?
1. The GROUP BY selects one entry per reply_chunk_id.
2. The LEFT JOIN to c2 plus WHERE c2.`timestamp` IS NULL is meant to keep only the shoutbox rows with the highest timestamp: for each c1 row it looks for a c2 row with a later timestamp, and keeps c1 only if none exists. For this to pick the latest comment per chunk, c2 must be matched on the same reply_chunk_id (c1.reply_chunk_id = c2.reply_chunk_id); matching on c1.id = c2.id only compares each row with itself, so a later row is never found.
If you don't understand point 2, see: http://dev.mysql.com/doc/refman/5.0/en/example-maximum-column-group-row.html
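A minimal sketch of the LEFT JOIN / IS NULL group-wise maximum, joining on reply_chunk_id, run in SQLite from Python against a hypothetical cut-down shoutbox table. Ordering by c1.timestamp DESC also puts the groups in order of their latest comment, which is what the question asked for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shoutbox (id INTEGER PRIMARY KEY, timestamp INTEGER, user TEXT,
                       message TEXT, reply_chunk_id TEXT);
INSERT INTO shoutbox VALUES
  (1, 100, 'ann', 'start A', 'A'),
  (2, 150, 'bob', 'reply A', 'A'),
  (3, 120, 'ann', 'start B', 'B'),
  (4, 300, 'cat', 'reply B', 'B'),
  (5, 200, 'bob', 'start C', 'C');
""")

LATEST_PER_CHUNK = """
SELECT c1.reply_chunk_id, c1.user, c1.message, c1.timestamp
FROM shoutbox c1
LEFT JOIN shoutbox c2
       ON c2.reply_chunk_id = c1.reply_chunk_id   -- same group ...
      AND c2.timestamp > c1.timestamp             -- ... but a later comment
WHERE c2.id IS NULL                               -- no later comment exists
ORDER BY c1.timestamp DESC
"""
rows = conn.execute(LATEST_PER_CHUNK).fetchall()
print(rows)
```

If two comments in a chunk share the highest timestamp, both are returned.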
Note that the PDO is autoescaping the fields with backticks
It sounds like most of this should come directly from your ShoutBox table. First, pre-query to find all the "chunks" the user posted in; for those chunks (and their topic_ID, since each chunk always belongs to the same topic), get their respective minimum and maximum IDs. Using "HAVING count(*) > 1" would force only those that HAVE a second posting by a given user (what you were looking for).
THEN, re-query the chunks to get the minimum regardless of user. This avoids querying ALL chunks. Then join only the chunks that the single user is associated with back to Topic.
Additionally (and I could be wrong here and need to adjust minimally), it appears that the ShoutBox table's ID column is an auto-increment column that just happens to be timestamped at creation too. So for a given chunk, the earliest ID is also the earliest timestamp, since both are assigned when the row is created. This also makes the subsequent JOINs and subquery easier.
Using STRAIGHT_JOIN should force the "PreQuery" to run FIRST, producing a very limited set, and then apply the WHERE clause and the joins afterwards.
select STRAIGHT_JOIN
T.topic_title,
T.content_type,
T.subject_id,
T.creator,
T.description,
T.topic_host,
sb2.Topic_ID,
sb2.message,
sb2.user,
sb2.TimeStamp
from
( select
sb1.Reply_Chunk_ID,
sb1.Topic_ID,
count(*) as TotalEntries,
min( sb1.id ) as FirstIDByChunkByUser,
min( sbJoin.id ) as FirstIDByChunk,
max( sbJoin.id ) as LastIDByChunk,
max( sbJoin.timestamp ) as LastTimeByChunk
from
ShoutBox sb1
join ShoutBox sbJoin
on sb1.Reply_Chunk_ID = sbJoin.Reply_Chunk_ID
where
sb1.user = CurrentUser
group by
sb1.Reply_Chunk_ID,
sb1.Topic_ID
having
min( sb1.id ) = min( sbJoin.ID ) ) PreQuery
join Topic T on
PreQuery.Topic_ID = T.topic_id
join ShoutBox sb2 on
PreQuery.LastIDByChunk = sb2.ID
where
sb2.TimeStamp >= YourTimeStampCriteria
order by
sb2.TimeStamp desc
EDIT ---- QUERY EXPLANATION -- with Modified query.
I've changed the query after re-reading the question (it was almost midnight when I answered, after a holiday weekend :)
First, STRAIGHT_JOIN is a MySQL keyword telling the engine to join the tables in the order I've written them. Sometimes the optimizer tries to think for you and picks an order that looks more efficient, but if you know from your data which part retrieves the smallest set first, joining the other lookup tables after it can in fact be better. Second, the "PreQuery": when a SELECT statement (in parentheses) is used in the FROM clause it needs an alias, and "PreQuery" is just the alias of that result set. I could have called it anything; the name simply reflects that it is a stand-alone query of its own. (Oops... fixed to ShoutBox :) As for case sensitivity, column names are NOT case-sensitive in MySQL, but table names can be (on most Unix systems "MyTest", "mytest" and "MYTEST" are different tables). Aliases also help readability, especially with VeryLongTableNamesUsed.
It should work now after re-reading and adjusting. Try the "PreQuery" on its own first to see how many records it returns. For a single "CurrentUser" parameter value, it returns every Reply_Chunk_ID the user posted in (each chunk always has the same topic_id), along with the first ID the person entered in it (MIN()). By JOINing ShoutBox again on the chunk ID, we get the minimum and maximum ID per chunk REGARDLESS of who started it or responded (only for chunks the user posted in). The HAVING clause then keeps only the chunks the same person STARTED (hence both have the same MIN() value).
Finally, once those have been qualified, join directly to the TOPIC and SHOUTBOX tables again on their own merits of topic_id and LastIDByChunk and order the final results by the latest comment response timestamp descending.
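A minimal sketch of the PreQuery logic, run in SQLite from Python against a hypothetical shoutbox table (the Topic join is left out, and STRAIGHT_JOIN is dropped because it is MySQL-only). Like the answer, it assumes IDs grow with time, so the highest ID in a chunk is its latest comment. The two parameters are the user and the timestamp threshold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shoutbox (id INTEGER PRIMARY KEY, timestamp INTEGER, user TEXT,
                       message TEXT, topic_id TEXT, reply_chunk_id TEXT);
INSERT INTO shoutbox VALUES
  (1, 100, 'ann', 'ann starts A',  't1', 'A'),
  (2, 150, 'bob', 'bob replies A', 't1', 'A'),
  (3, 120, 'bob', 'bob starts B',  't1', 'B'),
  (4, 130, 'ann', 'ann replies B', 't1', 'B'),
  (5, 200, 'ann', 'ann starts C',  't2', 'C'),
  (6, 260, 'cat', 'cat replies C', 't2', 'C');
""")

STARTED_BY_USER = """
SELECT sb2.reply_chunk_id, sb2.user, sb2.message, sb2.timestamp
FROM (
    SELECT sb1.reply_chunk_id,
           MIN(sb1.id)    AS first_id_by_user,
           MIN(sbJoin.id) AS first_id_by_chunk,
           MAX(sbJoin.id) AS last_id_by_chunk
    FROM shoutbox sb1
    JOIN shoutbox sbJoin ON sbJoin.reply_chunk_id = sb1.reply_chunk_id
    WHERE sb1.user = ?
    GROUP BY sb1.reply_chunk_id
    HAVING MIN(sb1.id) = MIN(sbJoin.id)    -- the user wrote the first comment
) AS PreQuery
JOIN shoutbox sb2 ON sb2.id = PreQuery.last_id_by_chunk
WHERE sb2.timestamp >= ?
ORDER BY sb2.timestamp DESC
"""
rows = conn.execute(STARTED_BY_USER, ("ann", 0)).fetchall()
print(rows)
```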
I've added a where clause to further limit your "timestamp" criteria where the most recent final timestamp is on/after the given time period you want.
I would be curious how this query's time performance works compared to your already accepted answer too.
