Selecting from two tables based on id and timestamps

I have two tables as follows:
tasks:
id int(5) auto_increment,
content mediumtext,
primary key(id)
completed:
id int(10) auto_increment,
taskid int(5),
datetime int(11),
primary key(id)
I'm attempting to run an SQL query to pull out the tasks that have not been marked as completed today. I'm doing the time calculation from PHP, so the query itself looks something like this:
$morning = mktime(0, 0, 0);
$night = mktime(23, 59, 59);
$query = sprintf("SELECT t.id, t.content FROM tasks as t, completed as c WHERE c.datetime < %s AND c.datetime > %s AND t.id != c.taskid", $night, $morning);
This comes out something like the following:
SELECT t.id, t.content FROM tasks as t, completed as c WHERE c.datetime < 1391471999 AND c.datetime > 1391385600 AND t.id != c.taskid
If someone could point me in the correct direction, that would be awesome. Thanks :)

The comma operator performs an INNER JOIN of the two tables. But you are trying to get tasks that have no corresponding row in the completed table, so you should use a LEFT JOIN instead. You could also do the date comparison in MySQL itself: it is less code to write, and you don't have to account for time zones in PHP every time.
I assume you have no rows with a future date in the completed table.
SELECT t.id, t.content
FROM tasks as t
LEFT JOIN completed as c ON t.id = c.taskid
WHERE c.datetime < 1391385600
OR c.datetime >= 1391472000
OR c.taskid IS NULL
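This selects tasks which:
were completed before today (c.datetime < 1391385600)
or were completed in the future (c.datetime >= 1391472000)
or are not completed at all (c.taskid IS NULL)
Note that a task with an older completion row is still returned even if it was also completed today; moving the date range into the ON clause and keeping only the IS NULL test avoids that.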

I don't really understand. The data set is too small to be representative. And if you can't figure out what the desired result set would look like then you probably shouldn't be here ;-).
That said, is this what you're after...
SELECT t.*
FROM tasks t
LEFT
JOIN completed c
ON c.taskid = t.id
AND c.datetime BETWEEN 1391385600 AND 1391472000
WHERE c.id IS NULL;
?
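The anti-join idea above (LEFT JOIN with the date range in the ON clause, then keep only rows where nothing matched) can be tried out quickly. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the sample rows and the variable names (morning, next_morning, open_tasks) are made up for illustration, and the half-open range is an assumption about what "today" should mean.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tasks (id INTEGER PRIMARY KEY, content TEXT);
    CREATE TABLE completed (id INTEGER PRIMARY KEY, taskid INTEGER, datetime INTEGER);
""")

# Half-open range [morning, next_morning) covers the whole of today.
morning_dt = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
morning = int(morning_dt.timestamp())
next_morning = int((morning_dt + timedelta(days=1)).timestamp())

conn.executemany("INSERT INTO tasks VALUES (?, ?)", [
    (1, "done today"),
    (2, "done yesterday only"),
    (3, "never done"),
    (4, "done yesterday and today"),
])
conn.executemany("INSERT INTO completed (taskid, datetime) VALUES (?, ?)", [
    (1, morning + 60),
    (2, morning - 3600),
    (4, morning - 7200),
    (4, morning + 120),
])

# The date range lives in the ON clause, so only today's completions can match;
# tasks without such a match come back with c.id = NULL and are kept.
open_tasks = conn.execute("""
    SELECT t.id, t.content
    FROM tasks t
    LEFT JOIN completed c
           ON c.taskid = t.id
          AND c.datetime >= ?
          AND c.datetime < ?
    WHERE c.id IS NULL
    ORDER BY t.id
""", (morning, next_morning)).fetchall()

print(open_tasks)
```

Task 4 is excluded even though it also has an older completion row, which is the case that a date filter in the WHERE clause gets wrong.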

Related

SELECT rows that are referenced less than x times in another table

In MySQL I have two tables, my reservable "weekend":
id bigint(20) unsigned,
label varchar(64),
date_start date,
max_attendees smallint(5) unsigned
And my attendees:
id bigint(20) unsigned,
name varchar(64),
email varchar(255),
weekend bigint(20) unsigned
I want to select all weekends that have attendees less than their max_attendees. This includes weekends that have 0 attendees.
Note: I also need to ignore weekend with id "1";
Currently, this works fine with PHP (I'm using Wordpress for mysql access), like so:
$weekends = $wpdb->get_results("SELECT * FROM $weekends_table
WHERE id <> 1", ARRAY_A);
$open_weekends = array();
foreach ($weekends as $weekend) {
$id = $weekend['id'];
$attendees = $wpdb->get_row("SELECT COUNT(id) as attendees
FROM $attendees_table
WHERE weekend = $id", ARRAY_A);
if ( $attendees['attendees'] < $weekend['max_attendees'] ) {
$weekend['attendees'] = $attendees['attendees'];
$open_weekends[] = $weekend;
}
}
Shouldn't I be able to do this in MySQL without the PHP? My knowledge of MySQL doesn't extend that far. Can you suggest a query?
Use the HAVING clause.
This is untested, so you may have to play with it, but here's the gist:
SELECT w.*, COUNT(a.name)
FROM weekend w
LEFT JOIN attendees a
ON w.id = a.weekend
WHERE w.id <> 1
GROUP BY w.id
HAVING COUNT(a.name) < w.max_attendees -- COUNT() yields 0, never NULL, for weekends without attendees
A very simple approach would be this:
SELECT COUNT(attendees_table.id) as attendees,
weekends_table.max_attendees as maximum
FROM weekends_table, attendees_table
WHERE attendees_table.weekend = weekends_table.id
GROUP BY weekends_table.id
You could also write this with an explicit JOIN ... ON attendees_table.weekend = weekends_table.id.
This should be possible as well:
SELECT COUNT(attendees_table.id) as attendees,
weekends_table.max_attendees as maximum
FROM weekends_table, attendees_table
WHERE attendees_table.weekend = weekends_table.id
GROUP BY weekends_table.id
HAVING attendees < maximum
This is all untested. I don't have your tables or data, but it might get you going?
Ah, that doesn't get you what you wanted: the join above leaves out weekends with zero attendees. To include them you can use a subselect:
SELECT weekends_table.id AS weekend_id
FROM weekends_table
WHERE weekends_table.max_attendees > (SELECT COUNT(*)
FROM attendees_table
WHERE attendees_table.weekend = weekends_table.id)
It should return the IDs of weekends where there is room for at least one more attendee. Again, completely untested, but perhaps it works?
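To check that a LEFT JOIN with GROUP BY and HAVING really keeps weekends with zero attendees, here is a small sketch using Python's sqlite3 module in place of MySQL and WordPress. The table contents are invented for illustration, and the tables are trimmed to the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE weekend (id INTEGER PRIMARY KEY, label TEXT, max_attendees INTEGER);
    CREATE TABLE attendees (id INTEGER PRIMARY KEY, name TEXT, weekend INTEGER);
    INSERT INTO weekend VALUES
        (1, 'ignored', 5),
        (2, 'full', 2),
        (3, 'one seat left', 2),
        (4, 'empty', 3);
    INSERT INTO attendees (name, weekend) VALUES
        ('a', 2), ('b', 2), ('c', 3), ('d', 1);
""")

# COUNT(a.id) counts only matched attendee rows, so a weekend with no
# attendees gets 0 (not NULL) and still passes the HAVING test.
open_weekends = conn.execute("""
    SELECT w.id, w.label, w.max_attendees, COUNT(a.id) AS attendees
    FROM weekend w
    LEFT JOIN attendees a ON a.weekend = w.id
    WHERE w.id <> 1
    GROUP BY w.id, w.label, w.max_attendees
    HAVING COUNT(a.id) < w.max_attendees
    ORDER BY w.id
""").fetchall()

print(open_weekends)
```

The result plays the role of $open_weekends in the PHP loop from the question, including the attendee count per weekend.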

Making sales statistics for my ERP system, but performance is bad

I have an ERP system programmed in PHP with a MySQL database, with all my orders for the past 4 years in it. Now I would like to make a function to generate sales statistics. It should be possible to set search criteria like Salesman, Department and year/period.
The sales statistics should be grouped by customer. Just like the illustration on this link:
http://maabjerg.eu/illustration_stat.png
My customers table:
customers
--------------------
id - int - auto - primary
name - varchar(100)
My orders table:
orders
-------------------
id - int - auto - primary
customerId - int
departmentId - int
salesmanId - int
orderdate - datetime
invoicedate - datetime
quantity - int
saleprice - decimal(10,2)
I had no trouble making this, but the performance is very bad. The way I had made it before was like:
foreach($customers as $customer)
{
foreach($months as $month)
{
$sql = mysql_query("select sum(quantity*saleprice) as amount from orders where DATE_FORMAT(invoicedate, '%m-%Y') = '".$month."-".$_REQUEST["year"]."' AND customerId='".$customer->id."'",$connection) or die(mysql_error());
$rs = mysql_fetch_assoc($sql);
$result[$customer->id][$month] = $rs["amount"];
}
}
I hope someone can give me advice how to make this the best way.
Thanks in advance.
Steffen
This is your query:
select sum(quantity*saleprice) as amount
from orders
where DATE_FORMAT(invoicedate, '%m-%Y') = '".$month."-".$_REQUEST["year"]."' AND
customerId='".$customer->id."'
As written, if you want to speed it up, add an index on orders(customerId).
You should also do this as one query:
select c.name, sum(quantity*saleprice) as amount
from customers c left outer join
orders o
on c.id = o.customerId
where DATE_FORMAT(invoicedate, '%m-%Y') = '".$month."-".$_REQUEST["year"]."' AND
customerId='".$customer->id."'
group by c.name;
You can rewrite the query a bit, and build an index on orders(customerId, invoicedate). This would require creating constants for the beginning and end of the period and then doing something like:
select c.name, sum(quantity*saleprice) as amount
from customers c left outer join
orders o
on c.id = o.customerId
where invoicedate between '$StartDate' and '$EndDate' AND
customerId='".$customer->id."'
group by c.name;
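MySQL cannot use an ordinary index on invoicedate when the column is wrapped in a function call such as DATE_FORMAT(); comparing the bare column against a date range lets the index be used.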
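The nested PHP loops from the question can be collapsed into one grouped query per year. The following is a sketch only, using Python's sqlite3 module instead of PHP and MySQL; the sample orders, the index name and the result layout are assumptions made for illustration. In MySQL the month would come from MONTH(o.invoicedate) rather than strftime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customerId INTEGER,
        invoicedate TEXT,
        quantity INTEGER,
        saleprice REAL
    );
    CREATE INDEX idx_orders_customer_date ON orders (customerId, invoicedate);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders (customerId, invoicedate, quantity, saleprice) VALUES
        (1, '2013-01-15 10:00:00', 2, 10.0),
        (1, '2013-01-20 09:30:00', 1, 5.0),
        (1, '2013-03-02 14:00:00', 3, 4.0),
        (2, '2013-03-09 08:00:00', 1, 100.0),
        (2, '2012-12-31 23:59:59', 9, 9.0);
""")

year = 2013
start = f"{year}-01-01 00:00:00"
end = f"{year + 1}-01-01 00:00:00"

# The WHERE clause compares the bare column against a range, so no function
# is applied to invoicedate while filtering; the month is only computed
# for grouping the rows that survive the filter.
rows = conn.execute("""
    SELECT c.name,
           CAST(strftime('%m', o.invoicedate) AS INTEGER) AS invoice_month,
           SUM(o.quantity * o.saleprice) AS amount
    FROM customers c
    JOIN orders o ON o.customerId = c.id
    WHERE o.invoicedate >= ? AND o.invoicedate < ?
    GROUP BY c.id, c.name, invoice_month
    ORDER BY c.name, invoice_month
""", (start, end)).fetchall()

# Reshape into {customer name: {month: amount}}, like $result in the question.
result = {}
for name, month, amount in rows:
    result.setdefault(name, {})[month] = amount

print(result)
```

One round trip replaces customers x months separate queries, and the values are bound as parameters instead of being concatenated into the SQL string.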

How to order this specific Inner Joins?

Right now I'm creating an online game where I list the last transfers of players.
The table that handles the history of players, has the columns history_join_date and history_end_date.
When history_end_date is filled, it means that player left a club, and when it is like the default (0000-00-00 00:00:00) and history_join_date has some date it means player joined the club (in that date).
Right now, I've the following query:
SELECT
player_id,
player_nickname,
team_id,
team_name,
history_join_date,
history_end_date
FROM
players
INNER JOIN history
ON history.history_user_id = players.player_id
INNER JOIN teams
ON history.history_team_id = teams.team_id
ORDER BY
history_end_date DESC,
history_join_date DESC
LIMIT 7
However, this query returns something like the following (formatted with the PHP filter shown below):
(22-Aug-2012 23:05): Folha has left Portuguese Haxball Team.
(22-Aug-2012 00:25): mancini has left United.
(21-Aug-2012 01:29): PatoDaOldSchool has left Reign In Power.
(22-Aug-2012 23:37): Master has joined Born To Win.
(22-Aug-2012 23:28): AceR has joined Born To Win.
(22-Aug-2012 23:08): Nasri has joined Porto Club of Haxball.
(22-Aug-2012 18:53): Lloyd Banks has joined ARRIBA.
PHP Filter:
foreach ($transfers as $transfer) {
//has joined
if($transfer['history_end_date']<$transfer['history_join_date']) {
$type = ' has joined ';
$date = date("d-M-Y H:i", strtotime($transfer['history_join_date']));
} else {
$type = ' has left ';
$date = date("d-M-Y H:i", strtotime($transfer['history_end_date']));
}
As you can see, in the transfers order, the date is not being followed strictly (22-Aug => 21-Aug => 22-Aug).
What am I missing in the SQL?
Regards!
The issue is you are ordering based upon two different values. So your results are ordered first by history_end_date, and when the end dates are equal (i.e. when it is the default value), they are then ordered by history_join_date
(Note that your first results are all ends, and then your subsequent results are all joins, and each subset is properly ordered).
How much control do you have over this data structure? You might be able to restructure the history table such that there is only a single date, and a history type of JOINED or END... You might be able to make a view of joined_date and end_date and sort across that...
From what you have in the question I made up the following DDL & Data:
create table players (
player_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
player_nickname VARCHAR(255) NOT NULL UNIQUE
);
create table teams (
team_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
team_name VARCHAR(255) NOT NULL UNIQUE
);
create table history (
history_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
history_user_id INT NOT NULL, history_team_id INT NOT NULL,
history_join_date DATETIME NOT NULL,
history_end_date DATETIME NOT NULL DEFAULT "0000-00-00 00:00:00"
);
insert into players VALUES
(1,'Folha'),
(2,'mancini'),
(3,'PatoDaOldSchool'),
(4,'Master'),
(5,'AceR'),
(6,'Nasri'),
(7,'Lloyd Banks');
insert into teams VALUES
(1,'Portuguese Haxball Team'),
(2,'United'),
(3,'Reign In Power'),
(4,'Born To Win'),
(5,'Porto Club of Haxball'),
(6,'ARRIBA');
insert into history VALUES
(DEFAULT,1,1,'2012-08-01 00:04','2012-08-22 23:05'),
(DEFAULT,2,2,'2012-08-21 19:04','2012-08-22 00:25'),
(DEFAULT,3,3,'2012-08-19 01:29','2012-08-21 01:29'),
(DEFAULT,4,4,'2012-08-22 23:37',DEFAULT),
(DEFAULT,5,4,'2012-08-22 23:28',DEFAULT),
(DEFAULT,6,5,'2012-08-22 23:08',DEFAULT),
(DEFAULT,7,6,'2012-08-22 18:53',DEFAULT);
SOLUTION ONE - History Event View
This is obviously not the only solution (and you'd have to evaluate the options as they suit your needs), but you could create a view in MySQL for your history events, join to it, and use it for ordering, similar to the following:
create view historyevent (
event_user_id,
event_team_id,
event_date,
event_type
) AS
SELECT
history_user_id,
history_team_id,
history_join_date,
'JOIN'
FROM history
UNION
SELECT
history_user_id,
history_team_id,
history_end_date,
'END'
FROM history
WHERE history_end_date <> "0000-00-00 00:00:00";
Your select then becomes:
SELECT
player_id,
player_nickname,
team_id,
team_name,
event_date,
event_type
FROM players
INNER JOIN historyevent
ON historyevent.event_user_id = players.player_id
INNER JOIN teams
ON historyevent.event_team_id = teams.team_id
ORDER BY
event_date DESC;
Benefit here is you can get both joins and leaves for the same player.
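As a quick check of the event-view idea, the sketch below builds a cut-down history table in SQLite (through Python's sqlite3 module) and sorts the combined join/leave events by a single date. It is illustrative only: it keeps just three of the sample rows, stores dates as text, and uses UNION ALL instead of UNION, which is an assumption that duplicate events do not need to be collapsed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history (
        history_id INTEGER PRIMARY KEY,
        history_user_id INTEGER NOT NULL,
        history_team_id INTEGER NOT NULL,
        history_join_date TEXT NOT NULL,
        history_end_date TEXT NOT NULL DEFAULT '0000-00-00 00:00:00'
    );
    INSERT INTO history
        (history_user_id, history_team_id, history_join_date, history_end_date)
    VALUES
        (1, 1, '2012-08-01 00:04:00', '2012-08-22 23:05:00'),
        (3, 3, '2012-08-19 01:29:00', '2012-08-21 01:29:00');
    INSERT INTO history (history_user_id, history_team_id, history_join_date)
    VALUES (4, 4, '2012-08-22 23:37:00');

    -- One row per event: every stint yields a JOIN, finished stints also an END.
    CREATE VIEW historyevent AS
        SELECT history_user_id AS event_user_id,
               history_team_id AS event_team_id,
               history_join_date AS event_date,
               'JOIN' AS event_type
        FROM history
        UNION ALL
        SELECT history_user_id, history_team_id, history_end_date, 'END'
        FROM history
        WHERE history_end_date <> '0000-00-00 00:00:00';
""")

events = conn.execute("""
    SELECT event_user_id, event_date, event_type
    FROM historyevent
    ORDER BY event_date DESC
    LIMIT 7
""").fetchall()

for event in events:
    print(event)
```

Because every event carries exactly one date, the ordering no longer depends on which of the two history columns happens to be filled.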
SOLUTION TWO - Pseudo column. Use the IF construction to pick one or the other column.
SELECT
player_id,
player_nickname,
team_id,
team_name,
history_join_date,
history_end_date,
IF(history_end_date>history_join_date,history_end_date,history_join_date) as order_date
FROM
players
INNER JOIN history
ON history.history_user_id = players.player_id
INNER JOIN teams
ON history.history_team_id = teams.team_id
ORDER BY
order_date DESC;
Building on @Barmar's answer, you can also use GREATEST() to pick the greatest of the arguments. (MAX() is a grouping function... not actually what you're looking for.)
I think what you want is:
ORDER BY MAX(history_join_date, history_end_date)
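Ordering by the later of the two dates can be tested on the sample transfers from the question. This sketch runs on SQLite through Python's sqlite3 module and uses a CASE expression, which is portable; in MySQL, GREATEST(history_join_date, history_end_date) expresses the same thing. The table is flattened to a nickname plus the two dates purely to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history (
        nickname TEXT,
        history_join_date TEXT NOT NULL,
        history_end_date TEXT NOT NULL DEFAULT '0000-00-00 00:00:00'
    );
    INSERT INTO history VALUES
        ('Folha',           '2012-08-01 00:04:00', '2012-08-22 23:05:00'),
        ('mancini',         '2012-08-21 19:04:00', '2012-08-22 00:25:00'),
        ('PatoDaOldSchool', '2012-08-19 01:29:00', '2012-08-21 01:29:00'),
        ('Master',          '2012-08-22 23:37:00', '0000-00-00 00:00:00'),
        ('AceR',            '2012-08-22 23:28:00', '0000-00-00 00:00:00'),
        ('Nasri',           '2012-08-22 23:08:00', '0000-00-00 00:00:00'),
        ('Lloyd Banks',     '2012-08-22 18:53:00', '0000-00-00 00:00:00');
""")

# Sort each row by whichever of its two dates is later; the all-zero default
# end date always compares lower than a real join date.
ordered = conn.execute("""
    SELECT nickname
    FROM history
    ORDER BY CASE
                 WHEN history_end_date > history_join_date THEN history_end_date
                 ELSE history_join_date
             END DESC
    LIMIT 7
""").fetchall()

names = [row[0] for row in ordered]
print(names)
```

Joins and departures now interleave by time instead of all departures being listed first.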

Need expert advice on complex nested queries

I have 3 queries. I was told that they were potentially inefficient so I was wondering if anyone who is experienced could suggest anything. The logic is somewhat complex so bear with me.
I have two tables: shoutbox, and topic. Topic stores all information on topics that were created, while shoutbox stores all comments pertaining to each topic. Each comment comes with a group labelled by reply_chunk_id. The earliest timestamp is the first comment, while any following with the same reply_chunk_id and a later timestamp are replies. I would like to find the latest comment for each group that was started by the user (made first comment) and if the latest comment was made this month display it.
What I have written achieves that with one problem: all the latest comments are displayed in random order. I would like to organize these groups/latest comments. I really appreciate any advice
Shoutbox
Field Type
-------------------
id int(5)
timestamp int(11)
user varchar(25)
message varchar(2000)
topic_id varchar(35)
reply_chunk_id varchar(35)
Topic
id mediumint(8)
topic_id varchar(35)
subject_id mediumint(8)
file_name varchar(35)
topic_title varchar(255)
creator varchar(25)
topic_host varchar(255)
timestamp int(11)
color varchar(10)
mp3 varchar(75)
custom_background varchar(55)
description mediumtext
content_type tinyint(1)
Query
$sql="SELECT reply_chunk_id FROM shoutbox
GROUP BY reply_chunk_id
HAVING count(*) > 1
ORDER BY timestamp DESC ";
$stmt16 = $conn->prepare($sql);
$result=$stmt16->execute();
while($row = $stmt16->fetch(PDO::FETCH_ASSOC)){
$sql="SELECT user,reply_chunk_id, MIN(timestamp) AS grp_timestamp
FROM shoutbox WHERE reply_chunk_id=? AND user=?";
$stmt17 = $conn->prepare($sql);
$result=$stmt17->execute(array($row['reply_chunk_id'],$user));
while($row2 = $stmt17->fetch(PDO::FETCH_ASSOC)){
$sql="SELECT t.topic_title, t.content_type, t.subject_id,
t.creator, t.description, t.topic_host,
c1.message, c1.topic_id, c1.user, c1.timestamp AS max
FROM shoutbox c1
JOIN topic t ON (t.topic_id = c1.topic_id)
WHERE reply_chunk_id = ? AND c1.timestamp > ?
ORDER BY c1.timestamp DESC, c1.id
LIMIT 1";
$stmt18 = $conn->prepare($sql);
$result=$stmt18->execute(array($row2['reply_chunk_id'],$month));
while($row3 = $stmt18->fetch(PDO::FETCH_ASSOC)){
Make the first query:
SELECT reply_chunk_id FROM shoutbox
GROUP BY reply_chunk_id
HAVING count(*) > 1
ORDER BY timestamp DESC
This does the same, but is faster.
Make sure you have an index on reply_chunk_id.
The second query:
SELECT user,reply_chunk_id, MIN(timestamp) AS grp_timestamp
FROM shoutbox WHERE reply_chunk_id=? AND user=?
The GROUP BY is unneeded, because only one row gets returned, because of the MIN() and the equality tests.
The third query:
SELECT t.topic_title, t.content_type, t.subject_id,
t.creator, t.description, t.topic_host,
c1.message, c1.topic_id, c1.user, c1.timestamp AS max
FROM shoutbox c1
JOIN topic t ON (t.topic_id = c1.topic_id)
WHERE reply_chunk_id = ? AND c1.timestamp > ?
ORDER BY c1.timestamp DESC, c1.id
LIMIT 1
Doing it all in one query:
SELECT
t.user,t.reply_chunk_id, MIN(t.timestamp) AS grp_timestamp,
t.topic_title, t.content_type, t.subject_id,
t.creator, t.description, t.topic_host,
c1.message, c1.topic_id, c1.user, c1.timestamp AS max
FROM shoutbox c1
INNER JOIN topic t ON (t.topic_id = c1.topic_id)
LEFT JOIN shoutbox c2 ON (c1.id = c2.id and c1.timestamp < c2.timestamp)
WHERE c2.timestamp IS NULL AND t.user = ?
GROUP BY t.reply_chunk_id
HAVING count(*) > 1
ORDER BY t.reply_chunk_id
or the equivalent
SELECT
t.user,t.reply_chunk_id, MIN(t.timestamp) AS grp_timestamp,
t.topic_title, t.content_type, t.subject_id,
t.creator, t.description, t.topic_host,
c1.message, c1.topic_id, c1.user, c1.timestamp AS max
FROM shoutbox c1
INNER JOIN topic t ON (t.topic_id = c1.topic_id)
WHERE c1.timestamp = (SELECT max(timestamp) FROM shoutbox c2
WHERE c2.reply_chunk_id = c1.reply_chunk_id)
AND t.user = ?
GROUP BY t.reply_chunk_id
HAVING count(*) > 1
ORDER BY t.reply_chunk_id
How does this work?
The group by selects one entry per topic.reply_chunk_id
The left join (c1.id = c2.id and c1.`timestamp` < c2.`timestamp`) + WHERE c2.`timestamp` IS NULL is meant to select only those items from shoutbox which have the highest timestamp. The idea is that a row in c1 finds no c2 row with a later timestamp only when c1 itself already holds the maximum, so the IS NULL test keeps exactly that row. For this to work per group, the join has to match rows on the grouping column (here reply_chunk_id) rather than on the row's own id.
If you don't understand point 2, see: http://dev.mysql.com/doc/refman/5.0/en/example-maximum-column-group-row.html
Note that the PDO is autoescaping the fields with backticks
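The "latest row per group" self-join is easier to trust once you have seen it run. The sketch below uses Python's sqlite3 module with invented shoutbox rows. It joins on reply_chunk_id rather than on id, since matching a row to itself by id can never find a later timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shoutbox (
        id INTEGER PRIMARY KEY,
        timestamp INTEGER,
        user TEXT,
        message TEXT,
        reply_chunk_id TEXT
    );
    INSERT INTO shoutbox (timestamp, user, message, reply_chunk_id) VALUES
        (100, 'ann', 'first in A',  'A'),
        (150, 'bob', 'reply in A',  'A'),
        (200, 'ann', 'latest in A', 'A'),
        (120, 'bob', 'first in B',  'B'),
        (180, 'ann', 'latest in B', 'B');
""")

# For each c1 row, look for a later row (c2) in the same chunk.
# Only the newest row of a chunk finds none, so c2.id is NULL exactly there.
latest = conn.execute("""
    SELECT c1.reply_chunk_id, c1.message, c1.timestamp
    FROM shoutbox c1
    LEFT JOIN shoutbox c2
           ON c2.reply_chunk_id = c1.reply_chunk_id
          AND c2.timestamp > c1.timestamp
    WHERE c2.id IS NULL
    ORDER BY c1.timestamp DESC
""").fetchall()

print(latest)
```

If two rows in a chunk share the maximum timestamp, both are returned; a tie-break on id would be needed in that case.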
Sounds like most of it should be directly from your ShoutBox table. Prequery to find all "Chunks" the user replied to... of those chunks (and topic_ID since each chunk is always the same topic), get their respective minimum and maximum. Using the "Having count(*) > 1" will force only those that HAVE a second posting by a given user (what you were looking for).
THEN, re-query to the chunks to get the minimum regardless of user. This prevents the need of querying ALL chunks. Then join only what a single user is associated with back to the Topic.
Additionally, and I could be incorrect and need to adjust (minimally), but it appears that the ShoutBox table ID column would be an auto-increment column, and rows are also time-stamped at the time of creation. That said, for a given "Chunk", the earliest ID would belong to the same row as the earliest timestamp, as both are assigned when the row is created. This also makes the subsequent JOINs and subquery easier.
Using STRAIGHT_JOIN should force the "PreQuery" to run FIRST and come up with a very limited set, then qualify the WHERE clause and joins afterwards.
select STRAIGHT_JOIN
T.topic_title,
T.content_type,
T.subject_id,
T.creator,
T.description,
T.topic_host,
sb2.Topic_ID,
sb2.message,
sb2.user,
sb2.TimeStamp
from
( select
sb1.Reply_Chunk_ID,
sb1.Topic_ID,
count(*) as TotalEntries,
min( sb1.id ) as FirstIDByChunkByUser,
min( sbJoin.id ) as FirstIDByChunk,
max( sbJoin.id ) as LastIDByChunk,
max( sbJoin.timestamp ) as LastTimeByChunk
from
ShoutBox sb1
join ShoutBox sbJoin
on sb1.Reply_Chunk_ID = sbJoin.Reply_Chunk_ID
where
sb1.user = CurrentUser
group by
sb1.Reply_Chunk_ID,
sb1.Topic_ID
having
min( sb1.id ) = min( sbJoin.ID ) ) PreQuery
join Topic T on
PreQuery.Topic_ID = T.Topic_ID
join ShoutBox sb2 on
PreQuery.LastIDByChunk = sb2.ID
where
sb2.TimeStamp >= YourTimeStampCriteria
order by
sb2.TimeStamp desc
EDIT ---- QUERY EXPLANATION -- with Modified query.
I've changed the query from re-reading (as was almost midnight when answered after holiday weekend :)
First, "STRAIGHT_JOIN" is a MySQL keyword telling the engine to "do the query in the way / sequence I've stated". Sometimes an engine will try to think for you and optimize in ways that may appear more efficient, but if you know from your data which part retrieves the smallest set first, joining the other lookup tables afterwards in the order you wrote may in fact be better. Second, the "PreQuery". If you have a "SQL-Select" statement (within parens) in the "From" clause, "PreQuery" is just the alias of that result set... I could have called it anything, it just makes sense since this is a stand-alone query of its own. (Ooops... fixed to ShoutBox :) As for case-sensitivity, column names are typically NOT case-sensitive... However, table names can be, depending on the operating system... You could have a table name "MyTest" different than "mytest" or "MYTEST". Supplying an "alias" also helps readability (especially with VeryLongTableNamesUsed).
Should be working after the re-reading and applying adjustments... Try the first "PreQuery" on its own to see how many records it returns. On its own merits, it should return, for a single "CurrentUser" parameter value, every "Reply_Chunk_ID" (which will always have the same topic_id) together with the first ID the person entered (min()). By JOINing again to ShoutBox on the chunk id (only for those chunks the user entered), we get the minimum and maximum ID per chunk REGARDLESS of who started or responded. By applying the HAVING clause, this should only return those where the same person STARTED the topic (hence both have the same min() value).
Finally, once those have been qualified, join directly to the TOPIC and SHOUTBOX tables again on their own merits of topic_id and LastIDByChunk and order the final results by the latest comment response timestamp descending.
I've added a where clause to further limit your "timestamp" criteria where the most recent final timestamp is on/after the given time period you want.
I would be curious how this query's time performance works compared to your already accepted answer too.
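A simplified variant of the pre-query idea can be run against a toy table. This sketch uses Python's sqlite3 module and is not the exact query above: it aggregates every chunk once, then joins back to the first and last rows by id, leaning on the same assumption that ids grow in the same order as timestamps. The rows, the aliases (chunk, starter, last_row) and the cutoff value are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shoutbox (
        id INTEGER PRIMARY KEY,
        timestamp INTEGER,
        user TEXT,
        message TEXT,
        reply_chunk_id TEXT
    );
    INSERT INTO shoutbox (timestamp, user, message, reply_chunk_id) VALUES
        (100, 'ann', 'A first',  'A'),
        (110, 'bob', 'B first',  'B'),
        (120, 'ann', 'C only',   'C'),
        (125, 'ann', 'D first',  'D'),
        (130, 'bob', 'D reply',  'D'),
        (150, 'bob', 'A reply',  'A'),
        (180, 'ann', 'B reply',  'B'),
        (200, 'ann', 'A latest', 'A'),
        (250, 'ann', 'E first',  'E'),
        (300, 'bob', 'E latest', 'E');
""")

current_user = "ann"   # hypothetical viewer
cutoff = 150           # hypothetical "start of this month" timestamp

# chunk: first and last id per chunk, only for chunks with replies.
# starter: the row that opened the chunk; last_row: its newest comment.
rows = conn.execute("""
    SELECT last_row.reply_chunk_id, last_row.message,
           last_row.user, last_row.timestamp
    FROM (
        SELECT reply_chunk_id, MIN(id) AS first_id, MAX(id) AS last_id
        FROM shoutbox
        GROUP BY reply_chunk_id
        HAVING COUNT(*) > 1
    ) chunk
    JOIN shoutbox starter  ON starter.id  = chunk.first_id
    JOIN shoutbox last_row ON last_row.id = chunk.last_id
    WHERE starter.user = ?
      AND last_row.timestamp >= ?
    ORDER BY last_row.timestamp DESC
""", (current_user, cutoff)).fetchall()

print(rows)
```

Chunk B is dropped because bob started it, C because it has no replies, and D because its latest comment is older than the cutoff. On a large table, restricting the aggregation to the user's chunks first, as the pre-query above does, may matter for speed.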

MySQL inclusion/exclusion of posts

This post is taking a substantial amount of time to type because I'm trying to be as clear as possible, so please bear with me if it is still unclear.
Basically, what I have are a table of posts in the database which users can add privacy settings to.
ID | owner_id | post | other_info | privacy_level (int value)
From there, users can add their privacy details, allowing it to be viewable by all (privacy_level = 0), friends (privacy_level = 1), no one (privacy_level = 3), or specific people or filters (privacy_level = 4). For privacy levels specifying specific people (4), the query will reference the table "post_privacy_includes_for" in a subquery to see if the user (or a filter the user belongs to) exists in a row in the table.
ID | post_id | user_id | list_id
Also, the user has the ability to prevent some people within a larger group from viewing their post by excluding them (e.g., having it set for everyone to view but hiding it from a stalker user). For this, another reference table is added, "post_privacy_exclude_from", whose setup is identical to that of "post_privacy_includes_for".
My problem is that this does not scale. At all. At the moment, there are about 1-2 million posts, the majority of them set to be viewable by everyone. For each post on the page it must check whether there is a row that excludes the post from being shown to the user. This runs really slowly on a page that can be filled with 100-200 posts. It can take up to 2-4 seconds, especially when additional constraints are added to the query.
This also creates extremely large and complex queries that are just... awkward.
SELECT t.*
FROM posts t
WHERE ( (t.privacy_level = 3
AND t.owner_id = ?)
OR (t.privacy_level = 4
AND EXISTS
( SELECT i.id
FROM PostPrivacyIncludeFor i
WHERE i.user_id = ?
AND i.thought_id = t.id)
OR t.privacy_level = 4
AND t.owner_id = ?)
OR (t.privacy_level = 4
AND EXISTS
(SELECT i2.id
FROM PostPrivacyIncludeFor i2
WHERE i2.thought_id = t.id
AND EXISTS
(SELECT r.id
FROM FriendFilterIds r
WHERE r.list_id = i2.list_id
AND r.friend_id = ?))
OR t.privacy_level = 4
AND t.owner_id = ?)
OR (t.privacy_level = 1
AND EXISTS
(SELECT G.id
FROM Following G
WHERE follower_id = t.owner_id
AND following_id = ?
AND friend = 1)
OR t.privacy_level = 1
AND t.owner_id = ?)
OR (NOT EXISTS
(SELECT e.id
FROM PostPrivacyExcludeFrom e
WHERE e.thought_id = t.id
AND e.user_id = ?
AND NOT EXISTS
(SELECT e2.id
FROM PostPrivacyExcludeFrom e2
WHERE e2.thought_id = t.id
AND EXISTS
(SELECT l.id
FROM FriendFilterIds l
WHERE l.list_id = e2.list_id
AND l.friend_id = ?)))
AND t.privacy_level IN (0, 1, 4))
AND t.owner_id = ?
ORDER BY t.created_at LIMIT 100
(mock up query, similar to the query I use now in Doctrine ORM. It's a mess, but you get what I am saying.)
I guess my question is, how would you approach this situation to optimize it? Is there a better way to set up my database? I'm willing to completely scrap the method I have currently built up, but I wouldn't know what to move onto.
Thanks guys.
Updated: Fix the query to reflect the values I defined for privacy level above (I forgot to update it because I simplified the values)
Your query is too long to give a definitive solution for, but the approach I would follow is to simplify the data lookups by converting the sub-queries into joins, and then build the logic into the where clause and column list of the select statement:
select t.*, i.*, r.*, G.*, e.* from posts t
left join PostPrivacyIncludeFor i on i.user_id = ? and i.thought_id = t.id
left join FriendFilterIds r on r.list_id = i.list_id and r.friend_id = ?
left join Following G on follower_id = t.owner_id and G.following_id = ? and G.friend=1
left join PostPrivacyExcludeFrom e on e.thought_id = t.id and e.user_id = ?
(This might need expanding: I couldn't follow the logic of the final clause.)
If you can get the simple select working fast AND including all the information needed, then all you need to do is build up the logic in the select list and where clause.
Had a quick stab at simplifying this without re-working your original design too much.
Using this solution your web page can now simply call the following stored procedure to get a list of filtered posts for a given user within a specified period.
call list_user_filtered_posts( <user_id>, <day_interval> );
The whole script can be found here : http://pastie.org/1212812
I haven't fully tested all of this and you may find this solution isn't performant enough for your needs but it may help you in fine tuning/modifying your existing design.
Tables
Dropped your post_privacy_exclude_from table and added a user_stalkers table which works pretty much like the inverse of user_friends. Kept the original post_privacy_includes_for table as per your design as this allows a user to restrict a specific post to a subset of people.
drop table if exists users;
create table users
(
user_id int unsigned not null auto_increment primary key,
username varbinary(32) unique not null
)
engine=innodb;
drop table if exists user_friends;
create table user_friends
(
user_id int unsigned not null,
friend_user_id int unsigned not null,
primary key (user_id, friend_user_id)
)
engine=innodb;
drop table if exists user_stalkers;
create table user_stalkers
(
user_id int unsigned not null,
stalker_user_id int unsigned not null,
primary key (user_id, stalker_user_id)
)
engine=innodb;
drop table if exists posts;
create table posts
(
post_id int unsigned not null auto_increment primary key,
user_id int unsigned not null,
privacy_level tinyint unsigned not null default 0,
post_date datetime not null,
key user_idx(user_id),
key post_date_user_idx(post_date, user_id)
)
engine=innodb;
drop table if exists post_privacy_includes_for;
create table post_privacy_includes_for
(
post_id int unsigned not null,
user_id int unsigned not null,
primary key (post_id, user_id)
)
engine=innodb;
Stored Procedures
The stored procedure is relatively simple - it initially selects ALL posts within the specified period and then filters out posts as per your original requirements. I have not performance tested this sproc with large volumes but as the initial selection is relatively small it should be performant enough as well as simplifying your application/middle tier code.
drop procedure if exists list_user_filtered_posts;
delimiter #
create procedure list_user_filtered_posts
(
in p_user_id int unsigned,
in p_day_interval tinyint unsigned
)
proc_main:begin
drop temporary table if exists tmp_posts;
drop temporary table if exists tmp_priv_posts;
-- select ALL posts in the required date range (or whatever selection criteria you require)
create temporary table tmp_posts engine=memory
select
p.post_id, p.user_id, p.privacy_level, 0 as deleted
from
posts p
where
p.post_date between now() - interval p_day_interval day and now()
order by
p.user_id;
-- purge stalker posts (0,1,3,4)
update tmp_posts
inner join user_stalkers us on us.user_id = tmp_posts.user_id and us.stalker_user_id = p_user_id
set
tmp_posts.deleted = 1
where
tmp_posts.user_id != p_user_id;
-- purge other users private posts (3)
update tmp_posts set deleted = 1 where user_id != p_user_id and privacy_level = 3;
-- purge friend only posts (1) i.e where p_user_id is not a friend of the poster
/*
requires another temp table due to mysql temp table problem/bug
http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html
*/
-- the private posts (1) this user can see
create temporary table tmp_priv_posts engine=memory
select
tp.post_id
from
tmp_posts tp
inner join user_friends uf on uf.user_id = tp.user_id and uf.friend_user_id = p_user_id
where
tp.user_id != p_user_id and tp.privacy_level = 1;
-- remove private posts this user cant see
update tmp_posts
left outer join tmp_priv_posts tpp on tmp_posts.post_id = tpp.post_id
set
tmp_posts.deleted = 1
where
tpp.post_id is null and tmp_posts.privacy_level = 1 and tmp_posts.user_id != p_user_id; -- keep the viewer's own posts
-- purge filtered (4)
truncate table tmp_priv_posts; -- reuse tmp table
insert into tmp_priv_posts
select
tp.post_id
from
tmp_posts tp
inner join post_privacy_includes_for ppif on tp.post_id = ppif.post_id and ppif.user_id = p_user_id
where
tp.user_id != p_user_id and tp.privacy_level = 4;
-- remove private posts this user cant see
update tmp_posts
left outer join tmp_priv_posts tpp on tmp_posts.post_id = tpp.post_id
set
tmp_posts.deleted = 1
where
tpp.post_id is null and tmp_posts.privacy_level = 4 and tmp_posts.user_id != p_user_id; -- keep the viewer's own posts
drop temporary table if exists tmp_priv_posts;
-- output filtered posts (display ALL of these on web page)
select
p.*
from
posts p
inner join tmp_posts tp on p.post_id = tp.post_id
where
tp.deleted = 0
order by
p.post_id desc;
-- clean up
drop temporary table if exists tmp_posts;
end proc_main #
delimiter ;
Test Data
Some basic test data.
insert into users (username) values ('f00'),('bar'),('alpha'),('beta'),('gamma'),('omega');
insert into user_friends values
(1,2),(1,3),(1,5),
(2,1),(2,3),(2,4),
(3,1),(3,2),
(4,5),
(5,1),(5,4);
insert into user_stalkers values (4,1);
insert into posts (user_id, privacy_level, post_date) values
-- public (0)
(1,0,now() - interval 8 day),
(1,0,now() - interval 8 day),
(2,0,now() - interval 7 day),
(2,0,now() - interval 7 day),
(3,0,now() - interval 6 day),
(4,0,now() - interval 6 day),
(5,0,now() - interval 5 day),
-- friends only (1)
(1,1,now() - interval 5 day),
(2,1,now() - interval 4 day),
(4,1,now() - interval 4 day),
(5,1,now() - interval 3 day),
-- private (3)
(1,3,now() - interval 3 day),
(2,3,now() - interval 2 day),
(4,3,now() - interval 2 day),
-- filtered (4)
(1,4,now() - interval 1 day),
(4,4,now() - interval 1 day),
(5,4,now());
insert into post_privacy_includes_for values (15,4), (16,1), (17,6);
Testing
As I mentioned before I've not fully tested this but on the surface it seems to be working.
select * from posts;
call list_user_filtered_posts(1,14);
call list_user_filtered_posts(6,14);
call list_user_filtered_posts(1,7);
call list_user_filtered_posts(6,7);
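To sanity-check what the procedure is supposed to return, the visibility rules can be written down as a small reference function. This is a sketch in plain Python, not part of the stored procedure: it ignores the date range, assumes a user always sees their own posts, and the function and variable names are made up. The data mirrors the test rows above.

```python
def visible_posts(viewer, posts, friends, stalkers, includes):
    """Return the ids of the posts that `viewer` may see.

    posts:    list of (post_id, owner_id, privacy_level)
    friends:  set of (user_id, friend_user_id)
    stalkers: set of (user_id, stalker_user_id)
    includes: set of (post_id, user_id) for privacy level 4
    """
    visible = []
    for post_id, owner, level in posts:
        if owner == viewer:
            visible.append(post_id)        # own posts are always shown
        elif (owner, viewer) in stalkers:
            continue                       # owner has blocked the viewer
        elif level == 0:
            visible.append(post_id)        # public
        elif level == 1 and (owner, viewer) in friends:
            visible.append(post_id)        # friends only
        elif level == 4 and (post_id, viewer) in includes:
            visible.append(post_id)        # explicitly shared with viewer
        # level 3 (private) posts of other users are never shown
    return visible


owners_and_levels = [
    (1, 0), (1, 0), (2, 0), (2, 0), (3, 0), (4, 0), (5, 0),  # public
    (1, 1), (2, 1), (4, 1), (5, 1),                          # friends only
    (1, 3), (2, 3), (4, 3),                                  # private
    (1, 4), (4, 4), (5, 4),                                  # filtered
]
posts = [(i + 1, owner, level) for i, (owner, level) in enumerate(owners_and_levels)]
friends = {(1, 2), (1, 3), (1, 5), (2, 1), (2, 3), (2, 4),
           (3, 1), (3, 2), (4, 5), (5, 1), (5, 4)}
stalkers = {(4, 1)}
includes = {(15, 4), (16, 1), (17, 6)}

seen_by_1 = visible_posts(1, posts, friends, stalkers, includes)
seen_by_6 = visible_posts(6, posts, friends, stalkers, includes)
print(seen_by_1)
print(seen_by_6)
```

Comparing these lists with the output of call list_user_filtered_posts(1,14) and call list_user_filtered_posts(6,14) is one way to spot rules the procedure handles differently.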
Hope you find some of this of use.
