I just have a general database theory question. I have a need to make something similar to showing what posts/items a user has viewed or not (such as in a forum) or an unread email message. What I have is there are posts that multiple users can view, but it needs to separate by user who has actually viewed it. So if User A viewed Post 1, it would no longer show that Post 1 is a new item to view, but to User B, it would still show that Post 1 is a new item to view.
I've search for other ideas and one of them is to get a timestamp of when the user last logged in, but I actually need to keep track of the posts they've seen as opposed to posts that have happened since they last logged in.
I would like a MySQL database solution if possible, but I'm open to cookies if that is a must. I could do this on my own and just figure it out, but I'd appreciate any advice on how to properly structure a table(s) to make this the most efficient. Also, bandwidth and storage is not issue.
While reviewing the relevant schema for phpBB, I found the following:
# Table: 'phpbb_topics_track'
CREATE TABLE phpbb_topics_track (
user_id mediumint(8) UNSIGNED DEFAULT '0' NOT NULL,
topic_id mediumint(8) UNSIGNED DEFAULT '0' NOT NULL,
forum_id mediumint(8) UNSIGNED DEFAULT '0' NOT NULL,
mark_time int(11) UNSIGNED DEFAULT '0' NOT NULL,
PRIMARY KEY (user_id, topic_id),
KEY topic_id (topic_id),
KEY forum_id (forum_id)
) CHARACTER SET `utf8` COLLATE `utf8_bin`;
And:
# Table: 'phpbb_forums_track'
CREATE TABLE phpbb_forums_track (
user_id mediumint(8) UNSIGNED DEFAULT '0' NOT NULL,
forum_id mediumint(8) UNSIGNED DEFAULT '0' NOT NULL,
mark_time int(11) UNSIGNED DEFAULT '0' NOT NULL,
PRIMARY KEY (user_id, forum_id)
) CHARACTER SET `utf8` COLLATE `utf8_bin`;
Then I look here in their wiki:
This table keeps record for visited topics in order to mark them as
read or unread. We use the mark_time timestamp in conjunction with
last post of topic x's timestamp to know if topic x is read or not.
In order to accurately tell whether a topic is read, one has to also
check phpbb_forums_track.
So essentially they have a lookup table to store the data associated with a user's viewing of a topic (thread), and then check it against the timestamp in the forum view table, to determine whether the topic has been viewed by the user.
Just create a simple cross-reference table (read_posts or something):
user_id|post_id
----------------
2 | 132
53 | 43
....
Make sure that both of these columns are indexed (especially important that the user_id be indexed) and then use a join (or a sub-query) to select unread posts for the logged in user. If you're just trying to show a list of unread posts, for example, you just run:
SELECT * FROM `posts` WHERE `post_id` NOT IN (
SELECT `post_id` FROM `read_posts` WHERE `user_id`='[$USER ID]')
ORDER BY [your ordering clause]
Based on this description I would use a simple table with maybe 3 columns.
User ID
Post ID
Timestamp First Viewed
When a user views a post, add a row to the table. If a row does not exist in the table for a given user/post id combo, then they have not viewed the post.
Related
I have two tables, the first is the Videos table which has the details of every videos.
CREATE TABLE IF NOT EXISTS `Videos` (
`VideoID` varchar(23) NOT NULL AUTO_INCREMENT,
`VideoTitle` varchar(50) NOT NULL,
);
The second records the download of each video
CREATE TABLE IF NOT EXISTS `Downloads` (
`downloadID` int(11) NOT NULL AUTO_INCREMENT,
`userID` int(11) NOT NULL,
`VideoID` varchar(23) NOT NULL,
);
Every record in the Downloads table equates to a single download.
How do I compile a list of the top ten most downloaded items
How are you using a VARCHAR field for AUTO_INCREMENT? Have i missed some MySQL Updates?
Nope i haven't, MySQL still doesn't allow that.
Use a large enough integer data type for the AUTO_INCREMENT column to hold the maximum sequence value you will need.
Reference
Once the schema is fixed, you can do
SELECT VideoTitle,count(downloadID) as d FROM Downloads
INNER JOIN Videos ON Videos.VideoId=Downloads.VideoID
GROUP BY Downloads.VideoID
ORDER BY d DESC
LIMIT 10
P.S: If you had provided some sample data I would be happy to put up a fiddle, feeling lazy without it :)
I need to determine what table data is from for a news feed. The feed must say something like "Person has uploaded a video" or "Person has updated their bio". Therefore I need to determine where data came from as different types of data are in different tables, obviously. I am hoping you can do this with SQL but probably not so PHP is the option. I have no idea how to do this so just need pointing in the right direction.
I'll briefly describe the database as I don't have time to make a diagram.
1.There is a table titled members with all basic info such as email, password and ID. The ID is the primary key.
All other tables have foreign keys for the ID linking to the ID in the members table.
Other tables include; tracks, status, pics, videos. All pretty self explanatory from there.
I need to determine somehow what table the updated data comes from so I can then tell the user what so and so has done. Preferably I would want only one SQL statement for the whole feed so all the tables are joined and ordered by timestamp making everything much simpler for me. Hopefully I can do both but as I said really not sure.
A basic outline of the statement, will be longer but have simplified;
SELECT N.article, N.ID, A.ID, A.name,a.url, N.timestamp
FROM news N
LEFT JOIN artists A ON N.ID = A.ID
WHERE N.ID = A.ID
ORDER BY N.timestamp DESC
LIMIT 10
Members table;
CREATE TABLE `members` (
`ID` int(111) NOT NULL AUTO_INCREMENT,
`email` varchar(100) COLLATE latin1_general_ci NOT NULL,
`password` varchar(100) COLLATE latin1_general_ci NOT NULL,
`FNAME` varchar(100) COLLATE latin1_general_ci NOT NULL,
`SURNAME` varchar(100) COLLATE latin1_general_ci NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00' ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`ID`),
UNIQUE KEY `email` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci
Tracks table, all other tables are pretty much the same;
CREATE TABLE `tracks` (
`ID` int(11) NOT NULL,
`url` varchar(200) COLLATE latin1_general_ci NOT NULL,
`name` varchar(100) COLLATE latin1_general_ci NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00' ON UPDATE CURRENT_TIMESTAMP,
`track_ID` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`track_ID`),
UNIQUE KEY `url` (`url`),
UNIQUE KEY `track_ID` (`track_ID`),
KEY `ID` (`ID`),
CONSTRAINT `tracks_ibfk_1` FOREIGN KEY (`ID`) REFERENCES `members` (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci
Before I have tried using a mysql query for each table and putting everything into an array and echoing it out. This seemed long and tiresome and I had no luck with it. I have now deleted all that code as it was a week or so ago.
Please do not feel you have to go into depth with this just point me in the right direction.
ADDITION:
Here is the sql query i have made for a trigger that was suggested. Not sure what is wrong as have never used trigger before. When inserting something into tracks this error comes up
#1054 - Unknown column 'test' in 'field list'
The values in the query are just for testing at the moment
delimiter $$
CREATE
TRIGGER tracks_event AFTER INSERT
ON tracks FOR EACH ROW
BEGIN
INSERT into events(ID, action)
VALUES (3, test);
END$$
delimiter ;
UPDATE!
I have now created a table called events as suggested and used triggers to update it AFTER an insert in one of several tables.
Here is the query I have tried but it is wrong. The query needs to get info referenced in the events table from all the other tables and order by timestamp.
SELECT T.url, E.ID, T.ID, E.action, T.name, T.timestamp
FROM tracks T
LEFT JOIN events E ON T.ID = E.ID
WHERE T.ID = E.ID
ORDER BY T.timestamp DESC
In that query I have only include the events and tracks table for simplicity as the problem is still there. There will be many more tables so the problem will worsen.
It's hard to describe the problem but basically because there is an ID in every table and one ID can do several actions, the action can be shown with the wrong outcome, in this case url.
I will explain what's in the events table and the tracks table and give the outcome to further explain.
In the events table;
4 has uploaded a track.
3 has some news.
4 has become an NBS artist.
In the tracks;
2 uploads/abc.wav Cannonballs & Stones 2012-08-20 23:59:59 1
3 uploads/19c9aa51c821952c81be46ca9b2e9056.mp3 test 2012-08-31 23:59:59 2
4 uploads/2b412dd197d464fedcecb1e244e18faf.mp3 testing 2012-08-31 00:32:56 3
4 111 111111 0000-00-00 00:00:00 111111
Outcome of query;
uploads/19c9aa51c821952c81be46ca9b2e9056.mp3 3 3 has some news. test 2012-08-31 23:59:59
uploads/2b412dd197d464fedcecb1e244e18faf.mp3 4 4 has uploaded a track. testing 2012-08-31 00:32:56
uploads/2b412dd197d464fedcecb1e244e18faf.mp3 4 4 has become an NBS artist. testing 2012-08-31 00:32:56
111 4 4 has become an NBS artist. 111111 0000-00-00 00:00:00
111 4 4 has uploaded a track. 111111 0000-00-00 00:00:00
As you can see the query gives unwanted results. The action for each ID is given on each url so the url can be shown more than once and with the wrong action. Because there is only the tracks table in that query, the only action i would want showing is 'has uploaded a track.'
It's hard to provide the statement you want without the full details of your schema. For example, the question refers to a news table and an artists table, but doesn't provide the schemas for those, or indicate how the statement that contains those references relate to any of the other tables mentioned in the question.
Still, I think what you want can be done entirely in MySQL, without any fun PHP tricks, especially if there are common fields in each of the various tables.
But first: this might not be the answer you're really wanting, but using triggers on your various tables to update an "events feed" table is likely the best solution. i.e., when an insert or update happens on the "status" table, have a trigger on the status table that inserts into the "events feed" table the ID of the person, and their type of action. You could have a separate insert and update trigger to indicate different events for the same data type.
Then it'd be super-easy to have an events feed, because you're just selecting straight from that events feed table.
Check out the create trigger syntax.
That said, I think you might have a look at the CASE and UNION keywords.
You can then construct a query that grabs data from all tables and outputs strings indicating something. You could then turn that query into a view, and use that as an "events feed" table to select directly from.
Say you have a list of members (which you do), and the various tables that contain actions from those members (i.e., tracks, status, pics, videos), which all have a key pointing back to your members table. You don't need to select from members to generate a list of activity, then; you can just UNION together the tables that have certain events.
SELECT
events.member_id
, events.table_id
, events.table
, events.action
, events.when_it_happened
, CASE
WHEN events.table = "tracks" THEN "Did something with tracks"
WHEN events.table = "status" THEN "Did something with status"
END
AS feed_description
FROM (
SELECT
tracks.ID AS member_id
, tracks.track_ID AS table_id
, "tracks" AS table
, CONCAT(tracks.url, ' ', tracks.name) AS action
, tracks.timestamp AS when_it_happened
ORDER BY tracks.timestamp DESC
LIMIT 10
UNION
SELECT
status.ID as member_id
, status.status_id AS table_id
, "status" AS table
, status.value AS action
, status.timestamp AS when_it_happened
ORDER BY status.timestamp DESC
LIMIT 10
UNION
...
) events
ORDER BY events.when_it_happened DESC
I still think you'd be better off creating a feed table built by triggers, because it'll perform a lot better if you're querying for the feed more often than generating events.
Alright, another interesting problem over at Route 50.
We wanted to implement a true forum lightbulb system where posts that are unread by a user (after the user's account is created) show as unread until that status is cleared or until the user reads them.
We figured the best and easiest way to do this would be to implement a table of unread messages.
The Columns are: user_id, board_id, thread_id, post_id, timestamp, and hidden
This is working very well and very quickly for seeing which boards/threads/posts are unread (and linking to them) per user, however it is INCREDIBLY slow for a user to post to the forum even though only a single SQL query is being run:
INSERT IGNORE INTO `forums_lightbulb` SELECT `id`,'x','x','x',UNIX_TIMESTAMP(),0 FROM `users`
I'm sure this is the result of having 3065 user accounts. How can I speed up this process? I'd prefer to keep the system as Real-Time as possible.
Important Note: Please limit your answers to a shared hosting environment with no additional budget. We are limited to PHP and MySQL 5.1.53-log
What PHPBB does is a very quick way to do it. It keeps a table that marks for each thread and each forum when the last time was a user opened it. And uses that to determine if there are unread messages. It allows a Users*Topics + Users*Forums storage usage scheme while allowing a check with pretty simple and fast queries.
You can see how it works from the database structure.
# Table: 'phpbb_forums_track'
CREATE TABLE phpbb_forums_track (
user_id mediumint(8) UNSIGNED DEFAULT '0' NOT NULL,
forum_id mediumint(8) UNSIGNED DEFAULT '0' NOT NULL,
mark_time int(11) UNSIGNED DEFAULT '0' NOT NULL,
PRIMARY KEY (user_id, forum_id)
) CHARACTER SET `utf8` COLLATE `utf8_bin`;
# Table: 'phpbb_topics_track'
CREATE TABLE phpbb_topics_track (
user_id mediumint(8) UNSIGNED DEFAULT '0' NOT NULL,
topic_id mediumint(8) UNSIGNED DEFAULT '0' NOT NULL,
forum_id mediumint(8) UNSIGNED DEFAULT '0' NOT NULL,
mark_time int(11) UNSIGNED DEFAULT '0' NOT NULL,
PRIMARY KEY (user_id, topic_id),
KEY topic_id (topic_id),
KEY forum_id (forum_id)
) CHARACTER SET `utf8` COLLATE `utf8_bin`;
Sorry to say, but the solution proposed in the question is an unscalable design.
I was looking into this problem over here, which had a decent discussion on it (before I showed up). Take a look there.
In your case, storing U*M records to track "unread" posts, where U is the number of users and M is the number of messages, would get out of control very, very quickly. This is because its best-case efficiency requires all users to read every message (and most users don't care about everything, as most everything on a forum is noise). Average case, maybe 20% of users have read 100% of posts, but 80% have read near 0% of posts and never will read the rest. This means that you're being forced to store 0.8*U*M, with U and M only ever increasing, geometrically. No amount of indexing will fix this.
The previous #will-hartung answer has the more efficient approach.
I see that this is pretty old, and I hope you found a better solution in the meantime.
Here is the most efficient way:
have a table called read_threads which stores the thread_id, and user_id
have a column in the users table called mark_read_date which stores the date the user clicked on the mark all threads read link in your forum
in order to determine if a thread is read, your query will check if it is in the read_threads table, or if its last_post_date (the date the last post was made to it) is older than the users mark_read_date
It is important that you also remove all rows from the read_threads table when a user clicks the mark all threads read link in your forum.
On read:
insert into read_articles(user_id, article_id);
On display:
SELECT a.*, r.user_id FROM articles a
LEFT OUTER JOIN read_articles r ON (a.article_id = r.article_id and r.user_id = $user_id)
WHERE (article_filter, like forum or thread id, or whatever)
On your result set, if user_id is not null, then they've read the article. Otherwise, they haven't.
Index as appropriate. Server warm with biscuits and jam.
I realize this is somewhat of an abstract question that has several answers, but I am at a loss as to where to begin. I want to have a separate comments area for each of my blog posts. Should I just set up a different table for the comments for each entry manually every time I update the code to include the latest entry?
Create a new table for the comments with a structure similar to (of course you can customize it to your needs):
Comments
id INT NOT NULL auto_increment,
blog_id INT NOT NULL,
author_id INT NOT NULL DEFAULT 0,
comment text NOT NULL,
added_date DATETIME NOT NULL
The author_id is linked to the users table for logged in users, 0 for an anonymous user. Everything else should be self explanatory I hope.
I'm not sure what you're quite getting at... but it sounds like you want to have comments specific to each post. If that's the case, just simply make a field in the comments table for "post_id" or something similar. Then on each post page just use a SELECT statement to grab comments for that particular post_id.
Simply have a database table storing the post ID in one of the fields. Something similar to the following:
CREATE TABLE blog_comments (
id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
author_id INT(10) UNSIGNED NOT NULL DEFAULT '0',
post_id INT(10) UNSIGNED NOT NULL,
comment TEXT NOT NULL,
added_on TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
);
Then you can simply query for post comments like thus:
$pid = 13; // or whatever your post ID is; could be a $_GET value
$sql = "SELECT * FROM comments WHERE post_id = '$pid' ORDER BY added_on DESC";
Of course, be sure to sanitize the above query, as you don't want any one passing what they feel like for the value of $pid. You need to make sure it's a number, and a number only.
I have a social network similar to myspace but I use PHP and mysql, I have been looking for the best way to show users bulletins posted only
fronm themself and from users they are confirmed friends with.
This involves 3 tables
friend_friend = this table stores records for who is who's friend
friend_bulletins = this stores the bulletins
friend_reg_user = this is the main user table with all user data like name and photo url
I will post bulletin and friend table scheme below, I will only post the fields important for the user table.
-- Table structure for table friend_bulletin
CREATE TABLE IF NOT EXISTS `friend_bulletin` (
`auto_id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(10) NOT NULL DEFAULT '0',
`bulletin` text NOT NULL,
`subject` varchar(255) NOT NULL DEFAULT '',
`color` varchar(6) NOT NULL DEFAULT '000000',
`submit_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`status` enum('Active','In Active') NOT NULL DEFAULT 'Active',
`spam` enum('0','1') NOT NULL DEFAULT '1',
PRIMARY KEY (`auto_id`),
KEY `user_id` (`user_id`),
KEY `submit_date` (`submit_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=245144 ;
-- Table structure for table friend_friend
CREATE TABLE IF NOT EXISTS `friend_friend` (
`autoid` int(11) NOT NULL AUTO_INCREMENT,
`userid` int(10) DEFAULT NULL,
`friendid` int(10) DEFAULT NULL,
`status` enum('1','0','3') NOT NULL DEFAULT '0',
`submit_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`alert_message` enum('yes','no') NOT NULL DEFAULT 'yes',
PRIMARY KEY (`autoid`),
KEY `userid` (`userid`),
KEY `friendid` (`friendid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=2657259 ;
friend_reg_user table fields that will be used
auto_id = this is the users ID number
disp_name = this is the users name
pic_url = this is a thumbnail image path
Bulletins should show all bulletins posted by a user ID that is in our friend list
should also show all bulletins that we posted ourself
needs to scale well, friends table is several million rows
// 1 Old method uses a subselect
SELECT auto_id, user_id, bulletin, subject, color, fb.submit_date, spam
FROM friend_bulletin AS fb
WHERE (user_id IN (SELECT userid FROM friend_friend WHERE friendid = $MY_ID AND status =1) OR user_id = $MY_ID)
ORDER BY auto_id
// Another old method that I used on accounts that had a small amount of friends because this one uses another query
that would return a string of all there friends in this format $str_friend_ids = "1,2,3,4,5,6,7,8"
select auto_id,subject,submit_date,user_id,color,spam
from friend_bulletin
where user_id=$MY_ID or user_id in ($str_friend_ids)
order by auto_id DESC
I know these are not good for performance as my site is getting really large so I have been experimenting with JOINS
I beleive this gets everything I need except it needs to be modified to also get bulletins posted by myself, when I add that into the WHERE part it seems to break it and return multiple results for each bulletin posted, I think because it is trying to
return results that I am a friedn of and then I try to consider myself a friend and that doesnt work well.
My main point of this whole post though is I am open to opinions on the best performance way to do this task, many big social networks have similar function that return a list of items posted only by your friends. There has to be other faster ways???? I keep reading that JOINS are not great for performance but how else can I do this? Keep in mind I do use indexes and have a dedicated database server but my userbase is large there is no way around that
SELECT fb.auto_id, fb.user_id, fb.bulletin, fb.subject, fb.color, fb.submit_date, fru.disp_name, fru.pic_url
FROM friend_bulletin AS fb
LEFT JOIN friend_friend AS ff ON fb.user_id = ff.userid
LEFT JOIN friend_reg_user AS fru ON fb.user_id = fru.auto_id
WHERE (
ff.friendid =1
AND ff.status =1
)
LIMIT 0 , 30
First of all, you can try to partition out the database so that you're only accessing a table with the primary rows you need. Move rows that are less often used to another table.
JOINs can impact performance but from what I've seen, subqueries are not any better. Try refactoring your query so that you're not pulling all that data at once. It also seems like some of that query can be run once elsewhere in your app, and those results either stored in variables or cached.
For example, you can cache an array of friends who are connected for each user and just reference that when running the query, and only update the cache when a new friend is added/removed.
It also depends on the structure of your systems and your code architecture - your bottle nneck may not entirely be in the db.