Using PHP and MySQL, I have a forum system I'm trying to build. What I want to know is, how can I set it so that when a user reads a forum entry, it shows as read JUST for that user, no matter what forum they are in, until someone else posts on it.
Currently, for the posts in each thread, I have a table with a PostID, the UserID of whoever posted it, the ThreadID it links to, the actual Post (as Text), and the date/time it was posted.
For the thread list in each forum, there is the threadID (Primary Key), the ThreadName, ForumID it belongs to, NumPosts, NumViews, LastPostDateTime, and CreateDateTime. Any help?
The traditional solution is a join table something along the lines of:
CREATE TABLE topicviews (
    userid INTEGER NOT NULL,
    topicid INTEGER NOT NULL,
    lastread TIMESTAMP NOT NULL,
    PRIMARY KEY (userid, topicid),
    FOREIGN KEY (userid) REFERENCES users(id),
    FOREIGN KEY (topicid) REFERENCES topics(id)
);
with lastread updated every time a topic is read. When displaying the list of topics, if the topics.lastupdated is > topicviews.lastread, there are new posts.
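As a rough illustration of that comparison, here is a minimal sketch in Python using sqlite3 as a stand-in for MySQL. The topics columns (title, lastupdated), the integer Unix timestamps, and the function name list_topics are assumptions made for the example, and it also assumes that a topic the user has never opened counts as having new posts.

```python
import sqlite3

# Hypothetical schema: integer Unix timestamps stand in for TIMESTAMP columns.
SCHEMA = """
CREATE TABLE topics (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    lastupdated INTEGER NOT NULL
);
CREATE TABLE topicviews (
    userid INTEGER NOT NULL,
    topicid INTEGER NOT NULL,
    lastread INTEGER NOT NULL,
    PRIMARY KEY (userid, topicid)
);
"""

def list_topics(conn, userid):
    """Return (id, title, has_new) for every topic, as seen by one user."""
    rows = conn.execute(
        """
        SELECT t.id, t.title,
               (v.lastread IS NULL OR t.lastupdated > v.lastread) AS has_new
        FROM topics AS t
        LEFT JOIN topicviews AS v
               ON v.topicid = t.id AND v.userid = ?
        ORDER BY t.id
        """,
        (userid,),
    )
    return [(tid, title, bool(flag)) for tid, title, flag in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO topics VALUES (?, ?, ?)",
                 [(1, "Welcome", 100), (2, "Rules", 200), (3, "Off-topic", 300)])
# User 7 read topics 1 and 2 at time 150; topic 2 was updated afterwards.
conn.executemany("INSERT INTO topicviews VALUES (?, ?, ?)",
                 [(7, 1, 150), (7, 2, 150)])
result = list_topics(conn, 7)
```

The LEFT JOIN keeps topics that have no topicviews row for this user, so the whole list comes back in one query.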
The traditional solution is rubbish and will kill your database! Don't do it!
The first problem is that a write on every topic view will soon bring the database server to its knees on a busy forum, especially on MyISAM tables, which only have table-level locks. (Don't use MyISAM tables; use InnoDB for everything except fulltext search, and since MySQL 5.6 InnoDB supports fulltext indexes too.)
You can improve this situation a bit by only bothering to write through the lastread time when there are actually new messages being read in the topic. If topics.lastupdated < topicviews.lastread you have nothing to gain by updating the value. Even so, on a heavily-used forum this can be a burden.
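A minimal sketch of that conditional write, in Python with sqlite3 standing in for MySQL. The function name record_view, the integer timestamps, and the cut-down topics table are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topics (id INTEGER PRIMARY KEY, lastupdated INTEGER NOT NULL);
CREATE TABLE topicviews (
    userid INTEGER NOT NULL,
    topicid INTEGER NOT NULL,
    lastread INTEGER NOT NULL,
    PRIMARY KEY (userid, topicid)
);
""")

def record_view(conn, userid, topicid, now):
    """Write lastread only if the topic has posts the user has not seen.

    Returns True when a row was written, False when the write was skipped.
    """
    row = conn.execute(
        """
        SELECT t.lastupdated, v.lastread
        FROM topics AS t
        LEFT JOIN topicviews AS v
               ON v.topicid = t.id AND v.userid = ?
        WHERE t.id = ?
        """,
        (userid, topicid),
    ).fetchone()
    if row is None:
        return False  # unknown topic, nothing to record
    lastupdated, lastread = row
    if lastread is not None and lastread >= lastupdated:
        return False  # nothing new since the last read, so skip the write
    conn.execute(
        "INSERT OR REPLACE INTO topicviews (userid, topicid, lastread) "
        "VALUES (?, ?, ?)",
        (userid, topicid, now),
    )
    return True

conn.execute("INSERT INTO topics VALUES (1, 100)")
first = record_view(conn, 7, 1, now=150)   # first visit: writes
second = record_view(conn, 7, 1, now=160)  # no new posts: skipped
```

INSERT OR REPLACE is SQLite syntax; on MySQL the same upsert would typically be written with INSERT ... ON DUPLICATE KEY UPDATE.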
The second problem is a combinatorial explosion. One row per user per topic soon adds up: just a thousand users and a thousand topics and you have potentially a million topicview rows to store!
You can improve this situation a bit by limiting the number of topics remembered for each user. For example you could remove any topic from the views table when it gets older than a certain age, and just assume all old topics are 'read'. This generally needs a cleanup task to be done in the background.
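One way such a cleanup task might look, sketched in Python with sqlite3 standing in for MySQL. The 30-day MAX_AGE, the integer timestamps, and the function names purge_old_views and has_new are assumptions; "age" here is taken to mean time since the topic's last update.

```python
import sqlite3

DAY = 24 * 3600
MAX_AGE = 30 * DAY  # hypothetical cutoff: topics idle for 30 days count as read

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topics (id INTEGER PRIMARY KEY, lastupdated INTEGER NOT NULL);
CREATE TABLE topicviews (
    userid INTEGER NOT NULL,
    topicid INTEGER NOT NULL,
    lastread INTEGER NOT NULL,
    PRIMARY KEY (userid, topicid)
);
""")

def purge_old_views(conn, now):
    """Background task: drop view rows for topics older than the cutoff."""
    cur = conn.execute(
        "DELETE FROM topicviews WHERE topicid IN "
        "(SELECT id FROM topics WHERE lastupdated < ?)",
        (now - MAX_AGE,),
    )
    return cur.rowcount  # number of rows removed

def has_new(conn, userid, topicid, now):
    """Old topics are assumed read; recent ones use the lastread comparison."""
    row = conn.execute(
        """
        SELECT t.lastupdated, v.lastread
        FROM topics AS t
        LEFT JOIN topicviews AS v
               ON v.topicid = t.id AND v.userid = ?
        WHERE t.id = ?
        """,
        (userid, topicid),
    ).fetchone()
    if row is None:
        return False
    lastupdated, lastread = row
    if lastupdated < now - MAX_AGE:
        return False
    return lastread is None or lastupdated > lastread

now = 100 * DAY
conn.executemany("INSERT INTO topics VALUES (?, ?)", [(1, 60 * DAY), (2, 99 * DAY)])
conn.executemany("INSERT INTO topicviews VALUES (?, ?, ?)",
                 [(7, 1, 61 * DAY), (7, 2, 98 * DAY)])
removed = purge_old_views(conn, now)
```

The display rule has to apply the same cutoff as the purge, otherwise purged topics would reappear as unread.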
Other, less intensive approaches include:
only storing one lastread time per forum
only storing one lastvisit time per user across the whole site, which would show as 'new' only things updated since the user's previous visit (session)
not storing any lastread information at all, but including the last-update time in a topic's URL itself. If the user's browser has seen the topic recently, it will remember the URL and mark it as visited. You can then use CSS to style visited links as 'topics containing no new messages'.
Maybe store UserID, ThreadID, and LastReadDateTime in another table, updated whenever the user reads that thread.
If LastPostDateTime > LastReadDateTime, the thread has an unread post.
Sadly this carries a large overhead: every read causes a write.
The general ideas here are correct, but they've overlooked some obvious solutions to the scalability issue.
#bobince:
The second problem is a combinatorial explosion. One row per user per topic soon adds up: just a thousand users and a thousand topics and you have potentially a million topicview rows to store!
You don't need to store a record in the "topicviews" table if somebody hasn't ever viewed that thread. You'd simply display a topic as having unread posts if null is returned OR if the last_read time is < the last_post time. This will reduce that "million" rows by perhaps an order of magnitude.
#gortok: There are plenty of ways to do it, but each grows exponentially larger as the user visits the site.
In this case, you archive a forum after n-posts or n-weeks and, when you lock, you clean up the "topicviews" table.
My first suggestion is obvious and has no downside. My second reduces usability on archived topics, but that's a small price to pay for a speedy forum. Slow forums are just painful to read and post to.
But honestly? You probably won't need to worry about scalability. Even one million rows really isn't all that many.
There's no easy way to do this. There are plenty of ways to do it, but each grows exponentially larger as the user visits the site. The best you can do and still keep performance is to have a timestamp and mark any forums that have been updated since the last visit as 'unread'.
You could just use the functionality of the user's browser and append the last postid in the link to the thread like this: "thread.php?id=12&lastpost=122"
By making use of a:visited in your CSS, you can display the threads that the user has already read differently from those that they have not.
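A small sketch of building such a link, written in Python for illustration (the function name thread_link is made up here; in PHP the same string could be built when rendering the thread list):

```python
from urllib.parse import urlencode

def thread_link(thread_id, last_post_id):
    """Build a thread URL that changes whenever a new post is added."""
    query = urlencode({"id": thread_id, "lastpost": last_post_id})
    return "thread.php?" + query

link = thread_link(12, 122)
```

Because the last post ID is part of the URL, a new post produces a URL the browser has never visited, so the link falls back to the unvisited style without any server-side bookkeeping.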
Bobince has a lot of good suggestions. Another few potential optimizations:
Write the new "is this new?" information to memcached and to a MySQL "ARCHIVE" table. A batch job can update the "real" table.
For each user, keep an "everything read up until $date" flag (for when "mark all read" is clicked).
When something new is posted, clear all the "it's been read" flags -- that keeps the number of "flags" down and the table can just be (topic_id, user_id) -- no timestamps.
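The last of these ideas, a timestamp-free (topic_id, user_id) table that is cleared on every new post, could be sketched roughly like this in Python with sqlite3 standing in for MySQL. The table name topic_read and the function names are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE topic_read (
    topic_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    PRIMARY KEY (topic_id, user_id)
)
""")

def mark_read(conn, topic_id, user_id):
    """Record that the user has seen everything currently in the topic."""
    conn.execute(
        "INSERT OR IGNORE INTO topic_read (topic_id, user_id) VALUES (?, ?)",
        (topic_id, user_id),
    )

def on_new_post(conn, topic_id):
    """A new post makes the topic unread for everybody: clear its flags."""
    conn.execute("DELETE FROM topic_read WHERE topic_id = ?", (topic_id,))

def is_read(conn, topic_id, user_id):
    row = conn.execute(
        "SELECT 1 FROM topic_read WHERE topic_id = ? AND user_id = ?",
        (topic_id, user_id),
    ).fetchone()
    return row is not None

mark_read(conn, 5, 7)
read_before = is_read(conn, 5, 7)
on_new_post(conn, 5)
read_after = is_read(conn, 5, 7)
```

The table only ever holds flags for topics that have been quiet since the user's last read, which is what keeps it small.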
Use the functionality of the user's browser: add the last post ID to the thread's link. Then, using a:visited in CSS, you can display the threads the user has not read differently from the ones they have.
Related
I have 4 tables: users, posts, categories, categories_map
posts has id, text, category_id
categories_map contains user_id and category_id
My goal is to make a queue that the user can preview. Also, the user will be able to skip some posts or edit text in them. If the user skipped a post it will never appear in queue. However, the user is not able to change sequence because cron will be executing a script.
The first approach I think is to create a table that will contain
user_id, post_id, text_modified, is_skipped, last_posted. So when the cron job is executed it will leave a timestamp so next time this post won't be grabbed and the user easily can change the text for this post.
The second approach is to create a separate table where a queue is generated for each user: user_id, post_id, category_id, text_modified. The cron job can then simply follow this table and remove each row once it is done. But with this approach, if I have 30 users with an average of 3 categories containing 5000 posts each, my table will already have 450000 rows. Yes, if it is indexed properly it should be all good. But will it scale when I have 100-200 users?
Which approach should I go with, or is there another solution?
A lot of things depend on your product. We don't know:
1) How do users interact with each other?
2) Do their actions (skips) need to be persisted, or are we OK if they are lost above the 99.9th percentile?
3) Are their text modifications to the posts globally visible, or visible only to them?
4) Are the users checking posts by category?
Given all these unknowns, I'll take a stab at it:
If the answer to question 4 is YES then option #2 seems more sound judging from your PKs.
If the answer to question 4 is NO then option #1 seems more sound judging from your PKs.
For database size, I think you're optimizing prematurely. You should take table width into account: since your tables are very narrow (only a few columns, mostly ints), you shouldn't worry too much about the row count of any one table.
When that becomes a constraint, (which you can benchmark, or wait to see disk space on the specific servers), you can scale up the databases by sharding on the user easily. You basically put different users on different db servers.
Note: Question 1 will determine how easy the above would be.
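As a very rough sketch of what "different users on different db servers" could mean, here is a modulo-based router in Python. The host names and the function name shard_for_user are hypothetical, and modulo routing is only one possible scheme.

```python
# Hypothetical database hosts; a real deployment would load these from config.
SHARDS = ["db0.example.internal", "db1.example.internal", "db2.example.internal"]

def shard_for_user(user_id, shards=SHARDS):
    """Pick the database server that holds all queue rows for this user."""
    return shards[user_id % len(shards)]

host = shard_for_user(7)
```

Every query for a given user then goes to a single server. Note that plain modulo routing moves most users when the number of shards changes, so a lookup table or consistent hashing may be preferable if you expect to add servers.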
That said, keep the performance implications in mind:
The lists are going to get really long.
If users' modifications affect other users, you are going to have to do quite a bit of fan-out work to publish the updates to the specific queues.
In that case, you might want to take a look at a distributed cache such as Memcached or Redis.
Note: Depending on answers to Questions 2 & 3, you might not even need to persist the queues.
I'm trying to create a Like/Unlike system akin to Facebook's for an existing comments section of a website, and I need help in designing the system.
Currently, every product on the website has a comments section, and members can post and like comments. I need to know how many comments each member has posted and how many likes each of their comments has received. Of course, I also need to know who liked which comments, partly for analytical purposes and partly so that I can prevent a user from liking a comment more than once.
The naive way of implementing a Like system to the current comments module is to create a new table in the database that has foreign keys to the CommentID and UserID. Then for every "like" given to a comment by a user, I would insert a row to this new table with the targeting comment ID and user ID.
While this might work, the massive amount of comments and users is going to cause this table to grow quickly and retrieving records from and doing counts on this huge table will become slow and inefficient. I can index either one of the columns, but I don't know how effective it would be. The website has over a million comments.
I'm using PHP and MySQL. For a system like this with a huge database, how should I design a Like system so that it is more optimised and stable?
For scalability, do not include the count column in the same table with other things. This is a rare case where "vertical partitioning" is beneficial. Why? The LIKEs/UNLIKEs will come fast and furious. If the code to do the increment/decrement hits a table used for other things (such as the text of the Comment), there will be an unacceptable amount of contention between the two.
This tip is the first of many steps toward being able to scale to Facebook levels. The other tips will come, not from a free forum, but from the team of smart engineers you will have to hire to get to that level. (Hints: Sharding, Buffering, Showing Estimates, etc.)
Your main concern will be a lot of counts, so the easy thing to do is to keep a separate count in your comments table.
Then you can create a TRIGGER that increments/decrements the count based on a like/unlike.
That way you only use the big table to figure out if a user already voted.
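A minimal sketch of the trigger idea, using Python with sqlite3 as a stand-in for MySQL (MySQL trigger syntax differs slightly, for example it needs FOR EACH ROW). The table and column names comment_likes and like_count, and the helper functions, are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    body TEXT NOT NULL,
    like_count INTEGER NOT NULL DEFAULT 0
);
-- One row per (comment, user); the primary key blocks duplicate likes.
CREATE TABLE comment_likes (
    comment_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    PRIMARY KEY (comment_id, user_id)
);
CREATE TRIGGER comment_likes_ai AFTER INSERT ON comment_likes
BEGIN
    UPDATE comments SET like_count = like_count + 1 WHERE id = NEW.comment_id;
END;
CREATE TRIGGER comment_likes_ad AFTER DELETE ON comment_likes
BEGIN
    UPDATE comments SET like_count = like_count - 1 WHERE id = OLD.comment_id;
END;
""")

def like(conn, comment_id, user_id):
    # A repeated like is ignored, so the trigger does not fire again.
    conn.execute(
        "INSERT OR IGNORE INTO comment_likes (comment_id, user_id) VALUES (?, ?)",
        (comment_id, user_id),
    )

def unlike(conn, comment_id, user_id):
    conn.execute(
        "DELETE FROM comment_likes WHERE comment_id = ? AND user_id = ?",
        (comment_id, user_id),
    )

def like_count(conn, comment_id):
    row = conn.execute(
        "SELECT like_count FROM comments WHERE id = ?", (comment_id,)
    ).fetchone()
    return row[0]

conn.execute("INSERT INTO comments (id, body) VALUES (1, 'Nice product')")
like(conn, 1, 10)
like(conn, 1, 10)  # duplicate, ignored
like(conn, 1, 11)
count_after_likes = like_count(conn, 1)
```

Reading the count is then a single-row lookup, and the large likes table is only consulted to check whether a particular user has already liked a comment.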
So I have been playing around with a forum I am building and have been stuck on one aspect of it for a while, how to track unread posts and notifications without storing loads of data in the database. After looking at some solutions I believe I came up with a solution that may suit my needs but need a set of fresh eyes to point out what I didn't think of. Here is the architecture of my idea.
1) When a user logs in, check for posts made between current time() and last login time().
2) If posts found, add to array, then serialize() array and save to member row in database.
3) Output array to user if not empty.
This way it will only check for unread posts and store on users who actually log in to the forum, instead of taking up unnecessary space holding unread IDs of inactive users. I'm still wondering if this isn't such a good idea since if the user doesn't read posts then the serialization in the database might become too large to manage.
Does anyone see a problem in my way of thinking? If so please let me know.
Don't worry about the space until there's actually a problem. A table storing the post ID (integer) and the user ID (another integer) will be small. Even if you have thousands of posts and thousands of users, you can safely assume that:
a large part of the users will be inactive (one-time registrations to post something and forget the whole issue)
even the active members will not read all the posts, but rather only a (relatively small) part of the ones that are in topics that interest them.
One other thing: don't store unread posts if you really want to minimise space. Store only the last read post in each thread. That's one record per thread per user, and only assuming the user has ever opened the thread.
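A sketch of the "last read post per thread" idea in Python, with sqlite3 standing in for MySQL. The table name thread_reads, its columns, and the function names are assumptions, and the sketch relies on post IDs increasing over time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, thread_id INTEGER NOT NULL);
-- At most one row per user per thread, created when the thread is first opened.
CREATE TABLE thread_reads (
    user_id INTEGER NOT NULL,
    thread_id INTEGER NOT NULL,
    last_read_post_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, thread_id)
);
""")

def mark_read(conn, user_id, thread_id, post_id):
    """Remember the newest post the user has seen in this thread."""
    conn.execute(
        "INSERT OR REPLACE INTO thread_reads "
        "(user_id, thread_id, last_read_post_id) VALUES (?, ?, ?)",
        (user_id, thread_id, post_id),
    )

def unread_count(conn, user_id, thread_id):
    """Count posts newer than the last one read (all posts if never opened)."""
    row = conn.execute(
        """
        SELECT COUNT(*) FROM posts
        WHERE thread_id = ?
          AND id > COALESCE(
                (SELECT last_read_post_id FROM thread_reads
                 WHERE user_id = ? AND thread_id = ?), 0)
        """,
        (thread_id, user_id, thread_id),
    ).fetchone()
    return row[0]

conn.executemany("INSERT INTO posts VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 1), (4, 1), (5, 2)])
mark_read(conn, 7, 1, 2)
unread_in_thread_1 = unread_count(conn, 7, 1)
unread_in_thread_2 = unread_count(conn, 7, 2)
```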
If the user logs in, but does not read posts, your scheme still marks them as read.
If the user logs in twice at once (as from a desktop computer and an iPad), what will happen?
What is the problem with keeping each user's view of the forum with a flag to indicate whether they read each one? Such a mechanism is obviously useful to expand into upvoting, favorites, etc.
I am in the process of writing my own basic forum to plug into a CodeIgniter site. I'm a little stuck on how to display threads/latest posts unread by a user.
I was thinking of a table that holds each thread_id visited, but this table has the potential to get rather large.
What are some ways to approach this requirement?
A simple idea: record the last datetime at which a user visited the site/forum/subforum. This could be as granular as the thread or subforum, as you like. Perhaps keep this key-value pair of thread_id and last_visit_date in a cookie rather than in your RDBMS. Ask: is this mission-critical data, or a feature that can withstand a loss of data?
When the user returns, find all the threads/posts whose create_date is greater than the last_visit_date for the forum.
I'm assuming that the act of visiting the forum (list of threads) is same as 'viewing'. Assuming that if the info was presented, that you'd 'viewed' the thread title, regardless of whether you actually drilled into the thread.
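The lookup on return could be as small as this sketch, in Python with sqlite3 standing in for the real database. The threads table layout, the integer timestamps, and the function name unread_threads are assumptions; last_visit_date would come from the cookie or wherever it is stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE threads (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    create_date INTEGER NOT NULL
)
""")

def unread_threads(conn, last_visit_date):
    """Return the IDs of threads created after the user's last visit."""
    rows = conn.execute(
        "SELECT id FROM threads WHERE create_date > ? ORDER BY id",
        (last_visit_date,),
    )
    return [thread_id for (thread_id,) in rows]

conn.executemany("INSERT INTO threads VALUES (?, ?, ?)",
                 [(1, "Intro", 100), (2, "News", 250), (3, "Help", 300)])
new_ids = unread_threads(conn, last_visit_date=200)
```

Nothing is stored per thread, so the cost per user stays constant no matter how many threads exist.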
The easiest way out would probably be just to keep a cookie with the time of the user's last visit and query for posts posted/edited after this. You don't get exactly all read threads, but most forums seem to work this way; otherwise you have to save all read threads somewhere.
I don't think you really need to create a table to log thread IDs as you suggested, because it's going to grow with the number of users and the number of threads/posts created. You can just show threads or posts that were created after the user's last visit as unread. I think that's what I am going to do.
I have written a simple forum in PHP using PostgreSQL. The forum consists of a number of subforums (or categories, if you like) that contain topics. I have a table that stores when was the last time a user visited a topic. It's something like this: user_id, topic_id, timestamp.
I can easily determine what topics should be marked as unread by comparing the timestamp of the last topic reply with the timestamp of the last user visit.
My question is: how do I efficiently determine what subforums (categories) should be marked as unread? All I've come up with is this: every time a user visits a topic, update the visit timestamp and check if all the topics from the current subforum are read or unread. If they are all read, mark the subforum as read for the user. Else, mark it as unread. But I think there must be another way.
Thank you in advance.
There are many ways (like yours) to achieve similar behavior. Since you mention efficiency, I will assume performance is important.
The way I handled this before did not involve a database for tracking unread content at all. With that in mind, my suggestion would be:
On the first visit mark only topics newer than, let's say, 3 days as 'unread'
As the user browses the topics, start throwing the topic IDs and LastUpdate for the thread into a cookie on the client
When the forum pages load, check the cookie and, if the thread has been updated since, mark it as unread. This check and the cookie handling can easily be done in plain JavaScript.
If the client is a whole week away from the website, no problem, he will see everything newer than 3 days (first visit rule) as unread.
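The rules above could be sketched as follows. This is written in Python only to show the logic; as the answer says, in practice it would live in client-side JavaScript. The dict named seen stands in for the cookie contents (topic ID mapped to the LastUpdate value the user last saw), and the function names and integer timestamps are assumptions. The sketch applies the first-visit rule to any topic missing from the cookie.

```python
THREE_DAYS = 3 * 24 * 3600  # the "first visit" window from the rules above

def is_unread(topic_id, last_update, seen, now):
    """Decide whether a topic shows as unread for this browser."""
    if topic_id in seen:
        # Known topic: unread only if it was updated after the user saw it.
        return last_update > seen[topic_id]
    # Topic not in the cookie: fall back to the first-visit rule.
    return now - last_update < THREE_DAYS

def remember(seen, topic_id, last_update):
    """Called when the user opens a topic; the result goes back into the cookie."""
    seen[topic_id] = last_update

DAY = 24 * 3600
now = 10 * DAY
seen = {}
fresh = is_unread(1, now - 1 * DAY, seen, now)   # updated yesterday, never seen
stale = is_unread(2, now - 7 * DAY, seen, now)   # updated a week ago, never seen
remember(seen, 1, now - 1 * DAY)
after_reading = is_unread(1, now - 1 * DAY, seen, now)
```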
p.s.: This depends entirely on how important it is to a person to know what they have not read. In my suggestion this is not something crucial, because it is not 100% reliable (we are not using a database/proper persistence after all).