So I have been playing around with a forum I am building and have been stuck on one aspect of it for a while: how to track unread posts and notifications without storing loads of data in the database. After looking at some solutions, I believe I have come up with one that may suit my needs, but I need a fresh set of eyes to point out what I didn't think of. Here is the architecture of my idea.
1) When a user logs in, check for posts made between current time() and last login time().
2) If posts found, add to array, then serialize() array and save to member row in database.
3) Output array to user if not empty.
This way it will only check for and store unread posts for users who actually log in to the forum, instead of taking up unnecessary space holding unread IDs for inactive users. I'm still wondering whether this is a good idea, though: if the user doesn't read posts, the serialized array in the database might become too large to manage.
Does anyone see a problem in my way of thinking? If so please let me know.
Don't worry about the space until there's actually a problem. A table storing the post ID (integer) and the user ID (another integer) will be small. Even if you have thousands of posts and thousands of users, you can safely assume that:
a large part of the users will be inactive (one-time registrations to post something and forget the whole issue)
even the active members will not read all the posts, but rather only a (relatively small) part of the ones that are in topics that interest them.
One other thing: don't store unread posts if you really want to minimise space. Store only the last read post in each thread. That's one record per thread per user, and only assuming the user has ever opened the thread.
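A minimal sketch of the last-read-post idea, written in Python with an in-memory SQLite database standing in for PHP/MySQL. The table and column names (posts, thread_reads, last_read_post_id) are assumptions for illustration, and the sketch assumes post IDs increase over time.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not taken from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id        INTEGER PRIMARY KEY,
        thread_id INTEGER NOT NULL
    );
    CREATE TABLE thread_reads (
        user_id           INTEGER NOT NULL,
        thread_id         INTEGER NOT NULL,
        last_read_post_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, thread_id)
    );
""")

def mark_thread_read(user_id, thread_id):
    """Store the newest post in the thread as the last one this user has seen."""
    newest = conn.execute(
        "SELECT MAX(id) FROM posts WHERE thread_id = ?", (thread_id,)
    ).fetchone()[0]
    if newest is not None:
        conn.execute(
            "INSERT OR REPLACE INTO thread_reads VALUES (?, ?, ?)",
            (user_id, thread_id, newest),
        )

def unread_count(user_id, thread_id):
    """Count posts newer than the stored marker (all posts if no row exists)."""
    row = conn.execute(
        "SELECT last_read_post_id FROM thread_reads "
        "WHERE user_id = ? AND thread_id = ?",
        (user_id, thread_id),
    ).fetchone()
    last_read = row[0] if row else 0
    return conn.execute(
        "SELECT COUNT(*) FROM posts WHERE thread_id = ? AND id > ?",
        (thread_id, last_read),
    ).fetchone()[0]

conn.executemany("INSERT INTO posts (id, thread_id) VALUES (?, ?)",
                 [(1, 10), (2, 10), (3, 10)])
mark_thread_read(7, 10)                       # user 7 opens thread 10
conn.execute("INSERT INTO posts (id, thread_id) VALUES (4, 10)")
print(unread_count(7, 10))                    # one post arrived after the marker
```

A user who never opened the thread has no row at all, which is what keeps the table small.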
If the user logs in, but does not read posts, your scheme still marks them as read.
If the user logs in twice at once (as from a desktop computer and an iPad), what will happen?
What is the problem with keeping a record of each user's view of each post, with a flag to indicate whether they have read it? Such a mechanism is also easy to extend to upvoting, favorites, etc.
Related
I am working on a jQuery Mobile Web App where a company will have the ability to message certain groups of users (based on their profile preferences).
I am debating on what is the most efficient way to mark when each user has read the latest messages. I have considered using a session to try and keep track of the last time they opened the messages page, and comparing that to the post times of the messages. I have also considered a table with the message_id and the user_id, marking each one as read when they open the page.
I think both would work, but I am trying to balance the pros and cons. Keeping it in a database would allow me to keep a history (especially if I added a timestamp column to know when they read the message), but if it is going to hurt the app's performance due to the table size, then it may not be worth it. The app will potentially have tens of thousands of users.
One thing I should probably mention is that the users may use the app on multiple devices and the app will have very long session times, potentially allowing the user to stay logged in for months. I like the idea that if they read it on one device then it would mark it read on all devices, which may make sessions difficult to work with, right?
Ok, I'm gonna put everything I said in the comments into one solid answer.
Short Answer: You should be using the database to store 'read' notifications
Logic behind it:
It should be a negligible performance hit with decent servers and optimized code (a couple of ms max), even with hundreds of thousands of users
It is highly maintainable
You can track it and sync it across devices
Specifically why you shouldn't use sessions
Sessions were designed to store temporary user data (think RAM); they're not supposed to log stuff.
You should not be keeping sessions for months. It is highly insecure as it opens up a much larger window for session hijacking. Rather you should be generating a new session each time the app is accessed, and using a different "remember me" cookie or something each time to authenticate them.
Even if you do make your session persist for months, after those months won't the user all of a sudden get a bajillion "unread" notifications?
How to store it in the database
This is a many-to-many relationship between messages and users, which the join table below breaks into two one-to-many relationships (one from the message side, one from the user side).
Table 1: messages
ID, message, timestamp
Table 2: messages_users
ID, user_id, message_id, read
Table 3: users
(Do user business as usual)
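The three tables above could be exercised roughly as follows. This is a sketch in Python with SQLite standing in for PHP/MySQL; the helper names (send_message, mark_read, unread_ids) are hypothetical, and the read column is named is_read here to sidestep any clash with SQL keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        id        INTEGER PRIMARY KEY,
        message   TEXT NOT NULL,
        timestamp TEXT NOT NULL
    );
    CREATE TABLE messages_users (
        id         INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        message_id INTEGER NOT NULL REFERENCES messages(id),
        is_read    INTEGER NOT NULL DEFAULT 0,
        UNIQUE (user_id, message_id)
    );
""")

def send_message(message_id, text, sent_at, recipient_ids):
    """Store the message once, plus one unread row per targeted user."""
    conn.execute("INSERT INTO messages VALUES (?, ?, ?)",
                 (message_id, text, sent_at))
    conn.executemany(
        "INSERT INTO messages_users (user_id, message_id) VALUES (?, ?)",
        [(uid, message_id) for uid in recipient_ids],
    )

def mark_read(user_id, message_id):
    """Flip the flag; every device sees it because the state lives in the DB."""
    conn.execute(
        "UPDATE messages_users SET is_read = 1 "
        "WHERE user_id = ? AND message_id = ?",
        (user_id, message_id),
    )

def unread_ids(user_id):
    rows = conn.execute(
        "SELECT message_id FROM messages_users "
        "WHERE user_id = ? AND is_read = 0 ORDER BY message_id",
        (user_id,),
    )
    return [r[0] for r in rows]

send_message(1, "Spring sale", "2024-04-01 09:00:00", [1, 2])
send_message(2, "New store hours", "2024-04-02 09:00:00", [1])
mark_read(1, 1)
print(unread_ids(1), unread_ids(2))
```

Adding a read_at timestamp column to messages_users would give the history mentioned in the question without changing the queries much.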
Here is one option, whether you have one user or a hundred: create a single column named readUnread, able to hold more than 63,999 characters, in which you store every user the message was sent to with a 0 or 1 flag, like {jeff:0,kevin:1,Sal:0}. When a user reads the message, update their flag from 0 to 1. When you display the message, split the string on its separator and look up the current user's entry. The idea behind this is to improve performance.
For reference, here is a question on SO that I asked recently that is relevant to this question: How to model Friendship relationships
On that question, we figured out a way to display news feed items only if posted by friends. However, what I need is advice on how to check for friendship in a more dynamic way so it can be used throughout several site functions.
For example, I've just installed a comment system, allowing users to post comments on news posts. However, this isn't restricted to friends, and it should be (and later should be made optional by the author of the post).
Posting news items only by friends was slightly different because I was getting data directly from the database, and using SELECT subqueries to only get posts by friends of the current user. However, in the example of the commenting, I only want to display the comment post form if the person is a friend. I'm not actually pulling anything from the database to pick, sort, and display.
Knowing issues like this will arise many times throughout the site, what would be the simplest way to check for friendship? Could I somehow pull all the friends' user IDs from the database into a session array of some kind and then do an if (in_array($id, $friends)) whenever I needed to determine if the person in question is a friend of the currently logged in user? Off the top of my head, this sounds like it would work fine, but I'd like your input first.
The question I linked to above explains how my friendship table works, in case that helps you help me with this.
Actually, it is a very bad idea to store the friend array in the session. What happens if you add a new friend after the session variable is created? It does not get updated in the session.
Since you will be checking the list of friends a lot of times on the same page, why not just query it out and store it in a local array that you can keep using on the same page.
When the page finishes executing, the array will be discarded.
So basically, you will only query the list out once.
A recommended implementation would be to follow the advice of "pst" in the above comment and, at first, just run a query every time you need to check the relationship, as that is simple to implement. Later, when the speed of the query starts to become an issue, you can change the internals of that method to cache the friend list in a local array when it is first called, to speed things up (trading memory usage for processor usage).
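The "query once per page, then reuse a local array" approach might look like the sketch below, in Python with SQLite standing in for PHP/MySQL. The friendships table, the FriendChecker class and its queries counter are all hypothetical names used only to show the pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER)")
conn.executemany("INSERT INTO friendships VALUES (?, ?)", [(1, 2), (1, 3)])

class FriendChecker:
    """Create one per request; the cached list is discarded with the object."""

    def __init__(self, conn, user_id):
        self.conn = conn
        self.user_id = user_id
        self._friends = None   # loaded lazily on the first check
        self.queries = 0       # counts database hits, for demonstration only

    def is_friend(self, other_id):
        if self._friends is None:
            rows = self.conn.execute(
                "SELECT friend_id FROM friendships WHERE user_id = ?",
                (self.user_id,),
            )
            self._friends = {r[0] for r in rows}
            self.queries += 1
        return other_id in self._friends

checker = FriendChecker(conn, 1)
print(checker.is_friend(2), checker.is_friend(9), checker.queries)
```

Because callers only ever see is_friend(), the internals can start as a plain per-call query and gain the cache later without touching the rest of the site.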
I am in the process of writing my own basic forum to plug into a CodeIgniter site. I'm a little stuck on how to display threads/latest posts unread by a user.
I was thinking of a table that holds each thread_id visited, but this table has the potential to get rather large.
What are some ways to approach this requirement?
A simple idea: record the last datetime that a user visits the site/forum/subforum. This could be as granular as the thread or subforum, as you like. Perhaps store this as a key-value pair of thread_id and last_visit_date in a cookie, rather than in your RDBMS. Ask: is this mission-critical data, or a feature that can withstand a loss of data?
When the user returns, find all the threads/posts whose create_date is greater than the last_visit_date for the forum.
I'm assuming that the act of visiting the forum (list of threads) is same as 'viewing'. Assuming that if the info was presented, that you'd 'viewed' the thread title, regardless of whether you actually drilled into the thread.
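As a rough illustration of the last-visit comparison, here is a sketch in Python with SQLite standing in for the real database. The threads table and the unread_threads helper are assumptions; reading and writing the cookie itself is left out, so last_visit_date is simply passed in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE threads (id INTEGER PRIMARY KEY, title TEXT, create_date TEXT)"
)
conn.executemany(
    "INSERT INTO threads VALUES (?, ?, ?)",
    [
        (1, "Welcome", "2024-03-01 08:00:00"),
        (2, "Rules", "2024-03-05 12:30:00"),
        (3, "New release", "2024-03-09 17:45:00"),
    ],
)

def unread_threads(last_visit_date):
    """Threads created after the visitor's last recorded visit."""
    rows = conn.execute(
        "SELECT id, title FROM threads WHERE create_date > ? ORDER BY create_date",
        (last_visit_date,),
    )
    return rows.fetchall()

# In a real app this value would come from the cookie described above.
last_visit_date = "2024-03-04 00:00:00"
print(unread_threads(last_visit_date))
```

Dates are stored as ISO-style strings so that plain string comparison orders them correctly.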
The easiest way out would probably be just to keep a cookie with the time of the user's last visit and query for posts posted or edited after it. You don't track exactly which threads were read, but most forums seem to work this way; otherwise you have to save all read threads somewhere.
I don't think you really need to create a table to log thread IDs as you have described, because it's going to grow with the number of users and the number of threads/posts created. You can just show threads or posts that were created after the user's last visit as unread. I think that's what I am going to do.
OK, I have a social network with around 50,000 users so far, and there is a friend table that shows who is your friend on the site. This table is over a million rows.
Now the tricky part: I show user-posted bulletins, status posts, and the like, which are only visible to people on your friend list.
Keep in mind the size of the tables, user table around 50,000 so far and friend table around 1 million
One method of getting the friend list, to know which status posts to show to a user, is to run the query below and put the results into an array
select friendid from friend_friend where userid=1 and status=1
I would then turn this array into a comma-separated list and use it in an IN clause in the MySQL query that fetches the posts that this user is allowed to view
Hope that makes sense so far.
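The two-step approach described above (fetch friend IDs, then feed them to an IN clause) can be sketched like this in Python with SQLite standing in for PHP/MySQL. The friend_friend query is the one from the question; the posts table and the function names are assumptions. Placeholders are generated instead of pasting the IDs into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friend_friend (userid INTEGER, friendid INTEGER, status INTEGER);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, userid INTEGER, body TEXT);
""")
conn.executemany("INSERT INTO friend_friend VALUES (?, ?, ?)",
                 [(1, 2, 1), (1, 3, 1), (1, 4, 0)])   # user 4 is not confirmed
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(10, 2, "hi"), (11, 4, "pending"), (12, 3, "yo"), (13, 9, "x")])

def friend_ids(user_id):
    rows = conn.execute(
        "SELECT friendid FROM friend_friend WHERE userid = ? AND status = 1",
        (user_id,),
    )
    return [r[0] for r in rows]

def visible_post_ids(user_id):
    ids = friend_ids(user_id)
    if not ids:
        return []                      # avoid building an empty IN () list
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT id FROM posts WHERE userid IN ({placeholders}) ORDER BY id",
        ids,
    )
    return [r[0] for r in rows]

print(visible_post_ids(1))
```

The empty-list guard matters because an empty IN () is a syntax error in MySQL. For very long friend lists, joining posts against friend_friend in a single query is an alternative worth measuring.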
Now what if I were to save this friend array to a session variable, since this query is run very frequently?
One of the drawbacks I see is that if a user adds someone as a friend while they are logged in, the change would not show up until that session was reset. Other than that, would this be bad for performance, memory, or whatever sessions use?
Also note there are sometimes up to 500 users logged in, some with a friend list of 10,000 friend IDs, and this is a PHP/MySQL setup
Another question about session data: it is stored in files and not in system memory, correct? If it is on disk instead of in memory, then memory shouldn't be too much of a problem?
Have you looked into memcached? This article ( http://highscalability.com/memcached-and-storage-friend-list ), and its comments, highlight the pros and cons of your way versus offloading to memcache.
Maybe you would want to respond to this problem dynamically.
If, let's say, a user has more than X friends, don't cache the list. Think of what could happen when a user has a lot of friends and you need to push the entire data set through the connection... I think the user would notice this as a bigger performance reduction than the one from querying the list each time.
This is my opinion based on my experience with PHP/MySQL... maybe there's a reason for doing what you proposed anyway.
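The "only cache small lists" rule could be sketched as below, in Python. The cache here is a plain dict standing in for a session or memcached entry, MAX_CACHED_FRIENDS is an arbitrary value for the X mentioned above, and fake_loader is a stand-in for the real friend query.

```python
MAX_CACHED_FRIENDS = 1000   # the "X" threshold; this value is an assumption

def get_friend_ids(user_id, cache, load_from_db):
    """Return the friend list, caching it only when it is small enough."""
    if user_id in cache:
        return cache[user_id]
    ids = load_from_db(user_id)
    if len(ids) <= MAX_CACHED_FRIENDS:
        cache[user_id] = ids
    return ids

# Demonstration with a fake loader that records how often it is called.
calls = []

def fake_loader(user_id):
    calls.append(user_id)
    return list(range(10)) if user_id == 1 else list(range(5000))

cache = {}
for _ in range(3):
    get_friend_ids(1, cache, fake_loader)   # small list: loaded once, then cached
    get_friend_ids(2, cache, fake_loader)   # large list: loaded every time
print(calls.count(1), calls.count(2))
```

The right threshold depends on where the cache lives and how large a serialized list gets, so it is best chosen by measurement.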
regards
Using PHP and MySQL, I have a forum system I'm trying to build. What I want to know is, how can I set it so that when a user reads a forum entry, it shows as read JUST for that user, no matter what forum they are in, until someone else posts on it.
Currently, for each thread, I have a table with a PostID, and has the UserID that posted it, the ThreadID to link it to, the actual Post (as Text), then the date/time it was posted.
For the thread list in each forum, there is the threadID (Primary Key), the ThreadName, ForumID it belongs to, NumPosts, NumViews, LastPostDateTime, and CreateDateTime. Any help?
The traditional solution is a join table something along the lines of:
CREATE TABLE topicviews (
    userid INTEGER NOT NULL,
    topicid INTEGER NOT NULL,
    lastread TIMESTAMP NOT NULL,
    PRIMARY KEY (userid, topicid),
    FOREIGN KEY (userid) REFERENCES users(id),
    FOREIGN KEY (topicid) REFERENCES topics(id)
);
with lastread updated every time a topic is read. When displaying the list of topics, if the topics.lastupdated is > topicviews.lastread, there are new posts.
The traditional solution is rubbish and will kill your database! Don't do it!
The first problem is that a write on every topic view will soon bring the database server to its knees on a busy forum, especially on MyISAM tables which only have table-level locks. (Don't use MyISAM tables, use InnoDB for everything except fulltext search).
You can improve this situation a bit by only bothering to write through the lastread time when there are actually new messages being read in the topic. If topics.lastupdated < topicviews.lastread you have nothing to gain by updating the value. Even so, on a heavily-used forum this can be a burden.
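A sketch of that write-avoidance check, in Python with SQLite standing in for MySQL. The topics and topicviews columns follow the answer above; the record_view helper and its boolean return value are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topics (id INTEGER PRIMARY KEY, lastupdated TEXT NOT NULL);
    CREATE TABLE topicviews (
        userid   INTEGER NOT NULL,
        topicid  INTEGER NOT NULL,
        lastread TEXT NOT NULL,
        PRIMARY KEY (userid, topicid)
    );
""")
conn.execute("INSERT INTO topics VALUES (1, '2024-01-01 10:00:00')")

def record_view(userid, topicid, now):
    """Write lastread only if the topic may hold unseen posts.

    Returns True when a row was written, False when the write was skipped.
    """
    row = conn.execute(
        "SELECT t.lastupdated, v.lastread FROM topics t "
        "LEFT JOIN topicviews v ON v.topicid = t.id AND v.userid = ? "
        "WHERE t.id = ?",
        (userid, topicid),
    ).fetchone()
    if row is None:
        return False                      # unknown topic
    lastupdated, lastread = row
    if lastread is not None and lastupdated < lastread:
        return False                      # nothing new since the last read
    conn.execute("INSERT OR REPLACE INTO topicviews VALUES (?, ?, ?)",
                 (userid, topicid, now))
    return True

print(record_view(5, 1, "2024-01-01 11:00:00"))   # first view: row written
print(record_view(5, 1, "2024-01-01 12:00:00"))   # no new posts: write skipped
```

The check costs one extra read per view, which is usually cheaper than an unconditional write.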
The second problem is a combinatorial explosion. One row per user per topic soon adds up: just a thousand users and a thousand topics and you have potentially a million topicview rows to store!
You can improve this situation a bit by limiting the number of topics remembered for each user. For example you could remove any topic from the views table when it gets older than a certain age, and just assume all old topics are 'read'. This generally needs a cleanup task to be done in the background.
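The background cleanup could look roughly like this, in Python with SQLite standing in for MySQL. The cutoff value and both helper names are assumptions; the display logic has to agree with the cleanup by treating any topic older than the cutoff as read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topics (id INTEGER PRIMARY KEY, lastupdated TEXT NOT NULL);
    CREATE TABLE topicviews (
        userid   INTEGER NOT NULL,
        topicid  INTEGER NOT NULL,
        lastread TEXT NOT NULL,
        PRIMARY KEY (userid, topicid)
    );
""")

def cleanup_old_views(cutoff):
    """Background task: forget view rows for topics idle since before cutoff."""
    cur = conn.execute(
        "DELETE FROM topicviews WHERE topicid IN "
        "(SELECT id FROM topics WHERE lastupdated < ?)",
        (cutoff,),
    )
    return cur.rowcount

def has_new_posts(userid, topicid, cutoff):
    """Old topics count as read; newer ones use the lastread comparison."""
    topic = conn.execute(
        "SELECT lastupdated FROM topics WHERE id = ?", (topicid,)
    ).fetchone()
    if topic is None or topic[0] < cutoff:
        return False
    view = conn.execute(
        "SELECT lastread FROM topicviews WHERE userid = ? AND topicid = ?",
        (userid, topicid),
    ).fetchone()
    return view is None or topic[0] > view[0]

conn.executemany("INSERT INTO topics VALUES (?, ?)",
                 [(1, "2023-01-05 10:00:00"), (2, "2024-06-01 10:00:00")])
conn.executemany("INSERT INTO topicviews VALUES (?, ?, ?)",
                 [(5, 1, "2023-01-06 09:00:00"), (5, 2, "2024-05-30 09:00:00")])
CUTOFF = "2024-01-01 00:00:00"
print(cleanup_old_views(CUTOFF))
print(has_new_posts(5, 1, CUTOFF), has_new_posts(5, 2, CUTOFF))
```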
Other, less intensive approaches include:
only storing one lastread time per forum
only storing one lastvisit time per user across the whole site, which would show as 'new' only things updated since the user's previous visit (session)
not storing any lastread information at all, but including the last-update time in a topic's URL itself. If the user's browser has seen the topic recently, it will remember the URL and mark it as visited. You can then use CSS to style visited links as 'topics containing no new messages'.
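For the URL-based option, the only server-side work is building a link that changes whenever the topic changes. A small Python sketch follows; the topic.php path and the updated parameter name are hypothetical.

```python
from urllib.parse import urlencode

def topic_url(topic_id, last_update):
    """Build a link whose text changes whenever the topic gets a new post."""
    return "topic.php?" + urlencode({"id": topic_id, "updated": last_update})

before = topic_url(12, 1700000000)   # link as shown before a new post
after = topic_url(12, 1700000600)    # same topic after someone replies
print(before)
print(after)
```

Once a reply changes the timestamp, the browser has no history entry for the new URL, so an a:visited rule in the stylesheet no longer matches and the topic looks unread again.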
Maybe store UserID, ThreadID and LastReadDateTime in another table, recording when the user last read that thread.
If (LastPostDateTime > LastReadDateTime), you have an unread post.
Sadly this has a big overhead: every read causes a write.
The general ideas here are correct, but they've overlooked some obvious solutions to the scalability issue.
@bobince:
"The second problem is a combinatorial explosion. One row per user per topic soon adds up: just a thousand users and a thousand topics and you have potentially a million topicview rows to store!"
You don't need to store a record in the "topicviews" table if somebody has never viewed that thread. You'd simply display a topic as having unread posts if null is returned OR if the last_read time is < the last_post time. This will reduce that "million" rows by perhaps an order of magnitude.
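The "null OR last_read < last_post" rule maps naturally onto a LEFT JOIN. Below is a sketch in Python with SQLite standing in for MySQL; the exact table layout and the topic_list helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topics (id INTEGER PRIMARY KEY, last_post TEXT NOT NULL);
    CREATE TABLE topicviews (
        userid    INTEGER NOT NULL,
        topicid   INTEGER NOT NULL,
        last_read TEXT NOT NULL,
        PRIMARY KEY (userid, topicid)
    );
""")
conn.executemany("INSERT INTO topics VALUES (?, ?)", [
    (1, "2024-02-01 10:00:00"),
    (2, "2024-02-01 12:00:00"),
    (3, "2024-02-01 09:00:00"),
])
# User 5 has opened topics 1 and 2 but never topic 3, so topic 3 has no row.
conn.executemany("INSERT INTO topicviews VALUES (?, ?, ?)", [
    (5, 1, "2024-02-01 11:00:00"),
    (5, 2, "2024-02-01 11:00:00"),
])

def topic_list(userid):
    """Return (topic id, unread?) pairs; a missing view row means unread."""
    rows = conn.execute(
        """
        SELECT t.id,
               (v.last_read IS NULL OR v.last_read < t.last_post) AS unread
        FROM topics t
        LEFT JOIN topicviews v ON v.topicid = t.id AND v.userid = ?
        ORDER BY t.id
        """,
        (userid,),
    )
    return [(topic_id, bool(unread)) for topic_id, unread in rows]

print(topic_list(5))
```

Putting the userid condition in the ON clause rather than the WHERE clause is what keeps never-viewed topics in the result.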
@gortok: "There are plenty of ways to do it, but each grows exponentially larger as the user visits the site."
In this case, you archive a forum after n-posts or n-weeks and, when you lock, you clean up the "topicviews" table.
My first suggestion is obvious and has no downside. My second reduces usability on archived topics, but that's a small price to pay for a speedy forum. Slow forums are just painful to read and post to.
But honestly? You probably won't need to worry about scalability. Even one million rows really isn't all that many.
There's no easy way to do this. There are plenty of ways to do it, but each grows exponentially larger as the user visits the site. The best you can do and still keep performance is to have a timestamp and mark any forums that have been updated since the last visit as 'unread'.
You could just use the functionality of the user's browser and append the last postid in the link to the thread like this: "thread.php?id=12&lastpost=122"
By making use of a:visited in your CSS, you can display the posts that the user has already read differently from those that they have not read.
Bobince has a lot of good suggestions. Another few potential optimizations:
Write the new "is this new?" information to memcached and to a MySQL "ARCHIVE" table. A batch job can update the "real" table.
For each user, keep an "everything read up until $date" flag (for when "mark all read" is clicked).
When something new is posted, clear all the "it's been read" flags -- that keeps the number of "flags" down and the table can just be (topic_id, user_id) -- no timestamps.
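The timestamp-free variant in the last point might be sketched as follows, in Python with SQLite standing in for MySQL. The topic_read_flags table and the three helper names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE topic_read_flags (
        topic_id INTEGER NOT NULL,
        user_id  INTEGER NOT NULL,
        PRIMARY KEY (topic_id, user_id)
    )
""")

def mark_read(topic_id, user_id):
    """The presence of a row means this user has read the topic."""
    conn.execute("INSERT OR IGNORE INTO topic_read_flags VALUES (?, ?)",
                 (topic_id, user_id))

def on_new_post(topic_id):
    """A new post makes the topic unread for everyone, so drop all its flags."""
    conn.execute("DELETE FROM topic_read_flags WHERE topic_id = ?", (topic_id,))

def is_read(topic_id, user_id):
    row = conn.execute(
        "SELECT 1 FROM topic_read_flags WHERE topic_id = ? AND user_id = ?",
        (topic_id, user_id),
    ).fetchone()
    return row is not None

mark_read(1, 5)
mark_read(1, 6)
mark_read(2, 5)
print(is_read(1, 5))    # flag present
on_new_post(1)
print(is_read(1, 5))    # flag cleared by the new post
```

The table only ever holds flags for topics with no activity since they were read, which is what keeps it small.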
Use the functionality of the user's browser: add the last post ID to the link of the thread. Then, by using a:visited in CSS, you can display the threads that the user has not read differently from the ones they have.