I'm building a website with a posts-and-replies system.
When someone replies, I'd like to send a notification to everyone who has already replied to (or is otherwise involved in) the post.
My idea is to create a table named Notification with message and seen (seen/unread) fields. Each time someone replies, I INSERT a record into the Notification table for every user to be notified.
This seems easy and intuitive, but it creates many rows when lots of people are involved. For example, when the 31st user replies, the 30 people who replied earlier each receive a notification, which means 30 new rows. The 32nd reply adds 31 more, so those two replies alone produce 30+31=61 rows.
My questions are:
1. Is this a good way to handle a notification system?
2. If so, how should I deal with duplicate notifications (the user hasn't seen the earlier one, but there is a new reply)?
3. As described above, will this put a heavy load on the server?
Thank you so much.
I built a similar system. Here is my experience:
My notification table looks like: id (int) | user_id (int) | post_id (int) | last_visited (datetime).
user_id + post_id is a unique composite index.
When a user opens the page, I look for an entry (user_id + post_id) in the database. If I find one, I update its last_visited field; if not, I create a new row.
When I need to list messages for notification, I just query all messages that were created after the last_visited time.
I also have a cron script that cleans up notifications for closed posts and banned users.
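The last_visited approach above can be sketched as follows. This is only an illustration under assumptions: in-memory SQLite stands in for MySQL, the replies table and the function names are hypothetical, timestamps are plain integers, and the composite key is used as the primary key instead of a separate id column.

```python
import sqlite3

def open_db():
    # In-memory SQLite stands in for MySQL; all names here are illustrative.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE replies (
            id INTEGER PRIMARY KEY,
            post_id INTEGER NOT NULL,
            user_id INTEGER NOT NULL,
            created_at INTEGER NOT NULL       -- Unix timestamp
        );
        CREATE TABLE notification (
            user_id INTEGER NOT NULL,
            post_id INTEGER NOT NULL,
            last_visited INTEGER NOT NULL,    -- Unix timestamp
            PRIMARY KEY (user_id, post_id)    -- the unique composite index
        );
    """)
    return db

def mark_visited(db, user_id, post_id, now):
    # Replace the existing (user_id, post_id) row, or create it if missing.
    db.execute(
        "INSERT OR REPLACE INTO notification (user_id, post_id, last_visited) "
        "VALUES (?, ?, ?)",
        (user_id, post_id, now),
    )

def unread_replies(db, user_id):
    # Replies by other users, created after this user's last visit to the post.
    return db.execute(
        """SELECT r.post_id, r.id
           FROM notification n
           JOIN replies r ON r.post_id = n.post_id
           WHERE n.user_id = ?
             AND r.user_id <> n.user_id
             AND r.created_at > n.last_visited
           ORDER BY r.created_at""",
        (user_id,),
    ).fetchall()
```

With this layout there is one row per user and post no matter how many replies arrive, so the "duplicate notification" case does not produce extra rows.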
As for your questions:
1 and 2: You have to find a balance between the amount of data stored and site performance. If you don't need to store all this data, you can follow my approach. If you do need it, your approach is better.
3: It depends on the number of visitors and on the other functionality. Here is some advice, though. You must use indexes on the MySQL table for better performance. You should also think about a cron script that removes useless notifications. If you have a huge number of visitors (more than 700k per day), you should think about MongoDB or another high-performance NoSQL database.
I have four tables: projects, posts, users and project_users. Some users are connected to some projects via the project_users table. Each project contains a bunch of rows from the posts table. Pretty straight forward.
The users can edit the posts rows and every time an update occurs the user id and timestamp should be saved so other users can see what is new and not. Should I save this information to 1) the post row, keeping a latest editor and time for each row or should I consider 2) a whole separate log table? What are the benefits of each case?
When that is decided, I want to run a script every now and then that refreshes the site content if any rows have been updated by anyone else (poor man's push). To give me some perspective: is it a big query to ask a database with, say, 1 million posts "check if there are any posts in this project that any user other than you has updated after timestamp x"? I'm asking because I'd run that query a lot.
A much quicker way would be to log the latest editor to the projects or project users rows, but with multiple people editing at the same time, that would be less accurate and there is also no way to see which rows got updated. Makes sense?
Use solution (1). It's a clean design and easy to work with.
When you fetch post data for the edit form, save the updated_at timestamp in a hidden field. Then run an Ajax request every X seconds to check if the timestamp has been changed. The query is simple:
SELECT updated_at FROM posts WHERE post_id = ?
Since you filter by the primary key, the query will be very fast even with a huge table.
Now compare the fetched timestamp with what is stored in the hidden field. If it has changed, notify the user (editor).
As you already realized, it's possible that another user will update the post between two Ajax requests. So you should also check for updates just before saving the form data. Don't forget to use a transaction and lock the row:
Pseudo code:
START TRANSACTION
SELECT updated_at FROM posts WHERE post_id = ? FOR UPDATE
if ($row->updated_at == $form->updated_at)
then update row
else notify the user that he is unlucky ;-(
COMMIT
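As a runnable variation on the pseudo code above, the check and the write can be folded into one conditional UPDATE. Note that this swaps the row lock (SELECT ... FOR UPDATE) for a compare-and-set on updated_at, which is a different technique with the same goal. The sketch assumes SQLite in place of MySQL, a hypothetical body column, and integer timestamps.

```python
import sqlite3

def make_db():
    # Illustrative schema; only post_id and updated_at come from the answer.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE posts (post_id INTEGER PRIMARY KEY, body TEXT, updated_at INTEGER)"
    )
    return db

def save_post(db, post_id, new_body, form_updated_at, now):
    """Write the edit only if the row is unchanged since the form was loaded."""
    with db:  # commits on success, rolls back if an exception is raised
        cur = db.execute(
            "UPDATE posts SET body = ?, updated_at = ? "
            "WHERE post_id = ? AND updated_at = ?",
            (new_body, now, post_id, form_updated_at),
        )
    # One row changed means the save went through; zero rows means
    # someone else saved first and the editor should be notified.
    return cur.rowcount == 1
```

A caller would pass the timestamp kept in the hidden form field as form_updated_at and show the "someone else edited this" message whenever the function returns False.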
Of course you can also use solution (2) and add a separate table post_updates(post_id, user_id, updated_at). To get the latest timestamp you would run:
SELECT MAX(updated_at) as updated_at
FROM post_updates
WHERE post_id = ?
It should run pretty fast with an index on (post_id, updated_at). But don't complicate things as long as you don't have any requirements which would benefit from that table.
For my university project, I'm developing a dynamic live chat website with rooms, user registration, etc. I've got the entire system planned out bar one aspect. The rooms. I'm confused as to how to design the database for rooms.
To put it in perspective, a room is created by a user who is then an operator of that room. Users can join the room and talk within it. The system has to be scalable, accounting for hundreds of thousands if not millions of messages being sent a day.
Originally, I was going to create one table in my database called messages, with fields like this:
| r_id | u_id | message | timestamp |
r_id and u_id would be foreign keys to the room ID and user ID respectively. Doing it this way means I would need to insert a new record whenever a user sends a message, and periodically run a SELECT statement for every client (say every 3 seconds or so) to get the recent messages. My worry with this is because the table will be huge, running these statements might create a lot of overhead and take a long time.
The other way I thought of implementing this would be to create a new database table for every room. Say a user creates 3 rooms called General, Programming and Gaming, the database tables would look like: room_general, room_programming, room_gaming, each with fields like:
| u_id | message | timestamp |
This would drastically cut down on the amount of queries for each table, but may introduce problems when I come to program it.
So, I'm stuck on what the best way to do this is. If it makes a difference, the technology I'm using will be MySQL with PHP, and a whole lotta AJAX.
Thanks for any help!
It is a bad idea to create a table per room. It is hard to implement and hard to support.
Don't worry about the performance of the selects, because they will be very simple:
SELECT * FROM messages WHERE r_id=X ORDER BY timestamp DESC LIMIT X,Y
Just make sure (r_id, timestamp) are indexed together, in this order, so that the select can use the index:
ALTER TABLE `messages` ADD KEY `IN_messages_room_time` (`r_id`, `timestamp`);
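To see the composite index at work, here is a small sketch using SQLite as a stand-in for MySQL (so it uses EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN). The table layout follows the question; the function names are made up for the example.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE messages (
            r_id INTEGER NOT NULL,
            u_id INTEGER NOT NULL,
            message TEXT NOT NULL,
            timestamp INTEGER NOT NULL
        );
        -- Room first, then time: matches WHERE r_id = ? ORDER BY timestamp.
        CREATE INDEX IN_messages_room_time ON messages (r_id, timestamp);
    """)
    return db

QUERY = (
    "SELECT u_id, message, timestamp FROM messages "
    "WHERE r_id = ? ORDER BY timestamp DESC LIMIT ? OFFSET ?"
)

def latest_messages(db, room_id, limit=20, offset=0):
    # Newest messages of one room, paged with LIMIT/OFFSET.
    return db.execute(QUERY, (room_id, limit, offset)).fetchall()

def query_plan(db, room_id):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = db.execute("EXPLAIN QUERY PLAN " + QUERY, (room_id, 20, 0)).fetchall()
    return [row[-1] for row in rows]
```

Printing query_plan(db, 1) should mention IN_messages_room_time, which is a quick way to confirm that the select is served from the index rather than a full table scan.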
If you still have performance problems (you probably won't), just add a 1-3 second in-memory cache (using memcache) and fetch the messages from the DB only once every 1-3 seconds.
Also look at Apollo Clark's answer: https://stackoverflow.com/a/8673165/436932 to avoid storing a huge amount of unnecessary old messages: you can move them to a MyISAM archive table or simply delete them.
Look into creating a "transaction table" for storing the messages. Basically, you need to decide, do I really want to log all of the messages ever posted to the room, or just the message posted this past month / week / day / hour. If you really want to have a history of every message ever written, then you would create two databases. If you don't want to keep a history of every message, then you just need one table.
Using a transaction table, here's how it would flow:
user enters chat room
user types a message, which is saved to the transaction table.
every 500msec or 3sec, every user in the room would query the transaction table to get the latest updates from the past 500msec or 3sec
SELECT * FROM message_transactions WHERE timestamp > 123456789
a CRON job runs every 5 min or 1 hour, and deletes all entries older than 5 min or however long you want the history to be.
Be sure to synchronize and round the time that each user queries the transaction table, so that the MySQL query result caching will kick in. For example, round the timestamp to once every 1sec or every 500msec.
What'll happen now is the users only get the newest messages, and your database won't explode in size over time, or slow down. Doing this, you'll need to cache the history of messages on the client-side in JS.
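The flow described above could look roughly like this. It is a sketch under assumptions: SQLite replaces MySQL, the column layout of message_transactions is guessed from the question's messages table, a room filter is added to the polling query, and purge() is what the cron job would call.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE message_transactions (
            id INTEGER PRIMARY KEY,
            r_id INTEGER NOT NULL,
            u_id INTEGER NOT NULL,
            message TEXT NOT NULL,
            timestamp INTEGER NOT NULL     -- Unix time in seconds
        )
    """)
    return db

def post_message(db, r_id, u_id, message, now):
    db.execute(
        "INSERT INTO message_transactions (r_id, u_id, message, timestamp) "
        "VALUES (?, ?, ?, ?)",
        (r_id, u_id, message, now),
    )

def poll(db, r_id, since):
    # Each client asks only for what arrived after its previous poll.
    return db.execute(
        "SELECT u_id, message, timestamp FROM message_transactions "
        "WHERE r_id = ? AND timestamp > ? ORDER BY timestamp",
        (r_id, since),
    ).fetchall()

def purge(db, now, max_age):
    # Periodic cleanup: drop everything older than max_age seconds.
    cur = db.execute(
        "DELETE FROM message_transactions WHERE timestamp < ?",
        (now - max_age,),
    )
    return cur.rowcount  # number of rows removed
```

Because purge() keeps the table small, poll() stays cheap even without much tuning; the client is responsible for keeping older messages in its own history.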
On the flip side, you could just get a PHP to IRC library, and call it a day. Also, if you're curious about it, look into how Facebook implements their AJAX-based chat system.
To speed up your database, have a look at indexing your tables: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
In your case I assume that you'd be SELECTing messages by r_id while doing a JOIN on the user table through u_id. I would index the r_id and u_id columns. I am by no means an expert on this subject as I've only done "what works" for my own projects. I don't understand every pro and con of indexing, just that indexing those columns that are typically used as, well, indexes, speeds things up. Google "mysql index tutorial", you'll find plenty more information.
Don't go nuts and index every column, you'll slow down your INSERTs and UPDATEs.
I also suggest that you purge the chat logs every few days / weeks, or move them to another server for archival purposes if that's what you want / need to do.
You could potentially use memcached to hold recent chat messages in memory and do your database writes in bulk.
Using memcached as a database buffer for chat messages
What you can do is:
Whenever a user sends a message, you save it to a room-specific cache with the timestamp of when the message came in, while saving it to the database at the same time. When a client requests new messages and the user is not new to the chat room, you check the last time the user was served and load the newer messages from the cache. If the user is new, you serve them from the database.
To improve scalability in this scenario, set an expiration on the cached messages so that they expire after that time, or implement an async method that deletes old messages based on their timestamp.
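The per-room cache with expiry can be illustrated with a plain in-process structure. This is not memcached and uses no memcached client API; the RoomCache class and its methods are hypothetical and only show the bookkeeping (append with a timestamp, serve everything newer than the last served time, drop expired entries).

```python
from collections import defaultdict, deque

class RoomCache:
    """In-process stand-in for a per-room message cache with a time-to-live."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        # room_id -> deque of (timestamp, user_id, text), oldest first
        self.rooms = defaultdict(deque)

    def add(self, room_id, user_id, text, now):
        # Called when a message arrives (the database write happens elsewhere).
        self.rooms[room_id].append((now, user_id, text))
        self._expire(room_id, now)

    def since(self, room_id, last_served, now):
        # Messages the client has not been served yet.
        self._expire(room_id, now)
        return [m for m in self.rooms[room_id] if m[0] > last_served]

    def _expire(self, room_id, now):
        # Drop messages whose age has reached the time-to-live.
        queue = self.rooms[room_id]
        while queue and queue[0][0] <= now - self.ttl:
            queue.popleft()
```

A user whose last served time is older than the time-to-live (or who has just joined) cannot be served from this cache alone and would fall back to the database, as described above.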
I'm working on an app in JavaScript, jQuery, PHP & MySQL that consists of ~100 lessons. I am trying to think of an efficient way to store the status of each user's progress through the lessons, without having to query the MySQL database too much.
Right now, I am thinking the easiest implementation is to create a table for each user, and then store each lesson's status in that table. The only problem with that is if I add new lessons, I would have to update every user's table.
The second implementation I considered would be to store each lesson as a table, and record the user ID for each user that completed that lesson there - but then generating a status report (what lessons a user completed, how well they did, etc.) would mean pulling data from 100 tables.
Is there an obvious solution I am missing? How would you store your users' progress through 100 lessons, so that it's quick and simple to generate a status report showing their progress?
Cheers!
The table structure I would recommend would be to keep a single table with non-unique fields userid and lessonid, as well as the relevant progress fields. When you want the progress of user x on lesson y, you would do this:
SELECT * FROM lessonProgress WHERE userid=x AND lessonid=y LIMIT 1;
You don't need to worry about performance unless you see that it's actually an issue. Having a table for each user or a table for each lesson are bad solutions because a database isn't meant to have a dynamic number of tables.
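A minimal sketch of the single-table layout, assuming SQLite in place of MySQL. The status and score columns are invented placeholders for "the relevant progress fields", and the composite primary key on (userid, lessonid) is an added assumption: each column is non-unique on its own, but a user has at most one row per lesson.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE lessonProgress (
            userid INTEGER NOT NULL,
            lessonid INTEGER NOT NULL,
            status TEXT NOT NULL,            -- hypothetical progress field
            score INTEGER,                   -- hypothetical progress field
            PRIMARY KEY (userid, lessonid)   -- one row per user and lesson
        )
    """)
    return db

def record_progress(db, userid, lessonid, status, score=None):
    # Create the row on first contact, overwrite it on later updates.
    db.execute(
        "INSERT OR REPLACE INTO lessonProgress (userid, lessonid, status, score) "
        "VALUES (?, ?, ?, ?)",
        (userid, lessonid, status, score),
    )

def status_report(db, userid):
    # The whole report for one user is a single indexed query.
    return db.execute(
        "SELECT lessonid, status, score FROM lessonProgress "
        "WHERE userid = ? ORDER BY lessonid",
        (userid,),
    ).fetchall()
```

Adding lesson 101 later needs no schema change: rows for it simply start to appear as users reach it.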
If reporting is restricted to one user at a time - that is, when generating a report, it's for a specific user and not a large clump of users - why not consider javascript object notation stored in a file? If extensibility is key, it would make it a simple matter.
Obviously, if you're going to run reports against an arbitrarily large number of users at once, separate data files would become inefficient.
Discarding the efficiency argument, json would also give you a very human-readable and interchangeable format.
Lastly, if the security of the report output isn't a big sticking point, you'd also gain the ability to easily offload view rendering onto the client.
Use relations between 2 tables. One for users with user specific columns like ID, username, email, w/e else you want to store about them.
Then a status table that has a UID foreign key. ID UID Status etc.
It's good to keep datecreated and dateupdated on tables as well.
Then just join the tables ON status.UID = users.ID
A good option would be to create one table with user_ID as the primary key and a status (int) column, where each row of the table represents a user. Accessing a user's progress would be fast and simple since you have an index on the user IDs.
This way, adding new lessons would not require changing the DB.
I am thinking of storing online users in memcached.
First I thought about having array of key => value pairs where key will be user id and value timestamp of last access.
My problem is that it will be quite a large array when there are many users currently online (and there will be).
As memcached is not built to store large data, how would you solve it? What is the best practice?
Thanks for your input!
The problem with this approach is memcache is only queriable if you know the key in advance. This means you would have to keep the entire online user list under a single known key. Each time a user came online or went offline, it would become necessary to read the list, adjust it, and rewrite it. There is serious potential for a race condition there so you would have to use the check-and-set locking mechanism.
I don't think you should do this. Consider keeping a database table of recent user hits:
user_id: int
last_seen: timestamp
Index on last_seen and user_id. Query friends online using:
SELECT user_id FROM online WHERE user_id IN (...) AND last_seen > NOW() - INTERVAL 10 MINUTE;
Periodically go through the table and batch remove old timestamp rows.
When your site becomes big you can shard this table on the user_id.
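The table-based approach might be sketched like this, with SQLite standing in for MySQL and integer Unix timestamps. The function names and the 600-second window are assumptions chosen to match the "10 minutes" in the query above; user_id is made the primary key so each user has one row.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE online (
            user_id INTEGER PRIMARY KEY,
            last_seen INTEGER NOT NULL        -- Unix time of the last page hit
        );
        CREATE INDEX idx_online_last_seen ON online (last_seen, user_id);
    """)
    return db

def touch(db, user_id, now):
    # Called on every page hit: insert the user or refresh last_seen.
    db.execute(
        "INSERT OR REPLACE INTO online (user_id, last_seen) VALUES (?, ?)",
        (user_id, now),
    )

def friends_online(db, friend_ids, now, window=600):
    # Which of the given users were seen within the last `window` seconds?
    if not friend_ids:
        return []
    placeholders = ",".join("?" * len(friend_ids))
    rows = db.execute(
        f"SELECT user_id FROM online WHERE user_id IN ({placeholders}) "
        "AND last_seen > ? ORDER BY user_id",
        (*friend_ids, now - window),
    )
    return [row[0] for row in rows]

def purge(db, now, window=600):
    # Periodic batch cleanup of rows that no longer count as online.
    return db.execute(
        "DELETE FROM online WHERE last_seen <= ?", (now - window,)
    ).rowcount
```

The IN list is built from placeholders only, so user ids are always passed as bound parameters rather than interpolated into the SQL text.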
EDIT:
Actually, you can do this if you don't need to query all the users who are currently online, but just need to know if certain users are online.
When a user hits a page,
memcache.set("online."+user_id, true, 600 /* 10 mins */);
To see if a user is online,
online = memcache.get("online."+user_id);
There should also be a way to multikey query memcache, look that up. Some strange things could happen if you add a memcache server, but if you use this information for putting an "online" marker next to user names that shouldn't be a big deal.
I have user discussion forums that I coded in PHP/MySQL. I want to know how the big-name forums show you which topics have new posts in them (usually by changing an icon image next to the thread) while using hardly any resources?
The simplest way is to track the last time someone was logged in. When they come back to visit, everything which has been updated since then is obviously "new".
This has some problems though, since logging out effectively marks all items as read.
The only other way I could think to do it would be to maintain a table containing all the threads and the latest post in that thread which each user has seen.
user_id thread_id post_id
1 5 15
1 6 19
With that information, if there is a post in thread #5 which has an ID larger than 15, then you know there's unread posts there. Update this table only with the post_id of the latest post on that page. This means if there's 3 pages of new posts, and the user only views the first, it'll still know there's unread posts.
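As a sketch of this last-seen-post table, assuming SQLite in place of MySQL and hypothetical names (posts, thread_reads, mark_read, threads_with_unread). A thread the user has never opened is treated as unread here, which is one possible policy rather than the only one.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            thread_id INTEGER NOT NULL
        );
        CREATE TABLE thread_reads (
            user_id INTEGER NOT NULL,
            thread_id INTEGER NOT NULL,
            post_id INTEGER NOT NULL,          -- latest post the user has seen
            PRIMARY KEY (user_id, thread_id)
        );
    """)
    return db

def mark_read(db, user_id, thread_id, last_post_on_page):
    # Store the id of the last post on the page the user just viewed.
    db.execute(
        "INSERT OR REPLACE INTO thread_reads (user_id, thread_id, post_id) "
        "VALUES (?, ?, ?)",
        (user_id, thread_id, last_post_on_page),
    )

def threads_with_unread(db, user_id):
    # A thread is unread if its newest post id exceeds the stored one
    # (or if there is no stored row at all).
    rows = db.execute(
        """SELECT p.thread_id
           FROM posts p
           LEFT JOIN thread_reads r
                  ON r.thread_id = p.thread_id AND r.user_id = ?
           GROUP BY p.thread_id
           HAVING MAX(p.id) > COALESCE(MAX(r.post_id), 0)
           ORDER BY p.thread_id""",
        (user_id,),
    )
    return [row[0] for row in rows]
```

Using the sample data above, a user who has seen post 15 in thread 5 gets thread 5 flagged again as soon as a post with a larger id is added to it.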
As nickf said above, except that the threads the user has actually visited are tracked, so anything the user hasn't visited is considered new for that visitor. For finer-grained control, any threads created before the user registered are ignored, and possibly any threads not visited within a period of time are ignored too. This prevents every unvisited thread from becoming a new thread for them.
Of course there are many ways to skin a cat, and depending on what the forum creators wanted, the above can be changed to suit.
DC
You could log the last time they selected that topic and then see if a post has a later time-stamp than their last "click" on the thread.
You could make a special table in your database with columns like USER_ID and THREAD_ID and with appropriate constraints to your USER and THREAD tables and a primary key containing USER and THREAD IDs.
Now when somebody opens a thread, you just insert that USER-THREAD-PAIR into that special table.
In your thread listings you can now simply outer-join that table onto whatever query you use there. If the joined columns are NULL for a particular row, that thread is unread. This will enable lists like:
All Threads with "unread" marker
All unread threads
Threads read by user XY
If you add a date column to this table, you can do even more interesting stuff.
Just keep an eye on your keys and indexes to prevent too heavy negative performance impacts. Try to read from the USER-THREAD-table only by joining it into your existing queries. That will work much faster than executing individual queries all the time.
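The outer join described above can be sketched as follows, with SQLite instead of MySQL. The threads and thread_reads tables and the function names are hypothetical; the point is that the "unread" flag comes out of the listing query itself instead of one extra query per thread.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE threads (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        CREATE TABLE thread_reads (
            user_id INTEGER NOT NULL,
            thread_id INTEGER NOT NULL REFERENCES threads (id),
            PRIMARY KEY (user_id, thread_id)
        );
    """)
    return db

def open_thread(db, user_id, thread_id):
    # Record the user-thread pair once; repeated visits change nothing.
    db.execute(
        "INSERT OR IGNORE INTO thread_reads (user_id, thread_id) VALUES (?, ?)",
        (user_id, thread_id),
    )

def list_threads(db, user_id):
    # LEFT JOIN leaves r.user_id NULL for threads this user never opened.
    return db.execute(
        """SELECT t.id, t.title, r.user_id IS NULL AS unread
           FROM threads t
           LEFT JOIN thread_reads r
                  ON r.thread_id = t.id AND r.user_id = ?
           ORDER BY t.id""",
        (user_id,),
    ).fetchall()
```

Filtering the same query with WHERE r.user_id IS NULL would give the "all unread threads" list, and WHERE r.user_id IS NOT NULL the "threads read by user XY" list.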
You could have a table that gets an insert whenever a thread gets read, if the user reading it hasn't already. Then when someone adds to the thread you can delete all entries in the table for that thread, thus making it unread for all users.
The table structure would be something like
forum_id thread_id user_id
With the optional extra has_read_id for your primary key, with the other fields making a composite key.