Database schema for a live chat project with rooms

For my university project, I'm developing a dynamic live chat website with rooms, user registration, etc. I've got the entire system planned out bar one aspect. The rooms. I'm confused as to how to design the database for rooms.
To put it in perspective, a room is created by a user who is then an operator of that room. Users can join the room and talk within it. The system has to be scalable, accounting for hundreds of thousands if not millions of messages being sent a day.
Originally, I was going to create one table in my database called messages, with fields like this:
| r_id | u_id | message | timestamp |
r_id and u_id would be foreign keys to the room ID and user ID respectively. Doing it this way means I would need to insert a new record whenever a user sends a message, and periodically run a SELECT statement for every client (say every 3 seconds or so) to get the recent messages. My worry with this is because the table will be huge, running these statements might create a lot of overhead and take a long time.
The other way I thought of implementing this would be to create a new database table for every room. Say a user creates 3 rooms called General, Programming and Gaming, the database tables would look like: room_general, room_programming, room_gaming, each with fields like:
| u_id | message | timestamp |
This would drastically cut down on the amount of queries for each table, but may introduce problems when I come to program it.
So, I'm stuck on what the best way to do this is. If it makes a difference, the technology I'm using will be MySQL with PHP, and a whole lotta AJAX.
Thanks for any help!

Creating a table per room is a bad idea: it is hard to implement and hard to maintain.
Don't worry about the performance of the SELECTs, because they will be very simple:
SELECT * FROM messages WHERE r_id=R ORDER BY timestamp DESC LIMIT X,Y
Just make sure (r_id, timestamp) is indexed together, in this order, so that the SELECT can use the index:
ALTER TABLE `messages` ADD KEY `IN_messages_room_time` (`r_id`, `timestamp`);
If you still have performance problems (you probably won't), add a 1-3 second in-memory cache (using memcache) so that messages are fetched from the DB only once every 1-3 seconds.
Also look at Apollo Clark's answer: https://stackoverflow.com/a/8673165/436932 to avoid storing a huge number of unnecessary old messages: you can move them into a MyISAM archive table or simply delete them.
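As a rough illustration of the single-table approach, here is a minimal sketch in Python. It assumes SQLite (via the standard sqlite3 module) as a stand-in for MySQL so that it runs anywhere; the helper name latest_messages and the sample rows are made up for the example.

```python
import sqlite3

# SQLite stands in for MySQL here; the table and the composite index
# mirror the ones discussed in the answer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "r_id INTEGER, u_id INTEGER, message TEXT, timestamp INTEGER)"
)
conn.execute(
    "CREATE INDEX IN_messages_room_time ON messages (r_id, timestamp)"
)

# Hypothetical sample data: three messages in room 1, one in room 2.
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [
        (1, 10, "hello", 100),
        (2, 11, "other room", 101),
        (1, 12, "hi there", 102),
        (1, 10, "how are you?", 103),
    ],
)


def latest_messages(conn, room_id, limit=50):
    """Return the newest `limit` messages of one room, newest first."""
    cur = conn.execute(
        "SELECT u_id, message, timestamp FROM messages "
        "WHERE r_id = ? ORDER BY timestamp DESC LIMIT ?",
        (room_id, limit),
    )
    return cur.fetchall()


print(latest_messages(conn, 1, limit=2))
```

Because the index starts with r_id and continues with timestamp, the database can jump to one room and read its rows already in time order, which is the point the answer makes about column order.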

Look into creating a "transaction table" for storing the messages. Basically, you need to decide, do I really want to log all of the messages ever posted to the room, or just the message posted this past month / week / day / hour. If you really want to have a history of every message ever written, then you would create two databases. If you don't want to keep a history of every message, then you just need one table.
Using a transaction table, here's how it would flow:
1. A user enters the chat room.
2. The user types a message, which is saved to the transaction table.
3. Every 500 ms or 3 s, every user in the room queries the transaction table for the updates from the past 500 ms or 3 s:
   SELECT * FROM message_transactions WHERE timestamp > 123456789
4. A cron job runs every 5 min or every hour and deletes all entries older than 5 min, or however long you want the history to be.
Be sure to synchronize and round the time that each user queries the transaction table, so that the MySQL query result caching will kick in. For example, round the timestamp to once every 1sec or every 500msec.
What'll happen now is the users only get the newest messages, and your database won't explode in size over time, or slow down. Doing this, you'll need to cache the history of messages on the client-side in JS.
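The flow above could be sketched roughly as follows. This is only an illustration under assumptions: SQLite stands in for MySQL, timestamps are plain integers in seconds, an r_id column is added so that polling can be limited to one room, and the function names post_message, poll and purge are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message_transactions ("
    "r_id INTEGER, u_id INTEGER, message TEXT, timestamp INTEGER)"
)
conn.execute("CREATE INDEX idx_tx_time ON message_transactions (timestamp)")


def post_message(conn, room_id, user_id, text, now):
    """Save a new message to the transaction table."""
    conn.execute(
        "INSERT INTO message_transactions VALUES (?, ?, ?, ?)",
        (room_id, user_id, text, now),
    )


def poll(conn, room_id, last_seen):
    """Return the messages of a room that are newer than `last_seen`."""
    cur = conn.execute(
        "SELECT u_id, message, timestamp FROM message_transactions "
        "WHERE r_id = ? AND timestamp > ? ORDER BY timestamp",
        (room_id, last_seen),
    )
    return cur.fetchall()


def purge(conn, now, max_age):
    """Delete entries older than `max_age` seconds (the cron job's task)."""
    cur = conn.execute(
        "DELETE FROM message_transactions WHERE timestamp < ?",
        (now - max_age,),
    )
    return cur.rowcount  # number of rows removed


post_message(conn, 1, 10, "hi", 100)
post_message(conn, 1, 11, "hello", 105)
post_message(conn, 1, 10, "still here?", 400)
print(poll(conn, 1, last_seen=100))
```

Each client would remember the timestamp of the last message it received and pass it as last_seen on the next poll, while purge would be run on a schedule.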
On the flip side, you could just get a PHP to IRC library, and call it a day. Also, if you're curious about it, look into how Facebook implements their AJAX-based chat system.

To speed up your database, have a look at indexing your tables: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
In your case I assume that you'd be SELECTing messages by r_id while doing a JOIN on the user table through u_id. I would index the r_id and u_id columns. I am by no means an expert on this subject as I've only done "what works" for my own projects. I don't understand every pro and con of indexing, just that indexing those columns that are typically used as, well, indexes, speeds things up. Google "mysql index tutorial", you'll find plenty more information.
Don't go nuts and index every column, you'll slow down your INSERTs and UPDATEs.
I also suggest that you purge the chat logs every few days / weeks, or move them to another server for archival purposes if that's what you want / need to do.

You could potentially use memcached to hold recent chat messages in memory and do your database writes in bulk.
Using memcached as a database buffer for chat messages

What you can do is:
Whenever a user sends a message, save it to a room-specific cache together with the timestamp of when it came in, and save it to the database at the same time. When a client requests new messages and the user is not new to the chat room, check when the server last served that user and load the newer messages from the cache. If the user is new, serve them from the database instead.
To improve scalability in this scenario, set an expiration time on the cached messages so that they expire after that time. Alternatively, implement an async method that deletes old messages based on their timestamp.
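The per-room cache with expiration might look something like this. It is a sketch only: a plain in-process Python structure stands in for memcached, messages are assumed to arrive in timestamp order, and the class name RoomCache and its methods are invented for the example.

```python
from collections import defaultdict, deque


class RoomCache:
    """Keeps recent messages per room and drops them after `ttl_seconds`."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        # room_id -> deque of (timestamp, user_id, text), oldest first
        self.rooms = defaultdict(deque)

    def add(self, room_id, user_id, text, now):
        """Cache a message; the caller also writes it to the database."""
        self.rooms[room_id].append((now, user_id, text))
        self.expire(room_id, now)

    def expire(self, room_id, now):
        """Remove messages whose age has reached the time-to-live."""
        queue = self.rooms[room_id]
        while queue and queue[0][0] <= now - self.ttl:
            queue.popleft()

    def since(self, room_id, last_served, now):
        """Return cached messages newer than the user's last served time."""
        self.expire(room_id, now)
        return [m for m in self.rooms[room_id] if m[0] > last_served]


cache = RoomCache(ttl_seconds=60)
cache.add("general", 10, "hi", now=100)
cache.add("general", 11, "hello", now=130)
print(cache.since("general", last_served=100, now=140))
```

A user who has just joined has no last served time, so in that case the history would be loaded from the database rather than from the cache.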

Related

Notification and PHP+MySQL design

I'm making a website that has a system of posts and replies.
When someone replies, I'd like to send a notification to everyone who has previously replied to (or been involved in) the post.
My idea is to create a table named Notification containing a message field and a seen (seen/unread) field. Each time someone replies, I INSERT records into the Notification table.
It seems easy and intuitive, but consider what happens when lots of people are involved. For example, when the 31st user replies, the 30 people who have already replied will each receive a notification, which creates 30 rows. The 32nd user's reply will create 31 more. The total number of rows then becomes 30+31=61.
My questions are:
1. Is that a good way to handle a notification system?
2. If so, how should I deal with duplicate notifications (the user hasn't seen the first one yet, but there is already a new reply)?
3. As above, will this create a huge server load?
Thank you so much.

I built a similar system. Here is my experience:
My notification table looks like: id (int) | user_id (int) | post_id (int) | last_visited (datetime).
user_id + post_id is a unique composite index.
When a user opens the page, I look for an entry (user_id + post_id) in the database. If I find it, I update the last_visited field; if I don't, I create a new row.
When I need to list messages for a notification, I just query all messages that were created after the last_visited time.
I also have a cron script that cleans up notifications for closed posts or banned users.
As for your questions:
1 and 2: You have to find a balance between the amount of data that will be stored and site performance. If you don't need to store all this data, you can follow my way. If this data is needed, your way is better.
3: It depends on the number of visitors and on other functionality. But here is some advice. You must use indexes on the MySQL table for better performance. You should also think about a cron script that removes useless notifications. If you have a huge number of visitors, more than 700k per day, you should think about MongoDB or another high-performance NoSQL database.
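The last_visited approach described in this answer could be sketched as below. The assumptions are: SQLite stands in for MySQL, timestamps are plain integers, and the replies table, its columns, and the function names mark_visited and unread_replies are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE replies (
        id INTEGER PRIMARY KEY, post_id INTEGER, author_id INTEGER,
        body TEXT, created INTEGER
    );
    CREATE TABLE notification (
        id INTEGER PRIMARY KEY, user_id INTEGER, post_id INTEGER,
        last_visited INTEGER,
        UNIQUE (user_id, post_id)   -- one row per user and post
    );
    """
)


def mark_visited(conn, user_id, post_id, now):
    """Update the user's row for this post, or create it on first visit."""
    cur = conn.execute(
        "UPDATE notification SET last_visited = ? "
        "WHERE user_id = ? AND post_id = ?",
        (now, user_id, post_id),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO notification (user_id, post_id, last_visited) "
            "VALUES (?, ?, ?)",
            (user_id, post_id, now),
        )


def unread_replies(conn, user_id):
    """Replies by other users created after the user's last visit."""
    cur = conn.execute(
        "SELECT r.post_id, r.body FROM replies r "
        "JOIN notification n ON n.post_id = r.post_id "
        "WHERE n.user_id = ? AND r.author_id != ? "
        "AND r.created > n.last_visited ORDER BY r.created",
        (user_id, user_id),
    )
    return cur.fetchall()


mark_visited(conn, 1, 7, now=100)
conn.execute(
    "INSERT INTO replies (post_id, author_id, body, created) "
    "VALUES (7, 2, 'new reply', 150)"
)
print(unread_replies(conn, 1))
```

With this layout the table holds one row per user and post no matter how many replies arrive. In MySQL, the update-or-insert step can be written as a single INSERT ... ON DUPLICATE KEY UPDATE statement that relies on the unique composite index.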

MySQL InnoDB insert and select lock

I created a ticketing system that in its simplest form just records a user joining the queue, and prints out a ticket with the queue number.
When the user presses for a ticket, the following happens in the database
INSERT details INTO All_Transactions_Table
SELECT COUNT(*) as ticketNum FROM All_Transactions_Table WHERE date is TODAY
This serves me well in most cases. However, I recently started to see some duplicate ticket numbers. I can't seem to replicate the issue, even after running the web service multiple times myself.
My guess of how it could happen is that in some scenarios the INSERT happened only AFTER the SELECT COUNT. But this is an InnoDB table and I am not using INSERT DELAYED. Does InnoDB have any of such implicit mechanisms?

I think your problem is that you have a race condition. Imagine that you have two people that come in to get tickets. Here's person one:
INSERT details INTO All_Transactions_Table
Then, before the SELECT COUNT(*) can happen, person two comes along and does:
INSERT details INTO All_Transactions_Table
Both users then run SELECT COUNT(*), both see the same count, and so both get the same ticket number. This can be very hard to replicate using your existing code because it depends on the exact scheduling of threads within MySQL, which is totally beyond your control.
The best solution to this would be to use some kind of AUTO_INCREMENT column to provide the ticket number, but failing that, you can probably use transactions to achieve what you want:
START TRANSACTION
SELECT COUNT(*) + 1 as ticketNum FROM All_Transactions_Table WHERE date is TODAY FOR UPDATE
INSERT details INTO All_Transactions_Table
COMMIT
However, whether or not this works will depend on what transaction isolation level you have set, and it will not be very efficient.
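To illustrate the idea of counting and inserting inside one transaction, here is a minimal sketch. It is not MySQL code: SQLite stands in for InnoDB, and BEGIN IMMEDIATE (which takes SQLite's write lock before the count is read) plays the role that FOR UPDATE plays above. The table layout, the unique constraint used as a safety net, and the function name issue_ticket are assumptions made for the example.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction boundaries below are controlled explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE all_transactions ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "day TEXT, ticket_num INTEGER, details TEXT, "
    "UNIQUE (day, ticket_num))"  # safety net: duplicates become errors
)


def issue_ticket(conn, day, details):
    """Count today's tickets and insert the next one in one transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM all_transactions WHERE day = ?", (day,)
        ).fetchone()
        ticket_num = count + 1
        conn.execute(
            "INSERT INTO all_transactions (day, ticket_num, details) "
            "VALUES (?, ?, ?)",
            (day, ticket_num, details),
        )
        conn.execute("COMMIT")
        return ticket_num
    except Exception:
        conn.execute("ROLLBACK")
        raise


first = issue_ticket(conn, "2012-03-01", "walk-in")
second = issue_ticket(conn, "2012-03-01", "walk-in")
print(first, second)
```

The unique constraint on (day, ticket_num) means that if two requests ever did compute the same number, the second insert would fail instead of silently producing a duplicate ticket.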

Saving stats data in MySQL Database for a line graph

I'm trying to make a pretty line graph to represent the rate of users registering on my site. What would be the best way to fetch and store my site's stats data in a MySQL database?
I was thinking of using a cron job that would fetch the total number of users from a separate table and subtract the previous total to get the number of newly registered users. This value would be used for the line graph. To me, it seems like too much, especially if I want to get the rate of users on a per-minute basis. Is there a more efficient way of doing this? Should I store each day in a separate row?

I'd suggest you include the time that users registered in your users table. You can then perform whatever analysis you like at a later date.

You can also create a field in the table that is updated by a cron job (every minute, hour, day and so on); the field stores the status of the user: new, old, or whatever other statuses you have.
As eggyal said, you should also have a field with the user's registration date.
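Once the registration time is stored with each user, the data points for the graph can be derived with a grouped query instead of a cron job that tracks totals. A minimal sketch, assuming SQLite as a stand-in for MySQL and a hypothetical registered_at column holding text timestamps:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, username TEXT, registered_at TEXT)"
)
# Hypothetical sample users.
conn.executemany(
    "INSERT INTO users (username, registered_at) VALUES (?, ?)",
    [
        ("ann", "2012-05-01 09:15:00"),
        ("bob", "2012-05-01 17:40:00"),
        ("cat", "2012-05-02 08:05:00"),
    ],
)


def registrations_per_day(conn):
    """Return (day, number of new users) pairs, ordered by day."""
    cur = conn.execute(
        "SELECT date(registered_at) AS day, COUNT(*) "
        "FROM users GROUP BY day ORDER BY day"
    )
    return cur.fetchall()


print(registrations_per_day(conn))
```

The same raw column can later be grouped by hour or by minute without changing what is stored, which is the flexibility the first answer points to.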

Storing User Login Time in a User Table

In a table of Users, I want to keep track of the time of day each user logs in as running totals. For example
UserID midnightTo6am 6amToNoon noonTo6pm 6pmToMidnight
User1 3 2 7 1
User2 4 9 1 8
Note that this is part of a larger table that contains more information about a user, such as address and gender, hair color, etc, etc.
In this example, what is the best way to store this data? Should it be part of the users table, despite knowing that not every user will log in at every time (a user may never log in between 6am and noon)? Or is this table a 1NF failure because of repeating columns that should be moved to a separate table?
If stored as part of the Users Table, there may be empty cells that never get populated with data because the user never logs in at that time.
If this data is a 1NF failure and the data is to be put in a separate table, how would I ensure that a +1 for a certain time goes smoothly? Would I search for the user in the separate table to see if they have logged in at that time before and +1? Or add a column to that table if it is their first time logging in during that time period?
Any clarifications or other solutions are welcome!

I would recommend storing the login events either in a file based log or in a simple table with just the userid and DATETIME of the login.
Once a day, or however often you need to report on the data you illustrated in your question, aggregate that data up into a table in the shape that you want. This way you're not throwing away any raw data and can always reaggregate for different periods, by hour, etc at a later date.
Addition: I suspect that the fastest way of deriving the aggregated data would be to run a range query for each of your aggregation periods, so that you're searching for (e.g.) login dates in the range 2011-12-25 00:00:00 - 2011-12-25 03:00:00. If you go with that approach, an index on (datetime, user_id) would work well. It seems counter-intuitive, as you want to do things on a user-centric basis, but the leading DATETIME column of the index would allow the rows to be found easily, and the trailing user_id column would then allow fast grouping.
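Aggregating a raw login log into the four time-of-day totals from the question could look roughly like this. Note that this sketch uses a single GROUP BY over the hour of each login rather than the per-period range queries suggested above; it is just a compact way to show the reshaping. SQLite stands in for MySQL, and the logins table, its columns, and the function name login_buckets are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user_id INTEGER, login_at TEXT)")
conn.execute("CREATE INDEX idx_logins_time_user ON logins (login_at, user_id)")
# Hypothetical raw login events.
conn.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [
        (1, "2011-12-24 02:00:00"),
        (1, "2011-12-24 07:30:00"),
        (1, "2011-12-24 13:00:00"),
        (1, "2011-12-25 13:59:00"),
        (2, "2011-12-24 23:10:00"),
    ],
)


def login_buckets(conn):
    """Per user: logins in 00-06, 06-12, 12-18 and 18-24 o'clock."""
    hour = "CAST(strftime('%H', login_at) AS INTEGER)"
    sql = (
        "SELECT user_id, "
        f"SUM({hour} < 6), "
        f"SUM({hour} BETWEEN 6 AND 11), "
        f"SUM({hour} BETWEEN 12 AND 17), "
        f"SUM({hour} >= 18) "
        "FROM logins GROUP BY user_id ORDER BY user_id"
    )
    return conn.execute(sql).fetchall()


print(login_buckets(conn))
```

Because the raw events are kept, the same log can be re-aggregated later into hourly or other buckets, as the answer recommends.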

A few things. Firstly, this is not a violation of 1NF. Doing it as 4 columns may in fact be acceptable. Secondly, if you do go with this design, you should not use nulls; use zero instead (with the possible exception of existing records). Thirdly, whether you should use this design or split it into another table (or two) depends on your purpose and usage. If your standard use of the table does not make use of this information, it should go into another table with a 1-to-1 relationship. If you may need to increase the granularity of the login times, then you should use another table. Finally, if you do split this off into another table with a timestamp, give some consideration to privacy.

What is an elegant / efficient way of storing the status of 100 lessons for multiple users?

I'm working on an app in JavaScript, jQuery, PHP & MySQL that consists of ~100 lessons. I am trying to think of an efficient way to store the status of each user's progress through the lessons, without having to query the MySQL database too much.
Right now, I am thinking the easiest implementation is to create a table for each user, and then store each lesson's status in that table. The only problem with that is if I add new lessons, I would have to update every user's table.
The second implementation I considered would be to store each lesson as a table, and record the user ID for each user that completed that lesson there - but then generating a status report (what lessons a user completed, how well they did, etc.) would mean pulling data from 100 tables.
Is there an obvious solution I am missing? How would you store your users' progress through 100 lessons so that it's quick and simple to generate a status report showing their progress?
Cheers!

The table structure I would recommend would be to keep a single table with non-unique fields userid and lessonid, as well as the relevant progress fields. When you want the progress of user x on lesson y, you would do this:
SELECT * FROM lessonProgress WHERE userid=x AND lessonid=y LIMIT 1;
You don't need to worry about performance unless you see that it's actually an issue. Having a table for each user or a table for each lesson are bad solutions because there aren't meant to be a dynamic number of tables in a database.
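A minimal sketch of the single-table layout follows. It assumes SQLite as a stand-in for MySQL; the status and score columns, the composite primary key on (userid, lessonid), and the function names set_progress and report are choices made for the example rather than something the answer specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lessonProgress ("
    "userid INTEGER, lessonid INTEGER, status TEXT, score INTEGER, "
    "PRIMARY KEY (userid, lessonid))"  # one row per user and lesson
)


def set_progress(conn, userid, lessonid, status, score):
    """Insert or overwrite a user's progress on one lesson."""
    conn.execute(
        "INSERT OR REPLACE INTO lessonProgress VALUES (?, ?, ?, ?)",
        (userid, lessonid, status, score),
    )


def report(conn, userid):
    """Return all lessons a user has touched, ordered by lesson."""
    cur = conn.execute(
        "SELECT lessonid, status, score FROM lessonProgress "
        "WHERE userid = ? ORDER BY lessonid",
        (userid,),
    )
    return cur.fetchall()


set_progress(conn, 1, 1, "started", 0)
set_progress(conn, 1, 1, "done", 90)
set_progress(conn, 1, 2, "started", 0)
set_progress(conn, 2, 1, "done", 70)
print(report(conn, 1))
```

Adding a new lesson needs no schema change, since a lesson is only a new lessonid value, and a full status report for one user is a single query.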

If reporting is restricted to one user at a time - that is, when generating a report, it's for a specific user and not a large clump of users - why not consider javascript object notation stored in a file? If extensibility is key, it would make it a simple matter.
Obviously, if you're going to run reports against an arbitrarily large number of users at once, separate data files would become inefficient.
Discarding the efficiency argument, json would also give you a very human-readable and interchangeable format.
Lastly, if the security of the report output isn't a big sticking point, you'd also gain the ability to easily offload view rendering onto the client.

Use a relation between 2 tables. One is for users, with user-specific columns like ID, username, email, and whatever else you want to store about them.
Then a status table that has a UID foreign key. ID UID Status etc.
It's good to keep datecreated and dateupdated on tables as well.
Then just join the tables ON status.UID = users.ID

A good option would be to create one table with a user_ID as primary key and a status (int); each row of the table represents a user. Accessing a user's progress would be fast and simple, since you have an index of user IDs.
This way, adding new lessons would not require you to change the DB.
