I am currently developing a simple PHP application where users can send messages to each other. Messages are stored in a SQL database. I'd like to put a count of unread messages in the menu on every page, so that a user can see quickly if they have new messages without checking the inbox periodically.
While this may be an easy problem to solve, I don't know what the best method would be, performance-wise:
1. Do a plain SQL COUNT() of unread messages on every page load (instantly notified of changes, but it may impact performance badly?)
2. Do the same thing, but cache the result for X minutes (this creates an annoying delay).
3. Same as 2., but only update when we read a message or when a message is sent to us (this can use up a lot of RAM / stress the disk, since we create one persistent entry/file per user; we can't store it in $_SESSION because we need to update it when another user sends a message to us).
All my solutions are somewhat server-based, because I'm not very familiar with JS. But if a better solution exists using JavaScript, that's okay too.
Thank you for your help!
I'd suggest a 4th option:
Once a new message has been sent to a user, you update the counter in memcache. You create a simple AJAX application on the client side that sends a request every X seconds. On the server side, you just check whether there are unread messages. On page refresh, you don't need to query the database, since you get the count from memcache extremely fast.
That's what I'd do if I had a bottleneck in the DB (in 90% of cases, the DB is the weakest part of any database-driven application).
That's what we usually do on high-load web sites: we try to avoid any COUNT queries. If we can't, we denormalize the database to store counters right in the appropriate table as yet another column, e.g. if you cannot use memcache, you would store the unread-messages counter as a column on the Users table.
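For illustration, a rough sketch of the memcache-backed counter described above, assuming the ext/memcached extension; the key scheme, server address, and endpoint name are my own and not taken from the answer:

<?php
// Hypothetical helpers called from send.php / the inbox page, plus the AJAX endpoint.
function cache(): Memcached
{
    $m = new Memcached();
    $m->addServer('127.0.0.1', 11211);
    return $m;
}

// Call right after INSERTing a message addressed to $recipientId.
function bumpUnreadCounter(int $recipientId): void
{
    $m = cache();
    $key = 'unread_count:' . $recipientId;
    if ($m->increment($key) === false) {
        // Key not in the cache yet: seed it (could also seed it from a COUNT(*) query).
        $m->set($key, 1);
    }
}

// Call when the recipient opens their inbox.
function resetUnreadCounter(int $recipientId): void
{
    cache()->set('unread_count:' . $recipientId, 0);
}

// unread_count.php - polled by the client every X seconds, returns JSON.
session_start();
header('Content-Type: application/json');
echo json_encode(['unread' => (int) cache()->get('unread_count:' . (int) $_SESSION['user_id'])]);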
I'd go for option three, except I'd add memcached as solution 4.
> Do a plain SQL COUNT() of unread messages on every page load (instantly notified of changes, but it may impact performance badly?)
As long as you have a decent table structure, COUNT() is a pretty fast command. I wouldn't cache this particular command. I'd instead work out the other queries to make sure you're only returning the data you need when showing them a listing. For example, if all you need is an excerpt, I'd make sure to do something like this:
SELECT id, author, msgdate, substring(body, 1, 50) from table where recipient = ?
instead of
SELECT * from table where recipient = ?;
IMHO, it's best to let the client ping the server and send JSON back with the number of unread messages. Counting in MySQL should be fast, so I see no reason not to use it. Just filter the results on the chat session.
For the database part, the best way would be to store a new_message field in your DB table, default it to 1, and set it to 0 when the message has been loaded.
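A minimal sketch of that ping endpoint, assuming a messages table with the new_message flag described above (table and column names are hypothetical):

<?php
// unread_ping.php - the client polls this and gets a JSON count of unread messages.
session_start();

$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->prepare('SELECT COUNT(*) FROM messages WHERE recipient_id = ? AND new_message = 1');
$stmt->execute([(int) $_SESSION['user_id']]);

header('Content-Type: application/json');
echo json_encode(['unread' => (int) $stmt->fetchColumn()]);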
Is there any method/way by which we can come to know when any change occurs in the desired table?
I have implemented one solution that checks the DB every 30 seconds: if any change occurs in the chats table, it refreshes my chats listing. It wastes a lot of performance and slows down the site.
Is there any way for our action listener to be called only when a row is inserted into the table?
Any idea?
You may create a notifications table (which will always contain just the unread messages from the last 2 hours or so) and then create a trigger [syntax] that creates a new notification each time a change occurs:
CREATE TRIGGER create_notification AFTER INSERT ON chats
FOR EACH ROW INSERT INTO notifications (...) VALUES( NEW.col1, NEW.col2)...
Then you'll end up with a much smaller (and faster) database table.
There are some alternatives, like writing a socket server, but I think that would be less resource-efficient than polling the database, plus I have had bad experiences running PHP scripts for a long time.
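For example, the client could then poll only the small notifications table instead of the full chats table; a rough sketch, with hypothetical column names:

<?php
// poll_notifications.php - returns any notifications newer than the id the client last saw.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$lastId = isset($_GET['last_id']) ? (int) $_GET['last_id'] : 0;

$stmt = $pdo->prepare('SELECT id, chat_id, created_at FROM notifications WHERE id > ? ORDER BY id');
$stmt->execute([$lastId]);

header('Content-Type: application/json');
echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));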
Well, in your scenario you need a listener that tracks the moment a new chat row is inserted into the database.
It will be easier if you emit the chat message to the users before you insert it into the database.
You can use sockets; for example, you can use Node.js + socket.io to do this. Here is a bit about that:
socket.io and node.js to send message to particular client
You can also look at this article:
https://www.flynsarmy.com/2012/02/php-websocket-chat-application-2-0
My webservices are structured as follows:
1. Receive request
2. Parse and validate input
3. Do actual webservice
4. Validate output
I am primarily using logging for debugging purposes: if something goes wrong, I want to know what the request was so I can hope to reproduce it (i.e. send the exact same request).
Currently I'm logging to a MySQL database table. After step 1 a record is created, which is updated with more info after steps 2 and 3 and cleaned up after step 4 (logs of successful requests are pruned).
I want the logging to be as quick and painless as possible. Any speed up here will considerably improve overall performance (round trip of each request).
I was thinking of using INSERT DELAYED, but I can't do that because I need the LAST_INSERT_ID to update and later delete the log record, or at least record the status of the request (i.e. success or error) so I know when to prune.
I could generate a unique ID myself (or at least an ID that is 'unique enough'), but even then I won't know the order of the DELAYED statements and I might end up trying to update or delete a record that doesn't exist yet. And since DELAYED also removes the ability to use NUM_AFFECTED_ROWS, I can't check whether the queries took effect.
Any suggestions?
When you say pruned, I'm assuming that if it was a success you're removing the record? If so, I think it would be better for you to have a Java object storing the information instead of the database as the process unfolds; then, if an exception occurs, you log the information in the object to the database all at once.
If you wanted to take it further: I did something similar to this. I have a service which queues the logging of audit data and inserts the info at a steady pace, so if the services are getting hammered we're not clogging the DB with logging statements. But if you're only logging errors, that would probably be overkill.
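A rough PHP sketch of that buffer-then-flush idea (the answer says Java, but the pattern is the same in any language); the class and table names are made up:

<?php
// RequestLog: collect logging data in memory while the request is processed,
// and only write to the database if something actually goes wrong.
class RequestLog
{
    private $data = [];

    public function add($key, $value)
    {
        $this->data[$key] = $value;
    }

    // Called only on failure: one single INSERT instead of insert + update + delete.
    public function flush(PDO $pdo, $status)
    {
        $stmt = $pdo->prepare('INSERT INTO request_log (status, payload, created_at) VALUES (?, ?, NOW())');
        $stmt->execute([$status, json_encode($this->data)]);
    }
}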
I figured I can probably just do a REPLACE DELAYED and do the pruning some other time (with a DELETE DELAYED).
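A sketch of that approach, generating the id in the application so LAST_INSERT_ID isn't needed; the table and column names are made up, and note that DELAYED only ever applied to MyISAM and was deprecated in MySQL 5.6 and ignored in later versions:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$logId      = uniqid('', true);                       // "unique enough" application-side id
$rawRequest = file_get_contents('php://input') ?: '';

// Step 1: record the incoming request, fire-and-forget.
$pdo->prepare('REPLACE DELAYED INTO ws_log (id, status, request) VALUES (?, ?, ?)')
    ->execute([$logId, 'received', $rawRequest]);

// ... do the actual webservice work ...
$rawResponse = '{"ok":true}';                         // placeholder

// Steps 2-4: overwrite the same row with the final status and output.
$pdo->prepare('REPLACE DELAYED INTO ws_log (id, status, request, response) VALUES (?, ?, ?, ?)')
    ->execute([$logId, 'success', $rawRequest, $rawResponse]);

// Pruning of successful requests happens later, out of band, e.g.:
// DELETE FROM ws_log WHERE status = 'success' AND created_at < NOW() - INTERVAL 1 DAY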
This question has been asked a THOUSAND times... so it's not unfair if you decide to skip reading/answering it, but I still thought people would like to see and comment on my approach...
I'm building a site which requires an activity feed, like FourSquare.
But my site has this feature for the eye-candy's sake, and doesn't need the stuff to be saved forever.
So, I write the event_type and user_id to a MySQL table. Before writing new events to the table, I delete all the older, unnecessary rows (by counting the total number of rows, finding the event_id below which everything is redundant, and deleting those rows). I prune the table, and write a new row every time an event happens. There's another user_text column which is NULL if there is no user-generated text...
In the front-end, I have jQuery that checks with a PHP file via GET every x seconds the user has the site open. The jQuery sends a request with the last update "id" it received. The <div> tags generated by my backend have the "id" attribute set as the MySQL row id. This way, I don't have to save the last_received_id in memory, though I guess there's absolutely no performance impact from storing one variable with a very small int value in memory...
I have a function that generates an "update text" depending on the event_type and user_id I pass it from the jQuery, and whether the user_text column is empty. The update text is passed back to jQuery, which appends the freshly received event <div> to the feed with some effects, while simultaneously getting rid of the "tail end" event <div> with an effect.
If I (more importantly, the client) want to, I can have an "event archive" table in my database (or a different one) that saves up all those redundant rows before deleting. This way, event information will be saved forever, while not impacting the performance of the live site...
I'm using CodeIgniter, so there's no question of repeated code anywhere. All the pertinent functions go into a LiveUpdates class in the library and model respectively.
I'm rather happy with the way I'm doing it because it solves the problem at hand while sticking to the KISS ideology... but still, can anyone please point me to some resources that show a better way to do it? A Google search on this subject reveals too many articles/SO questions, and I would like to benefit from the experience of any other developer who has already trawled through them and found the best approach...
If you use proper indexes there's no reason you couldn't keep all the events in one table without affecting performance.
If you craft your polling correctly to return nothing when there is nothing new, you can minimize the load each client puts on the server. If you also look into push notification (the hybrid delayed-connection-closing method), this will further help you scale successfully.
Finally, it is completely unnecessary to worry about variable storage in the client. This is premature optimization. The performance issues are going to be in the avalanche of connections to the web server from many users, and in DB tables without proper indexes.
About indexes: An index is "proper" when the most common query against a table can be performed with a seek and a minimal number of reads (like 1-5). In your case, this could be an incrementing id or a date (if it has enough precision). If you design it right, the operation to find the most recent update_id should be a single read. Then when your client submits its ajax request to see if there is updated content, first do a query to see if the value submitted (id or time) is less than the current value. If so, respond immediately with the new content via a second query. Keeping the "ping" action as lightweight as possible is your goal, even if this incurs a slightly greater cost for when there is new content.
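A rough sketch of that two-step check in PHP (table and column names are assumptions, not from the question):

<?php
// ping.php - the client sends the id of the last event it has rendered.
// Step 1 is a cheap indexed MAX() lookup; step 2 only runs when there is new content.
$pdo      = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$lastSeen = isset($_GET['last_id']) ? (int) $_GET['last_id'] : 0;

header('Content-Type: application/json');

// Step 1: is there anything newer at all? (a single indexed read)
$maxId = (int) $pdo->query('SELECT MAX(id) FROM events')->fetchColumn();
if ($maxId <= $lastSeen) {
    echo json_encode([]);   // nothing new: keep the ping as lightweight as possible
    exit;
}

// Step 2: fetch only the new rows.
$stmt = $pdo->prepare('SELECT id, event_type, user_id, user_text FROM events WHERE id > ? ORDER BY id');
$stmt->execute([$lastSeen]);
echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));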
Using a push would be far better, though, so please explore Comet.
If you don't know how many reads are going on with your queries then I encourage you to explore this aspect of the database so you can find it out and assess it properly.
Update: offering the idea of clients getting a "yes there's new content" answer and then actually requesting the content was perhaps not the best. Please see Why the Fat Pings Win for some very interesting related material.
I'm trying to write a PHP script for 'long-polling', returning data when new rows are added to a (Postgres) database table. Is there any way to get a SELECT query to return only when it would return results, blocking otherwise? Or should I use another signaling mechanism, outside of the database?
Take a look at LISTEN/NOTIFY:
> The NOTIFY command sends a notification event to each client application that has previously executed LISTEN name for the specified notification name in the current database.
http://www.postgresql.org/docs/8.4/static/sql-notify.html
You can add an "ON INSERT" trigger to the table to fire off a NOTIFY event. However, you will need another mechanism to figure out which records need to be selected as the ability to deliver a payload with the NOTIFY event won't be available until 9.0:
http://www.postgresql.org/docs/9.0/static/sql-notify.html
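For example, a long-polling PHP script using the pgsql extension might look roughly like this; the channel name, table, and timeout are assumptions:

<?php
// longpoll.php - wait (up to a timeout) for a NOTIFY, then fetch the new rows.
$db = pg_connect('host=localhost dbname=app user=app password=secret');
pg_query($db, 'LISTEN new_rows');

$lastId   = isset($_GET['last_id']) ? (int) $_GET['last_id'] : 0;
$deadline = time() + 25;                // give up before the HTTP request itself times out
$notified = false;

while (time() < $deadline) {
    if (pg_get_notify($db)) {           // non-blocking check for a pending notification
        $notified = true;
        break;
    }
    usleep(250000);                     // 250 ms between checks
}

header('Content-Type: application/json');
if (!$notified) {
    echo json_encode([]);               // timed out; the client simply re-issues the request
    exit;
}

// A NOTIFY only says *something* changed; fetch what, using the last id the client saw.
$res = pg_query_params($db, 'SELECT * FROM items WHERE id > $1 ORDER BY id', [$lastId]);
echo json_encode(pg_fetch_all($res) ?: []);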
There is no blocking SELECT statement.
You could just issue the SELECT statement on a regular basis, which incurs a certain overhead. If the query is expensive, you might write a cheaper one, like a COUNT(*), keep track of how many entries could possibly be returned, and issue the more expensive query only when that number changes.
You could look into LOCK and FOR UPDATE. FOR UPDATE can allow a query to wait until the row(s) being selected are unlocked. I'm not sure if there is a timeout or what resource impact having a large number of these can have, but it's one possibility.
You're trying to get an interrupt (event), when you should probably think about polling.
Create and call a stored procedure which will determine if there are new rows that the client should retrieve. If this is a web app, call an Ajax method periodically which, on the server, will query the db to see if there are new rows since its last call. If so, run another query to retrieve them and send them back to the client.
I love postgres and all, but if you're trying to do something simple and not super enterprisey, perhaps redis will be enough for you. I've had a lot of success with using it myself, and it can scale.
http://code.google.com/p/redis/
I'm designing a very simple (in terms of functionality) but difficult (in terms of scalability) system where users can message each other. Think of it as a very simple chatting service. A user can insert a message through a php page. The message is short and has a recipient name.
On another php page, the user can view all the messages that were sent to him all at once and then deletes them on the database. That's it. That's all the functionality needed for this system. How should I go about designing this (from a database/php point of view)?
So far I have the table like this:
field1 -> message (varchar)
field2 -> recipient (varchar)
Now, for the SQL INSERT, I find that the time it takes is constant regardless of the number of rows in the database, so my send.php has a guaranteed return time, which is good.
But for pulling down messages, my pull.php will take longer as the number of rows increases! I find the SQL SELECT (and DELETE) takes longer as the rows grow, and this is true even after I have added an index on the recipient field.
Now, if it were simply the case that users have to wait longer before their messages are pulled in PHP, that would be OK. But what I am worried about is that when each pull.php request takes really long to service, the PHP server will start to refuse connections to some requests, or worse, the server might just die.
So the question is, how to design this such that it scales? Any tips/hints?
PS. Some estimates on the numbers:
number of users starts with 50,000 and goes up.
each user on average has around 10 messages stored before the other end pulls them down.
each user sends around 10-20 messages a day.
UPDATE from reading the answers so far:
I just want to clarify that pulling down fewer messages from pull.php does not help. Even pulling just one message will take a long time when the table is huge. This is because the table has all the messages, so you have to do a select like this:
select message from DB where recipient = 'John'
Even if you change it to this, it doesn't help much:
select top 1 message from DB where recipient = 'John'
So far, from the answers, it seems like the larger the table, the slower the select will be, O(n) or slightly better, with no way around it. If that is the case, how should I handle this from the PHP side? I don't want the PHP page to fail on the HTTP request, because the user will be confused and end up refreshing like mad, which makes it even worse.
The database design for this is simple, as you suggest. As for it taking longer once the user has more messages, what you can do is just paginate the results: show the first 10/50/100 or whatever makes sense and only pull those records. Generally speaking, your times shouldn't increase very much unless the volume of messages increases by an order of magnitude or more. You should be able to pull back 1000 short messages in way less than a second. It may take more time for the page to display at that point, but that's where the pagination should help.
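A rough sketch of such pagination in PHP, with hypothetical table/column names (note the PDO::PARAM_INT binding so the LIMIT/OFFSET values are not quoted):

<?php
// pull.php - page through a user's messages instead of returning everything at once.
session_start();

$pdo     = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$perPage = 50;
$page    = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;
$offset  = ($page - 1) * $perPage;

$stmt = $pdo->prepare('SELECT id, message FROM messages WHERE recipient_id = ? ORDER BY id DESC LIMIT ? OFFSET ?');
$stmt->bindValue(1, (int) $_SESSION['user_id'], PDO::PARAM_INT);
$stmt->bindValue(2, $perPage, PDO::PARAM_INT);
$stmt->bindValue(3, $offset, PDO::PARAM_INT);
$stmt->execute();

echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));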
I would suggest, though, going through and thinking of future features and building your database out a little more based on that. Adding more features to the software is easy; changing the database is comparatively harder.
Follow the rules of normalization and try to reach third normal form. Going further for this type of application probably isn't worth it. Keep your tables thin.
Don't actually delete rows; just mark them as deleted with a bit flag. If you really need to remove them for some type of maintenance/cleanup to reduce size, mark them as deleted and then create a cleanup process to archive or remove the records during low-usage hours.
Integer values are easier for the SQL server to deal with than character values. So instead of WHERE recipient = 'John', use WHERE Recipient_ID = 23. You will gain this type of behavior when you normalize your database.
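A short sketch of the soft-delete idea combined with the integer key (the flag, column, and table names are assumptions):

<?php
// Mark messages as deleted in the hot path; a separate job removes them later.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$recipientId = 23;   // the integer id from the normalized Recipient table

// In pull.php, after the messages have been shown:
$pdo->prepare('UPDATE messages SET is_deleted = 1 WHERE recipient_id = ?')
    ->execute([$recipientId]);

// In a nightly cron job, during low-usage hours:
$pdo->exec('DELETE FROM messages WHERE is_deleted = 1 AND sent_at < NOW() - INTERVAL 7 DAY');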
Don't use VARCHAR for your recipient. It's best to make a Recipient table with a primary key that is an integer (or bigint if you are expecting extremely large quantities of people).
Then when you do your select statements:
SELECT message FROM DB WHERE recipient = 52;
Retrieving rows will be much faster.
Plus, I believe MySQL indexes are B-Trees, which is O(log n) for most cases.
A database table without an index is called a heap. Querying a heap results in each row of the table being evaluated, even with a 'where' clause; the big-O notation for a heap is O(n), with n being the number of rows in the table. Adding an index (and this really depends on the underlying aspects of your database engine) results in a complexity of O(log(n)) to find the matching row in the table. This is because the index is almost certainly implemented as some kind of B-tree. Adding rows to the table remains cheap even with an index present: appending to the heap itself is O(1), plus roughly O(log(n)) to maintain each index.
> But for pulling down messages, my pull.php will take longer as the number of rows increases! I find the SQL SELECT (and DELETE) takes longer as the rows grow, and this is true even after I have added an index on the recipient field.
UNLESS you are inserting into the middle of an index, at which point the database engine will need to shift rows down to accommodate. The same occurs when you delete from the index. Remember there is more than one kind of index. Be sure that the index you are using is not a clustered index as more data must be sifted through and moved with inserts and deletes.
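For example, in MySQL's InnoDB the primary key acts as the clustered index, so the recipient lookup would typically be served by a separate secondary index; a hypothetical migration sketch:

<?php
// One-off migration: keep the auto-increment primary key as the clustered index
// and add a secondary (non-clustered) index for the recipient lookup.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$pdo->exec('ALTER TABLE messages ADD INDEX idx_recipient (recipient_id)');

// The hot query then becomes an index seek instead of a full table scan:
// SELECT id, message FROM messages WHERE recipient_id = ? ORDER BY id DESC LIMIT 50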
FlySwat has given the best option available to you... do not use an RDBMS because your messages are not relational in a formal sense. You will get much better performance from a file system.
dbarker has also given correct answers. I do not know why he has been voted down 3 times, but I will vote him up at the risk that I may lose points. dbarker is referring to "Vertical Partitioning" and his suggestion is both acceptable and good. This isn't rocket surgery people.
My suggestion is not to implement this kind of functionality in your RDBMS; if you do, remember that SELECT, UPDATE, INSERT, and DELETE all place locks on pages in your table. If you do go forward with putting this functionality into a database, then run your selects with a NOLOCK locking hint, if it is available on your platform, to increase concurrency. Additionally, if you have that many concurrent users, partition your tables vertically as dbarker suggested and place the database files on separate drives (not just volumes, but separate hardware) to increase I/O concurrency.
> So the question is, how to design this such that it scales? Any tips/hints?
Yes: you don't want to use a relational database for message queuing. What you are trying to do is not what a relational database is best designed for, and while you can do it, it's kind of like driving in a nail with a screwdriver.
Instead, look at one of the many open source message queues out there; the guys at SecondLife have a neat wiki where they reviewed a lot of them:
http://wiki.secondlife.com/wiki/Message_Queue_Evaluation_Notes
This is an unavoidable problem: more messages, more time to find the requested ones. The only thing you can do is what you already did: add an index, turning the O(n) lookup time of a complete table scan into O(log(u) + m) for a clustered index lookup, where n is the total number of messages, u the number of users, and m the number of messages per user.
Limit the number of rows that your pull.php will display at any one time.
The more data you transfer, the longer it will take to display the page, regardless of how great your DB is.
You must limit your data in the SQL; return only the most recent N rows.
EDIT
Put an index on Recipient and it will speed things up. You'll need another column to distinguish rows if you want to take the top 50 or something, possibly SendDate or an auto-incrementing field. A clustered index will slow down inserts, so use a regular index there.
You could always have only one row per user and just concatenate messages together into one long record. If you're keeping messages for a long period of time, that isn't the best way to go, but it reduces your problem to a single find-and-concatenate at storage time and a single find at retrieval time. It's hard to say without more detail; part of what makes DB design hard is meeting all the goals of the system in a well-compromised way. Without all the details, it's hard to give advice on the best compromise.
EDIT: I thought I was fairly clear on this, but evidently not: You would not do this unless you were blanking a reader's queue when he reads it. This is why I prompted for clarification.