Let's say I have a forum. A small forum with maybe 100 visitors a day.
Is the best way to store the number of posts in a topic a num_posts column that I increase by one each time a user posts in that topic, and decrease by one when a user deletes a post? Or should I just run a query?
SELECT COUNT(*)
FROM posts
WHERE topic_id = thetopicid
I prefer the second. Of course, I guess it affects performance, but how much? Is this bad practice?
Use COUNT(*). An extra column has to be maintained by hand, i.e. updated on every new and deleted post. That means extra code and extra work on each write, whereas COUNT(*) is something already built into the DBMS.
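To make the COUNT(*) suggestion concrete, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for the forum's real database, and the body column and the idx_posts_topic index name are made up for illustration. The index on topic_id is what keeps the count cheap.

```python
import sqlite3

# In-memory SQLite stands in for the forum database (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, topic_id INTEGER NOT NULL, body TEXT)"
)
# With this index the count can be answered from index entries
# instead of scanning the whole posts table.
conn.execute("CREATE INDEX idx_posts_topic ON posts (topic_id)")

conn.executemany(
    "INSERT INTO posts (topic_id, body) VALUES (?, ?)",
    [(1, "first"), (1, "second"), (2, "other topic"), (1, "third")],
)


def count_posts(conn, topic_id):
    """Count the posts of one topic on demand; there is no counter to keep in sync."""
    row = conn.execute(
        "SELECT COUNT(*) FROM posts WHERE topic_id = ?", (topic_id,)
    ).fetchone()
    return row[0]


print(count_posts(conn, 1))
# Deleting a post needs no extra bookkeeping: the next count is simply lower.
conn.execute("DELETE FROM posts WHERE body = 'second'")
print(count_posts(conn, 1))
```

For a forum with around 100 visitors a day, this is likely all that is needed.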
Related
I need to display the number of comments a user has posted. I can think of two different ways of doing it, and I would like to know which one is better.
METHOD ONE: Each time I need to display the number of comments, query the comments table to select all comments with user_id x, and count the number of results.
METHOD TWO: Add a new column to the user table to store the number of comments a particular user has posted. This value will be updated each time the user enters a new comment. This way, every time I need to show the number of comments, I just need to read this value from the database.
I think the second method is more efficient, but I would like to know other opinions.
Any comment will be appreciated.
Thanks in advance,
Sonia
Well, it depends. I assume you use SQL. Counting is pretty fast if you have the correct indexes (e.g. SELECT COUNT(1) FROM articles WHERE user_id = ?). If this became a bottleneck, then I would consider caching these results.
At scale, option #2 is the only one that is viable. The counts may drift somewhat over time and you may need to rebuild the stats now and then, but this is a relatively low cost compared to counting the rows matching a secondary index on every request.
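A rough sketch of option #2, including the occasional rebuild mentioned above. It assumes Python's sqlite3 module in place of the real database, and the comment_count column and the helper names add_comment and rebuild_counts are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database, an assumption of this sketch
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT,
    comment_count INTEGER NOT NULL DEFAULT 0   -- hypothetical counter column
);
CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, body TEXT);
CREATE INDEX idx_comments_user ON comments (user_id);
INSERT INTO users (id, name) VALUES (1, 'sonia'), (2, 'guest');
""")


def add_comment(conn, user_id, body):
    """Insert the comment and bump the stored counter in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO comments (user_id, body) VALUES (?, ?)", (user_id, body)
        )
        conn.execute(
            "UPDATE users SET comment_count = comment_count + 1 WHERE id = ?",
            (user_id,),
        )


def rebuild_counts(conn):
    """Recompute every counter from the comments table to repair drift."""
    with conn:
        conn.execute(
            "UPDATE users SET comment_count = "
            "(SELECT COUNT(*) FROM comments WHERE comments.user_id = users.id)"
        )


add_comment(conn, 1, "hello")
add_comment(conn, 1, "again")
# Simulate drift: a comment disappears without the counter being adjusted.
conn.execute("DELETE FROM comments WHERE body = 'again'")
conn.commit()
rebuild_counts(conn)
print(conn.execute("SELECT comment_count FROM users WHERE id = 1").fetchone()[0])
```

Reading the count is then a single-row lookup; the price is that every code path that adds or removes comments has to remember the counter.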
Hello again Stackoverflow!
I'm currently working on custom forum software, and one of the things you like to see on a forum is a view counter.
All the approaches for a view counter that I found would just select the topic from the database, retrieve the number from a "views" column, add one and update it.
But here's my thought: if, let's say, 400 people open a topic at the exact same time, the MySQL database probably won't count all the views, because the queries take time to complete, so the last person (of the 400) might overwrite the first person's view.
Of course one could argue that on a normal site this is never going to happen, but if you have ~7 people opening that topic in the exact same second and the server is struggling at that moment, you could have the same problem.
Is there any other good approach to count views?
EDIT
Woah, could the one who voted down specify why?
By "retrieving the number of views and adding one" I meant that I would use SELECT to retrieve the number, add one using PHP (note the tags) and write it back using UPDATE. I had no idea of the other methods specified below, that's why I asked.
If, let's say, 400 people open a topic at the exact same time, the MySQL database apparently would count all the views, because this is exactly what databases were invented for.
All the approaches for a view counter that you have found are wrong. To update a field you don't need to retrieve it first; just update it in place:
UPDATE forum SET views = views + 1 WHERE id = ?
So something like that will work:
UPDATE tbl SET cnt = cnt+1 WHERE ...
UPDATE is guaranteed to be atomic. That means no one will be able to alter cnt between the time it is read and the time it is replaced. If you have several concurrent UPDATEs for the same row (InnoDB) or table (MyISAM), they have to wait their turn to update the data.
See Is incrementing a field in MySQL atomic?
and http://dev.mysql.com/doc/refman/5.1/en/ansi-diff-transactions.html
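The difference between the two patterns can be replayed deterministically. The sketch below assumes Python's sqlite3 module as a stand-in for MySQL and a made-up topics table; it interleaves two "requests" by hand rather than using real concurrency, so it illustrates the lost update but does not prove anything about MySQL's locking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database, an assumption of this sketch
conn.execute(
    "CREATE TABLE topics (id INTEGER PRIMARY KEY, views INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO topics (id, views) VALUES (1, 0), (2, 0)")


def read_views(conn, topic_id):
    """Return the stored view counter of one topic."""
    return conn.execute(
        "SELECT views FROM topics WHERE id = ?", (topic_id,)
    ).fetchone()[0]


# Read-modify-write (topic 1): both requests read 0 before either one writes,
# so both write 1 and one view is lost.
seen_by_a = read_views(conn, 1)
seen_by_b = read_views(conn, 1)
conn.execute("UPDATE topics SET views = ? WHERE id = 1", (seen_by_a + 1,))
conn.execute("UPDATE topics SET views = ? WHERE id = 1", (seen_by_b + 1,))

# In-place increment (topic 2): the database does the arithmetic on the
# current value, so each request adds exactly one.
for _ in range(2):
    conn.execute("UPDATE topics SET views = views + 1 WHERE id = 2")

print(read_views(conn, 1), read_views(conn, 2))
```

Topic 1 ends up with one view and topic 2 with two, even though both were "opened" twice.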
I have to create a like system (the name won't be "like", Facebook owns it).
So I imagined two ways to store these likes in my database, and I want to know which way is better for a very high-traffic site.
Create a table comment_likes with "id", "comment_id" and "user_id" columns. In the comments table, store a "like_count", so I don't need to count the likes every time I display them. But liking is easy to do, so people will create lots of likes, and if I need to list a specific comment's likes, I need to read the whole comment_likes table to find all the user_ids. This could be millions of rows in the future. If 1000 users do it at the same time, my system will die.
My second thought was to store the likes in the comments table: create a column named "likes" holding a list of user_ids like this: 1#34#21#56#....
So when somebody likes/unlikes a comment, just CONCAT or REPLACE his/her id in this column with a #. When I need to list a specific comment's likes, just explode this list at the #s.
I think the 2nd could be faster and smarter, but what do you think about this?
The first option is much better, because you have the benefits of a relational setup. For example: what if you want to get the comments that userId x has liked? With the first setup this is a fast and simple query. In the second case you would have to use LIKE, which is much slower and inaccurate. (Imagine the userId is 1 and the likes field in the comments table contains #10: LIKE '%1%' would return that comment too.)
And even for a high-traffic site, an index on commentId makes listing a comment's likes a fast operation; the database does not have to read the whole table.
So go for the first option.
If you really doubt the speed of the first option, you could create a "cache" field in the comments table in which you store the number of likes, so you don't have to perform a subquery to select the like count.
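Here is a small sketch of both layouts side by side, assuming Python's sqlite3 module as a stand-in database. The comments_with_list table, the index name and the helper functions are invented for the comparison; only comment_likes and its columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database, an assumption of this sketch
conn.executescript("""
CREATE TABLE comment_likes (
    id INTEGER PRIMARY KEY,
    comment_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    UNIQUE (comment_id, user_id)   -- one like per user per comment; also indexes comment_id
);
CREATE INDEX idx_likes_user ON comment_likes (user_id);
-- The delimited-list layout, kept only to show the false match.
CREATE TABLE comments_with_list (id INTEGER PRIMARY KEY, likes TEXT);
INSERT INTO comment_likes (comment_id, user_id) VALUES (7, 1), (7, 34), (8, 10), (9, 1);
INSERT INTO comments_with_list (id, likes) VALUES (7, '1#34'), (8, '10'), (9, '1');
""")


def comments_liked_by(conn, user_id):
    """Relational layout: an exact, index-backed lookup."""
    rows = conn.execute(
        "SELECT comment_id FROM comment_likes WHERE user_id = ? ORDER BY comment_id",
        (user_id,),
    )
    return [r[0] for r in rows]


def comments_liked_by_list_scan(conn, user_id):
    """List layout: a substring match that scans every row and over-matches."""
    pattern = "%" + str(user_id) + "%"
    rows = conn.execute(
        "SELECT id FROM comments_with_list WHERE likes LIKE ? ORDER BY id",
        (pattern,),
    )
    return [r[0] for r in rows]


print(comments_liked_by(conn, 1))            # [7, 9]
print(comments_liked_by_list_scan(conn, 1))  # [7, 8, 9]
```

Comment 8 was liked only by user 10, yet the substring search reports it for user 1, which is the inaccuracy described above.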
Sorry if the title is a little... Crappy. Basically I'm writing a small forum and using multiple sub-queries to select the number of threads, number of posts, and the date of the last post in a forum while grabbing the forum's information at the same time to display on the main page!
This is my query, since I suck at explaining things:
SELECT `f`.*,
(SELECT COUNT(`id`)
FROM `forum_threads`
WHERE `forumId1` = `f`.`id1`
AND `forumId2` = `f`.`id2`) AS `threadCount`,
(SELECT COUNT(`id`)
FROM `forum_posts`
WHERE `forumId1` = `f`.`id1`
AND `forumId2` = `f`.`id2`) AS `postCount`,
(SELECT `date`
FROM `forum_posts`
WHERE `forumId1` = `f`.`id1`
AND `forumId2` = `f`.`id2`
ORDER BY `date` DESC LIMIT 1) AS `lastPostDate`
FROM `forum_forums` AS `f`
ORDER BY `f`.`position` ASC, `f`.`id1` ASC;
And I am using a plain foreach loop to display the results:
foreach($forums AS $forum) {
echo $forum->name .'<br />';
echo $forum->threadCount .'<br />';
echo $forum->postCount .'<br />';
echo $forum->lastPostDate .'<br />';
}
(Not exactly like that of course, but for the sake of explaining...)
Now I was wondering if that would be "bad" for performance, or if there was any better way of doing it? Assuming there are quite a few posts and threads in each forum.
I was originally storing "posts", "threads", and "lastPost" columns in the forum table itself, and was going to increment (posts = posts + 1) the values every time someone created a new thread or post. Though I had this idea as well and was wondering if it was any good. :P
I would do things a bit differently:
It seems to me that all these three fields: threadCount, postCount and lastPostDate are fields that you can maintain on a separate table, say forum_stats which will hold only 4 columns:
* forum_id
* thread_count
* post_count
* last_post_date
These columns can be updated via a trigger upon insert/update.
If you pay this small overhead during the update operations, you'll get a very fast query for the select (and it will remain very fast regardless of the number of forums/posts/threads you have).
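A minimal sketch of the trigger idea, assuming Python's sqlite3 module as a stand-in for MySQL. The single forum_id key and the trigger names are simplifications made up for the example (the question uses a two-part forum id), and only the post triggers are shown; threads would get analogous ones. SQLite's two-argument MAX is used where MySQL would use GREATEST.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database, an assumption of this sketch
conn.executescript("""
CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, forum_id INTEGER NOT NULL, date TEXT NOT NULL);
CREATE TABLE forum_stats (
    forum_id INTEGER PRIMARY KEY,
    thread_count INTEGER NOT NULL DEFAULT 0,   -- would be kept by a similar trigger on threads
    post_count INTEGER NOT NULL DEFAULT 0,
    last_post_date TEXT
);
CREATE TRIGGER posts_after_insert AFTER INSERT ON forum_posts
BEGIN
    -- Make sure a stats row exists, then bump it.
    INSERT OR IGNORE INTO forum_stats (forum_id) VALUES (NEW.forum_id);
    UPDATE forum_stats
       SET post_count = post_count + 1,
           last_post_date = MAX(COALESCE(last_post_date, ''), NEW.date)
     WHERE forum_id = NEW.forum_id;
END;
CREATE TRIGGER posts_after_delete AFTER DELETE ON forum_posts
BEGIN
    -- The deleted row is already gone, so MAX(date) sees only the remaining posts.
    UPDATE forum_stats
       SET post_count = post_count - 1,
           last_post_date = (SELECT MAX(date) FROM forum_posts WHERE forum_id = OLD.forum_id)
     WHERE forum_id = OLD.forum_id;
END;
""")


def forum_stats(conn, forum_id):
    """Read the precomputed stats: a single-row primary-key lookup."""
    return conn.execute(
        "SELECT post_count, last_post_date FROM forum_stats WHERE forum_id = ?",
        (forum_id,),
    ).fetchone()


conn.execute("INSERT INTO forum_posts (forum_id, date) VALUES (1, '2013-01-01 10:00:00')")
conn.execute("INSERT INTO forum_posts (forum_id, date) VALUES (1, '2013-01-02 09:00:00')")
print(forum_stats(conn, 1))
conn.execute("DELETE FROM forum_posts WHERE date = '2013-01-02 09:00:00'")
print(forum_stats(conn, 1))
```

The application code only inserts and deletes posts; the stats row follows along on its own.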
Another approach (not as good, IMO):
Create the stats table and run a batch job daily (or every few hours) to update the stats. The price is that the data you display will never be fully up to date, and the job itself needs resources. Since it's heavy, you might want to run it only at night, for example, so that it doesn't affect the majority of your website's visitors.
Usually this kind of thing is terrible from a performance perspective and you'd be better off with counter columns that you can fetch from a single row. Keeping these in sync can be annoying, but there's no retrieval cost once they're in there.
You've identified the data you're retrieving, so what you need to do next is figure out how to put that data in there in the first place. @alfasin's answer describes an example schema, and while putting it in a separate table is one idea, there's usually not too much in the way of trouble just putting them in the main one. If you're worried about locking, update in smaller batches.
One approach is to write a TRIGGER that updates the counters as records are added and removed from the various tables. This tends to hide a lot of the complexity which can be a bad thing if the logic changes often and people need to be aware of how the system works.
A simple method is to just fiddle with the columns using an additional query after you've created or removed something that would have updated them. For instance, adjusting the last-posted-date is trivial if you do it at the time a post is created.
If these counters get a bit screwy, and they will eventually, you need a method to bring them back into sync. An easy way is to write a VIEW that produces the same results your query does now, perhaps re-written to use LEFT JOIN instead, and then UPDATE against that if that's possible. This may involve using a temporary table if MySQL can't cope with updating a table with a view of itself, but that's usually not a big deal.
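As a sketch of that resync step, assuming Python's sqlite3 module as a stand-in for MySQL: the view name forum_true_stats and the single forum_id key are invented for the example, while the posts and lastPost columns follow the question. SQLite accepts an UPDATE that reads from a view over the same table; MySQL may refuse it, which is where the temporary table mentioned above comes in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database, an assumption of this sketch
conn.executescript("""
CREATE TABLE forum_forums (
    id INTEGER PRIMARY KEY,
    name TEXT,
    posts INTEGER NOT NULL DEFAULT 0,
    lastPost TEXT
);
CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, forum_id INTEGER NOT NULL, date TEXT NOT NULL);
-- The view recomputes the true values; LEFT JOIN keeps forums without any posts.
CREATE VIEW forum_true_stats AS
    SELECT f.id AS forum_id, COUNT(p.id) AS post_count, MAX(p.date) AS last_post_date
      FROM forum_forums AS f
      LEFT JOIN forum_posts AS p ON p.forum_id = f.id
     GROUP BY f.id;
-- Both forums start with deliberately wrong counters.
INSERT INTO forum_forums (id, name, posts, lastPost)
VALUES (1, 'General', 99, 'stale'), (2, 'Empty', 5, 'stale');
INSERT INTO forum_posts (forum_id, date) VALUES (1, '2013-05-01'), (1, '2013-05-03');
""")


def resync(conn):
    """Overwrite the stored counters with the values computed by the view."""
    with conn:
        conn.execute("""
            UPDATE forum_forums
               SET posts = (SELECT s.post_count FROM forum_true_stats AS s
                             WHERE s.forum_id = forum_forums.id),
                   lastPost = (SELECT s.last_post_date FROM forum_true_stats AS s
                                WHERE s.forum_id = forum_forums.id)
        """)


resync(conn)
rows = conn.execute("SELECT id, posts, lastPost FROM forum_forums ORDER BY id").fetchall()
print(rows)  # [(1, 2, '2013-05-03'), (2, 0, None)]
```

Running this from a scheduled job, or by hand when numbers look off, keeps the cheap counter columns trustworthy.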
I am just learning php as I go along, and I'm completely lost here. I've never really used join before, and I think I need to here, but I don't know. I'm not expecting anyone to do it for me but if you could just point me in the right direction it would be amazing, I've tried reading up on joins but there are like 20 different methods and I'm just lost.
Basically, I hand coded a forum, and it works fine but is not efficient.
I have board_posts (for posts) and board_forums (for forums, the categories as well as the sections).
The part I'm redoing is how I get the information for the last post on the index page. To avoid using joins, I store the info for the latest post in the board_forums table. So, say there is a section called "Off Topic": it has fields for "forum_lastpost_username/userid/posttitle/posttime", which I update when a user posts, etc. But this is bad; I'm trying to grab it all dynamically and get rid of those fields.
Right now my query is just like:
SELECT * FROM board_forums WHERE forum_parent='$forum_id'
And then I have the stuff where I grab the info for that forum (name, description, etc) and all the data for the last post is there:
$last_thread_title = $forumrow["forum_lastpost_title"];
$last_thread_time = $forumrow["forum_lastpost_time"];
$lastpost_username = $forumrow["forum_lastpost_username"];
$lastpost_threadid = $forumrow["forum_lastpost_threadid"];
But I need to get rid of that, and get it from board_posts. The way it's set up in board_posts is that if it's a thread, post_parentpost is NULL; if it's a reply, then that field has the id of the thread (the first post of the topic). So, I need to grab the latest post_date, see which user posted that, THEN see if post_parentpost is NULL. If it's NULL, then the last post is a new thread, so I can get the title and user right there. If it's not, then I need to get the info (title, id) of the first post in that thread, which can be found by taking the post_parentpost value, looking up that ID and getting the title from it.
Does that make any sense? If so please help me out :(
Any help is greatly appreciated!!!!
Updating board_forums whenever a post or a reply is inserted is, regarding performance, not the worst idea. For displaying the index page you only have to select data from one table, board_forums. This is definitely much faster than reading a second table to get the last post's information, even when using a clever join.
You are better off just updating the stats on each action, New Post, Delete Post etc.
The other instances would not likely require any stats update (deletion of a thread would trigger a forum update, to show one less topic in the topic count).
Think about all the actions a user can perform: most of them don't need to update any stats. Getting the counts on the fly on every page view is therefore very inefficient, and you are right to think so.
It looks like you've already done the right thing.
If you were to join, you'd do it like this:
SELECT * FROM board_forums
JOIN board_posts ON board_posts.forum_id = board_forums.id
WHERE forum_parent = '$forum_id'
The problem with that is that it gets you every post, which is not useful (and very slow). What you would want to do is something like this:
SELECT * FROM board_forums
JOIN board_posts ON board_posts.forum_id = board_forums.id ORDER BY board_posts.id desc LIMIT 1
WHERE forum_parent = '$forum_id'
except SQL doesn't work like that. ORDER BY and LIMIT apply to the whole result set, not to one joined table, so you can't limit a plain join to the newest post per forum. You would have to fetch every row and scan them in code (which sucks), or fall back to a subquery per forum.
In short, don't worry. Use joins for the actual case where you do want to load all forums and all posts in one hit.
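For reference, one way to get only the newest post per forum in a single statement is a correlated subquery in the join condition. The sketch below assumes Python's sqlite3 module as a stand-in for MySQL; forum_name, post_title and the index are made-up names, and "newest" is taken to mean the highest board_posts.id, as in the ordering used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database, an assumption of this sketch
conn.executescript("""
CREATE TABLE board_forums (id INTEGER PRIMARY KEY, forum_name TEXT, forum_parent INTEGER);
CREATE TABLE board_posts (
    id INTEGER PRIMARY KEY,
    forum_id INTEGER NOT NULL,
    post_title TEXT,
    post_date TEXT
);
-- Lets the per-forum MAX(id) lookup be answered from the index.
CREATE INDEX idx_posts_forum ON board_posts (forum_id, id);
INSERT INTO board_forums VALUES (10, 'Off Topic', 1), (11, 'Help', 1), (12, 'Quiet', 1);
INSERT INTO board_posts (forum_id, post_title, post_date) VALUES
    (10, 'old', '2012-01-01'),
    (10, 'newest off topic', '2012-01-05'),
    (11, 'help me', '2012-01-03');
""")


def forums_with_last_post(conn, parent_id):
    """Return (forum_name, title of newest post or None) for each child forum."""
    return conn.execute("""
        SELECT f.forum_name, p.post_title
          FROM board_forums AS f
          LEFT JOIN board_posts AS p
            ON p.id = (SELECT MAX(p2.id) FROM board_posts AS p2 WHERE p2.forum_id = f.id)
         WHERE f.forum_parent = ?
         ORDER BY f.id
    """, (parent_id,)).fetchall()


print(forums_with_last_post(conn, 1))
```

The parameter is bound with a placeholder rather than interpolated into the string. Even with the index, this does more work per page view than reading last-post columns that are already stored on board_forums.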
The simple solution will result in numerous queries, some optional, as you've already discovered.
The classic approach to this is to cache the results, and only retrieve it once in a while. The cache doesn't have to live long; even two or three seconds on a busy site will make a significant difference.
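A short-lived cache of this kind can be sketched in a few lines of Python. The TtlCache class, its get method and the injectable clock are hypothetical names for illustration, not part of any library; a real site would more likely use something like memcached, but the expiry logic is the same.

```python
import time


class TtlCache:
    """Keep computed values for a few seconds, then recompute on the next request."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the expiry can be tested without sleeping
        self._store = {}        # key -> (time stored, value)

    def get(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]       # still fresh: skip the expensive work
        value = compute()       # missing or expired: do the work once and remember it
        self._store[key] = (now, value)
        return value


calls = []


def expensive_count():
    """Stands in for the slow multi-query index page lookup."""
    calls.append(1)
    return 42


fake_now = [0.0]
cache = TtlCache(ttl_seconds=3, clock=lambda: fake_now[0])
cache.get("forum-index", expensive_count)   # miss: runs the query
fake_now[0] = 2.0
cache.get("forum-index", expensive_count)   # hit: served from memory
fake_now[0] = 5.0
cache.get("forum-index", expensive_count)   # expired: runs the query again
print(len(calls))
```

Three page views cost two queries here; on a busy page with hundreds of views per second, the saving is proportionally larger.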
De-normalizing the data into a table you're already reading anyway will help. This approach saves you figuring out optional queries and can be a bit of a cheap win because it's just one more update when an insert is already happening. But it shifts some data integrity to the application.
As an aside, you might be running into the recursive-query problem with your threads. Relational databases do not store hierarchical data all that well if you use a "simple" algorithm. A better way is something sometimes called 'set trees'. It's a bit hard to Google, unfortunately, so here are some links.