Best way of getting the number of comments a user has post - php

I need to display the number of comments a user has post. I can think about two different ways of doing it, and I would like to know which one is better.
METHOD ONE: Each time I need to display the number of comments, query the comments table to select all comments with user_id x, and count the number of results.
METHOD TWO: Add a new column to the user table to store the number of comments a particular user has post. This value will be updated each time the user enters a new comment. This way every time I need to show the number of comments, I just need to query this value in the datbase.
I think the second method is more efficient, but I would like to know other opinions.
Any comment will be appreciated.
Thanks in advance,
Sonia

Well it depends. I suppose you use SQL. Counting is pretty fast of you have correct indexes (eg. SELECT COUNT(1) FROM articles WHERE user_id = ?). If this would be bottleneck than I would consider caching of these results.

At scale, option #2 is the only one that is viable. Counts may eventually be skewed some and you may need to rebuild the stats but this is a relatively low cost compared to trying to count the number of rows matching a secondary index.

Related

MYSQL select 'random' rows, but always give the same random rows if called again

My situations is this... I have a table of opportunities that is sorted. We have a paid service that will allow people to view the opportunities on the website any time. However we want an unpaid view that will show a random %/# of opportunities, which will always be the same. The opportunities are sorted out by dates; e.g. they will expire and be removed from the list, and a new one should be on the free search. However the only problem is that they will always have to show the same opportunity. (For example, I can't just pick random rows because it will cycle through them if they keep refreshing, and likewise can't just take the ones about to expire or furthest form expiry because people still end up seeing the entire list.
My only solution thus far is to add an extra column to the table to mark that it is open display. Then to count them on display, and if we are missing rows then to randomly select a few more. Below is a mock up...
SELECT count(id) as total FROM opportunities WHERE display_status="open" LIMIT 1000;
...
while(total < requiredNumber) {
UPDATE opportunities SET display_status="open" WHERE display_status="private" ORDER BY random() LIMIT (required-total);
}
Can anyone think of a better way to solve this problem, preferably one that does not leave me adding another column to the table, and possible conflicts if many people load the page at a single time. One final note as well, it can't be a random set number of them (e.g. pick one, skip a few, take the next).
Any thought/comments would be very helpful,
Thanks.
One way to make sure that a user only sees the same set of random rows is to feed the random number generator a seed that is linked to that user (such as their user_id). That means every user gets a random ordering of rows but it's always the same random ordering for each user.
Your code would be something:
SELECT ...
FROM ...
WHERE ...
ORDER BY random(<user id>)
LIMIT <however many>
Note: as Twelfth pointed out, as new rows are created, they will get new order values and may end up in your random selection.
I'm the type that doesn't like to lose information...including what random rows someone got to see. However I do not like the modification of your existing table idea...
Create a second table as randon_rows or something to that extent to save the ID's of the user and the ID's of the random records they got to see. Inner join to the table whenever you need to find those same rows again. You can also put expirey dates and the sort in the table as well, so the user isn't perma stuck with the same 10 rows.

Update view count, most reliable way

Hello again Stackoverflow!
I'm currently working on custom forumsoftware and one of the things you like to see on a forum is a viewcounter.
All the approaches for a viewcounter that I found would just select the topic from the database, retrieve the number from a "views" column, add one and update it.
But here's my thought: If, lets say 400, people at the exact same time open a topic, the MySQL database probably won´t count all views because it takes time for the queries to complete, and so the last person (of the 400) might overwrites the first persons (of the 400) view.
Ofcourse one could argue that on a normal site this is never going to happen, but if you have ~7 people opening that topic at the exact same second and the server is struggleing at that moment, you could have the same problem.
Is there any other good approach to count views?
EDIT
Woah, could the one who voted down specify why?
I ment by "Retrieving the number of views and adding one" that I would use SELECT to retrieve the number, add one using PHP (note the tags) and updating it using UPDATE. I had no idea of the other methods specified below, that's why I asked.
If, lets say 400, people at the exact same time open a topic, the MySQL database apparently would count all the views because this is exactly what databases were invented for.
All the approaches for a viewcounter that you have found are wrong. To update a field you don't need to retrieve it, but just already update:
UPDATE forum SET views + 1 WHERE id = ?
So something like that will work:
UPDATE tbl SET cnt = cnt+1 WHERE ...
UPDATE is guaranteed to be atomic. That means no one will be able to alter cnt between the time it is read and the time it is replaced. If you have several concurrent UPDATE for the same row (InnoDB) or table (MyISAM) they have to wait their turn to update the date.
See Is incrementing a field in MySQL atomic?
and http://dev.mysql.com/doc/refman/5.1/en/ansi-diff-transactions.html

Mysql store things like "Facebook likes"

I have to create a like system (the name won't be "like", Facebook owns it).
So I imagined two ways to store these likes in my database and I want to know, which way is the better for a very high-traffic site.
Create table comment_likes with "id", "comment_id", "user_id" cells. In comments table store the "like_count", so I don't need to count it when I need to write it out. But likes are easy to do thing, so people will create a lots of them and if I need to list a specified comment's likes, I need to read the whole comment_likes table and found all the user_ids. This could be millions of rows in the future. If 1000 user will do it in the same time, my system will die.
My second thought was, to store likes in comments table. create a cell named "likes" with a list of user_ids like this: 1#34#21#56#....
So when somebody like/unlike a comment just CONCAT or REPLACE his/her id in this cell with a #. When I need to list specified comment just explode this list at #-s.
I think 2nd could be faster and smarter, but what do you think about this?
The first option is much better, because you have the benefits of a relational setup. For example: What if you want to get the comments from the database userId x has liked? With the first setup this is a fast and simple query. In the second case you would have to use a LIKE, which is much slower and inaccurate. (Imagine the userId is 1, and the likes field in the comments table contains #10 - it would return the comment if you would use LIKE '%1%').
And even for a high traffic site; just using an index on commentId would make this a fast operation.
So go for the first option.
If you really doubt the speed of the first option, you could create a "cache" field in the comments table in which you count the amount of likes, so you don't have to perform a subquery to select the like count.

Optimizing an MYSQL COUNT ORDER BY query

I have recently written a survey application that has done it's job and all the data is gathered. Now i have to analyze the data and i'm having some time issues.
I have to find out how many people selected what option and display it all.
I'm using this query, which does do it's job:
SELECT COUNT(*)
FROM survey
WHERE users = ? AND table = ? AND col = ? AND row = ? AND selected = ?
GROUP BY users,table,col,row,selected
As evident by the "?" i'm using MySQLi (in php) to fetch the data when needed, but i fear this is causing it to be so slow.
The table consists of all the elements above (+ an unique ID) and all of them are integers.
To explain some of the fields:
Each survey was divided into 3 or 4 tables (sized from 2x3 to 5x5) with a 1 to 10 happiness grade to select form. (questions are on the right and top of the table, then you answer where the questions intersect)
users - age groups
table, row, col - explained above
selected - dooooh explained above
Now with the surveys complete and around 1 million entries in the table the query is getting very slow. Sometimes it takes like 3 minutes, sometimes (i guess) the time limit expires and you get no data at all. I also don't have access to the full database, just my empty "testing" one since the costumer is kinda paranoid :S (and his server seems to be a bit slow)
Now (after the initial essay) my questions are: I left indexing out intentionally because with a lot of data being written during the survey, it would be a bad idea. But since no new data is coming in at this point, would it make sense to index all the fields of a table? How much sense does it make to index integers that never go above 10? (as you can guess i haven't got a clue about indexes). Do i need the primary unique ID in this table? I
I read somewhere that indexing may help groups but only if you group by the first columns in a table (and since my ID is first and from my point of view useless can i remove it and gain anything by it?)
Is there another way to write my query that would basically do the same thing but in a shorter period of time?
Thanks for all your suggestions in advance!
Add an index on entries that you "GROUP BY" or do "WHERE". So that's ONE index incorporating users,table,col,row and selected in your case.
Some quick rules:
combine fields to have the WHERE first, and the GROUP BY elements last.
If you have other queries that only use part of it (e.g. users,table,col and selected) then leave the missing value (row, in this example) last.
Don't use too many indexes/indeces, as each will slow the table to updates marginally - so on really large system you need to balance queries with indexes.
Edit: do you need the GROUP BY user,col,row as these are used in the WHERE. If the WHERE has already filtered them out, you only need group by "selected".

Counting items or incrementing a number?

From someone with more experience than myself, would it be a better idea to simply count the number of items in a table (such as counting the number of topics in a category) or to keep a variable that holds that value and just increment and call it (an extra field in the category table)?
Is there a significant difference between the two or is it just very slight, and even if it is slight, would one method still be better than the other? It's not for any one particular project, so please answer generally (if that makes sense) rather than based on something like the number of users.
Thank you.
To get the number of items (rows in a table), you'd use standard SQL and do it on demand
SELECT COUNT(*) FROM MyTable
Note, in case I've missed something, each item (row) in the table has some unique identifier, whether it's a part number, some code, or an auto-increment. So adding a new row could trigger the "auto-increment" of a column.
This is unrelated to "counting rows". Because of DELETEs or ROLLBACK, numbers may not be contiguous.
Trying to maintain row counts separately will end in tears and/or disaster. Trying to use COUNT(*)+1 or MAX(id)+1 to generate a new row identifier is even worse
I think there is some confusion about your question. My interpretation is whether you want to do a select count(*) or a column where you track your actual count.
I would not add such a column, if you don't have reasons to do so. This is premature optimization and you complicate your software design.
Also, you want to avoid having the same information stored in different places. Counting is a trivial task, so you actually duplicating information, which is a bad idea.
I'd go with just counting. If you notice a performance issue, you can consider other options, but as soon as you keep a value that's separate, you have to do some work to make sure it's always correct. Using COUNT() you always get the actual number "straight from the horse's mouth" so to speak.
Basically, don't start optimizing until you have to. If everything works fine and fast using COUNT(), then do that. Otherwise, store the count somewhere, but rather than adding/subtracting to update the stored value, run COUNT() when needed to get the new number of items
In my forum I count the sub-threads in a forum like this:
SELECT COUNT(forumid) AS count FROM forumtable
As long as you're using an identifier that is the same to specify what forum and/or sub-section, and the column has an index key, it's very fast. So there's no reason to add more columns than you need to.

Categories