Update view count, most reliable way - php

Hello again Stackoverflow!
I'm currently working on custom forumsoftware and one of the things you like to see on a forum is a viewcounter.
All the approaches for a viewcounter that I found would just select the topic from the database, retrieve the number from a "views" column, add one and update it.
But here's my thought: If, lets say 400, people at the exact same time open a topic, the MySQL database probably won´t count all views because it takes time for the queries to complete, and so the last person (of the 400) might overwrites the first persons (of the 400) view.
Ofcourse one could argue that on a normal site this is never going to happen, but if you have ~7 people opening that topic at the exact same second and the server is struggleing at that moment, you could have the same problem.
Is there any other good approach to count views?
EDIT
Woah, could the one who voted down specify why?
I ment by "Retrieving the number of views and adding one" that I would use SELECT to retrieve the number, add one using PHP (note the tags) and updating it using UPDATE. I had no idea of the other methods specified below, that's why I asked.

If, lets say 400, people at the exact same time open a topic, the MySQL database apparently would count all the views because this is exactly what databases were invented for.
All the approaches for a viewcounter that you have found are wrong. To update a field you don't need to retrieve it, but just already update:
UPDATE forum SET views + 1 WHERE id = ?

So something like that will work:
UPDATE tbl SET cnt = cnt+1 WHERE ...
UPDATE is guaranteed to be atomic. That means no one will be able to alter cnt between the time it is read and the time it is replaced. If you have several concurrent UPDATE for the same row (InnoDB) or table (MyISAM) they have to wait their turn to update the date.
See Is incrementing a field in MySQL atomic?
and http://dev.mysql.com/doc/refman/5.1/en/ansi-diff-transactions.html

Related

Best way of getting the number of comments a user has post

I need to display the number of comments a user has post. I can think about two different ways of doing it, and I would like to know which one is better.
METHOD ONE: Each time I need to display the number of comments, query the comments table to select all comments with user_id x, and count the number of results.
METHOD TWO: Add a new column to the user table to store the number of comments a particular user has post. This value will be updated each time the user enters a new comment. This way every time I need to show the number of comments, I just need to query this value in the datbase.
I think the second method is more efficient, but I would like to know other opinions.
Any comment will be appreciated.
Thanks in advance,
Sonia
Well it depends. I suppose you use SQL. Counting is pretty fast of you have correct indexes (eg. SELECT COUNT(1) FROM articles WHERE user_id = ?). If this would be bottleneck than I would consider caching of these results.
At scale, option #2 is the only one that is viable. Counts may eventually be skewed some and you may need to rebuild the stats but this is a relatively low cost compared to trying to count the number of rows matching a secondary index.

Have a column "num_posts" or querying the database

Let's say I have a forum. A small forum with maybe 100 visitors a day.
Would the best way be to store the number of posts a topic has by just creating a column num_posts and each time a user makes a post in that topic I just increase that number by one. And the other way when a user deletes a post. Or just make a query?
SELECT COUNT(*)
FROM posts
WHERE topic_id = thetopicid
I prefer the second. But off course I guess it affect performance. But how much? Is this bad practice?
Use count(*). Having that extra column requires you to maintain it yourself, i.e. update on new and deleted posts. You need to add something extra to do this which definitely requires extra resource, whereas using count(*) you are using something already built in to DMBS.

Optimizing an MYSQL COUNT ORDER BY query

I have recently written a survey application that has done it's job and all the data is gathered. Now i have to analyze the data and i'm having some time issues.
I have to find out how many people selected what option and display it all.
I'm using this query, which does do it's job:
SELECT COUNT(*)
FROM survey
WHERE users = ? AND table = ? AND col = ? AND row = ? AND selected = ?
GROUP BY users,table,col,row,selected
As evident by the "?" i'm using MySQLi (in php) to fetch the data when needed, but i fear this is causing it to be so slow.
The table consists of all the elements above (+ an unique ID) and all of them are integers.
To explain some of the fields:
Each survey was divided into 3 or 4 tables (sized from 2x3 to 5x5) with a 1 to 10 happiness grade to select form. (questions are on the right and top of the table, then you answer where the questions intersect)
users - age groups
table, row, col - explained above
selected - dooooh explained above
Now with the surveys complete and around 1 million entries in the table the query is getting very slow. Sometimes it takes like 3 minutes, sometimes (i guess) the time limit expires and you get no data at all. I also don't have access to the full database, just my empty "testing" one since the costumer is kinda paranoid :S (and his server seems to be a bit slow)
Now (after the initial essay) my questions are: I left indexing out intentionally because with a lot of data being written during the survey, it would be a bad idea. But since no new data is coming in at this point, would it make sense to index all the fields of a table? How much sense does it make to index integers that never go above 10? (as you can guess i haven't got a clue about indexes). Do i need the primary unique ID in this table? I
I read somewhere that indexing may help groups but only if you group by the first columns in a table (and since my ID is first and from my point of view useless can i remove it and gain anything by it?)
Is there another way to write my query that would basically do the same thing but in a shorter period of time?
Thanks for all your suggestions in advance!
Add an index on entries that you "GROUP BY" or do "WHERE". So that's ONE index incorporating users,table,col,row and selected in your case.
Some quick rules:
combine fields to have the WHERE first, and the GROUP BY elements last.
If you have other queries that only use part of it (e.g. users,table,col and selected) then leave the missing value (row, in this example) last.
Don't use too many indexes/indeces, as each will slow the table to updates marginally - so on really large system you need to balance queries with indexes.
Edit: do you need the GROUP BY user,col,row as these are used in the WHERE. If the WHERE has already filtered them out, you only need group by "selected".

hiding model data based on id's existance in another table

I've got a somewhat complicated question for you cakephp experts.
Basically, I have created a db table called "locations". Every month I will get this table sent to me in csv format from a client. Unfortunately, instead of updating this table, I will have to empty it and reimport all of the records. Unfortunately, I cannot alter this table at all.
Functionality wise, users will have the ability to look at a display of these records, and be able to choose to hide certain ones. This "hidden" attribute must be persistent and survive the month to month purging of all records.
I had all of this working yesterday. What I did was, create a separate table called location_properties (columns were: id(int), location_id(foreign key), is_hidden(boolean)). When showing these records, it would simply check to see if "is_hidden==true".
This was all well and good(AND WORKING!), but then my boss kind of gummed up the works. He told me to delete the "is_hidden" column from the table because it would be more efficient. That I should be able to simply check for the existence of the location_id to hide or show it.
It doesn't appear to be quite that simple. Anyone know how I can pull this off? I've tried everything I can think of.
Your boss is wrong.
It's more efficient to add your column, than it is too delete and re-import the locations every month.
Did he say it was less efficient, or did you do an actual benchmark to see if its harms performance too much?
At first glance I see 2 solutions:
1) add a condition array('Location.id' => 'NOT NULL')
2) change join type to right join
I hope this helps

PHP join help with two tables

I am just learning php as I go along, and I'm completely lost here. I've never really used join before, and I think I need to here, but I don't know. I'm not expecting anyone to do it for me but if you could just point me in the right direction it would be amazing, I've tried reading up on joins but there are like 20 different methods and I'm just lost.
Basically, I hand coded a forum, and it works fine but is not efficient.
I have board_posts (for posts) and board_forums (for forums, the categories as well as the sections).
The part I'm redoing is how I get the information for the last post for the index page. The way I set it up is that to avoid using joins, I have it store the info for latest post in the table for board_forums, so say there is a section called "Off Topic" there I would have a field for "forum_lastpost_username/userid/posttitle/posttime" which I woudl update when a user posts etc. But this is bad, I'm trying to grab it all dynamically and get rid of those fields.
Right now my query is just like:
`SELECT * FROM board_forums WHERE forum_parent='$forum_id''
And then I have the stuff where I grab the info for that forum (name, description, etc) and all the data for the last post is there:
$last_thread_title = $forumrow["forum_lastpost_title"];
$last_thread_time = $forumrow["forum_lastpost_time"];
$lastpost_username = $forumrow["forum_lastpost_username"];
$lastpost_threadid = $forumrow["forum_lastpost_threadid"];
But I need to get rid of that, and get it from board_posts. The way it's set up in board_posts is that if it's a thread, post_parentpost is NULL, if it's a reply, then that field has the id of the thread (first post of the topic). So, I need to grab the latest post_date, see which user posted that, THEN see if parentpost is NULL (if it's null then the last post is a new thread, so I can get all the info of the title and user there, but if it's not, then I need to get the info (title, id) of the first post in that thread (which can be found by seeing what post_parentpost is, looking up that ID and getting the title from it.
Does that make any sense? If so please help me out :(
Any help is greatly appreciated!!!!
Updating board___forums whenever a post or a reply is inserted is - regarding performance - not the worst idea. For displaying the index page you only have to select data from one table board_forums - this is definitely much faster than selecting a second table to get the "last posts' information", even when using a clever join.
You are better off just updating the stats on each action, New Post, Delete Post etc.
The other instances would not likely require any stats update (deletion of a thread would trigger a forum update, to show one less topic in the topic count).
Think about all the actions the user would do, in most cases, you dont need to update any stats, therefore, getting the counts on the fly is very inefficient and you are right to think so.
It looks like you've already done the right thing.
If you were to join, you'd do it like this:
SELECT * FROM board_forums
JOIN board_posts ON board_posts.forum_id = board_forums.id
WHERE forum_parent = '$forum_id'
The problem with that, is that it gets you every post, which is not useful (and very slow). What you would want to do is something like this
SELECT * FROM board_forums
JOIN board_posts ON board_posts.forum_id = board_forums.id ORDER BY board_posts.id desc LIMIT 1
WHERE forum_parent = '$forum_id'
except SQL doesn't work like that. You can't order or limit on a join (or do many other useful things like that), so you have to fetch every row and then scan them in code (which sucks).
In short, don't worry. Use joins for the actual case where you do want to load all forums and all posts in one hit.
The simple solution will result in numerous queries, some optional, as you're already discovered.
The classic approach to this is to cache the results, and only retrieve it once in a while. The cache doesn't have to live long; even two or three seconds on a busy site will make a significant difference.
De-normalizing the data into a table you're already reading anyway will help. This approach saves you figuring out optional queries and can be a bit of a cheap win because it's just one more update when an insert is already happening. But it shifts some data integrity to the application.
As an aside, you might be running into the recursive-query problem with your threads. Relational databases do not store heirarchical data all that well if you use a "simple" algorithim. A better way is something sometimes called 'set trees'. It's a bit hard to Google, unfortunately, so here are some links.

Categories