Calculating poll votes - php

I got a poll on my website and 5 stars rating:
1 star - 1 (worst)
2 stars - 2
3 stars - 3
4 stars - 4
5 stars - 5 (best)
Now, how should I store the poll records in MySQL? How to calculate them?
Default rate value is 5, but if user would rate it 1 star, it should change this value to 1 instead and then start to calculating it somehow... First I need an idea on how to store the votes in my database. You probably have more experience with that.

Store votes in a separate table, this way you will have record on who has voted.
user_id, topic_id , vote, date will be enough for now. Calculating is easy sum all votes divide by the total number of votes related to the topic. This will give you the average . In case you want it to show as 1-5 you can round() it. In order not to do this calculation every time you load a topic you can store it in a field in the topics table and update that field each time you add/remove record from the votes table.

Just store the votes in an integer field (1 to 5) in the table, combined with other info (eg to make sure the user can vote only once).
When you want to show the result, you use the cast votes, eg to calculate an average, or other statistics.
Recalculating (and storing) the statistics after each vote is cast, is also possible but not really required, unless you have much more page views than votes cast then it might result in less resource usage. (This also depending on the complexity of your statical calculations of course)

Related

Get lowest sum and highest sum of users from MySQL

I have a MySQL database that is holding users activities on the site. Each time the user completes something I log the amount of time it took them to complete. The field is stored as a decimal.
I am wanting to know how I can get the lowest sum and the highest sum amounts of users. Lets say user1 has performed 2 tasks taking .5 each. Their sum would be 1.0. User2 has performed 10 tasks each taking 1.5 each. Their sum would be 15. User3 has performed 20 tasks each taking .25 each. Their sum would be 5.
So running a query over the DB the lowest amount would be 1 and the highest amount would be 15.
I know how to get the sum of columns but not sure how to return the lowest and the highest.
Thanks.
You can do this with a subquery and two levels of aggregation:
select min(totaltime), max(totaltime)
from (select user, sum(amountoftime) as totaltime
from t
group by user
) t;
Actually getting the users associated with the min and max is a bit more difficult, but that is not this question.

MYSQL sorting content by rating logic and opinion?

I'm designing a site and don't know how to rate the system in terms of logic.
Outcome is I want an item with 4 stars with 1000 votes to be ranked higher than an item with 1 vote of 5 stars. However, I don't want an item with 1 star with 1000 votes to be ranked higher than an item with 4 stars and 200 votes.
Anyone have any ideas or advice on what to do?
I found these two questions
Sorting by weighted rating in SQL?
MySQL Rating System - Find Rating
and they have their drawbacks and in the first one I don't understand what the winner means by "You may want to denormalize this rating value into event for performance reasons if you have a lot of ratings coming in." Please share some insight? Thank you!
Here's a quick sketch-up of such a system which works by defining a bonus factor xₙ for each flag number. According to your question you want:
x₄*4*1000 > x₅*1*5
and
x₁*1*1000 < x₄*4*200
Setting the factors to for example x₁=1, x₄=2 and x₅=2 will satisfy this, but you will of course want to adjust it and add the missing factors.
He means, you should put rating-data into the event-table (and thus have redundant data) to optimize it for performance.
See the wiki for Denormalization: http://en.wikipedia.org/wiki/Denormalization
The data you have to determine the rank of items is:
average rating
number of ratings
The hard part is probably to make rules for the ranking. Like: If the average rating for an item > 4 and the number of ratings < 4 treat it like rated 3.9
For convenience, I would put this value (how to treat the items for ranking) in the item-table.

Making more recent items more likely to be drawn

There are a few hundred of book records in the database and each record has a publish time. In the homepage of the website, I am required to write some codes to randomly pick 10 books and put them there. The requirement is that newer books need to have higher chances of getting displayed.
Since the time is an integer, I am thinking like this to calculate the probability for each book:
Probability of a book to be drawn = (current time - publish time of the book) / ((current time - publish time of the book1) + (current time - publish time of the book1) + ... (current time - publish time of the bookn))
After a book is drawn, the next round of the loop will minus the (current time - publish time of the book) from the denominator and recalculate the probability for each of the remaining books, the loop continues until 10 books have been drawn.
Is this algorithm a correct one?
By the way, the website is written in PHP.
Feel free to suggest some PHP codes if you have a better algorithm in your mind.
Many thanks to you all.
Here's a very similar question that may help: Random weighted choice The solution is in C# but the code is very readable and close to PHP syntax so it should be easy to adapt.
For example, here's how one could do this in MySQL:
First calculate the total age of all books and store it in a MySQL user variable:
SELECT SUM(TO_DAYS(CURDATE())-TO_DAYS(publish_date)) FROM books INTO #total;
Then choose books randomly, weighted by their age:
SELECT book_id FROM (
SELECT book_id, TO_DAYS(CURDATE())-TO_DAYS(publish_date) AS age FROM books
) b
WHERE book_id NOT IN (...list of book_ids chosen so far...)
AND RAND()*#total < b.age AND (#total:=#total-b.age)
ORDER BY b.publish_date DESC
LIMIT 10;
Note that the #total decreases only if a book has passed the random-selection test, because of short-circuiting of AND expressions.
This is not guaranteed to choose 10 books in one pass -- it's not even guaranteed to choose any books on a given pass. So you have to re-run the second step until you've found 10 books. The #total variable retains its decreased value so you don't have to recalculate it.
First off I think your formula will guarantee that earlier books get picked. Try to set your initial probabilities based on:
Age - days since publication
Max(Age) - oldest book in the sample
Book Age(i) - age of book i
... Prob (i) = [Max (age) + e - Book Age (i)] / sum over all i [ Max (age) + e - Book age(i) ]
The value e ensures that the oldest book has some probability of being selected. Now that that is done, you can always recalc the prob of any sample.
Now you have to find an UNBIASED way of picking books. Probably the best way would be to calculate the cumulative distribution using the above then pick a uniform (0,1) r.v. Find where that r.v. is in the cumulative distribution and pick the book nearest to it.
Can't help you on the coding. Make sense?

Build a Ranking

I have a newssystem where you can rate News with 1 to 5 stars. In the Database i save the count, the sum and the absolute rating as int up to 100 (for html output, so 5 stars would be 100 1 star would be 20percent.
Now i have three toplists:
Best Rated
Most viewed
Most commented
Last two ones are simple, but the first is kinda tricky.
Before i took that thing over it was all a big mess, and they just put the 5 best rated news there, so in fact if there was a news rated 4.995 with 100k votes and another one with 5 stars at 1 vote, the "better rated" one is on top even if that is obv ridiculous.
For the first moment i capped the list so only news with a certain amount of votes (like 10 or 20) can be in the list.
But i do not really like that. Is there a nice method to kind-a give those things a "weight" with the count or something like that?
Have you considered using a weighted bayesian rating system? It'll weight the results based on the number of votes and the vote values themselves.
You could explore the statistical confidence in the rating perhaps based around the average rating received for all entries and the standard deviation of all votes. While an entry has an average rating of 5, if you only have a few votes then you may not be able to say with more than 90% confidence that the actual rating is above 4.7 say. You can then rate the entries based upon the rating for which you have 90% confidence.
I'm not sure if this meets your requirement of being simple.
You could use median of the user ratings as the total rating.
You would have five fields with eatch article, each one containing how many times the article was rated as n stars. Then you would select the field with the biggest value of all these and that would be your rating. It has the advantage of ignoring the outliers in the ratings.

Popularity Algorithm

I'm making a digg-like website that is going to have a homepage with different categories. I want to display the most popular submissions.
Our rating system is simply "likes", like "I like this" and whatnot. We basically want to display the submissions with the highest number of "likes" per time. We want to have three categories: all-time popularity, last week, and last day.
Does anybody know of a way to help? I have no idea how to go about doing this and making it efficient. I thought that we could use some sort of cron-job to run every 10 minutes and pull in the number of likes per the last 10 minutes...but I've been told that's pretty inefficient?
Help?
Thanks!
Typically Digg and Reddit-like sites go by the date of the submission and not the times of the votes. This way all it takes is a simple SQL query to find the top submissions for X time period. Here's a pseudo-query to find the 10 most popular links from the past 24 hours using this method:
select * from submissions
where (current_time - post_time) < 86400
order by score desc limit 10
Basically, this query says to find all the submissions where the number of seconds between now and the time it was posted is less than 86400, which is 24 hours in UNIX time.
If you really want to measure popularity within X time interval, you'll need to store the post and time for every vote in another table:
create table votes (
post foreign key references submissions(id),
time datetime,
vote integer); -- +1 for upvote, -1 for downvote
Then you can generate a list of the most popular posts between X and Y times like so:
select sum(vote), post from votes
where X < time and time < Y
group by post
order by sum(vote) desc limit 10;
From here you're just a hop, skip, and inner join away from getting the post data tied to the returned ids.
Do you have a decent DB setup? Can we please hear about your CREATE TABLE details and indices? Assuming a sane setup, the DB should be able to pull the counts you require fast enough to suit your needs! For example (net of indices and keys, that somewhat depend on what DB engine you're using), given two tables:
CREATE TABLE submissions (subid INT, when DATETIME, etc etc)
CREATE TABLE likes (subid INT, when DATETIME, etc etc)
you can get the top 33 all-time popular submissions as
SELECT *, COUNT(likes.subid) AS score
FROM submissions
JOIN likes USING(subid)
GROUP BY submissions.subid
ORDER BY COUNT(likes.subid) DESC
LIMIT 33
and those voted for within a certain time range as
SELECT *, COUNT(likes.subid) AS score
FROM submissions
JOIN likes USING(subid)
WHERE likes.when BETWEEN initial_time AND final_time
GROUP BY submissions.subid
ORDER BY COUNT(likes.subid) DESC
LIMIT 33
If you were storing "votes" (positive or negative) in likes, instead of just counting each entry there as +1, you could simply use SUM(likes.vote) instead of the COUNTs.
For stable list like alltime, lastweek, because they are not supposed to change really fast so that I think you should save the list in your cache with expiration time is around 1 days or longer.
If you concern about correct count in real time, you can check at every page view by comparing the page with lowest page in the cache.
All you need to do is to care for synchronizing between the cache and actual database.
thethanghn
Queries where the order is some function of the current time can become real performance problems. Things get much simpler if you can bucket by calendar time and update scores for each bucket as people vote.
To complete nobody_'s answer I would suggest you read up on the documentation (if you are using MySQL of course).

Categories