Build a Ranking

Build a Ranking - php

I have a news system where you can rate news items with 1 to 5 stars. In the database I save the vote count, the sum, and the absolute rating as an int up to 100 (for HTML output, so 5 stars would be 100 and 1 star would be 20 percent).
Now I have three top lists:
Best Rated
Most viewed
Most commented
The last two are simple, but the first is kind of tricky.
Before I took it over it was all a big mess: they just put the 5 best-rated news items there. So if one item was rated 4.995 with 100k votes and another had 5 stars from 1 vote, the nominally "better rated" single-vote item ended up on top, which is obviously ridiculous.
As a first step I capped the list so that only news items with a certain number of votes (like 10 or 20) can appear in it.
But I do not really like that. Is there a nice method to give the ratings a "weight" based on the count, or something like that?

Have you considered using a weighted Bayesian rating system? It weights the results based on the number of votes and the vote values themselves.
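
One common form of such a weighted rating, as a minimal sketch: every item starts with a few "virtual" votes at the site-wide average, so items with few real votes stay close to that average. The function name, the `$priorVotes` knob and the global mean of 3.5 are assumptions for illustration, not part of the original system.

```php
<?php
// Hypothetical helper: Bayesian (weighted) average of a star rating.
// $sum, $count: the item's stored vote sum and vote count.
// $globalMean:  average rating across all items (assumed to be computed elsewhere).
// $priorVotes:  number of "virtual" votes at the global mean (a tuning knob).
function bayesianRating(int $sum, int $count, float $globalMean, float $priorVotes = 10.0): float
{
    return ($priorVotes * $globalMean + $sum) / ($priorVotes + $count);
}

// 100,000 votes averaging 4.995 versus a single 5-star vote, global mean 3.5:
$popular = bayesianRating(499500, 100000, 3.5);
$single  = bayesianRating(5, 1, 3.5);
```

With these numbers the heavily voted item keeps a score just under 5, while the single-vote item is pulled down towards the global mean, so it no longer tops the list.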

You could look at the statistical confidence in each rating, based on the average rating received across all entries and the standard deviation of all votes. Even if an entry has an average rating of 5, with only a few votes you may not be able to say with more than 90% confidence that its actual rating is above, say, 4.7. You can then rank the entries by the rating for which you have 90% confidence.
I'm not sure if this meets your requirement of being simple.
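
A rough sketch of that idea, assuming a normal approximation (which is crude for very small vote counts) and a site-wide standard deviation that is computed elsewhere. The function name and parameters are hypothetical; 1.2816 is the z-value for roughly 90% one-sided confidence.

```php
<?php
// Hypothetical helper: lower end of an approximate one-sided confidence
// interval for an item's true mean rating (normal approximation).
// $stdDev: standard deviation of all votes site-wide (assumed known).
// $z:      1.2816 corresponds to roughly 90% one-sided confidence.
function ratingLowerBound(int $sum, int $count, float $stdDev, float $z = 1.2816): float
{
    if ($count === 0) {
        return 0.0; // no votes yet: rank at the bottom
    }
    $mean = $sum / $count;
    return $mean - $z * $stdDev / sqrt($count);
}

$fewVotes  = ratingLowerBound(15, 3, 1.0);          // three 5-star votes
$manyVotes = ratingLowerBound(499500, 100000, 1.0); // 100k votes averaging 4.995
```

Sorting by this lower bound instead of the plain average puts the item with many votes ahead of the one with three perfect votes.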

You could use the median of the user ratings as the total rating.
You would have five fields with each article, each one containing how many times the article was rated with n stars. Strictly speaking, selecting the field with the biggest value gives the mode (the most frequent rating); to get the median, add up the fields from 1 star upward and take the star value at which the running total reaches half of all votes. Either way, it has the advantage of ignoring the outliers in the ratings.
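
As a sketch of how a median could be read off five per-star counters: accumulate the counts from 1 star upward and stop once half of the votes are covered. The function name and the array shape are assumptions; for an even number of votes this returns the lower of the two middle values.

```php
<?php
// $counts maps a star value (1..5) to how many times it was given.
// Returns the (lower) median star value, or null if there are no votes.
function medianStars(array $counts): ?int
{
    ksort($counts); // make sure we walk from 1 star up to 5 stars
    $total = array_sum($counts);
    if ($total === 0) {
        return null;
    }
    $half = $total / 2;
    $running = 0;
    foreach ($counts as $stars => $n) {
        $running += $n;
        if ($running >= $half) {
            return $stars;
        }
    }
    return null;
}

// 17 votes in total, most of them 4 stars:
$median = medianStars([1 => 1, 2 => 0, 3 => 2, 4 => 10, 5 => 4]);
```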

Related

Calculating poll votes

I have a poll on my website with a 5-star rating:
1 star - 1 (worst)
2 stars - 2
3 stars - 3
4 stars - 4
5 stars - 5 (best)
Now, how should I store the poll records in MySQL, and how do I calculate the rating from them?
The default rating value is 5, but if a user rates it 1 star, the value should change to 1 and the calculation should start from there somehow... First I need an idea of how to store the votes in my database. You probably have more experience with that.

Store the votes in a separate table; this way you will have a record of who has voted.
The columns user_id, topic_id, vote and date will be enough for now. Calculating is easy: sum all votes related to the topic and divide by the number of those votes. This gives you the average. If you want to show it as 1-5 you can round() it. To avoid doing this calculation every time you load a topic, you can store the result in a field in the topics table and update that field each time you add or remove a record in the votes table.
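
To illustrate the cached-average idea without a database, here is a small sketch that keeps a running count and sum and refreshes the stored average on each new vote. The array keys stand in for hypothetical columns of the topics table; a real implementation would also have to handle removed votes and concurrent updates.

```php
<?php
// Hypothetical in-memory stand-in for one row of the topics table.
function applyVote(array $topic, int $vote): array
{
    if ($vote < 1 || $vote > 5) {
        throw new InvalidArgumentException('vote must be between 1 and 5');
    }
    $topic['vote_count'] += 1;
    $topic['vote_sum']   += $vote;
    // Cached average, so that page loads do not have to recompute it.
    $topic['rating'] = $topic['vote_sum'] / $topic['vote_count'];
    return $topic;
}

$topic = ['vote_count' => 0, 'vote_sum' => 0, 'rating' => null];
foreach ([5, 4, 3] as $vote) {
    $topic = applyVote($topic, $vote);
}
```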

Just store the votes in an integer field (1 to 5) in the table, combined with other info (e.g. to make sure the user can vote only once).
When you want to show the result, you use the cast votes, e.g. to calculate an average or other statistics.
Recalculating (and storing) the statistics after each vote is cast is also possible but not really required, unless you have many more page views than votes cast, in which case it might use fewer resources. (This also depends on the complexity of your statistical calculations, of course.)

MYSQL sorting content by rating logic and opinion?

I'm designing a site and don't know how to design the rating system in terms of logic.
The outcome I want is for an item with 4 stars and 1000 votes to be ranked higher than an item with 1 vote of 5 stars. However, I don't want an item with 1 star and 1000 votes to be ranked higher than an item with 4 stars and 200 votes.
Does anyone have any ideas or advice on what to do?
I found these two questions
Sorting by weighted rating in SQL?
MySQL Rating System - Find Rating
and they have their drawbacks. In the first one, I don't understand what the accepted answer means by "You may want to denormalize this rating value into event for performance reasons if you have a lot of ratings coming in." Could you please share some insight? Thank you!

Here's a quick sketch of such a system, which works by defining a bonus factor xₙ for each star value n and scoring an item as factor * stars * votes. According to your question you want:
x₄*4*1000 > x₅*5*1
and
x₁*1*1000 < x₄*4*200
Setting the factors to, for example, x₁=1, x₄=2 and x₅=2 will satisfy this, but you will of course want to adjust them and add the missing factors.
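
A minimal sketch of this scoring, assuming each item is represented by a whole star value and a vote count. The factors for 1, 4 and 5 stars are the ones from the answer; the values for 2 and 3 stars are placeholders invented for illustration.

```php
<?php
// Bonus factor per star value. x1, x4 and x5 come from the answer above;
// x2 and x3 are placeholder assumptions that would need tuning.
const STAR_FACTORS = [1 => 1.0, 2 => 1.0, 3 => 1.5, 4 => 2.0, 5 => 2.0];

// Score = factor * stars * votes.
function bonusScore(int $stars, int $votes): float
{
    return STAR_FACTORS[$stars] * $stars * $votes;
}

$fourStarsManyVotes = bonusScore(4, 1000); // 8000
$fiveStarsOneVote   = bonusScore(5, 1);    // 10
$oneStarManyVotes   = bonusScore(1, 1000); // 1000
$fourStarsSomeVotes = bonusScore(4, 200);  // 1600
```

Both conditions from the question hold with these factors: 8000 > 10 and 1000 < 1600.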

He means you should put the rating data into the event table (and thus have redundant data) to optimize it for performance.
See the wiki for Denormalization: http://en.wikipedia.org/wiki/Denormalization
The data you have to determine the rank of items is:
average rating
number of ratings
The hard part is probably making the rules for the ranking, for example: if the average rating for an item is > 4 and the number of ratings is < 4, treat it as if it were rated 3.9.
For convenience, I would store this value (how to treat the item for ranking) in the item table.
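
The example rule could be written as a small function whose result is stored alongside the item. The function name is hypothetical and the thresholds are simply the ones from the example rule.

```php
<?php
// Returns the value an item should be ranked by, given its average and vote count.
function rankingValue(float $average, int $count): float
{
    // Example rule: too few ratings to trust a very high average.
    if ($average > 4 && $count < 4) {
        return 3.9;
    }
    return $average;
}

$newcomer    = rankingValue(5.0, 1);    // treated as 3.9
$established = rankingValue(4.0, 1000); // stays 4.0
```

With this rule the 4-star item with 1000 votes ranks above the single 5-star vote.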

Star Rating System, Similar to Amazon

I'm looking at adding a rating system to my site, similar to that seen on Amazon. Basically users can rate the product out of 5 stars.
I've been racking my brains to think of the calculation to get the average customer rating... but I can't think how to do it.
And looking at Amazon's system, I think their calculation may be incorrect. If you take this page for example:
http://www.amazon.co.uk/exec/obidos...2521526-3543861
You can see that the average customer rating should actually be 5 stars as two people have chosen 5 whereas only one has chosen 4.
Any ideas??

You can calculate the average by taking the sum and dividing it by the number of values. In the case of ratings, it is the sum of all ratings divided by the number of ratings.
In the case you cite, with ratings 5, 5 and 4, the average is (5+5+4)/3, which is 4.666..., and I'd guess they round to the nearest half, yielding 4.5 out of 5.
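
The arithmetic can be checked with a short sketch. Rounding to the nearest half star is the guess made above, not a confirmed description of Amazon's method, and the function name is made up for the example.

```php
<?php
// Round an average rating to the nearest half star.
function roundToHalfStar(float $average): float
{
    return round($average * 2) / 2;
}

$average = (5 + 5 + 4) / 3;           // 4.666...
$display = roundToHalfStar($average); // 4.5
```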

Why not try a jQuery plugin?
http://orkans-tmp.22web.net/star_rating/index.html

Making more recent items more likely to be drawn

There are a few hundred book records in the database, and each record has a publish time. On the homepage of the website, I am required to write some code that randomly picks 10 books and displays them there. The requirement is that newer books need to have a higher chance of being displayed.
Since the time is an integer, I am thinking of calculating the probability for each book like this:
Probability of a book to be drawn = (current time - publish time of the book) / ((current time - publish time of book 1) + (current time - publish time of book 2) + ... + (current time - publish time of book n))
After a book is drawn, the next round of the loop subtracts that book's (current time - publish time of the book) from the denominator and recalculates the probability for each of the remaining books. The loop continues until 10 books have been drawn.
Is this algorithm correct?
By the way, the website is written in PHP.
Feel free to suggest some PHP code if you have a better algorithm in mind.
Many thanks to you all.

Here's a very similar question that may help: "Random weighted choice". The solution is in C#, but the code is very readable and close to PHP syntax, so it should be easy to adapt.
For example, here's how one could do this in MySQL:
First calculate the total age of all books and store it in a MySQL user variable:
SELECT SUM(TO_DAYS(CURDATE())-TO_DAYS(publish_date)) FROM books INTO @total;
Then choose books randomly, weighted by their age:
SELECT book_id FROM (
  SELECT book_id, TO_DAYS(CURDATE())-TO_DAYS(publish_date) AS age FROM books
) b
WHERE book_id NOT IN (...list of book_ids chosen so far...)
AND RAND()*@total < b.age AND (@total:=@total-b.age)
ORDER BY b.publish_date DESC
LIMIT 10;
Note that @total decreases only if a book has passed the random-selection test, because of short-circuiting of AND expressions.
This is not guaranteed to choose 10 books in one pass; it's not even guaranteed to choose any books on a given pass. So you have to re-run the second step until you've found 10 books. The @total variable retains its decreased value, so you don't have to recalculate it.

First off, I think your formula will make older books more likely to be picked, since a book's weight grows with its age. Try setting your initial probabilities based on:
Age - days since publication
Max(Age) - age of the oldest book in the sample
Book Age(i) - age of book i
Prob(i) = [Max(Age) + e - Book Age(i)] / sum over all books j of [Max(Age) + e - Book Age(j)]
The small positive value e ensures that the oldest book still has some probability of being selected. With that done, you can always recalculate the probabilities for any remaining sample.
Now you have to find an UNBIASED way of picking books. Probably the best way is to calculate the cumulative distribution from the above, then draw a uniform (0,1) random variable, find where it falls in the cumulative distribution, and pick the book whose interval contains it.
I can't help you with the coding. Does that make sense?
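
Since the question asked for PHP, here is one way the weighting and the cumulative-distribution draw could be sketched. The function name, the `$ages` array shape (book id => age in days) and the default `e = 1.0` are assumptions; picked books are removed so that the same book cannot be drawn twice.

```php
<?php
// $ages: book id => age in days. Returns up to $k distinct ids,
// with newer books more likely to be picked.
function pickBooks(array $ages, int $k, float $e = 1.0): array
{
    if ($ages === []) {
        return [];
    }
    $maxAge = max($ages);
    $weights = [];
    foreach ($ages as $id => $age) {
        $weights[$id] = $maxAge + $e - $age; // newest book gets the largest weight
    }

    $picked = [];
    while (count($picked) < $k && $weights !== []) {
        $total = array_sum($weights);
        $r = mt_rand() / mt_getrandmax() * $total; // uniform in [0, total]
        $running = 0.0;
        $chosen = array_key_last($weights); // fallback guards against float rounding
        foreach ($weights as $id => $w) {
            $running += $w; // walk the cumulative distribution
            if ($r < $running) {
                $chosen = $id;
                break;
            }
        }
        $picked[] = $chosen;
        unset($weights[$chosen]); // draw without replacement
    }
    return $picked;
}

$ids = pickBooks([1 => 10, 2 => 200, 3 => 5, 4 => 400], 3);
```

Recomputing the total inside the loop plays the role of subtracting the drawn book's weight from the denominator, as described in the question.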

How can I create 'teams' from a list of weighted 'users' randomly but fairly using PHP?

What I am hoping to achieve is the ability to generate 'teams' of users. I will have x amount of men, weighted (decimal skill weight, like 75.23) and y amount of women (also with a skill weight value).
Given that list of users, I would then take for input the number of teams to make (let us say, 6 teams). Then, I go through the list of x's and y's and organize them so that the best average possible weighted teams are created. I would like to keep the teams balanced (women and men ratio)
I don't want 'stacked' teams, (best skilled in one team). I would like an even distribution of weight.
Curious how I could achieve this in PHP? I'd be using a MySQL database to fetch users with weight values. I would know ahead of time how many users I would have, also how many teams I would want to generate.
I would appreciate any suggestions, or links to a solution if anyone has found something similar like this. I'm just not a math wiz, so I don't know what formula would apply here.
Thanks. I appreciate any input!
EDIT
After reviewing the answers, maybe I was not clear enough, so hopefully this helps a little more.
I want the teams to be roughly equally-sized
I want the average (mean) skill score for each team to be roughly equal
I want the ratio of men to women in each team to be roughly equal (that is to say, if the division gives a distribution of 5 men and 3 women per team, I would like to keep roughly that). It is not really an issue if I sort men first and women second (or vice versa).
I don't want a linear approach (team 1 gets the highest, team 2 the second highest, team 3 the third, and so on). Tim's method of taking (if there are 6 teams) 6 people, randomizing them and then distributing them in linear fashion seems to work out fine.

I'm not entirely clear what you're after here, so I'll recap on what I understand you to be asking. If this is not right, you can clarify your requirements by editing your question:
You have a list of a certain number of men and a certain number of women. Each person has a known skill score. You want to divide these into a certain number of teams, with the following aims:
you want the teams to be roughly equally-sized
you want the average (mean) skill score for each team to be roughly equal
you want the ratio of men to women in each team to be roughly equal
I would have thought that a simple method to achieve this would be:
Create a list of all the men in decreasing order of skill score.
Create a list of all the women in decreasing order of skill score.
Add the list of women to the end of the list of men.
Start at the beginning of the combined list, and allocate each person in turn to a team in a round-robin fashion. (That is to say, allocate the first person to team number one, the second to team number two, and so on until you have allocated one person to each of the teams you wish to create. Then start again with team one, allocating people to each team in order, and so on.)
With this approach, you will be guaranteed the following outcomes:
If possible (i.e. if the number of teams divides the total number of people), the teams will all have the same number of people.
If the teams are not all the same size, the largest team will have exactly one more person than the smallest team.
If possible the teams will all have the same number of men.
If the teams do not have the same number of men, the team with the most men will have exactly one more man than the team with the fewest men.
If possible the teams will all have the same number of women.
If the teams do not have the same number of women, the team with the most women will have exactly one more woman than the team with the fewest women.
Each team will have men with a range of skill scores, from near the top of the range to near the bottom of the range.
Each team will have women with a range of skill scores, from near the top of the range to near the bottom of the range.
With sensible data, the mean skill score for each team will be roughly equal (although team one will have a slightly higher mean score than team two, and so on - there are ways of correcting this).
If this simple approach doesn't meet your requirements, please let us know what else you had in mind.
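
The round-robin allocation described above could be sketched in PHP as follows. The function name and the person array shape are assumptions. The optional `$shuffleRounds` flag is an addition in the spirit of the randomizing mentioned in the question's edit: it shuffles each group of people before dealing them out, which avoids team one always getting the strongest player but can loosen the exact head-count guarantees slightly.

```php
<?php
// $men / $women: lists of ['name' => string, 'skill' => float] (hypothetical shape).
// Returns $teamCount teams, each a list of people.
function buildTeams(array $men, array $women, int $teamCount, bool $shuffleRounds = false): array
{
    if ($teamCount < 1) {
        throw new InvalidArgumentException('teamCount must be at least 1');
    }
    // Sort each group by skill, best first, then append the women to the men.
    $bySkillDesc = fn(array $a, array $b) => $b['skill'] <=> $a['skill'];
    usort($men, $bySkillDesc);
    usort($women, $bySkillDesc);
    $people = array_merge($men, $women);

    $teams = array_fill(0, $teamCount, []);
    // Each chunk is one "round": at most one person per team.
    foreach (array_chunk($people, $teamCount) as $round) {
        if ($shuffleRounds) {
            shuffle($round); // randomize who goes to which team within this round
        }
        foreach ($round as $slot => $person) {
            $teams[$slot][] = $person;
        }
    }
    return $teams;
}
```

Calling `buildTeams($men, $women, 6)` with rows fetched from the database would give six teams whose sizes differ by at most one person.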

This is similar to "maximum/minimum weight perfect matching", just that the matching is for more than two elements (note that this is a different weight from what you have (the skill weight), namely, you would assign a weight to a matching (a matching would be a proposed 'team')).
The known algorithms for the perfect matching above (e.g., Edmonds' algorithm) might not be adaptable to the group case. I would perhaps look into some simulated annealing technique or a simple genetic algorithm.
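
As a much simpler stand-in for simulated annealing, a plain random-swap hill climb already illustrates the idea: swap two random members between two teams and keep the swap only if the team means become more even. This is a sketch, not a tuned optimizer; the function names are made up, teams are assumed to be 0-indexed lists of plain skill values, and gender is ignored (restricting swaps to members of the same gender would preserve the ratio).

```php
<?php
// Difference between the highest and lowest team mean skill (0 = perfectly even).
function meanSkillSpread(array $teams): float
{
    $means = [];
    foreach ($teams as $team) {
        $means[] = count($team) > 0 ? array_sum($team) / count($team) : 0.0;
    }
    return (float) (max($means) - min($means));
}

// $teams: 0-indexed list of teams, each a list of skill values. Team sizes never change.
function improveBySwapping(array $teams, int $iterations = 2000): array
{
    $teamCount = count($teams);
    if ($teamCount < 2) {
        return $teams;
    }
    $best = meanSkillSpread($teams);
    for ($i = 0; $i < $iterations; $i++) {
        $a = mt_rand(0, $teamCount - 1);
        $b = mt_rand(0, $teamCount - 1);
        if ($a === $b || $teams[$a] === [] || $teams[$b] === []) {
            continue;
        }
        $ia = array_rand($teams[$a]);
        $ib = array_rand($teams[$b]);
        // Build the candidate arrangement with the two members swapped.
        $candidate = $teams;
        $candidate[$a][$ia] = $teams[$b][$ib];
        $candidate[$b][$ib] = $teams[$a][$ia];
        $spread = meanSkillSpread($candidate);
        if ($spread < $best) { // keep only strict improvements
            $teams = $candidate;
            $best = $spread;
        }
    }
    return $teams;
}
```

Unlike simulated annealing, this never accepts a worse arrangement, so it can get stuck in a local optimum; for a few dozen players that is often good enough.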

If the number of people in each group (x, y) is relatively even and the total number of people is relatively high, random sampling should work quite well. See here for how to select random rows from a MySQL database:
http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_rand
A slight edit: to ensure fairness, personally I'd do something like this. Say you know you want n members per team. Then create a local variable holding n*mean, where mean is the average skill level per person. Then, when you're randomly selecting your team members, do so within that limit.
E.g.
// $people: randomly ordered rows, each with a 'skill' value; $n and $mean as described above
$team = [];
$teamSkill = 0;
foreach ($people as $person) {
    if ($teamSkill + $person['skill'] > $n * $mean) {
        continue; // this person would push the team over the limit, try the next one
    }
    $team[] = $person;
    $teamSkill += $person['skill'];
    if (count($team) == $n) {
        break; // team is complete
    }
}
