Rating Formula Design (conceptual not coding) - php

One element of my site is a rating system. I am puzzled a bit by how best to set up the formula and I hope someone with more math aptitude can help me.
Users upload pictures that are rated 1-10 by other users. A user's rating is then the average of those scores.
Simple enough. However, I want to add some system which rewards users for uploading more pictures, so that the formula would be the average of ratings + some function of the number of pictures uploaded.
An example might be: Rating = AVG + .05 * Count
This formula would be somewhat fair for users who have uploaded 1-20 pictures... However, if users upload 2000 pictures they will have bypassed the entire rating system and will automatically have a 10/10.
So, my limited knowledge of post-algebra math is failing. What would be some formula that would produce the desired effect? The word "log" keeps bouncing around in my head, but I honestly can't remember anything about why... :)

Just do something like:
avg + numofpics*scale*(.9^numofpics)
This makes each additional picture count for less and less as they upload more. You can change .9 (the rate of decay) depending on how many pictures you expect the average user to upload.
This is the equation used for half-life decay.
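As a quick sanity check of how this bonus behaves, here is a minimal Python sketch. The names `upload_bonus` and `rating` and the default `scale` of 1.0 are hypothetical, chosen only for illustration. With a decay of .9 the total bonus rises until about 9 or 10 pictures and then shrinks back toward zero, so the decay constant is worth picking with the expected upload counts in mind.

```python
def upload_bonus(num_pics, scale=1.0, decay=0.9):
    """Bonus term numofpics * scale * decay^numofpics from the formula above."""
    return num_pics * scale * decay ** num_pics


def rating(avg, num_pics, scale=1.0, decay=0.9):
    """Average score plus the upload bonus (not capped at 10 in this sketch)."""
    return avg + upload_bonus(num_pics, scale, decay)
```

Printing `upload_bonus(n)` for a few values of `n` (say 1, 5, 10, 20, 50) is an easy way to see where the bonus peaks for a given decay.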

you could do something like:
Rating = Average + (\sum_{i=1}^{numofuploads} 1/i) * scalefactor
The sum grows to infinity, but it grows very slowly.
Edit:
The idea is basically the same as with #maxhud's solution: you add fewer points to your rating for every additional picture. For simplicity, say the scaling is 1/3, and for now I use exact rather than floating-point math:
1 -> avg + (1/1)/3 = avg + 1/3
2 -> avg + (1/1+1/2)/3 = avg + (3/2)/3 = avg + 1/2
3 -> avg + (3/2+1/3)/3 = avg + (11/6)/3 = avg + 11/18 ~ avg + .61111
4 -> ...
Technically the series (1+1/2+1/3+…) goes to infinity, but you'd have to upload a huge number of pictures to go over 50, so choose your scaling factor carefully and give it a bit of thought. If you want a maximum number of points that can be achieved via uploading, this is the WRONG solution. You should rather go with something like
avg + scaling*(1 + .9 + .9^2 + … + .9^(n-1))
where n is the number of pictures. If you could upload infinitely many pictures you would have
avg + scaling*(1/(1-.9)) = avg + 10*scaling
for your rating, which is, as I commented, much better.
ps: I think #maxhud should leave
avg + numofpics*scale*(.9^numofpics)
^^^^^^^^^
because after uploading 10 pictures you have outweighed your shrinking growth function.
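The two bonus shapes discussed in this answer can be compared with a short Python sketch. It uses exact fractions, as the worked example does. The function names are hypothetical, and the capped variant is read here as a geometric sum starting at 1, which is an assumption that matches the stated limit of 10*scaling.

```python
from fractions import Fraction


def harmonic_bonus(num_uploads, scale):
    """scale * (1 + 1/2 + ... + 1/n): grows without bound, but very slowly."""
    return scale * sum(Fraction(1, i) for i in range(1, num_uploads + 1))


def geometric_bonus(num_uploads, scale, ratio=Fraction(9, 10)):
    """scale * (1 + r + ... + r^(n-1)): never exceeds scale / (1 - r)."""
    # Closed form of the finite geometric sum (assumes ratio != 1).
    return scale * (1 - ratio ** num_uploads) / (1 - ratio)
```

With a scale of 1/3, `harmonic_bonus` gives 1/3, 1/2 and 11/18 for one, two and three uploads, while `geometric_bonus` stays below 10 times the scale no matter how many pictures are uploaded.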

Consider an alternative approach: you want to reward users by increasing their rating for uploading images, but perhaps a user is uploading many, many images that are not rated highly. Do you want to reward them if the majority of their images are poor? Consider Stack Overflow as an example: you can answer many, many questions, but if they are not considered "good" by the rest of the community your reputation will not increase, no matter how many answers you provide.
It is possible that this is not how you want to do it and that you want to reward quantity rather than quality, but should you decide the opposite you could try something like
UserRating = 10 * (AverageRating/10 + Scalefactor * ((AverageRating/10)^2 * ImageUploadCount))
You choose the scale factor to be what you want and obviously limit the rating to a maximum of 10. This way, you reward multiple image uploads, but you reward users with higher quality image uploads too. Consider someone who automates uploading images with some type of web bot, where all images are considered poor by users: do you want to reward that? This way you can reward multiple uploads, but better quality uploads are treated more favourably. Maybe not what you are looking for, but perhaps worth considering. Only you can decide.
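A minimal Python sketch of this quality-weighted formula follows. The function and parameter names are hypothetical, and the scale factor of 0.01 in the usage note is an arbitrary illustration rather than a recommended value.

```python
def user_rating(average_rating, upload_count, scale_factor):
    """Average rating plus an upload bonus that grows with the squared quality."""
    quality = average_rating / 10.0  # normalise the 1-10 average to 0-1
    raw = 10.0 * (quality + scale_factor * quality ** 2 * upload_count)
    return min(10.0, raw)  # limit the rating to a maximum of 10
```

With a scale factor of 0.01 and 10 uploads, an average of 8 gains 0.64 points while an average of 2 gains only 0.04, so the bonus mostly benefits users whose pictures are already rated well.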

Related

Ranking players based off the percentages

I have a database that has over 1,000 users and points that rank them on skill. I'm going to take the user with the most points and base off my rankings on that alone. So, say the player has 900 points, I will create 5 "divisions" based off that. The higher the division, the lesser the percentages and harder it is to get into. So, with that being said, let me show you an example.
Here are the divisions, ordered from best to worst:
Master
Platinum
Gold
Silver
Bronze
Iron
So, the player with the most points would already be in the "Master" division. Then I want to create percentages based off of these divisions. So, for example, here is how I plan on setting up the percentages.
Master 5%
Platinum 10%
Gold 15%
Silver 20%
Bronze 25%
Iron 25%
So as you can see, I need to be able to get the percentages of this and use it on the player who has the most points. So in this case, I need to be able to display this on any user's profile. So I need to take the user's points of the profile I am looking at and compare it with the player who has the most points and form a division for the given player's profile I am looking at.
function rank($rank, $mostpoints) {
    $count = $rank / $mostpoints;
}
I am aware it isn't really much. But I know I can finish the rest off with IF statements, however, I want to know the best way to take percents of this number. I need to be able to take the current rank and spit out a division. I realized, however, that I can't just take the percent of Master (5%) and divide it by the most points and have it allocate as the Master. I need a way to be able to break the mostpoints into 5 sections based on the percentages and then do if statements to see if the user belongs in them based on the rank. Can anyone provide any feedback on the most efficient way to do this?
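One way to avoid a chain of IF statements is to walk a table of band widths. The Python sketch below is only one possible reading of the question: it assumes each percentage is the width of a band of the top score, stacked from the top down, so Master covers 95-100% of the most points, Platinum 85-95%, and so on. The names `DIVISIONS` and `division` are hypothetical.

```python
# Band widths in percent of the top score, best division first (from the question).
DIVISIONS = [
    ("Master", 5),
    ("Platinum", 10),
    ("Gold", 15),
    ("Silver", 20),
    ("Bronze", 25),
    ("Iron", 25),
]


def division(points, most_points):
    """Return the division whose band contains points / most_points."""
    floor_pct = 100
    for name, width in DIVISIONS:
        floor_pct -= width  # lower edge of this band, in percent
        # Integer comparison avoids floating-point drift at the band edges.
        if points * 100 >= floor_pct * most_points:
            return name
    return DIVISIONS[-1][0]  # negative points fall into the lowest division
```

With 900 as the top score, 855 points is still Master, 854 is Platinum and 450 is Silver. Changing the band widths only requires editing the table.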

How to calculate luck factor based on bet winchance/result?

I am working on a website where users can bet on events with variable win chance. One of the requests is to display the "Luck factor" of a certain user, based on their bets.
Here is the definition of the Luck factor:
The luck percentage displayed shows how many bets you have won compared to how many you 'should' have won. For example, if you play 10 times with a 10% chance of winning and win two of the 10 bets, your luck will show as 200%, since you have won twice as many as you 'should' have. Bet size is not taken into account when calculating luck, so it is possible to have a luck less than 100% and still show a profit if your winning bets risked more than your losing bets.
Here is my (MySQL) database structure:
Table bet
Columns:
winchance (0.01 - 99.99)
win (true/false)
The application is written in php, but I am sure a pseudocode example would push me to the right direction.
If I understand your question correctly, you can take the average win probability from the winchance column in MySQL (dividing by 100, since it is stored as a percentage) and the real winning ratio, which is (number of wins / total number of bets). Given these two values, the luck factor is real ratio / average win chance * 100.
For instance, if the average win chance is 0.1 and the real winning ratio is 2 / 10 = 0.2, then the luck factor is 0.2/0.1 * 100 = 200%. This can easily be calculated with MySQL's built-in functions.
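The calculation can be sketched in Python as follows. This is an illustration rather than a finished implementation: the function name `luck_factor` and the input shape (a list of `(winchance, win)` pairs with winchance in percent, as in the bet table) are assumptions. Dividing actual wins by the sum of win probabilities is the same as dividing the real ratio by the average win chance.

```python
def luck_factor(bets):
    """Luck in percent for an iterable of (winchance_percent, won) pairs.

    Returns None when there are no bets to judge.
    """
    bets = list(bets)
    # Expected number of wins: sum of the individual win probabilities.
    expected_wins = sum(chance / 100.0 for chance, _ in bets)
    actual_wins = sum(1 for _, won in bets if won)
    if expected_wins == 0:
        return None
    return 100.0 * actual_wins / expected_wins
```

Ten bets at a 10% win chance with two wins give a luck of about 200%, matching the definition quoted in the question.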

Adding an extra factor (number of clicks) to a Bayesian ranking system

I run a music website for amateur musicians where we have a rating system based on a score out of 10, which is then calculated into an overall score out of 100. We have a "credibility" points system for users which directly influences the average score at the point of rating, but the next step is to implement a chart system which uses this data effectively.
I'll try and explain exactly how it all works so you can see which data I have at my disposal.
A site member rates a track between 1 and 10.
That site member has a "credibility" score, which is just a total of points accumulated for various activities around the site. A user gains, for example, 100 points for giving a rating so the more ratings they give, the higher their "credibility" score. Only the total credibility score is saved in the database, updated each time a user performs an activity with a points reward attached. These individual activities are not stored.
Based on the credibility of this user compared to other users who have rated the track, a weighted average is calculated for the track, which is then stored as a number between 1 and 100 in the tracks table.
In the tracks table, the number of times a track is listened to (i.e. number of plays) is also stored as a total.
So the data I have to work with is:
Overall rating for the track (number between 1 and 100)
Number of ratings for the track
Number of plays for the track
In the chart system I want to create a ranking that uses the above 3 sets of data to create a fair balance between quality (overall rating, normalized with number of ratings) and popularity (number of plays). BUT the system should factor quality more heavily than popularity, so for example the quality aspect makes up 75% of the normalized ranking and popularity 25%.
After a search on this site I found the IMDB Bayesian-style system which is helpful for working out the quality aspect, but how do I add in the popularity (number of plays) and have it balanced in the way I want?
The site is written in PHP and MySQL if that helps.
EDIT: the title says "number of clicks" but this is basically the direct equivalent of "number of plays".
You may want to try the following. The IMDB equation you mentioned uses weighting to lean toward either the average rating of the movie or the average rating of all movies:
WR = (v/(v+m)) × R + (m/(v+m)) × C
So
v << m => v/(v+m) -> 0; m/(v+m) -> 1 => WR -> C
and
v >> m => v/(v+m) -> 1; m/(v+m) -> 0 => WR -> R
This should generally be fair. Calculating a popularity score between 0 and 100 based on the number of plays is pretty tricky unless you really know your data. As a first try, calculate the average number of plays avg(p) and the variance var(p); you can then use these to scale the number of plays using a technique called whitening:
WHITE(p) = (p - avg(p)) / sqrt(var(p))
Assuming your data looks like a bell curve, this will give most tracks a score between -1 and 1. You can then scale this to the range 0 - 100 by scaling again, clamping anything that falls outside:
POP = 50 * (1 + WHITE(p))
To combine the scores based on some weighting factor w (e.g. 0.75) you'd simply do:
RATING = w × WR + (1 - w) × POP
Play with these and let me know how you get on.
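Put together, the steps above might look like the following Python sketch. In the IMDB formula, R is the track's own average rating, v its number of ratings, C the mean rating across all tracks, and m a tunable minimum-ratings threshold. The function names are hypothetical, and two choices here are assumptions: the plays are standardised by the standard deviation (the usual z-score), and the popularity score is clamped to 0-100.

```python
from statistics import mean, pstdev


def weighted_rating(R, v, m, C):
    """IMDB-style Bayesian average: leans toward C when v is small, R when large."""
    return (v / (v + m)) * R + (m / (v + m)) * C


def popularity(plays, all_plays):
    """Map a play count to 0-100 using a z-score over all tracks' play counts."""
    spread = pstdev(all_plays)
    if spread == 0:
        return 50.0  # every track has the same play count
    z = (plays - mean(all_plays)) / spread
    return min(100.0, max(0.0, 50.0 * (1.0 + z)))


def chart_score(wr, pop, w=0.75):
    """Blend quality (wr) and popularity (pop), both on a 0-100 scale."""
    return w * wr + (1 - w) * pop
```

For example, a track rated 80 from 10 ratings, with m = 10 and a site-wide mean of 60, gets a weighted rating of 70; with an average play count its popularity is 50, so the chart score is 65.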
NOTE: this does not account for the fact that a user can "game" the popularity by playing a track many times. You could get around this by penalising multiple plays of a single song:
deltaP = (1 - (Puser - 1)/TPuser)
Where:
deltaP = Change in # plays
Puser = number of times this user has played this track
TPuser = total number of tracks (not unique) played by the user
So the more times a user plays just the one track, the less it counts toward the total number of plays for that track. If the user's listening habits are diverse then TPuser will be large and so deltaP will tend back to 1. This can still be gamed but is a good start.
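As a small Python sketch of the penalty, with a hypothetical function name and a guard for users with no recorded plays that is not part of the formula above:

```python
def play_increment(plays_of_track_by_user, total_plays_by_user):
    """deltaP = 1 - (Puser - 1) / TPuser: how much one more play should count."""
    if total_plays_by_user <= 0:
        return 1.0  # assumption: with no history, count the play in full
    return 1.0 - (plays_of_track_by_user - 1) / total_plays_by_user
```

A first play always counts as 1, while the eleventh play of a track by a user with 20 plays in total counts as only 0.5.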

Determine Click-thru Percentage With PHP

I am currently working on a small sponsorship application (PHP/MySQL) for my personal blog and am almost finished, but I am stuck on how to calculate the click-thru rate of my sponsors' campaigns.
I was always terrible at working out percentages, so any practical help would be appreciated. The data is stored in the DB as simple numbers, so, as expected, when a page refreshes or a sponsor's ad is clicked, the relevant counter is incremented by 1.
So using these values, say $clicks and $impressions, how would I determine the click-thru rate? What would be the sum I would use to calculate it? An example function would really be appreciated.
Kind Regards, Lea
What you're after is the percentage of $impressions that leads to $clicks. The ratio is found by calculating $clicks/$impressions, and then you can multiply by 100 to see the percentage.
As an example, if there are 100 impressions and 1 click, the ratio will be 1/100 = 0.01, and the percentage will be 0.01 * 100 = 1%.
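A minimal example function, sketched in Python, is shown below. The name `click_through_rate` is hypothetical, and returning 0 when there are no impressions yet is an assumption made to avoid dividing by zero.

```python
def click_through_rate(clicks, impressions):
    """Click-thru rate as a percentage of impressions."""
    if impressions <= 0:
        return 0.0  # assumption: no impressions yet means a rate of 0%
    return 100.0 * clicks / impressions
```

With 1 click out of 100 impressions this returns 1.0, i.e. 1%.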

Popularity Algorithm

I'd like to populate the homepage of my user-submitted-illustrations site with the "hottest" illustrations uploaded.
Here are the measures I have available:
How many people have favourited that illustration
    votes table includes date voted
When the illustration was uploaded
    illustration table has date created
Number of comments (not so good, as max comments total about 10 at the moment)
    comments table has comment date
I have searched around, but don't want user authority to play a part, but most algorithms include that.
I also need to find out if it's better to do the calculation in the MySQL that fetches the data or if there should be a PHP/cron method every hour or so.
I only need 20 illustrations to populate the home page. I don't need any sort of paging for this data.
How do I weight age against votes? Surely a site with fewer submissions needs less weight on date added?
Many sites that use some type of popularity ranking do so by using a standard algorithm to determine a score and then decaying eternally over time. What I've found works better for sites with less traffic is a multiplier that gives a bonus to new content/activity - it's essentially the same, but the score stops changing after a period of time of your choosing.
For instance, here's a pseudo-example of something you might want to try. Of course, you'll want to adjust how much weight you're attributing to each category based on your own experience with your site. Comments are rare, but take more effort from the user than a favorite/vote, so they probably should receive more weight.
score = (votes / 10) + comments
age = UNIX_TIMESTAMP() - UNIX_TIMESTAMP(date_created)
if(age < 86400) score = score * 1.5
This type of approach would give a bonus to new content uploaded in the past day. If you wanted to approach this in a similar way only for content that had been favorited or commented on recently, you could just add some WHERE constraints on your query that grabs the score out from the DB.
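The pseudo-example above translates to Python roughly as follows. The function name and the `bonus` parameter are hypothetical. The weights (votes divided by 10, comments counted in full, a 1.5 multiplier for the first day) are the ones from the pseudo-example and are meant to be tuned.

```python
DAY_SECONDS = 86400


def hot_score(votes, comments, age_seconds, bonus=1.5):
    """Base score from votes and comments, boosted while the content is new."""
    score = votes / 10 + comments
    if age_seconds < DAY_SECONDS:
        score *= bonus  # content uploaded within the past day gets a bonus
    return score
```

An illustration with 20 votes and 3 comments scores 7.5 during its first day and 5.0 afterwards, after which the score no longer changes with age.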
There are actually two big reasons NOT to calculate this ranking on the fly.
Requiring your DB to fetch all of that data and do a calculation on every page load just to reorder items results in an expensive query.
Probably a smaller gotcha, but if you have a relatively small amount of activity on the site, small changes in the ranking can cause content to move pretty drastically.
That leaves you with either caching the results periodically or setting up a cron job to update a new database column holding this score you're ranking by.
Obviously there is some subjectivity in this - there's no one "correct" algorithm for determining the proper balance - but I'd start out with something like votes per unit age. MySQL can do basic math so you can ask it to sort by the quotient of votes over time; however, for performance reasons, it might be a good idea to cache the result of the query. Maybe something like
SELECT images.url FROM images ORDER BY (SELECT COUNT(*) FROM votes WHERE votes.image_id = images.id) / (UNIX_TIMESTAMP() - UNIX_TIMESTAMP(images.date)) DESC LIMIT 20
but my SQL is rusty ;-)
Taking a simple average will, of course, bias in favor of new images showing up on the front page. If you want to remove that bias, you could, say, count only those votes that occurred within a certain time limit after the image being posted. For images that are more recent than that time limit, you'd have to normalize by multiplying the number of votes by the time limit then dividing by the age of the image. Or alternatively, you could give the votes a continuously varying weight, something like exp(-time(vote) + time(image)). And so on and so on... depending on how particular you are about what this algorithm will do, it could take some experimentation to figure out what formula gives the best results.
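The continuously varying weight can be sketched in Python like this. The function name is hypothetical, timestamps are assumed to be in seconds, and the `time_scale` divisor (one day by default) is an added assumption that controls how quickly a vote's weight fades with the delay between upload and vote.

```python
import math


def decayed_votes(image_time, vote_times, time_scale=86400.0):
    """Sum of votes, each weighted by exp(-(time(vote) - time(image)) / time_scale)."""
    return sum(math.exp(-(t - image_time) / time_scale) for t in vote_times)
```

A vote cast at the moment of upload counts as 1, and a vote cast one `time_scale` later counts as about 0.37.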
I've no useful ideas as far as the actual algorithm is concerned, but in terms of implementation, I'd suggest caching the result somewhere, with a periodic update. If the computation results in an expensive query, you probably don't want to slow your response times.
Something like:
(count favorited + k) / time since last activity
The higher k is, the less weight the number of people who have favorited it carries.
You could also change the time to something like the time it first appeared + the time of the last activity; this would ensure that older illustrations vanish with time.
