I have a database with over 1,000 users, each with a points total that ranks them on skill. I'm going to take the user with the most points and base my rankings on that alone. So, say the top player has 900 points; I will create six "divisions" based on that. The higher the division, the smaller its percentage and the harder it is to get into. With that said, let me show you an example.
Here are the six divisions, ordered from best to worst:
Master
Platinum
Gold
Silver
Bronze
Iron
So, the player with the most points would already be in the "Master" division. Then I want to create percentages based off of these divisions. So, for example, here is how I plan on setting up the percentages.
Master 5%
Platinum 10%
Gold 15%
Silver 20%
Bronze 25%
Iron 25%
So as you can see, I need to be able to get the percentages of this and use it on the player who has the most points. So in this case, I need to be able to display this on any user's profile. So I need to take the user's points of the profile I am looking at and compare it with the player who has the most points and form a division for the given player's profile I am looking at.
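function rank($rank, $mostpoints) {
    // Fraction of the top player's points that this user has (0.0 to 1.0)
    $count = $mostpoints > 0 ? $rank / $mostpoints : 0;
    return $count;
}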
I am aware it isn't much. I know I can finish the rest off with IF statements; however, I want to know the best way to take percentages of this number. I need to be able to take the current rank and spit out a division. I realized, however, that I can't just take the percentage for Master (5%), divide it by the most points, and have that map to Master. I need a way to break $mostpoints into six sections based on the percentages and then use IF statements to see which section the user belongs in based on their rank. Can anyone provide feedback on the most efficient way to do this?
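One possible reading of the requirement, as a minimal sketch: stack the percentages from the top of the 0..$mostPoints range downward, so Master covers the top 5% of the range, Platinum the next 10%, and so on. The function name `division()` and this stacking interpretation are assumptions, not something the question confirms.

```php
<?php
// Hypothetical helper: maps a user's points to a division name.
// Assumption: each division covers its percentage of the range 0..$mostPoints,
// stacked from the top down (Master = top 5%, Platinum = next 10%, ...).
function division(float $points, float $mostPoints): string
{
    if ($mostPoints <= 0) {
        return 'Iron'; // no meaningful scale yet
    }
    $divisions = [
        'Master'   => 5,
        'Platinum' => 10,
        'Gold'     => 15,
        'Silver'   => 20,
        'Bronze'   => 25,
        'Iron'     => 25,
    ];
    $floor = 100; // lower bound of the current division, in percent of $mostPoints
    foreach ($divisions as $name => $percent) {
        $floor -= $percent;
        // Compare points/mostPoints >= floor/100 without dividing
        if ($points * 100 >= $floor * $mostPoints) {
            return $name;
        }
    }
    return 'Iron'; // negative points fall through to the lowest division
}
```

With a top score of 900, the lower bounds work out to 95%, 85%, 70%, 50%, 25% and 0% of 900, so a single loop replaces the chain of IF statements.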
I am working on a website where users can bet on events with a variable win chance. One of the requests is to display the "Luck factor" of a certain user, based on their bets.
Here is the definition of the Luck factor:
The luck percentage displayed shows how many bets you have won compared to how many you 'should' have won. For example, if you play 10 times with a 10% chance of winning and win two of the 10 bets, your luck will show as 200%, since you have won twice as many as you 'should' have. Bet size is not taken into account when calculating luck, so it is possible to have a luck less than 100% and still show a profit if your winning bets risked more than your losing bets.
Here is my (MySQL) database structure:
Table bet
Columns:
winchance (0.01 - 99.99)
win (true/false)
The application is written in php, but I am sure a pseudocode example would push me to the right direction.
If I understand your question correctly, you can take the average winning probability from the MySQL winchance column (it is stored as a percentage, so divide it by 100 first) and the real winning ratio, which is (number of wins / total number of bets). Given these two values, the luck factor is real ratio / average win chance * 100.
For instance, if the average win chance is 0.1 and the real winning ratio is 2 / 10 = 0.2, then the luck factor is 0.2 / 0.1 * 100 = 200%. This can easily be calculated with MySQL's built-in functions.
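The formula above could be sketched in PHP as follows. The function name `luckFactor()` and the row shape are assumptions; the rows are meant to mirror the `bet` table, with `winchance` as a percentage and `win` as a boolean. Dividing the win count by the sum of the win chances is the same as dividing the real ratio by the average win chance, because the number of bets cancels out.

```php
<?php
// Sketch only: $bets mirrors rows of the `bet` table, each with
// 'winchance' (percent, 0.01 - 99.99) and 'win' (true/false).
function luckFactor(array $bets): ?float
{
    $expectedWins = 0.0; // how many bets the user "should" have won
    $actualWins = 0;
    foreach ($bets as $bet) {
        $expectedWins += $bet['winchance'] / 100; // percent -> probability
        if ($bet['win']) {
            $actualWins++;
        }
    }
    if ($expectedWins <= 0) {
        return null; // no bets, so luck is undefined
    }
    return $actualWins / $expectedWins * 100;
}
```

If `win` is stored as 0/1, the same calculation can probably be pushed into the database with something like `SELECT SUM(win) / SUM(winchance / 100) * 100 FROM bet`, filtered to the user in question.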
I'm trying to figure out how to build a specific algorithm (ultimately implemented in PHP, but that's less important), but I'm having a hard time wrapping my head around the best way to do the math. Instead of defining a complex industry-specific process, I'll use a crazy metaphor here (the math is what's important). Imagine you're trying to identify the percent chance a specific make of car is parked in a store's parking lot based on the items sold within the store. To begin you take a physical survey of 100,000 store parking lots, recording each unique car make spotted outside, each unique item sold within the store, and a fixed percent relevance that item has to the store (ex: lumber has an 89% relevance to Home Depot, but pencils only have a 23% relevance to Walmart).
There are two parts to what I’m trying to solve. First, I’m trying to figure out the best way to roll-up this data to a specific item, while respecting each relevance percent and the number of confirmed observations (so one spotting doesn’t equal 100% chance, similar to http://www.evanmiller.org/how-not-to-sort-by-average-rating.html ). In other words, if a brand new, never-before-seen store is selling Waterford glasses and cashmere sweaters, from those items we can predict there’s an 89% chance a Mercedes is in the parking lot.
So to recap:
Each item has been seen a specific number of times in a store. For each of those times, there is a different product/store relevance percentage and a list of all car makes in the parking lot. How do I best mathematically calculate the percent chance a specific make is in the parking lot of a brand new store, only based on the items within?
Now the second part of this is getting a bit more complicated by adding another layer of abstraction. If a single person visits 50 stores, and we aggregate all the items in all those stores, we can predict what type of car they drive (ex: lots of camping and hiking stores, so they have a 67% chance of driving a Jeep). Then if they visit a new store and are exposed to a brand new item, for which we have no data, I need to apply that 67% Jeep onto the new item (still respecting the relevance of that item to the store). Then use that item’s less-than-certain Jeep statistic to influence our predictions of parking lots that contain that new item (which was never directly measured). Perhaps this requires us to add a confidence interval of some kind? Or how can we represent that uncertainty, without every one of the millions of items we analyze eventually averaging out to 50%?
I REALLY appreciate your help on this!
I think you need to build a cross-correlation matrix in which the rows are goods and the columns are car types. Each cell contains a normalized coefficient describing how strongly a given good (e.g. a diamond ring) is related to a car type (Geo or Mercedes).
For details, see:
http://en.wikipedia.org/wiki/Cross-correlation
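As a rough illustration of such a goods-by-car-type table, here is a simplified sketch that uses a relevance-weighted co-occurrence frequency as the normalized coefficient. This is a stand-in chosen for brevity, not a true cross-correlation, and the function name and the shape of `$observations` are assumptions.

```php
<?php
// Sketch only. $observations is a list of surveyed stores, each shaped like
// ['items' => [itemName => relevance between 0 and 1], 'cars' => [make, ...]].
function cooccurrenceMatrix(array $observations): array
{
    $weightSeen = []; // item => total relevance weight observed
    $weightWith = []; // item => make => relevance weight observed alongside that make
    foreach ($observations as $obs) {
        $cars = array_unique($obs['cars']);
        foreach ($obs['items'] as $item => $relevance) {
            $weightSeen[$item] = ($weightSeen[$item] ?? 0.0) + $relevance;
            foreach ($cars as $make) {
                $weightWith[$item][$make] = ($weightWith[$item][$make] ?? 0.0) + $relevance;
            }
        }
    }
    $matrix = []; // item => make => coefficient between 0 and 1
    foreach ($weightWith as $item => $byMake) {
        foreach ($byMake as $make => $weight) {
            $matrix[$item][$make] = $weightSeen[$item] > 0
                ? $weight / $weightSeen[$item]
                : 0.0;
        }
    }
    return $matrix;
}
```

Each cell then reads as "of the relevance-weighted sightings of this item, what share had this make in the lot". It does not address the small-sample problem the question raises; that would need a separate adjustment.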
I have a PHP rating system (1-5) in which judges rate products. I want the results for these products to be fair. What normally happens is that some judges are very strict and only rate products in the range 1-2, while others only rate products in the range 4-5. Some judge properly across the full 1-5 range.
Can someone give me an idea, or help me create an algorithm, that accounts for harsh judges by scaling the judges' ratings before computing the product score?
I thought of taking the mean of the judges' scores across all products, but is that the way forward, or does someone have a better alternative for getting fair results?
Edit
The rating system is not for an ecommerce application. Here there are only a few judges, say 10, who rate all the products. A product may be a song in a contest, for example. Some of the judges may be very strict and some very liberal. There may be several contests, so I have to record the ratings of these very strict and very liberal judges across contests and set a rule for them.
Simply put, you assign a weight to a judge based on the range of their typical votes (note, they must not be aware of this weight, or they will throw the system off.) Judges who always vote a single score get the lowest weight. Judges that give things a wide range of scores are considered more accurate.
This also assumes that these judges judge products with a fair range of quality; so if you give them a bunch of good or bad products and expect a range of vote levels, it might be unrealistic.
What you're looking for is the judge with the highest standard deviation (highest variation) in votes having the highest weight, whereas the judge with the lowest would have the least.
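A minimal sketch of that weighting, assuming each judge's weight is simply the (population) standard deviation of their own votes and the product score is the weighted average of the judges' scores. The function names and the `$ratings` layout are illustrative, not taken from any existing code.

```php
<?php
// Population standard deviation of a list of numbers.
function stdDev(array $values): float
{
    $n = count($values);
    if ($n === 0) {
        return 0.0;
    }
    $mean = array_sum($values) / $n;
    $sumSq = 0.0;
    foreach ($values as $v) {
        $sumSq += ($v - $mean) ** 2;
    }
    return sqrt($sumSq / $n);
}

// $ratings: judge => [product => score]. Returns product => weighted score,
// or null for a product when every judge who rated it has zero spread.
function weightedScores(array $ratings): array
{
    $weights = [];
    foreach ($ratings as $judge => $scores) {
        $weights[$judge] = stdDev(array_values($scores)); // wide range => high weight
    }
    $totals = [];
    $weightSums = [];
    foreach ($ratings as $judge => $scores) {
        foreach ($scores as $product => $score) {
            $totals[$product] = ($totals[$product] ?? 0.0) + $weights[$judge] * $score;
            $weightSums[$product] = ($weightSums[$product] ?? 0.0) + $weights[$judge];
        }
    }
    $result = [];
    foreach ($totals as $product => $total) {
        $result[$product] = $weightSums[$product] > 0
            ? $total / $weightSums[$product]
            : null;
    }
    return $result;
}
```

Note that a judge who always gives the same score ends up with weight zero here; adding a small minimum weight would keep them from being ignored entirely.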
The non-algorithmic solution is (essentially) to run the algorithm on the judges, and then pick, American Idol style, judges that balance each other off to get what feels like an accurate result. In which case, you'd want to note the average vote as well as the standard deviation, and perhaps set three judges, one with the wide standard deviation, and then two narrows, one high and one low (liberal and strict) to judge it. This way they don't feel like they get 'less voice' because they are stricter or looser.
Then again, that could be an impetus for them to be less/more strict - if they are too easy or too hard on the product consistently they 'lose voice'.
It sounds like you may be trying to apply an algorithmic solution to a non-algorithmic problem. I'd think about why some "judges" vote only 1-2 and others vote only 4-5.
One possible cause could be self-selection. For example, people who bought an item online may be more likely to review the item if they were particularly disappointed or particularly pleased with their purchase. If this is your problem, you could try to encourage shoppers to vote more, so that even those who had a non-extreme experience come back to vote.
Another possible issue may be guidance. Maybe your explanation of the rating system isn't clear to the judges. You can try to add a description of what each rating means, and see if that improves the quality of data.
In summary, any kind of a solution to your rating problem will need to have a "human" component and take into account the full story of how the judges choose ratings and why. There is not a whole lot that a ranking algorithm can do if your input data is poor quality. On the other hand, if your data has decent quality, then taking a mean works quite well.
One unrelated problem with taking a mean is that an item with one 5-star rating will rank above an item with a hundred 5-star ratings plus one 4-star rating. One simple solution is Laplace smoothing, which addresses the problem by effectively starting every item with one vote of each value (1, 2, 3, 4, 5). You don't display the "smoothed" values, but you use them when sorting. See the "How Not To Sort By Average Rating" post for an alternate solution.
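Laplace smoothing as described here is short enough to sketch directly. The function name `smoothedMean()` is made up for illustration.

```php
<?php
// Mean rating after adding one phantom vote of each value 1..$maxStars.
// Intended for sorting only, not for display.
function smoothedMean(array $votes, int $maxStars = 5): float
{
    $sum = array_sum($votes);
    $count = count($votes);
    for ($star = 1; $star <= $maxStars; $star++) {
        $sum += $star; // one phantom vote of this value
        $count++;
    }
    return $sum / $count;
}
```

A single 5-star vote gives (5 + 15) / 6, roughly 3.33, while a hundred 5-star votes plus one 4-star vote give 519 / 106, roughly 4.90, so the well-reviewed item now sorts first.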
How about truncated mean? Here is a good explanation of the idea.
EDIT
Let's say you have votes like: [1,4,3,2,5,1,1,3,2,4].
You need to sort the array in ascending order, giving you: [1,1,1,2,2,3,3,4,4,5].
Then let's say you want to get rid of 25% of the votes, which is 3 (rounding up). You simply discard three votes from the left and from the right, giving you [2,2,3,3].
Then, use arithmetic mean to get 2.5.
EDIT 2
Depending on your database schema, you could query the database to return the votes in ascending order. Then calculate how many votes to discard from each end, use array_slice() to trim the array (read the documentation), and take the arithmetic mean of what remains.
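Put together, the steps from both edits might look like the sketch below. The function name `truncatedMean()` and the choice to return null when nothing is left after trimming are assumptions.

```php
<?php
// Discards $fraction of the votes (rounded up) from each end of the
// sorted list and averages what remains.
function truncatedMean(array $votes, float $fraction = 0.25): ?float
{
    sort($votes); // ascending order
    $n = count($votes);
    $cut = (int) ceil($n * $fraction); // votes to drop from each side
    $keep = $n - 2 * $cut;
    if ($keep <= 0) {
        return null; // too few votes to trim this much
    }
    $kept = array_slice($votes, $cut, $keep);
    return array_sum($kept) / count($kept);
}
```

For the votes in the example, `truncatedMean([1,4,3,2,5,1,1,3,2,4])` drops three votes from each end, keeps [2,2,3,3] and returns 2.5.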
I'm designing a site and don't know how to design the logic of its rating system.
The outcome I want is for an item with 4 stars from 1000 votes to be ranked higher than an item with a single 5-star vote. However, I don't want an item with 1 star from 1000 votes to be ranked higher than an item with 4 stars from 200 votes.
Anyone have any ideas or advice on what to do?
I found these two questions
Sorting by weighted rating in SQL?
MySQL Rating System - Find Rating
and they have their drawbacks and in the first one I don't understand what the winner means by "You may want to denormalize this rating value into event for performance reasons if you have a lot of ratings coming in." Please share some insight? Thank you!
Here's a quick sketch-up of such a system which works by defining a bonus factor xₙ for each flag number. According to your question you want:
x₄*4*1000 > x₅*1*5
and
x₁*1*1000 < x₄*4*200
Setting the factors to, for example, x₁=1, x₄=2 and x₅=2 will satisfy this, but you will of course want to adjust them and add the missing factors.
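In code, that scoring rule could look like the sketch below. The function name and the default factor of 1.0 for the star values the answer leaves undefined (2 and 3) are assumptions.

```php
<?php
// score = bonus factor for the star value * stars * number of votes
function rankingScore(int $stars, int $votes, array $factors): float
{
    $factor = $factors[$stars] ?? 1.0; // assumption: missing factors default to 1
    return $factor * $stars * $votes;
}

// Example factors from the answer: x1 = 1, x4 = 2, x5 = 2
$factors = [1 => 1.0, 4 => 2.0, 5 => 2.0];
```

With these factors, 4 stars from 1000 votes scores 8000 against 10 for a single 5-star vote, and 1 star from 1000 votes scores 1000 against 1600 for 4 stars from 200 votes, so both inequalities hold.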
He means, you should put rating-data into the event-table (and thus have redundant data) to optimize it for performance.
See the wiki for Denormalization: http://en.wikipedia.org/wiki/Denormalization
The data you have to determine the rank of items is:
average rating
number of ratings
The hard part is probably to make rules for the ranking. Like: If the average rating for an item > 4 and the number of ratings < 4 treat it like rated 3.9
For convenience, I would put this value (how to treat the items for ranking) in the item-table.
What I am hoping to achieve is the ability to generate 'teams' of users. I will have x amount of men, weighted (decimal skill weight, like 75.23) and y amount of women (also with a skill weight value).
Given that list of users, I would then take for input the number of teams to make (let us say, 6 teams). Then, I go through the list of x's and y's and organize them so that the best average possible weighted teams are created. I would like to keep the teams balanced (women and men ratio)
I don't want 'stacked' teams, (best skilled in one team). I would like an even distribution of weight.
Curious how I could achieve this in PHP? I'd be using a MySQL database to fetch users with weight values. I would know ahead of time how many users I would have, also how many teams I would want to generate.
I would appreciate any suggestions, or links to a solution if anyone has found something similar like this. I'm just not a math wiz, so I don't know what formula would apply here.
Thanks. I appreciate any input!
EDIT
After reviewing the answers, maybe I was not clear enough, so hopefully this helps a little more.
I want the teams to be roughly equally-sized
I want the average (mean) skill score for each team to be roughly equal
I want the ratio of men to women in each team to be roughly equal (that is to say, if by division we get a distribution of 5 men and 3 women per team, I would like to keep that roughly the same). Not really an issue if I sort men first and women second (or vice versa).
I don't want a linear approach (team 1 gets highest, team 2, sec highest, team 3.. so on). Tim's method of taking (if 6 teams) 6 people and randomizing and then distributing via linear fashion seems to work out fine.
I'm not entirely clear what you're after here, so I'll recap on what I understand you to be asking. If this is not right, you can clarify your requirements by editing your question:
You have a list of a certain number of men and a certain number of women. Each person has a known skill score. You want to divide these into a certain number of teams, with the following aims:
you want the teams to be roughly equally-sized
you want the average (mean) skill score for each team to be roughly equal
you want the ratio of men to women in each team to be roughly equal
I would have thought that a simple method to achieve this would be:
Create a list of all the men in decreasing order of skill score.
Create a list of all the women in decreasing order of skill score.
Add the list of women to the end of the list of men.
Start at the beginning of the combined list, and allocate each person in turn to a team in a round-robin fashion. (That is to say, allocate the first person to team number one, the second to team number two, and so on until you have allocated one person to each of the teams you wish to create. Then start again with team one, allocating people to each team in order, and so on.)
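The four steps above can be sketched in PHP as follows. The function name `allocateTeams()` and the `'name'`/`'skill'` keys are assumptions about how the rows fetched from MySQL might look.

```php
<?php
// $men and $women are lists of ['name' => string, 'skill' => float].
// Returns $teamCount lists of people, filled round-robin.
function allocateTeams(array $men, array $women, int $teamCount): array
{
    if ($teamCount < 1) {
        throw new InvalidArgumentException('At least one team is required.');
    }
    $bySkillDesc = function (array $a, array $b): int {
        return $b['skill'] <=> $a['skill']; // highest skill first
    };
    usort($men, $bySkillDesc);
    usort($women, $bySkillDesc);

    // Women are appended after the men, then everyone is dealt out in turn.
    $everyone = array_merge($men, $women);
    $teams = array_fill(0, $teamCount, []);
    foreach ($everyone as $i => $person) {
        $teams[$i % $teamCount][] = $person;
    }
    return $teams;
}
```

Because the dealing order never resets between the men and the women, team sizes differ by at most one person even when the number of men is not a multiple of the team count.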
With this approach, you will be guaranteed the following outcomes:
If possible (i.e. if the number of teams divides the total number of people), the teams will all have the same number of people.
If the teams are not all the same size, the largest team will have exactly one more person than the smallest team.
If possible the teams will all have the same number of men.
If the teams do not have the same number of men, the team with the most men will have exactly one more man than the team with the fewest men.
If possible the teams will all have the same number of women.
If the teams do not have the same number of women, the team with the most women will have exactly one more woman than the team with the fewest women.
Each team will have men with a range of skill scores, from near the top of the range to near the bottom of the range.
Each team will have women with a range of skill scores, from near the top of the range to near the bottom of the range.
With sensible data, the mean skill score for each team will be roughly equal (although team one will have a slightly higher mean score than team two, and so on - there are ways of correcting this).
If this simple approach doesn't meet your requirements, please let us know what else you had in mind.
This is similar to "maximum/minimum weight perfect matching", just that the matching is for more than two elements (note that this is a different weight from what you have (the skill weight), namely, you would assign a weight to a matching (a matching would be a proposed 'team')).
The known algorithms for the perfect matching above (e.g., Edmonds' algorithm) might not be adaptable to the group case. I would perhaps look into some simulated annealing technique or a simple genetic algorithm.
If the number of people in each group (x, y) is relatively even, and the total number of people is relatively high, random sampling should work quite well. See here on how to select random rows from a MySQL database:
http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_rand
Slight edit: to ensure fairness, personally I'd do something like this. Say you know you want n members per team. Then create a local variable holding n*mean, where mean is the average skill level per person. Then, when you're randomly selecting your team members, do so within that limit.
E.g.
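function pickTeam(array $pool, int $n, float $mean): array {
    $limit = $n * $mean;      // skill budget for one team
    $team = [];
    $teamSkill = 0.0;
    shuffle($pool);           // visit candidates in random order
    foreach ($pool as $person) {
        if (count($team) === $n) {
            break;            // team is full
        }
        if ($teamSkill + $person['skill'] > $limit) {
            continue;         // would exceed the budget, try the next candidate
        }
        $team[] = $person;
        $teamSkill += $person['skill'];
    }
    return $team;             // may hold fewer than $n people if the budget runs out
}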