With all of the daily fantasy games out there, I am looking to see if I can easily implement a platform that will help identify the optimal lineup for a fantasy league based on a salary cap and projected points for each player.
Given a pool of ~500 players, I need to find the highest-scoring lineup that stays within the salary cap and fills the following roster:
1 Quarterback
2 Running Backs
3 Wide Receivers
1 Tight End
1 Kicker
1 Defense
Each player is assigned a salary (that changes weekly) and I will assign projected points for those players. I have this information in a MySQL DB and would prefer to use PHP/PEAR or jQuery if that's the best option for calculating this.
The table looks something like this:
player_id  name          position  salary  ranking  projected_points
1          Joe Smith     QB        1000    2        21.7
2          Jake Plummer  QB        2500    6        11.9
I've tried sorting by projected points and filling in the roster from the top. That gives the highest-scoring team, but it also exceeds the salary cap. I cannot think of a way to have it intelligently swap players out and keep looping until it finds the highest-scoring lineup that satisfies the salary constraint.
So, is there any PHP or PEAR class that you know of that will help "solve" this type of problem? Any articles you can point me to for reference? I'm not asking for someone to do this for me, but I've been Googling for a while and the best solution I currently have is this: http://office.microsoft.com/en-us/excel-help/pick-your-fantasy-football-team-with-solver-HA001124603.aspx and that's using Excel and limited to 200 objects.
I'll suggest two approaches to this problem.
The first is dynamic programming. For brute force, we could initialize a list containing the empty partial team, then, for each successive player, for each partial team currently in the list, add a copy of that partial team with the new player, assuming that this new partial team respects the positional and budget constraints. This is an exponential-time algorithm, but we can reduce the running time by quite a lot (to O(#partial position breakdowns * budget * #players), assuming that all monetary values are integer) if we throw away all but the best possibility so far for each combination of partial position breakdown and budget.
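As a rough illustration of the pruned search described above, here is a minimal sketch. It is written in Python for brevity, but it only uses dictionaries and tuples, so it should port to PHP arrays fairly directly. The function name `best_lineup`, the tuple layout of a player, and the position codes other than "QB" are assumptions made for the example, not anything from the question's schema.

```python
# Roster slots from the question. Player tuples are (name, position, salary, points).
# Position codes such as "K" and "DEF" are assumed for illustration.
ROSTER = {"QB": 1, "RB": 2, "WR": 3, "TE": 1, "K": 1, "DEF": 1}

def best_lineup(players, cap, roster=ROSTER):
    positions = list(roster)
    # State key: (players taken per position, salary spent).
    # State value: (best projected points, names chosen) for that key.
    states = {((0,) * len(positions), 0): (0.0, ())}
    for name, pos, salary, points in players:
        if pos not in roster:
            continue
        i = positions.index(pos)
        updated = dict(states)  # every old state survives (player skipped)
        for (counts, spent), (total, names) in states.items():
            if counts[i] >= roster[pos] or spent + salary > cap:
                continue  # slot already full or cap exceeded
            key = (counts[:i] + (counts[i] + 1,) + counts[i + 1:], spent + salary)
            candidate = (total + points, names + (name,))
            # Keep only the best partial team for each key.
            if key not in updated or candidate[0] > updated[key][0]:
                updated[key] = candidate
        states = updated
    full = tuple(roster[p] for p in positions)
    complete = [v for (counts, _), v in states.items() if counts == full]
    return max(complete, key=lambda v: v[0]) if complete else None
```

The function returns a `(points, names)` pair, or `None` when no full roster fits under the cap. Salaries are assumed to be integers so that equal spend totals collapse into the same state.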
The second is to find an integer programming library callable from PHP that works like Excel's Solver. It looks like (e.g.) lp_solve comes with a PHP interface. Then we can formulate an integer program like so.
maximize sum_{player p} value_p x_p
subject to
sum_{quarterback player p} x_p <= 1
sum_{running back player p} x_p <= 2
...
sum_{defense player p} x_p <= 1
sum_{player p} cost_p x_p <= budget
for each player p, x_p in {0, 1} (i.e., x_p is binary)
Related
Consider I am writing a program to objectively select a winner in a competition. There are 'n' human judges secretly assigning a 1st, 2nd, 3rd position ranking to the top three candidates from a pool of 'm' candidates.
The program must then go through the judges' decisions and, based on weights assigned to 1st place, 2nd place and 3rd place, rate each candidate by the number of 1st, 2nd, and 3rd place votes they received, each multiplied by the weight for that finishing position.
However, at the start, the program has no idea of what weights are appropriate, so I have created an automated "program" that is intended to "discover" the proper weights based on how the judges would pick the winner from a hypothetical situation.
I present a table where the horizontal axis contains the finishing position, and the judges' codes (e.g. Judge W, Judge X, Judge Y, Judge Z). The vertical axis has three rows (1st place, 2nd place, 3rd place), and at the intersection of each Judge/Row, I have randomly generated a candidate ID (from the set A through F).
After rendering the table, I then ask the judge who THEY would have chosen as the winner (the judge has the option to PASS if there is not sufficient information to choose).
After the judges run through an appropriate number of scenarios, I wish to now take the results of the various runs and use that information to determine the "best fit" for the weighting of 1st, 2nd, and 3rd positions.
Let's say one of the hypothetical grids looks like this:
<table border="1"><tbody><tr><th>Position</th><th>Judge 'W'</th><th>Judge 'X'</th><th>Judge 'Y'</th><th>Judge 'Z'</th></tr><tr><td>1st</td><td><center>A</center></td><td><center>F</center></td><td><center>C</center></td><td><center>B</center></td></tr><tr><td>2nd</td><td><center>D</center></td><td><center>B</center></td><td><center>E</center></td><td><center>D</center></td></tr><tr><td>3rd</td><td><center>C</center></td><td><center>E</center></td><td><center>B</center></td><td><center>C</center></td></tr></tbody></table>
and the human judge picks candidate "B" as the winner. My program should react by recording the inequalities (w1 + w2 + w3) > (w1 + 2 w3) (i.e. B better than C) and (w1 + w2 + w3) > (2 w2) (i.e. B better than D), etc.
From these various algebraic comparisons, over a number of "hypothetical scenarios", I want to be able to calculate the optimum values for w1, w2 and w3. And then, at some point when there is enough "good" data, I want to be able to use these "discovered" weights to go back over the training data and identify areas where perhaps the human judges were mistaken.
I am using PHP as the programming language and don't know which functions or possible existing libraries are appropriate to solve this kind of "fuzzy" equation.
I'm looking for some direction to help me tackle this problem.
Thank you for your assistance.
For the winning candidate, count how many times they appear in each position, then do the same for all the other candidates. Then write the following test for each other candidate j:
goodForJ = w1*nw1 + w2*nw2 + w3*nw3 > w1*nj1 + w2*nj2 + w3*nj3
where nw1, nw2, nw3 are the number of times the winner appears in each position and nj1, nj2, nj3 are the number of times candidate j appears in each position.
If goodForJ is true for all the candidates, the tuple of weights is good. Now you just have to try a bunch of weight combinations and find out which one fits. Trying all combinations of integer weights between 1 and 10 requires 1000 iterations.
To make things a bit fuzzier, for each try you could count how many times goodForJ is true and choose the weights that produce the highest count.
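The search described above can be sketched as follows. This is only an illustration, written in Python for brevity (the loops translate directly to PHP). The names `count_satisfied` and `best_weights`, and the scenario layout of a winner plus per-candidate position counts, are assumptions for the example. It counts whole scenarios that the weights explain rather than individual comparisons, which is one possible reading of the "fuzzy" scoring.

```python
from itertools import product

def count_satisfied(weights, scenarios):
    """Count scenarios where the judge's pick strictly outscores every other candidate.

    Each scenario is (winner, tallies), where tallies maps a candidate ID to
    a tuple (times placed 1st, times placed 2nd, times placed 3rd).
    """
    satisfied = 0
    for winner, tallies in scenarios:
        score = {c: sum(w * n for w, n in zip(weights, counts))
                 for c, counts in tallies.items()}
        if all(score[winner] > s for c, s in score.items() if c != winner):
            satisfied += 1
    return satisfied

def best_weights(scenarios, low=1, high=10):
    # Try every (w1, w2, w3) with integer weights in [low, high]: 1000 tuples by default.
    grid = product(range(low, high + 1), repeat=3)
    return max(grid, key=lambda w: count_satisfied(w, scenarios))

# The grid from the question, with the judge picking "B".
example = [("B", {"A": (1, 0, 0), "B": (1, 1, 1), "C": (1, 0, 2),
                  "D": (0, 2, 0), "E": (0, 1, 1), "F": (1, 0, 0)})]
print(best_weights(example))
```

With a single scenario many weight tuples tie, and `max` simply returns the first one it meets; more scenarios narrow the set of tuples that explain the judges' picks.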
I am trying to create a way to rank domains between 1 and 100 based on a bunch of different metrics that range from 1 to 999,999,999. The idea is to use 3 different metrics and come up with a single number that can be accurately used to measure how good or bad a certain domain is.
One of the metrics I am using for this is AlexaRank which ranges from 1 to 999,999 (I think). 1 is obviously better. Another one would be the number of pages indexed by Google, where 1 is bad.
I think the correct way of doing this would be to give a certain base score to the range of numbers. For example, a domain with alexa rank 1 can have a base score of 49.9995, one with AR of 313 can be 46.7648 whereas one with AR of 123000 could be 24.4875 and something with AR 999000 can be 2.5478.
Does anyone know of an exponential formula that I can use for this? It really doesn't matter which language it is in (I prefer PHP); I would just like some examples of the logic. Any ideas are much appreciated.
Thanks
Alexa rank runs from 1 to 99,999,999. If a website is ranked 1, it gets a 100% score.
Google indexed pages (you need to know how many pages the site has in total to work out a percentage): if the indexed count is close to the total, the site gets 100%. If 30 out of 100 total pages are indexed, it gets 30%, obviously.
Let's say that the maximum possible Alexa score is 200 points and the maximum possible Google score is 200 points. Then we can do this math:
if they scored 70% in Alexa, they score 140 here. If they scored 60% in Google indexed pages, they score 120. 140 + 120 = 260 total score.
There are multiple ways of doing this, you just need all the right numbers.
I hope I'm making sense lol.
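To make the percentage arithmetic above concrete, here is a small sketch in Python (the same few lines would work in PHP with `log()`). One thing in it is not from the answer: the Alexa percentage is taken on a logarithmic scale, which is just one assumed way of getting the steep-then-flat curve the question asks for, and it will not reproduce the question's example figures exactly. The function names and the `worst_rank` default are likewise assumptions.

```python
import math

def rank_fraction(rank, worst_rank=999_999):
    """Map rank 1 to 1.0 and worst_rank to 0.0 on a log scale (assumed curve)."""
    rank = min(max(rank, 1), worst_rank)
    return 1.0 - math.log(rank) / math.log(worst_rank)

def indexed_fraction(indexed, total):
    """Share of the site's pages that are indexed, capped at 100%."""
    return min(indexed / total, 1.0) if total > 0 else 0.0

def domain_score(rank, indexed, total, points_each=200):
    # Each metric contributes up to points_each points, as in the answer.
    return (points_each * rank_fraction(rank)
            + points_each * indexed_fraction(indexed, total))
```

Dividing the result by the total points available and multiplying by 100 would bring it back to the 1 to 100 range the question wants.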
I run a music website for amateur musicians where we have a rating system based on a score out of 10, which is then calculated into an overall score out of 100. We have a "credibility" points system for users which directly influences the average score at the point of rating, but the next step is to implement a chart system which uses this data effectively.
I'll try and explain exactly how it all works so you can see which data I have at my disposal.
A site member rates a track between 1 and 10.
That site member has a "credibility" score, which is just a total of points accumulated for various activities around the site. A user gains, for example, 100 points for giving a rating so the more ratings they give, the higher their "credibility" score. Only the total credibility score is saved in the database, updated each time a user performs an activity with a points reward attached. These individual activities are not stored.
Based on the credibility of this user compared to other users who have rated the track, a weighted average is calculated for the track, which is then stored as a number between 1 and 100 in the tracks table.
In the tracks table, the number of times a track is listened to (i.e. number of plays) is also stored as a total.
So the data I have to work with is:
Overall rating for the track (number between 1 and 100)
Number of ratings for the track
Number of plays for the track
In the chart system I want to create a ranking that uses the above 3 sets of data to create a fair balance between quality (overall rating, normalized with number of ratings) and popularity (number of plays). BUT the system should factor quality more heavily than popularity, so for example the quality aspect makes up 75% of the normalized ranking and popularity 25%.
After a search on this site I found the IMDB Bayesian-style system which is helpful for working out the quality aspect, but how do I add in the popularity (number of plays) and have it balanced in the way I want?
The site is written in PHP and MySQL if that helps.
EDIT: the title says "number of clicks" but this is basically the direct equivalent of "number of plays".
You may want to try the following. The IMDB equation you mentioned uses weighting to lean toward either the average rating of the movie or the average rating of all movies:
WR = (v/(v+m)) × R + (m/(v+m)) × C
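Here R is the item's own average rating, v is its number of ratings, m is a tuning constant (the number of ratings at which the two terms carry equal weight), and C is the average rating across all items. So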
v << m => v/(v+m) -> 0; m/(v+m) -> 1 => WR -> C
and
v >> m => v/(v+m) -> 1; m/(v+m) -> 0 => WR -> R
This should generally be fair. Calculating a popularity score between 0 and 100 based on the number of plays is pretty tricky unless you really know your data. As a first try, calculate the average number of plays avg(p) and the standard deviation sd(p) (the square root of the variance var(p)). You can then use these to rescale the number of plays, a technique called standardization (sometimes called whitening):
WHITE(P) = (p - avg(p))/sd(p)
If your data looks like a bell curve, roughly two thirds of the tracks will score between -1 and 1; clamp anything outside that range to -1 or 1. You can then scale this to the range 0 - 100:
POP = 50 * (1 + WHITE(P))
To combine the scores based on some weighting factor w (e.g. 0.75) you'd simply do:
RATING = w × WR + (1 - w) × POP
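Put together, the formulas above might look like the sketch below. It is in Python for brevity; the arithmetic is the same in PHP. Two choices here are assumptions on my part: the play count is divided by the standard deviation (the usual z-score), and the result is clamped to [-1, 1] so that POP stays within 0 to 100. The function names are made up for the example.

```python
from statistics import mean, pstdev

def bayesian_rating(R, v, m, C):
    """WR = (v/(v+m)) * R + (m/(v+m)) * C, with m > 0 as a tuning constant."""
    return (v / (v + m)) * R + (m / (v + m)) * C

def popularity(plays, all_plays):
    """Scale a play count to 0..100 via a clamped z-score (assumed variant)."""
    sd = pstdev(all_plays)
    if sd == 0:
        return 50.0  # every track has the same play count
    z = (plays - mean(all_plays)) / sd
    z = max(-1.0, min(1.0, z))
    return 50.0 * (1.0 + z)

def chart_score(R, v, plays, all_plays, m, C, w=0.75):
    # w is the share given to quality; the remainder goes to popularity.
    return w * bayesian_rating(R, v, m, C) + (1 - w) * popularity(plays, all_plays)
```

For example, a track rated 80 from 10 ratings, with m = 10 and a site-wide average of 60, gets WR = 70; if its play count equals the site average, POP = 50 and the chart score is 0.75 × 70 + 0.25 × 50 = 65.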
Play with these and let me know how you get on.
NOTE: this does not account for the fact that a user can "game" the popularity by playing a track many times. You could get around this by penalising multiple plays of a single song:
deltaP = (1 - (Puser - 1)/TPuser)
Where:
deltaP = change in # plays
Puser = number of times this user has played this track
TPuser = total number of tracks (not unique) played by the user
So the more times a user plays just the one track, the less each play counts toward the total number of plays for that track. If the user's listening habits are diverse then TPuser will be large and so deltaP will tend back to 1. This can still be gamed but is a good start.
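A small sketch of how the penalty could be applied while tallying plays follows, in Python for brevity. It assumes a chronological play log of (user, track) pairs and uses the running counts at the moment of each play for Puser and TPuser; both the log format and that timing choice are assumptions, as is the name `effective_plays`.

```python
from collections import Counter

def effective_plays(play_log, track):
    """Sum deltaP = 1 - (Puser - 1) / TPuser over every play of `track`.

    play_log is a chronological list of (user_id, track_id) pairs.
    """
    plays_by_user = Counter()        # TPuser: all plays by each user so far
    track_plays_by_user = Counter()  # Puser: plays of `track` by each user so far
    total = 0.0
    for user, played in play_log:
        plays_by_user[user] += 1
        if played == track:
            track_plays_by_user[user] += 1
            total += 1.0 - (track_plays_by_user[user] - 1) / plays_by_user[user]
    return total
```

A user who plays only one track three times in a row contributes 1 + 1/2 + 1/3 plays, while a second play that comes after two other tracks still counts for 0.75.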
There are a few hundred book records in the database and each record has a publish time. For the homepage of the website, I am required to write some code to randomly pick 10 books and display them there. The requirement is that newer books need to have a higher chance of being displayed.
Since the time is an integer, I am thinking like this to calculate the probability for each book:
Probability of a book being drawn = (current time - publish time of the book) / ((current time - publish time of book 1) + (current time - publish time of book 2) + ... + (current time - publish time of book n))
After a book is drawn, the next round of the loop subtracts that book's (current time - publish time) from the denominator and recalculates the probability for each of the remaining books. The loop continues until 10 books have been drawn.
Is this algorithm a correct one?
By the way, the website is written in PHP.
Feel free to suggest some PHP code if you have a better algorithm in mind.
Many thanks to you all.
Here's a very similar question that may help: Random weighted choice The solution is in C# but the code is very readable and close to PHP syntax so it should be easy to adapt.
For example, here's how one could do this in MySQL:
First calculate the total age of all books and store it in a MySQL user variable:
SELECT SUM(TO_DAYS(CURDATE())-TO_DAYS(publish_date)) FROM books INTO @total;
Then choose books randomly, weighted by their age:
SELECT book_id FROM (
SELECT book_id, TO_DAYS(CURDATE())-TO_DAYS(publish_date) AS age FROM books
) b
WHERE book_id NOT IN (...list of book_ids chosen so far...)
AND RAND()*@total < b.age AND (@total:=@total-b.age)
ORDER BY b.publish_date DESC
LIMIT 10;
Note that @total decreases only if a book has passed the random-selection test, because of short-circuiting of AND expressions.
This is not guaranteed to choose 10 books in one pass; it's not even guaranteed to choose any books on a given pass. So you have to re-run the second step until you've found 10 books. The @total variable retains its decreased value so you don't have to recalculate it.
First off, I think your formula will favour older books, because the weight (current time - publish time) grows with age. Try to set your initial probabilities based on:
Age - days since publication
Max(Age) - age of the oldest book in the sample
Book Age(i) - age of book i
... Prob(i) = [Max(Age) + e - Book Age(i)] / sum over all i of [Max(Age) + e - Book Age(i)]
The value e (a small positive constant) ensures that the oldest book has some probability of being selected. Now that that is done, you can always recalculate the probabilities for any sample.
Now you have to find an UNBIASED way of picking books. Probably the best way would be to calculate the cumulative distribution using the above, then draw a uniform (0,1) random variable. Find where that value falls in the cumulative distribution and pick the book whose interval contains it.
Can't help you on the coding. Make sense?
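For what it's worth, the cumulative-distribution idea above could be coded roughly like this. The sketch is in Python for brevity (PHP's `mt_rand()` and a simple loop over running totals would do the same job). The function name `pick_books`, the dictionary of ages in days, and the default `e = 1.0` are assumptions for illustration. The weights are computed once from the full sample, and drawn books are removed so that no book is picked twice.

```python
import bisect
import random
from itertools import accumulate

def pick_books(ages, k=10, e=1.0, rng=random):
    """ages maps book_id to age in days. Returns up to k distinct ids, newer favoured."""
    if not ages:
        return []
    oldest = max(ages.values())
    # Weight = Max(Age) + e - Book Age(i): the newest book gets the largest weight.
    weights = {book: oldest + e - age for book, age in ages.items()}
    chosen = []
    while weights and len(chosen) < k:
        ids = list(weights)
        cumulative = list(accumulate(weights[b] for b in ids))
        r = rng.random() * cumulative[-1]
        # First interval whose upper bound exceeds r; min() guards float rounding.
        idx = min(bisect.bisect_right(cumulative, r), len(ids) - 1)
        chosen.append(ids[idx])
        del weights[ids[idx]]  # draw without replacement
    return chosen
```

Passing a seeded `random.Random` instance as `rng` makes the selection repeatable, which helps when testing.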
What I am hoping to achieve is the ability to generate 'teams' of users. I will have x amount of men, weighted (decimal skill weight, like 75.23) and y amount of women (also with a skill weight value).
Given that list of users, I would then take for input the number of teams to make (let us say, 6 teams). Then, I go through the list of x's and y's and organize them so that the best average possible weighted teams are created. I would like to keep the teams balanced (women and men ratio)
I don't want 'stacked' teams, (best skilled in one team). I would like an even distribution of weight.
Curious how I could achieve this in PHP? I'd be using a MySQL database to fetch users with weight values. I would know ahead of time how many users I would have, also how many teams I would want to generate.
I would appreciate any suggestions, or links to a solution if anyone has found something similar like this. I'm just not a math wiz, so I don't know what formula would apply here.
Thanks. I appreciate any input!
EDIT
After reviewing the answers, maybe I was not clear enough, so hopefully this helps a little more.
I want the teams to be roughly equally-sized
I want the average (mean) skill score for each team to be roughly equal
I want the ratio of men to women in each team to be roughly equal (that is to say, if by division we get a distribution of 5 men and 3 women per team, I would like to keep that roughly the same). Not really an issue if I sort men first and women second (or vice versa).
I don't want a linear approach (team 1 gets highest, team 2, sec highest, team 3.. so on). Tim's method of taking (if 6 teams) 6 people and randomizing and then distributing via linear fashion seems to work out fine.
I'm not entirely clear what you're after here, so I'll recap on what I understand you to be asking. If this is not right, you can clarify your requirements by editing your question:
You have a list of a certain number of men and a certain number of women. Each person has a known skill score. You want to divide these into a certain number of teams, with the following aims:
you want the teams to be roughly equally-sized
you want the average (mean) skill score for each team to be roughly equal
you want the ratio of men to women in each team to be roughly equal
I would have thought that a simple method to achieve this would be:
Create a list of all the men in decreasing order of skill score.
Create a list of all the women in decreasing order of skill score.
Add the list of women to the end of the list of men.
Start at the beginning of the combined list, and allocate each person in turn to a team in a round-robin fashion. (That is to say, allocate the first person to team number one, the second to team number two, and so on until you have allocated one person to each of the teams you wish to create. Then start again with team one, allocating people to each team in order, and so on.)
With this approach, you will be guaranteed the following outcomes:
If possible (i.e. if the number of teams divides the total number of people), the teams will all have the same number of people.
If the teams are not all the same size, the largest team will have exactly one more person than the smallest team.
If possible the teams will all have the same number of men.
If the teams do not have the same number of men, the team with the most men will have exactly one more man than the team with the fewest men.
If possible the teams will all have the same number of women.
If the teams do not have the same number of women, the team with the most women will have exactly one more woman than the team with the fewest women.
Each team will have men with a range of skill scores, from near the top of the range to near the bottom of the range.
Each team will have women with a range of skill scores, from near the top of the range to near the bottom of the range.
With sensible data, the mean skill score for each team will be roughly equal (although team one will have a slightly higher mean score than team two, and so on - there are ways of correcting this).
If this simple approach doesn't meet your requirements, please let us know what else you had in mind.
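A minimal sketch of the round-robin allocation described above follows, in Python for brevity (in PHP, `usort()` plus a loop with the modulo operator would be equivalent). The function name `make_teams` and the (name, skill) tuple layout are assumptions for the example.

```python
def make_teams(men, women, team_count):
    """men and women are lists of (name, skill). Returns a list of teams,
    each a list of (name, sex, skill) tuples."""
    # Men first, then women, each in decreasing order of skill.
    ordered = [(n, "M", s) for n, s in sorted(men, key=lambda p: -p[1])]
    ordered += [(n, "F", s) for n, s in sorted(women, key=lambda p: -p[1])]
    teams = [[] for _ in range(team_count)]
    for i, person in enumerate(ordered):
        teams[i % team_count].append(person)  # deal out in round-robin order
    return teams
```

Because consecutive people always go to consecutive teams, team sizes, men per team, and women per team each differ by at most one across teams.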
This is similar to "maximum/minimum weight perfect matching", just that the matching is for more than two elements (note that this is a different weight from what you have (the skill weight), namely, you would assign a weight to a matching (a matching would be a proposed 'team')).
The known algorithms for the perfect matching above (e.g., Edmonds' algorithm) might not be adaptable to the group case. I would perhaps look into some simulated annealing technique or a simple genetic algorithm.
If the number of people in each group (x, y) is relatively even, and the total number of people is relatively high, random sampling should work quite well. See here on how to select random rows from a MySQL database:
http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_rand
Slight edit: to ensure fairness, personally I'd do something like this. Say you know you want n members per team. Then create a local variable which is n*mean, where mean is the average skill level per person. Then, when you're randomly selecting your team members, do so within that limit.
E.g.
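// Assumed inputs: $candidates is a shuffled array of ['id' => ..., 'skill' => ...],
// $n is the team size and $mean is the average skill per person.
$team = [];
$teamSkill = 0.0;
foreach ($candidates as $person) {
    if ($teamSkill + $person['skill'] > $n * $mean) {
        continue; // this person would push the team over its skill budget
    }
    $team[] = $person;
    $teamSkill += $person['skill'];
    if (count($team) === $n) {
        break; // team is full
    }
}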