Algorithm that creates "teams" based on a numeric skill value

I am building an application that helps manage frisbee "hat tournaments". The idea is that people sign up for a hat tournament and, when they do, provide us with a numeric value between 1 and 6 that represents their skill level.
Currently, we take the full list of people who signed up and manually try to create teams from it based on each player's skill level. I figured I could automate this with an algorithm that splits the players into teams as evenly as possible.
The only data feeding into this is the array of "players" and a desired "number of teams". Generally speaking we are looking at 120 players and 8 teams.
My current approach is to keep a running "score" for each team: the total of the skill levels of all players assigned to it so far. I loop through each skill level and, inside that loop, go through rounds of picks. The order of the picks is recalculated each round based on each team's running score.
This actually works fairly well, but it's not perfect. For example, I had a range of 5 points between team totals with my sample data. I could very easily swap players around by hand and bring the discrepancy down to no more than 1 point between teams. The problem is getting that done programmatically.
Here is my code thus far: http://pastebin.com/LAi42Brq
Snippet of what data looks like:
[2] => Array
    (
        [user__id] => 181
        [user__first_name] => Stephen
        [user__skill_level] => 5
    )
[3] => Array
    (
        [user__id] => 182
        [user__first_name] => Phil
        [user__skill_level] => 6
    )
Can anyone think of a better, easier, more efficient way to do this? Many thanks in advance!!

I think you're making things too complicated. If you have T teams, sort your players according to their skill level. Choose the top T players to be captains of the teams. Then, starting with captain 1, each captain in turn chooses the player (s)he wants on the team. This will probably be the person at the top of the list of unchosen players.
This algorithm has worked in playgrounds (and, I dare say on the frisbee fields of California) for aeons and will produce results as 'fair' as any more complicated pseudo-statistical method.

A simple solution could be to first generate a team selection order; each team then "selects" the highest-skilled player still available. For the next round the order is reversed: the last team to select a player gets first pick and the first team gets the last pick. You reverse the picking order again for every round.
First round picking order could be:
A - B - C - D - E
second round would then be:
E - D - C - B - A
and then
A - B - C - D - E etc.
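The reversing pick order described above can be sketched in a few lines of PHP. This is only an illustration: `snakeDraft` is a made-up name, and the player arrays are assumed to carry the `user__skill_level` key from the question's data snippet.

```php
<?php
// Hypothetical helper: distribute players over $teamCount teams using a
// pick order that reverses every round (A-B-C, then C-B-A, and so on).
function snakeDraft(array $players, int $teamCount): array
{
    // Highest skill first, so each pick takes the best player still available.
    usort($players, fn($a, $b) => $b['user__skill_level'] <=> $a['user__skill_level']);

    $teams = array_fill(0, $teamCount, []);
    foreach ($players as $pick => $player) {
        $round = intdiv($pick, $teamCount);
        $slot  = $pick % $teamCount;
        // Even rounds pick in forward order, odd rounds in reverse order.
        $team = ($round % 2 === 0) ? $slot : $teamCount - 1 - $slot;
        $teams[$team][] = $player;
    }
    return $teams;
}

// Sample data: six players with skill levels 6, 5, 5, 3, 3, 1.
$players = [];
foreach ([6, 5, 5, 3, 3, 1] as $i => $skill) {
    $players[] = ['user__id' => $i + 1, 'user__skill_level' => $skill];
}

$teams  = snakeDraft($players, 2);
$totals = array_map(fn($t) => array_sum(array_column($t, 'user__skill_level')), $teams);
print_r($totals); // team totals: 12 and 11
```

Team sizes never differ by more than one with this scheme, since every round hands exactly one player to each team.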

It looks like this problem really is NP-hard, being a variant of the multiprocessor scheduling problem.
h00ligan's suggestion is equivalent to the LPT (longest processing time first) algorithm.
Another heuristic strategy would be a variation of this algorithm:
First round: each team picks the best remaining player. Second round: each team is paired with the worst remaining player (taken from the end of the list). Keep alternating like this.
With the example "6,5,5,3,3,1" and 2 teams this would give the teams "6,1,5" (=12) and "5,3,3" (=11). The strategy of "h00ligan" would give the teams "6,3,3" (=12) and "5,5,1" (=11).
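For comparison, here is a minimal sketch of the LPT-style greedy rule mentioned above: take the skill values from highest to lowest and always give the next one to the team with the lowest running total. The name `greedyLpt` is hypothetical, and note that this rule balances totals only; it does not by itself keep team sizes equal.

```php
<?php
// Hypothetical helper: LPT-style greedy assignment. Each skill value, taken
// from highest to lowest, goes to the team with the lowest running total.
function greedyLpt(array $skills, int $teamCount): array
{
    rsort($skills);
    $teams  = array_fill(0, $teamCount, []);
    $totals = array_fill(0, $teamCount, 0);

    foreach ($skills as $skill) {
        // array_search returns the first team that holds the minimum total.
        $weakest = array_search(min($totals), $totals, true);
        $teams[$weakest][] = $skill;
        $totals[$weakest] += $skill;
    }
    return ['teams' => $teams, 'totals' => $totals];
}

$result = greedyLpt([6, 5, 5, 3, 3, 1], 2);
print_r($result['teams']);  // [6, 3, 3] and [5, 5, 1]
print_r($result['totals']); // 12 and 11
```

On this small example the greedy rule lands on the same teams as the reversing pick order.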

This problem is unfortunately NP-hard. Have a look at bin packing, which is probably a good place to start and includes an algorithm you can hopefully tweak. This may or may not be useful, depending on how "fair" two teams with the same score need to be.

Related

How do I determine the optimum variable weights when ranking the "best" candidate for a prize

Consider I am writing a program to objectively select a winner in a competition. There are 'n' human judges secretly assigning a 1st, 2nd, 3rd position ranking to the top three candidates from a pool of 'm' candidates.
The program must then go through the judges decisions, and based on weights assigned to 1st place, 2nd place and 3rd place, each candidate will be rated based on the number of 1st, 2nd, and 3rd place votes they received, multiplied by the appropriate rating for each finishing position.
However, at the start, the program has no idea of what weights are appropriate, so I have created an automated "program" that is intended to "discover" the proper weights based on how the judges would pick the winner from a hypothetical situation.
I present a table where the horizontal axis contains the finishing position, and the judges' codes (e.g. Judge W, Judge X, Judge Y, Judge Z). The vertical axis has three rows (1st place, 2nd place, 3rd place), and at the intersection of each Judge/Row, I have randomly generated a candidate ID (from the set A through F).
After rendering the table, I then ask the judge who THEY would have chosen as the winner (the judge has the option to PASS if there is not sufficient information to choose).
After the judges run through an appropriate number of scenarios, I wish to now take the results of the various runs and use that information to determine the "best fit" for the weighting of 1st, 2nd, and 3rd positions.
Let's say one of the hypothetical grids looks like this:
Position  Judge 'W'  Judge 'X'  Judge 'Y'  Judge 'Z'
1st       A          F          C          B
2nd       D          B          E          D
3rd       C          E          B          C
and the human judge picks candidate "B" as the winner. My program should react by recording the inequalities (w1 + w2 + w3) > (w1 + 2 w3) (i.e. B better than C) and (w1 + w2 + w3) > (2 w2) (i.e. B better than D), etc.
From these various algebraic comparisons, over a number of "hypothetical scenarios", I want to be able to calculate the optimum values for w1, w2 and w3. And then, at some point when there is enough "good" data, I want to be able to use these "discovered" weights to go back over the training data and identify areas where perhaps the human judges were mistaken.
I am using PHP as the programming language and don't know which functions or possible existing libraries are appropriate to solve this kind of "fuzzy" equation.
I'm looking for some direction to help me tackle this problem.
Thank you for your assistance.

For the winning candidate, count how many times they appear in each position, then do the same for all the other candidates. Then write the following formula for each other candidate j:
goodForJ = (w1*nw1 + w2*nw2 + w3*nw3) > (w1*nj1 + w2*nj2 + w3*nj3)
where nw1, nw2 and nw3 are the number of times the winner appears in each position, and nj1, nj2 and nj3 are the number of times candidate j appears in each position.
If goodForJ is true for all the candidates, the tuple of weights is good. Now you just have to try a bunch of weight combinations and find out which one fits. Trying all combinations of integer weights between 1 and 10 requires 1000 iterations.
To make things a bit fuzzier, for each try you could count how many times goodForJ is true and choose the weights that produce the highest score.
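A rough PHP sketch of that brute-force search, using the example grid from the question. The function names and the scenario layout (a list of `[grid, winner]` pairs, each ballot written as `[1st, 2nd, 3rd]`) are assumptions made for illustration.

```php
<?php
// Count how often each candidate appears in 1st, 2nd and 3rd position.
// $grid is a list of judge ballots, each ballot being [1st, 2nd, 3rd].
function positionCounts(array $grid): array
{
    $counts = [];
    foreach ($grid as $ballot) {
        foreach ($ballot as $pos => $candidate) {
            $counts[$candidate] ??= [0, 0, 0];
            $counts[$candidate][$pos]++;
        }
    }
    return $counts;
}

// Number of "winner beats candidate j" comparisons (goodForJ) that hold for
// the weights $w = [w1, w2, w3]. Assumes each winner appears in its grid.
function satisfied(array $w, array $scenarios): int
{
    $score = fn($n) => $w[0] * $n[0] + $w[1] * $n[1] + $w[2] * $n[2];
    $hits = 0;
    foreach ($scenarios as [$grid, $winner]) {
        $counts = positionCounts($grid);
        foreach ($counts as $candidate => $n) {
            if ($candidate !== $winner && $score($counts[$winner]) > $score($n)) {
                $hits++;
            }
        }
    }
    return $hits;
}

// One scenario: ballots of judges W, X, Y, Z, and the human pick "B".
$scenarios = [
    [[['A', 'D', 'C'], ['F', 'B', 'E'], ['C', 'E', 'B'], ['B', 'D', 'C']], 'B'],
];

$best = null;
$bestHits = -1;
for ($w1 = 1; $w1 <= 10; $w1++) {
    for ($w2 = 1; $w2 <= 10; $w2++) {
        for ($w3 = 1; $w3 <= 10; $w3++) {
            $hits = satisfied([$w1, $w2, $w3], $scenarios);
            if ($hits > $bestHits) { // keep the first tuple with the top score
                $bestHits = $hits;
                $best = [$w1, $w2, $w3];
            }
        }
    }
}
printf("weights %s satisfy %d comparisons\n", implode(',', $best), $bestHits);
```

With a single scenario many weight tuples tie for the top score, and the loop simply keeps the first one it meets. More scenarios narrow the set of tuples that fit.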

PHP - Group distribution with minimal previous matches

I've been struggling on the following (algorithmic?) problem for days now. I have a list of persons that I need to group evenly. Each "round" of groupings is stored so that the next time (round) I try to group people together, we group them if they were not matched together during a previous round.
Example:
John
Bob
Laura
Lucy
Michael
Mark
1st round
Group 1
John
Bob
Laura
Group 2
Lucy
Michael
Mark
Now for the 2nd round we have to avoid grouping John Bob Laura together (or at least minimize it).
I came up with a solution that works for these small edge cases, where I create a pairing matrix with
[John - Bob] = 1 (number of times they got paired together in previous rounds)
[Bob - Laura] = 1
etc...
I then loop through that matrix, for each person I find the lowest number of times that person got paired with someone else and add these 2 to the list. Etc, until everybody got added to that list.
I then split that list into groups of desired number (the group size is the only parameter).
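To make the pairing matrix concrete, here is a small PHP sketch that builds the pair counts from earlier rounds and counts how many pairs in a proposed grouping have already met. It only scores a proposal; it is not the greedy list-building step described above. All function names are made up for the example.

```php
<?php
// Hypothetical helper: canonical key for an unordered pair of names.
function pairKey(string $a, string $b): string
{
    return $a < $b ? "$a|$b" : "$b|$a";
}

// Count how many times each pair shared a group in earlier rounds.
// $rounds is a list of rounds, each round a list of groups (arrays of names).
function pairCounts(array $rounds): array
{
    $counts = [];
    foreach ($rounds as $groups) {
        foreach ($groups as $group) {
            $group = array_values($group);
            $size  = count($group);
            for ($i = 0; $i < $size; $i++) {
                for ($j = $i + 1; $j < $size; $j++) {
                    $key = pairKey($group[$i], $group[$j]);
                    $counts[$key] = ($counts[$key] ?? 0) + 1;
                }
            }
        }
    }
    return $counts;
}

// Number of pairs in a proposed grouping that already met in $history.
function repeatedPairs(array $groups, array $history): int
{
    $repeats = 0;
    foreach (array_keys(pairCounts([$groups])) as $key) {
        if (($history[$key] ?? 0) > 0) {
            $repeats++;
        }
    }
    return $repeats;
}

$round1   = [['John', 'Bob', 'Laura'], ['Lucy', 'Michael', 'Mark']];
$history  = pairCounts([$round1]);
$proposal = [['John', 'Lucy', 'Michael'], ['Bob', 'Laura', 'Mark']];
echo repeatedPairs($proposal, $history), "\n"; // 2 (Lucy-Michael and Bob-Laura)
```

A count like this gives a search procedure something to minimise when it compares candidate groupings.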
I found out that this doesn't work after a few rounds or with larger "rosters".
I'm starting to think that this might be an NP problem, because I'd have to iterate many, many times to find the perfect "list".
Is there an algorithm I should look into? I'm coding this in PHP but Java or pseudo code works too.
A roster containing 36 persons.
1st round, group size = 6
2nd round, group size = 4
I should not have someone paired with someone else more than once.
With my solution, I have around 10 pairings that happened twice.

Calculating possible tournament outcomes

I am trying to write a section of code in PHP which will work out for each team the best and worst possible outcome from a round robin type tournament.
This code will be executed after each round of games and so will lookup the current W-L-T record for each team as well as the future schedule of games for each team (all of this information is already stored in a database).
My initial thought was to run through each permutation of the ranking of the teams, remembering the extreme limits of each team's performance. However, on further thought I realise that for the twelve teams in this case that would result in over 479 million permutations (12! = 479,001,600), which may take a little time to calculate, let alone make for concise code.
I have unfortunately reached, I fear, the limit of my imagination in devising a logic system to deal with this so any help anyone could offer would be great.
Cheers in advance
Edward

I'll assume a loss is worth 0 points, a tie 1 point and a win 2 points.
For each team t
    Sort the teams by their current point table so the last place
    team(s) come first and the top teams come last. Put all teams tied with t before t.
    Let i be the position of team t in this list
    From here on I'll name teams by their position in the list. So we have
    from left to right, teams currently worse than i, teams tied with team i, team i,
    and finally teams better than i.
    Make a working copy of your matrix. For the rest of this
    iteration I'll implicitly refer to the working copy.
    Suppose (in the working copy) that team i loses all its remaining games.
    For j from 0 up to i
        Make a backup copy of the working copy.
        for( k:=n-1 ; k > j and j is behind or tied with i ; k := k-1 )
            If k hasn't played j and j is behind i
                suppose that j beats k
            Else if k hasn't played j /* and j is tied with i */
                suppose that j ties k
        if j is still behind i
            revert to the backup made before the preceding loop
        discard the backup copy
        for all games j has yet to play suppose j loses
At this point, all remaining games in the working copy are between teams ahead
of team i, assume all remaining games are ties.
Now (if we have really constructed a worst case scenario) the rank of team i
in the working copy is the worst it can do. I.e. team i beats "count
I'm not completely sure this gives the exact lower bound. An upper bound would be symmetric.

Tournament bracket

Not sure of the best way to go about this?
I want to create a tournament bracket of 2,4,8,16,32, etc teams.
The winner of the first two will play winner of the next 2 etc.
All the way until there is a winner.
Like this
Can anyone help me?
OK so more information.
Initially I want to come up with a way to create the tournament with the 2,4,8,16,etc.
Then when I have all the users in place, if they are 16 players, there are 8 fixtures.
At this point I will send the fixture to the database.
When all the players that won are through to the next round, I would want another SQL query for each pair of winners that meet.
Can you understand what I mean?

I did something like this a few years ago. It was quite a while back and I'm not sure I'd do it the same way now (it doesn't really scale to double-elimination or the like). How you output it might be a different question. I resorted to tables, as this was in 2002-2003. There are certainly better techniques today.
The number of rounds in the tournament is log2(players) + 1, as long as players is one of the numbers you specified above (a power of two). The last round contains the final winner.
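As a quick sketch of that calculation in PHP (`roundCount` is a hypothetical name), the rounds can be counted by repeated halving, which avoids floating-point logarithms. The count follows the convention above, where the extra "round" is the slot holding the overall winner.

```php
<?php
// Hypothetical helper: log2(players) + 1 for a single-elimination bracket,
// computed with integer halving. Rejects sizes that are not 2, 4, 8, 16, ...
function roundCount(int $players): int
{
    if ($players < 2 || ($players & ($players - 1)) !== 0) {
        throw new InvalidArgumentException('Player count must be a power of two.');
    }
    $rounds = 1; // the final slot that holds the overall winner
    while ($players > 1) {
        $players = intdiv($players, 2); // each round of matches halves the field
        $rounds++;
    }
    return $rounds;
}

echo roundCount(16), "\n"; // 5: four rounds of matches plus the winner slot
```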
I stored the player information something like this (tweak this for best practices):
Tournament
    Name
    Size
Players
    Tournament
    Name
    Position (0 to tournament.size - 1)
Rounds
    Tournament
    Round
    Position (max halves for each round)
    Winner (player position)
Note in all my queries below, I don't include the "Tournament = [tournament]" to identify the tournament. They all need it.
It's rather simple to query this with one query and to split it out as needed for the different rounds. You could do something like this to get the next opponent (assuming there is one). For round 1, you'd simply need to get the next/previous player based on if it was even or odd:
SELECT * FROM Players WHERE Position = PlayerPosition + 1
SELECT * FROM Players WHERE Position = PlayerPosition - 1
For the next round, if the user's last Round.Position was even, you'll need to make sure that the next position up has a winner:
SELECT Player FROM Rounds WHERE Position = [playerRoundPosition] - 1
If not, the next player isn't decided, or there's a gap (don't allow gaps!)
If the user's last Round.Position was odd, you'll need to make sure there's a user below them AND that there's a winner below them; otherwise they should automatically be promoted to the next round (as there is no one to play):
SELECT COUNT(*) FROM Players WHERE Position > [Player.Position]
SELECT Player FROM Rounds WHERE Position = [playerRoundPosition] + 1
On a final note, I'm pretty sure you could reduce the number of queries you write by using something like the following:
SELECT Player FROM Rounds WHERE Position + Position % 2 = [playerRoundPosition]
SELECT Player FROM Rounds WHERE Position - Position % 2 = [playerRoundPosition]
Update:
Looking over my original post, I find that the Rounds table was a little ambiguous. In reality, it should be named Matches. A match is a competition between two players with a winner. The final table should look more like this (only the name changed):
Matches
    Tournament
    Round
    Position (max halves for each round)
    Winner (player position)
Hopefully that makes it a bit more clear. When the two players go up against each other (in a match), you store that information in this Matches table. This particular implementation depends on the position of the Match to know which players participated.
I started numbering the rounds at 1 because that was clearer in my implementation. You may choose 0 (or even do something completely different, like going backwards) if you prefer.
In the first round, match 1 means players 1 and 2 participated. In match 2, players 3 and 4 participated. Essentially, a first-round match is simply between the players at position and position + 1. You could also store this information in the rounds table if you need more access to it. Every time I used this data in the program, I needed all the round and player information anyway.
After the first round, you look at the previous round of matches. In round 2, match 1, the winners from matches 1 and 2 participate. In round 2, match 2, the winners from matches 3 and 4 participate. It should look pretty familiar, except that it uses the match table after round 1. I'm sure there's a more efficient way to do this repetitive task; I just never got enough time to refactor that code (it was refactored, just not that much).
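The position arithmetic described above fits in one small PHP function. This is a sketch under the assumption that match and player positions are numbered from 1, as in the examples in this answer; `feeders` is a made-up name.

```php
<?php
// Hypothetical helper: for match $match in round $round (both 1-based),
// return where its two participants come from. In round 1 the positions are
// player positions; in later rounds they are match positions in the previous
// round, whose winners meet here.
function feeders(int $round, int $match): array
{
    return [
        'source'    => $round === 1 ? 'players' : 'winners of round ' . ($round - 1),
        'positions' => [2 * $match - 1, 2 * $match],
    ];
}

// List the fixtures of an 8-player bracket: 4, then 2, then 1 match.
$matchesInRound = 4;
for ($round = 1; $matchesInRound >= 1; $round++) {
    for ($match = 1; $match <= $matchesInRound; $match++) {
        $f = feeders($round, $match);
        printf("Round %d, match %d: %s %d and %d\n",
            $round, $match, $f['source'], $f['positions'][0], $f['positions'][1]);
    }
    $matchesInRound = intdiv($matchesInRound, 2);
}
```

If player positions are stored from 0, as the Players table above suggests, the round 1 positions would need shifting by one.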

Use arrays and remove the losing teams from the main array. (But keep 'em on a separate array, for reference and reuse purposes).
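A minimal PHP sketch of that idea, assuming the number of teams is a power of two. `playRound` and the `$pickWinner` callback are hypothetical; in a real application the callback would look up the actual result of each fixture.

```php
<?php
// Hypothetical helper: play one round. Neighbouring entries of $remaining
// meet each other; $pickWinner decides each match. Losers are appended to
// $eliminated so they stay available for reference.
function playRound(array $remaining, array &$eliminated, callable $pickWinner): array
{
    $winners = [];
    foreach (array_chunk($remaining, 2) as [$a, $b]) {
        $winner = $pickWinner($a, $b);
        $winners[] = $winner;
        $eliminated[] = ($winner === $a) ? $b : $a;
    }
    return $winners;
}

$teams = ['T1', 'T2', 'T3', 'T4', 'T5', 'T6', 'T7', 'T8'];
$eliminated = [];
while (count($teams) > 1) {
    // Placeholder rule for the demo: the name that sorts first wins.
    $teams = playRound($teams, $eliminated, fn($a, $b) => min($a, $b));
}
print_r($teams);      // the single remaining team
print_r($eliminated); // the seven teams knocked out, in order
```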

How can I create 'teams' from a list of weighted 'users' randomly but fairly using PHP?

What I am hoping to achieve is the ability to generate 'teams' of users. I will have x amount of men, weighted (decimal skill weight, like 75.23) and y amount of women (also with a skill weight value).
Given that list of users, I would then take for input the number of teams to make (let us say, 6 teams). Then, I go through the list of x's and y's and organize them so that the best average possible weighted teams are created. I would like to keep the teams balanced (women and men ratio)
I don't want 'stacked' teams, (best skilled in one team). I would like an even distribution of weight.
Curious how I could achieve this in PHP? I'd be using a MySQL database to fetch users with weight values. I would know ahead of time how many users I would have, also how many teams I would want to generate.
I would appreciate any suggestions, or links to a solution if anyone has found something similar like this. I'm just not a math wiz, so I don't know what formula would apply here.
Thanks. I appreciate any input!
EDIT
After reviewing the answers, maybe I was not clear enough, so hopefully this helps a little more.
I want the teams to be roughly equally-sized
I want the average (mean) skill score for each team to be roughly equal
I want the ratio of men to women in each team to be roughly equal (that is to say, if by division we get a distribution of 5 men and 3 women per team, I would like to keep that roughly the same). Not really an issue if I sort men first and women second (or vice versa).
I don't want a linear approach (team 1 gets highest, team 2, sec highest, team 3.. so on). Tim's method of taking (if 6 teams) 6 people and randomizing and then distributing via linear fashion seems to work out fine.

I'm not entirely clear what you're after here, so I'll recap on what I understand you to be asking. If this is not right, you can clarify your requirements by editing your question:
You have a list of a certain number of men and a certain number of women. Each person has a known skill score. You want to divide these into a certain number of teams, with the following aims:
you want the teams to be roughly equally-sized
you want the average (mean) skill score for each team to be roughly equal
you want the ratio of men to women in each team to be roughly equal
I would have thought that a simple method to achieve this would be:
Create a list of all the men in decreasing order of skill score.
Create a list of all the women in decreasing order of skill score.
Add the list of women to the end of the list of men.
Start at the beginning of the combined list, and allocate each person in turn to a team in a round-robin fashion. (That is to say, allocate the first person to team number one, the second to team number two, and so on until you have allocated one person to each of the teams you wish to create. Then start again with team one, allocating people to each team in order, and so on.)
With this approach, you will be guaranteed the following outcomes:
If possible (i.e. if the number of teams divides the total number of people), the teams will all have the same number of people.
If the teams are not all the same size, the largest team will have exactly one more person than the smallest team.
If possible the teams will all have the same number of men.
If the teams do not have the same number of men, the team with the most men will have exactly one more man than the team with the fewest men.
If possible the teams will all have the same number of women.
If the teams do not have the same number of women, the team with the most women will have exactly one more woman than the team with the fewest women.
Each team will have men with a range of skill scores, from near the top of the range to near the bottom of the range.
Each team will have women with a range of skill scores, from near the top of the range to near the bottom of the range.
With sensible data, the mean skill score for each team will be roughly equal (although team one will have a slightly higher mean score than team two, and so on - there are ways of correcting this).
If this simple approach doesn't meet your requirements, please let us know what else you had in mind.
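The four steps above can be sketched in PHP roughly as follows. The function name and the `skill` and `sex` keys are assumptions for the example; the real rows would come from the MySQL query.

```php
<?php
// Hypothetical helper: sort men and women by skill (highest first), append
// the women to the men, then deal people out to the teams in turn.
function roundRobinTeams(array $men, array $women, int $teamCount): array
{
    $bySkillDesc = fn($a, $b) => $b['skill'] <=> $a['skill'];
    usort($men, $bySkillDesc);
    usort($women, $bySkillDesc);

    $teams = array_fill(0, $teamCount, []);
    foreach (array_merge($men, $women) as $i => $person) {
        $teams[$i % $teamCount][] = $person; // team 0, 1, 2, ..., then 0 again
    }
    return $teams;
}

// Made-up sample data: five men and four women with decimal skill weights.
$men = $women = [];
foreach ([90.5, 80.0, 75.23, 60.0, 55.0] as $skill) {
    $men[] = ['sex' => 'M', 'skill' => $skill];
}
foreach ([88.0, 70.0, 65.0, 50.0] as $skill) {
    $women[] = ['sex' => 'F', 'skill' => $skill];
}

$teams = roundRobinTeams($men, $women, 3);
foreach ($teams as $n => $team) {
    $menCount = count(array_filter($team, fn($p) => $p['sex'] === 'M'));
    printf("Team %d: %d people, %d men, mean skill %.2f\n",
        $n + 1, count($team), $menCount,
        array_sum(array_column($team, 'skill')) / count($team));
}
```

Shuffling each consecutive block of `$teamCount` people before dealing them out is one way to avoid team one always getting the top pick, along the lines of the randomised variant mentioned in the question's edit.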

This is similar to "maximum/minimum weight perfect matching", just that the matching is for more than two elements (note that this is a different weight from what you have (the skill weight), namely, you would assign a weight to a matching (a matching would be a proposed 'team')).
The known algorithms for the perfect matching above (e.g., Edmonds' algorithm) might not be adaptable to the group case. I would perhaps look into some simulated annealing technique or a simple genetic algorithm.
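A full simulated annealing or genetic run is more than a short example can show. As a simpler relative, here is a sketch of plain hill climbing in PHP: start from any assignment and keep swapping two players between teams while that lowers the gap between the strongest and weakest team. This is not simulated annealing itself (it never accepts a worsening move, so it can get stuck in a local optimum), and the function names are made up.

```php
<?php
// Gap between the highest and lowest team total.
function spread(array $teams)
{
    $totals = array_map('array_sum', $teams);
    return max($totals) - min($totals);
}

// Hypothetical hill climbing: apply the first swap of two players between
// two teams that lowers the spread, and repeat until no swap helps.
// Team sizes never change, because players are only exchanged.
function improveBySwapping(array $teams): array
{
    do {
        $improved = false;
        $current  = spread($teams);
        foreach ($teams as $a => $teamA) {
            foreach ($teams as $b => $teamB) {
                if ($b <= $a) {
                    continue; // look at each pair of teams once
                }
                foreach ($teamA as $i => $x) {
                    foreach ($teamB as $j => $y) {
                        $trial = $teams;
                        $trial[$a][$i] = $y;
                        $trial[$b][$j] = $x;
                        if (spread($trial) < $current) {
                            $teams    = $trial;
                            $improved = true;
                            break 4; // leave all four loops and rescan
                        }
                    }
                }
            }
        }
    } while ($improved);
    return $teams;
}

// Deliberately lopsided start: totals 16 and 7.
$balanced = improveBySwapping([[6, 5, 5], [3, 3, 1]]);
print_r($balanced);
echo spread($balanced), "\n"; // 1
```

An annealing variant would occasionally accept a swap that makes things worse, with a probability that shrinks over time, to escape such local optima.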

If the number of people in each group (x, y) is relatively even, and the total number of people is relatively high, random sampling should work quite well. See here on how to select random rows from a MySQL database:
http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_rand
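Slight edit: to ensure fairness, personally I'd do something like this. Say you know you want n members per team. Then create a local variable holding n*mean, where mean is the average skill level per person. Then, when you're randomly selecting your team members, do so within that limit.
E.g.
// Sketch only: $candidates, $n and $mean are assumed to exist already.
$limit = $n * $mean;
$team = [];
$teamSkill = 0;
shuffle($candidates);                  // random order stands in for "new random record"
foreach ($candidates as $person) {
    if (count($team) == $n) {
        break;                         // the team is full
    }
    if ($teamSkill + $person['skill'] > $limit) {
        continue;                      // would push the team over the limit, skip
    }
    $team[] = $person;
    $teamSkill += $person['skill'];
}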
