My problem: I want to make a "kind" lottery process. This algorithm will distribute prizes evenly if possible. This could be considered unfair to people who buy a ticket for every prize, since they will be more flexible to win the unpopular prizes, but never mind that; we may say the prizes are roughly the same. The algorithm will help kill variance and reduce the dice-rolling needed to win prizes. (Yep, boring.)
I will have N competitions where you can win a prize. There are M people, and each person can buy a ticket for any of the N competitions.
So, as an example, here are the prizes and the people who have bought tickets:
Prize1=[Pete, Kim, Jim]
Prize2=[Jim, Kim]
Prize3=[Roger, Kim]
Prize4=[Jim]
There are 4 prizes and 4 unique names, so it should be possible to distribute them evenly.
The example may be easy to solve, you can probably figure it out in 15 seconds, but when M and N increase it gets much harder.
I'm trying to make a general algorithm, but it's hard. I need some good tips, or even better a solution or a link to one.
Theory: you have a bipartite graph, with the people as one part (A) and the prizes as the other (B). You have to find a perfect matching. There is a perfect matching in a bipartite graph if:
|A| = |B|
The graph satisfies Hall's condition
If a perfect matching exists, you can run the Hungarian algorithm to find it.
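For the unweighted case here, a simpler augmenting-path search (Kuhn's algorithm) is enough. Below is a minimal PHP sketch using the example data from the question; the tryAssign helper and the array layout are my own naming, not a standard API:

function tryAssign($prize, $tickets, &$matchedTo, &$visited) {
    foreach ($tickets[$prize] as $person) {
        if (isset($visited[$person])) {
            continue;
        }
        $visited[$person] = true;
        // Take this person if they are free, or if their current prize can
        // be handed to somebody else along an augmenting path.
        if (!isset($matchedTo[$person])
            || tryAssign($matchedTo[$person], $tickets, $matchedTo, $visited)) {
            $matchedTo[$person] = $prize;
            return true;
        }
    }
    return false;
}

$tickets = [
    'Prize1' => ['Pete', 'Kim', 'Jim'],
    'Prize2' => ['Jim', 'Kim'],
    'Prize3' => ['Roger', 'Kim'],
    'Prize4' => ['Jim'],
];
$matchedTo = []; // person => prize
foreach (array_keys($tickets) as $prize) {
    $visited = [];
    tryAssign($prize, $tickets, $matchedTo, $visited);
}
print_r($matchedTo); // Pete=>Prize1, Jim=>Prize4, Roger=>Prize3, Kim=>Prize2

If the matching ends up covering every prize, the distribution is even; if not, Hall's condition fails somewhere and no perfect matching exists.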
You want to look for a job-assignment algorithm, such as the Hungarian algorithm, which finds a weighted perfect matching in a bipartite graph. My idea is that this can be represented as a bipartite graph. This is not an easy task to solve.
There seems to be a large amount of information about Cyclic (or "Rotating") Workforce Scheduling problems. I am searching for an algorithm that will help generate a schedule of employee shifts that does not care what the previous week's schedule looked like. From my research, this sounds like a non-cyclic workforce scheduling problem.
Essentially, I have each employee's availability, their min/max hours, and their requested time off. With that information, I want to create an optimized schedule that caters to the employees' desired availability while also meeting the number of required shifts for each day.
Does anyone have tips on a good algorithm for this purpose? Thanks!
For problems like employee scheduling, where there are a lot of constraints on the solution, I prefer approaches that never violate any constraints, or come as close to that as possible. (Some approaches, such as genetic crossover, will violate constraints and then perform additional operations to fix the solution - this is also a valid approach, but you need to beware of going down a blind alley.)
Both approaches below are based on using a greedy algorithm.
The first is to use a semi-random greedy algorithm: if you have two choices, then ordinarily you would always select the locally optimal one, but with a semi-random greedy approach you introduce the possibility of selecting the choice that isn't locally optimal. For example, choice one has a weight of 5 and choice two has a weight of 2; ordinarily you would select choice one, but here you would use a random number generator and select choice one if rand(5 + 2) is less than 5, else select choice two. Now run the algorithm several times and take the "best" solution.
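A quick sketch of that weighted pick in PHP; the 5 and 2 are the weights from the example:

function weightedPick(array $weights) {
    $total = array_sum($weights);
    $roll = mt_rand(0, $total - 1); // the rand(5 + 2) from the text
    foreach ($weights as $choice => $weight) {
        if ($roll < $weight) {
            return $choice;
        }
        $roll -= $weight;
    }
}

echo weightedPick(['choice one' => 5, 'choice two' => 2]); // "choice one" about 5/7 of the time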
The second option is to start with a greedy or semi-random greedy solution, and to use a local search algorithm to reassign employee slots in an attempt to improve the solution. For example, if an employee has fewer than their desired hours, then bump an employee occupying a slot that's legal for the under-scheduled employee and assign the under-scheduled employee to it, continuing the search to reassign the bumped employee if need be. Unlike the first approach, this one may not terminate if you're not careful.
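A toy version of that bump step, with invented names, slots, and a one-shift granularity; a real scheduler would also track hours, legality rules, and chains of bumps. Bounding the passes is one simple way to dodge the non-termination trap:

$canWork = [ // slot => employees legally available for it
    'Mon' => ['Ann', 'Bob'],
    'Tue' => ['Bob', 'Cal'],
    'Wed' => ['Ann', 'Cal'],
];
$assigned = ['Mon' => 'Bob', 'Tue' => 'Bob', 'Wed' => 'Cal']; // greedy start
$wantedShifts = ['Ann' => 1, 'Bob' => 1, 'Cal' => 1];

function shiftCount($emp, $assigned) {
    return count(array_keys($assigned, $emp));
}

// One bounded improvement pass: hand under-scheduled employees a slot held
// by someone who is over their own target.
foreach ($wantedShifts as $emp => $want) {
    foreach ($assigned as $slot => $holder) {
        if (shiftCount($emp, $assigned) >= $want) break;
        if ($holder !== $emp
            && in_array($emp, $canWork[$slot])
            && shiftCount($holder, $assigned) > $wantedShifts[$holder]) {
            $assigned[$slot] = $emp; // bump: the holder had surplus shifts
        }
    }
}
print_r($assigned); // Mon => Ann; Bob held two shifts but only wanted one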
The two approaches can be combined, generating several solutions with the semi random greedy approach and then conducting local searches to improve the best results.
I am not skilled in the world of statistics, so I hope this will be easy for someone; my lack of skill also made it very hard to find the correct search terms on this topic, so I may have missed my answer while searching. Anyway: I am looking at arrays of data, CPU usage for example. How can I capture accurate information in as few data points as possible? Say the data set contains one-second samples of CPU usage over one hour, where the first 30 minutes are 0% and the second 30 minutes are 100%. Right now, the only single data point I can think of is the mean, which is 50% and not useful at all in this case. Another case is when the usage graph looks like a wave, bouncing evenly up and down between 0 and 100, yet still giving a mean of 50%. How can I capture this data? Thanks.
If I understand your question, it is really more of a statistics question than a programming question. Do you mean, what is the best way to capture a population curve with the fewest variables possible?
Firstly, the assumptions behind most standard statistics imply that the system is more or less stable (although, if the system is unstable, the numbers you get will let you know, because they will be nonsensical).
The main measures that you need to know statistically are the mean, population size, and the standard deviation. From these, you can calculate the rough bell curve defining the population curve, and know the accuracy of the curve based on the scale of the standard deviation.
This gives you a three variable schema for a standard bell curve.
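As a sketch, here is that three-number summary in PHP, run on the CPU example from the question (30 minutes at 0%, then 30 minutes at 100%, sampled once per second); the function name is just illustrative:

function summarize(array $samples) {
    $n = count($samples);
    $mean = array_sum($samples) / $n;
    $sumSq = 0.0;
    foreach ($samples as $x) {
        $sumSq += pow($x - $mean, 2);
    }
    // Population standard deviation; use $n - 1 for a sample estimate.
    return ['n' => $n, 'mean' => $mean, 'stddev' => sqrt($sumSq / $n)];
}

$samples = array_merge(array_fill(0, 1800, 0), array_fill(0, 1800, 100));
print_r(summarize($samples)); // n = 3600, mean = 50, stddev = 50

The standard deviation of 50 is exactly what distinguishes this case from a steady 50% load, which would report a standard deviation of 0.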
If you want to get into further detail, you can add Cpk and Ppk, which are calculated fields.
Otherwise, you may need to get into non-linear regression and curve fitting, which is best handled on a case-by-case basis (not great for programming).
Check out the following sites for calculating the Cp, Cpk, Pp and Ppk:
http://www.qimacros.com/control-chart-formulas/cp-cpk-formula/
http://www.macroption.com/population-sample-variance-standard-deviation/
In nearly any programming language, if I do $number = rand(1,100) then I have created a flat probability, in which each number has a 1% chance of coming up.
What if I'm trying to abstract something weird, like launching rockets into space, so I want a curved (or angled) probability chart? But I don't want a "stepped" chart. (Important: I'm not a math nerd, so there are probably terms or concepts that I'm completely skipping or ignorant of!) An angled chart is fine, though.
So, if I wanted a probability that gave results of 1 through 100... 1 would be the most common result, 2 the next most common, in a straight line until a certain point, let's say 50; then the chart angles, and the probability of rolling 51 is less than that of rolling 49. Then it angles again at 75, so the probability of getting a result above 75 is not simply 25%, but some incredibly smaller number depending on the chart, perhaps only 10% or 5% or so.
Does this question make any sense? I'd specifically like to see how this can be done in PHP, but I wager the required logic will be rather portable.
The short answers to your questions are, yes this makes sense, and yes it is possible.
The technical term for what you're talking about is a probability density function. Intuitively, it's just what it sounds like: a function that tells you, if you draw random samples, how densely those samples will cluster (and what the clusters look like). What you identify as a "flat" function is also called a uniform density; another very common one, often built into standard libraries, is the "normal" or Gaussian distribution. You've seen it; it's also called the bell curve.
But subject to some limitations, you can have any distribution you like, and it's relatively straightforward to build one from the other.
That's the good news. The bad news is that it's math nerd territory. The ideas behind probability density functions are pretty intuitive and easy to understand, but the full power of working with them is only unlocked with a little bit of calculus. For instance, one of the limitations on your function is that the total probability has to be unity, which is the same as saying that the area under your curve needs to be exactly one. In the exact case you describe, the function is all straight lines, so you don't strictly need calculus to help you with that constraint... but in the general case, you really do.
Two good terms to look for are "transformation methods" (there are several) and "rejection sampling." The basic idea behind rejection sampling is that you have a distribution you can sample from (in this case, your uniform distribution) and a distribution you want. You use the uniform distribution to make a bunch of points (x, y), and then use your desired function, evaluated at each x, as a threshold on the y coordinate: accept the x if y falls under the curve, reject it otherwise.
That makes almost no sense without pictures, though, and unfortunately, all the best ways to talk about this are calculus based. The link below has a pretty good description and pretty good illustrations.
http://www.stats.bris.ac.uk/~manpw/teaching/folien2.pdf
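As a rough PHP sketch of rejection sampling: the piecewise density below is only an illustration of the "bend at 50, bend again at 75" shape from the question, not an exact fit, and the function names are mine:

function density($x) {
    if ($x <= 50) return 1.0 - 0.01 * ($x - 1);    // gentle slope down to 0.51
    if ($x <= 75) return 0.51 - 0.015 * ($x - 50); // steeper after the first bend
    return 0.135 - 0.005 * ($x - 75);              // steepest in the tail
}

function sample() {
    while (true) {
        $x = mt_rand(1, 100);             // uniform candidate
        $y = mt_rand() / mt_getrandmax(); // uniform height in [0, 1]
        // density() never exceeds 1, matching the range of $y, so accepted
        // candidates come out distributed in proportion to density().
        if ($y < density($x)) {
            return $x;
        }
    }
}

A handy property of rejection sampling is that the density only needs to be correct up to a constant factor, so you can sketch the shape first and worry about the area-under-the-curve constraint later.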
Essentially you only need to pick a random number and then feed it into a function, probably an exponential one, to get the final number.
Figuring out how heavily you want the results weighted will determine the formula you use.
Assuming a function that returns a random double between 0 and 1; in PHP, mt_rand() / mt_getrandmax() gives you one.
$num = 100 * pow(mt_rand() / mt_getrandmax(), 2);
This squares the random number, and since the value is between 0 and 1, squaring makes it smaller, thus increasing the chance of a lower number. To change the exact ratio, raise it to a higher or lower power and play with this format.
To me it seems like you need a logarithmic function (which is curved). You'd still pull a random number, but the value that you'd get would be closer to 1 than 100 most of the time. So I guess this could work:
function random_value($min = 1, $max = 100) {
    // Guard against log(0): a default of $min = 0 could feed 0 into log(),
    // which returns -INF in PHP.
    return log(rand(max(1, $min), $max), 10) * 10;
}
Note that for inputs from 1 to 100 the output only ranges from 0 to 20, so you'd need to rescale it to your target range. You may also want to look into it yourself to make sure the skew matches what you want.
The easiest way to achieve a curved probability is to think about how you want to distribute, for example, a prize in a game across many winners and losers. To simplify your example I take 16 players and 4 prizes. Then I make an array with a symbol for each prize (1,2,2,2,3,3,3,3,3,4,4,4,4,4,4,4) and randomly pick a number out of this array. Mathematically you would have a probability of 1/16 for prize 1, 3/16 for prize 2, 5/16 for prize 3 and 7/16 for prize 4.
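In PHP, that lookup-table trick is a one-liner on top of the array:

$pool = [1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4];
echo $pool[mt_rand(0, count($pool) - 1)]; // 1 with probability 1/16, 2 with 3/16, 3 with 5/16, 4 with 7/16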
I'm having trouble wording my problem to search for it, so if anyone could point me in the right direction it would be appreciated.
I have multiple scores given out of 5 for a series of objects.
How can I find which object has the best overall rating? A similar formula to Amazon's reviews or Reddit's best comments (probably a lot more basic?), so not necessarily finding the highest average score but incorporating the number of reviews given to get the "best".
Any ideas?
This seems to be a classical application of the Friedman test: "n wine judges each rate k different wines. Are any wines ranked consistently higher or lower than the others?" The Friedman test is implemented in many statistical packages, e.g., in R as friedman.test.
The Friedman test will return a p-value. If the p-value is not significant, there is no reason to assume that some of the objects are consistently ranked higher than others. If the p-value is significant, then you know that some objects have been ranked higher than others, but you still do not know which ones. Hence, appropriate post-hoc multiple-comparison tests should be performed.
A number of different post-hoc tests can be performed; see, e.g., this example post-hoc analysis with R code: http://www.r-statistics.com/2010/02/post-hoc-analysis-for-friedmans-test-r-code/
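If you would rather stay in PHP than call out to R, the bare Friedman statistic is short to compute. Here is a sketch that assumes no tied scores within a rater's row (ties require average ranks and a correction term); $scores is raters x objects, and the result is compared against a chi-square table with k - 1 degrees of freedom:

function friedmanStatistic(array $scores) {
    $n = count($scores);    // raters
    $k = count($scores[0]); // objects
    $rankSums = array_fill(0, $k, 0);
    foreach ($scores as $row) {
        asort($row); // rank each rater's scores, low to high
        $rank = 1;
        foreach (array_keys($row) as $obj) {
            $rankSums[$obj] += $rank++;
        }
    }
    $sumSq = 0;
    foreach ($rankSums as $r) {
        $sumSq += $r * $r;
    }
    // Q = 12 / (n k (k + 1)) * sum(R_j^2) - 3 n (k + 1)
    return 12 * $sumSq / ($n * $k * ($k + 1)) - 3 * $n * ($k + 1);
}

// Three raters scoring four objects out of 5:
echo friedmanStatistic([[1, 3, 4, 5], [2, 3, 4, 5], [1, 2, 4, 5]]);
// Prints 9, above the 7.81 chi-square cutoff for df = 3 at p = 0.05.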
I'm trying to nest material with the least drop or waste.
Table A
Qty Type Description Length
2 W 16x19 16'
3 W 16x19 12'
5 W 16x19 5'
2 W 5x9 3'
Table B
Type Description StockLength
W 16x19 20'
W 16x19 25'
W 16x19 40'
W 5x9 20'
I've looked all over, reading about greedy algorithms, bin packing, knapsack, 1D-CSP, branch and bound, brute force, and others. I'm pretty sure it is a cutting stock problem. I just need help coming up with the function(s) to run this: I don't have just one stock length but multiple, and a user may enter his own inventory of less common lengths. Any help figuring out a function or algorithm to use in PHP to come up with the optimized cutting pattern and the stock lengths needed with the least waste would be greatly appreciated.
Thanks
If your question is "gimme the code", I am afraid that you have not given enough information to implement a good solution. If you read the whole of this answer, you will see why.
If your question is "gimme the algorithm", I am afraid you are looking for an answer in the wrong place. This is a technology-oriented site, not an algorithms-oriented one. Even though we programmers do of course understand algorithms (e.g., why it is inefficient to pass the same string to strlen in every iteration of a loop, or why bubble sort is not okay except for very short lists), most questions here are like "how do I use API X using language/framework Y?".
Answering complex algorithm questions like this one requires a certain kind of expertise (including, but not limited to, lots of mathematical ability). People in the field of operations research have worked on this kind of problem more than most of us ever have. Here is an introductory book on the topic.
As an engineer trying to find a practical solution to a real-world problem, I would first get answers for these questions:
How big is the average problem instance you are trying to solve? Since your generic problem is NP-complete (as Jitamaro already said), moderately big problem instances require the use of heuristics. If you are only going to solve small problem instances, you might be able to get away with implementing an algorithm that finds the exact optimum, but of course you would have to warn your users that they should not use your software to solve big problem instances.
Are there any patterns you could use to reduce the complexity of the problem? For example, do the items always or almost always come in specific sizes or quantities? If so, you could implement a greedy algorithm that focuses on yielding high-quality solutions for common scenarios.
What is your optimality vs. computational efficiency tradeoff? If you only need a good answer, then you should not waste mental or computational effort trying to provide an optimal answer. Information, whether provided by a person or by a computer, is only useful if it is available when it is needed.
How much are your customers willing to pay for a high-quality solution? Unlike database or Web programming, which can be done by practically everyone because algorithms are kept to a minimum (e.g. you seldom code the exact procedure by which a SQL database provides the result of a query), operations research does require both mathematical and engineering skills. If you are not charging for them, you are losing money.
This looks to me like a variation of 1D bin packing. You may try best-fit, and then try it with different sortings of Table B. Note that no polynomial-time algorithm can guarantee a solution better than 3/2 of the optimum (unless P = NP), because the problem is NP-hard. Here is a nice tutorial: http://m.developerfusion.com/article/5540/bin-packing. It helped me a lot when solving my own problem.
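As a starting point, here is a first-fit-decreasing sketch in PHP for a single stock length; the function name and data layout are invented for illustration, and handling multiple stock lengths (say, by trying each available length and keeping the pattern with the least waste) is where the real cutting-stock work begins:

function firstFitDecreasing(array $pieces, $stockLength) {
    rsort($pieces); // longest pieces first
    $bars = [];     // each bar: ['free' => remaining length, 'cuts' => pieces]
    foreach ($pieces as $piece) {
        $placed = false;
        foreach ($bars as &$bar) {
            if ($bar['free'] >= $piece) { // first bar with enough room
                $bar['free'] -= $piece;
                $bar['cuts'][] = $piece;
                $placed = true;
                break;
            }
        }
        unset($bar);
        if (!$placed) { // open a new bar
            $bars[] = ['free' => $stockLength - $piece, 'cuts' => [$piece]];
        }
    }
    return $bars;
}

// The W 16x19 demand from Table A, in feet, against 20' stock:
$pieces = [16, 16, 12, 12, 12, 5, 5, 5, 5, 5];
print_r(firstFitDecreasing($pieces, 20));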