Algorithm for non-cyclic workforce scheduling - php

There seems to be a large amount of information about Cyclic (or "Rotating") Workforce Scheduling problems. I am searching for an algorithm that will help generate a schedule of employee shifts that does not care what the previous week's schedule looked like. From my research, this sounds like a non-cyclic workforce scheduling problem.
Essentially, I have each employee's availability, their min/max hours, and their requested time off. With that information, I want to create an optimized schedule that respects each employee's desired availability while also meeting the number of required shifts for each day.
Does anyone have tips on a good algorithm for this purpose? Thanks!

For problems like employee scheduling where there are a lot of constraints on the solution, I prefer approaches that never violate any constraints, or come as close to that as possible. (Some approaches, such as genetic crossover, will violate constraints and then perform additional operations to fix the solution - this is also a valid approach, but you need to beware of going down a blind alley.)
Two approaches are based on using a greedy algorithm.
The first is to use a semi-random greedy algorithm: where you have two choices, ordinarily you would always select the locally optimal one, but with a semi-random greedy approach you introduce the possibility of selecting the choice that isn't locally optimal. For example, choice one has a weight of 5 and choice two has a weight of 2; ordinarily you would select choice one, but here you would draw a random number in the range [0, 5 + 2) and select choice one if it is less than 5, else select choice two. Now run the algorithm several times and take the "best" solution.
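A minimal sketch of that weighted random pick in PHP (the weights follow the two-choice example above; how you weight each candidate slot depends on your own constraints):

// Pick one candidate index at random, with probability proportional to its weight.
// Assumes positive integer weights; e.g. with weights [5, 2], choice 0 is picked
// about 5/7 of the time and choice 1 about 2/7.
function weightedPick(array $weights): int
{
    $total = array_sum($weights);
    $r = mt_rand(0, $total - 1);          // uniform integer in [0, total)
    foreach ($weights as $index => $weight) {
        if ($r < $weight) {
            return $index;                // $r falls in this choice's band
        }
        $r -= $weight;
    }
    return array_key_last($weights);      // safety fallback
}

$choice = weightedPick([5, 2]);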
The second option is to start with a greedy or semi-random greedy solution, and to use a local search algorithm to reassign employee slots in an attempt to improve the solution. For example, if an employee has fewer than their desired hours, bump an employee occupying a slot that's legal for the under-scheduled employee and assign the under-scheduled employee to it, continuing the search to reassign the bumped employee if need be. Unlike the first solution, this one may not terminate if you're not careful.
The two approaches can be combined, generating several solutions with the semi random greedy approach and then conducting local searches to improve the best results.
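A rough outline of that combination, assuming you already have buildSemiRandomGreedySchedule(), localSearch(), and score() functions (all hypothetical names for your own constraint-aware code):

// Generate several semi-random greedy schedules, improve each with local search,
// and keep the best-scoring one. The three helpers are placeholders; localSearch()
// should be capped (e.g. by iteration count) so it is guaranteed to terminate.
$best = null;
$bestScore = -INF;
for ($i = 0; $i < 50; $i++) {
    $schedule = buildSemiRandomGreedySchedule($employees, $requiredShifts);
    $schedule = localSearch($schedule, 1000);   // at most 1000 reassignment moves
    $s = score($schedule);                      // e.g. how well desired hours are met
    if ($s > $bestScore) {
        $bestScore = $s;
        $best = $schedule;
    }
}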

Related

Given a list of numbers predict the next in sequence

I'm using PHP, I have a list of numbers with a min of 1 and a max of 10:
1,2,4,10,4,3,1,6,9,8,2,10,5,6,7,3,1...
Is there a way to find the next logic number in the sequence (or at least the possible number/s)?
I think I can loop through the array and find the number that came up least often, but I'm not sure that will work.
You can create a list of functions that test that list of numbers for a specific pattern, yes. However, that is much different from what humans do, which is to "discover" a pattern. Humans also test for patterns they have seen in the past, but we are capable of discovering a pattern we haven't seen before using the algorithms inside our heads. If you want the code to try to discover patterns in your list of numbers, that is Artificial Intelligence coding. It very much does exist, though it's a big topic altogether.
I hope that explanation helps :)
Edited:
Here's a link if you are interested in knowing more about Artificial Intelligence coding:
https://www.youtube.com/watch?v=TjZBTDzGeGg&list=PLUl4u3cNGP63gFHB6xb-kVBiQHYe_4hSi
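For what it's worth, the simple frequency idea from the question is easy to try in PHP, though (as noted above) it only tests for one very specific "pattern" rather than discovering anything:

// Count how often each number 1..10 has appeared so far and guess the
// least-frequent one(s) as the "next" value. This is only a heuristic.
$history = [1, 2, 4, 10, 4, 3, 1, 6, 9, 8, 2, 10, 5, 6, 7, 3, 1];
$counts = array_count_values($history);
for ($n = 1; $n <= 10; $n++) {
    $counts[$n] = $counts[$n] ?? 0;          // numbers never seen count as 0
}
$min = min($counts);
$candidates = array_keys($counts, $min);     // all numbers tied for least frequent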

PHP functions for correct statistics of data

I am not skilled in the world of statistics, so I hope this will be easy for someone; my lack of skill also made it very hard to find the correct search terms on this topic, so I may have missed my answer while searching. Anyway: I am looking at arrays of data, CPU usage for example. How can I capture accurate information in as few data points as possible on, say, a data set of one hour of per-core CPU usage at 1-second intervals, where the first 30 minutes are 0% and the second 30 minutes are 100%? Right now, the only single data point I can think of is the mean, which is 50% and not useful at all in this case. Another case is when the usage graph is like a wave, evenly bouncing up and down between 0 and 100, yet still giving a mean of 50%. How can I capture this data? Thanks.
If I understand your question, it is really more of a statistics question than a programming question. Do you mean, what is the best way to capture a population curve with the fewest variables possible?
Firstly, the assumptions behind most standard statistics imply that the system is more or less stable (although, if the system is unstable, the numbers you get will let you know, because they will be nonsensical).
The main measures that you need to know statistically are the mean, the population size, and the standard deviation. From these, you can calculate the rough bell curve defining the population curve, and know the accuracy of the curve based on the scale of the standard deviation.
This gives you a three variable schema for a standard bell curve.
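A minimal sketch of computing those three values in PHP (plain arrays, no statistics library assumed):

// Summarize a set of samples with count, mean and (population) standard deviation.
function summarize(array $samples): array
{
    $n = count($samples);
    $mean = array_sum($samples) / $n;
    $sumSq = 0.0;
    foreach ($samples as $x) {
        $sumSq += ($x - $mean) ** 2;
    }
    return ['n' => $n, 'mean' => $mean, 'stddev' => sqrt($sumSq / $n)];
}

// 30 minutes at 0% followed by 30 minutes at 100%, sampled once per second:
$stats = summarize(array_merge(array_fill(0, 1800, 0), array_fill(0, 1800, 100)));
// mean = 50, stddev = 50: the large spread is what flags that the load was not steady.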
If you want to get in further detail, you can add Cpk, Ppk, which are calculated fields.
Otherwise, you may need to get into non-linear regression and curve fitting which is best handled on a case by case basis (not great for programming).
Check out the following sites for calculating the Cp, Cpk, Pp and Ppk:
http://www.qimacros.com/control-chart-formulas/cp-cpk-formula/
http://www.macroption.com/population-sample-variance-standard-deviation/

Cutting Stock Problem

I'm trying to nest material with the least drop or waste.
Table A
Qty  Type  Description  Length
2    W     16x19        16'
3    W     16x19        12'
5    W     16x19        5'
2    W     5x9          3'

Table B
Type  Description  StockLength
W     16X19        20'
W     16X19        25'
W     16X19        40'
W     5X9          20'
I've looked all over, researching Greedy Algorithms, Bin Packing, Knapsack, 1D-CSP, branch and bound, brute force, and others. I'm pretty sure this is a cutting stock problem. I just need help coming up with the function(s) to run it. I don't have just one stock length but several, and a user may enter their own inventory of less common lengths. Any help figuring out a function or algorithm to use in PHP to come up with the optimized cutting pattern and the stock lengths needed with the least waste would be greatly appreciated.
Thanks
If your question is "gimme the code", I am afraid that you have not given enough information to implement a good solution. If you read the whole of this answer, you will see why.
If your question is "gimme the algorithm", I am afraid you are looking for an answer in the wrong place. This is a technology-oriented site, not an algorithms-oriented one. Even though we programmers do of course understand algorithms (e.g., why it is inefficient to pass the same string to strlen in every iteration of a loop, or why bubble sort is not okay except for very short lists), most questions here are like "how do I use API X using language/framework Y?".
Answering complex algorithm questions like this one requires a certain kind of expertise (including, but not limited to, lots of mathematical ability). People in the field of operations research have worked on this kind of problem more than most of us ever have. Here is an introductory book on the topic.
As an engineer trying to find a practical solution to a real-world problem, I would first get answers for these questions:
How big is the average problem instance you are trying to solve? Since your generic problem is NP-complete (as Jitamaro already said), moderately big problem instances require the use of heuristics. If you are only going to solve small problem instances, you might be able to get away with implementing an algorithm that finds the exact optimum, but of course you would have to warn your users that they should not use your software to solve big problem instances.
Are there any patterns you could use to reduce the complexity of the problem? For example, do the items always or almost always come in specific sizes or quantities? If so, you could implement a greedy algorithm that focuses on yielding high-quality solutions for common scenarios.
What would be your optimality vs. computational efficiency tradeoff? If you only need a good answer, then you should not waste mental or computational effort in trying to provide an optimal answer. Information, whether provided by a person or by a computer, is only useful if it is available when it is needed.
How much are your customers willing to pay for a high-quality solution? Unlike database or Web programming, which can be done by practically everyone because algorithms are kept to a minimum (e.g. you seldom code the exact procedure by which a SQL database provides the result of a query), operations research does require both mathematical and engineering skills. If you are not charging for them, you are losing money.
This looks to me like a variation of 1D bin packing. You may try a best-fit heuristic and then try it with different sortings of Table B. Note that, because the problem is NP-complete, no efficient algorithm can be guaranteed to get within a factor of 3/2 of the optimum in every case. Here is a nice tutorial: http://m.developerfusion.com/article/5540/bin-packing. I used it a lot to solve my problem.
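As an example, here is a minimal first-fit-decreasing sketch in PHP (a common bin-packing heuristic, not guaranteed to be optimal); the inputs mirror the W 16x19 rows of Tables A and B, and saw kerf is ignored:

// First-fit decreasing: sort the required cuts longest-first, then place each cut
// into the first already-opened stock bar it fits in, opening a new bar (of the
// smallest stock length that can hold the cut) when none fits.
function cutStock(array $cuts, array $stockLengths): array
{
    rsort($cuts);          // longest cuts first
    sort($stockLengths);   // try the smallest stock first when opening a new bar
    $bars = [];            // each bar: ['length' => ..., 'remaining' => ..., 'cuts' => [...]]
    foreach ($cuts as $cut) {
        $placed = false;
        foreach ($bars as &$bar) {
            if ($bar['remaining'] >= $cut) {
                $bar['remaining'] -= $cut;
                $bar['cuts'][] = $cut;
                $placed = true;
                break;
            }
        }
        unset($bar);
        if (!$placed) {
            foreach ($stockLengths as $len) {
                if ($len >= $cut) {
                    $bars[] = ['length' => $len, 'remaining' => $len - $cut, 'cuts' => [$cut]];
                    $placed = true;
                    break;
                }
            }
        }
        if (!$placed) {
            throw new RuntimeException("No stock length can hold a cut of $cut'");
        }
    }
    return $bars;          // the waste ("drop") for each bar is $bar['remaining']
}

// W 16x19 cuts from Table A (2 x 16', 3 x 12', 5 x 5') and stock from Table B:
$plan = cutStock([16, 16, 12, 12, 12, 5, 5, 5, 5, 5], [20, 25, 40]);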

Points system like stackoverflow

I am trying to create a point system in my program similar to Stack Overflow's, i.e. when the user does some good deed (activity), his/her points are increased. I am wondering what the best way is to go about implementing this in terms of DB schema + logic.
I can think of three options:
Add an extra field called points in the users table, and every time a user does something, add to that field (but this will not give you any kind of activity history)
Create a function which runs every time the user does a good deed and recalculates the value from scratch, updating the points field
Calculate the value every time using a function, without any points field.
What is the best way to go about this? Thank you for your time.
Personally, I would use the second option to approach this problem.
The first option limits functionality, so I eliminate that right away.
The third option is inefficient in terms of performance - it is likely that you will be fetching that number a lot, and, if your program is anything like stackoverflow, perhaps showing (calculating) that number many times per pageview/action.
To me, the second option is a decent hybrid solution. Normally, I hate having duplicated data in my system (actions and points, rather than one or the other), but in this case, an integer field is a rather small amount of space per user that saves you a lot of time in recalculating the values unnecessarily.
We must, at times, trade data storage space for performance or vice versa, and I would say that #2 is a trade-off that greatly benefits the application.
This depends very much on the number of expected computations you'll face. In fact, SO apparently uses a method which is similar to your 1) approach, for performance reasons I assume.
This also prevents jumps in the numbers if factors change (such as deleted items which awarded points, or here on SO replies which become community wiki, changes in the point rules, external actions such as joining another account here on SO etc.)
If a recalc solution (2) is what you want, you may implement "smart" caching by clearing the value (setting it to NULL, which would mean "dirty") each time a point modification may take place, and re-computing it when it is NULL, using the cache otherwise. You could also (as a self-correcting measure for when non-explicit things happen) clear out the values after an hour, a day or whatever you think fits, so that a recalc is forced after a certain time, independently of the "dirty" state.
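A rough sketch of that dirty-flag caching in PHP/MySQL (the nullable users.points column and the points_log table are made-up names for illustration):

// Lazily recompute the cached total when it has been invalidated (set to NULL).
function getPoints(PDO $db, int $userId): int
{
    $sel = $db->prepare("SELECT points FROM users WHERE id = ?");
    $sel->execute([$userId]);
    $points = $sel->fetchColumn();
    if ($points === null) {                       // NULL means "dirty": recompute
        $db->prepare(
            "UPDATE users SET points =
               (SELECT COALESCE(SUM(points), 0) FROM points_log WHERE user_id = ?)
             WHERE id = ?"
        )->execute([$userId, $userId]);
        $sel->execute([$userId]);
        $points = $sel->fetchColumn();
    }
    return (int) $points;
}

// To invalidate after a rule change or a deletion:
//   UPDATE users SET points = NULL WHERE id = ...;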
I would go for 1 and 2 (run from cron every minute or so).
So that:
- the extra field would act as a cache for the number of points.
- the function to calculate the points could be a single SQL query that recalculates the points for all users at once, to gain some speed.
I think that recalculating the field every single time a point is received would be overkill.
Personally, I'd go with the first option, and add an "Actions" table to keep track of your activity history.
When a user does something good, they get an entry in the "Actions" table, with the action and some point value. The point value can come from another table, or some config file. That same value gets added to the user record.
At any point in time, you could sum up the actions and get the user total, but for performance, simply updating when you add the action record would be simple enough.
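A hedged sketch of that approach (the actions and users tables and their column names are assumptions), wrapped in a transaction so the log and the cached total stay in sync:

// Record the good deed in the "Actions" log and bump the cached total together,
// so the two can never drift apart. The point value per action would come from
// a lookup table or config file, as described above.
function awardPoints(PDO $db, int $userId, string $action, int $points): void
{
    $db->beginTransaction();
    try {
        $db->prepare("INSERT INTO actions (user_id, action, points, created_at)
                      VALUES (?, ?, ?, NOW())")
           ->execute([$userId, $action, $points]);
        $db->prepare("UPDATE users SET points = points + ? WHERE id = ?")
           ->execute([$points, $userId]);
        $db->commit();
    } catch (Throwable $e) {
        $db->rollBack();
        throw $e;
    }
}

// awardPoints($db, 42, 'answer_upvoted', 10);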
How simple is your points system going to be?
I reckon some kind of logging/journalling is good so that you can track activity on a daily/weekly/monthly basis across all users.
Check out http://code.google.com/p/userinfuser/
It's open source and allows you to add points and badges to your application. It has Java, Python, PHP, and Ruby bindings.

Implementing keyword comparison scheme (reverse search)

I have a constantly growing database of keywords. I need to parse incoming text inputs (articles, feeds etc) and find which keywords from the database are present in the text. The database of keywords is much larger than the text.
Since the database is constantly growing (users add more and more keywords to watch for), I figure the best option will be to break the text input into words and compare those against the database. My main dilemma is implementing this comparison scheme (PHP and MySQL will be used for this project).
The most naive implementation would be to create a simple SELECT query against the keywords table, with a giant IN clause listing all the found keywords.
SELECT user_id,keyword FROM keywords WHERE keyword IN ('keyword1','keyword2',...,'keywordN');
Another approach would be to create a hash-table in memory (using something like memcache) and to check against it in the same manner.
Does anyone have any experience with this kind of searching, and any suggestions on how to better implement it? I haven't tried any of these approaches yet; I'm just gathering ideas at this point.
The classic way of searching a text stream for multiple keywords is the Aho-Corasick finite automaton, which uses time linear in the text to be searched. You'll want minor adaptations to recognize strings only on word boundaries, or perhaps it would be simpler just to check the keywords found and make sure they are not embedded in larger words.
You can find an implementation in fgrep. Even better, Preston Briggs wrote a pretty nice implementation in C that does exactly the kind of keyword search you are talking about. (It searches programs for occurrences of 'interesting' identifiers.) Preston's implementation is distributed as part of the Noweb literate-programming tool. You could find a way to call this code from PHP, or you could rewrite it in PHP (the recognizer itself is about 220 lines of C, and the main program is another 135 lines).
All the proposed solutions, including Aho-Corasick, have these properties in common:
A preprocessing step that takes time and space proportional to the number of keywords in the database.
A search step that takes time and space proportional to the length of the text plus the number of keywords found.
Aho-Corasick offers considerably better constants of proportionality on the search step, but if your texts are small, this won't matter. In fact, if your texts are small and your database is large, you probably want to minimize the amount of memory used in the preprocessing step. Andrew Appel's DAWG data structure from the world's fastest scrabble program will probably do the trick.
In general:
a. break the text into words
b. convert words back to canonical root form
c. drop common conjunction words
d. strip duplicates
then insert the words into a temporary table and do an inner join against the keywords table, or (as you suggested) build the keywords into a complex query criterion (see the sketch below).
It may be worthwhile to cache a 3- or 4-letter hash array with which to pre-filter potential keywords; you will have to experiment to find the best tradeoff between memory size and effectiveness.
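A rough PHP/MySQL sketch of steps a-d plus the temporary-table join (the canonical-root step is only stubbed with lower-casing, the stop-word list is a tiny sample, and the keywords table layout is assumed from the question):

// Steps a-d: split the text into unique, lower-cased words and drop stop words.
// (A real canonical-root step would use a stemmer; strtolower is just a stand-in.)
function tokenize(string $text): array
{
    $stopWords = ['the', 'and', 'or', 'a', 'an', 'of', 'to', 'in'];   // tiny sample list
    preg_match_all('/\pL+/u', mb_strtolower($text), $m);
    return array_values(array_diff(array_unique($m[0]), $stopWords));
}

// Load the words into a temporary table and inner-join against the keywords table
// instead of building a giant IN (...) clause.
function matchKeywords(PDO $db, string $text): array
{
    $db->exec("CREATE TEMPORARY TABLE doc_words (word VARCHAR(100) PRIMARY KEY)");
    $ins = $db->prepare("INSERT IGNORE INTO doc_words (word) VALUES (?)");
    foreach (tokenize($text) as $word) {
        $ins->execute([$word]);
    }
    return $db->query(
        "SELECT k.user_id, k.keyword
           FROM keywords k
           INNER JOIN doc_words d ON d.word = k.keyword"
    )->fetchAll(PDO::FETCH_ASSOC);
}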
I'm not 100% clear on what you're asking, but maybe what you're looking for is an inverted index?
Update:
You can use an inverted index to match multiple keywords at once.
Split up the new document into tokens, and insert the tokens paired with an identifier for the document into the inverted index table. A (rather denormalized) inverted index table:
inverted_index
-----
document_id keyword
If you're searching for 3 keywords manually:
select document_id, count(*) from inverted_index
where keyword in (keyword1, keyword2, keyword3)
group by document_id
having count(*) = 3
If you have a table of the keywords you care about, just use an inner join rather than an in() operation:
keyword_table
----
keyword othercols
select keyword_table.keyword, keyword_table.othercols from inverted_index
inner join keyword_table on keyword_table.keyword=inverted_index.keyword
where inverted_index.document_id=id_of_some_new_document
Is any of this closer to what you want?
Have you considered graduating to a fulltext solution such as Sphinx?
I'm talking out of my hat here, because I haven't used it myself. But it's getting a lot of attention as a high-speed fulltext search solution. It will probably scale better than any relational solution you use.
Here's a blog about using Sphinx as a fulltext search solution in MySQL.
I would do 2 things here.
First (and this isn't directly related to the question), I'd break up and partition user keywords by user: more tables with less data in each, ideally on different servers, with slices or ranges of users living on different shards. That is, all of user A's data lives on slice one, user B's on slice two, and so on.
Second, I'd have some sort of in-memory hash table to determine the existence of keywords. This would likely be federated as well to distribute the lookups: for n keyword-existence servers, hash the keyword, take it mod n, and distribute ranges of those keys across all of the memcached servers. This quick check lets you ask "is keyword x being watched?": hash it, determine which server it lives on, make the lookup, and collect/aggregate the keywords being tracked.
At that point you'll at least know which keywords are being tracked and you can take your user slices and perform subsequent lookups to determine which users are tracking which keywords.
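A small sketch of that keyword-existence check with the PHP Memcached extension (the server list and the 'kw:' key prefix are assumptions):

// Shard keyword-existence lookups: hash the keyword, mod by the number of cache
// servers, and ask only that server whether the keyword is being watched.
$servers = [['10.0.0.1', 11211], ['10.0.0.2', 11211], ['10.0.0.3', 11211]];

function isWatched(string $keyword, array $servers): bool
{
    $i = abs(crc32($keyword)) % count($servers);   // which shard owns this keyword
    $mc = new Memcached();
    $mc->addServer($servers[$i][0], $servers[$i][1]);
    return $mc->get('kw:' . $keyword) !== false;   // any stored value means "watched"
}

// Words extracted from the incoming document:
$tokens = ['php', 'mysql', 'scheduling'];
$tracked = array_filter($tokens, fn ($w) => isWatched($w, $servers));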
In short: SQL is not an ideal solution here.
I hacked up some code for scanning for multiple keywords using a DAWG (as suggested above, referencing the Scrabble paper), although I wrote it from first principles and I don't know whether it is anything like the Aho-Corasick algorithm or not.
http://www.gtoal.com/wordgames/spell/multiscan.c.html
A friend made some hacks to my code after I first posted it on the wordgame programmers mailing list, and his version is probably more efficient:
http://www.gtoal.com/wordgames/spell/multidawg.c.html
Scales fairly well...
G
