PHP shift scheduling - sequentially looping through a multidimensional array with numerous subarrays

I am trying to design a shift scheduling program. There are about a dozen potential users (+/- a few), all with specific vacation requirements. There are 4 types of shifts (role1, role2, role3, role4); each role must be covered by a person, a single person cannot cover multiple roles, and some users prefer certain shift sequences (eg. role1-role2-role4-role3 vs. role1-role2-role3-role4), though ultimately the shift sequences can be rearranged if it can't be accommodated within the schedule. Some users can't work certain shifts at all. Each user also has a maximum number of shifts - overall, there isn't much flexibility within the system, so I suspect only a limited number of possible valid solutions.
I've created an array that takes into account mandatory user vacations, so for each calendar date I know which users are in theory available for that shift (at baseline):
[2020-07-21] => Array
(
[0] => Array
(
[role1] => userA
[role2] => userB
[role3] => userC
[role4] => userD
)
[1] => Array
(
[role1] => userA
[role2] => userB
[role3] => userD
[role4] => userC
)
[2] => Array
(
[role1] => userA
[role2] => userC
[role3] => userB
[role4] => userD
)
The array is generated by the code below. This code itself hasn't had issues; it's when I try to call print_r(cartesian($solutionsArray)); that I run out of memory.
// Use cartesian product to pick one value from each role array, and then exclude arrays that don't have at least 4 unique users
foreach ($globalAvailabilityCalendarArray[$globalAvailabilityCalendarDate] as $role => $availableUsersArray) {
$cartesianProducts = $utilities->cartesian($globalAvailabilityCalendarArray[$globalAvailabilityCalendarDate]);
foreach ($cartesianProducts as $cartesianProduct) {
if(count(array_unique(array_values($cartesianProduct))) !== count($this->roles)) {
continue;
} else {
$solutions[] = $cartesianProduct;
}
}
}
$solutionsArray[$globalAvailabilityCalendarDate] = array_unique($solutions, SORT_REGULAR);
I have a few problems though:
For this time period, there are 63 work days to take into account. Some days only have 4 people available so the possible combinations are few, but some days may have 10+ available people, which means hundreds of possible valid combinations. However, the scheduling requirements of each day depend on which combination has been selected for previous days, so that no individual worker goes over their maximum number of shifts. Thus, I am trying to sequentially iterate over each subarray until I find the first possible valid solution (and if possible, other valid solutions for users to compare against), e.g. [2020-07-01][0] ... [2020-07-02][1] ... then [2020-07-01][0] ... [2020-07-02][2]. I have tried calculating the cartesian product (Finding cartesian product with PHP associative arrays), which worked for finding available combinations for a single day, but my script runs out of memory when applying it to the whole calendar as there are just too many possible sequences. Is there a less memory-intensive alternative?
With my array structure, how can I prioritize shift sequences so that users have a good ratio of shifts (target is role1+role2+role4 / role3 = 2.5) and preferred shift sequence (eg. role1-role2-role3-role4 and avoiding sequential busy shifts like role2-role2-role2-role2-role1)?

If the only problem you have is memory usage when using the cartesian function then you could search for another way to do the cartesian product without consuming that much memory. After a quick search I found a possible solution you could use. https://github.com/bpolaszek/cartesian-product
According to its documentation, you can use it like:
print_r(cartesian_product($solutionsArray)->asArray());
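Building every sequence up front may not be needed at all. As a minimal sketch (not the asker's code), the calendar can be walked depth-first with a generator, abandoning a branch as soon as someone exceeds their shift limit, so only one partial schedule is held in memory at a time. The function name findSchedules and the $maxShifts map (user => maximum shifts) are assumptions made for illustration; $optionsByDate has the same shape as $solutionsArray in the question.

```php
<?php
// Hypothetical helper: lazily yields every schedule that respects per-user shift limits.
// $optionsByDate: date => list of (role => user) assignments, like $solutionsArray.
// $maxShifts:     user => maximum number of shifts (assumed input, not in the question).
function findSchedules(
    array $optionsByDate,
    array $maxShifts,
    ?array $dates = null,
    int $i = 0,
    array $counts = [],
    array $chosen = []
): Generator {
    $dates = $dates ?? array_keys($optionsByDate);
    if ($i === count($dates)) {
        yield $chosen; // every day has an assignment
        return;
    }
    $date = $dates[$i];
    foreach ($optionsByDate[$date] as $assignment) {
        $next = $counts;
        $ok = true;
        foreach ($assignment as $user) {
            $next[$user] = ($next[$user] ?? 0) + 1;
            if ($next[$user] > ($maxShifts[$user] ?? PHP_INT_MAX)) {
                $ok = false; // this user would exceed their limit: prune the branch
                break;
            }
        }
        if (!$ok) {
            continue;
        }
        $chosen[$date] = $assignment;
        yield from findSchedules($optionsByDate, $maxShifts, $dates, $i + 1, $next, $chosen);
    }
}

// Tiny made-up calendar with two roles per day.
$options = [
    '2020-07-01' => [
        ['role1' => 'userA', 'role2' => 'userB'],
        ['role1' => 'userB', 'role2' => 'userA'],
    ],
    '2020-07-02' => [
        ['role1' => 'userA', 'role2' => 'userC'],
        ['role1' => 'userB', 'role2' => 'userC'],
    ],
];
$max = ['userA' => 1, 'userB' => 2, 'userC' => 1];

foreach (findSchedules($options, $max) as $schedule) {
    print_r($schedule); // first valid schedule
    break;              // stop here, or keep iterating for alternatives
}
```

Because the generator is lazy, stopping after the first result costs only the work needed to reach it. Preferences such as shift ratios could be handled by ordering each day's options before the search, though that is not shown here.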

Related

Dynamic Programming: Minimum wastage and minimum lot

I am working on a project where I need to create a data in such optimised format that it doesn't slow down the framework too much.
The problem is:
Suppose you have ordered 12 unit from my ecommerce store of a product. And for that product I have 5 different bundles to offer this much quantity from.
Suppose the array of bundles with serial number as key and max units available in that bundle as value is like:
$arr = array(
array('sr_no1'=>5),
array('sr_no2'=>7),
array('sr_no3'=>2),
array('sr_no4'=>9),
array('sr_no5'=>12)
);
Now there are two main conditions to my greedy approach to give away the quantity requested by customer.
there should be minimum wastage like if you order 11 units then you would give 9+2-11=0 wastage instead of giving it from 12-11=1 wastage
value should be chosen from minimum number of lot/bundles like if you order 12 units then there are 5+7-12=0 wastage and 12-12=0 wastage so we'll choose array('sr_no5'=>12) for giving away the requested quantity.
I have been trying to figure out the solution for the last 3 days.
Consider the test cases for quantity ordered to be like 12 or 11 or 6 or 35 or 30, etc.
What I need as a result is the arrays that we'll choose to distribute the quantities like array('sr_no5'=>12) for giving away 12 units of quantity ordered and arrays array('sr_no3'=>2),array('sr_no4'=>9) for giving away 11 units of quantity.
I have tried knapsack, greedy, and minimum spanning tree while trying to figure out the solution.
Please find the most optimised solution as we do not want to achieve server time out.
NOTE: all the values above (quantity/units ordered, number of bundles, max available units in each bundle) are variables and can change from case to case.

Not sure of an algorithm, but you could view all possible outcomes by assigning each bundle a binary digit, 0 or 1. 11000 would equal 5+7+0+0+0 (12), 00010 would equal 0+0+0+9+0 (9), etc.
Then build a master array based on that pseudo binary value and the total that gives.
Then filter by match (or nearest match) and see which result has least amount of 1's.
It's crude but would work.
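The binary idea above can be sketched as follows. This is only an illustration: pickBundles is a hypothetical name, and the bundles are flattened into a single serial => units array rather than the nested arrays from the question. It tries every subset, so it is only practical for a small number of bundles (2^n subsets).

```php
<?php
// Hypothetical helper: brute-force every subset of bundles via a bitmask,
// keeping the one with the least waste and, on ties, the fewest bundles.
function pickBundles(array $bundles, int $ordered): ?array
{
    $keys = array_keys($bundles);
    $n = count($keys);
    $best = null;
    $bestWaste = PHP_INT_MAX;
    $bestCount = PHP_INT_MAX;

    for ($mask = 1; $mask < (1 << $n); $mask++) {
        $picked = [];
        $total = 0;
        for ($bit = 0; $bit < $n; $bit++) {
            if ($mask & (1 << $bit)) {
                $picked[$keys[$bit]] = $bundles[$keys[$bit]];
                $total += $bundles[$keys[$bit]];
            }
        }
        if ($total < $ordered) {
            continue; // not enough units in this subset
        }
        $waste = $total - $ordered;
        $count = count($picked);
        if ($waste < $bestWaste || ($waste === $bestWaste && $count < $bestCount)) {
            $best = $picked;
            $bestWaste = $waste;
            $bestCount = $count;
        }
    }
    return $best; // null when even all bundles together are too small
}

// Flattened version of the question's data (assumed shape).
$bundles = ['sr_no1' => 5, 'sr_no2' => 7, 'sr_no3' => 2, 'sr_no4' => 9, 'sr_no5' => 12];
print_r(pickBundles($bundles, 11));
print_r(pickBundles($bundles, 12));
```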

It looks like a three-part algorithm:
Get an array with all possible bundle combinations
Filter possible solutions: keep every combination that gives exactly as many items as needed, or more
Order by preference ( PHP: Sorting Arrays ):
First those with minimal waste
Then those that use the fewest bundles
Then those that use smaller bundles
You can optimize the code by unifying parts 1 and 2, filtering the possible solutions while creating the array. There are probably many possible programming patterns or even libraries to accomplish such a purpose. Depending on the number of bundles involved in each case, optimizing that much may be overkill.

sorted php array "introspection"

For some reason, I have a sorted php array:
"$arr_questions" = Array [6]
0 Array [6]
1 Array [6]
2 Array [6]
3 Array [6]
4 Array [6]
5 Array [6]
each of the positions is another array. This time it is associative. See position [0]:
0 = Array [6]
question_id 40
question La tercera pregunta del mundo
explanation
choices Array [3]
correct 0
answer 1
Without looping my array, is there any way to access directly this position 0, just knowing one of its properties?
Example... Imagine I have to change some property of the position of the array whose "question_id" property is 40. That is the only I know. I don't know if the question_id property is gonna be in the first or second or which position. And, for example, imagine I want to change the "answer" property to 2.
How can I access directly to that position without looping the whole array. I mean... I don't want to do this:
foreach ($arr_questions as &$question){
if ($question["question_id"] == 40){
$question["answer"] = 2;
}
}
unset($question); // break the reference left over from the loop

A PHP Array lets you access random values by its id.
It is actually a big deal, because in other languages array indices must be always integers.
However, PHP arrays work mostly like other-languages dictionaries, in which your key can be of other data types, like strings.
By that, if you want to be able to access some question, and you know the ID, then you should have constructed the array by letting your question_id be the index of each array entry.
If you can't do that, don't panic.
In the end you will have to make some kind of search, that's true.
But hey, then you have two cases:
a) Your array is big. Wow, in that case, you should run an optimized sorting algorithm, such as mergesort or quicksort, so that you can order your data quickly and then have them already sorted by your wanted field.
b) Your array is not-so-big. I think in that case it's no big deal, and the sorting can slow your application more than it should, and if you want to be quicker, you should cache the results of sorting the questions (if possible) or refactoring the array construction so it uses your wanted key as the array index.
As a side note, you can't look things up without spending some CPU time or some RAM space, and usually you can trade one for the other.
I mean, if you store just one array indexed by question_id then you can look up a question_id in O(1) + O(array-access) time. If O(array-access) is a constant, then you can get to things in O(1). That means constant time, and it is as fast as it can get.
However, if you need other kinds of searches you can end up with O(n * log(n)) or O(n²) time complexity.
But if you stored one array for each way you need to look things up, you would need only O(1) time to access each of them, at the cost of O(n) space (where n here is the number of fields you want direct access by).
That would increment the time to build the arrays (by a constant).

With your situation it's not possible without a loop, but if you change your array structure to this:
array(
39 => array(...),
40 => array(...)
)
where 39 and 40 are your question_id values, then you can access them directly without any loop.
If you want or have to keep that structure, then just write a function that takes the array, the associative key and the value you are looking for as parameters, searches the array and returns the index it found, so you are not forced to write this loop over and over ...

No, there is no way to access that element without looping over your array. You might abstract that search into a helper function, however.
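Both suggestions can be sketched with built-in functions. The sample data below is made up apart from the question_id 40 entry; array_column and array_search still iterate internally, so this hides the loop rather than removing it.

```php
<?php
// Sample data shaped like the question's array (values other than id 40 are invented).
$arr_questions = [
    ['question_id' => 39, 'question' => 'Another question', 'answer' => 0],
    ['question_id' => 40, 'question' => 'La tercera pregunta del mundo', 'answer' => 1],
];

// Option 1: re-index once by question_id, then access entries directly.
$byId = array_column($arr_questions, null, 'question_id');
$byId[40]['answer'] = 2;

// Option 2: keep the original structure and look up the position.
$pos = array_search(40, array_column($arr_questions, 'question_id'));
if ($pos !== false) {
    $arr_questions[$pos]['answer'] = 2;
}
```

Option 1 pays the indexing cost once and suits repeated lookups; option 2 rescans the array on every call.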

Algorithm that creates "teams" based on a numeric skill value

I am building an application that helps manage frisbee "hat tournaments". The idea is people sign up for this "hat tournament". When they sign up, they provide us with a numeric value between 1 and 6 which represents their skill level.
Currently, we are taking this huge list of people who signed up, and manually trying to create teams out of this based on the skill levels of each player. I figured, I could automate this by creating an algorithm that splits up the teams as evenly as possible.
The only data feeding into this is the array of "players" and a desired "number of teams". Generally speaking we are looking at 120 players and 8 teams.
My current thought process is to basically have a running "score" for each team. This running score is the total of all assigned players skill levels. I loop through each skill level. I go through rounds of picks once inside skill level loop. The order of the picks is recalculated each round based on the running score of a team.
This actually works fairly well, but it's not perfect. For example, I had a range of 5 pts between teams in my sample data array. I could very easily swap players around manually and make the discrepancy no more than 1 pt between teams. The problem is getting that done programmatically.
Here is my code thus far: http://pastebin.com/LAi42Brq
Snippet of what data looks like:
[2] => Array
(
[user__id] => 181
[user__first_name] => Stephen
[user__skill_level] => 5
)
[3] => Array
(
[user__id] => 182
[user__first_name] => Phil
[user__skill_level] => 6
)
Can anyone think of a better, easier, more efficient way to do this? Many thanks in advance!!

I think you're making things too complicated. If you have T teams, sort your players according to their skill level. Choose the top T players to be captains of the teams. Then, starting with captain 1, each captain in turn chooses the player (s)he wants on the team. This will probably be the person at the top of the list of unchosen players.
This algorithm has worked in playgrounds (and, I dare say on the frisbee fields of California) for aeons and will produce results as 'fair' as any more complicated pseudo-statistical method.

A simple solution could be to first generate a team selection order; then each team "selects" one of the highest-skilled players available. For the next round the order is reversed: the last team to select a player gets first pick and the first team gets the last pick. For each round you reverse the picking order.
First round picking order could be:
A - B - C - D - E
second round would then be:
E - D - C - B - A
and then
A - B - C - D - E etc.
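The reversing pick order described above can be sketched like this. snakeDraft is a hypothetical name; the user__skill_level key comes from the question's data snippet, and every team is assumed to simply take the best remaining player.

```php
<?php
// Hypothetical helper: distribute players over teams in a "snake" order
// (A-B-C, then C-B-A, ...), always handing out the best remaining player.
function snakeDraft(array $players, int $teamCount): array
{
    // Highest skill first.
    usort($players, fn($a, $b) => $b['user__skill_level'] <=> $a['user__skill_level']);

    $teams = array_fill(0, $teamCount, []);
    foreach ($players as $i => $player) {
        $round = intdiv($i, $teamCount);
        $pos = $i % $teamCount;
        // Even rounds pick left to right, odd rounds right to left.
        $team = ($round % 2 === 0) ? $pos : $teamCount - 1 - $pos;
        $teams[$team][] = $player;
    }
    return $teams;
}

// Made-up players with skill levels 6,5,5,3,3,1.
$players = [];
foreach ([6, 5, 5, 3, 3, 1] as $id => $skill) {
    $players[] = ['user__id' => $id, 'user__skill_level' => $skill];
}
$teams = snakeDraft($players, 2);
$totals = array_map(fn($t) => array_sum(array_column($t, 'user__skill_level')), $teams);
print_r($totals);
```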

It looks like this problem really is NP-hard, being a variant of the Multiprocessor scheduling problem.
The suggestion by "h00ligan" is equivalent to the LPT algorithm.
Another heuristic strategy would be a variation of this algorithm:
First round: pick the best, second round: pair the teams with the worst (add from the end), etc.
With the example "6,5,5,3,3,1" and 2 teams this would give the teams "6,1,5" (=12) and "5,3,3" (=11). The strategy of "h00ligan" would give the teams "6,3,3" (=12) and "5,5,1" (=11).

This problem is unfortunately NP-hard. Have a look at bin packing, which is probably a good place to start and includes an algorithm you can hopefully tweak. This may or may not be useful depending on how "fair" two teams with the same score need to be.

Ordering Combinations for Maximum Effectiveness

So recently I was given a problem, which I have been mulling over and am still unable to solve; I was wondering if anyone here could point me in the right direction by providing me with the pseudo code (or at least a rough outline of the pseudo code) for this problem. PS I'll be building in PHP if that makes a difference...
Specs
There are ~50 people (for this example I'll just call them a,b,c... ) and the user is going to group them into groups of three (people in the groups may overlap), and in the end there will be 50-100 groups (ie {a,b,c}; {d,e,f}; {a,d,f}; {b,c,l}...). *
So far it is easy, it is a matter of building an html form and processing it into a multidimensional array
There are ~15 time slots during the day (eg 9:00AM, 9:20AM, 9:40AM...). Each of these groups needs to meet once during the day. And during one time slot the person cannot be double booked (ie 'a' cannot be in 2 different groups at 9:40AM).
It gets tricky here, but not impossible, my best guess at how to do this would be to brute force it (pick out sets of groups that have no overlap (eg {a,b,c}; {l,f,g}; {q,n,d}...) and then just put each into a time slot
Finally, the schedule which I output needs to be 'optimized', by that I mean that 'a' should have minimal time between meetings (so if his first meeting is at 9:20AM, his second meeting shouldn't be at 2:00PM).
Here's where I am lost, my only guess would be to build many, many schedules and then rank them based on the average waiting time a person has from one meeting to the next
However My 'solutions' (I hesitate to call them that) require too much brute force and would take too long to create. Are there simpler, more elegant solutions?

This is the table layout, modified for your scenario:
+----User_Details------+ //You may or may not need this
| UID | Particulars... |
+----------------------+
+----User_Timeslots---------+ //Time slots per column
| UID | SlotNumber(bool)... | //true/false if the user is available
+---------------------------+ //SlotNumber is replaced by s1, s2, etc
+----User_Arrangements--------+ //Time slots per column
| UID | SlotNumber(string)... | //Group session string
+-----------------------------+
Note: the string in the Arrangement table is in JSON format:
'[12,15,32]' //From SMALLEST to BIGGEST!
So what happens in the arrangement table is that a script [or an EXCEL column formula] goes through each slot per session and randomly creates a possible session, checking all previous sessions for conflicts.
/**
* Randomise a session, in which data is not yet set
**/
function randomizeSession( sessionID ) {
for( var id = [lowest UID], id < [highest UID], id++ ) {
if( id exists ) {
randomizeSingleSession( id, sessionID );
} //else skips
}
}
/**
* Randomizes a single user in a session, without conflicts in previous sessions
**/
function randomizeSingleSession( id, sessionID ) {
convert sessionID to its column name =)
get the column names of all the previous sessions
if( there is data, false, or JSON ) {
Does nothing (Already has data)
}
if( ID is available in time slot table (for this session) ) {
Get all IDs who are available, and contain no data this session
Get all the UID previous session
while( first time || not yet resolved ) {
Randomly chose 2
if( there was conflict in UID previous session ) {
try again (while) : not yet resolved
} else {
resolved
}
}
Registers all 3 users as a group in the session
} else {
Set session result to false (no attendance)
}
}
You will notice that the main part of the group assignment is done via randomization. However, as the number of sessions increases, there is more and more data to check for conflicts, resulting in much slower performance. With a ridiculously large number of sessions, the result approaches an almost perfect permutation/combination formulation.
EDIT:
This setup will also help ensure, that as long as the user is available, they will be in a group. Though you may have pockets of users, having no user group (a small number). These are usually remedied by recalculating (for small session numbers). Or just manually group them together, even if it is a repeat. (having a few here and there does not hurt). Or alternatively in your case, along with the remainders, join several groups of 3's to form groups of 4. =)
And if this can work for EXCEL with about 100+ ppl, and about 10 sessions. I do not see how this would not work in SQL + PHP. Just that the calculations may actually take some considerable time both ways.

Okay, for those who just join in on this post, please read through all the comments to the question before considering the contents of this answer, as this will very likely fly over your head.
Here is some pseudo code in PHP'ish style:
/* Array with profs (this is one dimensional here for the show, but I assume
it will be multi-dimensional, filled with availability and what not;
For the sake of this example, let me say that the multi-dimensional array
contains the following keys: [id]{[avail_from],[avail_to],[last_ses],[name]}*/
$profs = array_fill(0, $prof_num, "assoc_ids");
// Array with time slots, let's say UNIX stamps of begin time
$times = array_fill(0, $slot_num, "time");
// First, we need to loop through all the time slots
foreach ($times as $slot) {
// See when session ends
$slot_end = $slot + $session_time;
// Now, run through the profs to see who's available
$avail_profs = array(); // Empty
foreach ($profs as $prof_id => $data) {
if (($data['avail_from'] <= $slot) && ($data['avail_to'] >= $slot_end)) {
$avail_profs[$prof_id] = $data['last_ses'];
}
}
/* Reverse sort the array so that the highest numbers (profs who have been
waiting the longest) will be up top */
arsort($avail_profs);
$profs_session = array_slice($avail_profs, 0, 3, true); // true keeps the prof ids as keys
$profs_session_names = array(); // Empty
// Reset the last_ses counters on those profs
foreach ($profs_session as $prof_id => $last_ses) {
$profs[$prof_id]['last_ses'] = 0;
$profs_session_names[] = $profs[$prof_id]['name'];
}
// Now, loop through all profs to add one to their waiting time
foreach ($profs as $prof_id => $data) {
$profs[$prof_id]['last_ses']++;
}
print(sprintf('The %s session will be held by: %s, %s, and %s<br />', $slot,
$profs_session_names[0], $profs_session_names[1],
$profs_session_names[2]));
unset($profs_session, $profs_session_names, $avail_profs);
}
That should print something like:
The 9:40am session will be held by: C. Hicks, A. Hole, and B.E.N. Dover

I see an object model consisting of:
Panelists: a fixed repository of your panelists (Tom, Dick, Harry, etc)
Panel: consists of X Panelists (X=3 in your case)
Timeslots: a fixed repository of your time slots. Assuming fixed duration and only occurring on a single day, all you need to track is the start time.
Meeting: consists of a Panel and Timeslot
Schedule: consists of many Meetings
Now, as you have observed, the optimization is the key. To me the question is: "Optimized with respect to what criteria?" Optimal for Tom might mean that the Panels on which he is a member are laid out without big gaps. But Harry's Panels may be all over the board. So, perhaps for a given Schedule, we compute something like totalMemberDeadTime (= the sum of all members' dead-time gaps in the Schedule). An optimal Schedule is one that is minimal with respect to this sum.
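A minimal sketch of such a score, assuming each Meeting is an array with a 'panel' (list of panelist names) and a 'slot' (integer index of the time slot), and counting dead time as the number of empty slots between a panelist's consecutive meetings. The array shape and the unit are assumptions for illustration, not something fixed by the question.

```php
<?php
// Hypothetical scoring function: sums, over all panelists, the empty slots
// between each pair of consecutive meetings they attend.
function totalMemberDeadTime(array $meetings): int
{
    $slotsByMember = [];
    foreach ($meetings as $meeting) {
        foreach ($meeting['panel'] as $member) {
            $slotsByMember[$member][] = $meeting['slot'];
        }
    }

    $dead = 0;
    foreach ($slotsByMember as $slots) {
        sort($slots);
        for ($i = 1; $i < count($slots); $i++) {
            $dead += $slots[$i] - $slots[$i - 1] - 1; // slots spent waiting
        }
    }
    return $dead;
}

// Made-up schedule: 'a' meets in slots 0 and 3, 'b' and 'c' in slots 0 and 1.
$schedule = [
    ['panel' => ['a', 'b', 'c'], 'slot' => 0],
    ['panel' => ['a', 'd', 'f'], 'slot' => 3],
    ['panel' => ['b', 'c', 'l'], 'slot' => 1],
];
echo totalMemberDeadTime($schedule), "\n";
```

Two candidate Schedules can then be compared by this single number, with the lower total preferred.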
If we are interested in computing a technically optimal schedule among the universe of all schedules, I don't really see an alternative to brute force .
Perhaps that universe of Schedules does not need to be as big as might first appear. It sounds like the panels are constituted first and then the issue is to assign them to Meetings, which then constitute a schedule. So, we have removed the variability in the panel composition; the full scope of variability is in the Meetings and the Schedule. Still, it sure seems like a lot of variability there.
But perhaps optimal with respect to all possible Schedules is more than we really need.
Might we define a Schedule as acceptable if no panelist has total dead time more than X? Or failing that, if no more than X panelists have dead time more than X (can't satisfy everyone, but keep the screwing down to a minimum)? Then the user could assign meetings for panels containing the more "important" panelists first, and less-important guys simply have to take what they get. Then all we have to do is find a single acceptable Schedule.
Might it be sufficient for your purposes to compare any two Schedules? Combined with an interface (I'm seeing a drag-and-drop interface, but that's clearly beyond the point) that allows the user to constitute a schedule, clone it into a second schedule, and tweak the second one, looking to reduce aggregate dead time until we can find one that is acceptable.
Anyway, not a complete answer. Just thinking out loud. Hope it helps.

php (fuzzy) search matching

If anyone has ever submitted a story to digg, it checks whether or not the story has already been submitted, I assume by a fuzzy search.
I would like to implement something similar and want to know if they are using a php class that is open source?
Soundex isn't doing it; sentences/strings can be up to 250 chars in length.

Unfortunately, doing this in PHP is prohibitively expensive (high CPU and memory utilization.) However, you can certainly apply the algorithm to small data sets.
To specifically expand on how you can create a server meltdown: a couple of built-in PHP functions will determine "distance" between strings: levenshtein and similar_text.
Dummy data (pretend they're news headlines):
$titles = <<< EOF
Apple
Apples
Orange
Oranges
Banana
EOF;
$titles = explode("\n", $titles );
At this point, $titles should just be an array of strings. Now, create a matrix and compare each headline against EVERY other headline for similarity. In other words, for 5 headlines, you will get a 5 x 5 matrix (25 entries.) That's where the CPU and memory sink goes in.
That's why this method (via PHP) can't be applied to thousands of entries. But if you wanted to:
$matches = array();
foreach( $titles as $title ) {
$matches[$title] = array();
foreach( $titles as $compare_to ) {
$matches[$title][$compare_to] = levenshtein( $compare_to, $title );
}
asort( $matches[$title], SORT_NUMERIC );
}
At this point what you basically have is a matrix with "text distances." In concept (not in real data) it looks sort of like this table below. Note how there is a set of 0 values that go diagonally - that means that in the matching loop, two identical words are -- well, identical.
         Apple  Apples  Orange  Oranges  Banana
Apple    0      1       5       6        6
Apples   1      0       6       5        6
Orange   5      6       0       1        5
Oranges  6      5       1       0        5
Banana   6      6       5       5        0
The actual $matches array looks sort of like this (truncated):
Array
(
[Apple] => Array
(
[Apple] => 0
[Apples] => 1
[Orange] => 5
[Banana] => 6
[Oranges] => 6
)
[Apples] => Array
(
...
Anyhow, it's up to you to (by experimentation) determine what a good numerical distance cutoff might mostly match - and then apply it. Otherwise, read up on sphinx-search and use it - since it does have PHP libraries.
Orange you glad you asked about this?
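Applying a cutoff could look like the sketch below. findSimilarTitles is a hypothetical name, and the cutoff of 1 is an arbitrary value for the toy data; a real cutoff would need the experimentation mentioned above. Only the new submission is compared against the existing titles, which is one pass over the list instead of the full matrix.

```php
<?php
// Hypothetical helper: returns the existing titles whose Levenshtein distance
// to the candidate is at most $maxDistance, closest first.
function findSimilarTitles(string $candidate, array $titles, int $maxDistance): array
{
    $hits = [];
    foreach ($titles as $title) {
        // Compare case-insensitively.
        $distance = levenshtein(strtolower($candidate), strtolower($title));
        if ($distance <= $maxDistance) {
            $hits[$title] = $distance;
        }
    }
    asort($hits, SORT_NUMERIC);
    return $hits;
}

$titles = ['Apple', 'Apples', 'Orange', 'Oranges', 'Banana'];
$hits = findSimilarTitles('apple', $titles, 1);
print_r($hits);
```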

I would suggest taking the user-submitted URLs and storing them in multiple parts: domain name, path and query string. Use the PHP parse_url() function to derive the parts of the submitted URL.
Index at least the domain name and path. Then, when a new user submits a URL, you search your database for a record matching the domain and path. Since the columns are indexed, you will first be filtering out all records that are not in the same domain, and then searching through the remaining records. Depending on your dataset, this should be faster than simply indexing the entire URL. Make sure your WHERE clause is set up in the right order.
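For reference, splitting a URL with parse_url() looks like this. The URL is a made-up example, and how the parts are stored (column names, queries) is left open here.

```php
<?php
// Example URL (invented for illustration).
$url = 'http://example.com/story/123?ref=home';
$parts = parse_url($url);

$domain = $parts['host'] ?? '';   // "example.com"
$path   = $parts['path'] ?? '';   // "/story/123"
$query  = $parts['query'] ?? '';  // "ref=home"

echo $domain, ' | ', $path, ' | ', $query, "\n";
```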
If that does not meet your needs I would suggest trying Sphinx. Sphinx is an open source SQL full-text search engine that is far faster than MySQL's built-in full-text search. It supports stemming and some other nice features.
http://sphinxsearch.com/
You could also take the title or text content of the users submission, run it through a function to generate keywords, and search the database for existing records with those or similar keywords.

You could (depending on the size of your dataset) use mySQL's FULLTEXT search, and look for item(s) that have a high score and are within a certain timeframe, and suggest this/these to the user.
More about score here: MySQL Fulltext Search Score Explained
