I'm having trouble figuring out an algorithm...
I've got a long series of GPS data that records Time, Speed, Distance at 1-second intervals. Assume the distance is in meters and the speed in m/s. There may be upwards of 2 hours of data, or 7200 points. The "time" field here is mainly just for reference.
So the first 5 seconds would be values something like this, with [1-5] being seconds:
$data = array(
1 => array('distance'=>0, 'time'=>'2014-01-09 17:50:00', 'speed'=>0.0),
2 => array('distance'=>2, 'time'=>'2014-01-09 17:50:01', 'speed'=>2.0),
3 => array('distance'=>6, 'time'=>'2014-01-09 17:50:02', 'speed'=>4.0),
4 => array('distance'=>10, 'time'=>'2014-01-09 17:50:03', 'speed'=>4.0),
5 => array('distance'=>12, 'time'=>'2014-01-09 17:50:04', 'speed'=>2.0)
);
I'd like to convert this to data that is listed at 1-meter intervals instead, like this, with [1-6] being meters:
$data = array(
1 => array('seconds'=>1.5, 'time'=>'2014-01-09 17:50:01.500', 'speed'=>.666),
2 => array('seconds'=>2, 'time'=>'2014-01-09 17:50:02', 'speed'=>2.0),
3 => array('seconds'=>2.25, 'time'=>'2014-01-09 17:50:02.250', 'speed'=>4.0),
4 => array('seconds'=>2.5, 'time'=>'2014-01-09 17:50:02.500', 'speed'=>4.0),
5 => array('seconds'=>2.75, 'time'=>'2014-01-09 17:50:02.750', 'speed'=>4.0),
6 => array('seconds'=>3, 'time'=>'2014-01-09 17:50:03', 'speed'=>4.0)
);
This can be done without the time field, of course. I'm having trouble with the calculation, since it definitely isn't 1-to-1: if we start with 7200 seconds of data, we could end up with more or fewer points depending on the distance covered (more or less than 7200 meters).
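A quick sanity check of the sample output above (my own illustration, not part of the original question): within each 1-second slice the values follow from plain linear interpolation.
// Metre 1 lies between t=1 s (0 m) and t=2 s (2 m); interpolating linearly:
$t = 1 + (1 - 0) / (2 - 0); // 1.5 s, matching 'seconds' => 1.5 above
$v = 1 / $t;                // 0.666... m/s average, matching 'speed' => .666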
EDIT (01/10/2014)
Below are the actual implementations of the two methods. I'm actually having trouble deciding which I like better, the iterative or the recursive method. I may go with the iterative one.
METHOD 1, iterative (@Ezequiel Muns, with very minor modifications by me):
function timeToDistance($data) {
if(sizeof($data) == 0){ return; }
$startTime = $data[0]['time'];
$prev = null;
$result = array();
foreach ($data as $secs => $row) {
$row['seconds'] = $secs; // to simplify passing in secs
if ($prev == null) {
// make sure we have a pair
$prev = array( 'distance'=>0 );
}
foreach (distanceRowsBetween($startTime,$prev, $row) as $dist => $distRow) {
$result[$dist] = $distRow;
}
$prev = $row;
}
return $result;
}
function distanceRowsBetween($startTime,$prevRow, $nextRow) {
// Return the by-distance rows that are between $prevRow (exclusive)
// and $nextRow (inclusive)
$rows = array();
$currDist = $prevRow['distance'];
while (true) {
// try to move to the next whole unit of distance
$dDist = ceil($currDist) - $currDist;
$dDist = $dDist == 0.0? 1.0 : $dDist; // dDist is 1 unit if currDist is whole
$currDist += $dDist;
if ($currDist > $nextRow['distance'])
break;
$currSpeed = $nextRow['speed'];
$currSecs = strtotime($nextRow['time']) - strtotime($startTime);
$currTime = $nextRow['time'];
$rows[$currDist] = array(
'speed' => $currSpeed,
'seconds' => $currSecs,
'time' => $currTime,
);
}
return $rows;
}
METHOD 2, recursive (@Nathaniel Ford's pseudocode, my actual code):
function data2dist($time_data = array()){
$dist_data = array();
if(sizeof($time_data) == 0){ return $dist_data; }
$start_point = array_shift($time_data);
$start_time = $start_point['time'];
data2dist_sub($start_time, $time_data,$dist_data,$start_point);
return $dist_data;
}
function data2dist_sub($start_time,&$time_data, &$dist_data, $start_point = array()){
if(sizeof($time_data) == 0 && !isset($start_point)){
return;
}
if(sizeof($dist_data) == 0){
$prev_dist = 0;
} else {
$prev_dist = $dist_data[sizeof($dist_data)-1]['distance'];
}
// since distances are accumulating, get curr distance by subtracting last one
$point_dist = $start_point['distance'] - $prev_dist;
if($point_dist == 1){
// exactly 1: perfect, add and continue
$dist_data[] = $start_point;
$start_point = array_shift($time_data);
} else if($point_dist > 1){
// larger than 1: effectively remove 1 from current point and send it forward
$partial_point = $start_point;
$partial_point['distance'] = 1 + $prev_dist;
$dist_data[] = $partial_point;
} else if($point_dist < 1){
// less than 1, carry forward to the next item and continue (minor: this partial speed is absorbed into next item)
$start_point = array_shift($time_data);
if(!isset($start_point)){ return; }
$start_point['distance'] += $point_dist;
}
data2dist_sub($start_time,$time_data,$dist_data,$start_point);
}
You can simplify this by noting that for every contiguous pair of by-time rows you need to calculate 0 or more by-distance rows, and these depend solely on those two by-time rows.
So start with a function to do this simpler calculation. This is a skeleton that leaves out the calculation of the transformed 'seconds', 'speed' and 'time' values for simplicity.
function distanceRowsBetween($prevRow, $nextRow) {
// Return the by-distance rows that are between $prevRow (exclusive)
// and $nextRow (inclusive)
$rows = array();
$currDist = $prevRow['distance'];
while (true) {
// try to move to the next whole unit of distance
$dDist = ceil($currDist) - $currDist;
$dDist = $dDist == 0.0? 1.0 : $dDist; // dDist is 1 unit if currDist is whole
$currDist += $dDist;
if ($currDist > $nextRow['distance'])
break;
// calculate $currSecs at distance $currDist
// calculate $currSpeed
// calculate $currTime
$rows[$currDist] = array(
'speed' => $currSpeed,
'seconds' => $currSecs,
'time' => $currTime,
);
}
return $rows;
}
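For the three omitted calculations, one possibility is plain linear interpolation inside the 1-second slice between the two rows. This is a sketch under that assumption, not the original answer's code (note $span is never 0 here: if both rows had equal distances, the loop would have broken before reaching this point):
$span = $nextRow['distance'] - $prevRow['distance']; // metres covered this second
$frac = ($currDist - $prevRow['distance']) / $span;  // 0..1 position in the slice
$currSecs  = $nextRow['seconds'] - 1 + $frac;        // by-time rows are 1 second apart
$currSpeed = $nextRow['speed'];                      // or blend prev/next speeds instead
$ms = (int)round($frac * 1000);
$currTime  = date('Y-m-d H:i:s', strtotime($nextRow['time']) - 1 + (int)($ms / 1000))
. sprintf('.%03d', $ms % 1000);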
Now that you have this, all that remains is iterating over each contiguous pair in the input and accumulating the resulting by-distance rows:
function timeToDistance($data) {
$prev = null;
$result = array();
foreach ($data as $secs => $row) {
$row['seconds'] = $secs; // to simplify passing in secs
if ($prev == null) {
$prev = $row; // make sure we have a pair
continue;
}
foreach (distanceRowsBetween($prev, $row) as $dist => $distRow) {
$result[$dist] = $distRow;
}
$prev = $row;
}
return $result;
}
Note in this function I am populating and passing in the current 'seconds' value in the row, to reduce the number of parameters passed into the previous function.
This is a bit of a mind-bender, and there are a couple of edge cases that make it difficult. However, your basic algorithm should boil down to:
Take in an array of by-Time data points
Create a new array of by-Distance data points
Create a first by-Distance data point with 'zero' speed/distance
Pass this to your subfunction
Subfunction (Takes by-Time array, by-Distance array and 'start point')
Take the first by-Time data point and 'add' it to the by-Distance data point, call this 'temp'
Convert to seconds/speed
If distance covered by temp is exactly 1, add this new array to the by-Distance array
If it is more than one, subtract the portion that would equal one
back-calculate distance/speed/time, add to by-Distance array
Recurse into the subfunction, using the remainder as your new start point
If it is less than one
Recurse into the subfunction, using the modified start point as new start point
Note that the sub-function will need to use mutable copies of the arrays: the by-Time array should slowly shrink and the by-Distance array grow. Also, you will want to trampoline the function (rather than use straight recursion), because with 7200 data points you will probably have more than that in stack frames, and you run into a potential memory problem; a sketch follows below.
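A minimal trampoline sketch (my illustration of that last point, with a hypothetical driver; not code from the answer):
// Each step returns either a closure performing the next step, or null when
// done, so the stack depth stays constant no matter how many steps there are.
function trampoline($step) {
while ($step !== null) {
$step = $step();
}
}
// Hypothetical usage: the tail call in data2dist_sub would become
// return function () use (...) { return data2dist_sub(...); };
// and the top level would run trampoline() on the first step.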
Example Data
For this question, let's assume the following items:
Items: Apple, Banana, Carrot, Steak, Onion
Values: 2, 2, 4, 5, 3
Weights: 3, 1, 3, 4, 2
Max Weight: 7
Objective:
The MCKP is a type of Knapsack Problem with the additional constraint that "[T]he items are subdivided into k classes... and exactly one item must be taken from each class"
I have written the code to solve the 0/1 KS problem with dynamic programming using recursive calls and memoization. My question is whether it is possible to add this constraint to my current solution? Say my classes are Fruit, Vegetables, Meat (from the example), I would need to include 1 of each type. The classes could just as well be type 1, 2, 3.
Also, I think this can be solved with linear programming and a solver, but if possible, I'd like to understand the answer here.
Current Code:
<?php
$value = array(2, 2, 4, 5, 3);
$weight = array(3, 1, 3, 4, 2);
$maxWeight = 7;
$maxItems = 5;
$seen = array(array()); //2D array for memoization
$picked = array();
//Put a dummy zero at the front to make things easier later.
array_unshift($value, 0);
array_unshift($weight, 0);
//Call our Knapsack Solver and return the sum value of optimal set
$KSResult = KSTest($maxItems, $maxWeight, $value, $weight);
$maxValue = $KSResult; //copy the result so we can recreate the table
//Recreate the decision table from our memo array to determine what items were picked
//Here I am building the table backwards because I know the optimal value will be at the end
for($i=$maxItems; $i > 0; $i--) {
for($j=$maxWeight; $j > 0; $j--) {
if($seen[$i][$j] != $seen[$i-1][$j]
&& $maxValue == $seen[$i][$j]) {
array_push($picked, $i);
$maxValue -= $value[$i];
break;
}
}
}
//Print out picked items and max value
print("<pre>".print_r($picked,true)."</pre>");
echo $KSResult;
// Recursive formula to solve the KS Problem
// $n = number of items to check
// $c = total capacity of bag
function KSTest($n, $c, &$value, &$weight) {
global $seen;
if(isset($seen[$n][$c])) {
//We've seen this subproblem before
return $seen[$n][$c];
}
if($n === 0 || $c === 0){
//No more items to check or no more capacity
$result = 0;
}
elseif($weight[$n] > $c) {
//This item is too heavy, check next item without this one
$result = KSTest($n-1, $c, $value, $weight);
}
else {
//Take the higher result of keeping or not keeping the item
$tempVal1 = KSTest($n-1, $c, $value, $weight);
$tempVal2 = $value[$n] + KSTest($n-1, $c-$weight[$n], $value, $weight);
if($tempVal2 >= $tempVal1) {
$result = $tempVal2;
//some conditions could go here? otherwise use max()
}
else {
$result = $tempVal1;
}
}
//memo the results and return
$seen[$n][$c] = $result;
return $result;
}
?>
What I've Tried:
My first thought was to add a class (k) array, sort the items via class (k), and when we choose to select an item that is the same as the next item, check if it's better to keep the current item or the item without the next item. Seemed promising, but fell apart after a couple of items being checked. Something like this:
$tempVal3 = $value[$n] + KSTest($n-2, $c-$weight[$n]);
max( $tempVal2, $tempVal3);
Another thought is that at the function call, I could call a loop for each class type and solve the KS with only 1 item at a time of that type + the rest of the values. This will definitely be making some assumptions though, because the results of set 1 might still be assuming multiples of set 2, for example.
This looks to be the equation (if you are good at reading all those symbols) and a C++ implementation, but I can't really see where the class constraint is happening?
The C++ implementation looks OK.
Your values and weights, which are 1-dimensional arrays in your current PHP implementation, will become 2-dimensional.
So, for example,
values[i][j] will be the value of the j-th item in class i, and similarly for weights[i][j]. You take only one item from each class i and move forward while maximizing the objective.
The C++ implementation also does an optimization in the memo: it only keeps 2 arrays sized to the max_weight condition, the current and previous states, because you only need these 2 states at a time to compute the present state.
Answers to your doubts:
1)
My first thought was to add a class (k) array, sort the items via
class (k), and when we choose to select an item that is the same as
the next item, check if it's better to keep the current item or the
item without the next item. Seemed promising, but fell apart after a
couple of items being checked. Something like this: $tempVal3 =
$value[$n] + KSTest($n-2, $c-$weight[$n]); max( $tempVal2, $tempVal3);
This won't work because there could be some item in class k+1 for which you take an optimal value, and to respect the constraint you need to take a suboptimal value for class k. So sorting and picking the best won't work once the constraint is hit; if the constraint is not hit, you can always pick the best value with the best weight. For example, with capacity 5, greedily taking a heavy class-k item can leave no room for any class-(k+1) item at all, while a lighter, lower-value class-k item would have allowed a feasible (and better) complete selection.
2)
Another thought is that at the function call, I could call a loop for
each class type and solve the KS with only 1 item at a time of that
type + the rest of the values.
Yes, you are on the right track here. Assume you have already solved the problem for the first k classes; now you try to extend that solution using the items of class k+1 while respecting the weight constraint.
3)
... but I can't really see where the class constraint is happening?
for (int i = 1; i < weight.size(); ++i) {
fill(current.begin(), current.end(), -1);
for (int j = 0; j < weight[i].size(); ++j) {
for (int k = weight[i][j]; k <= max_weight; ++k) {
if (last[k - weight[i][j]] > 0)
current[k] = max(current[k],
last[k - weight[i][j]] + value[i][j]);
}
}
swap(current, last);
}
In the above C++ snippet, the first loop iterates over classes, the second loop iterates over the items of a class, and the third loop extends the current state current using the previous state last and only one item j of class i at a time. Since you only ever extend the previous state with a single item of the current class, the one-item-per-class constraint is respected.
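For readers who want the same thing in PHP, here is a rough transliteration (my sketch, not the original answer's code; it assumes $values[$i][$j] / $weights[$i][$j] as described above, uses -1 as the "unreachable" sentinel, and seeds the state with class 0 before the main loop):
$last = array_fill(0, $maxWeight + 1, -1);
foreach ($weights[0] as $j => $w) { // seed with one item of class 0
if ($w <= $maxWeight) {
$last[$w] = max($last[$w], $values[0][$j]);
}
}
for ($i = 1; $i < count($weights); $i++) { // iterate over classes
$current = array_fill(0, $maxWeight + 1, -1);
foreach ($weights[$i] as $j => $w) { // items of class $i
for ($k = $w; $k <= $maxWeight; $k++) { // extend the previous state
// the C++ tests last[...] > 0; with -1 as the sentinel, >= 0 is the equivalent
if ($last[$k - $w] >= 0) {
$current[$k] = max($current[$k], $last[$k - $w] + $values[$i][$j]);
}
}
}
$last = $current; // roll: current becomes previous
}
$best = max($last); // -1 means no feasible one-per-class selection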
Time complexity:
O(total_items × max_weight), which is equivalent to O(classes × max_number_of_items_in_a_class × max_weight)
I am not a PHP programmer, but I will try to write pseudocode with a good explanation.
In the original problem, the meaning of each cell i, j was: "the value of filling the knapsack with items 1 to i until it reaches capacity j". The solution in the link you provided defines each cell as "the value of filling the knapsack with items from buckets 1 to i until it reaches capacity j". Notice that in this variation there is no such thing as not taking an element from a class.
So on each step (each call to KSTest with $n, $c), we need to find the element of the n-th class whose weight is less than c and whose value + KSTest(n - 1, c - w) is the greatest.
So I think you should only change the else if and else statements to something like:
else {
$result = 0;
for($i=0; $i < $number_of_items_in_nth_class; $i++) {
if ($weight[$n][$i] > $c) {
//This item is too heavy, check next item
continue;
}
$result = max($result, $value[$n][$i] + KSTest($n-1, $c - $weight[$n][$i], $value, $weight));
}
}
Now two disclaimers:
I do not code in PHP, so this code will not run as-is :)
This is not the implementation given in the link you provided; TBH I didn't understand why the time complexity of their algorithm is so small (and what C is), but this implementation should work since it follows the definition of the recursive formula given.
The time complexity of this should be O(max_weight * number_of_classes * size_of_largest_class).
This is my PHP solution. I've tried to comment the code in a way that makes it easy to follow.
Update:
I updated the code because the old script was giving unreliable results. This version is cleaner and has been thoroughly tested. Key takeaways: I use two memo arrays, one at the group level to speed up execution and one at the item level to reconstruct the results. I found that any attempt to track which items are chosen as you go is unreliable and much less efficient. Also, isset() instead of if($var) is essential for checking the memo array, because a previous result might have been 0 ;)
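A tiny illustration of that last point (my example): a memoised result of 0 is a legitimate cache hit, but it is falsy, so a plain truthiness test would recompute it.
$memo = array(3 => 0);     // subproblem 3 already solved; the answer happens to be 0
var_dump((bool)$memo[3]);  // bool(false) -- a plain if($memo[3]) would miss the hit
var_dump(isset($memo[3])); // bool(true)  -- isset() correctly reports the hit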
<?php
/**
* Multiple Choice Knapsack Solver
*
* @author Michael Cruz
* @version 1.0 - 03/27/2020
**/
class KS_Solve {
public $KS_Items;
public $maxValue;
public $maxWeight;
public $maxItems;
public $finalValue;
public $finalWeight;
public $finalItems;
public $finalGroups;
public $memo1 = array(); //Group memo
public $memo2 = array(); //Item memo for results rebuild
public function __construct() {
//some default variables as an example.
//KS_Items = array(Value, Weight, Group, Item #)
$this->KS_Items = array(
array(2, 3, 1, 1),
array(2, 1, 1, 2),
array(4, 3, 2, 3),
array(5, 4, 2, 4),
array(3, 2, 3, 5)
);
$this->maxWeight = 7;
$this->maxItems = 5;
$this->KS_Wrapper();
}
public function KS_Wrapper() {
$start_time = microtime(true);
//Put a dummy zero at the front to make things easier later.
array_unshift($this->KS_Items, array(0, 0, 0, 0));
//Call our Knapsack Solver
$this->maxValue = $this->KS_Solver($this->maxItems, $this->maxWeight);
//Recreate the decision table from our memo array to determine what items were picked
//ksort($this->memo2); //for debug
for($i=$this->maxItems; $i > 0; $i--) {
//ksort($this->memo2[$i]); //for debug
for($j=$this->maxWeight; $j > 0; $j--) {
if($this->maxValue == 0) {
break 2;
}
if($this->memo2[$i][$j] == $this->maxValue
&& $j == $this->maxWeight) {
$this->maxValue -= $this->KS_Items[$i][0];
$this->maxWeight -= $this->KS_Items[$i][1];
$this->finalValue += $this->KS_Items[$i][0];
$this->finalWeight += $this->KS_Items[$i][1];
$this->finalItems .= " " . $this->KS_Items[$i][3];
$this->finalGroups .= " " . $this->KS_Items[$i][2];
break;
}
}
}
//Print out the picked items and value. (IMPLEMENT Proper View or Return!)
echo "<pre>";
echo "RESULTS: <br>";
echo "Value: " . $this->finalValue . "<br>";
echo "Weight: " . $this->finalWeight . "<br>";
echo "Item's in KS:" . $this->finalItems . "<br>";
echo "Selected Groups:" . $this->finalGroups . "<br><br>";
$end_time = microtime(true);
$execution_time = ($end_time - $start_time);
echo "Results took " . sprintf('%f', $execution_time) . " seconds to execute<br>";
}
/**
* Recursive function to solve the MCKS Problem
* $n = number of items to check
* $c = total capacity of KS
**/
public function KS_Solver($n, $c) {
$group = $this->KS_Items[$n][2];
$groupItems = array();
$count = 0;
$result = 0;
$bestVal = 0;
if(isset($this->memo1[$group][$c])) {
$result = $this->memo1[$group][$c];
}
else {
//Sort out the items for this group
foreach($this->KS_Items as $item) {
if($item[2] == $group) {
$groupItems[] = $item;
$count++;
}
}
//$k adjusts the index for item memoization
$k = $count - 1;
//Find the results of each item + items of other groups
foreach($groupItems as $item) {
if($item[1] > $c) {
//too heavy
$result = 0;
}
elseif($item[1] >= $c && $group != 1) {
//too heavy for next group
$result = 0;
}
elseif($group == 1) {
//Just take the highest value
$result = $item[0];
}
else {
//check this item with following groups
$result = $item[0] + $this->KS_Solver($n - $count, $c - $item[1]);
}
if($result == $item[0] && $group != 1) {
//No solution with the following sets, so don't use this item.
$result = 0;
}
if($result > $bestVal) {
//Best item so far
$bestVal = $result;
}
//memo the results
$this->memo2[$n-$k][$c] = $result;
$k--;
}
$result = $bestVal;
}
//memo and return
$this->memo1[$group][$c] = $result;
return $result;
}
}
new KS_Solve();
?>
I am working on an algorithm for sorting teams based on the highest score. Teams are to be generated from a list of players. The conditions for creating a team are:
It should have 6 players.
The collective salary for 6 players must be less than or equal to 50K.
Teams are to be generated based on highest collective projection.
What I did to get this result is generate all possible teams, then run checks on them to exclude those teams that have more than 50K salary, and then sort the remainder based on projection. But generating all the possibilities takes a lot of time and sometimes consumes all the memory. For a list of 160 players it takes around 90 seconds. Here is the code:
$base_array = array();
$query1 = mysqli_query($conn, "SELECT * FROM temp_players ORDER BY projection DESC");
while($row1 = mysqli_fetch_array($query1))
{
$player = array();
$mma_id = $row1['mma_player_id'];
$salary = $row1['salary'];
$projection = $row1['projection'];
$wclass = $row1['wclass'];
array_push($player, $mma_id);
array_push($player, $salary);
array_push($player, $projection);
array_push($player, $wclass);
array_push($base_array, $player);
}
$result_base_array = array();
$totalsalary = 0;
for($i=0; $i<count($base_array)-5; $i++)
{
for($j=$i+1; $j<count($base_array)-4; $j++)
{
for($k=$j+1; $k<count($base_array)-3; $k++)
{
for($l=$k+1; $l<count($base_array)-2; $l++)
{
for($m=$l+1; $m<count($base_array)-1; $m++)
{
for($n=$m+1; $n<count($base_array)-0; $n++)
{
$totalsalary = $base_array[$i][1]+$base_array[$j][1]+$base_array[$k][1]+$base_array[$l][1]+$base_array[$m][1]+$base_array[$n][1];
$totalprojection = $base_array[$i][2]+$base_array[$j][2]+$base_array[$k][2]+$base_array[$l][2]+$base_array[$m][2]+$base_array[$n][2];
if($totalsalary <= 50000)
{
array_push($result_base_array,
array($base_array[$i], $base_array[$j], $base_array[$k], $base_array[$l], $base_array[$m], $base_array[$n],
$totalprojection, $totalsalary)
);
}
}
}
}
}
}
}
usort($result_base_array, "cmp");
And the cmp function
function cmp($a, $b) {
if ($a[6] == $b[6]) {
return 0;
}
return ($a[6] < $b[6]) ? 1 : -1;
}
Is there any way to reduce the time it takes to do this task, or any other workaround for getting the desired number of teams?
Because the number of teams can be very big (for example, 100 players can generate about 1.2*10^9 teams), you can't hold them all in memory. Try saving the resulting array to a file in parts (truncate the array after each save), then use an external file sort.
It will be slow, but at least it will not fall over because of memory.
If you need the top n teams (like the 10 teams with the highest projection), then you should convert the code that generates result_base_array into a Generator, so it yields the next team instead of pushing it into an array, as sketched below. Then iterate over this generator: on each iteration, insert the new item into a sorted result array and cut off the redundant elements.
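Here is a sketch of that Generator idea (my code, not tested against the question's data; it assumes PHP 7+ for yield from and the question's row layout, with salary at index 1):
// Recursively yields every 6-player combination whose total salary stays
// under the cap, one team at a time, so the full set is never materialised.
function teamGen(array $players, $size = 6, $start = 0, array $picked = array(), $salary = 0) {
if ($size === 0) {
yield $picked; // a complete, affordable team
return;
}
$n = count($players);
for ($i = $start; $i <= $n - $size; $i++) {
$s = $salary + $players[$i][1]; // index 1 = salary
if ($s > 50000) {
continue; // prune over-budget branches early
}
yield from teamGen($players, $size - 1, $i + 1,
array_merge($picked, array($players[$i])), $s);
}
}
// Usage sketch: foreach (teamGen($base_array) as $team) { ...keep the 10 best... }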
Depending on whether the salaries are often the cause of exclusion, you could perform tests on this in the other loops as well. If after 4 player selections their summed salaries are already above 50K, there is no use to select the remaining 2 players. This could save you some iterations.
This can be further improved by remembering the lowest 6 salaries in the pack, and then check if after selecting 4 members you would still stay under 50K if you would add the 2 lowest existing salaries. If this is not possible, then again it is of no use to try to add the two remaining players. Of course, this can be done at each stage of the selection (after selecting 1 player, 2 players, ...)
Another related improvement comes into play when you sort your data by ascending salary. If after selecting the 4th player, the above logic brings you to conclude you cannot stay under 50K by adding 2 more players, then there is no use to replace the 4th player with the next one in the data series either: that player would have a greater salary, so it would also yield to a total above 50K. So that means you can backtrack immediately and work on the 3rd player selection.
As others pointed out, the number of potential solutions is enormous. For 160 players and a team size of 6 members, the number of combinations is:
160 . 159 . 158 . 157 . 156 . 155
--------------------------------- = 21 193 254 160
6 . 5 . 4 . 3 . 2
21 billion entries is a stretch for memory, and probably not useful to you either: will you really be interested in the team at the 4 432 456 911th place?
You'll probably be interested in something like the top-10 of those teams (in terms of projection). This you can achieve by keeping a list of 10 best teams, and then, when you get a new team with an acceptable salary, you add it to that list, keeping it sorted (via a binary search), and ejecting the entry with the lowest projection from that top-10.
Here is the code you could use:
$base_array = array();
// Order by salary, ascending, and only select what you need
$query1 = mysqli_query($conn, "
SELECT mma_player_id, salary, projection, wclass
FROM temp_players
ORDER BY salary ASC");
// Specify with option argument that you only need the associative keys:
while($row1 = mysqli_fetch_array($query1, MYSQLI_ASSOC)) {
// Keep the named keys, it makes interpreting the data easier:
$base_array[] = $row1;
}
function combinations($base_array, $salary_limit, $team_size) {
// Get lowest salaries, so we know the least value that still needs to
// be added when composing a team. This will allow an early exit when
// the cumulative salary is already too great to stay under the limit.
$remaining_salary = [];
foreach ($base_array as $i => $row) {
if ($i == $team_size) break;
array_unshift($remaining_salary, $salary_limit);
$salary_limit -= $row['salary'];
}
$result = [];
$stack = [0];
$sum_salary = [0];
$sum_projection = [0];
$index = 0;
while (true) {
$player = $base_array[$stack[$index]];
if ($sum_salary[$index] + $player['salary'] <= $remaining_salary[$index]) {
$result[$index] = $player;
if ($index == $team_size - 1) {
// Use yield so we don't need to build an enormous result array:
yield [
"total_salary" => $sum_salary[$index] + $player['salary'],
"total_projection" => $sum_projection[$index] + $player['projection'],
"members" => $result
];
} else {
$index++;
$sum_salary[$index] = $sum_salary[$index-1] + $player['salary'];
$sum_projection[$index] = $sum_projection[$index-1] + $player['projection'];
$stack[$index] = $stack[$index-1];
}
} else {
$index--;
}
while (true) {
if ($index < 0) {
return; // all done
}
$stack[$index]++;
if ($stack[$index] <= count($base_array) - $team_size + $index) break;
$index--;
}
}
}
// Helper function to quickly find where to insert a value in an ordered list
function binary_search($needle, $haystack) {
$high = count($haystack)-1;
$low = 0;
while ($high >= $low) {
$mid = (int)floor(($high + $low) / 2);
$val = $haystack[$mid];
if ($needle < $val) {
$high = $mid - 1;
} elseif ($needle > $val) {
$low = $mid + 1;
} else {
return $mid;
}
}
return $low;
}
$top_team_count = 10; // set this to the desired size of the output
$top_teams = []; // this will be the output
$top_projections = [];
foreach(combinations($base_array, 50000, 6) as $team) {
$j = binary_search($team['total_projection'], $top_projections);
array_splice($top_teams, $j, 0, [$team]);
array_splice($top_projections, $j, 0, [$team['total_projection']]);
if (count($top_teams) > $top_team_count) {
// forget about lowest projection, to keep memory usage low
array_shift($top_teams);
array_shift($top_projections);
}
}
$top_teams = array_reverse($top_teams); // Put highest projection first
print_r($top_teams);
Have a look at the demo on eval.in, which just generates 12 players with random salary and projection data.
Final remarks
Even with the above-mentioned optimisations, doing this for 160 players might still require a lot of iterations. The more often the salaries amount to more than 50K, the better the performance will be. If that never happens, the algorithm cannot escape having to look at each of the 21 billion combinations. If you knew beforehand that the 50K limit would not play any role, you would of course order the data by descending projection, like you originally did.
Another optimisation could be to feed back into the combination function the 10th-highest team projection you have so far. The function could then eliminate combinations that would lead to a lower total projection: take the 6 highest player projection values and use them to determine how high a partial team projection can still grow by adding the missing players. It might turn out that beating the current top-10 becomes impossible after having selected a few players, and then you can skip some iterations, much like is done on the basis of salaries; a sketch follows.
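A rough sketch of that bound (my illustration with hypothetical variable names, not part of the answer's code):
// $allProjections: every player's projection in one flat array (>= 6 entries).
rsort($allProjections); // highest projections first
$bestRemaining = array(0.0);
for ($i = 0; $i < 6; $i++) { // best attainable sum for 1..6 open slots
$bestRemaining[$i + 1] = $bestRemaining[$i] + $allProjections[$i];
}
// Inside the search, with $slotsLeft players still to pick:
// if ($sum_projection + $bestRemaining[$slotsLeft] <= $tenthBestProjection)
//     backtrack: this branch can never reach the current top-10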
I have an array of intervals sorted by the lower bound ($a[$i]['l'] <= $a[$i+1]['l'] for every $i), where key l is the lower bound and key h is the upper bound, and I'd like to remove all rows with intervals that are enclosed by larger intervals.
$a[0] = array('l' => 123, 'h'=>241);
$a[1] = array('l' => 250, 'h'=>360);
$a[2] = array('l' => 280, 'h'=>285);
$a[3] = array('l' => 310, 'h'=>310);
$a[4] = array('l' => 390, 'h'=>400);
So the result I'd like to get is
$a[0] = array('l' => 123, 'h'=>241);
$a[1] = array('l' => 250, 'h'=>360);
$a[2] = array('l' => 390, 'h'=>400);
This is what I attempted
function dup($a){
$c = count($a)-1;
for ($i = $c; $i > 0; $i --){
while ($a[$i]['h'] <= $a[$i-1]['h']){
unset($a[$i]);
}
}
$a = array_values($a);
}
The first answer that comes to mind, given with different variations by other contributors, is: for each interval, loop over all the intervals looking for a larger, enclosing interval. It's simple to understand and to write, and it works for sure.
This is basically O(n²): for n intervals we'll do n*n loop turns. There are some tricks to optimize it:
break'ing when we find an enclosing interval in the nested loop, as in user3137702's answer, because it's useless to continue once we've found at least one enclosing interval
avoiding comparing an interval against itself in the nested loop, because we know an interval can't be strictly enclosed in itself (not significant)
avoiding looping over already excluded intervals in the nested loop (can have a significant impact)
looping over intervals (outer loop) in ascending width (h - l) order, because smaller intervals have more chance of being enclosed in others, and the earlier we eliminate intervals, the more effective the following loop turns are (can be significant too, in my opinion)
searching for enclosing intervals (nested loop) in descending width order, because larger intervals have more chance of enclosing other intervals (I think this can have a significant impact too)
probably many other things that do not come to mind at the moment
Let me say now that:
optimization does not matter much if we only have a few intervals to compute from time to time, and the currently accepted answer (user3137702's) does the trick
to develop a suitable algorithm, it is necessary to study the characteristics of the data we have to deal with: in the case before us, how are the intervals distributed? Are there many enclosed intervals? This can help choose, from the above list, the most useful tricks
For educational purposes, I wondered if we could develop a different algorithm that avoids O(n²), whose running time necessarily deteriorates very quickly as the number of intervals to compute grows.
"Virtual rule" algorithm
I imagined this algorithm I called the "virtual rule".
place starting and ending points of the intervals on a virtual rule
run through the points along the rule in ascending order
during the run, register open or not intervals
when an interval starts and ends while another was opened before and is still open, we can say it is enclosed
so when an interval ends, check whether it was opened after one of the other currently open intervals and whether it closes strictly before that interval does. If yes, it is enclosed!
I do not pretend this is the best solution, but we can assume it is faster than the basic method because, despite the many tests done during the loop, it is close to O(n) (not counting the initial sort).
Code example
I wrote comments to make it as clear as possible.
<?php
function removeEnclosedIntervals_VirtualRule($a, $debug = false)
{
$rule = array();
// place one point on a virtual rule for each low or up bound, refering to the interval's index in $a
// virtual rule has 2 levels because there can be more than one point for a value
foreach($a as $i => $interval)
{
$rule[$interval['l']][] = array('l', $i);
$rule[$interval['h']][] = array('h', $i);
}
// used in the foreach loop
$open = array();
$enclosed = array();
// loop through the points on the ordered virtual rule
ksort($rule);
foreach($rule as $points)
{
// Will register open intervals
// When an interval starts and ends while another was opened before and is still open, it is enclosed
// starts
foreach($points as $point)
if($point[0] == 'l')
$open[$point[1]] = $point[1]; // register it as open
// ends
foreach($points as $point)
{
if($point[0] == 'h')
{
unset($open[$point[1]]); // UNregister it as open
// was it opened after a still open interval ?
foreach($open as $i)
{
if($a[$i]['l'] < $a[$point[1]]['l'])
{
// it is enclosed.
// is it *strictly* enclosed ?
if($a[$i]['h'] > $a[$point[1]]['h'])
{
// so this interval is strictly enclosed
$enclosed[$point[1]] = $point[1];
if($debug)
echo debugPhrase(
$point[1], // $iEnclosed
$a[$point[1]]['l'], // $lEnclosed
$a[$point[1]]['h'], // $hEnclosed
$i, // $iLarger
$a[$i]['l'], // $lLarger
$a[$i]['h'] // $hLarger
);
break;
}
}
}
}
}
}
// obviously
foreach($enclosed as $i)
unset($a[$i]);
return $a;
}
?>
Benchmarking against basic method
It runs tests on randomly generated intervals.
The basic method works without a doubt, so comparing results from the two methods allows me to claim that the "VirtualRule" method works: as far as I tested, it returned the same results.
// * include removeEnclosedIntervals_VirtualRule function *
// arbitrary range for intervals start and end
// Note that it could be interesting to do benchmarking with different MIN and MAX values !
define('MIN', 0);
define('MAX', 500);
// Benchmarking params
define('TEST_MAX_NUMBER', 100000);
define('TEST_BY_STEPS_OF', 100);
// from http://php.net/manual/en/function.microtime.php
// used later for benchmarking purpose
function microtime_float()
{
list($usec, $sec) = explode(" ", microtime());
return ((float)$usec + (float)$sec);
}
function debugPhrase($iEnclosed, $lEnclosed, $hEnclosed, $iLarger, $lLarger, $hLarger)
{
return '('.$iEnclosed.')['.$lEnclosed.' ; '.$hEnclosed.'] is strictly enclosed at least in ('.$iLarger.')['.$lLarger.' ; '.$hLarger.']'.PHP_EOL;
}
// 2 foreach loops solution (based on user3137702's *damn good* work ;) and currently accepted answer)
function removeEnclosedIntervals_Basic($a, $debug = false)
{
foreach ($a as $i => $valA)
{
$found = false;
foreach ($a as $j => $valB)
{
if (($valA['l'] > $valB['l']) && ($valA['h'] < $valB['h']))
{
$found = true;
if($debug)
echo debugPhrase(
$i, // $iEnclosed
$a[$i]['l'], // $lEnclosed
$a[$i]['h'], // $hEnclosed
$j, // $iLarger
$a[$j]['l'], // $lLarger
$a[$j]['h'] // $hLarger
);
break;
}
}
if (!$found)
{
$out[$i] = $valA;
}
}
return $out;
}
// runs a benchmark with $number intervals
function runTest($number)
{
// Generating a random set of intervals with values between MIN and MAX
$randomSet = array();
for($i=0; $i<$number; $i++)
// avoiding self-closing intervals
$randomSet[] = array(
'l' => ($l = mt_rand(MIN, MAX-2)),
'h' => mt_rand($l+1, MAX)
);
/* running the two methods and comparing results and execution time */
// Basic method
$start = microtime_float();
$Basic_result = removeEnclosedIntervals_Basic($randomSet);
$end = microtime_float();
$Basic_time = $end - $start;
// VirtualRule
$start = microtime_float();
$VirtualRule_result = removeEnclosedIntervals_VirtualRule($randomSet);
$end = microtime_float();
$VirtualRule_time = $end - $start;
// Basic method works for sure.
// If results are the same, comparing execution time. If not, sh*t happened !
if(md5(var_export($Basic_result, true)) == md5(var_export($VirtualRule_result, true)))
echo $number.';'.$Basic_time.';'.$VirtualRule_time.PHP_EOL;
else
{
echo '/;/;/;Work harder, results are not the same ! Cant say anything !'.PHP_EOL;
exit;
}
}
// CSV header
echo 'Number of intervals;Basic method exec time (s);VirtualRule method exec time (s)'.PHP_EOL;
for($n=TEST_BY_STEPS_OF; $n<TEST_MAX_NUMBER; $n+=TEST_BY_STEPS_OF)
{
runTest($n);
flush();
}
Results (for me)
As I thought, clearly different performances are obtained.
I ran the tests on a Core i7 computer with PHP 5 and on an (old) AMD Quad Core computer with PHP 7. There are clear differences in performance between the two versions on my systems, which in principle can be explained by the difference in PHP versions, because the computer running PHP 5 is the much more powerful one...
A simplistic approach, maybe not exactly what you want, but it should at least point you in the right direction. I can refine it if needed; I'm just a bit busy and didn't want to leave the question unanswered.
$out = [];
foreach ($a as $valA)
{
$found = false;
foreach ($a as $valB)
{
if (($valA['l'] > $valB['l']) && ($valA['h'] < $valB['h']))
{
$found = true;
break;
}
}
if (!$found)
{
$out[] = $valA;
}
}
This is entirely untested, but it should end up with only the unique (large) ranges in $out. Overlaps, as I mentioned in my comment, are unhandled.
The problem was a missing break in the while loop:
function dup($a){
$c = count($a)-1;
for ($i = $c; $i > 0; $i --){
while ($a[$i]['h'] <= $a[$i-1]['h']){
unset($a[$i]);
break; //here
}
}
$a = array_values($a);
}
Here is the code
function sort_by_low($item1,$item2){
if($item1['l'] == $item2['l'])
return 0;
return ($item1['l'] > $item2['l']) ? 1 : -1; // ascending by lower bound
}
usort($a,'sort_by_low');
$n = count($a);
for($i=0; $i<$n; $i++){
if(!isset($a[$i])){ continue; }
for($j=$i+1; $j<$n; $j++){
if(isset($a[$j]) && $a[$i]['l']<=$a[$j]['l'] && $a[$i]['h']>=$a[$j]['h']){
unset($a[$j]);
}
}
}
$a=array_values($a);
Here is the working code (tested):
$result = array();
usort($a, function ($item1, $item2) {
if ($item1['l'] == $item2['l']) return 0;
return $item1['l'] < $item2['l'] ? -1 : 1;
});
foreach ($a as $element) {
$exists = false;
foreach ($result as $r) {
if (($r['l'] < $element['l'] && $r['h'] > $element['h'])) {
$exists = true;
break;
}
}
if (!$exists) {
$result[] = $element;
}
}
$result will contain the desired result
I have a comma-delimited list of numbers which I am converting into an array, and what I want to know about the list is whether the numbers obey a natural ordering, i.e. have a difference of exactly 1 between each number and the previous one.
If the list obeys the natural ordering, I want to pick the first number of the list; if it does not, I pick the second.
This is my code.
<?php
error_reporting(0);
/**
Analyze numbers
Condition 1
if from number to the next has a difference of 1,then pick the first number in the list
Condition 2
if from one number the next,a difference of greater than 1 was found,then pick next from first
Condition 3
if list contains only one number,pick the number
*/
$number_picked = null;
$a = '5,7,8,9,10';
$b = '2,3,4,5,6,7,8,9,10';
$c = '10';
$data = explode(',', $b);
$count = count($data);
foreach($data as $index => $number)
{
/**
If array has exactly one value
*/
if($count == 1){
echo 'number is:'.$number;
exit();
}
$previous = $data[($count+$index-1) % $count];
$current = $number;
$next = $data[($index+1) % $count];
$diff = ($next - $previous);
if($diff == 1){
$number_picked = array_values($data)[0];
echo $number_picked.'correct';
}
elseif($diff > 1){
$number_picked = array_values($data)[1];
echo $number_picked.'wrong';
}
}
?>
The problem I am having is figuring out how to test the difference for all array elements.
No loops are needed; a little bit of maths will help you here. Once you have your numbers in an array:
$a = explode(',', '5,7,8,9,10');
pass them to this function:
function isSequential(array $sequence, $diff = 1)
{
return $sequence[count($sequence) - 1] === $sequence[0] + ($diff * (count($sequence) - 1));
}
The function will return true if the numbers in the array follow a natural sequence. You should even be able to adjust it for different spacings between numbers, e.g. 2, 4, 6, 8, etc., using the $diff parameter, although I haven't tested that thoroughly.
See it working.
Keep in mind that this will only work if your list of numbers is ordered from smallest to largest.
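A few example calls (my own, assuming the isSequential() function above):
var_dump(isSequential(array(7, 8, 9, 10)));   // bool(true)  -- natural ordering
var_dump(isSequential(array(5, 7, 8, 9)));    // bool(false) -- gap between 5 and 7
var_dump(isSequential(array(2, 4, 6, 8), 2)); // bool(true)  -- spacing of 2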
Try using a function to solve this... Like so:
<?php
error_reporting(0);
/**
Analyze numbers
Condition 1
if from number to the next has a difference of 1,then pick the first number in the list
Condition 2
if from one number the next,a difference of greater than 1 was found,then pick next from first
Condition 3
if list contains only one number,pick the number
*/
$number_picked = null;
$a = '5,7,8,9,10';
$b = '2,3,4,5,6,7,8,9,10';
$c = '10';
function test($string) {
$data = explode(',', $string);
$count = count($data);
if($count === 1){
return 'number is:'.$data[0];
}
foreach($data as $index => $number)
{
$previous = $data[($count+$index-1) % $count];
$current = $number;
$next = $data[($index+1) % $count];
$diff = ($next - $previous);
if($diff == 1){
$number_picked = array_values($data)[0];
return $number_picked.'correct';
}
elseif($diff > 1){
$number_picked = array_values($data)[1];
return $number_picked.'wrong';
}
}
}
echo test($a);
echo test($b);
echo test($c);
?>
You already know how to explode the list, so I'll skip that.
You already handle a single item, so I'll skip that as well.
What is left, is checking the rest of the array. Basically; there's two possible outcome values: either the first element or the second. So we'll save those two first:
$outcome1 = $list[0];
$outcome2 = $list[1];
Next, we'll loop over the items. We'll remember the last found item, and make sure that the difference between the new and the old is 1. If it is, we continue. If it isn't, we abort and immediately return $outcome2.
If we reach the end of the list without aborting, it's naturally ordered, so we return $outcome1.
$lastNumber = null;
foreach( $items as $number ) {
if($lastNumber === null || $number - $lastNumber == 1 ) {
// continue scanning
$lastNumber = $number;
}
else {
// not ordered
return $outcome2;
}
}
return $outcome1; // scanned everything; was ordered.
(Note: code not tested)
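Assembled into a runnable function (my sketch of the approach above):
function pickNumber(array $list) {
if (count($list) === 1) {
return $list[0]; // single item: just pick it
}
$lastNumber = null;
foreach ($list as $number) {
if ($lastNumber !== null && $number - $lastNumber != 1) {
return $list[1]; // gap found: not naturally ordered
}
$lastNumber = $number;
}
return $list[0]; // every step was exactly +1
}
echo pickNumber(explode(',', '5,7,8,9,10'));         // 7 (gap between 5 and 7)
echo pickNumber(explode(',', '2,3,4,5,6,7,8,9,10')); // 2 (naturally ordered)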
To avoid the headache of accessing the previous or next element, and deciding whether it is still inside the array or not, use the fact that in a natural ordering item i and the first item have a difference of i.
Also, the corner case you call condition 3 is easier to handle outside the loop than inside it. But easier still, the way we characterize a naturally ordered list also holds for a 1-item list:
$natural = true;
for($i=1; $i<$count && $natural; $i++)
$natural &= ($data[$i] == $data[0] + $i);
$number = $natural ? $data[0] : $data[1];
For $count == 1 the loop is never entered, and thus $natural stays true: you select the first element.
I'm using this PHP parse.com API library to retrieve rows from a table from Parse.
Due to the limit of a maximum of 1000 rows per request, I'm retrieving them in a loop like this:
$lastUpdated = null;
$parse = new parseQuery($tableName);
$parseAllResults = array();
$skip = 0;
do {
$index = count($parseAllResults) - 1;
if($skip === 10000) {
$lastUpdated = $parseAllResults[$index]['updatedAt'];
$skip = 0;
}
$parseResults = queryParseCrawlObjects($lastUpdated, $skip);
if (!empty($parseResults)) {
$skip += 1000;
} else {
$skip = 0;
}
$parseAllResults = array_merge($parseAllResults, $parseResults);
} while ($skip > 0);
function queryParseCrawlObjects($lastUpdated, $skip) {
global $parse;
date_default_timezone_set('UTC');
$parse->orderBy('updatedAt');
if ($lastUpdated != null) {
$parse->whereGreaterThan('updatedAt', $parse->dataType('date', $lastUpdated));
} else {
$parse->whereNotEqualTo('objectId', '');
}
$parse->setLimit(1000);
$parse->setSkip($skip);
$results = $parse->find();
return $results['results'];
}
I'm using the whereNotEqualTo('objectId', '') restriction as a workaround for a bug in the library; it effectively retrieves all the rows, and the skip parameter retrieves them in batches of 1000 rows.
Another limitation of Parse is that it doesn't allow a skip greater than 10,000. So I use the updatedAt field of the last row of the first batch of 10,000 as a restriction for the next rows.
And after the first 10.000 rows, it calls the whereGreaterThan method, that internally adds the gt (greater than) Parse parameter to the curl request.
The problem is that in the second loop step, it returns the same row as the last one from the previous step, and I get a duplicate objectId error when I try to insert the rows into a database.
So the array looks something like this:
// first step
$parseAllResults[0] = array('objectId' => 'ihJikHNkjH', ...);
$parseAllResults[1] = array('objectId' => 'sHJKHfddkO', ...);
...
$parseAllResults[9999] = array('objectId' => 'rukBfcaDFR', ...);
// second step
$parseAllResults[10000] = array('objectId' => 'rukBfcaDFR', ...);
$parseAllResults[10001] = array('objectId' => 'gusFGvQWVs', ...);
...
$parseAllResults[19999] = array('objectId' => 'asHppNVAaD', ...);
with the 9999th and the 10000th having all their other properties equal too, so I'm sure it's the same row from Parse retrieved twice.
I don't understand why it does that, since it has a whereGreaterThanOrEqualTo method too, using the gte (greater than or equal to) Parse parameter.
TL;DR
The greater than (gt) parameter behaves exactly like the greater than or equal to (gte) parameter.
This might fix your issue: since gt effectively behaves as gte, the first row of each new batch is the row you already fetched, so skip one extra row by restarting at 1 instead of 0. Change this part of your code:
if($skip === 10000) {
$lastUpdated = $parseAllResults[$index]['updatedAt'];
$skip = 0;
}
to:
if($skip === 10000) {
$lastUpdated = $parseAllResults[$index]['updatedAt'];
$skip = 1;
}