Removing enclosed intervals in an array of intervals in PHP

Removing enclosed intervals in an array of intervals in PHP - php

I have such an array of intervals sorted by the lower bound ($a[$i] <= $a[$i+1] for every $i), key l is lower bound and , key h is upper bound and I'd like to remove all rows with intervals that are enclosed by larger intervals.
$a[0] = array('l' => 123, 'h'=>241);
$a[1] = array('l' => 250, 'h'=>360);
$a[2] = array('l' => 280, 'h'=>285);
$a[3] = array('l' => 310, 'h'=>310);
$a[4] = array('l' => 390, 'h'=>400);
So the result I'd like to get is
$a[0] = array('l' => 123, 'h'=>241);
$a[1] = array('l' => 250, 'h'=>360);
$a[2] = array('l' => 390, 'h'=>400);
This is what I attempted
function dup($a){
$c = count($a)-1;
for ($i = $c; $i > 0; $i --){
while ($a[$i]['h'] <= $a[$i-1]['h']){
unset($a[$i]);
}
}
$a = array_values($a);
}

The first answer which comes in mind was given with different variations by other contributors : for each interval, loop on each interval looking for a larger and enclosing interval. It's simple to understand and to write, and it works for sure.
This is basically n2 order, which means for n intervals we'll do n*n loop turns. There can be some tricks to optimize it :
break'ing when we find an enclosing interval in the nested loop, as in user3137702's answer, because it's useless to continue if we find at least one enclosing interval
avoiding looping on the same interval in the nested loop because we know an interval cant be strictly enclosed in itself (not significant)
avoiding looping on already excluded intervals in the nested loop (can have a significant impact)
looping on intervals (global loop) in ascending width = (h - l) order, because smaller intervals have more chance to be enclosed in others and the earliest we eliminate intervals, the more the next loop turns are effective (can be significant too in my opinion)
searching for enclosing intervals (nested loop) in descending width order, because larger intervals have more chance to be enclosing other intervals (I think it can have a significant impact too)
probably many other things that do not come to mind at the moment
Let me say now that :
optimization does not matter much if we have only few intervals to compute from time to time, and currently accepted user3137702's answer does the trick
to develop the suitable algorithm, it is necessary anyway to study the characteristics of the data that we have to deal with : in the case before us, how is the distribution of intervals ? Are there many enclosed intervals ? This can help to choose from the above list, the most useful tricks.
For educational purposes, I wondered if we could develop a different algorithm avoiding a n*n order which running time is necessarily very quickly deteriorated gradually as you increase the number of intervals to compute.
"Virtual rule" algorithm
I imagined this algorithm I called the "virtual rule".
place starting and ending points of the intervals on a virtual rule
run through the points along the rule in ascending order
during the run, register open or not intervals
when an interval starts and ends while another was opened before and is still open, we can say it is enclosed
so when an interval ends, check if it was opened after one of the other currently open intervals and if it is strictly closed before this interval. If yes, it is enclosed !
I do not pretend this is the best solution. But we can assume this is faster than the basic method because, despite many tests to do during the loop, this is n order.
Code example
I wrote comments to make it as clear as possible.
<?php
function removeEnclosedIntervals_VirtualRule($a, $debug = false)
{
$rule = array();
// place one point on a virtual rule for each low or up bound, refering to the interval's index in $a
// virtual rule has 2 levels because there can be more than one point for a value
foreach($a as $i => $interval)
{
$rule[$interval['l']][] = array('l', $i);
$rule[$interval['h']][] = array('h', $i);
}
// used in the foreach loop
$open = array();
$enclosed = array();
// loop through the points on the ordered virtual rule
ksort($rule);
foreach($rule as $points)
{
// Will register open intervals
// When an interval starts and ends while another was opened before and is still open, it is enclosed
// starts
foreach($points as $point)
if($point[0] == 'l')
$open[$point[1]] = $point[1]; // register it as open
// ends
foreach($points as $point)
{
if($point[0] == 'h')
{
unset($open[$point[1]]); // UNregister it as open
// was it opened after a still open interval ?
foreach($open as $i)
{
if($a[$i]['l'] < $a[$point[1]]['l'])
{
// it is enclosed.
// is it *strictly* enclosed ?
if($a[$i]['h'] > $a[$point[1]]['h'])
{
// so this interval is strictly enclosed
$enclosed[$point[1]] = $point[1];
if($debug)
echo debugPhrase(
$point[1], // $iEnclosed
$a[$point[1]]['l'], // $lEnclosed
$a[$point[1]]['h'], // $hEnclosed
$i, // $iLarger
$a[$i]['l'], // $lLarger
$a[$i]['h'] // $hLarger
);
break;
}
}
}
}
}
}
// obviously
foreach($enclosed as $i)
unset($a[$i]);
return $a;
}
?>
Benchmarking against basic method
It runs tests on randomly generated intervals
basic method works without a doubt. Comparing results from the two methods allows me to predent the "VirtualRule" method works because as far as I tested, it returned the same results
// * include removeEnclosingIntervals_VirtualRule function *
// arbitrary range for intervals start and end
// Note that it could be interesting to do benchmarking with different MIN and MAX values !
define('MIN', 0);
define('MAX', 500);
// Benchmarking params
define('TEST_MAX_NUMBER', 100000);
define('TEST_BY_STEPS_OF', 100);
// from http://php.net/manual/en/function.microtime.php
// used later for benchmarking purpose
function microtime_float()
{
list($usec, $sec) = explode(" ", microtime());
return ((float)$usec + (float)$sec);
}
function debugPhrase($iEnclosed, $lEnclosed, $hEnclosed, $iLarger, $lLarger, $hLarger)
{
return '('.$iEnclosed.')['.$lEnclosed.' ; '.$hEnclosed.'] is strictly enclosed at least in ('.$iLarger.')['.$lLarger.' ; '.$hLarger.']'.PHP_EOL;
}
// 2 foreach loops solution (based on user3137702's *damn good* work ;) and currently accepted answer)
function removeEnclosedIntervals_Basic($a, $debug = false)
{
foreach ($a as $i => $valA)
{
$found = false;
foreach ($a as $j => $valB)
{
if (($valA['l'] > $valB['l']) && ($valA['h'] < $valB['h']))
{
$found = true;
if($debug)
echo debugPhrase(
$i, // $iEnclosed
$a[$i]['l'], // $lEnclosed
$a[$i]['h'], // $hEnclosed
$j, // $iLarger
$a[$j]['l'], // $lLarger
$a[$j]['h'] // $hLarger
);
break;
}
}
if (!$found)
{
$out[$i] = $valA;
}
}
return $out;
}
// runs a benchmark with $number intervals
function runTest($number)
{
// Generating a random set of intervals with values between MIN and MAX
$randomSet = array();
for($i=0; $i<$number; $i++)
// avoiding self-closing intervals
$randomSet[] = array(
'l' => ($l = mt_rand(MIN, MAX-2)),
'h' => mt_rand($l+1, MAX)
);
/* running the two methods and comparing results and execution time */
// Basic method
$start = microtime_float();
$Basic_result = removeEnclosedIntervals_Basic($randomSet);
$end = microtime_float();
$Basic_time = $end - $start;
// VirtualRule
$start = microtime_float();
$VirtualRule_result = removeEnclosedIntervals_VirtualRule($randomSet);
$end = microtime_float();
$VirtualRule_time = $end - $start;
// Basic method works for sure.
// If results are the same, comparing execution time. If not, sh*t happened !
if(md5(var_export($VirtualRule_result, true)) == md5(var_export($VirtualRule_result, true)))
echo $number.';'.$Basic_time.';'.$VirtualRule_time.PHP_EOL;
else
{
echo '/;/;/;Work harder, results are not the same ! Cant say anything !'.PHP_EOL;
stop;
}
}
// CSV header
echo 'Number of intervals;Basic method exec time (s);VirtualRule method exec time (s)'.PHP_EOL;
for($n=TEST_BY_STEPS_OF; $n<TEST_MAX_NUMBER; $n+=TEST_BY_STEPS_OF)
{
runTest($n);
flush();
}
Results (for me)
As I thought, clearly different performances are obtained.
I ran the tests on a Core i7 computer with PHP5 and on a (old) AMD Quad Core computer with PHP7. There are clear differences in performance between the two versions on my systems ! which in principle can be explained by the difference in PHP versions because the computer that is running PHP5 is much more powerful...

A simplistic approach, maybe not exactly what you want, but should at least point you in the right direction. I can refine it if needed, just a bit busy and didn't want to leave the question unanswered..
$out = [];
foreach ($a as $valA)
{
$found = false;
foreach ($a as $valB)
{
if (($valA['l'] > $valB['l']) && ($valA['h'] < $valB['h']))
{
$found = true;
break;
}
}
if (!$found)
{
$out[] = $valA;
}
}
This is entirely untested, but should end up with only the unique (large) ranges in $out. Overlaps as I mentioned in my comment are unhandled.

The problem was missing break in the while cycle
function dup($a){
$c = count($a)-1;
for ($i = $c; $i > 0; $i --){
while ($a[$i]['h'] <= $a[$i-1]['h']){
unset($a[$i]);
break; //here
}
}
$a = array_values($a);
}

Here is the code
function sort_by_low($item1,$item2){
if($item1['l'] == $item2['l'])
return 0;
return ($item1['l']>$item2['l'])? -1:1;
}
usort($a,'sort_by_low');
for($i=0; $i<count($a); $i++){
for($j=$i+1; $j<count($a);$j++){
if($a[$i][l]<=$a[$j]['l'] && $a[$i][h]>=$a[$j]['h']){
unset($a[$j]);
}
}
}
$a=array_values($a);

Here is the working code (Tested)
$result = array();
usort($a, function ($item1, $item2) {
if ($item1['l'] == $item2['l']) return 0;
return $item1['l'] < $item2['l'] ? -1 : 1;
});
foreach ($a as $element) {
$exists = false;
foreach ($result as $r) {
if (($r['l'] < $element['l'] && $r['h'] > $element['h'])) {
$exists = true;
break;
}
}
if (!$exists) {
$result[] = $element;
}
}
$result will contain the desired result

Related

Best method to intersect many arrays

I have a 2 dimensional array ($a), the first dimensional has 600 elements, the second dimension can have from 1k to 10k elements. Something like:
$a[0] = array(4 => true, 10 => true, 18 => true...);
$a[1] = array(6 => true, 10 => true, 73 => true...);
$a[599] = array(106 => true, 293 => true, 297 => true...);
I need to save in a file the $intersection of each of those 600 elements with each other 5x. Something like this:
for ($i=0;$i<=599;$i++) {
for ($j=$i+1;$j<=599;$j++) {
for ($k=$j+1;$k<=599;$k++) {
for ($p=$k+1;$p<=599;$p++) {
for ($m=$p+1;$m<=599;$m++) {
$intersection = array_intersect_key($a[$i], $a[$j], $a[$k], $a[$p], $a[$m]);
}
}
}
}
}
I am using array_intersect_key because its much faster than array_intersect because it's O(n) vs O(n^2).
That code works fine but runs pretty slow. So I made maaany other attempts which runs a little faster, something like:
for ($i=0;$i<=599;$i++) {
for ($j=$i+1;$j<=599;$j++) {
$b = array_intersect_keys($a[$i], $a[$j]);
for ($k=$j+1;$k<=599;$k++) {
$c = array_intersect_keys($b, $a[$k]);
for ($p=$k+1;$p<=599;$p++) {
$d = array_intersect_keys($c, $a[$p]);
for ($m=$p+1;$m<=599;$m++) {
$intersection = array_intersect_key($d, $a[$m]);
}
}
}
}
}
This runs 3x faster than the 1st code, but it's still pretty slow. I also tried creating long binary vectors like 00000011001110... (for each 600 elements) and doing bitwise operations && which works exactly like an intersect, but it's not much faster and uses a lot more memory.
So I am wondering, do you have any breakthrough suggestion? Any mathematical operation, matrix operation... that I can use? I am really tempted to recreate the C+ code of the array_intersect_keys because at every execution it does a lot of things that I dont need, like ordering the arrays before performing the intersection - I dont need that because I already ordered all my arrays before operations start, so that creates a big overhead. But I still dont want to go that route because it's not that simple to create a PHP extension that performs better than the native implementation.
Note: my array $a has all the elements already sorted and the elements in the first dimension that have less elements in the second dimension are positioned first in the array so array_intersect_keys works much faster because that function has to check only until reach the end of the first parameter (which is smaller in size).
EDIT
#user1597430 Again I appreciate all your attention and further explanations you gave me. You made me complete understand everything you said! And after that, I decided to take a shot on my own implementation of your algorithm and I think it is a little bit simpler and faster because I never need to use sort and I also make good use of array keys (instead of values). Feel free to use the algorithm below whenever you want if you ever need it, you already provided me with lots of your time!
After implementing the algorithm below I just realized one thing at the end! I can ONLY start checking the intersections/combinations after all the combinatory phase is done, and after that, I need to run another algorithm to parse the results. To me that's not possible because using 600x1000 takes already a long time and I may very well end up having to use 600x20000 (yeap, 600 x 20k) and I am pretty sure it's gonna take forever. And only after "forever" is that I will be able to check the combinations/intersections result (parse). Do you know a way in such a manner that I can already check the intersections after each computation or not having to wait ALL the combinations be generated? That's pretty sad that after taking almost 6 hours to understand and implement the algorithm below, I just came to the conclusion it wont fit my need. Dont worry, I will accept your answer if no better one come, since you already helped me a lot and I LEARNED A LOT WITH you!
<?php
$quantity_first_order_array = 10;
$quantity_second_order_array = 20;
$a = array();
$b = array();
for ($i=0;$i<$quantity_first_order_array;$i++) {
for ($j=0;$j<$quantity_second_order_array;$j++) {
//I just use `$i*2 + $j*3 + $j*$i` as a way to avoid using `rand`, dont try to make sense of this expression, it's just to avoid using `rand`.
$a['group-' . $i][$i*2 + $j*3 + $j*$i] = true;
$b[$i*2 + $j*3 + $j*$i] = true;
}
}
//echo "aaaa\n\n";var_dump($a);exit();
$c = array();
foreach ($b as $chave1 => $valor1) {
$c[$chave1] = array();
foreach ($a as $chave2 => $valor2) {
if (isset($a[$chave2][$chave1])) {
$c[$chave1][] = $chave2;
}
}
}
foreach ($c as $chave1 => $valor1) {
if (count($c[$chave1]) < 5) {
unset($c[$chave1]);
}
}
//echo "cccc\n\n";var_dump($c);exit();
$d = array();
foreach ($c as $chave1 => $valor1) {
$qntd_c_chave1 = count($c[$chave1]);
for ($a1=0;$a1<$qntd_c_chave1;$a1++) {
for ($a2=$a1 + 1;$a2<$qntd_c_chave1;$a2++) {
for ($a3=$a2 + 1;$a3<$qntd_c_chave1;$a3++) {
for ($a4=$a3 + 1;$a4<$qntd_c_chave1;$a4++) {
for ($a5=$a4 + 1;$a5<$qntd_c_chave1;$a5++) {
$d[$c[$chave1][$a1] . '|' . $c[$chave1][$a2] . '|' . $c[$chave1][$a3] . '|' . $c[$chave1][$a4] . '|' . $c[$chave1][$a5]][] = $chave1;
}
}
}
}
}
}
echo "dddd\n\n";var_dump($d);exit();
?>

Well, I usually don't do this for free, but since you mentioned genetic algos... Hello from Foldit community.
<?php
# a few magic constants
define('SEQUENCE_LENGTH', 600);
define('SEQUENCE_LENGTH_SUBARRAY', 1000);
define('RAND_MIN', 0);
define('RAND_MAX', 100000);
define('MIN_SHARED_VALUES', 5);
define('FILE_OUTPUT', sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'out.csv');
# generates a sample data for processing
$a = [];
for ($i = 0; $i < SEQUENCE_LENGTH; $i++)
{
for ($j = 0; $j < SEQUENCE_LENGTH_SUBARRAY; $j++)
{
$a[$i][] = mt_rand(RAND_MIN, RAND_MAX);
}
$a[$i] = array_unique($a[$i]);
sort($a[$i]);
}
# prepares a map where key indicates the number
# and value is a sequence that contains this number
$map = [];
foreach ($a as $i => $array)
{
foreach ($array as $value)
{
$map[$value][$i] = $i;
}
}
ksort($map);
# extra optimization: drop all values that don't have at least
# MIN_SHARED_VALUES (default = 5) sequences
foreach ($map as $index => $values)
{
if (count($values) < MIN_SHARED_VALUES)
{
unset($map[$index]);
}
else
{
# reindex array keys - we need that later for simple
# "for" loops that should start from "0"
$map[$index] = array_values($map[$index]);
}
}
$file = fopen(FILE_OUTPUT, 'w');
# permutation: generates all sequences that share the same $value
foreach ($map as $value => $array)
{
$array_length = count($array);
for ($i = 0; $i < $array_length; $i++)
{
for ($j = $i + 1; $j < $array_length; $j++)
{
for ($k = $j + 1; $k < $array_length; $k++)
{
for ($l = $k + 1; $l < $array_length; $l++)
{
for ($m = $l + 1; $m < $array_length; $m++)
{
# index indicates a group that share a $value together
$index = implode('-', [$array[$i], $array[$j], $array[$k], $array[$l], $array[$m]]);
# instead of CSV export you can use something like
# "$result[$index][] = $value" here, but you need
# tons of RAM to do that
fputcsv($file, [$index, $value]);
}
}
}
}
}
}
fclose($file);
Locally script makes ~6.3 million permutations, that takes ~15s in total, 70% of the time goes to the I/O HDD operations.
You are free to play with the output method (by using linux sort out.csv -o sorted-out.csv on the top of your CSV file or by creating a separate file pointer for each unique $index sequence to dump your results - it's up to you).

Optimal way of cycling through 1000's of values

I need to find the value of x where the variance of two results (which take x into account) is the closest to 0. The problem is, the only way to do this is to cycle through all possible values of x. The equation uses currency, so I have to check in increments of 1 cent.
This might make it easier:
$previous_var = null;
$high_amount = 50;
for ($i = 0.01; $i <= $high_amount; $i += 0.01) {
$val1 = find_out_1($i);
$val2 = find_out_2();
$var = variance($val1, $val2);
if ($previous_var == null) {
$previous_var = $var;
}
// If this variance is larger, it means the previous one was the closest to
// 0 as the variance has now started increasing
if ($var > $previous_var) {
$l_s -= 0.01;
break;
}
}
$optimal_monetary_value = $i;
I feel like there is a mathematical formula that would make the "cycling through every cent" more optimal? It works fine for small values, but if you start using 1000's as the $high_amount it takes quite a few seconds to calculate.

Based on the comment in your code, it sounds like you want something similar to bisection search, but a little bit different:
function calculate_variance($i) {
$val1 = find_out_1($i);
$val2 = find_out_2();
return variance($val1, $val2);
}
function search($lo, $loVar, $hi, $hiVar) {
// find the midpoint between the hi and lo values
$mid = round($lo + ($hi - $lo) / 2, 2);
if ($mid == $hi || $mid == $lo) {
// we have converged, so pick the better value and be done
return ($hiVar > $loVar) ? $lo : $hi;
}
$midVar = calculate_variance($mid);
if ($midVar >= $loVar) {
// the optimal point must be in the lower interval
return search($lo, $loVar, $mid, $midVar);
} elseif ($midVar >= $hiVar) {
// the optimal point must be in the higher interval
return search($mid, $midVar, $hi, $hiVar);
} else {
// we don't know where the optimal point is for sure, so check
// the lower interval first
$loBest = search($lo, $loVar, $mid, $midVar);
if ($loBest == $mid) {
// we can't be sure this is the best answer, so check the hi
// interval to be sure
return search($mid, $midVar, $hi, $hiVar);
} else {
// we know this is the best answer
return $loBest;
}
}
}
$optimal_monetary_value = search(0.01, calculate_variance(0.01), 50.0, calculate_variance(50.0));
This assumes that the variance is monotonically increasing when moving away from the optimal point. In other words, if the optimal value is O, then for all X < Y < O, calculate_variance(X) >= calculate_variance(Y) >= calculate_variance(O) (and the same with all > and < flipped). The comment in your code and the way have you have it written make it seem like this is true. If this isn't true, then you can't really do much better than what you have.
Be aware that this is not as good as bisection search. There are some pathological inputs that will make it take linear time instead of logarithmic time (e.g., if the variance is the same for all values). If you can improve the requirement that calculate_variance(X) >= calculate_variance(Y) >= calculate_variance(O) to be calculate_variance(X) > calculate_variance(Y) > calculate_variance(O), you can improve this to be logarithmic in all cases by checking to see how the variance for $mid compares the the variance for $mid + 0.01 and using that to decide which interval to check.
Also, you may want to be careful about doing math with currency. You probably either want to use integers (i.e., do all math in cents instead of dollars) or use exact precision numbers.

If you known nothing at all about the behavior of the objective function, there is no other way than trying all possible values.
On the opposite if you have a guarantee that the minimum is unique, the Golden section method will converge very quickly. This is a variant of the Fibonacci search, which is known to be optimal (require the minimum number of function evaluations).
Your function may have different properties which call for other algorithms.

Why not implementing binary search ?
<?php
$high_amount = 50;
// computed val2 is placed outside the loop
// no need te recalculate it each time
$val2 = find_out_2();
$previous_var = variance(find_out_1(0.01), $val2);
$start = 0;
$end = $high_amount * 100;
$closest_variance = NULL;
while ($start <= $end) {
$section = intval(($start + $end)/2);
$cursor = $section / 100;
$val1 = find_out_1($cursor);
$variance = variance($val1, $val2);
if ($variance <= $previous_var) {
$start = $section;
}
else {
$closest_variance = $cursor;
$end = $section;
}
}
if (!is_null($closest_variance)) {
$closest_variance -= 0.01;
}

Knapsack Equation with item groups

Can't call it a problem on Stack Overflow apparently, however I am currently trying to understand how to integrate constraints in the form of item groups within the Knapsack problem. My math skills are proving to be fairly limiting in this situation, however I am very motivated to both make this work as intended as well as figure out what each aspect does (in that order since things make more sense when they work).
With that said, I have found an absolutely beautiful implementation at Rosetta Code and cleaned up the variable names some to help myself better understand this from a very basic perspective.
Unfortunately I am having an incredibly difficult time figuring out how I can apply this logic to include item groups. My purpose is for building fantasy teams, supplying my own value & weight (points/salary) per player but without groups (positions in my case) I am unable to do so.
Would anyone be able to point me in the right direction for this? I'm reviewing code examples from other languages and additional descriptions of the problem as a whole, however I would like to get the groups implemented by whatever means possible.
<?php
function knapSolveFast2($itemWeight, $itemValue, $i, $availWeight, &$memoItems, &$pickedItems)
{
global $numcalls;
$numcalls++;
// Return memo if we have one
if (isset($memoItems[$i][$availWeight]))
{
return array( $memoItems[$i][$availWeight], $memoItems['picked'][$i][$availWeight] );
}
else
{
// At end of decision branch
if ($i == 0)
{
if ($itemWeight[$i] <= $availWeight)
{ // Will this item fit?
$memoItems[$i][$availWeight] = $itemValue[$i]; // Memo this item
$memoItems['picked'][$i][$availWeight] = array($i); // and the picked item
return array($itemValue[$i],array($i)); // Return the value of this item and add it to the picked list
}
else
{
// Won't fit
$memoItems[$i][$availWeight] = 0; // Memo zero
$memoItems['picked'][$i][$availWeight] = array(); // and a blank array entry...
return array(0,array()); // Return nothing
}
}
// Not at end of decision branch..
// Get the result of the next branch (without this one)
list ($without_i,$without_PI) = knapSolveFast2($itemWeight, $itemValue, $i-1, $availWeight,$memoItems,$pickedItems);
if ($itemWeight[$i] > $availWeight)
{ // Does it return too many?
$memoItems[$i][$availWeight] = $without_i; // Memo without including this one
$memoItems['picked'][$i][$availWeight] = array(); // and a blank array entry...
return array($without_i,array()); // and return it
}
else
{
// Get the result of the next branch (WITH this one picked, so available weight is reduced)
list ($with_i,$with_PI) = knapSolveFast2($itemWeight, $itemValue, ($i-1), ($availWeight - $itemWeight[$i]),$memoItems,$pickedItems);
$with_i += $itemValue[$i]; // ..and add the value of this one..
// Get the greater of WITH or WITHOUT
if ($with_i > $without_i)
{
$res = $with_i;
$picked = $with_PI;
array_push($picked,$i);
}
else
{
$res = $without_i;
$picked = $without_PI;
}
$memoItems[$i][$availWeight] = $res; // Store it in the memo
$memoItems['picked'][$i][$availWeight] = $picked; // and store the picked item
return array ($res,$picked); // and then return it
}
}
}
$items = array("map","compass","water","sandwich","glucose","tin","banana","apple","cheese","beer","suntan cream","camera","t-shirt","trousers","umbrella","waterproof trousers","waterproof overclothes","note-case","sunglasses","towel","socks","book");
$weight = array(9,13,153,50,15,68,27,39,23,52,11,32,24,48,73,42,43,22,7,18,4,30);
$value = array(150,35,200,160,60,45,60,40,30,10,70,30,15,10,40,70,75,80,20,12,50,10);
## Initialize
$numcalls = 0;
$memoItems = array();
$selectedItems = array();
## Solve
list ($m4, $selectedItems) = knapSolveFast2($weight, $value, sizeof($value)-1, 400, $memoItems, $selectedItems);
# Display Result
echo "<b>Items:</b><br>" . join(", ", $items) . "<br>";
echo "<b>Max Value Found:</b><br>$m4 (in $numcalls calls)<br>";
echo "<b>Array Indices:</b><br>". join(",", $selectedItems) . "<br>";
echo "<b>Chosen Items:</b><br>";
echo "<table border cellspacing=0>";
echo "<tr><td>Item</td><td>Value</td><td>Weight</td></tr>";
$totalValue = 0;
$totalWeight = 0;
foreach($selectedItems as $key)
{
$totalValue += $value[$key];
$totalWeight += $weight[$key];
echo "<tr><td>" . $items[$key] . "</td><td>" . $value[$key] . "</td><td>".$weight[$key] . "</td></tr>";
}
echo "<tr><td align=right><b>Totals</b></td><td>$totalValue</td><td>$totalWeight</td></tr>";
echo "</table><hr>";
?>

That knapsack program is traditional, but I think that it obscures what's going on. Let me show you how the DP can be derived more straightforwardly from a brute force solution.
In Python (sorry; this is my scripting language of choice), a brute force solution could look like this. First, there's a function for generating all subsets with breadth-first search (this is important).
def all_subsets(S): # brute force
subsets_so_far = [()]
for x in S:
new_subsets = [subset + (x,) for subset in subsets_so_far]
subsets_so_far.extend(new_subsets)
return subsets_so_far
Then there's a function that returns True if the solution is valid (within budget and with a proper position breakdown) – call it is_valid_solution – and a function that, given a solution, returns the total player value (total_player_value). Assuming that players is the list of available players, the optimal solution is this.
max(filter(is_valid_solution, all_subsets(players)), key=total_player_value)
Now, for a DP, we add a function cull to all_subsets.
def best_subsets(S): # DP
subsets_so_far = [()]
for x in S:
new_subsets = [subset + (x,) for subset in subsets_so_far]
subsets_so_far.extend(new_subsets)
subsets_so_far = cull(subsets_so_far) ### This is new.
return subsets_so_far
What cull does is to throw away the partial solutions that are clearly not going to be missed in our search for an optimal solution. If the partial solution is already over budget, or if it already has too many players at one position, then it can safely be discarded. Let is_valid_partial_solution be a function that tests these conditions (it probably looks a lot like is_valid_solution). So far we have this.
def cull(subsets): # INCOMPLETE!
return filter(is_valid_partial_solution, subsets)
The other important test is that some partial solutions are just better than others. If two partial solutions have the same position breakdown (e.g., two forwards and a center) and cost the same, then we only need to keep the more valuable one. Let cost_and_position_breakdown take a solution and produce a string that encodes the specified attributes.
def cull(subsets):
best_subset = {} # empty dictionary/map
for subset in filter(is_valid_partial_solution, subsets):
key = cost_and_position_breakdown(subset)
if (key not in best_subset or
total_value(subset) > total_value(best_subset[key])):
best_subset[key] = subset
return best_subset.values()
That's it. There's a lot of optimization to be done here (e.g., throw away partial solutions for which there's a cheaper and more valuable partial solution; modify the data structures so that we aren't always computing the value and position breakdown from scratch and to reduce the storage costs), but it can be tackled incrementally.

One potential small advantage with regard to composing recursive functions in PHP is that variables are passed by value (meaning a copy is made) rather than reference, which can save a step or two.
Perhaps you could better clarify what you are looking for by including a sample input and output. Here's an example that makes combinations from given groups - I'm not sure if that's your intention... I made the section accessing the partial result allow combinations with less value to be considered if their weight is lower - all of this can be changed to prune in the specific ways you would like.
function make_teams($players, $position_limits, $weights, $values, $max_weight){
$player_counts = array_map(function($x){
return count($x);
}, $players);
$positions = array_map(function($x){
$positions[] = [];
},$position_limits);
$num_positions = count($positions);
$combinations = [];
$hash = [];
$stack = [[$positions,0,0,0,0,0]];
while (!empty($stack)){
$params = array_pop($stack);
$positions = $params[0];
$i = $params[1];
$j = $params[2];
$len = $params[3];
$weight = $params[4];
$value = $params[5];
// too heavy
if ($weight > $max_weight){
continue;
// the variable, $positions, is accumulating so you can access the partial result
} else if ($j == 0 && $i > 0){
// remember weight and value after each position is chosen
if (!isset($hash[$i])){
$hash[$i] = [$weight,$value];
// end thread if current value is lower for similar weight
} else if ($weight >= $hash[$i][0] && $value < $hash[$i][1]){
continue;
// remember better weight and value
} else if ($weight <= $hash[$i][0] && $value > $hash[$i][1]){
$hash[$i] = [$weight,$value];
}
}
// all positions have been filled
if ($i == $num_positions){
$positions[] = $weight;
$positions[] = $value;
if (!empty($combinations)){
$last = &$combinations[count($combinations) - 1];
if ($weight < $last[$num_positions] && $value > $last[$num_positions + 1]){
$last = $positions;
} else {
$combinations[] = $positions;
}
} else {
$combinations[] = $positions;
}
// current position is filled
} else if (count($positions[$i]) == $position_limits[$i]){
$stack[] = [$positions,$i + 1,0,$len,$weight,$value];
// otherwise create two new threads: one with player $j added to
// position $i, the other thread skipping player $j
} else {
if ($j < $player_counts[$i] - 1){
$stack[] = [$positions,$i,$j + 1,$len,$weight,$value];
}
if ($j < $player_counts[$i]){
$positions[$i][] = $players[$i][$j];
$stack[] = [$positions,$i,$j + 1,$len + 1
,$weight + $weights[$i][$j],$value + $values[$i][$j]];
}
}
}
return $combinations;
}
Output:
$players = [[1,2],[3,4,5],[6,7]];
$position_limits = [1,2,1];
$weights = [[2000000,1000000],[10000000,1000500,12000000],[5000000,1234567]];
$values = [[33,5],[78,23,10],[11,101]];
$max_weight = 20000000;
echo json_encode(make_teams($players, $position_limits, $weights, $values, $max_weight));
/*
[[[1],[3,4],[7],14235067,235],[[2],[3,4],[7],13235067,207]]
*/

PHP - Nest a for loop a variable number of times

I'm a beginner with PHP (and programming in general).
To test what I've learned so far I wrote this code, which prints all the possibile combinations of a set number of dice with a certain number of faces. (you'll find the code at the end).
What I want to do is dynamically change the number of nested for loops according to the $dicenumber variable. Right now it can only process 3 dice, since the code is:
for ($d1=1; $d1 <= $d1value ; $d1++) {
for ($d2=1; $d2 <= $d2value ; $d2++) {
for ($d3=1; $d3 <= $d3value ; $d3++) {
array_push(${sum.($d1+$d2+$d3)}, "$d1"."$d2"."$d3");
}
}
}
But I want to change it so that, for example, if $dicenumber were 2, it would produce something like:
for ($d1=1; $d1 <= $d1value ; $d1++) {
for ($d2=1; $d2 <= $d2value ; $d2++) {
array_push(${sum.($d1+$d2)}, "$d1"."$d2");
}
}
I want the code to process for whatever number $dicenumber may be, without limits. Looking around, it seems like I have to add some kind of recursive code, but I don't know how to do that. Any tips? Also, any feedback on what I did wrong in general, would be extremely helpful! thanks!
<?php
//defines the number and type of dice
$dicenumber = 3;
$dtype = 6;
//defines the maximum value of every die
for ($i=1; $i <=$dicenumber ; $i++) {
${d.$i.value} = $dtype;
}
//defines and array for each possible sum resulting from the roll of the given number of dice.
for ($i=$dicenumber; $i <= ($dtype*$dicenumber) ; $i++) {
${sum.$i} = array();
}
//the troublesome piece of code I want to change
for ($d1=1; $d1 <= $d1value ; $d1++) {
for ($d2=1; $d2 <= $d2value ; $d2++) {
for ($d3=1; $d3 <= $d3value ; $d3++) {
array_push(${sum.($d1+$d2+$d3)}, "$d1"."$d2"."$d3");
}
}
}
//prints all the possible roll combinations, each line lists combination that share the same sum
for ($i=$dicenumber; $i <= ($dtype*$dicenumber); $i++) {
print join(" ", ${sum.$i})."<br />";
}
?>

Here we have a two-function process. The first function, buildArrays, creates arrays in the proper format to feed into the second function, allCombinations. So, for this example with 3 d6's in play, buildArrays will produce an array equivalent to this:
$data = array(
array(1, 2, 3, 4, 5, 6),
array(1, 2, 3, 4, 5, 6),
array(1, 2, 3, 4, 5, 6));
I will warn you that as you increase the number of dice and the number of sides, the number of possible combinations increases exponentially! This means that you could place a very large demand on the server, and both timeout and max memory limits will quickly come into play. The arrays generated could be very, very large and quickly consume more than the max memory limit. That said, here we go:
function buildArrays($dicenumber, $dtype){
for ($i = 0; $i<$dicenumber; $i++){
$tmp = array();
for ($j = 1; $j<=$dtype; $j++){
$tmp[] = $j;
}
$data[$i] = $tmp;
}
return $data;
}
function allCombinations($data){
$result = array(array()); //this is crucial, dark magic.
foreach ($data as $key => $array) {
$new_result = array();
foreach ($result as $old_element){
foreach ($array as $element){
if ($key == 0){
$new_result[] = $element;
} else {
$new_result[] = $old_element.$element;
}
}
$result = $new_result;
}
}
return $result;
}
//set variables
$dicenumber = 3;
$dtype = 6;
//set_time_limit(0); //You may need to uncomment this for large values.
//call functions
$data = buildArrays($dicenumber, $dtype);
$results = allCombinations($data);
//print out the results
foreach ($results as $result){
echo $result."<br/>";
}
N.B. This answer is a variant of the cartesian product code

If you have learned functions, you can do a recursive call and keep track of what dicenumber you're on, then increment it each call to the function (and end the loop once you've hit your #).
understanding basic recursion

How to reduce lists of ranges?

Given a list of ranges ie: 1-3,5,6-4,31,9,19,10,25-20
how can i reduce it to 1-6,9-10,19-25,31 ?
Here is what i've done so far, it seems a little bit complicated, so
is there any simpler/clever method to do this.
$in = '1-3,5,6-4,31,9,19,10,25-20';
// Explode the list in ranges
$rs = explode(',', $in);
$tmp = array();
// for each range of the list
foreach($rs as $r) {
// find the start and end date of the range
if (preg_match('/(\d+)-(\d+)/', $r, $m)) {
$start = $m[1];
$end = $m[2];
} else {
// If only one date
$start = $end = $r;
}
// flag each date in an array
foreach(range($start,$end) as $i) {
$tmp[$i] = 1;
}
}
$str = '';
$prev = 999;
// for each date of a month (1-31)
for($i=1; $i<32; $i++) {
// is this date flaged ?
if (isset($tmp[$i])) {
// is output string empty ?
if ($str == '') {
$str = $i;
} else {
// if the previous date is less than the current minus 1
if ($i-1 > $prev) {
// build the new range
$str .= '-'.$prev.','.$i;
}
}
$prev = $i;
}
}
// build the last range
if ($i-1 > $prev) {
$str .= '-'.$prev;
}
echo "str=$str\n";
NB: it must run under php 5.1.6 (i can't upgrade).
FYI : the numbers represent days of month so they are limited to 1-31.
Edit:
From a given range of dates (1-3,6,7-8), i'd like obtain another list (1-3,6-8) where all the ranges are recalculated and ordered.

Perhaps not the most efficient, but shouldn't be too bad with the limited range of values you're working with:
$in = '1-3,5,6-4,31,9,19,10,25-20';
$inSets = explode(',',$in);
$outSets = array();
foreach($inSets as $inSet) {
list($start,$end) = explode('-',$inSet.'-'.$inSet);
$outSets = array_merge($outSets,range($start,$end));
}
$outSets = array_unique($outSets);
sort($outSets);
$newSets = array();
$start = $outSets[0];
$end = -1;
foreach($outSets as $outSet) {
if ($outSet == $end+1) {
$end = $outSet;
} else {
if ($start == $end) {
$newSets[] = $start;
} elseif($end > 0) {
$newSets[] = $start.'-'.$end;
}
$start = $end = $outSet;
}
}
if ($start == $end) {
$newSets[] = $start;
} else {
$newSets[] = $start.'-'.$end;
}
var_dump($newSets);
echo '<br />';

You just have to search your data to get what you want. Split the input on the delimiter, in your case ','. Then sort it somehow, this safes you searching left from the current position. Take you first element, check whether it's a range and use the highest number in this range (3 out of 1-3 range or 3 if 3 is a single element) for further comparisions. Then take the 2nd element in your list and check if it's a direct successor of the last element. If yes combine the 1st and 2nd elements/range to a new range. Repeat.
Edit: I'm not sure about PHP but a regular expression is a bit overkill for this problem. Just look for a '-' in your exploded array, then you know it's a range. Sorting the exp. array safes you the backtracking, the stuff you are doing with $prev. You could also explode every element in the exploded array on '-' and check if the resulting array has a size > 1 to learn whether an element is a range or not.

Looking at the problem from an algorithmic stand-point, let's consider the limitations that you've put on the problem. All numbers will be from 1-31. The list is a collection of "ranges", each of which is defined by two numbers (start and end). There is no rule for whether start will be more, less than, or equal to end.
Since we have an arbitrarily large list of ranges but a definite means of sorting/organizing these, a divide and conquer strategy may yield the best complexity.
At first I typed out a very long and careful explanation of how I created each step in this algorithm (the dividing portion, the conquering potion, optimizations, etc.) however the explanation got extremely long. To shorten it, here's the final answer:
<?php
$ranges = "1-3,5,6-4,31,9,19,10,25-20";
$range_array = explode(',', $ranges);
$include = array();
foreach($range_array as $range){
list($start, $end) = explode('-', $range.'-'.$range); //"1-3-1-3" or "5-5"
$include = array_merge($include, range($start, $end));
}
$include = array_unique($include);
sort($include);
$new_ranges = array();
$start = $include[0];
$count = $start;
// And begin the simple conquer algorithm
for( $i = 1; $i < count($include); $i++ ){
if( $include[$i] != ($count++) ){
if($start == $count-1){
$new_ranges[] = $start;
} else {
$new_ranges[] = $start."-".$count-1;
}
$start = $include[$i];
$count = $start;
}
}
$new_ranges = implode(',', $new_ranges);
?>
This should (theoretically) work on arrays of arbitrary length for any positive integers. Negative integers would get tripped up since - is our delimiter for the range.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Removing enclosed intervals in an array of intervals in PHP - php

The problem was missing break in the while cycle function dup($a){ $c = count($a)-1; for ($i = $c; $i > 0; $i --){ while ($a[$i]['h'] <= $a[$i-1]['h']){ unset($a[$i]); break; //here } } $a = array_values($a); }

Related

Best method to intersect many arrays

Optimal way of cycling through 1000's of values

Knapsack Equation with item groups

PHP - Nest a for loop a variable number of times

How to reduce lists of ranges?

Categories

Resources