Pearson correlation in PHP - php

I'm trying to implement the calculation of the Pearson correlation coefficient between two sets of data in PHP.
I'm porting the Python script that can be found at this URL:
http://answers.oreilly.com/topic/1066-how-to-find-similar-users-with-python/
My implementation is the following:
class LB_Similarity_PearsonCorrelation implements LB_Similarity_Interface{
public function similarity($user1, $user2){
$sharedItem = array();
$pref1 = array();
$pref2 = array();
$result1 = $user1->fetchAllPreferences();
$result2 = $user2->fetchAllPreferences();
foreach($result1 as $pref){
$pref1[$pref->item_id] = $pref->rate;
}
foreach($result2 as $pref){
$pref2[$pref->item_id] = $pref->rate;
}
foreach ($pref1 as $item => $preferenza){
if(key_exists($item,$pref2)){
$sharedItem[$item] = 1;
}
}
$n = count($sharedItem);
if ($n == 0) return 0;
$sum1 = 0;$sum2 = 0;$sumSq1 = 0;$sumSq2 = 0;$pSum = 0;
foreach ($sharedItem as $item_id => $pre) {
$sum1 += $pref1[$item_id];
$sum2 += $pref2[$item_id];
$sumSq1 += pow($pref1[$item_id],2);
$sumSq2 += pow($pref2[$item_id],2);
$pSum += $pref1[$item_id] * $pref2[$item_id];
}
$num = $pSum - (($sum1 * $sum2) / $n);
$den = sqrt(($sumSq1 - pow($sum1,2)/$n) * ($sumSq2 - pow($sum2,2)/$n));
if ($den == 0) return 0;
return $num/$den;
}
}
To clarify the code: the method fetchAllPreferences returns a set of objects representing the rated items, which I turn into arrays keyed by item_id for ease of management.
I'm not sure that this implementation is correct; in particular, I have some doubts about the calculation of the denominator.
Any advice is welcome.
Thanks in advance!

This is my solution:
function php_correlation($x,$y){
if(count($x)!==count($y)){return -1;}
$x=array_values($x);
$y=array_values($y);
$xs=array_sum($x)/count($x);
$ys=array_sum($y)/count($y);
$a=0;$bx=0;$by=0;
for($i=0;$i<count($x);$i++){
$xr=$x[$i]-$xs;
$yr=$y[$i]-$ys;
$a+=$xr*$yr;
$bx+=pow($xr,2);
$by+=pow($yr,2);
}
$b = sqrt($bx*$by);
if($b==0) return 0;
return $a/$b;
}
http://profprog.ru/korrelyaciya-na-php-php-simple-pearson-correlation/

Your algorithm looks mathematically correct but numerically unstable. Finding the sum of squares explicitly is a recipe for disaster. What if you have numbers like array(10000000001, 10000000002, 10000000003)? A numerically stable one-pass algorithm for calculating the variance can be found on Wikipedia, and the same principle can be applied to computing the covariance.
Easier yet, if you don't care much about speed, you could just use two passes. Find the means in the first pass, then compute the variances and covariances using the textbook formula in the second pass.
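The one-pass idea mentioned above can be sketched as follows. This is a minimal illustration, assuming two equal-length numeric arrays; the function name pearson_one_pass is hypothetical, and the running updates follow Welford's method for the variance, extended to the covariance in the same way.

```php
<?php
// Hypothetical helper: one-pass Pearson correlation with Welford-style updates.
// No explicit sums of squares are accumulated, so large offsets do not cancel badly.
function pearson_one_pass(array $x, array $y): float
{
    $x = array_values($x);
    $y = array_values($y);
    $n = count($x);
    if ($n === 0 || $n !== count($y)) {
        throw new InvalidArgumentException('Inputs must be non-empty and of equal length.');
    }

    $meanX = 0.0;
    $meanY = 0.0;
    $m2x = 0.0; // running sum of squared deviations of x
    $m2y = 0.0; // running sum of squared deviations of y
    $cxy = 0.0; // running sum of co-deviations of x and y

    for ($i = 0; $i < $n; $i++) {
        $k = $i + 1;
        $dx = $x[$i] - $meanX; // deviation from the previous mean of x
        $dy = $y[$i] - $meanY; // deviation from the previous mean of y
        $meanX += $dx / $k;
        $meanY += $dy / $k;
        $m2x += $dx * ($x[$i] - $meanX);
        $m2y += $dy * ($y[$i] - $meanY);
        $cxy += $dx * ($y[$i] - $meanY);
    }

    $den = sqrt($m2x * $m2y);
    return $den == 0.0 ? 0.0 : $cxy / $den;
}
```

Returning 0 for a zero denominator mirrors the convention used in the code above; throwing an exception would be an equally reasonable choice.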

try my package here
http://www.phpclasses.org/browse/package/5854.html

Related

Solve Multiple Choice Knapsack (MCKP) With Dynamic Programming?

Example Data
For this question, let's assume the following items:
Items: Apple, Banana, Carrot, Steak, Onion
Values: 2, 2, 4, 5, 3
Weights: 3, 1, 3, 4, 2
Max Weight: 7
Objective:
The MCKP is a type of Knapsack Problem with the additional constraint that "[T]he items are subdivided into k classes... and exactly one item must be taken from each class"
I have written the code to solve the 0/1 KS problem with dynamic programming using recursive calls and memoization. My question is whether it is possible to add this constraint to my current solution? Say my classes are Fruit, Vegetables, Meat (from the example), I would need to include 1 of each type. The classes could just as well be type 1, 2, 3.
Also, I think this can be solved with linear programming and a solver, but if possible, I'd like to understand the answer here.
Current Code:
<?php
$value = array(2, 2, 4, 5, 3);
$weight = array(3, 1, 3, 4, 2);
$maxWeight = 7;
$maxItems = 5;
$seen = array(array()); //2D array for memoization
$picked = array();
//Put a dummy zero at the front to make things easier later.
array_unshift($value, 0);
array_unshift($weight, 0);
//Call our Knapsack Solver and return the sum value of optimal set
$KSResult = KSTest($maxItems, $maxWeight, $value, $weight);
$maxValue = $KSResult; //copy the result so we can recreate the table
//Recreate the decision table from our memo array to determine what items were picked
//Here I am building the table backwards because I know the optimal value will be at the end
for($i=$maxItems; $i > 0; $i--) {
for($j=$maxWeight; $j > 0; $j--) {
if($seen[$i][$j] != $seen[$i-1][$j]
&& $maxValue == $seen[$i][$j]) {
array_push($picked, $i);
$maxValue -= $value[$i];
break;
}
}
}
//Print out picked items and max value
print("<pre>".print_r($picked,true)."</pre>");
echo $KSResult;
// Recursive formula to solve the KS Problem
// $n = number of items to check
// $c = total capacity of bag
function KSTest($n, $c, &$value, &$weight) {
global $seen;
if(isset($seen[$n][$c])) {
//We've seen this subproblem before
return $seen[$n][$c];
}
if($n === 0 || $c === 0){
//No more items to check or no more capacity
$result = 0;
}
elseif($weight[$n] > $c) {
//This item is too heavy, check next item without this one
$result = KSTest($n-1, $c, $value, $weight);
}
else {
//Take the higher result of keeping or not keeping the item
$tempVal1 = KSTest($n-1, $c, $value, $weight);
$tempVal2 = $value[$n] + KSTest($n-1, $c-$weight[$n], $value, $weight);
if($tempVal2 >= $tempVal1) {
$result = $tempVal2;
//some conditions could go here? otherwise use max()
}
else {
$result = $tempVal1;
}
}
//memo the results and return
$seen[$n][$c] = $result;
return $result;
}
?>
What I've Tried:
My first thought was to add a class (k) array, sort the items via class (k), and when we choose to select an item that is the same as the next item, check if it's better to keep the current item or the item without the next item. Seemed promising, but fell apart after a couple of items being checked. Something like this:
$tempVal3 = $value[$n] + KSTest($n-2, $c-$weight[$n]);
max( $tempVal2, $tempVal3);
Another thought is that at the function call, I could call a loop for each class type and solve the KS with only 1 item at a time of that type + the rest of the values. This will definitely be making some assumptions though, because the results of set 1 might still be assuming multiples of set 2, for example.
This looks to be the equation (if you are good at reading all those symbols :)) and a C++ implementation, but I can't really see where the class constraint is happening?
The C++ implementation looks OK.
Your values and weights, which are 1-dimensional arrays in your current PHP implementation, will become 2-dimensional.
So for example,
values[i][j] will be the value of the j-th item in class i, and similarly for weights[i][j]. You take exactly one item from each class i and move forward while maximizing the total value.
The C++ implementation also optimizes the memo. It keeps only 2 arrays, each sized by max_weight, holding the current and previous states. This is because you only need these 2 states at a time to compute the present state.
Answers to your doubts:
1)
My first thought was to add a class (k) array, sort the items via
class (k), and when we choose to select an item that is the same as
the next item, check if it's better to keep the current item or the
item without the next item. Seemed promising, but fell apart after a
couple of items being checked. Something like this: $tempVal3 =
$value[$n] + KSTest($n-2, $c-$weight[$n]); max( $tempVal2, $tempVal3);
This won't work because there could be some item in class k+1 where you take an optimal value, and to respect the constraint you then need to take a suboptimal value for class k. So sorting and picking the best won't work when the constraint is hit. If the constraint is not hit, you can always pick the best value with the best weight.
2)
Another thought is that at the function call, I could call a loop for
each class type and solve the KS with only 1 item at a time of that
type + the rest of the values.
Yes, you are on the right track here. Assume that you have already solved the problem for the first k classes. Now you try to extend that solution using the items of class k+1 while respecting the weight constraint.
3)
... but I can't really see where the class constraint is happening?
for (int i = 1; i < weight.size(); ++i) {
fill(current.begin(), current.end(), -1);
for (int j = 0; j < weight[i].size(); ++j) {
for (int k = weight[i][j]; k <= max_weight; ++k) {
if (last[k - weight[i][j]] > 0)
current[k] = max(current[k],
last[k - weight[i][j]] + value[i][j]);
}
}
swap(current, last);
}
In the above C++ snippet, the first loop iterates over the classes, the second loop iterates over the items of the class, and the third loop extends the current state (current) using the previous state (last) and only 1 item j of class i at a time. Since you only use the previous state last and 1 item of the current class to extend and maximize, you are following the constraint.
Time complexity:
O( total_items x max_weight) which is equivalent to O( class x max_number_of_items_in_a_class x max_weight)
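The loop structure described above can be sketched in PHP as follows. This is an illustrative translation, not the linked implementation: the function name mckp_max_value and the input shape (a list of classes, each a list of [value, weight] pairs) are assumptions, and unreachable weights are marked with null rather than -1.

```php
<?php
// Hypothetical helper: bottom-up MCKP where exactly one item is taken from every class.
// $classes is a list of classes; each class is a list of [value, weight] pairs.
function mckp_max_value(array $classes, int $maxWeight): ?int
{
    // $last[$w] = best value with total weight exactly $w over the classes seen so far.
    // null marks a weight that cannot be reached.
    $last = array_fill(0, $maxWeight + 1, null);
    $last[0] = 0;

    foreach ($classes as $items) {
        $current = array_fill(0, $maxWeight + 1, null);
        foreach ($items as [$value, $weight]) {
            for ($w = $weight; $w <= $maxWeight; $w++) {
                $prev = $last[$w - $weight];
                if ($prev === null) {
                    continue; // the previous classes cannot fill the remaining weight
                }
                $candidate = $prev + $value;
                if ($current[$w] === null || $candidate > $current[$w]) {
                    $current[$w] = $candidate;
                }
            }
        }
        // Only the previous row is ever read, so two arrays are enough.
        $last = $current;
    }

    $reachable = array_filter($last, static function ($v) {
        return $v !== null;
    });
    return $reachable ? max($reachable) : null;
}

// Example data from the question, grouped as Fruit, Vegetables, Meat.
$classes = [
    [[2, 3], [2, 1]], // Apple, Banana
    [[4, 3], [3, 2]], // Carrot, Onion
    [[5, 4]],         // Steak
];
echo mckp_max_value($classes, 7), "\n";
```

Because $current is built only from $last, an item can never be combined with another item of its own class, which is where the one-per-class constraint comes from.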
I am not a PHP programmer, but I will try to write pseudocode with a good explanation.
In the original problem, the meaning of each cell i, j was "value of filling the knapsack with items 1 to i until it reaches capacity j". The solution in the link you provided defines each cell as "value of filling the knapsack with items from buckets 1 to i until it reaches capacity j". Notice that in this variation there is no such thing as not taking an element from a class.
So on each step (each call to KSTest with $n, $c), we need to find which element to pick from the n-th class such that the weight of this element is at most c and its value + KSTest(n - 1, c - w) is the greatest.
So I think you should only change the else if and else statements to something like:
else {
    $result = 0;
    for ($i = 0; $i < $number_of_items_in_nth_class; $i++) {
        if ($weight[$n][$i] > $c) {
            //This item is too heavy, check next item
            continue;
        }
        //Value of this item plus the best result for the remaining classes
        $result = max($result, $value[$n][$i] + KSTest($n-1, $c - $weight[$n][$i], $value, $weight));
    }
}
Now two disclaimers:
I do not code in PHP, so this code will not run :)
This is not the implementation given in the link you provided. TBH, I didn't understand why the time complexity of their algorithm is so small (and what C is), but this implementation should work since it follows the definition of the recursive formula given.
The time complexity of this should be O(max_weight * number_of_classes * size_of_largest_class).
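The recursive formulation above can be made runnable along these lines. This is only a sketch under assumptions: the function name mckp_recursive, the class layout (a 0-based list of classes, each a list of [value, weight] pairs), and the use of null to mean "no valid selection" are all illustrative choices rather than part of the original pseudocode.

```php
<?php
// Hypothetical helper: top-down MCKP with memoization.
// $n = number of classes still to fill, $c = remaining capacity.
// Returns null when it is impossible to take one item from each remaining class.
function mckp_recursive(array $classes, int $n, int $c, array &$memo): ?int
{
    if ($n === 0) {
        return 0; // every class has been handled
    }
    // array_key_exists is used because a memoized result may legitimately be null.
    if (isset($memo[$n]) && array_key_exists($c, $memo[$n])) {
        return $memo[$n][$c];
    }

    $best = null;
    foreach ($classes[$n - 1] as [$value, $weight]) {
        if ($weight > $c) {
            continue; // this item is too heavy
        }
        $rest = mckp_recursive($classes, $n - 1, $c - $weight, $memo);
        if ($rest === null) {
            continue; // the remaining classes cannot be filled after taking this item
        }
        if ($best === null || $value + $rest > $best) {
            $best = $value + $rest;
        }
    }

    $memo[$n][$c] = $best;
    return $best;
}

$classes = [
    [[2, 3], [2, 1]], // Apple, Banana
    [[4, 3], [3, 2]], // Carrot, Onion
    [[5, 4]],         // Steak
];
$memo = [];
var_dump(mckp_recursive($classes, count($classes), 7, $memo));
```

Tracking infeasibility separately from a value of 0 avoids counting a partial selection that skips a class.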
This is my PHP solution. I've tried to comment the code in a way that it's easy to follow.
Update:
I updated the code because the old script was giving unreliable results. This is cleaner and has been thoroughly tested. Key takeaways are that I use two memo arrays, one at the group level to speed up execution and one at the item level to reconstruct the results. I found any attempts to track which items are being chosen as you go are unreliable and much less efficient. Also, isset() instead of if($var) is essential for checking the memo array because the previous results might have been 0 ;)
<?php
/**
* Multiple Choice Knapsack Solver
*
* @author Michael Cruz
* @version 1.0 - 03/27/2020
**/
class KS_Solve {
public $KS_Items;
public $maxValue;
public $maxWeight;
public $maxItems;
public $finalValue;
public $finalWeight;
public $finalItems;
public $finalGroups;
public $memo1 = array(); //Group memo
public $memo2 = array(); //Item memo for results rebuild
public function __construct() {
//some default variables as an example.
//KS_Items = array(Value, Weight, Group, Item #)
$this->KS_Items = array(
array(2, 3, 1, 1),
array(2, 1, 1, 2),
array(4, 3, 2, 3),
array(5, 4, 2, 4),
array(3, 2, 3, 5)
);
$this->maxWeight = 7;
$this->maxItems = 5;
$this->KS_Wrapper();
}
public function KS_Wrapper() {
$start_time = microtime(true);
//Put a dummy zero at the front to make things easier later.
array_unshift($this->KS_Items, array(0, 0, 0, 0));
//Call our Knapsack Solver
$this->maxValue = $this->KS_Solver($this->maxItems, $this->maxWeight);
//Recreate the decision table from our memo array to determine what items were picked
//ksort($this->memo2); //for debug
for($i=$this->maxItems; $i > 0; $i--) {
//ksort($this->memo2[$i]); //for debug
for($j=$this->maxWeight; $j > 0; $j--) {
if($this->maxValue == 0) {
break 2;
}
if($this->memo2[$i][$j] == $this->maxValue
&& $j == $this->maxWeight) {
$this->maxValue -= $this->KS_Items[$i][0];
$this->maxWeight -= $this->KS_Items[$i][1];
$this->finalValue += $this->KS_Items[$i][0];
$this->finalWeight += $this->KS_Items[$i][1];
$this->finalItems .= " " . $this->KS_Items[$i][3];
$this->finalGroups .= " " . $this->KS_Items[$i][2];
break;
}
}
}
//Print out the picked items and value. (IMPLEMENT Proper View or Return!)
echo "<pre>";
echo "RESULTS: <br>";
echo "Value: " . $this->finalValue . "<br>";
echo "Weight: " . $this->finalWeight . "<br>";
echo "Item's in KS:" . $this->finalItems . "<br>";
echo "Selected Groups:" . $this->finalGroups . "<br><br>";
$end_time = microtime(true);
$execution_time = ($end_time - $start_time);
echo "Results took " . sprintf('%f', $execution_time) . " seconds to execute<br>";
}
/**
* Recursive function to solve the MCKS Problem
* $n = number of items to check
* $c = total capacity of KS
**/
public function KS_Solver($n, $c) {
$group = $this->KS_Items[$n][2];
$groupItems = array();
$count = 0;
$result = 0;
$bestVal = 0;
if(isset($this->memo1[$group][$c])) {
$result = $this->memo1[$group][$c];
}
else {
//Sort out the items for this group
foreach($this->KS_Items as $item) {
if($item[2] == $group) {
$groupItems[] = $item;
$count++;
}
}
//$k adjusts the index for item memoization
$k = $count - 1;
//Find the results of each item + items of other groups
foreach($groupItems as $item) {
if($item[1] > $c) {
//too heavy
$result = 0;
}
elseif($item[1] >= $c && $group != 1) {
//too heavy for next group
$result = 0;
}
elseif($group == 1) {
//Just take the highest value
$result = $item[0];
}
else {
//check this item with following groups
$result = $item[0] + $this->KS_Solver($n - $count, $c - $item[1]);
}
if($result == $item[0] && $group != 1) {
//No solution with the following sets, so don't use this item.
$result = 0;
}
if($result > $bestVal) {
//Best item so far
$bestVal = $result;
}
//memo the results
$this->memo2[$n-$k][$c] = $result;
$k--;
}
$result = $bestVal;
}
//memo and return
$this->memo1[$group][$c] = $result;
return $result;
}
}
new KS_Solve();
?>

How to revert a function in PHP?

I am building a little game and got stuck in developing the leveling system. I created a function that will exponentially increase the experience required for the next level. However, I am not sure how to turn it around so that I can put in the amount of experience a user has gained and get the corresponding level.
PHP function
function experience($level, $curve = 300) {
// Preset value to prevent notices
$a = 0;
// Calculate level cap
for ($x = 1; $x < $level; $x++) {
$a += floor($x+$curve*pow(2, ($x/7)));
}
// Return amount of experience
return floor($a/4);
}
The issue
I am wondering how I can reverse engineer this function in order to return the correct level for a certain amount of experience.
Using the above function, my code would output the following:
Level 1: 0
Level 2: 83
Level 3: 174
Level 4: 276
Level 5: 388
Level 6: 512
Level 7: 650
Level 8: 801
Level 9: 969
Level 10: 1154
What I am looking for is a way to invert this function so that I can input a certain amount and it will return the corresponding level.
For example, 1000 experience should return level 9.
Plugging the values into excel and creating a trend line, I got the following equation:
y = 1.17E-09x^3 - 4.93E-06x^2 + 1.19E-02x + 6.43E-02
So your reverse engineered equation would be
function level($xp) {
$a = 1.17e-9;
$b = -4.93e-6;
$c = 0.0119;
$d = 0.0643;
return round($a*pow($xp, 3) + $b*pow($xp,2) + $c * $xp + $d);
}
Results are accurate to within 1dp, but if your $curve changes, you'd need to recalculate. I also haven't extended higher than level 10.
Other options include caching the results of the lookup:
$minLevel = 1;
$maxLevel = 10;
$levelXpAmounts = array();
function populateLevelArray($curve = 300) {
    global $levelXpAmounts, $minLevel, $maxLevel;
    $levelXpAmounts[$curve] = array();
    for ($level = $minLevel; $level <= $maxLevel; $level++) {
        $levelXpAmounts[$curve][$level] = experience($level, $curve);
    }
}
//at game load:
populateLevelArray();
Then, your reverse lookup would be
function level($xp, $curve = 300) {
    global $levelXpAmounts, $minLevel, $maxLevel;
    if (!array_key_exists($curve, $levelXpAmounts)) {
        populateLevelArray($curve);
    }
    for ($level = $minLevel; $level <= $maxLevel; $level++) {
        if ($xp < $levelXpAmounts[$curve][$level]) {
            return $level - 1;
        }
    }
    //xp is at or beyond the highest cached level
    return $maxLevel;
}
That way, the iteration through all the levels is only done once for each different value of $curve. You can also replace your old experience() function with a (quite likely faster) lookup.
Note: it's been a while since I've written any php, so my syntax may be a little rusty. I apologize in advance for any errors in that regard.
You can do another function called level which uses the experience function to find the level:
function level($experience)
{
for ($level = 1; $level <= 10; $level++) {
if ($experience < experience($level + 1)) {
return $level;
}
}
}
function experience($level, $curve = 300)
{
$a = 0;
for ($x = 1; $x < $level; $x++) {
$a += floor($x+$curve*pow(2, ($x/7)));
}
return floor($a/4);
}
var_dump(level(1000));
You can clearly work the math here and find a reverse formula. I'm not sure whether it will be a nice and easy formula, so I would suggest an alternative approach which is easy to implement.
Precalculate the results for all the levels you realistically want a player to achieve (I highly doubt that you need more than 200 levels, because by my estimate that would require tens of billions of exp points).
Store all these values in an array: $arr = [0, 83, 174, 276, 388, 512, 650, ...];. The array is sorted, so you only need to find the position where a given amount of experience fits.
If you are looking for 400 exp points, you see that it should be inserted after the 5th position, so it is the 5th level. Even a simple loop will suffice, but you can also write a binary search.
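The binary search over the precalculated array could look like the sketch below. The function name level_from_thresholds is hypothetical, and the threshold values are simply the ten levels listed in the question; index 0 holds the experience required for level 1.

```php
<?php
// Hypothetical helper: find the level for a given amount of experience by
// binary search over a sorted list of per-level experience thresholds.
function level_from_thresholds(array $thresholds, int $xp): int
{
    $lo = 0;
    $hi = count($thresholds) - 1;
    $best = 0; // index of the largest threshold that is <= $xp

    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($thresholds[$mid] <= $xp) {
            $best = $mid;    // this level is reached; look for a higher one
            $lo = $mid + 1;
        } else {
            $hi = $mid - 1;  // not enough experience for this level
        }
    }

    return $best + 1; // levels are 1-based
}

// Experience thresholds for levels 1 to 10, taken from the question.
$thresholds = [0, 83, 174, 276, 388, 512, 650, 801, 969, 1154];
echo level_from_thresholds($thresholds, 400), "\n";
echo level_from_thresholds($thresholds, 1000), "\n";
```

Experience beyond the last threshold maps to the highest stored level, so the array length acts as the level cap.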
This task can be solved in another way: the method of partial sums.
Let's assume you store an array of exponential values calculated by this function:
function formula($level, $curve){ return floor($level+$curve*pow(2, ($level/7)));}
$MAX_LEVEL = 90;
function calculateCurve($curve){
    global $MAX_LEVEL;
    $array = [];
    for($i = 0; $i < $MAX_LEVEL; $i++) $array[] = formula($i, $curve);
    return $array;
}
Now we can calculate the experience needed for a level:
$curve = calculateCurve(300);
function getExperienceForLevel($level, $curve){
    $S = 0;
    for($i = 0; $i < $level; $i++) $S += $curve[$i];
    return $S;
}
And calculate level for experience:
function getLevelForExperience($exp, $curve){
    global $MAX_LEVEL;
    for($i = 0; $i < $MAX_LEVEL; $i++){
        $exp -= $curve[$i];
        if($exp < 0) return $i - 1;
    }
    return $MAX_LEVEL;
}
I assume there could be index problems. I haven't tested the code, but I hope the main idea is clearly explained.
Pros:
The code is cleaner; there are no magic numbers or interpolation coefficients.
You can easily change your learning curve.
The calculating functions can be improved to O(1).
Cons:
There is a $curve array to store or calculate somewhere.
You could also make an even more advanced version of this:
function calculateCurve($curve){
    global $MAX_LEVEL;
    $array = [];
    $exp = 0;
    for($i = 0; $i < $MAX_LEVEL; $i++) {
        $exp += formula($i, $curve);
        $array[] = $exp;
    }
    return $array;
}
Now calculating the experience has O(1) complexity:
function getExperienceForLevel($level, $curve){
    global $MAX_LEVEL;
    //the last valid index of $curve is $MAX_LEVEL - 1
    return $curve[min($MAX_LEVEL - 1, $level)];
}
Perhaps not the best way, but it's working.
function level($experience, $curve = 300)
{
$minLevel = 1;
$maxLevel = 10;
for($level = $minLevel; $level <= $maxLevel; $level++)
{
if(experience($level, $curve) <= $experience && $experience < experience($level + 1, $curve))
{
return $level;
}
}
return $maxLevel;
}

How to grab lottery ball naturally

<?php
class Lottery
{
private $start;
private $end;
public function __construct($start = 1, $end = 49)
{
$this->start = $start;
$this->end = $end;
}
public function balls($num = 5)
{
$balls = range($this->start, $this->end);
shuffle($balls);
$index = array_rand($balls, $num);
$result = [];
for ($i = 0; $i < $num; ++$i) {
$result[] = $balls[$index[$i]];
}
$result = implode(', ', $result);
return $result;
}
}
This is how I draw my lottery balls. A friend told me that as more and more games are played, the results will start to follow a pattern if I rely on the built-in random functions. Is that true, and how can I prevent it?
Updated code:
<?php
class Lottery
{
private $start;
private $end;
public $tempBalls = [];
public function __construct($start = 1, $end = 49)
{
$this->start = $start;
$this->end = $end;
}
public function balls($num = 5)
{
$balls = range($this->start, $this->end);
$results = [];
do {
$results[] = $this->randBall($balls);
} while (count($results) < $num);
$result = implode(', ', $results);
$this->tempBalls = [];
return $result;
}
private function randBall($range)
{
$ball = $range[mt_rand(0, count($range) - 1)];
if (!in_array($ball, $this->tempBalls)) {
$this->tempBalls[] = $ball;
} else {
return $this->randBall($range);
}
return $ball;
}
}
I've not tested with shuffle() but certainly the rand() function will always return the same sequence of numbers unless you first set a seed value. Checking if this applies to shuffle() should be trivial.
If you read the documentation, you'll see that rand() returns a pseudo-random number, and there are warnings about not using it for encryption. Consider a simpler case, where rand() always returns a digit from the set {0,1,2,3,4,5} and then wraps around, just starting at a random offset. It should be obvious that this undermines the point of the encryption; indeed, changing the order of the numbers in the initial set does not really change this much.
But how much of this actually applies to lottery balls? Firstly, nobody with any knowledge of encryption would use such a small range to generate a random number in such a range. Secondly, you are not trying to protect the secrecy of data.
The random numbers you use for selecting numbers to put on a ticket do not influence the outcome.
The random numbers you pick for encryption do influence the outcome.
There may be other considerations about fairness. You don't say if you're picking values as a lottery customer or if you are picking the winning numbers. The latter implies that you must be able to prove there is not an intrinsic flaw in your code. I've said that rand() returns a predictable set of numbers, but its output when seeded requires a very large set of numbers to be known (the pattern repeats after 2^31 in the case of rand()), or the seed value to be known, in order for the values to be predicted.
By all means, use mt_rand() or openssl_random_pseudo_bytes() but it won't make any functional difference to your project.
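If you do want to be on the safe side, one option is to draw the balls with random_int(), PHP's built-in cryptographically secure integer source (available since PHP 7). Note that this is a swapped-in alternative to the functions named above, and the function name draw_balls is hypothetical. The sketch uses a partial Fisher-Yates shuffle, so no ball can be drawn twice and no retry loop is needed.

```php
<?php
// Hypothetical helper: draw $num distinct balls from the range $start..$end
// using a partial Fisher-Yates shuffle driven by random_int().
function draw_balls(int $start, int $end, int $num): array
{
    $balls = range($start, $end);
    $count = count($balls);
    if ($num < 0 || $num > $count) {
        throw new InvalidArgumentException('Cannot draw more balls than exist.');
    }

    for ($i = 0; $i < $num; $i++) {
        // Pick one of the balls that has not been drawn yet.
        $j = random_int($i, $count - 1);
        // Move it into the "drawn" prefix of the array.
        [$balls[$i], $balls[$j]] = [$balls[$j], $balls[$i]];
    }

    return array_slice($balls, 0, $num);
}

echo implode(', ', draw_balls(1, 49, 5)), "\n";
```

Each position is filled by swapping in a ball chosen uniformly from those still undrawn, which matches how balls leave a physical drum.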

Knapsack Equation with item groups

Can't call it a problem on Stack Overflow apparently, however I am currently trying to understand how to integrate constraints in the form of item groups within the Knapsack problem. My math skills are proving to be fairly limiting in this situation, however I am very motivated to both make this work as intended as well as figure out what each aspect does (in that order since things make more sense when they work).
With that said, I have found an absolutely beautiful implementation at Rosetta Code and cleaned up the variable names some to help myself better understand this from a very basic perspective.
Unfortunately I am having an incredibly difficult time figuring out how I can apply this logic to include item groups. My purpose is for building fantasy teams, supplying my own value & weight (points/salary) per player but without groups (positions in my case) I am unable to do so.
Would anyone be able to point me in the right direction for this? I'm reviewing code examples from other languages and additional descriptions of the problem as a whole, however I would like to get the groups implemented by whatever means possible.
<?php
function knapSolveFast2($itemWeight, $itemValue, $i, $availWeight, &$memoItems, &$pickedItems)
{
global $numcalls;
$numcalls++;
// Return memo if we have one
if (isset($memoItems[$i][$availWeight]))
{
return array( $memoItems[$i][$availWeight], $memoItems['picked'][$i][$availWeight] );
}
else
{
// At end of decision branch
if ($i == 0)
{
if ($itemWeight[$i] <= $availWeight)
{ // Will this item fit?
$memoItems[$i][$availWeight] = $itemValue[$i]; // Memo this item
$memoItems['picked'][$i][$availWeight] = array($i); // and the picked item
return array($itemValue[$i],array($i)); // Return the value of this item and add it to the picked list
}
else
{
// Won't fit
$memoItems[$i][$availWeight] = 0; // Memo zero
$memoItems['picked'][$i][$availWeight] = array(); // and a blank array entry...
return array(0,array()); // Return nothing
}
}
// Not at end of decision branch..
// Get the result of the next branch (without this one)
list ($without_i,$without_PI) = knapSolveFast2($itemWeight, $itemValue, $i-1, $availWeight,$memoItems,$pickedItems);
if ($itemWeight[$i] > $availWeight)
{ // Does it return too many?
$memoItems[$i][$availWeight] = $without_i; // Memo without including this one
$memoItems['picked'][$i][$availWeight] = array(); // and a blank array entry...
return array($without_i,array()); // and return it
}
else
{
// Get the result of the next branch (WITH this one picked, so available weight is reduced)
list ($with_i,$with_PI) = knapSolveFast2($itemWeight, $itemValue, ($i-1), ($availWeight - $itemWeight[$i]),$memoItems,$pickedItems);
$with_i += $itemValue[$i]; // ..and add the value of this one..
// Get the greater of WITH or WITHOUT
if ($with_i > $without_i)
{
$res = $with_i;
$picked = $with_PI;
array_push($picked,$i);
}
else
{
$res = $without_i;
$picked = $without_PI;
}
$memoItems[$i][$availWeight] = $res; // Store it in the memo
$memoItems['picked'][$i][$availWeight] = $picked; // and store the picked item
return array ($res,$picked); // and then return it
}
}
}
$items = array("map","compass","water","sandwich","glucose","tin","banana","apple","cheese","beer","suntan cream","camera","t-shirt","trousers","umbrella","waterproof trousers","waterproof overclothes","note-case","sunglasses","towel","socks","book");
$weight = array(9,13,153,50,15,68,27,39,23,52,11,32,24,48,73,42,43,22,7,18,4,30);
$value = array(150,35,200,160,60,45,60,40,30,10,70,30,15,10,40,70,75,80,20,12,50,10);
## Initialize
$numcalls = 0;
$memoItems = array();
$selectedItems = array();
## Solve
list ($m4, $selectedItems) = knapSolveFast2($weight, $value, sizeof($value)-1, 400, $memoItems, $selectedItems);
# Display Result
echo "<b>Items:</b><br>" . join(", ", $items) . "<br>";
echo "<b>Max Value Found:</b><br>$m4 (in $numcalls calls)<br>";
echo "<b>Array Indices:</b><br>". join(",", $selectedItems) . "<br>";
echo "<b>Chosen Items:</b><br>";
echo "<table border cellspacing=0>";
echo "<tr><td>Item</td><td>Value</td><td>Weight</td></tr>";
$totalValue = 0;
$totalWeight = 0;
foreach($selectedItems as $key)
{
$totalValue += $value[$key];
$totalWeight += $weight[$key];
echo "<tr><td>" . $items[$key] . "</td><td>" . $value[$key] . "</td><td>".$weight[$key] . "</td></tr>";
}
echo "<tr><td align=right><b>Totals</b></td><td>$totalValue</td><td>$totalWeight</td></tr>";
echo "</table><hr>";
?>
That knapsack program is traditional, but I think that it obscures what's going on. Let me show you how the DP can be derived more straightforwardly from a brute force solution.
In Python (sorry; this is my scripting language of choice), a brute force solution could look like this. First, there's a function for generating all subsets with breadth-first search (this is important).
def all_subsets(S):  # brute force
    subsets_so_far = [()]
    for x in S:
        new_subsets = [subset + (x,) for subset in subsets_so_far]
        subsets_so_far.extend(new_subsets)
    return subsets_so_far
Then there's a function that returns True if the solution is valid (within budget and with a proper position breakdown) – call it is_valid_solution – and a function that, given a solution, returns the total player value (total_player_value). Assuming that players is the list of available players, the optimal solution is this.
max(filter(is_valid_solution, all_subsets(players)), key=total_player_value)
Now, for a DP, we add a function cull to all_subsets.
def best_subsets(S):  # DP
    subsets_so_far = [()]
    for x in S:
        new_subsets = [subset + (x,) for subset in subsets_so_far]
        subsets_so_far.extend(new_subsets)
        subsets_so_far = cull(subsets_so_far)  ### This is new.
    return subsets_so_far
What cull does is to throw away the partial solutions that are clearly not going to be missed in our search for an optimal solution. If the partial solution is already over budget, or if it already has too many players at one position, then it can safely be discarded. Let is_valid_partial_solution be a function that tests these conditions (it probably looks a lot like is_valid_solution). So far we have this.
def cull(subsets):  # INCOMPLETE!
    # list() so the result supports .extend() under Python 3
    return list(filter(is_valid_partial_solution, subsets))
The other important test is that some partial solutions are just better than others. If two partial solutions have the same position breakdown (e.g., two forwards and a center) and cost the same, then we only need to keep the more valuable one. Let cost_and_position_breakdown take a solution and produce a string that encodes the specified attributes.
def cull(subsets):
    best_subset = {}  # empty dictionary/map
    for subset in filter(is_valid_partial_solution, subsets):
        key = cost_and_position_breakdown(subset)
        if (key not in best_subset or
                total_player_value(subset) > total_player_value(best_subset[key])):
            best_subset[key] = subset
    # list() so the result supports .extend() under Python 3
    return list(best_subset.values())
That's it. There's a lot of optimization to be done here (e.g., throw away partial solutions for which there's a cheaper and more valuable partial solution; modify the data structures so that we aren't always computing the value and position breakdown from scratch and to reduce the storage costs), but it can be tackled incrementally.
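Since the question is about PHP, the same culling idea might be sketched there as follows. This is an illustration under assumptions, not a port of the code above: the function names best_team and state_key, the player array shape, and the per-position limits are all hypothetical. Partial teams are keyed by cost and position breakdown, and only the most valuable team per key is kept.

```php
<?php
// Hypothetical helper: encode the cost and position breakdown of a partial team.
function state_key(array $state): string
{
    return $state['cost'] . '|' . implode(',', $state['counts']);
}

// Hypothetical helper: pick the most valuable team that fills every position
// exactly ($limits) without exceeding $budget. Returns null if none exists.
// $players: list of ['name' => string, 'pos' => string, 'cost' => int, 'value' => int]
function best_team(array $players, array $limits, int $budget): ?array
{
    $empty = [
        'cost' => 0,
        'value' => 0,
        'counts' => array_fill_keys(array_keys($limits), 0),
        'names' => [],
    ];
    $states = [state_key($empty) => $empty];

    foreach ($players as $p) {
        $next = $states; // every existing partial team may skip this player
        foreach ($states as $s) {
            $pos = $p['pos'];
            if (!isset($limits[$pos]) || $s['counts'][$pos] >= $limits[$pos]) {
                continue; // position already full
            }
            if ($s['cost'] + $p['cost'] > $budget) {
                continue; // over budget
            }
            $t = $s;
            $t['cost'] += $p['cost'];
            $t['value'] += $p['value'];
            $t['counts'][$pos]++;
            $t['names'][] = $p['name'];
            $k = state_key($t);
            // Cull: keep only the most valuable team per cost and breakdown.
            if (!isset($next[$k]) || $t['value'] > $next[$k]['value']) {
                $next[$k] = $t;
            }
        }
        $states = $next;
    }

    $best = null;
    foreach ($states as $s) {
        if ($s['counts'] == $limits && ($best === null || $s['value'] > $best['value'])) {
            $best = $s;
        }
    }
    return $best;
}
```

The number of stored states is bounded by the number of distinct cost and breakdown combinations, so this works best when salaries are coarse (for example, multiples of 100).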
One potential small advantage with regard to composing recursive functions in PHP is that variables are passed by value (meaning a copy is made) rather than reference, which can save a step or two.
Perhaps you could better clarify what you are looking for by including a sample input and output. Here's an example that makes combinations from given groups - I'm not sure if that's your intention... I made the section accessing the partial result allow combinations with less value to be considered if their weight is lower - all of this can be changed to prune in the specific ways you would like.
function make_teams($players, $position_limits, $weights, $values, $max_weight){
$player_counts = array_map(function($x){
return count($x);
}, $players);
$positions = array_map(function($x){
return []; // start every position with an empty list of players
},$position_limits);
$num_positions = count($positions);
$combinations = [];
$hash = [];
$stack = [[$positions,0,0,0,0,0]];
while (!empty($stack)){
$params = array_pop($stack);
$positions = $params[0];
$i = $params[1];
$j = $params[2];
$len = $params[3];
$weight = $params[4];
$value = $params[5];
// too heavy
if ($weight > $max_weight){
continue;
// the variable, $positions, is accumulating so you can access the partial result
} else if ($j == 0 && $i > 0){
// remember weight and value after each position is chosen
if (!isset($hash[$i])){
$hash[$i] = [$weight,$value];
// end thread if current value is lower for similar weight
} else if ($weight >= $hash[$i][0] && $value < $hash[$i][1]){
continue;
// remember better weight and value
} else if ($weight <= $hash[$i][0] && $value > $hash[$i][1]){
$hash[$i] = [$weight,$value];
}
}
// all positions have been filled
if ($i == $num_positions){
$positions[] = $weight;
$positions[] = $value;
if (!empty($combinations)){
$last = &$combinations[count($combinations) - 1];
if ($weight < $last[$num_positions] && $value > $last[$num_positions + 1]){
$last = $positions;
} else {
$combinations[] = $positions;
}
} else {
$combinations[] = $positions;
}
// current position is filled
} else if (count($positions[$i]) == $position_limits[$i]){
$stack[] = [$positions,$i + 1,0,$len,$weight,$value];
// otherwise create two new threads: one with player $j added to
// position $i, the other thread skipping player $j
} else {
if ($j < $player_counts[$i] - 1){
$stack[] = [$positions,$i,$j + 1,$len,$weight,$value];
}
if ($j < $player_counts[$i]){
$positions[$i][] = $players[$i][$j];
$stack[] = [$positions,$i,$j + 1,$len + 1
,$weight + $weights[$i][$j],$value + $values[$i][$j]];
}
}
}
return $combinations;
}
Example input and output:
$players = [[1,2],[3,4,5],[6,7]];
$position_limits = [1,2,1];
$weights = [[2000000,1000000],[10000000,1000500,12000000],[5000000,1234567]];
$values = [[33,5],[78,23,10],[11,101]];
$max_weight = 20000000;
echo json_encode(make_teams($players, $position_limits, $weights, $values, $max_weight));
/*
[[[1],[3,4],[7],14235067,235],[[2],[3,4],[7],13235067,207]]
*/

K-means clustering: What's wrong? (PHP)

I was looking for a way to calculate dynamic market values in a soccer manager game. I asked this question here and got a very good answer from Alceu Costa.
I tried to code this algorithm (90 elements, 5 clusters) but it doesn't work correctly:
In the first iteration, a high percentage of the elements changes its cluster.
From the second iteration, all elements change their cluster.
Since the algorithm normally works until convergence (no element changes its cluster), it doesn't finish in my case.
So I manually capped it at 15 iterations; without that cap it would run forever.
You can see the output of my algorithm here. What's wrong with it? Can you tell me why it doesn't work correctly?
I hope you can help me. Thank you very much in advance!
Here's the code:
<?php
include 'zzserver.php';
function distance($player1, $player2) {
global $strengthMax, $maxStrengthMax, $motivationMax, $ageMax;
// $playerX = array(strength, maxStrength, motivation, age, id);
$distance = 0;
$distance += abs($player1['strength']-$player2['strength'])/$strengthMax;
$distance += abs($player1['maxStrength']-$player2['maxStrength'])/$maxStrengthMax;
$distance += abs($player1['motivation']-$player2['motivation'])/$motivationMax;
$distance += abs($player1['age']-$player2['age'])/$ageMax;
return $distance;
}
function calculateCentroids() {
global $cluster;
$clusterCentroids = array();
foreach ($cluster as $key=>$value) {
$strenthValues = array();
$maxStrenthValues = array();
$motivationValues = array();
$ageValues = array();
foreach ($value as $clusterEntries) {
$strenthValues[] = $clusterEntries['strength'];
$maxStrenthValues[] = $clusterEntries['maxStrength'];
$motivationValues[] = $clusterEntries['motivation'];
$ageValues[] = $clusterEntries['age'];
}
if (count($strenthValues) == 0) { $strenthValues[] = 0; }
if (count($maxStrenthValues) == 0) { $maxStrenthValues[] = 0; }
if (count($motivationValues) == 0) { $motivationValues[] = 0; }
if (count($ageValues) == 0) { $ageValues[] = 0; }
$clusterCentroids[$key] = array('strength'=>array_sum($strenthValues)/count($strenthValues), 'maxStrength'=>array_sum($maxStrenthValues)/count($maxStrenthValues), 'motivation'=>array_sum($motivationValues)/count($motivationValues), 'age'=>array_sum($ageValues)/count($ageValues));
}
return $clusterCentroids;
}
function assignPlayersToNearestCluster() {
global $cluster, $clusterCentroids;
$playersWhoChangedClusters = 0;
// BUILD NEW CLUSTER ARRAY WHICH ALL PLAYERS GO IN THEN START
$alte_cluster = array_keys($cluster);
$neuesClusterArray = array();
foreach ($alte_cluster as $alte_cluster_entry) {
$neuesClusterArray[$alte_cluster_entry] = array();
}
// BUILD NEW CLUSTER ARRAY WHICH ALL PLAYERS GO IN THEN END
foreach ($cluster as $oldCluster=>$clusterValues) {
// FOR EVERY SINGLE PLAYER START
foreach ($clusterValues as $player) {
// MEASURE DISTANCE TO ALL CENTROIDS START
$abstaende = array();
foreach ($clusterCentroids as $CentroidId=>$centroidValues) {
$distancePlayerCluster = distance($player, $centroidValues);
$abstaende[$CentroidId] = $distancePlayerCluster;
}
arsort($abstaende);
if ($neuesCluster = each($abstaende)) {
$neuesClusterArray[$neuesCluster['key']][] = $player; // add to new array
// player $player['id'] goes to cluster $neuesCluster['key'] since it is the nearest one
if ($neuesCluster['key'] != $oldCluster) {
$playersWhoChangedClusters++;
}
}
// MEASURE DISTANCE TO ALL CENTROIDS END
}
// FOR EVERY SINGLE PLAYER END
}
$cluster = $neuesClusterArray;
return $playersWhoChangedClusters;
}
// CREATE k CLUSTERS START
$k = 5; // Anzahl Cluster
$cluster = array();
for ($i = 0; $i < $k; $i++) {
$cluster[$i] = array();
}
// CREATE k CLUSTERS END
// PUT PLAYERS IN RANDOM CLUSTERS START
$sql1 = "SELECT ids, staerke, talent, trainingseifer, wiealt FROM ".$prefix."spieler LIMIT 0, 90";
$sql2 = mysql_abfrage($sql1);
$anzahlSpieler = mysql_num_rows($sql2);
$anzahlSpielerProCluster = $anzahlSpieler/$k;
$strengthMax = 0;
$maxStrengthMax = 0;
$motivationMax = 0;
$ageMax = 0;
$counter = 0; // for $anzahlSpielerProCluster so that all clusters get the same number of players
while ($sql3 = mysql_fetch_assoc($sql2)) {
$assignedCluster = floor($counter/$anzahlSpielerProCluster);
$cluster[$assignedCluster][] = array('strength'=>$sql3['staerke'], 'maxStrength'=>$sql3['talent'], 'motivation'=>$sql3['trainingseifer'], 'age'=>$sql3['wiealt'], 'id'=>$sql3['ids']);
if ($sql3['staerke'] > $strengthMax) { $strengthMax = $sql3['staerke']; }
if ($sql3['talent'] > $maxStrengthMax) { $maxStrengthMax = $sql3['talent']; }
if ($sql3['trainingseifer'] > $motivationMax) { $motivationMax = $sql3['trainingseifer']; }
if ($sql3['wiealt'] > $ageMax) { $ageMax = $sql3['wiealt']; }
$counter++;
}
// PUT PLAYERS IN RANDOM CLUSTERS END
$m = 1;
while ($m < 16) {
$clusterCentroids = calculateCentroids(); // calculate new centroids of the clusters
$playersWhoChangedClusters = assignPlayersToNearestCluster(); // assign each player to the nearest cluster
if ($playersWhoChangedClusters == 0) { $m = 1001; }
echo '<li>Iteration '.$m.': '.$playersWhoChangedClusters.' players have changed place</li>';
$m++;
}
print_r($cluster);
?>
It's so embarrassing :D I think the whole problem is caused by only one letter:
In assignPlayersToNearestCluster() you can find arsort($abstaende);. After that, the function each() takes the first value. But it's arsort so the first value must be the highest. So it picks the cluster which has the highest distance value.
So it should be asort, of course. :) To prove that, I've tested it with asort - and I get convergence after 7 iterations. :)
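A minimal sketch of the corrected selection step, assuming the same kind of array as `$abstaende` (centroid id => distance). The helper name `nearest_centroid_id` is made up for this example. It also avoids `each()`, which is deprecated as of PHP 7.2 and removed in PHP 8.0.

```php
<?php
// Hypothetical helper: returns the key (centroid id) with the smallest distance.
// $distances maps centroid id => distance, like $abstaende in the question.
function nearest_centroid_id(array $distances) {
    asort($distances);       // ascending, keys preserved: smallest distance first
    reset($distances);       // make sure the internal pointer is on the first element
    return key($distances);  // null if $distances is empty
}
```

With `arsort` in place of `asort`, the same code would return the id of the farthest centroid, which is what the original loop was doing.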
Do you think that was the mistake? If it was, then my problem is solved. In that case: Sorry for annoying you with that stupid question. ;)
EDIT: disregard, I still get the same result as you, everyone winds up in cluster 4. I shall reconsider my code and try again.
I think I've realised what the problem is. K-means clustering is designed to break up differences in a set; however, because of the way you calculate averages etc., we end up in a situation where there are no large gaps in the ranges.
Might I suggest a change: concentrate on a single value only (strength appears to make the most sense to me) to determine the clusters, or abandon this clustering method altogether and adopt something different (not what you want to hear, I know)?
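Clustering on a single value could look roughly like the sketch below. The function name `kmeans_1d`, the evenly spread initial centroids, and the iteration cap are all assumptions made for this example. It expects a non-empty list of numbers and `$k` no larger than the number of values.

```php
<?php
// Minimal one-dimensional k-means sketch (e.g. on strength only).
function kmeans_1d(array $values, int $k, int $maxIterations = 100): array {
    sort($values);
    $n = count($values);
    // Assumed initialisation: spread the centroids evenly over the sorted values.
    $centroids = [];
    for ($c = 0; $c < $k; $c++) {
        $centroids[$c] = $values[intdiv($c * ($n - 1), max($k - 1, 1))];
    }
    $assignment = [];
    for ($iter = 0; $iter < $maxIterations; $iter++) {
        $changed = 0;
        $sums = array_fill(0, $k, 0);
        $counts = array_fill(0, $k, 0);
        foreach ($values as $idx => $v) {
            // Find the nearest centroid (smallest absolute difference).
            $best = 0;
            for ($c = 1; $c < $k; $c++) {
                if (abs($v - $centroids[$c]) < abs($v - $centroids[$best])) {
                    $best = $c;
                }
            }
            if (!isset($assignment[$idx]) || $assignment[$idx] !== $best) {
                $changed++;
            }
            $assignment[$idx] = $best;
            $sums[$best] += $v;
            $counts[$best]++;
        }
        if ($changed === 0) {
            break; // converged: no value switched clusters
        }
        // Move each non-empty cluster's centroid to the mean of its members.
        for ($c = 0; $c < $k; $c++) {
            if ($counts[$c] > 0) {
                $centroids[$c] = $sums[$c] / $counts[$c];
            }
        }
    }
    // Group the values by their final cluster.
    $clusters = array_fill(0, $k, []);
    foreach ($values as $idx => $v) {
        $clusters[$assignment[$idx]][] = $v;
    }
    return $clusters;
}
```

For example, `kmeans_1d([1, 2, 3, 10, 11, 12], 2)` separates the low values from the high ones.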
I found a rather nice site with an example of k-means clustering using integers. I'm going to try to adapt that and will get back with the results some time tomorrow.
http://code.blip.pt/2009/04/06/k-means-clustering-in-php/ <-- link I mentioned and forgot about.
