VERY IMPORTANT EDIT: All Ai are unique.
The Question
I have a list A of n unique objects. Each object Ai has a variable percentage Pi.
I want to create an algorithm that generates a new list B of k objects (k < n/2, and in most cases k is significantly less than n/2, e.g. n = 231, k = 21). List B should have no duplicates and will be populated with objects originating from list A with the following restriction:
The probability that an object Ai appears in B is Pi.
What I Have Tried
(These snippets are in PHP simply for the purposes of testing)
I first made list A
$list = [
"A" => 2.5,
"B" => 2.5,
"C" => 2.5,
"D" => 2.5,
"E" => 2.5,
"F" => 2.5,
"G" => 2.5,
"H" => 2.5,
"I" => 5,
"J" => 5,
"K" => 2.5,
"L" => 2.5,
"M" => 2.5,
"N" => 2.5,
"O" => 2.5,
"P" => 2.5,
"Q" => 2.5,
"R" => 2.5,
"S" => 2.5,
"T" => 2.5,
"U" => 5,
"V" => 5,
"W" => 5,
"X" => 5,
"Y" => 5,
"Z" => 20
];
At first I tried the following two algorithms:
$result = [];
while (count($result) < 10) {
    $rnd = rand(0,10000000) / 100000;
    $sum = 0;
    foreach ($list as $key => $value) {
        $sum += $value;
        if ($rnd <= $sum) {
            if (in_array($key,$result)) {
                break;
            } else {
                $result[] = $key;
                break;
            }
        }
    }
}
AND
$result = [];
while (count($result) < 10) {
    $sum = 0;
    foreach ($list as $key => $value) {
        $sum += $value;
    }
    $rnd = rand(0,$sum * 100000) / 100000;
    $sum = 0;
    foreach ($list as $key => $value) {
        $sum += $value;
        if ($rnd <= $sum) {
            $result[] = $key;
            unset($list[$key]);
            break;
        }
    }
}
The only difference between the two algorithms is that one tries again when it encounters a duplicate, and the other removes the object from list A when it is picked. As it turns out, these two algorithms have the same probability outputs.
I ran the second algorithm 100,000 times and kept track of how many times each letter was picked. The following array contains the percentage chance that a letter is picked in any list B, based on the 100,000 tests.
[A] => 30.213
[B] => 29.865
[C] => 30.357
[D] => 30.198
[E] => 30.152
[F] => 30.472
[G] => 30.343
[H] => 30.011
[I] => 51.367
[J] => 51.683
[K] => 30.271
[L] => 30.197
[M] => 30.341
[N] => 30.15
[O] => 30.225
[P] => 30.135
[Q] => 30.406
[R] => 30.083
[S] => 30.251
[T] => 30.369
[U] => 51.671
[V] => 52.098
[W] => 51.772
[X] => 51.739
[Y] => 51.891
[Z] => 93.74
When looking back at the algorithm this makes sense. The algorithm incorrectly interpreted the original percentages to be the percentage chance that an object is picked for any given location, not any list B. So for example, in reality, the chance that Z is picked in a list B is 93%, but the chance that Z is picked for an index Bn is 20%. This is NOT what I want. I want the chance that Z is picked in a list B to be 20%.
Is this even possible? How can it be done?
EDIT 1
I tried simply making the sum of all Pi equal k. This worked when all Pi were equal, but once I varied their values the results drifted further and further from the targets.
Initial Probabilities
$list= [
"A" => 8.4615,
"B" => 68.4615,
"C" => 13.4615,
"D" => 63.4615,
"E" => 18.4615,
"F" => 58.4615,
"G" => 23.4615,
"H" => 53.4615,
"I" => 28.4615,
"J" => 48.4615,
"K" => 33.4615,
"L" => 43.4615,
"M" => 38.4615,
"N" => 38.4615,
"O" => 38.4615,
"P" => 38.4615,
"Q" => 38.4615,
"R" => 38.4615,
"S" => 38.4615,
"T" => 38.4615,
"U" => 38.4615,
"V" => 38.4615,
"W" => 38.4615,
"X" => 38.4615,
"Y" =>38.4615,
"Z" => 38.4615
];
Results after 10,000 runs
Array
(
[A] => 10.324
[B] => 59.298
[C] => 15.902
[D] => 56.299
[E] => 21.16
[F] => 53.621
[G] => 25.907
[H] => 50.163
[I] => 30.932
[J] => 47.114
[K] => 35.344
[L] => 43.175
[M] => 39.141
[N] => 39.127
[O] => 39.346
[P] => 39.364
[Q] => 39.501
[R] => 39.05
[S] => 39.555
[T] => 39.239
[U] => 39.283
[V] => 39.408
[W] => 39.317
[X] => 39.339
[Y] => 39.569
[Z] => 39.522
)
We must have sum_i P_i = k, or else we cannot succeed.
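One way to see why this condition is necessary (a short derivation; the indicator notation is introduced here for illustration): B always has exactly k elements, so by linearity of expectation the inclusion probabilities must add up to k.

```latex
k \;=\; \mathbb{E}\,|B|
  \;=\; \mathbb{E}\Big[\sum_{i=1}^{n} \mathbf{1}[A_i \in B]\Big]
  \;=\; \sum_{i=1}^{n} \Pr[A_i \in B]
  \;=\; \sum_{i=1}^{n} P_i
```

The question's first list sums to 100% (that is, 1) while k = 10, so those targets cannot be met. The list under EDIT 1 sums to roughly 1000%, which does match k = 10.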
As stated, the problem is somewhat easy, but you may not like this answer, on the grounds that it's "not random enough".
Sample a uniform random permutation Perm on the integers [0, n)
Sample X uniformly at random from [0, 1)
For i in Perm
    If X < P_i, then append A_i to B and update X := X + (1 - P_i)
    Else, update X := X - P_i
End
You'll want to approximate the calculations involving real numbers with fixed-point arithmetic, not floating-point.
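The steps above can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: the function name `systematic_sample` and the `rng` parameter are made up for this example, and exact rational arithmetic (`fractions.Fraction`) stands in for the fixed-point arithmetic suggested above. It assumes the probabilities are given as fractions in [0, 1] whose sum is exactly the integer k.

```python
import random
from fractions import Fraction


def systematic_sample(items, probs, rng=random):
    """Return a sample in which items[i] appears with probability probs[i].

    Assumes sum(probs) is exactly an integer k; the result then has k items.
    """
    probs = [Fraction(p) for p in probs]
    order = list(range(len(items)))
    rng.shuffle(order)  # uniform random permutation of the indices
    # X is drawn from [0, 1) on a grid of 2**53 points, kept as an exact fraction.
    x = Fraction(rng.randrange(2**53), 2**53)
    chosen = []
    for i in order:
        if x < probs[i]:
            chosen.append(items[i])
            x += 1 - probs[i]
        else:
            x -= probs[i]
    return chosen


if __name__ == "__main__":
    ps = [Fraction(2, 3), Fraction(1, 2), Fraction(1, 2), Fraction(1, 3)]
    print(systematic_sample(["A", "B", "C", "D"], ps))
```

Because X stays in [0, 1) after every update and the probabilities sum to k, exactly k items are chosen on every run, so no retries are needed.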
The missing condition is that the distribution have a technical property called "maximum entropy". Like amit, I cannot think of a good way to do this. Here's a clumsy way.
My first (and wrong) instinct for solving this problem was to include each A_i in B independently with probability P_i and retry until B is the right length (there won't be too many retries, for reasons that you can ask math.SE about). The problem is that the conditioning messes up the probabilities. If P_1 = 1/3 and P_2 = 2/3 and k = 1, then the outcomes are
{}: probability 2/9
{A_1}: probability 1/9
{A_2}: probability 4/9
{A_1, A_2}: probability 2/9,
and the conditional probabilities are actually 1/5 for A_1 and 4/5 for A_2.
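The small example above can be checked by brute force. The sketch below enumerates every subset, keeps those of size k, and computes each element's conditional inclusion probability exactly. The helper name `conditional_marginals` is invented for this illustration, and the enumeration is exponential in n, so it is only meant for tiny inputs.

```python
from fractions import Fraction
from itertools import product


def conditional_marginals(probs, k):
    """Inclusion probability of each element, given that exactly k were picked."""
    probs = [Fraction(p) for p in probs]
    total = Fraction(0)
    hit = [Fraction(0)] * len(probs)
    for picks in product([False, True], repeat=len(probs)):
        if sum(picks) != k:
            continue  # outcome would be rejected and retried
        weight = Fraction(1)
        for p, picked in zip(probs, picks):
            weight *= p if picked else 1 - p
        total += weight
        for i, picked in enumerate(picks):
            if picked:
                hit[i] += weight
    return [h / total for h in hit]


print(conditional_marginals([Fraction(1, 3), Fraction(2, 3)], k=1))
# [Fraction(1, 5), Fraction(4, 5)]
```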
Instead, we should substitute new probabilities Q_i that yield the proper conditional distribution. I don't know of a closed form for Q_i, so I propose to find them using a numerical optimization algorithm like gradient descent. Initialize Q_i = P_i (why not?). Using dynamic programming, it's possible to find, for the current setting of Q_i, the probability that A_i is one of the elements given an outcome with l elements. (We only care about the l = k entry, but we need the others to make the recurrences work.) With a little more work, we can get the whole gradient. Sorry this is so sketchy.
In Python 3, using a nonlinear solution method that seems to converge always (update each q_i simultaneously to its marginally correct value and normalize):
#!/usr/bin/env python3
import collections
import operator
import random
def constrained_sample(qs):
    k = round(sum(qs))
    while True:
        sample = [i for i, q in enumerate(qs) if random.random() < q]
        if len(sample) == k:
            return sample
def size_distribution(qs):
    size_dist = [1]
    for q in qs:
        size_dist.append(0)
        for j in range(len(size_dist) - 1, 0, -1):
            size_dist[j] += size_dist[j - 1] * q
            size_dist[j - 1] *= 1 - q
    assert abs(sum(size_dist) - 1) <= 1e-10
    return size_dist
def size_distribution_without(size_dist, q):
    size_dist = size_dist[:]
    if q >= 0.5:
        for j in range(len(size_dist) - 1, 0, -1):
            size_dist[j] /= q
            size_dist[j - 1] -= size_dist[j] * (1 - q)
        del size_dist[0]
    else:
        for j in range(1, len(size_dist)):
            size_dist[j - 1] /= 1 - q
            size_dist[j] -= size_dist[j - 1] * q
        del size_dist[-1]
    assert abs(sum(size_dist) - 1) <= 1e-10
    return size_dist
def test_size_distribution(qs):
    d = size_distribution(qs)
    for i, q in enumerate(qs):
        d1a = size_distribution_without(d, q)
        d1b = size_distribution(qs[:i] + qs[i + 1 :])
        assert len(d1a) == len(d1b)
        assert max(map(abs, map(operator.sub, d1a, d1b))) <= 1e-10
def normalized(qs, k):
    sum_qs = sum(qs)
    qs = [q * k / sum_qs for q in qs]
    assert abs(sum(qs) / k - 1) <= 1e-10
    return qs
def approximate_qs(ps, reps=100):
    k = round(sum(ps))
    qs = ps[:]
    for j in range(reps):
        size_dist = size_distribution(qs)
        for i, p in enumerate(ps):
            d = size_distribution_without(size_dist, qs[i])
            d.append(0)
            qs[i] = p * d[k] / ((1 - p) * d[k - 1] + p * d[k])
        qs = normalized(qs, k)
    return qs
def test(ps, reps=100000):
    print(ps)
    qs = approximate_qs(ps)
    print(qs)
    counter = collections.Counter()
    for j in range(reps):
        counter.update(constrained_sample(qs))
    test_size_distribution(qs)
    print("p", "Actual", sep="\t")
    for i, p in enumerate(ps):
        print(p, counter[i] / reps, sep="\t")
if __name__ == "__main__":
    test([2 / 3, 1 / 2, 1 / 2, 1 / 3])
Let's analyze it for a second.
With replacements: (not what you want, but simpler to analyze).
Given a list L of size k and an element a_i, the probability for a_i to be in the list is denoted by your value p_i.
Let's examine the probability of a_i to be at some index j in the list. Let's denote that probability as q_i,j. Note that for any index t in the list, q_i,j = q_i,t - so we can simply say q_i_1=q_i_2=...=q_i_k=q_i.
The probability that a_i will be anywhere in the list is denoted as:
1-(1-q_i)^k
But it is also p_i - so we need to solve the equation
1-(1-q_i)^k = p_i
1 - (1-q_i)^k - p_i = 0
One way to do it is the Newton-Raphson method.
After calculating the probability for each element, check whether it is indeed a probability space (sums to 1, all probabilities are in [0,1]). If it's not, it cannot be done for the given probabilities and k.
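As a side note to the equation above: it can also be rearranged directly, giving q_i = 1 - (1 - p_i)^(1/k), so an iterative solver is not strictly required. A minimal sketch follows; the function name `per_slot_probability` is made up for this illustration.

```python
def per_slot_probability(p, k):
    """Solve 1 - (1 - q)**k = p for q (sampling with replacement, k slots)."""
    return 1 - (1 - p) ** (1 / k)


if __name__ == "__main__":
    q = per_slot_probability(0.2, 10)
    # Plugging q back in should reproduce p up to floating-point error.
    print(q, 1 - (1 - q) ** 10)
```

The feasibility check described above still applies: the resulting q_i must all lie in [0, 1] and sum to 1.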
Without replacement: This is trickier, since now q_i,j != q_i,t (the selections are not i.i.d.). The probabilities here are much harder to calculate, and I am not sure at the moment how to do it. I suppose it would have to be done at run time, while the list is being built.
(Deleted a solution that I am almost certain is biased).
Unless my math skills are a lot weaker than I think, the average chance of an element from list A in your example being found in list B should be 10/26 = 0.38.
If you lower this chance for any object, there must be others with higher chances.
Also, your probabilities from list A cannot work: they are too low, so you could not fill your list / you don't have enough elements to pick from.
Assuming the above is correct (or correct enough), that would mean that in your list A your average weight would have to be the average chance of a random pick. That, in turn, means your probabilities in list A don't sum up to 100.
Unless I am completely wrong, that is...
Related
I'm writing a GRIB2 decoder in PHP, and started from a half-written library that I found. Everything works fine except that the values I get from the data are incorrect after converting the integer values to real values. I think I am converting everything correctly, and when I test with cloud data it looks correct when I check it in Panoply. I think the problem is with this formula, which is all over the internet. Below I'm using 10 m above ground GFS data from https://nomads.ncep.noaa.gov
Y*10^D = R+(X1+X2)*2^E
I'm not sure I'm plugging in the values correctly, but again, it works with cloud cover percentages.
These are the "Data Representation Values" I get from GRIB Section 5:
'Reference value (R)' => 886.25067138671875,
'Binary Scale Factor (E)' => 0,
'Decimal Scale Factor (D)' => 2,
'Number of bits used for each packed value' => 11,
'exp' => pow(2, $E), //(Equals 1) (The Library used these as the 2^E)
'base' => pow(10, $D), //(Equals 100) (And the 10^D)
'template' => 0,
As you can see below, the numbers definitely have a connection to the reference value. The number closest to 886 (R) is 892, and its actual value should be 0.05, as shown below (EX.). The numbers higher than 892 are positive and the ones lower than 892 are negative. But when I use the formula (886 + 892 * 1) / 100 it gives me 17.78, not 0.05. I seem to be missing something pretty obvious. Am I misunderstanding the formula, where Y is the value I want?
X1 = 0 (documentation says)
X2 = 892 (documentation says is scaled value, the value in the Grib from bits?)
2^0 = 1
10^2 = 100
R = 886.25067138671875
Y * 10^D = R + (X1 + X2) * 2^E
Y * 100 = R + (X1 + X2) * 1
(886 + (0 + 892) * 1) / 100
(886 + 892 * 1) / 100
= 17.78
Int Values of wind from Grib (After converting from Bits)
0 => 695,
1 => 639,
2 => 631,
3 => 0,
4 => 436,
5 => 513,
6 => 690,
7 => 570,
8 => 625,
9 => 805,
10 => 892,<-----------(EX.)
11 => 1044,
12 => 952,
13 => 1081,
14 => 1414,
15 => 997,
16 => 1106,
17 => 974,
18 => 1135,
19 => 1069,
20 => 912,
Actual decoded wind values shown in Panoply (Well known Grib App)
-1.9125067
-2.4725068
-2.5525067
-8.862507
-4.5025067
-3.7325068
-1.9625068
-3.1625068
-2.6125066
-0.81250674
0.057493284 <-----------(EX.)
1.5774933
0.6574933
1.9474933
5.2774935
1.1074933
2.1974933
0.87749326
2.4874933
1.8274933
0.2574933
y = 0.01 * (x - 886.25067138671875) seems to work for all points
so 0.01 * (892 - 886.25067138671875) = 0.0574
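The fitted line above is algebraically the same as the question's formula with X1 = 0, E = 0, D = 2 and a negative reference value, R = -886.25067138671875. That suggests the sign of R was lost when the reference value was read, but this is an inference from the numbers in the question, not something the thread confirms. A small sketch that checks the idea against a few of the Panoply values (the function name `decode_simple_packing` is made up for this example):

```python
def decode_simple_packing(x, ref, binary_scale, decimal_scale):
    """Solve Y * 10^D = R + X * 2^E for Y."""
    return (ref + x * 2 ** binary_scale) / 10 ** decimal_scale


# Assumption: R is negative, unlike the positive value quoted in the question.
R = -886.25067138671875
packed = [695, 0, 892, 1414]  # a few packed integers from the question
decoded = [decode_simple_packing(x, R, 0, 2) for x in packed]

if __name__ == "__main__":
    for x, y in zip(packed, decoded):
        print(x, round(y, 7))
```

If the decoder reads R as an IEEE 754 single-precision float, the sign bit is the first place worth checking.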
I am working on a step sequencer program for drum sounds. It takes a 16-bit binary pattern, for example '1010010100101001', and breaks it into chunks like so: 10, 100, 10, 100, 10, 100, 1. It then assigns each chunk a time value based on how many digits it has. The reason is that some drum samples ring out longer than the length of one beat, and the chunking handles this. For example, at 60 bpm 1 digit = 1 second, so '10' = 2 seconds, '100' = 3 seconds, and '1' = 1 second. This lets me trim the sounds to the proper length in the pattern and concatenate them into a final WAV using ffmpeg. Also, 1 = drum hit and 0 = silent hit. This method works great for my needs.
Now I can make perfect beat loops, and I want to add a velocity pattern layer on top of this to allow ghost notes and add human feel and dynamics to my drum patterns. I have decided to use a 0,1,2,3,4 value system for the velocity patterns: '0' = 0% volume, '1' = 25% volume, '2' = 50% volume, '3' = 75% volume, and '4' = 100% volume. (0% volume exists so I can add hard stops for an open hi-hat or cymbal crash, which a 0 in the binary pattern wouldn't do.) So along with the '1111111111111111' pattern you would see a velocity pattern layer, say '4242424242424242'. (That velocity pattern alternates 100% and 50% hits and sounds good with hi-hats, like a real drummer.)
Using PHP I am breaking 16-bit binary patterns into an array of chunks. '1001110011110010' would be
['100','1','1','100','1','1','1','100','10']
Now, via a loop, I need to map another 16-digit layer pattern of 0,1,2,3,4 digits to the first digit of each chunk.
Example 1:
Velocity Pattern: '4242424242424242'
Binary Pattern: '1001110011110010'
Array = ['100','1','1','100','1','1','1','100','10']
'100' = 4 (1st digit in 4242424242424242 pattern)
'1' = 2 (4th digit in 4242424242424242 pattern)
'1' = 4 (5th digit in 4242424242424242 pattern)
'100' = 2 (6th digit in the 4242424242424242 pattern)
'1' = 4 (9th digit in the 4242424242424242 pattern)
'1' = 2 (10th digit in the 4242424242424242 pattern)
'1' = 4 (11th digit in the 4242424242424242 pattern)
'100' = 2 (12th digit in the 4242424242424242 pattern)
'10' = 4 (15th digit in the 4242424242424242 pattern)
Example 2:
Velocity Pattern: '4242424242424242'
Binary Pattern: '1111111111111111'
Array = ['1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1']
'1' = 4 (n1 digit in 4242424242424242 pattern)
'1' = 2 (n2 digit in 4242424242424242 pattern)
'1' = 4 (n3 digit in 4242424242424242 pattern)
'1' = 2 (n4 digit in 4242424242424242 pattern)
'1' = 4 (n5 digit in 4242424242424242 pattern)
'1' = 2 (n6 digit in 4242424242424242 pattern)
'1' = 4 (n7 digit in 4242424242424242 pattern)
'1' = 2 (n8 digit in 4242424242424242 pattern)
'1' = 4 (n9 digit in 4242424242424242 pattern)
'1' = 2 (n10 digit in 4242424242424242 pattern)
'1' = 4 (n11 digit in 4242424242424242 pattern)
'1' = 2 (n12 digit in 4242424242424242 pattern)
'1' = 4 (n13 digit in 4242424242424242 pattern)
'1' = 2 (n14 digit in 4242424242424242 pattern)
'1' = 4 (n15 digit in 4242424242424242 pattern)
'1' = 2 (n16 digit in 4242424242424242 pattern)
Example 3:
Velocity Pattern: '4231423142314231'
Binary Pattern: '0001000100010001'
Array = ['0','0','0','1000','1000','1000','1']
'0' = 4 (1st digit in 4231423142314231 pattern)
'0' = 2 (2nd digit in 4231423142314231 pattern)
'0' = 3 (3rd digit in 4231423142314231 pattern)
'1000' = 1 (4th digit in 4231423142314231 pattern)
'1000' = 1 (8th digit in 4231423142314231 pattern)
'1000' = 1 (12th digit in 4231423142314231 pattern)
'1' = 1 (16th digit in 4231423142314231 pattern)
The patterns will vary, so I need a method that works even if the pattern starts with 0, etc.
A pattern of 1111111111111111 would be easy, since each 1 is already split into a group by itself.
I tried using a counter called "$v_count" to find the position in the pattern, but it's not working as expected.
$v_count = 0;
$beat_pattern = '1001110011110010';
$velocity_pattern = '4242424242424242';
preg_match_all('/10*|0/', $beat_pattern, $m);
$c_count = count($m, COUNT_RECURSIVE) - 1;
for ($z = 0; $z < $c_count; $z++) {
$z2 = $z;
${"c" . $z} = $m[0][$z];
${"cl" . $z} = strlen($m[0][$z]);
if (${"cl" . $z} == 1 & $m[0][$z] == "0") {
$v_count = $v_count + 1;
echo 'the position of this chunk is: '.$v_count.' in the velocity_pattern<br>';
};
if (${"cl" . $z} == 1 & $m[0][$z] == "1") {
$v_count = $v_count + 1;
echo 'the position of this chunk is: '.$v_count.' in the velocity_pattern<br>';
};
if (${"cl" . $z} > 1) {
if ($z == 1)
{
$v_count = 1;
}
if ($z > 1)
{
$v_count = $v_count + 1;
}
echo ' - the velocity position of this chunk is: '.$v_count.' in the pattern<br>';
$v_count = $v_count + ${"cl" . $z} + 1;
};
}
From the example you've given, it seems that you need the corresponding value from the velocity array and the duration between the 1's in the beat array.
This code first extracts the 1's by splitting it into an array and then filtering out the 0's. So
$beat_pattern = '1001110011110010';
$velocity_pattern = '4242424242424242';
$beat = array_filter(str_split($beat_pattern));
would give in $beat...
Array
(
[0] => 1
[3] => 1
[4] => 1
[5] => 1
[8] => 1
[9] => 1
[10] => 1
[11] => 1
[14] => 1
)
It then takes each entry in turn, works out the length by subtracting its key from the next key, and uses the key as the index to get the corresponding velocity.
To account for patterns starting with 0, you can loop up to the first instance of 1 and output the velocity pattern for the same element...
$beat_pattern = '1001110011110010';
$velocity_pattern = '4242424242424242';
$beat = array_filter(str_split($beat_pattern));
$beatKeys = array_keys($beat);
// For the leading 0's
for( $i = 0; $i < $beatKeys[0]; $i++ ) {
    echo "1-". $velocity_pattern[$i] . PHP_EOL;
}
for ( $i = 0; $i < count($beatKeys); $i++ ) {
    // length = distance to the next 1 (or to the end of the pattern)
    echo (($beatKeys[$i+1] ?? strlen($beat_pattern)) - $beatKeys[$i]) . "-".
        $velocity_pattern[$beatKeys[$i]] . PHP_EOL;
}
gives (length-velocity)...
3-4
1-2
1-4
3-2
1-4
1-2
1-4
3-2
2-4
Assuming your two input strings:
$binary = '0001000110101001';
$velocity = '4231423142314231';
If you analyse the pattern with a regex, you can obtain all the component parts in one operation, including pauses at the start of the pattern (which are essentially 0% volume beats).
$index = 0;
preg_match_all('/^0+|10*/', $binary, $parts);
foreach ($parts[0] as $part) {
    $duration = strlen($part); // How many beats
    $volume = $part[0] ? $velocity[$index] : 0; // The corresponding volume number
    $index += $duration;
}
To develop this further, it seems to me that it would be practical to produce a proper array of data for the pattern, and you could package up this functionality if you so wanted:
function drumPattern($binary, $velocity) {
    $output = [];
    $index = 0;
    preg_match_all('/^0+|10*/', $binary, $parts);
    foreach ($parts[0] as $part) {
        $duration = strlen($part);
        $output[] = [
            'duration' => $duration,
            'volume' => $part[0] ? $velocity[$index] : 0
        ];
        $index += $duration;
    }
    return $output;
}
Example
drumPattern($binary, $velocity);
Produces the following output
Array
(
[0] => Array
(
[duration] => 3
[volume] => 0
)
[1] => Array
(
[duration] => 4
[volume] => 1
)
[2] => Array
(
[duration] => 1
[volume] => 1
)
[3] => Array
(
[duration] => 2
[volume] => 4
)
[4] => Array
(
[duration] => 2
[volume] => 3
)
[5] => Array
(
[duration] => 3
[volume] => 4
)
[6] => Array
(
[duration] => 1
[volume] => 1
)
)
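Both answers come down to tracking the offset at which each chunk starts and reading the velocity digit at that offset. For comparison, here is a sketch of the same idea in Python rather than PHP, using the question's own '10*|0' pattern so that each leading 0 stays a separate chunk as in Example 3. The function name `chunk_velocities` is made up for this illustration. In PHP the offsets could be obtained by passing the PREG_OFFSET_CAPTURE flag to preg_match_all.

```python
import re


def chunk_velocities(beat_pattern, velocity_pattern):
    """Return (chunk, velocity_digit) pairs, one per chunk of the beat pattern."""
    pairs = []
    for match in re.finditer(r"10*|0", beat_pattern):
        # match.start() is the 0-based position of the chunk's first digit.
        pairs.append((match.group(), velocity_pattern[match.start()]))
    return pairs


if __name__ == "__main__":
    for chunk, velocity in chunk_velocities("1001110011110010", "4242424242424242"):
        print(chunk, velocity)
```

The chunk length (`len(chunk)`) gives the duration, so no separate counter is needed.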
I'm learning PHP, and while messing around with arrays to see how they work, I stumbled into this when I made two arrays.
$TestArray1 = array( 1 => 1, "string" => "string", 24, "other", 2 => 6, 8);
$TestArray2 = array( 6 => 1, "string" => "string", 24, "other", 1 => 6, 8);
But when I print them out with print_r() this is what I get (this also happens with var_dump by the way)
Array ( [1] => 1 [string] => string [2] => 6 [3] => other [4] => 8 )
Array ( [6] => 1 [string] => string [7] => 24 [8] => other [1] => 6 [9] => 8 )
As far as I can tell, putting the 2 in the first array overwrites the next available spot that has no key and then keeps going, shortening the array. So I thought that meant that if I use a 1 it would put it at the start, but that does not happen either.
Is this normal or is there something wrong with my php installation?
I'm using AMPPS on Windows 10 with PHP 7.3.
Thanks in advance
Good question.
What's happening is that when determining automatic numeric indexes, PHP will look to the largest numeric index added and increment it (or use 0 if there are none).
The key is optional. If it is not specified, PHP will use the increment of the largest previously used integer key.
What's happening with your first array is that as it is evaluated left-to-right, 24 is inserted at index 2 because the last numeric index was 1 => 1.
Then when it gets to 2 => 6, it overwrites the previous value at index 2. This is why 24 is missing from your first array.
If multiple elements in the array declaration use the same key, only the last one will be used as all others are overwritten.
Here's a breakdown
$TestArray1 = [1 => 1]; // Array( [1] => 1 )
// no index, so use last numeric + 1
$TestArray1[] = 24; // Array( [1] => 1, [2] => 24 )
$TestArray1[2] = 6; // Array( [1] => 1, [2] => 6 )
When you manually add numeric indexes that are lower than previous ones (i.e. $TestArray2), they will be added as provided, but they keep their insertion order, so they appear later in the output.
This is because PHP arrays are really maps that just pretend to be indexed arrays sometimes, depending on what's in them.
References are from the PHP manual page for Arrays
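To make the rule concrete, here is a small model of it written in Python rather than PHP. The helper `php_array` is hypothetical and only imitates the behaviour described above for non-negative integer and string keys: a missing key becomes the largest integer key used so far plus one, and assigning to an existing key overwrites the value in place. Python dictionaries keep insertion order, much like PHP arrays, so the printed order matches the print_r output in the question.

```python
def php_array(*entries):
    """Imitate a PHP array literal; each entry is (key, value), key None = no key."""
    result = {}
    next_key = 0  # PHP starts at 0 when no integer key has been used yet
    for key, value in entries:
        if key is None:
            key = next_key
        result[key] = value  # overwriting keeps the original position
        if isinstance(key, int) and key >= next_key:
            next_key = key + 1
    return result


if __name__ == "__main__":
    print(php_array((1, 1), ("string", "string"), (None, 24), (None, "other"), (2, 6), (None, 8)))
    print(php_array((6, 1), ("string", "string"), (None, 24), (None, "other"), (1, 6), (None, 8)))
```

In the first call 24 lands on key 2 and is then overwritten by 6, which is why it never shows up in the output.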
I have a column pack_size in a table called product_master_test. The problem that I am facing is that the pack_size is in mixed formats, there is no uniformity to it.
For example:
4 x 2kg (pack size should be 4)
48-43GM (pack size should be 48)
12 x 1BTL (pack size should be 12)
1 x 24EA (pack size should be 24)
I've been thinking about different approaches, but I can't think of anything that would work without having a lot of IF statements in the query/PHP code. Is there a solution that I am missing?
I do have the file in Excel, if there is an easier way to process it using PHP.
I am not including any code, as I'm not entirely sure where to start with this problem.
Using a regex to split the pack size could at least give you the various components which you can then (possibly) infer more from...
$packs = ["4 x 2kg","48-43GM","12 x 1BTL","1 x 24EA", "12 X 1 EA"];
foreach ( $packs as $size ) {
    if ( preg_match("/(\d*)(?:\s+)?[xX-](?:\s+)?(\d+)(?:\s+)?(\w*)/", $size, $match) == 1 ) {
        print_r($match);
    }
    else {
        echo "cannot determine - ".$size.PHP_EOL;
    }
}
(The regex can probably be optimised; it's not my area of expertise.) It basically splits the string into a number, optional whitespace around either an x or a -, and then another number followed by the units (some text). The above with the test cases gives...
Array
(
[0] => 4 x 2kg
[1] => 4
[2] => 2
[3] => kg
)
Array
(
[0] => 48-43GM
[1] => 48
[2] => 43
[3] => GM
)
Array
(
[0] => 12 x 1BTL
[1] => 12
[2] => 1
[3] => BTL
)
Array
(
[0] => 1 x 24EA
[1] => 1
[2] => 24
[3] => EA
)
Array
(
[0] => 12 X 1 EA
[1] => 12
[2] => 1
[3] => EA
)
With the else part it should also give you the ones it cannot determine and perhaps allow you to change it accordingly.
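For comparison, the same split can be sketched in Python, with each `(?:\s+)?` written as the equivalent `\s*`. The names `PACK_RE` and `split_pack_size` are invented for this example. It only separates the components. Which number counts as the pack size is left to the caller, since the question's examples (4 from "4 x 2kg", 24 from "1 x 24EA") do not pin down a single rule.

```python
import re

# number, optional spaces, an x / X / -, optional spaces, number, optional spaces, unit
PACK_RE = re.compile(r"(\d*)\s*[xX-]\s*(\d+)\s*(\w*)")


def split_pack_size(text):
    """Return (first_number, second_number, unit) as strings, or None if no match."""
    match = PACK_RE.search(text)
    return match.groups() if match else None


if __name__ == "__main__":
    for size in ["4 x 2kg", "48-43GM", "12 x 1BTL", "1 x 24EA", "12 X 1 EA"]:
        print(size, "->", split_pack_size(size))
```

Rows that come back as None can be collected and reviewed by hand.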
You could build an associative array with all the strings from the table as keys, each mapped to the correct pack_size you want.
$packsize = ["4 x 2kg" => 4, "48-43GM" => 48, "12 x 1BTL" => 12, "1 x 24EA" => 24]; //add all pack_sizes here
echo $packsize["4 x 2kg"]; // Output: 4
Now you could get the actual pack size via the key of the associative array. It could save some of the time you would otherwise spend writing if/else conditions or a switch on the input. I'm not sure if there is something wrong with this approach, so correct me if so.
array
1703 => float 15916.19738
5129 => float 11799.15419
33 => float 11173.49945
1914 => float 8439.45987
2291 => float 6284.22271
5134 => float 5963.14065
5509 => float 5169.85755
4355 => float 5153.80867
2078 => float 3932.79341
31 => float 3924.09928
5433 => float 2718.7711
3172 => float 2146.1932
1896 => float 2141.36021
759 => float 1453.5501
2045 => float 1320.74681
5873 => float 1222.7448
2044 => float 1194.4903
6479 => float 1074.1714
5299 => float 950.872
3315 => float 878.06602
6193 => float 847.3372
1874 => float 813.816
1482 => float 330.6422
6395 => float 312.1545
6265 => float 165.9224
6311 => float 122.8785
6288 => float 26.5426
I would like to distribute this array into two arrays both ending up with a grand total (from the float values) to be about the same. I tried K-Clustering but that distributes higher values onto one array and lower values onto the other array. I'm pretty much trying to create a baseball team with even player skills.
Step 1: Split the players into two teams. It doesn't really matter how you do this, but you could do every other one.
Step 2: Randomly switch two players only if it makes the teams more even.
Step 3: Repeat step 2 until it converges to equality.
$diff = array_sum($teams[0]) - array_sum($teams[1]);
for ($i = 0; $i < 1000 && $diff != 0; ++$i)
{
    $r1 = rand(0, 8); // assumes nine players on each team
    $r2 = rand(0, 8);
    $new_diff = $diff - ($teams[0][$r1] - $teams[1][$r2]) * 2;
    if (abs($new_diff) < abs($diff))
    {
        // if the switch makes the teams more equal, then swap
        $tmp = $teams[0][$r1];
        $teams[0][$r1] = $teams[1][$r2];
        $teams[1][$r2] = $tmp;
        var_dump(abs($new_diff));
        $diff = $new_diff;
    }
}
You'll have to adapt that code to your own structures, but it should be simple.
Here's a sample output:
int(20)
int(4)
int(0)
I was using integers from 0 to 100 to rate each player. Notice how it gradually converges to equality, although an end result of 0 is not guaranteed.
You can stop the process after a fixed interval or until it reaches some threshold.
There are more scientific methods you could use, but this works well.
This is extremely simplistic, but have you considered just doing it like a draft? With the array sorted as in your example, Team A gets array[0], Team B gets array[1] and array[2], the next two picks go to Team A, and so on.
For the example you give, I got one team with ~50,000 and the other with ~45,000.
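The draft idea can be sketched as follows, in Python rather than PHP. The function name `snake_draft` and the toy scores are made up for this illustration, and the input is assumed to be a mapping from player id to score, like the array in the question.

```python
def snake_draft(scores):
    """Split {player_id: score} into two teams with picks in A, B, B, A, A, B, B, ... order."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    teams = ({}, {})
    for pick, (player, score) in enumerate(ranked):
        # Picks 0, 3, 4, 7, 8, ... go to team 0; picks 1, 2, 5, 6, ... go to team 1.
        team = 0 if pick % 4 in (0, 3) else 1
        teams[team][player] = score
    return teams


if __name__ == "__main__":
    team_a, team_b = snake_draft({"a": 10, "b": 9, "c": 8, "d": 7, "e": 6, "f": 5})
    print(sum(team_a.values()), sum(team_b.values()))  # 23 22
```

The result could also serve as the starting split for the swap-based approach above if the totals need to be closer.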