Accessing unique value pairs from an array without repeating myself


I am trying to access unique value pairs from an array in a random order - without repeating myself until I have to.
For example, if I have an array set A,B,C,D (generally an even number of items, but up to 20) then the first time through I might pair A-B & C-D. But I want to guarantee that the next time I do it, I avoid repeating my pairing and that I get both A-C & B-D and A-D and B-C before I then get A-B and C-D again. Each item should only be called once in each round.
I started off by shuffling the order of the array randomly then pairing two values together - but I need a way to prevent some pairings from occurring more frequently than others (ideally I'd want them to increment equally all the way through).
So I've moved to looking at permutations - and have managed to get a full array containing all the possible pairings using the code below:
$this->items = array('A','B','C','D');
$input = $this->items;
$input_copy = $input;
$output = array();
$i = 0;
foreach($input as $val) {
    $j = 0;
    foreach($input_copy as $cval) {
        if($j == $i) break;
        print $val.'-'.$cval.'<br/>';
        //$output[] = array($val => $cval);
        $j++;
    }
    $i++;
}
//print_r($output);
e.g. for A, B, C, D I get:
B-A
C-A
C-B
D-A
D-B
D-C
I want to cycle through the set n-1 times and capture the results in another array, but I'm not sure how to generate the actual order from these unique options.
In other words, I want to turn the list above into the below:
1st run =>
1=> A-B,
2=> C-D,
2nd run =>
1=> A-C,
2=> B-D,
3rd run =>
1=> A-D,
2=> C-B,
It may be that I can do this more simply from $this->items. I've also had a look at the Math_Combinatorics PEAR package, but I wasn't sure where to start.
I'd be grateful for any help!

You can use the round-robin tournament algorithm:
Place the elements in two rows.
Fix one element, in this case A.
For the next round, shift all the other elements in a circular manner.
Pair them column by column.
Repeat N-1 times.
A B
D C
-----
A D
C B
----
A C
B D
----
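The steps above can be sketched in PHP as follows. This is a minimal sketch, assuming an even number of items and PHP 7.4 or later; the function name `roundRobinRounds` is made up for illustration. It keeps the first element fixed, rotates the rest, and pairs position i with its mirror position, which reproduces the three layouts shown above for A, B, C, D.

```php
<?php
// Round-robin ("circle method") schedule: across n-1 rounds every item
// is paired with every other item exactly once.
// Assumption: the number of items is even.
function roundRobinRounds(array $items): array
{
    $items = array_values($items);
    $n = count($items);
    if ($n < 2 || $n % 2 !== 0) {
        throw new InvalidArgumentException('Expected an even number of items (at least 2).');
    }
    $fixed = $items[0];                  // this element never moves
    $rotating = array_slice($items, 1);  // the other n-1 elements rotate
    $rounds = [];
    for ($r = 0; $r < $n - 1; $r++) {
        $row = array_merge([$fixed], $rotating);
        $pairs = [];
        for ($i = 0; $i < $n / 2; $i++) {
            // pair position i with its mirror position n-1-i
            $pairs[] = [$row[$i], $row[$n - 1 - $i]];
        }
        $rounds[] = $pairs;
        // circular shift: move the last rotating element to the front
        array_unshift($rotating, array_pop($rotating));
    }
    return $rounds;
}

$rounds = roundRobinRounds(['A', 'B', 'C', 'D']);
shuffle($rounds); // optional: randomise the order in which the rounds are used
foreach ($rounds as $k => $pairs) {
    echo 'Round ', $k + 1, ': ';
    echo implode(', ', array_map(fn($p) => $p[0] . '-' . $p[1], $pairs)), PHP_EOL;
}
```

Shuffling the input array once before calling the function, and shuffling the rounds afterwards, gives a random-looking order while still using every pair once before any pair repeats.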

I assume that you want to generate each pairing exactly once, i.e. each partition of your whole sequence into pairs. If you only want each pair exactly once, that's a different problem handled in a different answer.
Think about this problem recursively: At the beginning you have n elements. From these, take the first and choose a partner for it from the remaining n-1 elements. Take this pair out of the list and recurse with the remaining n-2 elements. If you make each choice unbiased, the remaining pairing will be unbiased as well. But that doesn't guarantee you won't repeat yourself earlier than necessary.
If you really want to be sure you avoid repeating pairings, you should first think about how many possible pairings there are. For now I'll assume that n is even, so you only have complete pairs. It's easy to adjust this to odd n with one unpaired element. To obtain the total number of possible pairings, you have to multiply your choices:
m=(n-1)*(n-3)*(n-5)*...*7*5*3*1
So it's a product of odd numbers. That's A001147, also written as a double factorial m=(n-1)!!. Note that these numbers grow fairly quickly, so even for moderate n (like n=16) you might not have to worry about repeating yourself simply because there are so many possible pairings to choose from that a repetition is fairly unlikely.
If you really want to be sure that you avoid repetitions, you could of course simply generate the whole list and shuffle it. But as I just indicated, that list could become huge as well. So instead I'd suggest you divide this problem into two steps. Find a way to generate all numbers from 0 to m-1 each exactly once, and find a way to turn such numbers into pairings. For the latter, you can simply decompose your number step by step. At each step, take index % (n-1) to make the current choice, and choose (int)(index / (n-1)) as the index for subsequent choices in the recursive calls.
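The decomposition of a number into a pairing could look roughly like this in PHP. It is a sketch, assuming an even number of items; `pairingCount` and `pairingFromIndex` are illustrative names, not part of any library. At each step the first remaining item picks one of the remaining partners using the remainder, and the quotient is carried on to the next step.

```php
<?php
// Number of ways to split n items into pairs: (n-1)!! = (n-1)*(n-3)*...*3*1
function pairingCount(int $n): int
{
    $m = 1;
    for ($k = $n - 1; $k > 1; $k -= 2) {
        $m *= $k;
    }
    return $m;
}

// Turn an index in [0, pairingCount(n)) into one pairing of $items.
// Assumption: count($items) is even.
function pairingFromIndex(array $items, int $index): array
{
    $items = array_values($items);
    $pairs = [];
    while (count($items) >= 2) {
        $first = array_shift($items);       // always take the first remaining item
        $choices = count($items);           // it has this many possible partners
        $pick = $index % $choices;          // current choice
        $index = intdiv($index, $choices);  // index for the subsequent choices
        $partner = array_splice($items, $pick, 1)[0];
        $pairs[] = [$first, $partner];
    }
    return $pairs;
}

for ($i = 0; $i < pairingCount(4); $i++) {
    $pairs = pairingFromIndex(['A', 'B', 'C', 'D'], $i);
    echo $i, ': ', implode(', ', array_map(fn($p) => $p[0] . '-' . $p[1], $pairs)), PHP_EOL;
}
```

Note that this enumerates whole pairings (partitions into pairs); different indices can still share an individual pair when n is larger than 4.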
For the former, the easiest thing I can think of would be using a PRNG with a prime number p>m as its period. Using modular arithmetic, that should be easy to do. Then simply discard all values which are greater than or equal to m. Discarding means that you skip to the next element in the sequence. This will give all pairings in an order which should seem fairly random. If the starting point in that sequence should be random, be sure that if you at first choose a value which is to be discarded, then you initialize again instead of skipping to the next element. Otherwise some elements would be more likely as starting points than others.
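One very simple way to get a sequence with prime period p is to add a fixed step modulo p; this is only a stand-in for a real PRNG, chosen here as an assumption because the answer does not name a specific generator, and the resulting order is not statistically strong. The sketch below (illustrative names, PHP 7 or later) finds the smallest prime above m, re-draws the starting point until it is valid, and skips values that are not below m.

```php
<?php
function isPrime(int $x): bool
{
    if ($x < 2) {
        return false;
    }
    for ($d = 2; $d * $d <= $x; $d++) {
        if ($x % $d === 0) {
            return false;
        }
    }
    return true;
}

function nextPrimeAbove(int $m): int
{
    $p = $m + 1;
    while (!isPrime($p)) {
        $p++;
    }
    return $p;
}

// Yields every value in [0, m) exactly once, in an order that depends
// on a random start and a random step.
function eachIndexOnce(int $m): Generator
{
    $p = nextPrimeAbove($m);
    $step = random_int(1, $p - 1);   // any non-zero step has full period p
    do {
        // initialise again (do not skip ahead) when the start must be discarded
        $start = random_int(0, $p - 1);
    } while ($start >= $m);
    $x = $start;
    do {
        if ($x < $m) {
            yield $x;                // values >= m are discarded
        }
        $x = ($x + $step) % $p;
    } while ($x !== $start);
}

foreach (eachIndexOnce(3) as $index) {
    echo $index, PHP_EOL;
}
```

Each yielded index can then be decoded into a pairing, so the full list of pairings never has to be held in memory.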

Related

Subset Sum floats Elimations

I will be happy to get some help. I have the following problem:
I'm given a list of numbers and a target number.
subset_sum([11.96,1,15.04,7.8,20,10,11.13,9,11,1.07,8.04,9], 20)
I need to find an algorithm that will find all combinations of numbers that sum to the target number, e.g. 20.
First find all integers equal to 20.
Next, for example, the best combinations here are:
11.96 + 8.04
1 + 10 + 9
11.13 + 7.8 + 1.07
9 + 11
Remaining value 15.04.
I need an algorithm that uses each value only once and can use from 1 to n values to reach the target number.
I tried some recursion in PHP but it runs out of memory really fast (50k values), so a solution in Python would help (time/memory wise).
I'd be glad for some guidance here.
One possible solution is this: Finding all possible combinations of numbers to reach a given sum
The only difference is that I need to put a flag on elements already used so it won't be used twice and I can reduce the number of possible combinations
Thanks for anyone willing to help.

There are many ways to think about this problem.
If you use recursion, make sure to identify your end cases first, then proceed with the rest of the program.
This is the first thing that comes to mind.
<?php
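subset_sum([11.96,1,15.04,7.8,20,10,11.13,9,11,1.07,8.04,9], 20);

// Print every combination of values from $a that sums to $s.
// $c collects the values chosen so far.
function subset_sum($a, $s, $c = array())
{
    $eps = 1e-9; // floats rarely land on exactly 0 after repeated subtraction
    if ($s < -$eps)
        return; // overshot the target
    if (abs($s) < $eps)
    {
        print_r($c); // target reached
        return;
    }
    foreach ($a as $xd => $xdd)
    {
        // remove the value so deeper calls only see the later elements
        unset($a[$xd]);
        subset_sum($a, $s - $xdd, array_merge($c, array($xdd)));
    }
}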
?>

This is a possible solution, but it's not pretty:
import itertools
import operator
from functools import reduce
def subset_num(array, num):
    # build every non-empty combination, then keep those whose sum equals num
    subsets = reduce(operator.add, [list(itertools.combinations(array, r)) for r in range(1, 1 + len(array))])
    return [subset for subset in subsets if sum(subset) == num]
print(subset_num([11.96,1,15.04,7.8,20,10,11.13,9,11,1.07,8.04,9], 20))
Output:
[(20,), (11.96, 8.04), (9, 11), (11, 9), (1, 10, 9), (1, 10, 9), (7.8, 11.13, 1.07)]

DISCLAIMER: this is not a full solution, it is a way to just help you build the possible subsets. It does not help you to pick which ones go together (without using the same item more than once and getting the lowest remainder).
Using dynamic programming you can build all the subsets that add up to the given sum, then you will need to go through them and find which combination of subsets is best for you.
To build this archive you can (I'm assuming we're dealing with non-negative numbers only) put the items in a column, go from top to bottom and for each element compute all the subsets that add up to the sum or a lower number than it and that include only items from the column that are in the place you are looking at or higher. When you build a subset you put in its node both the sum of the subset (which may be the given sum or smaller) and the items that are included in the subset. So in order to compute the subsets for an item [i] you need only look at the subsets you've created for item [i-1]. For each of them there are 3 options:
1) the subset's sum is the given sum ---> Keep the subset as it is and move to the next one.
2) the subset's sum is smaller than the given sum but larger than it if item [i] is added to it ---> Keep the subset as it is and move on to the next one.
3) the subset's sum is smaller than the given sum and it will still be smaller or equal to it if item [i] is added to it ---> Keep one copy of the subset as it is and create another one with item [i] added to it (both as a member and added to the sum of the subset).
When you're done with the last item (item [n]), look at the subsets you've created - each one has its sum in its node and you can see which ones are equal to the given sum (and which ones are smaller - you don't need those anymore).
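The bookkeeping described above might be sketched in PHP like this. It is only an illustration under the same assumption of non-negative numbers; `subsetsReachingTarget` is a made-up name and the small tolerance for float comparison is an addition. Each node stores its running sum and its members, and the three options decide whether a copy extended by the current item is added.

```php
<?php
// Build subsets item by item, keeping only those whose sum does not exceed
// the target, then return the ones that hit the target.
// Assumption: all items are non-negative.
function subsetsReachingTarget(array $items, float $target, float $eps = 1e-9): array
{
    $subsets = [['sum' => 0.0, 'items' => []]];   // start from the empty subset
    foreach ($items as $item) {
        $next = [];
        foreach ($subsets as $subset) {
            $next[] = $subset;                    // options 1-3: always keep the subset as it is
            $alreadyFull = abs($subset['sum'] - $target) < $eps;
            $newSum = $subset['sum'] + $item;
            if (!$alreadyFull && $newSum <= $target + $eps) {
                // option 3: also keep a copy with the current item added
                $next[] = [
                    'sum' => $newSum,
                    'items' => array_merge($subset['items'], [$item]),
                ];
            }
        }
        $subsets = $next;
    }
    // keep only the subsets whose sum equals the target
    return array_values(array_filter(
        $subsets,
        fn($s) => abs($s['sum'] - $target) < $eps
    ));
}

foreach (subsetsReachingTarget([1, 10, 9, 20, 11, 9], 20) as $subset) {
    echo implode(' + ', $subset['items']), PHP_EOL;
}
```

The list of partial subsets can still grow exponentially with the number of items, so for something like 50k values this only shows the idea, not a practical solution.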
As I wrote at the beginning - now you need to figure out how to take the best combination of subsets that do not have a shared member between any of them.
Basically you're left with a problem that resembles the classic knapsack problem but with another limitation (not every stone can be taken with every other stone). Maybe the limitation actually helps, I'm not sure.
A bit more about the advantage of dynamic programming in this case
The basic idea of dynamic programming instead of recursion is to trade redundancy of operations with occupation of memory space. By that I mean to say that recursion with a complex problem (normally a backtrack knapsack-like problem, as we have here) normally ends up calculating the same thing a fair amount of times because the different branches of calculation have no concept of each other's operations and results. Dynamic programming saves the results and uses them along the way to build "bigger" results, relying on the previous/"smaller" ones. Because the use of the stack is much more straightforward than in recursion, you don't get the memory problem you get with recursion regarding the maintenance of the function's state, but you do need to handle a great deal of memory that you store (sometimes you can optimise that).
So for example in our problem, trying to combine a subset that would add up to the required sum, the branch that starts with item A and the branch that starts with item B do not know of each other's operations. Let's assume item C and item D together add up to the sum, that either of them added alone to A or B would not exceed the sum, and that A doesn't go with B in the solution (we can have sum=10, A=B=4, C=D=5, and there is no subset that sums up to 2, so A and B can't be in the same group). The branch trying to figure out A's group would (after trying and rejecting having B in its group) add C (A+C=9) and then add D, at which point it would reject this group and backtrack (A+C+D=14 > sum=10). The same would of course happen to B (A=B), because the branch figuring out B's group has no information about what just happened in the branch dealing with A. So in fact we've calculated C+D twice and haven't even used it yet (and we're about to calculate it a third time to realise that they belong in a group of their own).
NOTE:
Looking around while writing this answer I came across a technique I was not familiar with and might be a better solution for you: memoization. Taken from wikipedia:
memoization is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.

So I have a possible solution:
from collections import Counter
# compute the difference between 2 lists but keep duplicates
def list_difference(a, b):
    count = Counter(a)  # count items in a
    count.subtract(b)  # subtract items that are in b
    diff = []
    for x in a:
        if count[x] > 0:
            count[x] -= 1
            diff.append(x)
    return diff
# return a combination of numbers that matches target
def subset_sum(numbers, target, partial=[]):
    s = sum(partial)
    # check if the partial sum equals target
    if s == target:
        print("--------------------------------------------sum_is(%s)=%s" % (partial, target))
        return partial
    else:
        if s >= target:
            return  # if we reach the number why bother to continue
        for i in range(len(numbers)):
            n = numbers[i]
            remaining = numbers[i+1:]
            rest = subset_sum(remaining, target, partial + [n])
            if type(rest) is list:
                return rest  # assumed: pass the first match up the call chain
# repeat until the rest is no longer > target or the rest is the same as before
def repeatUntil(subset, target):
    currSubset = []
    while sum(subset) > target and currSubset != subset:
        diff = subset_sum(subset, target)
        currSubset = subset
        subset = list_difference(subset, diff)
    return subset
Output:
--------------------------------------------sum_is([11.96, 8.04])=20
--------------------------------------------sum_is([1, 10, 9])=20
--------------------------------------------sum_is([7.8, 11.13, 1.07])=20
--------------------------------------------sum_is([20])=20
--------------------------------------------sum_is([9, 11])=20
[15.04]
Unfortunately this solution only works for a small list. For a big list I am still trying to break the list into small chunks and calculate, but the answer is not quite correct. You can see it in a new thread here:
Finding unique combinations of numbers to reach a given sum

Finding out sequence similarity in arrays

I have a task where I have three arrays A, B, C. All of them contain the same data. For the sake of simplicity let's assume the data is the numbers 1 to 5. The data would be in different jumbled sequences. I want to find out which of B and C has data most similar to A.
Eg:
A = 1,2,3,4,5
B = 1,2,3,5,4
C = 4,1,2,3,5
In this case, it is easy to visually comprehend that B is more similar to A. But it gets more complicated for really jumbled sequences.
Eg:
A = 1,2,3,4,5
B = 5,3,1,4,2
C = 4,1,2,3,5
In this case, I would assume C to be closer to A. I am thinking that this assumption can be quantified as: how many elements have the same sequence in both arrays? In the above example the subsequence [1,2,3] is the same in both arrays. The second question would be: what is the offset difference between the similar subsequences? In this case it is 1, because the subsequence begins at index 0 for A and index 1 for C.
So the number of elements in a matching sequence and their offsets are what I am thinking of using. I plan on adding a weight to these two quantities (number of elements in the matching sequence, and offset difference in their occurrence).
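As a rough illustration of this scoring idea, the PHP sketch below finds the longest run of consecutive elements shared by two arrays (a standard longest-common-substring table) together with where the run starts in each array, and then combines run length and offset difference with weights. The function names and the default weights are assumptions picked only for the example.

```php
<?php
// Longest run of consecutive elements that appears in both $a and $b,
// plus the index where that run starts in each array.
function longestCommonRun(array $a, array $b): array
{
    $a = array_values($a);
    $b = array_values($b);
    $best = ['length' => 0, 'offsetA' => 0, 'offsetB' => 0];
    $prev = array_fill(0, count($b) + 1, 0);
    for ($i = 1; $i <= count($a); $i++) {
        $curr = array_fill(0, count($b) + 1, 0);
        for ($j = 1; $j <= count($b); $j++) {
            if ($a[$i - 1] === $b[$j - 1]) {
                // extend the run that ended at the previous position in both arrays
                $curr[$j] = $prev[$j - 1] + 1;
                if ($curr[$j] > $best['length']) {
                    $best = [
                        'length' => $curr[$j],
                        'offsetA' => $i - $curr[$j],
                        'offsetB' => $j - $curr[$j],
                    ];
                }
            }
        }
        $prev = $curr;
    }
    return $best;
}

// Higher is more similar: long shared runs are rewarded, shifted runs are penalised.
// The weights are arbitrary example values.
function similarityScore(array $a, array $b, float $lengthWeight = 1.0, float $offsetWeight = 0.5): float
{
    $run = longestCommonRun($a, $b);
    return $lengthWeight * $run['length']
        - $offsetWeight * abs($run['offsetA'] - $run['offsetB']);
}

$a = [1, 2, 3, 4, 5];
echo 'B: ', similarityScore($a, [5, 3, 1, 4, 2]), PHP_EOL;
echo 'C: ', similarityScore($a, [4, 1, 2, 3, 5]), PHP_EOL;
```

With these example weights C scores higher than B for the second example, matching the intuition described above.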
Does this make sense? I only need a rough approximation of similarity and the results do not need to be exact. Are there any formal mathematical or data-structure models that solve this problem?
BTW, the project where I need this implemented is in PHP. Does it have any built-in functions like the levenshtein function for string difference?
Any suggestions are very welcome!

Well, I suppose you can come up with your own algorithm (for instance generate all suffixes, search for them, and then define a scoring procedure), or you could use a well-known algorithm like Smith-Waterman for local alignment or Needleman-Wunsch for global alignment. The advantage of these algorithms is that they are well understood and give you all the possible alignments (and you can choose the best for your case).
NW in PHP
SW in PHP
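For a feel of what the global alignment computes, here is a compact PHP sketch of the Needleman-Wunsch score only (no traceback). It is not the code behind the links above; the function name and the scoring values (+1 match, -1 mismatch, -1 gap) are assumptions for the example.

```php
<?php
// Needleman-Wunsch global alignment score between two arrays.
// Only the best score is returned, not the alignment itself.
function needlemanWunschScore(array $a, array $b, int $match = 1, int $mismatch = -1, int $gap = -1): int
{
    $a = array_values($a);
    $b = array_values($b);
    $n = count($a);
    $m = count($b);
    // first row: aligning an empty prefix of $a against j elements of $b
    $prev = [];
    for ($j = 0; $j <= $m; $j++) {
        $prev[$j] = $j * $gap;
    }
    for ($i = 1; $i <= $n; $i++) {
        $curr = [$i * $gap];
        for ($j = 1; $j <= $m; $j++) {
            $diagonal = $prev[$j - 1] + ($a[$i - 1] === $b[$j - 1] ? $match : $mismatch);
            // best of: align the two elements, gap in $b, gap in $a
            $curr[$j] = max($diagonal, $prev[$j] + $gap, $curr[$j - 1] + $gap);
        }
        $prev = $curr;
    }
    return $prev[$m];
}

echo needlemanWunschScore([1, 2, 3, 4, 5], [1, 2, 3, 5, 4]), PHP_EOL;
echo needlemanWunschScore([1, 2, 3, 4, 5], [4, 1, 2, 3, 5]), PHP_EOL;
```

How well the score separates two candidates depends heavily on the chosen match, mismatch and gap values, so they would need tuning for a real data set.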

From 6 random numbers calculate random three-digit number?

I have 4 years of PHP and C# experience, but math is not my strong side.
I think that I need to use some math algorithms in this project.
When the page loads I need to randomly create 7 numbers; 6 are numbers that I can use to calculate the given three-digit number:
rand 1-9
rand 1-9
rand 1-9
rand 1-9
rand 10-100 //5 steps
rand 10-100 //5 steps
and the given number to calculate is 100-999.
I can use these operations: +, -, /, *, (, )
What is the best algorithm for this?
I probably need to try all possible combinations of these 6 numbers to calculate the given number or the closest achievable number.
example:
Let's say that the given three-digit number is
350, and I need to calculate this number from these numbers:
3, 6, 9, 5, 10, 100
so the formula for this is:
(100*3)+(5*10) = 350
If it is not possible to calculate the exact number, then calculate the closest.
You don't need to solve this problem completely; you can point me in the right direction by pasting some pseudocode or describing how to do it.

I have no actual experience that might help you with this, though since you're asking for some insight, I'll share my thoughts on how to do this.
As I typed my answer, I realised that this is in fact a knapsack problem, which means you can solve it to optimality using any algorithm that solves the knapsack problem. I recommend using dynamic programming to make your program run faster.
What you need to do is construct all numbers you can generate by combining two numbers with an operator, so that after this you have a list containing the numbers you started with, and the numbers you generated.
Then you solve the knapsack problem using the numbers as items with their value as their weight, and the number as the weight you can store at most.
The only thing that is slightly different is that you have an extra constraint that says that you may only use a number once. So you need to add into your implementation that if you add a combination of numbers, that you must remove the option of storing another combination that is constructed with the same number.

You could enumerate all the solutions by building "abstract syntax trees", binary trees with the following information:
the leaves are the 6 numbers
the nodes are the operations, for example a node '+' with the leaf '7' for left son and another node for right son that is 'x' with '140' for left son and '8' for right son would represent (7+(140*8)). Additionally, at each node you store the numbers that you already used (the leaves used in the tree), and the total.
Let's say you store all the constructed trees in the associative map TreeSets, but indexed by the number of leaves you use. For example, the tree (7+(140*8)) would not be stored directly in TreeSets but in TreeSets[3] (TreeSets[3] contains several trees, it is also a set).
You store the closest score in BestScore and one solution achieving BestScore in BestSolution.
You start by constructing the 6 leaves (that gives you 6 different trees consisting of only one leaf). You save the closest number in BestScore and the corresponding leaf in BestSolution.
Then at each step, you try to construct the trees with i leaves, i from 2 to 6, and store them in TreeSets[i].
You take j from 1 to i-1, you take each tree in TreeSets[j] and each tree in TreeSets[i-j], and you check that those two trees don't use the same leaves (you don't have to descend to the bottom of the tree, since you have stored the leaves used in the node). If they don't, you build the four nodes '+', 'x', '/', '-' with the tree from TreeSets[j] as left son and the tree from TreeSets[i-j] as right son, and store all four of them in TreeSets[i]. While building a node, you take the totals from both trees and apply the operation, you store the total, and you check whether it is closer than BestScore (if so you update BestScore and BestSolution with this new total and the new node). If the total is exactly the value you were looking for, you can stop here.
If you didn't stop the program by finding an exact solution, there is no such solution, and the closest one is in BestSolution at the end.
Note : You don't have to build a complete tree each time, just build the node with two pointers on other trees.
P.S.: You may avoid enumerating all the solutions by using the dynamic programming approach, as Glubus said. In this case, it would consist of removing, at each step (i), some solutions that are considered sub-optimal. But with this problem I'm not sure that is possible (except maybe removing the nodes with a total of 0).
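A compact PHP variant of this enumeration is sketched below. It differs from the description in a few ways that are assumptions of the sketch: sets are keyed by a bitmask of the leaves used rather than by the number of leaves, each node is stored as a value plus an expression string instead of a tree with pointers, only positive integer results are kept (exact division only), and duplicate values for the same set of leaves are dropped.

```php
<?php
// Search for an expression over $numbers whose value is as close as possible
// to $target. $sets[mask][value] holds one expression string that uses
// exactly the leaves in the bitmask and evaluates to that value.
// Assumption: $numbers are positive integers.
function closestExpression(array $numbers, int $target): array
{
    $numbers = array_values($numbers);
    $n = count($numbers);
    $sets = [];
    $best = null;
    // store a new value and update the best candidate; true means exact hit
    $record = function (int $mask, int $value, string $expr) use (&$sets, &$best, $target): bool {
        if (!isset($sets[$mask][$value])) {
            $sets[$mask][$value] = $expr;
            if ($best === null || abs($value - $target) < abs($best['value'] - $target)) {
                $best = ['value' => $value, 'expr' => $expr];
            }
        }
        return $value === $target;
    };
    foreach ($numbers as $i => $num) {
        if ($record(1 << $i, $num, (string) $num)) {
            return $best;
        }
    }
    for ($mask = 1; $mask < (1 << $n); $mask++) {
        // every proper non-empty sub-mask is tried as the left operand
        for ($left = ($mask - 1) & $mask; $left > 0; $left = ($left - 1) & $mask) {
            $right = $mask ^ $left;   // disjoint by construction
            foreach ($sets[$left] ?? [] as $lv => $le) {
                foreach ($sets[$right] ?? [] as $rv => $re) {
                    $results = ['+' => $lv + $rv, '*' => $lv * $rv];
                    if ($lv > $rv) {
                        $results['-'] = $lv - $rv;
                    }
                    if ($rv !== 0 && $lv % $rv === 0) {
                        $results['/'] = intdiv($lv, $rv);
                    }
                    foreach ($results as $op => $value) {
                        if ($record($mask, $value, "({$le}{$op}{$re})")) {
                            return $best;   // exact solution found, stop here
                        }
                    }
                }
            }
        }
    }
    return $best;
}

$result = closestExpression([3, 6, 9, 5, 10, 100], 350);
echo $result['expr'], ' = ', $result['value'], PHP_EOL;
```

Because sub-masks are always numerically smaller than the mask they belong to, walking the masks in increasing order guarantees that both operands have been built before they are combined.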

Permutations of Varying Size

I'm trying to write a function in PHP that gets all permutations of all possible sizes. I think an example would be the best way to start off:
$my_array = array(1,1,2,3);
Possible permutations of varying size:
1
1 // * See Note
2
3
1,1
1,2
1,3
// And so forth, for all the sets of size 2
1,1,2
1,1,3
1,2,1
// And so forth, for all the sets of size 3
1,1,2,3
1,1,3,2
// And so forth, for all the sets of size 4
Note: I don't care if there's a duplicate or not. For the purposes of this example, all future duplicates have been omitted.
What I have so far in PHP:
function getPermutations($my_array){
    $permutation_length = 1;
    $keep_going = true;
    while($keep_going){
        while($there_are_still_permutations_with_this_length){
            // Generate the next permutation and return it into an array
            // Of course, the actual important part of the code is what I'm having trouble with.
        }
        $permutation_length++;
        if($permutation_length>count($my_array)){
            $keep_going = false;
        }
        else{
            $keep_going = true;
        }
    }
    return $return_array;
}
The closest thing I can think of is shuffling the array, picking the first n elements, seeing if it's already in the results array, and if it's not, add it in, and then stop when there are mathematically no more possible permutations for that length. But it's ugly and resource-inefficient.
Any pseudocode algorithms would be greatly appreciated.
Also, for super-duper (worthless) bonus points, is there a way to get just 1 permutation with the function but make it so that it doesn't have to recalculate all previous permutations to get the next?
For example, I pass it a parameter 3, which means it's already done 3 permutations, and it just generates number 4 without redoing the previous 3? (Passing it the parameter is not necessary, it could keep track in a global or static).
The reason I ask this is because as the array grows, so does the number of possible combinations. Suffice it to say that one small data set with only a dozen elements grows quickly into the trillions of possible combinations and I don't want to task PHP with holding trillions of permutations in its memory at once.

Sorry no php code, but I can give you an algorithm.
It can be done with small amounts of memory and since you don't care about dupes, the code will be simple too.
First: Generate all possible subsets.
If you view the subset as a bit vector, you can see that there is a 1-1 correspondence between a subset and a binary number.
So if your array had 12 elements, you will have 2^12 subsets (including empty set).
So to generate a subset, you start with 0 and keep incrementing till you reach 2^12. At each stage you read the set bits in the number to get the appropriate subset from the array.
Once you get one subset, you can now run through its permutations.
The next permutation (of the array indices, not the elements themselves) can be generated in lexicographic order like here: http://www.de-brauwer.be/wiki/wikka.php?wakka=Permutations and can be done with minimal memory.
You should be able to combine these two to give yourself a next_permutation function. Instead of passing in numbers, you could pass in an array of 12 elements which contains the previous permutation, plus possibly some more info (little memory again) on whether you need to go to the next subset etc.
You should actually be able to find very fast algorithms which use minimal memory, provide a next_permutation type feature and do not generate dupes: Search the web for multiset permutation/combination generation.
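Put together, the two steps could look like the following PHP sketch (PHP 7.4 or later; the function names are illustrative). Subsets are enumerated by counting a bitmask upwards, and for each subset the array indices are run through their permutations in lexicographic order. Since the indices are permuted rather than the values, duplicate values in the input produce duplicate results, which the question says is acceptable. Results come out grouped by subset, not by size.

```php
<?php
// Rearrange $idx into the next lexicographic permutation.
// Returns false when $idx was already the last permutation.
function nextPermutation(array &$idx): bool
{
    $i = count($idx) - 2;
    while ($i >= 0 && $idx[$i] >= $idx[$i + 1]) {
        $i--;
    }
    if ($i < 0) {
        return false;
    }
    $j = count($idx) - 1;
    while ($idx[$j] <= $idx[$i]) {
        $j--;
    }
    [$idx[$i], $idx[$j]] = [$idx[$j], $idx[$i]];          // swap
    $tail = array_reverse(array_slice($idx, $i + 1));     // reverse the tail
    array_splice($idx, $i + 1, count($tail), $tail);
    return true;
}

// Lazily yields every permutation of every non-empty subset of $items.
function allSubsetPermutations(array $items): Generator
{
    $items = array_values($items);
    $n = count($items);
    for ($mask = 1; $mask < (1 << $n); $mask++) {
        $idx = [];
        for ($bit = 0; $bit < $n; $bit++) {
            if ($mask & (1 << $bit)) {
                $idx[] = $bit;                            // set bits select the subset
            }
        }
        do {
            yield array_map(fn($k) => $items[$k], $idx);
        } while (nextPermutation($idx));
    }
}

foreach (allSubsetPermutations([1, 1, 2]) as $permutation) {
    echo implode(',', $permutation), PHP_EOL;
}
```

Because this is a generator, only the current mask and index array are held in memory, and iteration can be stopped and resumed at any point.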
Hope that helps. Good luck!

The best set of functions I've come up with was the one provided by a user in the comments on the shuffle function on php.net. Here is the link. It works pretty well.
Hope it's useful.

The problem seems to be trying to give an index to every permutation and having a constant access time. I cannot think of a constant time algorithm, but maybe you can improve this one to be so. This algorithm has a time complexity of O(n) where n is the length of your set. The space complexity should be reducible to O(1).
Assume our set is 1,1,2,3 and we want the 10th permutation. Also, note that we will index each element of the set from 0 to 3. Going by your order, this means the single element permutations come first, then the two element, and so on. We are going to subtract from the number 10 until we can completely determine the 10th permutation.
First up are the single element permutations. There are 4 of those, so we can view this as subtracting one four times from 10. We are left with 6, so clearly we need to start considering the two element permutations. There are 12 of these, and we can view this as subtracting three up to four times from 6. We discover that the second time we subtract 3, we are left with 0. This means the indexes of our permutation must be 2 (because we subtracted 3 twice) and 0, because 0 is the remainder. Therefore, our permutation must be 2,1.
Division and modulus may help you.
If we were looking for the 12th permutation, we would run into the case where we have a remainder of 2. Depending on your desired behavior, the permutation 2,2 might not be valid. Getting around this is very simple, however, as we can trivially detect that the indexes 2 and 2 (not to be confused with the element) are the same, so the second one should be bumped to 3. Thus the 12th permutation can trivially be calculated as 2,3.
The biggest confusion right now is that the indexes and the element values happen to match up. I hope my algorithm explanation is not too confusing because of that. If it is, I will use a set other than your example and reword things.
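The subtraction and division steps described above might be written like this in PHP. This is a sketch with an illustrative function name; it uses a 0-based index k and always picks from the list of indices that are still unused, which has the same effect as bumping a repeated index to the next free one.

```php
<?php
// Return the k-th (0-based) arrangement when all arrangements of length 1
// are listed first, then all of length 2, and so on. Returns null if k is
// out of range.
function permutationAt(array $items, int $k): ?array
{
    $items = array_values($items);
    $n = count($items);
    $count = 1;
    for ($len = 1; $len <= $n; $len++) {
        $count *= $n - $len + 1;             // n!/(n-len)! arrangements of this length
        if ($k >= $count) {
            $k -= $count;                    // skip the whole block of this length
            continue;
        }
        $available = range(0, $n - 1);       // indices not used yet
        $block = $count;
        $result = [];
        for ($pos = 0; $pos < $len; $pos++) {
            $block = intdiv($block, count($available));
            $pick = intdiv($k, $block);      // division selects the next index
            $k %= $block;                    // the remainder is carried forward
            $result[] = $items[array_splice($available, $pick, 1)[0]];
        }
        return $result;
    }
    return null;
}

echo implode(',', permutationAt([1, 1, 2, 3], 10)), PHP_EOL; // the example above: 2,1
echo implode(',', permutationAt([1, 1, 2, 3], 12)), PHP_EOL; // the example above: 2,3
```

Removing an index with array_splice costs time proportional to the list length, so this sketch is quadratic in the worst case rather than linear.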

Inputs: Permutation index k, indexed set S.
Pseudocode:
L = {S_1}
for i = 2 to |S| do
Insert S_i before L_{k % i}
k <- k / i
loop
return L
This algorithm can also be easily modified to work with duplicates.
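A direct PHP transcription of the pseudocode, as a sketch with an illustrative function name: the set is taken as a 0-based PHP array, so S_i becomes `$set[$i - 1]`, and an insert position equal to the current list length is read as "append at the end".

```php
<?php
// Build the k-th full-length permutation by inserting the elements one at a
// time; k is consumed as a mixed-radix number (radix 2, then 3, then 4, ...).
function kthPermutation(array $set, int $k): array
{
    $set = array_values($set);
    if ($set === []) {
        return [];
    }
    $list = [$set[0]];
    for ($i = 2; $i <= count($set); $i++) {
        // k % i is in 0..i-1; position i-1 means appending after the last element
        array_splice($list, $k % $i, 0, [$set[$i - 1]]);
        $k = intdiv($k, $i);
    }
    return $list;
}

for ($k = 0; $k < 6; $k++) {
    echo $k, ': ', implode(',', kthPermutation(['a', 'b', 'c'], $k)), PHP_EOL;
}
```

Every k from 0 to |S|! - 1 gives a different permutation, so no earlier permutation has to be recomputed to obtain a later one.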

How to determine the best case and worst case of a program (algorithm)?

Suppose I have this program, I want to compare 2 input lists. Assume array A and array B. How do I determine the best case and worst case of the function?
Here is my code in PHP:
foreach($array_1 as $k){
    if(!in_array($k, $array_2)){
        array_push($array_2, $k);
    }
}
What is the best case and worst case of the foreach loop? Please include some explanation, thank you :)
EDITED:
Since my goal is to compare 2 lists that have at least 1 element in common, I think my above code is wrong. Here is the updated version of my code:
foreach($array_1 as $k){
    if(in_array($k, $array_2)){
        array_push($array_3, $k);
    }
}
And I guess it would be:
Best case: O(n)
Worst case: O(N*M)

Let's do a quick analysis then:
foreach($array_1 as $k)
means that the operation within will be repeated for each element of the array. Let us denote the size of the array by N.
The operation within:
if (!in_array($k, $array_2)) {
    array_push($array_2, $k);
}
There are 2 operations here:
in_array
array_push
array_push is likely to be constant, thus O(1), while in_array is more likely a linear search in array_2, which will take anywhere from 1 operation (found as the first element) up to as many operations as the length of array_2.
Note that in_array represents the only variable here:
best case: in_array returns at the first comparison --> all elements of array_1 are the same, and either array_2 was empty or they are equal to its first element. Complexity is O(N) since we have N elements in array_1
worst case: each time we examine each element of array_2 --> all elements of array_1 are distinct and they are distinct from the previous elements of array_2. If M is the length of array_2 when it is passed in, then the complexity is along the lines of O(N * (N+M)), with roughly M + N/2 being the mean length of array_2 as it grows from M to M+N elements and constant factors being discarded in the O notation
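To see the two cases in numbers, the PHP sketch below repeats the first loop from the question but writes the linear search out by hand so that the comparisons can be counted. The function name is made up, and the hand-written search is assumed to behave like in_array with its default loose comparison.

```php
<?php
// Same effect as the foreach / in_array / array_push loop, with the search
// spelled out so every element comparison can be counted.
function mergeUniqueCountingComparisons(array $array1, array $array2): array
{
    $comparisons = 0;
    foreach ($array1 as $k) {
        $found = false;
        foreach ($array2 as $v) {
            $comparisons++;
            if ($v == $k) {          // loose comparison, like in_array by default
                $found = true;
                break;
            }
        }
        if (!$found) {
            $array2[] = $k;          // array_2 grows, so later searches get longer
        }
    }
    return ['result' => $array2, 'comparisons' => $comparisons];
}

// Best case: every element matches the first element of array_2 (N = 4 comparisons).
echo mergeUniqueCountingComparisons([7, 7, 7, 7], [7, 8, 9])['comparisons'], PHP_EOL;
// Worst case: nothing matches and array_2 keeps growing (3 + 4 + 5 + 6 = 18 comparisons).
echo mergeUniqueCountingComparisons([1, 2, 3, 4], [10, 20, 30])['comparisons'], PHP_EOL;
```

The worst-case count is N*M + N(N-1)/2, which is where the O(N * (N+M)) bound comes from.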
Hope this helps.

Big O notation is all about approximations. It makes it easy to compare algorithms.
If you imagine your array of elements, a search might be order N (you must look at each item to find the item you want), it might be order Log(N) if you have an ordered collection or it could even be order 1 depending on your collection type.
The important thing here is to look at your algorithm and determine what the key operations are that are repeated.
Foreach is clearly an order N operation, by definition you must operate on each element in your list. O(N)
Next is your if in_array on array 2. This sounds like a search over an array, which would most likely be unordered, so it would be order M, where M is the length of array 2 (linear search). So your complexity would now be O(N * M) (for each of the N elements in array 1, perform a search of order M complexity over array 2).
Finally you have an array push. I don't know your environment, but this could be order 1, or order N if the array needs to be reallocated and copied in order to grow. Let's assume order 1 to keep it simple. Therefore your complexity in Big O is O(N*M).
So now the best case is for each element to find its counterpart on the first try and perform the array push, which would be O(N * 1 * 1) = O(N).
The worst case is that each element cannot be found in the second list, forcing a full search of all elements in array 2. Therefore the complexity is O(N * M).
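As an aside on the "order 1 depending on your collection type" remark: for the edited version of the loop (collecting common elements), the linear search can be replaced by a key lookup. The sketch below is one way to do that in PHP, assuming the values are integers or strings, since array_flip only accepts those; the function name is illustrative.

```php
<?php
// Collect the elements of $array1 that also occur in $array2 using a
// key lookup instead of a linear search.
// Assumption: all values are integers or strings.
function commonElements(array $array1, array $array2): array
{
    $lookup = array_flip($array2);   // values of array_2 become array keys
    $common = [];
    foreach ($array1 as $k) {
        if (isset($lookup[$k])) {    // key lookup, constant time on average
            $common[] = $k;
        }
    }
    return $common;
}

print_r(commonElements([1, 2, 3, 4], [3, 4, 5]));
```

Building the lookup takes one pass over array 2 and the loop takes one pass over array 1, so the whole thing is about O(N + M) on average instead of O(N * M).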
Your teachers want to understand your thinking so show them your assumptions made. I highly recommend that you read the exact question and information you have been given before relying on the assumptions given here, you may have been told the language/platform which would tell you the exact penalty and algorithms used in each case. Hope that helps :)

Generally with such a problem I just look at the algorithm as Dr. Evil and ask, "How can I make this take the most time possible?"
