Generating unique fixed integer ids from array of ids


So here is the situation: I have an array of objects, each marked with a unique integer id, and for every combination of those objects I need to create a new object with its own unique id. The problem is that the list of objects is dynamic and is used in a stateless environment, so the newly generated ids must be the same on every run.
To make it clearer what I need, consider the array of objects as an array of their ids, for example [10, 7, 23]. Basically, I need ids for all the possible combinations:
10, 7
10, 23
7, 23
10, 7, 23
What's important here is that the generated id must be the same for each distinct combination (for example, 10 and 7 should always produce the same id). Also, newly added objects should not affect previously generated ids: when a new object is later added to the list, the ids generated from earlier combinations must remain the same as before.
Currently, I have a solution that comes down to using the sum of the combined ids as the new id, so the resulting ids are:
17
33
30
40
Of course, this approach can produce duplicate ids, which is why I'm asking for advice on a more sophisticated algorithm. I also tried introducing a fixed offset of 1000 for newly generated ids and multiplying the sum by the number of objects in the combination, so that the resulting ids are, for example, 1034 (1000+(10+7)*2), 1066 (1000+(10+23)*2), etc., but I'm not sure that would save me from duplicates. :)
To be clear, I need this for a PHP project, but as the problem is not language-specific, I hope there are some good mathematicians who can offer a good solution. :)
A useful piece of information is that the ids being combined are in the range 10000-99999 and the maximum number of items in a combination does not exceed 10.
Please note that I do not need a way to build all the combinations from the array elements, only the "formula" for producing the integer id.
Thanks in advance.

Not really sure what your aim is, but I'll have a go...
Have you tried using character keys? For example 10, 7, 3 becomes a sequence with an underscore. Each sequence will have a unique hash.
$arrayOfKeys = array(10, 7, 3);
$hash = implode('_', $arrayOfKeys);
print $hash;
# 10_7_3
Personally I'd go for this simple approach. If you're using a database and you're not producing, say, 100k records per day, it should be pretty fast using an indexed (primary key or unique) varchar field.
If you have to create numbers, here's a tip: take the length of the largest number and use it as the prefix of your sequence, e.g.:
10, 5, 1 -> 2100501
105, 45, 201 -> 3105045201
The prefix tells you the length of each of the numbers that follow. I can't think of any way you'd get duplicates... Anyone? ;)
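The length-prefix idea can be sketched roughly as follows. The helper name `prefixedId()` is made up for illustration, and the sorting step is an added assumption (the answer's examples keep the input order) so that 10, 7 and 7, 10 give the same id, as the question requires. It also assumes a non-empty array of non-negative integers.

```php
<?php
// Sketch only: prefixedId() is a hypothetical helper, not code from the answer.
function prefixedId(array $ids): string
{
    // Assumption: sort first so the id does not depend on input order.
    sort($ids, SORT_NUMERIC);

    // The prefix is the digit count of the largest number.
    $width = strlen((string) max($ids));

    $id = (string) $width;
    foreach ($ids as $value) {
        // Left-pad every number with zeros to the common width.
        $id .= str_pad((string) $value, $width, '0', STR_PAD_LEFT);
    }
    return $id;
}

echo prefixedId([10, 5, 1]), "\n";     // 2010510
echo prefixedId([105, 45, 201]), "\n"; // 3045105201
```

The result is returned as a string on purpose: ten 5-digit ids plus the prefix give 51 digits, which does not fit into a 64-bit PHP integer.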
Hope it helps...

Step 1: Sort the values you get.
eg: if you get 10, 7 or 7, 10, it should result in 7, 10 before going to the ID generator. If you know the range of your numbers, i.e. let's assume [0-100], use radix or counting sort, which will be fast.
Step 2: Represent the numbers as strings, separated by any chosen separator (':' maybe).
eg: for 7, 10 the id will become "7:10".
Sorting is done to avoid generating different IDs for 10, 7 and 7, 10.
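A minimal sketch of the two steps, assuming PHP's built-in `sort()` is fast enough (the radix/counting sort suggestion is left out). `combinationKey()` is a hypothetical name.

```php
<?php
// Sketch only: combinationKey() is a hypothetical helper.
function combinationKey(array $ids, string $separator = ':'): string
{
    // Step 1: sort so that the order of the input does not matter.
    sort($ids, SORT_NUMERIC);

    // Step 2: join the numbers with the chosen separator.
    return implode($separator, $ids);
}

echo combinationKey([10, 7]), "\n"; // 7:10
echo combinationKey([7, 10]), "\n"; // 7:10
```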
BTW What do these numbers represent?

I don't think this is possible unless you allow labels of increasing length.
Assume you have a maximum of N distinct objects, corresponding to N distinct labels.
If you want to be able to represent all possible pairs, assuming order in a pair does not matter, you potentially need N.(N-1)/2 extra labels, whatever they are, and you need to reserve them all.
And for all triples, N.(N-1).(N-2)/6, for all quads N.(N-1).(N-2).(N-3)/24...
This grows exponentially and will very quickly exceed the capacity of integers.
Any other solution that tries to compress the space of labels, such as hashing, will result in collisions. You can resolve the collisions by maintaining a collision table, but this will break the "generated ids must be same for every run" requirement.
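To get a feel for the counts mentioned in this answer, here is a small sketch that adds up the pairs, triples and so on. The function names are made up for illustration, and the code assumes inputs small enough that the intermediate products fit into a PHP integer.

```php
<?php
// Sketch only: binomial() and labelsNeeded() are hypothetical helpers.

// Number of ways to choose $k items out of $n, i.e. N.(N-1)...(N-k+1)/k!
function binomial(int $n, int $k): int
{
    if ($k < 0 || $k > $n) {
        return 0;
    }
    $result = 1;
    for ($i = 1; $i <= $k; $i++) {
        // After this step $result equals C(n-k+i, i), so the division is exact.
        $result = intdiv($result * ($n - $k + $i), $i);
    }
    return $result;
}

// Extra labels needed for all combinations of size 2..$maxSize.
function labelsNeeded(int $n, int $maxSize): int
{
    $total = 0;
    for ($k = 2; $k <= $maxSize; $k++) {
        $total += binomial($n, $k);
    }
    return $total;
}

echo labelsNeeded(3, 3), "\n";   // 4
echo labelsNeeded(20, 10), "\n"; // 616645
```

Three objects need the four extra labels listed in the question, while only 20 objects with combinations of up to 10 already need more than 600,000.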

Related

From 6 random numbers calculate random three-digit number?

I have 4 years of PHP and C# experience, but math is not my strong side.
I think I need to use some math algorithms in this project.
When the page loads, I need to randomly create 7 numbers; 6 of them are numbers that I can use to calculate a given three-digit number:
rand 1-9
rand 1-9
rand 1-9
rand 1-9
rand 10-100 //5 steps
rand 10-100 //5 steps
and the given number to calculate is 100-999.
I can use these operations: +, -, /, *, (, )
What is the best algorithm for this?
I probably need to try all possible combinations of these 6 numbers to calculate the given number, or the closest number that can be calculated.
example:
let's say that the given three-digit number is
350, and I need to calculate it from these numbers:
3, 6, 9, 5, 10, 100
so the formula for this is:
(100*3)+(5*10) = 350
If it is not possible to calculate the exact number, then calculate the closest one.
You don't need to solve this problem completely; you can point me in the right direction by pasting some pseudocode or describing how to do it.

I have no actual experience that might help you with this, though since you're asking for some insight, I'll share my thoughts on how to do this.
As I typed my answer, I realised that this is in fact a knapsack problem, which means you can solve it to optimality using any algorithm that solves the knapsack problem. I recommend using dynamic programming to make your program run faster.
What you need to do is construct all numbers you can generate by combining two numbers with an operator, so that after this you have a list containing the numbers you started with, and the numbers you generated.
Then you solve the knapsack problem using the numbers as items with their value as their weight, and the number as the weight you can store at most.
The only thing that is slightly different is that you have an extra constraint that says that you may only use a number once. So you need to add into your implementation that if you add a combination of numbers, that you must remove the option of storing another combination that is constructed with the same number.

You could enumerate all the solutions by building "abstract syntax trees", binary trees with the following information:
the leaves are the 6 numbers
the nodes are the operations, for example a node '+' with the leaf '7' for left son and another node for right son that is 'x' with '140' for left son and '8' for right son would represent (7+(140*8)). Additionally, at each node you store the numbers that you have already used (the leaves used in the tree), and the total.
Let's say you store all the constructed trees in the associative map TreeSets, indexed by the number of leaves each tree uses. For example, the tree (7+(140*8)) would not be stored directly in TreeSets but in TreeSets[3] (TreeSets[3] contains several trees, so it is also a set).
You store the closest score in BestScore and one solution with that score in BestSolution.
You start by constructing the 6 leaves (that gives you 6 different trees consisting of only one leaf). You save the closest number in BestScore and the corresponding leaf in BestSolution.
Then at each step, you try to construct the trees with i leaves, i from 2 to 6, and store them in TreeSets[i].
You take j from 1 to i-1, you take each tree in TreeSets[j] and each tree in TreeSets[i-j], and you check that those two trees don't use the same leaves (you don't have to go down to the bottom of the tree, since you have stored the leaves used in the node). If they don't, you build the four nodes '+', 'x', '/', '-' with the tree from TreeSets[j] as left son and the tree from TreeSets[i-j] as right son, and store all four of them in TreeSets[i]. While building a node, you take the totals from both trees and apply the operation, you store the total, and you check whether it is closer than BestScore (if so you update BestScore and BestSolution with this new total and the new node). If the total is exactly the value you were looking for, you can stop here.
If you didn't stop the program by finding an exact solution, there is no such solution, and the closest one is in BestSolution at the end.
Note: You don't have to build a complete tree each time, just build the node with two pointers to other trees.
P.S.: You may avoid enumerating all the solutions by using the dynamic programming approach, as Glubus said. In this case, it would consist of removing, at each step (i), some solutions that are considered sub-optimal. But with this problem I'm not sure that is possible (except maybe removing the nodes with a total of 0).
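The enumeration described in this answer could look roughly like the sketch below. All names are hypothetical. Two things are assumptions that go beyond the answer: a division is only used when it leaves no remainder, and a tree is skipped when another tree with the same leaves and the same total has already been stored (the two are interchangeable for every later step, which keeps the sets small). The inputs are assumed to be integers.

```php
<?php
// Sketch only. A "tree" is stored as
// ['mask' => bitmask of used leaves, 'value' => total, 'expr' => text form].

function combineTrees(array $left, array $right): array
{
    $a = $left['value'];
    $b = $right['value'];
    $mask = $left['mask'] | $right['mask'];
    $make = fn(int $value, string $op): array => [
        'mask' => $mask,
        'value' => $value,
        'expr' => "({$left['expr']}{$op}{$right['expr']})",
    ];

    $trees = [$make($a + $b, '+'), $make($a - $b, '-'), $make($a * $b, '*')];
    // Assumption: only exact integer divisions are allowed.
    if ($b !== 0 && $a % $b === 0) {
        $trees[] = $make(intdiv($a, $b), '/');
    }
    return $trees;
}

function closestExpression(array $numbers, int $target): ?array
{
    $numbers = array_values($numbers);
    $treeSets = [];  // TreeSets[i] holds the trees that use i leaves
    $seen = [];      // "mask:value" pairs that are already stored
    $best = null;    // BestSolution (its 'value' is BestScore)

    // Stores a tree in TreeSets[$size]; returns true once the target is hit.
    $store = function (array $tree, int $size) use (&$treeSets, &$seen, &$best, $target): bool {
        $key = $tree['mask'] . ':' . $tree['value'];
        if (!isset($seen[$key])) {
            $seen[$key] = true;
            $treeSets[$size][] = $tree;
            if ($best === null || abs($tree['value'] - $target) < abs($best['value'] - $target)) {
                $best = $tree;
            }
        }
        return $best['value'] === $target;
    };

    // The leaves are the trees of size 1.
    foreach ($numbers as $index => $number) {
        if ($store(['mask' => 1 << $index, 'value' => $number, 'expr' => (string) $number], 1)) {
            return $best;
        }
    }

    for ($i = 2; $i <= count($numbers); $i++) {
        $treeSets[$i] = [];
        for ($j = 1; $j < $i; $j++) {
            foreach ($treeSets[$j] as $left) {
                foreach ($treeSets[$i - $j] as $right) {
                    if ($left['mask'] & $right['mask']) {
                        continue; // the two trees share a leaf
                    }
                    foreach (combineTrees($left, $right) as $tree) {
                        if ($store($tree, $i)) {
                            return $best;
                        }
                    }
                }
            }
        }
    }
    return $best; // no exact solution: the closest one found
}

$solution = closestExpression([3, 6, 9, 5, 10, 100], 350);
echo $solution['expr'], ' = ', $solution['value'], "\n";
```

For the example from the question this prints an expression whose total is 350; which of the possible expressions is found first depends on the enumeration order.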

Generate sequence on digits. Sequences should not be any similar

I'd like to generate a long list of 9-digit sequences.
Let's call them IDs.
Each ID is unique, and the main purpose is to have them all really different. It is unacceptable to have 2 IDs that differ by only 1 or 2 digits in sequence.
Do you have any ideas how to implement this without comparing each newly generated ID with every previously generated one?
Perhaps there is already an algorithm, or a simple MySQL function, to compare how close those strings are?

You could try the following formula for your ID's - you would only need to check that the ID value doesn't already exist in the table (salt is a constant between 0 and 100 that doesn't ever change once you pick a value - I would recommend using a prime number, and definitely not 0):
ID = random integer * 101 + salt;
This generates ID values like the following (for salt = 73):
469956305
017775467
001195913
913620520
156482807
577463533
470183959
049290800
078643925
141526626
If you take any two of these ID values and compare them, you'll notice that no two numbers differ by only one or two digits in sequence. I wrote a script to compare all possible ID values between 0 and 3000000, and there were no two ID values of this form differing by 1 or 2 digits in sequence. If you want to test it out yourself, here's the script I used (in C#): http://ideone.com/lFHnlX - I reduced the upper limit because of timeout on IDEone.
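A minimal sketch of the formula in PHP, with hypothetical helper names and the salt 73 from the example. Any two such IDs differ by a multiple of 101, so changing a single digit, or two adjacent digits, can never turn one valid ID into another. As the answer says, you still need to check that the ID does not already exist.

```php
<?php
// Sketch only: the constants and helpers are hypothetical names.
const ID_SALT = 73;        // constant picked once, never changed afterwards
const ID_MAX = 999999999;  // largest 9-digit value

// ID = n * 101 + salt, left-padded with zeros to 9 digits.
function makeId(int $n, int $salt = ID_SALT): string
{
    return str_pad((string) ($n * 101 + $salt), 9, '0', STR_PAD_LEFT);
}

function randomId(int $salt = ID_SALT): string
{
    // The upper bound keeps the result within 9 digits.
    return makeId(random_int(0, intdiv(ID_MAX - $salt, 101)), $salt);
}

echo makeId(0), "\n"; // 000000073
echo makeId(1), "\n"; // 000000174
echo randomId(), "\n";
```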

You want to get away with not-checking for uniqueness and you don't want IDs to be similar? Then you're really looking for UUIDs/GUIDs.
MySQL's built-in uuid() function will get you there.
As Robert Harvey points out, UUIDs are alphanumeric (not numeric) and longer than 9 characters, but you're going to have to sacrifice something – you cannot satisfy all of your constraints simultaneously.

PHP: Compare two sets of numbers, no dupes

I'm creating a lottery contest for my site, and I need to know the easiest way to compare numbers, so that no two people can choose the same numbers. Each entry is a set of 7 numbers, and each number is between 1 and 30.
For example, if user 1 chooses: 1, 7, 9, 17, 22, 25, 29 how can I make sure that user 2 can't choose those exact same numbers?
I was thinking about throwing all 7 numbers into an array, sort it so the numbers are in order, then join them into one string. Then when another user chooses their 7 numbers, it does the same, then compares the two. Is there a better way of doing it?

What you describe sounds like the best way to me, IF you are dealing with all submissions in the same script - I would trim(implode(',',$array)) the sorted array, store the resulting string in an array and call in_array() to determine whether the value already exists.
HOWEVER I suspect that what you are actually doing is storing the selections in a database table and comparing later submissions against this table. In this case (I am taking a liberty and assuming MySQL here, but I would say it is the most common engine used with PHP) you should create a table with 7 columns choice_1, choice_2 ... choice_7 (along with whatever other columns you want) and create a unique index across all seven choice_* columns. This means that when you try to insert a duplicate row, the query will fail. This lets MySQL do all the work for you. Note that the numbers must be stored in sorted order for the index to catch the same selection entered in a different order.
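The in-script variant from the first paragraph of this answer might be sketched like this. Both function names are made up for illustration, and `$taken` stands for whatever list of earlier selections the script keeps.

```php
<?php
// Sketch only: selectionKey() and registerSelection() are hypothetical helpers.
function selectionKey(array $numbers): string
{
    // Sort so that the same numbers in a different order give the same key.
    sort($numbers, SORT_NUMERIC);
    return implode(',', $numbers);
}

// Returns false if the selection is already taken, otherwise records it.
function registerSelection(array $numbers, array &$taken): bool
{
    $key = selectionKey($numbers);
    if (in_array($key, $taken, true)) {
        return false;
    }
    $taken[] = $key;
    return true;
}

$taken = [];
var_dump(registerSelection([1, 7, 9, 17, 22, 25, 29], $taken)); // bool(true)
var_dump(registerSelection([29, 25, 22, 17, 9, 7, 1], $taken)); // bool(false)
```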

Try array_diff. There are some really good examples on php.net.

"Cluster analysis" with MySQL

This is a tough one. There is probably a name for this and I don't know it, so I'll describe the problem exactly.
I have a dataset including a number of user-submitted values. I need to be able to determine based on some sort of average, or better, a "closeness of data", which value is the correct value. For example, if I received the following three submissions from three users, 4, 10, 3, I would know that 3 or 4 would be the "correct" value in this case. If I were to average it out, I'd get about 5.67, which is not the intended result.
I'm attempting to do this using MySQL and PHP.
tl;dr Need to find a value from a dataset based on "closeness" of relative values (using MySQL/PHP)
Thanks!

Clustering using a database isn't going to be a single query type of procedure. It takes iterations to generate the clusters effectively.
You first need to decide how many clusters you want. If you wanted only one cluster, then obviously everything would go into it. If you want two, then you can write your program to separate the nodes into two groups using some sort of correlation metric.
In other words, I don't think this is a MySQL question so much as a clustering question.

I think that is the kind of thing you're looking for:
SELECT id, MIN(ABS(id - (SELECT AVG(id) FROM table))) as min
FROM table
GROUP BY id
ORDER BY min
LIMIT 1;
For example, if your data set contains the IDs 3, 4, 10, the average is 5.6667 and the closest value to it is 4. If your data set is 3, 6, 10, 14, the average is 8.25 and the closest value is 10.
This is what this query returns. Hope it helps.
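The same "closest to the average" idea can be sketched in PHP for values that are already loaded into an array. `closestToAverage()` is a hypothetical name, and a non-empty array of numbers is assumed.

```php
<?php
// Sketch only: closestToAverage() is a hypothetical helper.
function closestToAverage(array $values)
{
    $average = array_sum($values) / count($values);

    $closest = null;
    foreach ($values as $value) {
        // Keep the value with the smallest distance to the average.
        if ($closest === null || abs($value - $average) < abs($closest - $average)) {
            $closest = $value;
        }
    }
    return $closest;
}

echo closestToAverage([3, 4, 10]), "\n";     // 4
echo closestToAverage([3, 6, 10, 14]), "\n"; // 10
```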

I have the impression you are looking for the median.
E.g. in the list 1 2 3 4 100, the median (central value) is 3.
You may want to search for "finding the median in SQL": https://stackoverflow.com/search?q=sql+median
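For values already fetched into PHP, the median can be computed with a few lines. This is a sketch with a hypothetical function name; it assumes a non-empty array and averages the two middle values when the count is even.

```php
<?php
// Sketch only: median() is a hypothetical helper.
function median(array $values)
{
    sort($values, SORT_NUMERIC);
    $count = count($values);
    $middle = intdiv($count, 2);

    if ($count % 2 === 1) {
        return $values[$middle]; // odd count: the central value
    }
    // Even count: average of the two central values.
    return ($values[$middle - 1] + $values[$middle]) / 2;
}

echo median([1, 2, 3, 4, 100]), "\n"; // 3
echo median([4, 10, 3]), "\n";        // 4
```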

Permutations of Varying Size

I'm trying to write a function in PHP that gets all permutations of all possible sizes. I think an example would be the best way to start off:
$my_array = array(1,1,2,3);
Possible permutations of varying size:
1
1 // * See Note
2
3
1,1
1,2
1,3
// And so forth, for all the sets of size 2
1,1,2
1,1,3
1,2,1
// And so forth, for all the sets of size 3
1,1,2,3
1,1,3,2
// And so forth, for all the sets of size 4
Note: I don't care if there's a duplicate or not. For the purposes of this example, all future duplicates have been omitted.
What I have so far in PHP:
function getPermutations($my_array){
    $permutation_length = 1;
    $keep_going = true;
    while($keep_going){
        while($there_are_still_permutations_with_this_length){
            // Generate the next permutation and return it into an array
            // Of course, the actual important part of the code is what I'm having trouble with.
        }
        $permutation_length++;
        if($permutation_length>count($my_array)){
            $keep_going = false;
        }
        else{
            $keep_going = true;
        }
    }
    return $return_array;
}
The closest thing I can think of is shuffling the array, picking the first n elements, seeing if it's already in the results array, and if it's not, add it in, and then stop when there are mathematically no more possible permutations for that length. But it's ugly and resource-inefficient.
Any pseudocode algorithms would be greatly appreciated.
Also, for super-duper (worthless) bonus points, is there a way to get just 1 permutation with the function but make it so that it doesn't have to recalculate all previous permutations to get the next?
For example, I pass it a parameter 3, which means it's already done 3 permutations, and it just generates number 4 without redoing the previous 3? (Passing it the parameter is not necessary, it could keep track in a global or static).
The reason I ask this is because as the array grows, so does the number of possible combinations. Suffice it to say that one small data set with only a dozen elements grows quickly into the trillions of possible combinations and I don't want to task PHP with holding trillions of permutations in its memory at once.

Sorry no php code, but I can give you an algorithm.
It can be done with small amounts of memory and since you don't care about dupes, the code will be simple too.
First: Generate all possible subsets.
If you view a subset as a bit vector, you can see that there is a 1-1 correspondence between subsets and binary numbers.
So if your array had 12 elements, you will have 2^12 subsets (including empty set).
So to generate a subset, you start with 0 and keep incrementing till you reach 2^12. At each stage you read the set bits in the number to get the appropriate subset from the array.
Once you get one subset, you can now run through its permutations.
The next permutation (of the array indices, not the elements themselves) can be generated in lexicographic order like here: http://www.de-brauwer.be/wiki/wikka.php?wakka=Permutations and can be done with minimal memory.
You should be able to combine these two to give yourself a next_permutation function. Instead of passing in numbers, you could pass in an array of 12 elements which contains the previous permutation, plus possibly some more info (little memory again) on whether you need to go to the next subset etc.
You should actually be able to find very fast algorithms which use minimal memory, provide a next_permutation type feature and do not generate dupes: Search the web for multiset permutation/combination generation.
Hope that helps. Good luck!
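Since the answer gives no PHP, here is a rough sketch of the two parts combined: subsets read off the bits of a counter, and a lexicographic next-permutation step applied to the array indices. The function names are hypothetical, and the order differs from the question's (results are grouped by subset, not by size). Duplicates are produced when the input contains repeated values, which the question allows.

```php
<?php
// Sketch only: both function names are hypothetical.

// Rearranges $a into the next lexicographic permutation; false when done.
function nextPermutation(array &$a): bool
{
    $i = count($a) - 2;
    while ($i >= 0 && $a[$i] >= $a[$i + 1]) {
        $i--;
    }
    if ($i < 0) {
        return false; // $a was the last permutation
    }
    $j = count($a) - 1;
    while ($a[$j] <= $a[$i]) {
        $j--;
    }
    [$a[$i], $a[$j]] = [$a[$j], $a[$i]];

    // Reverse the tail after position $i.
    $tail = array_splice($a, $i + 1);
    $a = array_merge($a, array_reverse($tail));
    return true;
}

// Yields every permutation of every non-empty subset, one at a time.
function permutationsOfAllSizes(array $items): Generator
{
    $items = array_values($items);
    $n = count($items);
    for ($mask = 1; $mask < (1 << $n); $mask++) {
        // The set bits of $mask select the indices of the subset.
        $indices = [];
        for ($bit = 0; $bit < $n; $bit++) {
            if ($mask & (1 << $bit)) {
                $indices[] = $bit;
            }
        }
        // Permute the indices, not the elements themselves.
        do {
            yield array_map(fn($index) => $items[$index], $indices);
        } while (nextPermutation($indices));
    }
}

foreach (permutationsOfAllSizes([1, 2]) as $permutation) {
    echo implode(',', $permutation), "\n";
}
```

Because a generator is used, only the current permutation is held in memory. For the array 1,1,2,3 from the question it yields 4 + 12 + 24 + 24 = 64 results.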

The best set of functions I've come across was provided by a user in the comments on the shuffle function page on php.net. It works pretty well.
Hope it's useful.

The problem seems to be trying to give an index to every permutation and having a constant access time. I cannot think of a constant time algorithm, but maybe you can improve this one to be so. This algorithm has a time complexity of O(n) where n is the length of your set. The space complexity should be reducible to O(1).
Assume our set is 1,1,2,3 and we want the 10th permutation. Also, note that we will index each element of the set from 0 to 3. Going by your order, this means the single element permutations come first, then the two element, and so on. We are going to subtract from the number 10 until we can completely determine the 10th permutation.
First up are the single element permutations. There are 4 of those, so we can view this as subtracting one four times from 10. We are left with 6, so clearly we need to start considering the two element permutations. There are 12 of these, and we can view this as subtracting three up to four times from 6. We discover that the second time we subtract 3, we are left with 0. This means the indexes of our permutation must be 2 (because we subtracted 3 twice) and 0, because 0 is the remainder. Therefore, our permutation must be 2,1.
Division and modulus may help you.
If we were looking for the 12th permutation, we would run into the case where we have a remainder of 2. Depending on your desired behavior, the permutation 2,2 might not be valid. Getting around this is very simple, however, as we can trivially detect that the indexes 2 and 2 (not to be confused with the element) are the same, so the second one should be bumped to 3. Thus the 12th permutation can trivially be calculated as 2,3.
The biggest confusion right now is that the indexes and the element values happen to match up. I hope my algorithm explanation is not too confusing because of that. If it is, I will use a set other than your example and reword things.

Inputs: Permutation index k, indexed set S.
Pseudocode:
L = {S_1}
for i = 2 to |S| do
Insert S_i before L_{k % i}
k <- k / i
loop
return L
This algorithm can also be easily modified to work with duplicates.
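The pseudocode above translates to PHP almost line by line. This is a sketch with a hypothetical function name. It covers full-length permutations only, reads S_1 as the first element, and treats an insert position equal to the current length of L as "append".

```php
<?php
// Sketch only: permutationFromIndex() is a hypothetical helper.
function permutationFromIndex(array $set, int $k): array
{
    $set = array_values($set);
    if ($set === []) {
        return [];
    }

    $result = [$set[0]]; // L = {S_1}
    for ($i = 2; $i <= count($set); $i++) {
        // Insert S_i before position k % i of L.
        array_splice($result, $k % $i, 0, [$set[$i - 1]]);
        // k <- k / i (integer division)
        $k = intdiv($k, $i);
    }
    return $result;
}

echo implode(',', permutationFromIndex([1, 2, 3], 0)), "\n"; // 3,2,1
echo implode(',', permutationFromIndex([1, 2, 3], 5)), "\n"; // 1,2,3
```

For a set of n distinct elements, the indices 0 to n!-1 each give a different permutation, so a single permutation can be produced without computing the earlier ones.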
