PHP: Permutations of 26 letters in X characters

I have an alphabet array with 26 characters A..Z .
I am searching for a performant algorithm that lists all permutations that fill an array of length X without any repeating characters.
Examples:
X=3 . Target array: _ _ _
Permutations are A B C until Z Y X .
X=4 . Target array: _ _ _ _
Permutations are A B C D until Z Y X W
X=5 . Target array: _ _ _ _ _
Permutations are A B C D E until Z Y X W V
(Sorry, I don't know how this kind of algorithm is named)
Thanks in advance.
Code in C, Delphi or Java is also OK, since it can be easily translated.

A simple solution is a recursive one:
#include <stdio.h>

char current_combination[27]; /* zero-initialized global, so always NUL-terminated */
int char_used[26];            /* 1 if the letter is already used in the prefix */

void enumerate(int i, int n)
{
    for (int j = 0; j < 26; j++)
    {
        if (!char_used[j])
        {
            char_used[j] = 1;
            current_combination[i] = 'A' + j;
            if (i + 1 == n)
            {
                puts(current_combination);
            }
            else
            {
                enumerate(i + 1, n);
            }
            char_used[j] = 0;
        }
    }
}
The function above accepts the index i of the character to be computed and the total number n of characters in a combination (the code assumes i<n). It keeps the current combination and the array of flags for already used characters in globals to avoid copying them around.
To generate, for example, all combinations of length 5, call enumerate(0, 5).
Note that the total number of combinations grows very fast. For example, for n=6 there are 26*25*24*23*22*21 = 165,765,600 combinations, which is more than 1 GB of output.
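Since the question asks for PHP, the same recursion can be sketched there as a generator, so the results are produced one at a time instead of being held in memory. This is only a rough translation under assumptions: the name permutationsOfLength() is made up for illustration, and the alphabet is assumed to consist of single-character strings.

```php
<?php
// Hypothetical helper (not from the answer above): lazily yields every
// arrangement of $n distinct letters taken from $alphabet.
function permutationsOfLength(array $alphabet, int $n, string $prefix = '', array $used = []): Generator
{
    if (strlen($prefix) === $n) {
        yield $prefix;
        return;
    }
    foreach ($alphabet as $idx => $letter) {
        if (isset($used[$idx])) {
            continue; // this letter is already part of the prefix
        }
        $used[$idx] = true;
        yield from permutationsOfLength($alphabet, $n, $prefix . $letter, $used);
        unset($used[$idx]);
    }
}

$count = 0;
foreach (permutationsOfLength(range('A', 'Z'), 3) as $p) {
    $count++;
}
echo $count, "\n"; // 26 * 25 * 24 = 15600
```

Iterating with foreach keeps memory use flat; collecting the results with iterator_to_array() needs its second argument set to false, because yield from reuses keys.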

I'd take a simple brute-force approach, though be aware that the number of permutations gets large quickly: it is 26!/(26-x)!, which gives 15,600 permutations for x=3 and 7,893,600 for x=5. Basically you could just loop through all the values with loops nested inside loops. Unfortunately that is O(n^x), where n is the alphabet size and x is the number of characters, because each extra level of nesting multiplies the work.
Something to consider is how finely you want to examine complexity here. For example, you could be clever in the first pair of loops to avoid duplicates, but the third loop becomes trickier. If you started with a list of 26 letters and removed the ones already used, the last loop would be a simple iteration, since you know it contains no duplicates. That can be expensive in memory, though, because the list has to be copied on each pass of the outer loop. Thus the first time you'd go through AB_, then AC_, and so forth, and with thousands of list copies one could wonder whether that is really more efficient than doing comparisons.
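To get a feel for the sizes involved, the count 26!/(26-x)! can be computed as a falling product rather than with two factorials. The helper name countArrangements() is an assumption made for this sketch, and it relies on PHP's 64-bit integers, so it is only meant for small x.

```php
<?php
// Hypothetical helper: n! / (n - x)! computed as n * (n-1) * ... * (n-x+1).
function countArrangements(int $n, int $x): int
{
    $total = 1;
    for ($i = 0; $i < $x; $i++) {
        $total *= ($n - $i);
    }
    return $total;
}

echo countArrangements(26, 3), "\n"; // 15600
echo countArrangements(26, 5), "\n"; // 7893600
echo countArrangements(26, 6), "\n"; // 165765600
```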

Are you sure you want to see all permutations? If you have X=3, you will have 26*25*24 = 15,600 combinations. And if X=5, the number of combinations is 7,893,600.
One way is to randomly select a letter (or array index), store it, and on each iteration check whether that letter (or index) was already selected in a previous iteration. After X picks you have a random sequence of X characters, which you also need to store. Then repeat the whole process, each time checking whether the sequence you have just generated is already among the stored ones.
Or you could use direct enumeration.
Sorry for my unsatisfactory English. I tried to be clear.
I hope this is useful.

Related

Getting every combination of X numbers given Y numbers?

I've come to a mathematical problem for which I can't program the logic.
Let me explain it with an example:
Let's say I have 4 holes and 3 marbles, the holes are in order and my marbles are A, B and C and also in order.
I need to get every possible ORDERED combination:
ABC4
AB3C
A2BC
1ABC
This is very simple, but what if the number of holes changes? Let's say now I have 5 holes.
ABC45
AB3C5
A2BC5
1ABC5
AB34C
A2B4C
1AB4C
A23BC
1A3BC
12ABC
Now let's say we have 5 holes and 4 marbles.
ABCD5
ABC4D
AB3CD
A2BCD
1ABCD
And this can be any number of holes and any number of marbles.
The number of combinations is given by:
$combinations = factorial($number_of_holes) / (factorial($number_of_marbles) * factorial($number_of_holes - $number_of_marbles));
(Here is the factorial function in case you need it)
function factorial($number) {
    if ($number < 2) {
        return 1;
    } else {
        return ($number * factorial($number - 1));
    }
}
What I need and can't figure out how to program, is a function or a loop or something, that returns an array with the position of the holes, given X numbers of holes and Y number of marbles.
For the first example it would be: [[4],[3],[2],[1]], for the second: [[4,5],[3,5],[2,5],[1,5],[3,4],[2,4],[1,4],[2,3],[1,3],[1,2]], for the third: [[5],[4],[3],[2],[1]].
It doesn't have to be returned in order, I just need all the elements.
As you can see, another approach is the complementary or inverse or don't know how to call it, but the solution is every combinations of X number of free holes given Y number of holes, so, If I have 10 holes, and 5 marbles, there would be 5 free holes, the array returned would be every combination of 5 that can be formed with (1,2,3,4,5,6,7,8,9,10), which are 252 combinations, and what I need is the 252 combinations.
Examples for the 2nd approach:
Given an array=[1,2,3,4], return every combination for sets of 2 and 3.
Sets of 2
[[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]]
Sets of 3
[[1,2,3],[1,2,4],[1,3,4],[2,3,4]]
What I need is the logic to do this, I'm trying to do it in PHP, but I just can't figure out how to do it.
The function would receive the array and the set size and would return the array of sets:
function getCombinations($array, $setsize){
    //magic code which I can't figure out
    return $sets; // array of sets
}
I hope this is clear enough and someone can help me, I've been stuck for several days now, but it seems to be just too much for me to handle by myself.
This post, PHP algorithm to generate all combinations of a specific size from a single set, is for all possible combinations, repeating the elements and order doesn't matter, its a good lead, I did read it, but it doesn't solve my problem, it's very different. I need them without repeating the elements and ordered as explained.
Let's say if I have already a set of [3,4] in my array, I don't want [4,3] as an other set.

Here's a recursive solution in PHP:
function getCombinations($array, $setsize){
    if($setsize == 0)
        return [[]];

    // generate combinations including the first element by generating combinations for
    // the remainder of the array with one less element and prepending the first element:
    $sets = getCombinations(array_slice($array, 1), $setsize - 1);
    foreach ($sets as &$combo) {
        array_unshift($combo, $array[0]);
    }
    unset($combo); // drop the dangling reference left behind by foreach

    // generate combinations not including the first element and add them to the list:
    if(count($array) > $setsize)
        $sets = array_merge($sets, getCombinations(array_slice($array, 1), $setsize));

    return $sets;
}

// test:
print_r(getCombinations([1, 2, 3, 4], 3));
Algorithm works like this:
If setsize is 0 then you return a single, empty combination
Otherwise, generate all combinations that include the first element, by recursively generating all combinations of the array excluding the first element with setsize - 1 elements, and then prepending the first element to each of them.
Then, if the array size is greater than setsize (meaning including the first element is not compulsory), generate all the combinations for the rest of the list and add them to the ones we generated in the second step.
So basically at each step you need to consider whether an element will be included or excluded in the combination, and merge together the set of combinations representing both choices.
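To tie this back to the holes and marbles, the combinations of free holes can be turned into the layouts listed in the question. The sketch below restates the include/exclude recursion in a compact form (with an extra guard for arrays that are too short) and adds a helper, layoutsFor(), whose name and output format are assumptions made for illustration.

```php
<?php
// Compact restatement of the include/exclude recursion described above.
function getCombinations(array $array, int $setsize): array
{
    if ($setsize === 0) {
        return [[]];
    }
    if (count($array) < $setsize) {
        return []; // not enough elements left to fill the set
    }
    $first = $array[0];
    $rest = array_slice($array, 1);
    $sets = [];
    foreach (getCombinations($rest, $setsize - 1) as $combo) {
        $sets[] = array_merge([$first], $combo); // combinations that include $first
    }
    return array_merge($sets, getCombinations($rest, $setsize)); // plus those that exclude it
}

// Hypothetical helper: a free hole keeps its number, every other hole
// receives the next marble in order.
function layoutsFor(int $holes, array $marbles): array
{
    $layouts = [];
    $free = $holes - count($marbles);
    foreach (getCombinations(range(1, $holes), $free) as $freeHoles) {
        $row = '';
        $next = 0;
        for ($h = 1; $h <= $holes; $h++) {
            $row .= in_array($h, $freeHoles, true) ? (string) $h : $marbles[$next++];
        }
        $layouts[] = $row;
    }
    return $layouts;
}

print_r(layoutsFor(4, ['A', 'B', 'C'])); // 1ABC, A2BC, AB3C, ABC4
```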

Two-way hashing of fixed range numbers

I need to create a function which takes a single integer as argument in the range 0-N and returns a seemingly random number in the same range.
Each input number should always have exactly one output and it should always be the same.
Such a function would produce something like this:
f(1) = 4
f(2) = 1
f(3) = 5
f(4) = 2
f(5) = 3
I believe this could be accomplished by some kind of a hashing algorithm? I don't need anything complex, just not something too simple like f(1) = 2, f(2) = 3 etc.
The biggest issue is that I need this to be reversible. E.g. the above table should be true left-to-right as well as right-to-left, using a different function for the right-to-left conversion is fine.
I know the easiest way is to create an array, shuffle it and just store the relations in a db or something, but as I need N to be quite large I'd like to avoid this if possible.
Edit: For my particular case N is a specific number, it's exactly 16777216 (64^4).

If the range is always a power of two -- like [0,16777216) -- then you can use exclusive-or just as @MarkBaker suggested. It just doesn't work so easily if your range is not a power of two.
You can use addition and subtraction modulo N, although these alone are too obvious, so you have to combine it with something else.
You can also do multiplication modulo-N, but reversing that is complicated. To make it simpler, we can isolate the bottom eight bits and multiply those and add them in a way that doesn't interfere with those bits so we can use them again to reverse the operation.
I don't know PHP so I'm going to give an example in C, instead. Maybe it's the same.
int enc(int x) {
    x = x + 4799 * 256 * (x % 256);
    x = x + 8896843;
    x = x ^ 4777277;
    return (x + 1073741824) % 16777216;
}
And to decode, play the operations back in reverse order:
int dec(int x) {
    x = x + 1073741824;
    x = x ^ 4777277;
    x = x - 8896843;
    x = x - 4799 * 256 * (x % 256);
    return x % 16777216;
}
That 1073741824 must be a multiple of N, and 256 must be a factor of N, and if N is not a power of two then you can't (necessarily) use exclusive-or (^ is exclusive-or in C and I assume in PHP too). The other numbers you can fiddle with, and add and remove stages, at your leisure.
The addition of 1073741824 in both functions is to ensure that x stays positive; this is so that the modulo operation doesn't ever give a negative result, even after we've subtracted values from x which might have made it go negative in the interim.
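For reference, a direct PHP port of the two C functions might look like the following. It is a sketch that assumes a 64-bit PHP build; the constant names RANGE_N and OFFSET are introduced here for readability and are not part of the C version.

```php
<?php
const RANGE_N = 16777216;   // 64^4 = 2^24, the range from the question
const OFFSET = 1073741824;  // a multiple of RANGE_N; keeps intermediate values positive

function enc(int $x): int
{
    $x = $x + 4799 * 256 * ($x % 256); // only touches bits above the low byte
    $x = $x + 8896843;
    $x = $x ^ 4777277;
    return ($x + OFFSET) % RANGE_N;
}

function dec(int $x): int
{
    $x = $x + OFFSET;
    $x = $x ^ 4777277;
    $x = $x - 8896843;
    $x = $x - 4799 * 256 * ($x % 256); // the low byte is the original one again
    return $x % RANGE_N;
}

var_dump(dec(enc(12345)) === 12345); // bool(true)
```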

I offered to describe how I "randomly" scramble up 9-digit SSNs when producing research data sets. This does not replace or hash an SSN. It re-orders the digits. It is difficult to put the digits back in the correct order if you don't know the order in which they were scrambled. I have a gut feeling that this is not what the questioner really wants. So, I am happy to delete this answer if it is deemed off-topic.
I know that I have 9 digits. So, I start with an array that has 9 index values in order:
$a = array(0,1,2,3,4,5,6,7,8);
Now, I need to turn a key that I can remember into a way to shuffle the array. The shuffling has to be the same order for the same key every time. I use a couple tricks. I use crc32 to turn a word into a number. I use srand/rand to get a predictable order of random values. Note: mt_rand no longer produces the same sequence of random digits with the same seed, so I have to use rand.
srand(crc32("My secret key"));
usort($a, function($a, $b) { return rand(-1,1); });
The array $a still has the digits 0 through 8, but they are shuffled. If I use the same keyword I will get the same shuffled order every time. That lets me repeat this every month and get the same result. Then, with a shuffled array, I can pick the digits off the SSN. First, I ensure it has 9 characters (some SSNs are sent as integers and a leading 0 is omitted). Then, I build a masked SSN by picking the digits using $a.
$ssn = str_pad($ssn, 9, '0', STR_PAD_LEFT);
$masked_ssn = '';
foreach($a as $i) $masked_ssn .= $ssn[$i]; // square brackets: the $ssn{$i} offset syntax was removed in PHP 8
$masked_ssn will now have all the digits in $ssn, but in a different order. Technically, there are keywords that make $a become the original ordered array after shuffling, but that is very very rare.
Hopefully this makes sense. If so, you can do it all much faster. If you turn the original string into an array of characters, you can shuffle the array of characters. You just need to reseed rand every time.
$ssn = "111223333"; // Assume I'm using a proper 9-digit SSN
$a = str_split($ssn);
srand(crc32("My secret key"));
usort($a, function($a, $b) { return rand(-1,1); });
$masked_ssn = implode('', $a);
This is not really faster in a runtime way because rand is a rather expensive function and you run rand a hell of lot more here. If you are masking thousands of values as I do, you will want to use an index array that is shuffled just once, not a shuffling for every value.
Now, how do I undo it? Assume I'm using the first method with the index array. It will be something like $a = [5, 3, 6, 1, 0, 2, 7, 8, 4]. Those are the indexes for the original SSN in the masked order. So, I can easily build the original SSN.
$ssn = '000000000'; // I like to define all 9 characters before I start
foreach($a as $i=>$j) $ssn[$j] = $masked_ssn[$i];
As you can see, $i counts from 0 to 8 across the masked SSN. $j counts 5, 3, 6... and puts each value from the masked SSN in the correct place in the original SSN.
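The index-array variant, with masking and unmasking side by side, could be sketched as below. Note the swap: instead of the usort/rand trick it uses an explicit Fisher-Yates shuffle seeded through mt_srand(), which is an assumption of this sketch rather than the method described above, and the resulting order may differ between PHP versions. The function names are hypothetical.

```php
<?php
// Hypothetical helpers, not the code from the answer above.
// Builds a repeatable index order from a key (Fisher-Yates with a seeded mt_rand).
function keyedOrder(string $key, int $length): array
{
    mt_srand(crc32($key)); // note: reseeds PHP's global generator
    $order = range(0, $length - 1);
    for ($i = $length - 1; $i > 0; $i--) {
        $j = mt_rand(0, $i);
        [$order[$i], $order[$j]] = [$order[$j], $order[$i]];
    }
    return $order;
}

// Picks the characters of $value in the shuffled order.
function maskDigits(string $value, array $order): string
{
    $masked = '';
    foreach ($order as $sourceIndex) {
        $masked .= $value[$sourceIndex];
    }
    return $masked;
}

// Puts each masked character back at its original position.
function unmaskDigits(string $masked, array $order): string
{
    $original = str_repeat('0', count($order));
    foreach ($order as $maskedIndex => $sourceIndex) {
        $original[$sourceIndex] = $masked[$maskedIndex];
    }
    return $original;
}

$order = keyedOrder('My secret key', 9);
$ssn = str_pad('12345678', 9, '0', STR_PAD_LEFT);
$masked = maskDigits($ssn, $order);
var_dump(unmaskDigits($masked, $order) === $ssn); // bool(true)
```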

It looks like you've got a good answer, but there is still an alternative. A Linear Congruential Generator (LCG) can provide a 1-to-1 mapping, and it is known to be reversible using Euclid's algorithm. For 24 bits:
X_i = (A * X_(i-1) + C) mod M
where M = 2^24 = 16,777,216
A = 16,598,013
C = 12,820,163
For LCG reversibility, take a look at Reversible pseudo-random sequence generator
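A minimal sketch of how such a mapping and its inverse could look in PHP, using the constants above. The inverse multiplies by the modular inverse of A, found with the extended Euclidean algorithm; that inverse exists because A is odd and M is a power of two. The function and constant names are assumptions, and 64-bit integers are assumed so that A * x does not overflow.

```php
<?php
const LCG_M = 16777216; // 2^24
const LCG_A = 16598013;
const LCG_C = 12820163;

// Extended Euclid: returns a^-1 mod m, assuming gcd(a, m) = 1.
function modInverse(int $a, int $m): int
{
    [$oldR, $r] = [$a, $m];
    [$oldS, $s] = [1, 0];
    while ($r !== 0) {
        $q = intdiv($oldR, $r);
        [$oldR, $r] = [$r, $oldR - $q * $r];
        [$oldS, $s] = [$s, $oldS - $q * $s];
    }
    if ($oldR !== 1) {
        throw new InvalidArgumentException('a is not invertible modulo m');
    }
    return (($oldS % $m) + $m) % $m;
}

function lcgForward(int $x): int
{
    return (LCG_A * $x + LCG_C) % LCG_M;
}

function lcgBackward(int $y): int
{
    $diff = ($y - LCG_C) % LCG_M;
    if ($diff < 0) {
        $diff += LCG_M; // PHP's % keeps the sign of the dividend
    }
    return (modInverse(LCG_A, LCG_M) * $diff) % LCG_M;
}

var_dump(lcgBackward(lcgForward(12345)) === 12345); // bool(true)
```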

Fastest method for determining all substrings of existing string

Let's say I have the string "Hey". I would like to determine all combinations of characters that exist in this string as fast as possible. The resulting algorithm should generate this:
H, e, y, He, ey, Hey
The algorithm should not produce the string "Hy" because it does not exist in the string as a substring.

There are O(n^2) of those substrings, of lengths [1,n], so any algorithm to generate all of them will be O(n^2) * O(n) = O(n^3):
(*) See Edit2 at the end - depending on the implementation of the string - the complexity can vary from O(n^2) to O(n^3)
Pseudo code:
result <- {}  # result is a set if duplicates should be eliminated; otherwise it is a multiset
for i from 0 to s.length:
    for j from i+1 to s.length:
        result.add(s.substring(i, j))
return result
Note however, that you can do some "cheating", by creating an iterator and generate the substrings on the fly, it should look something like this [pseudo code]:
class MyIterator:
    String s
    int i, j
    MyIterator(String s):
        this.s = s
        i = 0
        j = 0
    next():
        j = j + 1
        if (j > s.length):    # substring(i, j) excludes j, so j may equal s.length
            i = i + 1
            j = i + 1
        if (i >= s.length):
            throw exception
        return s.substring(i, j)
Note that creating the iterator is O(1), and each iteration is O(n) - but to actually produce all the elements, you need O(n^2) steps, so complexity remains O(n^3) overall, but you decrease the latency of your application.
EDIT:
I edited the complexity: claiming it is O(n^2) was wrong. The complexity is O(n^3), since you need to generate strings of variable lengths and some of them are long. A constant fraction of the generated substrings (roughly a quarter) have length at least n/2, so the total complexity is Theta(n^3).
EDIT2:
In some cases it can actually be O(n^2), depending on the string implementation. In older versions of Java, for example (before Java 7 update 6), a substring shared the parent's char[] and only "played" with the offset and length, so creating a substring was O(1) and creating all of them was O(n^2). Newer Java versions copy the characters, which brings it back to O(n^3).
In C, however, it is O(n^3), since every substring needs to be copied to a different char[].
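In PHP the double loop from the pseudo code could be sketched as follows. substr() copies the characters, so the O(n^3) bound applies here. The function name allSubstrings() is an assumption for this sketch.

```php
<?php
// Hypothetical helper: returns every contiguous substring of $s,
// ordered by start position and then by length.
function allSubstrings(string $s): array
{
    $result = [];
    $n = strlen($s);
    for ($i = 0; $i < $n; $i++) {
        for ($len = 1; $i + $len <= $n; $len++) {
            $result[] = substr($s, $i, $len);
        }
    }
    return $result;
}

print_r(allSubstrings('Hey')); // H, He, Hey, e, ey, y
```

Duplicates are kept (for example in "aa"); wrapping the result in array_unique() gives the set variant.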

Check for the implementation of n-grams in php.
In your example string: Hey
H, E, Y are unigrams
HE, EY are bigrams
HEY is a trigram

Multiple foreach with over 37 million possibilities

I've been tasked with creating a list of all possibilities using data in 8 blocks.
The 8 blocks have the following number of possibilities:
*Block 1: 12 possibilities
*Block 2: 8 possibilities
*Block 3: 8 possibilities
*Block 4: 11 possibilities
*Block 5: 16 possibilities
*Block 6: 11 possibilities
*Block 7: 5 possibilities
*Block 8: 5 possibilities
This gives a potential number of 37,171,200 possibilities.
I tried simply nesting the loops and displaying only the values with the correct string length, like so:
foreach($block1 AS $b1){
    foreach($block2 AS $b2){
        foreach($block3 AS $b3){
            foreach($block4 AS $b4){
                foreach($block5 AS $b5){
                    foreach($block6 AS $b6){
                        foreach($block7 AS $b7){
                            foreach($block8 AS $b8){
                                if (strlen($b1.$b2.$b3.$b4.$b5.$b6.$b7.$b8) == 16)
                                {
                                    echo $b1.$b2.$b3.$b4.$b5.$b6.$b7.$b8.'<br/>';
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
However the execution time was far too long to compute. I was wondering if anyone knew of a simpler way of doing this?

You could improve your algorithm by caching the string prefixes and remember their lengths. Then you don’t have to do that for each combination.
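$len = 16;
// array for remaining characters per level
$r = array($len);
// array of level parts
$p = array();
foreach ($block1 AS &$b1) {
    // skip if already too long (assumes every later block adds at least one character)
    if (($r[0] - strlen($b1)) <= 0) continue;
    $r[1] = $r[0] - strlen($b1);
    $p[0] = $b1; // remember this level's part for the final output
    foreach ($block2 AS &$b2) {
        if (($r[1] - strlen($b2)) <= 0) continue;
        $r[2] = $r[1] - strlen($b2);
        $p[1] = $b2;
        foreach ($block3 AS &$b3) {
            // …
            foreach ($block8 AS &$b8) {
                $r[8] = $r[7] - strlen($b8);
                if ($r[8] == 0) {
                    $p[7] = $b8;
                    echo implode('', $p).'<br/>';
                }
            }
        }
    }
}
Additionally, on older PHP versions, using references in foreach can stop PHP from working on an internal copy of the array. On PHP 7 and later, iterating by value does not duplicate the array unless it is modified, so the references are unlikely to help there.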

You could try to store the precomputed part of the concatenated string known at each of the previous levels for later reuse, which avoids concatenating everything in the innermost loop:
foreach($block7 AS $b7){
    $precomputed7 = $precomputed6.$b7;
    foreach($block8 AS $b8){
        $precomputed8 = $precomputed7.$b8;
        if (strlen($precomputed8) == 16) {
            echo $precomputed8.'<br/>';
        }
    }
}
Do this analogously for the preceding levels. Then you could test at one of the higher loop levels for strings that are already longer than 16 characters, which lets you shortcut and avoid trying out the other possibilities. But beware: calculating the length of the string costs performance, so depending on the input data this second improvement may not be worth it at all.
Another idea is to precalculate the lengths for each block and then recurse on the array of lengths; calculating sums should be faster than concatenating and computing the length of strings. For each vector of indexes whose lengths sum to 16, you can then easily output the full concatenated string.
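The two ideas (reusing the prefix and cutting off branches that can no longer reach the target length) can be combined in one recursive generator. This is a sketch under assumptions: the name matchingConcatenations() is invented here, $blocks is assumed to be a list of non-empty lists of strings, and the pruning uses the smallest and largest total length still obtainable from the remaining blocks.

```php
<?php
// Hypothetical helper: yields every concatenation (one string per block)
// whose total length is exactly $target.
function matchingConcatenations(array $blocks, int $target): Generator
{
    $count = count($blocks);
    // $minRest[$i] / $maxRest[$i]: smallest / largest length obtainable from blocks $i..end
    $minRest = array_fill(0, $count + 1, 0);
    $maxRest = array_fill(0, $count + 1, 0);
    for ($i = $count - 1; $i >= 0; $i--) {
        $lengths = array_map('strlen', $blocks[$i]);
        $minRest[$i] = min($lengths) + $minRest[$i + 1];
        $maxRest[$i] = max($lengths) + $maxRest[$i + 1];
    }

    $walk = function (int $level, string $prefix) use (&$walk, $blocks, $target, $count, $minRest, $maxRest): Generator {
        if ($level === $count) {
            yield $prefix; // the pruning below guarantees the length is exactly $target
            return;
        }
        foreach ($blocks[$level] as $part) {
            $remaining = $target - strlen($prefix) - strlen($part);
            if ($remaining < $minRest[$level + 1] || $remaining > $maxRest[$level + 1]) {
                continue; // the later blocks cannot make up exactly this difference
            }
            yield from $walk($level + 1, $prefix . $part);
        }
    };

    yield from $walk(0, '');
}

$blocks = [['a', 'bb'], ['c', 'ddd'], ['e', 'ff']];
foreach (matchingConcatenations($blocks, 4) as $s) {
    echo $s, "\n"; // prints acff, then bbce
}
```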

Since you have that length requirement of 16 and assuming each (i) possibility in each (b) of the eight blocks has length x_i_b you can get some reduction by some cases becoming impossible.
For example, say we have length requirement 16, but only 4 blocks, with possibilities with lengths indicated
block1: [2,3,4]
block2: [5,6,7]
block3: [8,9,10]
block4: [9,10,11]
Then all of the possibilities are impossible since block 4's lengths are all too large to permit any combination of blocks 1 - 3 of making up the rest of the 16.
Now if your length is really 16, that means that your (possible) lengths range from 1 to 9, assuming no 0 lengths.
I can see two ways of approaching this:
Greedy
Dynamic Programming
Perhaps even combine them. For the greedy approach, pick the biggest possibility in all the blocks, then the next biggest, and so on, and follow that through until you cross your threshold of 16. If you got through all the blocks, you can emit that one.
Whether or not you hit the threshold, you can then iterate through the remaining possibilities.
The dynamic approach means that you store some of the results you have already computed. For example, once a selection from some of the blocks gives you a length of 7, you don't need to recompute it in future; you can iterate through the remaining blocks to see whether you can find a combination that gives you length 9.
EDIT: This is kind of like the knapsack problem, but with the additional restriction of one choice per block per instance. Anyway, in terms of other optimizations, definitely preprocess the blocks into arrays of lengths only and keep a running sum at each iteration level. That way you do only 1 sum per iteration of each loop, rather than 8 sums per iteration. Also, only concatenate strings if you need to emit the selection.
If you don't want a general solution (probably easier if you don't), then you can hand-code a lot of problem-instance-specific speedups by excluding the largest too-small combination of lengths (and all selections smaller than that) and excluding the smallest too-large combination of lengths (and all selections larger).
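As a small illustration of the dynamic-programming idea working on lengths only, the sketch below counts how many selections reach the target length without building any strings. It does not list them; it only shows how partial sums can be reused. The name countMatching() is an assumption.

```php
<?php
// Hypothetical helper: counts the selections (one string per block) whose
// total length equals $target, using only the lengths of the strings.
function countMatching(array $blocks, int $target): int
{
    // $ways[$len] = number of ways to reach total length $len with the blocks seen so far
    $ways = [0 => 1];
    foreach ($blocks as $block) {
        $next = [];
        foreach ($ways as $len => $n) {
            foreach ($block as $part) {
                $total = $len + strlen($part);
                if ($total <= $target) { // longer prefixes can never match
                    $next[$total] = ($next[$total] ?? 0) + $n;
                }
            }
        }
        $ways = $next;
    }
    return $ways[$target] ?? 0;
}

// The four-block example above, rebuilt from its lengths: nothing can sum to 16.
$lengths = [[2, 3, 4], [5, 6, 7], [8, 9, 10], [9, 10, 11]];
$blocks = array_map(fn(array $ls) => array_map(fn(int $l) => str_repeat('x', $l), $ls), $lengths);
echo countMatching($blocks, 16), "\n"; // 0
```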

If you can express this as a nested array, try a RecursiveIteratorIterator, http://php.net/manual/en/class.recursiveiteratoriterator.php

What is the best algorithm to see if my number is in an array of ranges?

I have a 2-dimensional array in PHP containing the ranges, for example:
From.........To
---------------
125..........3957
4000.........5500
5217628......52198281
52272128.....52273151
523030528....523229183
and so on
and it is a very long list. now I want to see if a number given by user is in range.
for example numbers 130, 4200, 52272933 are in my range but numbers 1, 5600 are not.
of course I can count all indexes and see if my number is bigger than first and smaller than second item. but is there a faster algorithm or a more efficient way of doing it using php function?
added later
It is sorted. it is actually numbers created with ip2long() showing all IPs of a country.
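I just wrote some code for it:
$ips[1] = array(2, 20, 100);   // range starts
$ips[2] = array(10, 30, 200);  // range ends
$n = 11; // input ip
$count = count($ips[1]); // number of ranges, not number of rows in $ips
$found = false;
for ($i = 0; $i < $count; $i++) {
    if ($n < $ips[1][$i]) {
        break; // sorted: every later range starts above $n
    }
    if ($n <= $ips[2][$i]) {
        echo "$i found";
        $found = true;
        break;
    }
}
if (!$found) echo "not found";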
in this situation numbers 2,8,22,and 200 are in range. but not numbers 1,11,300

Put the ranges in a flat array, sorted from lower to higher, like this:
a[0] = 125
a[1] = 3957
a[2] = 4000
a[3] = 5500
a[4] = 5217628
a[5] = 52198281
a[6] = 52272128
a[7] = 52273151
a[8] = 523030528
a[9] = 523229183
Then do a binary search to determine at what index of this array the number in question should be inserted. If the insertion index is even then the number is not in any sub-range. If the insertion index is odd, then the number falls inside one of the ranges.
Examples:
n = 20 inserts at index 0 ==> not in a range
n = 126 inserts at index 1 ==> within a range
n = 523030529 inserts at index 9 ==> within a range
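A sketch of that lookup in PHP follows. The function name inRanges() is an assumption, and so is the handling of boundary values: a number equal to one of the stored bounds is treated as inside, because with a plain insertion index the parity rule alone would misclassify either the lower or the upper bound of a range. Ranges are assumed to be sorted and non-overlapping.

```php
<?php
// Hypothetical helper. $bounds is the flat sorted array [from0, to0, from1, to1, ...].
function inRanges(array $bounds, int $n): bool
{
    $lo = 0;
    $hi = count($bounds);
    // binary search for the insertion index: first position whose value is >= $n
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($bounds[$mid] < $n) {
            $lo = $mid + 1;
        } else {
            $hi = $mid;
        }
    }
    // an exact hit on a bound counts as inside (the ranges are inclusive)
    if ($lo < count($bounds) && $bounds[$lo] === $n) {
        return true;
    }
    return $lo % 2 === 1; // odd insertion index: between a "from" and its "to"
}

$bounds = [125, 3957, 4000, 5500, 5217628, 52198281, 52272128, 52273151, 523030528, 523229183];
var_dump(inRanges($bounds, 4200)); // bool(true)
var_dump(inRanges($bounds, 5600)); // bool(false)
```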

You can speed things up by implementing a binary search algorithm. Thus, you don't have to look at every range.
Then you can use in_array to check if the number is in the array.
I'm not sure if I got you right, do your arrays really look like this:
array(125, 126, 127, ..., 3957);
If so, what's the point? Why not just have?
array(125, 3957);
That contains all the information necessary.

The example you give suggests that the numbers may be large and the space sparse by comparison.
At that point, you don't have very many options. If the array is sorted, binary search is about all there is. If the array is not sorted, you're down to plain, old CS101 linear search.

The general-purpose data structure for this problem is an interval tree, which also handles overlapping ranges. For sorted, non-overlapping ranges, a binary search gives the same logarithmic lookup time with less machinery.

I am assuming that the ranges do not overlap.
If that is the case, you can maintain a map data structure that is keyed on the lower value of the range.
Now all you have to do (given the number N) is to find the largest key in the map that is not greater than N (using binary search, so logarithmic complexity) and then check that N is not greater than the upper value of that range.
Basically, it is a binary search (logarithmic) on the constructed map.
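PHP has no built-in sorted map, so one way to sketch this idea is a binary search over a list of [from, to] pairs sorted by their lower value. The name findRange() and the pair layout are assumptions made for illustration; the ranges are assumed not to overlap.

```php
<?php
// Hypothetical helper. $ranges is a list of [from, to] pairs sorted by "from".
// Returns the index of the range containing $n, or null if there is none.
function findRange(array $ranges, int $n): ?int
{
    $lo = 0;
    $hi = count($ranges) - 1;
    $candidate = null;
    // find the last range whose lower value is <= $n
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($ranges[$mid][0] <= $n) {
            $candidate = $mid;
            $lo = $mid + 1;
        } else {
            $hi = $mid - 1;
        }
    }
    if ($candidate !== null && $n <= $ranges[$candidate][1]) {
        return $candidate;
    }
    return null;
}

$ranges = [[2, 10], [20, 30], [100, 200]];
var_dump(findRange($ranges, 22)); // int(1)
var_dump(findRange($ranges, 11)); // NULL
```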

From a pragmatic point of view, a linear search may very well turn out to be the fastest lookup method. Think of page faults and hard disk seek time here.
If your array is large enough (whatever "enough" actually means), it may be wise to stuff your IPs in a SQL database and let the database figure out how to efficiently compute SELECT ID FROM ip_numbers WHERE x BETWEEN start AND end;.
