I need help with creating an algorithm in PHP that, given an array of alphabets (represented as strings) and an array of groupings of those alphabets (also an array of strings), returns an array of arrays of all possible combinations of strings based on those groupings. The following example will make it clear -
If the input array is ['A', 'B', 'C'] and the groupings are ['AB', 'BC'] the returned output:
Without any restrictions would be
[['A','B','C'], ['AB','C'], ['A','BC'], ['AC','B'], ['ABC']]
With the restrictions of the groupings should be [['A','B','C'], ['AB','C'], ['A','BC']]
The reason for this is because neither 'ABC' nor 'AC' are allowed groupings and the idea is that the groupings should only exist if they belong to the specified array. In this case, since 'AB' and 'BC' are the only possible groupings, the output contains them. The first output was just for demonstration purposes, but the algorithm should produce the second output. The only other restriction is that there can't be duplicate alphabets in a single combination. So the following output is NOT correct:
[['A','B','C'], ['AB','C'], ['A','BC'], ['AB','BC'], ['AC','B'], ['ABC']]
since 'B' is a duplicate in ['AB','BC']
A similar question I found was here, except that there are no restrictions on which numbers can be grouped together in the "Result" in this question.
I apologize if I made it sound confusing but I'll be sure to clarify if you have any questions.
The simplest approach to generate such partitions is recursive (I think).
First, represent the restrictions as a boolean (or 0/1) 2D matrix. In your case the graph has edges A-B and B-C, so the adjacency matrix is [[0,1,0],[1,0,1],[0,1,0]].
Start from empty array. At every recursion level add next element (A, then B, then C) into all possible groups and into separate group.
(In languages like C I'd use bit masks for every group to determine quickly with bit-OR operation whether a group allows to add current element)
First level: add A and get:
[[A]]
Second level: add B both in existing group and in separate one:
[[A, B]], [[A],[B]]
Third level: C can only join a group that contains B alone, or go into a separate group:
[[A, B], [C]], [[A], [B, C]], [[A], [B], [C]]
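The steps above could be sketched in PHP roughly as follows. This is only an illustration under two assumptions that are not in the question: every grouping has exactly two letters (so it can be treated as a graph edge), and a letter may join a group only if it is connected to every letter already in it. The name restricted_partitions is made up for the example.

```php
<?php
// Hypothetical helper: builds all partitions in which every multi-letter
// group is fully connected in the graph defined by the two-letter groupings.
function restricted_partitions(array $letters, array $groupings): array
{
    // Adjacency lookup: $adjacent['A']['B'] is true for the grouping 'AB'.
    $adjacent = [];
    foreach ($groupings as $pair) {
        $adjacent[$pair[0]][$pair[1]] = true;
        $adjacent[$pair[1]][$pair[0]] = true;
    }

    $partitions = [[]]; // start from a single empty partition
    foreach ($letters as $letter) {
        $next = [];
        foreach ($partitions as $partition) {
            // Try to add the letter to each existing group.
            foreach ($partition as $i => $group) {
                $fits = true;
                foreach ($group as $member) {
                    if (empty($adjacent[$letter][$member])) {
                        $fits = false;
                        break;
                    }
                }
                if ($fits) {
                    $copy = $partition;
                    $copy[$i][] = $letter;
                    $next[] = $copy;
                }
            }
            // Always allowed: the letter in a group of its own.
            $partition[] = [$letter];
            $next[] = $partition;
        }
        $partitions = $next;
    }
    return $partitions;
}

foreach (restricted_partitions(['A', 'B', 'C'], ['AB', 'BC']) as $partition) {
    echo json_encode($partition), "\n";
}
```

For the example input this yields the same three partitions as the third level above. Groupings longer than two letters would need a different check, for instance looking up the joined group in the list of allowed strings.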
You can use the answer from the post you linked. I adapted it for you:
function generate_groups($collection) {
if (count($collection) == 1) {
yield [$collection];
return;
}
$first = $collection[0];
foreach (generate_groups(array_slice($collection, 1)) as $smaller) {
foreach (array_values($smaller) as $n => $subset) {
yield array_merge(
array_slice($smaller, 0, $n),
[array_merge([$first], $subset)],
array_slice($smaller, $n+1)
);
}
yield array_merge([[$first]], $smaller);
}
}
$input = ['A', 'B', 'C'];
$groupings = ['AB', 'BC'];
foreach (generate_groups($input) as $groups) {
$are_groups_ok = true;
foreach ($groups as $group) {
$compact = implode($group);
if (strlen($compact) != 1 and !in_array($compact, $groupings)) {
$are_groups_ok = false;
}
}
if ($are_groups_ok) {
echo "[" . implode("], [", array_map("implode", $groups)) . "]\n";
}
}
This prints:
[A], [BC]
[AB], [C]
[A], [B], [C]
The following is the code
<?php
$id ="202883-202882-202884-0";
$str = implode('-',array_unique(explode('-', $id)));
echo $str;
?>
The result is
202883-202882-202884-0
for $id ="202883-202882-202882-0";, result is 202883-202882-0
I would like to replace the duplicate value with zero, so that the result should be like 202883-202882-0-0, not just remove it.
and for $id ="202883-0-0-0";, result should be 202883-0-0-0. zero should not be replaced, repeating zeros are allowed.
How can I achieve that?
More info:
I want to replace every duplicate number, because this is for a product comparison website. There will be a maximum of 4 numbers. Each will be either a 6-digit number or a single zero. All zeros means no product was selected. One 6-digit number and 3 zeros means one product is selected and 3 slots are blank.
Each 6-digit number is used to fetch data from the database. I don't want to allow users to enter the same number multiple times (this will happen only if the number is added to the URL manually).
Update: I understand that my question was not clear, may be my English is poor.
Here is more explanation, this function is for a smartphone comparison website.
The URL format is sitename.com/compare.html?id=202883-202882-202889-202888.
All four numbers are different smartphones (their database product IDs).
I don't want to let users type in the same product ID twice, like id=202883-202882-202882-202888. It will not display two 202882 results on the website, but it will cause some small issues. The URL will stay the same, but the internal PHP code should treat it as id=202883-202882-202888-0.
The duplicates should be replaced as zero and added to the end.
There will be only 4 numbers separated by "-".
The following examples might clear the cloud!
if pid=202883-202882-202889-202888 the result should be 202883-202882-202889-202888
if pid=202883-202883-202883-202888 the result should be 202888-0-0-0
if pid=202883-202882-202883-202888 the result should be 202883-202882-202888-0
if pid=202882-202882-202882-202882 the result should be 202882-0-0-0
I want to allow only either 6 digit numbers or single digit zero through the string.
if pid=rgfsdg-fgsdfr4354-202883-0 the result should be 202883-0-0-0
if pid=fasdfasd-asdfads-adsfds-dasfad the result should be 0-0-0-0
if pid=4354-45882-445202882-202882 the result should be 202882-0-0-0
It is too complicated for me to create; I know there are bright minds out there who can do it much more efficiently than I can.
You can do a array_unique (preserves key), then fill the gaps with 0. Sort by key and you are done :)
+ on arrays will unify the arrays but prioritizes the one on the left.
Code
$input = "0-1-1-3-1-1-3-5-0";
$array = explode('-', $input);
$result = array_unique($array) + array_fill(0, count($array), 0);
ksort($result);
var_dump(implode('-',$result));
Code (v2 - suggested by mickmackusa) - shorter and easier to understand
Fill an array of the same size as the input array with zeros, then overwrite it with the values left over from array_unique. No ksort is needed: array_unique preserves keys, so only the zeros at those keys are replaced.
$input = "0-1-1-3-1-1-3-5-0";
$array = explode('-', $input);
$result = array_replace(array_fill(0, count($array), 0), array_unique($array));
var_export($result);
Working example.
Output (first version)
string(17) "0-1-0-3-0-0-0-5-0"
Working example.
references
ksort - sort by key
array_fill - generate an array filled with 0 of a certain length
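To make the difference between the two versions concrete, here is a small sketch with made-up data. It only illustrates how the + operator and array_replace treat colliding keys; the variable names are chosen for the example.

```php
<?php
// Keys 0 and 2 survive, as they would after array_unique() dropped key 1.
$unique = [0 => 'a', 2 => 'b'];
$zeros  = array_fill(0, 4, 0);

// Union: the left operand wins on key collisions, and the missing keys are
// appended afterwards, so the key order becomes 0, 2, 1, 3 (hence the ksort).
$union = $unique + $zeros;

// array_replace: later arrays overwrite earlier ones, and the key order of
// the first array (0, 1, 2, 3) is kept, so no sorting is needed.
$replaced = array_replace($zeros, $unique);

echo implode('-', array_keys($union)), "\n";
echo implode('-', $replaced), "\n";
```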
This is another way to do it.
$id = "202883-202882-202882-0-234567-2-2-45435";
From the string, you explode it into an array based on the delimiter, which in this case is '-'.
$id_array = explode('-', $id);
Then we can loop through the array and for every unique entry we find, we can store it in another array. Thus we are building an array as we search through the array.
$id_array_temp = [];
// Loop through the array
foreach ($id_array as $value) {
if ( in_array($value, $id_array_temp)) {
// If the entry exists, replace it with a 0
$id_array_temp[] = 0;
} else {
// If the entry does not exist, save the value so we can inspect it on the next loop.
$id_array_temp[] = $value;
}
}
At the end of this operation we will have an array of unique values with any duplicates replaced with a 0.
To recreate the string, we can use implode...
$str = implode('-', $id_array_temp);
echo $str;
Refactoring this, using a ternary to replace the If,else...
$id_array = explode('-', $id);
$id_array_temp = [];
foreach ($id_array as $value) {
$id_array_temp[] = in_array($value, $id_array_temp) ? 0 : $value;
}
$str = implode('-', $id_array_temp);
echo $str;
Output is
202883-202882-0-0-234567-2-0-45435
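The loop above leaves the zeros where the duplicates were. The update to the question also asks for values that are not six digits to be dropped and for the zeros to be moved to the end. A minimal sketch of that rule could look like this; the function name normalize_ids and the $slots parameter are assumptions made for illustration.

```php
<?php
// Hypothetical helper: keeps the first occurrence of each valid six-digit ID
// and pads the result with zeros up to $slots entries.
function normalize_ids(string $raw, int $slots = 4): string
{
    $valid = [];
    foreach (explode('-', $raw) as $part) {
        $isSixDigits = ctype_digit($part) && strlen($part) === 6;
        if ($isSixDigits && !in_array($part, $valid, true)) {
            $valid[] = $part;
        }
    }
    // Never return more than $slots IDs, and fill the remainder with "0".
    $valid = array_slice($valid, 0, $slots);
    return implode('-', array_pad($valid, $slots, '0'));
}

echo normalize_ids('202883-202882-202883-202888'), "\n"; // 202883-202882-202888-0
```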
This appears to be a classic XY Problem.
The essential actions only need to be:
Separate the substrings in the hyphen delimited string.
Validate that the characters in each substring are in the correct format AND are unique to the set.
Only take meaningful action on qualifying values.
You see, there is no benefit to replacing/sanitizing anything when you only really need to validate the input data. Adding zeros to your input just creates more work later.
In short, you should use a direct approach similar to this flow:
if (!empty($_GET['id'])) {
$ids = array_unique(explode('-', $_GET['id']));
foreach ($ids as $id) {
if (ctype_digit($id) && strlen($id) === 6) {
// or: if (preg_match('~^\d{6}$~', $id)) {
takeYourNecessaryAction($id);
}
}
}
I executed the following code and its result made me confused!
I pass two arrays and a function named "myfunction" as arguments to the array_diff_ukey function. I see that myfunction is called 13 times (while it should be called at most 9 times). Even more amazing is that it compares the keys of the same array too! In both columns of the output, I see the key "e", while only the second array has it (the same is true for some other keys).
function myfunction($a,$b) {
echo $a . " ".$b."<br>";
if ($a===$b) {
return 0;
}
return ($a>$b)?1:-1;
}
$a1=array("a"=>"green","b"=>"blue","c"=>"red");
$a2=array("d"=>"blue","e"=>"black","f"=>"blue");
$result=array_diff_ukey($a1,$a2,"myfunction");
print_r($result);
Output:
a b
b c
d e
e f
a d
a e
a f
b d
b e
b f
c d
c e
c f
Array
(
[a] => green
[b] => blue
[c] => red
)
See it run on eval.in.
Why does the array_diff_ukey perform that many unnecessary calls to the compare function?
Nice question. Indeed the implemented algorithm is not the most efficient.
The C source for PHP array functions can be found on GitHub. The implementation for array_diff_ukey uses a C function php_array_diff which is also used by the implementations of array_udiff, array_diff_uassoc, and array_udiff_uassoc.
As you can see there, that function has this C-code:
for (i = 0; i < arr_argc; i++) {
//...
zend_sort((void *) lists[i], hash->nNumOfElements,
sizeof(Bucket), diff_key_compare_func, (swap_func_t)zend_hash_bucket_swap);
//...
}
...which means each input array is sorted using the compare function. This explains the first series of output you get, where keys of the same array are compared, and why the first column can list keys other than those of the first array.
Then it has a loop on the elements of the first array, a nested loop on the other arrays, and -- nested in that -- a loop on the elements of each of those:
while (Z_TYPE(ptrs[0]->val) != IS_UNDEF) {
//...
for (i = 1; i < arr_argc; i++) {
//...
while (Z_TYPE(ptr->val) != IS_UNDEF &&
(0 != (c = diff_key_compare_func(ptrs[0], ptr)))) {
ptr++;
}
//...
}
//...
}
Evidently, the sorting that is done on each of the arrays does not really contribute to anything in this algorithm, since still all keys of the first array are compared to potentially all the keys of the other array(s) with a plain 0 != comparison. The algorithm is thus O(klogk + nm), where n is the size of the first array, and m is the sum of the sizes of the other arrays, and k is the size of the largest array. Often the nm term will be the most significant.
One can only guess why this inefficient algorithm was chosen, but it looks like the main reason is code reusability: as stated above, this C code is used by other PHP functions as well, where it may make more sense. Still, it does not really sound like a good excuse.
A simple implementation of this (inefficient) array_diff_ukey algorithm in PHP (excluding all type checking, border conditions, etc) could look like this mimic_array_diff_ukey function :
function mimic_array_diff_ukey(...$args) {
$key_compare_func = array_pop($args);
foreach ($args as $arr) uksort($arr, $key_compare_func);
$first = array_shift($args);
return array_filter($first, function ($key) use($key_compare_func, $args) {
foreach ($args as $arr) {
foreach ($arr as $otherkey => $othervalue) {
if ($key_compare_func($key, $otherkey) == 0) return false;
}
}
return true;
}, ARRAY_FILTER_USE_KEY);
}
A more efficient algorithm would use sorting, but then would also take benefit from that and step through the keys of the first arrays while at the same time stepping through the keys of the other arrays in ascending order, in tandem -- never having to step back. This would make the algorithm O(nlogn + mlogm + n+m) = O(nlogn + mlogm).
Here is a possible implementation of that improved algorithm in PHP:
function better_array_diff_ukey(...$args) {
$key_compare_func = array_pop($args);
$first = array_shift($args);
$rest = [];
foreach ($args as $arr) $rest = $rest + $arr;
$rest = array_keys($rest);
uksort($first, $key_compare_func);
usort($rest, $key_compare_func);
$i = 0;
return array_filter($first, function ($key) use($key_compare_func, $rest, &$i) {
while ($i < count($rest) && ($cmp = $key_compare_func($rest[$i], $key)) < 0) $i++;
return $i >= count($rest) || $cmp > 0;
}, ARRAY_FILTER_USE_KEY);
}
Of course, this algorithm would need to be implemented in C if taken on board for improving array_diff_ukey, and to get a fair runtime comparison.
See the comparisons that are made -- on a slightly different input than in your question -- by the three functions (array_diff_ukey, mimic_array_diff_ukey and better_array_diff_ukey) on eval.in.
array_diff_ukey runs in two stages:
Sort the array keys
Compare key by key
This would probably explain why the callback is expected to return a sort value rather than a boolean "is equal".
I expect this is probably done for performance reasons, but if that's the case I would have thought that it can use this to say "well this key is bigger than all keys in the other array, so I shouldn't bother testing if these other, bigger keys are also bigger because they must be", but this doesn't seem to be the case: it compares them dutifully anyway.
I can only assume it's because the function cannot prove itself to be deterministic (and indeed in this case produces side-effects) so it can't be optimised like that. Perhaps array_diff_key (without user-defined function) does this optimisation just fine.
But anyway, that's what happens under the hood, and why you see more than just 9 comparisons. It could probably be made better in the core...
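To check the number of comparisons on a given PHP build without reading through echoed output, the callback can count its own calls. This is only a sketch: the variable names are made up, and the count it prints may differ between PHP versions, so no expected number is given.

```php
<?php
$a1 = ['a' => 'green', 'b' => 'blue', 'c' => 'red'];
$a2 = ['d' => 'blue', 'e' => 'black', 'f' => 'blue'];

$calls = 0;
// Same ordering as myfunction in the question, but counting instead of echoing.
$countingCompare = function ($a, $b) use (&$calls) {
    $calls++;
    return $a <=> $b;
};

$viaCallback = array_diff_ukey($a1, $a2, $countingCompare);
$builtin     = array_diff_key($a1, $a2);

echo "callback invoked $calls times\n";
// No key of $a1 appears in $a2, so both calls keep every element of $a1.
var_dump($viaCallback == $builtin);
```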
This is fairly confusing, but I'll try to explain as best I can...
I've got a MYSQL table full of strings like this:
{3}12{2}3{5}52
{3}7{2}44
{3}15{2}2{4}132{5}52{6}22
{3}15{2}3{4}168{5}52
Each string is a combination of product options and option values. The numbers inside the { } are the option, for example {3} = Color. The number immediately following each { } number is that option's value, for example 12 = Blue. I've already got the PHP code that knows how to parse these strings and deliver the information correctly, with one exception: for reasons that are probably too convoluted to get into here, the order of the options needs to be 3,4,2,5,6. (To try to modify the rest of the system to accept the current order would be too monumental a task.) It's fine if a particular combination doesn't have all five options; for instance, "{3}7{2}44" delivers the expected result. The problem is just with combinations that include option 2 AND option 4: their order needs to be switched so that, in any combination that includes both options 2 and 4, the {4} and its corresponding value come before the {2} and its corresponding value.
I've tried bringing the column into Excel and using Text to Columns, splitting them up by the "{" and "}" characters and re-ordering the columns, but since not every string yields the same number of columns, the order gets messed up in other ways (like option 5 coming before option 2).
I've also experimented with using PHP to explode each string into an array (which I thought I could then re-sort) using "}" as the delimiter, but I had no luck with that either because then the numbers blend together in other ways that make them unusable.
TL;DR: I have a bunch of strings like the ones quoted above. In every string that contains both a "{2}" and a "{4}", the placement of both of those values needs to be switched, so that the {4} and the number that follows it comes before the {2} and the number that follows it. In other words:
{3}15{2}3{4}168{5}52
needs to become
{3}15{4}168{2}3{5}52
The closest I've been able to come to a solution, in pseudocode, would be something like:
for each string,
if "{4}" is present in this string AND "{2}" is present in this string,
take the "{4}" and every digit that follows it UNTIL you hit another "{" and store that substring as a variable, then remove it from the string.
then, insert that substring back into the string, at a position starting immediately before the "{2}".
I hope that makes some kind of sense...
Is there any way with PHP, Excel, Notepad++, regular expressions, etc., that I can do this? Any help would be insanely appreciated.
EDITED TO ADD: After several people posted solutions, which I tried, I realized that it would be crucial to mention that my host is running PHP 5.2.17, which doesn't seem to allow for usort with custom sorting. If I could upvote everyone's solution (all of which I tried in PHP Sandbox and all of which worked), I would, but my rep is too low.
How would something like this work for you? The first few lines transform your string into an array in which each element is an array of the option number and its value. $order establishes the order in which the items should appear, and the usort at the end sorts by each option's position in that array.
$str = "{3}15{2}2{4}132{5}52{6}22";
$matches = array();
preg_match_all('/\{([0-9]+)\}([0-9]+)/', $str, $matches);
array_shift($matches);
$options = array();
for($x = 0; $x < count($matches[0]); $x++){
$options[] = array($matches[0][$x], $matches[1][$x]);
}
$order = [3,4,2,5,6];
usort($options, function($a, $b) use ($order) {
return array_search($a[0], $order) - array_search($b[0], $order);
});
To get your data back into the required format, you would just do:
$str = "";
foreach($options as $opt){
$str.="{".$opt[0]."}".$opt[1];
}
One of the bonuses here is that when you add a new option type, adjusting the order is just a matter of inserting the option number at the correct position in the $order array.
First of all, those options should probably be in a separate table. You're breaking all kinds of normalization rules stuffing those things into a string like that.
But if you really want to parse that out in php, split the string into a key=>value array with something like this:
$options = [];
$pairs = explode('{', $option_string);
foreach($pairs as $pair) {
    // The string starts with '{', so the first piece is empty: skip it.
    if ($pair === '') continue;
    list($key,$value) = explode('}', $pair);
    $options[$key] = $value;
}
I think this will give you:
$options[3]=15;
$options[2]=3;
$options[4]=168;
$options[5]=52;
Another option would be to use some sort of existing serialization (either serialize() or json_encode() in php) instead of rolling your own:
$options_string = json_encode($options);
// store $options_string in db
then
// get $options_string from db
$options = json_decode($options_string, true); // true: return an associative array instead of an object
Here's a neat solution:
$order = array(3, 4, 2, 5, 6);
$string = '{3}15{2}3{4}168{5}52';
$split = preg_split('#\b(?={)#', $string);
usort($split, function($a, $b) use ($order) {
$a = array_search(preg_replace('#^{(\d+)}\d+$#', '$1', $a), $order);
$b = array_search(preg_replace('#^{(\d+)}\d+$#', '$1', $b), $order);
return $a - $b;
});
$split = implode('', $split);
var_dump($split);
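For a host limited to PHP 5.2, where anonymous functions are not available (they arrived in PHP 5.3), the pseudocode from the question can also be approximated with a single preg_replace and no sorting at all. This is a sketch, assuming every string consists only of {number}number pairs; the function name move_option_4_before_2 is made up for the example.

```php
<?php
// Hypothetical helper: moves "{4}<digits>" directly in front of "{2}<digits>"
// when {2} appears before {4}; any other string is returned unchanged.
function move_option_4_before_2($combo)
{
    // $1 = the {2} pair, $2 = any pairs in between, $3 = the {4} pair.
    $pattern = '/(\{2\}\d+)((?:\{\d+\}\d+)*?)(\{4\}\d+)/';
    return preg_replace($pattern, '$3$1$2', $combo, 1);
}

echo move_option_4_before_2('{3}15{2}3{4}168{5}52'), "\n"; // {3}15{4}168{2}3{5}52
```

Unlike the usort-based answers, this only handles the 2/4 swap; a different required order would need another pattern.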
Suppose a string:
$str = 'a_b_c';
I want to match all possible combinations of a, b, c against the above. For example:
b_a_c, c_a_b, a_c_b, etc. should give true when compared with the above $str.
NOTE:
$str may be random. eg: a_b, k_l_m_n etc
I would split your string into an array, and then compare it to an array of elements to match on.
$originalList = explode('_', 'a_b_c');
$matchList = array('a', 'b', 'c');
$diff = array_diff($matchList, $originalList);
if (!empty($diff)) {
// At least one of the elements in $matchList is not in $originalList
}
Beware of duplicate elements and what not, depending on how your data comes in.
Documentation:
array_diff()
explode()
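If duplicates matter, or if both strings must contain exactly the same elements, one possible alternative is to sort both lists and compare them. A minimal sketch, with same_parts as a made-up function name:

```php
<?php
// Hypothetical helper: true when both strings contain exactly the same parts
// (including repeats), in any order.
function same_parts($first, $second, $delimiter = '_')
{
    $a = explode($delimiter, $first);
    $b = explode($delimiter, $second);
    sort($a);
    sort($b);
    // After sorting, equal multisets produce identical arrays.
    return $a === $b;
}

var_dump(same_parts('a_b_c', 'c_a_b')); // bool(true)
var_dump(same_parts('a_b_c', 'a_b'));   // bool(false)
```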
There is no builtin way to quickly do this. Your task can be accomplished many different ways which will vary on how general they are. You make no mention of null values or checking the formatting of the string, so something like this might work for your purpose:
function all_combos($str,$vals) {
$s=explode("_",$str);
foreach($s as $c) {
if(!in_array($c,$vals)) return false;
}
return true;
}
Call like all_combos("b_c_a",array("a","b","c"));
I have an array that contains other arrays of US colleges broken down by the first letter of the alphabet. I've setup a test page so that you can see the array using print_r. That page is:
http://apps.richardmethod.com/Prehealth/Newpublic/test.php
In order to create that array, I used the following code:
$alphabetized = array();
foreach (range('A', 'Z') as $letter) {
// create new array based on letter
$alphabetized[$letter] = array();
// loop through results and add to array
foreach ( $users as $user ) {
$firstletter = substr($user->Schoolname, 0, 1);
if ( $letter == $firstletter ) {
array_unshift( $alphabetized[$letter], $user );
}
}
}
Now, I want to split the array so that a certain range of letters is in each array. For example,
arrayABCDEFGH - would contain the schools that begin with the letters A, B, C, D, E, F, G, and H.
My question is, should I modify the code above so that I achieve this before I do one big array, OR should I do it after?
And here is the big question . . if so, how? :-)
Thanks in advance for any help. It's greatly appreciated.
First off, the code to generate the array can be made simpler by iterating over $users only:
$alphabetized = array();
// loop through results and add to array
foreach ($users as $user) {
$firstletter = strtoupper($user->Schoolname[0]);
$alphabetized[$firstletter][] = $user;
}
// sort by first letter (optional)
ksort($alphabetized);
To retrieve the first 8 entries you could use array_slice:
array_slice($alphabetized, 0, 8);
Assuming that all first letters are actually used and you used ksort() on the full array that also gives you from A - H. Otherwise you have to use array_intersect_key() and the flipped range of letters you wish to query on.
array_intersect_key($alphabetized, array_flip(range('A', 'H')));
If you don't need a full copy of the array, then you should encapsulate the above code in a function that has an array as its argument ($letterRange), which will hold the specific array identifiers (letters) to be in the final array.
Then you would need to encapsulate the code within the foreach using an if block:
if (in_array($letter, $letterRange)) { ... }
This would result in an array only containing the letter arrays for the specified letters in $letterRange.
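The idea described above might look roughly like this. It is a sketch rather than a drop-in replacement: the function name alphabetize_range is made up, the sample users are invented, and it loops over $users once instead of once per letter.

```php
<?php
// Hypothetical helper: groups users by first letter, keeping only the
// letters listed in $letterRange.
function alphabetize_range(array $users, array $letterRange): array
{
    // Start with an empty list for every requested letter.
    $alphabetized = array_fill_keys($letterRange, []);
    foreach ($users as $user) {
        $firstLetter = strtoupper(substr($user->Schoolname, 0, 1));
        if (in_array($firstLetter, $letterRange, true)) {
            $alphabetized[$firstLetter][] = $user;
        }
    }
    return $alphabetized;
}

// Sample data invented for the example.
$users = [
    (object) ['Schoolname' => 'Auburn University'],
    (object) ['Schoolname' => 'Yale University'],
    (object) ['Schoolname' => 'Boston College'],
];
$aToH = alphabetize_range($users, range('A', 'H'));
echo implode(',', array_keys(array_filter($aToH))), "\n"; // A,B
```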
You could write a simple function that takes an array, and an array of keys to extract from that array, and returns an array containing just those keys and their values. A sort of array_slice for non-numeric keys:
function values_at($array, $keys) {
return array_intersect_key($array, array_flip($keys));
}
Which you can then call like this to get what you want out of your alphabetized list:
$arrayABCDEFGH = values_at($alphabetized, range('A','H'));