Why does `array_diff_ukey` call the compare function so many times? - php

I ran the following code and the result confused me.
I pass two arrays and a function named "myfunction" as arguments to the array_diff_ukey function. myfunction is called 13 times, while I expected at most 9 calls (3 keys × 3 keys). Even more surprising, it also compares keys of the same array with each other! Both columns of the output contain the key "e", although only the second array has it (the same is true for some other keys).
function myfunction($a, $b) {
    echo $a . " " . $b . "<br>";
    if ($a === $b) {
        return 0;
    }
    return ($a > $b) ? 1 : -1;
}
$a1 = array("a" => "green", "b" => "blue", "c" => "red");
$a2 = array("d" => "blue", "e" => "black", "f" => "blue");
$result = array_diff_ukey($a1, $a2, "myfunction");
print_r($result);
Output:
a b
b c
d e
e f
a d
a e
a f
b d
b e
b f
c d
c e
c f
Array
(
[a] => green
[b] => blue
[c] => red
)
See it run on eval.in.
Why does the array_diff_ukey perform that many unnecessary calls to the compare function?

Nice question. Indeed the implemented algorithm is not the most efficient.
The C source for the PHP array functions can be found on GitHub. The implementation of array_diff_ukey uses a C function, php_array_diff, which is also used by the implementations of array_udiff, array_diff_uassoc, and array_udiff_uassoc.
As you can see there, that function has this C-code:
for (i = 0; i < arr_argc; i++) {
    //...
    zend_sort((void *) lists[i], hash->nNumOfElements,
            sizeof(Bucket), diff_key_compare_func, (swap_func_t)zend_hash_bucket_swap);
    //...
}
...which means each input array is sorted using the compare function. This explains the first series of output lines, where keys of the same array are compared with each other, and why the first column can list keys other than those of the first array.
Then it has a loop on the elements of the first array, a nested loop on the other arrays, and -- nested in that -- a loop on the elements of each of those:
while (Z_TYPE(ptrs[0]->val) != IS_UNDEF) {
    //...
    for (i = 1; i < arr_argc; i++) {
        //...
        while (Z_TYPE(ptr->val) != IS_UNDEF &&
                (0 != (c = diff_key_compare_func(ptrs[0], ptr)))) {
            ptr++;
        }
        //...
    }
    //...
}
Evidently, the sorting that is done on each of the arrays does not really contribute to anything in this algorithm, since still all keys of the first array are compared to potentially all the keys of the other array(s) with a plain 0 != comparison. The algorithm is thus O(klogk + nm), where n is the size of the first array, and m is the sum of the sizes of the other arrays, and k is the size of the largest array. Often the nm term will be the most significant.
One can only guess why this inefficient algorithm was chosen, but it looks like the main reason is code reusability: as stated above, this C code is used by other PHP functions as well, where it may make more sense. Still, it does not really sound like a good excuse.
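To check how many comparisons your own PHP build makes without cluttering the output, the callback can count its calls instead of echoing. This is only a minimal sketch: the $calls counter and the closure are illustrative names, and the exact count depends on the PHP version and its sort implementation, so no particular number is claimed here.

```php
<?php
// Count how often array_diff_ukey invokes the callback (illustrative sketch).
$calls = 0;
$compare = function ($a, $b) use (&$calls) {
    $calls++;              // one more comparison requested by PHP
    return $a <=> $b;      // negative, zero or positive, like myfunction
};

$a1 = ["a" => "green", "b" => "blue", "c" => "red"];
$a2 = ["d" => "blue", "e" => "black", "f" => "blue"];

$result = array_diff_ukey($a1, $a2, $compare);

// The count varies between PHP versions; only the result is fixed.
echo "callback calls: $calls\n";
print_r($result);
```

Since no key of $a1 occurs in $a2, the result is the whole first array, whatever the number of comparisons.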
A simple implementation of this (inefficient) array_diff_ukey algorithm in PHP (excluding all type checking, boundary conditions, etc.) could look like this mimic_array_diff_ukey function:
function mimic_array_diff_ukey(...$args) {
    $key_compare_func = array_pop($args);
    foreach ($args as $arr) uksort($arr, $key_compare_func);
    $first = array_shift($args);
    return array_filter($first, function ($key) use ($key_compare_func, $args) {
        foreach ($args as $arr) {
            foreach ($arr as $otherkey => $othervalue) {
                if ($key_compare_func($key, $otherkey) == 0) return false;
            }
        }
        return true;
    }, ARRAY_FILTER_USE_KEY);
}
A more efficient algorithm would also sort, but would then take advantage of the sorted order: it would step through the keys of the first array while at the same time stepping through the keys of the other arrays in ascending order, in tandem, never having to step back. This would make the algorithm O(nlogn + mlogm + n+m) = O(nlogn + mlogm).
Here is a possible implementation of that improved algorithm in PHP:
function better_array_diff_ukey(...$args) {
    $key_compare_func = array_pop($args);
    $first = array_shift($args);
    $rest = [];
    foreach ($args as $arr) $rest = $rest + $arr;
    $rest = array_keys($rest);
    uksort($first, $key_compare_func);
    usort($rest, $key_compare_func);
    $i = 0;
    return array_filter($first, function ($key) use ($key_compare_func, $rest, &$i) {
        while ($i < count($rest) && ($cmp = $key_compare_func($rest[$i], $key)) < 0) $i++;
        return $i >= count($rest) || $cmp > 0;
    }, ARRAY_FILTER_USE_KEY);
}
Of course, this algorithm would need to be implemented in C if taken on board for improving array_diff_ukey, and to get a fair runtime comparison.
See the comparisons that are made -- on a slightly different input than in your question -- by the three functions (array_diff_ukey, mimic_array_diff_ukey and better_array_diff_ukey) on eval.in.

array_diff_ukey runs in two stages:
Sort the array keys
Compare key by key
This would probably explain why the callback is expected to return a sort value rather than a boolean "is equal".
I expect this is probably done for performance reasons, but if that's the case I would have thought that it can use this to say "well this key is bigger than all keys in the other array, so I shouldn't bother testing if these other, bigger keys are also bigger because they must be", but this doesn't seem to be the case: it compares them dutifully anyway.
I can only assume it's because the function cannot prove itself to be deterministic (and indeed in this case produces side-effects) so it can't be optimised like that. Perhaps array_diff_key (without user-defined function) does this optimisation just fine.
But anyway, that's what happens under the hood, and why you see more than just 9 comparisons. It could probably be made better in the core...

Related

Finding All Possible Combinations of Strings with Restrictions

I need help creating an algorithm in PHP that, given an array of letters (represented as strings) and an array of groupings of those letters (also an array of strings), returns an array of arrays of all possible combinations of strings based on those groupings. The following example should make it clear:
If the input array is ['A', 'B', 'C'] and the groupings are ['AB', 'BC'], the returned output:
Without any restrictions would be
[['A','B','C'], ['AB','C'], ['A','BC'], ['AC','B'], ['ABC']]
With the restrictions of the groupings should be [['A','B','C'], ['AB','C'], ['A','BC']]
The reason is that neither 'ABC' nor 'AC' is an allowed grouping; a grouping should only appear if it belongs to the specified array. In this case, since 'AB' and 'BC' are the only allowed groupings, the output contains only them. The first output was just for demonstration purposes; the algorithm should produce the second output. The only other restriction is that a letter cannot appear twice in a single combination. So the following output is NOT correct:
[['A','B','C'], ['AB','C'], ['A','BC'], ['AB','BC'], ['AC','B'], ['ABC']]
since 'B' is a duplicate in ['AB','BC']
A similar question I found was here, except that there are no restrictions on which numbers can be grouped together in the "Result" in this question.
I apologize if I made it sound confusing but I'll be sure to clarify if you have any questions.
The simplest approach to generate such partitions is recursive (I think).
First, represent the restrictions as a boolean (or 0/1) 2D matrix. In your case the graph has connections (edges) A-B and B-C, and the adjacency matrix is [[0,1,0],[1,0,1],[0,1,0]].
Start from an empty array. At every recursion level, add the next element (A, then B, then C) to every existing group that allows it, and also as a separate group of its own.
(In languages like C, I'd use a bit mask for every group to determine quickly, with a bit-OR operation, whether a group allows adding the current element.)
First level: add A and get:
[[A]]
Second level: add B both to the existing group and as a separate one:
[[A, B]], [[A], [B]]
Third level: C can only be added as follows:
[[A, B], [C]], [[A], [B, C]], [[A], [B], [C]]
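The recursion described above could be sketched in PHP roughly as follows. The function name restricted_partitions is hypothetical, and the sketch assumes every grouping consists of exactly two single-character elements, so that each grouping acts as one edge of the graph; longer groupings would need a membership check on the whole group instead.

```php
<?php
// Hypothetical helper: builds every partition of $items in which each
// multi-element group only contains pairwise-connected elements.
function restricted_partitions(array $items, array $groupings): array
{
    // Edge lookup: the grouping 'AB' allows both "A|B" and "B|A".
    $edges = [];
    foreach ($groupings as $g) {
        [$x, $y] = str_split($g);   // assumes two single-character elements
        $edges[$x . '|' . $y] = true;
        $edges[$y . '|' . $x] = true;
    }

    $results = [];
    $place = function (int $i, array $groups) use (&$place, &$results, $items, $edges) {
        if ($i === count($items)) {   // every element has been placed
            $results[] = $groups;
            return;
        }
        $item = $items[$i];
        // Try each existing group whose members are all connected to $item.
        foreach ($groups as $n => $group) {
            $ok = true;
            foreach ($group as $member) {
                if (!isset($edges[$member . '|' . $item])) {
                    $ok = false;
                    break;
                }
            }
            if ($ok) {
                $next = $groups;
                $next[$n][] = $item;
                $place($i + 1, $next);
            }
        }
        // Always possible: the element starts a group of its own.
        $groups[] = [$item];
        $place($i + 1, $groups);
    };
    $place(0, []);
    return $results;
}

$partitions = restricted_partitions(['A', 'B', 'C'], ['AB', 'BC']);
foreach ($partitions as $p) {
    echo json_encode($p), "\n";
}
```

Because disallowed groups are never built, nothing has to be generated first and filtered out afterwards.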
You can use the answer from the post you linked. I adapted it for you:
function generate_groups($collection) {
    if (count($collection) == 1) {
        yield [$collection];
        return;
    }
    $first = $collection[0];
    foreach (generate_groups(array_slice($collection, 1)) as $smaller) {
        foreach (array_values($smaller) as $n => $subset) {
            yield array_merge(
                array_slice($smaller, 0, $n),
                [array_merge([$first], $subset)],
                array_slice($smaller, $n+1)
            );
        }
        yield array_merge([[$first]], $smaller);
    }
}
$input = ['A', 'B', 'C'];
$groupings = ['AB', 'BC'];
foreach (generate_groups($input) as $groups) {
    $are_groups_ok = true;
    foreach ($groups as $group) {
        $compact = implode($group);
        if (strlen($compact) != 1 and !in_array($compact, $groupings)) {
            $are_groups_ok = false;
        }
    }
    if ($are_groups_ok) {
        echo "[" . implode("], [", array_map("implode", $groups)) . "]\n";
    }
}
This prints:
[A], [BC]
[AB], [C]
[A], [B], [C]

More efficient way to write this PHP code, and how to instantiate PHP array similar to Python list

I did a training exercise (ranked as easy), and the question is here, and my answer in PHP below.
function solution($X, $A) {
    // write your code in PHP7.0
    $inplace = []; # positions in place to our goal, space O(X). Index by position number 0..$X
    foreach ($A as $k => $pos) {
        # time O(N) - array of N integers
        if ($pos <= $X) { # we are only interested in positions within $X
            # count positions - but store the first $k key seconds which this position is reached.
            # We are not interested in when the second leaf of this same position falls.
            if (!isset($inplace[$pos])) $inplace[$pos] = $k;
        }
    }
    $maxk = -1; # max key value which is the longest time for the needed leaf to fall
    for ($i=1; $i <= $X; $i++) {
        # go through every position
        if (isset($inplace[$i])) {
            $tempk = $inplace[$i]; //k seconds for this leaf to fall
            $maxk = ($tempk > $maxk) ? $tempk : $maxk;
        }
        else return -1; # if this position is not set, the leaf does not fall, so we exit
    }
    return $maxk;
}
My questions:
1) Is there a better way you would write the code? I'm focusing on time complexity and simplicity of code, so any feedback is welcome.
2) In this training material, an example is given in Python, def counting(A, m), that counts the occurrences of each value in an array. I used this concept in my PHP solution, where the array index is the value. In PHP, is there a way to instantiate an array as is done in Python? I imagine that in PHP I'd use isset() when processing the count result, without actually instantiating a whole array of (m+1) elements. Or is there a way to get the line below working in PHP?
count = [0] * (m + 1) # Python code for list (array) instantiation of (m+1) elements.
# How do we do this in PHP?
Thanks very much!
1) A simpler and more efficient way to write this function is to use one for loop and a counter. Just loop over 1 to $X and track the position of each item with array_search.
function solution($X, $A) {
    $p = 0;
    for ($i = 1; $i <= $X; $i++) {
        $k = array_search($i, $A);
        if ($k === false)
            return -1;
        if ($k > $p)
            $p = $k;
    }
    return $p;
}
Even though array_search is slower than isset, this solution is much faster (with the given parameters), and it doesn't create an extra array for mapping the leaf positions. Keep in mind that each array_search call is a linear scan, so the worst case is O(N·X), whereas the isset version is O(N + X).
2) Python has great tools for working with lists (concatenation, repetition, comprehensions, etc.) which are not available in PHP and some other languages.
But you can achieve something similar with array_fill, i.e.: $count = array_fill(0, $m+1, 0);
The above function 'translated' to python would be:
def solution(X, A):
    p = 0
    for i in range(1, X+1):
        if i not in A:
            return -1
        v = A.index(i)
        if v > p:
            p = v
    return p
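As a rough illustration of the array_fill idea in PHP, here is a sketch of the Python counting(A, m) helper, followed by a single-pass variant of the exercise that uses a pre-filled array. The name earliest_crossing and the sample inputs are made up for this example, and the second function assumes all positions in $A are integers of at least 1.

```php
<?php
// PHP counterpart of Python's `count = [0] * (m + 1)`.
function counting(array $A, int $m): array
{
    $count = array_fill(0, $m + 1, 0);   // indexes 0..$m, all zero
    foreach ($A as $value) {
        $count[$value]++;
    }
    return $count;
}

// Hypothetical single-pass variant: returns the first index of $A at which
// every position 1..$X has appeared, or -1 if that never happens.
function earliest_crossing(int $X, array $A): int
{
    $seen = array_fill(1, $X, false);    // indexes 1..$X, all false
    $missing = $X;                       // positions not yet covered
    foreach ($A as $second => $pos) {
        if ($pos <= $X && !$seen[$pos]) {
            $seen[$pos] = true;
            if (--$missing === 0) {
                return $second;
            }
        }
    }
    return -1;
}

print_r(counting([0, 0, 4, 2, 4, 5], 5));
var_dump(earliest_crossing(5, [1, 3, 1, 4, 2, 3, 5, 4]));
```

The second function stops as soon as the last missing position appears, so it runs in O(N) time without a separate loop over 1..$X.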

Generate combinations and choose best based on three parameters

I'm trying to generate pairs and would then like to choose the best pair based on a set of parameters. Generating pairs isn't that hard; the tricky part is selecting the best of them.
I think it's best to continue with an example. Suppose we currently have 4 elements named element1, element2, element3 and element4. Each element has properties which are important when generating pairs:
while ($element = mysqli_fetch_assoc($queryToGetElements)) {
    $elementId = $element['eid'];
    $elementOpponents = getElementOpponents($elementId); // Returns the opponents' IDs as an array. These are the possible elements which can be paired.
    $elementPairs = generatePairsWithElement($elementId, $elementOpponents); // Pair-generating function; it uses the possible pairs defined below. It checks who hasn't been paired before and puts them together in a pair.
    $bestPair = chooseBest($elementPairs, $allOtherPairs); // This should select the best pair out of $elementPairs, considering uniqueness with the other pairs.
}
// So let's define our four elements:
$element1Place = 1;
$element2Place = 2;
$element3Place = 3;
$element4Place = 4;
// Execute the while loop with $element1:
$element1Opponents = [$element2, $element3, $element4];
$elementPairs = [[$element1,$element2],[$element1,$element3],[$element1,$element4]];
$element2Opponents = [$element3];
$elementPairs = [[$element2,$element3],[$element2,$element1]];
$element3Opponents = [$element2];
$elementPairs = [[$element2,$element3],[$element1,$element3]];
$element4Opponents = [$element1];
$elementPairs = [[$element1,$element4]];
// Pairs returned should be: [$element1,$element4] & [$element2,$element3].
Possible pairs - This is an array of the other elements which can be paired with the current element. In the current example I have 4 elements, but some of them cannot be paired together because they may have been paired previously (and pairing the same two elements twice is not allowed). This constraint is applied in the function generatePairsWithElement.
Place - This is a unique integer value which cannot be the same for two elements. When choosing the paired element, the element with the lower Place will be selected. This constraint is applied in the function chooseBest.
Combinations have to be unique - For example, if element1 can pair with either element2 or element3, and element4 can pair only with element2, then element1 can only pair with element3, even if element1 is iterated earlier and element2 has a lower Place. So the combinations would be [element1,element3] and [element2,element4]. This constraint is applied in the function chooseBest.
This task wouldn't be so difficult without the third aspect, generating unique combinations. It would be pretty easy to iterate over the possible opponents and just choose the best one; however, generating unique combinations is vital for my project.
So I want to preface this answer with a disclaimer: I am not a mathematician; my understanding of linear algebra, matrix algebra and statistics is adequate but by no means extensive. There may be ways to achieve the same results with fewer lines of code or more efficiently. However, I believe that the verbosity of this answer will allow more people to understand it and follow the logic step by step. Now that that's out of the way, let's jump into it.
Calculate the Weight of Each Pair
The problem with the current approach is that the logic for finding the best possible match is happening as you loop through the results of the query. What that means is that you may be left with elements that can't be matched together when you get to the last iteration. To fix this, we're going to need to split the process a bit. The first step is to get all of the possible pairs for the given array elements.
$elements = [
    1, 2, 3, 4, 5, 6
];
$elementCount = count($elements); // needed by the loops below
$possiblePairs = [];
for ($i = 1; $i <= $elementCount; $i++) {
    for ($j = $i + 1; $j <= $elementCount; $j++) {
        $possiblePairs[] = [
            'values' => [$elements[$i - 1], $elements[$j - 1]],
            'score' => rand(1, 100)
        ];
    }
}
As you can see, for each possible pair, I am also attaching a score element, a random integer in this case. This score represents how strongly this pair matches. The actual values don't matter: they could lie in a specific range (e.g. 0-100) or be defined by a function of your own. The important thing is that pairs with a high match potential should have a high score, and pairs with little to no potential should have a low score. If some pairs absolutely cannot go together, simply set the score to zero.
The next step is to use that score value to sort the array so that the pairs that are more strongly matched are on top, and the weaker (or impossible) pairs at the bottom. I used PHP 7.0's spaceship operator to get the job done, but you can find other ways to achieve this sort here.
usort($possiblePairs, function($a, $b) {
    return $b['score'] <=> $a['score'];
});
Now we're finally equipped to build our final output. This step is actually fairly easy: loop through the possible pairs, check if the values haven't already been used, then push them to the output array.
$output = []; // the pairs that are finally selected
$used = []; // I used a temporary array to store the processed items for simplicity
foreach ($possiblePairs as $key => $pair) {
    if ($pair['score'] !== 0 && // additional safety if two elements cannot go together
        !in_array($pair['values'][0], $used) &&
        !in_array($pair['values'][1], $used))
    {
        $output[] = $pair['values']; // push the values to the return of the function
        array_push($used, $pair['values'][0], $pair['values'][1]); // add the elements to $used so they get ignored on the next iterations
    }
}
var_dump($output);
// output
array(3) {
[0]=> array(2) {
[0]=> int(2)
[1]=> int(4)
}
[1]=> array(2) {
[0]=> int(1)
[1]=> int(3)
}
[2]=> array(2) {
[0]=> int(5)
[1]=> int(6)
}
}
And there it is! The strongest pair is chosen first, and then it goes down in priority. Play around with weighting algorithms and see what works for your specific needs. As a final word: if I were writing this in the context of a business application, I would probably add some error reporting for elements that end up unmatched (after all, it is statistically possible), to make it easier to spot the cause of the problem and decide whether the weighting needs tweaking.
You can try the example here
Edit to add:
In order to prevent the scenario where an element gets stored in a pair when another element can only be paired with that first element (which results in the pair never being created) you can try this little patch. I started by recreating your requirements in a getScore() function.
function getScore($elem1, $elem2) {
    // order the two elements so that $elem1 is the smaller one
    if ($elem1 > $elem2) {
        $tmp = $elem1;
        $elem1 = $elem2;
        $elem2 = $tmp;
    }
    if ($elem1 === 1 && $elem2 === 2) {
        $score = 100;
    }
    elseif (($elem1 === 3 && $elem2 !== 2) || ($elem1 !== 2 && $elem2 === 3)) {
        $score = 0;
    }
    else {
        $score = rand(0, 100);
    }
    return $score;
}
Then, I modified the $possiblePairs array creation to do two additional things.
if the score is 0, don't append to the array and
keep track of how many matches were found for each element. By doing this, any element that only has one possible match will have an associated value in $nbMatches of 1.
$nbMatches = [
    1 => 0,
    2 => 0,
    3 => 0,
    4 => 0
];
$possiblePairs = [];
for ($i = 1; $i <= $elementCount; $i++) {
    for ($j = $i + 1; $j <= $elementCount; $j++) {
        $score = getScore($elements[$i - 1], $elements[$j - 1]);
        if ($score > 0) {
            $possiblePairs[] = [
                'values' => [$elements[$i - 1], $elements[$j - 1]],
                'score' => $score
            ];
            $nbMatches[$elements[$i - 1]]++;
            $nbMatches[$elements[$j - 1]]++;
        }
    }
}
Then I added another loop that will bump up those elements so that they end up on top of the list to be processed before the rest.
foreach ($nbMatches as $elem => $intMatches) {
    if ($intMatches === 1) {
        foreach ($possiblePairs as $key => $pair) {
            if (in_array($elem, $pair['values'])) {
                $possiblePairs[$key]['score'] = 101;
            }
        }
    }
}
usort($possiblePairs, function($a, $b) {
    return $b['score'] <=> $a['score'];
});
The output is then:
array(2) {
[0]=> array(2) {
[0]=> int(2)
[1]=> int(3)
}
[1]=> array(2) {
[0]=> int(1)
[1]=> int(4)
}
}
I just want to stress: this is only a temporary patch. It will not protect you in cases where, for instance, element 1 can only match element 2 and element 3 can also only match element 2. However, we are dealing with a very small sample size, and the more elements you have, the less likely these edge cases are to occur. In my opinion this fix is not necessary unless you are only working with 4 elements. Working with 6 elements yields 15 possible pairs, and 10 elements yield 45 possible pairs. With those sizes, such cases are already unlikely to happen.
Also, if you find that those edge cases still happen, it may be a good idea to go back and tweak the matching algorithm to be more flexible, or take into account more parameters.
You can try the updated version here
So I think I've got it figured out, thanks to William and to ishegg from another thread.
As a prerequisite, I have an array $unmatchedPlayers, an associative array where the element name is the key and the element position (Place) is the value.
First, using that info, I generate unique permutation pairs:
function generatePermutations($array) {
$permutations = [];
$pairs = [];
$i = 0;
foreach ($array as $key => $value) {
foreach ($array as $key2 => $value2) {
if ($key === $key2) continue;
$permutations[] = [$key, $key2];
}
array_shift($array);
}
foreach ($permutations as $key => $value) {
foreach ($permutations as $key2=>$value2) {
if (!in_array($value2[0], $value) && !in_array($value2[1], $value)) {
$pairs[] = [$value, $value2];
}
}
array_shift($permutations);
}
return $pairs;
}
This returns an array called $pairs, which contains the different possible sets of pairs, like this: $pairs = [[['One','Two'],['Three','Four']],[['One','Three'],['Two','Four']],[['One','Four'],['Two','Three']]];
Now I iterate over the array $pairs and choose the best, giving each permutation combination a 'score':
function chooseBest($permutations,$unmatchedPlayers){
$currentBest = 0;
$best = [];
foreach ($permutations as &$oneCombo){ //Iterate over all permutations [[1,2],[3,4]],[..]
$score = 0;
foreach ($oneCombo as &$pair){
$firstElement = $pair[0];
$secondElement = $pair[1];
//Check if these two has played against each other? If has then stop!
if(hasPlayedTP($firstElement,$secondElement)){
$score = 0;
break;
}
$score += $unmatchedPlayers[$firstElement];
$score += $unmatchedPlayers[$secondElement];
}
if($score > $currentBest){
$currentBest = $score;
$best = $oneCombo;
}
}
return $best;
}
The best score is calculated and permutation pair is returned.

Which is the best and the most efficient way to find and store non existent values between two arrays in both sides?

Perhaps this has been asked several times but I can't find the right answer so here goes.
I have two arrays: one with ~135732 and the other one with ~135730 elements. I need to find which items are in the first but not in the second and vice versa, and I don't know if there is an easy way to achieve that.
This is how I would do it:
$countArr1 = count($arr1);
$countArr2 = count($arr2);
for ($i = 0; $i < $countArr1; $i++) {
    // Check whether the current element of $arr1 is in $arr2 or not
    if (!in_array($arr1[$i], $arr2)) {
        // if it isn't, add it to $newArr
        $newArr[] = $arr1[$i];
    }
}
Then I would do the same in reverse for $arr2. With huge arrays this could take a while and could also exhaust memory or server resources, even if it's executed from the CLI. So which is the best and most efficient way, in terms of resource use, to achieve this?
EDIT
Let's clarify this a bit. I get $arr1 from the DB and $arr2 comes from another place. So the big idea is to find which items need to be updated, which ones need to be added, and which ones need to be marked as obsolete. In plainer words:
if an element is in $arr1 but doesn't exist in $arr2, it should be marked as obsolete
if an element comes in $arr2 but doesn't exist in $arr1, it needs to be added (created)
otherwise the element just needs to be updated
Clear enough? Feel free to ask anything that would help with this
EDIT 2
Based on @dakkaron's answer I made this code:
// $arr1 and $arr2 are previously built
$sortArr1 = asort($arr1);
$sortArr2 = asort($arr2);
$countArr1 = count($sortArr1);
$countArr2 = count($sortArr2);
$i = $j = 0;
$updArr = $inactiveArr = $newArr = [];
echo "original arr1 count: ", count($arr1), "\n";
echo "original arr2 count: ", count($arr2), "\n";
echo "arr1 count: ", $countArr1, "\n";
echo "arr2 count: ", $countArr2, "\n";
while ( $i < $countArr1 && $j < $countArr2) {
if ($sortArr1[$i] == $sortArr2[$j]) {
//Handle equal values
$updArr[] = $sortArr1[$i];
$i++; $j++;
} else if ($sortArr1[$i] < $sortArr2[$j]) {
//Handle values that are in arr1 but not in arr2
$inactiveArr[] = $sortArr1[$i];
$i++;
} else {
//Handle values that are in arr2 but not in arr1
$newArr[] = $sortArr2[$j];
$j++;
}
}
echo "items update: ", count($updArr), "\n", "items inactive: ", count($inactiveArr), "\n", "items new: ", count($newArr), "\n";
And I got this output:
original arr1 count: 135732
original arr2 count: 135730
arr1 count: 1
arr2 count: 1
items update: 1
items inactive: 0
items new: 0
Why does counting the sorted array return 1?
You could take advantage of array_diff: http://php.net/manual/en/function.array-diff.php
Edit
A built-in PHP function is likely to perform better than an equivalent user-defined one. Searching, I found this, but the size of your array is way smaller, and in the end I believe you should benchmark a prototype script with the candidate solutions.
See my last comment.
The best solution I can think of would be to first sort both arrays and then compare them from the bottom up.
Start with the lowest element in both arrays and compare them.
If they are equal, take them and move up one element in both arrays.
If they are different, move up one element in the array with the lower value.
If you reach the end of one of the arrays, the comparison is done; any elements left in the other array have no counterpart.
After the sorting, this walk takes about O(n) time.
This is a bit of code in pseudocode:
arr1 = ...
arr2 = ...
arr1.sort();
arr2.sort();
i1 = 0;
i2 = 0;
while (i1<arr1.length() && i2<arr2.length()) {
    if (arr1[i1]==arr2[i2]) {
        //Handle equal values
        i1++; i2++;
    } else if (arr1[i1]<arr2[i2]) {
        //Handle values that are in arr1 but not in arr2
        i1++;
    } else {
        //Handle values that are in arr2 but not in arr1
        i2++;
    }
}
//Elements left over in arr1 (from i1) or in arr2 (from i2) exist only in that array
Other than that, if you don't want to implement it yourself, just use array_diff
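In PHP, the pseudocode above might look like the sketch below. Note that sort() and asort() sort the array in place and return a bool, not the sorted array. Counting that return value is what produced the count of 1 in the question's second edit (PHP 7 returns 1 for a non-countable value, PHP 8 throws a TypeError instead). The function name classify_sorted and the sample data are made up for illustration, and the sketch assumes the values within each array are unique and comparable with == and <.

```php
<?php
// Sketch of the sorted two-pointer walk (hypothetical helper name).
function classify_sorted(array $arr1, array $arr2): array
{
    sort($arr1);   // sorts in place and reindexes 0..n-1; asort() would keep the keys
    sort($arr2);
    $n1 = count($arr1);
    $n2 = count($arr2);
    $i = $j = 0;
    $both = $onlyFirst = $onlySecond = [];

    while ($i < $n1 && $j < $n2) {
        if ($arr1[$i] == $arr2[$j]) {
            $both[] = $arr1[$i];              // present in both arrays
            $i++;
            $j++;
        } elseif ($arr1[$i] < $arr2[$j]) {
            $onlyFirst[] = $arr1[$i++];       // in $arr1 but not in $arr2
        } else {
            $onlySecond[] = $arr2[$j++];      // in $arr2 but not in $arr1
        }
    }
    // Whatever is left in either array has no counterpart in the other.
    while ($i < $n1) $onlyFirst[] = $arr1[$i++];
    while ($j < $n2) $onlySecond[] = $arr2[$j++];

    return [$both, $onlyFirst, $onlySecond];
}

[$update, $obsolete, $new] = classify_sorted(['b', 'd', 'a'], ['c', 'a', 'e', 'd']);
print_r([$update, $obsolete, $new]);
```

The three returned lists correspond to the "update", "obsolete" and "new" cases described in the question's first edit.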
The best solution I can think of is to sort the second array and then look up each value of the first array using binary search.
This would take O(n log n) time.
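A minimal sketch of that idea, assuming string values as in the question; sorted_contains is a hypothetical helper written for this example, not a built-in PHP function.

```php
<?php
// Hypothetical helper: binary search in a sorted, 0-indexed array of strings.
function sorted_contains(array $sorted, string $needle): bool
{
    $lo = 0;
    $hi = count($sorted) - 1;
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        $cmp = strcmp($sorted[$mid], $needle);
        if ($cmp === 0) {
            return true;
        }
        if ($cmp < 0) {
            $lo = $mid + 1;   // needle can only be in the upper half
        } else {
            $hi = $mid - 1;   // needle can only be in the lower half
        }
    }
    return false;
}

$arr1 = ['b', 'd', 'a'];
$arr2 = ['c', 'a', 'e', 'd'];
sort($arr2, SORT_STRING);        // sort once, compared as strings like strcmp
$missing = [];
foreach ($arr1 as $value) {      // one O(log m) lookup per element of $arr1
    if (!sorted_contains($arr2, $value)) {
        $missing[] = $value;
    }
}
print_r($missing);
```

The same walk has to be repeated with the roles of the two arrays swapped to find the values that exist only in the second array.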
Since your values are strings, you could take the advantage of PHP’s implementation of arrays using a hash-table internally with O(1) for key lookups:
$diff = [];
// A \ B
$lookup = array_flip($b); // O(n)
foreach ($a as $value) { // O(n)
    if (!isset($lookup[$value])) $diff[] = $value;
}
// B \ A
$lookup = array_flip($a); // O(n)
foreach ($b as $value) { // O(n)
    if (!isset($lookup[$value])) $diff[] = $value;
}
So in total, it’s O(n) in both space and time.
Of course, in the end you should benchmark it to see if it’s actually more efficient than other solutions here.
Fill a hashtable-based dictionary/map (in PHP, a regular array keyed by the values serves this purpose) with the elements of the second array, and check whether every element of the first array is present in this dictionary.
The usual complexity is O(N).
for A in arr2
    map.insert(A)
for B in arr1
    if not map.contains(B) then
        element B is in $arr1 but doesn't exist in $arr2
Note that this approach doesn't address all the cases in your edited question.
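One way the lookup idea could be extended to cover all three cases from the edited question is sketched below. The function name classify_by_lookup and the sample data are hypothetical, and the sketch assumes the values are strings or integers, since array_flip only accepts those as values.

```php
<?php
// Sketch: sort values into "update", "obsolete" and "new" using hash lookups.
function classify_by_lookup(array $arr1, array $arr2): array
{
    $in1 = array_flip($arr1);   // value => index, gives O(1) membership tests
    $in2 = array_flip($arr2);
    $update = $obsolete = $new = [];

    foreach ($arr1 as $value) {
        if (isset($in2[$value])) {
            $update[] = $value;     // present in both arrays
        } else {
            $obsolete[] = $value;   // only in $arr1
        }
    }
    foreach ($arr2 as $value) {
        if (!isset($in1[$value])) {
            $new[] = $value;        // only in $arr2
        }
    }
    return ['update' => $update, 'obsolete' => $obsolete, 'new' => $new];
}

$buckets = classify_by_lookup(['b', 'd', 'a'], ['c', 'a', 'e', 'd']);
print_r($buckets);
```

This needs two passes and two lookup arrays, so it trades extra memory for avoiding the sort.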

Determine whether an array is associative (hash) or not [duplicate]

I'd like to be able to pass an array to a function and have the function behave differently depending on whether it's a "list" style array or a "hash" style array. E.g.:
myfunc(array("One", "Two", "Three")); // works
myfunc(array(1=>"One", 2=>"Two", 3=>"Three")); // also works, but understands it's a hash
Might output something like:
One, Two, Three
1=One, 2=Two, 3=Three
i.e. the function does something different when it "detects" it's being passed a hash rather than an array. Can you tell I'm coming from a Perl background, where %hashes are different references from @arrays?
I believe my example is significant because we can't just test to see whether the key is numeric, because you could very well be using numeric keys in your hash.
I'm specifically looking to avoid having to use the messier construct of myfunc(array(array(1=>"One"), array(2=>"Two"), array(3=>"Three")))
Pulled right out of the kohana framework.
public static function is_assoc(array $array)
{
    // Keys of the array
    $keys = array_keys($array);
    // If the array keys of the keys match the keys, then the array must
    // not be associative (e.g. the keys array looked like {0:0, 1:1...}).
    return array_keys($keys) !== $keys;
}
This benchmark gives 3 methods.
Here's a summary, sorted from fastest to slowest. For more information, read the complete benchmark here.
1. Using array_values()
function($array) {
    return (array_values($array) !== $array);
}
2. Using array_keys()
function($array) {
    $array = array_keys($array); return ($array !== array_keys($array));
}
3. Using array_filter()
function($array) {
    return count(array_filter(array_keys($array), 'is_string')) > 0;
}
PHP treats all arrays as hashes, technically, so there is no exact way to do this. I believe your best bet would be the following:
if (array_keys($array) === range(0, count($array) - 1)) {
    // the keys are exactly 0..n-1, so treat it as a list rather than a hash
}
No, PHP does not differentiate arrays where the keys are numeric strings from the arrays where the keys are integers in cases like the following:
$a = array("0"=>'a', "1"=>'b', "2"=>'c');
$b = array(0=>'a', 1=>'b', 2=>'c');
var_dump(array_keys($a), array_keys($b));
It outputs:
array(3) {
[0]=> int(0) [1]=> int(1) [2]=> int(2)
}
array(3) {
[0]=> int(0) [1]=> int(1) [2]=> int(2)
}
(above formatted for readability)
My solution is to walk through the keys of the array as below and check whether any key is not an integer:
private function is_hash($array) {
    foreach ($array as $key => $value) {
        if (!is_int($key)) {
            return true;
        }
    }
    return false;
}
It is wrong to get array_keys of a hash array like below:
array_keys(array(
"abc" => "gfb",
"bdc" => "dbc"
)
);
will output:
array(
0 => "abc",
1 => "bdc"
)
So, it is not a good idea to compare it with a range of numbers as mentioned in top rated answer. It will always say that it is a hash array if you try to compare keys with a range.
Being a little frustrated while trying to write a function that addresses all combinations, an idea clicked in my mind: inspect the json_encode result.
When a JSON string contains a curly brace, it must contain an object!
Of course, after reading the solutions here, mine is a bit funny...
Anyway, I want to share it with the community, just to present an attempt to solve the problem from another perspective (a more "visual" one).
function isAssociative(array $arr): bool
{
    // consider empty, and [0, 1, 2, ...] sequential
    if (empty($arr) || array_is_list($arr)) {
        return false;
    }
    // first scenario:
    // [ 1 => [*any*] ]
    // [ 'a' => [*any*] ]
    foreach ($arr as $key => $value) {
        if (is_array($value)) {
            return true;
        }
    }
    // second scenario: read the json string
    $jsonNest = json_encode($arr, JSON_THROW_ON_ERROR);
    return str_contains($jsonNest, '{'); // {} assoc, [] sequential
}
NOTES
PHP 8.1 is required; check out the gist on GitHub containing the unit tests for this method plus polyfills (PHP >= 7.3).
I've tested also Hussard's posted solutions, A & B are passing all tests, C fails to recognize: {"1":0,"2":1}.
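For PHP versions before 8.1, where array_is_list() is not available, a stand-in along the following lines could be used. This is only a sketch under the name is_list_sketch (a made-up name, and not the polyfill from the gist mentioned above); it treats an array as a list when its keys are exactly 0, 1, 2, ... in that order.

```php
<?php
// Hypothetical stand-in for array_is_list() on PHP versions before 8.1.
function is_list_sketch(array $array): bool
{
    $expected = 0;
    foreach ($array as $key => $unused) {
        if ($key !== $expected) {
            return false;   // a gap, a string key, or keys out of order
        }
        $expected++;
    }
    return true;            // an empty array also counts as a list
}

var_dump(is_list_sketch(['One', 'Two', 'Three']));
var_dump(is_list_sketch([1 => 'One', 2 => 'Two', 3 => 'Three']));
var_dump(is_list_sketch([]));
```

Unlike the range() comparison, this stops at the first key that breaks the sequence and does not build a second array.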
BENCHMARKS
Here json parsing is ~200 ms behind B, but still 1.7 seconds faster than solution C!
What do you think about this version? Improvements are welcome!
