PHP array comparison algorithm - php

While trying to simulate a bit of PHP behaviour I stumbled across this:
$a=array(0 => 1, 'test' => 2);
$b=array('test' => 3, 0 => 1);
var_dump($a==$b, $a>$b, $b>$a);
According to the output from var_dump $b is bigger than $a. In the PHP manual there is a Transcription of standard array comparison which states that the values of the arrays are compared one by one and if a key from the first array is missing in the second array, the arrays are uncomparable. So far so good. But if I try this (change in the first element of $a only):
$a=array(0 => 2, 'test' => 2);
$b=array('test' => 3, 0 => 1);
var_dump($a==$b, $a>$b, $b>$a);
All three comparison results are false. This looks like "uncomparable" to me (because the > result is the same as the < result, while the arrays are not == either, which makes no sense), but this does not fit the transcription from the PHP manual. Both keys are present in both arrays, and I would expect $a to be bigger this time because the value at key 0 is bigger in $a (2 vs. 1).
I've tried to dig into the PHP source code and found zend_hash_compare() in zend_hash.c, but the code there seems to work as the manual describes.
What's going on here?

EDIT: As Joachim has shown, it deals with the order called. To steal his words:
"$a>$b loops over b and finds 'test' first. 'test' is greater in $b so $b is greater and it returns false. $b>$a loops over a and finds '0' first. '0' is greater in $a so $a is greater and it returns false."
-- Original Post --
I'm not 100% sure I'm right on this; I haven't seen this before, and have only briefly looked into it (major kudos, by the way, on an excellent question!). Anyway, it would appear that either PHP documentation is wrong, or this is a bug (in which case you might want to submit it), and here is why:
In zend_hash_compare() in zend_hash.c, there seems to be some confusion over what "ordered" means (I'm looking at lines 1514 and 1552-1561, which are my best guess at where the problem is, without doing lots of testing).
Here's what I mean; try this:
$a=array(0 => 2, 'test' => 2);
$b=array(0 => 1, 'test' => 3);
var_dump($a==$b, $a>$b, $b>$a);
Note I merely switched the order of indexes, and $a>$b returns true. Also see this:
$x=array(0 => 2, 'test' => 2);
$y = $x;
$y[0] = 1; $y['test'] = 3;
var_dump($x==$y, $x>$y, $y>$x);
Note here, as well, $x>$y returns true. In other words, PHP is not just matching array keys! It cares about the order of those keys in the arrays! You can prevent this situation by coming up with a "base" array and "copying" it into new variables (in my x/y example) before modifying, or you can create an object, if you so desire.
To say all that differently, and much more briefly, it would appear that PHP is not just looking at key values, but at both key values AND key order.
Again, I emphasize that I don't know if this is expected behavior (it seems like something they ought to have noted in the PHP manual if it was) or a bug/error/etc. (which seems much more likely to me). Either way, I'm finding that arrays are compared first by number of keys (lines 1496-1501 in zend_hash.c), and then by both key value and key order.

It would seem that for > the comparison loop runs over the right-hand array, and for < it runs over the left-hand array, i.e. always over the supposedly "lesser" array. The order of the elements is significant because the foreach loop in the transcription code respects array order.
In other words;
$a>$b loops over b and finds 'test' first. 'test' is greater in $b so $b is greater and it returns false.
$b>$a loops over a and finds '0' first. '0' is greater in $a so $a is greater and it returns false.
This would actually make sense, the "greater" array is then allowed to contain elements that the "lesser" array doesn't and still be greater as long as all common elements are greater.
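The explanation above can be reproduced with a userland sketch of the manual's comparison transcription. The function name standard_array_compare follows the manual's transcription; returning null for a missing key, and the idea that $x > $y is evaluated as a loop over $y, are assumptions taken from the discussion above rather than confirmed from the engine source.

```php
<?php
// Userland sketch of the manual's array comparison transcription.
// Returns -1, 0 or 1, or null when a key of $op1 is missing in $op2
// (treated as "uncomparable" in this sketch).
function standard_array_compare(array $op1, array $op2)
{
    if (count($op1) !== count($op2)) {
        return count($op1) < count($op2) ? -1 : 1;
    }
    // Only the keys of $op1 drive the loop, in $op1's own order.
    foreach ($op1 as $key => $val) {
        if (!array_key_exists($key, $op2)) {
            return null;
        }
        if ($val < $op2[$key]) {
            return -1;
        }
        if ($val > $op2[$key]) {
            return 1;
        }
    }
    return 0;
}

$a = array(0 => 2, 'test' => 2);
$b = array('test' => 3, 0 => 1);

$aFirst = standard_array_compare($a, $b); // sees key 0 first: 2 > 1
$bFirst = standard_array_compare($b, $a); // sees 'test' first: 3 > 2
var_dump($aFirst, $bFirst); // int(1) int(1)
```

Each array comes out as the greater one when it drives the loop, which is consistent with $a>$b and $b>$a both being false.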

I think the comparison here is done element by element, so $a[0] > $b[0] but $a['test'] < $b['test'].
You cannot say which array is bigger.

Related

How is this (erroneous) comparison done by PHP?

Assume the following code, which tries to determine whether the array has more than 3 elements. Note that I am aware this is normally done using count($array) and comparing the integers, but I got curious as to why
$array = [1, 2, 3];
var_dump($array > 3);
returns true. It actually does so regardless of the value of the right-hand operand, so $array > 3 is no different from $array > 3000.
My question lies in what sort of typecasting happens internally in PHP when an array is compared to an integer in this entirely inappropriate manner, or whether there is a case where this manner is indeed appropriate.
From the PHP manual, it says:
Type of operand 1: array | Type of operand 2: anything | Result: array is always greater
So when you compare anything with array, then array is greater. I am now going to check array vs. object.
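A short check of that rule, written as a sketch: the variable names $results and $hasMoreThanThree are only illustrative. It shows that the integer on the right never matters, and that comparing count() gives the answer the question was after.

```php
<?php
$array = [1, 2, 3];

// An array compared with an integer is always the greater operand,
// whatever the integer is (even for an empty array).
$results = [$array > 3, $array > 3000, [] > PHP_INT_MAX];

// Comparing the element count gives the intended answer.
$hasMoreThanThree = count($array) > 3;

var_dump($results, $hasMoreThanThree);
```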

Why does array_udiff use a compare function instead of a predicate function?

array_udiff computes the difference between two arrays using a callback function. However, it requires a compare function instead of a predicate function.
A compare function compares item A relative to item B. A predicate function would just determine whether or not item A is equal to item B.
Compare functions are usually required by sort functions to determine the correct ordering. Since array_udiff is just computing the differences, a predicate function that determines whether each pair is equal seems like it should suffice.
Why does array_udiff use a compare function instead of a predicate function? Does it matter if I use a predicate instead? i.e. Can I elect to just use the 0 and 1 return values to denote inequality and equality, discarding the -1 possibility? What adverse effect, if any, would this have on my results?
The implementation for php_array_diff() (which supplies implementation for a number of userspace array functions) works by reusing a number of internal comparison functions.
This is because those comparison functions already exist for other purposes, and meet the needed task at hand: determine if two items are equal or not. That they do a little extra work is unimportant; what's important is the relative reduction in code that needs to be considered. (An equals function could easily be written in terms of a comparison function, or as a separate entity, but now you've got two functions to do the same job.)
The actual implementation also works by sorting. So you need to use a comparison algorithm suitable for sorting, or you will get unexpected results. For example:
$a = [0, 1, 2, 3, 4, 5, 6];
$b = [4];
print_r(array_udiff($a, $b, function($x, $y) {
    return $x <=> $y; // Sorting comparison function, correct
}));
print_r(array_udiff($a, $b, function($x, $y) {
    return $x != $y; // Equality test, incorrect
}));
gives
Array //Sorting comparison function, correct
(
[0] => 0
[1] => 1
[2] => 2
[3] => 3
[5] => 5
[6] => 6
)
Array // Equality test, incorrect
(
[0] => 0
[1] => 1
[2] => 2
[3] => 3
[4] => 4 // equality test causes udiff to incorrectly include 4
[5] => 5
[6] => 6
)
The reason for this is the algorithm php_array_diff() uses. Basically it goes like this:
1. Duplicate and sort all of the input arrays.
2. Set the output OUT equal to the sorted first input array (SRC).
3. For each element in SRC:
   - V is the value of the current element in SRC.
   - For each input array A, starting from the second: skip ahead to the next element in A that is > V, but make a note if we go past one that is == V.
   - If we found a match for V, remove it from OUT.
   - If we didn't (so it stays in OUT), skip ahead in SRC until we have a new V >= the current one.
So the algorithm relies on all of the inputs being sorted, and uses that fact (and the comparison function) so that it only has to inspect each element in each input array once. If the comparison function doesn't produce an actually sorted array, the algorithm fails and you get a bad result.
HHVM may produce a different result, as HHVM uses a different sorting algorithm. HHVM uses a pure quicksort, while PHP uses a quicksort implementation derived from LLVM that includes an insertion sort optimization.
Normally, different sorting algorithms arrive at the same solution via different means. That is, different algorithms cause elements to be compared in a different order, at different times, and in different quantities. In the case of an incorrect comparison function, this can have a large effect on the final order of the array.
It must be cheaper to complement ordered arrays. Without sorting it will take O(m*n) time.
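To make the sort-then-walk idea concrete, here is a minimal sketch of a difference computed over sorted copies. It is not PHP's actual php_array_diff() code: sorted_diff is a hypothetical helper, it handles only two arrays, and it discards the original keys. It does show why the callback has to report less-than and greater-than as well as equality.

```php
<?php
// Hypothetical helper: difference of two arrays via sorted copies.
// $cmp must be a three-way comparison (negative, zero, positive).
function sorted_diff(array $first, array $second, callable $cmp): array
{
    $src = array_values($first);
    $other = array_values($second);
    usort($src, $cmp);
    usort($other, $cmp);

    $out = [];
    $j = 0;
    $n = count($other);
    foreach ($src as $v) {
        // Skip elements of $other that sort before $v; they can never
        // match $v or anything after it in $src.
        while ($j < $n && $cmp($other[$j], $v) < 0) {
            $j++;
        }
        // Keep $v unless the element we stopped at compares equal to it.
        if ($j >= $n || $cmp($other[$j], $v) != 0) {
            $out[] = $v;
        }
    }
    return $out;
}

$diff = sorted_diff([0, 1, 2, 3, 4, 5, 6], [4], function ($x, $y) {
    return $x <=> $y;
});
print_r($diff); // values 0, 1, 2, 3, 5, 6
```

Each array is walked once after sorting, instead of every element being compared against every other element. A callback that only answers "equal or not" gives the skip loop no way to know when to stop, which is where the wrong results come from.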

Counting Array Returns One Digit Higher

I'm trying to count a php array.
I have my code successfully counting it, but the value is returning one digit higher than what my array is.
I have tried using -- when echoing my array, but that doesn't work.
Here is my code so far:
$quotes[0] = "Volvo";
$quotes[1] = "BMW";
$quotes[2] = "Toyota";
$quotesCount = count($quotes);
echo ($quotes[rand(0, 2)]);
echo $quotesCount--;
When it counts, it returns "3", which makes sense because there are three items, but how do I subtract one when it echoes so that it reflects the largest index in the array?
What you tried with echo $quotesCount--; is almost doing what you want. What you missed, though, is how -- works. You can place it either in front of the variable or behind it, and that makes a difference.
To get the full version, read this: http://php.net/manual/en/language.operators.increment.php
But the short version is that you could potentially do this:
echo --$quotesCount;
Which will show you the value you want.
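A small sketch of the difference between the two forms; the variable names are only illustrative.

```php
<?php
$quotes = ['Volvo', 'BMW', 'Toyota'];

$n = count($quotes);  // 3
$post = $n--;         // post-decrement: yields 3, then $n becomes 2
$afterPost = $n;      // 2

$m = count($quotes);  // 3
$pre = --$m;          // pre-decrement: $m becomes 2, then yields 2

var_dump($post, $afterPost, $pre); // int(3) int(2) int(2)
```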
However this is still not really true - you are confusing array keys with the count of elements in an array.
If your array had non-sequential keys (1,3,5) for example, that code would return 2 - which is certainly not the highest key.
You can get a nice stepping stone to the key itself by using http://php.net/manual/en/function.array-keys.php - then you can reference the actual key itself by its order in the array.
You can use max(array_keys($quotes)); this will return the highest key in the array.
You should use max(array_keys($array)) in this case.
max() returns the highest value of an array, so applying it to array_keys() gives the highest key.
That's it,
Keep Coding :)
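A minimal sketch of the distinction between the element count and the highest key, using non-sequential keys as in the (1,3,5) example above. Picking the random quote with array_rand() is an addition here, not something from the original code; it avoids hard-coding the key range.

```php
<?php
$quotes = [1 => 'Volvo', 3 => 'BMW', 5 => 'Toyota'];

$count = count($quotes);                 // number of elements: 3
$highestKey = max(array_keys($quotes));  // largest key: 5

// array_rand() returns a random existing key, so this works
// even when the keys are not 0..n-1.
$randomQuote = $quotes[array_rand($quotes)];

var_dump($count, $highestKey);
```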

PHP: What is the real meaning of an index in an indexed array?

Say, we make an array like this:
$arr = Array
(
2 => 'c',
1 => 'b',
0 => 'a'
);
When you pass it to array_pop():
array_pop($arr);
the "last" element is popped off, which is the one with index zero!
print_r($arr);
Result:
Array
(
[2] => c
[1] => b
)
So, what's the purpose of index?
Isn't it just a different way of saying "numeric keys of associative arrays"?
Is it only PHP that does this, or do all languages treat arrays like this?
Not all languages do this, but PHP does, because PHP is a little weird. It implements arrays more or less like dictionaries. PHP does offer some functions like ksort though, which let you sort the array by key.
And that's what this is: a key. An array has indexes as well, so what you've got is an array where item 2 has key 0. And that's where the confusion continues.
PHP: a fractal of bad design has a whole chapter about arrays. Interesting reading material. :)
The reason for this behavior is that arrays in PHP are actually ordered maps: elements are kept in the order they were inserted, not in the order of their keys.
Because of this, don't think of accessing the arrays in terms of indexes, think of it in terms of keys. Keys can be numbers and they can be strings, but the result is the same; you're still using a map, not a true "array".
Once you accept that fact, you'll understand why PHP includes functions like ksort() for sorting an array by keys and why array_pop() doesn't always remove the highest key value.
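A short sketch using the array from the question: array_pop() removes whatever was inserted last, regardless of its key, and ksort() rearranges the elements by key.

```php
<?php
$arr = [2 => 'c', 1 => 'b', 0 => 'a'];

// array_pop() removes the element in the last position,
// which here is the one with key 0.
$popped = array_pop($arr);
$afterPop = $arr;   // [2 => 'c', 1 => 'b']

// ksort() reorders the remaining elements by key.
ksort($arr);        // [1 => 'b', 2 => 'c']

var_dump($popped);
print_r($afterPop);
print_r($arr);
```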
It's a PHP thing. Other languages usually provide separate structures for what is the default behaviour of arrays in PHP. JavaScript, for instance, always keeps array elements in index order:
a = [];
> []
a[1] = 'a';
> "a"
a[2] = 'b';
> "b"
a[0] = 'c';
> "c"
a
> ["c", "a", "b"]
In Java you would need to use a Hash Map or something else to do Associative Arrays. PHP handles data structures more loosely than other languages.
The index allows you to identify and access the elements of the array.
The reason is simple: hash tables.
PHP's internal functions often use hash tables. Basically, an array is some data in memory. In a language like C an array index can only be an integer, but not in PHP.
PHP solves this with hash tables. If you assign an index such as foo, the value is not stored directly under foo; the key gets hashed and may end up internally as something like 000000000111, depending on the hash function.
So PHP doesn't work directly with the key you assigned, and this is why an index like 0 can belong to the last element. Internally PHP works with hash tables that keep a "list" recording which key is assigned to which position in the array.

PHP associative arrays keys only

Is it possible to declare an array element key and not define it a value (like non-array variables)? This way if you have an associative array of booleans, you need only check if the key exists rather than assigning a boolean value. But you'd still have the advantage of not having to iterate over the array when checking if a key exists.
This would be a space saving measure. It appears 'null' gets allocated space.
No. An array element always has both a key and a value; however, you may put anything in as the value if you do not care about it (e.g. an empty string). In your case you should add only the keys that count as true to the array. When you look a key up and cannot find it, you can assume it is false. In general, though, this is the wrong approach: you are not really saving anything, and you make your code unclear and hard to read and maintain. Do not do this.
If you don't want a dictionary structure like an associative array, then you just want a set of values, like this:
$array = array('red', 'green', 'blue');
To check whether an item exists, just use in_array():
if(in_array('red', $array)) {
// -> found
}
However, you should note that PHP will internally create numeric indices in this case.
Another way to go would be to assign TRUE to all values. This would at least take less memory. Like this:
$array = array(
'red' => TRUE,
'green' => TRUE,
'blue' => TRUE
);
and check for existence using isset(), like:
if(isset($array['red'])) {
// -> found
}
Note: I wouldn't advise you to use NULL as the value. You cannot use isset() in that case, because isset() returns false if the value of a key is NULL. You would have to use array_key_exists() instead, which is significantly slower than isset().
Conclusion: In terms of processor and memory consumption I would suggest the second advice in PHP. The memory consumption should be the same as with numeric arrays but search operations are optimized.
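A minimal sketch of the isset() versus array_key_exists() difference mentioned in the note; the $flags array and its keys are only illustrative.

```php
<?php
$flags = ['red' => true, 'green' => null];

$checks = [
    'isset red'          => isset($flags['red']),              // true
    'isset green'        => isset($flags['green']),            // false: value is null
    'key_exists green'   => array_key_exists('green', $flags), // true: key is present
    'isset missing'      => isset($flags['blue']),             // false
    'key_exists missing' => array_key_exists('blue', $flags),  // false
];

var_dump($checks);
```

With TRUE as the value, isset() alone is enough to test membership.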
If I understand correctly, you plan to use an associative array like this:
key value
"bool1" ""
"bool2" ""
"bool3" ""
And if a key exists, then the bool is "true".
Why not just use an ordinary array like this?:
key value
1 "bool1"
2 "bool2"
3 "bool3"
If the value exists, then the bool is "true".
Yes, it's possible. You can also use array_key_exists to check for those keys. PHP separates the hash map of variable names from the actual storage of data (search for "zval" if you're interested). That said, arrays pay an additional penalty: each element also needs an associated "bucket" structure which, depending on your OS and compile options, can be as large as 96 bytes per element. Zvals are also as much as 48 bytes each, by the way.
I don't think you're going to get much value from this scheme, but purely from a hypothetical standpoint, you can store a null value.
<?php
$foo = array('a' => null, 'b' => null);
if (array_key_exists('a', $foo))
    echo 'a';
This does not save you any memory, however, compared to initializing to a boolean, which would then let you use isset(), which is faster than making the function call to array_key_exists.
<?php
$foo = array('a' => true, 'b' => true);
if (isset($foo['a']))
    echo 'a';
