How is this (erroneous) comparison done by PHP?

Assume the following code, which tries to determine whether the array has more than 3 elements. Note that I am aware this is normally done using count($array) and comparing the integers, but I got curious as to why the following:
$array = [1, 2, 3];
var_dump($array > 3);
returns true. In fact, it does so regardless of the value of the right-hand operand in the var_dump, so $array > 3 is no different from $array > 3000.
My question is what sort of type conversion happens internally in PHP when an array is compared to an integer in this entirely inappropriate manner, or whether there is a case where this kind of comparison is actually appropriate.

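The PHP manual's table of comparisons between different types says:
Operand 1 | Operand 2 | Result
array     | anything  | array is always greater
So when you compare an array with an integer (or most other non-array values), the array is greater. I am now going to check array vs. object.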
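A minimal PHP sketch of the rule quoted above, assuming PHP 7.4 or later and reusing the question's variable: against an integer, the array wins no matter how large the integer is or what the array holds, while the intended check compares two integers.

```php
<?php
$array = [1, 2, 3];

// Array vs. int: per the comparison table the array is always greater,
// regardless of the integer's value or the array's contents.
var_dump($array > 3);        // bool(true)
var_dump($array > 3000);     // bool(true)
var_dump($array <=> 3);      // int(1)
var_dump($array == 3);       // bool(false)

// The intended check: compare the element count, i.e. two integers.
var_dump(count($array) > 3); // bool(false)
```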

Related

Comparing PHP arrays with integers, what is the real behavior

Today I ran into some weird behavior with arrays in PHP 7.1.
$emptyArray = [];
echo empty($emptyArray);
echo count($emptyArray);
echo (($emptyArray > 0));
The results of the first two echo statements are as expected (empty: true, count: 0), but the last one, which confused me, returned true!
Why does PHP consider an empty array larger than zero?!
The answer is to be found in the rules for comparisons between different types:
Operand 1 | Operand 2 | Result
...       | ...       | ...
array     | anything  | array is always greater
As for why PHP considers an empty array larger than zero: the documentation states that when an array is compared with a value of a different type, the array is always greater (bool and null excepted, since those rows convert both sides to bool).
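A short PHP sketch of the same table, assuming PHP 7 or later. The bool and null rows of the manual's table take precedence over the array row, so against false or null both sides are converted to bool and an empty array is simply falsy. The count() check at the end is what the question actually wanted.

```php
<?php
$emptyArray = [];

// Array vs. int: the array is greater, even when it is empty.
var_dump($emptyArray > 0);        // bool(true)

// Against bool or null, both sides are converted to bool first,
// and an empty array converts to false.
var_dump($emptyArray > false);    // bool(false)
var_dump($emptyArray == false);   // bool(true)
var_dump($emptyArray == null);    // bool(true)

// The meaningful check compares the element count.
var_dump(count($emptyArray) > 0); // bool(false)
```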

Why does array_udiff use a compare function instead of a predicate function?

array_udiff computes the difference between two arrays using a callback function. However, it requires a compare function instead of a predicate function.
A compare function compares item A relative to item B. A predicate function would just determine whether or not item A is equal to item B.
Compare functions are usually required by sort functions to determine the correct ordering. Since array_udiff is just computing the differences, a predicate function that determines whether each pair is equal seems like it should suffice.
Why does array_udiff use a compare function instead of a predicate function? Does it matter if I use a predicate instead? i.e. Can I elect to just use the 0 and 1 return values to denote inequality and equality, discarding the -1 possibility? What adverse effect, if any, would this have on my results?
The implementation for php_array_diff() (which supplies implementation for a number of userspace array functions) works by reusing a number of internal comparison functions.
This is because those comparison functions already exist for other purposes, and meet the needed task at hand: determine if two items are equal or not. That they do a little extra work is unimportant; what's important is the relative reduction in code that needs to be considered. (An equals function could easily be written in terms of a comparison function, or as a separate entity, but now you've got two functions to do the same job.)
The actual implementation also works by sorting. So you need to use a comparison algorithm suitable for sorting, or you will get unexpected results. For example:
$a = [0, 1, 2, 3, 4, 5, 6];
$b = [4];
print_r(array_udiff($a, $b, function($x, $y) {
    return $x <=> $y; // Sorting comparison function, correct
}));
print_r(array_udiff($a, $b, function($x, $y) {
    return $x != $y; // Equality test, incorrect
}));
gives
Array //Sorting comparison function, correct
(
[0] => 0
[1] => 1
[2] => 2
[3] => 3
[5] => 5
[6] => 6
)
Array // Equality test, incorrect
(
[0] => 0
[1] => 1
[2] => 2
[3] => 3
[4] => 4 // equality test causes udiff to incorrectly include 4
[5] => 5
[6] => 6
)
The reason for this is the algorithm php_array_diff() uses. Basically it goes like this:
Duplicate and sort all of the input arrays
Set the output OUT equal to a copy of the first input array
Let SRC be the sorted copy of the first input array
For each element in SRC
    V is the value of the current element in SRC
    For each input array A starting from the second
        Skip ahead past the elements in A that are < V, and note whether the element we stop at is == V
    If we found a match for V, remove it (and every other entry equal to V) from OUT
    If we didn't (so it stays in OUT), skip ahead in SRC until we have a new V > the current one
So the algorithm relies on all of the inputs being sorted, and uses that fact (and the comparison function) so that it only has to inspect each element in each input array once. If the comparison function doesn't produce an actually sorted array, the algorithm fails and you get a bad result.
HHVM may produce a different result, as it uses a different sorting algorithm: HHVM uses a pure quicksort, while PHP uses a quicksort implementation derived from LLVM that includes an insertion sort optimization.
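The steps above can be sketched in PHP as follows. This is a simplified two-array illustration, not PHP's C implementation; sorted_diff() is a hypothetical helper name, and the comparator is assumed to be a proper three-way function.

```php
<?php
// Hypothetical helper: a simplified two-array sketch of the sorted "merge"
// diff described above. Not PHP's actual C code, only an illustration.
function sorted_diff(array $first, array $second, callable $cmp): array
{
    // Duplicate and sort both inputs with the same comparator (SRC keeps keys).
    $src = $first;
    uasort($src, $cmp);
    $other = array_values($second);
    usort($other, $cmp);

    $out = $first;  // OUT: like array_udiff, keep the original keys and order
    $j = 0;         // cursor into the sorted second array; it only moves forward
    $n = count($other);

    foreach ($src as $key => $v) {
        // Skip elements of the second array that sort before V.
        while ($j < $n && $cmp($other[$j], $v) < 0) {
            $j++;
        }
        // Both lists are sorted, so a match can only be at the cursor.
        if ($j < $n && $cmp($other[$j], $v) == 0) {
            unset($out[$key]);
        }
    }
    return $out;
}

$a = [0, 1, 2, 3, 4, 5, 6];
$b = [4];
$cmp = fn($x, $y) => $x <=> $y;
print_r(sorted_diff($a, $b, $cmp)); // same keys and values as array_udiff($a, $b, $cmp)
```

Because both lists are sorted, the cursor $j only moves forward, so after sorting each element is looked at once. With a comparator that does not describe a real ordering (such as $x != $y), the sort order and the cursor no longer agree, which is how a bad result like the one above can arise.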
Normally, different sorting algorithms arrive at the same solution via different means. That is, different algorithms cause elements to be compared in a different order, at different times, and in different quantities. In the case of an incorrect comparison function, this can have a large effect on the final order of the array.
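Diffing sorted arrays is cheaper: once the inputs are sorted, a single forward pass over each one is enough, whereas without sorting every element of one array has to be compared against every element of the other, which takes O(m*n) time.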
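For a practical case, a comparator for array_udiff() should return a negative, zero or positive number, exactly as a sort callback would. A sketch assuming PHP 7.4+ (arrow functions) and made-up records identified by a hypothetical 'id' field:

```php
<?php
// Made-up example records; 'id' is a hypothetical field name.
$current = [
    ['id' => 3, 'name' => 'carol'],
    ['id' => 1, 'name' => 'alice'],
    ['id' => 2, 'name' => 'bob'],
];
$removed = [['id' => 2, 'name' => 'bob']];

// A three-way comparator (negative, zero or positive), like a sort callback.
// Returning only 0/1 or a bool would not give array_udiff a valid ordering.
$byId = fn(array $x, array $y): int => $x['id'] <=> $y['id'];

$kept = array_udiff($current, $removed, $byId);
print_r($kept); // keys 0 and 1 (ids 3 and 1) remain, in their original order
```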

PHP: What is the real meaning of an index in an indexed array?

Say, we make an array like this:
$arr = Array
(
2 => 'c',
1 => 'b',
0 => 'a'
);
When you pass it to array_pop():
array_pop($arr);
And the "last" element is popped off, which is the one with index zero!
print_r($arr);
Result:
Array
(
[2] => c
[1] => b
)
So, what's the purpose of the index?
Isn't it just a different way of saying "numeric keys of an associative array"?
Is it only PHP that does this, or do all languages treat arrays like this?
Not all languages do this, but PHP does, because PHP is a little weird. It implements arrays more or less like dictionaries. PHP does offer some functions like ksort though, which let you sort the array by key.
And that's what this is: a key. Elements also have positions, so what you have is an array whose item at position 2 (counting from 0) has key 0. And that's where the confusion continues.
PHP: a fractal of bad design has a whole chapter about arrays. Interesting reading material. :)
The reason for this behavior is that arrays in PHP are actually ordered maps: elements keep the order in which they were inserted, regardless of the values of their keys.
Because of this, don't think of accessing the arrays in terms of indexes, think of it in terms of keys. Keys can be numbers and they can be strings, but the result is the same; you're still using a map, not a true "array".
Once you accept that fact, you'll understand why PHP includes functions like ksort() for sorting an array by keys and why array_pop() doesn't always remove the highest key value.
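A small PHP sketch of the ordered-map behaviour described above, using the question's array (array_key_last() assumes PHP 7.3 or later):

```php
<?php
// PHP arrays are ordered maps: elements stay in insertion order,
// independent of the numeric value of their keys.
$arr = [2 => 'c', 1 => 'b', 0 => 'a'];

$popped = array_pop($arr);      // removes the last *inserted* element (key 0)
var_dump($popped);              // string(1) "a"
print_r($arr);                  // $arr is now [2 => 'c', 1 => 'b']
var_dump(array_key_last($arr)); // int(1): the last key in insertion order

// To work in key order instead, sort by key first.
$byKey = [2 => 'c', 1 => 'b', 0 => 'a'];
ksort($byKey);                  // order is now 0, 1, 2
$highest = array_pop($byKey);
var_dump($highest);             // string(1) "c"
```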
It's a PHP thing. Other languages usually provide separate structures for what is the default behaviour of arrays in PHP. JavaScript, for instance, always keeps array elements in index order, whatever order you assign them in:
a = [];
> []
a[1] = 'a';
> "a"
a[2] = 'b';
> "b"
a[0] = 'c';
> "c"
a
> ["c", "a", "b"]
In Java you would need a HashMap (or a LinkedHashMap, to keep insertion order as PHP does) or a similar structure for associative arrays. PHP handles data structures more loosely than other languages.
The index allows you to identify and access the elements of the array.
The reason is simple: hash tables.
Internally, PHP implements arrays with hash tables. In C, an array is just a block of memory and its index can only be an integer; in PHP, a key can also be a string.
PHP solves this with hash tables: if you assign a key such as foo, PHP does not use foo directly as a position; it hashes the key and uses the hash to find the element's slot.
So PHP doesn't work directly with the key you assign, and this is why an index like 0 can be the last element. Internally, the hash table also keeps an ordered list of its elements (in insertion order) that records which key sits at which position in the array.

PHP array comparison algorithm

While trying to simulate a bit of PHP behaviour I stumbled across this:
$a=array(0 => 1, 'test' => 2);
$b=array('test' => 3, 0 => 1);
var_dump($a==$b, $a>$b, $b>$a);
According to the output from var_dump $b is bigger than $a. In the PHP manual there is a Transcription of standard array comparison which states that the values of the arrays are compared one by one and if a key from the first array is missing in the second array, the arrays are uncomparable. So far so good. But if I try this (change in the first element of $a only):
$a=array(0 => 2, 'test' => 2);
$b=array('test' => 3, 0 => 1);
var_dump($a==$b, $a>$b, $b>$a);
All three comparison results are false. This looks like "uncomparable" to me (because the > result is the same as the < result, while the arrays are not == either, which makes no sense), but this does not fit the transcription from the PHP manual. Both keys are present in both arrays, and I would expect $a to be bigger this time because the content of key 0 is bigger in $a (2 vs. 1).
I've tried to dig into the PHP source code and found zend_hash_compare() in zend_hash.c, but the code there seems to work as the manual describes.
What's going on here?
EDIT: As Joachim has shown, it depends on the order in which the comparison is done. To steal his words:
"$a>$b loops over b and finds 'test' first. 'test' is greater in $b so $b is greater and it returns false. $b>$a loops over a and finds '0' first. '0' is greater in $a so $a is greater and it returns false."
-- Original Post --
I'm not 100% sure I'm right on this; I haven't seen this before, and have only briefly looked into it (major kudos, by the way, on an excellent question!). Anyway, it would appear that either PHP documentation is wrong, or this is a bug (in which case you might want to submit it), and here is why:
In zend_hash_compare() in zend_hash.c, it seems as though there is some confusion over what ordered is (I'm looking at line 1514 and lines 1552-1561, which are my best guess at where the problem is, without doing lots of testing).
Here's what I mean; try this:
$a=array(0 => 2, 'test' => 2);
$b=array(0 => 1, 'test' => 3);
var_dump($a==$b, $a>$b, $b>$a);
Note I merely switched the order of indexes, and $a>$b returns true. Also see this:
$x=array(0 => 2, 'test' => 2);
$y = $x;
$y[0] = 1; $y['test'] = 3;
var_dump($x==$y, $x>$y, $y>$x);
Note here, as well, $x>$y returns true. In other words, PHP is not just matching array keys! It cares about the order of those keys in the arrays! You can prevent this situation by coming up with a "base" array and "copying" it into new variables (in my x/y example) before modifying, or you can create an object, if you so desire.
To say all that differently, and much more briefly, it would appear that PHP is not just looking at key values, but at both key values AND key order.
Again, I emphasize that I don't know whether this is expected behavior (it seems like something they ought to have noted in the PHP manual if it were) or a bug/error/etc. (which seems much more likely to me). But either way, I'm finding that arrays are compared first by number of keys (lines 1496-1501 in zend_hash.c), and then by both key value and key order.
It would seem that for > the comparison loop runs over the right-hand array and for < it runs over the left-hand array, i.e. always over the supposedly "lesser" array. The order of the elements is significant because the foreach loop in the transcription code respects array order.
In other words;
$a>$b loops over b and finds 'test' first. 'test' is greater in $b so $b is greater and it returns false.
$b>$a loops over a and finds '0' first. '0' is greater in $a so $a is greater and it returns false.
This would actually make sense, the "greater" array is then allowed to contain elements that the "lesser" array doesn't and still be greater as long as all common elements are greater.
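The explanation above can be checked with a PHP sketch modeled on the manual's transcription. array_cmp_sketch() and array_gt_sketch() are hypothetical names; the sketch assumes that a key missing from $op2 returns 1, and that $x > $y is evaluated as $y < $x, which is how PHP compiles the > operator.

```php
<?php
// Hypothetical helper modeled on the manual's "transcription of standard
// array comparison". Assumption: a key missing from $op2 returns 1.
function array_cmp_sketch(array $op1, array $op2): int
{
    if (count($op1) !== count($op2)) {
        return count($op1) <=> count($op2); // fewer elements means "smaller"
    }
    // The loop follows $op1's element order, so key order matters.
    foreach ($op1 as $key => $val) {
        if (!array_key_exists($key, $op2)) {
            return 1;
        }
        $c = $val <=> $op2[$key];
        if ($c !== 0) {
            return $c; // the first differing common key decides
        }
    }
    return 0;
}

// $x > $y is evaluated as $y < $x, so the loop runs over the right-hand operand.
function array_gt_sketch(array $x, array $y): bool
{
    return array_cmp_sketch($y, $x) < 0;
}

$a = [0 => 2, 'test' => 2];
$b = ['test' => 3, 0 => 1];
var_dump(array_gt_sketch($a, $b));        // bool(false): loops over $b, 'test' decides
var_dump(array_gt_sketch($b, $a));        // bool(false): loops over $a, key 0 decides
var_dump(array_cmp_sketch($a, $b) === 0); // bool(false): not equal either
```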
I think the elements are compared one by one, so $a[0] > $b[0] but $a['test'] < $b['test'].
You cannot say which array is bigger.

php associative array values always set?

$test['test'] = 'test';
if(isset($test['test']['x']))
return $test['test']['x'];
This statement returns the first character of the string in $test['test'] (in this case 't'), no matter what is specified as dimension 2.
I can't wrap my head around this behavior. I use isset() all the time. Please advise.
This happens because you're not indexing an array, you're indexing a string. Strings are not arrays in PHP. They happen to share a concept of indexes with arrays, but are really character sequences even though there is no distinct char data type in PHP.
In this case, since strings are only indexed numerically, 'x' is converted to an integer, which results in 0, so PHP looks up $test['test'][0]. (Since PHP 5.4, isset() itself returns false for a non-numeric string offset such as 'x'.) Additionally, $test is only a one-dimensional array, assuming 'test' is the only key inside.
Not really relevant to your question, but if you try something like this you should get 'e', because when converting '1x' to an integer, PHP keeps the leading digits and drops the first non-digit character and everything after it:
// This actually returns $test['test'][1]
return $test['test']['1x'];
If you're looking for a second dimension of the $test array, $test['test'] itself needs to be an array. This will work as expected:
$test['test'] = array('x' => 'test');
if (isset($test['test']['x']))
return $test['test']['x'];
Of course, if your array may contain NULL values, or you want to make sure you're checking an array, use array_key_exists() instead of isset(), as sirlancelot suggests. It's sliiiiightly slower, but it doesn't trip on NULL values, and it looks up actual array keys rather than indexing into other types such as strings and objects.
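A sketch of the difference, assuming PHP 5.4 or later (where isset() returns false for a non-numeric offset on a string); has_nested_key() is a hypothetical helper, not a built-in:

```php
<?php
$test['test'] = 'test';

// On a string, a non-numeric offset is not a key: isset() reports false.
var_dump(isset($test['test']['x']));  // bool(false)
var_dump(isset($test['test'][0]));    // bool(true): 't' exists at offset 0

// Hypothetical helper: checks for a real nested array key, not a string offset.
function has_nested_key(array $arr, $outer, $inner): bool
{
    return array_key_exists($outer, $arr)
        && is_array($arr[$outer])
        && array_key_exists($inner, $arr[$outer]);
}

$withNull = ['test' => ['x' => null]];
var_dump(isset($withNull['test']['x']));          // bool(false): the value is null
var_dump(has_nested_key($withNull, 'test', 'x')); // bool(true)
var_dump(has_nested_key($test, 'test', 'x'));     // bool(false): 'test' holds a string
```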
Use array_key_exists for testing array keys.
It's returning 't' because strings can be indexed like arrays, and 'x' evaluates to 0, which is the offset of the first character in the string.
