How does PHP keep track of order in an associative array? - php

When pushing a new value onto an indexed array
$array[] = 'new value';
the PHP documentation explains how it gets added in the [MAX_INDEX+1] position.
When pushing a new value onto an associative array
$array['key'] = 'new value';
it works the same, but I don't see any explanation in the documentation to confirm how or why it does so. The order seems to be consistent in my implementation, but how do I know for sure that the order will remain the same? Does anyone know how PHP implements this on the back-end?

All PHP arrays, numeric and associative, are implemented as a so-called "ordered hash table". This is a computer science term which amounts to: "A reasonably fast key-value store that keeps track of the order in which keys and values were inserted". In other words, PHP arrays have a bit of memory bolted on for the purpose of remembering order. Every time you put something in it, PHP automatically records the order as well.
Interestingly, this happens for numeric keys as well, so if you put the values 1, 2, 3, 4, 5 into a PHP array, PHP is still separately keeping track of the order. If this sounds wasteful, that's because it is! It does, however, save brain cycles that can be used to solve other people's problems, real and imagined.
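
As a rough illustration of the point above (a minimal sketch, assuming PHP 7 or later; the variable names and data are made up), iteration follows insertion order for string keys, and overwriting an existing key keeps its original position:

```php
<?php
// Hypothetical data; only the order of insertion matters here.
$pets = [];
$pets['zebra'] = 'striped';
$pets['cat']   = 'small';
$pets['apple'] = 'not a pet';

// Overwriting an existing key changes the value but not its position.
$pets['zebra'] = 'still first';

// array_keys() walks the array in its internal (insertion) order.
$keys = array_keys($pets);
echo implode(', ', $keys), "\n"; // zebra, cat, apple
```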

MAX_INDEX actually has nothing to do with ordering.
You can do
$array[5] = 'new value';
$array[1] = 'new value';
$array[105] = 'new value';
$array[2] = 'new value';
and this array will keep that order as well.
A PHP array is an ordered map: a map that remembers the order in which its elements were added.
That's all.
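
A small sketch of the same idea, assuming PHP 7 or later (the values are placeholders): the integer keys keep the order in which they were assigned, and a keyless push only uses the largest integer key to pick the new key, not to decide position.

```php
<?php
$array = [];
$array[5]   = 'a';
$array[1]   = 'b';
$array[105] = 'c';
$array[2]   = 'd';

// Pushing without a key uses the largest integer key so far plus one,
// and the new element is appended at the end of the order.
$array[] = 'e';

$keys = array_keys($array);
echo implode(', ', $keys), "\n"; // 5, 1, 105, 2, 106
```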

How are associative arrays implemented in PHP? might give you some insight.
It seems that PHP arrays are essentially ordered hash tables, so the order of the array will stay the same until you reorder it (e.g. by sorting the array).
EDIT: It appears this is getting downvoted, allow me to explicitly include the sources I linked to in the comment below here...
"PHP associative arrays are in fact an implementation of HashTables", from
How is the PHP array implemented on the C level?
Also from that source: "The PHP array is a chained hash table (lookup of O(c) and O(n) on key collisions) that allows for int and string keys. It uses 2 different hashing algorithms to fit the two types into the same hash key space."
"Everything is a HashTable" from http://nikic.github.io/2012/03/28/Understanding-PHPs-internal-array-implementation.html

I prefer to rely on ksort. In my experience, arrays stay consistent until you start removing elements. Better to manually sort them and know they're in the order you want.
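
For what it's worth, here is a minimal sketch (assuming PHP 7 or later, with made-up keys) of what removing elements does to the order: the remaining elements keep their relative order, a key that is removed and added again goes to the end, and ksort() restores key order.

```php
<?php
$letters = ['a' => 1, 'b' => 2, 'c' => 3];

// Removing an element leaves the others in their existing order.
unset($letters['a']);
$afterUnset = array_keys($letters);  // b, c

// A key that is added again counts as a new insertion, so it goes last.
$letters['a'] = 4;
$afterReadd = array_keys($letters);  // b, c, a

// ksort() reorders the array in place by key.
ksort($letters);
$afterSort = array_keys($letters);   // a, b, c
```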

Related

PHP is ksort() necessary if using array values only individually?

A simple question today!
I've ended up defining the values in my array non-sequentially; in other words, 2 => 'marmosets' is defined before 0 => 'cats' and 1 => 'dogs'. My understanding is that the keys will be assigned properly (i.e. the value marmosets will indeed be at key 2, not key 0, even though it is defined first), but that my array will be 'out of order' such that a print_r() would output:
2 => marmosets
0 => cats
1 => dogs
And that if I want to put them in numerical order by key, ksort() will do that job.
(a) Is my understanding correct?
(b) If I'm only using these values individually, and never need to output the list, is there any harm/impact in skipping the ksort() and leaving them "out of order"?
(a) Yes and (b) No.
a) PHP's arrays are ordered maps. The default order will stay insertion order until you change that, e.g. by sorting.
b) If you never do anything that relies on any order, e.g. just accessing data by keys, the order is irrelevant, so there's no harm.
Printing the array will indeed print it in the order you created it with, whether those keys are numeric or associative. This can be proven simply by testing your example. There is no harm in skipping ksort if you do not rely on the actual order of the array. However, it does not hurt to use ksort either. Unless you are dealing with huge amounts of data, sorting the array once in your application will have no noticeable effect on performance.
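
The example from the question can be checked with a short sketch like this one (assuming PHP 7 or later; the variable names are illustrative only):

```php
<?php
$animals = [];
$animals[2] = 'marmosets';
$animals[0] = 'cats';
$animals[1] = 'dogs';

$before = array_keys($animals);  // insertion order: 2, 0, 1
$lookup = $animals[2];           // access by key ignores position

ksort($animals);                 // reorders the array in place by key
$after = array_keys($animals);   // 0, 1, 2

echo implode(', ', $before), ' -> ', implode(', ', $after), "\n";
```

Access by key gives the same result before and after the ksort() call, which is why skipping it is harmless when the order is never used.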

PHP Can the order of array elements (as initialized) be counted on as a behavior?

I'm using numeric keys that are part of my data, if I can count on the order as initialized my solution is easier, friendlier to read, and cleaner code!
Probably obvious but: Between array initialization and the foreach() outputting the data no other array functions will be touching the array.
PHP arrays are implemented as hashes. Even for numeric keys, the keys actually exist and values are associated with them, unlike lists or sets in other languages. You can count on the order to never change on its own, because that would mean actually changing the values associated with the (numeric) keys.
You can count on it. PHP only changes the order after a sort() or similar function call.
You could have found out by var_dump()ing the array yourself, by the way.
If you are asking if:
array("a","b","c")
will always put a into key 0, b into key 1, and c into key 2, then yes, it can be counted on (hence the name array).
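
A quick way to check this yourself (a minimal sketch, assuming PHP 7 or later). Note that the automatically assigned keys start at 0:

```php
<?php
$letters = array("a", "b", "c");

// Keys are assigned automatically, starting from 0, in the order written.
$keys = array_keys($letters);

foreach ($letters as $key => $value) {
    echo "$key => $value\n";
}
```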

How do I sort while still keeping the same array indices?

The index is automatic, the indexes are 0, 1, 2, etc. nothing special.
But the key to the database is auto-incremented. meaning the index directly correlates to the key of the database. So after the sort, I can do a query and retrieve the info on that row with that key.
But I think that once you sort(), it changes the index, therefore I would lose that correlation between the index and the key of the database therefore making the array quite useless...so my question is how do I keep that correlation while still sorting so that the highest value goes to the top?
And then after that's successful, how do I sort through an array like that?
Thanks so much for your help and patience,
Binny
I think what you want is asort() or arsort(). According to the documentation for asort(),
"This function sorts an array such that array indices maintain their correlation with the array elements they are associated with. This is used mainly when sorting associative arrays where the actual element order is significant."
arsort() sorts in the reverse direction.
Use asort: http://www.php.net/manual/en/function.asort.php
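
A minimal sketch of the difference, assuming PHP 7 or later and made-up scores keyed by a hypothetical database row id: sort() renumbers the keys, while arsort() puts the highest value first and keeps each key attached to its value.

```php
<?php
// Hypothetical scores; the keys stand in for database row ids.
$scores = [0 => 40, 1 => 95, 2 => 70];

$plain = $scores;
sort($plain);    // values ascending, keys renumbered from 0

$kept = $scores;
arsort($kept);   // values descending, original keys preserved

// Iterating the sorted array still gives the original id for each value.
foreach ($kept as $id => $score) {
    echo "row $id scored $score\n";
}
```

After arsort(), a foreach loop like the one above is enough to walk the array from highest to lowest while still knowing which row each value came from.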

What is php's performance on directly accessing array's row with key specified

I have a question regarding array performance: how does PHP handle array keys? I mean, if I do something like $my_city = $cities[15]; does PHP directly access the exact row item in the $cities array, or does PHP iterate through the array until it finds the matching row?
And if it accesses the row directly, is there a difference in performance between an array with 100 rows and an array with 100,000 rows?
like in this example
$my_city = $cities[15];
PHP's arrays are implemented as hash tables, so the elements are accessed as directly as possible, without iterating through everything. Read more about the algorithm here: http://en.wikipedia.org/wiki/Hash_table
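It accesses it directly. Behind the scenes the key is hashed and the rest is pointer arithmetic, so a lookup does not walk the array. In that respect it behaves much like indexed access in other languages such as C, Java or C#, even though PHP's arrays are hash tables rather than contiguous blocks of memory.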
It depends on what is inside the array too.
// Objects are assigned by handle: $my_city and $cities[15] refer to the same object
$cities[15] = new stdClass();
$my_city = $cities[15];
// Strings are assigned by value: $my_city behaves as a copy of the array row
$cities[15] = 'foobar';
$my_city = $cities[15];
This may not be a direct answer to your question, but a reference to an object usually uses less memory than a variable copy.
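
To make the distinction concrete, here is a minimal sketch (assuming PHP 7 or later; the property name and city names are invented for illustration). A change made through the object variable is visible in the array, while a change to the string variable is not:

```php
<?php
$cities = [];

// Object: the array slot and $my_city share one object.
$cities[15] = new stdClass();
$cities[15]->name = 'Oslo';
$my_city = $cities[15];
$my_city->name = 'Bergen';      // also visible through $cities[15]

// String: the assignment behaves as a copy.
$cities[16] = 'foobar';
$copy = $cities[16];
$copy = 'changed';              // $cities[16] is unaffected

echo $cities[15]->name, ' / ', $cities[16], "\n";
```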
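There is an obvious difference in memory use and in the time needed to loop over everything between an array with 100 rows and an array with 100,000 rows, but a single lookup by key takes roughly the same time in both.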

Hash tables VS associative arrays

Recently I have read about hash-tables in a very famous book "Introduction to Algorithms". I haven't used them in any real applications yet, but want to. But I don't know how to start.
Can anyone give me some samples of using it, for example, how to realize a dictionary application (like ABBYY Lingvo) using hash-tables?
And finally I would like to know what is the difference between hash-tables and associative arrays in PHP, I mean which technology should I use and in which situations?
If I am wrong (I beg pardon) please correct me, because actually I am starting with hash-tables and I have just basic (theoretical) knowledge about them.
Thanks a lot.
In PHP, associative arrays are implemented as hashtables, with a bit of extra functionality.
However technically speaking, an associative array is not identical to a hashtable - it's simply implemented in part with a hashtable behind the scenes. Because most of its implementation is a hashtable, it can do everything a hashtable can - but it can do more, too.
For example, you can loop through an associative array in insertion order using a foreach loop, which a plain hashtable does not guarantee.
So while they're similar, an associative array can actually do a superset of what a hashtable can do - so they're not exactly the same thing. Think of it as hashtables plus extra functionality.
Code examples:
Using an associative array as a hashtable:
$favoriteColor = array();
$favoriteColor['bob']='blue';
$favoriteColor['Peter']='red';
$favoriteColor['Sally']='pink';
echo 'bob likes: '.$favoriteColor['bob']."\n";
echo 'Sally likes: '.$favoriteColor['Sally']."\n";
//output: bob likes: blue
//        Sally likes: pink
Looping through an associative array:
$idTable=array();
$idTable['Tyler']=1;
$idTable['Bill']=20;
$idTable['Marc']=4;
//up until here, we're using the array as a hashtable.
//now we loop through the array in insertion order - a plain hashtable does not guarantee this:
foreach ($idTable as $person => $id) {
    echo 'id: '.$id.' | person: '.$person."\n";
}
//output: id: 1 | person: Tyler
// id: 20 | person: Bill
// id: 4 | person: Marc
Note especially how in the second example, the order of each element is maintained (Tyler, Bill, Marc) based on the order in which they were entered into the array. This is a major difference between associative arrays and hashtables. A hashtable maintains no connection between the items it holds, whereas a PHP associative array does (you can even sort a PHP associative array).
PHP arrays ARE basically hash tables.
The difference between an associative array and a hash table is that an associative array is a data type, while a hash table is a data implementation. Obviously the associative array type is very important in many current programming languages: Perl, Python, PHP, etc. A hash table is the main way to implement an associative array, but not quite the only way. And associative arrays are the main use of hash tables, but not quite the only use. So it's not that they are the same, but if you already have associative arrays, then you usually shouldn't worry about the difference.
For performance reasons, it can be important to know that your associative arrays in your favorite language are implemented as hashes. And it can be important to have some idea of the overhead cost of that implementation. Hash tables are slower and use more memory than linear arrays as you see them in C.
Perl lumps the two concepts together by calling associative arrays "hashes". Like a number of features of Perl, it isn't quite wrong, but it's sloppy.
An array in PHP is actually an ordered map, not a plain hashtable. The main difference is that a plain hashtable cannot remember the order in which elements were added. On the other hand, hashtables are faster than tree-based maps: fetching an element from a tree-based map is O(log n), while fetching from a hashtable is O(1) on average.
An associative array is an array where you don't access elements by an index, but by a key. How this works internally is implementation specific (there is no rule how it must work). An associative array could be implemented by a hash table (most implementations will do that), but it could also be implemented by some sort of tree structure or a skip list or the algorithm just iterates over all elements in the array and looks for a key that matches (this would be awfully slow, but it works).
A hash table is a way to store data where values are associated with keys and where you intend to find the value for a key within (usually almost) constant time. This sounds exactly like what you expect of an associative array, which is why hash tables are used to implement those arrays most of the time, but that is not mandatory.
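
To illustrate the "data type versus implementation" point, here is a minimal sketch, assuming PHP 7.4 or later. LinearMap is a hypothetical class written only for this example, not something PHP provides: it offers the same get/set behaviour as an associative array but finds keys by scanning a list, which is the slow alternative mentioned above.

```php
<?php
// Hypothetical class: an associative array backed by a plain list of pairs.
final class LinearMap
{
    /** @var array<int, array{0: string, 1: mixed}> list of [key, value] pairs */
    private array $pairs = [];

    public function set(string $key, $value): void
    {
        // Scan every pair; overwrite the value if the key already exists.
        foreach ($this->pairs as $i => $pair) {
            if ($pair[0] === $key) {
                $this->pairs[$i][1] = $value;
                return;
            }
        }
        $this->pairs[] = [$key, $value];
    }

    public function get(string $key)
    {
        // Linear search: the cost grows with the number of stored pairs.
        foreach ($this->pairs as $pair) {
            if ($pair[0] === $key) {
                return $pair[1];
            }
        }
        return null; // key not found
    }
}

$slow = new LinearMap();
$slow->set('bob', 'blue');
$slow->set('Sally', 'pink');
$slow->set('bob', 'green');

// The built-in array gives the same answers through a hash lookup.
$fast = ['bob' => 'green', 'Sally' => 'pink'];

echo $slow->get('bob'), ' / ', $fast['bob'], "\n";
```

Both structures answer the same questions; only the cost of each lookup differs, which is the sense in which the associative array is the type and the hash table is one way to implement it.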
