Recently I read about hash tables in the well-known book "Introduction to Algorithms". I haven't used them in any real application yet, but I want to, and I don't know how to start.
Can anyone give me some examples of using them, for example, how to implement a dictionary application (like ABBYY Lingvo) using hash tables?
Finally, I would like to know the difference between hash tables and associative arrays in PHP: which one should I use, and in which situations?
If I am wrong, please correct me; I am just starting with hash tables and have only basic (theoretical) knowledge of them.
Thanks a lot.
In PHP, associative arrays are implemented as hashtables, with a bit of extra functionality.
However technically speaking, an associative array is not identical to a hashtable - it's simply implemented in part with a hashtable behind the scenes. Because most of its implementation is a hashtable, it can do everything a hashtable can - but it can do more, too.
For example, you can loop through an associative array with foreach and get the elements back in the order they were inserted, which a plain hashtable does not guarantee.
So while they're similar, an associative array can actually do a superset of what a hashtable can do - so they're not exactly the same thing. Think of it as hashtables plus extra functionality.
Code examples:
Using an associative array as a hashtable:
$favoriteColor = array();
$favoriteColor['bob']='blue';
$favoriteColor['Peter']='red';
$favoriteColor['Sally']='pink';
echo 'bob likes: '.$favoriteColor['bob']."\n";
echo 'Sally likes: '.$favoriteColor['Sally']."\n";
//output: bob likes: blue
//        Sally likes: pink
Looping through an associative array:
$idTable=array();
$idTable['Tyler']=1;
$idTable['Bill']=20;
$idTable['Marc']=4;
//up until here, we're using the array as a hashtable.
//now we loop through the array in insertion order - a plain hashtable does not guarantee this order:
foreach ($idTable as $person => $id) {
    echo 'id: '.$id.' | person: '.$person."\n";
}
//output: id: 1 | person: Tyler
// id: 20 | person: Bill
// id: 4 | person: Marc
Note especially how, in the second example, the order of the elements is maintained (Tyler, Bill, Marc) based on the order in which they were entered into the array. This is a major difference between associative arrays and hashtables. A plain hashtable maintains no ordering between the items it holds, whereas a PHP associative array does (you can even sort a PHP associative array).
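To make the remark about sorting concrete, here is a small sketch using the same $idTable data. The variable names $insertionOrder, $byId and $byName are only illustrative; asort() and ksort() are the standard PHP functions for sorting by value and by key while keeping each key attached to its value.

```php
<?php
// Same data as in the example above.
$idTable = array('Tyler' => 1, 'Bill' => 20, 'Marc' => 4);

// Before any sorting, the keys come back in insertion order.
$insertionOrder = array_keys($idTable);

// asort() sorts by value and keeps each key attached to its value.
$byId = $idTable;
asort($byId);

// ksort() sorts by key instead.
$byName = $idTable;
ksort($byName);

echo implode(', ', $insertionOrder), "\n";     // Tyler, Bill, Marc
echo implode(', ', array_keys($byId)), "\n";   // Tyler, Marc, Bill
echo implode(', ', array_keys($byName)), "\n"; // Bill, Marc, Tyler
```

The sorts work on copies here, so $idTable itself keeps its original insertion order.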
PHP arrays ARE basically hash tables.
The difference between an associative array and a hash table is that an associative array is a data type, while a hash table is a data implementation. Obviously the associative array type is very important in many current programming languages: Perl, Python, PHP, etc. A hash table is the main way to implement an associative array, but not quite the only way. And associative arrays are the main use of hash tables, but not quite the only use. So it's not that they are the same, but if you already have associative arrays, then you usually shouldn't worry about the difference.
For performance reasons, it can be important to know that your associative arrays in your favorite language are implemented as hashes. And it can be important to have some idea of the overhead cost of that implementation. Hash tables are slower and use more memory than linear arrays as you see them in C.
Perl lumps the two concepts together by calling associative arrays "hashes". Like a number of features of Perl, it isn't quite wrong, but it's sloppy.
The PHP manual describes an array as an ordered map, which is more than a plain hashtable. The main difference is that a plain hashtable cannot remember the order in which elements were added, while a PHP array can. Internally, though, a PHP array is still a hash table (with insertion order recorded on top), not a tree. That matters for speed: fetching an element from a tree-based map is O(log n), whereas fetching from a hash table is O(1) on average.
An associative array is an array where you access elements not by an index but by a key. How this works internally is implementation specific (there is no rule for how it must work). An associative array could be implemented with a hash table (most implementations do that), but it could also be implemented with some sort of tree structure or a skip list, or the implementation could simply iterate over all elements and look for a key that matches (this would be awfully slow, but it works).
A hash table is a way to store data where values are associated with keys and where you intend to find the value for a key in (usually almost) constant time. This sounds exactly like what you expect of an associative array, which is why hash tables are used to implement those arrays most of the time, but that is not mandatory.
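Since the original question asked how a dictionary could be built on a hash table, here is a minimal sketch of one with separate chaining. The class name TinyHashTable, its methods, the bucket count and the choice of crc32() as the hash function are all assumptions made for illustration. This is not how PHP implements its arrays, and in real PHP code you would just use an associative array. The buckets are PHP arrays used only as plain lists.

```php
<?php
// Hypothetical teaching class: a chained hash table with string keys.
class TinyHashTable
{
    private $buckets = array();
    private $bucketCount;

    public function __construct($bucketCount = 16)
    {
        $this->bucketCount = $bucketCount;
    }

    // Turn a key into a bucket number. crc32() is an arbitrary choice of
    // hash; the mask keeps the result non-negative on every platform.
    private function bucketFor($key)
    {
        return (crc32($key) & 0x7fffffff) % $this->bucketCount;
    }

    public function set($key, $value)
    {
        $key = (string) $key;
        $b = $this->bucketFor($key);
        if (!isset($this->buckets[$b])) {
            $this->buckets[$b] = array();
        }
        // If the key is already in this bucket, overwrite its value.
        foreach ($this->buckets[$b] as $i => $pair) {
            if ($pair[0] === $key) {
                $this->buckets[$b][$i][1] = $value;
                return;
            }
        }
        // Otherwise append a new (key, value) pair to the chain.
        $this->buckets[$b][] = array($key, $value);
    }

    public function get($key, $default = null)
    {
        $key = (string) $key;
        $b = $this->bucketFor($key);
        if (isset($this->buckets[$b])) {
            // Only this one bucket is scanned, not the whole table.
            foreach ($this->buckets[$b] as $pair) {
                if ($pair[0] === $key) {
                    return $pair[1];
                }
            }
        }
        return $default;
    }
}

// Usage: a toy English-to-French dictionary.
$dict = new TinyHashTable();
$dict->set('hello', 'bonjour');
$dict->set('cat', 'chat');
echo $dict->get('hello'), "\n";                   // bonjour
echo $dict->get('dog', '(not found)'), "\n";      // (not found)
```

With a good hash function and enough buckets, each chain stays short, which is where the "almost constant time" lookup comes from.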
Related
I would like to know if there is a quicker way to add the same key=>value pair to every inner array (2nd level) of a 2 dimensional array, other than using a for loop to cycle through every inner array?
Background
The array in question is a data set created with PDO, so I am unsure how to inject this at the time of creation, as the value is not in the database.
(the first part of this should be comments - but for various reasons pre-pended to answer)
First off, PHP does not have multidimensional arrays - it has nested arrays which can look like multidimensional arrays.
Secondly, what are your criteria for "quicker"? Something which executes faster? Something that takes less time to implement? Something else?
While there are functions which operate on arrays, such as array_map(), and therefore require marginally less code than implementing a loop, they execute no faster than a PHP loop (indeed in some cases slower).
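For comparison, here are both approaches side by side. The $rows data, the 'source' key and the 'import' value are made up for illustration; the rows are shaped like what PDO returns with PDO::FETCH_ASSOC.

```php
<?php
// Hypothetical rows, shaped like a PDO fetchAll(PDO::FETCH_ASSOC) result.
$rows = array(
    array('id' => 1, 'name' => 'Ann'),
    array('id' => 2, 'name' => 'Ben'),
);

// Option 1: a foreach loop by reference changes each inner array in place.
$viaLoop = $rows;
foreach ($viaLoop as &$row) {
    $row['source'] = 'import';
}
unset($row); // drop the reference so later code cannot overwrite the last row

// Option 2: array_map() builds a new outer array from modified copies.
$viaMap = array_map(function ($row) {
    $row['source'] = 'import';
    return $row;
}, $rows);

print_r($viaMap);
```

Both produce the same result; which one reads better is mostly a matter of taste.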
it is not in the database
Why do you think that has anything to do with the problem? You can inject the value in the DML statement. Assuming it's an SQL database and using MySQL syntax:
SELECT mytable.*, 'value' AS `key`
FROM mytable
WHERE $somecondition
I'm using numeric keys that are part of my data. If I can count on the order staying as initialized, my solution is easier, friendlier to read, and cleaner code!
Probably obvious but: Between array initialization and the foreach() outputting the data no other array functions will be touching the array.
PHP arrays are implemented as hashes. Even for numeric keys, the keys actually exist and values are associated with them, unlike lists or sets in other languages. You can count on the order to never change on its own, because that would mean actually changing the values associated with the (numeric) keys.
You can count on it. PHP only changes the order after a sort() or similar function call.
You could have found out by var_dump()ing the array yourself, by the way.
If you are asking if:
array("a","b","c")
will always put a into key 0, b into key 1, and c into key 2, then yes, it can be counted on (hence the name array).
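A quick way to check which keys PHP assigns, and what a sort does to them. The variable names are only illustrative.

```php
<?php
$letters = array("a", "b", "c");

// A list literal without explicit keys is numbered from 0.
$initialKeys = array_keys($letters);   // 0, 1, 2

// Appending takes the next integer key.
$letters[] = "d";                      // stored under key 3

// rsort() reorders the values and renumbers the keys from 0 again.
rsort($letters);

print_r($letters); // d, c, b, a under keys 0..3
```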
I have a question regarding array performance: how does PHP handle array keys? I mean, if I do something like $my_city = $cities[15]; does PHP directly access the exact row in the $cities array, or does it iterate through the array until it finds the matching row?
And if it accesses the row directly, is there a difference in performance between an array with 100 rows and an array with 100,000 rows?
like in this example
$my_city = $cities[15];
PHP's arrays are implemented as hash tables, so the elements are accessed as directly as possible, without iterating through everything. Read more about the algorithm here: http://en.wikipedia.org/wiki/Hash_table
It accesses it directly. Note, however, that this is not the plain pointer arithmetic you get with arrays in C: a PHP array is a hash table, so the key is hashed first to locate the element. The lookup is still constant time on average.
It depends on what is inside the array too.
// $my_city holds a handle to the same object as the array element
$cities[15] = new stdClass();
$my_city = $cities[15];
// $my_city is a separate value (PHP only copies it lazily, on first write)
$cities[15] = 'foobar';
$my_city = $cities[15];
This may not be a direct answer to your question, but a handle to an object usually uses less memory than a copy of a variable.
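The difference between the two cases can be observed directly. In this sketch the property name and the city names are made up for illustration.

```php
<?php
$cities = array();

// Case 1: the element is an object. Assignment copies the handle,
// so both variables refer to the same object.
$cities[15] = new stdClass();
$cities[15]->name = 'Oslo';
$myCity = $cities[15];
$myCity->name = 'Bergen';
echo $cities[15]->name, "\n"; // Bergen

// Case 2: the element is a string. Assignment gives an independent value,
// so changing the copy leaves the array element alone.
$cities[16] = 'foobar';
$myOtherCity = $cities[16];
$myOtherCity .= '!';
echo $cities[16], "\n";       // foobar
```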
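For a single lookup by key there is no meaningful difference between an array with 100 rows and one with 100,000 rows, because a hash table lookup takes constant time on average. The difference shows up in memory use and in any operation that touches every element.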
If I want to use a PHP non-associative array like a dictionary and add a big key, how much memory will PHP allocate?
$myArray = Array();
$myArray[6000] = "string linked to ID 6000";
$myArray[7891] = "another key-value pair";
Will PHP also allocate memory for the unused keys 0-5999 and 6001-7890?
No, PHP doesn't implement this like a C-style array. PHP arrays are associative containers, as the PHP manual page on arrays states:
"An array in PHP is actually an ordered map. A map is a type that associates values to keys."
Order is preserved, but not by a binary search tree: the array is a hash table that additionally records insertion order. Your example above would yield a table with two entries, one for the data at key 6000 and one for key 7891.
It won't allocate memory for indexes 0-5999.
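You can confirm that only the keys you assign exist. This sketch reuses the array from the question; $countBefore and the appended value are added for illustration.

```php
<?php
$myArray = array();
$myArray[6000] = "string linked to ID 6000";
$myArray[7891] = "another key-value pair";

// Only two elements exist; the gaps are simply absent.
$countBefore = count($myArray);
echo $countBefore, "\n";          // 2
var_dump(isset($myArray[0]));     // bool(false)
var_dump(isset($myArray[6000]));  // bool(true)

// Appending continues from the largest integer key used so far.
$myArray[] = "appended";
print_r(array_keys($myArray));    // 6000, 7891, 7892
```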
When pushing a new value onto an indexed array
$array[] = 'new value';
the PHP documentation explains how it gets added in the [MAX_INDEX+1] position.
When pushing a new value onto an associative array
$array['key'] = 'new value';
it works the same, but I don't see any explanation in the documentation to confirm how or why it does so. The order seems to be consistent in my implementation, but how do I know for sure that the order will remain the same? Does anyone know how PHP implements this on the back-end?
All PHP arrays, numeric and associative, are implemented as a so-called "ordered hash table". This is a computer science term which amounts to: "a reasonably fast key-value store that keeps track of the order in which keys and values were inserted". In other words, PHP arrays have a bit of memory bolted on for the purpose of remembering order. Every time you put something in, PHP automatically records the order as well.
Interestingly, this happens for numeric keys as well, so if you put the values 1,2,3,4,5 into a PHP array, PHP is still separately keeping track of the order. If this sounds wasteful, that's because it is! It does, however, save brain cycles that can be used to solve other people's problems, real and imagined.
MAX_INDEX actually has nothing to do with ordering.
you can do
$array[5] = 'new value';
$array[1] = 'new value';
$array[105] = 'new value';
$array[2] = 'new value';
and this array will keep that order as well.
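A short check of that behaviour, also showing where the [MAX_INDEX+1] rule from the question fits in. The values and the helper variables are illustrative.

```php
<?php
$array = array();
$array[5]   = 'first';
$array[1]   = 'second';
$array[105] = 'third';
$array[2]   = 'fourth';

// Keys come back in insertion order, not numeric order.
$keysInOrder = array_keys($array);        // 5, 1, 105, 2

// $array[] picks the key (largest integer key + 1) and, like any
// other insert, places the element at the end.
$array[] = 'fifth';
$keysAfterAppend = array_keys($array);    // 5, 1, 105, 2, 106

// The order only changes when you ask for it, for example with ksort().
$sorted = $array;
ksort($sorted);

echo implode(', ', $keysAfterAppend), "\n";
echo implode(', ', array_keys($sorted)), "\n"; // 1, 2, 5, 105, 106
```

So the maximum index only decides which key a push receives; the position of the element is always the end of the array.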
A PHP array is an ordered map, that is, a map that keeps its order. Array elements simply stay in the order in which they were added. That's all.
How are associative arrays implemented in PHP? might give you some insight.
It seems that PHP arrays are essentially hash tables, so the order of the array will stay the same until you reorder it (e.g. by sorting the array).
EDIT: It appears this is getting downvoted, allow me to explicitly include the sources I linked to in the comment below here...
"PHP associative arrays are in fact an implementation of HashTables", from
How is the PHP array implemented on the C level?
Also from that source: "The PHP array is a chained hash table (lookup of O(c) and O(n) on key collisions) that allows for int and string keys. It uses 2 different hashing algorithms to fit the two types into the same hash key space."
"Everything is a HashTable" from http://nikic.github.io/2012/03/28/Understanding-PHPs-internal-array-implementation.html
I prefer to rely on ksort. In my experience, arrays stay consistent until you start removing elements. Better to manually sort them and know they're in the order you want.