Accessing array better via numeric or associative key?

Accessing array better via numeric or associative key? - php

I iterate over an array of arrays and access the array's value through associative keys, this is a code snippet. Note: i never iterate over the total array but only with a window of 10.
//extract array from a db table (not real code)
$array = $query->executeAndFetchAssociative;
$window_start = 0;
for($i = $window_start; $i<count($array) && $i<$window_start+10; $i++)
echo($entry["db_field"]);
This is a sort of paginator for a web interface. I receive the windows_start value and display hte next 10 values.
A conceptual execution:
Receive the windows_start number
Start the cycle entering the window_start-TH array of the outer array
Display the value of a field of the inner array via associative index
Move to window_start+1
The inner arrays have about 40 fields. The outer array can grow a lot as it rapresent a database table.
Now i see that as the outer array gets bigger the execution over the windows of 10 takes more and more time.
I need some "performance theory" on my code:
If I enter the values of inner arrays via numeric key can I have better performance? In general is quickier accessing the array values with numeric index than accessing with associative index (a string)?
How does it cost entering a random entry ($array[random_num]) of an array of length N ? O(N), O(N/2) just for example
Finally the speed of iterating over an array depends on the array lenght? I mean i always iterate on 10 elements of the array, but how does the array lenght impact on my fixed length iteration?
Thanks
Alberto

If I enter the values of inner arrays via numeric key can I have
better performance? In general is quicker accessing the array values
with numeric index than accessing with associative index (a string)?
There might be a theoretical speed difference for integer-based vs string-based access (it depends on what the hash function for integer values does vs the one for string values, I have not read the PHP source to get a definite answer), but it's certainly going to be negligible.
How does it cost entering a random entry ($array[random_num]) of an
array of length N ? O(N), O(N/2) just for example
Arrays in PHP are implemented through hash tables, which means that insertion is amortized O(1) -- almost all insertions are O(1), but a few may be O(n). By the way, O(n) and O(n/2) are the same thing; you might want to revisit a text on algorithmic complexity.
Finally the speed of iterating over an array depends on the array
length? I mean i always iterate on 10 elements of the array, but how
does the array length impact on my fixed length iteration?
No, array length is not a factor.
The performance drops not because of how you access your array but because of the fact that you seem to be loading all of the records from your database just to process 10 of them.
You should move the paging logic to the database itself by including an offset and a limit in your SQL query.

Premature optimization is the root of all evil. Additional numeric and associative arrays have a very different semantic meaning and are therefore usually not interchangeable. And last but not least: No. Arrays in PHP are implemented as Hashmaps and accessing them by key is always O(1)
In your case (pagination) it's much more usefull to only fetch the items you want to display instead of fetching all and slicing them later. SQL has the LIMIT 10 OFFSET 20-syntax for that.

Related

PHP numeric array index and memory

If I have an array which is using numerical keys and I add a key far outside the range so far, does it create the intermediate keys as well. For example if I have
$array = array(1,2,3,4);
$array[9] = 10;
Will this cause php to internally reserve memory for the keys 4-8 even though they don't have a value with them.
The reason I ask is I have a large 2D array which I want to use for memoization in a dynamic programming algorithm because only a small number of the total cells will need to be computed. However it would be taxing on the memory to have an empty array of that size for the full 2D array. Is there a better way to do 2 key memoization? I could use an associative array and append the keys with a separator or some scheme like that but if php won't make the extra keys I would rather (for simplicity and readability) just use the 2D array. Thoughts?

This may not fully answer your question, but should help in finding an answer, at least to the first question.
Create this program.
$arr = array(1, 2, 3, 4);
sleep(10);
$arr[100000] = 1;
sleep(10);
Now run it and monitor its memory usage.
In the first ten seconds, the program reserves memory for a small array.
In the next ten seconds, if the array reserves space for the unused indices, the memory usage goes ridiculously high compared to the previous one. If it doesn't, though, the memory used will only grow slightly.
This should give you an idea of the effect of your final program, whether or not using a 2D array is a good idea.

Don't worry, it won't make any extra keys. PHP is not like that, even arrays that you think are regular are associative arrays. You can even combine PHP arrays like this:
array(
1 => 121,
2 => 2112,
'stuff' => array('morestuff'),
'foo' => 1231
)
With PHP its very comfortable which can be good and bad also.

It doesn't seem like it will allocate a placeholder or use any memory for the unused keys, based on Doug T.'s response. Hope this helps!

Is there a scenario where array_search is faster than a consecutive array_flip and direct lookup?

Imagine you were to search an array of N elements and perform Y Searches on the array values to find the corresponding keys; you can either do Y array_search's or do one array_flip and Y direct lookups. Why is the first method alot slower than the second method? Is there a scenario where the first method becomes faster than the second one?
You can assume that keys and values are unique

Array keys are hashed, so looking them up just requires calling the hash function and indexing into the hash table. So array_flip() is O(N) and looking up an array key is O(1), so Y searches are O(Y)+O(N).
Array values are not hashed, so searching them requires a linear search. This is O(N), so Y searches are O(N*Y).
Assuming values being searched for are evenly distributed through the array, the average case of linear search has to compare N/2 elements. So array_flip() should take about the time of 2 array_search() calls, since it has to examine N elements.
There's some extra overhead in creating the hash table. However, PHP uses copy-on-write, so it doesn't have to copy the keys or values during array_flip(), so it's not too bad. For a small number of lookups, the first method may be faster. You'd have to benchmark it to find the break-even point.

Understanding PHP Arrays from a Python background

I typically code in Python, Java, or C. I'm taking on a project in PHP and I'm reading up on arrays in PHP and I'm utterly baffled. If I am understanding correctly, the numerical indices in PHP don't necessarily correspond to position and are just keys like in a dict in Python. So, when you shuffle a PHP array, the order of the elements will change, but their keys will remain the same. So when calling array[9], you might actually be getting the first element of the array if the shuffle ordered the elements that way. This raises a bunch of questions:
Is a PHP array, then, always just some kind of ordered hash table?
And what does that mean for overhead? In Python, lists function like
a classic array data structure and dictionaries more along the lines
of a hash structure. PHP seems to combine the two by assigning unique
keys to every value AND keeping track of the order of those values. If I want to use an associative array structure for constant time lookup, am I in a far worse off position than I would be with a Python dictionary because of this ordering overhead? Are there PHP data structures that are ONLY arrays or ONLY hash tables?
What happens when you remove a value from a numbered PHP array? If I
have an array, [1, 2, 3, 4, 5], I remove 4 from the array, and then
try to access array[3], is it going to give me an error, since I
removed the element with the key 3? Or does PHP do some kind of key
adjusting in such a case?
If you change the ordering of an array (i.e., through a sort or a
shuffle), is the only way to have the indices correspond to the
position to copy the array to a new array using array_values().

http://php.net/manual/en/class.splfixedarray.php
This code:
$arr = array(0,1,2,3,4);
unset($arr[3]);
echo $arr[3]; // undefined index warning, execution continues;
echo isset($arr[3]) ? $arr[3] : '';
print_r($arr);
The print_r() outputs:
Array
(
[0] => 0
[1] => 1
[2] => 2
[4] => 4
)
This depends on the function you choose. Some maintain index association, some do not.
Protip:
Never expect two seemingly-similar PHP functions to behave anything like each other. It's the "English" of programming languages: full of crap stolen from other languages and loads of conventions that contradict each other, but everyone speaks it so hop on board the freedom train.
'murca.

sorted php array "introspection"

For some reason, I have a sorted php array:
"$arr_questions" = Array [6]
0 Array [6]
1 Array [6]
2 Array [6]
3 Array [6]
4 Array [6]
5 Array [6]
each of the positions is another array. This time it is associative. See position [0]:
0 = Array [6]
question_id 40
question La tercera pregunta del mundo
explanation
choices Array [3]
correct 0
answer 1
Without looping my array, is there any way to access directly this position 0, just knowing one of its properties?
Example... Imagine I have to change some property of the position of the array whose "question_id" property is 40. That is the only I know. I don't know if the question_id property is gonna be in the first or second or which position. And, for example, imagine I want to change the "answer" property to 2.
How can I access directly to that position without looping the whole array. I mean... I don't want to do this:
foreach ($arr_questions as $question){
if ($question["question_id"] == 40){
$question["answer"] == 2;
}
}

A PHP Array lets you access random values by its id.
It is actually a big deal, because in other languages array indices must be always integers.
However, PHP arrays work mostly like other-languages dictionaries, in which your key can be of other data types, like strings.
By that, if you want to be able to access some question, and you know the ID, then you should have constructed the array by letting your question_id be the index of each array entry.
If you can't do that, don't panic.
In the end you will have to make some kind of search, that's true.
But hey, then you have two cases:
a) Your array is big. Wow, in that case, you should run an optimized sorting algorithm, such as mergesort or quicksort, so that you can order your data quickly and then have them already sorted by your wanted field.
b) Your array is not-so-big. I think in that case it's no big deal, and the sorting can slow your application more than it should, and if you want to be quicker, you should cache the results of sorting the questions (if possible) or refactoring the array construction so it uses your wanted key as the array index.
As a side note, you can't map things avoid wasting some CPU time or some RAM space, and usually you can swap one for the other.
I mean, if you store just one array indexed by question_id then you can look up for question_id's in O(1) + O(array-access) time. If O(array-access) is a constant, then you can get to things in O(1). That means constant time, and it is as fast as it can get.
However, if you need other kind of searches you can end up with O(n * log(n)) or O(n²) time complexity.
But, if you had stored as many arrays as ways to order them you should need, you would need only O(1) time to access each of them. But, you would need O(n) space (where n here is the num of features to have direct access to).
That would increment the time to build the arrays (by a constant).

With your situation it's not possible without a loop, but if you change your array structure to this:
array(
39 => array(...),
40 => array(...)
)
Which 39 and 40 are your question_id, then you can access them so fast without any loop.
If you want or have to to keep that structure, then just write a function to get the array, the associative index and the value you want as parameters to search the array and return the found index, so you will not be forced to write this loop over and over ...

No, there is no way to access that element without looping over your array. You might abstract that search into a helper function, however.

What are the Practical Differences Between "associate" and "indexed" Arrays in PHP?

The PHP array type is actually more akin to an an ordered map than a traditional C array. It's PHP's original general use data structure. The manual goes as far to say The indexed and associative array types are the same type in PHP, which can both contain integer and string indices.
However, there's a lot of cases where built-in language features will make a distinction between "indexed" arrays (arrays with sequential, integer keys) and "associative" arrays (arrays with non-sequential and/or keys of mixed types).
One example of this is the array_merge function.
If the input arrays have the same string keys, then the later value for that key will overwrite the previous one. If, however, the arrays contain numeric keys, the later value will not overwrite the original value, but will be appended.
If only one array is given and the array is numerically indexed, the keys get reindexed in a continuous way.
What are the other places in PHP where a distinction is made between indexed and associative arrays? I'm specifically interested in the Userland differences, although any insight into the Array implementation in PHP's source would be interesting as well.

Actually, any array, no matter if it's indexed or associative, is a hashtable (plus a doubly-linked list for maintaining the order of elements) in PHP. However, in userland PHP code, indexed and associative arrays almost always serve different purposes and sometimes need to be treated in different ways, so several functions like sort/asort make a distinction between them just for convenience.

The most prevalent one that comes to mind is that an indexed array can be looped over using a traditional for loop, whereas an associative one cannot (because it does not have the numeric indexes):
for ($i = 0; $i < count($indexed_array); $i++)
{
// do something with $indexed_array[$i]
}
Of course, php also has a foreach keyword, which works the same on both types.

.. and then there is SplFixedArray, starting from 5.3, it supports only integer indexes, has fixed size and is generally faster than the native arrays.

One interesting difference I've found is when using json_encode.
json_encode(array(0=>0,1=>1,2=>2));
> [0,1,2]
json_encode(array(0=>0,2=>2));
> {"0":0,"2":2}
As a lone example this makes sense, but it's more surprising when combined with, say, array_filter.
$f = function($x) { return $x != 1; };
json_encode(array_filter(array(0,1,2), $f));
> {"0":0,"2":2}
We started off with a numeric array, filtered some elements out, but the resulting json is an associative array!
Note that we can get the desired json by using array_values.
json_encode(array_values(array_filter(array(0,1,2),$f)));
> [0,2]

Pretty much all of the core sorting functions (with all the sort, ksort, asort variations depending on whether you want to maintain key association and so on).

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.