PHP numeric array index and memory

If I have an array that uses numerical keys and I add a key far outside the current range, does PHP create the intermediate keys as well? For example, if I have:
$array = array(1,2,3,4);
$array[9] = 10;
Will this cause PHP to internally reserve memory for keys 4-8, even though they have no values?
The reason I ask is that I have a large 2D array that I want to use for memoization in a dynamic programming algorithm, because only a small number of its cells will ever need to be computed. However, it would be taxing on memory to allocate the full 2D array up front. Is there a better way to do two-key memoization? I could use an associative array and join the two keys with a separator, or some scheme like that, but if PHP won't create the extra keys I would rather (for simplicity and readability) just use the 2D array. Thoughts?

This may not fully answer your question, but should help in finding an answer, at least to the first question.
Create this program.
$arr = array(1, 2, 3, 4);
sleep(10);
$arr[100000] = 1;
sleep(10);
Now run it and monitor its memory usage.
In the first ten seconds, the program reserves memory for a small array.
In the next ten seconds, if the array reserves space for the unused indices, the memory usage goes ridiculously high compared to the previous one. If it doesn't, though, the memory used will only grow slightly.
This should give you an idea of whether using a 2D array in your final program is a good idea.
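The same experiment can also be run without an external monitor by asking PHP itself. A minimal sketch, assuming PHP 7 or later and using memory_get_usage(); the absolute numbers depend on the PHP version and build, so only the size of the jump matters:

```php
<?php
// Measure heap usage before and after adding one far-away integer key.
$arr = [1, 2, 3, 4];           // keys 0..3
$before = memory_get_usage();

$arr[100000] = 1;              // only this one key is added

$after = memory_get_usage();
$growth = $after - $before;

printf("count: %d, growth: %d bytes\n", count($arr), $growth);
```

If PHP reserved slots for keys 4 through 99999, the growth would be on the order of megabytes; a small growth and a count of 5 mean no intermediate slots were created.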

Don't worry, it won't create any extra keys. PHP is not like that: even arrays that look like regular lists are associative arrays. You can even mix integer and string keys in one array, like this:
array(
1 => 121,
2 => 2112,
'stuff' => array('morestuff'),
'foo' => 1231
)
PHP is very flexible here, which can be both good and bad.
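A quick way to confirm this for the example in the question is to inspect the keys after the assignment. A minimal sketch:

```php
<?php
// Integer keys in PHP arrays are just keys: assigning a far-away index
// does not create the keys in between.
$array = [1, 2, 3, 4];        // keys 0..3
$array[9] = 10;               // adds exactly one new key

print_r(array_keys($array));  // keys: 0, 1, 2, 3, 9
var_dump(count($array));      // int(5)
var_dump(isset($array[5]));   // bool(false): key 5 was never created

// Integer and string keys can also be mixed freely in one array.
$mixed = [1 => 121, 2 => 2112, 'stuff' => ['morestuff'], 'foo' => 1231];
var_dump(count($mixed));      // int(4)
```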

It doesn't seem like it will allocate a placeholder or use any memory for the unused keys, based on Doug T.'s response. Hope this helps!
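Since unassigned keys cost nothing, a nested array can serve as a sparse 2D memo table. A minimal sketch, assuming a hypothetical recurrence (counting monotone lattice paths in a grid) as a stand-in for the real dynamic programming algorithm; only cells that are actually computed get stored:

```php
<?php
// Hypothetical example recurrence: number of monotone lattice paths
// from (0, 0) to ($r, $c). $memo is a sparse 2D array: $memo[$r][$c]
// exists only for cells that were actually computed.
function paths(int $r, int $c, array &$memo): int
{
    if ($r === 0 || $c === 0) {
        return 1; // base case: one straight path along an edge
    }
    if (isset($memo[$r][$c])) {
        return $memo[$r][$c]; // cache hit (isset is safe on a missing row)
    }
    return $memo[$r][$c] = paths($r - 1, $c, $memo) + paths($r, $c - 1, $memo);
}

$memo = [];
echo paths(2, 2, $memo), "\n";   // 6
echo paths(10, 10, $memo), "\n"; // 184756
```

The rows of $memo are created on first assignment, so the "2D array" never holds more entries than the number of cells the recursion touches.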

Related

array_keys vs array_flip benchmark request

Sorry for a silly question; it's more of a request.
I've been using array_keys but suddenly swapped it for array_flip and saw no difference. Can somebody benchmark it?
The two functions perform different tasks. Do you need keys exchanged with their associated values? Or do you just need all keys? Use whichever one you need to achieve the task.
There are almost always several routes to the same outcome. But if, for example, you don't need the keys and values exchanged, don't use the function that exchanges them: it will (theoretically, at least) take longer, and it's not the right tool for the task.
If you have huge data sets and/or many iterations over them, so that these things matter to you, then you should have a benchmarking setup to test this scenario and likely others. If not, then as in most applications, such considerations are micro-optimisation, and your choice of function should be the one that fits the task and suits your code base, framework, coding style, etc.
Choosing the function that suits the task also makes your intent clearer. So if you don't need to flip the array, use the function you actually need, even if array_flip() is marginally faster; otherwise a future refactor might leave someone scratching their head over your choice of array_flip().
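For the cases where it does matter, a benchmark does not need much setup. A minimal micro-benchmark sketch, assuming PHP 7.4+ (for hrtime() and arrow functions); the helper name bench() and the test data are illustrative, and results will vary with PHP version, hardware and data shape:

```php
<?php
// Time a callable over many iterations and return milliseconds per call.
function bench(callable $fn, int $iterations = 100): float
{
    $start = hrtime(true);                  // monotonic clock, nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e6 / $iterations;
}

$data = range(0, 9999);                     // 10,000 unique values

printf("array_keys: %.4f ms\n", bench(fn () => array_keys($data)));
printf("array_flip: %.4f ms\n", bench(fn () => array_flip($data)));
```

Using unique values matters here: with duplicates, array_flip() produces a smaller array and the comparison is no longer like for like.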
array_flip
array_flip — Exchanges all keys with their associated values in an array
$input = array("oranges", "apples", "pears");
$flipped = array_flip($input);
print_r($flipped);
Array
(
[oranges] => 0
[apples] => 1
[pears] => 2
)
array_keys
array_keys — Return all the keys or a subset of the keys of an array
$array = array(0 => 100, "color" => "red");
print_r(array_keys($array));
Array
(
[0] => 0
[1] => color
)
Do you need them flipped, or do you just need the keys? If you don't need them flipped, use array_keys(); otherwise use array_flip().
The array_flip() documentation also states:
If a value has several occurrences, the latest key will be used as its value, and all others will be lost.
Maybe this is a problem to you? Maybe it doesn't matter?
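To see that data loss in practice, compare both functions on an array with a repeated value. A minimal sketch:

```php
<?php
$input = ['a' => 'red', 'b' => 'green', 'c' => 'red'];

// array_keys(): every key is returned, nothing is lost.
print_r(array_keys($input));  // a, b, c

// array_flip(): values become keys. 'red' occurs twice, so only the
// last key that had it ('c') survives and 'a' is lost.
print_r(array_flip($input));  // red => c, green => b
```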

Array collisions in php

Small remark
Reading about the max_input_vars setting made me read a lot about PHP's internals for handling arrays. This is not really a question, but rather an answer to my own question: "why do we really need max_input_vars?" It is not specific to PHP; the same issue affects many other programming languages.
A problem:
Compare these two small php scripts:
$data = array();
for ($key = 0; $key <= 1073709056; $key += 32767){
$data[$key] = 0;
}
You can check it here. Everything is normal, nothing unexpected. Execution time is close to 0.
And this one is nearly identical (the only difference is 1 in the loop step):
$data = array();
for ($key = 0; $key <= 1073709056; $key += 32768){
$data[$key] = 0;
}
Check it here. Nothing is normal, everything is unexpected: it exceeds the execution time limit. So it is at least 3000 times slower!
The question is why does it happen?
I posted it here together with an answer because this vastly improved my knowledge about php internals and I learned new things about security.
The problem is not in the loop; it is in how PHP and many other languages (Java, Python, ASP.Net) store key/value pairs in hash data structures. PHP uses a hash table to store arrays, which theoretically makes storing and retrieving data very fast, O(1). The problem arises when several keys are mapped to the same hash bucket, creating hash collisions. Inserting an element into such a bucket becomes more expensive, O(n), and therefore inserting n keys jumps from O(n) to O(n^2).
And this is exactly what happens here. When the step changes from 32767 to 32768, the keys go from no collisions at all to every key colliding in the same bucket.
This is because of the way PHP arrays are implemented in C. The hash table size is always a power of 2 (arrays of 9 and 15 elements are both allocated with size 16). Also, if the array key is an integer, its hash is the integer itself with a mask applied on top of it, where the mask is the table size minus 1 in binary. This means that if someone inserts keys that are all multiples of the table size (for a table of size 32: 0, 32, 64, 128, 256, and so on), they all map to the same bucket, and that bucket degenerates into a linked list. The example above creates exactly this.
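The masking argument can be checked with a simplified model. This sketch assumes bucket = key & (slots - 1) with a power-of-two slot count; the helper bucketsUsed() is hypothetical, and real PHP versions size the table and apply the mask slightly differently, but the collision pattern for multiples of a power of two is the same:

```php
<?php
// Simplified model of integer-key hashing in a power-of-two table:
// count how many distinct buckets $count keys spaced $step apart occupy.
function bucketsUsed(int $step, int $count, int $slots): int
{
    $buckets = [];
    for ($i = 0; $i < $count; $i++) {
        $key = $i * $step;
        $buckets[$key & ($slots - 1)] = true; // mark bucket as occupied
    }
    return count($buckets);
}

$slots = 32768;
echo bucketsUsed(32767, 32768, $slots), "\n"; // 32768: one key per bucket
echo bucketsUsed(32768, 32768, $slots), "\n"; // 1: every key collides
```

With step 32767 each key lands in its own bucket, while with step 32768 every key lands in bucket 0, which is exactly the difference between the two loops in the question.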
This takes a lot of CPU time, hence the huge slowdown. It also means developers should be really careful when accepting external data that will be parsed into an array: people can easily craft such data and DoS the server. This data can come from $_GET and $_POST requests (which is why you can limit the number of inputs with max_input_vars), XML, or JSON.
I don't know anything about PHP specifically, but 32767 is the maximum value of a signed 2-byte integer. Increasing it to 32768 would require a 3-byte number (which is never used, so it would be 4 bytes), which would in turn make everything slower.

Understanding PHP Arrays from a Python background

I typically code in Python, Java, or C. I'm taking on a project in PHP and I'm reading up on arrays in PHP and I'm utterly baffled. If I am understanding correctly, the numerical indices in PHP don't necessarily correspond to position and are just keys like in a dict in Python. So, when you shuffle a PHP array, the order of the elements will change, but their keys will remain the same. So when calling array[9], you might actually be getting the first element of the array if the shuffle ordered the elements that way. This raises a bunch of questions:
Is a PHP array, then, always just some kind of ordered hash table?
And what does that mean for overhead? In Python, lists function like a classic array data structure and dictionaries more along the lines of a hash structure. PHP seems to combine the two by assigning unique keys to every value AND keeping track of the order of those values. If I want to use an associative array structure for constant time lookup, am I in a far worse off position than I would be with a Python dictionary because of this ordering overhead? Are there PHP data structures that are ONLY arrays or ONLY hash tables?
What happens when you remove a value from a numbered PHP array? If I have an array, [1, 2, 3, 4, 5], I remove 4 from the array, and then try to access array[3], is it going to give me an error, since I removed the element with the key 3? Or does PHP do some kind of key adjusting in such a case?
If you change the ordering of an array (i.e., through a sort or a shuffle), is the only way to have the indices correspond to the position to copy the array to a new array using array_values()?
For a structure that is ONLY an array, see SplFixedArray: http://php.net/manual/en/class.splfixedarray.php
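According to the PHP manual, SplFixedArray must be resized manually and only accepts integer indexes within its range. A minimal sketch of that behavior, assuming PHP 7 or later:

```php
<?php
// SplFixedArray: a fixed-size array with integer indexes 0..size-1 only.
$fixed = new SplFixedArray(5);  // 5 slots, all null
$fixed[0] = 'a';
$fixed[4] = 'e';

echo $fixed->getSize(), "\n";   // 5
var_dump(isset($fixed[2]));     // bool(false): slot exists but holds null

try {
    $fixed[5] = 'f';            // outside 0..4
} catch (RuntimeException $e) {
    echo "out of range: ", $e->getMessage(), "\n";
}
```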
This code:
$arr = array(0,1,2,3,4);
unset($arr[3]);
echo $arr[3]; // undefined key: a notice (PHP 7) or warning (PHP 8); execution continues
echo isset($arr[3]) ? $arr[3] : '';
print_r($arr);
The print_r() outputs:
Array
(
[0] => 0
[1] => 1
[2] => 2
[4] => 4
)
This depends on the function you choose. Some maintain index association, some do not.
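For example, sort() and shuffle() renumber the keys, asort() keeps each value's original key, and array_values() renumbers any array explicitly. A minimal sketch:

```php
<?php
$arr = [0, 1, 2, 3, 4];
unset($arr[3]);                  // keys are now 0, 1, 2, 4
$reindexed = array_values($arr); // keys 0, 1, 2, 3 again

$a = [10 => 'c', 20 => 'a', 30 => 'b'];

$sorted = $a;
sort($sorted);                   // discards keys: [0 => 'a', 1 => 'b', 2 => 'c']

$assoc = $a;
asort($assoc);                   // keeps keys: [20 => 'a', 30 => 'b', 10 => 'c']

$shuffled = $a;
shuffle($shuffled);              // also assigns new keys 0..n-1
```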
Protip:
Never expect two seemingly-similar PHP functions to behave anything like each other. It's the "English" of programming languages: full of crap stolen from other languages and loads of conventions that contradict each other, but everyone speaks it so hop on board the freedom train.
'murca.

sorted php array "introspection"

For some reason, I have a sorted php array:
"$arr_questions" = Array [6]
0 Array [6]
1 Array [6]
2 Array [6]
3 Array [6]
4 Array [6]
5 Array [6]
Each of the positions is another array, this time an associative one. See position [0]:
0 = Array [6]
question_id 40
question La tercera pregunta del mundo
explanation
choices Array [3]
correct 0
answer 1
Without looping over my array, is there any way to directly access this position 0, knowing only one of its properties?
Example: imagine I have to change some property of the element whose "question_id" property is 40. That is the only thing I know. I don't know whether the question_id will be in the first, second, or any other position. And, for example, imagine I want to change the "answer" property to 2.
How can I access that position directly without looping over the whole array? I mean... I don't want to do this:
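foreach ($arr_questions as $i => $question){
    if ($question["question_id"] == 40){
        // "=" assigns ("==" only compares); write via the key, since $question is a copy
        $arr_questions[$i]["answer"] = 2;
        break;
    }
}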
A PHP array lets you access arbitrary values by their key.
That is actually a big deal, because in many other languages array indices must always be integers.
However, PHP arrays work mostly like other languages' dictionaries, in which keys can be of other types, such as strings.
So, if you want to be able to access a question when you know its ID, you should have constructed the array with question_id as the index of each entry.
If you can't do that, don't panic.
In the end you will have to make some kind of search, that's true.
But hey, then you have two cases:
a) Your array is big. In that case, you should run an efficient sorting algorithm, such as mergesort or quicksort, so that your data ends up sorted by the field you want and can then be searched quickly (e.g. with a binary search).
b) Your array is not so big. In that case it's no big deal, and sorting may slow your application down more than it helps. If you want it to be quicker, you should cache the results of sorting the questions (if possible) or refactor the array construction so it uses your wanted key as the array index.
As a side note, you can't build such lookups without spending some CPU time or some RAM, and usually you can trade one for the other.
I mean, if you store one array indexed by question_id, then you can look up a question_id in O(1) + O(array-access) time. If O(array-access) is constant, you can get to things in O(1). That means constant time, and it is as fast as it can get.
However, if you need other kinds of searches, you can end up with O(n * log(n)) or O(n²) time complexity.
But if you stored one array for each ordering you need, you would need only O(1) time to access each of them. However, you would need n times the space (where n is the number of fields you want direct access by).
That would also increase the time to build the arrays (by a constant factor).
With your situation it's not possible without a loop, but if you change your array structure to this:
array(
39 => array(...),
40 => array(...)
)
where 39 and 40 are your question_id values, you can access them directly without any loop.
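If the array already exists in the list shape shown in the question, it can be re-keyed once with array_column(), passing null as the column so whole rows are kept. A minimal sketch, using made-up sample rows shaped like $arr_questions:

```php
<?php
// Made-up sample data shaped like $arr_questions in the question.
$arr_questions = [
    ['question_id' => 38, 'answer' => 1],
    ['question_id' => 40, 'answer' => 1],
];

// One O(n) pass builds a copy keyed by question_id; after that every
// lookup is a direct key access.
$byId = array_column($arr_questions, null, 'question_id');

$byId[40]['answer'] = 2;
```

Note that $byId is a separate array: changes to it are not written back to $arr_questions, so it is simplest to build the keyed form once and use it everywhere.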
If you want or have to keep that structure, then just write a function that takes the array, the field name and the value you want as parameters, searches the array, and returns the found index, so you won't be forced to write this loop over and over...
No, there is no way to access that element without looping over your array. You might abstract that search into a helper function, however.
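Such a helper might look like the following sketch; the name findIndexBy() is hypothetical, and it uses loose comparison (==) as in the question, so a numeric string such as "40" coming from a database also matches:

```php
<?php
// Hypothetical helper: returns the key of the first row whose $field
// loosely equals $value, or null if no row matches. The loop is still
// there, just written once.
function findIndexBy(array $rows, string $field, $value)
{
    foreach ($rows as $key => $row) {
        if (isset($row[$field]) && $row[$field] == $value) {
            return $key;
        }
    }
    return null;
}

$arr_questions = [
    ['question_id' => 39, 'answer' => 1],
    ['question_id' => 40, 'answer' => 1],
];

$i = findIndexBy($arr_questions, 'question_id', 40);
if ($i !== null) {
    $arr_questions[$i]['answer'] = 2; // write through the original array
}
```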

Accessing array better via numeric or associative key?

I iterate over an array of arrays and access the inner arrays' values through associative keys; this is a code snippet. Note: I never iterate over the whole array, only over a window of 10.
//extract array from a db table (not real code)
$array = $query->executeAndFetchAssociative;
$window_start = 0;
$total = count($array);
for ($i = $window_start; $i < $total && $i < $window_start + 10; $i++) {
    echo $array[$i]["db_field"]; // read the i-th row ($entry was undefined)
}
This is a sort of paginator for a web interface. I receive the window_start value and display the next 10 values.
A conceptual execution:
Receive the windows_start number
Start the loop at the window_start-th array of the outer array
Display the value of a field of the inner array via associative index
Move to window_start+1
The inner arrays have about 40 fields. The outer array can grow a lot, as it represents a database table.
Now I see that as the outer array gets bigger, processing each window of 10 takes more and more time.
I need some "performance theory" on my code:
If I access the values of the inner arrays via numeric keys, can I get better performance? In general, is it quicker to access array values with a numeric index than with an associative index (a string)?
What is the cost of accessing a random entry ($array[random_num]) of an array of length N? O(N), O(N/2), just for example.
Finally, does the speed of iterating over an array depend on the array length? I mean, I always iterate over 10 elements of the array, but how does the array length affect my fixed-length iteration?
Thanks
Alberto
If I enter the values of inner arrays via numeric key can I have better performance? In general is quicker accessing the array values with numeric index than accessing with associative index (a string)?
There might be a theoretical speed difference for integer-based vs string-based access (it depends on what the hash function for integer values does vs the one for string values, I have not read the PHP source to get a definite answer), but it's certainly going to be negligible.
How does it cost entering a random entry ($array[random_num]) of an array of length N ? O(N), O(N/2) just for example
Arrays in PHP are implemented through hash tables, which means that looking up a key is O(1) on average and insertion is amortized O(1): almost all insertions are O(1), but a few (when the table has to grow) are O(n). By the way, O(n) and O(n/2) are the same thing; you might want to revisit a text on algorithmic complexity.
Finally the speed of iterating over an array depends on the array length? I mean i always iterate on 10 elements of the array, but how does the array length impact on my fixed length iteration?
No, array length is not a factor.
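If the rows really are already in memory, the window can be taken with array_slice(), which stops at the end of the array on its own, so no manual bounds check is needed. A minimal sketch, assuming $array is a list of associative rows as in the question; the sample rows are made up:

```php
<?php
// Made-up sample rows standing in for the database result.
$array = [];
for ($n = 0; $n < 25; $n++) {
    $array[] = ['db_field' => "row $n"];
}

$window_start = 20;
$page_size = 10;

// Takes at most $page_size rows starting at $window_start; near the
// end of the array it simply returns fewer rows.
foreach (array_slice($array, $window_start, $page_size) as $row) {
    echo $row['db_field'], "\n";
}
```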
The performance drops not because of how you access your array but because of the fact that you seem to be loading all of the records from your database just to process 10 of them.
You should move the paging logic to the database itself by including an offset and a limit in your SQL query.
Premature optimization is the root of all evil. Additionally, numeric and associative arrays have very different semantic meanings and are therefore usually not interchangeable. And last but not least: no. Arrays in PHP are implemented as hash maps, and accessing them by key is O(1) on average.
In your case (pagination) it's much more useful to fetch only the items you want to display instead of fetching everything and slicing it later. SQL has the LIMIT 10 OFFSET 20 syntax for that.
