When is foreach with a parameter by reference dangerous? - php

I knew that it can be dangerous to iterate over items by reference in foreach.
In particular, one must not reuse the variable that was bound by reference, because writing to it affects $array, as in this example:
$array = ['test'];
foreach ($array as &$item){
$item = $item;
}
$item = 'modified';
var_dump($array);
array(1) {
[0]=>
&string(8) "modified"
}
Now this one bit me: the content of the array gets modified inside the function should_not_modify, even though I pass $array by value, not by reference.
function should_not_modify($array){
foreach($array as &$item){
$item = 'modified';
}
}
$array = ['test'];
foreach ($array as &$item){
$item = (string)$item;
}
should_not_modify($array);
var_dump($array);
array(1) {
[0]=>
&string(8) "modified"
}
I'm tempted to go through my whole codebase and insert unset($item); after each foreach($array as &$item).
But since this is a big task and introduces a potentially useless line, I would like to know if there is a simple rule for when foreach($array as &$item) is safe without an unset($item); after it, and when it is not.
Edit for clarification
I think I understand what happens and why. I also know the best countermeasure: foreach($array as &$item){...} unset($item);
I know that this is dangerous after foreach($array as &$item):
reuse the variable $item
pass the array to a function
My question is: are there other dangerous cases, and can we build an exhaustive list of them? Or, the other way round: is it possible to describe when it is not dangerous?

About foreach
First of all, some (maybe obvious) clarifications about two behaviors of PHP:
foreach($array as $item) leaves $item defined after the loop as an ordinary variable holding the value of the last element. If the variable is a reference, as in foreach($array as &$item), it keeps "pointing" to the last element of the array even after the loop.
When a variable is a reference, an assignment such as $item = 'foo'; changes whatever the reference points to, not the variable ($item) itself. This is also true for a subsequent foreach($array2 as $item), which treats $item as a reference if it was created as one and therefore modifies whatever the reference points to (in this case, the last element of the array used in the previous foreach).
This is obviously very error-prone, which is why you should always unset the reference used in a foreach, so that later writes do not modify the last element (as in example #10 of the documentation for the array type).
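A minimal sketch of both points above, assuming PHP 7 or later: after a by-reference loop, a later by-value foreach over a different array writes through $item into the last element of the first array, and calling unset($item) right after the first loop prevents it. The function names without_unset() and with_unset() are hypothetical, chosen only for the illustration.

```php
<?php
// Hypothetical demo: the second loop runs while $item is still a reference.
function without_unset(): array
{
    $array = [1, 2, 3];
    foreach ($array as &$item) {
        $item *= 10;              // intended result: [10, 20, 30]
    }
    // $item is still a reference to $array[2] here.
    foreach (['a', 'b'] as $item) {
        // Each iteration assigns through the reference into $array[2].
    }
    return $array;
}

// Same code, but the reference is destroyed right after the first loop.
function with_unset(): array
{
    $array = [1, 2, 3];
    foreach ($array as &$item) {
        $item *= 10;
    }
    unset($item);                 // destroys the reference, not the element
    foreach (['a', 'b'] as $item) {
        // $item is now an ordinary variable.
    }
    return $array;
}

var_dump(without_unset()); // last element ends up as 'b'
var_dump(with_unset());    // 10, 20, 30
```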
About the function that modifies the array
It's worth noting that, as pointed out in a comment by @iainn, the behavior in your example has nothing to do with foreach. The mere existence of a reference to an element of the array allows that element to be modified. Example:
function should_not_modify($array){
$array[0] = 'modified';
$array[1] = 'modified2';
}
$array = ['test', 'test2'];
$item = & $array[0];
should_not_modify($array);
var_dump($array);
Will output:
array(2) {
[0] =>
string(8) "modified"
[1] =>
string(5) "test2"
}
This is admittedly very surprising, but it is explained in the PHP documentation "What References Do":
Note, however, that references inside arrays are potentially dangerous. Doing a normal (not by reference) assignment with a reference on the right side does not turn the left side into a reference, but references inside arrays are preserved in these normal assignments. This also applies to function calls where the array is passed by value. [...] In other words, the reference behavior of arrays is defined in an element-by-element basis; the reference behavior of individual elements is dissociated from the reference status of the array container.
With the following example (copy/pasted):
/* Assignment of array variables */
$arr = array(1);
$a =& $arr[0]; //$a and $arr[0] are in the same reference set
$arr2 = $arr; //not an assignment-by-reference!
$arr2[0]++;
/* $a == 2, $arr == array(2) */
/* The contents of $arr are changed even though it's not a reference! */
It's important to understand that when you create a reference, for example $a = &$b, $a and $b become equal: $a is not pointing to $b or vice versa; both point to the same place.
So when you do $item = & $array[0]; you actually make $array[0] point to the same place as $item. Since $item is a global variable and references inside arrays are preserved, modifying $array[0] from anywhere (even from within the function) modifies it globally.
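A sketch of the two situations side by side, assuming PHP 7 or later. should_not_modify() is the function from the example above; $clean and $ref are illustrative names. The second case relies on the fact that once the only other holder of a reference is gone, copying the array stores a plain value again.

```php
<?php
// The function from the example above: it receives its argument by value.
function should_not_modify(array $array): void
{
    $array[0] = 'modified';   // element 0 may still be shared via a reference
    $array[1] = 'modified2';  // element 1 is a plain value: only the copy changes
}

// Case 1: a live reference to $array[0] exists when the copy is made.
$array = ['test', 'test2'];
$item = &$array[0];
should_not_modify($array);
// $array[0] is now 'modified', $array[1] is still 'test2'

// Case 2: the reference is destroyed before the call.
$clean = ['test', 'test2'];
$ref = &$clean[0];
unset($ref);              // $clean[0] is no longer shared with anything
should_not_modify($clean);
// $clean is unchanged
```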
Conclusion
Are there other cases that are dangerous, and can we build an exhaustive list of what is dangerous? Or the other way round: is it possible to describe when it is not dangerous?
I'm going to repeat the quote from the PHP doc again: "references inside arrays are potentially dangerous".
So no, it's not possible to describe when it is not dangerous, because it is never safe. It's too easy to forget that $item has been created as a reference (or that a global reference has been created and not destroyed), reuse it elsewhere in your code, and corrupt the array. This has long been a topic of debate (in this bug, for example), and people call it either a bug or a feature...

The accepted answer is the best, but I'd like to add a complement: when is unset($item); not necessary after a foreach($array as &$item)?
$item: if it is never reused afterwards, it cannot do harm.
$array: the last element is a reference. This is always dangerous, for all the reasons already stated.
So what turns that element from a reference back into a plain value?
The most cited: unset($item);
$item going out of scope: when the array is returned from a function, it becomes 'normal' again once the function has returned.
function test(){
$array = [1];
foreach($array as &$item){
$item = $item;
}
var_dump($array);
return $array;
}
$a = test();
var_dump($a);
array(1) {
[0]=>
&int(1)
}
array(1) {
[0]=>
int(1)
}
But beware: if you do anything else before returning, it can bite!
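One way to lean on this scoping rule is to confine each by-reference loop to a small function, so the loop variable dies when the function returns. A minimal sketch, assuming PHP 7 or later; map_in_place() is a hypothetical helper, not a built-in, and array_map() is usually the simpler choice when you do not need in-place modification.

```php
<?php
// Hypothetical helper: applies $fn to every element of $array in place.
// The by-reference loop variable is local, so it is destroyed on return
// and no reference to the last element survives in the caller.
function map_in_place(array &$array, callable $fn): void
{
    foreach ($array as &$item) {
        $item = $fn($item);
    }
    // $item goes out of scope here, taking the reference with it.
}

// The function from the question: receives a copy and writes to it.
function should_not_modify(array $array): void
{
    foreach ($array as &$item) {
        $item = 'modified';
    }
}

$array = ['a', 'b'];
map_in_place($array, 'strtoupper');
should_not_modify($array);
var_dump($array); // still 'A', 'B'
```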

You can break the reference with a JSON encode/decode round trip:
function should_not_modify($array){
$array = json_decode(json_encode($array), true); // true: decode to arrays, not stdClass objects
foreach($array as &$item){
$item = 'modified';
}
}
$array = ['test'];
foreach ($array as &$item){
$item = (string)$item;
}
should_not_modify($array);
var_dump($array);
The question is purely academic, and this is a bit of a hack, but it's sort of fun in a stupid programming way.
And of course it outputs:
array(1) {
[0]=>string(4) "test"
}
As an aside, the same trick works in JavaScript, which can also give you some wonkiness from references.
I wish I had a good example, because I've had some "weird" stuff happen, I mean like some quantum entanglement stuff. This one time at a PHP camp, I had a recursive function (pass by reference) with a foreach (pass by reference), and well, it sort of ripped a hole in the space-time continuum.

Related

Unexpected behavior with references

I have some unexpected behavior with references:
foreach ($this->data as &$row)
{
$row['h'] = 1;
}
foreach ($this->data as $id => $row)
{
// ... in some cases $row[$id] = $row;
}
The result is that the last element of the array is replaced with the second to last element of the array. It is fixed with the following code:
foreach ($this->data as $key => $row)
{
$this->data[$key]['h'] = 1;
}
Unfortunately, I don't have more time to spend on this. Is this a bug in PHP (PHP 5.5.9-1ubuntu4), or something I don't know about references?
There is a perfectly logical explanation and this is not a bug!
PHP 5 introduced the possibility of modifying the contents of an array directly by assigning each element to the iteration variable by reference rather than by value. Consider this code, for example:
$a = array('zero', 'one', 'two');
foreach ($a as &$v) {
}
foreach ($a as $v) {
}
print_r ($a);
It would be natural to think that, since this little script does nothing to the array, it will not affect its contents... but that’s not the case! In fact, the script provides the following output:
Array
(
[0] => zero
[1] => one
[2] => one
)
As you can see, the array has been changed, and the last key now contains the value 'one'. How is that possible? The first foreach loop does not make any change to the array, just as we would expect. However, it does cause $v to be assigned a reference to each of $a's elements, so that, by the time the loop is over, $v is, in fact, a reference to $a[2].
As soon as the second loop starts, $v is assigned the value of each element in turn. However, $v is already a reference to $a[2]; therefore, any value assigned to it is automatically copied into the last element of the array! Thus, $a[2] becomes 'zero' in the first iteration, 'one' in the second, and 'one' again in the third, when it is effectively copied onto itself. To solve this problem, you should always unset the variables you use in your by-reference foreach loops or, better yet, avoid by-reference loops altogether.
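To watch this happen, the sketch below records the array after every iteration of the second loop (PHP 7 or later assumed; $trace exists only for the illustration).

```php
<?php
$a = ['zero', 'one', 'two'];

foreach ($a as &$v) {
    // Does nothing, but leaves $v as a reference to $a[2] afterwards.
}

$trace = [];
foreach ($a as $v) {
    // Every assignment to $v is also an assignment to $a[2].
    $trace[] = implode(',', $a);
}

// $trace: 'zero,one,zero', 'zero,one,one', 'zero,one,one'
print_r($trace);
```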
When looping over an array by reference, you need to manually release the reference at the end of your foreach loop to avoid weird behaviors like this one. So your first foreach should be:
foreach ($this->data as &$row)
{
// ... code ...
}
unset($row);
In this case, unset is only destroying the reference, not the contents referenced by $row.
See the warning in the PHP foreach documentation
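For comparison, a minimal sketch of both fixes side by side (writing through the key, as in the question, and looping by reference followed by unset()), assuming PHP 7 or later. $data and $data2 stand in for $this->data; the sample rows are made up.

```php
<?php
// Fix 1: write through the key, so no reference is ever created.
$data = [['id' => 1], ['id' => 2], ['id' => 3]];
foreach ($data as $key => $row) {
    $data[$key]['h'] = 1;
}

// Fix 2: loop by reference, then break the reference immediately.
$data2 = [['id' => 1], ['id' => 2], ['id' => 3]];
foreach ($data2 as &$row) {
    $row['h'] = 1;
}
unset($row); // only the reference is destroyed; $data2[2] keeps its contents

// A later by-value loop reusing $row is now harmless for both arrays.
foreach ($data as $row) {}
foreach ($data2 as $row) {}
```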

Most efficient way to empty an array

I have an array containing several keys, values, objects, etc. I need to empty that array, but I'd like to do it in the most efficient manner.
The best I can come up with is:
foreach ($array as $key => $val) unset($array[$key]);
But I don't like the idea of having to loop through the array just to empty it. Surely there's a nice, slick way of doing this without wasting memory creating a new array?
Note: I'm not sure myself whether creating the array anew costs extra memory. If it doesn't, then $array = array(); would be a fine way of 'emptying' it.
Just try with:
$array = array();
It highly depends on what you mean.
To empty the current reference you can always do
$array = array();
To completely remove the current instance from the scope
unset($array);
Unfortunately both of these cases don't necessarily mean the memory associated with each element is released.
PHP works with something called "references" for your variables. Your variables are actually labels or references pointing to data, not the actual container for data.
The PHP documentation on garbage collection offers more insight on this subject.
Now take a look at this example, taken from the docs:
$a = "new string";
$c = $b = $a;
xdebug_debug_zval( 'a' );# a: (refcount=3, is_ref=0)='new string'
unset( $b, $c );
xdebug_debug_zval( 'a' );# a: (refcount=1, is_ref=0)='new string'
This unfortunately applies to all your variables, including arrays. Cleaning up the memory associated with the array is a whole different subject, I'm afraid.
I've noticed a longer discussion in the comments regarding using unset() on each individual key.
This feels like extremely bad practice. Consider the following code:
class A{
public $name; // declared explicitly: dynamic properties are deprecated since PHP 8.2
function __construct($name){$this->name=$name;}
function __destruct(){echo $this->name;}
}
$a=array();
$b=array();
$c=array();
for($i=0;$i<5;$i++) {
$a[]=new A('a');
$b[]=new A('b');
$c[]=new A('c');
}
unset($a);
$b=array();
echo PHP_EOL.'done'.PHP_EOL;
This will output:
aaaaabbbbb
done
ccccc
When the reference count of a particular data structure reaches 0, it is cleaned from memory.
Both =array() and unset will do the same thing.
Now, if you don't actually need an array, you can use null:
$array=null;
This keeps the label in memory, but removes the reference it held to any particular data.
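A quick way to see how the variable itself differs after each of the three options above (a sketch; the $after* variables exist only for the illustration). In all three cases the old elements can be freed once nothing else references them.

```php
<?php
// Option 1: reassign an empty array (same as $array = array();).
$array = [1, 2, 3];
$array = [];
$afterAssign = [isset($array), is_array($array), count($array)];

// Option 2: keep the variable but drop the data.
$array = [1, 2, 3];
$array = null;
$afterNull = [isset($array), is_array($array)];

// Option 3: remove the variable from the scope entirely.
$array = [1, 2, 3];
unset($array);
$afterUnset = [isset($array)];
```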
It's simple:
$array = array();
$array will still exist and be of type array (but empty), and your data can be garbage-collected later.
Well... why not: $array = array(); ?
As Suresh Kamrushi pointed out, I could use array_keys:
foreach (array_keys($array) as $key) unset($array[$key]);
This is probably the nicest solution for now, but I'm sure someone will come up with something better soon :)
Try this:
// $array is your original array
$array = array_combine( array_keys( $array ), array_fill( 0, count($array), 0 ) );
The above sets every value to 0 while keeping the keys intact.
Hope this helps.
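If the goal is to keep the keys and reset the values, array_fill_keys() expresses the same thing in one call. A sketch with made-up sample data, compared against the array_combine()/array_fill() version above:

```php
<?php
$array = ['a' => 'x', 'b' => ['y'], 'c' => null];

// Every key kept, every value set to 0.
$zeroed = array_fill_keys(array_keys($array), 0);

// The array_combine() version, for comparison.
$combined = array_combine(array_keys($array), array_fill(0, count($array), 0));
```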

detecting infinite array recursion in PHP?

I've just reworked my recursion detection algorithm in my pet project dump_r()
https://github.com/leeoniya/dump_r.php
Detecting object recursion is not too difficult: you use spl_object_hash() to get the unique internal ID of the object instance, store it in a dict, and compare against it while dumping other nodes.
For array recursion detection, I'm a bit puzzled; I have not found anything helpful. PHP itself is able to identify recursion, though it seems to do it one cycle too late. EDIT: nvm, it occurs where it needs to :)
$arr = array();
$arr[] = array(&$arr);
print_r($arr);
Does it have to resort to keeping track of everything in the recursion stack and doing shallow comparisons against every other array element?
Any help would be appreciated,
thanks!
Because of PHP's call-by-value mechanism, the only solution I see here is to iterate the array by reference and set an arbitrary marker value in it, which you later check for to find out whether you have been there before:
function iterate_array(&$arr){
if(!is_array($arr)){
print $arr;
return;
}
// if this key is present, it means you already walked this array
if(isset($arr['__been_here'])){
print 'RECURSION';
return;
}
$arr['__been_here'] = true;
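foreach($arr as $key => &$value){
// print your values here, or do your stuff
if($key !== '__been_here'){
if(is_array($value)){
iterate_array($value);
} else {
print $value; // only scalars: printing an array would just give "Array"
}
}
}
unset($value); // drop the loop reference to the last element
// you need to unset the marker when done because you're working with a reference...
unset($arr['__been_here']);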
}
You could wrap this function into another function that accepts values instead of references, but then you would get the RECURSION notice from the 2nd level on. I think print_r does the same too.
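The same marker idea can be turned into a yes/no check. A minimal sketch, assuming PHP 7 or later; has_array_recursion() is a hypothetical name, and like iterate_array() it gives a false positive if your data legitimately uses the key '__been_here'. Because each marker is removed once its subtree is done, an array that is merely shared in two places (without a cycle) is not reported.

```php
<?php
// Hypothetical predicate built on the temporary-marker idea from iterate_array().
// Assumption: no real data uses the key '__been_here'.
function has_array_recursion(array &$arr): bool
{
    // Marker already present: we reached this array again through a reference.
    if (array_key_exists('__been_here', $arr)) {
        return true;
    }
    $arr['__been_here'] = true;
    $found = false;
    foreach ($arr as $key => &$value) {
        if ($key !== '__been_here' && is_array($value) && has_array_recursion($value)) {
            $found = true;
            break;
        }
    }
    unset($value);               // drop the loop reference before touching $arr again
    unset($arr['__been_here']);  // remove this level's marker, cycle or not
    return $found;
}

$tree = [1, [2, 3]];
$cyclic = [];
$cyclic[] = [&$cyclic];          // same structure as in the question

$treeHasCycle = has_array_recursion($tree);
$cyclicHasCycle = has_array_recursion($cyclic);
```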
Someone will correct me if I am wrong, but PHP actually detects recursion at the right moment. Your assignment simply creates the additional cycle. The example should be:
$arr = array();
$arr = array(&$arr);
Which will result in
array(1) { [0]=> &array(1) { [0]=> *RECURSION* } }
As expected.
Well, I got a bit curious myself about how to detect recursion, so I started to Google. I found this article http://noteslog.com/post/detecting-recursive-dependencies-in-php-composite-values/ and this solution:
function hasRecursiveDependency($value)
{
//if PHP detects recursion in a $value, then a printed $value
//will contain at least one match for the pattern /\*RECURSION\*/
$printed = print_r($value, true);
$recursionMetaUser = preg_match_all('#\*RECURSION\*#', $printed, $matches);
if ($recursionMetaUser == 0)
{
return false;
}
//if PHP detects recursion in a $value, then a serialized $value
//will contain matches for the pattern /\*RECURSION\*/ never because
//of metadata of the serialized $value, but only because of user data
$serialized = serialize($value);
$recursionUser = preg_match_all('#\*RECURSION\*#', $serialized, $matches);
//all the matches that are user data instead of metadata of the
//printed $value must be ignored
$result = $recursionMetaUser > $recursionUser;
return $result;
}

Return first key of associative array in PHP

I'm trying to obtain the first key of an associative array without creating a temporary variable via array_keys() or the like to pass by reference. Unfortunately, both reset() and array_shift() take the array argument by reference, so neither seems to be a viable option.
With PHP 5.4 I'll be in heaven (array_keys($array)[0];), but unfortunately that is not an option either.
I could create a function to serve the purpose, but I can only imagine there is some concoction of PHP's array_* functions that will produce the desired result in a single statement, that I cannot think of or come up with.
So:
$array = array('foo' => 'bar', 'hello' => 'world');
$firstKey = assorted_functions($array); // $firstKey = 'foo'
The only reason for the "no reference" clause in my question is that I assume array_keys() will be required (if there is a way to do it by passing by reference, please fire away).
I'd use key(), but that requires a reset(), as I'm not sure where the pointer will be at the time of this operation.
Addendum
I'm following up on a realization I had recently: as I mentioned in the comments, it'll use the memory all the same, so if that's a concern, this question hath no solution.
$a = range(0,99999);
var_dump(memory_get_peak_usage()); // int(8644416)
$k = array_keys($a)[0];
var_dump(memory_get_peak_usage()); // int(17168824)
I knew this, as PHP doesn't have such optimization capabilities, but figured it warranted explicit mention.
The brevity of the accepted answer is nice, though, and it will work if you're working with reasonably sized arrays.
Although array_shift(array_keys($array)); will work, current(array_keys($array)); is faster as it doesn't advance the internal pointer.
Either one will work though.
Update
As @TomcatExodus noted, array_shift() expects an array passed by reference, so the first example will raise an "Only variables should be passed by reference" notice. Best to stick with current().
You can use reset and key:
reset( $array );
$first_key = key( $array );
or, you can use a function:
function firstIndex($a) { foreach ($a as $k => $v) return $k; }
$key = firstIndex( $array );
array_shift(array_keys($array))
each(): still requires a temporary, but potentially with a much smaller overhead than array_keys().
What about using array_slice (in combination with array_keys for associative arrays)?
$a = range(0,999999);
var_dump(memory_get_peak_usage());
$k = array_keys(array_slice($a, 0, 1, TRUE))[0];
var_dump(memory_get_peak_usage());
var_dump($k);
$k = array_keys($a)[0];
var_dump(memory_get_peak_usage());
Gives as output (at least with me):
int(36354360)
int(36355112)
int(0)
int(72006024)
int(0)
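For completeness: since PHP 7.3 there is a built-in array_key_first(), which needs no temporary array and does not move the internal pointer. A sketch, with a fallback for older versions that is essentially the firstIndex() approach above; first_key() is a hypothetical name.

```php
<?php
$array = ['foo' => 'bar', 'hello' => 'world'];
$empty = [];

// PHP 7.3+: returns the first key, or null for an empty array.
$firstKey = array_key_first($array);

// Fallback for older versions (hypothetical helper): returns the first key, or null.
function first_key(array $a)
{
    foreach ($a as $k => $unused) {
        return $k;
    }
    return null;
}

$fallback = first_key($array);
$none = first_key($empty);
```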

while(list($key, $value) = each($array)) vs. foreach($array as $key => $value)?

Recently I experienced this weird problem:
while(list($key, $value) = each($array))
was not listing all array values, where replacing it with...
foreach($array as $key => $value)
...worked perfectly.
And now I'm curious: what is the difference between those two?
Had you previously traversed the array? each() remembers its position in the array, so if you don't reset() it you can miss items.
reset($array);
while(list($key, $value) = each($array))
For what it's worth this method of array traversal is ancient and has been superseded by the more idiomatic foreach. I wouldn't use it unless you specifically want to take advantage of its one-item-at-a-time nature.
array each ( array &$array )
Return the current key and value pair from an array and advance the array cursor.
After each() has executed, the array cursor will be left on the next element of the array, or past the last element if it hits the end of the array. You have to use reset() if you want to traverse the array again using each.
(Source: PHP Manual)
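each() was deprecated in PHP 7.2 and removed in PHP 8.0, but the same cursor behavior can be reproduced with key(), current() and next(). A sketch (collect_from_cursor() is a hypothetical helper) showing why a missing reset() makes the loop skip items:

```php
<?php
// Walks $array from wherever its internal pointer currently is.
function collect_from_cursor(array &$array): array
{
    $seen = [];
    while (($key = key($array)) !== null) {   // key() returns null past the end
        $seen[$key] = current($array);
        next($array);
    }
    return $seen;
}

$array = ['a' => 1, 'b' => 2, 'c' => 3];
next($array);                                  // some earlier code moved the cursor
$partial = collect_from_cursor($array);        // starts at 'b', so 'a' is missed

reset($array);                                 // rewind first, as the manual says
$full = collect_from_cursor($array);
```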
Well, one difference is that each() only works on arrays (well, only works right on them). foreach works on any object that implements the Traversable interface, as well as on the built-in array type.
There may be a micro-optimization in the foreach. Basically, foreach is equivalent to the following:
$array->rewind();
while ($array->valid()) {
$key = $array->key();
$value = $array->current();
// Do your code here
$array->next();
}
Whereas each basically does the following:
$return = $array->valid() ? array($array->key(), $array->current()) : false;
$array->next();
return $return;
So three lines are the same for both; they are very similar. There may be some micro-optimization in that each doesn't need to worry about the Traversable interface, but that's going to be minor at best, and it's offset by doing the boolean cast and check in PHP code versus foreach's compiled C. Not to mention that in your while/each code you're calling two language constructs and one function, whereas foreach is a single language construct...
Not to mention that foreach is MUCH more readable IMHO... So, being easier to read and more flexible, foreach is to me the clear winner. (That's not to say that each doesn't have its uses, but personally I've never needed it.)
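The description of foreach above is pseudocode: a plain array has no ->rewind() method, and the engine does not literally call methods when looping over one. As an illustration only, here is the same rewind()/valid()/key()/current()/next() protocol run on a real object, using SPL's ArrayIterator as a stand-in for the array.

```php
<?php
$it = new ArrayIterator(['x' => 10, 'y' => 20]);

// The protocol spelled out by hand.
$manual = [];
$it->rewind();
while ($it->valid()) {
    $manual[$it->key()] = $it->current();
    $it->next();
}

// foreach performs the equivalent steps implicitly (including the rewind).
$viaForeach = [];
foreach ($it as $key => $value) {
    $viaForeach[$key] = $value;
}
```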
Warning! A by-value foreach iterates over a copy of the array, so changes you make to the array during the loop do not affect the iteration. each() still has a purpose and can be very useful if you are doing live edits to an array while looping over its elements and indexes.
// Foreach creates a copy
$array = [
"foo" => ['bar', 'baz'],
"bar" => ['foo'],
"baz" => ['bar'],
"batz" => ['end']
];
// while(list($i, $value) = each($array)) { // Try this next
foreach($array as $i => $value) {
print $i . "\n";
foreach($value as $index) {
unset($array[$index]);
}
}
print_r($array); // array('batz' => ['end'])
Both foreach and while will finish their loops and the array "$array" will be changed. However, the foreach loop didn't change while it was looping - so it still iterated over every element even though we had deleted them.
Update: This answer is not a mistake.
I thought this answer was pretty straightforward, but it appears the majority of users here aren't able to appreciate the specific details I mention here.
Developers that have built applications using libdom (like removing elements) or other intensive map/list/dict filtering can attest to the importance of what I said here.
If you do not understand this answer it will bite you some day.
If you pass each() an object to iterate over, the PHP manual warns that it may have unexpected results.
What exactly is in $array?
