delete variable by reference - php

Given the following code:
$tree = array();
$node =& $tree[];
// imagine tons of code that populates $tree here
How can I entirely delete the ZVAL that $node points to by reference? Is that even possible?
Using unset(), only the reference is destroyed and not the node in $tree itself:
unset($node);
print_r($tree);
// outputs:
Array
(
[0] =>
)
I know this is the expected behaviour of unset($reference) and I also know how the ZVAL refcounter works.
But I really need to delete that node after processing in a specific corner case.
Can I somehow find the correct array index and unset the array element directly, like unset($tree[$node_index])?
Disclaimer: The above example is minified and isolated. Actually, I'm modifying a complex parser for a really ugly nested table data structure that is presented as a stream. The code heavily uses pointers as backreferences and I'd like to avoid refactoring the whole code.

If you grab a reference to an array element and then unset the reference, the array will not be affected at all; that's just how unset works, and this behavior is not negotiable.
What you need to do is remember the key of the element in question and unset directly on the array afterwards:
$tree = array();
$tree[] = 'whatever';
end($tree);
$key = key($tree);
// ...later on...
unset($tree[$key]);
Of course this is extremely ugly, and it requires you to keep both $tree (or a reference to it) and $key around. You can mitigate this somewhat by packaging the unset operation into an anonymous function; if there's a good chance you are going to pull the trigger later, the convenience could offset the additional resource consumption:
$tree = array();
$tree[] = 'whatever';
end($tree);
$key = key($tree);
$killThisItem = function() use(&$tree, $key) { unset($tree[$key]); } ;
// ...later on...
$killThisItem();
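If the key was never recorded, one way to answer the literal question (finding the index that $node refers to) is to write a unique sentinel through the reference and search the array for it. This is a different technique from the one in this answer and is offered only as a sketch: find_reference_key() is a hypothetical helper, not a PHP built-in, it only looks at the top level of the array, and it assumes the reference really points into that array.

```php
<?php
/**
 * Hypothetical helper (not a PHP built-in): returns the key of the first
 * top-level element of $array that shares a reference set with $ref,
 * or null if none does. It temporarily writes a unique sentinel through
 * $ref, looks for that sentinel in the array, then restores the value.
 */
function find_reference_key(array &$array, &$ref)
{
    $saved = $ref;
    $sentinel = new stdClass(); // fresh object: === matches only this instance
    $ref = $sentinel;           // writes through the reference into the array slot
    $found = null;
    foreach ($array as $key => $value) {
        if ($value === $sentinel) {
            $found = $key;
            break;
        }
    }
    $ref = $saved;              // put the original value back
    return $found;
}

$tree = ['first'];
$node =& $tree[];               // the question's pattern: $node aliases $tree[1]
$node = 'populated';
$tree[] = 'last';

$key = find_reference_key($tree, $node);
unset($tree[$key], $node);      // remove the element itself, then the alias
print_r($tree);                 // keys 0 and 2 remain
```

Each lookup is a linear scan, so recording the key when the reference is created is still the cheaper option whenever that is possible.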

Related

When is foreach with a parameter by reference dangerous?

I knew that it can be dangerous to iterate over items by reference in foreach.
In particular, one must not reuse the variable that was bound by reference, because it affects $array, as in this example:
$array = ['test'];
foreach ($array as &$item){
$item = $item;
}
$item = 'modified';
var_dump($array);
array(1) {
[0]=>
&string(8) "modified"
}
Now this is what bit me: the content of the array gets modified inside the function should_not_modify, even though I pass $array by value, not by reference.
function should_not_modify($array){
foreach($array as &$item){
$item = 'modified';
}
}
$array = ['test'];
foreach ($array as &$item){
$item = (string)$item;
}
should_not_modify($array);
var_dump($array);
array(1) {
[0]=>
&string(8) "modified"
}
I'm tempted to go through my whole codebase and insert unset($item); after each foreach($array as &$item).
But since this is a big task and introduces a potentially useless line, I would like to know if there is a simple rule for when foreach($array as &$item) is safe without an unset($item); after it, and when it is not.
Edit for clarification
I think I understand what happens and why. I also know the best way to guard against it: foreach($array as &$item){...} unset($item);
I know that this is dangerous after foreach($array as &$item):
reuse the variable $item
pass the array to a function
My question is: are there other cases that are dangerous, and can we build an exhaustive list of what is dangerous? Or the other way round: is it possible to describe when it is not dangerous?
About foreach
First of all, some (maybe obvious) clarifications about two behaviors of PHP:
foreach($array as $item) leaves $item as an ordinary variable after the loop (holding a copy of the last value). If the variable is a reference, as in foreach($array as &$item), it will still "point" to the last element of the array after the loop.
When a variable is a reference, an assignment such as $item = 'foo'; changes whatever the reference points to, not the variable ($item) itself. This is also true for a subsequent foreach($array2 as $item), which will treat $item as a reference if it was created as one and will therefore overwrite whatever the reference points to (in this case, the last element of the array used in the previous foreach).
Obviously this is very error-prone, which is why you should always unset the reference used in a foreach so that later writes do not modify the last element (as in example #10 of the documentation for the array type).
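A minimal sketch of the leftover-reference problem and the unset() fix (the arrays and the loop variables $v and $w are illustrative, not taken from the question):

```php
<?php
// Without unset(): the second loop writes through the leftover reference.
$buggy = [1, 2, 3];
foreach ($buggy as &$v) {
    // by-reference pass; body intentionally empty
}
foreach ($buggy as $v) {
    // every iteration assigns into $v, which still aliases $buggy[2]
}
// $buggy is now [1, 2, 2]: the last element was overwritten

// With unset(): the alias is broken before the variable is reused.
$fixed = [1, 2, 3];
foreach ($fixed as &$w) {
    // by-reference pass; body intentionally empty
}
unset($w);
foreach ($fixed as $w) {
    // $w is a fresh, ordinary variable here
}
// $fixed is still [1, 2, 3]
```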
About the function that modifies the array
It's worth noting that, as pointed out in a comment by @iainn, the behavior in your example has nothing to do with foreach. The mere existence of a reference to an element of the array will allow this element to be modified. Example:
function should_not_modify($array){
$array[0] = 'modified';
$array[1] = 'modified2';
}
$array = ['test', 'test2'];
$item = & $array[0];
should_not_modify($array);
var_dump($array);
Will output:
array(2) {
[0] =>
string(8) "modified"
[1] =>
string(5) "test2"
}
This is admittedly very surprising, but it is explained in the PHP documentation, "What References Do":
Note, however, that references inside arrays are potentially dangerous. Doing a normal (not by reference) assignment with a reference on the right side does not turn the left side into a reference, but references inside arrays are preserved in these normal assignments. This also applies to function calls where the array is passed by value. [...] In other words, the reference behavior of arrays is defined in an element-by-element basis; the reference behavior of individual elements is dissociated from the reference status of the array container.
With the following example (copy/pasted):
/* Assignment of array variables */
$arr = array(1);
$a =& $arr[0]; //$a and $arr[0] are in the same reference set
$arr2 = $arr; //not an assignment-by-reference!
$arr2[0]++;
/* $a == 2, $arr == array(2) */
/* The contents of $arr are changed even though it's not a reference! */
It's important to understand that when you create a reference, for example $a = &$b, $a and $b are equal: $a is not pointing to $b or vice versa; $a and $b are pointing to the same place.
So when you do $item = & $array[0]; you actually make $array[0] point to the same place as $item. Since $item is a global variable and references inside arrays are preserved, modifying $array[0] from anywhere (even from within the function) modifies it globally.
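A sketch applying this to the question's code, assuming PHP 7 or later: once unset($item) removes the only other alias, the element is no longer shared with anything, and in practice PHP treats a reference held in a single place like a plain value when the array is copied. The by-value copy inside should_not_modify() can then be changed without touching the caller's array.

```php
<?php
// The question's function: receives $array by value.
function should_not_modify(array $array): void
{
    foreach ($array as &$item) {
        $item = 'modified'; // only touches the function's own copy now
    }
}

$array = ['test'];
foreach ($array as &$item) {
    $item = (string) $item;
}
unset($item); // drop the global alias so $array[0] is no longer shared

should_not_modify($array);
var_dump($array); // $array[0] still holds "test"
```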
Conclusion
Are there other cases that are dangerous, and can we build an exhaustive list of what is dangerous? Or the other way round: is it possible to describe when it is not dangerous?
I'm going to repeat the quote from the PHP doc again: "references inside arrays are potentially dangerous".
So no, it's not possible to describe when it is not dangerous, because it is never not dangerous. It's too easy to forget that $item has been created as a reference (or that a global reference has been created and not destroyed), reuse it elsewhere in your code, and corrupt the array. This has long been a topic of debate (in this bug, for example), and people call it either a bug or a feature...
The accepted answer is the best, but I'd like to give a complement: When is unset($item); not necessary after a foreach($array as &$item) ?
$item: if it is never reused after, it cannot harm.
$array: the last element is a reference. This is always dangerous, for all the reasons already stated.
So what changes that element from being a reference back to a value?
the most cited: unset($item);
when $item falls out of scope: if the array is returned from a function, the array becomes 'normal' again once it has been returned.
function test(){
$array = [1];
foreach($array as &$item){
$item = $item;
}
var_dump($array);
return $array;
}
$a = test();
var_dump($a);
array(1) {
[0]=>
&int(1)
}
array(1) {
[0]=>
int(1)
}
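But beware: if you do anything else before returning, it can bite!
You can also break the reference with a JSON encode/decode round trip (note that with false as the second argument of json_decode(), string-keyed arrays come back as stdClass objects):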
function should_not_modify($array){
$array = json_decode(json_encode($array),false);
foreach($array as &$item){
$item = 'modified';
}
}
$array = ['test'];
foreach ($array as &$item){
$item = (string)$item;
}
should_not_modify($array);
var_dump($array);
The question is purely academic, and this is a bit of a hack. But it's sort of fun, in a stupid programming way.
And of course it outputs:
array(1) {
[0]=>
&string(4) "test"
}
As an aside, the same trick works in JavaScript, which can also give you some wonkiness from references.
I wish I had a good example, because I've had some "weird" stuff happen, like quantum-entanglement weird. This one time at a PHP camp, I had a recursive function (pass by reference) with a foreach (pass by reference), and, well, it sort of ripped a hole in the space-time continuum.

Manipulate an array when looping

So $tr['tree'] is an array, and each $dic in it is an array of key/value pairs. I want to add the key source to those arrays. The following code doesn't seem to work as expected; I'm guessing $dic is a copy of each array inside $tr['tree'] rather than the array itself.
foreach($tr['tree'] as $dic){
$dic['source'] = $tr['source']." > ".$dic['name'];
}
Note: I'm coming from Python, where this would work brilliantly. So how would I do this in PHP?
foreach() creates copies of the items you're looping on, so $dic in the loop is detached from the array. If you want to modify the parent array, the safe method is to use:
foreach($array as $key => $value) {
$array[$key] = $new_value;
}
You could use a reference:
foreach($array as &$value) { // note the & before $value
$value = $new_value;
}
but that can lead to stupidly-hard-to-find bugs later. $value will REMAIN a reference after the foreach terminates. If you re-use that variable name later on for other stuff, you'll be modifying the array, because the var still points at it.
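Applied to the question's loop, a sketch with made-up data standing in for $tr; both variants should produce the same result, and the second one ends with unset() so no reference is left behind:

```php
<?php
// Hypothetical input shaped like the question's $tr.
$input = [
    'source' => 'root',
    'tree' => [
        ['name' => 'alpha'],
        ['name' => 'beta'],
    ],
];

// 1) Write back through the key: no reference is created at all.
$tr = $input;
foreach ($tr['tree'] as $key => $dic) {
    $tr['tree'][$key]['source'] = $tr['source'] . ' > ' . $dic['name'];
}

// 2) Iterate by reference, then unset() the loop variable to break the alias.
$tr2 = $input;
foreach ($tr2['tree'] as &$dic) {
    $dic['source'] = $tr2['source'] . ' > ' . $dic['name'];
}
unset($dic);
```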

PHP - How to modify deeply nested associative arrays?

I'm having troubles building a deeply nested associative array in PHP. From the questions/answers I've seen here and there, I gathered I should use references but I just can't figure out how to do so.
I am using PHP 5.3
I'm parsing a file that looks like JSON. It contains nested "sections" enclosed in curly braces and I want to build up a tree representation of the file using nested associative arrays.
I'm starting with a root section and a "current section" variables:
$rootSection = array();
$currentSection = $rootSection;
$sections = array();
When I enter a new section ('{'), this is what I do:
$currentSection[$newSectionName] = array();
array_push($sections, $currentSection);
$currentSection = $currentSection[$newSectionName];
I use the $sections variable to pop out of a section ('}') into its parent one:
$currentSection = array_pop($sections);
And finally, when I want to add a property to my section, I basically do:
$currentSection[$name] = $value;
I've removed all attempt to use references from the above code, as nothing has worked so far...
I might as well say that I am used to Javascript, where references are the default...
But it's apparently not the case with PHP?
I've dumped my variables in my parsing code and I could see that all properties were correctly added to the same array, but the rootSection array or the one pushed inside $sections would not be updated identically.
I've been looking for a way to do this for a few hours now and I really don't get it...
So please share any help/pointers you might have for me!
UPDATE: The solution
Thanks to chrislondon I tried using =& again, and managed to make it work.
Init code:
$rootSection = array();
$currentSection =& $rootSection;
$sections = array();
New section ('{'):
$currentSection[$newSectionName] = array();
$sections[] =& $currentSection;
$currentSection =& $currentSection[$newSectionName];
Exiting a section ('}'):
$currentSection =& $sections[count($sections) - 1];
array_pop($sections);
Note that call-time pass-by-reference, as in array_push($a, &$b);, is deprecated as of PHP 5.3 (it triggers a warning) and was removed in PHP 5.4. $b =& array_pop($a) doesn't give you a reference either, because array_pop() returns a copy; that's why I'm using the []=/[] operators to push/"pop" in my $sections array.
What I initially had problems with was actually this push/pop to my sections stack, I couldn't maintain a reference to the array and was constantly getting a copy.
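A runnable sketch of the reference-stack approach described in this update. The token format (['open', name], ['prop', name, value], ['close']) is a hypothetical stand-in for the real parser's '{', property and '}' events; the reference handling mirrors the init, new-section and exit-section snippets.

```php
<?php
// Hypothetical token stream standing in for the parsed file.
$tokens = [
    ['prop', 'version', '1'],
    ['open', 'server'],
    ['prop', 'host', 'localhost'],
    ['open', 'tls'],
    ['prop', 'enabled', 'yes'],
    ['close'],
    ['prop', 'port', '8080'],
    ['close'],
];

$rootSection = [];
$currentSection =& $rootSection; // alias, not a copy
$sections = [];                  // stack of references to parent sections

foreach ($tokens as $token) {
    switch ($token[0]) {
        case 'open':  // '{'
            $currentSection[$token[1]] = [];
            $sections[] =& $currentSection;                // push a reference to the parent
            $currentSection =& $currentSection[$token[1]]; // descend into the new child
            break;
        case 'prop':
            $currentSection[$token[1]] = $token[2];
            break;
        case 'close': // '}'
            $currentSection =& $sections[count($sections) - 1]; // back to the parent
            array_pop($sections);
            break;
    }
}
unset($currentSection); // drop the alias so later writes cannot touch the tree

print_r($rootSection);
```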
Thanks for your help :)
If you want to pass something by reference use =& like this:
$rootSection = array();
$currentSection =& $rootSection;
$currentSection['foo'] = 'bar';
print_r($rootSection);
// Outputs: Array ( [foo] => bar )
I've also seen the syntax written as $currentSection = &$rootSection; the two forms are functionally the same.

Most efficient way to empty an array

I have an array containing several keys, values, objects, etc. I need to empty that array, but I'd like to do it in the most efficient manner.
The best I can come up with is:
foreach ($array as $key => $val) unset($array[$key]);
But I don't like the idea of having to loop through the array just to empty it. Surely there's a nice, slick way of doing this without wasting memory creating a new array?
Note: I'm not sure myself whether creating a new array costs extra memory. If it doesn't, then $array = array(); would be a fine way of 'emptying' it.
Just try with:
$array = array();
It highly depends on what you mean.
To empty the current reference you can always do
$array = array();
To completely remove the current instance from the scope
unset($array);
Unfortunately both of these cases don't necessarily mean the memory associated with each element is released.
PHP reference-counts the data behind your variables: a variable is a label pointing to data, not the actual container for the data.
The garbage collection chapter of the PHP manual offers more insight on this subject.
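A small sketch of the difference between the two, assuming another variable is bound by reference to the same array (the $alias name is illustrative): assigning array() empties the shared data, while unset() only removes one label.

```php
<?php
$array = ['a' => 1, 'b' => 2];
$alias =& $array;          // both names now refer to the same array

// Assigning an empty array replaces the shared value: $alias sees it too.
$array = [];
$aliasAfterAssign = $alias;

$array = ['a' => 1];       // refill through $array; $alias sees this as well
// unset() only removes the name $array; the data survives through $alias.
unset($array);
```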
Now take a look at this example, taken from the docs:
$a = "new string";
$c = $b = $a;
xdebug_debug_zval( 'a' );# a: (refcount=3, is_ref=0)='new string'
unset( $b, $c );
xdebug_debug_zval( 'a' );# a: (refcount=1, is_ref=0)='new string'
This unfortunately applies to all your variables, including arrays. Cleaning up the memory associated with the array is a whole different subject, I'm afraid.
I've noticed a longer discussion in the comments regarding using unset() on each individual key.
This feels like extremely bad practice. Consider the following code:
class A{
public $name; // declared to avoid the dynamic-property deprecation in PHP 8.2+
function __construct($name){$this->name=$name;}
function __destruct(){echo $this->name;}
}
$a=array();
$b=array();
$c=array();
for($i=0;$i<5;$i++) {
$a[]=new A('a');
$b[]=new A('b');
$c[]=new A('c');
}
unset($a);
$b=array();
echo PHP_EOL.'done'.PHP_EOL;
This will output:
aaaaabbbbb
done
ccccc
When the reference count of a particular data structure reaches 0, it is cleaned from memory.
Both =array() and unset will do the same thing.
Now, if you don't actually need an array, you can use null:
$array=null;
This keeps the label in memory, but removes the reference it held to any particular data.
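A sketch of the refcount point: assigning an empty array only frees the elements that nothing else still refers to. The Probe class is hypothetical and simply counts destructor calls.

```php
<?php
// Hypothetical class that counts how many instances have been destroyed.
class Probe
{
    public static $destroyed = 0;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

$array = [new Probe(), new Probe()];
$keep = $array[0];                // a second reference to the first object

$array = [];                      // the old array is freed...
$afterEmpty = Probe::$destroyed;  // ...but only the unshared object is destroyed

unset($keep);                     // the last reference to the first object goes away
$afterUnset = Probe::$destroyed;
```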
It's simple:
$array = array();
$array will still exist and be of type array (but empty), and your data can be garbage-collected later.
Well... why not: $array = array(); ?
As Suresh Kamrushi pointed out, I could use array_keys:
foreach (array_keys($array) as $key) unset($array[$key]);
This is probably the nicest solution for now.. but I'm sure someone will come up with something better soon :)
Try this:
// $array is your original array
$array = array_combine( array_keys( $array ), array_fill( 0, count($array), 0 ) );
The above sets every value to 0 while keeping the keys intact.
Hope this helps.

while(list($key, $value) = each($array)) vs. foreach($array as $key => $value)?

Recently I experienced this weird problem:
while(list($key, $value) = each($array))
was not listing all array values, where replacing it with...
foreach($array as $key => $value)
...worked perfectly.
Now I'm curious: what is the difference between the two?
Had you previously traversed the array? each() remembers its position in the array, so if you don't reset() it you can miss items.
reset($array);
while(list($key, $value) = each($array))
For what it's worth, this method of array traversal is ancient and has been superseded by the more idiomatic foreach. I wouldn't use it unless you specifically want to take advantage of its one-item-at-a-time nature. (each() was later deprecated in PHP 7.2 and removed in PHP 8.0.)
array each ( array &$array )
Return the current key and value pair from an array and advance the array cursor.
After each() has executed, the array cursor will be left on the next element of the array, or past the last element if it hits the end of the array. You have to use reset() if you want to traverse the array again using each.
(Source: PHP Manual)
Well, one difference is that each() only works on arrays (or at least only works correctly on them). foreach works on arrays and on any object that implements the Traversable interface.
There may be a micro-optimization in the foreach. Basically, foreach is equivalent to the following:
$array->rewind();
while ($array->valid()) {
$key = $array->key();
$value = $array->current();
// Do your code here
$array->next();
}
Whereas each basically does the following:
$return = $array->valid() ? array($array->key(), $array->current()) : false;
$array->next();
return $return;
So three lines are the same for both. They are both very similar. There may be some micro-optimizations in that each doesn't need to worry about the traversable interface... But that's going to be minor at best. But it's also going to be offset by doing the boolean cast and check in php code vs foreach's compiled C... Not to mention that in your while/each code, you're calling two language constructs and one function, whereas with foreach it's a single language construct...
Not to mention that foreach is MUCH more readable IMHO... So easier to read, and more flexible means that -to me- foreach is the clear winner. (that's not to say that each doesn't have its uses, but personally I've never needed it)...
Warning! A by-value foreach iterates over a copy of the array, so changes you make to the array while the loop is running are not seen by the loop. each() still has a purpose and can be very useful if you are doing live edits to an array while looping over its elements and indexes.
// Foreach creates a copy
$array = [
"foo" => ['bar', 'baz'],
"bar" => ['foo'],
"baz" => ['bar'],
"batz" => ['end']
];
// while(list($i, $value) = each($array)) { // Try this next
foreach($array as $i => $value) {
print $i . "\n";
foreach($value as $index) {
unset($array[$index]);
}
}
print_r($array); // array('batz' => ['end'])
Both foreach and while will finish their loops, and $array will be changed. However, the copy that foreach iterates over does not change during the loop, so foreach still visits every element, even the ones we have already deleted.
Update: This answer is not a mistake.
I thought this answer was pretty straightforward, but it appears the majority of users here aren't able to appreciate the specific details I mention here.
Developers that have built applications using libdom (like removing elements) or other intensive map/list/dict filtering can attest to the importance of what I said here.
If you do not understand this answer it will bite you some day.
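Since each() no longer exists in PHP 8, here is a sketch of an equivalent live-edit loop built on the internal-pointer functions reset(), key(), current() and next(). Like each(), it advances the pointer before running the loop body, so elements deleted ahead of the pointer are skipped; the $visited array is only there to record the iteration order.

```php
<?php
$array = [
    "foo" => ['bar', 'baz'],
    "bar" => ['foo'],
    "baz" => ['bar'],
    "batz" => ['end'],
];

$visited = [];
reset($array);
while (($i = key($array)) !== null) { // key() returns null past the end
    $value = current($array);
    next($array);                     // advance first, as each() did
    $visited[] = $i;
    foreach ($value as $index) {
        unset($array[$index]);        // deletions are seen by the loop
    }
}
print_r($visited);
print_r($array);
```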
If you pass each() an object to iterate over, the PHP manual warns that it may produce unexpected results.
What exactly is in $array
