I am building a tree structure in PHP and I could leave it as an array, or turn it into a tree of objects. I think there will be much better performance if I leave it as an array, but I'm not sure.
In the case of an array, the owning object would have a reference to the root element, and that's it. The root element would contain sub-arrays which may, in turn, contain their own sub-arrays.
In the case of objects, my mapper would need to instantiate them on load, and for every child object there would be a reference from its parent. For a 300 node tree, this would mean 299 references, as opposed to 1 when using arrays.
So, it seems to me that the performance would be much better if I use arrays rather than objects. Is this correct? It's important because sacrificing the behaviour of objects will be a considerable trade off in this case.
I'm having a hard time finding clear information on PHP's garbage collection as it relates to objects referencing other objects. Specifically, I'm trying to understand what happens when I have a chain of objects that reference each other, and I destroy or unset() the first object in the chain. Will PHP know to GC the remaining objects in the chain or do I have to destroy them all individually to prevent a memory leak?
For example, say I have a class BinaryTree with a variable rootNode and a Node class that extends BinaryTree and has 3 variables: a parent Node, a leftChild Node, and a rightChild Node. The rootNode will be the Node with a parent that is null, but every other Node created will reference either the rootNode or a subsequent child Node before it as the parent and any Nodes after it as either leftChild or rightChild. The rootNode, therefore, is the only way to reference any of the child nodes.
If I unset() the rootNode, will all other Nodes remain in memory? Would I need to iterate through the entire tree unsetting each Node individually to prevent them from just existing without a way to reference them anymore, or will PHP know that those objects can no longer be referenced and GC them?
So far what I'm reading leads me to believe I would need to iterate the entire list of objects to destroy them individually, but I'm wondering if anyone has any experience with this and can offer some clarity. TIA.
The PHP manual has a section on this:
...a zval container also has an internal reference counting mechanism to optimize memory usage. This second piece of additional information, called "refcount", contains how many variable names (also called symbols) point to this one zval container.
[...]
...when the "refcount" reaches zero, the variable container is removed from memory
[...]
...unsetting a variable removes the symbol, and the reference count of the variable container it points to is decreased by one.
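And so you get a cascade effect. Initially every node of your binary tree has a reference count of 1, but by unsetting the rootNode, the reference count of the root object becomes 0, and the memory used for the root is freed. Freeing it releases the references it holds to other objects, whose reference counts are therefore also reduced, and so on.
So, no, you do not have to unlink the nodes or destroy them yourself. The reference count mechanism will trigger a cascade of memory being freed. One caveat: if each node also points back at its parent, as in your example, parent and child form a reference cycle that reference counting alone never brings to zero. Since PHP 5.3 the cycle collector reclaims such unreachable cycles, so they are still freed, just not immediately on unset().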
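Both cases can be checked with a small script. This is a minimal sketch, assuming PHP 7.4 or later; the Node class and its static $destroyed counter are hypothetical names invented for the illustration, and the destructor exists only to count freed objects.

```php
<?php
// Hypothetical Node class, written only to observe when objects are freed.
final class Node
{
    public static int $destroyed = 0;
    public ?Node $parent = null;
    public ?Node $leftChild = null;
    public ?Node $rightChild = null;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

gc_enable(); // make sure the cycle collector is active for the second case

// Case 1: no parent links, so reference counting alone frees the whole tree.
$root = new Node();
$root->leftChild = new Node();
$root->rightChild = new Node();
unset($root);
$freedWithoutParents = Node::$destroyed;

// Case 2: the child points back at its parent, which creates a cycle.
Node::$destroyed = 0;
$root = new Node();
$root->leftChild = new Node();
$root->leftChild->parent = $root;
unset($root);
$freedBeforeGc = Node::$destroyed; // the cycle keeps both objects alive for now

gc_collect_cycles();               // force a run of the cycle collector
$freedAfterGc = Node::$destroyed;

printf("%d / %d / %d\n", $freedWithoutParents, $freedBeforeGc, $freedAfterGc);
```

In normal use you would not call gc_collect_cycles() yourself; the collector runs on its own once enough candidate objects have accumulated. The explicit call is here only to make the effect visible in a short script.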
I'm currently working on a project which requires me to output only objects when querying the database and/or processing data. I have been hearing from senior developers that objects are "cheap" in terms of memory, and I "kind of" agree with that.
So, my questions:
If I have an array as the result of a query, why not use the array "as is", since it already exists?
Is it really a good practice (besides standardization) to convert it into an object?
Does it really increase performance? (I can't grasp it because by converting the result into an object, I'm basically creating another entity, besides the already existing array, containing the same data).
In my opinion, I would say that you should NOT convert them into objects.
You already have them as arrays, so transforming/copying them into objects would require more memory, because at some point you end up holding both the objects and the arrays. If you absolutely do not need any OOP functionality (inheritance, private properties/methods, ...) you should stick with your arrays.
But keep in mind: You can often fetch the database and get objects instead of arrays as result. (See PDO fetchAll) Then you already have your objects without having the array.
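As a minimal sketch of fetching objects directly: the example below assumes the pdo_sqlite extension is available and uses a made-up pets table and a hypothetical Pet class, none of which come from the question. PDO::FETCH_CLASS asks fetchAll() to build one instance of the given class per row, mapping columns to properties by name.

```php
<?php
// Hypothetical row class; its property names match the column names.
class Pet
{
    public $name;
    public $age;
}

// In-memory SQLite database, used here only so the sketch is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE pets (name TEXT, age INTEGER)');
$pdo->exec("INSERT INTO pets (name, age) VALUES ('cat', 12), ('dog', 15)");

// Each row comes back as a Pet object, with no intermediate array of rows.
$stmt = $pdo->query('SELECT name, age FROM pets ORDER BY age');
$pets = $stmt->fetchAll(PDO::FETCH_CLASS, Pet::class);

foreach ($pets as $pet) {
    echo $pet->name, ' is ', $pet->age, "\n";
}
```

Depending on the PHP version and driver, numeric columns may arrive as strings rather than integers, so cast where the type matters.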
If you wonder how the two compare in terms of performance, have a look at "Using arrays VS objects for storing data".
From what I understand (but I might well be wrong), PHP arrays are not arrays in the sense of classical C arrays. A PHP array is a sort of object, since you are free to, e.g., change an array's size (simply add a value). So you do not get exactly the same performance as you would with C arrays.
Conclusion: Feel free to keep your arrays. If you do not need the advantages of objects, there is no need to spend resources converting them. After all, it's also a question of programming style, best practices and project guidelines.
It depends on what the code does with the array next.
If you leave the data in an array, you should be careful where you pass it next, because you do not want that array travelling back and forth all over the code base.
With an array you cannot define a clear interface between classes or blocks of logic, and that makes the code less predictable.
When an array is passed through 10 methods in 5 classes that each modify its contents, it's very hard to track what happens to those contents and where.
I like this answer: https://www.reddit.com/r/PHP/comments/29eope/stop_abusing_arrays_in_php/cik8tet/
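To illustrate the point about interfaces, here is a minimal sketch assuming PHP 7.4 or later. The OrderLine class and the totalQuantity() function are hypothetical names invented for the example. The type declaration on the function documents what it accepts, and the object validates itself once instead of every consumer re-checking array keys.

```php
<?php
// Hypothetical value object; the fields are invented for the illustration.
final class OrderLine
{
    private string $sku;
    private int $quantity;

    public function __construct(string $sku, int $quantity)
    {
        if ($quantity < 1) {
            throw new InvalidArgumentException('Quantity must be at least 1.');
        }
        $this->sku = $sku;
        $this->quantity = $quantity;
    }

    public function sku(): string
    {
        return $this->sku;
    }

    public function quantity(): int
    {
        return $this->quantity;
    }

    // Returns a modified copy, so a callee cannot silently change the caller's data.
    public function withQuantity(int $quantity): self
    {
        return new self($this->sku, $quantity);
    }
}

// The signature states exactly what may be passed in; an array is rejected.
function totalQuantity(OrderLine ...$lines): int
{
    $total = 0;
    foreach ($lines as $line) {
        $total += $line->quantity();
    }
    return $total;
}

$total = totalQuantity(new OrderLine('A-1', 2), new OrderLine('B-7', 3));
echo $total, "\n";
```

Whether this is worth the extra class depends on how far the data travels; for a value used in one function, an array is usually fine.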
I'm working on code to manage a collection of unique objects. The first prototype of this code utilises an associative array, basically as that's the way I've always done it.
However, I'm also keen on taking advantage of functionality that's been added to more modern versions of PHP such as SplObjectStorage for doing this instead, partly as a learning experience, partly because it's bound to offer advantages (benchmarks I've seen suggest that SplObjectStorage can be faster than arrays in a lot of cases).
The current implementation has an associative array that I check with in_array() to see if an object is already in the array before adding a new object to it.
The big problem I can see with SplObjectStorage is that it doesn't seem (at first glance) to support key/value associative array behaviour, and can only be treated as an indexed array. However, the documentation for the newer features of PHP isn't up to the standards of the documentation of more established parts of the language and I might simply be missing something.
Can I use SplObjectStorage in place of an associative array? If so, how do I define the key when adding a new object? More importantly, what are the relative advantages and disadvantages of SplObjectStorage when compared to associative arrays?
You shouldn't see the SplObjectStorage as a key-value store, but merely a set of objects. Something is in the set or not, but its position is not important.
The "key" of an element in the SplObjectStorage is in fact the hash of the object. As a result, the same object instance cannot be added to an SplObjectStorage more than once, so you don't have to check whether it is already there before adding it.
However, since PHP 5.4 there is a method called getHash() which you can override to return your own "hash" for an object. In a sense this sets the key, so you can control under which conditions two objects count as the same entry.
The main advantage of SplObjectStorage is the fact that you gain lots of methods for dealing and interacting with different sets (contains(), removeAll(), removeAllExcept() etc). Its speed is marginally better, but the memory usage is worse than normal PHP arrays.
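A minimal sketch of both behaviours, assuming PHP 7.4 or later. The User and UsersByEmail classes are hypothetical names invented for the illustration. The array-access syntax used here stores an object together with an optional piece of associated data, which is how SplObjectStorage can stand in for a map keyed by object.

```php
<?php
// Hypothetical value class for the illustration.
final class User
{
    public string $email;

    public function __construct(string $email)
    {
        $this->email = $email;
    }
}

$alice = new User('alice@example.com');
$bob = new User('bob@example.com');

// Used as a set (and as an object-to-data map via array access).
$set = new SplObjectStorage();
$set[$alice] = 'admin';
$set[$bob] = 'guest';
$set[$alice] = 'owner';          // same instance: replaces the data, adds nothing

$hasAlice = isset($set[$alice]); // membership test, no in_array() scan needed
$aliceRole = $set[$alice];
$sizeOfSet = count($set);

// Overriding getHash() changes what counts as "the same" object.
final class UsersByEmail extends SplObjectStorage
{
    public function getHash($object): string
    {
        return $object->email;   // two User instances with one email collapse
    }
}

$byEmail = new UsersByEmail();
$byEmail[$alice] = true;
$byEmail[new User('alice@example.com')] = true; // different instance, same hash
$sizeByEmail = count($byEmail);

printf("%d entries, alice is %s, %d by email\n", $sizeOfSet, $aliceRole, $sizeByEmail);
```

You still cannot choose an arbitrary string key and look an object up by it; the object itself (or its hash) is always the key.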
Results after running this benchmark with 10,000 iterations on PHP 5.6.13:
| Type | Time to fill | Time to check | Memory |
|---|---|---|---|
| SplObjectStorage | 0.021285057068 | 0.019490000000 | 2131984 |
| Array | 0.021125078201 | 0.020912000000 | 1411440 |
Arrays use 34% less memory and are about the same speed as SplObjectStorage.
Results with PHP 7.4.27:
| Type | Time to fill | Time to check | Memory |
|---|---|---|---|
| SplObjectStorage | 0.019295692444 | 0.016039848328 | 848384 |
| Array | 0.024008750916 | 0.022011756897 | 3215416 |
Arrays use 3.8 times more memory than SplObjectStorage and are about 24% slower to fill and 37% slower to check.
Results with PHP 8.1.1:
| Type | Time to fill | Time to check | Memory |
|---|---|---|---|
| SplObjectStorage | 0.009704589844 | 0.003775596619 | 768384 |
| Array | 0.014604568481 | 0.012760162354 | 3215416 |
Arrays use 4.2 times more memory than SplObjectStorage, are about 50% slower to fill, and take more than three times as long to check.
When all the memory allocated to an array is used up, the memory allocated to it is doubled. In this context, a collection of objects may be a more efficient structure.
I am creating a class that implements the composite pattern; the class is supposed to be able to represent a tree (so let's call it Tree).
In order to use as little memory as possible, when a Tree class instance is passed an array, it is kept as an array until the index is required, at which time a new child Tree instance is created and returned.
The reasoning for this: My Tree instances pass through many filters and their content might be traversed and modified by many observers, but their content will be requested only once (on the final rendering). Moreover, a certain chunk might not be rendered at all, so it would be wasted memory to create instances of Tree that are not even going to be used.
In other words, if one uses
$class->set($array, $index); // or $class[$index] = $array
the array is stored as normal. But when someone uses
$class->get($index); // or $class[$index]
a new instance of Tree will be returned (and cached for subsequent calls).
However, I am left with a dilemma. Do I:
1. Create a new instance of Tree when someone sets data?
   Pros: code is easy to write, easy to maintain, easy for someone to pick up and improve upon.
   Cons: memory consumption, even more so if we consider that part of the data entered might never be used. Manipulation gets messier, as special methods have to be written for special cases (as opposed to dealing with native arrays).
2. Leave it as is, but have people do $class[$index] = array($index2=>$array)?
   Pros: everything stays stored as an array, so normal array functions work, and memory consumption is minimal.
   Cons: the usage is less "elegant", and possibly more complex.
I know this is not a forum, so I am not asking for opinions, but for the advice of someone having already developed a (potentially heavy) tree structure in PHP and what is the best way (or "accepted way") to go about it.
Demo creating the Tree on construct and Demo creating Tree on demand are simple tests you can run for yourself.
The latter creates new node objects only when they're accessed, thus the memory consumption is not that high.
The beauty of object oriented programming, encapsulation and facades is that nobody cares what you do internally, as long as the API does what it's supposed to. Whoever / Whatever is using your TreeNode doesn't need to know how it functions internally. Just go with whatever feels right. Design your APIs right and you can change internals any time.
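The on-demand variant can be sketched roughly as follows. This is a minimal illustration assuming PHP 7.4 or later, not the code behind the linked demos. LazyTree and createdChildren() are hypothetical names. Raw data stays a plain array, and a child node is created and cached only the first time its index is read.

```php
<?php
// Hypothetical LazyTree class: a sketch, not the asker's implementation.
final class LazyTree implements ArrayAccess
{
    /** Raw data, kept as a plain array until a child is requested. */
    private array $data;
    /** Child nodes that have been created so far, keyed by index. */
    private array $children = [];

    public function __construct(array $data = [])
    {
        $this->data = $data;
    }

    public function offsetExists($index): bool
    {
        return array_key_exists($index, $this->data);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($index)
    {
        $value = $this->data[$index] ?? null;
        if (!is_array($value)) {
            return $value; // leaf values are returned as they are
        }
        // Wrap the sub-array on first access and cache the wrapper.
        return $this->children[$index] ??= new self($value);
    }

    public function offsetSet($index, $value): void
    {
        if ($index === null) {
            $this->data[] = $value; // $tree[] = ... appends
            return;
        }
        $this->data[$index] = $value;
        unset($this->children[$index]); // drop a stale cached child, if any
    }

    public function offsetUnset($index): void
    {
        unset($this->data[$index], $this->children[$index]);
    }

    /** Number of child objects instantiated so far (for the demonstration). */
    public function createdChildren(): int
    {
        return count($this->children);
    }
}

$tree = new LazyTree();
$tree['menu'] = ['home' => ['title' => 'Home'], 'about' => ['title' => 'About']];
$before = $tree->createdChildren();      // nothing has been wrapped yet

$menu = $tree['menu'];                   // first read creates the child node
$title = $tree['menu']['home']['title']; // reuses the cached child
echo $title, "\n";
```

One limitation of this sketch: changes made through a child node are not written back into the parent's raw array, so a real implementation would need to decide which copy is authoritative once a child exists.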
I recently discussed with a colleague an example of code that uses nested arrays, and we ended up debating the use of nested arrays in code in general.
When we pass parameters to generators etc., it's sometimes easier to use nested arrays. I think it's OK to use nested arrays in these situations, but it's better to use classes and good validation.
Second, XML, HTML, JSON and other parsers often return nested arrays as well. I think it's also OK to use nested arrays in these scenarios.
So when is it not OK to use nested arrays?
I think it's when we use them outside the scenarios above, especially when we hardcode nested arrays in source code, as in this example:
$data = array(array('cat', 12),
array('dog', 15),
array('turtle', 4));
I think that frequently hardcoding nested arrays like this can be considered something close to an antipattern. You have no validation, the code is messy, and it is likely to have bugs. My colleague disagrees and says that it's not an antipattern, because it's OK to use nested arrays in programming and there is no reason to tell others that it isn't.
Could you please help us decide whether using hardcoded nested arrays can be treated as an antipattern or not?
All pros and cons would be appreciated!
It depends.
There are situations where nested arrays fit well. There are other situations where you don't want to explicitly create an object, and it will be quicker and shorter to use nested arrays.
Of course, by using arrays instead of objects, the meaning of the array items is lost. In your example, someone who reads the code has no idea if the second item in a nested array is an age of an animal, or maybe its weight, or an identifier of the pet's owner from a database. On the other hand, if you provide an array of objects like this:
class Animal
{
    public $animal;
    public $age;
}
the meaning of the second item becomes much easier to understand.
This does not mean that "validation" will be easier to do, but that is rather a problem with the language. For example, nothing prevents you from writing:
$someDog = new Animal();
$someDog->animal = 12;
$someDog->age = 'Cat';
Nested arrays can be a bad thing in strongly typed languages. For example, in C#, having Collection<object> is probably not the best thing to do: you don't know what the properties of each object are, what those objects are, etc. When you have Collection<Animal>:
You are sure that every item of the collection is an Animal.
You know the precise properties available for every item, and their types. If Animal has a property Age of type int, you know that every item in the collection has this property, and that its value is an integer. (Note: individual items may have additional properties if Animal is inherited by other classes that extend it, but in all cases the Age property will be available.)
The same thing works only partially in PHP. For example, you cannot be sure that the type of the same property is always the same among objects in the array, thus validation is required.
My personal opinion: in languages like PHP, I would rather use nested arrays when it's quicker and easier to use nested arrays, and objects when there is a reason to use objects.
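On PHP 7.4 and later, typed properties narrow that gap somewhat. The following is a minimal sketch that reuses the Animal class from above with type declarations added; the constructor is an addition made for the illustration. Without strict_types, PHP still coerces compatible scalars, so this catches only values that cannot be converted.

```php
<?php
// Animal with typed properties (PHP 7.4+); the constructor is illustrative.
final class Animal
{
    public string $animal;
    public int $age;

    public function __construct(string $animal, int $age)
    {
        $this->animal = $animal;
        $this->age = $age;
    }
}

$someDog = new Animal('dog', 15);

$rejected = false;
try {
    $someDog->age = 'Cat'; // not a numeric string, so it cannot become an int
} catch (TypeError $e) {
    $rejected = true;      // the assignment is refused; age keeps its old value
}

// Caveat: in the default (non-strict) mode an int assigned to a string
// property is silently converted, so this line does not fail.
$someDog->animal = 12;

var_dump($rejected, $someDog->age, $someDog->animal);
```

Types still say nothing about meaning (an age of -3 is a valid int), so domain validation remains your job either way.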
I don't know whether this would work for you, but based on your code snippet I would go for another approach:
$data = array('cat'=>12,
'dog'=>15,
'turtle'=>4,
);
Or if animals have more attributes you can do:
$data = array('cat'=>array(12, 'attr'),
'dog'=>array(15, 'attr'),
'turtle'=>array(4, 'attr'),
);
Which way to go really depends on the data: classes vs arrays.