Is the PHP implementation of a Heap really a full implementation?
When I read this article, http://en.wikipedia.org/wiki/Heap_%28data_structure%29, I get the idea that a child node has a specific parent, and that a parent has specific children.
When I look at the example in the PHP documentation, however, http://au.php.net/manual/en/class.splheap.php, it seems that child nodes all share the same 'level', but that the specific parent/child information is not important.
For example, which node is the parent of each of the three nodes that are ranked 10th in the PHP example?
In my application, when a user selects 'node 156', I need to know who its children are so that I can pay each of them a visit. (I could name them 'node 1561', 'node 1562', etc., so the relationship is obvious.)
Is the PHP Heap implementation incomplete? Should I forget the Spl class and go my own way? Or am I missing something about how heaps should operate? Or perhaps I should be looking at a particular heap variant?
Thanks heaps!
The API for this heap implementation does not allow array access, which is what you need in this case. Heaps are typically used to implement other structures that allow you to easily remove items from the top.
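To illustrate the point about the API, here is a minimal sketch using the real SplMinHeap class. Apart from insert() and top(), the only way to see the elements is to extract them, and iterating a SplHeap removes each element as it goes. The input values are arbitrary.

```php
<?php
// SplMinHeap exposes heap operations only: insert(), extract(), top(),
// count() and a destructive iteration. There is no call that returns the
// children of a particular element.
$heap = new SplMinHeap();
foreach ([5, 3, 8, 1] as $v) {
    $heap->insert($v);
}

$top = $heap->top();        // smallest element, still in the heap

$sorted = [];
foreach ($heap as $v) {     // each step extracts the current top
    $sorted[] = $v;
}

$remaining = count($heap);  // the heap is empty after the loop
```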
You could create a wrapper iterator that seeks to the positions you need, but I suspect it would perform poorly and would not be a very good solution.
In this situation, I think a BinaryTree is what you need. Given a node in the tree, you can walk down to its children because they're directly linked. I do have a BinarySearchTree that keeps the tree ordered, and you can call getRoot to get a copy of the BinaryTree. There's also an AvlTree, which keeps the tree balanced for optimal searching.
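A minimal sketch of what "directly linked" means, assuming a hypothetical BinaryTreeNode class (this is not the API of the BinaryTree/BinarySearchTree classes mentioned above, and it needs PHP 8 for constructor promotion). Each node holds references to its own children, so visiting the children of 'node 156' is just a matter of following those links.

```php
<?php
// Hypothetical minimal binary tree node: children are stored as direct
// links, so they only change when you relink them yourself.
final class BinaryTreeNode
{
    public ?BinaryTreeNode $left = null;
    public ?BinaryTreeNode $right = null;

    public function __construct(public string $value) {}

    /** @return BinaryTreeNode[] the existing children, left first */
    public function children(): array
    {
        return array_values(array_filter([$this->left, $this->right]));
    }
}

$node156 = new BinaryTreeNode('node 156');
$node156->left = new BinaryTreeNode('node 1561');
$node156->right = new BinaryTreeNode('node 1562');

$visited = [];
foreach ($node156->children() as $child) {
    $visited[] = $child->value;   // "pay each child a visit"
}
```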
A Heap is usually used to answer questions that revolve around "what is the max/min element in the set".
While a heap can, coincidentally, answer "who are the children of the max/min node" efficiently, it is not a good data structure when you need to access an arbitrary node (one that isn't the max/min node). Nor is it the data structure to use if specific parent/child relationships need to be represented and maintained, because children can and do get swapped between different parent nodes.
So PHP's heap implementation is definitely not incomplete on these grounds; you simply need a different data structure.
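To show the swapping concretely, here is a sketch of a hypothetical array-backed binary min-heap (SplHeap does not expose its internal layout, so this is not its implementation). In this layout the children of the element at index i sit at indexes 2i+1 and 2i+2, and a single extract() is enough to move an element under a different parent.

```php
<?php
// Hypothetical array-backed binary min-heap, written only to show how
// parent/child positions shift as the heap is modified.
final class ArrayMinHeap
{
    /** @var int[] */
    private array $items = [];

    public function insert(int $value): void
    {
        $this->items[] = $value;
        $i = count($this->items) - 1;
        // Sift up: swap with the parent at (i - 1) / 2 while smaller.
        while ($i > 0) {
            $parent = intdiv($i - 1, 2);
            if ($this->items[$parent] <= $this->items[$i]) {
                break;
            }
            [$this->items[$parent], $this->items[$i]] = [$this->items[$i], $this->items[$parent]];
            $i = $parent;
        }
    }

    public function extract(): int
    {
        if ($this->items === []) {
            throw new UnderflowException('Heap is empty');
        }
        $top = $this->items[0];
        $last = array_pop($this->items);
        if ($this->items !== []) {
            // Move the last element to the root and sift it down.
            $this->items[0] = $last;
            $this->siftDown(0);
        }
        return $top;
    }

    private function siftDown(int $i): void
    {
        $n = count($this->items);
        while (true) {
            $smallest = $i;
            foreach ([2 * $i + 1, 2 * $i + 2] as $c) {
                if ($c < $n && $this->items[$c] < $this->items[$smallest]) {
                    $smallest = $c;
                }
            }
            if ($smallest === $i) {
                return;
            }
            [$this->items[$smallest], $this->items[$i]] = [$this->items[$i], $this->items[$smallest]];
            $i = $smallest;
        }
    }

    /** Values currently stored as children of the element at $index. */
    public function childrenOf(int $index): array
    {
        $kids = [];
        foreach ([2 * $index + 1, 2 * $index + 2] as $c) {
            if (isset($this->items[$c])) {
                $kids[] = $this->items[$c];
            }
        }
        return $kids;
    }

    public function indexOf(int $value): ?int
    {
        $i = array_search($value, $this->items, true);
        return $i === false ? null : $i;
    }
}

$h = new ArrayMinHeap();
foreach (range(1, 7) as $v) {
    $h->insert($v);
}
$before = $h->childrenOf($h->indexOf(2));   // children of 2 before extract()
$extracted = $h->extract();
$after = $h->childrenOf($h->indexOf(2));    // children of 2 afterwards
```

After inserting 1 to 7, value 5 is a child of 2; after one extract() it has become a child of 4.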
The structure that I've got is as follows:
User table
    id (PK)
    name
    email
    username
    password
UserHierarchy table
    user_parent_id (FK to User)
    user_child_id (FK to User)
    (composite primary key on both columns)
I've written these two relationships to retrieve a user's parent and a user's children:
public function parent()
{
return $this->hasManyThrough(\App\Models\User::class, \App\Models\UserHierarchy::class, 'user_child_id', 'id', 'id', 'user_parent_id');
}
public function children()
{
return $this->hasManyThrough(\App\Models\User::class, \App\Models\UserHierarchy::class, 'user_parent_id', 'id', 'id', 'user_child_id');
}
To get all the children, grandchildren and so on, I've written this additional relationship, which makes use of eager loading:
public function childrenRecursive()
{
return $this->children()->with('childrenRecursive.children');
}
So far so good: when I find a User by id, I can get the whole downward tree using childrenRecursive. What I'm trying to achieve now is to reuse these relationships to filter a certain set of results. That is, for a given User (for example id 1), I want a collection of the Users in his downward tree (children, recursively) as well as his direct parents.
$model->where(function ($advancedWhere) use ($id) {
    $advancedWhere->whereHas('parent', function ($advancedWhereHas) use ($id) {
        $advancedWhereHas->orWhere('user_child_id', $id);
        // I want all users that are recorded as his parents
    })->whereHas('childrenRecursive', function ($advancedWhereHas) use ($id) {
        // Missing code: I want all users that are recorded as his children and downwards
    });
})->get();
This is the complete tree I'm testing with. The query above (if I add a similar orWhere on childrenRecursive) returns every User that has a parent-child relationship. For example, for User 2 it should return every number except 11 and 12, but it returns every number except 11 (because 11 is not a child of anyone).
I'm going to answer your question first, but in the second half of the answer I have proposed an alternative which I strongly suggest adopting.
MySQL (unlike, incidentally, Microsoft SQL Server) doesn't have a way to write recursive queries. Accordingly, there is no good Laravel relationship to model this.
As such, there is no way for Laravel to do it other than naively, which, if you have a complex tree, is going to lead to many queries.
Essentially, when you load your parent, you will only have access to its children (as a relationship collection). You would then foreach through its children (and their children, etc., recursively) to build the whole tree. Each time you do this, new queries are run for the child and its children. This is essentially what you're currently doing, and you will find that as your data set grows it becomes very slow. In the end, this gives you a data structure on which you can apply your filters and conditions in code. You will not be able to achieve this in a single query.
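The naive approach can be sketched in plain PHP. The hypothetical $fetch callable stands in for one database query (for example, loading a user's children relationship) and counts how often it runs; the parent => children map is made-up sample data. Every node in the subtree costs one query, leaves included.

```php
<?php
// Naive recursive loading: one "query" per visited node.
function loadDescendants(int $id, callable $fetchChildren): array
{
    $result = [];
    foreach ($fetchChildren($id) as $childId) {
        $result[] = $childId;
        // Every child triggers another query for its own children.
        $result = array_merge($result, loadDescendants($childId, $fetchChildren));
    }
    return $result;
}

// Hypothetical sample data: parent id => child ids.
$edges = [1 => [2, 3], 2 => [4, 5], 3 => [6], 5 => [7]];
$queries = 0;
$fetch = function (int $id) use ($edges, &$queries): array {
    $queries++;                 // one round trip to the database
    return $edges[$id] ?? [];
};

$descendants = loadDescendants(1, $fetch);
```

Seven nodes cost seven queries here, so the query count grows linearly with the size of the tree.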
If you are writing to the db a lot, i.e. adding lots of new children but rarely reading the results, then this may be your best solution.
(Edit: abr's comment below linked me to the release notes for MySQL 8, which does have this functionality. My initial response was based on MySQL 5.7. However, I'm not aware of Laravel/Eloquent having a canonical relationship solution employing this yet. Furthermore, I have previously used this functionality in MSSQL, and nested sets are a better solution IMO.
Also, Laravel isn't necessarily coupled to MySQL; it just often is the db of choice. It will therefore probably never use such a database-specific solution, to avoid tight coupling.)
However, most hierarchical structures are read far more often than they are written, in which case this is going to start stressing your server considerably.
If this is the case, I would advise looking into:
https://en.wikipedia.org/wiki/Nested_set_model
We use https://github.com/lazychaser/laravel-nestedset which is an implementation of the above, and it works very well for us.
It is worth mentioning that it can be slow and memory intensive when we redefine the whole tree (we have around 20,000 parent-child relationships), but this only has to happen when we've made an error in the hierarchy that can't be unpicked manually and this is rare (we haven't done it in 6 months). Again, if you think you may have to do that regularly, this may not be the best option for you.
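To make the nested set idea more concrete, here is a plain-PHP sketch of the model itself (it is not the laravel-nestedset API). A depth-first walk numbers every node with lft/rgt values, after which all descendants of a node are the rows whose lft/rgt fall strictly inside that node's range, which in SQL is a single range query. The parent => children map is made-up sample data.

```php
<?php
// Assign lft/rgt numbers with a depth-first walk.
function assignNestedSet(array $children, int $id, int &$counter, array &$bounds): void
{
    $lft = $counter++;
    foreach ($children[$id] ?? [] as $childId) {
        assignNestedSet($children, $childId, $counter, $bounds);
    }
    $bounds[$id] = ['lft' => $lft, 'rgt' => $counter++];
}

// All descendants of $id, found by range comparison only (no recursion).
function descendantsOf(array $bounds, int $id): array
{
    ['lft' => $lft, 'rgt' => $rgt] = $bounds[$id];
    $ids = [];
    foreach ($bounds as $otherId => $b) {
        // In SQL: WHERE lft > :lft AND rgt < :rgt
        if ($b['lft'] > $lft && $b['rgt'] < $rgt) {
            $ids[] = $otherId;
        }
    }
    sort($ids);
    return $ids;
}

// Hypothetical sample data: parent id => child ids.
$children = [1 => [2, 3], 2 => [4, 5], 3 => [6], 5 => [7]];
$counter = 1;
$bounds = [];
assignNestedSet($children, 1, $counter, $bounds);
```

The trade-off is on the write side: inserting a node means renumbering the lft/rgt values to its right, which is why redefining the whole tree can be slow.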
Let's say I have a table called Tags with an id column, a name column, and a parent_id column. Many tags are nested using the parent_id column. How would I efficiently check whether Tag A has Tag B as a non-direct child?
So far I have been selecting all tags whose parent_id is the current tag, then repeating the query for each of the child tags.
How would I do this more efficiently, to get all tags that match a search and are direct or non-direct children?
Thanks for the help,
Jason
Rather than discuss in the comments... here is what I would recommend:
If you want to stick with MySQL but can play with the structure of your database then absolutely this "Closure Table" pattern suggested by Bill Karwin is the way to go. It allows you to keep your data in a flat table design while abstracting the multi-level tree structure into a separate table for easy data extraction.
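A rough sketch of the closure table idea in plain PHP, assuming a hypothetical TagClosure table with ancestor, descendant and depth columns; the arrays stand in for database rows. Every (ancestor, descendant) pair is stored once, so checking whether Tag B sits anywhere below Tag A becomes a single lookup instead of a loop of queries.

```php
<?php
// Build closure rows from Tags(id, parent_id). Each tag is also paired
// with itself at depth 0, as is usual for closure tables.
function buildClosure(array $parentOf): array
{
    $closure = [];
    foreach (array_keys($parentOf) as $tagId) {
        $depth = 0;
        // Walk up from the tag to the root, recording each ancestor.
        for ($a = $tagId; $a !== null; $a = $parentOf[$a]) {
            $closure[] = ['ancestor' => $a, 'descendant' => $tagId, 'depth' => $depth++];
        }
    }
    return $closure;
}

// $minDepth = 1: any descendant; $minDepth = 2: non-direct descendants only.
function isDescendant(array $closure, int $ancestor, int $descendant, int $minDepth = 1): bool
{
    foreach ($closure as $row) {
        // SQL equivalent: SELECT 1 FROM TagClosure
        //   WHERE ancestor = ? AND descendant = ? AND depth >= ?
        if ($row['ancestor'] === $ancestor && $row['descendant'] === $descendant
            && $row['depth'] >= $minDepth) {
            return true;
        }
    }
    return false;
}

// Hypothetical sample data: tag id => parent_id (null for a root tag).
$parentOf = [1 => null, 2 => 1, 3 => 2, 4 => 3, 5 => 1];
$closure = buildClosure($parentOf);
```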
If you want to try a different relational database system, you might try SQL Server Express, which is free from Microsoft. In full disclosure, I don't use it, so I don't know what functionality is excluded (and I'm sure something is, otherwise you wouldn't get it for free). So please do some research to make sure recursive Common Table Expressions (CTEs) are available. If they are, you can use Pinal Dave's blog post for the recursive SQL technique using CTEs.
Otherwise, if you think you will only ever have a handful of levels to work with, you can use the original suggestion and hardcode the number of levels.
I'm currently at an impasse regarding the structural design of my website. At the moment I'm using objects to simplify the structure of my site (I have a person object, a party object, a position object, etc.), and in theory each of these is a row from its respective table in the database.
Now from what I've learnt, OO Design is good for keeping things simple and easy to use/implement, which I agree with - it makes my code look so much cleaner and easier to maintain, but what I'm confused about is how I go about linking my objects to the database.
Let's say there is a person page. I create a person object, which equals one mysql query (which is reasonable), but then that person might have multiple positions which I need to fetch and display on a single page.
What I am currently doing is using a method called getPositions on the person object, which gets the data from MySQL and creates a separate position object for each row, passing in the data as an array. That keeps the queries down to a minimum (2 per page), but it seems like a horrible implementation and, to me, it breaks the rules of object-oriented design (if I want to change a MySQL row, I'd need to change it in multiple places). The alternative, however, is worse.
In this case the alternative is to get just the IDs I need and then create separate positions, passing in the ID, with each constructor fetching its row from the database. With 20 positions per page, that quickly adds up, and I've read how much WordPress is criticised for its high number of queries per page and its CPU usage. The other thing I'll need to consider is sorting: doing it this way means sorting the data in PHP, which surely can't be as efficient as doing it natively in MySQL.
Of course, pages will be (and can be) cached, but to me, this seems almost like cheating for poorly built applications. In this case, what is the correct solution?
The way you're doing it now is at least on the right track. Having an array in the parent object with references to the children is basically how the data is represented in the database.
I'm not completely sure from your question if you're storing the children as references in the parent's array, but you should be and that's how PHP should store them by default. If you also use a singleton pattern for your objects that are pulled from the database, you should never need to modify multiple objects to change one row as you suggest in your question.
You should probably also create multiple constructors for your objects (using static methods that return new instances) so you can either create them from their ID and have them pull the data, or create them from data you already have. The latter would be used when creating children: the parent can pull all of the data for its children and create all of them using only one query. Getting a child from its ID will probably be used somewhere else, so it's good to have if it's needed.
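A minimal sketch of these suggestions, with an in-memory FakeDb class standing in for MySQL (FakeDb, Person::fromId(), Position::fromRow() and the sample rows are all hypothetical; PHP 8 syntax). Each class keeps an identity map holding one instance per row ID, which is one way to get the "one object per row" behaviour described above, and getPositions() builds every child from the rows of a single query.

```php
<?php
// Hypothetical in-memory stand-in for the database tables.
final class FakeDb
{
    public static int $queries = 0;
    public static array $people = [1 => ['id' => 1, 'name' => 'Alice']];
    public static array $positions = [
        ['id' => 10, 'person_id' => 1, 'title' => 'Treasurer'],
        ['id' => 11, 'person_id' => 1, 'title' => 'Chair'],
    ];
}

final class Position
{
    private static array $loaded = [];   // identity map: id => instance

    private function __construct(public int $id, public string $title) {}

    // Build from a row the caller already has: no query.
    public static function fromRow(array $row): self
    {
        return self::$loaded[$row['id']] ??= new self($row['id'], $row['title']);
    }
}

final class Person
{
    private static array $loaded = [];   // identity map: id => instance
    private ?array $positions = null;

    private function __construct(public int $id, public string $name) {}

    // Build from an ID: one query, and always the same object for that ID.
    public static function fromId(int $id): self
    {
        if (!isset(self::$loaded[$id])) {
            FakeDb::$queries++;
            $row = FakeDb::$people[$id];
            self::$loaded[$id] = new self($row['id'], $row['name']);
        }
        return self::$loaded[$id];
    }

    // One query for all positions; each child is built from its row.
    public function getPositions(): array
    {
        if ($this->positions === null) {
            FakeDb::$queries++;
            $rows = array_filter(FakeDb::$positions, fn ($r) => $r['person_id'] === $this->id);
            $this->positions = array_map([Position::class, 'fromRow'], array_values($rows));
        }
        return $this->positions;
    }
}

$p = Person::fromId(1);
$same = Person::fromId(1);   // served from the identity map
$titles = array_map(fn (Position $pos) => $pos->title, $p->getPositions());
$p->getPositions();          // cached, no second query
```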
For sorting, you could create additional private (or public if you want) arrays that have the children sorted in a particular way with references to the same objects the main array references.
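A sketch of such a sorted view, assuming minimal hypothetical Position and Person classes (PHP 8 syntax). Copying the array and sorting the copy with usort() reorders the references only, so both arrays point at the very same Position objects.

```php
<?php
// Hypothetical minimal Position; only what the sorting example needs.
final class Position
{
    public function __construct(public string $title) {}
}

final class Person
{
    /** @var Position[] main array, in database order */
    private array $positions;
    /** @var Position[]|null cached view sorted by title, same objects */
    private ?array $byTitle = null;

    public function __construct(array $positions)
    {
        $this->positions = $positions;
    }

    public function positionsByTitle(): array
    {
        if ($this->byTitle === null) {
            $copy = $this->positions;   // copies the array, not the objects
            usort($copy, fn (Position $a, Position $b) => strcmp($a->title, $b->title));
            $this->byTitle = $copy;
        }
        return $this->byTitle;
    }
}

$treasurer = new Position('Treasurer');
$chair = new Position('Chair');
$secretary = new Position('Secretary');
$person = new Person([$treasurer, $chair, $secretary]);
$sorted = $person->positionsByTitle();
```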
I am creating a class that implements the composite pattern; the class is supposed to be able to represent a tree (so let's call it Tree).
In order to use as little memory as possible, when a Tree instance is passed an array, the array is kept as-is until that index is requested; at that point a new child Tree instance is created and returned.
The reasoning for this: My Tree instances pass through many filters and their content might be traversed and modified by many observers, but their content will be requested only once (on the final rendering). Moreover, a certain chunk might not be rendered at all, so it would be wasted memory to create instances of Tree that are not even going to be used.
In other words, if one uses
$class->set($array,$index); //(or $class[$index] = $array)
, the array is stored as normal. But when someone uses
$class->get($index) //(or $class[$index])
, a new instance of Tree will be returned (and cached for subsequent calls).
However, I am left with a dilemma: Do I
1. Create a new instance of Tree when someone sets data?
Pros: the code is easy to write, easy to maintain, and easy for someone to pick up and improve upon.
Cons: memory consumption, even more so considering that part of the data entered might never be used. Manipulation gets messier, as special methods have to be written for special cases (as opposed to dealing with native arrays).
2. Leave it as is, but have people do $class[$index] = array($index2=>$array)?
Pros: everything stays stored as an array, so normal array functions work, and memory consumption is minimal.
Cons: the usage is less "elegant", and possibly more complex.
I know this is not a forum, so I am not asking for opinions, but for advice from someone who has already developed a (potentially heavy) tree structure in PHP: what is the best (or "accepted") way to go about it?
Demo creating the Tree on construct and Demo creating Tree on demand are simple tests you can run for yourself.
The latter creates new node objects only when they're accessed, thus the memory consumption is not that high.
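A minimal sketch of the on-demand approach, assuming a Tree class that implements ArrayAccess (PHP 8 syntax; this is not the code of the linked demos). Arrays are stored untouched on write and only wrapped in a child Tree the first time they are read, after which the wrapper is cached. The static $instances counter exists only to show how many objects were actually created.

```php
<?php
final class Tree implements ArrayAccess
{
    /** @var array<int|string, mixed> scalars, raw arrays or Tree instances */
    private array $data;

    public static int $instances = 0;   // demonstration only

    public function __construct(array $data = [])
    {
        $this->data = $data;
        self::$instances++;
    }

    public function offsetExists(mixed $offset): bool
    {
        return isset($this->data[$offset]);
    }

    public function offsetGet(mixed $offset): mixed
    {
        $value = $this->data[$offset] ?? null;
        if (is_array($value)) {
            // First read: replace the raw array with a Tree and keep it.
            $value = $this->data[$offset] = new Tree($value);
        }
        return $value;
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        // Arrays stay plain arrays until someone reads them.
        if ($offset === null) {
            $this->data[] = $value;
        } else {
            $this->data[$offset] = $value;
        }
    }

    public function offsetUnset(mixed $offset): void
    {
        unset($this->data[$offset]);
    }
}

$root = new Tree();
$root['header'] = ['title' => 'Hi', 'meta' => ['lang' => 'en']];
$root['body'] = ['para' => ['text' => 'never rendered']];
$afterSet = Tree::$instances;        // only $root exists so far
$title = $root['header']['title'];   // wraps 'header' on demand
$again = $root['header'];            // cached wrapper, no new instance
```

Because callers only see ArrayAccess, the choice between eager and lazy wrapping stays an internal detail, in line with the point about APIs below.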
The beauty of object oriented programming, encapsulation and facades is that nobody cares what you do internally, as long as the API does what it's supposed to. Whoever / Whatever is using your TreeNode doesn't need to know how it functions internally. Just go with whatever feels right. Design your APIs right and you can change internals any time.
I have an abstract syntax tree which I need to iterate. The AST is generated by the lemon port to PHP.
Now "normally", I'd do it with the brand new and shiny (PHP 5.3.1) SPL classes, and it would look like this:
$it = new \RecursiveIteratorIterator(
new \RecursiveArrayIterator($ast['rule']),
\RecursiveIteratorIterator::SELF_FIRST);
Actually, that's what I'm already doing in another part of the code, which determines a rough type for the entire tree (i.e. whether it is an assignment, a condition, etc.). Details aside, the only important thing is that the iteration is done with RecursiveIteratorIterator::SELF_FIRST, that is, top-down.
Going back to my problem, I need to iterate the AST bottom-up, that is, something like RecursiveIteratorIterator::CHILD_FIRST, in order to do some substitutions and optimizations in the tree.
The problem is, these operations need to be context-aware, i.e. I need the path down to the current node. And since I want to iterate bottom-up, I can't have that with RecursiveIteratorIterator.
Well, think about it for a second. I want to iterate bottom-up and, at each iteration, have the top-down context (a stack) of the current node. Technically this should be possible, since RecursiveIteratorIterator must first go down to the tail of the tree in order to iterate backwards. On its way to the tail, it could cache the current position and simply pop elements as it returns from recursion.
Now this is a keyword: caching. This is why I suspect it should be possible with another SPL class: RecursiveCachingIterator.
The question is: is it really possible? If yes, how?
I've been puzzling over this with some code, without success, and the documentation is scarce. Really, really scarce.
Whoever finds the most elegant solution to this using SPL, hats off! You're a PHP guru!
PS: in case it's not clear, I'm looking for as much SPL (re)usage as possible. I know I could write my own recursive functions with a custom stack, no need to remind me about that.
I have managed to get it working by extending RecursiveIteratorIterator and managing the stack in ::callGetChildren() (push) and ::endChildren() (pop). Maybe this will help someone. Hats off to myself :-)
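A sketch of that approach, assuming PHP 8 (the class name ContextualRecursiveIterator, getPath() and the sample AST are hypothetical). In CHILD_FIRST mode, RecursiveIteratorIterator calls callGetChildren() while it is still positioned on the parent, so the parent's key can be pushed there, and it calls endChildren() after the last child and before yielding the parent, which is where the key is popped. At every step the stack therefore holds the keys of the current node's ancestors.

```php
<?php
// Bottom-up (CHILD_FIRST) iteration that also tracks the path of keys
// from the root down to the current node's parent.
class ContextualRecursiveIterator extends RecursiveIteratorIterator
{
    /** @var array<int|string> keys of the current element's ancestors */
    private array $stack = [];

    public function __construct(RecursiveIterator $it)
    {
        parent::__construct($it, RecursiveIteratorIterator::CHILD_FIRST);
    }

    public function rewind(): void
    {
        $this->stack = [];
        parent::rewind();
    }

    // Called just before descending; key() still refers to the parent.
    #[\ReturnTypeWillChange]
    public function callGetChildren()
    {
        $this->stack[] = $this->key();
        return parent::callGetChildren();
    }

    // Called after the last child, before the parent itself is yielded.
    public function endChildren(): void
    {
        array_pop($this->stack);
    }

    /** Keys from the root down to (but excluding) the current element. */
    public function getPath(): array
    {
        return $this->stack;
    }
}

// Hypothetical AST in the same shape as $ast['rule'] above.
$ast = ['rule' => ['assign' => ['lhs' => 'x', 'rhs' => ['add' => [1, 2]]]]];
$it = new ContextualRecursiveIterator(new RecursiveArrayIterator($ast['rule']));

$visited = [];
foreach ($it as $key => $node) {
    $visited[] = implode('/', array_merge($it->getPath(), [$key]));
}
```

Children are visited before their parents, and each node sees its full top-down context.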