Stop recursive incestuous child/parent relationships in MySQL - PHP

I am programming in PHP / MySQL / Javascript.
I have a list of parts which we want to link in a child / parent relationship with no limit on the amount of tiers.
When I am picking from a list of parts to add a child to a parent I limit the list of parts to exclude the parent itself, and any parts which are already children of that parent.
What I have discovered is that I also want to exclude the grandparents of the parent, as otherwise we can get an incestuous (circular) relationship, which creates an infinite loop when I display the tree of parts.
Not only that, but I cannot allow the child part to be a great-grandparent of the parent, or a great-great-grandparent, etc.
Here is the SQL statement I currently use, which I think could also be improved with a LEFT JOIN, but I am not skilled enough with SQL at this point:
SELECT *
FROM sch_part_general
WHERE (sch_part_general.part_id <> $parentId)
AND (sch_part_general.part_id NOT IN
(SELECT part_id FROM sch_part_mapping WHERE parent_id = $parentId)
)
sch_part_general is a multi column table with all the parts, with part_id as the primary key.
sch_part_mapping is a two column mapping table with part_id (child) || parent_id (parent).
Could someone point me in the right direction with the SQL query? I am not keen on using a while loop to build the SQL statement, as I think that would be quite inefficient, but it is the only approach I have thought of so far that might work.

MySQL (before version 8.0, which added recursive common table expressions) doesn't have much, if any, support for hierarchical queries. If you want to stick to what is called the Adjacency List Model, all you can do is add a JOIN for each level you'd like to include. Needless to say, this doesn't scale well.
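To make the scaling problem concrete, here is a minimal sketch of the one-JOIN-per-level approach, run against an in-memory SQLite database from Python's standard library as a stand-in for MySQL (the SQL itself is plain self-joins). The sample rows are hypothetical; only the sch_part_mapping table and its columns come from the question.

```python
import sqlite3

# Hypothetical sample data in the question's mapping table:
# a single chain 1 -> 2 -> 3 -> 4 -> 5 (parent -> child).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sch_part_mapping (part_id INTEGER, parent_id INTEGER)")
conn.executemany(
    "INSERT INTO sch_part_mapping (part_id, parent_id) VALUES (?, ?)",
    [(2, 1), (3, 2), (4, 3), (5, 4)],
)

# One self-join per level: this query can see at most three levels of ancestors.
# Reaching a fourth level means editing the SQL to add a fourth join.
THREE_LEVEL_ANCESTORS = """
SELECT m1.parent_id, m2.parent_id, m3.parent_id
FROM sch_part_mapping AS m1
LEFT JOIN sch_part_mapping AS m2 ON m2.part_id = m1.parent_id
LEFT JOIN sch_part_mapping AS m3 ON m3.part_id = m2.parent_id
WHERE m1.part_id = ?
"""


def ancestors_within_three_levels(part_id):
    """Collect the non-NULL ancestor ids returned by the fixed-depth query."""
    found = set()
    for row in conn.execute(THREE_LEVEL_ANCESTORS, (part_id,)):
        found.update(a for a in row if a is not None)
    return found


print(ancestors_within_three_levels(4))  # {1, 2, 3}
print(ancestors_within_three_levels(5))  # {2, 3, 4}: part 1 is four levels up and is missed
```

Any ancestor more than three levels up is invisible to this query, which is why the per-level JOIN approach only works when the maximum depth is known in advance.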
On the other hand, if you can alter your database schema, I would suggest implementing the Nested Set Model.
A very good explanation of the Nested Set Model is presented in Mike Hillyer's blog:
Limitations of the Adjacency List Model
Working with the adjacency list model in pure SQL can be difficult at best. Before being able to see the full path of a category we have to know the level at which it resides.
Nested Set Model
The concept of nested sets in SQL has been around for over a decade, and there is a lot of additional information available in books and on the Internet. In my opinion the most comprehensive source of information on managing hierarchical information is a book called Joe Celko’s Trees and Hierarchies in SQL for Smarties, written by a very respected author in the field of advanced SQL, Joe Celko.
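As a hedged illustration of why nested sets avoid recursion for this problem: if every part stores lft/rgt bounds, the parent and all of its ancestors are exactly the rows whose interval encloses the parent's interval. The part_tree table, its columns and the data below are hypothetical, SQLite (via Python's sqlite3) stands in for MySQL, and the sketch assumes a strict tree where each part has one parent; a part listed under several parents in sch_part_mapping would not fit a single nested set without duplication.

```python
import sqlite3

# Hypothetical nested-set table: each part stores the left/right bounds of its subtree.
#   1 (1..10)
#   +-- 2 (2..7)
#   |   +-- 3 (3..4)
#   |   +-- 4 (5..6)
#   +-- 5 (8..9)
#   6 (11..12)   a separate root
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part_tree (part_id INTEGER PRIMARY KEY, lft INTEGER, rgt INTEGER)")
conn.executemany(
    "INSERT INTO part_tree VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 2, 7), (3, 3, 4), (4, 5, 6), (5, 8, 9), (6, 11, 12)],
)

# A part g is the parent p itself or one of p's ancestors exactly when g's
# interval encloses p's interval, so no recursion is needed to exclude them.
NOT_SELF_OR_ANCESTOR = """
SELECT g.part_id
FROM part_tree AS g
JOIN part_tree AS p ON p.part_id = ?
WHERE NOT (g.lft <= p.lft AND g.rgt >= p.rgt)
ORDER BY g.part_id
"""


def allowed_children(parent_id):
    return [row[0] for row in conn.execute(NOT_SELF_OR_ANCESTOR, (parent_id,))]


print(allowed_children(3))  # [4, 5, 6]
print(allowed_children(5))  # [2, 3, 4, 6]
```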

If you can't alter the schema, then there is no getting away from looping, as the answer from Lieven suggests.
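A minimal sketch of the looping approach, assuming the question's two tables (only part_id is modeled) and hypothetical data, with SQLite via Python's sqlite3 standing in for MySQL. It issues one query per tree level while walking up from the parent, so the number of queries grows with the depth of the tree, and the SQL never has to be generated for a fixed number of levels.

```python
import sqlite3

# Hypothetical data using the question's two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sch_part_general (part_id INTEGER PRIMARY KEY);
CREATE TABLE sch_part_mapping (part_id INTEGER, parent_id INTEGER);
INSERT INTO sch_part_general (part_id) VALUES (1), (2), (3), (4), (5), (6);
-- 1 -> 2 -> 3 -> 4 and 1 -> 5 (parent -> child)
INSERT INTO sch_part_mapping (part_id, parent_id) VALUES (2, 1), (3, 2), (4, 3), (5, 1);
""")


def ancestors(part_id):
    """Walk up one level per query until no new parents turn up."""
    seen = set()
    frontier = {part_id}
    while frontier:
        marks = ", ".join("?" for _ in frontier)
        rows = conn.execute(
            f"SELECT parent_id FROM sch_part_mapping WHERE part_id IN ({marks})",
            tuple(frontier),
        )
        # Skipping ids already seen keeps the loop finite even if bad data
        # already contains a cycle.
        frontier = {row[0] for row in rows} - seen
        seen |= frontier
    return seen


def candidate_children(parent_id):
    """Parts that can become children of parent_id without creating a cycle."""
    excluded = {parent_id} | ancestors(parent_id)
    excluded |= {row[0] for row in conn.execute(
        "SELECT part_id FROM sch_part_mapping WHERE parent_id = ?", (parent_id,))}
    marks = ", ".join("?" for _ in excluded)
    rows = conn.execute(
        f"SELECT part_id FROM sch_part_general WHERE part_id NOT IN ({marks}) ORDER BY part_id",
        tuple(excluded),
    )
    return [row[0] for row in rows]


print(candidate_children(3))  # [5, 6]
```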
If you can alter the schema, then maybe the following is enough for your case:
Add a new column to sch_part_mapping; let's call it hierarchy_id. It is a unique integer generated when you start a completely new hierarchy (that is, when you add the topmost ancestor of that hierarchy), and it is stored on every row belonging to that hierarchy, no matter at what level.
Then it's easy to skip parents and grandparents found in the same hierarchy. Add the following to your SQL above:
SELECT *
FROM sch_part_general
WHERE (sch_part_general.part_id <> $parentId)
AND (sch_part_general.part_id NOT IN
(SELECT part_id FROM sch_part_mapping WHERE parent_id = $parentId)
-- addition here
and not exists (select * from sch_part_mapping where hierarchy_id= ? and parent_id = sch_part_general.part_id)
)
The question mark should be replaced with the relevant hierarchy id, which you need to calculate.
EDIT: I missed that you have a variable for a specific parent ID, so the hierarchy_id can be calculated in the same query:
SELECT *
FROM sch_part_general
WHERE (sch_part_general.part_id <> $parentId)
AND (sch_part_general.part_id NOT IN
(SELECT part_id FROM sch_part_mapping WHERE parent_id = $parentId)
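-- addition here (part_id is matched too, so a parent with no children yet still finds its hierarchy)
and not exists (select * from sch_part_mapping where hierarchy_id = (select hierarchy_id from sch_part_mapping where part_id = $parentId or parent_id = $parentId limit 1) and parent_id = sch_part_general.part_id)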
)
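The EDIT query can be tried out as below (a sketch with SQLite via Python's sqlite3 standing in for MySQL, and hypothetical data). Two deliberate deviations from the answer: the parent id is passed as a bound parameter instead of being interpolated into the SQL string, and the hierarchy lookup also matches part_id = parent. With parent_id alone, a parent that has no children yet (part 3 here) would find no hierarchy_id, and its ancestors would not be excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sch_part_general (part_id INTEGER PRIMARY KEY);
CREATE TABLE sch_part_mapping (part_id INTEGER, parent_id INTEGER, hierarchy_id INTEGER);
INSERT INTO sch_part_general (part_id) VALUES (1), (2), (3), (4), (5);
-- Hierarchy 1 is the chain 1 -> 2 -> 3 (parent -> child); parts 4 and 5 are unlinked.
INSERT INTO sch_part_mapping (part_id, parent_id, hierarchy_id) VALUES (2, 1, 1), (3, 2, 1);
""")

CANDIDATES = """
SELECT g.part_id
FROM sch_part_general AS g
WHERE g.part_id <> :parent
  AND g.part_id NOT IN (SELECT part_id FROM sch_part_mapping WHERE parent_id = :parent)
  AND NOT EXISTS (
      SELECT * FROM sch_part_mapping AS m
      WHERE m.hierarchy_id = (
          -- Also match part_id, so a parent with no children yet (a leaf)
          -- still finds the hierarchy it belongs to.
          SELECT hierarchy_id FROM sch_part_mapping
          WHERE part_id = :parent OR parent_id = :parent
          LIMIT 1)
        AND m.parent_id = g.part_id)
ORDER BY g.part_id
"""


def candidates(parent_id):
    # Bound parameters instead of interpolating $parentId into the SQL string.
    return [row[0] for row in conn.execute(CANDIDATES, {"parent": parent_id})]


print(candidates(3))  # [4, 5]
```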

With MySQL/MariaDB you can use the Open Query Graph engine, OQGRAPH (http://openquery.com/graph/doc), a MySQL plugin that lets you create a special table in which you store the relationships, basically parentId and childId.
The magic is that you query this table through a special column, latch: the value passed for it in the query tells the OQGRAPH engine which command to execute. See the docs for details.
It handles not only trees (recursive 1-n relations) but also graph data structures (recursive n-m relations) with weights (think, for example, of storing company ownership: a company can have several subsidiaries and can also have several shareholders).

Related

Update Field using a set of IDs from another table at random

I have 3 tables: model, category, and document. Documents belong to a category, which belongs to a model; models can have multiple categories, which can have multiple documents.
I'm trying to build and execute this query in PHP, assigning each document a randomly selected category from the list of categories in the currently selected model.
So, if a model has 10 categories (cat1-cat10), each with 10 documents, the end result would give those 100 documents a random_category_id from cat1 - cat10, assigned at random, without overwriting the existing category_id of each document.
Later in the application, I need to be able to check whether document.category_id == document.random_category_id.
Is there a way to do this in one query? I'm new to SQL and PHP (and haven't mastered any kind of JOIN yet), so please forgive the blunders in database design and the mixed coding approaches. I know the example below will not execute.
I'm using MySQL 5.5.28 with InnoDB.
Pseudocode Example
$catList = SELECT category_id FROM category WHERE model_id = '$current_model_id'
UPDATE document.random_category_id = RANDOM($catList) WHERE document.model_id = '$current_model_id'
Thank you!
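One way to do this in a single statement, sketched under assumptions: a correlated subquery in the UPDATE picks a random category belonging to the document's own model. The column names follow the question's pseudocode (document.model_id, document.random_category_id); the data is hypothetical, SQLite via Python's sqlite3 stands in for MySQL, and SQLite's RANDOM() stands in for MySQL's RAND(). ORDER BY RAND() evaluates a random value for every candidate category on every row, which is fine for a handful of categories per model but can get slow on large tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (category_id INTEGER PRIMARY KEY, model_id INTEGER);
CREATE TABLE document (
    document_id INTEGER PRIMARY KEY,
    model_id INTEGER,
    category_id INTEGER,
    random_category_id INTEGER
);
-- Hypothetical data: model 7 has categories 1-3, model 8 has category 4.
INSERT INTO category VALUES (1, 7), (2, 7), (3, 7), (4, 8);
""")
conn.executemany(
    "INSERT INTO document (model_id, category_id) VALUES (?, ?)",
    [(7, 1 + i % 3) for i in range(9)] + [(8, 4), (8, 4)],
)

# The subquery is correlated on document.model_id, so it is meant to be
# evaluated per document row; category_id itself is never touched.
# SQLite spells the function RANDOM(); MySQL spells it RAND().
conn.execute(
    """
    UPDATE document
    SET random_category_id = (
        SELECT c.category_id
        FROM category AS c
        WHERE c.model_id = document.model_id
        ORDER BY RANDOM()
        LIMIT 1)
    WHERE model_id = ?
    """,
    (7,),
)

# Later check from the question: how many documents drew their own category?
matches = conn.execute(
    "SELECT COUNT(*) FROM document WHERE model_id = 7 AND category_id = random_category_id"
).fetchone()[0]
print(matches)
```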

Backend app in OO PHP: Structuring classes/tables efficiently

I'm currently working on an app backend (business directory). Main "actor" is an "Entry", which will have:
- main category
- subcategory
- tags (instead of unlimited sub-levels of division)
I'm pretty new to OOP, but I still want to use it here. The database is MySQL and I'll be using PDO.
While trying to figure out which table structure to use to support this classification of entries, I considered the solution WordPress uses: establishing the relationship between an entry and its categories/subcategories/tags through several tables (terms, taxonomies, relationships). What holds me back from this solution at the moment is that every relationship of any kind is represented by a row in the relationships table. With the 50,000 entries I will have, each linked to a main category, a subcategory and up to 15 tags, wouldn't that slow down the app (or am I wrong)?
I then learned a bit about Table Data Gateway, which seemed an excellent solution because I liked the idea of having one table per class, but then I read that there is virtually no way to successfully overcome the impedance mismatch between OOP and the relational model.
Are there any other approaches that you may see fit for this situation? I think I will be going with:
tblentry
tblcategory
tblsubcategory
tbltag
structure. Relationships would be based on the parent IDs, but I'm wondering whether that is enough. Can I use foreign keys and cascading deletes here? (That is something I am not very familiar with, but it seems to me a more intuitive way of relating the elements in the tables.)
Having a table where you store the relationships between your tables is a good idea, and with indexes and careful thinking you can get very fast results.
Since each row must represent some kind of link between two entities (subcategory to main entry, tag to subcategory), you need at least (and at most) three fields:
id1 (or the unique id of the first entity)
linkid (linking to a fourth table where each link is described)
id2 (or the unique id of the second entity)
Those three fields can and should be indexed.
Now, the fourth table in this kind of many-to-many relationship describes the nature of the link. Since many different types of relationship will exist in the table, you can't keep what the type is (child of, tag of, parent of) in the same table.
That fourth table (reference) could look like this:
id | nature    | table1 | table2
1  | parent of | entry  | tags
2  | tag of    | tags   | entry
The table1 field tells you which table the first id (id1) refers to; likewise, table2 tells you the table for id2.
The id is the value stored in the linkid field of your relationship table. Only the id field needs to be indexed; the nature field is more for the human reader than for joining tables or organizing data.
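A minimal sketch of this design, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL: the relationship table holds (id1, linkid, id2), the fourth table (called link_type here, a hypothetical name) holds the rows shown above, and the entry/tag data is made up. Indexing through a composite primary key plus a reverse index is one reasonable reading of "those three fields can and should be indexed", not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
-- The "fourth table": one row per kind of link.
CREATE TABLE link_type (
    id INTEGER PRIMARY KEY,
    nature TEXT,
    table1 TEXT,
    table2 TEXT
);
-- The relationship table: one row per link between two entities.
CREATE TABLE relationship (
    id1 INTEGER NOT NULL,
    linkid INTEGER NOT NULL REFERENCES link_type(id),
    id2 INTEGER NOT NULL,
    PRIMARY KEY (id1, linkid, id2)
);
CREATE INDEX idx_relationship_reverse ON relationship (id2, linkid, id1);

INSERT INTO link_type VALUES (1, 'parent of', 'entry', 'tags'), (2, 'tag of', 'tags', 'entry');
INSERT INTO entry VALUES (1, 'Bakery'), (2, 'Garage');
INSERT INTO tags VALUES (10, 'bread'), (11, 'coffee'), (12, 'tyres');
-- 'tag of' links: id1 is a tag id, id2 is an entry id.
INSERT INTO relationship VALUES (10, 2, 1), (11, 2, 1), (12, 2, 2);
""")

# Tags of one entry: filter on the link type and the entry id, then join the tag names.
TAGS_OF_ENTRY = """
SELECT t.name
FROM relationship AS r
JOIN tags AS t ON t.id = r.id1
WHERE r.linkid = 2 AND r.id2 = ?
ORDER BY t.name
"""


def tags_of(entry_id):
    return [row[0] for row in conn.execute(TAGS_OF_ENTRY, (entry_id,))]


print(tags_of(1))  # ['bread', 'coffee']
```

With the reverse index, looking up the tags of one entry only needs the index entries for that (id2, linkid) pair, so the number of links per entry matters more than the total size of the relationship table.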

checking value in n-depth tree?

I have two entities, post and category which is a 1:n relationship.
I have a reference table with two columns, post_id,category_id
The categories table has an id column, a status column and a parent_id column
If a category is a child of another category (n-depth), then its parent_id is not null.
If a category is online, its status is 1; otherwise it is 0.
What I need to do is find out if a post is visible.
This requires:
For each category joined to the post, trace up its tree to the root node (until a category has parent_id == null); if any of those categories has status 0, that path is considered offline.
If any path is online then the post is considered visible, otherwise it is hidden.
The only way I can think of doing this (as semi-pseudo code) is:
function visible(category_ids){
categories = //select * from categories where id in(category_ids)
online = false
foreach(categories as category){
if(category.status == 0)
continue;
children = //select id from categories where parent_id = category.id
if(children)
online = visible(children)
}
return online
}
categories = //select c.id from categories c join posts_categories pc on pc.category_id = c.id where pc.post_id = post.id
post.online = visible(categories)
But that could end up being a lot of sql queries, is there a better way?
If nested sets are not an option, I know about the following:
If the data is ordered so that the children of a parent always follow their parent, you can solve this with one database query over all the data by skipping hidden nodes in the output.
This works equally well with a sorted nested set; the principle has been outlined in this answer, although the algorithms there for getting the depth do not work, and I would suggest a recursive iterator that is able to remove hidden items.
Also, if the data is not ordered, you can create a tree structure from the (unsorted) query of all rows, as outlined in the answer to "Nested array. Third level is disappearing". No recursion is needed, and you get a structure you can easily output; I have covered <ul>/<li> HTML-style output in another answer, too.
Answer to How can I convert a series of parent-child relationships into a hierarchical tree?
Answer to How to obtain a nested HTML list from object's array recordset?
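The linked answers are in PHP and rely on references; the same idea in Python, sketched with dictionaries as the shared references, could look like this. Each row is visited once, so building the tree needs no recursion (only the hypothetical render() helper recurses, to print nested <ul>/<li> markup). The (id, parent_id) rows are made-up sample data.

```python
def build_tree(rows):
    """Build a forest from (id, parent_id) rows given in any order.

    Every node is created once and stored in a dict, so a child can be attached
    to its parent even if the parent's own row has not been seen yet.
    """
    nodes = {}
    roots = []
    for node_id, parent_id in rows:
        node = nodes.setdefault(node_id, {"id": node_id, "children": []})
        if parent_id is None:
            roots.append(node)
        else:
            parent = nodes.setdefault(parent_id, {"id": parent_id, "children": []})
            parent["children"].append(node)
    return roots


def render(nodes):
    """Render the forest as nested <ul>/<li> HTML (recursion only for output)."""
    if not nodes:
        return ""
    items = "".join(f"<li>{n['id']}{render(n['children'])}</li>" for n in nodes)
    return f"<ul>{items}</ul>"


rows = [(3, 2), (1, None), (2, 1), (4, 1)]  # unsorted, hypothetical data
print(render(build_tree(rows)))
# <ul><li>1<ul><li>2<ul><li>3</li></ul></li><li>4</li></ul></li></ul>
```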
A classic database vs. memory tradeoff. What you are doing is building a tree with leaves in it. To build the tree you need to loop recursively over the leaves. Coming from a database, there are 2 scenarios:
Build the tree recursively with a query for each leaf. You hold 1 tree in memory. That is what you are doing.
Get a flat structure from the database and build the tree recursively in memory. You hold the flat tree and the real tree in memory. That is your alternative.
Which is better depends on a lot of things; to name two, your hardware (disk access vs. memory) and the size of the tree.
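For the visibility question, the second scenario could be sketched like this (SQLite via Python's sqlite3 as a stand-in for MySQL, hypothetical data, table and column names taken from the question): two queries in total, regardless of depth, and then each attached category is walked up to its root in memory. Note that it walks upward, as the question describes, rather than downward through children as the pseudocode does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, status INTEGER, parent_id INTEGER);
CREATE TABLE posts_categories (post_id INTEGER, category_id INTEGER);
-- Hypothetical tree: 1 (online) -> 2 (online); 1 -> 3 (offline) -> 4 (online)
INSERT INTO categories VALUES (1, 1, NULL), (2, 1, 1), (3, 0, 1), (4, 1, 3);
INSERT INTO posts_categories VALUES (100, 4), (101, 4), (101, 2), (102, 3);
""")


def path_online(categories, category_id):
    """Walk from a category up to its root; any status 0 makes the path offline."""
    seen = set()
    current = category_id
    while current is not None:
        if current in seen:  # defensive: a cycle in bad data counts as offline
            return False
        seen.add(current)
        status, parent_id = categories[current]
        if status == 0:
            return False
        current = parent_id
    return True


def post_visible(post_id):
    # Query 1: the whole (flat) category table, held in memory as id -> (status, parent).
    categories = {
        cid: (status, parent_id)
        for cid, status, parent_id in conn.execute("SELECT id, status, parent_id FROM categories")
    }
    # Query 2: the categories this post is attached to.
    linked = [row[0] for row in conn.execute(
        "SELECT category_id FROM posts_categories WHERE post_id = ?", (post_id,))]
    # Visible if at least one path from an attached category to the root is online.
    return any(path_online(categories, cid) for cid in linked)


print(post_visible(100), post_visible(101), post_visible(102))  # False True False
```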

php / Mysql associate tree search and item query

I have a tree of categories in my database.
I also have a table of items associated with the tree by a category id.
Now, I want to list all items in a specific category and its children and their children, etc...
For now, I proceed this way:
Retrieve the id of all concerned categories.
Make a query in the items table with a WHERE clause like this: WHERE cat_id=2 OR cat_id=10 OR ...
I think this approach makes the query very slow and very long if I have a lot of categories; a search can sometimes cover 100 categories.
Is there a better practice?
From googling "storing tree in relational database": http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
Adjacency List is simple, but not good for most complex cases.
Nested Set looks complex at first (mostly when writing), but it is much closer to a standard way of storing and reading trees in an RDBMS.
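With the Nested Set Model from the linked article, "all items in a category and its descendants" becomes a single join instead of a growing OR list. A hedged sketch with hypothetical lft/rgt values and items, using SQLite via Python's sqlite3 in place of MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER);
CREATE TABLE items (id INTEGER PRIMARY KEY, cat_id INTEGER, name TEXT);
-- Hypothetical tree: Electronics(1..8) > Phones(2..5) > Android(3..4);
--                    Electronics > Cameras(6..7); Books(9..10) is a separate root.
INSERT INTO categories VALUES
    (1, 'Electronics', 1, 8), (2, 'Phones', 2, 5), (3, 'Android', 3, 4),
    (4, 'Cameras', 6, 7), (5, 'Books', 9, 10);
INSERT INTO items (cat_id, name) VALUES (3, 'a'), (4, 'b'), (5, 'c'), (2, 'd');
""")

# Every category whose lft falls inside the chosen category's [lft, rgt] range
# belongs to its subtree, so one join replaces the long OR chain.
ITEMS_IN_SUBTREE = """
SELECT i.name
FROM categories AS p
JOIN categories AS c ON c.lft BETWEEN p.lft AND p.rgt
JOIN items AS i ON i.cat_id = c.id
WHERE p.id = ?
ORDER BY i.id
"""


def items_under(category_id):
    return [row[0] for row in conn.execute(ITEMS_IN_SUBTREE, (category_id,))]


print(items_under(2))  # ['a', 'd']
print(items_under(1))  # ['a', 'b', 'd']
```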
+1 for
EXPLAIN select * from table
That will help you see bottlenecks.
Also, instead of
column1 = 1 or column1 = 2
try something like:
column1 in (1, 2)
But without indexes it won't help anyway.
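If you keep the two-step approach, building the IN list with placeholders keeps it a single parameterized statement. A minimal sketch (SQLite via Python's sqlite3 as a stand-in for MySQL; the items table follows the question's description, and the index name and data are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, cat_id INTEGER, name TEXT);
-- Without an index on cat_id, the IN list still has to scan the whole table.
CREATE INDEX idx_items_cat_id ON items (cat_id);
INSERT INTO items (cat_id, name) VALUES (2, 'a'), (10, 'b'), (11, 'c'), (10, 'd');
""")


def items_in_categories(category_ids):
    """One query with cat_id IN (?, ?, ...) instead of a chain of ORs."""
    ids = list(category_ids)
    if not ids:
        return []  # "IN ()" is a syntax error in MySQL, so skip the query entirely
    marks = ", ".join("?" for _ in ids)
    sql = f"SELECT name FROM items WHERE cat_id IN ({marks}) ORDER BY id"
    return [row[0] for row in conn.execute(sql, ids)]


print(items_in_categories([2, 10]))  # ['a', 'b', 'd']
```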

Selecting rows from MySQL

I'm trying to create a web index. Every advertiser in my database will be able to appear in a few categories, so I've added a categorys column, in which I'll store the categories separated by commas, so it will look like:
1,3,5
The problem is that I have no idea how I'm supposed to select all of the advertisers in a certain category, like: mysql_query("SELECT * FROM advertisers WHERE category = ??");
If categories is another database table, you shouldn't use a plain-text field like that. Create a "pivot table" for the purpose, something like advertisers_categories, that links the two tables together. With that setup, you could do a query like:
SELECT A.* FROM advertisers AS A
JOIN advertisers_categories AS AC ON AC.advertiser_id = A.id
WHERE AC.category_id = 12;
The schema of advertisers_categories would look something like this:
# advertisers_categories
# --> id INT
# --> advertiser_id INT
# --> category_id INT
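A runnable sketch of that pivot-table setup, using SQLite via Python's sqlite3 as a stand-in for MySQL; the advertiser names, the category numbers and the index are hypothetical, and only the table and column names come from the answer above. The comma-separated value 1,3,5 becomes one row per category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE advertisers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE advertisers_categories (
    id INTEGER PRIMARY KEY,
    advertiser_id INTEGER NOT NULL REFERENCES advertisers(id),
    category_id INTEGER NOT NULL
);
-- Lets "all advertisers in category X" be answered from the index.
CREATE INDEX idx_ac_category ON advertisers_categories (category_id, advertiser_id);

INSERT INTO advertisers VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
-- What used to be the string '1,3,5' for Acme becomes three rows.
INSERT INTO advertisers_categories (advertiser_id, category_id) VALUES
    (1, 1), (1, 3), (1, 5), (2, 12), (3, 3), (3, 12);
""")

IN_CATEGORY = """
SELECT A.name FROM advertisers AS A
JOIN advertisers_categories AS AC ON AC.advertiser_id = A.id
WHERE AC.category_id = ?
ORDER BY A.name
"""


def advertisers_in(category_id):
    return [row[0] for row in conn.execute(IN_CATEGORY, (category_id,))]


print(advertisers_in(12))  # ['Globex', 'Initech']
```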
You should design your database differently. Take a look at atomicity.
In short: you should not store your values in the form 1,3,5.
I won't give you an answer, because if you start using it this way now, you are going to run into much more severe problems later. No offense :)
It's not practical to do this in an SQL query with comma-separated values (MySQL's FIND_IN_SET() can look up one value in such a list, but it cannot use an index). You could return every row and have a PHP script go through each row, using explode(',', $row['categorys']) and then in_array('CATEGORY', $exploded_row) to check for the existence of the category.
The more common solution is to restructure your database. You're thinking too two-dimensionally. You're looking for the Many to Many Data Model
advertisers
-----------
id
name
etc.
categories
----------
id
name
etc.
ad_cat
------
advertiser_id
category_id
So ad_cat will have at least one (usually more) entry per advertiser and at least one (usually more) entry per category, and every entry in ad_cat will link one advertiser to one category.
The SQL query then involves grabbing every row from ad_cat with the desired category_id(s) and selecting the advertisers whose id is in that result.
Your implementation as-is will make it difficult and taxing on your server's resources to do what you want.
I'd recommend creating a table that relates advertisers to categories and then querying on that table given a category id value to obtain the advertisers that are in that category.
That is a very wrong way to define categories, because such a list of values is not normalized.
Instead, define another table called CATEGORIES, and use a join table to match CATEGORIES with ADVERTISERS.
Only then you will be able to properly select it.
Hope this helps!
