I have a tree of categories in my database.
I also have a table of items associated with the tree by a category id.
Now, I want to list all items in a specific category and its children and their children, etc...
For now, I proceed this way:
Retrieve the id of all concerned categories.
Make a query in the items table with a WHERE clause like this: WHERE cat_id=2 OR cat_id=10 OR ...
I think this makes the query very slow and very long if I have a lot of categories. A search can sometimes span 100 categories.
Is there a better practice?
From a Google search for "storing tree in relational database": http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
Adjacency List is simple, but it does not handle most complex cases well.
Nested Set looks complex at first (mostly for writes), but it is much closer to a standard for storing and reading trees in an RDBMS.
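To make the Nested Set idea concrete for the items question, here is a minimal sketch. It assumes hypothetical lft/rgt columns on the categories table and uses SQLite through Python's sqlite3 module as a stand-in for MySQL; the table contents are invented for illustration.

```python
import sqlite3

# Hypothetical schema: each category stores lft/rgt bounds (nested set model).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER);
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, cat_id INTEGER);
-- Electronics(1) > Phones(2) > Smartphones(3); Electronics > Laptops(4); Books(5)
INSERT INTO categories VALUES
  (1, 'Electronics', 1, 8),
  (2, 'Phones', 2, 5),
  (3, 'Smartphones', 3, 4),
  (4, 'Laptops', 6, 7),
  (5, 'Books', 9, 10);
INSERT INTO items VALUES
  (1, 'Pixel', 3), (2, 'Landline', 2), (3, 'ThinkPad', 4), (4, 'Novel', 5);
""")


def items_in_subtree(conn, category_id):
    """Return names of items in the category and in all of its descendants."""
    rows = conn.execute(
        """
        SELECT i.name
        FROM categories AS root
        JOIN categories AS c ON c.lft BETWEEN root.lft AND root.rgt
        JOIN items AS i ON i.cat_id = c.id
        WHERE root.id = ?
        ORDER BY i.id
        """,
        (category_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

The whole subtree is matched by one range condition, so the query does not grow with the number of categories.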
+1 about
EXPLAIN select * from table
that will help you to see bottlenecks.
Also try instead of
column1 = 1 or column1 = 2
something like:
column1 in (1, 2)
But anyway, without indexes it wouldn't help.
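A rough sketch of the IN (...) plus index advice, using SQLite through Python's sqlite3 module as a stand-in for MySQL. The items table, the index name idx_items_cat_id and the helper names are assumptions for illustration; in MySQL you would run EXPLAIN instead of EXPLAIN QUERY PLAN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, cat_id INTEGER);
-- The index is what lets the lookup avoid a full table scan.
CREATE INDEX idx_items_cat_id ON items (cat_id);
INSERT INTO items (name, cat_id) VALUES ('a', 2), ('b', 10), ('c', 7), ('d', 2);
""")


def items_in_categories(conn, cat_ids):
    """Fetch item names for a list of category ids with one IN (...) clause."""
    if not cat_ids:
        return []
    # One placeholder per id keeps the values out of the SQL string itself.
    placeholders = ", ".join("?" for _ in cat_ids)
    sql = f"SELECT name FROM items WHERE cat_id IN ({placeholders}) ORDER BY id"
    return [name for (name,) in conn.execute(sql, list(cat_ids))]


def query_plan(conn, cat_ids):
    """Return the planner's description of the lookup, for inspection."""
    placeholders = ", ".join("?" for _ in cat_ids)
    sql = f"EXPLAIN QUERY PLAN SELECT name FROM items WHERE cat_id IN ({placeholders})"
    return " ".join(row[-1] for row in conn.execute(sql, list(cat_ids)))
```

Printing query_plan(conn, [2, 10]) shows whether the planner picked the index or fell back to a scan.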
I was wondering if MySQL has a way to look at a column and return each distinct value only once. For example,
if the table looks like this:
id name category
1 test Health
2 carl Health
3 bob Oscar
4 joe Technology
As you can see, there are two rows that have the same category. Is there a way to retrieve the result so that the array returns each category only once?
What I am trying to do is get all the categories in the database so I can loop through them later in the code and use them. For example, if I wanted to create a menu, I would want it to list all the categories.
I know I can run
SELECT categories FROM dbname
but this returns duplicate rows where I only need each category returned once. Is there a way to do this on the MySQL side?
I assume I can just use php's array_unique();
but I feel like this adds more overhead. Is this not something MySQL can do on the backend?
group by worked perfectly. @Fred-ii- please submit this as an answer so I can get that approved for you. – DEVPROCB
As requested by the OP:
You can use GROUP BY col_of_choice to avoid duplicates being shown in the query results.
Reference:
https://dev.mysql.com/doc/refman/5.5/en/group-by-handling.html
With database normalization, you would create another table with a unique id and the category name, and link the two tables together, like
select mytable2.* from mytable1
join mytable2 on mytable1.cat = mytable2.id
group by mytable2.id
You can of course also use group by without multiple tables, but for the structure, I recommend doing it.
You can use select distinct:
SELECT DISTINCT categories
FROM dbname ;
For various reasons, it is a good idea to have a separate reference table with one row per category. This helps in many ways:
Ensures that the category names are consistent ("Technology" versus "tech" for instance).
Gives a nice list of categories that are available.
Ensures that a category sticks around, even if no names currently reference it.
Allows for additional information about categories, such as the first time it appears, or a longer description.
This is recommended. However, if you still want to leave the category in place as it is, I would recommend an index on dbname(categories). The query should take advantage of the index.
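As a quick check of the DISTINCT and GROUP BY suggestions, here is a small sketch run against the sample rows from the question. SQLite through Python's sqlite3 module stands in for MySQL, and the index name idx_dbname_categories and the helper names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dbname (id INTEGER PRIMARY KEY, name TEXT, categories TEXT);
INSERT INTO dbname VALUES
  (1, 'test', 'Health'),
  (2, 'carl', 'Health'),
  (3, 'bob', 'Oscar'),
  (4, 'joe', 'Technology');
-- Index on the column the DISTINCT / GROUP BY runs over.
CREATE INDEX idx_dbname_categories ON dbname (categories);
""")


def distinct_categories(conn):
    """Each category once, deduplicated by the database."""
    rows = conn.execute(
        "SELECT DISTINCT categories FROM dbname ORDER BY categories"
    )
    return [category for (category,) in rows]


def category_counts(conn):
    """GROUP BY gives the same list plus a per-category row count."""
    return conn.execute(
        "SELECT categories, COUNT(*) FROM dbname "
        "GROUP BY categories ORDER BY categories"
    ).fetchall()
```

DISTINCT is enough for a menu; GROUP BY is the one to reach for when you also want an aggregate such as the count.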
SELECT categories FROM dbname GROUP BY categories
Hope this will help.
You can even use distinct category.
I have two entities, post and category which is a 1:n relationship.
I have a reference table with two columns, post_id,category_id
The categories table has an id column, a status column and a parent_id column
If a category is a child of another category (n-depth), then its parent_id is not null.
If a category is online, its status is 1; otherwise it is 0.
What I need to do is find out if a post is visible.
This requires:
For each category joined to the post, trace up its tree to the root node (until a category has parent_id == null); if any of those categories has status 0, then that path is considered offline.
If any path is online then the post is considered visible, otherwise it is hidden.
The only way I can think of doing this (as semi-pseudo code) is:
function visible(category_ids){
categories = //select * from categories where id in(category_ids)
online = false
foreach(categories as category){
if(category.status == 0)
continue;
children = //select id from categories where parent_id = category.id
if(children)
online = visible(children)
}
return online
}
categories = //select c.id from categories c join posts_categories pc on pc.category_id = c.id where pc.post_id = post.id
post.online = visible(categories)
But that could end up being a lot of SQL queries. Is there a better way?
If nested sets are not an option, I know about the following:
If the data is ordered so that the children of a parent always follow their parent, you can solve this with one database query over all the data by skipping hidden nodes in the output.
This works equally well with a sorted nested set. The principle has been outlined in this answer; however, the algorithms for getting the depth do not work, and I would suggest a recursive iterator that is able to remove hidden items.
Also, if the data is not ordered, you can create a tree structure from the (unsorted) query of all rows, as outlined in the answer to Nested array. Third level is disappearing. No recursion is needed, and you get a structure you can then easily output. I should have covered that for <ul>/<li> HTML-style output in another answer, too.
Answer to How can I convert a series of parent-child relationships into a hierarchical tree?
Answer to How to obtain a nested HTML list from object's array recordset?
A classic database vs. memory tradeoff. What you are doing is building a tree with leaves in it. To build the tree you need to loop over the leaves recursively. Coming from a database, there are two scenarios:
Build the tree recursively with a query for each leaf. You hold one tree in memory. That is what you are doing.
Get a flat structure from the database and build the tree recursively in memory. You hold a flat tree and the real tree in memory. That is your alternative.
Which is better depends on a lot of things: your hardware (disk access vs. memory) and the size of the tree, to name two.
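The second scenario could look roughly like the sketch below: load every category once, then walk up the parent chain in memory. SQLite through Python's sqlite3 module stands in for MySQL; the table layout follows the question, while the sample rows and function names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, status INTEGER, parent_id INTEGER);
CREATE TABLE posts_categories (post_id INTEGER, category_id INTEGER);
-- 1 (online) > 2 (online) > 5 (online);  3 (offline) > 4 (online)
INSERT INTO categories VALUES (1, 1, NULL), (2, 1, 1), (3, 0, NULL), (4, 1, 3), (5, 1, 2);
INSERT INTO posts_categories VALUES (10, 5), (10, 4), (20, 4);
""")


def load_categories(conn):
    """One query: map category id -> (status, parent_id)."""
    query = "SELECT id, status, parent_id FROM categories"
    return {cid: (status, parent_id) for cid, status, parent_id in conn.execute(query)}


def path_online(categories, category_id):
    """Walk from a category up to its root; any offline ancestor breaks the path."""
    seen = set()
    while category_id is not None:
        # Treat unknown ids and accidental cycles as offline instead of looping.
        if category_id in seen or category_id not in categories:
            return False
        seen.add(category_id)
        status, parent_id = categories[category_id]
        if status == 0:
            return False
        category_id = parent_id
    return True


def post_visible(conn, categories, post_id):
    """A post is visible if at least one of its category paths is fully online."""
    rows = conn.execute(
        "SELECT category_id FROM posts_categories WHERE post_id = ?", (post_id,)
    )
    return any(path_online(categories, cid) for (cid,) in rows)
```

That is two queries per post (or one, if the category map is cached), at the cost of holding the whole category table in memory.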
I am programming in PHP / MySQL / Javascript.
I have a list of parts which we want to link in a child / parent relationship with no limit on the amount of tiers.
When I am picking from a list of parts to add a child to a parent I limit the list of parts to exclude the parent itself, and any parts which are already children of that parent.
What I have discovered is that I also want to exclude the grandparents of the parent as otherwise we can get an incestuous relationship, which when I display the tree of parts will create an infinite loop.
Not only that, but I cannot allow the child part to be a great-grandparent of the parent, or a great-great-grandparent, etc.
Here is the SQL statement I use currently which I think could also be improved by using LEFT JOIN but I am not skillful enough with SQL at this point.
SELECT *
FROM sch_part_general
WHERE (sch_part_general.part_id <> $parentId)
AND (sch_part_general.part_id NOT IN
(SELECT part_id FROM sch_part_mapping WHERE parent_id = $parentId)
)
sch_part_general is a multi column table with all the parts, with part_id as the primary key.
sch_part_mapping is a two column mapping table with part_id (child) || parent_id (parent).
Could someone point me in the right direction with the SQL query? I am not keen on using a while loop to create the SQL statement as I think this will be quite inefficient but it is the only way I have considered might work so far.
MySQL before 8.0 doesn't have much (if any) support for hierarchical queries; 8.0 added recursive common table expressions (WITH RECURSIVE). Without them, if you want to stick to what is called the Adjacency List Model, all you can do is add a JOIN for each level you'd like to include. Needless to say, this doesn't scale well.
On the other hand, if you can alter your Database Schema, I would suggest implementing the Nested Set Model.
A very good explanation of the Nested Set Model is presented in Mike Hillyer's blog
Limitations of the Adjacency List Model
Working with the adjacency list model in pure SQL can be difficult at
best. Before being able to see the full path of a category we have to
know the level at which it resides.
Nested Set Model
the concept of nested sets in SQL has been around for over a decade,
and there is a lot of additional information available in books and on
the Internet. In my opinion the most comprehensive source of
information on managing hierarchical information is a book called Joe
Celko’s Trees and Hierarchies in SQL for Smarties, written by a very
respected author in the field of advanced SQL, Joe Celko.
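For the parts question itself, where recursive common table expressions are available (MySQL 8.0 and later), the ancestors to exclude can be collected without changing the schema. A minimal sketch, run on SQLite through Python's sqlite3 module as a stand-in; the table names come from the question, while the sample rows and the function name addable_children are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sch_part_general (part_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sch_part_mapping (part_id INTEGER, parent_id INTEGER);
INSERT INTO sch_part_general VALUES
  (1, 'frame'), (2, 'axle'), (3, 'hub'), (4, 'bolt'), (5, 'nut'), (6, 'washer');
-- Chain: 1 > 2 > 3 > 4 (each row maps a child part_id to its parent_id)
INSERT INTO sch_part_mapping VALUES (2, 1), (3, 2), (4, 3);
""")


def addable_children(conn, parent_id):
    """Parts that may become a child of parent_id: not the parent itself,
    not one of its existing children, and not one of its ancestors."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestors(part_id) AS (
            SELECT parent_id FROM sch_part_mapping WHERE part_id = :parent
            UNION  -- UNION (not UNION ALL) drops repeats, so a stray cycle still terminates
            SELECT m.parent_id
            FROM sch_part_mapping AS m
            JOIN ancestors AS a ON m.part_id = a.part_id
        )
        SELECT g.part_id
        FROM sch_part_general AS g
        WHERE g.part_id <> :parent
          AND g.part_id NOT IN (SELECT part_id FROM sch_part_mapping WHERE parent_id = :parent)
          AND g.part_id NOT IN (SELECT part_id FROM ancestors)
        ORDER BY g.part_id
        """,
        {"parent": parent_id},
    ).fetchall()
    return [part_id for (part_id,) in rows]
```

For part 3 in this sample, the parent (3), its child (4) and its ancestors (2 and 1) are all filtered out by the one statement.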
If you can't alter the schema, then there is no getting away from looping, as the answer from Lieven suggests.
If you can alter the schema, then maybe the following is enough for your case:
Add a new column to sch_part_mapping; let's call it "hierarchy_id". It is a unique int generated the first time you start a totally new hierarchy (with the topmost ancestor of that hierarchy), and it is inserted into all rows belonging to that hierarchy, no matter at what level.
Then it's easy to skip parents and grandparents found in the same hierarchy. To your SQL above you can add:
SELECT *
FROM sch_part_general
WHERE (sch_part_general.part_id <> $parentId)
AND (sch_part_general.part_id NOT IN
(SELECT part_id FROM sch_part_mapping WHERE parent_id = $parentId)
-- addition here
and not exists (select * from sch_part_mapping where hierarchy_id= ? and parent_id = sch_part_general.part_id)
)
The question mark should be replaced with the relevant hierarchy id, which you need to calculate.
EDIT: I missed that you have a variable for a specific parent ID; therefore the hierarchy_id can be calculated in the same query:
SELECT *
FROM sch_part_general
WHERE (sch_part_general.part_id <> $parentId)
AND (sch_part_general.part_id NOT IN
(SELECT part_id FROM sch_part_mapping WHERE parent_id = $parentId)
-- addition here
and not exists (select * from sch_part_mapping where hierarchy_id= (select hierarchy_id from sch_part_mapping where parent_id = $parentId limit 1) and parent_id = sch_part_general.part_id)
)
With MySQL/MariaDB you can use the Open Query Graph (OQGRAPH) engine (http://openquery.com/graph/doc), a MySQL plugin that lets you create a special table where you put the relationships, basically parentId and childId.
The magic is that you query this table through a special column, latch: the value passed for it in the query tells the OQGRAPH engine which command to execute. See the docs for details.
It handles not only trees (recursive 1-n relations) but also graph data structures (recursive n-m relations) with weights (think, for example, of storing company ownership: a company can have several subsidiaries and can also have several shareholders).
Whew, the title is a mouthful. Once again I find myself not knowing exactly how to ask this question so I will use an example. Let's say you're making a game that has items. Items have effects, bonuses and requirements.
So each record in the items table has multiple child records in the effects table, the bonuses table and the requirements table.
And you want to select, let's say, the 100 most recent items, including all of their effects, bonuses and requirements to display in the game.
What is the most optimized way to accomplish this? Can it be done in one query? And is that even practical? Thanks.
It could be achieved in one query, but it would be quite large and there would be a lot of doubling up. The only really ideal time to do something like this in one query is when there is a "has one" relationship, i.e. an item has one effect, which can be done with a simple join, and each item returns only one row.
Take a simple case of what you've given. You have 2 items, each with 3 effects with a many to many relationship.
SELECT item.*, effect.*
FROM item
JOIN item_effect ON item.id = item_effect.item_id
JOIN effect ON effect.id = item_effect.effect_id
The return could be:
item1 effect1
item1 effect2
item1 effect3
item2 effect2
item2 effect3
item2 effect4
Then you would have to loop through and group all of the items back together. With relationships to requirements and bonuses added, the query would get larger, but still fairly organized.
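The regrouping step is short in application code. A minimal sketch in Python, using the sample result rows above as plain tuples; the function name group_effects is made up for illustration.

```python
# The flat (item, effect) rows a join like the one above would return.
rows = [
    ("item1", "effect1"),
    ("item1", "effect2"),
    ("item1", "effect3"),
    ("item2", "effect2"),
    ("item2", "effect3"),
    ("item2", "effect4"),
]


def group_effects(rows):
    """Collapse flat join rows into one list of effects per item."""
    grouped = {}
    for item, effect in rows:
        # setdefault creates the list the first time an item is seen.
        grouped.setdefault(item, []).append(effect)
    return grouped
```

With three joined child tables the same loop works, but every item's columns are repeated once per combination of child rows, which is the doubling up mentioned earlier.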
You could use some type of ORM (Object Relational Mapping), which could make your code more readable, e.g. using syntax from Kohana's ORM:
$items = ORM::factory('item')->find_all();
foreach($items as $item) {
$effects = $item->effects->find_all();
$bonuses = $item->bonuses->find_all();
$requirements = $item->requirements->find_all();
}
But for the 100-item example you suggested, that will be 301 queries (1 for the items plus 3 per item).
If you are displaying this on a web page, then pagination (showing 1-20 of 100) will lower that number.
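A middle ground between one huge join and 301 queries is one query for the items plus one IN (...) query per child table, so four in total. The sketch below assumes each child table carries an item_id column, that "most recent" means highest id, and that a label column exists; all of these are assumptions, and SQLite through Python's sqlite3 module stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE effects (id INTEGER PRIMARY KEY, item_id INTEGER, label TEXT);
CREATE TABLE bonuses (id INTEGER PRIMARY KEY, item_id INTEGER, label TEXT);
CREATE TABLE requirements (id INTEGER PRIMARY KEY, item_id INTEGER, label TEXT);
INSERT INTO items VALUES (1, 'sword'), (2, 'shield');
INSERT INTO effects VALUES (1, 1, 'burn'), (2, 1, 'bleed'), (3, 2, 'block');
INSERT INTO bonuses VALUES (1, 1, '+5 str');
INSERT INTO requirements VALUES (1, 2, 'level 10');
""")

# Fixed whitelist of child tables; never build table names from user input.
CHILD_TABLES = ("effects", "bonuses", "requirements")


def load_items(conn, limit=100):
    """Load the newest items and their children in 1 + len(CHILD_TABLES) queries."""
    items = {
        item_id: {"name": name, **{table: [] for table in CHILD_TABLES}}
        for item_id, name in conn.execute(
            "SELECT id, name FROM items ORDER BY id DESC LIMIT ?", (limit,)
        )
    }
    if not items:
        return items
    placeholders = ", ".join("?" for _ in items)
    for table in CHILD_TABLES:
        sql = (
            f"SELECT item_id, label FROM {table} "
            f"WHERE item_id IN ({placeholders}) ORDER BY id"
        )
        # Attach each child row to its parent item in memory.
        for item_id, label in conn.execute(sql, list(items)):
            items[item_id][table].append(label)
    return items
```

The query count stays at four whether the page shows 20 items or 100.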
The method you use really depends on your situation. Things to consider:
How often will this be used
Do they really need to see 100 items at once
Do they need to see all the relationships at once (click an item to view its effects, etc.)
You should be able to do something like this...
SELECT `users`.`nickname` , `events`.`nickname`
FROM `users` , `events`
WHERE `events`.`aid` = `users`.`aid`
GROUP BY `events`.`nickname`
To clarify, events.aid is the unique ID of the user. So when I fetch all these records and group them by event, I get a list of all unique event nicknames and the users that created them.
I'm trying to create a web index. Every advertiser in my database will be able to appear on a few categories, so I've added a categorys column, and in that column I'll store the categories separated by "," so it will look like:
1,3,5
The problem is that I have no idea how I'm supposed to select all of the advertisers in a certain category, like: mysql_query("SELECT * FROM advertisers WHERE category = ??");
If categories is another database table, you shouldn't use a plain-text field like that. Create a "pivot table" for the purpose, something like advertisers_categories that links the two tables together. With that setup, you could do a query like:
SELECT A.* FROM advertisers AS A
JOIN advertisers_categories AS AC ON AC.advertiser_id = A.id
WHERE AC.category_id = 12;
The schema of advertisers_categories would look something like this:
# advertisers_categories
# --> id INT
# --> advertiser_id INT
# --> category_id INT
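Here is that pivot-table layout as a runnable sketch, with SQLite through Python's sqlite3 module standing in for MySQL. The sample advertisers, the UNIQUE constraint and the function name advertisers_in_category are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE advertisers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE advertisers_categories (
    id INTEGER PRIMARY KEY,
    advertiser_id INTEGER NOT NULL REFERENCES advertisers (id),
    category_id INTEGER NOT NULL REFERENCES categories (id),
    UNIQUE (advertiser_id, category_id)  -- one link per pair
);
INSERT INTO advertisers VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO categories VALUES (1, 'Food'), (3, 'Travel'), (5, 'Tech');
-- Acme was stored as "1,3,5" before; now it is three rows.
INSERT INTO advertisers_categories (advertiser_id, category_id)
VALUES (1, 1), (1, 3), (1, 5), (2, 3);
""")


def advertisers_in_category(conn, category_id):
    """Names of all advertisers linked to the given category."""
    rows = conn.execute(
        """
        SELECT A.name
        FROM advertisers AS A
        JOIN advertisers_categories AS AC ON AC.advertiser_id = A.id
        WHERE AC.category_id = ?
        ORDER BY A.id
        """,
        (category_id,),
    )
    return [name for (name,) in rows]
```

Adding or removing a category for an advertiser becomes a single-row INSERT or DELETE instead of string editing.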
You should design your database in another way. Take a look at Atomicity.
Short: You should not store your value in the form of 1,3,5.
I won't give you an answer because if you start using it this way now, you are going to run into much more severe problems later. No offense :)
With comma-separated values this is hard to do well in SQL alone: MySQL's FIND_IN_SET() can match one value in such a list, but the query cannot use an index on the column. You could return every row and have a PHP script go through each one, using explode(',', $row) and then in_array('CATEGORY', $exploded_row) to check for the existence of the category.
The more common solution is to restructure your database. You're thinking too two-dimensionally. You're looking for the Many to Many Data Model
advertisers
-----------
id
name
etc.
categories
----------
id
name
etc.
ad_cat
------
advertiser_id
category_id
So ad_cat will have at least one (usually more) entry per advertiser and at least one (usually more) entry per category, and every entry in ad_cat will link one advertiser to one category.
The SQL query then involves grabbing every line from ad_cat with the desired category_id(s) and searching for an advertiser whose id is in the resulting query's output.
Your implementation as-is will make it difficult and taxing on your server's resources to do what you want.
I'd recommend creating a table that relates advertisers to categories and then querying on that table given a category id value to obtain the advertisers that are in that category.
That is a very wrong way to define categories, because your array of values cannot be normalized.
Instead, define another table called CATEGORIES, and use a JOIN-table to match CATEGORIES with ADVERTIZERS.
Only then will you be able to select it properly.
Hope this helps!