I am making a simple to-do app (Laravel 4/MySQL), and it needs to support tasks and subtasks (limited to a maximum of 3 levels).
I was looking at Nested Set implementations for Laravel here and here. Are they overkill for my requirements?
I'm guessing nested sets store hierarchy data globally (as opposed to, say, per user or per project) and are better suited to items like a multilevel menu with a limited number of items.
What is the best implementation for my case, where hundreds of users would have many projects, each with hundreds of multilevel tasks/subtasks? Would there be unnecessary traversals/overhead if I implement nested sets for my case?
I recommend reading the article "Managing Hierarchical Data in MySQL".
Briefly, with the adjacency list model (each row stores the id of its parent):
CREATE TABLE category(
category_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(20) NOT NULL,
parent INT DEFAULT NULL
);
INSERT INTO category VALUES(1,'ELECTRONICS',NULL),(2,'TELEVISIONS',1),(3,'TUBE',2),
(4,'LCD',2),(5,'PLASMA',2),(6,'PORTABLE ELECTRONICS',1),(7,'MP3 PLAYERS',6),(8,'FLASH',7),
(9,'CD PLAYERS',6),(10,'2 WAY RADIOS',6);
SELECT * FROM category ORDER BY category_id;
+-------------+----------------------+--------+
| category_id | name | parent |
+-------------+----------------------+--------+
| 1 | ELECTRONICS | NULL |
| 2 | TELEVISIONS | 1 |
| 3 | TUBE | 2 |
| 4 | LCD | 2 |
| 5 | PLASMA | 2 |
| 6 | PORTABLE ELECTRONICS | 1 |
| 7 | MP3 PLAYERS | 6 |
| 8 | FLASH | 7 |
| 9 | CD PLAYERS | 6 |
| 10 | 2 WAY RADIOS | 6 |
+-------------+----------------------+--------+
This query retrieves the whole tree (one self-join per level, four levels here):
SELECT t1.name AS lev1, t2.name AS lev2, t3.name AS lev3, t4.name AS lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name = 'ELECTRONICS';
Retrieving only the leaf nodes:
SELECT t1.name FROM
category AS t1 LEFT JOIN category as t2
ON t1.category_id = t2.parent
WHERE t2.category_id IS NULL;
Retrieving a single path (FLASH is on the fourth level):
SELECT t1.name AS lev1, t2.name AS lev2, t3.name AS lev3, t4.name AS lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name = 'ELECTRONICS' AND t4.name = 'FLASH';
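A minimal sketch for trying these queries without a MySQL server: it loads the sample data into an in-memory SQLite database through Python's built-in sqlite3 module (SQLite is only a stand-in here, not something the article uses). The self-joins are plain SQL, so they should carry over to MySQL unchanged.

```python
import sqlite3

# Sample data from the article: each row stores the id of its parent (adjacency list).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT NOT NULL, parent INTEGER)")
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?)",
    [(1, "ELECTRONICS", None), (2, "TELEVISIONS", 1), (3, "TUBE", 2), (4, "LCD", 2),
     (5, "PLASMA", 2), (6, "PORTABLE ELECTRONICS", 1), (7, "MP3 PLAYERS", 6),
     (8, "FLASH", 7), (9, "CD PLAYERS", 6), (10, "2 WAY RADIOS", 6)],
)

# One self-join per level: this only works for trees at most four levels deep.
FOUR_LEVELS = """
    SELECT t1.name, t2.name, t3.name, t4.name
    FROM category AS t1
    LEFT JOIN category AS t2 ON t2.parent = t1.category_id
    LEFT JOIN category AS t3 ON t3.parent = t2.category_id
    LEFT JOIN category AS t4 ON t4.parent = t3.category_id
    WHERE t1.name = 'ELECTRONICS'
"""
full_tree = conn.execute(FOUR_LEVELS).fetchall()  # one row per root-to-leaf path

# Leaves are the rows that no other row points to as its parent.
leaves = sorted(name for (name,) in conn.execute("""
    SELECT t1.name FROM category AS t1
    LEFT JOIN category AS t2 ON t1.category_id = t2.parent
    WHERE t2.category_id IS NULL
"""))

# A single path: restrict the four-level join to the row ending in FLASH.
path = conn.execute(FOUR_LEVELS + " AND t4.name = 'FLASH'").fetchone()

print(len(full_tree), leaves, path)
```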
My answer would be a closure table.
I read about a few ways to solve hierarchies in the book SQL Antipatterns (https://books.google.com/books/about/SQL_Antipatterns.html?id=Ghr4RAAACAAJ). I definitely recommend reading the book.
My favorite way to implement hierarchies is via closure tables. This is a great source that explains them in depth: http://technobytz.com/closure_table_store_hierarchical_data.html.
To summarize: make one table that keeps track of the actual items in the hierarchy (e.g. task_id, task_description, time_opened, etc.) and another table to track the relations, with columns such as task_id and parent_task_id. The key trick is to record every ancestor/descendant relation, not just the direct parent-child ones. So if Task 1 has a child task, Task 2, and Task 2 has a child task, Task 3, store the relation between Task 1 and Task 2 and between Task 2 and Task 3, and also the one between Task 1 and Task 3.
The trade-off between closure tables and nested sets is that closure tables use more storage but need less computation for operations. This is because you store a row for every ancestor/descendant pair (which takes space), and having all of these relationships readily available makes it faster for the RDBMS to answer questions about them.
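A minimal sketch of the idea, using Python's sqlite3 module as a stand-in for MySQL. The table and column names (task_path, ancestor_id, descendant_id, depth) and the add_task helper are hypothetical; the depth column and the "each task is its own ancestor at depth 0" row are a common closure-table convention, not something the answer prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task (task_id INTEGER PRIMARY KEY, description TEXT NOT NULL);
    -- One row per ancestor/descendant pair, plus each task paired with itself (depth 0).
    CREATE TABLE task_path (
        ancestor_id   INTEGER NOT NULL REFERENCES task(task_id),
        descendant_id INTEGER NOT NULL REFERENCES task(task_id),
        depth         INTEGER NOT NULL,
        PRIMARY KEY (ancestor_id, descendant_id)
    );
""")

def add_task(task_id, description, parent_id=None):
    """Insert a task together with all of its ancestor relations."""
    conn.execute("INSERT INTO task VALUES (?, ?)", (task_id, description))
    conn.execute("INSERT INTO task_path VALUES (?, ?, 0)", (task_id, task_id))
    if parent_id is not None:
        # Every ancestor of the parent (including the parent itself) is an ancestor of the new task.
        conn.execute("""
            INSERT INTO task_path (ancestor_id, descendant_id, depth)
            SELECT ancestor_id, ?, depth + 1 FROM task_path WHERE descendant_id = ?
        """, (task_id, parent_id))

add_task(1, "Task 1")
add_task(2, "Task 2", parent_id=1)
add_task(3, "Task 3", parent_id=2)

# The whole subtree of Task 1 comes back from one non-recursive query.
subtree = [row[0] for row in conn.execute(
    "SELECT descendant_id FROM task_path WHERE ancestor_id = 1 AND depth > 0 ORDER BY depth")]
pairs = sorted(conn.execute("SELECT ancestor_id, descendant_id FROM task_path WHERE depth > 0"))
print(subtree, pairs)
```

The extra (1, 3) row is exactly the "Task 1 to Task 3" relation the answer recommends storing; it is what lets the subtree query avoid recursion.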
Hope this helps!
I have two tables:
1st: reasons
id | title
---------------------------------
1 | Customer didn't like it
2 | Needs improving
3 | Wrong format
2nd: projects
id | title | rejected
------------------------------------
1 | Project 1 | Null
2 | Project 2 | 1
3 | Project 3 | 1
4 | Project 4 | Null
5 | Project 5 | 2
I need to display reasons.title and the number of projects rejected for that reason. I've managed to join the tables with this code:
SELECT reasons.title as title, count(*) as num
FROM reasons
LEFT JOIN reasons on projects.rejected = reasons.id
WHERE projects.rejected IS NOT NULL
GROUP BY projects.rejected
Now I need to add a percentage, so that my final table looks like this:
title | num | percentage
--------------------------------------------------
Customer didn't like it | 2 | 66,6
Needs improving | 1 | 33,3
The format of percentage is of course not important.
I would like to do this in MySQL so that I don't need two queries and extra PHP, but if there is a solution other than MySQL, I'm open to suggestions.
You can do this by getting the total in the FROM clause:
SELECT r.title as title, count(*) as num,
100 * COUNT(*) / pp.cnt as percentage
FROM reasons r JOIN
projects p
ON p.rejected = r.id CROSS JOIN
(SELECT COUNT(*) as cnt FROM projects p WHERE rejected IS NOT NULL) pp
GROUP BY r.title, pp.cnt;
Notes:
This fixes the table names, so the query has a projects table.
This removes the WHERE because it is not needed.
This changes the LEFT JOIN to an inner join.
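To check the "total in the FROM clause" approach against the sample data, here is a minimal sketch that runs the same query shape in SQLite via Python's sqlite3 module (a stand-in for MySQL). One difference to keep in mind: SQLite's / truncates when both operands are integers, so the sketch multiplies by 100.0 first; MySQL's / already returns a decimal. The ROUND and ORDER BY are additions for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reasons (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE projects (id INTEGER PRIMARY KEY, title TEXT, rejected INTEGER);
    INSERT INTO reasons VALUES (1, 'Customer didn''t like it'), (2, 'Needs improving'), (3, 'Wrong format');
    INSERT INTO projects VALUES (1, 'Project 1', NULL), (2, 'Project 2', 1), (3, 'Project 3', 1),
                                (4, 'Project 4', NULL), (5, 'Project 5', 2);
""")

# A one-row derived table (pp) holds the total number of rejected projects,
# so each group can divide its own count by that total in a single query.
rows = conn.execute("""
    SELECT r.title, COUNT(*) AS num, ROUND(100.0 * COUNT(*) / pp.cnt, 1) AS percentage
    FROM reasons r
    JOIN projects p ON p.rejected = r.id
    CROSS JOIN (SELECT COUNT(*) AS cnt FROM projects WHERE rejected IS NOT NULL) pp
    GROUP BY r.title, pp.cnt
    ORDER BY num DESC
""").fetchall()
print(rows)
```

"Wrong format" does not appear because the inner join drops reasons with no rejected projects, which matches the desired output in the question.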
I'm migrating a database between two systems using PHP and MySQL.
In the old one I have 3 tables of interest:
t1
id (int)
...
t2
id (int)
t1_id (int)
d (string)
...
t3
id (int)
t1_id (int)
ds (string)
e (int)
...
In the new one I have only t1 and t2
t2.d can have e.g. "abc" or "def"
t3.ds can have "abc" or "def" or "abc, def"
I have created the following query:
SELECT
t2...,
t3.e
FROM t2
LEFT JOIN t3
ON t2.id = t3.id
AND t3.ds LIKE CONCAT("%", t2.d, "%")
WHERE t2.id = ?
The query works, but I am worried about the performance of this JOIN when there are lots of entries (to migrate, I iterate over each entry in t1, and each one has multiple entries in t2 and t3).
So, back to the question: is it worth joining them like that, or should I use a different approach, such as a separate query or manipulating the data in PHP?
Here's the MySQL EXPLAIN output in case it's relevant (unfortunately it doesn't mean much to me, so I'd appreciate any help):
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | modi | ref | PRIMARY,order_number | order_number | 4 | const | 3 | Using temporary; Using filesort
1 | SIMPLE | ai | ref | detail_number | detail_number | 4 | max.modi.detail_number | 1 | NULL
1 | SIMPLE | edi | ALL | NULL | NULL | NULL | NULL | 26389 | Using where; Using join buffer (Block Nested Loop)
If you are concerned about performance, then do not store lists in a string. You should have a junction table, with one row per element of the list. In other words, 'abc, def' is a no-no. Another table with two rows, one for 'abc' and another for 'def' is the way to go.
Because your data structure is not optimized for SQL, there is little you can do from a performance perspective. The LIKE is probably about as good as you can do.
If you have proper indexes on the t2 and t3 tables, then there is no issue. Note that with the LEFT JOIN, duplicate entries won't be inserted again and again.
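A minimal sketch of the junction-table design, run in SQLite via Python's sqlite3 module as a stand-in for MySQL. The table t3_ds, its index, and the sample rows are hypothetical; it also assumes t2 and t3 are related through t1_id (the question's query joins on id, which may be specific to the real schema). The comma-separated string is split once during migration, after which the LIKE '%...%' becomes an equality join that an index can serve. For brevity it uses inner joins rather than the question's LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, t1_id INTEGER, d TEXT);
    CREATE TABLE t3 (id INTEGER PRIMARY KEY, t1_id INTEGER, e INTEGER);
    -- Hypothetical junction table: one row per element of the old comma-separated t3.ds.
    CREATE TABLE t3_ds (t3_id INTEGER NOT NULL REFERENCES t3(id), d TEXT NOT NULL,
                        PRIMARY KEY (t3_id, d));
    CREATE INDEX t3_ds_by_value ON t3_ds (d, t3_id);
""")

# Split the legacy 'abc, def' strings once, while migrating.
legacy_t3 = [(1, 10, "abc, def", 5), (2, 10, "def", 7)]  # (id, t1_id, ds, e)
for t3_id, t1_id, ds, e in legacy_t3:
    conn.execute("INSERT INTO t3 VALUES (?, ?, ?)", (t3_id, t1_id, e))
    conn.executemany("INSERT INTO t3_ds VALUES (?, ?)",
                     [(t3_id, part.strip()) for part in ds.split(",")])
conn.executemany("INSERT INTO t2 VALUES (?, ?, ?)", [(1, 10, "abc"), (2, 10, "def")])

# Equality on t3_ds.d replaces LIKE CONCAT('%', t2.d, '%').
rows = conn.execute("""
    SELECT t2.id, t2.d, t3.e
    FROM t2
    JOIN t3_ds ON t3_ds.d = t2.d
    JOIN t3 ON t3.id = t3_ds.t3_id AND t3.t1_id = t2.t1_id
    ORDER BY t2.id, t3.e
""").fetchall()
print(rows)
```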
Let's assume we have a database like this:
Project_tbl:
-----------------
id | Project_name
-----------------
1 | A
2 | B
3 | C
-----------------
personel_project_tbl:
--------------------
user_id | Project_id
--------------------
1 | 1
2 | 2
3 | 1
3 | 2
2 | 3
--------------------
instrument_project_tbl:
--------------------------
instrument_id | Project_id
--------------------------
1 | 1
1 | 2
2 | 2
2 | 1
1 | 3
--------------------------
Now I need to sort the list of projects and rank them by their similarity to project A.
For example:
A and B have 1 user in common out of A's 2 users, and 2 instruments in common out of A's 2 instruments, so their similarity is (1/2 + 2/2) / 2 = 75%
A and C have no users in common but have 1 of A's 2 instruments, so it will be (0 + 1/2) / 2 = 25%
So B is more similar to A than C is, and the output should be:
--------------
Project | Rank
--------------
2 | 75
3 | 25
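The ranking described in the example can be written as a formula. This is a sketch in notation introduced here (not from the question): T is the set of link tables (users and instruments in the example), and S_t(X) is the set of items linked to project X in table t. Each table contributes the fraction of A's items that project P shares, and the contributions are averaged.

```latex
\begin{aligned}
\operatorname{sim}(A, P) &= \frac{1}{|T|} \sum_{t \in T} \frac{|S_t(A) \cap S_t(P)|}{|S_t(A)|} \\
\operatorname{sim}(A, B) &= \tfrac{1}{2}\left(\tfrac{1}{2} + \tfrac{2}{2}\right) = 0.75 \\
\operatorname{sim}(A, C) &= \tfrac{1}{2}\left(\tfrac{0}{2} + \tfrac{1}{2}\right) = 0.25
\end{aligned}
```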
That's the first solution that came to my mind...
If I did it in PHP and MySQL, it would be something like:
for all tables as table_x
for all projects (except A) as prj_y
unique = (SELECT COUNT(DISTINCT item) FROM table_x WHERE project = A)
count += (SELECT COUNT(DISTINCT item) FROM table_x
WHERE project = prj_y AND item IN
(SELECT DISTINCT item FROM table_x WHERE project = A)
)/unique
So the complexity would be O(n^2), and even with indexes each SELECT would cost O(log n), which wouldn't be affordable.
Do you have any idea to do it totally in MySQL or do it in a better and faster way?
More information and notes:
I'm limited to PHP and MySQL.
This is just an example; in my real project there are more than 20 tables, so the solution needs to perform well.
This question is a follow-up to this one: "Get the most repeated similar fields in MySQL database". If your solution can be applied to both of them in some way, that would be great.
I want to multiply the value of related projects by the similarity of items to get the best option...
In short, together these two questions should: get the most related projects, get the similar items of all projects, and find the most similar item for the current project among projects that are also similar to the current one.
Thanks for your answers; I'd really appreciate it if you could shed some light on this.
You could do it this way:
SET @Aid = (SELECT id
FROM Project_tbl
WHERE Project_name = 'A');
SELECT P.id
, (IFNULL(personel.prop, 0) +
IFNULL(instrument.prop, 0)
)/2*100 AS `Rank`
, personel.prop AS personell
, instrument.prop AS instrument
FROM Project_tbl P
LEFT JOIN
( SELECT B.Project_id pid, COUNT(*)/C.ref prop
FROM personel_project_tbl A,
personel_project_tbl B,
(SELECT COUNT(*) AS ref
FROM personel_project_tbl
WHERE Project_id = @Aid
) AS C
WHERE A.user_id = B.user_id
AND A.Project_id = @Aid
GROUP BY B.Project_id, C.ref
) personel ON P.id = personel.pid
LEFT JOIN
( SELECT B.Project_id pid, COUNT(*)/C.ref prop
FROM instrument_project_tbl A,
instrument_project_tbl B,
(SELECT COUNT(*) AS ref
FROM instrument_project_tbl
WHERE Project_id = @Aid
) AS C
WHERE A.instrument_id = B.instrument_id
AND A.Project_id = @Aid
GROUP BY B.Project_id, C.ref
) instrument ON P.id = instrument.pid
WHERE P.id <> @Aid
ORDER BY `Rank` DESC;
The idea is to have one subquery for each table, and each of these subqueries maps project id to correspondence ratio for a given table.
I'm saying nothing at all about performance. You'll have to try it and see whether it is fast enough for your needs, but as I see it there is no way to beat the O(n^2) complexity you mention, as you have to inspect all the data.
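To check the answer's numbers on the sample data, here is a minimal sketch that runs the same per-table subquery structure in SQLite via Python's sqlite3 module (a stand-in for MySQL). SQLite has no SET @var, so the project id is passed as a named parameter :aid; the helper overlap_subquery and the column name similarity are hypothetical, and COUNT(*) * 1.0 avoids SQLite's integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Project_tbl (id INTEGER PRIMARY KEY, Project_name TEXT);
    CREATE TABLE personel_project_tbl (user_id INTEGER, Project_id INTEGER);
    CREATE TABLE instrument_project_tbl (instrument_id INTEGER, Project_id INTEGER);
    INSERT INTO Project_tbl VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO personel_project_tbl VALUES (1, 1), (2, 2), (3, 1), (3, 2), (2, 3);
    INSERT INTO instrument_project_tbl VALUES (1, 1), (1, 2), (2, 2), (2, 1), (1, 3);
""")

def overlap_subquery(table, item):
    """Per-table subquery mapping project id -> share of project :aid's items it also has."""
    return f"""
        SELECT B.Project_id AS pid, COUNT(*) * 1.0 / C.ref AS prop
        FROM {table} A
        JOIN {table} B ON A.{item} = B.{item}
        CROSS JOIN (SELECT COUNT(*) AS ref FROM {table} WHERE Project_id = :aid) AS C
        WHERE A.Project_id = :aid
        GROUP BY B.Project_id, C.ref
    """

query = f"""
    SELECT P.id,
           (IFNULL(personel.prop, 0) + IFNULL(instrument.prop, 0)) / 2 * 100 AS similarity
    FROM Project_tbl P
    LEFT JOIN ({overlap_subquery('personel_project_tbl', 'user_id')}) personel ON P.id = personel.pid
    LEFT JOIN ({overlap_subquery('instrument_project_tbl', 'instrument_id')}) instrument ON P.id = instrument.pid
    WHERE P.id <> :aid
    ORDER BY similarity DESC
"""
aid = conn.execute("SELECT id FROM Project_tbl WHERE Project_name = 'A'").fetchone()[0]
ranking = conn.execute(query, {"aid": aid}).fetchall()
print(ranking)
```

Generating one subquery per link table from a helper is one way to keep the query manageable when there are 20+ such tables, as the question mentions; each additional table adds one LEFT JOIN and one term to the average (the divisor 2 would then become the number of tables).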
I have 4 tables, as shown below:
Table: leave_request
+------------+----------+--------------+------------+----------------------+
| request_id | staff_id | applied_from | applied_to | status |
+------------+----------+--------------+------------+----------------------+
| 1 | 10 | 01-07-2012 | 02-07-2012 | approved |
| 2 | 12 | 02-07-2012 | 02-07-2012 | awaiting HR approval |
+------------+----------+--------------+------------+----------------------+
Table: leave_approval
+-------------+-------------+---------------+-------------+
| request_id | approved_by | approved_from | approved_to |
+-------------+-------------+---------------+-------------+
| 1 | 1 | 01-07-2012 | 02-07-2012 |
| 1 | 2 | 01-07-2012 | 02-07-2012 |
| 2 | 1 | 02-07-2012 | 02-07-2012 |
+-------------+-------------+---------------+-------------+
Table: staff
+-----------+-------+----------+
| staff_id | name | group_id |
+-----------+-------+----------+
| 1 | jack | 1 |
| 2 | jill | 2 |
| 10 | sam | 3 |
| 12 | david | 3 |
+-----------+-------+----------+
Table: group
+-----------+------------+
| group_id | group_name |
+-----------+------------+
| 1 | admin |
| 2 | HR |
| 3 | staff |
+-----------+------------+
I need to build a report by joining these tables. It should look like this:
+----------+------------+----------+-------------+-----------+--------------+-----------+
|applied_by|applied_from|applied_to|approved_from|approved_to|approved_admin|approved_hr|
+----------+------------+----------+-------------+-----------+--------------+-----------+
| sam | 01-07-2012 |02-07-2012|01-07-2012 |02-07-2012 | Jack | Jill |
| david | 02-07-2012 |02-07-2012|02-07-2012 |02-07-2012 | Jack | null |
+----------+------------+----------+-------------+-----------+--------------+-----------+
Thanks in advance :)
Let's take it step-by-step...
First, the entities you're selecting are in the leave_request table. So let's start there:
SELECT leave_request.* FROM leave_request
Now, you need to know the data for the applied_by column in the desired results. So you join the staff table:
SELECT
applied_staff.name AS applied_by
FROM
leave_request
INNER JOIN staff AS applied_staff ON leave_request.staff_id = applied_staff.staff_id
(Note that I'm using aliases for the table names. This will come in handy later.)
Now you need to know applied_from and applied_to, which you already have available:
SELECT
applied_staff.name AS applied_by,
leave_request.applied_from,
leave_request.applied_to
FROM
leave_request
INNER JOIN staff AS applied_staff ON leave_request.staff_id = applied_staff.staff_id
Now you need to know approved_from and approved_to, which are in the leave_approval table:
SELECT
applied_staff.name AS applied_by,
leave_request.applied_from,
leave_request.applied_to,
admin_approval.approved_from,
admin_approval.approved_to
FROM
leave_request
INNER JOIN staff AS applied_staff ON leave_request.staff_id = applied_staff.staff_id
INNER JOIN leave_approval AS admin_approval ON leave_request.request_id = admin_approval.request_id
Uh oh, now we have a problem. There's a one-to-many relationship, so now we have duplicated leave requests in the results. We need to filter that down somehow. You don't specify how, so I'm going to make a couple of assumptions: you want to know the approved_from and approved_to of the "admin" approval, AND there will only be ONE "admin" approval.
Let's reflect those assumptions in the table joins:
SELECT
applied_staff.name AS applied_by,
leave_request.applied_from,
leave_request.applied_to,
admin_approval.approved_from,
admin_approval.approved_to
FROM
leave_request
INNER JOIN staff AS applied_staff ON leave_request.staff_id = applied_staff.staff_id
INNER JOIN leave_approval AS admin_approval ON leave_request.request_id = admin_approval.request_id
INNER JOIN staff AS approved_staff ON admin_approval.approved_by = approved_staff.staff_id
INNER JOIN `group` AS approved_staff_group ON approved_staff.group_id = approved_staff_group.group_id
WHERE
approved_staff_group.group_name = 'admin'
That should be better. Note that the table aliasing came in handy here because we now have two instances of the staff table for two different purposes in the same query. So we needed to distinguish them. (Keep in mind that I'm flying blind here and can't actually test any of this. So correct me if there are any problems encountered along the way. I'm also free-handing this code because I don't have MySQL handy, so let me know if there are syntax errors as well.)
Now let's add the approved_admin field to the results, which is already available:
SELECT
applied_staff.name AS applied_by,
leave_request.applied_from,
leave_request.applied_to,
admin_approval.approved_from,
admin_approval.approved_to,
approved_staff.name AS approved_admin
FROM
leave_request
INNER JOIN staff AS applied_staff ON leave_request.staff_id = applied_staff.staff_id
INNER JOIN leave_approval AS admin_approval ON leave_request.request_id = admin_approval.request_id
INNER JOIN staff AS approved_staff ON admin_approval.approved_by = approved_staff.staff_id
INNER JOIN `group` AS approved_staff_group ON approved_staff.group_id = approved_staff_group.group_id
WHERE
approved_staff_group.group_name = 'admin'
Finally, we need to know the approved_hr. And null is allowed? We're going to use a different join for this one, then. I'm also making similar assumptions to those above. Let's try this:
SELECT
applied_staff.name AS applied_by,
leave_request.applied_from,
leave_request.applied_to,
admin_approval.approved_from,
admin_approval.approved_to,
approved_staff.name AS approved_admin,
hr_staff.name AS approved_hr
FROM
leave_request
INNER JOIN staff AS applied_staff ON leave_request.staff_id = applied_staff.staff_id
INNER JOIN leave_approval AS admin_approval ON leave_request.request_id = admin_approval.request_id
INNER JOIN staff AS approved_staff ON admin_approval.approved_by = approved_staff.staff_id
INNER JOIN `group` AS approved_staff_group ON approved_staff.group_id = approved_staff_group.group_id
LEFT OUTER JOIN (
  SELECT hr_approval.request_id, s.name
  FROM leave_approval AS hr_approval
  INNER JOIN staff AS s ON hr_approval.approved_by = s.staff_id
  INNER JOIN `group` AS g ON s.group_id = g.group_id
  WHERE g.group_name = 'HR'
) AS hr_staff ON leave_request.request_id = hr_staff.request_id
WHERE
approved_staff_group.group_name = 'admin'
Note how the HR part is built. The join for the HR approval has to allow for NULL values, but a condition like hr_staff_group.group_name = 'HR' in the outer WHERE clause would throw those NULL rows away again (david's request would disappear, since his only approval is jack's admin approval). So the HR approval, the staff member who gave it, and the group check are INNER JOINed inside a derived table, and only that derived table is LEFT OUTER JOINed. Requests without an HR approval then keep a NULL approved_hr. This still depends on the integrity of the data (e.g. at most one HR approval per request), which I can't guarantee.
(Also note that group is a reserved word in MySQL, so the table name has to be quoted with backticks.)
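A minimal sketch that checks the report against the sample data, using SQLite via Python's sqlite3 module as a stand-in for MySQL (SQLite quotes the reserved table name as "group" where MySQL uses backticks). The short aliases (lr, hr, la, s, g) are only for brevity. The HR approval is filtered inside a derived table so that the outer LEFT JOIN can still produce a NULL for requests that have no HR approval.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leave_request (request_id INTEGER PRIMARY KEY, staff_id INTEGER,
                                applied_from TEXT, applied_to TEXT, status TEXT);
    CREATE TABLE leave_approval (request_id INTEGER, approved_by INTEGER,
                                 approved_from TEXT, approved_to TEXT);
    CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, name TEXT, group_id INTEGER);
    CREATE TABLE "group" (group_id INTEGER PRIMARY KEY, group_name TEXT);
    INSERT INTO leave_request VALUES (1, 10, '01-07-2012', '02-07-2012', 'approved'),
                                     (2, 12, '02-07-2012', '02-07-2012', 'awaiting HR approval');
    INSERT INTO leave_approval VALUES (1, 1, '01-07-2012', '02-07-2012'),
                                      (1, 2, '01-07-2012', '02-07-2012'),
                                      (2, 1, '02-07-2012', '02-07-2012');
    INSERT INTO staff VALUES (1, 'jack', 1), (2, 'jill', 2), (10, 'sam', 3), (12, 'david', 3);
    INSERT INTO "group" VALUES (1, 'admin'), (2, 'HR'), (3, 'staff');
""")

# Admin approval is required (inner joins + WHERE); HR approval is optional,
# so its group filter lives inside the derived table, not in the outer WHERE.
report = conn.execute("""
    SELECT applied_staff.name AS applied_by, lr.applied_from, lr.applied_to,
           admin_approval.approved_from, admin_approval.approved_to,
           admin_staff.name AS approved_admin, hr.name AS approved_hr
    FROM leave_request lr
    JOIN staff applied_staff ON lr.staff_id = applied_staff.staff_id
    JOIN leave_approval admin_approval ON lr.request_id = admin_approval.request_id
    JOIN staff admin_staff ON admin_approval.approved_by = admin_staff.staff_id
    JOIN "group" admin_group ON admin_staff.group_id = admin_group.group_id
    LEFT JOIN (
        SELECT la.request_id, s.name
        FROM leave_approval la
        JOIN staff s ON la.approved_by = s.staff_id
        JOIN "group" g ON s.group_id = g.group_id
        WHERE g.group_name = 'HR'
    ) hr ON lr.request_id = hr.request_id
    WHERE admin_group.group_name = 'admin'
    ORDER BY lr.request_id
""").fetchall()
print(report)
```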
It's also worth noting that you claim to want "Jack" as output when the value is "jack". I didn't do any string manipulation in this code to make that happen. If the value should be capitalized in the data, then capitalize it in the data.
Again, I can't guarantee this code. But as a walk-through it should get you moving in the right direction. As I mentioned in a comment on the question, I really recommend picking up a book on MySQL if you're going to be writing MySQL code.
Edit: I'd also recommend changing the structure of the data itself. Specifically, the leave_approval table feels a bit messy, and that table alone is causing the confusion. I have a couple of recommendations:
Add an approval_type to the leave_approval table. At the very least this would indicate if it's an admin approval, an HR approval, or any other kind of approval. (Are there even other kinds? Will there ever be?) Then you could also use request_id and approval_type as a combined primary key, or at least a combined unique constraint, to enforce better data integrity and prevent duplicate approvals.
If there are only two kinds of approvals and that's probably not going to change, reflect them both in the leave_approval table. Have one set of columns for admin_approval_* and one set for hr_approval_*. (Each set would include the staff_id and relevant dates for the approval.) Then request_id itself could be a primary key on leave_approval making it one-to-one with leave_request. This would dramatically simplify the relational data, essentially turning a leave_approval record into an optional set of additional information for a leave_request record. The joins would become much simpler and the data would express itself much more clearly.
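A minimal sketch of the first recommendation (an approval_type column with request_id and approval_type as a combined primary key), run in SQLite via Python's sqlite3 module as a stand-in for MySQL. The type values 'admin'/'hr', the CHECK constraint, and the reduced columns (dates omitted) are assumptions for illustration. The final query pivots each approval type into its own column with MAX(CASE ...), which is one possible way to get the report shape without joining leave_approval twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical redesign: each approval says what kind it is,
    -- and a request can have at most one approval of each kind.
    CREATE TABLE leave_approval (
        request_id    INTEGER NOT NULL,
        approval_type TEXT    NOT NULL CHECK (approval_type IN ('admin', 'hr')),
        approved_by   INTEGER NOT NULL REFERENCES staff(staff_id),
        PRIMARY KEY (request_id, approval_type)
    );
    INSERT INTO staff VALUES (1, 'jack'), (2, 'jill');
    INSERT INTO leave_approval VALUES (1, 'admin', 1), (1, 'hr', 2), (2, 'admin', 1);
""")

# The combined key rejects a second admin approval for request 1.
try:
    conn.execute("INSERT INTO leave_approval VALUES (1, 'admin', 2)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# One row per request, each approval type pivoted into its own column.
rows = conn.execute("""
    SELECT la.request_id,
           MAX(CASE WHEN la.approval_type = 'admin' THEN s.name END) AS approved_admin,
           MAX(CASE WHEN la.approval_type = 'hr'    THEN s.name END) AS approved_hr
    FROM leave_approval la
    JOIN staff s ON s.staff_id = la.approved_by
    GROUP BY la.request_id
    ORDER BY la.request_id
""").fetchall()
print(duplicate_rejected, rows)
```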
I have the following issue: imagine an adjacency list coming out of SQL like this, which is then walked over recursively:
SELECT * FROM pages as ps ORDER BY COALESCE(child_of,page_id), page_id LIMIT x,y
public static function tree(&$arr, $id = NULL) {
$result = array();
foreach ($arr as $a) {
if ($id == $a['child_of']) {
$a ['children'] = self::tree($arr, $a['page_id']);
$result[] = $a;
}
}
return $result;
}
So far, so good: with another "flattener" I get where I need to be. Now here's the catch:
this works on paginated results, and it can happen (and it does) that a parent is in one page and its child in a different page. With the recursion above, it is obvious that
the child won't make it into the tree when its parent is missing.
Any ideas on how I can solve that?
Help is much appreciated.
Hierarchical data in relational tables, don't we all love it?
With your current database layout, you can only solve your problem by either always fetching all nodes, or doing as many JOINs as you have nesting levels and sorting everything properly (your way of sorting only makes this fundamental problem a little less severe).
Before you ask: no, you should not do this.
The other option is to choose an entirely different model for your hierarchy:
Nested sets
Ancestor/descendant relationships stored between all nodes (a closure table).
See slide 48 et seq. here.
A good read to start with is Hierarchical Data In MySQL (which I used to be able to find on the MySQL.com website, arghh)
Read It?
Here's how it could be done with the adjacency list model, but only for a known, fixed number of nesting levels (four in this example).
I would first find out which of my pages are root pages (of the tree), then select only those with a query. Put the LIMIT x,y in this SELECT statement.
After that, run the following statement (or something like it):
$query = "
SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name IN('ELECTRONICS', '<some other name>');
";
It could return something like this:
+-------------+----------------------+--------------+-------+
| lev1 | lev2 | lev3 | lev4 |
+-------------+----------------------+--------------+-------+
| ELECTRONICS | TELEVISIONS | TUBE | NULL |
| ELECTRONICS | TELEVISIONS | LCD | NULL |
| ELECTRONICS | TELEVISIONS | PLASMA | NULL |
| ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS | FLASH |
| ELECTRONICS | PORTABLE ELECTRONICS | CD PLAYERS | NULL |
| ELECTRONICS | PORTABLE ELECTRONICS | 2 WAY RADIOS | NULL |
| etc... | etc... | etc... | |
+-------------+----------------------+--------------+-------+
The trick is to put only the root names returned by the LIMITed query (or their IDs, if you prefer) into the IN() clause of this query.
This should still perform fairly well (in theory).
The principle of the above query could also be used to find out how many descendants each root of a tree has (with a little GROUP BY and COUNT() magic ;) You could also find out which of your pages are roots with this principle (though I would store that in the table data for performance reasons).
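A minimal sketch of the two-step approach: paginate over root nodes only, then fetch every descendant of those roots with the fixed four-level join, so a parent and its children always land on the same page. It runs in SQLite via Python's sqlite3 module as a stand-in for MySQL (LIMIT ? OFFSET ? corresponds to MySQL's LIMIT x,y). The page_of_trees helper and the extra BOOKS/FICTION rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT NOT NULL, parent INTEGER)")
conn.executemany("INSERT INTO category VALUES (?, ?, ?)", [
    (1, "ELECTRONICS", None), (2, "TELEVISIONS", 1), (3, "LCD", 2),
    (4, "PORTABLE ELECTRONICS", 1), (5, "MP3 PLAYERS", 4), (6, "FLASH", 5),
    (7, "BOOKS", None), (8, "FICTION", 7),  # hypothetical second root
])

def page_of_trees(offset, limit):
    """Paginate over root nodes, then fetch every descendant of those roots (max 4 levels)."""
    root_ids = [r[0] for r in conn.execute(
        "SELECT category_id FROM category WHERE parent IS NULL ORDER BY category_id LIMIT ? OFFSET ?",
        (limit, offset))]
    if not root_ids:
        return []
    placeholders = ", ".join("?" for _ in root_ids)  # one bound parameter per root id
    return conn.execute(f"""
        SELECT t1.name, t2.name, t3.name, t4.name
        FROM category AS t1
        LEFT JOIN category AS t2 ON t2.parent = t1.category_id
        LEFT JOIN category AS t3 ON t3.parent = t2.category_id
        LEFT JOIN category AS t4 ON t4.parent = t3.category_id
        WHERE t1.category_id IN ({placeholders})
        ORDER BY t1.category_id, t2.category_id, t3.category_id
    """, root_ids).fetchall()

print(page_of_trees(0, 1))
print(page_of_trees(1, 1))
```

The page size now counts trees rather than rows, so a page can contain a varying number of rows; that is the price of never splitting a parent from its children.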
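One way the "GROUP BY and COUNT() magic" could look, sketched in SQLite via Python's sqlite3 module as a stand-in for MySQL, with the article's sample data. Because each joined row is one root-to-leaf path, intermediate nodes repeat across rows, so the sketch counts distinct ids per level (COUNT ignores the NULLs from the LEFT JOINs). Filtering on t1.parent IS NULL also shows how the same join can find the roots.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT NOT NULL, parent INTEGER)")
conn.executemany("INSERT INTO category VALUES (?, ?, ?)", [
    (1, "ELECTRONICS", None), (2, "TELEVISIONS", 1), (3, "TUBE", 2), (4, "LCD", 2),
    (5, "PLASMA", 2), (6, "PORTABLE ELECTRONICS", 1), (7, "MP3 PLAYERS", 6),
    (8, "FLASH", 7), (9, "CD PLAYERS", 6), (10, "2 WAY RADIOS", 6),
])

# Count each descendant once per level, then add the levels up (max 4 levels deep).
counts = conn.execute("""
    SELECT t1.name,
           COUNT(DISTINCT t2.category_id) + COUNT(DISTINCT t3.category_id)
             + COUNT(DISTINCT t4.category_id) AS descendants
    FROM category AS t1
    LEFT JOIN category AS t2 ON t2.parent = t1.category_id
    LEFT JOIN category AS t3 ON t3.parent = t2.category_id
    LEFT JOIN category AS t4 ON t4.parent = t3.category_id
    WHERE t1.parent IS NULL
    GROUP BY t1.category_id, t1.name
""").fetchall()
print(counts)
```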
If you want a dynamic amount of nesting (for nearly endless scaling), implementing a nested set would be the way to go.