I'm trying to add a comment system that uses hierarchical design. Here's a sample from my database that keeps track of posts/replies (note that more rows are added as more people reply):
post_id | parent_id
1 | 1
2 | 1
3 | 1
4 | 2
5 | 3
6 | 2
7 | 4
I've done some research about different methods to output and manipulate the data to get what you need, but I'm not sure which method would be best for a comment system and how I would do it.
As far as I know, adjacency lists wouldn't work because they can't handle deep trees.
Please help.
Judging by current trends, an adjacency list (AL) is quite an acceptable solution. Most modern sites dump all the comments on one page, which means no sophisticated SQL logic is ever required: just a simple query for all the comments that belong to a single article, and one loop to store them in an array.
If you want no loops and want all the comments back from one query, already sorted, then Materialized Path would be handy. You can find plenty of implementation examples, I am sure.
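As a rough sketch of the "one query, one loop" approach, the Python below groups the sample rows from the question by parent and then walks them in threaded order. The function names are made up for illustration, and it assumes that a row whose parent_id equals its own post_id (row 1 in the sample) is the top-level post.

```python
from collections import defaultdict

# (post_id, parent_id) rows, as a single query for one article might return them.
# Assumption: a row whose parent_id equals its own post_id is the top-level post.
rows = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 3), (6, 2), (7, 4)]

def build_children(rows):
    """Group post ids by parent in a single pass over the rows."""
    children = defaultdict(list)
    roots = []
    for post_id, parent_id in rows:
        if post_id == parent_id:
            roots.append(post_id)
        else:
            children[parent_id].append(post_id)
    return roots, children

def flatten(roots, children):
    """Return (post_id, depth) pairs in threaded display order."""
    out = []
    stack = [(root, 0) for root in reversed(roots)]
    while stack:
        post_id, depth = stack.pop()
        out.append((post_id, depth))
        # Push replies in reverse so the earliest reply is shown first.
        for child in reversed(children.get(post_id, [])):
            stack.append((child, depth + 1))
    return out

roots, children = build_children(rows)
threaded = flatten(roots, children)
for post_id, depth in threaded:
    print("  " * depth + str(post_id))
```

The explicit stack avoids recursion limits, so the depth of the tree is not a problem once all rows for the article are in memory.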
I'm aware this is normally a bad idea, and I've done my reading - in particular, this question.
However the total normalisation route seems more complex and will give me and my code more hoops to jump through. Here's my scenario:
I'm building a test creation system where users can create tests, questions and answers, and associate them all together, i.e. associate answers with questions, and questions with tests. This approach means there's no hard-linking any one kind of data to any other; a given question can be part of two or more tests, for example. So, I was thinking (simplified):
Tests table:
id (PK)
name (varchar)
questions (com-sep list of question IDs)
Questions table:
id (PK)
question text (varchar)
answers (com-sep list of answer IDs)
Answers table:
id (PK)
answer text (varchar)
So a given row in the tests table might look like:
---------------------------------------
| ID | NAME | QUESTIONS |
---------------------------------------
| 1 | SOME TEST | 1,4,7,8,11,19 |
---------------------------------------
Then, when I fetch a test and its questions, I just do some magic with group concat.
Question: is this all a bad idea? It seems a lot simpler than the alternative which is to have two further tables dedicated, respectively, to logging associations between tests and questions, and questions and answers, meaning more tables involved in any queries.
Yes, it probably is a bad idea.
Why do you think of having two more whole tables (whoa!!) as a big deal? It really isn't.
Anyway, if you're really definitely never going to want to do something like "find out which tests question 3 appears in" then go nuts, but the moment you do find you have to do something like that you'll wish you had just done it the right way.
And how will you make sure that your data is even halfway sensible? If 564 appears as an entry in one of your comma-separated lists, will you be sure that there is definitely a question number 564 in the Questions table, that it hasn't been deleted since? What a lot of extra complication to avoid creating two tables. If you don't like typing the SQL to perform the joins, you could just use an ORM.
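For comparison, here is a minimal sketch of the normalized layout with one link table, using Python's built-in sqlite3 as a stand-in for MySQL. The table name test_questions and the sample rows are assumptions made for illustration. It shows the two points above: finding which tests question 3 appears in, and the database refusing a link to a question that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE tests     (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE questions (id INTEGER PRIMARY KEY, question_text TEXT);
-- Hypothetical link table: one row per (test, question) association.
CREATE TABLE test_questions (
    test_id     INTEGER NOT NULL REFERENCES tests(id),
    question_id INTEGER NOT NULL REFERENCES questions(id),
    PRIMARY KEY (test_id, question_id)
);
INSERT INTO tests VALUES (1, 'SOME TEST'), (2, 'OTHER TEST');
INSERT INTO questions VALUES (1, 'Q1'), (3, 'Q3'), (4, 'Q4');
INSERT INTO test_questions VALUES (1, 1), (1, 3), (1, 4), (2, 3);
""")

# "Find out which tests question 3 appears in" is a plain join.
tests_with_q3 = [name for (name,) in conn.execute(
    "SELECT t.name FROM tests t "
    "JOIN test_questions tq ON tq.test_id = t.id "
    "WHERE tq.question_id = ? ORDER BY t.id", (3,))]

# Linking to a question that does not exist is rejected by the database.
try:
    conn.execute("INSERT INTO test_questions VALUES (1, 564)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(tests_with_q3, orphan_rejected)
```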
Sure, there are cases when denormalization is worthwhile.
But keep in mind that denormalization helps simplify a subset of queries against your data, at the expense of all other queries.
The scenarios listed in my answer to Is storing a delimited list in a database column really that bad? show how many other types of queries or updates you might have to do against your data: searching, sorting, inserting, deleting... You would also give up the referential integrity that keeps your data from turning into a collection of orphans.
But if you know that fetching or updating the whole list of id's is the only thing you need to optimize for, and this will never change (famous last words), then go for it, use denormalization.
If you want any of those other types of queries to be convenient or efficient, stick with a normalized design.
Just starting out with PHP and Joomla development, have one big obstacle that I can't wrap my head around.
I'm building an assessment tool on the backend. I have one table ("questions") that contains a bunch of questions grouped by section (1-9). I have another table ("students") that will be populated by taking all registered users and displaying one user and section along with the questions. The teacher will then check off that the student has completed those questions.
Questions Table:
questionID | sectionID | question
1 | 1 | Blah Blah 1
2 | 1 | Blah blah 2
3 | 2 | Bork bork
4 | 3 | Bork de bork
Students Table:
id | userID | questionID | passed
1 | 85 | 1 | 1
2 | 85 | 2 |
3 | 85 | 3 |
4 | 85 | 4 | 1
5 | 85 | 5 | 1
6 | 92 | 1 | 1
7 | 92 | 2 | 1
8 | 92 | 3 |
9 | 92 | 4 |
10 | 92 | 5 | 1
I'm planning on using INSERT…ON DUPLICATE KEY UPDATE, so that if the values don't exist in the "students" table they will be inserted, and if they do exist they'll be updated (along with a timestamp and a few other fields).
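A minimal sketch of that insert-or-update pattern for several rows at once, using Python's sqlite3 rather than Joomla or MySQL. It assumes a unique key on (userID, questionID), which the upsert needs in either database, and uses SQLite's ON CONFLICT ... DO UPDATE (SQLite 3.24 or later) in place of MySQL's ON DUPLICATE KEY UPDATE. The function name save_results is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE students (
    id         INTEGER PRIMARY KEY,
    userID     INTEGER NOT NULL,
    questionID INTEGER NOT NULL,
    passed     INTEGER,
    UNIQUE (userID, questionID)  -- assumption: one row per user and question
)""")
conn.execute("INSERT INTO students (userID, questionID, passed) VALUES (85, 1, 1)")

def save_results(conn, rows):
    """Insert each (userID, questionID, passed) row, or update it if it exists."""
    conn.executemany("""
        INSERT INTO students (userID, questionID, passed) VALUES (?, ?, ?)
        ON CONFLICT (userID, questionID) DO UPDATE SET passed = excluded.passed
    """, rows)
    conn.commit()

# Question 1 already exists for user 85 and is updated; question 2 is inserted.
save_results(conn, [(85, 1, 0), (85, 2, 1)])
result = conn.execute(
    "SELECT userID, questionID, passed FROM students ORDER BY questionID"
).fetchall()
print(result)
```

In MySQL the equivalent statement would end in ON DUPLICATE KEY UPDATE passed = VALUES(passed), again relying on the unique key.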
I'm pretty sure I can build the query (suggestions welcome!) but right now I have no idea how to use Joomla's MVC framework to make this happen. It looks like JTable isn't good for returning multiple rows, which means using JModelList... but then how do I use Joomla's functionality to make a Save button (JToolbar?) that will update or insert depending on whether that row exists? I feel like I have all the pieces but don't know how they go together. The students view is very simple: one controller/model/view.html.php/tmpl-default.php.
References:
http://www.sourcecodester.com/php/3863/updating-multiple-rows-mysql-using-php.html
http://forum.joomla.org/viewtopic.php?p=2263231
http://forum.joomla.org/viewtopic.php?p=2406831
http://forum.joomla.org/viewtopic.php?p=1745675
http://forum.joomla.org/viewtopic.php?p=1675454 (promising but dated)
http://forum.joomla.org/viewtopic.php?p=2506722
http://stackoverflow.com/questions/1305863/update-multiple-rows
http://docs.joomla.org/JModelList/1.6
http://docs.joomla.org/JModelList::getListQuery/1.6
http://docs.joomla.org/Using_the_JTable_class
http://docs.joomla.org/How_to_use_the_JTable_class
http://docs.joomla.org/JTable
http://docs.joomla.org/JTable/getobjectslist (promising...maybe?)
Questions:
How to use Joomla's MVC framework to make a multirow update...for dummies.
Where does everything go?
If there's a different way of doing this I'm totally open to suggestions.
Thanks!
EDIT
Realize this is probably too broad. Here's a bit more:
I've gone through the Joomla MVC guide ("here's some code - PLOP") and Lynda (better but also in the "here's some code - PLOP" category).
I've set up a few simple table views/updates using those guides. These involved making one view that returned all the values and another that would pull up one record (either to edit or new), modifying the XML file to account for the different data input types, etc.
My confusion in a nutshell is this. Using those guides it seems that the edit view using JTable is meant to display ONE record. To display more than one record you use JModelList. So what do you use to display multiple records and update them? Or do you just ignore what they're supposed to do and just throw a looping update in there? Trying to learn and do this the correct way which has been a bit harder to understand than anticipated.
Again, thanks!
Here are some suggestions:
Use Joomla's database class to run your database queries, inserts, updates etc:
http://docs.joomla.org/How_to_use_the_database_classes_in_your_script
This tutorial will give you an idea of how Joomla implements MVC:
http://docs.joomla.org/Developing_a_Model-View-Controller_Component_-_Part_1
Also, this is quite handy when it comes to building a skeleton structure for your Joomla MVC component:
http://www.alphaplug.com/index.php/products/mvc-generator-online.html
Hope this helps.
You should probably use JDatabase:
http://docs.joomla.org/JDatabase/1.6
If you want to retrieve multiple rows, use either JDatabase::loadObjectList, JDatabase::loadRowList or JDatabase::loadAssocList methods depending on if you want an object, an array or an associative array.
To update multiple rows, use the JDatabase::query or JDatabase::queryBatch methods.
Typically, I use JTable to assist when there is an insert pattern as described (i.e. a single row edit):
http://docs.joomla.org/Using_the_JTable_class#Create.2FUpdate
To update multiple rows, I would use the JDatabase methods. Joomla uses JDatabase when building its own list of content articles; see components/com_content/models/articles.php, and in particular the call:
$db = $this->getDbo();
This is explained here:
http://docs.joomla.org/Accessing_the_database_using_JDatabase
I'm not certain, but try accessing multiple rows with JTable by just using:
$rows->load();
Hope this helps.
I would like to build a website that has some elements of a social network.
So I have been trying to think of an efficient way to store a friend list (somewhat like Facebook).
And after searching a bit the only suggestion I have come across is making a "table" with two "ids" indicating a friendship.
That might work in small websites but it doesn't seem efficient one bit.
I have a background in Java but I am not proficient enough with PHP.
An idea has crossed my mind which I think could work pretty well, problem is I am not sure how to implement it.
The idea is to have all the "id"s of your friends saved in a tree data structure, where each node in the tree represents one digit of a friend's id.
It starts with one node, and more nodes are added as the user adds friends.
(A bit like Lempel–Ziv.)
Every node can point to 11 other nodes: 0 to 9 and X.
"X" marks the end of the id.
for example see this tree:
[Image: An Example]
In this tree the user has 4 friends with the following "id"s:
0
143
1436
15
Update: as it might have been unclear before, the idea is that every user will have a tree in a form of multidimensional array in which the existence of the pointers themselves indicate the friend's "id".
If every user had such a multidimensional array, checking whether id "y" is a friend of mine, deleting id "y" from my friend list, or adding id "y" to it would all take effectively constant time (bounded by the number of digits in the id), independent of the number of users the website has. The only drawback is that taking such a huge array, serializing it and pushing it into each row of the table just doesn't seem right.
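To make the structure concrete, here is a small sketch of such a per-user digit tree as nested Python dicts. The function names are made up, and this only illustrates the idea as described; it says nothing about whether storing it serialized in a table row is a good plan.

```python
END = "X"  # marks the end of an id, as in the description above

def add_friend(tree, friend_id):
    """Walk or create one node per digit, then mark the end of the id."""
    node = tree
    for digit in str(friend_id):
        node = node.setdefault(digit, {})
    node[END] = True

def is_friend(tree, friend_id):
    """Follow the digits; the id is present only if the end marker is set."""
    node = tree
    for digit in str(friend_id):
        node = node.get(digit)
        if node is None:
            return False
    return END in node

def remove_friend(tree, friend_id):
    """Unmark the id; emptied branches are left in place for simplicity."""
    node = tree
    for digit in str(friend_id):
        node = node.get(digit)
        if node is None:
            return
    node.pop(END, None)

friends = {}
for fid in (0, 143, 1436, 15):
    add_friend(friends, fid)

print(is_friend(friends, 143), is_friend(friends, 14))
```

Each operation touches one node per digit of the id, so its cost depends on the length of the id and not on how many friends or users exist.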
-Is this even possible to implement?
-Would using serializing to insert that tree into a table be practical?
-Is there any better way of doing this?
The benefit that made me choose this is that even with a really large number of ids (millions or billions), the search, add and delete time is linear in the number of digits of an id and does not depend on the number of ids.
I'd greatly appreciate any help with implementing this or any suggestions for alternative ways to improve or change this method.
I would strongly advise against this.
Storage savings are not significant, and may (probably?) be worse. In a real dataset, the actual space savings afforded to you by this approach are minimal. Computing the average savings is a very difficult problem, but use some real numbers and try a few samples with random IDs. If you have a million users, consider a user with 15 friends. How much data do you save with this approach? You may actually use more space, since tree adjacency models can require significant data.
"Rendering" a list of users requires CPU investment.
Inserts are non-deterministic and non-trivial. When you add a new user to an existing tree, you will have a variety of methods of inserting them. Assuming you don't choose arbitrarily, it is difficult to compute which approach is the best (and would only be based on heuristics).
These are the big ones that came to my mind. But generally, I think you are over-thinking this.
You should check out OQGRAPH, the Open Query graph storage engine. It is designed to handle efficient tree and graph storage for MySQL.
You can also check out my presentation Models for Hierarchical Data with SQL and PHP, or my answer to What is the most efficient/elegant way to parse a flat table into a tree? here on Stack Overflow.
I describe a design I call Closure Table, which records all paths between ancestors and descendants in a hierarchy.
You say 'using PHP' in the title, but this seems to be just a database question at its heart. And believe it or not the linking table is by far the best way to go. Especially if you have millions or billions of users. It would be faster to process, easier to handle in the PHP code and smaller to store.
Update
Users table:
id | name | moreInfo
1 | Joe | stuff
2 | Bob | stuff
3 | Katie | stuff
4 | Harold | stuff
Friendship table:
left | right
1 | 4
1 | 2
3 | 1
3 | 4
In this example Joe knows everyone and Katie knows Harold.
This is of course a simplified example.
I'd love to hear if someone has a better logic to the left and right and an explanation as to why.
Update
I gave some php code in a comment below but it was marked up wrong so here it is again.
$sqlcmd = sprintf( 'SELECT IF( `left` = %1$d, `right`, `left`) AS "friend" FROM `friendship` WHERE `left` = %1$d OR `right` = %1$d', $userid);
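A sketch of the same lookup against the sample friendship table, written in Python with sqlite3 standing in for PHP and MySQL. SQLite has no IF(), so a CASE expression plays that role, and the user id is passed as a bound parameter instead of being formatted into the string. The function name friends_of is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- "left" and "right" are SQL keywords, hence the quoting.
CREATE TABLE friendship ("left" INTEGER NOT NULL, "right" INTEGER NOT NULL,
                         PRIMARY KEY ("left", "right"));
INSERT INTO friendship VALUES (1, 4), (1, 2), (3, 1), (3, 4);
""")

def friends_of(conn, user_id):
    """Return the ids on the other side of every friendship row involving user_id."""
    sql = """
        SELECT CASE WHEN "left" = :uid THEN "right" ELSE "left" END AS friend
        FROM friendship
        WHERE "left" = :uid OR "right" = :uid
        ORDER BY friend
    """
    return [row[0] for row in conn.execute(sql, {"uid": user_id})]

print(friends_of(conn, 1))  # Joe
print(friends_of(conn, 3))  # Katie
```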
Few ideas:
ordered lists - searching through ordered list is fast, though ordering itself might be heavier;
horizontally partitioning the data;
getting rid of premature optimizations.
Possible Duplicate:
What is the most efficient/elegant way to parse a flat table into a tree?
This I am finding rather tricky and would like some opinions on the matter.
I am trying to store hierarchical (tree-like) data with an unknown number of levels and branches. I want to be able to add new nodes and delete any node at any time.
I need to be able to query, from any node in the hierarchy, for all of the children's ids in one go, and efficiently, because of a large user base.
Let's take a hypothetical example of a website where families socialise and update their status as on Facebook. At any time you can be viewing a family member's "Wall", which also includes all of the recent status updates from the people below them in the hierarchy, in chronological order.
Obviously, fetching the posts is easy enough once you have the array of ids of the family members who are children of this family member's node.
Let's take a simple example table structure:
id | parentId | name
________________________
1 | NULL | John
2 | 1 | Peter
3 | 1 | Bob
4 | 3 | Emma
5 | 2 | Sam
6 | 4 | Gill
etc.... You get the idea.
I need to be able to do the above with something like this unless you think the structure needs to be adapted.
I have read up on the nested set model in MySQL.
It seems very fiddly, and could be unreliable: if one update did not go through correctly, it would mess everything up.
I am used to PHP and MySQL but have been reading a bit about Cassandra and Thrift. I'm not sure whether that would be easier.
There are already good approaches out there that are simpler than the solution you propose.
Here are a couple of links which explain how to do it (we use this ourselves for much the same problem you describe and it works well).
Managing Hierarchical Data in MySQL (from MySQL)
Storing Hierarchical Data in a Database (from Sitepoint, but a clearer explanation, I think)
This makes inserting/updating more complex, but selecting portions of the tree structure far faster (with only one query). It allows finding all children of any given node in one query, and finding all the ancestors of a given node with one query.
So I think I have come up with an idea.
The reason I am against the nested set model is that it still does not seem to be the best way, and it is unlikely to be the ideal solution for performance.
I am going to cover a proposed solution I have been thinking about.
The concept means creating a hierarchical map table to keep track of all the relationships between each family member/node.
The way it would work is:
Using map table structure of this:
id | fMemberId | parentid
=====================================
1 | 3 | 2
2 | 4 | 3
3 | 4 | 2
1) When a new family member is created as a child of a parent, we take the parent's id and create a new row in our family members table with the parent id set, for future additional uses and functionality.
2) As this row is created, we create new rows in the map table pairing the new family member with each of its parent ids (its parent, its parent's parent, and so on).
A quick way to do this is to take the new family member's parent id and query the map table for all rows whose family member id equals that parent id. The parent ids in those rows, plus the parent's own id, are then stored in a PHP array and written to the map table alongside the new family member's id. This needs only one SQL query to grab all the parent ids, rather than a number of queries that grows with the number of nodes.
This would mean when we are viewing a family members feed of posts we would be able to query the db for simply the rows in the map table to get all the children id's of the current family member and subsequently query other tables for the post data.
The main trade off being the amount of potential storage required for this kind of system.
However, I believe reading would be quicker, as there are no conditional SQL statements, and writing to the db this way may be just as quick.
We could overcome this by using InnoDB's cluster id's assigning an initial family id index and creating a new table with the "next family members id" based on the family id.
Also reliability, if a row wasn't written it would be easy enough to add it in. It prevents having to continually edit rows just to create a member.
What are your thoughts on this?
So far this seems to be a good way in my opinion. Took a lot of thinking to get to here. I also believe it could maybe be improved with time and being able to store arrays of id's per member rather than all of them. Still trying to work that one out!
Yes, your solution is called a transitive closure. I have written about it before:
What is the most efficient/elegant way to parse a flat table into a tree?
Models for Hierarchical Data
You also need the zero-length paths, e.g. 2-2, 3-3, 4-4.
I am currently in the process of rewriting an application whereby teachers can plan curriculum online.
The application guides teachers through a process of creating a unit of work for their students. The tool is currently used in three states but we have plans to get much bigger than that.
One of the major draw cards of the application is that all of the student outcomes are preloaded into the system. This allows teachers to search or browse through and select which outcomes are going to be met in each unit of work.
When I originally designed the system I made the assumption that all student outcomes followed a similar Hierarchy. That is, there are named nested containers and then outcomes.
The original set of outcomes that I entered was three tiered. As such my database has the following structure:
=========================
Table: h1
id, Name
Table: h2
id, parent___id (h1_id), Name
Table: h3
id, parent___id (h2_id), Name
Table: outcome
id, parent___id (h3_id), Name
=========================
Other than the obvious inability to add any number (n) of levels of hierarchy, this method also made it difficult to display a list of all of the standards without recursively querying the database.
Once the student outcomes (and their parent categories) have been added there is very little reason for them to be modified in any way. The primary requirement is that they are easy and efficient to read.
So far all of the student outcomes from different schools / states / countries have roughly followed my assumption. This may not always be the case.
All existing data must of course be transferred across from the current database.
Given the above, what is the best way for me to store all the different sets of student outcomes? Some of the ideas I have had are listed below.
Continue using 4 tables in the database, and when selecting use either recursion or lots of joins
Use nested sets
XML (Either a global XML file for all of the different sets or an XML file for each)
I don't know that you actually need 4 tables for this.
If you have a single table that tracks the parent_id and a level you can have infinite levels.
outcome
id, parent_id, level, name
You can use recursion to track through the tree for any particular element (you don't actually need level, but it can be easier to query with it).
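One way to do that recursion inside the database is a recursive common table expression. The sketch below uses Python's sqlite3; MySQL supports WITH RECURSIVE from version 8.0, while older versions need the recursion in application code. The sample rows and the function name subtree are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE outcome (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES outcome(id),
    level     INTEGER NOT NULL,
    name      TEXT NOT NULL
);
-- Made-up sample rows, four levels deep.
INSERT INTO outcome VALUES
    (1, NULL, 1, 'Mathematics'),
    (2, 1,    2, 'Number'),
    (3, 2,    3, 'Fractions'),
    (4, 3,    4, 'Adds simple fractions'),
    (5, 1,    2, 'Geometry');
""")

def subtree(conn, root_id):
    """Return root_id and everything below it as (id, level, name) rows."""
    sql = """
        WITH RECURSIVE tree(id, level, name) AS (
            SELECT id, level, name FROM outcome WHERE id = ?
            UNION ALL
            SELECT o.id, o.level, o.name
            FROM outcome o JOIN tree t ON o.parent_id = t.id
        )
        SELECT id, level, name FROM tree ORDER BY level, id
    """
    return conn.execute(sql, (root_id,)).fetchall()

for row in subtree(conn, 2):
    print(row)
```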
The alternative is nested sets. In this case you would still merge to a single table, but use the set stuff to track levels.
Which one to use depends on your application.
Read-intensive: nested sets
Write-intensive: parent_id tree (adjacency list)
This is because with nested sets you can retrieve the entire tree with a single query but at the cost of reordering the entire tree every time you insert a new node.
When you just track the parent_id, you can move or delete nodes individually.
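The trade-off can be seen in a small nested set sketch (Python with sqlite3; the table name node, the columns lft/rgt and the five-node sample tree are made up for illustration). Reading a whole subtree is one query, but adding a single leaf has to renumber every node to its right.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (name TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL);
-- A(1,10) contains B(2,3), C(4,7) and E(8,9); C contains D(5,6).
INSERT INTO node VALUES ('A', 1, 10), ('B', 2, 3), ('C', 4, 7), ('D', 5, 6), ('E', 8, 9);
""")

def descendants(conn, name):
    """Everything whose (lft, rgt) interval lies inside the parent's interval."""
    return [row[0] for row in conn.execute("""
        SELECT child.name
        FROM node AS parent JOIN node AS child
          ON child.lft > parent.lft AND child.rgt < parent.rgt
        WHERE parent.name = ?
        ORDER BY child.lft""", (name,))]

def add_leaf(conn, parent_name, name):
    """Insert a new last child; all nodes to the right are shifted by 2."""
    (rgt,) = conn.execute(
        "SELECT rgt FROM node WHERE name = ?", (parent_name,)).fetchone()
    conn.execute("UPDATE node SET rgt = rgt + 2 WHERE rgt >= ?", (rgt,))
    conn.execute("UPDATE node SET lft = lft + 2 WHERE lft > ?", (rgt,))
    conn.execute("INSERT INTO node VALUES (?, ?, ?)", (name, rgt, rgt + 1))

before = descendants(conn, "A")
add_leaf(conn, "C", "F")
after = descendants(conn, "A")
print(before, after)
```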
PS: I vote no to XML. You have the same recursive issues, plus the overhead of parsing the data as well as either storing it in the db or on the filesystem (which will cause concurrency issues).
I agree with the other poster - nested sets is the way to go I think.
See here:
http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
It explains the theory and compares it to what you are already using - which is a twist on adjacency really. It shows +/- of them all, and should help you reach a decision based on all of the subtleties of your project.
Another thing I've seen (in CakePHP's tree behaviour) is actually to use both at once. Sure, it's not great performance-wise, but under this model you insert/remove things just as you would with adjacency, and then there is a method to run to rebuild the left/right edge values so that you can do the selects in a nested sets fashion. The result is that you can insert/delete much more easily.
http://book.cakephp.org/view/91/Tree
There is another way to handle trees in a database that is maybe not as "smart" as nested sets and the other patterns described here, but that is really efficient and easy:
instead of storing the level (or depth) of an item, you can store the full path in the tree. For example, this tree:
A
  B
  C
    D
  E
would be stored like this:
item | parent | path
----------------------------
A | NULL | A
B | A | A--B
C | A | A--C
D | C | A--C--D
E | A | A--E
Then you can easily get:
(pure SQL) all direct children of an item, with a where parent = 'PARENT' clause
(pure SQL) all direct and indirect children, with a where path LIKE 'PARENT--%' clause (where the prefix is the parent's full path)
(PHP) the depth of the node: count(explode('--', $path))
These features are good enough in most situations and perform well, even with several sublevels, as long as you create the right indexes (PK, index on parent, index on path). Of course, this solution is demanding when deleting or moving nodes, because the paths have to be updated...
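The three lookups can be sketched against the sample table like this, in Python with sqlite3 instead of PHP and MySQL. The table name tree and the function names are made up, and the sketch assumes item names never contain the LIKE wildcards % or _.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tree (item TEXT PRIMARY KEY, parent TEXT, path TEXT NOT NULL);
CREATE INDEX tree_parent ON tree (parent);
CREATE INDEX tree_path   ON tree (path);
INSERT INTO tree VALUES
    ('A', NULL, 'A'), ('B', 'A', 'A--B'), ('C', 'A', 'A--C'),
    ('D', 'C', 'A--C--D'), ('E', 'A', 'A--E');
""")

def direct_children(conn, item):
    """Items whose parent column is the given item."""
    return [row[0] for row in conn.execute(
        "SELECT item FROM tree WHERE parent = ? ORDER BY path", (item,))]

def all_descendants(conn, item):
    """Direct and indirect children: every path that extends the item's path."""
    (path,) = conn.execute(
        "SELECT path FROM tree WHERE item = ?", (item,)).fetchone()
    return [row[0] for row in conn.execute(
        "SELECT item FROM tree WHERE path LIKE ? ORDER BY path",
        (path + "--%",))]

def depth(path):
    """Number of items on the path, the equivalent of count(explode('--', $path))."""
    return len(path.split("--"))

print(direct_children(conn, "A"))
print(all_descendants(conn, "A"))
print(depth("A--C--D"))
```

Ordering by path also returns the rows in tree order, which is what makes this layout convenient for threaded output.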
I hope this helps!