How to implement a nested comment system? - php

What would be the ideal way to implement this sort of thing? The idea I have in my head right now is to have a comments table in which each comment has a thread identifier and a parent comment identifier. The thread identifier would indicate which thread the comment belongs to and would allow for a simple MySQL statement using a WHERE clause. Each comment would have an auto_increment identifier, as per usual database design, and the parent identifier column would indicate which comment this comment is a child of.
This type of design would put most of the stress on the PHP side of things, because it would take only one SQL call to get all comments from a thread. Another implementation I found uses one SQL query for each nesting level. That solution would place the stress on the SQL side of things.
How would SO implement this? Currently I'm at a loss because I am not sure which solution is the "best" solution, and I am still quite new to database design, PHP, and jQuery.
Thanks.

Look at Managing Hierarchical Data in MySQL, specifically the section called "Nested Set Model". You may have to read through it a few times before it makes sense (I did) but it's worth it. It's a very powerful way to work with nested data and retrieve the parts you want with only one query.
On the downside, for updates you have to do a lot more work.
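As a rough illustration of the nested set idea (a sketch, not code from the linked article): assuming a hypothetical comments table with lft and rgt columns, a single query ordered by lft returns a thread in display order, and each comment's depth can be recovered with a stack of right-hand values. The rows below are hard-coded stand-ins for that query's result, and all table and column names are assumptions.

```php
<?php
// Hypothetical rows, as they might come back from one query such as:
//   SELECT id, body, lft, rgt FROM comments WHERE thread_id = ? ORDER BY lft
// Table and column names are assumptions for illustration.
$rows = [
    ['id' => 1, 'body' => 'Root comment',   'lft' => 1, 'rgt' => 8],
    ['id' => 2, 'body' => 'First reply',    'lft' => 2, 'rgt' => 5],
    ['id' => 3, 'body' => 'Reply to reply', 'lft' => 3, 'rgt' => 4],
    ['id' => 4, 'body' => 'Second reply',   'lft' => 6, 'rgt' => 7],
];

/**
 * Adds a 'depth' key to each row. Expects rows sorted by lft.
 */
function addDepth(array $rows): array
{
    $open = []; // rgt values of the ancestors that are still "open"
    $out = [];
    foreach ($rows as $row) {
        // Close every ancestor whose interval ended before this node starts.
        while ($open && end($open) < $row['lft']) {
            array_pop($open);
        }
        $row['depth'] = count($open);
        $out[] = $row;
        $open[] = $row['rgt'];
    }
    return $out;
}

// Print the thread with two spaces of indentation per nesting level.
foreach (addDepth($rows) as $row) {
    echo str_repeat('  ', $row['depth']) . $row['body'] . PHP_EOL;
}
```

The read side stays this simple because the expensive part, renumbering lft and rgt when a comment is inserted, has already been paid at write time.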

Related

Creating generic comment model in Laravel 5

I have a comments table that can be useful in many parts of the project, not only in photos, for example. Is there a way to implement a generic Comment model/table that can work out where a comment should be shown?
Is this a good idea, or should I separate the comments for each area of the site? Like comments_photos, comments_songs, comments_videos, and so on?
Thanks.
In short, yes: you can create generic comments using polymorphic relations, as @pinkal vansia mentioned.
But in my humble opinion I wouldn't store all comments in a single table. One of the reasons for that would be performance. If you're going to have lots of comments where each row has its own type, the table will grow large and will need additional indexes to perform well.
Also, you should keep in mind that Laravel's ORM does not always cover your needs. Then you have to write manual SQL, which is going to be more complex.
And the main reason I'd be against it is that there would be only a couple of corner cases where you would need to work with all the comments in the system instead of a single type. You would therefore need to check for the correct type of comment each time you do something with it. So in my opinion it breaks the KISS principle without a good reason.

Multi-tiered / Hierarchical SQL : How does Reddit do it? Which is the most efficient way? And what databases make it simpler?

I've been reading up a bit on how multi-tiered commenting systems are built:
http://articles.sitepoint.com/article/hierarchical-data-database/2
I understand the two methods talked about in that article. In fact I went down the recursive path myself, and I can see how the "Modified Preorder Tree Traversal" method is very useful as well, but I have a few questions:
How well do these two methods perform in a large environment like Reddit's, where you can have thousands and thousands of multi-tiered comments?
Which method does Reddit use? It simply seems very costly, to me, to have to update thousands of rows if they use the MPTT method. I'm not deluding myself into thinking I am building a system to handle Reddit's traffic, this is simply curiosity.
There's another way of retrieving comments like this ... JOINs via SQL that return the rows with IDs defining their parents. How much slower/faster/better/worse would it be to simply take these unformatted results, loop through them and add them into a formatted array using my language of choice (PHP)?
After reading that sitepoint article, I believe I understand that Oracle offers this functionality in a much simpler, easier to use way, and MySQL does not. Are there any free databases that offer something similar to Oracle?
On a side note, how is SQL pronounced? I'm getting the feeling I've been wrong for the past several years by saying 'sequel' instead of 's - q - l', although "My Sequel" rolls easier off the tongue than "My S Q L"!
MPTT is easier to fetch (a single SQL query), but more expensive to update. Simply delegate the update to a background process (that's what queue managers are for). Also note that most of that update is a single SQL UPDATE command. It might take long to process, but a smart RDBMS could make the transaction visible (in cache) to new (read-only) queries before it's committed to disk.
I'd bet Reddit uses MPTT, not only doing the 'hard' update in the background but also quite likely doing a simple render to an in-memory cache. This way, the posting user can see his post immediately, without having to wait for so many rows to be updated. Also, SSDs do help in getting high transaction rates.
What you describe is called the adjacency model (or sometimes adjacency list). It's a more obvious way to do it, and simpler to update (it doesn't modify existing records), but FAR less efficient to read. You have to do a recursive walk of the tree, with an SQL query at each node. That's what kills you: the number of small queries.
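For what it's worth, when every row of a thread can be fetched with one query (for example by a shared thread identifier), the per-node queries can be avoided by assembling the tree in memory, which is roughly what the question proposes. A minimal sketch, assuming hypothetical id, parent_id and body columns and auto-increment ids that start at 1:

```php
<?php
// Hypothetical rows from one query such as:
//   SELECT id, parent_id, body FROM comments WHERE thread_id = ?
// parent_id is null for top-level comments. Names are assumptions.
$rows = [
    ['id' => 1, 'parent_id' => null, 'body' => 'Top-level comment'],
    ['id' => 2, 'parent_id' => 1,    'body' => 'Reply'],
    ['id' => 3, 'parent_id' => 2,    'body' => 'Nested reply'],
    ['id' => 4, 'parent_id' => null, 'body' => 'Another top-level comment'],
];

/**
 * Builds a nested array from flat adjacency-list rows.
 * Top-level rows are grouped under key 0, which assumes real ids start at 1.
 */
function buildTree(array $rows): array
{
    // Group rows by their parent so each lookup below is a single array access.
    $childrenOf = [];
    foreach ($rows as $row) {
        $childrenOf[$row['parent_id'] ?? 0][] = $row;
    }

    // Recursively attach children to each node, starting from the roots.
    $build = function ($parentId) use (&$build, $childrenOf): array {
        $nodes = [];
        foreach ($childrenOf[$parentId] ?? [] as $row) {
            $row['children'] = $build($row['id']);
            $nodes[] = $row;
        }
        return $nodes;
    };

    return $build(0);
}

$tree = buildTree($rows);
```

Rows whose parent is missing from the result set are silently dropped by this sketch, so it only suits cases where a whole thread is loaded at once.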
PostgreSQL has recursive SELECTs, which do in the server what you envision in PHP. It's better than PHP because it's closer to the data; but it still has the same (huge) number of random-access disk seeks.
You should have a closer look at the links under Further reading at the end of the article. The "Four ways to work with hierarchical data" article on evolt linked there provides another way to approach this problem (the flat table). Since that approach is extremely easy to implement for a threaded discussion board, I wouldn't be surprised if Reddit uses it (or a variation on the theme).
I do like MPTT (aka nested set) though, and have used it for hierarchies that are (almost) static.

How do I write object classes effectively when dealing with table joins?

I should start by saying I'm not now, nor do I have any delusions I'll ever be a professional programmer so most of my skills have been learned from experience very much as a hobby.
I learned PHP as it seemed a good simple introduction in certain areas and it allowed me to design simple web applications.
When I learned about objects, classes, etc., the tutor's basic examples covered the idea that, as a rule of thumb, each database table should have its own class. While that worked well for the photo gallery project we wrote, as it had very simple MySQL queries, it's not working so well now that my projects are getting more complex. If I require data from two separate tables, which needs a table join, I've instead been ignoring the class altogether and handling it on a case-by-case basis, OR, even worse, been combining some of the data into the class and the rest as a separate entity and doing two queries, which to me seems inefficient.
As an example, when viewing content on a forum I wrote, if you view a thread, I retrieve data from the threads table, the posts table and the user table. The queries from the user and posts table are retrieved via a join and not instantiated as an object, whereas the thread data is called using my Threads class.
So how do I get from my current state of affairs to something a little less 'stupid', for want of a better word? Right now I have a DB class that deals with connection and escaping values etc., a parent db query class that deals with the common queries and methods, and all of the other classes (Thread, Upload, Session, Photo, and ones that aren't used: Post, User, etc.) are children of that.
Do I make a big posts class that has the relevant extra attributes that I retrieve from the users (and potentially threads) table?
Do I have separate classes that populate each of their relevant attributes with a single query? If so how do I do that?
Because of the way my classes are written, based on what I was taught, my db update row method and my insert method both just take the attributes as an array and write all of them. If I have extra attributes from other db tables in each class, how do I rewrite those methods? Obviously, updating automatically like that would result in errors.
In short I think my understanding is limited right now and I'd like some pointers when it comes to the fundamentals of how to write more complex classes.
Edit:
Thanks for the answers so far they've given me lots of pointers and thoughts and a lot of reading material. What I would like though is maybe an idea of how different people have decided to handle a simple table join with any amount of classes? Did you add attributes to the classes? Query from outside the class then pass the results into each class? Something else?
Entire books have been written about how to design a set of classes to fit a database schema.
Long story short: there is no one-size-fits-all way to do it; you have to make a lot of design decisions about the trade-offs you want to make on an application-by-application basis.
You can find a library or framework to help, keywords: ActiveRecord, ORM (Object Relational Mapper)
P.S. You have no idea of the potential for soul-killing analysis paralysis and over-designing you can get into. Do the simplest thing that can possibly work for your app.
Code sample for my (below) comment:
$post = new PublishedPost($data);
$edit = $post->setTitle($newTitle);
$edit->save();
This is too broad to be answered without going into epic length.
Basically, there are four prominent Data Source Architectural Patterns from Patterns of Enterprise Application Architecture: Table Data Gateway, Row Data Gateway, Active Record and Data Mapper. These can be found implemented in the common PHP frameworks in some variation. These are easy to grasp and implement.
Where it gets difficult is when you start to tackle the impedance mismatch between the database and the business objects in your application. To do so, there are a number of Object-Relational Behavioral, Structural and Metadata Mapping Patterns, like Identity Maps, Lazy Loading, Query Objects, Repositories, etc. Explaining these is beyond scope. They cover almost 200 pages in PoEAA.
What you can look at is Doctrine or Propel, the two best-known PHP ORMs, which implement most of these patterns and which you could use in your application to replace your current database access handling.
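To make the Data Mapper idea a little more concrete for the join case in the question, here is a minimal hand-rolled sketch. It is not how Doctrine or Propel do it; the Post, User and PostMapper classes, the column aliases and the query are all hypothetical. The point is only that the query lives in the mapper and one joined row is turned into two plain objects.

```php
<?php
// Plain value objects; neither knows anything about the database.
final class User
{
    public int $id;
    public string $name;

    public function __construct(int $id, string $name)
    {
        $this->id = $id;
        $this->name = $name;
    }
}

final class Post
{
    public int $id;
    public string $body;
    public User $author;

    public function __construct(int $id, string $body, User $author)
    {
        $this->id = $id;
        $this->body = $body;
        $this->author = $author;
    }
}

// Hypothetical mapper: the only place that knows the column names.
final class PostMapper
{
    // The join this mapper would run; shown for context, not executed here.
    public const SELECT_BY_THREAD =
        'SELECT p.id AS post_id, p.body, u.id AS user_id, u.name AS user_name
           FROM posts p JOIN users u ON u.id = p.user_id
          WHERE p.thread_id = ?';

    // Turns one joined row into a Post that carries its User.
    public function fromRow(array $row): Post
    {
        $author = new User((int) $row['user_id'], $row['user_name']);
        return new Post((int) $row['post_id'], $row['body'], $author);
    }
}

// A hard-coded row standing in for one fetched result row.
$row = ['post_id' => '10', 'body' => 'Hello', 'user_id' => '7', 'user_name' => 'alice'];
$post = (new PostMapper())->fromRow($row);
echo $post->author->name . ': ' . $post->body . PHP_EOL;
```

Because Post holds a User object rather than extra user columns, a save method on the mapper could write only the post's own fields, which sidesteps the "update everything in the attribute array" problem described in the question.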
Many of your worries can be answered by inspecting the existing solutions found in well-tested frameworks such as CakePHP, symfony and Zend Framework. Examining their approaches and peeking under the hood should shed light on your questions. Who knows? You may even decide to write future projects using them!
They've spent years putting their heads together to tackle these problems. Take advantage!
Check out Doctrine:
Here is an example of a forum application using Doctrine.
http://www.doctrine-project.org/documentation/manual/1_2/en/real-world-examples#forum-application

Reasons why you wouldn't use a foreign key? [php + MySQL]

I'm working on an old web application my company uses to create surveys. I looked at the database schema through the mysql command prompt and thought the tables looked pretty solid. Though I'm not a DB guru I'm well versed in the theory behind it (having taken a few database design courses in my software engineering program).
That being said, I dumped the create statements into an SQL file and imported them in MySQL Workbench and saw that they make no use of any "actual" foreign keys. They'll store another table's primary key like you would with a FK but they don't declare it as one.
So seeing how their DB is designed the way I would through what I know (minus the FK issue) I'm left wondering that maybe there's a reason behind it. Is this a case of lazy programming or could you get some performance gains by doing all the error check programmatically?
In case you'd like an example, they basically have Surveys, and a survey has a series of Questions. A question is part of a survey, so it holds the survey's PK in a column. That's pretty much it, but they use it everywhere.
I'd appreciate any insight :) (I understand that this question might not have a right/wrong answer but I'm looking more for some information on why they would do this as this system has been pretty solid ever since we started using it so I'm led to believe that these guys knew what they were doing)
The original developers might have opted to use MyISAM or any other storage engine that does not support foreign key constraints.
MySQL only supports the defining of actual foreign key relationships on InnoDB tables, maybe yours are MyISAM, or something else?
More important is that the proper columns have indices defined on them (so the ones holding the PK of another table should be indexed). This is also possible in MyISAM.
As a general point, keys speed up reads (if they are applicable to the read taking place, they help the optimizer) and slow down writes (because they add overhead to the tables).
In the vast majority of cases the improvement of speed for reading and maintenance of referential integrity outweighs the minor overhead they add to writes.
This distinction has been blurred by caching, mirroring, etc., as so many reads on the very big sites don't actually hit the 'live' database, but this is not very relevant unless you are working for Amazon, Twitter or the like.
On very large databases (the type that Teradata supports) you find that they don't use foreign keys. The reason is performance. Every time you write to the database, which happens often enough in a data warehouse, you have the added overhead of checking all the FKs on a table. If you already know the data to be consistent, what's the point?
Good design on a small db would just mean you put them in, but there are performance gains to be had by leaving them out.
You don't really have to use foreign keys.
If you don't have them, data might become inconsistent and you won't be able to use cascading deletes and updates.
If you have them, you might lose some of the users' data due to a bug in your SQL statements introduced by schema changes.
Some prefer to have them, some prefer life without them. There are no real advantages in either case.
Here is a real life instance where I'm not using a foreign key.
I needed a way to store a parent child relationship where the child may not exist, and the child is an abstract class. Since the child could be of a few types, I use one field to name the type of the child and one field to list the id of the child. The application handles most of the logic.
I'm not sure if this was the best design decision, but it was the best I could come up with under the deadline. It's been working well so far!

MySQL stored procedure vs. multiple selects

Here's my scenario:
I've got a table of (let's call them) nodes. Primary key on each one is simply "node_id".
I've got a table maintaining a hierarchy of nodes, with only two columns: parent_node_id and child_node_id.
The hierarchy is maintained in a separate table because nodes can have an N:N relationship. That is to say, one node can have multiple children, and multiple parents.
If I start with a node and want to get all of its ancestors (i.e. everything higher up the hierarchy), I could either do several selects, or do it all in one stored procedure.
Anyone with any practical experience with this question know which one is likely to have the best performance? I've read things online that recommend both ways.
"which one is likely to have the best performance? " : No one can know ! The only thing you can do is try both and MEASURE. That's sadly enough the main answer to all performance related questions... except in cases where you clearly have a O(n) difference between algorithms.
And, by the way, "multiple parents" does not make a hierarchy (otherwise I would recommend reading some books by Joe Celko) but a DAG (Directed Acyclic Graph), a much harder beast to tame...
If performance is your concern, then that schema design is not going to work as well for you as others could.
See More Trees & Hierarchies in SQL for more info.
I think a general statement could be misleading, because it depends on how your queries and the stored procedure make use of the indices.
To give a useful answer, it would be necessary to compare the SQL of your selects with that of the stored procedure.
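For comparison purposes, the "several selects" option from the question can be sketched as one lookup per level of the graph rather than one per node. In the sketch below the edge list is a hard-coded stand-in for the two-column table, and parentsOf() plays the role of a single SELECT with an IN (...) list; the table name and both function names are hypothetical.

```php
<?php
// Hypothetical contents of the two-column hierarchy table:
// each pair is [parent_node_id, child_node_id].
$edges = [
    [1, 3],
    [2, 3],
    [3, 4],
    [1, 4],
];

/**
 * Stand-in for one query such as:
 *   SELECT parent_node_id FROM hierarchy WHERE child_node_id IN (...)
 */
function parentsOf(array $childIds, array $edges): array
{
    $parents = [];
    foreach ($edges as [$parent, $child]) {
        if (in_array($child, $childIds, true)) {
            $parents[] = $parent;
        }
    }
    return array_values(array_unique($parents));
}

/**
 * Collects all ancestors of a node, issuing one lookup per level.
 * The $seen set stops nodes reachable by several paths from being revisited.
 */
function ancestorsOf(int $nodeId, array $edges): array
{
    $seen = [];
    $frontier = [$nodeId];
    while ($frontier) {
        $next = [];
        foreach (parentsOf($frontier, $edges) as $parent) {
            if (!isset($seen[$parent])) {
                $seen[$parent] = true;
                $next[] = $parent;
            }
        }
        $frontier = $next;
    }
    $ids = array_keys($seen);
    sort($ids);
    return $ids;
}
```

The number of round trips then depends on the depth of the graph, not on the number of nodes, which is one of the things worth measuring against a stored procedure.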
