Database/code design for limiting hierarchical comments at certain level? - php

I'm building a small commenting app with a PHP backend, a React frontend and PostgreSQL as the database. All comments are stored in a single self-referencing table, comment.
\d+ comment:
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------------+--------------------------+-----------+----------+-------------------------------------+----------+--------------+-------------
id | bigint | | not null | nextval('comment_id_seq'::regclass) | plain | |
website_page_id | bigint | | not null | | plain | |
author_id | bigint | | not null | | plain | |
parent_id | bigint | | | | plain | |
content | text | | | | extended | |
deleted_date | timestamp with time zone | | | | plain | |
updated_date | timestamp with time zone | | not null | | plain | |
created_date | timestamp with time zone | | not null | | plain | |
The client requests all comments, the backend runs a recursive query against the database to fetch them and returns them in the appropriate format, and then I render them.
Here is JSON of parent comment:
{
id: 1
author_id: 1
content: "Some content"
created_date: "2019-05-29 06:11:43+00"
depth: 0
parent_id: null
replies: [...]
updated_date: "2019-05-29 06:11:43+00"
website_page_id: null
}
So each comment has a depth parameter, which I use to set the indentation. I don't nest comments recursively (comment -> replies -> comment -> replies); there is only a comment and a flat list of all its replies. I do extra processing on the backend to produce this form; PostgreSQL just returns the rows as they are, with the depth value.
I have a form for creating new comments and replies to existing comments. So far replies can nest as deep as they like (I'm not sure about database limitations).
Here are my concerns:
I don't want to nest forever because it kills performance (I assume). Does it really? Also, it seems reasonable to limit nesting to n levels by default so that it does not run off the screen on the client side.
I'm not sure where and how to enforce the limit: at the database level, on the backend or on the client side?
I have only one idea for solving it, and so far it does not seem to be an elegant solution. Here it is:
Ignore the nesting at the database level and just limit the indentation on the client side, so if I define 5 levels as the maximum, anything deeper is still indented at level 5. It works, but it does not help database performance.
I am pretty sure there are other possible ways to do this, help would be appreciated!

Recursive queries are really fast when they can take advantage of an index. It will probably take more time to nest the results in JavaScript. The nesting limit is mostly a UI concern and is not difficult to apply when fetching:
with recursive
comment_node (comment_id, parent_id, level) as (
-- anchor: top-level comments of the page
select id, parent_id, 1::int4 as level
from comment
where website_page_id = $1
and parent_id is null
union all
-- recursive step: replies, stopping once the parent is at level 5
select c.id, c.parent_id, parent.level + 1 as level
from comment as c
inner join comment_node as parent
on parent.comment_id = c.parent_id
and parent.level < 5
)
select cn.comment_id, cn.level, cn.parent_id, c.content, a.name, ...
from comment as c
join comment_node as cn
on cn.comment_id = c.id
join author as a
using (author_id)
Preventing the insertion of comments nested 5 or more levels deep is probably not a meaningful database constraint, because deeper nesting does not break data consistency.
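To see the effect of the level cut-off, here is a minimal sketch that runs the same kind of recursive query against an in-memory SQLite database from Python. SQLite stands in for PostgreSQL only so that the example is self-contained; the function name fetch_comments, the max_level parameter and the sample data are assumptions made for illustration.

```python
import sqlite3

def fetch_comments(conn, page_id, max_level):
    """Return (id, parent_id, level) rows for one page, stopping at max_level."""
    sql = """
        WITH RECURSIVE comment_node (comment_id, parent_id, level) AS (
            -- anchor: top-level comments of the page
            SELECT id, parent_id, 1
            FROM comment
            WHERE website_page_id = :page AND parent_id IS NULL
            UNION ALL
            -- recursive step: replies whose parent is still above the limit
            SELECT c.id, c.parent_id, parent.level + 1
            FROM comment AS c
            JOIN comment_node AS parent
              ON parent.comment_id = c.parent_id
             AND parent.level < :max_level
        )
        SELECT comment_id, parent_id, level
        FROM comment_node
        ORDER BY level, comment_id
    """
    return conn.execute(sql, {"page": page_id, "max_level": max_level}).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comment ("
    "id INTEGER PRIMARY KEY, website_page_id INTEGER NOT NULL, "
    "parent_id INTEGER, content TEXT)"
)
# Hypothetical data: one chain of seven comments, each replying to the previous one.
conn.executemany(
    "INSERT INTO comment VALUES (?, 1, ?, ?)",
    [(i, i - 1 if i > 1 else None, f"comment {i}") for i in range(1, 8)],
)

rows = fetch_comments(conn, page_id=1, max_level=5)
for row in rows:
    print(row)
```

With a limit of 5, comments 6 and 7 are simply not fetched. If they should still be shown, the asker's alternative of fetching everything and capping only the indentation would be the option to pick.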

Related

finding next ID in a CHAR field without AUTO_INCREMENT

I have a table which stores objects. An object can be anything from a chair to an employee. Each object has an ObjectID, which is a 10-character Code 39 barcode label on the object.
Many objects already have a label, and thus an ObjectID assigned to them. Some have prefixes, e.g. "9000000345" might be a desk or "0000000895" might be a folder with invoices.
When people start a new folder, for example, they take pre-printed barcode labels and put them on it. The pre-printed barcode labels are generated by a printer which just increases a number by 1, zero-fills it to 10 digits and then prints it as Code 39.
Most of the objects are stored in Excel sheets. They should now be migrated into a MySQL database.
The system should also be able to create objects on its own. Objects created by the system have a leading "1", e.g. "1000000426".
The problem: how do I get the next ObjectID for auto-generated objects?
I can't really use AUTO_INCREMENT because there are also non-auto-generated rows in the table.
Another thing to mention is that the 'ObjectID' field has to be CHAR(10), because on special occasions alphanumeric prefixes were used, like "T1" -> "T100003158".
My Table when using AUTO_INCREMENT:
| ID | Created | ObjectID | Parent | Title | Changed | Note |
|----|-------------|--------------|--------|-------------|-------------|------|
| 1 | <timestamp> | "1000000001" | NULL | "Shelf 203" | <timestamp> | NULL |
| 2 | <timestamp> | "9000000458" | NULL | "Lamp" | <timestamp> | NULL |
| 3 | <timestamp> | "1000000003" | NULL | "Shelf 204" | <timestamp> | NULL |
The ObjectID of the last Object in the table should be "1000000002" not "1000000003"
I hope I could explain the Problem well enough.
A naive solution could be:
SELECT CAST(ObjectID AS UNSIGNED) + 1 FROM yourTable WHERE ObjectId LIKE "1%" ORDER BY ObjectID DESC LIMIT 1
Basically, search for all ObjectIDs starting with "1", sort them (because they are zero-padded, string order matches numeric order), take the highest one, cast it to an integer and increment it.
It might be faster to cast to an integer first and then filter with BETWEEN. The rest would be the same.
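The same rule can be sketched outside the database. The Python function below is only an illustration: the name next_auto_object_id, the strict pattern "1 followed by nine digits" and the starting value are assumptions, not part of the original answer.

```python
import re

# Assumption: system-generated IDs are exactly "1" followed by nine digits.
AUTO_ID = re.compile(r"1[0-9]{9}")

def next_auto_object_id(existing_ids):
    """Return the next system-generated ObjectID as a 10-character string."""
    auto_numbers = [int(oid) for oid in existing_ids if AUTO_ID.fullmatch(oid)]
    # Start at 1000000001 when no auto-generated object exists yet.
    next_number = max(auto_numbers, default=1_000_000_000) + 1
    if next_number > 1_999_999_999:
        raise OverflowError("auto-generated ObjectID range is exhausted")
    return str(next_number)

ids = ["1000000001", "9000000458", "T100003158", "0000000895"]
print(next_auto_object_id(ids))  # 1000000002
```

Whichever side computes the value, two concurrent inserts could arrive at the same ID, so a unique index on ObjectID plus a retry on conflict is probably worth having.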

How should I Query this in mysql

I have a web app in which I show a series of posts based on this table schema (there are thousands of rows like this and other columns too (removed as not required for this question)) :-
+---------+----------+----------+
| ID | COL1 | COL2 |
+---------+----------+----------+
| 1 | NULL | ---- |
| 2 | --- | NULL |
| 3 | NULL | ---- |
| 4 | --- | NULL |
| 5 | NULL | NULL |
| 6 | --- | NULL |
| 7 | NULL | ---- |
| 8 | --- | NULL |
+---------+----------+----------+
And I use this query :-
SELECT * from `TABLE` WHERE `COL1` IS NOT NULL AND `COL2` IS NULL ORDER BY `COL1`;
And the resultant result set I get is like:-
+---------+----------+----------+
| ID | COL1 | COL2 |
+---------+----------+----------+
| 12 | --- | NULL |
| 1 | --- | NULL |
| 6 | --- | NULL |
| 8 | --- | NULL |
| 11 | --- | NULL |
| 13 | --- | NULL |
| 5 | --- | NULL |
| 9 | --- | NULL |
| 17 | --- | NULL |
| 21 | --- | NULL |
| 23 | --- | NULL |
| 4 | --- | NULL |
| 32 | --- | NULL |
| 58 | --- | NULL |
| 61 | --- | NULL |
| 43 | --- | NULL |
+---------+----------+----------+
Notice that the IDs column is jumbled thanks to the order by clause.
I have proper indexes to optimize these queries.
Now, let me explain the real problem. I have a lazy-load kind of functionality in my web-app. So, I display around 10 posts per page by using a LIMIT 10 after the query for the first page.
We are good till here. The real problem comes when I have to load the second page. What do I query now? I do not want posts to be repeated. New posts come in almost every 15 seconds and go to the top of the result set (by top I literally mean the first row). I do not want to display these latest posts on the second or third pages, but they change the size of the result set, so I cannot use LIMIT 10,10 for the 2nd page and so on, because posts would be repeated.
Now, all I know is the last ID of the post that I displayed, say 21 here. So I want to display the posts with IDs 23, 4, 32, 58, 61, 43 (refer to the result set table above). Do I load all the rows without a LIMIT clause and display the 10 IDs occurring after ID 21? For that I would have to iterate over thousands of useless rows. But I cannot use a LIMIT clause for the 2nd, 3rd... pages, that is for sure. Also, the IDs are jumbled, so I definitely cannot use WHERE ID>.... So, where do we go now?
I'm not sure if I've understood your question correctly, but here's how I think I would do it:
Add a timestamp column to your table, let's call it date_added
When displaying the first page, use your query as-is (with LIMIT 10) and hang on to the timestamp of the most recent record; let's call it last_date_added.
For the 2nd, 3rd and subsequent pages, modify your query to filter out all records with date_added > last_date_added, and use LIMIT 10, 10, LIMIT 20, 10, LIMIT 30, 10 and so on.
This will have the effect of freezing your resultset in time, and resetting it every time the first page is accessed.
Notes:
Depending on the ordering of your resultset, you might need a separate query to obtain the last_date_added. Alternatively, you could just cut off at the current time, i.e. the time when the first page was accessed.
If your IDs are sequential, you could use the same trick with the ID.
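The cut-off idea above can be sketched as follows, using Python with an in-memory SQLite database so that it runs on its own. The table name posts, the integer date_added clock, the descending sort and the helper fetch_page are assumptions for illustration; in MySQL the paging clause would be written LIMIT 10, 10 as in the answer.

```python
import sqlite3

def fetch_page(conn, cutoff, page, page_size=10):
    """Return the IDs on a 0-based page, ignoring rows added after cutoff."""
    sql = """
        SELECT id FROM posts
        WHERE col1 IS NOT NULL AND col2 IS NULL AND date_added <= ?
        ORDER BY col1 DESC, id DESC
        LIMIT ? OFFSET ?
    """
    params = (cutoff, page_size, page * page_size)
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, col1 INTEGER, col2 TEXT, date_added INTEGER)"
)
# Hypothetical data: 25 posts; date_added is a plain integer clock here.
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, NULL, ?)",
    [(i, i * 10, i) for i in range(1, 26)],
)

# The cut-off is taken once, when the first page is requested.
cutoff = conn.execute("SELECT MAX(date_added) FROM posts").fetchone()[0]
page1 = fetch_page(conn, cutoff, 0)

# A new post arrives and would sort to the top of the result set.
conn.execute("INSERT INTO posts VALUES (26, 999, NULL, 26)")
page2 = fetch_page(conn, cutoff, 1)

print(page1)
print(page2)
```

Because the new row is filtered out by the cut-off, the offsets of the older rows do not shift between requests. If existing rows can enter the result set later (for example when COL1 changes from NULL to a value), the cut-off would have to be compared against whichever column records that moment.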
Hmm..
I thought for a while and came up with 2 solutions. :-
Store the IDs of the posts already displayed and query WHERE ID NOT IN(id1,id2,...). But that costs extra memory, and if the user loads 100 pages and the IDs are in the 100000s, a single GET request would not be able to carry the list, at least not in all browsers. A POST request could be used instead.
Alter the way you display posts from COL1. I don't know if this would be a good way for you, but it can save you bandwidth and make your code cleaner. It may also be a better way. I would suggest this: SELECT * from TABLE where COL1 IS NOT NULL AND COL2 IS NULL AND Id>.. ORDER BY ID DESC LIMIT 10,10. This can change the way you display your posts considerably. But since you said in your comments that you check whether a post meets a criterion and then change COL1 from NULL to the current timestamp, I guess that the newer the post, the higher up you want to display it. It's just an idea.
I assume new posts will be added with a higher ID than the current max ID right? So couldn't you just run your query and grab the current max ID. Then when you query for page 2 do the same query but with "ID < max_id". This should give you the same result set as your page 1 query because any new rows will have ID > max_id. Hope that helps?
How about?
ORDER BY `COL1`,`ID`;
This would always put IDs in order. This will let you use:
LIMIT 10,10
for your second page.

Omit / ignore any records that have been purchased by a user

I'm currently in the process of developing a site that amongst other things allows a user to filter a marketplace by showing or hiding items they have already purchased. This works on a basic AJAX call that passes through the current conditions of those filters available, and then using CodeIgniter's active record, it builds the appropriate query.
My issue is wrapping my head around the query so that if a user selects to hide purchased items the query omits / ignores any relevant records (i.e. if user_id = 5 and hide purchased is true, any scenes that user_id = 5 owns are not returned in the query).
Tbl: scenes
-------------------------------------------------------------------------
| design_id | scene_id | scene_name | ... [irrelevant columns to the Q] |
|-----------|----------|------------|-----------------------------------|
| 1 | 1 | welcome | |
| 1 | 2 | hello | |
| 2 | 3 | asd | |
-------------------------------------------------------------------------
The designs table is very similar to this and includes references to the game, game type, design name and so forth.
Tbl: user_scenes
----------------------------------------------------------------------
| design_id | scene_id | user_id | ... [irrelevant columns to the Q] |
|-----------|----------|---------|-----------------------------------|
| 1 | 1 | 5 | |
| 1 | 2 | 5 | |
| 1 | 1 | 9 | |
----------------------------------------------------------------------
Query
SELECT `designs`.`design_id`, `designs`.`design_name`, `scenes`.`scene_id`, `scenes`.`scene_name`, `scenes`.`scene_description`, `scenes`.`scene_unique_code`, `scenes`.`date_created`, `scenes`.`scene_cost`, `scenes`.`type`, `games`.`game_title`, `games`.`game_title_short`, `games_genres`.`genre`
FROM (`scenes`)
JOIN `designs` ON `designs`.`design_id` = `scenes`.`design_id`
JOIN `games` ON `designs`.`game_id` = `games`.`game_id`
JOIN `games_genres` ON `games`.`genre_id` = `games_genres`.`genre_id`
WHERE `scenes`.`private` = 0
ORDER BY `designs`.`design_name` asc, `scenes`.`scene_name` asc
LIMIT 6
The query uses CodeIgniter's active record ($this->db->select() / $this->db->where()) but that is somewhat irrelevant.
--
I've tried things like an INNER JOIN with user_scenes and then grouping by scene_id, but that presents an issue with only returning scenes that are present in user_scenes. I then made an attempt at a subquery but then questioned whether that was the correct route.
I understand there are other ways - looping through the returned data and querying whether that record exists for a specific user, but that I suspect would be highly inefficient. As such, I'm at a loss as to what to try and would appreciate any help.
I don't know if your setup permits it, but I would do a subselect:
Either via a NOT IN:
SELECT * FROM `scenes`
WHERE `scenes`.`scene_id` NOT IN (SELECT `scene_id` FROM `user_scenes` WHERE `user_id` = 5)
Or maybe via a LEFT JOIN:
SELECT * FROM `scenes`
LEFT JOIN (SELECT `scene_id`, `user_id` FROM `user_scenes` WHERE `user_id` = 5) AS `user_scenes`
ON `scenes`.`scene_id` = `user_scenes`.`scene_id`
WHERE `user_scenes`.`user_id` IS NULL
But I guess the first way is faster.
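Both variants can be tried on the sample rows from the question. The sketch below uses Python with an in-memory SQLite database purely so that it is runnable as-is; the helper unpurchased is hypothetical, and the LEFT JOIN puts the user filter in the ON clause instead of a derived table, which should give the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scenes (design_id INTEGER, scene_id INTEGER PRIMARY KEY, scene_name TEXT);
    CREATE TABLE user_scenes (design_id INTEGER, scene_id INTEGER, user_id INTEGER);
    INSERT INTO scenes VALUES (1, 1, 'welcome'), (1, 2, 'hello'), (2, 3, 'asd');
    INSERT INTO user_scenes VALUES (1, 1, 5), (1, 2, 5), (1, 1, 9);
""")

# Variant 1: exclude scenes whose ID appears in the user's purchases.
NOT_IN = """
    SELECT scene_id FROM scenes
    WHERE scene_id NOT IN (SELECT scene_id FROM user_scenes WHERE user_id = ?)
    ORDER BY scene_id
"""

# Variant 2: keep only scenes for which the join found no purchase row.
LEFT_JOIN = """
    SELECT s.scene_id FROM scenes AS s
    LEFT JOIN user_scenes AS us
      ON us.scene_id = s.scene_id AND us.user_id = ?
    WHERE us.scene_id IS NULL
    ORDER BY s.scene_id
"""

def unpurchased(sql, user_id):
    """Run one of the two queries and return the scene IDs the user does not own."""
    return [row[0] for row in conn.execute(sql, (user_id,))]

print(unpurchased(NOT_IN, 5), unpurchased(LEFT_JOIN, 5))
print(unpurchased(NOT_IN, 9), unpurchased(LEFT_JOIN, 9))
```

One difference worth keeping in mind: if user_scenes.scene_id could ever be NULL, the NOT IN form returns no rows at all, while the LEFT JOIN form is unaffected. Which one is faster depends on the database and its indexes, so measuring on real data is the safer guide.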

Is there a way to get nested data out of MySQL without using recursion?

So let us say that I have a menu system with all the navigation items stored in a MySQL table like so:
Table: Menu
-------------------------------------------------------
| id | title | url | parent_id |
-------------------------------------------------------
| 1 | Home | /home | 0 |
| 2 | About | /about | 0 |
| 3 | History | /about/history | 2 |
| 4 | Location | /about/location | 2 |
| 5 | Staff | /about/staff | 2 |
| 6 | Articles | /blog | 0 |
| 7 | Archive | /blog/archive | 6 |
| 8 | Tags | /blog/tags | 6 |
| 9 | Tag Name 1 | /blog/tags/tag-name-1 | 8 |
| 10 | Tag Name 2 | /blog/tags/tag-name-2 | 8 |
-------------------------------------------------------
As you can see this table is quite simple with the only complication being the self referencing column parent_id, which defines how the menu should be nested.
So this would produce the following menu:
- Home
- About
  - History
  - Location
  - Staff
- Articles
  - Archive
  - Tags
    - Tag Name 1
    - Tag Name 2
Is there a way to get this structure from the aforementioned table without making use of a recursive function in PHP (but it could be Python, Java or any other language) that queries the database with each iteration?
Ideally this could be handled with one MySQL query. Perhaps the table structure needs to be changed to accommodate this - if so how?
You could pull all of it out in a single query and then work with it recursively in PHP. That way you save query time at the cost of a little scripting time.
I would do something like this:
Get all rows, ordered by parent ID.
Put each row into $data[$parent_id][].
Define a function that builds the menu; it takes one parameter, the ID of the parent whose children to render.
In the function, get $data[$id] and build the menu level from that array.
While looping through the items, check whether the size of $data[current-item-id] is greater than 0.
If so, call the function again with the current item's ID as the parameter; the initial call uses 0, the parent_id of the top-level items.
This way you query the database only once, but use a little more of the server's RAM.
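The steps above can be sketched as follows. The answer describes PHP; this version is in Python (the question allows any language), and the function name build_menu, the row layout and the dictionary shape are assumptions for illustration. The rows stand in for the result of one SELECT over the Menu table.

```python
from collections import defaultdict

def build_menu(rows, root_id=0):
    """Turn flat (id, title, url, parent_id) rows into a nested list of dicts."""
    # Group rows by parent, the equivalent of $data[$parent_id][].
    children = defaultdict(list)
    for item_id, title, url, parent_id in rows:
        children[parent_id].append((item_id, title, url))

    def build(parent_id):
        # Recursion happens in memory only; the database was queried once.
        return [
            {"id": item_id, "title": title, "url": url, "children": build(item_id)}
            for item_id, title, url in children.get(parent_id, [])
        ]

    return build(root_id)

# A subset of the Menu table from the question.
rows = [
    (1, "Home", "/home", 0),
    (2, "About", "/about", 0),
    (3, "History", "/about/history", 2),
    (6, "Articles", "/blog", 0),
    (8, "Tags", "/blog/tags", 6),
    (9, "Tag Name 1", "/blog/tags/tag-name-1", 8),
]
menu = build_menu(rows)
print([item["title"] for item in menu])
```

Grouping by parent first keeps the whole build linear in the number of rows, since each row is visited once while grouping and once while nesting.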
If you're fetching the whole tree and you can't or don't want to change the table structure, take a look at https://stackoverflow.com/a/8325451/4833
This can be done in sql query, take a look at this resource which explains recursion in a query
http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20.html.
MySQL versions before 8.0 have no built-in way to do that (8.0 added recursive common table expressions).
You can write a procedure with a loop to get the result you want, or create a function and use it in your SQL select.
Either way you will use a loop.
Example:
DROP PROCEDURE IF EXISTS famsubtree;
DELIMITER go
CREATE PROCEDURE famsubtree( root INT )
BEGIN
DROP TABLE IF EXISTS famsubtree;
CREATE TABLE famsubtree
SELECT childID, parentID, 0 AS level
FROM familytree
WHERE parentID = root;
ALTER TABLE famsubtree ADD PRIMARY KEY(childID,parentID);
REPEAT
INSERT IGNORE INTO famsubtree
SELECT f.childID, f.parentID, s.level+1
FROM familytree AS f
JOIN famsubtree AS s ON f.parentID = s.childID;
UNTIL Row_Count() = 0 END REPEAT;
END ;
go
DELIMITER ;
And use to query:
call famsubtree(1); -- from the root you can see forever
SELECT Concat(Space(level),parentID) AS Parent, Group_Concat(childID ORDER BY childID) AS Child
FROM famsubtree
GROUP BY parentID;

What is the algorithm behind nested comments?

I want to learn the comment displaying algorithm behind Reddit. How is a comment related with its child and so on? How they are stored in the database?
Lets say
comment1
-comment2
--comment3
-comment4
--comment5
--comment6
---comment7
----comment8
comment9
How to display comment5 which is after comment4 which is after comment1? What is the idea behind this sequencing? And how to relate them in the database?
It is called hierarchy. Each comment either has no parent comment, or has one parent comment. This way you can display every "top level" comment (thanks to the fact they have no parent comments), then child comments for each of them etc. etc.
And the database structure may look like this for comments table:
id field identifying single comment,
parent_id being set to parent's ID or not set (set to NULL or set to 0),
created - timestamp for comment creation,
content - actual comment content,
any additional field you need,
As @Rafe said, the actual storage is pretty easy, it would be something like:
| id | name | parent |
| 1 | comment1 | 0 |
| 2 | comment2 | 1 |
| 3 | comment3 | 2 |
| 4 | comment4 | 1 |
| 5 | comment5 | 4 |
| 6 | comment6 | 4 |
| 7 | comment7 | 6 |
| 8 | comment8 | 7 |
| 9 | comment9 | 0 |
Of course, actually getting information out of this is (arguably) the hard part. Getting the direct children of a comment is simple: SELECT * FROM table WHERE parent='4' returns all the children of comment4. Counting all descendants or listing them in hierarchical order is a bit harder. Other answers may provide more information on that.
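Listing the comments in hierarchical order can be sketched as a depth-first walk over the (id, name, parent) rows once they are loaded into memory. This is a minimal Python illustration, not how Reddit does it; the function name thread_lines and the dash prefix are assumptions that mirror the example in the question.

```python
from collections import defaultdict

def thread_lines(rows, root=0):
    """Return comments in display order, prefixed with one '-' per nesting level."""
    children = defaultdict(list)
    for comment_id, name, parent in rows:
        children[parent].append((comment_id, name))

    lines = []
    # An explicit stack replaces recursion, so very deep threads are not a problem.
    # Items are pushed in reverse so that they pop in their original order.
    stack = [(cid, name, 0) for cid, name in reversed(children.get(root, []))]
    while stack:
        comment_id, name, depth = stack.pop()
        lines.append("-" * depth + name)
        for child_id, child_name in reversed(children.get(comment_id, [])):
            stack.append((child_id, child_name, depth + 1))
    return lines

rows = [
    (1, "comment1", 0), (2, "comment2", 1), (3, "comment3", 2),
    (4, "comment4", 1), (5, "comment5", 4), (6, "comment6", 4),
    (7, "comment7", 6), (8, "comment8", 7), (9, "comment9", 0),
]
print("\n".join(thread_lines(rows)))
```

Each comment is printed right after its parent and before the parent's next sibling, which is why comment5 appears under comment4, which in turn appears under comment1.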
Pretty much what @Rafe Kettler noted - comments can have parent columns. However, if you want a more detailed and in-depth algorithm to use as a pattern for your implementation, take a look at this message threading algorithm.
