Article with revisions system

For a project I am building, I need to be able to save all previous edits (revisions) of a post, as Stack Overflow does.
Keep in mind that a post can have 1-to-N associations (for example, 1 post with 5 images attached).
How would you suggest I design the database for this?
Of course, the ID of the post should stay the same so that URLs don't break:
site/post/123 (whichever revision is shown)
Each revision of a post must be manually approved, so you can't simply show the most recently inserted revision.
I have thought of this:
Table: Post
postID | reviewID | isApproved | authorID | text
And the image table (images are just an example; it could be anything)
Secondary Table: Image
imageID | postID | reviewID | imagedata

Actually, I would split the post table in two, with the approved revisions in one and the latest (not yet approved) revision in the other. The rationale is that any unapproved revision which is not the latest would be superseded by the next one (unless you really want to keep track of all the intermediate modifications, approved or not).
Table: OldPost
postID | reviewID | authorID | text
Table: PendingPost
postID | authorID | text
In that layout, whenever a new revision is approved, it must be moved to the approved table, but you don't have to filter out pending revisions when displaying the whole history and, conversely, you won't have to filter out the approved revisions in the approval part of your site.
You could even refine the layout with yet another dedicated table for the latest approved revision (so three tables for the post in total, not counting attachments). This partitioning would improve the overall performance of your site for the most common queries, at the cost of more complex queries when you need all the data (a less frequent operation).
Table: CurrentPost
postID | authorID | text
As you can see, this table structure is the same as the one for pending posts, so the updates would be trivial.
Moving a revision to the old post table requires finding out the revision count, but you would have to do that anyway with a more classic db layout.
Regarding the attachment table, the layout seems to work.

Separate all aspects of a post into global information and versionable information. In other words, decide which things can change in a revision and which always apply regardless of revision. These become the fields of your two tables, one for posts and one for revisions. The revisions table also needs a column specifying which post the revision belongs to and a column for whether the revision is approved, and the posts table needs a column specifying which revision is the current one.
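A minimal sketch of that posts/revisions split, written in Python with the standard sqlite3 module so it can be run as-is (SQLite stands in for MySQL here). The table and column names (posts, revisions, current_revision_id, is_approved, body) and the helper functions are assumptions made for illustration, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (
    post_id             INTEGER PRIMARY KEY,   -- stable id used in URLs
    author_id           INTEGER NOT NULL,
    current_revision_id INTEGER                -- NULL until a revision is approved
);
CREATE TABLE revisions (
    revision_id INTEGER PRIMARY KEY,
    post_id     INTEGER NOT NULL REFERENCES posts(post_id),
    body        TEXT NOT NULL,
    is_approved INTEGER NOT NULL DEFAULT 0
);
""")

def add_revision(post_id, body):
    """Store a new, not yet approved revision and return its id."""
    cur = conn.execute(
        "INSERT INTO revisions (post_id, body) VALUES (?, ?)", (post_id, body))
    return cur.lastrowid

def approve(revision_id):
    """Mark a revision approved and make it the post's current one."""
    with conn:  # both updates commit together or not at all
        conn.execute(
            "UPDATE revisions SET is_approved = 1 WHERE revision_id = ?",
            (revision_id,))
        conn.execute(
            """UPDATE posts SET current_revision_id = ?
               WHERE post_id = (SELECT post_id FROM revisions
                                WHERE revision_id = ?)""",
            (revision_id, revision_id))

def visible_body(post_id):
    """Return the text of the current approved revision, or None."""
    row = conn.execute(
        """SELECT r.body FROM posts p
           JOIN revisions r ON r.revision_id = p.current_revision_id
           WHERE p.post_id = ?""", (post_id,)).fetchone()
    return row[0] if row else None

conn.execute("INSERT INTO posts (post_id, author_id) VALUES (123, 1)")
first = add_revision(123, "first draft")
approve(first)
second = add_revision(123, "pending edit")   # stored, but not shown yet
shown = visible_body(123)
```

With this layout, site/post/123 keeps showing the approved text while newer revisions wait for approval.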

Related

Trying to understand database schema, particular WordPress Comments

I am having trouble fully understanding the schema of the WordPress comments and commentmeta tables, and how they are linked together.
I'd like to learn by making a custom row in each table (wp_commentmeta & wp_comments).
WordPress Database Schema
Following is the example I am working with.
wp_commentmeta:
meta_id | comment_id | meta_key | meta_value
2 1352 verified 1
What does the meta_value denote in the wp_commentmeta table? Is this a rating system 0-5, or similar?
wp_comments
comment_ID | comment_post_ID | misc_cols --- | user_id
2,1352,Waldo,test#test.com,"",127.0.0.1,2014-11-15 00:18:39,2014-11-15 04:18:39,"test comment",0,1,"user_agent","",0,657
comment_type is an empty field, third from last. I just tried adding "comment" there, with no luck.
The review does show on the backend and on the product page; however, the product page says "Reviews (0)". The reviews are not being counted on the product page.
Would you please explain this to me?

meta_value in the commentmeta table (as in postmeta) is type-agnostic. What the data represents depends on which plugin/function stored it and what it wants it to mean. You can store integers, dates, strings, or PHP data structures; WordPress does not care and stores them all as strings internally. In your case, I'm guessing that 1 means the user is verified and 0, NULL or no row means the user isn't verified.
comment_type is similar to post_type. If you want to add a special kind of comment (a review in your case), you'll have to figure out what the software you are using expects as comment_type. Look at existing reviews: what do they have set as comment_type?

In order for the reviews to show the count, I had to navigate to comments, edit the comment, and update the comment (with no changed fields). Perhaps the HTTP server needed to be reloaded, or WooCommerce needs to be reloaded in some way.

Concurrent multiuser access php+mysql

I'm going to develop a stock maintenance system using PHP and MySQL. It will run on a server machine, so many users can update the stock data (in/out).
I'm currently working on this system, and I have the following problem.
User A opens record “A”. ex - val=10
User B opens record “A”. ex - val=10
User A saves changes to record “A”. ex - val=10+2=12 (adds 2 items, so stock should be 12)
User B saves changes to record “A”. ex - here B needs to see record "A" with val=12, so that B's update gives val=12+3=15 (adds 3 items, so the final stock should be 15)
Without that, User A’s changes are lost, replaced by User B’s changes.
I know MySQL's InnoDB provides row-level locking. My question is:
Does the InnoDB engine do concurrency control, and is that enough to avoid the "lost update" problem, or do I need extra code to avoid it?
If it is enough, please tell me how InnoDB handles my example above (the lost update).
(Sorry for my bad English.)
Thanks

InnoDB allows concurrent access, so User A and User B could definitely be handling the same data. User A will update the row based on his/her data, then User B can do the same -- ultimately overwriting User A's change.
You should consider an alternative if every update is vital to keep. For example, if both users are updating a blog article, you could make a new table that holds all these edits. Both users' edits would be preserved, regardless of when they retrieved the article content. When the article is retrieved, you can check when the most recent edit occurred and retrieve that instead.

Look, there's something called "versioning".
The idea is simple:
When a user opens a record, they also get its version number.
When they save changes to that record, the update at the SQL level is conditional: it happens ONLY if the stored version is still the one they read. The same update also increases the version by one.
This ensures you're never overwriting the record based on a "stale" copy.
Hope it's clear.
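The versioning idea can be sketched as follows, using Python's built-in sqlite3 module as a stand-in for MySQL so the snippet runs on its own. The stock table, its version column and the read/save helpers are assumed names for illustration; the same UPDATE ... WHERE version = ? statement works in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE stock (
    id      INTEGER PRIMARY KEY,
    val     INTEGER NOT NULL,
    version INTEGER NOT NULL)""")
conn.execute("INSERT INTO stock VALUES (1, 10, 1)")

def read(item_id):
    """Return (val, version) as the user sees them when opening the record."""
    return conn.execute(
        "SELECT val, version FROM stock WHERE id = ?", (item_id,)).fetchone()

def save(item_id, new_val, seen_version):
    """Write only if nobody has saved since we read; report whether it worked."""
    cur = conn.execute(
        "UPDATE stock SET val = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_val, item_id, seen_version))
    return cur.rowcount == 1

# Users A and B both open the record while val is 10.
val_a, ver_a = read(1)
val_b, ver_b = read(1)

a_saved = save(1, val_a + 2, ver_a)   # A saves first
b_saved = save(1, val_b + 3, ver_b)   # B's copy is stale, so nothing is written

# B reloads the record and retries on top of A's change.
val_b, ver_b = read(1)
b_retry = save(1, val_b + 3, ver_b)
final_val = read(1)[0]
```

What to do when a save is rejected (reload and retry, or show the user the conflict) is an application decision; the sketch simply retries.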

You could also have the client poll the server: keep a record of when the row was last updated, and if user B updates the record before user A saves, notify user A that the record has changed and that their changes won't take effect, or update the displayed values dynamically.

You can use two tables for this purpose. First - StockItems with item name, id, and count. Second - StockActivities with item id and operation amount.
To add or remove items from stock, you insert records into the second table, StockActivities, with the item id and the quantity added / removed.
item id:1, qnt: +10
item id:1, qnt: +1
item id:10, qnt: -2
The count field of the StockItems table should be "read only" for users and should be calculated from the StockActivities table.
For example, you can create an AFTER INSERT trigger on the StockActivities table that updates the count field of the stock item that was added to / removed from.
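Here is a rough sketch of that two-table layout with the trigger, run through Python's sqlite3 module so it is self-contained. The trigger name, the name column value and the column types are assumptions; MySQL's CREATE TRIGGER syntax differs slightly (it needs FOR EACH ROW), but the idea is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StockItems (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    count INTEGER NOT NULL DEFAULT 0      -- maintained by the trigger only
);
CREATE TABLE StockActivities (
    id      INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL REFERENCES StockItems(id),
    qnt     INTEGER NOT NULL              -- positive = in, negative = out
);
-- Every inserted activity adjusts the running total of its item.
CREATE TRIGGER stock_activity_ai AFTER INSERT ON StockActivities
BEGIN
    UPDATE StockItems SET count = count + NEW.qnt WHERE id = NEW.item_id;
END;
INSERT INTO StockItems (id, name) VALUES (1, 'widget');
""")

# Users never touch StockItems.count; they only record movements.
for qnt in (10, 1, -2):
    conn.execute(
        "INSERT INTO StockActivities (item_id, qnt) VALUES (?, ?)", (1, qnt))

on_hand = conn.execute(
    "SELECT count FROM StockItems WHERE id = 1").fetchone()[0]
ledger_total = conn.execute(
    "SELECT SUM(qnt) FROM StockActivities WHERE item_id = 1").fetchone()[0]
```

Because each user only appends a row, two simultaneous changes both survive, and the stored count can always be checked against the sum of the activity rows.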

Judging by the comments left, I think it prudent to respond with some pointers I have come across, in case someone needs them.
If you only want to update a value by an offset, you can do this quite easily and atomically. Assume the following data:
+----+--------+-------+
| id | name | price |
+----+--------+-------+
| 1 | Foo | 49 |
| 2 | Bar | 532 |
| 3 | Foobar | 24 |
+----+--------+-------+
We can now run the following queries to add one to the price of Foo:
select id, price from prices where name like "Foo";
-- Later in the application
update prices set price=50 where id=1;
This is the non-atomic way to do it: it is only correct if nobody else changes the row between the two queries. A more atomic way to do this is the following.
select id, price from prices where name like "Foo";
-- Later in the application
update prices set price=price+1 where id=1;
Here, the update increments the price in a single statement, so no other client can slip in an update between a read and a write.
Additionally, there are methods of updating data safely, where the nature of the update is not a simple addition or subtraction. Let's say, here, that we have the following data:
+----+----------+---------------------+
| id | job_name | last_run |
+----+----------+---------------------+
| 1 | foo_job | 2016-07-13 00:00:00 |
| 2 | bar_job | 2016-07-14 00:00:00 |
+----+----------+---------------------+
In this case, we have multiple different clients, where all clients can do any job. We then need a way to dispatch work to one client, and only one client.
We can either use a transaction, where we error out if the record has been updated, or we can use a technique called CAS, or Compare and Swap.
Here's how we do this in MySQL:
update jobs set last_run=NOW() where id=1 and last_run='2016-07-13 00:00:00';
Then, in the result returned from MySQL, we can read the number of rows affected. If we have affected a row, then we have successfully updated it, and the job is ours. If no rows were updated, then another machine has updated it first and claimed the job.
This works because any update from our application causes the column to change, and since the column's old value is a condition for completing the update, concurrent changes are rejected and the application can decide what happens next.
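To make the rows-affected check concrete, here is a small sketch in Python with sqlite3 standing in for MySQL. The try_claim helper and the explicit timestamps passed in place of NOW() are assumptions for the sake of a deterministic example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, job_name TEXT, last_run TEXT)")
conn.execute("INSERT INTO jobs VALUES (1, 'foo_job', '2016-07-13 00:00:00')")

def try_claim(job_id, seen_last_run, now):
    """Compare-and-swap: update only if last_run is still what we saw."""
    cur = conn.execute(
        "UPDATE jobs SET last_run = ? WHERE id = ? AND last_run = ?",
        (now, job_id, seen_last_run))
    return cur.rowcount == 1   # 1 row affected means the job is ours

# Two clients read the same last_run value before either one writes.
seen = conn.execute("SELECT last_run FROM jobs WHERE id = 1").fetchone()[0]

client_a = try_claim(1, seen, "2016-07-14 09:00:00")   # wins the job
client_b = try_claim(1, seen, "2016-07-14 09:00:01")   # 0 rows affected
```

In PHP the same check would read the affected-row count of the UPDATE statement instead of rowcount.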

Database Schema for News System

I have a news system I'm designing, and it seemed straight-forward at first, but as I've pushed forward with my planned schema I've hit problems... Clearly I haven't thought it through. Can anyone help?
The system requires that the latest 20 news articles be grabbed from the database. It's blog-like in this way. Each article can have sub-articles (usually around 3) that can be accessed from the parent article. The sub-articles are only ever visible when the parent article is visible -- they're not used elsewhere.
The client needs to be able to hide/display news articles (easy), but also change their order, if they desire (harder).
I initially stored the sub-articles in a separate table, but then I realised that the fields were essentially the same: Headline, Copy, Image. So why not just put them all in one big table?
Now I've hit other problems around the ordering. It's Friday evening and my head hurts!
Can anyone offer advice?
Thanks.
Update: People have asked to see my "existing" schema:
Articles table:
articleID *
headline
copy
imageURL
visible
pageOrder
Sub-articles table:
subArticleID *
articleID
headline
copy
imageURL
visible
pageNumber
pageOrder
Will this work? How would I go about letting users change the order? It seemed the wrong way to do it, to me, so I threw this out.

I initially stored the sub-articles in a separate table, but then I realised that the fields were essentially the same: Headline, Copy, Image. So why not just put them all in one big table?
Because referential integrities are not the same.
That is, of course, if you want to restrict the tree to exactly 2 levels. If you want more general data model (even if that means later restricting it at the application level), then go ahead and make a general tree.
This would probably look something like this:
Note how both PARENT_ARTICLE_ID and ORDER are NULL-able (so you can represent a root) and how both comprise the UNIQUE constraint denoted by U1 in the diagram above (so no two articles can be ambiguously ordered under the same parent).
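The diagram is not reproduced here, so the following is only one possible reading of the description, sketched in Python with sqlite3 so it can be run directly. The table name article and the headline column are assumptions; PARENT_ARTICLE_ID, ORDER and the U1 unique constraint come from the note above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (
    article_id        INTEGER PRIMARY KEY,
    parent_article_id INTEGER REFERENCES article(article_id),  -- NULL for a root
    "order"           INTEGER,                                 -- NULL for a root
    headline          TEXT NOT NULL,
    UNIQUE (parent_article_id, "order")                        -- the U1 constraint
);
""")

conn.execute("INSERT INTO article VALUES (1, NULL, NULL, 'Parent story')")
conn.execute("INSERT INTO article VALUES (2, 1, 1, 'First sub-article')")
conn.execute("INSERT INTO article VALUES (3, 1, 2, 'Second sub-article')")

# A second child in position 2 under the same parent violates U1.
try:
    conn.execute("INSERT INTO article VALUES (4, 1, 2, 'Clashing sub-article')")
    clash_rejected = False
except sqlite3.IntegrityError:
    clash_rejected = True

children = [row[0] for row in conn.execute(
    'SELECT headline FROM article WHERE parent_article_id = 1 ORDER BY "order"')]
```

Since NULLs do not collide in a UNIQUE constraint (in SQLite and in MySQL), any number of root articles can coexist with both columns left NULL.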

Based on what you've described, I would use two tables. The first table would hold all the articles and sub-articles. The second would tie the articles to their sub-articles.
The first table (call it articles) might have these columns:
+-----------+----------+------+----------+---------+------------+-----------+
| articleID | headline | copy | imageURL | visible | pageNumber | pageOrder |
+-----------+----------+------+----------+---------+------------+-----------+
The second table (call it articleRelationships) might have these columns:
+-----------------+----------------+
| parentArticleID | childArticleID |
+-----------------+----------------+
Not sure if you already accomplish this with the pageNumber column, but if not, you could add a column for something like articleLevel and give it something like a 1 for main articles, 2 for sub-articles of the main one, 3 for sub-articles of a level 2 article, etc. So that way, when selecting the latest 20 articles to be grabbed, you just select from the table where articleLevel = 1.
I'm thinking it would probably also be useful to store a date/time with each article so that you can order by that. As far as any other ordering goes, you'll have to clarify more on that for me to be more help there.
To display them for the user, I would use AJAX. I would first display the latest 20 main articles on the screen, then when the user chooses to view the sub-articles for a particular article, use AJAX to call the database and do a query like this:
SELECT a.articleID, a.headline
FROM articles a
INNER JOIN articleRelationships ar ON a.articleID = ar.childArticleID
WHERE ar.parentArticleID = ? /* ? is the articleID that the user clicked */
ORDER BY articleID

The client needs to be able to hide/display news articles (easy), but
also change their order, if they desire (harder).
On this particular point, you'll need to store client-specific ordering in a table. Exactly how you do this will depend, in part, on how you choose to deal with articles and subarticles. Something along these lines will work for articles.
client_id article_id article_order
--
1 1067 1
1 2340 2
1 87 3
...
You'll probably need to make some adjustments to the table and column names.
create table client_article_order (
  client_id integer not null,
  article_id integer not null,
  article_order integer not null,
  primary key (client_id, article_id),
  foreign key (client_id) references clients (client_id) on delete cascade,
  foreign key (article_id) references articles (article_id) on delete cascade
) engine = innodb;
Although I made article_order an integer, you can make a good case for using other data types instead. You could use float, double, or even varchar(n). Reordering can be troublesome.
If you don't need the client id, you can store the article ordering in the article's table.
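As one way to handle reordering with an integer article_order, the sketch below simply renumbers a client's whole list inside a transaction whenever an article is moved. It uses Python's sqlite3 module so it runs standalone; the move_article and ordered_ids helpers are hypothetical, and the foreign keys are left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE client_article_order (
    client_id     INTEGER NOT NULL,
    article_id    INTEGER NOT NULL,
    article_order INTEGER NOT NULL,
    PRIMARY KEY (client_id, article_id))""")
conn.executemany(
    "INSERT INTO client_article_order VALUES (?, ?, ?)",
    [(1, 1067, 1), (1, 2340, 2), (1, 87, 3)])

def ordered_ids(client_id):
    """Article ids for one client, in display order."""
    rows = conn.execute(
        "SELECT article_id FROM client_article_order "
        "WHERE client_id = ? ORDER BY article_order", (client_id,))
    return [r[0] for r in rows]

def move_article(client_id, article_id, new_position):
    """Move one article to a 1-based position and renumber the rest."""
    ids = ordered_ids(client_id)
    ids.remove(article_id)
    ids.insert(new_position - 1, article_id)
    with conn:  # all rows are renumbered in one transaction
        conn.executemany(
            "UPDATE client_article_order SET article_order = ? "
            "WHERE client_id = ? AND article_id = ?",
            [(pos, client_id, aid) for pos, aid in enumerate(ids, start=1)])

move_article(1, 87, 1)
new_order = ordered_ids(1)
```

Renumbering touches every row for the client, which is fine for short lists; the float or varchar alternatives mentioned above trade that cost for other complications.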
But this is sounding more and more like the kind of thing Drupal and WordPress do right out of the box. Is there a compelling reason to reinvent this wheel?

Create a new field "parent" in the news (article) table, which will contain the news id of the parent article. This new field will be used as the connection between articles and sub-articles.

As SlideID "owns" SubSlideID, I would use a composite primary key for the second table.
PrimaryKey: slideID, subSlideID
Other index: slideID, pageNumber, pageOrder (Or however they get displayed)
One blog post I like to point to about this is http://weblogs.sqlteam.com/jeffs/archive/2007/08/23/composite_primary_keys.aspx as it explains why very nicely.
If you're relying on Auto_Increment, that can be handled too (with MyISAM tables): you can still set subSlideID to auto_increment.
If you're likely to go to a third level, then merge the tables and follow Branko's answer above. But it does start to get very complicated, so keep them separate if there are only 2 levels.

Querying records that have revisions related to them

I created a commenting system that allows users to submit comments on each item.
It turned into a bit of scope creep, and now I need to let users edit their original comments while keeping track of the earlier versions.
All comments are located in the comments table
comments: id, comment, item_id, timestamp
Now that revisions must be tracked, I created a new table titled revisions:
comment_id, revision_id, timestamp
All comments (new or old) are entered into the comments table. If the user decides to revise an existing comment, the revision is entered as a new record in comments and then recorded in the revisions table: the id created for the new comment goes into revisions.revision_id, and revisions.comment_id is populated with the id of the original comment the user revised (hope I didn't lose you).
Now I've come to the problem I need help with: I need to display a list of all comments for a specific item, which would have a query of something like
select * from comments where item_id = 1
Now that I have added the revisions table, I need to retrieve the list of comments for a specific item (just like the query above does) and (here's the kicker) if a comment has been revised, I need to return only its most recent version.
What is the best way of accomplishing this?
I thought about running two queries: one to retrieve all the comments from the comments table and store them in an array, and another to return the records from the revisions table, distinct on revisions.comment_id, keeping only the most recent one for each.
The revisions query might look something like this:
select distinct comment_id, revision_id, timestamp
from revisions order by timestamp desc
What is the best way of displaying only the most recent version of each comment (some will have revisions and most won't)?
I am not an SQL expert. Can this be done in SQL, or will I need to run two different queries, store the data in separate arrays, then run through each array, compare, and strip out the older versions of each comment? An example (partly theoretical) is below:
// Assumes $comments is an array keyed by comment id.
foreach ($revisions as $r) {
    // A comment listed as comment_id in revisions has been superseded, so drop it.
    unset($comments[$r['comment_id']]);
}
return $comments; // only the comments that were never revised again remain
I imagine a single query that returns only the most recent version of each comment would be best practice. If so, could you provide that query? Otherwise, is running two queries into two arrays and stripping values from the comments array the best way, or is there a better one?
Thanks in advance.

First off, I'll add two alternative approaches and then I'll edit with a query to deal with your current schema.
Option 1 - Add a deleted flag to your comments. When a comment is revised, do as you already do but also mark the original as deleted. Then you just need WHERE deleted = 0 wherever you want active comments.
Option 2 - Change your revision table to be a clone of the comment table, plus an additional field for when the revision was made. Now, whenever you revise a comment, don't create a new record in comment; just update the existing row and add a new row to the revisions table. This is easily maintained with a trigger and is a very standard auditing pattern.
EDIT Option 3 - A query to cope with your schema.
As described, if I make a comment, then edit it twice (with no other activity), I get something like this...
comments:
id | comment      | item_id | timestamp
---+--------------+---------+-----------
1  | Hello,       | 1       | 13:00
2  | World!       | 1       | 14:00
3  | Hello, World | 1       | 15:00
revisions:
comment_id | revision_id | timestamp
-----------+-------------+-----------
1          | 2           | 14:00
2          | 3           | 15:00
Based on this, the live comment is the only one whose id does not appear as comment_id in the revisions table...
SELECT *
FROM comments
WHERE NOT EXISTS (SELECT * FROM revisions WHERE comment_id = comments.id)
AND item_id = #item_id  -- #item_id is a placeholder for the item's id
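For comparison, Option 2 (update the comment in place and let a trigger copy the old row into the revisions table) could look roughly like this. The sketch runs on Python's sqlite3 module; the trigger name and the revised_at column are assumptions, and MySQL's trigger syntax differs slightly (it needs FOR EACH ROW).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (
    id        INTEGER PRIMARY KEY,
    comment   TEXT NOT NULL,
    item_id   INTEGER NOT NULL,
    timestamp TEXT NOT NULL
);
-- Clone of comments plus the time the revision was made.
CREATE TABLE revisions (
    comment_id INTEGER NOT NULL,
    comment    TEXT NOT NULL,
    item_id    INTEGER NOT NULL,
    timestamp  TEXT NOT NULL,
    revised_at TEXT NOT NULL
);
-- Before the new text replaces the old, keep a copy of the old row.
CREATE TRIGGER comments_au AFTER UPDATE OF comment ON comments
BEGIN
    INSERT INTO revisions
    VALUES (OLD.id, OLD.comment, OLD.item_id, OLD.timestamp, NEW.timestamp);
END;
INSERT INTO comments VALUES (1, 'Hello,', 1, '13:00');
""")

# Two edits of the same comment; the comments table keeps a single row.
conn.execute(
    "UPDATE comments SET comment = 'World!', timestamp = '14:00' WHERE id = 1")
conn.execute(
    "UPDATE comments SET comment = 'Hello, World', timestamp = '15:00' WHERE id = 1")

live = conn.execute(
    "SELECT comment FROM comments WHERE item_id = 1").fetchall()
history = conn.execute(
    "SELECT comment, revised_at FROM revisions "
    "WHERE comment_id = 1 ORDER BY revised_at").fetchall()
```

With this layout the original query, select * from comments where item_id = 1, already returns only the latest versions.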

Achievements / Badges system

I have been browsing this site for the answer but I'm still a little unsure how to plan a similar system in its database structure and implementation.
With PHP and MySQL, some achievements are clearly earned immediately (when a specific action is taken; in SO's case, filling out all profile fields), although I know SO updates and assigns some badges after a certain amount of time. With so many users and badges, wouldn't this create performance problems (in terms of scale: a high number of both users and badges)?
So I assume the database structure would be something as simple as:
Badges | Badges_User | User
----------------------------------------------
bd_id | bd_id | user_id
bd_name | user_id | etc
bd_desc | assigned(bool) |
| assigned_at |
But as some people have said, an incremental approach would be better, so that a user with 1,000,000 forum posts won't slow any function down.
Would that then be another table for badges that are earned incrementally, or just a 'progress' field in the badges_user table above?
Thanks for reading, and please focus on the scalability of the desired system (like SO: thousands of users and 20 to 40 badges).
EDIT: to iron out some confusion, assigned_at is a Date/Time. The criteria for awarding each badge would be best placed inside prepared queries/functions for each badge, wouldn't they? (better flexibility)

I think the structure you've suggested (without the "assigned" field as per the comments) would work, with the addition of an additional table, say "Submissions_User", containing a reference to user_id & an incrementing field for counting submissions. Then all you'd need is an "event listener" as per this post and methinks you'd be set.
EDIT: For the achievement badges, run the event listener upon each submission (only for the user making the submission of course), and award any relevant badge on the spot. For the time-based badges, I would run a CRON job each night. Loop through the complete user list once and award badges as applicable.
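A rough sketch of the on-submission listener, in Python with sqlite3 so it runs on its own. Submissions_User, Badges and Badges_User follow the names used in this thread; the min_submissions column, the sample badges and the on_submission/badges_of functions are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Badges (
    bd_id           INTEGER PRIMARY KEY,
    bd_name         TEXT NOT NULL,
    min_submissions INTEGER NOT NULL      -- hypothetical award threshold
);
CREATE TABLE Submissions_User (
    user_id     INTEGER PRIMARY KEY,
    submissions INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE Badges_User (
    bd_id       INTEGER NOT NULL,
    user_id     INTEGER NOT NULL,
    assigned_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (bd_id, user_id)          -- a badge is awarded at most once
);
INSERT INTO Badges VALUES (1, 'First post', 1), (2, 'Regular', 3);
""")

def on_submission(user_id):
    """Event listener: bump the counter, then award any badge now earned."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO Submissions_User (user_id) VALUES (?)",
            (user_id,))
        conn.execute(
            "UPDATE Submissions_User SET submissions = submissions + 1 "
            "WHERE user_id = ?", (user_id,))
        # Already awarded badges hit the primary key and are skipped.
        conn.execute(
            """INSERT OR IGNORE INTO Badges_User (bd_id, user_id)
               SELECT b.bd_id, s.user_id
               FROM Badges b
               JOIN Submissions_User s ON s.submissions >= b.min_submissions
               WHERE s.user_id = ?""", (user_id,))

def badges_of(user_id):
    """Names of the badges a user holds, in badge id order."""
    rows = conn.execute(
        """SELECT b.bd_name FROM Badges_User bu
           JOIN Badges b ON b.bd_id = bu.bd_id
           WHERE bu.user_id = ? ORDER BY b.bd_id""", (user_id,))
    return [r[0] for r in rows]

on_submission(7)
after_one = badges_of(7)
on_submission(7)
on_submission(7)
after_three = badges_of(7)
```

The listener only ever looks at one user's counter, so its cost does not grow with the number of posts that user has made. In MySQL, INSERT IGNORE plays the role of INSERT OR IGNORE.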

regarding the sketch you included: get rid of the boolean column on badges_user. it makes no sense there: that relation is defined in terms of the predicate "user user_id earned the badge bd_id at assigned_at".
as for your overall question: define the schema to be relational without regard for speed first (that'll get you rid of half of potential perf. problems, possibly in exchange for different perf. problems), index it properly (what's proper depends on the query patterns), then if it's slow, derive a (still relational) design from that that's faster. like you may need to have some aggregates precomputed, etc.

I would keep a similar type structure to what you have
Badges(badge_id, badge_name, badge_desc)
Users(user_id, etc)
UserBadges(badge_id, user_id, date_awarded)
And then add tracking table(s) depending on what you want to track and at what level of detail. Then you can update the table accordingly and set triggers on it to "award" the badges
User_Activity(user_id, posts, upvotes, downvotes, etc...)
You can also track stats from the other direction too and trigger badge awards
Posts(post_id, user_id, upvotes, downvotes, etc...)
Some other good points are made here

I think this is one of those cases where your many-to-many table (Badges_User) is appropriate,
but with a small alteration so that unassigned badges aren't stored.
I assume assigned_at is a date and/or time.
The default is that the user does not have the badge.
Badges | Badges_User | User
----------------------------------------------
bd_id | bd_id | user_id
bd_name | user_id | etc
bd_desc | assigned_at |
| |
This way only badges actually awarded are stored.
A Badges_User row is only created when a user gets a badge.
Regards
Sigersted
