I have been browsing this site for an answer, but I'm still a little unsure how to plan the database structure and implementation of a similar badge system.
In PHP and MySQL, some achievements would clearly be earned immediately, when a specific action is taken (in SO's case: filled out all profile fields), although I know SO actually updates and assigns badges after a certain amount of time. With so many users and badges, wouldn't this create performance problems at scale (a high number of both users and badges)?
So the database structure, I assume, would be something as simple as:
Badges | Badges_User | User
----------------------------------------------
bd_id | bd_id | user_id
bd_name | user_id | etc
bd_desc | assigned(bool) |
| assigned_at |
But as some people have said, it would be better to take an incremental-style approach, so that a user who has 1,000,000 forum posts won't slow any function down.
Would that mean another table for badges that can be incremental, or just a 'progress' field in the Badges_User table above?
Thanks for reading, and please focus on the scalability of the desired system (like SO: thousands of users and 20 to 40 badges).
EDIT: To iron out some confusion: assigned_at is a date/time. The criteria for awarding each badge would be best placed inside a prepared query/function per badge, wouldn't it? (Better flexibility.)
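For instance, a per-badge award function could boil down to one prepared query. This is only a sketch; the profile columns (bio, avatar_url) and bd_id = 1 for the "completed profile" badge are made-up examples:

-- Hypothetical per-badge criteria query: award the "completed profile"
-- badge (bd_id = 1, illustrative) to one user, but only once.
INSERT INTO Badges_User (bd_id, user_id, assigned_at)
SELECT 1, u.user_id, NOW()
FROM User u
WHERE u.user_id = ?                      -- the user who just saved their profile
  AND u.bio IS NOT NULL                  -- illustrative criteria columns
  AND u.avatar_url IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM Badges_User bu
                  WHERE bu.bd_id = 1 AND bu.user_id = u.user_id);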
I think the structure you've suggested (without the "assigned" field, as per the comments) would work, with the addition of one more table, say Submissions_User, containing a reference to user_id and an incrementing field for counting submissions. Then all you'd need is an "event listener" as per this post, and methinks you'd be set.
EDIT: For the achievement badges, run the event listener upon each submission (only for the user making the submission, of course) and award any relevant badge on the spot. For the time-based badges, I would run a CRON job each night: loop through the complete user list once and award badges as applicable.
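The nightly pass doesn't even need an explicit loop; it can be one set-based statement. A sketch, assuming Submissions_User has a submission_count column and a hypothetical "1000 submissions" badge with bd_id = 2:

-- Nightly batch: award the hypothetical "1000 submissions" badge (bd_id = 2)
-- to every qualifying user who doesn't already have it.
INSERT INTO Badges_User (bd_id, user_id, assigned_at)
SELECT 2, su.user_id, NOW()
FROM Submissions_User su
WHERE su.submission_count >= 1000
  AND NOT EXISTS (SELECT 1 FROM Badges_User bu
                  WHERE bu.bd_id = 2 AND bu.user_id = su.user_id);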
Regarding the sketch you included: get rid of the boolean column on Badges_User; it makes no sense there. That relation is defined by the predicate "user user_id earned the badge bd_id at assigned_at".
As for your overall question: first define the schema to be relational without regard for speed (that gets rid of half of the potential performance problems, possibly in exchange for different ones), index it properly (what's proper depends on the query patterns), and then, if it's slow, derive a (still relational) design from it that is faster. For example, you may need to precompute some aggregates.
I would keep a structure similar to what you have:
Badges(badge_id, badge_name, badge_desc)
Users(user_id, etc)
UserBadges(badge_id, user_id, date_awarded)
And then add tracking table(s), depending on what you want to track and at what level of detail. You can then update the tracking table accordingly and set triggers on it to "award" the badges:
User_Activity(user_id, posts, upvotes, downvotes, etc...)
You can also track stats from the other direction too and trigger badge awards
Posts(post_id, user_id, upvotes, downvotes, etc...)
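A sketch of such a trigger, assuming the table names above, a unique key on UserBadges(badge_id, user_id), and a made-up "100 posts" badge with badge_id = 3:

DELIMITER //
CREATE TRIGGER posts_after_insert AFTER INSERT ON Posts
FOR EACH ROW
BEGIN
  -- Keep the per-user counters current.
  UPDATE User_Activity SET posts = posts + 1 WHERE user_id = NEW.user_id;

  -- Award the hypothetical "100 posts" badge (badge_id = 3) on the spot.
  -- INSERT IGNORE relies on a unique key over (badge_id, user_id).
  INSERT IGNORE INTO UserBadges (badge_id, user_id, date_awarded)
  SELECT 3, NEW.user_id, NOW()
  FROM User_Activity
  WHERE user_id = NEW.user_id AND posts >= 100;
END//
DELIMITER ;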
Some other good points are made here
I think this is one of those cases where your many-to-many table (Badges_User) is appropriate.
But with a small alteration, so that unassigned badges aren't stored.
I assume assigned_at is a date and/or time.
The default is that the user does not have the badge.
Badges | Badges_User | User
----------------------------------------------
bd_id | bd_id | user_id
bd_name | user_id | etc
bd_desc | assigned_at |
This way, only badges actually awarded are stored.
A Badges_User row is only created when a user gets a badge.
Regards
Sigersted
I am trying to create a pivot table to help keep track of "challenges" in my application. Basically, I have a challenge_task pivot table that creates a relationship between a challenge and a task. When a user who is in a challenge completes a task, I want to be able to tell, so I can track the user's progress. What is the best way to store multiple users completing a task on a challenge?
I was thinking of adding a JSON column called user_completed to the pivot table, storing the user_id of every user that completes the task for a challenge.
So challenge_task would look like
challenge_id | task_id | user_completed
Is this a good way? Or is there anything that fits this better?
I'd recommend a database structure something like this:
challenge: challenge_id | other data
task: task_id | other data
user: user_id | other data
challenge_task: challenge_task_id | challenge_id | task_id | possibly more data (such as deadline for completion)
challenge_task_users: challenge_task_id | user_id | possibly more data (such as status: accepted, in progress, completed)
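In MySQL DDL that might look roughly like this; the key types and the status values are my assumptions:

CREATE TABLE challenge_task (
  challenge_task_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  challenge_id      INT UNSIGNED NOT NULL,
  task_id           INT UNSIGNED NOT NULL,
  UNIQUE KEY uq_challenge_task (challenge_id, task_id)
);

CREATE TABLE challenge_task_users (
  challenge_task_id INT UNSIGNED NOT NULL,
  user_id           INT UNSIGNED NOT NULL,
  status            ENUM('accepted', 'in progress', 'completed')
                    NOT NULL DEFAULT 'accepted',
  -- One row per (task-in-challenge, user): indexable, unlike a JSON array.
  PRIMARY KEY (challenge_task_id, user_id)
);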
I don't recommend JSON if you want to index your data, because a JSON column can't be indexed directly in MySQL.
I think you should make a pivot table between the users and the tasks too, and create the necessary relations.
I wouldn't recommend storing multiple values in one database column.
Note: This is my opinion. Just sharing the way I use it.
A table called tasks_settings which has the task settings.
I find this way flexible because I can always edit the title, description, and reward easily. I can also add two more fields here, valid_till and valid_for, so a task can expire after a period of time and be offered only to a special rank (like staff) or to all users.
Another table called users_tasks
This one tracks the users: whether they have completed the task or not. It could also achieve what you are looking for.
id | challenge_id | task_id | username | user_completed
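A minimal sketch of the two tables, with all column types being my assumptions:

CREATE TABLE tasks_settings (
  id          INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  title       VARCHAR(255) NOT NULL,
  description TEXT,
  reward      INT NOT NULL DEFAULT 0,
  valid_till  DATETIME NULL,       -- task expires after this moment
  valid_for   VARCHAR(32) NULL     -- e.g. 'staff' or 'all'
);

CREATE TABLE users_tasks (
  id             INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  challenge_id   INT UNSIGNED NOT NULL,
  task_id        INT UNSIGNED NOT NULL,
  username       VARCHAR(64) NOT NULL,
  user_completed TINYINT(1) NOT NULL DEFAULT 0
);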
I hope this has helped you!
What would be an efficient way to store "Quests" in an SQL database? Let's say the context is an RPG. (Here was a previous question: How to store Goals (think RPG Quest) in SQL.)
To summarize, a Quest may be a combination of the following:
Discover [Location]
Kill n [MOB Type]
Acquire n of [Object]
Achieve a [Skill] in [Skillset]
All the other things you get in RPGs
The answer listed out in the link was:
For the Quest table:
| ID | Title | FirstStep (Foreign key to QuestStep table) | etc.
The QuestStep table:
| ID | Title | Goal (Foreign key to Goal table) | NextStep (ID of next QuestStep)
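In rough MySQL DDL, that quoted design could look like this (the column types are my guesses at the linked answer's intent):

CREATE TABLE Quest (
  id         INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  title      VARCHAR(255) NOT NULL,
  first_step INT UNSIGNED NULL    -- FK to QuestStep.id
);

CREATE TABLE QuestStep (
  id        INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  title     VARCHAR(255) NOT NULL,
  goal_id   INT UNSIGNED NOT NULL,  -- FK to the Goal table
  next_step INT UNSIGNED NULL       -- linked-list pointer; NULL = final step
);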
I actually think it's pretty neat, but I have two things I would like to add:
Let's say I want a quest to be active only on certain days (e.g. M/W/F only) and/or only during a certain time span (e.g. Halloween). What would be the ideal way of doing this?
Another thing: let's say I want one quest with two steps and another with 8 steps. We could create a table that is 8 columns wide, but we would have lots of empty space. And what if the stars align and I need a 9-step quest?
The QuestStep table actually has a NextStep, sort of like a linked list, but what about quests whose steps you can do out of order?
P.S.: As you can see, it is potentially read-heavy, and the schema is potentially... non-schematic. Is NoSQL a viable option? (Redis seems memory-only, so I'd more likely go with MongoDB.)
I am developing a (potentially) large-scale tracking software that tracks customer data, along with tickets that are created for tasks associated with said customers. This system is written entirely in PHP, and the database is MySQL.
The system currently supports multiple "locations" (stores, for example), and each has its own table for customer data in the same database (each database can host a whole different business's installation). For example:
store1_customers
customer_id | customer_firstname | customer_lastname
----------------------------------------------------
1 | John | Doe
2 | Bill | Bob
store2_customers
customer_id | customer_firstname | customer_lastname
----------------------------------------------------
1 | Jill | Smith
2 | Jimmy | Person
This works great for keeping locations separate for different business needs. However, we are running into the need to have "global" customers for other instances that can be accessed from any location, while keeping other customers separate.
The two options I can think of are to either make a new "global_customers" table that can then be pulled from separately, or to merge all of the data into one large table.
I have concerns with both methods. The first would require a new column in every table that references a customer, to determine which customer table to pull from. For example, store1_tickets would have to know whether to pull customer ID 1 from store1_customers or from global_customers. This seems a bit dirty, and I think it would cause problems with my multiple-JOIN queries.
The second method, making one giant table, concerns me in two ways. The first is the size of the table (each table so far can hold 20k+ records, and there are 7 locations for just one installation of the "software"); I know this point may be moot given how MySQL handles large tables. The second is merging the existing data: each table has customer IDs running from 1 to 20k+, so I see it being a nightmare to change thousands upon thousands of existing records in other tables to match the new numbering.
Is there a better way, or more proper way of accomplishing this? I'm sorry if this question does seem subjective, but it does come down to a database problem and how to handle the data in a reasonable way.
Merge all the data into one large table. That is how databases are designed to be used.
For the data migration you will end up with new keys; there is no way around that. You could, however, add a new column to store the 'legacy' ID. This is just some of the pain associated with normalizing a database. Take the pain now rather than persisting with a sub-optimal database design.
Customer type would be another column within the customer table; probably (but depending on your requirements) a FK to a CustomerType table.
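A hedged sketch of that migration for one store; the merged customers table, the store_id column, and the legacy_id column are my additions, and the rest follows the question's naming:

-- One merged table; legacy_id preserves each old per-store customer_id.
CREATE TABLE customers (
  customer_id        INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  store_id           INT UNSIGNED NULL,   -- NULL could mean a "global" customer
  legacy_id          INT UNSIGNED NULL,   -- old store-local customer_id
  customer_firstname VARCHAR(64),
  customer_lastname  VARCHAR(64)
);

-- Repeat per store: new keys are generated, old ones are kept in legacy_id.
INSERT INTO customers (store_id, legacy_id, customer_firstname, customer_lastname)
SELECT 1, customer_id, customer_firstname, customer_lastname
FROM store1_customers;

Tables such as store1_tickets can then be re-keyed by joining on (store_id, legacy_id).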
For a project I am making, I need the ability (like Stack Overflow has) to save all the previous edits (revisions) of posts.
Consider that a post can have 1-to-N associations (for example, 1 post with 5 images associated).
How would you suggest me to design the database for this?
Of course the ID of the post should stay the same so as not to break URLs:
site/post/123 (whenever revisions it is)
Each revision of a post must be manually approved, so you can't directly show the last revision inserted. How would you suggest I design the DB?
I have thought of:
Table: Post
postID | reviewID | isApproved | authorID | text
And the image table (for example image, but it could be everything)
Secondary Table: Image
imageID | postID | reviewID | imagedata
Actually, I would split the post table in two, with the approved revisions in one and the latest (not approved) revision in the other. The rationale is that any non-approved revision which is not the latest would be superseded by the next one (unless you really want to keep track of all the intermediate modifications, approved or not).
Table: OldPost
postID | reviewID | authorID | text
Table: PendingPost
postID | authorID | text
In that layout, whenever a new revision is approved it must be moved to the approved table, but you don't have to filter pending revisions out when displaying the whole history, and conversely you won't have to filter the approved revisions out in the approval part of your site.
You could even refine the layout with yet another dedicated table for the latest approved revision (so three tables for the post in total, not counting attachments). This partitioning would improve the overall performance of your site for the most common queries, at the cost of more complex queries when you need all the data (less frequent operations).
Table: CurrentPost
postID | authorID | text
As you can see, this table structure is the same as the one for pending posts, so the updates would be trivial.
Moving a revision to the old-post table requires finding out the revision count, but you would have to do that operation anyway with a more classic DB layout.
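Approving a pending revision under the three-table layout might look like the following transaction. This is a sketch: it assumes postID is the primary key of CurrentPost, and that the application supplies the next revision number (the '?' placeholder):

START TRANSACTION;

-- 1. Archive the currently approved text; '?' is the next revision number,
--    which the application has to look up anyway (see above).
INSERT INTO OldPost (postID, reviewID, authorID, text)
SELECT postID, ?, authorID, text FROM CurrentPost WHERE postID = 123;

-- 2. Promote the pending revision (REPLACE assumes postID is the primary key).
REPLACE INTO CurrentPost (postID, authorID, text)
SELECT postID, authorID, text FROM PendingPost WHERE postID = 123;

-- 3. Clear the pending slot.
DELETE FROM PendingPost WHERE postID = 123;

COMMIT;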
Regarding the attachment table, the layout seems to work.
Separate all aspects of a post into global information and versionable information: in other words, which things can be changed in a revision, and which will always apply to any revision. These become the fields of your two tables, one for posts and one for revisions. You will also need a column specifying which post a revision is for, as well as whether the revision is approved, and on the posts table a column specifying what the current revision is.
I'm going to develop a stock-maintenance system using PHP + MySQL. It will run on a server machine, so many users can update stock data (in/out).
I'm currently working on this system and have the following problem:
User A opens record "A" (val = 10).
User B opens record "A" (val = 10).
User A saves changes to record "A": adds 2 items, so val = 10 + 2 = 12.
User B saves changes to record "A": adds 3 items. Here B needs to read the current value of record "A" as 12 and update val = 12 + 3 = 15, so the final stock is 15. But B still holds the stale value 10 and writes val = 10 + 3 = 13.
In this example, User A's changes are lost, replaced by User B's changes.
I know MySQL's InnoDB provides row-level locking. My question is: does the InnoDB engine perform concurrency control, and is that alone enough to avoid the "lost update" problem, or do I need to do extra coding?
Please tell me how InnoDB behaves in my example above (the lost update).
(Sorry for my bad English.) Thanks!
InnoDB allows concurrent access, so User A and User B can definitely be handling the same data: User A updates the row based on his/her read, then User B does the same, ultimately wiping out User A's change.
You should consider an alternative if every update is vital to keep. For example, if both users are updating a blog article, you could add a new table that holds all the edits. Both users' edits would be preserved, regardless of when they retrieved the article content. When the article is retrieved, you can find the most recent edit and return that instead.
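For the stock example specifically, one standard piece of "extra coding" (not covered above, so treat this as an aside) is pessimistic locking with SELECT ... FOR UPDATE inside a transaction; the stock table and its id/val columns here are stand-ins for the asker's schema:

START TRANSACTION;

-- Lock the row: a concurrent transaction issuing the same
-- SELECT ... FOR UPDATE blocks here until we COMMIT,
-- so it will read val = 12, not the stale 10.
SELECT val FROM stock WHERE id = 1 FOR UPDATE;

-- The application adds 2 to the value it just read.
UPDATE stock SET val = 12 WHERE id = 1;

COMMIT;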
Look, there's something called "versioning".
The idea is simple:
When a user opens a record, he also gets the version number.
When he saves changes to that record, the UPDATE at the SQL level is conditional, meaning it will happen ONLY if the current version is still the same. The same statement also increments the version by one.
This way you ensure you're not writing to a "stale" copy of your record.
Hope it's clear.
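A minimal sketch of that optimistic versioning, assuming a version column on the stock table (both names illustrative):

-- Read the record together with its version.
SELECT val, version FROM stock WHERE id = 1;
-- Say this returns val = 10, version = 7.

-- Conditional write: succeeds only if nobody saved in between.
UPDATE stock
SET val = 12, version = version + 1
WHERE id = 1 AND version = 7;
-- If the affected-row count is 0, the copy was stale: re-read and retry.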
You could also implement some polling to the server: keep a record of the last update of the row, and if it changes (say, user B updates the record before A does), you can notify user A that the record has been updated and that his changes won't take effect, or you can update the values dynamically.
You can use two tables for this purpose: first, StockItems with item name, id, and count; second, StockActivities with item id and operation amount.
To add or remove items from stock, you insert records into the second table, StockActivities, with the item id and the quantity added/removed:
item id:1, qnt: +10
item id:1, qnt: +1
item id:10, qnt: -2
The count field of the StockItems table should be "read only" for users and should be calculated from the StockActivities table.
For example, you can create an AFTER INSERT trigger on the StockActivities table that updates the count field of the added/removed stock item.
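A sketch of that trigger, with the column names taken from the description above and the types assumed:

DELIMITER //
CREATE TRIGGER stock_activities_after_insert
AFTER INSERT ON StockActivities
FOR EACH ROW
BEGIN
  -- qnt is signed: +10 adds stock, -2 removes it.
  UPDATE StockItems
  SET `count` = `count` + NEW.qnt
  WHERE id = NEW.item_id;
END//
DELIMITER ;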
Judging by the comments left, I think it prudent to respond with some pointers I have come across, in case someone needs them.
If you only want to update a value by an offset, you can do this quite easily and atomically. Assume the following data:
+----+--------+-------+
| id | name | price |
+----+--------+-------+
| 1 | Foo | 49 |
| 2 | Bar | 532 |
| 3 | Foobar | 24 |
+----+--------+-------+
We can now run the following queries to add one to the price:
select id, price from prices where name like 'Foo';
// Later in the application
update prices set price=50 where id=1;
This is the non-concurrent/non-atomic way to do it, assuming that there are no changes or fetches between the two queries. A more atomic way is the following:
select id, price from prices where name like 'Foo';
// Later in the application
update prices set price=price+1 where id=1;
Here, a single query increments the price, eliminating the window in which someone else could update the row between our two queries.
Additionally, there are methods of updating data safely, where the nature of the update is not a simple addition or subtraction. Let's say, here, that we have the following data:
+----+----------+---------------------+
| id | job_name | last_run |
+----+----------+---------------------+
| 1 | foo_job | 2016-07-13 00:00:00 |
| 2 | bar_job | 2016-07-14 00:00:00 |
+----+----------+---------------------+
In this case we have multiple clients, and any client can do any job. We then need a way to dispatch a piece of work to one client, and only one client.
We can either use a transaction, erroring out if the record has been updated, or use a technique called CAS: Compare and Swap.
Here's how we do this in MySQL:
update jobs set last_run=NOW() where id=1 and last_run='2016-07-13 00:00:00'
Then, from the result MySQL returns, we can tell the number of rows affected. If we affected a row, we successfully updated it, and the job is ours. If no rows were updated, another machine got there first and claimed the job.
This works because any update from our application changes the column, and since the column's old value is a condition of the UPDATE, concurrent changes are excluded, allowing the application to decide what happens next.
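The affected-row count can also be read in SQL itself via ROW_COUNT() (in PHP you would typically use the driver's affected-rows value instead):

UPDATE jobs SET last_run = NOW()
WHERE id = 1 AND last_run = '2016-07-13 00:00:00';

SELECT ROW_COUNT();  -- 1: we claimed the job; 0: another client got there first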