I have been browsing this site for the answer but I'm still a little unsure how to plan a similar system in its database structure and implementation.
In PHP and MySQL it would be clear that some achievements are earned immediately (when a specialized action is taken, in SO case: Filled out all profile fields), although I know SO updates and assigns badges after a certain amount of time. With so many users & badges wouldn't this create performance problems (in terms of scale: high number of both users & badges).
So the database structure I assume would something as simple as:
Badges | Badges_User | User
----------------------------------------------
bd_id | bd_id | user_id
bd_name | user_id | etc
bd_desc | assigned(bool) |
| assigned_at |
But as some people have said it would be better to have an incremental style approach so a user who has 1,000,000 forum posts wont slow any function down.
Would it then be another table for badges that could be incremental or just a 'progress' field in the badges_user table above?
Thanks for reading and please focus on the scalability of the desired system (like SO thousands of users and 20 to 40 badges).
EDIT: to some iron out some confusion I had assigned_at as a Date/Time, the criteria for awarding the badge would be best placed inside prepared queries/functions for each badge wouldn't it? (better flexibility)
Source: Tips4all, CCNA FINAL EXAM
I think the structure you've suggested (without the "assigned" field as per the comments) would work, with the addition of an additional table, say "Submissions_User", containing a reference to user_id & an incrementing field for counting submissions. Then all you'd need is an "event listener" as per this post and methinks you'd be set.
ReplyDeleteEDIT: For the achievement badges, run the event listener upon each submission (only for the user making the submission of course), and award any relevant badge on the spot. For the time-based badges, I would run a CRON job each night. Loop through the complete user list once and award badges as applicable.
regarding the sketch you included: get rid of the boolean column on badges_user. it makes no sense there: that relation is defined in terms of the predicate "user user_id earned the badge bd_id at assigned_at".
ReplyDeleteas for your overall question: define the schema to be relational without regard for speed first (that'll get you rid of half of potential perf. problems, possibly in exchange for different perf. problems), index it properly (what's proper depends on the query patterns), then if it's slow, derive a (still relational) design from that that's faster. like you may need to have some aggregates precomputed, etc.
I would keep a similar type structure to what you have
ReplyDeleteBadges(badge_id, badge_name, badge_desc)
Users(user_id, etc)
UserBadges(badge_id, user_id, date_awarded)
And then add tracking table(s) depending on what you want to track and @ what detail level... then you can update the table accordingly and set triggers on it to "award" the badges
User_Activity(user_id, posts, upvotes, downvotes, etc...)
You can also track stats from the other direction too and trigger badge awards
Posts(post_id, user_id, upvotes, downvotes, etc...)
Some other good points are made here
I think this is one of those cases where your many-to-many table (Badges_User) is appropriate.
ReplyDeleteBut with a small alteration so that unassigned badges isn't stored.
I assume assigned_at is a date and/or time.
Default is that the user does not have the badges.
Badges | Badges_User | User
----------------------------------------------
bd_id | bd_id | user_id
bd_name | user_id | etc
bd_desc | assigned_at |
| |
This way only badges actually awarded is stored.
A Badges_User row is only created when a user gets a badge.
Regards
Sigersted