System requirements
Functional:
- User Registration and Authentication:
  - Users can create an account securely.
  - Users can log in and log out.
  - Password reset and account recovery mechanisms are supported.
- Posting Content:
  - Users can submit text posts or links to topic-based communities (subreddits).
  - Formatting options such as Markdown or rich text are supported.
- Voting System:
  - Users can upvote or downvote content.
  - Vote fuzzing is implemented to combat manipulation.
- Commenting and Discussion:
  - Users can comment on posts and reply to other comments.
  - Threaded discussions are supported to organize comments.
- Subreddit Management:
  - Users can create new subreddits based on different topics.
  - Moderators can manage content and users within their subreddit.
- User Profiles:
  - Each user has a profile displaying their activity, posts, comments, and upvoted/downvoted content.
  - Users can follow other users.
- Content Moderation:
  - A system is in place to detect and remove spam and offensive content and to maintain content quality.
  - Users can report inappropriate content.
- Recommendation Engine:
  - Personalized content recommendations are provided based on user activity and preferences.
  - Algorithms surface trending and popular content.
- Notifications:
  - Users receive notifications for new comments, upvotes on their posts, and other relevant activities.
  - Notifications are delivered in real time.
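The threaded-discussion requirement above can be illustrated with a small sketch. It assumes comments are stored flat, each with an optional parent reference, and are grouped into reply trees when a post is rendered. The `Comment` class, its field names, and the helper functions are hypothetical and only meant to show the idea, not a definitive data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Comment:
    id: int
    parent_id: Optional[int]  # None marks a top-level comment on the post
    body: str
    replies: List["Comment"] = field(default_factory=list)


def build_thread(comments: List[Comment]) -> List[Comment]:
    """Group a flat list of comments into reply trees and return the roots."""
    by_parent = defaultdict(list)
    for comment in comments:
        by_parent[comment.parent_id].append(comment)
    # Attach each comment's direct replies, preserving input order.
    for comment in comments:
        comment.replies = by_parent.get(comment.id, [])
    return by_parent.get(None, [])


def render(roots: List[Comment], depth: int = 0) -> List[str]:
    """Flatten the tree depth-first, indenting each reply level by two spaces."""
    lines = []
    for comment in roots:
        lines.append("  " * depth + comment.body)
        lines.extend(render(comment.replies, depth + 1))
    return lines


if __name__ == "__main__":
    flat = [
        Comment(1, None, "root"),
        Comment(2, 1, "reply"),
        Comment(3, 2, "nested reply"),
        Comment(4, None, "second root"),
    ]
    print("\n".join(render(build_thread(flat))))
```

Building the tree is a single pass over the comments plus one dictionary lookup per comment, so the cost grows linearly with the number of comments on a post.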
Non-Functional:
- Scalability:
  - The system should handle a large number of users, posts, and comments efficiently.
  - Horizontal scaling should be supported to accommodate increasing load.
- Performance:
  - Response times for user interactions such as posting, voting, and commenting should be minimal.
  - The platform should be responsive even during peak usage times.
- Reliability:
  - The system should be highly available and resilient to failures.
  - Data integrity and consistency should be maintained at all times.
- Security:
  - User data should be encrypted and protected against unauthorized access.
  - Measures should be in place to prevent spam, hacking, and other security threats.
- Usability:
  - The platform should have an intuitive user interface and be easy to navigate.
  - Accessibility standards should be followed to ensure inclusivity.
- Maintainability:
  - The codebase should be well-structured and documented to facilitate future updates and maintenance.
  - Automated testing and deployment pipelines should be implemented to streamline development workflows.
- Compliance:
  - The platform should comply with relevant data protection regulations such as GDPR.
  - Content moderation policies should adhere to community guidelines and legal requirements.
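One common way to support the spam-prevention requirement above is per-user rate limiting. The sketch below uses a token bucket, which is a technique chosen here for illustration and not something the requirements name. The `TokenBucket` class, its parameters, and the limits in the usage example are all hypothetical. The clock is passed in explicitly so the behavior is easy to reason about.

```python
class TokenBucket:
    """Allow short bursts up to `capacity`, refilled at `refill_per_sec`."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # start with a full bucket
        self.last = now         # timestamp of the last refill, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if the action is permitted."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = max(self.last, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    # Hypothetical limit: bursts of 2 posts, refilled at 1 post per second.
    bucket = TokenBucket(capacity=2, refill_per_sec=1.0)
    for t in (0.0, 0.0, 0.0, 1.0):
        print(t, bucket.allow(now=t))
```

In a deployment with many application servers, the bucket state would need to live in a shared store so that all servers see the same counts; this sketch keeps it in memory for brevity.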
Capacity estimation
Traffic Volume:
Assume the following figures:
- Total Users: 500 million
- Daily Active Users (DAU): 50 million
- Monthly Active Users (MAU): 330 million
- Screen Views per Month: 21 billion
- Visits in One Month: 1.6 billion
- Total Subreddits: 2 million
Transaction Handling:
- Number of Transactions Per Second (TPS):
  - Assuming screen views and visits are spread uniformly across a 30-day month, we can calculate the average TPS from the monthly totals. Peak traffic will be higher than this average.
  - Total Monthly Transactions = Screen Views per Month + Visits in One Month
  - Total Monthly Transactions = 21 billion + 1.6 billion = 22.6 billion
  - Average TPS = Total Monthly Transactions / (30 days * 24 hours * 3600 seconds)
  - Average TPS = 22.6 billion / 2,592,000 ≈ 8,719 TPS
- Number of Write Requests Per Second for Posts:
  - Assuming 2 million posts are created per day, we can calculate the number of new posts in a year.
  - New Posts Per Year = 2 million * 365 days
  - New Posts Per Year = 730 million
  - Write Requests Per Second = New Posts Per Year / (365 days * 24 hours * 3600 seconds)
  - Write Requests Per Second = 730 million / 31,536,000 ≈ 23.15 writes per second
- Number of Comment Requests Per Second:
  - Assuming each post has an average of 25 comments, we can calculate the total comment requests per second.
  - Total Comment Requests Per Second = New Posts Per Year * Average Comments Per Post / (365 days * 24 hours * 3600 seconds)
  - Total Comment Requests Per Second = 730 million * 25 / 31,536,000 ≈ 578.7 comment writes per second
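The rate arithmetic above is easy to get wrong by a factor of ten or more, so it can help to script it. The following is a minimal sketch that recomputes the three averages from the stated assumptions; the function and variable names are illustrative, and the inputs are the assumed figures from this section, not measured values.

```python
SECONDS_PER_DAY = 24 * 3600

# Assumed inputs taken from the traffic figures in this section.
SCREEN_VIEWS_PER_MONTH = 21_000_000_000
VISITS_PER_MONTH = 1_600_000_000
POSTS_PER_DAY = 2_000_000
COMMENTS_PER_POST = 25


def per_second(count: float, days: float) -> float:
    """Average rate per second for `count` events spread evenly over `days`."""
    return count / (days * SECONDS_PER_DAY)


def estimate_rates() -> dict:
    """Return the average request rates implied by the assumptions above."""
    posts_per_year = POSTS_PER_DAY * 365
    return {
        "avg_tps": per_second(SCREEN_VIEWS_PER_MONTH + VISITS_PER_MONTH, 30),
        "post_writes_per_sec": per_second(posts_per_year, 365),
        "comment_writes_per_sec": per_second(posts_per_year * COMMENTS_PER_POST, 365),
    }


if __name__ == "__main__":
    for name, value in estimate_rates().items():
        print(f"{name}: {value:,.2f}")
```

These are averages under a uniform-traffic assumption; provisioning would normally apply a peak multiplier on top, which is a separate assumption not made here.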
Storage Estimations (5 Years):
- User Data:
  - User data includes basic profile information, authentication details, and activity logs.
  - Assume an average user data size of 1 MB.
  - Total User Data Storage = Total Users * Average User Data Size
  - Total User Data Storage = 500 million * 1 MB = 500 million MB
  - Total User Data Storage = 500 TB
- Subreddit Metadata:
  - Subreddit metadata includes information such as subreddit names, descriptions, and moderator lists.
  - Assume an average metadata size of 10 KB per subreddit.
  - Total Subreddit Metadata Storage = Total Subreddits * Average Metadata Size
  - Total Subreddit Metadata Storage = 2 million * 10 KB = 20 million KB
  - Total Subreddit Metadata Storage = 20 GB
- Posts:
  - Posts include text, links, and metadata. Assume an average post size of 100 KB.
  - Assume the number of new posts per year stays constant at 730 million.
  - Total Post Storage Per Year = New Posts Per Year * Average Post Size
  - Total Post Storage Per Year = 730 million * 100 KB = 73 billion KB = 73 TB
  - Total Post Storage for 5 Years = Total Post Storage Per Year * 5
  - Total Post Storage for 5 Years = 73 TB * 5 = 365 TB
- Comments:
  - Similar to posts, assume an average comment size of 10 KB.
  - Total Comment Storage Per Year = New Posts Per Year * Average Comments Per Post * Average Comment Size
  - Total Comment Storage Per Year = 730 million * 25 * 10 KB = 182.5 TB
  - Total Comment Storage for 5 Years = Total Comment Storage Per Year * 5
  - Total Comment Storage for 5 Years = 182.5 TB * 5 = 912.5 TB
These estimations provide a basis for understanding the storage requirements of the Reddit-like platform over a 5-year period, considering user data, subreddit metadata, posts, and comments.
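As a cross-check on the unit conversions, the storage figures can be recomputed in a few lines. This is a sketch under the section's stated assumptions (per-item sizes, 2 million posts per day, 25 comments per post) and it uses decimal units, where 1 TB = 1,000 GB. The function and variable names are illustrative.

```python
# Decimal (SI) units, matching the conversions used in the estimates.
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12

TOTAL_USERS = 500_000_000
TOTAL_SUBREDDITS = 2_000_000
POSTS_PER_YEAR = 2_000_000 * 365  # 730 million
COMMENTS_PER_POST = 25
YEARS = 5


def estimate_storage_bytes() -> dict:
    """Return estimated storage in bytes for each data category."""
    return {
        "users": TOTAL_USERS * 1 * MB,
        "subreddits": TOTAL_SUBREDDITS * 10 * KB,
        "posts": POSTS_PER_YEAR * 100 * KB * YEARS,
        "comments": POSTS_PER_YEAR * COMMENTS_PER_POST * 10 * KB * YEARS,
    }


if __name__ == "__main__":
    sizes = estimate_storage_bytes()
    for name, size in sizes.items():
        print(f"{name}: {size / TB:,.2f} TB")
    print(f"total: {sum(sizes.values()) / TB:,.2f} TB")
```

The estimate covers raw data only. Replication, indexes, and media attachments are not part of the assumptions above and would add to the total.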
API design
Define what APIs are expected from the system...
Database design
Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...
High-level design
You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design...
Request flows
Explain how a request flows from end to end in your high-level design. You could also draw a sequence diagram using the diagramming tool to enhance your explanation...
Detailed component design
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Are there any relevant algorithms or data structures you would use for a component? You could also draw a diagram using the diagramming tool to enhance your design...
Trade-offs/Tech choices
Explain any trade-offs you have made and why you made certain tech choices...
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?