System requirements
Functional:
- User Registration and Authentication: Users can create an account, log in, and log out securely.
- Compose and Share Tweets: Users can create, edit, and delete tweets to share with their followers.
- Followers and Following: Users can follow/unfollow other users to see their tweets on their timeline.
- Like (Favorite) Tweets: Users can like and unlike tweets to show appreciation.
- Timeline: Users have a timeline where they can see tweets from the users they follow in chronological order.
- Search: Users can search for other users and tweets based on keywords or hashtags.
- Notifications: Users receive notifications for likes, retweets, mentions, etc.
Non-Functional:
- Scalability: The system should handle a growing user base and increasing tweet volume.
- Availability: The system should be available with minimal downtime.
- Performance: The system should respond to user actions with low latency.
- Security: User data and privacy should be protected.
Capacity estimation
Estimating the precise user base and tweet volume is challenging without real data. However, we can make educated guesses based on existing social media platforms. Here are some possible assumptions:
- Active users: 10 million daily active users (DAUs)
- Average Tweets per User per Day: 1 tweet/day/user
- Average Tweet Size: 10 KB (including text and potential media)
These assumptions lead to the following estimates:
- Average posts per day: 10 million users * 1 tweet/day/user = 10 million posts/day
- Average daily data: 10 million posts/day * 10 KB = 100 GB/day
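The arithmetic above can be double-checked with a short script. This is only a sketch: the average write rate and the yearly storage figure are derived extras that are not part of the original estimate, and they assume evenly spread traffic and decimal units (1 GB = 1,000,000 KB).

```python
# Back-of-the-envelope capacity estimate using the assumptions listed above.
DAILY_ACTIVE_USERS = 10_000_000   # 10 million DAUs
TWEETS_PER_USER_PER_DAY = 1
AVG_TWEET_SIZE_KB = 10            # text plus potential media

SECONDS_PER_DAY = 24 * 60 * 60
KB_PER_GB = 1_000_000             # assumption: decimal units


def estimate_capacity() -> dict:
    """Return the daily estimates plus two derived figures."""
    tweets_per_day = DAILY_ACTIVE_USERS * TWEETS_PER_USER_PER_DAY
    daily_storage_gb = tweets_per_day * AVG_TWEET_SIZE_KB / KB_PER_GB
    return {
        "tweets_per_day": tweets_per_day,
        "daily_storage_gb": daily_storage_gb,
        # Derived figures, assuming traffic is spread evenly over the day.
        "avg_writes_per_second": tweets_per_day / SECONDS_PER_DAY,
        "yearly_storage_tb": daily_storage_gb * 365 / 1_000,
    }


if __name__ == "__main__":
    for name, value in estimate_capacity().items():
        print(f"{name}: {value:,.1f}")
```

Real traffic is rarely uniform, so peak write rates would be noticeably higher than the computed average.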
API design
- Registration and Authentication API
  - Sign up
  - Log in
  - Log out
- Compose and Share Post API
  - Create post
  - Delete post
- Followers & Followings API
  - Follow user
  - Unfollow user
- Like/Unlike Post API
- Notification API
- Search User/Post API
  - Search post
  - Search user
- Timeline API
  - Get timeline
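One possible way to map the API groups above onto HTTP routes is sketched below. Every method and path here is a hypothetical choice made for illustration; the design does not prescribe them. The small `resolve` helper (also hypothetical) only matches path segments and ignores query strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    method: str
    path: str          # "{name}" segments are path parameters
    description: str


# Hypothetical route table covering the API groups listed above.
ENDPOINTS = [
    Endpoint("POST", "/auth/signup", "Sign up"),
    Endpoint("POST", "/auth/login", "Log in"),
    Endpoint("POST", "/auth/logout", "Log out"),
    Endpoint("POST", "/posts", "Create post"),
    Endpoint("DELETE", "/posts/{post_id}", "Delete post"),
    Endpoint("PUT", "/users/{user_id}/follow", "Follow user"),
    Endpoint("DELETE", "/users/{user_id}/follow", "Unfollow user"),
    Endpoint("PUT", "/posts/{post_id}/like", "Like post"),
    Endpoint("DELETE", "/posts/{post_id}/like", "Unlike post"),
    Endpoint("GET", "/notifications", "List notifications"),
    Endpoint("GET", "/search/posts", "Search post"),
    Endpoint("GET", "/search/users", "Search user"),
    Endpoint("GET", "/timeline", "Get timeline"),
]


def match(endpoint: Endpoint, method: str, path: str) -> Optional[dict]:
    """Return the extracted path parameters, or None if there is no match."""
    if endpoint.method != method:
        return None
    pattern_parts = endpoint.path.strip("/").split("/")
    path_parts = path.strip("/").split("/")
    if len(pattern_parts) != len(path_parts):
        return None
    params = {}
    for pattern, actual in zip(pattern_parts, path_parts):
        if pattern.startswith("{") and pattern.endswith("}"):
            params[pattern[1:-1]] = actual
        elif pattern != actual:
            return None
    return params


def resolve(method: str, path: str) -> Optional[tuple]:
    """Find the first endpoint matching the request, with its parameters."""
    for endpoint in ENDPOINTS:
        params = match(endpoint, method, path)
        if params is not None:
            return endpoint, params
    return None
```

Using PUT and DELETE on the same resource for follow/unfollow and like/unlike keeps those operations idempotent, which makes client retries safe.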
Database design
Users Table
Stores information about the users of the service.
- UserID (Primary Key): A unique identifier for each user.
- Username: The user's handle or screen name, must be unique.
- Email: The user's email address, must be unique.
- PasswordHash: A hashed representation of the user's password for security.
- CreatedAt: The date and time when the account was created.
- Bio: A short text bio about the user (optional).
- ProfileImageURL: URL to the user's profile image (optional).
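The PasswordHash column could be populated along the lines of the sketch below. It uses PBKDF2 from Python's standard library as one reasonable option; the design does not name a specific algorithm, and the storage format, function names, and iteration count are assumptions to be tuned to the deployment.

```python
import hashlib
import hmac
import secrets

ALGORITHM = "pbkdf2_sha256"
DEFAULT_ITERATIONS = 600_000  # assumption: tune to your hardware and policy


def hash_password(password: str, iterations: int = DEFAULT_ITERATIONS) -> str:
    """Return a self-describing string suitable for the PasswordHash column."""
    salt = secrets.token_bytes(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return f"{ALGORITHM}${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    algorithm, iterations, salt_hex, digest_hex = stored.split("$")
    if algorithm != ALGORITHM:
        return False
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Storing the algorithm name and iteration count next to the salt and digest lets the parameters be raised later without invalidating existing accounts.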
Tweets Table
Stores the tweets posted by users.
- TweetID (Primary Key): A unique identifier for each tweet.
- UserID (Foreign Key): The ID of the user who posted the tweet.
- Content: The text content of the tweet, limited to a certain number of characters.
- CreatedAt: The date and time when the tweet was posted.
- MediaURL: URL to media attached to the tweet (optional).
Followers Table
Represents the follower relationships between users, essentially a many-to-many relationship within the Users table.
- FollowerID (Composite Primary Key, Foreign Key): The user ID of the follower.
- FollowedID (Composite Primary Key, Foreign Key): The user ID of the followed user.
- CreatedAt: The date and time when the follow relationship was established.
Likes Table
Tracks which users have liked which tweets.
- UserID (Composite Primary Key, Foreign Key): The ID of the user who liked the tweet.
- TweetID (Composite Primary Key, Foreign Key): The ID of the tweet that was liked.
- CreatedAt: The date and time when the tweet was liked.
Notifications Table
Stores the notifications sent to users.
- NotificationID (Primary Key): A unique identifier for each notification.
- UserID (Foreign Key): The ID of the user who will receive the notification. This links to the Users table.
- TriggeredByUserID (Foreign Key): The ID of the user who caused the notification (e.g., the user who liked the tweet or followed the recipient). This also links to the Users table.
- Type: The type of notification (e.g., 'Like', 'Follow', 'Mention', etc.).
- EntityID: The ID of the entity that triggered the notification (e.g., the TweetID of a liked tweet or the UserID of a new follower). The specific use of this field can vary based on the notification type.
- Message: A custom message for the notification, which could be dynamically generated based on the notification type and entities involved.
- CreatedAt: The date and time when the notification was created.
- Read: A boolean flag indicating whether the notification has been read by the user.
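The five tables described above can be tried out with an in-memory SQLite database, as in the following sketch. Several details are assumptions rather than part of the design: the snake_case column names, the 280-character content limit (the design only says "a certain number of characters"), and the `notification_type` and `is_read` names used in place of Type and Read.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id           INTEGER PRIMARY KEY,
    username          TEXT NOT NULL UNIQUE,
    email             TEXT NOT NULL UNIQUE,
    password_hash     TEXT NOT NULL,
    created_at        TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    bio               TEXT,
    profile_image_url TEXT
);
CREATE TABLE tweets (
    tweet_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    content    TEXT NOT NULL CHECK (length(content) <= 280),  -- assumed limit
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    media_url  TEXT
);
CREATE TABLE followers (
    follower_id INTEGER NOT NULL REFERENCES users(user_id),
    followed_id INTEGER NOT NULL REFERENCES users(user_id),
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (follower_id, followed_id)
);
CREATE TABLE likes (
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    tweet_id   INTEGER NOT NULL REFERENCES tweets(tweet_id),
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, tweet_id)
);
CREATE TABLE notifications (
    notification_id      INTEGER PRIMARY KEY,
    user_id              INTEGER NOT NULL REFERENCES users(user_id),
    triggered_by_user_id INTEGER NOT NULL REFERENCES users(user_id),
    notification_type    TEXT NOT NULL,
    entity_id            INTEGER NOT NULL,
    message              TEXT,
    created_at           TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    is_read              INTEGER NOT NULL DEFAULT 0
);
"""


def create_database() -> sqlite3.Connection:
    """Create an in-memory database with foreign key checks enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves these off by default
    conn.executescript(SCHEMA)
    return conn
```

The composite primary keys on `followers` and `likes` mean that a duplicate follow or like is rejected by the database itself, so the services do not need a separate existence check before inserting.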
High-level design
- Client (web and mobile) connects to the API Gateway
- User service
  - Handles user registration, authentication, and profile management
  - Stores user information in a highly available database
  - Provides APIs for other components to access user data
- Post content service
  - Handles tweet creation, editing, and deletion
  - Uses a distributed database and messaging for high throughput and scalability
  - Stores tweets and their associated metadata (hashtags, mentions, etc.)
- Follower service
  - Stores follow relationships in a scalable graph database
  - Provides efficient methods for querying followers and followings
  - Used by the tweet service to notify followers about new tweets
- Timeline service
  - Generates a personalized timeline for each user based on their follow relationships
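As an illustration of what the timeline service might do, the sketch below builds a timeline at read time by merging the tweets of followed users. This pull-based (fan-out on read) approach is an assumption, since the design does not say how timelines are generated, and the in-memory dictionaries stand in for the follower and tweet stores. Tweets are returned newest first, and each author's list is assumed to already be sorted that way.

```python
import heapq
from dataclasses import dataclass
from itertools import islice


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    user_id: int
    content: str
    created_at: float  # Unix timestamp


def build_timeline(
    user_id: int,
    following: dict[int, set[int]],
    tweets_by_user: dict[int, list[Tweet]],
    limit: int = 20,
) -> list[Tweet]:
    """Merge the followed users' tweets into one list, newest first.

    Each list in tweets_by_user must already be sorted newest first.
    """
    followed = following.get(user_id, set())
    streams = [tweets_by_user.get(author, []) for author in followed]
    # heapq.merge lazily merges pre-sorted inputs, so only `limit` items
    # are pulled even if the followed users have many tweets.
    merged = heapq.merge(*streams, key=lambda t: t.created_at, reverse=True)
    return list(islice(merged, limit))
```

A push-based alternative would write each new tweet into the followers' precomputed timelines, trading more write work for cheaper reads.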
Request flows
Detailed component design
Tweet Service
- Scalability: The service can be horizontally scaled by adding more instances to handle increased load.
- Data Structures: Relational database for core user and tweet data
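Horizontal scaling typically needs something in front of the instances to spread requests across them. The following is a minimal round-robin sketch that also skips instances marked unhealthy; the class, its method names, and the instance labels are hypothetical, and a production setup would more likely rely on an existing load balancer with health checks.

```python
class RoundRobinBalancer:
    """Cycle through service instances, skipping ones marked as down."""

    def __init__(self, instances: list[str]) -> None:
        if not instances:
            raise ValueError("at least one instance is required")
        self._instances = list(instances)
        self._healthy = set(self._instances)
        self._next = 0  # index of the next instance to try

    def mark_down(self, instance: str) -> None:
        self._healthy.discard(instance)

    def mark_up(self, instance: str) -> None:
        if instance in self._instances:
            self._healthy.add(instance)

    def pick(self) -> str:
        """Return the next healthy instance in rotation."""
        for _ in range(len(self._instances)):
            candidate = self._instances[self._next]
            self._next = (self._next + 1) % len(self._instances)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy instances available")
```

Adding capacity then amounts to registering more instances, as long as the service itself keeps no per-instance state.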
Trade offs/Tech choices
Database Choice:
- Choice: Relational database for core user and tweet data.
- Trade-off: While relational databases offer strong consistency and schema enforcement, they might not be ideal for highly scalable tweet storage and retrieval.
- Reasoning: For this initial design, a relational database provides a familiar and well-understood option for core functionalities. It ensures data integrity and simplifies queries. However, as the platform scales, we might need to explore alternative data stores like NoSQL databases or key-value stores like Redis for storing tweets to handle high write volume and fast retrieval.
Failure scenarios/bottlenecks
1. Single Point of Failure:
- Scenario: A critical component, like the API Gateway or database, fails, causing service outage.
- Impact: Users cannot access the platform, leading to frustration and potential loss of user engagement.
- Mitigation: Implement redundancy for critical components using techniques like load balancing and failover mechanisms. This ensures service remains available even if one instance fails.
2. Scalability Bottleneck:
- Scenario: Increased user base or tweet volume overwhelms the system's capacity, leading to performance degradation and potential outages.
- Impact: Slow response times, service disruptions, and negative user experience.
- Mitigation: Implement horizontal scaling by adding more instances of services like Tweet Service and Search Service to distribute the load. Additionally, utilizing caching mechanisms can reduce database load and improve response times.
3. Security Breach:
- Scenario: Hackers gain unauthorized access to user data or manipulate system functionalities.
- Impact: Compromised user privacy, potential financial losses, and reputational damage.
- Mitigation: Implement robust security measures like user authentication, data encryption, and regular security audits to identify and address vulnerabilities.
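The caching mitigation mentioned under the scalability bottleneck could follow a cache-aside pattern, sketched below with a small in-process cache. The `TTLCache` class, the `get_tweet` helper, and the `load_from_db` callback are hypothetical names; in a deployment with several instances, a shared cache such as Redis would usually take the place of this in-memory structure.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class TTLCache:
    """Small LRU cache whose entries expire after a fixed time to live."""

    def __init__(
        self,
        max_entries: int = 1024,
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._entries: OrderedDict = OrderedDict()  # key -> (expires_at, value)
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, key: Any) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: Any, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)
        self._entries.move_to_end(key)
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used


def get_tweet(tweet_id: int, cache: TTLCache, load_from_db: Callable) -> Optional[Any]:
    """Cache-aside read: try the cache first, fall back to the database."""
    cached = cache.get(tweet_id)
    if cached is not None:
        return cached
    tweet = load_from_db(tweet_id)
    if tweet is not None:
        cache.put(tweet_id, tweet)
    return tweet
```

The time to live bounds how stale a cached tweet can be after an edit or delete; explicit invalidation on writes would tighten that further.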
Future improvements
1. Advanced Search:
- Extend search functionality to include advanced options such as filtering by date, location, and user mentions.
2. Content Moderation and Recommendation Systems:
- Develop mechanisms to moderate inappropriate content and personalize user feeds based on their interests and interactions.