System requirements


Functional:

  1. User Registration and Authentication: Users can create an account, log in, and log out securely.
  2. Compose and Share Tweets: Users can create, edit, and delete tweets to share with their followers.
  3. Followers and Following: Users can follow/unfollow other users to see their tweets on their timeline.
  4. Like (Favorite) Tweets: Users can like and unlike tweets to show appreciation.
  5. Timeline: Users have a timeline where they can see tweets from users they follow in reverse chronological order (newest first).
  6. Search: Users can search for other users and tweets based on keywords or hashtags.
  7. Notifications: Users receive notifications for likes, retweets, mentions, etc.



Non-Functional:

  • Scalability: The system should handle a growing user base and increasing tweet volume.
  • Availability: The system should be available with minimal downtime.
  • Performance: The system should respond to user actions with low latency.
  • Security: User data and privacy should be protected.




Capacity estimation

Estimating the precise user base and tweet volume is challenging without real data. However, we can make educated guesses based on existing social media platforms. Here are some possible assumptions:

  1. Active users: 10 million daily active users (DAUs)
  2. Average Tweets per User per Day: 1 tweet/day/user
  3. Average Tweet Size: 10 KB (including text and potential media)

These assumptions lead to the following estimates:

Average posts per day: 10 million * 1 = 10 million posts/day

Average daily data: 10 million * 1 * 10 KB = 100 GB/day
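The same arithmetic can be scripted so the estimates are easy to re-run when an assumption changes. This is a minimal sketch that uses only the three assumptions above plus decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes), which is what makes 10 million * 10 KB come out to exactly 100 GB; the variable names are illustrative.

```python
# Back-of-the-envelope capacity estimate from the stated assumptions.
# Decimal units are assumed (1 KB = 1_000 bytes, 1 GB = 10**9 bytes).
DAU = 10_000_000                 # daily active users
TWEETS_PER_USER_PER_DAY = 1
AVG_TWEET_SIZE_BYTES = 10_000    # 10 KB, text plus potential media
SECONDS_PER_DAY = 86_400

tweets_per_day = DAU * TWEETS_PER_USER_PER_DAY
bytes_per_day = tweets_per_day * AVG_TWEET_SIZE_BYTES
avg_writes_per_second = tweets_per_day / SECONDS_PER_DAY
bytes_per_year = bytes_per_day * 365

print(f"tweets/day:     {tweets_per_day:,}")
print(f"storage/day:    {bytes_per_day / 10**9:.0f} GB")
print(f"avg writes/sec: {avg_writes_per_second:.0f}")
print(f"storage/year:   {bytes_per_year / 10**12:.1f} TB")
```

With these inputs the service averages roughly 116 tweet writes per second and accumulates about 36.5 TB of tweet data per year. Peak traffic is usually well above the daily average, so a peak factor would normally be applied on top of these figures.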







API design

  1. Registration and authentication API
      ◦ Sign up
      ◦ Log in
      ◦ Log out
  2. Compose and share post API
      ◦ Create post
      ◦ Delete post
  3. Followers & following API
      ◦ Follow user
      ◦ Unfollow user
  4. Like/Unlike post API
  5. Notification API
  6. Search user/post API
      ◦ Search post
      ◦ Search user
  7. Timeline API
      ◦ Get timeline
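One way to map these operations onto HTTP is sketched below, together with a toy in-memory backend for the post, follow, and like operations. The paths, the `/v1` prefix, and every class and method name are hypothetical illustrations, not a defined interface. Follow and like are modeled as `PUT`/`DELETE` on a resource so that repeating the request has no extra effect, which mirrors the composite primary keys planned for the Followers and Likes tables.

```python
import itertools
import time
from dataclasses import dataclass

# Hypothetical route table: (HTTP method, path) -> operation from the list above.
ROUTES = {
    ("POST", "/v1/users"): "sign_up",
    ("POST", "/v1/sessions"): "log_in",
    ("DELETE", "/v1/sessions/current"): "log_out",
    ("POST", "/v1/posts"): "create_post",
    ("DELETE", "/v1/posts/{post_id}"): "delete_post",
    ("PUT", "/v1/users/{user_id}/following/{target_id}"): "follow",
    ("DELETE", "/v1/users/{user_id}/following/{target_id}"): "unfollow",
    ("PUT", "/v1/posts/{post_id}/likes/{user_id}"): "like",
    ("DELETE", "/v1/posts/{post_id}/likes/{user_id}"): "unlike",
    ("GET", "/v1/users/{user_id}/notifications"): "get_notifications",
    ("GET", "/v1/search/posts"): "search_posts",
    ("GET", "/v1/search/users"): "search_users",
    ("GET", "/v1/users/{user_id}/timeline"): "get_timeline",
}


@dataclass
class Post:
    post_id: int
    user_id: int
    content: str
    created_at: float


class InMemoryBackend:
    """Toy backend for a few operations; illustrative only, not production code."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.posts: dict[int, Post] = {}
        self.following: set[tuple[int, int]] = set()  # (follower_id, followed_id)
        self.likes: set[tuple[int, int]] = set()      # (user_id, post_id)

    def create_post(self, user_id: int, content: str) -> Post:
        post = Post(next(self._ids), user_id, content, time.time())
        self.posts[post.post_id] = post
        return post

    def delete_post(self, user_id: int, post_id: int) -> bool:
        post = self.posts.get(post_id)
        if post is None or post.user_id != user_id:
            return False  # only the author may delete a post
        del self.posts[post_id]
        # Drop likes that pointed at the deleted post.
        self.likes = {(u, p) for (u, p) in self.likes if p != post_id}
        return True

    # Set semantics make follow/like idempotent, like PUT and DELETE.
    def follow(self, user_id: int, target_id: int) -> None:
        if user_id != target_id:
            self.following.add((user_id, target_id))

    def unfollow(self, user_id: int, target_id: int) -> None:
        self.following.discard((user_id, target_id))

    def like(self, user_id: int, post_id: int) -> None:
        if post_id in self.posts:
            self.likes.add((user_id, post_id))

    def unlike(self, user_id: int, post_id: int) -> None:
        self.likes.discard((user_id, post_id))
```

A real service would authenticate the caller from the session rather than trusting a `user_id` argument; the sketch passes it explicitly only to keep the logic visible.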



Database design

Users Table

Stores information about the users of the service.

  • UserID (Primary Key): A unique identifier for each user.
  • Username: The user's handle or screen name, must be unique.
  • Email: The user's email address, must be unique.
  • PasswordHash: A hashed representation of the user's password for security.
  • CreatedAt: The date and time when the account was created.
  • Bio: A short text bio about the user (optional).
  • ProfileImageURL: URL to the user's profile image (optional).

Tweets Table

Stores the tweets posted by users.

  • TweetID (Primary Key): A unique identifier for each tweet.
  • UserID (Foreign Key): The ID of the user who posted the tweet.
  • Content: The text content of the tweet, limited to a maximum number of characters (e.g., 280).
  • CreatedAt: The date and time when the tweet was posted.
  • MediaURL: URL to media attached to the tweet (optional).


Followers Table

Represents the follower relationships between users: a self-referential many-to-many relationship on the Users table.

  • FollowerID (Composite Primary Key, Foreign Key): The user ID of the follower.
  • FollowedID (Composite Primary Key, Foreign Key): The user ID of the followed user.
  • CreatedAt: The date and time when the follow relationship was established.

Likes Table

Tracks which users have liked which tweets.

  • UserID (Composite Primary Key, Foreign Key): The ID of the user who liked the tweet.
  • TweetID (Composite Primary Key, Foreign Key): The ID of the tweet that was liked.
  • CreatedAt: The date and time when the tweet was liked.

Notifications Table

Stores the notifications delivered to users.

  • NotificationID (Primary Key): A unique identifier for each notification.
  • UserID (Foreign Key): The ID of the user who will receive the notification. This links to the Users table.
  • TriggeredByUserID (Foreign Key): The ID of the user who caused the notification (e.g., the user who liked the tweet or followed the recipient). This also links to the Users table.
  • Type: The type of notification (e.g., 'Like', 'Follow', 'Mention', etc.).
  • EntityID: The ID of the entity that triggered the notification (e.g., the TweetID of a liked tweet or the UserID of a new follower). The specific use of this field can vary based on the notification type.
  • Message: A custom message for the notification, which could be dynamically generated based on the notification type and entities involved.
  • CreatedAt: The date and time when the notification was created.
  • Read: A boolean flag indicating whether the notification has been read by the user.
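The five tables above can be expressed as DDL. The sketch below uses Python's built-in `sqlite3` module purely so it runs anywhere; the snake_case column names, column types, the 280-character `CHECK`, the two secondary indexes, and the renaming of `Read` to `is_read` are all assumptions layered on the field lists. It also shows a simple pull-style timeline query that joins Tweets with Followers.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id           INTEGER PRIMARY KEY,
    username          TEXT NOT NULL UNIQUE,
    email             TEXT NOT NULL UNIQUE,
    password_hash     TEXT NOT NULL,
    created_at        TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    bio               TEXT,
    profile_image_url TEXT
);
CREATE TABLE tweets (
    tweet_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    content    TEXT NOT NULL CHECK (length(content) <= 280),  -- assumed limit
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    media_url  TEXT
);
CREATE INDEX idx_tweets_user_time ON tweets(user_id, created_at);
CREATE TABLE followers (
    follower_id INTEGER NOT NULL REFERENCES users(user_id),
    followed_id INTEGER NOT NULL REFERENCES users(user_id),
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (follower_id, followed_id)
);
CREATE INDEX idx_followers_followed ON followers(followed_id);
CREATE TABLE likes (
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    tweet_id   INTEGER NOT NULL REFERENCES tweets(tweet_id),
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, tweet_id)
);
CREATE TABLE notifications (
    notification_id      INTEGER PRIMARY KEY,
    user_id              INTEGER NOT NULL REFERENCES users(user_id),
    triggered_by_user_id INTEGER NOT NULL REFERENCES users(user_id),
    type                 TEXT NOT NULL,
    entity_id            INTEGER,
    message              TEXT,
    created_at           TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    is_read              INTEGER NOT NULL DEFAULT 0
);
"""

# Pull-style timeline: newest tweets from accounts the user follows.
TIMELINE_SQL = """
SELECT t.tweet_id, t.user_id, t.content, t.created_at
FROM tweets AS t
JOIN followers AS f ON f.followed_id = t.user_id
WHERE f.follower_id = ?
ORDER BY t.created_at DESC, t.tweet_id DESC
LIMIT ?;
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)

conn.executemany(
    "INSERT INTO users (user_id, username, email, password_hash) VALUES (?, ?, ?, ?)",
    [(1, "alice", "a@example.com", "x"), (2, "bob", "b@example.com", "x"),
     (3, "carol", "c@example.com", "x")],
)
conn.execute("INSERT INTO followers (follower_id, followed_id) VALUES (1, 2)")
conn.executemany(
    "INSERT INTO tweets (tweet_id, user_id, content, created_at) VALUES (?, ?, ?, ?)",
    [(10, 2, "bob first", "2024-01-01 10:00:00"),
     (11, 3, "carol", "2024-01-01 11:00:00"),
     (12, 2, "bob second", "2024-01-01 12:00:00")],
)
timeline_ids = [row[0] for row in conn.execute(TIMELINE_SQL, (1, 20))]
print(timeline_ids)  # [12, 10]: alice follows only bob, newest first
```

The index on `followers(followed_id)` serves "who follows X" lookups, while the composite primary key already covers "whom does X follow".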






High-level design

  • Client (Web + Mobile) connects to the API Gateway.
  • User service
      ◦ Handles user registration, authentication, and profile management.
      ◦ Stores user information in a highly available database.
      ◦ Provides APIs for other components to access user data.
  • Post content (tweet) service
      ◦ Handles tweet creation, editing, and deletion.
      ◦ Uses a distributed database plus messaging for high throughput and scalability.
      ◦ Stores tweets and their associated metadata (hashtags, mentions, etc.).
  • Follower service
      ◦ Stores user following relationships in a scalable graph database.
      ◦ Provides efficient methods for querying followers and followees.
      ◦ Used by the tweet service to notify followers about new tweets.
  • Timeline service
      ◦ Generates a personalized timeline for each user based on their following relationships.
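Because the tweet service notifies followers about each new tweet, one common way to realize the timeline service is fan-out on write: every new tweet ID is pushed into a bounded, newest-first timeline per follower, so reading a timeline is a cheap lookup. The sketch below assumes that push model, an in-memory store in place of a real cache, and a hypothetical per-user cap; the class and method names are illustrative.

```python
import itertools
from collections import defaultdict, deque

TIMELINE_CAPACITY = 800  # hypothetical cap on cached entries per user


class FanOutTimelines:
    """Push model: each new tweet ID is prepended to every follower's timeline."""

    def __init__(self, capacity: int = TIMELINE_CAPACITY) -> None:
        self.followers: dict[int, set[int]] = defaultdict(set)  # author -> followers
        # Each timeline is a bounded deque; maxlen silently evicts the oldest entry.
        self.timelines: dict[int, deque[int]] = defaultdict(
            lambda: deque(maxlen=capacity)
        )

    def follow(self, follower_id: int, followed_id: int) -> None:
        self.followers[followed_id].add(follower_id)

    def on_new_tweet(self, author_id: int, tweet_id: int) -> int:
        """Fan a tweet out to the author's followers; returns timelines written."""
        recipients = self.followers.get(author_id, set())
        for user_id in recipients:
            self.timelines[user_id].appendleft(tweet_id)  # newest first
        return len(recipients)

    def get_timeline(self, user_id: int, limit: int = 20) -> list[int]:
        return list(itertools.islice(self.timelines.get(user_id, ()), limit))


timelines = FanOutTimelines(capacity=3)
timelines.follow(follower_id=1, followed_id=2)
for tweet_id in (100, 101, 102, 103):
    timelines.on_new_tweet(author_id=2, tweet_id=tweet_id)
print(timelines.get_timeline(1))  # [103, 102, 101]
```

The cost of this design is that each write touches every follower's timeline, which becomes expensive for accounts with very large followings; the usual alternative is to merge such accounts' tweets at read time instead.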




Request flows

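Posting a tweet:

  1. The client sends a create-post request to the API Gateway.
  2. The gateway routes the request to the post content (tweet) service, which stores the tweet and its metadata (hashtags, mentions, etc.).
  3. The tweet service asks the follower service for the author's followers and notifies them about the new tweet through the messaging layer.

Reading the timeline:

  1. The client requests its timeline through the API Gateway.
  2. The gateway routes the request to the timeline service, which uses the follower service to find the accounts the user follows and assembles their tweets into a personalized timeline.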






Detailed component design

Tweet Service

  • Scalability: The service can be horizontally scaled by adding more instances to handle increased load.
  • Data storage: A relational database for core user and tweet data.



Trade-offs/Tech choices

Database Choice:

  • Choice: Relational database for core user and tweet data.
  • Trade-off: While relational databases offer strong consistency and schema enforcement, they might not be ideal for highly scalable tweet storage and retrieval.
  • Reasoning: For this initial design, a relational database provides a familiar and well-understood option for core functionalities. It ensures data integrity and simplifies queries. However, as the platform scales, we might need to explore alternative data stores like NoSQL databases or key-value stores like Redis for storing tweets to handle high write volume and fast retrieval.




Failure scenarios/bottlenecks

1. Single Point of Failure:

  • Scenario: A critical component, like the API Gateway or database, fails, causing service outage.
  • Impact: Users cannot access the platform, leading to frustration and potential loss of user engagement.
  • Mitigation: Implement redundancy for critical components using techniques like load balancing and failover mechanisms. This ensures service remains available even if one instance fails.
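A client-side view of "load balancing plus failover" is sketched below: requests rotate round-robin across redundant replicas, and a replica that fails is skipped in favor of the next one. The replica names, the `request` callable, and the use of `ConnectionError` as the failure signal are assumptions for illustration; production systems typically add health checks, timeouts, and backoff.

```python
import itertools
from typing import Callable, Sequence


class FailoverBalancer:
    """Round-robin over replicas, failing over to the next one when a call fails."""

    def __init__(self, replicas: Sequence[str]) -> None:
        if not replicas:
            raise ValueError("need at least one replica")
        self.replicas = list(replicas)
        self._start = itertools.cycle(range(len(self.replicas)))

    def call(self, request: Callable[[str], str]) -> str:
        first = next(self._start)  # round-robin starting point
        errors = []
        for offset in range(len(self.replicas)):
            replica = self.replicas[(first + offset) % len(self.replicas)]
            try:
                return request(replica)
            except ConnectionError as exc:
                errors.append(f"{replica}: {exc}")  # fail over to the next replica
        raise RuntimeError("all replicas failed: " + "; ".join(errors))


# Simulated outage: db-1 is down, so traffic fails over to the healthy replicas.
down = {"db-1"}


def fake_request(replica: str) -> str:
    if replica in down:
        raise ConnectionError("unreachable")
    return f"ok from {replica}"


balancer = FailoverBalancer(["db-1", "db-2", "db-3"])
print(balancer.call(fake_request))  # ok from db-2
```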

2. Scalability Bottleneck:

  • Scenario: Increased user base or tweet volume overwhelms the system's capacity, leading to performance degradation and potential outages.
  • Impact: Slow response times, service disruptions, and negative user experience.
  • Mitigation: Implement horizontal scaling by adding more instances of services like Tweet Service and Search Service to distribute the load. Additionally, utilizing caching mechanisms can reduce database load and improve response times.
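The caching mitigation is often implemented as cache-aside: read from the cache first, fall back to the database on a miss, then populate the cache. The sketch below is an in-process LRU cache with a time-to-live that stands in for a shared cache such as Redis; the capacity, TTL, and loader function are assumptions chosen for illustration.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable


class CacheAside:
    """LRU cache with a TTL in front of a slower loader (e.g. a database query)."""

    def __init__(self, loader: Callable[[Hashable], Any], capacity: int = 10_000,
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.loader = loader
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: OrderedDict = OrderedDict()  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        now = self.clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = self.loader(key)  # miss or expired: read from the database
        self._data[key] = (now + self.ttl, value)
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value

    def invalidate(self, key: Hashable) -> None:
        """Call after a write (e.g. a deleted tweet) so readers avoid stale data."""
        self._data.pop(key, None)


db_reads = []


def load_tweet(tweet_id: int) -> dict:
    db_reads.append(tweet_id)  # stands in for a database query
    return {"tweet_id": tweet_id}


cache = CacheAside(load_tweet, capacity=2)
cache.get(7)
cache.get(7)
print(db_reads, cache.hits, cache.misses)  # [7] 1 1
```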

3. Security Breach:

  • Scenario: Hackers gain unauthorized access to user data or manipulate system functionalities.
  • Impact: Compromised user privacy, potential financial losses, and reputational damage.
  • Mitigation: Implement robust security measures like user authentication, data encryption, and regular security audits to identify and address vulnerabilities.
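On the authentication side, the `PasswordHash` column in the Users table should hold a salted, deliberately slow hash rather than the password itself. The sketch below uses PBKDF2-HMAC-SHA256 from Python's standard library; the iteration count and the `algorithm$iterations$salt$digest` storage format are assumptions, and the count should be tuned to current guidance and hardware.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumption: tune to current guidance and hardware


def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    """Return a self-describing string suitable for the PasswordHash column."""
    salt = secrets.token_bytes(16)  # unique random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    algorithm, iterations, salt_hex, digest_hex = stored.split("$")
    if algorithm != "pbkdf2_sha256":
        return False
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))
```

Storing the iteration count alongside the salt lets the cost be raised later: old hashes still verify, and they can be re-hashed at the new cost on the user's next successful login.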




Future improvements

1. Advanced Search:

  • Extend search functionality to include advanced options such as filtering by date, location, and user mentions.

2. Content Moderation and Recommendation Systems:

  • Develop mechanisms to moderate inappropriate content and personalize user feeds based on their interests and interactions.
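Keyword and hashtag search, along with the date and mention filters proposed under Advanced Search, can be illustrated with a small inverted index. This is a minimal in-memory sketch, assuming a simple tokenizer where `#tag` and `@user` are indexed as distinct tokens (so `python` does not match `#python`); a production deployment would more likely use a dedicated search engine. Location filtering is omitted because the Tweets table has no location field.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional

TOKEN_RE = re.compile(r"[#@]?\w+")  # words, #hashtags, @mentions


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    author: str
    content: str
    created: date


class SearchIndex:
    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)  # token -> tweet IDs
        self.tweets: dict[int, Tweet] = {}

    def add(self, tweet: Tweet) -> None:
        self.tweets[tweet.tweet_id] = tweet
        for token in TOKEN_RE.findall(tweet.content.lower()):
            self.postings[token].add(tweet.tweet_id)

    def search(self, query: str, since: Optional[date] = None,
               until: Optional[date] = None,
               mentioning: Optional[str] = None) -> list[Tweet]:
        terms = TOKEN_RE.findall(query.lower())
        if mentioning:
            terms.append("@" + mentioning.lower())  # mention filter as an extra term
        if not terms:
            return []
        # AND semantics: a tweet must contain every term.
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        hits = [self.tweets[i] for i in ids
                if (since is None or self.tweets[i].created >= since)
                and (until is None or self.tweets[i].created <= until)]
        return sorted(hits, key=lambda t: (t.created, t.tweet_id), reverse=True)


index = SearchIndex()
index.add(Tweet(1, "alice", "Learning #SystemDesign today", date(2024, 1, 1)))
index.add(Tweet(2, "bob", "More #systemdesign with @alice", date(2024, 2, 1)))
index.add(Tweet(3, "carol", "Unrelated post", date(2024, 3, 1)))
print([t.tweet_id for t in index.search("#systemdesign", since=date(2024, 1, 15))])  # [2]
```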