System requirements


Functional:

Project goals, scope, and constraints:

  • Goals: Develop a Twitter-like application that lets users post messages, follow others, interact with posts (like, comment), and receive notifications.
  • Scope: Design frontend and backend, user authentication, scalability, and cross-platform compatibility.
  • Constraints: Consider time, budget, and technology limitations.



Non-Functional:




Capacity estimation

Let's estimate the storage required for tweet text. Assume each tweet averages 280 characters at 1 byte per character (an ASCII-only simplification; UTF-8 text can take up to 4 bytes per character), and that we have 1 million users who each post 5 tweets per day on average.

Bytes per tweet = 280 characters * 1 byte/character = 280 bytes
Average tweets per user per day = 5
Total tweets per day = 1,000,000 users * 5 tweets/user = 5,000,000 tweets
Total bytes per day = 5,000,000 tweets * 280 bytes/tweet = 1,400,000,000 bytes

Converting bytes to GB (binary gigabytes, 1024^3 bytes each): Total storage per day = 1,400,000,000 bytes / 1024^3 bytes/GB ≈ 1.304 GB

Therefore, we would need approximately 1.3 GB of storage per day for the tweet text of 1 million users. We can scale the storage infrastructure based on user growth projections and expected tweet frequency.
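The estimate above can be reproduced with a few lines of Python. This is a minimal sketch using the same assumptions as the text (1 byte per character, 280 characters per tweet, 1 million users, 5 tweets per user per day); the function and constant names are illustrative only.

```python
# Assumptions copied from the capacity estimate above.
BYTES_PER_CHAR = 1            # ASCII-only simplification
CHARS_PER_TWEET = 280
USERS = 1_000_000
TWEETS_PER_USER_PER_DAY = 5


def daily_tweet_storage_bytes(users=USERS,
                              tweets_per_user=TWEETS_PER_USER_PER_DAY,
                              chars_per_tweet=CHARS_PER_TWEET,
                              bytes_per_char=BYTES_PER_CHAR):
    """Return the bytes of tweet text written per day."""
    return users * tweets_per_user * chars_per_tweet * bytes_per_char


def bytes_to_gib(num_bytes):
    """Convert bytes to binary gigabytes (1024^3 bytes each)."""
    return num_bytes / 1024 ** 3


daily = daily_tweet_storage_bytes()
print(f"{daily} bytes/day = {bytes_to_gib(daily):.3f} GB/day")
# prints "1400000000 bytes/day = 1.304 GB/day"
```

Changing `bytes_per_char` to 4 gives a worst-case figure for UTF-8 text under the same user and tweet counts.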



API design

User Service:

  • Responsibilities:
      • User management: Handles user registration, login, and profile management.
      • Authentication and authorization: Manages user authentication using OAuth 2.0 and ensures secure access to resources.
  • Functionality:
      • Exposes endpoints for user registration, login, profile retrieval, and update.
      • Validates user credentials and issues JWT tokens for authentication.
  • Endpoints:
      • /register: Registers a new user.
      • /login: Logs in an existing user.
      • /profile: Retrieves and updates user profile information.

Tweet Service:

  • Responsibilities:
      • Manages tweets: Handles tweet creation and deletion.
      • Supports functionalities like posting tweets and deleting tweets.
  • Functionality:
      • Exposes endpoints for posting tweets and deleting tweets.
      • Ensures data consistency and integrity for tweet-related operations.
  • Endpoints:
      • /post: Creates a new tweet.
      • /delete: Deletes a tweet.
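
As an illustration of the Tweet Service operations, here is a minimal in-memory sketch of the logic that could sit behind /post and /delete. The class and method names, the 280-character limit, and the rule that only the author may delete a tweet are assumptions made for the example; a real service would sit behind HTTP handlers and a database.

```python
import time
from dataclasses import dataclass
from itertools import count

MAX_TWEET_CHARS = 280  # assumed limit, borrowed from the capacity estimate


@dataclass
class Tweet:
    tweet_id: int
    author_id: str
    text: str
    created_at: float


class TweetService:
    """Hypothetical in-memory stand-in for the Tweet Service."""

    def __init__(self):
        self._tweets = {}       # tweet_id -> Tweet
        self._ids = count(1)    # simple increasing ID generator

    def post(self, author_id, text):
        """Logic behind /post: validate the text, then store the tweet."""
        if not text or len(text) > MAX_TWEET_CHARS:
            raise ValueError("tweet must be 1 to 280 characters long")
        tweet = Tweet(next(self._ids), author_id, text, time.time())
        self._tweets[tweet.tweet_id] = tweet
        return tweet

    def delete(self, requester_id, tweet_id):
        """Logic behind /delete: return True if a tweet was removed."""
        tweet = self._tweets.get(tweet_id)
        if tweet is None:
            return False
        if tweet.author_id != requester_id:  # assumed ownership rule
            raise PermissionError("only the author can delete a tweet")
        del self._tweets[tweet_id]
        return True


service = TweetService()
first = service.post("alice", "hello world")
service.delete("alice", first.tweet_id)
```

Rejecting invalid input before anything is stored is one simple way to support the data-integrity goal listed for this service.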

Timeline Service:

  • Responsibilities:
      • Manages timelines: Handles user timeline generation.
  • Functionality:
      • Ensures data consistency and integrity for timeline-related operations.
  • Endpoints:
      • /timeline: Retrieves the timeline of tweets for a user.

Follow Service:

  • Responsibilities:
      • Manages user relationships: Handles user follow/unfollow actions.
      • Enables users to follow/unfollow other users and retrieve followers/following lists.
  • Functionality:
      • Exposes endpoints for following/unfollowing users and retrieving followers/following lists.
      • Ensures consistency in user relationships across the system.
  • Endpoints:
      • /follow: Follows a user.
      • /unfollow: Unfollows a user.
      • /followers: Retrieves followers of a user.
      • /following: Retrieves users being followed by a user.
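
The follow relationship can be pictured as two indexes that are always updated together, which is one way to keep followers and following lists consistent. The sketch below is an in-memory illustration only; the class name `FollowGraph` and its methods are hypothetical, and a production service would keep these sets in the chosen database.

```python
from collections import defaultdict


class FollowGraph:
    """Hypothetical in-memory model of the Follow Service's data."""

    def __init__(self):
        self._following = defaultdict(set)  # user -> users they follow
        self._followers = defaultdict(set)  # user -> users following them

    def follow(self, follower, followee):
        """Logic behind /follow; following twice has no extra effect."""
        if follower == followee:
            raise ValueError("a user cannot follow themselves")
        # Update both indexes together so they never disagree.
        self._following[follower].add(followee)
        self._followers[followee].add(follower)

    def unfollow(self, follower, followee):
        """Logic behind /unfollow; a no-op if the edge does not exist."""
        self._following[follower].discard(followee)
        self._followers[followee].discard(follower)

    def followers(self, user):
        """Logic behind /followers, returned in sorted order."""
        return sorted(self._followers.get(user, ()))

    def following(self, user):
        """Logic behind /following, returned in sorted order."""
        return sorted(self._following.get(user, ()))


graph = FollowGraph()
graph.follow("alice", "bob")
graph.follow("carol", "bob")
print(graph.followers("bob"))   # prints ['alice', 'carol']
```

Keeping a separate followers index makes the "who follows this user" lookup cheap, which matters when a new tweet has to be delivered to every follower.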

Interaction Service:

  • Responsibilities:
      • Manages interactions on tweets: Handles likes and comments on tweets.
      • Allows users to like and comment on tweets and retrieves likes and comments for a tweet.
  • Functionality:
      • Exposes endpoints for liking and commenting on tweets and retrieving likes/comments for a tweet.
      • Ensures consistency in interactions and provides real-time updates.
  • Endpoints:
      • /like: Likes a tweet.
      • /comment: Adds a comment to a tweet.
      • /likes: Retrieves likes for a tweet.
      • /comments: Retrieves comments for a tweet.
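
One way to keep interactions consistent is to make a like idempotent, so that a retried request never counts twice. The sketch below models this with a set of user IDs per tweet and an ordered list of comments; the class `InteractionStore` and its method names are assumptions for illustration, not a prescribed API.

```python
from collections import defaultdict


class InteractionStore:
    """Hypothetical in-memory model of likes and comments per tweet."""

    def __init__(self):
        self._likes = defaultdict(set)      # tweet_id -> set of user IDs
        self._comments = defaultdict(list)  # tweet_id -> [(user_id, text)]

    def like(self, user_id, tweet_id):
        """Logic behind /like; returns True only for a new like."""
        liked_by = self._likes[tweet_id]
        already_liked = user_id in liked_by
        liked_by.add(user_id)
        return not already_liked

    def comment(self, user_id, tweet_id, text):
        """Logic behind /comment; comments keep their arrival order."""
        if not text.strip():
            raise ValueError("comment text must not be empty")
        self._comments[tweet_id].append((user_id, text))

    def likes(self, tweet_id):
        """Logic behind /likes: user IDs in sorted order."""
        return sorted(self._likes.get(tweet_id, ()))

    def comments(self, tweet_id):
        """Logic behind /comments: a copy of the comment list."""
        return list(self._comments.get(tweet_id, ()))


store = InteractionStore()
store.like("alice", 42)
store.like("alice", 42)          # repeated like is ignored
print(len(store.likes(42)))      # prints 1
```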

Notification Service:

  • Responsibilities:
      • Sends notifications to users: Notifies users about relevant activities on their posts.
      • Sends notifications for likes, comments, and new followers.
  • Functionality:
      • Exposes endpoints for sending notifications and retrieving notifications for a user.
      • Utilizes messaging queues for asynchronous notification delivery.
  • Endpoints:
      • /sendNotification: Sends a notification to a user.
      • /getNotifications: Retrieves notifications for a user.
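
The queue-based delivery described for the Notification Service can be sketched with Python's standard `queue` and `threading` modules standing in for a real message broker. Everything here is an assumption for illustration: the class name, the message format, and the `flush` helper, which exists only so the example can wait for the background worker.

```python
import queue
import threading
from collections import defaultdict


class NotificationService:
    """Hypothetical sketch: producers enqueue, a worker delivers."""

    def __init__(self):
        self._queue = queue.Queue()       # stand-in for a message queue
        self._inbox = defaultdict(list)   # user_id -> delivered messages
        self._lock = threading.Lock()
        worker = threading.Thread(target=self._deliver_loop, daemon=True)
        worker.start()

    def send_notification(self, user_id, actor_id, activity):
        """Logic behind /sendNotification: enqueue and return at once."""
        self._queue.put((user_id, actor_id, activity))

    def _deliver_loop(self):
        """Background worker that moves queued items into inboxes."""
        while True:
            user_id, actor_id, activity = self._queue.get()
            with self._lock:
                self._inbox[user_id].append(f"{actor_id} {activity}")
            self._queue.task_done()

    def get_notifications(self, user_id):
        """Logic behind /getNotifications: a copy of the inbox."""
        with self._lock:
            return list(self._inbox.get(user_id, ()))

    def flush(self):
        """Block until every queued notification has been delivered."""
        self._queue.join()


service = NotificationService()
service.send_notification("alice", "bob", "liked your tweet")
service.flush()
print(service.get_notifications("alice"))  # prints ['bob liked your tweet']
```

Because the caller only enqueues, a like or comment request does not have to wait for delivery to finish.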


Database design

  • User Service: Utilize a relational database such as PostgreSQL or MySQL for user management. These databases offer strong ACID compliance and are well-suited for handling structured data like user profiles and authentication credentials.
  • Tweet Service: Opt for a NoSQL solution like MongoDB or Cassandra for storing tweets. NoSQL databases provide flexibility in schema design and can handle large volumes of unstructured data efficiently, making them ideal for storing tweets with varying lengths and formats.
  • Follow Service: Choose a NoSQL database like Redis or Cassandra to manage user relationships. These databases offer high scalability and low-latency access, making them suitable for storing follower/following lists that can grow rapidly.
  • Interaction Service: Similar to the Tweet Service, select a NoSQL database for storing likes and comments associated with tweets. This ensures fast retrieval and high concurrency for interactions on tweets.
  • Notification Service: Depending on the complexity and consistency requirements of notifications, either a NoSQL or SQL solution can be chosen. For simple notifications, a NoSQL database like MongoDB may suffice, while for more complex notification logic or strict consistency requirements, a relational database may be preferred.




High-level design

In our Twitter-like application, we leverage Redis for storing tweets and ensuring real-time updates to user timelines. Here's a detailed overview of how we handle tweet distribution, timeline generation in Redis, and address challenges, particularly for celebrity users with a large follower base:

  • Tweet Distribution and Timeline Generation:
      • When a user adds a tweet, it is added to a message queue for asynchronous processing.
      • A dedicated service reads from this queue and updates the timelines of all the user's followers in real time.
      • The service retrieves the list of followers for the user from Redis and iterates over each follower to update their timeline.
      • By generating timelines in real time, we ensure that users see new tweets from people they follow immediately upon logging in or refreshing their timeline.
  • Storing User Timelines in Redis:
      • User timelines for all active users are stored in the Redis cluster.
      • Each user's timeline is represented as a sorted set in Redis, with tweet IDs as members and timestamps as scores.
      • Storing timelines in Redis allows for fast retrieval and efficient handling of real-time updates.
      • By generating timelines directly in Redis, we simplify the architecture and reduce the need for additional data processing steps.
  • Handling Celebrity Users with Many Followers:
      • Celebrity users with a large follower base present a scalability challenge: every tweet they post must be written to the timeline of each follower, so a single post can trigger a very large number of timeline updates.
      • To address this challenge, we implement optimizations specifically for celebrity users:
          • Sharding: We shard the timelines of celebrity users across multiple Redis nodes to distribute the workload and improve scalability.
          • Caching: We cache frequently accessed portions of celebrity timelines, reducing the load on Redis and improving response times.
          • Rate Limiting: We rate-limit updates to celebrity timelines to prevent overload and keep the system stable.
  • Benefits of Redis for Timeline Generation:
      • Real-Time Updates: Redis enables real-time updates to user timelines, ensuring that users always see the latest tweets from people they follow.
      • High Availability: Redis clustering provides high availability and fault tolerance, ensuring uninterrupted access to user timelines even in the event of node failures.
      • Scalability: Redis's ability to scale horizontally allows us to handle increasing loads and accommodate celebrity users with large follower bases.
  • Scaling Our Redis Clusters:
      • We utilize Redis Cluster for horizontal scaling, sharding data across multiple nodes to distribute the workload.
      • Each Redis master node replicates its data to one or more replica (slave) nodes, providing high availability and fault tolerance. Writes are directed to master nodes, while reads can be served by masters and, when clients opt in to replica reads, by replicas, which spreads the read load.
      • Continuous monitoring lets us scale out by adding more Redis nodes as needed, so the system can handle a growing user base and high traffic while remaining reliable and responsive.
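
The fan-out and timeline steps above can be sketched as follows. To keep the example runnable without a Redis server, a small in-memory class stands in for one sorted set per user (tweet ID as member, timestamp as score); the comments name the Redis commands that would do the same work. The class `TimelineStore`, the function `fan_out`, and the `timeline:<user_id>` key layout are assumptions for illustration.

```python
from collections import defaultdict


class TimelineStore:
    """In-memory stand-in for one Redis sorted set per user timeline."""

    def __init__(self):
        # user_id -> {tweet_id: timestamp}, mirroring member -> score
        self._timelines = defaultdict(dict)

    def add(self, user_id, tweet_id, timestamp):
        # Redis equivalent: ZADD timeline:<user_id> <timestamp> <tweet_id>
        self._timelines[user_id][tweet_id] = timestamp

    def latest(self, user_id, limit=20):
        # Redis equivalent: ZREVRANGE timeline:<user_id> 0 <limit - 1>
        timeline = self._timelines.get(user_id, {})
        newest_first = sorted(timeline, key=timeline.get, reverse=True)
        return newest_first[:limit]


def fan_out(tweet_id, author_id, timestamp, followers_of, store):
    """Write a new tweet into the timeline of every follower.

    `followers_of` maps an author to their follower IDs; in the design
    above this list would be read from Redis by the queue consumer.
    """
    for follower_id in followers_of.get(author_id, ()):
        store.add(follower_id, tweet_id, timestamp)


followers_of = {"bob": ["alice", "carol"]}
store = TimelineStore()
fan_out(101, "bob", 1000.0, followers_of, store)
fan_out(102, "bob", 1001.0, followers_of, store)
print(store.latest("alice"))   # prints [102, 101]
```

The loop in `fan_out` performs one write per follower, which is why an author with a very large follower base makes a single tweet expensive to distribute.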

By leveraging Redis for storing tweets, generating timelines, and implementing optimizations for scalability, our Twitter-like application delivers a reliable and responsive experience to all users, including those who follow or are celebrity users with large follower bases.

  • Tweets are asynchronously persisted from Redis to the NoSQL database.



Request flows



Detailed component design




Trade-offs/Tech choices



Failure scenarios/bottlenecks

retry flow



Future improvements

Logging system and monitoring system