Requirements


Functional Requirements:


  • Allow users to tweet messages of up to 140 characters. (POST API; a whole tweet fits in a single message)
  • Enable users to follow other users. (subscribers)
  • Allow users to like tweets from other users. ("subscribers" of subscribers)
  • Display tweets from followed users in the home feed. (subscribers)
  • Show the top K popular tweets in the home feed, ranked by likes and followers. (AI/statistical model)
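
The top-K requirement can be illustrated with a small heap-based selection. This is only a sketch under stated assumptions: the `Tweet` fields, the `popularity` function, and its linear weights are hypothetical placeholders, not the AI/statistical model the notes have in mind.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    author_id: int
    text: str
    likes: int


def popularity(tweet, follower_counts, like_weight=1.0, follower_weight=0.1):
    """Hypothetical linear score: likes plus a discounted author follower count."""
    followers = follower_counts.get(tweet.author_id, 0)
    return like_weight * tweet.likes + follower_weight * followers


def top_k_popular(tweets, follower_counts, k):
    """Return the k highest-scoring tweets, best first."""
    return heapq.nlargest(k, tweets, key=lambda t: popularity(t, follower_counts))
```

`heapq.nlargest` avoids sorting the whole candidate set, which matters when K is small relative to the number of candidate tweets.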



Non-Functional Requirements:


  • Latency:
    • Client:
      • load time - reasonably fast for a good user experience.
      • tweet publish/sync time (to other users) - negotiable...
      • feed generation time - reasonably fast
    • Server:
      • API response time - reasonable
      • search latency - negotiable
      • media upload/download - (we should limit size)
  • Scalability:
    • support millions of users - (1) scalable from day one, (2) multi-region support
    • spikes - should be elastic (scale up and down seamlessly)
  • Reliability: 99% model, fault tolerance, and recovery objectives:
    • RPO (Recovery Point Objective)
    • RTO (Recovery Time Objective)
  • Consistency:
    • user account data should support CRUD ops (relational DB)
    • tweets can use a NoSQL DB
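
If the 99% figure in the reliability bullet is read as an availability target (an assumption; the notes do not say so explicitly), the downtime budget it implies is simple arithmetic. The function name and the default one-year period below are illustrative choices.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_percent, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted in the period at the given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period_hours * (1 - availability_percent / 100)
```

At 99% this comes to roughly 87.6 hours per year, about 3.65 days, which is worth keeping in mind when setting RTO.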


API Design

Define the APIs expected from the system. This is your chance to analyze and define the read and write paths so that you can come up with the high-level design...

  • Client API - CRUD operations -> sync into "main" DB (API/messaging)
  • Tweet API:
    • POST Tweet -> Persist (save and index for search) -> trigger fanout to subscribers.
    • Like Tweet -> Persist -> trigger fanout (to all subscribers of the original tweet). Open question for persistence: save the like in the same record as the tweet (the update is more expensive but makes GET faster) or as an individual record (the insert is faster, but a join in NoSQL is a big no-no)?
    • GET Tweets -
      • Home feed (based on: a. liked tweets, b. top K popular tweets - AI/statistical model)
      • Get user feeds (following) [index tweets by user]
      • Get popular tweets (how are tweets tagged/categorized? Straightforward approach: tweet => count of likes, tweet => count of followers of the author, tweet => count of likes on the author's other tweets, popularity of the users who liked the tweet or who follow the author)
  • User API: GET Users - allow querying for users. (Should it be part of the Tweet API? Yes: simpler development and query logic. No: different scaling needs, maybe different tech. To be decided later.)
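
The write and read paths listed under the Tweet API can be sketched in memory as follows. This is a minimal illustration, not a definitive implementation: `TweetService`, its method names, and the dict-based stores are all hypothetical stand-ins for the real API and databases. It takes the "like in the same record" option and uses fanout-on-write, pushing each new tweet into followers' home feeds at post time.

```python
from collections import defaultdict
from itertools import count

MAX_TWEET_LENGTH = 140  # from the functional requirements


class TweetService:
    def __init__(self):
        self._ids = count(1)
        self.tweets = {}                     # tweet_id -> tweet record
        self.followers = defaultdict(set)    # author_id -> follower ids
        self.home_feeds = defaultdict(list)  # user_id -> tweet ids, newest first

    def follow(self, follower_id, author_id):
        self.followers[author_id].add(follower_id)

    def post_tweet(self, author_id, text):
        if not text or len(text) > MAX_TWEET_LENGTH:
            raise ValueError("tweet must be 1 to 140 characters")
        tweet_id = next(self._ids)
        # Persist: likes are kept inside the tweet record itself.
        self.tweets[tweet_id] = {"id": tweet_id, "author_id": author_id,
                                 "text": text, "liked_by": set()}
        # Fanout on write: push the new tweet into each follower's feed.
        for follower_id in self.followers[author_id]:
            self.home_feeds[follower_id].insert(0, tweet_id)
        return tweet_id

    def like_tweet(self, user_id, tweet_id):
        # Using a set makes repeated likes by the same user idempotent.
        self.tweets[tweet_id]["liked_by"].add(user_id)
        return len(self.tweets[tweet_id]["liked_by"])

    def get_home_feed(self, user_id, limit=20):
        # Reads are cheap because the feed was precomputed at write time.
        return [self.tweets[t] for t in self.home_feeds[user_id][:limit]]
```

The trade-off is the one the notes hint at: writes cost more (one feed update per follower) so that reads stay fast. Authors with very large follower counts would likely need a different strategy, which this sketch does not cover.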



High-Level Design

Describe the overall system architecture. Identify the main components needed to solve the problem end-to-end. Use the diagramming tool to create a block diagram.

  1. Client UI (caching needs? update/load in background?)
  2. Databases:
    1. RDB for user data, billing etc. - ACID compliance (*Options: one DB for everything vs different DBs for different parts. *Tradeoff: fast and simple, both in development and in deployment, vs different properties and different scaling needs for different parts of the system. *Decision: different DBs; it will pay off very quickly)
    2. NoSQL for Tweets - BASE (Basically Available, Soft state, Eventual consistency)
    3. Graph-DB for followers/likes etc.
  3. Load balancer: manages routing of API calls between:
    1. ClientManagement Server - less frequent API calls, so it can scale at a different pace. (*Options: one server for all business logic vs dividing into two or more servers. *Tradeoff: again, fast and simple vs allowing different configurations: DBs, connection strings, scale, security. *Decision: divide; once deployment starts, the benefits will stand out)
    2. TweetManagement Server - more intensive API calls, so it will need to scale more often.
  4. PushNotification Service:
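
The load balancer's split between the two servers (item 3 above) can be sketched as path-prefix routing with round-robin selection inside each pool. The `/users` and `/tweets` prefixes and the backend names are assumptions made for illustration; the notes do not define URL paths or instance counts.

```python
from itertools import cycle

# Hypothetical pools: the tweet pool has more instances because it is
# expected to receive more traffic and scale more often.
POOLS = {
    "/users": cycle(["client-mgmt-1"]),
    "/tweets": cycle(["tweet-mgmt-1", "tweet-mgmt-2", "tweet-mgmt-3"]),
}


def pick_backend(path):
    """Choose the next backend (round-robin) from the pool matching the path."""
    for prefix, backends in POOLS.items():
        if path == prefix or path.startswith(prefix + "/"):
            return next(backends)
    raise LookupError(f"no backend pool for {path}")
```

Keeping the pools separate is what lets each server type scale at its own pace: adding tweet instances only changes the `/tweets` entry.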






Detailed Component Design

Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.