Requirements
Functional Requirements:
- Allow users to tweet messages up to 140 characters.
- Enable users to follow other users.
- Allow users to like tweets from other users.
- Display tweets from followed users in the home feed.
- Show top K popular tweets in the home feed based on likes and followers.
Non-Functional Requirements:
- Can scale to billions of users to simultaneously post, like, and comment
- New posts can be seen by followers within 1 minute after posting
- Store all historical data of billions of users
- UI response is instantaneous (for example, like immediately increases like count on UI, although actual backend data may be delayed)
API Design
Backend:
- post(message: string, userId: number)
- like(userId: number, postUserId: number, postId: number)
- comment(userId: number, postUserId: number, postId: number, content: string)
- deletePost(userId: number, postId: number)
- deleteComment(userId: number, postId: number, commentId: number)
- dislike(userId: number, postId: number)
- getPosts(userId: number, page: number)
- getTopK(region: string)
- follow(followee: number, follower: number)
- unfollow(followee: number, follower: number)
High-Level Design
Main components:
- Frontend UI, boundary = make REST api to fetch data or write data
- Backend REST API server = handle REST api calls, cache request data, orchestrate database queries, compose home feed based on query result, etc.
- Distributed Relational Database: persistent data storage, aceept database queries, communication cross regions for data consistency
Detailed Component Design
Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.
## sharded database.
- Use raft algorithm to keep comment and like data consistent across multiple copies around the globe. Also add sharding based on userId to scale up serving capability. In the CAP theorem, we are more towards giving up consistency, since it's less valued for users.
- Posts are less prone to data race. Since a single user can't make multiple posts at the same time, not to mention posting from different regions at once. So a sharded database + periodic regional syncing should be enough
- database schema design:
- two main table to store posts and users
- a comment table of posts, with commentId as primary key and userId as secondary key
- similarly a like table, again mapping commentId and userId
All of the tables above should be sharded for scalability
## Read path
Frontend asynchrounously calls REST API while displaying skeleton -> Backend checks cache or queries database -> database retrieves and returns corresponding data -> Backend organizes query results and send back to frontend -> frontend update view with new data
## Write path (make a like)
Frontend instantly updates UI, play a like animation, then make async REST API call -> backend relays to database with a query "IN LIKE_TABLE INSERT commentId by userid" -> database negotiate with remote peers throgh raft protocol, retry until successful, then commit