System requirements
Functional:
- users can post tweets
- users can follow/unfollow other users
- users can see a timeline/feed of tweets from the users they follow
- users can like tweets
Non-Functional:
- should be highly available, does not need to be strongly consistent
- should handle 500 million DAU, 100 million concurrent users
- should load the timeline (main function/page) with low latency (~200 ms)
- is expected to be read-heavy rather than write-heavy
Capacity estimation
500 million DAU, 100 million concurrent users
about 100 million read requests per second (assuming roughly one read per concurrent user per second)
we can expect about a 1000:1 read-write ratio
so about 100,000 write requests per second (100 million / 1000)
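the arithmetic above can be checked with a few lines of Python. this is only a back-of-envelope sketch; the one-read-per-user-per-second rate is an assumption used to get from concurrent users to requests per second, and the constant names are made up for illustration.

```python
# back-of-envelope numbers taken from the estimates above
CONCURRENT_USERS = 100_000_000
# assumption: each concurrent user issues about one read request per second
READS_PER_USER_PER_SEC = 1
# reads per write, from the expected 1000:1 read-write ratio
READ_WRITE_RATIO = 1000

reads_per_sec = CONCURRENT_USERS * READS_PER_USER_PER_SEC
writes_per_sec = reads_per_sec // READ_WRITE_RATIO

print(f"reads/sec:  {reads_per_sec:,}")   # reads/sec:  100,000,000
print(f"writes/sec: {writes_per_sec:,}")  # writes/sec: 100,000
```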
API design
POST /v1/signup
{
username: abcd
password: hash(abcd)
}
POST /v1/login
{
username: abcd
password: hash(abcd)
}
GET /v1/posts
POST /v1/posts
{
text: "abcd"
}
PUT /v1/users/follow/{userid}
PUT /v1/users/unfollow/{userid}
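follow and unfollow use PUT, which is expected to be idempotent: repeating the same request should leave the system in the same state. a minimal sketch of handlers with that property is below. the in-memory `following` store, the function names, the status codes, and the rule against following yourself are all assumptions for illustration, not part of the API above.

```python
from collections import defaultdict

# hypothetical in-memory store: follower id -> set of followee ids
following: dict[str, set[str]] = defaultdict(set)


def put_follow(follower: str, followee: str) -> int:
    """Sketch of PUT /v1/users/follow/{userid}; returns an HTTP-style status."""
    if follower == followee:
        return 400  # assumption: a user cannot follow themselves
    # adding to a set twice is a no-op, so repeated requests are idempotent
    following[follower].add(followee)
    return 200


def put_unfollow(follower: str, followee: str) -> int:
    """Sketch of PUT /v1/users/unfollow/{userid}; returns an HTTP-style status."""
    # discard() does nothing if the follow relation does not exist
    following[follower].discard(followee)
    return 200


put_follow("alice", "bob")
put_follow("alice", "bob")  # repeated request, same resulting state
print(following["alice"])   # {'bob'}
```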
Database design
User:
userid (pk)
password
Follow:
id
followee (userid)
follower (userid)
Post:
postid (pk)
userid (fk)
content
Like:
id
userid
postid
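as a rough sketch, the tables above could be created in SQLite like this, together with a timeline query that joins Follow and Post. several things here are assumptions: the column types, the uniqueness constraints, the table name `post_like` (renamed because LIKE is an SQL keyword), and ordering by `postid` as a stand-in for "newest first", since the schema has no timestamp column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    userid   INTEGER PRIMARY KEY,
    password TEXT NOT NULL
);
CREATE TABLE follow (
    id       INTEGER PRIMARY KEY,
    followee INTEGER NOT NULL REFERENCES user(userid),
    follower INTEGER NOT NULL REFERENCES user(userid),
    UNIQUE (follower, followee)
);
CREATE TABLE post (
    postid  INTEGER PRIMARY KEY,
    userid  INTEGER NOT NULL REFERENCES user(userid),
    content TEXT NOT NULL
);
CREATE TABLE post_like (
    id     INTEGER PRIMARY KEY,
    userid INTEGER NOT NULL REFERENCES user(userid),
    postid INTEGER NOT NULL REFERENCES post(postid),
    UNIQUE (userid, postid)
);
""")


def timeline(conn: sqlite3.Connection, userid: int) -> list[tuple]:
    """Return posts written by users that `userid` follows, newest first.

    Assumption: postid grows over time, so a higher postid means a newer post.
    """
    return conn.execute(
        """
        SELECT p.postid, p.userid, p.content
        FROM post AS p
        JOIN follow AS f ON f.followee = p.userid
        WHERE f.follower = ?
        ORDER BY p.postid DESC
        """,
        (userid,),
    ).fetchall()


# sample data: user 1 follows user 2 but not user 3
conn.executemany("INSERT INTO user VALUES (?, ?)", [(1, "h1"), (2, "h2"), (3, "h3")])
conn.execute("INSERT INTO follow (followee, follower) VALUES (2, 1)")
conn.executemany(
    "INSERT INTO post VALUES (?, ?, ?)",
    [(1, 2, "hello"), (2, 3, "not followed"), (3, 2, "again")],
)
print(timeline(conn, 1))  # [(3, 2, 'again'), (1, 2, 'hello')]
```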
High-level design
there will be multiple servers handling all the requests, with the load balancer balancing the requests across the servers. we also have a cache so that frequently read posts will be cached and not have to make a trip to the database.
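the design does not say how the load balancer picks a server. one simple option is round robin, sketched below; the class name and server names are hypothetical, and a real load balancer would also need health checks and similar features.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotating order (round robin)."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)

    def pick(self) -> str:
        """Return the server that should handle the next request."""
        return next(self._servers)


lb = RoundRobinBalancer(["server-1", "server-2", "server-3"])
print([lb.pick() for _ in range(4)])
# ['server-1', 'server-2', 'server-3', 'server-1']
```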
Request flows
a request first reaches the load balancer, which forwards it to one of the servers. for a read, the server checks the cache first, so frequently read posts are served from the cache without a trip to the database. on a cache miss, the server reads the post from the database.
Detailed component design
for the cache, we use a simple key-value store keyed by post id. reads follow a cache-aside strategy: when a user requests a post that is not in the cache, the server loads it from the database and writes it to the cache. when the cache is full, entries are evicted with a least frequently used (LFU) policy. we could also make the cache write-through: since users want newer content first, a recently written post will be queried more often, so it makes sense to put it in the cache at write time.
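a minimal sketch of such a cache is below: cache-aside reads, write-through writes, and LFU eviction, keyed by post id. the class and method names are hypothetical, the database is stood in for by a plain dict, and the eviction scan is O(n) for simplicity rather than how a production cache would do it.

```python
from collections import Counter


class PostCache:
    """Key-value cache keyed by post id, with LFU eviction."""

    def __init__(self, capacity: int, db: dict) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.db = db           # hypothetical backing store: postid -> content
        self.data = {}         # cached posts: postid -> content
        self.hits = Counter()  # access count per cached post id

    def _put(self, postid, content) -> None:
        # make room first if this is a new key and the cache is full
        if postid not in self.data and len(self.data) >= self.capacity:
            victim = min(self.data, key=lambda k: self.hits[k])  # least used
            del self.data[victim]
            del self.hits[victim]
        self.data[postid] = content
        self.hits[postid] += 1

    def get(self, postid):
        """Cache-aside read: serve from cache, fall back to the db on a miss."""
        if postid in self.data:
            self.hits[postid] += 1
            return self.data[postid]
        content = self.db[postid]  # cache miss: trip to the database
        self._put(postid, content)
        return content

    def write(self, postid, content) -> None:
        """Write-through: update the database and the cache together."""
        self.db[postid] = content
        self._put(postid, content)


db = {1: "a", 2: "b", 3: "c"}
cache = PostCache(capacity=2, db=db)
cache.get(1)
cache.get(1)
cache.get(2)
cache.get(3)               # cache is full, post 2 is least used and is evicted
print(sorted(cache.data))  # [1, 3]
```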
Trade offs/Tech choices
I chose to use a single service (replicated behind the load balancer) instead of many microservices to handle the requests, since the number of distinct operations is small and maintaining one codebase reduces complexity. if we later want to support more operations, we can split them into microservices so that teams can work on and deploy each service independently.
I also chose a write-through strategy combined with cache-aside reads for the cache, since more recently written posts will be requested more often.
Failure scenarios/bottlenecks
bottlenecks may include having to update a lot of users' timelines when a popular user (one with many followers) makes a post
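to see why this is a bottleneck, here is a small sketch that assumes each user has a precomputed timeline that is updated when someone they follow posts (fan-out on write). the data structures, the `publish` function, and the timeline cap are hypothetical. the point is only that the number of timeline writes per post equals the author's follower count.

```python
from collections import defaultdict, deque

TIMELINE_LIMIT = 800  # hypothetical cap on stored timeline entries per user

# author id -> set of follower ids
followers = defaultdict(set)
# user id -> most recent post ids, newest first
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))


def publish(author, postid) -> int:
    """Push postid onto every follower's timeline; return the write count."""
    writes = 0
    for follower in followers[author]:
        timelines[follower].appendleft(postid)
        writes += 1
    return writes


followers["celebrity"] = set(range(100_000))
followers["bob"] = {1, 2, 3}

print(publish("celebrity", 101))  # 100000 timeline writes for a single post
print(publish("bob", 102))        # 3 timeline writes for a single post
print(list(timelines[1]))         # [102, 101]
```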
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?