Design Twitter - System Design

System requirements

Functional:

users can post tweets
users can follow/unfollow other users
users can see a timeline/feed of tweets from the users they follow
users can like tweets

Non-Functional:

should be highly available, does not need to be strongly consistent
should handle 500 million DAU, 100 million concurrent users
should load the timeline (the main page) with low latency, about 200 ms
workload is read-heavy: far more reads than writes

Capacity estimation

500 million DAU, 100 million concurrent users

assuming each concurrent user issues about one read request per second, that is about 100 million read requests per second

we can expect a read-to-write ratio of about 1000:1

so about 100,000 write requests per second
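
As a quick check of the arithmetic above, here is a minimal Python sketch. The rate of one read per concurrent user per second is an assumption used to reach the 100 million figure, and the constant names are illustrative.

```python
# Back-of-the-envelope numbers from the capacity estimate.
CONCURRENT_USERS = 100_000_000

# Assumption (not stated in the notes): each concurrent user issues
# roughly one read request per second at peak.
READS_PER_USER_PER_SEC = 1

# 1000 reads for every write, as estimated above.
READ_WRITE_RATIO = 1000

reads_per_sec = CONCURRENT_USERS * READS_PER_USER_PER_SEC
writes_per_sec = reads_per_sec // READ_WRITE_RATIO

print(f"reads/sec:  {reads_per_sec:,}")
print(f"writes/sec: {writes_per_sec:,}")
```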

API design

POST /v1/signup

{

username: abcd

password: hash(abcd)

}

POST /v1/login

{

username: abcd

password: hash(abcd)

}

GET /v1/posts

POST /v1/posts

{

text: "abcd"

}

PUT /v1/users/follow/{userid}

PUT /v1/users/unfollow/{userid}
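
The follow and unfollow endpoints use PUT, which is expected to be idempotent: repeating the same request should leave the system in the same state. A minimal in-memory sketch of that behavior follows. The class name, method names, status codes, and the self-follow rule are all assumptions for illustration, not part of the design above.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class FollowService:
    # follower id -> set of followee ids; a stand-in for the Follow table
    following: Dict[str, Set[str]] = field(default_factory=dict)

    def put_follow(self, follower: str, followee: str) -> int:
        """Handle PUT /v1/users/follow/{userid}; returns an HTTP-style status."""
        if follower == followee:
            return 400  # assumption: users cannot follow themselves
        # Adding to a set is a no-op if the pair already exists,
        # so repeated calls leave the same state.
        self.following.setdefault(follower, set()).add(followee)
        return 204

    def put_unfollow(self, follower: str, followee: str) -> int:
        """Handle PUT /v1/users/unfollow/{userid}; also safe to repeat."""
        self.following.get(follower, set()).discard(followee)
        return 204


svc = FollowService()
svc.put_follow("alice", "bob")
svc.put_follow("alice", "bob")   # repeated request, same state
print(svc.following)             # {'alice': {'bob'}}
```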

Database design

User:

userid (pk)

password

Follow:

followee (userid)

follower (userid)

Post:

postid (pk)

userid (fk)

content

Like:

userid (fk)

postid (fk)
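
To make the User, Follow, and Post tables concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column types, the composite key on Follow, the use of increasing post ids as a stand-in for recency, and the timeline helper are assumptions for illustration. A production system at this scale would likely use a different, sharded store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    userid   TEXT PRIMARY KEY,
    password TEXT NOT NULL             -- a password hash, never plaintext
);
CREATE TABLE follow (
    followee TEXT NOT NULL REFERENCES user(userid),
    follower TEXT NOT NULL REFERENCES user(userid),
    PRIMARY KEY (follower, followee)   -- assumption: one row per pair
);
CREATE TABLE post (
    postid  INTEGER PRIMARY KEY,       -- assumption: larger id = newer post
    userid  TEXT NOT NULL REFERENCES user(userid),
    content TEXT NOT NULL
);
""")

conn.executemany("INSERT INTO user VALUES (?, ?)",
                 [("alice", "h1"), ("bob", "h2"), ("carol", "h3")])
conn.execute("INSERT INTO follow (followee, follower) VALUES ('bob', 'alice')")
conn.executemany("INSERT INTO post (userid, content) VALUES (?, ?)",
                 [("bob", "first"), ("carol", "unseen"), ("bob", "second")])


def timeline(follower, limit=20):
    """Return the newest posts from everyone `follower` follows."""
    rows = conn.execute("""
        SELECT p.postid, p.userid, p.content
        FROM post AS p
        JOIN follow AS f ON f.followee = p.userid
        WHERE f.follower = ?
        ORDER BY p.postid DESC
        LIMIT ?
    """, (follower, limit))
    return rows.fetchall()


print(timeline("alice"))  # [(3, 'bob', 'second'), (1, 'bob', 'first')]
```

Building the timeline with a join at read time is the simplest reading of the schema. It does not by itself meet the latency target, which is why the cache matters.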

High-level design

Multiple servers handle the requests, with a load balancer distributing requests across them. A cache sits in front of the database so that frequently read posts are served from the cache without a trip to the database.
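
The notes do not say how the load balancer picks a server. As one possible policy, here is a minimal round-robin sketch. The class name and server names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        # itertools.cycle repeats the server list indefinitely, in order.
        self._cycle = itertools.cycle(servers)

    def route(self, request):
        server = next(self._cycle)
        return server, request


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
for i in range(4):
    print(lb.route(f"GET /v1/posts #{i}")[0])
# app-1, app-2, app-3, then app-1 again
```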

Request flows

Detailed component design

For the cache, we use a simple key-value store keyed by post id. Reads follow a cache-aside strategy: when a user requests a post that is not in the cache, the server reads it from the database and writes it to the cache. When the cache is full, entries are evicted with a least-frequently-used (LFU) policy. We could also write through to the cache when a post is created: users want newer content first, so a recently written post is likely to be read often.
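
The cache behavior described above can be sketched as follows. This is a simplified illustration, not a production cache: a plain dict stands in for the database, the class and method names are made up, and the eviction scan is linear in the cache size.

```python
class PostCache:
    """Cache keyed by post id: cache-aside reads, write-through writes,
    least-frequently-used eviction when full."""

    def __init__(self, db, capacity):
        self.db = db              # dict standing in for the database
        self.capacity = capacity
        self.data = {}            # postid -> content
        self.hits = {}            # postid -> number of reads served

    def _store(self, postid, content):
        if postid not in self.data:
            if len(self.data) >= self.capacity:
                # Evict the entry with the fewest reads (O(n) scan).
                coldest = min(self.hits, key=self.hits.get)
                del self.data[coldest]
                del self.hits[coldest]
            self.hits[postid] = 0
        self.data[postid] = content

    def get(self, postid):
        """Cache-aside read: on a miss, load from the db and fill the cache."""
        if postid not in self.data:
            self._store(postid, self.db[postid])  # KeyError if no such post
        self.hits[postid] += 1
        return self.data[postid]

    def write(self, postid, content):
        """Write-through: update the db and the cache together."""
        self.db[postid] = content
        self._store(postid, content)


db = {1: "a", 2: "b"}
cache = PostCache(db, capacity=2)
cache.get(1); cache.get(1); cache.get(2)
cache.write(3, "c")          # cache is full: post 2 (fewest reads) is evicted
print(sorted(cache.data))    # [1, 3]
```

One caveat of pairing write-through with plain LFU: a freshly written post starts with zero reads, so it is the first eviction candidate until it has been read.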

Trade offs/Tech choices

I chose a single monolithic service, replicated across servers, instead of many microservices, since the number of distinct operations is small and maintaining one codebase keeps complexity down. If we later need to support more operations, we can split them into microservices so that teams can work on and deploy each service independently.

I also chose to combine cache-aside reads with write-through on post creation, since recently written posts are requested more often.

Failure scenarios/bottlenecks

A likely bottleneck is fan-out: when a popular user makes a post, the timelines of all of their followers have to be updated.
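
To show why this is a bottleneck, here is a small sketch in which each post is pushed to every follower's timeline at write time. The data structures, the 100-entry timeline cap, and the follower counts are hypothetical. The point is only that the number of timeline writes per post equals the author's follower count.

```python
from collections import defaultdict, deque

followers = defaultdict(set)                        # author -> follower ids
timelines = defaultdict(lambda: deque(maxlen=100))  # user -> newest post ids


def fan_out_post(author, postid):
    """Push a new post id onto every follower's timeline.

    Returns the number of timeline writes performed.
    """
    writes = 0
    for user in followers[author]:
        timelines[user].appendleft(postid)  # newest first
        writes += 1
    return writes


followers["celebrity"] = {f"user{i}" for i in range(50_000)}
followers["regular"] = {"user1", "user2"}

print(fan_out_post("celebrity", 1))  # 50000 timeline writes for one post
print(fan_out_post("regular", 2))    # 2 timeline writes
```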

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?