System requirements


Functional:

  • users can post tweets
  • users can follow/unfollow other users
  • users can see tweets on a timeline/feed of tweets from users they follow
  • users can like tweets



Non-Functional:

  • should be highly available, does not need to be strongly consistent
  • should handle 500 million DAU, 100 million concurrent users
  • should load the timeline (the main function/page) with low latency, ~200 ms
  • workload is read-heavy: far more reads than writes




Capacity estimation

500 million DAU, 100 million concurrent users

about 100 million read requests per second (assuming each concurrent user issues roughly one read per second)

we can expect about a 1000:1 read-to-write ratio

so about 100,000 write requests per second (100 million / 1000)
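
The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the one-read-per-user-per-second rate is an assumption used to get from concurrent users to read QPS, and the constant names are made up for illustration.

```python
# Back-of-the-envelope estimate using the figures stated above.
CONCURRENT_USERS = 100_000_000     # from the non-functional requirements
READS_PER_USER_PER_SEC = 1         # assumption: one read per concurrent user per second
READ_WRITE_RATIO = 1000            # reads per write, from the estimate above

read_qps = CONCURRENT_USERS * READS_PER_USER_PER_SEC
write_qps = read_qps // READ_WRITE_RATIO

print(f"reads/s:  {read_qps:,}")
print(f"writes/s: {write_qps:,}")
```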




API design

POST /v1/signup

{

username: abcd

password: hash(abcd)

}


POST /v1/login

{

username: abcd

password: hash(abcd)

}


GET /v1/posts   (returns the timeline for the logged-in user)


POST /v1/posts

{

text: "abcd"

}


PUT /v1/users/follow/{userid}

PUT /v1/users/unfollow/{userid}
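
PUT suits follow/unfollow because both operations are idempotent: repeating the same request leaves the same state. A minimal in-memory sketch of the two handlers follows. The function names, the status codes, and the rule that a user cannot follow themselves are all assumptions for illustration, not part of the design above.

```python
from collections import defaultdict

# follower id -> set of followee ids (in-memory stand-in for the Follow table)
following: dict[str, set[str]] = defaultdict(set)

def put_follow(current_user: str, userid: str) -> int:
    """Sketch of PUT /v1/users/follow/{userid}; returns an HTTP-style status code."""
    if current_user == userid:
        return 400  # assumption: following yourself is rejected
    following[current_user].add(userid)  # adding twice has no further effect
    return 204

def put_unfollow(current_user: str, userid: str) -> int:
    """Sketch of PUT /v1/users/unfollow/{userid}."""
    following[current_user].discard(userid)  # no error if not currently following
    return 204
```

Calling put_follow("alice", "bob") twice leaves exactly one follow relationship, which is the behaviour a retried request should have.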





Database design

User:

userid (pk)

password


Follow:

id

followee (userid)

follower (userid)


Post:

postid (pk)

userid (fk)

content


Like:

id

userid

postid
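
The four tables above could be sketched in SQLite as below, together with the timeline query they need to support. This is an illustration under assumptions: the plural table names, column types, and uniqueness constraints are not specified above, and postid order stands in for recency because the schema has no timestamp column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users   (userid INTEGER PRIMARY KEY, password TEXT NOT NULL);
CREATE TABLE follows (id INTEGER PRIMARY KEY,
                      followee INTEGER NOT NULL REFERENCES users(userid),
                      follower INTEGER NOT NULL REFERENCES users(userid),
                      UNIQUE (followee, follower));
CREATE TABLE posts   (postid INTEGER PRIMARY KEY,
                      userid INTEGER NOT NULL REFERENCES users(userid),
                      content TEXT NOT NULL);
CREATE TABLE likes   (id INTEGER PRIMARY KEY,
                      userid INTEGER NOT NULL REFERENCES users(userid),
                      postid INTEGER NOT NULL REFERENCES posts(postid),
                      UNIQUE (userid, postid));
""")

# Sample data: user 1 follows user 2 but not user 3.
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "hash1"), (2, "hash2"), (3, "hash3")])
conn.execute("INSERT INTO follows (followee, follower) VALUES (2, 1)")
conn.executemany("INSERT INTO posts (userid, content) VALUES (?, ?)",
                 [(2, "hello"), (3, "not followed"), (2, "again")])

def timeline(follower_id: int) -> list:
    """Posts from everyone follower_id follows, newest (highest postid) first."""
    rows = conn.execute(
        """SELECT p.postid, p.userid, p.content
           FROM posts p JOIN follows f ON f.followee = p.userid
           WHERE f.follower = ?
           ORDER BY p.postid DESC""",
        (follower_id,),
    )
    return rows.fetchall()
```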








High-level design

There will be multiple servers handling the requests, with a load balancer distributing the requests across them. We also have a cache, so that frequently read posts are served from the cache and do not require a trip to the database.
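
As a rough sketch of the load-balancing step, the snippet below rotates requests across a fixed list of servers. Round-robin is an assumption here, since the design does not name a balancing policy, and the class and server names are hypothetical.

```python
import itertools

class RoundRobinBalancer:
    """Hands out servers in a fixed rotation (one of several possible policies)."""

    def __init__(self, servers: list[str]):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        # Each call returns the next server in the rotation.
        return next(self._cycle)

balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
picked = [balancer.pick() for _ in range(6)]
```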






Request flows

A request first reaches the load balancer, which forwards it to one of the servers. For a read, the server looks up the post in the cache; on a hit it returns the cached post, and on a miss it reads the post from the database and returns it. For a write, the server stores the post in the database.





Detailed component design

The cache will be a simple key-value store, with the post id as the key. We will use a cache-aside strategy: when a user requests a post that is not in the cache, the server reads it from the database and writes it to the cache. Entries are evicted with a least-frequently-used (LFU) policy. We could also add write-through here: since users want newer content first, a recently written post is likely to be queried more often, so writing it to the cache at post time avoids the first miss.
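
A minimal sketch of the cache described above, combining cache-aside reads, LFU eviction, and an optional write-through path. The class name, the dict standing in for the database, and the tiny default capacity are assumptions for illustration. A production cache would more likely be an external store.

```python
from collections import Counter

class PostCache:
    """Cache-aside reads with LFU eviction, plus a write-through path."""

    def __init__(self, db: dict, capacity: int = 2):
        self.db = db              # stand-in for the posts table
        self.capacity = capacity
        self.store = {}           # post id -> post
        self.hits = Counter()     # post id -> number of reads served

    def _put(self, post_id, post):
        # Evict the least frequently read entry when the cache is full.
        if post_id not in self.store and len(self.store) >= self.capacity:
            victim = min(self.store, key=lambda k: self.hits[k])
            del self.store[victim]
            del self.hits[victim]
        self.store[post_id] = post

    def get(self, post_id):
        # Cache-aside: on a miss, read the database and populate the cache.
        if post_id not in self.store:
            post = self.db.get(post_id)
            if post is None:
                return None
            self._put(post_id, post)
        self.hits[post_id] += 1
        return self.store[post_id]

    def write(self, post_id, post):
        # Write-through: update the database and the cache together.
        self.db[post_id] = post
        self._put(post_id, post)
```

One caveat of plain LFU in this sketch: a freshly written post starts with zero reads, so it is the first candidate for eviction until it has been read at least once.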





Trade offs/Tech choices

I chose to use a single monolithic service instead of many microservices to handle the requests, since the number of distinct request types is small and maintaining one monorepo reduces complexity. If we later want to support more operations, we can split them into microservices so that teams can work on and deploy each service independently.


I also chose to combine a write-through strategy with cache-aside for the cache, since more recently written posts will be requested more often.





Failure scenarios/bottlenecks

Bottlenecks may include having to update a large number of users' timelines when a popular user makes a post.
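
To make the bottleneck concrete, the sketch below assumes timelines are precomputed per user and updated when a post is written (fan-out on write). The follower counts and all names are hypothetical. The point is only that the work per post grows with the author's follower count.

```python
from collections import defaultdict

# author -> list of follower ids (hypothetical sample data)
followers = {
    "celebrity": [f"user{i}" for i in range(100_000)],
    "regular": ["user0", "user1"],
}
timelines: dict[str, list[int]] = defaultdict(list)

def fan_out_on_write(author: str, post_id: int) -> int:
    """Append post_id to every follower's timeline; return the number of writes."""
    targets = followers.get(author, [])
    for follower in targets:
        timelines[follower].append(post_id)
    return len(targets)
```

A post from "regular" costs 2 timeline writes, while a single post from "celebrity" costs 100,000.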





Future improvements
