System requirements


Functional:

  • post tweets
  • read tweets
  • post to one person, multiple people, or groups


Non-Functional:

  • scalability
  • availability
  • acceptable latency: 200 ms
  • consistency can take a hit; that is acceptable




Capacity estimation

  • DAU
      • 100 million daily users
  • Write to read ratio
      • 1:10
  • Throughput
      • 100 million tweets/day / 86,400 s/day ≈ 1,160 tweets/sec
  • Memory
      • 20% cache
      • 1 tweet = 500 bytes
      • 100 million tweets * 500 bytes = 50 GB/day
      • 200 GB
      • replicate multiple times per region
  • Storage
      • 1 written tweet = 1.5 MB
      • 10 million tweets/day * 1.5 MB = 15 TB/day
      • 15 TB/day * 365 days ≈ 5.5 PB/year
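
The estimates above can be rechecked with a few lines of Python. This is only a sanity-check sketch: it assumes one tweet per daily user, 86,400 seconds per day, and decimal units (1 GB = 10^9 bytes), and all variable names are illustrative.

```python
# Back-of-envelope capacity check; inputs are the estimates from the notes.
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400

daily_tweets = 100_000_000                # assumes one tweet per daily user
reads_per_write = 10                      # write:read ratio of 1:10

write_tps = daily_tweets / SECONDS_PER_DAY
read_tps = write_tps * reads_per_write

tweet_bytes = 500                         # size of one tweet in memory
memory_gb_per_day = daily_tweets * tweet_bytes / 1e9
cache_gb_per_day = 0.20 * memory_gb_per_day   # cache 20% of the daily volume

stored_tweets_per_day = 10_000_000        # tweets stored at the larger size
stored_tweet_mb = 1.5
storage_tb_per_day = stored_tweets_per_day * stored_tweet_mb / 1e6
storage_pb_per_year = storage_tb_per_day * 365 / 1e3

print(f"writes/sec:      {write_tps:,.0f}")
print(f"reads/sec:       {read_tps:,.0f}")
print(f"memory GB/day:   {memory_gb_per_day:,.1f}")
print(f"cache GB/day:    {cache_gb_per_day:,.1f}")
print(f"storage TB/day:  {storage_tb_per_day:,.1f}")
print(f"storage PB/year: {storage_pb_per_year:,.2f}")
```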



API design

  • Get
      • user id
      • location (optional)
      • tweet id
      • date
      • time
  • Put
      • a successful post returns the tweet
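
One way to picture the two calls is as plain Python types and functions. This is a hypothetical sketch, not a defined interface: `GetTweetsRequest`, `Tweet`, `post_tweet`, `get_tweets`, and the in-memory `_STORE` are assumed names, and the date and time parameters are folded into a single `since` timestamp for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count
from typing import Dict, List, Optional


@dataclass
class Tweet:
    tweet_id: int
    user_id: int
    content: str
    created_at: datetime
    location: Optional[str] = None


@dataclass
class GetTweetsRequest:
    user_id: int
    tweet_id: Optional[int] = None
    location: Optional[str] = None      # optional filter
    since: Optional[datetime] = None    # date + time as one timestamp (assumption)


_STORE: Dict[int, Tweet] = {}           # stand-in for the real tweet store
_next_id = count(1)


def post_tweet(user_id: int, content: str, location: Optional[str] = None) -> Tweet:
    """Store a tweet and return it, mirroring 'a successful post returns the tweet'."""
    tweet = Tweet(next(_next_id), user_id, content, datetime.now(timezone.utc), location)
    _STORE[tweet.tweet_id] = tweet
    return tweet


def get_tweets(req: GetTweetsRequest) -> List[Tweet]:
    """Return the user's tweets, narrowed by whichever optional filters are set."""
    return [
        t for t in _STORE.values()
        if t.user_id == req.user_id
        and (req.tweet_id is None or t.tweet_id == req.tweet_id)
        and (req.location is None or t.location == req.location)
        and (req.since is None or t.created_at >= req.since)
    ]
```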


Database design

  • DB - graph database (e.g. Neo4j)
  • Tweet
      • tweet ID
      • user ID
      • content
      • tweet - latitude
      • tweet - longitude
      • location - latitude
      • location - longitude
      • date
      • time
      • favs
  • User
      • user ID
      • email
      • location
      • date
      • last login
  • Followers
      • user ID 1
      • user ID 2
  • Favourites
      • tweet ID
      • user ID
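
The followers and favourites relations are essentially edges between IDs. The sketch below is an in-memory stand-in for them, written only to illustrate the shape of the data; `SocialGraph` and its methods are hypothetical names, and it assumes that a followers row means "user ID 1 follows user ID 2".

```python
from collections import defaultdict
from typing import DefaultDict, Set


class SocialGraph:
    """Toy model of the followers and favourites relations (not a real graph DB)."""

    def __init__(self) -> None:
        # followee id -> ids of users who follow them
        self._followers: DefaultDict[int, Set[int]] = defaultdict(set)
        # tweet id -> ids of users who favourited it
        self._favourites: DefaultDict[int, Set[int]] = defaultdict(set)

    def follow(self, follower_id: int, followee_id: int) -> None:
        """Record the edge (user ID 1 = follower, user ID 2 = followee)."""
        self._followers[followee_id].add(follower_id)

    def favourite(self, user_id: int, tweet_id: int) -> None:
        """Record that a user favourited a tweet."""
        self._favourites[tweet_id].add(user_id)

    def followers_of(self, user_id: int) -> Set[int]:
        return set(self._followers.get(user_id, set()))

    def fav_count(self, tweet_id: int) -> int:
        """The 'favs' value on a tweet can be derived from the favourites edges."""
        return len(self._favourites.get(tweet_id, set()))
```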



High-level design

  • API gateway service
  • Metadata
  • tweets - write
  • tweets - read
  • pictures
  • Data sharding
      • Graph DB - Neo4j
      • shard by user ID
          • keeping all of a user's data in one DB might not work if that user is a hot key
      • shard by tweet ID
      • shard by tweet creation time
          • epoch time
          • combine creation time with an incrementing sequence to form the tweet ID
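
Combining the creation time with an increment could look roughly like the sketch below. The layout (epoch milliseconds in the high bits, a 12-bit per-millisecond sequence in the low bits) and the name `TweetIdGenerator` are assumptions made for illustration; the notes do not fix a bit layout, and a multi-node deployment would also need a machine or shard identifier.

```python
import threading
import time


class TweetIdGenerator:
    """Time-sortable IDs: (epoch milliseconds << SEQUENCE_BITS) | sequence."""

    SEQUENCE_BITS = 12  # assumed: up to 4096 IDs per millisecond

    def __init__(self, clock=lambda: int(time.time() * 1000)):
        self._clock = clock              # injectable so the sketch can be tested
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock went backwards (or we ran ahead): keep the last timestamp.
                now = self._last_ms
            if now == self._last_ms:
                self._sequence += 1
                if self._sequence >= (1 << self.SEQUENCE_BITS):
                    # Sequence exhausted: borrow the next millisecond.
                    now += 1
                    self._sequence = 0
            else:
                self._sequence = 0
            self._last_ms = now
            return (now << self.SEQUENCE_BITS) | self._sequence
```

Because the timestamp occupies the high bits, sorting by ID also sorts by creation time, and the creation time can be recovered with `tweet_id >> TweetIdGenerator.SEQUENCE_BITS`.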


  • cache
      • CDN - photos/videos
      • Redis - tweets & metadata
  • load balancer
      • round-robin method
      • between clients & app servers
      • between clients & caches
      • an AWS load balancer can route traffic intelligently and scales automatically as required
  • fault tolerance
      • leader and follower servers
      • if the leader fails, fail over to a follower
      • multiple availability zones
  • rate limiter
  • security
  • monitoring
      • Grafana to monitor the services, along with Kibana for analytics
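
The notes do not say which rate-limiting algorithm to use. As one common option, a token bucket could be sketched as follows; `TokenBucket`, `rate`, and `capacity` are illustrative names, and a production limiter would typically keep this state in a shared store rather than in process memory.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock              # injectable so the sketch can be tested
        self._tokens = float(capacity)   # start with a full bucket
        self._updated = clock()

    def allow(self) -> bool:
        """Return True and consume a token if the request may proceed."""
        now = self._clock()
        elapsed = now - self._updated
        self._updated = now
        # Refill for the time that has passed, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

One bucket per user ID (or per API key) would let the gateway reject a burst from a single client without affecting others.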




Request flows







Detailed component design

  • DBs
  • Graph DB - Neo4j or InfiniteGraph
  • Graph user service
  • Fan-out service
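
A fan-out service is commonly built as "fan-out on write": when a user posts, the tweet ID is pushed onto each follower's precomputed timeline. The sketch below assumes that approach; `FanOutService`, the `followers` mapping, and the timeline size of 100 are hypothetical, and a real service would write to a cache such as Redis instead of local memory.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Set


class FanOutService:
    """Push each new tweet ID onto the timelines of the author's followers."""

    def __init__(self, followers: Dict[int, Set[int]], timeline_size: int = 100):
        self._followers = followers      # author id -> follower ids
        # Bounded timelines: the oldest entries fall off automatically.
        self._timelines: Dict[int, Deque[int]] = defaultdict(
            lambda: deque(maxlen=timeline_size)
        )

    def publish(self, author_id: int, tweet_id: int) -> int:
        """Fan a tweet out to all followers; return how many timelines were updated."""
        targets = self._followers.get(author_id, set())
        for follower_id in targets:
            self._timelines[follower_id].appendleft(tweet_id)
        return len(targets)

    def timeline(self, user_id: int) -> List[int]:
        """Newest-first tweet IDs for a user; empty if nothing was pushed."""
        return list(self._timelines[user_id])
```

The cost of `publish` grows linearly with the follower count, which is why accounts with very large audiences need separate handling.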





Trade offs/Tech choices

  • NoSQL DB - Cassandra handles heavy write loads well, but it does not support joins, so relationships across users/followers/favorites have to be denormalized or kept in the graph DB



Failure scenarios/bottlenecks

  • what if a celebrity with millions of followers posts a tweet? Delivering it to every follower becomes a bottleneck for reading or sharing the tweet
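
One possible mitigation, not stated in the notes, is a hybrid approach: push tweets to followers' timelines for ordinary accounts, but skip the push for very large accounts and merge their recent tweets in at read time. The sketch below assumes that design; `CELEBRITY_THRESHOLD`, `should_push`, and `read_timeline` are made-up names, and the threshold value is arbitrary.

```python
from typing import List, Tuple

CELEBRITY_THRESHOLD = 1_000_000  # assumed cut-off, to be tuned from real traffic

TimelineEntry = Tuple[int, int]  # (created_at epoch seconds, tweet id)


def should_push(follower_count: int, threshold: int = CELEBRITY_THRESHOLD) -> bool:
    """Fan out on write only for accounts below the follower threshold."""
    return follower_count < threshold


def read_timeline(
    pushed: List[TimelineEntry],
    celebrity_tweets: List[TimelineEntry],
    limit: int = 20,
) -> List[int]:
    """Merge the precomputed timeline with celebrity tweets pulled at read time.

    Entries are deduplicated and returned newest first.
    """
    merged = sorted(set(pushed) | set(celebrity_tweets), reverse=True)
    return [tweet_id for _, tweet_id in merged[:limit]]
```

This trades a slightly more expensive read for avoiding millions of timeline writes per celebrity tweet.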



Future improvements

  • tweet search, which requires indexing tweets and ranking the results
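
To give a feel for what indexing and ranking involve, here is a deliberately small sketch of an inverted index with a naive ranking (number of matching query terms). `TweetIndex` is a hypothetical name; a real system would more likely rely on a dedicated search engine and a richer scoring function.

```python
import re
from collections import Counter, defaultdict
from typing import DefaultDict, List, Set


class TweetIndex:
    """Toy inverted index: term -> set of tweet IDs containing that term."""

    def __init__(self) -> None:
        self._postings: DefaultDict[str, Set[int]] = defaultdict(set)

    @staticmethod
    def _terms(text: str) -> List[str]:
        # Lowercase and keep word characters plus '#' and '@' for hashtags/mentions.
        return re.findall(r"[a-z0-9#@_]+", text.lower())

    def add(self, tweet_id: int, content: str) -> None:
        for term in self._terms(content):
            self._postings[term].add(tweet_id)

    def search(self, query: str, limit: int = 10) -> List[int]:
        """Rank tweets by how many distinct query terms they contain."""
        scores: Counter = Counter()
        for term in set(self._terms(query)):
            for tweet_id in self._postings.get(term, ()):
                scores[tweet_id] += 1
        # Highest score first; ties broken by tweet ID for a stable order.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [tweet_id for tweet_id, _ in ranked[:limit]]
```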