System requirements


Functional:

Users should be able to:

  • Compose a tweet and post it
  • Follow another user
  • View tweets from the users they follow in their home feed
  • View a feed of suggested content from accounts recommended based on popularity (not necessarily accounts they follow)
  • Favorite other users' tweets


Non-Functional:

  • Scalability: the system should be highly scalable. It must support many simultaneous users, potentially in different parts of the world.
  • Response time: reads should be as fast as possible. Longer response times are tolerable for writes.
  • Consistency: eventual consistency is acceptable. Users do not need to see the newest updates instantaneously.
  • Security: users must be authenticated in order to post, make changes to their account, and access content from the users they follow.




Capacity estimation

Estimate the scale of the system you are going to design...

User Base:

  • Let's assume daily active users (DAU) is 500 million.

Traffic:

We can estimate daily traffic from the DAU figure and a few per-user activity assumptions.

  1. Tweeting: each user tweets about 2 times per day, so 1 billion tweets per day total
  2. Home feed: each user loads their home feed about 10 times per day, so 5 billion home feed loads per day total
  3. Favorite: each user favorites about 1 tweet per day, so 500 million favorites per day total
  4. Following: each user follows about 200 accounts, so 100 billion follow relationships total

Queries Per Second:

  1. Writes (tweets): 500M * 2 / (3600 * 24) ≈ 11.6k QPS
  2. Reads (home feed loads): 500M * 10 / (3600 * 24) ≈ 58k QPS
  3. Favorites: 500M * 1 / (3600 * 24) ≈ 5.8k QPS
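
The per-second figures can be checked with a few lines of arithmetic. This is a minimal sketch that assumes traffic is spread evenly across the day; real peak load would be higher, and the names `average_qps` and `ACTIONS_PER_USER_PER_DAY` are illustrative only.

```python
SECONDS_PER_DAY = 24 * 3600  # 86,400

DAU = 500_000_000  # daily active users, from the user base assumption

# Per-user daily actions, taken from the traffic assumptions.
ACTIONS_PER_USER_PER_DAY = {
    "tweet (write)": 2,
    "home feed load (read)": 10,
    "favorite": 1,
}


def average_qps(dau: int, actions_per_user_per_day: float) -> float:
    """Average requests per second, assuming evenly spread traffic."""
    return dau * actions_per_user_per_day / SECONDS_PER_DAY


for name, per_day in ACTIONS_PER_USER_PER_DAY.items():
    print(f"{name}: {average_qps(DAU, per_day):,.0f} QPS")
```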

Data size :

  1. Tweets: 1 billion tweets per day, each up to 140 characters. Allowing for encoding overhead, assume 300 bytes per tweet. That is about 300 GB (roughly 280 GiB) of new data per day, or about 110 TB per year.
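
The storage estimate works out as follows. This sketch covers tweet text only; the 300-byte figure is the assumption stated above, and media, indexes, and replication overhead are not counted.

```python
TWEETS_PER_DAY = 1_000_000_000
BYTES_PER_TWEET = 300  # assumed average size, including encoding overhead

GB = 10**9    # decimal gigabyte
GiB = 2**30   # binary gibibyte
TB = 10**12   # decimal terabyte

bytes_per_day = TWEETS_PER_DAY * BYTES_PER_TWEET
bytes_per_year = bytes_per_day * 365

print(f"per day:  {bytes_per_day / GB:.0f} GB ({bytes_per_day / GiB:.0f} GiB)")
print(f"per year: {bytes_per_year / TB:.1f} TB")
```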



API design

Tweeting

  • POST: user ID, content of tweet

Home Feed

  • GET: user ID, page #

Following/Unfollowing

  • POST: user ID, followed user ID

Favoriting/Unfavoriting

  • POST: user ID, tweet ID
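
The request payloads could be sketched as typed structures. Everything beyond the listed fields is an assumption made for illustration: the class names, the boolean flags that distinguish follow from unfollow and favorite from unfavorite, and the validation helper are all hypothetical.

```python
from dataclasses import dataclass

MAX_TWEET_CHARS = 140  # matches the tweet length used in the capacity estimate


@dataclass(frozen=True)
class PostTweetRequest:  # POST: user ID, content of tweet
    user_id: str
    content: str


@dataclass(frozen=True)
class HomeFeedRequest:  # GET: user ID, page #
    user_id: str
    page: int = 1


@dataclass(frozen=True)
class FollowRequest:  # POST: user ID, followed user ID
    user_id: str
    followed_user_id: str
    follow: bool = True  # assumption: False means unfollow


@dataclass(frozen=True)
class FavoriteRequest:  # POST: user ID, tweet ID
    user_id: str
    tweet_id: str
    favorite: bool = True  # assumption: False means unfavorite


def validate_tweet(req: PostTweetRequest) -> None:
    """Reject empty or over-length tweets before they reach the tweet service."""
    if not req.content:
        raise ValueError("tweet content must not be empty")
    if len(req.content) > MAX_TWEET_CHARS:
        raise ValueError(f"tweet exceeds {MAX_TWEET_CHARS} characters")
```

In practice the user ID would more likely come from the authenticated session than from the request body, which also fits the security requirement.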




Database design

A relational database (RDBMS) with the following tables


Users table

  • user ID (UUID)
  • name (string)
  • email (string)
  • created at (timestamp)
  • updated at (timestamp)
  • etc.

Tweets table

  • tweet ID (UUID)
  • user ID (UUID)
  • tweet content (string)
  • created at (timestamp)

User Favorites table - many to many

  • user ID (UUID)
  • tweet ID (UUID)
  • created at (timestamp)
  • deleted at (timestamp)

User Followers table - many to many

  • user ID (UUID)
  • followed user ID (UUID)
  • created at (timestamp)
  • deleted at (timestamp)
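
The tables above could be sketched as SQL. This example uses Python's built-in sqlite3 module with an in-memory database only so that it runs anywhere; the column types, the composite primary keys, the index, and the `home_feed` query are assumptions rather than part of the design.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id    TEXT PRIMARY KEY,      -- UUID stored as text
    name       TEXT NOT NULL,
    email      TEXT NOT NULL UNIQUE,
    created_at TEXT NOT NULL,         -- ISO-8601 timestamp
    updated_at TEXT
);
CREATE TABLE tweets (
    tweet_id   TEXT PRIMARY KEY,
    user_id    TEXT NOT NULL REFERENCES users(user_id),
    content    TEXT NOT NULL,
    created_at TEXT NOT NULL
);
CREATE INDEX idx_tweets_user_created ON tweets (user_id, created_at);
CREATE TABLE user_favorites (
    user_id    TEXT NOT NULL REFERENCES users(user_id),
    tweet_id   TEXT NOT NULL REFERENCES tweets(tweet_id),
    created_at TEXT NOT NULL,
    deleted_at TEXT,                  -- NULL while the favorite is active
    PRIMARY KEY (user_id, tweet_id)
);
CREATE TABLE user_followers (
    user_id          TEXT NOT NULL REFERENCES users(user_id),
    followed_user_id TEXT NOT NULL REFERENCES users(user_id),
    created_at       TEXT NOT NULL,
    deleted_at       TEXT,            -- NULL while the follow is active
    PRIMARY KEY (user_id, followed_user_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)


def home_feed(conn, user_id, page=1, page_size=20):
    """Newest-first tweets from accounts that user_id actively follows."""
    offset = (page - 1) * page_size
    return conn.execute(
        """
        SELECT t.tweet_id, t.user_id, t.content, t.created_at
        FROM tweets t
        JOIN user_followers f ON f.followed_user_id = t.user_id
        WHERE f.user_id = ? AND f.deleted_at IS NULL
        ORDER BY t.created_at DESC
        LIMIT ? OFFSET ?
        """,
        (user_id, page_size, offset),
    ).fetchall()
```

A join like this is the simplest way to build the home feed, but at the read volumes estimated earlier it would likely need caching or a precomputed feed in front of it.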




High-level design

You should identify enough components that are needed to solve the actual problem from end to end.

Rate limiter

  • Protect against DoS attacks and ensure fair usage
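
The notes do not name a rate limiting algorithm. One common option is a token bucket, sketched here per client; the capacity and refill rate are illustrative, and the injectable `clock` parameter exists only to make the example testable.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_sec` tokens/second."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and consume tokens if the request may proceed."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A real deployment would keep one bucket per user or IP address in a shared store so that every rate limiter instance sees the same counts.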

Load balancer

  • Use consistent hashing to distribute load across servers
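
A minimal sketch of a consistent hashing ring follows. The server names, the number of virtual nodes, and the choice of SHA-256 are assumptions for illustration. The property that matters is that removing a server only remaps the keys that were assigned to it.

```python
import bisect
import hashlib


class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes  # virtual nodes per server, to smooth the distribution
        self._ring = []       # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key: str) -> str:
        """Return the first node clockwise from the key's position on the ring."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around at the end
```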

CDN

  • Serve cached data to localized areas to minimize response time

Services

  • Services to support major functions: user service (user management and following), tweet service (tweeting and favoriting), feed service (loading user feed)

Cache

  • application level caching
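
The notes leave the caching strategy open. As one possible shape for the read path, here is a cache-aside sketch with a time-to-live, which fits the eventual consistency requirement. The `TTLCache` class and the loader callback are hypothetical, and a production system would more likely use a shared cache such as Redis or Memcached than an in-process dictionary.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        """Return the cached value, or call loader (e.g. a DB query) on a miss."""
        now = self._clock()
        hit = self._data.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = loader(key)
        self._data[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key, for example after a write changes the underlying data."""
        self._data.pop(key, None)
```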

Database

  • Use a relational database that is designed to scale horizontally: Amazon Aurora, CockroachDB, etc.


Scaling

  • I will use a combination of replication and sharding. Replication will follow a master-slave (primary-replica) pattern, with multiple read replicas to absorb the high volume of reads this system will handle.
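
The combination of sharding and replication can be sketched as a small router: a request is mapped to a shard by user ID, writes go to that shard's primary, and reads rotate across its replicas. The shard key, the modulo placement, and the instance names are assumptions; the notes do not specify them.

```python
import hashlib
from dataclasses import dataclass, field
from itertools import cycle
from typing import Iterator, List


@dataclass
class Shard:
    primary: str          # handles all writes for this shard
    replicas: List[str]   # read-only copies of the primary
    _next_replica: Iterator[str] = field(init=False, repr=False)

    def __post_init__(self):
        # Fall back to the primary if the shard has no replicas.
        self._next_replica = cycle(self.replicas or [self.primary])

    def for_write(self) -> str:
        return self.primary

    def for_read(self) -> str:
        return next(self._next_replica)  # simple round-robin


def shard_for(user_id: str, shards: List[Shard]) -> Shard:
    """Pick a shard from a stable hash of the user ID (modulo placement)."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]
```

Because replicas lag the primary, a user may briefly not see their own latest write on a replica read, which is acceptable under the eventual consistency requirement.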


Request flows

Explain how the request flows from end to end in your high level design.

  • The client sends a request, which is first handled by the rate limiter and the load balancer.
  • The request is then directed to a server, and the CDN delivers a cached response if one is available.
  • On a CDN miss, the request is routed to the appropriate service.
  • The service interacts directly with the cache and the database to generate the response.





Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...

Services

  • writes done asynchronously?

Cache

  • write back?





Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...


  • Database type
  • Push vs pull for populating user feed
  • Microservices vs monolith
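
To make the push vs pull trade-off concrete, here is a minimal in-memory sketch of both models. Push (fan-out on write) copies each new tweet into every follower's inbox, so reads are cheap and writes are expensive. Pull (fan-out on read) stores the tweet once and gathers followed timelines at read time. The `FeedModels` class and its method names are hypothetical.

```python
from collections import defaultdict


class FeedModels:
    def __init__(self):
        self.followers = defaultdict(set)  # author -> users who follow them
        self.following = defaultdict(set)  # user -> authors they follow
        self.timeline = defaultdict(list)  # author -> [(timestamp, tweet_id)]
        self.inbox = defaultdict(list)     # user -> [(timestamp, tweet_id)], push only

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def post(self, author, ts, tweet_id):
        self.timeline[author].append((ts, tweet_id))
        # Push model: one extra write per follower at post time.
        for follower in self.followers[author]:
            self.inbox[follower].append((ts, tweet_id))

    def feed_push(self, user, limit=20):
        """Read the precomputed inbox: a single lookup per feed load."""
        newest = sorted(self.inbox[user], reverse=True)[:limit]
        return [tweet_id for _, tweet_id in newest]

    def feed_pull(self, user, limit=20):
        """Gather every followed author's timeline at read time."""
        items = [item for author in self.following[user]
                 for item in self.timeline[author]]
        newest = sorted(items, reverse=True)[:limit]
        return [tweet_id for _, tweet_id in newest]
```

With reads outnumbering writes by about 5 to 1 in the capacity estimate, push tends to favor the low read latency requirement, at the cost of write amplification for accounts with many followers.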



Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

  • Celebrities with many followers: under a push model, a single tweet triggers one feed write per follower
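
One commonly discussed mitigation, offered here as a sketch rather than as part of the design, is a hybrid feed: ordinary accounts push tweets to follower inboxes, while accounts above a follower threshold are stored once and merged in at read time. The threshold value and the `HybridFeed` class are hypothetical.

```python
from collections import defaultdict


class HybridFeed:
    def __init__(self, celebrity_threshold=10_000):
        self.threshold = celebrity_threshold       # hypothetical follower cutoff
        self.followers = defaultdict(set)          # author -> followers
        self.following = defaultdict(set)          # user -> followed authors
        self.inbox = defaultdict(list)             # user -> pushed (ts, tweet_id)
        self.celebrity_tweets = defaultdict(list)  # author -> (ts, tweet_id), pulled

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.threshold

    def post(self, author, ts, tweet_id):
        if self.is_celebrity(author):
            # Store once; followers fetch it when they load their feed.
            self.celebrity_tweets[author].append((ts, tweet_id))
            return
        for follower in self.followers[author]:
            self.inbox[follower].append((ts, tweet_id))

    def home_feed(self, user, limit=20):
        """Merge the pushed inbox with tweets pulled from followed celebrities."""
        items = list(self.inbox[user])
        for author in self.following[user]:
            items.extend(self.celebrity_tweets.get(author, []))
        newest = sorted(items, reverse=True)[:limit]
        return [tweet_id for _, tweet_id in newest]
```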




Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?

How to address issues outlined in the previous section:


Additional features we could add in the next iteration of the system:

  • Improving the algorithm for what kind of content is surfaced on the home feed: based on previous engagement by the user with other content, etc.
  • Allowing users to make their profiles private or public to control who can see their tweets
  • Content moderation: flagging and/or removing inappropriate content
  • Sending notifications whenever a followed user tweets, favorites a tweet, or performs another significant interaction