Design Twitter - System Design

System requirements

Functional:

User account CRUD
User profile CRUD
User follows other users
User post CRUD
Post contains text and image
User views posts on timeline
Search tweet: text, hashtag

Non-Functional:

High availability
Low latency
High Scale
Eventual consistency

Capacity estimation

100M DAU

Avg Tweets /user/day = 5

500M write/day

5B reads/day

Average Text Size/Tweet = 100Bytes

Text storage: 50GB/day

Average Image size: 2MB

10% of tweets have images

Image storage size/day: 50M*2MB = 100TB

User profile store is separate

API design

Post /post

Get /feed

Post /follow

Get /search

Database design

Tables:

User, Post, Follower

Also will have a separate search database

High-level design

Components:

API Gateway: Responsible for sending request to the right place
User Service: for CRUD operation w.r.t users
User Profile Service: CRUD operations w.r.t users
Follower Service: CRUD operation w.r.t users. Also enable graph based access of related users
Post Service: CRUD operations of posts(tweets) by individual users
Fanout Service: For propagating posts made by users
Feed Service: Creates news feeds for users with the aide of the Fanout Service
Search Service:Indexes posts, enables lookup based on keywords and hashtags

Request flows

Post Flow:
1. User makes a post from a frontend client.
2. The API Gateway sends it to the Post Service.
3. The Post Service stores the text and image components in their respective storages.
4. The post metadata is also forwarded to the Fanout Service
5. The Fanout Service propagates the post to the correct places(cache etc) so that the post can be fetched by the Feed Service for displaying to the followers
6. The post is also forwarded to the Search Service for storing and indexing
Feed Flow:
1. The fetches feed from a frontend client
2. The API Gateway sends it to the Feed Service
3. The Feed Services looks up feed already created by the Fanout service.
4. Enriches it
5. returns it to the client

Detailed component design

General:
1. all services can scale independently horizontally
2. No-sql databases and object-stores can scale horizontally by adding more partitions
3. Consistent hashing can be used to minimize with data movement with increasing partitions
4. Virtual nodes can be used to mitigate hot partition problems
Post Service:
1. Uses a no-sql storage for texts
2. Uses Object stores for images/videos
3. CDNs also store images/videos for quick access
Fanout Service:
1. Talks to follower services to get a get a list of followers
2. Appends the post to the feed cache of each follower
3. Feed cache is distributed across geography
4. The cache stores data for user in geographic proximity
5. This can be achieved in two ways:
  1. write locally at the posters location and replicate across locations
  2. write directly to all followers locations
6. Cache storage can be optimized:
  1. Encode data
  2. Store references rather than whole text
Feed Service:
1. Reads the Feed cache entry for the user
2. Enriches and transform the data.
3. Adds CDN references for images
4. Sends the feed back to the FE client
5. The Feeds Service can maintain a websocket connection with the client to push constant updates
6. The FE client can the organize and render the feed accounting for the updates
Search Service:
1. The search is built on top of storage capable of reverse-indexes
2. Various indexes are created enabled based on use cases: single word, phrases, proximity, hashtag, stemmed search etc.
Follower Service:
1. Direct follower relationships can be stored as key-value pairs in a distributed database
2. The followers list can be cached locally for frequent users
3. Cache is updated through async method as new followers are added
4. The fanout service can lookup the caches for followers to fan out to.
5. For advanced analysis of follower relationships the data should be stored in Graph DB

Trade offs/Tech choices

Databases
1. Post:
  1. text: Cassandra
  2. Media: S3
2. Users:
  1. Account data: mySQL
  2. User Profile: Cassandra
3. Search Database: ElasticSearch
4. Cache: redis

Failure scenarios/bottlenecks

Viewing posts by celebs: they have millions of followers. Fanning out to all of them is too many write ops.This should be avoided.Instead their posts should be cached in a separate place. The Feeds Service should look up celeb cache for all the celebs the current user follows. This data should be merged with the feed available in the user's own feed cache.
Top trending search keys: to avoid a hot partition for these scenarios virtual nodes should be used

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?