System requirements
Functional:
- User account CRUD
- User profile CRUD
- User follows other users
- User post CRUD
- Post contains text and image
- User views posts on timeline
- Search tweets by text and hashtag
Non-Functional:
- High availability
- Low latency
- High Scale
- Eventual consistency
Capacity estimation
100M DAU
Avg tweets/user/day = 5
500M writes/day
5B reads/day
Average text size/tweet = 100 bytes
Text storage: 500M * 100 bytes = 50 GB/day
Average image size: 2 MB
10% of tweets have images
Image storage/day: 50M * 2 MB = 100 TB
User profile store is separate
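The arithmetic above can be checked with a short script. This is a minimal sketch: the inputs are the figures listed in this section, decimal units are assumed (1 MB = 10^6 bytes), and the average per-second rates are derived values that do not appear in the original estimate.

```python
# Back-of-the-envelope capacity estimate using the figures from this section.
DAU = 100_000_000
TWEETS_PER_USER_PER_DAY = 5
READS_PER_DAY = 5_000_000_000
TEXT_BYTES_PER_TWEET = 100
IMAGE_TWEET_PERCENT = 10          # 10% of tweets carry an image
IMAGE_BYTES = 2 * 10**6           # 2 MB, decimal units (assumption)
SECONDS_PER_DAY = 86_400

writes_per_day = DAU * TWEETS_PER_USER_PER_DAY
text_bytes_per_day = writes_per_day * TEXT_BYTES_PER_TWEET
image_tweets_per_day = writes_per_day * IMAGE_TWEET_PERCENT // 100
image_bytes_per_day = image_tweets_per_day * IMAGE_BYTES

# Derived averages (not in the original estimate); peak load will be higher.
avg_write_qps = writes_per_day / SECONDS_PER_DAY
avg_read_qps = READS_PER_DAY / SECONDS_PER_DAY

print(f"writes/day:        {writes_per_day:,}")
print(f"text storage/day:  {text_bytes_per_day / 10**9:.0f} GB")
print(f"image storage/day: {image_bytes_per_day / 10**12:.0f} TB")
print(f"avg write QPS:     {avg_write_qps:,.0f}")
print(f"avg read QPS:      {avg_read_qps:,.0f}")
```

The 10:1 read-to-write ratio is what motivates precomputing feeds on the write path rather than assembling them on every read.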
API design
POST /post
GET /feed
POST /follow
GET /search
Database design
Tables:
User, Post, Follower
A separate search database is also used
High-level design
Components:
- API Gateway: routes each request to the appropriate service
- User Service: CRUD operations on user accounts
- User Profile Service: CRUD operations on user profiles
- Follower Service: CRUD operations on follow relationships. Also enables graph-based access to related users
- Post Service: CRUD operations on posts (tweets) by individual users
- Fanout Service: propagates new posts to the followers of the posting user
- Feed Service: builds news feeds for users with the aid of the Fanout Service
- Search Service: indexes posts and enables lookup by keyword and hashtag
Request flows
- Post Flow:
  - The user makes a post from a frontend client.
  - The API Gateway routes it to the Post Service.
  - The Post Service stores the text and image components in their respective stores.
  - The post metadata is also forwarded to the Fanout Service.
  - The Fanout Service propagates the post to the right places (feed caches, etc.) so that the Feed Service can fetch it for display to followers.
  - The post is also forwarded to the Search Service for storage and indexing.
- Feed Flow:
  - The user requests the feed from a frontend client.
  - The API Gateway routes the request to the Feed Service.
  - The Feed Service looks up the feed already built by the Fanout Service.
  - It enriches the feed.
  - It returns the feed to the client.
Detailed component design
- General:
  - All services can scale horizontally and independently
  - NoSQL databases and object stores can scale horizontally by adding more partitions
  - Consistent hashing can be used to minimize data movement as partitions are added
  - Virtual nodes can be used to mitigate hot-partition problems
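The consistent hashing and virtual node points above can be sketched as follows. This is an illustrative in-memory ring, not the partitioner of any particular database; the class name `HashRing`, the node names, and the choice of 100 virtual nodes per physical node are all assumptions made for the example.

```python
import bisect
import hashlib


def _position(key):
    """Map a string to a 64-bit position on the ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent hash ring where each physical node owns many virtual nodes."""

    def __init__(self, vnodes_per_node=100):
        self.vnodes_per_node = vnodes_per_node
        self._positions = []   # sorted ring positions of all virtual nodes
        self._owner = {}       # ring position -> physical node name

    def add_node(self, node):
        # Each physical node is placed at many pseudo-random ring positions.
        for i in range(self.vnodes_per_node):
            pos = _position(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove_node(self, node):
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key):
        if not self._positions:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key, wrapping around at the end.
        idx = bisect.bisect_right(self._positions, _position(key))
        return self._owner[self._positions[idx % len(self._positions)]]


ring = HashRing()
for name in ("node-a", "node-b", "node-c"):
    ring.add_node(name)

keys = [f"post:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

Adding a node only moves keys onto the new node; keys never shuffle between the existing nodes. Spreading each node over many ring positions evens out how much of the key space each one owns.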
- Post Service:
  - Uses NoSQL storage for post text
  - Uses object stores for images/videos
  - CDNs also serve images/videos for quick access
- Fanout Service:
  - Talks to the Follower Service to get the list of followers
  - Appends the post to the feed cache of each follower
  - The feed cache is distributed geographically
  - Each cache stores data for users in geographic proximity
  - This can be achieved in two ways:
    - Write locally at the poster's location and replicate across locations
    - Write directly to all followers' locations
  - Cache storage can be optimized:
    - Encode the data
    - Store references rather than the whole text
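The fan-out-on-write step described above can be sketched as follows. This is a single-process illustration only: a dictionary of bounded deques stands in for the distributed feed cache, a plain callable stands in for the Follower Service, and the names `FanoutService`, `followers_of`, and the feed length cap are assumptions. Only post IDs are stored, in line with the "store references" optimization.

```python
from collections import defaultdict, deque

MAX_FEED_LEN = 800  # assumed cap on cached feed entries per user


class FanoutService:
    def __init__(self, followers_of, max_feed_len=MAX_FEED_LEN):
        # followers_of: callable mapping an author ID to an iterable of follower IDs
        # (a stand-in for a call to the Follower Service).
        self.followers_of = followers_of
        # follower ID -> newest-first deque of post IDs (stand-in for the feed cache).
        self.feed_cache = defaultdict(lambda: deque(maxlen=max_feed_len))

    def fan_out(self, author_id, post_id):
        """Push a reference to the post into every follower's cached feed."""
        delivered = 0
        for follower_id in self.followers_of(author_id):
            # appendleft keeps the newest post first; a full deque drops the oldest.
            self.feed_cache[follower_id].appendleft(post_id)
            delivered += 1
        return delivered


followers = {"alice": ["bob", "carol"]}
service = FanoutService(lambda author: followers.get(author, []), max_feed_len=2)
for post_id in ("p1", "p2", "p3"):
    service.fan_out("alice", post_id)
print(list(service.feed_cache["bob"]))
```

Capping the feed length bounds the memory used per user, and storing IDs rather than text means an edited or deleted post only has to change in one place.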
- Feed Service:
  - Reads the feed cache entry for the user
  - Enriches and transforms the data
  - Adds CDN references for images
  - Sends the feed back to the FE client
  - The Feed Service can maintain a WebSocket connection with the client to push updates continuously
  - The FE client can then organize and render the feed, accounting for the updates
- Search Service:
  - Search is built on top of storage that supports inverted indexes
  - Various indexes are created based on use cases: single word, phrase, proximity, hashtag, stemmed search, etc.
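To illustrate the indexing idea behind the Search Service, here is a toy in-memory inverted index covering only the single-word and hashtag cases. It is a sketch under simple assumptions (lowercased regex tokenization, AND semantics for multi-word queries) and does not reflect how a production search engine tokenizes or ranks; the class and method names are hypothetical.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"#?\w+")  # words, optionally prefixed with '#'


class InvertedIndex:
    def __init__(self):
        self._postings = defaultdict(set)  # token -> set of post IDs

    def index(self, post_id, text):
        for token in TOKEN_RE.findall(text.lower()):
            self._postings[token].add(post_id)
            if token.startswith("#"):
                # Also make the hashtag findable as a plain keyword.
                self._postings[token[1:]].add(post_id)

    def search(self, query):
        """Return IDs of posts containing every token in the query."""
        tokens = TOKEN_RE.findall(query.lower())
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


idx = InvertedIndex()
idx.index(1, "Learning #Python today")
idx.index(2, "python is fun")
idx.index(3, "Coffee today")
print(idx.search("python"), idx.search("#python"), idx.search("python today"))
```

Phrase, proximity, and stemmed search need more than this: token positions must be stored with each posting, and a stemmer must normalize tokens at both index and query time.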
- Follower Service:
  - Direct follower relationships can be stored as key-value pairs in a distributed database
  - The follower list can be cached locally for frequent users
  - The cache is updated asynchronously as new followers are added
  - The Fanout Service can look up these caches for the followers to fan out to
  - For advanced analysis of follower relationships, the data should be stored in a graph DB
Trade offs/Tech choices
- Databases
  - Post:
    - Text: Cassandra
    - Media: S3
  - Users:
    - Account data: MySQL
    - User profile: Cassandra
  - Search database: Elasticsearch
  - Cache: Redis
Failure scenarios/bottlenecks
- Viewing posts by celebrities: they have millions of followers, so fanning out each post to all of them means too many write operations and should be avoided. Instead, their posts should be cached in a separate place. The Feed Service should look up the celebrity cache for every celebrity the current user follows and merge that data with the feed in the user's own feed cache.
- Top trending search keys: to avoid a hot partition in these scenarios, virtual nodes should be used
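The read-time merge for celebrity posts can be sketched as follows. The sketch assumes each cache entry is a `(timestamp, post_id)` pair kept newest first; the function name `build_feed`, the dictionary-based caches, and the sample data are hypothetical stand-ins for the real feed and celebrity caches.

```python
import heapq
from itertools import islice


def build_feed(user_id, feed_cache, celeb_cache, followed_celebs, limit=20):
    """Merge the user's precomputed feed with the posts of followed celebrities.

    Every input list must already be sorted newest first.
    """
    sources = [feed_cache.get(user_id, [])]
    for celeb_id in followed_celebs.get(user_id, []):
        sources.append(celeb_cache.get(celeb_id, []))
    # heapq.merge lazily merges sorted inputs; reverse=True expects descending order.
    merged = heapq.merge(*sources, key=lambda entry: entry[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]


feed_cache = {"u1": [(105, "p5"), (101, "p1")]}
celeb_cache = {"star": [(110, "s2"), (103, "s1")]}
followed_celebs = {"u1": ["star"]}
print(build_feed("u1", feed_cache, celeb_cache, followed_celebs, limit=3))
```

This trades a little extra work on each read for avoiding millions of writes per celebrity post. Because the merge is lazy, only the first `limit` entries are pulled from the sources.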