System requirements
Functional:
- Get a feed of tweets aggregated from the accounts a user follows, plus popular posts
- Follow and unfollow other users
- (Optional) post pictures or videos
- Search for tweets
- Secure login
Non-Functional:
- High availability
- Low-latency reads
- Eventual consistency on posts
Capacity estimation
- Assume 100 million daily active users
- Assume 1 out of 10 users (10 million) posts one tweet per minute, i.e. 10 million tweets per minute, or about 14.4 billion tweets per day
- At a maximum of 160 B per tweet, that is 10 million × 160 B = 1.6 GB per minute, or roughly 2.3 TB of tweet text per day
- Each tweet carries roughly another 160 B of metadata (uid, likes, retweets, etc.), which approximately doubles daily storage to about 4.6 TB
- Bandwidth: we want at least 1 Mbps per user to load multiple tweets along with their metadata
- If we support photos or videos, we need blob storage, sized according to the limits we set on media posts
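The arithmetic above can be checked with a short script. This is a minimal sketch that reads "1 out of 10 users post tweets per minute" as 10 million posters each posting one tweet per minute, and that uses decimal units (1 TB = 10^12 bytes). The variable names are illustrative only.

```python
"""Back-of-envelope capacity estimate for the assumptions listed above."""

DAU = 100_000_000              # daily active users
POSTERS = DAU // 10            # 1 in 10 users posts
TWEETS_PER_POSTER_PER_MIN = 1  # assumption: each poster posts once per minute
TWEET_BYTES = 160              # max size of the tweet text
METADATA_BYTES = 160           # uid, likes, retweets, etc.
MINUTES_PER_DAY = 24 * 60


def estimate() -> dict[str, int]:
    """Return tweet volume and daily storage in bytes (integer math, no rounding drift)."""
    tweets_per_min = POSTERS * TWEETS_PER_POSTER_PER_MIN
    tweets_per_day = tweets_per_min * MINUTES_PER_DAY
    return {
        "tweets_per_second": tweets_per_min // 60,
        "tweets_per_day": tweets_per_day,
        "text_bytes_per_day": tweets_per_day * TWEET_BYTES,
        "total_bytes_per_day": tweets_per_day * (TWEET_BYTES + METADATA_BYTES),
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,}")
```

Changing `TWEETS_PER_POSTER_PER_MIN` or `POSTERS` makes it easy to see how sensitive the storage numbers are to the posting-rate assumption, which dominates everything else here.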
API design
- postTweet(uid, content): POST request
- getTweetFeed(uid): GET request
- postTweetMedia(uid, content, fileType): POST request
- searchTweet(content): GET request
- getFollowers(uid): GET request
- followUser(uid, followingUid): PUT request
- unfollowUser(uid, followingUid): DELETE request
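To make the API contract concrete, here is a hypothetical in-memory sketch with one method per endpoint. The HTTP verb for each endpoint appears as a comment. It has no HTTP layer, auth, or persistence. The `blob://` media link format is invented for illustration, and the `unfollow_user` method follows from the follow/unfollow requirement.

```python
"""Hypothetical in-memory sketch of the tweet API (no HTTP layer or auth)."""
import itertools
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tweet:
    tweet_id: int
    uid: str
    content: str
    created_at: float
    media_link: Optional[str] = None  # link into blob storage for media posts


class TweetService:
    def __init__(self) -> None:
        self._next_id = itertools.count(1)  # increasing ids, so higher means newer
        self._tweets: dict[int, Tweet] = {}
        self._follows: set[tuple[str, str]] = set()  # (uid, following_uid) rows

    def post_tweet(self, uid: str, content: str) -> int:  # POST
        tweet_id = next(self._next_id)
        self._tweets[tweet_id] = Tweet(tweet_id, uid, content, time.time())
        return tweet_id

    def post_tweet_media(self, uid: str, content: str, file_type: str) -> int:  # POST
        tweet_id = self.post_tweet(uid, content)
        # Placeholder link; a real system would upload the file to blob storage first.
        self._tweets[tweet_id].media_link = f"blob://media/{tweet_id}.{file_type}"
        return tweet_id

    def follow_user(self, uid: str, following_uid: str) -> None:  # PUT (idempotent)
        self._follows.add((uid, following_uid))

    def unfollow_user(self, uid: str, following_uid: str) -> None:  # DELETE
        self._follows.discard((uid, following_uid))

    def get_followers(self, uid: str) -> set[str]:  # GET
        # Linear scan for the sketch; the real followers table would be indexed.
        return {follower for follower, followee in self._follows if followee == uid}

    def search_tweet(self, query: str) -> list[Tweet]:  # GET
        q = query.lower()  # naive case-insensitive substring match
        return [t for t in self._tweets.values() if q in t.content.lower()]

    def get_tweet_feed(self, uid: str, limit: int = 20) -> list[Tweet]:  # GET
        followees = {followee for follower, followee in self._follows if follower == uid}
        posts = [t for t in self._tweets.values() if t.uid in followees]
        return sorted(posts, key=lambda t: t.tweet_id, reverse=True)[:limit]
```

Making `follow_user` idempotent (adding to a set) matches the choice of PUT: repeating the request leaves the same state.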
Database design
- Followers table (indexed on uid): uid | followingUid
- Posts table: uid | tweetid | content | mediaLink (link to blob storage)
- Likes & retweets table (counts per tweet): tweetid | likes | retweets
- User profile table: uid | email | age
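The tables above can be sketched in SQLite with Python's standard `sqlite3` module. This is an illustrative schema, not a production one. The engagement table is keyed by `tweet_id` here on the assumption that like and retweet counts belong to a tweet. The `followers` primary key `(uid, following_uid)` doubles as the index on `uid` that the feed query relies on. Tweet ids are assumed to increase over time.

```python
"""Sketch of the database tables in SQLite (stdlib sqlite3)."""
import sqlite3

SCHEMA = """
CREATE TABLE users (
    uid   INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    age   INTEGER
);
CREATE TABLE followers (
    uid           INTEGER NOT NULL REFERENCES users(uid),  -- the follower
    following_uid INTEGER NOT NULL REFERENCES users(uid),  -- the account being followed
    PRIMARY KEY (uid, following_uid)                        -- also indexes lookups by uid
);
CREATE TABLE posts (
    tweet_id   INTEGER PRIMARY KEY,  -- assumed to increase over time
    uid        INTEGER NOT NULL REFERENCES users(uid),
    content    TEXT NOT NULL,
    media_link TEXT                  -- NULL for text-only tweets
);
CREATE INDEX posts_by_author ON posts (uid, tweet_id);
CREATE TABLE engagement (
    tweet_id INTEGER PRIMARY KEY REFERENCES posts(tweet_id),
    likes    INTEGER NOT NULL DEFAULT 0,
    retweets INTEGER NOT NULL DEFAULT 0
);
"""

# Latest posts from everyone a user follows, newest first.
FEED_QUERY = """
SELECT p.tweet_id, p.uid, p.content
FROM followers AS f
JOIN posts AS p ON p.uid = f.following_uid
WHERE f.uid = ?
ORDER BY p.tweet_id DESC
LIMIT ?
"""


def connect() -> sqlite3.Connection:
    """Open an in-memory database with the schema applied."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

Usage: `connect().execute(FEED_QUERY, (uid, 20)).fetchall()` returns `(tweet_id, uid, content)` rows for the 20 newest posts by accounts `uid` follows.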
High-level design
Once an hour, a ranking algorithm (still to be decided) finds the posts with the highest engagement and loads them into a shared cache, so users can pull these popular posts into their own feeds. We also cache each user's list of the accounts they follow, backed by an index on uid in the followers table. That way we can quickly look up who a user follows and pull the latest posts from those accounts.
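Here is a minimal sketch of that read path, assuming in-memory dictionaries stand in for the caches. The engagement formula (likes plus twice the retweets) is a hypothetical placeholder, since the design has not fixed one. `heapq.nlargest` picks the top-k posts for the hourly refresh, and the feed merges followees' latest posts with the trending cache. Trending posts that already appear in the feed are skipped, and the result is ordered newest first by tweet id.

```python
"""Sketch: an hourly trending cache merged with followees' latest posts."""
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    tweet_id: int  # assumed to increase over time, so a higher id means newer
    uid: str       # author
    likes: int
    retweets: int


def engagement(post: Post) -> int:
    # Hypothetical score; the design does not fix a formula yet.
    return post.likes + 2 * post.retweets


def refresh_trending(recent_posts: list[Post], k: int) -> list[Post]:
    """Run once an hour: keep the k highest-engagement posts for the shared cache."""
    return heapq.nlargest(k, recent_posts, key=engagement)


def build_feed(
    uid: str,
    following: dict[str, set[str]],         # cached followee list per user
    latest_by_user: dict[str, list[Post]],  # latest posts per author
    trending: list[Post],                   # output of refresh_trending
    limit: int = 20,
) -> list[Post]:
    """Merge followees' latest posts with the trending cache, newest first."""
    candidates: dict[int, Post] = {}
    for author in following.get(uid, set()):
        for post in latest_by_user.get(author, []):
            candidates[post.tweet_id] = post
    for post in trending:
        candidates.setdefault(post.tweet_id, post)  # skip posts already in the feed
    return sorted(candidates.values(), key=lambda p: p.tweet_id, reverse=True)[:limit]
```

Because `trending` is shared by every user and refreshed only hourly, it is computed once and read many times, which fits the read-heavy workload.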
For popular users with many followers, we can also cache their posts and serve them to followers from that cache. Because these accounts have so many followers, caching their posts saves a lot of time when those followers log on and pull their feeds.
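One common way to realize this is a hybrid fan-out. This is our interpretation of the paragraph above, not something the design spells out. Posts from ordinary users are pushed into each follower's cached timeline at write time. Posts from popular accounts are written once to a per-author cache and merged in when a follower reads their feed. The follower-count threshold and timeline cap below are hypothetical parameters, and in-memory structures stand in for a real cache such as Redis.

```python
"""Sketch of a hybrid fan-out: push for ordinary users, cache-and-pull for popular ones."""
from collections import defaultdict, deque


class FeedCache:
    def __init__(self, celebrity_threshold: int = 10_000, timeline_size: int = 800) -> None:
        # Both defaults are hypothetical tuning knobs, not values from the design.
        self.celebrity_threshold = celebrity_threshold
        self.followers: dict[str, set[str]] = defaultdict(set)  # author -> followers
        self.following: dict[str, set[str]] = defaultdict(set)  # user -> authors followed
        # Per-user timeline of pushed tweet ids, newest on the left; old ids fall off the right.
        self.timelines: dict[str, deque] = defaultdict(lambda: deque(maxlen=timeline_size))
        # Per-author cache of recent post ids for popular accounts.
        self.popular_posts: dict[str, deque] = defaultdict(lambda: deque(maxlen=timeline_size))

    def follow(self, uid: str, author: str) -> None:
        self.followers[author].add(uid)
        self.following[uid].add(author)

    def is_popular(self, author: str) -> bool:
        return len(self.followers[author]) >= self.celebrity_threshold

    def on_post(self, author: str, tweet_id: int) -> None:
        if self.is_popular(author):
            # One cache write instead of one write per follower.
            self.popular_posts[author].appendleft(tweet_id)
        else:
            # Fan-out on write: push the id into every follower's timeline.
            for follower in self.followers[author]:
                self.timelines[follower].appendleft(tweet_id)

    def read_feed(self, uid: str, limit: int = 20) -> list[int]:
        ids = set(self.timelines[uid])
        # Fan-out on read: merge in cached posts from popular accounts this user follows.
        for author in self.following[uid]:
            if self.is_popular(author):
                ids.update(self.popular_posts[author])
        # Tweet ids are assumed to increase over time, so descending order is newest first.
        return sorted(ids, reverse=True)[:limit]
```

The push path keeps reads cheap for the common case. The pull path avoids writing a popular account's post into millions of timelines at once, which would otherwise be the write bottleneck.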
Request flows
Detailed component design
Trade offs/Tech choices
The main trade-off is fast reads versus fast writes. We optimize for reads, since most of our users read tweets rather than create them. It's acceptable if a new tweet isn't shown to every follower immediately after it is posted.
Failure scenarios/bottlenecks
Future improvements