System requirements
Functional:
Create tweet
Update tweet
Delete tweet
Follow / Un-follow user
Retweet
Get list of recent tweets
Be able to directly message another user
Create an account
Non-Functional:
The system should be highly available with minimal latency
The list of recent tweets should load quickly when the user opens the website
Extended requirements:
Collect some metrics and let users like posts
Users should receive a notification when a new tweet is available
Capacity estimation
Number of daily users 2M
Ratio read / write : 100 / 1
Read requests: 200M per day, i.e. 200M / (24 * 3600) ≈ 2,315 per second
Write requests: 2M per day, i.e. 2M / (24 * 3600) ≈ 23 per second
Storage required:
One tweet in 100 will contain media such as an image or video, averaging 1MB. With 2M tweets per day that is 20,000 media tweets, so we need about 20GB of storage per day.
The other 99 tweets in 100 will contain at most 240 characters, around 1KB of data each, so 1.98M tweets * 1KB = 1,980MB ≈ 2GB of storage per day. Total: about 22GB per day.
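The arithmetic behind these estimates can be checked with a short script. This is a minimal sketch that assumes one tweet per daily user (2M writes per day) and decimal units (1GB = 1,000MB); the variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 3600  # 86,400 seconds in a day

daily_writes = 2_000_000     # assumption: one tweet per daily user
read_write_ratio = 100       # 100 reads for every write
daily_reads = daily_writes * read_write_ratio

reads_per_sec = daily_reads / SECONDS_PER_DAY
writes_per_sec = daily_writes / SECONDS_PER_DAY

media_tweets = daily_writes // 100           # 1 tweet in 100 carries media
text_tweets = daily_writes - media_tweets    # the remaining 99 in 100

media_gb = media_tweets * 1 / 1_000          # 1MB each, converted to GB
text_gb = text_tweets * 1 / 1_000_000        # 1KB each, converted to GB

print(f"reads per second:  {reads_per_sec:.0f}")
print(f"writes per second: {writes_per_sec:.0f}")
print(f"media storage per day: {media_gb:.2f} GB")
print(f"text storage per day:  {text_gb:.2f} GB")
print(f"total storage per day: {media_gb + text_gb:.2f} GB")
```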
API design
createTweet(userId: string, tweet: char[]): TweetId
updateTweet(userId: string, tweetId: string, tweet: char[]): TweetId
deleteTweet(userId: string, tweetId: string): TweetId
followUser(userId: string): boolean with true if follow is a success
unFollowUser(userId: string): boolean with true if unfollow is a success
retweet(tweetId): TweetId of new tweet
like(tweetId): boolean with true if the like is a success
getRecentTweets(userId: string): Tweets[]
message(userId: string, content: string): boolean with true if the message is sent successfully
createAccount(userPseudo:string, password:string)
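To make the tweet endpoints concrete, here is a minimal in-memory sketch of create, update and delete. The class `InMemoryTweetApi`, its error handling, and the ownership check are assumptions for illustration, not part of the design above; updating needs to know which tweet to change, so `update_tweet` takes a tweet id.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Tweet:
    id: str
    user_id: str
    content: str


class InMemoryTweetApi:
    """Hypothetical in-memory stand-in for the tweet endpoints."""

    MAX_LENGTH = 240  # character limit used in the capacity estimate

    def __init__(self) -> None:
        self._tweets: dict[str, Tweet] = {}
        self._ids = itertools.count(1)

    def create_tweet(self, user_id: str, content: str) -> str:
        self._check_length(content)
        tweet_id = str(next(self._ids))
        self._tweets[tweet_id] = Tweet(tweet_id, user_id, content)
        return tweet_id

    def update_tweet(self, user_id: str, tweet_id: str, content: str) -> str:
        self._check_length(content)
        tweet = self._owned_tweet(user_id, tweet_id)
        tweet.content = content
        return tweet_id

    def delete_tweet(self, user_id: str, tweet_id: str) -> str:
        self._owned_tweet(user_id, tweet_id)
        del self._tweets[tweet_id]
        return tweet_id

    def _check_length(self, content: str) -> None:
        if len(content) > self.MAX_LENGTH:
            raise ValueError("tweet exceeds the maximum length")

    def _owned_tweet(self, user_id: str, tweet_id: str) -> Tweet:
        # Only the author may update or delete a tweet (an assumption).
        tweet = self._tweets.get(tweet_id)
        if tweet is None:
            raise KeyError(tweet_id)
        if tweet.user_id != user_id:
            raise PermissionError("only the author can modify this tweet")
        return tweet
```

In a real service the caller's identity would more likely come from an authenticated session than from a `userId` argument.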
Database design
Using a relational database
tweet
id : string primary key
userId: string foreign key
content: string
likeCount: int
createdAt: string
user
id : string primary key
email:string
username : string
password: string
following
followedId: string foreign key on user id
followerId: string foreign key on user id
timestamp:string
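The three tables can be sketched as DDL. The example below uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the constraints (NOT NULL, UNIQUE, the composite primary key) and the index on tweet author are assumptions added for illustration.

```python
import sqlite3

SCHEMA = '''
CREATE TABLE "user" (
    id       TEXT PRIMARY KEY,
    email    TEXT NOT NULL UNIQUE,
    username TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL            -- assumption: a hash, not the plain text
);
CREATE TABLE tweet (
    id        TEXT PRIMARY KEY,
    userId    TEXT NOT NULL REFERENCES "user"(id),
    content   TEXT NOT NULL,
    likeCount INTEGER NOT NULL DEFAULT 0,
    createdAt TEXT NOT NULL
);
CREATE TABLE "following" (
    followedId TEXT NOT NULL REFERENCES "user"(id),
    followerId TEXT NOT NULL REFERENCES "user"(id),
    timestamp  TEXT NOT NULL,
    PRIMARY KEY (followerId, followedId)  -- one row per follow relation
);
-- Assumed index: the feed query reads tweets by author, newest first.
CREATE INDEX tweet_by_author ON tweet (userId, createdAt);
'''

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.executescript(SCHEMA)

conn.execute(
    'INSERT INTO "user" VALUES (?, ?, ?, ?)',
    ("u1", "alice@example.com", "alice", "<password-hash>"),
)
conn.execute(
    "INSERT INTO tweet (id, userId, content, createdAt) VALUES (?, ?, ?, ?)",
    ("t1", "u1", "hello world", "2024-01-01T00:00:00Z"),
)
conn.commit()
```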
High-level design
The news feed will be rendered using a pull model.
When a user loads the main page, the system requests the recent tweets of the users they follow.
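A pull-model read can be sketched as follows: at request time, gather the tweets of every followed user and keep the most recent ones. The in-memory dictionaries stand in for the Following and Tweet services, and the names `get_recent_tweets`, `following`, and `tweets_by_user` are assumptions for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    id: str
    user_id: str
    content: str
    created_at: float  # seconds since the epoch


def get_recent_tweets(
    user_id: str,
    following: dict[str, set[str]],
    tweets_by_user: dict[str, list[Tweet]],
    limit: int = 20,
) -> list[Tweet]:
    """Build the feed on demand from the tweets of followed users."""
    followed = following.get(user_id, set())
    candidates = (
        tweet
        for author in followed
        for tweet in tweets_by_user.get(author, [])
    )
    # Keep only the `limit` newest tweets, newest first.
    return heapq.nlargest(limit, candidates, key=lambda t: t.created_at)


following = {"alice": {"bob", "carol"}}
tweets_by_user = {
    "bob": [Tweet("t1", "bob", "first", 100.0), Tweet("t3", "bob", "third", 300.0)],
    "carol": [Tweet("t2", "carol", "second", 200.0)],
    "dave": [Tweet("t4", "dave", "not followed", 400.0)],
}
feed = get_recent_tweets("alice", following, tweets_by_user, limit=2)
print([tweet.id for tweet in feed])
```

The cost of this read grows with the number of followed users, which is the limitation of the pull model.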
I will use a microservice-based architecture:
A User service
Following service
Tweet service
Home feed service
Request flows
Starting from the frontend client, a request goes to a load balancer, which forwards it to an application server, which in turn routes it to the appropriate microservice. A CDN will serve static files such as images and videos, which are stored on Amazon S3.
The different services will store their data in a MySQL database that is replicated horizontally.
A Redis cache will store the top 20% of tweets; the User and Tweet services will write data to it.
Detailed component design
The home feed service will use a ranking algorithm (for example, sorting by recency and like count) to select the most relevant tweets to display on the user's homepage.
This could also be precomputed to avoid latency when the user loads the homepage.
The home feed service will read from the Redis cache. On a cache hit, the data is returned directly. On a miss, the service queries the database, and media files are fetched from S3 if needed.
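The read path described here is the cache-aside pattern. Below is a minimal sketch in which a plain dictionary stands in for Redis and a callback stands in for the database query; the class `HomeFeedReader`, the key format `feed:<userId>`, and the absence of expiry are assumptions for illustration.

```python
from typing import Callable


class HomeFeedReader:
    """Cache-aside read: try the cache first, fall back to the database."""

    def __init__(
        self,
        cache: dict[str, list[str]],
        load_from_db: Callable[[str], list[str]],
    ) -> None:
        self.cache = cache                # stand-in for Redis
        self.load_from_db = load_from_db  # stand-in for the MySQL query

    def get_feed(self, user_id: str) -> list[str]:
        key = f"feed:{user_id}"
        cached = self.cache.get(key)
        if cached is not None:
            return cached                 # cache hit
        feed = self.load_from_db(user_id)  # cache miss: query the database
        self.cache[key] = feed            # populate the cache for next time
        return feed


db_calls: list[str] = []


def fake_db_query(user_id: str) -> list[str]:
    # Hypothetical database lookup that records how often it is called.
    db_calls.append(user_id)
    return ["t3", "t2", "t1"]


reader = HomeFeedReader(cache={}, load_from_db=fake_db_query)
first = reader.get_feed("alice")   # miss: goes to the database
second = reader.get_feed("alice")  # hit: served from the cache
print(first == second, len(db_calls))
```

A real deployment would also set an expiry on cached feeds and invalidate or update them when followed users post.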
Trade offs/Tech choices
The database is MySQL because the data model is relational.
The pull model used to render the home page has limitations, such as the time it takes to fetch all tweets at read time.
This can be improved with a hybrid model that switches between pull and push: for users who follow many people, the feed is pushed (precomputed) to reduce the number of database calls.
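One way to picture the hybrid model: readers who follow many accounts get a precomputed inbox that is filled when a tweet is published (push), while everyone else has their feed assembled at read time (pull). This is a sketch under assumptions; the class `HybridFeed`, the threshold value, and the in-memory structures are illustrative and not part of the design above.

```python
from collections import defaultdict


class HybridFeed:
    def __init__(self, following: dict[str, set[str]], push_threshold: int = 2) -> None:
        self.following = following            # reader -> authors they follow
        self.push_threshold = push_threshold  # hypothetical cut-off
        self.timelines: dict[str, list[tuple[float, str]]] = defaultdict(list)
        self.inboxes: dict[str, list[tuple[float, str]]] = defaultdict(list)

    def uses_push(self, reader: str) -> bool:
        # Readers who follow many accounts would trigger many lookups on pull.
        return len(self.following.get(reader, ())) >= self.push_threshold

    def publish(self, author: str, tweet_id: str, created_at: float) -> None:
        entry = (created_at, tweet_id)
        self.timelines[author].append(entry)
        # A real system would use a follower index instead of scanning everyone.
        for reader, followed in self.following.items():
            if author in followed and self.uses_push(reader):
                self.inboxes[reader].append(entry)

    def feed(self, reader: str, limit: int = 20) -> list[str]:
        if self.uses_push(reader):
            entries = self.inboxes[reader]    # precomputed at publish time
        else:
            entries = [                       # assembled at read time
                entry
                for author in self.following.get(reader, ())
                for entry in self.timelines[author]
            ]
        return [tweet_id for _, tweet_id in sorted(entries, reverse=True)[:limit]]


hybrid = HybridFeed({"heavy": {"a", "b", "c"}, "light": {"a"}})
hybrid.publish("a", "t1", 1.0)
hybrid.publish("b", "t2", 2.0)
print(hybrid.feed("heavy"), hybrid.feed("light"))
```

The trade-off is extra write work and storage for the precomputed inboxes in exchange for a single lookup at read time.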
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?