System requirements


Functional:

Create tweet

Update tweet

Delete tweet

Follow / unfollow user

Retweet

Get list of recent tweets

Send a direct message to another user

Create an account


Non-Functional:

The system should be highly available with minimal latency

The list of recent tweets should be displayed quickly when the user opens the website


Extended requirements:

Collect some metrics and let users like posts

Users should be notified when new tweets are available



Capacity estimation

Number of daily active users: 2M

Read / write ratio: 100 / 1

Read requests: 200M per day, i.e. 200M / (24 * 3600) ≈ 2,300 per second

Write requests: 2M per day, i.e. 2M / (24 * 3600) ≈ 23 per second

Storage required:

One tweet in 100 will contain media such as an image or a video, with an average size of 1MB. With 2M tweets per day, that is 20,000 media tweets * 1MB = 20GB of new storage per day.

The other 99 tweets in 100 are text only, with a maximum of 240 characters. Counting around 1KB per tweet, that is 1.98M tweets * 1KB = 1,980MB ≈ 2GB per day, so about 22GB per day in total.
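The arithmetic above can be checked with a short script. This is only a sketch: the constant names are illustrative, and it assumes one tweet per write request and decimal units (1MB = 1,000,000 bytes).

```typescript
// Inputs taken from the capacity estimation above.
const SECONDS_PER_DAY = 24 * 3600; // 86,400
const writesPerDay = 2_000_000;
const readWriteRatio = 100;
const readsPerDay = writesPerDay * readWriteRatio; // 200M

const readsPerSecond = readsPerDay / SECONDS_PER_DAY;   // about 2,315
const writesPerSecond = writesPerDay / SECONDS_PER_DAY; // about 23

// Storage: one tweet in 100 carries ~1MB of media, the rest ~1KB of text.
const mediaTweetsPerDay = writesPerDay / 100;
const textTweetsPerDay = writesPerDay - mediaTweetsPerDay;
const mediaBytesPerDay = mediaTweetsPerDay * 1_000_000; // 20GB
const textBytesPerDay = textTweetsPerDay * 1_000;       // 1.98GB
const totalGbPerDay = (mediaBytesPerDay + textBytesPerDay) / 1_000_000_000;
```

Multiplying `totalGbPerDay` by a retention period gives a rough long-term storage figure, for example about 8TB per year before replication.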




API design

createTweet(userId: string, tweet: char[]): TweetId

updateTweet(userId: string, tweetId: string, tweet: char[]): TweetId

deleteTweet(userId: string, tweetId: string): TweetId

followUser(userId: string, followedId: string): boolean, true if the follow succeeded

unFollowUser(userId: string, followedId: string): boolean, true if the unfollow succeeded

retweet(userId: string, tweetId: string): TweetId of the new tweet

like(userId: string, tweetId: string): boolean, true if the like succeeded

getRecentTweets(userId: string): Tweet[]

message(senderId: string, recipientId: string, content: string): boolean, true if the message was sent

createAccount(userPseudo: string, password: string)
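To make the tweet endpoints concrete, here is a minimal in-memory sketch. It assumes the caller's id is always passed explicitly and that update and delete take the id of the tweet to change; the class name `InMemoryTweetApi` and the helper methods are hypothetical, not part of the design.

```typescript
type TweetId = string;
type UserId = string;

interface Tweet {
  id: TweetId;
  userId: UserId;
  content: string;
}

// Hypothetical in-memory implementation of the tweet endpoints.
class InMemoryTweetApi {
  private tweets = new Map<TweetId, Tweet>();
  private nextId = 1;

  createTweet(userId: UserId, content: string): TweetId {
    InMemoryTweetApi.validate(content);
    const id = String(this.nextId++);
    this.tweets.set(id, { id, userId, content });
    return id;
  }

  updateTweet(userId: UserId, tweetId: TweetId, content: string): TweetId {
    InMemoryTweetApi.validate(content);
    this.ownedTweet(userId, tweetId).content = content;
    return tweetId;
  }

  deleteTweet(userId: UserId, tweetId: TweetId): TweetId {
    this.ownedTweet(userId, tweetId); // throws if the caller is not the author
    this.tweets.delete(tweetId);
    return tweetId;
  }

  getTweet(tweetId: TweetId): Tweet | undefined {
    return this.tweets.get(tweetId);
  }

  // Only the author may update or delete a tweet.
  private ownedTweet(userId: UserId, tweetId: TweetId): Tweet {
    const tweet = this.tweets.get(tweetId);
    if (tweet === undefined || tweet.userId !== userId) {
      throw new Error("tweet not found for this user");
    }
    return tweet;
  }

  // 240 characters is the limit used in the capacity estimation.
  private static validate(content: string): void {
    if (content.length === 0 || content.length > 240) {
      throw new Error("tweet must contain 1 to 240 characters");
    }
  }
}
```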



Database design


Using a relational database


tweet

id: string primary key

userId: string foreign key

content: string

likeCount: int

createdAt: string


user

id: string primary key

email: string

username: string

password: string (stored as a salted hash, never in plain text)


following

followedId: string foreign key on user id

followerId: string foreign key on user id

timestamp: string
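The following table is the one with a non-obvious constraint: a given (followerId, followedId) pair should exist at most once. The sketch below mirrors the three tables as TypeScript row types and uses a Map in place of MySQL to show that constraint. The class `FollowingTable` and the field name `passwordHash` are illustrative assumptions.

```typescript
interface UserRow { id: string; email: string; username: string; passwordHash: string; }
interface TweetRow { id: string; userId: string; content: string; likeCount: number; createdAt: string; }
interface FollowingRow { followedId: string; followerId: string; timestamp: string; }

// Hypothetical stand-in for the "following" table, keyed on the
// (followerId, followedId) pair so that a follow can only exist once.
class FollowingTable {
  private rows = new Map<string, FollowingRow>();

  follow(followerId: string, followedId: string, timestamp: string): boolean {
    if (followerId === followedId) return false; // no self-follow
    const key = FollowingTable.key(followerId, followedId);
    if (this.rows.has(key)) return false; // already following
    this.rows.set(key, { followerId, followedId, timestamp });
    return true;
  }

  unfollow(followerId: string, followedId: string): boolean {
    return this.rows.delete(FollowingTable.key(followerId, followedId));
  }

  // Ids of the users that followerId follows.
  followedBy(followerId: string): string[] {
    const result: string[] = [];
    this.rows.forEach((row) => {
      if (row.followerId === followerId) result.push(row.followedId);
    });
    return result;
  }

  private static key(followerId: string, followedId: string): string {
    return JSON.stringify([followerId, followedId]);
  }
}
```

In MySQL the same rule would typically be expressed as a composite primary key or unique index on the two columns.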




High-level design



The news feed will be rendered using a pull model.

When a user loads the main page, the system requests the recent tweets of the users they follow.
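A minimal sketch of the pull model, assuming in-memory collections in place of the Following and Tweet services. The function name `buildHomeFeed`, the numeric `createdAt` and the default page size of 20 are illustrative choices.

```typescript
interface Tweet {
  id: string;
  userId: string;
  content: string;
  createdAt: number; // epoch milliseconds (assumption, for easy sorting)
}

// Pull model: the feed is computed at read time from the tweets of
// the users that userId follows, newest first.
function buildHomeFeed(
  userId: string,
  following: Map<string, Set<string>>, // follower id -> followed ids
  tweets: Tweet[],
  limit: number = 20
): Tweet[] {
  const followed = following.get(userId) ?? new Set<string>();
  return tweets
    .filter((tweet) => followed.has(tweet.userId)) // filter() copies, so the input is not reordered
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(0, limit);
}
```

The cost of this approach grows with the number of followed users, which is the limitation discussed in the trade-offs section.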


I will use a microservice-based architecture with the following services:

User service

Following service

Tweet service

Home feed service

Message service

Like service





Request flows



Starting from the frontend client, a request goes to the load balancers, which distribute it across the servers; each server then routes it to the relevant microservice. A CDN will serve static files such as images and videos, which will be stored on Amazon S3.
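The text does not say how the load balancers choose a server. As one possibility, the sketch below assumes plain round-robin distribution; the class name `RoundRobinBalancer` and the server names are hypothetical.

```typescript
// Hands out servers in a fixed rotation so that each one receives
// the same share of requests.
class RoundRobinBalancer {
  private servers: string[];
  private next = 0;

  constructor(servers: string[]) {
    if (servers.length === 0) throw new Error("at least one server is required");
    this.servers = servers.slice();
  }

  pick(): string {
    const server = this.servers[this.next];
    this.next = (this.next + 1) % this.servers.length;
    return server;
  }
}
```

Round robin ignores server load and health; a production balancer would usually combine it with health checks.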


The different services will store data in a MySQL database that will be replicated and scaled horizontally.

A Redis cache will store the top 20% of tweets (the most frequently read ones); the User and Tweet services will write data into it.



Detailed component design



The home feed service will rank candidate tweets, for example by recency and like count, to pick the most relevant tweets to display on the user's homepage.

This ranking could also be precomputed to avoid latency when the user loads the homepage.
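As an illustration of how relevance could be computed, the sketch below scores tweets by like count with a decay on age. The formula, its constants and the function names are assumptions chosen for the example, not a prescribed ranking.

```typescript
interface RankedTweet {
  id: string;
  likeCount: number;
  createdAt: number; // epoch milliseconds
}

// Illustrative score: more likes raise it, age lowers it.
function relevanceScore(tweet: RankedTweet, now: number): number {
  const ageHours = Math.max(0, (now - tweet.createdAt) / 3_600_000);
  return (1 + tweet.likeCount) / Math.pow(ageHours + 2, 1.5);
}

// Returns the `limit` highest-scoring tweets without modifying the input.
function rankTweets(tweets: RankedTweet[], now: number, limit: number = 20): RankedTweet[] {
  return tweets
    .slice()
    .sort((a, b) => relevanceScore(b, now) - relevanceScore(a, now))
    .slice(0, limit);
}
```

Precomputing the feed would amount to running `rankTweets` in the background and storing its result per user.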


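The home feed service will first read from the Redis cache. On a cache hit, the data is returned directly. Otherwise, the service queries the database and stores the result in the cache. Media are not stored in the database: tweets reference static files on S3, which the client fetches through the CDN.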
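This read path is the cache-aside pattern. The sketch below uses a Map with expiry in place of Redis and a callback in place of the MySQL query; `TtlCache`, `getHomeFeed`, the `feed:` key prefix and the TTL are all illustrative assumptions.

```typescript
interface FeedTweet {
  id: string;
  content: string;
  mediaUrl?: string; // link to the file on S3/CDN, not the file itself
}

// Hypothetical stand-in for Redis: values expire after ttlMs.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Cache-aside: try the cache, fall back to the database on a miss,
// then populate the cache for the next reader.
function getHomeFeed(
  userId: string,
  cache: TtlCache<FeedTweet[]>,
  loadFromDb: (userId: string) => FeedTweet[]
): FeedTweet[] {
  const key = `feed:${userId}`;
  const cached = cache.get(key);
  if (cached !== undefined) return cached;
  const fresh = loadFromDb(userId);
  cache.set(key, fresh);
  return fresh;
}
```

A short TTL bounds how stale a cached feed can get, at the cost of more database reads.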



Trade offs/Tech choices



The database is MySQL because the data model is relational.

The pull model used to render the home page has limitations: fetching the tweets of every followed user at read time can be slow.

This can be improved with a hybrid model that switches between pull and push depending on how many people the user follows, to reduce the number of database calls.
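One way to read the hybrid idea: users who follow more people than some threshold get a timeline that is filled at write time (push), while everyone else keeps the pull model. The sketch below makes that assumption explicit; `HybridFeed` and `pushThreshold` are hypothetical names, and the structures are in-memory stand-ins for the services.

```typescript
interface Tweet { id: string; userId: string; createdAt: number; }

class HybridFeed {
  private following = new Map<string, Set<string>>(); // follower -> followed
  private followers = new Map<string, Set<string>>(); // followed -> followers
  private tweetsByAuthor = new Map<string, Tweet[]>();
  private precomputed = new Map<string, Tweet[]>();   // push-mode timelines
  private pushThreshold: number;

  constructor(pushThreshold: number) { this.pushThreshold = pushThreshold; }

  follow(followerId: string, followedId: string): void {
    HybridFeed.addTo(this.following, followerId, followedId);
    HybridFeed.addTo(this.followers, followedId, followerId);
  }

  // Push mode applies to users who follow more than pushThreshold people.
  usesPush(userId: string): boolean {
    const followed = this.following.get(userId);
    return followed !== undefined && followed.size > this.pushThreshold;
  }

  postTweet(tweet: Tweet): void {
    const authored = this.tweetsByAuthor.get(tweet.userId) ?? [];
    authored.push(tweet);
    this.tweetsByAuthor.set(tweet.userId, authored);
    const followers = this.followers.get(tweet.userId);
    if (followers === undefined) return;
    followers.forEach((followerId) => {
      if (!this.usesPush(followerId)) return; // pull-mode users read at load time
      const timeline = this.precomputed.get(followerId) ?? [];
      timeline.push(tweet);
      this.precomputed.set(followerId, timeline);
    });
  }

  // Simplification: a real system would backfill the timeline when a
  // user crosses the threshold; this sketch does not.
  homeFeed(userId: string, limit: number = 20): Tweet[] {
    let candidates: Tweet[] = [];
    if (this.usesPush(userId)) {
      candidates = (this.precomputed.get(userId) ?? []).slice();
    } else {
      const followed = this.following.get(userId);
      if (followed !== undefined) {
        followed.forEach((authorId) => {
          candidates = candidates.concat(this.tweetsByAuthor.get(authorId) ?? []);
        });
      }
    }
    return candidates.sort((a, b) => b.createdAt - a.createdAt).slice(0, limit);
  }

  private static addTo(index: Map<string, Set<string>>, key: string, value: string): void {
    const set = index.get(key) ?? new Set<string>();
    set.add(value);
    index.set(key, set);
  }
}
```

Push mode trades extra writes per tweet for a single read per page load.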



Failure scenarios/bottlenecks



If a tweet becomes very popular, it can put pressure on the Redis server that holds it, so we could add a dedicated hot zone (for example, extra replicas or a local cache for hot keys) to absorb the reads.
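One possible form of such a hot zone is a small in-process cache in front of Redis that only holds tweets read more than a given number of times. The sketch below assumes that interpretation; `HotTweetCache`, `hotThreshold` and the `fetchFromRedis` callback are hypothetical.

```typescript
// Promotes a tweet to a local cache once it has been read hotThreshold
// times, so that further reads no longer reach Redis.
class HotTweetCache {
  private readCounts = new Map<string, number>();
  private local = new Map<string, string>();
  private hotThreshold: number;
  private fetchFromRedis: (tweetId: string) => string;

  constructor(hotThreshold: number, fetchFromRedis: (tweetId: string) => string) {
    this.hotThreshold = hotThreshold;
    this.fetchFromRedis = fetchFromRedis;
  }

  get(tweetId: string): string {
    const hot = this.local.get(tweetId);
    if (hot !== undefined) return hot; // served without touching Redis

    const value = this.fetchFromRedis(tweetId);
    const count = (this.readCounts.get(tweetId) ?? 0) + 1;
    this.readCounts.set(tweetId, count);
    if (count >= this.hotThreshold) this.local.set(tweetId, value);
    return value;
  }
}
```

The sketch omits eviction and invalidation: a real version would expire local entries and reset the counters periodically, so that edited or no-longer-hot tweets do not stay pinned.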



Future improvements



We could manage disaster recovery and service availability with a multi-region active-active strategy. This involves deploying service clusters and database clusters in multiple locations, with automatic failover and load balancing between them.
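A minimal sketch of the failover decision, assuming each region reports a health flag and a measured latency. The `Region` shape, the `pickRegion` function and the region names in the usage are illustrative, not tied to any cloud provider's API.

```typescript
interface Region {
  name: string;
  healthy: boolean;
  latencyMs: number;
}

// Active-active routing: send the request to the healthy region with
// the lowest latency; unhealthy regions are skipped automatically.
function pickRegion(regions: Region[]): Region | undefined {
  const healthy = regions.filter((region) => region.healthy);
  if (healthy.length === 0) return undefined; // total outage
  return healthy.reduce((best, region) =>
    region.latencyMs < best.latencyMs ? region : best
  );
}
```

When the closest region is marked unhealthy, the next call to `pickRegion` returns the next-best region, which is the automatic failover described above.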