System requirements
Functional:
User can post a message of up to 140 characters
User can follow other users
User can like other users' tweets
User home feed shows tweets from the users they follow
User home feed also shows the top K popular tweets
Non-Functional:
scalability
response time
consistency requirement
security
Capacity estimation
500M DAU
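Each user posts 2 tweets/day: 500M x 2 = 1B new tweets/day
Each user views 100 tweets/day
At peak, scale up to 20% of DAU: 100M concurrent users
1B tweets/day, 140 characters each; each stored message takes about 500 bytes
1B x 500 bytes = 500 GB every day; over two years (730 days) that is about 365 TB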
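The estimate can be checked with a few lines of arithmetic. This is a minimal sketch: the constant names are illustrative, and decimal units (1 GB = 10^9 bytes, 1 TB = 1000 GB) are an assumption. With these inputs, two years of storage comes to about 365 TB.

```python
# Back-of-the-envelope capacity check.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 TB = 1000 GB).
DAU = 500_000_000
TWEETS_PER_USER_PER_DAY = 2
VIEWS_PER_USER_PER_DAY = 100
PEAK_PERCENT = 20
BYTES_PER_TWEET = 500
DAYS_IN_TWO_YEARS = 2 * 365

new_tweets_per_day = DAU * TWEETS_PER_USER_PER_DAY
tweet_views_per_day = DAU * VIEWS_PER_USER_PER_DAY
peak_users = DAU * PEAK_PERCENT // 100
storage_per_day_gb = new_tweets_per_day * BYTES_PER_TWEET / 10**9
storage_two_years_tb = storage_per_day_gb * DAYS_IN_TWO_YEARS / 1000

print(f"new tweets/day:    {new_tweets_per_day:,}")
print(f"tweet views/day:   {tweet_views_per_day:,}")
print(f"peak users:        {peak_users:,}")
print(f"storage/day (GB):  {storage_per_day_gb:,.0f}")
print(f"storage/2y (TB):   {storage_two_years_tb:,.0f}")
```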
The NoSQL DB will store:
tweet document:
tweetid
createdby
postedtime
content
medialink
number of likes
hashtag
users mentioned
user
user_id
name
nickname
DOB
gender
API design
tweet(userid, content)
follow(follower, following)
like(user_id, tweet_id)
homefeed(user_id, offset, number)
Returns a JSON document containing the tweets that should be shown on the user's home feed
offset is the position in the list of top tweets where the returned page starts
When the user opens the app, offset is 0; as the user scrolls down, offset increases
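A minimal sketch of how offset-based paging in homefeed could behave. The in-memory FEEDS store and the response fields next_offset and has_more are assumptions added for illustration; only the homefeed(user_id, offset, number) signature comes from the design.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the precomputed, ranked feed of each user.
FEEDS: Dict[int, List[dict]] = {
    1: [{"tweet_id": i, "content": f"tweet {i}"} for i in range(5)],
}


def homefeed(user_id: int, offset: int, number: int) -> dict:
    """Return one page of the user's feed, starting at `offset`."""
    if offset < 0 or number <= 0:
        raise ValueError("offset must be >= 0 and number must be > 0")
    feed = FEEDS.get(user_id, [])
    page = feed[offset:offset + number]
    next_offset = offset + len(page)
    return {
        "tweets": page,
        "next_offset": next_offset,          # the client sends this on the next scroll
        "has_more": next_offset < len(feed),
    }


first_page = homefeed(1, offset=0, number=2)   # app just opened
second_page = homefeed(1, offset=first_page["next_offset"], number=2)
```

One design note: a plain offset can skip or repeat tweets if the ranked list changes between requests, so a cursor based on tweet_id or posted_time may be worth considering.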
Error handling follows HTTP status codes: 4XX means a client-side error and 5XX means a server-side error. Based on the response code, the client gives the user a different suggestion for resolving the issue.
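A minimal sketch of mapping a response code to a user-facing suggestion. The specific codes chosen and all message texts are assumptions for illustration; the design only fixes the 4XX/5XX split.

```python
# Hypothetical per-code messages; anything not listed falls back to its class.
SPECIFIC_SUGGESTIONS = {
    400: "Check the request: the tweet may be empty or longer than 140 characters.",
    401: "Please sign in again.",
    404: "The user or tweet no longer exists.",
    429: "Too many requests; wait a moment and retry.",
}


def suggestion_for(status: int) -> str:
    """Pick a suggestion based on the HTTP status code of a failed call."""
    if status in SPECIFIC_SUGGESTIONS:
        return SPECIFIC_SUGGESTIONS[status]
    if 400 <= status < 500:
        return "The request could not be processed; please check your input."
    if 500 <= status < 600:
        return "Something went wrong on our side; please try again later."
    return ""  # not an error status
```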
Database design
tweet
tweet_id
created_by: user id
posted_time: indexed
content
medialink
number of likes
User table
user_id
name
nickname
DOB
gender
Hashtag:
hashtag_id
hashtag
tweets
UserMentioned
user_id
array of tweet_id
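The tables above can be sketched as plain Python types, with the Hashtag and UserMentioned tables acting as inverted indexes from a key to a list of tweet ids. This is only an illustration: the field types, the regex used to find hashtags, and the index_tweet helper are assumptions, not part of the design.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Tweet:
    tweet_id: int
    created_by: int              # user_id of the author
    posted_time: float           # indexed, so feeds can be sorted by time
    content: str
    medialink: Optional[str] = None
    number_of_likes: int = 0


@dataclass
class User:
    user_id: int
    name: str
    nickname: str
    dob: str
    gender: str


# Hashtag table: hashtag -> tweet ids. UserMentioned table: user_id -> tweet ids.
hashtag_index: Dict[str, List[int]] = defaultdict(list)
mention_index: Dict[int, List[int]] = defaultdict(list)


def index_tweet(tweet: Tweet, mentioned_user_ids: List[int]) -> None:
    """Record the tweet under each hashtag it contains and each user it mentions."""
    tags = {tag.lower() for tag in re.findall(r"#(\w+)", tweet.content)}
    for tag in tags:
        hashtag_index[tag].append(tweet.tweet_id)
    for user_id in mentioned_user_ids:
        mention_index[user_id].append(tweet.tweet_id)
```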
High-level design
Use a push model for normal users: when a user creates a tweet, the system pushes it to the feeds of all their followers. This minimizes read latency at the cost of more data storage.
The Home Feed Service constructs a list of tweets recommended for the user and returns the list in JSON format.
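The push model (fan-out on write) can be sketched in memory as follows. The dictionaries stand in for the follower store and the per-user feed cache, and FEED_LIMIT is a hypothetical cap on stored feed length; none of these names come from the design beyond the tweet, follow, and homefeed calls.

```python
import itertools
import time
from collections import defaultdict, deque

FEED_LIMIT = 800  # hypothetical cap on how many tweets each feed keeps

followers = defaultdict(set)                           # author id -> follower ids
feeds = defaultdict(lambda: deque(maxlen=FEED_LIMIT))  # user id -> newest-first feed
_next_tweet_id = itertools.count(1)


def follow(follower: int, following: int) -> None:
    followers[following].add(follower)


def tweet(user_id: int, content: str) -> dict:
    """Create a tweet and push it to every follower's feed at write time."""
    if len(content) > 140:
        raise ValueError("tweet exceeds 140 characters")
    new_tweet = {
        "tweet_id": next(_next_tweet_id),
        "created_by": user_id,
        "posted_time": time.time(),
        "content": content,
    }
    for follower_id in followers[user_id]:
        feeds[follower_id].appendleft(new_tweet)  # newest first
    return new_tweet


def homefeed(user_id: int, offset: int, number: int) -> list:
    """Reads are cheap: the feed was already assembled when tweets were written."""
    return list(feeds[user_id])[offset:offset + number]
```

The write cost grows with the follower count, which is why this sketch only suits accounts with a moderate number of followers.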
Request flows
Detailed component design
Load balancer: use multiple LBs working together; if the main LB goes down, the backup takes over
Place LBs in different locations/regions to reduce latency for users far from the servers
Use a managed LB service from a cloud provider such as Azure, which can be set up automatically and avoids manual intervention
Use round-robin or least-connections as the balancing algorithm
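The two balancing algorithms named above can be sketched as follows. The class and method names are illustrative assumptions; a managed cloud LB implements these policies internally.

```python
import itertools
from typing import Dict, Iterable


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, ignoring their current load."""

    def __init__(self, servers: Iterable[str]) -> None:
        self._cycle = itertools.cycle(list(servers))

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Sends each new request to the server with the fewest open connections."""

    def __init__(self, servers: Iterable[str]) -> None:
        self.active: Dict[str, int] = {server: 0 for server in servers}

    def acquire(self) -> str:
        server = min(self.active, key=self.active.get)  # ties: first listed wins
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1
```

Round-robin is simpler and works well when requests cost about the same; least-connections adapts better when some requests hold connections much longer than others.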
Redis: for active users, I will create a feed cache to reduce the latency of fetching the feed.
For the tweet-recommendation system, the Home Feed Service calculates the top K tweets (based on likes, followers, and freshness). Since the recommendations change in real time, the main storage for them should be the cache; Redis can also persist the data.
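A minimal sketch of a top K calculation that combines likes, followers, and freshness. The scoring formula, the 0.5 weight, the 6-hour half-life, and the author_followers field are all assumptions made for illustration; the design does not specify how the three signals are combined. In a real deployment the resulting scores could be kept in the cache rather than recomputed on every read.

```python
import heapq
import math
from typing import List


def score(tweet: dict, now: float, half_life_hours: float = 6.0) -> float:
    """Hypothetical ranking score: popularity decayed by the tweet's age."""
    age_hours = max(0.0, (now - tweet["posted_time"]) / 3600)
    freshness = 0.5 ** (age_hours / half_life_hours)   # halves every half-life
    popularity = math.log1p(tweet["likes"]) + 0.5 * math.log1p(tweet["author_followers"])
    return popularity * freshness


def top_k(tweets: List[dict], k: int, now: float) -> List[dict]:
    """Return the k highest-scoring tweets, best first."""
    return heapq.nlargest(k, tweets, key=lambda t: score(t, now))


now = 100_000.0
candidates = [
    {"tweet_id": 1, "likes": 1000, "author_followers": 100, "posted_time": now},
    {"tweet_id": 2, "likes": 1000, "author_followers": 100, "posted_time": now - 12 * 3600},
    {"tweet_id": 3, "likes": 10, "author_followers": 10, "posted_time": now},
]
best = top_k(candidates, 2, now)
```

Here the 12-hour-old tweet drops out of the top 2 despite its like count, because its freshness factor has decayed to 0.25.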
Trade offs/Tech choices
For the database, I choose NoSQL, since scalability would otherwise be a potential issue.
For the cache, I choose Redis, which supports various data types and horizontal scaling.
Failure scenarios/bottlenecks
For celebrities, the push model causes high latency because each tweet must be written to a very large number of follower feeds, so we can use a pull model for them instead.
The feed database and cache store a table of celebrities' tweets; when a user asks for their feed, the system also pulls tweets from this table and inserts them at the correct positions in the feed list.
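A minimal sketch of the hybrid read path: the pushed feed is merged at read time with tweets pulled from the celebrities the user follows. Ordering by posted_time (newest first) is an assumption about what "correct position" means, and the function and parameter names are illustrative.

```python
import heapq
import itertools
from typing import Dict, Iterable, List


def build_feed(
    pushed_feed: List[dict],
    followed_celebrities: Iterable[int],
    celebrity_tweets: Dict[int, List[dict]],
    number: int,
) -> List[dict]:
    """Merge the precomputed feed with pulled celebrity tweets, newest first.

    Every input list must already be sorted by posted_time, newest first.
    """
    pulled = [celebrity_tweets.get(c, []) for c in followed_celebrities]
    merged = heapq.merge(
        pushed_feed, *pulled, key=lambda t: t["posted_time"], reverse=True
    )
    return list(itertools.islice(merged, number))


pushed = [{"tweet_id": 10, "posted_time": 50}, {"tweet_id": 11, "posted_time": 20}]
celebs = {
    900: [{"tweet_id": 20, "posted_time": 60}, {"tweet_id": 21, "posted_time": 30}],
    901: [{"tweet_id": 30, "posted_time": 40}],
}
feed = build_feed(pushed, [900, 901], celebs, number=4)
```

Because the merge is lazy, only the first `number` tweets are materialized, so the read cost stays proportional to the page size and the number of followed celebrities.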
Future improvements