System requirements


Functional:

User can send a message of up to 140 characters

User can follow other users

User can like other users' tweets

User's home feed shows tweets from the users they are following

User's home feed shows the top k popular tweets
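A minimal sketch of the top k selection, assuming "popular" means the highest like count (the notes do not define popularity, so that ranking key and the dictionary layout are assumptions):

```python
import heapq


def top_k_popular(tweets, k):
    """Return the k tweets with the most likes, most-liked first."""
    return heapq.nlargest(k, tweets, key=lambda t: t["likes"])


# Hypothetical sample data, only for illustration.
sample = [
    {"tweet_id": 1, "likes": 10},
    {"tweet_id": 2, "likes": 250},
    {"tweet_id": 3, "likes": 42},
]
top = top_k_popular(sample, 2)
```

Using a heap keeps the cost at O(n log k) instead of sorting the whole list of candidate tweets.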




Non-Functional:

scalability

response time

consistency requirement

security




Capacity estimation

500M DAU

2 tweets per user per day, 1B new tweets/day

each user views 100 tweets


at peak, scale up to 20% of DAU: 100M users


1B tweets, 140 characters each, each message takes 500 bytes

500GB every day. Over two years that is about 365TB
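The arithmetic behind these numbers can be checked with a short script. This is a sketch using decimal units (1 GB = 10^9 bytes) and the figures stated above; the constant names are my own:

```python
DAU = 500_000_000
TWEETS_PER_USER_PER_DAY = 2
VIEWS_PER_USER = 100
BYTES_PER_TWEET = 500          # 140 characters plus metadata
PEAK_PERCENT = 20

tweets_per_day = DAU * TWEETS_PER_USER_PER_DAY        # new tweets per day
views_per_day = DAU * VIEWS_PER_USER                  # tweet views per day
peak_users = DAU * PEAK_PERCENT // 100                # users online at peak

gb_per_day = tweets_per_day * BYTES_PER_TWEET / 10**9
tb_two_years = gb_per_day * 365 * 2 / 1000

print(f"{tweets_per_day:,} tweets/day, {gb_per_day:.0f} GB/day, "
      f"{tb_two_years:.0f} TB over two years, {peak_users:,} peak users")
```

Media files are not included in this estimate; only the tweet records themselves are counted.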


A NoSQL DB will be used



tweet document:

tweetid

createdby

postedtime

content

medialink

number of likes

hashtag

users mentioned


user

user_id

email

name

nickname

DOB

gender










API design

tweet(userid, content)

follow(follower, following)

like(user_id, tweet_id)

homefeed(user_id, offset, number)

returns a JSON document containing the tweets that should be shown on the user's home feed

offset is the position in the list of top tweets where the page should start

when the user opens the app, the offset is 0; as the user scrolls down, the offset increases
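A minimal sketch of the offset-based paging in homefeed. The in-memory FEEDS dictionary stands in for the real feed store, and the next_offset field in the response is an assumption added for illustration:

```python
import json

# Hypothetical precomputed feeds, keyed by user_id (stand-in for the feed store).
FEEDS = {"u1": [f"tweet-{i}" for i in range(25)]}


def homefeed(user_id, offset, number):
    """Return a JSON page of up to `number` tweets starting at `offset`."""
    feed = FEEDS.get(user_id, [])
    page = feed[offset:offset + number]
    return json.dumps({"tweets": page, "next_offset": offset + len(page)})


# App opens: offset 0. User scrolls: the client sends the next offset.
first = json.loads(homefeed("u1", 0, 10))
second = json.loads(homefeed("u1", first["next_offset"], 10))
```

One known limitation of plain offsets is that new tweets arriving at the top of the feed shift the positions, so a page can repeat items; a cursor based on posted time would avoid that.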


Error handling follows HTTP status codes: 4XX means a client-side error and 5XX means a server-side error. Based on the HTTP response code, the app gives the user different suggestions for resolving the issue.
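A sketch of how the client could map a response code to a suggestion. The status codes are standard HTTP; the message texts and the function name are hypothetical:

```python
# Suggestions for a few specific codes (message wording is illustrative only).
SPECIFIC = {
    401: "Please sign in again.",
    404: "This tweet or user no longer exists.",
    429: "Too many requests. Please wait a moment and retry.",
}


def suggestion_for(status_code):
    """Return a user-facing suggestion, or None when the request succeeded."""
    if status_code in SPECIFIC:
        return SPECIFIC[status_code]
    if 400 <= status_code < 500:
        return "The request was invalid. Please check your input."
    if 500 <= status_code < 600:
        return "Something went wrong on our side. Please try again later."
    return None
```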





Database design

tweet

tweet_id

created_by: user id

posted_time: index

content

medialink

number of likes


User table

user_id

email

name

nickname

DOB

gender


Hashtag:

hashtag_id

hashtag

tweets


UserMentioned

user_id

array of tweet_id
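The records above could be sketched as Python dataclasses. The field names follow the lists above; the types, defaults, and the length check are assumptions:

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timezone
from typing import List, Optional

MAX_TWEET_LENGTH = 140  # from the functional requirements


@dataclass
class Tweet:
    tweet_id: str
    created_by: str                  # user_id of the author
    posted_time: datetime            # indexed, so feeds can be sorted by time
    content: str
    medialink: Optional[str] = None
    number_of_likes: int = 0

    def __post_init__(self):
        # Reject content longer than the 140-character limit.
        if len(self.content) > MAX_TWEET_LENGTH:
            raise ValueError("tweet content exceeds 140 characters")


@dataclass
class User:
    user_id: str
    email: str
    name: str
    nickname: str
    dob: date
    gender: str


@dataclass
class Hashtag:
    hashtag_id: str
    hashtag: str
    tweets: List[str] = field(default_factory=list)     # tweet_ids using this tag


@dataclass
class UserMentioned:
    user_id: str
    tweet_ids: List[str] = field(default_factory=list)  # tweets mentioning the user


t = Tweet("t1", "u1", datetime.now(timezone.utc), "hello #design")
tag = Hashtag("h1", "design", tweets=[t.tweet_id])
```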







High-level design

Use the push model for normal users: when a user creates a tweet, the system pushes it to the feeds of all their followers. This minimizes read latency but needs more data storage.
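A minimal in-memory sketch of the push model (fan-out on write). The dictionaries stand in for the follower store and the feed store, and FEED_LIMIT is a hypothetical cap on feed length:

```python
from collections import defaultdict, deque

FEED_LIMIT = 800  # hypothetical cap on stored feed entries per user

followers = defaultdict(set)                             # author -> follower ids
feeds = defaultdict(lambda: deque(maxlen=FEED_LIMIT))    # user -> newest-first feed


def follow(follower, following):
    followers[following].add(follower)


def tweet(user_id, content):
    """Write the tweet into every follower's feed at post time."""
    for follower_id in followers[user_id]:
        feeds[follower_id].appendleft((user_id, content))


follow("alice", "bob")
follow("carol", "bob")
tweet("bob", "hello")
```

Reading a feed is then a single lookup, which is where the low latency comes from; the cost is one stored copy of each tweet reference per follower.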


For






Request flows







Detailed component design

Load balancer: use multiple LBs working together; if the main LB goes down, the backup can take over

Place LBs in different locations/regions to reduce latency for users far from the server

Use the LB service of a cloud provider such as Azure, which can be set up automatically and avoids manual intervention

Use round-robin or least connections
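The two selection strategies can be sketched as follows. This is illustrative only; a managed LB service implements these internally, and the class names are my own:

```python
import itertools


class RoundRobin:
    """Hand out servers in a fixed rotating order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1     # a new connection is now open
        return server

    def release(self, server):
        self.active[server] -= 1     # the connection has finished


rr = RoundRobin(["a", "b", "c"])
lc = LeastConnections(["a", "b"])
```

Round-robin works well when requests cost about the same; least connections adapts better when some requests hold a connection much longer than others.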


Redis: for active users, I will create a feed cache to reduce latency
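A sketch of the read path with a feed cache (cache-aside). A plain dictionary stands in for Redis here so the example runs without a server, and load_feed_from_db is a hypothetical placeholder for the real database query:

```python
feed_cache = {}              # stand-in for Redis: user_id -> list of tweets
stats = {"db_reads": 0}      # counts how often the database is hit


def load_feed_from_db(user_id):
    """Hypothetical slow path: build the feed from the database."""
    stats["db_reads"] += 1
    return [f"{user_id}-tweet-{i}" for i in range(3)]


def get_feed(user_id):
    """Serve from the cache when possible, otherwise load and cache."""
    if user_id in feed_cache:
        return feed_cache[user_id]
    feed = load_feed_from_db(user_id)
    feed_cache[user_id] = feed
    return feed


get_feed("u1")   # miss: goes to the database
get_feed("u1")   # hit: served from the cache
```

In a real deployment the cached entry would also need an expiry time and an update when new tweets are pushed to the feed.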


Trade offs/Tech choices


For the database, I choose NoSQL since stability would be a potential issue


For the cache, I choose Redis. Redis supports various data types and horizontal scaling.



Failure scenarios/bottlenecks

For celebrities, using the push model will cause high latency, because each tweet must be pushed to a very large number of followers. So we can use the pull model for celebrities.

The feed database and cache will store a celebrity tweets table. When a user asks for their feed, the system will also pull tweets from this table and insert them at the correct positions in the feed list.
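A sketch of the read-time merge, assuming "correct position" means ordering by posted time, newest first, and that both lists are already sorted that way (the field names and integer timestamps are illustrative):

```python
import heapq
import itertools


def merged_feed(pushed, celebrity, limit):
    """Merge two newest-first lists into one newest-first feed of `limit` items."""
    merged = heapq.merge(
        pushed, celebrity, key=lambda t: t["posted_time"], reverse=True
    )
    return list(itertools.islice(merged, limit))


# Hypothetical data: pushed feed entries and pulled celebrity tweets.
pushed = [{"id": "p1", "posted_time": 50}, {"id": "p2", "posted_time": 20}]
celebrity = [{"id": "c1", "posted_time": 40}, {"id": "c2", "posted_time": 10}]
feed = merged_feed(pushed, celebrity, 3)
```

Because the merge is lazy, only the first `limit` items are ever read from either list.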







Future improvements
