System requirements
Functional:
User can post a tweet of up to 140 characters
User can follow other users
User can like other users' tweets
User's home feed shows tweets from the users they follow
User's home feed also shows the top K popular tweets
Non-Functional:
scalability
response time
consistency requirement
security
Capacity estimation
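500M DAU
Each user posts 2 tweets/day: 1B new tweets/day
Each user views 100 tweets/day: 50B tweet views/day
At peak, up to 20% of DAU are online at once: 100M users
1B tweets/day, 140 characters each; each tweet takes about 500 bytes
500 GB of new tweets every day; over two years that is about 365 TB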
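The arithmetic above can be checked with a short script. This is only a back-of-the-envelope sketch: it uses decimal units (1 GB = 10^9 bytes), assumes the 500-byte figure covers the whole stored tweet, and ignores replication and media. The constant names are made up for illustration.

```python
# Back-of-the-envelope capacity estimate (decimal units, no replication or media).
DAU = 500_000_000
TWEETS_PER_USER_PER_DAY = 2
VIEWS_PER_USER_PER_DAY = 100
PEAK_PERCENT_OF_DAU = 20
BYTES_PER_TWEET = 500

new_tweets_per_day = DAU * TWEETS_PER_USER_PER_DAY           # tweets written per day
tweet_views_per_day = DAU * VIEWS_PER_USER_PER_DAY           # tweets read per day
peak_users = DAU * PEAK_PERCENT_OF_DAU // 100                # users online at peak
storage_per_day_gb = new_tweets_per_day * BYTES_PER_TWEET // 10**9
storage_two_years_tb = storage_per_day_gb * 365 * 2 // 1000

print(new_tweets_per_day, tweet_views_per_day, peak_users)
print(storage_per_day_gb, "GB/day,", storage_two_years_tb, "TB over two years")
```

The read volume is 50 times the write volume, which is why the rest of the design leans on caching and precomputed feeds.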
A NoSQL DB will be used, with the following documents.
Tweet document:
tweetid
createdby
postedtime
content
medialink
number of likes
hashtag
users mentioned
User document:
user_id
name
nickname
DOB
gender
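A minimal sketch of the two documents as Python dataclasses, to make the field types concrete. The types, the defaults, and the 140-character check in `__post_init__` are assumptions for illustration; the field names follow the lists above.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional

MAX_TWEET_LENGTH = 140  # from the functional requirements


@dataclass
class Tweet:
    tweetid: str
    createdby: str            # user_id of the author
    postedtime: int           # Unix timestamp in seconds (assumed representation)
    content: str
    medialink: Optional[str] = None
    number_of_likes: int = 0
    hashtags: List[str] = field(default_factory=list)
    users_mentioned: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject content that is longer than the allowed tweet length.
        if len(self.content) > MAX_TWEET_LENGTH:
            raise ValueError("content exceeds 140 characters")


@dataclass
class User:
    user_id: str
    name: str
    nickname: str
    dob: str                  # ISO date string, e.g. "1990-01-31" (assumed format)
    gender: str


doc = asdict(Tweet("t1", "u1", 1_700_000_000, "hello #world", hashtags=["world"]))
```

`asdict` gives a plain dictionary, which is the shape a document store would accept.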
API design
tweet(userid, content)
follow(follower, following)
like(user_id, tweet_id)
homefeed(user_id, offset, number)
Returns a JSON document containing the tweets that should be shown on the user's home feed.
offset is the position in the list of top tweets to start from; number is how many tweets to return.
When the user opens the app, the offset is 0; as the user scrolls down, the offset increases.
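A minimal sketch of how homefeed could page through an already-ranked feed. The in-memory `FEEDS` dictionary stands in for the feed cache, and the `next_offset` field in the response is an assumption, not part of the API above.

```python
import json

# Hypothetical stand-in for the feed cache: user_id -> ranked list of tweets.
FEEDS = {
    "u1": [{"tweet_id": i, "content": f"tweet {i}"} for i in range(1, 8)],
}


def homefeed(user_id, offset, number):
    """Return one page of the user's ranked feed as a JSON string."""
    ranked = FEEDS.get(user_id, [])
    page = ranked[offset:offset + number]
    return json.dumps({
        "user_id": user_id,
        "offset": offset,
        "next_offset": offset + len(page),  # what the client sends on the next scroll
        "tweets": page,
    })


first_page = json.loads(homefeed("u1", 0, 3))
second_page = json.loads(homefeed("u1", first_page["next_offset"], 3))
```

The client starts at offset 0 when the app opens and passes `next_offset` back as the user scrolls.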
Error handling follows HTTP status codes: 4XX means a client-side error and 5XX means a server-side error. Based on the response code, the client gives the user a different suggestion for resolving the issue.
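One way the client could turn a status code into a suggestion is sketched below. The status codes are standard HTTP codes, but the function name and the message wording are illustrative assumptions.

```python
def suggestion_for(status_code):
    """Map an HTTP status code to a user-facing hint (wording is illustrative)."""
    specific = {
        401: "Please sign in again.",
        404: "This tweet or user no longer exists.",
        429: "You are doing that too often; wait a moment and retry.",
        503: "The service is busy; please retry shortly.",
    }
    if status_code in specific:
        return specific[status_code]
    if 400 <= status_code <= 499:
        # Any other client-side error: the request itself needs fixing.
        return "Please check your request and try again."
    if 500 <= status_code <= 599:
        # Any other server-side error: the user can only retry later.
        return "Something went wrong on our side; please try again later."
    return ""  # success or redirect: nothing to suggest
```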
Database design
Tweet table
tweet_id
created_by: user_id
posted_time: indexed
content
medialink
number of likes
User table
user_id
name
nickname
DOB
gender
Hashtag table
hashtag_id
hashtag
tweets
UserMentioned
user_id
array of tweet_id
For the tweet table, I choose MongoDB and store each tweet as a document containing fields such as tweet_id, user_id, and timestamp.
Indexes:
Primary index: on tweet_id, for fast retrieval
Secondary index: on user_id and timestamp
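In MongoDB these would be declared as indexes on the collection. The sketch below does not use MongoDB at all; it only mimics in memory what the two indexes provide, so the access patterns are visible. The `TweetStore` class and its method names are hypothetical.

```python
import bisect
from collections import defaultdict


class TweetStore:
    """In-memory illustration of a primary and a compound secondary index."""

    def __init__(self):
        self.by_id = {}                   # primary index: tweet_id -> document
        self.by_user = defaultdict(list)  # secondary: user_id -> sorted [(timestamp, tweet_id)]

    def insert(self, tweet):
        self.by_id[tweet["tweet_id"]] = tweet
        # Keep each user's entries ordered by timestamp so range reads are cheap.
        bisect.insort(self.by_user[tweet["user_id"]],
                      (tweet["timestamp"], tweet["tweet_id"]))

    def get(self, tweet_id):
        """Point lookup served by the primary index."""
        return self.by_id.get(tweet_id)

    def latest_by_user(self, user_id, limit):
        """Newest-first tweets of one user, served by the secondary index."""
        entries = self.by_user.get(user_id, [])
        newest = entries[max(0, len(entries) - limit):]
        return [self.by_id[tweet_id] for _, tweet_id in reversed(newest)]


store = TweetStore()
store.insert({"tweet_id": "t1", "user_id": "u1", "timestamp": 100})
store.insert({"tweet_id": "t3", "user_id": "u1", "timestamp": 300})
store.insert({"tweet_id": "t2", "user_id": "u1", "timestamp": 200})
```

The secondary index answers "latest tweets by this user" without scanning every tweet, which is the query the feed builder needs.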
High-level design
Use the push model for normal users: when a user creates a tweet, the system pushes it to the feeds of all their followers. This minimizes read latency at the cost of more storage.
The Home Feed Service constructs a list of tweets recommended for the user and returns the list in JSON format.
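The push model (fan-out on write) can be sketched as follows. This is a single-process illustration with in-memory dictionaries in place of the feed cache and tweet store; `FEED_LIMIT` and the generated tweet IDs are assumptions.

```python
from collections import defaultdict, deque
from itertools import count

FEED_LIMIT = 500  # assumed cap on cached feed entries per user

tweets = {}                                            # tweet_id -> tweet document
followers = defaultdict(set)                           # author id -> follower ids
feeds = defaultdict(lambda: deque(maxlen=FEED_LIMIT))  # user id -> tweet ids, newest first
_next_tweet_id = count(1)


def follow(follower, following):
    followers[following].add(follower)


def tweet(userid, content):
    """Store the tweet, then push its id onto every follower's feed."""
    tweet_id = next(_next_tweet_id)
    tweets[tweet_id] = {"tweet_id": tweet_id, "createdby": userid, "content": content}
    for follower in followers[userid]:
        # appendleft keeps the newest tweet first; maxlen drops the oldest entry.
        feeds[follower].appendleft(tweet_id)
    return tweet_id


follow("bob", "alice")
follow("carol", "alice")
t1 = tweet("alice", "hello")
t2 = tweet("alice", "again")
```

Reading a feed is then a single lookup of `feeds[user_id]`; the write cost grows with the number of followers.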
Request flows
Detailed component design
Load balancer: use multiple load balancers working together; if the main LB goes down, the backup takes over.
Place LBs in different locations/regions to reduce latency for users far from the servers.
Use a managed LB service from a cloud provider such as Azure, which can be set up automatically and avoids manual intervention.
Use round-robin or least-connections as the balancing algorithm.
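The two balancing algorithms can be sketched in a few lines. This is an illustration only; a managed load balancer implements them internally, and the class names here are hypothetical.

```python
import itertools


class RoundRobin:
    """Hand out servers in a fixed rotating order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Send each request to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a request finishes so the count reflects live connections.
        self.active[server] -= 1
```

Round-robin suits requests of similar cost; least-connections copes better when some requests take much longer than others.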
Redis: for active users, I will create a feed cache to reduce the latency of getting the feed.
For the tweet recommendation system, the Home Feed Service calculates the top K tweets (based on likes, followers, and freshness). Since the recommendations change in real time, the main storage for them should be a cache. Redis can also persist the data.
Analyze user interactions to recommend tweets from similar users.
Sharding: distribute tweets across multiple shards based on user ID or tweet ID to balance load and improve performance.
Partitioning:
Partition data by time (month or year) to optimize retrieval.
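A minimal sketch of a top-K calculation based on likes, followers, and freshness. The scoring formula, the 0.01 follower weight, and the 6-hour half-life are assumptions chosen for illustration; the notes do not specify how the three signals are combined.

```python
import heapq


def score(tweet, now, half_life_hours=6.0):
    """Popularity that decays by half every half_life_hours (assumed formula)."""
    age_hours = (now - tweet["posted_time"]) / 3600
    freshness = 0.5 ** (age_hours / half_life_hours)
    popularity = tweet["likes"] + 0.01 * tweet["author_followers"]
    return popularity * freshness


def top_k(tweets, k, now):
    """Return the k highest-scoring tweets, best first."""
    return heapq.nlargest(k, tweets, key=lambda t: score(t, now))


now = 100_000
candidates = [
    {"id": "fresh", "likes": 100, "author_followers": 0, "posted_time": now},
    {"id": "old", "likes": 100, "author_followers": 0, "posted_time": now - 6 * 3600},
    {"id": "famous", "likes": 10, "author_followers": 10_000, "posted_time": now},
]
best = top_k(candidates, 2, now)
```

`heapq.nlargest` avoids sorting the whole candidate set when K is small relative to the number of candidates.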
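Time-based partitioning comes down to deriving a partition name from the posting time. The sketch below assumes timestamps are Unix seconds in UTC; the `tweets_YYYY_MM` naming scheme is hypothetical.

```python
from datetime import datetime, timezone


def partition_for(posted_time, granularity="month"):
    """Map a tweet's posting time (Unix seconds, UTC) to a partition name."""
    dt = datetime.fromtimestamp(posted_time, tz=timezone.utc)
    if granularity == "year":
        return f"tweets_{dt.year}"
    return f"tweets_{dt.year}_{dt.month:02d}"
```

A home feed query mostly touches the newest partition, so older partitions can sit on cheaper storage.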
Edge case:
A tweet is deleted but references/likes to it still remain in the system
- Soft delete: mark the tweet as "Deleted" first, to allow for potential recovery or delayed cleanup
- Consistency check: periodically run a consistency check to find references to deleted tweets
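The soft delete and the periodic consistency check can be sketched together. The data layout (a `deleted` flag on each tweet and a list of `(user_id, tweet_id)` likes) and the function names are assumptions for illustration.

```python
def soft_delete(tweets, tweet_id):
    """Mark a tweet as deleted without removing it, so it can still be recovered."""
    tweets[tweet_id]["deleted"] = True


def is_visible(tweets, tweet_id):
    tweet = tweets.get(tweet_id)
    return tweet is not None and not tweet["deleted"]


def find_dangling_likes(tweets, likes):
    """Consistency check: likes that point at deleted or missing tweets."""
    return [(user_id, tweet_id) for user_id, tweet_id in likes
            if not is_visible(tweets, tweet_id)]


def cleanup(tweets, likes):
    """Delayed cleanup: drop dangling likes, then purge soft-deleted tweets."""
    dangling = set(find_dangling_likes(tweets, likes))
    remaining_likes = [like for like in likes if like not in dangling]
    remaining_tweets = {tid: t for tid, t in tweets.items() if not t["deleted"]}
    return remaining_tweets, remaining_likes


tweets = {1: {"content": "hello", "deleted": False},
          2: {"content": "world", "deleted": False}}
likes = [(10, 1), (11, 1), (12, 2)]
soft_delete(tweets, 1)
dangling = find_dangling_likes(tweets, likes)
purged_tweets, purged_likes = cleanup(tweets, likes)
```

Until `cleanup` runs, the deleted tweet stays in storage but is filtered out of reads by `is_visible`.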
Ensuring privacy and security
- Encryption: encrypt the sensitive data
- Access control: implement strict access control
Ensuring high availability
- Replication: replicate data across multiple nodes and data centers
- Failover mechanism
Sharding:
Use horizontal partitioning, with the hash value of user_id as the key to decide the shard. To keep the shards evenly distributed, use consistent hashing to balance them.
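A minimal sketch of consistent hashing over user_id. The use of virtual nodes (100 per shard), MD5 as the hash function, and the shard names are assumptions; any stable hash would do.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hash ring with virtual nodes; a key belongs to the next node clockwise."""

    def __init__(self, shards, vnodes=100):
        self._vnodes = vnodes
        self._hashes = []  # sorted positions of all virtual nodes
        self._owner = {}   # position -> shard name
        for shard in shards:
            self.add_shard(shard)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_shard(self, shard):
        # Each shard gets several positions so load spreads more evenly.
        for i in range(self._vnodes):
            position = self._hash(f"{shard}#{i}")
            bisect.insort(self._hashes, position)
            self._owner[position] = shard

    def shard_for(self, user_id):
        position = self._hash(str(user_id))
        # First virtual node at or after the key, wrapping around the ring.
        index = bisect.bisect_left(self._hashes, position) % len(self._hashes)
        return self._owner[self._hashes[index]]


ring = ConsistentHashRing(["shard-0", "shard-1", "shard-2", "shard-3"])
before = {user_id: ring.shard_for(user_id) for user_id in range(1000)}
ring.add_shard("shard-4")
after = {user_id: ring.shard_for(user_id) for user_id in range(1000)}
```

When `shard-4` is added, a user either stays on the same shard or moves to the new one; no data moves between the existing shards, which is the advantage over `hash(user_id) % shard_count`.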
Data replication:
Replication strategy: asynchronous replication, to improve write performance and reduce latency.
Master-slave replication: one master node handles writes and synchronizes data to multiple slave nodes, which is good for a read-heavy workload. Add a backup master node to avoid a single point of failure.
Scaling:
Auto-scaling: if a table reaches a threshold of its current size, automatically increase its capacity.
Also set up monitoring alerts in case a table/machine runs out of capacity.
Trade offs/Tech choices
For the database, I choose NoSQL, since scalability would be a potential issue.
For the cache, I choose Redis, since it supports various data types and horizontal scaling.
Failure scenarios/bottlenecks
For high-fan-out users, the push model causes high latency, so we can use the pull model for celebrities.
The feed database and cache will store a celebrity tweets table; when a user asks for their feed, the system also pulls tweets from this table and inserts them at the correct positions in the feed list.
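A minimal sketch of the read-time merge, assuming "correct position" means newest first by posted_time and that both the pushed feed and each celebrity's list are already sorted that way. The function and parameter names are hypothetical.

```python
import heapq
import itertools


def build_feed(pushed_feed, celebrity_tweets, followed_celebrities, limit):
    """Merge the precomputed feed with celebrity tweets pulled at read time.

    Every input list must already be sorted newest first.
    """
    pulled = [celebrity_tweets.get(celebrity, []) for celebrity in followed_celebrities]
    merged = heapq.merge(pushed_feed, *pulled,
                         key=lambda t: t["posted_time"], reverse=True)
    return list(itertools.islice(merged, limit))


pushed = [{"id": "a", "posted_time": 50}, {"id": "b", "posted_time": 10}]
celebrity_table = {"star": [{"id": "c", "posted_time": 30}]}
feed = build_feed(pushed, celebrity_table, ["star"], 10)
```

`heapq.merge` is lazy, so only the first `limit` tweets are ever pulled from the inputs.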
Future improvements