System requirements
Functional:
- management of user accounts
- support different account types:
- REALTIME_INFO
- useful for news organizations or governmental agencies with fewer than 1000 followers
- tweets purged systematically after defined period
- tweet writes not throttled
- tweet writes pushed to follower feeds
- activity limited to tweet writes
- REALTIME_INFO_POWER
- useful for news organizations or governmental agencies with at least 1000 followers
- tweets purged systematically after defined period
- tweet writes not throttled
- tweets pulled into follower feeds at read time (not pushed on write)
- activity limited to tweet writes
- USER
- user account with fewer than 1000 followers
- tweet writes throttled
- tweet writes pushed to follower feeds
- USER_POWER
- user account with at least 1000 followers
- tweet writes throttled
- tweets pulled into follower feeds at read time (not pushed on write)
- support the following account actions:
- creation
- update
- retrieval
- deletion
- undeletion
- all users can create a tweet
- all users can share a tweet
- all users can delete a tweet
- all users can undelete a tweet
- non-REALTIME_INFO users can subscribe to other users (i.e., "follow")
- user feed is generated for non-REALTIME_INFO users
- user feed populated by latest followee activity, including:
- tweets
- retweets
- likes
- user feed prepopulated with at most 10 latest activities from followees
- user can pull additional activities for their feed, though these are not cached
- deleted tweets and accounts are first archived for 30 days, after which deletion is permanent
- tweet content consists only of text
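The four account types above differ along two axes: whether the account is a real-time information source, and whether it has reached 1000 followers. A minimal sketch of that classification follows. The function names and the `is_realtime_info` flag are assumptions made for illustration; the codes R, RP, U, and UP are the type encodings from the database design.

```python
from enum import Enum

POWER_THRESHOLD = 1000  # follower count at which an account becomes a "power" account


class AccountType(Enum):
    REALTIME_INFO = "R"
    REALTIME_INFO_POWER = "RP"
    USER = "U"
    USER_POWER = "UP"


def classify(is_realtime_info: bool, follower_count: int) -> AccountType:
    """Pick the account type from the account's purpose and follower count."""
    power = follower_count >= POWER_THRESHOLD
    if is_realtime_info:
        return AccountType.REALTIME_INFO_POWER if power else AccountType.REALTIME_INFO
    return AccountType.USER_POWER if power else AccountType.USER


def fanout(account_type: AccountType) -> str:
    """'push': tweets are written into follower feeds; 'pull': feeds fetch them on read."""
    power_types = (AccountType.REALTIME_INFO_POWER, AccountType.USER_POWER)
    return "pull" if account_type in power_types else "push"


def writes_throttled(account_type: AccountType) -> bool:
    """Only USER and USER_POWER tweet writes are throttled."""
    return account_type in (AccountType.USER, AccountType.USER_POWER)
```

For example, `classify(False, 1000)` yields `USER_POWER`, whose tweets are pulled and whose writes are throttled.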
Non-Functional:
- high availability - cross-region resilience
- maximum allowable read (e.g., feed retrieval, tweet search) latency: 500ms
- maximum allowable write (e.g., compose or like a tweet, follow another user) latency: 200ms
- eventual consistency
- tweet maximum size is 140 characters, including hashtags
- userIds must be between 1 and 255 characters (inclusive) and consist only of lowercase alphabetical characters (a-z) and numerical digits (0-9)
- throttle writes:
- tweet service:
- 10 per minute per IP address for non-REALTIME_INFO accounts (e.g., VPN throttling)
- unlimited for REALTIME_INFO
- account management service:
- 1 batch creation weekly per user OR IP
- 1 single account creation per user OR IP per minute, maximum of 100 per year
- account deactivation:
- no login for more than 30 days; reactivated by the user
- tweet content violation; reactivated only by an administrator
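The tweet write throttle (10 per minute per IP address, unlimited for REALTIME_INFO accounts) could be enforced with a sliding-window counter. The sketch below is one possible approach, not a prescribed one: the class name, the in-memory storage, and the explicit `now` parameter are assumptions, and a multi-region deployment would need a shared store instead of a local dictionary.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allows at most `limit` events per key within any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted events

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def allow_tweet(limiter: SlidingWindowLimiter, account_type: str, ip: str, now: float) -> bool:
    """REALTIME_INFO types (R, RP) are never throttled; others are limited per IP."""
    if account_type in ("R", "RP"):
        return True
    return limiter.allow(ip, now)


tweet_limiter = SlidingWindowLimiter(limit=10, window_seconds=60)
```

Passing `now` explicitly keeps the limiter deterministic under test; production code would typically supply a monotonic clock reading.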
Capacity estimation
- total unique users: 1 billion
- daily active users: 500 million
- expected yearly user growth rate: 10% (about 100 million new users in the first year)
- average new tweets per user: 2 daily, 730 yearly
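These figures can be turned into rough write and storage rates. The arithmetic below is a back-of-the-envelope sketch under two assumptions not stated above: only daily active users tweet, and every TWEET row occupies its maximum size from the database design (255 + 255 + 140 + 4 + 8 bytes, counting content at one byte per character).

```python
TOTAL_USERS = 1_000_000_000
DAILY_ACTIVE_USERS = 500_000_000
YEARLY_GROWTH_PERCENT = 10
TWEETS_PER_USER_PER_DAY = 2
SECONDS_PER_DAY = 86_400

# Assumption: only daily active users write tweets.
tweets_per_day = DAILY_ACTIVE_USERS * TWEETS_PER_USER_PER_DAY
avg_tweet_writes_per_second = tweets_per_day / SECONDS_PER_DAY

# Assumption: worst-case row size (userId + id + content + likes + timestamp).
MAX_TWEET_ROW_BYTES = 255 + 255 + 140 + 4 + 8
max_tweet_bytes_per_day = tweets_per_day * MAX_TWEET_ROW_BYTES
max_tweet_bytes_per_year = max_tweet_bytes_per_day * 365

new_users_first_year = TOTAL_USERS * YEARLY_GROWTH_PERCENT // 100

print(f"tweets/day:            {tweets_per_day:,}")
print(f"avg tweet writes/sec:  {avg_tweet_writes_per_second:,.0f}")
print(f"max tweet storage/day: {max_tweet_bytes_per_day / 1e9:,.0f} GB")
print(f"max tweet storage/yr:  {max_tweet_bytes_per_year / 1e12:,.1f} TB")
print(f"new users in year one: {new_users_first_year:,}")
```

Under these assumptions the system handles about 1 billion tweets per day, roughly 11,600 tweet writes per second on average, and at most about 662 GB of new tweet rows per day before replication and indexes. Peak traffic would be higher than the average.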
API design
- account management service:
- create(userId, [optional user account details])
- creates new user account, returns 201 CREATED
- returns 409 CONFLICT if userId already exists
- returns 400 BAD REQUEST if userId invalid
- createBatch(userIds, [optional user account details])
- creates new user accounts and returns the status of each account creation (e.g., CREATED, ALREADY EXISTS, INVALID USERID), returns 200 OK
- update(userId, login JWT, [optional user account details])
- updates user account (e.g., user information, account activation/deactivation), returns 200 OK
- returns 401 UNAUTHORIZED if not logged in
- returns 403 FORBIDDEN if attempting to update account other than own (administrators would have access)
- returns 404 NOT FOUND if user does not exist
- returns 400 BAD REQUEST otherwise
- retrieve(userId, login JWT)
- returns user account details, returns 200 OK
- returns 401 UNAUTHORIZED if not logged in
- returns 403 FORBIDDEN if attempting to retrieve account other than own (administrators would have access)
- returns 404 NOT FOUND if user does not exist
- returns 400 BAD REQUEST otherwise
- deleteUser(userId, login JWT)
- deletes user account, returns 200 OK
- returns 401 UNAUTHORIZED if not logged in
- returns 403 FORBIDDEN if attempting to delete account other than own (administrators would have access)
- returns 404 NOT FOUND if user does not exist
- returns 400 BAD REQUEST otherwise
- undeleteUser(userId, login JWT)
- undeletes user account if still archived, returns 200 OK
- returns 401 UNAUTHORIZED if not logged in
- returns 403 FORBIDDEN if attempting to undelete account other than own (administrators would have access)
- returns 404 NOT FOUND if user does not exist in the archive (i.e., user never existed OR user was archived following deletion request and was ultimately purged due to exceeding the maximum archive age limit)
- returns 400 BAD REQUEST otherwise
- login(userId, password)
- returns login JWT, 200 OK
- returns 400 BAD REQUEST otherwise
- logout(login JWT)
- invalidates the JWT, returns 200 OK
- returns 400 BAD REQUEST otherwise
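The create and createBatch status codes follow directly from the userId rule (1 to 255 characters, lowercase a-z and digits 0-9 only). The sketch below illustrates that mapping with an in-memory dictionary standing in for the ACCOUNT table; the function names and the returned shapes are assumptions for illustration.

```python
import re

# userId rule from the non-functional requirements: 1-255 chars of a-z and 0-9.
USER_ID_PATTERN = re.compile(r"[a-z0-9]{1,255}")
STATUS_LABELS = {201: "CREATED", 409: "ALREADY EXISTS", 400: "INVALID USERID"}


def is_valid_user_id(user_id) -> bool:
    return isinstance(user_id, str) and USER_ID_PATTERN.fullmatch(user_id) is not None


def create(accounts: dict, user_id: str, details=None) -> int:
    """Returns the HTTP status code for a single account creation."""
    if not is_valid_user_id(user_id):
        return 400
    if user_id in accounts:
        return 409
    accounts[user_id] = {"details": details or {}, "status": "A"}
    return 201


def create_batch(accounts: dict, user_ids: list, details=None):
    """Always returns 200 plus a per-userId status, in request order."""
    statuses = [(uid, STATUS_LABELS[create(accounts, uid, details)]) for uid in user_ids]
    return 200, statuses
```

Returning a list of pairs rather than a dictionary keeps the result well defined when the same userId appears twice in one batch: the first occurrence is CREATED and the second is ALREADY EXISTS.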
- feed service:
- getFeed(userId, JWT)
- returns latest feed, 200 OK
- returns 401 UNAUTHORIZED if not logged in
- returns 403 FORBIDDEN if attempting to get feed other than own (administrators would have access)
- returns 404 NOT FOUND if userId does not exist
- returns 400 BAD REQUEST otherwise
- tweet write service:
- tweet(userId, content, JWT)
- returns tweetId, 200 OK
- returns 401 UNAUTHORIZED if not logged in
- returns 403 FORBIDDEN if attempting to write tweet for user other than own (administrators would have access)
- returns 404 NOT FOUND if userId does not exist
- returns 400 BAD REQUEST if content invalid
- like(userId, tweetId, JWT)
- creates like (increments count on the tweet), returns 200 OK
- returns 401 UNAUTHORIZED if not logged in
- returns 403 FORBIDDEN if attempting to like tweet for other than own user (administrators would have access)
- returns 404 NOT FOUND if tweetId or userId does not exist
- returns 400 BAD REQUEST otherwise
- retweet(userId, tweetId, JWT)
- creates a retweet on behalf of user and returns the new tweetId, 200 OK
- returns 401 UNAUTHORIZED if not logged in
- returns 403 FORBIDDEN if attempting to retweet for other than own user (administrators would have access)
- returns 404 NOT FOUND if tweetId or userId does not exist
- returns 400 BAD REQUEST otherwise
- follow(userId, targetUserId, JWT)
- returns 200 OK
- returns 401 UNAUTHORIZED if not logged in
- returns 403 FORBIDDEN if attempting to follow user for other than own user (administrators would have access)
- returns 404 NOT FOUND if targetUserId or userId does not exist
- returns 400 BAD REQUEST otherwise
- unfollow(userId, targetUserId, JWT)
- returns 200 OK
- returns 401 UNAUTHORIZED if not logged in
- returns 403 FORBIDDEN if attempting to unfollow user for other than own user (administrators would have access)
- returns 404 NOT FOUND if targetUserId or userId does not exist
- returns 400 BAD REQUEST otherwise
- tweet read service:
- getTweet(tweetId, JWT)
- returns tweet content, 200 OK
- returns 401 UNAUTHORIZED if not logged in
- returns 404 NOT FOUND if tweetId does not exist
- returns 400 BAD REQUEST otherwise
- getTweets(userId, startTime, endTime, JWT)
- returns the user's tweets with timestamps between startTime and endTime, 200 OK
- returns 401 UNAUTHORIZED if not logged in
- returns 404 NOT FOUND if userId does not exist
- returns 400 BAD REQUEST otherwise
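Most endpoints above share the same 401 / 403 / 404 pattern for requests that act on a userId. A single helper could resolve it, as in the sketch below. The `session` dictionary is a hypothetical stand-in for a validated login JWT (token verification itself is not shown), and the order of the checks is an assumption: 403 is evaluated before 404 so that non-administrators cannot probe which userIds exist.

```python
from typing import Optional


def authorize(session: Optional[dict], target_user_id: str, existing_user_ids) -> int:
    """Returns 200 if the request may proceed, else the error status to send.

    session: None when not logged in, otherwise a dict with "userId" and an
             optional "isAdmin" flag (hypothetical claims taken from the JWT).
    """
    if session is None:
        return 401  # UNAUTHORIZED: not logged in
    is_admin = session.get("isAdmin", False)
    if session["userId"] != target_user_id and not is_admin:
        return 403  # FORBIDDEN: acting on an account other than own
    if target_user_id not in existing_user_ids:
        return 404  # NOT FOUND: target user does not exist
    return 200
```

Endpoint handlers would call this first and fall through to their own validation, returning 400 BAD REQUEST for anything else that is malformed.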
Database design
- ACCOUNT - shard on userId
- id - string up to 255 bytes
- password_hash - string up to 255 bytes
- details - string up to 1023 bytes
- fieldName
- fieldValue
- displayable
- type - encoded, up to 2 bytes
- R: REALTIME_INFO
- RP: REALTIME_INFO_POWER
- U: USER
- UP: USER_POWER
- status - encoded, 1 byte
- A - active
- L - locked (due to violation)
- I - inactive (deactivated due to inactivity)
- D - deleted, but still archived
- TWEET - shard on userId
- userId - string up to 255 bytes
- id - string up to 255 bytes
- content - string up to 140 characters (byte length depends on encoding)
- likes - 4 bytes
- timestamp - 8 bytes
- FOLLOWEE - shard on userId
- userId - string up to 255 bytes
- followeeIds - list of followee ids, each up to 255 bytes
- FOLLOWER - shard on userId
- userId - string up to 255 bytes
- followerIds - list of follower ids, each up to 255 bytes
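Because all four tables shard on userId, every row belonging to one user can be routed to the same shard. The sketch below shows one way to do that with a stable hash; the shard count of 64, the choice of SHA-256, and the dataclass layout are assumptions for illustration only.

```python
import hashlib
from dataclasses import dataclass

NUM_SHARDS = 64  # hypothetical shard count


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Maps a userId to a shard with a hash that is stable across processes."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


@dataclass
class Account:
    id: str             # userId, also the shard key
    password_hash: str
    type: str = "U"     # R, RP, U, or UP
    status: str = "A"   # A, L, I, or D


@dataclass
class Tweet:
    userId: str         # shard key
    id: str
    content: str
    likes: int = 0
    timestamp: int = 0  # 8-byte value; epoch milliseconds assumed


account = Account(id="alice", password_hash="<hash>")
tweet = Tweet(userId="alice", id="t1", content="hello")
# Both rows share the same shard key, so they land on the same shard.
same_shard = shard_for(account.id) == shard_for(tweet.userId)
```

Python's built-in `hash()` is deliberately avoided here because string hashes are randomized per process. Plain modulo sharding also forces large data movement when the shard count changes, which consistent hashing would reduce.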
High-level design
Request flows
Detailed component design
Account Write Service:
- handles all account creation, updates, and (un)deletions
- handles login and logout
- writes to cache
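Deletion and undeletion in this service follow the 30-day archive rule from the requirements. The sketch below models it with the status code D from the database design. The `deleted_at` and `previous_status` fields, the in-memory dictionary, and the separate purge function are assumptions for illustration.

```python
ARCHIVE_SECONDS = 30 * 24 * 60 * 60  # deleted accounts stay archived for 30 days


def delete_user(accounts: dict, user_id: str, now: float) -> int:
    account = accounts.get(user_id)
    if account is None or account["status"] == "D":
        return 404
    account["previous_status"] = account["status"]  # restored on undelete
    account["status"] = "D"
    account["deleted_at"] = now
    return 200


def undelete_user(accounts: dict, user_id: str, now: float) -> int:
    account = accounts.get(user_id)
    if account is None or account["status"] != "D":
        return 404  # not in the archive
    if now - account["deleted_at"] > ARCHIVE_SECONDS:
        return 404  # archive period exceeded; treated as already purged
    account["status"] = account.pop("previous_status")
    del account["deleted_at"]
    return 200


def purge_expired(accounts: dict, now: float) -> list:
    """Permanently removes accounts archived for longer than the limit."""
    expired = [
        uid for uid, a in accounts.items()
        if a["status"] == "D" and now - a["deleted_at"] > ARCHIVE_SECONDS
    ]
    for uid in expired:
        del accounts[uid]
    return expired
```

Restoring the previous status rather than always setting A keeps a locked account locked after a delete and undelete cycle.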
Account Monitor Service:
- deactivates inactive accounts
- locks accounts for content violators
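The monitor's two jobs map onto the status codes I (inactive) and L (locked), with different reactivation rules for each. The sketch below is one way to express them; the `last_login` field, the function names, and the idea of running `deactivate_inactive` as a periodic sweep are assumptions.

```python
INACTIVITY_SECONDS = 30 * 24 * 60 * 60  # no login for more than 30 days


def deactivate_inactive(accounts: dict, now: float) -> list:
    """Marks active accounts as inactive (I) when the last login is too old."""
    changed = []
    for user_id, account in accounts.items():
        if account["status"] == "A" and now - account["last_login"] > INACTIVITY_SECONDS:
            account["status"] = "I"
            changed.append(user_id)
    return changed


def lock_for_violation(accounts: dict, user_id: str) -> None:
    """Locks (L) an account after a tweet content violation."""
    accounts[user_id]["status"] = "L"


def reactivate(accounts: dict, user_id: str, by_admin: bool) -> bool:
    """Inactive accounts can be reactivated by anyone entitled to the account;
    locked accounts only by an administrator."""
    status = accounts[user_id]["status"]
    if status == "I" or (status == "L" and by_admin):
        accounts[user_id]["status"] = "A"
        return True
    return False
```

At the scale given in the capacity estimate, a full scan like this would run per shard rather than over all accounts at once.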
Account Read Service:
- retrieves account details from cache
- updates cache on miss
Tweet Write Service:
- handles creation and deletion of tweets, likes, and retweets
- for non-power users, the full tweet is sent to the feed write service queue
- for power users, only the userId and tweetId are sent to the feed write service queue
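The difference between the two queue messages can be sketched as follows. The message field names and the use of an in-process `deque` as a stand-in for the feed write service queue are assumptions; a real deployment would use a durable message broker.

```python
from collections import deque

POWER_TYPES = ("RP", "UP")  # REALTIME_INFO_POWER and USER_POWER


def build_feed_message(author_type: str, user_id: str, tweet_id: str, content: str) -> dict:
    """Power users enqueue only a reference; everyone else enqueues the full tweet."""
    message = {"userId": user_id, "tweetId": tweet_id}
    if author_type not in POWER_TYPES:
        message["content"] = content
    return message


feed_write_queue = deque()  # stand-in for the feed write service queue
feed_write_queue.append(build_feed_message("U", "alice", "t1", "hello"))
feed_write_queue.append(build_feed_message("UP", "bigco", "t2", "breaking news"))
```

Sending only a reference for power users avoids copying the same tweet body into a very large number of follower feeds; the body is fetched once per read instead.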
Tweet Read Service:
- handles retrieval of tweets from cache
- on cache miss, retrieve from db and update cache
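The read path described here is the cache-aside pattern, which the Account Read Service also follows. A minimal sketch, using plain dictionaries as hypothetical stand-ins for the cache and the database:

```python
class TweetReadService:
    """Cache-aside reads: try the cache, fall back to the db, then fill the cache."""

    def __init__(self, db: dict, cache: dict):
        self.db = db
        self.cache = cache

    def get_tweet(self, tweet_id: str):
        tweet = self.cache.get(tweet_id)
        if tweet is not None:
            return tweet  # cache hit
        tweet = self.db.get(tweet_id)  # cache miss: read from the database
        if tweet is not None:
            self.cache[tweet_id] = tweet  # populate the cache for later reads
        return tweet  # None maps to 404 NOT FOUND at the API layer
```

Missing tweets are not cached in this sketch, so repeated lookups of a nonexistent tweetId always reach the database. A short-lived negative cache entry is a common way to soften that.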
Feed Write Service:
- consumes new tweets from Tweet Write Service and writes to cache and db
Feed Read Service:
- retrieves feed from cache
- for power user tweets, retrieve from Tweet Read Service
- retrieves additional feed from db upon request (does not cache)
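Putting the feed read steps together: cached feed entries that already carry content are returned as they are, while entries that hold only a userId and tweetId (power user tweets) are resolved through the Tweet Read Service. The sketch below assumes that entry layout and takes a `get_tweet` callable as a stand-in for the Tweet Read Service; the limit of 10 comes from the functional requirements.

```python
FEED_SIZE = 10  # feed is prepopulated with at most 10 latest activities


def read_feed(cached_entries: list, get_tweet) -> list:
    """cached_entries: newest-first feed entries from the cache.
    get_tweet: callable tweetId -> tweet dict or None (Tweet Read Service stand-in).
    """
    feed = []
    for entry in cached_entries[:FEED_SIZE]:
        if "content" in entry:
            feed.append(entry)  # pushed by a non-power user, already complete
            continue
        tweet = get_tweet(entry["tweetId"])  # power user tweet: pull on read
        if tweet is not None:  # skip tweets deleted since the entry was written
            feed.append(tweet)
    return feed
```

Resolving references one at a time keeps the sketch simple; batching the lookups would cut round trips and help stay within the 500ms read latency target.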
Trade offs/Tech choices
Failure scenarios/bottlenecks
Future improvements