Design Twitter - System Design

System requirements

Functional:

Ability to compose tweets:
persist forever
text only (may contain urls to other multimedia, but will not store that media here)
280 char max
Ability to share tweets
Ability to track updates of other users (new followee tweets appear in feed)
Ability to favorite tweets
Ability to associate a tweet with other tweets (hashtag)
Ability to view another user's tweets

Non-Functional:

horizontally scalable - load-balanced microservice architecture
highly available
HTTPS for all client traffic

Capacity estimation

300 million current users
Growth of 100 thousand per day (~3 million per month)
average number of followers per user - 500
average number of followees per user - 300
max tweet size is 280 characters
average 5 tweets per user daily (300 million users * 5 tweets/day = 1.5 billion tweets daily, ~0.55 trillion / year)
average hashtags per tweet - 2
average 10 favorited tweets per user per day (300 million users * 10 = 3 billion / day, ~1.1 trillion / year)
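The per-day figures above can be reproduced with a few lines of arithmetic. This is only a sketch: the constant names are made up for illustration, it uses the 300 million user count that the per-day calculations rely on, and the per-second rates are plain averages that ignore peak traffic.

```python
# Back-of-envelope figures taken from the estimates above.
USERS = 300_000_000                 # user count used in the per-day arithmetic
TWEETS_PER_USER_PER_DAY = 5
HASHTAGS_PER_TWEET = 2
FAVORITES_PER_USER_PER_DAY = 10
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365

tweets_per_day = USERS * TWEETS_PER_USER_PER_DAY
hashtags_per_day = tweets_per_day * HASHTAGS_PER_TWEET
favorites_per_day = USERS * FAVORITES_PER_USER_PER_DAY

# Average write rates; real traffic is bursty, so peaks will be higher.
tweets_per_second = tweets_per_day / SECONDS_PER_DAY
favorites_per_second = favorites_per_day / SECONDS_PER_DAY

print(f"tweets/day:      {tweets_per_day:,}")
print(f"tweets/year:     {tweets_per_day * DAYS_PER_YEAR:,}")
print(f"hashtags/day:    {hashtags_per_day:,}")
print(f"favorites/day:   {favorites_per_day:,}")
print(f"avg tweets/s:    {tweets_per_second:,.0f}")
print(f"avg favorites/s: {favorites_per_second:,.0f}")
```

The average write rate (on the order of tens of thousands of writes per second) is the number that motivates putting message queues in front of the databases later in the design.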

API design

REST API - returns JSON payload
user service:
/login
POST request with username and password in the Authorization header (Base64 encoded, as in HTTP Basic authentication)
Returns login token and 200 OK on success (user exists and correct password)
Returns 401 Unauthorized if invalid login credentials
/create-user
PUT request with username and password in the Authorization header (Base64 encoded, as in HTTP Basic authentication)
Returns 200 OK on success (new user)
Returns 400 Bad Request on error (e.g. user already exists)
feed service:
/get-feed?user-id=<user_id>&date=<YYYYMMDD>&max-count=<max # of most recent tweets>
GET request with user id
Returns list of at most |max-count| most recent tweets from followees, starting from date and working backwards, and 200 OK on success
Returns 401 Unauthorized if not logged in
Returns 400 Bad Request on error (e.g., the user does not exist)
/view-user?user-id=<user_id>&date=<YYYYMMDD>&max-count=<max # of most recent tweets>
GET request with target user id
Returns list of at most |max-count| most recent tweets by the target user, starting from date and working backwards, and 200 OK on success
Returns 401 Unauthorized if not logged in
Returns 400 Bad Request on error (e.g., the user does not exist)
/get-favorite?user-id=<user_id>
GET request with user id
Returns list of the user's favorite tweets and 200 OK on success
Returns 401 Unauthorized if not logged in
Returns 400 Bad Request on error (e.g., the user does not exist)
tweet service:
/tweet
PUT request with tweet content in request body
Returns 200 OK and tweet id on success
Returns 401 Unauthorized if not logged in
Returns 400 Bad Request on error (e.g. tweet too large or empty)
/favorite?user-id=<user_id>&tweet-id=<tweet_id>
PUT request with tweet id and user id as request parameters
Returns 200 OK on success
Returns 401 Unauthorized if not logged in
Returns 400 Bad Request on error (e.g., the user or tweet does not exist)
/share?user-id=<user_id>&tweet-id=<tweet_id>
PUT request with tweet id and user id as request parameters
Returns 200 OK on success
Returns 401 Unauthorized if not logged in
Returns 400 Bad Request on error (e.g., the user or tweet does not exist)
follow service:
/follow?followee=<followee user_id>&follower=<follower user_id>
PUT request with follower and followee ids
Returns 200 OK on success (if added follow association OR already exists)
Returns 401 Unauthorized if not logged in
Returns 400 Bad Request on error (e.g., either user id does not exist)
/unfollow?followee=<followee user_id>&follower=<follower user_id>
PUT request with follower and followee ids
Returns 200 OK on success (if removed follow association OR does not already exist)
Returns 401 Unauthorized if not logged in
Returns 400 Bad Request on error (e.g., either user id does not exist)
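The follow and unfollow endpoints above are idempotent: repeating a request returns 200 OK and leaves the data unchanged. A minimal sketch of that behaviour follows, assuming an in-memory set in place of the follow table. The class name FollowStore and its methods are hypothetical, and the 401 check is left out because authentication is handled elsewhere.

```python
from typing import Set, Tuple


class FollowStore:
    """In-memory stand-in for the follow table (hypothetical helper)."""

    def __init__(self, known_users: Set[str]) -> None:
        self.known_users = known_users
        self.edges: Set[Tuple[str, str]] = set()  # (follower, followee) pairs

    def _both_exist(self, follower: str, followee: str) -> bool:
        return follower in self.known_users and followee in self.known_users

    def follow(self, follower: str, followee: str) -> int:
        """Return the HTTP status code the /follow endpoint would send."""
        if not self._both_exist(follower, followee):
            return 400
        # Adding to a set is a no-op when the pair is already present,
        # so a repeated request still succeeds.
        self.edges.add((follower, followee))
        return 200

    def unfollow(self, follower: str, followee: str) -> int:
        """Return the HTTP status code the /unfollow endpoint would send."""
        if not self._both_exist(follower, followee):
            return 400
        # discard() does not raise when the pair is absent.
        self.edges.discard((follower, followee))
        return 200


store = FollowStore({"alice", "bob"})
print(store.follow("alice", "bob"), store.follow("alice", "bob"))
print(store.follow("alice", "carol"))
```

Because retries are harmless, a client or message queue consumer can safely resend a follow request after a timeout.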

Database design

user table:
schema: <user_id VARCHAR(15), password_hash CHAR(8), create_time TIMESTAMP, last_login TIMESTAMP, email VARCHAR(255)>
row size ~ 15 + 1 ('\0') + 8 + 8 + 8 + 255 + 1 ('\0') bytes = 296 bytes
total size ~ 300 million users * 296 bytes = 88.8 billion bytes ~ 90 GB
Growing at 100k new users per day -> 300 million / 100k users/day = 3000 days until the table doubles in size (roughly 8 years), so no real capacity worries here.
follow table:
schema: <followee_id VARCHAR(15), follower_id VARCHAR(15)>
row size = 32 bytes
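one row per follow relationship, so each edge is counted once: 300 followees per user (average) * 300 million users * 32 bytes = 2.88 TB (4.8 TB if the 500-follower average is used instead; adding followers and followees together, i.e. 800 * 300 million * 32 bytes = 7.68 TB, counts every edge twice)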
tweet table:
schema: <tweet_id CHAR(16), user_id VARCHAR(15), time TIMESTAMP, content VARCHAR(280)>
row size = 16 + 15 + 1 + 8 + 280 + 1 bytes = 321 bytes
total size ~ 300 million users * 5 tweets/day * 321 bytes = 481.5 GB/day * 365 ~ 176 TB/year
For scalability, should shard on the user id.
hashtag table:
schema: <hashtag VARCHAR(140), tweet_id CHAR(16)>
row size = 140 + 1 + 16 = 157 bytes
total size ~ 2 hashtags/tweet * 5 tweets/user/day * 300 million users = 3 billion hashtag rows/day * 157 bytes = 471 GB/day, so roughly the same size as the tweet table
favorite table:
schema: <tweet_id CHAR(16), user_id VARCHAR(15)>
row size = 32 bytes
total size ~ 32 bytes * 300 million users * 10 tweets / day ~ 96GB/day ~ 35TB/year
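The growth figures for the activity-driven tables can be checked with a short script. This is a sketch only: the dictionary names are illustrative, the row sizes are the ones computed in the schemas above, and decimal units (1 GB = 10^9 bytes) are assumed. The follow table is left out because it grows with the social graph rather than with daily activity.

```python
USERS = 300_000_000
DAYS_PER_YEAR = 365

# Row sizes in bytes, as computed in the schemas above.
ROW_BYTES = {"user": 296, "tweet": 321, "hashtag": 157, "favorite": 32}

# Rows written per day for the tables that grow with user activity.
ROWS_PER_DAY = {
    "tweet": USERS * 5,          # 5 tweets per user per day
    "hashtag": USERS * 5 * 2,    # 2 hashtags per tweet
    "favorite": USERS * 10,      # 10 favorites per user per day
}

# The user table grows slowly, so only its current size is computed.
user_table_bytes = USERS * ROW_BYTES["user"]

daily_bytes = {t: rows * ROW_BYTES[t] for t, rows in ROWS_PER_DAY.items()}
yearly_bytes = {t: b * DAYS_PER_YEAR for t, b in daily_bytes.items()}

print(f"user table: {user_table_bytes / 1e9:.1f} GB")
for table in ROWS_PER_DAY:
    print(f"{table:9s} {daily_bytes[table] / 1e9:7.1f} GB/day "
          f"{yearly_bytes[table] / 1e12:7.1f} TB/year")
```

The tweet and hashtag tables dominate, which is why those are the ones that need sharding first.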

High-level design

API Gateway directs incoming requests to the services
User Service - handles creation and authentication of users.
Tweet Service - provides for tweeting, retweeting, and favoriting tweets.
Follow Service - handles the association/disassociation of one user to another (i.e., follower <-> followee).
Feed Service - handles retrieving relevant tweets, viewing latest user tweets, and retrieving favorite tweets.
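As a rough illustration of the gateway's job, the path-to-service mapping can be written as a lookup table. The endpoint paths come from the API design above. The service identifiers and the route function are hypothetical, and a real gateway would also handle authentication, load balancing, and retries.

```python
from urllib.parse import urlsplit

# Endpoint path -> owning service, following the API design above.
ROUTES = {
    "/login": "user_service",
    "/create-user": "user_service",
    "/get-feed": "feed_service",
    "/view-user": "feed_service",
    "/get-favorite": "feed_service",
    "/tweet": "tweet_service",
    "/favorite": "tweet_service",
    "/share": "tweet_service",
    "/follow": "follow_service",
    "/unfollow": "follow_service",
}


def route(url: str) -> str:
    """Return the name of the service that should handle this request URL."""
    path = urlsplit(url).path  # drops the query string
    try:
        return ROUTES[path]
    except KeyError:
        raise LookupError(f"no service registered for {path}") from None


print(route("/get-feed?user-id=alice&max-count=20"))
print(route("/favorite?user-id=alice&tweet-id=42"))
```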

Request flows

User Service - The user table is split into a write master and read replicas. The User Service always uses the master, while other services rely on the read replicas.
Tweet Service - A message queue allows tweets to be persisted asynchronously; the Feed Service is expected to disseminate tweets to followers. For tweets and shares, the tweet content is written to the Tweet DB Master. For favorites, the request is sent to the Favorite Service, which checks the user db (read) and tweet db (read) and then writes the favorited tweet record to the Favorite DB Master. Newly created or favorited tweets are sent to all the feed caches.
Follow Service - The follow record is written to the Follow DB Master and its read replica is utilized by the Feed Service. All follow changes are sent to the feed caches.
Feed Service - The Feed Service first checks the cache and if the cache is up-to-date (configurable TTL), returns immediately with cached data. If the cache contains old data, the feed is updated by querying the tweet read db, the favorite read db, and the follow read db.
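The Feed Service read path described above (serve from cache while the entry is within its TTL, otherwise rebuild from the read replicas) can be sketched as follows. The FeedCache class, its rebuild callback, and the injectable clock are assumptions made for illustration. In the real design the rebuild step would query the tweet, favorite, and follow read dbs.

```python
import time
from typing import Callable, Dict, List, Tuple


class FeedCache:
    """Per-user feed cache with a configurable TTL (illustrative only)."""

    def __init__(
        self,
        ttl_seconds: float,
        rebuild: Callable[[str], List[str]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl_seconds = ttl_seconds
        self.rebuild = rebuild      # stands in for the read-replica queries
        self.clock = clock          # injectable so tests can control time
        self.entries: Dict[str, Tuple[float, List[str]]] = {}

    def get_feed(self, user_id: str) -> List[str]:
        now = self.clock()
        entry = self.entries.get(user_id)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]         # fresh enough: return immediately
        # Missing or expired: rebuild the feed and remember when we did so.
        feed = self.rebuild(user_id)
        self.entries[user_id] = (now, feed)
        return feed


def slow_rebuild(user_id: str) -> List[str]:
    # Placeholder for querying the tweet, favorite, and follow read dbs.
    return [f"latest tweets for {user_id}"]


cache = FeedCache(ttl_seconds=60, rebuild=slow_rebuild)
print(cache.get_feed("alice"))  # first call rebuilds
print(cache.get_feed("alice"))  # second call is served from the cache
```

A shorter TTL gives fresher feeds at the cost of more load on the read replicas, which is the latency trade-off this design accepts.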

Detailed component design

flowchart TD

client[client] --> api_gateway{API Gateway}

api_gateway --> |/login, /create-user|user_service[User Service]

api_gateway --> |/get-feed, /view-user, /get-favorite|feed_service[Feed Service]

api_gateway --> |/tweet, /favorite, /share|tweet_service[Tweet Service]

api_gateway --> |/follow, /unfollow|follow_service[Follow Service]

user_service --> |/create-user, /login|user_db_master[(User DB Master)]

user_db_master --> user_db_read_replica[(User DB Read)]

tweet_service --> |/tweet, /share|tweet_write_mq{{Tweet MQ}}

tweet_write_mq --> tweet_db_master[(Tweet DB Master)]

tweet_db_master --> tweet_db_read_replica[(Tweet DB Read)]

tweet_service -->|/favorite|favorite_write_mq{{Favorite MQ}}

favorite_write_mq --> favorite_service[Favorite Service]

favorite_service <--> user_db_read_replica

favorite_service <--> tweet_db_read_replica

favorite_service --> favorite_db_master[(Favorite DB Master)]

favorite_db_master --> favorite_db_read_replica[(Favorite DB Read)]

follow_service --> follow_write_mq{{Follow MQ}}

follow_write_mq --> follow_db_master[(Follow DB Master)]

follow_db_master --> follow_db_read_replica[(Follow DB Read)]

feed_service <--> feed_cache

tweet_db_read_replica --> tweet_feed_mq{{Tweet Feed MQ}}

tweet_feed_mq --> feed_cache

follow_db_read_replica --> follow_feed_mq{{Follow Feed MQ}}

follow_feed_mq --> feed_cache

favorite_db_read_replica --> favorite_feed_mq{{Favorite Feed MQ}}

favorite_feed_mq --> feed_cache

feed_cache <--> tweet_db_read_replica

feed_cache <--> follow_db_read_replica

feed_cache <--> favorite_db_read_replica

Trade offs/Tech choices

Message queues are used extensively for scalability. This comes at the cost of feed latency: feeds could be minutes out of date, depending on cache update requirements.
User DB, Favorite DB, and Follow DB can all be relational since their rows are small and close to fixed size.
Tweet DB is non-relational since the tweet content is highly variable in size. Furthermore, some non-relational dbs offer built-in full-text search, which would help with the textual tweet search listed under future improvements.

Failure scenarios/bottlenecks

Feed Cache - if the cache has many misses (or many out-of-date entries), more reads must be performed against the favorite, tweet, and follow read dbs. It's also possible that not all the read dbs are up-to-date, as newly written data may not have been replicated yet. In that case the cache is refreshed from stale replicas, reports itself as up-to-date, and returns potentially incomplete results (e.g., missing a new followee's tweets) until the next refresh.
As tweets are sharded by user, a famous user with a large number of followers could cause shard imbalances (a hot shard) as well as delays in feed processing, since each of their tweets must reach every follower's feed.
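The hot-shard concern can be made concrete with a small simulation. This is a sketch under stated assumptions: the shard count, the hash-based shard_for function, and the follower counts are all hypothetical, and the "load" is simply the number of feed updates triggered if every user tweets once.

```python
import hashlib
from collections import Counter
from typing import Dict

NUM_SHARDS = 8  # hypothetical shard count


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user id to a shard with a stable hash (same result on every run)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fanout_load(followers_by_user: Dict[str, int]) -> Counter:
    """Feed updates triggered per shard if every user tweets once."""
    load: Counter = Counter()
    for user_id, followers in followers_by_user.items():
        load[shard_for(user_id)] += followers
    return load


# 1000 ordinary users with the average 500 followers, plus one celebrity.
users = {f"user{i}": 500 for i in range(1000)}
users["celebrity"] = 50_000_000

load = fanout_load(users)
for shard in range(NUM_SHARDS):
    marker = "  <- celebrity's shard" if shard == shard_for("celebrity") else ""
    print(f"shard {shard}: {load[shard]:>12,} feed updates{marker}")
```

Hashing spreads ordinary users evenly, but a single account with tens of millions of followers still dominates whichever shard it lands on. Possible mitigations, such as fetching celebrity tweets at read time instead of pushing them to every feed, are outside the scope of this design.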

Future improvements

Implement hashtag service
Implement textual tweet search
Implement user search
Implement private/public tweets