Design Twitter - System Design

System requirements

Functional:

Create an account with the system (authentication through IAM)
Provide account details and/or change them
Search for other users based on metadata
Follow another user
Submit a tweet:
1. Includes up to 280 characters of text
2. May include multimedia content
View your feed: tweets submitted or favorited by people you follow
Get updates to your feed pushed to you
Like and/or favorite a specific tweet

Non-Functional:

Support 1.5B user accounts, growing by 20% every year
Expect that 250M accounts will be active each day and that:
1. Each active account will submit or favorite an average of 10 tweets per day (2.5B tweets per day)
A submitted tweet will appear in other people's feeds within 1 second
Content from a newly followed account will appear in a user's feed within 1 second
Service must be always available (99.99% up-time)
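
As a rough check on what the 99.99% target allows, the sketch below converts an availability percentage into a downtime budget. The function name is illustrative and not part of the design.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float) -> float:
    """Minutes of allowed downtime over a period for a given availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# 99.99% leaves roughly 52.6 minutes of downtime per year.
per_year_minutes = downtime_budget_minutes(99.99, 365)
# Over a single day the same target leaves under 9 seconds.
per_day_seconds = downtime_budget_minutes(99.99, 1) * 60

print(f"{per_year_minutes:.1f} min/year, {per_day_seconds:.1f} s/day")
```

A budget this small means deployments, failovers, and schema changes all have to happen without taking the service offline.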

Capacity estimation

User accounts: 1.5B, growing by 20% every year

Daily active accounts: 250M

Daily tweets and favorites: 2.5B

Daily storage increase for tweets: 300 bytes * 2.5B = 750GB, a bit under 1TB

Account storage, including name, email, and various metadata, at 2KB per account: 3TB

Multimedia storage on CDN: assume 5% of tweets/favorites have multimedia, averaging 250KB each. 125M * 250KB is about 31TB added per day
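
The arithmetic behind these estimates can be rechecked with a short script. The constants come from the requirements above; decimal units (1TB = 10^12 bytes) and the derived average request rate are assumptions added for illustration.

```python
# Inputs taken from the non-functional requirements.
TOTAL_ACCOUNTS = 1_500_000_000
DAILY_ACTIVE_ACCOUNTS = 250_000_000
ACTIONS_PER_ACCOUNT_PER_DAY = 10      # tweets or favorites
BYTES_PER_TWEET = 300
BYTES_PER_ACCOUNT = 2_000
MULTIMEDIA_FRACTION = 0.05
BYTES_PER_MEDIA = 250_000

TB = 10**12                           # assumption: decimal terabytes
SECONDS_PER_DAY = 86_400

daily_actions = DAILY_ACTIVE_ACCOUNTS * ACTIONS_PER_ACCOUNT_PER_DAY  # 2.5B
avg_actions_per_second = daily_actions / SECONDS_PER_DAY             # average, not peak

tweet_tb_per_day = daily_actions * BYTES_PER_TWEET / TB
account_tb = TOTAL_ACCOUNTS * BYTES_PER_ACCOUNT / TB
media_tb_per_day = daily_actions * MULTIMEDIA_FRACTION * BYTES_PER_MEDIA / TB

print(f"actions/day:        {daily_actions:,}")
print(f"avg actions/second: {avg_actions_per_second:,.0f}")
print(f"tweet storage/day:  {tweet_tb_per_day:.2f} TB")
print(f"account storage:    {account_tb:.2f} TB")
print(f"media storage/day:  {media_tb_per_day:.2f} TB")
```

Peak traffic will be well above the daily average, so the per-second figure is a floor for sizing rather than a target.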

API design

All HTTP calls will use authentication headers carrying the IAM user info. Each API call will validate that the IAM user is in an authorization group that is allowed to make that request, and that the user_id matches the account_id

Furthermore, we will perform rate limiting through the Load Balancer to prevent users from abusing the system.
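
The design does not say which rate limiting algorithm the Load Balancer applies. A token bucket is one common choice; the sketch below is a minimal in-process version under that assumption, with a hypothetical class name and an injectable clock so it can be exercised without sleeping.

```python
import time
from typing import Optional


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 now: Optional[float] = None) -> None:
        self.rate_per_sec = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity  # start full so a new client can burst
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True and consume a token if the request may proceed."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the time that has passed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_sec)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per account_id; a rejected request would map to HTTP 429.
bucket = TokenBucket(rate_per_sec=1.0, capacity=2, now=0.0)
decisions = [bucket.allow(now=0.0), bucket.allow(now=0.0), bucket.allow(now=0.0)]
```

In a real deployment the bucket state would live in the Load Balancer or a shared store rather than in one process.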

POST /api/v1.0/users

{"user_id": uuid, "username": string, "birthdate": timestamp, ...}

If we don't have an existing user with that user_id, and all the fields are valid, and the username is unique:

200 {"account_id": uuid, "user_id": uuid, "username": string, "birthdate": timestamp, ...}

Otherwise 400: Bad Request
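
The checks for creating a user can be sketched with an in-memory store. The class and the choice of required fields are hypothetical; a real Users Service would enforce the same rules with database constraints.

```python
import uuid
from typing import Dict, Set, Tuple


class UserStore:
    """In-memory stand-in for the Users table, for illustration only."""

    def __init__(self) -> None:
        self.by_user_id: Dict[str, dict] = {}
        self.usernames: Set[str] = set()

    def create(self, payload: dict) -> Tuple[int, dict]:
        user_id = payload.get("user_id")
        username = payload.get("username")
        # Assumption: user_id and username are the only required fields.
        if not user_id or not username:
            return 400, {"error": "Bad Request"}
        # Reject an existing user_id or a username that is already taken.
        if user_id in self.by_user_id or username in self.usernames:
            return 400, {"error": "Bad Request"}
        account = dict(payload, account_id=str(uuid.uuid4()))
        self.by_user_id[user_id] = account
        self.usernames.add(username)
        return 200, account


store = UserStore()
status, body = store.create({"user_id": "u-1", "username": "david"})
```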

PUT /api/v1.0/users/{account_id}

{"account_id": uuid, "user_id": uuid, "username": string, "birthdate": timestamp, ...}

If all the fields are valid:

200 {"account_id": uuid, "user_id": uuid, "username": string, "birthdate": timestamp, ...}

Otherwise 400: Bad Request

GET /api/v1.0/users?username_filter=david&cursor=123abc&limit=100

200 Return a list of JSON objects for accounts matching the filters.

POST /api/v1.0/followers

{"account_id": uuid, "follow_account_id": uuid}

If both account_ids are valid:

200 {"follow_id": uuid, "account_id": uuid, "follow_account_id": uuid}

DELETE /api/v1.0/followers/{follow_id}

POST /api/v1.0/tweets

{"account_id": uuid, "message": str, "multimedia_link": str}

Check that multimedia_link is well-formed (and points to a supported content type) and that the message has at most 280 characters and they are all valid tweet characters. If so:

200 {"tweet_id": uuid, "submitted_date": timestamp, "account_id": uuid, "message": str, "multimedia_link": str}
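
A minimal sketch of these checks follows. Several details are assumptions, since the design does not define them: "valid tweet characters" is taken to mean printable characters plus newlines, length is counted in Unicode code points, and the supported content types are guessed from a short list of file extensions.

```python
import posixpath
from typing import List
from urllib.parse import urlparse

MAX_TWEET_CHARS = 280
# Assumption: supported media is identified by file extension.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".mp4"}


def validate_tweet(message: str, multimedia_link: str = "") -> List[str]:
    """Return a list of validation errors; an empty list means the tweet is valid."""
    errors = []
    if len(message) > MAX_TWEET_CHARS:
        errors.append("message too long")
    if any(not (ch.isprintable() or ch == "\n") for ch in message):
        errors.append("message contains unsupported characters")
    if multimedia_link:
        parsed = urlparse(multimedia_link)
        extension = posixpath.splitext(parsed.path)[1].lower()
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("multimedia_link is not a well-formed URL")
        elif extension not in SUPPORTED_EXTENSIONS:
            errors.append("unsupported multimedia type")
    return errors
```

A non-empty result would map to a 400 response; an empty one lets the Tweets Service go on to store the tweet.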

GET /api/v1.0/feeds/{account_id}?cursor=123abc&limit=20&order_by=submitted_date

200 Return a list of tweet JSON objects
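
The cursor and limit parameters suggest cursor-based pagination. The sketch below shows one simple way to implement it, assuming an opaque cursor that encodes a position in the stored feed list; the function names and response shape are hypothetical.

```python
import base64
import json
from typing import List, Optional


def encode_cursor(position: int) -> str:
    """Pack the next read position into an opaque, URL-safe string."""
    raw = json.dumps({"pos": position}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> int:
    raw = base64.urlsafe_b64decode(cursor.encode("ascii"))
    return json.loads(raw)["pos"]


def get_feed_page(tweet_ids: List[str], cursor: Optional[str] = None,
                  limit: int = 20) -> dict:
    """Return one page of the feed plus the cursor for the following page."""
    start = decode_cursor(cursor) if cursor else 0
    page = tweet_ids[start:start + limit]
    next_position = start + len(page)
    has_more = next_position < len(tweet_ids)
    return {
        "items": page,
        "next_cursor": encode_cursor(next_position) if has_more else None,
    }
```

A position-based cursor can skip or repeat items when new tweets are added at the head of the feed between requests. A cursor built from the last seen submitted_date and tweet_id avoids that, at the cost of a more involved query.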

POST /api/v1.0/likes

{"account_id": uuid, "tweet_id": uuid}

If tweet_id is valid:

200 {"account_id": uuid, "tweet_id": uuid}

registerForFeedUpdates(account_id) -> socket

This will set up a web socket connection between the client and our service, over which feed updates will be pushed

Database design

We will have a SQL Database (MySQL) that holds transactional data:

table Users

account_id: uuid -- Primary key

user_id: uuid UNIQUE -- Index

creation_date: timestamp

username: string

birthdate: timestamp

table Followers

follow_id: uuid -- Primary key

account_id: uuid -- Index

follow_account_id: uuid -- Index

creation_date: timestamp
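
The two transactional tables can be sketched as DDL. The example uses Python's built-in sqlite3 module as a stand-in for MySQL, so the column types are approximations. The UNIQUE constraint on username is an assumption drawn from the uniqueness check in the create-user API, and followers_of is a hypothetical helper.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Users (
    account_id    TEXT PRIMARY KEY,
    user_id       TEXT NOT NULL UNIQUE,
    creation_date TEXT NOT NULL,
    username      TEXT NOT NULL UNIQUE,  -- assumption: enforced by the DB
    birthdate     TEXT
);
CREATE TABLE Followers (
    follow_id         TEXT PRIMARY KEY,
    account_id        TEXT NOT NULL REFERENCES Users(account_id),
    follow_account_id TEXT NOT NULL REFERENCES Users(account_id),
    creation_date     TEXT NOT NULL
);
CREATE INDEX idx_followers_account  ON Followers(account_id);
CREATE INDEX idx_followers_followed ON Followers(follow_account_id);
"""


def followers_of(conn: sqlite3.Connection, account_id: str) -> list:
    """Accounts that follow `account_id`, served by the follow_account_id index."""
    rows = conn.execute(
        "SELECT account_id FROM Followers "
        "WHERE follow_account_id = ? ORDER BY account_id",
        (account_id,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?, ?, ?)",
    [("a1", "u1", "2024-01-01", "alice", None),
     ("a2", "u2", "2024-01-01", "bob", None),
     ("a3", "u3", "2024-01-01", "carol", None)],
)
conn.executemany(
    "INSERT INTO Followers VALUES (?, ?, ?, ?)",
    [("f1", "a2", "a1", "2024-01-02"),   # bob follows alice
     ("f2", "a3", "a1", "2024-01-02")],  # carol follows alice
)
conn.commit()
```

The index on follow_account_id matters most for feed updates, since every new tweet needs the list of accounts that follow its author.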

We will have a NoSQL Database (Cassandra) that holds non-transactional data:

table Tweets

tweet_id: uuid -- Primary key

account_id: uuid -- Index

creation_date: timestamp

message_text: string

multimedia_link: string

table Likes:

tweet_id: uuid -- Index

account_id: uuid -- Index

Primary Key (tweet_id, account_id)

creation_date: timestamp

table Retweet:

tweet_id: uuid -- Index

account_id: uuid -- Index

Primary Key (tweet_id, account_id)

creation_date: timestamp

table Comment:

tweet_id: uuid -- Index

account_id: uuid -- Index

Primary Key (tweet_id, account_id)

comment_text: string

creation_date: timestamp

table Feeds:

account_id: uuid -- Primary key

messages: list of uuid

last_update: timestamp

High-level design

We have a Load Balancer to distribute requests across multiple servers.

There is a Users Service that handles Users and Followers, allowing the client to create and modify their User info, and allowing the client to add and remove users that they wish to follow

There is a Tweets Service that handles Tweets (including Retweets and Comments). A client can create a message and modify it. These messages are added to the feeds of the author's followers.

There is a Metadata Service that updates client Feeds and related tables in response to actions on the Tweets Service. It runs on Kubernetes so that it can scale dynamically to provide the needed computational resources.

There is a MySQL database used to store User details, including Followers. There is a Redis cache that stores the most recently accessed Users and Followers

There is a Cassandra database that stores our Tweets, Likes, and Feeds. There is a Memcached cache to speed up access

We utilize a CDN to store multimedia content included in Tweets.

Finally there is a message pump service that connects to the client over Web Sockets to push updates to the client

Request flows

When a client posts a Tweet, the POST is routed through the Load Balancer to the Tweets Service. The Tweets Service checks the validity of the POST (auth and syntax) and updates the data. It also tells the Metadata service to update Feeds, etc., that are impacted by the tweet.
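
The feed update step in this flow can be sketched as fan-out on write: when a tweet is stored, its id is pushed onto the feed of every follower. The class, the in-memory dictionaries, and the feed length cap are assumptions for illustration; in the design the Metadata Service would do this against the Followers and Feeds tables.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Set

MAX_FEED_LENGTH = 1000  # assumption: keep only the newest entries per feed


class FeedFanout:
    def __init__(self) -> None:
        # author account_id -> accounts that follow that author
        self.followers: Dict[str, Set[str]] = defaultdict(set)
        # account_id -> newest-first tweet ids
        self.feeds: Dict[str, Deque[str]] = defaultdict(
            lambda: deque(maxlen=MAX_FEED_LENGTH)
        )

    def follow(self, account_id: str, follow_account_id: str) -> None:
        self.followers[follow_account_id].add(account_id)

    def on_tweet(self, author_id: str, tweet_id: str) -> List[str]:
        """Prepend the tweet to each follower's feed; return the feeds touched."""
        updated = []
        for follower_id in sorted(self.followers[author_id]):
            self.feeds[follower_id].appendleft(tweet_id)
            updated.append(follower_id)
        return updated


fanout = FeedFanout()
fanout.follow("bob", "alice")
fanout.follow("carol", "alice")
touched = fanout.on_tweet("alice", "t1")
```

The list of touched feeds is what would be handed to the push path so connected clients can be notified. Accounts with very large follower counts make this loop expensive, which is why some systems merge those authors' tweets at read time instead.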

Detailed component design

Updates to the client are sent over Web Sockets, through the Message Pump
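
The push path can be sketched without a real web socket library by modelling each client connection as an asyncio queue. This is an in-process illustration only: the class and method names are hypothetical, and in the design the updates would arrive through Kafka and leave over web sockets.

```python
import asyncio
from collections import defaultdict
from typing import Dict, Set


class MessagePump:
    """Routes feed updates to every open connection for an account."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, Set[asyncio.Queue]] = defaultdict(set)

    def register_for_feed_updates(self, account_id: str) -> asyncio.Queue:
        # Each queue stands in for one client connection.
        queue: asyncio.Queue = asyncio.Queue()
        self.subscribers[account_id].add(queue)
        return queue

    def unregister(self, account_id: str, queue: asyncio.Queue) -> None:
        self.subscribers[account_id].discard(queue)

    def publish(self, account_id: str, update: dict) -> int:
        """Deliver an update to all of the account's connections."""
        for queue in self.subscribers[account_id]:
            queue.put_nowait(update)
        return len(self.subscribers[account_id])


async def demo() -> tuple:
    pump = MessagePump()
    phone = pump.register_for_feed_updates("bob")
    laptop = pump.register_for_feed_updates("bob")
    pump.publish("bob", {"tweet_id": "t1"})
    return await phone.get(), await laptop.get()


received = asyncio.run(demo())
```

Keeping a set of connections per account lets the same user receive updates on several devices at once.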

Trade offs/Tech choices

We choose a Load Balancer to provide redundancy across servers and to enable us to scale horizontally. Also, if our service gets big enough, we can shard the DB to improve performance
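
If we do shard the DB, each row needs a deterministic mapping to a shard. The sketch below assumes sharding by account_id with a simple hash-modulo scheme; the function name and shard count are hypothetical.

```python
import hashlib


def shard_for(account_id: str, num_shards: int) -> int:
    """Map an account_id to a shard index in [0, num_shards)."""
    # A stable hash is used because Python's built-in hash() is salted per process.
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


shard = shard_for("3f2b8c1e-0000-4000-8000-000000000001", num_shards=16)
```

Hash-modulo placement moves most keys when the shard count changes. Consistent hashing is a common alternative when shards are expected to be added over time.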

We use a SQL DB (MySQL) to ensure that Account-related operations are transactional

We use a NoSQL DB (Cassandra) to support sharding and rapid updating

We use a Cache to improve DB performance (Redis)
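
The design does not say how the cache is populated. One common approach is cache-aside: read from the cache first, fall back to the DB on a miss, then store the result with a time-to-live. The sketch below assumes that pattern, with a plain dictionary standing in for Redis and hypothetical names throughout.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class CacheAside:
    def __init__(self, load_from_db: Callable[[str], Any],
                 ttl_seconds: float = 60.0) -> None:
        self.load_from_db = load_from_db
        self.ttl_seconds = ttl_seconds
        self.entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)
        self.misses = 0

    def get(self, key: str, now: Optional[float] = None) -> Any:
        now = time.monotonic() if now is None else now
        entry = self.entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit, no DB read
        # Miss or expired: read from the DB and repopulate the cache.
        self.misses += 1
        value = self.load_from_db(key)
        self.entries[key] = (value, now + self.ttl_seconds)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write so the next read goes back to the DB."""
        self.entries.pop(key, None)


users_table = {"a1": {"username": "david"}}
cache = CacheAside(users_table.get, ttl_seconds=60.0)
profile = cache.get("a1", now=0.0)
```

Invalidating on writes keeps profile edits from being hidden behind a stale entry for the length of the TTL.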

We employ a Message Pump (Kafka) to send updates back to the client

Multimedia content is accessed through a CDN so that our services don't have to send large multimedia files themselves

Failure scenarios/bottlenecks

The biggest bottleneck concern is the processing triggered by each Tweet, Retweet, and Comment, since every one of them may require updating the Feeds of many followers. Kubernetes will enable us to scale the Metadata Service dynamically to absorb that load

There are many failure scenarios involving deleted Users and Tweets. Some of that can be handled in the Metadata Service

Future improvements

Detail explicit scenarios for dealing with deleted Users and Tweets.