System requirements


Functional:

  1. Create an account with the system (authentication through IAM)
  2. Provide account details and/or change them
  3. Search for other users based on metadata
  4. Follow another user
  5. Submit a tweet:
    1. Includes up to 280 characters of text
    2. May include multimedia content
  6. View your feed: tweets submitted or favorited by people you follow
  7. Get updates to your feed pushed to you
  8. Like and/or favorite a specific tweet


Non-Functional:

  1. Support 1.5B user accounts, growing by 20% every year
  2. Expect that 250M accounts will be active each day and that:
    1. Each active account will submit or favorite an average of 10 tweets per day (2.5B tweets per day)
  3. A submitted tweet will appear in other people's feeds within 1 second
  4. Content from a new follow will appear a user's feed within 1 second
  5. Service must be always available (99.99% up-time)



Capacity estimation

User accounts: 1.5B, growing by 20% every year

Daily active accounts: 250M

Daily tweets and favorites: 2.5B

Daily storage increase for tweets: 300 bytes * 2.5B is a bit under 1TB

Accounts storage, including name, email, various metadata for 2KB per account: 3TB

Multimedia storage on CDN: assume 5% of tweets/favorites have multimedia, averaging 250KB. 2TB added per day



API design

All HTTP calls will use authentication headers utilizing the IAM user info. Each API call with validate that IAM user is in an authorization group that is allowed to make that request--and that user_id matches the account_id


Furthermore, we will perform rate limiting through the Load Balancer to prevent users from abusing the system.


POST /api/v1.0/users

{"user_id": uuid, "username": string, "birthdate": timestamp, ...}


If we don't have an existing user with that user_id, and all the fields are valid, and the username is unique:


200 {"account_id": uuid, "user_id": uuid, "username": string, "birthdate": timestamp, ...}


Otherwise 400: Bad Request


PUT /api/v1.0/users/{account_id}

{"account_id": uuid, "user_id": uuid, "username": string, "birthdate": timestamp, ...}


If all the fields are valid:


200 {"account_id": uuid, "user_id": uuid, "username": string, "birthdate": timestamp, ...}


Otherwise 400: Bad Request


GET /api/v1.0/users?username_filter=david&cursor=123abc&limit=100


200 Return a list of JSON objects for accounts matching the filters.


POST /api/v1.0/followers

{"account_id": uuid, "follow_account_id": uuid}


If both account_ids are valid:

200 {"follow_id": uuid, "account_id": uuid, "follow_account_id": uuid}


DELETE /api/v1.0/followers/{follow_id}



POST /api/v1.0/tweets

{"account_id": uuid, "message": str, "multimedia_link": str}


Check that multimedia_link is well-formed (and points to a supported content type) and that the message has at most 280 characters and they are all valid tweet characters. If so:


200 {"tweet_id": uuid, "submitted_date": timestamp, "account_id": uuid, "message": str, "multimedia_link": str}


GET /api/v1.0/feeds/{account_id}?cursor=123abc&limit=20&order_by=submitted_date


Return list of message JSON objects


POST /api/v1.0/likes


{"account_id": uuid, "tweet_id": uuid}


If tweet_id is valid,

200 {"account_id": uuid, "message_id": uuid}


registerForFeedUpdates(account_id) -> socket


This will set up a web socket connection between the client and our service that feed updates will be pushed



Database design

We will have a SQL Database (MySQL) that holds transactional data:


table Users

account_id: uuid -- Primary key

user_id: uuid UNIQUE -- Index

creation_date: timestamp

username: string

birthdate: timestamp

table Followers

follow_id: uuid -- Primary key

account_id: uuid -- Index

follow_account_id: uuid -- Index

creation_date: timestamp


We will have a NoSQL Database (Cassandra) that holds non-transactional data:


table Tweets

tweet_id: uuid -- Primary key

account_id: uuid -- Index

creation_date: timestamp

message_text: string

multimedia_link: string


table Likes:

tweet_id: uuid -- Index

account_id: uuid -- Index

Primary Key (message_id, account_id)

creation_date: timestamp


table Retweet:

tweet_id: uuid -- Index

account_id: uuid -- Index

Primary Key (message_id, account_id)

creation_date: timestamp


table Comment:

tweet_id: uuid -- Index

account_id: uuid -- Index

Primary Key (message_id, account_id)

comment_text: string

creation_date: timestamp


table Feeds:

account_id: uuid -- Primary key

messages: list of uuid

last_update: timestamp



High-level design

We have a Load Balancer to distribute requests across multiple servers.

There is a Users Service that handles Users and Followers, allowing the client to create and modify their User info, and allowing the client to add and remove users that they wish to follow

There is a Tweets Service that handles Tweets (including Retweets and Comments). A client can create a message and modify it. These messages get attached to the feeds of users that follow them.

There is a Metadata Service that updates Feeds and Tables in response to the actions on the Tweets Service. It is responsible for updating client Feeds. It uses Kubernetes so that it can dynamically scale to provide the needed computational resources.

There is a MySQL database used to store User details, including Followers. There is a Redis cache that store most recent Users and Followers

There is a DynamoDB database that stores our Messages, Likes, and Feeds. There is a Memcached cache to speed up access

We utilize a CDN to store multimedia content included in Tweets.

Finally there is a message pump service that connects to the client over Web Sockets to push updates to the client



Request flows

When a client posts a Tweet, the POST is routed through the Load Balancer to the Tweets Service. The Tweets Service checks the validity of the POST (auth and syntax) and updates the data. It also tells the Metadata service to update Feeds, etc., that are impacted by the tweet.





Detailed component design

Updates to the client are sent over Web Sockets, through the Message Pump





Trade offs/Tech choices

We choose a Load Balancer to provide redundancy across servers and to enable us to scale horizontally. Also, if our service gets big enough, we can shard the DB to improve performance

We use a SQL DB (MySQL) for the to ensure that Account-related operations are transactional

We use a NoSQL DB to support sharding and rapid updating (Cassandara)

We use a Cache to improve DB performance (Redis)

We employ a Message Pump (Kafka) to send updates back to the client

Multimedia content is accessed through a CDN so that our services don't have to send large multimedia files themselves





Failure scenarios/bottlenecks

The biggest bottleneck concern is for managing all the processing needed after Tweets, Retweets, and Commenting. Kubernetes will enable us to scale dynamically

There are many failure scenarios involving deleted Users and Tweets. Some of that can be handled in the Metadata Service





Future improvements

  1. Detail explicit scenarios for dealing with deleted Users and Tweets.