System requirements
Functional:
- Create an account with the system (authentication through IAM)
- Provide account details and/or change them
- Search for other users based on metadata
- Follow another user
- Submit a tweet:
  - Includes up to 280 characters of text
  - May include multimedia content
- View your feed: tweets submitted or favorited by people you follow
- Get updates to your feed pushed to you
- Like and/or favorite a specific tweet
Non-Functional:
- Support 1.5B user accounts, growing by 20% every year
- Expect that 250M accounts will be active each day and that:
- Each active account will submit or favorite an average of 10 tweets per day (2.5B tweets per day)
- A submitted tweet will appear in other people's feeds within 1 second
- Content from a new follow will appear in a user's feed within 1 second
- Service must be highly available (99.99% uptime)
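To make the availability target concrete, the arithmetic below converts an uptime percentage into a yearly downtime budget. It is a quick sketch; the function name is ours and the year is taken as 365 days.

```python
# Downtime budget implied by an availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Return the minutes of downtime per year allowed by an availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


budget = downtime_minutes_per_year(0.9999)
print(f"99.99% uptime allows about {budget:.1f} minutes of downtime per year")
```

At 99.99% the budget is a little under an hour per year, which leaves little room for planned maintenance windows.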
Capacity estimation
User accounts: 1.5B, growing by 20% every year
Daily active accounts: 250M
Daily tweets and favorites: 2.5B
Daily storage increase for tweets: 300 bytes * 2.5B = 750GB (a bit under 1TB)
Accounts storage, including name, email, and various metadata, at 2KB per account: 3TB
Multimedia storage on CDN: assume 5% of tweets/favorites have multimedia, averaging 250KB each. 125M * 250KB is about 31TB added per day
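Working the assumptions above through explicitly is a useful sanity check. The sketch below uses decimal units; the constant names are ours, and the per-second average and 5-year projection are derived values that the requirements do not state directly. Under the stated assumptions, 5% of 2.5B daily events at 250KB each comes to roughly 31TB of multimedia per day.

```python
# Back-of-the-envelope capacity figures from the stated assumptions.
# Decimal units throughout: 1 KB = 1,000 bytes, 1 TB = 10**12 bytes.
KB, GB, TB = 10**3, 10**9, 10**12

TOTAL_ACCOUNTS = 1_500_000_000
ANNUAL_GROWTH = 0.20
DAILY_ACTIVE_ACCOUNTS = 250_000_000
EVENTS_PER_ACTIVE_ACCOUNT = 10      # tweets plus favorites per day
BYTES_PER_TWEET = 300
BYTES_PER_ACCOUNT = 2 * KB
MULTIMEDIA_FRACTION = 0.05
BYTES_PER_MULTIMEDIA = 250 * KB

daily_events = DAILY_ACTIVE_ACCOUNTS * EVENTS_PER_ACTIVE_ACCOUNT
daily_tweet_bytes = daily_events * BYTES_PER_TWEET
account_bytes = TOTAL_ACCOUNTS * BYTES_PER_ACCOUNT
daily_multimedia_bytes = daily_events * MULTIMEDIA_FRACTION * BYTES_PER_MULTIMEDIA

# Derived figures that are not in the requirements themselves.
average_events_per_second = daily_events / 86_400
accounts_in_5_years = TOTAL_ACCOUNTS * (1 + ANNUAL_GROWTH) ** 5

print(f"Daily tweets and favorites: {daily_events / 1e9:.1f}B")
print(f"Daily tweet storage:        {daily_tweet_bytes / GB:.0f} GB")
print(f"Accounts storage:           {account_bytes / TB:.0f} TB")
print(f"Daily multimedia storage:   {daily_multimedia_bytes / TB:.2f} TB")
print(f"Average events per second:  {average_events_per_second:,.0f}")
print(f"Accounts after 5 years:     {accounts_in_5_years / 1e9:.2f}B")
```

The per-second figure is an average; peak traffic will be higher, so provisioning should not be sized from it alone.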
API design
All HTTP calls will use authentication headers utilizing the IAM user info. Each API call will validate that the IAM user is in an authorization group that is allowed to make that request, and that the user_id matches the account_id.
Furthermore, we will perform rate limiting through the Load Balancer to prevent users from abusing the system.
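In practice the rate limit would be configured in the Load Balancer rather than written by hand, and the notes above do not say which algorithm it uses. As an illustration only, here is a minimal token-bucket sketch, one common approach. The class name, the limits, and the injectable `now` parameter (used to make the example deterministic) are all assumptions.

```python
import time
from typing import Optional


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at a steady rate."""

    def __init__(self, capacity: float, refill_per_second: float,
                 now: Optional[float] = None) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity  # start full so a new client can burst
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True if one request may proceed, consuming a token."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last_refill)
        # Add the tokens earned since the last call, never exceeding capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One bucket per account_id; the limits here are placeholder values.
bucket = TokenBucket(capacity=2, refill_per_second=1.0, now=0.0)
decisions = [bucket.allow(now=0.0), bucket.allow(now=0.0),
             bucket.allow(now=0.0), bucket.allow(now=1.0)]
print(decisions)  # [True, True, False, True]
```

A rejected request would typically be answered with 429 Too Many Requests.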
POST /api/v1.0/users
{"user_id": uuid, "username": string, "birthdate": timestamp, ...}
If we don't have an existing user with that user_id, and all the fields are valid, and the username is unique:
200 {"account_id": uuid, "user_id": uuid, "username": string, "birthdate": timestamp, ...}
Otherwise 400: Bad Request
PUT /api/v1.0/users/{account_id}
{"account_id": uuid, "user_id": uuid, "username": string, "birthdate": timestamp, ...}
If all the fields are valid:
200 {"account_id": uuid, "user_id": uuid, "username": string, "birthdate": timestamp, ...}
Otherwise 400: Bad Request
GET /api/v1.0/users?username_filter=david&cursor=123abc&limit=100
200 Return a list of JSON objects for accounts matching the filters.
POST /api/v1.0/followers
{"account_id": uuid, "follow_account_id": uuid}
If both account_ids are valid:
200 {"follow_id": uuid, "account_id": uuid, "follow_account_id": uuid}
DELETE /api/v1.0/followers/{follow_id}
POST /api/v1.0/tweets
{"account_id": uuid, "message": str, "multimedia_link": str}
Check that multimedia_link is well-formed (and points to a supported content type) and that the message has at most 280 characters and they are all valid tweet characters. If so:
200 {"tweet_id": uuid, "submitted_date": timestamp, "account_id": uuid, "message": str, "multimedia_link": str}
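The checks described for POST /api/v1.0/tweets could look roughly like the sketch below. Several details are assumptions, since the notes do not define them: "valid tweet characters" is taken to mean printable characters plus newlines, length is counted in Unicode code points, and the supported content type is inferred from a hypothetical list of file extensions on an https link.

```python
from typing import List, Optional
from urllib.parse import urlparse

MAX_TWEET_LENGTH = 280
# Hypothetical list of supported multimedia types, keyed by file extension.
SUPPORTED_EXTENSIONS = (".jpg", ".png", ".gif", ".mp4")


def validate_tweet(message: str, multimedia_link: Optional[str] = None) -> List[str]:
    """Return a list of validation errors; an empty list means the tweet is valid."""
    errors = []
    if len(message) > MAX_TWEET_LENGTH:
        errors.append(f"message exceeds {MAX_TWEET_LENGTH} characters")
    # Assumption: control characters other than newline are not valid.
    if any(not ch.isprintable() and ch != "\n" for ch in message):
        errors.append("message contains invalid characters")
    if multimedia_link is not None:
        parsed = urlparse(multimedia_link)
        if parsed.scheme != "https" or not parsed.netloc:
            errors.append("multimedia_link is not a well-formed https URL")
        elif not parsed.path.lower().endswith(SUPPORTED_EXTENSIONS):
            errors.append("multimedia_link points to an unsupported content type")
    return errors


print(validate_tweet("hello world"))
print(validate_tweet("x" * 281, "ftp://cdn.example.com/cat.png"))
```

A non-empty error list would map to a 400: Bad Request response, matching the other endpoints.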
GET /api/v1.0/feeds/{account_id}?cursor=123abc&limit=20&order_by=submitted_date
200 Return a list of tweet JSON objects
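The notes do not specify how the cursor is built. One possible scheme, sketched here as an assumption, is an opaque cursor that encodes the (submitted_date, tweet_id) of the last item returned, with the feed ordered newest first. The in-memory list stands in for the Feeds and Tweets tables, and all function names are hypothetical.

```python
import base64
import json
from typing import List, Optional, Tuple


def encode_cursor(submitted_date: int, tweet_id: str) -> str:
    """Pack the position of the last returned item into an opaque string."""
    raw = json.dumps([submitted_date, tweet_id]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> Tuple[int, str]:
    submitted_date, tweet_id = json.loads(
        base64.urlsafe_b64decode(cursor.encode("ascii")))
    return submitted_date, tweet_id


def get_feed_page(tweets: List[dict], cursor: Optional[str],
                  limit: int) -> Tuple[List[dict], Optional[str]]:
    """Return one page of the feed (newest first) and the cursor for the next page."""
    ordered = sorted(tweets,
                     key=lambda t: (t["submitted_date"], t["tweet_id"]),
                     reverse=True)
    if cursor is not None:
        position = decode_cursor(cursor)
        # Keep only items strictly older than the last one already returned.
        ordered = [t for t in ordered
                   if (t["submitted_date"], t["tweet_id"]) < position]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = (encode_cursor(page[-1]["submitted_date"], page[-1]["tweet_id"])
                   if has_more else None)
    return page, next_cursor


feed = [{"tweet_id": f"t{i}", "submitted_date": i} for i in range(1, 6)]
first_page, next_cursor = get_feed_page(feed, None, limit=2)
second_page, _ = get_feed_page(feed, next_cursor, limit=2)
print([t["tweet_id"] for t in first_page], [t["tweet_id"] for t in second_page])
```

Unlike an offset, a position-based cursor stays stable when new tweets arrive at the head of the feed between requests.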
POST /api/v1.0/likes
{"account_id": uuid, "tweet_id": uuid}
If tweet_id is valid:
200 {"account_id": uuid, "tweet_id": uuid}
registerForFeedUpdates(account_id) -> socket
This will set up a Web Socket connection between the client and our service, over which feed updates will be pushed.
Database design
We will have a SQL Database (MySQL) that holds transactional data:
table Users
account_id: uuid -- Primary key
user_id: uuid UNIQUE -- Index
creation_date: timestamp
username: string
birthdate: timestamp
table Followers
follow_id: uuid -- Primary key
account_id: uuid -- Index
follow_account_id: uuid -- Index
creation_date: timestamp
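To show how the two transactional tables might be declared, the sketch below uses Python's built-in sqlite3 as a stand-in for MySQL, so column types and syntax would differ in a real deployment. The unique constraint on username follows the uniqueness rule in the API section; the unique (account_id, follow_account_id) pair and the foreign keys are our assumptions.

```python
import sqlite3
import uuid
from typing import Optional

SCHEMA = """
CREATE TABLE users (
    account_id    TEXT PRIMARY KEY,
    user_id       TEXT NOT NULL UNIQUE,
    creation_date TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    username      TEXT NOT NULL UNIQUE,
    birthdate     TEXT
);
CREATE TABLE followers (
    follow_id         TEXT PRIMARY KEY,
    account_id        TEXT NOT NULL REFERENCES users(account_id),
    follow_account_id TEXT NOT NULL REFERENCES users(account_id),
    creation_date     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (account_id, follow_account_id)  -- also serves as the account_id index
);
CREATE INDEX followers_by_followed ON followers (follow_account_id);
"""


def open_database() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves these off by default
    conn.executescript(SCHEMA)
    return conn


def create_user(conn: sqlite3.Connection, user_id: str, username: str) -> Optional[str]:
    """Insert a user; return the new account_id, or None if a constraint fails."""
    account_id = str(uuid.uuid4())
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (account_id, user_id, username) VALUES (?, ?, ?)",
                (account_id, user_id, username))
    except sqlite3.IntegrityError:
        return None
    return account_id


def follow(conn: sqlite3.Connection, account_id: str,
           follow_account_id: str) -> Optional[str]:
    """Insert a follow edge; return the follow_id, or None if it is invalid."""
    follow_id = str(uuid.uuid4())
    try:
        with conn:
            conn.execute(
                "INSERT INTO followers (follow_id, account_id, follow_account_id) "
                "VALUES (?, ?, ?)",
                (follow_id, account_id, follow_account_id))
    except sqlite3.IntegrityError:
        return None
    return follow_id
```

A `None` result corresponds to the 400: Bad Request cases in the API section, such as a duplicate username or a follow that references an unknown account.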
We will have a NoSQL Database (Cassandra) that holds non-transactional data:
table Tweets
tweet_id: uuid -- Primary key
account_id: uuid -- Index
creation_date: timestamp
message_text: string
multimedia_link: string
table Likes:
tweet_id: uuid -- Index
account_id: uuid -- Index
Primary Key (tweet_id, account_id)
creation_date: timestamp
table Retweet:
tweet_id: uuid -- Index
account_id: uuid -- Index
Primary Key (tweet_id, account_id)
creation_date: timestamp
table Comment:
tweet_id: uuid -- Index
account_id: uuid -- Index
Primary Key (tweet_id, account_id)
comment_text: string
creation_date: timestamp
table Feeds:
account_id: uuid -- Primary key
messages: list of uuid
last_update: timestamp
High-level design
We have a Load Balancer to distribute requests across multiple servers.
There is a Users Service that handles Users and Followers, allowing the client to create and modify their User info and to add and remove the users they follow.
There is a Tweets Service that handles Tweets (including Retweets and Comments). A client can create a tweet and modify it. Tweets get attached to the feeds of the users who follow the author.
There is a Metadata Service that updates client Feeds and related tables in response to actions on the Tweets Service. It runs on Kubernetes so that it can scale dynamically to provide the needed computational resources.
There is a MySQL database used to store User details, including Followers. There is a Redis cache that stores the most recently accessed Users and Followers.
There is a Cassandra database that stores our Tweets, Likes, and Feeds. There is a Memcached cache to speed up access.
We utilize a CDN to store multimedia content included in Tweets.
Finally, there is a Message Pump service that connects to the client over Web Sockets to push updates to the client.
Request flows
When a client posts a Tweet, the POST is routed through the Load Balancer to the Tweets Service. The Tweets Service checks the validity of the POST (auth and syntax) and stores the Tweet. It also tells the Metadata Service to update the Feeds and other data that are impacted by the Tweet.
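The feed update step in this flow can be pictured as a fan-out on write: when a tweet is stored, its id is prepended to the feed of every follower of the author. The sketch below is an in-memory illustration only. The class name and the feed length cap are hypothetical, and it handles new tweets but not favorites, which the requirements also place in the feed.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Set

MAX_FEED_LENGTH = 1000  # hypothetical cap on stored tweet ids per feed


class FeedStore:
    """In-memory stand-in for the Followers and Feeds tables."""

    def __init__(self, max_feed_length: int = MAX_FEED_LENGTH) -> None:
        # author account_id -> account_ids of the users who follow that author
        self._followers: Dict[str, Set[str]] = defaultdict(set)
        # account_id -> tweet ids, newest first, oldest dropped when full
        self._feeds: Dict[str, Deque[str]] = defaultdict(
            lambda: deque(maxlen=max_feed_length))

    def follow(self, account_id: str, follow_account_id: str) -> None:
        self._followers[follow_account_id].add(account_id)

    def on_tweet(self, author_account_id: str, tweet_id: str) -> int:
        """Prepend the tweet to each follower's feed; return the fan-out size."""
        followers = self._followers.get(author_account_id, set())
        for follower_id in followers:
            self._feeds[follower_id].appendleft(tweet_id)
        return len(followers)

    def feed(self, account_id: str) -> List[str]:
        return list(self._feeds.get(account_id, ()))


store = FeedStore(max_feed_length=2)
store.follow("alice", "bob")
store.follow("carol", "bob")
for tweet_id in ("t1", "t2", "t3"):
    store.on_tweet("bob", tweet_id)
print(store.feed("alice"))  # ['t3', 't2']
```

The cost of each tweet grows with the author's follower count, which is why this step is the one that needs to scale dynamically.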
Detailed component design
Updates to the client are sent over Web Sockets, through the Message Pump
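As a rough picture of the Message Pump's bookkeeping, the sketch below keeps a set of open connections per account and pushes each update to all of them. It is an in-process stand-in: an `asyncio.Queue` plays the role of a Web Socket connection, and the class and method names are hypothetical. A real deployment would sit behind Kafka and a Web Socket server.

```python
import asyncio
from collections import defaultdict
from typing import Dict, Set


class MessagePump:
    """Tracks open connections per account and pushes feed updates to them."""

    def __init__(self) -> None:
        self._connections: Dict[str, Set[asyncio.Queue]] = defaultdict(set)

    def register(self, account_id: str) -> asyncio.Queue:
        """Counterpart of registerForFeedUpdates: open a connection for an account."""
        connection: asyncio.Queue = asyncio.Queue()
        self._connections[account_id].add(connection)
        return connection

    def unregister(self, account_id: str, connection: asyncio.Queue) -> None:
        self._connections[account_id].discard(connection)
        if not self._connections[account_id]:
            del self._connections[account_id]

    def push(self, account_id: str, update: dict) -> int:
        """Send an update to every open connection; return how many received it."""
        connections = self._connections.get(account_id, ())
        for connection in connections:
            connection.put_nowait(update)
        return len(connections)


async def demo() -> dict:
    pump = MessagePump()
    connection = pump.register("alice")
    pump.push("alice", {"tweet_id": "t1"})
    update = await connection.get()
    pump.unregister("alice", connection)
    return update


received = asyncio.run(demo())
print(received)  # {'tweet_id': 't1'}
```

An account with no open connection simply receives nothing here; the client would catch up through GET /api/v1.0/feeds/{account_id} when it reconnects.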
Trade offs/Tech choices
We choose a Load Balancer to provide redundancy across servers and to enable us to scale horizontally. Also, if our service gets big enough, we can shard the DB to improve performance
We use a SQL DB (MySQL) to ensure that Account-related operations are transactional
We use a NoSQL DB (Cassandra) to support sharding and rapid updates
We use a Cache to improve DB performance (Redis)
We employ a Message Pump (Kafka) to send updates back to the client
Multimedia content is accessed through a CDN so that our services don't have to send large multimedia files themselves
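The sharding mentioned above needs a stable way to map a key to a shard. A minimal sketch, assuming we shard by account_id and use simple hash-modulo placement, is shown below; the function name and shard count are hypothetical. Cassandra does its own partitioning by partition key, so this mainly illustrates the idea for a manually sharded SQL DB.

```python
import hashlib
from collections import Counter


def shard_for(account_id: str, shard_count: int) -> int:
    """Map an account_id to a shard index that is stable across processes."""
    # A cryptographic hash is used for its even spread, not for security.
    # Python's built-in hash() is salted per process, so it is not suitable.
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


SHARD_COUNT = 8  # placeholder value
placement = Counter(shard_for(f"account-{i}", SHARD_COUNT) for i in range(10_000))
print(sorted(placement.items()))
```

Hash-modulo placement moves most keys when the shard count changes; consistent hashing reduces that movement and is worth considering if shards will be added over time.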
Failure scenarios/bottlenecks
The biggest bottleneck concern is the processing needed after each Tweet, Retweet, and Comment, such as updating the Feeds of followers. Kubernetes will enable us to scale this processing dynamically.
There are many failure scenarios involving deleted Users and Tweets. Some of that can be handled in the Metadata Service
Future improvements
- Detail explicit scenarios for dealing with deleted Users and Tweets.