System requirements
Functional:
a) Users can post text and media tweets including images and videos
b) Tweet length is limited to 140 characters
c) Hashtags for searching tweets
d) Users can follow other users and can view their tweets on their home page
e) Users can save tweets as favourite tweets
Non-Functional:
a) Availability
b) Scalability
c) Reliability
d) Tweets should be delivered within minutes
e) Notifications of tweets should be received within minutes
f) Eventual consistency
Capacity estimation
a) 500 M user base
b) 100 M daily active users
c) Each user follows 100 users on average
d) Each active user posts 1 tweet per day on average
e) Each active user logs in 10 times a day to view tweets
f) One out of 4 tweets contains an image of size 5 MB
g) One out of 5 tweets contains a video of size 100 MB
h) Write TPS -> 10^8 tweets/day / 10^5 sec/day (86,400 rounded up) -> 1000 writes/sec
i) Read TPS -> 10 feed views per tweet written * 1000 -> 10000 reads/sec
j) Image Storage -> 25 * 10^6 * 5 MB / day -> 125 TB / day
k) Video Storage -> 20 * 10^6 * 100 MB / day -> 2 PB / day
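The arithmetic above can be checked with a short script. This is a minimal sketch using only the figures listed in this section; the constant and function names are made up for illustration, and the day is rounded to 10^5 seconds as in the estimates.

```python
# Back-of-the-envelope capacity estimate from the figures listed above.
DAILY_ACTIVE_USERS = 100_000_000        # 100 M daily active users
TWEETS_PER_USER_PER_DAY = 1
FEED_VIEWS_PER_USER_PER_DAY = 10
SECONDS_PER_DAY = 100_000               # 86,400 rounded up to 10^5

IMAGE_EVERY_N_TWEETS = 4                # one out of 4 tweets has an image
VIDEO_EVERY_N_TWEETS = 5                # one out of 5 tweets has a video
IMAGE_SIZE_MB = 5
VIDEO_SIZE_MB = 100

MB_PER_TB = 1_000_000                   # decimal units
MB_PER_PB = 1_000_000_000


def estimate():
    """Return write/read TPS and daily media storage as a dict."""
    tweets_per_day = DAILY_ACTIVE_USERS * TWEETS_PER_USER_PER_DAY
    feed_views_per_day = DAILY_ACTIVE_USERS * FEED_VIEWS_PER_USER_PER_DAY
    image_tweets = tweets_per_day // IMAGE_EVERY_N_TWEETS
    video_tweets = tweets_per_day // VIDEO_EVERY_N_TWEETS
    return {
        "write_tps": tweets_per_day / SECONDS_PER_DAY,
        "read_tps": feed_views_per_day / SECONDS_PER_DAY,
        "image_tb_per_day": image_tweets * IMAGE_SIZE_MB / MB_PER_TB,
        "video_pb_per_day": video_tweets * VIDEO_SIZE_MB / MB_PER_PB,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value}")
```

Keeping the inputs as named constants makes it easy to rerun the estimate when an assumption changes, for example a different share of video tweets.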
API design
POST /content
Attributes - content, userID
PUT /content
Attributes - contentID, userID, actionID
GET /feed
Attributes - userID, lastAccessTime
POST /follow
Attributes - userID, targetUserID
DELETE /follow
Attributes - userID, targetUserID
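To make the endpoint semantics concrete, here is a rough in-memory stand-in for the five calls. It is not a server: the class and method names are hypothetical, timestamps are plain numbers passed in by the caller, and it assumes that GET /feed returns tweets from followed users posted after lastAccessTime, which the notes do not state explicitly.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class InMemoryTwitterApi:
    """Hypothetical in-memory model of the endpoints listed above."""
    tweets: list = field(default_factory=list)      # (contentID, userID, content, postedAt)
    following: set = field(default_factory=set)     # (userID, targetUserID) pairs
    actions: list = field(default_factory=list)     # (contentID, userID, actionID)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def post_content(self, user_id, content, now):          # POST /content
        content_id = next(self._ids)
        self.tweets.append((content_id, user_id, content, now))
        return content_id

    def put_content(self, content_id, user_id, action_id):  # PUT /content
        self.actions.append((content_id, user_id, action_id))

    def post_follow(self, user_id, target_user_id):         # POST /follow
        self.following.add((user_id, target_user_id))

    def delete_follow(self, user_id, target_user_id):       # DELETE /follow
        self.following.discard((user_id, target_user_id))

    def get_feed(self, user_id, last_access_time):          # GET /feed
        # Assumption: the feed contains tweets newer than lastAccessTime
        # from every user that user_id follows.
        followed = {t for (u, t) in self.following if u == user_id}
        return [tw for tw in self.tweets
                if tw[1] in followed and tw[3] > last_access_time]
```

For example, after `api.post_follow(1, 2)` and a tweet by user 2 at time 20, `api.get_feed(1, last_access_time=15)` returns that tweet, while a later `api.delete_follow(1, 2)` empties the feed again.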
Database design
User {
userID int (4 bytes)
login Varchar (15)
FirstName Varchar (15)
SecondName Varchar (15)
lastLogin TIMESTAMP (8)
}
Following {
userID int (4)
targetUserID int(4)
}
Tweet {
tweetID 8
userID 4
tweetText 256
imageURL 256
videoURL 256
tweetDate 8
noOfLikes
noOfDislikes
}
TweetAction {
actionID
actionType
tweetID
userID
actionDate
comments
}
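The sizes in the Tweet table give a rough figure for metadata storage, separate from the media stored in the object store. The sketch below reads the bare numbers as byte sizes and assumes 4 bytes each for noOfLikes and noOfDislikes, since the schema gives no size for them; both readings are assumptions, as are the names used.

```python
# Estimated size of one Tweet row, in bytes, from the schema above.
TWEET_ROW_BYTES = {
    "tweetID": 8,
    "userID": 4,
    "tweetText": 256,
    "imageURL": 256,
    "videoURL": 256,
    "tweetDate": 8,
    "noOfLikes": 4,       # assumed: no size given in the schema
    "noOfDislikes": 4,    # assumed: no size given in the schema
}

TWEETS_PER_DAY = 100_000_000   # 100 M daily active users * 1 tweet per day


def tweet_row_size():
    """Sum of all column sizes for a single Tweet row."""
    return sum(TWEET_ROW_BYTES.values())


def tweet_table_bytes_per_day():
    """Raw row storage added per day, ignoring indexes and replication."""
    return tweet_row_size() * TWEETS_PER_DAY


if __name__ == "__main__":
    print(tweet_row_size(), "bytes per row")
    print(tweet_table_bytes_per_day() / 1e9, "GB per day")
```

Under these assumptions the tweet rows grow by tens of gigabytes per day, which is small next to the media estimates, so media dominates the storage plan.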
High-level design
Post Tweets
Request flows
- User Logged In: The entry point when a user is authenticated and ready to post tweets.
- Post Tweets: The action the user takes to compose and submit a tweet.
- Object Store (Amazon S3): Stores media such as images and videos associated with the tweets.
- Tweet Table: Contains the text of the tweets and URLs to any included media.
- Fan Out Service: Gathers follower data to pre-compute timelines.
- Message Queues: Distributes the tweets to subscribers.
- Subscribers: Responsible for caching the new tweets.
- Cache with TTL Eviction Policy: Manages stored tweets with a Time-To-Live policy for cache management and eviction.
Here’s how this flow can be represented in a mermaid diagram:
graph TD
A[User Logged In] -->|Post Tweet| B[Post Tweets]
B -->|Store Images/Videos| C[Object Store Amazon S3]
B -->|Store Text Tweets + URLs| D[Tweet Table]
D -->|Trigger Fan Out| E[Fan Out Service]
E -->|Get Follower Data| F[Following Table]
E -->|Send Tweets to Message Queues| G[Message Queues]
G -->|Deliver Tweets| H[Subscribers]
H -->|Append New Tweets| I[Cache]
I -->|Eviction Policy| J[TTL Eviction Policy]
Explanation of the Request Flow Diagram:
- A[User Logged In] → B[Post Tweets]: The process begins when a user is logged in and posts a tweet.
- B[Post Tweets] → C[Object Store Amazon S3]: Images or videos associated with the tweet are stored in the Object Store.
- B[Post Tweets] → D[Tweet Table]: Textual content and URLs linking to stored media are saved in the Tweet Table.
- D[Tweet Table] → E[Fan Out Service]: The Fan Out service is triggered to handle the new tweet.
- E[Fan Out Service] → F[Following Table]: The Fan Out service queries the Following table to get relevant follower data.
- E[Fan Out Service] → G[Message Queues]: The tweets are sent to the Message Queue for later processing.
- G[Message Queues] → H[Subscribers]: Subscribers take the tweets from the queue.
- H[Subscribers] → I[Cache]: New tweets are appended to the local cache.
- I[Cache] → J[TTL Eviction Policy]: The cache operates under a TTL eviction policy to manage data expiration.
This diagram illustrates the complete flow of a user posting a tweet and how it propagates through the system, ensuring efficient caching and retrieval.
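The fan-out part of this flow (Fan Out Service, Message Queues, Subscribers, Cache) can be sketched in a few lines. This is a simplified in-process model, not the real infrastructure: a `deque` stands in for the message queue, a dictionary stands in for the Following table, and `TimelineCache` is a hypothetical class that expires individual entries after a TTL, whereas a store such as Redis applies TTLs per key.

```python
import time
from collections import defaultdict, deque


class TimelineCache:
    """Per-user timeline cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock                      # injectable for testing
        self._timelines = defaultdict(list)     # userID -> [(expires_at, tweetID)]

    def append(self, user_id, tweet_id):
        self._timelines[user_id].append((self.clock() + self.ttl, tweet_id))

    def get(self, user_id):
        now = self.clock()
        live = [(exp, tid) for exp, tid in self._timelines[user_id] if exp > now]
        self._timelines[user_id] = live         # lazily evict expired entries
        return [tid for _, tid in live]


def fan_out(tweet_id, author_id, followers_of, queue):
    """Fan Out Service: enqueue one delivery per follower of the author."""
    for follower_id in followers_of.get(author_id, []):
        queue.append((follower_id, tweet_id))


def drain(queue, cache):
    """Subscriber: append queued tweets to each follower's cached timeline."""
    while queue:
        follower_id, tweet_id = queue.popleft()
        cache.append(follower_id, tweet_id)
```

A typical run calls `fan_out(101, author_id=1, followers_of={1: [2, 3]}, queue=q)` followed by `drain(q, cache)`, after which `cache.get(2)` contains tweet 101 until the TTL elapses.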
Detailed component design
Twitter Tab
Trade offs/Tech choices
Database -> MySQL, because the data requires relations between users; users are sharded by userID
Tweet Table -> NoSQL DB (Cassandra) for a high volume of reads/writes; partition key: tweetID
Message Queue -> Kafka or Amazon Kinesis
Cache -> Redis to support TTL
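Sharding by userID can be illustrated with a simple modulo scheme. This is only one possible routing rule and the notes do not say which one is intended; `NUM_SHARDS` and both function names are hypothetical.

```python
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, chosen only for the example


def shard_for_user(user_id, num_shards=NUM_SHARDS):
    """Pick the shard that holds a user's User and Following rows."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return user_id % num_shards


def group_by_shard(user_ids, num_shards=NUM_SHARDS):
    """Group userIDs by shard so that each shard is queried once."""
    groups = defaultdict(list)
    for user_id in user_ids:
        groups[shard_for_user(user_id, num_shards)].append(user_id)
    return dict(groups)
```

Modulo routing is easy to reason about, but changing the shard count moves most users to a different shard; consistent hashing is a common alternative when shards are added often.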
Failure scenarios/bottlenecks
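Hot-key problem due to influencer users -> a single tweet must be fanned out to a very large number of follower timelines
Cache wasted on inactive users -> timelines are pre-computed and cached for users who rarely log in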
Future improvements
Hot-key problem due to influencer users -> Tweets posted by influencer users are not fanned out on write; they are merged into the followers' pre-generated timelines when the timeline is read.
Use a hybrid option for generating timeline views based on user analytics: fan-out on write for active users and fan-out on read for inactive users.
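One way to read the hybrid approach is sketched below. It is an illustration under several assumptions rather than a definitive design: the follower threshold for "influencer" is an arbitrary small number, the set of active users is given up front instead of coming from analytics, tweet IDs are assumed to increase over time so sorting by ID approximates time order, and all names are hypothetical.

```python
from collections import defaultdict

INFLUENCER_FOLLOWER_THRESHOLD = 3   # hypothetical cutoff, tiny for the demo


class HybridTimelines:
    def __init__(self, following, active_users):
        self.following = following            # userID -> set of followed userIDs
        self.active_users = active_users      # users with pre-generated timelines
        self.followers = defaultdict(set)     # reverse index of `following`
        for user, targets in following.items():
            for target in targets:
                self.followers[target].add(user)
        self.tweets_by_author = defaultdict(list)   # stands in for the Tweet table
        self.precomputed = defaultdict(list)        # stands in for the cache

    def is_influencer(self, user_id):
        return len(self.followers[user_id]) >= INFLUENCER_FOLLOWER_THRESHOLD

    def post(self, author_id, tweet_id):
        self.tweets_by_author[author_id].append(tweet_id)
        if self.is_influencer(author_id):
            return                            # skip fan-out; pulled at read time
        for follower in self.followers[author_id]:
            if follower in self.active_users:     # fan-out on write
                self.precomputed[follower].append(tweet_id)

    def timeline(self, user_id):
        followed = self.following.get(user_id, set())
        if user_id in self.active_users:
            # Pre-generated timeline plus influencer tweets merged at read time.
            pulled = [t for a in followed if self.is_influencer(a)
                      for t in self.tweets_by_author[a]]
            return sorted(self.precomputed[user_id] + pulled)
        # Inactive user: fan-out on read across everyone they follow.
        return sorted(t for a in followed for t in self.tweets_by_author[a])
```

With this split, an influencer's tweet costs one write regardless of follower count, and inactive users consume no cache space until they actually request a timeline.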