System requirements
Functional:
• User Registration/Authentication: Users can sign up and log in via email, phone, or social media accounts.
• Post Tweets: Users can post short text messages (tweets) with optional media such as images or videos.
• Follow Users: Users can follow/unfollow other users.
• Timeline: Users should see tweets from the people they follow, sorted chronologically or by relevance.
• Like/Retweet: Users can like or retweet tweets to show appreciation or share them with followers.
• Search: Users can search tweets and accounts by keywords, hashtags, or usernames.
• Notifications: Users should receive notifications for likes, retweets, mentions, and new followers.
Non-Functional:
• High Availability: The service must be available 24/7.
• Low Latency: Timelines must load quickly, and newly posted tweets should appear in near real time.
• Scalability: The platform should scale to handle millions of users and tweets per day.
• Consistency: Like, retweet, and follower counts should stay consistent across the system.
• Fault Tolerance: The system should be resilient to node failures.
Capacity estimation
1. Traffic Estimation
Total Users:
• Assume Twitter has 500 million users (active and inactive).
• We will focus on the Active Daily Users (ADUs), which could be around 20% of the total user base.
• Active Daily Users (ADUs): 500M × 20% = 100 million ADUs.
Number of Tweets:
• Assume each active user tweets 3 times per day on average.
• Total tweets per day: 100M ADUs × 3 tweets = 300 million tweets/day.
• Tweets per second (TPS): 300M tweets ÷ 86,400 seconds/day ≈ 3,472, rounded to ~3,500 TPS.
Number of Likes/Retweets:
• On average, each tweet gets 10 interactions (likes, retweets, or replies).
• Total interactions per day: 300M tweets × 10 interactions = 3 billion interactions/day.
• Interactions per second: 3B ÷ 86,400 ≈ 35,000 interactions per second (IPS).
Reads per Write:
• For each tweet posted, there are an average of 100 reads (timeline views).
• Reads per day: 300M tweets × 100 reads = 30 billion read requests/day.
• Reads per second: 30B ÷ 86,400 ≈ 347,000 reads per second.
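The traffic arithmetic above can be reproduced with a short script. This is a minimal sketch that only restates the assumptions already listed (500M users, 20% active daily, 3 tweets per active user, 10 interactions and 100 reads per tweet); the variable and function names are illustrative.

```python
# Back-of-the-envelope traffic estimate using the assumptions listed above.
SECONDS_PER_DAY = 86_400

total_users = 500_000_000        # active and inactive accounts
daily_active_percent = 20        # share of users active on a given day
tweets_per_active_user = 3       # average tweets per active user per day
interactions_per_tweet = 10      # likes, retweets, or replies per tweet
reads_per_tweet = 100            # timeline views per posted tweet

# Integer arithmetic keeps the daily totals exact.
daily_active_users = total_users * daily_active_percent // 100
tweets_per_day = daily_active_users * tweets_per_active_user
interactions_per_day = tweets_per_day * interactions_per_tweet
reads_per_day = tweets_per_day * reads_per_tweet


def per_second(events_per_day: int) -> float:
    """Convert a daily event count into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


tweets_per_second = per_second(tweets_per_day)
interactions_per_second = per_second(interactions_per_day)
reads_per_second = per_second(reads_per_day)

print(f"Daily active users:  {daily_active_users:,}")
print(f"Tweets/sec:          {tweets_per_second:,.0f}")
print(f"Interactions/sec:    {interactions_per_second:,.0f}")
print(f"Reads/sec:           {reads_per_second:,.0f}")
```

These are daily averages; peak traffic would be higher, and the estimate says nothing about how much higher.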
2. Data Storage Estimation
Tweet Storage:
• Each tweet can contain text (280 characters max), metadata (timestamp, user ID, location), and possibly media (images/videos).
• Assume:
• Text-only tweet size: ~300 bytes (280 characters + metadata).
• Media (images/videos): Let’s assume 1 in 5 tweets contains media, with the average media size being ~500KB (compressed image/video).
• Average tweet size: (4 × 300 bytes + 1 × 500KB) ÷ 5 ≈ 100KB per tweet.
• Total tweet storage per day:
• 300M tweets × 100KB = ~30TB/day.
• Total tweet storage per year:
• 30TB/day × 365 = 10.95PB/year (petabytes).
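The tweet storage figures can be checked by evaluating the weighted average directly. The sketch below assumes decimal units (500KB read as 500,000 bytes) and uses illustrative constant names. Under those assumptions it gives roughly 100KB per tweet, about 30TB/day, and about 11PB/year.

```python
# Tweet storage estimate from the assumptions above (decimal units assumed).
TEXT_TWEET_BYTES = 300           # 280 characters plus metadata
MEDIA_BYTES = 500_000            # ~500KB compressed image/video (assumed decimal KB)
TWEETS_PER_DAY = 300_000_000
DAYS_PER_YEAR = 365

# 1 in 5 tweets carries media: four text-only tweets plus one media attachment
# per five tweets, as in the formula above.
avg_tweet_bytes = (4 * TEXT_TWEET_BYTES + 1 * MEDIA_BYTES) // 5

daily_tweet_bytes = TWEETS_PER_DAY * avg_tweet_bytes
yearly_tweet_bytes = daily_tweet_bytes * DAYS_PER_YEAR

print(f"Average tweet size: {avg_tweet_bytes / 1e3:.1f} KB")
print(f"Tweet storage/day:  {daily_tweet_bytes / 1e12:.1f} TB")
print(f"Tweet storage/year: {yearly_tweet_bytes / 1e15:.2f} PB")
```

Media dominates the total, so the estimate is most sensitive to the assumed media ratio and media size.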
Like/Retweet Storage:
• Each like or retweet stores the user ID and tweet ID (around 20 bytes).
• Like/retweet storage per day: 3 billion interactions × 20 bytes = 60GB/day.
• Like/retweet storage per year: 60GB/day × 365 = ~22TB/year.
User Storage:
• Each user profile (user ID, username, bio, email, hashed password, etc.) may take ~1KB of storage.
• For 500M users:
• User profile storage: 500M users × 1KB = 500GB.
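The interaction and profile storage numbers follow the same pattern. A minimal sketch, assuming decimal units (1KB = 1,000 bytes) and illustrative names:

```python
# Like/retweet and user profile storage from the assumptions above.
INTERACTIONS_PER_DAY = 3_000_000_000
BYTES_PER_INTERACTION = 20       # user ID + tweet ID
TOTAL_USERS = 500_000_000
BYTES_PER_PROFILE = 1_000        # ~1KB per profile (assumed decimal KB)
DAYS_PER_YEAR = 365

daily_interaction_bytes = INTERACTIONS_PER_DAY * BYTES_PER_INTERACTION
yearly_interaction_bytes = daily_interaction_bytes * DAYS_PER_YEAR
user_profile_bytes = TOTAL_USERS * BYTES_PER_PROFILE

print(f"Like/retweet storage/day:  {daily_interaction_bytes / 1e9:.0f} GB")
print(f"Like/retweet storage/year: {yearly_interaction_bytes / 1e12:.1f} TB")
print(f"User profile storage:      {user_profile_bytes / 1e9:.0f} GB")
```

Compared with the tweet storage estimate, these totals are small, which suggests media storage is the main capacity driver.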
3. Bandwidth Estimation
Tweet Write Bandwidth:
• Total tweets per second: 3,500 TPS.
• Average tweet size: ≈100KB ((4 × 300 bytes + 1 × 500KB) ÷ 5).
• Write bandwidth for tweets: 3,500 tweets/sec × 100KB = 350MB/sec.
Like/Retweet Write Bandwidth:
• Total interactions per second: 35,000 IPS.
• Average like/retweet size: 20 bytes.
• Write bandwidth for likes/retweets: 35,000 interactions/sec × 20 bytes ≈ 0.7MB/sec.
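Write bandwidth is the event rate multiplied by the payload size. The sketch below recomputes the average tweet size from the storage assumptions (decimal units assumed) rather than hard-coding it; the helper name `write_bandwidth` is illustrative.

```python
# Write bandwidth estimate: events per second times bytes per event.
TWEETS_PER_SECOND = 3_500            # rounded rate from the traffic estimate
INTERACTIONS_PER_SECOND = 35_000     # rounded rate from the traffic estimate
BYTES_PER_INTERACTION = 20

# Average tweet size: 1 in 5 tweets carries ~500KB of media (decimal KB assumed).
avg_tweet_bytes = (4 * 300 + 1 * 500_000) // 5


def write_bandwidth(events_per_second: int, bytes_per_event: int) -> int:
    """Return the average write bandwidth in bytes per second."""
    return events_per_second * bytes_per_event


tweet_write_bps = write_bandwidth(TWEETS_PER_SECOND, avg_tweet_bytes)
interaction_write_bps = write_bandwidth(INTERACTIONS_PER_SECOND, BYTES_PER_INTERACTION)

print(f"Tweet write bandwidth:       {tweet_write_bps / 1e6:.0f} MB/sec")
print(f"Interaction write bandwidth: {interaction_write_bps / 1e6:.1f} MB/sec")
```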
API design
Authentication & Authorization
Before accessing any protected resource, users must authenticate; the service then authorizes each request based on the access token issued at login.
• Login API: Authenticates users and issues access tokens.
• Sign-up API: Registers new users.
API Endpoint: User Login (Authentication)
• URL: POST /api/v1/auth/login
API Endpoint: User Registration (Sign-up)
• URL: POST /api/v1/auth/signup
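One way the sign-up and login endpoints could behave is sketched below. Everything here is an assumption for illustration: the in-memory dictionaries stand in for the users table and a token store, the function names are hypothetical, and opaque random tokens are only one of several possible access token formats. The sketch stores a salted PBKDF2 hash rather than the password itself, matching the `password_hash` idea in the schema.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional, Tuple

PBKDF2_ITERATIONS = 100_000  # assumed work factor for this sketch

# Hypothetical in-memory stores standing in for a users table and a token store.
_users: Dict[str, Tuple[bytes, bytes]] = {}   # email -> (salt, password hash)
_tokens: Dict[str, str] = {}                  # access token -> email


def _hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted hash so the plain-text password is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS)


def signup(email: str, password: str) -> bool:
    """Sketch of POST /api/v1/auth/signup; returns False if the email is taken."""
    if email in _users:
        return False
    salt = secrets.token_bytes(16)
    _users[email] = (salt, _hash_password(password, salt))
    return True


def login(email: str, password: str) -> Optional[str]:
    """Sketch of POST /api/v1/auth/login; returns an access token or None."""
    record = _users.get(email)
    if record is None:
        return None
    salt, expected = record
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _hash_password(password, salt)):
        return None
    token = secrets.token_urlsafe(32)
    _tokens[token] = email
    return token


def authenticate(token: str) -> Optional[str]:
    """Resolve an access token back to the user it was issued to."""
    return _tokens.get(token)
```

Every other endpoint would then call something like `authenticate` on the token sent with the request before doing any work.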
API Endpoint: Post a Tweet
• URL: POST /api/v1/tweets
API Endpoint: Retrieve a Tweet
• URL: GET /api/v1/tweets/{tweet_id}
API Endpoint: Delete a Tweet
• URL: DELETE /api/v1/tweets/{tweet_id}
API Endpoint: Get User Timeline
• URL: GET /api/v1/users/{user_id}/timeline
API Endpoint: Like a Tweet
• URL: POST /api/v1/tweets/{tweet_id}/like
API Endpoint: Retweet a Tweet
• URL: POST /api/v1/tweets/{tweet_id}/retweet
• Description: Retweet a specific tweet.
API Endpoint: Follow a User
• URL: POST /api/v1/users/{user_id}/follow
API Endpoint: Unfollow a User
• URL: POST /api/v1/users/{user_id}/unfollow
API Endpoint: Get Followers List
• URL: GET /api/v1/users/{user_id}/followers
API Endpoint: Search Tweets
• URL: GET /api/v1/search/tweets
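The endpoint list above can be read as a routing table keyed by HTTP method and path template. The sketch below shows one way to resolve an incoming request to a handler and extract path parameters such as `{tweet_id}`. The handler names are hypothetical labels, not part of the design above, and a real service would normally rely on its web framework's router.

```python
import re
from typing import Dict, List, Optional, Tuple

# (HTTP method, path template, hypothetical handler name)
ROUTES: List[Tuple[str, str, str]] = [
    ("POST", "/api/v1/auth/login", "login"),
    ("POST", "/api/v1/auth/signup", "signup"),
    ("POST", "/api/v1/tweets", "post_tweet"),
    ("GET", "/api/v1/tweets/{tweet_id}", "get_tweet"),
    ("DELETE", "/api/v1/tweets/{tweet_id}", "delete_tweet"),
    ("GET", "/api/v1/users/{user_id}/timeline", "get_timeline"),
    ("POST", "/api/v1/tweets/{tweet_id}/like", "like_tweet"),
    ("POST", "/api/v1/tweets/{tweet_id}/retweet", "retweet"),
    ("POST", "/api/v1/users/{user_id}/follow", "follow_user"),
    ("POST", "/api/v1/users/{user_id}/unfollow", "unfollow_user"),
    ("GET", "/api/v1/users/{user_id}/followers", "get_followers"),
    ("GET", "/api/v1/search/tweets", "search_tweets"),
]


def _compile(template: str) -> "re.Pattern":
    """Turn a template like /tweets/{tweet_id} into an anchored regex."""
    parts = []
    for segment in template.strip("/").split("/"):
        if segment.startswith("{") and segment.endswith("}"):
            # A placeholder matches exactly one path segment.
            parts.append("(?P<%s>[^/]+)" % segment[1:-1])
        else:
            parts.append(re.escape(segment))
    return re.compile("^/" + "/".join(parts) + "$")


_COMPILED = [(method, _compile(template), handler) for method, template, handler in ROUTES]


def resolve(method: str, path: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (handler name, path parameters) for a request, or None if unmatched."""
    for route_method, pattern, handler in _COMPILED:
        if route_method != method:
            continue
        match = pattern.match(path)
        if match:
            return handler, match.groupdict()
    return None


print(resolve("GET", "/api/v1/tweets/42"))
print(resolve("POST", "/api/v1/users/7/follow"))
```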
Database design
Database Design for Twitter-like Service
CREATE TABLE users (
    user_id BIGINT PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    bio TEXT,
    profile_image_url TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE tweets (
    tweet_id BIGINT PRIMARY KEY,
    user_id BIGINT NOT NULL,
    content TEXT NOT NULL,
    media_url TEXT, -- optional, for images/videos
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (user_id) REFERENCES users(user_id) ON DELETE CASCADE
);
CREATE TABLE follows (
    follower_id BIGINT NOT NULL,
    followed_id BIGINT NOT NULL,
    followed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (follower_id, followed_id),
    FOREIGN KEY (follower_id) REFERENCES users(user_id) ON DELETE CASCADE,
    FOREIGN KEY (followed_id) REFERENCES users(user_id) ON DELETE CASCADE
);
CREATE TABLE retweets (
    retweet_id BIGINT PRIMARY KEY,
    tweet_id BIGINT NOT NULL,
    user_id BIGINT NOT NULL,
    retweeted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (tweet_id, user_id), -- a user can retweet a given tweet only once
    FOREIGN KEY (tweet_id) REFERENCES tweets(tweet_id) ON DELETE CASCADE,
    FOREIGN KEY (user_id) REFERENCES users(user_id) ON DELETE CASCADE
);
CREATE TABLE comments (
    comment_id BIGINT PRIMARY KEY,
    tweet_id BIGINT NOT NULL,
    user_id BIGINT NOT NULL,
    comment_text TEXT NOT NULL,
    commented_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (tweet_id) REFERENCES tweets(tweet_id) ON DELETE CASCADE,
    FOREIGN KEY (user_id) REFERENCES users(user_id) ON DELETE CASCADE
);
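To illustrate how this schema could serve the timeline requirement, the sketch below loads a trimmed-down version of the `users`, `tweets`, and `follows` tables into an in-memory SQLite database and reads a chronological timeline with a join. SQLite, the sample rows, and the `home_timeline` helper are assumptions for illustration. This query builds the timeline at read time, which is only one possible strategy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified versions of the tables above (SQLite types, fewer columns).
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL
);
CREATE TABLE tweets (
    tweet_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    content TEXT NOT NULL,
    created_at TEXT NOT NULL
);
CREATE TABLE follows (
    follower_id INTEGER NOT NULL REFERENCES users(user_id),
    followed_id INTEGER NOT NULL REFERENCES users(user_id),
    PRIMARY KEY (follower_id, followed_id)
);
""")

# Sample data (hypothetical): alice follows bob and carol.
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO follows VALUES (?, ?)", [(1, 2), (1, 3)])
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?)", [
    (10, 2, "bob's first tweet", "2024-01-01 09:00:00"),
    (11, 3, "carol says hi", "2024-01-01 10:00:00"),
    (12, 1, "alice's own tweet", "2024-01-01 11:00:00"),
    (13, 2, "bob again", "2024-01-01 12:00:00"),
])

# Tweets written by accounts the user follows, newest first.
TIMELINE_SQL = """
SELECT t.tweet_id, u.username, t.content
FROM follows AS f
JOIN tweets AS t ON t.user_id = f.followed_id
JOIN users AS u ON u.user_id = t.user_id
WHERE f.follower_id = ?
ORDER BY t.created_at DESC
LIMIT ?
"""


def home_timeline(db: sqlite3.Connection, user_id: int, limit: int = 20) -> list:
    """Return (tweet_id, author, content) rows for the user's timeline."""
    return db.execute(TIMELINE_SQL, (user_id, limit)).fetchall()


timeline = home_timeline(conn, 1)
for row in timeline:
    print(row)
```

At the read rates estimated earlier, running this join on every request would likely be too expensive without caching or precomputed timelines.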