System requirements


Functional:

User Registration/Authentication: Users can sign up and log in via email, phone, or social media accounts.


Post Tweets: Users can post short text messages (tweets) with optional media such as images or videos.


Follow Users: Users can follow/unfollow other users.


Timeline: Users should see tweets from the people they follow, sorted chronologically or by relevance.


Like/Retweet: Users can like or retweet tweets to show appreciation or share them with followers.


Search: Users can search tweets and accounts by keywords, hashtags, or usernames.

Notifications: Users should receive notifications for likes, retweets, mentions, and new followers.


Non-Functional:


High Availability: The service must be available 24/7.

Low Latency: The system must serve timelines quickly and deliver new tweets and updates in near real time.

Scalability: The platform should scale to handle hundreds of millions of users and hundreds of millions of tweets per day.

Consistency: Like, retweet, and follower counts should be consistent across the system.

Fault Tolerance: The system should be resilient to node failures.




Capacity estimation

1. Traffic Estimation


Total Users:


• Assume Twitter has 500 million users (active and inactive).

• We will focus on the Active Daily Users (ADUs), which could be around 20% of the total user base.

Active Daily Users (ADUs): 500M × 20% = 100 million ADUs.


Number of Tweets:


• Assume each active user tweets 3 times per day on average.

• Total tweets per day: 100M ADUs × 3 tweets = 300 million tweets/day.

• Tweets per second (TPS): 300M tweets ÷ 86,400 seconds (per day) ≈ 3,500 tweets per second (TPS).


Number of Likes/Retweets:


• On average, each tweet gets 10 interactions (likes, retweets, or replies).

• Total interactions per day: 300M tweets × 10 interactions = 3 billion interactions/day.

• Interactions per second: 3B ÷ 86,400 ≈ 35,000 interactions per second (IPS).


Reads per Write:


• For each tweet posted, there are an average of 100 reads (timeline views).

Reads per day: 300M tweets × 100 reads = 30 billion read requests/day.

• Reads per second: 30B ÷ 86,400 ≈ 347,222 reads per second.
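
The traffic figures above are a chain of multiplications, so they are easy to recompute when an assumption changes. A minimal sketch follows; the function and parameter names are illustrative and not part of the design, and the defaults are the assumptions stated above.

```python
SECONDS_PER_DAY = 86_400


def traffic_estimates(total_users=500_000_000, active_percent=20,
                      tweets_per_user=3, interactions_per_tweet=10,
                      reads_per_tweet=100):
    """Recompute the back-of-envelope traffic numbers from the stated assumptions."""
    active_users = total_users * active_percent // 100
    tweets_per_day = active_users * tweets_per_user
    interactions_per_day = tweets_per_day * interactions_per_tweet
    reads_per_day = tweets_per_day * reads_per_tweet
    return {
        "active_users": active_users,
        "tweets_per_day": tweets_per_day,
        "tweets_per_sec": tweets_per_day / SECONDS_PER_DAY,
        "interactions_per_day": interactions_per_day,
        "interactions_per_sec": interactions_per_day / SECONDS_PER_DAY,
        "reads_per_day": reads_per_day,
        "reads_per_sec": reads_per_day / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    for name, value in traffic_estimates().items():
        print(f"{name}: {value:,.0f}")
```

These are daily averages; peak traffic is typically several times higher, so the per-second values are a floor rather than a provisioning target.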


2. Data Storage Estimation


Tweet Storage:


• Each tweet can contain text (280 characters max), metadata (timestamp, user ID, location), and possibly media (images/videos).

• Assume:

Text-only tweet size: ~300 bytes (280 characters + metadata).

Media (images/videos): Let’s assume 1 in 5 tweets contains media, with the average media size being ~500KB (compressed image/video).

Average tweet size: (4 × 300 bytes + 1 × 500KB) ÷ 5 ≈ 100KB per tweet.

Total tweet storage per day:

• 300M tweets × ~100KB = ~30TB/day.

Total tweet storage per year:

• 30TB/day × 365 = 10.95PB/year (petabytes).


Like/Retweet Storage:


• Each like or retweet stores the user ID and tweet ID (around 20 bytes).

Like/retweet storage per day: 3 billion interactions × 20 bytes = 60GB/day.

Like/retweet storage per year: 60GB/day × 365 = ~22TB/year.


User Storage:


• Each user profile (user ID, username, bio, email, hashed password, etc.) may take ~1KB of storage.

• For 500M users:

User profile storage: 500M users × 1KB = 500GB.
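
The storage arithmetic can be checked the same way. The sketch below assumes decimal units (1KB = 1,000 bytes) and uses hypothetical names. With four text-only tweets for every media tweet, the average works out to roughly 100KB per tweet.

```python
KB, GB, TB, PB = 10**3, 10**9, 10**12, 10**15


def storage_estimates(tweets_per_day=300_000_000, text_bytes=300,
                      media_bytes=500 * KB, media_one_in=5,
                      interactions_per_day=3_000_000_000, interaction_bytes=20,
                      total_users=500_000_000, profile_bytes=1 * KB):
    """Recompute the storage figures from the stated assumptions (decimal units)."""
    # (4 text-only tweets + 1 media tweet) / 5, mirroring the formula in the text.
    avg_tweet_bytes = ((media_one_in - 1) * text_bytes + media_bytes) / media_one_in
    tweet_bytes_per_day = tweets_per_day * avg_tweet_bytes
    interaction_bytes_per_day = interactions_per_day * interaction_bytes
    return {
        "avg_tweet_kb": avg_tweet_bytes / KB,
        "tweet_tb_per_day": tweet_bytes_per_day / TB,
        "tweet_pb_per_year": tweet_bytes_per_day * 365 / PB,
        "interaction_gb_per_day": interaction_bytes_per_day / GB,
        "interaction_tb_per_year": interaction_bytes_per_day * 365 / TB,
        "profile_gb": total_users * profile_bytes / GB,
    }


if __name__ == "__main__":
    for name, value in storage_estimates().items():
        print(f"{name}: {value:,.2f}")
```

The unrounded yearly tweet figure comes out slightly above the 10.95PB quoted above, because that figure starts from the rounded 30TB/day. None of these numbers include replication, indexes, or backups.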


3. Bandwidth Estimation


Tweet Write Bandwidth:


• Total tweets per second: 3,500 TPS.

• Average tweet size: ~100KB.

Write bandwidth for tweets: 3,500 tweets/sec × 100KB = 350MB/sec.


Like/Retweet Write Bandwidth:


• Total interactions per second: 35,000 IPS.

• Average like/retweet size: 20 bytes.

Write bandwidth for likes/retweets: 35,000 interactions/sec × 20 bytes ≈ 0.7MB/sec.
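
Write bandwidth is event rate times payload size. The helper below is a hypothetical convenience for rerunning the numbers. It uses decimal megabytes, the rounded rates from the traffic estimate, and an average tweet of roughly 100KB, which is what (4 × 300 bytes + 500KB) ÷ 5 works out to.

```python
def write_bandwidth_mb_per_sec(events_per_sec, bytes_per_event):
    """Return sustained write bandwidth in decimal megabytes per second."""
    return events_per_sec * bytes_per_event / 1_000_000


if __name__ == "__main__":
    print("tweets:", write_bandwidth_mb_per_sec(3_500, 100_000), "MB/sec")
    print("likes/retweets:", write_bandwidth_mb_per_sec(35_000, 20), "MB/sec")
```

Tweet writes are dominated by media, which suggests sending media uploads to object storage separately from the small text and metadata writes.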





API design


Authentication & Authorization


Before accessing any resources, users must authenticate. The service then authorizes each request using the access token issued at login.


Login API: Authenticates users and issues access tokens.

Sign-up API: Registers new users.


API Endpoint: User Login (Authentication)


URL: POST /api/v1/auth/login


API Endpoint: User Registration (Sign-up)


URL: POST /api/v1/auth/signup
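
The sign-up and login endpoints imply storing a password hash at registration and checking it at login. Below is a minimal sketch using Python's standard `hashlib.pbkdf2_hmac`. The iteration count, the `$`-separated storage format, and the function names are assumptions for illustration, not something the design above specifies.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for the deployment hardware


def hash_password(password: str) -> str:
    """Hash a password with a fresh random salt; the result fits in VARCHAR(255)."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"pbkdf2_sha256${ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash using the stored salt and compare in constant time."""
    scheme, iterations, salt_hex, digest_hex = stored.split("$")
    if scheme != "pbkdf2_sha256":
        return False
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))


if __name__ == "__main__":
    stored = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", stored))
    print(verify_password("wrong guess", stored))
```

Storing the scheme and iteration count alongside the hash lets the work factor be raised later without invalidating existing accounts.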


API Endpoint: Post a Tweet


URL: POST /api/v1/tweets


API Endpoint: Retrieve a Tweet


URL: GET /api/v1/tweets/{tweet_id}


API Endpoint: Delete a Tweet


URL: DELETE /api/v1/tweets/{tweet_id}


API Endpoint: Get User Timeline


URL: GET /api/v1/users/{user_id}/timeline
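
One way to serve a chronological timeline is to merge the recent tweets of every followed account at read time (fan-out on read). The sketch below uses in-memory dictionaries as stand-ins for the follows and tweets stores. All names are hypothetical, and a real service would more likely combine this with precomputed timelines and caching.

```python
import heapq
from itertools import islice


def build_timeline(user_id, follows, tweets_by_user, limit=20):
    """Merge followed users' tweets, newest first.

    follows:        dict of user_id -> set of followed user ids
    tweets_by_user: dict of user_id -> list of (created_at, tweet_id),
                    each list already sorted newest first
    """
    feeds = [tweets_by_user.get(followed, [])
             for followed in follows.get(user_id, ())]
    # heapq.merge streams the sorted inputs lazily, so only `limit`
    # items are pulled rather than sorting everything.
    merged = heapq.merge(*feeds, key=lambda tweet: tweet[0], reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, limit)]


if __name__ == "__main__":
    follows = {1: {2, 3}}
    tweets_by_user = {
        2: [(105, "t5"), (101, "t1")],
        3: [(103, "t3"), (102, "t2")],
    }
    print(build_timeline(1, follows, tweets_by_user, limit=3))
```

Read-time merging keeps writes cheap but makes each timeline request proportional to the number of accounts followed. That trade-off matters here, given the estimate of about 100 reads for every tweet written.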


API Endpoint: Like a Tweet


URL: POST /api/v1/tweets/{tweet_id}/like


API Endpoint: Retweet a Tweet


URL: POST /api/v1/tweets/{tweet_id}/retweet

Description: Retweet a specific tweet.


API Endpoint: Follow a User


URL: POST /api/v1/users/{user_id}/follow


API Endpoint: Unfollow a User


URL: POST /api/v1/users/{user_id}/unfollow


API Endpoint: Get Followers List


URL: GET /api/v1/users/{user_id}/followers


API Endpoint: Search Tweets


URL: GET /api/v1/search/tweets
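
Keyword and hashtag search is commonly backed by an inverted index that maps each term to the tweets containing it. The toy in-memory version below only illustrates the idea. The class name, the tokenization rule, and the AND semantics for multi-word queries are all assumptions, and a production system would normally delegate this to a dedicated search engine.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"#?\w+")


class TweetIndex:
    """Toy inverted index: term -> set of tweet ids containing that term."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, tweet_id, text):
        for token in TOKEN_RE.findall(text.lower()):
            self._postings[token].add(tweet_id)
            if token.startswith("#"):
                # Also index the bare word so "python" matches "#python".
                self._postings[token[1:]].add(tweet_id)

    def search(self, query):
        """Return ids of tweets containing every term in the query."""
        tokens = TOKEN_RE.findall(query.lower())
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], ()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


if __name__ == "__main__":
    index = TweetIndex()
    index.add(1, "Loving #Python today")
    index.add(2, "python tips and tricks")
    print(sorted(index.search("python")))
    print(sorted(index.search("#python")))
```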








Database design


Database Design for Twitter-like Service

CREATE TABLE users (

user_id BIGINT PRIMARY KEY,

username VARCHAR(50) UNIQUE NOT NULL,

email VARCHAR(100) UNIQUE NOT NULL,

password_hash VARCHAR(255) NOT NULL,

bio TEXT,

profile_image_url TEXT,

created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP

);



CREATE TABLE tweets (

tweet_id BIGINT PRIMARY KEY,

user_id BIGINT NOT NULL,

content TEXT NOT NULL,

media_url TEXT, -- optional, for images/videos

created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

FOREIGN KEY (user_id) REFERENCES users(user_id) ON DELETE CASCADE

);



CREATE TABLE follows (

follower_id BIGINT NOT NULL,

followed_id BIGINT NOT NULL,

followed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

PRIMARY KEY (follower_id, followed_id),

FOREIGN KEY (follower_id) REFERENCES users(user_id) ON DELETE CASCADE,

FOREIGN KEY (followed_id) REFERENCES users(user_id) ON DELETE CASCADE

);



CREATE TABLE retweets (

retweet_id BIGINT PRIMARY KEY,

tweet_id BIGINT NOT NULL,

user_id BIGINT NOT NULL,

retweeted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

FOREIGN KEY (tweet_id) REFERENCES tweets(tweet_id) ON DELETE CASCADE,

FOREIGN KEY (user_id) REFERENCES users(user_id) ON DELETE CASCADE

);



CREATE TABLE comments (

comment_id BIGINT PRIMARY KEY,

tweet_id BIGINT NOT NULL,

user_id BIGINT NOT NULL,

comment_text TEXT NOT NULL,

commented_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

FOREIGN KEY (tweet_id) REFERENCES tweets(tweet_id) ON DELETE CASCADE,

FOREIGN KEY (user_id) REFERENCES users(user_id) ON DELETE CASCADE

);
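
The schema above has no table for likes, although the requirements and the like endpoint need one. One possible shape mirrors the follows table, with a composite primary key so that each user can like a tweet at most once. The sketch below tries it out with Python's built-in `sqlite3` and an in-memory database. The table name, column names, and helper functions are assumptions, and the foreign keys are omitted to keep the example self-contained.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE likes (
            user_id  INTEGER NOT NULL,
            tweet_id INTEGER NOT NULL,
            liked_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            PRIMARY KEY (user_id, tweet_id)
        )
    """)
    return conn


def like(conn, user_id, tweet_id):
    # The composite key plus OR IGNORE makes a repeated like a no-op.
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO likes (user_id, tweet_id) VALUES (?, ?)",
            (user_id, tweet_id),
        )


def unlike(conn, user_id, tweet_id):
    with conn:
        conn.execute(
            "DELETE FROM likes WHERE user_id = ? AND tweet_id = ?",
            (user_id, tweet_id),
        )


def like_count(conn, tweet_id):
    row = conn.execute(
        "SELECT COUNT(*) FROM likes WHERE tweet_id = ?", (tweet_id,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = open_db()
    like(conn, 1, 10)
    like(conn, 1, 10)  # duplicate request, ignored
    like(conn, 2, 10)
    print(like_count(conn, 10))
```

Counting rows on every read would not hold up at the interaction rates estimated earlier, so a denormalized counter or a cache is a likely companion to this table. Keeping that counter in step with the rows is where the consistency requirement on like counts comes in.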






High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design. If you are unfamiliar with the tool, you can simply describe your design to the chat bot and ask it to generate a starter diagram for you to modify...







Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...






Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...






Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...






Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.






Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?