System requirements
Functional:
- User Management:
- Create, update, and delete user accounts (profile management).
- User authentication and authorization (login, signup).
- Tweet Management:
- Compose and post tweets (text, images, videos).
- Edit or delete tweets.
- View a list of tweets, either the home timeline (tweets from followed users) or a single user's own tweets.
- Following System:
- Follow and unfollow other users.
- View the list of followers and following users.
- Interaction with Tweets:
- Like (favorite) a tweet.
- Retweet functionality to share others' tweets.
- Notifications:
- Real-time notifications for new tweets from followed users, likes, and retweets.
- Search Functionality:
- Search for users, hashtags, and tweets.
- Trending Topics:
- Display trending topics or hashtags based on activity.
Non-Functional:
- Performance:
- The system should support at least 1 million users concurrently.
- Tweets should load in less than 2 seconds.
- Scalability:
- The system should scale horizontally to handle increased user load and data.
- Should accommodate growth to support up to 100 million users without performance degradation.
- Availability:
- The service should have at least 99.9% uptime, which allows at most about 8.8 hours of downtime per year.
- Implement redundancy to avoid single points of failure.
- Security:
- Data should be encrypted in transit and at rest.
- Implement robust user authentication to prevent unauthorized access.
- Usability:
- The service must have an intuitive user interface to enhance user experience.
- Provide help and support features to assist users.
- Reliability:
- The system should be able to recover from failures without substantial loss of data (e.g., disaster recovery).
- Ensure consistent performance under varying loads.
- Compliance:
- Adhere to legal and regulatory requirements concerning user data protection and privacy (like GDPR).
- Maintainability:
- Code should be structured in a way that makes it easy to update and modify in the future.
- Maintain documentation and coding standards so the code is easy to understand.
Capacity estimation
- User Base:
- Active Users: Aim for supporting at least 100 million active users.
- Concurrent Users: The system should accommodate at least 1 million concurrent users at peak times.
- Tweets:
- Daily Tweets: Estimate around 500 million tweets being posted daily.
- Tweets per Second: 500 million tweets per day averages about 5,800 tweets per second, so peak capacity should be provisioned above that rate.
- Followers/Following:
- Average Following: Each user follows about 300 other users on average.
- Follow Requests: The system should support thousands of follow requests per second.
- Favorites and Retweets:
- Daily Favorites: Expect around 1 billion favorites (likes) each day.
- Retweets: Estimate approximately 300 million retweets daily.
- Notifications:
- Real-time Notifications: The service should send out around 2 billion notifications daily.
- Search Queries:
- Search Volume: Support at least 100 million search queries daily.
- Tweets:
- Assume each user tweets an average of 1 tweet per day.
- This amounts to 1 request per user for posting a tweet.
- Timeline Feed:
- Users typically refresh their timeline about 10 times a day (once every couple of hours).
- This totals around 10 requests per user for fetching their timeline.
- Likes (Favorites):
- Assume each user likes about 5 tweets per day.
- This adds 5 requests per user.
- Retweets:
- Assume each user retweets about 2 tweets per day.
- This adds 2 requests per user.
- Search Queries:
- If users perform searches, let's estimate an average of 1 search per day.
- This adds 1 request per user.
- Follow/Unfollow:
- Assume users follow/unfollow approximately 1 user per day.
- This adds 1 request per user.
Given these estimations, the total number of requests per user could be calculated as follows:
- Tweets: 1
- Timeline Feed: 10
- Likes: 5
- Retweets: 2
- Search Queries: 1
- Follow/Unfollow: 1
Total Approximate Requests Per User Per Day: 1 + 10 + 5 + 2 + 1 + 1 = 20 requests per user per day
- Total Daily Requests: \[ \text{Total Daily Requests} = \text{Number of Users} \times \text{Requests per User per Day} \]
- For 100 million users and 20 requests per user: \[ 100,000,000 \times 20 = 2,000,000,000 \text{ (2 billion requests per day)} \]
- Convert Daily Requests to Requests per Second:
- There are 24 hours in a day, 60 minutes in an hour, and 60 seconds in a minute.
- Therefore, the number of seconds in a day is: \[ 24 \times 60 \times 60 = 86,400 \text{ seconds per day} \]
- Requests per Second: \[ \text{Requests per Second} = \frac{\text{Total Daily Requests}}{\text{Seconds per Day}} = \frac{2,000,000,000}{86,400} \approx 23,148 \]
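The arithmetic above can be checked with a short script. This is a minimal sketch: the dictionary keys and variable names are illustrative, and the per-user counts are the assumed values listed above.

```python
# Per-user daily request counts, taken from the estimates above.
REQUESTS_PER_USER_PER_DAY = {
    "tweets": 1,
    "timeline_feed": 10,
    "likes": 5,
    "retweets": 2,
    "search_queries": 1,
    "follow_unfollow": 1,
}

ACTIVE_USERS = 100_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

requests_per_user = sum(REQUESTS_PER_USER_PER_DAY.values())
total_daily_requests = ACTIVE_USERS * requests_per_user
average_rps = total_daily_requests / SECONDS_PER_DAY

print(f"Requests per user per day: {requests_per_user}")
print(f"Total daily requests: {total_daily_requests:,}")
print(f"Average requests per second: {average_rps:,.0f}")
```

The result is a daily average; peak traffic will be higher than this figure.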
Data to Store:
- User Data:
- User ID (Primary Key)
- Username (Unique)
- Email (Unique)
- Profile Picture URL
- Bio
- Date of Account Creation
- Number of Followers
- Number of Following
- Settings (privacy, notification preferences)
- Tweet Data:
- Tweet ID (Primary Key)
- User ID (Foreign Key referencing User Data)
- Content (text of the tweet, typically up to 280 characters)
- Created At (timestamp)
- Updated At (timestamp)
- Number of Likes
- Number of Retweets
- Media URL (if the tweet contains images, videos, etc.)
- Size of a tweet:
- Text can be up to 280 characters, which is approximately 280 bytes (assuming 1 byte per character for standard text).
- If you include metadata (like timestamps, user ID), assume around 100–200 additional bytes.
- For media attachments, images can vary significantly in size:
- Average image size: ~500 KB (depends on resolution)
- Video can be even larger, often 1 MB or more per upload.
- Media Data:
- Media ID (Primary Key)
- User ID (Foreign Key referencing User Data)
- Media Type (image/video)
- Media URL (location of the stored media)
- Size of the media
- Created At (timestamp)
- Interactions Data:
- Like ID (Primary Key)
- User ID (Foreign Key referencing User Data)
- Tweet ID (Foreign Key referencing Tweet Data)
- Created At (timestamp for when the like occurred)
- Retweet Data:
- Retweet ID (Primary Key)
- Original Tweet ID (Foreign Key referencing Tweet Data)
- User ID (Foreign Key referencing User Data)
- Created At (timestamp for when the retweet occurred)
Example Estimate of Storage Size:
For a single tweet containing text and an image:
- Text: ~280 bytes
- Metadata: ~200 bytes
- Image: ~500 KB (approx. 500,000 bytes)
- Total Size per Tweet (with media):
- 280 + 200 + 500,000 = ~500,480 bytes (approx. 500 KB)
Given Data:
- Size of a Tweet: 500,480 bytes (or approximately 500 KB)
- Requests per Second (RPS): approximately 23,148 requests per second (2 billion requests per day, from the estimate above)
Total Data Calculation:
- This is an upper bound: it treats every request as a new tweet with one image attached, although only a fraction of requests are tweet posts.
- Daily Data Storage from Tweets:
- Data per Second: \[ 500,480 \text{ bytes/tweet} \times 23,148 \text{ requests/second} \approx 1.16 \times 10^{10} \text{ bytes per second} \]
- For one day (86,400 seconds): \[ \text{Total Daily Data} = 500,480 \times 23,148 \times 86,400 \approx 1.0 \times 10^{15} \text{ bytes} \]
- Convert to Yearly Data:
- For one year (365 days): \[ \text{Total Yearly Data} = \text{Total Daily Data} \times 365 \approx 3.65 \times 10^{17} \text{ bytes} \]
- Convert to Gigabytes:
- Since there are (1024 \times 1024 \times 1024) bytes in a Gigabyte, we divide the total by (1024^3): \[ \text{Total Yearly Data in GB} = \frac{500,480 \times 23,148 \times 86,400 \times 365}{1024 \times 1024 \times 1024} \approx 3.4 \times 10^{8} \text{ GB} \]
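The storage arithmetic can be sketched as a short script. It assumes the 2 billion daily requests estimated earlier and, as a deliberate upper bound, treats every request as a tweet carrying one image of about 500 KB. DAILY_WRITES and the other names are illustrative.

```python
TEXT_BYTES = 280
METADATA_BYTES = 200
IMAGE_BYTES = 500_000
BYTES_PER_TWEET = TEXT_BYTES + METADATA_BYTES + IMAGE_BYTES  # 500,480

# Assumption: every one of the 2 billion daily requests stores one such tweet.
DAILY_WRITES = 2_000_000_000
BYTES_PER_GB = 1024 ** 3

daily_bytes = BYTES_PER_TWEET * DAILY_WRITES
yearly_bytes = daily_bytes * 365
yearly_gb = yearly_bytes / BYTES_PER_GB

print(f"Bytes per tweet: {BYTES_PER_TWEET:,}")
print(f"Daily storage: {daily_bytes:.3e} bytes")
print(f"Yearly storage: {yearly_bytes:.3e} bytes")
print(f"Yearly storage: {yearly_gb:,.0f} GB")
```

Replacing DAILY_WRITES with the number of tweets that actually carry media would give a tighter estimate.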
API design
- User API (twitter/user):
- POST: Create a new user account. This would include registering a new user with details like username, email, and password.
- GET: Retrieve user profile information. This could include fetching details of the authenticated user or public information about other users.
- DELETE: Delete a user account. This would handle account deactivation and associated data management.
- Tweet API (twitter/tweet):
- POST: Create a new tweet. This allows a user to post a tweet containing text, images, or videos.
- GET: Retrieve tweets. This could encompass fetching a user’s timeline (tweets from users they follow) or a specific tweet by its ID.
- DELETE: Remove a tweet. This allows users to delete their own tweets.
- Like API (twitter/tweet/like?tweetID):
- POST: Like a tweet. This helps users to express appreciation for a tweet. The tweetID parameter identifies which tweet is being liked.
- Follow API (twitter/follow):
- POST: Follow another user.
- DELETE: Unfollow a user.
- Retweet API (twitter/tweet/retweet?tweetID):
- POST: Retweet a tweet. This allows users to share someone else's tweet with their own followers.
- Search API (twitter/search):
- GET: Search for tweets or users by keywords or hashtags.
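The endpoints above can be sketched as a routing table that maps a method and path to a handler. This is only an illustration: the stub helper, the handler names, and the status codes are assumptions, not part of the design, and a real service would sit behind a web framework.

```python
from typing import Callable, Dict, Optional, Tuple

Handler = Callable[[dict], dict]


def stub(action: str, status: int = 200, required: Tuple[str, ...] = ()) -> Handler:
    """Build a placeholder handler that only validates required parameters."""
    def handler(params: dict) -> dict:
        missing = [name for name in required if name not in params]
        if missing:
            return {"status": 400, "error": f"missing parameters: {missing}"}
        return {"status": status, "action": action}
    return handler


# (HTTP method, path) -> handler, mirroring the endpoint list above.
ROUTES: Dict[Tuple[str, str], Handler] = {
    ("POST", "twitter/user"): stub("create_user", 201),
    ("GET", "twitter/user"): stub("get_user"),
    ("DELETE", "twitter/user"): stub("delete_user"),
    ("POST", "twitter/tweet"): stub("create_tweet", 201),
    ("GET", "twitter/tweet"): stub("get_tweets"),
    ("DELETE", "twitter/tweet"): stub("delete_tweet"),
    ("POST", "twitter/tweet/like"): stub("like_tweet", required=("tweetID",)),
    ("POST", "twitter/tweet/retweet"): stub("retweet", required=("tweetID",)),
    ("POST", "twitter/follow"): stub("follow_user"),
    ("DELETE", "twitter/follow"): stub("unfollow_user"),
    ("GET", "twitter/search"): stub("search"),
}


def dispatch(method: str, path: str, params: Optional[dict] = None) -> dict:
    """Look up the handler for a request and call it with its parameters."""
    handler = ROUTES.get((method.upper(), path))
    if handler is None:
        return {"status": 404, "error": f"no route for {method} {path}"}
    return handler(params or {})


print(dispatch("POST", "twitter/tweet/like", {"tweetID": 42}))
print(dispatch("POST", "twitter/tweet/like"))
```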
Database design
User : +int RecordID PK
User : +int UserID FK
User : +String UserName
User : +String HashedPassword
User : +String Email Index
User : +DateTime CreatedAt Index
User : +int mediaLink(profile picture) FK
Tweet : +int RecordID PK
Tweet : +int CreatorID FK
Tweet : +int TweetID FK
Tweet : +int LikeCounterID FK
Tweet : +String TweetData
Tweet : +DateTime createdAt
LikeCounter : +int RecordID PK
LikeCounter : +int TweetID FK
LikeCounter : +int likes
LikeCounter : +int dislikes
FollowerRecord : +int RecordID PK
FollowerRecord : +int FollowerID FK
FollowerRecord : +int FolloweeID FK
MediaMetadata : +int RecordID PK
MediaMetadata : +int mediaID FK
MediaMetadata : +int tweetID FK
MediaMetadata : +DateTime createdAt
MediaMetadata : +String link
Summary of Relationships:
- User ↔ Media: One-to-One (a user has one profile picture)
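- User ↔ User (via FollowerRecord): Many-to-Many (users can follow and be followed by multiple users; each FollowerRecord row links one follower to one followee)
- Tweet ↔ User: Many-to-One (each tweet is created by one user, and a user can create many tweets)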
- Tweet ↔ LikeCounter: One-to-One (each tweet has one like counter)
- Tweet ↔ Media: One-to-Many (a tweet can have multiple media entries)
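One way to try out these relationships is an in-memory SQLite schema. The sketch below is a simplification and not the schema above verbatim: it drops the separate RecordID columns and uses UserID, TweetID, and MediaID directly as primary keys, and the timeline helper and sample rows are illustrative.

```python
import sqlite3

SCHEMA = """
CREATE TABLE User (
    UserID INTEGER PRIMARY KEY,
    UserName TEXT NOT NULL UNIQUE,
    HashedPassword TEXT NOT NULL,
    Email TEXT NOT NULL UNIQUE,
    CreatedAt TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE Tweet (
    TweetID INTEGER PRIMARY KEY,
    CreatorID INTEGER NOT NULL REFERENCES User(UserID),
    TweetData TEXT NOT NULL,
    CreatedAt TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE LikeCounter (
    TweetID INTEGER PRIMARY KEY REFERENCES Tweet(TweetID),
    likes INTEGER NOT NULL DEFAULT 0,
    dislikes INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE FollowerRecord (
    FollowerID INTEGER NOT NULL REFERENCES User(UserID),
    FolloweeID INTEGER NOT NULL REFERENCES User(UserID),
    PRIMARY KEY (FollowerID, FolloweeID)
);
CREATE TABLE MediaMetadata (
    MediaID INTEGER PRIMARY KEY,
    TweetID INTEGER REFERENCES Tweet(TweetID),
    CreatedAt TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    Link TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(SCHEMA)

conn.execute("INSERT INTO User (UserID, UserName, HashedPassword, Email) "
             "VALUES (1, 'alice', 'hash-a', 'alice@example.com')")
conn.execute("INSERT INTO User (UserID, UserName, HashedPassword, Email) "
             "VALUES (2, 'bob', 'hash-b', 'bob@example.com')")
conn.execute("INSERT INTO FollowerRecord VALUES (2, 1)")  # bob follows alice
conn.execute("INSERT INTO Tweet (TweetID, CreatorID, TweetData) VALUES (10, 1, 'hello')")
conn.execute("INSERT INTO Tweet (TweetID, CreatorID, TweetData) VALUES (11, 1, 'second')")


def timeline(connection: sqlite3.Connection, user_id: int) -> list:
    """Return (TweetID, TweetData) for tweets by users that user_id follows, newest first."""
    return connection.execute(
        """
        SELECT t.TweetID, t.TweetData
        FROM Tweet AS t
        JOIN FollowerRecord AS f ON f.FolloweeID = t.CreatorID
        WHERE f.FollowerID = ?
        ORDER BY t.TweetID DESC
        """,
        (user_id,),
    ).fetchall()


print(timeline(conn, 2))
```

The composite primary key on FollowerRecord prevents a user from following the same account twice, which is how the many-to-many relationship stays free of duplicates.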
High-level design
Components Breakdown:
- Load Balancer:
- Distributes incoming requests across available API servers to ensure no single server becomes a bottleneck. This promotes high availability and reliability.
- API Servers:
- These handle incoming API requests from users and route them to the appropriate services like UserService, TweetService, and NotificationService.
- UserService:
- Manages user-related operations such as user authentication, registration, profile management, and follower relationships.
- TweetService:
- Handles all tweet-related actions including posting tweets, fetching tweets (user timelines), liking, and retweeting tweets.
- NotificationService:
- Manages real-time notifications for users regarding actions taken on their tweets, such as likes, retweets, and new followers.
- UserDB:
- A database dedicated to storing user-related data. It holds user profiles, account details, follower and following relationships.
- TweetDB:
- A database that stores tweet-related data, including the text of tweets, likes, retweets, and media references.
- TimeLineBuffer:
- A caching mechanism or in-memory store (like Redis) to speed up access to user timelines and minimize the load on the TweetDB when users retrieve their feeds.
- TimelineService:
- This service assembles and manages the construction of user timelines, pulling the necessary tweets from the TweetDB and combining them efficiently for user queries.
- Media Storage (AWS S3):
- A cloud-based storage solution for storing media files (images and videos) uploaded by users. S3 is scalable, durable, and serves static files efficiently.
- Content Delivery Network (CDN):
- A CDN is used to cache and provide fast access to media content globally, optimizing speed and reducing latency in fetching images and videos.
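The interplay between TimelineService, TimeLineBuffer, and TweetDB can be sketched as a read-through cache. This is a toy model under stated assumptions: plain Python containers stand in for Redis and the databases, tweet IDs are assumed to increase over time, and the invalidate-on-post policy is one possible choice rather than something the design prescribes.

```python
from typing import Dict, List, Set, Tuple

Tweet = Tuple[int, int, str]  # (tweet_id, creator_id, text)


class TimelineService:
    """Assembles timelines, consulting a buffer before the tweet store."""

    def __init__(self, tweets: List[Tweet], following: Dict[int, Set[int]]):
        self.tweets = tweets        # stand-in for TweetDB
        self.following = following  # stand-in for follow data in UserDB
        self.buffer: Dict[int, List[Tweet]] = {}  # stand-in for TimeLineBuffer
        self.db_reads = 0           # counts how often the tweet store is scanned

    def get_timeline(self, user_id: int, limit: int = 20) -> List[Tweet]:
        cached = self.buffer.get(user_id)
        if cached is not None:
            return cached[:limit]  # cache hit: no tweet store access
        self.db_reads += 1
        followed = self.following.get(user_id, set())
        timeline = sorted(
            (t for t in self.tweets if t[1] in followed),
            key=lambda t: t[0],
            reverse=True,  # newest first, assuming IDs grow over time
        )
        self.buffer[user_id] = timeline
        return timeline[:limit]

    def post_tweet(self, tweet: Tweet) -> None:
        self.tweets.append(tweet)
        # Invalidate buffered timelines of everyone who follows the author.
        for user_id, followed in self.following.items():
            if tweet[1] in followed:
                self.buffer.pop(user_id, None)


service = TimelineService([(1, 10, "a"), (2, 11, "b")], {1: {10}})
print(service.get_timeline(1))
service.post_tweet((3, 10, "c"))
print(service.get_timeline(1))
```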
Request flows
Detailed component design
Detailed Breakdown:
- TweetDB Sharding:
- Sharding by UserID: This means that the data for tweets is distributed across multiple database instances based on user identifiers. Each shard contains the tweets of different users, allowing for horizontal scaling and improving read/write performance as the dataset grows.
- Benefits include reduced contention on individual databases and improved query response times since fewer records are processed for each request.
- UserDB Sharding:
- Similar to the TweetDB, sharding the UserDB by UserID helps in managing a large number of users efficiently, allowing for quick access to user profiles and their associated tweet records. This reduces load and improves availability.
- In-Memory Storage for Timeline Service and LikeCounter:
- Distributed In-Memory Storage: Utilizing in-memory data stores (like Redis or Memcached) for the Timeline Service and LikeCounter ensures that frequently accessed data (such as user timelines and engagement metrics) is retrieved quickly, minimizing latency.
- This architecture is especially effective during high traffic when many users query their timelines simultaneously.
- Multi-Master Media Storage:
- In a multi-master schema for media storage, multiple nodes can accept writes simultaneously. This provides high availability and redundancy, as there is no single point of failure. It allows for quick uploads and changes to media files.
- This design is beneficial for managing media uploads where user-generated content is prevalent, as you can handle high volumes of uploads without performance degradation.
- Push CDN:
- Using a push CDN implies that media content is pushed to the CDN (Content Delivery Network) as it is uploaded, allowing for faster access to media files globally.
- This approach minimizes latency for end-users and ensures that media content is readily available, improving the user experience by delivering high-quality images and videos quickly.
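Sharding by UserID, as described for TweetDB and UserDB, needs a deterministic mapping from a user to a shard. A minimal sketch follows, assuming simple hash-modulo placement; the function name shard_for_user and the shard count are hypothetical.

```python
import hashlib


def shard_for_user(user_id: int, num_shards: int) -> int:
    """Map a UserID to a shard index in [0, num_shards)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # A stable hash keeps the mapping identical across processes and restarts.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 8  # illustrative value
for uid in (1, 2, 3, 42):
    print(f"user {uid} -> shard {shard_for_user(uid, NUM_SHARDS)}")
```

With modulo placement, changing the number of shards remaps most users; consistent hashing is a common alternative when shards are added or removed often.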
Trade offs/Tech choices
Failure Recovery Strategies:
- API Servers and Tweet Servers:
- Standby Servers with Heartbeat System: Monitoring servers using a heartbeat mechanism allows the system to regularly check the health of API and Tweet servers. If a primary server fails, the load balancer can redirect traffic to a standby server almost seamlessly, ensuring minimal downtime.
- Load Balancers:
- Heartbeat Mechanism: Similar to the API servers, load balancers use health checks to assess the state of the servers they manage. If a backend server becomes unresponsive, the load balancer automatically reroutes traffic, maintaining the service availability.
- Rate Limiter: Incorporating rate limiting at the load balancer level helps prevent abuse and ensures fair usage among all users, protecting backend systems from overload.
- User Database - Read Replica Schema:
- Read Replicas: Utilizing read replicas allows you to offload read queries from the master database, improving performance. In case the primary (master) database fails, one of the read replicas can be promoted to master, minimizing disruption and maintaining data availability.
- In-Memory Databases for Timeline and Like Counter:
- Cluster System Setup: Setting up in-memory databases like Redis in a clustered configuration allows for a distributed and fault-tolerant system. If one node fails, the other nodes in the cluster can continue to serve requests, maintaining access to cached timelines and like counters.
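The heartbeat-based failover described above can be sketched as a monitor that records the last heartbeat from each server and picks the first healthy one in preference order. The class name, the timeout value, and the injectable clock are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, List, Optional


class HeartbeatMonitor:
    """Tracks the last heartbeat per server and selects a healthy one."""

    def __init__(self, timeout_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_seen: Dict[str, float] = {}

    def record_heartbeat(self, server: str) -> None:
        self.last_seen[server] = self.clock()

    def is_healthy(self, server: str) -> bool:
        seen = self.last_seen.get(server)
        return seen is not None and self.clock() - seen <= self.timeout_seconds

    def pick_server(self, preference: List[str]) -> Optional[str]:
        """Return the first healthy server in preference order, or None."""
        for server in preference:
            if self.is_healthy(server):
                return server
        return None


monitor = HeartbeatMonitor(timeout_seconds=5.0)
monitor.record_heartbeat("primary")
monitor.record_heartbeat("standby")
print(monitor.pick_server(["primary", "standby"]))
```

A load balancer using this logic would route to the primary while its heartbeats arrive and fall back to the standby once the primary's last heartbeat is older than the timeout.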
Future improvements