System requirements
Functional:
- Users can post tweets (with text and media).
- Users can like or comment on tweets.
- Users can follow/unfollow others.
- Users can search tweets.
- Tweets can be shared.
- Timelines must be generated and served efficiently.
- Users receive notifications for activities (likes, comments, follows).
Non-Functional:
- High scalability
- Low latency
- Fault tolerance
- Strong consistency where necessary (e.g., tweet creation)
- Data security
Capacity estimation
For the number of tweets
Let's assume there are 10^6 users and each user posts 10 tweets a day.
That gives 10^6 * 10 = 10^7 tweets per day. A day has 86,400 seconds, roughly 10^5, so tweet QPS = 10^7 / 10^5 = 100.
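For the number of follows
Let's assume each of the 10^6 users follows 50 users. That gives 50 * 10^6 = 5 * 10^7 follow relationships in total.
Even if all of them were created within a single day (about 10^5 seconds), follow QPS = 5 * 10^7 / 10^5 = 500.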
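The arithmetic can be checked with a few lines of Python. This is a minimal sketch that uses only the assumptions stated in this section (10^6 users, 10 tweets per user per day, 50 follows per user); the function and variable names are illustrative and not part of the design.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400; the estimate rounds this to 10^5


def per_second(events_per_day: float, seconds_per_day: float = SECONDS_PER_DAY) -> float:
    """Average events per second for a given daily volume."""
    return events_per_day / seconds_per_day


users = 10**6
tweets_per_user_per_day = 10
follows_per_user = 50

tweets_per_day = users * tweets_per_user_per_day        # 10^7 tweets per day
tweet_qps_exact = per_second(tweets_per_day)            # using 86,400 s/day
tweet_qps_rounded = per_second(tweets_per_day, 10**5)   # using the 10^5 shortcut

# Total follow relationships (rows in FollowerFollowing), not a per-second rate.
follow_edges = users * follows_per_user

if __name__ == "__main__":
    print(f"tweets/day: {tweets_per_day:,}")
    print(f"tweet QPS (exact day length): {tweet_qps_exact:.1f}")
    print(f"tweet QPS (10^5 shortcut): {tweet_qps_rounded:.0f}")
    print(f"follow relationships: {follow_edges:,}")
```

These are averages; peak traffic is usually several times higher, so the real system should be sized with headroom.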
API design
1) POST /v1/tweet
RequestBody (media lists the attached images and GIFs)
{
twitterContent:"",
media:[""],
userId:""
}
ResponseBody on successful creation (response code 201)
{
tweetId:"12434"
}
ResponseBody on error
{
tweetId:null
}
2) POST /v1/comment/:tweetId?commentString=
ResponseBody on success (response code 200)
{
message:"successfully added"
}
ResponseBody on error
{
message:"cannot insert"
}
3) POST /v1/like/:tweetId
ResponseBody on success (response code 200)
{
message:"successfully liked"
}
ResponseBody on error
{
tweetId:null
}
4) GET /v1/search?tweetName=&limitValue=20&cursorValue=after
ResponseBody on success (response code 200)
{
tweetId:"",
tweetName:"",
postedBy:""
}
Response code 500 on error
5) POST /v1/follow
ResponseBody on success (response code 200)
{
followingUserId:"",
followerUserId:""
}
Response code 500 on error
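To make the request and response shapes concrete, here is a minimal in-memory sketch of the tweet creation and search endpoints. It is not a real server. The class `TweetApi`, the `media` key, the 400 error status, the `results` wrapper and the `nextCursor` field are all assumptions made for illustration; only `twitterContent`, `userId`, `tweetId`, `tweetName` and `postedBy` come from the API list above.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tweet:
    tweet_id: str
    user_id: str
    content: str
    media: list[str] = field(default_factory=list)


class TweetApi:
    """In-memory stand-in for the tweet and search endpoints (hypothetical)."""

    def __init__(self) -> None:
        self._tweets: list[Tweet] = []   # kept in creation order
        self._ids = itertools.count(1)   # increasing ids double as cursors

    def create_tweet(self, body: dict) -> tuple[int, dict]:
        """Tweet creation: 201 with the new id, or an error with tweetId null."""
        content = body.get("twitterContent", "")
        user_id = body.get("userId", "")
        if not content or not user_id:
            return 400, {"tweetId": None}  # 400 is an assumption of this sketch
        tweet = Tweet(str(next(self._ids)), user_id, content, list(body.get("media", [])))
        self._tweets.append(tweet)
        return 201, {"tweetId": tweet.tweet_id}

    def search(self, term: str, limit: int = 20, cursor: Optional[str] = None) -> tuple[int, dict]:
        """Search: return up to `limit` matches with ids greater than `cursor`."""
        after = int(cursor) if cursor else 0
        matches = [
            t for t in self._tweets
            if term.lower() in t.content.lower() and int(t.tweet_id) > after
        ]
        page = matches[:limit]
        # Only hand out a cursor when more results remain beyond this page.
        next_cursor = page[-1].tweet_id if len(matches) > limit else None
        results = [
            {"tweetId": t.tweet_id, "tweetName": t.content, "postedBy": t.user_id}
            for t in page
        ]
        return 200, {"results": results, "nextCursor": next_cursor}
```

A client would pass the returned `nextCursor` back as the cursor of the next search call until it comes back as null. Cursor-based paging stays stable when new tweets arrive between requests, which offset-based paging does not.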
Database design
- User: stores user profile
- Tweet: stores tweet content
- Image: stores tweet-associated media
- FollowerFollowing: tracks follow relationships
- Comment: stores comments per tweet
- TweetLike: tracks which user liked which tweet
Sharding (by tweetId) and master-slave replication are used for horizontal scalability and fault tolerance.
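Routing by tweetId can be sketched as a stable hash of the id taken modulo the shard count. This is one possible scheme, not necessarily the one intended here; the function name `shard_for` and the shard count are assumptions.

```python
import hashlib


def shard_for(tweet_id: str, num_shards: int) -> int:
    """Map a tweetId to a shard index using a hash that is stable across processes.

    Python's built-in hash() is randomized per process for strings, so a
    cryptographic digest is used instead to get the same answer everywhere.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(tweet_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for tweet_id in ("12434", "12435", "12436"):
        print(tweet_id, "-> shard", shard_for(tweet_id, num_shards=8))
```

Two consequences are worth keeping in mind. With plain modulo routing, changing the shard count remaps most keys, so consistent hashing is a common alternative. And because the key is tweetId, reading all tweets of one user touches every shard.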
High-level design
Client → API Gateway → Rate Limiter → Services → Kafka → Worker → DB
- All user requests go through API Gateway.
- Token bucket algorithm controls rate per user.
- Tweet creation, like, and comment calls go to dedicated services.
- Kafka is used to publish events for tweets, likes, follows, etc.
- Worker services consume Kafka and write to DB asynchronously.
- Popular tweets and media are cached (Redis).
- Media files are uploaded directly to S3 via pre-signed URLs.
- Elasticsearch powers tweet search using inverted indexes.
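The per-user token bucket mentioned above can be sketched as follows. This is a single-process illustration under stated assumptions: the class names, the capacity and the refill rate are hypothetical, and a real deployment behind several gateway instances would need shared state (for example in Redis) and locking.

```python
import time
from typing import Callable


class TokenBucket:
    """Holds up to `capacity` tokens and refills at `refill_per_second`."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = capacity        # start full so short bursts are allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then spend `cost` tokens if available."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class PerUserRateLimiter:
    """Keeps one bucket per user id, created lazily on first request."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = capacity
        self._refill = refill_per_second
        self._clock = clock
        self._buckets: dict = {}

    def allow(self, user_id: str) -> bool:
        bucket = self._buckets.get(user_id)
        if bucket is None:
            bucket = TokenBucket(self._capacity, self._refill, self._clock)
            self._buckets[user_id] = bucket
        return bucket.allow()
```

The gateway would call `limiter.allow(user_id)` before forwarding a request and reject the request (typically with HTTP 429) when it returns False. The clock is injectable so the behaviour can be tested without sleeping.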
6. Timeline Generation
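- For accounts with a typical follower count: on post, the tweet is pushed to each follower's precomputed timeline (push-based fan-out on write).
- For popular accounts with very many followers: the tweet is not fanned out, because writing it to every follower's timeline on each post is too expensive; followers fetch these tweets when they read their timeline (pull model).
- A user's feed is the pushed entries merged with the tweets pulled from the popular accounts they follow.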
- Timeline data is cached in Redis.
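Whichever accounts are served by push and which by pull, reading a timeline in a mixed design means merging the precomputed entries with entries fetched on demand. The sketch below shows only that merge step, assuming every input list is already sorted newest first; `Entry` and `read_timeline` are illustrative names.

```python
import heapq
from itertools import islice
from typing import Iterable, List, NamedTuple


class Entry(NamedTuple):
    created_at: int   # e.g. epoch seconds
    tweet_id: str


def read_timeline(pushed: List[Entry], pulled: Iterable[List[Entry]],
                  limit: int = 20) -> List[Entry]:
    """Merge newest-first lists into a single newest-first page.

    `pushed` is the user's precomputed timeline (for example from the cache);
    `pulled` holds one recent-tweets list per account that is read on demand.
    """
    merged = heapq.merge(pushed, *pulled, key=lambda e: e.created_at, reverse=True)
    return list(islice(merged, limit))   # lazy merge: stops after `limit` items


if __name__ == "__main__":
    pushed = [Entry(50, "a"), Entry(20, "b")]
    pulled = [[Entry(40, "c")], [Entry(60, "d"), Entry(10, "e")]]
    for entry in read_timeline(pushed, pulled, limit=4):
        print(entry.created_at, entry.tweet_id)
```

Because `heapq.merge` is lazy, only as many entries as the page needs are consumed from each source.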
7. Notification System
- Kafka publishes events (like, follow, comment).
- Notification service listens and queues messages.
- Downstream systems (APN, FCM, Email) consume from queues for push/email delivery.
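The notification path above can be sketched with in-memory queues standing in for Kafka and for the per-channel delivery queues. Everything named here is an assumption for illustration: the event fields, the message templates, the channel names, and the rule that users are not notified about their own actions.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityEvent:
    kind: str             # "like", "follow" or "comment"
    actor_id: str         # user who performed the action
    target_user_id: str   # user who should be notified


TEMPLATES = {
    "like": "{actor} liked your tweet",
    "follow": "{actor} followed you",
    "comment": "{actor} commented on your tweet",
}


class NotificationService:
    """Turns activity events into messages and queues them per channel."""

    def __init__(self, channels=("push", "email")) -> None:
        # One queue per downstream channel; senders drain these separately.
        self.queues = {name: deque() for name in channels}

    def handle(self, event: ActivityEvent) -> bool:
        """Queue a message for each channel; return False if the event is skipped."""
        template = TEMPLATES.get(event.kind)
        if template is None or event.actor_id == event.target_user_id:
            return False   # unknown event type, or a self-action (assumed rule)
        message = {"to": event.target_user_id,
                   "text": template.format(actor=event.actor_id)}
        for queue in self.queues.values():
            queue.append(dict(message))   # each channel gets its own copy
        return True
```

Keeping one queue per channel lets a slow or failing channel (for example email) back up without delaying the others.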
Request flows
Detailed component design
Trade offs/Tech choices
1) Relational databases are hard to scale horizontally.
Failure scenarios/bottlenecks
Future improvements
1) Introducing machine learning to im