System requirements
Functional:
We need to support the following functionality:
- user creates and posts a tweet
- user starts/stops following another user
- user likes a tweet
- home feed with tweets from the users the user follows
Registration, authorisation and notifications are also important, but we'll leave them out of scope for now.
Non-Functional:
We want the system to
- have high availability
- be scalable and handle peak loads
- have low latency
Capacity estimation
- 100k DAU
- peak load can reach 10x+ the average
- 5 tweets per user per day on average
  - this may also grow 10x+ along with the DAU
Let's assume a tweet is ~70 characters long, so it takes about 140B of storage (2 bytes per character), and that 1 in 5 tweets (i.e. 1 of each user's 5 daily tweets) includes a photo (~5MB). Then we will need:
10^5 * 4 * 140B = 56MB of storage per day for the text of the 4 text-only tweets per user, and up to 20x that later on, once the user base and its activity have grown.
10^5 * 5 * 10^6B = 5 * 10^11B = 500GB of storage per day for photos (10^5 photo tweets of ~5MB each), which we can reduce by preprocessing and optimising the original files.
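The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch using the same assumptions (100k DAU, 5 tweets per user per day, 1 in 5 tweets carrying a ~5MB photo, 140B of text per tweet). The write rates are derived here for illustration and are not part of the original estimate.

```python
# Back-of-the-envelope storage and write-rate estimate (assumptions from the text).
DAU = 100_000                 # daily active users
TWEETS_PER_USER = 5           # average tweets per user per day
PHOTO_SHARE = 1 / 5           # fraction of tweets that carry a photo
TEXT_BYTES = 140              # ~70 characters at 2 bytes each
PHOTO_BYTES = 5 * 10**6       # ~5MB per original photo
PEAK_FACTOR = 10              # peak load relative to the average
SECONDS_PER_DAY = 24 * 60 * 60


def daily_estimates():
    tweets = DAU * TWEETS_PER_USER
    photo_tweets = round(tweets * PHOTO_SHARE)
    text_only_tweets = tweets - photo_tweets
    return {
        "text_bytes": text_only_tweets * TEXT_BYTES,
        "photo_bytes": photo_tweets * PHOTO_BYTES,
        "avg_writes_per_sec": tweets / SECONDS_PER_DAY,
        "peak_writes_per_sec": PEAK_FACTOR * tweets / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    est = daily_estimates()
    print(f"text:   {est['text_bytes'] / 10**6:.0f} MB/day")
    print(f"photos: {est['photo_bytes'] / 10**9:.0f} GB/day")
    print(f"writes: {est['avg_writes_per_sec']:.1f}/s avg, "
          f"{est['peak_writes_per_sec']:.0f}/s peak")
```

With these inputs the text volume is 56MB/day, photos 500GB/day, and tweet writes average roughly 6 per second with a peak around 58 per second, so storage for photos, not write throughput, is the dominant cost at the current scale.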
API design
- POST /api/v1/tweets/new - returns a status code with some metadata about the new tweet or, if the tweet contains a photo that is still being processed, an identifier the client can use to request the processing status
- PUT /api/v1/tweets/{tweet_id}/like - returns a status code
- POST /api/v1/users/{user_id}/follow - returns a status code along with some metadata about the user followed
- GET /api/v1/tweets/user_feed/{user_id} - returns a paginated collection of tweets for the given user's home feed, ordered according to the business logic (e.g. top K most popular or newest)
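The feed endpoint returns a paginated collection. One common way to paginate a newest-first feed is with a cursor. The sketch below assumes each tweet has an integer id that increases over time (a hypothetical choice, e.g. a Snowflake-style id) and uses the id of the last tweet on a page as the cursor; the function name `get_feed_page` and the default page size are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Tweet:
    tweet_id: int      # assumed to increase over time, so a higher id means newer
    author_id: int
    text: str


def get_feed_page(
    tweets: List[Tweet], cursor: Optional[int] = None, limit: int = 20
) -> Tuple[List[Tweet], Optional[int]]:
    """Return up to `limit` tweets older than `cursor`, newest first.

    `cursor` is the tweet_id of the last tweet on the previous page
    (None for the first page). The returned next cursor is None when
    there are no more tweets to fetch.
    """
    ordered = sorted(tweets, key=lambda t: t.tweet_id, reverse=True)
    if cursor is not None:
        # Only tweets strictly older than the last one already returned.
        ordered = [t for t in ordered if t.tweet_id < cursor]
    page = ordered[:limit]
    next_cursor = page[-1].tweet_id if len(ordered) > limit else None
    return page, next_cursor


if __name__ == "__main__":
    tweets = [Tweet(i, author_id=1, text=f"tweet {i}") for i in range(1, 6)]
    page, cursor = get_feed_page(tweets, limit=2)          # ids 5, 4
    page, cursor = get_feed_page(tweets, cursor, limit=2)  # ids 3, 2
    page, cursor = get_feed_page(tweets, cursor, limit=2)  # id 1, cursor None
```

A cursor is used instead of a page offset because new tweets arriving at the top of the feed would shift offsets between requests and cause duplicated or skipped tweets.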
Database design
We'll have the following entities:
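A minimal sketch of the entities implied by the API and request flows (users, tweets, follow relations and likes). The field names, types and the `MediaStatus` enum are assumptions for illustration, not part of the original design; in a real database each class would map to a table or collection.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class MediaStatus(Enum):
    NONE = "none"              # text-only tweet
    PROCESSING = "processing"  # photo uploaded, still being optimised
    READY = "ready"            # processed photo available


@dataclass
class User:
    user_id: int
    handle: str


@dataclass
class Tweet:
    tweet_id: int
    author_id: int                    # references User.user_id
    text: str                         # ~70 characters on average
    photo_url: Optional[str] = None   # set once the photo has been processed
    media_status: MediaStatus = MediaStatus.NONE
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass(frozen=True)
class Follow:
    follower_id: int   # the user who follows
    followee_id: int   # the user being followed


@dataclass(frozen=True)
class Like:
    user_id: int
    tweet_id: int      # (user_id, tweet_id) is unique: one like per user per tweet
```

Making `Follow` and `Like` frozen (hashable) mirrors the uniqueness constraint a database would enforce on the pair of ids.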
High-level design
Request flows
- User posts a tweet
  - a request with the tweet text (and an optional photo) is sent to the Tweet service
  - a text-only tweet is added to the database directly; if the tweet has a photo, the service sends it for processing and returns an identifier the user can use to check the processing status
  - [Optional] a notification about the new tweet is sent via the Notification service
- User follows another user
  - a follow request is sent to the Follow service
  - the relation between the two users is updated in the DB
  - [Optional] a notification about the new follower is sent via the Notification service
- User likes a tweet
  - a request is sent to the Likes service
  - the tweet's like state is updated in the DB
  - [Optional] a notification about the new like is sent via the Notification service
- User requests their feed
  - a request is sent to the Feed service
  - the Feed service builds the feed from tweets by the users they follow (e.g. the top K newest or most popular) and returns it to the user
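The write-side flows above (posting, following, liking) can be sketched with in-memory stand-ins for the database and the photo-processing pipeline. This is a minimal sketch under assumptions: the class names, the `queue.Queue` standing in for a message queue, the UUID job identifier, and storing a photo tweet's text immediately while the photo is attached later are all hypothetical choices; notifications are left out, as in the flows.

```python
import itertools
import queue
import uuid
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class PostResult:
    tweet_id: int
    status: str                   # "posted" or "processing"
    job_id: Optional[str] = None  # only set when a photo has to be processed


class TweetService:
    """Hypothetical Tweet service backed by in-memory stand-ins."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.tweets: Dict[int, dict] = {}             # stands in for the tweets table
        self.photo_jobs: queue.Queue = queue.Queue()  # stands in for a message queue
        self.job_status: Dict[str, str] = {}          # job_id -> processing status

    def post(self, author_id: int, text: str, photo: Optional[bytes] = None) -> PostResult:
        tweet_id = next(self._ids)
        self.tweets[tweet_id] = {"author_id": author_id, "text": text, "photo": None}
        if photo is None:
            # Text-only tweet: stored right away.
            return PostResult(tweet_id, "posted")
        # Photo tweet: enqueue the photo and return an id the client can poll.
        job_id = str(uuid.uuid4())
        self.job_status[job_id] = "processing"
        self.photo_jobs.put((job_id, tweet_id, photo))
        return PostResult(tweet_id, "processing", job_id)

    def status(self, job_id: str) -> str:
        return self.job_status.get(job_id, "unknown")

    def process_next_photo(self) -> None:
        # Worker step; a real worker would resize/compress the photo here.
        job_id, tweet_id, photo = self.photo_jobs.get_nowait()
        self.tweets[tweet_id]["photo"] = photo
        self.job_status[job_id] = "ready"


class FollowService:
    def __init__(self) -> None:
        self.following: Dict[int, Set[int]] = {}  # follower -> users they follow

    def follow(self, follower_id: int, followee_id: int) -> None:
        self.following.setdefault(follower_id, set()).add(followee_id)

    def unfollow(self, follower_id: int, followee_id: int) -> None:
        self.following.get(follower_id, set()).discard(followee_id)


class LikeService:
    def __init__(self) -> None:
        self.likes: Dict[int, Set[int]] = {}      # tweet -> users who liked it

    def like(self, user_id: int, tweet_id: int) -> int:
        # Storing likes as a set keeps the PUT idempotent: a repeated like is a no-op.
        self.likes.setdefault(tweet_id, set()).add(user_id)
        return len(self.likes[tweet_id])
```

Using a set per tweet matches the choice of PUT for the like endpoint: repeating the same request leaves the state unchanged.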
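How the Feed service constructs the feed is left open. One straightforward approach, assumed here rather than taken from the design, is fan-out-on-read: fetch the recent tweets of every followed user (each list already sorted newest first) and merge them, keeping the K newest. `heapq.merge` performs the k-way merge lazily, so only about K entries are pulled from the per-user lists. The timeline entry shape and the function name `build_home_feed` are hypothetical.

```python
import heapq
from itertools import islice
from typing import Dict, List, Set, Tuple

# Hypothetical timeline entry: (created_at_ms, tweet_id, author_id)
Entry = Tuple[int, int, int]


def build_home_feed(
    user_id: int,
    following: Dict[int, Set[int]],
    timelines: Dict[int, List[Entry]],
    k: int = 20,
) -> List[Entry]:
    """Return the k newest tweets from the users `user_id` follows.

    Each `timelines[author]` list must be sorted newest first.
    """
    sources = [timelines.get(author, []) for author in following.get(user_id, set())]
    # Lazy k-way merge of the descending lists; tuples compare by timestamp first.
    merged = heapq.merge(*sources, reverse=True)
    return list(islice(merged, k))
```

The alternative, fan-out-on-write, precomputes each follower's feed when a tweet is posted, trading extra write work and storage for cheaper reads.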
Detailed component design
Trade-offs/Tech choices
Failure scenarios/bottlenecks
Future improvements