System requirements
Functional:
Share tweets: Users will be able to post tweets
Feed creation: A feed of tweets will be computed for each user and shown to them
Favoriting tweets: Users will be able to save their favorite tweets
Non-Functional:
- Scalability: the system can handle the expected traffic of about 6 million tweets per day
- High availability
- Notify the users
Capacity estimation
The platform will have 10 million active users in total. It will also have an estimated 2 million daily active users, each posting 3 tweets per day on average, so it should handle 2 million × 3 = 6 million tweets per day.
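The arithmetic above can be checked with a short script. This is only a back-of-the-envelope sketch: the user and tweet counts come from the estimate above, while the 300 bytes per stored tweet is an assumption added purely to illustrate how a storage figure could be derived.

```python
# Figures taken from the capacity estimation above.
DAILY_ACTIVE_USERS = 2_000_000
TWEETS_PER_USER_PER_DAY = 3
SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_day = DAILY_ACTIVE_USERS * TWEETS_PER_USER_PER_DAY
avg_tweets_per_second = tweets_per_day / SECONDS_PER_DAY

# ASSUMPTION (not in the estimate above): about 300 bytes per stored tweet.
ASSUMED_BYTES_PER_TWEET = 300
storage_per_day_gb = tweets_per_day * ASSUMED_BYTES_PER_TWEET / 1e9

print(f"tweets per day:        {tweets_per_day:,}")
print(f"average tweets/second: {avg_tweets_per_second:.1f}")
print(f"storage per day (GB):  {storage_per_day_gb:.1f}")
```

On average this is roughly 69 tweet writes per second. Peak traffic will be higher than the average, so the services should be sized with some headroom.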
API design
We mainly have 3 API endpoints. GET /app/me/feed takes an auth_token parameter and retrieves the user's tweet feed. POST /app/me/feed shares a tweet; it takes two parameters: content, which holds the text of the tweet, and auth_token, which authenticates the request. Finally, a third endpoint, /app/me/favorite, takes an auth_token and is used to favorite a tweet.
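To make the three endpoints concrete, here is a minimal in-memory sketch of their behavior. It is not tied to any web framework. The class and method names are hypothetical, the token table stands in for the authentication service, and the tweet_id parameter of the favorite call is an assumption, since the endpoint description above only lists auth_token.

```python
from dataclasses import dataclass, field


@dataclass
class Tweet:
    tweet_id: int
    author: str
    content: str


@dataclass
class InMemoryApp:
    """Hypothetical stand-in for the services behind the API gateway."""
    tokens: dict = field(default_factory=dict)      # auth_token -> user name
    tweets: list = field(default_factory=list)      # all tweets, oldest first
    favorites: dict = field(default_factory=dict)   # user name -> set of tweet ids

    def _authenticate(self, auth_token):
        user = self.tokens.get(auth_token)
        if user is None:
            raise PermissionError("invalid auth_token")
        return user

    def post_feed(self, auth_token, content):
        """POST /app/me/feed: share a tweet."""
        user = self._authenticate(auth_token)
        tweet = Tweet(len(self.tweets) + 1, user, content)
        self.tweets.append(tweet)
        return tweet

    def get_feed(self, auth_token):
        """GET /app/me/feed: newest first (no friend filtering in this sketch)."""
        self._authenticate(auth_token)
        return list(reversed(self.tweets))

    def favorite(self, auth_token, tweet_id):
        """/app/me/favorite: tweet_id is an ASSUMED extra parameter."""
        user = self._authenticate(auth_token)
        self.favorites.setdefault(user, set()).add(tweet_id)
        return self.favorites[user]


app = InMemoryApp(tokens={"token-alice": "alice"})
first = app.post_feed("token-alice", "hello world")
print(app.get_feed("token-alice"))
print(app.favorite("token-alice", first.tweet_id))
```

Every call authenticates first, which mirrors the diagram's placement of the authentication service in front of the API gateway.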
Database design
We will have 3 databases: a graph database that stores each user's friend list, a relational database for user metadata, and a NoSQL database for posts.
High-level design
flowchart TD
B["client"];
C["Load Balancer"];
D["Authentication Service"];
n0["API Gateway"];
n1["User service"];
n2["Tweet Service"];
n3[("User Database")];
n4["Notification Service"];
n5[("Tweet Database")];
n6["Feed Service"];
n7[("Redis Cache")];
n8[("Friend Database")];
n9["Message Queue"];
B --> C;
C --> D;
D --> n0;
n0 --> n1;
n0 --> n2;
n1 --> n3;
n1 --> n4;
n2 --> n5;
n0 --> n6;
n6 --> n7;
n4 --> B;
n7 --> n5;
n1 --> n8;
n5 --> n9;
n9 --> n6;
Request flows
Detailed component design
As discussed before, we use a NoSQL database for the posts since we need quick reads, and a key-value style access pattern satisfies our needs. MongoDB, a document store, will do just fine. For the friend database we could use Amazon Neptune, a managed graph database that scales well. For the user database we could use a relational database like MySQL. We also introduce a message queue in front of the feed service to decouple it from the tweet write path and help it scale, so that feed retrieval has low latency. Finally, we introduce a Redis cache for frequently accessed tweets, so that they are served from the hot cache instead of always querying the database.
Trade offs/Tech choices
Failure scenarios/bottlenecks
We could face some issues with the current approach. For example, nothing limits how many tweets a user can upload within a given time window, so we should introduce a rate limiter in the future. Also, the user database is a single point of failure and may not scale well to many users.
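One possible shape for the rate limiter mentioned above is a sliding-window log per user. The sketch below is an assumption about how it could work, not a committed design. The class name and the limit of 3 tweets per 60 seconds are illustrative, and the caller passes in the current time so that the behavior is easy to test.

```python
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most max_events per user within any window_seconds span."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)     # user_id -> timestamps of allowed events

    def allow(self, user_id, now):
        events = self._events[user_id]
        # Drop timestamps that have fallen out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_events:
            return False
        events.append(now)
        return True


# Illustrative limit: 3 tweets per 60 seconds (hypothetical numbers).
limiter = SlidingWindowRateLimiter(max_events=3, window_seconds=60)
for second in (0, 1, 2, 3, 60):
    print(second, limiter.allow("alice", now=second))
```

In a deployment with several API servers the per-user state would need to live in a shared store rather than in process memory. The Redis instance already in the design is one candidate.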
Future improvements
We could shard the user database for better performance and increased availability, and support multiple data centers for better geographical coverage. As mentioned before, we could also introduce a rate limiter. Finally, we could introduce some metrics, e.g. most liked tweets.
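As a sketch of how sharding the user database might route requests, the function below maps a user ID to a shard with a stable hash. The function name and the shard count of 4 are assumptions for illustration. A stable digest from hashlib is used instead of Python's built-in hash(), which is randomized per process for strings.

```python
import hashlib


def shard_for_user(user_id: str, shard_count: int) -> int:
    """Map a user ID to a shard index in [0, shard_count)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % shard_count


SHARD_COUNT = 4  # hypothetical number of user database shards
for user_id in ("alice", "bob", "carol"):
    print(user_id, "-> shard", shard_for_user(user_id, SHARD_COUNT))
```

Plain modulo routing remaps most users when the shard count changes. If shards are expected to be added over time, consistent hashing is a common alternative that moves far fewer keys.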