System requirements
Functional:
User should be able to register
User should be able to view another user
User should be able to follow another user
User should have a feed of the most popular tweets from the users they follow
User should be able to post a tweet
User should be able to delete a tweet
Tweets should support text and images
Non-Functional:
The system should be highly available and have low latency
Capacity estimation
Assume there are about 500 million customers and 100 million daily active users. If each daily active user tweets 5 times a day, we need to store 500 million new tweets per day.
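The arithmetic behind this estimate can be checked with a short sketch. The variable names are made up for illustration, and spreading the writes evenly over the day is an assumption; peak traffic would be higher than the average shown here.

```python
# Back-of-the-envelope check of the capacity estimate.
DAILY_ACTIVE_USERS = 100_000_000   # from the estimate above
TWEETS_PER_USER_PER_DAY = 5        # from the estimate above
SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_day = DAILY_ACTIVE_USERS * TWEETS_PER_USER_PER_DAY

# Assumption: writes are spread evenly across the day.
average_writes_per_second = tweets_per_day / SECONDS_PER_DAY

print(f"{tweets_per_day:,} tweets per day")
print(f"about {average_writes_per_second:,.0f} writes per second on average")
```

On average this works out to a little under 6,000 tweet writes per second.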
API design
We need the following APIs
- open profile to open another user's profile or the user's own
- create tweet to create a new tweet
- get feed to get the feed of tweets from the users the current user is following
- delete tweet to delete a tweet
- follow user to follow another user
- unfollow user to unfollow another user
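One possible shape for these APIs is sketched below as a Python interface. The class name, method names, parameters and return types are all hypothetical; the design only names the operations, so everything beyond that is an assumption.

```python
from typing import Optional, Protocol


class TweetService(Protocol):
    """Hypothetical interface mirroring the operations listed above."""

    def open_profile(self, viewer_id: str, profile_user_id: str) -> dict:
        """Return the profile (and tweets) of profile_user_id."""
        ...

    def create_tweet(self, user_id: str, message: str,
                     image: Optional[bytes] = None) -> str:
        """Create a tweet and return its id."""
        ...

    def get_feed(self, user_id: str, limit: int = 20) -> list:
        """Return tweets from the users that user_id follows."""
        ...

    def delete_tweet(self, user_id: str, tweet_id: str) -> None:
        """Delete a tweet owned by user_id."""
        ...

    def follow_user(self, user_id: str, target_user_id: str) -> None:
        """Make user_id follow target_user_id."""
        ...

    def unfollow_user(self, user_id: str, target_user_id: str) -> None:
        """Make user_id stop following target_user_id."""
        ...
```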
Database design
For the user table, we will store the user id, the users who follow the user, the users the user is following, and the ids of the user's tweets.
For the tweets table, we can store the id of the tweet, the tweet message, and the id of the image, if there is one, which can be used to retrieve the image from the object store.
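The two records described above could look roughly like this. It is a minimal sketch: the class and field names are illustrative, and plain strings are assumed for all ids.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class User:
    user_id: str
    follower_ids: List[str] = field(default_factory=list)   # users who follow this user
    following_ids: List[str] = field(default_factory=list)  # users this user follows
    tweet_ids: List[str] = field(default_factory=list)      # tweets posted by this user


@dataclass
class Tweet:
    tweet_id: str
    message: str
    image_id: Optional[str] = None  # key into the object store, if the tweet has an image
```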
High-level design
We should have a server that handles the operations and a database that stores all the tweet and user data. To store the images, we should use an object store such as AWS S3.
Request flows
For viewing another profile, the server will fetch the list of tweet ids for the other user's profile. The tweets are then looked up by id in the tweets database and returned to the client.
For following another user, the server will add the newly followed user to the current user's following list and add the current user to the followed user's follower list. For unfollowing, the server removes those entries instead.
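The follow and unfollow flow can be sketched with two in-memory maps standing in for the follower and following lists. The function and variable names are hypothetical, and the self-follow check is an added assumption.

```python
from collections import defaultdict

# In-memory stand-ins for the follower and following lists in the user table.
following = defaultdict(set)  # user id -> ids of users they follow
followers = defaultdict(set)  # user id -> ids of users who follow them


def follow(user_id: str, target_id: str) -> None:
    """Record that user_id follows target_id on both sides of the relationship."""
    if user_id == target_id:
        # Assumption: self-follows are rejected; the design does not say.
        raise ValueError("a user cannot follow themselves")
    following[user_id].add(target_id)
    followers[target_id].add(user_id)


def unfollow(user_id: str, target_id: str) -> None:
    """Remove the relationship from both sides; a no-op if it does not exist."""
    following[user_id].discard(target_id)
    followers[target_id].discard(user_id)
```

With a real database, the two updates would likely need to happen in one transaction so that the two lists cannot disagree.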
To post a tweet, the server will create a unique 10-character id using base62 encoding, which covers all alphanumeric characters, and store the tweet in the tweets database with the id, the image object id (if available) and the message. The tweet id is also appended to the user's list of tweet ids. If the tweet contains an image, the image is uploaded to the object store and the id of the uploaded image is stored in the tweet's record.
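A minimal sketch of the post flow follows. It assumes the id is drawn at random from the base62 alphabet, which is only one way to produce such an id, and it uses dictionaries as stand-ins for the tweets database, the object store and the user's tweet list. All names are hypothetical.

```python
import secrets
import string

# 0-9, A-Z, a-z: the 62 alphanumeric characters.
BASE62_ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
ID_LENGTH = 10

tweets = {}       # tweet id -> tweet record (stand-in for the tweets database)
image_store = {}  # image id -> bytes (stand-in for the object store)
user_tweets = {}  # user id -> list of tweet ids


def new_base62_id() -> str:
    """Return a random 10-character base62 string."""
    return "".join(secrets.choice(BASE62_ALPHABET) for _ in range(ID_LENGTH))


def post_tweet(user_id, message, image=None):
    """Store the image (if any), then the tweet, then link it to the user."""
    image_id = None
    if image is not None:
        image_id = new_base62_id()
        image_store[image_id] = image

    # Random ids are not guaranteed unique, so retry on a collision.
    tweet_id = new_base62_id()
    while tweet_id in tweets:
        tweet_id = new_base62_id()

    tweets[tweet_id] = {"id": tweet_id, "message": message, "image_id": image_id}
    user_tweets.setdefault(user_id, []).append(tweet_id)
    return tweet_id
```

Ten base62 characters give 62^10 (about 8.4 x 10^17) possible ids, so collisions should be rare, but a uniqueness constraint in the database is still the safer guarantee.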
Detailed component design
Trade offs/Tech choices
Because we have so many relationships among the database tables, we should opt for a relational database such as MySQL or PostgreSQL, which can be run on a managed service such as AWS RDS.
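To illustrate why a relational database fits, here is a sketch that models the follow relationship as a join table and reads the tweets of followed users with a single join. This normalizes the per-user lists into a separate table, which is an assumption rather than something the design states. SQLite is used only so the sketch runs without a server, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (user_id TEXT PRIMARY KEY);
CREATE TABLE tweets (
    tweet_id  TEXT PRIMARY KEY,
    author_id TEXT NOT NULL REFERENCES users(user_id),
    message   TEXT NOT NULL,
    image_id  TEXT
);
CREATE TABLE follows (
    follower_id TEXT NOT NULL REFERENCES users(user_id),
    followee_id TEXT NOT NULL REFERENCES users(user_id),
    PRIMARY KEY (follower_id, followee_id)
);
""")

conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
conn.execute("INSERT INTO follows VALUES (?, ?)", ("alice", "bob"))
conn.execute("INSERT INTO tweets VALUES (?, ?, ?, ?)", ("t1", "bob", "hello", None))

# Tweets written by anyone that alice follows.
rows = conn.execute(
    """
    SELECT t.tweet_id, t.message
    FROM tweets AS t
    JOIN follows AS f ON f.followee_id = t.author_id
    WHERE f.follower_id = ?
    """,
    ("alice",),
).fetchall()
print(rows)
```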
Failure scenarios/bottlenecks
One of the main failure scenarios is a single server receiving all the traffic. To avoid this failure case, we will include load balancers to distribute the traffic evenly across the servers.
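As an illustration of even distribution, here is a minimal round-robin sketch. Round robin is just one common strategy and is assumed here; the design does not name one. The class name and the server names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation so each gets an equal share."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(servers)

    def next_server(self):
        """Return the server that should handle the next request."""
        return next(self._cycle)


balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
for _ in range(4):
    print(balancer.next_server())
```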
A second bottleneck is the database, which has a limited number of connections available. To scale the database horizontally for reads, we can use database replication. The databases will have a master-slave relationship where reads can happen on any database while writes only go to the master, which takes care of replicating the data to the slaves.
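The read and write routing described above might look like the following sketch. The class and method names are hypothetical, connections are represented by plain strings, and picking a read target at random is an assumed policy.

```python
import random


class ReplicatedDatabase:
    """Routes writes to the master and reads to any database."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)

    def connection_for(self, is_write: bool):
        """Return the database that should serve this operation."""
        if is_write:
            return self.master  # writes only ever go to the master
        # Reads can be served by any database, master included.
        return random.choice([self.master] + self.slaves)


db = ReplicatedDatabase("master", ["slave-1", "slave-2"])
print(db.connection_for(is_write=True))
print(db.connection_for(is_write=False))
```

If replication is asynchronous, a read from a slave can briefly return stale data, so reads that must see the latest write may need to go to the master.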
Another bottleneck is latency when loading images or tweets. To get images quickly from the object store, we can use a CDN so that the customer retrieves images from whichever edge server is closest to them. For general latency when fetching tweets, we can have a cache for tweets.
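A tweet cache could be sketched as below. It assumes a least-recently-used eviction policy and a read-through pattern where misses fall back to the database; neither is specified in the design, and the class name and the load_from_db callback are hypothetical.

```python
from collections import OrderedDict


class TweetCache:
    """Small LRU cache that falls back to the database on a miss."""

    def __init__(self, capacity, load_from_db):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.load_from_db = load_from_db  # callable: tweet id -> tweet
        self._entries = OrderedDict()

    def get(self, tweet_id):
        if tweet_id in self._entries:
            # Hit: mark the entry as most recently used.
            self._entries.move_to_end(tweet_id)
            return self._entries[tweet_id]
        # Miss: load from the database and remember the result.
        tweet = self.load_from_db(tweet_id)
        self._entries[tweet_id] = tweet
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return tweet

    def invalidate(self, tweet_id):
        """Drop a tweet from the cache, for example after it is deleted."""
        self._entries.pop(tweet_id, None)
```

Deleting a tweet should also invalidate its cache entry; otherwise the cache could keep serving a tweet that no longer exists.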
Future improvements
We can add logs and metrics for any time an operation fails in order to track failures.