System requirements


Functional:

User should be able to register

User should be able to view another user

User should be able to follow another user

User should have a feed of the most popular tweets from the users they follow

User should be able to post a tweet

User should be able to delete a tweet

Tweets should support text and images


Non-Functional:

The system should be highly available and have low latency



Capacity estimation

Assume there are about 500 million customers, of which 100 million are daily active users. If each daily active user tweets 5 times a day, we need to store 100 million × 5 = 500 million new tweets per day.
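
The estimate above works out to 500 million tweets per day when the 5 tweets are counted per daily active user. A short back-of-the-envelope sketch makes the arithmetic explicit. The average tweet size of 300 bytes is an illustrative assumption, not a figure from these notes.

```python
# Back-of-the-envelope capacity estimate.
DAILY_ACTIVE_USERS = 100_000_000   # from the estimate above
TWEETS_PER_USER_PER_DAY = 5        # from the estimate above
ASSUMED_BYTES_PER_TWEET = 300      # assumption for illustration only

SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_day = DAILY_ACTIVE_USERS * TWEETS_PER_USER_PER_DAY
avg_writes_per_second = tweets_per_day / SECONDS_PER_DAY
text_bytes_per_day = tweets_per_day * ASSUMED_BYTES_PER_TWEET

print(f"tweets per day: {tweets_per_day:,}")
print(f"average tweet writes per second: {avg_writes_per_second:,.0f}")
print(f"tweet text stored per day: {text_bytes_per_day / 1e9:,.0f} GB")
```

Images are not included here; their size would likely dominate storage and would need its own estimate.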




API design

We need the following APIs:

  • open profile: open another user's profile or the user's own profile
  • create tweet: create a new tweet
  • get feed: get the feed of tweets from the users the current user follows
  • delete tweet: delete a tweet
  • follow user: follow another user
  • unfollow user: unfollow another user
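
The API list above could be written down as an interface. This is only a sketch: the class name, method names, parameters, and return types are hypothetical choices made for illustration, since the notes do not specify them.

```python
from typing import List, Optional, Protocol


class TweetService(Protocol):
    """Hypothetical interface mirroring the API list above."""

    def open_profile(self, viewer_id: str, profile_user_id: str) -> dict:
        """Return the profile (and tweets) of another user or of the viewer."""
        ...

    def create_tweet(self, user_id: str, message: str,
                     image: Optional[bytes] = None) -> str:
        """Create a tweet and return its id."""
        ...

    def get_feed(self, user_id: str) -> List[dict]:
        """Return tweets from the users that user_id follows."""
        ...

    def delete_tweet(self, user_id: str, tweet_id: str) -> None:
        """Delete one of the user's tweets."""
        ...

    def follow_user(self, follower_id: str, followee_id: str) -> None:
        """Make follower_id follow followee_id."""
        ...

    def unfollow_user(self, follower_id: str, followee_id: str) -> None:
        """Make follower_id stop following followee_id."""
        ...
```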




Database design

For the user table, we will store the user id, the ids of the users who follow the user, the ids of the users the user is following, and the ids of the user's tweets.


For the tweets table, we can store the id of the tweet, the tweet message, and, if the tweet has an image, the id of the image, which can be used to retrieve the image from the object store.
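
The two tables described above could be modeled roughly as follows. This is a minimal sketch; the class and field names are illustrative assumptions, and the sample tweet id is made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class User:
    user_id: str
    follower_ids: List[str] = field(default_factory=list)   # users who follow this user
    following_ids: List[str] = field(default_factory=list)  # users this user follows
    tweet_ids: List[str] = field(default_factory=list)      # ids of this user's tweets


@dataclass
class Tweet:
    tweet_id: str
    message: str
    image_id: Optional[str] = None  # set only when the tweet has an image


# Example rows (values are made up for illustration).
alice = User(user_id="alice")
tweet = Tweet(tweet_id="a1B2c3D4e5", message="hello")
alice.tweet_ids.append(tweet.tweet_id)
```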


High-level design

We should have a server that handles the operations and a database that stores all the data for the tweets and users. To store the images, we should use an object store such as AWS S3.


Request flows

For viewing another user's profile, the server fetches the list of tweet ids stored for that user. The tweets are then looked up by id in the tweets table and returned to the client.
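
The profile lookup described above can be sketched with two in-memory dictionaries standing in for the user and tweets tables. The names and sample data are hypothetical.

```python
from typing import Dict, List

# Hypothetical in-memory stand-ins for the user and tweets tables.
user_tweet_ids: Dict[str, List[str]] = {"bob": ["t1", "t2"]}
tweets: Dict[str, dict] = {
    "t1": {"message": "first", "image_id": None},
    "t2": {"message": "second", "image_id": "img-42"},
}


def view_profile(profile_user_id: str) -> List[dict]:
    """Fetch the user's tweet ids, then look each tweet up by id."""
    tweet_ids = user_tweet_ids.get(profile_user_id, [])
    # Skip ids that no longer resolve, for example deleted tweets.
    return [tweets[tweet_id] for tweet_id in tweet_ids if tweet_id in tweets]
```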


For following another user, the server will add the newly followed user to the current user's following list and add the current user to the other user's follower list. For unfollowing, the server removes those two entries instead.
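
A minimal sketch of the follow and unfollow updates described above. It uses in-memory sets in place of the stored lists, which is an assumption made so that repeating a follow or unfollow has no extra effect; the function names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Set

# Hypothetical in-memory stand-ins for the two lists kept per user.
following: DefaultDict[str, Set[str]] = defaultdict(set)
followers: DefaultDict[str, Set[str]] = defaultdict(set)


def follow_user(current_user: str, target_user: str) -> None:
    """Update both sides of the relationship."""
    following[current_user].add(target_user)
    followers[target_user].add(current_user)


def unfollow_user(current_user: str, target_user: str) -> None:
    """Remove both sides of the relationship; a no-op if not following."""
    following[current_user].discard(target_user)
    followers[target_user].discard(current_user)
```

In a real database the two updates would need to happen in one transaction so the lists cannot disagree.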


To post a tweet, the server will create a 10-character unique id using base62 encoding, whose alphabet is the 62 alphanumeric characters (0-9, a-z, A-Z). If the tweet contains an image, the image is first uploaded to the object store, which gives us the id of the uploaded image. The server then stores the tweet id, the message, and the image id (if available) in the tweets table, and appends the tweet id to the user's list of tweet ids.
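
The posting flow above can be sketched as follows. The notes do not say where the number behind the id comes from, so this sketch assumes a random number that is base62-encoded and retried on collision; a counter or a dedicated id service would work as well. The dictionaries stand in for the tweets table, the user table, and the object store, and all names are hypothetical.

```python
import secrets
import string
from typing import Dict, List, Optional

BASE62_ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
TWEET_ID_LENGTH = 10

# Hypothetical in-memory stand-ins for the tables and the object store.
tweets: Dict[str, dict] = {}
user_tweet_ids: Dict[str, List[str]] = {}
image_store: Dict[str, bytes] = {}


def base62_encode(number: int, length: int) -> str:
    """Encode a non-negative integer as a fixed-length base62 string."""
    if not 0 <= number < 62 ** length:
        raise ValueError("number does not fit in the requested length")
    chars = []
    for _ in range(length):
        number, remainder = divmod(number, 62)
        chars.append(BASE62_ALPHABET[remainder])
    return "".join(reversed(chars))


def new_tweet_id() -> str:
    # Assumption: ids come from a random number, not from a sequence.
    return base62_encode(secrets.randbelow(62 ** TWEET_ID_LENGTH), TWEET_ID_LENGTH)


def upload_image(image: bytes) -> str:
    """Stand-in for an object store upload; returns the stored object's id."""
    image_id = "img-" + secrets.token_hex(8)
    image_store[image_id] = image
    return image_id


def post_tweet(user_id: str, message: str, image: Optional[bytes] = None) -> str:
    # Upload the image first so its id can be stored with the tweet.
    image_id = upload_image(image) if image is not None else None
    tweet_id = new_tweet_id()
    while tweet_id in tweets:  # retry on the unlikely id collision
        tweet_id = new_tweet_id()
    tweets[tweet_id] = {"message": message, "image_id": image_id}
    user_tweet_ids.setdefault(user_id, []).append(tweet_id)
    return tweet_id
```

Ten base62 characters give 62^10 (roughly 8.4 × 10^17) possible ids, far more than the tweet volume estimated earlier.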



Detailed component design


Trade-offs/Tech choices

Because we have so many relationships among the database tables, we should opt for a relational database such as MySQL or PostgreSQL, possibly run on a managed service such as AWS RDS.
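
To illustrate why the relationships fit a relational database, here is one possible schema. It departs from the per-user id lists described under Database design: follows are kept in a separate join table and each tweet carries an author id, both of which are assumptions of this sketch. SQLite is used only so the example runs without a server; table and column names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id TEXT PRIMARY KEY
);
CREATE TABLE tweets (
    tweet_id  TEXT PRIMARY KEY,
    author_id TEXT NOT NULL REFERENCES users(user_id),
    message   TEXT NOT NULL,
    image_id  TEXT              -- NULL when the tweet has no image
);
CREATE TABLE follows (
    follower_id TEXT NOT NULL REFERENCES users(user_id),
    followee_id TEXT NOT NULL REFERENCES users(user_id),
    PRIMARY KEY (follower_id, followee_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
conn.execute("INSERT INTO follows VALUES (?, ?)", ("alice", "bob"))
conn.execute(
    "INSERT INTO tweets VALUES (?, ?, ?, ?)",
    ("a1B2c3D4e5", "bob", "hello", None),
)

# Tweets written by the users that alice follows, found with a single join.
rows = conn.execute(
    """
    SELECT t.tweet_id, t.message
    FROM tweets AS t
    JOIN follows AS f ON f.followee_id = t.author_id
    WHERE f.follower_id = ?
    """,
    ("alice",),
).fetchall()
print(rows)
```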



Failure scenarios/bottlenecks

One of the main failure scenarios is a single server receiving all the traffic. To avoid this, we will put load balancers in front of the servers to distribute the traffic evenly among them.
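
One simple way to distribute traffic evenly is round-robin selection, sketched below. The notes do not name a balancing strategy, so round-robin is an assumption, and the class name and server names are hypothetical.

```python
import itertools
from typing import List


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation so each gets an equal share."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = itertools.cycle(list(servers))

    def next_server(self) -> str:
        return next(self._rotation)


balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
picks = [balancer.next_server() for _ in range(6)]
print(picks)
```

A production load balancer would also run health checks and stop routing to servers that fail them.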


A second bottleneck is the database, which has a limited number of connections available. To scale reads horizontally, we can use database replication. The databases will have a master-slave relationship: reads can be served by any database, while writes go only to the master, which takes care of replicating the data to the slaves.
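
The read/write split described above can be sketched as a small router that decides which database node a query should go to. The class and node names are hypothetical, and picking a read node at random is an assumption of this sketch.

```python
import random
from typing import List


class ReplicatedDatabaseRouter:
    """Routes writes to the master and reads to any node."""

    def __init__(self, master: str, replicas: List[str]) -> None:
        self.master = master
        self.replicas = list(replicas)

    def node_for_write(self) -> str:
        # All writes go to the master, which replicates them onward.
        return self.master

    def node_for_read(self) -> str:
        # Reads may be served by the master or by any replica.
        return random.choice([self.master] + self.replicas)


router = ReplicatedDatabaseRouter("db-master", ["db-replica-1", "db-replica-2"])
```

Because replication takes time, a read sent to a replica right after a write may briefly return stale data.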


Another bottleneck is latency when loading images or tweets. To serve images quickly from the object store, we can use a CDN so that the customer retrieves images from whichever edge server is closest to them. To reduce latency when fetching tweets, we can put a cache in front of the tweets database.
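
A tweet cache could follow the cache-aside pattern: check the cache first and go to the database only on a miss. The sketch below assumes least-recently-used eviction and a caller-supplied loader function; both are assumptions, as are the names.

```python
from collections import OrderedDict
from typing import Callable


class TweetCache:
    """Small LRU cache for tweets, filled on demand (cache-aside)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[str, dict]" = OrderedDict()

    def get(self, tweet_id: str, load_from_db: Callable[[str], dict]) -> dict:
        if tweet_id in self._entries:
            self._entries.move_to_end(tweet_id)  # mark as recently used
            return self._entries[tweet_id]
        tweet = load_from_db(tweet_id)           # cache miss: read the database
        self._entries[tweet_id] = tweet
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)    # evict the least recently used
        return tweet

    def invalidate(self, tweet_id: str) -> None:
        """Drop an entry, for example after the tweet is deleted."""
        self._entries.pop(tweet_id, None)
```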


Future improvements

We can add logs and metrics for any time an operation fails in order to track failures.
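
As a rough sketch of this idea, a decorator can log each failure and increment a per-operation failure counter. The logger name, the counter, and the simulated delete_tweet failure are all hypothetical; a real system would export the counts to a metrics service.

```python
import functools
import logging
from collections import Counter

logger = logging.getLogger("tweet_service")
failure_counts: Counter = Counter()  # stand-in for a real metrics client


def track_failures(operation):
    """Log and count any exception raised by the wrapped operation."""

    @functools.wraps(operation)
    def wrapper(*args, **kwargs):
        try:
            return operation(*args, **kwargs)
        except Exception:
            failure_counts[operation.__name__] += 1
            logger.exception("operation %s failed", operation.__name__)
            raise  # the caller still sees the original error

    return wrapper


@track_failures
def delete_tweet(tweet_id: str) -> None:
    # Simulated failure for the example: the tweet does not exist.
    raise KeyError(tweet_id)
```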