System requirements


Functional:

  • Post Tweet API - users should be able to write a tweet and attach images or videos
  • Read timeline API - users should be able to generate a timeline of tweets
  • Favorite Tweet API - users should be able to favorite a tweet



Non-Functional:

  • 1 million users
  • 10 million write requests per month
  • 1 billion read requests per month




Capacity estimation

  • 1 million users
  • 10 million write requests per month
  • 1 billion read requests per month
  • there are roughly 2.5 million seconds in a month:
    • 10 million / 2.5 million = 4 writes per second
    • 1 billion / 2.5 million = 400 reads per second
  • Assume a user sees 1,000 tweets per month and likes 10% of them, which is 100 likes per user per month. Across 1 million users this means that there will be:
    • 100 million likes per month
  • Likes are write requests, so total writes rise to 110 million per month, which means there are now 44 writes per second on average
  • Tweets will be capped at 255 characters, which is about 255 bytes assuming single-byte characters, and the max attachment size will be 1MB
  • In the worst case, total data we need to store is roughly 1MB per tweet * 10 million tweets per month = 10TB per month
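
The tweet and read figures above can be checked with a few lines of arithmetic. This is a minimal sketch: the variable names are illustrative, the month length is the rounded 2.5 million seconds used above, and the storage figure assumes every tweet carries a full-size attachment.

```python
# Rounded value used in the estimate; a 30-day month is about 2.59 million seconds.
SECONDS_PER_MONTH = 2_500_000


def per_second(requests_per_month: int) -> float:
    """Average requests per second for a monthly request count."""
    return requests_per_month / SECONDS_PER_MONTH


tweet_writes_per_month = 10_000_000
reads_per_month = 1_000_000_000
max_text_bytes = 255              # assumes one byte per character
max_attachment_bytes = 1_000_000  # 1MB cap, decimal megabytes assumed

tweet_writes_per_second = per_second(tweet_writes_per_month)
reads_per_second = per_second(reads_per_month)

# Worst case: every tweet uses the full text and attachment allowance.
storage_bytes_per_month = tweet_writes_per_month * (max_text_bytes + max_attachment_bytes)
storage_tb_per_month = storage_bytes_per_month / 1e12

print(tweet_writes_per_second, reads_per_second, round(storage_tb_per_month, 2))
```

The text payload is negligible next to the attachments, so the storage estimate is driven almost entirely by the 1MB cap.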






API design

  • The system will have 3 APIs
  • Tweet API - This takes a user's text input and attachment and stores it
  • Generate Timeline API - This API reads a list of new tweets from the database and displays them on the user's timeline.
  • Favorite API - This API increments a like counter on the tweet and stores a reference to the tweet in the user's database entry
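
One way to pin down the three APIs is as an interface. The sketch below is only an illustration: the class name, method names, parameters, and return types are assumptions, not part of the design above.

```python
from typing import Optional, Protocol


class TweetApi(Protocol):
    """Hypothetical interface for the three operations; all names are illustrative."""

    def post_tweet(self, user_id: str, text: str, attachment: Optional[bytes] = None) -> str:
        """Store the text and optional attachment; return the new tweet's ID."""
        ...

    def read_timeline(self, user_id: str) -> list[dict]:
        """Return a list of new tweets for the user's timeline."""
        ...

    def favorite_tweet(self, user_id: str, tweet_id: str) -> None:
        """Increment the tweet's like counter and record the tweet on the user."""
        ...
```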





Database design

  • Eventual consistency will be fine here as there is no time-critical data in a social media site
  • I will choose a NoSQL database here because of the structure of the calls that need to be made and the lack of necessity for relational mapping
  • NoSQL also allows us to scale horizontally and we could use a managed service such as DynamoDB or Firebase Firestore
  • The NoSQL database will have two collections: Tweets and Users
  • Tweets will be contained in separate documents with the following fields: timestamp, text, attachments, favorites and userID
  • Users will have documents containing fields such as tweets from the user (maps or links to the tweet documents) and favorites (maps or links to tweets they have favorited)
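
The two document shapes could look roughly like this. It is a sketch under assumptions: the field names follow the lists above, while the types and the tweet_id and user_id keys are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Tweet:
    tweet_id: str      # document key (assumed; not in the field list above)
    user_id: str       # author of the tweet
    timestamp: float   # seconds since the epoch
    text: str
    attachments: list[str] = field(default_factory=list)  # references to stored objects
    favorites: int = 0  # like counter


@dataclass
class User:
    user_id: str
    tweet_ids: list[str] = field(default_factory=list)     # tweets written by the user
    favorite_ids: list[str] = field(default_factory=list)  # tweets the user has favorited


tweet = Tweet(tweet_id="t1", user_id="u1", timestamp=0.0, text="hello")
author = User(user_id="u1", tweet_ids=[tweet.tweet_id])
```

Storing tweet IDs on the user document keeps a user's own tweets and favorites reachable with a single document read, at the cost of a second write on every post or favorite.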





High-level design

  1. At a high level, the client makes a request to the server
  2. The client request hits a load balancer, which uses round robin to direct it to an available server
  3. Depending on the request type, the server calls one of 3 APIs: write tweet, read timeline, or favorite tweet
  4. These three APIs perform their necessary actions against the NoSQL database
  5. For write tweet, it will create a new document in the tweets collection with the following fields: timestamp, text, attachments, favorites and userID. Attachments should be stored in the object store bucket, and the reference to the attachment should be put in the attachments field in the document.
  6. The read timeline API first hits a cache to see if a new enough version of the timeline, let's say within one minute, has already been read from the database, and returns that
  7. If the cache is stale, which means the newest tweet in the cache is more than one minute old, we go to the database and grab all documents that are newer than the newest tweet in the cache. We replace the current cache with the list of tweets we just obtained and return this data to the user to populate their timeline.
  8. The favorite tweet API finds the matching tweet document in the database and increments its likes counter. It then looks up the current user by their UID and adds a reference to that tweet to the "liked tweets" field in the user's document
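
Steps 6 and 7 can be sketched with in-memory stand-ins for the cache and the database. The names below are hypothetical. One deviation is made on purpose and should be read as an assumption: new tweets are prepended to the cached list instead of replacing it, so that a refresh which finds nothing new does not empty the timeline.

```python
import time
from typing import Optional

CACHE_TTL_SECONDS = 60  # "within one minute" from step 6

# In-memory stand-ins (assumptions for this sketch, not real services).
tweets_db: list[dict] = []       # each tweet: {"timestamp": float, "text": str}
timeline_cache: list[dict] = []  # newest tweet first


def fetch_newer_than(timestamp: float) -> list[dict]:
    """Simulated database query: tweets newer than `timestamp`, newest first."""
    newer = [t for t in tweets_db if t["timestamp"] > timestamp]
    return sorted(newer, key=lambda t: t["timestamp"], reverse=True)


def read_timeline(now: Optional[float] = None) -> list[dict]:
    """Serve from the cache while it is fresh; otherwise pull only newer tweets."""
    now = time.time() if now is None else now
    newest = timeline_cache[0]["timestamp"] if timeline_cache else float("-inf")
    if now - newest <= CACHE_TTL_SECONDS:
        return timeline_cache  # cache hit: newest cached tweet is under a minute old
    # Cache is stale or empty: prepend everything newer than the newest cached tweet.
    timeline_cache[:0] = fetch_newer_than(newest)
    return timeline_cache
```

Because staleness is tied to the newest tweet's age, a quiet period with no new tweets sends every read to the database, although each of those queries returns little or nothing.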






Request flows

  1. At a high level, the client makes a request to the server
  2. The client request hits a load balancer, which uses round robin to direct it to an available server
  3. Depending on the request type, the server calls one of 3 APIs: write tweet, read timeline, or favorite tweet
  4. These three APIs perform their necessary actions against the NoSQL database
  5. For write tweet, it will create a new document in the tweets collection with the following fields: timestamp, text, attachments, favorites and userID
  6. The read timeline API first hits a cache to see if a new enough version of the timeline, let's say within one minute, has already been read from the database, and returns that
  7. If the cache is stale, which means the newest tweet in the cache is more than one minute old, we go to the database and grab all documents that are newer than the newest tweet in the cache. We replace the current cache with the list of tweets we just obtained and return this data to the user to populate their timeline
  8. The favorite tweet API finds the matching tweet document in the database and increments its likes counter. It then looks up the current user by their UID and adds a reference to that tweet to the "liked tweets" field in the user's document
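
Steps 2 and 3 amount to a rotation over servers followed by a dispatch on request type. A minimal sketch, assuming three hypothetical server names and string request types:

```python
from itertools import cycle

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical server names
HANDLERS = {
    "write": "write tweet",
    "read": "read timeline",
    "favorite": "favorite tweet",
}

_rotation = cycle(SERVERS)  # round robin: a, b, c, a, b, c, ...


def route(request_type: str) -> tuple[str, str]:
    """Return the next server in the rotation and the API that handles the request."""
    if request_type not in HANDLERS:
        raise ValueError(f"unknown request type: {request_type}")
    return next(_rotation), HANDLERS[request_type]
```

Round robin ignores how busy each server is, which is acceptable here because every request type is handled by every server.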







Detailed component design

  • Read timeline API - The cache should be Amazon ElastiCache, allowing for managed scaling of the cache
  • The client should load attachments through a CDN like Amazon CloudFront, so that attachments, which could be images or videos, are served faster from an edge location closer to the user
  • The write API should store the attachments in an object store like Amazon S3 and store the CDN reference to that object in the tweet document. For example, if we are storing objects in S3 and using CloudFront, we will store the CloudFront link to that image in the tweet document
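
The link stored in the tweet document can be derived from the object key. This sketch assumes a placeholder CDN domain and a hypothetical key layout, and it leaves out the upload to the object store itself.

```python
from urllib.parse import quote

CDN_DOMAIN = "d1234example.cloudfront.net"  # placeholder distribution domain (assumption)


def attachment_url(object_key: str) -> str:
    """Build the CDN link for an object-store key, percent-encoding unsafe characters."""
    return f"https://{CDN_DOMAIN}/{quote(object_key)}"


# The tweet document keeps the CDN link, not the raw bucket location.
tweet_document = {
    "text": "look at this",
    "attachments": [attachment_url("tweets/t1/cat photo.jpg")],
}
```

Storing the CDN link means clients never need to know the bucket name, but changing the CDN domain later would require rewriting stored links; storing only the object key and building the URL at read time is the alternative.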





Trade offs/Tech choices

  • We are using a NoSQL database here over a SQL database because there is no strict relational mapping between any of the actions the server needs to perform. Also, as the number of writes grows, scaling the SQL database vertically will become challenging and expensive. Using NoSQL from the beginning will allow us to scale horizontally to billions of users and write requests. Eventual consistency is also fine for a social media platform: we do not need ACID transactions to show people tweets or the number of likes, as a slightly stale value will not affect the user experience.
  • We are also assuming that the user will see every tweet. If the user only wanted to see tweets from users they follow, SQL could be better, since the read API could join tweets against the list of accounts the user follows, matching on the tweet author's UID
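
To illustrate the follow-based alternative, here is a small sketch using SQLite from the Python standard library. The table and column names are hypothetical; the point is only that a follow list turns the timeline read into a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE follows (follower_id TEXT, followee_id TEXT);
    CREATE TABLE tweets  (tweet_id TEXT, user_id TEXT, created_at INTEGER, body TEXT);
    INSERT INTO follows VALUES ('alice', 'bob');
    INSERT INTO tweets  VALUES ('t1', 'bob',   1, 'from bob');
    INSERT INTO tweets  VALUES ('t2', 'carol', 2, 'from carol');
""")


def followed_timeline(user_id: str) -> list[str]:
    """Tweets written by accounts that `user_id` follows, newest first."""
    rows = conn.execute(
        """
        SELECT t.body
        FROM tweets AS t
        JOIN follows AS f ON f.followee_id = t.user_id
        WHERE f.follower_id = ?
        ORDER BY t.created_at DESC
        """,
        (user_id,),
    ).fetchall()
    return [body for (body,) in rows]


print(followed_timeline("alice"))
```

Here alice follows only bob, so carol's tweet is filtered out by the join.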





Failure scenarios/bottlenecks

  • We can replicate the database so that in case of an emergency we have a backup. This can be done every hour, as losing at most an hour of tweets is not that big of an impact on user experience.
  • If we have a cache miss, there could be a large number of tweets for a user to load. This means that at the tail end, one user will have a long load time while new tweets are loaded and cached
  • Availability should be less of a concern because we will use AWS services for the backend: Elastic Load Balancing for the load balancers, EC2 for the servers, CloudFront for the CDN, S3 for the object store, ElastiCache for the read cache, and DynamoDB for the NoSQL database.




Future improvements

  • Have a worker that fans out new tweets to individual users' timelines. Instead of having to read from the cache or check the database, a worker would generate a timeline for the user before they request it. Then, when it is requested, it is guaranteed to already be generated
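
A fan-out worker could be sketched as follows. The in-memory structures, the user list, and the timeline length cap are assumptions for illustration. Since this design shows every tweet to every user, the worker pushes each new tweet to all users.

```python
from collections import defaultdict, deque

MAX_TIMELINE_LENGTH = 100  # assumed cap on precomputed timeline size

user_ids = ["u1", "u2", "u3"]  # hypothetical users
# One bounded, newest-first timeline per user; the oldest entries fall off the end.
timelines = defaultdict(lambda: deque(maxlen=MAX_TIMELINE_LENGTH))


def fan_out(tweet: dict) -> None:
    """Worker step: push a new tweet onto every user's precomputed timeline."""
    for user_id in user_ids:
        timelines[user_id].appendleft(tweet)


def read_timeline(user_id: str) -> list[dict]:
    """Read path: the timeline is already built, so this is a plain lookup."""
    return list(timelines[user_id])
```

This moves the work from read time to write time: each tweet costs one write per user, which suits a system where reads outnumber writes by roughly 100 to 1.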