System requirements


Functional:

Users can create and share tweets

Users can follow other users

Users are notified when someone they follow posts a tweet

Users can like a tweet



Non-Functional:

Scalability - store large volumes of tweets and per-user data; handle 100 million users making ~5 requests per user per day

Speed - respond to requests in under 1000 ms

Availability - target near-100% uptime

Security - protect user data such as credentials and personal information




Capacity estimation

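Estimated users: ~100 million

Estimated requests: ~500 million per day (100 million users × ~5 requests each), or roughly 5,800 requests per second on average

Estimated storage: ~1 KB per tweet (text, timestamp, user ID, likes). Assuming, as a worst case, that every request creates a tweet, this comes to about 500 GB of new data per day, or roughly 180 TB per year.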
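The estimates above can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch that assumes decimal units (1 KB = 1,000 bytes) and the worst case where every request writes one ~1 KB tweet; real traffic is read-heavy, so actual write volume would be lower.

```python
# Back-of-the-envelope capacity estimate using the figures in this design.
# Assumption: every request writes one ~1 KB tweet (worst case for storage).
USERS = 100_000_000
REQUESTS_PER_USER_PER_DAY = 5
TWEET_SIZE_BYTES = 1_000          # ~1 KB, decimal units
SECONDS_PER_DAY = 24 * 60 * 60

requests_per_day = USERS * REQUESTS_PER_USER_PER_DAY
avg_requests_per_second = requests_per_day / SECONDS_PER_DAY
storage_per_day_gb = requests_per_day * TWEET_SIZE_BYTES / 1e9
storage_per_year_tb = storage_per_day_gb * 365 / 1e3

print(f"{requests_per_day:,} requests/day")
print(f"~{avg_requests_per_second:,.0f} requests/second on average")
print(f"~{storage_per_day_gb:,.0f} GB/day, ~{storage_per_year_tb:,.1f} TB/year")
```

This prints 500,000,000 requests/day, ~5,787 requests/second, and ~500 GB/day (~182.5 TB/year). Peak traffic is usually several times the average, so servers should be sized for the peak rather than the mean.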


Data model

Users - user ID, username, email, password (stored as a hash, never in plain text), profile info

Tweets - tweet ID, text, timestamp, user ID, likes

Followers - follower ID and followee ID
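As an illustration only, the three entities could be modeled in application code as simple records. The class and field names below (for example `password_hash` and `profile_info`) are hypothetical choices, not part of the original design.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical in-memory records for the three entities in the data model.

@dataclass
class User:
    user_id: int
    username: str
    email: str
    password_hash: str        # store a hash, never the raw password
    profile_info: str = ""

@dataclass
class Tweet:
    tweet_id: int
    user_id: int              # author of the tweet
    text: str
    timestamp: datetime
    likes: int = 0            # denormalized like count

@dataclass(frozen=True)       # frozen -> hashable, so follows can live in a set
class Follow:
    follower_id: int          # the user who follows
    followee_id: int          # the user being followed

alice = User(1, "alice", "alice@example.com", "<hash>")
tweet = Tweet(10, alice.user_id, "hello", datetime.now(timezone.utc))
follow = Follow(follower_id=2, followee_id=1)
```

Modeling a follow as its own pair, rather than as a list stored on each user, mirrors the separate Followers table and makes "who follows X" and "who does X follow" symmetric lookups.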



API design

RESTful API

  • POST /tweets - create new tweet
  • GET /tweets - retrieve a list of tweets
  • POST /likes - like a tweet
  • GET /followers - retrieve followers list
  • GET /following - retrieve following list
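A framework-free sketch of how these endpoints might be routed to handlers. The handler names, the `(status, body)` response shape, and the request-body fields (`text`, `tweet_id`) are assumptions for illustration; a real service would typically use a web framework.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical response shape: (HTTP status code, JSON-like body).
Response = Tuple[int, dict]
Handler = Callable[[dict], Response]

def create_tweet(body: dict) -> Response:
    if not body.get("text"):
        return 400, {"error": "text is required"}
    return 200, {"status": "created"}

def list_tweets(body: dict) -> Response:
    return 200, {"tweets": []}

def like_tweet(body: dict) -> Response:
    if "tweet_id" not in body:
        return 400, {"error": "tweet_id is required"}
    return 200, {"liked": body["tweet_id"]}

def list_followers(body: dict) -> Response:
    return 200, {"followers": []}

def list_following(body: dict) -> Response:
    return 200, {"following": []}

# Maps (HTTP method, path) to a handler, mirroring the endpoint list.
ROUTES: Dict[Tuple[str, str], Handler] = {
    ("POST", "/tweets"): create_tweet,
    ("GET", "/tweets"): list_tweets,
    ("POST", "/likes"): like_tweet,
    ("GET", "/followers"): list_followers,
    ("GET", "/following"): list_following,
}

def dispatch(method: str, path: str, body: Optional[dict] = None) -> Response:
    """Look up the handler; unknown routes get 404 (a fuller router might use 405)."""
    handler = ROUTES.get((method.upper(), path))
    if handler is None:
        return 404, {"error": "not found"}
    return handler(body or {})
```

For example, `dispatch("POST", "/likes", {"tweet_id": 7})` returns `(200, {"liked": 7})`, while the same call without a body returns a 400.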




Database design

We can use a relational (SQL) database such as PostgreSQL.


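User table - user ID, username, email, password hash

Tweet table - tweet ID, text, timestamp, user ID, likes

  • the tweet ID primary key is indexed automatically; also index the user ID column (with the timestamp) so that fetching a user's tweets does not require a full table scan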

Follower table - follower ID, followee ID
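One possible schema for these tables is sketched below. It runs against SQLite only so the example is self-contained; on PostgreSQL the column types would differ (for example `BIGSERIAL` and `TIMESTAMPTZ`). Column and index names such as `created_at` and `idx_tweets_user_time` are hypothetical, as is the composite primary key on the follower table.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for PostgreSQL so the sketch runs anywhere.
SCHEMA = """
CREATE TABLE users (
    user_id       INTEGER PRIMARY KEY,
    username      TEXT NOT NULL UNIQUE,
    email         TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL
);
CREATE TABLE tweets (
    tweet_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    text       TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    likes      INTEGER NOT NULL DEFAULT 0
);
-- Supports "a user's tweets, newest first" without scanning the whole table.
CREATE INDEX idx_tweets_user_time ON tweets(user_id, created_at);
CREATE TABLE followers (
    follower_id INTEGER NOT NULL REFERENCES users(user_id),
    followee_id INTEGER NOT NULL REFERENCES users(user_id),
    PRIMARY KEY (follower_id, followee_id)   -- one row per follow; no duplicates
);
-- The composite key covers "who does X follow"; this index covers "who follows X".
CREATE INDEX idx_followers_followee ON followers(followee_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

The composite primary key doubles as a uniqueness constraint, so following the same user twice fails at the database level rather than relying on application checks.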


High-level design

Requests from the client first reach a load balancer, which distributes incoming traffic across multiple application servers. A cache sits in front of the database to serve frequently accessed data, such as popular tweets or user profiles.
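The caching described above is commonly implemented as a cache-aside read path: check the cache, and on a miss load from the database and populate the cache. The sketch below uses a tiny in-process TTL cache as a stand-in for a shared cache such as Redis; `TTLCache`, `get_tweet`, and the 60-second TTL are illustrative assumptions.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple

class TTLCache:
    """Tiny in-process stand-in for a shared cache such as Redis (illustration only)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[Hashable, Tuple[Any, float]] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # entry expired; treat as a miss
            return None
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (value, self.clock() + self.ttl)

def get_tweet(tweet_id: int, cache: TTLCache,
              load_from_db: Callable[[int], Optional[dict]]) -> Optional[dict]:
    """Cache-aside read: serve from cache, otherwise load from the database and cache it."""
    cached = cache.get(tweet_id)
    if cached is not None:
        return cached
    tweet = load_from_db(tweet_id)
    if tweet is not None:
        cache.set(tweet_id, tweet)
    return tweet

# Usage with a fake database and a controllable clock.
db_calls = []
def fake_db(tweet_id: int) -> dict:
    db_calls.append(tweet_id)  # record each database hit
    return {"tweet_id": tweet_id, "text": "hello"}

fake_now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
get_tweet(1, cache, fake_db)   # miss: loads from the database
get_tweet(1, cache, fake_db)   # hit: served from the cache
fake_now[0] = 61.0             # advance past the TTL
get_tweet(1, cache, fake_db)   # expired: loads from the database again
```

The TTL bounds how stale a cached tweet (for example its like count) can become, which is the usual price paid for the reduced database load.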


Use a CDN to serve static app assets (images, scripts, stylesheets) from locations close to users.


Use database replication (a primary plus one or more replicas) for high availability; the replicas can also absorb read traffic.
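With replication, a common pattern is to send writes to the primary and spread reads across replicas. The router below is a hypothetical sketch: the node names are placeholders, and classifying statements by a leading `SELECT` is a deliberate simplification.

```python
import itertools
from typing import List

class ReplicatedDB:
    """Hypothetical query router for a primary + read-replica setup."""

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        # Rotate reads across replicas; fall back to the primary if there are none.
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        """Return the node that should run this statement."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.primary

db = ReplicatedDB("db-primary", ["db-replica-1", "db-replica-2"])
```

Here `db.route("INSERT ...")` always returns `"db-primary"`, while successive `SELECT` statements alternate between the two replicas. Replicas usually lag slightly behind the primary, so a user reading their own just-posted tweet may need to be served from the primary.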






Request flows

User creates a new tweet

  • validate request
  • store tweet
  • return OK
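The create-tweet flow (validate, store, return OK) might look like this. SQLite stands in for PostgreSQL, and the 280-character limit is a hypothetical value; this design does not specify one.

```python
import sqlite3
from typing import Tuple

MAX_TWEET_LENGTH = 280  # hypothetical limit; not specified in this design

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (tweet_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, "
    "text TEXT NOT NULL, created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP, "
    "likes INTEGER NOT NULL DEFAULT 0)"
)

def create_tweet(conn: sqlite3.Connection, user_id: int, text: str) -> Tuple[int, dict]:
    # 1. Validate the request.
    if not isinstance(text, str) or not text.strip():
        return 400, {"error": "text must be a non-empty string"}
    if len(text) > MAX_TWEET_LENGTH:
        return 400, {"error": f"text exceeds {MAX_TWEET_LENGTH} characters"}
    # 2. Store the tweet; the connection context manager commits on success.
    with conn:
        cur = conn.execute(
            "INSERT INTO tweets (user_id, text) VALUES (?, ?)", (user_id, text)
        )
    # 3. Return OK with the new tweet's ID.
    return 200, {"tweet_id": cur.lastrowid}

status, body = create_tweet(conn, 1, "hello world")
```

Invalid input is rejected before touching the database, so malformed requests never cost a write.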

User requests tweet list

  • retrieve tweets from the database
  • return OK with list of tweets
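If the tweet list is a user's home timeline, one simple approach is to build it at read time ("fan-out on read") by joining the follower and tweet tables. This is a sketch under that assumption, using SQLite and hypothetical column names; at very large scale, systems often precompute timelines instead.

```python
import sqlite3
from typing import Tuple

# Minimal tables for a self-contained demo (SQLite standing in for PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tweets (tweet_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL,
                     text TEXT NOT NULL, created_at TEXT NOT NULL);
CREATE TABLE followers (follower_id INTEGER NOT NULL, followee_id INTEGER NOT NULL,
                        PRIMARY KEY (follower_id, followee_id));
""")

def home_timeline(conn: sqlite3.Connection, user_id: int, limit: int = 20) -> Tuple[int, dict]:
    """Newest tweets from accounts that `user_id` follows."""
    rows = conn.execute(
        """
        SELECT t.tweet_id, t.user_id, t.text, t.created_at
        FROM tweets AS t
        JOIN followers AS f ON f.followee_id = t.user_id
        WHERE f.follower_id = ?
        ORDER BY t.created_at DESC, t.tweet_id DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()
    fields = ("tweet_id", "user_id", "text", "created_at")
    return 200, {"tweets": [dict(zip(fields, row)) for row in rows]}

# User 1 follows user 2 but not user 3.
conn.execute("INSERT INTO followers VALUES (1, 2)")
conn.executemany(
    "INSERT INTO tweets (user_id, text, created_at) VALUES (?, ?, ?)",
    [(2, "older", "2024-01-01T10:00:00"),
     (3, "not followed", "2024-01-01T11:00:00"),
     (2, "newer", "2024-01-01T12:00:00")],
)
status, body = home_timeline(conn, 1)
```

The result contains only user 2's tweets, newest first; the `LIMIT` keeps the response small, and pagination would extend it with a cursor.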

User requests to like a tweet

  • validate request
  • increment the like count in the database
  • return OK
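When many users like the same tweet at once, a read-modify-write in application code can lose updates. A single `UPDATE ... SET likes = likes + 1` lets the database apply each increment atomically. The sketch below uses SQLite and a hypothetical `like_tweet` function; note that it only counts likes and does not stop one user from liking twice, which would need per-user like records.

```python
import sqlite3
from typing import Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (tweet_id INTEGER PRIMARY KEY, likes INTEGER NOT NULL DEFAULT 0)")
conn.execute("INSERT INTO tweets (tweet_id) VALUES (1)")
conn.commit()

def like_tweet(conn: sqlite3.Connection, tweet_id: int) -> Tuple[int, dict]:
    # Validate the request.
    if not isinstance(tweet_id, int):
        return 400, {"error": "tweet_id must be an integer"}
    # Atomic increment in the database: no lost updates under concurrency.
    with conn:
        cur = conn.execute(
            "UPDATE tweets SET likes = likes + 1 WHERE tweet_id = ?", (tweet_id,)
        )
    if cur.rowcount == 0:
        return 404, {"error": "tweet not found"}
    return 200, {"tweet_id": tweet_id}
```

Calling `like_tweet(conn, 1)` twice leaves the like count at 2; an unknown tweet ID returns 404.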




Detailed component design

Load balancer: nginx

Cache: Redis

CDN: Cloudflare


Database: PostgreSQL




Trade-offs/Tech choices

Scalability vs performance: caching reduces read latency and database load, at the cost of extra infrastructure and potentially stale data

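Security vs ease of use: HTTPS for all traffic, OAuth for authorization, and two-factor authentication (2FA), which adds friction at login in exchange for stronger account security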
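The design does not say which 2FA method to use. One common choice, assumed here, is time-based one-time passwords (TOTP, RFC 6238), the codes produced by authenticator apps. The sketch below uses only the standard library; the function names are hypothetical, and a production system should rely on a vetted library and store secrets securely.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) with HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low 4 bits of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the current time step."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)

def verify_totp(secret: bytes, submitted: str, at: Optional[float] = None,
                window: int = 1, step: int = 30, digits: int = 6) -> bool:
    """Accept codes from the current step +/- `window` steps to tolerate clock drift."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret, now + k * step, step, digits), submitted)
        for k in range(-window, window + 1)
    )
```

`hmac.compare_digest` is used for the comparison so that checking a submitted code does not leak timing information.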


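Relational database: a good fit for structured, related data (users, their tweets, and follow relationships) that benefits from joins and transactions, though it is harder to scale horizontally than many NoSQL stores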




Failure scenarios/bottlenecks

Unexpected failures can occur in the database, the load balancer, or the cache. To avoid single points of failure, run multiple instances of each component (for example, redundant load balancers and cache nodes) and replicate the database to maintain uptime.
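Redundancy only helps if clients can move on when an instance fails. The sketch below tries redundant backends in order and returns the first success; the backend names and `call_with_failover` are hypothetical. It suits read requests; failing over writes to a replica would first require promoting it to primary.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

class AllBackendsFailed(Exception):
    """Raised when every redundant instance failed."""

def call_with_failover(backends: Sequence[str], request: Callable[[str], T]) -> T:
    """Try each backend in order; only connection failures trigger a retry."""
    errors: List[str] = []
    for backend in backends:
        try:
            return request(backend)
        except ConnectionError as exc:
            errors.append(f"{backend}: {exc}")
    raise AllBackendsFailed("; ".join(errors) or "no backends configured")

# Usage: the primary is down, so the read is served by a replica.
def flaky_read(backend: str) -> str:
    if backend == "db-primary":
        raise ConnectionError("connection refused")
    return f"rows from {backend}"

result = call_with_failover(["db-primary", "db-replica-1"], flaky_read)
```

Only `ConnectionError` is treated as retryable here, so bugs such as a malformed query still surface immediately instead of being retried on every node.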




Future improvements

Tweet search, and a user profile screen showing replies, media, and likes

A way to message each other

Tweet analytics to help users track and improve their engagement