System requirements
Functional:
Users can create and share tweets
Users can follow other users
Users are notified when someone they follow tweets
Users can like a tweet
Non-Functional:
Scalability - store tweets and profile data for ~100 million users, each making ~5 requests per day
Speed - respond to requests in under 1000 ms
Availability - stay as close to 100% uptime as practical
Security - protect data
Capacity estimation
Estimated users: ~100 million
Estimated requests: ~500 million per day (100 million users x ~5 requests each)
Estimated storage: ~1 KB per tweet (text, timestamp, user ID, likes). If every request created a tweet, that is up to ~500 GB of new data per day.
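The arithmetic behind these estimates can be checked with a short script. This is a rough sketch: it uses decimal units (1 KB = 1,000 bytes) and treats the storage figure as an upper bound, assuming every request writes one tweet. The average requests-per-second figure is derived here and does not appear in the notes above.

```python
# Back-of-the-envelope capacity estimate using the figures above.
USERS = 100_000_000              # ~100 million users
REQUESTS_PER_USER_PER_DAY = 5    # ~5 requests per user per day
TWEET_SIZE_BYTES = 1_000         # ~1 KB per tweet (decimal units, an assumption)
SECONDS_PER_DAY = 24 * 60 * 60

requests_per_day = USERS * REQUESTS_PER_USER_PER_DAY
avg_requests_per_second = requests_per_day / SECONDS_PER_DAY

# Upper bound: assumes every single request stores a new tweet.
max_storage_per_day_gb = requests_per_day * TWEET_SIZE_BYTES / 1_000_000_000

print(f"requests/day:            {requests_per_day:,}")
print(f"average requests/second: {avg_requests_per_second:,.0f}")
print(f"max storage/day:         {max_storage_per_day_gb:,.0f} GB")
```

Real traffic is bursty, so peak load would be some multiple of the average; that multiplier is not estimated here.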
Data model
Users - user info, username, email, password, profile info
Tweets - text, timestamp, user ID, likes
Followers - follower ID and followee ID
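As a minimal sketch, the three entities could be written as Python dataclasses. The `tweet_id` field, the `password_hash` name, and the default values are assumptions added for illustration; the notes above only list the fields themselves.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class User:
    user_id: int
    username: str
    email: str
    password_hash: str      # assumption: store a hash rather than the raw password
    profile_info: str = ""


@dataclass
class Tweet:
    tweet_id: int           # assumption: surrogate key, not listed in the notes
    user_id: int            # author of the tweet
    text: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    likes: int = 0


@dataclass(frozen=True)
class Follow:
    follower_id: int        # the user who follows
    followee_id: int        # the user being followed


tweet = Tweet(tweet_id=1, user_id=7, text="hello world")
follow = Follow(follower_id=8, followee_id=7)
```

`Follow` is frozen so that instances are hashable and can be kept in a set, which makes duplicate follow relationships easy to reject.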
API design
RESTful API
- POST /tweets - create new tweet
- GET /tweets - retrieve tweet list
- POST /likes - like a tweet
- GET /followers - retrieve followers list
- GET /following - retrieve following list
Database design
We can use an SQL database, let's say PostgreSQL
User table - user ID, username, email, password
Tweet table - text, timestamp, user id, likes
- can index the user id column for query performance (for example, when listing one user's tweets)
Follower table - follower ID, followee ID
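The tables above might look roughly like the following schema. To keep the example runnable without a server, it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, so the column types are SQLite's. The `tweet_id` key, the constraints, the index name, and the reading of "id column" as the tweet's user id are all assumptions.

```python
import sqlite3

# Illustrative schema; SQLite stands in for PostgreSQL here.
SCHEMA = """
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    email    TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL            -- assumption: holds a hash, not plaintext
);
CREATE TABLE tweets (
    tweet_id  INTEGER PRIMARY KEY,    -- assumption: not listed in the notes
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    text      TEXT NOT NULL,
    timestamp TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    likes     INTEGER NOT NULL DEFAULT 0
);
-- Index to speed up "all tweets by this user" lookups.
CREATE INDEX idx_tweets_user_id ON tweets(user_id);
CREATE TABLE followers (
    follower_id INTEGER NOT NULL REFERENCES users(user_id),
    followee_id INTEGER NOT NULL REFERENCES users(user_id),
    PRIMARY KEY (follower_id, followee_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

conn.execute(
    "INSERT INTO users (username, email, password) VALUES (?, ?, ?)",
    ("alice", "alice@example.com", "<hashed>"),
)
conn.execute("INSERT INTO tweets (user_id, text) VALUES (?, ?)", (1, "hello"))
rows = conn.execute(
    "SELECT text, likes FROM tweets WHERE user_id = ?", (1,)
).fetchall()
print(rows)
```

The composite primary key on the followers table prevents the same follow relationship from being stored twice.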
High-level design
Starting from the client, requests go through a load balancer that distributes incoming traffic across multiple application servers. We will cache frequently accessed data, such as popular tweets or users.
Use a CDN to serve app assets to users quickly
Use database replication for high availability
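One common way to cache frequently accessed data is the cache-aside pattern: check the cache first, fall back to the database on a miss, then store the result. The sketch below is only an illustration. `TTLCache` is a hypothetical in-process stand-in for a real cache, and `get_tweet`, `fake_db`, and the 60-second expiry are invented for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-process stand-in for an external cache (hypothetical helper)."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._items: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._items[key]  # entry expired, treat as a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._items[key] = (time.monotonic() + self._ttl, value)


def get_tweet(tweet_id: int, cache: TTLCache,
              load_from_db: Callable[[int], str]) -> str:
    """Cache-aside read: try the cache, then the database, then fill the cache."""
    key = f"tweet:{tweet_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(tweet_id)
    cache.set(key, value)
    return value


db_reads = []

def fake_db(tweet_id: int) -> str:
    db_reads.append(tweet_id)  # record each database hit
    return f"tweet body {tweet_id}"

cache = TTLCache(ttl_seconds=60)
first = get_tweet(42, cache, fake_db)
second = get_tweet(42, cache, fake_db)  # served from the cache
```

The expiry time bounds how stale a cached tweet can get, which matters for fields that change often, such as the like count.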
Request flows
User creates a new tweet
- validate request
- store tweet
- return OK
User requests tweet list
- retrieve tweets from database
- return OK with list of tweets
User requests to like a tweet
- validate request
- update like count in database
- return OK
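The three flows above can be sketched as a small in-memory service. This is only an illustration of the validate, store, and return steps: the `TweetService` class, the 280-character limit, and the error status codes are assumptions, and a dictionary stands in for the database.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

MAX_TWEET_LENGTH = 280  # assumption: the notes do not specify a limit


@dataclass
class Tweet:
    tweet_id: int
    user_id: int
    text: str
    likes: int = 0


class TweetService:
    """Hypothetical handler layer; a dict stands in for the database."""

    def __init__(self) -> None:
        self._tweets: Dict[int, Tweet] = {}
        self._next_id = 1

    def create_tweet(self, user_id: int, text: str) -> Tuple[int, dict]:
        # Validate the request.
        text = text.strip()
        if not text or len(text) > MAX_TWEET_LENGTH:
            return 400, {"error": "tweet must be 1 to 280 characters"}
        # Store the tweet.
        tweet = Tweet(self._next_id, user_id, text)
        self._tweets[tweet.tweet_id] = tweet
        self._next_id += 1
        # Return OK.
        return 200, {"tweet_id": tweet.tweet_id}

    def list_tweets(self) -> Tuple[int, dict]:
        # Retrieve tweets and return OK with the list.
        tweets = [
            {"tweet_id": t.tweet_id, "user_id": t.user_id,
             "text": t.text, "likes": t.likes}
            for t in self._tweets.values()
        ]
        return 200, {"tweets": tweets}

    def like_tweet(self, tweet_id: int) -> Tuple[int, dict]:
        # Validate the request: the tweet must exist.
        tweet = self._tweets.get(tweet_id)
        if tweet is None:
            return 404, {"error": "tweet not found"}
        # Update the like count and return OK.
        tweet.likes += 1
        return 200, {"likes": tweet.likes}


service = TweetService()
created = service.create_tweet(user_id=1, text="hello world")
liked = service.like_tweet(tweet_id=1)
listing = service.list_tweets()
```

In a real deployment the like update would need to be atomic in the database (for example, a single `UPDATE ... SET likes = likes + 1` statement) so that concurrent likes are not lost.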
Detailed component design
Load balancer can be managed using nginx
Caching can be managed using Redis
CDN can be managed with Cloudflare
Database: PostgreSQL
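Trade-offs/Tech choices
Scalability vs performance: caching reduces database load and response time, at the cost of occasionally serving stale data
Security vs ease of use: HTTPS, OAuth, and two-factor authentication add protection at the cost of extra sign-in steps
Using a relational database to query and handle a user's data and their tweets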
Failure scenarios/bottlenecks
Unexpected failures can occur in the database, load balancer, or cache. To avoid a single point of failure, run redundant instances of the load balancer and cache, and replicate the database so that a healthy copy can take over when one fails.
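A minimal sketch of how replication helps on the read path: try each database node in order and fall back to the next one when a connection fails. The `read_with_failover` function, the `AllReplicasFailed` exception, and the fake nodes are hypothetical; a production setup would usually rely on the database driver or a proxy for this.

```python
from typing import Callable, List, Sequence


class AllReplicasFailed(Exception):
    """Raised when no database node could serve the read."""


def read_with_failover(nodes: Sequence[Callable[[str], str]], query: str) -> str:
    """Try each node in order and return the first successful result."""
    errors: List[Exception] = []
    for node in nodes:
        try:
            return node(query)
        except ConnectionError as exc:
            errors.append(exc)  # remember the failure and try the next node
    raise AllReplicasFailed(f"{len(errors)} node(s) failed for query: {query}")


def primary(query: str) -> str:
    raise ConnectionError("primary is down")  # simulated outage

def replica(query: str) -> str:
    return f"result of {query!r} from replica"

result = read_with_failover([primary, replica], "SELECT 1")
```

Writes are harder than reads: they need a single authoritative primary, so failover for writes involves promoting a replica rather than just retrying elsewhere.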
Future improvements
A way to search for tweets
User profile screen with replies, media and likes
A way for users to message each other
Tweet analytics for users to work on their engagement