System requirements
Functional:
- postTweet
- Posts a tweet to the service on behalf of a user
- viewTweet
- Given a tweet ID, the service returns the tweet
- viewFeed
- A user can see their followees' tweets as a feed
- Follow
- A user can follow other users
Non-Functional:
- High availability
- Low latency when loading tweets
Capacity estimation
DAU: ~100M users
Each user posts 1 tweet per day
Each tweet takes about 5 KB
Expected storage: 5 KB * 100M * 365 * 10 ≈ 1.8 PB over 10 years (about 500 GB per day)
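The estimate above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 KB = 1000 bytes); the constant names are illustrative only.

```python
# Back-of-the-envelope storage estimate, assuming decimal units (1 KB = 1000 bytes).
DAILY_ACTIVE_USERS = 100_000_000   # ~100M DAU
TWEETS_PER_USER_PER_DAY = 1
BYTES_PER_TWEET = 5_000            # ~5 KB per tweet
RETENTION_YEARS = 10

bytes_per_day = DAILY_ACTIVE_USERS * TWEETS_PER_USER_PER_DAY * BYTES_PER_TWEET
bytes_per_year = bytes_per_day * 365
bytes_total = bytes_per_year * RETENTION_YEARS

print(f"per day:  {bytes_per_day / 1e9:.0f} GB")
print(f"per year: {bytes_per_year / 1e12:.1f} TB")
print(f"10 years: {bytes_total / 1e15:.3f} PB")
```

That works out to roughly 500 GB per day, 182.5 TB per year, and about 1.8 PB over 10 years.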
API design
/api/postTweet
ReqBody:
{
UserID
PostObject (Video, picture)
Post Description
}
ResBody:
{
status
}
/api/getTweet
ReqBody:
{
TweetID
}
ResBody:
{
Raw Data
}
/api/viewFeed/UserID
ResBody:
{
List of tweets from followees
}
/api/follow
ReqBody:
{
FollowerID,
FolloweeID
}
ResBody:
{
Status
}
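The request and response bodies above could be modeled as typed structures. The sketch below is one possible rendering, not a fixed contract: the class names, the snake_case field names, and the choice of JSON as the wire format are all assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PostTweetRequest:
    user_id: str
    description: str
    media: Optional[bytes] = None  # PostObject: raw video or picture payload


@dataclass
class GetTweetRequest:
    tweet_id: str


@dataclass
class FollowRequest:
    follower_id: str
    followee_id: str


@dataclass
class StatusResponse:
    status: str  # the concrete status values are not specified in the design


@dataclass
class FeedResponse:
    tweets: List[dict] = field(default_factory=list)  # tweets from followees


# Serialize a follow request the way a client might send it to /api/follow.
request = FollowRequest(follower_id="u1", followee_id="u2")
body = json.dumps(asdict(request))
print(body)
```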
Database design
Tweet Table (Metadata){
TweetID : String PK
UserID : String FK
Created_At : Date
Data URL : String
Desc : String
}
Follow Table {
Follow ID : String PK
Follower ID : String FK
Followee ID : String FK
}
User Table {
User ID : String PK
User Name : String
User Profile Picture URL : String
}
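As a rough illustration, the three tables can be expressed as SQL and exercised with an in-memory SQLite database. The snake_case column names, the `created_at` name for the tweet's date column, the uniqueness constraint on follow pairs, and the index are assumptions added for this sketch.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id             TEXT PRIMARY KEY,
    user_name           TEXT NOT NULL,
    profile_picture_url TEXT
);
CREATE TABLE tweets (
    tweet_id    TEXT PRIMARY KEY,
    user_id     TEXT NOT NULL REFERENCES users(user_id),
    created_at  TEXT NOT NULL,  -- creation date as an ISO-8601 string
    data_url    TEXT,           -- location of the media in the object store
    description TEXT
);
CREATE TABLE follows (
    follow_id   TEXT PRIMARY KEY,
    follower_id TEXT NOT NULL REFERENCES users(user_id),
    followee_id TEXT NOT NULL REFERENCES users(user_id),
    UNIQUE (follower_id, followee_id)  -- assumption: one row per follow edge
);
-- Assumption: feed queries filter by author and sort by date.
CREATE INDEX idx_tweets_user_created ON tweets (user_id, created_at);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# Sample rows: u1 follows u2, and u2 has posted one tweet.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("u1", "alice", None), ("u2", "bob", None)],
)
conn.execute("INSERT INTO follows VALUES ('f1', 'u1', 'u2')")
conn.execute(
    "INSERT INTO tweets VALUES (?, ?, ?, ?, ?)",
    ("t1", "u2", "2024-01-01T00:00:00Z", "https://example.com/t1.jpg", "hello"),
)

# Feed for u1: tweets written by the users u1 follows, newest first.
feed = conn.execute(
    """
    SELECT t.tweet_id, t.description
    FROM tweets AS t
    JOIN follows AS f ON f.followee_id = t.user_id
    WHERE f.follower_id = ?
    ORDER BY t.created_at DESC
    """,
    ("u1",),
).fetchall()
print(feed)
```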
High-level design
Drawn in HLD Diagram
- The client connects to the LB, which balances requests across the services (a rate limiter can also be added to help prevent DDoS attacks)
- The Object Store is connected to the User Service and the Tweet Service
- A user's followees are found through the GraphDB, and the feed is built from the followees' tweets
Request flows
- postTweet
- The client sends the request to the Tweet Service, which stores the photo or video in the Object Store and gets a URL back.
- The Tweet Service stores the tweet data, including the URL, in the DB.
- viewTweet
- The client requests a tweet, and the service checks whether the tweet exists in the DB.
- The photo or video is fetched from the Object Store (or from the cache or CDN, if available).
- The tweet is returned to the client for display.
- Follow user
- The request goes to the User Service.
- The User Service records the follow relationship as a Follow Table entry in the GraphDB.
- viewFeed
- The request goes to the Tweet Service, which queries the User Service's GraphDB for the user's followees.
- The Tweet Service looks up the followees' tweets in the DB and returns them as a list.
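The four flows above can be sketched with in-memory stand-ins for the services. Everything here is hypothetical: the class and method names, the `obj://` URL scheme, and the newest-first feed ordering are assumptions, and plain dictionaries take the place of the DB, the GraphDB, and the Object Store.

```python
import itertools
from collections import defaultdict
from typing import Dict, List, Optional, Set


class ObjectStore:
    """Stand-in for the Object Store: saves bytes and returns a URL."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}
        self._ids = itertools.count(1)

    def put(self, data: bytes) -> str:
        url = f"obj://media/{next(self._ids)}"
        self._blobs[url] = data
        return url


class UserService:
    """Keeps the follow graph as an adjacency list: follower -> followees."""

    def __init__(self) -> None:
        self._followees: Dict[str, Set[str]] = defaultdict(set)

    def follow(self, follower_id: str, followee_id: str) -> None:
        self._followees[follower_id].add(followee_id)

    def followees_of(self, user_id: str) -> Set[str]:
        return set(self._followees.get(user_id, set()))


class TweetService:
    def __init__(self, store: ObjectStore, users: UserService) -> None:
        self._store = store
        self._users = users
        self._tweets: Dict[str, dict] = {}  # stand-in for the tweet table
        self._seq = itertools.count(1)      # insertion order, used for sorting

    def post_tweet(self, user_id: str, desc: str,
                   media: Optional[bytes] = None) -> str:
        # Store the media first, then save the tweet row with its URL.
        data_url = self._store.put(media) if media is not None else None
        seq = next(self._seq)
        tweet_id = f"t{seq}"
        self._tweets[tweet_id] = {
            "tweet_id": tweet_id, "user_id": user_id, "seq": seq,
            "data_url": data_url, "desc": desc,
        }
        return tweet_id

    def view_tweet(self, tweet_id: str) -> Optional[dict]:
        return self._tweets.get(tweet_id)  # None if the tweet does not exist

    def view_feed(self, user_id: str) -> List[dict]:
        followees = self._users.followees_of(user_id)
        feed = [t for t in self._tweets.values() if t["user_id"] in followees]
        return sorted(feed, key=lambda t: t["seq"], reverse=True)
```

A short usage example: after `users.follow("alice", "bob")` and two calls to `tweets.post_tweet("bob", ...)`, `tweets.view_feed("alice")` returns bob's two tweets with the most recent one first, while tweets from users alice does not follow are left out.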
Detailed component design
The cache uses LRU eviction, since the least recently viewed tweets are the ones losing views.
The GraphDB keeps an adjacency list of followees for each user.
The CDN caches a small set of static content, such as videos and pictures.
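To make the LRU policy concrete, here is a minimal sketch built on `collections.OrderedDict`. The class name, its `get`/`put` interface, and the capacity are illustrative assumptions; a production system would more likely rely on the eviction policy of its cache server.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._items:
            return None  # cache miss
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.put("t1", {"desc": "first"})
cache.put("t2", {"desc": "second"})
cache.get("t1")                      # t1 is now the most recently used
cache.put("t3", {"desc": "third"})   # evicts t2, the least recently used
```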
Trade offs/Tech choices
I chose an RDBMS over NoSQL because the tables have complicated relations. NoSQL remains an option, since it has a strong capacity for horizontal scaling.
Failure scenarios/bottlenecks
The LB can be a single point of failure (this can be covered by adding a passive standby LB)
The cache can also be a single point of failure (losing it leads to high latency when viewing the feed)
Without rate limiting, the service is exposed to DDoS attacks
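The design does not say which rate-limiting algorithm to use. A token bucket is one common choice, sketched below under that assumption; the class name, parameters, and injectable clock are illustrative. A single-process limiter like this would not stop a distributed attack by itself, but it shows the basic mechanism.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows short bursts up to `capacity`, refilled at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._capacity = capacity
        self._clock = clock          # injectable so tests can control time
        self._tokens = capacity      # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, never above capacity.
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False  # caller should reject the request, e.g. with HTTP 429
```

In practice there would be one bucket per client key (for example per user ID or IP address), which is also an assumption about how the limiter would be deployed.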
Future improvements
Add rate limiting and a passive LB to improve availability
Partition (shard) the DB by user ID to speed up viewFeed
Support storing multiple pictures or videos per tweet
Add an authentication and authorization system
Add a notification system for when a followee posts a new tweet
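For the sharding item, one simple option is to hash the user ID and take it modulo the shard count. This is a sketch under assumptions: the shard count of 8 and the function names are hypothetical, and SHA-256 is used only because Python's built-in `hash()` for strings is randomized per process and so is not stable across servers.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List

NUM_SHARDS = 8  # hypothetical shard count


def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index using a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def group_by_shard(user_ids: Iterable[str],
                   num_shards: int = NUM_SHARDS) -> Dict[int, List[str]]:
    """Group followee IDs by shard so viewFeed sends one query per shard."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for user_id in user_ids:
        groups[shard_for_user(user_id, num_shards)].append(user_id)
    return dict(groups)
```

Because all of a user's tweets land on one shard, a feed for N followees touches at most min(N, number of shards) shards. Plain modulo hashing moves most keys when the shard count changes; consistent hashing is a common alternative if resharding is expected.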