System requirements


Functional:

  • postTweet
    • A user posts a tweet to the service
  • viewTweet
    • Given a tweet ID, the service returns that tweet
  • viewFeed
    • A user sees the tweets of the users they follow (their followees) as a feed
  • Follow
    • A user can follow other users



Non-Functional:

  • Highly available
  • Low latency when loading tweets




Capacity estimation



DAU: ~100M users

Each user posts 1 tweet per day

Each tweet takes about 5 KB

Storage over 10 years: 5 KB * 100M * 365 * 10 ≈ 1.8 PB (about 500 GB per day)
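
The estimate above can be checked with a few lines of arithmetic. This is a rough sketch that assumes decimal units (1 KB = 1,000 bytes) and ignores replication and indexes; the variable names are illustrative only.

```python
# Back-of-the-envelope storage estimate using the numbers stated above.
# Assumption: decimal units (1 KB = 1,000 bytes); replication and indexes are ignored.
DAILY_ACTIVE_USERS = 100_000_000   # ~100M DAU
TWEETS_PER_USER_PER_DAY = 1
TWEET_SIZE_BYTES = 5 * 1_000       # ~5 KB per tweet
DAYS_PER_YEAR = 365
YEARS = 10

bytes_per_day = DAILY_ACTIVE_USERS * TWEETS_PER_USER_PER_DAY * TWEET_SIZE_BYTES
bytes_per_year = bytes_per_day * DAYS_PER_YEAR
bytes_total = bytes_per_year * YEARS

print(f"per day:  {bytes_per_day / 1e9:.1f} GB")    # 500.0 GB
print(f"per year: {bytes_per_year / 1e12:.1f} TB")  # 182.5 TB
print(f"10 years: {bytes_total / 1e15:.3f} PB")     # 1.825 PB
```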



API design


/api/postTweet

ReqBody:

{

UserID

PostObject (Video, picture)

Post Description

}


ResBody:

{

status

}


/api/getTweet

ReqBody:

{

TweetID

}

ResBody:

{

Raw Data

}


/api/viewFeed/UserID

ResBody:

{

List of tweets from followees

}


/api/follow

ReqBody:

{

FollowerID,

FolloweeID

}

ResBody:

{

Status

}
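
To make the request and response shapes concrete, here is a minimal sketch of two of the bodies above as Python dataclasses. The field names follow the API list; the types, the class names, the idea of passing the media as a reference string, and the self-follow check are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class PostTweetRequest:
    user_id: str
    description: str
    # Assumption: the media is passed as a reference (for example an upload
    # handle) rather than as raw bytes inside the JSON body.
    post_object: Optional[str] = None


@dataclass
class FollowRequest:
    follower_id: str
    followee_id: str


@dataclass
class StatusResponse:
    status: str


def validate_follow(req: FollowRequest) -> StatusResponse:
    """Assumed rule, not stated in the design: a user cannot follow themselves."""
    if req.follower_id == req.followee_id:
        return StatusResponse(status="error: cannot follow yourself")
    return StatusResponse(status="ok")


# Example body for /api/postTweet, serialized to JSON.
print(json.dumps(asdict(PostTweetRequest(user_id="u1", description="hello world"))))
print(validate_follow(FollowRequest(follower_id="u1", followee_id="u2")))
```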





Database design



Tweet Table (Metadata){

TweetID : String PK

UserID : String FK

Created_At : Date

Data URL : String

Desc : String

}


Follow Table {

Follow ID : String PK

Follower ID : String FK

Followee ID : String FK

}


User Table {

User ID : String PK

User Name : String

User Profile Picture URL : String

}
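
The three tables can be sketched as SQLite DDL to check that the keys and the feed query fit together. This is only an illustration: the snake_case column names, the `created_at` name for the creation date, the uniqueness constraint on follow pairs, and the index are assumptions, and a production system would use a server database rather than in-memory SQLite.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id             TEXT PRIMARY KEY,
    user_name           TEXT NOT NULL,
    profile_picture_url TEXT
);
CREATE TABLE tweets (
    tweet_id    TEXT PRIMARY KEY,
    user_id     TEXT NOT NULL REFERENCES users(user_id),
    created_at  TEXT NOT NULL,          -- ISO 8601 timestamp
    data_url    TEXT,                   -- location of the media in the object store
    description TEXT
);
CREATE TABLE follows (
    follow_id   TEXT PRIMARY KEY,
    follower_id TEXT NOT NULL REFERENCES users(user_id),
    followee_id TEXT NOT NULL REFERENCES users(user_id),
    UNIQUE (follower_id, followee_id)   -- assumption: one row per follow pair
);
-- Assumption: an index to serve "latest tweets by this author" lookups.
CREATE INDEX idx_tweets_user_created ON tweets (user_id, created_at);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

conn.execute("INSERT INTO users VALUES ('u1', 'alice', NULL)")
conn.execute("INSERT INTO users VALUES ('u2', 'bob', NULL)")
conn.execute("INSERT INTO follows VALUES ('f1', 'u1', 'u2')")
conn.execute(
    "INSERT INTO tweets VALUES ('t1', 'u2', '2024-01-01T00:00:00Z', NULL, 'hello')"
)

# Feed query for u1: tweets written by the users that u1 follows, newest first.
feed = conn.execute(
    """
    SELECT t.tweet_id, t.description
    FROM tweets AS t
    JOIN follows AS f ON f.followee_id = t.user_id
    WHERE f.follower_id = ?
    ORDER BY t.created_at DESC
    """,
    ("u1",),
).fetchall()
print(feed)
```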



High-level design

Drawn in HLD Diagram


  • Clients connect through a load balancer (LB) that distributes requests; a rate limiter can be added there to mitigate DDoS attacks
  • The user service and the tweet service both connect to the object store
  • A user's followees are looked up in the GraphDB, and the feed is built from those followees' tweets





Request flows

  • postTweet
    • The client sends the request to the tweet service, which stores the photo or video in the object store and gets a URL back
    • The tweet service then stores the tweet data, including that URL, in the DB
  • viewTweet
    • The client requests a tweet, and the service checks whether it exists in the DB
    • The photo or video is fetched from the object store (or from the cache or CDN when available)
    • The tweet is returned to the client
  • Follow user
    • The request goes to the user service
    • The user service records the follow relationship (a Follow table entry) in the GraphDB
  • viewFeed
    • The request goes to the tweet service, which uses the user service's GraphDB to get the user's followees
    • The tweet service finds those followees' tweets in the DB and responds with the list
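
The viewFeed flow can be sketched as a pull-based merge: look up the followees, fetch each followee's tweets, and merge them newest first. The dictionaries below are hypothetical in-memory stand-ins for the GraphDB and the tweet DB, and the assumption that each author's tweets arrive already sorted by timestamp is mine.

```python
import heapq
from itertools import islice

# Hypothetical in-memory stand-ins for the GraphDB and the tweet DB.
FOLLOWEES = {"alice": ["bob", "carol"]}
# Each author's tweets as (created_at, tweet_id), already sorted newest first.
TWEETS_BY_AUTHOR = {
    "bob": [(1700000300, "t3"), (1700000100, "t1")],
    "carol": [(1700000200, "t2")],
}


def view_feed(user_id: str, limit: int = 20) -> list[str]:
    """Return up to `limit` tweet IDs from the user's followees, newest first."""
    # Step 1: graph lookup of the user's followees.
    followees = FOLLOWEES.get(user_id, [])
    # Step 2: fetch each followee's tweets.
    timelines = [TWEETS_BY_AUTHOR.get(f, []) for f in followees]
    # Step 3: merge the sorted timelines by timestamp, newest first.
    merged = heapq.merge(*timelines, reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, limit)]


print(view_feed("alice"))  # ['t3', 't2', 't1']
```

Merging at read time keeps postTweet cheap but makes viewFeed cost grow with the number of followees, which is one reason caching the feed matters for the latency requirement.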





Detailed component design

The cache uses LRU eviction, since tweets that have not been viewed recently are the least likely to be requested again.
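
A minimal sketch of LRU eviction, using `collections.OrderedDict` from the Python standard library. The class name and the tiny capacity are illustrative; a real deployment would more likely rely on a cache server's built-in eviction policy.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps at most `capacity` entries and evicts the least recently used one."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None                      # cache miss
        self._items.move_to_end(key)         # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.put("t1", "tweet 1")
cache.put("t2", "tweet 2")
cache.get("t1")             # t1 becomes the most recently used entry
cache.put("t3", "tweet 3")  # evicts t2, the least recently used entry
print(cache.get("t2"))      # None
```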

The GraphDB keeps an adjacency list per user, listing that user's followees.
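
One way to picture the adjacency list is as a map from each user to the set of users they follow. The sketch below is an in-memory illustration, not a graph database; the class name is hypothetical, and the reverse map from followee to followers is an added assumption that is useful when a user's followers need to be found.

```python
from collections import defaultdict


class FollowGraph:
    """In-memory adjacency lists for the follow relationship."""

    def __init__(self) -> None:
        self._followees = defaultdict(set)  # user -> users they follow
        self._followers = defaultdict(set)  # user -> users following them (assumed extra)

    def follow(self, follower_id: str, followee_id: str) -> None:
        self._followees[follower_id].add(followee_id)
        self._followers[followee_id].add(follower_id)

    def followees(self, user_id: str) -> list:
        return sorted(self._followees.get(user_id, ()))

    def followers(self, user_id: str) -> list:
        return sorted(self._followers.get(user_id, ()))


graph = FollowGraph()
graph.follow("alice", "bob")
graph.follow("alice", "carol")
graph.follow("dave", "bob")
print(graph.followees("alice"))  # ['bob', 'carol']
print(graph.followers("bob"))    # ['alice', 'dave']
```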

The CDN caches a small, frequently requested subset of static content such as videos and pictures.




Trade offs/Tech choices

I chose an RDBMS over NoSQL because the tables have relations between them (tweets, users, and follows). NoSQL remains an alternative, since it scales horizontally more easily.






Failure scenarios/bottlenecks



The LB can be a single point of failure (this can be covered by adding a passive standby LB)

The cache can also be a single point of failure (losing it leads to high latency on viewFeed)

Without rate limiting, the service is exposed to DDoS attacks
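
Rate limiting can be implemented in several ways; a token bucket is one common choice and is sketched below. The class name, the parameters, and the injectable clock are assumptions for illustration. In practice one bucket would be kept per client (for example per user ID or IP address).

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` requests, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, capacity: float, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self._clock = clock          # injectable so the limiter can be tested
        self._tokens = capacity      # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False                 # caller would typically answer HTTP 429


bucket = TokenBucket(rate_per_sec=5, capacity=10)
print(bucket.allow())  # True: the bucket starts full
```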



Future improvements

Add rate limiting and a passive LB to improve availability

Partition (shard) the DB by User ID to speed up viewFeed
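
A minimal sketch of picking a shard from a user ID, assuming a fixed number of shards and a hash-modulo scheme. The shard count and function name are hypothetical.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count


def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index with a hash that is stable across processes."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# The same user always lands on the same shard.
print(shard_for_user("u1") == shard_for_user("u1"))  # True
```

With plain modulo, changing the shard count remaps most users; consistent hashing is a common alternative when shards are expected to be added over time.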

Support storing multiple pictures or videos per tweet

Authentication and authorization system

Notification system for when a followee posts a new tweet