Design Twitter - System Design

System requirements

Functional:

post tweet
view updates of other users
like
comment
notifications
follow user

Non-Functional:

reads faster than writes (read-heavy workload)
- cache for timeline
availability > consistency => eventual consistency
- async updates on the write path
scalability
- load balancer
fault tolerance
- replication of server and db

Capacity estimation

1M users
280 bytes per tweet (280 characters at 1 byte each)
10 tweets per user per day
= 1M * 10 * 280 bytes = 2.8 GB/day
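
The estimate above can be checked with a short script. This is only a sketch: it assumes 1 byte per character and decimal gigabytes (10^9 bytes), counts tweet text only, and the yearly and per-second figures are extrapolations that are not part of the original estimate.

```python
# Back-of-the-envelope storage estimate for tweet text only.
# Assumption: 1 byte per character; metadata, media and indexes are ignored.
USERS = 1_000_000
TWEETS_PER_USER_PER_DAY = 10
BYTES_PER_TWEET = 280

tweets_per_day = USERS * TWEETS_PER_USER_PER_DAY
bytes_per_day = tweets_per_day * BYTES_PER_TWEET

gb_per_day = bytes_per_day / 10**9        # decimal GB
tb_per_year = gb_per_day * 365 / 1000     # extrapolation, not in the notes
writes_per_second = tweets_per_day / 86_400  # average, ignores peaks

print(f"{gb_per_day:.1f} GB/day")
print(f"{tb_per_year:.2f} TB/year")
print(f"{writes_per_second:.0f} writes/s on average")
```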

API design

POST /create/tweet

{
  "content" : String,
  "createdAt" : timestamp,
  "userID" : String
}
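
A server would likely validate this body before accepting it. The sketch below is one way to do that; the class name `CreateTweetRequest`, the snake_case attribute names, and the rule that the timestamp field holds the creation time are assumptions for illustration, and the 280 limit is taken from the capacity estimate.

```python
from dataclasses import dataclass
import time

MAX_TWEET_LENGTH = 280  # same figure as the capacity estimate


@dataclass(frozen=True)
class CreateTweetRequest:
    """Hypothetical in-memory form of the POST /create/tweet body."""
    content: str
    user_id: str        # "userID" in the JSON body
    created_at: float   # creation timestamp, seconds since the epoch (assumed)

    def __post_init__(self):
        # Reject bodies that would violate the basic constraints.
        if not self.user_id:
            raise ValueError("userID is required")
        if not 0 < len(self.content) <= MAX_TWEET_LENGTH:
            raise ValueError(f"content must be 1..{MAX_TWEET_LENGTH} characters")


ok = CreateTweetRequest("hello", "u1", time.time())

try:
    CreateTweetRequest("x" * 281, "u1", time.time())
    rejected = False
except ValueError:
    rejected = True
```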

GET /get/timeline?userID=

POST /like/tweet

{
  "postID" : String,
  "userID" : String
}

POST /comment/tweet

{
  "postID" : String,
  "userID" : String,
  "comment" : String
}

POST /follow/user

{
  "follower" : String,
  "followee" : String
}

Database design

[user]

ID, String, primary key
name, String
address, String

[tweet]

ID, String, primary key
content, String
createdBy, String
createdAt, timestamp
updatedAt, timestamp
like, integer

[comment]

ID, String, primary key
tweetID, String
content, String
createdBy, String
createdAt, timestamp
updatedAt, timestamp

[follow]

follower, String
followee, String
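
The tables above can be tried out with an in-memory SQLite database. This is a sketch under assumptions: the plural table names, snake_case columns, `like_count` (instead of `like`, which is an SQL keyword), integer timestamps, the composite key on the follow table, and the `timeline` helper are all illustrative choices, not part of the design above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id      TEXT PRIMARY KEY,
    name    TEXT,
    address TEXT
);
CREATE TABLE tweets (
    id         TEXT PRIMARY KEY,
    content    TEXT,
    created_by TEXT REFERENCES users(id),
    created_at INTEGER,              -- epoch seconds (assumed)
    updated_at INTEGER,
    like_count INTEGER DEFAULT 0
);
CREATE TABLE comments (
    id         TEXT PRIMARY KEY,
    tweet_id   TEXT REFERENCES tweets(id),
    content    TEXT,
    created_by TEXT REFERENCES users(id),
    created_at INTEGER,
    updated_at INTEGER
);
CREATE TABLE follows (
    follower TEXT REFERENCES users(id),
    followee TEXT REFERENCES users(id),
    PRIMARY KEY (follower, followee)
);
""")

# Sample data: u1 follows u2, and u2 has posted two tweets.
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [("u1", "alice", ""), ("u2", "bob", "")])
conn.execute("INSERT INTO follows VALUES ('u1', 'u2')")
conn.executemany(
    "INSERT INTO tweets (id, content, created_by, created_at, updated_at) "
    "VALUES (?, ?, ?, ?, ?)",
    [("t1", "first", "u2", 100, 100), ("t2", "second", "u2", 200, 200)],
)


def timeline(conn, user_id, limit=20):
    """Newest tweets from everyone `user_id` follows."""
    rows = conn.execute(
        "SELECT t.id, t.content FROM tweets t "
        "JOIN follows f ON f.followee = t.created_by "
        "WHERE f.follower = ? ORDER BY t.created_at DESC LIMIT ?",
        (user_id, limit),
    )
    return rows.fetchall()


feed = timeline(conn, "u1")
```

The join is what the timeline cache is meant to avoid on every read; it only runs here because nothing is cached.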

High-level design

Drawn in the high-level diagram.

Request flows

read flow

user -> load balancer -> server -> cache -> database

write flow

user -> load balancer -> server -> worker -> database and notification service -> user
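
The write flow can be sketched with an in-process queue and one worker thread. Everything here is a stand-in chosen for illustration: `queue.Queue` plays the role of a message broker, a dict plays the database, a list plays the notification service, and the names `handle_post_tweet` and `followers` are hypothetical.

```python
import queue
import threading

tweet_queue = queue.Queue()   # stand-in for a message broker
database = {}                 # tweet_id -> content
notifications = []            # (follower, tweet_id) pairs
followers = {"alice": ["bob", "carol"]}


def handle_post_tweet(tweet_id, user_id, content):
    """Server handler: enqueue the write and return without waiting."""
    tweet_queue.put((tweet_id, user_id, content))
    return {"status": "accepted", "tweetID": tweet_id}


def worker():
    """Persist each tweet, then notify the author's followers."""
    while True:
        item = tweet_queue.get()
        if item is None:              # sentinel: stop the worker
            tweet_queue.task_done()
            break
        tweet_id, user_id, content = item
        database[tweet_id] = content
        for follower in followers.get(user_id, []):
            notifications.append((follower, tweet_id))
        tweet_queue.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = handle_post_tweet("t1", "alice", "hello")
tweet_queue.join()      # demo only: wait until the worker has caught up
tweet_queue.put(None)
thread.join()
```

The handler returns before the database write happens, which is where the eventual consistency in the requirements comes from.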

Detailed component design


Trade offs/Tech choices

cache-aside

users may see stale data until the cache TTL expires
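
The staleness trade-off can be shown with a small cache-aside sketch. The class name `CacheAside`, the key format, the 60 second TTL, and the injectable clock (used so the example does not have to sleep) are assumptions for illustration.

```python
import time


class CacheAside:
    """Read-through-on-miss cache in front of a dict-like database."""

    def __init__(self, db, ttl_seconds, clock=time.monotonic):
        self.db = db
        self.ttl = ttl_seconds
        self.clock = clock
        self._cache = {}  # key -> (value, expires_at)

    def get(self, key):
        now = self.clock()
        entry = self._cache.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]            # hit: may be stale
        value = self.db[key]           # miss or expired: load from the database
        self._cache[key] = (value, now + self.ttl)
        return value


now = [0.0]                            # fake clock, advanced by hand
db = {"timeline:u1": ["t1"]}
cache = CacheAside(db, ttl_seconds=60, clock=lambda: now[0])

first = cache.get("timeline:u1")       # miss, loads from the database
db["timeline:u1"] = ["t2", "t1"]       # a new tweet lands in the database only
stale = cache.get("timeline:u1")       # still the old timeline
now[0] = 61.0                          # TTL has passed
fresh = cache.get("timeline:u1")       # reloaded from the database
```

A shorter TTL narrows the stale window at the cost of more database reads.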

Failure scenarios/bottlenecks

worker failure

writes can't reach the database
the user's content must not be lost
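
One possible mitigation, not stated in the notes above, is to keep a tweet in the queue until the database write succeeds and to park it after repeated failures. The function name `process_with_retry`, the attempt limit, and the use of `ConnectionError` to signal a failed write are assumptions for this sketch.

```python
from collections import deque


def process_with_retry(pending, write_to_db, max_attempts=3):
    """Drain `pending`; re-queue failed writes so content is not dropped.

    `pending` holds (tweet, attempts_so_far) pairs. Tweets that still fail
    after `max_attempts` are returned for later replay instead of discarded.
    """
    dead_letter = []
    while pending:
        tweet, attempts = pending.popleft()
        try:
            write_to_db(tweet)
        except ConnectionError:
            if attempts + 1 < max_attempts:
                pending.append((tweet, attempts + 1))
            else:
                dead_letter.append(tweet)
    return dead_letter


stored = []
calls = {"n": 0}


def flaky_write(tweet):
    """Hypothetical database write that fails on its first two calls."""
    calls["n"] += 1
    if calls["n"] <= 2:
        raise ConnectionError("database unavailable")
    stored.append(tweet)


pending = deque([("hello world", 0)])
unsaved = process_with_retry(pending, flaky_write)
```

In a real deployment the pending queue would itself need to be durable, otherwise a worker crash loses whatever it was holding.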

cache failure

reads can fall back to the database
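
The fallback can be sketched as a read path that treats a cache error like a miss. `CacheUnavailable`, `read_timeline`, and the callables passed in are hypothetical names for illustration, not a real client API.

```python
class CacheUnavailable(Exception):
    """Raised by the (hypothetical) cache client when the cache is down."""


def read_timeline(user_id, cache_get, db_get):
    """Return (timeline, source), degrading to the database on cache errors."""
    try:
        cached = cache_get(user_id)
        if cached is not None:
            return cached, "cache"
    except CacheUnavailable:
        pass  # serve from the database instead of failing the request
    return db_get(user_id), "database"


def broken_cache(user_id):
    raise CacheUnavailable("cache node unreachable")


db_rows = {"u1": ["t2", "t1"]}
healthy_cache = {"u1": ["t9"]}

degraded = read_timeline("u1", broken_cache, db_rows.get)
normal = read_timeline("u1", healthy_cache.get, db_rows.get)
```

With the cache down, every read reaches the database, so the database has to be sized (or rate limited) for that load.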

cache bottlenecks

high request volume on the cache, and the data to store can grow too large for it

Future improvements
