System requirements
Functional:
- post tweet
- view a timeline of updates from followed users
- like tweet
- comment on tweet
- receive notifications
- follow user
Non-Functional:
- fast reads, prioritized over write latency
- cache timelines
- availability over consistency => eventual consistency
- asynchronous processing of writes
- scalability: load balancer in front of the servers
- fault tolerance: replication of servers and database
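A minimal sketch of the load balancer and server replication points, assuming a round-robin policy (the design does not name one). RoundRobinBalancer and the server names are hypothetical. A replica marked down is skipped, so one failed server does not make the service unavailable.

```python
import itertools


class RoundRobinBalancer:
    """Hypothetical round-robin load balancer over server replicas."""

    def __init__(self, servers: list) -> None:
        self.servers = list(servers)
        self.healthy = set(servers)            # replicas currently accepting traffic
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server: str) -> None:
        self.healthy.discard(server)           # e.g. after a failed health check

    def mark_up(self, server: str) -> None:
        self.healthy.add(server)

    def pick(self) -> str:
        # Try each replica at most once per request, skipping unhealthy ones.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers")
```

With replicas s1, s2, s3, requests go s1, s2, s3, s1, ...; after mark_down("s2") they alternate between s3 and s1 until s2 is marked up again.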
Capacity estimation
- 1M users
- 280 bytes per tweet (280 characters, assuming 1 byte per character)
- 10 tweets per user per day
- 1M * 10 * 280 bytes = 2.8 GB/day
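The same arithmetic as a small script, with two derived figures (storage per year and average write QPS). It assumes writes are spread evenly over the day and counts tweet text only, ignoring metadata, media and replication.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_storage_bytes(users: int, tweets_per_user: int, bytes_per_tweet: int) -> int:
    """Bytes of tweet text written per day."""
    return users * tweets_per_user * bytes_per_tweet


def write_qps(users: int, tweets_per_user: int) -> float:
    """Average tweets written per second, assuming an even spread over the day."""
    return users * tweets_per_user / SECONDS_PER_DAY


if __name__ == "__main__":
    per_day = daily_storage_bytes(1_000_000, 10, 280)
    print(f"storage per day: {per_day / 1e9:.1f} GB")            # 2.8 GB
    print(f"storage per year: {per_day * 365 / 1e12:.2f} TB")    # 1.02 TB
    print(f"average write QPS: {write_qps(1_000_000, 10):.0f}")  # 116
```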
API design
POST /create/tweet
{
"content" : String,
"createdAt" : timestamp,
"userID" : String
}
GET /get/timeline?userID=
POST /like/tweet
{
"postID" : String,
"userID" : String
}
POST /comment/tweet
{
"postID" : String,
"userID" : String,
"comment" : String
}
POST /follow/user
{
"follower" : String,
"followee" : String
}
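A hypothetical in-memory model of what each endpoint above does to the data, useful for checking the API semantics before wiring in the database, cache and worker. TwitterService and its method names are illustrative only; likes are kept as a set of userIDs here (the tweet table stores only a count) so a repeated like is not double-counted.

```python
import itertools
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Tweet:
    id: str
    content: str
    created_by: str   # userID of the author
    created_at: int   # timestamp supplied by the caller
    likes: set = field(default_factory=set)       # userIDs that liked the tweet
    comments: list = field(default_factory=list)  # (userID, comment) pairs


class TwitterService:
    """Hypothetical in-memory stand-in for the API; not a real service."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self.tweets = {}                   # tweetID -> Tweet
        self.following = defaultdict(set)  # follower -> set of followees

    def create_tweet(self, user_id: str, content: str, created_at: int) -> str:
        """POST /create/tweet"""
        tweet_id = f"t{next(self._next_id)}"
        self.tweets[tweet_id] = Tweet(tweet_id, content, user_id, created_at)
        return tweet_id

    def like_tweet(self, post_id: str, user_id: str) -> None:
        """POST /like/tweet: the set makes repeated likes by one user count once."""
        self.tweets[post_id].likes.add(user_id)

    def comment_tweet(self, post_id: str, user_id: str, comment: str) -> None:
        """POST /comment/tweet"""
        self.tweets[post_id].comments.append((user_id, comment))

    def follow_user(self, follower: str, followee: str) -> None:
        """POST /follow/user"""
        self.following[follower].add(followee)

    def get_timeline(self, user_id: str, limit: int = 20) -> list:
        """GET /get/timeline?userID=: newest tweets from followed users.

        Scans every tweet, which is fine for a sketch but is the cost
        the timeline cache in this design is meant to avoid."""
        followed = self.following.get(user_id, set())
        matches = [t for t in self.tweets.values() if t.created_by in followed]
        return sorted(matches, key=lambda t: t.created_at, reverse=True)[:limit]
```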
Database design
[User]
ID, String, primary key
name, String
address, String
[tweet]
ID, String, primary key
content, String
createdBy, String
createdAt, timestamp
updatedAt, timestamp
likes, Integer (number of likes)
[comment]
ID, String, primary key
tweetID, String
content, String
createdBy, String
createdAt, timestamp
updatedAt, timestamp
[follow]
follower, String
followee, String
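A sketch of the tables above as SQL DDL, run against an in-memory SQLite database so it is self-contained. Assumptions not in the original design: snake_case column names, foreign keys, a composite primary key on follow (so a user cannot follow the same account twice), and an index on followee for finding a user's followers, for example when fanning out notifications.

```python
import sqlite3

SCHEMA = """
CREATE TABLE user (
    id      TEXT PRIMARY KEY,
    name    TEXT,
    address TEXT
);
CREATE TABLE tweet (
    id         TEXT PRIMARY KEY,
    content    TEXT NOT NULL,
    created_by TEXT NOT NULL REFERENCES user(id),
    created_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP,
    likes      INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE comment (
    id         TEXT PRIMARY KEY,
    tweet_id   TEXT NOT NULL REFERENCES tweet(id),
    content    TEXT NOT NULL,
    created_by TEXT NOT NULL REFERENCES user(id),
    created_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP
);
CREATE TABLE follow (
    follower TEXT NOT NULL REFERENCES user(id),
    followee TEXT NOT NULL REFERENCES user(id),
    PRIMARY KEY (follower, followee)
);
CREATE INDEX idx_follow_followee ON follow(followee);
"""


def create_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn


def followers_of(conn: sqlite3.Connection, user_id: str) -> list:
    rows = conn.execute(
        "SELECT follower FROM follow WHERE followee = ? ORDER BY follower", (user_id,)
    )
    return [row[0] for row in rows]
```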
High-level design
The high-level design is drawn in the accompanying diagram.
Request flows
read flow
- user -> load balancer -> server -> cache -> database (on a cache miss)
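A sketch of the server -> cache -> database step as a cache-aside read: check the cache first and, on a miss, read the database and populate the cache. A plain dict stands in for the cache (for example Redis), and load_timeline_from_db is a hypothetical callable standing in for the database query.

```python
from typing import Callable


def get_timeline(user_id: str, cache: dict,
                 load_timeline_from_db: Callable[[str], list]) -> list:
    cached = cache.get(user_id)
    if cached is not None:
        return cached                          # cache hit: database not touched
    timeline = load_timeline_from_db(user_id)  # cache miss: read the database
    cache[user_id] = timeline                  # populate the cache for later reads
    return timeline
```

Only the first read for a user reaches the database; later reads are served from the cache until the entry is evicted or expires.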
write flow
- user -> load balancer -> server -> worker (asynchronous) -> database and notification service -> notified users
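A sketch of the asynchronous write flow: the server enqueues the tweet and returns immediately, and a background worker drains the queue, persists each tweet, then hands it to the notification service. queue.Queue and a thread stand in for a real message broker and worker fleet; save_tweet and notify_followers are hypothetical callables, and the None sentinel is only a way to stop the worker in this sketch.

```python
import queue
import threading
from typing import Callable


def handle_post_tweet(write_queue: queue.Queue, tweet: dict) -> str:
    write_queue.put(tweet)  # the server does not wait for the database write
    return "accepted"


def worker_loop(write_queue: queue.Queue,
                save_tweet: Callable[[dict], None],
                notify_followers: Callable[[dict], None]) -> None:
    while True:
        tweet = write_queue.get()
        if tweet is None:          # sentinel: stop the worker
            write_queue.task_done()
            break
        save_tweet(tweet)          # worker -> database
        notify_followers(tweet)    # worker -> notification service -> users
        write_queue.task_done()
```

Note that if save_tweet raises, the exception ends the worker thread and the tweet, already taken off the queue, is lost. That is the worker failure scenario listed under failure scenarios.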
Detailed component design
Trade offs/Tech choices
cache-aside
- users may see stale data until the cache TTL expires
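A sketch of why cache-aside with a TTL serves stale data: an entry is reused until its TTL expires, even if the database changed in between. TTLCache is a hypothetical helper, not a specific library; the clock is injectable so expiry can be shown deterministically.

```python
import time
from typing import Callable


class TTLCache:
    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get_or_load(self, key, load):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]                # may be stale until expires_at
        value = load(key)                  # missing or expired: read the database
        self._entries[key] = (value, now + self.ttl)
        return value
```

Usage, with a fake clock and a dict as the database:

```python
now = [0.0]
db = {"alice": ["t1"]}
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.get_or_load("alice", db.get)   # ['t1'], loaded from the database
db["alice"] = ["t2", "t1"]           # a new tweet is written
cache.get_or_load("alice", db.get)   # still ['t1'] (stale)
now[0] = 61
cache.get_or_load("alice", db.get)   # ['t2', 't1'] once the TTL has expired
```

A shorter TTL narrows the staleness window at the cost of more database reads.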
Failure scenarios/bottlenecks
worker failure
- tweets cannot be written to the database
- the user's content must not be lost
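One possible way to meet "the user's content must not be lost" when the database write fails: retry with exponential backoff and, if every attempt fails, park the tweet in a dead-letter list for later replay instead of dropping it. save_tweet, the attempt count and the delays are illustrative assumptions; in a real deployment the dead-letter store would itself need to be durable (for example a broker's dead-letter queue) rather than an in-memory list.

```python
import time
from typing import Callable


def persist_with_retry(tweet: dict,
                       save_tweet: Callable[[dict], None],
                       dead_letters: list,
                       max_attempts: int = 3,
                       base_delay: float = 0.1,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    for attempt in range(max_attempts):
        try:
            save_tweet(tweet)
            return True
        except Exception:
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, ... between attempts
    dead_letters.append(tweet)  # keep the tweet for later replay, never drop it
    return False
```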
cache failure
- reads fall back to the database
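A sketch of that fallback: if the cache is unreachable, the server reads the timeline straight from the database instead of failing the request. cache_get and db_get are hypothetical callables, and ConnectionError stands in for whatever error the real cache client raises.

```python
from typing import Callable, Optional


def read_timeline(user_id: str,
                  cache_get: Callable[[str], Optional[list]],
                  db_get: Callable[[str], list]) -> list:
    try:
        cached = cache_get(user_id)
        if cached is not None:
            return cached
    except ConnectionError:
        pass  # cache is down: fall through to the database
    return db_get(user_id)
```

The request still succeeds, but every read now hits the database, so database load and read latency rise until the cache recovers.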
cache bottleneck
- a high volume of requests hits the cache, and the cached data can grow too large
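One common way to keep cached timelines from growing too large, sketched as an assumption rather than part of the original design: store only tweet IDs (not full content) and cap each user's cached timeline at the most recent N entries. BoundedTimelineCache is hypothetical, and the default cap of 800 is an arbitrary illustration; a real value would come from measuring cache memory.

```python
from collections import defaultdict, deque
from itertools import islice


class BoundedTimelineCache:
    def __init__(self, max_items: int = 800) -> None:
        # Each user's timeline is a deque of tweet IDs capped at max_items.
        self._timelines = defaultdict(lambda: deque(maxlen=max_items))

    def push(self, user_id: str, tweet_id: str) -> None:
        # appendleft keeps newest first; a full deque drops the oldest ID from the right.
        self._timelines[user_id].appendleft(tweet_id)

    def get(self, user_id: str, limit: int = 20) -> list:
        # .get avoids creating an empty deque for users with no cached timeline.
        return list(islice(self._timelines.get(user_id, ()), limit))
```

Full tweet content is then fetched by ID (from a separate tweet cache or the database) only for the page being shown.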
Future improvements