Design Twitter - System Design

System Requirements

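Functional requirements:

Post a tweet.
View the home timeline (tweets from the users one follows).
View another user's timeline.
Like or unlike a tweet.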

Non-functional requirements:

Availability: Every request receives a non-error response, but without a guarantee that it reflects the most recent data.
Consistency: Eventual consistency is acceptable; a new tweet may take a short time to appear in followers' timelines.
Partition tolerance: The system keeps operating even if some messages between nodes are dropped or delayed by the network.
Low latency: A user's timeline loads within 500 ms.

Assumptions:

200 million DAU, each posting 3 tweets per day: 200 million * 3 = 600 million tweets per day.

Each tweet has 140 bytes of content and 30 bytes of metadata; 20% of tweets contain a 20 KB photo and 10% contain a 2 MB video.

Each user reads their home timeline 5 times and other users' timelines 5 times per day; each timeline page contains 20 tweets.

Data storage:

So the total storage is 600M * (170 bytes + 20% * 20KB + 10% * 2MB) ≈ 600M * 204KB ≈ 122 TB per day.

Bandwidth (reads): 200 million * (5 + 5) * 20 * (170 bytes + 20% * 20KB + 10% * 2MB) / 86400 s = 40 billion tweets read per day * ~204KB / 86400 s ≈ 95 GB/s
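The arithmetic above can be checked with a short script. It is a sketch that assumes decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes) and counts the 30 bytes of metadata in every tweet read; the constant and function names are illustrative only.

```python
# Back-of-envelope estimates from the assumptions above (decimal units assumed).
KB = 1_000
MB = 1_000_000
GB = 1_000_000_000
TB = 1_000_000_000_000

DAU = 200_000_000
TWEETS_PER_USER_PER_DAY = 3
TEXT_BYTES = 140
METADATA_BYTES = 30
PHOTO_BYTES, PHOTO_SHARE = 20 * KB, 0.20
VIDEO_BYTES, VIDEO_SHARE = 2 * MB, 0.10
TIMELINE_READS_PER_USER = 5 + 5  # home timeline + other users' timelines
TWEETS_PER_TIMELINE = 20
SECONDS_PER_DAY = 86_400


def avg_tweet_bytes() -> float:
    """Expected size of one tweet, weighting media by how often it appears."""
    return (TEXT_BYTES + METADATA_BYTES
            + PHOTO_SHARE * PHOTO_BYTES
            + VIDEO_SHARE * VIDEO_BYTES)


def storage_per_day_bytes() -> float:
    """New data written per day."""
    tweets_per_day = DAU * TWEETS_PER_USER_PER_DAY
    return tweets_per_day * avg_tweet_bytes()


def read_bandwidth_bytes_per_sec() -> float:
    """Average read (egress) bandwidth, spread evenly over the day."""
    tweets_read_per_day = DAU * TIMELINE_READS_PER_USER * TWEETS_PER_TIMELINE
    return tweets_read_per_day * avg_tweet_bytes() / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"avg tweet: {avg_tweet_bytes():,.0f} B")
    print(f"storage: {storage_per_day_bytes() / TB:.1f} TB/day")
    print(f"read bandwidth: {read_bandwidth_bytes_per_sec() / GB:.1f} GB/s")
```

Run as a script, it reports an average tweet size of about 204 KB, roughly 122.5 TB of new data per day, and about 94.5 GB/s of read bandwidth. Video dominates both figures: it accounts for about 98% of the average tweet size.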

API design:

createTweet(userToken, String tweetContent) -> response status code
homeTimeline(userToken, int pageSize, optional int pageOffset: the current page location) -> tweets list
userTimeline(userToken, int userId, int pageSize, optional int pageOffset) -> tweets list
likeOrUnlikeTweet(userToken, int tweetId, boolean likeOrDislike: true to like, false to unlike) -> response status code
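To make the API contract concrete, here is a minimal in-memory sketch of the four calls, written as Python methods with snake_case names. Several details are assumptions not stated above: the token-to-user map and the `register`/`follow` helpers are hypothetical, status codes follow HTTP conventions (201 created, 400 bad input, 401 bad token, 404 unknown tweet), pageOffset is read as a 0-based page index, and the home timeline is assembled by pulling followees' tweets at read time.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Tweet:
    tweet_id: int
    author_id: int
    content: str
    liked_by: Set[int] = field(default_factory=set)


class TwitterService:
    """Hypothetical single-process stand-in for the four API calls."""

    MAX_CONTENT_LENGTH = 140

    def __init__(self) -> None:
        self._users_by_token: Dict[str, int] = {}  # userToken -> userId (assumed auth)
        self._tweets: Dict[int, Tweet] = {}
        self._tweets_by_author: Dict[int, List[int]] = {}
        self._followees: Dict[int, Set[int]] = {}  # follower -> users they follow
        self._next_id = itertools.count(1)  # increasing ids double as recency order

    # --- setup helpers, not part of the API ---
    def register(self, user_token: str, user_id: int) -> None:
        self._users_by_token[user_token] = user_id

    def follow(self, follower_id: int, followee_id: int) -> None:
        self._followees.setdefault(follower_id, set()).add(followee_id)

    def _user_id(self, user_token: str) -> int:
        if user_token not in self._users_by_token:
            raise PermissionError("invalid userToken")
        return self._users_by_token[user_token]

    @staticmethod
    def _page(tweet_ids: List[int], page_size: int, page_offset: int) -> List[int]:
        # Newest first; pageOffset is treated as a 0-based page index (assumption).
        newest_first = sorted(tweet_ids, reverse=True)
        start = page_offset * page_size
        return newest_first[start:start + page_size]

    # --- the four API calls ---
    def create_tweet(self, user_token: str, tweet_content: str) -> int:
        try:
            author_id = self._user_id(user_token)
        except PermissionError:
            return 401
        if not tweet_content or len(tweet_content) > self.MAX_CONTENT_LENGTH:
            return 400
        tweet_id = next(self._next_id)
        self._tweets[tweet_id] = Tweet(tweet_id, author_id, tweet_content)
        self._tweets_by_author.setdefault(author_id, []).append(tweet_id)
        return 201

    def home_timeline(self, user_token: str, page_size: int, page_offset: int = 0) -> List[Tweet]:
        user_id = self._user_id(user_token)
        # Pull model: gather every followee's tweets, then sort and paginate.
        ids = [tid for followee in self._followees.get(user_id, set())
               for tid in self._tweets_by_author.get(followee, [])]
        return [self._tweets[tid] for tid in self._page(ids, page_size, page_offset)]

    def user_timeline(self, user_token: str, user_id: int, page_size: int,
                      page_offset: int = 0) -> List[Tweet]:
        self._user_id(user_token)  # only checks that the caller is authenticated
        ids = self._tweets_by_author.get(user_id, [])
        return [self._tweets[tid] for tid in self._page(ids, page_size, page_offset)]

    def like_or_unlike_tweet(self, user_token: str, tweet_id: int, like_or_dislike: bool) -> int:
        try:
            user_id = self._user_id(user_token)
        except PermissionError:
            return 401
        tweet = self._tweets.get(tweet_id)
        if tweet is None:
            return 404
        if like_or_dislike:
            tweet.liked_by.add(user_id)
        else:
            tweet.liked_by.discard(user_id)
        return 200
```

Using increasing tweet IDs as the sort key keeps pagination stable without comparing timestamps; a distributed deployment would need IDs that stay unique and roughly time-ordered across nodes.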

I'd choose MongoDB as our database, because:

Database design:

Tweet:

tweetId: Integer, primary key

content: Varchar(140)

metadata: Varchar(30)

....

User:

userId: Integer, primary key

email: varchar(30)

isHotUser: Boolean

Follower:

followerUserId: Integer

followeeUserId: Integer

followingDate: Timestamp
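The schema does not say how isHotUser is used. A common reading in Twitter designs is a hybrid fan-out: tweets from ordinary users are pushed into each follower's cached home timeline when they are posted, while tweets from hot users (accounts with very many followers) are merged in at read time, so one post does not trigger millions of cache writes. The sketch below assumes that interpretation; the class and method names are hypothetical, and tweet IDs stand in for full tweets.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, List, Set


@dataclass(frozen=True)
class User:
    user_id: int
    is_hot_user: bool = False


class HybridFanout:
    """Push ordinary users' tweets at write time; pull hot users' tweets at read time."""

    def __init__(self) -> None:
        self.users: Dict[int, User] = {}
        self.followers: DefaultDict[int, Set[int]] = defaultdict(set)  # followee -> followers
        self.followees: DefaultDict[int, Set[int]] = defaultdict(set)  # follower -> followees
        self.timeline_cache: DefaultDict[int, List[int]] = defaultdict(list)  # userId -> pushed tweetIds
        self.tweets_by_author: DefaultDict[int, List[int]] = defaultdict(list)
        self._next_tweet_id = 1  # increasing ids double as recency order

    def add_user(self, user_id: int, is_hot_user: bool = False) -> None:
        self.users[user_id] = User(user_id, is_hot_user)

    def follow(self, follower_id: int, followee_id: int) -> None:
        # Mirrors one row of the Follower table, indexed in both directions.
        self.followers[followee_id].add(follower_id)
        self.followees[follower_id].add(followee_id)

    def post(self, author_id: int) -> int:
        tweet_id = self._next_tweet_id
        self._next_tweet_id += 1
        self.tweets_by_author[author_id].append(tweet_id)
        if not self.users[author_id].is_hot_user:
            # Fan-out on write: one cache append per follower.
            for follower_id in self.followers[author_id]:
                self.timeline_cache[follower_id].append(tweet_id)
        return tweet_id

    def home_timeline(self, user_id: int, page_size: int = 20) -> List[int]:
        candidates = list(self.timeline_cache[user_id])
        for followee_id in self.followees[user_id]:
            if self.users[followee_id].is_hot_user:
                # Fan-out on read: only a hot user's newest page_size tweets can make the page.
                candidates.extend(self.tweets_by_author[followee_id][-page_size:])
        return heapq.nlargest(page_size, candidates)  # newest first
```

A production version would cap each cached timeline (for example at a few hundred IDs) and keep it in a cache tier rather than process memory; this sketch lets it grow without bound.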

The overall architecture is shown in the high-level diagram.