Design Twitter - System Design

System requirements

user
tweets

Functional:

List functional requirements for the system (Ask the chat bot for hints if stuck.)...

user can post tweet
user can follow a user
user can view a personalized timeline showing tweets from users they follow
user can search for tweets, users and tags

Non-Functional:

List non-functional requirements for the system...

high availability. service should have a high uptime.
consistency is not as important. it is acceptable for some users to see tweets update earlier than others
latency should be low. we want the service to be responsive and low latency so that their user experience is good.

Capacity estimation

Estimate the scale of the system you are going to design...

say we have about 10 million DAU
I'm assuming there'd be an average of about 5 million tweets posted per day
this is 5 million writes per day
5 million / 100,000 = 5 writes per sec
I'm assuming read to write ratio is about 10:1
50 reads per sec
this is a notably read heavy system
storage:
tweet - id, user,_id body, media_id
for writes, each tweet would consist of:
id: 16 Bytes
user_id: 16 Bytes
body: 50 Bytes
media_id: 16 Bytes
total = 100 Bytes
100 Bytes per write * 5 mil writes/day= 500 mil Bytes/day = 500 MB/ day = 1500MB/yr = 3000MB/yr = 1GB/yr = 2GB in 2 years

API design

Define what APIs are expected from the system...

User:
postTweet():
POST /user/{user_id}/tweet
input: body and media of tweet
followUser():
UPDATE /user/{user_id}/follow/{user_id}
searchTweet():
POST /search/{tags}
likeTweet():
POST /user/{user_id}/tweet/{tweet_id]
getTweets():
get tweets for a specific user
GET /tweets/{user_id}
timeline:
GET /timeline/{offset}
-> list of tweet id
paginated . we don't want to return entire tweets as it isa very expensive query

Database design

Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...

SQL? - our storage constraints are definitely within the range for SQL

schema? need ACID?

consistency isn't as important so we don't necessarily need ACID guarantees.

though, our problem does seem to call for a structured schema.

problem with sql though is that it is hard to horizontally scale sinc eit tries to maintain acid properties, which we don't even need anyways

i may go with Nosql just for horizontal scaling. because perhaps we will have peak times and horizontal scaling would be necessary to maintain high performance during these peak times

Tweets - id, media_id, user_id, body, created_At, likes

User - id, password(encrypted)

Follows - follower_id, followee_id, timestamp

media_id points to a location in a object store we use

object store stores th eimages - good here, since images / media are static . database aren't meant to hold large inbyar data and

High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design. If you are unfamiliar with the tool, you can simply describe your design to the chat bot and ask it to generate a starter diagram for you to modify...

CDN contains cached STATIC items, like ikmages and videos. can retrieve geographically popular static ocntent with lower latency
include api gateway for authetnication and rate limting to prevent ddos
the Tweet service:
reiceves requests fofm client to post a tweet
adds tweet metadata to the database
add tweet media, if included into S3
User serivce:
recieves requests form client to follow/unfollow a user
adjust the Follow table within the database
cache can hold timeline for more active users, so this page loads more quickly for them. can use LRU as invalidation method
data should be replicated within the database for greater fault tolerance.

Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...

posting Tweet:
user calls postTweet() API and sends request to our load balancer , which distributed requests among our multiple servers .
sent to the API gateway of our chosen server. this API gateway sneds our request to the Tweet Service
Tweet Service stores this new tweet into our NoSQL Database.
view Timeline:
user goes to main homepage /timeline and sends request to our load balancer
API gateway to Timeline Service
We use a cache aside model
Timeline first queries Cache to see if user's timeline is here. if not (cache miss), go through the next steps
Timeline Service is going to query the NoSQL to generate a personalized timeline for the user
Timeline Service will also keep a cache of the most active users' timelines so these are fetched more quickly
sends a list of tweet_id for the timeline . default offset is 0
client recives tweet_id and loads from their client; CDN is available for static content (media)

Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...

Load baalancer
ensure stateless , so we don't need to ensure clients are sent to the same server everytime. to ensure this, we can keep session data within our database
I think it'd make sense to set a "least connections" protocol for this load balancer. this way, requests are send to servers with the least connections
tradeoff of introducing LB is that we do need to create multiple API Gateways to service our requests.
Timeline service: how does it work ?
recommend K most interesting tweet for the user
interest_score = number_of_likes * weight_like + weight_followers * number of followes on the tweet's author + weight_retweet * nmber_of_retweets
use a top k algorithm - heap data structure so we can maintain this in a heap, and pushing data to this heap is O logn operation , getting top K is O k operation

Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

most notably, our decision on our database.

consistency isn't as important so we don't necessarily need ACID guarantees.

though, our problem does seem to call for a structured schema.

problem with sql though is that it is hard to horizontally scale sinc eit tries to maintain acid properties, which we don't even need anyways

i may go with Nosql just for horizontal scaling. because perhaps we will have peak times and horizontal scaling would be necessary to maintain high performance during these peak times

i suspect our number of users to grow too, so being prepared to scale horizontally in the future would be a wise decision

this also offers better performance, which fits our requirements for low latency

Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

High traffic. a famous person posts something and millions of users want to view and like this comment. this would overhwlem the document of this database.
to allow for more cache hits for the famous people and less read and writes, we could introduce sharding in this case
shard by the name of the user A-F, etc
this way, when a request comes in, they are sent to the appropriate shard . so if a user makes a request to a non popular user, maybe they aren't in that shard so it doesn't overhwlem that strained, famous part of the database at that moment. we can have more cache hits with our Tweets - if a user requests data then the famous ones are going to be likely more cached. we should use shard-level caching in this case, where each shard has a cache available for them
tradefoff: complexity. we are going to have to introduce another load balancer to distribute requests among our shards and more hardware for each shard

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?

monitoring solutions to observe health of our components and services to observe which are overhwlemed