System requirements
- user
- tweets
Functional:
List functional requirements for the system (Ask the chat bot for hints if stuck.)...
- user can post tweet
- user can follow a user
- user can view a personalized timeline showing tweets from users they follow
- user can search for tweets, users and tags
Non-Functional:
List non-functional requirements for the system...
- high availability. service should have a high uptime.
- consistency is not as important. it is acceptable for some users to see tweets update earlier than others
- latency should be low. we want the service to be responsive and low latency so that their user experience is good.
Capacity estimation
Estimate the scale of the system you are going to design...
- say we have about 10 million DAU
- I'm assuming there'd be an average of about 5 million tweets posted per day
- this is 5 million writes per day
- 5 million / 100,000 = 5 writes per sec
- I'm assuming read to write ratio is about 10:1
- 50 reads per sec
- this is a notably read heavy system
- storage:
- tweet - id, user,_id body, media_id
- for writes, each tweet would consist of:
- id: 16 Bytes
- user_id: 16 Bytes
- body: 50 Bytes
- media_id: 16 Bytes
- total = 100 Bytes
- 100 Bytes per write * 5 mil writes/day= 500 mil Bytes/day = 500 MB/ day = 1500MB/yr = 3000MB/yr = 1GB/yr = 2GB in 2 years
API design
Define what APIs are expected from the system...
- User:
- postTweet():
- POST /user/{user_id}/tweet
- input: body and media of tweet
- followUser():
- UPDATE /user/{user_id}/follow/{user_id}
- searchTweet():
- POST /search/{tags}
- likeTweet():
- POST /user/{user_id}/tweet/{tweet_id]
- getTweets():
- get tweets for a specific user
- GET /tweets/{user_id}
- timeline:
- GET /timeline/{offset}
- -> list of tweet id
- paginated . we don't want to return entire tweets as it isa very expensive query
Database design
Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...
SQL? - our storage constraints are definitely within the range for SQL
schema? need ACID?
consistency isn't as important so we don't necessarily need ACID guarantees.
though, our problem does seem to call for a structured schema.
problem with sql though is that it is hard to horizontally scale sinc eit tries to maintain acid properties, which we don't even need anyways
i may go with Nosql just for horizontal scaling. because perhaps we will have peak times and horizontal scaling would be necessary to maintain high performance during these peak times
Tweets - id, media_id, user_id, body, created_At, likes
User - id, password(encrypted)
Follows - follower_id, followee_id, timestamp
media_id points to a location in a object store we use
object store stores th eimages - good here, since images / media are static . database aren't meant to hold large inbyar data and
High-level design
You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design. If you are unfamiliar with the tool, you can simply describe your design to the chat bot and ask it to generate a starter diagram for you to modify...
- CDN contains cached STATIC items, like ikmages and videos. can retrieve geographically popular static ocntent with lower latency
- include api gateway for authetnication and rate limting to prevent ddos
- the Tweet service:
- reiceves requests fofm client to post a tweet
- adds tweet metadata to the database
- add tweet media, if included into S3
- User serivce:
- recieves requests form client to follow/unfollow a user
- adjust the Follow table within the database
- cache can hold timeline for more active users, so this page loads more quickly for them. can use LRU as invalidation method
- data should be replicated within the database for greater fault tolerance.
Request flows
Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...
- posting Tweet:
- user calls postTweet() API and sends request to our load balancer , which distributed requests among our multiple servers .
- sent to the API gateway of our chosen server. this API gateway sneds our request to the Tweet Service
- Tweet Service stores this new tweet into our NoSQL Database.
- view Timeline:
- user goes to main homepage /timeline and sends request to our load balancer
- API gateway to Timeline Service
- We use a cache aside model
- Timeline first queries Cache to see if user's timeline is here. if not (cache miss), go through the next steps
- Timeline Service is going to query the NoSQL to generate a personalized timeline for the user
- Timeline Service will also keep a cache of the most active users' timelines so these are fetched more quickly
- sends a list of tweet_id for the timeline . default offset is 0
- client recives tweet_id and loads from their client; CDN is available for static content (media)
Detailed component design
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...
- Load baalancer
- ensure stateless , so we don't need to ensure clients are sent to the same server everytime. to ensure this, we can keep session data within our database
- I think it'd make sense to set a "least connections" protocol for this load balancer. this way, requests are send to servers with the least connections
- tradeoff of introducing LB is that we do need to create multiple API Gateways to service our requests.
- Timeline service: how does it work ?
- recommend K most interesting tweet for the user
- interest_score = number_of_likes * weight_like + weight_followers * number of followes on the tweet's author + weight_retweet * nmber_of_retweets
- use a top k algorithm - heap data structure so we can maintain this in a heap, and pushing data to this heap is O logn operation , getting top K is O k operation
Trade offs/Tech choices
Explain any trade offs you have made and why you made certain tech choices...
- most notably, our decision on our database.
consistency isn't as important so we don't necessarily need ACID guarantees.
though, our problem does seem to call for a structured schema.
problem with sql though is that it is hard to horizontally scale sinc eit tries to maintain acid properties, which we don't even need anyways
i may go with Nosql just for horizontal scaling. because perhaps we will have peak times and horizontal scaling would be necessary to maintain high performance during these peak times
i suspect our number of users to grow too, so being prepared to scale horizontally in the future would be a wise decision
this also offers better performance, which fits our requirements for low latency
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
- High traffic. a famous person posts something and millions of users want to view and like this comment. this would overhwlem the document of this database.
- to allow for more cache hits for the famous people and less read and writes, we could introduce sharding in this case
- shard by the name of the user A-F, etc
- this way, when a request comes in, they are sent to the appropriate shard . so if a user makes a request to a non popular user, maybe they aren't in that shard so it doesn't overhwlem that strained, famous part of the database at that moment. we can have more cache hits with our Tweets - if a user requests data then the famous ones are going to be likely more cached. we should use shard-level caching in this case, where each shard has a cache available for them
- tradefoff: complexity. we are going to have to introduce another load balancer to distribute requests among our shards and more hardware for each shard
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?
- monitoring solutions to observe health of our components and services to observe which are overhwlemed