System requirements
Functional:
- post a tweet
- tweet may contain media like photos and videos
- view others' tweets
- modify/delete a tweet
- receive others' updates by newsfeed
- favorite/like a specific tweet
- follow/unfollow a user
Non-Functional:
Let's assume we have 1 million daily active users. On average, each user posts 1 tweet and views 100 tweets per day.
So read TPS: 1M * 100 / 3600 / 24 ≈ 1200, and write TPS: 1M * 1 / 3600 / 24 ≈ 12.
This will be a read-heavy system.
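The arithmetic behind these figures can be checked with a few lines of Python. This is only a worked version of the estimate above; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 3600  # 86,400 seconds

daily_active_users = 1_000_000
tweets_posted_per_user = 1    # per day, on average
tweets_viewed_per_user = 100  # per day, on average

# Average transactions per second, assuming traffic is spread evenly over the day.
read_tps = daily_active_users * tweets_viewed_per_user / SECONDS_PER_DAY
write_tps = daily_active_users * tweets_posted_per_user / SECONDS_PER_DAY

print(f"read TPS  ~ {read_tps:.0f}")   # rounded up to 1200 in the estimate
print(f"write TPS ~ {write_tps:.1f}")  # rounded to 12 in the estimate
print(f"read/write ratio = {read_tps / write_tps:.0f}")
```

These are daily averages; peak traffic would be higher, by a factor this estimate does not cover.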
Capacity estimation
For storage capacity, let's assume each tweet's text is at most 140 characters and 100 bytes on average (including multi-language support). Metadata should also be considered, say 32 bytes per tweet. Therefore, every day we need 1M * (100 + 32) bytes = 132 MB for text and metadata. Also, on average one in every 10 tweets carries media, and the media is 10 MB on average, so media needs 100k * 10 MB = 1 TB of disk storage per day. The majority of the capacity is spent on media, so the system consumes about 1 TB of storage every day.
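The same storage estimate, written out as a small calculation. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes); the variable names are illustrative.

```python
MB = 10**6
TB = 10**12

tweets_per_day = 1_000_000
avg_text_bytes = 100
metadata_bytes = 32
avg_media_bytes = 10 * MB

# Text and metadata for every tweet.
text_bytes_per_day = tweets_per_day * (avg_text_bytes + metadata_bytes)

# One tweet in ten carries media.
media_tweets_per_day = tweets_per_day // 10
media_bytes_per_day = media_tweets_per_day * avg_media_bytes

print(f"text + metadata: {text_bytes_per_day / MB:.0f} MB/day")
print(f"media:           {media_bytes_per_day / TB:.0f} TB/day")
```

Media outweighs text and metadata by roughly four orders of magnitude, which is why the daily total is rounded to 1 TB.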
API design
POST /api/tweet
parameters:
- api_token: this is used for authorization
- tweet_text
- media_urls: media are pre-uploaded and referenced by this api via urls
- hashtags
returns:
- status code: 200 on success. Other codes should be accompanied by an error message
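A minimal sketch of how a handler for POST /api/tweet might validate its parameters before storing the tweet. The 140-character limit comes from the capacity estimate; the `PostTweetRequest` type, the `validate_post_tweet` helper, and the specific non-200 codes (400, 401) are assumptions made for illustration, not part of the API defined here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

MAX_TWEET_CHARS = 140  # limit taken from the capacity estimate


@dataclass
class PostTweetRequest:
    """Hypothetical in-memory form of the POST /api/tweet parameters."""
    api_token: str
    tweet_text: str
    media_urls: List[str] = field(default_factory=list)
    hashtags: List[str] = field(default_factory=list)


def validate_post_tweet(req: PostTweetRequest) -> Tuple[int, Optional[str]]:
    """Return (status_code, error_message); the message is None on success."""
    if not req.api_token:
        return 401, "missing api_token"  # assumed code for failed authorization
    if not req.tweet_text and not req.media_urls:
        return 400, "tweet must contain text or media"
    if len(req.tweet_text) > MAX_TWEET_CHARS:
        return 400, f"tweet_text exceeds {MAX_TWEET_CHARS} characters"
    return 200, None


print(validate_post_tweet(PostTweetRequest("token", "hello world")))
print(validate_post_tweet(PostTweetRequest("token", "x" * 141)))
```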
DELETE /api/tweet
parameters:
- api_token: this is used for authorization
- tweet_id: primary key to locate the target tweet
returns: error code/success
GET /api/tweet
parameters:
- api_token
- tweet_id
returns: a JSON object containing the tweet, or an error code
PATCH /api/tweet
parameters:
- api_token
- tweet_id
- tweet_text
- media_urls
- hashtags
returns: success or error code
POST /api/media/
parameters:
- api_token
- media: binary file
returns either:
- success + url
- error code + error message
GET /api/feed
parameters:
- api_token
- max_id: for pagination
- min_id: for refreshing
- page_size
returns: a JSON object containing a list of tweets as the newsfeed
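One way the max_id and min_id parameters of GET /api/feed could work is sketched below. It assumes tweet IDs increase with time, that max_id means "only tweets older than this ID" (scrolling down), and that min_id means "only tweets newer than this ID" (pull to refresh). Both bounds are treated as exclusive here; that choice and the `get_feed_page` helper are assumptions.

```python
from typing import List, Optional


def get_feed_page(
    feed_ids: List[int],
    max_id: Optional[int] = None,
    min_id: Optional[int] = None,
    page_size: int = 20,
) -> List[int]:
    """Return up to page_size tweet IDs from a feed sorted newest first."""
    selected = [
        tweet_id
        for tweet_id in feed_ids
        if (max_id is None or tweet_id < max_id)
        and (min_id is None or tweet_id > min_id)
    ]
    return selected[:page_size]


feed = list(range(10, 0, -1))  # IDs 10 (newest) down to 1 (oldest)

first_page = get_feed_page(feed, page_size=3)
next_page = get_feed_page(feed, max_id=first_page[-1], page_size=3)
refresh = get_feed_page(feed, min_id=8, page_size=3)
print(first_page, next_page, refresh)
```

The client passes the last ID it has seen as max_id to load the next page, and the newest ID it has seen as min_id to fetch only new tweets.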
POST /api/tweet/like
parameters:
- api_token
- tweet_id
returns: success or error code
POST /api/user/follow
parameters:
- api_token
- user_id
returns: success or error code
DELETE /api/user/unfollow
parameters:
- api_token
- user_id
returns: success or error code
Database design
There are mainly three parts. The first is user-specific data. This can be stored in a relational database, since some join operations and subqueries are likely and relational databases handle indexes and joins well. User data is also small compared to tweet data. Tables include User, Account, and Relationship.
Tweet data and media metadata, on the other hand, should be stored in a non-relational database such as Cassandra or DynamoDB, as the data size is extremely large and these NoSQL stores are distributed by design, which makes the storage easier to scale. Tables include Tweet, Photo, Video, UserLike, and NewsFeed.
Media data can be stored in object storage such as Amazon S3 for robustness, and access can later be sped up via a CDN. A file system is also an option, but I do not prefer it because we would need to maintain the servers ourselves, whereas S3 is a managed service that scales automatically.
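To make the data model more concrete, here is a rough sketch of a few of the tables as Python dataclasses. The table names come from the design above, but every field name and type is an assumption chosen for illustration; the design does not specify columns.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Relationship:
    """Row in the relational Relationship table (hypothetical columns)."""
    follower_id: int
    followee_id: int


@dataclass
class Tweet:
    """Item in the NoSQL Tweet table (hypothetical attributes)."""
    tweet_id: int
    author_id: int
    text: str
    media_urls: List[str] = field(default_factory=list)  # URLs into object storage
    created_at: float = 0.0  # Unix timestamp


@dataclass(frozen=True)
class UserLike:
    user_id: int
    tweet_id: int


@dataclass(frozen=True)
class NewsFeedEntry:
    user_id: int   # owner of the feed
    tweet_id: int  # tweet that appears in that feed


def followers_of(relationships: List[Relationship], user_id: int) -> List[int]:
    """Equivalent of: SELECT follower_id FROM Relationship WHERE followee_id = ?"""
    return [r.follower_id for r in relationships if r.followee_id == user_id]


rels = [Relationship(1, 2), Relationship(3, 2), Relationship(2, 1)]
print(followers_of(rels, 2))
```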
High-level design
Please see the high level diagram
Request flows
Please see the sequence diagram
Detailed component design
For the newsfeed service, there are two models: the push model and the pull model.
Push model pros:
- speeds up the newsfeed query process so that users can see it immediately
Push model cons:
- consumes extra space, especially for a celebrity's tweet
- takes time to generate
- involves latency, not as timely as a pull model
Pull model pros:
- does not need extra space
- does not spend time generating feeds beforehand
- reflects updates in a more timely manner
Pull model cons:
- takes time to query and aggregate
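The difference between the two models can be sketched with in-memory dictionaries standing in for the Relationship, Tweet, and NewsFeed tables. All names here (`FeedStore`, `post_push`, `read_pull`, and so on) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class FeedStore:
    def __init__(self) -> None:
        self.followers: Dict[str, List[str]] = defaultdict(list)  # author -> followers
        self.following: Dict[str, List[str]] = defaultdict(list)  # user -> authors
        self.tweets_by_author: Dict[str, List[int]] = defaultdict(list)
        self.feed_table: Dict[str, List[int]] = defaultdict(list)  # used by push only

    def follow(self, user: str, author: str) -> None:
        self.followers[author].append(user)
        self.following[user].append(author)

    # Push model: do the work at write time, one feed row per follower.
    def post_push(self, author: str, tweet_id: int) -> None:
        self.tweets_by_author[author].append(tweet_id)
        for follower in self.followers[author]:
            self.feed_table[follower].append(tweet_id)  # extra space per follower

    def read_push(self, user: str) -> List[int]:
        return sorted(self.feed_table[user], reverse=True)  # cheap lookup

    # Pull model: store the tweet once, aggregate at read time.
    def post_pull(self, author: str, tweet_id: int) -> None:
        self.tweets_by_author[author].append(tweet_id)

    def read_pull(self, user: str) -> List[int]:
        merged: List[int] = []
        for author in self.following[user]:  # one query per followed author
            merged.extend(self.tweets_by_author[author])
        return sorted(merged, reverse=True)


store = FeedStore()
store.follow("alice", "bob")
store.post_push("bob", 1)  # written into alice's feed immediately
store.post_pull("bob", 2)  # only stored under bob
print(store.read_push("alice"), store.read_pull("alice"))
```

The push path pays in storage and write-time work, while the pull path pays at query time.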
The push model is suitable for non-celebrity users, while the pull model fits celebrities. We can use a hybrid approach that applies the push model to normal users and the pull model to celebrities. For querying and aggregation, we can also add a cache layer on top of the tweet database and store celebrity tweets there to speed up queries. We could use LRU as the eviction strategy and a read-through cache to handle cache misses.
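A minimal sketch of the hybrid approach with an LRU cache in front of celebrity tweets follows. It makes several assumptions: the set of celebrities is given up front, the cache is read-through (a miss loads from the tweet store), and a celebrity's cache entry is invalidated when they post. The class names are hypothetical, and dictionaries stand in for the real databases.

```python
from collections import OrderedDict, defaultdict
from typing import Callable, Dict, Iterable, List, Set


class ReadThroughLRUCache:
    def __init__(self, capacity: int, loader: Callable[[str], List[int]]) -> None:
        self.capacity = capacity
        self.loader = loader  # reads from the backing store on a miss
        self.entries: "OrderedDict[str, List[int]]" = OrderedDict()
        self.misses = 0

    def get(self, key: str) -> List[int]:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.loader(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used key
        return value

    def invalidate(self, key: str) -> None:
        self.entries.pop(key, None)


class HybridFeed:
    def __init__(self, celebrities: Iterable[str], cache_capacity: int = 100) -> None:
        self.celebrities: Set[str] = set(celebrities)
        self.followers: Dict[str, Set[str]] = defaultdict(set)
        self.following: Dict[str, Set[str]] = defaultdict(set)
        self.tweet_db: Dict[str, List[int]] = defaultdict(list)    # stands in for the tweet DB
        self.feed_table: Dict[str, List[int]] = defaultdict(list)  # precomputed feeds
        self.celebrity_cache = ReadThroughLRUCache(
            cache_capacity, lambda author: list(self.tweet_db[author])
        )

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def post(self, author: str, tweet_id: int) -> None:
        self.tweet_db[author].append(tweet_id)
        if author in self.celebrities:
            self.celebrity_cache.invalidate(author)  # pull model: no fan-out
        else:
            for follower in self.followers[author]:  # push model: fan out now
                self.feed_table[follower].append(tweet_id)

    def get_feed(self, user: str, page_size: int = 20) -> List[int]:
        ids = set(self.feed_table[user])
        for author in self.following[user]:
            if author in self.celebrities:
                ids.update(self.celebrity_cache.get(author))
        return sorted(ids, reverse=True)[:page_size]


feed = HybridFeed(celebrities={"star"})
feed.follow("alice", "bob")
feed.follow("alice", "star")
feed.post("bob", 1)
feed.post("star", 2)
print(feed.get_feed("alice"))
```

Invalidating on write keeps celebrity tweets fresh at the cost of one extra miss per celebrity post; a time-based expiry would be another option.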
Also, if we choose the push model, it takes time to pre-generate newsfeed data and store it in the NewsFeed table. We can use a message queue to decouple producers (which assign the generation tasks) from consumers (which generate the feeds) in the newsfeed service. The same holds for media uploads while creating a post: we can send a signal to the frontend to indicate that the upload is complete (show a spinner while uploading, and stop it with a success state after receiving the signal).
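The producer/consumer split can be illustrated with Python's standard `queue.Queue` standing in for the message queue. This is only an in-process sketch: a real deployment would use a separate message broker, which this design does not name. The function names, the batch size, and the worker count are assumptions.

```python
import queue
import threading
from collections import defaultdict
from typing import Dict, List

task_queue: queue.Queue = queue.Queue()
feed_table: Dict[str, List[int]] = defaultdict(list)  # stands in for the NewsFeed table
feed_lock = threading.Lock()


def enqueue_fanout(tweet_id: int, follower_ids: List[str], batch_size: int = 2) -> None:
    """Producer: split the fan-out into small tasks and return immediately."""
    for start in range(0, len(follower_ids), batch_size):
        task_queue.put((tweet_id, follower_ids[start:start + batch_size]))


def feed_worker() -> None:
    """Consumer: write the tweet into each follower's precomputed feed."""
    while True:
        task = task_queue.get()
        if task is None:  # sentinel: shut this worker down
            task_queue.task_done()
            break
        tweet_id, batch = task
        with feed_lock:
            for follower in batch:
                feed_table[follower].append(tweet_id)
        task_queue.task_done()


workers = [threading.Thread(target=feed_worker) for _ in range(2)]
for worker in workers:
    worker.start()

followers = ["u1", "u2", "u3", "u4", "u5"]
enqueue_fanout(42, followers)  # the posting request does not wait for the fan-out

task_queue.join()              # demo only: wait until every task is processed
for _ in workers:
    task_queue.put(None)
for worker in workers:
    worker.join()

print(dict(feed_table))
```

Because the producer only enqueues tasks, posting latency stays independent of follower count, and consumers can be scaled separately.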
Trade offs/Tech choices
Please see the discussion of the push and pull models above.
Failure scenarios/bottlenecks
The databases may fail or become bottlenecks.
Future improvements
Data partitioning:
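based on hash(tweet_id): sensitive to scaling, as adding or removing hosts typically forces most keys to be remapped.
consistent hashing: more robust to scaling, as only a small share of keys move when a host is added or removed; it also lends itself naturally to replication.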
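The two partitioning schemes can be compared with a small experiment. The sketch assumes that plain hash partitioning means hash(tweet_id) modulo the number of hosts, uses MD5 only as a stable hash, and places 100 virtual nodes per host on the ring. The host names, key format, and class names are hypothetical.

```python
import bisect
import hashlib
from typing import Callable, List


def stable_hash(value: str) -> int:
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def modulo_owner(key: str, hosts: List[str]) -> str:
    return hosts[stable_hash(key) % len(hosts)]


class ConsistentHashRing:
    def __init__(self, hosts: List[str], vnodes: int = 100) -> None:
        # Each host owns many points on the ring to even out the load.
        self.ring = sorted(
            (stable_hash(f"{host}#{i}"), host) for host in hosts for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    def owner(self, key: str) -> str:
        """First ring point clockwise from the key's hash, wrapping around."""
        idx = bisect.bisect_right(self.points, stable_hash(key)) % len(self.ring)
        return self.ring[idx][1]

    def replicas(self, key: str, count: int) -> List[str]:
        """Owner plus the next distinct hosts clockwise: one way to replicate."""
        start = bisect.bisect_right(self.points, stable_hash(key))
        found: List[str] = []
        for step in range(len(self.ring)):
            host = self.ring[(start + step) % len(self.ring)][1]
            if host not in found:
                found.append(host)
            if len(found) == count:
                break
        return found


def moved_fraction(before: Callable[[str], str], after: Callable[[str], str],
                   keys: List[str]) -> float:
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)


keys = [f"tweet-{i}" for i in range(10_000)]
old_hosts = ["db0", "db1", "db2", "db3"]
new_hosts = old_hosts + ["db4"]  # scale out by one host

modulo_moved = moved_fraction(
    lambda k: modulo_owner(k, old_hosts), lambda k: modulo_owner(k, new_hosts), keys
)
ring_old, ring_new = ConsistentHashRing(old_hosts), ConsistentHashRing(new_hosts)
ring_moved = moved_fraction(ring_old.owner, ring_new.owner, keys)

# Expect roughly 4/5 of keys to move with modulo and roughly 1/5 with the ring.
print(f"modulo: {modulo_moved:.0%} moved, consistent hashing: {ring_moved:.0%} moved")
```

With consistent hashing, every key that moves lands on the new host, so existing hosts only hand data off and never exchange it among themselves.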