System requirements
Functional:
- Post things that have text and images in them
- Can like and "re-tweet" posts others have made in your followed feed
- Can post replies to a tweet
- Can follow people
- See posts, retweets, and likes from the people you follow
- User profile / bio
- Users can delete and edit their posts
Out of scope:
- A recommended feed
- Users messaging
Non-Functional:
- Availability over consistency: the service should be available as much as possible, but each user must see a consistent view of their own posts, likes, and media
- Support media, assuming only images and not videos
- 10 million posts, likes, retweets per second
- Available on mobile and PC. Every action on your account should be reflected consistently for you across platforms.
Capacity estimation
There will be 10 million posts, likes, retweets per second.
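A rough back-of-the-envelope sketch of what that write rate implies per day. It reads the 10 million figure as the combined rate of posts, likes, and retweets, and the 300-byte average payload is an assumption made only for illustration (text plus metadata, no media); neither is stated in the requirements.

```python
# Back-of-the-envelope estimate derived from the stated write rate.
WRITES_PER_SECOND = 10_000_000   # posts + likes + retweets, from the requirements
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
ASSUMED_BYTES_PER_WRITE = 300    # assumption: not given in the design


def writes_per_day(writes_per_second: int = WRITES_PER_SECOND) -> int:
    """Total write operations in one day at a constant rate."""
    return writes_per_second * SECONDS_PER_DAY


def storage_per_day_tb(
    writes_per_second: int = WRITES_PER_SECOND,
    bytes_per_write: int = ASSUMED_BYTES_PER_WRITE,
) -> float:
    """Daily storage growth in terabytes (1 TB = 10**12 bytes), media excluded."""
    return writes_per_day(writes_per_second) * bytes_per_write / 1e12


if __name__ == "__main__":
    print(f"writes/day:  {writes_per_day():,}")
    print(f"storage/day: {storage_per_day_tb():.1f} TB")
```

Under these assumptions the system takes about 864 billion writes per day and grows by roughly 259 TB per day before media is counted.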
API design
- Tweets / Retweets
  - POST (/{userId}/tweet) - post a tweet (the server assigns the tweetId)
  - PUT (/{userId}/tweet/{tweetId}) - edit a tweet
  - DELETE (/{userId}/tweet/{tweetId}) - delete a tweet
  - GET (/{userId}/tweets) - retrieve all of a user's tweets / retweets (retweets will have a categorization in the database design)
- Users
  - POST (/user) - create new user
  - GET (/user/{userId}) - get user
  - GET (/user/{userId}/likes) - get a user's likes
  - DELETE (/user/{userId}) - delete user
  - PUT (/user/{userId}) - edit user bio + picture
  - PUT (/user/{userId}/following) - edit who you're following
- Follow Feed
  - GET (/{userId}/following) - get feed for users that you're following
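A minimal sketch of the logic behind the tweet operations listed above, using an in-memory dictionary in place of the database. The class and method names (`TweetService`, `StoredTweet`, `create`, `edit`, `delete`, `list_for_user`) are hypothetical, and the ownership check is an assumption about how edits and deletes would be authorized.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List


@dataclass
class StoredTweet:
    tweet_id: int
    user_id: int
    text: str


class TweetService:
    """In-memory stand-in for the tweet endpoints (illustrative only)."""

    def __init__(self) -> None:
        self._tweets: Dict[int, StoredTweet] = {}
        self._ids = count(1)  # the server, not the client, assigns ids

    def create(self, user_id: int, text: str) -> StoredTweet:
        tweet = StoredTweet(next(self._ids), user_id, text)
        self._tweets[tweet.tweet_id] = tweet
        return tweet

    def edit(self, user_id: int, tweet_id: int, text: str) -> StoredTweet:
        tweet = self._owned(user_id, tweet_id)
        tweet.text = text
        return tweet

    def delete(self, user_id: int, tweet_id: int) -> None:
        self._owned(user_id, tweet_id)
        del self._tweets[tweet_id]

    def list_for_user(self, user_id: int) -> List[StoredTweet]:
        return [t for t in self._tweets.values() if t.user_id == user_id]

    def _owned(self, user_id: int, tweet_id: int) -> StoredTweet:
        """Look up a tweet and verify that the caller is its author."""
        tweet = self._tweets.get(tweet_id)
        if tweet is None:
            raise KeyError(f"tweet {tweet_id} not found")
        if tweet.user_id != user_id:
            raise PermissionError("only the author may modify a tweet")
        return tweet
```

In a real service each method would sit behind one of the routes above, with the missing-tweet and wrong-owner cases mapped to 404 and 403 responses.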
Database design
- Users (table)
  - id
  - profileImage (URL)
  - description / bio (limited text)
  - following (set, unique values of userIds)
  - tweets (set, unique values of tweetIds)
  - followFeed (list of tweets, elaborated more in component design)
- Tweets (table)
  - id
  - text
  - media (URL)
  - isRetweet (bool)
  - isReply (bool)
  - dateTime
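The two tables above can be sketched as plain Python dataclasses to make the field types concrete. This is an illustration rather than a schema definition: the 160-character bio limit is an assumption (the design only says "limited text"), and storing tweet ids in `follow_feed` is one possible reading of "list of tweets".

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Set

BIO_MAX_CHARS = 160  # assumption: the design does not give a number


@dataclass
class Tweet:
    id: int
    text: str
    media: Optional[str] = None  # URL of the stored image, if any
    is_retweet: bool = False
    is_reply: bool = False
    date_time: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


@dataclass
class User:
    id: int
    profile_image: Optional[str] = None  # URL
    bio: str = ""
    following: Set[int] = field(default_factory=set)     # unique userIds
    tweets: Set[int] = field(default_factory=set)        # unique tweetIds
    follow_feed: List[int] = field(default_factory=list) # tweetIds, assumed

    def __post_init__(self) -> None:
        if len(self.bio) > BIO_MAX_CHARS:
            raise ValueError("bio exceeds the assumed length limit")

    def follow(self, other_id: int) -> None:
        """Follow another user; the set keeps the entry unique."""
        if other_id == self.id:
            raise ValueError("a user cannot follow themselves")
        self.following.add(other_id)
```

Following the same user twice leaves a single entry, which is the reason for modelling `following` and `tweets` as sets.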
High-level design
- A PC web app, and a phone app for iOS and Android
- AWS API Gateway
- A profile service
- User feed service
- Tweet service
- Cache for frequently accessed tweets
- Database:
  - Users Table
  - Tweets Table
- S3 buckets for tweet media and profile pics
- Image compression service
Request flows
- A user opens the web app or the mobile application.
- The API gateway routes the first request to the service behind the home screen, which is usually the user feed service.
- The user feed shows the most recent (24h) tweets (or retweets) of the people they're following.
- To create, edit, or delete their account, they go through the profile service.
  - The profile service creates, edits, or deletes the user's row in the Users table.
  - To change or add a profile pic, they submit it and it is stored in the S3 bucket. The bucket returns a URL that we add to the Users table.
- To post, edit, or view a particular tweet, they go through the tweet service.
  - The tweet is posted, edited, or deleted, and the user's tweets list is updated to match.
  - If there is media, it is stored in a media S3 bucket and its URL is saved on that tweet's row in the Tweets table.
- Reads of profiles, tweets, and user feeds go to the cache first and fall back to the Postgres DB on a miss.
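The cache-then-database read path in the last step can be sketched as a cache-aside lookup. The `CacheAsideReader` class and the `load_from_db` callback are hypothetical stand-ins for the cache and the Postgres query; invalidating on write is an assumption about how a user's own edits and deletes stay consistent.

```python
from typing import Callable, Dict, Optional


class CacheAsideReader:
    """Read through a cache, falling back to the database on a miss."""

    def __init__(self, load_from_db: Callable[[str], Optional[dict]]) -> None:
        self._cache: Dict[str, dict] = {}
        self._load_from_db = load_from_db
        self.db_reads = 0  # counts fallbacks, useful for testing

    def get(self, key: str) -> Optional[dict]:
        if key in self._cache:
            return self._cache[key]       # cache hit: no DB access
        self.db_reads += 1
        value = self._load_from_db(key)   # cache miss: read the DB
        if value is not None:
            self._cache[key] = value      # populate for later reads
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after an edit or delete so the next read is fresh."""
        self._cache.pop(key, None)
```

A second read of the same key is served from the cache, and calling `invalidate` after an edit forces the next read back to the database.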
Detailed component design
S3 Buckets:
- The S3 buckets return a distinct URL for each image. Profile images are limited to 15 MB after compression. Tweet media is limited to 256 MB after compression.
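A small sketch of how the two post-compression limits might be enforced before an upload is accepted. The function names, the `kind` labels, and the object key layout are hypothetical, and treating 1 MB as 2**20 bytes is an assumption.

```python
import uuid

MB = 1024 * 1024  # assumption: 1 MB = 2**20 bytes

# Post-compression limits taken from the text above.
MAX_COMPRESSED_BYTES = {
    "profile": 15 * MB,
    "tweet": 256 * MB,
}


def within_limit(kind: str, compressed_size_bytes: int) -> bool:
    """Return True if a compressed image of this kind may be stored."""
    if kind not in MAX_COMPRESSED_BYTES:
        raise ValueError(f"unknown media kind: {kind!r}")
    if compressed_size_bytes < 0:
        raise ValueError("size must be non-negative")
    return compressed_size_bytes <= MAX_COMPRESSED_BYTES[kind]


def object_key(kind: str, owner_id: int) -> str:
    """Build a unique object key; the layout is illustrative only."""
    return f"{kind}/{owner_id}/{uuid.uuid4().hex}"
```

The check would run after the image compression service and before the write to the bucket, so oversized media never reaches storage.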
User Feed Generation:
- Each user has a follow feed: a set of tweets (including replies and retweets) posted within the last 48h by the people they follow.
- The feed is ordered by dateTime.
What is in ElastiCache:
- Dictionaries of the frequently visited profiles and tweets
- This will speed up the lookups of trending / popular tweets
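A bounded dictionary for hot profiles and tweets needs an eviction rule once it fills up. The sketch below uses least-recently-used (LRU) eviction as a simple stand-in; the design does not name a policy, so LRU and the `LRUCache` class are assumptions, not a description of how ElastiCache is configured.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the oldest entry
```

Entries that keep getting read stay near the recent end of the ordering, so trending tweets and popular profiles tend to survive eviction.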
Trade offs/Tech choices
S3 buckets:
- S3 gives us a URL for every stored media object. Media could be larger if needed, but I set limits to cap how much is stored in our S3 buckets.
- Multipart uploads are available if we want to lift the limits, but storage can be expensive. Older media can be moved to a colder S3 storage class.
ElastiCache:
- Serves popular profiles and tweets more quickly and reduces read load on the DB. With tens of millions of reads per second, this keeps most reads away from Postgres, where concurrent connections are capped by max_connections (100 by default) and a single instance typically handles hundreds to a few thousand connections, not millions.
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?