Requirements
Functional Requirements:
- Allow users to post tweets of up to 140 characters.
- Enable users to follow other users.
- Allow users to like tweets from other users.
- Display tweets from followed users in the home feed.
- Show top K popular tweets in the home feed based on likes and followers.
Non-Functional Requirements:
- Scalability is the first concern for this problem. Peak hours can bring many concurrent users, so we want a load balancer to distribute requests across the servers.
- We want to auto-scale, adding servers horizontally as demand grows.
- We can partition/shard the database so each shard holds less data and serves reads faster.
- Reliability/availability: read replicas handle the read traffic from services, while the primary database handles writes.
- Replicas also help during outages: if the primary goes down, a replica can be promoted to primary so reads and writes can continue.
- For high availability, we might consider a microservice architecture instead of a monolith, so that if one service goes down, the other services stay available.
- Latency: cache aggressively on both the frontend (FE) and the backend (BE).
- Use a cache such as Redis to store precomputed feeds, so feed reads do not have to hit the database.
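A minimal sketch of the sharding and primary/replica split described above, assuming users are sharded by a stable hash of `user_id`. The `Shard` and `ShardRouter` names, the hostnames, and the simple "promote the first replica" failover are hypothetical; a real deployment would typically rely on the database's own replication and failover tooling, and this only illustrates the routing decision.

```python
import hashlib
import itertools
from dataclasses import dataclass


@dataclass
class Shard:
    """One shard: a primary that takes writes plus replicas that serve reads."""
    primary: str
    replicas: list[str]

    def __post_init__(self) -> None:
        # Round-robin over replicas; fall back to the primary if there are none.
        self._cycle = itertools.cycle(self.replicas or [self.primary])

    def next_replica(self) -> str:
        return next(self._cycle)

    def fail_over(self) -> None:
        # If the primary goes down, promote the first replica to primary.
        if self.replicas:
            self.primary = self.replicas.pop(0)
            self._cycle = itertools.cycle(self.replicas or [self.primary])


class ShardRouter:
    def __init__(self, shards: list[Shard]) -> None:
        self.shards = shards

    def shard_for(self, user_id: int) -> Shard:
        # Stable hash so a user always maps to the same shard
        # (Python's built-in hash() of str is salted per process).
        digest = hashlib.md5(str(user_id).encode()).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def route(self, user_id: int, is_write: bool) -> str:
        shard = self.shard_for(user_id)
        # Writes go to the primary; reads are spread across the replicas.
        return shard.primary if is_write else shard.next_replica()
```

Hashing on `user_id` keeps a user's rows together, which suits per-user lookups; simple modulo hashing does move most keys when the shard count changes, which is why consistent hashing is often preferred once resharding matters.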
API Design
There are a few major functions in this app:
- Following friends
- API endpoint: POST /user/:id/follow/:user_id
- Posting tweets
- API endpoint: POST /tweet/create
- Pass the user id and the tweet content in the request body
- Liking tweets
- API endpoint: POST /tweet/like
- Pass the tweet id and user id in the request body
- Seeing popular tweets and your friends' tweets on your feed
- API endpoint: GET /user/feed
- We need some sort of algorithm to calculate the popular tweets
- We need some sort of algorithm to rank your friends' tweets
- Both algorithms should take into account the tweet's recency and the number of likes it has received
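The ranking algorithms are left open here. One common approach, assumed for this sketch, is to divide likes by a power of the tweet's age, similar to the widely cited Hacker News ranking formula, and then take the top K with a heap. The `Tweet` fields, `gravity=1.8`, and the +2 offset are illustrative assumptions to tune, not a prescribed method.

```python
import heapq
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tweet:
    id: int
    author_id: int
    likes: int
    created_at: float  # Unix timestamp, seconds


def score(tweet: Tweet, now: float, gravity: float = 1.8) -> float:
    # Likes decayed by age in hours. The +2 keeps brand-new tweets from
    # dividing by ~0; a larger gravity makes older tweets sink faster.
    age_hours = max(0.0, (now - tweet.created_at) / 3600)
    return tweet.likes / (age_hours + 2) ** gravity


def top_k(tweets: list[Tweet], k: int, now: Optional[float] = None) -> list[Tweet]:
    now = time.time() if now is None else now
    # heapq.nlargest runs in O(n log k), cheaper than a full sort when k << n.
    return heapq.nlargest(k, tweets, key=lambda t: score(t, now))
```

With these constants, a 1-hour-old tweet with 20 likes outranks a 2-day-old tweet with 100 likes, which is the recency bias the feed asks for. The same `score` can rank both the followed-users list and the global list.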
We can summarize these as the following:
- Taking actions on the site to interact (e.g., liking a tweet, tweeting, following users)
- Getting an updated feed
Models
- Table for users
- id, email, followers (users who follow this user), following (users this user follows)
- Table for tweets
- id, content, user_posted (author's user id), likes (count), created_at
- Table for likes
- id, tweet_id, user_id, created_at
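A minimal in-memory sketch of the three tables, with Python dataclasses standing in for rows. Field names follow the model list (snake_cased). `Store`, `MAX_TWEET_LENGTH`, and the rule that a user can like a given tweet only once (enforced by keying likes on `(tweet_id, user_id)`) are illustrative assumptions; in a relational database the last one would be a unique index.

```python
import itertools
from dataclasses import dataclass, field
from datetime import datetime, timezone

MAX_TWEET_LENGTH = 140


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class User:
    id: int
    email: str
    followers: set[int] = field(default_factory=set)  # users who follow this user
    following: set[int] = field(default_factory=set)  # users this user follows


@dataclass
class Tweet:
    id: int
    content: str
    user_posted: int  # author's user id
    likes: int = 0    # denormalized like count
    created_at: datetime = field(default_factory=_now)


@dataclass(frozen=True)
class Like:
    id: int
    tweet_id: int
    user_id: int
    created_at: datetime = field(default_factory=_now)


class Store:
    """Hypothetical in-memory stand-in for the Users, Tweets, and Likes tables."""

    def __init__(self) -> None:
        self.users: dict[int, User] = {}
        self.tweets: dict[int, Tweet] = {}
        self.likes: dict[tuple[int, int], Like] = {}  # keyed by (tweet_id, user_id)
        self._tweet_ids = itertools.count(1)
        self._like_ids = itertools.count(1)

    def follow(self, user_id: int, target_id: int) -> None:
        self.users[user_id].following.add(target_id)
        self.users[target_id].followers.add(user_id)

    def create_tweet(self, user_id: int, content: str) -> Tweet:
        if len(content) > MAX_TWEET_LENGTH:
            raise ValueError(f"tweet exceeds {MAX_TWEET_LENGTH} characters")
        tweet = Tweet(id=next(self._tweet_ids), content=content, user_posted=user_id)
        self.tweets[tweet.id] = tweet
        return tweet

    def like(self, user_id: int, tweet_id: int) -> bool:
        # Reject duplicate likes; otherwise record the like and bump the counter.
        key = (tweet_id, user_id)
        if key in self.likes:
            return False
        self.likes[key] = Like(id=next(self._like_ids), tweet_id=tweet_id, user_id=user_id)
        self.tweets[tweet_id].likes += 1
        return True
```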
High-Level Design
Design for the posting service:
- The client sends a request, which first reaches a load balancer
- The load balancer distributes requests across the web servers
- A server processes the request only after it is authenticated and rate-limited (bad actors may try to take down our system by sending too many tweets, too many follows, etc.)
- The request is handled by the relevant service: the tweet posting service, the like service, or the follow service
- These services also trigger a fan-out service that asynchronously updates user feeds
- This can be a pub/sub relationship, since feeds do not have to be updated synchronously with the write, only by the time the user next refreshes
- The fan-out service also reads from the Users DB to find the author's followers, whose feeds need updating
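The posting path rate-limits each user before the request reaches a service. A token bucket per user and action is one common way to do this; the sketch below assumes that technique, and the class names, the per-action limits, and the injectable clock are hypothetical. With several servers, the bucket state would need to live in a shared store such as Redis rather than in process memory.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """One bucket per (user, action), e.g. ("u1", "tweet") or ("u1", "follow")."""

    def __init__(self, limits: dict[str, tuple[float, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limits = limits  # action -> (rate per second, burst capacity)
        self.clock = clock
        self.buckets: dict[tuple[str, str], TokenBucket] = {}

    def allow(self, user_id: str, action: str) -> bool:
        key = (user_id, action)
        if key not in self.buckets:
            rate, capacity = self.limits[action]
            self.buckets[key] = TokenBucket(rate, capacity, self.clock)
        return self.buckets[key].allow()
```

A request for which `allow` returns `False` would be rejected (for example with HTTP 429) before it reaches the tweet, like, or follow service.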
Design for the feed:
- The client sends a request
- The load balancer receives the request and routes it to an appropriate server. The user should also be authenticated and rate-limited at this layer
- The server sends a request to the feed service
- The feed service reads the likes DB and the users DB
- The service should use an algorithm to determine the most popular tweets of the day, ranked by likes and recency
- The service should read the list of users the logged-in user follows and run the ranking algorithm on those users' tweets
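A minimal sketch of the two feed reads, assuming the candidate tweets are already loaded in memory (in the design they would come from the caches and DBs) and that "today" means the current UTC calendar day. Both lists sort by likes and then by posting time; the function names, `k`, and returning the two lists separately (rather than interleaving them) are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Tweet:
    id: int
    author_id: int
    likes: int
    created_at: datetime  # timezone-aware


def _utc_day(dt: datetime):
    return dt.astimezone(timezone.utc).date()


def _rank(tweets: list[Tweet], k: int) -> list[Tweet]:
    # Most likes first; ties broken by newest posting time.
    return sorted(tweets, key=lambda t: (t.likes, t.created_at), reverse=True)[:k]


def popular_today(tweets: list[Tweet], now: datetime, k: int) -> list[Tweet]:
    # Timebox to tweets posted on the current UTC calendar day.
    return _rank([t for t in tweets if _utc_day(t.created_at) == _utc_day(now)], k)


def following_feed(tweets: list[Tweet], following: set[int], k: int) -> list[Tweet]:
    # Only tweets authored by users the logged-in user follows.
    return _rank([t for t in tweets if t.author_id in following], k)


def build_feed(tweets: list[Tweet], following: set[int], now: datetime,
               k: int = 10) -> dict[str, list[int]]:
    # Two separate lists: followed users' tweets and globally popular tweets.
    return {
        "following": [t.id for t in following_feed(tweets, following, k)],
        "popular": [t.id for t in popular_today(tweets, now, k)],
    }
```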
Detailed Component Design
The fan-out service uses a message queue (pub/sub) for any change that affects users' feeds. Because feed updates don't need to be real-time, the posting services publish an event to the queue, and fan-out workers consume it and update the feed cache. For each event, a worker reads the User DB to find the author's followers, so that every user's feed contains the tweets of the users they follow.
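A minimal fan-out-on-write sketch, assuming Python's in-process `queue.Queue` stands in for the message queue and a bounded `deque` per user stands in for that user's feed cache (with Redis, `LPUSH` followed by `LTRIM` is one way to keep a capped list). `FanoutService`, `FEED_SIZE`, and the follower map are hypothetical; the follower lookup plays the role of the read against the User DB.

```python
import queue
import threading
from collections import defaultdict, deque

FEED_SIZE = 800  # hypothetical cap on the cached feed length per user


class FanoutService:
    def __init__(self, followers: dict[int, set[int]]) -> None:
        # followers[author_id] -> ids of users who follow that author.
        self.followers = followers
        self.events: queue.Queue = queue.Queue()
        # feed_cache[user_id] -> tweet ids, newest first.
        self.feed_cache: dict[int, deque] = defaultdict(lambda: deque(maxlen=FEED_SIZE))

    def publish_tweet(self, author_id: int, tweet_id: int) -> None:
        # Called by the tweet service; returns immediately (asynchronous fan-out).
        self.events.put((author_id, tweet_id))

    def worker(self) -> None:
        # Consume events until a None sentinel arrives.
        while True:
            event = self.events.get()
            if event is None:
                self.events.task_done()
                return
            author_id, tweet_id = event
            for follower_id in self.followers.get(author_id, set()):
                self.feed_cache[follower_id].appendleft(tweet_id)
            self.events.task_done()


if __name__ == "__main__":
    svc = FanoutService({1: {2, 3}})
    t = threading.Thread(target=svc.worker)
    t.start()
    svc.publish_tweet(author_id=1, tweet_id=100)
    svc.events.join()   # wait until the event has been fanned out
    svc.events.put(None)
    t.join()
    print(list(svc.feed_cache[2]))
```

This sketch uses a single worker; running several workers against a shared cache would need the cache itself (for example Redis) to handle concurrent writes.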
The feed service needs to find the latest tweets with the most likes; one option is a search timeboxed to tweets posted today, sorted by likes and then by posting time. A feed cache is enough; we don't need a separate feed database. There should likely be two caches, one for the ranked tweets of the users you follow and one for the globally ranked tweets, so we don't rerun the feed service unnecessarily. We should also decide how often this service runs: during peak times the users you follow may post many tweets at once, so we likely want to throttle refreshes on the FE (e.g., with a timeout before triggering the feed service) or use a Redis lock on the backend so the service runs at most once per interval.
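A sketch of the backend throttle. In Redis, `SET key value NX EX seconds` sets a key only if it does not already exist and gives it an expiry, so it can act as a lock that releases itself after the interval. To keep the example self-contained, the hypothetical `TtlLock` emulates that behavior in memory (single process, not thread-safe, unlike Redis's atomic `SET NX`); the key format, the 30-second interval, and `maybe_recompute_feed` are assumptions. A single global key would work the same way for the globally ranked cache.

```python
import time
from typing import Callable


class TtlLock:
    """In-memory stand-in for Redis `SET key value NX EX ttl`: the first caller
    acquires the key, and later callers fail until the key expires."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.expires: dict[str, float] = {}

    def acquire(self, key: str, ttl_seconds: float) -> bool:
        now = self.clock()
        if key in self.expires and self.expires[key] > now:
            return False  # the feed was recomputed recently
        self.expires[key] = now + ttl_seconds
        return True


def maybe_recompute_feed(lock: TtlLock, user_id: int,
                         recompute: Callable[[int], None],
                         min_interval: float = 30.0) -> bool:
    # Run the expensive feed service at most once per min_interval per user;
    # otherwise the caller serves the cached feed as-is.
    if lock.acquire(f"feed-lock:{user_id}", min_interval):
        recompute(user_id)
        return True
    return False
```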