System requirements
Functional:
User should be able to
- Compose a tweet and post it
- Follow another user
- View tweets of users you follow in your home feed
- View feed of suggested content from accounts that is recommended based on popularity (not necessarily accounts you follow)
- Favorite other users' tweets
Non-Functional:
- Scalability: this system should be highly scalable. We want to be able to support many simultaneous users, potentially in different parts of the world
- Response time: response time should be as low as possible for reads, we can tolerate longer response time for writes
- Consistency: it is ok for there to be eventual consistency in this system. Users do not need to have newest updates instantaneously
- Security: we need to ensure that users are authenticated in order to post, make changes to their account, and access the content from the users they follow
Capacity estimation
Estimate the scale of the system you are going to design...
User Base:
- Let's assume daily active users (DAU) is 500 million.
Traffic:
We can calculate the traffic based on the number.
- Tweeting: each user tweets about 2 times per day, so 1 billion tweets per day total
- Home feed: each user loads their home feed about 10 times per day, 5 billion home feed loads per day total
- Favorite: each user favorites about 1 tweet per day, so 500 million favorites per day total
- Following: each user follows about 200 accounts, so 100 billion follow relationships total
Queries Per Second:
- write: 500m*2/3600/24= 15k/qps
- read: 500m*10/3600/24= 75k/qps
- Favorites: 500m*1/3600/24 = 7.5k/qps
Data size :
- Tweet: 1b tweets, each with 140 chars. considering encoding , let's assume 300 bytes. So total data is 280GB per day. It would be 100TB per year.
API design
Tweeting
- POST: user ID, content of tweet
Home Feed
- GET: user ID, page #
Following/Unfollowing
- POST: user ID, followed user ID
Favoriting/Unfavoriting
- POST: user ID, tweet ID
Database design
RDMS database with the following tables
Users table
- user ID (UUID)
- name (string)
- email (string)
- created at (timestamp)
- updated at
- etc.
Tweets table
- tweet ID (UUID)
- user ID (UUID)
- tweet content (string)
- created at (timestamp)
User Favorites table - many to many
- user ID (UUID)
- tweet ID (UUID)
- created at (timestamp)
- deleted at (timestamp)
User Followers table - many to many
- user ID (UUID)
- followed user ID (UUID)
- created at (timestamp)
- deleted at (timestamp)
High-level design
You should identify enough components that are needed to solve the actual problem from end to end.
Rate limiter
- Protect again DOS attacks and ensure fair usage
Load balancer
- use constant hashing to distribute load across servers
CDN
- Serve cached data to localized areas to minimize response time
Services
- Services to support major functions: user service (user management and following), tweet service (tweeting and favoriting), feed service (loading user feed)
Cache
- application level caching
Database
- use a relational database that is optimized to scale horizontally: amazon aurora, cockroach DB, etc.
Request flows
Explain how the request flows from end to end in your high level design.
- The client will send a request that will initially be handled by the rate limiter and load balancer.
- It will then be directed to the server and the CDN will deliver a response if possible
- If CDN miss, then request will be routed to the correct service
- The service interacts directly with the cache and the database to generate the response needed
Detailed component design
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...
Cache
- Use Redis cache
- Will use cache aside update strategy
- Use TTL invalidation strategy
- Use LRU eviction strategy
Services
- User service: handles all requests to create/update user accounts and handle following/unfollowing
- Writes associated with creating and following users will be done asynchronously to reduce response time and load on the server
- Tweet service: handles all requests to post and favorite tweets
- Writes associated with posting and favoriting tweets will be done asynchronously to reduce response time and load on the server
- Feed service
- Will check the cache for feed data for a particular user
- Otherwise will generate the feed data itself
- For home feed: queries the most recent tweets from followed users, results are paginated
- For popular feed: queries the most popular recent tweets across twitter based on like count, results are paginated
Database
- I will be using a combination of replication and sharding. There will be a master-slave replication pattern, with a number of slave instances to support the high volume of reads this system will handle.
Trade offs/Tech choices
Explain any trade offs you have made and why you made certain tech choices...
Database
- We could have chosen a NoSQL database as they are known to scale well horizontally and handle large amounts of data
- However, because it seems like the structure of Twitter's data is relatively stable and it seems like we will currently and increasingly need to represent out data in a relational way, I chose a RDBMS
- Additionally there are strategies that can help us scale even if we use a RDBMS
Populating the Feed
- There are 2 approaches to populating user's feeds: either pushing data to followers when a user posts or pulling all followed users data to display to a user when requested
- The pull strategy requires us to do some heavier processing in our service to collect tweets from all the accounts a user follows, but it allows us to avoid the scenario of pushing messages to a large number of followers every time a user tweets.
- The push model would also require us to save users' feeds (as the pushed messages need to go somewhere) and this would need additional storage
- I have chosen to proceed with the pull strategy to avoid the requirement for additional storage and to avoid building a message queue system
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
- There may be some users with a very high number of followers that generate very high traffic when they post, our system will need to be able to handle these bursts
- If we do not manage resources effectively, we could get backed up with asynchronous job handling if we get many write requests at the same time
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?
How to address issues outlined in the previous section:
- We can adjust our caching strategy to favor "hot" data so that we avoid making the full trip for these requests
Additional features we could add in the next iteration of the system:
- Improving the algorithm for what kind of content is surfaced on the home feed: based on previous engagement by the user with other content, etc.
- Allowing users to make their profiles private or public to control who can see their tweets
- Content moderation: flagging and/or removing inappropriate content
- Send/receive notifications whenever a followed user tweets, likes, or does any significant interaction