System requirements
Functional:
- post tweets
- read tweets
- post across one person or multiple person's or group's
Non-Functional:
- scalable
- availability
- acceptable latency 200ms
- consistency gets a hit that can be ok
Capacity estimation
- DAU
- 100 million daily users
- Write to read ratio
- 1:10
- Throughput
- 100 million/(86,200*30) = 250 tweets/sec
- Memory
- 20% cache
- 1 tweet = 500 bytes
- 100 million tweets * 500= 50 Gb/day
- 200GB
- replicate multiple times per region
- Storage
- 1 write tweet = 1.5mb
- 10 million tweets/day * 1.5 = 15 PB/day
- 15 PB*30*365 = 54PB/year
API design
- Get
- user id
- location (optional)
- tweet id
- date
- time
- Put
- successful post will return the tweet
Database design
- DB - GraphQL
- Tweet ID
- user ID
- content
- tweet - latitude
- tweet - longitude
- location - latitude
- locaiton - longitude
- date
- time
- favs
- User id
- user id
- location
- date
- last login
- followers
- user ID1
- user id2
- favourites
- tweet id
- user id
High-level design
- API gateway service
- Metadata
- tweets - write
- tweets - read
- pictures
- Data sharding
- Graph DB - Neo4j
- hash user id or tweet id to search the tweets across multiple servers
- Cache
- CDN - photos/videos
- Redis - tweets & metadata
- load balancer
- round robin method
- clients & app servers
- clients & caches
- we can use AWS load balancer to intelligently route the traffic and it can auto scale out of the box as required
- fault tolerance
- leader and follower servers
- if leader stops then pass it on to follower
- multiple availability zone
- Rate limiter
- security
- monitoring
- grafana tool to monitor the services along with kibana for analytics
Request flows
Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...
Detailed component design
- DB's
- Graph DB - Neo4J or infinte graph
- Graph user service
- Fan out service
Trade offs/Tech choices
- NoSql db - cassandra is a heavy write db but we can join the entities to build relationships across users/followers/favorites
- Data sharding
- Graph DB - Neo4j
- User ID
- We can plan to have all the data in one DB
- We can hash each user data and plan to store their tweets in different servers
- the challenge is what if me might encounter a hot key which might be difficult to hold the load that particular user in one server
- over a period of time it will be difficult to maintain the tweets in one db. Maintaining a uniform distribution might be difficult
- We have try a different approach to solve this problem
- Tweet ID
- What if we hash the tweet id and store them across multiple servers and pull them as required
- To search tweets we have to query all the server to pull the results
- we can do that by using an aggregator server where the results will be aggregated for the user
- This will solve the hot key problem but we might end of querying all the db's which might cause an performance issue
- We can try a different approach to further improve the performance of the query
- Tweet creation Time
- we can store the tweets based on the creation time but the load will be heavy on one server and rest of them will remain idle
- the traffic won't be evenly distributed across the servers
- Epoch time
- merge tweet id & creation time with an auto-increment
- we can figure out the shard number from this tweet id and store it there
- what could be the size of the tweet id if we want to store the data for next 50 years
- 86,400 sec/day*365*50 = 1.6b
- We require 31 bits to store the tweet creation time data
- Since the throughput are 1150 tweets/second then the remaining 17 bits to store the auto-incrementing number so this will make the tweet id 48 bits long
- we can store 2^17 = 130k new tweets/second
- we can reset the auto-incrementing sequence every second
- for fault tolerance and better performance we can have two servers to generate these number one holding even number and other the odd number
- if we make our tweet id 64 bits (8 bytes) long then this will help us to generate tweets for the next 100 years for millisecond granularity
Failure scenarios/bottlenecks
- what if the celebrity posts at weet which has millions of followers which becomes a bottleneck to read or share the tweet
Future improvements
- how to search tweets since it involves indexing and ranking associated with it