My Solution for Design Twitter
by journey_omen351
Requirements
Functional Requirements:
- Users can create an account and sign in.
- Users can post tweets.
- Users can follow and unfollow other users.
- Tweets can contain text and media.
- Users can view their timeline.
- Users can search tweets.
- Users can like and retweet tweets.
Non-Functional Requirements:
- Low latency: responses should take less than 200 ms.
- Highly scalable.
- Highly available (99.9% availability).
- Eventual consistency is acceptable.
- Supports hundreds of millions of users.
API Design
Define the APIs expected from the system. This is your chance to analyze and define the read and write paths so that you can come up with the high-level design...
High-Level Design
For the load balancer, we can use the round-robin algorithm. It spreads requests evenly across servers, which works well when the servers are similar and requests cost roughly the same.
Other load-balancing algorithms we could choose from include least connections and IP hash.
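As a rough illustration of round-robin selection, here is a minimal sketch. The class name `RoundRobinBalancer` and the server names are hypothetical and only serve the example; a real load balancer would also handle health checks and server weights.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotating order (hypothetical helper)."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the server list endlessly in the same order.
        self._rotation = cycle(servers)

    def next_server(self):
        """Return the server that should receive the next request."""
        return next(self._rotation)


# Example with three made-up application servers.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [balancer.next_server() for _ in range(5)]
print(picks)
```

Each server receives every third request, regardless of how busy it is. That is the trade-off against least connections, which looks at current load.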
For the search service, we can use the ELK stack. Elasticsearch builds an inverted index over the tweet text, which lets us look up tweets by keyword.
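To make the inverted index idea concrete, here is a toy in-memory sketch. It is not how Elasticsearch is implemented; the class `InvertedIndex` and its methods are hypothetical, and the tokenizer is a simple lowercase word split.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Maps each term to the set of tweet IDs containing it (toy example)."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _tokenize(text):
        # Assumption: lowercase word tokens are good enough for the sketch.
        return re.findall(r"\w+", text.lower())

    def add(self, tweet_id, text):
        """Index a tweet under every term it contains."""
        for term in self._tokenize(text):
            self._postings[term].add(tweet_id)

    def search(self, query):
        """Return IDs of tweets that contain every term in the query."""
        terms = self._tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "Design Twitter timeline")
index.add(2, "Twitter search with an inverted index")
print(index.search("twitter search"))
```

A query touches only the posting sets of its own terms, so lookup cost does not grow with the total number of tweets the way a full scan would.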
The tweet service handles the CRUD operations on tweets, such as creating and deleting a tweet. We put a rate limiter in front of it so that its APIs are not abused.
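The design does not say which rate-limiting algorithm to use. As one possible option, the sketch below assumes a token bucket per user; the class name `TokenBucket`, the capacity, and the refill rate are all hypothetical values chosen for illustration.

```python
import time


class TokenBucket:
    """Token-bucket limiter: bursts up to `capacity`, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock  # injectable so the limiter can be tested
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        """Return True and consume a token if the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Add tokens earned since the last call, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_sec)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


# Assumed policy for the example: bursts of 2 requests, 1 new token per second.
bucket = TokenBucket(capacity=2, refill_per_sec=1.0)
print([bucket.allow() for _ in range(3)])
```

In a deployment with many tweet-service instances, the bucket state would have to live in a shared store rather than in process memory.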
For tweets, we chose a NoSQL database because tweets are somewhat unstructured and we do not need complex joins.
Replies get a separate service so that loading replies does not slow down the tweet service. A tweet may have a large number of replies, and loading them all as part of the tweet document would be slow.
Tweet content can be text or media. Media needs far more storage than text, so we store it in Amazon S3 blob storage.
For user data, we can use a SQL database, which provides atomicity and the other ACID properties.
For follower data, we can use a graph database. Users are nodes and follow relationships are the edges connecting them, a model we expect to scale better for relationship queries.
For the timeline service, we can use fan-out on write for ordinary users: whenever a user posts a new tweet, all of that user's followers are fetched. Each user has a timeline cache, and the cache is updated whenever someone they follow posts. A pool of workers pulls new tweets from a message queue such as Redis and updates the caches.
For a celebrity such as Taylor Swift, who has millions of followers, fan-out on write does not work because a single post would trigger millions of cache writes. For these accounts we use fan-out on read instead: followers fetch the celebrity's tweets themselves when they load their timeline.
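The hybrid of fan-out on write and fan-out on read can be sketched in memory as follows. This is a simplified illustration: the names `TimelineService` and `CELEBRITY_THRESHOLD` are hypothetical, the threshold is tiny so the example stays small, and the fan-out runs inline instead of through queue workers.

```python
from collections import defaultdict, deque

# Hypothetical cutoff; a real system would use a far larger number.
CELEBRITY_THRESHOLD = 3


class TimelineService:
    def __init__(self):
        self.followers = defaultdict(set)          # author -> followers
        self.following = defaultdict(set)          # user -> authors followed
        self.tweets_by_author = defaultdict(list)  # author -> tweets
        # Per-user timeline cache, capped at the 100 newest pushed tweets.
        self.timeline_cache = defaultdict(lambda: deque(maxlen=100))
        self._next_ts = 0

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, author, text):
        self._next_ts += 1
        tweet = (self._next_ts, author, text)
        self.tweets_by_author[author].append(tweet)
        if not self.is_celebrity(author):
            # Fan-out on write: push into every follower's cache.
            for follower in self.followers[author]:
                self.timeline_cache[follower].append(tweet)

    def timeline(self, user, limit=10):
        # Start from the precomputed cache, then pull celebrity tweets
        # at read time (fan-out on read). A set removes duplicates for
        # authors who crossed the threshold after some pushes.
        merged = set(self.timeline_cache[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                merged.update(self.tweets_by_author[author])
        return sorted(merged, reverse=True)[:limit]


svc = TimelineService()
svc.follow("alice", "bob")
for fan in ("alice", "u1", "u2"):
    svc.follow(fan, "star")
svc.post("bob", "hi")
svc.post("star", "hello")
print(svc.timeline("alice"))
```

Here `bob` is pushed into Alice's cache at write time, while `star` is merged in only when Alice reads her timeline, so a celebrity post costs one write instead of one per follower.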
Back of Envelope estimation
Assume 300M active users.
Assume 150M of them are daily active users (DAU) and that they post 500M tweets per day.
A day has 86,400 seconds, which we round to 100K, so writes are about 500M / 100K = 5,000 tweets/sec.
The application is read-heavy, so assume reads are 100x writes: about 500K tweet reads/sec.
For storage, if a tweet's text is 100 bytes, we write 5,000 * 100 bytes = 500 KB per second, or 500M * 100 bytes = 50 GB per day, for text alone.
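The arithmetic can be checked with a few lines of Python. The inputs are the assumptions stated in this section; the only difference is that the script uses the exact 86,400 seconds per day instead of the rounded 100K, so the per-second rates come out slightly higher.

```python
SECONDS_PER_DAY = 86_400

# Assumptions taken from the estimate above.
tweets_per_day = 500_000_000
read_write_ratio = 100
bytes_per_tweet = 100

writes_per_sec = tweets_per_day / SECONDS_PER_DAY
reads_per_sec = writes_per_sec * read_write_ratio
text_bytes_per_sec = writes_per_sec * bytes_per_tweet
text_gb_per_day = tweets_per_day * bytes_per_tweet / 1e9

print(f"writes/sec: {writes_per_sec:,.0f}")
print(f"reads/sec: {reads_per_sec:,.0f}")
print(f"text KB/sec: {text_bytes_per_sec / 1e3:,.0f}")
print(f"text GB/day: {text_gb_per_day:,.0f}")
```

Media stored in S3 is not included here and would dominate total storage.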