System requirements
Functional:
Users can create accounts and log in.
Users can follow and unfollow other users.
Users can post updates with text and images.
A user’s feed should display the latest posts from the users they follow, sorted by recency.
The feed should support pagination.
Non-Functional:
The system should handle a high number of concurrent users (scalability).
Ensure low latency for fetching the news feed (performance).
Posts should be highly available (availability).
Capacity estimation
Assume 100M daily active users and a read-to-write ratio of 10:1.
If each active user performs roughly one feed read per day, that is about 100M reads and therefore about 10M writes (new posts) per day.
Assume a post can contain up to 280 characters and up to 2 images of at most 5 MB each.
Since users can upload images, I'll use Amazon S3 (or a comparable blob store) for image storage and management.
I'll use a NoSQL database as the primary store, given the high-availability requirements of a social media platform and the large volume of data it has to handle.
Here are the calculations for the storage:
A 280-character post is about 280 bytes of text (assuming mostly single-byte characters). Adding metadata, assume roughly 1 KB per post. With 10M writes per day, the total post data per day is:
1 KB * 10M = 10 GB per day
Images add much more. Assuming every post carries the maximum of 2 images at 5 MB each (10 MB), image data is 10 MB * 10M ~ 100 TB per day.
Assuming we store the data for 5 years, the totals are:
data from posts: 10 GB * 365 * 5 ~ 18 TB
data from images: 100 TB * 365 * 5 ~ 180 PB
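The arithmetic above can be checked with a short Python sketch. It only encodes the assumptions already stated (10M posts per day, about 1 KB per post, 2 images of 5 MB each, 5-year retention) and uses decimal units; the function name estimate_storage is illustrative.

```python
# Back-of-the-envelope storage estimate using the assumptions stated above.
# Decimal units (1 KB = 10**3 bytes) keep the arithmetic round.
KB = 10**3
MB = 10**6
GB = 10**9
TB = 10**12
PB = 10**15

POSTS_PER_DAY = 10_000_000          # 10M writes/day (100M DAU, 10:1 read:write)
POST_SIZE = 1 * KB                  # 280 chars of text plus metadata, rounded up
IMAGE_BYTES_PER_POST = 2 * 5 * MB   # worst case: 2 images per post, 5 MB each
RETENTION_DAYS = 365 * 5            # keep data for 5 years


def estimate_storage() -> dict:
    """Return daily and 5-year storage, in bytes, for post text and images."""
    text_per_day = POSTS_PER_DAY * POST_SIZE
    images_per_day = POSTS_PER_DAY * IMAGE_BYTES_PER_POST
    return {
        "text_per_day": text_per_day,
        "images_per_day": images_per_day,
        "text_total": text_per_day * RETENTION_DAYS,
        "images_total": images_per_day * RETENTION_DAYS,
    }


if __name__ == "__main__":
    est = estimate_storage()
    print(f"text/day:   {est['text_per_day'] / GB:.0f} GB")     # 10 GB
    print(f"images/day: {est['images_per_day'] / TB:.0f} TB")   # 100 TB
    print(f"text/5y:    {est['text_total'] / TB:.2f} TB")       # 18.25 TB
    print(f"images/5y:  {est['images_total'] / PB:.1f} PB")     # 182.5 PB
```

The exact 5-year figures (18.25 TB and 182.5 PB) round to the ~18 TB and ~180 PB quoted above.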
API design
Here are the most commonly used APIs for this system:
APIs for authentication:
1. /signup
Type: POST
Params: email, password, name, mobile number
2. /signin(/login)
Type: POST
Params: email, password
APIs for feed:
1. /updates?userid={userId}&limit=20
Type: GET
2. /suggestion
Type: GET
Params: userId
3. /comment
Type: POST
Params: userId, postId, comment text
4. /like
Type: POST
Params: userId, postId
APIs for user:
1. /follow
Type: PATCH
Params: userId of the user to follow
2. /unfollow
Type: PATCH
Params: userId of the user to unfollow
3. /update
Type: POST
Params: postId
4. /deletePost
Type: DELETE
Params: postId
Internal APIs
1. /addFollower?userId={userId}
Type: PATCH
Params: userId of the new follower
2. /removeFollower?userId={userId}
Type: PATCH
Params: userId of the removed follower
3. /deleteImage
Type: DELETE
Params: imageIds (S3 object keys)
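As a rough illustration of how the public /follow and /unfollow endpoints could drive the internal /addFollower and /removeFollower calls, here is a hypothetical in-memory sketch. The class FollowGraph and its method names are made up for illustration; dictionaries of sets stand in for the Followers and Followings collections, and caller_id is assumed to come from the authenticated session.

```python
from collections import defaultdict


class FollowGraph:
    """Hypothetical in-memory stand-in for the Followers and Followings collections.

    A real deployment would back these with the NoSQL store and make the
    internal calls over the network instead of as local method calls.
    """

    def __init__(self) -> None:
        self.followings: dict[str, set[str]] = defaultdict(set)  # userId -> users they follow
        self.followers: dict[str, set[str]] = defaultdict(set)   # userId -> users following them

    # Internal APIs: /addFollower and /removeFollower
    def add_follower(self, user_id: str, follower_id: str) -> None:
        self.followers[user_id].add(follower_id)

    def remove_follower(self, user_id: str, follower_id: str) -> None:
        self.followers[user_id].discard(follower_id)

    # Public APIs: /follow and /unfollow (caller_id comes from the auth token)
    def follow(self, caller_id: str, target_id: str) -> None:
        if caller_id == target_id:
            raise ValueError("users cannot follow themselves")
        self.followings[caller_id].add(target_id)
        self.add_follower(target_id, caller_id)  # keep the other side in sync

    def unfollow(self, caller_id: str, target_id: str) -> None:
        self.followings[caller_id].discard(target_id)
        self.remove_follower(target_id, caller_id)
```

Using sets makes a repeated PATCH harmless (the call is idempotent). In the real system the two writes land on different records and are not atomic, so a failed internal call would need a retry, for example through a queue.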
Database design
Here are the data models that will be mostly required for this system:
1. User:
id: primary key
name
email
PhoneNo
password (stored in hashed format)
2. Posts:
id: primary key
postedBy: userId (foreign key)
postedOn: Date
Images: image Ids from S3 buckets
Text: post text
Likes: [userId]
Comments: [commentIds]
3. Followers:
userId: foreign key (User model)
followerList: Array[userId]
4. Followings:
userId: foreign key (User model)
followingList: Array[userId]
5. Comments:
id: primary key
postId: foreign key (Posts model)
postedBy: userId
postedOn: Date
text: comment text
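The data models above could be expressed roughly as the following Python dataclasses. This is a sketch, not a schema for any particular NoSQL store: field names are converted to snake_case, the email field is assumed from the /signup parameters, the Comment id is assumed because Posts references commentIds, and the 280-character and 2-image limits are taken from the capacity assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class User:
    id: str                 # primary key
    name: str
    email: str              # assumed from the /signup params
    phone_no: str
    password_hash: str      # never store the plain-text password


@dataclass
class Post:
    id: str                 # primary key
    posted_by: str          # User.id
    posted_on: datetime
    text: str               # up to 280 characters
    image_ids: list[str] = field(default_factory=list)    # S3 object keys
    likes: list[str] = field(default_factory=list)        # User.ids
    comment_ids: list[str] = field(default_factory=list)  # Comment.ids

    def __post_init__(self) -> None:
        # Enforce the limits assumed in the capacity estimation.
        if len(self.text) > 280:
            raise ValueError("post text exceeds 280 characters")
        if len(self.image_ids) > 2:
            raise ValueError("a post can have at most 2 images")


@dataclass
class Followers:
    user_id: str                                              # User.id
    follower_list: list[str] = field(default_factory=list)   # User.ids


@dataclass
class Followings:
    user_id: str                                              # User.id
    following_list: list[str] = field(default_factory=list)  # User.ids


@dataclass
class Comment:
    id: str                 # assumed primary key, referenced by Post.comment_ids
    post_id: str            # Post.id
    posted_by: str          # User.id
    posted_on: datetime
    text: str
```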
High-level design
Request flows
Detailed component design
1. Auth Service: Responsible for user authentication and authorization. Implements the authentication APIs (/signup and /signin).
2. Caching Service: Caches the most frequently accessed data for each user. Uses an in-memory store such as Redis, which offers fast key-value lookups and built-in key expiry, making it well suited for caching.
Uses a cache eviction strategy such as LRU.
3. Suggestion Service: Implements the suggestion API. It analyzes the activity of the user and of the user's followers and followings, and based on that suggests new content for the user's feed
as well as other people to follow.
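4. CDN: Serves images and other static content from edge locations close to users. This matters most for famous users with many followers, whose posts are fetched by a very large number of users.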
5. Notification Service: Although it may be missing from the diagram, a notification service notifies users about relevant activity, for example
when someone likes their post or starts following them. It is used mostly by the feed service and the suggestion service.
6. Feed Service: This is the most important part of the platform. Its latency must be kept to a minimum, so the service scales out horizontally
as the load and the number of users grow. It is responsible for most of the activities a user can perform and implements the feed APIs.
Its main functions are:
(a) Getting the user feed: the user's following list is checked for the most recent updates. Once all the updates are fetched,
they are sorted by timestamp (for example with quick sort or merge sort) and the newest ones are returned to the user, one page at a time.
(b) Tracking user activity: whenever a user performs an action such as following someone or liking a post, the database must be
updated and the affected users notified. We could use WebSockets to push these updates in real time, at the cost of extra load on our
servers from keeping many long-lived connections open.
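For function (a), each followed user's posts are typically already stored newest-first, so instead of re-sorting everything, the feed can be produced with a k-way merge (here Python's heapq.merge) that stops once a page is full. This is a swapped-in technique, not necessarily the author's; the FeedItem type, the build_feed function, and the (posted_on, post_id) cursor used for pagination are assumptions for illustration.

```python
import heapq
from dataclasses import dataclass
from itertools import dropwhile, islice
from typing import Optional

Cursor = tuple[int, str]  # (posted_on, post_id) of the last item already shown


@dataclass(frozen=True)
class FeedItem:
    post_id: str
    author_id: str
    posted_on: int  # epoch seconds; larger means newer


def _sort_key(item: FeedItem) -> Cursor:
    # post_id breaks ties between posts with the same timestamp
    return (item.posted_on, item.post_id)


def build_feed(
    followee_posts: dict[str, list[FeedItem]],
    limit: int = 20,
    before: Optional[Cursor] = None,
) -> tuple[list[FeedItem], Optional[Cursor]]:
    """Return one page of the feed, newest first, and the cursor for the next page.

    followee_posts maps each followed userId to that user's posts, each list
    already sorted newest-first by (posted_on, post_id).
    """
    # Lazy k-way merge of the pre-sorted lists: O(n log k) for n items from k
    # followees, and items are only produced as the page consumes them.
    merged = heapq.merge(*followee_posts.values(), key=_sort_key, reverse=True)
    if before is not None:
        # Skip everything the client has already seen on earlier pages.
        merged = dropwhile(lambda item: _sort_key(item) >= before, merged)
    page = list(islice(merged, limit))
    next_cursor = _sort_key(page[-1]) if page and len(page) == limit else None
    return page, next_cursor
```

The returned cursor plays the role of a page token that a client could send back together with limit on the /updates endpoint; how it is encoded in the URL is left open here.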
Trade offs/Tech choices
1. A NoSQL database may not offer consistency as strong as a SQL database, but since availability matters more than consistency for this platform,
I will use NoSQL.
2. Reading from read replicas may return stale data, because replicas can lag behind the primary.
Failure scenarios/bottlenecks
1. The feed service has to be designed carefully because it is the most important part of the system. Keeping its latency low is the top
priority here.
2. If the caching service fails, requests fall through to the database, increasing its load and raising latency.
3. While prioritizing availability, we still need to make sure consistency is not lost entirely, i.e. replicas must eventually converge.
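Failure scenario 2 and the caching service's LRU policy can both be illustrated with a cache-aside read path. In this hypothetical sketch, an in-process LRUCache built on OrderedDict stands in for Redis (which provides LRU-style eviction through its maxmemory-policy setting, e.g. allkeys-lru), and load_from_db stands in for the NoSQL read. If the cache raises an error, read_feed falls back to the database instead of failing, which is the extra database load described above.

```python
from collections import OrderedDict
from collections.abc import Callable, Hashable
from typing import Any, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()  # least recently used first

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def read_feed(user_id: str, cache: Optional[LRUCache], load_from_db: Callable[[str], Any]) -> Any:
    """Cache-aside read: serve from the cache, fall back to the database on a miss or outage."""
    if cache is not None:
        try:
            hit = cache.get(user_id)
            if hit is not None:
                return hit
        except Exception:
            # Cache unavailable: degrade to the database instead of failing the request.
            # A real client would catch its specific connection error type here.
            pass
    value = load_from_db(user_id)
    if cache is not None:
        try:
            cache.put(user_id, value)  # populate the cache for the next read
        except Exception:
            pass  # a failed cache write should not fail the read
    return value
```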
Future improvements