My Solution for Design a Platform Like Reddit with Score: 8/10
by jubilee_vertex930
System requirements
Functional:
- post a message with a title, description and optional links/pictures/video
- comment on a message
- like or dislike a message
- create a sub topic
- join/leave a sub-topic
- homepage
- search for posts
- user account with roles for sub topic moderation
- filter a topic by most recent/most popular/hottest (fastest likes)
Non-Functional:
Availability is essential: a user should always be able to find their own posts and see other posts.
Scalability will have to be a main focus. The amount of data coming in will be enormous.
Performance: a post should load in less than one second; comment load time matters less.
Observability would be good for tracking expected throughput and data received. This will help drive our scaling systems.
Capacity estimation
Users
5M users * (userID 126 B, username 40 B, email 126 B, topics[] x 10 = 1 KB, roles 100 B) ≈ 7 GB
Posts
300,000 * 5000b = 1.5 GB/day
547.5 GB a year
Comments
2,000,000 * 1000 B = 2 GB/day
730 GB a year
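The arithmetic above can be checked with a short script. This is a minimal sketch that assumes decimal units (1 KB = 1000 bytes, 1 GB = 10^9 bytes) and reads "topics[]*10 1kb" as roughly 1 KB for ten topic IDs; the variable names are illustrative only.

```python
# Back-of-the-envelope storage estimate using the figures from this section.
GB = 1_000_000_000  # decimal gigabytes (assumption)

# Per-user record in bytes: userID, username, email, ~10 topics, roles.
user_record_bytes = 126 + 40 + 126 + 1_000 + 100
users_total_gb = 5_000_000 * user_record_bytes / GB

# Daily volumes: 300,000 posts of 5,000 bytes, 2,000,000 comments of 1,000 bytes.
posts_per_day_gb = 300_000 * 5_000 / GB
comments_per_day_gb = 2_000_000 * 1_000 / GB

print(f"users:    {users_total_gb:.2f} GB total")
print(f"posts:    {posts_per_day_gb:.1f} GB/day, {posts_per_day_gb * 365:.1f} GB/year")
print(f"comments: {comments_per_day_gb:.1f} GB/day, {comments_per_day_gb * 365:.1f} GB/year")
```

These numbers cover raw row data only; indexes, replication, and media attachments would add to them.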
API design
All APIs will use session tokens. Session tokens will contain roles.
User
/register(email, passwordHash)
/login(email, passwordHash)
/addRole(userId, role)
Topic
/createTopic(name)
/deleteTopic(name)
/joinTopic(userId, topicId)
/leaveTopic(userId, topicId)
filter_enum {
hottest,
most_recent,
popular
}
/filterTopic(topicId, filter_enum) {
posts[]
}
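One way the filter could behave is sketched below. `TopicFilter`, `Post`, and the `likes_last_hour` field are hypothetical names introduced for illustration; "hottest" is interpreted as likes gained in the last hour, following the "fastest likes" wording in the requirements.

```python
from dataclasses import dataclass
from enum import Enum


class TopicFilter(Enum):
    HOTTEST = "hottest"
    MOST_RECENT = "most_recent"
    POPULAR = "popular"


@dataclass
class Post:
    post_id: str
    created_at: float      # Unix timestamp in seconds
    likes: int             # total likes
    likes_last_hour: int   # assumption: maintained by another service


def filter_topic(posts: list[Post], topic_filter: TopicFilter, limit: int = 25) -> list[Post]:
    """Return up to `limit` posts ordered according to `topic_filter`."""
    sort_keys = {
        TopicFilter.HOTTEST: lambda p: p.likes_last_hour,
        TopicFilter.MOST_RECENT: lambda p: p.created_at,
        TopicFilter.POPULAR: lambda p: p.likes,
    }
    # Highest value first for every filter.
    return sorted(posts, key=sort_keys[topic_filter], reverse=True)[:limit]
```

In the design described here the sorting would happen ahead of time in a snapshot rather than per request, so this function is closer to what a snapshot builder would run.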
Post
/post(userId, topicId)
/getPost(postId[])
/deletePost(userId, postId)
/ratePost(userId, postId)
Database design
User DB: Postgres, good querying capabilities, ACID compliant, transactional. Shard on the user ID when scaling the DB horizontally.
Metric DB: NoSQL, Cassandra, keeps track of the post ID and the like/dislike counters.
Topic DB: Postgres, queryable, ACID compliant, transactional. Sharded on the topic ID when scaling the DB.
Topic Filter DB: NoSQL, MongoDB. We will need high write throughput, and we will create a snapshot to add to a cache for the filtering.
Post DB: Postgres, we will need to query posts and their content. Shard on topic ID, so that all posts for a topic are on the same node.
Comment DB: NoSQL, Cassandra, for its high write capability and read performance.
Session DB: SQL, we need to confirm that sessions are removed when roles are changed. Auth changes should be immediate. This will add more load to our systems, however.
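The sharding choices above (user ID for the User DB, topic ID for the Topic and Post DBs) could be routed with a simple hash. This is a minimal sketch; `shard_for` is a hypothetical helper, and plain modulo hashing is an assumption here, not something the design specifies.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key (user ID or topic ID) to a shard index deterministically."""
    # Use a stable hash; Python's built-in hash() is randomized per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Every post in the same topic resolves to the same shard.
topic_shard = shard_for("topic-42", num_shards=8)
```

Modulo hashing moves most keys when `num_shards` changes, so a consistent-hashing scheme may be preferable if shards are added often. Sharding posts by topic ID also means a very large topic lands entirely on one node.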
High-level design
API Gateway: will authenticate users, authorize users to call a specific API, and rate limit traffic.
Load Balancer: round robin; splits traffic to ease load on services.
CDN: Useful for the homepage and for displaying the most popular posts on the platform.
Session Service
Creates a user session and verifies against the session DB.
User Service
Will enable user creation and role services.
Post Service
Creates posts for a given topic.
Metrics Service
Tracks likes/dislikes and views for a post, reading events from Kafka. Like and dislike counts can be eventually consistent.
Homepage Service
Creates unique feeds for a user depending on the topics they follow and the most viewed/liked posts.
Topic Service
Enables topic creation
Topic Ingestion Service
Will create snapshots of filters for each topic. In this case it would get the most recent, most viewed, and most liked posts over the past hour. It will store these in the Topic Filter DB with a column for each enum value.
Comment Service
Will pull comments from a Kafka stream to be added to a post.
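To illustrate how the Metrics Service could fold a batch of stream events into counter updates, here is a minimal sketch. A plain list stands in for the Kafka stream, and the `(post_id, action)` event shape and the `aggregate_votes` name are assumptions for illustration; writing the deltas to the Metric DB is left out.

```python
from collections import Counter
from typing import Iterable


def aggregate_votes(events: Iterable[tuple[str, str]]) -> dict[str, Counter]:
    """Fold a batch of (post_id, action) events into per-post counter deltas."""
    deltas: dict[str, Counter] = {}
    for post_id, action in events:
        if action not in ("like", "dislike", "view"):
            continue  # skip unknown event types rather than fail the whole batch
        deltas.setdefault(post_id, Counter())[action] += 1
    return deltas


# Hypothetical batch as it might be read from the stream.
batch = [("p1", "like"), ("p1", "like"), ("p1", "view"), ("p2", "dislike")]
deltas = aggregate_votes(batch)
```

Batching this way means one counter update per post per batch instead of one per event, which is where the eventual consistency of the counts comes from.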
Request flows
A user logs in.
They hit the Session Service to authenticate and get a session token stored in their cookies.
A user loads the home page.
Hits the CDN and gets back the most popular posts.
A user then selects a topic to read.
Filters a topic, which hits the topic filter cache that contains the posts.
Loads all the posts.
User then opens a post.
Comments then begin to load from the Comment Service. We will only fetch as many comments as can fit on the page.
User likes the post, which calls the Metrics Service to add a like and a view.
User feels inspired and creates a post. This hits the Post Service.
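Loading only as many comments as fit on the page is typically done with a cursor. The sketch below assumes comments carry an increasing integer `id` and are already sorted by it; `page_comments` and the cursor scheme are illustrative, not part of the API listed earlier.

```python
from typing import Optional


def page_comments(
    comments: list[dict], page_size: int, after_id: Optional[int] = None
) -> tuple[list[dict], Optional[int]]:
    """Return one page of comments plus the cursor for the next page (None if done).

    `comments` is assumed to be sorted by ascending `id`.
    """
    start = 0
    if after_id is not None:
        # Skip everything up to and including the cursor.
        start = next(
            (i for i, c in enumerate(comments) if c["id"] > after_id),
            len(comments),
        )
    page = comments[start:start + page_size]
    has_more = start + page_size < len(comments)
    next_cursor = page[-1]["id"] if page and has_more else None
    return page, next_cursor
```

The client would pass the returned cursor back as `after_id` when the user scrolls, so each request reads a bounded number of rows.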
Detailed component design
The Topic Ingestion Service will create snapshots of the most popular and hottest posts. It will save these snapshots to the CDN, where they can be accessed by millions of users. The Topic Ingestion Service will also generate the hottest/most popular posts for each user. The Homepage Service will take a mix of both to create the feed.
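The "mix of both" could be as simple as interleaving the two lists and dropping duplicates. This is a sketch under that assumption; `build_home_feed` and the alternating strategy are hypothetical, and a real feed would likely weight the two sources.

```python
from itertools import zip_longest


def build_home_feed(
    user_topic_posts: list[str], global_popular: list[str], limit: int = 20
) -> list[str]:
    """Interleave per-user topic posts with globally popular posts, without duplicates."""
    feed: list[str] = []
    seen: set[str] = set()
    # zip_longest pads the shorter list with None so no post is dropped.
    for pair in zip_longest(user_topic_posts, global_popular):
        for post_id in pair:
            if post_id is None or post_id in seen:
                continue
            seen.add(post_id)
            feed.append(post_id)
            if len(feed) == limit:
                return feed
    return feed
```

Both inputs are lists of post IDs taken from existing snapshots, so building the feed needs no extra queries against the Post DB.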
Trade offs/Tech choices
The session token will be slower because every API call has to check against the Session DB before it can be authorized. A JWT, by contrast, is self-contained on the client side and stays valid for however long we set the expiration time. The session token will allow for immediate moderation.
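The trade-off can be made concrete with a small sketch of server-side sessions. An in-memory dictionary stands in for the Session DB, and `SessionStore`, `revoke_user`, and `authorize` are hypothetical names; the point is that deleting the session row takes effect on the very next request.

```python
import secrets
from typing import Optional


class SessionStore:
    """In-memory stand-in for the Session DB (assumption for illustration)."""

    def __init__(self) -> None:
        self._sessions: dict[str, dict] = {}

    def create(self, user_id: str, roles: list[str]) -> str:
        token = secrets.token_urlsafe(32)
        self._sessions[token] = {"user_id": user_id, "roles": list(roles)}
        return token

    def lookup(self, token: str) -> Optional[dict]:
        return self._sessions.get(token)

    def revoke_user(self, user_id: str) -> int:
        """Delete every session for `user_id`, for example after a role change."""
        stale = [t for t, s in self._sessions.items() if s["user_id"] == user_id]
        for token in stale:
            del self._sessions[token]
        return len(stale)


def authorize(store: SessionStore, token: str, required_role: str) -> bool:
    """One store lookup per call: the cost paid for immediate revocation."""
    session = store.lookup(token)
    return session is not None and required_role in session["roles"]
```

With a JWT, the equivalent of `revoke_user` would need a separate deny list or a short expiration time, since the token itself cannot be recalled.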
Using Kafka will delay the view/like counts and make them less real time. However, it limits the load on the Post Service, which is more important.
Failure scenarios/bottlenecks
The Topic Ingestion Service can get too much traffic because both the Homepage Service and the Topic Service will be calling it frequently. The goal would be for the Homepage Service to read from the Topic Filter DB if there is an up-to-date topic filter. However, if there are no relevant snapshots, the service can be bombarded.
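One possible mitigation is to let only one caller per topic trigger a rebuild while the others wait for the result. The sketch below is an assumption, not part of the design above: `SnapshotCache`, the one-hour `max_age_s`, and the per-topic lock are illustrative, and an in-process dictionary stands in for the Topic Filter DB.

```python
import threading
import time
from typing import Callable, Optional


class SnapshotCache:
    """Serve fresh snapshots; rebuild a stale topic at most once at a time."""

    def __init__(self, rebuild: Callable[[str], list[str]], max_age_s: float = 3600.0) -> None:
        self._rebuild = rebuild          # stand-in for a call to the ingestion service
        self._max_age_s = max_age_s
        self._snapshots: dict[str, tuple[float, list[str]]] = {}
        self._locks: dict[str, threading.Lock] = {}
        self._locks_guard = threading.Lock()

    def _lock_for(self, topic_id: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks.setdefault(topic_id, threading.Lock())

    def _fresh(self, topic_id: str) -> Optional[list[str]]:
        entry = self._snapshots.get(topic_id)
        if entry is not None and time.monotonic() - entry[0] < self._max_age_s:
            return entry[1]
        return None

    def get(self, topic_id: str) -> list[str]:
        posts = self._fresh(topic_id)
        if posts is not None:
            return posts
        with self._lock_for(topic_id):
            # Re-check: another caller may have rebuilt while we waited for the lock.
            posts = self._fresh(topic_id)
            if posts is None:
                posts = self._rebuild(topic_id)
                self._snapshots[topic_id] = (time.monotonic(), posts)
            return posts
```

This only coalesces requests within one process; across several service instances the same idea would need a shared lock or a scheduled rebuild.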
Future improvements
Add likes on comments. Allow sorting of comments based on the same filters for topics.
Split up the Topic Ingestion Service and utilize caches for popular topics that people join. This will allow for fewer hits to the Topic Ingestion Service.