Requirements
Functional Requirements:
- Users should be able to create new pastes of text, code, etc.
- Users should be able to search and view pastes created by them and others
- Users should be able to delete pastes
- Pastes should get deleted after their TTL expires
Non-Functional Requirements:
- The application should be highly available
- Creating a new paste should be fast and complete in under 1 second
- Search should be semantic and return results in under 50 milliseconds
API Design
We will need the following endpoints:
- /createUser - creates a new user
- /createPaste - creates a new paste
- /editPaste - edits an existing paste
- /deletePaste - deletes an existing paste
- /viewPaste - shows details of a paste
- /search - returns semantically related pastes
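As a rough illustration of the paste endpoints above, here is a minimal in-memory sketch of the logic behind /createPaste, /viewPaste and /deletePaste. The class, method and field names are hypothetical, and the rule that only the creator may delete a paste is an assumption, not something stated in the requirements.

```python
import time
import uuid
from dataclasses import dataclass


@dataclass
class Paste:
    paste_id: str
    created_by: str
    body: str
    created_on: float
    ttl_seconds: int


class PasteService:
    """In-memory stand-in for the handlers behind the paste endpoints."""

    def __init__(self):
        self._pastes = {}  # paste_id -> Paste

    def create_paste(self, user_id, body, ttl_seconds):
        # /createPaste: store the paste and return its generated id.
        paste = Paste(uuid.uuid4().hex, user_id, body, time.time(), ttl_seconds)
        self._pastes[paste.paste_id] = paste
        return paste.paste_id

    def view_paste(self, paste_id):
        # /viewPaste: return the paste, or None if it does not exist.
        return self._pastes.get(paste_id)

    def delete_paste(self, paste_id, user_id):
        # /deletePaste: assumed here to be allowed only for the creator.
        paste = self._pastes.get(paste_id)
        if paste is None or paste.created_by != user_id:
            return False
        del self._pastes[paste_id]
        return True
```

In a real deployment each method would sit behind an HTTP handler and talk to the database instead of a dictionary.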
High-Level Design
Capacity estimates:
For each user we store a user name and password, so each record will be under 100 bytes. Assuming 100,000 users, user data will total about 10 MB.
For each paste we store the creation time, TTL, creator and paste body. The body can range from very small to very large. Assuming an average paste size of 5 MB, storing 500k pastes will need about 2.5 TB of storage. At that size, an in-memory database like Redis does not make sense as the primary store.
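The estimates above can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and uses the upper bound of 100 bytes per user record. The constant names are illustrative.

```python
USER_RECORD_BYTES = 100        # upper bound per user record
USER_COUNT = 100_000
AVG_PASTE_BYTES = 5 * 10**6    # 5 MB average paste
PASTE_COUNT = 500_000

user_storage_mb = USER_RECORD_BYTES * USER_COUNT / 10**6
paste_storage_tb = AVG_PASTE_BYTES * PASTE_COUNT / 10**12

print(f"users: {user_storage_mb:.0f} MB")    # users: 10 MB
print(f"pastes: {paste_storage_tb:.1f} TB")  # pastes: 2.5 TB
```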
Most of our data is blob-like, mainly the paste body text. We can use MongoDB to store this data and query it efficiently. MongoDB also provides built-in TTL indexes, which we will use to delete expired pastes.
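One way to use MongoDB's TTL support is to store an absolute expiry time on each paste and create a TTL index on that field with expireAfterSeconds set to 0. The sketch below assumes `pastes` is a pymongo Collection. The field names (`expires_at`, `created_by`, and so on) and helper names are hypothetical. MongoDB removes expired documents with a periodic background task, so deletion happens shortly after the expiry time rather than at that exact instant.

```python
from datetime import datetime, timedelta, timezone


def ensure_ttl_index(pastes):
    # `pastes` is assumed to be a pymongo Collection. With
    # expireAfterSeconds=0, a document expires once the clock passes
    # the date stored in its `expires_at` field.
    return pastes.create_index("expires_at", expireAfterSeconds=0)


def build_paste_document(user_id, body, ttl_seconds, now=None):
    # The TTL index only works on date values, so store datetimes.
    now = now or datetime.now(timezone.utc)
    return {
        "created_by": user_id,
        "created_on": now,
        "expires_at": now + timedelta(seconds=ttl_seconds),
        "body": body,
    }
```

Storing the expiry time per document lets each paste carry its own TTL while sharing a single index.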
Application components:
- Load balancer to distribute traffic
- Servers running application logic
- Database
Design:
We will design stateless services for each endpoint, so that we can easily scale our app depending on traffic. The database is the biggest bottleneck, so it makes sense to shard it. Sharding on user_id makes sense because all pastes of a user are grouped on one shard for faster access, although this will slow down searching, since a search query has to fan out to every shard. To counter this we can save search results in a Redis cache to improve performance. We can also cache hot pastes on a CDN for faster access and refresh the CDN once a day.
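The sharding and caching ideas above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the shard count, the hash-modulo routing scheme and all names are hypothetical, and a plain dictionary stands in for Redis so the example stays self-contained.

```python
import hashlib
import time

NUM_SHARDS = 4  # hypothetical shard count


def shard_for_user(user_id, num_shards=NUM_SHARDS):
    # A stable hash keeps all pastes of one user on the same shard.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class SearchCache:
    """Cache-aside wrapper for search; a dict stands in for Redis."""

    def __init__(self, search_fn, ttl_seconds=60, clock=time.monotonic):
        self._search_fn = search_fn  # the slow, cross-shard search
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # query -> (expires_at, results)

    def search(self, query):
        now = self._clock()
        hit = self._entries.get(query)
        if hit is not None and hit[0] > now:
            return hit[1]            # fresh cache hit
        results = self._search_fn(query)
        self._entries[query] = (now + self._ttl, results)
        return results
```

Plain modulo routing moves most keys when the shard count changes, so a production system would more likely use consistent hashing or the database's own shard key support.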