System requirements
Functional:
- User is able to store text online and get a short link
- User is able to retrieve the text from a short link
- The text will be avaiable for a period of time - 7d
- The text size needs to be under 2KB and one user can have 10 different texts at the same time
Non-Functional:
- Support many users - 100M DAU
- Low latency for read and write - < 100ms
- No double links - two different users won't get duplicated short links for different content
- consistency >> availability
- 99.99% availability
Capacity estimation
- QPS:
- 100M DAU - 3 usage per day * 100M / 24*3600 = average 3K QPS
- peak QPS - 6K
- Storage
- For text: 2KB * 100M * 10 = 2PB
- For short links: 14B * 100M * 10 = 14GB
API design
- Create shortlink for text
POST /pasteBin/ -> shortLink
{
text : string
}
- Get text from short link
GET /pasterBin/:pasteId -> text
Database design
- We choose to use NoSQL for the database for better throughput and scalability. (We don't need transaction support from SQL)
- We can use DynamoDB
pasteId, string --- partition key
text , string
createdAt, ts
updatedAt, ts
High-level design
client -> LB -> pasteBinService -> DynamoDB <- shortLinkGenerator
-> Cache(redis)
Request flows
Write Flow:
- we can use L4 LB to split the traffic based on ip
- pasteBinService is used for find a shortLink(pasteId) for user and store the mapping of shortlink and the text
- We can store the available shortlink in cache to speed up the write request
- we can have a shortLinkgenerator which is a offline job - can be triggered by CDC by mornitoring available number of short links in cache
client -> load balancer -> pasteBinService -> Cache -> shortLinkGenerator -> DynamoDB
Read Flow:
- When a read request comes in, we can check the cache first
- If cache hit, return the text
- If cahce miss, we need to query the db and can trigger a write back to cache
client -> load balancer -> pasteBinService -> Cache -> DynamoDB
Purge Flow
- This is when we need to clean up the outdated text (> 7d)
- We can have a cron job will run periodically (once per day) to scan the DB and cache to invalid the text
Detailed component design
For write flow, to avoid duplicate shortLinks, we can have distributed lock. When a request comes in a shortLink is picked up to map the link to the text, we can lock the link, so other requests can not pick up the same link.
Another way to avoid this is when we write to the db, we check the status of the cache to see if it already taken by other.
We can have a write thorugh cache, write to the cache first and then trigger an event to write to DB to get consistency.
The cache schema will be:
{
pasteId, string --- partitionKey
text, string
}
Trade offs/Tech choices
By using cache, we add complexity but would like to achieve better latency and we also need to handle the consistency between cache and db.
Failure scenarios/bottlenecks
If we have a high traffic that consumes the shortLinks quickly. The shortLinkGenerator needs to be run in a event driven mode, not a cron job, to avoid high waiting time when the links exhausts.
If we hit a duplicated shortLink (means that when the service picked up a link for a text and trying to write to cache/db, we find that it is aleady taken).
Instead of fail immediately, we can pick up a new one for it.