System requirements
Functional:
- A user should be able to generate a URL for the entered text.
- On visiting the URL, the user should be shown the entered text.
- By default, the settings should show the expiry time of the URL. The user can configure the expiry time of the URL.
Non-Functional:
- Low latency for creating and reading pastes
- The system should be fault tolerant
Capacity estimation
- 100M DAU
- Read:write ratio of 10:1
- 100 KB per write (paste)
- Bandwidth: ~12 MB/sec
- Total storage: 65 GB
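These figures depend on how many pastes are written per day, which the estimate does not state. As a sketch under that caveat, the calculator below derives QPS, bandwidth and raw storage from a few inputs. `writes_per_day` and `retention_days` are assumptions introduced for illustration, and 100 KB is taken as 100,000 bytes.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate(writes_per_day: int, read_write_ratio: int,
             bytes_per_write: int, retention_days: int) -> dict:
    """Back-of-the-envelope capacity figures derived from a few inputs."""
    write_qps = writes_per_day / SECONDS_PER_DAY
    read_qps = write_qps * read_write_ratio
    return {
        "write_qps": write_qps,
        "read_qps": read_qps,
        # Each read returns one whole paste.
        "read_bandwidth_bytes_per_sec": read_qps * bytes_per_write,
        "write_bandwidth_bytes_per_sec": write_qps * bytes_per_write,
        # Raw paste bytes kept over the retention window (no replication or metadata).
        "storage_bytes": writes_per_day * bytes_per_write * retention_days,
    }


if __name__ == "__main__":
    # Hypothetical input: 1 million pastes/day is an assumption, not a figure from this design.
    figures = estimate(writes_per_day=1_000_000, read_write_ratio=10,
                       bytes_per_write=100_000, retention_days=1)
    for name, value in figures.items():
        print(f"{name}: {value:,.1f}")
```

With the hypothetical 1 million writes/day, read bandwidth comes to about 11.6 MB/s, close to the ~12 MB/sec above. The storage figure additionally depends on the write volume and retention window, neither of which is stated here.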
API design
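- POST createPaste
  Request: { textContent: "hfhvfhbvhf" }
  Returns: the URL of the new paste, with HTTP status 201 Created
- GET getPasteContent
  Request: { url }
  Returns: the text content, or an error status code (e.g. 410 Gone) if the URL has expired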
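A minimal in-memory sketch of the two endpoints, returning `(status, body)` pairs instead of running a real web server. The domain, the default expiry value and the integer placeholder keys are hypothetical, and the dictionary stands in for the database and CDN.

```python
from __future__ import annotations

import itertools
import time
from http import HTTPStatus

BASE_URL = "https://paste.example.com/"  # hypothetical domain
DEFAULT_EXPIRY_SECONDS = 24 * 60 * 60    # assumed default expiry

# key -> (text content, expiry timestamp in epoch seconds); stands in for DB + CDN
_store: dict[str, tuple[str, float]] = {}
_ids = itertools.count(1)


def create_paste(text_content: str, expiry_seconds: int = DEFAULT_EXPIRY_SECONDS,
                 now: float | None = None) -> tuple[HTTPStatus, str]:
    """POST createPaste: store the text and return its URL with 201 Created."""
    now = time.time() if now is None else now
    key = str(next(_ids))  # placeholder key; the design uses a Base62 URL generator
    _store[key] = (text_content, now + expiry_seconds)
    return HTTPStatus.CREATED, BASE_URL + key


def get_paste_content(url: str, now: float | None = None) -> tuple[HTTPStatus, str]:
    """GET getPasteContent: the text, 410 Gone if expired, 404 if unknown."""
    now = time.time() if now is None else now
    key = url.removeprefix(BASE_URL)
    if key not in _store:
        return HTTPStatus.NOT_FOUND, ""
    text, expires_at = _store[key]
    if now >= expires_at:
        return HTTPStatus.GONE, ""
    return HTTPStatus.OK, text
```

Passing `now` explicitly keeps expiry behaviour deterministic in tests; callers can omit it to use the current time.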
Database design
A NoSQL DB such as MongoDB stores one document per paste:
{
  userId,
  url,
  pasteContent: <CDN link to the text>,
  expiryTime: xxxx sec
}
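The record can be sketched as a Python dataclass that maps to a document with camelCase keys in the style of the schema above. `createdAt` and `retired` are assumed extra fields: an expiry given in seconds needs a start time, and the expiry service needs somewhere to mark a URL as retired.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PasteRecord:
    user_id: str
    url: str
    paste_content: str     # CDN link to the stored text, not the text itself
    expiry_time: int       # expiry duration in seconds
    created_at: int        # assumed field: creation time in epoch seconds
    retired: bool = False  # assumed field: set by the expiry service

    def expires_at(self) -> int:
        """Epoch second at which this paste expires."""
        return self.created_at + self.expiry_time

    def to_document(self) -> dict:
        """Document shape as it might be inserted into the NoSQL store."""
        return {
            "userId": self.user_id,
            "url": self.url,
            "pasteContent": self.paste_content,
            "expiryTime": self.expiry_time,
            "createdAt": self.created_at,
            "retired": self.retired,
        }
```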
High-level design
- A load balancer (LB) sits between the user and the URL creation service.
- The load balancers use a consistent hashing algorithm to distribute requests.
- The URL generation service stores the paste content in a CDN, which gives faster access to paste content for users in different geographic locations.
- The URL generator uses Base62 encoding. With a URL length of 8, there are 62^8 (about 2.18 × 10^14) possible URLs.
- The URL and the CDN link are stored in the database.
- An expiry service, implemented as a batch job, checks the expiry of the URLs and marks expired ones as retired in the DB.
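The design does not say what the URL generator encodes. A sketch, assuming each paste gets a unique integer ID (for example from a counter; this is hypothetical) that is Base62-encoded into a fixed 8-character key. The alphabet order is also an assumption.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)              # 62
URL_LENGTH = 8
KEY_SPACE = BASE ** URL_LENGTH    # 62**8 = 218,340,105,584,896 possible keys


def encode_base62(n: int, length: int = URL_LENGTH) -> str:
    """Encode a non-negative integer ID as a fixed-length Base62 key, left-padded with '0'."""
    if not 0 <= n < BASE ** length:
        raise ValueError(f"id {n} does not fit in {length} Base62 characters")
    chars = []
    for _ in range(length):
        n, remainder = divmod(n, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_base62(key: str) -> int:
    """Inverse of encode_base62: turn a Base62 key back into its integer ID."""
    n = 0
    for ch in key:
        n = n * BASE + ALPHABET.index(ch)
    return n


if __name__ == "__main__":
    print(encode_base62(125))  # prints 00000021
```

Because distinct IDs always encode to distinct keys, this approach needs no collision check, unlike a truncated hash.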
Request flows
Detailed component design
- The expiry service is a batch job that expires URLs. For each URL it checks the expiry configuration (the default, or the value the user set) to decide whether the URL should be expired.
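A sketch of one run of the expiry batch job, assuming an in-memory list stands in for paging through the database, a hypothetical default expiry of one day, and a `created_at` timestamp on each paste. A paste whose `expiry_seconds` is `None` uses the default; otherwise the user's configured value applies.

```python
from __future__ import annotations

from dataclasses import dataclass

DEFAULT_EXPIRY_SECONDS = 24 * 60 * 60  # assumed default expiry


@dataclass
class Paste:
    url: str
    created_at: int               # epoch seconds (assumed field)
    expiry_seconds: int | None    # None means the user kept the default
    retired: bool = False


def run_expiry_batch(pastes: list[Paste], now: int, batch_size: int = 1000) -> int:
    """Mark expired, not-yet-retired pastes as retired; return how many were retired."""
    retired_count = 0
    # Processing in slices mimics reading the DB one page at a time.
    for start in range(0, len(pastes), batch_size):
        for paste in pastes[start:start + batch_size]:
            if paste.retired:
                continue
            ttl = DEFAULT_EXPIRY_SECONDS if paste.expiry_seconds is None else paste.expiry_seconds
            if now >= paste.created_at + ttl:
                paste.retired = True
                retired_count += 1
    return retired_count
```

Since the job only flips a flag, it is safe to rerun: already retired pastes are skipped.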
Trade-offs/Tech choices
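- Base62 encoding is used in place of a SHA hash because it generates shorter URLs. Base62 is an encoding, not encryption, so it adds no security of its own; a SHA hash would have to be truncated to be this short, and truncated hashes can collide.
- MongoDB is used in place of a relational database because there are no relationships between the data and strong consistency is not a hard requirement.
- Sharding by userId can cause a particular shard to grow much larger than the others if some users create many pastes.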
Failure scenarios/bottlenecks
- The URL generator or the read-pastebin service can be a single point of failure, and so can the database. To avoid this, run multiple instances of each service behind the load balancer and replicate the DB. The DB should be sharded by userId; other DB partitioning schemes can also be used.
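The design shards the DB by userId and already names consistent hashing (for the load balancers). A sketch of a consistent hash ring that maps a userId to a shard, assuming hypothetical shard names and 100 virtual nodes per shard; MD5 is used only to spread keys, not for security.

```python
from __future__ import annotations

import bisect
import hashlib


class ConsistentHashRing:
    """Maps userIds to shards; each shard owns many points (virtual nodes) on the ring."""

    def __init__(self, shards: list[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (point, shard) pairs
        for shard in shards:
            self.add_shard(shard)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_shard(self, shard: str) -> None:
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{shard}#{i}"), shard))

    def remove_shard(self, shard: str) -> None:
        self._ring = [point for point in self._ring if point[1] != shard]

    def shard_for(self, user_id: str) -> str:
        if not self._ring:
            raise LookupError("no shards on the ring")
        h = self._hash(user_id)
        # First ring point at or after h, wrapping around past the end.
        idx = bisect.bisect_left(self._ring, (h, ""))
        return self._ring[idx % len(self._ring)][1]
```

Adding a shard only moves the users whose ring position the new shard now owns, so most data stays in place. Note that every paste of one user still lands on the same shard, so this does not by itself remove the skew caused by a very active user.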
Future improvements