System requirements


Functional:

  1. The user should be able to generate a URL for the entered text.
  2. On visiting the URL, the user should be shown the entered text.
  3. Each URL has a default expiry time, shown in the settings; the user can configure the expiry time of the URL.




Non-Functional:

  1. Latency should be low.
  2. The system should be fault tolerant.



Capacity estimation

  1. 100 M DAU
  2. 10:1 read:write ratio
  3. 100 KB per write
  4. Write throughput: about 12 MB/sec
  5. 65 GB total storage
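The 12 MB/sec figure can be reproduced with a quick back-of-the-envelope calculation. This is a sketch under an assumption the notes do not state: the 100 M DAU generate 100 M reads per day, so the 10:1 ratio gives 10 M writes per day. Decimal units (1 MB = 1,000 KB) are used.

```python
# Back-of-the-envelope capacity estimate.
# ASSUMPTION: 100 M reads/day; with a 10:1 read:write ratio that is 10 M writes/day.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

reads_per_day = 100_000_000
writes_per_day = reads_per_day // 10  # 10:1 read:write ratio
paste_size_kb = 100                   # 100 KB per write

write_qps = writes_per_day / SECONDS_PER_DAY
read_qps = reads_per_day / SECONDS_PER_DAY
write_mb_per_sec = write_qps * paste_size_kb / 1000  # decimal units: 1 MB = 1000 KB

print(f"write QPS ~ {write_qps:.0f}")
print(f"read QPS  ~ {read_qps:.0f}")
print(f"write bandwidth ~ {write_mb_per_sec:.1f} MB/sec")
```

This prints about 116 writes/sec, 1157 reads/sec and 11.6 MB/sec of write bandwidth, which rounds to the 12 MB/sec above.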



API design

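  1. POST createPaste

     Request body: { textContent: "hfhvfhbvhf" }

     Returns the generated URL with HTTP status 201 (Created).

  2. GET getPasteContent

     Request: { url: <paste URL> }

     Returns an error status such as 410 (Gone) if the paste has expired; otherwise returns the text content.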
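The two endpoints can be sketched as plain functions that return (status, body) pairs. This is a minimal sketch, assuming an in-memory dict in place of the database and CDN; the domain, the one-day default expiry, and the random 8-character base62 key (standing in for the URL generator) are hypothetical choices, not part of the design above.

```python
import secrets
import string
import time
from typing import Dict, Optional, Tuple

BASE_URL = "https://paste.example.com/"          # hypothetical domain
ALPHABET = string.ascii_letters + string.digits  # 62 characters
DEFAULT_EXPIRY_SEC = 24 * 60 * 60                # hypothetical default expiry: 1 day

# In-memory stand-in for the database: key -> (text, expires_at in epoch seconds)
_store: Dict[str, Tuple[str, float]] = {}


def _new_key() -> str:
    return "".join(secrets.choice(ALPHABET) for _ in range(8))


def create_paste(text_content: str, expiry_sec: int = DEFAULT_EXPIRY_SEC) -> Tuple[int, str]:
    """POST createPaste: store the text and return (201 Created, URL)."""
    key = _new_key()
    while key in _store:  # retry on the (unlikely) collision
        key = _new_key()
    _store[key] = (text_content, time.time() + expiry_sec)
    return 201, BASE_URL + key


def get_paste_content(url: str) -> Tuple[int, Optional[str]]:
    """GET getPasteContent: 404 if unknown, 410 Gone if expired, else 200 and the text."""
    key = url.rsplit("/", 1)[-1]
    entry = _store.get(key)
    if entry is None:
        return 404, None
    text, expires_at = entry
    if time.time() >= expires_at:
        return 410, None
    return 200, text
```

For example, `create_paste("hfhvfhbvhf")` returns `201` and a URL; passing that URL to `get_paste_content` returns `200` and the text until the expiry time passes, after which it returns `410`.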




Database design

A NoSQL DB such as MongoDB stores one document per paste:

{
  userId: <user ID>,
  URL: <paste URL>,
  pasteContent: <CDN link>,
  expiryTime: xxxx sec
}
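A paste record can be modelled as a small dataclass that serializes to the document shape described here. This is a sketch: the camelCase field names, and the extra `createdAt` and `retired` fields (needed to evaluate expiry and to support the expiry batch job), are assumptions not listed in the schema.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class PasteRecord:
    user_id: str
    url: str
    paste_content: str     # CDN link to the paste text, not the text itself
    expiry_time_sec: int   # expiry duration configured by the user
    created_at: int        # ASSUMPTION: creation time in epoch seconds
    retired: bool = False  # ASSUMPTION: flag set by the expiry batch job

    def expires_at(self) -> int:
        """Absolute expiry time in epoch seconds."""
        return self.created_at + self.expiry_time_sec

    def to_document(self) -> Dict[str, Any]:
        """Serialize to the (assumed) document shape stored in the NoSQL DB."""
        return {
            "userId": self.user_id,
            "URL": self.url,
            "pasteContent": self.paste_content,
            "expiryTime": self.expiry_time_sec,
            "createdAt": self.created_at,
            "retired": self.retired,
        }
```

Storing only the CDN link keeps each document small, since the 100 KB paste body lives in the CDN rather than in the DB.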




High-level design

  1. A load balancer (LB) sits between the user and the URL creation service.
  2. The load balancers use a consistent hashing algorithm to route requests.
  3. The URL generation service stores the paste content in a CDN, which speeds up access to paste content across geographical locations.
  4. The URL generator uses base62 encoding. With a URL key length of 8, there are 62^8 (about 218 trillion) possible URLs.
  5. The URL and CDN link are stored in the database.
  6. An expiry service, implemented as a batch job, checks the expiry of the URLs and marks expired ones as retired in the DB.
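The URL generator in item 4 can be sketched as base62 encoding of a unique integer ID. This sketch assumes each paste already has a unique non-negative ID (for example from a counter; the notes do not say where IDs come from), and the alphabet ordering is an arbitrary but fixed choice.

```python
import string

# 62-character alphabet: 0-9, a-z, A-Z (ordering is an assumption; any fixed order works)
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62
KEY_LENGTH = 8


def encode_base62(n: int, length: int = KEY_LENGTH) -> str:
    """Encode a non-negative integer ID as a fixed-length base62 key, left-padded with '0'."""
    if n < 0 or n >= BASE ** length:
        raise ValueError("ID out of range for the key length")
    chars = []
    while n > 0:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars)).rjust(length, ALPHABET[0])


def decode_base62(key: str) -> int:
    """Inverse of encode_base62."""
    n = 0
    for ch in key:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(BASE ** KEY_LENGTH)   # 218340105584896 possible 8-character keys
print(encode_base62(125))   # 00000021  (125 = 2 * 62 + 1)
```

Because distinct IDs always encode to distinct keys, this approach avoids collisions as long as the ID source never hands out the same ID twice.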





Request flows





Detailed component design

  1. The expiry service is a batch job that expires URLs. For each URL, it checks the expiry time configured by the user and expires the URL once that time has passed.
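One pass of the expiry batch job can be sketched as a scan that marks overdue pastes as retired. This is a sketch over in-memory rows; the `PasteRow` type and its `created_at` field are hypothetical, and the current time is injectable so the job can be tested deterministically.

```python
import time
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class PasteRow:
    url: str
    created_at: float     # epoch seconds (ASSUMPTION: stored with each paste)
    expiry_time_sec: int  # user-configured expiry, or the default
    retired: bool = False


def run_expiry_batch(rows: Iterable[PasteRow], now: Optional[float] = None) -> List[str]:
    """Mark every paste whose expiry time has passed as retired.

    Returns the URLs retired in this pass.
    """
    now = time.time() if now is None else now
    retired_urls = []
    for row in rows:
        if not row.retired and row.created_at + row.expiry_time_sec <= now:
            row.retired = True
            retired_urls.append(row.url)
    return retired_urls
```

For example, with rows `PasteRow("a", 0, 60)` and `PasteRow("b", 0, 3600)`, calling `run_expiry_batch(rows, now=120)` retires only `"a"`. Against MongoDB, the same pass could be a single `update_many` call that filters on an expiry timestamp and sets the retired flag.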




Trade offs/Tech choices

  1. Base62 encoding is used in place of a SHA hash because it generates shorter URLs: a SHA digest is far longer than 8 characters and would have to be truncated, which introduces the risk of collisions.
  2. MongoDB is used in place of a relational database because there are no relationships between the data and strong consistency is not a hard requirement.
  3. Sharding by userId can cause a particular shard to grow disproportionately when a few users create many pastes.
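The skew in point 3 can be shown by comparing two shard keys for one heavy user. This sketch assumes a hypothetical 4-shard cluster and hash-mod-N placement; MD5 is used only because it is stable across processes (Python's built-in `hash` of a string is randomized per process).

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable shard choice: hash the key with MD5 and take it modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical workload: one heavy user creates 1,000 pastes.
paste_keys = [f"paste-{i}" for i in range(1000)]

by_user = Counter(shard_for("heavy-user") for _ in paste_keys)  # shard by userId
by_key = Counter(shard_for(key) for key in paste_keys)          # shard by paste key

print(by_user)  # all 1,000 pastes land on a single shard
print(by_key)   # pastes are spread across the shards
```

Sharding by the paste key spreads one user's pastes out, at the cost of making "all pastes of a user" queries fan out to every shard.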



Failure scenarios/bottlenecks


  1. The URL generator, the read-paste service, or the DB can be a single point of failure. To avoid this, run multiple instances of each service behind the load balancer and replicate the DB. The DB should be sharded by userId; DB partitioning can also be done.
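DB partitioning, combined with the consistent hashing from the high-level design, can be sketched as a hash ring that maps paste keys to DB shards. The shard names and the virtual-node count are hypothetical. The property that matters for fault tolerance is that when a shard is removed, only the keys that lived on it move.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(value: str) -> int:
    """Stable 64-bit position on the ring (MD5 is used for spread, not security)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes  # virtual nodes per physical node smooth out the load
        self._ring: List[Tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the first node, wrapping around."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self._ring, (_hash(key),))
        return self._ring[idx % len(self._ring)][1]
```

For example, `ConsistentHashRing(["db-1", "db-2", "db-3"]).node_for("paste-42")` returns one of the three shards; after `remove_node("db-3")`, keys previously on `db-1` or `db-2` stay where they were.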



Future improvements
