System requirements


Functional:

  • anonymous users create shortened URL from any user-provided URL
  • users visiting the shortened URL will be redirected to the fully qualified URL
  • no update, no deletion
  • definition of "short": maybe 10 characters after the domain name?


Non-Functional:

  • highly available, n 9s
  • eventual consistent -- uploaded URL will eventually be visible


Capacity estimation

  • Data size: URL < 1KB, 10M * 10 * 1KB --> 100GB/day
  • Long term: in 10 year would be 100GB * 365 * 10 = 365TB
  • QPS write: 10M DAU, average 10 shorten request per day --> 10M * 10 / ~100K = 1KQPS
  • QPS read: 1B click through shortened URL --> 1B / ~100K --> 10K QPS on average, peak might be magnitude more


API design

  • POST /api/v1/shortenURL
  • returns the shortened URL
  • GET /api/v1/redirect
  • returns 301 with fully qualified URL


Database design

Do not need transactions/ACID. loose relations. Use NoSQL/KV store


Since the size of the URL is small (kilobytes), no need for blob store

URLs have no strong relations, no need for structured store either

Can simply use NoSQL/KV store. But traditional SQL database is also fine.


Schema:

Key will be the shortened URL

Value will be

  • fully qualified URL
  • creation time
  • time to live


High-level design

  • Users: making requests to either shorten a URL or visiting a shortened URL
  • Gateway/load balancer: distribute load among API servers
  • API servers: sitting behind the load balancers / gateway servers to take service requests
  • hash function to shorten URL, and write to the master database
  • retrieve the full qualified URL from a slew of slave read-only database servers
  • Database (sharding)
  • hot-cold/master-slave, with the slaves serving the read requests and master serving the writes
  • Message queue
  • Caching
  • Within the datacenter, cache the popular redirect requests, to save a db lookup
  • CDN to cache redirects to popular URLs


Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...

Write (shorten URL):

  • Requests come off the load balancers and hit the corresponding API server
  • API server performs the hash off the fully qualified URL, get the shortened URL, and response to the client
  • API server separately makes an async call/write to the database, via the message queue, to create a record of shortened URL
  • On API server failure, client would make another request via the LB to hit a separate server. Same goes for the DB server, since messages are persisted on the queues


Read (redirect)

  • Redirect request, on cold cache, would hit DB to retrieve the records
  • Subsequent requests would hit cache



Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...






Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...






Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.






Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?