Requirements


Functional Requirements:


  • Create a short URL for a given long URL.
  • Return the long URL associated with a given short URL.



Non-Functional Requirements:


  • High Availability
  • Low-latency redirects


API Design

  • POST /shortenedURL, passing the long URL in the body of the request
  • GET /<short_hash> returns a redirect to the long URL



High-Level Design

  • The application is hosted in a cross-region autoscaling group behind a load balancer (LB)
  • An SSL/TLS certificate attached to the LB provides encryption in transit; TLS terminates at the LB.
  • The application takes the long URL, appends a random salt, computes an MD5 hash of the concatenated string, and then truncates the hash to produce the short hash
  • Each newly generated hash is first checked against the key/value DB for an existing entry. If the hash exists and the stored value is identical to the long URL, it is a duplicate -> just return the already generated hash. If the stored value is different, it is a collision -> append a new random salt to the original URL and repeat the process
  • The shortened hash is stored as the key, and the long URL as the value.
  • When a user issues a GET with the short hash, the application looks up the hash in the key/value store and returns a 3xx redirect with the 'Location' header set to the stored long URL. If the key/value store has no entry for the hash, it returns a 404 Not Found
  • The key/value store holds the shortened hashes as keys and the long URLs as values. This is essentially a hash table, so lookups are very fast and redirect latency stays low.
  • We could add read replicas of the key/value store to scale read traffic
  • HA at the DB level would require a more complex distributed data layer spread over multiple regions, with the primary writer elected from the group of writer nodes
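
The hash-generation and collision-check steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: a plain Python dict stands in for the key/value DB, and the names make_short_hash, shorten, random_salt, SHORT_HASH_LENGTH and MAX_ATTEMPTS are hypothetical. The 7-character length and the retry cap are assumptions, since the design does not fix either.

```python
import hashlib
import secrets

SHORT_HASH_LENGTH = 7  # assumed length; the design does not specify one
MAX_ATTEMPTS = 10      # assumed cap on collision retries


def random_salt() -> str:
    """Return a fresh random salt (8 hex characters, an assumed size)."""
    return secrets.token_hex(4)


def make_short_hash(long_url: str, salt: str) -> str:
    """MD5 the salted URL and keep the first SHORT_HASH_LENGTH hex characters."""
    digest = hashlib.md5((long_url + salt).encode("utf-8")).hexdigest()
    return digest[:SHORT_HASH_LENGTH]


def shorten(long_url: str, store: dict, new_salt=random_salt) -> str:
    """Return a short hash for long_url, storing the mapping if it is new."""
    for _ in range(MAX_ATTEMPTS):
        short_hash = make_short_hash(long_url, new_salt())
        existing = store.get(short_hash)
        if existing is None:
            # Unused key: store the mapping (key = short hash, value = long URL).
            store[short_hash] = long_url
            return short_hash
        if existing == long_url:
            # Same key and same value: a duplicate, so reuse the existing hash.
            return short_hash
        # Same key but a different value: a collision, so retry with a new salt.
    raise RuntimeError("no free short hash found within MAX_ATTEMPTS")
```

Calling shorten("https://example.com/some/long/path", {}) returns a 7-character hex string and records the mapping in the dict. Against a real shared store, the check and the write would need to be a single atomic operation so that two concurrent requests cannot claim the same key.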



Detailed Component Design

  • The client would be a lightweight web frontend with a text box into which you can paste a long URL; it then returns the newly created short URL, e.g. https://<domain>/<short_hash>
  • The server-side application would have two endpoints:
    • POST /shortenedURL -> returns the string https://<domain>/<short_hash>
    • GET /<short_hash> -> returns a 3xx redirect response with the long URL in the 'Location' header, which makes the user's browser redirect to that long URL.
  • Cache layer -> the key/value store would be a Redis cache
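
The lookup behind GET /<short_hash> can be sketched as a small function that returns the status code and headers for the response. This is an illustrative sketch under assumptions: resolve is a hypothetical name, a dict stands in for the Redis cache, and 302 is picked as the concrete status because the design only says "3xx".

```python
from typing import Dict, Mapping, Tuple

REDIRECT_STATUS = 302  # assumed; the design only specifies a 3xx redirect


def resolve(short_hash: str, store: Mapping[str, str]) -> Tuple[int, Dict[str, str]]:
    """Map a short hash to an HTTP status code and response headers."""
    long_url = store.get(short_hash)
    if long_url is None:
        # No entry in the key/value store: 404 Not Found, no Location header.
        return 404, {}
    # Entry found: redirect the browser to the long URL via the Location header.
    return REDIRECT_STATUS, {"Location": long_url}
```

For example, resolve("abc1234", {"abc1234": "https://example.com/a"}) yields a 302 with the 'Location' header set to https://example.com/a, while an unknown hash yields a 404. The choice of status is a trade-off: browsers may cache a 301, which reduces load on the service, whereas a 302 keeps each redirect passing through the application.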