Requirements


Functional Requirements:


  • Create a short URL for a given long URL.
  • Return the long URL associated with a given short URL.



Non-Functional Requirements:


  • No need to store the same URL again if it already exists in the system;
  • High availability;
  • Low-latency as it's essentially a redirection. Users should not wait longer than 0.5s to create a redirection or 0.3s to check for a shortened URL.
  • Horizontally scalable, since we're expecting very large volume and traffic.


Capacity Estimation

  • Reads are expected to be 10x the volume of writes;
  • Resources are completely independent and can be sharded/scaled horizontally without challenge.


API Design

  • GET shorten/{url}
    • Using a GET for ease of use with browsers, but PUT would be the most consistent with the API's behavior.
  • GET follow/{url}
    • Used to decode the shortened URL into the original URL if it exists.


High-Level Design

  • The client submits a URL to be shortened
    • Server hashes the URL using two hash algorithms:
      • One result will be used as DynamoDB's primary key.
      • The other will be used as the Sort key to further minimize collisions.
      • Since the stored DB keys are created by the URL itself, there's no chance for duplication.
      • The value stored in the DB will be URL along with the URL shortened version. The shortened version is then free to be controlled in a DB-agnostic way, possibly using a 3rd hashing configuration with a base64 encoding.
      • GSI on the shortened URL pointing to the original URL;
      • In case of a concurrent failure for partition key/sort key pair already exists during write, read column and verify if they match and retrieve the shortened URL (GSI population can take up to a few minutes, so we can't rely immediately on it).
      • Once URL successfully stored in the DB, populate a short lived cache for 15min for the newly created KV pair.
  • The client submits a shortened URL expecting to be routed to the original URL:
    • Checks for the shortened URL in the cache;
    • Checks for the shortened URL in the GSI table;



Database Design

  • Given the volume and the nature of queries, KV-type database is ideal for the use case, such as DynamoDB.
  • Hashing the URL as the primary key of the table with hash alg 1, and then using hash-alg 2 of the URL as sort key and then store the URL itself as the value.
  • DynamoDB is auto-scaling;



Detailed Component Design

  • ID generation:
    • Server hashes the to-be-shortened URL using two hash algorithms:
      • One result will be used as DynamoDB's primary key.
      • The other will be used as the Sort key to further minimize collisions.
      • Since the stored DB keys are created by the URL itself, there's no chance for storing the same value twice;
  • Caching:
    • Newly created shortened URLs are auto-added to the cache with a short TTL (15 min);
    • Whenever having to reach the DB, cache is populated again.
  • Rate-limiting:
    • Limiting access on client IP using common rate-limiting designs such as token bucket to handle traffic bursts.
  • Partitioning:
    • Since the key is on the URL itself, we have maximum flexibility on the isolation of hot-partitions. Since we're also utilizing a cache, we don't really need to worry about read replicas just for the sake of read performance. The sharding should be enough.