Return the long URL associated with a given short URL.
Non-Functional Requirements:
No need to store the same URL again if it already exists in the system;
High availability;
Low-latency as it's essentially a redirection;
Capacity Estimation
Reads are expected to be 10x the volume of writes;
Resources are completely independent and can be sharded/scaled horizontally without challenge.
API Design
GET shorten/{url}
Using a GET for ease of use with browsers, but PUT would be the most consistent with the API's behavior.
GET follow/{url}
High-Level Design
The client submits a URL to be shortened
Server hashes the URL using two hash algorithms:
One result will be used as DynamoDB's primary key.
The other will be used as the Sort key to further minimize collisions.
Since the stored DB keys are created by the URL itself, there's no chance for duplication.
The value stored in the DB will be URL along with the URL shortened version. The shortened version is then free to be controlled in a DB-agnostic way, possibly using a 3rd hashing configuration with a base64 encoding.
GSI on the shortened URL pointing to the original URL;
In case of a concurrent failure for partition key/sort key pair already exists during write, read column and verify if they match and retrieve the shortened URL (GSI population can take up to a few minutes, so we can't rely immediately on it).
Once URL successfully stored in the DB, populate a short lived cache for 15min for the newly created KV pair.
The client submits a shortened URL expecting to be routed to the original URL:
Checks for the shortened URL in the cache;
Checks for the shortened URL in the GSI table;
Database Design
Given the volume and the nature of queries, KV-type database is ideal for the use case, such as DynamoDB.
Hashing the URL as the primary key of the table with hash alg 1, and then using hash-alg 2 of the URL as sort key and then store the URL itself as the value.
DynamoDB is auto-scaling;
Detailed Component Design
DynamoDB does sharding based on the primary key and is auto-scaling.