Requirements


Functional Requirements:


  • Create a short URL for a given long URL.
  • Return the long URL associated with a given short URL.
  • URLs can have an expiration. Default is last forever.
  • Creating a URL for one that already exists will return the existing URL and update it expiration



Non-Functional Requirements:


  • Low latency
  • Supports millions of URLs
  • Handles hot URLs
  • Thousands of requests per second
  • Read heavy
  • CAP theorem: Prioritize availability and partition tolerance. Updates are infrequent and can't result in a new ID
  • Value availability over latency. We'd rather the user get the content eventually rather than not at all but always have it be fast


Capacity Estimation

Estimate the scale of the system. Consider daily active users, read/write ratio, storage requirements, bandwidth, and any relevant QPS calculations...




API Design

Define the APIs expected from the system. This is your chance to analyze and define the read and write paths so that you can come up with the high-level design...


GET /url/{urlID} -> redirect to link

PUT /url -> ID


URL:

{

id: string,

url: string,

expiration: int

}


High-Level Design

Describe the overall system architecture. Identify the main components needed to solve the problem end-to-end. Use the diagramming tool to create a block diagram.


  • Client talks to API Gateway which handles AuthN, AuthZ, routing, load balancing, etc.
  • API Gateway talks to a severless function (e.g. AWS Lambda) which handles basic business rules and validation
  • Lambda talks to a database table. This is pretty simple document retrieval so I'll do NoSQL (e.g. DynamoDB)
  • We can use memcache for popular URLs. This will work for warm lambdas. If it is not working well, we can use a distributed cache for popular URLs


Database Design

Define the data model. Identify the main entities, their attributes, and relationships. Consider the choice of database type (SQL vs NoSQL) and justify your decision based on access patterns...


Using DynamoDB or similar NoSQL


Tables:

URL

{

id: string (UUID or similar), partition key

url: string,

expiration: int, ttl

}


Having the partition key on the URL object be the url string prevents duplicates. Expiration acts as a TTL so URLs get rotated out when they are temporary.


We can do DynamoDB PutItem to handle when the URL already exists in the DB


Detailed Component Design

Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.


Everything is serverless so scaling will be handled for us.


IDs are a version of UUIDs that are short enough and unique enough for our use case. Collisions would be possible (although incredibly unlikely) but can be handled by returning a 500 if the DB insert does not work and having the client retry.