Requirements


Functional Requirements:

  • Create a short URL for a given long URL.
  • Given a short URL, redirect to the associated long URL with an HTTP 302 response.
  • No URL management is necessary; URLs are kept indefinitely.


Non-Functional Requirements:

  • 100-200ms API response times
  • Highly available, at least 99.9%
  • Supports 100M to 1B users, can scale higher if needed
  • Shortened URLs are reasonably short: ~10 alphanumeric characters


Capacity Estimation

  • High read volume: 1M QPS
  • Large database
    • 1 billion users
    • 10 URLs each on average
    • 10 billion URL entries
    • 10-byte shortURL
    • 100-byte longURL on average
    • ~200 bytes per entry --> let's say 0.5 KB (generous)
    • 0.5 KB (0.5 * 10^3 bytes) * 10 billion entries (10^10) = 0.5 * 10^13 bytes = 5 TB
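
The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch; the constant names are illustrative, and the 0.5 KB entry size is the generous figure assumed in the list.

```python
# Back-of-the-envelope storage estimate using the figures from the list above.
USERS = 1_000_000_000        # 1 billion users
URLS_PER_USER = 10           # average URLs per user
ENTRY_SIZE_BYTES = 500       # 0.5 KB per entry (generous; the raw data is ~200 bytes)

total_entries = USERS * URLS_PER_USER
total_bytes = total_entries * ENTRY_SIZE_BYTES
total_terabytes = total_bytes / 10**12   # decimal terabytes (1 TB = 10^12 bytes)

print(f"{total_entries:,} entries")      # 10,000,000,000 entries
print(f"{total_terabytes:.1f} TB")       # 5.0 TB
```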




API Design

  • POST
    • URL: v1/shorten
    • Body: longUrl
    • Response code: 200
    • Response: shortUrl
  • GET
    • URL: v1/shorten?shortUrl=<shortUrl>
    • Response code: 302
    • Response: longUrl (returned in the Location header)
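
The two endpoints above could be sketched as a single framework-free handler function. This is a minimal illustration, not a definitive implementation: the function name handle, the in-memory dict used as a store, the injected shorten callable, and the 404/405 responses for unknown paths, unknown short URLs, and unsupported methods are all assumptions that do not appear in the design above.

```python
from urllib.parse import urlparse, parse_qs

def handle(method, url, body, store, shorten):
    """Return (status_code, headers, body) for the two endpoints listed above.

    store is a plain dict standing in for the database, and shorten is a
    hypothetical function that maps a long URL to a short one.
    """
    parsed = urlparse(url)
    if parsed.path != "/v1/shorten":
        return 404, {}, ""                      # assumed behaviour for unknown paths
    if method == "POST":
        short_url = shorten(body)               # body carries the longUrl
        store[short_url] = body
        return 200, {}, short_url
    if method == "GET":
        short_url = parse_qs(parsed.query).get("shortUrl", [""])[0]
        long_url = store.get(short_url)
        if long_url is None:
            return 404, {}, ""                  # assumed behaviour for unknown short URLs
        return 302, {"Location": long_url}, ""  # the redirect target goes in Location
    return 405, {}, ""                          # assumed behaviour for other methods

# Usage: create a mapping, then resolve it.
store = {}
created = handle("POST", "/v1/shorten", "https://example.com/a", store, lambda u: "abc123")
resolved = handle("GET", "/v1/shorten?shortUrl=abc123", None, store, lambda u: "abc123")
```

Keeping the handler a pure function of its inputs matches the stateless API server pool in the design: any server can answer any request as long as it can reach the shared store.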


High-Level Design

Clients connect over REST APIs to an ALB, which either serves the static webpage via a CDN or routes requests to a pool of stateless backend API servers. For storage, our data is not relational, so we can aim for a NoSQL database, for example a managed, highly scalable one such as DynamoDB.




Database Design

We use a NoSQL database because the data is not inherently relational and we need very high read throughput. A managed, DynamoDB-like datastore naturally partitions on the primary key, which avoids any complex sharding logic.


PK: shortURL

Attributes: longURL, createdAt, updatedAt
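
As a rough illustration of the record shape, the key and attributes above could be modelled as follows. The class name UrlRecord, the helper new_record, and the choice of ISO 8601 UTC strings for the timestamps are assumptions made for this sketch; the design only fixes the attribute names.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class UrlRecord:
    shortURL: str    # primary (partition) key
    longURL: str
    createdAt: str   # ISO 8601 UTC timestamp (assumed format)
    updatedAt: str   # ISO 8601 UTC timestamp (assumed format)

def new_record(short_url, long_url):
    """Build a record whose createdAt and updatedAt both equal the current time."""
    now = datetime.now(timezone.utc).isoformat()
    return UrlRecord(short_url, long_url, now, now)

# A dict in this shape is what would be written to the key-value store.
item = asdict(new_record("abc123defg", "https://example.com/some/long/path"))
```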



Detailed Component Design


Hashing logic:

  • Lives in API layer
  • Needs to be able to hash longURL to shortURL
  • ShortURL is 10 alphanumeric characters
    • 36^10 possibilities ~ 3.7 * 10^15, i.e. several quadrillion possibilities
    • Way more than enough for 10 billion entries
  • Use an MD5 hash truncated to 10 characters
  • Still check for collisions, using a strongly consistent conditional check on the PK when storing
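
The hashing steps above can be sketched as follows, under several assumptions. The MD5 digest is re-encoded in base 36 and its 10 low-order digits are kept, because truncating the usual hexadecimal form to 10 characters would give only 16^10 (about 1.1 * 10^12) codes rather than 36^10. The InMemoryStore class is a stand-in for the database: its put_if_absent method mimics a conditional write on the primary key (in DynamoDB, a put with a condition expression such as attribute_not_exists(shortURL)). The retry-with-salt strategy and all function names are illustrative, not part of the design.

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"   # 36 symbols
CODE_LENGTH = 10

def to_base36(n):
    """Encode a non-negative integer using the 36-symbol alphabet."""
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits)) or "0"

def short_code(long_url, attempt=0):
    """MD5 the URL and keep the 10 low-order base-36 digits of the digest."""
    # attempt is a hypothetical salt, used only when re-hashing after a collision
    data = long_url if attempt == 0 else f"{long_url}#{attempt}"
    digest = int.from_bytes(hashlib.md5(data.encode("utf-8")).digest(), "big")
    return to_base36(digest % 36**CODE_LENGTH).rjust(CODE_LENGTH, "0")

class InMemoryStore:
    """Stand-in for the database; put_if_absent mimics a conditional write on the PK."""
    def __init__(self):
        self._items = {}

    def put_if_absent(self, key, value):
        if key in self._items:
            return False
        self._items[key] = value
        return True

    def get(self, key):
        return self._items.get(key)

def shorten(store, long_url, max_attempts=5):
    """Store long_url under a free short code, re-hashing with a salt on collision."""
    for attempt in range(max_attempts):
        code = short_code(long_url, attempt)
        if store.put_if_absent(code, long_url):
            return code
        if store.get(code) == long_url:
            return code          # the same URL was already stored: reuse its code
    raise RuntimeError("could not find a free short code")

store = InMemoryStore()
code = shorten(store, "https://example.com/a")
```

Because the code is derived deterministically from the long URL, shortening the same URL twice yields the same short code, and the conditional write only has to resolve genuine collisions between different URLs.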