Requirements
Functional Requirements:
- Create a short URL for a given long URL.
- Resolve a given short URL to its long URL by redirecting the client to the long URL with an HTTP 302 response.
- No URL management is necessary; URLs never expire.
Non-Functional Requirements:
- 100-200ms API response times
- Highly available, at least 99.9%
- Supports 100M to 1B users, can scale higher if needed
- Shortened URLs are reasonably short: ~10 alphanumeric characters
Capacity Estimation
- High read volume: 1M QPS
- Large database
- 1 billion users
- 10 URLs each on average
- 10 billion URL entries
- shortURL: 10 bytes
- longURL: 100 bytes on average
- ~200 bytes per entry --> let's say 0.5 KB (generous)
- 0.5 KB (0.5 * 10^3 bytes) * 10 billion entries (10^10) = 0.5 * 10^13 bytes = 5 TB
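The storage arithmetic above can be checked with a few lines of Python. This is only a back-of-envelope sketch; the constant names are illustrative, and the values are the figures assumed in this section.

```python
# Back-of-envelope storage estimate; every input is a figure assumed above.
USERS = 1_000_000_000      # 1 billion users
URLS_PER_USER = 10         # average URLs per user
BYTES_PER_ENTRY = 500      # 0.5 KB per entry (generous; raw fields are ~200 bytes)


def storage_bytes(users: int, urls_per_user: int, bytes_per_entry: int) -> int:
    """Total bytes needed to store every URL entry."""
    return users * urls_per_user * bytes_per_entry


total = storage_bytes(USERS, URLS_PER_USER, BYTES_PER_ENTRY)
print(f"{USERS * URLS_PER_USER:,} entries, {total / 10**12:.1f} TB")
```

Changing `BYTES_PER_ENTRY` to 200 gives the tighter 2 TB figure, so the 0.5 KB assumption leaves roughly 2.5x headroom.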
API Design
- POST
- URL v1/shorten
- Body: longUrl
- Response code: 200
- Response: shortUrl
- GET
- URL v1/shorten?shortUrl=<shortUrl>
- Response code: 302
- Response: longUrl (returned in the Location header so the client follows the redirect)
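The two endpoints can be sketched as plain handler functions. This is a minimal illustration rather than a real server: `handle_post`, `handle_get`, the in-memory `store` dict, and the `make_short` callback are all hypothetical names, and the 404 for an unknown short URL is an assumption the notes do not specify.

```python
from typing import Callable, Dict, Tuple

# (status code, headers, body)
Response = Tuple[int, Dict[str, str], str]


def handle_post(store: Dict[str, str], long_url: str,
                make_short: Callable[[str], str]) -> Response:
    """POST v1/shorten: create a mapping and return the short URL."""
    short_url = make_short(long_url)
    store[short_url] = long_url
    return 200, {"Content-Type": "text/plain"}, short_url


def handle_get(store: Dict[str, str], short_url: str) -> Response:
    """GET v1/shorten?shortUrl=...: redirect to the long URL."""
    long_url = store.get(short_url)
    if long_url is None:
        # Assumption: unknown short URLs return 404.
        return 404, {}, "unknown shortUrl"
    # A 302 tells the client to follow the Location header.
    return 302, {"Location": long_url}, long_url


store: Dict[str, str] = {}
print(handle_post(store, "https://example.com/a/long/path", lambda url: "abc123"))
print(handle_get(store, "abc123"))
```

A 302 (rather than a permanent 301) means clients are not expected to cache the redirect, so every lookup reaches the service, which is consistent with the high read volume estimated earlier.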
High-Level Design
Clients call REST APIs through an ALB, which either serves the static web page via a CDN or routes requests to a pool of stateless backend API servers. For storage, our data is not relational, so a NoSQL database is a good fit. We can use a managed, highly scalable NoSQL database such as DynamoDB.
Database Design
Use a NoSQL database, since the data is not inherently relational and we need very high read throughput. A managed DynamoDB-like datastore partitions naturally on the primary key, which avoids any custom sharding logic.
PK: shortURL
Attributes: longURL, createdAt, updatedAt
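To make the schema concrete, here is a sketch of the request a DynamoDB `PutItem` call might carry for one entry. The table name `UrlMappings` and the helper `build_put_request` are hypothetical, and storing timestamps as ISO 8601 strings is an assumption; the function only builds the request dictionary and does not call AWS.

```python
from datetime import datetime, timezone
from typing import Any, Dict

TABLE_NAME = "UrlMappings"  # hypothetical table name


def build_put_request(short_url: str, long_url: str) -> Dict[str, Any]:
    """Build PutItem parameters for one URL mapping (no network call)."""
    now = datetime.now(timezone.utc).isoformat()
    return {
        "TableName": TABLE_NAME,
        "Item": {
            "shortURL": {"S": short_url},   # partition key
            "longURL": {"S": long_url},
            "createdAt": {"S": now},        # assumption: ISO 8601 strings
            "updatedAt": {"S": now},
        },
        # Reject the write if an item with this key already exists.
        "ConditionExpression": "attribute_not_exists(shortURL)",
    }


request = build_put_request("abc123", "https://example.com/a/long/path")
print(request["Item"]["shortURL"], request["ConditionExpression"])
```

With boto3, a dictionary like this could be passed as `client.put_item(**request)`; when the condition fails, the call raises `ConditionalCheckFailedException`, which is the signal that the short URL is already taken.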
Detailed Component Design
Hashing logic:
- Lives in API layer
- Needs to be able to hash longURL to shortURL
- ShortURL is 10 alphanumeric characters
- 36^10 possibilities ~ 3.7 * 10^15, i.e. several quadrillion possibilities
- Way more than enough
- Use an MD5 hash, encoded in base 36 (digits plus case-insensitive letters) and truncated to 10 characters
- Still check for collisions with a strongly consistent conditional write on the PK when storing
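The hashing steps above can be sketched as follows. This is one possible reading of the notes, not a definitive implementation: it assumes a lowercase base-36 alphabet, keeps the 10 low-order base-36 digits of the MD5 digest, and retries with an attempt number appended to the input when a collision is found. The names `short_code` and `store_mapping` are hypothetical, and the dict stands in for the conditional write against the datastore.

```python
import hashlib
from typing import Dict

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"  # 36 symbols
CODE_LENGTH = 10
KEYSPACE = 36 ** CODE_LENGTH  # 3,656,158,440,062,976 possible codes


def short_code(long_url: str, attempt: int = 0) -> str:
    """Derive a 10-character base-36 code from the MD5 digest of the URL."""
    # On retries, the attempt number is appended so a different code results.
    text = long_url if attempt == 0 else f"{long_url}#{attempt}"
    digest = hashlib.md5(text.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big") % KEYSPACE
    chars = []
    for _ in range(CODE_LENGTH):
        value, remainder = divmod(value, 36)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def store_mapping(store: Dict[str, str], long_url: str, max_attempts: int = 5) -> str:
    """Insert the mapping, retrying with a new code if the key is taken."""
    for attempt in range(max_attempts):
        code = short_code(long_url, attempt)
        # setdefault inserts only if the key is absent, like a conditional write.
        existing = store.setdefault(code, long_url)
        if existing == long_url:
            return code  # newly stored, or this URL was already shortened
    raise RuntimeError("could not find a free short code")


store: Dict[str, str] = {}
print(store_mapping(store, "https://example.com/a/long/path"))
```

Because the code is derived from the URL itself, shortening the same long URL twice returns the same short URL. A collision between two different URLs is resolved by the retry loop, which only works if the insert is atomic, hence the conditional check on the PK.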