System requirements
Functional:
- 100k MAU
- 5 shortens per month per user
- allow users to create shortened URLs and redirect to original long URLs.
- Allow management of URL category, and toggle revokation of URL access.
- Custom expiry time (default to 5 years).
Non-Functional:
- High avaiability (99.999%)
- Eventual consistency
- Scalability
- Low latency
Capacity estimation
500k total shorten requests per month
30M URL mappings in storage
assume average 60 bytes per mapping
1.8GB total storage needed
API design
"Shorten URL"
POST /v1/urls/shorten
params: {
long_url
token
custom_expiration_duration
custom_category
}
returns: {
short_url
}
"Get URL info"
GET /v1/urls/info
params: {
url (short or long)
is_long_url: bool
token
}
returns: {
short_url
long_url
expiration_date
category
}
"Redirect"
returns a redirect code and the long URL in the header. Don't allow client caching of redirection so we can capture all clicks for analytics.
"Update URL info"
Allow user to specify new expiration date and category.
Database design
- URL Mapping DB
- short URL (PK), long URL, exp_date, category, user_id
High-level design
- API Server: Manges requests (stateless)
- URL Generator Service: Generates a new unique short URL for a given long URL.
- URL Mapping DB: relational DB (read heavy) with read thru cache. Use bloom filter to accelerate failed queries.
Request flows
Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...
Detailed component design
- Short URL generation
- Generate unique ID
- ID can contain index and server number fields to ensure uniqueness across many generator servers if scaling is needed
- Convert to base62 representation to improve compactness.
Trade offs/Tech choices
- The auto incrementing ID can be a security concern as it could be easy to guess other short URLs. We could solve this by skipping large swathes of values in each increment, but this would lead to slightly larger short URLs. The alternative is using a hash of the long URL, which would lead to more distributed short URL values, but could require hash conflict resolution which would hinder performance and could add complexity at scale.
- A write thru cache could be more performant, but the read thru is better when we consider possible failures. If the cache goes down, we can bring up another one that will reload as reads are performed.
Failure scenarios/bottlenecks
- Load balancer, API server, URL generator service, and URL mapping database are all scalable.
- DB could run out of capactiy if our usage increased 5+ orders of magnitude, but this seems unlikely. However, we might need to shard the DB much sooner to keep read performance up.
Future improvements
- Hypothetically, if we needed to partition the DB, we could use a hash of the short URL as the shard key. We could either shard the cache as well, or keep one MFU cache that is asynchronously updated after DB writes.