Designing a Simple URL Shortening Service: A TinyURL Approach - System Design

Requirements

Functional Requirements:

Create a short URL for a given long URL.
Return the long URL associated with a given short URL.
Redirect to the long URL when a request for the short URL is received.

Non-Functional Requirements:

low latency
high availability

API Design

POST /url?long=

Response:

Status Created

Content-Type string

Body

GET /:short

Response:

Status Redirect

should redirect to long URL

Status NotFound

if long URL not found for provided "short"
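The two endpoints above can be sketched as a pure request handler. This is only an illustration under several assumptions that the design does not fix: the concrete status codes (201, 302, 404), the `text/plain` content type, a response body that carries the short code, and the in-memory `store` dictionary that stands in for the database. The counter-based short code is a placeholder, and the sketch does not deduplicate repeated long URLs.

```python
from http import HTTPStatus
from urllib.parse import parse_qs, urlsplit


def handle(method, target, store):
    """Return (status, headers, body) for a request such as 'POST /url?long=...'.

    `store` is a hypothetical dict mapping short code -> long URL.
    """
    parts = urlsplit(target)
    if method == "POST" and parts.path == "/url":
        long_values = parse_qs(parts.query).get("long")
        if not long_values:
            return HTTPStatus.BAD_REQUEST, {}, "missing 'long' parameter"
        # Placeholder: the real system would claim a free record instead.
        short = format(len(store), "08d")
        store[short] = long_values[0]
        return HTTPStatus.CREATED, {"Content-Type": "text/plain"}, short
    if method == "GET":
        long_url = store.get(parts.path.lstrip("/"))
        if long_url is None:
            return HTTPStatus.NOT_FOUND, {}, ""
        # Assumption: 302 Found; the design only says "Redirect".
        return HTTPStatus.FOUND, {"Location": long_url}, ""
    return HTTPStatus.METHOD_NOT_ALLOWED, {}, ""
```

The long URL is expected to be percent-encoded in the query string, for example `POST /url?long=https%3A%2F%2Fexample.com%2Fpage`.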

High-Level Design

Load balancer.

Distributes the load and routes each request to the API server that handles the corresponding subset of URLs.

For scalability we may introduce sharding by short URL prefix (4 symbols, for example) using a consistent hashing algorithm.

If the corresponding upstream is down, any other API server should be able to respond correctly.
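A minimal sketch of prefix-based routing with consistent hashing, assuming a hash ring with virtual nodes. The class name `HashRing`, the server names, and the choice of SHA-256 are illustrative assumptions, not something the design prescribes. Removing a dead server from the ring only remaps the prefixes that server owned.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping short URL prefixes to API servers."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _position(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        # Each server gets several virtual nodes to even out the load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._position(f"{node}#{i}"), node))

    def remove(self, node):
        # Called when the health check reports the upstream as dead.
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, short_url, prefix_len=4):
        if not self._ring:
            raise LookupError("no API servers available")
        position = self._position(short_url[:prefix_len])
        # First virtual node clockwise from the key, wrapping around.
        index = bisect.bisect_left(self._ring, (position,))
        return self._ring[index % len(self._ring)][1]
```

With `ring = HashRing(["api-1", "api-2", "api-3"])`, all short URLs sharing the first four symbols go to the same server, and after `ring.remove("api-2")` only the prefixes that `api-2` served move elsewhere.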

API.

Should be stateless and cache the most recently used short URLs for performance.

When a new short URL is required, it should be taken from the database or requested from the Generator service.

Generator.

Its task is to keep a supply of "free" records in the database.

It can also free existing records periodically if they have not been accessed for too long. (I won't describe this mechanism in detail for now.)

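It should create new short URLs in ascending order and increase the length of short_url when the capacity at the current length is exhausted. The starting length is 8, so for an alphabet of N symbols the system starts with N^8 free short URLs: 62^8 (about 2.18 × 10^14) for alphanumeric characters, or 64^8 = 2^48 for a 64-symbol URL-safe alphabet. Reaching 2^64 at length 8 would require 256 distinct values per symbol, which URL-safe characters cannot provide.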
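The ascending generation order can be sketched with a lazy generator. The 62-symbol alphanumeric alphabet (listed in ASCII order so that output is lexicographically ascending) and the function names `short_urls` and `capacity` are assumptions made for illustration.

```python
import itertools
import string

# Assumption: alphanumeric alphabet, listed in ASCII order.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def short_urls(start_length=8, alphabet=ALPHABET):
    """Yield short URLs in ascending order, growing the length when exhausted."""
    for length in itertools.count(start_length):
        for symbols in itertools.product(alphabet, repeat=length):
            yield "".join(symbols)


def capacity(length, alphabet=ALPHABET):
    """Number of distinct short URLs of exactly the given length."""
    return len(alphabet) ** length
```

For example, `itertools.islice(short_urls(), 3)` yields `00000000`, `00000001`, `00000002`. A production Generator would persist its position rather than restart the iterator.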

Database.

Should be optimized for reading.

Sharding for scalability.

Replication for high-availability.

Low Latency

If we want even lower latency for geographically distributed clients, we should consider creating multiple zones and setting up geo DNS, which routes each incoming request to the nearest zone.

Since the main use case for our system is reading, low latency is achieved primarily by tuning the LRU cache in the API server.

High availability

Downtime of individual API server instances should not be a problem thanks to the load balancer configuration: it should simply route requests to the instances that are still alive.

Downtime of specific database nodes should also not be a problem, since we set up replication and re-election of the primary.

Scalability

We can increase the number of API servers, which increases the total cache capacity as well as the number of simultaneous requests the system can process.

Adding more database shards should be possible if we use a technique such as consistent hashing for sharding, so that only a small part of the data has to be migrated.

Detailed Component Design

# Use Cases

## Create

The API server receives long_url.

If a record for this long_url is found in the database, it is returned in the response.

If no record is found, the server claims any free record. To eliminate race conditions, this should be performed as a single atomic database operation.

If no free record is found, the API server asks the Generator service for a new record. This should not happen often, so using a single instance of the Generator seems reasonable.
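The create flow can be sketched against SQLite as a stand-in database. The table layout, the status values `'free'` and `'used'`, the UNIQUE constraint on long_url, and the `request_from_generator` callback are all assumptions for the sake of the example. The atomic claim is a conditional UPDATE that only succeeds while the record is still free; a server that loses the race simply retries.

```python
import sqlite3

SCHEMA = """
CREATE TABLE urls (
    short_url     TEXT PRIMARY KEY,
    long_url      TEXT UNIQUE,
    status        TEXT NOT NULL DEFAULT 'free',
    last_accessed REAL
)
"""


def create(conn, long_url, request_from_generator):
    """Return the short_url for long_url, claiming a free record if needed."""
    while True:
        row = conn.execute(
            "SELECT short_url FROM urls WHERE long_url = ?", (long_url,)
        ).fetchone()
        if row:
            return row[0]  # a record for this long_url already exists
        free = conn.execute(
            "SELECT short_url FROM urls WHERE status = 'free' "
            "ORDER BY short_url LIMIT 1"
        ).fetchone()
        if free is None:
            # No free record: ask the (hypothetical) Generator for one.
            conn.execute(
                "INSERT INTO urls (short_url) VALUES (?)",
                (request_from_generator(),),
            )
            conn.commit()
            continue
        try:
            # Atomic claim: matches only if the record is still free.
            claimed = conn.execute(
                "UPDATE urls SET long_url = ?, status = 'used' "
                "WHERE short_url = ? AND status = 'free'",
                (long_url, free[0]),
            ).rowcount
            conn.commit()
        except sqlite3.IntegrityError:
            # Another server stored the same long_url first; look it up again.
            conn.rollback()
            continue
        if claimed == 1:
            return free[0]
        # Another server claimed this record first; retry with the next one.
```

The conditional `WHERE status = 'free'` is what makes the claim safe without an explicit lock: at most one concurrent UPDATE can change a given row from free to used.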

## Read

The API server receives short_url.

It checks the LRU cache.

If the record is found, the redirect happens immediately. We should tune the cache to achieve the desired low latency.

If the record is not found, the API server queries the database (for performance, and to avoid overloading primaries, it may query replicas). In this case latency is higher, but this should happen mostly while the API server cache is warming up.

One of three responses is sent: "Found", "Not found", or "Internal error".
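The read path can be sketched as a small LRU cache in front of a database lookup. The `LRUCache` class, the `read` function, and the `lookup` callback (which stands in for a replica query and returns `None` when no record exists) are hypothetical names introduced for this example.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry


def read(short_url, cache, lookup):
    """Resolve short_url and return (outcome, long_url)."""
    long_url = cache.get(short_url)
    if long_url is not None:
        return "Found", long_url  # cache hit: no database round trip
    try:
        long_url = lookup(short_url)  # assumed to query a replica
    except Exception:
        return "Internal error", None
    if long_url is None:
        return "Not found", None
    cache.put(short_url, long_url)
    return "Found", long_url
```

Tuning here mostly means choosing `capacity` so that the frequently requested short URLs stay resident. Negative results are not cached in this sketch, so repeated requests for unknown codes always reach the database.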

# Database.

Record format: short_url, long_url, status, last_accessed

Sharding should be configured by the prefix of the short_url field.

The index [short_url_prefix, short_url] allows sharded and fast queries.
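As an illustration of the record format and the composite index, here is a sketch using SQLite. Storing the prefix in its own `short_url_prefix` column, the 4-symbol prefix length, the column types, and the helper names are assumptions; a real deployment would use the sharding facilities of the chosen database.

```python
import sqlite3

SCHEMA = """
CREATE TABLE urls (
    short_url        TEXT PRIMARY KEY,
    short_url_prefix TEXT NOT NULL,  -- assumed: first 4 symbols of short_url
    long_url         TEXT,
    status           TEXT NOT NULL,  -- assumed values: 'free' or 'used'
    last_accessed    REAL            -- assumed: Unix timestamp
);
CREATE INDEX idx_urls_prefix ON urls (short_url_prefix, short_url);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def insert_record(conn, short_url, long_url=None, status="free",
                  last_accessed=None, prefix_len=4):
    conn.execute(
        "INSERT INTO urls VALUES (?, ?, ?, ?, ?)",
        (short_url, short_url[:prefix_len], long_url, status, last_accessed),
    )


def find(conn, short_url, prefix_len=4):
    """Look up long_url; the prefix predicate lets a router pick the shard."""
    row = conn.execute(
        "SELECT long_url FROM urls "
        "WHERE short_url_prefix = ? AND short_url = ?",
        (short_url[:prefix_len], short_url),
    ).fetchone()
    return row[0] if row else None
```

Including the prefix in every query means the shard can be chosen from the request alone, before touching any data.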
