Designing a Simple URL Shortening Service: A TinyURL Approach (System Design)

Requirements

Functional Requirements:

Create a short URL for a given long URL.
Return the long URL associated with a given short URL.

Non-Functional Requirements:

Low latency
Strong consistency on the write path (no two long URLs share a short code)
Scalable to 100,000 (1 lakh) requests/s

API Design

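GET /urls/?url=<short_url>

Status code: 302 Found (redirect), with the long URL in the Location header. The response body may echo it:

{
  "redirected_url": "<long URL>"
}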

POST /urls/

Request body:

{
  "url": "<long URL>",
  "short_code": "<optional custom short code>",
  "expiry": "<expiration timestamp>"
}

Response body:

{
  "url": "<short URL>"
}
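Before a POST body reaches ID generation, the service has to reject malformed input. The following is a minimal sketch of that validation, assuming custom short codes are restricted to 4-16 characters from the Base62 alphabet, `expiry` is an ISO-8601 timestamp with a UTC offset and is optional, and long URLs must be absolute http(s) URLs. `validate_create_request`, `MAX_URL_LENGTH`, and the limits are hypothetical.

```python
from __future__ import annotations

import re
from datetime import datetime, timezone
from urllib.parse import urlparse

MAX_URL_LENGTH = 2048  # hypothetical limit
SHORT_CODE_RE = re.compile(r"[a-zA-Z0-9]{4,16}")  # assumption: Base62 alphabet, 4-16 chars


def validate_create_request(body: dict, now: datetime | None = None) -> list[str]:
    """Return validation errors for a POST /urls/ body (an empty list means valid)."""
    now = now or datetime.now(timezone.utc)
    errors = []

    url = body.get("url")
    if not isinstance(url, str) or not url or len(url) > MAX_URL_LENGTH:
        errors.append("url: missing or too long")
    else:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("url: must be an absolute http(s) URL")

    # short_code is optional; when present it must match the allowed alphabet exactly.
    code = body.get("short_code")
    if code is not None and not (isinstance(code, str) and SHORT_CODE_RE.fullmatch(code)):
        errors.append("short_code: must be 4-16 characters from [a-zA-Z0-9]")

    # expiry is treated as optional in this sketch; when present it must be a future, tz-aware time.
    expiry = body.get("expiry")
    if expiry is not None:
        try:
            expires_at = datetime.fromisoformat(expiry)
        except (TypeError, ValueError):
            errors.append("expiry: must be an ISO-8601 timestamp")
        else:
            if expires_at.tzinfo is None:
                errors.append("expiry: must include a UTC offset")
            elif expires_at <= now:
                errors.append("expiry: must be in the future")
    return errors
```

Returning a list of errors rather than raising on the first one lets the API report every problem in a single 400 response.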

1. API Architecture & Request Lifecycle

POST /api/v1/shorten (URL Creation)

Ingress & Traffic Control: Requests land on the API Gateway, which handles global rate limiting using a Token Bucket Algorithm to mitigate multi-IP abuse. Traffic is then distributed via a Load Balancer to horizontally scalable application nodes.
ID Generation Strategy:
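- Custom Short Codes: If a user provides a custom short_code, the application checks it against a distributed Cuckoo Filter (which, unlike a Bloom filter, supports deletions) for a fast existence test. If the filter reports the code as absent (cuckoo filters have no false negatives), the insert proceeds; a unique constraint on short_code in the database is the final guard when two requests race for the same code.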
- System-Generated Codes: If no custom code is provided, the system uses a coordinated, lock-free Snowflake-inspired generator. Each ID combines a timestamp, a unique worker ID, and a sequence number, which guarantees no collisions across nodes; the numeric ID is then Base62-encoded ([a-zA-Z0-9]) into a compact short code.
Storage Write: The record is written to the primary database while concurrently seeding the hot cache layer.
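The ID generation strategy can be sketched as a Snowflake-style generator followed by Base62 encoding. This is a minimal single-process sketch assuming the classic 41-bit timestamp / 10-bit worker ID / 12-bit sequence layout and a custom epoch of 2024-01-01; `SnowflakeGenerator`, `EPOCH_MS`, and the bit widths are assumptions, not a specific package's API. The lock only serializes threads inside one process; no cross-node coordination is needed at generation time because each node owns a distinct worker ID.

```python
import threading
import time

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
EPOCH_MS = 1_704_067_200_000  # hypothetical custom epoch: 2024-01-01T00:00:00Z
WORKER_BITS, SEQUENCE_BITS = 10, 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1   # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095


def base62_encode(n: int) -> str:
    """Encode a non-negative integer with the [0-9A-Za-z] alphabet."""
    if n == 0:
        return BASE62[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(BASE62[rem])
    return "".join(reversed(chars))


def base62_decode(code: str) -> int:
    n = 0
    for ch in code:
        n = n * 62 + BASE62.index(ch)
    return n


def _now_ms() -> int:
    return time.time_ns() // 1_000_000 - EPOCH_MS


class SnowflakeGenerator:
    """64-bit IDs laid out as 41-bit timestamp | 10-bit worker ID | 12-bit sequence."""

    def __init__(self, worker_id: int):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self._lock = threading.Lock()  # guards threads in this process only

    def next_id(self) -> int:
        with self._lock:
            now = _now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # 4096 IDs already issued this millisecond: spin until the next one.
                    while now <= self.last_ms:
                        now = _now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                (now << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )

    def next_short_code(self) -> str:
        return base62_encode(self.next_id())
```

Usage: `SnowflakeGenerator(worker_id=7).next_short_code()` returns a Base62 string. The collision-freedom rests entirely on each application node receiving a distinct `worker_id`, which is where the "coordinated" part of the design comes in.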

GET /{short_code} (URL Resolution)

Edge Caching (CDN): Viral and highly repetitive links are cached and served directly from location-based CDNs to offload traffic from core infrastructure.
Cache Penetration Mitigation: On a CDN miss, the request hits a localized Cuckoo Filter. If the filter returns a negative response, the system immediately rejects the request with 404 Not Found, protecting the downstream database from floods of malicious invalid links.
Hot Cache Layer: Requests that pass the filter check a Redis cluster containing a sliding window of recent links (24-hour TTL). This handles an estimated 70% of standard traffic.
Database Fallback: Cache misses query indexed read-replicas.
Redirection Status Code: The system responds with a 302 Found (Temporary Redirect) status code instead of a 301. This forces client browsers to check the server on every hit, ensuring that URL expirations, metrics tracking, and rate limits are enforced in real time.
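The resolution steps above (filter, hot cache, database fallback, inline expiry check, 302) can be sketched with in-memory stand-ins. This is a minimal sketch under stated assumptions: a plain Python set plays the Cuckoo Filter, one dict plays the Redis cluster and another the read-replica, the CDN tier is omitted, and `Resolver`, `UrlRecord`, and the `(status, location)` return value are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Optional

CACHE_TTL_S = 24 * 3600  # 24-hour TTL from the hot cache layer


@dataclass
class UrlRecord:
    long_url: str
    expires_at: Optional[float]  # Unix seconds; None means "never expires"


class Resolver:
    def __init__(self, existence_filter, replica: dict):
        self.filter = existence_filter  # stand-in for the Cuckoo Filter (supports `in`)
        self.replica = replica          # stand-in for an indexed read-replica
        self.cache = {}                 # stand-in for Redis: short_code -> (record, cached_at)

    def resolve(self, short_code: str, now: Optional[float] = None):
        now = time.time() if now is None else now
        # 1. Cache-penetration guard: a negative filter answer is definitive.
        if short_code not in self.filter:
            return 404, None
        # 2. Hot cache, honoring the 24-hour TTL.
        hit = self.cache.get(short_code)
        record = hit[0] if hit and now - hit[1] < CACHE_TTL_S else None
        # 3. Database fallback on a miss; seed the cache with the result.
        if record is None:
            record = self.replica.get(short_code)
            if record is None:  # the filter can return false positives
                return 404, None
            self.cache[short_code] = (record, now)
        # 4. Soft expiration evaluated inline; purge the cached key.
        if record.expires_at is not None and now > record.expires_at:
            self.cache.pop(short_code, None)
            return 404, None
        # 5. 302 keeps every click flowing through the service.
        return 302, record.long_url
```

The filter check comes first because it is the cheapest step and the only one that can reject a request without touching Redis or the database.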

2. High Availability, Data Partitioning & Expiration

Data Partitioning & Global Replication

To handle viral, cross-region traffic without cross-continental database round trips, the system avoids strictly pinning users to a region by geo-IP. Instead, it uses a single-leader, multi-region replication topology. Writes are processed in a primary region and asynchronously replicated to read-replicas worldwide, which keeps read latency low everywhere.

URL Expiration Mechanics

Soft Expiration: Every URL record contains an expires_at timestamp. Read operations evaluate this field inline; if the current time exceeds the expiration threshold, a 404 is returned immediately, and the associated Redis key is purged.
Hard Cleanup (Storage Management): To avoid PostgreSQL table bloat, dead-tuple buildup, and heavy VACUUM work caused by mass row deletions, the table is range-partitioned by expiration time (e.g., daily or weekly partitions). Partitioning on expires_at rather than creation time ensures every row in an old partition has already expired, so the whole partition can be removed with a low-overhead DROP TABLE instead of row-by-row deletes.
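The hard-cleanup job can be sketched as a planner that emits PostgreSQL DDL. This minimal sketch assumes a parent table named `urls`, range-partitioned on a `timestamptz` column `expires_at` with one partition per UTC day; the table name, the `urls_pYYYYMMDD` naming scheme, and `cleanup_plan` are hypothetical, and the function only builds SQL strings without executing them.

```python
from __future__ import annotations

from datetime import date, timedelta


def partition_name(day: date) -> str:
    return f"urls_p{day:%Y%m%d}"


def create_partition_ddl(day: date) -> str:
    nxt = day + timedelta(days=1)
    return (
        f"CREATE TABLE IF NOT EXISTS {partition_name(day)} PARTITION OF urls "
        f"FOR VALUES FROM ('{day.isoformat()} 00:00:00+00') TO ('{nxt.isoformat()} 00:00:00+00');"
    )


def cleanup_plan(existing: list[date], today: date, precreate_days: int = 7) -> list[str]:
    """DDL that pre-creates upcoming partitions and drops fully expired ones."""
    stmts = [create_partition_ddl(today + timedelta(days=i)) for i in range(precreate_days)]
    # Partition D holds rows with expires_at in [D, D+1). Once D+1 <= today (UTC midnight),
    # every row in it has expired, so the whole table can be dropped in one statement.
    for day in sorted(existing):
        if day + timedelta(days=1) <= today:
            stmts.append(f"DROP TABLE IF EXISTS {partition_name(day)};")
    return stmts
```

Rows in today's partition that expired earlier in the day are still caught by the soft-expiration check on reads. URLs with no expiry would need their own DEFAULT partition (or a far-future expires_at) so that this job never drops them.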

Key System Highlights

Base62 Encoding: Keeps generated tokens highly readable and compact.
Cuckoo Filter Optimization: Used on both GET and POST paths to block invalid queries and handle dynamic URL updates/deletions seamlessly.
High Availability: Achieved via stateless, horizontally autoscaling application layers backed by distributed Redis caching.
Strong Consistency: Enforced on the write path. Snowflake-style IDs carry a unique worker ID, so IDs generated on different nodes cannot collide, and the database unique constraint on short_code guards custom codes.
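The Cuckoo Filter that both paths rely on can be sketched with partial-key cuckoo hashing: each item is reduced to a small fingerprint that may live in one of two buckets, and the second bucket is derived from the first by XOR with a hash of the fingerprint, so a fingerprint can be relocated without knowing the original item. This is a minimal in-process sketch; the class name, the 16-bit fingerprint, 4-slot buckets, SHA-256 hashing, and the single "victim" slot for a fingerprint that could not be placed are assumptions, not a particular library's design.

```python
from __future__ import annotations

import hashlib
import random


class CuckooFilter:
    """Approximate membership test that supports deletion (partial-key cuckoo hashing)."""

    def __init__(self, num_buckets: int = 1024, bucket_size: int = 4, max_kicks: int = 500):
        if num_buckets <= 0 or num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.mask = num_buckets - 1
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.victim = None  # (bucket index, fingerprint) left over after a failed relocation

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item: str) -> int:
        return self._hash(b"fp:" + item.encode()) & 0xFFFF  # 16-bit fingerprint

    def _alt_index(self, index: int, fp: int) -> int:
        # XOR with a hash of the fingerprint; applying it twice returns the original index.
        return (index ^ self._hash(fp.to_bytes(2, "big"))) & self.mask

    def _indices(self, item: str, fp: int) -> tuple[int, int]:
        i1 = self._hash(item.encode()) & self.mask
        return i1, self._alt_index(i1, fp)

    def _in_victim(self, fp: int, i1: int, i2: int) -> bool:
        return self.victim is not None and self.victim[1] == fp and self.victim[0] in (i1, i2)

    def insert(self, item: str) -> bool:
        if self.victim is not None:
            return False  # filter is full; a real deployment would resize or rebuild
        fp = self._fingerprint(item)
        i1, i2 = self._indices(item, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets full: evict a random entry and move it to its alternate bucket.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = random.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        self.victim = (i, fp)  # keep the homeless fingerprint so lookups never miss it
        return True

    def __contains__(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1, i2 = self._indices(item, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2] or self._in_victim(fp, i1, i2)

    def delete(self, item: str) -> bool:
        """Remove one copy of item; only call this for items that were inserted."""
        fp = self._fingerprint(item)
        i1, i2 = self._indices(item, fp)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        if self._in_victim(fp, i1, i2):
            self.victim = None
            return True
        return False
```

Deleting only items that were actually inserted matters: removing a fingerprint for a never-inserted item could erase another item's matching fingerprint and introduce a false negative, which would wrongly 404 a valid link.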

Detailed Component Design

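Availability perspective:

Because a Redis cache layer sits in front of the database, the read path can scale to roughly 1M req/s, well above the 100,000 req/s target.

We also cannot serve every redirect from a CDN, since edge-cached responses would outlive expirations and deletions and break strong consistency; edge caching is reserved for viral links.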
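As a back-of-the-envelope check on how much the cache shields the database, take λ as the 100,000 req/s target and h as the estimated 70% Redis hit rate from the GET path, conservatively ignoring any CDN offload:

```latex
\lambda_{\text{DB}} = (1 - h)\,\lambda = (1 - 0.70) \times 100{,}000\ \text{req/s} = 30{,}000\ \text{req/s}
```

So the read-replicas must still be sized for roughly 30,000 req/s at the target load, which is what the replica fleet has to absorb even with the cache in place.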

Tradeoffs:

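The design does carry tradeoffs. Returning 302 instead of 301 means every click reaches our infrastructure, which raises load in exchange for real-time expiration, metrics, and rate limiting. Asynchronous replication means a newly created link may briefly be missing from a remote read-replica. Expiration itself adds little overhead, since it is handled by the inline expires_at check and partition drops.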

A Snowflake package generates the unique IDs behind system-generated short codes.
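As a rough check on how short these codes are, assume the common 64-bit Snowflake layout (41-bit timestamp, 10-bit worker ID, 12-bit sequence, sign bit unused so IDs stay below 2^63), with t measured in milliseconds since the generator's custom epoch:

```latex
% Base62 length bound for a 64-bit Snowflake ID (IDs are below 2^{63})
62^{10} \approx 8.39 \times 10^{17} \;<\; 2^{63} \approx 9.22 \times 10^{18} \;<\; 62^{11} \approx 5.20 \times 10^{19}

% The timestamp sits above the low 22 bits (10 worker + 12 sequence),
% so an ID first needs 11 characters once
t \cdot 2^{22} \ge 62^{10}
\;\Longrightarrow\;
t \ge \frac{8.39 \times 10^{17}}{4.19 \times 10^{6}} \approx 2.0 \times 10^{11}\ \text{ms} \approx 6.3\ \text{years}
```

In other words, system-generated codes are at most 10 Base62 characters for roughly the first 6.3 years after the chosen epoch, and never more than 11.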

Concurrency handling:

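Concurrent calls do not collide because a Snowflake ID is not a hash: it concatenates a millisecond timestamp, a worker (machine) ID, and a per-worker sequence number. As long as every node has a distinct worker ID and its clock never moves backwards, no two IDs can be equal.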

Thundering herd problem:

When a popular key is missing from Redis, many concurrent requests can miss the cache at the same time and all fall through to the database. To bound that load, we also introduce rate limiting so that the load balancer rejects requests above a configured limit.
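The rate limiting described above can be sketched as a token bucket, the algorithm named for the API Gateway. This is a minimal single-node sketch; `TokenBucket`, its parameters, and the injectable clock are hypothetical, and a limiter shared by several load-balancer or gateway nodes would need its bucket state in a shared store rather than in process memory.

```python
import threading
import time


class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`; each request spends `cost` tokens."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so short bursts are allowed
        self.clock = clock
        self.updated = clock()
        self._lock = threading.Lock()

    def allow(self, cost: float = 1.0) -> bool:
        with self._lock:
            now = self.clock()
            # Refill in proportion to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= cost:
                self.tokens -= cost
                return True
            return False
```

Usage: `TokenBucket(rate=10, capacity=5)` admits a burst of 5 requests and then about 10 per second. A global limit caps the total load that reaches the database during a stampede, although it does not by itself merge concurrent misses for the same key.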