Designing A Simple Url Shortening Service A TinyURL Approach - System Design

Requirements

Functional Requirements:

Shorten URL: Given a long URL, generate a unique short URL (e.g., tinyurl.com/abc123). The short code should be as compact as possible while guaranteeing uniqueness.
Redirect URL: Given a short URL, redirect the user to the original long URL. This is the core operation and must be fast. A slow redirect defeats the entire purpose of the service.
Custom aliases (optional): Allow users to choose a custom short code (e.g., tinyurl.com/my-brand) instead of a system-generated one. This adds namespace management complexity but is a common product requirement.
Link expiration: Support an optional time-to-live (TTL) so links automatically expire after a set duration. Expired links should return a 404 or an error page, not redirect to stale content.

Non-Functional Requirements:

High availability: The redirect path must be available at all times. A broken short link damages user trust permanently because the user has no way to determine the original destination. URL creation can tolerate brief outages, but redirects cannot.
Low latency: Redirects must complete in under 10ms. Users expect clicking a link to feel instant, and any perceivable delay suggests the link is broken or suspicious. URL creation can tolerate higher latency (a few seconds) since users expect the generation process to take a moment.
Horizontal scalability: The system must scale to billions of stored URLs and tens of thousands of redirects per second. Growth should be handled by adding machines, not by upgrading to a bigger server.
Durability: Once created, a URL mapping must never be lost. Short URLs get embedded in emails, documents, social media posts, QR codes, and printed materials. A lost mapping means every instance of that link across the internet breaks silently with no way to recover.

Capacity Estimation:

Assume the system handles:

~200 new URLs per second (~17 million/day)
~20,000 redirects per second (based on ~100:1 read-to-write ratio)

Over 5 years:

Total URLs ≈ several billion entries

Storage estimation:

Short URL (~8 bytes) → ~14 GB
Long URL (~500 bytes avg) → ~900 GB
Metadata (~100 bytes) → ~180 GB
Total storage ≈ ~1 TB over 5 years (excluding replication)

Storage estimation (per URL ≈ 256 bytes including metadata):

- Daily growth ≈ 4.4 GB

- Over 5 years ≈ 8–10 TB (with buffer)

Bandwidth:

- Write traffic is minimal (~50 KB/sec)

- Read traffic dominates (~5 MB/sec)

Conclusion:

Storage and bandwidth are manageable at scale, but the system is highly read-heavy, so caching (CDN + Redis) is critical to handle high traffic and ensure low-latency redirects.

API Design

Create Short URL

POST /api/v1/urls

Request body:

longUrl (required): The destination URL to shorten
customAlias (optional): A user-chosen short code instead of a system-generated one
expiresAt (optional): ISO 8601 timestamp for when the link should stop working

Response (201 Created):

shortCode: The generated or custom short code (e.g., abc123)
shortUrl: The full short URL (e.g., https://tinyurl.com/abc123)
longUrl: Echo of the original URL for confirmation
createdAt: Timestamp of creation
expiresAt: Expiration timestamp, if set

Authentication: API key in the Authorization header. Every write request must be associated with a registered account for rate limiting and abuse tracking.

Rate limit: 100 URLs per hour per API key for free tier. Higher limits for paid accounts. Returns 429 (Too Many Requests) when exceeded with a Retry-After header indicating when the client can try again.

Validation: The server should validate that the longUrl is a well-formed URL with a supported scheme (http or https). Reject URLs with unsupported schemes (javascript:, data:, ftp:) to prevent abuse. Optionally, check that the URL is reachable with a HEAD request, though this adds latency and is better done asynchronously.

Redirect

GET /shortCode

Returns a 302 (Found) response with the Location header set to the original long URL. The browser follows the redirect automatically. No authentication required. Anyone with the short URL can follow it. This is intentional. Short URLs are shared publicly and must work for everyone who clicks them.

If the short code does not exist or has expired, returns 404 (Not Found) with a JSON body explaining the error. For expired links, include a message indicating the link has expired rather than simply saying "not found." This helps users understand what happened and reduces confusion.

For SEO and social media compatibility, the redirect response should also include standard headers: Cache-Control to control CDN behavior, and optionally X-Robots-Tag to tell search engines whether to index the short URL or the destination URL.

301 vs 302: Why It Matters

This choice has significant architectural implications:

301 (Moved Permanently) tells the browser to cache the redirect. The next time the user clicks the same short link, the browser goes directly to the long URL without contacting the shortener at all. This reduces server load dramatically. The trade-off: you lose visibility into every click. You cannot track analytics, you cannot update the destination URL after creation, and you cannot enforce expiration because the browser never checks back.

302 (Found) tells the browser the redirect is temporary. Every click goes through the shortener. This enables click analytics (you see every redirect), destination updates (change where the link points), and expiration enforcement (the server checks TTL on each request). The trade-off: higher server load because every click hits your infrastructure.

For most URL shorteners, 302 is the right default. The analytics and flexibility outweigh the server load increase, which is handled by CDN caching anyway. The CDN caches the 302 response at edge locations, giving you the load reduction of 301 with the flexibility of 302.

Rate Limiting

Without rate limiting, a single malicious or misconfigured client could create millions of URLs, exhausting the short code space and consuming storage. Rate limiting protects shared resources and ensures fair usage. Implementation: per-API-key counters with sliding window limits, enforced at the API Gateway layer before requests reach the Shortening Service.

High-Level Design

The architecture separates into two distinct paths, each optimized for its unique requirements.

Read (Redirect) path: Client click --> CDN --> API Gateway --> Mapping Service --> Redis Cache --> DynamoDB. Optimized for sub-10ms latency with multiple caching layers. The CDN handles the hottest URLs at edge locations. Redis handles the warm set in the data center. DynamoDB is the durable fallback for cache misses.

Write (Shorten) path: Client request --> API Gateway --> Shortening Service --> Message Queue --> Queue Worker --> DynamoDB. Optimized for reliability and database protection. The Shortening Service generates a guaranteed-unique code from its pre-allocated ID range and enqueues the write. The message queue decouples the client response from the durable write, smoothing traffic spikes.

High-Level Architecture: Read (Redirect) and Write (Shorten) Paths

Components

API Gateway: TLS termination, DDoS protection, request routing, and rate limiting for write requests. All external traffic enters through the gateway, providing a single enforcement point for authentication and throttling.

Shortening Service: Generates short codes from a pre-allocated ID range, validates inputs, and enqueues write requests. Each instance owns an exclusive range of IDs, so codes are guaranteed unique at generation time. Stateless and horizontally scalable behind a load balancer.

Mapping Service: Handles redirect lookups. Checks Redis first, falls through to DynamoDB on cache miss. Also stateless. This is the most latency-sensitive component.

Message Queue (SQS or Kafka): Buffers write requests between the Shortening Service and the database. If 2,000 URL creation requests arrive in one second, the queue absorbs the spike and the Queue Worker drains them at a steady 200 per second that the database can comfortably handle.

Queue Worker: Consumes messages from the queue and writes URL mappings to DynamoDB. Since codes are generated from exclusive ID ranges, collisions cannot occur, and every write succeeds on the first attempt. Rate-limited to match database write capacity.

Redis Cache: Stores the warm set of URL mappings (recently and frequently accessed). LRU eviction policy ensures the cache holds the most useful entries. Cluster mode for horizontal scaling across multiple shards.

CDN (CloudFront): Caches redirect responses at edge locations worldwide. For a viral URL, the CDN serves millions of redirects from a nearby edge node without any request reaching the backend. This is the single most impactful component for read performance.

DynamoDB: The durable source of truth for all URL mappings. Handles cache misses from Redis, supports conditional writes for custom alias conflict detection, and provides automatic TTL-based expiration.

Why This Component Split Works

Each component in the architecture has a clear responsibility and can be scaled independently. The CDN scales by adding edge locations (no code changes needed). Redis scales by adding shards to the cluster. The Mapping Service scales by adding stateless instances behind the load balancer. The message queue scales by adding partitions. DynamoDB scales automatically by splitting partitions. No single component needs to handle the full 20,000 reads per second on its own. The CDN absorbs 80-90%, Redis handles most of the rest, and DynamoDB only sees the cold tail of traffic.

Detailed Component Design

Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.

This section focuses on the core challenge of a URL shortener: generating unique short codes at scale without collisions, while ensuring fast lookups through efficient caching and partitioning.

Short Code Generation

Short codes are generated using Base62 encoding (a-z, A-Z, 0-9), giving 62 possible characters per position. A 7-character code yields ~3.5 trillion combinations—more than enough capacity for long-term growth.

Approaches:

Hash + Truncate
Hash the long URL (e.g., MD5/SHA-256) and truncate to 7 Base62 characters.
- ✅ Deterministic (same input → same code)
- ❌ High collision risk due to truncation (birthday paradox)
Random Generation
Generate a random 7-character Base62 string.
- ✅ Uniform distribution across keyspace
- ❌ Requires separate deduplication (same URL → different codes)
Range-Based Counter (Chosen)
A coordination service (e.g., ZooKeeper or DynamoDB) assigns each server a unique ID range.
- Server A: 1–1M, Server B: 1M–2M, etc.
- IDs are sequentially generated and Base62-encoded
- ✅ Zero collisions within ranges
- ⚠️ Slight predictability (can be mitigated with salting/shuffling)

Failure Handling (Important):

To prevent outages, introduce a fallback mechanism:

Secondary ID generator (backup allocator)
Pre-buffered ID ranges per server
Graceful degradation (temporary queueing of writes)

This ensures continued operation even if the primary generator fails.

Short Code Generation: Hash vs Random vs Range-Based Counter

Collision Handling (Safety Net)

Even with guaranteed uniqueness, DynamoDB conditional writes act as a safeguard:

Atomic PutItem ensures no overwrite occurs
Eliminates race conditions from read-then-write patterns
Primarily used for custom aliases, where conflicts are possible

Caching Strategy

A multi-tier caching system optimizes read performance and reduces database load:

- CDN (Hot Tier)Caches redirect responses at edge locations
- Ideal for viral/high-traffic URLs
- TTL: ~5 minutes (eventual consistency)
- Redis (Warm Tier)In-memory cache with millions of entries
- Uses cache-aside pattern (only cache accessed URLs)
- Eviction: LRU policy
- TTL: configurable (e.g., 24–48 hours for active links)

Why cache-aside?

Most URLs are rarely accessed. This avoids filling Redis with unused data and maximizes cache efficiency.

Invalidation Strategy:

Redis → immediate key deletion
CDN → rely on short TTL or explicit invalidation

Partitioning

Both Redis and DynamoDB are partitioned by short code hash, ensuring:

One request → one shard → one DB partition
No fan-out queries
Low-latency lookups

Partitioning: Shard by Short Code (Redis + DB Aligned)

Why short code?

Uniform distribution (large keyspace)
Fixed length (consistent hashing)
Primary access pattern (every redirect uses it)

Why not alternatives?

Long URLs → uneven distribution
User IDs → hotspot risk (power users)

Rate Limiting

Applied only to URL creation (write path) to prevent abuse.

Algorithm: Token Bucket
Each user/API key gets:
- Fixed capacity
- Refill rate

Behavior:

Tokens available → request allowed
Empty bucket → HTTP 429 (Too Many Requests)

Implementation:

Stored in Redis with atomic counters
Supports burst traffic while enforcing long-term limits

Backoff:

Clients receive Retry-After header
Use exponential backoff with jitter

Key Strengths of This Design

Clear separation of read vs write paths
Zero-collision ID generation with strong fallback strategy
Efficient multi-tier caching (CDN + Redis)
Optimized single-shard lookup via partitioning
Scalable and resilient under high traffic