Designing a Simple URL Shortening Service: A TinyURL Approach - System Design

Requirements

Functional Requirements:

Create a short URL for a given long URL.
Return the long URL associated with a given short URL.

Non-Functional Requirements:

Maintainability: keep the file structure organized for maintenance and CI/CD workflows
Logging: write timestamped logs so problems can be traced when they occur
99.9% uptime: the service should be available 99.9% of the time
low redirect latency
high availability
horizontal scalability

API Design

REST API design:

GET endpoints:

GET /api/url

POST endpoints:

POST /api/url

POST /api/url/redirect

PUT endpoints:

PUT /api/url

High-Level Design

For shortening, the client sends a long URL to the API, the service validates it, generates or reserves a unique ID, converts that ID into a compact code such as Base62, stores the mapping, and returns the final short URL. Some designs also support deduplication, where the service checks whether the same long URL already exists and reuses the existing short code.
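
The ID-to-code step above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the alphabet ordering (digits, then lowercase, then uppercase) and the function names `base62_encode` and `base62_decode` are assumptions, and any fixed 62-character ordering would work as long as it never changes after links are issued.

```python
import string

# Assumed alphabet order: 0-9, a-z, A-Z. Changing it later would break existing codes.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def base62_encode(n: int) -> str:
    """Convert a non-negative numeric ID into a compact Base62 short code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def base62_decode(code: str) -> int:
    """Convert a Base62 short code back into its numeric ID."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

With this alphabet, `base62_encode(62)` yields `"10"`, and decoding any encoded ID returns the original number. Seven Base62 characters cover 62^7 (about 3.5 trillion) distinct IDs.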

For redirection, the client requests `short.ly/abc123`, the service looks up `abc123` in cache first, falls back to the database on a miss, and returns an HTTP redirect to the long URL. This cache-first approach is standard because it reduces database pressure and improves latency for hot links.

Secure connections

At the edge, the load balancer should hold the server certificate, terminate TLS, and negotiate approved protocols and ciphers with clients, which is a common pattern in modern load-balancing guidance. For stricter internal security, service-to-service traffic can also use mutual TLS, especially if the platform grows into multiple services or runs in a service mesh.

For redirect safety, I would not trust arbitrary destination URLs at redirect time. Instead, the system should store a validated mapping when the short link is created, use that server-side token-to-URL mapping for redirects, and apply allow-listing or policy checks for suspicious domains to reduce open-redirect abuse.
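
One way to apply the creation-time validation described above is sketched below. The function name `validate_long_url`, the length limit, and the `BLOCKED_HOSTS` set are hypothetical; a real deployment would likely plug in its own policy or reputation checks at the marked point.

```python
from urllib.parse import urlsplit

# Hypothetical deny-list. Blocking the shortener's own domain avoids redirect loops.
BLOCKED_HOSTS = {"short.ly"}


def validate_long_url(url: str, max_length: int = 2048) -> str:
    """Return the URL if it passes basic checks, otherwise raise ValueError."""
    url = url.strip()
    if len(url) > max_length:
        raise ValueError("URL is too long")
    parts = urlsplit(url)
    # Reject javascript:, data:, file:, and other non-web schemes.
    if parts.scheme not in ("http", "https"):
        raise ValueError("only http and https URLs are accepted")
    host = parts.hostname  # lowercased by urlsplit, None when missing
    if not host:
        raise ValueError("URL has no host")
    # Policy hook (assumption): a fuller system might consult a reputation service here.
    if host in BLOCKED_HOSTS:
        raise ValueError("destination host is not allowed")
    return parts.geturl()
```

Because only the validated value is stored, the redirect path never has to interpret a destination supplied in the request itself.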

Effective routing

Routing should be split by traffic type because the read path and write path behave differently. The load balancer or gateway can send /api/shorten traffic to the creation service and /{shortCode} traffic to the redirect service, while also performing health checks and removing unhealthy instances automatically.

For redirect performance, the routing path should prefer cache-first resolution, because most traffic is typically reads and many hot short codes repeat frequently. If the cache misses, the redirect service queries the backing store and can then repopulate the cache, keeping the hot path fast and reducing database load.

Storage responsibilities

The storage layer is the source of truth for URL mappings, so its primary job is to persist the relationship between short_code and long_url. It also stores metadata such as creation time, expiry, owner, status, and sometimes aggregate counters or attributes needed for governance and analytics.

In practice, the storage layer is also responsible for consistency and lookup efficiency. That means supporting writes for new links, point reads by short code, uniqueness constraints for generated aliases, replication for availability, and sharding when volume becomes too large for a single node.

1. Route traffic safely

Put an HTTPS load balancer in front of all app servers and terminate TLS there.
Configure health checks on app instances and remove unhealthy nodes from rotation automatically.
Split routes by purpose: POST /shorten to the write path, GET /{shortCode} to the redirect path, and admin/analytics to separate handlers.
Plan for traffic spikes by autoscaling stateless app servers and keeping the redirect path cache-first so the database is protected during hot-link surges.

2. Generate unique IDs correctly

Use a distributed ID generator, preferably a Snowflake-style scheme or preallocated ID ranges, then Base62-encode the numeric ID into the public short code.
Ensure uniqueness under high concurrency by giving each generator node a unique machine/worker ID and an atomic per-time-unit sequence counter.
If one generator node fails, keep the system running with redundant generators that own different ranges or worker IDs.
Prevent generator split-brain by using leader election or lease-based ownership for shared ranges, and reject stale generators with fencing or epoch checks.
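
A Snowflake-style generator along these lines might look like the sketch below. The bit layout (millisecond timestamp, 10-bit worker ID, 12-bit sequence), the custom epoch, and the class name `SnowflakeGenerator` are assumptions chosen for illustration. How worker IDs are assigned and leased is outside this sketch.

```python
import threading
import time


class SnowflakeGenerator:
    """Assumed layout: ms timestamp | 10-bit worker ID | 12-bit sequence."""

    WORKER_BITS = 10
    SEQ_BITS = 12
    MAX_SEQ = (1 << SEQ_BITS) - 1

    def __init__(self, worker_id: int, epoch_ms: int = 1_700_000_000_000):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.epoch_ms = epoch_ms  # hypothetical custom epoch
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        # The lock makes the timestamp/sequence update atomic across threads.
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                # Issuing IDs after a clock rollback could produce duplicates.
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & self.MAX_SEQ
                if self._seq == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._seq = 0
            self._last_ms = now
            timestamp = now - self.epoch_ms
            return (
                (timestamp << (self.WORKER_BITS + self.SEQ_BITS))
                | (self.worker_id << self.SEQ_BITS)
                | self._seq
            )
```

Two generators with different worker IDs cannot collide because the worker bits differ in every ID they emit, which is why each node must hold a distinct worker ID.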

3. Persist canonical mappings

Store the source of truth in the database as short_code -> long_url, plus metadata like created_at, expires_at, user_id, and status.
Index short_code as the primary lookup key for fast redirects.
Optionally index long_url if you want duplicate detection or idempotent shortening for the same user.
Use read replicas for redirect-heavy traffic and shard by short code when a single database tier becomes a bottleneck.
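
The mapping table above can be sketched with SQLite as a stand-in for the production database. The table name `url_mapping`, the column types, and the helper names are assumptions. The primary key on `short_code` provides both the lookup index and the uniqueness constraint.

```python
import sqlite3
from typing import Optional


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        """
        CREATE TABLE url_mapping (
            short_code TEXT PRIMARY KEY,      -- indexed point-lookup key
            long_url   TEXT NOT NULL,
            user_id    TEXT,
            status     TEXT NOT NULL DEFAULT 'active',
            created_at INTEGER NOT NULL,      -- Unix seconds (assumption)
            expires_at INTEGER                -- NULL means no expiry
        )
        """
    )
    # Optional index for per-user duplicate detection.
    conn.execute("CREATE INDEX idx_user_long_url ON url_mapping (user_id, long_url)")


def insert_mapping(conn, short_code, long_url, user_id, created_at, expires_at=None) -> bool:
    """Insert a mapping; return False if the short code is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO url_mapping (short_code, long_url, user_id, created_at, expires_at) "
                "VALUES (?, ?, ?, ?, ?)",
                (short_code, long_url, user_id, created_at, expires_at),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def find_long_url(conn, short_code: str) -> Optional[str]:
    """Point read by short code, restricted to active links."""
    row = conn.execute(
        "SELECT long_url FROM url_mapping WHERE short_code = ? AND status = 'active'",
        (short_code,),
    ).fetchone()
    return row[0] if row else None
```

A duplicate insert fails at the database rather than in application code, so uniqueness still holds when several writers race on the same alias.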

4. Add cache with explicit TTL and eviction

Use Redis only for hot redirect mappings: cache key = short_code, cache value = long_url.
Set a TTL on every cached item, with a practical default like 24 to 48 hours for active links.
If the link itself has an expiry, set the Redis TTL to match expires_at so expired links auto-evict and are not served stale.
Set a memory cap and an explicit eviction policy, preferably LRU for this workload, so cold entries are removed under memory pressure and cache bloat stays bounded.
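
The TTL rules above can be read as "use the default, but never outlive the link". A small sketch of that interpretation follows. The function name `cache_ttl_seconds` and the 24-hour default are assumptions, and capping at the default (rather than always caching until `expires_at`) is a design choice made here, not something the rules above mandate.

```python
from typing import Optional

DEFAULT_TTL_SECONDS = 24 * 3600  # assumed default, at the low end of the 24 to 48 hour range


def cache_ttl_seconds(
    now: float,
    expires_at: Optional[float],
    default_ttl: int = DEFAULT_TTL_SECONDS,
) -> int:
    """Return the cache TTL in seconds; 0 means the link is expired and must not be cached."""
    if expires_at is None:
        return default_ttl
    remaining = int(expires_at - now)
    # Never cache past the link's own expiry, and never longer than the default.
    return max(0, min(default_ttl, remaining))
```

In Redis, the memory cap and eviction policy correspond to the `maxmemory` and `maxmemory-policy` settings, for example `allkeys-lru`.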

5. Serve redirects through cache-first lookup

On GET /{shortCode}, check Redis first.
On a cache hit, return the redirect immediately without touching the database.
On a cache miss, read from the database, validate the link is active and unexpired, then populate Redis with the correct TTL.
On link disable, delete, or expiry, invalidate the cache entry immediately so stale targets are not served.
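
The lookup steps above can be sketched as follows. An in-memory `TTLCache` class stands in for Redis and a plain dict stands in for the database, so the names `TTLCache`, `resolve`, and `disable_link` and the record fields are all hypothetical.

```python
import time
from typing import Optional


class TTLCache:
    """In-memory stand-in for Redis: stores (value, deadline) pairs."""

    def __init__(self):
        self._data = {}

    def get(self, key, now):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, deadline = entry
        if now >= deadline:
            del self._data[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key, value, ttl, now):
        self._data[key] = (value, now + ttl)

    def delete(self, key):
        self._data.pop(key, None)


def resolve(short_code, cache, db, now=None, default_ttl=86400) -> Optional[str]:
    """Return the long URL for a short code, or None if missing, disabled, or expired."""
    now = time.time() if now is None else now
    cached = cache.get(short_code, now)
    if cached is not None:
        return cached  # cache hit: the database is not touched
    record = db.get(short_code)  # cache miss: fall back to the source of truth
    if record is None or record["status"] != "active":
        return None
    expires_at = record.get("expires_at")
    if expires_at is not None and expires_at <= now:
        return None
    ttl = default_ttl if expires_at is None else min(default_ttl, expires_at - now)
    cache.set(short_code, record["long_url"], ttl, now)
    return record["long_url"]


def disable_link(short_code, cache, db) -> None:
    """Mark the link disabled and invalidate the cache entry immediately."""
    if short_code in db:
        db[short_code]["status"] = "disabled"
    cache.delete(short_code)
```

A caller would turn a non-None result into an HTTP redirect and a None result into a 404 or 410 response.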

6. Control abuse and spikes

Apply rate limiting at the gateway, with stricter limits on POST /shorten than on redirects.
Use per-IP limits for anonymous users and per-user or per-API-key limits for authenticated users.
Protect custom alias creation separately because it is easier to abuse than generated IDs.
During spikes, prefer graceful degradation: keep redirects available, shed low-priority analytics or background work, and return 429 for abusive write traffic.
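
Per-client limiting as described above is often implemented with a token bucket. The sketch below is one such approach under stated assumptions: the class names, the rate and capacity values, and the choice of token bucket over other algorithms (such as sliding windows) are illustrative, and a production gateway would typically keep this state in a shared store rather than in process memory.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, capped at the bucket capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """One bucket per client key (an IP address, user ID, or API key)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self._buckets = {}

    def check(self, client_key: str, now: float = None) -> int:
        """Return the HTTP status to use: 200 if allowed, 429 if limited."""
        now = time.monotonic() if now is None else now
        bucket = self._buckets.get(client_key)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self._buckets[client_key] = bucket
        return 200 if bucket.allow(now) else 429
```

Stricter limits on writes than on redirects can be expressed by creating two limiters with different parameters, for example a small capacity for the shorten endpoint and a much larger one for redirects.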

7. Handle failures explicitly

If an app instance fails health checks, the load balancer stops routing to it.
If Redis is unavailable, fall back to the database for redirects, but expect higher latency and watch database load carefully.
If one ID generator fails, other generators continue using their assigned worker IDs or preallocated ranges.
If a generator loses coordination or lease ownership, it must stop issuing IDs immediately to avoid duplicate short codes.
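
The last rule, stopping as soon as lease ownership is lost, can be sketched for the preallocated-range variant. The names `RangeAllocator` and `LeaseLost` are hypothetical, and the lease is modeled as a simple deadline that some external coordinator is assumed to renew. Real fencing would also have the storage layer reject writes carrying a stale epoch.

```python
class LeaseLost(Exception):
    """Raised when the allocator may no longer issue IDs."""


class RangeAllocator:
    """Issues IDs from a preallocated range only while its lease is valid."""

    def __init__(self, start: int, end: int, lease_deadline: float):
        self._next = start
        self._end = end  # exclusive upper bound of the owned range
        self._lease_deadline = lease_deadline

    def renew(self, new_deadline: float) -> None:
        # Called after the (assumed) coordinator confirms continued ownership.
        self._lease_deadline = new_deadline

    def next_id(self, now: float) -> int:
        if now >= self._lease_deadline:
            # Another node may already own this range: stop immediately.
            raise LeaseLost("lease expired; refusing to issue IDs")
        if self._next >= self._end:
            raise RuntimeError("range exhausted; request a new range")
        value = self._next
        self._next += 1
        return value
```

Failing closed costs a brief pause in link creation on one node, while redirects and the other generators are unaffected.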