The URL shortening service exposes RESTful APIs that allow clients to create shortened URLs, retrieve original URLs, and optionally view analytics data. All APIs communicate over HTTP. Request and response bodies use JSON, except for the redirect endpoint, which answers with an HTTP redirect.
This endpoint generates a shortened URL for a given long URL.
Endpoint
POST /api/v1/urls
Request Body
{
"long_url": "https://example.com/articles/system-design",
"custom_alias": "my-link",
"expiration_date": "2027-01-01"
}
Parameters
| Field | Description |
|---|---|
| long_url | Original URL that needs to be shortened |
| custom_alias | Optional user-defined short code |
| expiration_date | Optional expiry date for the shortened URL |
Response
{
"short_url": "https://short.ly/abc123",
"short_code": "abc123",
"created_at": "2026-03-11T12:00:00Z"
}
Explanation
The client sends a long URL to the API server.
If custom_alias is provided and not already taken, the server uses it as the short code; otherwise it generates a unique short code using the ID generator.
The mapping between the short code and long URL is stored in the database.
The shortened URL is returned to the client.
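The creation steps above can be sketched in Python as follows. This is a minimal, single-process illustration under stated assumptions: an in-memory dict (`url_table`) stands in for the database, `itertools.count` stands in for the ID generator, and `create_short_url` and `AliasTakenError` are hypothetical names. In a web framework, `ValueError` would typically map to 400 Bad Request and `AliasTakenError` to 409 Conflict.

```python
import itertools
from datetime import datetime, timezone
from urllib.parse import urlparse

# Base62 alphabet in the order given in this document (a-z, A-Z, 0-9).
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
BASE_URL = "https://short.ly/"  # taken from the example response

_next_id = itertools.count(1)  # hypothetical stand-in for the ID generator
url_table = {}                 # hypothetical stand-in for the url_mapping table


class AliasTakenError(Exception):
    """Raised when a requested custom alias already exists (hypothetical)."""


def base62_encode(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def create_short_url(long_url, custom_alias=None, expiration_date=None):
    # 1. Validate the input: require an absolute http(s) URL.
    parsed = urlparse(long_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("long_url must be an absolute http(s) URL")

    # 2. Choose the short code: the custom alias if free, else a generated one.
    if custom_alias is not None:
        if custom_alias in url_table:
            raise AliasTakenError(custom_alias)
        short_code = custom_alias
    else:
        short_code = base62_encode(next(_next_id))
        # Skip generated codes that an earlier custom alias already claimed.
        while short_code in url_table:
            short_code = base62_encode(next(_next_id))

    # 3. Store the mapping (the database write in the real system).
    created_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    url_table[short_code] = {
        "long_url": long_url,
        "created_at": created_at,
        "expiration_date": expiration_date,
    }

    # 4. Return a body shaped like the example response.
    return {
        "short_url": BASE_URL + short_code,
        "short_code": short_code,
        "created_at": created_at,
    }
```

The check-then-insert on `url_table` is not atomic across servers; in the real system the database's unique constraint on short_code remains the final guard.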
This endpoint resolves a short URL and redirects the user to the original destination.
Endpoint
GET /{short_code}
Example request:
GET /abc123
Response Behavior
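HTTP/1.1 302 Found
Location: https://example.com/articles/system-design
A temporary (302) redirect is used rather than a permanent 301 because browsers may cache a 301 and skip the service on later visits, which would prevent those clicks from being counted.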
Explanation
The API server receives the short code.
It first checks the cache (Redis) for the mapping.
If found, the user is redirected immediately.
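If not found, the database is queried, the cache is updated, and the user is redirected.
If the short code does not exist in the database or has passed its expiration date, the server returns an error such as 404 Not Found.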
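A minimal sketch of the redirect path, assuming plain dicts in place of Redis and the database and a hypothetical `Response` type in place of a web framework's response object. It follows the cache-aside order described above and checks `expiration_date` on every request, since a cached entry could otherwise outlive its expiry. Returning 404 for unknown or expired codes is an assumption; 410 Gone is another reasonable choice for expired links.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

cache = {}     # hypothetical stand-in for Redis
database = {   # hypothetical stand-in for the url_mapping table
    "abc123": {
        "long_url": "https://example.com/articles/system-design",
        "expiration_date": None,
    },
}


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)


def resolve(short_code: str, today: Optional[date] = None) -> Response:
    today = today or date.today()
    entry = cache.get(short_code)
    if entry is None:
        # Cache miss: read from the database and populate the cache.
        entry = database.get(short_code)
        if entry is None:
            return Response(404)
        cache[short_code] = entry
    # Assumption: a link stops working on its expiration_date.
    expires = entry["expiration_date"]
    if expires is not None and today >= expires:
        return Response(404)
    return Response(302, {"Location": entry["long_url"]})
```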
This endpoint retrieves metadata associated with a shortened URL.
Endpoint
GET /api/v1/urls/{short_code}
Example request:
GET /api/v1/urls/abc123
Response
{
"short_code": "abc123",
"long_url": "https://example.com/articles/system-design",
"created_at": "2026-03-11T12:00:00Z",
"click_count": 1024
}
This endpoint provides statistics for a shortened URL.
Endpoint
GET /api/v1/urls/{short_code}/stats
Response
{
"short_code": "abc123",
"click_count": 1024,
"last_accessed": "2026-03-11T14:30:00Z"
}
Explanation
The analytics service tracks how many times a shortened URL has been accessed and when it was last accessed. This information can be used for monitoring and reporting.
This endpoint allows a user to delete a shortened URL.
Endpoint
DELETE /api/v1/urls/{short_code}
Response
{
"message": "URL deleted successfully"
}
Explanation
The API server deletes the mapping from the database and invalidates the corresponding cache entry.
The API servers are stateless, allowing horizontal scaling.
All endpoints are designed to be lightweight to minimize latency.
Rate limiting can be implemented to prevent abuse of the URL creation endpoint.
Caching is used for redirect operations to reduce database load.
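The notes above leave the rate-limiting technique open; one common choice is a token bucket per client. The sketch below assumes that technique, a hypothetical `allow_create` helper keyed by a client identifier (for example an API key or IP address), and illustrative limits of a burst of 10 requests with 1 request per second sustained.

```python
import time


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets = {}  # client_id -> TokenBucket (in-process only)


def allow_create(client_id: str) -> bool:
    """Hypothetical gate for POST /api/v1/urls; limits are illustrative."""
    if client_id not in buckets:
        buckets[client_id] = TokenBucket(capacity=10, rate=1.0)
    return buckets[client_id].allow()
```

These buckets live in one process. With several stateless API servers behind a load balancer, the counters would need to be kept in a shared store such as Redis so that a limit applies across all servers.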
When many users create shortened URLs simultaneously, the system must ensure that generated short codes remain unique. To avoid collisions, each short code is derived from a unique numeric ID, obtained either from an auto-incrementing counter stored in the database or from a distributed ID generator. The ID is then encoded in Base62 to produce the short code.
Example flow:
Client → API Server → ID Generator → Base62 Encoder → Database
Because the IDs are unique and Base62 encoding maps each integer to a distinct string, the resulting short codes are also unique. This eliminates the repeated database lookups that random string generation would need to detect collisions.
To handle very high concurrency, the ID generator can allocate ID ranges to API servers so that each server can generate IDs independently without constantly querying the database.
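A minimal sketch of range allocation, assuming a hypothetical `CentralCounter` in place of the database counter (in a real deployment `reserve` would be a single atomic database update) and a block size of 1000 chosen only for illustration. Each API server keeps a `RangeIdAllocator` and contacts the counter only when its current block runs out.

```python
import threading


class CentralCounter:
    """Hypothetical stand-in for the shared counter (e.g. a database row)."""

    def __init__(self, start: int = 1):
        self._next = start
        self._lock = threading.Lock()

    def reserve(self, size: int) -> range:
        # Atomically hand out the next block of `size` IDs.
        with self._lock:
            start = self._next
            self._next += size
        return range(start, start + size)


class RangeIdAllocator:
    """Per-API-server allocator that serves IDs from a locally reserved block."""

    def __init__(self, counter: CentralCounter, block_size: int = 1000):
        self._counter = counter
        self._block_size = block_size
        self._ids = iter(())
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            allocated = next(self._ids, None)
            if allocated is None:
                # Local block exhausted: one round trip to reserve a new one.
                self._ids = iter(self._counter.reserve(self._block_size))
                allocated = next(self._ids)
            return allocated
```

If a server crashes, the unused part of its block is simply skipped. This leaves gaps in the ID sequence but never produces duplicates, which is acceptable because short codes only need to be unique, not contiguous.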
The system must remain reliable even if the ID generator fails or multiple generators accidentally produce overlapping IDs (split-brain). To prevent this, the generator service runs in a replicated configuration with a leader election mechanism.
If the primary generator fails, a replica is elected as the new leader and continues issuing IDs.
To avoid split-brain situations, generators coordinate through a distributed consensus system (for example, ZooKeeper or etcd) that performs leader election. Each generator instance is responsible for a distinct range of IDs, so no two instances can issue the same ID.
Additionally, the database enforces a unique constraint on the short_code column to guarantee that duplicate short codes cannot be inserted.
A caching layer is introduced to reduce database load and improve redirect latency. The system uses Redis as an in-memory cache.
Redirect request flow:
Client → Load Balancer → API Server → Cache
If the short URL exists in cache:
Cache Hit → Redirect immediately
If the short URL is not in cache:
Cache Miss → Query Database → Update Cache → Redirect
This approach ensures that frequently accessed URLs are served quickly while still maintaining correctness when data is not present in cache.
Cache entries include a time-to-live (TTL) value to prevent stale data from remaining in cache indefinitely.
Example:
short_code → long_url (TTL = 24 hours)
When the TTL expires, the next request results in a cache miss and the data is fetched from the database again. This ensures that cache entries remain fresh while keeping memory usage manageable.
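If many cache entries are written at the same time with the same TTL, they also expire together and send a sudden burst of misses to the database (a form of cache stampede). Adding a small random variation (jitter) to each TTL spreads these expirations out.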
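Jittered TTLs can be computed with a small helper like the one below. The ±10% spread and the `ttl_with_jitter` name are assumptions for illustration; the 24-hour base comes from the example above.

```python
import random

BASE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, as in the example above
JITTER_FRACTION = 0.10           # assumed: vary each TTL by up to +/-10%


def ttl_with_jitter(base=BASE_TTL_SECONDS, fraction=JITTER_FRACTION, rng=None):
    """Return base plus or minus a random offset of up to fraction * base seconds."""
    rng = rng or random
    spread = int(base * fraction)
    return base + rng.randint(-spread, spread)
```

With the redis-py client, the result could be passed as the expiry when writing an entry, for example `r.set(short_code, long_url, ex=ttl_with_jitter())`.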
When a URL mapping is updated or deleted, the corresponding cache entry must be invalidated to ensure consistency.
The system uses a cache invalidation mechanism where the API server deletes the relevant key from Redis after updating the database.
Flow:
Update request → Database updated → Cache key deleted
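The update-then-delete order can be sketched with two dicts standing in for the database and Redis (an assumption for illustration); `read`, `update_mapping`, and `delete_mapping` are hypothetical helpers. With Redis, the cache step would be a `DEL` on the key.

```python
database = {"abc123": "https://example.com/article"}  # stand-in for the DB
cache = {"abc123": "https://example.com/article"}     # stand-in for Redis


def read(short_code):
    """Cache-aside read: serve from cache, else load from the DB and cache it."""
    if short_code in cache:
        return cache[short_code]
    long_url = database.get(short_code)
    if long_url is not None:
        cache[short_code] = long_url
    return long_url


def update_mapping(short_code, new_long_url):
    if short_code not in database:
        raise KeyError(short_code)
    database[short_code] = new_long_url  # 1. update the source of truth first
    cache.pop(short_code, None)          # 2. then delete the cached key


def delete_mapping(short_code):
    database.pop(short_code, None)
    cache.pop(short_code, None)
```

Deleting the key, rather than writing the new value into the cache, lets the next read repopulate it from the database. A narrow race remains (a concurrent reader can load the old value just before the update and write it back after the delete), and the TTL bounds how long such a stale entry can survive.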
Alternatively, if API servers also keep local in-process caches, a publish-subscribe mechanism (for example, Redis Pub/Sub) can broadcast invalidation events so that every subscribed server removes the outdated entry from its local cache.
The system is designed to scale horizontally so that it can handle increasing traffic.
Key design principles:
Stateless API Servers
API servers do not store session data locally, allowing new servers to be added easily behind the load balancer.
Client → Load Balancer → Multiple API Servers
New API instances can be added dynamically as traffic increases.
Distributed Database
The URL database can be partitioned across multiple database nodes using sharding.
Example:
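Shard 1 → short codes starting with 0–9 or A–M
Shard 2 → short codes starting with N–Z or a–z
This allows the system to store billions of URLs while distributing the load across multiple machines. Because counter-based codes are issued in sequence, consecutive new codes tend to share their leading characters and land on the same shard; hashing the short code to choose a shard spreads writes more evenly.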
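Two ways to route a short code to a shard, sketched under assumptions. `range_shard` uses a hypothetical list of split points compared in ASCII order (0–9 < A–Z < a–z), so every possible code lands on exactly one shard: with the single split point `"N"`, codes that sort before it (including all codes starting with 0–9 or A–M) go to shard 1 and the rest to shard 2. `hash_shard` hashes the code instead, spreading consecutive codes across shards. MD5 is used here only as a stable, well-distributed hash, not for security.

```python
import bisect
import hashlib

# Range partitioning: sorted split points compared in ASCII order.
# Codes sorting before "N" go to shard 1, the rest to shard 2 (hypothetical).
SPLIT_POINTS = ["N"]


def range_shard(short_code: str) -> int:
    return bisect.bisect_right(SPLIT_POINTS, short_code) + 1


def hash_shard(short_code: str, num_shards: int) -> int:
    # A stable hash: Python's built-in hash() is randomized per process for str.
    digest = hashlib.md5(short_code.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards + 1
```

Changing `num_shards` in `hash_shard` reassigns most codes because of the modulo; consistent hashing is the usual remedy when shards are added often.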
Read Replicas
To improve read performance for redirect operations, additional read replicas can be introduced.
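Writes: API Servers → Primary Database
Reads: API Servers → Read Replicas (kept in sync by replication from the primary)
Write operations go to the primary database, while read operations can be distributed across replicas. Because replication is usually asynchronous, a newly created short code may not yet be present on a replica; falling back to the primary on a replica miss avoids returning a false 404.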
To ensure high availability, multiple instances of each component are deployed.
Multiple API Servers
Multiple Cache Nodes
Replicated Database
If one server fails, traffic is automatically routed to healthy instances by the load balancer. Database replication keeps copies of the data on other nodes, so if the primary fails, a replica can be promoted to take its place.
API servers handle incoming HTTP requests from clients. They expose REST endpoints for creating short URLs and resolving short URLs to their original destination.
Responsibilities:
Validate incoming URLs
Generate short codes
Interact with cache and database
Handle redirects
Update analytics data
The API servers are designed to be stateless, meaning they do not store any session information locally. This allows multiple API server instances to run behind a load balancer and scale horizontally. When traffic increases, new API servers can be added without affecting existing ones.
Example request flow:
Client → Load Balancer → API Server
Stateless design ensures that requests can be routed to any server instance.
The short URL generator is responsible for producing unique short codes for each long URL.
To guarantee uniqueness and prevent collisions under high concurrency, the generator takes a unique numeric ID and encodes it with Base62.
Character set used:
a-z
A-Z
0-9
Example:
Numeric ID: 125
Base62 encoded: cb
This method produces compact short URLs: six characters already allow 62^6 ≈ 56.8 billion combinations, and seven allow about 3.5 trillion.
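A Base62 encoder and decoder using the character set in the order listed above (a–z, A–Z, 0–9). With this order, ID 125 encodes to `cb`, matching the example: 125 = 2 * 62 + 1, and index 2 is `c` while index 1 is `b`. The function names are illustrative.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
BASE = len(ALPHABET)  # 62
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(n: int) -> str:
    """Convert a non-negative integer ID to its Base62 short code."""
    if n < 0:
        raise ValueError("IDs must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])  # least significant digit first
    return "".join(reversed(chars))


def decode(code: str) -> int:
    """Convert a Base62 short code back to its integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + INDEX[ch]
    return n
```

Decoding is the exact inverse, so `decode(encode(n)) == n` for every non-negative integer.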
To support high traffic, the generator can allocate ID ranges to API servers so that each server can generate short codes independently without repeatedly querying a central database. This reduces contention during heavy write operations.
If the generator service becomes unavailable, another instance can take over using a leader election mechanism to ensure continuity.
The system uses Redis as the caching layer to reduce database load and improve response time.
The cache stores frequently accessed URL mappings:
short_code → long_url
Example:
abc123 → https://example.com/article
Redirect flow with caching:
Client → API Server → Cache
If the short code exists in cache:
Cache Hit → Return original URL immediately
If the short code does not exist in cache:
Cache Miss → Query database → Store result in cache → Redirect user
This strategy significantly reduces database queries for frequently accessed URLs.
The database stores the persistent mapping between short URLs and long URLs.
Example schema:
Table: url_mapping
short_code (primary key)
long_url
created_at
expiration_date
click_count
Example entry:
abc123 → https://example.com/article
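Because short_code is the primary key, the database already enforces its uniqueness: an attempt to insert a duplicate short code (for example, a custom alias that is already taken) is rejected.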
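The schema above can be tried out with Python's built-in `sqlite3` module, used here only as a stand-in for the production database; the column types and the `insert_mapping` helper are assumptions. The primary key on `short_code` makes a second insert of the same code fail with `sqlite3.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE url_mapping (
        short_code      TEXT PRIMARY KEY NOT NULL,
        long_url        TEXT NOT NULL,
        created_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        expiration_date TEXT,
        click_count     INTEGER NOT NULL DEFAULT 0
    )
""")


def insert_mapping(short_code, long_url, expiration_date=None) -> bool:
    """Insert a mapping; return False if the short code already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO url_mapping (short_code, long_url, expiration_date) "
                "VALUES (?, ?, ?)",
                (short_code, long_url, expiration_date),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Relying on the constraint, rather than checking for the code first and then inserting, also closes the race in which two servers try to claim the same custom alias at the same moment.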
The database can also maintain analytics information such as the number of times a short URL has been accessed.
The load balancer distributes incoming traffic across multiple API server instances.
Responsibilities:
Distribute client requests evenly
Prevent server overload
Improve fault tolerance
Example flow:
Client → Load Balancer → API Servers
If one API server becomes unavailable, the load balancer automatically routes traffic to healthy instances.
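A minimal round-robin balancer that skips instances marked unhealthy, sketched under assumptions: the server names are hypothetical, health checks are assumed to call `mark_down` and `mark_up`, and round-robin is only one of several possible strategies. Production load balancers usually provide this behavior through configuration rather than application code.

```python
import itertools


class RoundRobinBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        self.healthy.add(server)

    def pick(self):
        # Try each server at most once per call, skipping unhealthy ones.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy API servers")
```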
An optional analytics service tracks statistics such as click counts and access patterns.
When a redirect occurs:
API Server → Analytics Service → Store click data
To avoid increasing latency, analytics updates can be processed asynchronously using background workers.
This allows the system to collect useful metrics without affecting the speed of URL redirection.
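The asynchronous pattern can be sketched with the standard library's `queue.Queue` and a background thread. The redirect handler only calls the hypothetical `record_click`, which enqueues an event and returns, while `analytics_worker` applies the updates; `click_counts` is an in-memory stand-in for wherever click data is stored.

```python
import queue
import threading
from collections import Counter

click_events = queue.Queue()  # in-process stand-in for a message queue
click_counts = Counter()      # stand-in for the analytics store


def record_click(short_code: str) -> None:
    """Called on the redirect path: enqueue the event and return at once."""
    click_events.put(short_code)


def analytics_worker() -> None:
    """Background worker that drains click events and updates counts."""
    while True:
        short_code = click_events.get()
        try:
            if short_code is None:  # sentinel value: stop the worker
                return
            click_counts[short_code] += 1
        finally:
            click_events.task_done()


worker = threading.Thread(target=analytics_worker, daemon=True)
worker.start()
```

Across multiple API servers, a durable message queue (for example, Kafka) would typically replace the in-process queue so that events survive a server restart.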
Each component in the system is deployed with redundancy to ensure high availability.
Examples:
Multiple API servers
Replicated Redis cache nodes
Primary database with read replicas
If one instance fails, other instances continue handling requests.
Database replication ensures that data remains available even if the primary database becomes unavailable, because a replica can be promoted to primary.