Designing A Simple Url Shortening Service A TinyURL Approach - System Design

Requirements

Functional Requirements:

Create a short URL for a given long URL.
Return the long URL associated with a given short URL.

Non-Functional Requirements:

Low latency
high read write throughput
Horizontal Scalability for high traffic volumes
Low redirect latency
High availability

API Design

Mainly 2 api end points:

Post /url-shortener: Shortens the given long form url and return the shortened url as a response.

Headers: Include system generated cache tags for cache handling and idempotency

Body:

{

"url":

}

Response:

{

"shortened-url":

}

along with ETag for cache hits/miss

Post /url-elongate: returns the long form url when given the valid shortened url.

Body:

{

"shortUrl":

}

Response:

{

"longUrl":

}

along with e-tag

High-Level Design

Components required:

No-SQL DB for key value storage. Hashed form of shortened url as key and actual long url as value.
Cache for frequent urls
Hashing/Shortening service for shorting/elongating the url
Rate limiter to avoid exploitation
TTL service to invalidate cache and time out no longer used urls
Unique hash for collision avoidance mechanism
Horizontal scaling by consistent scaling
API gateway for request routing and auth
Load balancer for handling traffic distribution

Services implementation:

URL shortener service - shortens the url, creates hash for storage, ensures no duplication
Redirect service - short form url hashed and fetched from db, redirected to original url
Fetch long url - short url hashed matched and fetched from DB
Caching service - keep the most recent requested url as cache for faster fetching

DB Layer:

This layer handles data persistence, ensuring that URL mappings remain intact even in the event of system failures. It also supports scalability by allowing the system to handle a growing number of URL mappings as usage increases.

The storage layer typically utilizes a database or a distributed data store to maintain URL mappings. When a user creates a short URL, the service generates a unique identifier and stores it alongside the long URL in the database. For retrieval, when a short URL is accessed, the service queries the storage layer to fetch the associated long URL.

Entry layer (API Gateway):

The entry layer processes requests for both URL shortening and redirection. When a user submits a long URL, the entry layer forwards this request to the write path service, which handles the creation of the short URL. Conversely, when a user requests a short URL, the entry layer routes this to the read path service for resolution back to the original long URL.

This layer ensures that requests are efficiently managed and distributed across backend services, providing features like rate limiting, authentication, and logging. It can also implement caching strategies to reduce latency for frequently accessed URLs.

Detailed Component Design

DB layer:

Use a No- sql db, for key value storage e.g: Mongo DB

Easy to scale horizontally

why:

Unstructured data storage easy
Easy to scale horizontally
Key value pair storage
no complex queries

For collision avoidance:

Url hash id provides as a unique identifier for request urls. Whenever a new request comes in a url hash id is generated which is queried in the DB. If match is found return already exists along with the shortened url and do not write to DB
distributed ID generation system, such as Twitter's Snowflake or UUIDs, which can generate unique identifiers across multiple nodes. Snowflake generates a unique 64-bit integer based on a timestamp, machine ID, and a sequence number, ensuring that even under high load, each generated ID is unique. This method scales well horizontally, as multiple instances can generate IDs independently without risk of duplication.
Using a secret key the urls are hashed.
No two different URLs will have the same hash so no duplications.
In high concurrency scenarios use batch writes to DB. Only unique keys are written to DB.
Shortening url is done using Base62 encoding ensuring shorter lengths and no duplicates since hashed using a private key

For generator failure:

try switching traffic to other running instance of the service if available. (load balancer)
in case of all outage of service, try exponential back off giving the service some time to get back online.
in case of time out of exponential back off use circuit breaker to stop cascading failures and return appropriate error message to client

In case of network partition:

Stop writes to that instance and divert traffic to the working instance.
When the downed instance is back online maintain eventual consistency
sync up all the data writes from working instance to the partitioned one

Caching:

Frequently requested urls are cached for faster retrieval.
In case of cache miss send the request to service where entire Hashing and DB fetch operation takes place and after querying from the DB return the response and update it in the cache.
Set a time out of say 1min depending on the traffic and cache size and SLA after which the cache entry is invalidated.
For no longer used urls TTL them and remove from cache.
Once invalidated request has to go through DB query for response and again cached.
When original link cannot be reached or disabled invalidate the cache
Eviction refers to the process of removing data from a cache or storage when it is no longer needed or when storage limits are reached. TTL is a specific type of eviction policy where data is automatically removed after a predetermined period. For instance, if a shortened URL is not accessed for a month, it could be set to expire and be removed from the database.

API gateway:

Initial entry layer, routing traffic to services. Manages security and auth. Has rate limiting to prevent exploitation. Fallback mechanism for unavailable URLs.

Why:

Easy fault isolation
Separation of concerns
Logging and monitoring
Prevent api exploits

Rate limiting:

Since the application can have bursts of traffic leaky bucket algo can be used to maintain smooth flow.
Set a certain size of bucket based on user traffic
In case of service failures use exponential backoff for a set interval of time, if service still down switch to circuit breaker and show appropriate response message

Handling Hot ids:

One effective approach is to use caching layers, such as a Content Delivery Network (CDN) or an in-memory cache like Redis, to store frequently accessed short URLs. By caching these hot IDs, you can serve requests directly from the cache, significantly reducing the load on the underlying database and ensuring faster response times. This is particularly useful for URLs that experience sudden spikes in traffic, as the cache can absorb the increased demand without impacting the database's performance.

Another strategy is to implement read replicas for your database. By distributing read requests across multiple replicas, you can alleviate the load on the primary database instance. This allows the system to handle a higher volume of read requests for popular short URLs while maintaining write operations on the primary instance. It’s essential to ensure that the replicas are kept in sync with the primary database to provide accurate and up-to-date information.

Additionally, consider using sharding techniques based on URL patterns or user segments. By distributing hot IDs across multiple shards, you can balance the load more evenly. For instance, you could hash the short URL to determine which shard it belongs to, ensuring that requests for popular URLs are spread out rather than concentrated on a single shard.