Designing A Simple Url Shortening Service A TinyURL Approach - System Design

Requirements

Functional Requirements:

1. Users can submit a long URL and receive a shortened URL.

2. Users can access a short URL and get redirected to the original long URL.

3. Users can sign up/login and manage their own shortened URLs.

4. The system should track the number of clicks for each short URL.

5. The system should allow a user to view click count/analytics for their own URLs.

Non-Functional Requirements:

1. High availability:

The system should be highly available, targeting 99.9% uptime or higher.

2. Low latency:

Redirects should be fast, ideally under 200 ms.

3. Scalability:

The system should support millions of users initially and scale to billions of stored URLs over time.

4. Read-heavy optimization:

Since redirects are much more frequent than URL creation, the system should be optimized for read-heavy traffic.

5. Durability:

Once a short URL is created, the mapping should not be lost.

6. Global availability:

The system should serve users from different regions using multiple data centers, edge caching, or CDN where appropriate.

7. Security and abuse prevention:

The system should include rate limiting to prevent spam, bots, and abusive URL creation.

Capacity Estimation

Assume we have 2 million active users, and each user makes around 2 requests per day.

That gives us:

2,000,000 × 2 = 4,000,000 requests per day.

Since one day has 86,400 seconds:

4,000,000 / 86,400 ≈ 46 QPS average.

Traffic is not evenly distributed throughout the day. If we assume peak traffic is around 4–5x the average, then peak QPS will be around 200 QPS.

To allow room for future growth, I will design the system to handle around 400 QPS.

For short URL generation, I will use lowercase letters and digits for better readability.

That gives us:

26 lowercase letters + 10 digits = 36 possible characters.

If we use 6 characters per short code:

36^6 ≈ 2.17 billion possible combinations.

This is a large key space, but for long-term scalability I would prefer using 7 characters:

36^7 ≈ 78 billion possible combinations.

This gives us much more room for growth and reduces the risk of exhausting the key space.

For code generation, if we generate short codes randomly, we need to handle collisions by checking whether the generated code already exists in the database.

A better approach is to use a unique numeric ID from a database sequence or distributed ID generator, then encode that ID using Base36. Since the numeric ID is unique, the generated short code will also be unique, which helps avoid collisions.

Since redirect traffic is usually read-heavy, we should also use a cache layer later in the design to reduce database load.

API Design

get request for url then redirecting the user
post request to create a new url
delete request to delete a url
patch request to update the url

At a high level, the system will have clients, a load balancer, a rate limiter,

multiple stateless application servers, a cache layer, and a database.

The client sends requests to the load balancer. The load balancer distributes traffic

across multiple application servers so we don’t overload a single server.

Before the request reaches the application logic, we can apply rate limiting to protect

the system from abuse, spam, or sudden traffic spikes.

The application servers will handle two main APIs:

1. Create a short URL

2. Redirect a short URL to the original long URL

For creating a short URL, the app server stores the long URL in the database, generates

a unique short code using an ID generator and Base36 encoding, then returns the short URL

to the client.

For redirecting, the app server first checks Redis cache using the short code. If the

mapping exists in cache, it returns the long URL immediately. If not, it reads from the

database, stores the result in cache, and then redirects the user.

The database is the source of truth and stores the mapping between short_code and long_url.

The cache is used to reduce database load because redirect requests are read-heavy.

Database Design

Define the data model. Identify the main entities, their attributes, and relationships. Consider the choice of database type (SQL vs NoSQL) and justify your decision based on access patterns...

For the database design, the main entity is the URL mapping.
The urls table will contain:
- id
- short_code
- long_url
- created_at
- expires_at
- user_id, if we support user ownership later
I would store the short code, not the full short URL, because the domain can be generated by the application.
For the initial scale, I would start with a relational database like MySQL or PostgreSQL. The data model is simple, and SQL gives us strong consistency, unique constraints, indexing, and operational simplicity.
The most important access pattern is redirect lookup:
short_code -> long_url
So I will create a unique index on short_code.
I would not index long_url by default because long URLs can be large and we don’t usually query by the full long URL. If we need duplicate detection, I would store a long_url_hash and index that instead.
If user ownership is introduced later, I would add a users table and store user_id in the urls table. This keeps the model normalized. The redirect path still only needs short_code and long_url, so it remains fast.
If we expect massive scale from day one, then a NoSQL key-value database like DynamoDB would also be a good choice. The primary key would be short_code, and the value would contain long_url, user_id, created_at, and expires_at. This fits the main access pattern very well.
DynamoDB also handles partitioning internally based on the partition key, so we don’t need to manually manage shards. If we need to query URLs by user, we can add a secondary index on user_id and created_at.

Detailed Component Design

Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.

For the detailed component design, I will focus on the stateless application servers, the cache layer, and the database.

First, the application servers will be stateless. They will expose APIs like POST /shorten to create a short URL, GET /{short_code} to redirect users, and GET /analytics/{short_code} to show click statistics.

All requests go through the load balancer, which distributes traffic across multiple servers. Since the servers are stateless, any server can handle any request. This allows us to scale horizontally by adding more servers when traffic increases.

For authentication, we should avoid storing sessions in server memory. We can either use JWT tokens or store sessions in a shared store like Redis. This keeps the servers stateless.

Second, the cache layer will store hot URL mappings. The key will be the short_code and the value will be the long_url. For redirect requests, the app server checks Redis first. If the mapping exists, we return the redirect immediately. If not, we read from the database, cache the result, and then redirect.

This reduces database load and improves latency, especially because redirects are read-heavy. Since URL mappings are mostly immutable, caching works very well. If the cache grows, we can use Redis Cluster and shard keys by short_code hash. We can also set TTLs based on the URL expiration time.

Third, the database is the source of truth. Initially, I would use MySQL or PostgreSQL with a urls table containing id, short_code, long_url, user_id, created_at, and expires_at. The most important index is a unique index on short_code because redirect lookup depends on it.

At the initial scale, a relational database with proper indexing and Redis cache should be enough. We can scale reads using cache and read replicas. If the data size or write throughput grows significantly, we can introduce sharding by short_code hash or ID range.

If we expect massive scale from day one, DynamoDB is also a good option because the main access pattern is key-value lookup from short_code to long_url. DynamoDB handles partitioning internally using the partition key. The tradeoff is that we need to design access patterns upfront, and relational queries become harder.

For ID generation, I would avoid having multiple generators independently producing the same ID range.

One safe approach is to allocate ID ranges from a central strongly consistent store.

For example:

Generator A gets range 1 - 1,000,000

Generator B gets range 1,000,001 - 2,000,000

Generator C gets range 2,000,001 - 3,000,000

Each generator can generate IDs locally from its assigned range. If a generator goes down, another generator can request a new range from the central store.

To avoid split-brain, range allocation must be done using a strongly consistent mechanism, such as a database transaction, ZooKeeper/etcd, or a consensus-based service.

For disabled or edited links, the database remains the source of truth.

When a link is edited or disabled, we first update the database, then invalidate the related cache key in Redis.

For example:

DEL url:{short_code}

After that, the next redirect request will miss the cache, read the fresh value from the database, and repopulate Redis.

To reduce the risk of stale cache, cached entries will also have TTLs.

To prevent cache bloat, I would not cache every URL forever.

I would set a TTL on cached mappings, for example 24 hours by default.

If the URL has an expiration time, the cache TTL should not exceed the remaining lifetime of the URL.

For Redis eviction policy, I would use allkeys-lfu or allkeys-lru. LFU is a good fit because redirect traffic usually has hot links, and we want frequently accessed URLs to stay in cache while rarely used URLs are evicted.

We can also set a maximum Redis memory limit and monitor cache hit ratio, memory usage, and eviction rate.