Tinyurl and race conditions

Introduction

A URL shortener looks simple until multiple requests arrive at the same time. The core problem is concurrency: two workers may both conclude that the same short code is free, or both try to create a mapping for the same long URL. Unless the storage layer enforces uniqueness, the loser of that race can silently duplicate or overwrite a mapping.

Where the Race Condition Appears

A typical shortener does three things:

decide whether a long URL already has a short code
generate or reserve a new code if needed
save the mapping

The race happens when two requests interleave those steps. If both requests check first and insert second, they can both conclude that the code or URL is unused.

This pseudo-flow is unsafe:

request A checks whether code abc123 exists
request B checks whether code abc123 exists
both see nothing
both insert abc123

Even if your code generator is good, the check-then-insert sequence is still vulnerable under concurrency.
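
The interleaving above can be reproduced deterministically. The sketch below is only an illustration: it assumes a SQLite table with no unique constraint, and the helper name code_exists and the two example URLs are made up for the demonstration. Both "requests" run their check before either one inserts, which is the ordering a real race can produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Deliberately no UNIQUE constraint, to show what the check alone allows.
conn.execute("CREATE TABLE urls(long_url TEXT, short_code TEXT)")


def code_exists(conn, code):
    """Return True if any row already uses this short code."""
    row = conn.execute(
        "SELECT 1 FROM urls WHERE short_code = ?", (code,)
    ).fetchone()
    return row is not None


# Request A and request B both check before either of them inserts.
a_sees_free = not code_exists(conn, "abc123")
b_sees_free = not code_exists(conn, "abc123")

# Each request now acts on its stale answer.
if a_sees_free:
    conn.execute(
        "INSERT INTO urls VALUES (?, ?)", ("https://example.com/a", "abc123")
    )
if b_sees_free:
    conn.execute(
        "INSERT INTO urls VALUES (?, ?)", ("https://example.com/b", "abc123")
    )
conn.commit()

duplicates = conn.execute(
    "SELECT COUNT(*) FROM urls WHERE short_code = 'abc123'"
).fetchone()[0]
print(duplicates)  # 2: one short code now points at two different URLs
```

Nothing in the application code failed here, which is what makes the bug hard to notice: the damage only shows up later, when a redirect lookup finds two rows.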

The Correct Fix: Let the Database Enforce Uniqueness

The database should own uniqueness, not application timing. Create unique constraints on the short code and, if you want one canonical short link per long URL, on the long URL as well.

The runnable Python example below uses SQLite to show the pattern. It relies on a unique index and retries when the chosen code collides.

```python
import random
import sqlite3
import string

alphabet = string.ascii_letters + string.digits


def generate_code(length=6):
    # Random base62 code; uniqueness is enforced by the database, not here.
    return "".join(random.choice(alphabet) for _ in range(length))


def shorten(conn, long_url):
    # Fast path: reuse an existing code. This check is only an optimization;
    # the UNIQUE constraints are what make the function safe.
    row = conn.execute(
        "SELECT short_code FROM urls WHERE long_url = ?",
        (long_url,),
    ).fetchone()
    if row:
        return row[0]

    while True:
        code = generate_code()
        try:
            conn.execute(
                "INSERT INTO urls(long_url, short_code) VALUES(?, ?)",
                (long_url, code),
            )
            conn.commit()
            return code
        except sqlite3.IntegrityError:
            # Either another request stored this long_url first, or the
            # random code collided with an existing one.
            conn.rollback()
            existing = conn.execute(
                "SELECT short_code FROM urls WHERE long_url = ?",
                (long_url,),
            ).fetchone()
            if existing:
                return existing[0]
            # No row for this URL: the code collided, so loop and retry.


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls(long_url TEXT UNIQUE, short_code TEXT UNIQUE)")
print(shorten(conn, "https://example.com/products/42"))
print(shorten(conn, "https://example.com/products/42"))
```

The important point is not SQLite. The important point is that the insert is authoritative. If another request wins the race, the database rejects the conflicting insert and your code recovers cleanly.

Choosing an ID Strategy

There are several safe approaches:

random codes with a unique constraint and retry on collision
numeric ids from a sequence, then encode them in base62
deterministic mapping from long URL to a stable identifier

Sequence-based ids are easy to reason about because the database guarantees uniqueness. Random codes distribute well and hide volume better, but they still need collision handling. Deterministic hashing can work, but collisions and canonicalization rules become your responsibility.
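
To make the sequence-based option concrete, here is a minimal sketch of base62 encoding for a numeric id. The alphabet order (digits, then lowercase, then uppercase) and the function names encode_base62 and decode_base62 are assumptions chosen for illustration; any fixed 62-character alphabet works as long as encoding and decoding agree on it.

```python
import string

# Assumed alphabet order: digits, then lowercase, then uppercase.
BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n):
    """Encode a non-negative integer id as a base62 string."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return BASE62[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(BASE62[rem])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))


def decode_base62(code):
    """Invert encode_base62."""
    n = 0
    for ch in code:
        n = n * 62 + BASE62.index(ch)
    return n


print(encode_base62(125))  # "21", since 125 = 2 * 62 + 1
```

In this scheme the id itself would come from the database (a sequence or an auto-incrementing key), so two requests can never receive the same value and no collision retry is needed. The trade-off noted above still applies: consecutive ids produce predictable, enumerable codes.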

The Real Design Question

The hard question is often not “how do I generate a code” but “do I want the same long URL to always return the same short URL.” If yes, make long_url unique and return the existing row when it is already present. If no, allow duplicate long URLs and only enforce uniqueness on the short code.

That business rule changes the schema and the race-condition handling.
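
The effect of that rule on the schema can be sketched with two SQLite tables that differ only in whether long_url is unique. The helper names create_schema and try_insert and the sample codes are hypothetical; the point is only which inserts the database accepts under each rule.

```python
import sqlite3


def create_schema(conn, one_code_per_url):
    # short_code is always unique. long_url is unique only under the
    # "same long URL always returns the same short URL" rule.
    if one_code_per_url:
        long_url_col = "long_url TEXT NOT NULL UNIQUE"
    else:
        long_url_col = "long_url TEXT NOT NULL"
    conn.execute(
        f"CREATE TABLE urls({long_url_col}, short_code TEXT NOT NULL UNIQUE)"
    )


def try_insert(conn, long_url, code):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO urls(long_url, short_code) VALUES(?, ?)",
            (long_url, code),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False


canonical = sqlite3.connect(":memory:")
create_schema(canonical, one_code_per_url=True)
many = sqlite3.connect(":memory:")
create_schema(many, one_code_per_url=False)

url = "https://example.com/products/42"
canonical_results = [
    try_insert(canonical, url, "aaa111"),
    try_insert(canonical, url, "bbb222"),
]
many_results = [
    try_insert(many, url, "aaa111"),
    try_insert(many, url, "bbb222"),
]
print(canonical_results)  # [True, False]
print(many_results)       # [True, True]
```

Under the one-code-per-URL rule, a rejected insert can mean either "this URL already has a code" or "this code is taken", so the recovery path has to look up the URL before retrying. Under the many-codes rule, a rejection can only mean a code collision, and a plain retry is enough.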

Common Pitfalls

The classic mistake is solving concurrency with an in-memory lock in one application process. That works only until you run multiple instances. The database constraint is the only shared authority all workers can rely on.
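
As a rough illustration of the constraint acting as the shared authority, the sketch below opens two separate SQLite connections to the same file, standing in for two application instances that share no memory and therefore no in-process lock. The names worker_a, worker_b, and claim are hypothetical, and the two calls run one after the other rather than truly in parallel.

```python
import os
import sqlite3
import tempfile

# A file-backed database so that two connections see the same data.
db_path = os.path.join(tempfile.mkdtemp(), "urls.db")

worker_a = sqlite3.connect(db_path)
worker_b = sqlite3.connect(db_path)
worker_a.execute(
    "CREATE TABLE urls(long_url TEXT UNIQUE, short_code TEXT UNIQUE)"
)
worker_a.commit()


def claim(conn, long_url, code):
    """Try to store the mapping; return the code the database holds for the URL.

    Returns None if the insert failed but the URL has no row, which means
    the short code itself collided and the caller should retry.
    """
    try:
        conn.execute(
            "INSERT INTO urls(long_url, short_code) VALUES(?, ?)",
            (long_url, code),
        )
        conn.commit()
        return code
    except sqlite3.IntegrityError:
        conn.rollback()
        row = conn.execute(
            "SELECT short_code FROM urls WHERE long_url = ?",
            (long_url,),
        ).fetchone()
        return row[0] if row else None


url = "https://example.com/products/42"
code_a = claim(worker_a, url, "aaa111")
code_b = claim(worker_b, url, "bbb222")
print(code_a, code_b)  # aaa111 aaa111

worker_a.close()
worker_b.close()
```

Neither worker coordinates with the other, yet both end up returning the same code, because the second insert is rejected by the database rather than by anything in application memory.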

Another mistake is checking first and assuming the later insert is safe. That turns the race into a timing problem instead of a data-integrity rule.

Teams also forget URL normalization. https://example.com and https://example.com/ may or may not be the same URL for your product. Decide before enforcing uniqueness on the original string.
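
One possible normalization policy is sketched below using Python's urllib.parse. It is an assumption, not a recommendation for every product: it lowercases the scheme and host, drops the default port, and treats an empty path as "/", so the two example URLs above become equal. It deliberately ignores user info and IPv6 host literals, and normalize_url is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Return a canonical form of url under the policy described above."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the default for the scheme.
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    # Under this policy, an empty path and "/" are the same resource.
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


print(normalize_url("https://example.com"))            # https://example.com/
print(normalize_url("HTTPS://Example.COM:443/a?b=1"))  # https://example.com/a?b=1
```

Whatever policy is chosen, it would be applied before the lookup and before the insert, so that the unique constraint on long_url compares normalized strings rather than raw user input.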

Summary

URL shorteners are concurrency problems as much as they are encoding problems.
The unsafe pattern is check first, then insert without a uniqueness constraint.
Put unique constraints in the database and treat insert failures as expected outcomes.
Choose whether one long URL maps to one short URL or many before designing the schema.
Application-level locks are not enough once the system runs on multiple processes or servers.