Designing A Simple Url Shortening Service A TinyURL Approach - System Design

System requirements

Functional:

user submits long url, gets short url back
user navigates to long url, gets redirected to short url
track analytics for clicked links
custom short url value

out of scope:

editing previous redirects
authentication
display analytics

Non-Functional:

List non-functional requirements for the system...

high availability >> consistency
fast creation and resolution of redirects, < 100ms
scalable to handle surges in traffic and growth

Capacity estimation

dau

500M internet users
1% user url shorteners
20% market share
1 url shortened per day
1M DAU for writes
10 tps
10:1 read/write ratio
100 qps

storage

1M redirects saved per day
1K for redirect & short url etc
1GB data per day
*2 for redundancy
*2 for backups
400 day/year
1.6 TB per year

API design

create redirect:

POST /api/v1/redirect/create

body: {

long_url: string,

short_path?: string,

}

response: {

short_path: string,

}

error: 409, short_path already exists

GET /:short_path

=> 302 temporary redirect to long url

error => 404 not found

note: 302 instead of 301 for better analytics. trade-off higher load

Database design

short_url

short_path: string (primary key)
long_url: string
click_count: int

High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design...

Request flows

create redirect:

POST /api/v1/redirect/create
API Gateway -> LB -> Shorten Service
generate short_path if not provided
write to DB if not exist
success?
reply to user
fail?
error message to user if custom short_path
retry N times if generated short_path, else error message to user

consume redirect

GET /:short_path
AG -> LB -> Redirect Service
aggregate click count

Detailed component design

unique short_url generation:

use lower case + digits for easy typing: 26+10 = base 36
8 chars at base 36 will give about 3T possibilities
3M days at 1M per day
=> low chance of conflict and retries
can ensure uniqueness across nodes by generating from:
8 bits for server id
40 bits for timestamp
16 bits for increment reset every second
=> convert to base 36

scale write speeds:

for better write speeds can partition or shard DB on :short_path
when sharding, we can partition by hash ranges to avoid short_path conflicts across shards

scale read speeds:

read replicas
cache for redirect service

click aggregator:

send click info to durable stream like kafka
worker aggregates clicks for short urls and batch updates click_count in db

Trade offs/Tech choices

postgres for DB, optimized for read vs writes.
key-value store like DynamoDB could've also easily worked
SQL allows transactions or more complex queries for future updates
data is highly structured and app is read-heavy, also suitable for b-tree DB

hash

only lower case. upper case would give base 62 and allow for shorter urls
but only 1 character shorter and much worse UX for user, especially mobile bc random uppercase chars can be tedious to type

Availability >> Consistency for reads

eventual consistency is acceptable for handling redirects
unavailable service would have much worse UX than temporarily stale data

Consistency >> Availability for writes

not acceptable to ignore :short_path conflict
users should have confidence that their :short_path wasn't reused by other users
reads are eventually consistent, which is a good trade-off since it allows higher availability.

Failure scenarios/bottlenecks

write speeds can be a bottle neck
prioritizing availability over consistency, users can get temporarily stale data

Future improvements

can improve write speeds with partitioning
allow editing urls
more advanced analytics