Designing a Simple URL Shortening Service: A TinyURL Approach - System Design

Requirements

Functional Requirements:

Create a short URL for a given long URL.
Return the long URL associated with a given short URL.

Non-Functional Requirements:

Low latency and high reliability.

API Design

POST api/v1/shorten/:longUrl

-> 201 Created

GET api/v1/long/:shortUrl

-> 301 Moved Permanently
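
The two endpoints can be sketched as plain functions that return a status code and a body. This is a minimal illustration only: the class name InMemoryShortener, the response shapes, and the 404 case are assumptions, a dict stands in for the database, and the short code is a zero-padded counter used as a placeholder rather than the base-58 scheme described later.

```python
from itertools import count


class InMemoryShortener:
    """Toy stand-in for the service; a dict replaces the database."""

    def __init__(self):
        self._ids = count(1)   # placeholder ID source
        self._store = {}       # short code -> long URL

    def shorten(self, long_url):
        """Mirrors POST api/v1/shorten: returns (status, body)."""
        code = f"{next(self._ids):08d}"  # placeholder code, not base 58
        self._store[code] = long_url
        return 201, {"shortUrl": code}

    def resolve(self, short_code):
        """Mirrors GET api/v1/long/:shortUrl: returns (status, headers)."""
        long_url = self._store.get(short_code)
        if long_url is None:
            return 404, {}  # assumption: unknown codes return 404
        return 301, {"Location": long_url}


service = InMemoryShortener()
status, body = service.shorten("https://example.com/some/long/path")
print(status, body)
print(service.resolve(body["shortUrl"]))
```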

High-Level Design

Here we mainly need to focus on two aspects: shortening a URL and retrieving the original URL.

We assume 10k writes per second on average, with a peak of 100k.

Then

10,000 * 3600 * 24 = 864 million URLs/day

about 26 billion/month

about 315 billion/year
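
The write estimate can be reproduced with a few lines of arithmetic. The sketch assumes a 30-day month and a 365-day year; the variable names are illustrative.

```python
WRITES_PER_SECOND = 10_000
SECONDS_PER_DAY = 3600 * 24

urls_per_day = WRITES_PER_SECOND * SECONDS_PER_DAY
urls_per_month = urls_per_day * 30   # assumption: 30-day month
urls_per_year = urls_per_day * 365   # assumption: 365-day year

print(f"{urls_per_day:,} URLs/day")
print(f"{urls_per_month:,} URLs/month")
print(f"{urls_per_year:,} URLs/year")
```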

URL shortener:

We will use a 58-character alphabet that leaves out the visually confusing characters. We need to cover roughly 1T URLs, so an 8-character short URL is enough: 58^8 gives about 128T combinations.
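
As a quick check of the key space, the sketch below builds one possible 58-character alphabet and computes 58^8. The exact characters dropped (0, O, I and l, as in the Bitcoin-style Base58 alphabet) are an assumption; the text only says that confusing characters are left out.

```python
import string

# Assumption: drop 0, O, I and l, the usual look-alike characters.
CONFUSING = "0OIl"
ALPHABET = "".join(
    c
    for c in string.digits + string.ascii_uppercase + string.ascii_lowercase
    if c not in CONFUSING
)

SHORT_URL_LENGTH = 8
keyspace = len(ALPHABET) ** SHORT_URL_LENGTH

print(len(ALPHABET))     # number of usable characters
print(f"{keyspace:,}")   # number of distinct 8-character codes
```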

Storage and DB:

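Let's assume URLs are split into temporary and permanent ones. Assume around 250B URLs have a maximum lifetime of 1 year and around 75B per year have a maximum lifetime of 10 years.

Assuming 500 bytes per URL.

So at any one time we need storage for up to

250B + 750B = 1T URLs (75B permanent URLs per year, kept for 10 years, add up to 750B),

which is 500 TB.

We assume 5 TB of storage per DB node, so 100 DB nodes are needed.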
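
The storage estimate can be checked the same way. This sketch assumes the 750 figure comes from 75B permanent URLs being added each year and kept for 10 years, and it uses decimal terabytes; the constant names are illustrative.

```python
TEMPORARY_URLS = 250 * 10**9            # kept for at most 1 year
PERMANENT_URLS_PER_YEAR = 75 * 10**9    # assumption: added every year
PERMANENT_RETENTION_YEARS = 10
BYTES_PER_URL = 500
NODE_CAPACITY_BYTES = 5 * 10**12        # 5 TB per DB node (decimal TB)

total_urls = TEMPORARY_URLS + PERMANENT_URLS_PER_YEAR * PERMANENT_RETENTION_YEARS
total_bytes = total_urls * BYTES_PER_URL
# Ceiling division, so a partially filled node still counts as a node.
db_nodes = -(-total_bytes // NODE_CAPACITY_BYTES)

print(f"{total_urls:,} URLs")
print(f"{total_bytes // 10**12} TB")
print(f"{db_nodes} DB nodes")
```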

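Based on the number of visits and the creation time, we will build a ranking list and cache the top-ranked URLs. We assume the most-visited URLs form a small hot set of about 50 GB (roughly 100 million URLs at 500 bytes each; note that a full 1% of the 500 TB total would be 5 TB). This can be handled by a sharded Redis cluster of 4 nodes with 16 GB each.

We assume 100k reads per second.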

Unique Id Generator

There are generally two approaches to URL shortening: 1. URL encoding (base conversion of a unique ID) and 2. key generation followed by a duplicate check in the DB.

As already mentioned, we have 58 characters, so we will go for base-58 encoding.

We need a unique decimal number, which we then encode in base 58. For scale we may need multiple unique ID generators. For simplicity, let's assume we have 10 unique ID generators, where each generator only issues IDs with unique ID % 10 = ID generator number. Each generator advances its own counter one step at a time (a step of 10 keeps the remainder fixed), so no two generators issue the same ID.

We can map these unique ID generator nodes to DB nodes, because on a fetch we can decode the short URL back to its ID and tell which DB nodes hold the data.
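
The scheme above can be sketched as follows. The alphabet (dropping 0, O, I and l), the class name StridedIdGenerator, and the step of 10 between consecutive IDs are assumptions used for illustration; padding the result to a fixed 8 characters is left out.

```python
import string

# Assumption: 58 characters, dropping the look-alikes 0, O, I and l.
ALPHABET = "".join(
    c
    for c in string.digits + string.ascii_uppercase + string.ascii_lowercase
    if c not in "0OIl"
)
BASE = len(ALPHABET)


def encode_base58(number):
    """Convert a non-negative integer to a base-58 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


class StridedIdGenerator:
    """Issues IDs where id % total_generators == generator_num."""

    def __init__(self, generator_num, total_generators=10):
        if not 0 <= generator_num < total_generators:
            raise ValueError("generator_num out of range")
        self.total_generators = total_generators
        self._next = generator_num

    def next_id(self):
        value = self._next
        self._next += self.total_generators  # keep the remainder fixed
        return value


generator = StridedIdGenerator(generator_num=3)
for _ in range(3):
    unique_id = generator.next_id()
    print(unique_id, encode_base58(unique_id))
```

In a real deployment each generator would also need to persist its counter so that a restart does not reissue IDs; that is outside this sketch.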
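
Routing a fetch could look roughly like this: decode the short code back to its numeric ID, take it modulo 10 to find the generator, and look up the DB nodes assigned to that generator. The alphabet and the assignment of 10 consecutive DB nodes per generator (100 nodes over 10 generators) are assumptions made for the example.

```python
import string

# Assumption: same 58-character alphabet as used for encoding.
ALPHABET = "".join(
    c
    for c in string.digits + string.ascii_uppercase + string.ascii_lowercase
    if c not in "0OIl"
)
BASE = len(ALPHABET)
INDEX = {char: position for position, char in enumerate(ALPHABET)}

TOTAL_GENERATORS = 10
DB_NODES_PER_GENERATOR = 10  # assumption: 100 DB nodes split evenly


def decode_base58(short_code):
    """Convert a base-58 string back to its integer ID.

    Raises KeyError if the code contains a character outside the alphabet.
    """
    number = 0
    for char in short_code:
        number = number * BASE + INDEX[char]
    return number


def generator_for(short_code):
    """Return the number of the generator that issued this code."""
    return decode_base58(short_code) % TOTAL_GENERATORS


def db_nodes_for(short_code):
    """Return the (hypothetical) DB node numbers that hold this code."""
    first = generator_for(short_code) * DB_NODES_PER_GENERATOR
    return list(range(first, first + DB_NODES_PER_GENERATOR))


print(decode_base58("21"), generator_for("21"), db_nodes_for("21"))
```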

Detailed Component Design

Here we need to think about the CAP theorem. I think we can accept eventual consistency and focus on the other two properties, availability and partition tolerance.

That's why we can use consistent hashing to distribute keys among the servers.

As latency is an important factor here, we can use a CDN with geographically distributed locations, as there is a high chance that a created URL will mostly be visited from the geographic zone it was created in.
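
A minimal consistent hashing ring, for illustration, might look like the sketch below. The class name, the use of MD5 as the ring hash, and 100 virtual nodes per server are assumptions, not part of the design above.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to nodes using a hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        ring = []
        for node in nodes:
            for replica in range(vnodes):
                ring.append((self._hash(f"{node}#{replica}"), node))
        ring.sort()
        self._points = [point for point, _ in ring]
        self._owners = [owner for _, owner in ring]

    @staticmethod
    def _hash(key):
        # MD5 is used only to spread keys evenly, not for security.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        """Return the first node clockwise from the key's hash."""
        position = bisect.bisect_right(self._points, self._hash(key))
        return self._owners[position % len(self._owners)]


ring = ConsistentHashRing(["db-0", "db-1", "db-2"])
print(ring.node_for("abc123"))
```

The point of the ring is that adding a server only moves the keys that now fall to the new server; keys owned by the other servers stay where they were.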

URL collision

URL collisions are unlikely here because we use a counter and create the unique identifier by base conversion. For scalability, every unique ID counter starts from a different range and is bounded by a min and a max value.
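
Range-bounded counters could be sketched like this. The names RangeCounter and make_counters, and the choice to raise an error when a range runs out, are assumptions for the example.

```python
class RangeCounter:
    """Issues sequential IDs from an inclusive [min_id, max_id] range."""

    def __init__(self, min_id, max_id):
        if min_id > max_id:
            raise ValueError("min_id must not exceed max_id")
        self.min_id = min_id
        self.max_id = max_id
        self._next = min_id

    def next_id(self):
        if self._next > self.max_id:
            raise RuntimeError("ID range exhausted; a new range is needed")
        value = self._next
        self._next += 1
        return value


def make_counters(num_counters, range_size):
    """Give counter k the range [k * range_size, (k + 1) * range_size - 1]."""
    return [
        RangeCounter(k * range_size, (k + 1) * range_size - 1)
        for k in range(num_counters)
    ]


counters = make_counters(num_counters=3, range_size=5)
print([counter.next_id() for counter in counters])
```

Because the ranges do not overlap, two counters can never issue the same ID, so no duplicate check against the DB is needed.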

Cache and Storage

We will use a sharded Redis cache layer. Based on the 10% most frequently read URLs, we will cache the values with a TTL of 1 day. They will also be stored in the CDN for faster delivery. On a cache miss, the request will be served from one of multiple sharded read DBs. For the DB we can use a SQL-based store with a single simple table, as the data has no particular relations.
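
The read path described above follows the cache-aside pattern, which can be sketched as below. Plain dicts stand in for Redis and the DB, and the names TtlCache and lookup are hypothetical; a real Redis client would handle expiry itself.

```python
import time

ONE_DAY_SECONDS = 24 * 3600


class TtlCache:
    """In-memory stand-in for a cache whose entries expire after a TTL."""

    def __init__(self, ttl_seconds=ONE_DAY_SECONDS, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expiry time)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)


def lookup(short_code, cache, database):
    """Try the cache first; on a miss, read the DB and fill the cache."""
    long_url = cache.get(short_code)
    if long_url is None:
        long_url = database.get(short_code)
        if long_url is not None:
            cache.set(short_code, long_url)
    return long_url


database = {"abc123": "https://example.com/some/long/path"}
cache = TtlCache()
print(lookup("abc123", cache, database))  # miss, filled from the DB
print(lookup("abc123", cache, database))  # hit, served from the cache
```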

CDN, Rate limiting & Load balancing

We can use geographic data for load balancing and the CDN, as requests for a given short URL are more likely to come from the same geographic area.

URL cleanup

As stated earlier, we will use a counter. For the temporary URL range, the counter will be reset after 1 year and the entries in that range will be cleaned up.
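
One way to picture the cleanup is a periodic job that deletes temporary entries older than one year, so their IDs are free again before the counter wraps around. The storage layout (a dict of short code to long URL and creation time) and the function name purge_expired are assumptions for this sketch.

```python
ONE_YEAR_SECONDS = 365 * 24 * 3600


def purge_expired(store, now, max_age_seconds=ONE_YEAR_SECONDS):
    """Delete entries older than max_age_seconds; return the removed codes.

    store maps short code -> (long_url, created_at), times in seconds.
    """
    expired = [
        code
        for code, (_, created_at) in store.items()
        if now - created_at >= max_age_seconds
    ]
    for code in expired:
        del store[code]
    return expired


temporary_urls = {
    "old111": ("https://example.com/old", 0),
    "new222": ("https://example.com/new", ONE_YEAR_SECONDS),
}
removed = purge_expired(temporary_urls, now=ONE_YEAR_SECONDS + 60)
print(removed, list(temporary_urls))
```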