My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach with Score: 8/10
by terra_alchemy638
System requirements
Functional:
- The system should provide two APIs
- Takes a long URL and returns a short URL
- Takes a short URL and redirects to its respective long URL
- URL analytics is out of scope
Non-Functional:
- Availability
- Scalability
- Fault tolerance
Capacity estimation
Let's consider the following requirements
- Traffic volume: 100 million URLs per day
- Character allowed in short URL: [A-Za-z0-9]
- Shortened URLs cannot be updated or deleted
URLs per second: 100,000,000 / 86,400 ≈ 1,160, rounded to ~1,000 URLs/sec (24 hr = 86,400 s ≈ 100K s)
Write operations: ~1,000 URLs/sec
Read operations: 10:1 read-to-write ratio -> ~10,000 reads/sec
Storage:
- short URL max length: 56 bytes (56 characters)
- Long URL max length: 200 bytes (200 characters)
- Life time of service: 10 years
- Total records to store: 100 million * 365 * 10 = 365 billion records
- Storage required: 365 billion * 256 bytes ≈ 94 TB
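The back-of-envelope arithmetic above can be checked with a quick script (the numbers are the estimates stated above; TB here is decimal, 10^12 bytes):

```python
# Back-of-envelope check of the capacity estimates.
URLS_PER_DAY = 100_000_000
SECONDS_PER_DAY = 86_400

writes_per_sec = URLS_PER_DAY / SECONDS_PER_DAY      # ~1,157 -> round to ~1,000
reads_per_sec = 10 * writes_per_sec                  # 10:1 read-to-write ratio

RECORD_BYTES = 56 + 200                              # short URL + long URL
total_records = URLS_PER_DAY * 365 * 10              # 10-year lifetime
storage_tb = total_records * RECORD_BYTES / 1e12     # ~93.4 TB, rounded up to ~94 TB

print(f"{writes_per_sec:.0f} writes/s, {reads_per_sec:.0f} reads/s, {storage_tb:.1f} TB")
```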
API design
This service will primarily have two APIs
- api/v1/short
- request type: POST
- request body: { url: long_url }
- response: { url: short_url }
- api/v1/{short_url}
- request type: GET
- response: 302 redirect with the long URL in the Location header
A 301 response code means the URL has been permanently moved to the long URL; the browser will cache the mapping, and subsequent requests for the same short URL will be redirected to the long URL by the browser itself.
A 302 response code means the URL has been temporarily moved, so subsequent requests will still be sent to our service by the browser. Since we want the option to track analytics like click rates, we will stick with the 302 response code.
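The 302-vs-404 decision can be sketched as a pure function (the in-memory store and the sample short code are placeholders for illustration; a real handler would sit behind the web framework's routing):

```python
# Hypothetical in-memory store; a real service would hit the cache/DB.
STORE = {"abc1234": "https://example.com/some/very/long/path"}

def redirect_response(short_code: str) -> tuple[int, dict]:
    """Resolve a short code to an HTTP response (status, headers).

    302 (temporary) keeps every hit flowing through our servers, so
    click analytics remain possible; 301 would let browsers cache the
    mapping and bypass us on repeat visits.
    """
    long_url = STORE.get(short_code)
    if long_url is None:
        return 404, {}
    return 302, {"Location": long_url}
```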
Database design
A good choice for this use case is a NoSQL database; we're planning to use Cassandra.
Three tables will be created
- short_urls
- short_url (index)
- long_url
- created_at
- updated_at
- long_urls
- long_url(index)
- short_url
- analytics
- short_url
- click_count
We will generate a unique key for each long URL, store the mapping in short_urls, and add another record in long_urls. The second table is used to check for duplicate long URLs.
For the short code we base62-encode the key; a code length of 7 is enough, since 62^7 ≈ 3.5 trillion unique URLs.
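A minimal sketch of the base62 encoding, assuming the key is an integer (the fixed length of 7 matches the code length chosen above):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(key: int, length: int = 7) -> str:
    """Encode an integer key as a fixed-length base62 short code."""
    chars = []
    for _ in range(length):
        key, rem = divmod(key, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))  # most significant digit first
```

With 7 characters the key space is 62**7 ≈ 3.5 trillion codes, comfortably above the 365 billion records estimated earlier.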
High-level design
The client makes a request to a load balancer, which forwards it to one of the servers.
When a client makes a request with a long URL, the server assigns a key to the long URL, base62-encodes it, stores the mapping in the database, and returns the short URL. The encoded URL is also stored in the cache with a TTL, asynchronously.
When a client makes a request with a short URL, the server first checks the cache and returns the long URL if present; otherwise it fetches the long URL from the DB, stores it in the cache asynchronously, and returns it.
Request flows
We're using the cache-aside pattern, populating the cache asynchronously.
Detailed component design
- LoadBalancer
- This component distributes client requests across the set of servers to reduce the load on each one. We can choose the least-response-time load-balancing algorithm for better TPS.
- Server
- The server is responsible for handling requests. When a server is initialized, it fetches a key range from the KGS (key generation service); for each request it synchronously increments its key, creates the short code from the key, writes the mapping to the DB, and returns the response. The server periodically updates its last-used key in the DB.
- KGS(key generation service)
- The KGS is responsible for providing a key range to each server. It records the last range handed out to each server in the DB.
- Key ranges are assigned synchronously so that no two servers get the same key range
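A sketch of the KGS contract, assuming integer keys; the lock stands in for the synchronous DB update that records the last range handed out, and the range size is an arbitrary illustrative value:

```python
import threading

class KeyGenerationService:
    """Hands out disjoint key ranges to servers."""
    def __init__(self, range_size: int = 1_000_000):
        self.range_size = range_size
        self.next_start = 0
        self.lock = threading.Lock()  # stands in for a synchronous DB update

    def allocate_range(self) -> range:
        with self.lock:  # no two servers can receive the same range
            start = self.next_start
            self.next_start += self.range_size
            return range(start, start + self.range_size)

class Server:
    """App server that draws keys from its assigned range."""
    def __init__(self, kgs: KeyGenerationService):
        self.kgs = kgs
        self.keys = iter(kgs.allocate_range())

    def next_key(self) -> int:
        try:
            return next(self.keys)
        except StopIteration:  # range exhausted: fetch a fresh one
            self.keys = iter(self.kgs.allocate_range())
            return next(self.keys)
```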
Trade offs/Tech choices
- Since our system is read-heavy, adding read replica nodes to the database will improve request latency.
- We can choose an MPP database like ClickHouse for better analytics and BI.
- We need to manually scale the servers by monitoring TPS.
Failure scenarios/bottlenecks
KGS is a single point of failure; we need to add a standby server for KGS for better availability.
If the URL is not present, we will return a 404 NOT FOUND error.
If load is huge, we need to manually scale the servers.
Future improvements
- Move analytics to MPP database
- Add Standby server for KGS to avoid SPOF
- Add a rate limiter based on IP address to avoid an overwhelmingly huge number of POST short-URL requests.
- Deploy the system in K8s and add HPA for servers to auto scale
- Add Prometheus and Grafana for monitoring the system.