My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach with Score: 10/10
by horizon5628
System requirements
Functional:
- Shorten a long URL
- Resolve a short URL to the long URL and redirect the user to it
- Option for custom short aliases
- Collect stats and build a popularity dashboard
- Expiration feature, with long-term support (100-year retention by default)
- REST API
- User management: users can manage their own URLs
Non-Functional:
- Performance
- Scalability
- Reliability
- Security
- Monitoring
- Data Backup
- Compliance
Capacity estimation
1% of requests are for shortening, 99% for retrieval.
10,000 new URLs/day * 365 days * 100 years = 365 million records if every URL is unique.
Assuming 1 KB per record, that is ~365 GB of storage; round up to ~500 GB.
At ~100 reqs/sec and ~10 ms to serve each request, one CPU core suffices (100 * 0.01 s = 1 core-second per second).
Assuming we cache 100% of a month's traffic - on the order of 300K distinct URLs at 1 KB each - we need well under 1 GB of memory.
We use an LRU cache eviction policy.
64^4 ≈ 16.7 million - too few for 365 million records.
64^5 ≈ 1.07 billion - so we go with a 5-character short URL hash.
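A quick sanity check of these numbers in Python (the 10,000 writes/day, 1 KB/record, and 100-year figures are the assumptions above):

    # Back-of-the-envelope check of the capacity estimates above.
    WRITES_PER_DAY = 10_000
    YEARS = 100
    BYTES_PER_RECORD = 1_000

    records = WRITES_PER_DAY * 365 * YEARS
    print(f"records: {records:,}")                                # 365,000,000
    print(f"storage: {records * BYTES_PER_RECORD / 1e9:.0f} GB")  # 365 GB

    # Keyspace per short-URL length (64-character alphabet).
    print(f"64^4 = {64**4:,}")   # 16,777,216    -> too small
    print(f"64^5 = {64**5:,}")   # 1,073,741,824 -> covers 365M records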
API design
POST aka.ms/api/v1/shorten
{"url": "https://google.com/jnsckjdfn/dfdvdfv?d=4&k=0", "userid": "abc", "oauthtoken": "gfcgjkj", "alias": "google", "expirydays": 10}
response:
sh.rt/shrturl
or error code if the alias is already taken (add suggestions if any) or not authorized
GET aka.ms/api/v1/{shrturl}
response:
redirects to https://google.com/jnsckjdfn/dfdvdfv?d=4&k=0 with status code 302 if the URL is found
else returns 404
PUT aka.ms/api/v1/edit
{"shrt_url": "sh.rt/shrturl", "new_long_url": "cdcdcc"}
response:
200 if the short URL was created by the same user
error if not
DELETE aka.ms/api/v1/{shrturl}
response:
200 if the short URL was created by the same user
error if not
GET aka.ms/api/v1/analytics/popular_shorts
response:
top 10 most-accessed short URLs
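For illustration, calling the shorten endpoint with the third-party requests library (host, token, and field values are the placeholders from above):

    import requests  # third-party HTTP client: pip install requests

    resp = requests.post(
        "https://aka.ms/api/v1/shorten",
        json={
            "url": "https://google.com/jnsckjdfn/dfdvdfv?d=4&k=0",
            "userid": "abc",
            "oauthtoken": "gfcgjkj",
            "alias": "google",
            "expirydays": 10,
        },
    )
    if resp.ok:
        print(resp.text)                    # e.g. sh.rt/google
    else:
        print(resp.status_code, resp.text)  # alias taken / not authorized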
Database design
Lots of random reads and no relational constraints (a KV store fits).
Sequential reads only for the user dashboard.
Users Table schema:
name
id
subscription
last_login
no_requests
request_limit
no_of_urls
Available hashes schema:
hash
state - 0 = available, 1 = unavailable
index on state, since we filter, fetch, and delete based on state
URLs table schema:
long
hash
create_time
expiry_time
Logs (NoSQL) schema:
timestamp
log_level
message
(user + request + response)
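A minimal sketch of the three relational tables as SQLite DDL run from Python (column types are assumptions; as noted above, the urls table could equally live in a KV store, and the logs schema goes to the NoSQL store):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE users (
        id            TEXT PRIMARY KEY,
        name          TEXT,
        subscription  TEXT,
        last_login    TIMESTAMP,
        no_requests   INTEGER DEFAULT 0,
        request_limit INTEGER,
        no_of_urls    INTEGER DEFAULT 0
    );

    CREATE TABLE available_hashes (
        hash  TEXT PRIMARY KEY,
        state INTEGER DEFAULT 0          -- 0 = available, 1 = unavailable
    );
    CREATE INDEX idx_hashes_state ON available_hashes(state);

    CREATE TABLE urls (
        hash        TEXT PRIMARY KEY,
        long        TEXT NOT NULL,
        create_time TIMESTAMP,
        expiry_time TIMESTAMP
    );
    """)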
High-level design
write request
User -> lb -> write_service-> db
key_service -> db
read request
User <-> lb <-> read_service <- cache <- db
The shorten service handles two cases. Auto-generated: pick a random available hash, mark it unavailable, and assign it. Custom alias: if the alias is already in the urls table, error out and send suggestions; otherwise, if it exists in the available-hashes table, block it there (so the key service can never hand it out) and assign it; if it is in neither table, assign it directly.
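A sketch of that assignment logic with in-memory stand-ins for the two tables (the suggestion format and sample keys are illustrative):

    import random

    urls = {}                                  # short hash -> long URL
    available = {"aaaaa", "bbbbb", "ccccc"}    # pre-generated 5-char keys

    def shorten(long_url, custom_alias=None):
        if custom_alias:
            if custom_alias in urls:
                # Taken: error out and send a recommendation.
                raise ValueError(f"alias taken, try '{custom_alias}1'")
            # Block the alias in the pool so the key service never hands it out.
            available.discard(custom_alias)
            short = custom_alias
        else:
            short = random.choice(sorted(available))  # random available hash
            available.remove(short)
        urls[short] = long_url
        return "sh.rt/" + short

    print(shorten("https://google.com", custom_alias="google"))  # sh.rt/google
    print(shorten("https://example.com"))                        # e.g. sh.rt/aaaaa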
Utilities:
CleanupUtil - removes expired URLs from the urls table and returns their hashes to the available-hashes table
HashGenerationService - monitors entries in the hash table; when the count drops below a threshold, populates it with new keys
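Sketches of both utilities against the same in-memory stand-ins (the threshold value is an assumption; key generation itself is covered under Detailed component design):

    import time

    THRESHOLD = 100_000   # assumed refill trigger for the hash pool

    def cleanup(urls, expiry, available):
        # CleanupUtil: drop expired URLs, recycle their hashes into the pool.
        now = time.time()
        for short in [s for s, t in expiry.items() if t <= now]:
            del urls[short]
            del expiry[short]
            available.add(short)   # hash becomes available again

    def needs_refill(available):
        # HashGenerationService polls this and generates a new batch when true.
        return len(available) < THRESHOLD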
Request flows
write request
User -> lb -> write_service -> db
The user POSTs to /shorten; the load balancer forwards to a write-service instance, which validates the URL, obtains a key (pre-generated hash or custom alias), writes the mapping to the db, and returns the short URL.
read request
User <-> lb <-> read_service <- cache <- db
The user GETs the short URL; the load balancer routes by short-URL hash to a read-service instance, which checks the cache first, falls back to the db on a miss (populating the cache), and responds with a 302 redirect.
Detailed component design
LB:
The load balancer partitions read requests by short-URL hash, so the same key always lands on the same replica and cache hits increase.
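For illustration, hash-based routing might look like the sketch below (replica names are hypothetical; modulo routing is shown for brevity, while a real LB would prefer consistent hashing so that adding a replica reshuffles fewer keys):

    import hashlib

    REPLICAS = ["read-1", "read-2", "read-3"]   # hypothetical replica names

    def route(short_hash):
        # Pin each short URL to one replica to maximize its cache hit rate.
        h = int(hashlib.md5(short_hash.encode()).hexdigest(), 16)
        return REPLICAS[h % len(REPLICAS)]

    print(route("shrturl"))   # always the same replica for this key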
Key generation:
MD5 of the long URL? No - it is comparatively expensive and we would have to handle collisions.
Incrementing keys issued by a key service? Yes.
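With an incrementing counter there are no collisions to handle: each counter value is just written out in base 64 using the alphabet sized in the capacity section. A minimal encoding sketch (the alphabet ordering is an assumption):

    ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_"

    def encode(counter, width=5):
        # Map an incrementing integer to a fixed-width base-64 key.
        out = []
        for _ in range(width):
            counter, rem = divmod(counter, 64)
            out.append(ALPHABET[rem])
        return "".join(reversed(out))

    print(encode(365_000_000))   # 5 chars cover 64^5 > 1 billion keys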
Trade offs/Tech choices
Read-heavy workload - keep read replicas.
When a URL has not replicated yet, read from the primary.
caching:
caching frequently used URLs is beneficial.
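A minimal LRU read-through cache sketch for the read service (OrderedDict stands in for a real cache tier such as Redis; the capacity is illustrative):

    from collections import OrderedDict

    class LRUCache:
        def __init__(self, capacity=100_000):
            self.capacity = capacity
            self.data = OrderedDict()

        def get(self, short, load_from_db):
            if short in self.data:
                self.data.move_to_end(short)   # mark as recently used
                return self.data[short]
            long_url = load_from_db(short)     # cache miss: hit the DB
            self.data[short] = long_url
            if len(self.data) > self.capacity:
                self.data.popitem(last=False)  # evict least recently used
            return long_url

    cache = LRUCache(capacity=2)
    print(cache.get("a", lambda s: "https://a.example"))   # miss -> loads
    print(cache.get("a", lambda s: "unused"))              # hit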
should we load balance based on hostname?
Failure scenarios/bottlenecks
invalid url?
return error response.
two simultaneous requests for the same custom short?
- rely on record-level locking / an atomic conditional insert, so exactly one writer wins
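Record-level locking can be delegated to the store's uniqueness guarantee: both writers attempt an insert and exactly one succeeds. A sketch with SQLite (the table mirrors the urls schema above):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE urls (hash TEXT PRIMARY KEY, long TEXT)")

    def try_claim(short, long_url):
        # Atomically claim a custom alias; the losing writer gets False.
        try:
            db.execute("INSERT INTO urls VALUES (?, ?)", (short, long_url))
            return True
        except sqlite3.IntegrityError:   # primary-key violation: already taken
            return False

    print(try_claim("google", "https://google.com"))   # True
    print(try_claim("google", "https://gmail.com"))    # False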
Future improvements
Should we prevent scraping of short-to-long URL mappings?