My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach with Score: 6/10
by mystic7375
System requirements
Functional:
given a long URL, return a shorter URL
given the short URL, redirect to the original web page
Non-Functional:
support a large volume of URL storage
support high read traffic
high availability
low latency
Capacity estimation
generate 10 new URLs/s, 1 day is: 10 * 86,400 = 864k (~800k)
1 year, new added is ~800k * 365 ≈ 300M
read-to-write ratio is 100:1, so read traffic is 1k/s
suppose we run 10 years: ~3B rows
if we choose base62 codes, then 62^5 ≈ 916M, which barely covers 10 years of growth and will run into collisions/exhaustion as time goes on, so a length of 6 (62^6 ≈ 56.8B) is better to leave headroom for the future
storage needed: ~3B * 1KB ≈ 3TB
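Redoing the arithmetic exactly (a quick sanity check on the rounded figures above):

```python
writes_per_sec = 10
per_day = writes_per_sec * 86_400        # 864,000 ≈ 800k new URLs/day
per_year = per_day * 365                 # ≈ 315 million/year
ten_years = per_year * 10                # ≈ 3.2 billion rows

reads_per_sec = writes_per_sec * 100     # 100:1 read:write → 1k reads/s

# 5 base62 characters (≈916M codes) are too few for 10 years;
# 6 characters (≈56.8B codes) leave ample headroom.
assert 62 ** 5 < ten_years < 62 ** 6

storage_bytes = ten_years * 1024         # 1 KB per row → ≈ 3.2 TB
```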
API design
1: generate(long_url, optional expiration_time)
returns a shorter URL; the expiration time is optional
2: get(short_url)
returns the long URL and redirects to the original page, or returns an error saying the URL is expired or does not exist
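A minimal in-memory sketch of the two endpoints (the dict, counter, and `sho.rt` domain are illustrative stand-ins for the real DB, ID service, and hostname):

```python
import time

_db = {}        # short_code -> (long_url, expires_at or None); stand-in for the KV store
_counter = 0    # stand-in for the globally incrementing ID service

def generate(long_url, expiration_time=None):
    """Allocate the next ID, derive a short code, and store the mapping."""
    global _counter
    _counter += 1
    code = format(_counter, "x")   # real design: base62(counter)
    _db[code] = (long_url, expiration_time)
    return "https://sho.rt/" + code

def get(short_url):
    """Resolve a short URL; the server would answer with a 301/302 redirect."""
    code = short_url.rsplit("/", 1)[-1]
    entry = _db.get(code)
    if entry is None:
        raise KeyError("url does not exist")
    long_url, expires = entry
    if expires is not None and expires < time.time():
        raise KeyError("url is expired")
    return long_url
```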
Database design
consider NoSQL (key-value storage), since it is just a mapping from one string to another; partition the DB on the short URL for scalability and fast lookups, and replicate each partition for high availability (to avoid single-node failure)
High-level design
generate query -> API gateway -> LB -> server -> generate a base62 code to represent the URL -> store the mapping
the returned URL uses a counter-based strategy, which needs a globally incrementing ID
query short URL -> API gateway -> LB -> server (with cache) -> read from DB and return
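The counter-to-short-code step can be sketched as a standard base62 encoding (the alphabet order here is a free choice):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n):
    """Encode a non-negative counter value as a base62 short code."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))
```

Because the counter only ever moves forward, every code is unique by construction; no collision check against the DB is needed.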
Request flows
Detailed component design
counter-based strategy: use ZooKeeper (ZK) to provide a sequence-increment service; its consensus protocol avoids ID conflicts, and it is strongly fault tolerant against single-point failure.
since the read-to-write ratio is very high, introduce a cache to offload database pressure; a clustered cache such as Redis Cluster is better, usually implemented with consistent hashing to decide which cache server to query
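A minimal sketch of consistent hashing for picking a cache server, assuming MD5 as the ring hash and virtual nodes for a more even key distribution (class and node names are illustrative):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map each key to a cache node on a hash ring; virtual nodes
    smooth out the distribution across physical servers."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        """Walk clockwise from the key's hash to the next virtual node."""
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

When a node is added or removed, only the keys in its ring segments move, instead of nearly all keys as with naive `hash(key) % N` sharding.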
the DB should scale horizontally: partition it into multiple subsets and use a range-based method to determine which partition to query; each partition is further replicated to multiple regions to increase availability. prioritize eventual consistency over strong consistency, as a little delay is acceptable (i.e. prioritize availability over consistency)
Trade offs/Tech choices
while ZK is good, it cannot support very high traffic on its own; therefore, splitting the ID space across multiple range servers is better, e.g. one server assigns IDs 1 - 1000, another assigns 1001 - 2000, and when a server exhausts its range it requests a new one from ZK. alternatively, the assigning server can use a SQL transaction for simplicity while still avoiding conflicts.
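The range-leasing idea can be sketched like this; `make_local_coordinator` is a stand-in for the ZK (or SQL-transaction) service that hands out blocks:

```python
import threading

class RangeAllocator:
    """Each app server leases a block of IDs from a central coordinator
    and hands them out locally, so the coordinator is contacted only
    once per BLOCK ids rather than once per request."""
    BLOCK = 1000

    def __init__(self, lease_block):
        self._lease_block = lease_block  # callable(size) -> start of new block
        self._lock = threading.Lock()
        self._next = 0
        self._limit = 0                  # exclusive upper bound of current block

    def next_id(self):
        with self._lock:
            if self._next >= self._limit:            # block exhausted
                start = self._lease_block(self.BLOCK)
                self._next, self._limit = start, start + self.BLOCK
            nid = self._next
            self._next += 1
            return nid

def make_local_coordinator():
    """In-process stand-in for the central block-issuing service."""
    state = {"next": 1}
    def lease(size):
        start = state["next"]
        state["next"] += size
        return start
    return lease
```

Usage: `alloc = RangeAllocator(make_local_coordinator())`, then `alloc.next_id()` per shorten request; IDs come out strictly increasing within each server.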
Failure scenarios/bottlenecks
Future improvements
for hot links, use a CDN to accelerate delivery: cached copies live on edge servers (POPs) closer to the user, which is faster and reduces the burden on our servers
do URL validation to avoid redirecting to malicious websites
consider expiration: when looking up the mapping, if it is expired, return null; also set up a background service to periodically clean up expired items in the DB
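The lazy-expiry read path plus the periodic cleanup job can be sketched as follows, using a dict as a stand-in for the KV store:

```python
import time

store = {}  # short_code -> (long_url, expires_at or None)

def put(code, url, ttl=None):
    """Store a mapping; ttl (seconds) is optional."""
    store[code] = (url, time.time() + ttl if ttl is not None else None)

def get(code):
    """Lazy expiry: an expired row is treated as missing on read."""
    entry = store.get(code)
    if entry is None:
        return None
    url, expires = entry
    if expires is not None and expires < time.time():
        return None
    return url

def cleanup():
    """Background sweep: delete expired rows so storage doesn't grow unbounded."""
    now = time.time()
    for code in [c for c, (_, exp) in store.items() if exp is not None and exp < now]:
        del store[code]
```

In production this sweep would run as a scheduled job against each DB partition; reads stay correct even between sweeps because `get` checks expiry itself.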