My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach (Score: 8/10)

by echo_kraken585

System requirements


Functional:

A user requests a short URL for a given long URL.

A user clicks a short URL and is redirected to the original URL.

A short URL expires after 30 days.



Non-Functional:

> 1 million users

100 million daily requests.




Capacity estimation



100 million requests/day ÷ 86,400 seconds/day ≈ 1,160 requests/second.

Assume one server can handle 1,000 requests/second; two servers cover the average load. If peak usage is 10× the average (about 11,600 requests/second), we will need roughly 12 servers.
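A quick sanity check of the arithmetic above (the per-server capacity is the stated assumption, not a measured number):

```python
import math

requests_per_day = 100_000_000
avg_rps = requests_per_day / 86_400      # ~1,157 requests/second on average
peak_rps = avg_rps * 10                  # ~11,574 requests/second at peak
per_server_rps = 1_000                   # assumed capacity of one server
print(math.ceil(peak_rps / per_server_rps))  # -> 12 servers for peak load
```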


API design



getShortURL(user_id, long_url) returns the short URL.

delURL(user_id, short_url) deletes the short URL.

redirect(user_id, short_url) returns the long URL to redirect to.
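A minimal sketch of one possible HTTP mapping for these APIs, using Flask. The routes, the in-memory store, and the make_key helper are illustrative assumptions, not part of the original design:

```python
import hashlib

from flask import Flask, abort, jsonify, redirect, request

app = Flask(__name__)
store: dict[str, str] = {}  # short key -> long URL (stand-in for the database)

def make_key(long_url: str) -> str:
    # placeholder key generator; see the hashing discussion later on
    return hashlib.sha256(long_url.encode()).hexdigest()[:7]

@app.post("/urls")  # getShortURL(user_id, long_url)
def get_short_url():
    long_url = request.json["long_url"]
    key = make_key(long_url)
    store[key] = long_url
    return jsonify(short_url=key), 201

@app.delete("/urls/<key>")  # delURL(user_id, short_url)
def del_url(key):
    store.pop(key, None)
    return "", 204

@app.get("/<key>")  # redirect(user_id, short_url)
def follow(key):
    long_url = store.get(key)
    if long_url is None:
        abort(404)
    return redirect(long_url, code=301)
```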


Database design



Assume this service will exist for 10 years.

100 million requests/day × 3,650 days ≈ 365 billion requests.

At 100 bytes per request, that is roughly 36.5 TB of storage over 10 years.
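The same estimate in code:

```python
requests_total = 100_000_000 * 365 * 10   # 3.65e11 requests over 10 years
bytes_total = requests_total * 100        # 100 bytes per request
print(bytes_total / 1e12, "TB")           # -> 36.5 TB
```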


Tables:


Relational database (user data):

user: {
    user_id int,
    created_at,
    membership_level,
    name,
    other_meta
}


NoSQL for the URL table (over 100 million rows): MongoDB is a good choice, since this data is less structured, and it suits the fast-retrieval requirement.


url: {
    user_id,
    long_url,
    short_url,
    name,
    expiration_date
}
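Given the 30-day expiration requirement, a MongoDB TTL index can delete expired documents automatically. A minimal sketch with pymongo, assuming a local mongod; the database and collection names are illustrative:

```python
from datetime import datetime, timedelta, timezone

from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
urls = client.shortener.urls

# expireAfterSeconds=0 makes MongoDB remove each document once the wall
# clock passes that document's own expiration_date value.
urls.create_index([("expiration_date", ASCENDING)], expireAfterSeconds=0)

urls.insert_one({
    "user_id": 42,
    "long_url": "https://example.com/some/long/path",
    "short_url": "abc1234",
    "name": "example",
    "expiration_date": datetime.now(timezone.utc) + timedelta(days=30),
})
```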



High-level design

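At a high level: client → API gateway (rate limiting) → load balancer → application servers → Redis cache → storage (MongoDB for URL mappings, a relational database for user data).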







Request flows



The client sends requests to the servers. A load balancer distributes traffic to a nearby server. I would add an API gateway here for rate limiting.


The server takes each request and either generates a short URL or redirects the short URL to the original long URL. We use a Redis cache here to hold popular URLs for low latency, as sketched below.
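A minimal cache-aside sketch for the redirect path, assuming redis-py; the db_lookup helper is a hypothetical stand-in for the real database read, and the one-hour TTL is an arbitrary choice:

```python
import redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

def db_lookup(short_url: str) -> str | None:
    # stand-in for the real query against the url table/collection
    return None

def resolve(short_url: str) -> str | None:
    long_url = cache.get(short_url)   # 1. hot URLs are served from memory
    if long_url is not None:
        return long_url
    long_url = db_lookup(short_url)   # 2. cache miss: read the database
    if long_url is not None:
        cache.setex(short_url, 3600, long_url)  # 3. cache it for an hour
    return long_url
```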



Detailed component design



More than 10 servers require a load balancer. I would also put an API gateway in front of the load balancer to handle rate limiting, DDoS protection, and similar concerns.
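One common rate-limiting policy the gateway could enforce is a token bucket; a toy per-client version, with hypothetical numbers:

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # refill in proportion to elapsed time, capped at the bucket size
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

limiter = TokenBucket(rate=100, capacity=200)  # e.g. 100 req/s, bursts to 200
```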


Multiple servers plus multiple caches will require leaderless replication to stay highly available and eventually consistent: reads and writes each go to a quorum of replicas, and choosing R + W > N ensures every read overlaps the latest successful write.


To shorten a URL, apply a hash function; 6–7 characters should be enough (with base-62 encoding, 62^7 ≈ 3.5 trillion possible keys). A hashing algorithm like SHA is a good option.
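A sketch of deriving a 7-character key from a SHA-256 digest with base-62 encoding; the alphabet and truncation policy are illustrative choices:

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 symbols

def short_key(long_url: str, length: int = 7) -> str:
    # interpret the digest as a big integer, then re-encode in base 62
    n = int.from_bytes(hashlib.sha256(long_url.encode()).digest(), "big")
    chars = []
    for _ in range(length):
        n, r = divmod(n, 62)
        chars.append(ALPHABET[r])
    return "".join(chars)

print(short_key("https://example.com/a/very/long/path"))  # 7-character key
```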


The service will also check whether the short URL already exists. A Bloom filter is a good fit for a fast "definitely not there" check; only a "maybe there" answer requires hitting the database.
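A toy Bloom filter illustrating the check; a production system would use an existing library or Redis's Bloom module, and the size and hash scheme here are arbitrary:

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        # derive k bit positions from salted SHA-256 digests
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False => definitely not present; True => maybe present (check the DB)
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

seen = BloomFilter()
seen.add("abc1234")
assert seen.might_contain("abc1234")
```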



Trade offs/Tech choices



Fast check of whether a hash already exists using a Bloom filter: it answers either "definitely not there" or "maybe there." It is a space-efficient data structure, and although the answer is probabilistic rather than definite, the check is fast, which suits the low-latency requirement.


Cache: Redis for fast retrieval, plus better support for enterprise features.


Distributed caches and databases mean requests can arrive at different servers, which introduces delay and the possibility of inconsistency among replicas. The high-availability requirement makes a leader-follower model a poor fit, since a leader failure stalls writes until failover; this points to leaderless replication.


Failure scenarios/bottlenecks



The main bottleneck should be traffic. Rate limiting at the API gateway should prevent it from overwhelming the system.


Backup servers should take care of server failure.

Data replication will also cover server downtime.


Conduct load testing before launch.


Build monitoring to track the health of the system: usage analysis, traffic patterns, server health, etc.


Future improvements



The short-URL name should be customizable with a premium plan.

We can offer stats such as redirect volume, etc.


The expiration date can be extended with a premium plan.


We can also offer a fast-retrieval tier: this customer's URLs would always be kept in cache.