My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach (Score: 8/10)
by vivid103
System requirements
Functional:
- Accept a URL to shorten and return a shortened URL.
- The URL mapping should be consistent (the same long URL always maps to the same short URL).
- Maybe we want to expand to expiring URL mappings?
- Should redirect any shortened URL to the original URL.
- Should track hits to shortened URLs.
Non-Functional:
- Scalability
- Availability
  - avoid single points of failure (SPOF)
  - prioritize read availability
  - prioritize consistency on reads
- Security
- Observability
- Simplicity
Capacity estimation
- shorten: 100 URLs/s
- redirect: 5,000 URLs/s (typical), 50,000 URLs/s (burst)
- storage
  - ~100 bytes per original URL
  - 8 bytes per shortened URL
  - ~108 bytes per row at 100 writes/s ≈ 11 KB/s ≈ 39 MB/hour ≈ 0.9 GB/day ≈ 340 GB/year (back-of-envelope check in the sketch below)
- a 7-character string over lowercase letters, uppercase letters, and digits (62 symbols) gives 62^7 ≈ 3.5 trillion possibilities
  - 8 characters might be easier to remember, split into two groups of 4
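A quick back-of-envelope check of the numbers above in Python (the 108-byte row is just the two fields and ignores real row overhead):

```python
# Back-of-envelope check of the storage estimates above.
WRITE_QPS = 100      # shorten requests per second
ROW_BYTES = 108      # ~100 B original URL + 8 B short code

bytes_per_sec = WRITE_QPS * ROW_BYTES            # ~11 KB/s
mb_per_hour = bytes_per_sec * 3600 / 1e6         # ~39 MB/hour
gb_per_day = mb_per_hour * 24 / 1e3              # ~0.9 GB/day
gb_per_year = gb_per_day * 365                   # ~340 GB/year

keyspace_7 = 62 ** 7    # lowercase + uppercase + digits, 7 characters

print(f"{bytes_per_sec / 1e3:.1f} KB/s, {mb_per_hour:.0f} MB/hour, "
      f"{gb_per_day:.2f} GB/day, {gb_per_year:.0f} GB/year")
print(f"62^7 = {keyspace_7:,} possible codes")
```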
API design
POST /shorturl
request: {original_url: 'somestring'}
response: {short_url: 'bit.ly/aDF93d'}
GET /{short_code}
response: 301 redirect to original_url
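A hypothetical client's view of the two endpoints using the Python requests library (the shorturl.example host and the returned code are made up for illustration):

```python
import requests

BASE = "https://shorturl.example"   # hypothetical host, for illustration only

# POST /shorturl: submit a long URL, receive a short one.
resp = requests.post(
    f"{BASE}/shorturl",
    json={"original_url": "https://en.wikipedia.org/wiki/URL_shortening"},
)
short_url = resp.json()["short_url"]            # e.g. "shorturl.example/aDF93d"

# GET /{short_code}: the short URL answers with a 301 to the original.
redirect = requests.get(f"https://{short_url}", allow_redirects=False)
assert redirect.status_code == 301
print(redirect.headers["Location"])             # the original_url
```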
Database design
URL table
id: BigInt
original_url: text
short_url: text
created_at: datetime with timezone
hits: BigInt
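A sketch of this table as Postgres-flavored DDL (held in a Python string so it can be run through whichever driver or migration tool is used; BIGSERIAL for the id is an assumption):

```python
# Postgres-flavored DDL matching the model above.
CREATE_URL_TABLE = """
CREATE TABLE urls (
    id           BIGSERIAL   PRIMARY KEY,        -- BigInt id
    original_url TEXT        NOT NULL,           -- the long URL
    short_url    TEXT        NOT NULL,           -- the generated short code
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    hits         BIGINT      NOT NULL DEFAULT 0
);
"""
```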
High-level design
- edge Varnish cache for the most popular URLs (globally distributed)
- load balancer to distribute requests
- web server(s) to handle load
- global caching server (Redis); cheaper than hitting the DB
- database to durably store the URL mappings
Request flows
# writes
client makes POST request
request hits the load balancer and is sent to a stateless web server
web server writes the mapping to the database
web server updates the cache
web server returns the response to the client
# reads
client makes GET request
request hits the load balancer and is sent to a stateless web server
web server looks up the URL in the cache
on cache hit, return a 301 redirect to the original URL
on cache miss, web server looks up the URL in the database
on DB miss, return a 404
on DB hit, write the entry to the cache and return the 301 redirect
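A sketch of the read path, with plain dicts standing in for the Redis cache and the urls table:

```python
from typing import Optional

cache: dict[str, str] = {}    # stand-in for the global Redis cache
db: dict[str, str] = {}       # stand-in for the urls table (short_url -> original_url)

def db_lookup(short_code: str) -> Optional[str]:
    """Stand-in for: SELECT original_url FROM urls WHERE short_url = %s."""
    return db.get(short_code)

def handle_redirect(short_code: str) -> tuple[int, dict]:
    """Read path for GET /{short_code}: cache first, then DB, else 404."""
    original = cache.get(short_code)
    if original is None:                      # cache miss
        original = db_lookup(short_code)
        if original is None:                  # DB miss
            return 404, {}
        cache[short_code] = original          # populate the cache on a DB hit
    return 301, {"Location": original}        # redirect to the original URL
```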
Detailed component design
load balancer could use a couple of strategies:
- round robin
- smallest request queue (fewest outstanding requests)
- track latency metrics per server and route to the lowest-latency one (see the sketch below)
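A minimal sketch of the latency-tracking strategy, assuming we keep an exponentially weighted moving average of response time per backend (the 0.2 smoothing factor is arbitrary):

```python
class LatencyAwareBalancer:
    """Route each request to the backend with the lowest smoothed latency."""

    def __init__(self, backends: list[str], alpha: float = 0.2):
        self.alpha = alpha
        self.ewma = {b: 0.0 for b in backends}   # smoothed latency per backend (all start equal)

    def pick(self) -> str:
        return min(self.ewma, key=self.ewma.get)

    def record(self, backend: str, latency_ms: float) -> None:
        prev = self.ewma[backend]
        self.ewma[backend] = (1 - self.alpha) * prev + self.alpha * latency_ms

# usage
lb = LatencyAwareBalancer(["web-1", "web-2", "web-3"])
server = lb.pick()
lb.record(server, latency_ms=12.5)
```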
web server hash function
- an 8-character code gives us enough randomness to just generate one randomly per URL; if we ever create a collision, we can retry (sketch below)
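A minimal sketch of that random-code generation, using a 62-character alphabet and retrying when the code is already taken (the exists callback stands in for the unique-index check):

```python
import secrets
import string
from typing import Callable

ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits  # 62 symbols

def generate_code(exists: Callable[[str], bool], length: int = 8) -> str:
    """Randomly generate a short code, retrying on the (rare) collision."""
    while True:
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if not exists(code):           # in production this is the unique-index check
            return code

# usage: with 62^8 possible codes, collisions are vanishingly rare at our scale
taken: set[str] = set()
code = generate_code(exists=lambda c: c in taken)
```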
database
- relational database (Postgres)
- use a unique index on the hash (short code) to prevent collisions
- compound index on (short_url, original_url), which can serve the redirect lookup entirely from the index (DDL sketch below)
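The index DDL for the two indexes above (Postgres-flavored, same string convention as the table DDL):

```python
# Unique index prevents two rows claiming the same short code; the compound
# index lets the short_url -> original_url lookup be answered from the index
# without touching the table itself.
CREATE_INDEXES = """
CREATE UNIQUE INDEX urls_short_url_uniq ON urls (short_url);
CREATE INDEX urls_short_original_idx ON urls (short_url, original_url);
"""
```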
Trade offs/Tech choices
The "hit" calculation is asynchronous through queue population. It's not as important to have it be "read-your-writes" as creating a short-url.
The server and cache use the same processing queue to minimize code path and duplication.
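A sketch of the asynchronous hit path, with queue.Queue and a Counter standing in for the real message queue and the batched UPDATEs:

```python
import queue
import threading
from collections import Counter

hit_queue: "queue.Queue[str]" = queue.Queue()   # stand-in for the real message queue
hit_counts = Counter()                          # stand-in for UPDATE urls SET hits = hits + n

def record_hit(short_code: str) -> None:
    """Called on the redirect path; cheap, non-blocking, no read-your-writes needed."""
    hit_queue.put(short_code)

def hit_worker() -> None:
    """Drain the queue and apply hit counts in batches."""
    while True:
        batch = Counter()
        batch[hit_queue.get()] += 1              # block until at least one event arrives
        while not hit_queue.empty():
            batch[hit_queue.get_nowait()] += 1
        hit_counts.update(batch)                 # in production: one batched UPDATE per code

threading.Thread(target=hit_worker, daemon=True).start()
```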
We include an edge cache and a global Redis cache for multi-tier caching. The edge cache is primarily there to absorb power-law traffic where a handful of URLs blow up, keeping the system robust.
Multi-level caching makes cache coherency harder if we want to evict an item from all caches; there will be some delay, bounded by the TTL.
Postgres is a heavier data store than we strictly need, but it makes it easy to expand if we want to collect extra metrics/metadata.
Failure scenarios/bottlenecks
If a cache falls over, we might get a thundering herd that takes down the whole system. Having read replicas that can take over as the primary would be good for resiliency.
If a write request reaches the database but the response never makes it back to the user, they should be able to retry with the same long URL and get back the already-saved short URL from the database (the long URL acts like an idempotency key); see the sketch below.
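A sketch of that retry behavior: look up the long URL first and only generate a new code if it has never been shortened (note the reverse lookup assumes an index on original_url, which is not in the schema above and would need to be added):

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits   # 62 symbols

def shorten(original_url: str, db: dict[str, str]) -> str:
    """Create-or-return: the long URL acts like an idempotency key on retries."""
    # Reverse lookup (SELECT short_url FROM urls WHERE original_url = %s);
    # needs an index on original_url to be efficient.
    for code, url in db.items():
        if url == original_url:
            return code                    # a retry after a lost response gets the same code
    while True:                            # otherwise generate a fresh, unused code
        code = "".join(secrets.choice(ALPHABET) for _ in range(8))
        if code not in db:                 # unique-index check in the real DB
            db[code] = original_url
            return code

# usage: two identical requests return the same short code
store: dict[str, str] = {}
assert shorten("https://example.com/some/long/path", store) == \
       shorten("https://example.com/some/long/path", store)
```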
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?