Codemia | Master System Design Interviews Through Active Practice

My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach with Score: 9/10

by expanse_jubilee979

System requirements

Functional:

List functional requirements for the system (Ask the chat bot for hints if stuck.)...

Given a long url produce short url
1. User can optionally provide an alias for the url
Short URL is readable and short - 6 chars, a-z, 0-9 excluding i, I ,L, l, o, O, 1 characters.
1. 26 lowercase chars - [a - z]
2. 26 uppercase chars - [a - Z]
3. 0 - 9 : 10 digits - [0 - 9]
4. 62 usable chars - can make 62^6 combinations of hash - 56B hashes using base62 encoding
5. Optionally we could remove the confusing chars - I, i, 0, O, l, L which would leave us 56 chars, and add two chars @ and $, that results in 58 chars with Base58 encoding - comes around 38B hashes

Out Of Scope:

User Authentication & Account Management
Tracking click per short URL, geolocation
Expiring URL
Security, logging, monitoring & filtering malicious requests

Non-Functional:

List non-functional requirements for the system...

System ensures no duplication of short url
Redirection occurs less < 100ms
System is reliable and available 99.99% of time
System should support 10B URLs

CAP

The system prioritises consistency in WRITE (URL generation) and availability in READ (Redirection). The writes are consistent to a primary database node this is replicated to other nodes of database in synchronous manner. This has a latency for writes however our writes is comparatively lower than reads. Secondary nodes are only for read and a secondary node becomes primary in case of a fail over.

Capacity estimation

Estimate the scale of the system you are going to design...

URL Generated
1. 100K URLs per day i.e 1 URL/sec
2. Month - 3M URLs per month
3. Yearly - 36M URL per year
4. 5 yrs - 180M URLs in 5 yrs
Redirection
1. RPS - 1K redirections/sec
2. With a spike of 70% rounded to 2K redirections/sec
Storage Size
1. 3MB per URL record - short code (6 chars), username(15 chars), longurl(2048 chars), expiresOn(8 bytes), createdOn(8bytes)
2. overall approx 3MB per URL record x 180M URL Records = rounded to 550GB

The read to write ratio is 1000+:1 URL, that is every url which is created is read 1000 times, hence the database design takes into consideration of a high read throughput.

API design

Define what APIs are expected from the system...

Generate URL

POST /urls 
Request: {
  "userid" : "bob",
  "longUrl": "www.example.com/video1",
  "alias" "ABC123"
}
Response: HTTP 201 created with generated shorturl for instance https://mytinyurl.ly/ABC123

Verify Alias Availability

GET /urls/{alias}
Request: Path parameter
Response : 409 CONFLICT means alias is taken, 200 OK means the alias is available

This is a GET request that checks whether an alias is available against a Bloom filter in the url generator node

Redirection Request

Here a third-party site is requesting us to get the long url for a short url, for instance if some user clicks on a link "www.mytinyurl.ly/ABC123" where the host "mytinyurl.ly" is us and then we respond with a 301/302 HTTP response code with long url. Ideally the response is 302 so that its not cached in a CDN or DNS servers. This allows us to track the clicks of this short url in the future for billing purposes.

Database design

Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...

According to the design the read to write ratio is

URL Record in database - Over all max 3MB per record

userid (15 chars), 
longUrl (2MB), Indexed
expiresOn (timestamp), 
shortUrl (6 chars), Indexed field, Primary Key // this is also the alias in case user provided one
createdOn (timestamp) 

Database Sharding

The datatabase is sharded using the primary key of URL Record that is the generated short hash for the url. (Optionally, each shard can have a read replica for lookups). The routing is done as below,

mod(abs(CRC32(short-code)), N) \\ Where N is the no of postgres nodes

We use a consistent hashing strategy to figure out which shard we should write the data. If the user provides an alias, this may appear in unexpected partition than the generated short code based hash. The code that reads the data first identifies which shard to read from by computing a mod of the short code and reach out to the respective shard.

shard_id = mod(abs(CRC32(short-code)), N) 

Read operations attach the share id to the query, this will introduce a latency since the application has to calculate the hash & mod the short code, find out which shard the data is available and read from the shard.

Database Partition

Under a shard, the short url code is used for partitioning the data this is infact hash based partitioning.

Handling Hot Shard/Partition

When a short url goes viral and spikes the request, caching the url onto Redis is an option to avoid looking up in a shard. We may never know when the url is created whether its going to be viral or not, however on the first redirect we push this to Redis cache and further it stays on in Redis (with LRU eviction strategy)

High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design. If you are unfamiliar with the tool, you can simply describe your design to the chat bot and ask it to generate a starter diagram for you to modify...

API - Generate short URL, get long url using a short url, check whether alias exists or not
Service layer
1. URL Generator Service - Generates a short url code for a given long url
2. Redirection Service - Given a short url, the redirection service lookup in cache with key shortUrl, if found the long url is returned to API gateway else the service looks up in database and pull the shortUrl record, add to cache, returns the long URL which API gateway wraps into response header with HTTP 302
3. Alias Lookup service - a global bloom filter that says an alias is definitely not found or probably its taken
4. Service layer deployed in Kubernetes or containerized environment lets API gateway pass the request to Kubernetes controller and let Kubernetes load balance the request to an instance of a microservice

Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...

UI - User opens website and submits a long url + optional alias
Load balancer - UI submits a request to load balancer working in round robin fashion routes the requests to respective api gateway i.e routing based on the path in the url
API gateway - The api gateway routes the request to an instance of a microservice. At the same time api gateway logs metrics on who, when and what is looked up or generating a url
The API forwards request to a service that generates a short url and returns with HTTP 201 response code indicating url is generated and persisted in the database along with generated url. User copies this url.
Third-party sites or application presents a shorturl with a domain https://tinyurl.ly/{short-url} which the API gateway redirects to a redirection service that looks up in the cache, on cache miss, the services looks up in database, add to cache and return the long url

Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...

Short URL Generator - One or more nodes that generates a hash for the given long url.
Cache Service - a redis cache service that uses LRU eviction strategy to cache the shortUlr-to-longurl key/values for fast lookup using short url
Alias lookup - checks whether a given alias is taken or not

Hash Generation Steps

Each short url generator has a logic to generate a hash without collisions and without the need for distributed counter or hash generator. Each node takes a 36 bit id, which contains (bits from right to left)

0 - 21 bits - 22 bit time stamp from latest epoch for instance from 18-May-2025
22 to 27 - 6 bit node id that is generating the code, each node has a unique id assigned at the start
28 to 36 - a 8 bit sequence number starting at 0 and monotonically increases, this is persisted onto disk/database for recover from service crash
Once a 32 bit ID is generated, the id is base62 encoded to generate a hash. This hash is used as the short url code

Verifying Alias exists or not

Each url generator service connects to a global bloom filter. The Bloom filter uses hashes and keep the bits on/off for the hash and lightning fast to say whether an alias exists or not. however it might be wrong with 1% error, however it can act as first defence and avoid going into database. The given alias is checked against the bloom filter for its existence, if it says "may-be-present", we confirm this against a database lookup else we definitely let the user continue to use alias ie. when bloom filter says no then we let the alias go through.

Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

Avoids distributed hash generator - this introduces maintenance cost of running a zookeeper or such service to generate counter ranges and let individual nodes generate its own unique hashes
Postgres database - since we are dealing less writes compared to reads 1000 Reads to single url write, as well as well-structured data of a long url record a regular RDBMS with sharding and partition along with an external cache (Redis) is good enough to handle 180M URLs in 5 yrs. A RDBMS binary search on indexes will be sufficient to handle this writes. Using a NoSQL helps in sharding, partitioning, scaling to either write or read helps however that might be a heavy lift atleast at the start of this solution.
Load Balancer - NGINX or ALB to route the request to a API Gateway for either generating short url or redirection requests
API Gateway - A gateway service like Kong api gateway or AWS api gateway that routes requests, log requests, has additional functions to rate limit or deny requests is suitable
Short URL generator/Redirector service - a nodejs or even a java/go/rust service deployed in kubernetes is suitable to scale. the kubernetes has its own orchestration, load balancing and controller components to route requests to a service running in the pod.

Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

Expiring a URL - this requires synchronizing database with redis cache
Hash Collision - Can have a Bloom filter to quickly check whether a hash is already seen earlier to avoid collisions, this bloom filter is deployed in each of the url generator node.
database. - The single instance of postgres going down is a major bottleneck for redirection and url generator. This needs to have a synchronous replica to a standby node that will ensure failover and disaster recovery
Placing the cache service right below API gateway or loadbalancer before redirection microservice helps in looking up fast, however we cannot track metrics of how many lookups are done and bill the user. Though this is not an immediate requirement, but in the future we might introduce these functionalities

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?

Billing and Payment - handle billing user based on the redirections - say 0.001 USD for a redirection
Implement telemetry - track who clicked, from where and target for redirection requests. Helps the user in devising specific marketing strategy for a region
Logging & Monitoring - implement observability and monitoring
User Authentication & Authorization