Designing A Simple Url Shortening Service A TinyURL Approach - System Design

My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach

by phantom_quantum863

Requirements

Functional Requirements:

Create a short URL for a given long URL.
Return the long URL associated with a given short URL.
Links expire after x time of non-usage

Non-Functional Requirements:

Low Latency (200ms)
Scalability (Able to handle 100k RpS using horizontal scalability)
Reliability total availability 99%
Rate limit to 5 requests per second

Questions:

Can users see how their URLs performed? analytics?
If yes, can two users with the same URL generate the same hash? my assumption is no.
Do we need analytics for the whole system? or is this out of scope?

Capacity Estimation

We have read bias

Store Links indefinitely require an DB cluster

Average link data is size 200 characters

1M links per day

200M character each day in the DB this gives us 6 Billion characters in the DB or 6GB a month.

5 years projection: 360 GB

Inbound traffic for link creation is 6GB a day

Inbound traffic for link redirection 1GB a day given 100 million short url redirections.

API Design

Post /v1/shorten

Body {"url": ""}

return 201 // no content

return 400 no valid url exists in the body

return 500 internal server error {"errorCode: int, message: ""}

GET

return 302 redirect

return 404 no short url found

return 500 internal server error {"errorCode: int, message: ""}

High-Level Design

Client: just an input filled and a path
API Server:
- Resolve short URLs
- Send shorten URL requests to URLProcessor service.
- Send retrieve URL event to analytics engine.
URLProcessor: perform the URL shortening logic, including hash collision resolution etc.
Pubsub: for communicating SBE between FE and the URLProcessor Service.
KV Store: for quickly caching the hot urls
Message/event bus: to send a URL to URLProcessor
Database: Store URL mapping wit metadata maybe
Analytics Engine: Listens to the messages
Analytics database: a timeseries database to store access patterns.

Notes on technology selection:

NATS: supports the user cases of KV pubsub, and messaging queue
With NATS durable storage I can claim that we don't even need another database. although I would use one for the analytic.

Scalability:

Separating the API and the URLProcessor we can do a horizontal scaling for both independently.
Another approach is to create a gateway component that redirects traffic to the api to two components which which can also be scaled separately.
- Creating URLs
- Redirect Service.
Scaling can be done based on request load.

Low Latency:

Using a cache to load the hot URLs makes sure the latency stays very low.

Availability:

Since the components are independent then they can have redundancy behind the load balancer.

Database Design

Database can have a basic table called URLs, it has very basic information based on the requirements above. since we are user agnostic then the URL should be maybe have URL, hash, created at and last accessed (this will help with crashes later to auto reload last access URL/hot URLs) the last access at field will be pushed as a bulk snapshot from the KV store.

Database can be shared using hash ranges to have stable shards and avoid re-balancing.

Since this is a read heavy system we can have read replicas for the URLs, with read repair (maybe),read from active replica or read your own writes to provide strong consistency.

Detailed Component Design

Client:

Typing a URL and hitting submit will do two things will send api request to shorten the URL and will subscribe to URLShortened Event with the URL as a key
When a the event is published the user will be able to see the shortened URL
Client access should be rate limited

API:

All traffic to the API should be both rate limited and load balanced.
Each request to visit shortened URL first check the cache for a url hit. If there is no hit then the database is checked and the the result is loaded on the cache.
- Note: Use a CDN for frequently visited URLs to off load the API completely instead of the internal cache.

URLProcessor:

It receives a URL Creation Event
Search for the URL in the database, if it exists just broadcast a URLShortened event to the pubsub ( this way we don't have to create anew shortedned URL to URLS we already have)
if not then generate a hash and pull all hashes that have this has as a prefix. multiple exists then append a new suffix to the hash and then save the URL and broadcast URLShortened event.

Markdown supported

My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach

by phantom_quantum863

Requirements

Functional Requirements:

Create a short URL for a given long URL.
Return the long URL associated with a given short URL.
Links expire after x time of non-usage

Non-Functional Requirements:

Low Latency (200ms)
Scalability (Able to handle 100k RpS using horizontal scalability)
Reliability total availability 99%
Rate limit to 5 requests per second

Questions:

Can users see how their URLs performed? analytics?
If yes, can two users with the same URL generate the same hash? my assumption is no.
Do we need analytics for the whole system? or is this out of scope?

Capacity Estimation

We have read bias

Store Links indefinitely require an DB cluster

Average link data is size 200 characters

1M links per day

200M character each day in the DB this gives us 6 Billion characters in the DB or 6GB a month.

5 years projection: 360 GB

Inbound traffic for link creation is 6GB a day

Inbound traffic for link redirection 1GB a day given 100 million short url redirections.

API Design

Post /v1/shorten

Body {"url": ""}

return 201 // no content

return 400 no valid url exists in the body

return 500 internal server error {"errorCode: int, message: ""}

GET

return 302 redirect

return 404 no short url found

return 500 internal server error {"errorCode: int, message: ""}

High-Level Design

Client: just an input filled and a path
API Server:
- Resolve short URLs
- Send shorten URL requests to URLProcessor service.
- Send retrieve URL event to analytics engine.
URLProcessor: perform the URL shortening logic, including hash collision resolution etc.
Pubsub: for communicating SBE between FE and the URLProcessor Service.
KV Store: for quickly caching the hot urls
Message/event bus: to send a URL to URLProcessor
Database: Store URL mapping wit metadata maybe
Analytics Engine: Listens to the messages
Analytics database: a timeseries database to store access patterns.

Notes on technology selection:

NATS: supports the user cases of KV pubsub, and messaging queue
With NATS durable storage I can claim that we don't even need another database. although I would use one for the analytic.

Scalability:

Separating the API and the URLProcessor we can do a horizontal scaling for both independently.
Another approach is to create a gateway component that redirects traffic to the api to two components which which can also be scaled separately.
- Creating URLs
- Redirect Service.
Scaling can be done based on request load.

Low Latency:

Using a cache to load the hot URLs makes sure the latency stays very low.

Availability:

Since the components are independent then they can have redundancy behind the load balancer.

Database Design

Database can be shared using hash ranges to have stable shards and avoid re-balancing.

Since this is a read heavy system we can have read replicas for the URLs, with read repair (maybe),read from active replica or read your own writes to provide strong consistency.

Detailed Component Design

Client:

Typing a URL and hitting submit will do two things will send api request to shorten the URL and will subscribe to URLShortened Event with the URL as a key
When a the event is published the user will be able to see the shortened URL
Client access should be rate limited

API:

All traffic to the API should be both rate limited and load balanced.
Each request to visit shortened URL first check the cache for a url hit. If there is no hit then the database is checked and the the result is loaded on the cache.
- Note: Use a CDN for frequently visited URLs to off load the API completely instead of the internal cache.

URLProcessor:

It receives a URL Creation Event
Search for the URL in the database, if it exists just broadcast a URLShortened event to the pubsub ( this way we don't have to create anew shortedned URL to URLS we already have)
if not then generate a hash and pull all hashes that have this has as a prefix. multiple exists then append a new suffix to the hash and then save the URL and broadcast URLShortened event.

Markdown supported