Designing A Simple Url Shortening Service A TinyURL Approach - System Design

My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach

by spectrum_kraken284

Requirements

Functional Requirements:

Create a short URL for a given long URL.
Return the long URL associated with a given short URL.
The url will be stored for 3 days
If the url already exists, it can return the same value
Same url can only have limit of 50 requests per minute

Non-Functional Requirements:

low latency - every endpoint must have low latency, get must be under 300 ms and post must be under 500 ms. Both must have 95% of acceptable time requests
high availability - the system cannot be down even if we have deployments
Every service must have at least 2 tasks for redundancy
Horizontal scalabilitty:
- Every task must have auto-scalling configured, if it has over 80% of memory or cpu consumption it must start one new task, if it has under 40% of memory or cpu consumption it must stop one task
- The database must aways be available, as dynamodb, it can be used sharding
- Every servicemust have a load balancer to controle the traffic between the tasks defined in ecs cluster

API Design

Define the APIs expected from the system. This is your chance to analyze and define the read and write paths so that you can come up with the high-level design...

POST /shorten

URL create to receive a url and return the shorten url

GET /{id}

Receive the id and redirect to page

High-Level Design

Clients: Users or apps that will use the url shortening service

CDN: CDN to cache redirect requests

API Gateway:

Entry of the point for requests. It will have a generic rate limit implemented to avoid attacks (each endpoint will have limit of 600 requests per minute).

Shortener service:

Rate limit: Will implement rate limit by urls
The service resposible for shortening the url and return the new url for client. When the client requests POST /shorten, the api-gateway will redirect to this service.
To create the shortened url, we can use the snowflake_id (hash these values: task_id, timestamp, sequence).
- The task_id will be the hash of a random uuid v4 generated. When the tasks starts, it will create a global UUID vinculated with this task
- The timestamp will be at current time
- sequence is based on same timestamp and always will reset on next timestamp
This services will define de primary key of the table in a deterministic way based on url value, it will be the canonical key of table

redirect-service: Service responsible for search the url by the code and redirect to the page. It will use caffeine cache for local cache to reduce load in database

database: dynamodb, will store the id of url and will be de primary key and the original url. It will use horizontally scalabilty, will priorize availability than consistency. For high demands it will use auto-scalling configuration to keep low latency and multi-az replication for availabilty

Detailed Component Design

Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.

CDN: CDN to cache redirect requests

It will only cache the get methods to redirect client to the original url, the ttl will be configured at 300 seconds

Function of shortener service:

The rate limit will use a redis to controle the number of requests. The key will be the url and the value will be the number of requests, every record will have 1 minute of ttl. The max number of requests will be 60
The shortener services generates the id and code of the link to bound with the original link
To generate the id the service will do those operations:
- normalize the url removing special chars and set all characters to lower case
- generate a hash based on url using the sha-256 algorith
- concat the value with URL#{hash_url}
the url code it will be the combined hash of theses attributes
- task_id: will be a random uuid based on ti
- timestamp: current timestapm
- sequence: the sequence based on timestamp, witch means that if one url is asked to be shortened at same timestamp at same task, it will be increased by 1, if the timestamp chances the sequence will be reset to 0
The combination of these three properties will have extreme low probabilty of same code conflicts

redirect-service

the redirect service will use a local cache. If the id will not be found on cache, it will search the data on database

Database

It will be dynamodb
it will use the id as pk
this database will focus more on availability than consistency
It will use the horizontal scalability and eventual consistency
For replication it will use multi-az
It will be configured with ttl of 3 days
This will be the datamodel:
- id -> pk of table and will be the canonical key genearated on shortener service
- url_code: the url genarate with snowflake_id
- original_url: the original url
- expires_at: date of expiring record, used to controle the ttl of the record

Markdown supported