My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach

by spectrum_kraken284

Requirements


Functional Requirements:


  • Create a short URL for a given long URL.
  • Return the long URL associated with a given short URL.
  • The url will be stored for 3 days
  • If the url already exists, it can return the same value
  • Same url can only have limit of 50 requests per minute




Non-Functional Requirements:


  • low latency - every endpoint must have low latency, get must be under 300 ms and post must be under 500 ms. Both must have 95% of acceptable time requests
  • high availability - the system cannot be down even if we have deployments
  • Every service must have at least 2 tasks for redundancy
  • Horizontal scalabilitty:
    • Every task must have auto-scalling configured, if it has over 80% of memory or cpu consumption it must start one new task, if it has under 40% of memory or cpu consumption it must stop one task
    • The database must aways be available, as dynamodb, it can be used sharding
    • Every servicemust have a load balancer to controle the traffic between the tasks defined in ecs cluster


API Design

Define the APIs expected from the system. This is your chance to analyze and define the read and write paths so that you can come up with the high-level design...


POST /shorten

  • URL create to receive a url and return the shorten url

GET /{id}

  • Receive the id and redirect to page


High-Level Design

Clients: Users or apps that will use the url shortening service

CDN: CDN to cache redirect requests

API Gateway:

  • Entry of the point for requests. It will have a generic rate limit implemented to avoid attacks (each endpoint will have limit of 600 requests per minute).

Shortener service:

  • Rate limit: Will implement rate limit by urls
  • The service resposible for shortening the url and return the new url for client. When the client requests POST /shorten, the api-gateway will redirect to this service.
  • To create the shortened url, we can use the snowflake_id (hash these values: task_id, timestamp, sequence).
    • The task_id will be the hash of a random uuid v4 generated. When the tasks starts, it will create a global UUID vinculated with this task
    • The timestamp will be at current time
    • sequence is based on same timestamp and always will reset on next timestamp
  • This services will define de primary key of the table in a deterministic way based on url value, it will be the canonical key of table


redirect-service: Service responsible for search the url by the code and redirect to the page. It will use caffeine cache for local cache to reduce load in database


database: dynamodb, will store the id of url and will be de primary key and the original url. It will use horizontally scalabilty, will priorize availability than consistency. For high demands it will use auto-scalling configuration to keep low latency and multi-az replication for availabilty



Detailed Component Design

Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.


CDN: CDN to cache redirect requests

  • It will only cache the get methods to redirect client to the original url, the ttl will be configured at 300 seconds

Function of shortener service:

  • The rate limit will use a redis to controle the number of requests. The key will be the url and the value will be the number of requests, every record will have 1 minute of ttl. The max number of requests will be 60
  • The shortener services generates the id and code of the link to bound with the original link
  • To generate the id the service will do those operations:
    • normalize the url removing special chars and set all characters to lower case
    • generate a hash based on url using the sha-256 algorith
    • concat the value with URL#{hash_url}
  • the url code it will be the combined hash of theses attributes
    • task_id: will be a random uuid based on ti
    • timestamp: current timestapm
    • sequence: the sequence based on timestamp, witch means that if one url is asked to be shortened at same timestamp at same task, it will be increased by 1, if the timestamp chances the sequence will be reset to 0
  • The combination of these three properties will have extreme low probabilty of same code conflicts

redirect-service

  • the redirect service will use a local cache. If the id will not be found on cache, it will search the data on database

Database

  • It will be dynamodb
  • it will use the id as pk
  • this database will focus more on availability than consistency
  • It will use the horizontal scalability and eventual consistency
  • For replication it will use multi-az
  • It will be configured with ttl of 3 days
  • This will be the datamodel:
    • id -> pk of table and will be the canonical key genearated on shortener service
    • url_code: the url genarate with snowflake_id
    • original_url: the original url
    • expires_at: date of expiring record, used to controle the ttl of the record




Supports markdown