Requirements


Functional Requirements:


  • Create a short URL for a given long URL.
  • Return the long URL associated with a given short URL.
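The requirements do not say how short codes are generated. One common approach, assumed here and not taken from this design, is to give each mapping a unique integer ID and encode it in base62. The function names and the alphabet order below are illustrative choices.

```python
import string

# Base62 alphabet: digits, then lowercase, then uppercase (order is an arbitrary choice).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a base62 short code."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62: turn a short code back into its integer ID."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

For example, `encode_base62(62)` returns `"10"`. Seven characters give 62^7 ≈ 3.5 trillion distinct codes.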



Non-Functional Requirements:


  • Low latency:
    • Database replicas placed by geographic region, close to users
    • Caching, plus a CDN for publicly exposed short URLs
    • Service replicas at geographic edge locations, behind a load balancer (shown as the API gateway in the diagram)
  • High availability:
    • Database sharding, with replicas placed by geographic region
    • A load balancer in front of the service replicas
    • Service replicas at geographic edge locations
    • Geo-routing in the API gateway, so traffic can be sent to another region if one fails
  • Scalability:
    • Handled by the load balancer, which spreads traffic across service replicas
    • Services are scaled up or down in each geographic location according to demand
  • Reliability/resilience:
    • Handled by the load balancer
    • Requests are rerouted to replicas in other geographic locations if one location goes down
    • For the database, each shard is replicated
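The geo-routing and failover behaviour described above can be sketched as follows. This is not the gateway's actual configuration. The assumption is that the gateway knows each region's latency and health, sends the request to the lowest-latency healthy region, and skips regions that are down. `Region`, `pick_region`, and the region names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Region:
    name: str          # hypothetical region identifier, e.g. "eu"
    latency_ms: float  # measured or estimated latency from the client
    healthy: bool      # result of the gateway's (assumed) health checks


def pick_region(regions: list[Region]) -> Region:
    """Route to the lowest-latency healthy region, failing over past unhealthy ones."""
    candidates = [r for r in regions if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=lambda r: r.latency_ms)
```

If the nearest region is marked unhealthy, the next-closest healthy region receives the traffic. This is the rerouting behaviour listed under reliability.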


API Design

  • POST: create a short URL for a given long URL
  • GET: fetch the long URL mapped to a short URL (used for the redirect)
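As a framework-free sketch of the two endpoints, each handler below returns a status code and either a JSON-like body or response headers. The paths (`POST /urls`, `GET /{code}`), the `long_url`/`short_url` field names, the domain `https://sho.rt/`, and the random code generator are all assumptions for illustration. The in-memory dict stands in for the storage layer.

```python
from __future__ import annotations

import secrets
from http import HTTPStatus

BASE_URL = "https://sho.rt/"  # hypothetical short domain
_store: dict[str, str] = {}   # in-memory stand-in for the storage layer


def post_url(body: dict) -> tuple[int, dict]:
    """POST /urls (hypothetical path): create a short URL for body["long_url"]."""
    long_url = body.get("long_url")
    if not long_url:
        return HTTPStatus.BAD_REQUEST, {"error": "long_url is required"}
    code = secrets.token_urlsafe(5)  # 5 random bytes -> 7 URL-safe characters
    while code in _store:            # retry on the (rare) collision
        code = secrets.token_urlsafe(5)
    _store[code] = long_url
    return HTTPStatus.CREATED, {"short_url": BASE_URL + code}


def get_url(code: str) -> tuple[int, dict]:
    """GET /{code}: redirect to the stored long URL via the Location header."""
    long_url = _store.get(code)
    if long_url is None:
        return HTTPStatus.NOT_FOUND, {}
    return HTTPStatus.FOUND, {"Location": long_url}
```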



High-Level Design

The request flow is as follows:

  • The client request goes to the API gateway
  • The API gateway decides which service, and which replica of it, handles the request, balancing load across replicas
  • For URL creation, the service:
    • Generates a mapping from a new short code to the long URL
    • Sends the mapping to the storage layer
    • Receives confirmation from the storage layer that the record was saved
    • Returns the short URL in the API response
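  • For URL redirection, the service:
    • Asks the storage layer for an existing mapping
    • If one exists, returns it
    • Puts the long URL in the Location header of a redirect response so the browser follows it automatically. 302 (Found) is the likely status; 301 (Moved Permanently) would let browsers cache the redirect and skip the service on later visits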
  • The storage layer:
    • For creation, saves the record directly in the database
    • For reads, queries the cache first; on a hit, returns the cached result
    • On a miss, reads from the database, stores the result in the cache, then returns it
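The storage layer's behaviour in the flow above is a cache-aside read path. Writes go straight to the database, and the cache is filled lazily on a read miss. In this sketch, plain dicts stand in for the real database and cache. `StorageLayer` and the `db_reads` counter are hypothetical names used to make the behaviour observable.

```python
from __future__ import annotations

from typing import Optional


class StorageLayer:
    """Cache-aside storage: write to the DB, read through the cache."""

    def __init__(self) -> None:
        self.db: dict[str, str] = {}     # stand-in for the database
        self.cache: dict[str, str] = {}  # stand-in for the cache
        self.db_reads = 0                # how many reads reached the database

    def save(self, code: str, long_url: str) -> None:
        # Creation: write directly to the database; the cache is not touched.
        self.db[code] = long_url

    def get(self, code: str) -> Optional[str]:
        # Read: try the cache first.
        if code in self.cache:
            return self.cache[code]
        # Cache miss: fall back to the database.
        self.db_reads += 1
        long_url = self.db.get(code)
        if long_url is not None:
            self.cache[code] = long_url  # populate the cache for later reads
        return long_url
```

After the first read of a code, repeated reads are served from the cache, and `db_reads` stays unchanged.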




Detailed Component Design

Cache

  • Hit counts per cached record need to be analysed. Caching every record on its first read, with no eviction policy informed by those hit counts, can cause excessive traffic between the cache and the database.
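One way to gather the hit data mentioned above is a bounded cache that counts hits per key while evicting entries. Least-recently-used (LRU) eviction is assumed here as one candidate policy; the design has not settled on one. The class name is hypothetical.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Optional


class LRUCacheWithStats:
    """Bounded LRU cache that also records per-key hit counts."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data: OrderedDict[str, str] = OrderedDict()
        self.hits: dict[str, int] = {}  # raw data for hit analysis

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        self.hits[key] = self.hits.get(key, 0) + 1
        return self._data[key]

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
```

The `hits` counts could feed a later decision, such as switching to frequency-based eviction or pre-warming popular codes.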

Creation Service

  • The same long URL may be shortened many times. Because there is no user management, duplicate requests are hard to track, and the database can be flooded with redundant records.
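A possible mitigation, assumed here and not decided by the design, is a reverse index from a normalized long URL to its existing code. Repeated requests then return the same short URL instead of inserting a new row. The normalization (lowercasing only the scheme and network location) and the counter-based code are simplified, hypothetical stand-ins.

```python
from __future__ import annotations

from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lowercase the scheme and network location so trivial variants share one key."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, parts.query, parts.fragment))


class DedupingCreator:
    def __init__(self) -> None:
        self.by_code: dict[str, str] = {}  # short code -> long URL
        self.by_long: dict[str, str] = {}  # reverse index: normalized long URL -> code

    def create(self, long_url: str) -> str:
        key = normalize(long_url)
        existing = self.by_long.get(key)
        if existing is not None:
            return existing  # reuse the existing mapping, no new row
        code = format(len(self.by_code) + 1, "x")  # hypothetical code generator
        self.by_code[code] = long_url
        self.by_long[key] = code
        return code
```

The trade-off is one extra index lookup (and storage for the reverse index) on every creation request.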

Latency

  • Purely geo-based storage is hard, since a short URL may be accessed from anywhere. One solution is to replicate records to multiple geographic edge locations, then analyse where hits come from to decide where each record should be kept.
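The hit-location analysis above could start as simply as counting hits per region for a record and keeping replicas in the regions that carry a meaningful share of traffic. The cap `k`, the `min_share` threshold, and the region names are hypothetical tuning choices.

```python
from __future__ import annotations

from collections import Counter


def choose_replica_regions(hit_regions: list[str], k: int = 2,
                           min_share: float = 0.1) -> list[str]:
    """Return up to k regions, busiest first, that each serve at least min_share of hits."""
    counts = Counter(hit_regions)
    total = sum(counts.values())
    if total == 0:
        return []
    return [region for region, n in counts.most_common(k) if n / total >= min_share]
```

For a record hit 6 times from "eu", 3 from "us" and once from "ap", the default settings keep replicas in "eu" and "us".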