Requirements


Functional Requirements:


  • Shorten: long URL --> short URL
  • Resolve: short URL --> long URL
  • Optional expiration time for each short URL
  • Track analytics data (for example, how often a short URL is used)



Non-Functional Requirements:


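  • Low latency: the service is simple, so a budget of 200 ms per request is more than enough
  • High scalability: capacity can be increased easily by adding nodes
  • High availability: the service stays intact even when some nodes fail
  • Data durability: stored URL mappings must not be lost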


API Design


  • Create(longUrl string) string which accepts the long url and returns a short url.
  • FindOriginal(shortUrl string) string which accepts the short url and returns the long url.
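A minimal in-memory sketch of these two calls, including the optional expiration time from the requirements. The names MemoryShortener and ErrNotFound, the ttlSeconds parameter, and the error return on FindOriginal are assumptions made for illustration, and the counter-based code is only a placeholder for the ID generation service.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"sync"
	"time"
)

// ErrNotFound is a hypothetical error for unknown or expired short codes.
var ErrNotFound = errors.New("short URL not found or expired")

type entry struct {
	longURL   string
	expiresAt int64 // Unix seconds; 0 means the entry never expires
}

// MemoryShortener is an in-memory stand-in for the real service.
type MemoryShortener struct {
	mu      sync.Mutex
	entries map[string]entry
	nextID  uint64
	now     func() int64 // clock in Unix seconds, replaceable in tests
}

func NewMemoryShortener() *MemoryShortener {
	return &MemoryShortener{
		entries: make(map[string]entry),
		now:     func() int64 { return time.Now().Unix() },
	}
}

// Create stores longURL and returns a short code.
// ttlSeconds <= 0 means the mapping does not expire.
func (s *MemoryShortener) Create(longURL string, ttlSeconds int64) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.nextID++
	code := strconv.FormatUint(s.nextID, 36) // placeholder ID scheme
	e := entry{longURL: longURL}
	if ttlSeconds > 0 {
		e.expiresAt = s.now() + ttlSeconds
	}
	s.entries[code] = e
	return code
}

// FindOriginal returns the long URL, or ErrNotFound if the code is
// unknown or its expiration time has passed.
func (s *MemoryShortener) FindOriginal(code string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.entries[code]
	if !ok || (e.expiresAt != 0 && s.now() >= e.expiresAt) {
		return "", ErrNotFound
	}
	return e.longURL, nil
}

func main() {
	s := NewMemoryShortener()
	code := s.Create("https://example.com/some/long/path", 3600)
	longURL, err := s.FindOriginal(code)
	fmt.Println(code, longURL, err)
}
```

Returning an error from FindOriginal lets the caller tell a missing or expired code apart from an empty URL, which the plain string return type cannot express.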


High-Level Design




Components needed:

  • CDN
  • Load balancer
  • API gateway

Services:

  • Shortening service (long to short)
  • Finding service (short to long)
  • ID generation service

Storage:

  • RDBMS
  • Cache
  • Analytics store





Detailed Component Design



The load balancer can be scaled vertically by adding more hardware, or horizontally by adding more load balancer nodes. All load balancer nodes need to see the same backend server list, which can be kept in a shared store such as etcd or a database. The load balancer also runs health checks so that traffic is not routed to failed nodes. Several algorithms can be used here, for example round robin or least connections. Round robin works well in simple scenarios, but it does not know how busy each node actually is, and some requests take longer than others. Least connections sends each request to the node with the fewest open connections, but the connection count alone does not tell us whether a node is overloaded. Weighted least connections, which also takes each node's capacity into account, is probably a reasonable choice too.
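The weighted least connections idea can be sketched as follows. This is an illustration under assumptions, not a production balancer: the Backend and Balancer types, the Weight and Healthy fields, and ErrNoBackend are hypothetical names, and the health checker that would update Healthy is not shown.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Backend describes one node behind the load balancer (hypothetical type).
type Backend struct {
	Addr    string
	Weight  int  // relative capacity; must be > 0 to receive traffic
	Healthy bool // would be updated by a health checker (not shown)
	active  int  // requests currently in flight
}

// ErrNoBackend is returned when no healthy backend is available.
var ErrNoBackend = errors.New("no healthy backend")

type Balancer struct {
	mu       sync.Mutex
	backends []*Backend
}

// Acquire picks the healthy backend with the lowest active/Weight ratio
// and counts the new request against it.
func (b *Balancer) Acquire() (*Backend, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	var best *Backend
	for _, be := range b.backends {
		if !be.Healthy || be.Weight <= 0 {
			continue
		}
		// Compare be.active/be.Weight < best.active/best.Weight
		// by cross-multiplying, which avoids floating point.
		if best == nil || be.active*best.Weight < best.active*be.Weight {
			best = be
		}
	}
	if best == nil {
		return nil, ErrNoBackend
	}
	best.active++
	return best, nil
}

// Release must be called when the request has finished.
func (b *Balancer) Release(be *Backend) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if be.active > 0 {
		be.active--
	}
}

func main() {
	lb := &Balancer{backends: []*Backend{
		{Addr: "10.0.0.1:8080", Weight: 1, Healthy: true},
		{Addr: "10.0.0.2:8080", Weight: 2, Healthy: true},
	}}
	for i := 0; i < 3; i++ {
		be, err := lb.Acquire()
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Println("routing request to", be.Addr)
	}
}
```

With weights 1 and 2 and no requests finishing, the second node ends up holding about twice as many connections as the first, which is the intended effect of the weights. The weights do not solve the underlying limitation noted above: a connection count, weighted or not, is still only a proxy for real load.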


The database can also be scaled vertically or horizontally: vertically by adding more hardware, horizontally by adding more read and write replicas. Running multiple nodes also supports high availability. The database works together with a cache that holds recently used mappings. Because the data model is simple (one short URL maps to one long URL), we can simply cache each mapping and set an expiration time on it.
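One way to read "cache the data and set an expiration time" is the cache-aside pattern: look in the cache first, fall back to the database on a miss, then store the result with a TTL. The sketch below uses an in-process map as a stand-in for a real cache server; TTLCache, Resolve, and the loadFromDB callback are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type cacheItem struct {
	value     string
	expiresAt int64 // Unix seconds
}

// TTLCache is an in-process stand-in for an external cache.
type TTLCache struct {
	mu    sync.Mutex
	items map[string]cacheItem
	ttl   int64        // lifetime of each entry in seconds
	now   func() int64 // clock in Unix seconds, replaceable in tests
}

func NewTTLCache(ttlSeconds int64) *TTLCache {
	return &TTLCache{
		items: make(map[string]cacheItem),
		ttl:   ttlSeconds,
		now:   func() int64 { return time.Now().Unix() },
	}
}

// Get returns the cached value unless it is missing or expired.
func (c *TTLCache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	item, ok := c.items[key]
	if !ok {
		return "", false
	}
	if c.now() >= item.expiresAt {
		delete(c.items, key) // drop the stale entry
		return "", false
	}
	return item.value, true
}

// Set stores the value with the cache-wide TTL.
func (c *TTLCache) Set(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = cacheItem{value: value, expiresAt: c.now() + c.ttl}
}

// Resolve implements cache-aside: try the cache, then the database,
// and populate the cache on a successful database read.
func Resolve(c *TTLCache, code string, loadFromDB func(string) (string, bool)) (string, bool) {
	if v, ok := c.Get(code); ok {
		return v, true
	}
	v, ok := loadFromDB(code)
	if ok {
		c.Set(code, v)
	}
	return v, ok
}

func main() {
	cache := NewTTLCache(300)
	db := map[string]string{"abc123": "https://example.com/long"}
	load := func(code string) (string, bool) {
		v, ok := db[code]
		return v, ok
	}
	fmt.Println(Resolve(cache, "abc123", load)) // first call reads the database
	fmt.Println(Resolve(cache, "abc123", load)) // second call is served from the cache
}
```

If a short URL has its own expiration time, the cache TTL should not outlive it; this sketch leaves that check out.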


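ID generation is important here; uniqueness and safety are crucial. We could simply generate a random string of length 10 over a 64-character alphabet (base64, preferably the URL-safe variant, since '+' and '/' have special meaning in URLs). 64 to the power of 10 is about 1.15 × 10^18, which is extremely large. If there is a collision, we just retry. Generation is extremely fast and, most importantly, the code is not derived from the original long URL, so it reveals nothing about that URL and is hard to guess, provided the random source is cryptographically secure.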
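A minimal sketch of this scheme, assuming the URL-safe base64 alphabet and Go's crypto/rand as the random source. The helper names randomCode and generateUnique, the taken callback, and the retry limit are hypothetical.

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
)

// URL-safe base64 alphabet: 64 characters, none of which need escaping in a URL.
const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

const codeLength = 10

// ErrTooManyCollisions is returned when every attempt hit an existing code.
var ErrTooManyCollisions = errors.New("could not find a free short code")

// randomCode returns codeLength characters drawn uniformly from alphabet.
func randomCode() (string, error) {
	buf := make([]byte, codeLength)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	for i, b := range buf {
		// 256 is a multiple of 64, so keeping the low 6 bits is unbiased.
		buf[i] = alphabet[b&63]
	}
	return string(buf), nil
}

// generateUnique retries on collision, up to maxAttempts times.
// taken reports whether a code is already in use (assumed storage lookup).
func generateUnique(taken func(string) bool, maxAttempts int) (string, error) {
	for i := 0; i < maxAttempts; i++ {
		code, err := randomCode()
		if err != nil {
			return "", err
		}
		if !taken(code) {
			return code, nil
		}
	}
	return "", ErrTooManyCollisions
}

func main() {
	used := map[string]bool{}
	code, err := generateUnique(func(c string) bool { return used[c] }, 5)
	if err != nil {
		fmt.Println(err)
		return
	}
	used[code] = true
	fmt.Println("short code:", code)
}
```

With several writers, checking and then inserting is racy, so in practice the retry would more likely be triggered by a unique-constraint violation on insert than by a separate lookup. The sketch keeps the lookup only to stay self-contained.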