Requirements
Functional Requirements:
- Shorten: long URL --> short URL
- Resolve: short URL --> long URL
- Optional expiration time for a short URL
- Track analytics data
Non-Functional Requirements:
- Low latency: the service is simple, so a budget of 200 ms per request is more than enough
- High scalability: capacity can be increased easily by adding nodes
- High availability: the service stays intact even when some nodes fail
- Data durability
API Design
- Create(longUrl string) string: accepts the long URL and returns a short URL.
- FindOriginal(shortUrl string) string: accepts the short URL and returns the long URL.
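The two operations above could be expressed as a Go interface. This is only a minimal sketch: the error return values, the `ErrNotFound` value, the in-memory `memoryShortener` type, and the `newCode` hook (standing in for the ID generation service) are all assumptions made for illustration and are not part of the original design.

```go
package main

import (
	"errors"
	"sync"
)

// Shortener mirrors the two operations listed in the API design.
// The error results are an assumption added for this sketch.
type Shortener interface {
	Create(longUrl string) (string, error)
	FindOriginal(shortUrl string) (string, error)
}

// ErrNotFound is a hypothetical error for unknown short URLs.
var ErrNotFound = errors.New("short url not found")

// memoryShortener is a hypothetical in-memory stand-in for the real storage layer.
type memoryShortener struct {
	mu      sync.RWMutex
	byShort map[string]string
	newCode func() string // hypothetical hook for the ID generation service
}

func newMemoryShortener(newCode func() string) *memoryShortener {
	return &memoryShortener{byShort: make(map[string]string), newCode: newCode}
}

// Create stores the mapping under a freshly generated code and returns the code.
func (m *memoryShortener) Create(longUrl string) (string, error) {
	if longUrl == "" {
		return "", errors.New("long url must not be empty")
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	code := m.newCode()
	m.byShort[code] = longUrl
	return code, nil
}

// FindOriginal returns the long URL stored for shortUrl, or ErrNotFound.
func (m *memoryShortener) FindOriginal(shortUrl string) (string, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	longUrl, ok := m.byShort[shortUrl]
	if !ok {
		return "", ErrNotFound
	}
	return longUrl, nil
}
```

A caller would construct the service with `newMemoryShortener(gen)`, where `gen` is any function that returns a fresh code, and then call `Create` and `FindOriginal` on it.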
High-Level Design
Components needed:
- CDN
- Load balancer
- API gateway
Services:
- Shortening service (long to short)
- Finding service (short to long)
- ID generation service
Storage:
- RDBMS
- Cache
- Analytics
Detailed Component Design
The load balancer can be scaled vertically by adding more hardware, or horizontally by adding more load balancer nodes. When there are several load balancers, they must all read the same list of backend servers, which can live in etcd or a database. The load balancer also runs health checks so that failed nodes stop receiving traffic. Several routing algorithms are possible, such as round robin and least connections. Round robin works well in simple scenarios, but it ignores how busy each node actually is, and some requests take much longer than others. Least connections sends each request to the node with the fewest open connections, but a connection count alone does not show whether a node is overloaded. Weighted least connections, which also accounts for each node's capacity, is a reasonable choice too.
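To make the weighted least connections idea concrete, here is a minimal sketch of the selection step. The `backend` struct, its field names, and the `pickWeightedLeastConn` function are hypothetical; a real load balancer would also have to update connection counts and health status concurrently.

```go
package main

// backend is a hypothetical record for one node behind the load balancer.
type backend struct {
	addr    string
	weight  int  // relative capacity of the node; assumed to be > 0
	active  int  // connections currently open to the node
	healthy bool // result of the latest health check
}

// pickWeightedLeastConn returns the index of the healthy backend with the
// lowest active/weight ratio, or -1 if no healthy backend exists.
// Ties go to the backend that appears first in the slice.
func pickWeightedLeastConn(backends []backend) int {
	best := -1
	for i, b := range backends {
		if !b.healthy || b.weight <= 0 {
			continue // skip nodes that failed the health check or are misconfigured
		}
		if best == -1 {
			best = i
			continue
		}
		// Compare b.active/b.weight with best.active/best.weight
		// by cross-multiplication, which avoids floating point.
		if b.active*backends[best].weight < backends[best].active*b.weight {
			best = i
		}
	}
	return best
}
```

With all weights set to 1 this reduces to plain least connections, so the same function covers both algorithms discussed above.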
The database can also be scaled vertically or horizontally: vertically by adding more hardware, horizontally by adding more read and write replicas. Running multiple nodes also supports high availability. The database works together with a cache that holds recently read mappings. Because the workload is simple, we can cache each entry and set an expiration time on it.
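The cache-with-expiration idea can be sketched as a small read-through cache. Everything named here is an assumption for illustration: `ttlCache` is an in-process stand-in for whatever shared cache the system would really use, and `load` is a hypothetical function representing the database query.

```go
package main

import (
	"sync"
	"time"
)

// cacheEntry holds one cached mapping and the time at which it expires.
type cacheEntry struct {
	longUrl   string
	expiresAt time.Time
}

// ttlCache is a hypothetical in-process stand-in for a shared cache.
type ttlCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]cacheEntry
}

func newTTLCache(ttl time.Duration) *ttlCache {
	return &ttlCache{ttl: ttl, entries: make(map[string]cacheEntry)}
}

// lookup returns the cached long URL if the entry is still fresh. Otherwise it
// calls load (the hypothetical database query) and caches a successful result.
// The lock is held while load runs, which keeps the sketch simple but would
// serialize database reads in a real service.
func (c *ttlCache) lookup(shortUrl string, load func(string) (string, bool)) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[shortUrl]; ok {
		if time.Now().Before(e.expiresAt) {
			return e.longUrl, true // cache hit
		}
		delete(c.entries, shortUrl) // entry expired
	}
	longUrl, ok := load(shortUrl)
	if !ok {
		return "", false // unknown short URL; nothing is cached
	}
	c.entries[shortUrl] = cacheEntry{longUrl: longUrl, expiresAt: time.Now().Add(c.ttl)}
	return longUrl, true
}
```

The finding service would call `lookup` on every request, so repeated reads of a popular short URL reach the database only once per expiration window.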
ID generation is important here: uniqueness and safety are crucial. We could generate a random string of 10 characters drawn from a 64-character Base64 alphabet (the URL-safe variant, so that the ID can appear in a URL path). That gives 64^10 = 2^60, about 1.15 x 10^18, possible IDs. If a collision occurs, we simply retry. Generation is extremely fast and, most importantly, the ID is unrelated to the original long URL, so the long URL cannot be inferred from it and valid IDs are hard to guess.
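The generate-and-retry scheme might look like the following sketch. It assumes the URL-safe Base64 alphabet and uses Go's `crypto/rand` as the randomness source. The names `randomCode`, `generateUnique`, and the `taken` callback (standing in for a uniqueness check against the database, such as a failed insert on a unique key) are hypothetical.

```go
package main

import (
	"crypto/rand"
	"errors"
)

// alphabet is the URL-safe Base64 alphabet: 64 symbols, all valid in a URL path.
const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

// codeLength of 10 gives 64^10 = 2^60 possible codes.
const codeLength = 10

// randomCode returns codeLength characters chosen uniformly from alphabet.
func randomCode() (string, error) {
	buf := make([]byte, codeLength)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	for i, b := range buf {
		// 256 is a multiple of 64, so keeping the low 6 bits introduces no bias.
		buf[i] = alphabet[b&63]
	}
	return string(buf), nil
}

// generateUnique retries until taken reports that the code is free.
// taken is a hypothetical stand-in for the uniqueness check in storage.
func generateUnique(taken func(string) bool, maxAttempts int) (string, error) {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		code, err := randomCode()
		if err != nil {
			return "", err
		}
		if !taken(code) {
			return code, nil
		}
	}
	return "", errors.New("no free code found within maxAttempts")
}
```

Because the chance that a new code collides with one of n stored codes is n / 2^60, retries should be rare in practice; the `maxAttempts` bound is there only so that a faulty uniqueness check cannot cause an endless loop.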