Requirements
Functional Requirements:
- Long url --> short url
- Short url --> long url
- optional expiration time
- track data analytics
Non-Functional Requirements:
- Low latency. this service is simple, 200ms is more than enough
- High scalability, easily scalable
- High availability, service intact even when some nodes fail
- Data durability
API Design
- Create(longUrl string) string which accepts the long url and returns a short url.
- FindOriginal(shortUrl string) string which accepts the short url and returns the long url.
High-Level Design
Describe the overall system architecture. Identify the main components needed to solve the problem end-to-end. Use the diagramming tool to create a block diagram.
Entry layer:
- CDN (cacheing viral links, routing, ddos protection)
- Load balancer (balance traffic)
- Api Gateway (auth, rate limiting, throttling)
Services:
- Shortening service (long to short)
- Finding service (short to long)
- ID generation service
Storage:
- RDBMS
- cache
- analytics
Detailed Component Design
Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.
load balancer can also be scaled vertically by adding more hardware or horizontally by adding more nodes. load balancers need to access the same server list, it can be an etcd or some databases. it also help to do health checks. many algorithms can be used here, round robin or least connections. round robin works well in simple scenarios but it doesn't know if the nodes are actually busy and some requests apparently take longer time than the others. least connections send requests to the least nodes having the least connections, but also connections count doesn't mean if nodes are overloaded. maybe weighted least connections are fine too.
Database can also be scaled horizontally or vertically. vertically by adding more hardware, horitonzally by adding more read write replicas. multiple nodes support high availability also. it works with cache that caches data. this business is simple so we could simply cache the data and set an expiration time.
ID generation is important here, uniqueness, hard to guess, and safe are the core features. Uniquesness we can use base64 with 10 digits. It's also hard to guess. But we may have two duplicate IDs in rare conditions. And a single node can fail, thus we need multiples nodes and another safety layer. How to handle collisions? We should take into timestamp, workerID and base64 to avoid duplicate IDs. WorkerID assignment can be done in a config manually, but it could be error prone in deployment. The best scenario is to lease worker id. The last layer of protection is a UNIQUE() contraint in DB. If there is a collision the contraint will return an error.
ID generation also isn't a single node, it is a group of nodes. Each node is stateless and assigned a workedId when it's active working.
cache design, cache is a big area, it includes brower cache, CDN and regional redis (for example). for generation, add to db. for redirect, check cdn, if miss then redis, if miss then db, if hit write to redis and cdn. for update and disable, update db, publish send invalidation event and redis and cdn will remove data. And we also need a TTL plan to avoid bloat. For viral link, we can make it hours or days, for regular, we can do 30s to 60s. For evitction, we usually do LRU or LFU, either is fine in this case.