Requirements
Functional Requirements:
- Create a short URL for a given long URL.
- Return the long URL associated with a given short URL.
Non-Functional Requirements:
- Low latency
  - Replicate the database by geographic region
  - Use a cache and a CDN for publicly exposed short URLs
  - Replicate the service in geo-edge locations, managed by a load balancer (shown as the API gateway in the diagram)
- High availability
  - Shard the database and replicate it by geographic region
  - Put a load balancer in front of the service replicas
  - Run service replicas in geo-edge locations
  - Use geo routing in the API gateway, for resilience and high availability
- Scalability
  - Handled by the load balancer
  - It scales the services in each geo location up or down according to demand
- Reliability/resilience
  - Handled by the load balancer
  - It reroutes traffic to replicas in other geo locations if one location goes down
  - The database has sharded replicas
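The geo routing and failover behaviour described above can be illustrated with a minimal sketch. The region names, latency figures, and the `pick_region` helper are all hypothetical; a real gateway would take health and latency from its own monitoring.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    latency_ms: float      # assumed client-to-region latency
    healthy: bool = True   # assumed result of a health check


def pick_region(regions):
    """Return the lowest-latency healthy region, skipping unhealthy ones."""
    candidates = [r for r in regions if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=lambda r: r.latency_ms)


# Hypothetical regions, for illustration only.
regions = [
    Region("eu-west", latency_ms=20),
    Region("us-east", latency_ms=90),
    Region("ap-south", latency_ms=160),
]
nearest = pick_region(regions)    # closest healthy region

regions[0].healthy = False        # simulate the closest region going down
fallback = pick_region(regions)   # traffic is rerouted to the next closest
```

Choosing the region per request keeps the failover path identical to the normal path: an unhealthy region simply drops out of the candidate list.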
API Design
- POST - for URL creation
- GET - for fetching the URL mapping
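As a rough sketch of how these two endpoints might behave, the handler below returns a `(status, headers, body)` tuple. The `/urls` path, the `long_url` field, the `short.example` domain, and the random code scheme are assumptions made for illustration, not part of the design above. Code collisions are not handled.

```python
import secrets
from typing import Optional

BASE_URL = "https://short.example"   # hypothetical short domain
mappings = {}                        # in-memory stand-in for the storage layer


def handle(method: str, path: str, body: Optional[dict] = None):
    """Return (status, headers, body) for a single request."""
    if method == "POST" and path == "/urls":
        if not body or "long_url" not in body:
            return 400, {}, {"error": "long_url is required"}
        code = secrets.token_urlsafe(5)   # random code; the scheme is an assumption
        mappings[code] = body["long_url"]
        return 201, {}, {"short_url": f"{BASE_URL}/{code}"}
    if method == "GET":
        long_url = mappings.get(path.lstrip("/"))
        if long_url is None:
            return 404, {}, {"error": "unknown short URL"}
        # The redirect target goes in the Location header.
        return 302, {"Location": long_url}, None
    return 405, {}, {"error": "method not allowed"}
```

A client would first call `handle("POST", "/urls", {"long_url": "https://example.com/a"})`, then issue a GET on the path of the returned short URL and follow the `Location` header.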
High-Level Design
The flow goes through the following steps:
- The client request goes to the API gateway
- The API gateway decides which service and which replica should receive the request, in a load-balanced way
- For URL creation
  - The service generates a mapping
  - It sends the data to the storage layer
  - The storage layer confirms that the data was saved
  - The API returns a response
- For URL redirection
  - The service asks the storage layer for an existing mapping
  - If one exists, the storage layer returns it
  - The long URL is put in the Location header and the response status is 302 (or possibly 301), so the browser redirects automatically
- The storage layer
  - For creation, it saves the record directly in the database
  - For reads, it queries the cache first; if the data is there, it returns it
  - Otherwise it reads from the database, saves the result in the cache, then returns it
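The storage-layer behaviour in the last step can be sketched as follows. Plain dictionaries stand in for the database and the cache, and the `StorageLayer` class and its `db_reads` counter are hypothetical names used only to make the read path visible.

```python
from typing import Optional


class StorageLayer:
    def __init__(self) -> None:
        self.db = {}        # stand-in for the database
        self.cache = {}     # stand-in for the cache
        self.db_reads = 0   # counts database lookups, for illustration

    def save(self, code: str, long_url: str) -> None:
        """Creation path: write straight to the database."""
        self.db[code] = long_url

    def lookup(self, code: str) -> Optional[str]:
        """Read path: cache first, then database, then populate the cache."""
        if code in self.cache:
            return self.cache[code]
        self.db_reads += 1
        long_url = self.db.get(code)
        if long_url is not None:
            self.cache[code] = long_url
        return long_url
```

After `save("abc", url)`, the first `lookup("abc")` reads from the database and fills the cache. The second is served from the cache, so `db_reads` stays at 1.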
Detailed Component Design
Cache
- Cache hits per record need to be analysed. Caching every record on its first read can cause excessive traffic between the cache and the database.
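One possible way to act on hit counts, sketched here under assumptions, is to admit a record into the cache only after it has been read a few times. The `HitAwareCache` class and the threshold of 3 are hypothetical. The hit counter also grows without bound, which a real design would need to address.

```python
from collections import Counter
from typing import Optional

ADMIT_AFTER = 3   # hypothetical threshold: cache a record on its third read


class HitAwareCache:
    def __init__(self, db: dict) -> None:
        self.db = db            # stand-in for the database
        self.cache = {}         # stand-in for the cache
        self.hits = Counter()   # reads seen per short code

    def lookup(self, code: str) -> Optional[str]:
        self.hits[code] += 1
        if code in self.cache:
            return self.cache[code]
        long_url = self.db.get(code)
        # Only records that are read repeatedly earn a cache slot.
        if long_url is not None and self.hits[code] >= ADMIT_AFTER:
            self.cache[code] = long_url
        return long_url
```

Records read once or twice never occupy the cache, so rarely used links do not push out popular ones.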
Creation Service
- The same long URL may be shortened multiple times. With no user management this is hard to track, so the database can be flooded with duplicate records.
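The design above does not state a mitigation. One option, sketched here purely as an assumption, is to derive the short code from a hash of the long URL, so that repeated requests for the same URL map to the same record. The helper names and the 8-character starting length are hypothetical.

```python
import base64
import hashlib


def code_for(long_url: str, length: int = 8) -> str:
    """Derive a deterministic code from the SHA-256 hash of the long URL."""
    digest = hashlib.sha256(long_url.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


def create(db: dict, long_url: str) -> str:
    """Return the existing code for long_url if present, else store a new one."""
    length = 8
    while True:
        code = code_for(long_url, length)
        existing = db.get(code)
        if existing is None:
            db[code] = long_url
            return code
        if existing == long_url:
            return code   # duplicate request: reuse the existing record
        length += 1       # different URL with the same prefix: try a longer code
```

With this scheme, creating the same long URL twice leaves a single row in the database, without needing user accounts to track who created it.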
Latency
- Geo-based storage is hard, because a short URL may be accessed from anywhere. One solution is to replicate to multiple geo-edge locations, then analyse where hits come from to decide where each record should live.
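The hit-location analysis could look roughly like the sketch below, which counts hits per region for each short code and picks the busiest regions as replica locations. The function names, the region names, and the limit of two replicas are assumptions for illustration.

```python
from collections import Counter, defaultdict

hits_by_code = defaultdict(Counter)   # short code -> hits per region


def record_hit(code: str, region: str) -> None:
    """Record one access to a short code from the given region."""
    hits_by_code[code][region] += 1


def placement(code: str, max_replicas: int = 2) -> list:
    """Return the regions with the most hits for this code, busiest first."""
    counts = hits_by_code.get(code, Counter())
    return [region for region, _ in counts.most_common(max_replicas)]


# Hypothetical traffic for one short code.
for region in ["eu-west", "eu-west", "ap-south", "us-east", "eu-west", "ap-south"]:
    record_hit("abc", region)

best = placement("abc")   # the two regions that read "abc" most often
```

A code with no recorded hits returns an empty list, which a caller could treat as "keep the default replication".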