Requirements
Functional Requirements:
- Create a short URL for a given long URL.
- Return the long URL associated with a given short URL.
Non-Functional Requirements:
- Low latency
- Scalability
- Reliability
- Fast URL resolution on reads
Capacity Estimation
Traffic may peak occasionally, but let's start with a simple, low-volume solution. Assume 1,000 users each generate one URL per day, and each generated URL is accessed 100 times on average. That gives roughly 100,000 reads per day against 1,000 writes, so reads outnumber writes 100 to 1. This ratio matters if the system needs to scale, because the read path is the one to optimize first. At this volume, a single relational database would be enough to handle the load.
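The estimate above can be turned into per-second rates with a few lines of arithmetic. This is a rough sketch: the 500-byte record size is an assumption added for illustration and does not come from the estimate itself.

```python
# Inputs taken from the estimate above.
NEW_URLS_PER_DAY = 1_000
READS_PER_URL = 100
SECONDS_PER_DAY = 24 * 60 * 60

reads_per_day = NEW_URLS_PER_DAY * READS_PER_URL
writes_per_second = NEW_URLS_PER_DAY / SECONDS_PER_DAY
reads_per_second = reads_per_day / SECONDS_PER_DAY

# Hypothetical assumption: about 500 bytes per stored record
# (short key, long URL, a little metadata).
BYTES_PER_RECORD = 500
storage_per_year_mb = NEW_URLS_PER_DAY * 365 * BYTES_PER_RECORD / 1_000_000

print(f"reads per day:     {reads_per_day}")
print(f"writes per second: {writes_per_second:.3f}")
print(f"reads per second:  {reads_per_second:.2f}")
print(f"storage per year:  {storage_per_year_mb} MB")
```

Even averaged over the day this is close to one read per second, which is why a single database instance is plausible at this volume.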
API Design
There are two routes: one to create a short URL, and one to retrieve the long URL when the generated short URL is accessed. A cache can improve read performance: once a URL has been created and looked up for the first time, later reads can be served from the cache. We also need to invalidate entries for expired or invalid URLs.
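A minimal sketch of the two routes as plain functions follows. The route paths in the comments, the `short.example` domain, the status codes, and the counter-plus-base62 key scheme are all assumptions made for illustration; the design above does not prescribe how keys are generated.

```python
import string

ALPHABET = string.digits + string.ascii_letters  # 62 symbols
BASE_URL = "https://short.example/"  # hypothetical domain


def encode_base62(n: int) -> str:
    """Encode a non-negative integer using the 62-symbol alphabet."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


class ShortenerService:
    """In-memory stand-in for the service behind the two routes."""

    def __init__(self) -> None:
        self._urls = {}    # short key -> long URL
        self._next_id = 1  # assumed key scheme: sequential id in base62

    def create(self, long_url: str) -> str:
        """Route 1 (e.g. POST /urls): store the long URL, return the short one."""
        if not long_url.startswith(("http://", "https://")):
            raise ValueError("long_url must be an http(s) URL")
        key = encode_base62(self._next_id)
        self._next_id += 1
        self._urls[key] = long_url
        return BASE_URL + key

    def resolve(self, key: str):
        """Route 2 (e.g. GET /<key>): return (status, redirect target)."""
        long_url = self._urls.get(key)
        if long_url is None:
            return 404, None
        return 302, long_url


service = ShortenerService()
short_url = service.create("https://example.com/a/long/path")
print(short_url)
print(service.resolve(short_url.removeprefix(BASE_URL)))
```

Sequential keys are easy to guess; a real deployment might prefer random or hashed keys, at the cost of handling collisions.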
High-Level Design
Client -> API Gateway + load balancer -> service exposing the two routes described above, each responsible for a single task -> non-relational database. A non-relational store makes it fast to match a short URL to its full URL. It fits because the relationships in the data are weak, and it is easier to scale if needed.
Database Design
Non-relational. Each item stores the short URL key and the full URL, and the short URL key is the index. When a user accesses a short URL and the cache has no entry for it yet, the service looks the item up in this database by its short URL key and returns the full URL.
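The item layout described above can be sketched as a key-value table indexed by the short URL key. `UrlRecord` and `UrlTable` are hypothetical names, and the dictionary only stands in for the real non-relational store.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class UrlRecord:
    short_key: str  # index: the short URL key
    long_url: str   # the full URL it maps to


class UrlTable:
    """Dictionary-backed stand-in for a key-value table."""

    def __init__(self) -> None:
        self._items: Dict[str, UrlRecord] = {}

    def put(self, record: UrlRecord) -> None:
        # Refuse to overwrite, so an existing short URL never changes target.
        if record.short_key in self._items:
            raise KeyError(f"short key already exists: {record.short_key}")
        self._items[record.short_key] = record

    def get(self, short_key: str) -> Optional[UrlRecord]:
        # Single-key lookup: the only access pattern the read path needs.
        return self._items.get(short_key)


table = UrlTable()
table.put(UrlRecord("abc123", "https://example.com/some/long/path"))
print(table.get("abc123"))
print(table.get("missing"))
```

Because every read is a lookup by one key, no joins or secondary indexes are needed, which is what makes a key-value store a reasonable fit here.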
Detailed Component Design
The database can scale by raising its provisioned write capacity units or read capacity units (when using DynamoDB, for example). We could also switch the billing mode to pay-per-request (on-demand) and let capacity follow traffic automatically.
The downside is that it becomes hard to extend the system with more relationships around the stored URLs, since non-relational databases tend to handle relationship-heavy data models poorly.
The cache also belongs here. It is essential for fast URL lookups and keeps load off the database: with about 100 reads per generated URL, only the first read has to reach the database, so the cache could absorb roughly 99% of the requests for generated short URLs.
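The 99% figure can be checked with a small cache-aside simulation. This sketch assumes an unbounded cache with no eviction or expiry, and exactly 100 reads per URL; `CachedResolver` is a hypothetical name, and plain dictionaries stand in for both the cache and the database.

```python
class CachedResolver:
    """Cache-aside read path: check the cache first, fall back to the database."""

    def __init__(self, database: dict) -> None:
        self._db = database
        self._cache = {}
        self.db_reads = 0  # how many lookups actually reached the database

    def resolve(self, key: str):
        if key in self._cache:
            return self._cache[key]
        self.db_reads += 1
        long_url = self._db.get(key)
        if long_url is not None:
            self._cache[key] = long_url  # populate on the first lookup
        return long_url

    def invalidate(self, key: str) -> None:
        # Drop a cached entry, e.g. when a URL expires or is found invalid.
        self._cache.pop(key, None)


# 1,000 URLs, each read 100 times, matching the capacity estimate.
db = {f"k{i}": f"https://example.com/page/{i}" for i in range(1000)}
resolver = CachedResolver(db)

total_reads = 0
for key in db:
    for _ in range(100):
        resolver.resolve(key)
        total_reads += 1

hit_ratio = 1 - resolver.db_reads / total_reads
print(f"database reads: {resolver.db_reads} of {total_reads}")
print(f"cache hit ratio: {hit_ratio:.0%}")
```

With a bounded cache or a short expiry the ratio would be lower, since evicted entries must be read from the database again.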