Define the APIs expected from the system. This is your chance to analyze and define the read and write paths so that you can come up with the high-level design...
1) POST API to generate a short URL from the given long URL
2) GET API to resolve a short URL and redirect to the original long URL
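Below is a minimal, framework-free sketch of the two endpoints. Everything named in it is hypothetical: the paths `POST /shorten` and `GET /{key}`, the `ShortenerAPI` class, the `Response` type, and the base URL `https://sho.rt/`. Storage is an in-memory dict, and keys come from a simple counter placeholder. A real deployment would sit behind an HTTP framework and use the ID-generation and storage choices described later.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Response:
    """Hypothetical response shape: HTTP status, JSON-like body, headers."""
    status: int
    body: dict = field(default_factory=dict)
    headers: dict = field(default_factory=dict)


class ShortenerAPI:
    """In-memory stand-in for the POST (create) and GET (resolve) APIs."""

    def __init__(self, base_url: str = "https://sho.rt/"):
        self.base_url = base_url
        self._ids = itertools.count(1)  # placeholder key source
        self._short_to_long: dict[str, str] = {}
        self._long_to_short: dict[str, str] = {}

    def post_shorten(self, long_url: str) -> Response:
        # POST /shorten {"long_url": ...}
        if long_url in self._long_to_short:  # same long URL -> same short URL
            key = self._long_to_short[long_url]
            return Response(200, {"short_url": self.base_url + key})
        key = str(next(self._ids))
        self._short_to_long[key] = long_url
        self._long_to_short[long_url] = key
        return Response(201, {"short_url": self.base_url + key})

    def get_resolve(self, key: str) -> Response:
        # GET /{key}: redirect to the long URL, or 404 if the key is unknown
        long_url = self._short_to_long.get(key)
        if long_url is None:
            return Response(404, {"error": "not found"})
        return Response(302, headers={"Location": long_url})
```

For example, the first call to `post_shorten("https://example.com/some/long/path")` returns a 201 with `short_url` set to `https://sho.rt/1`. A subsequent `get_resolve("1")` returns a 302 whose `Location` header is the original URL.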
Describe the overall system architecture. Identify the main components needed to solve the problem end-to-end. Use the diagramming tool to create a block diagram.
1) API Layer: two APIs; POST (create a short URL) and GET (resolve a short URL to the long URL)
2) URL Service:
a) validates the URL and checks whether it is already present in the cache/DB
b) generates a unique short URL and stores it in the DB (mapping of short -> long URL, created_at, expiration, click_count=0)
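3) Unique ID generation (options):
a) base-32 encode an auto-increment ID (needs a shared counter or per-server ID ranges)
b) generate a random 6-8 character base-32 string and retry on collision
c) hash the long URL and base-32 encode part of the digest, so the same long URL always maps to the same key; collisions between different URLs must still be checked in the DB (consistent hashing distributes keys across shards; it does not prevent duplicates)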
4) DB: short -> long URL mapping, created_at, expiration, click_count
NoSQL (Cassandra), as we need to store a large amount of data and the main read is a simple key lookup by short URL
5) Cache: Redis. Hot URLs (ranked by click_count) are stored here. When a cached URL expires, it is replaced with the next most-clicked URL.
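The three key-generation options in item 3 can be sketched as below. Several details are assumptions for illustration: the alphabet (a lowercase Crockford-style base-32 set that drops `i`, `l`, `o`, `u`), the default 7-character length, SHA-256 as the hash for option c, and all function names. With 32 symbols, keys of 6, 7 and 8 characters give 32^6 ≈ 1.07 billion, 32^7 ≈ 34.4 billion and 32^8 ≈ 1.1 trillion distinct values.

```python
import hashlib
import secrets

# Assumed alphabet: Crockford-style base 32, lowercased (no i, l, o, u).
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"
BASE = len(ALPHABET)  # 32


def encode_base32(n: int) -> str:
    """Option a: base-32 encode a non-negative auto-increment ID."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def random_key(length: int = 7) -> str:
    """Option b: random base-32 key drawn from a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def allocate_random_key(taken: set[str], length: int = 7, max_tries: int = 5) -> str:
    """Option b with its collision check; `taken` stands in for a DB lookup."""
    for _ in range(max_tries):
        key = random_key(length)
        if key not in taken:
            taken.add(key)
            return key
    raise RuntimeError("too many collisions; consider a longer key")


def hash_key(long_url: str, length: int = 7) -> str:
    """Option c: deterministic key derived from a hash of the long URL.

    Identical long URLs always get the same key, but two different URLs
    can still collide, so the DB must be checked before inserting.
    """
    digest = hashlib.sha256(long_url.encode("utf-8")).digest()
    n = int.from_bytes(digest[:8], "big") % (BASE ** length)
    return encode_base32(n).rjust(length, ALPHABET[0])
```

For instance, `encode_base32(1000)` gives `"z8"`, because 1000 = 31 × 32 + 8. Each option trades something different. Option a is collision-free but sequential and guessable. Option b hides ordering but costs a DB check per attempt. Option c deduplicates identical long URLs without an extra index but still needs a collision check.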
Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.
Write path: user submits a long URL -> API -> URL service (validates and checks for duplicates) -> stores it in the DB
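The write path can be sketched as follows. The sketch rests on several assumptions: an in-memory dict in place of Cassandra, a secondary `by_long` index for the duplicate check, a one-year default expiration, and a placeholder hex counter for keys (one of the ID-generation options above would replace it). `UrlService`, `UrlRecord` and `is_valid_url` are hypothetical names, and the record fields mirror the ones listed for the DB. Validation here only checks for an http(s) scheme and a host.

```python
import itertools
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse


@dataclass
class UrlRecord:
    short_key: str
    long_url: str
    created_at: datetime
    expiration: datetime
    click_count: int = 0


def is_valid_url(url: str) -> bool:
    """Minimal validation: http(s) scheme and a non-empty host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


class UrlService:
    def __init__(self, ttl_days: int = 365):
        self.db: dict[str, UrlRecord] = {}  # short_key -> record (stand-in for Cassandra)
        self.by_long: dict[str, str] = {}   # long_url -> short_key (duplicate check)
        self.ttl = timedelta(days=ttl_days)
        self._counter = itertools.count(1)  # placeholder key source

    def shorten(self, long_url: str) -> str:
        if not is_valid_url(long_url):
            raise ValueError(f"invalid URL: {long_url!r}")
        existing = self.by_long.get(long_url)
        if existing is not None:            # already shortened: reuse the key
            return existing
        key = format(next(self._counter), "x")
        now = datetime.now(timezone.utc)
        self.db[key] = UrlRecord(key, long_url, now, now + self.ttl)
        self.by_long[long_url] = key
        return key
```

Note that this sketch does not re-check whether a reused key has expired. A fuller version would issue a new key in that case.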
Read path: user clicks a short URL -> API -> check cache -> on a miss, check DB -> redirect (HTTP 301/302) to the long URL
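Here is a cache-aside sketch of the read path, with a tiny TTL dict standing in for Redis. Several pieces are assumptions: the `TTLCache` class, the `resolve` function, the dict-shaped DB records, the one-hour default cache TTL, and the 404/410 status choices. Populating the cache on a miss is also an assumption; the click_count-ranked hot set in item 5 could instead be loaded by a background job. Incrementing `click_count` inline stands in for what would more likely be an asynchronous or batched counter update.

```python
import time
from datetime import datetime, timezone


class TTLCache:
    """Minimal stand-in for Redis GET/SET with expiry."""

    def __init__(self, default_ttl: float = 3600.0):
        self.default_ttl = default_ttl
        self._data = {}  # key -> (value, monotonic expiry time)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key, value, ttl=None):
        ttl = self.default_ttl if ttl is None else ttl
        self._data[key] = (value, time.monotonic() + ttl)


def resolve(short_key, cache, db):
    """Cache-aside lookup; returns (http_status, location_or_None)."""
    long_url = cache.get(short_key)
    if long_url is None:  # cache miss: fall back to the DB
        record = db.get(short_key)
        if record is None:
            return 404, None
        remaining = (record["expiration"] - datetime.now(timezone.utc)).total_seconds()
        if remaining <= 0:
            return 410, None  # link has expired
        long_url = record["long_url"]
        # never cache a link for longer than it remains valid
        cache.set(short_key, long_url, ttl=min(cache.default_ttl, remaining))
    if short_key in db:
        db[short_key]["click_count"] += 1  # stand-in for an async counter
    return 302, long_url
```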
We can use a CDN to reduce redirect latency. We can also add a load balancer in front of the API gateway so the API tier can scale horizontally. Add rate limiting per IP/per site to control burst traffic.
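The design does not name a rate-limiting algorithm, so the sketch below assumes a token bucket per key, a common technique for per-IP or per-site limits. `TokenBucket`, `PerKeyRateLimiter` and the rate/capacity values are hypothetical. Buckets live in process memory, so limits across several API servers would need a shared store such as Redis. The clock is injectable so the behaviour can be tested deterministically.

```python
import time


class TokenBucket:
    """Refills `rate_per_sec` tokens per second, up to `capacity`."""

    def __init__(self, rate_per_sec: float, capacity: int, now=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so short bursts are allowed
        self._now = now
        self.updated = now()

    def allow(self) -> bool:
        t = self._now()
        # refill based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (t - self.updated) * self.rate)
        self.updated = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerKeyRateLimiter:
    """One bucket per key (e.g. client IP or site)."""

    def __init__(self, rate_per_sec: float, capacity: int, now=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self._now = now
        self._buckets: dict[str, TokenBucket] = {}

    def allow(self, key: str) -> bool:
        bucket = self._buckets.get(key)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self._now)
            self._buckets[key] = bucket
        return bucket.allow()
```

With `rate_per_sec=1.0` and `capacity=2`, a client can send a burst of two requests immediately and is then held to about one request per second.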
The system can be scaled horizontally, and split-brain scenarios can be handled with a leader-election mechanism.
We can also shard the DB by short URL (using the short key as the partition key).
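Sharding by short URL can be sketched with a consistent-hash ring. When a shard is added, only a fraction of the keys move, whereas `hash(key) % N` would reshuffle almost all of them. The shard names, 100 virtual nodes per shard, and MD5 as the ring hash are assumptions. Cassandra already partitions rows on a token ring when the short key is the partition key, so this sketch only illustrates the idea.

```python
import bisect
import hashlib


def _ring_hash(value: str) -> int:
    """64-bit position on the ring (MD5 used only for even spread)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, shards, vnodes: int = 100):
        # each shard gets `vnodes` points on the ring to even out load
        self._ring = sorted(
            (_ring_hash(f"{shard}#{i}"), shard) for shard in shards for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    def shard_for(self, short_key: str) -> str:
        # first ring point clockwise from the key's hash, wrapping around
        idx = bisect.bisect_right(self._hashes, _ring_hash(short_key)) % len(self._ring)
        return self._ring[idx][1]
```

If you rebuild the ring with a fourth shard, every key that changes shard moves onto the new shard, and the rest stay where they were.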