Requirements
Functional Requirements:
- User is able to enter a long url and receive a shortened url.
- User is able to redirect to original url using the shortened url.
- User has options to set expiration as well as use custom alias for their short url.
Scale estimate - 1B short url created and 100M DAU
Non-Functional Requirements:
- Highly available - No redirect should fail
- Highly scalable - System should be able to handle 1B url creation and 100M DAU
- High read throughput - system is read heavy so read throughput should be high
- URL uniqueness - One long url should only map to one short url
API Design
// Create short url
- Post /urls
Request - {
longUrl,
expiration?,
alias?
}
Response - {
shortUrl
}
// Redirect to long url using short url
- Get /{short_url}
Response - HTTP 302 redirection to long url
High-Level Design
When the system receives request to create short url, it first hits API gateway, api gateway implements rate limitation to prevent attacks and keep system usage fair. The request is then routed to Load balancer which routes request to the write server and writer server is where we would have logic to generate short urls and ensure its uniqueness, we will use a global counter with base62 encoding for that, to be discussed in detailed in deep dive section. It would then save the long url, short url and expirationg and any alias used to the database.
Subsequently, when a read request comes in, it follows the same path to Load balancer, which then is recieved by the read server. Read server first checks cache if cache hit then ti will return response to redirect user to long url, if cache miss then it goes to DB to fetch the data. We will use Redis and LRU cache for low latecny and higher read throghput.
Detailed Component Design
We will first talk about URL uniqueness first as this is the most critical part of the system.
- To ensure that the urls are short and always unique so there no conflict in the system we will use Global counter with Base64 encoding. A global counter will increment each time the system recieves a request to create short url, that will be then used to geneate base62 encoded short url. We need to ensure our system can create 1B unique short urls. If we keep 6 characters url, then with base64 we can have nearly 56B unique urls which is more than our requirement. Also, this counter will be stored in redis as Redis can work seamlessly in distributed servers.
Second we need to ensure is high read throghput, our sytem is read heavy so our system should be handle it efficiently
- For high read throughput, we use multi cache layer.
- First of which is CDN servers/edge closer to user location, So when a request hits the system, it goes to CDN first where if the short url is already cached the resposne will be returned back.
- Second layer of cache is Redis. When the read server recieved the request, it will try redis first, if theres cache hit the response is returned if not then DB is invoked. We will use LRU cache so all hot data is always fetched from redis maintaining low latency for reads.
- This multi layer caching ensures DB is invoked significantly lesser for hot data.
- We will also use horizontal scaling on read service for better throughput, We will add more read servers and our load balancer will route requests to these servers efficiently. As our scale is massive we can efficiently use Random picking for load balancer to route requests to server.