System Requirements
Functional Requirements:
- The user should input a long URL, and the system should provide a short URL as an alias.
- When people click on the short URL, the system should redirect to the long URL associated with that given short URL.
- The system should track user engagement by providing analytics such as:
  - User click tracking
  - User click tracking by country
Non-Functional Requirements:
- Performance: the system must have low latency; users should not wait longer than 1 second for a redirect.
- Reliability: the system needs to be highly available, and data should be consistent.
- Scalability: the number of users could grow quickly, especially on the redirect side, so the system must be able to scale horizontally. A load balancer would distribute requests across multiple application servers, which could be deployed in multiple regions.
API Design
- Create a short URL - POST /url, parameters: long_url, returns shortened_url
- Redirect to original URL - GET /url returns a 301 or 302 redirect with the Location header set to the long URL
- Get the number of clicks on a URL - GET /analytics/{urlId}/clicks?country=Canada returns the number of clicks on a URL in a specific country. If the country is not specified, it returns the overall number of clicks worldwide.
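The request and response shapes of these three endpoints can be sketched with plain functions. This is only an illustration under stated assumptions: the in-memory dictionaries, the `short.example` domain, the choice of 302 over 301, and the 404 for unknown codes are all hypothetical and not part of the design above.

```python
from typing import Optional

# Hypothetical in-memory stand-ins for the URL store and the click counters.
URLS: dict[str, str] = {}               # short_code -> long_url
CLICKS: dict[str, dict[str, int]] = {}  # urlId -> {country: click count}
BASE_URL = "https://short.example"      # assumed short domain


def post_url(long_url: str, short_code: str) -> dict:
    """POST /url: store the mapping and return the shortened URL."""
    URLS[short_code] = long_url
    return {"shortened_url": f"{BASE_URL}/{short_code}"}


def get_url(short_code: str) -> tuple[int, dict]:
    """Redirect request: status code plus headers.

    Returning 302 (and 404 for unknown codes) is an assumption of this sketch.
    """
    long_url = URLS.get(short_code)
    if long_url is None:
        return 404, {}
    return 302, {"Location": long_url}


def get_clicks(url_id: str, country: Optional[str] = None) -> int:
    """GET /analytics/{urlId}/clicks: per-country count, or the worldwide total."""
    per_country = CLICKS.get(url_id, {})
    if country is None:
        return sum(per_country.values())
    return per_country.get(country, 0)
```

Calling `get_clicks("abc12345")` without a country sums every per-country counter, which matches the "overall number of clicks worldwide" behaviour described for the analytics endpoint.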
High-Level Design
- A CDN as the entry point to the system, with caching enabled to speed up redirects
- An edge function to record click events and emit them asynchronously to an events queue for analytics
- An events queue to record clicks on links without delaying the redirect
- An analytics service that reads the click events from the events queue, transforms them into analytics data, and stores them in an analytics database
- On cache misses, the request is forwarded to the Load Balancer(s)
- The load balancer manages traffic to the application servers, forwarding the requests to the appropriate servers
- Multiple application servers to ensure scalability and availability:
  - URL creation servers, which ensure the creation of unique keys by generating a snowflake-style ID and converting it to base62.
  - URL redirection servers, which hold the logic to look up the long URL for a given short_code in the relevant shard's read replicas.
  - Analytics servers, which read click events from the events queue, transform them into analytics data, and record it in the analytics database shards.
- Databases with read replicas to ensure scalability and data availability
- Database sharding with two different strategies: URL records are sharded by a hash of the short_code, and click-tracking records are sharded by location
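The two sharding strategies could be routed roughly as follows. This is a minimal sketch, not a prescribed implementation: the shard count, the region names, the shard labels, and the use of SHA-256 with a modulo are all assumptions made for illustration.

```python
import hashlib

URL_SHARD_COUNT = 4  # assumed number of URL shards

# Assumed region-to-shard mapping for click analytics.
ANALYTICS_SHARDS = {
    "NA": "analytics-na",
    "EU": "analytics-eu",
    "APAC": "analytics-apac",
}
DEFAULT_ANALYTICS_SHARD = "analytics-other"


def url_shard(short_code: str, shard_count: int = URL_SHARD_COUNT) -> int:
    """Pick a URL shard from a stable hash of the short_code."""
    # hashlib gives the same digest in every process, unlike the built-in
    # hash(), which is randomized per interpreter run for strings.
    digest = hashlib.sha256(short_code.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def analytics_shard(region: str) -> str:
    """Pick an analytics shard from the region where the click happened."""
    return ANALYTICS_SHARDS.get(region, DEFAULT_ANALYTICS_SHARD)
```

One trade-off of plain modulo routing is that changing the shard count remaps most keys; consistent hashing is a common alternative when shards are expected to be added over time.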
Detailed Component Design
In a URL shortener system, user interactions with the system have different outcomes based on whether the user is creating a shortened URL or navigating to a short URL to be redirected to the long URL.
In the scenario where a user is creating a short URL, the request goes through the CDN to the load balancer, which forwards the POST request to the URL creation service, which in turn creates the URL and stores it in the database. The URL creation service generates the unique key for the URL using a snowflake-style ID generation approach (based on a timestamp, a node ID, and a sequence number), which produces a 47-bit integer that is then converted to base62 (an 8-character key) for use in the short URL. Once the key is generated, the service stores the URL in the relevant database shard, chosen by hashing the short_code, and creates another record linking that short code to the user who created it.
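The key generation step can be sketched as below. The text fixes only the total width (47 bits) and the three components; the split into 31 timestamp bits (seconds), 6 node bits, and 10 sequence bits, the custom epoch, and the alphabet order are assumptions chosen for illustration. Since 2^47 is smaller than 62^8, every such ID fits in 8 base62 characters.

```python
import threading
import time
from typing import Callable

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
EPOCH = 1_700_000_000                     # assumed custom epoch (Unix seconds)
TS_BITS, NODE_BITS, SEQ_BITS = 31, 6, 10  # assumed split, 47 bits in total


class SnowflakeGenerator:
    """Snowflake-style IDs: timestamp | node ID | per-second sequence."""

    def __init__(self, node_id: int, clock: Callable[[], float] = time.time):
        if not 0 <= node_id < (1 << NODE_BITS):
            raise ValueError("node_id out of range")
        self.node_id = node_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ts = -1
        self._seq = 0

    def next_id(self) -> int:
        with self._lock:
            ts = int(self._clock()) - EPOCH
            if not 0 <= ts < (1 << TS_BITS):
                raise OverflowError("timestamp does not fit in TS_BITS")
            if ts < self._last_ts:
                raise RuntimeError("clock moved backwards")
            if ts == self._last_ts:
                self._seq += 1
                if self._seq >= (1 << SEQ_BITS):
                    raise RuntimeError("sequence exhausted for this second")
            else:
                self._last_ts, self._seq = ts, 0
            return (ts << (NODE_BITS + SEQ_BITS)) | (self.node_id << SEQ_BITS) | self._seq


def base62_encode(n: int, width: int = 8) -> str:
    """Encode a non-negative integer in base62, left-padded to `width` characters."""
    if n < 0:
        raise ValueError("n must be non-negative")
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars)).rjust(width, ALPHABET[0])
```

A short code would then be obtained with something like `base62_encode(SnowflakeGenerator(node_id=3).next_id())`. With this particular split each node can issue 1,024 IDs per second; a deployment that needs more would shift bits from the timestamp or node fields to the sequence.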
In the scenario where a user is navigating to a short URL, the request first reaches the nearest CDN edge location. The edge checks whether the response for that request is already in the cache. If it is cached, the CDN returns the redirect immediately and asynchronously logs the request event to the events queue. If it is not cached, the CDN forwards the request to the load balancer and also asynchronously logs the request event to the events queue. The load balancer in turn forwards the request to a URL redirection server, which looks up the short_code in the relevant shard and returns the response to the client. The CDN also caches the response for future requests with a TTL of 60 seconds.
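The redirect path can be approximated in a few lines. This is a simplified, single-process sketch: `EdgeRedirector`, the `origin_lookup` callable (standing in for the load balancer and redirection service), the event fields, the 302 status, and the decision not to log unknown codes are all assumptions. Only the 60-second TTL and the "log on both hit and miss" behaviour come from the description above.

```python
import queue
import time
from typing import Callable, Optional

CACHE_TTL_SECONDS = 60  # TTL stated in the design


class EdgeRedirector:
    """Cache-first redirect handling with fire-and-forget click events."""

    def __init__(
        self,
        origin_lookup: Callable[[str], Optional[str]],
        events: queue.Queue,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._lookup = origin_lookup   # hypothetical call into the origin
        self._events = events          # stand-in for the events queue
        self._clock = clock
        self._cache: dict[str, tuple[str, float]] = {}  # code -> (url, expiry)

    def handle(self, short_code: str, country: str) -> tuple[int, dict]:
        now = self._clock()
        entry = self._cache.get(short_code)
        if entry is not None and entry[1] > now:
            long_url = entry[0]                      # cache hit
        else:
            long_url = self._lookup(short_code)      # cache miss: go to origin
            if long_url is None:
                return 404, {}                       # assumed: unknown codes are not logged
            self._cache[short_code] = (long_url, now + CACHE_TTL_SECONDS)
        # Enqueue the click without waiting for analytics processing.
        self._events.put_nowait({"short_code": short_code, "country": country})
        return 302, {"Location": long_url}
```

Because the event is only placed on the queue, the redirect response does not wait on analytics work, which is the property the events queue is meant to provide.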
The analytics service periodically reads the events queue, looks for click/redirect events, transforms them into the relevant analytics data, and stores the results in the analytics database, using the region-based sharding strategy to place the data in the correct shard.
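One periodic pass of the analytics service might look like the sketch below. The event shape (`short_code`, `country`), the batch size, and the dictionary used in place of the analytics database are assumptions for illustration; routing each count to a region shard is left out to keep the example short.

```python
import queue
from collections import Counter


def drain(events: queue.Queue, max_batch: int = 1000) -> list:
    """Pull up to max_batch events off the queue without blocking."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(events.get_nowait())
        except queue.Empty:
            break
    return batch


def aggregate(batch: list) -> Counter:
    """Count clicks per (short_code, country) pair in one batch."""
    return Counter((e["short_code"], e["country"]) for e in batch)


def apply_counts(store: dict, counts: Counter) -> None:
    """Add the batch counts to the running totals (stand-in for a DB write)."""
    for key, n in counts.items():
        store[key] = store.get(key, 0) + n
```

A scheduler would call `apply_counts(store, aggregate(drain(events)))` on each tick. Aggregating per batch means the database receives one increment per (short_code, country) pair rather than one write per click.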