Requirements


Functional Requirements:

  • Create a short URL for a given long URL.
  • Return the long URL associated with a given short URL.
  • Short URLs should map to only 1 long URL.



Non-Functional Requirements:


  • Low latency for the shortened URL resolution to long URL.
  • High availability for the creation and resolution of URLs. Users should be able to generate short URLs and use short URLs with little downtime.


API Design


We will need 2 endpoints, 1 for URL shortening, and 1 for resolving the shortened URL.


For the URL shortening endpoint, the request structure for the client is as follows:

  1. Long URL


The response structure for the server is as follows:

  1. Shortened URL


We will use a REST API over HTTP for the URL generation. This is so browsers will be able to natively deserialize the data, unlike gRPC. GraphQL adds unnecessary overhead as we are always requesting the same information.



For the URL resolving endpoint, we will pass in the shortened URL and return the long URL.



High-Level Design

Describe the overall system architecture. Identify the main components needed to solve the problem end-to-end. Use the diagramming tool to create a block diagram.


I will break this section down into two steps:

  1. The shortened URL generation
  2. The shortened URL resolution


For the shortened URL generation, we need a server layer to take in the long URL and check if there's an associated shortened URL. We will check in our database if we've already created a shortened URL for the given long URL. If we have not, then we will hash the long URL and write the association into our database.


Finally, we will return the hashed shortened URL to the client.


For the shortened URL resolution, we will first check a cache if the long URL is stored. If the long URL is not in the cache, we will read from the server to get the long URL associated with the short URL.



For both paths, we will introduce a load balancer for high availability.



Detailed Component Design


For the URL generation, we need to encode the long URL in a separate format. I propose that we use some sort of encoding like base64 encoding. Next, we will need some way to avoid collisions. I propose that we can associate each long URL with an integer ID that continues to increment. Therefore, when we encode a new URL, we have high confidence that we haven't returned a duplicate shortened URL. Finally we can store our generated URLs in a database that we can check if we've generated it before. If we have, we can append some random character to the URL.


For the data storage layer, we will use a noSQL database. This is because noSQL has a high guarantee on availability because scaling is much easier. This comes at a cost of consistency, but eventual consistency is likely acceptable for our use case.


For the caching layer, we will utilize redis keyed on the passed in URL. We will use a pull strategy that will allow us to efficiently serve hot URLs when they are being generated or read. When we have a cache miss or the entry expires, we will read data from the API gateway and write it back into the cache.