System requirements


Functional:

When a user submits a URL, the service returns a shortened URL that resolves to the original URL.

Requests to a short URL are redirected to the linked long URL.



Non-Functional:

High availability, since users will rely on the service frequently.

Low latency, since slow redirects or slow creation of new short URLs would frustrate users.

Horizontal scalability, so that capacity and availability can grow cost-effectively by adding servers.



API design

POST /short-urls creates a new short URL.

Optionally, GET /short-urls lists the user's own short URLs.

To get the linked URL by short URL ID, or to delete a short URL:

GET /short-urls/{{url_id}}

DELETE /short-urls/{{url_id}}


The GET endpoint queries the database using the short URL identifier to find the associated long URL.

Upon successful lookup, the server responds with a redirect status code and the long URL, prompting the client to navigate to the original resource.
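
The lookup-and-redirect step can be sketched as follows. This is a minimal illustration, not a definitive implementation: `URL_TABLE` and `resolve` are hypothetical names, an in-memory dict stands in for the database, and the choice of 302 over 301 is an assumption (the notes above only say "a redirect status code").

```python
from http import HTTPStatus
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-in for the database: short ID -> long URL.
URL_TABLE: Dict[str, str] = {"abc123": "https://example.com/some/long/path"}


def resolve(short_id: str) -> Tuple[int, Dict[str, str]]:
    """Return (status code, response headers) for a request to a short URL."""
    long_url: Optional[str] = URL_TABLE.get(short_id)
    if long_url is None:
        # Unknown short ID: nothing to redirect to.
        return HTTPStatus.NOT_FOUND.value, {}
    # 302 Found: the client follows the Location header to the long URL.
    return HTTPStatus.FOUND.value, {"Location": long_url}
```

A 301 response may be cached by clients, which reduces load but means later requests can bypass the service; 302 keeps each request going through the service, which matters if links can be disabled or updated.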



High-level design

Users send requests to the API service to create or resolve short URLs, and the API reads from and writes to the DB. If Redis already holds the requested data, the API returns the cached value instead of querying the DB. A load balancer sits in front of the API to maintain availability. The API validates incoming data. A rate limiter could be added to stop a user from calling the shortener many times in a short period.

On creation, each short URL is assigned an ID. One option is to derive it from a hash of the provided URL, i.e. HASH(user_provided_url), using something like SHA-512. The same URL then always maps to the same ID, and different URLs produce different digests with overwhelming probability. Once the digest is truncated to a short ID, however, collisions become possible and must be handled. In the DB, the short URL is the key and the long URL is the value.
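
As a rough sketch of the hash-based ID idea, the snippet below derives a fixed-length ID from the SHA-512 digest of the URL. The base62 alphabet, the 8-character length, and the names `base62_encode` and `short_id_for` are assumptions chosen for illustration; none of them are fixed by the design above.

```python
import hashlib

# Assumed alphabet: digits, uppercase, lowercase (62 symbols).
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(number: int) -> str:
    """Encode a non-negative integer using the BASE62 alphabet."""
    if number == 0:
        return BASE62[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, 62)
        digits.append(BASE62[remainder])
    return "".join(reversed(digits))


def short_id_for(url: str, length: int = 8) -> str:
    """Derive a deterministic short ID of `length` characters from a URL."""
    digest = hashlib.sha512(url.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest, reduced to the ID space 62**length.
    number = int.from_bytes(digest[:8], "big") % (62 ** length)
    # Left-pad so every ID has exactly `length` characters.
    return base62_encode(number).rjust(length, BASE62[0])
```

Because the digest is reduced to 62^8 possible values, two different URLs can map to the same ID, so the write path still needs a uniqueness check.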




DB: persistent storage for the URL mappings

Redis: cache for repeatedly requested URLs, so they are served faster





Detailed component design

  User->>API: User submits a new link to shorten

  API->>DB: Saves new URL with new short ID

  DB-->>API: New data inserted

  API-->>User: Response with new shortened URL

  User->>API: User opens shortened URL

  API->>Redis: API service checks for cached data

  Redis-->>API: Redis returns cached data if it exists

  API->>DB: If data does not exist in Redis, queries the DB

  DB-->>API: Returns the long URL

  API-->>User: Returns redirect response to the linked URL
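
The read path in the flow above (check the cache, fall back to the DB, then populate the cache) can be sketched like this. It is an illustration under assumptions: `TTLCache` is a hypothetical in-process stand-in for Redis, a plain dict stands in for the DB, and the 300-second TTL is an arbitrary placeholder.

```python
import time
from typing import Dict, Optional, Tuple


class TTLCache:
    """Tiny in-process stand-in for Redis with per-key expiry."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry has expired: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (value, time.monotonic() + ttl_seconds)


def lookup(short_id: str, cache: TTLCache, db: Dict[str, str],
           ttl_seconds: float = 300.0) -> Optional[str]:
    """Return the long URL for short_id, or None if it does not exist."""
    cached = cache.get(short_id)
    if cached is not None:
        return cached  # cache hit: the DB is not touched
    long_url = db.get(short_id)  # cache miss: ask the DB
    if long_url is not None:
        cache.set(short_id, long_url, ttl_seconds)  # populate for next time
    return long_url
```

Only successful lookups are cached here; whether to also cache misses (to protect the DB from repeated requests for unknown IDs) is a separate design choice.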


To avoid collisions when creating unique identifiers, we can lock the table or the Redis key, ideally with a distributed locking mechanism, and add retry logic with exponential backoff. If the ID generator is down, we can queue creation requests and process them once the generator is back online.

Utilize a CDN to cache popular links at edge locations close to users. When a link is updated or disabled, the corresponding cache entries should be purged or updated immediately. A TTL complements this as a safety net: it bounds how long stale data can be served if an invalidation is missed. By automatically removing old entries, TTL also helps manage memory usage. A Least Frequently Used (LFU) eviction policy is worth trying. For very hot IDs, consider a dynamic caching strategy that increases the TTL (Time to Live) for popular URLs, so they stay in cache longer during peak traffic and hit the database less often.

Monitor cache hit rates and eviction rates regularly to identify access patterns and adjust policies as needed. Choose TTL values based on how frequently data changes and how critical it is for users to receive the latest information.
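
The collision handling mentioned above (retry with exponential backoff) might look roughly like the sketch below. Everything named here is hypothetical: `insert_if_absent` stands in for an atomic conditional write in the real datastore (the dict check shown is not atomic across processes), and the attempt count and delays are placeholder values.

```python
import random
import time
from typing import Callable, Dict


class IdCollision(Exception):
    """Raised when no free short ID was found within the allowed attempts."""


def insert_if_absent(table: Dict[str, str], short_id: str, long_url: str) -> bool:
    """Stand-in for a conditional write: store only if the key is unused."""
    if short_id in table:
        return False
    table[short_id] = long_url
    return True


def create_with_retry(table: Dict[str, str], long_url: str,
                      generate_id: Callable[[], str],
                      max_attempts: int = 5, base_delay: float = 0.01,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Try fresh IDs until one is free, backing off between attempts."""
    for attempt in range(max_attempts):
        short_id = generate_id()
        if insert_if_absent(table, short_id, long_url):
            return short_id
        # Exponential backoff with jitter: wait up to base_delay * 2**attempt.
        sleep(random.uniform(0, base_delay * (2 ** attempt)))
    raise IdCollision(f"no free ID after {max_attempts} attempts")
```

The `sleep` parameter is injectable so the function can be exercised without real delays, for example `create_with_retry(table, url, gen, sleep=lambda s: None)`.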


Use load balancers to distribute incoming requests evenly across multiple servers.


We can consider a decentralized ID generation scheme in which each instance builds IDs from a timestamp, a machine identifier, and a sequence number. Multiple instances can then generate IDs independently with minimal risk of duplication, and if one generator fails, the others continue to produce unique IDs.
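
One way to sketch such a generator is shown below. The bit layout (10 bits for the machine ID, 12 bits for the sequence, millisecond timestamps) is an assumption borrowed from common Snowflake-style schemes, and the class name `IdGenerator` and the injectable `clock` parameter are hypothetical.

```python
import threading
import time
from typing import Callable


class IdGenerator:
    """Builds IDs as: timestamp_ms | machine_id | sequence (assumed layout)."""

    MACHINE_BITS = 10   # up to 1024 generator instances
    SEQUENCE_BITS = 12  # up to 4096 IDs per millisecond per instance

    def __init__(self, machine_id: int,
                 clock: Callable[[], int] = lambda: int(time.time() * 1000)) -> None:
        if not 0 <= machine_id < (1 << self.MACHINE_BITS):
            raise ValueError("machine_id out of range")
        self._machine_id = machine_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the sequence number.
                self._sequence = (self._sequence + 1) & ((1 << self.SEQUENCE_BITS) - 1)
                if self._sequence == 0:
                    # Sequence exhausted: wait for the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return ((now << (self.MACHINE_BITS + self.SEQUENCE_BITS))
                    | (self._machine_id << self.SEQUENCE_BITS)
                    | self._sequence)
```

Two instances with different machine IDs never produce the same value, because the machine ID occupies its own bit range. The resulting integer could then be encoded (for example in base62) to form the short URL path.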


Database design

We will use a NoSQL database: we don't need structured tables, only key/value pairs. One key/value store serves as the cache, and another serves as the DB for persistent storage.