System requirements
Functional:
When a user submits a URL, the service returns a shortened URL that leads to the original URL.
Requests to a short URL redirect to the linked long URL.
Non-Functional:
High availability, since users will use the service frequently and expect their links to work.
Low latency, since users are frustrated if a redirect or the creation of a new short URL takes long.
Horizontal scaling: adding servers rather than upgrading a single machine improves availability and cost effectiveness, and lets capacity grow with traffic.
API design
POST /short-urls creates a new short URL for the long URL in the request body.
GET /short-urls lists the user's own short URLs (optional, for users who want to see what they have created).
GET /short-urls/{url_id} resolves a short URL ID to its linked long URL.
DELETE /short-urls/{url_id} deletes a short URL.
To resolve a short URL, the endpoint queries the database using the short URL identifier to find the associated long URL.
On a successful lookup, the server responds with a redirect status code (301 Moved Permanently or 302 Found) and the long URL in the Location header, prompting the client to navigate to the original resource.
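A minimal sketch of the endpoint behavior, assuming in-memory storage and no real HTTP server: a dict stands in for the database, `ShortUrlApi`, `Response` and the base URL `https://short.example/` are hypothetical, IDs come from a placeholder counter rather than the real ID scheme, and the owner check on DELETE is an assumption since the design does not specify authorization.

```python
from dataclasses import dataclass, field
from http import HTTPStatus


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: object = None


class ShortUrlApi:
    """Hypothetical in-memory handlers for the four endpoints (no real HTTP server)."""

    def __init__(self, base_url: str = "https://short.example/"):
        self.base_url = base_url  # hypothetical public domain of the shortener
        self.store: dict[str, tuple[str, str]] = {}  # url_id -> (owner, long_url)
        self._counter = 0  # placeholder ID scheme, not the real generator

    def create(self, user: str, long_url: str) -> Response:
        """POST /short-urls"""
        if not long_url.startswith(("http://", "https://")):
            return Response(HTTPStatus.BAD_REQUEST, body="invalid URL")
        self._counter += 1
        url_id = format(self._counter, "x")
        self.store[url_id] = (user, long_url)
        return Response(
            HTTPStatus.CREATED,
            {"Location": f"/short-urls/{url_id}"},
            {"id": url_id, "short_url": self.base_url + url_id},
        )

    def list_for_user(self, user: str) -> Response:
        """GET /short-urls"""
        ids = [url_id for url_id, (owner, _) in self.store.items() if owner == user]
        return Response(HTTPStatus.OK, body=ids)

    def resolve(self, url_id: str) -> Response:
        """GET /short-urls/{url_id}: redirect to the long URL."""
        entry = self.store.get(url_id)
        if entry is None:
            return Response(HTTPStatus.NOT_FOUND)
        return Response(HTTPStatus.FOUND, {"Location": entry[1]})

    def delete(self, user: str, url_id: str) -> Response:
        """DELETE /short-urls/{url_id}"""
        entry = self.store.get(url_id)
        if entry is None:
            return Response(HTTPStatus.NOT_FOUND)
        if entry[0] != user:
            return Response(HTTPStatus.FORBIDDEN)
        del self.store[url_id]
        return Response(HTTPStatus.NO_CONTENT)
```

The sketch answers with 302 Found. A 301 is cached by browsers, which saves repeat requests to the service but means later clicks never reach it (so they cannot be counted, and a deleted link may keep redirecting); 302 sends every click through the service.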
High-level design
Users send requests to the API service to create or resolve short URLs. The API validates the input and reads from or writes to the database; for lookups it first checks Redis, and if the data is already cached it returns it from Redis without querying the database. A load balancer in front of the API instances maintains availability. A rate limiter can be added to stop users from creating many short URLs in a short amount of time.
When a short URL is created, it is assigned an ID. One option is to derive the ID from a hash of the provided URL, HASH(user_provided_url), using SHA-512 or a similar function. A hash alone does not guarantee uniqueness: the short ID keeps only a few characters of the digest, so different URLs can map to the same ID, and collisions must be detected and handled. In the database, the short URL ID is the key and the long URL is the value.
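A minimal sketch of hash-based ID creation with collision handling, assuming a 7-character base62 ID (62^7 is about 3.5 trillion values) and a dict standing in for the key/value database. The length, the `#attempt` salt used on retries, and the names `short_id` and `create_short_id` are hypothetical choices; only SHA-512 comes from the design.

```python
import hashlib

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def base62(n: int) -> str:
    """Encode a non-negative integer with the 62 URL-safe alphanumerics."""
    if n == 0:
        return BASE62[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(BASE62[rem])
    return "".join(reversed(digits))


def short_id(long_url: str, attempt: int = 0, length: int = 7) -> str:
    """Candidate ID: SHA-512 of the URL (salted with the attempt number on retries),
    reduced to `length` base62 characters."""
    data = long_url if attempt == 0 else f"{long_url}#{attempt}"
    digest = hashlib.sha512(data.encode("utf-8")).digest()
    n = int.from_bytes(digest[:8], "big") % (62 ** length)
    return base62(n).rjust(length, BASE62[0])


def create_short_id(long_url: str, store: dict, max_attempts: int = 5) -> str:
    """Claim a free ID in `store` (a dict standing in for the key/value DB)."""
    for attempt in range(max_attempts):
        candidate = short_id(long_url, attempt)
        # setdefault inserts only if the key is absent and returns the stored value.
        # A real DB needs the same atomic put-if-absent, e.g. Redis SET ... NX.
        existing = store.setdefault(candidate, long_url)
        if existing == long_url:
            return candidate  # newly claimed, or this URL was already shortened
        # Otherwise a different URL owns this ID: retry with the next salt.
    raise RuntimeError("no free short ID after retries")
```

Because the same URL always hashes to the same first candidate, shortening a URL twice returns the existing ID instead of creating a duplicate.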
DB: persistent storage for the URL mappings
Redis: cache for repeated requests for the same URL, so they are served faster
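The design does not name a rate-limiting algorithm; a token bucket per user is one common choice and is swapped in here as an assumption. In this sketch each user may create a burst of `capacity` short URLs, after which tokens refill at `refill_per_sec`. The default limits are hypothetical, and state is kept in process, so with several API instances behind the load balancer it would have to live in a shared store such as Redis. A rejected request would typically get 429 Too Many Requests.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled continuously at `refill_per_sec`."""

    def __init__(self, capacity: float, refill_per_sec: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)
        self.updated_at = clock()

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1  # spend one token for this request
            return True
        return False


class RateLimiter:
    """One bucket per user; limits are hypothetical (burst of 5, then 1 per 12 s)."""

    def __init__(self, capacity: float = 5, refill_per_sec: float = 1 / 12,
                 clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, user: str) -> bool:
        bucket = self.buckets.get(user)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_sec, self.clock)
            self.buckets[user] = bucket
        return bucket.allow()
```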
Detailed component design
User->>API: User submits a new link to shorten
API->>DB: Saves the new URL with a new short ID
DB-->>API: New data inserted
API-->>User: Responds with the new shortened URL
User->>API: User opens a shortened URL
API->>Redis: API service checks for cached data
Redis-->>API: Redis returns the cached data if it exists
API->>DB: If the data is not in Redis, queries the DB
DB-->>API: Returns the long URL
API-->>User: Responds with a redirect to the linked URL
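The read path in the diagram follows the cache-aside pattern. A minimal sketch, assuming plain dicts stand in for Redis and the database; `UrlResolver` is a hypothetical name, and the step that writes a database hit back into the cache is an assumption the diagram does not show. With the redis-py client the cache calls would be `r.get(url_id)` and `r.set(url_id, long_url, ex=ttl_seconds)`.

```python
from typing import Optional


class UrlResolver:
    """Read path from the diagram: Redis first, then DB, then fill the cache.
    Plain dicts stand in for Redis and the database."""

    def __init__(self, cache: dict, db: dict):
        self.cache = cache
        self.db = db
        self.cache_hits = 0
        self.db_reads = 0

    def resolve(self, url_id: str) -> Optional[str]:
        long_url = self.cache.get(url_id)
        if long_url is not None:
            self.cache_hits += 1  # served from cache, DB untouched
            return long_url
        self.db_reads += 1
        long_url = self.db.get(url_id)
        if long_url is not None:
            self.cache[url_id] = long_url  # populate cache for later requests
        return long_url  # None -> the API would answer 404 Not Found
```

The two counters are the raw material for the cache hit rate that is worth monitoring.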
To avoid collisions when creating unique identifiers, we can lock the table or the Redis key while an ID is being claimed, and retry with exponential backoff when the lock is already held; with several API instances this requires a distributed locking mechanism. If the ID generator is down, we can queue creation requests and process them once the generator is back online.
A CDN can cache popular links at edge locations close to users. When a link is updated or disabled, the corresponding cache entries should be purged or updated immediately. A TTL on every cache entry is a fallback rather than a substitute for this: it bounds how long a stale entry can be served if a purge is missed, and by automatically removing old entries it also helps keep memory usage under control.
A Least Frequently Used (LFU) eviction policy is worth trying, since it keeps frequently requested links in memory. For very hot IDs, consider a dynamic caching strategy that increases the TTL (Time to Live) for these popular URLs, so they stay cached longer during peak traffic and rarely hit the database.
Regularly monitor cache hit rates and eviction rates to identify patterns in data access and adjust these policies as needed. Choose TTL values based on how frequently the data changes and how critical it is for users to receive the latest information.
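The caching policies above can be illustrated in one small in-process class: per-entry TTL, explicit invalidation when a link changes, a longer TTL once a key becomes hot, and LFU eviction when the cache is full. This is a sketch for reasoning about the policies, not a replacement for Redis, which provides them natively (`SET ... EX` or `EXPIRE` for TTL, `DEL` for invalidation, `maxmemory-policy allkeys-lfu` for LFU eviction). `TtlLfuCache` and all thresholds are hypothetical.

```python
import time


class TtlLfuCache:
    """In-process sketch: per-entry TTL, explicit invalidation, a longer TTL for
    hot keys, and least-frequently-used eviction when full."""

    def __init__(self, capacity: int, base_ttl: float, hot_ttl: float,
                 hot_threshold: int, clock=time.monotonic):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.base_ttl = base_ttl            # seconds for a normal entry
        self.hot_ttl = hot_ttl              # seconds granted to a hot entry on each hit
        self.hot_threshold = hot_threshold  # hits after which a key counts as hot
        self.clock = clock
        self.entries: dict[str, list] = {}  # key -> [value, expires_at, hits]

    def get(self, key: str):
        entry = self.entries.get(key)
        if entry is None:
            return None
        now = self.clock()
        if now >= entry[1]:
            del self.entries[key]  # expired: the caller falls back to the DB
            return None
        entry[2] += 1
        if entry[2] >= self.hot_threshold:
            entry[1] = max(entry[1], now + self.hot_ttl)  # keep hot links cached longer
        return entry[0]

    def set(self, key: str, value) -> None:
        if key not in self.entries and len(self.entries) >= self.capacity:
            self._evict()
        self.entries[key] = [value, self.clock() + self.base_ttl, 0]

    def invalidate(self, key: str) -> None:
        """Call when a link is updated or disabled, instead of waiting for the TTL."""
        self.entries.pop(key, None)

    def _evict(self) -> None:
        now = self.clock()
        expired = [k for k, e in self.entries.items() if now >= e[1]]
        if expired:
            for k in expired:
                del self.entries[k]
            return
        # No expired entries: drop the least frequently used one.
        victim = min(self.entries, key=lambda k: self.entries[k][2])
        del self.entries[victim]
```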
Use load balancers to distribute incoming requests evenly across multiple servers.
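Round robin is one simple way to spread requests evenly; the design does not pick a policy, so it is an assumption here. This sketch hands requests to the API servers in turn and skips any server marked unhealthy. In practice a managed load balancer or a proxy does this; the class and server names are hypothetical.

```python
class RoundRobinBalancer:
    """Hands requests to servers in turn, skipping servers marked unhealthy."""

    def __init__(self, servers: list[str]):
        if not servers:
            raise ValueError("need at least one server")
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0

    def mark_down(self, server: str) -> None:
        self.healthy.discard(server)  # e.g. after a failed health check

    def mark_up(self, server: str) -> None:
        self.healthy.add(server)

    def pick(self) -> str:
        # Try each server at most once per call, starting where the last call stopped.
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers")
```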
We can also decentralize ID generation by building each ID from a timestamp, a machine identifier, and a per-machine sequence number. Every instance then generates IDs independently, without coordinating with the others, so if one generator fails the rest keep working. As long as each machine has a distinct identifier and its clock does not move backwards, two instances can never produce the same ID.
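A minimal sketch of such a generator, assuming the bit layout popularized by Twitter's Snowflake: milliseconds since a custom epoch, then a 10-bit machine ID, then a 12-bit sequence. The epoch value, the bit widths and the class name are hypothetical choices for illustration.

```python
import threading
import time

EPOCH_MS = 1_700_000_000_000  # hypothetical custom epoch (Nov 2023), in Unix ms
MACHINE_BITS = 10             # up to 1024 generator instances
SEQUENCE_BITS = 12            # up to 4096 IDs per machine per millisecond
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


def now_ms() -> int:
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    def __init__(self, machine_id: int, clock_ms=now_ms):
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id out of range")
        self.machine_id = machine_id
        self.clock_ms = clock_ms
        self.last_ms = -1
        self.sequence = 0
        self._lock = threading.Lock()  # next_id may be called from many threads

    def next_id(self) -> int:
        with self._lock:
            now = self.clock_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to risk duplicates")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # All sequence numbers for this millisecond are used: wait for the next one.
                    while now <= self.last_ms:
                        now = self.clock_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (((now - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                    | (self.machine_id << SEQUENCE_BITS)
                    | self.sequence)
```

The integer can then be base62-encoded to form the short ID. For the first few years after the epoch that is about 10 characters, longer than a 7-character hash ID, which is the price of uniqueness without any collision check.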
Database design
We will use a NoSQL key-value store, since we don't need structured relational tables: each record maps a short URL ID (the key) to a long URL (the value). The same pairs live in two places: in the Redis cache for fast reads, and in the database for persistent storage.