Requirements
Functional Requirements:
- Create a short URL for a given long URL.
- Return the long URL associated with a given short URL.
- Support both expiring and non-expiring links.
- Support custom aliases.
Non-Functional Requirements:
- Low latency when resolving the redirect URL (~10 ms)
- Scalable under high read and write load
  - Read:Write ratio = 100:1
  - Average write rate: 5k requests per second
  - Average read rate: 500k requests per second
- Correct HTTP status codes
- Fault tolerant
- High availability: if part of the system crashes, we should still be available to users and return a result, even if it is stale because a network fault prevented replication. That is, we give up consistency in favour of availability.
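The stated rates can be turned into rough daily volumes. The sketch below only does arithmetic on the figures above; the 500-byte row size is an assumption made for illustration and does not come from these notes.

```python
# Back-of-envelope volumes derived from the stated average rates.
WRITES_PER_SEC = 5_000      # from the requirements
READS_PER_SEC = 500_000     # from the requirements
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption (not from the notes): average stored row size in bytes.
ASSUMED_ROW_BYTES = 500

read_write_ratio = READS_PER_SEC // WRITES_PER_SEC
writes_per_day = WRITES_PER_SEC * SECONDS_PER_DAY
reads_per_day = READS_PER_SEC * SECONDS_PER_DAY
storage_per_day_gb = writes_per_day * ASSUMED_ROW_BYTES / 1e9

print(f"read:write ratio  = {read_write_ratio}:1")
print(f"new links per day = {writes_per_day:,}")
print(f"redirects per day = {reads_per_day:,}")
print(f"storage per day   = {storage_per_day_gb:.0f} GB (assumed row size)")
```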
API Design
- CreateShortUrl (POST)
  - req:
    - long_url
    - expiry: optional
    - alias: optional
  - res:
    - short_url
- GetUrl (GET)
  - req:
    - short_url
  - res:
    - http status (302 - temporary redirect, 404 - not found or expired)
    - redirect_url
High-Level Design
The client calls the API to create a shortened URL. The call goes to a load balancer, which picks one of the API gateway servers; running several gateways gives horizontal scalability when the traffic is too much for a single gateway to handle. The gateway forwards the request to a service load balancer, which routes it to one of the URL shortening service servers.
The service then creates an entry in the db and returns the shortened URL.
For the getUrl API, we first check Redis and return the entry if it is there; otherwise we check the db. If the entry has expired, we soft delete it and return a 404 error. Otherwise we return the URL with a 302 (temporary) status. A temporary redirect is used so that the browser does not cache the URL permanently, since the long URL behind a short URL can change later. Redis helps absorb the high read load.
The read service is separate from the write service, so the two can be scaled independently. When a read request arrives, we first check Redis; on a miss we check the db and, if the entry exists and has not expired, return the response. Otherwise we return "does not exist". On a db hit we also store the result in Redis, which evicts entries according to the LRU/LFU policy configured for the cache.
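The read path described above can be sketched as follows. This is a minimal illustration, not the actual service: `LruCache` is an in-process stand-in for Redis, the db is a plain dict, and the names `UrlRecord` and `resolve` are hypothetical.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class UrlRecord:
    long_url: str
    expiry: Optional[float] = None      # epoch seconds; None means no expiry
    deleted_at: Optional[float] = None  # set when the row is soft deleted


class LruCache:
    """Tiny in-process LRU cache standing in for Redis (assumption)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, UrlRecord]" = OrderedDict()

    def get(self, key: str) -> Optional[UrlRecord]:
        record = self._items.get(key)
        if record is not None:
            self._items.move_to_end(key)  # mark as most recently used
        return record

    def put(self, key: str, record: UrlRecord) -> None:
        self._items[key] = record
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used

    def delete(self, key: str) -> None:
        self._items.pop(key, None)


def resolve(short_url: str, cache: LruCache, db: Dict[str, UrlRecord],
            now: float) -> Tuple[int, Optional[str]]:
    """Return (http_status, redirect_url) for a short URL."""
    record = cache.get(short_url)
    cached = record is not None
    if not cached:
        record = db.get(short_url)  # cache miss: fall back to the db
        if record is None or record.deleted_at is not None:
            return 404, None
    if record.expiry is not None and record.expiry <= now:
        # Lazy cleanup: soft delete in the db and drop any cached copy.
        stored = db.get(short_url)
        if stored is not None:
            stored.deleted_at = now
        cache.delete(short_url)
        return 404, None
    if not cached:
        cache.put(short_url, record)  # populate the cache after a db hit
    return 302, record.long_url
```

Passing `now` in explicitly keeps the expiry check easy to test; a real handler would read the clock itself.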
Expired URLs can be cleaned up lazily when the URL is requested. In the background, a cleaner worker can also scan for entries that are past their expiry but not yet soft deleted, and soft delete them.
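One pass of such a background cleaner might look like the sketch below. It uses an in-memory style SQLite connection purely for illustration; the table name `shortened_urls`, the integer epoch timestamps, and the `batch_size` parameter are assumptions, and the columns mirror the model listed in these notes.

```python
import sqlite3


def create_table(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) table used by this sketch."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS shortened_urls (
            short_url  TEXT PRIMARY KEY,
            long_url   TEXT NOT NULL,
            expiry     INTEGER,            -- epoch seconds, NULL = no expiry
            created_at INTEGER NOT NULL,
            deleted_at INTEGER,            -- NULL = not soft deleted
            updated_at INTEGER
        )
        """
    )


def sweep_expired(conn: sqlite3.Connection, now: int,
                  batch_size: int = 1000) -> int:
    """Soft delete up to batch_size expired rows; return how many changed."""
    cur = conn.execute(
        """
        UPDATE shortened_urls
           SET deleted_at = ?, updated_at = ?
         WHERE short_url IN (
               SELECT short_url
                 FROM shortened_urls
                WHERE deleted_at IS NULL
                  AND expiry IS NOT NULL
                  AND expiry <= ?
                LIMIT ?)
        """,
        (now, now, now, batch_size),
    )
    conn.commit()
    return cur.rowcount
```

A worker would call `sweep_expired` on a timer until it returns 0; batching keeps each update short so it does not compete with the write path for long.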
The model for shortenedUrls:
- short_url (primary key)
- long_url
- expiry
- created_at
- deleted_at
- updated_at
short_url is the primary key, since we do not allow it to be updated once created. We can use UUIDv7 values as the short URLs: they are time-ordered, so inserts land at the end of the primary-key index as mostly sequential disk writes, which lowers index maintenance cost and therefore write latency. If we want a shorter URL, we can base62-encode the UUIDv7. This shortens it from 36 characters to 22, and it preserves the sequential ordering as long as the encoding is fixed-width and the alphabet matches the index's sort order.
UUIDv7 has enough random bits that collisions are extremely unlikely, even across trillions of ids. If a collision does occur, we can append a random salt to the end.
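To make the ordering claim concrete, here is a small sketch that builds a UUIDv7-style 128-bit value by hand (48-bit millisecond timestamp, version and variant bits, random remainder) and base62-encodes it. The helper names are hypothetical, and the alphabet is deliberately in ASCII order with fixed-width padding so that byte-wise string order follows numeric order.

```python
import os
import time
from typing import Optional

# Digits, then uppercase, then lowercase: this is ASCII order, so comparing
# fixed-width encoded strings byte-wise matches comparing the integers.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
WIDTH = 22  # 62**22 > 2**128, while 62**21 is too small


def uuid7_int(now_ms: Optional[int] = None) -> int:
    """Build a UUIDv7-style 128-bit integer (hand-rolled for illustration)."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")
    rand_a = (rand >> 62) & 0xFFF        # 12 random bits
    rand_b = rand & ((1 << 62) - 1)      # 62 random bits
    value = (now_ms & ((1 << 48) - 1)) << 80   # 48-bit timestamp, high bits
    value |= 0x7 << 76                         # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                        # variant bits
    value |= rand_b
    return value


def base62_encode(value: int, width: int = WIDTH) -> str:
    """Encode a non-negative integer as fixed-width base62."""
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = []
    while value:
        value, rem = divmod(value, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits)).rjust(width, ALPHABET[0])


def base62_decode(text: str) -> int:
    """Inverse of base62_encode."""
    value = 0
    for ch in text:
        value = value * 62 + ALPHABET.index(ch)
    return value


short_id = base62_encode(uuid7_int())
print(short_id, len(short_id))
```

Ids generated within the same millisecond differ only in their random bits, so ordering is guaranteed across milliseconds but not within one.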
Db type: the write load is not that high, and the schema is fixed and not evolving. Although we have no real need for transactions and ACID guarantees, a SQL db is fine at the given scale because Redis absorbs most of the read load. We can also add read replicas to reduce that load further. To be future-proof for write scale we could use MongoDB instead, since it supports horizontal scaling more easily than SQL dbs.
The data will be fetched by the short URL, which is the primary key.
The API gateway is used for rate limiting and security checks. Right now we have no user authentication, but if we add it later, the authentication check would be done by the API gateway.
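These notes do not say how the gateway limits requests. A token bucket is one common choice, sketched below as an assumption; the class name and parameters are hypothetical, and a real gateway would keep one bucket per client key in shared storage.

```python
class TokenBucket:
    """Token-bucket rate limiter: refills at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity        # start with a full bucket
        self.updated_at = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if the request may proceed."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Example: 1 request per second on average, bursts of up to 2.
bucket = TokenBucket(rate=1.0, capacity=2.0, now=0.0)
print([bucket.allow(now=0.0) for _ in range(3)])
```

A rejected request would typically be answered with HTTP 429 (Too Many Requests).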
Detailed Component Design
Redis: write-through cache, so a new or updated mapping is written to the db and to Redis in the same request.
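A write-through create path could be sketched as below. Plain dicts stand in for the db and for Redis, and `create_short_url`, `AliasTakenError`, and the `new_id` generator are hypothetical names; the collision handling follows the "append a random salt" idea from these notes.

```python
import secrets
from typing import Callable, Dict, Optional


class AliasTakenError(Exception):
    """Raised when a requested custom alias already exists."""


def create_short_url(
    db: Dict[str, dict],
    cache: Dict[str, dict],
    long_url: str,
    alias: Optional[str] = None,
    expiry: Optional[float] = None,
    new_id: Callable[[], str] = lambda: secrets.token_urlsafe(8),
) -> str:
    """Create a mapping and write it through to both db and cache."""
    if alias is not None:
        if alias in db:
            raise AliasTakenError(alias)  # custom aliases must be unique
        short_url = alias
    else:
        short_url = new_id()
        while short_url in db:
            # Generated id collided: append a random salt and retry.
            short_url += secrets.token_urlsafe(2)
    record = {"long_url": long_url, "expiry": expiry}
    db[short_url] = record           # source of truth first
    cache[short_url] = dict(record)  # then the cache, in the same request
    return short_url
```

Writing the db first means a failure between the two writes leaves the cache merely cold rather than holding a mapping the db never stored.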