My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach with Score: 8/10
by kraken9196
System requirements
Functional:
- Generate a shortened URL for the URL that has been given by the user as input.
- Ensure that no 2 inputs map to the same shortened URL.
- Each shortened URL has an expiration time of 365 days, after which it stops working and the user has to generate a new shortened URL.
- Ensure that the shortened URL redirects to the correct original URL.
- Maintain a history log for the user to keep track of how many URLs a particular user has provided.
Non-Functional:
- The system should be highly available, targeting 99.999% availability.
- The system should be fault tolerant.
- Users should be able to get tiny URLs with minimal latency.
- The system should apply rate limiting so that a user cannot bombard it with requests.
Capacity estimation
Assuming a total user base of 10 million users, we can take the daily active users (DAU) as 1 million.
Each active user submits 5 URLs per day to be shortened.
If each URL needs 1 KB to store the tiny URL, the original URL, and the time details, the system needs 1 million * 5 * 1 KB = 5 GB of storage per day.
For peak load, we take the worst case in which all daily active users send a request at once, which gives 1 million requests per second.
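As a rough check of these numbers, the arithmetic can be written out as below. The 365-day retention comes from the expiration requirement. Treating 1 KB as 1,000 bytes, and spreading the daily writes evenly over 24 hours to get an average rate, are assumptions made only for this estimate.

```python
# Back-of-the-envelope capacity numbers from the figures in this section.
DAU = 1_000_000             # daily active users
URLS_PER_USER_PER_DAY = 5   # URLs each active user shortens per day
BYTES_PER_URL = 1_000       # 1 KB per record (assumption: decimal kilobytes)
RETENTION_DAYS = 365        # URLs expire after 365 days
SECONDS_PER_DAY = 24 * 60 * 60

urls_per_day = DAU * URLS_PER_USER_PER_DAY
storage_per_day_gb = urls_per_day * BYTES_PER_URL / 1e9
# Upper bound on live data if nothing is deleted before it expires.
storage_retained_tb = storage_per_day_gb * RETENTION_DAYS / 1e3
# Assumption: writes spread evenly over the day (the average, not the peak).
average_writes_per_second = urls_per_day / SECONDS_PER_DAY

print(f"URLs per day:            {urls_per_day:,}")
print(f"Storage per day (GB):    {storage_per_day_gb:.1f}")
print(f"Storage for 365 days (TB): {storage_retained_tb:.3f}")
print(f"Average writes per second: {average_writes_per_second:.1f}")
```

Under these assumptions the system stores about 5 GB per day, under 2 TB of live data per year, and averages roughly 58 writes per second, so the 1 million requests per second figure is a deliberately pessimistic peak.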
API design
- generate-url : Takes the original URL from the user in the payload and returns the generated shortened URL.
- redirect : Takes the tiny URL as input and returns the original URL that the user should be redirected to.
- view-history/{userId} : Returns the URLs the user has submitted to the system to date.
- get-expiry-date : Takes the tiny URL in the payload and returns the expiration date of the URL.
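To make the four endpoints more concrete, here is a minimal sketch of what their payloads might look like. The class and field names are hypothetical and chosen only for illustration; the write-up itself does not fix a payload format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class GenerateUrlRequest:
    """Payload for generate-url (field name is an assumption)."""
    original_url: str


@dataclass
class GenerateUrlResponse:
    tiny_url: str
    expiration_date: str  # ISO 8601 date string


@dataclass
class RedirectResponse:
    """Result of redirect: where the client should be sent."""
    original_url: str


@dataclass
class HistoryEntry:
    original_url: str
    tiny_url: str


@dataclass
class HistoryResponse:
    """Result of view-history/{userId}."""
    user_id: str
    entries: List[HistoryEntry] = field(default_factory=list)


@dataclass
class ExpiryDateResponse:
    """Result of get-expiry-date."""
    tiny_url: str
    expiration_date: str  # ISO 8601 date string


def to_json(payload) -> str:
    """Serialize any of the payload dataclasses above to a JSON string."""
    return json.dumps(asdict(payload), sort_keys=True)


print(to_json(GenerateUrlRequest(original_url="https://example.com/a/long/path")))
```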
Database design
- We can use a relational database such as PostgreSQL to store and retrieve the URL details for each user.
- A user table stores user-specific metadata such as username, first_name, and last_name, plus a user_id column, which can be a UUID.
- A url_mappings table has user_id from the user table as a foreign key. It stores the mapping between original_url and tiny_url, along with an expiration date column; after this date the URL is no longer accessible.
- Since this is a read-heavy system, we can use a replicated setup with a single writer and 3 read replicas.
- We can also shard the database by user location, which helps the system scale if the application is used globally.
- We can also have a log table that stores the input URL, the generated URL, the time of generation, and the user who generated it, for showing users their history or generating reports.
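The tables described above could be sketched roughly as follows. This uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so the example runs anywhere. The column types, the log table name url_log, and the plural table name users (chosen because user is a reserved word in PostgreSQL) are assumptions, not part of the original design.

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE users (
    user_id    TEXT PRIMARY KEY,          -- UUID stored as text
    username   TEXT NOT NULL UNIQUE,
    first_name TEXT,
    last_name  TEXT
);
CREATE TABLE url_mappings (
    tiny_url        TEXT PRIMARY KEY,     -- uniqueness enforced by the database
    original_url    TEXT NOT NULL,
    user_id         TEXT NOT NULL REFERENCES users(user_id),
    expiration_date TEXT NOT NULL         -- ISO 8601 date
);
CREATE TABLE url_log (
    log_id        INTEGER PRIMARY KEY,
    input_url     TEXT NOT NULL,
    generated_url TEXT NOT NULL,
    generated_at  TEXT NOT NULL,          -- ISO 8601 timestamp
    generated_by  TEXT NOT NULL REFERENCES users(user_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
conn.executescript(SCHEMA)

# Insert one user and one mapping owned by that user.
user_id = str(uuid.uuid4())
conn.execute(
    "INSERT INTO users (user_id, username, first_name, last_name) VALUES (?, ?, ?, ?)",
    (user_id, "alice", "Alice", "Example"),
)
conn.execute(
    "INSERT INTO url_mappings (tiny_url, original_url, user_id, expiration_date) "
    "VALUES (?, ?, ?, ?)",
    ("abc12345", "https://example.com/a/long/path", user_id, "2026-01-01"),
)
conn.commit()
```

Making tiny_url the primary key means the database itself rejects a duplicate short URL, which backs up the requirement that no two inputs map to the same shortened URL.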
High-level design
- The Web App initiates the process by sending requests to the API Gateway.
- The API Gateway handles authentication and rate limiting before passing the request to the Load Balancer.
- The Load Balancer distributes the requests evenly across the Backend Servers.
- From the Backend Servers, read requests go through a dedicated Load Balancer Reads, which directs them to multiple Read Replicas. This improves performance for read operations.
- Write requests go through a Load Balancer Writes, which directs all writes to the Primary Backend to maintain consistency.
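The read/write split can be illustrated with a small router. This is a minimal sketch, assuming round-robin selection among the read replicas; the design above does not say which balancing policy is used, and the class and node names are hypothetical.

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Send writes to the single primary and spread reads over the replicas."""

    def __init__(self, primary: str, replicas: Sequence[str]):
        if not replicas:
            raise ValueError("at least one read replica is required")
        self.primary = primary
        # Assumption: simple round-robin over the replicas.
        self._replica_cycle = itertools.cycle(list(replicas))

    def route(self, is_write: bool) -> str:
        """Return the name of the node that should handle the request."""
        if is_write:
            return self.primary
        return next(self._replica_cycle)


router = ReadWriteRouter("primary", ["replica-1", "replica-2", "replica-3"])
print(router.route(is_write=True))   # generate-url is a write
print(router.route(is_write=False))  # redirect is a read
```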
Request flows
- The request from the web app first goes to the API gateway. Basic authentication happens there, as every user has a unique key associated with their account.
- The API gateway also handles rate limiting, ensuring the system is not bombarded with requests.
- After passing through the gateway, the request reaches the load balancer, which sends it to one of the backend servers depending on how much traffic each server is handling.
- At the backend server, the request is forwarded to the read or write load balancer depending on whether it is a read or a write request.
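The rate limiting step at the gateway could look something like the sketch below. It assumes a token bucket per user key, which is one common approach; the write-up does not name a specific algorithm, and the capacity and refill values here are placeholders.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity: int, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Add the tokens earned since the last call, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


class RateLimiter:
    """Keep one bucket per user key, as each user has a unique key."""

    def __init__(self, capacity: int, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self._capacity = capacity
        self._refill = refill_per_second
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, api_key: str) -> bool:
        bucket = self._buckets.get(api_key)
        if bucket is None:
            bucket = TokenBucket(self._capacity, self._refill, self._clock)
            self._buckets[api_key] = bucket
        return bucket.allow()


limiter = RateLimiter(capacity=5, refill_per_second=1.0)
print(limiter.allow("user-key-123"))
```

The clock is injectable so the limiter can be tested without sleeping. In a deployment with several gateway instances the buckets would need shared storage, which this sketch leaves out.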
Detailed component design
- URL generation is the first component. Every URL submitted by a user gets a unique tiny URL. To keep it unique, we can use a hash function that takes the username and the submitted URL as input, builds a string from them, and appends the epoch timestamp. This pattern is meant to ensure that no two users generate the same tiny URL.
- URL redirection is the second component. When the system receives a tiny URL from the user, it looks up the original URL in the url_mappings table. It also checks whether current_date > expiration date of the URL, and if so it returns an error.
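Both components can be sketched together. This is one possible reading of the scheme, not a definitive implementation: the username, URL, and epoch timestamp are joined into one string, hashed with SHA-256, and the digest is shortened to an 8-character base62 code. The choice of SHA-256, base62, the code length, the function names, and the error classes are all assumptions. A plain dict stands in for the url_mappings table.

```python
import hashlib
import time
from datetime import date, timedelta

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
EXPIRY_DAYS = 365  # from the functional requirements


def base62_encode(number: int) -> str:
    """Encode a non-negative integer using the BASE62 alphabet."""
    if number == 0:
        return BASE62[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(BASE62[remainder])
    return "".join(reversed(digits))


def generate_short_code(username: str, original_url: str,
                        epoch_seconds=None, length: int = 8) -> str:
    """Hash username + URL + epoch timestamp into a fixed-length code."""
    if epoch_seconds is None:
        epoch_seconds = int(time.time())
    # A separator keeps ("ab", "c") and ("a", "bc") from producing the same input.
    material = f"{username}|{original_url}|{epoch_seconds}".encode("utf-8")
    digest = hashlib.sha256(material).digest()
    value = int.from_bytes(digest[:8], "big") % (62 ** length)
    return base62_encode(value).rjust(length, BASE62[0])


class UrlNotFoundError(Exception):
    pass


class UrlExpiredError(Exception):
    pass


def resolve(url_mappings: dict, tiny_url: str, today: date) -> str:
    """Return the original URL, or raise if it is unknown or expired."""
    record = url_mappings.get(tiny_url)
    if record is None:
        raise UrlNotFoundError(tiny_url)
    original_url, expiration_date = record
    if today > expiration_date:  # current_date > expiration date
        raise UrlExpiredError(tiny_url)
    return original_url


# Usage with a fixed timestamp so the example is repeatable.
created_on = date(2023, 1, 1)
code = generate_short_code("alice", "https://example.com/some/long/path", 1672531200)
url_mappings = {
    code: ("https://example.com/some/long/path", created_on + timedelta(days=EXPIRY_DAYS)),
}
print(code, resolve(url_mappings, code, today=date(2023, 6, 1)))
```

Because the hash is shortened to a fixed length, two different inputs can in principle still produce the same code, so a uniqueness check on tiny_url at insert time would still be needed, with a retry on conflict.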
Trade offs/Tech choices
- Used a PostgreSQL database, as we need relational mapping between users and URLs.
- As the system should be highly available, I went with read replicas so that users can be served at all times.
Failure scenarios/bottlenecks
- Network partitions are one possible failure scenario.
- If the system ever has to handle 1 million writes at a single point in time, a relational database may not be able to keep up, and we may need to use something like Cassandra.
Future improvements
- If the application is used globally, the database can be sharded by location, which will help with high volumes and a large user base.
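Location-based sharding could start from a simple lookup like the one below. The region codes, shard names, and the fallback shard are hypothetical placeholders; the write-up does not say how locations map to shards.

```python
# Hypothetical mapping from a user's region to the shard that stores their URLs.
SHARDS_BY_REGION = {
    "na": "shard-na",
    "eu": "shard-eu",
    "apac": "shard-apac",
}
DEFAULT_SHARD = "shard-na"  # assumption: unknown regions fall back to one shard


def shard_for(region: str) -> str:
    """Pick the shard for a region code, ignoring case and surrounding spaces."""
    return SHARDS_BY_REGION.get(region.strip().lower(), DEFAULT_SHARD)


print(shard_for("EU"))
```

A redirect request carries only the tiny URL, so the lookup path would also need a way to find the right shard from the code itself; that part is not covered here.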