My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach with Score: 9/10

by alchemy1135

System requirements


Functional:

  1. URL Shortening:
     • The system should provide a mechanism to shorten long URLs.
     • Shortened URLs should be unique and not collide with existing ones.
  2. Redirection:
     • When users access a shortened URL, they should be redirected to the original long URL.
     • The redirection process should be fast and seamless.
  3. Custom Alias:
     • Users should have the option to customize their short URLs with a chosen alias (if available).
     • Aliases should be unique across the system.
  4. Expiration:
     • Users should have the option to set an expiration date for their short URLs.
     • If no expiration date is set, the link expires after a default period.
  5. API:
     • Offer a RESTful API to allow integration with other services and applications.
     • The API should support URL shortening, retrieval, and analytics retrieval.


Non-Functional:

  1. Scalability:
     • The system should handle a large number of URL shortening requests and clicks.
     • Scalability should be achieved through horizontal scaling.
  2. Performance:
     • The redirection process should be fast, with minimal latency.
     • The system should handle concurrent access without performance degradation.
  3. Reliability:
     • The service should be highly available and have minimal downtime.
     • Implement mechanisms for fault tolerance and disaster recovery.
  4. Storage:
     • Efficiently store and retrieve a large number of short URLs and their associated data.
     • Use reliable and scalable storage solutions.
  5. Backup and Recovery:
     • Regularly back up data to prevent data loss.
     • Implement a recovery process to restore data in case of failures.
  6. Security:
     • Implement security measures to prevent misuse and unauthorized access to short URLs.
     • Support HTTPS for secure data transmission.



Capacity estimation

Let us assume our service will have the following usage patterns:

  • 1 million new URL shortenings per day, which averages about 30 million shortenings per month.
  • 10 million unique users.
  • 100 million URL redirections per day.


Number of URLs we will have to store in our database over a 5-year period: 30 million * 12 months * 5 years = 1.8 billion

Let’s assume we use 8 characters to generate a short URL, drawn from an alphabet of 62 characters [A-Z, a-z, 0-9], something like http://ad.com/abXdef21. This gives 62^8, or roughly 218 trillion, unique combinations, far more than the 1.8 billion we need.

Let's try to understand how much storage we will need to store the data for 5 years.


Storage Estimation for 5 years:

  • Short URLs: 1.8 Billion * 8 bytes = 14.4 GB
  • Long URLs: 1.8 Billion * 500 bytes = 900 GB
  • Other metadata: 1.8 Billion * 100 bytes = 180 GB


Total: ~1.1 TB for 5 years (per database)

Each replica will need at least the same ~1.1 TB.
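The arithmetic above can be reproduced with a short script. This is a rough sketch: the record sizes and traffic figures are the ones assumed in this section, while the per-second rates are derived values that assume traffic is spread evenly across the day.

```python
# Back-of-the-envelope capacity estimate using the figures assumed above.
NEW_URLS_PER_DAY = 1_000_000
NEW_URLS_PER_MONTH = 30_000_000
REDIRECTS_PER_DAY = 100_000_000
YEARS = 5

total_urls = NEW_URLS_PER_MONTH * 12 * YEARS  # records kept after 5 years

# Assumed average size of each stored field, in bytes.
BYTES_PER_RECORD = {"short_url": 8, "long_url": 500, "metadata": 100}

storage_gb = {name: total_urls * size / 1e9 for name, size in BYTES_PER_RECORD.items()}
total_tb = sum(storage_gb.values()) / 1000

# Average request rates, assuming an even spread over 24 hours.
SECONDS_PER_DAY = 24 * 60 * 60
writes_per_sec = NEW_URLS_PER_DAY / SECONDS_PER_DAY
reads_per_sec = REDIRECTS_PER_DAY / SECONDS_PER_DAY

print(f"records: {total_urls:,}")
for name, gb in storage_gb.items():
    print(f"{name}: {gb:.1f} GB")
print(f"total: {total_tb:.2f} TB")
print(f"writes/s: {writes_per_sec:.1f}, reads/s: {reads_per_sec:.1f}")
```

Real traffic is bursty, so peak rates will be well above these averages; the figures are only a starting point for sizing.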


API design


  1. POST /api/shorten : Accepts the original URL. The service generates a hash to use as the alias, appends it to the service's base URL, and returns the shortened URL.
  2. GET /<alias> : Accepts the alias from the short URL and redirects to the original URL.
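As an illustration of the two endpoints, here is a minimal in-memory sketch of their request and response shapes. The JSON field names (`url`, `alias`, `short_url`), the status codes, and the random alias placeholder are assumptions made for this example, not part of the design above; a dict stands in for the database.

```python
import secrets
import string

BASE_URL = "http://ad.com"  # example host used earlier in this write-up
ALPHABET = string.ascii_letters + string.digits
_store = {}  # alias -> original URL (stand-in for the database)


def _random_alias():
    # Placeholder generator; the real service would use its own hashing scheme.
    return "".join(secrets.choice(ALPHABET) for _ in range(8))


def post_shorten(body):
    """Handle POST /api/shorten. Returns (status_code, json_body)."""
    url = body.get("url", "")
    if not url.startswith(("http://", "https://")):
        return 400, {"error": "url must start with http:// or https://"}
    alias = body.get("alias")
    if alias is not None:
        if alias in _store:
            return 409, {"error": "alias already taken"}
    else:
        alias = _random_alias()
        while alias in _store:  # retry on the unlikely collision
            alias = _random_alias()
    _store[alias] = url
    return 201, {"short_url": f"{BASE_URL}/{alias}"}


def get_alias(alias):
    """Handle GET /<alias>. Returns (status_code, headers)."""
    url = _store.get(alias)
    if url is None:
        return 404, {}
    return 302, {"Location": url}
```

A client would POST `{"url": "https://example.com/a/long/path"}`, receive a `short_url`, and later be redirected through the `Location` header when it requests that alias.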


Database design


  • Table 1: ShortURLs
     • id (Primary Key)
     • alias
     • original_url
     • created_at
     • expiration_date


  • Table 2: User
     • user_id (Primary Key)
     • name
     • email
     • password
     • created_at
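The two tables can be sketched as DDL. The example below uses Python's built-in `sqlite3` module purely as a stand-in so that it runs anywhere; the column types, the `UNIQUE` constraints, and the note about hashing passwords are assumptions added for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE User (
    user_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    email      TEXT NOT NULL UNIQUE,      -- assumption: one account per email
    password   TEXT NOT NULL,             -- assumption: a salted hash, not plaintext
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE ShortURLs (
    id              INTEGER PRIMARY KEY,
    alias           TEXT NOT NULL UNIQUE, -- UNIQUE rejects colliding aliases
    original_url    TEXT NOT NULL,
    created_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expiration_date TEXT                  -- NULL means "use the default expiry"
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO ShortURLs (alias, original_url) VALUES (?, ?)",
    ("abXdef21", "https://example.com/some/long/path"),
)

# A second insert with the same alias violates the UNIQUE constraint.
try:
    conn.execute(
        "INSERT INTO ShortURLs (alias, original_url) VALUES (?, ?)",
        ("abXdef21", "https://example.com/other"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The redirect path is a single lookup by alias.
row = conn.execute(
    "SELECT original_url FROM ShortURLs WHERE alias = ?", ("abXdef21",)
).fetchone()
print(duplicate_rejected, row[0])
```

Letting the storage layer enforce alias uniqueness means two concurrent requests for the same alias cannot both succeed, whichever database is eventually chosen.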

High-level design

  1. Web Server:
     • Handles incoming HTTP requests.
     • Validates and routes requests to the appropriate service.
  2. URL Shortening Service:
     • Generates short URLs and stores them in the database.
     • Manages alias availability and expiration.
  3. Redirection Service:
     • Handles redirection based on the alias.
     • Logs analytics data.
  4. Cleanup Service:
     • Removes old data from the databases, including expired links and inactive users.
  5. Load Balancers:
     • To handle a large number of requests, we can use load balancers to distribute incoming traffic across multiple instances of the application server. We can add a load-balancing layer at three places in our service:
        • Between clients and application servers
        • Between application servers and database servers
        • Between application servers and cache servers
  6. Caching:
     • Since reading from the database can be slow and resource-intensive, we can add a caching layer to speed up read operations. We can use in-memory caches like Redis or Memcached to store the most frequently accessed URLs.
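The caching layer can be illustrated with a cache-aside read path. This is a minimal sketch in which a small in-process LRU cache stands in for Redis or Memcached and a plain dict stands in for the database; the class and function names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny in-process LRU cache standing in for Redis or Memcached."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


def lookup(alias, cache, database, stats):
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    url = cache.get(alias)
    if url is not None:
        stats["hits"] += 1
        return url
    stats["misses"] += 1
    url = database.get(alias)  # `database` is a dict in this sketch
    if url is not None:
        cache.put(alias, url)
    return url


database = {
    "a": "https://example.com/1",
    "b": "https://example.com/2",
    "c": "https://example.com/3",
}
cache = LRUCache(capacity=2)
stats = {"hits": 0, "misses": 0}
for alias in ["a", "b", "a", "c", "b"]:
    lookup(alias, cache, database, stats)
print(stats)
```

With a capacity of two, the access sequence above evicts entries that have not been used recently, which is the behavior we want when only the most popular URLs fit in memory.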









Detailed component design


1. URL Shortening Service:

  • Functionality:
     • Accepts a long URL.
     • Generates a unique alias or uses a custom one if available.
     • Stores the mapping in the ShortURLs table.
  • Tech Choices:
     • Language: Python/Node.js
     • Storage: MongoDB/Redis
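One possible way to generate the alias is to hash the long URL and encode part of the digest in base 62. The sketch below assumes SHA-256, an 8-character alias, and a retry counter to resolve collisions; none of these choices are fixed by the design above, and the `taken` set is a stand-in for a uniqueness check against the database.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase  # 62 symbols
ALIAS_LENGTH = 8


def base62(number, length):
    """Encode a non-negative integer as a fixed-length base-62 string."""
    chars = []
    for _ in range(length):
        number, remainder = divmod(number, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def make_alias(long_url, taken, custom=None):
    """Return a unique alias, or raise ValueError if a custom alias is taken."""
    if custom is not None:
        if custom in taken:
            raise ValueError(f"alias {custom!r} is already in use")
        taken.add(custom)
        return custom
    attempt = 0
    while True:
        # Salting with the attempt number yields a new candidate after a collision.
        digest = hashlib.sha256(f"{long_url}#{attempt}".encode()).digest()
        number = int.from_bytes(digest[:8], "big") % 62**ALIAS_LENGTH
        candidate = base62(number, ALIAS_LENGTH)
        if candidate not in taken:
            taken.add(candidate)
            return candidate
        attempt += 1


taken = set()
first = make_alias("https://example.com/a/long/path", taken)
second = make_alias("https://example.com/a/long/path", taken)
print(first, second)
```

In this sketch, shortening the same URL twice produces two different aliases because the first one is already taken. Returning the existing alias instead would be an equally valid design choice.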



2. Redirection Service:

  • Functionality:
     • Takes a short URL alias from the incoming request.
     • Retrieves the original URL from the database.
     • Redirects the user to the original URL.
  • Tech Choices:
     • Language: Node.js
     • Caching: Redis for frequently accessed URLs.
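The redirect decision, including the expiration requirement, might look like the following. This is a sketch only: the one-year default lifetime, the 410 response for expired links, and the record layout are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL = timedelta(days=365)  # assumed default lifetime; the design does not fix one


def expiration_for(created_at, requested=None):
    """Use the caller's expiration date if given, else fall back to the default TTL."""
    return requested if requested is not None else created_at + DEFAULT_TTL


def redirect(alias, records, now):
    """Return (status, location) for GET /<alias>."""
    record = records.get(alias)
    if record is None:
        return 404, None
    if now >= record["expiration_date"]:
        return 410, None  # the link existed but has expired
    return 302, record["original_url"]


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
records = {
    "abXdef21": {
        "original_url": "https://example.com/long",
        "expiration_date": expiration_for(created),
    }
}
print(redirect("abXdef21", records, created + timedelta(days=1)))
print(redirect("abXdef21", records, created + timedelta(days=366)))
```

A 302 response keeps every click flowing through the service, which suits analytics logging. A 301 would let browsers cache the redirect and reduce load, at the cost of missing some clicks.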





Trade offs/Tech choices


Database choice:

A NoSQL database is a good fit for several aspects of a URL shortening service:

  1. Schema Flexibility:
     • Document stores such as MongoDB do not enforce a fixed schema, so records can be stored in a flexible, JSON-like format. This is useful for a URL-shortening service where the structure of the data (e.g., short URLs, custom aliases, analytics) may evolve.
  2. Scalability:
     • NoSQL databases are generally designed to scale horizontally, making it easier to handle the growing volume of data and traffic. As the number of short URLs and the amount of analytics data increase, you can add more nodes to distribute the load.
  3. High Write Throughput:
     • Under the estimates above the workload is read-heavy (about 100 redirections for every new short URL), but new URLs and per-click analytics still produce a steady stream of writes. Many NoSQL databases are optimized for high write throughput, which helps as that volume grows.
  4. Document-Oriented Storage:
     • Many NoSQL databases are document-oriented, so the metadata, analytics, and custom alias information for a short URL can live together in a single record rather than being spread across several tables.


CAP Theorem Implications for URL Shortening Service:

  • AP (Availability and Partition Tolerance):
     • In a URL-shortening service, high availability is often more critical than strict consistency. Users expect quick access to short URLs, and downtime or delays in availability are less acceptable. Therefore, prioritizing availability and partition tolerance is a common choice.
  • Consistency Trade-offs:
     • Achieving strong consistency across all nodes in a distributed system can introduce delays, especially during periods of network partition or high traffic. The system might choose eventual consistency, where all nodes will eventually converge to the same state, but immediate consistency might not be guaranteed.
  • Eventual Consistency:
     • Since URL shortening services often involve distributed data across multiple nodes or data centers, eventual consistency might be a pragmatic approach. Users might experience temporary inconsistencies between nodes, but the system would work towards converging to a consistent state over time.
  • Caching and Redirection:
     • Caching is often employed to improve performance and availability. If a node has cached data, it can still provide a redirect even if it temporarily loses contact with the main database. This approach enhances availability and partition tolerance.
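The availability argument can be made concrete with a small sketch of a node that keeps serving cached redirects while the database is unreachable. The `DatabaseUnavailable` exception, the `fetch_from_db` callable, and the 503 response are hypothetical names and choices used only for this example.

```python
class DatabaseUnavailable(Exception):
    """Raised by the (hypothetical) database client when it cannot be reached."""


def resolve(alias, cache, fetch_from_db):
    """Serve from the local cache when possible so redirects survive a database outage."""
    if alias in cache:
        return 302, cache[alias]  # possibly stale, but still available
    try:
        url = fetch_from_db(alias)
    except DatabaseUnavailable:
        return 503, None  # not cached and the database is down
    if url is None:
        return 404, None
    cache[alias] = url
    return 302, url


def db_up(alias):
    return {"abXdef21": "https://example.com/long"}.get(alias)


def db_down(alias):
    raise DatabaseUnavailable()


cache = {}
print(resolve("abXdef21", cache, db_up))    # populates the cache
print(resolve("abXdef21", cache, db_down))  # still redirects during the outage
print(resolve("other", cache, db_down))     # uncached alias cannot be served
```

The trade-off is the one described above: a cached entry may lag behind the database (for example, after a link expires), which is acceptable under eventual consistency.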



Failure scenarios/bottlenecks


  1. Database Outage:
     • Bottleneck: The redirection and URL shortening services depend heavily on the database. Failing to reach the database, or high latency in getting a result, leads to a bad user experience.
     • Mitigation: Cache the URLs that are in high demand so they can be served without the database, and use load balancing across database replicas to ride out partial outages.
  2. High Traffic Peaks:
     • Bottleneck: A sudden spike in traffic, for example when a short URL is linked from a popular webpage, could overwhelm the system.
     • Mitigation: Use a content delivery network (CDN) for caching, and distribute load across instances through load balancers.



Future improvements


  1. Real-time Analytics: Add an analytics service that provides real-time click data.
  2. Link Customization Options: Expand customization options, such as allowing users to set vanity paths.
  3. Integration with External Analytics Tools: Allow users to integrate their short URLs with external analytics tools like Google Analytics.
  4. Smart Alias Generation: Implement smarter algorithms for alias generation to improve user experience.