Codemia | Master System Design Interviews Through Active Practice

My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach with Score: 9/10

by serenade3523

System requirements

Functional Requirements:

Shorten URL:

Input: Accepts a long URL from the user.
Output: Generates a unique, short alias for the given URL.
Options:
Allow users to customize the alias (with availability check).
Set an expiration date for the shortened URL.
Protect the shortened URL with a password.

Expand URL:

Input: Accepts a short alias.
Output: Redirects the user to the original long URL associated with the alias.
Error Handling: Handle invalid or expired aliases gracefully.

Analytics (Optional):

Track Clicks: Record the number of times each shortened URL is clicked.
Referrer Tracking: Identify the source of the click (e.g., website, social media).
Location Tracking: Track the geographical location of the click (if permitted by user).

User Management (Optional):

User Accounts: Allow users to create accounts and manage their shortened URLs.
History: Maintain a history of shortened URLs for each user.

Non-Functional Requirements:

Performance:

Low Latency: Ensure fast response times for both shortening and expanding URLs.
High Throughput: Handle a large number of requests per second.
Scalability: The system should be able to scale horizontally to accommodate increased traffic.

Reliability:

High Availability: Minimize downtime and ensure the service is accessible even during component failures.
Fault Tolerance: The system should recover gracefully from errors and failures.
Data Durability: Ensure that data is not lost in case of hardware or software failures.

Security:

Input Validation: Validate all user input to prevent security vulnerabilities like injection attacks.
Rate Limiting: Implement rate limiting to protect against abuse and denial-of-service attacks.
Data Protection: Protect user data and ensure compliance with relevant data protection regulations.

Usability:

Simple Interface: Provide a user-friendly interface for shortening and managing URLs.
API Availability: Offer a well-documented API for programmatic access.
Customizable: Allow users to customize the appearance of shortened URLs (if applicable).

Maintainability:

Monitoring and Logging: Implement comprehensive monitoring and logging to identify and diagnose issues.
Testability: Ensure the system is easily testable to catch bugs and regressions.

Capacity estimation

Traffic Estimation:

New URL Shortening Requests:
Let's assume we expect 500 million new URL shortening requests per month.
Queries per second (QPS) = 500 million / (30 days * 24 hours * 3600 seconds) ≈ 200 URLs/s
Read/Write Ratio:
A typical read/write ratio for URL shorteners is around 100:1, meaning 100 redirections (reads) for every 1 new URL creation (write).
URL redirections per second ≈ 100 * 200 URLs/s = 20,000/s

Storage Estimation:

Data Retention: Let's assume we want to store URL mappings for 5 years.
Total URLs: Over 5 years, we'll store approximately 500 million * 12 months/year * 5 years = 30 billion URLs.
Data per URL:
Long URL (average 100 characters) = 100 bytes
Short alias (7 characters) = 7 bytes
Metadata (timestamps, user ID, etc.) = 50 bytes (estimated)
Total per URL ≈ 157 bytes
Total Storage: 30 billion URLs * 157 bytes/URL ≈ 4.71 TB
Storage Growth: We need to account for annual growth in data storage needs. Let's assume a 20% annual growth rate.

Memory Estimation (for Caching):

80/20 Rule: Assume 20% of URLs generate 80% of the traffic.
Cache Size: To cache the most frequently accessed URLs, we might need around 20% of 30 billion URLs * 157 bytes/URL ≈ 942 GB of memory.

Bandwidth Estimation:

Ingress (incoming requests): 20,200 requests/s * 150 bytes/request (estimated) ≈ 3 MB/s
Egress (outgoing responses): 20,000 redirections/s * 300 bytes/response (estimated) ≈ 6 MB/s

Key Considerations:

Peak Traffic: These are average estimates. We need to consider peak traffic, which could be significantly higher (e.g., during special events or promotions).
Growth: The system needs to be designed to scale to accommodate future growth in traffic and data storage.
Database Choice: The choice of database (e.g., Redis, DynamoDB, Cassandra) will impact storage and performance considerations.

Architectural Implications:

Sharding: We'll likely need to shard the database based on the short alias to distribute the load and ensure scalability.
Caching: An in-memory cache (e.g., Redis) is essential to handle the high volume of read requests and reduce the load on the database.
Load Balancing: We'll need to use load balancers to distribute traffic evenly across multiple application servers.
Content Delivery Network (CDN): Consider using a CDN to cache short URLs and reduce latency for users across different geographic locations.

This capacity estimation helps us understand the scale of the system we need to build and guides our architectural decisions. Remember, these are just estimates, and it's crucial to monitor the system's performance and adjust resources as needed.

API design

API Endpoints

Shorten URL

Endpoint: /api/shorten
Method: POST
Request Body:
long_url (required): The original long URL to shorten.
custom_alias (optional): A custom short alias provided by the user.
expires_at (optional): Expiration date for the shortened URL (Unix timestamp).
password (optional): Password for protected URLs.
Response:
Success: 201 Created
short_url: The generated or custom short URL.
Error: 400 Bad Request (e.g., invalid long URL, custom alias already taken)

JSON

// Request Example
{
    "long_url": "https://www.example.com/very/long/url/with/many/parameters",
    "custom_alias": "mylink", 
    "expires_at": 1692582400 
}
// Response Example
{
"short_url": "http://short.ly/mylink"
}

Use code with caution.

content_copy

Expand URL

Endpoint: /api/{short_alias}
Method: GET
Response:
Success: 301 Moved Permanently (redirect to long_url)
Error: 404 Not Found (if the short alias is invalid or expired)
Error: 401 Unauthorized (if the short URL is password-protected)

URL Stats (Optional)

Endpoint: /api/stats/{short_alias}
Method: GET
Headers:
Authorization: Bearer <API_KEY> (if authentication is required)
Response:
Success: 200 OK
click_count: Total number of clicks.
top_referrers: List of top referring sources.
recent_clicks: Timestamps of recent clicks.
Error: 404 Not Found (if the short alias is invalid)

Additional API Endpoints (If User Management is Implemented):

Create Account (POST): /api/users
Login (POST): /api/users/login
User Profile (GET/PUT): /api/users/me
User History (GET): /api/users/me/history
Delete URL (DELETE): /api/urls/{short_alias}

API Considerations:

Versioning: Use versioning (e.g., /api/v1/...) to ensure backward compatibility as the API evolves.
Authentication: Consider using API keys or OAuth2 for authentication if you implement user management.
Rate Limiting: Apply rate limiting to prevent abuse.
Documentation: Provide clear and comprehensive API documentation to facilitate integration for clients.

Database design

Entities and Attributes:

URL:
id (Primary Key): Unique identifier for each URL (could be auto-incremented or a UUID).
long_url (Text): The original, long URL.
short_alias (String, Unique): The generated or custom short alias.
created_at (Timestamp): The date and time the short URL was created.
expires_at (Timestamp, Nullable): Expiration date of the short URL (optional).
password (String, Nullable): Password for protected URLs (optional).
user_id (Foreign Key): References the user who created the URL (optional, if user accounts are supported).
User (Optional):
id (Primary Key): Unique identifier for each user.
username (String, Unique): User's chosen username.
email (String, Unique): User's email address.
password_hash (String): Hashed password for authentication.
Click (Optional):
id (Primary Key): Unique identifier for each click.
short_alias (Foreign Key): References the short URL that was clicked.
timestamp (Timestamp): Date and time of the click.
referrer (String, Nullable): The referring website or source.
user_agent (String, Nullable): The user's browser or device information.
ip_address (String, Nullable): The IP address of the user (for location tracking).

Relationships:

URL - User (1 to many): A user can create multiple URLs, but each URL is associated with at most one user.
URL - Click (1 to many): A short URL can be clicked multiple times, but each click is associated with one URL.

Entity-Relationship Diagram (ERD):

Code snippet

Use code with caution.

content_copy

Additional Considerations:

Indexes: Create indexes on short_alias for fast lookups during redirection.
Data Types: Choose appropriate data types for each attribute to optimize storage and query performance.
Normalization: Consider further normalization if needed (e.g., breaking down the user_agent into separate fields for browser, operating system, etc.).

High-level design

System Architecture

Code snippet

Use code with caution.

content_copy

Component Breakdown

Client (User):

The end-user interacts with the service through a web interface or API.
Sends requests to shorten or expand URLs.

API Gateway:

Acts as a single entry point for all incoming requests.
Handles routing, authentication, rate limiting, and other cross-cutting concerns.
Forwards requests to the appropriate backend services.

Shortener Service:

Responsible for generating short aliases and storing the mappings in the key-value store.
Implements the shortening algorithm (hashing, encoding, custom alias handling).
Interacts with the cache to store and retrieve frequently used mappings.

Key-Value Store:

Persistent storage for URL mappings (short alias to long URL).
Choose a database like Redis, DynamoDB, or Cassandra based on scalability and performance requirements.

Cache (Redis):

In-memory cache for faster lookups of frequently accessed URLs.
Reduces the load on the key-value store.

Redirection Service:

Handles expansion requests by looking up the original URL based on the short alias.
Redirects the user to the original URL using a 301 HTTP status code.
Interacts with the cache for faster lookups.

Analytics Service (Optional):

Collects and analyzes usage data (clicks, referrers, etc.).
Stores analytics data in a separate database or data store.

Workflow:

Shorten URL: The client sends a POST request to the API Gateway with the long URL. The request is routed to the Shortener Service, which generates a short alias, stores the mapping, and returns it to the client.
Expand URL: The client accesses the short URL. The request goes to the API Gateway and is routed to the Redirection Service. The Redirection Service looks up the original URL in the cache or key-value store and redirects the client.
Analytics (Optional): The Redirection Service records click data and sends it to the Analytics Service for processing and storage.

Key Design Considerations:

Scalability: The system should be horizontally scalable by adding more instances of the API Gateway, Shortener Service, and Redirection Service.
Fault Tolerance: Implement redundancy and failover mechanisms to ensure high availability.
Caching Strategy: Choose an appropriate caching strategy to balance performance and data freshness.
Security: Implement measures to prevent abuse and protect user data.

Request flows

Sequence Diagram:

Code snippet

Use code with caution.

content_copy

Explanation:

Shorten URL:

The user sends a POST request to /api/shorten with the long URL in the request body.
The API Gateway receives the request, validates it, and forwards it to the Shortener Service.
The Shortener Service first checks the cache for an existing mapping. If not found (cache miss), it queries the key-value store.
If no mapping exists, the Shortener Service generates a new short alias using the shortening algorithm.
The mapping (short alias - long URL) is stored in both the key-value store and the cache.
The Shortener Service returns the short URL to the API Gateway, which then sends it back to the user as a 201 Created response.

Expand URL:

The user clicks on the shortened URL in their browser.
The browser sends a GET request to the short URL.
The API Gateway receives the request and extracts the short alias.
The request is routed to the Redirection Service.
The Redirection Service checks the cache for the long URL associated with the alias. If not found (cache miss), it queries the key-value store.
The Redirection Service logs the click in the Analytics Service (optional).
The Redirection Service returns a 301 Redirect response to the API Gateway, which then sends it back to the user's browser.
The browser follows the redirect and loads the original long URL.

Key Points:

The API Gateway acts as the central entry point and handles routing, validation, and rate limiting.
The Shortener Service is responsible for generating and storing short aliases.
The Redirection Service handles the redirection logic and optional analytics.
The Cache (Redis) is used to improve response times for frequent requests.
The Key-Value Store (e.g., Redis, DynamoDB, Cassandra) provides persistent storage for the URL mappings.

Detailed component design

1. Shortener Service

Core Functionality:
Short Alias Generation:
Hashing: Receives a long URL and applies a hashing algorithm (e.g., MD5, SHA-256) to generate a fixed-length hash.
Base62 Encoding: Converts the hash into a more compact and user-friendly base62 string (a-z, A-Z, 0-9). This significantly reduces the alias length.
Collision Handling: If the generated alias already exists, use a collision resolution mechanism:
Appending a Counter: Append an increasing number to the alias (e.g., abc1, abc2, etc.).
Regenerating: Generate a new random alias until a unique one is found.
Custom Alias Handling: If the user provides a custom alias, the service checks its availability in the key-value store and uses it if it's not taken.
Storage Interaction: The service stores the mapping between the short alias and the long URL in the key-value store. It also updates the cache with the new mapping.
Scalability:
Stateless Design: The Shortener Service is designed to be stateless, meaning it doesn't maintain any session information. This allows for easy horizontal scaling by adding more instances behind a load balancer.
Hashing Function Choice: A good hashing function with low collision probability is crucial for scalability. Consider using more secure hash functions like SHA-256 if security is a concern.
Data Structures and Algorithms:
Hash Table: An in-memory hash table can be used to store a small cache of recently generated aliases to quickly check for duplicates.
Bloom Filter (Optional): A probabilistic data structure like a Bloom filter can be used to quickly check the existence of an alias in the key-value store with a small probability of false positives.

2. Cache (Redis)

Core Functionality:
Storing URL Mappings: The cache stores the most frequently accessed URL mappings (short alias to long URL) in memory.
Retrieving URL Mappings: When a redirection request arrives, the cache is checked first for the corresponding long URL. If found, it's returned immediately without needing to access the slower key-value store.
Cache Eviction: The cache uses an eviction policy (e.g., Least Recently Used - LRU) to remove the least recently used items when the cache is full.
Scalability:
Redis Cluster: Redis can be deployed as a cluster for horizontal scalability. Data can be sharded across multiple nodes to distribute the load.
Redis Replication: Redis supports replication, allowing for data redundancy and improved read performance.
Data Structures:
Hash: The primary data structure used in Redis for storing the URL mappings. Keys are the short aliases, and values are the corresponding long URLs.

Diagram:

Code snippet

Trade offs/Tech choices

Hashing Algorithm:

Trade-off:
MD5/SHA-1: Faster but less secure, potential for collisions (though very low probability).
SHA-256/SHA-3: More secure, lower collision risk, but slightly slower.
Choice: We'll start with SHA-256 for better security and collision resistance. If performance becomes a bottleneck, we can consider switching to a faster algorithm like MD5 and implement a more robust collision resolution mechanism.

2. Database:

Trade-off:
Redis: Excellent performance, in-memory caching, but limited persistence and scalability compared to others.
DynamoDB: Highly scalable, fully managed, but higher cost and less flexible data modeling.
Cassandra: Highly scalable, fault-tolerant, but more complex to set up and manage.
Choice: We'll start with Redis for its speed and simplicity, especially for caching frequently accessed URLs. We can later integrate DynamoDB or Cassandra for storing the full URL mappings if scaling becomes necessary.

3. Caching Strategy:

Trade-off:
LRU (Least Recently Used): Simple and effective for many workloads, but can evict frequently used items if the working set is larger than the cache.
LFU (Least Frequently Used): Good for workloads with a skewed access pattern, but more complex to implement and can be less responsive to changing access patterns.
ARC (Adaptive Replacement Cache): Combines LRU and LFU for better adaptability, but more complex.
Choice: We'll begin with LRU due to its simplicity and good performance in most scenarios. If we observe that LRU is not optimal, we can switch to LFU or ARC.

4. Custom Alias Generation:

Trade-off:
Allowing custom aliases: Improves user experience but increases complexity in handling conflicts and ensuring uniqueness.
Disallowing custom aliases: Simpler to implement, but limits user choice.
Choice: We'll allow custom aliases but implement robust validation and conflict resolution to ensure a smooth user experience.

5. Analytics Collection:

Trade-off:
Detailed Analytics: Provides valuable insights but requires additional storage and processing.
Minimal Analytics: Reduces overhead but provides less information.
Choice: We'll start with basic click tracking and gradually add more sophisticated analytics features based on user demand and resource constraints.

6. Microservices vs. Monolith:

Trade-off:
Microservices: Offers better scalability, flexibility, and fault isolation, but introduces complexity in managing and deploying multiple services.
Monolith: Simpler to develop and deploy initially, but can become harder to maintain and scale as the system grows.
Choice: We'll adopt a hybrid approach, starting with a monolithic architecture for faster development and transitioning to a microservices architecture as the system grows and specific components require independent scaling.

These are just a few examples of the trade-offs and technology choices we need to consider. As the system evolves, we'll continue to evaluate and adjust our decisions based on changing requirements, performance metrics, and user feedback.

Failure scenarios/bottlenecks

1. Database Failures:

Scenario: The key-value store (e.g., Redis, DynamoDB, Cassandra) becomes unavailable due to hardware failure, network issues, or overload.
Impact: Shortening new URLs and expanding existing ones will fail, causing service disruption.
Mitigation:
Replication and Failover: Use database replication and automatic failover mechanisms to ensure high availability.
Caching: Rely on the in-memory cache (Redis) to serve frequently accessed URLs even if the database is down.
Read Replicas (if applicable): Utilize read replicas to handle read requests during a database outage.

2. Cache Failures:

Scenario: The Redis cache becomes unavailable or overloaded.
Impact: Increased latency for URL lookups, potentially leading to slower response times and increased load on the database.
Mitigation:
Cache Eviction Policies: Implement effective cache eviction policies (e.g., LRU) to ensure that the most relevant URLs remain in the cache.
Cache Replication: Replicate the cache across multiple nodes to improve availability and fault tolerance.
Fallback to Database: If the cache is down, fall back to querying the database directly, but be mindful of potential performance degradation.

3. API Gateway Overload:

Scenario: A sudden surge in traffic overwhelms the API Gateway, causing slowdowns or even crashes.
Impact: Users experience slow response times or are unable to access the service.
Mitigation:
Rate Limiting: Implement rate limiting to control the number of requests per second per user or IP address.
Load Balancing: Distribute incoming traffic across multiple API Gateway instances to handle high loads.
Auto-Scaling: Automatically scale up API Gateway instances during traffic spikes.

4. Shortener Service Bottleneck:

Scenario: The Shortener Service becomes overloaded due to heavy traffic or computationally expensive tasks like generating short aliases or handling custom alias conflicts.
Impact: Slower response times for shortening URLs.
Mitigation:
Caching: Cache recently generated aliases to avoid redundant computations.
Queueing: Use a message queue (e.g., RabbitMQ, Kafka) to decouple the Shortener Service from the API Gateway, allowing it to process requests asynchronously.
Horizontal Scaling: Add more instances of the Shortener Service to handle increased load.
Optimization: Optimize the shortening algorithm and custom alias handling logic for better performance.

5. Malicious Attacks:

Scenario: Distributed Denial of Service (DDoS) attacks or attempts to exploit vulnerabilities in the system (e.g., SQL injection, XSS).
Impact: Service disruption, data breaches, or compromised user accounts.
Mitigation:
Security Hardening: Regularly update software, implement strong authentication and authorization, and apply security best practices.
Web Application Firewall (WAF): Use a WAF to protect against common web attacks.
DDoS Protection: Employ DDoS mitigation services to detect and block malicious traffic.

6. Data Corruption:

Scenario: Data corruption in the key-value store or cache due to software bugs, hardware failures, or human error.
Impact: Incorrect redirections, invalid or missing URLs, potential data loss.
Mitigation:
Backups: Regularly back up the database to a separate location.
Data Integrity Checks: Implement checksums or other data integrity checks to detect and repair corrupted data.
Monitoring: Continuously monitor the system for errors and anomalies to detect data corruption early.

By proactively identifying and addressing these potential failure scenarios and bottlenecks, we can build a more robust and resilient URL shortening service that can handle high traffic loads, resist attacks, and ensure a smooth user experience.

Future improvements

Future Improvements:

Advanced Analytics and Reporting:

Expand analytics beyond simple click tracking.
Provide detailed reports on user behavior (location, device, referrer) to give customers insights for their marketing or tracking efforts.
Implement A/B testing capabilities for users to compare the performance of different short URLs.
Offer customizable dashboards and reporting options.

Integration with Other Services:

Integrate with popular social media platforms to streamline sharing and tracking of shortened URLs.
Allow linking short URLs to QR codes for offline campaigns.
Enable integration with third-party tools like analytics platforms and marketing automation software.

Advanced Features:

Link Retargeting: Allow users to dynamically change the destination of a short URL based on specific criteria (e.g., location, device type).
Branded Short Domains: Enable custom branded domains for short URLs (e.g., yourbrand.com/xyz).
Link Rotation: Rotate multiple destination URLs behind a single short link for testing or load balancing purposes.
API Rate Limiting and Throttling: Implement more fine-grained API rate limiting and throttling mechanisms to prevent abuse and ensure fair usage.

Mitigating Failure Scenarios and Bottlenecks:

Database Failures:
Multi-Region Replication: Replicate data across multiple geographic regions to ensure availability even if an entire region fails.
Backup and Restore: Implement automated backups and a well-defined disaster recovery plan.
Database Monitoring: Proactively monitor database health and performance metrics to detect potential issues before they escalate.
Cache Failures:
Redis Cluster: Use Redis Cluster mode for better scalability, fault tolerance, and high availability.
Circuit Breaker Pattern: Implement a circuit breaker to prevent cascading failures by isolating the cache from other components if it becomes unresponsive.
API Gateway Overload:
Caching at the Edge: Utilize a content delivery network (CDN) to cache responses at the edge, reducing the load on the API Gateway.
Microservices Architecture: Break down the monolith into smaller, independent microservices to isolate failures and scale individual components independently.
Shortener Service Bottleneck:
Asynchronous Processing: Use message queues like RabbitMQ or Kafka to handle shortening requests asynchronously, improving throughput and responsiveness.
Rate Limiting: Apply rate limiting to the shortening endpoint to prevent overloading the service.
Optimization: Continuously profile and optimize the shortening algorithm and custom alias handling logic for better performance.
Malicious Attacks:
Intrusion Detection System (IDS): Deploy an IDS to monitor and detect suspicious activity.
Web Application Firewall (WAF): Utilize a WAF to filter out malicious traffic and protect against common web attacks.
Security Audits: Conduct regular security audits and penetration testing to identify and address vulnerabilities.
Data Corruption:
Data Validation: Implement strict data validation at the API Gateway and other entry points to prevent invalid or corrupted data from entering the system.
Error Handling: Implement robust error handling and logging to detect and recover from data corruption issues.
Periodic Data Integrity Checks: Schedule periodic data integrity checks to identify and repair corrupted data proactively.

By incorporating these improvements and mitigation strategies, we can build a more resilient, scalable, and feature-rich URL shortening service that meets the growing demands of our users and provides valuable insights through advanced analytics.

pen_spark