System requirements:
Functional requirements:
- URL Creation: Clearly state that the system should accept a long URL and generate a unique short URL, ensuring that the short URL is user-friendly and easy to share.
- URL Redirection: Specify that when a user accesses the short URL, the system should retrieve and redirect them to the corresponding long URL efficiently.
Non-functional requirements:
- How the system works:
- Input Validation: Define the specific criteria for validating long URLs -> regex ^(https?:\/\/)?([a-z0-9-]+\.)+[a-z]{2,6}(\/[^\s]*)?$
- Finds a mapping: through encoding or hash maps for examples
- Stores the mapping in the database
- When creating a new short URL
- check hot cache for the long URL
- checks database
- if is nowhere -> create it
- Requirements
- Response Time:
- Low latency -> under 100ms.
- Peak loads -> under 50ms
- High-availability: service is operational and accessible
- Uptime metric: 99.9%
- Security:
- Data protection: Secure storage
- Access control: authentication and authorization
- Rate Limiting: controls the number of requests a user can make in a given timeframe
- Scalability:
- Horizontal scaling: adding more ressources instead of augmenting the capacities of one ressource
- Database partitioning: hash partitioning
- Range Partitioning: ONLY If you anticipate that certain ranges of data will be accessed more frequently)
- Caching: hot caches
- Response Time:
Capacity Estimation
- Consistency vs Availability: Considerations here will help you balance trade-offs in your design, especially in a distributed system.
- Consistency > Availability
- Traffic and Storage Estimation: Estimating read/write ratios and storage needs will inform your database design and scaling strategies.
- read:write very high 10k:1 -> hot cache can be an option
- one link can have 10k user per day
- Scalability & Load Surges: Planning for horizontal scaling and geographic distribution will ensure your service can handle traffic spikes effectively.
- scale can be very high -> 100k URL mapping needed
- user growth expected to reach 10M as one user writing may mean 100-1k users reading and opening the short link
- Storage needed: growing need for storage -> 1 link -> For example, if each entry is approximately 200 bytes and you expect to store 1 million URLs, you would need around 200 MB of storage.
=> Help pick the database strategy
API Design
- Create Short URL Endpoint:
- HTTP Method: POST
- Endpoint:
/api/shorten - Request Body:
{"longUrl":"https://example.com/some/very/long/url"}- Response:
{"shortUrl":"https://short.ly/abc123","expiration":"2023-12-31T23:59:59Z"// Optional, if you support expiration}
- Redirect Endpoint:
- HTTP Method: GET
- Endpoint:
/api/r/{shortCode} - Path Parameter:
shortCode(e.g.,abc123) - Response: Redirects to the original long URL.
Considerations
- Validation:
- Ensure the long URL is valid before shortening it.
- potential security risks (e.g., phishing).
- Collision Handling: Decide how to handle cases where a generated short code already exists. Options include generating a new code or allowing users to specify a custom code.
- Analytics: Consider adding tracking capabilities to monitor how many times a short URL is accessed.
- Rate Limiting: Implement rate limiting to prevent abuse of the API, especially if it's publicly accessible.
- Error Handling: Clearly define error responses for scenarios like invalid URLs, rate limits exceeded, or internal server errors.
Example Error Response
For invalid URLs:
{ "error": "Invalid URL format"}
By following this structure, you can create a robust URL shortening API that is easy to use and understand. This design also allows for scalability and future enhancements, such as adding user accounts or custom short URLs.
High-Level Design
The system has two main flows:
1. URL creation flow:
- checks and validate the URL
- create a mapping
- stores the mapping in the DB
2. URL redirection flow:
- load balancer or API gateway in front of the application services
- -> to route traffic,
- -> apply rate limits, and
- -> support horizontal scaling.
- health checks with the load balancer
- Database
- Key value, ACID -> SQL database
- Writing:
- Authentication and authorization
- Generating identifiers: key-generation service that pre-makes unique codes and hands them out (no request-time collisions
- Reading
- cache: Redis for example, very fast and very relevant in this read heavy paradign
The read path is optimised for:
latency and cache hit rate
The write path is optimised for:
durability and validation
Database Design
Define the data model. Identify the main entities, their attributes, and relationships. Consider the choice of database type (SQL vs NoSQL) and justify your decision based on access patterns...
Database
- Type of database: SQL
- will use easy lookup for low latency
- Entity:
- key generation service ensures fast lookup and low latency
URL: { long_url: https://example.com/long/url/random,
short_url: short.com/abc123,
created_at: YYYYMMDD_HHMMSS,
created_by: user_id,
number_of_reads: X
}
user:
{ user_id: str,
username: str,
email : str
urls_created: list,
}
- Data normalisation: third normal form (3NF)
Database features:
- Read and write requirements: less than 300ms reading
- Caching: hot cache using Redix
- Indexing strategies: B-tree because we don't expect the hash keys to be of very long values -> more efficient. - look up a cached session by it's exact token_id
- Data partitioning: high scale -> data partitioning based on hash -> corresponds to the key generation
Detailed Component Design
Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.