System requirements:

Functional requirements:

URL Creation: Clearly state that the system should accept a long URL and generate a unique short URL, ensuring that the short URL is user-friendly and easy to share.
URL Redirection: Specify that when a user accesses the short URL, the system should retrieve and redirect them to the corresponding long URL efficiently.

Non-functional requirements:

How the system works:
- Input Validation: Define the specific criteria for validating long URLs -> regex ^(https?:\/\/)?([a-z0-9-]+\.)+[a-z]{2,6}(\/[^\s]*)?$
- Finds a mapping: through encoding or hash maps for examples
- Stores the mapping in the database
When creating a new short URL
- check hot cache for the long URL
- checks database
- if is nowhere -> create it
Requirements
- Response Time:
  - Low latency -> under 100ms.
  - Peak loads -> under 50ms
- High-availability: service is operational and accessible
  - Uptime metric: 99.9%
- Security:
  - Data protection: Secure storage
  - Access control: authentication and authorization
  - Rate Limiting: controls the number of requests a user can make in a given timeframe
- Scalability:
  - Horizontal scaling: adding more ressources instead of augmenting the capacities of one ressource
  - Database partitioning: hash partitioning
    - Range Partitioning: ONLY If you anticipate that certain ranges of data will be accessed more frequently)
  - Caching: hot caches

Capacity Estimation

Consistency vs Availability: Considerations here will help you balance trade-offs in your design, especially in a distributed system.
- Consistency > Availability
Traffic and Storage Estimation: Estimating read/write ratios and storage needs will inform your database design and scaling strategies.
- read:write very high 10k:1 -> hot cache can be an option
- one link can have 10k user per day
Scalability & Load Surges: Planning for horizontal scaling and geographic distribution will ensure your service can handle traffic spikes effectively.
1. scale can be very high -> 100k URL mapping needed
2. user growth expected to reach 10M as one user writing may mean 100-1k users reading and opening the short link
3. Storage needed: growing need for storage -> 1 link -> For example, if each entry is approximately 200 bytes and you expect to store 1 million URLs, you would need around 200 MB of storage.

=> Help pick the database strategy

API Design

Create Short URL Endpoint:
- HTTP Method: POST
- Endpoint: /api/shorten
- Request Body:
- { "longUrl": "https://example.com/some/very/long/url"}
- Response:
- { "shortUrl": "https://short.ly/abc123", "expiration": "2023-12-31T23:59:59Z" // Optional, if you support expiration}
Redirect Endpoint:
- HTTP Method: GET
- Endpoint: /api/r/{shortCode}
- Path Parameter: shortCode (e.g., abc123)
- Response: Redirects to the original long URL.

Considerations

Validation:
- Ensure the long URL is valid before shortening it.
- potential security risks (e.g., phishing).
Collision Handling: Decide how to handle cases where a generated short code already exists. Options include generating a new code or allowing users to specify a custom code.
Analytics: Consider adding tracking capabilities to monitor how many times a short URL is accessed.
Rate Limiting: Implement rate limiting to prevent abuse of the API, especially if it's publicly accessible.
Error Handling: Clearly define error responses for scenarios like invalid URLs, rate limits exceeded, or internal server errors.

Example Error Response

For invalid URLs:

{ "error": "Invalid URL format"}

By following this structure, you can create a robust URL shortening API that is easy to use and understand. This design also allows for scalability and future enhancements, such as adding user accounts or custom short URLs.

High-Level Design

The system has two main flows:

1. URL creation flow:

checks and validate the URL
create a mapping
stores the mapping in the DB

2. URL redirection flow:

load balancer or API gateway in front of the application services
- -> to route traffic,
- -> apply rate limits, and
- -> support horizontal scaling.
- health checks with the load balancer
Database
- Key value, ACID -> SQL database
Writing:
- Authentication and authorization
- Generating identifiers: key-generation service that pre-makes unique codes and hands them out (no request-time collisions
Reading
- cache: Redis for example, very fast and very relevant in this read heavy paradign

The read path is optimised for:

latency and cache hit rate

The write path is optimised for:

durability and validation

Database Design

Define the data model. Identify the main entities, their attributes, and relationships. Consider the choice of database type (SQL vs NoSQL) and justify your decision based on access patterns...

Database

Type of database: SQL
- will use easy lookup for low latency
Entity:
- key generation service ensures fast lookup and low latency

URL: { long_url: https://example.com/long/url/random,
short_url: short.com/abc123,
created_at: YYYYMMDD_HHMMSS,
created_by: user_id,
number_of_reads: X
}
user:
{ user_id: str,
username: str,
email : str
urls_created: list,
}

Data normalisation: third normal form (3NF)

Database features:

Read and write requirements: less than 300ms reading
Caching: hot cache using Redix
Indexing strategies: B-tree because we don't expect the hash keys to be of very long values -> more efficient. - look up a cached session by it's exact token_id
Data partitioning: high scale -> data partitioning based on hash -> corresponds to the key generation

Detailed Component Design

Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.