Requirements
Functional Requirements:
- Create a short URL for a given long URL.
- Return the long URL associated with a given short URL.
- A guest user can create 5 shortened links
- A logged-in user can create 500 links
- A premium user can create a virtually unlimited number of links.
- Rate limiting applies as follows:
- 1 request per minute for guest users
- 5 requests per minute for logged-in users (including premium)
- Guest users are identified by IP address plus a cookie placed in their browser
- Assume VPNs are not allowed.
- Redirecting from a shortened URL should add no more than 5 milliseconds of latency. I will return an HTTP 302 status code for the redirect.
- Should serve 100 million monthly active users
- Use load balancer for evenly distributing requests to the database r
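A minimal sketch of how the link quotas and per-minute rate limits above could be enforced. It assumes the rate limit applies to link-creation requests, uses a simple fixed 60-second window held in memory, and takes an opaque string key (for a guest, something derived from IP address and cookie). The tier names, the `Limiter` class, and the key format are illustrative, not part of the requirements.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

# Limits taken from the requirements above; the tier names are illustrative.
RATE_PER_MINUTE = {"guest": 1, "logged_in": 5, "premium": 5}
LINK_QUOTA = {"guest": 5, "logged_in": 500, "premium": None}  # None = unlimited
WINDOW_SECONDS = 60


@dataclass
class Limiter:
    """Fixed-window rate limiter plus a total link quota, held in memory."""
    windows: dict = field(default_factory=dict)        # key -> (window_start, count)
    links_created: dict = field(default_factory=dict)  # key -> links created so far

    def allow(self, key: str, tier: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        quota = LINK_QUOTA[tier]
        if quota is not None and self.links_created.get(key, 0) >= quota:
            return False  # link quota exhausted
        start, count = self.windows.get(key, (now, 0))
        if now - start >= WINDOW_SECONDS:
            start, count = now, 0  # the previous window has ended
        if count >= RATE_PER_MINUTE[tier]:
            return False  # too many requests in the current window
        self.windows[key] = (start, count + 1)
        self.links_created[key] = self.links_created.get(key, 0) + 1
        return True
```

A fixed window is the simplest option; a sliding window or token bucket would smooth out bursts at window boundaries, and in a multi-server deployment the counters would need to live in a shared store rather than process memory.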
Non-Functional Requirements:
- Availability of > 99.9%
- Low latency for redirects when a shortened URL is used
Capacity Estimation
Estimate the scale of the system. Consider daily active users, read/write ratio, storage requirements, bandwidth, and any relevant QPS calculations...
I will assume 100 million users per month and a required availability of 99.9% or higher. Assuming that
API Design
Define the APIs expected from the system. This is your chance to analyze and define the read and write paths so that you can come up with the high-level design...
So I need two APIs: shorten/ and getOriginal/. Shorten takes a URL as its payload and returns the shortened URL. The mapping from original URL to shortened URL can be stored in a database. I would go with a simple relational database, Postgres: a table with an auto-incrementing bigint id, the original URL as varchar, the shortened URL as varchar, a creation timestamp, and a duration giving how many seconds the shortened URL is valid. This table is the primary source of truth for the URL shortener.
A /shorten call hits a web server, which generates the shortened URL, inserts it into the table, and returns the shortened URL.
/getOriginal/{shortenedURL} makes a call to the DB and queries the table above, like SELECT * FROM table WHERE shortened_url = input, then processes the result and sends back the response.
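A rough sketch of the two calls, with an in-memory dictionary standing in for the database table. The text above does not say how the short code is generated; this sketch assumes base62 encoding of an auto-incrementing id, which is one common choice. The `UrlStore` class, the `redirect` helper, and the `https://sho.rt/` domain are hypothetical.

```python
import string

ALPHABET = string.digits + string.ascii_letters  # 62 characters: 0-9, a-z, A-Z
BASE_URL = "https://sho.rt/"  # hypothetical short domain


def encode_base62(n: int) -> str:
    """Encode a non-negative integer id as a base62 string."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


class UrlStore:
    """In-memory stand-in for the database table."""

    def __init__(self):
        self._next_id = 1   # mimics an auto-incrementing id column
        self._by_code = {}  # short code -> original URL

    def shorten(self, original_url: str) -> str:
        code = encode_base62(self._next_id)
        self._next_id += 1
        self._by_code[code] = original_url
        return BASE_URL + code

    def get_original(self, short_url: str):
        code = short_url.rsplit("/", 1)[-1]
        return self._by_code.get(code)  # None when the code is unknown


def redirect(store: UrlStore, short_url: str):
    """Return (status, headers) for a redirect request."""
    target = store.get_original(short_url)
    if target is None:
        return 404, {}
    return 302, {"Location": target}
```

Sequential ids make short codes guessable; if that matters, a random or hashed code with a collision check is an alternative.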
We can keep an in-memory database on the hot path to serve as a cache for recent requests. Redis could be used, with a max memory limit of 4GB and a least-recently-used eviction policy.
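To illustrate the caching idea, here is a small cache-aside sketch with least-recently-used eviction. It bounds the cache by entry count rather than bytes and runs in-process. In Redis itself the equivalent settings would be `maxmemory 4gb` with `maxmemory-policy allkeys-lru`. The `LRUCache` class, `lookup` function, and `load_from_db` callback are illustrative names.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache bounded by entry count."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


def lookup(code, cache, load_from_db):
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    url = cache.get(code)
    if url is None:
        url = load_from_db(code)
        if url is not None:
            cache.put(code, url)  # populate the cache for later reads
    return url
```

Since redirects are read-heavy and a mapping rarely changes after creation, cache-aside keeps the database as the source of truth while most lookups for popular links are served from memory.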
High-Level Design
Describe the overall system architecture. Identify the main components needed to solve the problem end-to-end. Use the diagramming tool to create a block diagram.
Database Design
Define the data model. Identify the main entities, their attributes, and relationships. Consider the choice of database type (SQL vs NoSQL) and justify your decision based on access patterns...
I prefer a relational database with log-structured storage, because sequential appends make writes much faster. A single table would be enough.
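create table url_map (
    id uuid primary key,
    original_url varchar(500) not null,
    shortened_url varchar(500) unique not null,
    created_at timestamp,
    valid_till interval default interval '2 hours' -- how long the short URL stays valid
);
-- id is already indexed as the primary key, so index original_url on its own
create index original_url_index on url_map(original_url);
-- the unique constraint on shortened_url already creates an index; this explicit one is optional
create index shortened_url_index on url_map(shortened_url);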
We can normalize to 3NF later.
We can have a RAID 5 configuration (striping with distributed parity) for better throughput and reliability for my database; striping combined with mirroring would be RAID 10 instead. Application server calls insert into url_map (id, original_url, shortened_url, created_at, valid_till)
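The insert and lookup against url_map can be sketched as follows, with SQLite standing in for Postgres so the example runs without a server. This is an approximation under stated assumptions: the UUID is stored as text, created_at as an ISO-8601 UTC string, and valid_till as a number of seconds, and an expired link is treated as not found. The function names are hypothetical.

```python
import sqlite3
import uuid
from datetime import datetime, timedelta, timezone


def open_db():
    """Create an in-memory SQLite database with a url_map table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        create table url_map (
            id text primary key,                  -- UUID stored as text
            original_url text not null,
            shortened_url text unique not null,
            created_at text not null,             -- ISO-8601 UTC timestamp
            valid_till integer not null           -- validity window in seconds
        )""")
    return conn


def insert_mapping(conn, original_url, shortened_url, now, valid_till=2 * 60 * 60):
    """Insert one mapping; raises sqlite3.IntegrityError on a duplicate short URL."""
    conn.execute(
        "insert into url_map (id, original_url, shortened_url, created_at, valid_till) "
        "values (?, ?, ?, ?, ?)",
        (str(uuid.uuid4()), original_url, shortened_url, now.isoformat(), valid_till),
    )
    conn.commit()


def resolve(conn, shortened_url, now):
    """Return the original URL, or None if the link is unknown or expired."""
    row = conn.execute(
        "select original_url, created_at, valid_till from url_map where shortened_url = ?",
        (shortened_url,),
    ).fetchone()
    if row is None:
        return None
    original_url, created_at, valid_till = row
    expires_at = datetime.fromisoformat(created_at) + timedelta(seconds=valid_till)
    return original_url if now < expires_at else None
```

Passing `now` explicitly keeps the expiry check easy to test; in the real service it would come from the server clock, for example `datetime.now(timezone.utc)`.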
But I feel like we are missing an important part, users can generate
Detailed Component Design
Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.