Designing a Simple URL Shortening Service: A TinyURL Approach - System Design

Requirements

Functional Requirements:

Create a web service for URL shortening. The service lets users create an account, authenticate via an external authority, create and save short URLs from long URLs, change permissions, and edit their user profile. External users, registered or not, should be able to follow the shortened URLs via HTTP redirects.

Detailed requirements

Authentication: via external authority (e.g. OIDC)

Authorization: an authenticated user can own short URLs, can be granted permission to follow short URLs, and has permission to create, edit, and delete their own short URLs.

Web Interfaces:

control interface for authenticated user
authentication flow to/from external authority
web redirect when resolving short URLs.

Transactions:

The service is not critical; it does not need data integrity constraints.

Compliance:

No requirements at all

Data retention:

A user account is considered inactive while the user does not authenticate. Six months of inactivity trigger a warning email, provided the email address can be retrieved from the authentication authority or is specified in the user profile. One further month of inactivity after the warning email triggers deletion of the user account. When an inactive account is deleted, all short URLs associated with it are removed.
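
The retention rule can be sketched as a small decision function. This is a minimal sketch, not the service's actual job: the `Account` fields, the function name, and the fixed day counts standing in for "6 months" and "1 month" are all assumptions. Accounts without a known email address are left untouched here, because the requirement does not say what happens to them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumption: "6 months" and "1 month" are approximated as fixed day counts.
WARN_AFTER = timedelta(days=182)
DELETE_AFTER_WARNING = timedelta(days=30)


@dataclass
class Account:  # hypothetical record layout
    last_login: datetime
    email: Optional[str] = None           # from the authority or the profile
    warned_at: Optional[datetime] = None  # when the warning email was sent


def retention_action(account: Account, now: datetime) -> str:
    """Return "none", "warn" or "delete" for one account."""
    # A warning only counts if the user has not authenticated since it was sent.
    warning_active = (
        account.warned_at is not None and account.warned_at >= account.last_login
    )
    if warning_active:
        if now - account.warned_at >= DELETE_AFTER_WARNING:
            return "delete"  # the caller also removes the account's short URLs
        return "none"
    if account.email and now - account.last_login >= WARN_AFTER:
        return "warn"
    return "none"
```

A periodic job could call `retention_action` for each account and act on the result; logging in after a warning makes that warning stale, so the six-month clock starts again.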

Data constraints:

We can use a UUID or any other unambiguous ID to identify a user account. A long URL is an arbitrary string in W3C standard URL encoding, no longer than 2048 characters. A short URL is unique system-wide.

Non-Functional Requirements:

Performance – speed and responsiveness

The main flow, the shortened URL redirect, should have latency below 100 ms; below 10 ms is recommended.

The expected normal load numbers are:

10k RPS create
100k RPS recall
100M unique daily users
1 year maximum data retention for inactive accounts
1 billion total users
100 URLs/user for the standard tier, expandable for higher tiers
a long URL is no longer than 2048 bytes
a URL can be shared with no more than 100 user accounts, or with everyone including unauthenticated users
expected user base growth is x10 over several years

Security – protection against unauthorized access

Account data is accessible only to its authenticated user. If a short URL has the "anonymous" permission flag set, it is accessible to anyone, authenticated or not. If the flag is cleared, the short URL is accessible only to the authenticated users it is explicitly shared with.
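
The access rule for following a short URL can be written down as a short check. The sketch below is illustrative only: `ShortUrl`, `share`, and `can_follow` are hypothetical names, and it assumes that the owner may always follow their own URL, which the requirements do not state explicitly. The limit of 100 accounts comes from the load numbers above.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

MAX_SHARED_ACCOUNTS = 100  # a URL can be shared with at most 100 accounts


@dataclass
class ShortUrl:  # hypothetical record layout
    owner_id: str
    long_url: str
    anonymous: bool = False  # the "anonymous" permission flag
    shared_with: Set[str] = field(default_factory=set)

    def share(self, user_id: str) -> None:
        """Grant one more account permission to follow this URL."""
        if user_id not in self.shared_with and len(self.shared_with) >= MAX_SHARED_ACCOUNTS:
            raise ValueError("a URL can be shared with at most 100 accounts")
        self.shared_with.add(user_id)


def can_follow(url: ShortUrl, user_id: Optional[str]) -> bool:
    """user_id is None for an unauthenticated visitor."""
    if url.anonymous:
        return True
    if user_id is None:
        return False
    # Assumption: the owner can always follow their own URL.
    return user_id == url.owner_id or user_id in url.shared_with
```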

Usability – ease of use

There are no hard usability requirements for creating or editing a shortened URL. Resolving a shortened URL to its long form, however, should be extremely usable and seamless in all possible scenarios.

Reliability – system stability and availability

The system should have basic protection from infrastructure outages, e.g. have the servers spread over several regions, use highly available storage, etc. Having DoS protection is highly advised.

Scalability – ability to handle growth

The normal load is expected to grow very slowly, x10 over 3 years, but the system should expect short bursts in request rate and handle them normally.

Maintainability – ease of updates and fixes

The system should support zero-downtime updates and fixes. It does not need to preserve existing sessions or maintain strict consistency during updates. It should not spuriously lose stored data on updates, but it can forget the temporary data of active sessions while a change is ongoing.

Portability – ability to run in different environments

The system does not need to be portable; it can be vendor-locked to an infrastructure provider. It must conform to WWW standards in terms of internationalization, portability, and error tolerance.

Availability

The system should tolerate limited infrastructure outages, such as an unavailable datacenter in one region, without stopping the service.

Desired uptime is 99.999%. Desired RPO is 1 day for a limited user set, 2 min for the entire userbase. Desired RTO is below 1 minute.
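
To put the uptime target into perspective, the allowed downtime can be computed directly. The helper below is a small illustrative calculation; the function name is made up for this sketch.

```python
def downtime_budget_seconds(uptime_percent: float, period_days: float = 365.0) -> float:
    """Seconds of allowed downtime in a period for a given uptime target."""
    return period_days * 24 * 3600 * (1 - uptime_percent / 100)


yearly = downtime_budget_seconds(99.999)       # about 315 seconds per year
monthly = downtime_budget_seconds(99.999, 30)  # about 26 seconds per 30 days
```

With an RTO just under one minute, the yearly budget of roughly 5.3 minutes covers about five full outages, so the uptime and RTO targets constrain each other.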

API Design

Expected APIs

Main flow

login
logout
redirect

Profile management

Create profile
Read profile properties
Update profile
Delete profile (remove user)

URL management

list my URLs
create new URL
read URL properties
update URL properties
delete URL

API doc

/login:
  post:
    description: redirect to external auth system, e.g. Keycloak

/logout:
  post:
    description: call external system to forget the session

/redirect/{shortUrl}:
  get:
    description: resolve short URL
    response:
      301:
        description: normal redirect response
      404:
        description: error - no such short URL found
        content: text/html
      403:
        description: access permission error
        content: text/html
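
The three documented outcomes of the redirect endpoint can be sketched as a pure function. This is a sketch under stated assumptions: the in-memory `UrlTable` stands in for the real storage, and `resolve` is a hypothetical name, not part of the API above.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical in-memory stand-in for the URL table:
# short code -> (long URL, anonymous flag, ids of accounts allowed to follow)
UrlTable = Dict[str, Tuple[str, bool, FrozenSet[str]]]


def resolve(
    table: UrlTable, short_url: str, user_id: Optional[str]
) -> Tuple[int, Dict[str, str]]:
    """Return (status code, response headers) for GET /redirect/{shortUrl}."""
    record = table.get(short_url)
    if record is None:
        return 404, {"Content-Type": "text/html"}
    long_url, anonymous, allowed = record
    # An unauthenticated visitor has user_id None, which is never in `allowed`.
    if not anonymous and user_id not in allowed:
        return 403, {"Content-Type": "text/html"}
    return 301, {"Location": long_url}
```

One design point worth weighing: a 301 response is cacheable by default, so a browser may keep following a cached redirect after the short URL is edited, deleted, or unshared. A temporary redirect (302 or 307) would avoid that at the cost of more requests reaching the service.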

/shorten/{longURL}:
  put:
    description: create short URL
    parameters:
      name: longURL
      in: path
      content: text/urlencoded
    response:
      200:
        description: successfully created
        content: text/urlencoded
      403:
        description: permission error
        content: text/html
      401:
        description: you have to log in first
        content: text/html
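
Because the long URL travels as a path parameter, it has to be percent-encoded so that it stays a single path segment. The sketch below shows one way to do this with Python's standard `urllib.parse`; the helper names are hypothetical.

```python
from urllib.parse import quote, unquote

MAX_LONG_URL_LENGTH = 2048  # limit from the data constraints


def shorten_path(long_url: str) -> str:
    """Client side: build the request path for PUT /shorten/{longURL}."""
    if len(long_url) > MAX_LONG_URL_LENGTH:
        raise ValueError("long URL exceeds 2048 characters")
    # safe="" also encodes "/" so the long URL stays one path segment.
    return "/shorten/" + quote(long_url, safe="")


def long_url_from_path(path: str) -> str:
    """Server side: recover the long URL from the request path."""
    return unquote(path[len("/shorten/"):])
```

For example, `shorten_path("https://example.com/a b?x=1&y=2")` yields `/shorten/https%3A%2F%2Fexample.com%2Fa%20b%3Fx%3D1%26y%3D2`. Some web servers and proxies decode or reject `%2F` inside paths, so carrying the long URL in the request body is a common alternative worth considering.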

/profile:
  put:
    content: text/json
    response:
      200:
        content: text/json
      403:
        content: text/html
      401:
        content: text/html
  get:
    response:
      200:
        content: text/json
      403:
        content: text/html
      401:
        content: text/html
  post:
    response:
      200:
        content: text/json
      403:
        content: text/html
      401:
        content: text/html
  delete:
    response:
      200:
        content: text/json
      403:
        content: text/html
      401:
        content: text/html

/url/{shortUrl}:
  put:
    response:
      200:
        content: text/json
      403:
        content: text/html
      401:
        content: text/html
  get:
    response:
      200:
        content: text/json
      403:
        content: text/html
      401:
        content: text/html
  post:
    response:
      200:
        content: text/json
      403:
        content: text/html
      401:
        content: text/html
  delete:
    response:
      200:
        content: text/json
      403:
        content: text/html
      401:
        content: text/html

/urls:
  get:
    response:
      200:
        content: text/json
      403:
        content: text/html
      401:
        content: text/html

TL;DR: we implement basic CRUDL for shortened URLs, basic CRUD for the user profile, login/logout via an external validator, and shorten/redirect functionality.

Since we want multi-region availability and a short request turnaround time, we use dynamic DNS load balancing that sends each request to the closest servers and falls back to other regions in case of a partial outage. Dynect or AWS Route 53 provide excellent services for this.

Then we use a simple HTTP load balancer in each region that distributes requests to stateless application servers. The application servers can either be hosted in VMs/containers or be implemented as serverless functions such as AWS Lambda. They are backed by cloud storage.

10k RPS of inserts can be served by any industrial RDBMS on commodity hardware. Cloud KV stores are generally more performant.

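The user profile table has to be strongly consistent. Use any cloud RDBMS with strong consistency, e.g. Google Cloud Spanner. (Google Cloud Bigtable is a wide-column NoSQL store rather than an RDBMS.)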

Short URLs are created by calculating a static hash of the long URL. The result is then checked against the table of existing short URLs. If there is a collision, the shortened URL is lengthened to resolve the ambiguity.
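
The hash-then-lengthen scheme could look roughly like the sketch below. The concrete choices are assumptions made for illustration, not part of the design: SHA-256 as the static hash, a base-62 alphabet, a minimum length of 7 characters (62**7 is about 3.5e12, above the 1e11 URLs implied by 1 billion users with 100 URLs each), and a plain dict standing in for the URL table.

```python
import hashlib
from typing import Dict

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
MIN_LENGTH = 7  # assumption: shortest code that covers the expected key space


def base62_digest(long_url: str) -> str:
    """Encode the SHA-256 digest of the long URL in base 62 (a static hash)."""
    number = int.from_bytes(hashlib.sha256(long_url.encode("utf-8")).digest(), "big")
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def shorten(long_url: str, table: Dict[str, str]) -> str:
    """Return a short code for long_url, lengthening the code on collision."""
    digest = base62_digest(long_url)
    for length in range(MIN_LENGTH, len(digest) + 1):
        code = digest[:length]
        existing = table.get(code)
        if existing is None:
            table[code] = long_url  # free slot: claim it
            return code
        if existing == long_url:
            return code  # the same long URL was shortened before
        # Otherwise a different long URL owns this prefix: take one more character.
    raise RuntimeError("full digest collision")
```

Two caveats apply to this sketch. The check-then-insert step is not atomic, so a real store would need a conditional write. Also, hashing only the long URL gives two users who shorten the same URL the same code, which may not fit per-owner permissions.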

The load-balancing layer implements a simple "stanamic" (static plus dynamic) routing strategy: each backend has static weights assigned relative to the user's region, plus a dynamic availability probe. The static weights help route requests to the nearest region, while the availability probe allows routing priorities to change depending on backend load. This enables emergency fallback in case of an outage and redistribution of load in case of traffic spikes.
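
One way to combine static weights with a dynamic probe is to scale each backend's regional weight by its remaining capacity and skip unhealthy backends. The sketch below assumes that formula and a hypothetical `Backend` record; the text does not prescribe either.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Backend:  # hypothetical description of one regional backend
    name: str
    static_weight: Dict[str, float]  # user region -> weight; higher means closer
    healthy: bool = True             # result of the availability probe
    load: float = 0.0                # probed utilisation, 0.0 (idle) to 1.0 (saturated)


def score(backend: Backend, user_region: str) -> float:
    """Static regional weight scaled down by the probed load."""
    if not backend.healthy:
        return 0.0
    weight = backend.static_weight.get(user_region, 0.0)
    return weight * (1.0 - min(backend.load, 1.0))


def choose_backend(backends: List[Backend], user_region: str) -> Optional[Backend]:
    """Pick the best-scoring backend, or None if nothing can serve the request."""
    best = max(backends, key=lambda b: score(b, user_region), default=None)
    if best is None or score(best, user_region) <= 0.0:
        return None
    return best
```

With an idle, healthy local backend the static weight dominates and requests stay in the nearest region. When the probe reports an outage or high load, the score drops and a more distant backend takes over.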

The data layer essentially stores two tables: user profiles and URLs. Both tables are replicated across regions. The URL table does not require strong consistency and can be eventually consistent. For the user profile table, the implementer is advised to use some kind of transaction when updating it, or at least a merge strategy for when inconsistent records are encountered.
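
As an illustration of a merge strategy for inconsistent profile records, the sketch below uses last-writer-wins. This is one common choice and not something the design mandates; the `ProfileRecord` fields, in particular `updated_at` and `region`, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileRecord:  # hypothetical replica copy of one user profile
    user_id: str
    display_name: str
    updated_at: float  # assumed field: write time in seconds since the epoch
    region: str        # assumed field: region that accepted the write


def merge(a: ProfileRecord, b: ProfileRecord) -> ProfileRecord:
    """Last-writer-wins: keep the newer record, break ties by region name."""
    if a.user_id != b.user_id:
        raise ValueError("cannot merge records of different users")
    return max(a, b, key=lambda r: (r.updated_at, r.region))
```

The deterministic tie-break makes the result independent of argument order, so every replica converges on the same record. Last-writer-wins relies on reasonably synchronized clocks and silently drops the older of two concurrent updates, which is why transactions remain the safer option for this table.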

We do not need a caching layer, since a 100k RPS load can be handled by direct reads from an RDBMS or KV store.
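
For context on what the storage has to hold and serve, a back-of-the-envelope estimate can be derived from the load numbers. The figures below are an upper bound that assumes every user stores 100 URLs of maximal length and ignores keys, indexes, and replication; the variable names are made up for this sketch.

```python
TOTAL_USERS = 1_000_000_000  # 1 billion total users
URLS_PER_USER = 100          # standard tier
MAX_LONG_URL_BYTES = 2048
READ_RPS = 100_000
CREATE_RPS = 10_000
SECONDS_PER_DAY = 86_400

total_urls = TOTAL_USERS * URLS_PER_USER            # 100_000_000_000 URLs
worst_case_bytes = total_urls * MAX_LONG_URL_BYTES  # 204_800_000_000_000 bytes
worst_case_tb = worst_case_bytes / 1e12             # about 204.8 TB
reads_per_day = READ_RPS * SECONDS_PER_DAY          # 8_640_000_000 redirects
creates_per_day = CREATE_RPS * SECONDS_PER_DAY      # 864_000_000 new URLs
```

A worst case of roughly 205 TB suggests that "direct reads" in practice means reads from a partitioned store rather than from a single database node, though typical long URLs are much shorter than 2048 bytes.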

Detailed Component Design

We need the load-balancing layer because, in a global usage scenario, the worst case brings 340 ms of latency from the TCP and HTTPS handshakes alone. Add the internal latency of our services and we hit unacceptably high numbers. Load balancing also improves availability and our ability to handle traffic spikes.

Our analysis did not yield any viable attack scenario based on guessing short URLs. Obfuscating them is therefore unnecessary.

