Designing A Simple Url Shortening Service A TinyURL Approach - System Design

Requirements

Functional Requirements:

Create a short URL for a given long URL.
Return the long URL associated with a given short URL.

Non-Functional Requirements:

List the key non-functional requirements (eg low latency, scalability, reliability, etc.)...
Highly available - system must be redundant and have failover mechanisms in place
Low latency - data retrieval may be cached to minimize delays
Scalable - system should expand horizontally when demand increases via load balancing and distributed databases

API Design

Define the APIs expected from the system. This is your chance to analyze and define the read and write paths so that you can come up with the high-level design...

API endpoints needed for shortened url creation. Endpoint should take post request that contains the longer target url, which then kicks off shortening service that creates compacted url and responds back to client.

/create (POST)

creates the shortened url by taking in the long url as request body
Behind the scenes, the long url is used to generate a shortened unique url
Endpoint responds with the shortened unique url

/get/{unique_url}

When the shortened url is actually hit, it invokes a get request on this endpoint, which will request the long target url based on the unique_url, if one is found the endpoint responds 302 and redirects to the long target url

High-Level Design

Describe the overall system architecture. Identify the main components needed to solve the problem end-to-end. Use the diagramming tool to create a block diagram.

Assuming an AWS Shop:

On creation:

Front End -> Shortening API -> Database -> Shortening API -> Front End

Entry layer would be a web front end that allowed a user to submit a long target url and receive a shortened url in return.

Identifiers are generated by creating unique key, then checking existing db entries for any collisions with the identifier.

Storage layer here exists to keep relationship between target url and unique short url in tact.

Caching could be used to hold the target url client side so that it is not retrieved every time the shortened url is hit.

Detailed Component Design

Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.

Shortening API application

How it works - ECS fargate application built using python/fast api.

For new URL creation: Works by taking a client HTTP POST request from the front end that contains long target url along with auth headers, etc. After receiving the post request the fast api /create endpoint will invoke Shortening Service Lambda which creates the actual shortened url and creates the record in the database containing the user, long url, short url, unique id, and status, along with event timestamps. The invocation of the lambda will happen synchronously so it can confirm the lambda success/failure within the http codes. (Using InvocationType: 'RequestResponse'). This also allows us to return the shortened url in the lambda output itself contained by the http response. This shortened url can then be passed as the response to the POST request and served to the user on the front end. The api application should utilize cloudfront CDN to cache the target urls and serve redirection from there. This can be configured by including the appropriate headers when the API endpoint is hit originally to ensure the resulting url is cached for a defined period of time .

When the shortened URL is hit:

After the url has been shortened and is given back to the user, when they try to access that url, it should immediately hit the /get/{unique_url} endpoint which then serves a redirect to that target url that is retrieved based on the unique url.

How it scales - Since we are taking public traffic, we would have stood this ECS Fargate task behind an ALB, with that we can use ALB Request Count Per Target to drive our autoscaling policy in a horizontal direction

Tradeoffs - Scaling policies can be reactive vs proactive. Tradeoff is complexity. We could theoretically have the front end publish messages to a queue, and then scale based on queue depth, but for something as light weight as a url shortener, a standard HTTP loop should be sufficient.

Capacity - Capacity limits in this architecture would be bottlenecked by AWS technical limits (for the synchronous system we built here that is around 100,000 requests per second. If/When demand is better suited for asynchronous architecture a SQS queue should be implemented where capacity would be virtually unlimited for incoming requests/messages .

Algorithms- None needed here outside of autoscaling policy algo for checking queue depth and setting correct scaling triggers.

Data Structures - FE Request structure to the shortening API would look like

{

request-id: {request_id},

user: {user},

target-url: {target_url},

...

}

Invocation call structure from API to Shortening service lambda would look like this

{

request-id: {request_id},

user: {user},

target-url: {target_url},

...

}

Response from lambda to API should look like this

{

HTTPcode: 200,

lambda_output: {

output: {unique_url}

...

}

Database entry from lambda should look like this

requestId (index) - unique request id generated when FE makes initial API call

userId - User guid that submitted request

target_url - destination url

unique_url - shortened url (Global secondary index)

createTimestamp - timestamp when record is created

updateTimestamp - timestamp when record is updated

status - Overall status of request (Submitted, Completed, Failed)

TTL - Use TTL to hold shortened URLs for a specified period of time before it expires and the record is removed.

Shortening Service Lambda

How it works - lambda receives invocation request from api containing request-id, target url, etc. Using the target url, lambda will use pythons built in secrets module to generate a cryptographically secure random string and then inserts it into the db, in the small chance of collision, using conditional writes with dynamo will throw errors, which lambda can catch and retry accordingly. After url has been generated and stored, its passed back out of the lambda back to the API, and then the front end.

How it scales - This scales well, only one DB query is needed instead of going through and double checking if it exists. Dynamo db can be set up with on demand capacity which instantly scales up and down to match lambda behavior.

Tradeoffs - 1000 writes per second per dynamodb partition, could be throttled under huge demand spike. lambda cold starts will exist unless provisioned concurrency is implemented for lambdas, adding a fixed cost to the monthly bill. Could also make the slug/unique id 8 characters instead of 6 to avoid any collisions.

Capacity - lambda base concurrency is 1000 concurrent executions, or around 30k per second in our case assuming lambda run quickly. dynamo should use on demand capacity, which scales based on traffic.

Algorithms - none needed besides built in python secrets module

Data Structure -