My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach with Score: 8/10

by quasar2974

System requirements


Functional:

List functional requirements for the system (Ask the chat bot for hints if stuck.)...

redirect users to full site when provided a short url

short url can be reused

short urls remain active for a long time

users can provide vanity urls

short urls can be reclaimed

short urls are kept short so they are easy to remember




Non-Functional:

List non-functional requirements for the system...

network latency is kept to a minimum (geolocation-based routing)

optimized for read-heavy operation (servers read from in-region read replicas)

scales for millions of users (auto-scaling)

service is optimized for quick responses (cache/ CDN)

can automatically recover in the case of an outage (multi-region and multi-AZ deployment of servers and DB)

backups are easily restored in the event of an outage (restore DB from backup)

urls are safely stored to prevent unauthorized access (security groups attached to DB)




Capacity estimation

Estimate the scale of the system you are going to design...



alphanumeric = 26 + 10 = 36 chars

url has a base domain and a hash

hash is say 6 chars

can support 36^6 ≈ 2.2 billion urls

suppose full encoded url is at most 1 KB

1 KB * 2.2 billion ≈ 2.2 TB
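A quick sanity check of the numbers above (assuming a case-insensitive base-36 alphabet and the stated 1 KB per stored record):

```python
# Back-of-the-envelope capacity check for a 6-character base-36 short code.
ALPHABET_SIZE = 26 + 10   # lowercase letters + digits
CODE_LENGTH = 6

total_codes = ALPHABET_SIZE ** CODE_LENGTH
print(total_codes)        # 2176782336, i.e. ~2.2 billion possible codes

BYTES_PER_RECORD = 1024   # assumption from the estimate: each record is at most 1 KB
total_storage_tb = total_codes * BYTES_PER_RECORD / 1e12
print(round(total_storage_tb, 1))   # ~2.2 TB of storage at full capacity
```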


API design

Define what APIs are expected from the system...

Redirect API - stores and reads url redirect mappings against the DB with transactional (ACID) guarantees
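One way to sketch the two operations this API implies — a write that shortens a URL and a read that resolves it — is with in-memory stand-ins (the function names and the dict store are illustrative assumptions, not a committed API surface):

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits  # case-insensitive base-36
store = {}                                         # stand-in DB: short_code -> full_url

def shorten(full_url):
    """POST-style operation: generate a 6-char code and persist the mapping."""
    while True:
        code = "".join(secrets.choice(ALPHABET) for _ in range(6))
        if code not in store:          # retry on the (rare) collision
            store[code] = full_url
            return code

def redirect(code):
    """GET-style operation: look up the target; the server answers 301/302 or 404."""
    return store.get(code)
```

`shorten("https://example.com/very/long/path")` returns a 6-character code, and `redirect` on that code returns the original URL (or `None` for unknown codes, mapping to a 404).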




Database design

Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...


short_url, full_url, number_visits, geolocation, last_accessed_time, created_time, updated_time

short_url and full_url form a composite primary key
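The single-table model above can be sketched in SQL (shown here via SQLite purely for illustration; the production store could be any relational or key-value database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE url_mapping (
        short_url          TEXT NOT NULL,
        full_url           TEXT NOT NULL,
        number_visits      INTEGER DEFAULT 0,
        geolocation        TEXT,
        last_accessed_time TEXT,
        created_time       TEXT DEFAULT CURRENT_TIMESTAMP,
        updated_time       TEXT,
        PRIMARY KEY (short_url, full_url)   -- composite key from the design
    )
""")
conn.execute(
    "INSERT INTO url_mapping (short_url, full_url) VALUES (?, ?)",
    ("abc123", "https://example.com/long/path"),
)
row = conn.execute(
    "SELECT full_url FROM url_mapping WHERE short_url = ?", ("abc123",)
).fetchone()
print(row[0])  # https://example.com/long/path
```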




High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design. If you are unfamiliar with the tool, you can simply describe your design to the chat bot and ask it to generate a starter diagram for you to modify...


DNS resolver

  • intercepts requests for the short url domain

In-memory cache

Load Balancer

DB API

  • hosted on fleet of servers managed by cloud provider
  • serverless works because the operations are stateless


Database





Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...


Read:

Client provides request with short url domain

DNS entry routes the request to the load balancer

A server receives the request from the load balancer

Server checks cache for entry

If in cache, server returns the redirect url to the client

If not in cache, server makes API call to DB to fetch the entry and update the cache
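The cache check above is the classic cache-aside pattern; a minimal sketch, with dicts standing in for the in-memory cache and the DB API:

```python
cache = {}                                          # in-memory cache: code -> full_url
db = {"abc123": "https://example.com/long/path"}    # stand-in for the DB API

def resolve(short_code):
    """Return the redirect target, consulting the cache before the DB."""
    url = cache.get(short_code)
    if url is not None:
        return url                    # cache hit: no DB round trip
    url = db.get(short_code)          # cache miss: fall back to the DB
    if url is not None:
        cache[short_code] = url       # populate the cache for next time
    return url                        # None -> respond 404 to the client

print(resolve("abc123"))  # first call hits the DB and warms the cache
print(resolve("abc123"))  # second call is served from the cache
```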


Write:

Client provides full url to shorten via the self-service URL onboarding UI

UI makes API call to generate a new short url and insert the mapping in the database

DB API returns the shortened url to the UI which returns it to the client
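One common way to implement the "generate a new short url" step is to base-36-encode a monotonically increasing row ID (an alternative to random codes; this is a sketch, not necessarily the approach the design commits to):

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # base-36 digit set

def encode_base36(n):
    """Encode a non-negative integer ID as a base-36 short code."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

print(encode_base36(125))          # 3h
print(encode_base36(36**6 - 1))    # zzzzzz -- the last 6-character code
```

Counter-based codes never collide, at the cost of needing a coordinated ID sequence; random codes avoid coordination but need a collision retry.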




Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...



DB API

  • hosted on fleet of servers managed by cloud provider
  • serverless works because the operations are stateless (one request does not rely on data persisted on the server by another request)
  • horizontally scales based on auto-scaling policy using a metric like CPU utilization


Load balancer

  • Distributes traffic based on geolocation if users are from around the world and network latency must be kept to a minimum
  • Alternatively, distributes traffic to servers based on perceived load as some urls may be requested more than others


Self-service UI

  • Commercial off-the-shelf solution with an interface for accepting and validating input URLs
  • Validation ensures the URL is not malicious and does not contain personal access tokens or sensitive info
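The validation step can be sketched with the standard library (the blocklist and the credential pattern below are illustrative assumptions, not a complete safety check — a real service would use a threat feed and a proper secret scanner):

```python
import re
from urllib.parse import urlparse

# Illustrative patterns only.
TOKEN_PATTERN = re.compile(r"(token|api_key|secret)=", re.IGNORECASE)
BLOCKED_HOSTS = {"malicious.example"}

def validate_url(raw_url):
    """Return (ok, reason) for a candidate URL to shorten."""
    parsed = urlparse(raw_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False, "not an absolute http(s) URL"
    if parsed.hostname in BLOCKED_HOSTS:
        return False, "host is blocklisted"
    if TOKEN_PATTERN.search(raw_url):
        return False, "URL appears to contain a credential"
    return True, "ok"
```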


Database

  • encrypts data at rest and in transit
  • security groups to allow access only from the DNS server and the Database API to prevent unwanted access


Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...


Simplicity vs extensibility

I made the Database a simple single-table design to optimize database operations. A write updates a single table, so it is atomic and does not necessitate stored procedures. The tradeoff is against extensibility: each write must ingest the full record, which requires more compute and capacity than a read, and any new attribute widens that one table. However, the functional requirements are kept to a minimum for now, so there aren't many attributes to keep track of.


Real-time vs async

I added an attribute to keep track of the number of requests for a url but didn't provide details on how that attribute is updated. Ideally, a request to fetch the redirect url should not depend on a successful update to the url's request count, as that would put a write to the database on the read path. Instead, I opted for a message queue: the update request is delivered asynchronously, and a processor picks the message off the queue and updates the record in the database. Making the attribute update async instead of real-time is justified because the info serves the relatively low-priority use case of analytics. The DNS server fires and forgets the update request so the client is served in near real time.
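The fire-and-forget analytics update can be sketched with a queue and a background worker (standard-library stand-ins for a real message broker and the DB record):

```python
import queue
import threading

visit_counts = {}          # stand-in for the number_visits column in the DB
events = queue.Queue()     # stand-in for the message queue

def analytics_worker():
    """Drain visit events and apply them to the store asynchronously."""
    while True:
        code = events.get()
        if code is None:   # sentinel: shut down
            break
        visit_counts[code] = visit_counts.get(code, 0) + 1
        events.task_done()

worker = threading.Thread(target=analytics_worker, daemon=True)
worker.start()

# The redirect path fires and forgets: enqueueing never blocks the response.
for _ in range(3):
    events.put("abc123")

events.join()                   # for the demo, wait until events are processed
print(visit_counts["abc123"])   # 3
```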


Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.


URL is not in the database


Region goes down -> DNS entry is failed over to the standby servers in the secondary region


Cache is empty on cold start -> Have a process pre-warm the cache to meet the initial spike in demand to the application.
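Pre-warming can be as simple as loading the most-visited mappings from the DB into the cache before the instance takes traffic (a sketch; ranking by `number_visits` and the cutoff are assumptions):

```python
def prewarm_cache(cache, db_rows, top_n=1000):
    """Populate the cache with the top-N most-visited url mappings."""
    hottest = sorted(db_rows, key=lambda r: r["number_visits"], reverse=True)
    for row in hottest[:top_n]:
        cache[row["short_url"]] = row["full_url"]
    return min(top_n, len(hottest))

cache = {}
rows = [
    {"short_url": "abc123", "full_url": "https://example.com/a", "number_visits": 900},
    {"short_url": "def456", "full_url": "https://example.com/b", "number_visits": 5},
]
prewarm_cache(cache, rows, top_n=1)
print(cache)  # only the hottest entry is loaded
```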


Cache does not reflect the traffic distribution -> Users from different geolocations might request completely different URLs. As a result, the entries in the cache are constantly in competition with each other. Since the cache is shared among all instances of the server, requests for different urls incur more cache misses and necessitate an expensive call to the database.


Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?


Flesh out the async process for updating the database for analytics attributes


Experiment with different routing policies to ensure the traffic is as evenly distributed across servers as possible.


Explore whether a CDN would reduce the number of cache misses caused by competing requests from different geolocations.