My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach (Score: 8/10)
by quasar2974
System requirements
Functional:
redirect users to the full site when provided a short url
short urls can be reused
short urls remain active for a long time
users can provide vanity urls
short urls can be reclaimed
short urls are kept short so they are easy to remember
Non-Functional:
network latency is kept to a minimum (geolocation-based routing)
optimized for read-heavy operation (servers read from in-region read replicas)
scales for millions of users (auto-scaling)
service is optimized for quick responses (cache/ CDN)
can automatically recover in the case of an outage (multi-region and multi-AZ deployment of servers and DB)
backups are easily restored in the event of an outage (restore DB from backup)
urls are safely stored to prevent unauthorized access (security groups attached to DB)
Capacity estimation
alphanumeric (lowercase letters + digits) = 26 + 10 = 36 chars
url has a base domain and a hash
hash is, say, 6 chars long
can support 36^6 ≈ 2.2 billion urls
suppose each stored url record is at most 1 KB
1 KB * 2.2 billion ≈ 2.2 TB of storage
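A quick sanity check of this arithmetic in Python; the 6-char code length and the 1 KB record size are the assumptions made above:

```python
# Back-of-the-envelope check of the capacity estimates above.
ALPHABET_SIZE = 26 + 10      # lowercase letters + digits
CODE_LENGTH = 6              # assumed hash length

keyspace = ALPHABET_SIZE ** CODE_LENGTH
print(f"keyspace: {keyspace:,} urls")             # 2,176,782,336 (~2.2 billion)

RECORD_SIZE_BYTES = 1024     # assumed 1 KB upper bound per record
total_tb = keyspace * RECORD_SIZE_BYTES / 1e12
print(f"storage upper bound: {total_tb:.1f} TB")  # ~2.2 TB
```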
API design
Redirect API - performs ACID operations against the DB: a read that resolves a short url to its full url, and a write that stores a new mapping
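A minimal sketch of what this API surface could look like, using Flask-style routes; the endpoint paths, payload fields, and the `sho.rt` domain are illustrative assumptions, not part of the design above:

```python
import secrets
import string

from flask import Flask, abort, jsonify, redirect, request

app = Flask(__name__)
store: dict[str, str] = {}  # stand-in for the urls table

ALPHABET = string.ascii_lowercase + string.digits  # the 36-char set estimated above

@app.get("/<short_code>")
def follow(short_code: str):
    """Read: resolve the short url and issue a redirect to the full url."""
    full_url = store.get(short_code)
    if full_url is None:
        abort(404)
    return redirect(full_url, code=301)

@app.post("/urls")
def shorten():
    """Write: accept a vanity code or generate a 6-char one, then store the mapping."""
    body = request.get_json()
    code = body.get("vanity") or "".join(secrets.choice(ALPHABET) for _ in range(6))
    store[code] = body["full_url"]
    return jsonify({"short_url": f"https://sho.rt/{code}"}), 201
```

One note on the 301: browsers cache permanent redirects, which reduces load but means repeat visits never reach the service; a 302 would be the choice if the visit-count analytics below must see every hit.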
Database design
short_url, full_url, number_visits, geolocation, last_accessed_time, created_time, updated_time
short_url and full_url form a composite primary key
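A sketch of this table in SQLite via Python; the column types are assumptions, and the composite primary key follows the line above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE urls (
        short_url          TEXT    NOT NULL,
        full_url           TEXT    NOT NULL,
        number_visits      INTEGER NOT NULL DEFAULT 0,
        geolocation        TEXT,
        last_accessed_time TEXT,
        created_time       TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
        updated_time       TEXT,
        PRIMARY KEY (short_url, full_url)  -- composite key per the design
    )
""")
```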
High-level design
DNS resolver
- resolves the short url domain so requests reach the load balancer
In-memory cache
Load Balancer
DB API
- hosted on a fleet of servers managed by a cloud provider
- serverless works because the operations are stateless
Database
Request flows
Read:
Client provides request with short url domain
DNS entry routes the request to the load balancer
An application server receives the request through the load balancer
Server checks cache for entry
If in cache, server returns the redirect url to the client
If not in cache, server makes API call to DB to fetch the entry and update the cache
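This read path is a cache-aside lookup; a minimal sketch, with the cache, DB, and TTL as in-process stand-ins:

```python
import time

CACHE_TTL_SECONDS = 3600  # assumed TTL; tune to the observed traffic
_cache: dict[str, tuple[str, float]] = {}  # short_url -> (full_url, expiry)

def resolve(short_url: str, db: dict[str, str]) -> str | None:
    """Cache-aside: serve from cache if fresh, else read the DB and repopulate."""
    hit = _cache.get(short_url)
    if hit is not None and hit[1] > time.time():
        return hit[0]                 # cache hit: no DB round trip
    full_url = db.get(short_url)      # cache miss: single DB read
    if full_url is not None:
        _cache[short_url] = (full_url, time.time() + CACHE_TTL_SECONDS)
    return full_url                   # None -> not in the database (404 case)
```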
Write:
Client provides full url to shorten via the self-service URL onboarding UI
UI makes API call to generate a new short url and insert the mapping in the database
DB API returns the shortened url to the UI, which returns it to the client
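The flow above doesn't say how a fresh short code is chosen when two full urls hash to the same code; one possible approach is hash-then-probe, sketched here with a plain dict standing in for the table (a real write would use a conditional insert for atomicity):

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"  # 36 chars

def base36_code(text: str, length: int = 6) -> str:
    """Derive a fixed-length base-36 code from a hash of the input."""
    n = int(hashlib.sha256(text.encode()).hexdigest(), 16)
    chars = []
    for _ in range(length):
        n, r = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(chars)

def insert_mapping(db: dict[str, str], full_url: str) -> str:
    """Probe with a salt until an unused code (or the same mapping) is found."""
    salt = 0
    while True:
        code = base36_code(f"{full_url}#{salt}")
        if db.get(code, full_url) == full_url:  # free, or already maps here
            db[code] = full_url
            return code
        salt += 1
```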
Detailed component design
DB API
- hosted on a fleet of servers managed by a cloud provider
- serverless works because the operations are stateless (one request does not rely on data persisted on the server by another request)
- horizontally scales based on auto-scaling policy using a metric like CPU utilization
Load balancer
- Distributes traffic based on geolocation if users are from around the world and network latency must be kept to a minimum
- Alternatively, distributes traffic to servers based on perceived load as some urls may be requested more than others
Self-service UI
- Commercial off-the-shelf solution with an interface for accepting and validating input URLs
- Validation ensures the URL is not malicious and does not contain personal access tokens or sensitive info
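A sketch of that validation using only the standard library; the blocklist and the credential pattern are illustrative placeholders, and a production check would also call a reputation service:

```python
import re
from urllib.parse import urlparse

BLOCKED_HOSTS = {"malware.example"}  # placeholder; feed from a threat-intel list
CREDENTIAL_PATTERN = re.compile(r"(?i)(api[_-]?key|access[_-]?token|secret)=")

def validate_url(raw: str) -> str:
    """Reject malformed, blocklisted, or credential-bearing URLs before shortening."""
    parsed = urlparse(raw)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("must be an absolute http(s) url")
    if parsed.hostname in BLOCKED_HOSTS:
        raise ValueError("url is blocklisted")
    if CREDENTIAL_PATTERN.search(raw):
        raise ValueError("url appears to contain a credential")
    return raw
```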
Database
- encrypts data at rest and in transit
- security groups allow access only from the application servers and the DB API to prevent unwanted access
Trade offs/Tech choices
Simplicity vs extensibility
I made the database a simple single-table design to optimize database operations. A write updates a single table, so it is atomic and does not necessitate stored procedures. The tradeoff is against extensibility: every attribute added to the table makes each write ingest a wider record, which costs more compute and capacity than a read does. However, the functional requirements are kept to a minimum for now, so there aren't too many attributes to keep track of.
Real-time vs async
I added an attribute to keep track of the number of requests for a url but didn't provide details on how that attribute is updated. Ideally, a request to fetch the redirect url should not rely on a successful update to the url's request count, as that would put a write to the database on the read path. Instead, I opted for a message queue to deliver the update asynchronously: a processor picks the message up from the queue and updates the record in the database. Making the update async instead of real-time is justified because the info serves relatively low-priority analytics. The application server fires and forgets the update so it can serve the client in near real time.
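A minimal in-process sketch of this fire-and-forget pattern, with a thread standing in for the processor and a dict for the database; a real deployment would use a managed message queue:

```python
import queue
import threading

visit_events: "queue.Queue[str]" = queue.Queue()
visit_counts: dict[str, int] = {}  # stand-in for number_visits in the urls table

def record_visit(short_url: str) -> None:
    """Called on the redirect path: enqueue and return without waiting."""
    visit_events.put_nowait(short_url)

def analytics_worker() -> None:
    """Processor: drain the queue and fold counts into the store."""
    while True:
        short_url = visit_events.get()
        visit_counts[short_url] = visit_counts.get(short_url, 0) + 1
        visit_events.task_done()

threading.Thread(target=analytics_worker, daemon=True).start()
```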
Failure scenarios/bottlenecks
URL is not in the database -> return a 404 to the client instead of redirecting
Region goes down -> DNS entry is failed over to the standby servers in the secondary region
Cache is empty on cold start -> Have a process pre-warm the cache to meet the initial spike in demand to the application (see the sketch after this list).
Cache does not reflect the traffic distribution -> Users from different geolocations might request completely different URLs. As a result, the entries in the cache are constantly in competition with each other. Since the cache is shared among all instances of the server, requests containing different urls might incur more cache misses and necessitate an expensive call to the database.
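A sketch of the pre-warming process mentioned above, assuming the hottest mappings can be pulled from the database by visit count (the query and the working-set size are assumptions):

```python
TOP_N = 10_000  # assumed size of the hot working set

def prewarm_cache(rows: list[tuple[str, str, int]],
                  cache: dict[str, str],
                  top_n: int = TOP_N) -> None:
    """Load the most-visited mappings into the cache before taking traffic.

    `rows` stands in for a query such as:
        SELECT short_url, full_url, number_visits FROM urls
        ORDER BY number_visits DESC LIMIT :top_n
    """
    hottest = sorted(rows, key=lambda r: r[2], reverse=True)[:top_n]
    for short_url, full_url, _visits in hottest:
        cache[short_url] = full_url
```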
Future improvements
Flesh out the async process for updating analytics attributes in the database.
Experiment with different routing policies to ensure traffic is distributed as evenly as possible across servers.
Explore whether a CDN would reduce the number of cache misses for competing requests from different geolocations.