My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach
by phantom_quantum863
Requirements
Functional Requirements:
- Create a short URL for a given long URL.
- Return the long URL associated with a given short URL.
- Links expire after x time of non-usage
Non-Functional Requirements:
- Low Latency (200ms)
- Scalability (Able to handle 100k RpS using horizontal scalability)
- Reliability total availability 99%
- Rate limit to 5 requests per second
Questions:
- Can users see how their URLs performed? analytics?
- If yes, can two users with the same URL generate the same hash? my assumption is no.
- Do we need analytics for the whole system? or is this out of scope?
Capacity Estimation
We have read bias
Store Links indefinitely require an DB cluster
Average link data is size 200 characters
1M links per day
200M character each day in the DB this gives us 6 Billion characters in the DB or 6GB a month.
5 years projection: 360 GB
Inbound traffic for link creation is 6GB a day
Inbound traffic for link redirection 1GB a day given 100 million short url redirections.
API Design
Post /v1/shorten
Body {"url": "
return 201 // no content
return 400 no valid url exists in the body
return 500 internal server error {"errorCode: int, message: ""}
GET
return 302 redirect
return 404 no short url found
return 500 internal server error {"errorCode: int, message: ""}
High-Level Design
- Client: just an input filled and a path
- API Server:
- Resolve short URLs
- Send shorten URL requests to URLProcessor service.
- Send retrieve URL event to analytics engine.
- URLProcessor: perform the URL shortening logic, including hash collision resolution etc.
- Pubsub: for communicating SBE between FE and the URLProcessor Service.
- KV Store: for quickly caching the hot urls
- Message/event bus: to send a URL to URLProcessor
- Database: Store URL mapping wit metadata maybe
- Analytics Engine: Listens to the messages
- Analytics database: a timeseries database to store access patterns.
Notes on technology selection:
- NATS: supports the user cases of KV pubsub, and messaging queue
- With NATS durable storage I can claim that we don't even need another database. although I would use one for the analytic.
Scalability:
- Separating the API and the URLProcessor we can do a horizontal scaling for both independently.
- Another approach is to create a gateway component that redirects traffic to the api to two components which which can also be scaled separately.
- Creating URLs
- Redirect Service.
- Scaling can be done based on request load.
Low Latency:
- Using a cache to load the hot URLs makes sure the latency stays very low.
Availability:
- Since the components are independent then they can have redundancy behind the load balancer.
Database Design
Database can have a basic table called URLs, it has very basic information based on the requirements above. since we are user agnostic then the URL should be maybe have URL, hash, created at and last accessed (this will help with crashes later to auto reload last access URL/hot URLs) the last access at field will be pushed as a bulk snapshot from the KV store.
Database can be shared using hash ranges to have stable shards and avoid re-balancing.
Since this is a read heavy system we can have read replicas for the URLs, with read repair (maybe),read from active replica or read your own writes to provide strong consistency.
Detailed Component Design
Client:
- Typing a URL and hitting submit will do two things will send api request to shorten the URL and will subscribe to URLShortened Event with the URL as a key
- When a the event is published the user will be able to see the shortened URL
- Client access should be rate limited
API:
- All traffic to the API should be both rate limited and load balanced.
- Each request to visit shortened URL first check the cache for a url hit. If there is no hit then the database is checked and the the result is loaded on the cache.
- Note: Use a CDN for frequently visited URLs to off load the API completely instead of the internal cache.
URLProcessor:
- It receives a URL Creation Event
- Search for the URL in the database, if it exists just broadcast a URLShortened event to the pubsub ( this way we don't have to create anew shortedned URL to URLS we already have)
- if not then generate a hash and pull all hashes that have this has as a prefix. multiple exists then append a new suffix to the hash and then save the URL and broadcast URLShortened event.