System requirements
Functional:
- anonymous users create shortened URL from any user-provided URL
- users visiting the shortened URL will be redirected to the fully qualified URL
- no update, no deletion
- definition of "short": maybe 10 characters after the domain name?
Non-Functional:
- highly available, n 9s
- eventual consistent -- uploaded URL will eventually be visible
Capacity estimation
- Data size: URL < 1KB, 10M * 10 * 1KB --> 100GB/day
- Long term: in 10 year would be 100GB * 365 * 10 = 365TB
- QPS write: 10M DAU, average 10 shorten request per day --> 10M * 10 / ~100K = 1KQPS
- QPS read: 1B click through shortened URL --> 1B / ~100K --> 10K QPS on average, peak might be magnitude more
API design
- POST /api/v1/shortenURL
- returns the shortened URL
- GET /api/v1/redirect
- returns 301 with fully qualified URL
Database design
Do not need transactions/ACID. loose relations. Use NoSQL/KV store
Since the size of the URL is small (kilobytes), no need for blob store
URLs have no strong relations, no need for structured store either
Can simply use NoSQL/KV store. But traditional SQL database is also fine.
Schema:
Key will be the shortened URL
Value will be
- fully qualified URL
- creation time
- time to live
High-level design
- Users: making requests to either shorten a URL or visiting a shortened URL
- Gateway/load balancer: distribute load among API servers
- API servers: sitting behind the load balancers / gateway servers to take service requests
- hash function to shorten URL, and write to the master database
- retrieve the full qualified URL from a slew of slave read-only database servers
- Database (sharding)
- hot-cold/master-slave, with the slaves serving the read requests and master serving the writes
- Message queue
- Caching
- Within the datacenter, cache the popular redirect requests, to save a db lookup
- CDN to cache redirects to popular URLs
Request flows
Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...
Write (shorten URL):
- Requests come off the load balancers and hit the corresponding API server
- API server performs the hash off the fully qualified URL, get the shortened URL, and response to the client
- API server separately makes an async call/write to the database, via the message queue, to create a record of shortened URL
- On API server failure, client would make another request via the LB to hit a separate server. Same goes for the DB server, since messages are persisted on the queues
Read (redirect)
- Redirect request, on cold cache, would hit DB to retrieve the records
- Subsequent requests would hit cache
Detailed component design
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...
Trade offs/Tech choices
Explain any trade offs you have made and why you made certain tech choices...
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?