My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach (Score: 8/10)
by vivid103
System requirements
Functional:
- Accept a URL to shorten and return a shortened URL.
- The URL mapping should be consistent (the same long URL always maps to the same short URL).
- Maybe we want to expand to expiring URL mappings?
- Should redirect any shortened URL to the original URL.
- Should track hits to shortened URLs.
Non-Functional:
- Scalability
- Availability
  - avoid single points of failure (SPOF)
  - prioritize read availability
  - prioritize consistency on reads
- Security
- Observability
- Simplicity
Capacity estimation
- shorten: 100 URLs/s
- redirect: 5,000 URLs/s (typical), 50,000 URLs/s (burst)
- storage
  - ~100 bytes per original URL
  - 8 bytes per shortened URL
  - ~108 bytes per row at 100 writes/s ≈ 11 KB/s ≈ 39 MB/hour ≈ 0.9 GB/day ≈ 340 GB/year (back-of-envelope check in the sketch below)
- a 7-character string over lowercase letters, uppercase letters, and digits (62 symbols) gives 62^7 ≈ 3.5 trillion possibilities
  - 8 characters might be easier to remember, split into two groups of 4
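A quick back-of-envelope check of the numbers above in Python (the 108-byte row is just the two fields and ignores real row overhead):

```python
# Back-of-envelope check of the storage estimates above.
WRITE_QPS = 100      # shorten requests per second
ROW_BYTES = 108      # ~100 B original URL + 8 B short code

bytes_per_sec = WRITE_QPS * ROW_BYTES            # ~11 KB/s
mb_per_hour = bytes_per_sec * 3600 / 1e6         # ~39 MB/hour
gb_per_day = mb_per_hour * 24 / 1e3              # ~0.9 GB/day
gb_per_year = gb_per_day * 365                   # ~340 GB/year

keyspace_7 = 62 ** 7    # lowercase + uppercase + digits, 7 characters

print(f"{bytes_per_sec / 1e3:.1f} KB/s, {mb_per_hour:.0f} MB/hour, "
      f"{gb_per_day:.2f} GB/day, {gb_per_year:.0f} GB/year")
print(f"62^7 = {keyspace_7:,} possible codes")
```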
API design
POST /shorturl
request: {original_url: 'somestring'}
response: {short_url: 'bit.ly/aDF93d'}
GET /{short_code}
response: 301 redirect to original_url
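A hypothetical client's view of the two endpoints using the Python requests library (the shorturl.example host and the returned code are made up for illustration):

```python
import requests

BASE = "https://shorturl.example"   # hypothetical host, for illustration only

# POST /shorturl: submit a long URL, receive a short one.
resp = requests.post(
    f"{BASE}/shorturl",
    json={"original_url": "https://en.wikipedia.org/wiki/URL_shortening"},
)
short_url = resp.json()["short_url"]            # e.g. "shorturl.example/aDF93d"

# GET /{short_code}: the short URL answers with a 301 to the original.
redirect = requests.get(f"https://{short_url}", allow_redirects=False)
assert redirect.status_code == 301
print(redirect.headers["Location"])             # the original_url
```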
Database design
URL table
id: BigInt
original_url: text
short_url: text
created_at: datetime with timezone
hits: BigInt
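A sketch of this table as Postgres-flavored DDL (held in a Python string so it can be run through whichever driver or migration tool is used; BIGSERIAL for the id is an assumption):

```python
# Postgres-flavored DDL matching the model above.
CREATE_URL_TABLE = """
CREATE TABLE urls (
    id           BIGSERIAL   PRIMARY KEY,        -- BigInt id
    original_url TEXT        NOT NULL,           -- the long URL
    short_url    TEXT        NOT NULL,           -- the generated short code
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    hits         BIGINT      NOT NULL DEFAULT 0
);
"""
```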
High-level design
- edge Varnish cache for the most popular URLs (globally distributed)
- load balancer to distribute requests
- web server(s) to handle load
- global caching server (Redis); cheaper than hitting the DB
- database to durably store the URL mappings
Request flows
# writes
client makes POST request
request hits the load balancer and is sent to a stateless web server
web server writes the mapping to the database
web server updates the cache
web server returns the response to the client
# reads
client makes GET request
request hits the load balancer and is sent to a stateless web server
web server looks up the URL in the cache
on cache hit, return a 301 redirect to the original URL
on cache miss, web server looks up the URL in the database
on DB miss, return a 404
on DB hit, write the entry to the cache and return the 301 redirect
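A sketch of the read path, with plain dicts standing in for the Redis cache and the urls table:

```python
from typing import Optional

cache: dict[str, str] = {}    # stand-in for the global Redis cache
db: dict[str, str] = {}       # stand-in for the urls table (short_url -> original_url)

def db_lookup(short_code: str) -> Optional[str]:
    """Stand-in for: SELECT original_url FROM urls WHERE short_url = %s."""
    return db.get(short_code)

def handle_redirect(short_code: str) -> tuple[int, dict]:
    """Read path for GET /{short_code}: cache first, then DB, else 404."""
    original = cache.get(short_code)
    if original is None:                      # cache miss
        original = db_lookup(short_code)
        if original is None:                  # DB miss
            return 404, {}
        cache[short_code] = original          # populate the cache on a DB hit
    return 301, {"Location": original}        # redirect to the original URL
```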
Detailed component design
load balancer could use a couple of strategies:
- round robin
- smallest request queue (fewest outstanding requests)
- track latency metrics per server and route to the lowest-latency one (see the sketch below)
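A minimal sketch of the latency-tracking strategy, assuming we keep an exponentially weighted moving average of response time per backend (the 0.2 smoothing factor is arbitrary):

```python
class LatencyAwareBalancer:
    """Route each request to the backend with the lowest smoothed latency."""

    def __init__(self, backends: list[str], alpha: float = 0.2):
        self.alpha = alpha
        self.ewma = {b: 0.0 for b in backends}   # smoothed latency per backend (all start equal)

    def pick(self) -> str:
        return min(self.ewma, key=self.ewma.get)

    def record(self, backend: str, latency_ms: float) -> None:
        prev = self.ewma[backend]
        self.ewma[backend] = (1 - self.alpha) * prev + self.alpha * latency_ms

# usage
lb = LatencyAwareBalancer(["web-1", "web-2", "web-3"])
server = lb.pick()
lb.record(server, latency_ms=12.5)
```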
web server hash function
- an 8-character code gives us enough randomness to just generate one randomly per URL; if we ever create a collision, we can retry (sketch below)
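A minimal sketch of that random-code generation, using a 62-character alphabet and retrying when the code is already taken (the exists callback stands in for the unique-index check):

```python
import secrets
import string
from typing import Callable

ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits  # 62 symbols

def generate_code(exists: Callable[[str], bool], length: int = 8) -> str:
    """Randomly generate a short code, retrying on the (rare) collision."""
    while True:
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if not exists(code):           # in production this is the unique-index check
            return code

# usage: with 62^8 possible codes, collisions are vanishingly rare at our scale
taken: set[str] = set()
code = generate_code(exists=lambda c: c in taken)
```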
database
- relational database (Postgres)
- use a unique index on the hash (short code) to prevent collisions
- compound index on (short_url, original_url), which can serve the redirect lookup entirely from the index (DDL sketch below)
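The index DDL for the two indexes above (Postgres-flavored, same string convention as the table DDL):

```python
# Unique index prevents two rows claiming the same short code; the compound
# index lets the short_url -> original_url lookup be answered from the index
# without touching the table itself.
CREATE_INDEXES = """
CREATE UNIQUE INDEX urls_short_url_uniq ON urls (short_url);
CREATE INDEX urls_short_original_idx ON urls (short_url, original_url);
"""
```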
Trade offs/Tech choices
The "hit" calculation is asynchronous through queue population. It's not as important to have it be "read-your-writes" as creating a short-url.
The server and cache use the same processing queue to minimize code path and duplication.
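A sketch of the asynchronous hit path, with queue.Queue and a Counter standing in for the real message queue and the batched UPDATEs:

```python
import queue
import threading
from collections import Counter

hit_queue: "queue.Queue[str]" = queue.Queue()   # stand-in for the real message queue
hit_counts = Counter()                          # stand-in for UPDATE urls SET hits = hits + n

def record_hit(short_code: str) -> None:
    """Called on the redirect path; cheap, non-blocking, no read-your-writes needed."""
    hit_queue.put(short_code)

def hit_worker() -> None:
    """Drain the queue and apply hit counts in batches."""
    while True:
        batch = Counter()
        batch[hit_queue.get()] += 1              # block until at least one event arrives
        while not hit_queue.empty():
            batch[hit_queue.get_nowait()] += 1
        hit_counts.update(batch)                 # in production: one batched UPDATE per code

threading.Thread(target=hit_worker, daemon=True).start()
```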
We include an edge cache and a global Redis cache for multi-tier caching. The edge cache is primarily there to absorb power-law traffic where a handful of URLs blow up, keeping the system robust.
Multi-level caching makes cache coherency harder if we want to evict an item from all caches; there will be some delay, bounded by the TTL.
Postgres is a heavier data store than we strictly need, but it makes it easy to expand if we want to collect extra metrics/metadata.
Failure scenarios/bottlenecks
If a cache falls over, we might get a thundering herd that takes down the whole system. Having read replicas that can take over as the primary would be good for resiliency.
If a write request reaches the database but the response never makes it back to the user, they should be able to retry with the same long URL and get back the already-saved short URL from the database (the long URL acts like an idempotency key); see the sketch below.
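A sketch of that retry behavior: look up the long URL first and only generate a new code if it has never been shortened (note the reverse lookup assumes an index on original_url, which is not in the schema above and would need to be added):

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits   # 62 symbols

def shorten(original_url: str, db: dict[str, str]) -> str:
    """Create-or-return: the long URL acts like an idempotency key on retries."""
    # Reverse lookup (SELECT short_url FROM urls WHERE original_url = %s);
    # needs an index on original_url to be efficient.
    for code, url in db.items():
        if url == original_url:
            return code                    # a retry after a lost response gets the same code
    while True:                            # otherwise generate a fresh, unused code
        code = "".join(secrets.choice(ALPHABET) for _ in range(8))
        if code not in db:                 # unique-index check in the real DB
            db[code] = original_url
            return code

# usage: two identical requests return the same short code
store: dict[str, str] = {}
assert shorten("https://example.com/some/long/path", store) == \
       shorten("https://example.com/some/long/path", store)
```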
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?