My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach (Score: 10/10)

by horizon5628

System requirements


Functional:

  1. Shorten a long URL.
  2. Resolve a short URL to its long URL and redirect the user to it.
  3. Option for custom short aliases.
  4. Collect stats and build a popularity dashboard.
  5. URL expiration, with long-term support (100 years by default).
  6. REST API.
  7. User management, so users can manage their own URLs.


Non-Functional:

  1. Performance
  2. Scalability
  3. Reliability
  4. Security
  5. Monitoring
  6. Data Backup
  7. Compliance


Capacity estimation

Assume 1% of requests are for shortening (writes) and 99% for retrieval (reads).


Assuming 10,000 shortening requests per day: 10,000 * 365 days * 100 years = 365 million records if every URL is unique.

Assuming 1 KB per record, that is ~365 GB of storage; round up to ~500 GB for headroom.


At 100 reqs/sec, and assuming 10 ms to serve each request, we need only 1 CPU core (100 req/s * 0.01 s/req = 1 core fully utilized).

Assuming we cache 100% of monthly traffic, we need less than 1 GB of memory (1 GB holds ~1 million entries at 1 KB each).

We use an LRU cache eviction policy; a minimal sketch follows.
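Since even a month of traffic fits in memory, here is a minimal Python sketch of the LRU policy (the class name and sizing are illustrative, not part of the design):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU map from short hash to long URL."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._store: OrderedDict[str, str] = OrderedDict()

    def get(self, short_hash: str) -> str | None:
        if short_hash not in self._store:
            return None
        self._store.move_to_end(short_hash)   # mark as most recently used
        return self._store[short_hash]

    def put(self, short_hash: str, long_url: str) -> None:
        if short_hash in self._store:
            self._store.move_to_end(short_hash)
        self._store[short_hash] = long_url
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)   # evict least recently used
```

Sized per the estimate above, `LRUCache(capacity=1_000_000)` stays within the 1 GB budget at ~1 KB per entry.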


64^4 = ~16.8 million, which is too few for 365 million records.

64^5 = ~1.07 billion, so we will go with 5-character short URL hashes.
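As a quick sanity check of the arithmetic (a standalone snippet, assuming the 1 KB/record figure above):

```python
records = 10_000 * 365 * 100           # shorten reqs/day * days/yr * years = 365 million
storage_gb = records * 1024 / 10**9    # ~374 GB at 1 KB per record; round up to ~500 GB
print(f"{records:,} records, ~{storage_gb:.0f} GB")

# Keyspace of base-64 codes of length n is 64**n:
print(f"64^4 = {64**4:,}")             # 16,777,216 -- too few for 365M records
print(f"64^5 = {64**5:,}")             # 1,073,741,824 -- enough headroom
```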


API design



POST aka.ms/api/v1/shorten

{"url": "https://google.com/jnsckjdfn/dfdvdfv?d=4&k=0", "userid": "abc", "oauthtoken": "gfcgjkj", "alias": "google", "expirydays": 10}

response:

sh.rt/shrturl

or an error code if the alias is already taken (with suggestions, if any) or the caller is not authorized



GET aka.ms/api/v1/shrturl

response:

redirects to https://google.com/jnsckjdfn/dfdvdfv?d=4&k=0 with status code 302 if the URL is found

else returns 404


PUT aka.ms/api/v1/edit

{"shrt_url": "sh.rt/shrturl", "new_long_url": "cdcdcc"}

response:

200 if the URL was created by the same user

error otherwise


DELETE aka.ms/api/v1/shrturl

response:

200 if the URL was created by the same user

error otherwise


GET aka.ms/api/v1/analytics/popular_shorts

response:

top 10 most accessed short URLs
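To make the API concrete, a hedged Flask sketch of the two hot-path endpoints (the in-memory dict stands in for the URLs table; auth, expiry, edit/delete, and analytics are omitted):

```python
import secrets
import string

from flask import Flask, abort, jsonify, redirect, request

app = Flask(__name__)
urls: dict[str, str] = {}  # hash -> long URL (stand-in for the URLs table)

ALPHABET = string.ascii_letters + string.digits + "-_"  # 64 URL-safe symbols

def next_available_hash() -> str:
    # Stand-in for the key service described later; real codes come from the hash table.
    return "".join(secrets.choice(ALPHABET) for _ in range(5))

@app.post("/api/v1/shorten")
def shorten():
    body = request.get_json()
    short = body.get("alias") or next_available_hash()
    if short in urls:
        abort(409)  # alias already taken
    urls[short] = body["url"]
    return jsonify({"short_url": f"sh.rt/{short}"})

@app.get("/api/v1/<short>")
def resolve(short: str):
    long_url = urls.get(short)
    if long_url is None:
        abort(404)  # unknown or expired short URL
    return redirect(long_url, code=302)  # 302 as specified above
```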


Database design

Lots of random reads and no relational constraints, so a key-value store is a good fit.


Sequential reads are needed only for the user dashboard.


Users table schema:

name, id, email, subscription, last_login, no_requests, request_limit, no_of_urls


Available hashes schema:

hash, state (0 = available, 1 = unavailable)


Index on state, since we filter, fetch, and delete based on it.


URLs table schema:

long_url, hash, create_time, expiry_time


Logs (on NoSQL) schema:

timestamp, log_level, message (user + request + response)
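For illustration, the two hot tables in SQLite (column types are assumptions; as noted above, a key-value store may be the better production fit):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE available_hashes (
    hash  TEXT PRIMARY KEY,
    state INTEGER NOT NULL DEFAULT 0    -- 0 = available, 1 = unavailable
);
CREATE INDEX idx_hashes_state ON available_hashes (state);

CREATE TABLE urls (
    hash        TEXT PRIMARY KEY,
    long_url    TEXT NOT NULL,
    create_time INTEGER NOT NULL,       -- epoch seconds
    expiry_time INTEGER                 -- NULL = default 100-year retention
);
""")
```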


High-level design


write request

User -> lb -> write_service -> db

key_service -> db


read request

User <-> lb <-> read_service <- cache <- db


The shorten service either marks a random pre-generated hash as used or handles a custom alias: if the alias is already in the URLs table, error out and send recommendations; if it is in the hash table, block it there (mark it unavailable) and assign it; if it is in neither, simply assign it. A sketch follows.
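A sketch of that decision flow, where `urls` stands in for the URLs table and `available` for the pool of pre-generated hashes:

```python
def assign_short(custom_alias: str | None, urls: dict[str, str], available: set[str]) -> str:
    if custom_alias is None:
        return available.pop()        # claim an arbitrary pre-generated hash
    if custom_alias in urls:
        # Alias collision: error out; a real service would also suggest alternatives.
        raise ValueError(f"alias '{custom_alias}' already taken")
    available.discard(custom_alias)   # block it in the hash table if it was pre-generated
    return custom_alias
```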


Utilities:

CleanupUtil - removes expired URLs from the URLs table and returns their hashes to the hash table (sketch below).

HashGenerationService - monitors entries in the hash table; when the count of available hashes drops below a threshold, it populates the table with new ones.
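A sketch of CleanupUtil over the same in-memory stand-ins, where `urls` maps hash -> (long URL, expiry epoch); in production this would be a periodic batch job over the real tables:

```python
import time

def cleanup_expired(urls: dict[str, tuple[str, float]], available: set[str]) -> int:
    """Drop expired mappings and recycle their hashes into the available pool."""
    now = time.time()
    expired = [h for h, (_, expiry) in urls.items() if expiry <= now]
    for h in expired:
        del urls[h]        # remove from the URLs table
        available.add(h)   # hash becomes reusable
    return len(expired)
```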



Request flows



write request

User -> lb -> write_service -> db

The write service takes a pre-generated hash from the key service (or validates a custom alias), persists the hash-to-long-URL mapping, and returns the short URL.

read request

User <-> lb <-> read_service <- cache <- db

The read service checks the cache first; on a miss it reads the mapping from the db, populates the cache, and responds with a 302 redirect (or 404 if not found).


Detailed component design


LB:

The load balancer partitions requests by short-URL hash, so repeat requests for the same URL land on the same read-service instance and hit its cache.
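A sketch of that routing (simple modulo partitioning for clarity; a production LB would more likely use consistent hashing so cache locality survives backend changes):

```python
import hashlib

def pick_backend(short_hash: str, backends: list[str]) -> str:
    """Send every request for the same short URL to the same read-service instance."""
    digest = hashlib.md5(short_hash.encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]
```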


Key generation:

MD5 of the long URL? No: it is comparatively expensive and we would have to handle collisions.

Incrementing keys issued by a key service? Yes (sketch below).
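A sketch of how the key service could turn an incrementing counter into collision-free 5-character codes (the 64-symbol alphabet is an assumption; any URL-safe set of 64 characters works):

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase + "-_"  # 64 symbols

def encode_key(counter: int, length: int = 5) -> str:
    """Base-64 encode a counter into a fixed-length short code."""
    chars = []
    for _ in range(length):
        counter, rem = divmod(counter, 64)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))
```

Note that sequential counters yield enumerable codes, which feeds into the scraping concern under Future improvements.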


Trade offs/Tech choices



The workload is read-heavy, so keep read replicas.


When a URL has not been replicated yet, fall back to reading from the primary.


Caching:

Caching frequently used URLs is beneficial.


Should we load balance based on hostname instead?


Failure scenarios/bottlenecks


Invalid URL?

Validate the URL on write and return an error response (sketch below).
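A minimal structural check, assuming we accept only http(s) URLs (it does not verify that the target actually resolves):

```python
from urllib.parse import urlparse

def is_valid_url(raw: str) -> bool:
    parsed = urlparse(raw)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```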


Two simultaneous requests for the same custom alias?

- record-level locking, so exactly one request can claim the alias (sketch below)
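A single-process sketch of that idea; in a real database the same effect comes from a per-record lock or a conditional insert:

```python
import threading

_alias_lock = threading.Lock()  # stand-in for a record-level DB lock

def claim_alias(alias: str, urls: dict[str, str], long_url: str) -> bool:
    """Of two concurrent claims for the same alias, exactly one succeeds."""
    with _alias_lock:
        if alias in urls:
            return False   # lost the race
        urls[alias] = long_url
        return True
```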


Future improvements

Should we prevent scraping of short-to-long URL mappings? Incrementing keys make short codes easy to enumerate.