My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach with Score: 10/10
by horizon5628
System requirements
Functional:
- Shorten a long URL
- Resolve a short URL to the long URL and redirect the user to it
- Option for custom short aliases
- Collect stats and build a popularity dashboard
- Expiration feature, with long-term support (100-year retention by default)
- REST API
- User management: users can manage their own URLs
Non-Functional:
- Performance
- Scalability
- Reliability
- Security
- Monitoring
- Data Backup
- Compliance
Capacity estimation
1% of requests are for shortening, 99% for retrieval.
10,000 new URLs/day * 365 days * 100 years = 365 million records if every URL is unique.
Assuming 1 KB per record, that is ~365 GB of storage; round up to ~500 GB.
At ~100 reqs/sec and ~10 ms to serve each request, one CPU core suffices (100 * 0.01 s = 1 core-second per second).
Assuming we cache 100% of a month's traffic - on the order of 300K distinct URLs at 1 KB each - we need well under 1 GB of memory.
We use an LRU cache eviction policy.
64^4 ≈ 16.7 million - too few for 365 million records.
64^5 ≈ 1.07 billion - so we go with a 5-character short URL hash.
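A quick sanity check of these numbers in Python (the 10,000 writes/day, 1 KB/record, and 100-year figures are the assumptions above):

    # Back-of-the-envelope check of the capacity estimates above.
    WRITES_PER_DAY = 10_000
    YEARS = 100
    BYTES_PER_RECORD = 1_000

    records = WRITES_PER_DAY * 365 * YEARS
    print(f"records: {records:,}")                                # 365,000,000
    print(f"storage: {records * BYTES_PER_RECORD / 1e9:.0f} GB")  # 365 GB

    # Keyspace per short-URL length (64-character alphabet).
    print(f"64^4 = {64**4:,}")   # 16,777,216    -> too small
    print(f"64^5 = {64**5:,}")   # 1,073,741,824 -> covers 365M records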
API design
POST aka.ms/api/v1/shorten
{"url": "https://google.com/jnsckjdfn/dfdvdfv?d=4&k=0", "userid": "abc", "oauthtoken": "gfcgjkj", "alias": "google", "expirydays": 10}
response:
sh.rt/shrturl
or error code if the alias is already taken (add suggestions if any) or not authorized
GET aka.ms/api/v1/{shrturl}
response:
redirects to https://google.com/jnsckjdfn/dfdvdfv?d=4&k=0 with status code 302 if the URL is found
else returns 404
PUT aka.ms/api/v1/edit
{"shrt_url": "sh.rt/shrturl", "new_long_url": "cdcdcc"}
response:
200 if the short URL was created by the same user
error if not
DELETE aka.ms/api/v1/{shrturl}
response:
200 if the short URL was created by the same user
error if not
GET aka.ms/api/v1/analytics/popular_shorts
response:
top 10 most-accessed short URLs
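For illustration, calling the shorten endpoint with the third-party requests library (host, token, and field values are the placeholders from above):

    import requests  # third-party HTTP client: pip install requests

    resp = requests.post(
        "https://aka.ms/api/v1/shorten",
        json={
            "url": "https://google.com/jnsckjdfn/dfdvdfv?d=4&k=0",
            "userid": "abc",
            "oauthtoken": "gfcgjkj",
            "alias": "google",
            "expirydays": 10,
        },
    )
    if resp.ok:
        print(resp.text)                    # e.g. sh.rt/google
    else:
        print(resp.status_code, resp.text)  # alias taken / not authorized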
Database design
Lots of random reads and no relational constraints (a KV store fits).
Sequential reads only for the user dashboard.
Users Table schema:
name
id
subscription
last_login
no_requests
request_limit
no_of_urls
Available hashes schema:
hash
state - 0 = available, 1 = unavailable
index on state, since we filter, fetch, and delete based on state
URLs table schema:
long
hash
create_time
expiry_time
Logs (NoSQL) schema:
timestamp
log_level
message
(user + request + response)
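A minimal sketch of the three relational tables as SQLite DDL run from Python (column types are assumptions; as noted above, the urls table could equally live in a KV store, and the logs schema goes to the NoSQL store):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE users (
        id            TEXT PRIMARY KEY,
        name          TEXT,
        subscription  TEXT,
        last_login    TIMESTAMP,
        no_requests   INTEGER DEFAULT 0,
        request_limit INTEGER,
        no_of_urls    INTEGER DEFAULT 0
    );

    CREATE TABLE available_hashes (
        hash  TEXT PRIMARY KEY,
        state INTEGER DEFAULT 0          -- 0 = available, 1 = unavailable
    );
    CREATE INDEX idx_hashes_state ON available_hashes(state);

    CREATE TABLE urls (
        hash        TEXT PRIMARY KEY,
        long        TEXT NOT NULL,
        create_time TIMESTAMP,
        expiry_time TIMESTAMP
    );
    """)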
High-level design
write request
User -> lb -> write_service-> db
key_service -> db
read request
User <-> lb <-> read_service <- cache <- db
The shorten service handles two cases. Auto-generated: pick a random available hash, mark it unavailable, and assign it. Custom alias: if the alias is already in the urls table, error out and send suggestions; otherwise, if it exists in the available-hashes table, block it there (so the key service can never hand it out) and assign it; if it is in neither table, assign it directly.
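A sketch of that assignment logic with in-memory stand-ins for the two tables (the suggestion format and sample keys are illustrative):

    import random

    urls = {}                                  # short hash -> long URL
    available = {"aaaaa", "bbbbb", "ccccc"}    # pre-generated 5-char keys

    def shorten(long_url, custom_alias=None):
        if custom_alias:
            if custom_alias in urls:
                # Taken: error out and send a recommendation.
                raise ValueError(f"alias taken, try '{custom_alias}1'")
            # Block the alias in the pool so the key service never hands it out.
            available.discard(custom_alias)
            short = custom_alias
        else:
            short = random.choice(sorted(available))  # random available hash
            available.remove(short)
        urls[short] = long_url
        return "sh.rt/" + short

    print(shorten("https://google.com", custom_alias="google"))  # sh.rt/google
    print(shorten("https://example.com"))                        # e.g. sh.rt/aaaaa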
Utilities:
CleanupUtil - removes expired URLs from the urls table and returns their hashes to the available-hashes table
HashGenerationService - monitors entries in the hash table; when the count drops below a threshold, populates it with new keys
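Sketches of both utilities against the same in-memory stand-ins (the threshold value is an assumption; key generation itself is covered under Detailed component design):

    import time

    THRESHOLD = 100_000   # assumed refill trigger for the hash pool

    def cleanup(urls, expiry, available):
        # CleanupUtil: drop expired URLs, recycle their hashes into the pool.
        now = time.time()
        for short in [s for s, t in expiry.items() if t <= now]:
            del urls[short]
            del expiry[short]
            available.add(short)   # hash becomes available again

    def needs_refill(available):
        # HashGenerationService polls this and generates a new batch when true.
        return len(available) < THRESHOLD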
Request flows
write request
User -> lb -> write_service -> db
The user POSTs to /shorten; the load balancer forwards to a write-service instance, which validates the URL, obtains a key (pre-generated hash or custom alias), writes the mapping to the db, and returns the short URL.
read request
User <-> lb <-> read_service <- cache <- db
The user GETs the short URL; the load balancer routes by short-URL hash to a read-service instance, which checks the cache first, falls back to the db on a miss (populating the cache), and responds with a 302 redirect.
Detailed component design
LB:
The load balancer partitions read requests by short-URL hash, so the same key always lands on the same replica and cache hits increase.
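For illustration, hash-based routing might look like the sketch below (replica names are hypothetical; modulo routing is shown for brevity, while a real LB would prefer consistent hashing so that adding a replica reshuffles fewer keys):

    import hashlib

    REPLICAS = ["read-1", "read-2", "read-3"]   # hypothetical replica names

    def route(short_hash):
        # Pin each short URL to one replica to maximize its cache hit rate.
        h = int(hashlib.md5(short_hash.encode()).hexdigest(), 16)
        return REPLICAS[h % len(REPLICAS)]

    print(route("shrturl"))   # always the same replica for this key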
Key generation:
MD5 of the long URL? No - it is comparatively expensive and we would have to handle collisions.
Incrementing keys issued by a key service? Yes.
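With an incrementing counter there are no collisions to handle: each counter value is just written out in base 64 using the alphabet sized in the capacity section. A minimal encoding sketch (the alphabet ordering is an assumption):

    ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_"

    def encode(counter, width=5):
        # Map an incrementing integer to a fixed-width base-64 key.
        out = []
        for _ in range(width):
            counter, rem = divmod(counter, 64)
            out.append(ALPHABET[rem])
        return "".join(reversed(out))

    print(encode(365_000_000))   # 5 chars cover 64^5 > 1 billion keys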
Trade offs/Tech choices
Read-heavy workload - keep read replicas.
When a URL has not replicated yet, read from the primary.
caching:
caching frequently used URLs is beneficial.
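A minimal LRU read-through cache sketch for the read service (OrderedDict stands in for a real cache tier such as Redis; the capacity is illustrative):

    from collections import OrderedDict

    class LRUCache:
        def __init__(self, capacity=100_000):
            self.capacity = capacity
            self.data = OrderedDict()

        def get(self, short, load_from_db):
            if short in self.data:
                self.data.move_to_end(short)   # mark as recently used
                return self.data[short]
            long_url = load_from_db(short)     # cache miss: hit the DB
            self.data[short] = long_url
            if len(self.data) > self.capacity:
                self.data.popitem(last=False)  # evict least recently used
            return long_url

    cache = LRUCache(capacity=2)
    print(cache.get("a", lambda s: "https://a.example"))   # miss -> loads
    print(cache.get("a", lambda s: "unused"))              # hit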
should we load balance based on hostname?
Failure scenarios/bottlenecks
invalid url?
return error response.
two simultaneous requests for the same custom short?
- rely on record-level locking / an atomic conditional insert, so exactly one writer wins
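Record-level locking can be delegated to the store's uniqueness guarantee: both writers attempt an insert and exactly one succeeds. A sketch with SQLite (the table mirrors the urls schema above):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE urls (hash TEXT PRIMARY KEY, long TEXT)")

    def try_claim(short, long_url):
        # Atomically claim a custom alias; the losing writer gets False.
        try:
            db.execute("INSERT INTO urls VALUES (?, ?)", (short, long_url))
            return True
        except sqlite3.IntegrityError:   # primary-key violation: already taken
            return False

    print(try_claim("google", "https://google.com"))   # True
    print(try_claim("google", "https://gmail.com"))    # False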
Future improvements
Should we prevent scraping of short-to-long URL mappings?