My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach with Score: 7/10
by drift_vortex258
System requirements
Functional:
- validate a URL
- make sure the URL has not been shortened by the user before
- take a long URL and minimize it
- save users in a table
- connect a URL mapping to a user
Non-Functional:
- What is the read-to-write ratio? 2:1.
- CAP: is consistency or availability more important? Consistency > availability.
- Want a master-master replication scheme (multiple primary DBs).
- Use a relational DB: models will be consistent, with a guarantee of performing transactions.
Capacity estimation
- 10-50k daily users
- Performance: 2 s for the whole operation.
- Memory used: assume a long URL is 100 bytes and a short URL is 50 bytes; 150 bytes total, rounded up to 200 bytes per mapping.
- Assume each user creates 10 URL maps/day.
- 50k * 10 * 200 = 100 million bytes used daily == 0.1 GB/day.
- 0.1 GB * 30 days = 3 GB/month; 3 * 12 months = 36 GB/year.
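The arithmetic can be sanity-checked with a short script (same assumptions as above: 50k daily users, 10 maps per user per day, 200 bytes per mapping):

```python
# Back-of-the-envelope capacity check for the estimate above.
DAILY_USERS = 50_000
MAPS_PER_USER_PER_DAY = 10
BYTES_PER_MAP = 200  # 100 B long URL + 50 B short URL, rounded up

daily_bytes = DAILY_USERS * MAPS_PER_USER_PER_DAY * BYTES_PER_MAP
daily_gb = daily_bytes / 1e9            # 0.1 GB/day
yearly_gb = daily_gb * 30 * 12          # 30-day months, 12 months

print(f"{daily_gb:.1f} GB/day, {yearly_gb:.0f} GB/year")  # 0.1 GB/day, 36 GB/year
```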
API design
/validate
  input: long URL
  check if it is a valid URL (regex or a battle-tested framework)
  return boolean
/checkUrlInDb
  input: long URL
  check if the user has already generated a map for this URL
  return boolean
/generateUrl
  input: long URL
  call /validate
  call /checkUrlInDb
  apply a hashing function to the URL
  call /saveToDb
  if successful, return the {long url: hash}
/saveToDb
  input: long URL and its hash
  save it to the DB
  return 200 if the save is successful, 400 on error
/getAllMaps
  input: userid
  return all urlMaps for that user
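The notes leave the hashing function open; one common sketch is to truncate a SHA-256 digest of the long URL. The function names below are illustrative, and collision handling is deliberately omitted:

```python
import hashlib
from urllib.parse import urlparse

def is_valid_url(url: str) -> bool:
    """Rough /validate check: scheme and host must be present.
    A battle-tested library would be stricter; this is only a sketch."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def shorten(long_url: str, length: int = 7) -> str:
    """Sketch of the /generateUrl hashing step: a truncated SHA-256
    digest. A real service must handle collisions (e.g. re-hash with
    a salt, or fall back to a counter-based ID)."""
    digest = hashlib.sha256(long_url.encode()).hexdigest()
    return digest[:length]

print(is_valid_url("https://example.com/some/long/path"))  # True
print(shorten("https://example.com/some/long/path"))       # 7-char code
```

Truncating a digest keeps codes deterministic (the same long URL always maps to the same short code), which also makes the /checkUrlInDb lookup cheap.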
Database design
- users table: stores user data (userid, etc.)
- url_maps table: stores each {long URL: short URL} pair
- each url_maps row references the owning user's userid, connecting URL mappings to users
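A minimal sketch of this schema, using SQLite as a stand-in for the relational DB (table and column names are my own illustration, not from the notes):

```python
import sqlite3

# Sketch of the relational schema implied by the requirements:
# a users table, and a url_maps table linking each mapping to a user.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE url_maps (
    map_id    INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    long_url  TEXT NOT NULL,
    short_url TEXT NOT NULL,
    UNIQUE (user_id, long_url)  -- "has this user already shortened it?"
);
""")
conn.execute("INSERT INTO users (user_id, name) VALUES (1, 'alice')")
conn.execute(
    "INSERT INTO url_maps (user_id, long_url, short_url) VALUES (?, ?, ?)",
    (1, "https://example.com/long", "abc123"),
)
# /getAllMaps for user 1
rows = conn.execute(
    "SELECT long_url, short_url FROM url_maps WHERE user_id = ?", (1,)
).fetchall()
print(rows)
```

The UNIQUE constraint gives the /checkUrlInDb guarantee at the database level, so a duplicate insert fails even if the application-level check is skipped.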
High-level design
- CDN to cache static content for user
- Load balancer to handle traffic
- Cache to serve requests already generated. ease load on server
- Server to handle requests not already saved in cache. Also handles saving new mappings into database
- Database to save user data, url mappings, and user to url mappings
Request flows
- Client sends a request to our website -> DNS -> CDN serves the website to the client.
- Client pings the LB -> ping the web service cache to see if the request has already been fulfilled.
- If not in the cache, send the request to the BE -> call /validate -> call /checkUrlInDb -> call /generateUrl -> call /saveToDb.
- If saved to the DB, return {longUrl: shortUrl} -> the request is cached as userid, longurl, {longUrl: shortUrl}.
- Client wants to see all URL mappings -> call /getAllMaps -> renders in the client browser.
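The backend side of this flow amounts to cache-aside logic. In the sketch below, plain dicts stand in for the real cache (e.g. Redis) and the database, and all names are illustrative:

```python
import hashlib

# Dicts standing in for the web service cache and the relational DB.
cache: dict[tuple[int, str], dict] = {}
db: dict[tuple[int, str], str] = {}

def handle_shorten(user_id: int, long_url: str) -> dict:
    """Cache-aside sketch of the request flow: check the cache first,
    fall through to the DB / hashing step, then cache the result
    keyed by (userid, longurl)."""
    key = (user_id, long_url)
    if key in cache:                 # request already fulfilled
        return cache[key]
    if key not in db:                # /checkUrlInDb + /generateUrl
        db[key] = hashlib.sha256(long_url.encode()).hexdigest()[:7]
    result = {long_url: db[key]}     # {longUrl: shortUrl}
    cache[key] = result
    return result
```

The first call for a given (userid, longurl) pair does the full validate/hash/save work; repeat calls are served straight from the cache, easing load on the server as described above.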
Detailed component design
- Load balancer uses round robin: simple and sequential.
- Using Postgres, a relational database, because our data will be structured and won't deviate from a fixed schema. Having the guarantee of performing transactions is more important.
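Round robin can be sketched in a few lines; the backend server names are made up:

```python
from itertools import cycle

# Round-robin load balancing: requests are handed to backend servers
# in a fixed, repeating order.
servers = ["be-1", "be-2", "be-3"]
rotation = cycle(servers)

assigned = [next(rotation) for _ in range(5)]
print(assigned)  # ['be-1', 'be-2', 'be-3', 'be-1', 'be-2']
```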
Trade offs/Tech choices
- SQL vs NoSQL: chose SQL because our data will be structured and won't deviate from a fixed schema. Having the guarantee of performing transactions is more important.
- Master-slave vs master-master replication: master-slave is simpler, but data replication can be a bottleneck, and a slave DB could serve outdated data. Master-master is more reliable: if one DB goes down, the other can still handle operations, at the cost of more complexity, since we need to avoid duplicate data. Since we prioritized consistency, master-master is better.
Failure scenarios/bottlenecks
- Resolving conflicts when data exists in one DB but not the other (master-master divergence).
- Latency if the client is far away from the DB location; could employ horizontal sharding such that we partition the database based on regions.
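A region-based shard router might look like the following sketch; the region names and shard hostnames are illustrative assumptions, not part of the design:

```python
# Route each user to the DB shard for their region to cut latency.
REGION_SHARDS = {
    "us": "postgres-us.internal",
    "eu": "postgres-eu.internal",
    "ap": "postgres-ap.internal",
}

def shard_for(region: str) -> str:
    """Return the shard hostname for a region, falling back to a
    default shard for unknown regions."""
    return REGION_SHARDS.get(region, REGION_SHARDS["us"])

print(shard_for("eu"))  # postgres-eu.internal
```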
Future improvements
- Add a conflict-resolution strategy between the two primaries (e.g. last-write-wins or per-region ownership of keys) to resolve data that exists in one DB but not the other.
- Roll out the region-based sharding described above, plus regional caches, to cut latency for clients far from the DB.