My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach with Score: 7/10

by drift_vortex258

System requirements


Functional:

- validate a URL

- make sure the URL has not been shortened by the user before

- take a long URL and minimize it

- save users in a table

- connect a URL mapping to a user


Non-Functional:

- what is the read-to-write ratio? 2:1

- CAP: is consistency or availability more important? consistency > availability

- want a master-master replication scheme (multiple primary DBs)

- use a relational DB: models will be consistent, with a guarantee of performing transactions




Capacity estimation

- 10-50k daily users

- performance? 2s for the whole operation

- memory used?

  assume a long URL is ~100 bytes and a short URL ~50 bytes

  total ~150 bytes, rounded up to 200 bytes per mapping

- assume a user creates 10 URL maps/day

- 50k * 10 * 200 = 100 million bytes used daily ≈ 0.1 GB/day

- 0.1 GB * 30 days = 3 GB/month; 3 GB * 12 months = 36 GB/year
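To sanity-check the arithmetic, a quick Python sketch (the 50k users, 10 maps/day, and 200 bytes/mapping figures are the assumptions above):

```python
# Back-of-the-envelope storage estimate, using the assumptions above.
daily_users = 50_000            # upper bound of the 10-50k daily-user range
maps_per_user_per_day = 10
bytes_per_mapping = 200         # ~150 bytes of URL data, rounded up for overhead

daily_bytes = daily_users * maps_per_user_per_day * bytes_per_mapping
monthly_bytes = daily_bytes * 30
yearly_bytes = monthly_bytes * 12

print(f"daily:   {daily_bytes / 1e9:.1f} GB")    # 0.1 GB/day
print(f"monthly: {monthly_bytes / 1e9:.1f} GB")  # 3.0 GB/month
print(f"yearly:  {yearly_bytes / 1e9:.1f} GB")   # 36.0 GB/year
```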



API design

/validate

input: long URL

check that it is a valid URL, using a regex or a battle-tested framework (a small sketch follows)

return boolean
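As a rough idea of what the validation step could look like, a minimal Python sketch using the standard library's urllib.parse; picking urlparse over a regex or an external framework is my assumption, not part of the original design:

```python
from urllib.parse import urlparse

def is_valid_url(long_url: str) -> bool:
    """Tiny /validate sketch: accept only absolute http(s) URLs."""
    try:
        parsed = urlparse(long_url)
    except ValueError:
        # urlparse can raise on malformed input (e.g. broken IPv6 brackets)
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

print(is_valid_url("https://example.com/some/long/path"))  # True
print(is_valid_url("not a url"))                           # False
```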


/checkUrlInDb

input: long URL

check whether the user has already generated a mapping for this URL

return boolean


/generateUrl

input: long URL

call /validate

call /checkUrlInDb

apply a hashing function to the URL (one possible approach is sketched below)

call /saveToDb

if successful, return the {long URL: hash} mapping
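The hashing function itself is left open above. One possible sketch (an assumption on my part, not the design's required algorithm) is to hash the user id plus the long URL and base62-encode the result:

```python
import hashlib

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def shorten(long_url: str, user_id: str, length: int = 7) -> str:
    """One way to implement the hashing step of /generateUrl (a sketch).

    Hashing the (user_id, long_url) pair lets different users shorten the
    same URL. Truncating the code means collisions are still possible, so
    /saveToDb would need a uniqueness check and a retry path.
    """
    digest = hashlib.sha256(f"{user_id}:{long_url}".encode()).digest()
    number = int.from_bytes(digest, "big")
    chars = []
    while number and len(chars) < length:
        number, rem = divmod(number, 62)
        chars.append(BASE62[rem])
    return "".join(chars)

print(shorten("https://example.com/some/very/long/path", "user-42"))
```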



/saveToDb

input: long URL and its hash

save it to the DB

return 200 if the save is successful, 400 on error


/getAllMaps

input: userid

return all urlMaps for that user





Database design
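The requirements above call for a users table, a URL-mapping table, and a link between users and their mappings, stored in a relational DB (Postgres per the trade-offs section). A minimal schema sketch follows; the table and column names are illustrative assumptions, and sqlite3 is used only so the snippet runs without a Postgres instance:

```python
import sqlite3

# Schema sketch: users, URL mappings, and the user-to-mapping link.
# The design calls for Postgres; sqlite3 is a stand-in so the snippet
# is self-contained. Table and column names are illustrative.
SCHEMA = """
CREATE TABLE users (
    user_id     INTEGER PRIMARY KEY,
    username    TEXT NOT NULL UNIQUE
);

CREATE TABLE url_maps (
    map_id      INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    long_url    TEXT NOT NULL,
    short_code  TEXT NOT NULL UNIQUE,
    UNIQUE (user_id, long_url)  -- backs /checkUrlInDb: one mapping per user per URL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# /saveToDb and /getAllMaps reduce to simple statements against this schema.
conn.execute("INSERT INTO users (user_id, username) VALUES (1, 'alice')")
conn.execute(
    "INSERT INTO url_maps (user_id, long_url, short_code) VALUES (?, ?, ?)",
    (1, "https://example.com/some/very/long/path", "a1B2c3d"),
)
print(conn.execute(
    "SELECT long_url, short_code FROM url_maps WHERE user_id = 1"
).fetchall())
```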






High-level design

  • CDN to cache static content for the user
  • Load balancer to handle traffic
  • Cache to serve requests that have already been generated, easing load on the server
  • Server to handle requests not already in the cache; also handles saving new mappings into the database
  • Database to save user data, URL mappings, and user-to-URL mappings



Request flows

client sends a request to our website
-> DNS
-> CDN serves the website
-> client pings the LB
-> ping the web service cache to see if the request has already been fulfilled
-> if not in the cache, send the request to the BE
-> call /validate
-> call /checkUrlInDb
-> call /generateUrl
-> call /saveToDb
-> if saved to the DB, return {longUrl: shortUrl}
-> the request is cached as (userid, longUrl) -> {longUrl: shortUrl}

client wants to see all URL mappings
-> call /getAllMaps
-> renders in the client browser
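The cache check at the front of this flow can be sketched in a few lines of Python; the in-memory dict and the backend_generate placeholder are stand-ins I made up, since the write-up doesn't name a specific cache:

```python
# Sketch of the "check cache before hitting the backend" step.
# A dict stands in for the real web-service cache; backend_generate is a
# placeholder for /validate -> /checkUrlInDb -> /generateUrl -> /saveToDb.
cache: dict[tuple[str, str], dict[str, str]] = {}

def shorten_with_cache(user_id: str, long_url: str) -> dict[str, str]:
    key = (user_id, long_url)
    if key in cache:                                  # request already fulfilled
        return cache[key]
    mapping = backend_generate(user_id, long_url)     # hypothetical backend call
    cache[key] = mapping                              # cached as (userid, longUrl) -> {longUrl: shortUrl}
    return mapping

def backend_generate(user_id: str, long_url: str) -> dict[str, str]:
    # Placeholder for the real backend pipeline described above.
    return {long_url: "short.ly/a1B2c3d"}

print(shorten_with_cache("user-42", "https://example.com/some/very/long/path"))
print(shorten_with_cache("user-42", "https://example.com/some/very/long/path"))  # served from cache
```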





Detailed component design



  • Load balancer uses round robin: simple and sequential (a minimal sketch follows this list).
  • Using Postgres, a relational database, because our data will be structured and won't deviate from the norm. Having the guarantee of performing operations is more important.
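A minimal round-robin sketch, assuming an in-process list of backend addresses (the addresses are made up):

```python
from itertools import cycle

# Round-robin load-balancing sketch: requests are handed to backends
# in a fixed, repeating order.
backends = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]  # illustrative addresses
next_backend = cycle(backends)

for request_id in range(5):
    print(f"request {request_id} -> {next(next_backend)}")
```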



Trade offs/Tech choices

- SQL vs NoSQL: chose SQL because our data will be structured and won't deviate from the norm. Having the guarantee of performing operations is more important.

- master-slave vs master-master replication: master-slave is simpler, but data replication can be a bottleneck, and a slave DB could have outdated data.

  master-master is more reliable: if one DB goes down, the other can still handle operations. It adds complexity, since we need to avoid duplicate data.

  since we prioritized consistency, master-master is better.





Failure scenarios/bottlenecks

- resolving if data exists in one DB but not the other

- latency if the client is far away from the DB location

- could employ horizontal sharding such that we partition the database based on region (see the sketch below)
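A sketch of that region-based partitioning; the regions and shard connection strings are made-up examples:

```python
# Region-based horizontal sharding sketch: route each user's data to the
# database closest to them. Regions and DSNs are illustrative assumptions.
REGION_SHARDS = {
    "us": "postgres://us-east-db.example.internal/tinyurl",
    "eu": "postgres://eu-west-db.example.internal/tinyurl",
    "apac": "postgres://apac-db.example.internal/tinyurl",
}

def shard_for(user_region: str) -> str:
    # Fall back to one shard if the user's region isn't explicitly mapped.
    return REGION_SHARDS.get(user_region, REGION_SHARDS["us"])

print(shard_for("eu"))    # postgres://eu-west-db.example.internal/tinyurl
print(shard_for("mena"))  # falls back to the us shard
```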





Future improvements
