System requirements


Functional:

List functional requirements for the system (Ask interviewer if stuck)...

  • Generates a unique shortcode
  • Short code is associated with long URL in database
  • Accessing short url looks up short code in database to retrieve long URL
  • Long URL is returned to user and user is redirected
  • short code is 7 characters long



Non-Functional:

List non-functional requirements for the system...

  • Short code generation should be quick (low latency)
  • High availability



Capacity estimation

1 million URLs a day = 11 URLs per second

100 million URL short code look ups a day = 1160 short code look ups a day

Assuming it runs for 10 years, this means we must support 3.65 billion short codes



API design

POST api/v1/createShortUrl:

  • request parameter: longUrl
  • returns shortURL


GET api/v1/getLongUrl:

  • request parameter: shortUrl
  • statusCode: 301
  • location: longURL




Database design

Schema


id: id

longUrl: string

shortUrl: string

createdAt: timestamp



High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design...


graph LR

A(Client) --> B(Load Balancer) --> C(Web Servers) --> D(Databases)

B-->Z(ZooKeeper)

C --> BA(Cache)




Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...


To create a shortURL, client will send a POST request to /api/v1/createShortURL. This will return a shortURL


To retrieve a longURL, client will send a GET request to /api/v1/getlongUrl. This will return the corresponding shortURL.


sequenceDiagram

Client->>Web Server: POST /api/v1/createShortUrl: longUrl

Web Server->>Client: returns shortUrl

Client->>Web Server: GET /api/v1/getLongUrl: shortUrl

Web Server->>Client: returns 301 direct: shortUrl






Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...


To generate shortURLs, we can use a hash function like MDA or SHA. MDA will be good enough.We hash longUrls with MDA and use the first 7 characters to generate the shortURL. This is not guaranteed to be random and can cause collisions, we so must always check if we have stored the same shortURL. We repeat until we have a unique shortURL.


Zookeeper will allocate a range of IDs to a server. This will help distribute storage of shortURLs.


A cache like Redis will store the top 20% of URLs visited. This can improve the lookup speeds of shortURLs that are frequently visited.




Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...


Used MDA instead of SHA because we only need the first 7 characters.


Could have base 62 encoded the shortURL id but you may have short codes where they are less than 7 characters long. This is also a security concern as if the shortURL id sequential/predictable.




Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.


A database instance may shut down, which can result in longUrls not being returned any longer. 




Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?


Each database has master master design where each database has one or more backups. We want this to have high availability.