System requirements


Functional:

List functional requirements for the system (Ask interviewer if stuck)...

  1. Create a short URL for a URL requested.
  2. When a request with short URL is received, returns the long URL.
  3. User can create custom short URL.
  4. Provide analytics on URLs.
  5. The URL can expire.
  6. The URL can be deleted.


Non-Functional:

List non-functional requirements for the system...

The service should be scalable, low latency, highly available.

The system should have proper authentication and authorization.

Ensure data durability and high throughput.

The service should contain monitoring and logging system.


Capacity estimation

Estimate the scale of the system you are going to design...

Assume the service generates 100 shortened URLs per second, everyday 8640000 URLs will be generated.

Assuming the URLs expire 1 year after it is generated. We will have approximately 8640000 * 365 ~ 10^7 * 400 = 4 * 10 ^ 9 urls

Assume an average URL takes 100 bytes to store and each short URL takes 20 bytes to store. The metadata of a Url pair would be around 80 bytes. The total size of storage will be approximately 4 * 200 * 10 ^ 9 = 8 * 10 ^ 11 bytes or 8 * 10 ^ 2 GB of storage.

Assume every URL get clicked 10 times a day, that would make it 4 * 10 ^ 10 clicks a day, and 4 * 10 ^ 5 clicks per second.



API design

Define what APIs are expected from the system...

  1. shortenUrl
  2. request: url (String), custom_url (optional String)
  3. response: short_url
  4. redirect
  5. request: shortUrl (string)
  6. response: url (string)
  7. fetchAnalytics
  8. request: shortUrl (string)
  9. response: numberOfClicks
  10. deleteUrl
  11. request: shortUrl (string)
  12. response: succeed / failure



Database design

Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...

The system use one table only

Schema:

shortUrl: varchar

url: varchar

owner: varchar # userId

expirationDate: datetime

numberOfClicks: int



High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design...

The system should contain:

  1. A client, which should be a website.
  2. A server that handles all read / write traffic.
  3. A database that stores all urls and their metadata, analytics.
  4. A cache to help serve the high volume of read / write.




Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...

  1. shortenUrl: web sends a request with a url to the server, the server generates a shortUrl for the url, and query database to store them. Once it is stored successfully, return the shortUrl to user.
  2. redirect: web sends a request with a shortUrl to the server, the server first try to fetch the url in cache, if it results in a cache miss, the server will then query the database for the url. If a url is found, returns the url to the web. Otherwise, returns an error code 404.
  3. deleteUrl: Web sends a request with a shortUrl to the server. The server query the database to delete the url under the condition that the owner of the url is the user who sends the request. If a record is deleted, server looks for a match of the shortUrl in the cache. If a match is found, the record is deleted from the cache. The server then returns succeed or failure to the web.
  4. fetchAnalytics: web sends a request with a shorturl to the server. The server fetches the analytics data of the shorturl from the database under the condition that the owner of the shorturl is the user sending the request. The server then sends what's returned by the database to the web.





Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...

  1. The server. We need to use a good algorithm to generate the shortened url. We could use [0-9][a-z][A-Z] characters in the url, and that makes it 62 characters. To make sure all urls are unique, we need to make sure the url is at least 6 characters in length. One way to achieve this is to randomly generate a 6-characters long string, query the database to see if it exists, if not, stores the string in the database. Otherwise, regenerates a string.
  2. The database fits in the disk of a single machine, however, to ensure high availability of service, the database should have redundancy. There could be a replica of the database, which serves as a backup. When a server writes to the database, it should write to both replicas, and when it reads from the database, it could read from one of them.
  3. Similarly, there should be redundency of the server to maintain high availability.


Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

Using randomly generated urls for its simplicity, trading for the chance of generating duplicates of short urls and have to re-run.



Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.






Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?