Requirements


Functional Requirements:


  • Create a short URL for a given long URL.
  • Return the long URL associated with a given short URL.
  • Short URLs should not be easy to reverse-engineer or enumerate, so avoid schemes such as a plain incrementing counter or time-based UUIDs.


Non-Functional Requirements:


  • High availability
  • Low latency
  • Horizontal scalability
  • Assume a medium-scale system with about 1B users, 10% of whom (100M) are active. If each active user makes about 3 requests/minute on average, total traffic is about 300M requests/minute, or roughly 5M requests/second. Treating most of that traffic as redirects (reads), assume URL creation averages about 5 writes/second. Over a 100-year lifetime that gives 5 × 60 × 60 × 24 × 365 × 100 ≈ 15.8B possibly unique URLs.
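The estimate above can be reproduced in a few lines of Python. This is a sketch: the function names are illustrative, and the 5 writes/second figure is the working assumption of this design, not a measured rate.

```python
# Back-of-envelope capacity estimate for the URL shortener.
SECONDS_PER_DAY = 60 * 60 * 24
DAYS_PER_YEAR = 365  # ignores leap years, like the estimate in the text


def total_requests_per_second(active_users: int, requests_per_minute: float) -> float:
    """Aggregate request rate if every active user makes `requests_per_minute`."""
    return active_users * requests_per_minute / 60


def urls_over_lifetime(writes_per_second: int, years: int) -> int:
    """Number of short URLs created at a constant write rate over `years`."""
    return writes_per_second * SECONDS_PER_DAY * DAYS_PER_YEAR * years


print(total_requests_per_second(100_000_000, 3))  # 5000000.0
print(urls_over_lifetime(5, 100))                 # 15768000000
```

The first figure shows why the write rate has to be assumed separately: 100M users at 3 requests/minute is a read-scale number, not a creation rate.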


API Design



Domain name: trnc.io


Available endpoints:


POST trnc.io/api/v1

{
  "url": "https://codemia.io"
}

Returns {"url": "trnc.io/api/v1/cmio"}


GET trnc.io/api/v1/cmio

301 redirect to https://codemia.io
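The two endpoints can be sketched as an in-memory service to pin down the request and response shapes. Everything here is hypothetical: `InMemoryShortener` and its methods are illustrative names, there is no persistence or HTTP layer, code generation is delegated to a caller-supplied `make_code` function, and returning 404 for unknown codes is an assumption the API above does not state.

```python
from collections.abc import Callable


class InMemoryShortener:
    """Hypothetical in-memory stand-in for the two trnc.io endpoints."""

    def __init__(self, make_code: Callable[[], str], base: str = "trnc.io/api/v1/"):
        self.make_code = make_code        # supplies new short codes
        self.base = base
        self.codes: dict[str, str] = {}   # short code -> long URL

    def create(self, body: dict) -> dict:
        """POST trnc.io/api/v1 with {"url": <long URL>} -> {"url": <short URL>}."""
        code = self.make_code()
        while code in self.codes:         # skip any code that is already taken
            code = self.make_code()
        self.codes[code] = body["url"]
        return {"url": self.base + code}

    def resolve(self, code: str) -> tuple[int, dict]:
        """GET trnc.io/api/v1/<code> -> (status, headers): 301 with Location, or 404."""
        long_url = self.codes.get(code)
        if long_url is None:
            return 404, {}
        return 301, {"Location": long_url}


codes = iter(["cmio"])
svc = InMemoryShortener(make_code=lambda: next(codes))
print(svc.create({"url": "https://codemia.io"}))  # {'url': 'trnc.io/api/v1/cmio'}
print(svc.resolve("cmio"))                         # (301, {'Location': 'https://codemia.io'})
```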


High-Level Design



  1. There are 26 letters in the English alphabet; doubling that for capitalised letters and adding 10 digits gives 62 characters (base62). To find the required length of the short code, solve 62^L ≥ 15.8B, the URL count for 5 writes/second over 100 years: log(15,768,000,000)/log(62) ≈ 5.69, so 6 characters are enough. That gives 62^6 = 56,800,235,584 possible codes, which lasts roughly 360 years at the estimated write rate.
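The length calculation can be checked with integer arithmetic, which avoids floating-point rounding near exact powers. A minimal sketch; the function names are illustrative, and 15,768,000,000 is the 5 writes/second over 100 years estimate.

```python
ALPHABET_SIZE = 26 + 26 + 10  # a-z, A-Z, 0-9 -> 62


def min_code_length(n_urls: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Smallest length L with alphabet_size ** L >= n_urls."""
    length, capacity = 0, 1
    while capacity < n_urls:
        length += 1
        capacity *= alphabet_size
    return length


def years_of_capacity(length: int, urls_per_year: int,
                      alphabet_size: int = ALPHABET_SIZE) -> float:
    """How many years the code space lasts at a constant yearly creation rate."""
    return alphabet_size ** length / urls_per_year


needed = 15_768_000_000                            # ~5 writes/s for 100 years
print(min_code_length(needed))                     # 6
print(ALPHABET_SIZE ** 6)                          # 56800235584
print(round(years_of_capacity(6, needed // 100)))  # 360
```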



Detailed Component Design



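  1. ID generation (an encoding scheme rather than a true hash): keep a counter that advances by a random increment instead of 1, start it at an arbitrary offset such as 12345678 rather than 0, and base62-encode each value to produce the short code. The maximum increment is bounded by how much larger the code space is than the expected number of URLs. With 6 characters (62^6 ≈ 56.8B codes) and about 15.8B URLs over 100 years, the headroom is only about 3.6×, so increments could be at most 3; with 7 characters (62^7 ≈ 3.52T codes), random increments of up to 223 still fit. This makes codes harder to guess than a plain sequential counter, but the IDs remain roughly sequential, so an attacker who scans values near a known code can still find valid URLs.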
  2. Rate limiting: place an API gateway such as Kong in front of the service to handle rate limiting, authentication (for example JWT, if needed), and load balancing. Rate limiting mitigates the enumeration weakness in point 1 by capping how fast a single client can probe codes: each client is limited to 20 requests per minute.
  3. Database replication: not yet designed (open question).
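The random-increment scheme from point 1 can be sketched as follows. This assumes a single process owns the counter (a real deployment would need a shared, durable counter) and uses Python's `secrets` module for the random step. `base62_encode`, `RandomIncrementIdGenerator`, the 7-character width and the start offset are illustrative choices; the step bound is derived as 62**7 // 15,768,000,000.

```python
import secrets
import string

# 0-9, A-Z, a-z: same order as ASCII, so fixed-width codes sort like their IDs.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_encode(n: int, width: int) -> str:
    """Encode a non-negative integer in base62, left-padded to `width` characters."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    code = "".join(reversed(digits)).rjust(width, ALPHABET[0])
    if len(code) > width:
        raise OverflowError("ID space exhausted for this code width")
    return code


class RandomIncrementIdGenerator:
    """Counter that advances by a random step in [1, max_step] on every call."""

    def __init__(self, start: int, max_step: int, width: int):
        self.current = start
        self.max_step = max_step
        self.width = width

    def next_code(self) -> str:
        self.current += 1 + secrets.randbelow(self.max_step)  # step in 1..max_step
        return base62_encode(self.current, self.width)


expected_urls = 15_768_000_000
width = 7
max_step = 62 ** width // expected_urls  # 223: headroom between code space and demand
gen = RandomIncrementIdGenerator(start=12_345_678, max_step=max_step, width=width)
print(gen.next_code())
```

Each call returns a 7-character code whose ID is 1 to 223 above the previous one, so codes never collide, but they still cluster, which is why point 2 adds rate limiting.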
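In Kong, rate limiting is configured as a plugin rather than written by hand, so the sketch below only illustrates the "20 requests per client per minute" policy from point 2 in-process, using a sliding-window log. It is not Kong's implementation; the class name is hypothetical, and identifying clients by an ID string (such as an API key or IP address) is an assumption.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `limit` requests per client in any rolling `window` seconds."""

    def __init__(self, limit: int = 20, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits: dict[str, deque] = defaultdict(deque)  # client id -> request timestamps

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        q = self.hits[client_id]
        while q and now - q[0] >= self.window:  # drop timestamps outside the window
            q.popleft()
        if len(q) >= self.limit:
            return False  # caller would respond with 429 Too Many Requests
        q.append(now)
        return True


limiter = SlidingWindowRateLimiter()
print(limiter.allow("client-a"))  # True
```

Rejected requests are not recorded, so a client that keeps retrying is unblocked as soon as its oldest accepted request leaves the window.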