Requirements


Functional Requirements:


  • Create a short URL for a given long URL.
  • Return the long URL associated with a given short URL.



Non-Functional Requirements:


  • List the key non-functional requirements (eg low latency, scalability, reliability, etc.)...
  • Latency - i. URL creation latency should not take too long. ii. URL fetching and redirection should be very minimal time.
  • Availability: System should be available with very less downtime. Let's say Four 9s. That is 99.99% downtime overall.
  • Scalability: We can assume there will be million user requests per month and scalable to 100 millions.

For redirection latency, we need to ensure following:

Redirection latency can be caused majorly by database fetch.

So we need to optimize data fetch operation performance by using db indexing, db partitioning, cache in front of db for fetch operation for frequently requested urls.


Capacity Estimation

Estimate the scale of the system. Consider daily active users, read/write ratio, storage requirements, bandwidth, and any relevant QPS calculations...


Now I have assumed that million user requests per month for start.

Overall 30000 requests per day to scale upto 3million per day.

I assume 20% is write and 80% is read. And per user 10 requests per day.

That is about 0.3 million per month. of 30000 * 10 per day / 24 * 3600 per seconds.

At 30000 requests per day, means around 6000 new URLs per day at beginning.

If a we take one url, on average with 1 short url + 1 long url + id + timestamp + index we can get about 200bytes to 2000 bytes in worst case. So for 6000 entries per day. It's equivalent to 11Mbs per day. With 30 * 365 days per year, worst case it's about 120 gb per year. With growth of 100x we can assume per year we need about 6 to 12 TB of space.


And about QPS, I think, we need 300000 query per day = 24 * 3600 seconds = Around 3 to 4 queries per 1 seconds. Scalable to 300-400qps easily.


Bandwith will be same as 5 * space required per day. 55mbps at basic and 5Gbps per day easily at minimum scale.


API Design

Define the APIs expected from the system. This is your chance to analyze and define the read and write paths so that you can come up with the high-level design...


So for APIs, we need two parts on basics:

i.) /createShortUrl (RequestParam longURL)

ii.) /fetchLongUrl(requestParam shortUrl)

some advanced endpoints like:

iii) /usage (RequestParam shortURL)

iv) /setRateLimit(RequestParam shortURL)

v) /cleanUp(Requestparam longURL)



High-Level Design

Describe the overall system architecture. Identify the main components needed to solve the problem end-to-end. Use the diagramming tool to create a block diagram.


So we have client.

  1. Client provides long url.
  • Client can be Web or Mobile
  1. API gateway

i. will handle incoming shorturl Creation requests, send it downstream to 3. url shortener service. it will receive the short url from url shortener service.

ii. also handle income fetchLongUrl request, sends it to downstream to 6. QueryService it will return the response to browser

3. URL shortener service

this will andle the createShortUrl requests with long url and store in db the shorturl for given long urls. Then it will return the same back to user.

4. Database to store Shortened url and data

Database will store the shorturl long url mapping and any other analytics data.

5. Cache for read query in sync of database for frequent reads.

6. QueryService

this is where fetch requests are diverted by api gateway. It takes the short url. and returns the long url to browser for redirection.

it also periodically stores the frequent requests to cache for faster read. if cache misses, it will fetch from db and then update cache and serve the user/browser

7 We need a Unique Id generator service which will synchronously generate new ids. URL shortner service instances will use this id generator to generate short urls to remove collision.

In case of high load both services will use load balancer. If multiple instance of uniqueid service are required, we can use partition like 0-1000, 2000-5000 range in unique id generator to separate concerns of duplicate id generation.

8 We can use cache to store recently generated urls also


Database Design

Define the data model. Identify the main entities, their attributes, and relationships. Consider the choice of database type (SQL vs NoSQL) and justify your decision based on access patterns...


Database will have atleast following:

Table: URL Mapping

id (number) primary key,

longurl(varchar(100) or string), shorturl(varchar(20)/ string),

timestamp - creation date


we can retain a separate nosql database for analytics table:


Record: ShortUrl analytics:

id(number):

last access: timestamp (date time),

viewcount(number),

geographical region: (APAC, US East, US West)


I want to highlight that I need to use indexing on id and short url for faster retrieval. For id column we can use the already generated Unique ID to ensure no data duplication.

For short_url field, indexing is required for faster lookup even from database.


We don't need denormalization in this DB because there won't be any joins even across multiple instances of data in case of horizontal scaling and sharding.

We will keep the analytics data in nosql db which supports horizontal scaling and easy efficient update of analytics data.

Normalization already happening through indexing of short_url and id colums.


Detailed Component Design

Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.