Designing A Simple Url Shortening Service A TinyURL Approach - System Design

System requirements

Functional:

User should be able to create a shortUrl by supplying a longUrl
User should be able to delete/deactivate a shortUrl they created
Users should be able to retrieve the longUrl using the shortUrl they supply to the system

Non-Functional:

List non-functional requirements for the system...

System should be highly available for the reads (accessing a longUrl from shortUrl)
System should be scalable, adding new database servers when current space is exhausted
Clean up of archived/deactivated URLs after a threshold time eg. 30 days
Should be able to support high burst throughput for reads of highly popular urls

Capacity estimation

Estimate the scale of the system you are going to design...

Storage requirement

Number of create requests per sec= 200

Number of create requests per year = 200 * 60 * 60 * 24 * 365 = 17,280,000 ~ 6205 million

Assuming 10% growth every year, at end of 5 years, number of requests ~ 9082 million which is ~9 Billion

We consider, each request creates a MD5 hash of URL (taking first 7 characters) to represent the short url which will be 7 bytes and also record the long url (average size ~300 Bytes) and the ttl (timetolive in epoch time) ~7 bytes in database.

So we will need a database size of 314 * 9 * 1000000000 Bytes = 2826 GB ~3 TB. Consider we create multiple replicas for durability and availability, lets approximate data size ~10TB.

Throughput Requirement

Also, assuming 200 create requests per second rising by 10% each year leading to ~300 TPS by end of 5 years

For read requests 20,000 TPS

~20k TPS overall

This is not a huge number but will require multiple Application Servers to be hosted

Bandwidth Requirement

For 20k TPS where each request returns ~300 Bytes, bandwidth required = 20 * 1000 * 300 * 8 /1000 Kbps = 48000 Kbps

Considering each 64 Kbit regular server supports 64 KBit instructions/sec processing, number of servers required = 48000/64 ~800 servers

Upper limit of servers to support burst traffic, suppose we encounter that all DAU (845,865 users by end of 5 years) request a read at the same time, number of servers required = 845,865* 300 * 8/(64000 * 8) ~ 4000 servers

API design

Define what APIs are expected from the system...

POST: /api/shorturl?url=longUrlValue
1. parameter url: longUrlValue which will be a String value of the URL supplied by the user to be shortened
2. This endpoint will be accessed by the client to create a resource of type shorturl using the longUrlValue String supplied by the user
GET: /api/shorturl/shortUrlValue
1. Client uses this endpoint to get the longUrlValue stored in the system by supplying the shortUrlValue as the path variable
DELETE: /api/shorturl/shortUrlValue
1. Client uses this endpoint to archive or delete the shortUrlValue resource they created

Database design

Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...

The loadbalancer provides a way to effectively distribute requests to the

We use a NoSQL Mongo db as a database to store our shortUrl as the key and the value is the long url string. We create database shards using consistent hashing technique. The keys (shortUrls) that are hashed by the same function fall under a specific partition and is recorded by the cluster manager.

High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design. If you are unfamiliar with the tool, you can simply describe your design to the chat bot and ask it to generate a starter diagram for you to modify...

CDN: A CDN server is used from locality to check if a frequent shortURL from that region is being accessed. Stores top 10,000 hot shortUrls.
Loadbalancer: Provides a way to distribute load between application servers.
Application Server: Multiple instances of application servers over different regions provide a way to redistribute incoming requests between themselves to provide high availability and response times
DB Nodes: Key-value store type DB . Each db node retains partitions and replicas of the partitions. Partitioning is done based on the consistent hashing of the node ids and the mapping is stored in Cluster Manager. Each shortUrl is hashed by the server and the partition id is assigned based on mapping from Cluster Manager.
Cluster Manager: Provides application server the partition id -> Db node mapping so that it can quickly route the user to shortUrl they are looking for.

Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...

A user sends any request via client.

The client checks if the request was recently served by checking the CDN server and returns the result if available
Otherwise, the loadbalancer redirects the request to application server (also helps in eliminating any duplicate create requests)
Loadbalancer distribution is based on a mix of geolocation and roundrobin. On the first level, requests are routed to specific geolocation, where another level of loadbalancer does routing based on pure round robin since the requests are nearly similar in nature (remember write request initially does a search initially as well)
For Create api request
1. the application server does MD5 hashing of the long URL + epoch timestamp) and takes first 7 characters. This becomes the variable part (shortUrl) of the user accessible short url, for eg. https://www.myshorturl.com/<shortUrl>
2. it then uses the consistent hashing function to determine the correct partition and db node it should be routed to
3. if there is a successful insertion into the partition, return success to user, else if there is an error, we let the application automatically retry after a time interval
4. The new entry is lazily copied to partition replicas under other nodes after the response is returned.
For read requests
1. steps a through d are repeaterd
For Delete request, if shortUrl is found, the application updates the ttl value to 0, which is cleaned up by Cleanup Service later.
For read requests, we create scalable shard counters on CDN nodes which will help us determine the top K hot short URLs accessed, we can reset/refresh this list on a periodical basis, eg. every 4 hours.

Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...

The Cleanup service runs periodically on each primary partition (information retrieved from cluster manager) and does two important jobs:

Updates ttl value for each data entry by x-t, where x is ttl value it just observed and t is currentTimestamp - lastUpdateTimestamp for this record.
For all values it sees as 0, it deletes the data entry (shortUrl and longUrl entry)

Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

The Mongo Db provides high read availability and latency because of its Leader-Follower replication model. This allows selection of a primary database partition node for writing data which is then propagated synchronously among locally situated replicas. However, we can speed up by asynchronously updating global replicas. This will reduce the impact on availability while making reads more consistent.

Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

Due to eventual consistency of the database nodes, it might be possible that application misses the shortUrl newly created in other node on read requests. Having a retry mechanism where the cluster manager provides a different node replica for the given partition (as in MongoDb) can lower the chance of complete failure but can reduce availability, if this retry is limited to approx 3 times at 500 milli seconds interval, there is high chance of success. Also availability is not heavily affected
Loadbalancers restrict the reprocessing of duplicate requests. If there are two requests with same url at same timestamp, it might create a collision. To handle this, we enable automatic retries at the application level, which after a retry creates a newer timestamp and avoids the collision.
To handle bursts of requests, we can use containerization service such as Kubernetes and a metrics based scaling technique for eg. Kubernetes Event Driven Technique to scale application servers horizontally upto the upper limit of max servers we provided during estimation. If traffic continues to scale, we can use rate limiter to cut down traffic and send errored out requests to retry queue.

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?

For ever increasing user base, we may think of separating out write application server and read application servers in ratio 1:100. This will avoid writes blocking read calls.

Secondly, we can have a cache server to store recently queried URLs.

Thirdly, we can implement a retry queue to handle retry requests, for eg. SNS topic