Requirements
Functional Requirements:
- Create a short URL for a given long URL.
- Return the long URL associated with a given short URL.
- Delete a URL or update its availability (extend its lifetime).
Non-Functional Requirements:
- List the key non-functional requirements (eg low latency, scalability, reliability, etc.)...
- Low latency, graceful degradation, horizontal scalability, high availability, reliability.
- Link expiration and uniqueness of short URLs.
- Load: 100 new links per second, 10,000 redirects per second.
- Analytics per redirect: geography, timestamp, device, and possibly IP address.
- No custom aliases.
- No editing of the target URL. Links can only be added, extended, or deleted.
- Eventual consistency is acceptable.
- Malware verification of the long URL before a link is created.
- If different users shorten the same URL, they get different short links.
- A management link for extending or deleting the URL is sent to the user's email.
- Short codes are 8 characters long, drawn from a Base64 alphabet (the URL-safe variant avoids "+" and "/" in paths).
- Storage time is five years and can be extended through the management link.
- Anti-bot protection for link creation.
- A delay before the next link creation from the same email, IP, or device.
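As a quick sanity check on these numbers, the short arithmetic below compares the links created over the five-year storage time with the number of distinct 8-character codes. It assumes a 64-symbol alphabet and a constant rate of 100 new links per second. The variable names are illustrative only.

```python
# Back-of-the-envelope check: does an 8-character, 64-symbol code space
# cover five years of link creation at 100 new links per second?
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

new_links_per_second = 100
redirects_per_second = 10_000
retention_years = 5
alphabet_size = 64      # assumption: 64-symbol (Base64-style) alphabet
code_length = 8

links_created = new_links_per_second * SECONDS_PER_YEAR * retention_years
keyspace = alphabet_size ** code_length
read_write_ratio = redirects_per_second // new_links_per_second

print(f"links created in {retention_years} years: {links_created:,}")
print(f"available codes: {keyspace:,}")
print(f"codes per created link: {keyspace // links_created:,}")
print(f"read/write ratio: {read_write_ratio}:1")
```

About 15.8 billion links are created in five years, against roughly 2.8 x 10^14 available codes, so the code space is several orders of magnitude larger than needed. The 100:1 read/write ratio is what motivates the cache on the redirect path.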
API Design
Define the APIs expected from the system. This is your chance to analyze and define the read and write paths so that you can come up with the high-level design...
/api/v1/checkurl Checks a long URL for malware, phishing, and compliance with our rules. Receives the long URL. Returns JSON with a short-lived validation token. The token is a JWT or something similar.
/api/v1/addurl Adds a new URL. Receives the validation token and the long URL. Returns JSON with the short URL.
/api/v1/geturl Returns the long URL for a short URL. The short form is /shortURL.
GET /shortURL redirects with 302 to the long URL. (A 301 would be cached by browsers, so repeat visits would skip our servers and be lost to analytics.)
/api/v1/manageurl Deletes a URL or updates its availability. Receives the short URL, a management_token, and a command (update/delete). Returns the result of the execution.
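To make the validation token concrete, here is a minimal sketch of a short-lived, HMAC-signed token in the JWT compact format (HS256), using only the Python standard library. The secret, the claim names ("url", "code", "exp"), and the 120-second lifetime are assumptions for illustration. A real deployment would more likely use an established JWT library and managed keys.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # hypothetical key; never hard-code a real secret

def _b64url(data: bytes) -> str:
    """URL-safe Base64 without padding, as used by JWT."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def _sign(signing_input: str) -> bytes:
    return hmac.new(SECRET, signing_input.encode("ascii"), hashlib.sha256).digest()

def issue_token(long_url, short_code, ttl_seconds=120, now=None):
    """Issue a token stating that long_url passed validation for short_code."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"url": long_url, "code": short_code, "exp": now + ttl_seconds}
    parts = [_b64url(json.dumps(p, separators=(",", ":")).encode("utf-8"))
             for p in (header, payload)]
    signing_input = ".".join(parts)
    return signing_input + "." + _b64url(_sign(signing_input))

def verify_token(token, long_url, short_code, now=None):
    """Return True only for an untampered, unexpired token for this URL and code."""
    now = int(time.time()) if now is None else now
    try:
        head, body, signature = token.split(".")
        if not hmac.compare_digest(_sign(f"{head}.{body}"), _b64url_decode(signature)):
            return False
        payload = json.loads(_b64url_decode(body))
    except (ValueError, TypeError):
        return False
    return (payload.get("url") == long_url
            and payload.get("code") == short_code
            and now < payload.get("exp", 0))
```

Binding the token to both the long URL and the short code means /api/v1/addurl can reject a token that was issued for a different URL, without a database lookup.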
High-Level Design
Describe the overall system architecture. Identify the main components needed to solve the problem end-to-end. Use the diagramming tool to create a block diagram.
/api/v1/addurl
Load Balancer -
web server -
(pre-generated short urls)(+ rate limiter)
validation service -
(verifies if long url passed verification - JWT)
queue -
(send email)(add to DB)
database worker -
database
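The rate limiter in this path, together with the requirement of a delay between link creations from the same email, IP, or device, could be sketched as a simple cooldown check. The class name, the identity string format, and the in-memory dictionary are assumptions. With many stateless web servers the state would have to live in a shared store such as the Redis cluster.

```python
import time

class CooldownLimiter:
    """Rejects a request if any of its identities was seen within the cooldown."""

    def __init__(self, cooldown_seconds, clock=time.monotonic):
        self.cooldown = cooldown_seconds
        self.clock = clock            # injectable so the logic can be tested
        self._last_seen = {}          # identity -> time of last accepted request

    def allow(self, *identities):
        now = self.clock()
        # Refuse if the email, the IP, or the device is still cooling down.
        for identity in identities:
            last = self._last_seen.get(identity)
            if last is not None and now - last < self.cooldown:
                return False
        # Accept: start a new cooldown for every identity on this request.
        for identity in identities:
            self._last_seen[identity] = now
        return True
```

A call might look like `limiter.allow("ip:203.0.113.7", "email:a@example.com", "device:abc")`. A second request that shares any one of these identities is refused until the cooldown has passed.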
/api/v1/geturl
Load Balancer (Health Checks) -
web server -
(async save analytics)
cache -
database
/api/v1/checkurl
Load Balancer (Health Checks) -
web server (Short Polling) -
validation queue -
validation worker -
JWT generator
/api/v1/manageurl
Load Balancer (Health Checks) -
web server - (verify token)(invalidate cache)
queue -
database worker - (update/delete record)(soft delete)
database
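The delete branch of this path can be illustrated with a soft delete: the record is marked as deleted instead of being removed, and the cache entry is invalidated. The sketch below collapses the web server and database worker steps into one function, skips token verification and the queue, and uses an in-memory SQLite table with a hypothetical schema.

```python
import sqlite3

def soft_delete(db, cache, short_code, now):
    """Invalidate the cache entry and mark the record deleted; True if a live record existed."""
    cache.pop(short_code, None)
    cursor = db.execute(
        "UPDATE urls SET deleted_at = ? WHERE short_code = ? AND deleted_at IS NULL",
        (now, short_code),
    )
    db.commit()
    return cursor.rowcount == 1

def lookup(db, short_code):
    """Return the long URL, ignoring soft-deleted records."""
    row = db.execute(
        "SELECT long_url FROM urls WHERE short_code = ? AND deleted_at IS NULL",
        (short_code,),
    ).fetchone()
    return row[0] if row else None

# Hypothetical schema and sample data for the demonstration.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE urls (short_code TEXT PRIMARY KEY, long_url TEXT NOT NULL, deleted_at INTEGER)"
)
db.execute("INSERT INTO urls VALUES ('asgfdhgj', 'https://example.com/page', NULL)")
cache = {"asgfdhgj": "https://example.com/page"}
```

Because the row stays in the table, the short code is never reissued by accident and the deletion can be audited or reverted.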
Detailed Component Design
*Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.*
Add URL component
There is an input area where a person submits the URL. Optionally, the person can also submit an email address to receive a management link. When the person clicks submit, the URL first goes to the validation component, which checks it against blacklists, for phishing and malware, and for compliance with our rules.
This is a separate request to /api/v1/checkurl.
The browser receives 202 Accepted together with a short URL taken from the pre-generated list, for example asgfdhgj. It then periodically checks a status endpoint such as /api/v1/checkurl/status/asgfdhgj for the result (short polling).
When the URL is verified, the browser receives a short-lived JWT. With this token and the short URL it sends a request to /api/v1/addurl.
On this endpoint, the JWT is validated together with the short URL. If everything is fine, the web server asynchronously puts messages on the queues to send the email and to add the record to the database. When a database worker completes the task, it notifies the web server that the task is complete. Meanwhile, the browser performs short polling on /api/v1/addurl/status/asgfdhgj, and once the URL is in the database the user receives a confirmation. We don't need to wait for the email queue. The email is sent whenever its queue is processed.
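The short polling used in both steps can be sketched as a small client-side loop. In the real system this logic would run in the browser. Python is used here only for illustration, and the status payload shape (a "state" field with the value "pending" while work is in progress) is an assumption.

```python
import time

def poll_until_done(fetch_status, interval_seconds=1.0, max_attempts=30, sleep=time.sleep):
    """Call fetch_status() until it stops reporting 'pending' or attempts run out.

    fetch_status stands in for an HTTP GET to a status endpoint such as
    /api/v1/checkurl/status/<short_code> and must return a dict with a "state" key.
    """
    for attempt in range(max_attempts):
        status = fetch_status()
        if status["state"] != "pending":
            return status               # verified, rejected, stored, ...
        if attempt < max_attempts - 1:
            sleep(interval_seconds)     # wait before asking again
    return {"state": "timeout"}
```

The attempt limit keeps a stuck validation from polling forever. The client can then show an error and let the user retry.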
Get URL component
The user sends a request to /short_url.
The request arrives at the load balancer, which forwards it to a web server. The web server asynchronously sends the request data off to be saved for analytics and tries to fetch the long URL from the cache. On a cache miss, it fetches the URL through a database worker and fills the cache. The analytics data is saved by a separate process for later analysis.
The browser receives a 302 redirect to the long URL.
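The read path above is a cache-aside lookup. A minimal sketch follows, in which a plain dictionary stands in for Redis and the callables `load_from_db` and `record_click` stand in for the database worker and the analytics producer. All of these names are hypothetical.

```python
def resolve(short_code, cache, load_from_db, record_click):
    """Return (status, headers) for a redirect request, using cache-aside."""
    # In the real system this would be a non-blocking publish to the analytics queue.
    record_click(short_code)

    long_url = cache.get(short_code)
    if long_url is None:                      # cache miss
        long_url = load_from_db(short_code)
        if long_url is None:
            return 404, {}                    # unknown, expired, or deleted link
        cache[short_code] = long_url          # fill the cache for later requests
    return 302, {"Location": long_url}
```

After the first request for a code, later requests are answered from the cache and do not touch the database.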
• Address scalability in the design. Explain how the system would handle high traffic and growth. Consider using load balancing, caching, and database sharding techniques.
Regarding scalability: there is supposed to be a Zookeeper ensemble and many stateless web servers. Zookeeper receives heartbeats from the web servers, and the load balancer checks with Zookeeper which servers are alive, then sends each request to a living server with the fewest connections. Behind the web servers there is a Redis cluster. I will also need Kafka for creating new URLs, Kafka for analytics, and RabbitMQ for emails. Then comes a sharded database for URLs with database workers, and a separate database for analytics with analytics workers. I believe the workers need separate servers with connections to every shard. Further components are a JWT generation and validation system, a management token generation and validation system, and a monitoring system. Finally there is a KGS, an engine for pre-generated short URLs. For example, it provides every web server with a slice of IDs to hand out, like 10100-10200, so every web server returns unique IDs and is in control of its own slice.
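The KGS slice idea can be sketched as follows: the KGS hands out disjoint ranges of integers, and each web server turns the integers from its own range into 8-character codes. The class names, the slice size of 100, and the URL-safe alphabet order are assumptions. Persisting the counter and making the KGS redundant are left out.

```python
import threading

# Assumption: URL-safe Base64 alphabet, 8 characters per code.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
CODE_LENGTH = 8

def encode(n):
    """Encode an integer as a fixed-width, 8-character base-64 string."""
    if not 0 <= n < 64 ** CODE_LENGTH:
        raise ValueError("id does not fit into 8 characters")
    chars = []
    for _ in range(CODE_LENGTH):
        n, digit = divmod(n, 64)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))

class KeyGenerationService:
    """Hands out disjoint slices of IDs, e.g. 10100-10199, then 10200-10299."""

    def __init__(self, start=0, slice_size=100):
        self._next = start
        self._slice_size = slice_size
        self._lock = threading.Lock()

    def lease_slice(self):
        with self._lock:                      # two servers never get the same slice
            begin = self._next
            self._next += self._slice_size
        return range(begin, begin + self._slice_size)

class WebServerKeys:
    """Per-web-server key pool: serves codes from its slice, refills when empty."""

    def __init__(self, kgs):
        self._kgs = kgs
        self._ids = iter(())                  # empty until the first lease

    def next_code(self):
        for n in self._ids:
            return encode(n)
        self._ids = iter(self._kgs.lease_slice())
        return encode(next(self._ids))
```

Because slices never overlap, web servers can issue codes without coordinating with each other on every write. Note that sequential IDs make codes predictable, which this sketch does not address.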
• Address high availability in the design. Consider using techniques such as replication, clustering, and load balancing to ensure the system remains operational even if some components fail.
Regarding availability.
First of all, we'll need an external monitoring system that health-checks all our nodes.
The load balancer is a highly available active-active cluster.
We also need a smart DNS that sends traffic only to working load balancers and is kept up to date by Zookeeper. When one of the load balancers goes down, it is removed from the smart DNS almost seamlessly.
Zookeeper is an ensemble, a quorum-based cluster. When one of its nodes goes down, the remaining instances still form a quorum. If there is no quorum, we block the KGS from releasing new sets of keys. A web server that still has enough keys keeps working for both reads and writes. A web server without keys returns a failure to the browser on writes, and the browser may retry with another web server. Reads work as if nothing happened.
The web servers are a highly available active-active cluster. They are connected to Zookeeper and send heartbeats to keep it updated. They are stateless, except that each holds a set of keys to hand out when writing a new URL. They receive this set of keys from the KGS and release the keys one by one. When one of the web servers goes down, Zookeeper simply excludes it from the list.
The JWT generation and validation system is a redundant setup. If the primary node fails, the secondary node takes over.
The KGS, the engine for pre-generated short URLs, is a redundant setup. If the primary node fails, the secondary node takes over.
The management token system is an HA cluster with leader-follower replication. If the leader fails, a follower takes over. If the whole system is unreachable, the web server cannot issue new short URLs and returns an error, but only when the user has entered an email to receive a management URL. Otherwise it keeps working.
The Redis cluster is a typical Redis HA cluster setup. If one node fails, other nodes take over.
Kafka is an HA distributed cluster with leader-follower replication. When one node fails, a follower replica takes over.
RabbitMQ is an HA cluster with leader-follower replication and quorum decisions. In a split-brain case, more than one email may be sent. That is not an issue: what matters is that at least one email is sent.
The sharded database is built on SQL and split into shards. Each shard has leader-follower replication. Followers might be located in different data centers. If the leader is down, a follower takes over. If a whole shard is down, we have partial availability, but with followers in different data centers this has a low probability.
Database workers keep a pool of connections to all shards, and they know the sharding rule: consistent hashing.
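A database worker could apply the consistent hashing rule with a hash ring like the one below. This is a minimal sketch: the shard names, the use of SHA-256, and the 100 virtual nodes per shard are assumptions, not part of the design above.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps short codes to shards. Adding a shard moves only a fraction of the keys."""

    def __init__(self, shards, vnodes=100):
        # Each shard is placed on the ring many times (virtual nodes)
        # so that keys spread evenly across shards.
        self._ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def shard_for(self, short_code):
        # First ring point clockwise from the key's hash, wrapping around at the end.
        index = bisect.bisect_right(self._points, self._hash(short_code))
        return self._ring[index % len(self._ring)][1]
```

With `ring = ConsistentHashRing(["shard-0", "shard-1", "shard-2"])`, every worker computes the same `ring.shard_for("asgfdhgj")` without any coordination. When a fourth shard is added, the only keys that move are the ones that now belong to the new shard.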
The database for analytics is built on NoSQL and has leader-follower replication. Followers might be located in different data centers. If the leader is down, a follower takes over. Analytics for the last month are kept in this database, and older data is backed up to cheaper servers with lower availability. This requires larger storage: monthly usage might be 20-30 terabytes.
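The storage estimate can be cross-checked against the redirect rate. The arithmetic below assumes a sustained 10,000 redirects per second, a 30-day month, and decimal terabytes. It derives the per-event size that a 20-30 TB monthly budget implies, rather than assuming a record size.

```python
# What per-event size does a 20-30 TB monthly budget imply
# at a sustained 10,000 redirects per second?
redirects_per_second = 10_000
seconds_per_month = 30 * 24 * 60 * 60
events_per_month = redirects_per_second * seconds_per_month

TB = 10 ** 12  # decimal terabyte

print(f"analytics events per month: {events_per_month:,}")
for budget_tb in (20, 30):
    bytes_per_event = budget_tb * TB / events_per_month
    print(f"{budget_tb} TB per month allows about {bytes_per_event:.0f} bytes per event")
```

At about 25.9 billion events per month, the budget corresponds to roughly 0.8 to 1.2 KB per event, which has to cover the stored fields plus any index and replication overhead.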
Analytics workers keep a pool of connections and need to sustain high-speed writes from the Kafka queue.