Requirements
Functional Requirements:
- Create a short URL for a given long URL.
- Return the long URL associated with a given short URL.
- Delete a URL or extend its storage time.
Non-Functional Requirements:
- List the key non-functional requirements (e.g., low latency, scalability, reliability, etc.)...
- Low latency, graceful degradation, horizontal scalability, availability, and reliability.
- Link expiration and short-URL uniqueness.
- Load: 100 new links per second and 10,000 redirects per second, so the system is read-heavy (100:1).
- Analytics: geography, timestamps, devices, and possibly IP addresses.
- No custom aliases.
- No editing of the long URL; only add and delete (plus extending the storage time).
- Eventual consistency.
- Malware verification of submitted URLs.
- If different users shorten the same URL, they get different short links.
- A management link for extending or deleting the URL is sent by email.
- Short codes use the URL-safe Base64 alphabet, 8 characters per code.
- Links are stored for five years; the storage time can be extended via the management link.
- Anti-bot protection for link creation.
- Rate limiting: a cooldown before the next link creation from the same email, IP, or device.
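The throughput and retention figures above imply a storage budget and a minimum keyspace. The arithmetic below is a back-of-the-envelope sketch, assuming steady traffic, 365-day years, and a hypothetical 500 bytes per stored record (long URL plus metadata); none of these numbers are measured.

```python
# Back-of-the-envelope capacity check for the requirements above.
# Assumptions: constant 100 writes/s, 365-day years, and a HYPOTHETICAL
# average of 500 bytes per stored record (long URL + metadata).

SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # ignores leap years
WRITES_PER_SECOND = 100
READS_PER_SECOND = 10_000
RETENTION_YEARS = 5
ALPHABET_SIZE = 64                      # URL-safe Base64
CODE_LENGTH = 8
ASSUMED_BYTES_PER_RECORD = 500          # hypothetical, not measured

links_stored = WRITES_PER_SECOND * SECONDS_PER_YEAR * RETENTION_YEARS
keyspace = ALPHABET_SIZE ** CODE_LENGTH
read_write_ratio = READS_PER_SECOND // WRITES_PER_SECOND
storage_tb = links_stored * ASSUMED_BYTES_PER_RECORD / 1e12  # decimal TB

print(f"links over {RETENTION_YEARS} years: {links_stored:,}")
print(f"8-char keyspace: {keyspace:,}")
print(f"keyspace / links: {keyspace // links_stored:,}x")
print(f"read:write ratio: {read_write_ratio}:1")
print(f"raw storage: {storage_tb:.1f} TB (before replication)")
```

Under these assumptions, five years of writes come to about 15.8 billion links, while 64^8 = 2^48 (about 2.8 × 10^14) codes is more than 17,000 times larger, so 8 characters leave ample headroom. Raw storage lands around 7.9 TB before replication, which is one argument for sharding the URL database.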
API Design
Define the APIs expected from the system. This is your chance to analyze and define the read and write paths so that you can come up with the high-level design...
/api/v1/checkurl: checks a long URL for malware, phishing, and anything else that violates our rules. Receives the long URL. Returns JSON with a short-lived validation token, created as a JWT or a similar signed token.
/api/v1/addurl: adds a new URL. Receives the validation token and the long URL. Returns JSON with the short URL.
/api/v1/geturl: returns the long URL for a short URL. The short public form is /shortURL:
GET /shortURL redirects with 302 to the long URL. (A 301 is permanent and gets cached by browsers, so repeat visits would skip our servers and we would lose analytics.)
/api/v1/manageurl: deletes a URL or extends its availability. Receives the short URL, a management_token, and a command (update or delete). Returns the result of execution.
High-Level Design
Describe the overall system architecture. Identify the main components needed to solve the problem end-to-end. Use the diagramming tool to create a block diagram.
/api/v1/addurl
Load Balancer ->
Web server (hands out pre-generated short URLs; rate limiter) ->
Validation service (verifies the JWT proving that the long URL passed checks) ->
Queue (send email; add to DB) ->
Database worker ->
Database
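The rate limiter on this path enforces the cooldown from the requirements: no new link from the same email, IP, or device until a delay has passed. The class below is a minimal in-memory sketch of that rule; CreationCooldown and its parameters are hypothetical, and with many stateless web servers the timestamps would have to live in a shared store (for example the Redis cluster) rather than a local dict.

```python
import time
from typing import Dict, Optional


class CreationCooldown:
    """Per-identifier cooldown: each email/IP/device may create one link
    every `cooldown_seconds`. In-memory sketch for a single process only."""

    def __init__(self, cooldown_seconds: float) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._last_seen: Dict[str, float] = {}

    def allow(self, email: Optional[str], ip: str, device_id: Optional[str],
              now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        keys = [f"ip:{ip}"]
        if email:
            keys.append(f"email:{email.lower()}")  # normalize case
        if device_id:
            keys.append(f"device:{device_id}")
        # Reject if ANY identifier is still cooling down.
        for key in keys:
            last = self._last_seen.get(key)
            if last is not None and now - last < self.cooldown_seconds:
                return False
        # Only successful creations start a new cooldown.
        for key in keys:
            self._last_seen[key] = now
        return True
```

Checking all identifiers together means rotating only one of them (say, a new email from the same IP) does not bypass the delay.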
/api/v1/geturl
Load Balancer (health checks) ->
Web server (saves analytics asynchronously) ->
Cache ->
Database (on cache miss)
/api/v1/checkurl
Load Balancer (health checks) ->
Web server (short polling) ->
Validation queue ->
Validation worker ->
JWT generator
/api/v1/manageurl
Load Balancer (health checks) ->
Web server (verifies the management token; invalidates the cache) ->
Queue ->
Database worker (updates or deletes the record; deletes are soft) ->
Database
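The manage path combines three ideas: extending availability, soft deletion, and cache invalidation. The sketch below collapses the web-server and database-worker steps into one function to show the state changes, assuming the management token was already verified. UrlRecord, apply_command, is_live, and the choice to extend from the current time (rather than from the old expiry) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional

RETENTION = timedelta(days=5 * 365)  # "five years", approximated as 5 * 365 days


@dataclass
class UrlRecord:
    short_code: str
    long_url: str
    expires_at: datetime
    deleted_at: Optional[datetime] = None  # soft-delete marker


def apply_command(db: Dict[str, UrlRecord], cache: Dict[str, str],
                  short_code: str, command: str, now: datetime) -> str:
    """Apply 'update' (extend) or 'delete' (soft) to one record."""
    record = db.get(short_code)
    if record is None or record.deleted_at is not None:
        return "not_found"
    if command == "delete":
        record.deleted_at = now               # keep the row; readers treat it as gone
    elif command == "update":
        record.expires_at = now + RETENTION   # assumption: extend from today
    else:
        return "unknown_command"
    cache.pop(short_code, None)  # invalidate so the next redirect sees the change
    return "ok"


def is_live(record: UrlRecord, now: datetime) -> bool:
    """A record serves redirects only if not soft-deleted and not expired."""
    return record.deleted_at is None and record.expires_at > now
```

Soft deletion keeps the row for auditing and analytics joins, at the cost of every read path having to check `deleted_at`.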
Detailed Component Design
*Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.*
Add URL component
The user enters a URL into an input field and, optionally, an email address to receive a management link. On submit, the URL goes to the validation component first, which checks it against blacklists, scans it for phishing and malware, and verifies that it complies with our rules.
This is a separate request to /api/v1/checkurl.
The browser receives 202 Accepted together with a short URL taken from the pre-generated list, for example asgfdhgj. It then periodically polls a status endpoint such as /api/v1/checkurl/status/asgfdhgj (short polling).
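Short polling is a bounded loop: ask for the status, stop on a terminal state, otherwise wait a fixed interval and try again until a deadline. The function below sketches that loop in Python for consistency with the other examples (in the browser it would be JavaScript). fetch_status stands in for an HTTP GET to the status endpoint, and the "verified"/"rejected" states and default intervals are hypothetical; sleep and clock are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable, Dict, Optional


def poll_until_done(fetch_status: Callable[[], Dict[str, str]],
                    interval_s: float = 1.0,
                    timeout_s: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic,
                    ) -> Optional[Dict[str, str]]:
    """Call fetch_status() every interval_s until it reports a terminal
    state; return that status, or None if timeout_s would be exceeded."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status.get("state") in ("verified", "rejected"):
            return status
        if clock() + interval_s > deadline:
            return None  # give up instead of polling forever
        sleep(interval_s)
```

A fixed interval keeps the client simple; if validation latency varies widely, growing the interval between polls would reduce load on the status endpoint.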
Once the URL is verified, the browser receives a short-lived JWT and sends it, together with the short URL, to /api/v1/addurl.
This endpoint validates the JWT together with the short URL. If both are valid, the web server asynchronously publishes messages to the message queue: one to send the email and one to add the record to the database. When a database worker completes its task, it notifies the web server. Meanwhile, the browser short-polls /api/v1/addurl/status/asgfdhgj, and once the URL is in the database, the user sees a success message. We don't wait for the email queue; the email goes out whenever it is processed.
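The add flow decouples the HTTP response from the database write: the web server enqueues work and returns immediately, and the status the browser polls flips only when the database worker finishes. The sketch below models this with in-process queue.Queue objects standing in for the Kafka and RabbitMQ queues and a dict standing in for a shared status store; handle_addurl, db_worker_step, addurl_status, and the "pending"/"done" states are hypothetical, and token validation is assumed to have happened already.

```python
import queue
from typing import Dict, Optional

db_queue: "queue.Queue[dict]" = queue.Queue()     # stands in for the new-URL queue
email_queue: "queue.Queue[dict]" = queue.Queue()  # stands in for the email queue
status: Dict[str, str] = {}                       # shared status store (assumed)
database: Dict[str, str] = {}                     # stands in for the URL database


def handle_addurl(short_code: str, long_url: str,
                  email: Optional[str], token_valid: bool) -> int:
    """Web-server side: enqueue the work and return at once."""
    if not token_valid:
        return 403
    status[short_code] = "pending"
    db_queue.put({"short": short_code, "long": long_url})
    if email:
        email_queue.put({"to": email, "short": short_code})  # not awaited
    return 202


def db_worker_step() -> None:
    """Database-worker side: persist one message, then report completion."""
    msg = db_queue.get()
    database[msg["short"]] = msg["long"]
    status[msg["short"]] = "done"  # the browser's next poll sees this
    db_queue.task_done()


def addurl_status(short_code: str) -> str:
    """What /api/v1/addurl/status/<short_code> would return."""
    return status.get(short_code, "unknown")
```

Because the status lives in a shared store rather than in the web server's memory, any stateless web server can answer the browser's next poll.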
Get URL component
The user sends a request to /short_url.
The request reaches the load balancer, which forwards it to a web server. The web server asynchronously sends the request data to be saved for analytics and looks up the long URL in the cache. On a cache miss, it fetches the URL from the database through a database worker and fills the cache. Analytics are saved by a separate process for further analysis.
The browser receives a 302 redirect to the long URL.
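The read path is a cache-aside lookup with fire-and-forget analytics. The function below sketches it, assuming a dict in place of the Redis cluster, a queue.Queue in place of the analytics pipeline, and a load_from_db callable in place of the database worker; resolve and its return shape are hypothetical.

```python
import queue
import time
from typing import Callable, Dict, Optional, Tuple

analytics_queue: "queue.Queue[dict]" = queue.Queue()  # stands in for the analytics pipeline


def resolve(short_code: str, request_meta: dict,
            cache: Dict[str, str],
            load_from_db: Callable[[str], Optional[str]],
            ) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location) for GET /<short_code>."""
    # Fire-and-forget: the redirect never waits on analytics storage.
    analytics_queue.put({"code": short_code, "ts": time.time(), **request_meta})
    long_url = cache.get(short_code)
    if long_url is None:               # cache miss
        long_url = load_from_db(short_code)
        if long_url is None:
            return 404, None
        cache[short_code] = long_url   # populate for subsequent readers
    return 302, long_url
```

With a 100:1 read/write ratio, most redirects should be served from the cache; a TTL on cache entries would also bound how long a deleted link can keep redirecting if an invalidation is missed.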
• Address scalability in the design. Explain how the system would handle high traffic and growth. Consider using load balancing, caching, and database sharding techniques.
Regarding scalability: the web servers are stateless, so we can run many of them. Each web server sends heartbeats to ZooKeeper; the load balancer asks ZooKeeper which servers are alive and routes each request to the live server with the fewest active connections. Behind the web servers sits a Redis cluster for caching.
Messaging: Kafka for creating new URLs, Kafka for analytics, and RabbitMQ for emails.
Storage: a sharded database for URLs and a separate database for analytics. The database workers run on separate servers, each with connections to every shard.
Supporting services: JWT generation and validation; management-token generation and validation; a monitoring system; and a pre-generated short-URL engine that gives every web server a slice of IDs to issue, for example 10100-10200, so that each web server returns unique IDs and controls its own slice.
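The pre-generated short-URL engine can be modeled as a central allocator that leases disjoint ID ranges, with each web server encoding IDs from its own range into 8-character codes, so no coordination is needed per request. The sketch below assumes a particular URL-safe Base64 symbol order, a block size of 100, and CRC32 modulo the shard count for picking a shard; RangeAllocator, WebServerCodes, encode, and shard_for are hypothetical names, and a real allocator would persist its counter (for example in ZooKeeper or a database) so leases survive restarts.

```python
import threading
import zlib
from typing import Tuple

# URL-safe Base64 alphabet; this symbol order is an assumption.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
CODE_LENGTH = 8


def encode(n: int) -> str:
    """Fixed-width base-64 encoding of a non-negative integer."""
    if not 0 <= n < 64 ** CODE_LENGTH:
        raise ValueError("id does not fit in 8 characters")
    chars = []
    for _ in range(CODE_LENGTH):
        n, rem = divmod(n, 64)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


class RangeAllocator:
    """Central service leasing disjoint half-open ID ranges [lo, hi)."""

    def __init__(self, block_size: int = 100, start: int = 0) -> None:
        self._next = start
        self._block = block_size
        self._lock = threading.Lock()

    def lease(self) -> Tuple[int, int]:
        with self._lock:
            lo = self._next
            self._next += self._block
            return lo, self._next


class WebServerCodes:
    """Per-web-server issuer: hands out codes only from its own slice."""

    def __init__(self, allocator: RangeAllocator) -> None:
        self._allocator = allocator
        self._cur, self._hi = allocator.lease()

    def next_code(self) -> str:
        if self._cur >= self._hi:  # slice exhausted: lease a new one
            self._cur, self._hi = self._allocator.lease()
        code = encode(self._cur)
        self._cur += 1
        return code


def shard_for(code: str, num_shards: int) -> int:
    """Stable shard choice (unlike Python's per-process str hash)."""
    return zlib.crc32(code.encode("ascii")) % num_shards
```

If a web server crashes, the unused part of its slice is simply lost, which is harmless given the keyspace. Sequential IDs do make codes guessable; permuting IDs before encoding would hide that, at some extra complexity.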
• Address high availability in the design. Consider using techniques such as replication, clustering, and load balancing to ensure the system remains operational even if some components fail.