System requirements
Functional:
- Shortening URLs: The system should allow users to input long URLs and generate a unique, shorter alias for them.
- Redirection: When a user accesses the shortened URL, the system should redirect the user to the original long URL.
- Custom Alias: Optionally, allow users to choose a custom alias for their shortened URL.
- Link Expiration: Option to set an expiration date for short links, after which they are no longer accessible.
- Link Analytics: Provide analytics for short links, such as click counts, location of clicks, referral information, etc.
- API: Offer an API for integrating the service into other applications.
- User Authentication: Allow users to create accounts and manage their shortened URLs.
- Security: Ensure that shortened URLs are not easily guessable and protect against malicious activities like spam and phishing.
Non-Functional:
- Performance: The system should be able to handle a high volume of concurrent requests efficiently and provide fast redirection for shortened URLs. It should have low latency.
- Scalability: The system should be able to scale horizontally to accommodate increasing numbers of users and shortened URLs.
- Reliability: The service should be highly available and reliable, with minimal downtime.
- Security: The system should implement security best practices to protect against attacks like injection, phishing, and unauthorized access.
- Monitoring: Implement logging and monitoring to track system performance, user activity, and errors.
- Compliance: Adhere to legal requirements like GDPR to ensure user data privacy and protection.
- Usability: The user interface should be intuitive and easy to use for creating and managing shortened URLs.
- Backup and Recovery: Implement regular backups of data to prevent data loss in case of failures.
- Cost: The system should be cost-effective to operate, considering factors like server resources, storage, and bandwidth usage.
Capacity estimation
- Daily active users (DAU): 1 million.
- Number of shortened URLs: on average, each user generates 1 shortened URL per day.
- Storage: the average shortened URL record takes 5 KB.
-> Daily storage requirement: 1 million * 5 KB ~= 5 GB
-> Write TPS: 1 million / (24 * 3600) ~= 12 TPS
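The arithmetic above can be checked with a short script. This is only a sketch: the constant names are illustrative, 1 KB is taken as 1,000 bytes, and the yearly figure is a simple extrapolation that is not part of the original estimate.

```python
# Back-of-the-envelope capacity estimate using the figures stated above.
DAILY_ACTIVE_USERS = 1_000_000
URLS_PER_USER_PER_DAY = 1
BYTES_PER_RECORD = 5_000          # 5 KB, assuming decimal units
SECONDS_PER_DAY = 24 * 3600


def estimate() -> dict:
    """Return daily storage (GB), write TPS, and a yearly extrapolation (TB)."""
    new_urls_per_day = DAILY_ACTIVE_USERS * URLS_PER_USER_PER_DAY
    daily_storage_gb = new_urls_per_day * BYTES_PER_RECORD / 1e9
    write_tps = new_urls_per_day / SECONDS_PER_DAY
    return {
        "daily_storage_gb": daily_storage_gb,
        "write_tps": write_tps,
        # Extrapolation only: assumes constant traffic and no deletions.
        "yearly_storage_tb": daily_storage_gb * 365 / 1000,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:.2f}")
```

Note that this covers writes only. Redirects (reads) are usually far more frequent than writes in a URL shortener, so a read estimate would need its own assumed read-to-write ratio.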
API design
1. Create a short URL from a given URL.
/shorten. POST body {"inputURL": "xxx"}, response {"shortURL": "abc", "Id": "123"}
2. Redirect by Id (optional)
/redirect/Id
3. Delete short URL
/delete/id
4. Analytics
/analytics/id
Response fields: topReferrer: xxx, click: xxx
5. Security
Rate limit: cap the number of requests per day per IP address or session ID to mitigate abuse and DDoS attacks.
Validation: validate the user-supplied URL (for example, with a regex match) before processing it.
Use HTTPS for redirection: protects traffic from being intercepted in transit.
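The validation and rate-limit points could be sketched as follows. This is a minimal illustration, not a production design: `is_valid_url` and `DailyRateLimiter` are hypothetical names, the validation uses the standard library's `urlsplit` in place of the regex mentioned above, and the limiter is a simple in-memory fixed-window counter that would need a shared store (and cleanup of old windows) in a multi-server deployment.

```python
import time
from collections import defaultdict
from typing import Optional
from urllib.parse import urlsplit


def is_valid_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that include a host part."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


class DailyRateLimiter:
    """Fixed-window limiter: at most `limit` requests per client per window."""

    def __init__(self, limit: int, window_seconds: int = 24 * 3600):
        self.limit = limit
        self.window_seconds = window_seconds
        # (client id, window index) -> request count. Old windows are never
        # purged in this sketch, so memory grows over time.
        self._counts = defaultdict(int)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        """Return True and count the request if the client is under its limit."""
        now = time.time() if now is None else now
        key = (client_id, int(now // self.window_seconds))
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True


if __name__ == "__main__":
    limiter = DailyRateLimiter(limit=2)
    print(is_valid_url("https://example.com/page"))
    print([limiter.allow("203.0.113.7", now=0) for _ in range(3)])
```

The client id can be an IP address or a session id, as in the text. A fixed window is the simplest choice; a sliding window or token bucket smooths out bursts at window boundaries at the cost of more state.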
6. User Registration
Allow users to sign up for an account by providing essential information such as email address and password. Optionally, offer signup via third-party services (e.g., Google, Facebook) for convenience.
- Endpoint: /register
- Method: POST
- Body: { "email": "[email protected]", "password": "securePassword123" }
Also needed: email verification and password reset.
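Since registration accepts a password, the service should store a salted hash rather than the password itself. The sketch below shows one way to do that with the standard library's PBKDF2; the function names and the iteration count are assumptions for illustration, and a dedicated password-hashing library may be preferable in practice.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

DEFAULT_ITERATIONS = 600_000  # assumed work factor; tune for your hardware


def hash_password(
    password: str,
    salt: Optional[bytes] = None,
    iterations: int = DEFAULT_ITERATIONS,
) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for the password using PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, digest


def verify_password(
    password: str,
    salt: bytes,
    expected_digest: bytes,
    iterations: int = DEFAULT_ITERATIONS,
) -> bool:
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected_digest)


if __name__ == "__main__":
    salt, digest = hash_password("securePassword123")
    print(verify_password("securePassword123", salt, digest))
```

The salt and digest (and the iteration count, if it may change later) would be stored with the user record in place of the raw password.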
Database design
User Table
1. user id (key)
2. email
3. password
4. email verified (boolean)
URLs table
1. short URL (key)
2. id
3. user id
4. original URL
5. createdAt
6. expiresAt
7. shortCode
URL Visits Table
1. visit id (key)
2. short URL id
3. ReferrerIdAndClicks, e.g. {"referrer1": 10, "referrer2": 2}
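One possible way to express these tables is the SQLite schema below, created through Python's `sqlite3` module. It is a sketch under assumptions: the column names and types are illustrative, the short code is used as the single key of the URLs table (the outline lists short URL, id, and shortCode separately), the password column holds a hash, and the referrer counts are kept as a JSON string as in the outline.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id        INTEGER PRIMARY KEY,
    email          TEXT NOT NULL UNIQUE,
    password_hash  TEXT NOT NULL,
    email_verified INTEGER NOT NULL DEFAULT 0   -- boolean stored as 0/1
);
CREATE TABLE urls (
    short_code   TEXT PRIMARY KEY,
    user_id      INTEGER NOT NULL REFERENCES users(user_id),
    original_url TEXT NOT NULL,
    created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at   TEXT                            -- NULL means no expiration
);
CREATE TABLE url_visits (
    visit_id        INTEGER PRIMARY KEY,
    short_code      TEXT NOT NULL REFERENCES urls(short_code),
    referrer_clicks TEXT NOT NULL DEFAULT '{}'   -- JSON map: referrer -> clicks
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three tables on the given connection."""
    conn.executescript(SCHEMA)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.execute(
        "INSERT INTO users (email, password_hash) VALUES (?, ?)",
        ("alice@example.com", "hash-placeholder"),
    )
    conn.execute(
        "INSERT INTO urls (short_code, user_id, original_url) VALUES (?, ?, ?)",
        ("abc", 1, "https://example.com/some/long/path"),
    )
    print(conn.execute("SELECT short_code, original_url FROM urls").fetchall())
```

Keeping the short code as the primary key makes the redirect lookup a single indexed read, which is the hot path of the service.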
High-level design
- Web service: includes the front-end UI for sign-in, sign-up, and submitting requests.
- Server API gateways
- URL shortening service
- Database
- Redirection service
- Analytics service
Request flows
User interactions:
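- Login/signup.
- Once logged in, the user can submit URLs to be shortened.
URL Shorten:
- The service checks whether the given URL already exists in the database. If not, it creates a new entry. If it does, it checks whether the stored user id matches the requesting user.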
URL redirection:
- Given a short URL, look it up in the database and respond with a redirect to the original URL.
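The redirection lookup, combined with the link expiration requirement, might look like the sketch below. The in-memory dictionary stands in for the database, and `UrlRecord` and `resolve` are hypothetical names. The status codes are assumptions: 302 is used so that every click still reaches the service (a cached 301 would hide clicks from analytics), 404 for unknown codes, and 410 for expired links.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, NamedTuple, Optional, Tuple


class UrlRecord(NamedTuple):
    original_url: str
    expires_at: Optional[datetime]  # None means the link never expires


def resolve(
    short_code: str,
    store: Dict[str, UrlRecord],
    now: Optional[datetime] = None,
) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect target) for a short code."""
    now = datetime.now(timezone.utc) if now is None else now
    record = store.get(short_code)
    if record is None:
        return 404, None            # unknown short code
    if record.expires_at is not None and now >= record.expires_at:
        return 410, None            # link existed but has expired
    return 302, record.original_url


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    store = {
        "abc": UrlRecord("https://example.com/long/path", None),
        "old": UrlRecord("https://example.com/expired", now - timedelta(days=1)),
    }
    for code in ("abc", "old", "missing"):
        print(code, resolve(code, store, now))
```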
Analytics:
Each access to a shortened URL is logged by the Analytics Service, which updates visit counts, referrers, and other relevant data in the database for reporting to the user.
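A minimal in-memory sketch of this bookkeeping is shown below. The `Analytics` class and its method names are hypothetical, a real service would persist the counts in the database and likely update them asynchronously, and the summary keys mirror the `topReferrer` and `click` fields from the API section.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


class Analytics:
    """Tracks total clicks and per-referrer clicks for each short code."""

    def __init__(self) -> None:
        self._clicks: Counter = Counter()
        self._referrers: Dict[str, Counter] = defaultdict(Counter)

    def record_visit(self, short_code: str, referrer: Optional[str] = None) -> None:
        """Log one access; visits without a referrer are counted as 'direct'."""
        self._clicks[short_code] += 1
        self._referrers[short_code][referrer or "direct"] += 1

    def summary(self, short_code: str) -> dict:
        """Return the click count and the most common referrer, if any."""
        top = self._referrers.get(short_code, Counter()).most_common(1)
        return {
            "click": self._clicks[short_code],
            "topReferrer": top[0][0] if top else None,
        }


if __name__ == "__main__":
    analytics = Analytics()
    analytics.record_visit("abc", "news.example")
    analytics.record_visit("abc", "news.example")
    analytics.record_visit("abc")
    print(analytics.summary("abc"))
```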
Detailed component design
Let's look at how the system generates unique shortened URLs and handles collisions.
Generation strategy:
- random string
- hash encoding
- counter
Pros and cons:
- Random string: system capacity is limited by the string length.
- Hash encoding: e.g., SHA-256 encoding.
- Counter: predictable, which is a security concern.
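The three strategies can be compared side by side in a short sketch. Everything here is illustrative: the function names, the base62 alphabet order, the 7-character code length, and the retry limit are assumptions, and the in-memory `existing` set stands in for a uniqueness check against the database.

```python
import hashlib
import secrets
import string

# 62 symbols: 0-9, a-z, A-Z (the ordering is an arbitrary choice).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_base62(n: int) -> str:
    """Counter strategy: encode a non-negative integer id in base62."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def random_code(length: int = 7) -> str:
    """Random strategy: unpredictable, but collisions must be checked."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def hash_code(url: str, length: int = 7) -> str:
    """Hash strategy: SHA-256 of the URL, reduced to `length` base62 chars."""
    value = int.from_bytes(hashlib.sha256(url.encode("utf-8")).digest(), "big")
    return encode_base62(value % BASE**length).rjust(length, ALPHABET[0])


def create_unique_code(existing: set, length: int = 7, max_attempts: int = 5) -> str:
    """Generate a random code, retrying when it collides with a stored one."""
    for _ in range(max_attempts):
        code = random_code(length)
        if code not in existing:
            existing.add(code)
            return code
    raise RuntimeError("could not find a free code; consider a longer length")


if __name__ == "__main__":
    print(encode_base62(62))                      # counter id 62 -> "10"
    print(hash_code("https://example.com/page"))  # same URL gives the same code
    print(create_unique_code(set()))
```

The counter output is compact and collision-free but sequential, which is the predictability concern noted above. The hash output is deterministic, so the same URL always maps to the same code, but two different URLs can still collide after truncation and need the same retry or lookup handling as random codes.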
Trade offs/Tech choices
Explain any trade offs you have made and why you made certain tech choices...
Failure scenarios/bottlenecks
Database Overload
Scenario: High request rates could overload the database, slowing down read and write operations, leading to timeouts or errors.
Mitigation:
- Caching: Implement caching layers to store frequently accessed data, reducing direct database hits.
Single Point of Failure (SPOF)
Scenario: If any component of the system (e.g., the API Gateway, database, or key generation service) is a single point of failure, its outage could make the entire service unavailable.
Mitigation:
- Redundancy: Ensure all critical components have redundant instances in separate availability zones or regions.
Key Generation Collisions
Scenario: The method for generating unique identifiers for shortened URLs might lead to collisions, especially as the namespace gets crowded, impacting performance due to retries.
Mitigation:
- Namespace Expansion: Consider using a longer key length as the service scales.
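To get a feel for when the namespace becomes crowded, the sketch below estimates the chance that a freshly generated random key collides with an existing one, and the expected number of attempts per new key. It assumes a 62-character alphabet and uniformly random keys; both are assumptions, and the function names are illustrative.

```python
def keyspace(length: int, alphabet_size: int = 62) -> int:
    """Number of distinct keys of the given length."""
    return alphabet_size ** length


def collision_probability(used: int, length: int, alphabet_size: int = 62) -> float:
    """Chance that one new uniformly random key hits an already-used key."""
    return used / keyspace(length, alphabet_size)


def expected_attempts(used: int, length: int, alphabet_size: int = 62) -> float:
    """Expected number of random draws until a free key is found."""
    p = collision_probability(used, length, alphabet_size)
    if p >= 1:
        raise ValueError("keyspace is exhausted")
    return 1 / (1 - p)


if __name__ == "__main__":
    one_year = 365 * 1_000_000  # one year at 1 million new URLs per day
    for length in (5, 6, 7):
        p = collision_probability(one_year, length)
        print(f"length {length}: {keyspace(length):,} keys, p(collision) = {p:.4f}")
```

With the capacity figures from earlier (1 million new URLs per day), adding one character multiplies the key space by 62, which is why a modest increase in key length pushes the collision rate down sharply.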
Future improvements
Caching: Maintain a cache of recently accessed or created short URLs to reduce database lookups for popular or newly generated URLs. An LRU (least recently used) eviction policy fits this access pattern.
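A minimal LRU cache can be built on `collections.OrderedDict`, as sketched below. The `LRUCache` class is illustrative and single-process only; a deployed service would more likely use a shared cache in front of the database, which is outside the scope of this sketch.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Maps short codes to original URLs, evicting the least recently used."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, short_code: str) -> Optional[str]:
        """Return the cached URL and mark it as recently used, or None."""
        if short_code not in self._items:
            return None
        self._items.move_to_end(short_code)
        return self._items[short_code]

    def put(self, short_code: str, original_url: str) -> None:
        """Insert or refresh an entry, evicting the oldest one if over capacity."""
        self._items[short_code] = original_url
        self._items.move_to_end(short_code)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    cache.put("abc", "https://example.com/a")
    cache.put("def", "https://example.com/d")
    cache.get("abc")                           # "abc" becomes most recent
    cache.put("ghi", "https://example.com/g")  # evicts "def"
    print(cache.get("def"), cache.get("abc"))
```

On a cache miss, the redirection service would fall back to the database and then call `put` so that the next lookup for the same short code is served from memory.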