Requirements are:
Functional Requirements:
- Create a short URL for long URL
- Return Long URL from a given a short URL or return 404
Non Functional Requirements:
- We will be choosing availability over consistency.
- System has to serve at least 3T short urls.
- Short URL has to be at least 5 characters long
API Design
To Shorten URLs:
[POST] /api/urls/short
BODY: {
"longUrl": "www.google.com"
}
Response: {
"shortUrl": "www.domain.com/xyz34"
}
To GET Long URL from SHORT URL. If not available it will return 404.
[GET] /api/urls/long?url='www.domain.com/xyz34'
Response: {
"longUrl": "www.google.com"
}
High-Level Design
Components
Hash Service: Responsible for generating unique hash per long url.
Redis: We will use redis for caching frequently accessed long url. We will use LRU algorithm to evict cache and any update or delete also remove the key from redis. If a key does not exist in cache then server will approach database for a table scan.
Load Balancer: Load Balancer will handle multiple server based on stateless RR algorithim.
Server: Server will process the request and communicate with Hash Service, Database and Redis. NoSQL: We will be using NoSQL database mongodb as our primary database. Our database will distributed. Since we are choosing availability over consistency we will return as soon as a user has written successfully in one database. We will also make sure one long url always lands on only one shard by using consistent hashing. So long Url will be the key the key here. We will be using Quorum based technique requiring majority vote (> N / 2). N will be always odd.
Detailed Component Design
Components:
- Hash Service: This part is responsible for create the hash for the long url. From the requirement we can see that our hash has to be at least 5 characters long. And it also needs to support 3T hashes. Since 5 characters will cause collission for 3T hashes. We need to pickup at least 7 characters. For base64, 7 characters represents 42 bits. 2^42 = 4.39T. So we will generate 42 bits hash and convert it into base64 characters. If the generated base64 string already exists in the database, it will regenerate the string again. Also there will be no collision, since we will be using 4 bits to support number of servers as server Id and 24 bits for current timestamp. And there will be 4 concurrency bits.
- Redis: After generating the base64 string, we will keep it in the NoSQL database. And then we will use Redis as cache with LRU algorithm to keep it accessible without database checking. Redis will keep the long url as a key and short url as value. If long url does not exist server will reach to database. In case of any update to the long url will evict the key from Redis.
- Load Balancer: LB will be able to handle increased load of request on multiple servers based on stateless RR algorithim .
- NoSQL DB: NoSQL DB like MongoDB suits perfectly since this is highly scalable. For a distributed system we have chosen availability over consistency. So other database will eventually consistent. And we also make sure one long url goes to one database using consistent hashing. Hot shard is not possible here since we are using Cache to retrieve data. For Split brain scenario we will be using quorum based leader election (> N/ 2) where n is always an odd number.
- Server: We can achieve low latency by using CDN, since most of our data will be static. We will keep multiple server to achieve high availability which will be based on stateless RR algorithm at LB level. Also we will be using Token bucket algorithm to limit user generating more than 10 short url per minute using user id as key. It will ensure no burst request are coming.