System requirements
Functional:
List functional requirements for the system (Ask interviewer if stuck)...
- generate 500,000,000 URLs per day, which is ~60000 URLs per second
- URLs have user specified expiration time, and if user does not specify the same, it will be 30 days
- when different user sends the server the request with the same url, the server will generate unique short url for them with corresponding expiration time, and each url will redirect to the same actual url
Non-Functional:
List non-functional requirements for the system...
- generate 60,000 URLs per second, that would be 60,000 writes per second
- read to write ratio 10:1
- Highly available
Capacity estimation
We have 500,000,000 URLs per day, and in average expiration for each URL is 30 days, on average at each point of time we store URLs that were generated in the past 30 days.
Each short URL is about 20 bytes in length, and each actual URL is about 50 bytes in length, while time stamp is around 10 bytes in length. Each short URL record takes up 20 bytes + 50 bytes + 10 bytes = 80 bytesin the database
Storage estimation is as following:
500,000,000 URL * 30 * 8 bytes = 120 GB
Spike storage: 120 GB * 2 = 240 GB
API design
POST api/v1/shortUrl
{
"long_url": "https://www.google.com"
"expiration": 36000 --- expiration time is in seconds
"preferred_url": "preferred" --- in the form of bit.ly/preferred if the preferred_url is not taken already
}
Response:
Status Code 200
{
"short_url": "bit.ly/A3dv3RVss",
"expiration_timestamp": 13690000000
}
Database design
Short URLs should be stored in non-relational database, ideally a Key Value Store for scalable and fast read.
Data Schema:
{
"id": 16 digit int: 64 bytes --partition key
"shortUrl": hash of the id: 10 bytes
"longUrl": 30 bytes
"expiration_timestamp" 10 bytes -- sort key
}
in total each entry takes about 120 bytes
I will put it in DynamoDB, and set shortUrl as the partition key and expiration_timestamp as the sort key.
High-level design
API Gateway: Rate limiting, Authenticating, Metric collection, and load balancing. This helps ensure the security of the service.
Unique ID Generating Service: a distributed service that generates Unique IDs and feeds them to web servers
URL Shortening Service: shorten the url, check for collision, and store the url into the database and cache. For each user request, the service fetches one ID from the Unique ID Generator, generate the 62 based hash code, and puts a message into the queue.
URL Shortening Workers: The workers polls from the kafka queue for url shortening messages. It first put the corresponding entry into the Short URL Database, then put the same key value pair or short and long url into the cache.
Short URL Database: store the shortened url with their original url and the expiration timestamp. DynamoDB is a fully-managed database product that guarantees high availability. So failure of the database is unlikely.
Caching Layer: cache frequently accessed short url. It is a read through cache, also some sort of "write aside cache" (data is written to database, then to cache). Service should go to the short url database in case of cache miss.
Worker: nightly batch processors to remove expired short urls from Short URL Database
Short URL Redirect Service: receives User's redirect request, find the long url corresponding to the short url, and send back a http redirect response. If short url not exist, go fetch from database. If short url not exist in the database, put null into cache, and return an 400 error.
Request flows
shorten url request:
Upon receiving the request, the URL Shortening Service will get a unique integer ID from Unique ID Generator, then it will hash it using some hashing algorithm like SHA256 or MD5. After that, it will take the encode the hash with base62 and take the first 10 characters of the encoding as the shorturl. It will go check if the same url exist in the dynamodb database. If duplicates exists, then it will do it again with a new ID from the unique id generator. If there is not duplicate, it publish a message into Kakfa, whose body includes id, short url, long url, expiry timestamp.
Then a worker will poll the message from kafka and 1. put the entry into Short URL Database, and 2. put the key value pair of short url and long url into the Short URL Cache, with a 14 day expiration or until the expiration timestamp of the short url, whichever comes first.
short url redirect request:
Upon receiving a short url redirect request, the Short URL Redirect Service will go check out if the short url exists in the short url cache as a key. If yes, it will just fetch the associate value and return a http redirect response, redirecting the customer to the long url.
If it does not exist, then the service should query for the same short url in the short url database and return the result. If nothing is found, return a 400 bad request error and cache the null value associated with the key.
Detailed component design
Short URL Database:
it stores data in the following schema:
id
short_url
long_url
expiration_timestamp
- utilizes DynamoDB's ttl feature to auto-delete entries that has passed its expiry timestamp
- Can partition based on short_url for better scalability
- might face hot partition issue if one short url is access by too many people, and this can be mitigated with the usage of cache. Cache will absorb most of the database reads in this case
Kafka and Worker listening to it that inserts short url record in database/cache
- Can use something like aws lambda to which is scalable
- Can scale Kafka by a. increase the number of brokers b. partition topics
- Worker's workload is idempotent, trying to insert the exact same entries into cache/database. Then if a worker failed, the operation is still going to finish after retry
Trade offs/Tech choices
I chose redis as cache because it is one of the best in memory cache offerings. DynamoDB for the database because it is a performant key value store that is suitable for our use case, and it works well with read heavy workload. If we want to scale globally, then we need to scale redis and dynamoDB. Here we gain availability but compromise on the consistency.
Using Kafka and workers decouples the URL shortening Service with the data stores. This improves availability but sacrifices performance/consistency. But this is good because if we put the data persisting logic into the URL Shortening Service, there might be data integrity issue if the service failed during the generation, resulting in stale data.
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?