My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach with Score: 8/10
by drift1945
System requirements
Functional:
- The user can get a short Url with a long URL
- User accessing the short URL will be directed to long URL
- If the URL doesn't exist, it will return an error message.
Non-Functional:
- Availability: The request should receive a response without error.
- Partition tolerance: the system should still operate even if some messages are dropped due to the network between nodes
- Consitency: Support eventual consistency
Capacity estimation
For Data storage:
- Each long URL is around 100bytes
- Each short URL is 10bytes
- Around 100M URLs are generated per day
So the total data storage is around 11 GB per day
For bandwidth estimates, let's assume we have 100 million DAUs and 100 million URLs visited per day. So the bandwidth is around 100M * (110bytes) /86400 = 110kb/s
API design
- shortenURL(String longUrl): return the shortened URL
- redirectURL(String shortendURL): return the HTTPS 301 with redirect to original URL
Database design
here we use Mongo DB, as it's high performance and scalability.
The database contains one table: which might include fields:
Primary Key: Short URL: varchar(10)
Long URL: varchar(100), indices.
Optional: expiration date
Their relationship is 1:1.
High-level design
Request flows
The shortening request will firstly go to load balancer and then the load balancer will decide which server to go to, and then the URL shortening service will encode the URL and then add the shortened URL to database and cache.
The redirecting request will firstly go to load balancer and then served by retrieving URL service, then it will try to get the corresponding URL in the cache, if not found, then it will do the data base query.
If the requesturl doesn't exist at all, it will return a Not found 404.
Detailed component design
For the shortending service, we can use a-z, 0-9, 36 character in total, so, we use 32-based encoding.
For the Mongo DB database structure, we use key-value store form like, in this way, the shorten URL can find the corresponding long URL very quickly. And we will do sharding by the hash value of the short url, to minimizing the hot-spot problem. To solve the hash-collision, we can do a second-hashing. And we will use a Master-Slave mode to reduce the load and to avoid single point of failure.
For the Cache, we will use Redis, and use LFU as its eviction policy to try to reduce the load to database, besides that, we can add other fields like expiration time to better control its life cycle.
Trade offs/Tech choices
For database,
LSM-based database(e.g. Cassandra): greatly support write heavy system
B-Tree based database(e.g. MySQL): ACID, structured data but might have scalability for huge amount data.
So here we use NoSQL database MongoDB, although it's not strong for reading request but it has high performance especially for write heavy scenario.
And for cache, we use Redis, as it's key-value store and performs well as a cache level,
For scalability, I used load balancer, trying to distribute evenly the request to it service, and we could combine region-based and weighted round-robin for load balancing
Failure scenarios/bottlenecks
The bottle network maybe inside the shortening service, which might takes long time and resources, as we need a good encoding algorithm.
Besides that, due to our great request number, we need consider a good sharding policy.
Future improvements
I will try to optimize the efficing for shortening service and support more functionalities like editing shortend URLs and customized urls