My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach
by quintessence9189
Requirements
Functional Requirements:
- Create a short URL for a given long URL.
- Return the long URL associated with a given short URL.
Non-Functional Requirements:
- Low latency during redirect to long URL (< 200 msec)
- Availability >>> consistency; it could take up to 1 minute for the short->long URL mapping to take effect.
- High throughput (TBD)
- Scale
- 1 million URL writes per day for 10 years
- 1 billion URL reads per day
API Design
API or data flow:
- POST /url/longurl={$url} ---> 200 OK (short URL)
- GET /url/{shorturl} ---> 302 (long URL)
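The two calls above can be sketched as plain Python functions, with a hypothetical in-memory dict standing in for the URL DB. The `shorten` helper here is only a placeholder (the real generator, described in the detailed design below, hashes a random number); the function names are illustrative, not part of the design.

```python
import hashlib

# Hypothetical in-memory stand-in for the URL DB (short key -> long URL).
short_to_long: dict[str, str] = {}

def shorten(long_url: str) -> str:
    # Placeholder only: first 7 hex chars of the URL's MD5.
    return hashlib.md5(long_url.encode()).hexdigest()[:7]

def post_url(long_url: str) -> tuple[int, str]:
    """POST /url/longurl={$url} -> 200 OK with the short URL in the body."""
    short = shorten(long_url)
    short_to_long[short] = long_url
    return 200, short

def get_url(short: str) -> tuple[int, dict]:
    """GET /url/{shorturl} -> 302 with a Location header, or 404 if unknown."""
    long_url = short_to_long.get(short)
    if long_url is None:
        return 404, {}
    return 302, {"Location": long_url}
```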
High level design:
The API GW acts as an LB and performs auth; rate limiting could also be added here.
Write service:
- Since writes are far fewer than reads, we create a dedicated write service that maps a long URL to a short URL.
- The client calls the service to create the short URL. We generate a short URL and store it in the URL DB. We also write it to a cache for the read service.
Read service:
- The read service redirects a short URL to its long URL.
- The redirect could be either HTTP 301 or 302. I choose 302 so that redirects always come through us, which lets us collect metrics and analytics if needed.
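The 301-vs-302 trade-off comes down to caching: browsers may cache a 301 (permanent) redirect and skip our servers on later visits, while a 302 (temporary) keeps every click on the read service. A minimal sketch of the response builder (the function name is hypothetical):

```python
def redirect_response(long_url: str, permanent: bool = False) -> tuple[int, dict]:
    # 301 (permanent) lets browsers cache the redirect, so repeat visits
    # may bypass our servers entirely; 302 (temporary) keeps every hit
    # on our read service, which is what enables per-click analytics.
    status = 301 if permanent else 302
    return status, {"Location": long_url}
```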
Detailed Component Design
- The algorithm used to generate the short URL: take the MD5 hash of a random number, then base62-encode it. The resulting short key is 5-7 characters. We check the DB for a collision; if there is one, we generate a new random number and try again.
- If there is no collision, we save the mapping to the DB and the cache.
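The generate-check-retry loop above can be sketched as follows; a plain dict stands in for the URL DB, and the 7-character key length is one choice within the 5-7 range mentioned above.

```python
import hashlib
import secrets

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    """Encode a non-negative integer in base62."""
    if n == 0:
        return BASE62[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(BASE62[r])
    return "".join(reversed(out))

url_db: dict[str, str] = {}  # in-memory stand-in for the URL DB

def generate_short_key(long_url: str, length: int = 7) -> str:
    while True:
        rand = secrets.randbits(64)                        # random number
        digest = hashlib.md5(str(rand).encode()).digest()  # md5sum of it
        key = base62_encode(int.from_bytes(digest, "big"))[:length]
        if key not in url_db:                              # collision check
            url_db[key] = long_url                         # save mapping
            return key
        # collision: loop and try a new random number
```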
The URL write/read services can also run with 3:1 read-to-write replicas for availability and throughput.
We could add logic to dynamically add more instances if all shards of the URL service are CPU-bound.
DB sharding:
We could shard on the short URL key and scale horizontally, with an LB in front.
Cache sharding: in Redis, we use the short key as the partition key.
Hot shard problem: we use replication so that a hot key is written to multiple replicas and reads are load-balanced across them.
If that is not enough, we could additionally add more clusters.
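The sharding and hot-key scheme above can be sketched as two small routing functions. The shard and replica counts here are illustrative assumptions, not numbers from the design (the 3 replicas echo the 3:1 replication note earlier).

```python
import hashlib
import random

NUM_SHARDS = 4      # assumed shard count, for illustration
NUM_REPLICAS = 3    # replicas per shard, per the 3:1 note above

def shard_for(short_key: str) -> int:
    # Deterministic shard choice: hash the short key, mod the shard count.
    h = int(hashlib.sha256(short_key.encode()).hexdigest(), 16)
    return h % NUM_SHARDS

def replica_for_read(short_key: str) -> tuple[int, int]:
    # Writes go to every replica of the key's shard; reads pick one
    # replica at random, spreading a hot key's traffic across replicas.
    return shard_for(short_key), random.randrange(NUM_REPLICAS)
```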
Deep dive:
Writes:
10^6 writes / ~10^5 seconds per day = ~10 URL writes per second, which is trivial. We don't really need to scale writes.
For the DB: 1M * 365 * 10 = 3.65B URLs. If each DB entry is 200 bytes,
that is ~730 GB. This can easily fit on one server; we can still run multiple replicas for availability.
Cache size: we only need ~100 bytes per entry,
so 3.65B * 100 = 365 GB. This fits in a Redis cluster with replication so that reads stay fast.
Reads: with a 1000:1 read-to-write ratio, that is 10 * 1000 = ~10,000 URL reads per second.
If each redirect takes 100 msec, a server handling ~300 concurrent requests serves ~3,000 RPS, so we can horizontally scale to 3-4 read servers.
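The back-of-envelope numbers above can be checked in a few lines; the per-server read capacity is an assumption (a server with ~300 concurrent requests at 100 ms each), not a measured figure.

```python
import math

SECONDS_PER_DAY = 86_400  # ~1e5

writes_per_day = 1_000_000
reads_per_day = 1_000_000_000
years = 10
db_entry_bytes = 200
cache_entry_bytes = 100

write_rps = writes_per_day / SECONDS_PER_DAY    # ~12 writes/sec -> trivial
read_rps = reads_per_day / SECONDS_PER_DAY      # ~11,600 reads/sec (~10K)

total_urls = writes_per_day * 365 * years       # 3.65B URLs over 10 years
db_bytes = total_urls * db_entry_bytes          # ~730 GB of DB storage
cache_bytes = total_urls * cache_entry_bytes    # ~365 GB of cache

per_server_rps = 3_000  # assumed: ~300 concurrent requests at 100 ms each
read_servers = math.ceil(read_rps / per_server_rps)

print(f"writes/sec: {write_rps:.0f}, reads/sec: {read_rps:.0f}")
print(f"DB: {db_bytes / 1e9:.0f} GB, cache: {cache_bytes / 1e9:.0f} GB")
print(f"read servers needed: {read_servers}")
```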