Requirements
Functional Requirements:
- Create a short URL for a given long URL.
- Return the long URL associated with a given short URL.
Non-Functional Requirements:
- Scalability: 100M daily active users (DAU).
- Throughput: peak of 100,000 reads and 10,000 writes per second.
API Design
POST /shorten-url
body:
{
"url": "http://www.example.com"
}
GET /full-url?url="http://www.a.c"
Returns an HTTP 302 redirect to the full URL if the short URL exists. Otherwise, there are two options:
- Redirect back to the main website.
- Return 404 NOT FOUND.
High-Level Design
An API gateway sits in front of the services and does the following:
- Load balancing: it distributes requests across multiple servers so that no single server is overloaded as the number of users and requests grows.
- Authentication: only authenticated users can create short URLs. The gateway authenticates a user by reading the Authorization header, which should contain a Bearer JWT.
- Rate limiting: it limits how often a single IP address can call the service, for example to 10 requests per second.
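The per-IP limit can be sketched as a sliding-window counter. This is a minimal, in-memory illustration only: the class name SlidingWindowRateLimiter and its parameters are hypothetical, and a real gateway running on several nodes would need a shared store rather than a per-process dictionary.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `limit` requests per `window_seconds` for each IP."""

    def __init__(self, limit=10, window_seconds=1.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip):
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the gateway would answer 429 Too Many Requests
        hits.append(now)
        return True
```

With the defaults, the 11th call to allow() for the same IP within one second returns False, while other IPs are unaffected.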
Write service:
- The client calls the service with POST /shorten-url.
- The write service generates a v4 UUID to serve as the short URL. A v4 UUID has 122 random bits, so the chance that any two of them collide is 1 in 2^122, which is practically zero. The mapping is stored in a SQL table called urls as [fullUrl PK, shortUrl], with fullUrl as the primary key. Both columns are indexed. The row is written with an insert that does not overwrite existing values (an upsert that does nothing on conflict). If a row for fullUrl already exists, the existing shortUrl is returned; otherwise the new shortUrl is returned.
- The domain name + /t/ + shortUrl (UUID) is returned to the client. Example: http://www.tiny.com/t/<UUID>
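The write path above can be sketched as follows. To keep the example self-contained it uses Python's built-in sqlite3 module as a stand-in for the SQL database; SQLite's INSERT OR IGNORE plays the role that INSERT ... ON CONFLICT DO NOTHING would play in Postgres. The function names init_db and shorten and the constant BASE_URL are hypothetical.

```python
import sqlite3
import uuid

BASE_URL = "http://www.tiny.com/t/"  # domain taken from the example above


def init_db(conn):
    # fullUrl is the primary key; UNIQUE on shortUrl gives it an index too.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS urls (
            fullUrl  TEXT PRIMARY KEY,
            shortUrl TEXT NOT NULL UNIQUE
        )
        """
    )


def shorten(conn, full_url):
    """Return the short URL for full_url, creating a mapping only if none exists."""
    candidate = str(uuid.uuid4())
    with conn:  # commits on success, rolls back on error
        # An existing row for full_url is left untouched.
        conn.execute(
            "INSERT OR IGNORE INTO urls (fullUrl, shortUrl) VALUES (?, ?)",
            (full_url, candidate),
        )
        row = conn.execute(
            "SELECT shortUrl FROM urls WHERE fullUrl = ?", (full_url,)
        ).fetchone()
    if row is None:
        # Only reachable if the new UUID collided with an existing shortUrl.
        raise RuntimeError("UUID collision, retry with a new UUID")
    return BASE_URL + row[0]
```

Calling shorten twice with the same long URL returns the same short URL, because the second insert is ignored and the stored shortUrl is read back.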
Read service:
- The client opens the short URL in the browser.
- The browser sends the request to a read service that is responsible for /t/* URLs. The service looks up the full URL in the DB. Since shortUrl is indexed, the lookup is efficient.
- The service returns the full URL with a 302 redirect status code, as specified in the API design.
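A minimal sketch of the read path, independent of any web framework. It uses the 302 status and the 404 fallback from the API Design section. The function name resolve and the lookup callable are hypothetical; in this design, lookup would query the cache and then the DB.

```python
import uuid

PREFIX = "/t/"


def resolve(path, lookup):
    """Map a request path such as /t/<UUID> to (status_code, headers).

    `lookup` is any callable that takes the short ID and returns the
    full URL, or None when the short ID is unknown.
    """
    if not path.startswith(PREFIX):
        return 404, {}
    raw_id = path[len(PREFIX):]
    try:
        # Reject malformed IDs before touching the database, and
        # normalize valid ones to the canonical lowercase form.
        short_id = str(uuid.UUID(raw_id))
    except ValueError:
        return 404, {}
    full_url = lookup(short_id)
    if full_url is None:
        return 404, {}
    return 302, {"Location": full_url}
```

Validating the UUID first means that obviously invalid paths never reach the database.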
Database:
- The database is a SQL database with one primary (master) and multiple read replicas. A standard Postgres database can handle about 10k writes per second, which matches our peak write requirement. For reads, we can have 3 read replicas, each of which can handle around 100k reads per second. This also meets our requirements.
Cache:
- A caching layer in front of the database stores the UUID -> full URL mapping. We can use Redis for this, which has a faster response time than Postgres. Assume the average URL size is 100 bytes and a UUID is 16 bytes. A 1 GB cache can then hold roughly 1,000,000,000 / 116, which is about 8.6 million URLs (ignoring per-entry overhead). We can use LRU eviction. The cache is filled when there is a cache miss and when the write service writes a new URL.
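The sizing arithmetic and the read-through behavior can be sketched as below. This is an in-process illustration, not a Redis client: the names LRUCache and get_full_url are hypothetical, 1 GB is taken as 10^9 bytes, and per-entry overhead in the cache is ignored, so the entry count is an upper bound.

```python
from collections import OrderedDict

URL_BYTES = 100       # assumed average full URL size, from the text
UUID_BYTES = 16       # a UUID stored in binary form
CACHE_BYTES = 10**9   # 1 GB, taken as 10^9 bytes

# Upper bound on cached mappings: about 8.6 million.
MAX_ENTRIES = CACHE_BYTES // (URL_BYTES + UUID_BYTES)


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def get_full_url(short_id, cache, db_lookup):
    """Read-through: try the cache, fall back to the DB, then fill the cache."""
    url = cache.get(short_id)
    if url is None:
        url = db_lookup(short_id)
        if url is not None:
            cache.put(short_id, url)
    return url
```

With Redis, the same eviction behavior would normally come from server configuration (a maxmemory limit with the allkeys-lru policy, which approximates LRU) rather than from application code.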