Requirements
Functional Requirements:
- Create a short URL for a given long URL.
- Return the long URL associated with a given short URL.
- A short URL can be permanent or temporary, with a user-provided expiration time.
- Users can't manage their urls or create accounts; it's an anonymous service.
- Anyone can create short urls.
- All shortened urls are public, not tied to permissions.
Non-Functional Requirements:
- Expected rate of url resolution is 1000/sec.
- Expected rate of short url creation is 10/sec.
- The expected amount of urls after a year is ~300 million. Expected rate of expiring urls is less than 1/sec.
- The read/write ratio is 100:1.
- Latency of url resolution should be less than 500ms.
- Latency of url creation should also be less than 500ms.
- Long urls max length is 4KB.
- The system should be highly available, but it should prefer consistency over availability: during a network partition it keeps resolving urls but rejects creation of new ones.
- The clients will be browser based.
- This is not a finance system: data should be stored durably, but some data loss is tolerable.
- To keep latency low, we need a multi region deployment.
- There can be bursts, e.g. if some viral content contains a link from this system. These load changes mainly affect reads.
- This is a worldwide service, so the time of day doesn't affect the load.
- There are no security or compliance requirements.
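The yearly volume above follows from the creation rate, and a rough upper bound on storage follows from the 4KB URL limit. A minimal sketch of that arithmetic, assuming every URL is at the maximum length and ignoring per-record metadata and replication (the function names are illustrative only):

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def urls_per_year(creates_per_sec: int) -> int:
    """Number of short urls created in a year at a constant creation rate."""
    return creates_per_sec * SECONDS_PER_YEAR


def worst_case_storage_bytes(url_count: int, max_url_bytes: int = 4 * 1024) -> int:
    """Upper bound on long-url storage, assuming every url uses the full 4KB."""
    return url_count * max_url_bytes


created = urls_per_year(10)                    # 10 creations/sec from the requirements
worst_case = worst_case_storage_bytes(created)
print(created)     # 315360000, i.e. roughly 300 million
print(worst_case)  # 1291714560000 bytes, i.e. roughly 1.3 TB
```

Real URLs are usually far shorter than 4KB, so the actual footprint should be well below this bound.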
API Design
This is a web-based API, so the obvious protocol choice is HTTPS with REST, since almost any client can integrate with it.
Serving url resolutions must be on the root path e.g. GET https://bit.ly/xyz
So the link resolution is with:
GET /<shorturl>
On successful resolution it returns a redirect status code with the long url as the redirect target.
When there is no such url, it redirects to our system's not found page.
The link creation is at:
POST /api/v1/urls
It accepts the url and an optional expiration time in the body, plus headers such as the anonymous user's IP.
On success the response contains the original url and short url.
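The two endpoints above can be sketched as plain functions. This is only an illustration under assumptions not stated in the design: the field name `expires_at` (Unix epoch seconds), the use of 302 as the redirect status, the `/not-found` path, and the validation rules are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlparse

MAX_URL_BYTES = 4 * 1024        # "Long urls max length is 4KB"
NOT_FOUND_PAGE = "/not-found"   # hypothetical path of the not found page


@dataclass
class CreateRequest:
    url: str
    expires_at: Optional[int] = None  # Unix epoch seconds; None means permanent


def validate_create(req: CreateRequest, now: int) -> List[str]:
    """Return a list of validation errors for POST /api/v1/urls (empty if valid)."""
    errors = []
    if len(req.url.encode("utf-8")) > MAX_URL_BYTES:
        errors.append("url exceeds 4KB")
    parsed = urlparse(req.url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        errors.append("url must be an absolute http(s) URL")
    if req.expires_at is not None and req.expires_at <= now:
        errors.append("expires_at must be in the future")
    return errors


def resolve_response(long_url: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Build (status, headers) for GET /<shorturl>; unknown urls go to the not found page."""
    target = long_url if long_url is not None else NOT_FOUND_PAGE
    return 302, {"Location": target}
```

A temporary redirect is assumed here so that clients keep asking the service; whether a permanent redirect is preferable depends on how much traffic should reach the backend.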
High-Level Design
The service lives in the USA, Europe and Asia regions.
The client's traffic will connect to an API Gateway that serves many purposes:
- directs to the right region
- failover handling: if a region is out then it can route to another region
- caches resolved GET /<shorturl> responses for up to 1 day, which reduces read load on the backend
- common place for observability, like logging requests and collecting global metrics such as request counts, status codes etc.
- directs read requests to the URL resolution service based on the PATH if the requested response is not in the cache
- directs all create requests to the URL Management service based on the PATH
- firewalling, rate limiting
The API GW connects to the URL resolution service:
- A separate service from the URL management service, so it can scale independently from the low-traffic writes.
- Resolves short urls from the database.
The URL management service is responsible for:
- generating short urls for the input urls and saving them in a database (short url, long url, expires at)
- deleting expired short urls.
URL generation:
- Uniqueness must be guaranteed
- Hashing a URL to create a short one would be fast, but there could be collisions, and resolving them while keeping ids unique is a problem.
- A UUID will most likely be unique, but its textual representation is too long for a short url.
- There must be some algorithm that transforms a monotonically increasing number into the text that will be the short url.
- The generated short url must be directly convertible back to the id for fast access.
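One common way to satisfy the last two points is base62 encoding of the numeric id. The design does not name an algorithm, so the alphabet and function names below are assumptions; this is a minimal sketch rather than the chosen implementation.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)  # 62
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(n: int) -> str:
    """Turn a non-negative id into its base62 short code."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode(code: str) -> int:
    """Turn a short code back into the id; raises KeyError on invalid characters."""
    n = 0
    for ch in code:
        n = n * BASE + INDEX[ch]
    return n


print(encode(62))                   # 10
print(decode("10"))                 # 62
print(len(encode(315_360_000)))     # 5
```

Since 62^5 is about 916 million, five characters are enough for the ids issued in the first year at 10 creations/sec.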
ID Generation DB:
- something that provides monotonically increasing IDs for the URL management service
- Postgres
URL Resolver Database:
- Structured data with a fixed schema: id, url, expiration time, created at. An RDBMS or a Key Value store would both work.
- Access patterns:
- a considerable amount of ID-based lookups. An RDBMS or a Key Value store would both work.
- deleting expired url records based on expiration time; with a secondary index, an RDBMS or NoSQL could both work.
- inserting into the DB with the next available id -> RDBMS if this DB generates the ID; if an ID Generation Service does it, then it can be a different kind of store.
- read heavy 100:1 if caching is off
- no transactions across entities; there is only a single entity
- amount of data: 300 million records / year -> partitioning by id conflicts with an expiration-time-based secondary index scan -> Key Value store that supports TTL.
- Based on those I pick a Key Value store that supports TTL: DynamoDB
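DynamoDB's TTL feature works on a Number attribute holding a Unix epoch timestamp in seconds, and expired items are removed in the background rather than at the exact expiry moment. A sketch of the record shape and a read-side expiry check follows, assuming the attribute names `id`, `url`, `created_at` and `expires_at` (these names are hypothetical, and no AWS call is made here):

```python
from typing import Any, Dict, Optional


def build_item(url_id: int, long_url: str, created_at: int,
               expires_at: Optional[int]) -> Dict[str, Any]:
    """Build the record to store; permanent urls simply omit the TTL attribute."""
    item: Dict[str, Any] = {"id": url_id, "url": long_url, "created_at": created_at}
    if expires_at is not None:
        item["expires_at"] = expires_at  # Unix epoch seconds, used as the TTL attribute
    return item


def is_live(item: Dict[str, Any], now: int) -> bool:
    """Expired items can linger until background deletion, so check on read too."""
    expires_at = item.get("expires_at")
    return expires_at is None or now < expires_at
```

With this shape the URL management service would not need its own deletion job for expired urls, at the cost of filtering not-yet-deleted expired items in the resolution path.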
Detailed Component Design
The URL Resolution service is stateless, so it scales horizontally without problems. It sits behind a load balancer.
The URL Resolver Database is a global DynamoDB table replicated across multiple regions. This means it is eventually consistent globally, but it scales out well (even automatically), is highly available, fails over to other availability zones or regions, and provides low latency. Eventual consistency is the tradeoff for low latency and high availability.
The load on the URL Management service is low, so a couple of instances (3) behind a load balancer per region are enough. All instances connect to the ID Generation database in the EU region to generate the next id they can assign to a long url. Once an instance has an id, it saves the url under that id in the URL resolver database. One open issue: if saving the url in the resolver database fails, the id that was already committed in the ID Generation database is left unused.
The ID Generation database is a single-master Postgres with a hot failover replica in the EU.
This results in higher latency for non-EU requests but guarantees sequential id issuing.
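One possible way to handle the committed-but-unsaved id is to treat it as burned: the id is never reused, and the service retries with a fresh one. Gaps in the sequence do not affect uniqueness. The sketch below uses in-memory stand-ins for both databases; `IdSequence`, `ResolverStore` and `create_short_url` are hypothetical names, not part of the design.

```python
import itertools


class IdSequence:
    """In-memory stand-in for the ID Generation database."""

    def __init__(self, start: int = 1):
        self._counter = itertools.count(start)

    def next_id(self) -> int:
        return next(self._counter)  # once handed out, an id is never reissued


class ResolverStore:
    """In-memory stand-in for the URL resolver database."""

    def __init__(self):
        self.rows = {}
        self.fail_next = False  # test hook to simulate a failed write

    def put_if_absent(self, url_id: int, long_url: str) -> None:
        if self.fail_next:
            self.fail_next = False
            raise ConnectionError("simulated write failure")
        if url_id in self.rows:
            raise KeyError(f"id {url_id} already stored")
        self.rows[url_id] = long_url


def create_short_url(seq: IdSequence, store: ResolverStore, long_url: str,
                     max_attempts: int = 3) -> int:
    """Allocate an id and save the url; a failed save burns the id and retries."""
    for _ in range(max_attempts):
        url_id = seq.next_id()
        try:
            store.put_if_absent(url_id, long_url)
        except ConnectionError:
            continue  # the id stays unused, leaving a harmless gap
        return url_id
    raise RuntimeError("could not save url")
```

The conditional write (`put_if_absent`) guards against overwriting an existing mapping if an id were ever handed out twice.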
API GW caching:
- a longer TTL, like the expiration time of the url, is not viable: the cache would fill quickly
- the service needs to pick min(1 hour, time remaining until expiration) as the TTL to avoid returning an expired link
- the default TTL is 1 hour to defend against bursty loads
- LRU eviction, if it is configurable
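The TTL rule above can be sketched as a small function. It assumes the expiration is stored as Unix epoch seconds and that the gateway cache honors a standard `Cache-Control` response header; both are assumptions, and the function names are illustrative.

```python
from typing import Optional

DEFAULT_TTL_SECONDS = 60 * 60  # the 1 hour default from the list above


def cache_ttl_seconds(now: int, expires_at: Optional[int]) -> int:
    """min(1 hour, time remaining until expiration); permanent urls get the default."""
    if expires_at is None:
        return DEFAULT_TTL_SECONDS
    remaining = max(0, expires_at - now)
    return min(DEFAULT_TTL_SECONDS, remaining)


def cache_control_header(now: int, expires_at: Optional[int]) -> str:
    """Header value for a resolved redirect; already-expired urls are not cached."""
    ttl = cache_ttl_seconds(now, expires_at)
    return f"public, max-age={ttl}" if ttl > 0 else "no-store"


print(cache_control_header(now=1_000, expires_at=None))    # public, max-age=3600
print(cache_control_header(now=1_000, expires_at=1_600))   # public, max-age=600
print(cache_control_header(now=1_000, expires_at=900))     # no-store
```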
API GW:
- I would use the cloud provider's solutions to protect against attacks
- I'd also set some sensible rate limit / ip.
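A per-IP rate limit is often implemented as a token bucket. The design does not specify an algorithm or limits, so the sketch below is only one option; the class names and the rate and capacity values are hypothetical, and a managed gateway would normally provide this as configuration.

```python
class TokenBucket:
    """Refills at rate_per_sec up to capacity; each request spends one token."""

    def __init__(self, rate_per_sec: float, capacity: float, now: float):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerIpLimiter:
    """Keeps one bucket per client IP."""

    def __init__(self, rate_per_sec: float, capacity: float):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.buckets = {}

    def allow(self, ip: str, now: float) -> bool:
        bucket = self.buckets.get(ip)
        if bucket is None:
            bucket = self.buckets[ip] = TokenBucket(self.rate, self.capacity, now)
        return bucket.allow(now)
```

Passing `now` explicitly keeps the sketch deterministic; a real deployment would read a monotonic clock and also evict idle buckets to bound memory.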