Designing A Simple Url Shortening Service A TinyURL Approach - System Design

Requirements

Functional Requirements:

Create a short URL for a given long URL.
Return the long URL associated with a given short URL and redirect to the long url
edit shortened url

Non-Functional Requirements:

low latency, less than 200 ms
scalability - we want to support up to 100 million DAU
high availability, eventual consistency if needed
security - those who created the links can edit

API Design

users are identified using JWT, only valid users can create short links, unregistered users can use the redirect feature.

POST /url -> short-url, 201 Created- short url is returned to share it with others

PATCH /url -> short-url, 200 OK

GET /shortUrl -> redirect, 302 moved temporarily - in case the creating user edited original link, we don't want to cache the result in the browser

we will specifically need two core entities:

User - to hold information about a user, such as id, name, email, createdAt, etc.
Url - to hold information about created links such as id, userId, original_url, short_url, createdAt, etc.

High-Level Design

Describe the overall system architecture. Identify the main components needed to solve the problem end-to-end. Use the diagramming tool to create a block diagram.

we have:

the client
the server - to do the query processing
the database - to hold out information such as url mappings and user information

API tier - is handled in the server. doing a quick calculation of 100 DAU each uses the service 5-20 times a day, lets assume 20, we get 2 billion queries per day. dividing it up uniformly throughout the day is roughly 23.2 thousand requests per second which can be handled by a single server instance. all requests are mapped to entry points in the server which proceeds to handle both write and read/redirect paths.

write path - user requests the server to create a short url. server tries to create a short url based on the creating user id and original url, and then will try to create the entry in the DB and return a short url if it was successful. the unique identifier is based of the hash of original url + creating user id, and then mapping it to base62.

so the unique identifier avoids collisions by hashing both userId (of the generating user) and the original url. the hash creates a unique value which is than mapped to base62. we have 100 million DAU and each uses the service 5-20 times a day, lets assume 20, we need to support 2 billion addresses conceptually. using 6 base62 characters will give us an address range of roughly 55 billion addresses which is more than enough compared to the 2 billion estimation.

redirect path - user requests the original url from the server. server checks first in the cache and returns the original url if it exists. if it doesn't exists it fetches it from the DB and adds it to the cache and only then returns the original_url based on the given short url.

storage path - so data is stored in PostgresDB, which is a relational database, and frequently accessed data is in the cache. the database contains two tables:

users - entry holds columns of id(unique, primary key), name, email, createdAt, etc.
urls - entry holds columns of id(unique, primary key), userId(foreign key), originalUrl, shortUrl, etc.

the ids of each table with also serve as an index (b-tree) for quick lookups.

caching layer - for the cache we'll use Redis withe its performant key-value mappings to access data quickly. accessing the database is roughly 50 ms worldwide, and with caching it reduces it to below 5 ms per query. the cache operates in a cache aside with LRU eviction policy - server first searches it instead of the database. only if data is missing it fetches it from the database and populates the cache before returning a response. older unused short urls gets pushed out while highly demanded urls remain in the cache.

Detailed Component Design

so now lets address the non functional requirements. we want to handle low latency, scalability and high availability.

caching - as stated before we are using LRU eviction policy, and entry invalidation upon write so each read is as fresh as possible. to prevent stale data we can also introduce TTL for the entries with value of 30 seconds.

low latency is achieved with the cache. if we have a hotspot problem we can also introduce an in-memory cache for even quicker responses to not overwhelm the database or the cache. lets say we keep 10000 of these short urls, which sum up to 10 KB of space that can sit nicely in memory. again with an eviction policy of LRU and TTL with a value of 30 seconds.

LRU with TTL ensures the data is kept fresh by keeping frequently accessed short urls in cache or in-memory cache while clearing from memory stale data. so LRU keeps the most used data in and the TTL ensures unused data for a prolonged amount of time will also be kicked out. that way we have a great usage of the memory space. LRU prevents staleness and TTL prevents bloating. when either the LRU cache is full or a TTL of an entry expires the relevant entries are removed from the cache.

scalability can be achieved by separating the single server instance to 2 services - one for reads and one for writes. each has a designated function that we can easily scale separately according to rising demands. now we have a couple of issues so lets address them:

how to know which server to use? we'll introduce an API gateway to be responsible for choosing the right service to use. which can also handle authentication, authorization, load balancing, rate limiting and service discovery(if we use multiple servers for the same purpose).
request failures will be handled using timeouts with retries with exponential backoff and jitter.
for rate limiting we could use a token bucket algorithm. we could add to the database a tokens table with user id, token amount, and refreshAt. each user has an amount per hour, like 10 token. that way we can check in the API gateway whether a user can perform a write action, and if they cant use a "retry-after" approach and inform them using the header and specify the time needed to wait. otherwise the request is dropped.
now scaling each dedicated server is easy and we can also introduce multiple server if needed. we can also optimize any single server if needed. also dividing the server to dedicated servers mitigates risk and contribute to high availability.

high availability is a given here since we use a single database and the separation of concerns of the server allows better availability.

security is again handled with authentication and authorization in the API gateway which now also serves as a barrier to the services behind the scenes and doesn't expose than to attacks such as DDoS.

another problem is generator outage. when the write server fails we don't know where we've stopped, and when spinning up a new instance happens we have duplicate shortUrls. so just in case we can save the information in both the cache and database, and when spinning the server back up we first check the offset in the database and continue from there. but a key point is that we don't want to update this each time since it will introduce latency, so we'll do it in hops of 10,000 address each time. that way, we never use the same shortUrl, and in the worse case scenario we loose 10K addresses which is fine since we have an address space of roughly 55B compared to the required 2B.

for durability we can introduce a secondary "warm" database with copies of all data. that way we can swap to it if needed and when to original database comes back up treat it as the secondary.

another possible issue is the thundering herd, when the cache is down and the services might overwhelm the database. which isn't really the case here since the calculations showed a QPS of 23.2K which the database can handle just fine. but if we do want to preserve latency requirements we could also introduce a secondary cache just in case.

also when multiple users request the same short url we can use request coalescing to reduce amounts of requests to the cache/database.

regarding sharding - a single PostgresDB instance can hold up to a couple of terabytes. url entries are roughly 0.4KB each and with 2B possible short urls we need to create thats 0.8 TB of data. user entries are roughly 0.5KB each with 8B possible users around the world that 4 TB of data. lets sum and round it up to 5 TB of data. a single instance of Postgres is sufficient for the task and there is no need for sharding.

for burst write we can use a dedicated message broker, like kafka, to handle the async task that will eventually write to the database. eventual consistency is fine in this case, and the short url doen't have to work from the first second, a couple seconds of delay is acceptable.

cache invalidation will be handled by the write server which will invalidate the information in the cache after adding a write/update task to the message broker.

regarding partitioning in the system. the most common request is probably a read request so we can add multiple read services to even out the load, all of which use the same cache so they share the information. that way any hotspot problem could be spread out across multiple read servers. these servers can sit behind a layer 4 load balancer to spread the load in a round robin way.