Design An API Rate Limiter - System Design

Requirements

Functional Requirements:

Enforce per-user and per-API rate limits.
Admins can add/update/delete/view rate-limit rules via an API or console.

Non-Functional Requirements:

High Availability
Horizontal Scalability - scales based on CPU load or network load to handle around 1 million checks per second.
Low Latency - fast allow/deny decisions using an algorithm such as token bucket.
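As an illustration of the token bucket mentioned above, here is a minimal single-process sketch. The class name, the `capacity` and `refill_rate` parameters, and the injectable `now` argument are assumptions made for the example; a production limiter would keep this state in a shared store rather than in process memory.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity, refill_rate, now=None):
        self.capacity = float(capacity)
        self.refill_rate = float(refill_rate)
        self.tokens = float(capacity)  # start with a full bucket
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now=None, cost=1.0):
        """Return True and consume `cost` tokens if enough are available."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last_refill)
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A bucket with `capacity=2` and `refill_rate=1` allows a burst of two requests, then roughly one request per second after that.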

API Design

GET /api/v1/Rules - to fetch all rules

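GET /api/v1/Rule?ruleID={ruleID} - to fetch a specific rule for audit and observability.

POST /api/v1/CreateRule - to create a new rule

PUT /api/v1/UpdateRule?ruleID={ruleID} - to update an existing rule

DELETE /api/v1/DeleteRule?ruleID={ruleID} - to delete a rule

POST /api/v1/decideCheck - to decide whether a request is allowed or denied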
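The design does not spell out the payload for /api/v1/decideCheck, so the following is only a sketch of one possible shape. The field names (`user_id`, `api`, `allowed`, `remaining`, `status`) and the simple per-key counter are hypothetical; the counter stands in for whatever algorithm the Check Service actually uses, and window reset is omitted.

```python
from dataclasses import dataclass


@dataclass
class CheckRequest:
    user_id: str  # hypothetical field: who is calling
    api: str      # hypothetical field: which API is being called


@dataclass
class CheckResponse:
    allowed: bool
    remaining: int
    status: int  # 200 when allowed, 429 (Too Many Requests) when denied


def decide_check(req, limit, counters):
    """Count one request against the (user, api) key and decide allow/deny."""
    key = (req.user_id, req.api)
    used = counters.get(key, 0)
    if used >= limit:
        return CheckResponse(allowed=False, remaining=0, status=429)
    counters[key] = used + 1
    return CheckResponse(allowed=True, remaining=limit - used - 1, status=200)
```

Keying on the (user, API) pair is what makes the limit both per-user and per-API: the same user calling a different API draws from a separate counter.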

High-Level Design

READ PATH:

CLIENT -> CDN -> API Gateway -> Load Balancer -> Check Service -> Rule Service -> Observability Service -> Kafka Queue -> Redis -> Postgres

WRITE PATH:

CLIENT -> CDN -> API Gateway -> Load Balancer -> Check Service -> Rule Service -> Observability Service -> Kafka Queue -> Redis -> Postgres

Before any request is allowed through to the backend, the Check Service (or a sidecar) checks its quota and any other applicable rules, and only then allows or denies it.

The load balancer is integrated with the API Gateway's API paths through a VPC endpoint, which makes communication between the two possible.

Whenever a request hits an API, it goes through the load balancer and from there to the respective service. If it is a read operation, the service first tries Redis; on a cache miss it falls back to Postgres.
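The read flow described above (Redis first, then Postgres on a miss) is a cache-aside lookup. A minimal sketch follows, using plain dicts as stand-ins for Redis and Postgres; the function name `get_rule` and the returned source label are assumptions made for illustration.

```python
def get_rule(rule_id, cache, db):
    """Cache-aside read: try the cache, fall back to the database on a miss.

    `cache` and `db` are dict stand-ins for Redis and Postgres.
    Returns (rule, source), where source says which store answered.
    """
    rule = cache.get(rule_id)
    if rule is not None:
        return rule, "cache"
    rule = db.get(rule_id)
    if rule is not None:
        # Populate the cache so the next read is served without touching the DB.
        cache[rule_id] = rule
    return rule, "db"
```

The first read of a rule goes to the database and fills the cache; later reads of the same rule are served from the cache.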

We will deploy changes as rolling updates so the service stays up with no downtime.

We have a Kafka queue that holds requests while a service is down. Once the service is back up, it can pick up where it left off. There will be some lag during recovery, but that is a trade-off we accept. We will go with a fail-closed policy: if the limiter cannot reach a decision, the request is denied.
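Fail-closed means that when the limiter cannot make a decision (for example, because its backing store is unreachable), the request is denied rather than let through. A minimal sketch, assuming a hypothetical `check` callable that returns a boolean and raises on infrastructure errors:

```python
def check_fail_closed(check, request):
    """Run `check(request)`; deny the request if the check itself fails.

    `check` is a hypothetical callable that returns True (allow) or
    False (deny) and raises an exception when it cannot decide.
    """
    try:
        return bool(check(request))
    except Exception:
        # Fail closed: an error in the limiter is treated as a deny.
        return False
```

The trade-off is that a limiter outage blocks legitimate traffic; a fail-open policy would return True in the except branch instead.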

We have a CDN that serves most GET requests from its cache. If the CDN misses, we fall back to the Redis cache, and only after a miss in both does a request hit the DB directly. The CDN is also highly scalable, so its downtime should be minimal. If Redis goes down, the CDN can keep handling requests until Redis is back up.

We won't use sticky sessions, which keeps the services stateless.

We can use a salting technique, which adds a random value to the partition key so that data is distributed more evenly across partitions, in turn reducing hotspots on a single shard.
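A minimal sketch of key salting follows. The `#` separator, the function names, and the SHA-256-based partitioner are assumptions made for the example; the real partitioner depends on the datastore in use.

```python
import hashlib
import random


def salted_key(partition_key, num_salts, rng=random):
    """Append a random salt in [0, num_salts) so writes spread over several keys."""
    salt = rng.randrange(num_salts)
    return f"{partition_key}#{salt}"


def all_salted_keys(partition_key, num_salts):
    """Every salted variant of a key; a reader must combine data from all of them."""
    return [f"{partition_key}#{s}" for s in range(num_salts)]


def partition_for(key, num_partitions):
    """Illustrative hash partitioner: map a key to a partition index."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Salting trades write distribution for read cost: writes for one hot key are spread across `num_salts` keys, but reading the full picture for that key means visiting every salted variant.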

Detailed Component Design

CDN - We host our frontend on this. It can handle millions of requests and is available at the edge. It also acts as the first layer of cache, where most read requests are taken care of. We can also integrate our object store with it to serve content files easily. This helps if the external cache is slow or unavailable.

API Gateway - Routes traffic based on the API being hit. It also rate-limits/throttles requests, which helps mitigate DDoS and brute-force attacks.

Load Balancer - Configured with the API Gateway and used to distribute load across the servers hosting our services, keeping traffic uniform and able to absorb traffic spikes.

Kafka Queue - Takes in the events that the services create and pushes the data into the DB and Redis. If a service goes down, it can continue where it left off once it recovers, since the queue still holds the data to be processed. This gives us graceful degradation.

Cleanup Service - Keeps the Redis cluster clean by removing entries after their TTL expires. It can also invalidate the cache in the CDN.
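A minimal sketch of the TTL sweep the Cleanup Service performs, using a plain dict as a stand-in for the store. The `(value, expires_at)` layout and the function name are assumptions made for the example. Redis can also expire keys on its own when a TTL is set on them, so an explicit sweep like this would mainly apply to data without a native TTL.

```python
def sweep_expired(store, now):
    """Remove entries whose expiry time has passed; return the removed keys.

    `store` maps key -> (value, expires_at), with `expires_at` and `now`
    expressed in the same time unit (for example, seconds).
    """
    # Collect keys first so the dict is not mutated while iterating over it.
    expired = [key for key, (_value, expires_at) in store.items() if expires_at <= now]
    for key in expired:
        del store[key]
    return expired
```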