Design An API Rate Limiter - System Design

My Solution for Design an API Rate Limiter

by naquoc17uni

Requirements

Functional Requirements:

Enforce per-user and per-API rate limits.
Admins can add/update/delete/view rate-limit rules via an API or console.

Non-Functional Requirements:

Low latency: the rate-limit check adds sub-millisecond overhead
Fail-open: if the limiter crashes, traffic is let through rather than blocked
High availability: the limiter is replicated, with no single point of failure
Distributed: the limiter runs across multiple servers
Loose consistency is acceptable because the limit on a single server is high (100 requests/minute)

Capacity Estimation

API Design

The rate limiter is middleware that runs inside the backend server.
Limits are set on specific endpoints of the backend server:
- GET /products: 1000 requests/ 15 min
- POST /payment: 10 requests / min
Each request includes userToken, endpoint, ... so the limiter can identify the caller's session and decide whether to allow or block the request within the current time window.
The middleware exposes a check endpoint that is called before the request reaches backend processing.
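
A minimal sketch of that check step, assuming a fixed-window counter kept in process memory. The names `LIMITS` and `check` are hypothetical, and the algorithm is only one possible choice. The two limits are the ones listed above.

```python
import time
from collections import defaultdict

# Per-endpoint limits from the list above: (max requests, window in seconds).
LIMITS = {
    ("GET", "/products"): (1000, 15 * 60),
    ("POST", "/payment"): (10, 60),
}

# (user_token, method, path, window index) -> requests seen in that window.
_counters = defaultdict(int)


def check(user_token, method, path, now=None):
    """Return True if the request may proceed to the backend handler."""
    rule = LIMITS.get((method, path))
    if rule is None:
        return True  # no rule configured for this endpoint
    limit, window = rule
    now = time.time() if now is None else now
    bucket = (user_token, method, path, int(now // window))
    if _counters[bucket] >= limit:
        return False  # the caller would typically answer with HTTP 429
    _counters[bucket] += 1
    return True
```

This sketch never evicts old windows; a real middleware would expire them or keep the counters in a store with TTLs.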
Create, update, or delete rate-limit rules programmatically:
- POST /rules : create a new rule
- PUT /rules/{ruleId} : update an existing rule
- DELETE /rules/{ruleId} : delete a rule
Fetch rules for audit or observability:
- GET /rules : list all rules
- GET /rules/{ruleId} : get one specific rule
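
The rule endpoints above map onto a small CRUD store. The following is a sketch only: `Rule`, its fields, and `RuleStore` are assumed names, and an in-memory dict stands in for whatever storage the rules actually live in.

```python
import itertools
from dataclasses import dataclass, asdict


@dataclass
class Rule:
    endpoint: str         # e.g. "POST /payment"
    limit: int            # max requests per window
    window_seconds: int


class RuleStore:
    """In-memory stand-in for the storage behind the /rules endpoints."""

    def __init__(self):
        self._rules = {}
        self._ids = itertools.count(1)

    def create(self, rule):             # POST /rules
        rule_id = str(next(self._ids))
        self._rules[rule_id] = rule
        return rule_id

    def update(self, rule_id, rule):    # PUT /rules/{ruleId}
        if rule_id not in self._rules:
            raise KeyError(rule_id)
        self._rules[rule_id] = rule

    def delete(self, rule_id):          # DELETE /rules/{ruleId}
        del self._rules[rule_id]

    def get(self, rule_id):             # GET /rules/{ruleId}
        return asdict(self._rules[rule_id])

    def list_all(self):                 # GET /rules
        return {rid: asdict(r) for rid, r in self._rules.items()}
```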

High-Level Design

Redis stores the traffic metadata (request counters). When burst traffic hits, every worker increments its counters in Redis. Three things can happen:
- Redis handles it fine (happy path)
- Redis gets slow under load
- Redis goes down completely at peak
Workers are stateless; the counter state lives on the Redis primary nodes. Redis is sharded and scales horizontally with traffic.
The fallback mechanism has three tiers:
- Tier 1 (normal): worker -> Redis primary node
- Tier 2 (burst traffic): worker -> local in-memory buffer -> Redis primary node
- Tier 3 (Redis down): fail-open, allow all traffic through
Each request picks a random active key so that load is distributed more evenly across shards. The counter is incremented on every check call.
When a rule is updated, a notification is sent to all workers. Workers listen for notifications with a timeout and refresh their local copy when a new rule is available.
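
The rule-propagation step can be sketched as follows. This is an in-process illustration under stated assumptions: `RuleBus` and `Worker` are hypothetical names, and a `queue.Queue` per worker stands in for a real notification channel (for example a pub/sub service).

```python
import queue


class RuleBus:
    """Hypothetical fan-out channel: one queue per subscribed worker."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self):
        q = queue.Queue()
        self._subscribers.append(q)
        return q

    def publish(self, rule_id, rule):
        for q in self._subscribers:
            q.put((rule_id, rule))


class Worker:
    def __init__(self, bus):
        self._inbox = bus.subscribe()
        self.rules = {}  # local copy consulted on every check

    def poll_rules(self, timeout=1.0):
        """Wait up to `timeout` seconds for an update; return True if one was applied."""
        try:
            rule_id, rule = self._inbox.get(timeout=timeout)
        except queue.Empty:
            return False
        self.rules[rule_id] = rule
        return True
```

Because each worker keeps its own copy, rule lookups stay local and a missed notification only delays the update until the next one arrives.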

Database Design

Detailed Component Design

We use a hash map to store each userId together with the TTL of its traffic counters.
Every worker keeps this metadata, and the limit is evaluated when the check endpoint is called.
The number of workers can scale with the capacity of the primary node.
All read-and-decrement operations on Redis are executed atomically in a Lua script, so no two requests can interleave and read the same token count.
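
A sketch of what such a script might look like, assuming a token-bucket layout stored in a Redis hash with fields `tokens` and `ts`. The key format, field names, and the helper `try_acquire` are assumptions, not part of the original design. The `client` argument is expected to expose an `eval(script, numkeys, *keys_and_args)` method as redis-py clients do. The script reads the clock with `TIME` before writing, which assumes Redis 5 or later.

```python
# Lua source executed atomically inside Redis (one script runs at a time).
TOKEN_BUCKET_LUA = """
local capacity = tonumber(ARGV[1])
local refill_per_ms = tonumber(ARGV[2])
local t = redis.call('TIME')                       -- Redis server clock
local now_ms = tonumber(t[1]) * 1000 + math.floor(tonumber(t[2]) / 1000)
local state = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local tokens = tonumber(state[1]) or capacity      -- a new key starts full
local ts = tonumber(state[2]) or now_ms
tokens = math.min(capacity, tokens + (now_ms - ts) * refill_per_ms)
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end
redis.call('HSET', KEYS[1], 'tokens', tostring(tokens), 'ts', tostring(now_ms))
redis.call('PEXPIRE', KEYS[1], math.ceil(capacity / refill_per_ms) * 2)
return allowed
"""


def try_acquire(client, user_id, endpoint, capacity, window_seconds):
    """Run the script once; True means the request consumed a token."""
    key = f"rl:{user_id}:{endpoint}"  # hypothetical key layout
    refill_per_ms = capacity / (window_seconds * 1000.0)
    return client.eval(TOKEN_BUCKET_LUA, 1, key, capacity, refill_per_ms) == 1
```

Reading, refilling, and decrementing all happen inside one script invocation, which is what prevents two requests from observing the same token count.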
Three-tier fallback:
- Redis responds normally - use Redis as the authoritative counter
- Redis is slow (no response within 50 ms) - fall back to the worker's local in-memory counter
- Redis is unreachable - fail-open and allow all traffic through
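
The three tiers above can be sketched as a single decision function. Assumptions: `redis_check` is a hypothetical callable that wraps the Redis call with a 50 ms timeout, and Python's built-in `TimeoutError` and `ConnectionError` stand in for the client library's own exception types.

```python
import time


class LocalCounter:
    """Per-worker fixed-window counter used when Redis is too slow."""

    def __init__(self, limit, window_seconds):
        self.limit, self.window = limit, window_seconds
        self.counts = {}

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        bucket = (key, int(now // self.window))
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return self.counts[bucket] <= self.limit


def decide(key, redis_check, local, now=None):
    """Return (allowed, tier) for one request."""
    try:
        return redis_check(key), "redis"       # tier 1: Redis is authoritative
    except TimeoutError:
        return local.allow(key, now), "local"  # tier 2: Redis too slow
    except ConnectionError:
        return True, "fail-open"               # tier 3: Redis unreachable
```

Returning the tier alongside the decision makes it easy to emit a metric for how often the limiter runs in a degraded mode.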
Redis server time is the single authoritative clock.
The local buffer syncs to Redis every 100 ms or when the local count exceeds 10% of the rate-limit threshold, whichever comes first.
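
One way to express that sync policy, as a sketch: `LocalBuffer` and the `flush(key, count)` callback are hypothetical, and the clock is injectable so the behaviour can be checked without sleeping. The time condition here is only evaluated when an increment arrives; a real worker would likely also run a background timer.

```python
import time


class LocalBuffer:
    """Accumulates increments locally and pushes them to Redis in batches."""

    def __init__(self, flush, threshold, interval=0.1, percent=10, clock=time.monotonic):
        self.flush = flush                             # hypothetical: flush(key, count)
        self.max_pending = threshold * percent / 100   # 10% of the rate-limit threshold
        self.interval = interval                       # 100 ms
        self.clock = clock
        self.pending = {}
        self.last_sync = clock()

    def incr(self, key):
        self.pending[key] = self.pending.get(key, 0) + 1
        too_many = self.pending[key] > self.max_pending
        too_old = self.clock() - self.last_sync >= self.interval
        if too_many or too_old:  # whichever comes first
            self.sync()

    def sync(self):
        for key, count in self.pending.items():
            self.flush(key, count)
        self.pending.clear()
        self.last_sync = self.clock()
```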
Each worker connects to Redis through a fixed-size connection pool. If the pool is exhausted under high load, new requests fall back to the local in-memory counter rather than waiting for a connection.
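
A sketch of the non-waiting behaviour, using a semaphore as a stand-in for the pool's free-connection count. `BoundedPool`, `redis_call`, and `local_call` are hypothetical names; an actual client library may signal exhaustion differently.

```python
import threading


class BoundedPool:
    """Fixed-size pool guard: callers never wait for a free connection."""

    def __init__(self, size):
        self._slots = threading.BoundedSemaphore(size)

    def run(self, redis_call, local_call):
        # Non-blocking acquire: returns False at once if all slots are busy.
        if not self._slots.acquire(blocking=False):
            return local_call()
        try:
            return redis_call()
        finally:
            self._slots.release()
```

For example, with a pool of size 1, a second call issued while the first still holds the slot is answered by `local_call` instead of queuing.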
