Design An API Rate Limiter - System Design

Requirements

Functional Requirements:

per-user / per-API rate limits: 100 requests per minute for search,
Admins can add/update/delete/view rate-limit rules via an API or console.
Allow/deny decision with Retry-After header: the limiter returns a clear verdict. On denial, it tells the client exactly how many seconds to wait before retrying.

Non-Functional Requirements:

List the key non-functional requirements (eg low latency, scalability, reliability, etc.)...
The API limiter need to have low latency in terms of updating the api rate limit
the limiter must have strong consisitency for updates
The limiter has to be scalable to apply to millioons of users
The limiter needs high throughput, supporting 1 million decisions per second at peak
The limiter has to be atomic. When 2 requests comes in and only 1 token left, those 2 request must not succeed
The limiter must be highly avalible,if the limiter goes down, the platform either blocks all traffic (fail-closed) or allows all traffic (fail-open). Both are bad.

API Design

Define the APIs expected from the system. This is your chance to analyze and define the read and write paths so that you can come up with the high-level design...

The admin:

Management API

CRUD operations for rate limit rules. POST /ratelimit/rules creates a new rule. GET /ratelimit/rules lists all rules with filtering. , another ger for GET /ratelimit/rules/ :id PUT /ratelimit/rules/:id updates a rule. DELETE /ratelimit/rules/:id removes a rule.

This API hits PostgreSQL directly. It is called by operations teams, not by production traffic. Latency of 100ms is acceptable.

uesr:

call some service--> hit API rate limiter--> limiter decide if user extends a certain threshold--> allow pass through or not

Inside the limiter there is the deicsion API, which is a POST request to see

Decision API:

The core endpoint: POST /ratelimit/check. The gateway sends {"user_id": "u_42", "endpoint": "/api/messages", "scope": "user"}. On allow, the response is {"allowed": true, "remaining": 14, "limit": 100, "reset": 1710432000}. On denial: {"allowed": false, "remaining": 0, "limit": 100, "retry_after": 23}. The response takes under 1ms because it reads from Redis, never PostgreSQL.

The response:

the response includes the rateLimit, remainingLimit, and when the limit got reset

High-Level Design

Describe the overall system architecture. Identify the main components needed to solve the problem end-to-end. Use the diagramming tool to create a block diagram.

The request path: Client sends a request to the API Gateway. The gateway extracts the user identity and forwards the request to the Rate Limiting Service (RLS). RLS constructs a Redis key from the user identity and endpoint, runs a Lua script on Redis to atomically check and decrement tokens, and returns allow or deny. If allowed, the gateway forwards the request to the Backend Service. If denied, the gateway returns 429 with Retry-After.

API Gateway: authenticates, extracts user identity, routes to RLS. Does NOT make rate limiting decisions itself.
Rate Limiting Service (RLS): stateless workers. Each instance loads rules from a local cache (populated from PostgreSQL via NOTIFY). For each request, it constructs a key, calls Redis, interprets the result, and returns a decision.
Redis Cluster: stores all bucket state. Executes Lua scripts atomically. The single source of truth for token counts. Sharded by key hash for throughput.
PostgreSQL: stores rule configurations. Provides ACID transactions for admin operations. Publishes change events via NOTIFY.

The Lua script executes these steps atomically on Redis:

Read the hash fields tokens and last_refill from the bucket key.
Calculate elapsed time since last refill: elapsed = now - last_refill.
Refill tokens proportionally: tokens = min(max_tokens, tokens + elapsed * refill_rate).
Check: if tokens >= 1, decrement by 1 and return allowed with remaining count. Otherwise return denied with seconds until next token arrives.
Write updated tokens and last_refill back to the hash. Set TTL to 2x the window.

This lazy refill approach means idle buckets consume zero Redis resources. No background timer needed. Tokens accumulate mathematically when the next request arrives.

Detailed Component Design

Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.

DATABASE DESIGN:

A postgre database on ruleID. with rule_id (uuid), scope type(storing a enum of user, org, ip, apiKey.etc), scope_val(string value), api Endpoint, max_request, algorithm applied (slidinng window, token bucket), window_second(if using sliding window), burst_size (for token bucket). The 2 columns are needed because the algo applied is not sure]

Bucket State (Redis)

Each rate limit bucket is a Redis hash. For token bucket, the hash stores two fields: tokens (current count as a float) and last_refill (Unix timestamp in milliseconds). For sliding window counter, the hash stores prev_count (previous window total) and curr_count (current window total). Each key has a TTL of 2x the window duration, sliding window needs data from both current and previous windows, so 1x TTL would expire previous-window data too early

redis key schema:

Redis keys follow the pattern rl:{scope_type}:{scope_value}:{api_endpoint}. For example, rl:user:u_42:/api/search or rl:ip:10.0.0.1:/api/login. The scope type prefix enables efficient scanning of all limits for a given scope, and the compound key ensures that different scope-endpoint combinations never collide.