1. Requirements

Functional Requirements

  • Support rate limiting based on:
    • User / API key / IP / Org
    • Per API endpoint
  • Admins can:
    • Create / update / delete / view rate-limit rules
  • System should return these response headers:
    • X-RateLimit-Limit
    • X-RateLimit-Remaining
    • X-RateLimit-Reset
    • Retry-After (with 429 responses)
  • A decision engine determines whether to allow or reject each request
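A minimal sketch of how one decision could map onto the four headers. The `Decision` type and `rate_limit_headers` helper are hypothetical. `X-RateLimit-Reset` is assumed to be a Unix epoch timestamp (some APIs send seconds-until-reset instead). `Retry-After` is assumed to be sent only when the request is rejected.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    allowed: bool
    limit: int        # max requests allowed in the window
    remaining: int    # requests left in the current window
    reset_epoch: int  # Unix time (seconds) when the quota is replenished


def rate_limit_headers(d: Decision, now_epoch: int) -> dict[str, str]:
    """Build rate-limit response headers for one decision (hypothetical helper)."""
    headers = {
        "X-RateLimit-Limit": str(d.limit),
        "X-RateLimit-Remaining": str(max(0, d.remaining)),
        "X-RateLimit-Reset": str(d.reset_epoch),
    }
    if not d.allowed:
        # Seconds until the client may retry; sent alongside HTTP 429.
        headers["Retry-After"] = str(max(0, d.reset_epoch - now_epoch))
    return headers
```

For example, a rejected request 60 seconds before the reset time gets `Retry-After: 60`.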

Non-Functional Requirements

  • Latency: < 10ms decision time
  • Throughput: ~1M QPS
  • Scalability: Horizontally scalable
  • Availability: Highly available (no SPOF)
  • Consistency: Eventual consistency acceptable


2. Estimations

Traffic

  • ~1M requests/sec (all pass through rate limiter)

Storage

Redis (counters)

  • 1M users × 1000 rules per user × ~32 bytes per counter ≈ 32 GB

PostgreSQL (rules)

  • 1000 rules × ~256 bytes per rule ≈ 256 KB per user

👉 Storage is manageable; throughput & latency are the real challenges
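A quick check of the estimates above. The constants are the document's own rough figures. The worst-case rules total (every one of the 1M users holding 1000 custom rules) is an extrapolation added here for scale.

```python
# Back-of-envelope sizing using the estimates above (decimal units: 1 GB = 10**9 bytes).
USERS = 1_000_000
RULES_PER_USER = 1_000
COUNTER_BYTES = 32   # tokens + last_refill_timestamp, approximate
RULE_BYTES = 256     # one PostgreSQL rule row, approximate

redis_bytes = USERS * RULES_PER_USER * COUNTER_BYTES
rules_per_user_bytes = RULES_PER_USER * RULE_BYTES
rules_worst_case_bytes = USERS * rules_per_user_bytes  # if every user had custom rules

print(f"Redis counters: {redis_bytes / 1e9:.0f} GB")               # 32 GB
print(f"Rules per user: {rules_per_user_bytes / 1e3:.0f} KB")       # 256 KB
print(f"Rules, worst case: {rules_worst_case_bytes / 1e9:.0f} GB")  # 256 GB
```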

Latency Budget

  • Backend API: ~100ms
  • Rate limiter budget: <10ms

3. API Design

3.1 Rule Management APIs

  • POST /rules: create a rule
  • PUT /rules/{id}: update a rule
  • DELETE /rules/{id}: delete a rule
  • GET /rules/{id}: view a rule
  • GET /rules?scope=...: list rules by scope

3.2 Decision API (Internal / Fallback)

POST /shouldAllow

Request:

{ "scope": "user", "scope_value": "123", "api": "/payments" }

Response:

{ "allowed": true, "remaining": 120, "reset_time": 1710000000 }

👉 Even though the sidecar handles most decisions, this API is useful for:

  • Debugging
  • Fallback scenarios
  • External integrations


4. Data Storage & Design

4.1 PostgreSQL (Persistent Rules)

Rules Table:

  • rule_id (UUID, PK)
  • scope_type (user / ip / org / api_key)
  • scope_value
  • api_endpoint
  • algorithm (token_bucket / sliding_window)
  • max_requests
  • window_seconds
  • burst_size
  • version

Constraint:

UNIQUE(scope_type, scope_value, api_endpoint)
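One possible in-memory form of a rules row, as a sidecar might hold it in its local cache. Field names mirror the table above. The enums, the validation checks, and the `unique_key` and `refill_rate` helpers are illustrative assumptions, not part of the schema.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class ScopeType(str, Enum):
    USER = "user"
    IP = "ip"
    ORG = "org"
    API_KEY = "api_key"


class Algorithm(str, Enum):
    TOKEN_BUCKET = "token_bucket"
    SLIDING_WINDOW = "sliding_window"


@dataclass(frozen=True)
class Rule:
    scope_type: ScopeType
    scope_value: str
    api_endpoint: str
    algorithm: Algorithm
    max_requests: int
    window_seconds: int
    burst_size: int
    version: int = 1
    rule_id: uuid.UUID = field(default_factory=uuid.uuid4)

    def __post_init__(self) -> None:
        # Reject rules that could never admit a request.
        if self.max_requests <= 0 or self.window_seconds <= 0:
            raise ValueError("max_requests and window_seconds must be positive")
        if self.burst_size < 0:
            raise ValueError("burst_size must be non-negative")

    def unique_key(self) -> tuple[str, str, str]:
        # Same columns as UNIQUE(scope_type, scope_value, api_endpoint).
        return (self.scope_type.value, self.scope_value, self.api_endpoint)

    @property
    def refill_rate(self) -> float:
        # Steady-state tokens per second implied by max_requests / window_seconds.
        return self.max_requests / self.window_seconds
```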

4.2 Redis Cluster (Distributed Counters)

Key:

rl:{hash(scope_value)}:{api}:{shard}

Value:

  • tokens
  • last_refill_timestamp

👉 The shard suffix spreads one hot counter across several keys to avoid single-key bottlenecks
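A sketch of building the counter key. The design does not specify `hash()`; SHA-1 truncated to 16 hex characters is an arbitrary choice here, and `counter_key` is a hypothetical helper. One caution if the braces in the key pattern are meant literally: in Redis Cluster, only the text inside `{...}` (the hash tag) decides the slot. Wrapping the scope hash in braces would therefore pin every shard of one user to the same slot and defeat hot-key sharding. This sketch emits no hash tag.

```python
import hashlib


def counter_key(scope_value: str, api: str, shard: int) -> str:
    """Build a counter key of the form rl:<hash>:<api>:<shard> (hypothetical format)."""
    # Hashing keeps key length bounded and keeps raw IDs/IPs out of key names
    # (one plausible motivation); SHA-1 truncation is an assumption.
    digest = hashlib.sha1(scope_value.encode("utf-8")).hexdigest()[:16]
    # No {...} hash tag: each shard key hashes to its own cluster slot.
    return f"rl:{digest}:{api}:{shard}"
```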





5. High-Level Architecture (HLD)

The system is divided into Data Plane (request path) and Control Plane (configuration path).

5.1 Data Plane (Request Flow)

  1. Client sends request to API Gateway
  2. API Gateway forwards request to RL-Sidecar
  3. RL-Sidecar:
    • Fetches rule from local cache
    • Checks local token buffer
  4. If buffer empty:
    • Fetches token batch from Redis Cluster
  5. Decision:
    • If tokens available → forward to backend
    • Else → return 429
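The request-flow steps above, sketched as one class. Everything here is hypothetical. `get_rule` stands in for the local rule cache, and `fetch_batch` stands in for the Redis call that grants up to N tokens. The batch size of 100 is arbitrary, and allowing requests with no matching rule is an assumed default. The point is the shape of the hot path: a cache lookup and a local decrement, with a network call only when the buffer is empty.

```python
from typing import Callable, Optional


class Sidecar:
    def __init__(
        self,
        get_rule: Callable[[str, str], Optional[dict]],
        fetch_batch: Callable[[str, str, int], int],
        batch_size: int = 100,
    ) -> None:
        self.get_rule = get_rule          # step 3: local rule cache lookup
        self.fetch_batch = fetch_batch    # step 4: ask Redis for up to N tokens
        self.batch_size = batch_size
        self.buffer: dict[tuple[str, str], int] = {}  # local token buffer

    def handle(self, scope_value: str, api: str) -> int:
        """Return an HTTP status: 200 = forward to backend, 429 = reject."""
        rule = self.get_rule(scope_value, api)
        if rule is None:
            return 200  # assumed default: no rule configured means no limit
        key = (scope_value, api)
        if self.buffer.get(key, 0) == 0:
            # Buffer empty: refill from the shared Redis bucket.
            self.buffer[key] = self.fetch_batch(scope_value, api, self.batch_size)
        if self.buffer[key] > 0:
            self.buffer[key] -= 1
            return 200
        return 429
```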

5.2 Control Plane (Rule Flow)

  1. Admin updates rule via RL-Modify Service
  2. Rule stored in PostgreSQL
  3. Event published to Kafka / PubSub
  4. RL-Sidecars consume event and update local cache

5.3 Burst Traffic Handling (Gateway + RL)

  • API Gateway absorbs initial surge using:
    • Connection limits
    • Request queueing
  • RL-Sidecar uses Token Bucket:
    • Allows bursts up to capacity
    • Enforces steady rate after burst

👉 Ensures:

  • No sudden backend overload
  • Smooth traffic shaping

5.4 Stateless Scaling of RL-Sidecars

  • RL-Sidecars are stateless workers
  • They do NOT store global counters

They only maintain:

  • Local rule cache (replicated)
  • Local token buffer (temporary)

Scaling Behavior:

  • Each API Gateway pod has its own sidecar
  • Scaling API Gateway → automatically scales RL capacity

Consistency:

  • Redis acts as shared global state

5.5 Hot Key Skew Handling

Problem:

  • Popular API/user → single Redis key → hotspot

Solution:

  • Key sharding, e.g.:
    • rl:{user}:{api}:shard1
    • rl:{user}:{api}:shard2
  • Requests distributed across shards
  • Aggregating counts across shards keeps the global limit approximately enforced
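A sketch of one way to split a single hot limit across N shard keys. Each shard gets roughly `limit / N` of the budget, and each request picks a shard at random. The helper names, the random choice, and the way the remainder is spread are assumptions. The trade-off is that the global limit becomes approximate: a client can be rejected by a drained shard while another shard still has budget.

```python
import random


def shard_budgets(limit: int, shards: int) -> list[int]:
    """Split `limit` across `shards` keys so the budgets sum exactly to `limit`."""
    base, extra = divmod(limit, shards)
    # The first `extra` shards absorb the remainder.
    return [base + (1 if i < extra else 0) for i in range(shards)]


def shard_key(user: str, api: str, shards: int, rng: random.Random) -> str:
    # Spread requests uniformly over shard1..shardN so no single key takes all traffic.
    return f"rl:{user}:{api}:shard{rng.randrange(1, shards + 1)}"


def global_usage(per_shard_used: list[int]) -> int:
    # Aggregation: total consumption is the sum across shard counters.
    return sum(per_shard_used)
```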

5.6 Degraded Mode Operation

Case 1: Redis Slow

  • Use local token buffer temporarily
  • Reduce dependency on Redis

Case 2: Redis Unavailable

  • Circuit breaker activates
  • Strategy:
    • Critical APIs → fail-close
    • Non-critical APIs → fail-open
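A minimal circuit-breaker sketch for the Redis path that applies the fail-close / fail-open split above. The class name, the thresholds (5 consecutive failures, 30 s cool-down), and the `critical` flag are illustrative assumptions. After the cool-down the breaker simply closes again. A fuller implementation would add a half-open state that re-trips on the first failed probe. The clock is injected so the logic can be tested deterministically.

```python
import time
from typing import Callable, Optional


class RedisCircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_timeout:
            # Cool-down elapsed: close the breaker and let calls reach Redis again.
            self.opened_at = None
            self.failures = 0
            return False
        return True

    def decide(self, check: Callable[[], bool], critical: bool) -> bool:
        """Run `check` (the Redis-backed decision) unless the breaker is open."""
        if self.is_open():
            # Redis treated as unavailable: critical APIs fail-close, others fail-open.
            return not critical
        try:
            allowed = check()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return not critical
        self.failures = 0
        return allowed
```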

Case 3: Rule Propagation Delay

  • Sidecars continue using cached rules
  • Eventual consistency maintained




6. Detailed Breakdown

6.1 Decision Engine (RL-Sidecar)

  • Runs alongside API Gateway
  • Uses:
    • Local rule cache
    • Local token buffer
  • Avoids network calls for most requests

6.2 Rate Limiting Algorithm

Token Bucket

  • Allows burst traffic up to capacity
  • Smooth rate limiting after burst

Formula:

tokens = min(capacity, tokens + rate × Δt), where Δt = now − last_refill_timestamp
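The refill formula above, applied in a small single-process token bucket. This is an in-memory illustration only, with no Redis and no locking. The class name and the injected clock are assumptions. Tokens accrue lazily when a request arrives rather than on a timer, which is why only the last refill time needs to be stored.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity    # maximum burst size
        self.rate = rate            # tokens added per second
        self.clock = clock
        self.tokens = capacity      # start full so an initial burst is allowed
        self.last_refill = clock()

    def _refill(self) -> None:
        now = self.clock()
        # tokens = min(capacity, tokens + rate × Δt)
        self.tokens = min(self.capacity, self.tokens + self.rate * (now - self.last_refill))
        self.last_refill = now

    def try_acquire(self, n: float = 1) -> bool:
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

With capacity 3 and rate 1/s, three back-to-back requests pass and the fourth is rejected. After that, one request per second is admitted.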

6.3 Local Token Buffering (Critical Optimization)

Instead of:

  • 1 Redis call per request ❌

We use:

  • Batch token fetch (e.g., 1000 tokens)

👉 Benefits:

  • Reduces Redis QPS
  • Improves latency
  • Handles bursts efficiently
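A sketch of the batch-fetch buffer for one counter. `fetch_from_redis` is a hypothetical callable standing in for the atomic Redis call. It returns how many tokens were actually granted, which may be fewer than requested. Holding the lock across the fetch is a simplification: it stops concurrent requests from all refilling at once, but it briefly blocks them.

```python
import threading
from typing import Callable


class LocalTokenBuffer:
    def __init__(self, fetch_from_redis: Callable[[int], int], batch_size: int = 1000) -> None:
        self.fetch_from_redis = fetch_from_redis  # returns tokens actually granted
        self.batch_size = batch_size
        self.available = 0
        self.redis_calls = 0
        self.lock = threading.Lock()  # the sidecar may serve requests on many threads

    def try_acquire(self) -> bool:
        with self.lock:
            if self.available == 0:
                # The only network round trip: refill a whole batch at once.
                self.available = self.fetch_from_redis(self.batch_size)
                self.redis_calls += 1
            if self.available > 0:
                self.available -= 1
                return True
            return False
```

With a batch of 1000, ~1M requests/sec becomes roughly 1M / 1000 = 1000 Redis calls/sec in aggregate, assuming buffers drain fully. The cost is that tokens parked in one sidecar's buffer are unavailable to the others. Small per-user limits likely call for smaller batches.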

6.4 Concurrency Handling

  • Use Redis Lua scripts
  • Atomic:
    • Read tokens
    • Update tokens
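A sketch of the atomic refill-and-take step as a Redis Lua script. In redis-py, the script would be loaded with `register_script`. Redis runs a script without interleaving other commands, so the read, refill, and write cannot race. The hash field names (`tokens`, `ts`), the argument order, the partial grant (up to the requested batch), and the `take_tokens` wrapper are assumptions. Reading `TIME` inside the script uses the Redis server clock rather than the sidecar's, in line with 7.5. `rate` must be positive for the expiry calculation.

```python
# Lua runs atomically inside Redis: no other command interleaves with it.
TAKE_TOKENS_LUA = """
local key       = KEYS[1]
local capacity  = tonumber(ARGV[1])
local rate      = tonumber(ARGV[2])   -- tokens per second, must be > 0
local requested = tonumber(ARGV[3])   -- batch size the sidecar wants

local t   = redis.call('TIME')        -- Redis server clock, not the sidecar's
local now = tonumber(t[1]) + tonumber(t[2]) / 1000000

local state  = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(state[1]) or capacity
local ts     = tonumber(state[2]) or now

-- tokens = min(capacity, tokens + rate * dt)
tokens = math.min(capacity, tokens + math.max(0, now - ts) * rate)
local granted = math.min(math.floor(tokens), requested)
tokens = tokens - granted

redis.call('HSET', key, 'tokens', tokens, 'ts', now)
-- Expire idle buckets once they would be full again anyway.
redis.call('PEXPIRE', key, math.ceil(capacity / rate * 1000))
return granted
"""


def take_tokens(script, key: str, capacity: int, rate: float, requested: int) -> int:
    """Call a script object from redis.Redis(...).register_script(TAKE_TOKENS_LUA)."""
    return int(script(keys=[key], args=[capacity, rate, requested]))
```

Usage would look like `script = redis.Redis().register_script(TAKE_TOKENS_LUA)` followed by `take_tokens(script, key, 100, 10.0, 50)`, which returns how many of the 50 requested tokens were granted.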

6.5 Hot Key Problem

Problem:

  • Popular APIs → same Redis key

Solution:

  • Key sharding, e.g.:
    • rl:user123:/payments:shard1
    • rl:user123:/payments:shard2

6.6 Rule Propagation

  • Use streaming:
    • Kafka / PubSub

Flow:

  • RL-Modify → publish event
  • Sidecars consume → update cache

7. Additional Considerations (Production Readiness)

7.1 Burst Traffic Handling

  • Token bucket allows burst up to capacity
  • Gateway-level protections:
    • Connection limits
    • Queue limits

7.2 Stateless Scaling

  • RL-Sidecars are stateless:
    • No global state stored locally
  • Scale horizontally with API Gateway
  • Redis is the single source of truth for counters

7.3 Cache Miss & Partial Failure Handling

Rule Cache Miss

  • Fetch the rule from PostgreSQL, or apply fallback default limits

Redis Latency / Failure

  • Use local token buffer temporarily
  • Apply stricter limits

Full Redis Failure

  • Circuit breaker activated
  • Strategy:
    • Critical APIs → fail-close
    • Non-critical APIs → fail-open

Sidecar Restart

  • Warm cache via:
    • Kafka replay OR
    • Snapshot

7.4 Configuration Change Rollout

  • Each rule has a version

Flow:

  1. Admin updates rule → new version
  2. Event published to Kafka
  3. Sidecars update asynchronously

Consistency:

  • Eventual consistency
  • Old + new rules coexist briefly
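Because every rule carries a version, a sidecar can make cache updates idempotent and order-insensitive. It simply ignores any event whose version is not newer than the one it already holds. This matters because Kafka replay (for warm-up) and at-least-once redelivery can hand a consumer duplicate or older events. The event shape (a dict with `rule_key`, `version`, `deleted`, `rule`) and the `RuleCache` class are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RuleCache:
    # rule_key -> (version, rule, or None for a deleted rule)
    entries: dict[tuple, tuple[int, Optional[dict]]] = field(default_factory=dict)

    def apply(self, event: dict[str, Any]) -> bool:
        """Apply one propagation event; return True if the cache changed."""
        key = tuple(event["rule_key"])
        version = event["version"]
        current = self.entries.get(key)
        if current is not None and current[0] >= version:
            return False  # duplicate or out-of-order (older) event: ignore
        # Keep a tombstone on delete so a late, older event cannot resurrect the rule.
        rule = None if event.get("deleted") else event["rule"]
        self.entries[key] = (version, rule)
        return True

    def get(self, key: tuple) -> Optional[dict]:
        entry = self.entries.get(key)
        return entry[1] if entry else None
```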

7.5 Clock Skew & Time Consistency

Problem:

  • Distributed systems → inconsistent clocks

Solution:

  • Use Redis server time as source of truth
  • Avoid fixed window algorithms
  • Prefer:
    • Token bucket
    • Sliding window
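As one example of a sliding window, here is the approximate "sliding window counter" variant. It weights the previous window's count by how much of that window still overlaps the sliding window, which smooths out the boundary bursts that make fixed windows fragile. It is one of several sliding-window designs; a per-request timestamp log is exact but costs more memory. Choosing this variant is an assumption, not something the design specifies. Timestamps are passed in, which is where Redis server time would be supplied.

```python
from dataclasses import dataclass


@dataclass
class SlidingWindowCounter:
    limit: int          # max requests per window
    window: float       # window length in seconds
    curr_index: int = 0
    curr_count: int = 0
    prev_count: int = 0

    def allow(self, now: float) -> bool:
        index = int(now // self.window)
        if index != self.curr_index:
            # Roll forward; if more than one window passed, the previous one is empty.
            self.prev_count = self.curr_count if index == self.curr_index + 1 else 0
            self.curr_index, self.curr_count = index, 0
        # Fraction of the previous window still covered by the sliding window.
        overlap = 1.0 - (now - index * self.window) / self.window
        estimated = self.prev_count * overlap + self.curr_count
        if estimated < self.limit:
            self.curr_count += 1
            return True
        return False
```

For example, with a limit of 10 per 10 s, ten requests at t=5 exhaust the window. At t=15, half of the previous window still overlaps, so only 5 more requests are admitted.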

7.6 Observability

Track:

  • QPS
  • Rejection rate
  • Redis latency
  • Token exhaustion rate

Tools:

  • Prometheus
  • Grafana


8. Error Handling & Exception Scenarios

8.1 Redis Failures

  • Timeout / connection failure:
    • Retry with backoff
    • Fallback to local buffer
  • Persistent failure:
    • Circuit breaker triggers
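A sketch of the "retry with backoff, then fall back" step. The helper name, the attempt count, the base delay, and the full-jitter strategy are assumptions. `fallback` stands in for the local-buffer decision. Retries have to fit inside the <10ms decision budget, so delays here are in the low milliseconds: with the defaults, at most 1 ms + 2 ms of sleep.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    op: Callable[[], T],
    fallback: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.001,   # 1 ms; total retry time must stay within the ~10 ms budget
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try `op` a few times with exponential backoff plus jitter, then use `fallback`."""
    for attempt in range(attempts):
        try:
            return op()
        except (TimeoutError, ConnectionError):
            if attempt < attempts - 1:
                # Full jitter: sleep a random amount up to base_delay * 2^attempt.
                sleep(random.uniform(0, base_delay * (2 ** attempt)))
    return fallback()
```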

8.2 Kafka / Propagation Failure

  • Sidecars continue using last known rules
  • Retry consumption
  • No immediate system impact

8.3 Data Inconsistency

  • Temporary inconsistencies allowed
  • Eventually resolved via:
    • Kafka propagation
    • Redis updates

8.4 Unexpected Traffic Spikes

  • Gateway absorbs spike via queueing
  • RL enforces limits strictly

8.5 Sidecar Failure

  • Restart sidecar
  • Reload:
    • Rules from Kafka
    • Tokens from Redis


Final Summary

  • Sidecar-based design ensures ultra-low latency
  • Redis cluster + sharding ensures scalability
  • Local token buffering reduces load significantly
  • Streaming-based rule propagation ensures eventual consistency
  • Hot key mitigation + Lua scripts ensure correctness
  • Multi-layer fault tolerance ensures high availability