1. Requirements
Functional Requirements
- Support rate limiting based on:
- User / API key / IP / Org
- Per API endpoint
- Admins can:
- Create / update / delete / view rate-limit rules
- System should return:
- X-RateLimit-Limit
- X-RateLimit-Remaining
- X-RateLimit-Reset
- Retry-After
- Decision system determines whether to allow or reject each request
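A minimal sketch of how a decision could be turned into the response headers listed above. The function name `rate_limit_headers` and its parameters are hypothetical, and sending Retry-After only on rejected requests is an assumption, not something the requirements state.

```python
import math


def rate_limit_headers(limit: int, remaining: int, reset_epoch: int,
                       now_epoch: float, allowed: bool) -> dict:
    """Build the rate-limit response headers for one decision."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),
    }
    if not allowed:
        # Assumption: Retry-After accompanies only rejected (429) requests,
        # expressed as whole seconds until the limit resets.
        headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now_epoch)))
    return headers
```

For a rejected request with 29.5 seconds left until reset, this yields `Retry-After: 30`.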
Non-Functional Requirements
- Latency: < 10ms decision time
- Throughput: ~1M QPS
- Scalability: Horizontally scalable
- Availability: Highly available (no SPOF)
- Consistency: Eventual consistency acceptable
2. Estimations
Traffic
- ~1M requests/sec (all pass through rate limiter)
Storage
Redis (counters)
- 1M users × 1000 rules × ~32 bytes ≈ 32 GB
PostgreSQL (rules)
- 1000 rules × ~256 bytes ≈ 256 KB per user
👉 Storage is manageable; throughput & latency are the real challenges
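The storage figures above can be reproduced with a few lines of arithmetic. The constant names are illustrative, and the Redis figure is read here as a worst case in which every user has a live counter for every rule.

```python
USERS = 1_000_000
RULES_PER_USER = 1_000
COUNTER_BYTES = 32    # approximate size of one Redis counter entry
RULE_BYTES = 256      # approximate size of one rule row

# Worst case: every user has an active counter for every rule.
redis_bytes = USERS * RULES_PER_USER * COUNTER_BYTES
rules_bytes_per_user = RULES_PER_USER * RULE_BYTES

print(f"Redis counters: {redis_bytes / 1e9:.0f} GB")            # 32 GB
print(f"Rules per user: {rules_bytes_per_user / 1e3:.0f} KB")   # 256 KB
```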
Latency Budget
- Backend API: ~100ms
- Rate limiter budget: <10ms
3. API Design
3.1 Rule Management APIs
POST /rules
PUT /rules/{id}
DELETE /rules/{id}
GET /rules/{id}
GET /rules?scope=...
3.2 Decision API (Internal / Fallback)
POST /shouldAllow
Request:
{
"scope": "user",
"scope_value": "123",
"api": "/payments"
}
Response:
{
"allowed": true,
"remaining": 120,
"reset_time": 1710000000
}
👉 Even though the sidecar handles decisions on the request path, this API helps with:
- Debugging
- Fallback scenarios
- External integrations
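One way to model the request and response bodies shown above is with small typed records. The class and function names are hypothetical. The allowed scope values are taken from the scope types this design lists elsewhere, and reading `reset_time` as Unix epoch seconds is an assumption based on the sample value.

```python
import json
from dataclasses import dataclass, asdict

VALID_SCOPES = {"user", "ip", "org", "api_key"}


@dataclass(frozen=True)
class ShouldAllowRequest:
    scope: str
    scope_value: str
    api: str


@dataclass(frozen=True)
class ShouldAllowResponse:
    allowed: bool
    remaining: int
    reset_time: int  # assumed to be Unix epoch seconds


def parse_request(body: str) -> ShouldAllowRequest:
    """Parse and minimally validate a POST /shouldAllow body."""
    data = json.loads(body)
    if data["scope"] not in VALID_SCOPES:
        raise ValueError(f"unknown scope: {data['scope']}")
    return ShouldAllowRequest(data["scope"], data["scope_value"], data["api"])


def render_response(resp: ShouldAllowResponse) -> str:
    """Serialize the decision as the JSON response body."""
    return json.dumps(asdict(resp))
```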
4. Data Storage & Design
4.1 PostgreSQL (Persistent Rules)
Rules Table:
rule_id (UUID PK)
scope_type (user/ip/org/api_key)
scope_value
api_endpoint
algorithm (token_bucket, sliding_window)
max_requests
window_seconds
burst_size
version
Constraint:
UNIQUE(scope_type, scope_value, api_endpoint)
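The table and its uniqueness constraint can be tried out locally. This sketch uses Python's built-in `sqlite3` as a stand-in for PostgreSQL. The column types, the `NOT NULL`/`CHECK` clauses, and the `insert_rule` helper are assumptions; only the column names and the UNIQUE constraint come from the design above.

```python
import sqlite3
import uuid

# Column types are assumptions; sqlite3 stands in for PostgreSQL here.
SCHEMA = """
CREATE TABLE rules (
    rule_id        TEXT PRIMARY KEY,
    scope_type     TEXT NOT NULL CHECK (scope_type IN ('user', 'ip', 'org', 'api_key')),
    scope_value    TEXT NOT NULL,
    api_endpoint   TEXT NOT NULL,
    algorithm      TEXT NOT NULL CHECK (algorithm IN ('token_bucket', 'sliding_window')),
    max_requests   INTEGER NOT NULL,
    window_seconds INTEGER NOT NULL,
    burst_size     INTEGER NOT NULL,
    version        INTEGER NOT NULL DEFAULT 1,
    UNIQUE (scope_type, scope_value, api_endpoint)
)
"""


def insert_rule(conn, scope_type, scope_value, api_endpoint,
                max_requests, window_seconds, burst_size,
                algorithm="token_bucket"):
    """Insert one rule and return its generated UUID."""
    rule_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO rules (rule_id, scope_type, scope_value, api_endpoint,"
        " algorithm, max_requests, window_seconds, burst_size)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (rule_id, scope_type, scope_value, api_endpoint,
         algorithm, max_requests, window_seconds, burst_size),
    )
    return rule_id


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
insert_rule(conn, "user", "123", "/payments", 100, 60, 20)
try:
    # A second rule for the same (scope_type, scope_value, api_endpoint)
    # violates the UNIQUE constraint.
    insert_rule(conn, "user", "123", "/payments", 200, 60, 20)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```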
4.2 Redis Cluster (Distributed Counters)
Key:
rl:{hash(scope_value)}:{api}:{shard}
Value:
- tokens
- last_refill_timestamp
👉 The shard suffix spreads one logical counter across several keys, so no single Redis node becomes a bottleneck
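A small sketch of building keys in the format above. The helper name `counter_key`, the choice of SHA-256 truncated to 16 hex characters, and the `shardN` suffix style are assumptions. A stable digest is used because Python's built-in `hash()` is randomized per process for strings, so different sidecars would not agree on the key.

```python
import hashlib


def counter_key(scope_value: str, api: str, shard: int) -> str:
    """Build a Redis counter key: rl:{hash(scope_value)}:{api}:{shard}."""
    # Stable across processes and hosts, unlike the built-in hash().
    digest = hashlib.sha256(scope_value.encode("utf-8")).hexdigest()[:16]
    return f"rl:{digest}:{api}:shard{shard}"
```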
5. High-Level Architecture (HLD)
The system is divided into Data Plane (request path) and Control Plane (configuration path).
5.1 Data Plane (Request Flow)
- Client sends request to API Gateway
- API Gateway forwards request to RL-Sidecar
- RL-Sidecar:
- Fetches rule from local cache
- Checks local token buffer
- If buffer empty:
- Fetches token batch from Redis Cluster
- Decision:
- If tokens available → forward to backend
- Else → return 429
5.2 Control Plane (Rule Flow)
- Admin updates rule via RL-Modify Service
- Rule stored in PostgreSQL
- Event published to Kafka / PubSub
- RL-Sidecars consume event and update local cache
5.3 Burst Traffic Handling (Gateway + RL)
- API Gateway absorbs initial surge using:
- Connection limits
- Request queueing
- RL-Sidecar uses Token Bucket:
- Allows bursts up to capacity
- Enforces steady rate after burst
👉 Ensures:
- No sudden backend overload
- Smooth traffic shaping
5.4 Stateless Scaling of RL-Sidecars
- RL-Sidecars are stateless workers
- They do NOT store global counters
They only maintain:
- Local rule cache (replicated)
- Local token buffer (temporary)
Scaling Behavior:
- Each API Gateway pod has its own sidecar
- Scaling API Gateway → automatically scales RL capacity
Consistency:
- Redis acts as shared global state
5.5 Hot Key Skew Handling
Problem:
- Popular API/user → single Redis key → hotspot
Solution:
- Key sharding:
rl:{user}:{api}:shard1
rl:{user}:{api}:shard2
- Requests distributed across shards
- Each shard holds a share of the limit; the shares add up to the configured limit, so the aggregate stays correct
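A sketch of one way to shard a hot key, assuming each shard is given a fixed share of the bucket capacity and requests pick a shard at random. Both function names are hypothetical, and random selection is only one possible distribution strategy.

```python
import random


def shard_capacity(capacity: int, num_shards: int) -> list:
    """Split a bucket's capacity so the per-shard shares sum to the total."""
    base, extra = divmod(capacity, num_shards)
    return [base + (1 if i < extra else 0) for i in range(num_shards)]


def pick_shard_key(user: str, api: str, num_shards: int, rng=random) -> str:
    """Choose one of the shard keys uniformly at random for this request."""
    shard = rng.randrange(num_shards) + 1  # shards are numbered from 1
    return f"rl:{user}:{api}:shard{shard}"
```

With this scheme a request can be rejected because its shard is empty while another shard still has tokens, so sharding trades a little accuracy for load distribution.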
5.6 Degraded Mode Operation
Case 1: Redis Slow
- Use local token buffer temporarily
- Reduce dependency on Redis
Case 2: Redis Unavailable
- Circuit breaker activates
- Strategy:
- Critical APIs → fail-close
- Non-critical APIs → fail-open
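The Redis-unavailable case can be sketched as a small circuit breaker plus a per-API fallback policy. The thresholds, the `CRITICAL_APIS` set, and all names here are hypothetical. The half-open state is simplified: after the cool-down, the breaker simply closes and lets traffic try Redis again.

```python
import time


class CircuitBreaker:
    """Opens after consecutive failures; closes again after a cool-down."""

    def __init__(self, failure_threshold=5, reset_after=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold and self.opened_at is None:
            self.opened_at = self.clock()

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            # Simplified half-open: close and let the next calls probe Redis.
            self.failures = 0
            self.opened_at = None
            return False
        return True


CRITICAL_APIS = {"/payments"}  # hypothetical classification


def allow_when_redis_down(api: str) -> bool:
    """Fail-close for critical APIs, fail-open for everything else."""
    return api not in CRITICAL_APIS
```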
Case 3: Rule Propagation Delay
- Sidecars continue using cached rules
- Eventual consistency maintained
6. Detailed Breakdown
6.1 Decision Engine (RL-Sidecar)
- Runs alongside API Gateway
- Uses:
- Local rule cache
- Local token buffer
- Avoids network calls for most requests
6.2 Rate Limiting Algorithm
Token Bucket
- Allows burst traffic up to capacity
- Smooth rate limiting after burst
Formula:
tokens = min(capacity, tokens + rate × Δt)
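The refill formula can be illustrated with a small in-memory bucket. This is a single-process sketch with hypothetical names; time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """In-memory token bucket: tokens = min(capacity, tokens + rate * dt)."""

    def __init__(self, capacity: float, rate: float, now: float = 0.0):
        self.capacity = capacity
        self.rate = rate              # tokens added per second
        self.tokens = float(capacity)  # start full so bursts are allowed
        self.last = now

    def try_consume(self, now: float, n: float = 1) -> bool:
        # Refill for the elapsed time, capped at capacity.
        dt = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + self.rate * dt)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

With `capacity=3` and `rate=1.0`, three requests at t=0 pass and a fourth is rejected; two seconds later two more requests pass, since only two tokens have been refilled.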
6.3 Local Token Buffering (Critical Optimization)
Instead of:
- 1 Redis call per request ❌
We use:
- Batch token fetch (e.g., 1000 tokens)
👉 Benefits:
- Reduces Redis QPS
- Improves latency
- Handles bursts efficiently
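The effect of batching can be seen with an in-memory stand-in for Redis that counts round trips. `FakeRedis` and `LocalTokenBuffer` are hypothetical names, and refill of the central bucket over time is left out to keep the sketch short.

```python
class FakeRedis:
    """In-memory stand-in for the Redis Cluster; counts round trips."""

    def __init__(self, tokens: int):
        self.tokens = tokens
        self.calls = 0

    def take(self, key: str, n: int) -> int:
        """Grant up to n tokens from the shared pool."""
        self.calls += 1
        granted = min(n, self.tokens)
        self.tokens -= granted
        return granted


class LocalTokenBuffer:
    """Sidecar-side buffer that fetches tokens from the store in batches."""

    def __init__(self, store, key: str, batch_size: int = 1000):
        self.store = store
        self.key = key
        self.batch_size = batch_size
        self.local = 0

    def allow(self) -> bool:
        if self.local == 0:
            # Buffer empty: one round trip fetches a whole batch.
            self.local = self.store.take(self.key, self.batch_size)
        if self.local > 0:
            self.local -= 1
            return True
        return False  # caller responds with 429


store = FakeRedis(tokens=3000)
buf = LocalTokenBuffer(store, "rl:user123:/payments:shard1", batch_size=1000)
results = [buf.allow() for _ in range(2500)]
```

Here 2,500 requests are served with 3 store calls instead of 2,500. Two trade-offs are worth noting: tokens parked in one sidecar's buffer are unavailable to other sidecars, and once the shared pool is exhausted this naive version contacts the store on every rejected request, so a real sidecar would likely throttle refill attempts.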
6.4 Concurrency Handling
- Use Redis Lua scripts
- Atomic:
- Read tokens
- Update tokens
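A hedged sketch of what the atomic read-and-update could look like. The Lua script is embedded as a string, and the wrapper relies on redis-py's `register_script`, which returns a callable taking `keys` and `args`. The hash field names follow the counter value layout described earlier; the argument order and the name `make_acquire` are assumptions. The script reads the clock with `TIME` before writing, which on older Redis versions may require `redis.replicate_commands()`. It has not been run against a live cluster here.

```python
TOKEN_BUCKET_LUA = """
local capacity  = tonumber(ARGV[1])
local rate      = tonumber(ARGV[2])
local requested = tonumber(ARGV[3])

-- Use the Redis server clock so all sidecars share one time source.
local t = redis.call('TIME')
local now = tonumber(t[1]) + tonumber(t[2]) / 1000000

local state = redis.call('HMGET', KEYS[1], 'tokens', 'last_refill_timestamp')
local tokens = tonumber(state[1]) or capacity
local last = tonumber(state[2]) or now

-- Refill, then grant as many whole tokens as are available.
tokens = math.min(capacity, tokens + rate * math.max(0, now - last))
local granted = math.min(requested, math.floor(tokens))
tokens = tokens - granted

redis.call('HSET', KEYS[1], 'tokens', tokens, 'last_refill_timestamp', now)
return granted
"""


def make_acquire(client):
    """Return acquire(key, capacity, rate, requested) bound to a redis-py client."""
    script = client.register_script(TOKEN_BUCKET_LUA)

    def acquire(key: str, capacity: int, rate: float, requested: int = 1) -> int:
        # The whole script runs atomically on the node that owns `key`.
        return int(script(keys=[key], args=[capacity, rate, requested]))

    return acquire
```

Because the script returns the number of tokens granted, the same call serves both single-request checks (`requested=1`) and batch fetches for a local token buffer.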
6.5 Hot Key Problem
Problem:
- Popular APIs → same Redis key
Solution:
- Key sharding:
rl:user123:/payments:shard1
rl:user123:/payments:shard2
6.6 Rule Propagation
- Use streaming:
- Kafka / PubSub
Flow:
- RL-Modify → publish event
- Sidecars consume → update cache
7. Additional Considerations (Production Readiness)
7.1 Burst Traffic Handling
- Token bucket allows burst up to capacity
- Gateway-level protections:
- Connection limits
- Queue limits
7.2 Stateless Scaling
- RL-Sidecars are stateless:
- No global state stored locally
- Scale horizontally with API Gateway
- Redis is the single source of truth for counters (rules live in PostgreSQL)
7.3 Cache Miss & Partial Failure Handling
Rule Cache Miss
- Fetch from Redis / fallback defaults
Redis Latency / Failure
- Use local token buffer temporarily
- Apply stricter limits
Full Redis Failure
- Circuit breaker activated
- Strategy:
- Critical APIs → fail-close
- Non-critical APIs → fail-open
Sidecar Restart
- Warm cache via:
- Kafka replay OR
- Snapshot
7.4 Configuration Change Rollout
- Each rule has a version
Flow:
- Admin updates rule → new version
- Event published to Kafka
- Sidecars update asynchronously
Consistency:
- Eventual consistency
- Old + new rules coexist briefly
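Because events can arrive late or more than once, a sidecar can use the rule version to ignore stale updates. The sketch below assumes a hypothetical event shape (a dict carrying the rule's scope fields and `version`) and does not cover rule deletion.

```python
class RuleCache:
    """Local rule cache that only accepts strictly newer rule versions."""

    def __init__(self):
        self._rules = {}

    def apply(self, event: dict) -> bool:
        """Apply a rule event; return True if the cache was updated."""
        key = (event["scope_type"], event["scope_value"], event["api_endpoint"])
        current = self._rules.get(key)
        if current is not None and event["version"] <= current["version"]:
            return False  # stale or duplicate event, keep the cached rule
        self._rules[key] = event
        return True

    def get(self, scope_type: str, scope_value: str, api_endpoint: str):
        """Return the cached rule for this scope and endpoint, or None."""
        return self._rules.get((scope_type, scope_value, api_endpoint))
```

Applying version 2 and then a delayed version 1 leaves version 2 in the cache, so out-of-order delivery does not roll a rule back.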
7.5 Clock Skew & Time Consistency
Problem:
- Distributed systems → inconsistent clocks
Solution:
- Use Redis server time as source of truth
- Avoid fixed window algorithms
- Prefer:
- Token bucket
- Sliding window
7.6 Observability
Track:
- QPS
- Rejection rate
- Redis latency
- Token exhaustion rate
Tools:
- Prometheus
- Grafana
8. Error Handling & Exception Scenarios
8.1 Redis Failures
- Timeout / connection failure:
- Retry with backoff
- Fallback to local buffer
- Persistent failure:
- Circuit breaker triggers
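Retry with backoff and fallback to the local buffer might look like the sketch below. The function name, the attempt count, and the 1 ms base delay are assumptions chosen to stay well inside a sub-10ms decision budget. `op` stands for the Redis call and `fallback` for the local-buffer path.

```python
import random
import time


def call_with_backoff(op, fallback, attempts=3, base_delay=0.001,
                      sleep=time.sleep, rng=random.random):
    """Try op() a few times with jittered exponential backoff, then fall back."""
    for attempt in range(attempts):
        try:
            return op()
        except (TimeoutError, ConnectionError):
            if attempt < attempts - 1:
                # Full jitter: sleep a random fraction of the growing delay.
                sleep(rng() * base_delay * (2 ** attempt))
    return fallback()
```

A caller would also report each failure to the circuit breaker so that persistent failures stop reaching Redis at all.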
8.2 Kafka / Propagation Failure
- Sidecars continue using last known rules
- Retry consumption
- No immediate impact on the request path; rule changes are delayed until consumption resumes
8.3 Data Inconsistency
- Temporary inconsistencies allowed
- Eventually resolved via:
- Kafka propagation
- Redis updates
8.4 Unexpected Traffic Spikes
- Gateway absorbs spike via queueing
- RL enforces limits strictly
8.5 Sidecar Failure
- Restart sidecar
- Reload:
- Rules from Kafka
- Tokens from Redis
Final Summary
- Sidecar-based design keeps decision latency within the <10ms budget by avoiding network calls for most requests
- Redis cluster + sharding ensures scalability
- Local token buffering cuts Redis QPS by fetching tokens in batches
- Streaming-based rule propagation gives eventually consistent rule updates
- Key sharding avoids Redis hotspots; Lua scripts keep counter updates atomic
- Multi-layer fault tolerance ensures high availability