My Solution for Design an API Rate Limiter

by naquoc17uni

Requirements


Functional Requirements:


  • Enforce per-user and per-API rate limits.
  • Admins can add/update/delete/view rate-limit rules via an API or console.



Non-Functional Requirements:


  • Low latency (sub-millisecond)
  • Fail-open: if the limiter crashes, traffic is let through rather than blocked
  • High availability: the limiter is replicated, with no single point of failure
  • Distributed: the limiter works across multiple servers
  • Loose consistency is acceptable, because the limit on a single server is high (e.g. 100 requests/minute)


Capacity Estimation





API Design



  • The rate limiter is middleware that runs inside the server
  • We set limits on specific endpoints of the backend server:
    • GET /products: 1000 requests/ 15 min
    • POST /payment: 10 requests / min
  • Each request includes userToken, endpoint, ... so the limiter can identify the session and remember whether it is currently allowed or blocked
  • Expose a check endpoint in the middleware layer that runs before the request reaches backend processing
  • Create, update, or delete rate-limit rules:
    • POST /rules : create a new rule
    • PUT /rules/{ruleId} : update an existing rule
    • DELETE /rules/{ruleId} : delete a rule
  • Fetch rules for audit or observability:
    • GET /rules : list all rules
    • GET /rules/{ruleId} : get one specific rule
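
The rule endpoints above can be sketched as a small in-memory store. This is only an illustration under assumptions: the `Rule` fields (`method`, `path`, `limit`, `window_seconds`), the `RuleStore` class, and its method names are hypothetical and not part of the original design, and a real deployment would persist rules rather than keep them in a dict.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional
import itertools


@dataclass(frozen=True)
class Rule:
    rule_id: int
    method: str          # HTTP method the rule applies to, e.g. "GET"
    path: str            # endpoint path, e.g. "/products"
    limit: int           # max requests allowed per window
    window_seconds: int  # window length in seconds


class RuleStore:
    """Hypothetical in-memory stand-in for the rules backend."""

    def __init__(self) -> None:
        self._rules: Dict[int, Rule] = {}
        self._ids = itertools.count(1)

    def create(self, method: str, path: str, limit: int, window_seconds: int) -> Rule:
        # POST /rules
        rule = Rule(next(self._ids), method.upper(), path, limit, window_seconds)
        self._rules[rule.rule_id] = rule
        return rule

    def update(self, rule_id: int, **changes) -> Rule:
        # PUT /rules/{ruleId}; raises KeyError if the rule does not exist
        rule = replace(self._rules[rule_id], **changes)
        self._rules[rule_id] = rule
        return rule

    def delete(self, rule_id: int) -> None:
        # DELETE /rules/{ruleId}
        del self._rules[rule_id]

    def list_all(self) -> List[Rule]:
        # GET /rules
        return list(self._rules.values())

    def get(self, rule_id: int) -> Optional[Rule]:
        # GET /rules/{ruleId}
        return self._rules.get(rule_id)

    def match(self, method: str, path: str) -> Optional[Rule]:
        # Used by the check middleware to find the rule for an incoming request
        for rule in self._rules.values():
            if rule.method == method.upper() and rule.path == path:
                return rule
        return None
```

For example, `store.create("GET", "/products", 1000, 15 * 60)` and `store.create("POST", "/payment", 10, 60)` would register the two limits listed above.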



High-Level Design



  • Use Redis to store traffic metadata. When burst traffic hits, every worker increments its counters in Redis, and three things can happen:
    • Redis handles it fine (happy path)
    • Redis gets slow under load
    • Redis goes down completely at peak
  • Workers are stateless; the counter state lives in Redis, on the primary node of each shard. Redis is sharded so that it can scale horizontally with traffic
  • The fallback mechanism is a 3-tier architecture:
    • Tier 1 (normal): worker -> Redis primary node
    • Tier 2 (burst traffic): worker -> local in-memory buffer -> Redis primary node
    • Tier 3 (Redis down): fail-open, allow all traffic through
  • Each request picks a random active key so that load is distributed evenly across shards. Every check call increments the counter for that key
  • When a rule is created or updated, a notification is sent to all workers. Workers listen with a timeout and refresh their own copy when a new rule is available
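
The three tiers described above can be sketched as a single check function. This is a minimal sketch under assumptions: `redis_incr` is a hypothetical callable standing in for the real Redis call, and the `RedisSlow` and `RedisDown` exceptions are invented here to signal a timeout and an unreachable server. The sketch also leaves out syncing the local counter back to Redis.

```python
from typing import Callable, Dict, Tuple


class RedisSlow(Exception):
    """Hypothetical: raised when Redis does not answer within the timeout."""


class RedisDown(Exception):
    """Hypothetical: raised when Redis is unreachable."""


class TieredLimiter:
    def __init__(self, redis_incr: Callable[[str], int], limit: int) -> None:
        # redis_incr(key) is assumed to increment the counter and return its new value
        self.redis_incr = redis_incr
        self.limit = limit
        self.local: Dict[str, int] = {}  # per-worker in-memory counters

    def check(self, key: str) -> Tuple[bool, str]:
        """Return (allowed, tier_used) for one request."""
        try:
            # Tier 1: Redis is the authoritative counter
            count = self.redis_incr(key)
            return count <= self.limit, "redis"
        except RedisSlow:
            # Tier 2: fall back to the worker's local counter
            self.local[key] = self.local.get(key, 0) + 1
            return self.local[key] <= self.limit, "local"
        except RedisDown:
            # Tier 3: fail-open, let the request through
            return True, "fail-open"
```

Returning the tier alongside the decision is a choice made for this sketch; it makes it easy to count how often each fallback path is taken.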





Database Design





Detailed Component Design



  • We use a hash map keyed by userId to store each user's traffic metadata together with its TTL
  • Every worker stores this metadata and evaluates it whenever the 'check' endpoint is called
  • Workers can be scaled out as far as the capacity of the Redis primary node allows
  • All read-decrement operations on Redis are executed atomically using a Lua script, so no two requests can interleave and read the same token count
  • Three-tier fallback:
    • Redis responds normally - use Redis as the authoritative counter
    • Redis is slow (no response within the 50 ms timeout) - fall back to the worker's local in-memory counter
    • Redis is unreachable - fail-open and allow all traffic through
  • Redis server time is the single authoritative clock
  • The local buffer syncs to Redis every 100 ms or when the local count exceeds 10% of the rate-limit threshold, whichever comes first
  • Each worker connects to Redis through a fixed-size connection pool. If the pool is exhausted under high load, new requests fall back to the local in-memory counter rather than waiting for a connection
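
The local-buffer sync policy above (every 100 ms, or once the local count passes 10% of the threshold, whichever comes first) could look roughly like the sketch below. The `LocalBuffer` class, the `flush` callback, and the injectable `clock` are all assumptions made for illustration. In a real worker, `flush` would add the buffered delta to the Redis counter, and a background timer would also trigger `sync` when no requests arrive.

```python
import time
from typing import Callable, Dict, Optional


class LocalBuffer:
    def __init__(
        self,
        flush: Callable[[str, int], None],  # hypothetical: flush(key, delta) pushes delta to Redis
        limit: int,                         # rate-limit threshold for a key
        interval_s: float = 0.1,            # sync at least every 100 ms
        flush_percent: int = 10,            # sync when local count exceeds 10% of the limit
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.flush = flush
        self.limit = limit
        self.interval_s = interval_s
        self.flush_percent = flush_percent
        self.clock = clock
        self.pending: Dict[str, int] = {}
        self.last_sync = clock()

    def record(self, key: str) -> bool:
        """Count one request locally; return True if this call triggered a sync."""
        self.pending[key] = self.pending.get(key, 0) + 1
        now = self.clock()
        # Integer comparison avoids floating-point rounding on the 10% check
        over_size = self.pending[key] * 100 > self.limit * self.flush_percent
        over_time = now - self.last_sync >= self.interval_s
        if over_size or over_time:
            self.sync(now)
            return True
        return False

    def sync(self, now: Optional[float] = None) -> None:
        """Push all buffered deltas and reset the buffer."""
        for key, delta in self.pending.items():
            self.flush(key, delta)
        self.pending.clear()
        self.last_sync = self.clock() if now is None else now
```

Passing the clock in as a parameter is a choice made for this sketch so that the timing rule can be exercised without sleeping.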


