My Solution for Design an API Rate Limiter with Score: 9/10

by dewdrop_pinnacle261

System requirements


Functional Requirements:

  1. Rate Limiting: The system must limit the number of API requests a client can make over a specific time period (e.g., 100 requests per minute).
  2. Configurable Rules: Allow different rate limits for different users, IPs, or API endpoints. For example, premium users may have higher rate limits.
  3. Real-time Enforcement: The rate limiter must enforce limits in real-time, rejecting requests that exceed the allowed limit.
  4. Granularity: It should support different time windows for rate limiting, such as per second, minute, hour, or day.
  5. Reset Mechanism: Automatically reset the count after the time window expires.
  6. Distributed System Support: Handle rate limiting for systems that may be distributed across multiple servers.

Non-Functional Requirements:

  1. High Availability: The system should be highly available, ensuring it does not become a single point of failure.
  2. Scalability: It must scale to support millions of API calls per second across distributed systems.
  3. Low Latency: The rate-limiting checks should add minimal overhead to the API request response time.
  4. Consistency: Ensure that rate limits are consistently applied across distributed nodes.
  5. Fault Tolerance: The system must handle failures gracefully and continue enforcing rate limits even in case of failures.
  6. Audit and Monitoring: Maintain logs and statistics for rate-limited requests to facilitate debugging and monitoring.





Capacity estimation

1️⃣ Request Volume Estimation

First, we need to understand the expected traffic and the number of requests the system should be able to handle.

Assumptions:

  • The system will serve 5 million active users.
  • On average, each user makes 10 API requests per minute (varies based on user behavior).
  • Rate limit per user is typically 100 requests per minute.
  • We will estimate the traffic volume for requests per second (RPS), as this will be a crucial factor for infrastructure sizing.

Calculations:

  • Requests per minute (RPM) per user = 10 requests
  • Total RPM for 5 million users = 5,000,000 * 10 = 50 million RPM
  • Convert RPM to RPS: 50,000,000 / 60 ≈ 833,333 RPS

Thus, the system needs to handle 833,333 RPS.
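
A quick sanity check of the arithmetic, as a Python sketch using the assumptions above:

```python
# Back-of-envelope traffic estimate (numbers from the assumptions above).
ACTIVE_USERS = 5_000_000
REQUESTS_PER_USER_PER_MIN = 10

total_rpm = ACTIVE_USERS * REQUESTS_PER_USER_PER_MIN  # 50,000,000 RPM
total_rps = total_rpm / 60                            # ~833,333 RPS
print(f"{total_rpm:,} RPM -> {total_rps:,.0f} RPS")
```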

2️⃣ Data Storage Requirements

Now, let’s estimate the data storage required to track user requests and configuration settings.

Data to Store in Redis:

  1. User request counts: We need to store a count of the number of requests made by each user within a specific time window.
  2. Configuration data: We’ll store rate limit configurations for each user (e.g., max requests per minute).

Assume the following:

  • Each request count takes about 4 bytes (a 32-bit integer), and each user's rate limit configuration takes about 100 bytes.
  • Number of users = 5 million
  • Data per user: 4 bytes (request count) + 100 bytes (rate limit configuration) = 104 bytes

Total Data:

  • Total storage for requests and configurations = 5 million users * 104 bytes = 520 MB

Redis Storage Requirements:

  • Since Redis keeps all data in memory, a single node is bounded by its RAM and throughput, so a clustered deployment is needed for scalability.
  • To handle 833,333 RPS, Redis should be sharded across multiple nodes to distribute the load efficiently.
  • Beyond the ~520 MB of working data, allow extra memory for per-key overhead, replication buffers, and headroom, so plan for multiple GB across the cluster.

3️⃣ Infrastructure Scaling

Redis Nodes and Clustering:

To manage high traffic, we’ll set up Redis clustering:

  • Assume each Redis node can handle 10,000 RPS.
  • To handle 833,333 RPS, we’ll need about:
    • 833,333 RPS / 10,000 RPS per node ≈ 84 Redis nodes, rounded up (distributed across regions if needed)

These nodes will be responsible for storing the request counts, checking limits, and enforcing the rate-limiting policies.

API Gateway:

The API Gateway should be able to handle incoming requests, forward them to the Rate Limiter Service, and handle failures gracefully.

  • Assume each API Gateway can handle 100,000 RPS.
  • To manage 833,333 RPS, you’ll need:
    • 833,333 RPS / 100,000 RPS per node ≈ 9 API Gateway nodes, rounded up

These can be scaled horizontally with a load balancer distributing requests across API Gateway instances.
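
The same sizing math as a sketch; the per-node throughput figures are the assumptions stated above, not benchmarks:

```python
import math

TOTAL_RPS = 833_333
REDIS_RPS_PER_NODE = 10_000      # assumed Redis throughput per node
GATEWAY_RPS_PER_NODE = 100_000   # assumed gateway throughput per node
USERS, BYTES_PER_USER = 5_000_000, 104  # 4-byte counter + 100-byte config

redis_nodes = math.ceil(TOTAL_RPS / REDIS_RPS_PER_NODE)      # 84
gateway_nodes = math.ceil(TOTAL_RPS / GATEWAY_RPS_PER_NODE)  # 9
storage_mb = USERS * BYTES_PER_USER / 1_000_000              # 520 MB
print(redis_nodes, gateway_nodes, f"{storage_mb:.0f} MB")
```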

4️⃣ Latency and Response Time Considerations

The system needs to maintain low latency, ideally under 100 milliseconds per request. To meet this requirement, the following infrastructure and optimizations will be considered:

  1. Redis Caching: Redis provides sub-millisecond latency, making it ideal for high-speed rate limiting.
  2. API Gateway Optimization: Cache recent rate-limit decisions for high-traffic users at the gateway to avoid redundant checks.
  3. Load Balancing: Use auto-scaling with load balancers to distribute traffic evenly.

5️⃣ Monitoring and Logging

  • For real-time monitoring and logging, assume each request generates about 0.5 KB of log data.
  • With 50 million RPM, logging will generate:
    • 50,000,000 * 0.5 KB = 25 GB per minute.
  • For storage, log data will be written to a metrics/log store (e.g., Prometheus for metrics, Elasticsearch for logs) with rollups and aggregation to manage storage size efficiently.

6️⃣ Failover and Redundancy

To ensure high availability and fault tolerance:

  • Use multi-region deployment for Redis and API Gateway.
  • Implement replication for Redis to ensure data is available even in case of node failures.
  • Use auto-scaling to scale up/down Redis nodes and API Gateway nodes based on traffic patterns.

7️⃣ Cost Estimation

Finally, estimating the cost for cloud infrastructure:

  • Compute: Each Redis node may cost around $200/month for high-performance setups, and you may need 84 nodes.
    • Total Redis cost = 84 nodes * $200 = $16,800/month
  • Storage: Storing logs, configurations, and user data incurs additional cloud storage costs (e.g., S3 or equivalent). Raw logs at 25 GB per minute amount to roughly 36 TB per day, so aggressive rollups, sampling, and short retention are needed to keep this near $2,000/month.
  • Bandwidth: Transmitting and handling API requests will add bandwidth costs. This may vary depending on the cloud provider and regions, but could cost $3,000/month.
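
Spelling out the volumes behind these line items (raw figures before rollups; the per-node cost is the rough assumption above):

```python
REDIS_NODES, COST_PER_NODE = 84, 200   # assumed $/month per Redis node
RPM, LOG_BYTES = 50_000_000, 500       # 0.5 KB of log data per request

redis_cost = REDIS_NODES * COST_PER_NODE          # $16,800 / month
log_gb_per_min = RPM * LOG_BYTES / 1e9            # 25 GB / minute
log_tb_per_day = log_gb_per_min * 60 * 24 / 1000  # 36 TB / day, raw
print(f"${redis_cost:,}/mo Redis; {log_gb_per_min:.0f} GB/min "
      f"-> {log_tb_per_day:.0f} TB/day of raw logs")
```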






API design

The API will expose endpoints to configure, check, and reset rate limits. Below is a simplified design:

  1. POST /rate-limit-config
    • Description: Set or update rate-limiting configurations (e.g., limit, time window).
    • Parameters:
      • user_id: The user for whom rate limits are being set.
      • limit: The maximum number of requests allowed.
      • time_window: The time window (e.g., minute, hour).
    • Response:
      • status: Success or failure message.
  2. GET /check-rate-limit
    • Description: Check if the user has exceeded their rate limit.
    • Parameters:
      • user_id: The user making the request.
      • api_key: The API key or IP for rate-limiting purposes.
    • Response:
      • status: ok if within limits, or rate-limited if the limit is exceeded.
      • retry_after: Time in seconds to wait before retrying.
  3. POST /reset-rate-limit
    • Description: Reset the rate limit for a given user (or API key). POST is used because resetting mutates server state.
    • Parameters:
      • user_id: The user whose rate limit should be reset.
    • Response:
      • status: Success message.
  4. GET /rate-limit-stats
    • Description: View the statistics of requests made by a user within a time window.
    • Parameters:
      • user_id: The user whose stats need to be retrieved.
    • Response:
      • requests_made: Number of requests made.
      • limit: The configured rate limit.
      • time_window: The time window of the rate limit.
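
A hypothetical client interaction with these endpoints; the host name and payload shapes are illustrative, not a fixed contract:

```python
import requests

BASE = "https://ratelimiter.example.com"  # hypothetical host

# Set a 100-requests-per-minute limit for a user.
requests.post(f"{BASE}/rate-limit-config", json={
    "user_id": "user_123",
    "limit": 100,
    "time_window": "minute",
})

# Ask whether the user may proceed.
resp = requests.get(f"{BASE}/check-rate-limit",
                    params={"user_id": "user_123", "api_key": "key_abc"})
body = resp.json()
if body["status"] == "rate-limited":
    print(f"Back off for {body['retry_after']} seconds")
```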






Database design

The system should track rate limit information in a distributed cache (e.g., Redis) for fast lookups. Here’s a simplified data model:

Data Model:

  • UserRateLimit:
    • user_id (Primary Key): The user being rate-limited.
    • api_key: The key to identify the API for rate-limiting.
    • requests_made: A counter that tracks the number of requests made by the user.
    • timestamp: The timestamp of the last request made.
    • time_window: The duration of the rate limit (e.g., minute, hour).
  • RateLimitConfig:
    • user_id: User for whom the rate limits are defined.
    • limit: The number of requests allowed in the time window.
    • time_window: The time window (e.g., minute, hour).
  • Redis Keys: Use Redis as the primary data store for tracking request counts:
    • user_rate_limit:{user_id}:{api_key}: Stores the number of requests made by the user within a specific time window.
    • user_rate_limit_config:{user_id}: Stores the rate limit configuration for the user.
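
A small sketch of how these keys might be composed and accessed with redis-py (key names follow the schema above; the values are illustrative):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def count_key(user_id: str, api_key: str) -> str:
    return f"user_rate_limit:{user_id}:{api_key}"

def config_key(user_id: str) -> str:
    return f"user_rate_limit_config:{user_id}"

# Store the per-user configuration as a hash and read it back.
r.hset(config_key("user_123"), mapping={"limit": 100, "time_window": 60})
cfg = r.hgetall(config_key("user_123"))  # {'limit': '100', 'time_window': '60'}

# Current request count within the active window (0 if the key expired).
current = int(r.get(count_key("user_123", "key_abc")) or 0)
```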






High-level design

API Gateway: Intercepts all incoming requests and delegates to the rate limiter service.

Rate Limiter Service: Handles all logic for enforcing rate limits, interacting with a distributed cache (Redis) to store user request counts and configurations.

Configuration Manager: Allows admins to set and update rate limit configurations for users.

Database/Cache: Redis for fast, scalable data storage.

Notification Service: Alerts admins when users hit their rate limits.





Request flows

User Request: A request comes in through the API Gateway.

Rate Limit Check: The Rate Limiter Service checks Redis for the user's request count within the defined time window.

  • If the user is within the rate limit, the request is passed through.
  • If the user exceeds the limit, the service returns a rate-limited response with a retry_after field.

Update Redis: Each successful request updates the user’s request count in Redis.
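
A minimal fixed-window version of this flow, assuming redis-py and a 60-second window:

```python
import redis

r = redis.Redis(decode_responses=True)
WINDOW_SECONDS = 60

def allow_request(user_id: str, api_key: str, limit: int) -> tuple[bool, int]:
    """Fixed-window check: returns (allowed, retry_after_seconds)."""
    key = f"user_rate_limit:{user_id}:{api_key}"
    count = r.incr(key)               # atomically count this request
    if count == 1:                    # first hit starts a new window
        r.expire(key, WINDOW_SECONDS)
    if count <= limit:
        return True, 0
    return False, max(r.ttl(key), 0)  # seconds until the window resets

allowed, retry_after = allow_request("user_123", "key_abc", limit=100)
```

Note that the INCR/EXPIRE pair above is not atomic as a unit; the Lua variant in the detailed component design closes that gap.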





Detailed component design

Rate Limiter Service:

  • Data Structure: Use Redis sorted sets (for sliding windows) or simple counters/hashes (for fixed windows) to track user request counts.
  • Scaling: Redis scales well with clustering and replication, offering sub-millisecond lookups and sharding across multiple nodes.

Redis:

  • TTL: Use Time-to-Live (TTL) for key expiration, ensuring that old request counts are cleared after a time window passes.
  • Atomic Operations: Redis commands like INCR and EXPIRE are individually atomic, but the increment-plus-TTL pair should run in a MULTI/EXEC transaction or a Lua script so a failure between the two can never leave a counter without an expiry (see the sketch below).
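
A sketch of one way to do this with a Lua script registered through redis-py, so the increment and the TTL are applied in a single server-side step:

```python
import redis

r = redis.Redis(decode_responses=True)

# INCR and EXPIRE execute atomically inside Redis.
FIXED_WINDOW_LUA = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
"""
fixed_window = r.register_script(FIXED_WINDOW_LUA)

count = fixed_window(keys=["user_rate_limit:user_123:key_abc"], args=[60])
allowed = count <= 100  # the limit would come from the user's configuration
```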

Configuration Manager:

  • Dynamic Rate Limits: Allows dynamic updates of rate limits for users and APIs without downtime.
  • Admin UI: A simple interface for administrators to update rate limit settings in real-time.






Trade offs/Tech choices

Redis vs. SQL: Redis is chosen for fast in-memory reads and writes, which real-time rate limiting requires; a SQL database would add disk and query overhead to every request.

Distributed Systems: Redis clustering ensures the rate limiter scales well horizontally, but it introduces the complexity of managing multiple Redis instances and handling potential network splits.

Eventually Consistent: Edge cases such as network partitions can leave counters briefly inconsistent across nodes. For rate limiting this is an acceptable trade-off, mitigated by conservative limits and client retry mechanisms.




Failure scenarios/bottlenecks

Redis Failures: Redis may become a bottleneck or single point of failure. Use Redis clustering and replication for fault tolerance.

High Load: If the system experiences a sudden surge in traffic (e.g., a DDoS attack), the rate limiter may be overwhelmed. Sharding and caching strategies (like using multiple Redis instances) can help distribute the load.

Time Window Overlaps: With fixed windows, a client can send up to twice its limit in a burst that straddles a window reset (e.g., 100 requests just before the counter resets and 100 more just after). This can be mitigated with a sliding window approach, sketched below.
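
A sliding-window sketch using a Redis sorted set of request timestamps (assumes redis-py; the key name and uuid-suffixed members are illustrative choices to avoid member collisions):

```python
import time
import uuid
import redis

r = redis.Redis(decode_responses=True)

def allow_request_sliding(user_id: str, limit: int, window: int = 60) -> bool:
    """Allow if fewer than `limit` requests landed in the last `window` seconds."""
    key = f"user_rate_limit_sliding:{user_id}"
    now = time.time()
    pipe = r.pipeline()
    pipe.zremrangebyscore(key, 0, now - window)         # evict old requests
    pipe.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})  # record this request
    pipe.zcard(key)                                     # count what's left
    pipe.expire(key, window)                            # clean up idle keys
    _, _, count, _ = pipe.execute()
    return count <= limit
```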




Future improvements

  1. Sliding Window: Implement a sliding window approach for rate limiting instead of fixed time windows to smooth out spikes in request patterns.
  2. AI/ML: Introduce AI-based models to dynamically adjust rate limits based on traffic patterns and user behavior.
  3. Distributed Rate Limiting: Improve cross-data-center rate limiting by implementing replicated counters across regions.