Design An API Rate Limiter - System Design

System requirements

Functional:

Request Tracking: The system should track the number of requests made by each user within a specific time window (e.g., per minute, hour, or day).
Limits Configuration: The system should allow configuring rate limits (e.g., requests per minute) for different user tiers (e.g., free users vs. premium users).
Response Handling:
- If a user exceeds their limit, the system should return a specific error code (e.g., 429 Too Many Requests).
- The response should contain useful information such as when the user can try again.
Reset Window: There should be a way to define the reset window during which the count resets (e.g., resetting every minute, daily, etc.).
Logging: The system should log requests and enforcement actions for monitoring and potential debugging.
Whitelist/Blacklist: There might be functionality to allow certain users to exceed limits (whitelist) or block certain users (blacklist).
Client Identification: The system should be able to identify users uniquely (e.g., via API keys, IP addresses).
Concurrency Handling: The system should handle concurrent rate limiting effectively to avoid race conditions.

Non-Functional:

Performance:
- The system must handle a high volume of requests per second (e.g., thousands of requests per second) with minimal latency.
- Response times should be less than 100 milliseconds for a normal request.
Scalability:
- The rate limiter should scale horizontally to accommodate increased traffic, allowing the addition of more servers as needed without impacting performance.
Availability:
- The system should maintain high availability (e.g., 99.9%) to ensure that the API services are always accessible and functional.
Reliability:
- The system should ensure that rate limiting logic is consistent and accurate, providing reliable responses without fail.
Security:
- The system should prevent abuse and security threats (e.g., DDoS attacks) by correctly enforcing rate limits and logging such incidents for review.
- Sensitive data should be encrypted in transit and at rest.
Maintainability:
- The system should be easy to maintain and update, with modular components that can be independently developed and deployed.
Usability:
- The API should provide clear documentation regarding rate limits, error codes, and how users can manage their requests.
Compliance:
- The system should comply with relevant regulatory requirements and industry standards to protect user data and privacy (e.g., GDPR).

Capacity estimation

\initial capacity is 500 \rps

\concurrent connections is 1000

\if 1 server can hold 1000 connections it means we need only 1 server for the rate limiter

\peak connections is 2000 thus we need max of 2 servers to hold the peak load

API design

\post /ratelimit/rules

get /ratelimit/rules

\post /reset

\post /whitelist

\post /blacklist\

Database design

RateLimit : +int \ip\address INDEX

RateLimit : +int remaningRequests

Whitelist : +int \ip\address INDEX

WhiteList: +DateTime timestamp

BlackList : +int ipaddress INDEX

|BlackList: +DateTime timestamp

\Rules: + int ipaddress INDEX

|Rules: + enum usertype

Rules: + freeRequestsPerMinute

Log|Record: + intI paddress INDEX

LogRecord: + DateTime |TimeStamps INDEX

LogRecord: + Enum operationStatus (rate limited or success)

High-level design

1) \Ratelimiter middleware - a service that performs rate limiting based on rules. Stands before API servers

2) APi servers - servers we throttle access to

3) Rules DB - a database of user rate limiter rules

4) Rules Cache - a cache of most frequently used rules

5) White/black list db - a database of white/black listed users

6) Optional Rate limited requestes message box - a message box containinig rate limited requests for logging

7) Logging service - a service to log rate limiter evernts

Request flows

General rate limiting flow

1) User contacts rate limiter

2) Rate limiter retreives rules for IP address from rules db

3) Rate limiter retrieves black list record for IP

4) If IP is blacklisted the request is rejected

5) Rate limiter checks if there is enough requests for IP

6) If there is enough requests for IP the requesst is forwarded to IP servers

7) If not the rate limiter answers with 429 too many requests

Reset window flow

1) User sends reeset window request

2) Rate limiter checks if user is in whitelist and havent reset the window in the predefined time period

3) Rate limit is resert for a user

4) User resends request to API servers and rate limiter forward it because request is within limits

Detailed component design

1) Rate limiter internals is implemnted either with:

Token bucket algorithm. An inmemory number of tickets per IP address is tored. When requests passes the ticket is consumed and if the number of tickets is 0 we reject the request. \the tickets are refilled periodically based on a configured time period known as refresh rate.
Another algorithm is to put all requests into a queue with a predefined size and server them in a FIFO manner. If the queue gets full we reject the requests. The queue size is a predefined parameter
|To make it known for a user that he is rate limited we use X-Rate limited family of \http reponse headers that contain information about the rate limit window and remaining requests for a user
If a user sends too many requests within a time window the rate limiter will put a user into a blacklist as an abuser
To have many users access an api behind a single api address we use a special cookin contain session information sent by the rate limiter
To prevent too many requests within a time frame we can use a liding window approach wehre we track the number of requests for a specified time interval. If the number of requests is biggern than max then we reject the requests. We can also count the number of requests per IP or user+id

2) Rate limiter contancts SQL databases for whitelist and with a caching layer on top for a frequently retrieved data. The number of reads and writes is not determied in this case and the data is structured thus we can use SQL

3) Rate limiter contacts rules db which is an in memory document DB or simple cache service like redis with config files as an underlying data. We choose a nosql or a file-based solution here because config files are an unstrctured data

4) Logging service is write ahead log system for simplicity. |If we want to exapnd and scale our system we can use apache Kafka

Trade offs/Tech choices

1) We choose a token bucket solution for rate limiting because it's easier to implement and provides quick rate limiting algortihn with low latency. \the tradeoff is that you have to empirically pick params for refresh rate. To make rate limiting global we can maintain a global toke bucket for all requests

2) For storing rate limiting rules we use files or a document oritnetd storage because the config are unstrctured and can evolve over time. To maintain the database of config files we can use redis where keys are file system path.

Failure scenarios/bottlenecks

1) if rate limiter server fails all existing tokens are lost and the user will have a clean state for rate limiting. \to fix this we can maintain a global database of ip -> remanining requests

2) If rate limiter is under heavy load we can spin up additional rate limiter servers and put them behind a load balancer and shard requests based on the location of incoming requests

Future improvements

1) \use queue based solution isntead a token bucket as it's easier to pick intinal arguments

2) explore other rate limiting algorithms

3) Notify a user via a notification service about request window expiration