My Solution for Design an API Rate Limiter with Score: 8/10
by dynamo7055
System requirements
Functional:
- The rate limiter should limit the number of requests a client can send to the API servers.
- The limiting rules should be flexible for different use cases, e.g., keyed by IP address, user ID, etc.
- The rate limiter should work in a distributed environment.
- The user should be notified when a request is throttled.
- A throttled request should be either discarded or forwarded to a queue for later processing, depending on the use case.
Non-Functional:
- High availability: the rate limiter must be highly available, since every incoming request depends on it
- Low latency: the rate limiter should add minimal latency when deciding whether to limit a request
- High fault tolerance: if the rate limiter fails, it should not take down the entire system
- Monitoring, alerting, and logging
Rate Limiting Algorithms:
We have the following rate limiting algorithms.
- Token Bucket
- Tokens are added to a bucket at a predefined rate. Each request consumes one token; when the bucket runs out of tokens, the request is throttled.
- The bucket size and refill rate can be tuned per scenario (see the sketch below).
- + easy to implement
- + handles bursts of requests
- - the bucket size and refill rate can be hard to tune
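A minimal single-process sketch of the token bucket in Python; the class name and parameter values are illustrative, not part of the original design:

```python
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity            # maximum tokens the bucket holds
        self.refill_rate = refill_rate      # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Add the tokens accrued since the last check, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1                # one request consumes one token
            return True
        return False                        # bucket empty: throttle

# e.g., TokenBucket(capacity=10, refill_rate=2) allows a burst of 10
# requests and a sustained rate of 2 requests/second.
```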
- Leaking Bucket
- Requests are added to a queue, and a consumer processes them from the queue at a fixed rate (see the sketch below).
- + memory efficient, as the queue size is fixed
- - cannot handle bursts of requests
- - hard to tune the queue size and processing rate
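A minimal sketch of the leaking bucket, assuming a consumer drains the queue on a fixed schedule:

```python
from collections import deque

class LeakyBucket:
    def __init__(self, queue_size: int):
        self.queue_size = queue_size
        self.queue = deque()                # bounded queue of pending requests

    def allow(self, request) -> bool:
        if len(self.queue) >= self.queue_size:
            return False                    # queue full: throttle
        self.queue.append(request)
        return True

    def leak(self):
        # Called by the consumer on a fixed schedule (e.g., every 100 ms)
        # so requests drain at a constant rate; returns None when idle.
        return self.queue.popleft() if self.queue else None
```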
- Fixed Window Counter
- Each user gets a counter per fixed time window. The rate limiter increments the counter when it receives a request; if the counter has already reached the limit, the request is throttled (see the sketch below).
- + easy to implement
- + memory efficient
- - a traffic spike around a window boundary can let through up to twice the allowed quota (a burst at the end of one window plus a burst at the start of the next)
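A minimal in-memory sketch of the fixed window counter; a real deployment would keep the counters in Redis, as discussed later:

```python
import time
from collections import defaultdict

class FixedWindowCounter:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counters = defaultdict(int)    # (user_id, window_index) -> count

    def allow(self, user_id: str) -> bool:
        window = int(time.time() // self.window_seconds)
        if self.counters[(user_id, window)] >= self.limit:
            return False                    # quota for this window exhausted
        self.counters[(user_id, window)] += 1
        return True
```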
- Sliding Window Log
- The timestamp of each request is logged. When a request arrives, count the requests received in the past time window and decide whether to throttle (see the sketch below).
- + accurate throttling
- - not memory efficient, since one timestamp is stored per request
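A minimal sketch of the sliding window log for a single client; variants differ on whether throttled requests are also logged:

```python
import time
from collections import deque

class SlidingWindowLog:
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.log = deque()                  # timestamps of accepted requests

    def allow(self) -> bool:
        now = time.monotonic()
        # Evict timestamps that have fallen outside the sliding window.
        while self.log and self.log[0] <= now - self.window_seconds:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False                    # window already at quota
        self.log.append(now)
        return True
```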
- Sliding Window Counter
- When a request arrives, say 30% of the way into the current window, estimate the rolling count as: requests in the current window so far + requests in the previous window × 70% (the fraction of the sliding window that still overlaps the previous window). If this weighted count exceeds the allowed quota, the request is throttled (see the sketch below).
- + memory efficient
- + acceptable accuracy (between sliding window log and fixed window counter)
- - more complicated to implement
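A minimal sketch of the sliding window counter for a single key, using the weighted estimate described above:

```python
import time

class SlidingWindowCounter:
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_index = 0
        self.current_count = 0
        self.previous_count = 0

    def allow(self) -> bool:
        now = time.time()
        index = int(now // self.window_seconds)
        if index != self.window_index:
            # Roll the windows over; if more than one window has passed,
            # the previous window's count is effectively 0.
            self.previous_count = self.current_count if index == self.window_index + 1 else 0
            self.current_count = 0
            self.window_index = index
        # Fraction of the sliding window overlapping the previous window,
        # e.g., 70% when we are 30% of the way into the current window.
        overlap = 1.0 - (now % self.window_seconds) / self.window_seconds
        estimated = self.previous_count * overlap + self.current_count
        if estimated >= self.limit:
            return False
        self.current_count += 1
        return True
```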
Comparing the pros and cons above, we choose the token bucket algorithm for our rate limiter, as it is easy to implement and can handle burst traffic. In practice, the algorithm can be chosen dynamically to fit different use cases.
Database design
userId: String
ipAddress: String
timestamp: DateTime
serviceName: String
High-level design
The client (web browser / mobile app) sends a request to the rate limiter middleware.
The rate limiter middleware decides whether the request should be limited:
if limited, the request is discarded;
if not limited, the request is passed to the API servers.
Flow Steps:
- When a client sends a request to a server, the request first goes to the API gateway.
- Rate limiting rules are stored on disk as config files. Workers periodically pull the rules from disk and load them into a rules cache.
- The API gateway fetches the counter from the in-memory store (Redis) and the rule from the rules cache, then decides whether the request is passed or throttled.
- If the request is passed, it is forwarded to the API servers.
- If the request is throttled, the gateway returns a 429 (Too Many Requests) response to the client. In some cases, the throttled request can instead be pushed to a queue for later processing. A sketch of the gateway-side decision follows.
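A minimal sketch of the gateway-side decision, assuming the counter has already been incremented in Redis and the limit comes from the rules cache; the `X-Ratelimit-Remaining` header follows common convention rather than a formal standard, and the `Retry-After` value is illustrative:

```python
def decide(counter: int, limit: int) -> tuple[int, dict]:
    # counter: this client's request count in the current window (from Redis)
    # limit:   allowed requests per window (from the rules cache)
    if counter > limit:
        # Throttled: 429 plus headers telling the client when to retry.
        return 429, {"Retry-After": "60", "X-Ratelimit-Remaining": "0"}
    return 200, {"X-Ratelimit-Remaining": str(limit - counter)}
```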
Rate Limiting Rule:
- The rate limiting rules can be stored on disk as config files (a sketch of loading such a rule follows the example). For example:
```yaml
domain: auth
descriptor:
  key: auth_type
  value: login
  rate_limit:
    unit: minute
    request_per_unit: 5
```
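A sketch of how a worker might load such a rule into the rules cache, assuming PyYAML and the nesting shown above:

```python
import yaml  # PyYAML, assumed available

def load_rule(path: str) -> dict:
    with open(path) as f:
        rule = yaml.safe_load(f)
    d = rule["descriptor"]
    # Key the cache by (domain, descriptor key, descriptor value) so the
    # gateway can look up the applicable limit in O(1) per request.
    return {(rule["domain"], d["key"], d["value"]): d["rate_limit"]}
```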
Rate Limiting in Distributed System:
Two problems:
- Race Condition:
- In a highly concurrent environment, two API gateways may read the same counter value from the in-memory cache, each increment it by 1, and write it back, so one increment is lost and the counter undercounts.
- Solution:
- add a lock on the counter
- - decreases performance
- use a Lua script or the sorted set data structure in Redis to make the read-increment-write atomic (see the sketch below)
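A minimal sketch of the Lua approach using redis-py: the script runs atomically inside Redis, so concurrent gateways cannot interleave between the INCR and EXPIRE. Key names and parameter values are illustrative:

```python
import time
import redis

r = redis.Redis()  # assumes a reachable Redis instance

# Runs atomically in Redis: increment the window counter and set its
# expiry on first use, with no gap another client can race into.
LUA_INCR_WITH_EXPIRE = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
"""

def is_allowed(user_id: str, limit: int = 5, window_seconds: int = 60) -> bool:
    key = f"rate:{user_id}:{int(time.time() // window_seconds)}"
    count = r.eval(LUA_INCR_WITH_EXPIRE, 1, key, window_seconds)
    return count <= limit
```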
- Synchronization:
- A client may send requests to different rate limiter servers. If the usage data is not synchronized between the servers, the client could send more requests than the rate limiting rule allows.
- The solution is a centralized data store that all servers read from and write to:
- client1 -> server1 -> centralized data store
- client2 -> server2 -> centralized data store
- Performance Optimization:
- In a distributed environment, users should be routed to the closest data center to reduce request latency.
Monitoring, Alert and Logging
- Monitoring:
- whether the rate limiting algorithm is effective
- whether the rate limiting rules are effective (neither too strict nor too loose)
- Export the data as metrics, visualize them on a dashboard, and alert so that action can be taken when the system is unhealthy.
- Logging helps to quickly track down issues.
Trade offs/Tech choices
- co-located with servers vs distributed system
- co-located with servers:
- + simple
- - not independently scalable due to coupling: we must scale the whole service just to scale the rate limiter
- distributed system:
- + decoupled: the servers and the rate limiter scale independently
- - may introduce additional network latency
- Where to store the counter? Disk vs Memory
- Disk:
- + persistent (though persistence is not necessary here)
- - high latency
- In memory:
- + fast, and supports a time-based expiration strategy
- Redis is a good choice, with INCR and EXPIRE (see the sketch below):
- INCR: increments the stored counter by 1
- EXPIRE: sets a timeout on the counter; when the timeout expires, the counter is automatically deleted
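A minimal sketch of the INCR/EXPIRE pattern with redis-py; note the two commands are separate round trips here, which is exactly the race the Lua script above closes:

```python
import time
import redis

r = redis.Redis()  # assumes a reachable Redis instance

def is_allowed(user_id: str, limit: int = 5, window_seconds: int = 60) -> bool:
    # One counter per user per fixed window; the key format is illustrative.
    key = f"rate:{user_id}:{int(time.time() // window_seconds)}"
    count = r.incr(key)                 # INCR creates the key at 1 if missing
    if count == 1:
        r.expire(key, window_seconds)   # EXPIRE clears the counter after the window
    return count <= limit
```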
Failure scenarios/bottlenecks
Future improvements
Hard rate limiting vs soft rate limiting:
- Hard: the number of requests can never exceed the configured quota.
- Soft: if the server load is light and a client has reached its limit, we can allow some extra quota for that client to improve overall throughput (see the sketch below).
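A hypothetical soft-limit policy sketch; the 20% overflow factor and the 0.5 load threshold are illustrative assumptions, not part of the original design:

```python
def is_allowed_soft(count: int, limit: int, server_load: float) -> bool:
    # Hard limiting would stop here: never exceed the quota.
    if count <= limit:
        return True
    # Soft limiting (assumed policy): grant up to 20% extra quota
    # while the measured server load stays below 50%.
    return server_load < 0.5 and count <= limit * 1.2
```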