API Gateway throttling -- burst limit vs rate limit
Introduction
In the world of APIs, managing traffic efficiently is crucial to ensuring seamless service delivery and protecting backend infrastructure from being overwhelmed by excessive requests. Throttling is a technique used to control the number of requests sent to a service in a given time interval. This article delves into API Gateway throttling, focusing specifically on its two primary controls: the burst limit and the rate limit.
What is API Gateway Throttling?
API Gateway throttling involves restriction mechanisms that control the rate of traffic to an API. It is vital for preventing misuse, maintaining quality of service, and ensuring that resources are not exhausted by excessive demand. The two prominent mechanisms of throttling are the burst limit and rate limit.
Burst Limit vs Rate Limit
Burst Limit
- Definition: The burst limit is the maximum number of requests allowed to be processed in a short duration of time. It allows temporary traffic spikes to be handled smoothly without immediate throttling.
- Working: The burst limit acts like a surge protector. During a sudden spike, as long as the burst capacity is not exhausted, requests are processed without delay.
- Example: Consider an API with a burst limit of 100 requests. If 100 requests reach the API in a single burst, all are processed. If the number exceeds 100, the extra requests are throttled (i.e., rejected or queued for later processing).
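The burst example above can be sketched in a few lines. This is only an illustration: the function name `admit_burst` is hypothetical, and it assumes the burst capacity is fully available and that no capacity is restored while the batch arrives.

```python
def admit_burst(incoming: int, burst_limit: int) -> tuple[int, int]:
    """Split a simultaneous batch of requests into (accepted, throttled) counts.

    Assumes the full burst capacity is available when the batch arrives.
    """
    accepted = min(incoming, burst_limit)
    return accepted, incoming - accepted


print(admit_burst(100, 100))  # (100, 0): the whole burst fits
print(admit_burst(130, 100))  # (100, 30): 30 requests are throttled
```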
Rate Limit
- Definition: The rate limit is the maximum average number of requests that can be processed over a longer interval. It provides a sustainable rate of requests that the API can handle continuously.
- Working: Rate limits work over a defined time period. They ensure that the demand on the API does not exceed what the backend resources can sustain over the long term.
- Example: If an API has a rate limit of 10 requests per second, it can handle 10 requests consistently every second. Traffic that stays above this average rate is throttled once any available burst capacity has been used up.
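One simple way to enforce a per-second limit like the one in the example is a fixed-window counter. The sketch below is a simplification for illustration, not necessarily how any particular gateway works; the class name `FixedWindowLimiter` is hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
class FixedWindowLimiter:
    """Allow at most `limit` requests in each window of `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float = 1.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self.current_window = None  # index of the window being counted
        self.count = 0

    def allow(self, now: float) -> bool:
        window = int(now // self.window_seconds)
        if window != self.current_window:
            # A new window has started, so the counter resets.
            self.current_window = window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


limiter = FixedWindowLimiter(limit=10)
# 15 requests arrive within the first second; only 10 are allowed.
first_second = [limiter.allow(0.05 * i) for i in range(15)]
print(sum(first_second))   # 10
print(limiter.allow(1.0))  # True: a new window has begun
```

A fixed window is easy to reason about but permits short spikes at window boundaries, which is one reason a separate burst control is useful.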
Technical Explanation
The implementation of these limits typically involves a token bucket algorithm, wherein:
- Token Bucket Logic:
  - Each request consumes a token.
  - Tokens are refilled at a defined rate (e.g., 10 tokens/second for a 10 requests/second rate limit).
  - The bucket can hold a predefined maximum number of tokens, representing the burst limit.
- Handling Excess: When a request arrives and the bucket has no token for it, the request is queued, delayed, or rejected outright, depending on the gateway's configuration.
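The token bucket logic described above can be sketched as follows. This is a minimal single-threaded illustration under stated assumptions: the class name `TokenBucket` and its parameters are hypothetical, the bucket starts full, excess requests are simply rejected (no queuing), and time is passed in as a number of seconds rather than read from a clock.

```python
class TokenBucket:
    """Token bucket: `rate` tokens refill per second, up to `burst` tokens."""

    def __init__(self, rate: float, burst: int, now: float = 0.0):
        self.rate = rate              # sustained rate limit (tokens/second)
        self.capacity = float(burst)  # burst limit (maximum tokens held)
        self.tokens = float(burst)    # assumption: the bucket starts full
        self.last = now               # time of the last refill

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        if now > self.last:
            refill = (now - self.last) * self.rate
            self.tokens = min(self.capacity, self.tokens + refill)
            self.last = now
        # Each admitted request consumes one token.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # no token available: the request is throttled


bucket = TokenBucket(rate=10, burst=100)
# 150 requests arrive at once: the first 100 drain the bucket.
print(sum(bucket.allow(0.0) for _ in range(150)))  # 100
# One second later, 10 tokens have been refilled.
print(sum(bucket.allow(1.0) for _ in range(20)))   # 10
```

The two numbers play different roles: `burst` decides how large a spike is absorbed, while `rate` decides how quickly the bucket recovers afterwards.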
Detailed Examples
Scenario 1: E-Commerce API
In an e-commerce platform:
- Burst Usage: During flash sales, traffic can spike significantly. A burst limit of 1000 requests can accommodate such surges.
- Sustained Rate: Over a normal day, a rate limit of 50 requests per second ensures steady processing without risking system overload.
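To get a feel for how these two numbers interact, the sketch below estimates how long a full bucket lasts when traffic arrives faster than the refill rate. It treats requests as a continuous flow, which is an approximation, and the arrival rate of 250 requests per second is an assumed figure chosen for illustration, not one taken from the scenario.

```python
def seconds_until_throttled(burst: int, rate: float, arrival_rate: float) -> float:
    """Approximate time before a full bucket empties under steady traffic.

    The bucket drains at (arrival_rate - rate) tokens per second.
    """
    if arrival_rate <= rate:
        return float("inf")  # refill keeps up, so the bucket never empties
    return burst / (arrival_rate - rate)


# Burst limit 1000, rate limit 50 requests/second, assumed spike of 250 requests/second.
print(seconds_until_throttled(1000, 50, 250))  # 5.0
print(seconds_until_throttled(1000, 50, 40))   # inf
```

Under these assumptions the burst capacity buys roughly five seconds of headroom, after which traffic is held to the sustained rate.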
Scenario 2: Social Media API
For a social media application:
- Handling Viral Content: A burst limit absorbs the sudden surge of requests when content goes viral and is shared widely.
- Daily Usage: A lower rate limit maintains a consistent user experience throughout regular usage patterns.
Comparison Table
| Feature | Burst Limit | Rate Limit |
| --- | --- | --- |
| Purpose | Manage sudden spikes in traffic | Regulate steady flow of requests |
| Duration | Short-term | Long-term |
| Request Handling | All requests in burst are processed if within limit | Only a certain number of requests are allowed per time interval |
| Typical Use Case | Handling flash sales or viral posts | Ensuring fair usage and protecting resources |
| Mechanism | Token bucket capacity | Token replenishment over time |
Additional Considerations
Throttling Strategies
- Hard vs Soft Throttling:
  - Hard throttling strictly enforces limits.
  - Soft throttling provides some leniency, often warning users or allowing slight excesses before taking action.
- Queuing Mechanisms: Some gateways can queue excess requests for later processing when immediate rejection isn't suitable.
- Graceful Degradation: Modifying responses (e.g., offering cached data) instead of flat-out denying requests can enhance user experience.
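The strategies above can be combined in a single decision function. The sketch below is illustrative only: the names `Decision` and `decide`, the `soft_margin` parameter, and the warning texts are all hypothetical. It uses HTTP status 429 (Too Many Requests) for outright rejection, which is the standard status code for this case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    status: int
    body: str
    warning: Optional[str] = None


def decide(used: int, limit: int, soft_margin: int, cached: Optional[str]) -> Decision:
    """Choose a response given `used` requests in the current window (this one included)."""
    if used <= limit:
        return Decision(200, "fresh response")
    if used <= limit + soft_margin:
        # Soft throttling: tolerate a slight excess, but warn the caller.
        return Decision(200, "fresh response", warning="rate limit exceeded")
    if cached is not None:
        # Graceful degradation: serve cached data instead of rejecting.
        return Decision(200, cached, warning="served from cache due to throttling")
    # Hard throttling: reject outright.
    return Decision(429, "Too Many Requests")


print(decide(10, 10, 2, None).status)           # 200
print(decide(12, 10, 2, None).warning)          # rate limit exceeded
print(decide(13, 10, 2, "cached listing").body) # cached listing
print(decide(13, 10, 2, None).status)           # 429
```

Setting `soft_margin` to 0 and passing no cached value reduces this to plain hard throttling.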
Monitoring and Alerts
Regular monitoring and alert systems should be implemented to notify administrators about potential throttling issues. Detailed logging helps in identifying patterns that could lead to service degradation.
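As one possible shape for such an alert, the sketch below tracks the share of recently throttled requests and flags when it crosses a threshold. The class name `ThrottleMonitor`, the window size, and the 20% threshold are assumptions chosen for illustration; a production setup would more likely rely on the metrics and alarms of the gateway or monitoring platform in use.

```python
from collections import deque


class ThrottleMonitor:
    """Track the throttled share of the most recent `window` requests."""

    def __init__(self, window: int = 100, alert_ratio: float = 0.2):
        self.outcomes = deque(maxlen=window)  # True means the request was throttled
        self.alert_ratio = alert_ratio

    def record(self, throttled: bool) -> None:
        self.outcomes.append(throttled)

    def throttled_ratio(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self) -> bool:
        return self.throttled_ratio() > self.alert_ratio


monitor = ThrottleMonitor(window=10, alert_ratio=0.2)
for throttled in [False] * 8 + [True] * 2:
    monitor.record(throttled)
print(monitor.should_alert())  # False: 2 of 10 is not above the threshold
monitor.record(True)           # the oldest outcome drops out of the window
print(monitor.should_alert())  # True: 3 of the last 10 were throttled
```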
Conclusion
Throttling through burst and rate limits is an essential part of API management, providing a balanced approach to handling traffic loads. By understanding and effectively implementing these controls, APIs can deliver reliable performance and maintain robust security against abusive patterns of traffic.
With both limits tuned to expected traffic, businesses can provide a smooth user experience even under varying demand.

