AWS Lambda TooManyRequestsException Rate Exceeded

Introduction

TooManyRequestsException from AWS Lambda is a throttling signal. Lambda is telling you that it cannot accept the invocation right now. The cause is one of three things: a concurrency limit has been reached, traffic is rising faster than Lambda can add execution environments, or a function-level setting such as reserved concurrency caps the function.

The important point is that “rate exceeded” is rarely solved by changing one line of handler code. You need to determine whether the bottleneck is account concurrency, function concurrency, invocation burst rate, or an upstream system sending requests too aggressively.

What Triggers the Error

Lambda scales automatically, but not infinitely and not instantaneously. AWS documents several relevant limits:

  • Account or regional concurrency quota.
  • Reserved concurrency on a function.
  • The per-function scaling rate, which limits how quickly new execution environments can be added.
  • Invocation rate (requests per second) limits, which are tied to the function's available concurrency.

For synchronous invocations, once the function cannot obtain more concurrency quickly enough, additional requests can fail with HTTP 429 and TooManyRequestsException.

In SDKs and logs, you may also see more specific reasons such as account-level concurrency or function-level reserved-concurrency exhaustion.
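
When your SDK surfaces the throttle reason, it narrows the diagnosis quickly. The sketch below maps four reason strings to the limits described above. These strings are listed for TooManyRequestsException in the Lambda Invoke API reference. The hint text is this article's interpretation. How you pull the reason out of an error object depends on your SDK, so that part is left out.

```python
# Reason strings listed for TooManyRequestsException in the Lambda Invoke API
# reference (verify against current docs). The hint text is an interpretation.
REASON_HINTS = {
    "ConcurrentInvocationLimitExceeded": "account or regional concurrency quota reached",
    "ReservedFunctionConcurrentInvocationLimitExceeded": "function's reserved concurrency exhausted",
    "FunctionInvocationRateLimitExceeded": "function invocation rate limit hit",
    "ReservedFunctionInvocationRateLimitExceeded": "invocation rate limit for reserved concurrency hit",
}


def explain_throttle(reason):
    """Return a first diagnosis to investigate for a throttle reason string."""
    return REASON_HINTS.get(
        reason, "unrecognized reason: check Throttles and ConcurrentExecutions metrics"
    )


print(explain_throttle("ReservedFunctionConcurrentInvocationLimitExceeded"))
```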

First Things to Check

Start with CloudWatch metrics and Lambda configuration instead of guessing.

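```bash
# Reserved concurrency configured on the function, if any
aws lambda get-function-concurrency --function-name my-function

# Account-level concurrency limit and unreserved concurrency for the current Region
aws lambda get-account-settings

# Per-minute throttle counts for one function; adjust the window to the incident
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Throttles \
  --dimensions Name=FunctionName,Value=my-function \
  --start-time 2026-03-11T00:00:00Z \
  --end-time 2026-03-11T01:00:00Z \
  --period 60 \
  --statistics Sum
```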

Useful metrics include:

  • 'Throttles'
  • 'ConcurrentExecutions'
  • 'Duration'
  • 'Invocations'
  • 'Errors'

If concurrency is pegged near a known limit, the diagnosis is straightforward. If traffic spikes sharply while concurrency is still ramping, the issue may be Lambda scaling rate rather than a fixed quota ceiling.
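
That rule of thumb can be written as a small classifier over per-minute metric values. Typical inputs are the Maximum of ConcurrentExecutions and the Sum of Throttles. This is a heuristic sketch with three assumptions:

  • The 90% "near the limit" threshold is arbitrary.
  • `concurrency_limit` should be the function's reserved concurrency if one is set, otherwise the account limit.
  • The sample datapoints are hypothetical.

```python
def diagnose_minute(max_concurrency, throttles, concurrency_limit, near_ratio=0.9):
    """Classify one metric period. near_ratio is an arbitrary assumption, not an AWS value."""
    if throttles == 0:
        return "no throttling"
    if max_concurrency >= near_ratio * concurrency_limit:
        # Throttling while pegged at the ceiling: a quota or reservation problem.
        return "at concurrency limit"
    # Throttling well below the ceiling: likely a burst outpacing scaling.
    return "below limit: suspect scaling rate or burst"


# Hypothetical (max concurrency, throttles) datapoints, one per minute.
samples = [(120, 0), (410, 35), (995, 80)]
for peak, throttled in samples:
    print(peak, throttled, diagnose_minute(peak, throttled, concurrency_limit=1000))
```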

Typical Fixes

One category of fix is reducing the concurrency demand per request. Faster functions use concurrency for less time, which lowers the number of simultaneous execution environments needed.

Examples:

  • Reduce execution duration.
  • Move heavy setup work outside the handler where possible.
  • Increase memory if it materially reduces runtime.
  • Remove unnecessary VPC overhead or slow downstream calls.
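
Moving setup out of the handler works because module-level code runs once, when an execution environment initializes. Warm invocations then reuse the result. Below is a minimal sketch, assuming a Python runtime. `_expensive_setup` and the `threshold` config are hypothetical stand-ins for work such as building SDK clients or loading configuration.

```python
import json

INIT_CALLS = 0  # counts how often setup ran, to show it happens once per environment


def _expensive_setup():
    # Hypothetical stand-in for slow work (config loading, SDK clients, connection pools).
    global INIT_CALLS
    INIT_CALLS += 1
    return {"threshold": 10}


CONFIG = _expensive_setup()  # module scope: runs at init, reused by every warm invocation


def handler(event, context):
    # Only per-request work stays inside the handler.
    value = int(event.get("value", 0))
    return {"statusCode": 200, "body": json.dumps({"over_threshold": value > CONFIG["threshold"]})}
```

Calling `handler` repeatedly in the same process leaves `INIT_CALLS` at 1, which mirrors what a warm execution environment gives you. Cold starts still pay the setup cost. The change shortens warm durations; it does not eliminate init time.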

Another category is smoothing request arrival. If the workload can tolerate buffering, place SQS, EventBridge, or another decoupling layer in front of the function so traffic is absorbed and drained more evenly.
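
With SQS in front of the function, the event source mapping's maximum concurrency setting caps how many concurrent invocations the queue drives. Spikes then wait in the queue instead of turning into throttles. The sketch below builds the parameters for the CreateEventSourceMapping call. The queue ARN, batch size, and limit of 50 are hypothetical values. The boto3 call is shown as a comment rather than executed.

```python
def sqs_mapping_params(queue_arn, function_name, max_concurrency, batch_size=10):
    """Build kwargs for Lambda's CreateEventSourceMapping call."""
    if max_concurrency < 2:
        raise ValueError("SQS MaximumConcurrency must be at least 2")
    return {
        "EventSourceArn": queue_arn,
        "FunctionName": function_name,
        "BatchSize": batch_size,
        # Caps concurrent invocations driven by this queue; excess messages wait in SQS.
        "ScalingConfig": {"MaximumConcurrency": max_concurrency},
    }


params = sqs_mapping_params(
    "arn:aws:sqs:us-east-1:123456789012:orders-queue",  # hypothetical queue ARN
    "my-function",
    max_concurrency=50,
)
# import boto3
# boto3.client("lambda").create_event_source_mapping(**params)
```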

For direct synchronous APIs, implement retries with exponential backoff and jitter on the client side.

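```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttle (HTTP 429 / TooManyRequestsException)."""


def retry_delay(attempt, base=0.2, cap=30.0):
    # Exponential backoff with full jitter: a random delay in [0, min(cap, base * 2**attempt)].
    return random.uniform(0, min(cap, base * 2 ** attempt))


def invoke_with_retries(invoke, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return invoke()  # e.g. a callable that invokes Lambda synchronously
        except ThrottledError:
            # Retry only throttles; any other exception propagates immediately.
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttle to the caller
            time.sleep(retry_delay(attempt))
```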
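
To retry only throttles rather than every exception, the loop needs a way to recognize them. The sketch below is for boto3 callers. It assumes the error arrives as a botocore `ClientError`, whose `response` dictionary carries the error code under `["Error"]["Code"]`. The boto3 wiring is left as comments so the helper runs without AWS credentials.

```python
def is_lambda_throttle(exc):
    """True if exc looks like a botocore ClientError for a Lambda throttle."""
    # botocore's ClientError exposes the parsed error as exc.response["Error"]["Code"].
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code") == "TooManyRequestsException"


# Hypothetical wiring (not executed here):
# import boto3, botocore.exceptions
# try:
#     boto3.client("lambda").invoke(FunctionName="my-function", Payload=b"{}")
# except botocore.exceptions.ClientError as exc:
#     if not is_lambda_throttle(exc):
#         raise
#     # back off with jitter, then retry
```

boto3 also has built-in retry modes ("standard" and "adaptive"), configured through `botocore.config.Config`. Check what the SDK already retries before adding your own loop. Otherwise the two layers multiply the number of attempts.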

If a critical function is being starved by other functions in the same account, reserved concurrency can protect capacity for it. If the real issue is overall quota, request a concurrency increase through AWS Service Quotas.
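
Reserved concurrency comes out of the account pool. AWS documents that at least 100 units must stay unreserved for functions without a reservation. The sketch below is a quick arithmetic check to run before reserving. It assumes that documented minimum, which you should verify against current docs. The account limit and existing reservations in the example are hypothetical.

```python
# Minimum unreserved concurrency Lambda keeps per account and Region
# (per AWS docs at the time of writing; verify before relying on it).
MIN_UNRESERVED = 100


def max_new_reservation(account_limit, existing_reservations):
    """Largest reserved concurrency another function could be given."""
    unreserved = account_limit - sum(existing_reservations)
    return max(0, unreserved - MIN_UNRESERVED)


# Hypothetical account: limit 1000, two functions already reserve 200 and 150.
print(max_new_reservation(1000, [200, 150]))
```

The reservation itself is set through the Lambda PutFunctionConcurrency API. In boto3 that is `put_function_concurrency`; in the CLI it is `aws lambda put-function-concurrency`.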

Synchronous Versus Asynchronous Behavior

The impact differs by invocation type. With synchronous invocation, the caller sees the throttle immediately. With asynchronous invocation, Lambda queues the event internally and retries throttled attempts until the event exceeds its maximum age, which is six hours by default. This changes the failure mode but does not remove the underlying capacity limit.

That distinction matters when debugging. A frontend API may show direct 429 responses, while an event-driven pipeline may show growing lag, retries, or dead-letter routing instead.
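
On asynchronous paths, configuring where failed events go makes throttling visible instead of silent. The sketch below builds parameters for Lambda's PutFunctionEventInvokeConfig call. The ranges it checks (60 to 21,600 seconds of event age, 0 to 2 retry attempts) reflect AWS documentation at the time of writing. The function name and destination ARN are hypothetical.

```python
def async_invoke_config(function_name, on_failure_arn, max_event_age_s=3600, max_retries=2):
    """Build kwargs for Lambda's PutFunctionEventInvokeConfig call."""
    if not 60 <= max_event_age_s <= 21600:
        raise ValueError("MaximumEventAgeInSeconds must be between 60 and 21600")
    if not 0 <= max_retries <= 2:
        raise ValueError("MaximumRetryAttempts must be between 0 and 2")
    return {
        "FunctionName": function_name,
        # Throttled events are retried until they are older than this.
        "MaximumEventAgeInSeconds": max_event_age_s,
        # Retries after the function itself returns an error.
        "MaximumRetryAttempts": max_retries,
        # Events that expire or exhaust retries are sent here instead of being dropped.
        "DestinationConfig": {"OnFailure": {"Destination": on_failure_arn}},
    }


cfg = async_invoke_config(
    "my-function",
    "arn:aws:sqs:us-east-1:123456789012:my-function-failures",  # hypothetical ARN
)
# import boto3
# boto3.client("lambda").put_function_event_invoke_config(**cfg)
```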

Common Pitfalls

A common mistake is focusing only on requests per second. Concurrency is roughly the product of arrival rate and average execution duration, so a slow function can throttle under a moderate request rate.
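
Here is a worked example of that product, which is Little's law applied to Lambda. At 200 requests per second, a function that takes 250 ms per request needs 50 concurrent executions. The same traffic needs 600 if each request takes 3 seconds. The sketch assumes steady traffic and uses average duration. Bursts and tail latency push real concurrency higher.

```python
import math


def required_concurrency(requests_per_second, avg_duration_s):
    """Steady-state concurrency is about arrival rate x average duration (Little's law)."""
    return math.ceil(requests_per_second * avg_duration_s)


print(required_concurrency(200, 0.25))  # fast function
print(required_concurrency(200, 3.0))   # same traffic, slow function
```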

Another issue is setting reserved concurrency too low, then forgetting that the function can never scale beyond that number. The symptom looks like random throttling, but the limit is self-inflicted.

Developers also sometimes ignore client retry strategy. Replaying throttled requests immediately without backoff makes the spike worse.

Finally, do not rely on stale quota assumptions. Lambda quotas and scaling behavior change over time, so check them against the current AWS documentation and your account's actual settings.

Summary

  • TooManyRequestsException means Lambda is throttling invocation attempts.
  • Check concurrency, throttles, duration, and scaling behavior before changing code blindly.
  • Fixes include reducing execution time, smoothing request bursts, adding retries with backoff, or raising quotas.
  • Reserved concurrency can protect a function, but it can also become the cause of throttling.
  • The visible symptom differs between synchronous and asynchronous invocation paths.
