AWS Network ELB takes 4 minutes to recognise target as healthy

Introduction

A Network Load Balancer does not start routing traffic to a target the moment the process starts listening. It waits for the target group health checks to succeed often enough, and that timing is controlled by the target group's settings plus the application's own readiness time. If the delay looks like four minutes, the cause is usually configuration math rather than a mysterious AWS failure.

Health Checks Determine the Transition to Healthy

For an NLB target group, health status is based on repeated checks. A target becomes healthy only after enough consecutive successful checks have passed. The key variables are:

the health check interval
the healthy threshold count
the actual startup time of the application
the type of health check, such as TCP or HTTP

The core idea is simple: a target must be ready early enough and stay ready long enough to pass the threshold.

Estimate the Delay with Simple Math

A rough estimate is:

application startup or warm-up time
plus HealthCheckIntervalSeconds * HealthyThresholdCount
plus a small amount of propagation time

Suppose the application needs 60 seconds before it is actually ready. If the target group checks every 30 seconds and requires 5 consecutive successes, that alone adds about 150 seconds after readiness begins, for roughly 210 seconds in total. Once registration and propagation time are added, the delay approaches four minutes.

That is why people often see a stable multi-minute delay after deployment and assume something is wrong with the load balancer, when the target group settings already explain most of it.
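The estimate above can be written as a few lines of shell arithmetic. This is only a sketch of the rough formula: the function name `estimate_time_to_healthy` is made up for illustration, and the 10-second propagation allowance is an assumption, not an AWS-documented figure.

```bash
# Rough time-to-healthy estimate, in seconds.
# Arguments: startup seconds, health check interval, healthy threshold,
# and an optional propagation allowance (assumed to be 10 seconds here).
estimate_time_to_healthy() {
  local startup=$1 interval=$2 threshold=$3 propagation=${4:-10}
  echo $(( startup + interval * threshold + propagation ))
}

# The example above: 60 s startup, 30 s interval, 5 consecutive successes.
estimate_time_to_healthy 60 30 5   # prints 220

# The same service with a 10 s interval and 3 consecutive successes.
estimate_time_to_healthy 60 10 3   # prints 100
```

Plugging in the values from your own target group shows quickly whether the observed delay is explained by configuration or whether something else is adding time.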

Inspect the Target Group Settings

Use the AWS CLI to see the real values instead of guessing:

bash

aws elbv2 describe-target-groups \
  --target-group-arns arn:aws:elasticloadbalancing:region:account:targetgroup/example/1234567890abcdef

Then inspect target health:

bash

aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/example/1234567890abcdef
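The full JSON output of these commands is verbose. As a convenience, the sketch below wraps them in two helper functions (the function names are hypothetical) that use the CLI's `--query` option to show only the health check fields and the per-target state. It assumes the AWS CLI is installed and configured with credentials that can read the target group.

```bash
# Show only the health check settings that drive time-to-healthy.
show_health_check_settings() {
  local tg_arn=$1
  aws elbv2 describe-target-groups \
    --target-group-arns "$tg_arn" \
    --query 'TargetGroups[0].{Protocol:HealthCheckProtocol,Interval:HealthCheckIntervalSeconds,Timeout:HealthCheckTimeoutSeconds,Healthy:HealthyThresholdCount,Unhealthy:UnhealthyThresholdCount}' \
    --output table
}

# Show each target's current state and, if present, the reason code.
show_target_states() {
  local tg_arn=$1
  aws elbv2 describe-target-health \
    --target-group-arn "$tg_arn" \
    --query 'TargetHealthDescriptions[].{Id:Target.Id,Port:Target.Port,State:TargetHealth.State,Reason:TargetHealth.Reason}' \
    --output table
}

# Usage (substitute your own target group ARN):
#   show_health_check_settings "$TG_ARN"
#   show_target_states "$TG_ARN"
```

A target that sits in the `initial` state for minutes is still accumulating successful checks, which usually points back to the interval and threshold values rather than to a failure.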

If you need faster recognition and your service can tolerate it, reduce the interval or lower the healthy threshold.

bash

aws elbv2 modify-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/example/1234567890abcdef \
  --health-check-interval-seconds 10 \
  --healthy-threshold-count 3

That change can cut time-to-healthy significantly, but it should be based on measured startup behavior, not optimism.

Make the Health Check Reflect Real Readiness

A common problem is that the process opens the port before the application is actually ready to serve requests. Database migrations, cache warm-up, secret loading, or downstream dependency checks can all delay real readiness.

With a TCP health check, NLB only knows that the port accepted a connection. That may be too weak a signal. If the service supports HTTP health checks, use a readiness endpoint that returns success only when the service is genuinely ready.

For example, a containerized application might expose:

text

GET /health/ready

and return 200 only after critical dependencies are available. That gives the load balancer a better signal than a plain socket check.
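Before pointing the target group at such an endpoint, it helps to probe it by hand and confirm it behaves as intended during startup. The following is a minimal sketch using `curl`; the function name `probe_ready` and the address in the usage comment are placeholders, and `/health/ready` is simply the example path from above.

```bash
# Return success only if the readiness endpoint answers with HTTP 200.
# curl prints 000 as the status code when the connection itself fails,
# so an unreachable service is also reported as not ready.
probe_ready() {
  local url=$1 code
  code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url")
  [ "$code" = "200" ]
}

# Usage (hypothetical private address and port):
#   if probe_ready "http://10.0.1.25:8080/health/ready"; then
#     echo "ready"
#   else
#     echo "not ready yet"
#   fi
```

Running this in a loop while the service boots shows how long the port is open before the application reports itself ready, which is exactly the gap a TCP check cannot see.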

Faster Is Not Always Better

There is a tradeoff between quick registration and stable health decisions. If the interval is too short or the threshold too low, a noisy startup can produce flapping: healthy, then unhealthy, then healthy again. That is especially common when a service becomes reachable before it finishes warming up.

A disciplined tuning process is:

measure real startup time
decide what “ready” should mean
configure a health check that measures that state
reduce interval and threshold only as far as stability allows

That approach is better than setting aggressive health-check numbers and hoping the service is ready in time.
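For the measurement step, one option is to time how long a freshly registered target takes to reach the healthy state, using the AWS CLI's `target-in-service` waiter. The sketch below is an assumption-laden helper, not a precise benchmark: the function name and the instance ID in the usage comment are hypothetical, and because the waiter polls periodically rather than continuously, the result is only accurate to within one polling interval.

```bash
# Print the number of seconds until the given target is reported healthy.
# Returns non-zero if the waiter gives up or the CLI call fails.
time_until_healthy() {
  local tg_arn=$1 target_id=$2 start
  start=$(date +%s)
  aws elbv2 wait target-in-service \
    --target-group-arn "$tg_arn" \
    --targets "Id=$target_id" || return 1
  echo $(( $(date +%s) - start ))
}

# Usage (run right after registering the target; the ID is a placeholder):
#   time_until_healthy "$TG_ARN" i-0123456789abcdef0
```

Comparing this number with the application's own startup logs separates the time spent warming up from the time spent waiting on health check thresholds.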

Deployment Context Matters

If targets are attached through an Auto Scaling Group, ECS service, or Kubernetes integration, there may also be a delay between instance registration and the first meaningful health check. Distinguish between infrastructure registration time and application readiness time when debugging.

Logs from the service itself are often more informative than the NLB status alone. If the application does not start accepting traffic until minute three, the load balancer is probably behaving correctly.

Common Pitfalls

The most common mistake is blaming the NLB when the application is still warming up. The load balancer cannot mark a target healthy if the target is not actually ready.

Another mistake is ignoring the multiplication effect of interval and threshold. Even reasonable-looking settings can add up to minutes once you include startup time.

Teams also use a TCP health check when they really need an application-level readiness signal. A listening port is not always the same thing as a healthy service.

Summary

NLB target health depends on repeated successful health checks, not just process startup.
The total delay is usually startup time plus the health check interval multiplied by the healthy threshold.
Inspect the target group configuration before assuming AWS is slow.
Use a readiness-oriented health check when possible instead of a weak socket-only check.
Tune interval and threshold carefully so faster registration does not create unstable target health.