AWS Network ELB takes 4 minutes to recognise target as healthy
Introduction
A Network Load Balancer does not start routing traffic to a target the moment the process starts listening. It waits for the target group health checks to succeed often enough, and that timing is controlled by the target group's settings plus the application's own readiness time. If the delay looks like four minutes, the cause is usually configuration math rather than a mysterious AWS failure.
Health Checks Determine the Transition to Healthy
For an NLB target group, health status is based on repeated checks. A target becomes healthy only after enough consecutive successful checks have passed. The key variables are:
- the health check interval
- the healthy threshold count
- the actual startup time of the application
- the type of health check, such as TCP or HTTP
The core idea is simple: a target must be ready early enough and stay ready long enough to pass the threshold.
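The "consecutive successes" rule can be modelled in a few lines. This is a minimal sketch of the counting logic only, not of the NLB's actual implementation; the function name and the sample check sequence are made up for illustration.

```python
from typing import Iterable, Optional


def checks_until_healthy(results: Iterable[bool], healthy_threshold: int) -> Optional[int]:
    """Return the 1-based number of the check at which the target turns healthy.

    `results` holds the outcome of each successive health check (True = passed).
    Returns None if the threshold is never reached.
    """
    streak = 0
    for index, passed in enumerate(results, start=1):
        # A failed check resets the streak: successes must be consecutive.
        streak = streak + 1 if passed else 0
        if streak >= healthy_threshold:
            return index
    return None


# Two failed checks while the application warms up, then steady successes.
warmup_then_ready = [False, False, True, True, True, True, True]
print(checks_until_healthy(warmup_then_ready, healthy_threshold=3))  # 5
```

Multiplying the returned check number by the health check interval gives a rough upper bound on the time the target spent waiting to be marked healthy.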
Estimate the Delay with Simple Math
A rough estimate is:
- application startup or warm-up time
- plus HealthCheckIntervalSeconds * HealthyThresholdCount
- plus a small amount of propagation time
Suppose the application needs 60 seconds before it is actually ready. If the target group checks every 30 seconds and requires 5 consecutive successes, that alone adds about 150 seconds after readiness begins, for a total of roughly 210 seconds. Add registration and propagation delay, and the total can easily approach or exceed four minutes.
That is why people often see a stable multi-minute delay after deployment and assume something is wrong with the load balancer, when the target group settings already explain most of it.
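The estimate is simple enough to script. The sketch below implements the rough formula from this section; the function name and the `propagation_s` parameter are illustrative, and the result is an approximation, not a guarantee of how the load balancer will behave.

```python
def estimate_seconds_to_healthy(startup_s, interval_s, healthy_threshold, propagation_s=0.0):
    """Rough time-to-healthy: startup + interval * threshold + propagation."""
    return startup_s + interval_s * healthy_threshold + propagation_s


# The example from the text: 60 s startup, 30 s interval, 5 consecutive successes.
estimate = estimate_seconds_to_healthy(60, 30, 5)
print(f"{estimate:.0f} s (~{estimate / 60:.1f} min)")  # 210 s (~3.5 min)
```

Plugging in the values reported for your own target group shows quickly whether the observed delay is already explained by configuration.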
Inspect the Target Group Settings
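Use the AWS CLI to see the real values instead of guessing. The aws elbv2 describe-target-groups command reports the configured health check protocol, interval, timeout, and thresholds for a target group.
Then inspect target health with aws elbv2 describe-target-health, which reports each target's current state and, when the target is not healthy, a reason code.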
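The same two lookups can also be scripted. The sketch below assumes boto3 is installed and AWS credentials are configured; the helper names are hypothetical, and the dictionary keys follow the response shape of the ELBv2 describe_target_groups and describe_target_health calls.

```python
def summarize_health_check(target_group: dict) -> dict:
    """Pick the timing-related fields out of one TargetGroups entry."""
    interval = target_group["HealthCheckIntervalSeconds"]
    threshold = target_group["HealthyThresholdCount"]
    return {
        "protocol": target_group.get("HealthCheckProtocol"),
        "interval_s": interval,
        "healthy_threshold": threshold,
        # Time spent passing checks after the app is ready (rough estimate).
        "threshold_window_s": interval * threshold,
    }


def summarize_target_states(descriptions: list) -> dict:
    """Map target id -> (state, reason) from TargetHealthDescriptions."""
    return {
        d["Target"]["Id"]: (d["TargetHealth"]["State"], d["TargetHealth"].get("Reason"))
        for d in descriptions
    }


def fetch_summary(target_group_arn: str) -> tuple:
    """Call the ELBv2 API and summarize settings and per-target health."""
    import boto3  # imported lazily so the helpers above work without it

    elbv2 = boto3.client("elbv2")
    groups = elbv2.describe_target_groups(TargetGroupArns=[target_group_arn])
    health = elbv2.describe_target_health(TargetGroupArn=target_group_arn)
    return (
        summarize_health_check(groups["TargetGroups"][0]),
        summarize_target_states(health["TargetHealthDescriptions"]),
    )
```

Calling fetch_summary with your target group ARN returns the configured timing alongside each target's state, which makes it easy to compare the expected window with what you observe during a deployment.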
If you need faster recognition and your service can tolerate it, reduce the interval or lower the healthy threshold with aws elbv2 modify-target-group.
That change can cut time-to-healthy significantly, but it should be based on measured startup behavior, not optimism.
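As a sketch of what such a change looks like through boto3 (an assumption; the helper names and the ARN are placeholders, and the values 10 and 3 are purely illustrative), the request can be built and compared with the old settings before it is applied. Check the allowed ranges for your target group's protocol before using real values.

```python
def threshold_window_seconds(interval_s: int, healthy_threshold: int) -> int:
    """Approximate time spent passing checks once the app is ready."""
    return interval_s * healthy_threshold


def build_modify_request(target_group_arn: str, interval_s: int, healthy_threshold: int) -> dict:
    """Keyword arguments for the ELBv2 modify_target_group call."""
    return {
        "TargetGroupArn": target_group_arn,
        "HealthCheckIntervalSeconds": interval_s,
        "HealthyThresholdCount": healthy_threshold,
    }


def apply_settings(request: dict) -> dict:
    """Send the change to AWS; requires boto3 and credentials."""
    import boto3  # imported lazily so the helpers above work without it

    return boto3.client("elbv2").modify_target_group(**request)


# Placeholder ARN: substitute the ARN of your own target group.
request = build_modify_request("arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/NAME/ID", 10, 3)
saved = threshold_window_seconds(30, 5) - threshold_window_seconds(10, 3)
print(f"Estimated reduction in time-to-healthy: {saved} s")  # 120 s
```

Nothing is sent to AWS until apply_settings(request) is called, so the estimate can be reviewed first.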
Make the Health Check Reflect Real Readiness
A common problem is that the process opens the port before the application is actually ready to serve requests. Database migrations, cache warm-up, secret loading, or downstream dependency checks can all delay real readiness.
With a TCP health check, NLB only knows that the port accepted a connection. That may be too weak a signal. If the service supports HTTP health checks, use a readiness endpoint that returns success only when the service is genuinely ready.
For example, a containerized application might expose a dedicated readiness endpoint and return 200 only after critical dependencies are available. That gives the load balancer a better signal than a plain socket check.
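A minimal sketch of such an endpoint, using only the Python standard library, is shown here. The /ready path, the port, and the `ready` flag are hypothetical choices for illustration; a real service would set the flag from its own startup logic and would typically use its existing web framework.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set once migrations, cache warm-up, and dependency checks have finished.
ready = threading.Event()


class ReadinessHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/ready":  # hypothetical readiness path
            self.send_error(404)
            return
        # 503 while starting up, 200 only once the service is genuinely ready.
        status = 200 if ready.is_set() else 503
        body = b"ready\n" if status == 200 else b"starting\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep health check polling out of the request log


def serve(port: int = 8080, host: str = "0.0.0.0") -> HTTPServer:
    """Start the readiness server in a background thread and return it."""
    server = HTTPServer((host, port), ReadinessHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The application would call serve() early in startup and ready.set() once warm-up completes, with the target group's HTTP health check pointed at the same path and port.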
Faster Is Not Always Better
There is a tradeoff between quick registration and stable health decisions. If the interval is too short or the threshold too low, a noisy startup can produce flapping: healthy, then unhealthy, then healthy again. That is especially common when a service becomes reachable before it finishes warming up.
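The effect of thresholds on flapping can be illustrated with a small simulation. This is a simplified model with separate healthy and unhealthy thresholds, not the NLB's real state machine, and the noisy check sequence is invented for the example.

```python
def count_transitions(results, healthy_threshold, unhealthy_threshold):
    """Count healthy/unhealthy state changes for a sequence of check results."""
    healthy = False
    streak_ok = streak_fail = 0
    transitions = 0
    for passed in results:
        if passed:
            streak_ok += 1
            streak_fail = 0
        else:
            streak_fail += 1
            streak_ok = 0
        if not healthy and streak_ok >= healthy_threshold:
            healthy = True
            transitions += 1
        elif healthy and streak_fail >= unhealthy_threshold:
            healthy = False
            transitions += 1
    return transitions


# A service that answers intermittently while it is still warming up.
noisy_startup = [True, True, False, False, True, True, False, False,
                 True, True, True, True, True]
print(count_transitions(noisy_startup, healthy_threshold=2, unhealthy_threshold=2))  # 5
print(count_transitions(noisy_startup, healthy_threshold=5, unhealthy_threshold=2))  # 1
```

With the low threshold the target changes state five times during the same startup; with the higher threshold it changes state once, at the cost of a longer wait.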
A disciplined tuning process is:
- measure real startup time
- decide what “ready” should mean
- configure a health check that measures that state
- reduce interval and threshold only as far as stability allows
That approach is better than setting aggressive health-check numbers and hoping the service is ready in time.
Deployment Context Matters
If targets are attached through an Auto Scaling Group, ECS service, or Kubernetes integration, there may also be a delay between instance registration and the first meaningful health check. Distinguish between infrastructure registration time and application readiness time when debugging.
Logs from the service itself are often more informative than the NLB status alone. If the application does not start accepting traffic until minute three, the load balancer is probably behaving correctly.
Common Pitfalls
The most common mistake is blaming the NLB when the application is still warming up. The load balancer cannot mark a target healthy if the target is not actually ready.
Another mistake is ignoring the multiplication effect of interval and threshold. Even reasonable-looking settings can add up to minutes once you include startup time.
Teams also use a TCP health check when they really need an application-level readiness signal. A listening port is not always the same thing as a healthy service.
Summary
- NLB target health depends on repeated successful health checks, not just process startup.
- The total delay is usually startup time plus the health check interval multiplied by the healthy threshold.
- Inspect the target group configuration before assuming AWS is slow.
- Use a readiness-oriented health check when possible instead of a weak socket-only check.
- Tune interval and threshold carefully so faster registration does not create unstable target health.

