AWS ECS Fargate Container Healthcheck command

Introduction

In ECS Fargate, a container health check is just a command that runs inside the container and exits with success or failure. The hard part is choosing a check that proves the app is genuinely ready without creating flapping deployments or false alarms.

How Fargate Evaluates a Health Check

The health check configured in an ECS task definition is separate from an Application Load Balancer health check. ECS runs the command inside the container, and the command must exit with code 0 for healthy or a nonzero code for unhealthy.

The main settings are:

command: the actual process ECS runs
interval: how often to run it
timeout: how long to wait before marking the attempt as failed
retries: how many consecutive failures make the container unhealthy
startPeriod: how long to ignore early failures during startup

Because ECS sees only the exit code of that command, the check should test application readiness, not just process existence. A process can be alive while the app is still starting, blocked on dependencies, or unable to serve traffic.
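To make the interaction between retries and startPeriod concrete, here is a simplified model of how a sequence of probe results maps to a container status. It is an illustration of the settings described above, not ECS code: the function name containerHealth and the probe objects are hypothetical, and the model assumes that a success during the start period ends the grace period early.

```javascript
// Simplified model: maps a list of probe results to a container health status.
// probes: [{ atSeconds, exitCode }] in chronological order (hypothetical shape).
function containerHealth(probes, { retries, startPeriod }) {
  let consecutiveFailures = 0;
  let status = "UNKNOWN"; // no probe has produced a verdict yet
  let inGracePeriod = startPeriod > 0;

  for (const { atSeconds, exitCode } of probes) {
    if (atSeconds >= startPeriod) inGracePeriod = false;

    if (exitCode === 0) {
      // Any success resets the failure streak and ends the grace period.
      consecutiveFailures = 0;
      status = "HEALTHY";
      inGracePeriod = false;
      continue;
    }

    // Failures during the grace period do not count toward retries.
    if (inGracePeriod) continue;

    consecutiveFailures += 1;
    if (consecutiveFailures >= retries) return "UNHEALTHY";
  }
  return status;
}
```

In this model, a container that fails twice while booting and then passes is reported as HEALTHY, while three consecutive failures after the start period produce UNHEALTHY.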

Use a Simple Command That Runs Inside the Image

The most reliable health check uses a local endpoint or command that your image already supports. For an HTTP app that exposes /health, a common pattern is:

json

{
  "healthCheck": {
    "command": [
      "CMD-SHELL",
      "curl -fsS http://127.0.0.1:8080/health || exit 1"
    ],
    "interval": 30,
    "timeout": 5,
    "retries": 3,
    "startPeriod": 20
  }
}

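CMD-SHELL is useful because it runs the string through the container's default shell, which lets you write a shell expression with || exit 1. If you use CMD instead, ECS executes the array directly: the first element is the program, the remaining elements are its arguments, and no shell is invoked.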
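The difference between the two forms is easier to see side by side. The sketch below writes both command arrays as JavaScript values and uses a hypothetical helper, describeCommand, to show how each one is interpreted. The helper is not an ECS API, and it assumes the container's default shell is /bin/sh.

```javascript
// Shell form: the single string is handed to the container's default shell,
// so operators such as || work.
const shellForm = ["CMD-SHELL", "curl -fsS http://127.0.0.1:8080/health || exit 1"];

// Exec form: the program and each argument are separate array elements.
// No shell runs, so "||" here would reach curl as a literal argument.
const execForm = ["CMD", "curl", "-fsS", "http://127.0.0.1:8080/health"];

// Illustrative helper (hypothetical): describes what would be executed.
function describeCommand([mode, ...rest]) {
  if (mode === "CMD-SHELL") {
    // Assumption: the default shell is /bin/sh, invoked with -c.
    return { shell: true, program: "/bin/sh", args: ["-c", rest.join(" ")] };
  }
  if (mode === "CMD") {
    return { shell: false, program: rest[0], args: rest.slice(1) };
  }
  throw new Error(`unsupported health check mode: ${mode}`);
}

console.log(describeCommand(shellForm));
console.log(describeCommand(execForm));
```

The exec form avoids a dependency on a shell being present in the image, at the cost of not being able to chain commands.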

A local loopback address is usually the right target because the command runs inside the same container. You do not need the task IP or service DNS name just to check whether the current container is ready.

Pick a Health Endpoint Deliberately

Not every endpoint makes a good health check. An endpoint that hits a slow database query or calls an external API can make the service look unhealthy for reasons that have little to do with basic readiness.

A practical /health endpoint should verify only the dependencies that must be available before the container can safely receive traffic. For example, an API that cannot function without its database should probably include a database probe. A service that can degrade gracefully might expose a lighter readiness check instead.

Here is a minimal Express example:

javascript

import express from "express";

const app = express();

// Responds immediately without touching any dependency.
app.get("/health", async (_req, res) => {
  res.status(200).json({ status: "ok" });
});

// The port must match the one used in the health check command.
app.listen(8080, () => {
  console.log("listening on 8080");
});

This kind of endpoint is fast, deterministic, and cheap to run on every probe interval.
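For a service that cannot function without a dependency such as its database, the handler can run a small set of required checks under a deadline, so a slow dependency fails the probe instead of hanging it. The sketch below is framework-agnostic. The names readiness and withTimeout are hypothetical, as is the pingDatabase function mentioned in the closing comment.

```javascript
// Rejects if the given promise does not settle within ms milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// requiredChecks: { name: () => Promise } for dependencies the app cannot run without.
async function readiness(requiredChecks, timeoutMs = 1000) {
  const failures = [];
  await Promise.all(
    Object.entries(requiredChecks).map(async ([name, check]) => {
      try {
        await withTimeout(check(), timeoutMs);
      } catch (err) {
        failures.push(name); // a throw, rejection, or timeout all count as failure
      }
    })
  );
  return failures.length === 0
    ? { httpStatus: 200, body: { status: "ok" } }
    : { httpStatus: 503, body: { status: "unavailable", failed: failures.sort() } };
}

// Possible wiring inside an Express handler (pingDatabase is hypothetical):
//   const result = await readiness({ database: pingDatabase });
//   res.status(result.httpStatus).json(result.body);
```

Because curl -f treats a 503 response as a failure, the health check command shown earlier would exit nonzero whenever a required check fails. Optional dependencies are deliberately left out of requiredChecks.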

Tune Startup and Failure Sensitivity

A healthy command can still cause restarts if the timings are too aggressive. Slow boot times are especially common on cold starts, JIT-heavy runtimes, or apps that run migrations during startup.

If the app needs time to warm up, raise startPeriod instead of making the command less meaningful. That preserves a good health check while still giving the service a realistic startup window.

A sensible default is to start with:

interval around 20 to 30 seconds
timeout around 3 to 5 seconds
retries around 3
startPeriod based on actual cold-start time

Then adjust based on observed deployment behavior rather than guessing.
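Before deploying a set of timings, it can help to sanity-check them and estimate how long a failure could go unnoticed. The sketch below is a rough aid, not an official tool. The ranges reflect the limits in the ECS documentation as best understood here and should be verified against the current API reference. The detection estimate assumes each failed attempt waits a full interval and then runs until it times out.

```javascript
// Assumed ECS limits (seconds, except retries); verify against the API reference.
const LIMITS = {
  interval: [5, 300],
  timeout: [2, 60],
  retries: [1, 10],
  startPeriod: [0, 300],
};

// Returns a list of human-readable problems; an empty list means the values are in range.
function validateHealthCheck(settings) {
  const problems = [];
  for (const [name, [min, max]] of Object.entries(LIMITS)) {
    const value = settings[name];
    if (!Number.isInteger(value) || value < min || value > max) {
      problems.push(`${name} must be an integer from ${min} to ${max}`);
    }
  }
  return problems;
}

// Rough upper bound on the time from a failure to an UNHEALTHY status
// once the start period is over.
function worstCaseDetectionSeconds({ interval, timeout, retries }) {
  return retries * (interval + timeout);
}

const settings = { interval: 30, timeout: 5, retries: 3, startPeriod: 20 };
console.log(validateHealthCheck(settings));       // []
console.log(worstCaseDetectionSeconds(settings)); // 105
```

With the values from the task definition above, a broken container could take close to two minutes to be marked unhealthy under these assumptions. Lowering interval shortens that window but makes the check more sensitive to brief stalls.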

Know the Difference Between ECS and Load Balancer Checks

If you use an Application Load Balancer, the load balancer may also probe the service. The two checks serve related but different purposes:

ECS container health decides whether the container itself is healthy
Load balancer health decides whether the target should receive traffic

Often you want both, but they should not contradict each other. If the ECS check is very strict and the load balancer check is very lenient, or the reverse, diagnosis becomes confusing during incidents.

Common Pitfalls

One common mistake is writing a command that depends on tools the image does not contain. Minimal images often do not have curl or wget, so verify the command actually exists in the container.
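For a Node.js image that ships without curl, one option is to let the runtime itself perform the probe. The sketch below assumes a hypothetical file named healthcheck.js that receives the URL as its first command-line argument. It uses only the built-in http module, so it handles plain http URLs only.

```javascript
// healthcheck.js (hypothetical file name)
const http = require("node:http");

// Resolves to true for a 2xx response. Resolves to false on any error,
// or if the socket stays idle for longer than timeoutMs.
function probe(url, timeoutMs = 2000) {
  return new Promise((resolve) => {
    // agent: false gives each probe its own short-lived connection.
    const req = http.get(url, { timeout: timeoutMs, agent: false }, (res) => {
      res.resume(); // discard the body so the socket is released
      resolve(res.statusCode >= 200 && res.statusCode < 300);
    });
    req.on("timeout", () => {
      resolve(false);
      req.destroy(); // the timeout event alone does not abort the request
    });
    req.on("error", () => resolve(false));
  });
}

// When an http URL is passed on the command line, run one probe
// and translate the result into the exit code ECS expects.
const target = process.argv[2] || "";
if (/^http:\/\//.test(target)) {
  probe(target).then((healthy) => process.exit(healthy ? 0 : 1));
}
```

The matching health check command would then be something like ["CMD", "node", "healthcheck.js", "http://127.0.0.1:8080/health"], with the script path adjusted to wherever the file lives in the image. The script timeout should stay below the timeout configured in the task definition.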

Another pitfall is probing an external host or a service DNS name instead of the local process. That turns a container health check into a network topology check and creates avoidable noise.

It is also easy to make the health endpoint too expensive. If every probe performs real business work or waits on optional dependencies, the service may flap under load.

Finally, do not confuse container health with load balancer health. They complement each other, but they are not the same signal and should be tuned with different goals in mind.

Summary

ECS Fargate health checks are commands that run inside the container and must exit with the correct status code.
A fast local readiness probe such as curl against 127.0.0.1 is usually the best starting point.
Use startPeriod to handle slow startup instead of weakening the check itself.
Keep the health endpoint lightweight and focused on real readiness.
Tune ECS and load balancer health checks together so they produce consistent operational signals.