Avoiding INSUFFICIENT_DATA in CloudWatch

Introduction

A CloudWatch alarm enters INSUFFICIENT_DATA when it does not have enough datapoints to decide whether the metric is in OK or ALARM. That state is not always a problem, but it often means the alarm configuration does not match how the metric is actually emitted.

Why Alarms Enter INSUFFICIENT_DATA

CloudWatch alarms evaluate a metric over a period and a number of datapoints. If those datapoints are missing, delayed, or filtered by the wrong dimensions, the alarm has nothing solid to evaluate. Typical causes are:

  • the application publishes metrics only occasionally
  • the alarm period is shorter than the publish interval
  • the namespace, metric name, statistic, or dimensions are wrong
  • the metric stops during idle time, deployment windows, or scaling events

The first step is to look at the raw metric in CloudWatch, not just the alarm state. If the graph itself is sparse, the alarm is probably configured more aggressively than the data stream supports.
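One way to quantify "sparse" is to measure the gaps between datapoints. The sketch below assumes you already have a list of datapoint timestamps, for example the `Timestamp` fields returned by `get_metric_statistics`; the helper names and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta
from statistics import median


def publish_gaps(timestamps):
    """Return the gaps, in seconds, between consecutive datapoint timestamps."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


def summarize_cadence(timestamps):
    """Summarize how often a metric actually arrives."""
    gaps = publish_gaps(timestamps)
    if not gaps:
        # Zero or one datapoint: there is no interval to measure.
        return {"datapoints": len(timestamps), "median_gap": None, "max_gap": None}
    return {
        "datapoints": len(timestamps),
        "median_gap": median(gaps),
        "max_gap": max(gaps),
    }


# Hand-written sample: one datapoint every five minutes, with one missed publish.
start = datetime(2024, 1, 1, 12, 0)
sample = [start + timedelta(minutes=m) for m in (0, 5, 10, 20, 25)]
summary = summarize_cadence(sample)
print(summary)
```

If the median gap is larger than the alarm period, or the maximum gap spans several evaluation periods, the alarm is likely asking for data the stream does not provide.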

Align the Alarm With Metric Frequency

If your service emits one datapoint every five minutes, a one-minute alarm period will predictably spend time in INSUFFICIENT_DATA. Match the alarm period to the publication cadence.

For a custom metric sent every minute, a configuration like this is reasonable:

bash
aws cloudwatch put-metric-alarm \
  --alarm-name api-error-rate \
  --namespace MyApp \
  --metric-name ErrorCount \
  --statistic Sum \
  --period 60 \
  --evaluation-periods 3 \
  --datapoints-to-alarm 2 \
  --threshold 5 \
  --comparison-operator GreaterThanThreshold \
  --treat-missing-data notBreaching

This tells CloudWatch to evaluate three one-minute periods and trigger if two of them cross the threshold. The important part is that the metric is expected to arrive every minute. If it arrives every five minutes, increase the period.
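As a rough illustration of "increase the period", the helper below rounds a publish interval up to the next whole minute. It is a sketch under a simplifying assumption: it only produces multiples of 60 seconds and ignores the shorter periods available for high-resolution metrics. The function name is hypothetical.

```python
import math


def aligned_period(publish_interval_seconds):
    """Round a publish interval up to the nearest multiple of 60 seconds."""
    if publish_interval_seconds <= 0:
        raise ValueError("publish interval must be positive")
    # Never go below one minute; otherwise round up to a whole minute.
    return max(60, math.ceil(publish_interval_seconds / 60) * 60)


# A metric published every five minutes gets a 300-second alarm period.
period_for_five_minute_metric = aligned_period(300)
# An irregular 90-second publisher is rounded up rather than down.
period_for_ninety_second_metric = aligned_period(90)
```

Rounding up rather than down means each alarm period should contain at least one datapoint when the publisher is on schedule.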

Choose the Right Missing-Data Behavior

Many alarms do not need missing data to mean failure. CloudWatch lets you control this with treat-missing-data. The common options are:

  • notBreaching: missing data is treated as healthy
  • breaching: missing data is treated as unhealthy
  • ignore: keep the existing alarm state
  • missing: leave the alarm eligible for INSUFFICIENT_DATA

For event-driven metrics, notBreaching is often the best default. Suppose a Lambda function emits ErrorCount only when it runs. During quiet periods, the absence of data should not create alarm noise.

For heartbeat metrics, the opposite may be true. If a service is supposed to publish 1 every minute, then missing data may indicate an outage, so breaching can be appropriate.
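The heartbeat case could be expressed as the following parameter set for `put_metric_alarm`. This is a sketch, not a recommended configuration: the alarm name, the `Heartbeat` metric name, and the three-period window are assumptions chosen for illustration.

```python
def heartbeat_alarm_params(namespace="MyApp", metric_name="Heartbeat"):
    """Build keyword arguments for an alarm that fires when a heartbeat goes quiet."""
    return {
        "AlarmName": "myapp-heartbeat-missing",  # hypothetical alarm name
        "Namespace": namespace,
        "MetricName": metric_name,
        "Statistic": "Sum",
        "Period": 60,  # the heartbeat is expected once per minute
        "EvaluationPeriods": 3,
        "DatapointsToAlarm": 3,  # require three bad minutes in a row
        "Threshold": 1,
        "ComparisonOperator": "LessThanThreshold",
        # Missing datapoints count against the service instead of
        # leaving the alarm in INSUFFICIENT_DATA.
        "TreatMissingData": "breaching",
    }


params = heartbeat_alarm_params()
# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Requiring several consecutive missing minutes is a design choice that trades detection speed for fewer false alarms from a single delayed publish.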

Emit Heartbeat Metrics for Quiet Systems

A common design mistake is alarming on a metric that exists only when something bad happens. That can work, but it also produces sparse series. A more robust design is to emit a steady heartbeat or request count alongside the error metric.

python
import time
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Publish one heartbeat datapoint per minute for as long as the process runs.
while True:
    cloudwatch.put_metric_data(
        Namespace="MyApp",
        MetricData=[
            {
                "MetricName": "Heartbeat",
                "Value": 1,
                "Unit": "Count",
            }
        ],
    )
    # Sleep for the publish interval; the alarm period should match it.
    time.sleep(60)

With a heartbeat metric, you can create one alarm for availability of data and another for business-specific failures. That separates “service is silent” from “service is unhealthy.”
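To make the distinction concrete, here is a minimal sketch that combines the two alarm states into one diagnosis. It assumes the heartbeat alarm treats missing data as breaching, so silence shows up as ALARM; the function name and the returned labels are hypothetical.

```python
def diagnose(heartbeat_state, error_state):
    """Combine a heartbeat alarm state and an error alarm state into one label.

    Both arguments are CloudWatch alarm state values:
    "OK", "ALARM", or "INSUFFICIENT_DATA".
    """
    if heartbeat_state == "ALARM":
        # No heartbeat: the error alarm cannot be trusted either way.
        return "silent"
    if error_state == "ALARM":
        return "unhealthy"
    if "INSUFFICIENT_DATA" in (heartbeat_state, error_state):
        # Typically a new alarm that has not collected data yet.
        return "unknown"
    return "healthy"


status = diagnose("OK", "ALARM")
print(status)
```

Checking the heartbeat first reflects the idea that a quiet error metric only means "no errors" when the service is known to be reporting at all.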

Use Math and Composite Alarms Carefully

Metric math and composite alarms can reduce noise, but they also increase the chances of missing data when one input series is sparse. If a math expression depends on two metrics and one stops publishing, the expression may produce no result for that period.

Before using math alarms, verify that all input metrics share the same period and dimensions. In practice, simpler alarms are easier to trust and troubleshoot.
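The sketch below builds a `Metrics` list of the shape accepted by `put_metric_alarm` for a metric math alarm, and checks that every input uses the same period. It uses the metric math `FILL` function to substitute zero when the error series has no datapoint. `RequestCount` and the helper names are assumptions for illustration, and whether zero is the right fill value depends on what a missing datapoint means for your metric.

```python
def metric_query(query_id, metric_name, period, stat="Sum", namespace="MyApp"):
    """Build one input metric entry for a metric math alarm."""
    return {
        "Id": query_id,
        "MetricStat": {
            "Metric": {"Namespace": namespace, "MetricName": metric_name},
            "Period": period,
            "Stat": stat,
        },
        "ReturnData": False,  # inputs feed the expression; they are not alarmed on
    }


def shared_period(queries):
    """Return the single period used by all metric inputs, or raise."""
    periods = {q["MetricStat"]["Period"] for q in queries if "MetricStat" in q}
    if len(periods) != 1:
        raise ValueError(f"inputs use different periods: {sorted(periods)}")
    return periods.pop()


queries = [
    metric_query("errors", "ErrorCount", 60),
    metric_query("requests", "RequestCount", 60),  # hypothetical metric
    {
        "Id": "error_rate",
        # Treat a missing ErrorCount datapoint as zero errors.
        "Expression": "100 * FILL(errors, 0) / requests",
        "Label": "Error rate (%)",
        "ReturnData": True,
    },
]

period = shared_period(queries)
```

Note that the expression still yields nothing when `requests` itself is missing, which is why a steady request or heartbeat metric matters for math alarms too.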

Common Pitfalls

One pitfall is choosing the wrong statistic. For a count metric, Average can hide the actual behavior while Sum reflects total events per period. A poorly chosen statistic can make the graph look flat or misleading even when the data is arriving.
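A small worked example shows the difference. Assume the application publishes the value 1 for every error, and compare a quiet period with a busy one; the sample values are invented.

```python
from statistics import mean

# Each entry is one published datapoint with value 1 (one error event).
quiet_period = [1]
busy_period = [1, 1, 1, 1, 1, 1, 1]

# Sum grows with the number of events in the period.
quiet_sum = sum(quiet_period)
busy_sum = sum(busy_period)

# Average is the same in both periods, so it cannot tell them apart.
quiet_avg = mean(quiet_period)
busy_avg = mean(busy_period)

print(f"Sum: quiet={quiet_sum}, busy={busy_sum}")
print(f"Average: quiet={quiet_avg}, busy={busy_avg}")
```

An alarm on Average with a threshold of 5 would never fire here, while the same threshold on Sum would catch the busy period.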

Another issue is incorrect dimensions. If the application publishes Service=Billing but the alarm filters on Service=Payments, the alarm will never see the datapoints you expect.
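A mismatch like this can be checked mechanically. The sketch below compares the dimensions configured on an alarm with the dimension sets actually published, using the `Name`/`Value` dictionary shape that CloudWatch uses for dimensions (for example in `list_metrics` results). The helper names and sample data are hypothetical.

```python
def as_dimension_set(dimensions):
    """Convert a list of {"Name": ..., "Value": ...} dicts to a comparable set."""
    return frozenset((d["Name"], d["Value"]) for d in dimensions)


def has_matching_series(alarm_dimensions, published_dimension_lists):
    """Return True if some published series has exactly the alarm's dimensions."""
    wanted = as_dimension_set(alarm_dimensions)
    return any(as_dimension_set(p) == wanted for p in published_dimension_lists)


# What the application publishes, e.g. gathered from list_metrics.
published = [
    [{"Name": "Service", "Value": "Billing"}],
]

# What the alarm was configured to watch.
alarm_dims = [{"Name": "Service", "Value": "Payments"}]

matches = has_matching_series(alarm_dims, published)
print("alarm matches a published series:", matches)
```

The comparison is exact on purpose: CloudWatch treats each unique combination of dimensions as a separate metric, so a subset or superset of the published dimensions does not match.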

Teams also sometimes assume INSUFFICIENT_DATA is always bad. For a newly created alarm or a resource that has not yet emitted data, that state can be normal. The problem is persistent insufficiency, not a short startup window.
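To separate a startup window from persistent insufficiency, one option is to filter on how long an alarm has been in that state. The sketch below works on dictionaries shaped like the `MetricAlarms` entries from `describe_alarms` (`AlarmName`, `StateValue`, `StateUpdatedTimestamp`); the 15-minute grace period and the sample alarms are assumptions.

```python
from datetime import datetime, timedelta, timezone


def persistently_insufficient(alarms, now, grace=timedelta(minutes=15)):
    """Return names of alarms stuck in INSUFFICIENT_DATA longer than `grace`."""
    return [
        alarm["AlarmName"]
        for alarm in alarms
        if alarm["StateValue"] == "INSUFFICIENT_DATA"
        and now - alarm["StateUpdatedTimestamp"] > grace
    ]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sample_alarms = [
    # Created two minutes ago: still inside the startup window.
    {"AlarmName": "new-alarm", "StateValue": "INSUFFICIENT_DATA",
     "StateUpdatedTimestamp": now - timedelta(minutes=2)},
    # Stuck for six hours: worth investigating.
    {"AlarmName": "stale-alarm", "StateValue": "INSUFFICIENT_DATA",
     "StateUpdatedTimestamp": now - timedelta(hours=6)},
    {"AlarmName": "healthy-alarm", "StateValue": "OK",
     "StateUpdatedTimestamp": now - timedelta(days=1)},
]

stuck = persistently_insufficient(sample_alarms, now)
print(stuck)
```

A sensible grace period is a few multiples of the alarm's period times its evaluation periods, so a slow metric is not flagged prematurely.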

Summary

  • INSUFFICIENT_DATA usually means the alarm cadence does not match the metric stream.
  • Set alarm periods that match how often the metric is actually published.
  • Use treat-missing-data deliberately instead of accepting the default behavior.
  • Emit heartbeat metrics when you need continuous visibility from quiet services.
  • Check raw metrics, dimensions, and statistics before blaming the alarm itself.
