
How do you get the difference between two different Prometheus metrics?


When working with Prometheus, an open-source monitoring and alerting toolkit, you will often need to calculate the difference between two metrics. The operation is common in system monitoring, for example to compare memory usage, error rates, or network traffic between two series. Computing these differences correctly gives a clearer picture of system performance and health.

Understanding Prometheus Metrics

Prometheus collects and stores its metrics as time series data. Each time series is identified by a metric name and a set of key-value labels. The Prometheus client libraries offer four core metric types:

  • Counter: A cumulative metric whose value only increases, or resets to zero when the process restarts.
  • Gauge: A metric that represents a single numerical value that can arbitrarily go up or down.
  • Histogram: A metric that samples observations and counts them in configurable buckets; it also exposes the total count and the sum of observed values.
  • Summary: Similar to a histogram in that it exposes the total count and the sum of observed values, but it calculates configurable quantiles on the client side instead of bucket counts.

Basic Operations on Metrics

Prometheus supports binary operators, including basic arithmetic (addition, subtraction, multiplication, division) and comparison operators. Subtraction is the one used to find the difference between two metrics.

Step-by-Step Process to Find the Difference Between Two Metrics

1. Identifying the Metrics

First, identify the metrics you wish to compare. For the sake of an example, let's say we have two gauge metrics: gauge_metric_one and gauge_metric_two.

2. Writing the Query

Using Prometheus Query Language (PromQL), we can directly subtract one metric from another. The query would look like:

plaintext
gauge_metric_one - gauge_metric_two

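This query returns the difference for every pair of series whose label sets match exactly (the metric name itself is not part of the match). Series with no matching partner on the other side are dropped from the result. If the two metrics differ in a label that should not take part in the match, use the ignoring keyword to exclude it: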

plaintext
gauge_metric_one - ignoring(job) gauge_metric_two

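This subtracts the two metrics while leaving the job label out of the matching. The alternative on(...) keyword does the opposite and lists the only labels to match on.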
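
To make the matching rule concrete, here is a minimal Python sketch that models one-to-one vector matching for a subtraction. It is an illustration only and not how Prometheus is implemented; the helper names (match_key, subtract) and the sample label values are hypothetical.

```python
# Simplified model of `left - ignoring(...) right` on two instant vectors.
# Each vector is a list of (labels, value) pairs. Hypothetical helper names.

def match_key(labels, ignoring=()):
    """Matching signature: every label except the ignored ones."""
    return frozenset((k, v) for k, v in labels.items() if k not in ignoring)

def subtract(left, right, ignoring=()):
    """Subtract values of series whose signatures are equal."""
    # Assumes at most one right-hand series per signature; Prometheus itself
    # rejects duplicates unless group_left/group_right is used.
    right_by_key = {match_key(labels, ignoring): value for labels, value in right}
    result = []
    for labels, value in left:
        key = match_key(labels, ignoring)
        if key in right_by_key:  # series without a partner are dropped
            result.append((dict(key), value - right_by_key[key]))
    return result

gauge_metric_one = [({"instance": "a:9100", "job": "node"}, 10.0),
                    ({"instance": "b:9100", "job": "node"}, 7.0)]
gauge_metric_two = [({"instance": "a:9100", "job": "app"}, 4.0)]

# The job labels differ, so nothing matches exactly.
print(subtract(gauge_metric_one, gauge_metric_two))
# Ignoring job lets the two series for instance a:9100 pair up.
print(subtract(gauge_metric_one, gauge_metric_two, ignoring=("job",)))
```

The first call returns an empty result, which mirrors the common surprise of a subtraction query that shows no data because one label differs between the two metrics.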

3. Visualizing the Data

After executing the query in Prometheus's expression browser or Grafana, you will see the resulting differences as a new time series graph.
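
Outside the expression browser, the same query can be sent to the Prometheus HTTP API at /api/v1/query. The following is a rough sketch using only the Python standard library; the base URL http://localhost:9090 and the function names are assumptions for illustration, and the parser only handles instant-vector results.

```python
import json
import urllib.parse
import urllib.request

def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params

def parse_instant_vector(payload):
    """Turn an instant-vector response body into (labels, float) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    # Each sample value is a [timestamp, "value-as-string"] pair.
    return [(s["metric"], float(s["value"][1])) for s in payload["data"]["result"]]

def query_difference(base_url, promql):
    """Run the query against a live server (assumed reachable at base_url)."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_instant_vector(json.load(resp))

# Example call, assuming a local Prometheus server:
# query_difference("http://localhost:9090", "gauge_metric_one - gauge_metric_two")
```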

Advanced Usage: Using rate() Function

Counters only increase or reset, so their raw values mostly reflect how long each process has been running, and subtracting them directly is rarely meaningful. To compare two counters, you usually take the difference of their rates of change:

plaintext
rate(counter_metric_one[5m]) - rate(counter_metric_two[5m])

This calculates the per-second average rate of increase of the counters over the last 5 minutes for both metrics, then finds the difference.
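
The idea behind the rate difference can be sketched in Python as follows. This is a deliberately simplified model: it treats any drop in a counter as a reset to zero, as rate() does, but it skips the extrapolation to the window edges that the real function performs. The function names and sample data are hypothetical.

```python
def increase_with_resets(samples):
    """Total counter increase across (timestamp_seconds, value) samples."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A lower value means the counter restarted from zero.
        total += cur - prev if cur >= prev else cur
    return total

def simple_rate(samples):
    """Per-second average increase between the first and last sample."""
    elapsed = samples[-1][0] - samples[0][0]
    return increase_with_resets(samples) / elapsed

# Hypothetical samples scraped every 60 seconds; counter_one resets at t=120.
counter_one = [(0, 100), (60, 160), (120, 20), (180, 120)]
counter_two = [(0, 50), (60, 80), (120, 110), (180, 140)]

difference = simple_rate(counter_one) - simple_rate(counter_two)
print(difference)
```

Subtracting the raw last values here (120 - 140) would suggest counter_one is behind, while the rates show it is actually growing twice as fast.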

Example Use Cases

  • Resource Utilization: Compare memory_used and memory_free to understand memory saturation.
  • Traffic Analysis: Subtract incoming_traffic from outgoing_traffic to monitor the net traffic flow.

Summary Table

| Operation | Description | Example Query |
|---|---|---|
| Direct subtraction | Subtract one gauge from another | gauge_metric_one - gauge_metric_two |
| Rate difference | Difference of rates for counters | rate(metric_one[5m]) - rate(metric_two[5m]) |
| Ignoring labels | Subtract without considering specific labels | metric_one - ignoring(job) metric_two |

Conclusion

Calculating the difference between two Prometheus metrics can provide essential insights and is foundational for effective monitoring and alerting. By utilizing PromQL effectively, one can adapt to numerous scenarios to maintain robust observability infrastructure. Whether comparing simple gauges or analyzing the rate of counters, each method offers unique advantages depending on the type of data and the desired outcome.

