How to get the difference between two different Prometheus metrics?
When working with Prometheus, a powerful open-source monitoring and alerting toolkit, one might face scenarios where it is necessary to calculate the difference between two metrics. This operation is common in system monitoring to track variations such as the increase or decrease in memory usage, rate of error messages, or changes in network traffic over time. Understanding how to accurately compute these differences can offer deep insights into system performance and health.
Understanding Prometheus Metrics
Prometheus collects and stores its metrics as time series data, where each time series is identified by a metric name and a set of key-value labels. Prometheus client libraries offer four core metric types:
- Counter: A cumulative metric whose value only increases, or resets to zero when the process restarts.
- Gauge: A metric that represents a single numerical value that can arbitrarily go up or down.
- Histogram: A metric that samples observations and counts them in configurable buckets, along with a total count and a sum of the observed values.
- Summary: Similar to a histogram in that it exposes a total count and a sum of observed values, but it calculates configurable quantiles on the client side instead of bucket counts.
Basic Operations on Metrics
Prometheus supports binary arithmetic operators (+, -, *, /, %, ^) as well as comparison operators. Subtraction between two instant vectors is the operation that yields the difference between two metrics.
Step-by-Step Process to Find the Difference Between Two Metrics
1. Identifying the Metrics
First, identify the metrics you wish to compare. For the sake of an example, let's say we have two gauge metrics: gauge_metric_one and gauge_metric_two.
2. Writing the Query
Using the Prometheus Query Language (PromQL), you can subtract one metric from another directly with the binary - operator.
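As a minimal sketch with the two example gauge names from step 1 (both names are placeholders, not metrics exported by any real target):

```promql
# Subtract the second gauge from the first; series are paired by identical label sets
gauge_metric_one - gauge_metric_two
```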
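The subtraction returns a result only for pairs of series whose label sets match exactly; a series with no partner on the other side is dropped from the output. If the two metrics differ in some labels and you want to compare them anyway, use the ignoring keyword to exclude those labels from matching.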
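A sketch of the ignoring form, assuming the two series differ only in their job label (metric_one and metric_two are placeholder names):

```promql
# Pair series on every label except job, then subtract
metric_one - ignoring(job) metric_two
```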
With ignoring(job), the two metrics are subtracted with series paired on every label except job, and the job label is dropped from the result.
3. Visualizing the Data
After executing the query in Prometheus's expression browser or Grafana, you will see the resulting differences as a new time series graph.
Advanced Usage: Using rate() Function
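Counters only increase, or reset to zero when the process restarts, so subtracting their raw values is rarely meaningful. Instead, take the rate of change of each counter with rate(), which also accounts for resets, and subtract the resulting rates.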
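A sketch of that approach, assuming metric_one and metric_two are placeholder counter names and that a 5-minute window covers several scrape intervals:

```promql
# Per-second rate of each counter over the last 5 minutes, then the difference
rate(metric_one[5m]) - rate(metric_two[5m])
```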
This calculates the per-second average rate of increase of the counters over the last 5 minutes for both metrics, then finds the difference.
Example Use Cases
- Resource Utilization: Compare memory_used and memory_free to understand memory saturation.
- Traffic Analysis: Subtract incoming_traffic from outgoing_traffic to monitor net network traffic.
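These two use cases might be sketched as follows. The metric names are the illustrative ones from the list, not names exported by a real exporter. The sketch assumes that memory_used and memory_free are gauges in the same unit and that the traffic metrics are counters. Each expression is a separate query.

```promql
# Memory saturation: fraction of memory currently in use
memory_used / (memory_used + memory_free)

# Net traffic: outgoing minus incoming, as per-second rates over 5 minutes
rate(outgoing_traffic[5m]) - rate(incoming_traffic[5m])
```

If the traffic metrics were gauges rather than counters, a plain outgoing_traffic - incoming_traffic would be the closer fit.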
Summary Table
| Operation | Description | Example Query |
| --- | --- | --- |
| Direct subtraction | Subtract one gauge from another | gauge_metric_one - gauge_metric_two |
| Rate difference | Difference of rates for counters | rate(metric_one[5m]) - rate(metric_two[5m]) |
| Ignoring labels | Subtract without considering specific labels | metric_one - ignoring(job) metric_two |
Conclusion
Calculating the difference between two Prometheus metrics is a basic building block for monitoring and alerting. Use direct subtraction for gauges, subtract rate() results for counters, and add ignoring() when the label sets of the two metrics differ. Which approach fits depends on the metric type and on what the resulting value is meant to show.

