Monitor custom Kubernetes pod metrics using Prometheus

Introduction

Prometheus scrapes metrics from HTTP endpoints exposed by your application pods. To monitor custom metrics, your application must expose an HTTP endpoint (conventionally /metrics) in the Prometheus text exposition format, and Prometheus must be configured to discover and scrape that endpoint. The typical setup involves adding a Prometheus client library to your application, then enabling discovery either through prometheus.io/* pod annotations paired with a matching scrape configuration, or through a ServiceMonitor when you run the Prometheus Operator.
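
To make the text format concrete, here is a minimal sketch that renders one counter by hand using only the Python standard library. The helper name render_counter is hypothetical and the sketch skips label-value escaping; in a real service the client library produces this output for you.

```python
def render_counter(name, help_text, samples):
    """Render one counter family in the Prometheus text exposition format.

    `samples` maps a tuple of (label, value) pairs to the counter's value.
    Label values are assumed to need no escaping.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{key}="{val}"' for key, val in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


body = render_counter(
    "app_requests_total",
    "Total requests",
    {(("method", "POST"), ("endpoint", "/api/orders")): 3},
)
print(body)
```

Each sample is one line: the metric name, the label set in braces, then the current value. The HELP and TYPE comment lines describe the metric family.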

Step 1: Instrument Your Application

Add a Prometheus client library to expose custom metrics over HTTP.

Python (Flask)

python

from flask import Flask
from prometheus_client import Counter, Histogram, generate_latest, CONTENT_TYPE_LATEST

app = Flask(__name__)

# Define custom metrics
REQUEST_COUNT = Counter(
    'app_requests_total',
    'Total requests',
    ['method', 'endpoint', 'status']
)

REQUEST_LATENCY = Histogram(
    'app_request_duration_seconds',
    'Request latency in seconds',
    ['endpoint']
)

@app.route('/api/orders', methods=['POST'])
def create_order():
    # Time the handler body and record the duration in the histogram
    with REQUEST_LATENCY.labels(endpoint='/api/orders').time():
        # ... process order ...
        # Label the counter with the status code the handler returns
        REQUEST_COUNT.labels(method='POST', endpoint='/api/orders', status='201').inc()
        return '{"status": "created"}', 201

@app.route('/metrics')
def metrics():
    # Serialize all registered metrics in the Prometheus text format
    return generate_latest(), 200, {'Content-Type': CONTENT_TYPE_LATEST}

if __name__ == '__main__':
    # Listen on all interfaces so Prometheus can reach the pod on port 8080
    app.run(host='0.0.0.0', port=8080)

Go

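go

package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// requestCount counts handled requests by method, endpoint, and status code.
var requestCount = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "app_requests_total",
        Help: "Total requests",
    },
    []string{"method", "endpoint", "status"},
)

func init() {
    // Register with the default registry, which promhttp.Handler() serves.
    prometheus.MustRegister(requestCount)
}

func main() {
    http.Handle("/metrics", promhttp.Handler())
    http.HandleFunc("/api/orders", func(w http.ResponseWriter, r *http.Request) {
        // Accept only POST so the method label stays bounded.
        if r.Method != http.MethodPost {
            http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
            return
        }
        // Label the counter with the status code that is actually returned.
        requestCount.WithLabelValues(http.MethodPost, "/api/orders", "201").Inc()
        w.WriteHeader(http.StatusCreated)
    })
    // ListenAndServe only returns on failure, so surface the error.
    log.Fatal(http.ListenAndServe(":8080", nil))
}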

Step 2: Expose the Metrics Port in Kubernetes

yaml

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  labels:
    app: order-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: order-service
          image: order-service:latest
          ports:
            # The application and /metrics share one server, so one port entry suffices
            - name: http
              containerPort: 8080

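The prometheus.io/* annotations are a widely used convention rather than a built-in Prometheus feature: they take effect only when Prometheus runs a scrape configuration that reads them, such as the one in Step 3, Option A.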

Step 3: Configure Prometheus to Scrape Pods

Option A: Annotations-Based Discovery (prometheus.yml)

yaml

# prometheus.yml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only scrape pods with prometheus.io/scrape=true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Use custom port from annotation, keeping the pod IP from __address__
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      # Use custom path from annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
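
The port-rewrite step is easiest to check in isolation. The sketch below assumes a rule of the form widely used with the prometheus.io/* convention: it reads __address__ and the port annotation (Prometheus joins source_labels values with ";"), matches the regex ([^:]+)(?::\d+)?;(\d+), and writes $1:$2 back to __address__. The function name rewrite_address is hypothetical, and Python's re module stands in for Prometheus's RE2 engine.

```python
import re

# Prometheus anchors relabel regexes at both ends, hence re.fullmatch below.
ADDRESS_RULE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def rewrite_address(address, port_annotation):
    """Return the scrape address after applying the port annotation, if any."""
    joined = f"{address};{port_annotation}"
    match = ADDRESS_RULE.fullmatch(joined)
    if match is None:
        # No match (for example, no port annotation): the label is left unchanged.
        return address
    return match.expand(r"\1:\2")


print(rewrite_address("10.0.0.5:9090", "8080"))  # discovered port replaced
print(rewrite_address("10.0.0.5", "8080"))       # port appended
print(rewrite_address("10.0.0.5:9090", ""))      # no annotation, unchanged
```

The key point is that the pod IP must survive the rewrite. A rule that reads only the port annotation and writes it to __address__ would leave Prometheus with a bare port and no host to scrape.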

Option B: ServiceMonitor (Prometheus Operator)

If you use the Prometheus Operator (kube-prometheus-stack), create a ServiceMonitor:

yaml

# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: order-service
  labels:
    app: order-service
spec:
  selector:
    app: order-service
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080
---
# servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: order-service
  labels:
    release: prometheus  # Must match Prometheus operator's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: order-service
  endpoints:
    - port: metrics
      path: /metrics
      interval: 15s
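
Two label matches have to succeed before anything is scraped: Prometheus must select the ServiceMonitor, and the ServiceMonitor must select the Service. The sketch below models only matchLabels semantics (every listed key/value pair must be present); matchExpressions and namespace selectors are left out, and the serviceMonitorSelector value is an assumption about how the Prometheus resource is configured.

```python
def matches(match_labels, labels):
    """True when every key/value pair in match_labels is present in labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def is_scraped(prometheus_selector, service_monitor, service_labels):
    """Both hops must match: Prometheus -> ServiceMonitor -> Service."""
    return matches(prometheus_selector, service_monitor["labels"]) and matches(
        service_monitor["selector"], service_labels
    )


# Assumed serviceMonitorSelector on the Prometheus resource.
prometheus_selector = {"release": "prometheus"}
service_monitor = {
    "labels": {"release": "prometheus"},
    "selector": {"app": "order-service"},
}
service_labels = {"app": "order-service"}

print(is_scraped(prometheus_selector, service_monitor, service_labels))

# Dropping the release label from the ServiceMonitor breaks the first hop.
unlabeled_monitor = {"labels": {}, "selector": {"app": "order-service"}}
print(is_scraped(prometheus_selector, unlabeled_monitor, service_labels))
```

When a target is missing, checking the two hops separately usually shows which label is wrong.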

Step 4: Install Prometheus with Helm

bash

# Add the Prometheus community Helm chart
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install kube-prometheus-stack (includes Prometheus, Grafana, Alertmanager)
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace

# Verify
kubectl get pods -n monitoring

Step 5: Query Custom Metrics

Access the Prometheus UI and query your custom metrics:

promql

# Total requests per endpoint
sum by (endpoint) (app_requests_total)

# Per-second request rate, averaged over the last 5 minutes
rate(app_requests_total[5m])

# 95th percentile latency per endpoint
histogram_quantile(0.95, sum by (le, endpoint) (rate(app_request_duration_seconds_bucket[5m])))

# Error rate (fraction of requests with a 5xx status)
sum(rate(app_requests_total{status=~"5.."}[5m]))
/
sum(rate(app_requests_total[5m]))
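
The percentile query returns an estimate, not an exact value, because a histogram stores only cumulative bucket counts. The sketch below is a simplified model of how histogram_quantile interpolates linearly inside the bucket that contains the target rank. It assumes classic buckets sorted by upper bound, positive bounds, a final +Inf bucket, and 0 < q <= 1. The bucket values are made up for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # The rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower, below = (0.0, 0) if i == 0 else buckets[i - 1]
    # Assume observations are spread evenly within the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Illustrative cumulative counts for buckets le=0.1, 0.5, 1.0, +Inf.
buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
p95 = histogram_quantile(0.95, buckets)
print(round(p95, 3))
```

With these counts the 95th-percentile rank lands halfway through the 0.5 to 1.0 bucket, so the estimate is about 0.75 seconds. Bucket boundaries therefore limit the accuracy of the result, which is worth remembering when choosing them.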

Step 6: Create Grafana Dashboards and Alerts

yaml

# alerting rule
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: order-service-alerts
  labels:
    release: prometheus
spec:
  groups:
    - name: order-service
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate(app_requests_total{status=~"5.."}[5m]))
            /
            sum(rate(app_requests_total[5m]))
            > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High error rate on order-service"
            description: "Error rate is above 5% for 5 minutes"
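
The for: clause means the alert does not fire on the first bad evaluation. It stays pending until the expression has held continuously for the full duration. The sketch below models that state machine under one assumption not stated in the rule: a 1-minute evaluation interval, so for: 5m spans five intervals. The function name alert_states and the sample ratios are illustrative.

```python
def alert_states(ratios, threshold=0.05, for_evaluations=5):
    """Return the alert state after each rule evaluation.

    ratios: error ratio observed at each evaluation (one per interval).
    for_evaluations: number of intervals covered by the `for:` duration.
    """
    states = []
    breaching = 0  # consecutive evaluations above the threshold
    for ratio in ratios:
        breaching = breaching + 1 if ratio > threshold else 0
        if breaching == 0:
            states.append("inactive")
        elif breaching > for_evaluations:
            # The condition has now held for the whole `for:` duration.
            states.append("firing")
        else:
            states.append("pending")
    return states


# A short spike that recovers, followed by a sustained breach.
ratios = [0.02, 0.08, 0.09, 0.02] + [0.07] * 6
states = alert_states(ratios)
print(states)
```

The short spike never gets past pending, while the sustained breach fires on its sixth consecutive evaluation, five minutes after it first became active. This is why a for: duration filters out brief blips at the cost of slower notification.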

Common Pitfalls

Missing prometheus.io/scrape: "true" annotation: With annotation-based discovery (Option A), Prometheus drops every pod that lacks this annotation, which is a common reason custom metrics never appear. The annotations are only a convention honored by the relabel rules in prometheus.yml; a Prometheus Operator installation that relies on ServiceMonitor resources does not act on them unless an equivalent scrape configuration is added.
ServiceMonitor label mismatch: The ServiceMonitor's labels must match the serviceMonitorSelector of the Prometheus resource managed by the Prometheus Operator. If that selector is release: prometheus, your ServiceMonitor must carry that label. Check with kubectl get prometheus -n monitoring -o yaml.
Metrics endpoint returning wrong format: Prometheus expects the Prometheus text exposition format or OpenMetrics. Returning JSON or other formats causes scrape failures. Use the official client libraries, which handle formatting automatically.
High cardinality labels: Every distinct combination of label values is stored as its own time series, so labels with many unique values (user IDs, request IDs, timestamps) can create millions of series and exhaust Prometheus's memory. Keep label cardinality low: restrict labels to small, bounded sets of values, record per-request values such as durations in histograms rather than in labels, and aggregate at query time.
Scrape interval too aggressive: Scraping every 1-2 seconds multiplies the number of stored samples and the load on both Prometheus and the application. Prometheus defaults to a 1-minute scrape interval, and 15-30 seconds is a common choice for most applications. Go lower only for truly real-time requirements.
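
The cardinality and scrape-interval pitfalls both come down to simple arithmetic. The sketch below uses made-up label counts to show how one unbounded label multiplies the series count, and how the scrape interval multiplies the samples stored per day. The product is an upper bound, since not every label combination necessarily occurs.

```python
from math import prod


def series_count(distinct_values_per_label):
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(distinct_values_per_label.values())


def samples_per_day(series, scrape_interval_seconds):
    """Samples ingested per day for a given number of series."""
    return series * (86_400 // scrape_interval_seconds)


# Illustrative counts of distinct values for each label.
low = series_count({"method": 4, "endpoint": 20, "status": 6})
high = series_count({"method": 4, "endpoint": 20, "status": 6, "user_id": 100_000})

print(low, high)
print(samples_per_day(low, 15), samples_per_day(low, 1))
```

With these numbers, adding a user_id label turns 480 series into 48 million, and dropping the interval from 15 seconds to 1 second raises daily samples from about 2.8 million to about 41.5 million for the same 480 series.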

Summary

Instrument your application with a Prometheus client library to expose a /metrics endpoint
Add prometheus.io/scrape, prometheus.io/port, and prometheus.io/path annotations to pod templates
Use ServiceMonitor with the Prometheus Operator for declarative scrape configuration
Install kube-prometheus-stack via Helm for a complete monitoring setup (Prometheus + Grafana + Alertmanager)
Query custom metrics with PromQL: rate(), histogram_quantile(), and aggregation functions
Keep label cardinality low to avoid Prometheus performance issues