How to Get the Number of Running Pods in Prometheus
Introduction
Counting running pods in Prometheus is a common Kubernetes monitoring task for capacity planning and alerting. The most reliable source is usually kube-state-metrics, which exposes pod phase metrics. This guide shows practical PromQL queries for running pod counts at cluster, namespace, and workload scope.
Base Metric for Pod Phase
A standard metric is kube_pod_status_phase with labels including namespace, pod, and phase.
To count running pods cluster-wide:
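A minimal sketch of that query, assuming kube-state-metrics is deployed and scraped under its default metric names:

```promql
# Total number of pods currently in the Running phase across the cluster
sum(kube_pod_status_phase{phase="Running"})
```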
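This works because kube-state-metrics emits one series per pod and phase, with value 1 for the pod's current phase and 0 for every other phase, so summing the Running series yields the number of running pods.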
Count Running Pods by Namespace
Break down counts per namespace for operational dashboards.
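One way to write the per-namespace breakdown is to keep the namespace label in the aggregation:

```promql
# Running pods grouped by namespace (one result series per namespace)
sum by (namespace) (kube_pod_status_phase{phase="Running"})
```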
This query is useful for multi-team clusters where ownership follows namespace boundaries.
Count Running Pods for Specific Workloads
Join against the kube_pod_labels metric when pod labels are exported, or fall back to a pod naming convention when they are not.
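Both approaches are sketched below as two separate queries. The label `label_app="checkout"`, the namespace `shop`, and the pod name prefix are hypothetical. The sketch also assumes that `kube_pod_labels` exposes the `app` label, which in recent kube-state-metrics versions typically requires allow-listing it.

```promql
# Option 1: join pod phase with pod labels on (namespace, pod).
# kube_pod_labels has value 1, so multiplying keeps the 0/1 phase value
# and drops pods that do not carry the (hypothetical) app label.
sum(
  kube_pod_status_phase{phase="Running"}
  * on (namespace, pod) group_left (label_app)
  kube_pod_labels{label_app="checkout"}
)

# Option 2: rely on a pod naming convention instead of labels.
sum(kube_pod_status_phase{phase="Running", namespace="shop", pod=~"checkout-.*"})
```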
Adjust label keys to match your cluster labeling standards.
Exclude Completed Job Pods
If you want only active service pods, filter strictly on the Running phase as above. That filter already excludes pods in the Succeeded and Failed phases, such as completed Job pods. For broader availability views, pair the running count with Pending and Failed queries.
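A companion query for those other phases might look like this, using the standard phase values reported by kube-state-metrics:

```promql
# Pods that are Pending or Failed, grouped by namespace and phase
sum by (namespace, phase) (kube_pod_status_phase{phase=~"Pending|Failed"})
```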
This gives context around rollout health beyond running count.
Use in Grafana Panels
In Grafana, create:
- Stat panel (called Singlestat in older Grafana versions) for total running pods.
- Time series panel for namespace trends.
- Alert panel for sudden drops in critical namespaces.
Use dashboard variables for namespace or app labels to make panels reusable across environments.
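As an illustration, a panel query could reference a dashboard variable like this. The variable name `$namespace` is an assumption and must match a variable defined in your dashboard; the `=~` matcher lets it work with multi-value selections.

```promql
# Running pods for the namespace(s) selected in the dashboard variable
sum by (namespace) (kube_pod_status_phase{phase="Running", namespace=~"$namespace"})
```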
Alert Rule Example
Define an alert that fires when the number of running pods drops below an expected threshold.
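The expression below is a sketch of what could go in the `expr` field of a Prometheus alerting rule, combined with a `for` clause such as `for: 5m` so the condition must hold before the alert fires. The namespace `payments` and the threshold of 3 are hypothetical.

```promql
# True while fewer than 3 pods are Running in the (hypothetical) payments namespace
sum(kube_pod_status_phase{phase="Running", namespace="payments"}) < 3
```

Note that `sum()` over no series returns an empty result rather than 0, so this expression stays silent if the namespace has no pod series at all.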
Tune the threshold and the for duration to each workload's sensitivity.
Compare Desired Versus Running Pods
Running count alone does not show whether workloads are underprovisioned. Pair running pods with desired replica metrics for better alerting.
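One way to express the gap uses the Deployment-level metrics from kube-state-metrics rather than the pod phase metric. This is a sketch and assumes the workloads are Deployments; other controllers expose their own replica metrics.

```promql
# Desired replicas minus currently available replicas, per Deployment
kube_deployment_spec_replicas - kube_deployment_status_replicas_available
```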
If this gap is positive for sustained periods, rollout or scheduling issues may exist.
Track Historical Trends
Use range queries to monitor pod count behavior over time.
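For example, a subquery can summarize how the running count behaved over a window. The one-day window and five-minute resolution here are arbitrary choices for illustration.

```promql
# Lowest cluster-wide running pod count over the past day,
# evaluated at 5-minute steps
min_over_time(
  (sum(kube_pod_status_phase{phase="Running"}))[1d:5m]
)
```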
Trend views help distinguish normal deployment fluctuations from persistent capacity problems.
Multi-Cluster Dashboards
If you federate metrics from several clusters, include a cluster label in aggregations.
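A sketch of such an aggregation follows. It assumes each cluster's series carry a label named `cluster`, for example added through external labels or relabeling; the actual label name depends on your setup.

```promql
# Running pods per cluster and namespace (the cluster label name is an assumption)
sum by (cluster, namespace) (kube_pod_status_phase{phase="Running"})
```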
Without a cluster dimension, counts can be misleading and hide issues localized to one environment.
Namespace Alert Tuning
Define separate thresholds per namespace based on workload criticality. Core platform namespaces usually need stricter availability alerts than batch-processing namespaces with elastic scheduling behavior.
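Separate alert rules per namespace are usually the clearest option, but a single expression can also combine thresholds. In the sketch below, the namespace `batch` and both threshold values are hypothetical.

```promql
# Stricter threshold for a platform namespace, looser one for a batch namespace
(sum by (namespace) (kube_pod_status_phase{phase="Running", namespace="kube-system"}) < 10)
or
(sum by (namespace) (kube_pod_status_phase{phase="Running", namespace="batch"}) < 1)
```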
Include deployment events on dashboards so pod-count drops can be correlated with rollouts, node maintenance, or autoscaler actions.
Review pod count alerts after each major scaling-policy change.
Common Pitfalls
A common pitfall is querying metrics that are unavailable because kube-state-metrics is not deployed or not scraped.
Another issue is counting all pod phases and overestimating active workload capacity.
Developers also rely on inconsistent labels, causing workload-specific queries to return partial counts.
A final mistake is alerting on instantaneous dips without a for duration, which creates noisy alerts during routine rollouts.
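The first pitfall can be checked directly in Prometheus. The following query is a quick sanity check, not a full health check of kube-state-metrics:

```promql
# Returns a single series with value 1 when no kube_pod_status_phase series exist,
# which suggests kube-state-metrics is not deployed or not being scraped
absent(kube_pod_status_phase)
```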
Summary
- Use kube_pod_status_phase to count running pods reliably.
- Aggregate by namespace for multi-team operational visibility.
- Filter by pod labels for workload-specific counts.
- Pair running count with pending and failed metrics for context.
- Add duration-based alerts to avoid rollout noise.

