How does Kubernetes HPA behave with two or more metrics, especially when calculating the number of replicas?
Kubernetes has become the de facto standard for orchestrating containerized applications. One of its most powerful features is the Horizontal Pod Autoscaler (HPA), which can dynamically adjust the number of running pod replicas based on observed CPU utilization or other application-provided metrics. When employing multiple metrics for autoscaling, Kubernetes HPA follows a nuanced approach to determine the appropriate scaling actions, and understanding this can lead to better-tuned applications and resource savings.
Autoscaling with Multiple Metrics
Kubernetes HPA can be configured to use multiple metrics to determine whether to scale the number of pods up or down. Each metric is evaluated independently, and the final desired state is determined by the metric that requires the most replicas.
Evaluating Multiple Metrics
When using HPA with two or more metrics, Kubernetes performs the following steps to decide the number of desired replicas:
- Fetch Current Metrics: Each metric configured on the HPA is fetched from the resource metrics API (for CPU and memory, typically backed by metrics-server) or from the custom or external metrics APIs.
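- Calculate Desired Replicas for Each Metric:
  • For Resource Metrics (e.g., CPU and Memory): The target is either an average utilization (a percentage of the pods' resource requests) or an absolute average value. The formula for CPU utilization is usually:
    desiredReplicas = ceil(currentReplicas × currentCPUUtilization / targetCPUUtilization)
  • For Non-Resource Metrics: The values are served by a custom metrics provider. For example, if the metric is requests per second (RPS) with a target average per pod:
    desiredReplicas = ceil(currentReplicas × currentAverageRPS / targetAverageRPS)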
- Determine the Final Replica Count:
  • HPA takes the maximum of the desired replica counts across all metrics, then keeps the result within the configured minReplicas and maxReplicas. For instance, if the CPU calculation yields 5 replicas and memory usage implies only 3 are needed, the HPA will scale to 5 replicas.
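The steps above can be condensed into one expression. This is a simplified sketch in notation introduced here for illustration, not taken from the Kubernetes source: R is the current replica count, m_i and t_i are the current and target values of metric i, and R_min and R_max stand for the HPA's minReplicas and maxReplicas. It ignores details such as the scaling tolerance and the handling of not-ready pods or missing metrics.

```latex
R_i = \left\lceil R \cdot \frac{m_i}{t_i} \right\rceil,
\qquad
R_{\text{desired}} = \min\Bigl(R_{\max},\ \max\bigl(R_{\min},\ \max_i R_i\bigr)\Bigr)
```

With R_CPU = 5 and R_memory = 3, the inner maximum is 5, so the HPA targets 5 replicas provided that value lies between R_min and R_max.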
Technical Example
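Imagine an application that needs to scale based on CPU usage and a custom per-pod metric, "requests per second". The HPA's metrics list would contain two entries:
• type: Resource, for CPU, with a target average utilization of 50%
• type: Pods, for requests per second, with a target average value of 100 per pod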
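A manifest along these lines could express that setup. It is a minimal sketch, not the original configuration: the object names, the replica bounds, and the metric name requests_per_second are assumptions chosen for illustration, and the Pods metric only works if a custom metrics adapter exposes a metric under that name.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                # hypothetical Deployment to scale
  minReplicas: 2                # assumed lower bound
  maxReplicas: 10               # assumed upper bound
  metrics:
    # Metric 1: average CPU utilization across pods, relative to CPU requests
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
    # Metric 2: per-pod custom metric, averaged across pods
    - type: Pods
      pods:
        metric:
          name: requests_per_second   # assumed metric name
        target:
          type: AverageValue
          averageValue: "100"
```

Each entry under metrics is evaluated on its own, and the larger of the two resulting replica counts is used.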
• If the current CPU utilization is 60% (against a target of 50%) and there are currently 4 replicas, the desired replicas for CPU would be ceil(4 × 60 / 50) = ceil(4.8) = 5.
• If the current average RPS per pod is 120, with a target average of 100, then the desired replicas for RPS would be ceil(4 × 120 / 100) = ceil(4.8) = 5.
• Both metrics call for 5 replicas, so the HPA scales up to 5 replicas.
A few practical points apply when scaling on multiple metrics:
• Metric Safeguarding: Make sure metrics are stable and accurate, since fluctuating data can cause erratic scaling behavior.
• Cooldown Periods: Configure appropriate stabilization windows to avoid oscillations.
• Cluster Capacity: Make sure the cluster has enough resources to schedule the additional pods that scaling creates.
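Stabilization windows are set under spec.behavior in the autoscaling/v2 API. The fragment below is a sketch to merge into an HPA manifest. The 300-second scale-down window matches the documented default, while the 50% per minute policy is an illustrative assumption rather than a recommendation.

```yaml
spec:
  behavior:
    scaleDown:
      # Act on the highest recommendation seen in the last 5 minutes,
      # which damps flapping when a metric briefly dips.
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50             # assumed: remove at most 50% of replicas
          periodSeconds: 60     # per 60-second period
    scaleUp:
      # No delay on scale-up, so load spikes are handled promptly.
      stabilizationWindowSeconds: 0
```

A longer scale-down window trades slower cost savings for steadier replica counts, which tends to matter more when several noisy metrics drive the same HPA.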

