Kubernetes Rolling Update not obeying 'maxUnavailable' replicas when redeployed in autoscaled conditions

Kubernetes is a widely used orchestration platform for containerized applications, offering scaling, resource management, and controlled application deployments. Among these features, the rolling update strategy is central to upgrading an application without taking it offline. However, when a Deployment is also managed by an autoscaler, a rolling update can appear to behave unpredictably, particularly with respect to the `maxUnavailable` setting. This article explores that behavior and explains why a rolling update might not appear to adhere to `maxUnavailable` limits under autoscaled conditions.

Understanding Rolling Updates and `maxUnavailable`

The Basic Rolling Update Process

Kubernetes Deployments support a rolling update strategy to transition smoothly from one version of an application to another. During a rolling update, Pods in a new ReplicaSet are created gradually while Pods in the old ReplicaSet are scaled down, minimizing disruption. The process relies on two key parameters:

**maxUnavailable**: Defines the maximum number of Pods that can be unavailable during the update, given either as an absolute number or as a percentage of the desired Pod count. This setting helps ensure application availability.
**maxSurge**: Specifies how many Pods can be created above the desired number of Pods to facilitate the update, again as an absolute number or a percentage.
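
To make the two parameters concrete, here is a minimal sketch of how they bound a rollout. The helper names `resolve` and `rollout_bounds` are made up for this illustration and are not part of any Kubernetes API. The sketch assumes the documented rounding behavior: a percentage for `maxUnavailable` rounds down and a percentage for `maxSurge` rounds up.

```python
from typing import Tuple, Union

IntOrPercent = Union[int, str]


def resolve(value: IntOrPercent, replicas: int, round_up: bool) -> int:
    """Turn an absolute count or a percentage string such as "25%" into a Pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = int(value[:-1]) * replicas
        # Integer arithmetic avoids floating-point surprises when rounding.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas: int, max_unavailable: IntOrPercent,
                   max_surge: IntOrPercent) -> Tuple[int, int]:
    """Return (minimum available Pods, maximum total Pods) during a rolling update."""
    unavailable = resolve(max_unavailable, replicas, round_up=False)  # rounds down
    surge = resolve(max_surge, replicas, round_up=True)               # rounds up
    return replicas - unavailable, replicas + surge


print(rollout_bounds(10, 2, 2))          # (8, 12)
print(rollout_bounds(10, "25%", "25%"))  # (8, 13)
```

Note that both bounds are derived from the desired replica count, which is exactly the value an autoscaler changes.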

The Role of Autoscaling

Autoscaling in Kubernetes adjusts the number of Pod replicas according to current load. A Horizontal Pod Autoscaler (HPA) changes the replica count of its target based on metrics such as CPU and memory utilization, providing elasticity to the system. Although powerful, this dynamism introduces complexities during rolling updates.

The interplay between rolling updates and autoscaling can lead to situations where the `maxUnavailable` constraint is not respected as expected. This is particularly true when resource demand changes during the update and the autoscaler reacts by changing the replica count.

When `maxUnavailable` is Ignored

Illustration of the Problem

Suppose you have a Deployment with:

Desired replicas: 10
maxUnavailable: 2
maxSurge: 2

Under normal, static conditions, Kubernetes should ensure that no more than two Pods are unavailable during the update. However, in an autoscaled environment, consider the following scenario:

An HPA scales replicas to 15 in response to high CPU usage before the update starts.
The rolling update proceeds and applies `maxUnavailable` against the current desired count of 15.
As the load decreases, the HPA scales down, potentially triggering deletions that overlap with rolling update Pod terminations.

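While the scale-down and the rolling update run concurrently, more Pods than expected can disappear, because the autoscaler is not aware of the rollout in progress and `maxUnavailable` is measured against the current desired replica count rather than the count at the start of the update. Once the HPA lowers the desired count from 15 to 10, the number of Pods that must stay available drops from 13 to 8. In extreme cases, this leads to a loss of capacity that exceeds the anticipated limit, defeating the purpose of having specified `maxUnavailable`.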
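
The scenario above can be worked through numerically. This is a simplified sketch, assuming the availability floor is the current desired replica count minus `maxUnavailable`. The function `availability_floor` is a hypothetical helper written for this example, not a Kubernetes API.

```python
def availability_floor(desired_replicas: int, max_unavailable: int) -> int:
    """Minimum available Pods a rollout aims to keep for a given desired count."""
    return max(desired_replicas - max_unavailable, 0)


MAX_UNAVAILABLE = 2
REPLICAS_AT_START = 15   # the HPA had scaled up before the update began
REPLICAS_AFTER_HPA = 10  # the HPA scales back down in the middle of the rollout

floor_before = availability_floor(REPLICAS_AT_START, MAX_UNAVAILABLE)
floor_after = availability_floor(REPLICAS_AFTER_HPA, MAX_UNAVAILABLE)

# Pods that may be missing compared with the count running when the update began.
apparent_unavailable = REPLICAS_AT_START - floor_after

print(floor_before, floor_after, apparent_unavailable)  # 13 8 7
```

Measured against the 15 Pods that were running when the update started, up to seven can be gone at once, even though each controller stayed within its own rules.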

Technical Explanation

Kubernetes separates the concerns of autoscaling and rolling updates, running them as independent controllers. This decomposition leads to non-coordinated actions:

Deployment controller (RollingUpdate strategy): Scales the old and new ReplicaSets while respecting `maxSurge` and `maxUnavailable`, both measured against the Deployment's current desired replica count. It does not distinguish between a replica count set by an operator and one set by the HPA.
HPA controller: Acts on metric thresholds and writes a new desired replica count to its target without insight into an ongoing rolling update.

Because the two controllers share no rollout-aware state, their combined adjustments can violate the limits an operator expects, even though each controller behaves correctly on its own. This becomes particularly problematic when a sudden scale-in occurs in the middle of an update.
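
The lack of coordination can be illustrated with a toy model of two reconcile loops that communicate only through the desired replica count. This is a deliberately simplified sketch, not the real controllers' logic: the class `ToyDeployment` and the functions `hpa_reconcile` and `rollout_reconcile` are invented for this example, and the model ignores new Pods becoming ready.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyDeployment:
    spec_replicas: int    # desired count; the only field the toy HPA writes
    max_unavailable: int  # absolute value, for simplicity
    available: int        # Pods currently serving traffic


def hpa_reconcile(dep: ToyDeployment, recommended: int) -> None:
    """Toy HPA: overwrite the desired count from metrics, unaware of any rollout."""
    dep.spec_replicas = recommended


def rollout_reconcile(dep: ToyDeployment) -> int:
    """Toy rollout step: remove Pods down to the floor implied by the current
    desired count, and return how many were removed."""
    floor = max(dep.spec_replicas - dep.max_unavailable, 0)
    removed = max(dep.available - floor, 0)
    dep.available -= removed
    return removed


dep = ToyDeployment(spec_replicas=15, max_unavailable=2, available=15)
history: List[int] = []

rollout_reconcile(dep)   # rollout removes 2 Pods; floor is 15 - 2 = 13
history.append(dep.available)

hpa_reconcile(dep, 10)   # load drops; the HPA lowers the desired count
rollout_reconcile(dep)   # floor is now 10 - 2 = 8
history.append(dep.available)

print(history)  # [13, 8]
```

Neither function in the model breaks its own contract. The drop from 13 to 8 comes entirely from the second loop reading a value the first loop changed.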

Mitigating the Impact

Recommended Strategies

While there's no built-in solution to perfectly coordinate autoscaled deployments with rolling updates, certain strategies may help mitigate the potential for increased downtime:

Prioritize Metrics Stabilization: Allow time for metrics driving the HPA to stabilize before initiating rolling updates, reducing the likelihood of aggressive scaling actions.
Pre-emptive Manual Interventions: Temporarily disable autoscaling during critical updates to maintain control over scaling behavior explicitly.
Custom Controllers: Develop custom controllers or deploy operators that can orchestrate between HPA and Deployment updates.
Segmented Updates: Use stages or canary updates to isolate potential disruptions to specific segments before a full rollout.
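
As one way to act on the first strategy, a deployment script could wait until the replica count has stopped moving before it starts a rollout. The sketch below assumes the replica counts have already been sampled at regular intervals by some other means. The function `replicas_stable` and its `window` parameter are hypothetical and not part of any Kubernetes tooling.

```python
from typing import Sequence


def replicas_stable(samples: Sequence[int], window: int = 5) -> bool:
    """Return True if the last `window` replica-count samples are all identical."""
    if window <= 0 or len(samples) < window:
        return False  # not enough data to judge stability
    recent = samples[-window:]
    return min(recent) == max(recent)


print(replicas_stable([10, 12, 15, 15, 15, 15, 15]))  # True
print(replicas_stable([15, 15, 15, 14, 12, 10, 10]))  # False
```

A stricter variant could also require that the stable value has held for a minimum wall-clock duration, which depends on how often the samples are taken.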

Conclusion

The overlap between Kubernetes rolling updates and autoscaling presents challenges that need careful management. While Kubernetes is architected for flexibility, the absence of cross-controller coordination can skew expected behaviors such as adherence to `maxUnavailable`. Identifying and addressing these interactions helps keep Deployments reliable, even in dynamically scaling environments.

Here's a table summarizing the key concepts and strategies discussed:

Concept	Explanation
Rolling Update	Incremental upgrade method that preserves service availability
`maxUnavailable`	Limits the number of Pods that can be unavailable during an update
`maxSurge`	Defines how many additional Pods can exist beyond the desired count
Autoscaling	Automatic adjustment of the number of replicas based on load metrics
HPA	Controller that modifies replica count using metric thresholds
Potential Issues	`maxUnavailable` might appear to be ignored because scaling operations run separately
Suggested Mitigations	Metrics stabilization, temporarily disabling autoscaling, custom controllers, staged or canary updates

This article has delved into the complex interaction between rolling updates and autoscaling, highlighting areas where disparities arise. By refining understanding of these behaviors, administrators can tailor their orchestrations for maximum efficiency and reliability.
