AKS Reporting Insufficient Pods

When working with Azure Kubernetes Service (AKS), an "Insufficient pods" warning can block deployments and prevent applications from scaling. It means the scheduler cannot place new pods because the candidate nodes have already reached their per-node pod limit. This article walks through the root causes, diagnostic commands, and fixes.

Understanding the "Insufficient Pods" Error

Kubernetes assigns pods to nodes through the scheduler. When the scheduler cannot find a suitable node for a pending pod, it marks the pod as Pending and emits an event describing the constraint that was violated. The "Insufficient pods" event specifically means one or more nodes have reached their maximum pod count.

In AKS, the maximum number of pods per node is a node pool setting (maxPods). Its default depends on the network plugin rather than on the VM size: node pools using Azure CNI default to 30 pods per node, while kubenet defaults to 110. Either default can be overridden when the node pool is created, up to 250 pods per node.

Diagnosing the Problem

Start by checking which pods are stuck in Pending state and reading their events:

bash

kubectl get pods --all-namespaces --field-selector=status.phase=Pending

Then inspect a specific pending pod for scheduling failure details:

bash

kubectl describe pod <pod-name> -n <namespace>

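Look for a FailedScheduling event such as "0/3 nodes are available: 3 Too many pods." in the output. Older Kubernetes versions word the same condition as "Insufficient pods". Either message confirms that the nodes are at their pod limit.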

Next, check how many pods each node is currently running and what the maximum is:

bash

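# Maximum number of pods each node can accept
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODS:.status.allocatable.pods
# Pods currently assigned to one node (--no-headers keeps the header row out of the count)
kubectl get pods --all-namespaces -o wide --no-headers --field-selector=spec.nodeName=<node-name> | wc -l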
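
The two commands above can be combined into a small loop that prints usage next to capacity for every node. This is a minimal sketch: `pod_usage_per_node` is a hypothetical helper name, and it assumes that pods in the Succeeded or Failed phase should be excluded from the count.

```shell
# Print "<node> <active pods>/<allocatable pods>" for every node.
# pod_usage_per_node is a hypothetical helper, not part of kubectl.
pod_usage_per_node() {
  local node max used
  # One line per node: name and allocatable pod count.
  kubectl get nodes --no-headers \
    -o custom-columns=NAME:.metadata.name,PODS:.status.allocatable.pods |
  while read -r node max; do
    # Count pods bound to this node, skipping finished ones.
    used=$(kubectl get pods --all-namespaces --no-headers \
      --field-selector="spec.nodeName=${node},status.phase!=Succeeded,status.phase!=Failed" |
      wc -l)
    # $((used)) strips any padding that wc adds on some platforms.
    echo "${node} $((used))/${max}"
  done
}
```

Calling `pod_usage_per_node` against a live cluster prints one line per node; a node whose two numbers match is full.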

You can also get a comprehensive view of node resource allocation:

bash

kubectl describe node <node-name> | grep -A 5 "Allocated resources"

Common Causes and Fixes

1. Max Pods Per Node Limit

The most frequent cause is hitting the maxPods limit on your nodes. This value is fixed when a node pool is created and, unless you override it, defaults to a number determined by the network plugin.

To check the current max pods setting:

bash

az aks show --resource-group <rg> --name <cluster> --query "agentPoolProfiles[].maxPods"

If you are using Azure CNI with the default of 30 pods per node, that limit can be reached quickly, especially with system pods (kube-proxy, CoreDNS, etc.) consuming several slots. You can increase this by creating a new node pool with a higher maxPods value:

bash

az aks nodepool add \
  --resource-group <rg> \
  --cluster-name <cluster> \
  --name newpool \
  --node-count 3 \
  --max-pods 50

Note that you cannot change maxPods on an existing node pool. You must create a new pool, migrate workloads, and then delete the old pool.
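
The migration described above can be sketched as follows. It assumes the replacement pool already exists with enough capacity and that the old pool's nodes carry the `agentpool=<pool-name>` label; `drain_and_delete_pool` is a hypothetical helper name. Verify the label with `kubectl get nodes --show-labels` before running anything destructive.

```shell
# Drain every node in an old pool, then delete the pool.
# Usage: drain_and_delete_pool <resource-group> <cluster> <old-pool>
# (hypothetical helper; assumes nodes are labeled agentpool=<old-pool>)
drain_and_delete_pool() {
  local rg=$1 cluster=$2 pool=$3
  # Stop new pods from landing on the old nodes.
  kubectl cordon -l "agentpool=${pool}" || return 1
  # Evict workloads so they reschedule onto the new pool.
  kubectl drain -l "agentpool=${pool}" \
    --ignore-daemonsets --delete-emptydir-data || return 1
  # Remove the old pool only after the drain succeeded.
  az aks nodepool delete \
    --resource-group "$rg" --cluster-name "$cluster" --name "$pool"
}
```

The early returns stop the pool from being deleted if cordoning or draining fails, for example when a PodDisruptionBudget blocks an eviction.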

2. Cluster Autoscaler Not Configured

If the cluster autoscaler is disabled or misconfigured, AKS will not add nodes when existing ones are full. Enable it on your node pool:

bash

az aks nodepool update \
  --resource-group <rg> \
  --cluster-name <cluster> \
  --name nodepool1 \
  --enable-cluster-autoscaler \
  --min-count 2 \
  --max-count 10

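On AKS the cluster autoscaler runs in the managed control plane, so there is no autoscaler pod in the cluster to read logs from. Verify that it is working by checking the status it publishes to a ConfigMap:

bash

kubectl -n kube-system describe configmap cluster-autoscaler-status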

3. Resource Requests Too Large

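Oversized CPU or memory requests do not trigger "Insufficient pods" themselves; the scheduler reports them as "Insufficient cpu" or "Insufficient memory". They can still block scheduling even when pod count limits are not reached, and both kinds of message can appear in the same event. Review your deployment specs and right-size resource requests based on actual usage: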

yaml

resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

Use kubectl top pods and metrics from Azure Monitor to determine realistic request values.

4. Node Taints and Affinity Rules

Taints on nodes or strict affinity rules in pod specs restrict which nodes a pod can be scheduled on. With fewer eligible nodes, the remaining ones reach their pod limit sooner. Check for taints:

bash

kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

If a node has a taint that your pod does not tolerate, the scheduler will skip that node entirely. Either add a matching toleration to your pod spec or remove the taint if it is no longer needed.
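
Removing a taint uses the same `kubectl taint` command with a trailing hyphen. A minimal sketch, where `remove_taint` is a hypothetical wrapper and the taint argument is whatever key and effect the command above reported:

```shell
# Remove a taint from a node. The trailing "-" tells kubectl to delete it.
# Usage: remove_taint <node-name> <key>[:<effect>]
# (hypothetical wrapper around "kubectl taint")
remove_taint() {
  local node=$1 taint=$2
  kubectl taint nodes "$node" "${taint}-"
}
```

For example, `remove_taint my-node dedicated:NoSchedule` would remove a `dedicated` taint with the `NoSchedule` effect; both names are placeholders. Taints that were configured on the node pool itself are managed by AKS and may be reapplied, so those are better changed through the node pool settings.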

Common Pitfalls

A frequent mistake is assuming that the "Insufficient pods" error is about CPU or memory. It is specifically about the pod count limit per node, which is a separate constraint from compute resources. You can have plenty of CPU and memory available but still hit this error.

Another pitfall is forgetting to account for system pods. DaemonSets like kube-proxy, azure-cni, and monitoring agents each consume a pod slot on every node. On a node with maxPods set to 30, you may only have 24 or 25 slots available for your application workloads.
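
The slot arithmetic above is easy to script when sizing a pool. In this sketch, `app_pod_capacity` is a hypothetical helper and the figure of 6 system pods per node is only illustrative; count the DaemonSet pods on one of your own nodes for a real value.

```shell
# Pods available to applications across a pool:
#   nodes * (maxPods - system pods per node)
# app_pod_capacity is a hypothetical helper name.
app_pod_capacity() {
  local nodes=$1 max_pods=$2 system_pods=$3
  echo $(( nodes * (max_pods - system_pods) ))
}

# Three nodes, maxPods 30, an assumed 6 system pods on each node.
app_pod_capacity 3 30 6   # prints 72
```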

When using Azure CNI, each pod gets its own IP address from the subnet. If your subnet is too small, you may run out of IP addresses before hitting the pod count limit. Plan your subnet CIDR range to accommodate the maximum number of pods across all nodes.
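
As a rough sizing aid for traditional Azure CNI, where each node reserves addresses for its pods up front, a commonly cited rule is to budget one extra node for upgrades: (nodes + 1) + (nodes + 1) × maxPods addresses. The sketch below applies that rule. `required_subnet_ips` is a hypothetical helper, and the result is a planning estimate that does not apply to overlay or dynamic IP allocation modes.

```shell
# Estimate the subnet addresses needed for a cluster on traditional Azure CNI.
#   (nodes + 1) node IPs plus (nodes + 1) * maxPods pod IPs,
#   where the extra node covers a surge node during upgrades.
# required_subnet_ips is a hypothetical helper name.
required_subnet_ips() {
  local nodes=$1 max_pods=$2
  local surge_nodes=$(( nodes + 1 ))
  echo $(( surge_nodes + surge_nodes * max_pods ))
}

# 50 nodes at the default of 30 pods per node.
required_subnet_ips 50 30   # prints 1581
```

A result of 1581 does not fit in a /22 (1,024 addresses), so this example cluster would need at least a /21.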

Summary

The "Insufficient pods" error in AKS is typically caused by reaching the maxPods limit on cluster nodes, not by CPU or memory exhaustion. Diagnose with kubectl describe pod and kubectl describe node to confirm the constraint. Fix the issue by creating node pools with higher maxPods values, enabling the cluster autoscaler, right-sizing resource requests, or removing restrictive taints. Always account for system pod overhead when planning your cluster capacity.