
Kubernetes Cannot Clean Up Flannel


Introduction

Flannel gives a Kubernetes cluster a simple pod network, but it also leaves state on every node: interfaces, routes, CNI cache files, and firewall rules. When a node is removed or the CNI stack is reinstalled incorrectly, Kubernetes may appear unable to "clean up Flannel" even though the real issue is stale node-level networking state.

Why Flannel Cleanup Fails

Flannel itself does not own the entire cleanup path. Several layers participate:

  • the kube-flannel DaemonSet
  • the CNI configuration in /etc/cni/net.d
  • the CNI state under /var/lib/cni
  • Linux interfaces such as cni0 and flannel.1
  • routing and iptables or nftables rules

If one layer disappears before the others, automatic cleanup stops halfway through. A common example is deleting the Flannel DaemonSet before draining and rebooting nodes. The pods disappear, but the kernel interfaces and CNI state remain.

Another common case is a node that was hard powered off, or only partially reimaged. Kubernetes control plane objects may be gone, yet the node's filesystem still contains old network allocations. When the node rejoins, the container runtime can try to reuse that broken state.
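Whichever way a node ends up half-cleaned, it helps to check each layer separately. The following is a minimal sketch and not a Flannel tool. audit_flannel_leftovers is a hypothetical helper. Its optional root argument is an assumption added so the same check can run against a mounted disk as well as the live host. It detects interfaces through /sys/class/net instead of parsing ip link output, and it does not cover routes or firewall rules.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report Flannel-related state left on a node, layer by layer.
# $1 (optional): filesystem root to inspect; empty means the live host.
# Returns 0 when nothing is found, 1 otherwise.
audit_flannel_leftovers() {
  local root="${1:-}" found=0 f

  # Layer: CNI configuration files that mention flannel
  for f in "$root"/etc/cni/net.d/*; do
    [ -f "$f" ] || continue
    if grep -qi flannel "$f"; then
      echo "config: $f"
      found=1
    fi
  done

  # Layer: host-local IPAM state, one directory per CNI network name
  # (Flannel's default network is usually named cbr0)
  for f in "$root"/var/lib/cni/networks/*; do
    [ -d "$f" ] || continue
    echo "ipam-state: $f"
    found=1
  done

  # Layer: kernel interfaces, which appear as entries under /sys/class/net
  for f in cni0 flannel.1; do
    if [ -e "$root/sys/class/net/$f" ]; then
      echo "interface: $f"
      found=1
    fi
  done

  return "$found"
}
```

Run it as root with no argument on the live host. A non-zero exit status means at least one layer still holds state.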

What To Inspect First

Start by checking whether the cluster still believes Flannel is active:

bash
kubectl get pods -n kube-flannel
kubectl get nodes -o wide
kubectl get daemonset -A | grep flannel

Then inspect one affected node:

bash
ip link show | grep -E 'flannel|cni'
ip route
sudo ls /etc/cni/net.d
sudo ls /var/lib/cni/networks
sudo iptables -S | grep -i flannel
sudo iptables -t nat -S | grep -i flannel

Typical symptoms include:

  • flannel.1 still exists after Flannel was removed
  • cni0 has an address from an old pod subnet
  • /etc/cni/net.d/10-flannel.conflist points to a plugin that no longer exists
  • /var/lib/cni/networks contains stale IP allocation files
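The old-subnet symptom can be checked mechanically. The sketch below relies on two assumptions about Flannel's default behavior. Flannel writes its node lease to /run/flannel/subnet.env, and FLANNEL_SUBNET there holds the gateway address that the bridge plugin puts on cni0 (for example 10.244.1.1/24). check_cni0_subnet is a hypothetical helper, and it assumes cni0 carries a single IPv4 address.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare cni0's address with the subnet Flannel leased.
# $1: path to Flannel's subnet.env, $2: cni0 address in CIDR form.
# Returns 0 if they match, 1 if cni0 looks stale, 2 if subnet.env is unusable.
check_cni0_subnet() {
  local env_file="$1" cni0_addr="$2" expected
  # Read the value without sourcing the file
  expected=$(sed -n 's/^FLANNEL_SUBNET=//p' "$env_file" 2>/dev/null)
  if [ -z "$expected" ]; then
    echo "no FLANNEL_SUBNET in $env_file" >&2
    return 2
  fi
  if [ "$cni0_addr" = "$expected" ]; then
    echo "ok: cni0 has $cni0_addr"
    return 0
  fi
  echo "stale: cni0 has $cni0_addr, flannel expects $expected"
  return 1
}

# Usage on a node:
#   check_cni0_subnet /run/flannel/subnet.env "$(ip -4 -o addr show dev cni0 | awk '{print $4}')"
```

If cni0 does not exist yet, for example because no pod has started on the node, the address argument is empty and the check reports a mismatch. That result means nothing in this case.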

If pods are failing with FailedCreatePodSandBox, stale CNI files are usually the first place to look.
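One way to confirm stale allocations is to compare the reservations under /var/lib/cni/networks with the pod sandboxes the runtime still knows about. This sketch assumes the host-local IPAM plugin's on-disk layout. In that layout, each reserved IP has one file whose first line is the owning container ID, and the directory also holds lock and last_reserved_ip.* bookkeeping files. It also assumes the list of live sandbox IDs comes from crictl pods -q. stale_allocations is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list host-local IP reservations whose owner is gone.
# $1: one network's state directory, e.g. /var/lib/cni/networks/cbr0
# $2: file with live pod sandbox IDs, one per line
# Prints "<ip> <container-id>" for each reservation not backed by a live sandbox.
stale_allocations() {
  local dir="$1" live="$2" f id
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    case "$(basename "$f")" in
      lock | last_reserved_ip.*) continue ;; # bookkeeping, not reservations
    esac
    # First line holds the container ID; strip whitespace and a trailing \r
    id=$(head -n 1 "$f" | tr -d '[:space:]')
    if ! grep -qxF -- "$id" "$live"; then
      echo "$(basename "$f") $id"
    fi
  done
}

# Usage as root while the runtime is still up:
#   crictl pods -q > /tmp/live-sandboxes
#   stale_allocations /var/lib/cni/networks/cbr0 /tmp/live-sandboxes
```

The helper only reports. Deleting reservations still belongs in the controlled cleanup sequence below, after kubelet and the runtime are stopped.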

Safe Cleanup Sequence

The safest repair is to remove workload pressure first, then clean node state in a controlled order.

1. Drain the node

bash
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data

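Draining cordons the node so no new pods are scheduled onto it, then evicts the pods already running there. DaemonSet pods, including kube-flannel, are left in place, which is why --ignore-daemonsets is required. The --delete-emptydir-data flag allows eviction of pods that use emptyDir volumes, and the data in those volumes is lost.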

2. Stop the runtime and kubelet

On the affected node:

bash
sudo systemctl stop kubelet
sudo systemctl stop containerd

With both services stopped, no new pod sandboxes are created. The CNI plugin is therefore not invoked and cannot recreate cni0 or reserve new IPs while you delete the old state.

3. Remove stale Flannel and CNI state

bash
sudo rm -f /etc/cni/net.d/10-flannel.conflist
sudo rm -rf /var/lib/cni/networks/*
sudo ip link delete cni0 || true
sudo ip link delete flannel.1 || true

If your distribution uses iptables, you may also need to remove stale rules. Do that carefully and only after confirming they belong to an old Flannel install.
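A cautious approach is to generate delete commands from the existing rules and review them before running anything. The sketch assumes the iptables backend. It also assumes Flannel's rules can be recognized by the word "flannel" in a chain name or comment. Recent releases use chains such as FLANNEL-FWD and FLANNEL-POSTRTG, but check your own iptables -S output. flannel_rule_deletes is a hypothetical helper that only prints commands. Plain iptables -S shows the filter table only, so run the helper once per table.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `iptables -t TABLE -S` output on stdin and print
# an `iptables -D` command for every appended rule that mentions flannel.
# It never modifies the firewall; review the output before running it.
flannel_rule_deletes() {
  local table="$1"
  grep -i flannel | sed -n "s/^-A /iptables -t $table -D /p"
}

# Usage, one table at a time:
#   sudo iptables -t nat -S | flannel_rule_deletes nat
#   sudo iptables -t filter -S | flannel_rule_deletes filter
```

Rules are printed with their original quoting, so a reviewed line can be pasted into a root shell. Any empty FLANNEL-* chain left over can then be removed with iptables -t TABLE -X CHAIN, once no rule jumps to it.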

4. Restart services

bash
sudo systemctl start containerd
sudo systemctl start kubelet

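If Flannel is still the intended CNI, its DaemonSet pod restores the node state when the pod starts. The upstream manifest copies the CNI config into /etc/cni/net.d, and flanneld recreates flannel.1. A kube-flannel pod that kept running through the cleanup does not repeat those steps. Delete that pod so the DaemonSet recreates it, for example with kubectl -n kube-flannel delete pod -l app=flannel --field-selector spec.nodeName=worker-1 (app=flannel is the label used by the upstream manifest). If you are migrating away from Flannel, install the replacement CNI before uncordoning the node.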

5. Return the node to service

bash
kubectl uncordon worker-1
kubectl get pods -A -o wide

At this point, new pods should receive clean network setup instead of inheriting stale bridges and allocations.
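Because the node-side steps 2 to 4 must run in a fixed order, it can help to script them. The sketch below is a hypothetical wrapper, not an official tool. It assumes containerd is the runtime and that the default file locations used above apply. It prints the commands unless DRY_RUN=0 is set, and it leaves draining and uncordoning to a machine with kubectl access. It replaces the rm -rf /var/lib/cni/networks/* glob with find -delete, so the wildcard is never expanded by an unprivileged calling shell.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for steps 2-4 on one drained node.
# Default is a dry run that prints each command; set DRY_RUN=0 to execute.

run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

reset_flannel_node() {
  # Step 2: stop kubelet first so it stops requesting new pod sandboxes
  run sudo systemctl stop kubelet || return 1
  run sudo systemctl stop containerd || return 1

  # Step 3: remove Flannel's CNI config, IP reservations, and interfaces.
  # Missing files or interfaces are not treated as errors here.
  run sudo rm -f /etc/cni/net.d/10-flannel.conflist
  run sudo find /var/lib/cni/networks -mindepth 1 -delete || true
  run sudo ip link delete cni0 || true
  run sudo ip link delete flannel.1 || true

  # Step 4: bring the runtime back before kubelet, matching the order above
  run sudo systemctl start containerd || return 1
  run sudo systemctl start kubelet
}
```

Calling reset_flannel_node on its own shows the plan. DRY_RUN=0 reset_flannel_node runs it.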

When the Problem Is a Cluster Migration

Cleanup errors often happen during a CNI migration, for example moving from Flannel to Cilium or Calico. The mistake is usually mixing two CNIs on the same node. Kubernetes does not coordinate that handoff automatically.

The reliable sequence is:

  1. drain nodes
  2. remove old CNI config files
  3. clear node-local CNI state
  4. install the new CNI
  5. uncordon nodes

If you skip step 2 or step 3, the runtime may keep invoking the wrong plugin or reusing incompatible IP state.
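Leftover configs are easy to miss because the runtime silently picks one file instead of failing. With its default settings, containerd's CRI plugin loads only the first config in /etc/cni/net.d by file name. A stale 10-flannel.conflist can therefore shadow a newly installed config whose name sorts later. The sketch below is a hypothetical helper, check_single_cni_config. It lists candidate files the way libcni discovers them (.conf, .conflist, and .json files in name order) and fails unless exactly one is present.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print CNI config files in the order a libcni-based
# runtime would see them, and fail unless exactly one is present.
# $1 (optional): config directory, default /etc/cni/net.d
check_single_cni_config() {
  local dir="${1:-/etc/cni/net.d}" files count
  # Same extensions libcni looks for; byte-wise sort mirrors its name order
  files=$(find "$dir" -maxdepth 1 ! -type d \
    \( -name '*.conf' -o -name '*.conflist' -o -name '*.json' \) | LC_ALL=C sort)
  count=$(printf '%s' "$files" | grep -c .)
  if [ -n "$files" ]; then
    printf '%s\n' "$files"
  fi
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one CNI config in $dir, found $count" >&2
    return 1
  fi
}
```

The first line of output is the file containerd would use with default settings, so check that it belongs to the new CNI before uncordoning.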

Common Pitfalls

The biggest mistake is deleting the Flannel manifest and assuming the node is clean. DaemonSet removal does not guarantee removal of Linux interfaces or cached IP allocations.

Another pitfall is cleaning only Kubernetes objects while ignoring the host. Flannel is a node-level networking component, so host inspection matters more than the pod list alone.

It is also easy to restart kubelet too early. If /etc/cni/net.d still points at a broken Flannel config, kubelet will immediately reproduce the same sandbox creation failure.
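A quick pre-flight check before starting kubelet is to confirm that every plugin named in the remaining CNI configs actually exists. This is a heuristic sketch. The hypothetical check_cni_plugins pulls "type" values out of the JSON with grep and sed instead of a real JSON parser, and it assumes the conventional /opt/cni/bin plugin directory. It also cannot see plugins that a config invokes implicitly. Flannel's CNI plugin, for example, delegates to the bridge and host-local plugins by default.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report CNI plugin types referenced in config files
# that have no executable in the plugin directory.
# $1: config dir (default /etc/cni/net.d), $2: plugin dir (default /opt/cni/bin)
# Returns 0 if every referenced plugin exists, 1 otherwise.
check_cni_plugins() {
  local confdir="${1:-/etc/cni/net.d}" bindir="${2:-/opt/cni/bin}" f t missing=0
  for f in "$confdir"/*.conf "$confdir"/*.conflist "$confdir"/*.json; do
    [ -f "$f" ] || continue
    # Every "type": "<name>" value, which also catches nested IPAM types
    for t in $(grep -o '"type"[[:space:]]*:[[:space:]]*"[^"]*"' "$f" | sed 's/.*"\([^"]*\)"$/\1/'); do
      if [ ! -x "$bindir/$t" ]; then
        echo "$f: plugin '$t' not found in $bindir"
        missing=1
      fi
    done
  done
  return "$missing"
}

# Usage before systemctl start kubelet:
#   check_cni_plugins /etc/cni/net.d /opt/cni/bin
```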

Finally, do not run broad firewall flush commands on production nodes unless you fully understand the node's policy model. Remove only rules tied to the abandoned Flannel setup.

Summary

  • Flannel cleanup issues usually come from stale host networking state, not just missing Kubernetes objects.
  • Check interfaces, CNI config files, IP allocation directories, and routing rules on the node.
  • Drain the node, then stop kubelet and the container runtime before cleanup so they do not race with manual changes.
  • Remove stale cni0, flannel.1, and old CNI cache data in a controlled order.
  • During CNI migrations, never leave old and new plugin configs active on the same node.
