Kubernetes cannot clean up Flannel
Introduction
Flannel gives a Kubernetes cluster a simple pod network, but it also leaves state on every node: interfaces, routes, CNI cache files, and firewall rules. When a node is removed or the CNI stack is reinstalled incorrectly, Kubernetes may appear unable to "clean up Flannel" even though the real issue is stale node-level networking state.
Why Flannel Cleanup Fails
Flannel itself does not own the entire cleanup path. Several layers participate:
- the kube-flannel DaemonSet
- the CNI configuration in /etc/cni/net.d
- the CNI state under /var/lib/cni
- Linux interfaces such as cni0 and flannel.1
- routing and iptables or nftables rules
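Most of these layers map to concrete locations on each node. As a rough sketch, assuming the default names of an upstream kube-flannel install (the 10-flannel.conflist config file, the host-local IPAM network name cbr0, and Flannel's /run/flannel/subnet.env lease file), a small Python helper can report which file and interface layers still have something left behind. Routing and firewall rules need a separate check, and the path table is an assumption to adjust for your distribution.

```python
from pathlib import Path

# Assumed default locations for an upstream kube-flannel install; adjust as needed.
FLANNEL_LEFTOVERS = {
    "CNI config": "etc/cni/net.d/10-flannel.conflist",
    "host-local IPAM state": "var/lib/cni/networks/cbr0",
    "Flannel subnet lease": "run/flannel/subnet.env",
    # /sys/class/net has one entry per network device, so these detect interfaces.
    "cni0 bridge": "sys/class/net/cni0",
    "flannel.1 VXLAN device": "sys/class/net/flannel.1",
}


def find_leftovers(root: str = "/") -> dict[str, str]:
    """Return {layer: absolute path} for every Flannel artifact still present under root.

    The root parameter makes it possible to inspect a mounted disk image instead of
    the live host.
    """
    base = Path(root)
    found: dict[str, str] = {}
    for layer, rel in FLANNEL_LEFTOVERS.items():
        path = base / rel
        if path.exists():
            found[layer] = str(path)
    return found
```

Calling find_leftovers() on a node where Flannel was supposedly removed should return an empty dict; any entry points at a layer that was skipped.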
If one layer disappears before the others, automatic cleanup stops halfway through. A common example is deleting the Flannel DaemonSet before draining and rebooting nodes. The pods disappear, but the kernel interfaces and CNI state remain.
Another common case is a node that was hard powered off or reimaged. Kubernetes control plane objects may be gone, yet the surviving node filesystem still contains old network allocations. When the node rejoins, the container runtime can try to reuse broken state.
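To confirm that a rejoined node is carrying old allocations, you can compare the host-local IPAM files against the pod sandboxes the runtime actually knows about. The sketch below assumes Flannel's default network name cbr0 (so the directory is /var/lib/cni/networks/cbr0) and host-local's on-disk layout, where each file is named after a reserved IP and its first line is the owning container ID. The live IDs would come from something like crictl pods -q; that source is an assumption about your tooling. It only reports; it deletes nothing.

```python
from pathlib import Path


def read_allocations(network_dir: str) -> dict[str, str]:
    """Map each reserved IP in a host-local IPAM directory to its recorded container ID."""
    allocations: dict[str, str] = {}
    directory = Path(network_dir)
    if not directory.is_dir():
        return allocations
    for entry in directory.iterdir():
        # Skip host-local's bookkeeping files; only per-IP files are reservations.
        if not entry.is_file() or entry.name == "lock" or entry.name.startswith("last_reserved_ip"):
            continue
        lines = entry.read_text().splitlines()
        # First line holds the container ID; newer plugin versions append the ifname.
        allocations[entry.name] = lines[0].strip() if lines else ""
    return allocations


def stale_allocations(allocations: dict[str, str], live_ids: set[str]) -> list[str]:
    """Return reserved IPs whose owning ID matches no live pod sandbox.

    IDs are compared by prefix in both directions because some tools print
    truncated IDs.
    """
    live = {sid for sid in live_ids if sid}

    def is_live(owner: str) -> bool:
        return bool(owner) and any(owner.startswith(sid) or sid.startswith(owner) for sid in live)

    return sorted(ip for ip, owner in allocations.items() if not is_live(owner))
```

For example, stale_allocations(read_allocations("/var/lib/cni/networks/cbr0"), live_ids) lists the IPs that no running sandbox owns.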
What To Inspect First
Start by checking whether the cluster still believes Flannel is active, for example by listing the Flannel DaemonSet and its pods with kubectl.
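One way to script that check is to parse the JSON from kubectl get daemonsets --all-namespaces -o json and pick out anything whose name contains "flannel". Matching on that substring is an assumption that fits the upstream manifest (its DaemonSet is named kube-flannel-ds); a renamed install would need a different filter. The field names are those of the apps/v1 DaemonSet object.

```python
import json
import subprocess


def flannel_daemonsets(daemonset_list: dict) -> list[dict]:
    """Summarize DaemonSets whose name contains 'flannel' from kubectl JSON output."""
    summary = []
    for item in daemonset_list.get("items", []):
        meta = item.get("metadata", {})
        name = meta.get("name", "")
        if "flannel" not in name:  # assumption: Flannel kept its upstream naming
            continue
        status = item.get("status", {})
        summary.append({
            "namespace": meta.get("namespace", ""),
            "name": name,
            "desired": status.get("desiredNumberScheduled", 0),
            "ready": status.get("numberReady", 0),
        })
    return summary


def fetch_daemonsets() -> dict:
    """Run kubectl and parse its output; needs kubectl and a working kubeconfig."""
    out = subprocess.run(
        ["kubectl", "get", "daemonsets", "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

If flannel_daemonsets(fetch_daemonsets()) returns an empty list, the cluster no longer runs Flannel, so any flannel.1 device still on a node is leftover host state.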
Then inspect one affected node directly: its network interfaces and routes, and the contents of /etc/cni/net.d and /var/lib/cni.
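For the interface part of that inspection, ip -j link show prints every device as JSON, and ports enslaved to a bridge carry a master field. A minimal sketch, assuming the default Flannel device names cni0 and flannel.1, that separates those devices from the veth ports still attached to cni0:

```python
import json
import subprocess

FLANNEL_DEVICES = {"cni0", "flannel.1"}  # assumed default names; adjust if customized


def flannel_links(links: list[dict]) -> dict[str, list[str]]:
    """Split `ip -j link show` output into Flannel devices and ports still on cni0."""
    devices = sorted(link["ifname"] for link in links if link.get("ifname") in FLANNEL_DEVICES)
    ports = sorted(link["ifname"] for link in links if link.get("master") == "cni0")
    return {"devices": devices, "cni0_ports": ports}


def read_links() -> list[dict]:
    """Query the kernel through iproute2's JSON output mode."""
    out = subprocess.run(["ip", "-j", "link", "show"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)
```

On a node where Flannel has been removed and no pods are running, flannel_links(read_links()) should return two empty lists.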
Typical symptoms include:
- flannel.1 still exists after Flannel was removed
- cni0 has an address from an old pod subnet
- /etc/cni/net.d/10-flannel.conflist points to a plugin that no longer exists
- /var/lib/cni/networks contains stale IP allocation files
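The second symptom can be checked mechanically by comparing the IPv4 address on cni0 (from ip -j addr show cni0) with the pod CIDR the control plane assigned to the node (spec.podCIDR in kubectl get node NAME -o json). A minimal sketch, assuming a single IPv4 pod CIDR per node:

```python
import ipaddress


def cni0_outside_pod_cidr(addr_output: list[dict], pod_cidr: str) -> list[str]:
    """Return cni0 IPv4 addresses (as addr/prefix) that lie outside the node's podCIDR."""
    network = ipaddress.ip_network(pod_cidr)
    stale = []
    for link in addr_output:
        if link.get("ifname") != "cni0":
            continue
        for info in link.get("addr_info", []):
            if info.get("family") != "inet":  # IPv6 link-local addresses are ignored
                continue
            if ipaddress.ip_address(info["local"]) not in network:
                stale.append(f'{info["local"]}/{info["prefixlen"]}')
    return stale
```

A non-empty result means the bridge was configured for an earlier allocation and still carries that old address.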
If pods are failing with FailedCreatePodSandBox, stale CNI files are usually the first place to look.
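To see which pods are hitting that error, you can filter the cluster's events by reason. The sketch reads kubectl get events --all-namespaces -o json and uses the core/v1 Event fields reason, involvedObject and message. The CNI error text usually appears in the message; a common Flannel-related one reports that cni0 already has an IP address different from the one being assigned.

```python
def sandbox_failures(event_list: dict) -> list[tuple[str, str, str]]:
    """Return (namespace, pod, message) for every FailedCreatePodSandBox event."""
    failures = []
    for event in event_list.get("items", []):
        if event.get("reason") != "FailedCreatePodSandBox":
            continue
        obj = event.get("involvedObject", {})
        failures.append((obj.get("namespace", ""), obj.get("name", ""), event.get("message", "")))
    return failures
```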
Safe Cleanup Sequence
The safest repair is to remove workload pressure first, then clean node state in a controlled order.
1. Drain the node
Run kubectl drain against the node. Draining cordons it, so no new pods are scheduled there, and evicts the existing workloads before you clean interfaces and CNI cache data.
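A sketch that builds, and optionally runs, the drain command. kubectl drain refuses to proceed while DaemonSet-managed pods (Flannel's included) are present unless --ignore-daemonsets is given, and refuses to evict pods using emptyDir volumes unless --delete-emptydir-data is given. The timeout value and the dry-run default are choices of this sketch.

```python
import shlex
import subprocess


def drain_command(node: str, timeout: str = "5m") -> list[str]:
    """Build a kubectl drain invocation for one node."""
    return [
        "kubectl", "drain", node,
        "--ignore-daemonsets",     # otherwise drain stops on DaemonSet-managed pods
        "--delete-emptydir-data",  # allow eviction of pods that use emptyDir volumes
        f"--timeout={timeout}",
    ]


def drain(node: str, dry_run: bool = True) -> str:
    """Return the command line; execute it only when dry_run is False."""
    cmd = drain_command(node)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)
```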
2. Stop the runtime and kubelet
On the affected node, stop kubelet first and then the container runtime (for example containerd or CRI-O).
Stopping these services prevents them from recreating interfaces while you delete them.
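The order matters: stopping kubelet first means it cannot ask the runtime for new sandboxes while the runtime is shutting down. A dry-run sketch, assuming systemd units named kubelet and containerd (CRI-O's unit is crio):

```python
import subprocess

RUNTIME_UNIT = "containerd"  # assumption: change to "crio" or your runtime's unit


def stop_plan(runtime_unit: str = RUNTIME_UNIT) -> list[list[str]]:
    """kubelet first, runtime second, so nothing recreates sandboxes mid-cleanup."""
    return [
        ["systemctl", "stop", "kubelet"],
        ["systemctl", "stop", runtime_unit],
    ]


def run_plan(plan: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(("DRY RUN: " if dry_run else "") + " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```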
3. Remove stale Flannel and CNI state
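With kubelet and the runtime stopped, delete the cni0 and flannel.1 interfaces, remove the Flannel config from /etc/cni/net.d, and clear the stale CNI state under /var/lib/cni. If your distribution uses iptables, you may also need to remove stale rules. Do that carefully and only after confirming they belong to an old Flannel install.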
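A dry-run sketch of that removal. It assumes the default Flannel names (cni0, flannel.1, 10-flannel.conflist) and that Flannel was the only CNI keeping state in /var/lib/cni on this node; removing /run/flannel is an extra assumption about where Flannel kept its lease file. Firewall rules are deliberately left out.

```python
import subprocess
from pathlib import Path

# Assumed default kube-flannel names; adjust for customized installs.
INTERFACES = ["cni0", "flannel.1"]
PATHS = ["/etc/cni/net.d/10-flannel.conflist", "/var/lib/cni", "/run/flannel"]


def cleanup_plan(present: set[str]) -> list[list[str]]:
    """Commands to remove Flannel interfaces that exist, then its files and CNI state."""
    plan = []
    for name in INTERFACES:
        if name in present:
            plan.append(["ip", "link", "set", name, "down"])
            plan.append(["ip", "link", "delete", name])
    for path in PATHS:
        plan.append(["rm", "-rf", path])
    return plan


def present_interfaces() -> set[str]:
    """Device names listed under /sys/class/net."""
    net = Path("/sys/class/net")
    return {p.name for p in net.iterdir()} if net.is_dir() else set()


def apply(plan: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(("DRY RUN: " if dry_run else "") + " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Running apply(cleanup_plan(present_interfaces())) first shows exactly what would be removed; pass dry_run=False only after reviewing it.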
4. Restart services
Start the container runtime first and then kubelet. If Flannel is still the intended CNI, the DaemonSet should recreate the expected configuration. If you are migrating away from Flannel, install the replacement CNI before uncordoning the node.
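To confirm the restart worked, you can wait for the node's Ready condition. A sketch that polls kubectl get node NAME -o json and reads status.conditions; the attempt count and interval are placeholders.

```python
import json
import subprocess
import time


def node_is_ready(node: dict) -> bool:
    """True when the node object's Ready condition has status "True"."""
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False


def wait_until_ready(name: str, attempts: int = 30, interval: float = 10.0) -> bool:
    """Poll kubectl until the node reports Ready or the attempts run out."""
    for _ in range(attempts):
        out = subprocess.run(["kubectl", "get", "node", name, "-o", "json"],
                             check=True, capture_output=True, text=True).stdout
        if node_is_ready(json.loads(out)):
            return True
        time.sleep(interval)
    return False
```

A node with no usable CNI configuration reports NotReady because its network plugin is not initialized, so during a migration expect Ready only after the replacement CNI is in place.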
5. Return the node to service
Uncordon the node with kubectl uncordon so the scheduler can place pods on it again. At this point, new pods should receive a clean network setup instead of inheriting stale bridges and allocations.
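A quick way to confirm that new pods on the node draw from its current range rather than an old allocation is to compare each pod's status.podIP with the node's spec.podCIDR. The sketch takes the JSON from kubectl get pods --all-namespaces -o json plus the podCIDR string, skips host-network pods because they use the node's own address, and assumes a single IPv4 pod CIDR.

```python
import ipaddress


def pods_outside_cidr(pod_list: dict, node_name: str, pod_cidr: str) -> list[str]:
    """Return namespace/name of pods on node_name whose IP lies outside pod_cidr."""
    network = ipaddress.ip_network(pod_cidr)
    bad = []
    for pod in pod_list.get("items", []):
        spec, status = pod.get("spec", {}), pod.get("status", {})
        if spec.get("nodeName") != node_name or spec.get("hostNetwork"):
            continue
        ip = status.get("podIP")
        if ip and ipaddress.ip_address(ip) not in network:
            meta = pod.get("metadata", {})
            bad.append(f'{meta.get("namespace")}/{meta.get("name")}')
    return bad
```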
When the Problem Is a Cluster Migration
Cleanup errors often happen during a CNI migration, for example moving from Flannel to Cilium or Calico. The mistake is usually mixing two CNIs on the same node. Kubernetes does not coordinate that handoff automatically.
The reliable sequence is:
1. drain nodes
2. remove old CNI config files
3. clear node-local CNI state
4. install the new CNI
5. uncordon nodes
If you skip step 2 or step 3, the runtime may keep invoking the wrong plugin or reusing incompatible IP state.
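Leftover configs are easy to spot before uncordoning. Runtimes built on libcni load .conf, .conflist and .json files from /etc/cni/net.d, and a common default is to use only the file that sorts first by name (containerd's max_conf_num defaults to 1, for example), so a forgotten 10-flannel.conflist can win over the new CNI's file. Treat that "first file wins" rule as an assumption to verify against your runtime's configuration; the sketch lists candidates in that order.

```python
from pathlib import Path

CNI_EXTENSIONS = {".conf", ".conflist", ".json"}


def cni_configs(names: list[str]) -> list[str]:
    """Return CNI config file names in the lexical order runtimes typically use."""
    return sorted(n for n in names if Path(n).suffix in CNI_EXTENSIONS)


def check_single_cni(conf_dir: str = "/etc/cni/net.d") -> list[str]:
    """Warn when more than one CNI config is present; return them in load order."""
    directory = Path(conf_dir)
    names = [p.name for p in directory.iterdir() if p.is_file()] if directory.is_dir() else []
    configs = cni_configs(names)
    if len(configs) > 1:
        print(f"warning: {len(configs)} CNI configs found; {configs[0]} is likely the one used")
    return configs
```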
Common Pitfalls
The biggest mistake is deleting the Flannel manifest and assuming the node is clean. DaemonSet removal does not guarantee removal of Linux interfaces or cached IP allocations.
Another pitfall is cleaning only Kubernetes objects while ignoring the host. Flannel is a node-level networking component, so host inspection matters more than the pod list alone.
It is also easy to restart kubelet too early. If /etc/cni/net.d still points at a broken Flannel config, kubelet will immediately reproduce the same sandbox creation failure.
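Before restarting kubelet, you can check that every plugin a conflist names actually has a binary. Each entry under plugins has a type that the runtime resolves to an executable in the CNI binary directory, commonly /opt/cni/bin; that path is an assumption, since some distributions use another one. A minimal sketch:

```python
import json
from pathlib import Path


def missing_plugins(conflist_text: str, available: set[str]) -> list[str]:
    """Return plugin types referenced by a .conflist that have no matching binary."""
    conf = json.loads(conflist_text)
    types = [plugin.get("type", "") for plugin in conf.get("plugins", [])]
    return [t for t in types if t not in available]


def installed_plugins(bin_dir: str = "/opt/cni/bin") -> set[str]:
    """File names in the CNI binary directory (empty if it does not exist)."""
    directory = Path(bin_dir)
    return {p.name for p in directory.iterdir()} if directory.is_dir() else set()
```

For example, missing_plugins(Path("/etc/cni/net.d/10-flannel.conflist").read_text(), installed_plugins()) returning a non-empty list means kubelet would hit the same sandbox failure; reinstall the binaries or remove the config first.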
Finally, do not run broad firewall flush commands on production nodes unless you fully understand the node's policy model. Remove only rules tied to the abandoned Flannel setup.
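To review candidates before touching anything, you can filter iptables-save output for lines that mention a marker tied to the old install, such as the old pod network CIDR or a Flannel chain name (recent Flannel releases add chains such as FLANNEL-POSTRTG and FLANNEL-FWD; older releases may not). The markers are assumptions you supply, and the sketch only reports lines.

```python
import subprocess


def matching_rules(save_output: str, markers: list[str]) -> list[str]:
    """Return rule and chain lines from iptables-save output that mention any marker."""
    hits = []
    for line in save_output.splitlines():
        # Rules start with "-A", chain declarations with ":"; skip tables, comments, COMMIT.
        if not (line.startswith("-A ") or line.startswith(":")):
            continue
        if any(marker in line for marker in markers):
            hits.append(line)
    return hits


def current_rules() -> str:
    """Dump the live ruleset (read-only); usually requires root."""
    return subprocess.run(["iptables-save"], check=True,
                          capture_output=True, text=True).stdout
```

After review, a reported -A rule can be removed by repeating its rule specification with -D in the matching table (for example iptables -t nat -D ...).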
Summary
- Flannel cleanup issues usually come from stale host networking state, not just missing Kubernetes objects.
- Check interfaces, CNI config files, IP allocation directories, and routing rules on the node.
- Drain the node and stop kubelet and the container runtime before cleanup so they do not race with manual changes.
- Remove stale cni0, flannel.1, and old CNI cache data in a controlled order.
- During CNI migrations, never leave old and new plugin configs active on the same node.

