K8s NodePort service is “unreachable by IP” only on 2/4 slaves in the cluster
Introduction
When a NodePort service works on some Kubernetes nodes but not others, the problem is usually node-specific networking rather than the service definition alone. The most important checks are externalTrafficPolicy, kube-proxy health, firewall rules, and whether the failing nodes actually have a usable path to the service endpoints.
What NodePort should normally do
A NodePort service exposes the same port on every node. With the default externalTrafficPolicy: Cluster, traffic hitting any node can be forwarded to a backing pod even if that pod is running on another node.
So if only 2 out of 4 nodes work, something is breaking that expectation on the failing nodes.
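To see exactly which nodes break that expectation, it can help to probe the same NodePort on every node in one pass. The following is a minimal sketch, assuming the service speaks HTTP; the function name `probe_nodeport`, the port 30080, and the IP addresses are hypothetical placeholders, not values from the cluster in question.

```shell
# Probe one NodePort on several node IPs and report which nodes answer.
# Assumes an HTTP service; the port and IPs passed in are placeholders.
probe_nodeport() {
  local port="$1"; shift
  local ip
  for ip in "$@"; do
    # --connect-timeout keeps an unreachable node from hanging the loop.
    if curl --silent --output /dev/null --connect-timeout 3 "http://${ip}:${port}/"; then
      echo "${ip} reachable"
    else
      echo "${ip} UNREACHABLE"
    fi
  done
}

# Example with hypothetical addresses and port:
#   probe_nodeport 30080 10.0.0.11 10.0.0.12 10.0.0.13 10.0.0.14
```

The node addresses can be read from the INTERNAL-IP or EXTERNAL-IP columns of `kubectl get nodes -o wide`.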
Check externalTrafficPolicy first
The first thing to look at is the service's externalTrafficPolicy field.
If the service has externalTrafficPolicy: Local, then only nodes with a local endpoint will accept traffic successfully. Nodes without a local backing pod can appear unreachable by design.
That is often the exact explanation when NodePort works only on a subset of nodes.
Verify endpoint placement by listing the service's pods together with the nodes they run on.
If only two nodes host matching pods, externalTrafficPolicy: Local explains the pattern immediately.
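Both facts can be read with kubectl. This is a sketch rather than the original article's commands: the helper names `traffic_policy` and `nodes_hosting_pods`, the service name `web`, and the selector `app=web` are assumptions for illustration. Note that the second helper lists pods matching a label selector, which is only a proxy for ready endpoints.

```shell
# Print the service's externalTrafficPolicy ("Cluster" or "Local").
traffic_policy() {
  local svc="$1" ns="${2:-default}"
  kubectl get service "$svc" -n "$ns" -o jsonpath='{.spec.externalTrafficPolicy}'
}

# Print the unique node names that host pods matching a label selector.
nodes_hosting_pods() {
  local selector="$1" ns="${2:-default}"
  kubectl get pods -n "$ns" -l "$selector" \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort -u
}

# Example with a hypothetical service and selector:
#   traffic_policy web
#   nodes_hosting_pods app=web
```

If the first command prints Local and the second prints exactly the nodes where NodePort works, the behavior is by design.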
Inspect kube-proxy on the failing nodes
If the service uses the default policy (externalTrafficPolicy: Cluster) and still fails only on some nodes, kube-proxy is the next suspect.
Check that a kube-proxy pod is running on each failing node and review its logs.
Also confirm kube-proxy mode and health on the bad nodes:
- iptables or IPVS mode is consistent
- rules are actually programmed
- no repeated sync errors appear in logs
If kube-proxy is broken on two nodes, NodePort can fail there while working elsewhere.
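These kube-proxy checks can be sketched as a few small helpers. The label `k8s-app=kube-proxy` is the kubeadm convention and is an assumption here; other distributions may label or deploy kube-proxy differently. The helper names are hypothetical, and the rule count only applies when kube-proxy runs in iptables mode.

```shell
# Find the kube-proxy pod scheduled on one node.
# ASSUMPTION: kubeadm-style label k8s-app=kube-proxy in kube-system.
kube_proxy_pod_on() {
  local node="$1"
  kubectl get pods -n kube-system -l k8s-app=kube-proxy \
    --field-selector "spec.nodeName=${node}" \
    -o jsonpath='{.items[0].metadata.name}'
}

# Show recent kube-proxy log lines on that node that look like errors.
kube_proxy_errors_on() {
  local node="$1" pod
  pod="$(kube_proxy_pod_on "$node")" || return 1
  kubectl logs -n kube-system "$pod" --tail=200 | grep -iE 'error|failed' || true
}

# Count NodePort rules in an iptables-save dump read from stdin.
# Run on the node itself as root (iptables mode only):
#   iptables-save -t nat | nodeport_rule_count
nodeport_rule_count() {
  grep -c -- '-A KUBE-NODEPORTS' || true
}
```

A count of 0 on a failing node, next to a non-zero count on a working node, suggests kube-proxy has not programmed the rules there.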
Check node firewalls and cloud security rules
NodePort needs the chosen port to be reachable at the OS and network level. Compare the firewall rules that apply to that port on a failing node against those on a working node.
Also verify:
- host firewall rules
- cloud security groups
- node security policies
- on-prem network ACLs
A partial firewall rollout is a very common reason two nodes fail while two nodes work.
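One way to spot a partial rollout is to reduce each node's host firewall dump to the rules that mention the NodePort and then diff the results. The sketch below assumes root access on the nodes; `rules_mentioning_port`, the port 30080, and the file names are hypothetical. It does not cover cloud security groups, which have to be compared in the provider's own tooling.

```shell
# Print only the firewall rules that mention a given port as a whole word.
# Reads a rule dump on stdin, so any text dump of rules can be piped in.
# Limitation: range rules such as 30000:32767 are not matched.
rules_mentioning_port() {
  local port="$1"
  grep -w -- "$port" || true
}

# On each node, as root, for a hypothetical NodePort 30080:
#   iptables-save | rules_mentioning_port 30080 > "/tmp/fw-$(hostname).txt"
# Then copy both files to one machine and compare them:
#   diff /tmp/fw-node-good.txt /tmp/fw-node-bad.txt
```

A DROP or REJECT rule that appears only in the failing node's file is a strong lead.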
Confirm pod-to-pod and node-to-pod networking
NodePort depends on the node being able to reach service endpoints through the cluster network. If the CNI plugin is unhealthy or routing is broken on specific nodes, external NodePort access can fail there.
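Good checks are whether the CNI pods on the affected nodes are running and whether those nodes report Ready.
Then test from inside the cluster by starting a temporary pod with a shell.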
Inside the shell, probe the pod IPs and service cluster IP to see whether the node-specific failure is really an external-only problem or part of a broader network issue.
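The in-cluster checks could look like the following sketch. It assumes the CNI agents run as pods in kube-system, which is common but not universal, and that the cluster can pull a busybox image. The helper names, the pod name `net-debug`, and the image tag are illustrative choices.

```shell
# List kube-system pods scheduled on one node; CNI agents usually run here.
system_pods_on() {
  local node="$1"
  kubectl get pods -n kube-system -o wide --field-selector "spec.nodeName=${node}"
}

# Start a throwaway busybox shell pinned to a specific node.
# The pod is deleted automatically when the shell exits (--rm).
debug_shell_on() {
  local node="$1"
  kubectl run net-debug --rm -it --restart=Never --image=busybox:1.36 \
    --overrides="{\"apiVersion\":\"v1\",\"spec\":{\"nodeName\":\"${node}\"}}" -- sh
}

# Inside the shell, with placeholder addresses for an HTTP backend:
#   wget -qO- -T 3 http://<pod-ip>:<container-port>/
#   wget -qO- -T 3 http://<cluster-ip>:<service-port>/
```

Running the debug shell once on a working node and once on a failing node shows whether pod traffic from the failing node reaches the backends at all.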
Compare working and failing nodes directly
A reliable debugging pattern is side-by-side comparison:
- choose one working node and one failing node
- compare kube-proxy logs
- compare firewall state
- compare routes and IP forwarding settings
- compare whether pods for the service are local on each node
That usually narrows the issue faster than reading generic service YAML repeatedly.
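The side-by-side comparison can be partly automated by running the same command on both nodes and diffing the output. This sketch assumes SSH access to the nodes; `diff_nodes` and the host names are hypothetical.

```shell
# Run the same command on two nodes over SSH and diff the results.
# Returns 0 when the outputs match and non-zero when they differ.
diff_nodes() {
  local good="$1" bad="$2"; shift 2
  local a b rc
  a="$(mktemp)"; b="$(mktemp)"
  ssh "$good" "$@" > "$a" 2>&1
  ssh "$bad" "$@" > "$b" 2>&1
  diff "$a" "$b"; rc=$?
  rm -f "$a" "$b"
  return "$rc"
}

# Examples with hypothetical host names:
#   diff_nodes node-good node-bad sysctl net.ipv4.ip_forward
#   diff_nodes node-good node-bad ip route show
```

Lines prefixed with `<` come from the working node and lines prefixed with `>` from the failing one, so any difference in forwarding or routing stands out immediately.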
Common Pitfalls
The most common mistake is overlooking externalTrafficPolicy: Local, which intentionally makes NodePort depend on local endpoints. Another is assuming kube-proxy is healthy everywhere just because the service works on some nodes. Teams also often check the service and pods but forget host firewalls or cloud security groups, which can differ across nodes. Node-specific CNI problems are another frequent cause when packets can enter the node but cannot reach backend pods. Finally, people often debug only from outside the cluster and skip comparison between a known-good node and a failing node, which slows the diagnosis.
Summary
- If NodePort works on only some nodes, start by checking externalTrafficPolicy.
- Confirm whether the failing nodes actually have local service endpoints.
- Inspect kube-proxy health and rule programming on the affected nodes.
- Compare firewall and security-group settings between working and failing nodes.
- Verify cluster networking from the failing nodes to the backend pods.
- Use one working node and one failing node as a structured comparison pair during debugging.

