
K8s NodePort service is “unreachable by IP” only on 2/4 slaves in the cluster


Introduction

When a NodePort service works on some Kubernetes nodes but not others, the problem is usually node-specific networking rather than the service definition alone. The most important checks are externalTrafficPolicy, kube-proxy health, firewall rules, and whether the failing nodes actually have a usable path to the service endpoints.

What NodePort should normally do

A NodePort service exposes the same port, by default allocated from the 30000-32767 range, on every node. With the default externalTrafficPolicy: Cluster, traffic hitting any node can be forwarded to a backing pod even if that pod is running on another node.

So if only 2 out of 4 nodes work, something is breaking that expectation on the failing nodes.
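To confirm exactly which nodes break that expectation, one option is to probe the NodePort on every node address from outside the cluster. The sketch below assumes the service speaks HTTP; probe_nodeport is an illustrative helper name, and the port and IP addresses in the example are hypothetical.

```shell
# Probe a NodePort on each node IP and report which nodes answer.
# Usage: probe_nodeport <port> <ip> [<ip> ...]
# Assumption: the service speaks HTTP, so curl can be used as the probe.
probe_nodeport() {
  port="$1"
  shift
  for ip in "$@"; do
    # The timeouts keep an unreachable node from hanging the loop.
    if curl -s -o /dev/null --connect-timeout 3 --max-time 5 "http://$ip:$port/"; then
      echo "$ip ok"
    else
      echo "$ip FAILED"
    fi
  done
}

# Example with a hypothetical port and addresses:
# probe_nodeport 30080 10.0.0.11 10.0.0.12 10.0.0.13 10.0.0.14
```

Node addresses can be read from the INTERNAL-IP and EXTERNAL-IP columns of kubectl get nodes -o wide. Any HTTP response counts as reachable here, since curl only returns a non-zero status when the connection itself fails or times out.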

Check externalTrafficPolicy first

One high-value check is this:

bash
kubectl get svc my-service -o yaml

If the service has:

yaml
externalTrafficPolicy: Local

then kube-proxy only delivers NodePort traffic to pods on the node that received it. A node with no ready local pod for the service drops that traffic, so it appears unreachable by design.

That is often the exact explanation when NodePort works only on a subset of nodes.

Verify endpoint placement:

bash
kubectl get endpoints my-service -o wide
kubectl get pods -o wide

If only two nodes host matching pods, externalTrafficPolicy: Local explains the pattern immediately.
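The comparison between "all nodes" and "nodes that host an endpoint" can also be scripted. The following is a minimal sketch: nodes_without_endpoints is a hypothetical helper that takes two newline-separated lists of node names, and my-service is the same placeholder service name used above.

```shell
# Print the nodes from the first list that do not appear in the second.
# Both arguments are newline-separated node names.
nodes_without_endpoints() {
  all_nodes="$1"
  endpoint_nodes="$2"
  printf '%s\n' "$all_nodes" | while IFS= read -r node; do
    [ -n "$node" ] || continue
    # -x matches the whole line, -F treats the node name as a literal string.
    printf '%s\n' "$endpoint_nodes" | grep -qxF -- "$node" || echo "$node"
  done
}

# Against a live cluster (my-service is a placeholder):
# policy=$(kubectl get svc my-service -o jsonpath='{.spec.externalTrafficPolicy}')
# all=$(kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}')
# eps=$(kubectl get endpoints my-service -o jsonpath='{range .subsets[*].addresses[*]}{.nodeName}{"\n"}{end}')
# echo "policy: $policy"
# nodes_without_endpoints "$all" "$eps"
```

If the policy prints Local and the nodes listed by the helper are the same ones that fail from outside, the behavior is expected rather than a fault.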

Inspect kube-proxy on the failing nodes

If the service uses the default externalTrafficPolicy: Cluster and still fails only on some nodes, kube-proxy is the next suspect.

Check:

bash
kubectl -n kube-system get pods -o wide | grep kube-proxy
kubectl -n kube-system logs <kube-proxy-pod-name>

Also confirm kube-proxy mode and health on the bad nodes:

  • iptables or IPVS mode is consistent
  • rules are actually programmed
  • no repeated sync errors appear in logs

If kube-proxy is broken on two nodes, NodePort can fail there while working elsewhere.
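To avoid scanning the whole pod list by eye, the kube-proxy pod for one specific node can be selected directly. This sketch assumes the kubeadm-style label k8s-app=kube-proxy, which other installers may not use; the helper names are illustrative only.

```shell
# Print the kube-proxy pod scheduled on a given node.
# Assumption: pods carry the kubeadm-style label k8s-app=kube-proxy.
kube_proxy_pod_on() {
  kubectl -n kube-system get pods \
    -l k8s-app=kube-proxy \
    --field-selector "spec.nodeName=$1" \
    -o name
}

# Show recent kube-proxy log lines on that node that mention errors or failures.
kube_proxy_errors_on() {
  pod="$(kube_proxy_pod_on "$1")"
  [ -n "$pod" ] || { echo "no kube-proxy pod found on $1" >&2; return 1; }
  kubectl -n kube-system logs "$pod" --tail=200 | grep -iE 'error|fail'
}

# Example: kube_proxy_errors_on <bad-node>
```

Running the same helper against a working node gives a baseline, which makes it easier to tell real sync failures from routine log noise. The proxy mode itself can usually be read on the node with curl -s http://localhost:10249/proxyMode, assuming kube-proxy's metrics endpoint is left at its default address.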

Check node firewalls and cloud security rules

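NodePort needs the chosen port to be reachable at the OS and network level. In iptables and IPVS modes, kube-proxy handles NodePort traffic with kernel rules rather than a listening process, so a missing socket in the ss output is not conclusive on its own. On a working node and a failing node, compare:

bash
sudo iptables -L -n
# NodePort rules live in the nat table when kube-proxy runs in iptables mode
sudo iptables -t nat -L KUBE-NODEPORTS -n
sudo ss -ltnp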

Also verify:

  • host firewall rules
  • cloud security groups
  • node security policies
  • on-prem network ACLs

A partial firewall rollout is a very common reason two nodes fail while two nodes work.
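Comparing firewall state by eye is error-prone, so it can help to dump the rules on each node and diff them. The sketch below is one way to do that under simple assumptions: normalize_rules is a hypothetical helper that strips the comment lines and per-chain packet counters that iptables-save emits, since those differ between nodes even when the rules are identical.

```shell
# Strip comment lines and per-chain [packets:bytes] counters from
# iptables-save output so that two dumps can be compared with diff.
normalize_rules() {
  grep -v '^#' | sed -E 's/ \[[0-9]+:[0-9]+\]$//'
}

# On each node (the output file name is only an example):
#   sudo iptables-save | normalize_rules > "/tmp/rules.$(hostname)"
# Then, with both files copied to one machine:
#   diff rules.good-node rules.bad-node
```

Some differences are expected, for example rules that depend on which pods run locally, so the diff is a starting point for spotting an extra DROP or REJECT rule rather than a pass/fail test.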

Confirm pod-to-pod and node-to-pod networking

NodePort depends on the node being able to reach service endpoints through the cluster network. If the CNI plugin is unhealthy or routing is broken on specific nodes, external NodePort access can fail there.

Good checks:

bash
kubectl get nodes -o wide
kubectl describe node <bad-node>
kubectl -n kube-system get pods -o wide

Then test from inside the cluster:

bash
kubectl run net-debug --rm -it --image=busybox -- sh

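Inside the shell, probe the pod IPs and the service cluster IP. The scheduler may place this debug pod on any node, so check with kubectl get pod net-debug -o wide that it landed on a failing node; only then does the result show whether the node-specific failure is really an external-only problem or part of a broader network issue.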
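Because the test is only meaningful when it runs on an affected node, the debug pod can be pinned there explicitly. This is a sketch that sets spec.nodeName through the --overrides flag of kubectl run; pin_to_node and debug_on_node are hypothetical helper names.

```shell
# Build the --overrides JSON that pins a pod to one node via spec.nodeName.
pin_to_node() {
  printf '{"apiVersion":"v1","spec":{"nodeName":"%s"}}' "$1"
}

# Start a throwaway busybox shell on a specific node.
debug_on_node() {
  kubectl run net-debug --rm -it --restart=Never --image=busybox \
    --overrides="$(pin_to_node "$1")" -- sh
}

# Example: debug_on_node <bad-node>
```

From that shell, a command such as wget -qO- -T 3 http://<pod-ip>:<port>/ checks a backend pod directly, and the same request against the service cluster IP checks the service path. Repeating both from a pod on a working node shows whether the cluster network itself differs between the two.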

Compare working and failing nodes directly

A reliable debugging pattern is side-by-side comparison:

  1. choose one working node and one failing node
  2. compare kube-proxy logs
  3. compare firewall state
  4. compare routes and IP forwarding settings
  5. compare whether pods for the service are local on each node

That usually narrows the issue faster than reading generic service YAML repeatedly.
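The comparison steps above can be partly automated by capturing the same small snapshot on each node and diffing the results. The following is a rough sketch rather than a complete diagnostic: node_snapshot is a hypothetical helper that records IP forwarding, the routing table, and how many saved iptables lines mention kube-proxy's KUBE- chains.

```shell
# Collect a small networking snapshot from the node this runs on.
# Run it on one working and one failing node, then diff the two outputs.
node_snapshot() {
  echo "## ip_forward"
  sysctl -n net.ipv4.ip_forward
  echo "## routes"
  ip route show
  echo "## KUBE- rule lines"
  # A count near zero suggests kube-proxy has not programmed its rules
  # (only meaningful when kube-proxy runs in iptables mode).
  sudo iptables-save | grep -c 'KUBE-'
}

# On each node (the output file name is only an example):
#   node_snapshot > "/tmp/snapshot.$(hostname)"
# Then: diff snapshot.good-node snapshot.bad-node
```

Routes will naturally differ in node-specific addresses, so the useful signals are a disabled ip_forward, a missing pod network route, or a very different rule count.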

Common Pitfalls

The most common mistake is overlooking externalTrafficPolicy: Local, which intentionally makes NodePort depend on local endpoints. Another is assuming kube-proxy is healthy everywhere just because the service works on some nodes. Teams also often check the service and pods but forget host firewalls or cloud security groups, which can differ across nodes. Node-specific CNI problems are another frequent cause when packets can enter the node but cannot reach backend pods. Finally, people often debug only from outside the cluster and skip comparison between a known-good node and a failing node, which slows the diagnosis.

Summary

  • If NodePort works on only some nodes, start by checking externalTrafficPolicy.
  • Confirm whether the failing nodes actually have local service endpoints.
  • Inspect kube-proxy health and rule programming on the affected nodes.
  • Compare firewall and security-group settings between working and failing nodes.
  • Verify cluster networking from the failing nodes to the backend pods.
  • Use one working node and one failing node as a structured comparison pair during debugging.
