How Kubernetes Service Discovery Hides a Lot of Moving Parts Behind One DNS Name

April 20, 2026


Service-to-service traffic in Kubernetes looks deceptively simple. A pod makes an HTTP call to payments.default.svc.cluster.local and it just works. That hostname behaves like any other DNS name. But there are at least four moving parts hiding behind it, and understanding them is the difference between debugging by guessing and debugging on purpose.

Here is what actually happens on that one call.

  1. The client pod does a DNS lookup. The pod's resolver is configured to point at CoreDNS, which runs as its own Deployment inside the cluster. CoreDNS looks up the Service name and returns the ClusterIP. This IP is not a real host. It is a virtual IP that exists only in the packet-rewriting rules programmed on each node.
  2. The request leaves the pod headed for the ClusterIP. On the node, the rules that kube-proxy programs (iptables or IPVS), or a replacement such as Cilium's eBPF programs, depending on your setup, intercept traffic to that ClusterIP and rewrite the destination to a real pod IP.
  3. The pod IP comes from the Service's Endpoints object (EndpointSlices in current clusters), which the control plane keeps in sync with whichever pods currently match the Service's label selector and pass their readiness probes. When a pod is added, removed, or fails its readiness check, the Endpoints object typically changes within seconds.
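  4. The request hits one healthy backend pod. The next connection might hit a different one, because the backend is picked per connection and requests that reuse a keep-alive connection stay on the same pod. The client never has to know.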
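
The four steps can be sketched as a toy model in Python. This is an illustration only, not how CoreDNS or kube-proxy are implemented: the IP addresses, the two lookup tables, and the function names are all made up, and a random choice stands in for whatever selection the node's data plane actually performs.

```python
import random

# Toy lookup tables. Every address here is hypothetical.
DNS_RECORDS = {"payments.default.svc.cluster.local": "10.96.12.34"}  # name -> ClusterIP
ENDPOINTS = {"10.96.12.34": ["10.244.1.7", "10.244.2.9", "10.244.3.4"]}  # ClusterIP -> ready pod IPs


def resolve(name: str) -> str:
    """Step 1: the DNS lookup returns the Service's ClusterIP."""
    return DNS_RECORDS[name]


def rewrite_destination(cluster_ip: str, rng: random.Random) -> str:
    """Steps 2 and 3: node-level rules swap the ClusterIP for one ready pod IP."""
    backends = ENDPOINTS[cluster_ip]
    if not backends:
        raise ConnectionError(f"no ready endpoints behind {cluster_ip}")
    return rng.choice(backends)


def connect(name: str, rng: random.Random) -> str:
    """Step 4: the connection lands on a pod the caller never named."""
    return rewrite_destination(resolve(name), rng)
```

Calling connect("payments.default.svc.cluster.local", random.Random()) repeatedly returns different pod IPs over time, while the name and the ClusterIP the caller sees never change.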

The mental model worth keeping: DNS gives you the stable name. Service gives you the stable virtual IP. Endpoints is the changing list of pods behind it. Pods are the replaceable backends.

That separation is the whole point. Pods are designed to be cattle, not pets. They get rescheduled, OOM-killed, rolled during deploys, evicted off a draining node. Pod IPs change constantly. If callers had to track pod IPs, every deploy would break every caller. Instead, callers talk to the stable name and Kubernetes keeps the routing table honest.
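
A minimal sketch of how the backend list follows the pods, assuming a simplified Pod record with just an IP, labels, and a readiness flag. The Pod class, the ready_endpoints function, and all addresses are hypothetical stand-ins for what the control plane does, and the sketch assumes a non-empty selector.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    ip: str
    labels: dict
    ready: bool


def ready_endpoints(selector: dict, pods: list) -> list:
    """Return the IPs of pods that carry every selector label and are ready."""
    return sorted(
        pod.ip
        for pod in pods
        if pod.ready and all(pod.labels.get(k) == v for k, v in selector.items())
    )


selector = {"app": "payments"}

# Before a rollout: two ready payments pods.
before = [
    Pod("10.244.1.7", {"app": "payments"}, True),
    Pod("10.244.2.9", {"app": "payments"}, True),
]

# After a rollout: one pod replaced, one replacement still starting up,
# and an unrelated pod that the selector should ignore.
after = [
    Pod("10.244.2.9", {"app": "payments"}, True),
    Pod("10.244.4.2", {"app": "payments"}, True),
    Pod("10.244.4.3", {"app": "payments"}, False),
    Pod("10.244.5.5", {"app": "ledger"}, True),
]

endpoints_before = ready_endpoints(selector, before)
endpoints_after = ready_endpoints(selector, after)
```

The two lists differ, but nothing a caller holds (the DNS name, the ClusterIP) had to change between them.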

Where this leaks in production: stale DNS caches. Some language runtimes and client libraries cache DNS resolutions far longer than the record's TTL, in the worst case for the lifetime of the process (looking at you, JVM, which by default caches forever when a security manager is installed). So when CoreDNS returns a new ClusterIP after a Service recreation, the client never notices. Fixes are language-specific: set the networkaddress.cache.ttl security property to a small number of seconds on the JVM, or use a client library that respects DNS TTLs, or restart the pod after Service churn. The cleaner fix is to never delete and recreate Services in normal operation. Update them in place so the ClusterIP stays stable.
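
To make the failure mode concrete, here is a small sketch of a client-side DNS cache with a configurable TTL. It is not the JVM's cache or any real library's; TtlDnsCache, its lookup callback, and the injectable clock are assumptions made for illustration. An infinite TTL models the cache-forever behavior, and a short TTL models a client that re-resolves.

```python
import time
from typing import Callable, Dict, Tuple


class TtlDnsCache:
    """Caches each lookup for ttl_seconds, then asks the resolver again."""

    def __init__(
        self,
        lookup: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (address, expiry)

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh: serve from cache
        address = self._lookup(name)  # expired or missing: resolve again
        self._entries[name] = (address, now + self._ttl)
        return address
```

With ttl_seconds=float("inf"), an entry never expires, so a Service recreated with a new ClusterIP stays invisible to the client until the process restarts. With a TTL of a few seconds, the next lookup after expiry picks up the new address.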

Service discovery feels like magic until something breaks. Then it is just four boxes with names.

Key takeaway

Kubernetes gives you one stable DNS name and hides the churn underneath. CoreDNS, the Service IP, and the Endpoints object work together so pods can come and go without breaking callers.

Originally posted on LinkedIn.

