Debugging DNS Resolution in Kubernetes
Introduction
When DNS breaks in Kubernetes, the symptom is usually simple: a Pod cannot resolve a service name or an external host. The fix is rarely one command, because Kubernetes DNS depends on several layers working together: the Pod configuration, the cluster DNS service, network connectivity, and the target record itself.
Start from inside a Pod
The first question is whether the problem happens from the perspective of the workload. Launch a temporary Pod and test resolution there.
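One way to launch such a Pod is sketched below. The Pod name dns-debug, the busybox:1.28 image, and the helper name dns_debug_pod are assumptions made for illustration; any image that ships nslookup should work. Running it in the namespace of the affected workload keeps the search domains and network policies comparable.

```shell
# Hypothetical helper: start a throwaway Pod in the given namespace and
# open an interactive shell in it. --rm deletes the Pod when the shell exits.
dns_debug_pod() {
  kubectl run dns-debug \
    --namespace "${1:-default}" \
    --image=busybox:1.28 \
    --restart=Never \
    --rm -it -- sh
}

# Usage: dns_debug_pod my-namespace
```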
Inside the shell:
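A minimal baseline might look like the following, assuming the image provides nslookup. The function name dns_baseline is hypothetical, and example.com stands in for whichever external host your workload needs. Wrapping the commands in a function is only a convenience for rerunning the same checks after each change.

```shell
# Run inside the debug Pod's shell.
dns_baseline() {
  # Which nameserver and search domains is this Pod using?
  cat /etc/resolv.conf
  # Cluster-internal name: the API server Service in the default namespace.
  nslookup kubernetes.default
  # External name: a placeholder for any public host.
  nslookup example.com
}

# Usage: dns_baseline
```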
This tells you:
- whether cluster-internal names resolve
- whether external names resolve
- which nameserver and search domains the Pod is using
Without this baseline, it is hard to know whether the problem is specific to one application or to the cluster DNS path more broadly.
Understand what /etc/resolv.conf should look like
With the default DNS policy, a Pod's /etc/resolv.conf typically contains:
- the cluster DNS service IP as nameserver
- search suffixes such as svc.cluster.local
Typical example:
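For illustration, the resolver file of a Pod in the default namespace might look like the sample written below. The address 10.96.0.10 and the cluster.local domain are common kubeadm defaults used here as assumptions; your cluster may use different values. The helper pod_nameservers is hypothetical and only prints the nameserver entries so they can be compared with the cluster DNS Service IP.

```shell
# Write an illustrative resolv.conf to a scratch file (all values assumed).
sample=/tmp/resolv.conf.sample
cat > "$sample" <<'EOF'
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5
EOF

# Hypothetical helper: print every nameserver address in a resolver file.
# Defaults to the live /etc/resolv.conf when no file is given.
pod_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

pod_nameservers "$sample"   # prints 10.96.0.10
```

On a real cluster, the address printed inside the Pod should normally match the ClusterIP of the DNS Service, which you can read with kubectl get svc kube-dns -n kube-system, assuming the Service uses the conventional kube-dns name.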
If the nameserver is missing or unexpected, the Pod may not be using cluster DNS the way you think it is.
Check CoreDNS or kube-dns
Most modern clusters use CoreDNS. Verify that the DNS Pods are healthy:
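A sketch of that check, assuming the DNS Pods run in kube-system and carry the conventional k8s-app=kube-dns label (CoreDNS deployments typically keep that label for compatibility). The function name dns_pods is hypothetical.

```shell
# List the cluster DNS Pods with status, restart count, and node placement.
dns_pods() {
  kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
}

# Usage: dns_pods
```

Pods that are not Running, or that show a growing restart count, are the first thing to look at.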
Then inspect logs:
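One possible way to pull recent logs from all DNS Pods at once, under the same label assumption as above. The helper name dns_logs and the default of 50 lines are arbitrary choices.

```shell
# Show the last N log lines (default 50) from every DNS Pod.
# --prefix tags each line with the Pod it came from.
dns_logs() {
  kubectl logs -n kube-system -l k8s-app=kube-dns --tail="${1:-50}" --prefix
}

# Usage: dns_logs 200
```

For a Pod that keeps restarting, adding --previous to kubectl logs for that specific Pod shows the output of the container instance that crashed.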
If the CoreDNS Pods are crash-looping, throttled, or logging upstream failures, the problem may have nothing to do with the application Pod itself.
Test service resolution and endpoints together
If a service name resolves but traffic still fails, DNS may not be the real issue. Check whether the Service actually has endpoints.
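The two lookups can be paired as in this sketch. The helper name service_and_endpoints is hypothetical; pass your own Service name and namespace.

```shell
# Show a Service next to its endpoints.
# $1 = Service name, $2 = namespace (defaults to "default").
service_and_endpoints() {
  kubectl get service "$1" -n "${2:-default}" -o wide
  kubectl get endpoints "$1" -n "${2:-default}"
}

# Usage: service_and_endpoints my-service my-namespace
```

If the ENDPOINTS column shows <none>, no ready Pod currently matches the Service selector.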
A DNS lookup can succeed while the Service still has no backing Pods. In that case, the resolver is fine and the application wiring is broken elsewhere.
Look for network policies and DNS port blocks
Cluster DNS depends on traffic to port 53, normally over UDP and over TCP when a response is too large for a single UDP packet. A restrictive NetworkPolicy can block those requests.
Check policies in the namespace:
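A short sketch of that check; the helper name list_network_policies is hypothetical.

```shell
# List the NetworkPolicies in a namespace, then print their full rules.
# $1 = namespace (defaults to "default").
list_network_policies() {
  kubectl get networkpolicy -n "${1:-default}"
  kubectl describe networkpolicy -n "${1:-default}"
}

# Usage: list_network_policies my-namespace
```

In the describe output, look for policies whose policy types include Egress and whose rules do not mention port 53.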
If egress is locked down, make sure the workload can reach the cluster DNS service on the needed ports.
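As a hedged sketch, a policy along these lines allows DNS egress for every Pod in a namespace. The policy name and file path are made up for illustration, and the selectors assume that the DNS Pods run in kube-system with the k8s-app=kube-dns label and that the namespace carries the automatic kubernetes.io/metadata.name label. Adjust both to match your cluster before applying anything.

```shell
# Write an illustrative policy to a scratch file; review it before applying.
policy_file="${TMPDIR:-/tmp}/allow-dns-egress.yaml"
cat > "$policy_file" <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress        # hypothetical name
spec:
  podSelector: {}               # applies to every Pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF

# Apply with: kubectl apply -n <namespace> -f "$policy_file"
```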
This is a common cause when one namespace has DNS issues and another does not.
Check Pod DNS settings
Pods can override the default DNS behavior through fields such as:
- dnsPolicy
- dnsConfig
For example:
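The manifest below is an illustrative sketch, not a recommended configuration. The Pod name, image, file path, and the ndots value are assumptions chosen to show where the two fields live in the Pod spec.

```shell
# Write an illustrative Pod manifest to a scratch file.
pod_file="${TMPDIR:-/tmp}/dns-override-demo.yaml"
cat > "$pod_file" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: dns-override-demo       # hypothetical name
spec:
  # ClusterFirst is what a Pod gets when dnsPolicy is omitted.
  # Other accepted values: Default, ClusterFirstWithHostNet, None.
  dnsPolicy: ClusterFirst
  dnsConfig:
    options:
      - name: ndots
        value: "2"              # illustrative override of the resolver option
  containers:
    - name: app
      image: busybox:1.28
      command: ["sleep", "3600"]
EOF

# Apply with: kubectl apply -f "$pod_file"
```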
If a Pod uses dnsPolicy: Default, it inherits the DNS configuration of the node it runs on instead of the normal Kubernetes cluster-first resolver configuration. Despite the name, Default is not the default policy; ClusterFirst is.
That can be correct for special workloads, but it is also a frequent cause of confusion when copied from another manifest without understanding the tradeoff.
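To see what a running Pod actually uses, a query like the following can help. The helper name pod_dns_settings is hypothetical; the second output line is empty when the Pod sets no dnsConfig.

```shell
# Print the dnsPolicy and dnsConfig of a running Pod, one per line.
# $1 = Pod name, $2 = namespace (defaults to "default").
pod_dns_settings() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{.spec.dnsPolicy}{"\n"}{.spec.dnsConfig}{"\n"}'
}

# Usage: pod_dns_settings my-pod my-namespace
```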
Verify external resolution separately
If internal names resolve but external names do not, the cluster DNS service may be healthy while its upstream forwarding path is not.
From the test Pod:
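A small comparison, sketched under the assumption that the image's nslookup exits non-zero when a name cannot be resolved. The function name compare_dns is hypothetical and example.com is a placeholder for a real external dependency. The internal name is queried first and the external name second.

```shell
# Run inside the test Pod: report which of the two lookups succeed.
compare_dns() {
  for name in kubernetes.default example.com; do
    if nslookup "$name" >/dev/null 2>&1; then
      echo "OK   $name"
    else
      echo "FAIL $name"
    fi
  done
}

# Usage: compare_dns
```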
If only the second query fails, inspect CoreDNS configuration and upstream reachability rather than the service-discovery side of Kubernetes.
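The CoreDNS configuration can be read as in this sketch, which assumes the kubeadm-style layout: a ConfigMap named coredns in kube-system with the configuration stored under the Corefile key. Managed distributions may name or store it differently, and the helper name coredns_config is hypothetical.

```shell
# Print the Corefile that CoreDNS is configured with.
coredns_config() {
  kubectl get configmap coredns -n kube-system -o jsonpath='{.data.Corefile}'
}

# Usage: coredns_config
```

In a kubeadm-style Corefile, a line such as forward . /etc/resolv.conf means non-cluster names are sent to the resolvers of the node that CoreDNS runs on, so those upstream resolvers must be reachable from that node.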
Debug one layer at a time
A good DNS troubleshooting flow is:
- confirm the failure from inside a Pod
- inspect /etc/resolv.conf
- verify CoreDNS health and logs
- test the Service and its endpoints
- check namespace network policies
- review Pod DNS overrides
That order keeps you from guessing too early.
Common Pitfalls
The biggest mistake is assuming every "cannot reach service" issue is a DNS problem. If the service resolves but has no endpoints, DNS is not the root cause.
Another issue is testing only from your workstation or from a node rather than from inside an affected Pod. Kubernetes DNS behavior is defined at the Pod level, so the Pod perspective matters most.
Developers also forget about dnsPolicy and dnsConfig. A copied manifest can silently opt out of the normal cluster DNS behavior.
Finally, restrictive NetworkPolicy rules often block DNS traffic in ways that look like resolver failure. Always check whether the Pod can actually talk to the DNS service.
Summary
- Start DNS debugging from inside an affected Pod.
- Inspect /etc/resolv.conf to verify the nameserver and search domains.
- Check CoreDNS health and logs in kube-system.
- Validate Service endpoints so you do not confuse service wiring with DNS resolution.
- Review network policies and Pod DNS settings when the failure is namespace-specific.

