How to fix kube-system/metrics-server when the APIService status is False (MissingEndpoints)?
Introduction
When the metrics-server APIService shows False with a MissingEndpoints message, Kubernetes is telling you that the service backing that API has no ready endpoints. In practical terms, the control plane cannot reach a healthy metrics-server pod through the expected service, so commands such as kubectl top will fail.
What MissingEndpoints Usually Means
The metrics-server deployment normally exposes a service in the kube-system namespace. That service selects the metrics-server pods, and Kubernetes publishes endpoints only for pods that are both selected and ready.
If the APIService says MissingEndpoints, one of these is usually true:
- the pods are not running
- the pods are running but not ready
- the service selector does not match the pod labels
- the deployment is failing before readiness succeeds
That is why the fix starts with inspection, not with blind reinstallation.
Check the Deployment, Service, and APIService
Start by looking at the objects that participate in the chain.
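The commands below are a minimal sketch of that inspection, wrapped in a shell function so they can be rerun after each change. The function name inspect_metrics_server is hypothetical, and the object names assume the upstream manifest (a deployment and service both called metrics-server in kube-system, and an APIService named v1beta1.metrics.k8s.io); adjust them if your installation differs.

```shell
# Assumed names from the upstream metrics-server manifest; change if yours differ.
NS=kube-system
NAME=metrics-server

# Hypothetical helper: prints each object in the APIService -> service -> pod chain.
inspect_metrics_server() {
  # APIService: the AVAILABLE column shows False (MissingEndpoints) when broken.
  kubectl get apiservice v1beta1.metrics.k8s.io
  # Deployment: compare READY against the desired replica count.
  kubectl -n "$NS" get deployment "$NAME"
  # Service: -o wide adds the SELECTOR column.
  kubectl -n "$NS" get service "$NAME" -o wide
  # Endpoints: an empty ENDPOINTS column means no ready pod is selected.
  kubectl -n "$NS" get endpoints "$NAME"
  # Recent logs from a pod of the deployment.
  kubectl -n "$NS" logs deployment/"$NAME" --tail=50
}
```

Paste the function into a shell session and call inspect_metrics_server to run all five checks in order.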
You want to confirm three things:
- the deployment has available replicas
- the service points at the correct labels
- the pod logs do not show kubelet TLS or address-selection failures
If the endpoints object is empty, the service is not seeing any ready pods. That narrows the problem quickly.
Verify Labels and Readiness
A surprisingly common issue is a selector mismatch. The service may be looking for one label set while the deployment template uses another. Compare both sides directly.
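One way to compare the two sides is to print the service selector next to the pod template labels. This is a sketch under the assumption that the installation uses the k8s-app=metrics-server label from the upstream manifest; Helm-based installs typically use different labels. The function name compare_selector_and_labels is hypothetical.

```shell
# Hypothetical helper: prints the service selector and the labels it should match.
compare_selector_and_labels() {
  ns=kube-system
  echo "service selector:"
  kubectl -n "$ns" get service metrics-server -o jsonpath='{.spec.selector}{"\n"}'
  echo "pod template labels:"
  kubectl -n "$ns" get deployment metrics-server -o jsonpath='{.spec.template.metadata.labels}{"\n"}'
  # Assumed label from the upstream manifest; no pods listed suggests a mismatch.
  echo "pods matching k8s-app=metrics-server:"
  kubectl -n "$ns" get pods -l k8s-app=metrics-server --show-labels
}
```

Every key and value in the selector must appear in the pod labels. Extra pod labels are harmless, but a missing or different selector key leaves the endpoints empty.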
If labels match but endpoints are still missing, inspect readiness. A pod can be Running and still not be eligible as a service endpoint if its readiness probe is failing.
You should also check events. Look for probe failures, certificate errors, DNS issues, or network-policy denials.
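The readiness and event checks can be sketched as follows. The label selector k8s-app=metrics-server is again an assumption taken from the upstream manifest, and check_readiness_and_events is a hypothetical function name.

```shell
# Hypothetical helper: shows pod readiness, probe details, and recent events.
check_readiness_and_events() {
  ns=kube-system
  # READY shows 0/1 while the readiness probe is failing, even if STATUS is Running.
  kubectl -n "$ns" get pods -l k8s-app=metrics-server
  # describe prints the probe configuration, container state, and pod events.
  kubectl -n "$ns" describe pods -l k8s-app=metrics-server
  # Namespace-wide events, oldest first, to catch scheduling or image problems.
  kubectl -n "$ns" get events --sort-by=.lastTimestamp
}
```

A pod listed as Running with READY 0/1 is the case where the service has no endpoints even though the pod exists.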
Fix Common Kubelet Connectivity Problems
On many self-managed clusters, metrics-server starts but cannot talk to kubelets because of certificate validation or because it is trying the wrong node address type. When that happens, the pods often never become ready, which leads to MissingEndpoints.
One common fix is to add kubelet flags to the metrics-server container arguments in the deployment.
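A possible form of that change is a JSON patch that appends two arguments to the container. This sketch assumes the metrics-server container is the first container in the pod template and already has an args list, as in the upstream manifest; the function name add_kubelet_flags and the address-type order are illustrative choices, not a required configuration.

```shell
# Hypothetical helper: appends kubelet-related flags to the metrics-server args.
# Assumes containers[0] is metrics-server and that its args list already exists.
add_kubelet_flags() {
  kubectl -n kube-system patch deployment metrics-server --type=json -p '[
    {"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP"},
    {"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}
  ]'
}
```

Check the existing arguments first, because the patch appends unconditionally and would duplicate a flag that is already present. Drop the second entry if the logs show only an address-selection problem.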
Use --kubelet-insecure-tls only if your cluster setup requires it and you understand the tradeoff. It is common in lab clusters and some bootstrap environments, but it is not the first choice for hardened production setups.
After the patch, wait for the rollout to finish and recheck the endpoints and the APIService.
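The recheck might look like the sketch below. The 120-second timeout and the function name wait_and_recheck are arbitrary, hypothetical choices; the object names assume the upstream manifest.

```shell
# Hypothetical helper: waits for the new pods, then re-verifies the whole chain.
wait_and_recheck() {
  # Blocks until the deployment finishes rolling out or the timeout expires.
  kubectl -n kube-system rollout status deployment/metrics-server --timeout=120s
  # ENDPOINTS should now list at least one pod address.
  kubectl -n kube-system get endpoints metrics-server
  # AVAILABLE should change from False (MissingEndpoints) to True.
  kubectl get apiservice v1beta1.metrics.k8s.io
  # End-to-end check through the metrics API.
  kubectl top nodes
}
```

The final command can still fail for a short time after the pod becomes ready, since metrics-server needs to complete at least one scrape before it can serve data.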
When Reapplying the Official Manifest Helps
If the installation is old or heavily modified, reapplying the upstream manifest for your cluster version can be faster than piecemeal repair. That is especially true when the service, RBAC rules, or security context drifted from the expected state.
Still, do not skip the diagnosis steps. Reapplying manifests will not fix a cluster-level network or kubelet certificate problem on its own.
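If you do decide to reapply, a minimal sketch is shown below. The default URL is the latest-release manifest published by the kubernetes-sigs/metrics-server project; the function name reapply_upstream_manifest is hypothetical, and the optional argument lets you pass a release-specific manifest URL or a local file that matches your cluster version.

```shell
# Hypothetical helper: applies the upstream manifest, or a manifest you pass in.
reapply_upstream_manifest() {
  # Default: latest upstream release. Pass a pinned URL or file path to override.
  manifest="${1:-https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml}"
  kubectl apply -f "$manifest"
}
```

Reapplying may revert manual changes to the deployment, such as extra kubelet flags, so review the container arguments again afterwards.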
Common Pitfalls
The biggest mistake is treating MissingEndpoints as an RBAC issue by default. RBAC can break metrics access, but this specific message usually points at service-to-pod wiring or pod readiness first.
Another mistake is looking only at pod phase. A Running pod is not enough. If readiness fails, the service still has no endpoints, and the APIService remains unhealthy.
Be careful with blanket use of --kubelet-insecure-tls. It is often copied from answers without understanding why it helped. If your kubelet certificates are correct, you may not need it at all.
Finally, avoid patching only the service when the real failure is inside the deployment. Empty endpoints are usually a symptom, not the root cause.
Summary
- MissingEndpoints means the metrics-server service has no ready backend pods.
- Check the deployment, service, endpoints, APIService, and pod logs together.
- Verify that service selectors match pod labels and that readiness succeeds.
- Fix kubelet certificate or address issues when logs point there.
- Reapply the official manifest only after you understand whether the failure is config drift or cluster connectivity.

