Kubernetes API server: serving pod logs

Introduction

When you run kubectl logs, it looks like the Kubernetes API server is streaming logs directly from your application container. In reality, the API server brokers the request across several components. It authenticates and authorizes your client, proxies the request to the kubelet on the node that runs the pod, and relays the byte stream back. Understanding that path matters because log failures are often caused by RBAC, node connectivity, kubelet problems, or container runtime configuration, not by your app.

Teams often debug logs at only one layer, then get stuck. For example, they grant get on pods but forget the pods/log subresource, or they troubleshoot API server health while the kubelet certificate has expired. A reliable mental model of how logs are served lets you diagnose issues quickly and avoid blind cluster restarts.

Core Sections

1. Know the request path for kubectl logs

The typical path is:

  1. kubectl sends a request to API server: GET /api/v1/namespaces/<ns>/pods/<pod>/log.
  2. API server authenticates and authorizes the caller.
  3. API server resolves pod-to-node and forwards to the kubelet on that node.
  4. Kubelet reads logs from the container runtime log files and streams back.

You can inspect behavior with verbose output:

bash
kubectl logs my-pod -n prod -c web --v=8

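If this hangs after authorization, suspect the hop from the API server to the kubelet: node readiness, kubelet health, or firewall and security-group rules between the control plane and nodes (the API server must be able to reach the kubelet, which listens on port 10250 by default).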
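
To make the request path concrete, the same endpoint from step 1 can be called directly with kubectl get --raw. The following is a minimal sketch, not a replacement for kubectl logs: pod_log_path is a hypothetical helper name introduced here, and container and tailLines are query parameters accepted by the log subresource.

```shell
# pod_log_path NAMESPACE POD [CONTAINER] [TAIL_LINES]
# Prints the API path that a log request targets.
# (Hypothetical helper name; it only builds a string and calls nothing.)
pod_log_path() {
  local ns="$1" pod="$2" container="${3:-}" tail="${4:-}"
  local path="/api/v1/namespaces/${ns}/pods/${pod}/log"
  local query=""

  # Append optional query parameters only when they were supplied.
  if [ -n "$container" ]; then
    query="container=${container}"
  fi
  if [ -n "$tail" ]; then
    query="${query:+${query}&}tailLines=${tail}"
  fi

  printf '%s%s\n' "$path" "${query:+?${query}}"
}

# Usage against a real cluster (assumes a pod "my-pod" in namespace "prod"):
#   kubectl get --raw "$(pod_log_path prod my-pod web 20)"
```

If the raw request returns log lines while kubectl logs does not, the problem is more likely in client flags or kubeconfig than in the API server or kubelet.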

2. Validate permissions explicitly

Reading pod logs requires the get verb on the pods/log subresource. Permission to get pods alone is not enough, and many roles miss this detail.

yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-log-reader
  namespace: prod
rules:
  - apiGroups: [""]  # "" is the core API group, where pods live
    # pods/log is a separate subresource and must be listed explicitly
    resources: ["pods", "pods/log"]
    verbs: ["get", "list"]

Bind the Role to the subject with a RoleBinding, then verify access:

bash
kubectl auth can-i get pods/log -n prod --as=system:serviceaccount:prod:ci-bot

This avoids misattributing “forbidden” errors to cluster instability.
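
A small wrapper can check both permissions at once, so a missing pods/log grant shows up next to a working pods grant. This is a sketch under a few assumptions: check_log_access is a hypothetical helper name, the binding name in the comment is made up for illustration, and the function relies on kubectl auth can-i returning a non-zero exit status when the answer is "no".

```shell
# One way to create the binding for the Role above (binding name is illustrative):
#   kubectl create rolebinding pod-log-reader-binding \
#     --role=pod-log-reader --serviceaccount=prod:ci-bot -n prod

# check_log_access NAMESPACE SUBJECT
# Reports whether SUBJECT may get pods and pods/log in NAMESPACE.
# Returns 0 only if both checks pass. (Hypothetical helper name.)
check_log_access() {
  local ns="$1" subject="$2" resource missing=0

  for resource in pods pods/log; do
    # can-i prints yes/no; only its exit status is used here.
    if kubectl auth can-i get "$resource" -n "$ns" --as="$subject" >/dev/null 2>&1; then
      echo "ok: get $resource"
    else
      echo "missing: get $resource"
      missing=1
    fi
  done

  return "$missing"
}

# Usage:
#   check_log_access prod system:serviceaccount:prod:ci-bot
```

Note that impersonating with --as itself requires impersonation rights, so this check is usually run by a cluster administrator.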

3. Troubleshoot streaming edge cases

Large or long-lived streams can fail due to idle timeouts in proxies or API gateways in front of your control plane. For busy debugging sessions, scope and shorten requests:

bash
kubectl logs my-pod -n prod --since=10m --tail=500
kubectl logs my-pod -n prod -c web -f

For crash loops, use previous logs:

bash
kubectl logs my-pod -n prod --previous

If --previous is empty, the container may not have produced logs before exiting, or rotation removed old files.
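
During a crash loop it is easy to forget which container instance you are reading. The sketch below tries the previous instance first and falls back to the current one when nothing is available. crash_logs is a hypothetical helper name, and the 200-line tail is an arbitrary choice for illustration.

```shell
# crash_logs NAMESPACE POD [CONTAINER]
# Prints logs from the previous container instance when they exist,
# otherwise from the current instance. (Hypothetical helper name.)
crash_logs() {
  local ns="$1" pod="$2" container="${3:-}"
  local prev

  # ${container:+...} adds "-c NAME" only when a container was given.
  if prev="$(kubectl logs "$pod" -n "$ns" ${container:+-c "$container"} \
               --tail=200 --previous 2>/dev/null)" && [ -n "$prev" ]; then
    echo "--- previous container instance ---"
    printf '%s\n' "$prev"
  else
    echo "--- no previous logs; showing current instance ---"
    kubectl logs "$pod" -n "$ns" ${container:+-c "$container"} --tail=200
  fi
}

# Usage:
#   crash_logs prod my-pod web
```

The explicit banner lines make it clear in saved output which instance the lines came from.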

4. Confirm node and runtime health

Because kubelet and runtime serve the actual content, node-level diagnostics are essential.

bash
kubectl get nodes
kubectl describe node <node-name>
kubectl describe pod my-pod -n prod

Look for NotReady nodes, disk pressure, or runtime restarts. On managed services, node upgrades or expired or mismatched certificates between the control plane and kubelets can break log access while workloads still appear “Running.”
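
The first two checks can be chained so that you go from a pod straight to the readiness of the node serving its logs. This is a minimal sketch: pod_node_status is a hypothetical helper name, and it reads only the node's Ready condition, so disk pressure and runtime restarts still need kubectl describe node.

```shell
# pod_node_status NAMESPACE POD
# Prints the node a pod is scheduled on and that node's Ready condition.
# Returns 0 if the node is Ready, 1 if it is not (or the pod is unscheduled),
# and 2 if a kubectl call fails. (Hypothetical helper name.)
pod_node_status() {
  local ns="$1" pod="$2" node ready

  node="$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.nodeName}')" || return 2
  if [ -z "$node" ]; then
    echo "pod $pod is not scheduled to a node yet"
    return 1
  fi

  # The Ready condition's status is the string True, False, or Unknown.
  ready="$(kubectl get node "$node" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')" || return 2

  echo "node=$node ready=$ready"
  [ "$ready" = "True" ]
}

# Usage:
#   pod_node_status prod my-pod || kubectl describe node <node-name>
```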

Common Pitfalls

  • Granting get on pods but not on pods/log, then chasing nonexistent API server bugs.
  • Omitting -c in multi-container pods, so kubectl reads the default container instead of the one you meant.
  • Using -f through unstable ingress/proxy paths that terminate idle or long-running HTTP streams.
  • Forgetting --previous for crash loops, which hides the failure right before restart.
  • Assuming running pods guarantee readable logs even when kubelet or node runtime is degraded.

Summary

The API server serves pod logs as a secure proxy, not a raw log store. It enforces authentication and authorization, resolves the pod's node, and relays data from the kubelet and container runtime. When log retrieval fails, debug each hop in order: RBAC for pods/log, API server request behavior, node and kubelet reachability, and runtime log availability. This layered approach turns a vague “kubectl logs is broken” incident into a short, ordered checklist and shortens time to recovery.

A practical way to keep this issue from returning is to turn the fix into a lightweight runbook. Capture the exact environment assumptions (tool versions, runtime flags, cluster or platform settings, and required dependencies), then store a short verification command sequence that any teammate can run from a clean setup. This makes troubleshooting deterministic instead of person-dependent and reduces rework during on-call incidents.

It also helps to add one automated guardrail in CI or pre-deploy checks that validates the critical assumption described above. That guardrail might be a linter rule, a smoke test, a schema check, a policy validation step, or a minimal integration test. When the same class of failure is caught before release, teams spend less time on emergency debugging and more time on controlled improvements.

