Fluentd Pods Running via DaemonSet Getting Terminated with Warnings on Google Container Engine

Introduction

When Fluentd runs as a DaemonSet on GKE (Google Kubernetes Engine, formerly Google Container Engine), pod terminations usually point to node pressure, resource limits, configuration crashes, or expected node lifecycle events rather than to the DaemonSet controller itself. The fastest way to diagnose the problem is to work out which of three things you are seeing: eviction, crash looping, or node-driven termination.

Start With Pod Status and Events

Begin by looking at the actual reason Kubernetes recorded:

bash
kubectl get pods -n kube-system
kubectl describe pod -n kube-system <fluentd-pod-name>
kubectl get events -n kube-system --sort-by=.lastTimestamp

The describe output will often reveal whether the pod was:

  • Evicted,
  • OOMKilled,
  • repeatedly restarted, or
  • removed because the node was drained or replaced.

Those are very different failure modes.
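
To read the recorded reason directly instead of scanning the describe output, a small helper along these lines may be useful. This is only a sketch: the function name termination_reasons and the pod name in the usage comment are hypothetical, and it assumes the pod object still exists in the API.

```shell
# Hypothetical helper: print the pod-level reason (for example Evicted)
# and each container's last termination reason (for example OOMKilled).
# $1 = pod name, $2 = namespace (assumed default: kube-system).
termination_reasons() {
  printf 'pod reason: '
  kubectl get pod "$1" -n "${2:-kube-system}" -o jsonpath='{.status.reason}'
  printf '\n'
  # One line per container: name, last termination reason, restart count.
  kubectl get pod "$1" -n "${2:-kube-system}" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{": last="}{.lastState.terminated.reason}{" restarts="}{.restartCount}{"\n"}{end}'
}

# Usage (placeholder pod name):
#   termination_reasons fluentd-abc12
```

An empty pod reason combined with a non-zero restart count suggests a container-level crash rather than an eviction.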

Common Causes on GKE

Fluentd DaemonSet terminations commonly come from:

  • memory pressure from log buffering,
  • disk pressure on nodes,
  • aggressive resource limits,
  • bad output or parser configuration,
  • node upgrades or preemptions,
  • host path changes or permissions issues.

Because Fluentd reads logs continuously and may buffer aggressively during downstream slowdowns, resource-related failures are especially common.
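
Node-level memory and disk pressure can be checked before looking at Fluentd itself. The sketch below lists only nodes that currently report MemoryPressure or DiskPressure; the function name nodes_under_pressure is hypothetical, and the JSONPath expression follows the pattern used in the kubectl cheat sheet.

```shell
# Hypothetical helper: list nodes whose MemoryPressure or DiskPressure
# condition is currently True. Prints nothing if no node is under pressure.
nodes_under_pressure() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{@.metadata.name}:{range @.status.conditions[*]}{@.type}={@.status};{end}{"\n"}{end}' \
    | grep -E 'MemoryPressure=True|DiskPressure=True'
}

# Usage:
#   nodes_under_pressure
```

If a node that hosted a terminated Fluentd pod shows up here, eviction is a more likely explanation than a Fluentd crash.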

Resource Pressure and OOM

If Fluentd is parsing large volumes of logs or backing up during output failures, it can consume more memory than expected. Check whether the container was killed for memory reasons.

Look for:

  • OOMKilled in the container state,
  • high memory use in monitoring,
  • tight memory limits in the DaemonSet spec.

If that is the issue, increase requests and limits carefully or reduce buffering pressure in the Fluentd configuration.
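
One way to judge how tight the limit is would be to put live usage next to the configured limit. The following is a sketch under assumptions: fluentd_memory is a hypothetical function name, and kubectl top only works when a metrics source such as metrics-server is available in the cluster.

```shell
# Hypothetical helper: show current per-container usage and the configured
# memory limit for one pod. $1 = pod name, $2 = namespace (assumed default:
# kube-system).
fluentd_memory() {
  # Live CPU and memory per container (requires a metrics source).
  kubectl top pod "$1" -n "${2:-kube-system}" --containers
  # Configured memory limit per container, taken from the pod spec.
  kubectl get pod "$1" -n "${2:-kube-system}" \
    -o jsonpath='{range .spec.containers[*]}{.name}{" limit="}{.resources.limits.memory}{"\n"}{end}'
}

# Usage (placeholder pod name):
#   fluentd_memory fluentd-abc12
```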

Node Lifecycle and GKE Behavior

On GKE, nodes may be recreated, upgraded, drained, or preempted depending on cluster configuration. When a node goes away, the DaemonSet pod on it is terminated, and a new pod is created once a replacement node joins the cluster.

That is normal cluster behavior, not a Fluentd bug. Correlating pod terminations with node events avoids wasted application-level debugging and quickly shows whether the issue is systemic or pod-specific, which matters during incident response.

If warnings line up with node maintenance events, you should inspect node history rather than only the pod manifest.
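
A rough way to line pod terminations up with node history is sketched below. The function name node_history is hypothetical, and the sketch assumes the pod is still listed so that its node can be looked up. Node events are not recorded in kube-system, so the query spans all namespaces.

```shell
# Hypothetical helper: find the node a pod runs on, then show when that node
# was created and which events were recorded for it.
# $1 = pod name, $2 = namespace (assumed default: kube-system).
node_history() {
  node="$(kubectl get pod "$1" -n "${2:-kube-system}" -o jsonpath='{.spec.nodeName}')"
  echo "node: $node"
  # A creation time close to the pod termination suggests a replaced node.
  kubectl get node "$node" -o jsonpath='{.metadata.creationTimestamp}{"\n"}'
  # Events that reference this node, oldest first.
  kubectl get events --all-namespaces \
    --field-selector "involvedObject.kind=Node,involvedObject.name=$node" \
    --sort-by=.lastTimestamp
}

# Usage (placeholder pod name):
#   node_history fluentd-abc12
```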

Configuration and Output Failures

Fluentd can also terminate because the process itself crashes on bad configuration or downstream plugin problems.

Useful checks:

bash
kubectl logs -n kube-system <fluentd-pod-name>
kubectl logs -n kube-system <fluentd-pod-name> --previous

The --previous flag is important if the container has already restarted and the current logs are too new to show the crash.

Review DaemonSet Resources

A minimal resource section may be too optimistic for a noisy production cluster:

yaml
resources:
  requests:
    cpu: "100m"
    memory: "200Mi"
  limits:
    memory: "500Mi"

These values are just examples. The right settings depend on log volume, parsers, output latency, and buffer settings.
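
Before changing anything, it can help to see what the running DaemonSet actually requests. This is a minimal read-only sketch: show_daemonset_resources is a hypothetical function name, the DaemonSet name in the usage comment is a placeholder, and the exact rendering of the requests and limits maps depends on the kubectl version.

```shell
# Hypothetical helper: print requests and limits for every container in a
# DaemonSet's pod template. $1 = DaemonSet name, $2 = namespace (assumed
# default: kube-system).
show_daemonset_resources() {
  kubectl get daemonset "$1" -n "${2:-kube-system}" \
    -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{" requests="}{.resources.requests}{" limits="}{.resources.limits}{"\n"}{end}'
}

# Usage (placeholder DaemonSet name):
#   show_daemonset_resources fluentd
```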

Common Pitfalls

  • Treating every Fluentd termination as a Fluentd configuration bug.
  • Ignoring --previous logs after a restart.
  • Looking only at pod status and not at node events on GKE.
  • Setting memory limits too low for bursty logging workloads.
  • Forgetting that DaemonSet pods will be terminated during normal node replacement events.

Summary

  • Start with kubectl describe pod and recent events to identify the real termination reason.
  • Distinguish among eviction, OOM, crash looping, and node lifecycle events.
  • Check previous logs if the container has already restarted.
  • Review resource requests, limits, and buffering configuration carefully.
  • On GKE, some pod terminations are expected results of node management rather than Fluentd failures.
