How Distributed Tracing Actually Stitches a Request Together

May 8, 2026

Distributed tracing sounds magical until you see the mechanism, which turns out to be three identifiers and a discipline about passing them along.

The first time a request enters your system, usually at the edge or the API gateway, the tracing library generates a trace_id. That ID will follow this request everywhere it goes, for the rest of its life. The gateway also creates its first span_id to represent the work it is about to do. A span is one unit of work in one service: name, start time, end time, status, and a bag of attributes.
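As a rough illustration of what the gateway creates at this point, here is a minimal sketch in plain Python. The Span class, its field names, and the "GET /checkout" example are hypothetical and not any tracing library's API. The ID sizes (16 random bytes for a trace_id, 8 for a span_id) follow the W3C Trace Context sizes.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


def new_trace_id() -> str:
    # 16 random bytes rendered as 32 lowercase hex characters.
    return secrets.token_hex(16)


def new_span_id() -> str:
    # 8 random bytes rendered as 16 lowercase hex characters.
    return secrets.token_hex(8)


@dataclass
class Span:
    """One unit of work in one service (hypothetical shape, for illustration)."""
    name: str
    trace_id: str
    span_id: str = field(default_factory=new_span_id)
    parent_id: Optional[str] = None  # None marks the root span of the trace
    start_ns: int = field(default_factory=time.time_ns)
    end_ns: Optional[int] = None
    status: str = "unset"
    attributes: dict = field(default_factory=dict)

    def end(self, status: str = "ok") -> None:
        # Record the end time and final status when the work finishes.
        self.end_ns = time.time_ns()
        self.status = status


# The gateway sees the request first, so it mints the trace_id and the root span.
root = Span(name="GET /checkout", trace_id=new_trace_id())
root.attributes["http.method"] = "GET"
root.end()
```

A real SDK would also hand the finished span to an exporter. That step is left out here to keep the focus on the identifiers.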

When the gateway calls the next service, two things happen. First, the library serializes the trace context into HTTP headers, conventionally the W3C traceparent header. That header carries a version, the trace_id, the caller's current span_id, and a flags byte whose sampled bit records the sampling decision. Second, the receiving service deserializes the header, creates its own new span_id for the work it is doing, and sets parent_id to the span_id of the span that called it. That is the entire mechanism. Every span knows its trace, and every span knows its parent. The backend reassembles the tree.
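The handoff described above can be sketched in a few lines. This is a simplified illustration, not a library implementation: TraceContext, inject, extract, and start_child_span are hypothetical names, header lookup is case-sensitive here although real HTTP header names are not, and only the four-field version 00 layout of traceparent is handled.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version "-" trace-id "-" parent-id "-" trace-flags, all lowercase hex.
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceContext(NamedTuple):
    trace_id: str
    span_id: str   # the span that is making the outbound call
    sampled: bool


def inject(ctx: TraceContext, headers: dict) -> None:
    # Caller side: serialize the current context into the outgoing headers.
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(headers: dict) -> Optional[TraceContext]:
    # Receiver side: parse the header, or return None so the caller starts a new trace.
    match = _TRACEPARENT.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None  # values the W3C spec treats as invalid
    return TraceContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def start_child_span(incoming: TraceContext) -> dict:
    # Same trace_id, fresh span_id, and parent_id pointing at the caller's span.
    return {
        "trace_id": incoming.trace_id,
        "span_id": secrets.token_hex(8),
        "parent_id": incoming.span_id,
    }


# Example header value taken from the W3C Trace Context specification.
incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
ctx = extract(incoming)
child = start_child_span(ctx)

# When this service calls onward, it sends its own span_id, not the one it received.
outgoing = {}
inject(TraceContext(child["trace_id"], child["span_id"], ctx.sampled), outgoing)
```

Note that parent_id is never sent as a separate field. The receiver derives it from the span_id in the header it was handed.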

The format matters because it is the contract that lets a Node service, a Java service, and a Go service all participate in one trace. OpenTelemetry provides the vendor-neutral APIs and SDKs that implement it across languages. Jaeger and Tempo are common backends that store the spans and let you query them. OpenTelemetry's instrumentation libraries attach the headers automatically for popular HTTP clients, gRPC, and most message brokers, which is the only reason this works at all without humans wiring it by hand.

The classic failure is a broken trace. One service in the middle of the call graph forgets to propagate the headers, often because the team there hand-rolls an HTTP client or copies a request body into a new call without copying the headers along with it. The downstream services do receive a request, but with no incoming trace context, so they start a brand new trace. The result in the UI is two disconnected traces that should have been one, with the break right where the forgetful service sits. The bug is usually invisible until someone tries to debug a latency issue and notices the call graph stops short. The fix is to make outbound calls through an instrumented HTTP client, or, if you must roll your own, to inject the trace context explicitly on every outbound request.
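To make the failure concrete, here is a small sketch of a hand-rolled outbound call, once with the bug and once with the context injected. The function names and the inventory.internal URL are made up for illustration, and the requests are only constructed, never sent.

```python
import urllib.request


def build_request_broken(url: str, body: bytes) -> urllib.request.Request:
    # Copies the body into a new request but builds the headers from scratch,
    # so the trace context is dropped and downstream starts a new trace.
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


def build_request(url: str, body: bytes, trace_id: str, span_id: str,
                  sampled: bool) -> urllib.request.Request:
    # Same call, but the current span's context rides along in traceparent.
    flags = "01" if sampled else "00"
    headers = {
        "Content-Type": "application/json",
        "traceparent": f"00-{trace_id}-{span_id}-{flags}",
    }
    return urllib.request.Request(url, data=body, headers=headers)


def header_names(req: urllib.request.Request) -> set:
    # urllib normalizes header capitalization, so compare names in lowercase.
    return {name.lower() for name, _ in req.header_items()}


url = "http://inventory.internal/reserve"  # hypothetical downstream service
body = b'{"sku": "A-100"}'

broken = build_request_broken(url, body)
fixed = build_request(url, body, "4bf92f3577b34da6a3ce929d0e0e4736",
                      "00f067aa0ba902b7", sampled=True)
```

The span_id passed to build_request should be the calling service's own current span. Forwarding the incoming header unchanged would keep the trace connected but leave this service out of the tree.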

Sampling is the last piece. Storing every span from every request would crush both the backend and the wallet. Head sampling decides at the root whether to record the trace, then sets the sampled flag in traceparent so every downstream service reaches the same decision. Tail sampling waits until the trace is complete and decides then, which lets you keep all errors and slow traces while throwing away most of the boring fast ones, at the cost of extra infrastructure to buffer spans until the late decision. Both are valid. Both beat keeping nothing.
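The two strategies can be sketched side by side. Everything here is illustrative: the function names, the span dictionaries, the 500 ms threshold, and the 1% baseline rate are assumptions chosen for the example, not defaults from any tracing backend.

```python
import random
from typing import Callable, Iterable


def head_sample(rate: float, rng: Callable[[], float] = random.random) -> bool:
    # Decided once, at the root, before any work has been done.
    return rng() < rate


def downstream_should_record(traceparent: str) -> bool:
    # Downstream services do not decide again; they read the sampled bit.
    flags = traceparent.rsplit("-", 1)[-1]
    return bool(int(flags, 16) & 0x01)


def tail_keep(spans: Iterable[dict], slow_ms: float = 500.0,
              baseline_rate: float = 0.01,
              rng: Callable[[], float] = random.random) -> bool:
    # Decided after the trace is complete, with all of its spans in hand.
    spans = list(spans)
    if any(s["status"] == "error" for s in spans):
        return True  # always keep traces that contain an error
    if max((s["duration_ms"] for s in spans), default=0.0) >= slow_ms:
        return True  # always keep slow traces
    return rng() < baseline_rate  # keep a small share of the boring ones


failed = [{"status": "ok", "duration_ms": 12.0},
          {"status": "error", "duration_ms": 30.0}]
slow = [{"status": "ok", "duration_ms": 900.0}]
boring = [{"status": "ok", "duration_ms": 8.0}]
```

The tail decision needs the whole list of spans, which is why it requires something to buffer them. The head decision needs only a random number.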

Key takeaway

Distributed tracing is identifier propagation. A trace_id and the caller's span_id ride along in HTTP headers like W3C traceparent, and each receiving service records that span_id as its parent_id. Miss one hop and the trace breaks in the most confusing way possible.

Originally posted on LinkedIn.