Elasticsearch Indexing vs Search: The Two Paths That Define the System

December 23, 2025

Elasticsearch feels like one product and is really two. The index path and the search path share almost no code. They share shards, and that is most of what connects them. Once you see the seam, the system stops feeling magical and starts feeling explicable.

The index path begins when a JSON document hits the cluster. The coordinating node hashes the document's routing value, which is its ID by default, and forwards the document to the primary shard that hash selects. The shard runs each text field through an analyzer chain: character filters strip HTML or remap characters, a tokenizer breaks the text into terms, and token filters lowercase, fold accents, stem, remove stopwords, and emit synonyms. The output is a stream of terms. Each term is appended to an in-memory buffer along with the document ID and position.
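A minimal sketch of that write-side pipeline, under loud assumptions: CRC32 stands in for the real routing hash, the stopword list is made up, and the function names (route, char_filter, tokenize, token_filters, analyze) are hypothetical. It shows the shape of the chain, not Elasticsearch's implementation.

```python
import re
import zlib

# Illustrative stopword list, not the one Elasticsearch ships.
STOPWORDS = {"the", "a", "an", "and", "of"}


def route(doc_id, num_primary_shards):
    """Pick a primary shard from the document ID.

    CRC32 is a stand-in hash for this sketch; the real hash function differs.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def char_filter(text):
    """Character filter: replace HTML tags with spaces."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    """Tokenizer: split the cleaned text into word tokens."""
    return re.findall(r"\w+", text)


def token_filters(tokens):
    """Token filters: lowercase and drop stopwords, keeping original positions."""
    terms = []
    for position, token in enumerate(tokens):
        term = token.lower()
        if term not in STOPWORDS:
            terms.append((term, position))
    return terms


def analyze(text):
    """Run the full chain: character filter, tokenizer, token filters."""
    return token_filters(tokenize(char_filter(text)))


shard = route("doc-1", 5)
terms = analyze("<p>The Quick brown fox</p>")
# terms == [("quick", 1), ("brown", 2), ("fox", 3)]
```

Note that the stopword leaves a gap at position 0. Positions survive the filters, which is what phrase queries depend on.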

That buffer is not yet searchable. Every refresh interval, which defaults to 1 second, the buffer is written out as a new immutable segment and opened for search. Elasticsearch calls this a refresh; a flush is a separate operation that commits segments durably to disk. The segment is a self-contained inverted index: term to posting list, with skip lists and term frequencies. Lucene merges small segments into bigger ones in the background, similar to LSM compaction. Until the refresh fires, the document exists in the cluster but is invisible to search. That is the meaning of "near-real-time" in Elasticsearch marketing.
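The buffer-then-segment behavior can be modeled in a few lines. This is a toy, assuming a hypothetical ToyShard class with posting lists reduced to document IDs; it has no skip lists, no term frequencies, and no merging.

```python
from collections import defaultdict


class ToyShard:
    """Toy model of one shard: an unsearchable buffer plus immutable segments."""

    def __init__(self):
        self.buffer = []    # (term, doc_id, position) tuples, invisible to search
        self.segments = []  # each segment maps term -> tuple of doc IDs

    def index(self, doc_id, terms_with_positions):
        """Append analyzed terms to the in-memory buffer."""
        for term, position in terms_with_positions:
            self.buffer.append((term, doc_id, position))

    def refresh(self):
        """Turn the buffer into a new segment and make it searchable."""
        if not self.buffer:
            return
        postings = defaultdict(set)
        for term, doc_id, _position in self.buffer:
            postings[term].add(doc_id)
        # Tuples stand in for immutability: a segment is never edited in place.
        segment = {term: tuple(sorted(ids)) for term, ids in postings.items()}
        self.segments.append(segment)
        self.buffer = []

    def search(self, term):
        """Look up a term in the segments only, never in the buffer."""
        hits = []
        for segment in self.segments:
            hits.extend(segment.get(term, ()))
        return hits


shard = ToyShard()
shard.index("doc-1", [("quick", 1), ("fox", 3)])
before = shard.search("quick")  # buffer only, so nothing is found
shard.refresh()
after = shard.search("quick")   # the new segment now answers the query
```

The document is in the shard the whole time. Only the refresh makes it findable.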

The search path is independent. A query hits a coordinating node, which can be any node in the cluster. The coordinator parses the Query DSL, identifies which indices and shards are relevant, and fans out the query to one copy of each shard, primary or replica. Each shard executes locally against its own segments, scoring documents using BM25 by default, and returns the IDs and scores of its top K hits. The coordinator merges those per-shard top-K lists into a global top K, then fires a second round trip to fetch the actual document sources for the survivors. This is the famous query-then-fetch pattern.
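Query-then-fetch is easy to sketch with plain data structures. The function names, the scores, and the in-memory stores dictionary below are all hypothetical; real shards compute BM25 scores rather than receiving them.

```python
import heapq


def query_phase(shard_hits, k):
    """One shard's work: return its local top-k (score, doc_id) pairs."""
    return heapq.nlargest(k, shard_hits)


def merge_top_k(per_shard_top_k, k):
    """Coordinator's work: merge per-shard lists into a global top-k."""
    candidates = [
        (score, doc_id, shard_id)
        for shard_id, hits in per_shard_top_k.items()
        for score, doc_id in hits
    ]
    return heapq.nlargest(k, candidates)


def fetch_phase(global_top_k, stores):
    """Second round trip: load sources only for the surviving hits."""
    return [stores[shard_id][doc_id] for _score, doc_id, shard_id in global_top_k]


# Made-up scores and sources for two shards.
scored = {
    0: [(2.1, "a"), (0.4, "b"), (1.7, "c")],
    1: [(3.0, "d"), (0.9, "e")],
}
stores = {
    0: {"a": {"title": "A"}, "b": {"title": "B"}, "c": {"title": "C"}},
    1: {"d": {"title": "D"}, "e": {"title": "E"}},
}

per_shard = {shard_id: query_phase(hits, 2) for shard_id, hits in scored.items()}
winners = merge_top_k(per_shard, 2)    # [(3.0, "d", 1), (2.1, "a", 0)]
sources = fetch_phase(winners, stores)  # [{"title": "D"}, {"title": "A"}]
```

Only two of the five documents are ever fetched. That is the point of splitting the phases: scores travel everywhere, sources travel once.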

Two failure modes are worth knowing. The first is shard fan-out cost. Every search hits every relevant shard. If you have 1000 shards in an index and a query that does not benefit from routing, every search becomes 1000 small queries, and the coordinator's CPU and heap can become your bottleneck long before the data nodes do. Fewer, larger shards usually beat more, smaller shards.
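The arithmetic is worth seeing once. A back-of-the-envelope sketch, assuming each shard returns a full top-K list and ignoring any optimizations the cluster may apply; fan_out_cost is a hypothetical helper.

```python
def fan_out_cost(num_shards, top_k):
    """Return (shard_requests, candidate_hits) for one unrouted search.

    Simplified model: one request per shard, and every shard returns
    top_k candidates that the coordinator has to merge.
    """
    shard_requests = num_shards
    candidate_hits = num_shards * top_k
    return shard_requests, candidate_hits


many_small = fan_out_cost(1000, 10)  # (1000, 10000)
few_large = fan_out_cost(20, 10)     # (20, 200)
```

Same data, same query, fifty times the coordination work.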

The second is refresh pressure. Dropping the refresh interval to 100ms to chase real-time can create up to 10 small segments per shard per second, which forces aggressive merging and burns I/O that your queries also need. If your workload tolerates a 30-second lag, set the interval to 30 seconds and watch indexing throughput jump. Near-real-time is a knob, not a constant.
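Turning the knob is one settings update on the index. The sketch below only builds the HTTP request with the standard library; the cluster URL http://localhost:9200 and the index name my-index are placeholders, and refresh_interval_request is a hypothetical helper.

```python
import json
import urllib.request


def refresh_interval_request(base_url, index, interval):
    """Build a PUT /<index>/_settings request that sets index.refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder URL and index name; adjust for a real cluster.
request = refresh_interval_request("http://localhost:9200", "my-index", "30s")

# Sending is left out so the sketch runs without a cluster:
# urllib.request.urlopen(request)
```

A real cluster would also need authentication and TLS handling, which this leaves out.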

Key takeaway

Elasticsearch is two engines bolted together: a write pipeline that runs analyzer chains and builds segment-level inverted indexes, and a read pipeline that fans queries out across shards and merges scored hits. The refresh interval is the dial that decides how near-real-time near-real-time actually is.

Originally posted on LinkedIn.