Inverted Indexes: How Full-Text Search Stays Fast at a Billion Documents

March 26, 2026


When you type a query into a search box and get results in 40 milliseconds across a billion documents, you are not searching documents. You are reading a precomputed map.

That map is the inverted index. For every term in the corpus, it stores a posting list: the set of document IDs that contain the term, often with positions for phrase queries and term frequencies for ranking. A scan-the-documents approach is O(N * length). A posting-list lookup is O(matches). That is the whole performance argument.

Before the index is built, raw text passes through an analyzer. Tokenization splits on whitespace and punctuation. Lowercasing folds case. Stemming collapses running, ran, and runs to a single root. Stopword removal drops the, a, is, the high-frequency words that explode posting lists without adding signal. Get this pipeline wrong on ingest and you cannot fix it at query time. The query analyzer must match the index analyzer or nothing lines up.

Retrieval is two phases. First, boolean matching walks the posting lists for each query term and intersects or unions them. Second, ranking scores the surviving documents with BM25 or TF-IDF: terms that are rare across the corpus but frequent in a document push that document to the top. The ranker is where relevance lives. The index just produces the candidate set.

Lucene, and therefore Elasticsearch and OpenSearch, stores the index as immutable segments. Writes append a new segment. Deletes mark tombstones. A background process merges small segments into larger ones to keep the segment count bounded. Immutability is why search throughput is high and why a refresh interval exists at all.

Here is the production failure I keep watching. A logging pipeline ships every event into Elasticsearch with dynamic mapping enabled. A client library starts emitting field names derived from request IDs. Twelve million unique fields land in one cluster. The mapping itself bloats past the JVM heap. Queries time out, indexing pauses, the coordinating nodes start to OOM. The fix is not more heap. It is an explicit index template with dynamic: strict, a rejection of unexpected fields, and an upstream contract on what is allowed to be indexed.

The index is fast because it was designed for one access pattern. Respect that contract and the engine flies. Treat it like a schemaless dump and the same machinery becomes the slowest part of your stack. The lesson generalizes: any data structure that buys you orders of magnitude in one direction is buying it from somewhere, and the bill usually arrives in the form of an operator who forgot what the structure was for.

Key takeaway

Full-text search is fast because the index is inverted: every term points to the list of documents that contain it. Tokenization, ranking, and segment merges sit on top of that single idea.

Originally posted on LinkedIn. View original.


All Rights Reserved.