The Query Planner Is Guessing, and Stale Statistics Make It Guess Wrong

February 13, 2026

The query planner is a cost-based optimizer. It looks at your SQL, generates plausible physical plans, estimates the cost of each one using row count statistics, and picks the cheapest. That sounds principled, and it usually is. Until the estimates are wrong.

Three decisions matter most. The planner picks a scan type: sequential scan, index scan, or index-only scan if the index covers every referenced column. It picks a join order: which table to drive from, which to probe into. It picks a join method: nested loop when the inner side is tiny, hash join when both sides are large, merge join when both inputs are already sorted on the join key. Each choice depends on knowing roughly how many rows flow through each operator.

EXPLAIN ANALYZE is the tool that exposes the gap. It prints the plan the optimizer chose alongside the actual row counts at runtime. When the estimated rows column says 10 and the actual rows column says 1.2 million, you have found the bug. The planner thought a nested loop over 10 rows was fine. The nested loop actually ran 1.2 million times.

The most common cause is stale statistics. Postgres keeps per-column samples in pg_statistic for selectivity estimates: most common values, histograms, null fractions, distinct counts. After a bulk load, those samples reflect the old shape of the table. Function calls on indexed columns also hide selectivity from the planner, because WHERE lower(email) = 'x' cannot use the histogram on email directly without a matching expression index.

The failure I have personally cleaned up looked like this. A team ran a nightly job that bulk imported a few million rows into an analytics table, then ran a report query against it. The first night after the import, the report took 25 minutes. The same query, the next day after autovacuum had run, took 30 seconds. The cause was pg_statistic still describing the table as if it had 10,000 rows. The planner estimated 10 matching rows on a filter that actually produced 1.2 million, chose a nested loop join into a much larger table, and walked the larger table once per row.

Two fixes mattered. The first was running ANALYZE at the end of the import job, before any user query touched the table. The second was raising the per-column statistics target from the default 100 to 1000 on the heavily skewed join columns, so the histogram resolution actually captured the long tail.

The planner is not magic. It is a hypothesis engine fed by statistics. Keep the statistics honest and most plan regressions never happen.

Key takeaway

A query plan is a hypothesis built from statistics. When the statistics are stale or the column is skewed, the planner picks the wrong join and your fast query becomes a 25 minute scan. Run ANALYZE after bulk loads and raise the statistics target on skewed columns.

Originally posted on LinkedIn. View original.