Postgres VACUUM and Bloat: Why Your Hot Table Quietly Gets Slow

February 9, 2026

Postgres uses MVCC, which means an UPDATE does not modify a row in place. It writes a new row version and marks the old one as obsolete. A DELETE does not free space either, it just marks the row as no longer visible to new transactions. Every old version sits in the table until VACUUM removes it.

VACUUM cannot remove a dead tuple while any currently open transaction might still need to see it. MVCC's whole correctness story is that a transaction sees the database as of its start time. If a long-running transaction started before the row was deleted, it must still be able to read that row. VACUUM honors this by computing a horizon: the oldest snapshot any active transaction holds. Anything dead before that horizon can go. Anything dead after it has to stay.

Autovacuum is the background daemon that runs VACUUM based on per-table churn thresholds. By default it kicks in when the number of dead tuples crosses roughly 20 percent of the live row count. On a hot table with constant writes, this means autovacuum is running almost continuously, and that is fine. The trouble starts when something pins the horizon and autovacuum is forced to skip rows that would otherwise be eligible.

Three things pin the horizon: a long-running transaction (an analytical query or a poorly written cron job), an idle in transaction session (a connection that ran BEGIN, did some work, and then never committed or rolled back), and an unused logical replication slot (the slot keeps a position the slot holder has not yet consumed). Any one of them is enough.

Bloat is the ratio of dead tuples to total tuples. A healthy hot table sits at single digit percent. A bloated table can be 80 or 90 percent dead. Sequential scans get slower, index lookups touch more pages, and the planner's row estimates drift because pg_stat_user_tables is stale. At some point the planner decides the index is not worth it and falls back to seq scans on a table that no longer fits in memory.

The production failure is almost always the same shape. A backend service has an ORM bug that leaks a connection in the middle of a transaction. The session sits idle in transaction for hours. Autovacuum on the orders table runs constantly but reclaims nothing. The table grows from 4 GB to 80 GB over a week. Latency on the simplest order lookup goes from 2 ms to 800 ms. Nothing in the application changed.

The fix is two settings. Set idle_in_transaction_session_timeout to a few minutes so leaked transactions get killed. Alert on pg_stat_activity rows where state = 'idle in transaction' and xact_start is older than your tolerance. Bloat is invisible until it is not.

Key takeaway

VACUUM cannot reclaim a dead tuple while any open transaction still might see it. One idle-in-transaction session is enough to bloat your hottest table until the planner gives up on the index.

Originally posted on LinkedIn. View original.