Time-Series Database Internals: Why TSDBs Look Nothing Like Postgres
February 19, 2026
A time-series database is a database that gave up something on purpose. It gave up random updates, it gave up secondary write paths, it gave up the idea that any row will ever be modified. In return it got the ability to absorb a million points per second on hardware that would melt if Postgres were asked to carry the same load.
The shape of the workload is the whole story. Metrics arrive in append order, tagged with a timestamp that is almost always greater than the last one. Queries read windows: the last 5 minutes, yesterday between 9 and 10, the same hour a week ago. Nothing in that pattern needs B-tree updates or row-level locking. So TSDBs throw both away.
The first thing a TSDB does differently is chunk by time. InfluxDB calls these shards. Prometheus calls them blocks. TimescaleDB calls them chunks. Same idea: every two hours, or every day, the engine closes the current file and starts a new one. Queries that ask for "last hour" touch only the live chunk, or at most the live chunk and the one just closed. Queries that ask for "October 2024" touch a handful of immutable files and ignore everything else. The planner does this with a range comparison against each chunk's time bounds in its metadata, not an index scan.
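To make the pruning step concrete, here is a minimal Python sketch, assuming each chunk carries only a min and max timestamp and covers a half-open range. `ChunkMeta`, `chunks_for_range`, `make_chunks`, and the two-hour width are hypothetical names and parameters for illustration, not any engine's API. Choosing chunks is an interval-overlap check against that metadata; no row is examined.

```python
from dataclasses import dataclass

# Assumed chunk width for illustration (Prometheus cuts 2-hour blocks).
TWO_HOURS_MS = 2 * 60 * 60 * 1000


@dataclass(frozen=True)
class ChunkMeta:
    """Hypothetical chunk metadata: the chunk covers timestamps in [min_t, max_t)."""
    name: str
    min_t: int  # inclusive, ms since epoch
    max_t: int  # exclusive, ms since epoch


def chunks_for_range(chunks, start, end):
    """Return the chunks that overlap the half-open query range [start, end).

    Two comparisons per chunk against its metadata; rows are never examined.
    """
    return [c for c in chunks if c.min_t < end and start < c.max_t]


def make_chunks(first_t, count, width=TWO_HOURS_MS):
    """Build `count` back-to-back chunks starting at first_t."""
    return [
        ChunkMeta(f"chunk-{i}", first_t + i * width, first_t + (i + 1) * width)
        for i in range(count)
    ]


chunks = make_chunks(0, 12)  # one day of 2-hour chunks
HOUR_MS = TWO_HOURS_MS // 2
now = 12 * TWO_HOURS_MS - 1  # last millisecond covered by chunk-11

# "Last hour" well inside the live chunk touches one chunk.
print([c.name for c in chunks_for_range(chunks, now - HOUR_MS, now)])  # -> ['chunk-11']

# "Last hour" straddling a boundary touches the live chunk and the one just closed.
boundary = 11 * TWO_HOURS_MS
print([c.name for c in chunks_for_range(chunks, boundary - HOUR_MS // 2, boundary + HOUR_MS // 2)])  # -> ['chunk-10', 'chunk-11']
```

The cost of pruning grows with the number of chunks, not the number of rows, which is why a query over one month of a multi-year history stays cheap.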
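Inside a chunk the layout is columnar. Every series stores its timestamps in one column and its values in another, and both are compressed hard. Timestamps compress to almost nothing because samples arrive at a nearly regular interval: the delta between adjacent timestamps barely changes, so the delta-of-delta is usually zero and costs a single bit. Values compress with XOR encoding: each float is XORed with the previous one, and a slowly changing series leaves only a few meaningful bits to store. Both tricks come from Gorilla, the in-memory TSDB Facebook described in a 2015 paper, and variants of them now underlie many modern TSDBs, Prometheus included. The Gorilla paper reports about 1.37 bytes per sample, so a billion samples that would take 16 GB raw (an 8-byte timestamp plus an 8-byte float each) fit in roughly 1.4 GB on disk.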
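The two compression ideas can be measured without writing a full bit-level encoder. This Python sketch, with hypothetical helper names, computes the delta-of-delta stream for a timestamp column and, for values, how many bits sit between the leading and trailing zeros of each XOR. It illustrates why the encodings work; it is not Gorilla's actual bit format, which also spends a few control bits per sample.

```python
import struct


def delta_of_delta(timestamps):
    """Second-order differences of a timestamp column.

    The first timestamp and first delta are assumed to be stored in full;
    with a steady scrape interval every remaining entry is 0, which a
    Gorilla-style encoder writes as a single bit.
    """
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [d2 - d1 for d1, d2 in zip(deltas, deltas[1:])]


def float_bits(x):
    """Reinterpret a float64 as its 64-bit unsigned integer bit pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_meaningful_bits(values):
    """XOR each value's bits with the previous value's bits and report how many
    bits lie between the leading and trailing zeros of the result.

    0 means an exact repeat (one bit in Gorilla); a small count means only a
    narrow window of bits needs to be written.
    """
    out = []
    for prev, cur in zip(values, values[1:]):
        x = float_bits(prev) ^ float_bits(cur)
        if x == 0:
            out.append(0)
            continue
        trailing = (x & -x).bit_length() - 1  # index of the lowest set bit
        out.append(x.bit_length() - trailing)  # span from highest to lowest set bit
    return out


# 15-second scrape interval in milliseconds, then the same with one late sample.
print(delta_of_delta([0, 15_000, 30_000, 45_000, 60_000]))  # -> [0, 0, 0]
print(delta_of_delta([0, 15_000, 30_000, 45_002, 60_000]))  # -> [0, 2, -4]

# A gauge that sits still, then moves by half a unit.
print(xor_meaningful_bits([42.0, 42.0, 42.5]))  # -> [0, 1]
```

A jittery sample only produces small nonzero deltas-of-deltas, which still fit in a few bits, so irregular timing degrades compression gracefully rather than breaking it.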
Downsampling is the second lever. Raw points at 1-second resolution are useful for an hour, marginally useful for a day, and a waste of disk after that. So the engine runs continuous aggregates: every 5 minutes, take the average, min, max, and count of the last 5 minutes of raw data, and store that as a new series. After 7 days, drop the raw points. Dashboards that chart days or weeks keep working, because they were reading the 5-minute rollup anyway.
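A batch version of the rollup and retention steps might look like the sketch below. It assumes raw points are `(timestamp_seconds, value)` pairs and that buckets align to multiples of five minutes; `rollup` and `drop_expired_raw` are hypothetical names. A real engine would run this incrementally over closed windows rather than over the whole history.

```python
from collections import defaultdict

FIVE_MIN_S = 300              # rollup bucket width from the text
RAW_RETENTION_S = 7 * 86_400  # raw points kept for 7 days, per the text


def rollup(points, bucket=FIVE_MIN_S):
    """Aggregate raw (timestamp_s, value) points into aligned buckets.

    Returns {bucket_start: (avg, min, max, count)} ordered by bucket start.
    """
    acc = defaultdict(lambda: [0.0, float("inf"), float("-inf"), 0])  # sum, min, max, count
    for ts, v in points:
        a = acc[ts - ts % bucket]  # align to the start of the 5-minute window
        a[0] += v
        a[1] = min(a[1], v)
        a[2] = max(a[2], v)
        a[3] += 1
    return {start: (s / n, lo, hi, n) for start, (s, lo, hi, n) in sorted(acc.items())}


def drop_expired_raw(points, now_s, retention=RAW_RETENTION_S):
    """Retention: keep only raw points no older than now_s - retention."""
    cutoff = now_s - retention
    return [(ts, v) for ts, v in points if ts >= cutoff]


raw = [(0, 1.0), (60, 3.0), (299, 2.0), (300, 10.0)]
print(rollup(raw))  # -> {0: (2.0, 1.0, 3.0, 3), 300: (10.0, 10.0, 10.0, 1)}
```

Keeping `count` next to `avg` is a deliberate choice: a later hourly rollup can weight each 5-minute bucket by its count instead of averaging averages.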
The production failure mode worth knowing: cardinality explosion. If you tag metrics with anything unbounded, like a user ID or a request ID, every unique combination of label values becomes its own series with its own index entry and its own in-memory state. A Prometheus instance that was happy at 200k series can OOM at 5M. The fix is upstream, in the metric and label design, not in the database. TSDBs are tuned for many points per series, not for many series with a few points each.
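Cardinality is easy to estimate before it bites. Below is a Python sketch, with hypothetical helpers `worst_case_series` and `distinct_series`, that computes the worst-case series count for one metric as the product of distinct values per label, and counts actual series the way Prometheus identifies them: metric name plus the full label set. The label counts are illustrative assumptions, not measurements.

```python
import math


def worst_case_series(label_values):
    """Upper bound on series for one metric: product of distinct values per label."""
    return math.prod(len(values) for values in label_values.values())


def distinct_series(samples):
    """Count real series: each unique (metric name, label set) is one series."""
    return len({(name, tuple(sorted(labels.items()))) for name, labels in samples})


# Illustrative label counts (assumptions, not measurements).
bounded = {"method": range(5), "status": range(10), "instance": range(20)}
print(worst_case_series(bounded))  # -> 1000
print(worst_case_series({**bounded, "user_id": range(100_000)}))  # -> 100000000

samples = [
    ("http_requests_total", {"method": "GET", "status": "200"}),
    ("http_requests_total", {"status": "200", "method": "GET"}),  # same series; label order ignored
    ("http_requests_total", {"method": "GET", "status": "500"}),
]
print(distinct_series(samples))  # -> 2
```

Adding one unbounded label multiplies the bound by that label's number of values, which is why the fix lives in label design rather than in database tuning.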
Time-series workloads are insert-only, time-ordered, and read in windows. That shape lets a TSDB skip everything a general OLTP store has to defend against, which is why a Prometheus block can hold a billion samples in a small fraction of their raw size.
Originally posted on LinkedIn.