Database Partitioning Strategies: Range, Hash, List, and the Composite You Wish You Had Picked
February 16, 2026
Partitioning splits a logical table into smaller physical pieces so a query touches only a slice of the data. There are three primitive strategies, and most production systems eventually combine them.
Range partitioning bins rows by a continuous key. Time-series data is the canonical fit: one partition per day or per month, drop the oldest by detaching a partition rather than running DELETE. The failure pattern is range correlated with hotness. If the partition key is created_at and 95 percent of reads target the last 24 hours, one partition gets all the heat and the rest of the cluster idles.
Hash partitioning runs the key through a hash function and assigns the result to a bucket. Distribution is statistically uniform across keys with reasonable cardinality. The cost is that range scans now have to fan out across every partition, because adjacent keys land in unrelated buckets.
List partitioning maps explicit values to specific partitions. Useful when the key has a small, stable domain like region code or product line. Adding a new value requires a schema change, which is fine for a list of countries and miserable for a list of customers.
Composite partitioning combines them. The most common pairing is hash of tenant on the outer level and range of time on the inner level. Each tenant's writes spread evenly across the cluster, and within a tenant the latest day clusters together so retention drops are cheap.
The production failure I keep seeing comes from optimizing for write distribution and forgetting that reads have to live in the same world. An e-commerce team hash-partitioned orders by customer_id. Writes spread cleanly across 16 shards. Then the analytics team shipped a dashboard that joined orders to regions to compute revenue by country. Customer-to-region was not part of the partition key, so the planner had to broadcast the join across all 16 shards for every dashboard refresh. Cross-shard fan-out latency climbed from 80 ms to 4 seconds, and the dashboard rendered three times per minute.
The fix was not to re-partition. The fix was a denormalized region column on the orders shard and a secondary partition key tuple of (customer_id, region) so the planner could prune. The data engineering team also moved the dashboard query to a materialized view refreshed every five minutes, because no partitioning strategy makes a 16-way broadcast cheap.
Pick the partition key by listing your top five queries and seeing which one cannot afford to fan out. That query owns the key. Everything else lives with a secondary index or a denormalized column.
Partitioning is a trade between write distribution and read locality. The strategy is correct only when it matches the query patterns you actually run, not the ones in the design doc.
Originally posted on LinkedIn. View original.