Bin size in Matplotlib Histogram
Introduction
The most important choice in a histogram is not the color or the label, but the bins. In Matplotlib, bin size controls how raw numeric data is grouped, and a good choice can reveal the distribution while a bad one can hide structure or exaggerate noise.
What Bin Size Means
A histogram divides the numeric range into intervals and counts how many values fall into each interval. In Matplotlib, you usually control this with the bins argument to plt.hist.
Passing bins=10 does not mean each bin has width 10. It tells Matplotlib to split the data range into ten equal-width intervals, so the actual bin width is (max - min) / 10 for the data being plotted.
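A minimal sketch of that relationship, assuming a made-up normally distributed sample and a headless backend (neither comes from the original article):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=15, size=500)  # hypothetical sample

fig, ax = plt.subplots()
# hist returns the per-bin counts, the bin edges, and the drawn bar patches
counts, edges, _ = ax.hist(data, bins=10)

widths = np.diff(edges)                          # width of each of the ten bins
expected_width = (data.max() - data.min()) / 10  # data range split ten ways
print(len(counts), widths[0], expected_width)
```

Ten bins always produce eleven edges, and the width follows from the sample's range rather than from the number 10 itself.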
Number of Bins Versus Bin Edges
There are two common ways to control the histogram:
- give an integer such as 10 to request that many bins
- give an explicit sequence of bin edges when you want precise control
Explicit edges are useful when the bins must match a business or scientific rule. For example, edges spaced 5 apart create bins of width 5, regardless of the exact sample minimum and maximum.
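One way to build such width-5 bins is sketched below. The sample values and the 0 to 50 range are illustrative assumptions, not figures from the article:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
data = rng.uniform(3, 47, size=300)  # hypothetical measurements

# Explicit edges at 0, 5, 10, ..., 50 (the stop value 55 is excluded by arange)
edges = np.arange(0, 55, 5)

fig, ax = plt.subplots()
counts, used_edges, _ = ax.hist(data, bins=edges)

print(used_edges)  # identical to the edges passed in
```

Because the edges are fixed in advance, a different sample plotted with the same `edges` array lands on exactly the same grid.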
Why Bin Choice Changes the Story
If the bins are too wide, the histogram becomes overly smooth and can hide multiple peaks. If the bins are too narrow, the plot becomes jagged and starts emphasizing sampling noise instead of the underlying distribution.
That is why the same data can appear unimodal with one bin setting and multimodal with another. The chart did not change the data. It changed the grouping rule.
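The effect can be seen with a small constructed dataset. The values below are hypothetical and chosen so that two tight clusters are separated by a gap:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 30 values clustered at 10-12 and 30 values clustered at 18-20
data = np.repeat([10, 11, 12, 18, 19, 20], [5, 20, 5, 5, 20, 5])

fig, (ax_wide, ax_narrow) = plt.subplots(1, 2, figsize=(8, 3))

# Two wide bins: each bar swallows one whole cluster, so the plot looks flat
coarse, _, _ = ax_wide.hist(data, bins=2)
ax_wide.set_title("bins=2")

# Ten narrow bins: the empty gap between the clusters becomes visible
fine, _, _ = ax_narrow.hist(data, bins=10)
ax_narrow.set_title("bins=10")

print(coarse, fine)
```

With two bins the bars are equal in height and suggest a uniform spread. With ten bins the same values show two separate groups.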
Automatic Bin Rules in Matplotlib
Matplotlib accepts several named strategies as strings through the bins parameter, including auto, sturges, fd, and scott, and hands them to NumPy to compute the bin edges.
These rules estimate a useful bin size from the data. For example:
- sturges is simple and often reasonable for smaller datasets
- fd uses the interquartile range and is often better for larger or skewed samples
- auto lets NumPy choose between built-in heuristics
Automatic rules are a strong starting point, but they are not a substitute for judgment. If the plot is meant for decision-making, look at more than one setting.
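A quick way to look at more than one setting is to draw the same sample once per rule and compare how many bins each one picks. This is a sketch on an assumed skewed (lognormal) sample; the exact bin counts depend on the data and on the installed NumPy version:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
data = rng.lognormal(mean=0.0, sigma=0.8, size=1000)  # hypothetical skewed sample

rules = ["auto", "sturges", "fd", "scott"]
fig, axes = plt.subplots(1, len(rules), figsize=(12, 3))

bins_by_rule = {}
for ax, rule in zip(axes, rules):
    # A string value for bins selects the named rule
    counts, edges, _ = ax.hist(data, bins=rule)
    bins_by_rule[rule] = len(counts)
    ax.set_title(f"{rule}: {len(counts)} bins")

print(bins_by_rule)
```

If the rules disagree sharply on the number of bins, that is a hint the distribution is skewed or heavy-tailed and deserves a manual look.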
Comparing Histograms Fairly
If you compare two groups, keep the bin edges consistent across both histograms. Otherwise the visual comparison can be misleading because each group is being summarized with a different grid.
Using the same edges makes the comparison honest.
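One possible approach, sketched here with two made-up groups, is to compute the edges once from the pooled data and pass the same array to both calls. The group names and the choice of 20 bins are assumptions for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
group_a = rng.normal(50, 10, size=400)  # hypothetical group A
group_b = rng.normal(60, 15, size=250)  # hypothetical group B

# Derive one grid from the pooled values so both groups share it
pooled = np.concatenate([group_a, group_b])
shared_edges = np.histogram_bin_edges(pooled, bins=20)

fig, ax = plt.subplots()
counts_a, edges_a, _ = ax.hist(group_a, bins=shared_edges, alpha=0.5, label="group A")
counts_b, edges_b, _ = ax.hist(group_b, bins=shared_edges, alpha=0.5, label="group B")
ax.legend()
```

Since the edges span the pooled range, no value from either group falls outside the grid, and each bar pair covers the same interval.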
Bin Width and Density
If you change bin size, the raw counts in each bar also change. That is normal because a wider bin covers more values. When you want the histogram to represent probability density rather than raw count, use density=True.
This does not remove the need to think about bins, but it helps when comparing distributions with different sample sizes.
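The sketch below, using two assumed samples of very different sizes, shows what density=True changes: bar heights are scaled so that height times bin width sums to 1 for each sample.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(4)
small = rng.normal(0, 1, size=200)    # hypothetical small sample
large = rng.normal(0, 1, size=5000)   # hypothetical large sample

# Shared edges keep the two histograms on the same grid
edges = np.histogram_bin_edges(np.concatenate([small, large]), bins=15)

fig, ax = plt.subplots()
dens_small, _, _ = ax.hist(small, bins=edges, density=True, alpha=0.5, label="n=200")
dens_large, _, _ = ax.hist(large, bins=edges, density=True, alpha=0.5, label="n=5000")
ax.legend()

# Total area under each histogram: sum of (bar height * bin width)
area_small = np.sum(dens_small * np.diff(edges))
area_large = np.sum(dens_large * np.diff(edges))
print(area_small, area_large)
```

With raw counts the larger sample would dwarf the smaller one; on the density scale both occupy the same total area and their shapes can be compared directly.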
Common Pitfalls
- Confusing the number of bins with the width of each bin.
- Comparing two histograms that use different bin edges.
- Trusting a single automatic rule without checking whether it fits the data shape.
- Using too many bins and then reading noise as meaningful structure.
- Using too few bins and then missing important features such as skew or multiple peaks.
Summary
- Histogram bin size controls how numeric values are grouped.
- bins=10 means ten intervals, not width ten.
- Explicit bin edges are best when you need exact control.
- Automatic rules such as fd and auto are useful starting points, not final truth.
- Consistent bin edges matter when comparing distributions across groups.

