Bin Size in Matplotlib Histograms

Introduction

The most important choice in a histogram is not the color or the label, but the bins. In Matplotlib, bin size controls how raw numeric data is grouped. A good choice reveals the shape of the distribution, while a bad one can hide structure or exaggerate noise.

What Bin Size Means

A histogram divides the numeric range into intervals and counts how many values fall into each interval. In Matplotlib, you usually control this with the bins argument to plt.hist.

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.random.normal(loc=0, scale=1, size=1000)

plt.hist(data, bins=10, edgecolor="black")
plt.show()
```

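Here, bins=10 does not mean each bin has width 10. It means Matplotlib should split the data range into ten equal-width intervals, so each bin is (max - min) / 10 wide, where max and min are the largest and smallest values in the data. Passing the range argument fixes the span instead, and then the width no longer depends on the sample.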
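One way to check this arithmetic is NumPy's np.histogram_bin_edges, which returns the edges NumPy computes for a given bins value. Matplotlib hands its binning to NumPy, so these edges should match what plt.hist draws. The sketch below assumes a seeded generator (np.random.default_rng) instead of the global np.random calls used elsewhere, so the numbers are repeatable.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded so the numbers are repeatable
data = rng.normal(loc=0, scale=1, size=1000)

# bins=10: ten equal-width intervals spanning [data.min(), data.max()]
edges = np.histogram_bin_edges(data, bins=10)
expected_width = (data.max() - data.min()) / 10

print(len(edges) - 1)                     # 10
print(np.diff(edges)[0], expected_width)  # equal up to float rounding

# With range fixed, the width is (4 - (-4)) / 10, about 0.8, whatever the sample
fixed_edges = np.histogram_bin_edges(data, bins=10, range=(-4, 4))
print(np.diff(fixed_edges)[0])
```

Note that ten bins need eleven edges, which is why len(edges) - 1 is the bin count.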

Number of Bins Versus Bin Edges

There are two common ways to control the histogram:

  • give an integer such as 10 to request that many bins
  • give an explicit sequence of bin edges when you want precise control

Explicit edges are useful when the bins must match a business or scientific rule:

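```python
import numpy as np
import matplotlib.pyplot as plt

# Integers from 0 to 100 inclusive (the upper bound 101 is excluded)
data = np.random.randint(0, 101, size=500)
# Edges 0, 5, ..., 100: 21 edges, so 20 bins of width 5
edges = np.arange(0, 105, 5)

plt.hist(data, bins=edges, edgecolor="black")
plt.show()
```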

This creates bins of width 5, regardless of the exact sample minimum and maximum.
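One detail matters with explicit edges: every bin except the last is half-open, [left, right), while the last bin also includes its right edge, and values outside the first and last edge are not counted at all. This is why a score of 100 in the example above still lands in the 95 to 100 bin. The small sketch below shows the rule with np.histogram, which Matplotlib relies on, using a handful of hypothetical values placed on or beyond the edges.

```python
import numpy as np

# Hypothetical values placed on or beyond the edges 0, 5, 10
values = np.array([0, 4.9, 5, 10, 11])
edges = np.array([0, 5, 10])

counts, _ = np.histogram(values, bins=edges)
# [0, 5) holds 0 and 4.9; [5, 10] holds 5 and 10; 11 is outside and dropped
print(counts)  # [2 2]
```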

Why Bin Choice Changes the Story

If the bins are too wide, the histogram becomes overly smooth and can hide multiple peaks. If the bins are too narrow, the plot becomes jagged and starts emphasizing sampling noise instead of the underlying distribution.

That is why the same data can appear unimodal with one bin setting and multimodal with another. The chart did not change the data. It changed the grouping rule.
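To see this directly, the sketch below draws one hypothetical two-cluster sample with three different bin counts. With 2 bins the clusters should merge into two bars of similar height, so the gap between them disappears. With 30 bins the two peaks should be clear, and with 300 bins each bar holds only a few values, so the outline turns jagged. The cluster centers, spreads, and bin counts are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seeded so the plot is repeatable
# Hypothetical two-peak sample: clusters centered at -2 and 2
data = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(2, 0.5, 500)])

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
for ax, n_bins in zip(axes, [2, 30, 300]):
    # Same data in every panel; only the grouping rule changes
    ax.hist(data, bins=n_bins)
    ax.set_title(f"bins={n_bins}")

fig.tight_layout()
plt.show()
```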

Automatic Bin Rules in Matplotlib

Matplotlib accepts several named strategies through the bins parameter, including auto, sturges, fd, and scott. It passes these names to NumPy, which computes the resulting bin edges.

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.random.lognormal(mean=0.0, sigma=0.8, size=1000)

plt.hist(data, bins="fd", edgecolor="black")
plt.show()
```

These rules estimate a useful bin size from the data. For example:

  • sturges is simple and often reasonable for smaller, roughly normal datasets; it uses about log2(n) + 1 bins for n values
  • fd (Freedman-Diaconis) uses the interquartile range and is often better for larger or skewed samples
  • auto uses whichever of the sturges and fd bin widths is smaller
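To make the rules less of a black box, the sketch below asks np.histogram_bin_edges for the sturges, fd, and auto edges on one seeded lognormal sample. It then recomputes the fd bin count by hand from the Freedman-Diaconis width, 2 * IQR / n^(1/3). NumPy documents auto as taking the smaller of the sturges and fd widths, so auto should give at least as many bins as either rule alone.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the comparison is repeatable
data = rng.lognormal(mean=0.0, sigma=0.8, size=1000)

# Bin counts chosen by three of NumPy's named rules
n_bins = {}
for rule in ["sturges", "fd", "auto"]:
    edges = np.histogram_bin_edges(data, bins=rule)
    n_bins[rule] = len(edges) - 1
    print(f"{rule:>8}: {n_bins[rule]} bins of width {edges[1] - edges[0]:.3f}")

# Freedman-Diaconis by hand: width = 2 * IQR * n^(-1/3)
q75, q25 = np.percentile(data, [75, 25])
fd_width = 2.0 * (q75 - q25) * data.size ** (-1.0 / 3.0)
# The bin count is rounded up so equal-width bins cover the full data range
manual_fd_bins = int(np.ceil((data.max() - data.min()) / fd_width))
print("manual fd:", manual_fd_bins)
```

Because the count is rounded up, the final fd bins come out slightly narrower than the estimated width.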

Automatic rules are a strong starting point, but they are not a substitute for judgment. If the plot is meant for decision-making, look at more than one setting.

Comparing Histograms Fairly

If you compare two groups, keep the bin edges consistent across both histograms. Otherwise the visual comparison can be misleading because each group is being summarized with a different grid.

```python
import numpy as np
import matplotlib.pyplot as plt

a = np.random.normal(0, 1, size=1000)
b = np.random.normal(1, 1, size=1000)
# 20 edges from -4 to 5 define 19 bins shared by both groups
edges = np.linspace(-4, 5, 20)

plt.hist(a, bins=edges, alpha=0.5, label="A")
plt.hist(b, bins=edges, alpha=0.5, label="B")
plt.legend()
plt.show()
```

Using the same edges makes the comparison honest.
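Hard-coded edges such as np.linspace(-4, 5, 20) silently skip any value that falls outside them. One alternative, sketched below, computes a single set of edges from both groups pooled together with np.histogram_bin_edges and reuses it for each group. The fd rule here is an arbitrary choice; an integer or any other named rule works the same way.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)  # seeded so the plot is repeatable
a = rng.normal(0, 1, size=1000)
b = rng.normal(1, 1, size=1000)

# One grid for both groups, spanning the pooled minimum to the pooled maximum
edges = np.histogram_bin_edges(np.concatenate([a, b]), bins="fd")

counts_a, _, _ = plt.hist(a, bins=edges, alpha=0.5, label="A")
counts_b, _, _ = plt.hist(b, bins=edges, alpha=0.5, label="B")
plt.legend()
plt.show()

# Every value lands in some bin because the edges cover both samples
print(counts_a.sum(), counts_b.sum())
```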

Bin Width and Density

If you change bin size, the raw counts in each bar also change. That is normal, because a wider bin covers a larger interval and so collects more values. When you want the histogram to represent probability density rather than raw count, use density=True, which rescales the bar heights so that the total area of the bars is 1.

```python
# data is any 1-D numeric array, such as the samples from the earlier examples
plt.hist(data, bins=20, density=True, edgecolor="black")
```

This does not remove the need to think about bins, but it helps when comparing distributions with different sample sizes.
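Matplotlib's documentation defines each density height as the bin's count divided by the total count and the bin width, which is what makes the areas sum to 1. The sketch below checks that relationship on a seeded sample by drawing it once as counts and once as a density with identical edges, then recomputing the density by hand.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seeded so the check is repeatable
data = rng.normal(loc=0, scale=1, size=1000)

fig, (ax_count, ax_density) = plt.subplots(1, 2, figsize=(10, 4))
counts, edges, _ = ax_count.hist(data, bins=20, edgecolor="black")
density, _, _ = ax_density.hist(data, bins=edges, density=True, edgecolor="black")
ax_count.set_title("Raw counts")
ax_density.set_title("density=True")

# density = counts / (total count * bin width)
widths = np.diff(edges)
manual = counts / (counts.sum() * widths)
print(np.allclose(density, manual))  # True
print(np.sum(density * widths))      # 1.0, up to float rounding
plt.show()
```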

Common Pitfalls

  • Confusing the number of bins with the width of each bin.
  • Comparing two histograms that use different bin edges.
  • Trusting a single automatic rule without checking whether it fits the data shape.
  • Using too many bins and then reading noise as meaningful structure.
  • Using too few bins and then missing important features such as skew or multiple peaks.

Summary

  • Histogram bin size controls how numeric values are grouped.
  • bins=10 means ten intervals, not width ten.
  • Explicit bin edges are best when you need exact control.
  • Automatic rules such as fd and auto are useful starting points, not final truth.
  • Consistent bin edges matter when comparing distributions across groups.
