Introduction
The primary way to count value frequencies in a pandas DataFrame column is df['column'].value_counts(), which returns a Series of unique values sorted by frequency (most common first). For relative frequencies, pass normalize=True to get proportions. For grouping and counting across multiple columns, use df.groupby('column').size() or df.groupby('column')['other'].count(). These methods cover virtually all frequency-counting scenarios in data analysis.
value_counts — The Standard Approach
1import pandas as pd
2
3df = pd.DataFrame({
4 'city': ['NYC', 'LA', 'NYC', 'Chicago', 'LA', 'NYC', 'Chicago', 'NYC'],
5 'department': ['Sales', 'Engineering', 'Sales', 'Sales', 'HR', 'Engineering', 'Sales', 'HR']
6})
7
8# Count frequency of each value
9print(df['city'].value_counts())
10# NYC 4
11# LA 2
12# Chicago 2
13# Name: city, dtype: int64
14
15# Relative frequency (proportions)
16print(df['city'].value_counts(normalize=True))
17# NYC 0.50
18# LA 0.25
19# Chicago 0.25
20# Name: city, dtype: float64
21
22# Sort alphabetically instead of by count
23print(df['city'].value_counts().sort_index())
24
25# Include NaN values in the count
26print(df['city'].value_counts(dropna=False))
Binning Continuous Data
value_counts can bin numeric data into ranges:
1df = pd.DataFrame({'age': [22, 35, 28, 45, 52, 31, 67, 41, 29, 38]})
2
3# Automatic binning into 4 equal-width bins
4print(df['age'].value_counts(bins=4))
5# (21.955, 33.25] 4
6# (33.25, 44.5] 3
7# (44.5, 55.75] 2
8# (55.75, 67.0] 1
9# Name: age, dtype: int64
10
11# Custom bins with pd.cut
12bins = [0, 18, 30, 50, 100]
13labels = ['<18', '18-30', '31-50', '50+']
14df['age_group'] = pd.cut(df['age'], bins=bins, labels=labels)
15print(df['age_group'].value_counts())
groupby for Multi-Column Counts
1df = pd.DataFrame({
2 'city': ['NYC', 'LA', 'NYC', 'LA', 'NYC', 'LA'],
3 'department': ['Sales', 'Sales', 'Engineering', 'Sales', 'Sales', 'Engineering'],
4 'salary': [50000, 55000, 70000, 52000, 48000, 72000]
5})
6
7# Count per group
8print(df.groupby('city').size())
9# city
10# LA 3
11# NYC 3
12# dtype: int64
13
14# Count per combination of columns
15print(df.groupby(['city', 'department']).size())
16# city department
17# LA Engineering 1
18# Sales 2
19# NYC Engineering 1
20# Sales 2
21# dtype: int64
22
23# Reset index to get a regular DataFrame
24counts = df.groupby(['city', 'department']).size().reset_index(name='count')
25print(counts)
26# city department count
27# 0 LA Engineering 1
28# 1 LA Sales 2
29# 2 NYC Engineering 1
30# 3 NYC Sales 2
crosstab for Frequency Tables
1# Cross-tabulation of two columns
2print(pd.crosstab(df['city'], df['department']))
3# department Engineering Sales
4# city
5# LA 1 2
6# NYC 1 2
7
8# With margins (row and column totals)
9print(pd.crosstab(df['city'], df['department'], margins=True))
10
11# Normalized by row (percentage within each city)
12print(pd.crosstab(df['city'], df['department'], normalize='index'))
Counting with Conditions
1df = pd.DataFrame({
2 'product': ['A', 'B', 'A', 'C', 'B', 'A', 'B', 'C'],
3 'status': ['sold', 'returned', 'sold', 'sold', 'sold', 'returned', 'sold', 'sold']
4})
5
6# Count only rows matching a condition
7sold_counts = df[df['status'] == 'sold']['product'].value_counts()
8print(sold_counts)
9# B 2
10# A 2
11# C 2
12# Name: product, dtype: int64
13
14# Count unique values
15print(df['product'].nunique()) # 3
16
17# Boolean counting: how many meet a condition
18print((df['status'] == 'sold').sum()) # 6
1import numpy as np
2
3# Large DataFrame for benchmarking
4df = pd.DataFrame({'col': np.random.choice(['A', 'B', 'C', 'D'], size=1_000_000)})
5
6# value_counts — optimized C implementation, fastest
7df['col'].value_counts()
8
9# groupby + size — slightly slower, more flexible
10df.groupby('col').size()
11
12# collections.Counter — Python-level, slower for large data
13from collections import Counter
14Counter(df['col'])
15
16# numpy.unique — fast for numpy arrays
17np.unique(df['col'].values, return_counts=True)
For a single column, value_counts() is fastest. For multi-column grouping, groupby().size() is the right tool.
Converting Results
1counts = df['col'].value_counts()
2
3# To a dictionary
4count_dict = counts.to_dict()
5# {'A': 250312, 'C': 250100, 'B': 249894, 'D': 249694}
6
7# To a DataFrame
8count_df = counts.reset_index()
9count_df.columns = ['value', 'frequency']
10
11# Filter to values appearing more than N times
12frequent = counts[counts > 250000]
Common Pitfalls
Confusing size() with count() in groupby: size() counts all rows including NaN values, while count() excludes NaN. Use size() when you want the total number of rows per group, and count() when you want non-null entries for a specific column.
Forgetting dropna=False for NaN counting: By default, value_counts() excludes NaN values. If your analysis depends on knowing how many missing values exist, pass dropna=False explicitly.
Using len(df.groupby('col')) instead of nunique(): len(df.groupby('col')) returns the number of unique groups (same as nunique()), but nunique() is more readable and works directly on a Series without groupby overhead.
Expecting value_counts() on the entire DataFrame: value_counts() works on a Series (single column). Calling df.value_counts() counts unique row combinations across all columns, which is rarely what you want. Always specify the column.
Not resetting the index after groupby counts: groupby().size() returns a Series with a MultiIndex. To use the result as a regular DataFrame (for merging, exporting, or plotting), call .reset_index(name='count').
Summary
Use df['col'].value_counts() for single-column frequency counts (fastest, most common)
Pass normalize=True for proportions or bins=N for numeric binning
Use df.groupby(['col1', 'col2']).size() for multi-column frequency counts
Use pd.crosstab() for two-way frequency tables with optional margins and normalization
Call .reset_index(name='count') to convert grouped results back to a regular DataFrame