Pandas
Data Analysis
Dataframe
Python
Frequency Count

Count the frequency that a value occurs in a dataframe column

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

The primary way to count value frequencies in a pandas DataFrame column is df['column'].value_counts(), which returns a Series of unique values sorted by frequency (most common first). For relative frequencies, pass normalize=True to get proportions. For grouping and counting across multiple columns, use df.groupby('column').size() or df.groupby('column')['other'].count(). These methods cover virtually all frequency-counting scenarios in data analysis.

value_counts — The Standard Approach

python
1import pandas as pd
2
3df = pd.DataFrame({
4    'city': ['NYC', 'LA', 'NYC', 'Chicago', 'LA', 'NYC', 'Chicago', 'NYC'],
5    'department': ['Sales', 'Engineering', 'Sales', 'Sales', 'HR', 'Engineering', 'Sales', 'HR']
6})
7
8# Count frequency of each value
9print(df['city'].value_counts())
10# NYC        4
11# LA         2
12# Chicago    2
13# Name: city, dtype: int64
14
15# Relative frequency (proportions)
16print(df['city'].value_counts(normalize=True))
17# NYC        0.50
18# LA         0.25
19# Chicago    0.25
20# Name: city, dtype: float64
21
22# Sort alphabetically instead of by count
23print(df['city'].value_counts().sort_index())
24
25# Include NaN values in the count
26print(df['city'].value_counts(dropna=False))

Binning Continuous Data

value_counts can bin numeric data into ranges:

python
1df = pd.DataFrame({'age': [22, 35, 28, 45, 52, 31, 67, 41, 29, 38]})
2
3# Automatic binning into 4 equal-width bins
4print(df['age'].value_counts(bins=4))
5# (21.955, 33.25]    4
6# (33.25, 44.5]      3
7# (44.5, 55.75]      2
8# (55.75, 67.0]      1
9# Name: age, dtype: int64
10
11# Custom bins with pd.cut
12bins = [0, 18, 30, 50, 100]
13labels = ['<18', '18-30', '31-50', '50+']
14df['age_group'] = pd.cut(df['age'], bins=bins, labels=labels)
15print(df['age_group'].value_counts())

groupby for Multi-Column Counts

python
1df = pd.DataFrame({
2    'city': ['NYC', 'LA', 'NYC', 'LA', 'NYC', 'LA'],
3    'department': ['Sales', 'Sales', 'Engineering', 'Sales', 'Sales', 'Engineering'],
4    'salary': [50000, 55000, 70000, 52000, 48000, 72000]
5})
6
7# Count per group
8print(df.groupby('city').size())
9# city
10# LA     3
11# NYC    3
12# dtype: int64
13
14# Count per combination of columns
15print(df.groupby(['city', 'department']).size())
16# city  department
17# LA    Engineering    1
18#       Sales          2
19# NYC   Engineering    1
20#       Sales          2
21# dtype: int64
22
23# Reset index to get a regular DataFrame
24counts = df.groupby(['city', 'department']).size().reset_index(name='count')
25print(counts)
26#   city   department  count
27# 0   LA  Engineering      1
28# 1   LA        Sales      2
29# 2  NYC  Engineering      1
30# 3  NYC        Sales      2

crosstab for Frequency Tables

python
1# Cross-tabulation of two columns
2print(pd.crosstab(df['city'], df['department']))
3# department  Engineering  Sales
4# city
5# LA                    1      2
6# NYC                   1      2
7
8# With margins (row and column totals)
9print(pd.crosstab(df['city'], df['department'], margins=True))
10
11# Normalized by row (percentage within each city)
12print(pd.crosstab(df['city'], df['department'], normalize='index'))

Counting with Conditions

python
1df = pd.DataFrame({
2    'product': ['A', 'B', 'A', 'C', 'B', 'A', 'B', 'C'],
3    'status': ['sold', 'returned', 'sold', 'sold', 'sold', 'returned', 'sold', 'sold']
4})
5
6# Count only rows matching a condition
7sold_counts = df[df['status'] == 'sold']['product'].value_counts()
8print(sold_counts)
9# B    2
10# A    2
11# C    2
12# Name: product, dtype: int64
13
14# Count unique values
15print(df['product'].nunique())  # 3
16
17# Boolean counting: how many meet a condition
18print((df['status'] == 'sold').sum())  # 6

Performance Comparison

python
1import numpy as np
2
3# Large DataFrame for benchmarking
4df = pd.DataFrame({'col': np.random.choice(['A', 'B', 'C', 'D'], size=1_000_000)})
5
6# value_counts — optimized C implementation, fastest
7df['col'].value_counts()
8
9# groupby + size — slightly slower, more flexible
10df.groupby('col').size()
11
12# collections.Counter — Python-level, slower for large data
13from collections import Counter
14Counter(df['col'])
15
16# numpy.unique — fast for numpy arrays
17np.unique(df['col'].values, return_counts=True)

For a single column, value_counts() is fastest. For multi-column grouping, groupby().size() is the right tool.

Converting Results

python
1counts = df['col'].value_counts()
2
3# To a dictionary
4count_dict = counts.to_dict()
5# {'A': 250312, 'C': 250100, 'B': 249894, 'D': 249694}
6
7# To a DataFrame
8count_df = counts.reset_index()
9count_df.columns = ['value', 'frequency']
10
11# Filter to values appearing more than N times
12frequent = counts[counts > 250000]

Common Pitfalls

  • Confusing size() with count() in groupby: size() counts all rows including NaN values, while count() excludes NaN. Use size() when you want the total number of rows per group, and count() when you want non-null entries for a specific column.
  • Forgetting dropna=False for NaN counting: By default, value_counts() excludes NaN values. If your analysis depends on knowing how many missing values exist, pass dropna=False explicitly.
  • Using len(df.groupby('col')) instead of nunique(): len(df.groupby('col')) returns the number of unique groups (same as nunique()), but nunique() is more readable and works directly on a Series without groupby overhead.
  • Expecting value_counts() on the entire DataFrame: value_counts() works on a Series (single column). Calling df.value_counts() counts unique row combinations across all columns, which is rarely what you want. Always specify the column.
  • Not resetting the index after groupby counts: groupby().size() returns a Series with a MultiIndex. To use the result as a regular DataFrame (for merging, exporting, or plotting), call .reset_index(name='count').

Summary

  • Use df['col'].value_counts() for single-column frequency counts (fastest, most common)
  • Pass normalize=True for proportions or bins=N for numeric binning
  • Use df.groupby(['col1', 'col2']).size() for multi-column frequency counts
  • Use pd.crosstab() for two-way frequency tables with optional margins and normalization
  • Call .reset_index(name='count') to convert grouped results back to a regular DataFrame

Course illustration
Course illustration

All Rights Reserved.