pandas
groupby
unique values
data analysis
Python

Count unique values using pandas groupby

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Counting unique values inside groups is one of the most common summary operations in pandas. The direct answer is usually groupby(...).nunique(), but the details matter once missing values, multiple columns, or distinct row combinations enter the picture. A good solution starts by being precise about what “unique” means for the data.

If you want the number of distinct values in one column for each group, nunique is the cleanest tool. If you want distinct combinations across several columns, a slightly different pattern is better.

Count Distinct Values in One Column

Suppose you want to know how many different products each customer bought.

python
1import pandas as pd
2
3df = pd.DataFrame(
4    {
5        "customer": ["A", "A", "A", "B", "B", "C"],
6        "product": ["pen", "pen", "notebook", "pen", "pencil", "pen"],
7    }
8)
9
10result = df.groupby("customer")["product"].nunique()
11print(result)

This produces one count per customer. Repeated values inside the same group are counted once.

If you want a regular DataFrame instead of a Series, reset the index:

python
1result = (
2    df.groupby("customer")["product"]
3      .nunique()
4      .reset_index(name="unique_products")
5)
6
7print(result)

That format is often easier to merge into later reporting code.

Group by Multiple Keys

You can also group by more than one column and still count unique values in another column.

python
1sales = pd.DataFrame(
2    {
3        "region": ["East", "East", "East", "West", "West"],
4        "salesperson": ["Ana", "Ana", "Ben", "Ana", "Ana"],
5        "product": ["pen", "notebook", "pen", "pen", "pen"],
6    }
7)
8
9summary = sales.groupby(["region", "salesperson"])["product"].nunique()
10print(summary)

This is the same idea, just with a multi-column grouping key.

Decide How to Handle Missing Values

By default, nunique ignores missing values. That is often what you want, but it is not always obvious when you look only at the final counts.

python
1df = pd.DataFrame(
2    {
3        "team": ["A", "A", "B", "B"],
4        "member": ["Mina", None, "Raj", None],
5    }
6)
7
8print(df.groupby("team")["member"].nunique())
9print(df.groupby("team")["member"].nunique(dropna=False))

Use dropna=False if missing values should count as one distinct category in your business logic.

Count Distinct Row Combinations

Sometimes the real question is not “how many unique values are in this one column?” but “how many unique pairs or tuples appear per group?” In that case, deduplicate the relevant columns first.

python
1orders = pd.DataFrame(
2    {
3        "customer": ["A", "A", "A", "B", "B"],
4        "product": ["pen", "pen", "pen", "pen", "pen"],
5        "color": ["blue", "blue", "red", "blue", "blue"],
6    }
7)
8
9unique_pairs = (
10    orders[["customer", "product", "color"]]
11    .drop_duplicates()
12    .groupby("customer")
13    .size()
14    .reset_index(name="unique_product_color_pairs")
15)
16
17print(unique_pairs)

That pattern is useful when uniqueness depends on more than one column.

Use agg When Building Larger Summaries

If the unique count is part of a bigger grouped summary, agg keeps everything together.

python
1summary = df.groupby("customer").agg(
2    unique_products=("product", "nunique"),
3    total_rows=("product", "size"),
4)
5
6print(summary)

This is often easier to maintain than computing several separate grouped objects and merging them later.

It also makes the output schema explicit, which is helpful in notebooks and production pipelines alike. When someone reads the code later, they can see the grouped metrics in one place instead of reconstructing them from several intermediate variables.

Common Pitfalls

The most common mistake is using count when the real goal is a distinct count. count counts non-missing rows, not unique values.

Another issue is forgetting that nunique ignores missing values by default. That can make group counts look lower than expected.

A third problem is using nunique on one column when uniqueness actually depends on a combination of columns.

Summary

  • Use groupby(...)[column].nunique() to count distinct values per group.
  • Add reset_index when you want the result as a regular DataFrame.
  • Use dropna=False if missing values should count as a unique category.
  • For multi-column uniqueness, deduplicate the relevant columns first.
  • Use agg when unique counts are part of a larger grouped summary.

Course illustration
Course illustration

All Rights Reserved.