Count unique values using pandas groupby
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Counting unique values inside groups is one of the most common summary operations in pandas. The direct answer is usually groupby(...).nunique(), but the details matter once missing values, multiple columns, or distinct row combinations enter the picture. A good solution starts by being precise about what “unique” means for the data.
If you want the number of distinct values in one column for each group, nunique is the cleanest tool. If you want distinct combinations across several columns, a slightly different pattern is better.
Count Distinct Values in One Column
Suppose you want to know how many different products each customer bought.
This produces one count per customer. Repeated values inside the same group are counted once.
If you want a regular DataFrame instead of a Series, reset the index:
That format is often easier to merge into later reporting code.
Group by Multiple Keys
You can also group by more than one column and still count unique values in another column.
This is the same idea, just with a multi-column grouping key.
Decide How to Handle Missing Values
By default, nunique ignores missing values. That is often what you want, but it is not always obvious when you look only at the final counts.
Use dropna=False if missing values should count as one distinct category in your business logic.
Count Distinct Row Combinations
Sometimes the real question is not “how many unique values are in this one column?” but “how many unique pairs or tuples appear per group?” In that case, deduplicate the relevant columns first.
That pattern is useful when uniqueness depends on more than one column.
Use agg When Building Larger Summaries
If the unique count is part of a bigger grouped summary, agg keeps everything together.
This is often easier to maintain than computing several separate grouped objects and merging them later.
It also makes the output schema explicit, which is helpful in notebooks and production pipelines alike. When someone reads the code later, they can see the grouped metrics in one place instead of reconstructing them from several intermediate variables.
Common Pitfalls
The most common mistake is using count when the real goal is a distinct count. count counts non-missing rows, not unique values.
Another issue is forgetting that nunique ignores missing values by default. That can make group counts look lower than expected.
A third problem is using nunique on one column when uniqueness actually depends on a combination of columns.
Summary
- Use
groupby(...)[column].nunique()to count distinct values per group. - Add
reset_indexwhen you want the result as a regular DataFrame. - Use
dropna=Falseif missing values should count as a unique category. - For multi-column uniqueness, deduplicate the relevant columns first.
- Use
aggwhen unique counts are part of a larger grouped summary.

