Pandas
DataFrame
Unique Values
Python
Qlik

Counting unique values in a column in pandas dataframe like in Qlik?

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

If you are moving from Qlik to pandas, the distinct-count equivalent is usually nunique(). The main thing to decide is whether you want a plain unique count, a value-by-value frequency breakdown, or a grouped distinct count such as unique customers per region.

Use nunique() for a Distinct Count

The most direct translation of a distinct count on one field is Series.nunique().

python
1import pandas as pd
2
3sales = pd.DataFrame({
4    "customer": ["A", "B", "A", "C", "B", "D"],
5    "region": ["NA", "NA", "EU", "EU", "NA", "APAC"],
6    "amount": [100, 200, 150, 90, 220, 80],
7})
8
9print(sales["customer"].nunique())

That gives the number of different customer values in the column. It is usually the right answer when someone coming from BI tooling asks for a unique count.

Decide How Missing Values Should Behave

By default, nunique() ignores missing values. If nulls should count as their own category, say so explicitly.

python
print(sales["customer"].nunique(dropna=False))

This matters because dashboards often have a business rule about whether missing values should disappear or should be visible as a separate category.

Use value_counts() When You Want the Breakdown

Sometimes the real requirement is not the number of distinct values, but how often each distinct value appears. In that case, use value_counts().

python
counts = sales["customer"].value_counts()
print(counts)

You can also normalize the result to get shares instead of raw counts.

python
shares = sales["customer"].value_counts(normalize=True)
print(shares)

That is often closer to the kinds of distribution tables people build in BI tools.

Count Distinct Values Within Groups

A common Qlik-style question is something like "how many unique customers are there per region." In pandas, combine groupby with nunique.

python
1by_region = (
2    sales.groupby("region")["customer"]
3    .nunique()
4    .reset_index(name="distinct_customers")
5)
6
7print(by_region)

You can extend the pattern to multiple grouping keys when the business question needs more dimensions.

python
1orders = pd.DataFrame({
2    "month": ["2026-01", "2026-01", "2026-02", "2026-02", "2026-02"],
3    "region": ["NA", "EU", "NA", "EU", "EU"],
4    "customer": ["A", "A", "B", "C", "A"],
5})
6
7summary = (
8    orders.groupby(["month", "region"])["customer"]
9    .nunique()
10    .reset_index(name="distinct_customers")
11)
12
13print(summary)

Count Unique Combinations Across Columns

Sometimes uniqueness is defined by a combination of columns, such as customer plus region. In that case, counting one column is not enough. Use drop_duplicates() on the subset that defines uniqueness.

python
pairs = sales[["customer", "region"]].drop_duplicates()
print(len(pairs))
print(pairs)

This is the correct pattern when the business entity is a composite key instead of a single field.

Keep Larger Data Sets Practical

For larger frames, distinct counts are still straightforward, but memory choices matter. If a text dimension has many repeated values, converting it to category can reduce memory and make repeated grouped operations cheaper.

python
sales["region"] = sales["region"].astype("category")
result = sales.groupby("region", observed=True)["customer"].nunique()
print(result)

That is not always necessary, but it is a useful optimization when the data volume grows.

Common Pitfalls

  • Using len(df["col"].unique()) everywhere works, but nunique() is usually clearer and handles null behavior more explicitly.
  • Forgetting null handling can make pandas results disagree with a dashboard definition.
  • Counting rows after a join when the requirement is distinct business entities produces inflated metrics.
  • Ignoring composite uniqueness rules gives the wrong answer when one column is not enough to define an entity.
  • Grouping after the wrong preprocessing step can change the business meaning of the count.

Summary

  • Use nunique() for the direct distinct-count equivalent on one column.
  • Use value_counts() when you need counts for each unique value.
  • Combine groupby and nunique() for grouped distinct metrics.
  • Use drop_duplicates() when uniqueness depends on multiple columns.
  • Be explicit about null handling and business definitions so the count matches the question.

Course illustration
Course illustration

All Rights Reserved.