Counting unique values in a column in pandas dataframe like in Qlik?

Pandas

DataFrame

Unique Values

Python

Qlik

Counting unique values in a column in pandas dataframe like in Qlik?

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Introduction

If you are moving from Qlik to pandas, the distinct-count equivalent is usually nunique(). The main thing to decide is whether you want a plain unique count, a value-by-value frequency breakdown, or a grouped distinct count such as unique customers per region.

Use `nunique()` for a Distinct Count

The most direct translation of a distinct count on one field is Series.nunique().

python

1import pandas as pd
2
3sales = pd.DataFrame({
4    "customer": ["A", "B", "A", "C", "B", "D"],
5    "region": ["NA", "NA", "EU", "EU", "NA", "APAC"],
6    "amount": [100, 200, 150, 90, 220, 80],
7})
8
9print(sales["customer"].nunique())

That gives the number of different customer values in the column. It is usually the right answer when someone coming from BI tooling asks for a unique count.

Decide How Missing Values Should Behave

By default, nunique() ignores missing values. If nulls should count as their own category, say so explicitly.

python

print(sales["customer"].nunique(dropna=False))

This matters because dashboards often have a business rule about whether missing values should disappear or should be visible as a separate category.

Use `value_counts()` When You Want the Breakdown

Sometimes the real requirement is not the number of distinct values, but how often each distinct value appears. In that case, use value_counts().

python

counts = sales["customer"].value_counts()
print(counts)

You can also normalize the result to get shares instead of raw counts.

python

shares = sales["customer"].value_counts(normalize=True)
print(shares)

That is often closer to the kinds of distribution tables people build in BI tools.

Count Distinct Values Within Groups

A common Qlik-style question is something like "how many unique customers are there per region." In pandas, combine groupby with nunique.

python

1by_region = (
2    sales.groupby("region")["customer"]
3    .nunique()
4    .reset_index(name="distinct_customers")
5)
6
7print(by_region)

You can extend the pattern to multiple grouping keys when the business question needs more dimensions.

python

1orders = pd.DataFrame({
2    "month": ["2026-01", "2026-01", "2026-02", "2026-02", "2026-02"],
3    "region": ["NA", "EU", "NA", "EU", "EU"],
4    "customer": ["A", "A", "B", "C", "A"],
5})
6
7summary = (
8    orders.groupby(["month", "region"])["customer"]
9    .nunique()
10    .reset_index(name="distinct_customers")
11)
12
13print(summary)

Count Unique Combinations Across Columns

Sometimes uniqueness is defined by a combination of columns, such as customer plus region. In that case, counting one column is not enough. Use drop_duplicates() on the subset that defines uniqueness.

python

pairs = sales[["customer", "region"]].drop_duplicates()
print(len(pairs))
print(pairs)

This is the correct pattern when the business entity is a composite key instead of a single field.

Keep Larger Data Sets Practical

For larger frames, distinct counts are still straightforward, but memory choices matter. If a text dimension has many repeated values, converting it to category can reduce memory and make repeated grouped operations cheaper.

python

sales["region"] = sales["region"].astype("category")
result = sales.groupby("region", observed=True)["customer"].nunique()
print(result)

That is not always necessary, but it is a useful optimization when the data volume grows.

Common Pitfalls

Using len(df["col"].unique()) everywhere works, but nunique() is usually clearer and handles null behavior more explicitly.
Forgetting null handling can make pandas results disagree with a dashboard definition.
Counting rows after a join when the requirement is distinct business entities produces inflated metrics.
Ignoring composite uniqueness rules gives the wrong answer when one column is not enough to define an entity.
Grouping after the wrong preprocessing step can change the business meaning of the count.

Summary

Use nunique() for the direct distinct-count equivalent on one column.
Use value_counts() when you need counts for each unique value.
Combine groupby and nunique() for grouped distinct metrics.
Use drop_duplicates() when uniqueness depends on multiple columns.
Be explicit about null handling and business definitions so the count matches the question.

Counting unique values in a column in pandas dataframe like in Qlik?

Master System Design with Codemia

Introduction

Use nunique() for a Distinct Count

Decide How Missing Values Should Behave

Use value_counts() When You Want the Breakdown

Count Distinct Values Within Groups

Count Unique Combinations Across Columns

Keep Larger Data Sets Practical

Common Pitfalls

Summary

Use `nunique()` for a Distinct Count

Use `value_counts()` When You Want the Breakdown