Counting unique values in a column in pandas dataframe like in Qlik?
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
If you are moving from Qlik to pandas, the distinct-count equivalent is usually nunique(). The main thing to decide is whether you want a plain unique count, a value-by-value frequency breakdown, or a grouped distinct count such as unique customers per region.
Use nunique() for a Distinct Count
The most direct translation of a distinct count on one field is Series.nunique().
That gives the number of different customer values in the column. It is usually the right answer when someone coming from BI tooling asks for a unique count.
Decide How Missing Values Should Behave
By default, nunique() ignores missing values. If nulls should count as their own category, say so explicitly.
This matters because dashboards often have a business rule about whether missing values should disappear or should be visible as a separate category.
Use value_counts() When You Want the Breakdown
Sometimes the real requirement is not the number of distinct values, but how often each distinct value appears. In that case, use value_counts().
You can also normalize the result to get shares instead of raw counts.
That is often closer to the kinds of distribution tables people build in BI tools.
Count Distinct Values Within Groups
A common Qlik-style question is something like "how many unique customers are there per region." In pandas, combine groupby with nunique.
You can extend the pattern to multiple grouping keys when the business question needs more dimensions.
Count Unique Combinations Across Columns
Sometimes uniqueness is defined by a combination of columns, such as customer plus region. In that case, counting one column is not enough. Use drop_duplicates() on the subset that defines uniqueness.
This is the correct pattern when the business entity is a composite key instead of a single field.
Keep Larger Data Sets Practical
For larger frames, distinct counts are still straightforward, but memory choices matter. If a text dimension has many repeated values, converting it to category can reduce memory and make repeated grouped operations cheaper.
That is not always necessary, but it is a useful optimization when the data volume grows.
Common Pitfalls
- Using
len(df["col"].unique())everywhere works, butnunique()is usually clearer and handles null behavior more explicitly. - Forgetting null handling can make pandas results disagree with a dashboard definition.
- Counting rows after a join when the requirement is distinct business entities produces inflated metrics.
- Ignoring composite uniqueness rules gives the wrong answer when one column is not enough to define an entity.
- Grouping after the wrong preprocessing step can change the business meaning of the count.
Summary
- Use
nunique()for the direct distinct-count equivalent on one column. - Use
value_counts()when you need counts for each unique value. - Combine
groupbyandnunique()for grouped distinct metrics. - Use
drop_duplicates()when uniqueness depends on multiple columns. - Be explicit about null handling and business definitions so the count matches the question.

