MySQL COUNT DISTINCT
Introduction
COUNT(DISTINCT ...) is the MySQL tool for answering "how many unique values are there?" It looks simple, but joins, null handling, indexes, and multi-column uniqueness can all change the result or the cost of the query.
Basic Meaning of COUNT(DISTINCT column)
The simplest form counts unique non-null values in one column.
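A minimal sketch, assuming a hypothetical page_views table (the table name, columns, and rows are illustrative and not taken from any real schema):

```sql
-- Hypothetical demo table; names and data are assumptions for illustration.
DROP TEMPORARY TABLE IF EXISTS page_views;
CREATE TEMPORARY TABLE page_views (user_id INT NULL, page VARCHAR(50) NOT NULL);
INSERT INTO page_views (user_id, page) VALUES
  (1, 'home'), (2, 'home'), (2, 'pricing'), (3, 'pricing'), (NULL, 'home');

-- Distinct non-null user_id values are 1, 2, and 3.
SELECT COUNT(DISTINCT user_id) AS unique_users
FROM page_views;
-- unique_users = 3
```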
This returns 3, not 4, because COUNT(expr), unlike COUNT(*), skips rows where the expression is NULL.
That detail is important. COUNT(DISTINCT column) does not count missing values as one additional distinct category.
Distinct Count with Filters
You can combine COUNT(DISTINCT ...) with WHERE just like any other aggregate query.
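As a sketch, assuming a hypothetical page_views table in which the home page is stored as the literal value 'home' (both the table and that value are assumptions):

```sql
-- Hypothetical demo table; names and data are assumptions for illustration.
DROP TEMPORARY TABLE IF EXISTS page_views;
CREATE TEMPORARY TABLE page_views (user_id INT NULL, page VARCHAR(50) NOT NULL);
INSERT INTO page_views (user_id, page) VALUES
  (1, 'home'), (2, 'home'), (2, 'pricing'), (3, 'pricing'), (NULL, 'home');

-- The WHERE clause narrows the rows before uniqueness is evaluated.
SELECT COUNT(DISTINCT user_id) AS unique_home_visitors
FROM page_views
WHERE page = 'home';
-- unique_home_visitors = 2 (users 1 and 2; the NULL row is skipped)
```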
This answers a narrower question: how many unique users visited the home page.
Filtering in SQL is usually better than fetching every row and filtering later in application code, because the database can use indexes and returns only the final count.
Distinct Count Per Group
A common reporting pattern is unique values within each group.
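One way this might look, again with a hypothetical page_views table whose names and rows are made up for illustration:

```sql
-- Hypothetical demo table; names and data are assumptions for illustration.
DROP TEMPORARY TABLE IF EXISTS page_views;
CREATE TEMPORARY TABLE page_views (user_id INT NULL, page VARCHAR(50) NOT NULL);
INSERT INTO page_views (user_id, page) VALUES
  (1, 'home'), (2, 'home'), (3, 'home'),
  (2, 'pricing'), (3, 'pricing'), (NULL, 'home');

-- One row per page, each with its own distinct-user count.
SELECT page, COUNT(DISTINCT user_id) AS unique_users
FROM page_views
GROUP BY page
ORDER BY page;
-- home    -> 3
-- pricing -> 2
```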
This gives one distinct count per page. In analytics and dashboards, this pattern is much more common than a single global distinct count.
Multiple Columns
MySQL can count distinct combinations across more than one column.
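A small sketch of the multi-column form, assuming a hypothetical page_views table with one deliberately repeated row. Note that passing several expressions to COUNT(DISTINCT ...) is a MySQL extension; other databases may need a different formulation.

```sql
-- Hypothetical demo table; names and data are assumptions for illustration.
DROP TEMPORARY TABLE IF EXISTS page_views;
CREATE TEMPORARY TABLE page_views (user_id INT NULL, page VARCHAR(50) NOT NULL);
INSERT INTO page_views (user_id, page) VALUES
  (1, 'home'), (1, 'home'), (2, 'home'), (2, 'pricing'), (NULL, 'home');

SELECT
  COUNT(DISTINCT user_id)       AS unique_users,       -- 2: users 1 and 2
  COUNT(DISTINCT page)          AS unique_pages,       -- 2: home and pricing
  COUNT(DISTINCT user_id, page) AS unique_user_pages   -- 3: (1,home), (2,home), (2,pricing)
FROM page_views;
```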
That is different from counting distinct user_id or distinct page. Here, uniqueness is defined by the pair, and a pair is skipped if either column is NULL.
This is useful for questions such as:
- unique customer-product pairs
- unique user-day combinations
- unique country-city combinations
Be careful to define the unit of uniqueness correctly. Many bad analytics queries come from counting the wrong shape of thing.
Joins Can Inflate Counts
The most common logic bug with COUNT(DISTINCT ...) is applying it after a join that duplicates rows.
Example:
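The following is a sketch only. The customers, orders, and order_items tables, their columns, and their rows are hypothetical and chosen so that the join multiplies rows:

```sql
-- Hypothetical demo tables; names and data are assumptions for illustration.
DROP TEMPORARY TABLE IF EXISTS customers, orders, order_items;
CREATE TEMPORARY TABLE customers (customer_id INT PRIMARY KEY);
CREATE TEMPORARY TABLE orders (order_id INT PRIMARY KEY, customer_id INT NOT NULL);
CREATE TEMPORARY TABLE order_items (item_id INT PRIMARY KEY, order_id INT NOT NULL);
INSERT INTO customers VALUES (1), (2), (3);
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO order_items VALUES (100, 10), (101, 10), (102, 11), (103, 12);

SELECT
  COUNT(*)                      AS joined_rows,           -- 4: one row per order item
  COUNT(DISTINCT c.customer_id) AS customers_with_items   -- 2: customers 1 and 2
FROM customers AS c
JOIN orders AS o       ON o.customer_id = c.customer_id
JOIN order_items AS oi ON oi.order_id = o.order_id;
```

Customer 3 has no orders, so the inner joins drop that customer entirely, and customer 1 contributes three of the four joined rows.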
This may still be correct, but only if the business question is "customers with at least one joined order item." If the join multiplies rows unexpectedly, a plain COUNT(*) would be wrong, while COUNT(DISTINCT ...) may repair the result or hide a larger query-design issue.
When results look suspicious, inspect the pre-aggregation rowset first.
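One way to look at the expanded rows is to run the same joins without any aggregate. This sketch assumes hypothetical customers, orders, and order_items tables with made-up data:

```sql
-- Hypothetical demo tables; names and data are assumptions for illustration.
DROP TEMPORARY TABLE IF EXISTS customers, orders, order_items;
CREATE TEMPORARY TABLE customers (customer_id INT PRIMARY KEY);
CREATE TEMPORARY TABLE orders (order_id INT PRIMARY KEY, customer_id INT NOT NULL);
CREATE TEMPORARY TABLE order_items (item_id INT PRIMARY KEY, order_id INT NOT NULL);
INSERT INTO customers VALUES (1), (2), (3);
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO order_items VALUES (100, 10), (101, 10), (102, 11), (103, 12);

-- Same joins, no aggregate: each output row is one customer/order/item match.
SELECT c.customer_id, o.order_id, oi.item_id
FROM customers AS c
JOIN orders AS o       ON o.customer_id = c.customer_id
JOIN order_items AS oi ON oi.order_id = o.order_id
ORDER BY c.customer_id, o.order_id, oi.item_id
LIMIT 50;  -- keep the inspection small on real tables
```

With this sample data, customer 1 appears on three rows and customer 2 on one, which makes the multiplication visible before any counting happens.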
Understand the expanded rows before trusting the aggregate.
Performance Considerations
Distinct counting is often more expensive than plain counting because MySQL has to determine uniqueness. On large tables, performance depends on:
- index quality
- filter selectivity
- join shape
- temporary sorting or hashing work
A helpful starting point is indexing the counted column or the relevant filter-plus-count combination.
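As a sketch, a composite index on a hypothetical page_views table could put the filter column first and the counted column second. The index name idx_page_views_page_user is made up, and whether this column order suits a real workload is an assumption to verify:

```sql
-- Hypothetical demo table; names are assumptions for illustration.
DROP TEMPORARY TABLE IF EXISTS page_views;
CREATE TEMPORARY TABLE page_views (user_id INT NULL, page VARCHAR(50) NOT NULL);

-- Filter column (page) first, counted column (user_id) second, so a query
-- that filters on page and counts user_id may be answered from the index.
CREATE INDEX idx_page_views_page_user ON page_views (page, user_id);
```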
That can help queries such as:
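For instance, a filtered distinct count over a hypothetical page_views table that carries a made-up composite index on (page, user_id):

```sql
-- Hypothetical demo table and index; names and data are assumptions.
DROP TEMPORARY TABLE IF EXISTS page_views;
CREATE TEMPORARY TABLE page_views (
  user_id INT NULL,
  page    VARCHAR(50) NOT NULL,
  INDEX idx_page_user (page, user_id)
);
INSERT INTO page_views (user_id, page) VALUES
  (1, 'home'), (2, 'home'), (2, 'pricing'), (3, 'pricing'), (NULL, 'home');

-- Filters on the leading index column and counts the second one.
SELECT COUNT(DISTINCT user_id) AS unique_home_visitors
FROM page_views
WHERE page = 'home';
```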
As always, check the execution plan rather than assuming the index helped.
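A minimal way to do that is to prefix the query with EXPLAIN. The table and index below are hypothetical, and the plan MySQL reports depends on the server version, table size, and statistics:

```sql
-- Hypothetical demo table and index; names are assumptions for illustration.
DROP TEMPORARY TABLE IF EXISTS page_views;
CREATE TEMPORARY TABLE page_views (
  user_id INT NULL,
  page    VARCHAR(50) NOT NULL,
  INDEX idx_page_user (page, user_id)
);

-- Check the "key" column to see which index, if any, was chosen, and the
-- "Extra" column for notes such as "Using index" or "Using temporary".
EXPLAIN
SELECT COUNT(DISTINCT user_id)
FROM page_views
WHERE page = 'home';
```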
COUNT(DISTINCT) Versus SELECT DISTINCT
These two are related but not interchangeable.
This returns one number:
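A sketch against a hypothetical page_views table (names and rows are illustrative):

```sql
-- Hypothetical demo table; names and data are assumptions for illustration.
DROP TEMPORARY TABLE IF EXISTS page_views;
CREATE TEMPORARY TABLE page_views (user_id INT NULL, page VARCHAR(50) NOT NULL);
INSERT INTO page_views (user_id, page) VALUES
  (1, 'home'), (2, 'home'), (2, 'pricing'), (3, 'pricing'), (NULL, 'home');

-- A single-row, single-column result.
SELECT COUNT(DISTINCT page) AS unique_pages
FROM page_views;
-- unique_pages = 2
```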
This returns the unique values themselves:
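A sketch against a hypothetical page_views table (names and rows are illustrative). Unlike COUNT(DISTINCT ...), SELECT DISTINCT keeps NULL as one of the returned values:

```sql
-- Hypothetical demo table; names and data are assumptions for illustration.
DROP TEMPORARY TABLE IF EXISTS page_views;
CREATE TEMPORARY TABLE page_views (user_id INT NULL, page VARCHAR(50) NOT NULL);
INSERT INTO page_views (user_id, page) VALUES
  (1, 'home'), (2, 'home'), (2, 'pricing'), (3, 'pricing'), (NULL, 'home');

-- One row per distinct value, including a row for NULL.
SELECT DISTINCT user_id
FROM page_views
ORDER BY user_id;
-- Four rows: NULL, 1, 2, 3 (MySQL sorts NULL first in ascending order)
```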
If you need both the values and the count, sometimes it is clearer to write a subquery.
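One possible shape, assuming MySQL 8.0 or later for the window function and a hypothetical page_views table. The explicit IS NOT NULL filter is there because SELECT DISTINCT would otherwise return a NULL row that COUNT(DISTINCT ...) would not have counted:

```sql
-- Hypothetical demo table; names and data are assumptions for illustration.
DROP TEMPORARY TABLE IF EXISTS page_views;
CREATE TEMPORARY TABLE page_views (user_id INT NULL, page VARCHAR(50) NOT NULL);
INSERT INTO page_views (user_id, page) VALUES
  (1, 'home'), (2, 'home'), (2, 'pricing'), (3, 'pricing'), (NULL, 'home');

-- The subquery produces the distinct values; the outer query lists them
-- and attaches the total number of distinct values to every row.
SELECT u.user_id, COUNT(*) OVER () AS unique_users
FROM (
  SELECT DISTINCT user_id
  FROM page_views
  WHERE user_id IS NOT NULL
) AS u
ORDER BY u.user_id;
-- (1, 3), (2, 3), (3, 3)
```

On servers without window functions, replacing the outer select list with a plain COUNT(*) gives only the count from the same subquery.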
That is not always faster, but it can make the intent easier to review.
Common Pitfalls
- Forgetting that COUNT(DISTINCT column) ignores NULL.
- Counting after joins without first understanding row multiplication.
- Using the wrong uniqueness unit, such as one column when the business concept is a pair.
- Assuming distinct counts are cheap on large tables with no useful indexes.
- Confusing COUNT(DISTINCT ...) with SELECT DISTINCT ....
Summary
- COUNT(DISTINCT column) counts unique non-null values.
- You can combine it with WHERE and GROUP BY for filtered and grouped analytics.
- MySQL can count distinct combinations across multiple columns.
- Joins can change the shape of the data, so inspect the rowset before trusting aggregates.
- Indexes and execution plans matter because distinct counting can be expensive at scale.

