MySQL COUNT DISTINCT

Introduction

COUNT(DISTINCT ...) is the MySQL tool for answering "how many unique values are there?" It looks simple, but joins, null handling, indexes, and multi-column uniqueness can all change the result or the cost of the query.

Basic Meaning of `COUNT(DISTINCT column)`

The simplest form counts unique non-null values in one column.

sql

CREATE TABLE visits (
    id INT PRIMARY KEY,
    user_id INT,
    page VARCHAR(50)
);

INSERT INTO visits (id, user_id, page) VALUES
(1, 10, 'home'),
(2, 10, 'pricing'),
(3, 20, 'home'),
(4, 30, 'home'),
(5, NULL, 'about');

SELECT COUNT(DISTINCT user_id) AS unique_users
FROM visits;

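This returns 3, not 4: the distinct non-null values are 10, 20, and 30, and COUNT(expr) skips rows where the expression is NULL. Only COUNT(*) counts every row.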

That detail is important. COUNT(DISTINCT column) does not count missing values as one additional distinct category.
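If missing values should count as one extra category, one possible approach is to add a 0/1 flag that records whether any NULL is present. This is a minimal sketch against the visits table above; the alias unique_users_incl_null is illustrative, not part of the original example.

```sql
-- COUNT(DISTINCT user_id) skips NULL.
-- MAX(user_id IS NULL) is 1 when at least one row has a NULL user_id and 0 otherwise;
-- it is NULL on an empty table, hence the COALESCE.
SELECT
    COUNT(DISTINCT user_id) AS unique_users,
    COUNT(DISTINCT user_id) + COALESCE(MAX(user_id IS NULL), 0) AS unique_users_incl_null
FROM visits;
```

With the sample rows, the first column is 3 and the second is 4.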

Distinct Count with Filters

You can combine COUNT(DISTINCT ...) with WHERE just like any other aggregate query.

sql

SELECT COUNT(DISTINCT user_id) AS users_on_home
FROM visits
WHERE page = 'home';

This answers a narrower question: how many unique users visited the home page.

Filtering in SQL is usually better than fetching every row and filtering in application code, because the database returns a single number instead of the full row set.
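When several filtered distinct counts are needed at once, a CASE expression inside the aggregate can replace separate WHERE queries. The sketch below relies on the fact that COUNT ignores NULL: rows that fail the condition produce NULL and drop out. The column aliases are illustrative.

```sql
-- One pass over visits, two filtered distinct counts.
-- CASE without ELSE yields NULL for non-matching rows, and COUNT skips NULL.
SELECT
    COUNT(DISTINCT CASE WHEN page = 'home'    THEN user_id END) AS users_on_home,
    COUNT(DISTINCT CASE WHEN page = 'pricing' THEN user_id END) AS users_on_pricing
FROM visits;
```

With the sample rows this gives 3 users on home and 1 on pricing.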

Distinct Count Per Group

A common reporting pattern is unique values within each group.

sql

SELECT page, COUNT(DISTINCT user_id) AS unique_users
FROM visits
GROUP BY page;

This gives one distinct count per page. In analytics and dashboards, this pattern is much more common than a single global distinct count.
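A grouped distinct count can also be filtered and sorted on the aggregate itself. The following sketch keeps only pages with at least two unique users; the threshold of 2 is an arbitrary value chosen for illustration.

```sql
-- HAVING filters groups after aggregation, so it can test the distinct count.
SELECT page, COUNT(DISTINCT user_id) AS unique_users
FROM visits
GROUP BY page
HAVING COUNT(DISTINCT user_id) >= 2
ORDER BY unique_users DESC;
```

With the sample rows only the home page qualifies. The about page has a distinct count of 0, because its only visit has a NULL user_id.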

Multiple Columns

MySQL can count distinct combinations across more than one column.

sql

SELECT COUNT(DISTINCT user_id, page) AS unique_user_page_pairs
FROM visits;

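That is different from counting distinct user_id or distinct page. Here, uniqueness is defined by the pair. As in the single-column form, a combination is skipped if any of its columns is NULL, so the (NULL, 'about') row in the sample data is not counted and the query returns 4. The multi-expression form is a MySQL extension rather than standard SQL.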

This is useful for questions such as:

unique customer-product pairs
unique user-day combinations
unique country-city combinations

Be careful to define the unit of uniqueness correctly. Many bad analytics queries come from counting the wrong shape of thing.
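Because COUNT(DISTINCT a, b) with several expressions is specific to MySQL, a more portable way to count pairs is to build the distinct pairs in a derived table and count its rows. This is a sketch, not the only formulation; the alias pairs is illustrative, and the IS NOT NULL filters are there to mirror how the MySQL form skips combinations containing NULL.

```sql
-- Count distinct (user_id, page) pairs using only standard SQL constructs.
SELECT COUNT(*) AS unique_user_page_pairs
FROM (
    SELECT DISTINCT user_id, page
    FROM visits
    WHERE user_id IS NOT NULL
      AND page IS NOT NULL
) AS pairs;
```

Writing the pair out as an explicit SELECT DISTINCT also makes the unit of uniqueness visible to a reviewer.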

Joins Can Inflate Counts

A common logic bug with COUNT(DISTINCT ...) is applying it to the output of a join without checking how the join duplicates or drops rows.

Example:

sql

SELECT COUNT(DISTINCT orders.customer_id)
FROM orders
JOIN order_items ON orders.id = order_items.order_id;

This may still be correct, but only if the business question is "customers with at least one order that has an order item": the inner join drops orders without items and repeats each order once per item. On those multiplied rows a plain COUNT(*) would overcount, while COUNT(DISTINCT ...) collapses the duplicates, which can either repair the result or hide a larger query-design issue.
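One way to see how much a join multiplies rows is to compute several counts over the same joined rowset. The sketch below assumes only the columns already used in the example (orders.id, orders.customer_id, order_items.order_id); the table aliases and column aliases are illustrative.

```sql
-- Compare row-level, order-level, and customer-level counts on the same join.
SELECT
    COUNT(*)                      AS joined_rows,          -- one per order item
    COUNT(DISTINCT o.id)          AS orders_with_items,    -- duplicates collapsed per order
    COUNT(DISTINCT o.customer_id) AS customers_with_items  -- duplicates collapsed per customer
FROM orders AS o
JOIN order_items AS oi ON oi.order_id = o.id;
```

A large gap between joined_rows and orders_with_items indicates that the join is repeating each order many times.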

When results look suspicious, inspect the pre-aggregation rowset first.

sql

SELECT orders.customer_id, order_items.product_id
FROM orders
JOIN order_items ON orders.id = order_items.order_id;

Understand the expanded rows before trusting the aggregate.
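If the question really is "customers with at least one order item", one alternative is to avoid the row multiplication altogether with EXISTS instead of a join. This is a sketch under the same schema assumptions as the example above (orders.id, orders.customer_id, order_items.order_id).

```sql
-- Each order is visited once; EXISTS only checks whether a matching item exists.
SELECT COUNT(DISTINCT o.customer_id) AS customers_with_items
FROM orders AS o
WHERE EXISTS (
    SELECT 1
    FROM order_items AS oi
    WHERE oi.order_id = o.id
);
```

DISTINCT is still needed here because one customer can have several orders, but it no longer has to compensate for repeated order items.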

Performance Considerations

Distinct counting is often more expensive than plain counting because MySQL has to determine uniqueness. On large tables, performance depends on:

index quality
filter selectivity
join shape
temporary sorting or hashing work

A helpful starting point is indexing the counted column or the relevant filter-plus-count combination.

sql

CREATE INDEX idx_visits_page_user ON visits(page, user_id);

That can help queries such as:

sql

SELECT COUNT(DISTINCT user_id)
FROM visits
WHERE page = 'home';

As always, check the execution plan rather than assuming the index helped.
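A minimal way to check the plan is to prefix the query with EXPLAIN. In the output, the key column shows which index was chosen (ideally idx_visits_page_user here), and "Using index" in the Extra column suggests the query was answered from the index alone. EXPLAIN ANALYZE is assumed to be available only on MySQL 8.0.18 or later.

```sql
-- Show the chosen plan without running the query.
EXPLAIN
SELECT COUNT(DISTINCT user_id)
FROM visits
WHERE page = 'home';

-- MySQL 8.0.18 and later: execute the query and report actual timings per step.
EXPLAIN ANALYZE
SELECT COUNT(DISTINCT user_id)
FROM visits
WHERE page = 'home';
```

The exact plan depends on table size, statistics, and server version, so compare the output before and after creating the index.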

`COUNT(DISTINCT)` Versus `SELECT DISTINCT`

These two are related but not interchangeable.

This returns one number:

sql

SELECT COUNT(DISTINCT user_id)
FROM visits;

This returns the unique values themselves:

sql

SELECT DISTINCT user_id
FROM visits;

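If you want the count derived from the distinct list, it is sometimes clearer to write a subquery. Note that SELECT DISTINCT keeps NULL as one of its values, so the subquery has to filter it out to match COUNT(DISTINCT user_id).

sql

SELECT COUNT(*)
FROM (
    SELECT DISTINCT user_id
    FROM visits
    WHERE user_id IS NOT NULL  -- SELECT DISTINCT would otherwise keep one NULL row
) AS unique_users;

That is not always faster, but it can make the intent easier to review.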
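When the distinct values and their total are both wanted in one result set, one option on MySQL 8.0 or later is a window function over a grouped query. Window functions are evaluated after GROUP BY, so COUNT(*) OVER () counts groups rather than raw rows. This is a sketch; the alias total_unique_users is illustrative.

```sql
-- One row per distinct non-null user_id, each carrying the total number of such values.
SELECT
    user_id,
    COUNT(*) OVER () AS total_unique_users
FROM visits
WHERE user_id IS NOT NULL
GROUP BY user_id;
```

With the sample rows this returns three rows (user_id 10, 20, and 30), each with total_unique_users equal to 3.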

Common Pitfalls

Forgetting that COUNT(DISTINCT column) ignores NULL.
Counting after joins without first understanding row multiplication.
Using the wrong uniqueness unit, such as one column when the business concept is a pair.
Assuming distinct counts are cheap on large tables with no useful indexes.
Confusing COUNT(DISTINCT ...) with SELECT DISTINCT ....

Summary

COUNT(DISTINCT column) counts unique non-null values.
You can combine it with WHERE and GROUP BY for filtered and grouped analytics.
MySQL can count distinct combinations across multiple columns.
Joins can change the shape of the data, so inspect the rowset before trusting aggregates.
Indexes and execution plans matter because distinct counting can be expensive at scale.
