Java, How to get number of messages in a topic in apache kafka

Introduction

Kafka does not expose a single built-in field called "message count" for a topic in the way a traditional queue might. In Java, the usual approach is to inspect offsets for each partition and compute an approximate retained-record count, but that number answers only one specific question and should be interpreted carefully.

Decide What Count You Actually Mean

Before writing code, decide what you want to measure. Different teams use the phrase "number of messages in a topic" to mean very different things:

all records ever produced since the topic was created
records currently retained on disk
records not yet consumed by one consumer group
records visible to consumers configured with isolation.level=read_committed

Kafka offsets help most with the second interpretation: how many records are currently retained in each partition. For a non-compacted, non-transactional topic, a practical per-partition estimate is:

latest offset - earliest offset

If you sum that value across all partitions, you get an approximate count of retained records. That is often good enough for diagnostics, dashboards, and rough capacity checks.
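The summation itself is simple arithmetic. The stdlib-only sketch below applies it to hypothetical offsets for a three-partition topic; the class name `RetainedEstimate` and the sample values are illustrative assumptions, not output from a real cluster.

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal sketch: sums (latest - earliest) across partitions.
// The offsets in main() are hypothetical sample values, not output from a real cluster.
final class RetainedEstimate {

    /** Earliest retained offset and log-end ("latest") offset for one partition. */
    record PartitionOffsets(long earliest, long latest) {}

    /** Approximate retained records: the sum of per-partition offset differences. */
    static long retainedEstimate(Map<Integer, PartitionOffsets> offsetsByPartition) {
        long total = 0;
        for (PartitionOffsets o : offsetsByPartition.values()) {
            total += o.latest() - o.earliest();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, PartitionOffsets> offsets = new TreeMap<>();
        offsets.put(0, new PartitionOffsets(100, 140)); // 40 retained
        offsets.put(1, new PartitionOffsets(0, 25));    // 25 retained
        offsets.put(2, new PartitionOffsets(500, 500)); // empty after retention
        System.out.println("Approximate retained records: " + retainedEstimate(offsets)); // 65
    }
}
```

An empty partition contributes zero, and a partition whose old segments were deleted by retention contributes only what is still on disk.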

Offset-Based Approximation with AdminClient

The Java AdminClient API can describe the topic, list its partitions, and fetch the earliest and latest offsets for each partition. The following example prints a retained-record estimate for one topic.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult.ListOffsetsResultInfo;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartition;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class KafkaTopicMessageCount {
    public static void main(String[] args) throws Exception {
        String bootstrapServers = "localhost:9092";
        String topic = "orders";

        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);

        try (AdminClient admin = AdminClient.create(props)) {
            // Read topic metadata to discover every partition.
            TopicDescription description = admin.describeTopics(Collections.singleton(topic))
                    .allTopicNames()
                    .get()
                    .get(topic);

            // Build one earliest-offset and one latest-offset request per partition.
            Map<TopicPartition, OffsetSpec> earliestRequest = new HashMap<>();
            Map<TopicPartition, OffsetSpec> latestRequest = new HashMap<>();
            description.partitions().forEach(info -> {
                TopicPartition tp = new TopicPartition(topic, info.partition());
                earliestRequest.put(tp, OffsetSpec.earliest());
                latestRequest.put(tp, OffsetSpec.latest());
            });

            // Wait for each lookup once, rather than once per partition inside the loop.
            Map<TopicPartition, ListOffsetsResultInfo> earliest =
                    admin.listOffsets(earliestRequest).all().get();
            Map<TopicPartition, ListOffsetsResultInfo> latest =
                    admin.listOffsets(latestRequest).all().get();

            // Sum (latest - earliest) across all partitions.
            long total = 0;
            for (TopicPartition tp : earliestRequest.keySet()) {
                total += latest.get(tp).offset() - earliest.get(tp).offset();
            }

            System.out.println("Approximate retained record count: " + total);
        }
    }
}
```

The code does three things:

Reads topic metadata to discover every partition.
Requests the earliest and latest offsets for each partition.
Sums the offset differences.

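Here "latest" is the log-end offset, the offset the next record will receive. If one partition has earliest 100 and latest 140, Kafka currently retains about 40 records in that partition (offsets 100 through 139). Adding all partition totals gives a topic-level estimate.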

What the Number Means

This offset-based total is useful, but it is not a universal truth about the topic.

On a normal topic with time-based or size-based retention, the estimate describes records still retained in the log at the moment you ask. It does not tell you how many records have been produced over the full lifetime of the topic, because older data may already have expired.

Compacted topics need even more caution. Log compaction removes older records that share a key with a newer record, but the surviving records keep their original offsets, so the offset range fills with gaps. The latest - earliest difference therefore overstates how many records remain, and it is not a count of unique keys either. You are still looking at offset movement in the log, not a clean count of logical records.
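To make the overcount concrete, here is a stdlib-only simulation of a single compacted partition. It is a simplified model under stated assumptions, not Kafka's cleaner: every record is treated as eligible for cleaning (real Kafka never compacts the active segment), there are no tombstones, and the log start offset is assumed to stay at 0. The class and record names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of log compaction on one partition. Assumptions: every record is
// eligible for cleaning (real Kafka never compacts the active segment), there are no
// tombstones, and the log start offset stays at 0.
final class CompactionSimulation {

    record LogRecord(long offset, String key, String value) {}

    /** Keeps only the newest record per key; survivors keep their original offsets. */
    static List<LogRecord> compact(List<LogRecord> log) {
        Map<String, LogRecord> newestByKey = new LinkedHashMap<>();
        for (LogRecord r : log) {
            newestByKey.put(r.key(), r); // a later offset replaces an earlier one
        }
        List<LogRecord> survivors = new ArrayList<>(newestByKey.values());
        survivors.sort(Comparator.comparingLong(LogRecord::offset));
        return survivors;
    }

    public static void main(String[] args) {
        List<LogRecord> log = List.of(
                new LogRecord(0, "A", "v1"),
                new LogRecord(1, "B", "v1"),
                new LogRecord(2, "A", "v2"),
                new LogRecord(3, "C", "v1"),
                new LogRecord(4, "B", "v2"),
                new LogRecord(5, "A", "v3"));

        List<LogRecord> compacted = compact(log);
        long earliest = 0; // log start offset, assumed unchanged by compaction
        long latest = 6;   // next offset the partition would assign

        System.out.println("Offset estimate:   " + (latest - earliest)); // 6
        System.out.println("Records remaining: " + compacted.size());    // 3
        System.out.println("Surviving offsets: "
                + compacted.stream().map(LogRecord::offset).toList());   // [3, 4, 5]
    }
}
```

The offset difference still reports 6 while only 3 records survive, which is exactly the gap between offset movement and logical content.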

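Transactional workloads also matter. Every committed or aborted transaction writes a control marker that occupies an offset but is never delivered to consumers, and records from aborted transactions are filtered out for consumers using read_committed. Raw offset differences can therefore overstate what an application can actually read at a given moment. If that visible count is what you need, the business metric may require a consumer-based view instead of pure topic metadata.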
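A small stdlib-only model shows how markers and aborted data inflate the offset estimate. It assumes, for simplicity, one partition starting at offset 0, no open (undecided) transactions, and no non-transactional records; the enum and class names are hypothetical and the list index stands in for the offset.

```java
import java.util.List;

// Simplified model of one partition that holds transactional writes.
// Assumptions: offsets start at 0 (list index = offset), no open transactions,
// and no non-transactional records.
final class TransactionalOffsets {

    /** What occupies a single offset in this hypothetical log. */
    enum Entry { COMMITTED_DATA, ABORTED_DATA, CONTROL_MARKER }

    /** Records a read_committed consumer would receive: data from committed transactions only. */
    static long visibleUnderReadCommitted(List<Entry> log) {
        return log.stream().filter(e -> e == Entry.COMMITTED_DATA).count();
    }

    public static void main(String[] args) {
        // Offsets 0-2: committed transaction, 3: its commit marker,
        // offsets 4-5: aborted transaction, 6: its abort marker.
        List<Entry> log = List.of(
                Entry.COMMITTED_DATA, Entry.COMMITTED_DATA, Entry.COMMITTED_DATA,
                Entry.CONTROL_MARKER,
                Entry.ABORTED_DATA, Entry.ABORTED_DATA,
                Entry.CONTROL_MARKER);

        long earliest = 0;
        long latest = log.size(); // next offset to be assigned: 7

        System.out.println("Offset estimate:        " + (latest - earliest));            // 7
        System.out.println("read_committed records: " + visibleUnderReadCommitted(log)); // 3
    }
}
```

In a real cluster an open transaction would additionally hold read_committed consumers back at the last stable offset, which this sketch does not model.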

When to Use a Different Metric

If you really want consumer lag, compare the consumer group's committed offsets to the topic's end offsets. If you want total processed events for analytics or billing, Kafka itself is usually the wrong source of truth. That kind of number is better written to a durable metrics store or counted in your stream-processing pipeline.
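The lag comparison reduces to a per-partition subtraction. In a real client the committed offsets would typically come from `Admin.listConsumerGroupOffsets(groupId).partitionsToOffsetAndMetadata()` and the end offsets from `listOffsets` with `OffsetSpec.latest()`; the stdlib-only sketch below assumes those results have already been converted into plain maps keyed by partition number. The class name and the choice to skip partitions without a commit are illustrative assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal sketch of consumer-group lag. Partitions are plain integers here; in a real
// client the keys would be TopicPartition and the values would come from the AdminClient.
final class ConsumerLag {

    /** Sum of (end offset - committed offset) over partitions that have a committed offset. */
    static long totalLag(Map<Integer, Long> endOffsets, Map<Integer, Long> committedOffsets) {
        long total = 0;
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            Long committed = committedOffsets.get(e.getKey());
            if (committed == null) {
                // No commit yet: where the group starts depends on auto.offset.reset,
                // so this sketch leaves the partition out rather than guessing.
                continue;
            }
            // Clamp at zero in case the end offset was read before a newer commit landed.
            total += Math.max(0L, e.getValue() - committed);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, Long> endOffsets = new TreeMap<>(Map.of(0, 140L, 1, 25L, 2, 500L));
        Map<Integer, Long> committed = new TreeMap<>(Map.of(0, 120L, 1, 25L));
        System.out.println("Total lag: " + totalLag(endOffsets, committed)); // 20
    }
}
```

Unlike the retained-record estimate, this number is tied to one consumer group, so two groups reading the same topic can report very different backlogs.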

A good rule is:

use offset differences for rough retained-message estimates
use consumer lag APIs for backlog
use application metrics for exact business counts

That keeps the operational meaning of the number clear.

Common Pitfalls

Assuming latest - earliest means total lifetime messages produced. Retention can remove old data.
Treating the count as exact on compacted topics. Compaction changes the relationship between offsets and logical records.
Using topic-level counts as a business metric for billing or reporting. Kafka metadata is usually too low-level for that purpose.
Forgetting that the topic may have multiple partitions. A single-partition calculation is incomplete for real deployments.
Ignoring visibility semantics such as transactions and committed reads when your consumers depend on them.

Summary

Kafka does not provide one universal "message count" field for a topic.
In Java, a common approach is summing latest offset - earliest offset across partitions.
That number usually estimates currently retained records, not lifetime produced records.
Retention, compaction, and transactional semantics can change what the count means.
For exact backlog or business metrics, use consumer offsets or application-level counters instead of raw topic metadata.