
Is there a way to purge the topic in Kafka?


Kafka does not provide a direct "purge" or "delete all messages" command for a topic. However, there are several workarounds you can use to effectively achieve this. Here's a breakdown of the options:


1. Deleting and Recreating the Topic

This is the simplest and most common approach to "purge" a topic.

  1. Delete the Topic:
bash
 kafka-topics.sh --bootstrap-server <broker-address> --delete --topic <topic-name>
  2. Recreate the Topic:
bash
 kafka-topics.sh --bootstrap-server <broker-address> --create --topic <topic-name> --partitions <num-partitions> --replication-factor <replication-factor>

This will effectively remove all messages in the topic.

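Note: Topic deletion must be enabled on the brokers (delete.topic.enable=true, which has been the default since Kafka 1.0). Deletion is asynchronous, so wait until the topic no longer appears in the --list output before recreating it. Topic-level configuration overrides are not carried over to the new topic and must be reapplied.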
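The two commands can be combined into a small script. The following is a minimal sketch, not a definitive implementation: the function names, the KAFKA_TOPICS and BOOTSTRAP variables, and the default limit of 30 polling attempts are assumptions made for illustration. It polls the --list output after the delete because topic deletion completes asynchronously.

```shell
# Sketch: delete a topic, wait until it is gone, then recreate it.
# KAFKA_TOPICS and BOOTSTRAP are illustrative variables, not Kafka settings.
KAFKA_TOPICS="${KAFKA_TOPICS:-kafka-topics.sh}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

topic_exists() {
  # Succeeds if the exact topic name appears in the broker's topic list.
  "$KAFKA_TOPICS" --bootstrap-server "$BOOTSTRAP" --list | grep -Fxq -- "$1"
}

recreate_topic() {
  # $1 = topic, $2 = partitions, $3 = replication factor, $4 = max polls (default 30)
  local topic="$1" partitions="$2" replication="$3" tries="${4:-30}"
  "$KAFKA_TOPICS" --bootstrap-server "$BOOTSTRAP" --delete --topic "$topic" || return 1
  # Poll once per second until the topic disappears or the attempts run out.
  while topic_exists "$topic"; do
    tries=$((tries - 1))
    if [ "$tries" -le 0 ]; then
      echo "timed out waiting for deletion of $topic" >&2
      return 1
    fi
    sleep 1
  done
  "$KAFKA_TOPICS" --bootstrap-server "$BOOTSTRAP" --create --topic "$topic" \
    --partitions "$partitions" --replication-factor "$replication"
}
```

For example, `recreate_topic my-topic 3 1` would delete my-topic and recreate it with 3 partitions and a replication factor of 1.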


2. Reduce the Retention Period Temporarily

You can configure the topic's retention period to a very low value, such as 1 millisecond, to delete all current messages, then reset it back to its original value.

  1. Set Retention Period to 1ms:
bash
 kafka-configs.sh --bootstrap-server <broker-address> --alter --entity-type topics --entity-name <topic-name> --add-config retention.ms=1
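  2. Wait for Cleanup: Brokers check for expired segments periodically (every log.retention.check.interval.ms, 5 minutes by default), so the deletion can take several minutes.
  3. Restore Original Retention Period (if the topic had no retention.ms override before, remove the override with --delete-config retention.ms instead):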
bash
 kafka-configs.sh --bootstrap-server <broker-address> --alter --entity-type topics --entity-name <topic-name> --add-config retention.ms=<original-value>
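
The steps above can be scripted. This is a sketch under stated assumptions: the function names and the KAFKA_CONFIGS and BOOTSTRAP variables are hypothetical, and the caller supplies the wait time and, optionally, the original retention.ms value. When no original value is given, the sketch removes the override with --delete-config so the topic falls back to the broker default.

```shell
# Sketch: lower retention.ms, wait for cleanup, then restore the setting.
# KAFKA_CONFIGS and BOOTSTRAP are illustrative variables, not Kafka settings.
KAFKA_CONFIGS="${KAFKA_CONFIGS:-kafka-configs.sh}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

alter_topic_config() {
  # $1 = topic, remaining args = --add-config / --delete-config options.
  local topic="$1"
  shift
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP" --alter \
    --entity-type topics --entity-name "$topic" "$@"
}

purge_by_retention() {
  # $1 = topic, $2 = seconds to wait, $3 = original retention.ms (optional)
  local topic="$1" wait_seconds="$2" original_ms="${3:-}"
  alter_topic_config "$topic" --add-config retention.ms=1 || return 1
  # Give the brokers time to run their periodic retention check.
  sleep "$wait_seconds"
  if [ -n "$original_ms" ]; then
    alter_topic_config "$topic" --add-config "retention.ms=$original_ms"
  else
    # No previous override: drop ours so the broker default applies again.
    alter_topic_config "$topic" --delete-config retention.ms
  fi
}
```

For example, `purge_by_retention my-topic 360 604800000` would wait six minutes, slightly longer than the default retention check interval, and then restore a 7-day retention period.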

3. Use Log Segments Deletion with retention.bytes

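You can force Kafka to delete most of the current messages by temporarily setting a very small retention.bytes value. The limit applies per partition, and size-based retention removes whole segments from the oldest end, so the newest (active) segment typically remains and some recent messages may survive.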

  1. Set Small Retention Size:
bash
 kafka-configs.sh --bootstrap-server <broker-address> --alter --entity-type topics --entity-name <topic-name> --add-config retention.bytes=1
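  2. Wait for Cleanup: Kafka deletes the oldest log segments in each partition for as long as the partition would still hold at least retention.bytes afterwards; the active segment is typically kept.
  3. Restore Original Retention Size: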
bash
 kafka-configs.sh --bootstrap-server <broker-address> --alter --entity-type topics --entity-name <topic-name> --add-config retention.bytes=<original-value>
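
To see why a tiny retention.bytes may not empty a partition, the deletion rule can be modeled as a plain shell function. This is a simplified model resting on an assumption about broker behavior (a segment is deleted only while the partition would still hold at least retention.bytes afterwards); it is not Kafka code, and the function name is hypothetical.

```shell
# Simplified model of size-based retention for one partition (an assumption,
# not Kafka source code). Prints how many of the oldest segments get deleted.
segments_deleted_by_size() {
  # $1 = retention.bytes, remaining args = segment sizes in bytes, oldest first.
  local retention="$1"
  shift
  local total=0 size
  for size in "$@"; do
    total=$((total + size))
  done
  # excess = number of bytes that may still be removed without going below the limit.
  local excess=$((total - retention)) deleted=0
  for size in "$@"; do
    # Stop at the first segment whose removal would drop the log below the limit.
    [ $((excess - size)) -ge 0 ] || break
    excess=$((excess - size))
    deleted=$((deleted + 1))
  done
  echo "$deleted"
}
```

With three segments of 100, 100, and 50 bytes, `segments_deleted_by_size 1 100 100 50` prints 2: the two oldest segments are removed and the newest one stays, even with retention.bytes=1.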

4. Overwrite the Topic with Dummy Messages

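If you can't delete the topic or change its retention settings, and the topic is compacted (cleanup.policy=compact), you can supersede the existing records by key.

  1. Produce a new record for every existing key. A record with a null value (a tombstone) marks the key for removal; a dummy value only replaces the old value.
  2. Kafka's log compaction process then removes the older records with the same keys. Compaction runs in the background and does not clean the active segment, so old records can remain visible for some time.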

5. Use kafka-delete-records to Remove Records

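Kafka provides an API to delete the records before a given offset in each partition. The topic and its configuration stay in place; only the records are removed. In the DeleteRecords API an offset of -1 stands for the partition's high watermark, so listing every partition with offset -1 removes all current records.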

  1. Create a JSON file specifying the topic and partitions:
json
{
  "partitions": [
    {"topic": "my-topic", "partition": 0, "offset": 100}
  ],
  "version": 1
}
  2. Run the kafka-delete-records.sh command:
bash
 kafka-delete-records.sh --bootstrap-server <broker-address> --offset-json-file <path-to-json>

This deletes all records with offsets lower than the specified offset in each listed partition. Partitions that are not listed in the file are left untouched.
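
Writing the JSON file by hand becomes tedious for topics with many partitions. The following sketch generates it; the function name and its arguments are hypothetical, and it assumes partitions are numbered from 0 to count-1. The default offset of -1 relies on the DeleteRecords convention that -1 means the high watermark.

```shell
# Sketch: print an offset JSON document covering every partition of a topic.
delete_records_json() {
  # $1 = topic, $2 = partition count, $3 = offset (default -1 = high watermark)
  local topic="$1" count="$2" offset="${3:--1}" p=0 sep=""
  printf '{"partitions": ['
  while [ "$p" -lt "$count" ]; do
    printf '%s{"topic": "%s", "partition": %d, "offset": %d}' \
      "$sep" "$topic" "$p" "$offset"
    sep=", "
    p=$((p + 1))
  done
  printf '], "version": 1}\n'
}

# Usage (the file name is arbitrary):
#   delete_records_json my-topic 3 > purge.json
#   kafka-delete-records.sh --bootstrap-server <broker-address> --offset-json-file purge.json
```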


Comparison of Methods

| Method | Pros | Cons |
|---|---|---|
| Deleting and Recreating Topic | Simple, complete removal of all data | Requires topic recreation |
| Reduce Retention Period | Automatic cleanup, no topic recreation | Requires altering configurations |
| Retention Bytes | Removes old segments based on size | Requires altering configurations; newest segment usually remains |
| Overwriting with Dummy Data | Non-destructive, suitable for compaction | Time-consuming, specific to compacted topics |
| Delete Records by Offset | Granular deletion, no configuration change | Requires an offset for every partition |

Recommendation

If you need a complete purge and can afford to delete and recreate the topic, that is the simplest approach. For cases where topic recreation is not an option, temporarily reducing the retention period is the next best choice.

