Kafka retention policies

Apache Kafka is a distributed event streaming platform that serves as a robust substrate for handling real-time data feeds. A critical aspect of managing streams within Kafka is choosing retention policies, which dictate how long data is kept before being discarded. These policies determine how much disk space a cluster needs and whether it complies with data governance standards.

Retention Policy Types

Kafka allows retention to be bounded by time, by size, or by both. When both limits are set, data is deleted as soon as either limit is exceeded:

  • Time-based Retention: Messages are retained for a configurable time period. Once messages are older than this period, they become eligible for deletion.
  • Size-based Retention: Each partition's log is retained up to a maximum size in bytes. Once a partition exceeds that threshold, its oldest data is deleted.

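Retention can be set at the broker level (for example log.retention.hours, log.retention.ms, and log.retention.bytes) or at the topic level (retention.ms and retention.bytes), with topic-level settings overriding the broker-level defaults.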
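The override order can be sketched as a small shell function. This is only an illustrative model, not Kafka's code: the function name effective_retention_ms and its argument layout are made up for this example, and an empty argument stands for "not set". It assumes the documented broker-side precedence, where log.retention.ms wins over log.retention.minutes, which wins over log.retention.hours (default 168).

```bash
# Illustrative model of how the effective time-based retention is resolved.
# An empty string means "not set". Arguments (hypothetical layout):
#   $1 topic-level  retention.ms
#   $2 broker-level log.retention.ms
#   $3 broker-level log.retention.minutes
#   $4 broker-level log.retention.hours (falls back to 168 if empty)
effective_retention_ms() {
  if [ -n "$1" ]; then
    echo "$1"                                  # topic override wins
  elif [ -n "$2" ]; then
    echo "$2"                                  # broker setting in ms
  elif [ -n "$3" ]; then
    echo $(( $3 * 60 * 1000 ))                 # broker setting in minutes
  else
    echo $(( ${4:-168} * 60 * 60 * 1000 ))     # broker setting in hours
  fi
}

effective_retention_ms "" "" "" ""             # nothing set: 7-day default
effective_retention_ms 259200000 "" "" 168     # topic override of 3 days
```

The first call prints 604800000 and the second prints 259200000, matching the 7-day default and a 3-day topic override.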

How Retention Policies Are Applied

Time-based Retention

For time-based retention, the property retention.ms is used. By default, it is set to 7 days (604800000 milliseconds), but it can be altered to meet specific requirements.

Example: To set a topic to retain messages for 3 days, the command would be:

bash
kafka-configs --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --alter --add-config retention.ms=259200000
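Because retention.ms is expressed in milliseconds, it is easy to miscount zeros. A minimal sketch of a helper that derives the value from a number of days is shown below. The function name days_to_ms is hypothetical, and the script only prints the command instead of running it, so it works without a broker.

```bash
# Convert a whole number of days to milliseconds for retention.ms.
days_to_ms() {
  echo $(( $1 * 24 * 60 * 60 * 1000 ))
}

# Print (not run) the command that would apply a 3-day retention.
echo "kafka-configs --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --alter --add-config retention.ms=$(days_to_ms 3)"

# To inspect a topic's current overrides on a live cluster:
# kafka-configs --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --describe
```

Here days_to_ms 3 yields 259200000, the same value used in the example above, and days_to_ms 7 yields the 604800000 default.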

Size-based Retention

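For size-based retention, the property retention.bytes sets the maximum size in bytes that each partition's log may reach before old data is deleted. Because the limit applies per partition, a topic can hold up to retention.bytes multiplied by its partition count. The oldest log segments are deleted to bring the partition back under this threshold.

Example: To cap each partition of a topic at 500 MiB (524288000 bytes), the command is: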

bash
kafka-configs --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --alter --add-config retention.bytes=524288000
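Since retention.bytes applies to each partition, the disk a topic can occupy across the cluster also depends on its partition count and replication factor. The sketch below works through that arithmetic. The helper names and the 6-partition, replication-factor-3 topic are assumptions chosen for illustration, and the result is an approximate upper bound because the active segment can briefly push a partition past the limit.

```bash
# Convert MiB to bytes for retention.bytes.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Approximate cluster-wide upper bound for one topic:
#   $1 retention.bytes, $2 partition count, $3 replication factor
topic_max_bytes() {
  echo $(( $1 * $2 * $3 ))
}

limit=$(mib_to_bytes 500)
echo "retention.bytes=$limit"
echo "approximate cluster-wide upper bound: $(topic_max_bytes "$limit" 6 3) bytes"
```

With these assumed numbers, a 500 MiB per-partition limit allows roughly 9437184000 bytes (about 8.8 GiB) across all replicas.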

Log Cleanup Policies

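Kafka offers two log cleanup policies, delete and compact, which are set through the cleanup.policy configuration. The two can also be combined as compact,delete, in which case the log is compacted and old data is still removed according to the retention limits:

  • Delete Policy: This is the default mode, in which data is deleted once the log surpasses the maximum size or the data becomes older than the retention period. Deletion is performed on whole log segments, so individual messages may persist somewhat beyond the configured limits.
  • Compact Policy: This policy shrinks the log by keeping at least the latest value for each key within a partition. It’s especially beneficial for topics that store state data like current user preferences.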

Example:

bash
kafka-configs --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --alter --add-config cleanup.policy=compact
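To make the effect of compaction concrete, here is a toy model in shell. It is not how the broker implements compaction (the real log cleaner runs in the background and does not touch the active segment, so older values can linger for a while). The function name compact_log and the tab-separated "key, value" input format are assumptions for this sketch.

```bash
# Toy model of log compaction. Input: one "key<TAB>value" record per line,
# in offset order. Output: only the last record seen for each key,
# still in offset order.
compact_log() {
  awk -F '\t' '
    { key[NR] = $1; rec[NR] = $0; last[$1] = NR }
    END {
      for (i = 1; i <= NR; i++)
        if (last[key[i]] == i) print rec[i]
    }
  '
}

# Three preference updates; alice changes her theme twice.
printf 'alice\tdark\nbob\tlight\nalice\tlight\n' | compact_log
```

The output keeps bob's record and only alice's most recent one; her earlier "dark" value is dropped, which mirrors the user-preferences use case described above.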

Considerations in Retention Policy Configuration

When configuring retention policies, several factors must be considered:

  1. Storage Capacity: Ensuring there is adequate disk space on Kafka brokers to hold the data produced under the configured retention settings.
  2. Performance: Keeping a large amount of data might degrade broker performance, especially in terms of recovery time and read/write speeds.
  3. Compliance and Governance: Adhering to standards that require specific retention periods for data storage.
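For the storage consideration, a back-of-the-envelope estimate multiplies the ingest rate by the retention period and the replication factor. The sketch below does this in integer arithmetic. The function name retained_mib and the sample figures (5 MiB/s, 3 days, replication factor 3) are assumptions, and the estimate ignores compression, index files, and the extra space taken by segments that have not yet been deleted.

```bash
# Rough cluster-wide disk estimate for time-based retention, in MiB.
#   $1 average ingest rate in MiB per second (assumed steady)
#   $2 retention period in days
#   $3 replication factor
retained_mib() {
  echo $(( $1 * $2 * 24 * 60 * 60 * $3 ))
}

retained_mib 5 3 3
```

With these assumed inputs the result is 3888000 MiB, or roughly 3.7 TiB, which gives a starting point for sizing broker disks before adding headroom.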

Summary Table

| Configuration Parameter | Description | Default Value | Example Setting |
| --- | --- | --- | --- |
| retention.ms | Time to retain a message, in milliseconds. | 604800000 (7 days) | retention.ms=259200000 |
| retention.bytes | Maximum per-partition log size in bytes before old data is deleted. | -1 (unlimited) | retention.bytes=524288000 |
| cleanup.policy | Policy for how logs should be maintained. | delete | cleanup.policy=compact |

Overall, understanding and configuring Kafka's retention policies precisely is essential for optimizing resource management and ensuring that the Kafka ecosystem aligns with business and technical objectives. Proper tuning of these settings helps maintain a balance between availability, performance, and cost-effectiveness of the Kafka infrastructure.

