Kafka retention policies
Apache Kafka is a distributed event streaming platform for handling real-time data feeds. A critical part of managing streams in Kafka is choosing retention policies, which dictate how long data is kept before it is discarded. These policies determine how much disk space the brokers need and whether the cluster complies with data governance standards.
Retention Policy Types
Kafka allows retention to be configured primarily based on time, size, or a combination of both, affecting how long data remains available in a Kafka topic:
- Time-based Retention: Messages are retained for a configurable time period. Once a log segment is older than this period, it becomes eligible for deletion.
- Size-based Retention: Each partition's log is kept up to a maximum size in bytes. Once a partition exceeds the threshold, its oldest segments are deleted.
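Retention can be set at the broker or topic level. Broker-level settings (log.retention.ms, log.retention.bytes, log.cleanup.policy) act as defaults, and topic-level settings (retention.ms, retention.bytes, cleanup.policy) override them for an individual topic.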
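To check which settings a topic overrides, it can help to list its topic-level configuration. The following is a minimal sketch using the kafka-configs.sh tool that ships with Kafka. The topic name my-topic, the broker address localhost:9092, and the helper name describe_topic_config are placeholders chosen for illustration.

```shell
# Path to the Kafka CLI tool; set KAFKA_CONFIGS if it is not on PATH (assumption).
KAFKA_CONFIGS="${KAFKA_CONFIGS:-kafka-configs.sh}"
# Broker address is a placeholder for a local test cluster (assumption).
BOOTSTRAP_SERVER="${BOOTSTRAP_SERVER:-localhost:9092}"

# Hypothetical helper: print the topic-level config overrides for topic $1.
describe_topic_config() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP_SERVER" \
    --describe --entity-type topics --entity-name "$1"
}
```

Calling describe_topic_config my-topic against a running cluster lists the overrides set on that topic. Any retention setting that does not appear there falls back to the broker-level default.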
How Retention Policies Are Applied
Time-based Retention
For time-based retention, the topic-level property retention.ms is used. Its default is 7 days (604800000 milliseconds), and it can be changed per topic to meet specific requirements. A value of -1 removes the time limit.
Example: To retain a topic's messages for 3 days, set retention.ms=259200000 on the topic.
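One way to apply this setting is with the kafka-configs.sh tool that ships with Kafka. The sketch below assumes a broker reachable at localhost:9092 and a topic named my-topic. Both are placeholders, as is the helper name set_topic_retention_ms.

```shell
# Path to the Kafka CLI tool; set KAFKA_CONFIGS if it is not on PATH (assumption).
KAFKA_CONFIGS="${KAFKA_CONFIGS:-kafka-configs.sh}"
# Broker address is a placeholder for a local test cluster (assumption).
BOOTSTRAP_SERVER="${BOOTSTRAP_SERVER:-localhost:9092}"

# 3 days in milliseconds: 3 * 24 h * 60 min * 60 s * 1000 ms = 259200000.
RETENTION_MS=$((3 * 24 * 60 * 60 * 1000))

# Hypothetical helper: set retention.ms ($2, in ms) as an override on topic $1.
set_topic_retention_ms() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP_SERVER" \
    --alter --entity-type topics --entity-name "$1" \
    --add-config "retention.ms=$2"
}
```

Running set_topic_retention_ms my-topic "$RETENTION_MS" against a live cluster applies the override. Computing the value in the script rather than hard-coding it makes the unit conversion easy to audit.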
Size-based Retention
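For size-based retention, the property retention.bytes sets the maximum size in bytes that each partition's log may reach. The limit applies per partition, not to the topic as a whole, so a topic with several partitions can hold several times this amount. When a partition exceeds the threshold, its oldest log segments are deleted to bring it back under the limit.
Example: To retain up to 500 MiB per partition, set retention.bytes=524288000 on the topic.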
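A sketch of applying the size limit with kafka-configs.sh follows. As before, the broker address localhost:9092, the topic name my-topic, and the helper name set_topic_retention_bytes are illustrative placeholders rather than values from a real deployment.

```shell
# Path to the Kafka CLI tool; set KAFKA_CONFIGS if it is not on PATH (assumption).
KAFKA_CONFIGS="${KAFKA_CONFIGS:-kafka-configs.sh}"
# Broker address is a placeholder for a local test cluster (assumption).
BOOTSTRAP_SERVER="${BOOTSTRAP_SERVER:-localhost:9092}"

# 500 MiB in bytes: 500 * 1024 * 1024 = 524288000.
RETENTION_BYTES=$((500 * 1024 * 1024))

# Hypothetical helper: set retention.bytes ($2) as an override on topic $1.
# The limit is enforced per partition.
set_topic_retention_bytes() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP_SERVER" \
    --alter --entity-type topics --entity-name "$1" \
    --add-config "retention.bytes=$2"
}
```

Running set_topic_retention_bytes my-topic "$RETENTION_BYTES" applies the limit. For a topic with 6 partitions, this setting allows roughly 6 x 500 MiB of retained data per replica.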
Log Cleanup Policies
Kafka offers two log cleanup policies, delete and compact, which can also be combined. They are set using the cleanup.policy configuration:
- Delete Policy: This is the default mode, in which old log segments are deleted once they are older than the retention period or once the partition's log exceeds the maximum size.
- Compact Policy: This policy reduces the log by keeping at least the latest value for each key within a partition and discarding older values for that key. It’s especially beneficial for topics that store state data like current user preferences.
Example: To enable compaction on a topic, set cleanup.policy=compact.
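Switching a topic to compaction can be sketched with kafka-configs.sh as follows. The topic name user-preferences, the broker address localhost:9092, and the helper name set_topic_cleanup_policy are hypothetical and chosen only to match the state-data use case described above.

```shell
# Path to the Kafka CLI tool; set KAFKA_CONFIGS if it is not on PATH (assumption).
KAFKA_CONFIGS="${KAFKA_CONFIGS:-kafka-configs.sh}"
# Broker address is a placeholder for a local test cluster (assumption).
BOOTSTRAP_SERVER="${BOOTSTRAP_SERVER:-localhost:9092}"

# Hypothetical helper: set cleanup.policy ($2) as an override on topic $1.
set_topic_cleanup_policy() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP_SERVER" \
    --alter --entity-type topics --entity-name "$1" \
    --add-config "cleanup.policy=$2"
}
```

Running set_topic_cleanup_policy user-preferences compact enables compaction for that topic. Compaction works on message keys, so it only makes sense for topics whose messages are keyed.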
Considerations in Retention Policy Configuration
When configuring retention policies, several factors must be considered:
- Storage Capacity: Ensure there is adequate disk space on the Kafka brokers to hold the data produced under the configured retention settings.
- Performance: Keeping a large amount of data might degrade broker performance, especially in terms of recovery time and read/write speeds.
- Compliance and Governance: Adhering to standards that require specific retention periods for data storage.
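The storage consideration above can be turned into a rough back-of-the-envelope estimate: average write rate x retention period x replication factor. The numbers in the sketch below are illustrative assumptions, not measurements. The estimate ignores compression, index files, and the active segment, which is not deleted until it rolls.

```shell
# All inputs are illustrative assumptions.
WRITE_BYTES_PER_SEC=$((1024 * 1024))      # 1 MiB/s average produce rate
RETENTION_SECONDS=$((3 * 24 * 60 * 60))   # 3-day retention, in seconds
REPLICATION_FACTOR=3                      # copies of each partition

# Total bytes retained across the cluster once the topic reaches steady state.
TOTAL_BYTES=$((WRITE_BYTES_PER_SEC * RETENTION_SECONDS * REPLICATION_FACTOR))
# Integer division, so the result is rounded down to whole GiB.
TOTAL_GIB=$((TOTAL_BYTES / 1024 / 1024 / 1024))

echo "Estimated retained data across the cluster: ${TOTAL_GIB} GiB"
```

With these inputs the estimate comes to about 759 GiB across the cluster. It is worth leaving headroom on top of such a figure, since retention is enforced on whole segments and disk usage can temporarily exceed the configured limits.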
Summary Table
| Configuration Parameter | Description | Default Value | Example Setting |
| --- | --- | --- | --- |
| retention.ms | Time to retain a message, in milliseconds. | 604800000 (7 days) | retention.ms=259200000 |
| retention.bytes | Maximum size of each partition's log, in bytes, before old segments are deleted. | -1 (unlimited) | retention.bytes=524288000 |
| cleanup.policy | How old log data is discarded: delete, compact, or both. | delete | cleanup.policy=compact |
Overall, understanding and configuring Kafka's retention policies precisely is essential for optimizing resource management and ensuring that the Kafka ecosystem aligns with business and technical objectives. Proper tuning of these settings helps maintain a balance between availability, performance, and cost-effectiveness of the Kafka infrastructure.

