Understanding Kafka Topics and Partitions
Understanding Kafka topics and partitions is crucial for designing a Kafka-based system. Here's a detailed explanation:
Kafka Topics
- A topic is a category or feed name to which records (messages) are sent by producers.
- Topics in Kafka are logical constructs that allow grouping of related messages.
- Each message published to Kafka must belong to a specific topic.
- Records in a topic are immutable: once written, a record cannot be modified. Records are removed when the retention limit is reached (or, on compacted topics, when a newer record for the same key supersedes them).
Key Points about Topics:
- Topics are distributed: A topic can have multiple partitions spread across Kafka brokers.
- Durability: Messages in a topic are stored for a configurable retention period (e.g., days, weeks).
- Producers and Consumers: Producers write to topics, and consumers read from topics.
Kafka Partitions
- Partitions are sub-divisions of a topic, enabling scalability and parallelism.
- Each topic consists of one or more partitions.
- A partition is an append-only log: messages are appended sequentially, and each is identified by its offset, a sequential number that is unique within that partition only.
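To make the log-and-offset idea concrete, here is a minimal in-memory sketch of a single partition. The `PartitionLog` class and its method names are hypothetical and exist only for illustration; a real broker stores partitions as segment files on disk.

```python
class PartitionLog:
    """Toy in-memory model of one partition: an append-only list of records."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Append a record and return the offset it was assigned."""
        offset = len(self._records)  # the next offset equals the current log length
        self._records.append(value)
        return offset

    def read(self, offset, max_records=10):
        """Return up to max_records (offset, value) pairs starting at offset."""
        end = min(offset + max_records, len(self._records))
        return [(i, self._records[i]) for i in range(offset, end)]


log = PartitionLog()
first = log.append("login")    # assigned offset 0
second = log.append("logout")  # assigned offset 1
batch = log.read(offset=1)     # a reader resumes from any offset it remembers
```

Because offsets only grow and records are never rewritten, a consumer needs to remember just one number per partition to know where to resume.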
Key Points about Partitions:
- Scalability:
  - Multiple partitions allow a topic to be spread across multiple brokers, enabling Kafka to handle large volumes of data.
  - Partitioning enables parallel processing by consumers, as each partition can be processed by a separate consumer in a consumer group.
- Ordering Guarantees:
  - Kafka guarantees ordering of messages within a single partition.
  - There is no ordering guarantee across partitions of a topic.
- Replication:
  - Partitions can be replicated for fault tolerance. Each partition has one leader and, depending on the replication factor, zero or more followers.
  - The leader handles all writes and, by default, all reads; followers replicate its data and one of them can take over if the leader fails. Since Kafka 2.4, consumers can optionally be configured to fetch from a follower.
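The leader/follower layout can be sketched with a small placement function. This is a simplified illustration, not Kafka's actual replica assignment algorithm; `assign_replicas` and the broker names are hypothetical.

```python
def assign_replicas(num_partitions, replication_factor, brokers):
    """Place each partition's replicas on consecutive brokers (simplified).

    The first replica in each list is treated as the leader here; the rest
    are followers.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        replicas = [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        assignment[p] = {"leader": replicas[0], "followers": replicas[1:]}
    return assignment


layout = assign_replicas(
    num_partitions=3,
    replication_factor=2,
    brokers=["broker-0", "broker-1", "broker-2"],  # hypothetical broker names
)
for partition, roles in layout.items():
    print(partition, roles)
```

With three brokers and a replication factor of 2, each broker in this sketch leads one partition and follows another, so losing any single broker still leaves a copy of every partition.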
Producer and Partitioning
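- Producers send messages to topics and determine which partition to use based on:
  - Key-based Partitioning (common):
    - The producer can specify a key for each message.
    - The default partitioner hashes the serialized key (murmur2 in the Java client) and takes the result modulo the number of partitions, so messages with the same key land in the same partition as long as the partition count does not change.
  - Keyless Partitioning (no key set):
    - Messages are spread across partitions. Older clients used round-robin per message; since Kafka 2.4 the default producer uses a sticky strategy that fills a batch for one partition before moving to the next, which evens out over time.
  - Custom Partitioning:
    - You can implement a custom partitioner to control how messages are distributed.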
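Key-based partitioning can be sketched in a few lines. Note the assumption: `zlib.crc32` stands in for the hash here purely for illustration, whereas Kafka's Java client uses murmur2 on the serialized key, so the partition numbers this sketch produces will not match a real cluster. `choose_partition` is a hypothetical helper.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a key to a partition: hash the key bytes, then take the modulo.

    zlib.crc32 is a stand-in hash for this sketch; it is NOT the hash Kafka uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = ["user-1", "user-2", "user-1", "user-3", "user-1"]
placements = [(k, choose_partition(k, num_partitions=3)) for k in keys]
for key, partition in placements:
    print(key, "->", partition)
```

Every occurrence of `user-1` maps to the same partition, which is what preserves per-key ordering. The mapping depends on the partition count, so adding partitions to an existing topic can move a key to a different partition.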
Consumer and Partition Assignment
- Consumers in a consumer group are assigned partitions.
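- Within a group, each partition is assigned to exactly one consumer, so group members process different partitions in parallel without duplicating work. A single consumer may own several partitions.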
- If the number of consumers exceeds the number of partitions, some consumers remain idle.
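The assignment rules above can be illustrated with a simplified round-robin assignment. This is a sketch only: Kafka ships several assignment strategies and handles rebalancing, none of which is modeled here, and `assign_partitions` is a hypothetical helper.

```python
def assign_partitions(partitions, consumers):
    """Deal partitions out to consumers one at a time (simplified round-robin).

    Each partition goes to exactly one consumer; a consumer may get
    several partitions, or none.
    """
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


balanced = assign_partitions([0, 1, 2], ["c1", "c2", "c3"])
undersized = assign_partitions([0, 1, 2], ["c1", "c2"])
oversized = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])

print(balanced)    # one partition per consumer
print(undersized)  # one consumer owns two partitions
print(oversized)   # the fourth consumer has nothing to read
```

The third case shows why the partition count caps the useful size of a consumer group: a consumer with an empty assignment sits idle until a rebalance gives it work.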
Example: Topic and Partition Workflow
Scenario:
- Topic: user-activity
- Partitions: 3
- Replication Factor: 2
- Messages: {userId: 1, action: 'login'}
Workflow:
- Producers send messages to the user-activity topic.
- Messages are assigned to partitions:
  - Partition 0: {userId: 1, action: 'login'} (if the hash of userId: 1 maps to Partition 0)
  - Partition 1: {userId: 2, action: 'logout'}
  - Partition 2: {userId: 3, action: 'signup'}
- Consumers in a group consume from these partitions:
  - Consumer 1 reads Partition 0.
  - Consumer 2 reads Partition 1.
  - Consumer 3 reads Partition 2.
Advantages of Topics and Partitions
- Scalability: Enables horizontal scaling by distributing partitions across brokers.
- Fault Tolerance: Replication ensures data availability during broker failures.
- Parallelism: Consumers can process data in parallel by consuming from different partitions.
Key Considerations
- Number of Partitions:
  - Impacts scalability and throughput.
  - Too few partitions can limit parallelism.
  - Too many partitions can increase overhead (e.g., metadata management).
- Replication Factor:
  - A higher replication factor improves fault tolerance but increases storage costs.
- Partition Key:
  - A key with many distinct, evenly used values keeps messages balanced across partitions; a few very frequent keys can create hot partitions. Because ordering holds only within a partition, the key should also group the messages that must stay in order.
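As a rough starting point for the partition count, one commonly cited rule of thumb divides the target throughput by the measured per-partition throughput on the producer side and on the consumer side, and takes the larger result. The sketch below assumes you have measured those figures yourself; the function name and the numbers are hypothetical.

```python
import math


def estimate_partitions(target_mb_s, producer_mb_s_per_partition, consumer_mb_s_per_partition):
    """Rule-of-thumb lower bound: enough partitions for both producers and consumers."""
    needed_for_producers = target_mb_s / producer_mb_s_per_partition
    needed_for_consumers = target_mb_s / consumer_mb_s_per_partition
    return math.ceil(max(needed_for_producers, needed_for_consumers))


# Hypothetical measurements: 100 MB/s target, 10 MB/s per partition when
# producing, 20 MB/s per partition when consuming.
estimate = estimate_partitions(100, 10, 20)
print(estimate)
```

Treat the result as a lower bound to validate with load tests, not a final answer, since the per-partition figures vary with message size, batching, and replication settings.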
Summary
| Concept | Description |
| --- | --- |
| Topic | A logical category for organizing messages. |
| Partition | A sub-division of a topic enabling scalability and parallel processing. |
| Offset | A unique identifier for messages within a partition. |
| Key | Used by producers to determine the partition for a message. |
| Replication | Ensures fault tolerance by copying data across multiple brokers. |
| Consumer Group | A group of consumers that work together to consume messages from a topic's partitions in parallel. |
By designing Kafka topics and partitions properly, you can achieve a scalable, efficient, and reliable distributed messaging system.

