Build a Multi-Node Kafka Cluster on Docker Swarm
Apache Kafka is an open-source stream-processing platform written in Scala and Java. It was developed at LinkedIn and later donated to the Apache Software Foundation. Kafka aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a "massively scalable pub/sub message queue designed as a distributed transaction log," which makes it well suited to processing streaming data in enterprise infrastructure. Docker Swarm is a container orchestration tool: it lets you manage containers deployed across multiple host machines.
Setting Up a Multi-Node Kafka Cluster on Docker Swarm
Prerequisites:
- Docker Engine: Ensure Docker is installed on all the machines intended to be used in the Swarm.
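- Docker Compose: Needed only for its file format, which defines the multi-container application. Swarm deploys the file with docker stack deploy, so the standalone docker-compose tool is optional.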
Steps to Set Up Docker Swarm:
- Initialize Swarm Mode: First, designate one of your machines as the manager node. Initialize the swarm mode on this node by running:
- Add Worker Nodes: On the manager node, retrieve the join token:
Execute the displayed command on each machine you want to join as a worker to the swarm.
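The two steps above come down to a few Docker CLI calls. Here is a minimal sketch, wrapped in shell functions so it can be sourced on the manager node. The function names and the IP address argument are illustrative and not part of Docker itself.

```shell
# Run on the machine chosen as the manager.
# $1: the manager's IP address as seen by the other nodes (an assumption;
#     --advertise-addr can be dropped on hosts with a single interface).
swarm_init() {
  docker swarm init --advertise-addr "$1"
}

# Run on the manager: prints the complete `docker swarm join --token ...`
# command that each worker machine must execute.
swarm_worker_token() {
  docker swarm join-token worker
}

# Run on the manager afterwards to confirm that every node has joined.
swarm_nodes() {
  docker node ls
}
```

For example, `swarm_init 192.168.1.10` followed by `swarm_worker_token` on the manager, then paste the printed join command on each worker. The address is a placeholder.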
Create a Docker Compose File for Kafka Cluster
This file will define the services required for your Kafka cluster. An example docker-compose.yml for a simple cluster might look like this:
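A sketch of such a file, written out with a shell heredoc so it can be pasted straight into a terminal on the manager node. It is modeled on the Swarm example that accompanies the wurstmeister/kafka image and uses the names from the summary table (wurstmeister/kafka, zookeeper:2181, kafka-net). The ports, the listener names INSIDE and OUTSIDE, the topic name demo, and the replica count are assumptions to adjust for your environment.

```shell
# Write an example stack file into the current directory.
cat > docker-compose.yml <<'EOF'
version: '3.2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
    networks:
      - kafka-net
  kafka:
    image: wurstmeister/kafka
    ports:
      # Host-mode publishing: each broker is reached on its own node's port 9094
      - target: 9094
        published: 9094
        protocol: tcp
        mode: host
    environment:
      # Looks up the Swarm node's name so each broker advertises its own host
      HOSTNAME_COMMAND: "docker info | grep ^Name: | cut -d' ' -f 2"
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_LISTENERS: INSIDE://:9092,OUTSIDE://:9094
      KAFKA_ADVERTISED_LISTENERS: INSIDE://:9092,OUTSIDE://_{HOSTNAME_COMMAND}:9094
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
      # Format is name:partitions:replicas
      KAFKA_CREATE_TOPICS: "demo:1:1"
    volumes:
      # Lets HOSTNAME_COMMAND talk to the Docker daemon from inside the container
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - kafka-net
    deploy:
      # Assumes at least two nodes, since host-mode port 9094 allows one broker per node
      replicas: 2
networks:
  kafka-net:
    driver: overlay
EOF
```

Brokers talk to each other and to ZooKeeper over the kafka-net overlay network on the INSIDE listener, while external clients use the OUTSIDE listener on port 9094 of whichever node hosts the broker.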
Deploy the Stack
Deploy this configuration to Docker Swarm using:
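A minimal sketch of the deploy step, to be run on the manager node. The stack name kafka and the function name are assumptions chosen for illustration; any stack name works.

```shell
# Deploy (or update) the stack from the manager node.
# $1: stack name (defaults to the hypothetical name "kafka")
# $2: path to the Compose file (defaults to docker-compose.yml)
deploy_stack() {
  docker stack deploy --compose-file "${2:-docker-compose.yml}" "${1:-kafka}"
}
```

Calling `deploy_stack` with no arguments deploys docker-compose.yml as the stack kafka. Running it again after editing the file updates the running services in place.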
Validate the Cluster
Check the status of the deployed stack:
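One way to do this, assuming the stack was deployed under the name kafka (a placeholder, as is the function name):

```shell
# Show the state of a deployed stack.
# $1: stack name (defaults to the hypothetical name "kafka")
stack_status() {
  stack="${1:-kafka}"
  # One line per service, with current and desired replica counts
  docker stack services "$stack"
  # One line per task, showing the node it runs on and any error
  docker stack ps "$stack"
}
```

A healthy deployment shows each service with its current replica count equal to the desired count. Tasks stuck in a pending or rejected state usually point to a scheduling or image problem on a specific node.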
Understanding Kafka-Docker Configuration Parameters
The Compose file sets environment variables and options such as port mappings for each service. A few key parameters:
- KAFKA_CREATE_TOPICS: Topics to create automatically on startup. With the wurstmeister/kafka image, each entry has the form name:partitions:replicas.
- KAFKA_ZOOKEEPER_CONNECT: Address of the ZooKeeper instance the brokers connect to, for example zookeeper:2181.
- KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: Maps listener names to security protocols.
- KAFKA_ADVERTISED_LISTENERS: Listeners to publish to ZooKeeper for clients to use.
Monitoring and Scaling
Kafka on Docker Swarm can be monitored with standard Docker monitoring tools. To scale the cluster, change the replicas value under the deploy key of the Kafka service in the Compose file and redeploy the stack.
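As a sketch of what this can look like from the command line: the helpers below assume a stack named kafka whose broker service is also called kafka, so Swarm names the service kafka_kafka. The function names are illustrative.

```shell
# Change the number of Kafka brokers without editing the Compose file.
# $1: desired replica count; $2: stack name (defaults to "kafka")
scale_kafka() {
  docker service scale "${2:-kafka}_kafka=$1"
}

# Stream the brokers' logs. $1: stack name (defaults to "kafka")
kafka_logs() {
  docker service logs --follow "${1:-kafka}_kafka"
}
```

For example, `scale_kafka 3` asks Swarm for three brokers. A count set this way is replaced by the replicas value in the Compose file the next time the stack is deployed, so keep the file in sync.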
Summary Table
| Component | Description | Configuration Example |
| --- | --- | --- |
| ZooKeeper | Handles coordination for the Kafka cluster | zookeeper:2181 |
| Kafka | Core messaging service | image: wurstmeister/kafka |
| Networks | Docker overlay network for communication across nodes | kafka-net |
| Volumes | Mounts the host's Docker socket so the container can query the Docker daemon | /var/run/docker.sock:/var/run/docker.sock |
This setup gives you a multi-broker Kafka cluster on Docker Swarm that can be scaled by adding replicas, providing high-throughput, low-latency handling of real-time data feeds across distributed systems.

